
Author LiarPrincess
Recipients LiarPrincess, ezio.melotti, vstinner
Date 2022-04-06.17:23:31
Message-id <1649265811.35.0.201706038285.issue47243@roundup.psfhosted.org>
Content
This one is so tiny that I'm not really sure we want to merge it…

=== Problem ===

`Objects/unicodetype_db.h` starts as follows:

```c
/* a list of unique character type descriptors */
const _PyUnicode_TypeRecord _PyUnicode_TypeRecords[] = {
    {0, 0, 0, 0, 0, 0},
    {0, 0, 0, 0, 0, 0},
    {0, 0, 0, 0, 0, 32},
    {0, 0, 0, 0, 0, 48},
    …
```

The first record (`{0, 0, 0, 0, 0, 0}`) is duplicated.
This is not a real problem, since the two records are identical, but if we want to remove the redundant copy, this ticket is about that.

=== Detailed description ===

`Objects/unicodetype_db.h` is generated by `Tools/unicode/makeunicodedata.py` (I removed irrelevant lines):

```py
def makeunicodetype(unicode, trace):
    dummy = (0, 0, 0, 0, 0, 0)
    table = [dummy] # (1)
    cache = {0: dummy} # (2)

    for char in unicode.chars:
        # Things…

        item = (upper, lower, title, decimal, digit, flags)

        i = cache.get(item) # (3)
        if i is None:
            cache[item] = i = len(table)
            table.append(item)

        index[char] = i
```

- (1) - `table`: list of unique character property records, each a `(upper, lower, title, decimal, digit, flags)` tuple
- (2) - `cache`: meant to map a property tuple to its index in `table`, but initialized the wrong way round, as a mapping from index `0` to the `dummy` tuple
- (3) - look up the current tuple in `cache`; if it is missing, append it to `table` and remember its index
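The effect of the swapped initialization at (2) can be seen in isolation: the lookup at (3) is keyed by the property tuple, so a dict whose only key is the integer `0` never matches it. A minimal sketch (the `buggy` and `fixed` names are illustrative only, not taken from `makeunicodedata.py`):

```python
# The all-zero property tuple used as the dummy record in (1).
dummy = (0, 0, 0, 0, 0, 0)

# (2) as written: maps index 0 -> tuple, i.e. the wrong direction.
buggy = {0: dummy}
# (2) as intended: maps tuple -> its index 0 in `table`.
fixed = {dummy: 0}

# (3) looks up the property tuple, not an index.
print(buggy.get(dummy))  # None: the tuple would be appended again
print(fixed.get(dummy))  # 0: the existing table entry would be reused
```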

=== Result ===

The first time we reach a character whose properties are `(0, 0, 0, 0, 0, 0)` (which is code point 0, `NULL`), we check whether that tuple is in `cache`. It is not (the only entry maps index `0` to `(0, 0, 0, 0, 0, 0)`, the other way around), so the tuple is appended to `table` a second time, at index 1, and `cache` maps it to 1.

=== Fix ===

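Line `(2)` should read `cache = {dummy: 0}`. After that change, `makeunicodedata.py` has to be re-run to regenerate `Objects/unicodetype_db.h`; because every record after the removed duplicate moves down by one index, this simple change modifies a lot of lines in the generated file.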
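To check the fix without regenerating the real header, the loop from the excerpt above can be reduced to a self-contained sketch. It assumes a short hypothetical list of property tuples in place of `unicode.chars`, and `build_table` is an illustrative helper, not a function in `makeunicodedata.py`; it only compares the two initializations of `cache`:

```python
# All-zero property tuple: (upper, lower, title, decimal, digit, flags).
DUMMY = (0, 0, 0, 0, 0, 0)


def build_table(items, cache):
    """Deduplicate property tuples the way the excerpt's loop does.

    `items` stands in for the per-character property tuples and
    `cache` is the initial lookup dict, passed in so that both
    initializations of (2) can be compared.
    """
    table = [DUMMY]  # (1) index 0 is pre-filled with the dummy record
    index = []       # table index chosen for each character, in order
    for item in items:
        i = cache.get(item)  # (3) is this tuple already in `table`?
        if i is None:
            cache[item] = i = len(table)
            table.append(item)
        index.append(i)
    return table, index


# Hypothetical sample: two all-zero characters and one with flags=32.
chars = [DUMMY, (0, 0, 0, 0, 0, 32), DUMMY]

print(build_table(chars, {0: DUMMY}))  # buggy initialization
# ([(0, 0, 0, 0, 0, 0), (0, 0, 0, 0, 0, 0), (0, 0, 0, 0, 0, 32)], [1, 2, 1])
print(build_table(chars, {DUMMY: 0}))  # fixed initialization
# ([(0, 0, 0, 0, 0, 0), (0, 0, 0, 0, 0, 32)], [0, 1, 0])
```

With the corrected initialization, characters with all-zero properties reuse index 0, and each record appears in the table only once.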

I will submit a PR on GitHub in just a sec…
History
Date                 User          Action  Args
2022-04-06 17:23:31  LiarPrincess  set     recipients: + LiarPrincess, vstinner, ezio.melotti
2022-04-06 17:23:31  LiarPrincess  set     messageid: <1649265811.35.0.201706038285.issue47243@roundup.psfhosted.org>
2022-04-06 17:23:31  LiarPrincess  link    issue47243 messages
2022-04-06 17:23:31  LiarPrincess  create