Bug: ASCII range mentioned as U+0001..U+007F, rather than U+0000..U+007F #135923

Open
@ezequiel-garzon

Description

Describe the bug

Section 2.3, "Identifiers and keywords", of the Python language reference on python.org states:

Within the ASCII range (U+0001..U+007F), the valid characters for identifiers include the uppercase and lowercase letters A through Z, the underscore _ and, except for the first character, the digits 0 through 9.

But the ASCII range is U+0000..U+007F, not U+0001..U+007F. The documentation points to PEP 3131, where the same range is used.
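
For reference, the quoted rule can be checked empirically. The following is only a sketch: it assumes that `str.isidentifier()` mirrors the lexer's rule for these characters, and the variable names are illustrative. It enumerates every code point in U+0000..U+007F and collects those that may start or continue an identifier.

```python
import string

# Every code point in the full ASCII range, U+0000..U+007F.
ascii_chars = [chr(cp) for cp in range(0x80)]

# A character can start an identifier if it is an identifier on its own.
start_chars = {c for c in ascii_chars if c.isidentifier()}

# A character can continue an identifier if "_" followed by it is still one.
continue_chars = {c for c in ascii_chars if ("_" + c).isidentifier()}

print(sorted(start_chars) == sorted(string.ascii_letters + "_"))
print(sorted(continue_chars) == sorted(string.ascii_letters + string.digits + "_"))
print("\0" in continue_chars)  # U+0000 is not an identifier character either way
```

Since U+0000 is rejected under either reading, the choice between U+0001 and U+0000 as the lower bound does not change which identifiers are valid; it only affects how the range is described.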

To Reproduce

Visit the referenced webpage.

Expected behavior

Unicode's first block, with "Range: 0000..007F" (second line of U0000), is known as Basic Latin or C0 Controls and Basic Latin. The official charts index page refers to it as Basic Latin (ASCII), emphasizing the historical connection with the older standard.

Therefore, if it is deemed necessary to clarify the ASCII range at all, it should be U+0000..U+007F, not U+0001..U+007F.

URL to the issue

https://docs.python.org/3/reference/lexical_analysis.html#identifiers

Browsers

Chrome

Operating System

macOS

Activity

merwok commented on Apr 23, 2025

StanFromIreland (Member) commented on Jun 25, 2025

I see arguments for both, since the null character would not be valid, and that is probably what the note in the PEP implied. Changing the docs should be fine, but this issue could simply fall under the existing meta issue: #135676. cc @encukou

encukou (Member) commented on Jun 25, 2025

NUL is not a valid source character, so the ASCII range for Python source code is indeed U+0001..U+007F.

I have a WIP branch that'll define “source character” for #135676, so I'll assign this to myself.

encukou self-assigned this on Jun 25, 2025

terryjreedy (Member) commented on Jun 25, 2025

>>> ord('\0')
0
>>> '\0'.isascii()
True 
>>> # But ...
>>> print(0, '\0', 0)
0 0  # In IDLE, that is \0 NUL between the 0s, but Firefox will not paste it.
>>> for c in '0 0': pass  # 3 chars copied from line above (but NUL not pasted).
SyntaxError: source code string cannot contain null bytes
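
The same behavior can be reproduced without depending on what the clipboard preserves, by passing source text to `compile()` directly. This is a minimal sketch: the helper name `compiles` is made up for illustration, and it catches both `SyntaxError` and `ValueError` on the assumption that the exception type for NUL in source has differed between CPython versions.

```python
def compiles(source: str) -> bool:
    """Return True if `source` compiles as a Python module."""
    try:
        compile(source, "<example>", "exec")
    except (SyntaxError, ValueError):
        # Assumption: NUL in source raises one of these, depending on version.
        return False
    return True


# A real U+0000 character sits inside the string literal of the source text.
with_nul = "for c in '0\0 0': pass"

# The source text holds the two characters backslash and zero: an escape
# sequence, not a raw NUL.
escaped = "for c in '0\\0 0': pass"

print(compiles(with_nul))
print(compiles(escaped))
```

The distinction is between a NUL in the source text, which is rejected, and a NUL in a string value produced by an escape sequence, which is fine.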
Labels

docs (Documentation in the Doc dir), type-bug (An unexpected behavior, bug, or error)
