gh-48181: Document codecs.charmap_build #135997

Merged
merged 6 commits into from
Jun 30, 2025

Conversation

StanFromIreland
Member

@StanFromIreland StanFromIreland commented Jun 26, 2025

That is one old issue:-)

I used my docs from when I documented the underlying C API.

Per @malemburg's comment, this does not need more testing, since the C API is already well tested. This raises the question: should we note in the documentation what this function is (i.e. link to the C API it exposes)? I am also not sure about the best place to put it; I am happy to move it.


📚 Documentation preview 📚: https://cpython-previews--135997.org.readthedocs.build/

@malemburg
Member

The details are a bit more involved.

The function returns a dictionary if there are non-BMP chars involved or there's no 1-1 mapping of NUL to \x00. In all other cases, it returns a special trie object of type EncodingMap, which optimizes the lookups.
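A minimal sketch of the two return types described above, assuming a recent CPython 3 release where `codecs.charmap_build()` accepts tables shorter than 256 characters. The table names are hypothetical and chosen only for illustration:

```python
import codecs

# First character is U+0000 and every character is in the BMP,
# so this table is expected to take the compact (trie) path.
latin1_table = "".join(chr(i) for i in range(256))
compact = codecs.charmap_build(latin1_table)

# First character is not U+0000 (no 1-1 mapping of NUL to \x00),
# so a plain dict is expected instead.
no_nul_table = "abc"
as_dict = codecs.charmap_build(no_nul_table)

# A non-BMP character is present, so a dict is expected here too.
astral_table = "\x00\U0001F600"
astral_dict = codecs.charmap_build(astral_table)

print(type(compact).__name__)  # name of the internal mapping type
print(as_dict)                 # code point ordinal -> byte ordinal
print(astral_dict)
```

In the dict case the keys are Unicode code point ordinals and the values are the positions in the input string, i.e. the byte ordinals.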

The details can be found in the PyUnicode_BuildEncodingMap() function in unicodeobject.c. Here's the documentation of that function (the charmap_build() function is a wrapper around this C API): https://docs.python.org/3/c-api/unicode.html#c.PyUnicode_BuildEncodingMap

You may want to copy that description and perhaps also include an example of how it is used (see e.g. the end of encodings/cp1251.py). The usual approach is to create a decoding mapping string (going from bytes ordinal, given by the position in the string, to Unicode code point) and then pass this to charmap_build() to create a corresponding encoding map (going from Unicode code point ordinal to bytes ordinal).
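The usual approach could look roughly like the sketch below, which follows the general shape of the charmap codec modules in `encodings/`. The decoding table is a made-up one (Latin-1 with byte 0x80 reassigned to the euro sign), not a real code page, and the `Codec` class is only an illustration:

```python
import codecs

# Hypothetical decoding table: Latin-1, except that byte 0x80
# decodes to the euro sign (U+20AC). Position in the string is the byte value.
decoding_table = (
    "".join(chr(i) for i in range(0x80))
    + "\u20ac"
    + "".join(chr(i) for i in range(0x81, 0x100))
)

# Build the encoding map (code point -> byte) from the decoding table.
encoding_table = codecs.charmap_build(decoding_table)


class Codec(codecs.Codec):
    def encode(self, input, errors="strict"):
        return codecs.charmap_encode(input, errors, encoding_table)

    def decode(self, input, errors="strict"):
        return codecs.charmap_decode(input, errors, decoding_table)


codec = Codec()
encoded, consumed = codec.encode("5 \u20ac")
decoded, _ = codec.decode(encoded)
print(encoded, consumed, decoded)
```

Only the decoding table is written by hand; the encoding side is derived from it, so the two cannot drift apart.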

@StanFromIreland
Member Author

StanFromIreland commented Jun 30, 2025

Here's the documentation of that function (the charmap_build() function is a wrapper around this C API): https://docs.python.org/3/c-api/unicode.html#c.PyUnicode_BuildEncodingMap

I wrote that doc some time ago, but I simplified it here unnecessarily. EncodingMap should be documented too, no? I personally do not see the need for an example.

@malemburg
Member

Ah, I didn't know.

I'm not sure about EncodingMap, since this is really only used internally. You can't create it directly from Python and it only has a single method .size() which returns the size of the trie (number of mappings). I don't think it's used anywhere.

It may be worth noting that an internal object EncodingMap is returned, which can be used with the codecs.charmap_encode() function. But that function isn't documented either.

@StanFromIreland
Member Author

StanFromIreland commented Jun 30, 2025

I see. I think we should leave it undocumented then; I modified the text.

@malemburg
Member

Just noticed: you have encoding and decoding reversed in the text. The function builds an encoding mapping (Unicode to bytes) and uses a decoding mapping string as input (bytes ordinals via the position in the string to Unicode).
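To make the two directions concrete, here is a small sketch. The table is hypothetical (ASCII for bytes 0x00-0x7F, then bytes 0x80-0xFF mapped onto U+0400-U+047F) and does not correspond to any real code page:

```python
import codecs

# Hypothetical decoding table: index = byte ordinal, value = Unicode character.
decoding_table = "".join(chr(i) for i in range(0x80)) + "".join(
    chr(0x0400 + i) for i in range(0x80)
)

# charmap_build() takes the decoding table and returns the encoding mapping.
encoding_table = codecs.charmap_build(decoding_table)

# Decoding direction: byte ordinal (position in the string) -> character.
decoded_char = decoding_table[0x90]

# Encoding direction: character -> byte.
encoded, _ = codecs.charmap_encode(decoded_char, "strict", encoding_table)

# Every position in the decoding table should encode back to its own index.
round_trips = all(
    codecs.charmap_encode(ch, "strict", encoding_table)[0] == bytes([i])
    for i, ch in enumerate(decoding_table)
)
print(repr(decoded_char), encoded, round_trips)
```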

Member

@malemburg malemburg left a comment

lgtm now. Thanks, @StanFromIreland

@malemburg
Member

Do you want me to merge it, or will you do this?

@StanFromIreland
Member Author

Thank you, I am only a triager;-) Hopefully one day...

@malemburg malemburg added needs backport to 3.13 bugs and security fixes needs backport to 3.14 bugs and security fixes labels Jun 30, 2025
@malemburg malemburg merged commit 2bdd503 into python:main Jun 30, 2025
33 checks passed
@github-project-automation github-project-automation bot moved this from Todo to Done in Docs PRs Jun 30, 2025
@miss-islington-app

Thanks @StanFromIreland for the PR, and @malemburg for merging it 🌮🎉.. I'm working now to backport this PR to: 3.13, 3.14.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Jun 30, 2025
(cherry picked from commit 2bdd503)

Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com>
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Jun 30, 2025
(cherry picked from commit 2bdd503)

Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com>
@bedevere-app

bedevere-app bot commented Jun 30, 2025

GH-136123 is a backport of this pull request to the 3.14 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.14 bugs and security fixes label Jun 30, 2025
@bedevere-app

bedevere-app bot commented Jun 30, 2025

GH-136124 is a backport of this pull request to the 3.13 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.13 bugs and security fixes label Jun 30, 2025
@StanFromIreland StanFromIreland deleted the doc-charmap-build branch June 30, 2025 13:46
malemburg pushed a commit that referenced this pull request Jun 30, 2025
gh-48181: Document `codecs.charmap_build` (GH-135997)
(cherry picked from commit 2bdd503)

Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com>
malemburg pushed a commit that referenced this pull request Jun 30, 2025
gh-48181: Document `codecs.charmap_build` (GH-135997)
(cherry picked from commit 2bdd503)

Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com>
@malemburg
Member

Thanks, @StanFromIreland, for your work on this.

Labels
docs Documentation in the Doc dir skip news
Projects
Status: Done
2 participants