gh-48181: Document codecs.charmap_build
#135997
The details are a bit more involved. The function returns a dictionary if there are non-BMP chars involved or there's no 1-1 mapping of NUL to \x00. In all other cases, a special trie object of type EncodingMap is returned, which optimizes the lookups. The details can be found in the PyUnicode_BuildEncodingMap() function in unicodeobject.c. Here's the documentation of that function (the charmap_build() function is a wrapper around this C API): https://docs.python.org/3/c-api/unicode.html#c.PyUnicode_BuildEncodingMap You may want to copy that description and perhaps also include an example of how it is used (see e.g. the end of encodings/cp1251.py). The usual approach is to create a decoding mapping string (mapping each byte ordinal, via its position in the string, to a Unicode character) and then pass it to charmap_build() to create the corresponding encoding map (mapping Unicode code point ordinals to byte ordinals).
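A minimal sketch of the two return types, assuming CPython (where the trie type reports its name as EncodingMap). The 256-entry tables are hypothetical and do not correspond to any real codec: bytes 0x00-0x7F decode to ASCII and bytes 0x80-0xFF decode to U+0400-U+047F.

```python
import codecs

# Hypothetical decoding table: the character at index i is what byte i
# decodes to. 0x00-0x7F -> ASCII, 0x80-0xFF -> U+0400..U+047F (all BMP).
bmp_table = "".join(chr(i) for i in range(128)) + "".join(
    chr(0x0400 + i) for i in range(128)
)

# Index 0 is NUL and every character is in the BMP, so the optimized
# trie object should be returned.
bmp_map = codecs.charmap_build(bmp_table)
print(type(bmp_map).__name__)

# Placing a non-BMP character (U+1F600) at byte 0x80 forces the dict fallback.
astral_table = bmp_table[:0x80] + "\U0001F600" + bmp_table[0x81:]
astral_map = codecs.charmap_build(astral_table)
print(type(astral_map).__name__)

# The dict maps Unicode code point ordinals to byte values.
print(astral_map[0x1F600])
```

Either result can be passed as the mapping argument of codecs.charmap_encode(); the dict is simply the slower, general-purpose fallback.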
I wrote that doc some time ago, but I simplified it here unnecessarily. EncodingMap should be documented too, no? I do not personally see the need for an example.
Ah, I didn't know. I'm not sure about EncodingMap, since it is really only used internally. You can't create it directly from Python, and it only has a single method, .size(), which returns the size of the trie (number of mappings). I don't think it's used anywhere. It may be worth noting that an internal EncodingMap object is returned, which can be used with the codecs.charmap_encode() function. But that function isn't documented either.
I see. I think we should leave it undocumented then; I have modified the text.
Just noticed: you have encoding and decoding reversed in the text. The function builds an encoding mapping (Unicode to bytes) and uses a decoding mapping string as input (bytes ordinals via the position in the string to Unicode).
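To illustrate the direction, here is a small round trip in the style of the end of encodings/cp1251.py, assuming the same hypothetical table as a stand-in for a real codec (bytes 0x00-0x7F decode to ASCII, 0x80-0xFF to U+0400-U+047F). The decoding table is the input to charmap_build() and to codecs.charmap_decode(); the map built from it is what codecs.charmap_encode() consumes. Both charmap_* helpers are the undocumented functions mentioned in the thread.

```python
import codecs

# Hypothetical decoding table (bytes -> str): index = byte value,
# character = decoded result.
decoding_table = "".join(chr(i) for i in range(128)) + "".join(
    chr(0x0400 + i) for i in range(128)
)

# charmap_build() inverts it into an encoding map (str -> bytes).
encoding_table = codecs.charmap_build(decoding_table)

# Encoding: Unicode code points -> byte values, via the encoding map.
# U+0410 sits at index 0x80 + 0x10 = 0x90 in the decoding table.
data, consumed = codecs.charmap_encode("\u0410bc", "strict", encoding_table)
print(data, consumed)  # b'\x90bc' 3

# Decoding: byte values -> Unicode, via the original decoding table.
text, consumed = codecs.charmap_decode(data, "strict", decoding_table)
print(text == "\u0410bc", consumed)  # True 3
```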
LGTM now. Thanks, @StanFromIreland.
Do you want me to merge it, or will you do it?
Thank you! I am only a triager ;-) Hopefully one day...
Thanks @StanFromIreland for the PR, and @malemburg for merging it 🌮🎉. I'm working now to backport this PR to: 3.13, 3.14.
(cherry picked from commit 2bdd503) Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com>
GH-136123 is a backport of this pull request to the 3.14 branch.
GH-136124 is a backport of this pull request to the 3.13 branch.
Thanks, @StanFromIreland, for your work on this.
That is one old issue :-)
I used my docs from when I documented the underlying C API.
Per @malemburg's comment, this does not need more testing, since the C API is already well tested. This raises the question: should we note what this function is in the documentation (i.e. link to the C API function it exposes)? I am also not sure about the best place to put it; I am happy to move it.
📚 Documentation preview 📚: https://cpython-previews--135997.org.readthedocs.build/