encoding unicode values - best practice? #65
Comments
If I make it an e- field instead, we still get
becomes
Is having |
I guess my expectation is that the HTML property would preserve the original markup, i.e. continue to include the entity.
I'd vote not to force the result to ASCII anymore, because every system we expect to use mf2py supports UTF-8, and we don't want to subject our Russian friends to
"content": [{
"html": "\n<p>\u0430 \u043f\u0440\u044f\u043c\u043e\u0433\u043e \u0438\u0437 \u041c\u0421\u041a \u0432 \u0442\u043e\u0447\u043a\u0443 \u043d\u0430\u0437\u043d\u0430\u0447\u0435\u043d\u0438\u044f \u043d\u0435 \u0431\u044b\u043b\u043e?</p>\n",
"value": "\n\u0430 \u043f\u0440\u044f\u043c\u043e\u0433\u043e \u0438\u0437 \u041c\u0421\u041a \u0432 \u0442\u043e\u0447\u043a\u0443 \u043d\u0430\u0437\u043d\u0430\u0447\u0435\u043d\u0438\u044f \u043d\u0435 \u0431\u044b\u043b\u043e?\n"
}]
so I propose
<div class="h-entry"><span class="e-name">Entity — emdash</span></div>
should be parsed as
{"rels": {},
"items":
[{"type": ["h-entry"], "properties":
{"name":
[{"html": "Entity — emdash",
"value": "Entity — emdash"}]}}],
"rel-urls": {}} |
Can you clarify that 'Russian' example? They use KOI8-r or utf8, don't they?
|
Heh, yeah UTF-9 was an unfortunate typo. Wish GitHub would wait a tick before sending the email notification... And yep I'm agreeing with you, except I think we should leave html entities as-is in the "html" output (precisely because there are exceptions like |
phpmf2 also encodes the |
We're on Python 3 now, and mf2py now returns (Also, afaik JSON technically is an ASCII-only format, which is why both mf2py's |
JSON is utf8,
https://tools.ietf.org/id/draft-ietf-json-rfc4627bis-09.html#rfc.section.8.1
but unicode characters may be escaped:
https://tools.ietf.org/id/draft-ietf-json-rfc4627bis-09.html#rfc.section.7
If you're using emoji or other non-basic-plane characters, the escaping is nasty.
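To illustrate the point about non-basic-plane characters, here is a minimal sketch using only Python's standard `json` module (not mf2py itself). The emoji and variable names are chosen for illustration. With the default `ensure_ascii=True`, a character outside the Basic Multilingual Plane is written as a UTF-16 surrogate pair, i.e. two `\uXXXX` escapes for one character.

```python
import json

# U+1F600 GRINNING FACE lies outside the Basic Multilingual Plane.
emoji = "\U0001F600"

# Default behaviour: ASCII-only output, so the character becomes a surrogate pair.
escaped = json.dumps(emoji)

# With ensure_ascii=False the character is written literally.
literal = json.dumps(emoji, ensure_ascii=False)

print(escaped)                       # "\ud83d\ude00"
print(len(escaped))                  # 14 characters, including the quotes
print(len(literal.encode("utf-8")))  # 6 bytes: 4 for the emoji, 2 for the quotes

# Both forms decode back to the same string.
assert json.loads(escaped) == json.loads(literal) == emoji
```

Either form is valid JSON and round-trips to the same value; the difference is only in readability and size of the serialized text.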
On Fri, 14 Jan 2022, 2:43 pm Ryan Barrett wrote:
Closed #65.
|
If I start with
I get
with the emdash as a unicode entity.
If we passed ensure_ascii=False to json.dumps() we'd get
Would that be more normal JSON? What is good practice here?
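For comparison, the two behaviours can be sketched with the standard library alone. This is not mf2py's actual output; the `parsed` dictionary below is a hypothetical stand-in for a parse result containing an em dash (U+2014).

```python
import json

# Hypothetical stand-in for a parsed property value containing an em dash.
parsed = {"name": ["Entity \u2014 emdash"]}

# Default: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(parsed)

# ensure_ascii=False: non-ASCII characters are written as-is.
literal = json.dumps(parsed, ensure_ascii=False)

print(escaped)  # {"name": ["Entity \u2014 emdash"]}
print(literal)  # {"name": ["Entity — emdash"]}

# Both serializations decode to the same data.
assert json.loads(escaped) == json.loads(literal) == parsed
```

Both strings are valid JSON for the same data, so the choice mainly affects how readable the serialized output is and whether the consumer must handle non-ASCII bytes (for example, by writing the file as UTF-8).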