-
Notifications
You must be signed in to change notification settings - Fork 518
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create NON_PRINTABLE regex at runtime #301
base: main
Are you sure you want to change the base?
Conversation
I submitted a similar PR last year, where I replaced the regex with proper unicode character class queries: #124 |
That would work for me too. I'm still concerned with the behavior of the C code, but that removes the problematic regex. |
AFAICT the C module relies on the libyaml implementation, which does not use regular expressions but a simple range check |
@nitzmahone Any chance we can get this merged? Or #124 for that matter. It impacts both the resource usage and load time. |
We're planning on poking at stuff for the next release pretty soon- I need to go back and untangle why the current split impl was done the way it was before deciding what makes the most sense here (vs #124). Ideally we shouldn't need a regex at all, because clearly there's a significant expense to it, but I'd need to measure to see which is worse. |
@nitzmahone Any chance this can be merged? The regex changed so it's conflicting now :( |
@nitzmahone Are there any blockers to merge this PR (once the merge conflict is solved)? We would benefit from it, having to manually patch |
f1e261f
to
f84086e
Compare
f84086e
to
5080507
Compare
This moves Reader.NON_PRINTABLE to a property, instead of a class variable.
We're testing the 5.1 release on Python 2.7 with UCS4 support, and we saw an
increase of 30M due to this single regular expression. As we're using the C
extension, this code path is not even triggered, so making it evaluated at
runtime works around the issue.
I'm curious how the C extension does the job and if there isn't a difference in
behavior.
This isn't an issue with Python3.