-
-
Notifications
You must be signed in to change notification settings - Fork 2.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add efficient primitives for str.strip(). #18742
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! Looks good overall, just a few things I noticed. Can you run some microbenchmarks to make sure the performance impact is as expected?
mypyc/test-data/run-strings.test
Outdated
@@ -774,3 +774,16 @@ def test_surrogate() -> None: | |||
assert ord(f()) == 0xd800 | |||
assert ord("\udfff") == 0xdfff | |||
assert repr("foobar\x00\xab\ud912\U00012345") == r"'foobar\x00«\ud912𒍅'" | |||
|
|||
[case testStrip] | |||
# This is a negative test. strip variants without args does not use efficient primitives. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this comment out of date / incorrect?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was out of date. Removed it.
s = "xxb2yy" | ||
assert s.lstrip("xy") == "b2yy" | ||
assert s.strip("xy") == "b2" | ||
assert s.rstrip("xy") == "xxb2" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you test all string kinds and different character code ranges, such as these (and mixing these):
- Character codes between 128 and 255 (0x80 to 0xff)
- Character codes between 256 and 65535 (0x100 to 0xffff)
- Character codes 65536+ (0x10000+)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
mypyc/lib-rt/str_ops.c
Outdated
@@ -5,6 +5,58 @@ | |||
#include <Python.h> | |||
#include "CPy.h" | |||
|
|||
// Copied from cpython.git:Objects/unicodeobject.c. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please mention the Python version or commit date from where this is from.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
BLOOM_MASK sepmask; | ||
Py_ssize_t seplen; | ||
|
||
kind = PyUnicode_KIND(self); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need to call PyUnicode_READY
on self
and sepobj
(and check the return values) on 3.9 at least, before you can call PyUnicode_KIND
etc. You can check the relevant function in Python 3.9 to see how it's used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
// Copied from do_strip function in cpython.git/Objects/unicodeobject.c. | ||
PyObject *_CPyStr_Strip(PyObject *self, int strip_type, PyObject *sep) { | ||
if (sep == NULL || sep == Py_None) { | ||
Py_ssize_t len, i, j; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to above, I think you'll need to call PyUnicode_READY
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
* Fixing code comments. * Adding tests with more unicode chars. * Adding commit ID for code copied from cpython.git.
I have pushed a new commit that addresses review comments.
|
Fixes mypyc/mypyc#1090.
Copying cpython implementation for strip, lstrip and rstrip to
str_ops.c
.