-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for pluralization on messages #12
Add support for pluralization on messages #12
Conversation
…lations using ngettext.
…g as message ids.
Creating the C objects is about 10 times faster (measurements taken on Python 3.6.6 with IPython 6.2.1): In [1]: from zope.i18nmessageid.message import Message, pyMessage
In [2]: Message
Out[2]: zope.i18nmessageid.message.Message
In [3]: pyMessage
Out[3]: zope.i18nmessageid.message.Message
In [4]: pyMessage is Message
Out[4]: False
In [5]: %timeit Message(u'a str', 'a domain', None, None)
331 ns ± 3.73 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [6]: %timeit pyMessage(u'a str', 'a domain', None, None)
3.24 µs ± 46.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) Since many message ids are created at import time, that has an impact on startup speed. C MessageIDs are also ~60% faster to pickle (that might matter for some ZODB or network applications?): In [13]: %timeit msg.__reduce__()
348 ns ± 2.83 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [14]: %timeit pymsg.__reduce__()
564 ns ± 3.78 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) They report the same size and take about the same amount of time to access attribute, though: In [7]: sys.getsizeof(Message(u'a str', 'a domain', None, None))
Out[7]: 110
In [8]: sys.getsizeof(pyMessage(u'a str', 'a domain', None, None))
Out[8]: 110
In [9]: msg = Message(u'a str', 'a domain', None, None)
In [10]: pymsg = pyMessage(u'a str', 'a domain', None, None)
In [11]: %timeit msg.domain
41.4 ns ± 0.6 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
In [12]: %timeit pymsg.domain
38.7 ns ± 0.24 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each) |
I see your point regarding speed when creating the objects, but compared to the rest of the code loaded and all the objects created in a Zope stack, I am not convinced that's a huge difference. However I believe that it is a bigger burden to maintain, where we can easily have bugs and memory issues. I do not think the pickling time is really relevant here, since translations are generally done inside the software and business logic, and are not data that stored in the database. If we all decide to keep the C extension, I can invest time in order to extend the current one with the new attributes, but I am still not convinced that this really needed. |
The other feature provided by the C version is that message IDs become truly immutable, which was important for @jimfulton for security reasons (memory is fading, but "message IDs should be rocks" still echoes somehow). |
Mostly: #1 😉
That sounds reasonable. There's no need to wrap them in security proxies if they're truly immutable. In fact, zope.security specifically declines to proxy them with the comment that "they're immutable, so it's ok" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically LGTM, with some (mostly) style comments.
The 0
-> None
degradation on copy should be fixed and I think deserves a separate unit test.
The question on whether number
can be a long
on Python 2 is, I think, somewhat important. (Perhaps we can use six.int_types
?) I admit I was surprised to see float
s also accepted; ngettext()
does not support floats in C and I look forward to discovering how this is going to be handled.
I'm going to heroically resist raising the question about passing decimal.Decimal() or complex numbers to messages. 😜
The security implications of true immutability are, I think, not relevant to this PR: if it's bad to have possibly mutable pure-Python message IDs, then we're already in that situation if the user didn't happen to have a C compiler available when they installed zope.i18nmessageid.
I will refrain from speculating on performance questions.
""") | ||
sys.stderr.write(str(e) + '\n') | ||
sys.stderr.write('*' * 80 + '\n') | ||
return open(path.join(path.dirname(__file__), *rnames)).read() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What a perfect opportunity to use a with open(...) as f: return f.read()
here!
@@ -123,7 +53,6 @@ def _unavailable(self, e): | |||
'Programming Language :: Python :: 3.5', | |||
'Programming Language :: Python :: 3.6', | |||
'Programming Language :: Python :: 3.7', | |||
"Programming Language :: Python :: Implementation :: CPython", | |||
"Programming Language :: Python :: Implementation :: PyPy", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes me think this package is PyPy-only and doesn't support CPython anymore.
My gut feeling is that we should either list both CPython and PyPy or neither, and that it's better to list both when we have explicit CI tests on both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You do not need to have a C extension to work on CPython (which is the regular one).
__slots__ = ('domain', 'default', 'mapping', '_readonly') | ||
__slots__ = ( | ||
'domain', 'default', 'mapping', '_readonly', | ||
'msgid_plural', 'default_plural', 'number' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like a trailing comma when the closing bracket is on the next line.
default_plural = ( | ||
ustr.default_plural and ustr.default_plural[:] | ||
or default_plural) | ||
number = ustr.number is not None and ustr.number or number |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can ustr.number
be 0
? Because this and-or trick will treat it the same as None
and can in fact be shortened to ustr.number or number
.
Maybe it's time to use if-else expressions instead of and-or tricks?
(Also I'm somewhat surprised that _(_(..., default='foo'), default='bar').default
is 'foo' and not 'bar', but maybe if I took the time to read the doctests I'd see justification for it.)
|
||
if number is not None and not isinstance(number, (int, float)): | ||
# Number must be an integer | ||
raise TypeError('`number` should be an integer or a float.') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On Python 2 it can also be long
, can't it?
return unicode(self), self.domain, self.default, self.mapping | ||
return ( | ||
unicode(self), self.domain, self.default, self.mapping, | ||
self.msgid_plural, self.default_plural, self.number) | ||
|
||
def __reduce__(self): | ||
return self.__class__, self.__getstate__() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks backwards-compatible with old pickles, good!
Oh dear: https://bugs.launchpad.net/zope3/+bug/220122/comments/5
Somebody file a bug about that please while I run off into the woods. |
self.domain = domain | ||
self.default = default | ||
self.mapping = mapping | ||
self.msgid_plural = msgid_plural | ||
self.default_plural = default_plural or msgid_plural |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the or msgid_plural
logic here? It's not done for self.default
, and it's not done in zopefoundation/zope.i18n#34 (review), where you can do translate("msgid", msgid_plural="msgids")
without providing defaults and it'll pass a default_plural of None to the underlying catalog translation machinery.
Regarding as security argument, I believe that to be a weak argument nowadays, because:
But nonetheless I will update the C extension with the attributes. I should probably make a new branch in order not to waste the history on the files that have been removed here. |
I made a new pull request with changes in the C extension: #13 |
This is the first part of two pull requests, the other one is zopefoundation/zope.i18n#34, in order to add proper support for plurization. This add some attributes to the Message in order to support it.
We did not see the advantage to the C extension, which seemed just to act as a tuple, so instead of adding attributes there we removed it. I hope this will not cause any problem.