Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for pluralization on messages #12

Closed
wants to merge 7 commits into from
Closed

Add support for pluralization on messages #12

wants to merge 7 commits into from

Conversation

thefunny42
Copy link
Member

@thefunny42 thefunny42 commented Oct 17, 2018

This is the first part of two pull requests, the other one is zopefoundation/zope.i18n#34, in order to add proper support for plurization. This add some attributes to the Message in order to support it.

We did not see the advantage to the C extension, which seemed just to act as a tuple, so instead of adding attributes there we removed it. I hope this will not cause any problem.

@jamadden
Copy link
Member

We did not see the advantage to the C extension,

Creating the C objects is about 10 times faster (measurements taken on Python 3.6.6 with IPython 6.2.1):

In [1]: from zope.i18nmessageid.message import Message, pyMessage

In [2]: Message
Out[2]: zope.i18nmessageid.message.Message

In [3]: pyMessage
Out[3]: zope.i18nmessageid.message.Message

In [4]: pyMessage is Message
Out[4]: False

In [5]: %timeit Message(u'a str', 'a domain', None, None)
331 ns ± 3.73 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [6]: %timeit pyMessage(u'a str', 'a domain', None, None)
3.24 µs ± 46.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Since many message ids are created at import time, that has an impact on startup speed.

C MessageIDs are also ~60% faster to pickle (that might matter for some ZODB or network applications?):

In [13]: %timeit msg.__reduce__()
348 ns ± 2.83 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [14]: %timeit pymsg.__reduce__()
564 ns ± 3.78 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

They report the same size and take about the same amount of time to access attribute, though:

In [7]: sys.getsizeof(Message(u'a str', 'a domain', None, None))
Out[7]: 110

In [8]: sys.getsizeof(pyMessage(u'a str', 'a domain', None, None))
Out[8]: 110

In [9]: msg = Message(u'a str', 'a domain', None, None)

In [10]: pymsg = pyMessage(u'a str', 'a domain', None, None)

In [11]: %timeit msg.domain
41.4 ns ± 0.6 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [12]: %timeit pymsg.domain
38.7 ns ± 0.24 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

@thefunny42
Copy link
Member Author

I see your point regarding speed when creating the objects, but compared to the rest of the code loaded and all the objects created in a Zope stack, I am not convinced that's a huge difference. However I believe that it is a bigger burden to maintain, where we can easily have bugs and memory issues.

I do not think the pickling time is really relevant here, since translations are generally done inside the software and business logic, and are not data that stored in the database.

If we all decide to keep the C extension, I can invest time in order to extend the current one with the new attributes, but I am still not convinced that this really needed.

@tseaver
Copy link
Member

tseaver commented Oct 17, 2018

The other feature provided by the C version is that message IDs become truly immutable, which was important for @jimfulton for security reasons (memory is fading, but "message IDs should be rocks" still echoes somehow).

@jamadden
Copy link
Member

The other feature provided by the C version is that message IDs become truly immutable

Mostly: #1 😉

which was important for jimfulton for security reasons

That sounds reasonable. There's no need to wrap them in security proxies if they're truly immutable. In fact, zope.security specifically declines to proxy them with the comment that "they're immutable, so it's ok"

Copy link
Member

@mgedmin mgedmin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically LGTM, with some (mostly) style comments.

The 0 -> None degradation on copy should be fixed and I think deserves a separate unit test.

The question on whether number can be a long on Python 2 is, I think, somewhat important. (Perhaps we can use six.int_types?) I admit I was surprised to see floats also accepted; ngettext() does not support floats in C and I look forward to discovering how this is going to be handled.

I'm going to heroically resist raising the question about passing decimal.Decimal() or complex numbers to messages. 😜

The security implications of true immutability are, I think, not relevant to this PR: if it's bad to have possibly mutable pure-Python message IDs, then we're already in that situation if the user didn't happen to have a C compiler available when they installed zope.i18nmessageid.

I will refrain from speculating on performance questions.

""")
sys.stderr.write(str(e) + '\n')
sys.stderr.write('*' * 80 + '\n')
return open(path.join(path.dirname(__file__), *rnames)).read()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What a perfect opportunity to use a with open(...) as f: return f.read() here!

@@ -123,7 +53,6 @@ def _unavailable(self, e):
'Programming Language :: Python :: 3.5',
'Programming Language :: Python :: 3.6',
'Programming Language :: Python :: 3.7',
"Programming Language :: Python :: Implementation :: CPython",
"Programming Language :: Python :: Implementation :: PyPy",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes me think this package is PyPy-only and doesn't support CPython anymore.

My gut feeling is that we should either list both CPython and PyPy or neither, and that it's better to list both when we have explicit CI tests on both.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You do not need to have a C extension to work on CPython (which is the regular one).

__slots__ = ('domain', 'default', 'mapping', '_readonly')
__slots__ = (
'domain', 'default', 'mapping', '_readonly',
'msgid_plural', 'default_plural', 'number'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like a trailing comma when the closing bracket is on the next line.

default_plural = (
ustr.default_plural and ustr.default_plural[:]
or default_plural)
number = ustr.number is not None and ustr.number or number
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can ustr.number be 0? Because this and-or trick will treat it the same as None and can in fact be shortened to ustr.number or number.

Maybe it's time to use if-else expressions instead of and-or tricks?

(Also I'm somewhat surprised that _(_(..., default='foo'), default='bar').default is 'foo' and not 'bar', but maybe if I took the time to read the doctests I'd see justification for it.)


if number is not None and not isinstance(number, (int, float)):
# Number must be an integer
raise TypeError('`number` should be an integer or a float.')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Python 2 it can also be long, can't it?

return unicode(self), self.domain, self.default, self.mapping
return (
unicode(self), self.domain, self.default, self.mapping,
self.msgid_plural, self.default_plural, self.number)

def __reduce__(self):
return self.__class__, self.__getstate__()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks backwards-compatible with old pickles, good!

@mgedmin
Copy link
Member

mgedmin commented Oct 17, 2018

Oh dear: https://bugs.launchpad.net/zope3/+bug/220122/comments/5

Because reimplementation is in fact undesired. We really want people
to use the immutable (and therefore secure) C implementation.

Somebody file a bug about that please while I run off into the woods.

self.domain = domain
self.default = default
self.mapping = mapping
self.msgid_plural = msgid_plural
self.default_plural = default_plural or msgid_plural
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the or msgid_plural logic here? It's not done for self.default, and it's not done in zopefoundation/zope.i18n#34 (review), where you can do translate("msgid", msgid_plural="msgids") without providing defaults and it'll pass a default_plural of None to the underlying catalog translation machinery.

@thefunny42
Copy link
Member Author

Regarding as security argument, I believe that to be a weak argument nowadays, because:

  • indeed not everybody can use the C extension,
  • Grok does not use the security proxies, and only do view-based permission checking,
  • Zope 2 has a different policy and the messages id are probably not usable in a ZODB-script,
  • I did not hear anyone working on a regular Zope3/Bluebream in more than 6 years.

But nonetheless I will update the C extension with the attributes. I should probably make a new branch in order not to waste the history on the files that have been removed here.

@thefunny42
Copy link
Member Author

I made a new pull request with changes in the C extension: #13

@thefunny42 thefunny42 closed this Oct 18, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants