Add support for pluralization on messages #12

thefunny42 · 2018-10-17T12:01:42Z

This is the first part of two pull requests, the other one is zopefoundation/zope.i18n#34, in order to add proper support for plurization. This add some attributes to the Message in order to support it.

We did not see the advantage to the C extension, which seemed just to act as a tuple, so instead of adding attributes there we removed it. I hope this will not cause any problem.

…lations using ngettext.

…g as message ids.

jamadden · 2018-10-17T13:27:01Z

We did not see the advantage to the C extension,

Creating the C objects is about 10 times faster (measurements taken on Python 3.6.6 with IPython 6.2.1):

In [1]: from zope.i18nmessageid.message import Message, pyMessage

In [2]: Message
Out[2]: zope.i18nmessageid.message.Message

In [3]: pyMessage
Out[3]: zope.i18nmessageid.message.Message

In [4]: pyMessage is Message
Out[4]: False

In [5]: %timeit Message(u'a str', 'a domain', None, None)
331 ns ± 3.73 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [6]: %timeit pyMessage(u'a str', 'a domain', None, None)
3.24 µs ± 46.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Since many message ids are created at import time, that has an impact on startup speed.

C MessageIDs are also ~60% faster to pickle (that might matter for some ZODB or network applications?):

In [13]: %timeit msg.__reduce__()
348 ns ± 2.83 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [14]: %timeit pymsg.__reduce__()
564 ns ± 3.78 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

They report the same size and take about the same amount of time to access attribute, though:

In [7]: sys.getsizeof(Message(u'a str', 'a domain', None, None))
Out[7]: 110

In [8]: sys.getsizeof(pyMessage(u'a str', 'a domain', None, None))
Out[8]: 110

In [9]: msg = Message(u'a str', 'a domain', None, None)

In [10]: pymsg = pyMessage(u'a str', 'a domain', None, None)

In [11]: %timeit msg.domain
41.4 ns ± 0.6 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [12]: %timeit pymsg.domain
38.7 ns ± 0.24 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

thefunny42 · 2018-10-17T14:55:58Z

I see your point regarding speed when creating the objects, but compared to the rest of the code loaded and all the objects created in a Zope stack, I am not convinced that's a huge difference. However I believe that it is a bigger burden to maintain, where we can easily have bugs and memory issues.

I do not think the pickling time is really relevant here, since translations are generally done inside the software and business logic, and are not data that stored in the database.

If we all decide to keep the C extension, I can invest time in order to extend the current one with the new attributes, but I am still not convinced that this really needed.

tseaver · 2018-10-17T14:58:17Z

The other feature provided by the C version is that message IDs become truly immutable, which was important for @jimfulton for security reasons (memory is fading, but "message IDs should be rocks" still echoes somehow).

jamadden · 2018-10-17T15:03:30Z

The other feature provided by the C version is that message IDs become truly immutable

Mostly: #1 😉

which was important for jimfulton for security reasons

That sounds reasonable. There's no need to wrap them in security proxies if they're truly immutable. In fact, zope.security specifically declines to proxy them with the comment that "they're immutable, so it's ok"

mgedmin

Basically LGTM, with some (mostly) style comments.

The 0 -> None degradation on copy should be fixed and I think deserves a separate unit test.

The question on whether number can be a long on Python 2 is, I think, somewhat important. (Perhaps we can use six.int_types?) I admit I was surprised to see floats also accepted; ngettext() does not support floats in C and I look forward to discovering how this is going to be handled.

I'm going to heroically resist raising the question about passing decimal.Decimal() or complex numbers to messages. 😜

The security implications of true immutability are, I think, not relevant to this PR: if it's bad to have possibly mutable pure-Python message IDs, then we're already in that situation if the user didn't happen to have a C compiler available when they installed zope.i18nmessageid.

I will refrain from speculating on performance questions.

mgedmin · 2018-10-17T15:21:42Z

setup.py

-        """)
-        sys.stderr.write(str(e) + '\n')
-        sys.stderr.write('*' * 80 + '\n')
+    return open(path.join(path.dirname(__file__), *rnames)).read()


What a perfect opportunity to use a with open(...) as f: return f.read() here!

mgedmin · 2018-10-17T15:23:05Z

setup.py

@@ -123,7 +53,6 @@ def _unavailable(self, e):
        'Programming Language :: Python :: 3.5',
        'Programming Language :: Python :: 3.6',
        'Programming Language :: Python :: 3.7',
-        "Programming Language :: Python :: Implementation :: CPython",
        "Programming Language :: Python :: Implementation :: PyPy",


This makes me think this package is PyPy-only and doesn't support CPython anymore.

My gut feeling is that we should either list both CPython and PyPy or neither, and that it's better to list both when we have explicit CI tests on both.

You do not need to have a C extension to work on CPython (which is the regular one).

mgedmin · 2018-10-17T15:23:36Z

src/zope/i18nmessageid/message.py

-    __slots__ = ('domain', 'default', 'mapping', '_readonly')
+    __slots__ = (
+        'domain', 'default', 'mapping', '_readonly',
+        'msgid_plural', 'default_plural', 'number'


I like a trailing comma when the closing bracket is on the next line.

mgedmin · 2018-10-17T15:25:11Z

src/zope/i18nmessageid/message.py

+            default_plural = (
+                ustr.default_plural and ustr.default_plural[:]
+                or default_plural)
+            number = ustr.number is not None and ustr.number or number


Can ustr.number be 0? Because this and-or trick will treat it the same as None and can in fact be shortened to ustr.number or number.

Maybe it's time to use if-else expressions instead of and-or tricks?

(Also I'm somewhat surprised that _(_(..., default='foo'), default='bar').default is 'foo' and not 'bar', but maybe if I took the time to read the doctests I'd see justification for it.)

mgedmin · 2018-10-17T15:30:15Z

src/zope/i18nmessageid/message.py

+
+        if number is not None and not isinstance(number, (int, float)):
+            # Number must be an integer
+            raise TypeError('`number` should be an integer or a float.')


On Python 2 it can also be long, can't it?

mgedmin · 2018-10-17T15:31:18Z

src/zope/i18nmessageid/message.py

-        return unicode(self), self.domain, self.default, self.mapping
+        return (
+            unicode(self), self.domain, self.default, self.mapping,
+            self.msgid_plural, self.default_plural, self.number)

    def __reduce__(self):
        return self.__class__, self.__getstate__()


This looks backwards-compatible with old pickles, good!

mgedmin · 2018-10-17T15:51:38Z

Oh dear: https://bugs.launchpad.net/zope3/+bug/220122/comments/5

Because reimplementation is in fact undesired. We really want people
to use the immutable (and therefore secure) C implementation.

Somebody file a bug about that please while I run off into the woods.

mgedmin · 2018-10-17T16:08:31Z

src/zope/i18nmessageid/message.py

        self.domain = domain
        self.default = default
        self.mapping = mapping
+        self.msgid_plural = msgid_plural
+        self.default_plural = default_plural or msgid_plural


Why the or msgid_plural logic here? It's not done for self.default, and it's not done in zopefoundation/zope.i18n#34 (review), where you can do translate("msgid", msgid_plural="msgids") without providing defaults and it'll pass a default_plural of None to the underlying catalog translation machinery.

thefunny42 · 2018-10-18T06:29:31Z

Regarding as security argument, I believe that to be a weak argument nowadays, because:

indeed not everybody can use the C extension,
Grok does not use the security proxies, and only do view-based permission checking,
Zope 2 has a different policy and the messages id are probably not usable in a ZODB-script,
I did not hear anyone working on a regular Zope3/Bluebream in more than 6 years.

But nonetheless I will update the C extension with the attributes. I should probably make a new branch in order not to waste the history on the files that have been removed here.

thefunny42 · 2018-10-18T08:30:04Z

I made a new pull request with changes in the C extension: #13

trollfot and others added 7 commits September 6, 2018 11:33

removed the C extension. Added plural variables to allow plural trans…

2d2362e

…lations using ngettext.

Removed unused code.

fcbe54e

Fall back to msgid_plural in case of having a source language doublin…

2112e1f

…g as message ids.

add gitattributes for better POs

17db565

Update CHANGES.txt.

786071a

Merge remote-tracking branch 'zope/master' into CB-556-FR-translation

1facbcf

Remove .gitattributes.

1c33425

thefunny42 requested review from mgedmin and icemac October 17, 2018 12:01

thefunny42 mentioned this pull request Oct 17, 2018

Add support for pluralization zopefoundation/zope.i18n#34

Merged

mgedmin requested changes Oct 17, 2018

View reviewed changes

mgedmin reviewed Oct 17, 2018

View reviewed changes

thefunny42 closed this Oct 18, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for pluralization on messages #12

Add support for pluralization on messages #12

thefunny42 commented Oct 17, 2018 •

edited

jamadden commented Oct 17, 2018

thefunny42 commented Oct 17, 2018

tseaver commented Oct 17, 2018

jamadden commented Oct 17, 2018

mgedmin left a comment

mgedmin Oct 17, 2018

mgedmin Oct 17, 2018

thefunny42 Oct 18, 2018

mgedmin Oct 17, 2018

mgedmin Oct 17, 2018

mgedmin Oct 17, 2018

mgedmin Oct 17, 2018

mgedmin commented Oct 17, 2018

mgedmin Oct 17, 2018

thefunny42 commented Oct 18, 2018

thefunny42 commented Oct 18, 2018

Add support for pluralization on messages #12

Add support for pluralization on messages #12

Conversation

thefunny42 commented Oct 17, 2018 • edited

jamadden commented Oct 17, 2018

thefunny42 commented Oct 17, 2018

tseaver commented Oct 17, 2018

jamadden commented Oct 17, 2018

mgedmin left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mgedmin commented Oct 17, 2018

Choose a reason for hiding this comment

thefunny42 commented Oct 18, 2018

thefunny42 commented Oct 18, 2018

thefunny42 commented Oct 17, 2018 •

edited