E2E encryption does not handle unicode properly #1719

dluciv · 2016-07-05T15:46:36Z

!!! VERY PROBABLY SECURITY ISSUE, SEE EXPLANATION BELOW !!!

Problem itself

Vector fails to receive unicode messages. For example, we can send cyrillic Достоевский and then receiving side will show nothing. Receiving side browser console output for this message will be:

Caught /sync error SyntaxError: Unexpected token � in JSON at position 164
    at Object.parse (native)
    at _decryptMessage (https://vector.im/develop/bundle.js:45792:26)
    at mapper (https://vector.im/develop/bundle.js:48012:26)
    at https://vector.im/develop/bundle.js:86429:17
    at Array.map (native)
    at SyncApi._mapSyncEventsFormat (https://vector.im/develop/bundle.js:86425:24)
    at https://vector.im/develop/bundle.js:86166:36
    at Array.forEach (native)
    at SyncApi._processSyncResponse (https://vector.im/develop/bundle.js:86163:16)
    at https://vector.im/develop/bundle.js:86055:19

Technical Analysis

This problem very likely comes from Olm library itself. So I nearly know it is Olm, not Vector bug. But I see no better place to log it.

When trying to test recent olm.js from https://matrix.org/packages/npm/olm/olm-0.1.0.tgz against https://matrix.org/git/olm/tree/javascript/demo/one_to_one_demo.html, unicode is decrypted incorrectly.

When I encrtypt Byron, Olm decrypts it correctly then.
When I encrypt Достоевский, Olm decrypts it as:

Достоевский









�

So having 10 '\n' (exactly '\n') and then '\x03'.

Literally, when we debug line var plaintext = to_session.decrypt(message.type, message.body); in HTML file, we can get "Достоевский\n\n\n\n\n\n\n\n\n\n\u0003" -- just as above.

Not sure this issue alone leads to vector errors in unicode E2E, but is definitely should not be so.

SECURITY

This issue very likely comes from incorrectly considering string length-in-bytes as length-in-codepoints. For ASCII those lengths are similar (if we use UTF-8 as internal Emscripten encoding, for example), so ASCII texts are (en|de)crypted correctly. But when it comes no non-ASCII symbols, e.g. Cyrillic, those lengths are different, and this likely causes such a problem.

Other examples also show different garbage after result string -- not only '\n' and '\x03', but other bytes too. Very probably it sends an amount of Emscripten memory located after the end of the string. This garbage can contain important data (e.g. keys) and it can leak this way. So fix should be applied as close to Emscripten part as possible or maybe even in C++ code itself.

The text was updated successfully, but these errors were encountered:

richvdh · 2016-07-05T22:12:41Z

Thanks for the report. I've repeated the issue and prepared a fix, which we'll land once we've reviewed it.

dluciv · 2016-07-06T04:16:51Z

Thank you very much!
So finally, was it Olm bug? And could it be security issue as I was afraid of?
Can I see the fix too? Just intrigued what was wrong.

ara4n · 2016-07-06T18:07:43Z

~~the fix looks to be pr #11~~

ignore me; i'm confused. the fix will be somewhere in the matrix.org/git/olm repo shortly

richvdh · 2016-07-06T20:57:20Z

@dluciv: it is a bug in the javascript wrappers for olm.

You can see the fix on the rav/fix_decrypt_utf8 git branch. And yes, I suspect under certain circumstances it will expose random crap from the stack. The underlying Olm library goes to some effort to check that sensitive data isn't left lying around in stack memory, so I think you'd struggle to exploit the issue, but certainly we appreciate the report.

richvdh · 2016-07-06T21:36:31Z

The fix is now landed in olm-master, but since vector is still configured to use the broken (release) version of olm, I'm keeing this bug open for now.

dluciv · 2016-07-09T16:00:53Z

Also when releasing please be so kind to update https://matrix.org/packages/npm/olm/olm-0.1.0.tgz to https://matrix.org/packages/npm/olm/olm-0.1.1.tgz (or some other version, not sure about this) and don't forget to update everything dependent on it. I know two places (but I guess there are more) where it is mentioned:

README.md in master branch with Olm instruction
package.json in develop branch with optional dependency in olm-....tgz

Hope to see updated version of Olm at least for develop branch soon! Thank you!

Fixes #1719

richvdh · 2016-07-11T14:06:07Z

I have now released olm 1.0.0, and updated the develop branch of vector to use it; accordingly considering this fixed. Thanks again for the report.

richvdh self-assigned this Jul 5, 2016

richvdh closed this as completed Jul 6, 2016

richvdh reopened this Jul 6, 2016

richvdh added a commit that referenced this issue Jul 11, 2016

Bump to olm 1.0.0

3e53879

Fixes #1719

richvdh closed this as completed Jul 11, 2016

richvdh mentioned this issue Sep 4, 2016

Emoji in megolm msgs choke with '**SyntaxError: Unexpected token in JSON at position 254**' or similar #2078

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

E2E encryption does not handle unicode properly #1719

E2E encryption does not handle unicode properly #1719

dluciv commented Jul 5, 2016 •

edited

Loading

richvdh commented Jul 5, 2016

dluciv commented Jul 6, 2016 •

edited

Loading

ara4n commented Jul 6, 2016 •

edited

Loading

richvdh commented Jul 6, 2016

richvdh commented Jul 6, 2016

dluciv commented Jul 9, 2016 •

edited

Loading

richvdh commented Jul 11, 2016

E2E encryption does not handle unicode properly #1719

E2E encryption does not handle unicode properly #1719

Comments

dluciv commented Jul 5, 2016 • edited Loading

Problem itself

Technical Analysis

SECURITY

richvdh commented Jul 5, 2016

dluciv commented Jul 6, 2016 • edited Loading

ara4n commented Jul 6, 2016 • edited Loading

richvdh commented Jul 6, 2016

richvdh commented Jul 6, 2016

dluciv commented Jul 9, 2016 • edited Loading

richvdh commented Jul 11, 2016

dluciv commented Jul 5, 2016 •

edited

Loading

dluciv commented Jul 6, 2016 •

edited

Loading

ara4n commented Jul 6, 2016 •

edited

Loading

dluciv commented Jul 9, 2016 •

edited

Loading