E2E encryption does not handle unicode properly #1719
Comments
Thanks for the report. I've repeated the issue and prepared a fix, which we'll land once we've reviewed it.
Thank you very much!
ignore me; i'm confused. the fix will be somewhere in the matrix.org/git/olm repo shortly
@dluciv: it is a bug in the javascript wrappers for olm. You can see the fix on the
The fix is now landed in olm-master, but since vector is still configured to use the broken (release) version of olm, I'm keeping this bug open for now.
Also when releasing please be so kind to update
Hope to see updated version of Olm at least for
I have now released olm 1.0.0, and updated the develop branch of vector to use it; accordingly considering this fixed. Thanks again for the report.
!!! VERY PROBABLY SECURITY ISSUE, SEE EXPLANATION BELOW !!!
Problem itself
Vector fails to receive Unicode messages. For example, if we send the Cyrillic text Достоевский, the receiving side shows nothing. The receiving side's browser console output for this message will be:
Technical Analysis
This problem very likely comes from the Olm library itself, so I am fairly sure it is an Olm bug rather than a Vector bug. But I see no better place to log it.
When testing the recent olm.js from https://matrix.org/packages/npm/olm/olm-0.1.0.tgz against https://matrix.org/git/olm/tree/javascript/demo/one_to_one_demo.html, Unicode is decrypted incorrectly.
When I encrypt Byron, Olm decrypts it correctly.
When I encrypt Достоевский, Olm decrypts it as:
So the decrypted string ends with 10 '\n' characters (exactly '\n') followed by '\x03'.
Literally, when we debug the line
var plaintext = to_session.decrypt(message.type, message.body);
in the HTML file, we get "Достоевский\n\n\n\n\n\n\n\n\n\n\u0003", just as above.
I am not sure this issue alone causes the Vector errors with Unicode E2E, but it definitely should not behave this way.
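To make the symptom easier to check, here is a minimal sketch of a round-trip comparison. The helper name trailingGarbage is hypothetical and not part of Olm; it only compares the expected plaintext with the string returned by a decrypt call and reports any extra suffix.

```javascript
// Hypothetical helper (not part of Olm): returns whatever follows the
// expected plaintext in the decrypted string, or null if the decrypted
// string does not start with the expected plaintext at all.
function trailingGarbage(expected, decrypted) {
  if (!decrypted.startsWith(expected)) {
    return null;
  }
  return decrypted.slice(expected.length);
}

// The value observed in the debugger for the Cyrillic test message:
// the plaintext, ten '\n' characters, then '\x03'.
const observed = "Достоевский" + "\n".repeat(10) + "\u0003";
const extra = trailingGarbage("Достоевский", observed);

// JSON.stringify escapes control characters so the suffix is visible.
console.log(extra.length, JSON.stringify(extra));
```

For a clean round trip such as Byron the helper returns an empty string, so a non-empty result flags the problem described here.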
SECURITY
This issue very likely comes from incorrectly treating the string's length in bytes as its length in code points. For ASCII those lengths are the same (if we use UTF-8 as the internal Emscripten encoding, for example), so ASCII texts are encrypted and decrypted correctly. But when it comes to non-ASCII symbols, e.g. Cyrillic, those lengths differ, and this likely causes the problem.
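The two lengths can be compared directly. This is a sketch using only the standard TextEncoder API, not Olm code; note that String.length counts UTF-16 code units, which for both sample strings equals the number of code points.

```javascript
// Compare the JavaScript string length with the UTF-8 byte length.
// String.length counts UTF-16 code units; every character in the two
// sample strings below fits in a single code unit.
function lengths(text) {
  return {
    codeUnits: text.length,
    utf8Bytes: new TextEncoder().encode(text).length,
  };
}

console.log(lengths("Byron"));       // codeUnits: 5,  utf8Bytes: 5
console.log(lengths("Достоевский")); // codeUnits: 11, utf8Bytes: 22
```

The difference for the Cyrillic string is 11, the same as the number of trailing characters in the decrypted output (ten '\n' plus one '\x03'). That is consistent with a length mix-up, though it does not prove it.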
Other examples also show different garbage after the result string: not only '\n' and '\x03', but other bytes too. Very probably it returns a chunk of Emscripten memory located after the end of the string. This garbage can contain important data (e.g. keys), which can leak this way. So the fix should be applied as close to the Emscripten part as possible, or maybe even in the C++ code itself.
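The suspected leak can be illustrated with a small simulation. This is a sketch under stated assumptions, not Olm's or Emscripten's actual code: the heap is a plain Uint8Array, the neighbouring bytes are a made-up placeholder, and readBuggy is just one hypothetical way a byte length could be misused as a character count. readExact shows the intended behaviour of decoding exactly the bytes that were written.

```javascript
const utf8Encoder = new TextEncoder();
const utf8Decoder = new TextDecoder("utf-8");

// Simulated heap: the plaintext bytes followed directly by unrelated data.
// The neighbour string is a made-up stand-in for whatever lives next in memory.
function makeHeap(plaintext, neighbour) {
  const textBytes = utf8Encoder.encode(plaintext);
  const nextBytes = utf8Encoder.encode(neighbour);
  const heap = new Uint8Array(textBytes.length + nextBytes.length);
  heap.set(textBytes, 0);
  heap.set(nextBytes, textBytes.length);
  return { heap, byteLength: textBytes.length };
}

// Intended behaviour: decode exactly byteLength bytes as UTF-8.
function readExact(heap, byteLength) {
  return utf8Decoder.decode(heap.subarray(0, byteLength));
}

// Hypothetical bug: byteLength is treated as a number of characters,
// so the result runs past the end of the plaintext into the neighbour.
function readBuggy(heap, byteLength) {
  return utf8Decoder.decode(heap).slice(0, byteLength);
}

const sample = makeHeap("Достоевский", "NEIGHBOUR!!");
console.log(JSON.stringify(readExact(sample.heap, sample.byteLength)));
console.log(JSON.stringify(readBuggy(sample.heap, sample.byteLength)));
```

With the ASCII input Byron both functions return the same string, which matches the observation that only non-ASCII text is affected.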