"Unable to decrypt: The sender's device has not sent us the keys for this message." (The UISI bug) #2996

Open
richvdh opened this Issue Jan 19, 2017 · 11 comments

@richvdh
Member

richvdh commented Jan 19, 2017

This message (or, less often, the closely related "OLM.UNKNOWN_MESSAGE_INDEX") can be caused by a number of things. This bug serves as a reference to the reasons we know about.

  • You logged into your device, or joined the room, after the message was sent. This is sort-of by design, though #2258, #2286 and #2713 are relevant.
  • The sender has blacklisted your device. This is definitely by design, though see #3831 for improving visibility on this.
  • The keys haven't arrived yet. Patience you must have.

Client-specific bugs:

Protocol/server things:

Unexplained:

@richvdh richvdh added the type:e2e label Jan 19, 2017

@ara4n ara4n changed the title from "Unable to decrypt: The sender's device has not sent us the keys for this message." to "Unable to decrypt: The sender's device has not sent us the keys for this message." (The UISI bug) Mar 19, 2017

Member

ara4n commented Apr 1, 2017

For anyone reading this looking for a workaround, the best advice currently is that if you suddenly find yourself unable to decrypt messages from someone, ask them to open up your contact details on their client. This should force their Riot to resync its copy of your device list, increasing the chance that the next message sent will actually be encrypted for your current device.

If this doesn't work, you have no choice but to create a new room (or, worst case, export your session keys, log out, log in and import your session keys) until the workaround in #3553 is implemented (or this meta-bug is finally closed).

@lampholder lampholder added this to the RW002 milestone Apr 3, 2017

Member

lampholder commented Apr 3, 2017

Making forward progress on this requires either @richvdh to hop back into it, or for him to hand over to somebody else.

@lampholder lampholder modified the milestones: RW003 - candidates, RW002 - candidates Apr 3, 2017

@lampholder lampholder added this to Ready to Start in E2E Usability and Stability Apr 12, 2017

@lampholder lampholder referenced this issue Apr 12, 2017

Open

E2E stability and usability. #63

0 of 4 tasks complete

@lampholder lampholder removed this from the RW003 - candidates milestone Apr 24, 2017

Member

ara4n commented May 4, 2017

I just spent a while reviewing all of the known remaining UISI causes with @richvdh. UISI bugs fall into two broad categories:

Wedged olm sessions:

  • #3309: Olm sessions can get wedged if we throw away OTKs for messages in-flight
  • #2783: We can reduce the risk of #3309 by prioritising established olm sessions over half-open ones
  • #2325: Olm sessions corrupt & wedge if you open the same Riot in multiple tabs
    • Fixable without much crypto-fu by swapping the storage layer for indexeddb. Would also help mitigate #3660 (running out of local storage)
  • #3231: If you reuse node-localstorage and share curve25519 keys between different device IDs, Olm wedges <-- NOT A BUG FOR RIOT<->RIOT.
  • #3822: If users restore a device backup or tab whilst the existing device is still around and active, Olm will wedge.

Missing megolm keys:

  • matrix-org/synapse#2165: Caused if federation is broken when you log in, so remote HSes don't get told about your new device.
    • There is no solution to this; if alice doesn't know bob's new device exists when encrypting messages for him, he'll never be able to decrypt them.
  • #3754: Caused by your HS federation being broken when someone starts a new megolm session to you, but you then receive messages from other HSes before you receive your megolm keys
    • We could query the origin HS (if it's available again) for the keys, to speed up the recovery process.
  • #3187: You ran out of Olm OTKs, so can't start a new Olm session, and thus share Megolm keys (or do anything else)
  • #3796: If Alice adds a new device whilst we're downloading her old device list, we may not spot the new device.
  • #3825: If Alice and Bob only ever use Riot in short-lived incognito windows, they may never successfully exchange megolm keys
  • ...possibly other bugs where the act of loading the target's MemberInfo kicks the sender into refreshing their device list?

Rich estimates that UISIs are split roughly 50/50 between the two categories.

However, all of the 'missing megolm key' class of bugs can be worked around by giving users a way of recovering missing megolm keys - and in some cases (broken federation; matrix-org/synapse#2165) this is the only plausible solution. In turn, if we had a way of recovering missing megolm keys, we'd also have a way to share history to new devices - the infamous 'share history' bug (#2286).

Therefore the suggestion is to focus entirely[1] on solving the problem of sharing megolm keys, since solving both the 'missing megolm key' bugs and the 'sharing history' feature is worth more than solving the individual bitty 'wedged olm session' bugs (which all have different solutions). This means setting aside UISI hell and forging ahead with history sharing (#2286), or at least a subset of it. This could well include improving the UX for sharing history by supporting cross-signed keys (#2714).

[1] We can probably progress the 'multiple tabs' problem (#2325) in parallel. And the plan is to finish the devicelist race #3796 first.
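
As a rough illustration of the 'missing megolm keys' category, here is a toy Python model. It is not Olm or Megolm: there is no real cryptography, and `Device`, `Sender`, `send` and `share_key_on_request` are hypothetical names invented for this sketch. It only shows a sender working from a stale copy of the recipient's device list, so the new device never receives the session key, and how some form of key recovery would let that device read the message afterwards.

```python
from dataclasses import dataclass, field

UISI = ("Unable to decrypt: The sender's device has not sent us "
        "the keys for this message.")


@dataclass
class Device:
    """A recipient device and the session keys it holds (toy model)."""
    device_id: str
    session_keys: set = field(default_factory=set)

    def read(self, message: dict) -> str:
        # Without the session key the device can only show the error.
        if message["session_id"] not in self.session_keys:
            return UISI
        return message["body"]


@dataclass
class Sender:
    """A sender with a possibly stale copy of the recipient's device list."""
    known_devices: list
    session_id: str = "session-1"

    def send(self, body: str) -> dict:
        # The key is shared only with devices the sender knows about.
        for device in self.known_devices:
            device.session_keys.add(self.session_id)
        return {"session_id": self.session_id, "body": body}

    def share_key_on_request(self, device: Device, session_id: str) -> None:
        # Hypothetical recovery path: hand the key over after the fact.
        # A real design would first check that the device is trusted.
        if session_id == self.session_id:
            device.session_keys.add(session_id)
```

In this model a device that was missing from `known_devices` at send time sees the error until `share_key_on_request` is called for it. The same path would also cover handing old keys to a newly added device, which is why key recovery and history sharing are treated as one problem in the comment above.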

Member

richvdh commented May 5, 2017

#3231: If you reuse node-localstorage and share curve25519 keys between different device IDs, Olm wedges <-- NOT A BUG FOR RIOT.

Well, it is a bug for riot, in that if anyone uses the js-sdk in the obvious manner, riot fails to talk e2e with the resultant client. It's arguable whose fault that is - ideally both ends would be fixed. But it's not a bug for riot inasmuch as it doesn't affect riot<->riot comms.

...possibly other bugs where the act of loading the target's MemberInfo kicks the sender into refreshing their device list?

AFAIK the only thing that loading the MemberInfo would solve these days is #3796.

Member

ara4n commented May 6, 2017

Writing #2286 (comment) made me realise that perhaps we can also improve the experience here in general with better error messages. For instance, do we have any way of detecting when an Olm session has got wedged, such that we can complain about that (and perhaps reset it?) rather than just whine about missing megolm keys?

Member

richvdh commented May 6, 2017

Yes, we can certainly improve this. We can give the user feedback about failing to decrypt to_device messages (though they tend to get replayed at initial sync, so we'd have to think about how to avoid false positives). vector-im/riot-android#800, randomly, covers that. If we can make it reliable, we can start a new Olm session to try to unwedge things. We can also consider giving better feedback from the sender's end (#2494).

In general it's hard to tell the cause of any particular UISI, because you can't correlate it to a to_device you couldn't decrypt.
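
One way the "detect a failure, then start a new Olm session" idea could look is sketched below in Python. This is an assumption-laden illustration, not how any client does it: `UnwedgeTracker`, `on_decrypt_failure` and the one-hour default are all hypothetical. Ignoring failures seen during initial sync is just one possible answer to the false-positive problem, and the rate limit is an extra assumption added so that a permanently broken peer cannot trigger endless new sessions.

```python
from dataclasses import dataclass, field


@dataclass
class UnwedgeTracker:
    """Decides when a to_device decryption failure warrants a new session."""
    min_interval_s: float = 3600.0  # assumed rate limit per sender key
    _last_reset: dict = field(default_factory=dict)

    def on_decrypt_failure(self, sender_key: str,
                           during_initial_sync: bool, now: float) -> bool:
        """Return True if a fresh session should be started with sender_key."""
        if during_initial_sync:
            # Replayed to_device messages would otherwise be false positives.
            return False
        last = self._last_reset.get(sender_key)
        if last is not None and now - last < self.min_interval_s:
            # A new session was started recently; do not start another.
            return False
        self._last_reset[sender_key] = now
        return True
```

The caller would pass in the sender's identity key and a timestamp (for example from `time.time()`), and start a new session only when the method returns `True`.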

pik commented May 6, 2017

If users restore a device backup or tab whilst the existing device is still around and active, Olm will wedge.
If you reuse node-localstorage and share curve25519 keys between different device IDs, Olm wedges <-- NOT A BUG FOR RIOT<->RIOT.

While it is possible at the moment, I don't really like the idea of the same keys on multiple devices (this is surely a security risk). Notably, device_keys changing while the device_id remains the same is another way to wedge olm sessions: the client sharing the megolm session key thinks it sent the keys to the associated device, but that device has new keys and cannot decrypt the olm session.

Member

richvdh commented May 8, 2017

@pik: please take your questions about #3231 and #3822 to the relevant bugs.

Another thing we should consider on this bug is a way for the recipient to distinguish "the sender failed to send to you" vs "the sender chose not to send to you" (either they blocked you or your device explicitly, or because they had the 'Never send encrypted messages to unverified devices from this device' setting (matrix-org/matrix-js-sdk#336) checked).

  • basically we should be able to distinguish between "it went a bit wrong" and "expected behaviour".

Of course that would probably mean the sender sending a "you're blocked" notification to the recipient, but that wouldn't be hard.
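
A minimal Python sketch of the "blocked vs failure" distinction follows. Every name in it (`WithheldReason`, `plan_key_sharing`, `describe_undecryptable`, the dictionary keys and the message wording) is hypothetical and chosen for illustration only. It assumes the sender records a reason for each device it deliberately skips and delivers that reason to the device, so the recipient can tell expected behaviour from something going wrong.

```python
from enum import Enum
from typing import Optional


class WithheldReason(Enum):
    BLOCKED = "blocked"        # sender blocked this user or device
    UNVERIFIED = "unverified"  # sender never sends to unverified devices


def plan_key_sharing(devices: list, never_send_to_unverified: bool):
    """Split devices into those that get the key and those that get a notice."""
    share, notices = [], {}
    for device in devices:
        if device["blocked"]:
            notices[device["id"]] = WithheldReason.BLOCKED
        elif never_send_to_unverified and not device["verified"]:
            notices[device["id"]] = WithheldReason.UNVERIFIED
        else:
            share.append(device["id"])
    return share, notices


def describe_undecryptable(notice: Optional[WithheldReason]) -> str:
    """Pick the text shown to the recipient for an undecryptable message."""
    if notice is WithheldReason.BLOCKED:
        return "The sender has blocked this device."
    if notice is WithheldReason.UNVERIFIED:
        return "The sender does not send to unverified devices."
    # No notice arrived: this is the "it went a bit wrong" case.
    return "The sender's device has not sent us the keys for this message."
```

With something like this in place, only the final branch would indicate a genuine failure worth investigating.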

Member

richvdh commented May 8, 2017

Added #3845 for the "blocked vs failure" sidebar

ubmarco commented Apr 24, 2018

The workaround 'opening the contact details' did not work for us. I can read all messages from the mobile of my colleague but not from Linux Riot. We found a workaround: my colleague closed Riot, deleted the config folder ~/.config/Riot and started the app again; now his device showed up as unverified but I could read the messages again; after verification, we get the green lock and everything works as expected.

@lampholder lampholder added this to the RW008 - Candidates milestone Apr 24, 2018

hettipeti commented Apr 24, 2018

I had problems with my phone's key after restoring a TWRP backup.
I couldn't read my phone's messages on the same account with my web client on the desktop. I got exactly this error: "Unable to decrypt: The sender's device has not sent us the keys for this message."
After a bug request I got the following fault and solution:
Fault: "Looks like by restoring your phone from backup its crypto state got completely hosed and out of sync with the server."

Solution that worked: export my room keys, log out and in again, and reimport them on the phone.

The phone gets a new key after logging out and logging in again.

Twitter Thread:
https://twitter.com/Th3PeKo/status/988821611313758208
