Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NSE failed to persist megolm sessions, causing an explosion of UISIs #5426

Open
ara4n opened this issue Jan 24, 2022 · 14 comments · Fixed by #5503
Open

NSE failed to persist megolm sessions, causing an explosion of UISIs #5426

ara4n opened this issue Jan 24, 2022 · 14 comments · Fixed by #5503
Labels
A-E2EE A-Notifications A-SDK O-Occasional Affects or can be seen by some users regularly or most users rarely S-Major Severely degrades major functionality or product features, with no satisfactory workaround T-Defect Something isn't working: bugs, crashes, hangs and other reported problems

Comments

@ara4n
Copy link
Member

ara4n commented Jan 24, 2022

Steps to reproduce

I spent the morning on EW, and then fg'd up EI when going to lunch. However, many recent sessions (including my own) were UISI. Rageshaked on sender (EW) and receiver (EI).

Outcome

No UISIs, rather than loads of UISIs from different senders.

It looks like the NSE received these new sessions (from my EI rageshake):

2022-01-24 12:29:01.187 RiotNSE[11625:5755912]  [MXOlmDevice] decryptMessage: Olm Session id 0r0JFyWrv+gY6tQUK6Z83LjGcQH7BR3KvKDM+wf5Hc4(1) from Nhu0+B3JJNHr3abToRm0rYjXne20De67kH1RaNZBEAM
2022-01-24 12:29:01.189 RiotNSE[11625:5755912]  [MXRealmCryptoStore] inboundGroupSessionWithId: qaJz1TQlUu9bh5zZKJ7vY00CIJ8a3J1ZN97tXklL3/c -> not found
2022-01-24 12:29:01.191 RiotNSE[11625:5755912]  [MXRealmCryptoStore] inboundGroupSessionWithId: qaJz1TQlUu9bh5zZKJ7vY00CIJ8a3J1ZN97tXklL3/c -> not found
2022-01-24 12:29:01.191 RiotNSE[11625:5755912]  [MXOlmDevice] addInboundGroupSession: Add megolm session Nhu0+B3JJNHr3abToRm0rYjXne20De67kH1RaNZBEAM|qaJz1TQlUu9bh5zZKJ7vY00CIJ8a3J1ZN97tXklL3/c (import: NO)
2022-01-24 12:29:01.192 RiotNSE[11625:5755912]  [MXRealmCryptoStore] storeInboundGroupSessions: store 1 keys (1 new) in 1.507ms

but then when the main app tried to decrypt using that inbound session:

2022-01-24 13:14:43.899 Riot[11241:5771620] 🚨 [MXRealmCryptoStore] performSessionOperationWithGroupSessionWithId. Error: megolm session qaJz1TQlUu9bh5zZKJ7vY00CIJ8a3J1ZN97tXklL3/c not found
2022-01-24 13:14:43.899 Riot[11241:5771620]  [MXOlmDevice] decryptGroupMessage: Cannot decrypt in room !GLMaZaGdKXwMtyRevY:matrix.org with session Nhu0+B3JJNHr3abToRm0rYjXne20De67kH1RaNZBEAM|qaJz1TQlUu9bh5zZKJ7vY00CIJ8a3J1ZN97tXklL3/c. Error: Error Domain=org.matrix.sdk.decryption Code=4 "Un
known inbound session id" UserInfo={NSLocalizedDescription=Unknown inbound session id}

So, how did the NSE fail to store the session?

Your phone model

12 pro max

Operating system version

iOS15.1.1

Application version

1.6.13-develop-#20220120114547

Homeserver

matrix.org

Will you send logs?

Yes

@ara4n ara4n added the T-Defect Something isn't working: bugs, crashes, hangs and other reported problems label Jan 24, 2022
@RiotRobot RiotRobot added this to Incoming in Issue triage Jan 24, 2022
@ara4n
Copy link
Member Author

ara4n commented Jan 24, 2022

So I went to look for UISIs in my rageshake:

matthew@shadowfax tmp % cat console\ \(1\).log.gz| grep 'Unknown inbound session id' | perl -ne 'if (/with session (.*?)\. /) { print "$1\n"; }' | sort -n | uniq -c | sort -rn | head -n 20                                                                                                      
  28 Nhu0+B3JJNHr3abToRm0rYjXne20De67kH1RaNZBEAM|qaJz1TQlUu9bh5zZKJ7vY00CIJ8a3J1ZN97tXklL3/c
  24 Nhu0+B3JJNHr3abToRm0rYjXne20De67kH1RaNZBEAM|LujzpqLNsDzm2rKRee/URzXbsgWRPdPbv9XsnL8XHuI
  14 0lpDfPoxFw5CickEP4fWOLUb9Kho+Ab/sVDvlouUdgk|owIs7HdX7PXei0WhdmWQMVGXtZyKhkKUXqXoaSyRg3c
  13 Nhu0+B3JJNHr3abToRm0rYjXne20De67kH1RaNZBEAM|SPb1+XaxT9oRL3hs8ISgFIu4arf6f+UnOBMYH97pphY
  11 n9ergqPOnEh0M2fRn6tvPKnw0gnfpi1lIbhmotrLvi0|WQYMsur6lRZ7D5X7zCgtlQ3pZ5dvpRQTqTm8JLQ4ZTE
   8 Nhu0+B3JJNHr3abToRm0rYjXne20De67kH1RaNZBEAM|/Nb1PKRmbU3zIls+UdEhTtiPLlf/m9ujYSF+ZSdRQ/E
   7 uNZ0N/v6CpwAhyyI0pbuYXwa80tG9p1ZYfoosxKl5C4|+fDJYyAG6YANCWQSNwUP2gtejx/BBn0Osw9ya4dHR8M
   7 jFws4MBKStDjUtKKdseHVtLEru4Zc3IWp18qsxeo2BA|MsKFLA3hOTxDbNncSU+B7L11CLreRVnMwP5e4DmJWFw
   6 yCzeJlVPpPrg6prkD4l4UezGxye3aXJE9Rd40wzNNhg|d1NN9IkkYQ/vAZa+MoUpOmqk/W+pSzvCKkZtwWwvuLo
   6 RhM45V0bRElCMEnRw9lvCNbTAe783go9JV4iFW1SMjc|OL8c/fMRQzKRQ5TBvWQk/NpBFfqxp5K+vjmGAQr3GeI
   6 0lpDfPoxFw5CickEP4fWOLUb9Kho+Ab/sVDvlouUdgk|Y6ChY27TvJZrkBZtfatVdSOWoVYSnDZIYsSQTgdPal0
   5 Ss6eaJGA82Uc9TQn1BbcLqpm6jtjGGwqMBadf/8Y+WY|Xm6C/TLFdxMY/s+HOaSauFn2Gf0guWykfNrEO89Krhc
   4 e5UAtAyPGD4Qc2bUZ9T5/vZ0sxwYyKbAWcUf7YNMtk8|Mv+VlA4OYBFk1IT5f8lt0aLcf9HTa6zeUfjrwuY3Bz0
   4 SVTkyzyUkQSD9d38H415xF8OwRShBlx2vwBU+oKpAms|6J20fpgz0jEExgj0nV1ryDtnxlX4x0LoP1ZQJiZTqbw
   4 PEVQqixq167OdEiMfcCl50kZZnnnDfJLTPnezQH+Ozc|kePI4Kk377uTkDGS4zOXS0U6QKpVNlZfYsu8TtBqnsU
   4 PEVQqixq167OdEiMfcCl50kZZnnnDfJLTPnezQH+Ozc|DqhU4UmK3CyMuLAZ3HEl59wxOL5fpkLFl6WWW1rorn8
   4 Nhu0+B3JJNHr3abToRm0rYjXne20De67kH1RaNZBEAM|tV1ewVkNVgJYqPpm1iCNa3J1u+LRHHOrkfQrr66lfQ0
   3 fHxmopYLiyDbfMsWa1duedG81iriApfMmVZEO21DkEY|gg0t9VjqSq19gk4lgVFLwFqWNJSPhIuPmNjozagQ/D8
   3 I7LzuJFIMpsYnq3ObiH1QPSLPo+pfSV0RbH3mocdryc|WYhC/EcioVGm30XLqeG0Mv77I8YpoxeysGApiyQNZ70
   3 4U2y3T6C5bxMY6LFrPDwwnmYH0l8hs089YPzjwuudzE|V15G50fryO8TREX9JGP5TVpK7PIUjTUwYtrSQi6tGj8

...and then looked to see what the NSE was up to with the most common ones...

matthew@shadowfax tmp % for i in qaJz1TQlUu9bh5zZKJ7vY00CIJ8a3J1ZN97tXklL3/c LujzpqLNsDzm2rKRee/URzXbsgWRPdPbv9XsnL8XHuI owIs7HdX7PXei0WhdmWQMVGXtZyKhkKUXqXoaSyRg3c SPb1+XaxT9oRL3hs8ISgFIu4arf6f+UnOBMYH97pphY WQYMsur6lRZ7D5X7zCgtlQ3pZ5dvpRQTqTm8JLQ4ZTE; do fgrep $i console-nse.log.gz; done
2022-01-24 12:29:01.189 RiotNSE[11625:5755912]  [MXRealmCryptoStore] inboundGroupSessionWithId: qaJz1TQlUu9bh5zZKJ7vY00CIJ8a3J1ZN97tXklL3/c -> not found
2022-01-24 12:29:01.191 RiotNSE[11625:5755912]  [MXRealmCryptoStore] inboundGroupSessionWithId: qaJz1TQlUu9bh5zZKJ7vY00CIJ8a3J1ZN97tXklL3/c -> not found
2022-01-24 12:29:01.191 RiotNSE[11625:5755912]  [MXOlmDevice] addInboundGroupSession: Add megolm session Nhu0+B3JJNHr3abToRm0rYjXne20De67kH1RaNZBEAM|qaJz1TQlUu9bh5zZKJ7vY00CIJ8a3J1ZN97tXklL3/c (import: NO)
2022-01-24 12:34:22.580 RiotNSE[11625:5759055]  [MXRealmCryptoStore] inboundGroupSessionWithId: owIs7HdX7PXei0WhdmWQMVGXtZyKhkKUXqXoaSyRg3c -> not found
2022-01-24 12:34:22.581 RiotNSE[11625:5759055]  [MXRealmCryptoStore] inboundGroupSessionWithId: owIs7HdX7PXei0WhdmWQMVGXtZyKhkKUXqXoaSyRg3c -> not found
2022-01-24 12:34:22.581 RiotNSE[11625:5759055]  [MXOlmDevice] addInboundGroupSession: Add megolm session 0lpDfPoxFw5CickEP4fWOLUb9Kho+Ab/sVDvlouUdgk|owIs7HdX7PXei0WhdmWQMVGXtZyKhkKUXqXoaSyRg3c (import: NO)
2022-01-24 12:34:34.809 RiotNSE[11625:5760020]  [MXRealmCryptoStore] inboundGroupSessionWithId: owIs7HdX7PXei0WhdmWQMVGXtZyKhkKUXqXoaSyRg3c -> found
2022-01-24 12:34:34.829 RiotNSE[11625:5760020]  [MXRealmCryptoStore] inboundGroupSessionWithId: owIs7HdX7PXei0WhdmWQMVGXtZyKhkKUXqXoaSyRg3c -> found
2022-01-24 12:34:34.830 RiotNSE[11625:5760020]  [MXRealmCryptoStore] inboundGroupSessionWithId: owIs7HdX7PXei0WhdmWQMVGXtZyKhkKUXqXoaSyRg3c -> found
2022-01-24 13:09:41.300 RiotNSE[11625:5774961]  [MXRealmCryptoStore] inboundGroupSessionWithId: owIs7HdX7PXei0WhdmWQMVGXtZyKhkKUXqXoaSyRg3c -> not found
2022-01-24 13:09:41.301 RiotNSE[11625:5774961]  [MXRealmCryptoStore] inboundGroupSessionWithId: owIs7HdX7PXei0WhdmWQMVGXtZyKhkKUXqXoaSyRg3c -> not found
2022-01-24 13:09:41.312 RiotNSE[11625:5774961]  [MXRealmCryptoStore] inboundGroupSessionWithId: owIs7HdX7PXei0WhdmWQMVGXtZyKhkKUXqXoaSyRg3c -> not found
2022-01-24 13:09:41.313 RiotNSE[11625:5774961]  [MXRealmCryptoStore] inboundGroupSessionWithId: owIs7HdX7PXei0WhdmWQMVGXtZyKhkKUXqXoaSyRg3c -> not found

Which is interesting, as somewhere between 12:34:34.830 and 13:09:41.300 Realm looks to have forgotten owIs7HdX7PXei0WhdmWQMVGXtZyKhkKUXqXoaSyRg3c.

@ara4n
Copy link
Member Author

ara4n commented Jan 24, 2022

So looks like the NSE deleted our own crypto store, during this run:

2022-01-24 13:01:51.561 RiotNSE[11625:5769581]  ################################################################################
2022-01-24 13:01:51.561 RiotNSE[11625:5769581]  [NotificationService] Instance: <RiotNSE.NotificationService: 0x108fb2c40>, thread: <NSThread: 0x1098c0940>{number = 22, name = (null)}
2022-01-24 13:01:51.561 RiotNSE[11625:5769581]  [NotificationService] Payload came: [AnyHashable("event_id"): $X8dEMlE-plUFWV8XbqukpXZgn2vK1RYqW1DxILwIpac, AnyHashable("aps"): {
    alert =     {
        "loc-args" =         (
        );
        "loc-key" = NOTIFICATION;
    };
    "mutable-content" = 1;
}, AnyHashable("room_id"): !wwFVmCFqtJcHEoUKHL:vector.modular.im, AnyHashable("unread_count"): 227]
2022-01-24 13:01:51.562 RiotNSE[11625:5769581]  [NotificationService] Memory: footprint: 15.17 MB - available: 8.83 MB
2022-01-24 13:01:51.562 RiotNSE[11625:5769581]  [MXKAccountManager] Force reload existing accounts from local storage
2022-01-24 13:01:51.562 RiotNSE[11625:5769581]  [MXKAccountManager] loadAccounts
2022-01-24 13:01:51.567 RiotNSE[11625:5769581]  [MXHTTPClient] #48 - GET _matrix/client/r0/devices/FFQXZSYKLZ
2022-01-24 13:01:51.567 RiotNSE[11625:5769581]  [MXKAccountManager] loadAccounts. 1 accounts loaded in 5ms
2022-01-24 13:01:51.571 RiotNSE[11625:5769581]  [MXSyncResponseFileStore] readMetaData: File does not exist
2022-01-24 13:01:51.571 RiotNSE[11625:5769581]  [NotificationService] fetchEvent
2022-01-24 13:01:51.571 RiotNSE[11625:5771376]  [MXBackgroundSyncService] event: Start processing $X8dEMlE-plUFWV8XbqukpXZgn2vK1RYqW1DxILwIpac after waiting for 0ms
2022-01-24 13:01:51.571 RiotNSE[11625:5771376]  [MXBackgroundSyncService] fetchEvent: $X8dEMlE-plUFWV8XbqukpXZgn2vK1RYqW1DxILwIpac. allowSync: true
2022-01-24 13:01:51.579 RiotNSE[11625:5771376]  [MXSyncResponseFileStore] readMetaData: File does not exist
2022-01-24 13:01:51.579 RiotNSE[11625:5771376]  [MXSyncResponseFileStore] readMetaData: File does not exist
2022-01-24 13:01:51.579 RiotNSE[11625:5771376]  [MXBackgroundSyncService] updateBackgroundServiceStoresIfNeeded: Reset MXBackgroundCryptoStore
2022-01-24 13:01:51.579 RiotNSE[11625:5771376]  [MXRealmCryptoStore] deleteStore for @matthew:matrix.org:bgCryptoStore:FFQXZSYKLZ, readOnly: NO
2022-01-24 13:01:51.581 RiotNSE[11625:5771376]  [MXRealmCryptoStore] deleteStore for @matthew:matrix.org:bgCryptoStore:FFQXZSYKLZ, readOnly: YES
2022-01-24 13:01:51.583 RiotNSE[11625:5771376]  [MXRealmCryptoStore] deleteStore for @matthew:matrix.org:FFQXZSYKLZ, readOnly: YES
2022-01-24 13:01:51.602 RiotNSE[11625:5769581]  [UNUserNotificationCenter] removeUnwantedNotifications: Removing 0 notifications.
2022-01-24 13:01:51.680 RiotNSE[11625:5771376]  [MXRealmCryptoStore] createStore for @matthew:matrix.org:bgCryptoStore:FFQXZSYKLZ
2022-01-24 13:01:51.749 RiotNSE[11625:5771376]  [MXRealmCryptoStore] initWithCredentials for @matthew:matrix.org:bgCryptoStore:FFQXZSYKLZ
2022-01-24 13:01:51.749 RiotNSE[11625:5771376]  [MXRealmCryptoStore] Schema version: 17
2022-01-24 13:01:51.750 RiotNSE[11625:5771376]  [MXSyncResponseFileStore] readMetaData: File does not exist
2022-01-24 13:01:51.750 RiotNSE[11625:5771376]  [MXSyncResponseStoreManager] event: Not found event $X8dEMlE-plUFWV8XbqukpXZgn2vK1RYqW1DxILwIpac in room !wwFVmCFqtJcHEoUKHL:vector.modular.im
2022-01-24 13:01:51.750 RiotNSE[11625:5771376]  [MXBackgroundSyncService] fetchEvent: We don't have the event in stores. Launch a background sync to fetch it.
2022-01-24 13:01:51.750 RiotNSE[11625:5771376]  [MXSyncResponseFileStore] readMetaData: File does not exist
2022-01-24 13:01:51.750 RiotNSE[11625:5771376]  [MXBackgroundSyncService] launchBackgroundSync: start from token m2639600963~3.2639600967_757284974_8672925_1197832790_1165725988_2932224_404967967_2816333768_211296
2022-01-24 13:01:51.750 RiotNSE[11625:5771376]  [MXHTTPClient] #49 - GET _matrix/client/r0/sync
2022-01-24 13:01:51.751 RiotNSE[11625:5755274]  [MXHTTPClient] #48 - GET _matrix/client/r0/devices/FFQXZSYKLZ completed in 184ms
2022-01-24 13:01:51.751 RiotNSE[11625:5755274]  [MXHTTPClient] cleanupBackgroundTask
2022-01-24 13:01:51.751 RiotNSE[11625:5755274]  [MXKAccountManager] saveAccounts...
2022-01-24 13:01:51.754 RiotNSE[11625:5755274]  [MXKAccountManager] saveAccounts. Done (result: 1) in 3ms
2022-01-24 13:02:12.677 RiotNSE[11625:5755274]  [MXHTTPClient] #49 - GET _matrix/client/r0/sync completed in 20926ms
2022-01-24 13:02:12.677 RiotNSE[11625:5755274]  [MXHTTPClient] Request 0x108f779b0 failed for path: _matrix/client/r0/sync - HTTP code: 0. Error: Error Domain=NSURLErrorDomain Code=-1001 "The request timed out." UserInfo={_kCFStreamErrorCodeKey=-2102, NSUnderlyingError=0x109f094f0 {Error Domain=kCFErrorDomainCFNetwork Code=-1001 "(null)" UserInfo={_kCFStreamErrorCodeKey=-2102, _kCFStreamErrorDomainKey=4}}, _NSURLErrorFailingURLSessionTaskErrorKey=LocalDataTask <126637DC-8A79-4DDC-B61A-F96535C14D67>.<25>, _NSURLErrorRelatedURLSessionTaskErrorKey=(
    "LocalDataTask <126637DC-8A79-4DDC-B61A-F96535C14D67>.<25>"
), NSLocalizedDescription=The request timed out., NSErrorFailingURLStringKey=https://matrix-client.matrix.org/_matrix/client/r0/sync?filter=139&set_presence=offline&since=m2639600963~3.2639600967_757284974_8672925_1197832790_1165725988_2932224_404967967_2816333768_211296&timeout=0, NSErrorFailingURLKey=https://matrix-client.matrix.org/_matrix/client/r0/sync?filter=139&set_presence=offline&since=m2639600963~3.2639600967_757284974_8672925_1197832790_1165725988_2932224_404967967_2816333768_211296&timeout=0, _kCFStreamErrorDomainKey=4}
2022-01-24 13:02:12.678 RiotNSE[11625:5755274]  [MXHTTPClient] cleanupBackgroundTask
2022-01-24 13:02:12.688 RiotNSE[11625:5766004]  [MXBackgroundSyncService] launchBackgroundSync: MXRestClient.syncFromToken returned with error: Error Domain=NSURLErrorDomain Code=-1001 "The request timed out." UserInfo={_kCFStreamErrorCodeKey=-2102, NSUnderlyingError=0x109f094f0 {Error Domain=kCFErrorDomainCFNetwork Code=-1001 "(null)" UserInfo={_kCFStreamErrorCodeKey=-2102, _kCFStreamErrorDomainKey=4}}, _NSURLErrorFailingURLSessionTaskErrorKey=LocalDataTask <126637DC-8A79-4DDC-B61A-F96535C14D67>.<25>, _NSURLErrorRelatedURLSessionTaskErrorKey=(
    "LocalDataTask <126637DC-8A79-4DDC-B61A-F96535C14D67>.<25>"
), NSLocalizedDescription=The request timed out., NSErrorFailingURLStringKey=https://matrix-client.matrix.org/_matrix/client/r0/sync?filter=139&set_presence=offline&since=m2639600963~3.2639600967_757284974_8672925_1197832790_1165725988_2932224_404967967_2816333768_211296&timeout=0, NSErrorFailingURLKey=https://matrix-client.matrix.org/_matrix/client/r0/sync?filter=139&set_presence=offline&since=m2639600963~3.2639600967_757284974_8672925_1197832790_1165725988_2932224_404967967_2816333768_211296&timeout=0, _kCFStreamErrorDomainKey=4}
2022-01-24 13:02:12.688 RiotNSE[11625:5755274]  [NotificationService] fetchEvent: error: Error Domain=NSURLErrorDomain Code=-1001 "The request timed out." UserInfo={_kCFStreamErrorCodeKey=-2102, NSUnderlyingError=0x109f094f0 {Error Domain=kCFErrorDomainCFNetwork Code=-1001 "(null)" UserInfo={_kCFStreamErrorCodeKey=-2102, _kCFStreamErrorDomainKey=4}}, _NSURLErrorFailingURLSessionTaskErrorKey=LocalDataTask <126637DC-8A79-4DDC-B61A-F96535C14D67>.<25>, _NSURLErrorRelatedURLSessionTaskErrorKey=(
    "LocalDataTask <126637DC-8A79-4DDC-B61A-F96535C14D67>.<25>"
), NSLocalizedDescription=The request timed out., NSErrorFailingURLStringKey=https://matrix-client.matrix.org/_matrix/client/r0/sync?filter=139&set_presence=offline&since=m2639600963~3.2639600967_757284974_8672925_1197832790_1165725988_2932224_404967967_2816333768_211296&timeout=0, NSErrorFailingURLKey=https://matrix-client.matrix.org/_matrix/client/r0/sync?filter=139&set_presence=offline&since=m2639600963~3.2639600967_757284974_8672925_1197832790_1165725988_2932224_404967967_2816333768_211296&timeout=0, _kCFStreamErrorDomainKey=4}
2022-01-24 13:02:12.688 RiotNSE[11625:5755274]  [NotificationService] fallbackToBestAttemptContent: method called.
2022-01-24 13:02:12.689 RiotNSE[11625:5755274]  --------------------------------------------------------------------------------
2022-01-24 13:02:12.690 RiotNSE[11625:5766004]  [NotificationService] deinit for <RiotNSE.NotificationService: 0x108fb2c40>
2022-01-24 13:02:12.691 RiotNSE[11625:5766004]  [NotificationService] Memory: footprint: 15.22 MB - available: 8.78 MB

...which would rather explain why E2EE then exploded.

@ara4n
Copy link
Member Author

ara4n commented Jan 24, 2022

(also, why do we keep calling /devices every time we receive a push?)

@ara4n
Copy link
Member Author

ara4n commented Jan 25, 2022

Having read the code, I see that the NSE is trying to maintain a separate bgCryptoStore, and it's that which is being deleted rather than the main realm DB. However, there's presumably a race of some kind which means that it thinks that the app has consumed all the data from bgCryptoStore and so resets it prematurely. From a quick look at the code:

        if syncResponseStoreManager.syncResponseStore.syncResponseIds.count == 0 {
            // To avoid dead lock between processes, we write to the cryptoStore only from only one process.
            // If there is no cached sync responses, it means they have been consumed by MXSession. Now is the
            // right time to clean the cryptoStore.
            MXLog.debug("[MXBackgroundSyncService] updateBackgroundServiceStoresIfNeeded: Reset MXBackgroundCryptoStore")
            cryptoStore.reset()
        }

Looks like the if clause might be incorrectly returning 0?

In terms of the race, the timing here is that our push comes in at 13:01:51, concludes that the bgCryptoStore should be reset - but meanwhile the main app is fg'd and running from 13:01:28.018, doing a big slow incremental catchup that finally completes at 13:01:52.253 after taking 16.9s(!) to commit the MXFileStore.

I'm failing to see how the activity in the bgCryptoStore ever gets copied over to the real CryptoStore however, so i'm unclear on how the race is actually failing.

This whole thing feels like an incredibly fragile hack around the fact that our DB is too slow and inefficient to be used in NSE. Would love to know how rust-sdk and rust-sdk-crypto compare.

@ara4n ara4n changed the title NSE failed to save megolm sessions, causing an explosion of UISIs NSE failed to persist megolm sessions, causing an explosion of UISIs Jan 25, 2022
@manuroe
Copy link
Member

manuroe commented Jan 25, 2022

This whole thing feels like an incredibly fragile hack around the fact that our DB is too slow and inefficient to be used in NSE. Would love to know how rust-sdk and rust-sdk-crypto compare.

+100
The NSE and the app run own their own process. The workaround is because Realm does not support multi-process access. We doubled the number DB to make write operations only occurring from one process at a time.

Damir is aware of that. He confirmed that the rust-sdk-crypto and its DB support this scenario of multi-process.

@pixlwave pixlwave added A-E2EE A-Notifications A-SDK O-Occasional Affects or can be seen by some users regularly or most users rarely S-Major Severely degrades major functionality or product features, with no satisfactory workaround labels Jan 26, 2022
@pixlwave pixlwave moved this from Incoming to Triaged in Issue triage Jan 26, 2022
@ismailgulek
Copy link
Contributor

ismailgulek commented Feb 3, 2022

  • We've seen a case where we didn't process the latest sync caches gathered on NSE, because of some session state management issue. We'll fix this by processing sync cache right after we finished the previous processing, regardless of the session state.

  • We cannot explain /event/(null) requests yet, we'll add some logs to figure out where they come from. Also we'll avoid making those requests which will get constantly 404 Not Found errors.

  • [MXThrottler] throttle: Cancel previous block logs should not be an issue.

@ara4n
Copy link
Member Author

ara4n commented Mar 2, 2022

I think i just saw this again in 1.8.3 TF; rageshake attached.

@ara4n ara4n reopened this Mar 2, 2022
@jtrees
Copy link

jtrees commented Sep 11, 2022

I think my iPhone contacts may be running into this issue. They are unable to decrypt any messages any more. One contact ran into this state a few weeks ago and has since ditched Element and Matrix entirely (after years of use). The other has just reached it now.

I have two questions:

  1. What's the plan of action on your side? Are you actively investigating and/or working on a fix or waiting for element-x-ios?
  2. What workaround can we apply to get out of this state in the mean time? Clearing the cache didn't seem to help.

@initpwn
Copy link

initpwn commented Sep 14, 2022

Any updates/workarounds?

@toshanmugaraj
Copy link
Contributor

Yes the issue still exists

@toshanmugaraj
Copy link
Contributor

toshanmugaraj commented Nov 16, 2022

(also, why do we keep calling /devices every time we receive a push?)

I can see only the one way sync from the NSE side..

performSessionOperationWithDevice: Transfer data for vxgNnP4nW8M9xqh2/Yuc699K92Pa4rcsRzQ+RI06T8w from cryptoStore to bgCryptoStore

and there Is no bgCryptoStore to cryptoStore transfer.

is there a way to force sync cryptoStore through sync api.

@toshanmugaraj
Copy link
Contributor

Just curious if this can affect the sync tokens and cryptoStore getting updated:

matrix-org/matrix-ios-sdk@13ce311

@toshanmugaraj
Copy link
Contributor

toshanmugaraj commented Nov 20, 2022

Steps to reproduce decrypt error:

  • Keep the app in background so that new message notification will be received.
  • To simulate the issue, From sender device (Android) \discardsession .
  • Send new message.
  • Notification received in the receiver.
  • Open the message to see the decrypt error.

matrix-org/matrix-ios-sdk@13ce311

making the fileStore.loadMetaData(true) fixes this provided the discardsession happens only once..

@toshanmugaraj
Copy link
Contributor

This fix the issue :

matrix-org/matrix-ios-sdk#1638

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-E2EE A-Notifications A-SDK O-Occasional Affects or can be seen by some users regularly or most users rarely S-Major Severely degrades major functionality or product features, with no satisfactory workaround T-Defect Something isn't working: bugs, crashes, hangs and other reported problems
Projects
Development

Successfully merging a pull request may close this issue.

7 participants