Add retries for sending to APNs or FCM #13518
Labels
area: mobile
Mobile push notifications; features motivated by mobile; issues requiring changes in mobile code.
area: notifications (messages)
area: production
This is similar to #9760 (aka #13294), but at the next link in the chain. When sending a push message to APNs or FCM (from either the bouncer service, or a Zulip server which has been set up with its own credentials for APNs and FCM), we should retry requests that fail with network errors, with 5xx responses, or with a timeout.
(We should also make sure there is a timeout on those requests, so that if for some reason a request fails by hanging forever, this mechanism covers it.)
Moreover the retry should go back on the queue, at the end, so that if one or a few specific push messages trigger an error, they don't starve everything else. This is the same thing that since 2b6cfbc we do for failures in talking to the bouncer service.
A further good improvement would be to ensure that the retries back off by at least a few seconds, so that if there is a transient issue the backend has some time to recover before we blow through the max number of retries we want to make. For example the retry event could have a not-before timestamp. The one tricky thing then would be to ensure that if there are a number of retries in the queue and we get a fresh event -- perhaps one that doesn't trigger an issue and so would succeed promptly if handled -- we do actually get to it promptly.
The text was updated successfully, but these errors were encountered: