
Polling the homeserver, the app always times out after 60 seconds #2169

Closed
mcg-matrix opened this issue Sep 25, 2020 · 6 comments · Fixed by #2385
Labels: A-Store-F-Droid, T-Defect (Something isn't working: bugs, crashes, hangs and other reported problems)

@mcg-matrix

I am using Element installed from F-Droid. Version 1.0.7 brought along #2080. Setting "Sync request timeout" does make the app set the "&timeout" parameter in a background-polling request for "/_matrix/client/r0/sync" as expected. 👍

"Sync request timeout" apparently can be set to any integer value. (I like that. Given the option (and support by the homeserver for very-long-living requests), I'd probably set it to "infinite", so I'd have a permanent TCP connection to the homeserver.)

However, no matter the value of "Sync request timeout", the app gives up waiting for an answer after 60 seconds. It then closes the TCP connection, which is a very expensive operation, because the connection (and probably the SSL session, too) can't be reused for further requests.

Would it be possible to increase the app's internal timeout to something like "Sync request timeout + 10 seconds"?
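
For illustration, this is roughly what such a background-polling request looks like (host and parameter values here are made up; the timeout is given in milliseconds):

    GET /_matrix/client/r0/sync?since=s72594_4483&timeout=120000 HTTP/1.1
    Host: matrix.example.org
    Authorization: Bearer <access_token>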

@BillCarsonFr added the T-Defect label on Sep 26, 2020
@witchent

Just a question: if that bug were fixed, would increasing the "Sync request timeout" help with high data usage (even though I am only in two DMs, it used 14 MB for around 50 messages), or is this just an unfixable thing with the Matrix protocol / Synapse's implementation?

@mcg-matrix
Author

Just reporting back that it works, thanks! 👍

Using version 1.0.11 from F-Droid, I see the client app giving up waiting for an answer to the sync request after 30 minutes (1799.x seconds).

So I'm now using a manually configured "Sync request timeout" of 1777 seconds (with a "Delay between each sync" of 0 seconds (so the next sync request gets placed quickly) and "Background Sync Mode" set to "Optimized for real time" (so I can see the icon in the status bar)), and I can observe a substantial reduction in battery use by Element:

[Battery-usage screenshots: 1.0.9, 1.0.10, 1.0.11 with 1777 s timeout (gaps from dozing), 1.0.11 (almost full-time listening)]

I've now even told Element to ignore background restrictions (from the "Troubleshoot Notifications" page; I'm not sure where else that setting can be changed or checked), so the next sync request really does get placed very quickly even when Android is dozing.

A single TCP connection is often used for dozens of HTTP requests (both long-lasting sync requests and other, quick requests).

In case someone's wondering: To make these long-living connections work, I have placed a reverse HTTP proxy in front of the Synapse server (which is hosted by Modular Element Matrix Services). That proxy:

  • enables TCP-Keepalive (every 4.5 minutes, to keep NAT devices, firewalls, etc. happy) with a huge number of unsuccessful checks required before giving up on a connection (to allow continued use of a connection even if the client side was asleep or dozing or without reception or whatever for a while),
  • allows HTTP-Keepalive with generous timeouts between requests (to allow reuse of the TCP connection with its SSL context even if the client side is asleep/dozing/whatever and takes a long time until the next request gets placed),
  • repeats requests to the upstream Synapse server if that server sends a timeout response (which happens after 3 minutes in the case of my homeserver), so the client application won't be bothered.

@mcg-matrix
Author

Using version 1.0.11 from F-Droid, I see the client app giving up waiting for an answer to the sync request after 30 minutes (1799.x seconds).

Well, that's only part of the story. With "Background Sync Mode" set to "Optimized for battery", I could observe different behaviour:

  • A: While the phone is "awake", the mobile side typically gives up after 599 seconds, tearing down the TCP connection. Thus the practically usable maximum for "Sync request timeout" is less than 600 seconds.
  • B: At the latest after about an hour of idling (Android "doze mode" kicking in? Is that a per-app state or a state of the whole system?), the mobile side frequently gives up much earlier (after anything from 15 seconds to several minutes; I failed to detect a pattern).

A and B each lead to a higher energy usage in mode "Optimized for battery" than in mode "Optimized for real time". I would have lost about 15% of the battery savings achieved as described above (#2169 (comment)), so I'll continue using "Optimized for real time".

@karlkashofer

Hi! I'm fighting with battery issues too, could you please post your nginx reverse proxy config?

@mcg-matrix
Author

... could you please post your nginx reverse proxy config ?

Here's the standard part of the nginx reverse proxy config. "cheers.modular.im" is the DNS name of the real homeserver. Enabling HTTP-keep-alive to upstream is optional; it doesn't make a difference to the Matrix clients.

        proxy_set_header Host cheers.modular.im;  # needed!
        proxy_pass https://cheersmodularim;  # NOTE: Do not add a / after the port in proxy_pass, otherwise nginx will canonicalise/normalise the URI.
        proxy_ssl_server_name on;  # enables SNI
        proxy_ssl_name cheers.modular.im;  # checked against the certificate, I think
        proxy_ssl_trusted_certificate /usr/share/ca-certificates/mozilla/DST_Root_CA_X3.crt;  # for Let's Encrypt

        # Enable HTTP-keep-alive to upstream:
        proxy_http_version 1.1;
        proxy_set_header Connection "";
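
For context, these proxy_* directives sit inside an ordinary server/location block, roughly like this (certificate paths are just placeholders; the listen directives with so_keepalive are shown further below):

        server {
            server_name matrix4c.cheers.de;            # the proxy's client-facing name
            ssl_certificate     /etc/nginx/proxy.crt;  # placeholder path
            ssl_certificate_key /etc/nginx/proxy.key;  # placeholder path

            location / {
                # the proxy_set_header / proxy_pass / proxy_ssl_* directives from above go here
            }
        }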

Here's a part of the reverse proxy config specific to making it work for Matrix. Apparently, the homeserver's real name is used somewhere in the returned content, making the client contact the real homeserver directly.
Note that I don't care about server-to-server communication: m.server in .well-known/matrix/server contains the real homeserver's name; only m.homeserver in .well-known/matrix/client points to the reverse proxy.

        proxy_set_header Accept-Encoding "";  # with compression, sub_filter won't work ...
        sub_filter https://cheers.modular.im https://matrix4c.cheers.de;  ## sooner or later, this shall break something...
        sub_filter_once off;
        sub_filter_types *;

        # Add back compression towards clients:
        gzip on;
        gzip_types application/json;
        # We are adding 23 bytes of "Content-Encoding: gzip" to every response.
        # Note that gzip_min_length only looks at the Content-Length header: no such header => always compression.
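
For completeness, the .well-known/matrix/client I serve looks roughly like this (only the client base URL points at the proxy):

        {
            "m.homeserver": {
                "base_url": "https://matrix4c.cheers.de"
            }
        }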

Here's the config to enable repetition of requests by the nginx reverse proxy on behalf of the clients, in two parts.
Again, HTTP-keep-alive towards upstream doesn't make a difference to the Matrix clients.

        # If we haven't heard anything from upstream within 1 hour, that's
        # very unusual, so let's ask the next upstream:
        proxy_read_timeout 3600;
        # 504 is the usual response of the real homeserver (well, its front proxy, I guess) after ~3 minutes.
        proxy_next_upstream error timeout invalid_header http_500 http_502 http_503 http_504;
upstream cheersmodularim {
    # Each of these will close the connection or send a 504 or something
    # like that after 3 minutes.
    # It appears that each server is contacted only once per client request,
    # even if many minutes have passed since the server timed out, and no
    # matter whether it is marked as "backup" or not.
    # It seems that something the Matrix client is supposed to know about
    # happens at least every 2700 seconds = 45 minutes. Having ~15 servers
    # listed here should thus be sufficient to never have to send a 504 to
    # the client.
    # priority is currently only available via commercial subscription and SRV records, it seems.
    # Let's use weights to increase the chance that we'll reuse a keepalive connection:
    server cheers.modular.im:443 weight=1000000000000016;
    server cheers.modular.im:443 weight=100000000000015;
    server cheers.modular.im:443 weight=10000000000014;
    server cheers.modular.im:443 weight=1000000000013;
    server cheers.modular.im:443 weight=100000000012;
    server cheers.modular.im:443 weight=10000000011;
    server cheers.modular.im:443 weight=1000000010;
    server cheers.modular.im:443 weight=100000009;
    server cheers.modular.im:443 weight=10000008;
    server cheers.modular.im:443 weight=1000007;
    server cheers.modular.im:443 weight=100006;
    server cheers.modular.im:443 weight=10005;
    server cheers.modular.im:443 weight=1004;
    server cheers.modular.im:443 weight=103;
    server cheers.modular.im:443 weight=12;
    server cheers.modular.im:443 weight=1;
    keepalive 99;
}

Keeping TCP connections from clients alive and usable for HTTP requests:

    # Whenever a TCP connection (from a client) hasn't been active
    # for 4m35s, have it checked. If no response, check again
    # every 4m35s. Shut down the connection after 127 (max allowed) checks.
    # (Goal #1 keep state-observing routers happy, #2 allow continued
    # use even after many hours of offline/sleep.)
    listen  [2a01:4f8:200:1473:cee:a1c5:4c:0]:443 ssl so_keepalive=275s:275s:127 ;
    listen  144.76.140.122:443                    ssl so_keepalive=275s:275s:127 ;
        # Let clients use a connection for a looong time if they want to:
        ## we are adding 23 bytes of "Connection: keep-alive" to every response!
        keepalive_requests 9999;
        keepalive_timeout 99999s;

@mcg-matrix
Author

Looks like I missed a (non-significant) part of my nginx config: the log_format I'm using is a pimped version of "combined" that lets the operator see the reuse of TCP connections for multiple requests by a client, the time spent handling each request, the (possibly more than one) upstream responses for a single client request, and the compression efficiency. I'm using this format for various virtual hosts; $scheme, $host, and $upstream_cache_status are not related to the Matrix reverse proxying.

log_format mcgfull '${remote_addr}_$remote_port#$connection_requests - $remote_user [$time_local] '
                    '"$request" $status $bytes_sent>$body_bytes_sent($gzip_ratio) '
                    '"$http_referer" "$http_user_agent" $scheme $host $upstream_cache_status '
                    'request_time=$request_time upstream_response_time="$upstream_response_time" '
                    'upstream_status="$upstream_status"';
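
A made-up example line in this format (the "#23" shows the 23rd request on the same TCP connection; the comma-separated upstream values show two upstream timeouts before one attempt finally answered):

    203.0.113.7_40412#23 - - [25/Sep/2020:14:30:07 +0200] "GET /_matrix/client/r0/sync?timeout=1777000 HTTP/1.1" 200 1452>1320(2.10) "-" "Element/1.0.11" https matrix4c.cheers.de - request_time=541.003 upstream_response_time="180.004, 180.002, 180.997" upstream_status="504, 504, 200"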
