Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disappearing custom relation events after restarting client or initial sync #27132

Open
tcpipuk opened this issue Mar 6, 2024 · 19 comments
Open
Labels
A-Timeline O-Uncommon Most users are unlikely to come across this or unexpected workflow S-Minor Impairs non-critical functionality or suitable workarounds exist T-Defect Z-Spec-Compliance An area where Element doesn't correctly implement the spec

Comments

@tcpipuk
Copy link

tcpipuk commented Mar 6, 2024

Steps to reproduce

When closing Element Web (I've tried app.element.io and develop.element.io) or clearing cache, the m.notice messages seem to disappear.

Here is a screenshot of an example ping test I just ran in the #ping-no-synapse:maunium.net room:

Ping results before clearing cache in client

Then how the room looks after clearing the cache in client to perform an initial sync:

Missing ping results after clearing cache in client

I've also tried using /devtools to show all hidden messages and there's simply no trace of these missing m.notice messages. I'm confident it's not a server issue as this behaviour doesn't occur in Element Android or other clients.

Outcome

Expected outcome is that rooms would appear roughly the same after restarting client or clearing cache as they did beforehand.

Operating system

Microsoft Windows 11 Pro 23H2 (22631.3235)

Browser information

Microsoft Edge 122.0.2365.66 (Official build) (64-bit)

URL for webapp

develop.element.io

Application version

Element version: f01d69f-react-942fabc5a8fe-js-7fee37680f33 Crypto version: Rust SDK 0.7.0 (b1918e9), Vodozemac 0.5.1

Homeserver

doctoruwu.uk

Will you send logs?

Yes

@t3chguy
Copy link
Member

t3chguy commented Mar 6, 2024

I'm confident it's not a server issue as this behaviour doesn't occur in Element Android or other clients.

Have you looked at the sync responses to confirm this? Just because another client gets them doesn't mean the server sent them the same data.

Please provide a sample of event IDs to correlate in your logs for the missing events.

@t3chguy t3chguy added the X-Needs-Info This issue is blocked awaiting information from the reporter label Mar 6, 2024
@tcpipuk
Copy link
Author

tcpipuk commented Mar 6, 2024

Have you looked at the sync responses to confirm this?

No, I'm not sure what I'd be looking at/for?

Please provide a sample of event IDs to correlate in your logs for the missing events.

The "spiritsail" one in the screenshot above is $-uu5w_Yz7EIgiTpH3Mnm3moPurp0aP7wAtdvDv016KE

@t3chguy
Copy link
Member

t3chguy commented Mar 6, 2024

image

Can you share the View Source of that same event please

@tcpipuk
Copy link
Author

tcpipuk commented Mar 6, 2024

Not this second, but I've pulled the event from the server over federation from another server I have access to right now:

$-uu5w_Yz7EIgiTpH3Mnm3moPurp0aP7wAtdvDv016KE
{
    "auth_events": [
        "$fJMpHekmhxm920T0QYIb-uoiAhkYd0y_cDN260N7m2o", 
        "$vYTZkTQPBTGPdzHtCpgo-Qz7dZ9uh5mbkqXPB4teN_s", 
        "$jS7DCCu_i_N4IJtmG9iofFZtC_i1p2QvwygWoYjUE-U"
    ], 
    "content": {
        "body": "@tom:doctoruwu.uk: Pong! (ping \"Example\" took 275 ms to arrive)", 
        "format": "org.matrix.custom.html", 
        "formatted_body": "<a href='https://matrix.to/#/@tom:doctoruwu.uk'>@tom:doctoruwu.uk</a>: Pong! (<a href='https://matrix.to/#/!ping-no-synapse:maunium.net/$f618YeJnv6DWI7eRTN9ysOX7324mp4V7Hju5K8h095k'>ping</a> \"Example\" took 275 ms to arrive)", 
        "m.relates_to": {
            "event_id": "$f618YeJnv6DWI7eRTN9ysOX7324mp4V7Hju5K8h095k", 
            "from": "doctoruwu.uk", 
            "ms": 275, 
            "rel_type": "xyz.maubot.pong"
        }, 
        "msgtype": "m.notice", 
        "pong": {
            "from": "doctoruwu.uk", 
            "ms": 275, 
            "ping": "$f618YeJnv6DWI7eRTN9ysOX7324mp4V7Hju5K8h095k"
        }
    }, 
    "depth": 33763, 
    "hashes": {
        "sha256": "QiRLY9/dH/QZXAxwnUFKsKsqwjv5vifzJqJ2JacP0K4"
    }, 
    "origin": "spritsail.io", 
    "origin_server_ts": 1709722417057, 
    "prev_events": [
        "$f618YeJnv6DWI7eRTN9ysOX7324mp4V7Hju5K8h095k"
    ], 
    "room_id": "!ping-no-synapse:maunium.net", 
    "sender": "@echo:spritsail.io", 
    "signatures": {
        "spritsail.io": {
            "ed25519:4cL10W6x": "S3ETw2iRZYqyh4MG5Id3vAF1muF6xmGQoJ6XFDHS5ZJzwt/qiAcVfZrmxmn7QlfPZoMgen/TWWjuUPIVAQeWDQ"
        }
    }, 
    "type": "m.room.message"
}

@t3chguy
Copy link
Member

t3chguy commented Mar 6, 2024

        "m.relates_to": {
            "event_id": "$f618YeJnv6DWI7eRTN9ysOX7324mp4V7Hju5K8h095k", 
            "from": "doctoruwu.uk", 
            "ms": 275, 
            "rel_type": "xyz.maubot.pong"
        }, 

This is likely causing the issue

@t3chguy t3chguy changed the title Disappearing m.notice messages after restarting client or initial sync Disappearing custom relation events after restarting client or initial sync Mar 6, 2024
@tcpipuk
Copy link
Author

tcpipuk commented Mar 6, 2024

Ah, I see. Is the problem that @tulir's echo plugin uses custom relations, or is the problem that Element Web filters them out?

@t3chguy
Copy link
Member

t3chguy commented Mar 6, 2024

Neither, its a bug in some assumptions the matrix-js-sdk makes

The spec says the unknown relation should be ignored

Relationships which don’t match the schema, or which break the rules of a relationship, are simply ignored.

@t3chguy
Copy link
Member

t3chguy commented Mar 6, 2024

The code today considers the relation and tries to find the parent event - $f618YeJnv6DWI7eRTN9ysOX7324mp4V7Hju5K8h095k in the case of your example - and see which timeline that event fits in. Are you able to get the source for that event too?

@tcpipuk
Copy link
Author

tcpipuk commented Mar 6, 2024

This should be it:

{
    "auth_events": [
        "$fJMpHekmhxm920T0QYIb-uoiAhkYd0y_cDN260N7m2o", 
        "$V1MtR_rtFySFmgRJydfrcWugbd6MTkoi3fdS7pv65_o", 
        "$vYTZkTQPBTGPdzHtCpgo-Qz7dZ9uh5mbkqXPB4teN_s"
    ], 
    "content": {
        "body": "!ping Example", 
        "m.mentions": {}, 
        "msgtype": "m.text"
    }, 
    "depth": 33762, 
    "hashes": {
        "sha256": "8NTX2wrlICbf/jca+DBulP9vBLiR6eqJJE5R/w/MqmA"
    }, 
    "origin": "doctoruwu.uk", 
    "origin_server_ts": 1709722416757, 
    "prev_events": [
        "$TiVPb7UurJVqspFMQRf69TufHj9svGp9K4YKdvTUasw"
    ], 
    "room_id": "!ping-no-synapse:maunium.net", 
    "sender": "@tom:doctoruwu.uk", 
    "signatures": {
        "doctoruwu.uk": {
            "ed25519:wl95vzHo": "UVTgxmdwuzLhH9bQvYEZfiNO66RlW8gzfTqZaEljky5PpebF8C+e01iHKOk6dSEWneWoGZNh7UaMWYKEIE9MCA"
        }
    }, 
    "type": "m.room.message", 
    "unsigned": {}
}

@t3chguy
Copy link
Member

t3chguy commented Mar 6, 2024

So I guess the issue here boils down to that parent event not being available to the client at the time the relation is loaded and the code isn't asynchonously loading the parent of every event with a relation as this is not needed for all specced relation types (e.g. threads, replies, edits, reactions) as this would increase server load substantially. Definitely something that needs fixing but unlikely to get any attention from the team given the edge case.

EA not having this issue is because it dumps all events in the main timeline by default, but this is a common cause of stuck notifications (in cases other than this one)

@tcpipuk
Copy link
Author

tcpipuk commented Mar 6, 2024

I see, so do you think this is a problem with Conduit not providing events in the correct order when doing a full sync?

I agree it'd be nice if Element Web didn't have this problem, but if the problem is that the server's not providing events in the correct order (so the parent arrives after the child) then I can look at possible patches on the server side to mitigate this in the meantime.

@t3chguy
Copy link
Member

t3chguy commented Mar 6, 2024

I see, so do you think this is a problem with Conduit not providing events in the correct order when doing a full sync?

No, more likely the event is just outside the page of events provided in the sync

Ideally this would be solved by the parent event being provided via unsigned so the client always knew the parent event but that would need spec work

@t3chguy t3chguy added X-Needs-Info This issue is blocked awaiting information from the reporter S-Minor Impairs non-critical functionality or suitable workarounds exist A-Timeline Z-Spec-Compliance An area where Element doesn't correctly implement the spec O-Uncommon Most users are unlikely to come across this or unexpected workflow and removed X-Needs-Info This issue is blocked awaiting information from the reporter labels Mar 6, 2024
@realtyem
Copy link

realtyem commented Mar 6, 2024

What other information would you think might be helpful here? I'd like this resolved as well.

@t3chguy
Copy link
Member

t3chguy commented Mar 6, 2024

What usecases warrant these non-standard relations for better informed triage

@tcpipuk
Copy link
Author

tcpipuk commented Mar 6, 2024

The m.notice events are used frequently by bots, which is why it's very prevalent in the ping rooms I used in my examples above.

It sounds like "fixing" this would require quite a lot of work, but would it be feasible to offer a /devtools or Labs option to specify a different limit, so the initial sync can pull back more events if the server permits it?

I just tested this working by overriding the limit value to 50 in Nginx so it pulls back enough events to be able to resolve the relation:

  # Specific match for sync request with "limit=20"
  location ~ ^/_matrix/client/v3/sync\?(.*)(limit=)20(.*)&_cacheBuster=(.*)$ {
    set $args $1$2$350$4; # Modify limit value to 50
    proxy_pass http://doctoruwu_conduwuit/_matrix/client/v3/sync?$args;
  }

  location /_matrix/ {
    proxy_pass http://doctoruwu_conduwuit;
  }

With this in place, Element Web now shows all of the missing events, but this workaround is just a pretty messy hack to demonstrate that it would resolve it.

@t3chguy
Copy link
Member

t3chguy commented Mar 6, 2024

The m.notice events are used frequently by bots, which is why it's very prevalent in the ping rooms I used in my examples above.

It has nothing to do with them being m.notice events. It is entirely to do with them using a non-standard relation.

but would it be feasible to offer a /devtools or Labs option to specify a different limit, so the initial sync can pull back more events if the server permits it?

The limit is configurable via config.json only

@tcpipuk
Copy link
Author

tcpipuk commented Mar 6, 2024

The m.notice events are used frequently by bots, which is why it's very prevalent in the ping rooms I used in my examples above.

It has nothing to do with them being m.notice events. It is entirely to do with them using a non-standard relation.

Sorry, I misspoke. It seems these types of relations are used whenever Maubot replies to a user, and Maubot is one of the most commonly used bot frameworks in Matrix, so while I don't have specific numbers, it would be quite common.

The limit is configurable via config.json only

Excellent. I'll look at running my own build to mitigate this issue for now, thanks.

@t3chguy
Copy link
Member

t3chguy commented Mar 6, 2024

Do you know why maubot doesn't use actual replies as would be semantically accurate?

@tulir
Copy link
Contributor

tulir commented Mar 6, 2024

They're pongs, not replies. Only used by echo bots mostly in ping rooms though, not any other bots. Rendering replies would take even more space in the UI and there's no reason for them to be replies

@t3chguy t3chguy removed the X-Needs-Info This issue is blocked awaiting information from the reporter label Mar 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Timeline O-Uncommon Most users are unlikely to come across this or unexpected workflow S-Minor Impairs non-critical functionality or suitable workarounds exist T-Defect Z-Spec-Compliance An area where Element doesn't correctly implement the spec
Projects
None yet
Development

No branches or pull requests

4 participants