Possible out-of-order delivery of server->client messages #446

Open
krajj7 opened this issue May 9, 2024 · 14 comments

Comments

@krajj7

krajj7 commented May 9, 2024

Hi Peter,

I noticed that the buffering of server->client messages that Sente does can randomly cause out-of-order delivery under certain conditions.

For example when a thread quickly sends two messages like this via Sente using the send-fn:
N1 larger message
N2 small message

The client might receive them in the order N2 N1.

From looking at the code, around this point the flush-buffer! function seems problematic:

(let [buffered-evs-ppstr (pack packer buffered-evs)]

If a message happens to take a longer time to pack (say 50 ms), or the thread just stalls for whatever reason after picking up a message from the buffer, other threads executing flush-buffer! can pick up and send newer messages before the older thread is done.

If none of the messages use {:flush? true}, the stall would probably have to be longer than send-buf-ms-ws (30 ms by default) to cause a problem. If message N2 used {:flush? true} and N1 didn't, the stall probably wouldn't even need to last as long as send-buf-ms-ws, depending on timing (for example, N2 is sent just after N1 is pulled from the buffer).

The current workaround for me is to use {:flush? true} for the subset of messages that are ordering-critical.
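
A minimal sketch of the interleaving described above. This is a hypothetical model, not Sente's actual code: `buffer`, `wire`, `pull-buffer!` and `send-packed!` are made-up names, and the two flushing threads are simulated as explicit sequential steps so the outcome is deterministic.

```clojure
;; Hypothetical model of the race; none of these names come from Sente.
(def buffer (atom [])) ; events waiting to be flushed
(def wire   (atom [])) ; events in the order the client would receive them

(defn pull-buffer!
  "Atomically takes everything currently buffered and empties the buffer."
  []
  (let [[old _new] (swap-vals! buffer (constantly []))]
    old))

(defn send-packed!
  "Stands in for 'pack, then write to the socket'."
  [evs]
  (swap! wire into evs))

;; Simulated interleaving of two flushers, A and B:
(swap! buffer conj :n1)
(def flush-a (pull-buffer!)) ; A takes [:n1], then stalls while packing
(swap! buffer conj :n2)
(def flush-b (pull-buffer!)) ; B takes [:n2]
(send-packed! flush-b)       ; B finishes first
(send-packed! flush-a)       ; A finishes late

@wire ;; => [:n2 :n1]
```

Taking events out of the buffer is atomic here, but the take and the send are not one atomic step, which is what lets the newer message overtake the older one.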

@ptaoussanis
Member

Hi Jan-

Just to make sure we're on the same page- did something in particular give you the impression that message order is somehow guaranteed?

In general you can't really be sure that messages will be received in the same order they're sent, even when they are sent in strictly sequential order - so it's generally best not to depend on strict ordering.

Could you describe a little more about your use case?

@krajj7
Author

krajj7 commented May 9, 2024

As far as I know, raw Websockets do guarantee message order, since they use a single TCP connection. I believe Websocket libraries generally keep this property, which is why I thought Sente would do the same.

In my app I use Sente to synchronize some state between the server and the clients. If updates can arrive in random order that would make it a bit more difficult.

@ptaoussanis
Copy link
Member

Sente isn't a raw WebSocket library, it operates over both WebSockets and Ajax - and supports the same API over both. Which does mean embracing some limitations.

I've generally never had an issue with needing strict message ordering though- that's usually something that can be overcome without too much trouble at the application level.

Might depend on the details of your use-case though (so how precisely your state synchronization works).

@krajj7
Author

krajj7 commented May 9, 2024

Understood, if ordering is deliberately not guaranteed that is fine.

I would suggest documenting this though, because to be honest Sente's README.md does give me the impression that ordering would be guaranteed:

  • "Loosely inspired by Socket.IO" - socket.io explicitly guarantees ordering: https://socket.io/docs/v4/delivery-guarantees
  • "Bidirectional a/sync comms over WebSockets with auto Ajax fallback" - the focus is on Websockets
  • "It just works: auto keep-alive, buffering, protocol selection, reconnects" - guaranteed message ordering would also be nice

Assuming Ajax is disabled (clients use :type :ws) and messages are sent with {:flush? true}, is there anything in Sente that might still break message ordering? From what I can see in these circumstances ordering would actually be kept, which is perfectly good enough for me.

@ptaoussanis
Member

> Understood, if ordering is deliberately not guaranteed that is fine.

That is indeed the current situation, apologies if that was unclear. I'll take a look at clarifying the docs when I'm next doing batched work on Sente.

> Assuming Ajax is disabled (clients use :type :ws) and messages are sent with {:flush? true}, is there anything in Sente that might still break message ordering?

Not that I'm aware of off-hand, but I could be mistaken.

Note that I'd still stop short of confidently saying that even pure WebSocket messages will always be received in order. While it's a lot less likely to occur, I believe there are still network conditions that could lead to out-of-order receipt unless some sort of sequencing protocol is explicitly used.

My advice is usually to avoid the need for strict ordering, since in my experience one rarely needs such a thing - and when one does, it's often anyway better to implement some sort of timestamp or sequence mechanism appropriate at the application layer.

Of course YMMV.

Best of luck!

@krajj7
Author

krajj7 commented May 9, 2024

No worries, thank you for the clarification.

I'm sure all protocols can work without guaranteed ordering by clever design and careful implementation, but having the ordering guaranteed just means one less thing to worry about.

I believe that TCP connections, and therefore Websockets, have strong enough guarantees about delivery order under any network conditions. TCP already does all the sequencing and rearranging for us, I don't want to reimplement it again at the application level. If it didn't reliably work in TCP, that would be a serious flaw in a fundamental internet protocol.

If I can disable the Ajax fallback and buffering to get the ordering guarantee back (as strong as the underlying Websocket), that seems like a good tradeoff in 2024 with universal Websocket support. I think this would be great as an official option.

Please feel free to close this issue.

@ptaoussanis
Member

> I believe that TCP connections, and therefore Websockets, have strong enough guarantees about delivery order under any network conditions.

I'm far from an expert on this, but my understanding is that WebSockets don't explicitly provide cross-message ordering guarantees. That'd generally require some kind of sequencing mechanism, which (again - my understanding) isn't part of the WebSocket protocol.

TCP guarantees in-order delivery of data at the byte level only. That's different from a guarantee of in-order delivery at the message level.

Anyway, I might be mistaken - and it sounds like you're satisfied with the ordering provided natively by WebSockets.

> I think this would be great as an official option.

Just to clarify- what do you believe would prevent you from doing this? It's certainly an official option to disable Ajax fallback, as you described. Then you're left with whatever WebSockets provide.

You can write a little closure to force flushing on every call if you like. Or is there something else you have in mind that you'd like Sente to provide?
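
A minimal sketch of such a closure. It assumes the server-side send function accepts `(send-fn user-id event opts)` with a `:flush?` option, as the `{:flush? true}` usage earlier in this thread suggests; `flushing-send-fn`, `stub-send!` and `send-now!` are hypothetical names, and the stub stands in for the real send function so the example runs on its own.

```clojure
(defn flushing-send-fn
  "Wraps `send-fn` so that every call is sent with {:flush? true}.
   Any other caller-supplied opts are passed through unchanged."
  [send-fn]
  (fn [user-id event & [opts]]
    (send-fn user-id event (assoc opts :flush? true))))

;; Stand-in for the real send function: records what it was called with.
(def sent (atom []))
(defn stub-send! [user-id event opts]
  (swap! sent conj [user-id event opts]))

(def send-now! (flushing-send-fn stub-send!))
(send-now! "user-1" [:app/state-update {:rev 42}])

@sent ;; => [["user-1" [:app/state-update {:rev 42}] {:flush? true}]]
```

In an application the stub would be replaced by the send function obtained from the channel socket server.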

@krajj7
Author

krajj7 commented May 9, 2024

I am also not an expert. Unfortunately the WebSocket RFC doesn't say explicitly, but I believe the ordering guarantee is implicit in how the TCP connection is used. The byte-level guarantee that TCP provides makes it easy to guarantee ordering of Websocket messages within a single connection.

I only have some non-authoritative sources that confirm this:

> Just to clarify- what do you believe would prevent you from doing this? It's certainly an official option to disable Ajax fallback, as you described. Then you're left with whatever WebSockets provide.

Nothing prevents me from using the workaround right now, only that

  1. neither of us is sure that this fully restores the ordering guarantees to what websockets provide
  2. it means relying on some implementation detail and new versions of Sente may freely introduce other features that can reorder messages

By "official option" I meant that the docs could explicitly say that in this configuration, Websocket ordering guarantees would be maintained.

@ptaoussanis
Member

ptaoussanis commented May 10, 2024

I guess it's possible we're debating the semantics of what "guaranteed" means. I'd take that to mean some protocol-level mechanism to provide a formal assurance - as TCP does do for its byte-level ordering.

My superficial understanding is that WebSocket messages, while implemented using TCP and typically ordered, are still susceptible to mis-ordered delivery at the message level.

For example via packet loss and retransmission during network congestion, variable network latency, or the influence of a load balancer or proxy.

How frequently is that likely to occur, if it indeed can occur? I can't say. I can say that in my experience one encounters all sorts of strange network environments in the real world, so my advice has generally been to try to aim for an application-level protocol that is robust to unexpected network conditions.

Even saying "it's 2024, just use WebSockets - they're supported universally" doesn't always agree with the real-world IME. I still see non-trivial numbers of Ajax connections in my logs, often due to apparent proxy issues, mobile networks or hotspots doing weird things, etc. But I also have a lot of connections from international markets, so again YMMV.

I wouldn't be keen on adding hard sequencing to Sente at its protocol level since I actually quite like the current level of network abstraction in Sente. I believe it handles the very mundane details, while encouraging folks to think about what semantics they want at an application level.

And an advantage of an application-level design is that you get to choose the specific semantics and therefore trade-offs you want. And you can do that on a per-message basis, i.e. at the domain level.

For example-

  • For some kinds of updates, I'm happy to discard/drop an out-of-sequence message.
  • For some kinds of updates, I'm happy to merge an out-of-sequence message (when sequence doesn't matter).
  • For some kinds of updates, I'd rather force a resync back to an acknowledged checkpoint.
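
As an illustration of the first option (dropping out-of-sequence messages), here is a minimal sketch. It assumes every update carries an application-assigned, increasing `:seq-no`; that key, `:last-seq`, `:value` and `apply-update` are all hypothetical names and not part of Sente.

```clojure
(defn apply-update
  "Applies `msg` to `state` only if its :seq-no is newer than the last one
   applied; stale (out-of-sequence) messages are dropped."
  [state {:keys [seq-no data]}]
  (if (> seq-no (:last-seq state -1))
    (-> state
        (assoc :last-seq seq-no)
        (update :value merge data))
    state))

;; Message 2 arrives after message 3 and is discarded.
(def final-state
  (reduce apply-update
          {:value {}}
          [{:seq-no 1 :data {:a 1}}
           {:seq-no 3 :data {:a 3}}
           {:seq-no 2 :data {:a 2}}]))

final-state ;; => {:value {:a 3}, :last-seq 3}
```

The merge and resync options would replace the `state` branch with a different remedy, which is the per-message choice described above.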

The tough part of building realtime apps IME is deciding on the semantics (and therefore trade-offs) you want for your application.

Sente's current protocol is intended to address the common denominator stuff, without making presumptions about anything that one might typically want to tweak on an application basis. And ideal sequencing semantics vary a lot from app to app IME (even from message to message).

This is undoubtedly a matter of taste though, and might not be ideal for all cases. I'm usually happy to treat the network with skepticism, but the extent of that skepticism might be atypical or even unreasonable.

I would be inclined to pose the question though:

  • What's the worst that happens if you expect reliability, but don't get it?
  • What's the worst that happens if you don't expect reliability, but do get it?

The cost of the latter depends on thinking through the semantics you want, and the cost of your remedy. A high-cost remedy might make this approach silly, a low-cost remedy might make it reasonable. So I guess a lot hinges here on the cost of the remedy. I think you might be estimating a high default cost, and I'm estimating a low default cost.

If your sync algorithm absolutely must have implicit strict cross-message ordering, then Sente might not be an ideal fit.

Though I'd (personally, perhaps unreasonably!) also be skeptical that even a plain WebSocket connection would get you strict cross-message ordering reliable enough to "just count on".

Given the linked comments and Socket.IO docs, I might be outright wrong here or at least in a minority. I didn't write the Socket.IO docs, but their use of "guarantee[d] message ordering" makes me uncomfortable if the source of that guarantee is solely TCP byte-ordering.

tl;dr - summary of my superficial understanding:

  • TCP bytes are received in-order, guaranteed at the TCP protocol level.

  • TCP messages are typically received in-order due to the above, but may (rarely) be out-of-order due to unusual network conditions or configurations.

  • WebSockets don't have their own order mechanism/guarantees, but instead rely on the underlying TCP connection's mechanism/guarantees.

  • WebSocket messages are typically received in-order, but may (rarely) be out-of-order due to unusual network conditions or configurations.

IF the above is indeed accurate then my position would be:

  • I would (personally) not feel comfortable saying that Sente's messages are guaranteed to be received in order, without implementing an explicit sequencing scheme at the protocol level.
  • I wouldn't be keen on implementing such a scheme at the protocol level, since I believe it's fairly common that different applications (and even messages) benefit from different/custom schemes.

> By "official option" I meant that the docs could explicitly say that in this configuration, Websocket ordering guarantees would be maintained.

Without claiming any order guarantees, I'm happy to look through the code next time I'm on batched Sente work and document what'd be necessary to reliably prevent Sente from interfering with whatever semantics are provided by the underlying WebSocket.

It sounds like that might be sufficient in your case?

@krajj7
Author

krajj7 commented May 10, 2024

We clearly believe different things about how TCP and Websockets work and what guarantees they provide.

To avoid repetition I'll skip ahead to the TLDR part where I most disagree and that I think is crucial:

> • TCP bytes are received in-order, guaranteed at the TCP protocol level.
> • TCP messages are typically received in-order due to the above, but may (rarely) be out-of-order due to unusual network conditions or configurations.

Agree on the first point, but the second point seems like a big misunderstanding of TCP. TCP is a stream-oriented protocol, there are no TCP messages (see Wikipedia or any resource on networking). Any discrete packets sent over the network are an implementation detail that you shouldn't worry about. Logically a TCP connection is a stream of bytes.

Messages are only introduced on the Websocket layer, and since a sequence of Websocket messages is also a sequence of bytes, they cannot be reordered when they are sent over a single TCP connection.
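
A minimal sketch of that argument. It uses a made-up length-prefixed framing rather than the real WebSocket frame format, and `encode`, `decode`, `msgs`, `stream`, `segments` and `received` are hypothetical names. Messages are laid out one after another in a single ordered stream, the stream is cut at arbitrary points (standing in for TCP segments), and decoding the reassembled stream returns the messages in their original order.

```clojure
(defn encode
  "Lays messages out back to back: each one as its length, then its characters."
  [msgs]
  (vec (mapcat (fn [m] (cons (count m) (seq m))) msgs)))

(defn decode
  "Reads length-prefixed messages back out of an ordered stream."
  [stream]
  (loop [s (seq stream), out []]
    (if-let [[n & more] s]
      (recur (seq (drop n more))
             (conj out (apply str (take n more))))
      out)))

(def msgs ["larger message N1" "N2"])
(def stream (encode msgs))

;; Arbitrary cut points: a segment boundary may fall in the middle of a message.
(def segments (partition-all 5 stream))

;; The receiver sees the segments' contents as one in-order stream again.
(def received (decode (apply concat segments)))

(= received msgs) ;; => true
```

Where the segment boundaries fall does not matter; only the order of the stream does.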

I honestly believe these are non-controversial objective facts and hope we could reach some agreement here.

> I would be inclined to pose the question though:
>
> • What's the worst that happens if you expect reliability, but don't get it?
> • What's the worst that happens if you don't expect reliability, but do get it?

If the ordering guarantee of TCP/Websockets is not strong enough for you, what about the guarantee that bytes/messages will not be skipped entirely or delivered multiple times? Does Sente or the protocols you design on top also handle these cases, doing re-transmission and deduplication?

Fortunately there is no need, because TCP works really well and the worst that network conditions can do is make it slow or break the connection entirely. Anything else would be a flaw in the protocol or a bug in the implementation. TCP is so battle-hardened and foundational that I'm fully comfortable just ignoring that possibility. Websocket implementations may be less trustworthy, but the hard work is already done by TCP, so I'm comfortable with that as well.

> Even saying "it's 2024, just use WebSockets - they're supported universally" doesn't always agree with the real-world IME. I still see non-trivial numbers of Ajax connections in my logs, often due to apparent proxy issues, mobile networks or hotspots doing weird things, etc.

I suspect Sente falls back to Ajax in many cases where retrying the Websocket connection would work just as well. I'm not against having the fallback, but for me it's not worth breaking the ordering guarantee.

> Without claiming any order guarantees, I'm happy to look through the code next time I'm on batched Sente work and document what'd be necessary to reliably prevent Sente from interfering with whatever semantics are provided by the underlying WebSocket.
>
> It sounds like that might be sufficient in your case?

That would be excellent and would 100% work for me, even if we disagree about the guarantees.

@ptaoussanis
Member

ptaoussanis commented May 10, 2024

> the second point seems like a big misunderstanding of TCP. TCP is a stream-oriented protocol, there are no TCP messages (see Wikipedia or any resource on networking).

I was, confusingly, referring informally to TCP segments. And I was indeed mistaken! It seems that TCP's sequence mechanism does cover cross-segment order, and that helps make sense of the assertion that WebSocket message order should likewise be covered.

I'm quite ignorant of the underlying protocols- thanks a lot for patiently walking through this with me, it's very helpful to have pinpointed where my misunderstanding was 🙏

> what about the guarantee that bytes/messages will not be skipped entirely or delivered multiple times? Does Sente or the protocols you design on top also handle these cases, doing re-transmission and deduplication?

I suspect my answer might frustrate you: I tend to prefer application-level protocols that don't much care about skipped or duplicated data :-)

I've seen so many cases over the years where some software or protocol makes promises that end up being broken in practice by some fault along the network (often proxies), that I'm in the habit of just preferring application-level robustness that I can fully grok and test myself without worrying about what some non-conformant mobile or hotspot provider in Southeast Asia might be doing with my traffic. (Not hypothetical, this has come up a lot).

I guess this comes down to the point I made earlier about remedy costs. In my experience a few sensible choices are often sufficient to have confidence at an application level. In practice, I often find it simpler to just assume traffic might be unreliable from the outset.

And assuming the worst (e.g. re: imperfect message ordering) also enables trade-offs like more aggressive buffering (which I often find hugely beneficial for certain types of applications), and indeed compatibility with Ajax (which I'm fairly confident is in fact still needed for a shrinking but still non-trivial proportion of traffic in some regions/markets).

But that's another thread, and as always a lot varies on the circumstances.

> Websocket implementations may be less trustworthy, but the hard work is already done by TCP, so I'm comfortable with that as well.

Sounds reasonable to me if you're happy without the buffering or Ajax support, and especially with the welcome correction re: reliable message order.

> That would be excellent and would 100% work for me, even if we disagree about the guarantees.

Sounds good, though you've got me sold on the guarantees 👍 (modulo my stubbornness about still preferring an application-level solution for such things when simple and easy).

Hope this (long) exchange hasn't been frustrating, it's been a productive one for me at least.

@krajj7
Author

krajj7 commented May 10, 2024

I'm glad that this discussion was productive and that we reached an agreement about Websocket guarantees.

I share your appreciation for robust protocols that can handle all kinds of unexpected situations, but they are more difficult to design and implement. I could rework my synchronization to tolerate reordering, but I'm just too lazy when I know that a lower-level protocol should give me these guarantees for free.

Thank you for taking your time to discuss this, I am happy that Sente will continue to fit my use case.

@ptaoussanis
Member

ptaoussanis commented May 10, 2024

> and that we reached an agreement about Websocket guarantees.

👍

> Thank you for taking your time to discuss this, I am happy that Sente will continue to fit my use case.

You're welcome.

I'll include something about this in the docs for the next release, and I'll check how easy it'd be to try to make buffering reliably order-preserving for WebSockets. Will also see about an option to make it easier to disable buffering entirely.

Not planning a new release anytime soon though (just other open source priorities atm), but I did a quick skim of the code and couldn't spot anything else obvious that'd interfere with message ordering if you disable Ajax and always flush - though can't say for certain until I've had a closer look.

@danielsz
Collaborator

danielsz commented May 10, 2024

Big applause from the captivated audience (me at the very least). We all benefit from a contentious but civil discussion. Kudos to both of you. 👏
