Skip to content

Connection State Recovery fails for long-lived connections #5282

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
bryghtlabs-richard opened this issue Jan 23, 2025 · 15 comments
Open

Connection State Recovery fails for long-lived connections #5282

bryghtlabs-richard opened this issue Jan 23, 2025 · 15 comments
Labels
to triage Waiting to be triaged by a member of the team

Comments

@bryghtlabs-richard
Copy link

bryghtlabs-richard commented Jan 23, 2025

Describe the bug
If the server does not periodically send the client messages, connection state recovery may fail unexpectedly.

To Reproduce

Socket.IO server version: 4.7.5, in-memory adapter

Server

import { Server } from "socket.io";

const io = new Server(3000, {
  connectionStateRecovery: {
    // the backup duration of the sessions and the packets
    maxDisconnectionDuration: 2 * 60 * 1000,
    // whether to skip middlewares upon successful recovery
    skipMiddlewares: true,
  }
});

io.on("connection", (socket) => {
  console.log(`connect ${socket.id}`);
  socket.emit("hello client", "Assigning offset to client");

  socket.on("disconnect", () => {
    console.log(`disconnect ${socket.id}`);
  });
});

We noticed this using a bespoke client, but appears to be a server-side issue. I've tried to mock up the client-side code accordingly.
Client

import { io } from "socket.io-client";

const socket = io({
  reconnectionDelay: 10000, // defaults to 1000
  reconnectionDelayMax: 10000 // defaults to 5000
});

socket.on("connect", () => {
  console.log("recovered?", socket.recovered);

  setTimeout(() => {
    if (socket.io.engine) {
      // close the low-level connection and trigger a reconnection
      socket.io.engine.close();
    }
  }, 3 * 60 * 1000);
});

Steps:

  1. Startup server, connect client to server, server emits an event to set client's recovery offset.
  2. Wait for 2x maxDisconnectionDuration without sending any events.
  3. Eventually Server will purge all buffered events on its side within maxDisconnectionDuration (+/- a minute or two).
  4. Cause client to reconnect
  5. Server cannot find appropriate offset because it has purged all buffered events.
  6. Client is assigned new ID.

Expected behavior
Connection State Recovery should succeed within maxDisconnectionDuration of disconnection, as long as the server must send at least one event, in order to initialize the offset on the client side.

It might help if server were to store offset of last queued packet in a separate variable, which could be checked even if restoreSession() cannot find the offset in the queue.

If server no longer relies upon pending message queue, we could then use Engine.IO PING/PONG events to trim queue(Once PONG is received, all events sent before the PING that triggered PONG are known to have arrived at the client).

Platform:

  • Client Device: electronic board game
  • OS: FreeRTOS
@bryghtlabs-richard bryghtlabs-richard added the to triage Waiting to be triaged by a member of the team label Jan 23, 2025
@Waxer59
Copy link

Waxer59 commented May 15, 2025

I'm having the same problem, is there any solution?

@bryghtlabs-richard
Copy link
Author

Periodically (maxDisconnectionDuration/2) send server->client garbage packet?

@Waxer59
Copy link

Waxer59 commented May 15, 2025

Maybe, but it would increase the load on the server

@Waxer59
Copy link

Waxer59 commented May 15, 2025

Maybe the problem is here?

Image

@bryghtlabs-richard
Copy link
Author

bryghtlabs-richard commented May 15, 2025

That seems familiar - IIRC, connection state recovery requires this.packets to contain at least one packet. Probably at risk for this bug occurring as soon as maxDisconnectionDuration time passes since the last server->client packet.

Your cursor is highlighting where the server periodically discards packets older than maxDisconnectionDuration - perhaps a quick fix might be to always keep the most recent packet, regardless of maxDisconnectionDuration?

@Waxer59
Copy link

Waxer59 commented May 15, 2025

I've been trying to solve the issue, but the problem is that the packets are being saved without any relationship to who sent them. How can we know which packet should stay?

@bryghtlabs-richard
Copy link
Author

bryghtlabs-richard commented May 15, 2025

That loop goes through each connection, one at a time. It also gives you the socket ID.

@bryghtlabs-richard
Copy link
Author

bryghtlabs-richard commented May 15, 2025

I think you could always keep the last packet in this.packets here, and maybe adjust the spot where we check if a matching packet is in that list to ignore the timestamp if there's only one packet?

Alternatively, you could save off the last packet from this.packets once they're all expired, and adjust the spot where we check if a matching packet is in the list to check both places.

(Also, I thought I should mention I don't know JavaScript, you have now reached further chasing this down than I got).

@Waxer59
Copy link

Waxer59 commented May 16, 2025

The problem is also that right now each packet that is sent is stored in this. packets, and at some point we have to know when to delete them and besides that each packet is sent with a different ID within the array to know where it was left and send the packets that did not reach the client, we must also take into account that the sessions are only created once the client is disconnected, so if for example we make each packet has the id of the session that belongs and delete them only from the array when the client disconnects, when storing the packets, would not be a problem to store everything sent by the server and only delete it when the client disconnects?

For example if in the server we have that to a client has been sent 1000 messages all that would be stored until it disconnected.

@Waxer59
Copy link

Waxer59 commented May 16, 2025

I think restoring the data of a client would be easy but sending the packets that haven't been sent is the issue

@Waxer59
Copy link

Waxer59 commented May 16, 2025

Maybe this method should return the session if the index is -1 and the function to send the lost packets should be a separate thing.

  override restoreSession(
    pid: PrivateSessionId,
    offset: string,
  ): Promise<Session> {
    const session = this.sessions.get(pid);
    if (!session) {
      // the session may have expired
      return null;
    }
    const hasExpired =
      session.disconnectedAt + this.maxDisconnectionDuration < Date.now();
    if (hasExpired) {
      // the session has expired
      this.sessions.delete(pid);
      return null;
    }
    const index = this.packets.findIndex((packet) => packet.id === offset);
    if (index === -1) {
      // the offset may be too old
      return null;
    }
    const missedPackets = [];
    for (let i = index + 1; i < this.packets.length; i++) {
      const packet = this.packets[i];
      if (shouldIncludePacket(session.rooms, packet.opts)) {
        missedPackets.push(packet.data);
      }
    }
    return Promise.resolve({
      ...session,
      missedPackets,
    });
  }

@Waxer59
Copy link

Waxer59 commented May 16, 2025

Perhaps splitting the thresholds and having something like this would be a good solution

export class SessionAwareAdapter extends Adapter {
  private readonly maxDisconnectionDuration: number;
  private readonly maxMissingPacketExpiration: number;

  private sessions: Map<PrivateSessionId, SessionWithTimestamp> = new Map();
  private packets: PersistedPacket[] = [];

  constructor(readonly nsp: any) {
    super(nsp);
    this.maxDisconnectionDuration =
      nsp.server.opts.connectionStateRecovery.maxDisconnectionDuration;
    this.maxMissingPacketExpiration = nsp.server.opts.connectionStateRecovery.maxMissingPacketExpiration

    const timer = setInterval(() => {
      const maxDisconnectionDurationThreshold = Date.now() - this.maxDisconnectionDuration;
      const maxMissingPacketExpirationThreshold = Date.now() - this.maxMissingPacketExpiration;
      
      this.sessions.forEach((session, sessionId) => {
        const hasExpired = session.disconnectedAt < maxDisconnectionDurationThreshold;
        if (hasExpired) {
          this.sessions.delete(sessionId);
        }
      });
      for (let i = this.packets.length - 1; i >= 0; i--) {
        const hasExpired = this.packets[i].emittedAt < maxMissingPacketExpirationThreshold;
        if (hasExpired) {
          this.packets.splice(0, i + 1);
          break;
        }
      }
    }, 60 * 1000);
    // prevents the timer from keeping the process alive
    timer.unref();
  }

  override persistSession(session: SessionToPersist) {
    (session as SessionWithTimestamp).disconnectedAt = Date.now();
    this.sessions.set(session.pid, session as SessionWithTimestamp);
  }

  override restoreSession(
    pid: PrivateSessionId,
    offset: string,
  ): Promise<Session> {
    const session = this.sessions.get(pid);
    if (!session) {
      // the session may have expired
      return null;
    }
    const hasExpired =
      session.disconnectedAt + this.maxDisconnectionDuration < Date.now();
    if (hasExpired) {
      // the session has expired
      this.sessions.delete(pid);
      return null;
    }
    const index = this.packets.findIndex((packet) => packet.id === offset);
    if (index === -1) {
      // the offset may be too old
      return Promise.resolve({
        ...session,
        missedPackets: [],
      })
    }
    const missedPackets = [];
    for (let i = index + 1; i < this.packets.length; i++) {
      const packet = this.packets[i];
      if (shouldIncludePacket(session.rooms, packet.opts)) {
        missedPackets.push(packet.data);
      }
    }
    return Promise.resolve({
      ...session,
      missedPackets,
    });
  }

  override broadcast(packet: any, opts: BroadcastOptions) {
    const isEventPacket = packet.type === 2;
    // packets with acknowledgement are not stored because the acknowledgement function cannot be serialized and
    // restored on another server upon reconnection
    const withoutAcknowledgement = packet.id === undefined;
    const notVolatile = opts.flags?.volatile === undefined;
    if (isEventPacket && withoutAcknowledgement && notVolatile) {
      const id = yeast();
      // the offset is stored at the end of the data array, so the client knows the ID of the last packet it has
      // processed (and the format is backward-compatible)
      packet.data.push(id);
      this.packets.push({
        id,
        opts,
        data: packet.data,
        emittedAt: Date.now(),
      });
    }
    super.broadcast(packet, opts);
  }
}

@bryghtlabs-richard
Copy link
Author

Ohhh, I did not understand that the loop that trims stale packets runs for all packets - I missed that the for-each-session-id doesn't cover that. That makes things much more complicated.

@Waxer59
Copy link

Waxer59 commented May 16, 2025

Ohhh, I did not understand that the loop that trims stale packets runs for all packets - I missed that the for-each-session-id doesn't cover that. That makes things much more complicated.

For the moment I think the best thing to do is to create a package to solve this and when there is an answer see what to do next, wdyt?

@bryghtlabs-richard
Copy link
Author

Sorry, I don't understand Socket.io well enough to offer a useful opinion

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
to triage Waiting to be triaged by a member of the team
Projects
None yet
Development

No branches or pull requests

2 participants