WebSocketProvider handle ws close and reconnect #1053

Open
@PierreJeanjacquot

Hi @ricmoo,

I'm using WebSocketProvider server-side to listen to blockchain events and to perform calls to smart contracts.
Sometimes the websocket connection breaks and I need to reconnect it.

I use this code to detect ws close and reconnect but it would be nice to not have to rely on _websocket to do it:

let wsProvider;

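// sleep is not shown in the original snippet; a minimal version is assumed here.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

const init = async () => {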
  wsProvider = new ethers.providers.WebSocketProvider(wsHost);
  wsProvider._websocket.on('close', async (code) => {
    console.log('ws closed', code);
    wsProvider._websocket.terminate();
    await sleep(3000); // wait before reconnect
    init();
  });
  wsProvider.on('block', doStuff);
};

I also noticed that when the websocket is broken, pending Promise calls don't reject, which is not very intuitive.
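On the point about calls that never reject: one common workaround is to race each call against a timer, so that a dead socket surfaces as an error instead of a promise that hangs forever. Below is a minimal sketch of that idea. `withTimeout` is a hypothetical helper, not part of ethers, and the timeout value is something you would need to tune yourself.

```typescript
// Hypothetical helper (not an ethers API): rejects if `promise` has not
// settled within `ms` milliseconds, otherwise passes its result through.
function withTimeout<T>(promise: Promise<T>, ms: number, label = "request"): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms,
    );
    promise.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (error) => {
        clearTimeout(timer);
        reject(error);
      },
    );
  });
}

// Possible usage with the provider from the snippet above:
//   const blockNumber = await withTimeout(wsProvider.getBlockNumber(), 10000, "getBlockNumber");
```

Note that this only bounds how long the caller waits. The underlying request is not cancelled, and the socket still has to be replaced separately.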

ricmoo

ricmoo commented on Oct 23, 2020

@ricmoo
Member

This is a very large feature... When I first (begrudgingly) added WebSocketProvider, I mentioned this would be something I would eventually get to, but that it wouldn't be high priority any time soon. :)

But I want to! :)

It is still on the backlog, and I'll use this issue to track it, but there are other things I need to work on first.

Keep in mind when you reconnect, you may have been disconnected for a long time, in which case you should find and trigger events that were missed; you may have also been down for a short period of time, in which case you must dedup events you've already emitted. Also, earlier events should be emitted before later ones. Unless there was a re-org, exactly-once semantics should be adhered to. All subscriptions will need some custom logic, depending on the type of subscription, to handle this.
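To make the dedup and ordering requirements concrete, here is a rough sketch for log-style events. It assumes each event carries a `blockNumber` and a `logIndex`, and it keeps a cursor at the last event emitted. `OrderedDeduper` and `resumeFromBlock` are made-up names for illustration, and re-orgs are deliberately not handled.

```typescript
// Minimal shape needed for ordering; real logs carry more fields.
interface LogLike {
  blockNumber: number;
  logIndex: number;
}

// Hypothetical helper: emits each log at most once, in
// (blockNumber, logIndex) order. Re-orgs are NOT handled here.
class OrderedDeduper<T extends LogLike> {
  private lastBlock = -1;
  private lastIndex = -1;
  private emit: (log: T) => void;

  constructor(emit: (log: T) => void) {
    this.emit = emit;
  }

  // Feed a batch of live events or a backfill fetched after a reconnect.
  // Logs at or before the cursor are dropped as duplicates.
  push(batch: T[]): void {
    const sorted = [...batch].sort(
      (a, b) => a.blockNumber - b.blockNumber || a.logIndex - b.logIndex,
    );
    for (const log of sorted) {
      const isNew =
        log.blockNumber > this.lastBlock ||
        (log.blockNumber === this.lastBlock && log.logIndex > this.lastIndex);
      if (!isNew) continue;
      this.lastBlock = log.blockNumber;
      this.lastIndex = log.logIndex;
      this.emit(log);
    }
  }

  // Block from which a backfill query could start after a reconnect.
  resumeFromBlock(): number {
    return Math.max(this.lastBlock, 0);
  }
}
```

After a reconnect, the idea would be to query logs from `resumeFromBlock()` onward and push them through the same instance before resuming live events, so overlaps between the backfill and the live stream are dropped.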

Also ethers providers guarantee consistent read-after-events. So, if a block number X has been emitted, a call to getBlock(X) must return a block. In many cases, due to the distributed nature of the Blockchain, especially with a FallbackProvider, one backend may have seen a block before others, so calling getBlock might occur on a node before it has actually seen the block, so the call must stall and (with exponential back-off) poll for the block and resolve it when it comes in. Similarly, this is true for events which include the transactionHash; a call to getTransaction must succeed, stalling until the data becomes available.
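The "stall and poll with exponential back-off" behaviour described above can be sketched roughly as follows. This is an illustration of the technique only, not the code ethers uses internally. `pollUntilAvailable` and its options are hypothetical names, and it assumes a lookup that resolves to `null` while the data is not yet available.

```typescript
interface PollOptions {
  initialDelayMs?: number;
  maxDelayMs?: number;
  maxAttempts?: number;
}

const pause = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: calls `read` until it yields a non-null value,
// doubling the wait between attempts up to `maxDelayMs`.
async function pollUntilAvailable<T>(
  read: () => Promise<T | null>,
  options: PollOptions = {},
): Promise<T> {
  const { initialDelayMs = 100, maxDelayMs = 5000, maxAttempts = 10 } = options;
  let delay = initialDelayMs;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const value = await read();
    if (value !== null) return value;
    if (attempt < maxAttempts) {
      await pause(delay);
      delay = Math.min(delay * 2, maxDelayMs);
    }
  }
  throw new Error(`value not available after ${maxAttempts} attempts`);
}
```

Capping the delay and the number of attempts is a design choice made for this sketch; a real implementation might instead keep polling until the provider is destroyed.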

Also take special note of the block, debug, poll and network events, which themselves need some coordination and may require some changes in their superclass to be handled properly...

Basically, it's a feature I really want too, but I know it's going to take considerable time to complete and properly test. I just wanted to give some background on the complexity.

mikevercoelen

mikevercoelen commented on Mar 27, 2021

@mikevercoelen

I think this is probably the best solution:

const EXPECTED_PONG_BACK = 15000
const KEEP_ALIVE_CHECK_INTERVAL = 7500

export const startConnection = () => {
  provider = new ethers.providers.WebSocketProvider(config.ETH_NODE_WSS)

  let pingTimeout = null
  let keepAliveInterval = null

  provider._websocket.on('open', () => {
    keepAliveInterval = setInterval(() => {
      logger.debug('Checking if the connection is alive, sending a ping')

      provider._websocket.ping()

      // Use `WebSocket#terminate()`, which immediately destroys the connection,
      // instead of `WebSocket#close()`, which waits for the close timer.
      // Delay should be equal to the interval at which your server
      // sends out pings plus a conservative assumption of the latency.
      pingTimeout = setTimeout(() => {
        provider._websocket.terminate()
      }, EXPECTED_PONG_BACK)
    }, KEEP_ALIVE_CHECK_INTERVAL)

    // TODO: handle contract listeners setup + indexing
  })

  provider._websocket.on('close', () => {
    logger.error('The websocket connection was closed')
    clearInterval(keepAliveInterval)
    clearTimeout(pingTimeout)
    startConnection()
  })

  provider._websocket.on('pong', () => {
    logger.debug('Received pong, so connection is alive, clearing the timeout')
    clearTimeout(pingTimeout)
  })
}

This sends a ping every 7.5 seconds (KEEP_ALIVE_CHECK_INTERVAL). After each ping, it expects a pong back within 15 seconds (EXPECTED_PONG_BACK); otherwise it terminates the connection, and the close handler calls the main startConnection function to start everything over.

Where it says // TODO: handle contract listeners setup + indexing that's where you should do any indexing or listening for contract events etc.

Fine-tune these timing variables to taste, depending on who your node provider is. These are the settings I use for QuikNode:

const EXPECTED_PONG_BACK = 15000
const KEEP_ALIVE_CHECK_INTERVAL = 7500

gwendall

gwendall commented on May 17, 2021

@gwendall

To elaborate on @mikevercoelen's answer, I extracted the logic to a function

type KeepAliveParams = {
  provider: ethers.providers.WebSocketProvider;
  onDisconnect: (err: any) => void;
  expectedPongBack?: number;
  checkInterval?: number;
};

const keepAlive = ({
  provider,
  onDisconnect,
  expectedPongBack = 15000,
  checkInterval = 7500,
}: KeepAliveParams) => {
  let pingTimeout: NodeJS.Timeout | null = null;
  let keepAliveInterval: NodeJS.Timeout | null = null;

  provider._websocket.on('open', () => {
    keepAliveInterval = setInterval(() => {
      provider._websocket.ping();

      // Use `WebSocket#terminate()`, which immediately destroys the connection,
      // instead of `WebSocket#close()`, which waits for the close timer.
      // Delay should be equal to the interval at which your server
      // sends out pings plus a conservative assumption of the latency.
      pingTimeout = setTimeout(() => {
        provider._websocket.terminate();
      }, expectedPongBack);
    }, checkInterval);
  });

  // eslint-disable-next-line @typescript-eslint/no-explicit-any
  provider._websocket.on('close', (err: any) => {
    if (keepAliveInterval) clearInterval(keepAliveInterval);
    if (pingTimeout) clearTimeout(pingTimeout);
    onDisconnect(err);
  });

  provider._websocket.on('pong', () => {
    if (pingTimeout) clearTimeout(pingTimeout);
  });
};

Then, in my code, I have:

const startBot = () => {
  const provider = new ethers.providers.WebSocketProvider(wsUrl);
  keepAlive({
    provider,
    onDisconnect: (err) => {
      startBot();
      console.error('The ws connection was closed', JSON.stringify(err, null, 2));
    },
  });
};

mikevercoelen

mikevercoelen commented on May 17, 2021

@mikevercoelen

We're two months in, and the code mentioned before has been running steadily on our node :) 0 downtime.

gwendall

gwendall commented on May 17, 2021

@gwendall

Really cool ! Thanks again for sharing :)

sentilesdal

sentilesdal commented on May 19, 2021

@sentilesdal

@mikevercoelen I'm using ethers 5.0.32 and the websocket provider doesn't have the 'on' method which really hampers implementing your solution ;). What version of ethers are you using?

ricmoo

ricmoo commented on May 19, 2021

@ricmoo
Member

There should definitely be an .on method. There is no version of WebSocketProvider that didn’t have it, since it inherits from JsonRpcProvider.

sentilesdal

sentilesdal commented on May 20, 2021

@sentilesdal

OK, well, I'm not sure what's going on. It's definitely not there; I'm seeing an interface for provider._websocket that looks just like a regular WebSocket interface: https://developer.mozilla.org/en-US/docs/Web/API/WebSocket/onopen.

Is there a typo in the code above? Perhaps instead of
provider._websocket.on('open', () => {})
I should be calling these directly on the provider? I tried this too but the provider doesn't recognize the 'open', 'close', and 'pong' event types. websocket-provider.ts from ethers only shows these event types: 'block', 'pending', 'filter', 'tx', 'debug', 'poll', 'willPoll', 'didPoll', 'error'.

ricmoo

ricmoo commented on May 20, 2021

@ricmoo
Member

Oh! Sorry, yes. In general you should use provider.on. The _websocket is a semi-private member and should not generally be touched unless direct access to it is needed. But only ethers-supported events are supported by provider.on.

It depends on your environment what your ._websocket is. Some platforms may use .addEventListener instead of .on, maybe?

If your goal is to enable automatic reconnect, this is not something that is simple to do in a safe way, so make sure you test it thoroughly. :)
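For the `.on` versus `.addEventListener` difference mentioned above, a small adapter can hide which listener style the underlying socket offers. This is only a sketch: `SocketLike` and `listen` are hypothetical names, and the structural type is an assumption about what `provider._websocket` may look like in a given environment.

```typescript
// Structural type covering both listener styles. Which one is present
// depends on the environment (Node-style emitter vs. browser WebSocket).
interface SocketLike {
  on?(event: string, handler: (...args: unknown[]) => void): unknown;
  addEventListener?(event: string, handler: (...args: unknown[]) => void): unknown;
}

// Hypothetical helper: registers `handler` using whichever API exists.
function listen(
  socket: SocketLike,
  event: string,
  handler: (...args: unknown[]) => void,
): void {
  if (typeof socket.on === "function") {
    socket.on(event, handler);
  } else if (typeof socket.addEventListener === "function") {
    socket.addEventListener(event, handler);
  } else {
    throw new Error("socket supports neither .on nor .addEventListener");
  }
}
```

This covers `open` and `close`, but not the heartbeat itself: the browser WebSocket API exposes neither a `ping()` method nor a `pong` event, so the ping/pong keep-alive from the earlier comments can only work where the socket comes from a Node library such as `ws`.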

sentilesdal

sentilesdal commented on May 20, 2021

@sentilesdal

We are actually using Alchemy, so we were able to just use their web3 websocket provider and plug it into our ethers ecosystem with ethers.providers.Web3Provider. They handle all the reconnects and even dropped calls very gracefully.

rrecuero

rrecuero commented on Jun 9, 2021

@rrecuero

One question, @ricmoo, @gwendall: when trying to use the code snippet above, I find that the websocket object doesn't have an on method.

I am using the latest ethers 5.3 from the dapp

sentilesdal

sentilesdal commented on Jun 9, 2021

@sentilesdal

@rrecuero I ran into the same problem and I'm still not sure how that code above works :P

