Allow to handle consumer retry failure at the user level #643

goriunov · 2020-02-17T01:05:40Z

Initial implementation for:
#69

Allow to handle consumer retry failure at the user level.

If you have any suggestions, improvements, or changes, let me know. It would be good to have this, or something similar, soon :)

goriunov · 2020-02-17T02:38:57Z

I don't think the test failure is related to this change.

Nevon · 2020-02-28T10:38:13Z

Could you provide an example use case for this change? I don't quite see its purpose, especially since, in the current implementation, there's no way for the user to actually affect the consumer (stopping it from restarting, for example). As it stands, it's simply a hook that lets you run arbitrary code on crashes, which we already have in the form of the consumer.events.CRASH instrumentation event.
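
To illustrate the point about the hook being observe-only, here is a minimal sketch. It uses a plain Node.js EventEmitter as a stand-in for the consumer, so the event name string and the payload shape are assumptions for illustration rather than the exact KafkaJS instrumentation API. The listener sees the crash, but nothing it returns reaches the code that schedules the restart:

```javascript
const { EventEmitter } = require('events')

// Stand-in for a consumer that exposes a crash event (hypothetical shape).
const consumer = new EventEmitter()
consumer.events = { CRASH: 'consumer.crash' }

const observed = []

// The emitter ignores whatever the listener returns.
consumer.on(consumer.events.CRASH, ({ payload }) => {
  observed.push(payload.error.message)
  return false // no effect: there is no channel back to the caller
})

// Simulates the consumer reporting a crash and then restarting regardless.
const crash = error => {
  consumer.emit(consumer.events.CRASH, { payload: { error, groupId: 'example-group' } })
  return 'restart scheduled'
}

const outcome = crash(new Error('connection lost'))
```

In this sketch `outcome` is always 'restart scheduled', whatever the listener does, which is the limitation being described.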

goriunov · 2020-02-29T09:08:55Z

@Nevon from what I understand, this change will allow users to stop the automatic restart when the consumer fails, if they wish. setTimeout(() => restart(onCrash), retryTime) is responsible for restarting the consumer, and I believe users should have a way to stop the auto-restart (which is impossible at the moment).

We had a few cases caused by network issues (hard to reproduce consistently) where we wanted to stop the consumer, wait for the system to recover, and then create a new consumer. What happened instead: we trigger a disconnect, which does not resolve; then, once the network is back, the old consumer reconnects while the runner thinks the consumer is disconnected. The consumer ends up in a bad state, all because of the setTimeout above, which is impossible to cancel. We currently use a fork in production where this setTimeout has been disabled, so we can correctly fail the consumer and create a new one without leaving broken, non-working consumers connected to our consumer groups (CGs).

I think this setTimeout should at least be behind some kind of flag/option.

Nevon · 2020-02-29T10:14:08Z

I understand. So the goal is to allow users to bail out of the restart cycle. I can see how that would be useful, and I think it's a reasonable thing to allow for.

In that case, we should change the implementation a bit though. Currently it's just a synchronous function that receives no arguments and returns nothing, which means that pretty much the only thing you can do in there is process.exit - no way to shut down resources cleanly or anything.

I would say that we should change it to be an async function that receives the error as an argument and then communicates back to the caller whether or not it should restart. Something like:

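if (e.name === 'KafkaJSNumberOfRetriesExceeded' || e.retriable === true) {
  // Restart by default; if the user supplied restartOnFailure, let it decide.
  // .catch is attached to the returned promise (not the awaited value), so a
  // rejecting callback falls back to restarting.
  const shouldRestart = !retry.restartOnFailure || (await retry.restartOnFailure(e).catch(error => {
    logger.error('Caught error when invoking user-provided "restartOnFailure" callback. Defaulting to restarting.', {
      error: error.message || error,
      originalError: e.message || e,
      groupId,
    })

    return true
  }))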

  if (shouldRestart) {
    const retryTime = e.retryTime || retry.initialRetryTime || initialRetryTime
    logger.error(`Restarting the consumer in ${retryTime}ms`, {
      retryCount: e.retryCount,
      retryTime,
      groupId,
    })

    setTimeout(() => restart(onCrash), retryTime)
  }
}

This would allow you to stop the restarting depending on the error, as well as clean up any resources you might have (database connections etc.) or wait for ongoing processes to finish.
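
As a rough illustration of the user side, the sketch below pairs a hypothetical restartOnFailure callback with a small shouldRestartAfter helper that mirrors the decision logic proposed above. The helper, the closeDatabase function, and the "give up after two restarts" rule are assumptions made up for this example; in real use the callback would presumably be passed in the consumer's retry options.

```javascript
// Hypothetical cleanup, standing in for closing database connections etc.
let databaseOpen = true
const closeDatabase = async () => {
  databaseOpen = false
}

// User-provided callback: receives the error and resolves to
// true (restart) or false (stop restarting).
let restarts = 0
const MAX_RESTARTS = 2 // arbitrary limit chosen for this example
const restartOnFailure = async e => {
  if (restarts >= MAX_RESTARTS) {
    await closeDatabase() // clean up before bailing out
    return false
  }
  restarts += 1
  return true
}

// Mirrors the proposed decision: no callback means restart,
// and a rejecting callback also defaults to restarting.
const shouldRestartAfter = async (retry, e) =>
  !retry.restartOnFailure || (await retry.restartOnFailure(e).catch(() => true))

const run = async () => {
  const e = { name: 'KafkaJSNumberOfRetriesExceeded' }
  const decisions = []
  for (let i = 0; i < 3; i++) {
    decisions.push(await shouldRestartAfter({ restartOnFailure }, e))
  }
  // No callback configured, then a callback that rejects.
  decisions.push(await shouldRestartAfter({}, e))
  const failing = async () => {
    throw new Error('boom')
  }
  decisions.push(await shouldRestartAfter({ restartOnFailure: failing }, e))
  return decisions
}
```

Calling run() once resolves to [true, true, false, true, true]: two restarts are allowed, the third crash triggers cleanup and stops the cycle, and the last two cases fall back to restarting.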

goriunov · 2020-03-01T19:44:16Z

@Nevon your approach looks better :) I have updated the PR; let me know if anything else should be added.

Nevon · 2020-03-02T07:53:32Z

I fixed a syntax error in the code that I proposed, and also pushed some tests to verify that it actually does what it says on the tin.

moreiravictor · 2022-01-07T14:55:52Z

Was this option released? I really needed it!

goriunov requested review from tulios and Nevon February 17, 2020 01:07

goriunov and others added 3 commits March 1, 2020 10:03

Update index.js

b36f186

Add onRetryFail function

5ae4072

Add async restartOnFailure

a86b098

goriunov force-pushed the master branch from 4ac9f54 to a86b098 Compare March 1, 2020 19:34

Nevon added 2 commits March 2, 2020 08:50

Fix syntax error

937c5d3

Test restartOnFailure

c6fe0ba

Nevon merged commit 728a325 into tulios:master Mar 2, 2020

goriunov mentioned this pull request Mar 15, 2020

Fix missing typings for restartOnFailure #664

Merged

ankon mentioned this pull request Apr 5, 2020

Exposing "is consumer running" to clients #688

Closed

Nevon mentioned this pull request Apr 6, 2020

Document Consumer retry.restartOnFailure #690

Closed

ankon mentioned this pull request Jun 24, 2020

Question: TypeError: request is not a function #779

Closed

tulios mentioned this pull request Sep 11, 2020

Implement a means to react on retries #69

Closed
