Description
Differentiate status codes from functions vs OpenFaaS components
Why do you need this?
When invoking functions or attempting to retry them, it's currently hard to differentiate between an error caused by Kubernetes (e.g. node eviction during an invocation) and one caused by the function itself (e.g. a 429 because the Twitter API that it was calling is overloaded).
Who is this for?
@kevin-lindsay-1 for Surge has requested this - but Waylay also wanted this for their integration cc @OcamsRazor
Expected Behaviour
A way to determine whether a 500 error was from the gateway / watchdog or from the function
Current Behaviour
There are some hints in the message body and headers, but they are not consistent right now.
List All Possible Solutions and Workarounds
1. Add a header when handling an error condition in any of the OpenFaaS components
2. Add a header when the watchdog receives an HTTP response from a function
Which Solution Do You Recommend?
I recommend 1 - because 2 depends on the use of the watchdog, which is not used by all users.
For 1 - the gateway needs a change since it can invoke functions directly. The provider also needs a change for when direct_functions is set to false and invocations flow through it instead. The watchdog should also have a change so that if it handles an error, that information can be passed back up the stream.
Headers do support multiple values for a key, e.g. X-OpenFaaS-Source: [watchdog, gateway]
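As a rough illustration of the multi-value idea, the sketch below uses Go's standard `net/http` header type. The `markSource` and `demoSources` helpers are hypothetical names, and the component values "watchdog" and "gateway" are taken from the example above; none of this is existing OpenFaaS code.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// sourceHeader is the header name proposed in this issue.
const sourceHeader = "X-OpenFaaS-Source"

// markSource (hypothetical helper) records that the named component handled
// an error. Add appends instead of overwriting, so values already set by
// components further down the stream are preserved.
func markSource(h http.Header, component string) {
	h.Add(sourceHeader, component)
}

// demoSources simulates an error that is marked first by the watchdog and
// then by the gateway, and returns the values a caller would read back.
func demoSources() []string {
	rec := httptest.NewRecorder()
	markSource(rec.Header(), "watchdog")
	markSource(rec.Header(), "gateway")
	rec.WriteHeader(http.StatusBadGateway)
	return rec.Result().Header.Values(sourceHeader)
}

func main() {
	fmt.Println(demoSources())
}
```

Using `Add` rather than `Set` is what lets each component append its own name without discarding the ones recorded before it.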
For 1 - the watchdog does not need to be updated, and even when it's not in use, this header would still propagate and flow.
Then there's a wider conversation about how the queue-worker should retry errors when the "X-OpenFaaS-Source" header is present - assuming that these need to be retried because they were caused by an error during scaling or by a node eviction.
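One possible shape for that retry decision is sketched below. It assumes a policy that has not been agreed in this issue: retry only when the response is an error and at least one OpenFaaS component has set the header. The `fromPlatform`, `shouldRetry` and `decide` names are hypothetical and are not part of the queue-worker.

```go
package main

import (
	"fmt"
	"net/http"
)

// sourceHeader is the header name proposed in this issue.
const sourceHeader = "X-OpenFaaS-Source"

// fromPlatform reports whether any OpenFaaS component marked the response.
func fromPlatform(h http.Header) bool {
	return len(h.Values(sourceHeader)) > 0
}

// shouldRetry applies the assumed policy: retry error responses that were
// marked by the platform, and leave the function's own errors to the caller.
func shouldRetry(status int, h http.Header) bool {
	return status >= 400 && fromPlatform(h)
}

// decide builds a header from the given source values and applies the policy.
// It exists only to make the sketch easy to try out.
func decide(status int, sources ...string) bool {
	h := http.Header{}
	for _, s := range sources {
		h.Add(sourceHeader, s)
	}
	return shouldRetry(status, h)
}

func main() {
	fmt.Println(decide(http.StatusBadGateway, "gateway")) // marked by the gateway
	fmt.Println(decide(http.StatusTooManyRequests))       // the function's own 429
}
```

Under this assumed policy, a 429 returned by the function itself carries no header and is not retried, while a 502 marked by the gateway is.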