Add proper handling of 'http2: server sent GOAWAY and closed the connection' #10111

Closed
2 tasks done
ra-coder opened this issue Sep 6, 2023 · 9 comments

Comments

@ra-coder

ra-coder commented Sep 6, 2023

Welcome!

  • Yes, I've searched similar issues on GitHub and didn't find any.
  • Yes, I've searched similar issues on the Traefik community forum and didn't find any.

What did you do?

In my production installation, under heavy load, I get a 500 from Traefik and the following entry in the error log:

{"level":"debug","msg":"'500 Internal Server Error' caused by: http2: Transport: cannot retry err [http2: Transport received Server's graceful shutdown GOAWAY] after Request.Body was written; define Request.GetBody to avoid this error"}
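
The error message itself points at `Request.GetBody`. As a minimal sketch (this is not Traefik's code; `newReplayableRequest` and the URL are hypothetical), here is how a Go client whose payload is already in memory could make its request body replayable, which is what the message asks for. A reverse proxy that streams the client's body would have to buffer it first, and the sketch does not cover that.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

// newReplayableRequest (hypothetical helper) builds a POST request whose
// body can be recreated on demand through GetBody.
func newReplayableRequest(url string, payload []byte) (*http.Request, error) {
	// Wrapping the reader in io.NopCloser hides its concrete type, so
	// http.NewRequest cannot fill in GetBody or ContentLength by itself.
	req, err := http.NewRequest(http.MethodPost, url, io.NopCloser(bytes.NewReader(payload)))
	if err != nil {
		return nil, err
	}
	req.ContentLength = int64(len(payload))
	// GetBody returns a fresh reader over the same bytes each time it is called.
	req.GetBody = func() (io.ReadCloser, error) {
		return io.NopCloser(bytes.NewReader(payload)), nil
	}
	return req, nil
}

func main() {
	// The URL is a placeholder; no network call is made here.
	req, err := newReplayableRequest("https://backend.example/upload", []byte(`{"k":"v"}`))
	if err != nil {
		panic(err)
	}
	// Obtain a second copy of the body, as a caller would before resending.
	body, err := req.GetBody()
	if err != nil {
		panic(err)
	}
	data, err := io.ReadAll(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(data))
}
```

Note that `http.NewRequest` already sets `GetBody` when the body is a `*bytes.Buffer`, `*bytes.Reader`, or `*strings.Reader`; the manual assignment only matters for other reader types.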

There is a long discussion of this here:
golang/go#18639

Here is a brief summary:
https://stackoverflow.com/questions/45209168/http2-server-sent-goaway-and-closed-the-connection-laststreamid-1999/77049485#77049485

This is an issue with how Go handles the behavior of the AWS load balancer (or others) over an HTTP/2 connection.

https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-troubleshooting.html

The load balancer sends a response code of 000: With HTTP/2 connections, if the compressed length of any of the headers exceeds 8 K bytes or if the number of requests served through one connection exceeds 10,000, the load balancer sends a GOAWAY frame and closes the connection with a TCP FIN.

So the root cause is either "too heavy a header" or a DDoS-like spike (high load at that moment) on the server/service to which your Go app sends HTTP/2 requests.

I suggest handling this error in Traefik and returning a 503 (Service Unavailable) HTTP status code.
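
To make the suggestion concrete, here is a rough sketch of the idea using the standard library's `httputil.ReverseProxy`. It is not Traefik's implementation. The helper names (`isGoAwayMessage`, `statusForUpstreamError`, `newProxy`) are hypothetical, and matching on the substring "GOAWAY" is an assumed heuristic, because the HTTP/2 error types bundled in `net/http` are not exported.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httputil"
	"net/url"
	"strings"
)

// isGoAwayMessage reports whether an error message mentions a GOAWAY frame.
// Assumption: string matching, since net/http does not export its HTTP/2 error types.
func isGoAwayMessage(msg string) bool {
	return strings.Contains(msg, "GOAWAY")
}

// statusForUpstreamError picks the status code to return to the client when
// the round trip to the backend failed with err.
func statusForUpstreamError(err error) int {
	if err != nil && isGoAwayMessage(err.Error()) {
		return http.StatusServiceUnavailable // proposed: 503 for GOAWAY
	}
	return http.StatusInternalServerError // otherwise: 500
}

// newProxy wires the mapping above into a single-host reverse proxy.
func newProxy(target *url.URL) *httputil.ReverseProxy {
	proxy := httputil.NewSingleHostReverseProxy(target)
	proxy.ErrorHandler = func(w http.ResponseWriter, r *http.Request, err error) {
		status := statusForUpstreamError(err)
		if status == http.StatusServiceUnavailable {
			// Hint to clients that an immediate retry is reasonable.
			w.Header().Set("Retry-After", "1")
		}
		http.Error(w, http.StatusText(status), status)
	}
	return proxy
}

func main() {
	goAway := errors.New("http2: Transport: cannot retry err [http2: Transport received Server's graceful shutdown GOAWAY] after Request.Body was written; define Request.GetBody to avoid this error")
	fmt.Println(statusForUpstreamError(goAway))
	fmt.Println(statusForUpstreamError(errors.New("connection reset by peer")))
}
```

Keeping the status decision in a small pure function makes it easy to test on its own, whichever status code the maintainers end up preferring.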

What did you see instead?

{"level":"debug","msg":"'500 Internal Server Error' caused by: http2: Transport: cannot retry err [http2: Transport received Server's graceful shutdown GOAWAY] after Request.Body was written; define Request.GetBody to avoid this error"}

What version of Traefik are you using?

2.9

What is your environment & configuration?

docker_traefik_image: "traefik:2.9"

If applicable, please paste the log output in DEBUG level

No response

@nmengin
Contributor

nmengin commented Sep 11, 2023

Hey @ra-coder!

Thanks for your suggestion.

After discussing it internally, we think it is better to stay aligned with the Go implementation.
However, we'd like to know the community's opinion on this topic.

So we are going to leave the status as "kind/proposal" to give the community time to let us know whether they would like this or not.
We will reevaluate as people respond.

@nmengin nmengin added kind/proposal a proposal that needs to be discussed. area/server and removed status/0-needs-triage labels Sep 11, 2023
@DOBRO-228

This comment was marked as off-topic.

@ra-coder
Author

@nmengin

Can you give an idea of how long you will wait for community feedback?

@rtribotte rtribotte self-assigned this Oct 23, 2023
@rtribotte
Member

Hello @ra-coder,

Regarding your proposal and the use of the 503 status code versus the 500 status code in Traefik, I would like to understand the rationale for favoring 503 over 500 in this context.

While both status codes indicate server errors, they do carry different meanings. A 500 status code typically suggests an unhandled server error (the actual situation from Traefik's point of view), while a 503 status code signals that the server is intentionally unavailable or overloaded.

In Traefik, the 503 status code also has a special meaning: it reflects an empty server load balancer situation.

Could you elaborate on why the 503 status code would be considered more suitable in this scenario?

@ra-coder
Author

ra-coder commented Oct 23, 2023

Hello @rtribotte!

When Traefik is used as a proxy, it returns the status code of the service behind it. The case above is an HTTP/2 issue, which is most similar to the 503 case: the server behind Traefik (an Amazon load balancer in my case) is not able to handle the request at the moment, but a retry a couple of seconds later will return a 200 HTTP status code for the same request.

@rtribotte
Member

Hello @ra-coder,

This is not exactly what is happening. As per this comment, the server behind Traefik did not behave as expected, and an error occurred. If there had been no error on the read operation, Traefik would have seamlessly forwarded the server's status code to the client.

For that reason (there was an error while reading the response), I think Traefik's behavior is accurate, and a 500 status code is accurate too. Hence my previous question: how would a 503 status code be better than a 500 status code in that case?
A 500 status code is retryable as well.
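
As an illustration of the retry point, here is a small client-side sketch that treats both 500 and 503 as retryable and backs off between attempts. The names (`isRetryable`, `backoff`, `getWithRetry`) and the delays are hypothetical. It only issues GET requests, which carry no body and so avoid the `Request.GetBody` problem altogether.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"time"
)

// isRetryable treats both 500 and 503 as worth retrying.
func isRetryable(status int) bool {
	return status == http.StatusInternalServerError || status == http.StatusServiceUnavailable
}

// backoff doubles the base delay for each retry attempt (0-based).
func backoff(attempt int, base time.Duration) time.Duration {
	return base << attempt
}

// getWithRetry issues a GET up to maxAttempts times, sleeping between tries.
func getWithRetry(client *http.Client, url string, maxAttempts int, base time.Duration) (*http.Response, error) {
	var lastErr error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		resp, err := client.Get(url)
		if err == nil && !isRetryable(resp.StatusCode) {
			return resp, nil
		}
		if err == nil {
			resp.Body.Close() // discard the failed response before retrying
			lastErr = fmt.Errorf("retryable status %d", resp.StatusCode)
		} else {
			lastErr = err
		}
		if attempt < maxAttempts-1 {
			time.Sleep(backoff(attempt, base))
		}
	}
	return nil, fmt.Errorf("giving up after %d attempts: %w", maxAttempts, lastErr)
}

func main() {
	// Local test server that fails twice with 500, then succeeds.
	var calls int32
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if atomic.AddInt32(&calls, 1) < 3 {
			w.WriteHeader(http.StatusInternalServerError)
			return
		}
		w.WriteHeader(http.StatusOK)
	}))
	defer srv.Close()

	resp, err := getWithRetry(srv.Client(), srv.URL, 5, 10*time.Millisecond)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println(resp.StatusCode, "after", atomic.LoadInt32(&calls), "calls")
}
```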

@ra-coder
Author

ra-coder commented Oct 25, 2023

But right now it is not even a 500: the error is not handled in Traefik, and the frontend gets a dropped connection. If you insist on the 500 HTTP status code, fine, but please add handling for the error.

503: Service Unavailable — Usually a temporary status that indicates the server may be overloaded, down for maintenance, etc. Whatever the reason, it’s unable to handle requests at the moment.

500: Internal Server Error — The classic server error. There isn’t a problem with the request, but rather the server. It is a vague status and doesn’t give much more information about what the problem is other than the problem is unexpected.

@rtribotte
Member

"In my production installation under big load i get 500 from traefik and next in error log"

I was acknowledging this from your original comment.
Could you please provide more information to better describe what is happening?

@traefiker
Contributor

Hi! I'm Træfiker 🤖 the bot in charge of tidying up the issues. I have to close this one because of its lack of activity 😞 Feel free to re-open it or join our Community Forum.

@traefik traefik locked and limited conversation to collaborators Jan 25, 2024