-
Notifications
You must be signed in to change notification settings - Fork 2.7k
REST: Revert #12818 and additionally stop retrying on 502/504 #13352
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
… 5xx retries by throwing CommitStateUnknown (apache#12818)" This reverts commit 4927610.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @singhpk234 this looks good just had a minor comment on testing that would be good to address
core/src/test/java/org/apache/iceberg/rest/TestExponentialHttpRequestRetryStrategy.java
Outdated
Show resolved
Hide resolved
41ade9a
to
34f57db
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @singhpk234 this looks great now! I'll hold off before merging for @RussellSpitzer @rdblue feedback
core/src/main/java/org/apache/iceberg/rest/ExponentialHttpRequestRetryStrategy.java
Show resolved
Hide resolved
LGTM, @dennishuo has now scared me of 503's and anonymous 409's (not thrown by the IRC) potentially causing issues but I think we can address that in a follow up if we get consensus. |
Thanks @singhpk234 for reverting and fixing again! Thanks @amogh-jahagirdar, @nastra , @dennishuo and @rdblue for discussion and review |
About the change
presently, we retry on 502 / 504 as well here, we have a spec definition stating here that when these status are thrown the commit status can be unknown, so says the RFC as something went wrong in the middle of processing the request. So we retry with 502 and 504 we can conflict with ourselves, as we don't know when we receive the status of 409 is it because we retried on 502 and then got 409 or something else happened, so it's better to throw the commit state unknown if we know the commit has been retried.
The Iceberg users who complained this was with 504 : slack thread - https://apache-iceberg.slack.com/archives/C025PH0G1D4/p1747992294134219
We have similar handling for glue as we did here #7198 as there are internal http retries over the glue SDK.
So I Added a PR considering above #12818
But based on offline discussions with more folks, according to the spec its best to not retry on 502 / 504 as it clearly calls out that the status is unknown.
The change attempts to :
Testing
Adding unit tests
cc @amogh-jahagirdar @rdblue @RussellSpitzer