-
Notifications
You must be signed in to change notification settings - Fork 331
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
7705 retryable http client #7723
Conversation
🎊 PR Preview 376e69b has been successfully built and deployed to https://treeverse-lakeFS-preview-pr-7723.surge.sh 🕐 Build time: 0.014s 🤖 By surge-preview |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving due to urgency, there are no blocking comments, but please do consider them.
docs/reference/cli.md
Outdated
@@ -87,7 +87,7 @@ lakectl [flags] | |||
{:.no_toc} | |||
|
|||
``` | |||
--base-uri string base URI used for lakeFS address parse | |||
--base-uri string base URI used for lakeFS address parse (default "https://treeverse-proper-boar-yjz5us.us-east-1.lakefscloud.ninja/api/v1") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this have to do lakeFS?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops... that wasn't supposed to be here...
@@ -521,6 +538,10 @@ func initConfig() { | |||
// set defaults | |||
viper.SetDefault("metastore.hive.db_location_uri", "file:/user/hive/warehouse/") | |||
viper.SetDefault("server.endpoint_url", "http://127.0.0.1:8000") | |||
viper.SetDefault("server.retries.enabled", true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't the number of retries enough to decide if we have retries or not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is, but I'd prefer to be explicit if it doesn't cost me anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't cost you, but does cost the user understanding
What happens if I don't want reties and I change retries to 0 because that's what I'm used to, what will happen here?
cmd/lakectl/cmd/root.go
Outdated
if !cfg.Server.Retries.Enabled { | ||
httpClient = &http.Client{ | ||
Transport: transport, | ||
} | ||
} else { | ||
httpClient = NewRetryClient(cfg.Server.Retries.MaxRetries, | ||
cfg.Server.Retries.MinWaitInterval, cfg.Server.Retries.MaxWaitInterval, transport) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: but I would expect the newRetryClient to get the httpClient and not the transport, that we can initiate the client regardless and then just wrap it with the retry client if needed
retryClient.RetryMax = maxRetries | ||
retryClient.RetryWaitMin = retryWaitMin | ||
retryClient.RetryWaitMax = retryWaitMax | ||
retryClient.CheckRetry = lakectlRetryPolicy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still believe the default is good enough for us
cmd/lakectl/cmd/retry_client.go
Outdated
func NewRetryClient(maxRetries int, retryWaitMin, retryWaitMax time.Duration, transport *http.Transport) *http.Client { | ||
retryClient := retryablehttp.NewClient() | ||
if transport != nil { | ||
retryClient.HTTPClient.Transport = transport | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still believe that a simple function in cmd/root.go
will do. Also cmd
is not the correct directory for this, I would expect it to be somewhere more generic, such as pkg/httputil
cmd/lakectl/cmd/retry_client.go
Outdated
retryClient.HTTPClient.Transport = transport | ||
} | ||
retryClient.Logger = nil | ||
retryClient.RetryMax = maxRetries |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aws cli, uses maxAttemps IMO it's clear, In max retries I never remember if the first try counts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine either way, but done.
cmd/lakectl/cmd/root.go
Outdated
MinWaitInterval time.Duration `mapstructure:"min_wait_interval"` | ||
MaxWaitInterval time.Duration `mapstructure:"max_wait_interval"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please document this two
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added inline comments. If you can point me to the documentation, I'll add it there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess a section here mentioning retries would be appropriate..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! request-changes-worthy-comments:
- Lack of docs for the new values
- I think we should start with something that is less configureable.
cmd/lakectl/cmd/root.go
Outdated
Enabled bool `mapstructure:"enabled"` | ||
MaxRetries int `mapstructure:"max_retries"` | ||
MinWaitInterval time.Duration `mapstructure:"min_wait_interval"` | ||
MaxWaitInterval time.Duration `mapstructure:"max_wait_interval"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These new values should go to the cli reference doc page.
I also suggest to start leaner, maybe just Enabled
& MaxRetries
to kick it off? Maybe even just Enabled
?
I believe the defaults you set would be good enough for almost everyone. I don't feel like we have enough information to determine users need to configure intervals for now.. We can always add more custom config later, removing it is the hard part..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see the point of providing fewer configuration options. We lose the flexibility to configure for optimal results, and this way, we'll never get feedback on which settings work for different use cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because it's adding scope and friction. Instead of having "retry mechanism" for lakectl, we have "configurable retry mechanism". The gap isn't big coding-wise, I agree. But now I have to ask annoying questions regarding usage, like MaxRetries
non-positive, MaxWaitInterval
smaller than MinWaitInterval
, why choose these values and not TotalWaitTime
, etc. Without someone requesting something specific we're guessing what they need here, and then changing it might be a breaking change in the cli configuration.
cmd/lakectl/cmd/root.go
Outdated
@@ -140,6 +147,10 @@ const ( | |||
allowEmptyMsgFlagName = "allow-empty-message" | |||
fmtErrEmptyMsg = `commit with no message without specifying the "--allow-empty-message" flag` | |||
metaFlagName = "meta" | |||
|
|||
defaultMaxRetries = 3 | |||
defaultMaxRetryInterval = 1 * time.Second |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Too low IMO
defaultMaxRetryInterval = 1 * time.Second | |
defaultMaxRetryInterval = 5 * time.Second |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used a different value, but raised the max interval.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No comments other than not understanding the motivation for the chosen default values.
cmd/lakectl/cmd/root.go
Outdated
defaultMaxRetries = 3 | ||
defaultMaxRetryInterval = 1 * time.Second | ||
defaultMinRetryInterval = 200 * time.Millisecond |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand these numbers. Suppose the HTTP server never sends a "retry after" header (I don't think lakeFS does). Then we have 3 retries, starting with 0.2 s and going up to 1 second. But 3 retries will sleep for 0.2s and then for 0.4s. When does this max retry interval ever come into play?
If we had 6 retries, then IIUC our wait sequence would be (in seconds) 0.2, 0.4, 0.8, 1.0, 1.0, 1.0.
For example, AFAICT this is the default configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we had 6 retries, then IIUC our wait sequence would be (in seconds) 0.2, 0.4, 0.8, 1.0, 1.0, 1.0
Yes, that's accurate. I'm not sure why this is a problem, but I don't mind using the package defaults instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I still prefer that we start with leaner configuration but won't block on it..
I want to take this opportunity to say that @arielshaqed was right! I should unwrap the error 😊 |
* lakectl - Support Retries
Closes #7705
Change Description
Background
See the associated issue for context.
New Feature
This PR creates a new HTTP client for lakectl, based on go-retryablehttp. It supports retries with a custom retry policy tailored for lakeFS and an exponential backoff retry strategy. Its constructor func includes several parameters to tune the retry max count and intervals and the option to provide a custom http.Transport. For seamless integration, the constructor returns a standard *http.Client.
Testing Details
This PR includes a suite of tests for the custom retry policy. The retry mechanism was tested using the abuse command with a playground installation, a modified DDB table, and a small number of RCUs/WCUs.
The abuse command:
Without retries, this command fails with HTTP 500 errors. With retires it retries the failures and completes successfully.
Thanks @itaiad200 for the abuse command ✌🏻