
compact/tools: add hidden option for retrying downloads #2785

Closed
wants to merge 1 commit

Conversation

GiedriusS
Member

In certain environments, especially private, on-premise clouds, transient
errors can sometimes happen due to misconfiguration or other reasons. In my
case, the S3 endpoint sometimes spuriously reset the TCP connection. Without
functionality like this, it was almost impossible for Thanos Compact to
finish its work, considering that we sometimes have to download hundreds of
gigabytes of files.

Curiously enough, only the download path was affected, so this only adds
retries for that direction.

Opening this up without any tests to see whether this is something we want
to add to Thanos, or whether we should implement
#2784 instead. Or maybe both?

Signed-off-by: Giedrius Statkevičius giedriuswork@gmail.com

  • I added a CHANGELOG entry for this change.
  • Change is not relevant to the end-user.

Changes

Added exponential backoff with a configurable number of retries to block.Download (sketched below).
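
For illustration, here is a minimal sketch of the retry behaviour described above. The downloadWithRetries helper is hypothetical; the actual change wires the backoff into block.Download itself, whose signature is not shown here.

    package main

    import (
        "context"
        "fmt"
        "time"
    )

    // downloadWithRetries calls download up to retries+1 times, doubling the
    // wait between failed attempts (exponential backoff) and honouring
    // context cancellation.
    func downloadWithRetries(ctx context.Context, retries int, download func(context.Context) error) error {
        backoff := time.Second
        for attempt := 0; ; attempt++ {
            err := download(ctx)
            if err == nil {
                return nil
            }
            if attempt >= retries {
                return fmt.Errorf("download failed after %d retries: %w", retries, err)
            }
            select {
            case <-ctx.Done():
                return ctx.Err()
            case <-time.After(backoff):
                backoff *= 2
            }
        }
    }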

Verification

N/A for now


@bwplotka bwplotka left a comment


cc @kakkoyun
I think Kemal and I have similar questions about where to put timeouts. Let's bring this up at the next Thanos community meeting? It would also be nice to talk through some ideas and tradeoffs here, maybe in an issue 🤔
WDYT?

@@ -499,6 +500,9 @@ func (cc *compactConfig) registerFlag(cmd *kingpin.CmdClause) *compactConfig {
"This works well for deduplication of blocks with **precisely the same samples** like produced by Receiver replication.").
Hidden().StringsVar(&cc.dedupReplicaLabels)

cmd.Flag("download.retries", "How many times to retry downloading a block with exponential backoff before giving up").
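
The diff line above is cut off at the review anchor; a plausible completion of the flag registration, matching the chained style of the surrounding flags (the cc.downloadRetries field name and the default of 0 are assumptions), would be:

    cmd.Flag("download.retries", "How many times to retry downloading a block with exponential backoff before giving up.").
        Hidden().Default("0").UintVar(&cc.downloadRetries)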

hm, again, this is similar to the #2908 discussion. Shouldn't this be applied by the upload/download client itself?
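
For comparison, retries applied by the storage client itself might look roughly like the following sketch. The getter interface and retryingGetter type are simplifications; the real objstore.Bucket interface has more methods that a complete wrapper would need to forward.

    package main

    import (
        "context"
        "io"
        "time"
    )

    // getter is a stand-in for the read side of an object-storage client.
    type getter interface {
        Get(ctx context.Context, name string) (io.ReadCloser, error)
    }

    // retryingGetter retries failed Gets with exponential backoff, so every
    // caller benefits without per-command flags such as download.retries.
    type retryingGetter struct {
        next    getter
        retries int
    }

    func (r retryingGetter) Get(ctx context.Context, name string) (io.ReadCloser, error) {
        backoff := 100 * time.Millisecond
        for attempt := 0; ; attempt++ {
            rc, err := r.next.Get(ctx, name)
            if err == nil {
                return rc, nil
            }
            if attempt >= r.retries {
                return nil, err
            }
            select {
            case <-ctx.Done():
                return nil, ctx.Err()
            case <-time.After(backoff):
                backoff *= 2
            }
        }
    }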


stale bot commented Sep 22, 2020

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

stale bot added the stale label Sep 22, 2020
stale bot closed this Sep 30, 2020
@bwplotka
Member

😢
