Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add configurable retry policy for S3 client #21900

Merged
merged 1 commit into from
May 21, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 10 additions & 1 deletion docs/src/main/sphinx/object-storage/file-system-s3.md
Original file line number Diff line number Diff line change
Expand Up @@ -68,7 +68,16 @@ support:
* - `s3.http-proxy`
- URL of a HTTP proxy server to use for connecting to S3.
* - `s3.http-proxy.secure`
- Set to `true` to enable HTTPS for the proxy server..
- Set to `true` to enable HTTPS for the proxy server.
* - `s3.retry-mode`
- Specifies how the AWS SDK attempts retries. Default value is `LEGACY`.
Other allowed values are `STANDARD` and `ADAPTIVE`. The `STANDARD` mode
includes a standard set of errors that are retried. `ADAPTIVE` mode is
experimental, which includes the functionality of `STANDARD` mode and
includes automatic client-side throttling.
* - `s3.max-error-retries`
- Specifies maximum number of retries the client will make on errors.
Defaults to `10`.
:::

## Authentication
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -60,6 +60,22 @@ public static ObjectCannedACL getCannedAcl(S3FileSystemConfig.ObjectCannedAcl ca
}
}

public enum RetryMode
{
anusudarsan marked this conversation as resolved.
Show resolved Hide resolved
STANDARD,
LEGACY,
ADAPTIVE;

public static software.amazon.awssdk.core.retry.RetryMode getRetryMode(RetryMode retryMode)
findepi marked this conversation as resolved.
Show resolved Hide resolved
{
return switch (retryMode) {
case STANDARD -> software.amazon.awssdk.core.retry.RetryMode.STANDARD;
case LEGACY -> software.amazon.awssdk.core.retry.RetryMode.LEGACY;
case ADAPTIVE -> software.amazon.awssdk.core.retry.RetryMode.ADAPTIVE;
};
}
}

private String awsAccessKey;
private String awsSecretKey;
private String endpoint;
Expand All @@ -83,6 +99,8 @@ public static ObjectCannedACL getCannedAcl(S3FileSystemConfig.ObjectCannedAcl ca
private HostAndPort httpProxy;
private boolean httpProxySecure;
private ObjectCannedAcl objectCannedAcl = ObjectCannedAcl.NONE;
private RetryMode retryMode = RetryMode.LEGACY;
private int maxErrorRetries = 10;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you inspire yourself from hadoop s3 code with the number of retries?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes


public String getAwsAccessKey()
{
Expand Down Expand Up @@ -224,6 +242,32 @@ public S3FileSystemConfig setCannedAcl(ObjectCannedAcl objectCannedAcl)
return this;
}

public RetryMode getRetryMode()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't fully follow why we would want to configure the retry mode.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was not particularly revealing any problems as per our internal benchmarks. But according AWS support Standard Retry mechanism helps in throttling issues most of the time. As per https://aws.amazon.com/blogs/architecture/exponential-backoff-and-jitter/ the default "Equal Jitter" is the loser. So having this configurable might help for some workloads, and wouldnt hurt to be exposed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we change the default to be STANDARD? I wonder why they default to LEGACY.

{
return retryMode;
}

@Config("s3.retry-mode")
anusudarsan marked this conversation as resolved.
Show resolved Hide resolved
anusudarsan marked this conversation as resolved.
Show resolved Hide resolved
@ConfigDescription("Specifies how the AWS SDK attempts retries, default is LEGACY")
public S3FileSystemConfig setRetryMode(RetryMode retryMode)
{
this.retryMode = retryMode;
return this;
}

@Min(1) // minimum set to 1 as the SDK validates this has to be > 0
public int getMaxErrorRetries()
{
return maxErrorRetries;
}

@Config("s3.max-error-retries")
public S3FileSystemConfig setMaxErrorRetries(int maxErrorRetries)
anusudarsan marked this conversation as resolved.
Show resolved Hide resolved
{
this.maxErrorRetries = maxErrorRetries;
return this;
}

@NotNull
public S3SseType getSseType()
{
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,7 @@
import software.amazon.awssdk.auth.credentials.AwsSessionCredentials;
import software.amazon.awssdk.auth.credentials.StaticCredentialsProvider;
import software.amazon.awssdk.core.client.config.ClientOverrideConfiguration;
import software.amazon.awssdk.core.retry.RetryPolicy;
import software.amazon.awssdk.http.apache.ApacheHttpClient;
import software.amazon.awssdk.http.apache.ProxyConfiguration;
import software.amazon.awssdk.regions.Region;
Expand All @@ -37,6 +38,7 @@
import java.net.URI;
import java.util.Optional;

import static io.trino.filesystem.s3.S3FileSystemConfig.RetryMode.getRetryMode;
import static io.trino.filesystem.s3.S3FileSystemConstants.EXTRA_CREDENTIALS_ACCESS_KEY_PROPERTY;
import static io.trino.filesystem.s3.S3FileSystemConstants.EXTRA_CREDENTIALS_SECRET_KEY_PROPERTY;
import static io.trino.filesystem.s3.S3FileSystemConstants.EXTRA_CREDENTIALS_SESSION_TOKEN_PROPERTY;
Expand All @@ -53,11 +55,15 @@ public S3FileSystemFactory(OpenTelemetry openTelemetry, S3FileSystemConfig confi
{
S3ClientBuilder s3 = S3Client.builder();

RetryPolicy retryPolicy = RetryPolicy.builder(getRetryMode(config.getRetryMode()))
.numRetries(config.getMaxErrorRetries())
.build();
s3.overrideConfiguration(ClientOverrideConfiguration.builder()
.addExecutionInterceptor(AwsSdkTelemetry.builder(openTelemetry)
.setCaptureExperimentalSpanAttributes(true)
.setRecordIndividualHttpError(true)
.build().newExecutionInterceptor())
.retryPolicy(retryPolicy)
.build());

Optional<StaticCredentialsProvider> staticCredentialsProvider = getStaticCredentialsProvider(config);
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,8 @@
import static io.airlift.configuration.testing.ConfigAssertions.assertRecordedDefaults;
import static io.airlift.configuration.testing.ConfigAssertions.recordDefaults;
import static io.airlift.units.DataSize.Unit.MEGABYTE;
import static io.trino.filesystem.s3.S3FileSystemConfig.RetryMode.LEGACY;
import static io.trino.filesystem.s3.S3FileSystemConfig.RetryMode.STANDARD;
import static java.util.concurrent.TimeUnit.MINUTES;

public class TestS3FileSystemConfig
Expand All @@ -47,6 +49,8 @@ public void testDefaults()
.setStsRegion(null)
.setCannedAcl(ObjectCannedAcl.NONE)
.setSseType(S3SseType.NONE)
.setRetryMode(LEGACY)
.setMaxErrorRetries(10)
.setSseKmsKeyId(null)
.setStreamingPartSize(DataSize.of(16, MEGABYTE))
.setRequesterPays(false)
Expand Down Expand Up @@ -75,6 +79,8 @@ public void testExplicitPropertyMappings()
.put("s3.sts.endpoint", "sts.example.com")
.put("s3.sts.region", "us-west-2")
.put("s3.canned-acl", "BUCKET_OWNER_FULL_CONTROL")
.put("s3.retry-mode", "STANDARD")
.put("s3.max-error-retries", "12")
.put("s3.sse.type", "KMS")
.put("s3.sse.kms-key-id", "mykey")
.put("s3.streaming.part-size", "42MB")
Expand Down Expand Up @@ -102,6 +108,8 @@ public void testExplicitPropertyMappings()
.setStsRegion("us-west-2")
.setCannedAcl(ObjectCannedAcl.BUCKET_OWNER_FULL_CONTROL)
.setStreamingPartSize(DataSize.of(42, MEGABYTE))
.setRetryMode(STANDARD)
.setMaxErrorRetries(12)
.setSseType(S3SseType.KMS)
.setSseKmsKeyId("mykey")
.setRequesterPays(true)
Expand Down