Description
Problem
When S3 is used as a DVC remote, boto3 also reads the configuration file ~/.aws/config, which is normally used by the AWS CLI. Not all configuration parameters in this file currently have an effect in DVC. For instance, in the example below the last line, multipart_threshold = 500MB, is ignored by DVC, while the line before it, signature_version = s3, is applied when pushing data to S3:
[default]
output = json
s3 =
  signature_version = s3
  multipart_threshold = 500MB
Suggestion
The suggestion is to support the optional but useful parameters from ~/.aws/config described in the AWS CLI S3 documentation, so that users can tune S3 transfer performance and unlock the advanced configuration options.
These parameters should be passed to TransferConfig:
max_concurrent_requests - The maximum number of concurrent requests.
max_queue_size - The maximum number of tasks in the task queue.
multipart_threshold - The size threshold the CLI uses for multipart transfers of individual files.
multipart_chunksize - When using multipart transfers, this is the chunk size that the CLI uses for multipart transfers of individual files.
max_bandwidth - The maximum bandwidth that will be consumed for uploading and downloading data to and from Amazon S3.
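As a rough illustration of the mapping proposed above, the sketch below converts the string values from the s3 section into keyword arguments for boto3's TransferConfig. The helper names (parse_size, parse_rate, transfer_kwargs) are hypothetical and are not part of DVC or boto3. The sketch assumes the AWS CLI convention of binary units (1MB = 1024 ** 2 bytes), and it leaves max_queue_size out because TransferConfig has no constructor argument that obviously corresponds to it.

```python
import re

# Binary multipliers, assuming the AWS CLI convention (1MB = 1024 ** 2 bytes).
_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def parse_size(value):
    """Convert a size such as '500MB', '16KiB' or '1024' to a byte count."""
    match = re.fullmatch(r"\s*(\d+)\s*([KMGT]?i?B)?\s*", value, re.IGNORECASE)
    if match is None:
        raise ValueError(f"invalid size: {value!r}")
    number, unit = match.groups()
    # Treat 'MiB' and 'MB' alike; a bare number is taken as bytes.
    unit = (unit or "B").upper().replace("IB", "B")
    return int(number) * _UNITS[unit]


def parse_rate(value):
    """Convert a rate such as '50MB/s' to bytes per second."""
    value = value.strip()
    if value.lower().endswith("/s"):
        value = value[:-2]
    return parse_size(value)


def transfer_kwargs(s3_section):
    """Build TransferConfig keyword arguments from the nested s3 settings.

    `s3_section` is a plain dict of strings, e.g. {"multipart_threshold": "500MB"}.
    max_queue_size is deliberately not mapped (see the note above).
    """
    kwargs = {}
    if "max_concurrent_requests" in s3_section:
        kwargs["max_concurrency"] = int(s3_section["max_concurrent_requests"])
    for key in ("multipart_threshold", "multipart_chunksize"):
        if key in s3_section:
            kwargs[key] = parse_size(s3_section[key])
    if "max_bandwidth" in s3_section:
        kwargs["max_bandwidth"] = parse_rate(s3_section["max_bandwidth"])
    return kwargs
```

The resulting dict could then be expanded into the constructor, for example boto3.s3.transfer.TransferConfig(**transfer_kwargs(settings)), and passed as the Config argument of an upload or download call.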
If it is reasonable and feasible, the following parameters should also be supported:
use_accelerate_endpoint - Use the Amazon S3 Accelerate endpoint for all s3 and s3api commands. You must first enable S3 Accelerate on your bucket before attempting to use the endpoint. This is mutually exclusive with the use_dualstack_endpoint option.
use_dualstack_endpoint - Use the Amazon S3 dual IPv4 / IPv6 endpoint for all s3 and s3api commands. This is mutually exclusive with the use_accelerate_endpoint option.
addressing_style - Specifies which addressing style to use. This controls whether the bucket name is in the hostname or part of the URL. Valid values are: path, virtual, and auto. The default value is auto.
payload_signing_enabled - Refers to whether or not to SHA256 sign sigv4 payloads. By default, this is disabled for streaming uploads (UploadPart and PutObject) when using https.
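The second group of options describes the client rather than individual transfers. A minimal sketch of how they might be validated is shown below. The function name client_s3_options is hypothetical, and the mutual-exclusion check simply follows the description above. The returned dict has the shape that botocore's Config accepts for its s3 argument, but how DVC would actually hand it to the client is an assumption.

```python
_BOOLEANS = {"true": True, "false": False}
_BOOLEAN_KEYS = (
    "use_accelerate_endpoint",
    "use_dualstack_endpoint",
    "payload_signing_enabled",
)
_ADDRESSING_STYLES = {"path", "virtual", "auto"}


def client_s3_options(s3_section):
    """Validate client-level settings taken from the nested s3 section.

    `s3_section` is a plain dict of strings as read from ~/.aws/config.
    """
    options = {}
    for key in _BOOLEAN_KEYS:
        if key in s3_section:
            raw = s3_section[key].strip().lower()
            if raw not in _BOOLEANS:
                raise ValueError(f"{key} must be true or false, got {raw!r}")
            options[key] = _BOOLEANS[raw]

    # The two endpoint options are described above as mutually exclusive.
    if options.get("use_accelerate_endpoint") and options.get("use_dualstack_endpoint"):
        raise ValueError(
            "use_accelerate_endpoint and use_dualstack_endpoint are mutually exclusive"
        )

    style = s3_section.get("addressing_style")
    if style is not None:
        style = style.strip().lower()
        if style not in _ADDRESSING_STYLES:
            raise ValueError(f"invalid addressing_style: {style!r}")
        options["addressing_style"] = style
    return options
```

With botocore installed, the result could be used as botocore.config.Config(s3=client_s3_options(settings)) when creating the S3 client.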
These values must be set under the top level s3 key in the AWS Config File, which has a default location of ~/.aws/config. Below is an example configuration:
[profile development]
aws_access_key_id=foo
aws_secret_access_key=bar
s3 =
  max_concurrent_requests = 20
  max_queue_size = 10000
  multipart_threshold = 64MB
  multipart_chunksize = 16MB
  max_bandwidth = 50MB/s
  use_accelerate_endpoint = true
  addressing_style = path
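For reference, the nested s3 block can be read with the Python standard library alone. The sketch below is only an illustration and not how boto3 or DVC parse the file. It relies on configparser treating the indented lines under s3 = as one multi-line value, so the nested settings must be indented in the file. The function name read_s3_section is hypothetical.

```python
import configparser

EXAMPLE = """\
[profile development]
aws_access_key_id=foo
aws_secret_access_key=bar
s3 =
  max_concurrent_requests = 20
  max_queue_size = 10000
  multipart_threshold = 64MB
  multipart_chunksize = 16MB
  max_bandwidth = 50MB/s
  use_accelerate_endpoint = true
  addressing_style = path
"""


def read_s3_section(text, section):
    """Return the nested s3 settings of `section` as a dict of strings."""
    # RawConfigParser avoids '%' interpolation of the values.
    parser = configparser.RawConfigParser()
    parser.read_string(text)
    # The indented lines under 's3 =' arrive as one multi-line value.
    raw = parser.get(section, "s3", fallback="")
    nested = {}
    for line in raw.splitlines():
        if "=" not in line:
            continue  # skip the empty first line of the value
        key, _, value = line.partition("=")
        nested[key.strip()] = value.strip()
    return nested


settings = read_s3_section(EXAMPLE, "profile development")
print(settings["multipart_threshold"])
```

All values come back as strings, so sizes such as 64MB and rates such as 50MB/s still need to be converted before they are handed to boto3.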