**Amazon Simple Storage Service (Amazon S3)** is an object storage service that offers scalability, data availability, security, and performance.


Amazon S3 is designed for 99.999999999% (11 9's) of durability, and stores data for millions of applications for companies all around the world.


An **Amazon S3 bucket** is a storage location to hold files. S3 files are referred to as **objects**.



**Boto3** is the Amazon Web Services (AWS) Software Development Kit (SDK) for Python. It lets Python developers write software that uses services such as Amazon S3 and Amazon EC2.


**Create an Amazon S3 bucket**

The name of an Amazon S3 bucket must be globally unique: no other bucket in any AWS account or Region (within the same AWS partition) can use the same name. Each bucket is created in a specific Region, which you can choose to minimize latency or to address regulatory requirements.
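
Besides being unique, a bucket name has to follow the S3 naming rules, and a quick local check can catch obvious mistakes before calling AWS. The sketch below is a partial check only, covering length, allowed characters, adjacent periods, and IP-address-like names. The helper name `looks_like_valid_bucket_name` is an assumption made for illustration, and passing this check does not guarantee that S3 will accept the name or that it is still available.

```python
import ipaddress
import re

# 3-63 characters: lowercase letters, digits, dots and hyphens,
# starting and ending with a letter or digit.
_BUCKET_NAME_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def looks_like_valid_bucket_name(name):
    """Partial local check of S3 bucket naming rules (hypothetical helper)."""
    if not _BUCKET_NAME_RE.fullmatch(name):
        return False  # wrong length or disallowed characters
    if ".." in name:
        return False  # adjacent periods are not allowed
    try:
        ipaddress.IPv4Address(name)
    except ValueError:
        return True  # not formatted like an IP address
    return False  # names formatted like an IP address are rejected


print(looks_like_valid_bucket_name("my-s3-python-v2"))
print(looks_like_valid_bucket_name("My_Bucket"))
```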

In [1]:
import logging
import boto3
from botocore.exceptions import ClientError

In [17]:
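# Placeholder credentials. In real code, prefer environment variables or a shared
# credentials file (boto3 reads both automatically) over hard-coded keys.
AWS_ACCESS_KEY_ID = "xxx"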
AWS_SECRET_ACCESS_KEY = "xxx"
region_name = "us-east-1"

In [15]:
# Create an S3 client
s3_client = boto3.client(
    's3',
    region_name=region_name,
    aws_access_key_id=AWS_ACCESS_KEY_ID,
    aws_secret_access_key=AWS_SECRET_ACCESS_KEY
)

In [21]:
# Retrieve the list of buckets
response = s3_client.list_buckets()

# Extract bucket names
buckets = response['Buckets']

# Print bucket names
for bucket in buckets:
    print(bucket['Name'])

my-s3-python
my-s3-python-v2


In [22]:
# Define the name of the bucket
bucket_name = "my-s3-python-v2"

from botocore.exceptions import NoCredentialsError, PartialCredentialsError, ClientError

try:
    if region_name == 'us-east-1':
        # No LocationConstraint required for us-east-1
        response = s3_client.create_bucket(
            Bucket=bucket_name
        )
    else:
        # Specify LocationConstraint for all other regions
        response = s3_client.create_bucket(
            Bucket=bucket_name,
            CreateBucketConfiguration={
                'LocationConstraint': region_name
            }
        )
    print(f'Bucket {bucket_name} created successfully.')
except NoCredentialsError:
    print('Credentials not available.')
except PartialCredentialsError:
    print('Incomplete credentials provided.')
except ClientError as e:
    print(f'Client error: {e}')
# Note: passing LocationConstraint for us-east-1 fails with:
# ClientError: An error occurred (InvalidLocationConstraint) when calling the CreateBucket operation: The specified location-constraint is not valid

Bucket my-s3-python-v2 created successfully.


In [23]:
response

{'ResponseMetadata': {'RequestId': 'BKW1TCD0H67NSZHE',
  'HostId': 'ZPBwjnaNADT6zI/B9AUibqtbmxpO0JOI241355fwtVYsw9/PgSLSrkYufLuMfQgVnHdjGb8kB/w=',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amz-id-2': 'ZPBwjnaNADT6zI/B9AUibqtbmxpO0JOI241355fwtVYsw9/PgSLSrkYufLuMfQgVnHdjGb8kB/w=',
   'x-amz-request-id': 'BKW1TCD0H67NSZHE',
   'date': 'Fri, 09 Aug 2024 19:29:32 GMT',
   'location': '/my-s3-python-v2',
   'server': 'AmazonS3',
   'content-length': '0'},
  'RetryAttempts': 0},
 'Location': '/my-s3-python-v2'}
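
The us-east-1 special case in the cell above can also be isolated in a small helper that only builds the arguments for `create_bucket`. This is a sketch; the helper name `create_bucket_kwargs` is an assumption, and it presumes that the client was created for the same region as the bucket.

```python
def create_bucket_kwargs(bucket_name, region_name):
    """Build keyword arguments for s3_client.create_bucket (hypothetical helper)."""
    kwargs = {"Bucket": bucket_name}
    # us-east-1 is the default location and must not be sent as a LocationConstraint
    if region_name != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region_name}
    return kwargs


print(create_bucket_kwargs("my-s3-python-v2", "us-east-1"))
print(create_bucket_kwargs("my-s3-python-v2", "eu-west-1"))
```

With a client from the earlier cell, the call would then be `s3_client.create_bucket(**create_bucket_kwargs(bucket_name, region_name))`.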

## Listing Buckets

In [24]:
def get_buckets(s3_client):
    # Retrieve the list of buckets
    response = s3_client.list_buckets()
    
    # Extract bucket names
    buckets = response['Buckets']
    
    # Print bucket names
    for bucket in buckets:
        print(bucket['Name'])

In [25]:
get_buckets(s3_client=s3_client)

my-s3-python
my-s3-python-v2


## Uploading files


In [27]:
bucket_name = "my-s3-python-v2"

# List of files to upload
file_list = [
    'corpus/dog/d1.jpg',
    'corpus/dog/d2.jpg',
    'corpus/dog/d3.jpg'    
]

def upload_files_to_s3(s3_client, bucket_name, file_list):
    for file_path in file_list:
        try:
            # Extract the file name from the path --> object name = file_name in Bucket
            file_name = file_path.split('/')[-1]
            
            # Upload the file
            s3_client.upload_file(file_path, bucket_name, file_name)
            print(f'Successfully uploaded {file_name} to bucket {bucket_name}.')
        except FileNotFoundError:
            print(f'File not found: {file_path}')
        except NoCredentialsError:
            print('Credentials not available.')
        except PartialCredentialsError:
            print('Incomplete credentials provided.')
        except ClientError as e:
            print(f'Client error: {e}')


In [28]:
# Upload the files
upload_files_to_s3(s3_client, bucket_name, file_list)

Successfully uploaded d1.jpg to bucket my-s3-python-v2.
Successfully uploaded d2.jpg to bucket my-s3-python-v2.
Successfully uploaded d3.jpg to bucket my-s3-python-v2.
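
The upload function above uses only the file name as the object key, so two files with the same name in different folders would overwrite each other in the bucket. One possible alternative, sketched below, keeps part of the local path in the key. The helper `object_key_for` and its `base_dir` and `key_prefix` parameters are assumptions made for this example, and it expects forward-slash paths like the ones in `file_list`.

```python
from pathlib import PurePosixPath


def object_key_for(file_path, base_dir="", key_prefix=""):
    """Derive an S3 object key from a local path (hypothetical helper)."""
    path = PurePosixPath(file_path)
    # Without base_dir, fall back to the bare file name, as the notebook does
    relative = path.relative_to(base_dir) if base_dir else PurePosixPath(path.name)
    if key_prefix:
        return str(PurePosixPath(key_prefix) / relative)
    return str(relative)


print(object_key_for("corpus/dog/d1.jpg"))
print(object_key_for("corpus/dog/d1.jpg", base_dir="corpus"))
print(object_key_for("corpus/dog/d1.jpg", base_dir="corpus", key_prefix="images"))
```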


## Upload as File Object

To upload a file-like object to an S3 bucket, use the upload_fileobj method. The object must be opened in binary mode, not text mode.

This is particularly useful when the data comes from an in-memory buffer or from a file opened with Python's built-in open().

In [29]:
bucket_name = "my-s3-python-v2"

# List of files to upload
file_list = [
    'corpus/dog/d1.jpg',
    'corpus/dog/d2.jpg',
    'corpus/dog/d3.jpg'    
]

def upload_fileobj_to_s3(s3_client, bucket_name, file_path):
    try:
        # Open the file in binary read mode
        with open(file_path, 'rb') as file_obj:
            # Extract the file name from the path
            file_name = "file_object_" + file_path.split('/')[-1]
            
            # Upload the file object
            s3_client.upload_fileobj(file_obj, bucket_name, file_name)
            print(f'Successfully uploaded {file_name} to bucket {bucket_name}.')
    except FileNotFoundError:
        print(f'File not found: {file_path}')
    except NoCredentialsError:
        print('Credentials not available.')
    except PartialCredentialsError:
        print('Incomplete credentials provided.')
    except ClientError as e:
        print(f'Client error: {e}')

In [30]:
# Upload each file in the list
for file_path in file_list:
    upload_fileobj_to_s3(s3_client, bucket_name, file_path)

Successfully uploaded file_object_d1.jpg to bucket my-s3-python-v2.
Successfully uploaded file_object_d2.jpg to bucket my-s3-python-v2.
Successfully uploaded file_object_d3.jpg to bucket my-s3-python-v2.
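
The cells above read from files on disk, but upload_fileobj works the same way for data that only exists in memory. Here is a minimal sketch using `io.BytesIO`. The function name `upload_bytes_to_s3` is an assumption, and error handling is left out to keep the example short.

```python
import io


def upload_bytes_to_s3(s3_client, bucket_name, object_key, data):
    """Upload an in-memory bytes payload with upload_fileobj (hypothetical helper)."""
    # BytesIO wraps the bytes in a binary file-like object, which upload_fileobj expects
    with io.BytesIO(data) as buffer:
        s3_client.upload_fileobj(buffer, bucket_name, object_key)
```

For example, `upload_bytes_to_s3(s3_client, bucket_name, "notes.txt", b"hello from memory")` would create the object `notes.txt` without writing a local file first.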


In [31]:
get_buckets(s3_client=s3_client)

my-s3-python
my-s3-python-v2


## List all files (objects) in a given S3 bucket

The list_objects_v2 method returns the objects in a bucket under the Contents key of its response. You can iterate over that list and read each entry's Key to get the file names.

In [32]:
def list_files_in_bucket(s3_client, bucket_name):
    try:
        # List objects in the specified bucket
        response = s3_client.list_objects_v2(Bucket=bucket_name)

        # Check if the bucket contains objects
        if 'Contents' in response:
            # Extract file names
            files = [obj['Key'] for obj in response['Contents']]
            return files
        else:
            print(f'No objects found in bucket {bucket_name}.')
            return []
    except NoCredentialsError:
        print('Credentials not available.')
        return []
    except PartialCredentialsError:
        print('Incomplete credentials provided.')
        return []
    except ClientError as e:
        print(f'Client error: {e}')
        return []

# Get the list of files
files = list_files_in_bucket(s3_client, bucket_name)

# Print the file names
for file in files:
    print(file)

d1.jpg
d2.jpg
d3.jpg
file_object_d1.jpg
file_object_d2.jpg
file_object_d3.jpg


Pagination: The list_objects_v2 method returns at most 1,000 objects per call. If your bucket contains more, either pass the NextContinuationToken of each response as the ContinuationToken of the next request, or let a boto3 paginator handle the tokens for you, as in the function below.

In [33]:
def list_files_in_bucket_pagination(s3_client, bucket_name):
    try:
        paginator = s3_client.get_paginator('list_objects_v2')
        file_list = []
        for page in paginator.paginate(Bucket=bucket_name):
            if 'Contents' in page:
                file_list.extend([obj['Key'] for obj in page['Contents']])
        return file_list
    except NoCredentialsError:
        print('Credentials not available.')
        return []
    except PartialCredentialsError:
        print('Incomplete credentials provided.')
        return []
    except ClientError as e:
        print(f'Client error: {e}')
        return []


In [34]:
# Get the list of files
files = list_files_in_bucket_pagination(s3_client, bucket_name)

# Print the file names
for file in files:
    print(file)

d1.jpg
d2.jpg
d3.jpg
file_object_d1.jpg
file_object_d2.jpg
file_object_d3.jpg
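
The paginator hides the ContinuationToken mentioned above. To show what it does behind the scenes, here is a sketch of the same loop written by hand. The function name `list_keys_with_continuation_token` is an assumption, and error handling is omitted.

```python
def list_keys_with_continuation_token(s3_client, bucket_name):
    """List all object keys by following continuation tokens manually (hypothetical helper)."""
    keys = []
    kwargs = {"Bucket": bucket_name}
    while True:
        page = s3_client.list_objects_v2(**kwargs)
        # An empty bucket (or page) has no 'Contents' key
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
        if not page.get("IsTruncated"):
            return keys
        # Ask for the next page, starting where this one stopped
        kwargs["ContinuationToken"] = page["NextContinuationToken"]
```

In practice the paginator version is shorter and is usually the better choice; the manual loop is mainly useful for understanding the response fields.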


## Extra Args
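Both upload_file and upload_fileobj accept an optional ExtraArgs parameter for settings such as the object's ACL, metadata, or content type. The allowed keys are listed in boto3.s3.transfer.S3Transfer.ALLOWED_UPLOAD_ARGS.

The example below sets the ACL so that the uploaded object is publicly readable. This only works if the bucket allows ACLs; otherwise the upload fails with AccessControlListNotSupported.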


In [38]:
# permission = public
file_path = 'corpus/dog/d1.jpg'
object_name = "public_" + file_path.split('/')[-1]

#response = s3_client.upload_file(file_path, bucket_name, object_name, ExtraArgs={"ACL": "public-read"})
# S3UploadFailedError: Failed to upload corpus/dog/d1.jpg to my-s3-python-v2/publicd1.jpg: 
# An error occurred (AccessControlListNotSupported) when calling the PutObject operation: The bucket does not allow ACLs

try:
    # Upload the file with public-read permission
    s3_client.upload_file(
        Filename=file_path,
        Bucket=bucket_name,
        Key=object_name,
        ExtraArgs={'ACL': 'public-read'}
    )
    print(f'Successfully uploaded {file_path} to bucket {bucket_name} with public-read access.')
except FileNotFoundError:
    print(f'File not found: {file_path}')
except NoCredentialsError:
    print('Credentials not available.')
except PartialCredentialsError:
    print('Incomplete credentials provided.')
except ClientError as e:
    print(f'Client error: {e}')

Successfully uploaded corpus/dog/d1.jpg to bucket my-s3-python-v2 with public-read access.
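
ACL is only one of the keys that ExtraArgs accepts. Two other commonly used ones are ContentType and Metadata. The sketch below builds such a dictionary, guessing the content type from the file extension with the standard `mimetypes` module. The helper name `build_extra_args` and the metadata values are assumptions made for illustration.

```python
import mimetypes


def build_extra_args(file_path, metadata=None):
    """Build an ExtraArgs dict with ContentType and Metadata (hypothetical helper)."""
    extra_args = {}
    # guess_type returns (type, encoding); type is None for unknown extensions
    content_type, _ = mimetypes.guess_type(file_path)
    if content_type:
        extra_args["ContentType"] = content_type
    if metadata:
        extra_args["Metadata"] = dict(metadata)
    return extra_args


print(build_extra_args("corpus/dog/d1.jpg", {"source": "notebook"}))
```

The result can be passed straight to an upload call, for example `s3_client.upload_file(file_path, bucket_name, object_name, ExtraArgs=build_extra_args(file_path))`.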


## Downloading files

The methods provided by the AWS SDK for Python to download files are similar to those provided to upload files.


The download_file method accepts the names of the bucket and object to download and the filename to save the file to.

In [39]:
object_key = 'd1.jpg'  # The key (path) of the object in S3
local_file_path = 'static/images/' + object_key   # Path to save the downloaded file locally

try:
    # Download the file from S3
    s3_client.download_file(bucket_name, object_key, local_file_path)
    print(f'Successfully downloaded {object_key} from bucket {bucket_name} to {local_file_path}.')
except NoCredentialsError:
    print('Credentials not available.')
except PartialCredentialsError:
    print('Incomplete credentials provided.')
except ClientError as e:
    print(f'Client error: {e}')


Successfully downloaded d1.jpg from bucket my-s3-python-v2 to static/images/d1.jpg.


In [40]:
def download_fileobj_from_s3(s3_client, bucket_name, object_key, file_path):
    try:
        # Open a file-like object to write the downloaded data
        with open(file_path, 'wb') as file_obj:
            # Download the file from S3 to the file-like object
            s3_client.download_fileobj(bucket_name, object_key, file_obj)
            print(f'Successfully downloaded {object_key} from bucket {bucket_name} to {file_path}.')
    except NoCredentialsError:
        print('Credentials not available.')
    except PartialCredentialsError:
        print('Incomplete credentials provided.')
    except ClientError as e:
        print(f'Client error: {e}')

In [41]:
object_key = "file_object_d2.jpg"
local_file_path = 'static/images/' + object_key

# Download the file object
download_fileobj_from_s3(s3_client, bucket_name, object_key, local_file_path)

Successfully downloaded file_object_d2.jpg from bucket my-s3-python-v2 to static/images/file_object_d2.jpg.
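
Both download cells assume that the local folder static/images/ already exists. A small wrapper can create the folder first, as sketched below. The function name `download_file_creating_dirs` is an assumption, and the AWS error handling shown earlier is left out for brevity.

```python
import os


def download_file_creating_dirs(s3_client, bucket_name, object_key, local_file_path):
    """Create the target folder if needed, then download the object (hypothetical helper)."""
    parent = os.path.dirname(local_file_path)
    if parent:
        # exist_ok=True makes this a no-op when the folder is already there
        os.makedirs(parent, exist_ok=True)
    s3_client.download_file(bucket_name, object_key, local_file_path)
    return local_file_path
```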


## File transfer configuration


When uploading, downloading, or copying a file or S3 object, the AWS SDK for Python automatically manages retries and multipart and non-multipart transfers.

The management operations are performed by using reasonable default settings that are well-suited for most scenarios. To handle a special case, the default settings can be configured to meet requirements.

## Multipart transfers

Multipart transfers occur when the file size exceeds the value of the multipart_threshold attribute of TransferConfig (8 MB by default).


In [45]:
import boto3
from boto3.s3.transfer import TransferConfig
from botocore.exceptions import NoCredentialsError, PartialCredentialsError, ClientError


def upload_large_file_to_s3(s3_client, file_path, bucket_name, 
                            multipart_threshold=1024 * 1024 * 5,  # 5 MB
                            multipart_chunksize=1024 * 1024 * 5,  # 5 MB
                            max_concurrency=4):
    """
    Uploads a large file to an S3 bucket with multipart upload.

    :param s3_client: S3 client.
    :param file_path: Local path to the file to upload.
    :param bucket_name: Name of the S3 bucket.    
    :param multipart_threshold: The size threshold for multipart uploads in bytes.
    :param multipart_chunksize: The size of each part in the multipart upload in bytes.
    :param max_concurrency: Number of threads to use for uploading parts in parallel.
    """    
    # Configure multipart upload threshold and chunk size
    config = TransferConfig(
        multipart_threshold=multipart_threshold,
        multipart_chunksize=multipart_chunksize,
        max_concurrency=max_concurrency,
        use_threads=True
    )

    try:
        # Upload the file using multipart uploads if needed
        s3_client.upload_file(
            Filename=file_path,
            Bucket=bucket_name,
            Key="big_" + file_path.split('/')[-1],
            Config=config  # Apply the custom TransferConfig
        )
        print(f'Successfully uploaded {file_path} to bucket {bucket_name}.')
    except NoCredentialsError:
        print('Credentials not available.')
    except PartialCredentialsError:
        print('Incomplete credentials provided.')
    except ClientError as e:
        print(f'Client error: {e}')


In [46]:
file_path = 'corpus/dog/d3.jpg'
# object_name = "big_" + file_path.split('/')[-1]
upload_large_file_to_s3(
    s3_client=s3_client,
    file_path=file_path,
    bucket_name=bucket_name        
)

Successfully uploaded corpus/dog/d3.jpg to bucket my-s3-python-v2.
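
When choosing multipart_chunksize, it helps to estimate how many parts a file will be split into, because S3 allows at most 10,000 parts per multipart upload. The arithmetic is sketched below. The helper name `multipart_part_count` is an assumption, and the file sizes are made-up examples. A small image may well be below the threshold, in which case it is sent as a regular single-request upload.

```python
MIB = 1024 * 1024
MAX_PARTS = 10_000  # S3 limit on the number of parts in one multipart upload


def multipart_part_count(file_size, multipart_chunksize=5 * MIB):
    """Number of parts needed for file_size bytes (hypothetical helper)."""
    if file_size <= 0:
        return 0
    # Integer ceiling division: the last part may be smaller than the chunk size
    return (file_size + multipart_chunksize - 1) // multipart_chunksize


print(multipart_part_count(12 * MIB))               # 12 MiB in 5 MiB chunks
print(multipart_part_count(60_000 * MIB) <= MAX_PARTS)  # too many parts at 5 MiB
```

If the part count would exceed the limit, a larger multipart_chunksize brings it back down.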


## Presigned URLs

A user who does not have AWS credentials or permission to access an S3 object can be granted temporary access by using a presigned URL.

A presigned URL is generated by an AWS user who has access to the object. The generated URL is then given to the unauthorized user. The presigned URL can be entered in a browser or used by a program or HTML webpage. The credentials used by the presigned URL are those of the AWS user who generated the URL.

A presigned URL remains valid for a limited period of time which is specified when the URL is generated.

In [48]:
import boto3
from botocore.exceptions import NoCredentialsError, PartialCredentialsError, ClientError

def generate_presigned_url_get(s3_client, bucket_name, object_key, expiration=3600):
    """
    Generate a presigned URL to download an S3 object.

    :param s3_client: S3 client.
    :param bucket_name: Name of the S3 bucket.
    :param object_key: The key (path) of the object in S3.    
    :param expiration: Time in seconds for the presigned URL to remain valid (default is 1 hour).
    :return: Presigned URL as a string.
    """    
    try:
        # Generate the presigned URL for the GET operation
        response = s3_client.generate_presigned_url('get_object',
                                                    Params={'Bucket': bucket_name, 'Key': object_key},
                                                    ExpiresIn=expiration)
        return response
    except NoCredentialsError:
        print('Credentials not available.')
    except PartialCredentialsError:
        print('Incomplete credentials provided.')
    except ClientError as e:
        print(f'Client error: {e}')


In [51]:
object_key = 'd1.jpg'  # The key (path) of the object in S3
presigned_url = generate_presigned_url_get(
    s3_client=s3_client,
    bucket_name=bucket_name,
    object_key=object_key    
)

print(f'Presigned URL: {presigned_url}')

Presigned URL: https://my-s3-python-v2.s3.amazonaws.com/d1.jpg?AWSAccessKeyId=AKIAQJ2NLI3ANYMUWVFU&Signature=K8X3MaetLYJoyNhGZmfb0bk8%2Fe0%3D&Expires=1723238929
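
The expiry time is encoded in the URL itself. In the URL printed above it is the Expires query parameter, a Unix timestamp. URLs signed with Signature Version 4 instead carry X-Amz-Date and X-Amz-Expires. The sketch below reads either form. The helper name `presigned_url_expiry` and the example URL are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlparse


def presigned_url_expiry(url):
    """Return the expiry of a presigned URL as a UTC datetime, or None (hypothetical helper)."""
    query = parse_qs(urlparse(url).query)
    if "Expires" in query:
        # Older signature style: absolute expiry as a Unix timestamp
        return datetime.fromtimestamp(int(query["Expires"][0]), tz=timezone.utc)
    if "X-Amz-Date" in query and "X-Amz-Expires" in query:
        # Signature Version 4: signing time plus a lifetime in seconds
        signed_at = datetime.strptime(query["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ")
        signed_at = signed_at.replace(tzinfo=timezone.utc)
        return signed_at + timedelta(seconds=int(query["X-Amz-Expires"][0]))
    return None


example_url = (
    "https://example-bucket.s3.amazonaws.com/d1.jpg"
    "?AWSAccessKeyId=EXAMPLE&Signature=abc&Expires=1723238929"
)
print(presigned_url_expiry(example_url))
```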


## Bucket policies

An S3 bucket can have an optional policy that grants access permissions to other AWS accounts or AWS Identity and Access Management (IAM) users. Bucket policies are defined using the same JSON format as a resource-based IAM policy.

## Retrieve a Bucket Policy

In [52]:
import boto3
from botocore.exceptions import NoCredentialsError, PartialCredentialsError, ClientError

def get_bucket_policy(s3_client, bucket_name):
    """
    Retrieve the policy of an S3 bucket.

    :param s3_client
    :param bucket_name: Name of the S3 bucket.
    
    :return: Bucket policy as a JSON string or an error message.
    """    
    try:
        # Retrieve the bucket policy
        response = s3_client.get_bucket_policy(Bucket=bucket_name)
        policy = response.get('Policy')
        return policy
    except NoCredentialsError:
        return 'Credentials not available.'
    except PartialCredentialsError:
        return 'Incomplete credentials provided.'
    except ClientError as e:
        return f'Client error: {e}'


In [55]:
bucket_policy = get_bucket_policy(
    s3_client=s3_client,
    bucket_name=bucket_name
)

print(f'Bucket Policy: {bucket_policy}')
# Bucket Policy: Client error: An error occurred (AllAccessDisabled) when calling the GetBucketPolicy operation: All access to this object has been disabled

Bucket Policy: {"Version":"2012-10-17","Id":"Policy1723233518257","Statement":[{"Sid":"Stmt1723233514556","Effect":"Allow","Principal":"*","Action":"s3:GetObject","Resource":"arn:aws:s3:::my-s3-python-v2/*"}]}
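
The policy comes back as a JSON string, so it can be parsed with the standard `json` module for inspection. Below is a short sketch that lists the main fields of each statement. The helper name `summarize_policy` is an assumption. It expects a real policy string, not one of the error messages that get_bucket_policy above returns on failure.

```python
import json


def summarize_policy(policy_json):
    """Return (Effect, Principal, Action, Resource) for each statement (hypothetical helper)."""
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    # A policy may hold a single statement object instead of a list
    if isinstance(statements, dict):
        statements = [statements]
    return [
        (s.get("Effect"), s.get("Principal"), s.get("Action"), s.get("Resource"))
        for s in statements
    ]


policy_text = (
    '{"Version":"2012-10-17","Id":"Policy1723233518257","Statement":'
    '[{"Sid":"Stmt1723233514556","Effect":"Allow","Principal":"*",'
    '"Action":"s3:GetObject","Resource":"arn:aws:s3:::my-s3-python-v2/*"}]}'
)
for effect, principal, action, resource in summarize_policy(policy_text):
    print(effect, principal, action, resource)
```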


## Set a bucket policy

A bucket's policy can be set by calling the put_bucket_policy method.

The policy is defined in the same JSON format as an IAM policy. 



## Policy Format

The **Sid (statement ID)** is an optional identifier that you provide for the policy statement. You can assign a Sid value to each statement in a statement array.

The **Effect** element is required and specifies whether the statement results in an allow or an explicit deny. Valid values for Effect are Allow and Deny.

By default, access to resources is denied. 

Use the **Principal** element in a policy to specify the principal that is allowed or denied access to a resource.

You can specify any of the following principals in a policy:

- AWS account and root user
- IAM users
- Federated users (using web identity or SAML federation)
- IAM roles
- Assumed-role sessions
- AWS services
- Anonymous users


The **Action** element describes the specific action or actions that will be allowed or denied. 

We specify a value using a service namespace as an action prefix (iam, ec2, sqs, sns, s3, etc.) followed by the name of the action to allow or deny.

The **Resource** element specifies the object or objects that the statement covers. We specify a resource using an ARN. Amazon Resource Names (ARNs) uniquely identify AWS resources.
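
To see how these elements fit together, here is a sketch of a helper that builds a single read-only statement. With no principal given it targets anonymous users ("*"); with an ARN it targets one specific IAM principal. The helper name `get_object_statement`, the Sid value, and the account ID and user name in the example ARN are all made up for illustration.

```python
def get_object_statement(bucket_name, principal_arn=None, sid="AllowGetObject"):
    """Build one policy statement allowing s3:GetObject (hypothetical helper)."""
    # "*" means anyone; {"AWS": arn} restricts the statement to one AWS principal
    principal = {"AWS": principal_arn} if principal_arn else "*"
    return {
        "Sid": sid,
        "Effect": "Allow",
        "Principal": principal,
        "Action": "s3:GetObject",
        "Resource": f"arn:aws:s3:::{bucket_name}/*",
    }


print(get_object_statement("my-s3-python-v2"))
print(get_object_statement(
    "my-s3-python-v2",
    principal_arn="arn:aws:iam::123456789012:user/example-user",
))
```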

Let's define a policy that enables any user to retrieve any object stored in the bucket identified by the bucket_name variable.

In [59]:
import boto3
from botocore.exceptions import NoCredentialsError, PartialCredentialsError, ClientError
import json

# Define the bucket policy
custom_bucket_policy = {
    "Version": "2012-10-17",
    "Id": "Policy1723233518257",
    "Statement": [
        {
            "Sid": "Stmt1723233514556",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{bucket_name}/*"
        }
    ]
}

def set_bucket_policy(s3_client, bucket_name, bucket_policy):
    """
    Set the given policy on an S3 bucket.

    :param s3_client: S3 client.
    :param bucket_name: Name of the S3 bucket.
    :param bucket_policy: Policy document as a dict; it is serialized to JSON before upload.
    """    
    # Convert the policy to a JSON string
    policy_string = json.dumps(bucket_policy)

    try:
        # Set the bucket policy
        s3_client.put_bucket_policy(
            Bucket=bucket_name,
            Policy=policy_string
        )
        print(f'Successfully set the policy for bucket {bucket_name}.')
    except NoCredentialsError:
        print('Credentials not available.')
    except PartialCredentialsError:
        print('Incomplete credentials provided.')
    except ClientError as e:
        print(f'Client error: {e}')

In [63]:
# Example usage
set_bucket_policy(
    s3_client=s3_client,
    bucket_name=bucket_name,
    bucket_policy=custom_bucket_policy    
)

Successfully set the policy for bucket my-s3-python-v2.


## Delete a bucket policy


In [61]:
import boto3
from botocore.exceptions import NoCredentialsError, PartialCredentialsError, ClientError

def delete_bucket_policy(s3_client, bucket_name):
    """
    Delete the policy of an S3 bucket.

    :param s3_client
    :param bucket_name: Name of the S3 bucket.
    
    """
    try:
        # Delete the bucket policy
        s3_client.delete_bucket_policy(Bucket=bucket_name)
        print(f'Successfully deleted the policy for bucket {bucket_name}.')
    except NoCredentialsError:
        print('Credentials not available.')
    except PartialCredentialsError:
        print('Incomplete credentials provided.')
    except ClientError as e:
        print(f'Client error: {e}')


In [62]:
# Example usage
delete_bucket_policy(
    s3_client=s3_client,
    bucket_name=bucket_name    
)

Successfully deleted the policy for bucket my-s3-python-v2.


## CORS Configuration

Cross-Origin Resource Sharing (CORS) enables client web applications in one domain to access resources in another domain. An S3 bucket can be configured to enable cross-origin requests. The configuration defines rules that specify the allowed origins, HTTP methods (GET, PUT, etc.), and other elements.

## Retrieve a bucket CORS configuration

Retrieve a bucket's CORS configuration by calling the AWS SDK for Python get_bucket_cors method.


In [64]:
import boto3
from botocore.exceptions import NoCredentialsError, PartialCredentialsError, ClientError

def get_bucket_cors(s3_client, bucket_name):
    """
    Retrieve the CORS configuration of an S3 bucket.

    :param s3_client: S3 client.
    :param bucket_name: Name of the S3 bucket.
    :return: List of CORS rule dicts, or an error message string on failure.
    """    
    try:
        # Retrieve the CORS configuration
        response = s3_client.get_bucket_cors(Bucket=bucket_name)
        cors_configuration = response.get('CORSRules')
        return cors_configuration
    except NoCredentialsError:
        return 'Credentials not available.'
    except PartialCredentialsError:
        return 'Incomplete credentials provided.'
    except ClientError as e:
        return f'Client error: {e}'


In [65]:
cors_config = get_bucket_cors(
    s3_client=s3_client,
    bucket_name=bucket_name
)

print(f'CORS Configuration: {cors_config}')
# CORS Configuration: Client error: An error occurred (NoSuchCORSConfiguration) 
# when calling the GetBucketCors operation: The CORS configuration does not exist

CORS Configuration: Client error: An error occurred (NoSuchCORSConfiguration) when calling the GetBucketCors operation: The CORS configuration does not exist


## Set Bucket CORS

To set the CORS configuration for an S3 bucket, define the configuration as a Python dictionary and apply it to the bucket with the put_bucket_cors method of the boto3 S3 client.

### CORS Configuration Format

The CORS configuration is a dictionary with a CORSRules list. Each rule typically specifies:
+ Allowed origins (domains that can access the resources).
+ Allowed methods (HTTP methods like GET, POST).
+ Allowed headers (HTTP headers that can be used in requests).
+ Expose headers (headers that the client can access).
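
The rule fields listed above can be assembled by a small builder. The sketch below also restricts access to named origins, which is usually safer than the wildcard "*". The helper name `make_cors_configuration`, its defaults, and the example origin are assumptions made for illustration; the method check reflects the HTTP methods that S3 CORS rules accept (GET, PUT, POST, DELETE, HEAD).

```python
ALLOWED_CORS_METHODS = {"GET", "PUT", "POST", "DELETE", "HEAD"}


def make_cors_configuration(allowed_origins, allowed_methods=("GET",),
                            allowed_headers=("*",), expose_headers=("ETag",),
                            max_age_seconds=3000):
    """Build a CORSConfiguration dict for put_bucket_cors (hypothetical helper)."""
    methods = [method.upper() for method in allowed_methods]
    unsupported = sorted(set(methods) - ALLOWED_CORS_METHODS)
    if unsupported:
        raise ValueError(f"Unsupported CORS methods: {unsupported}")
    return {
        "CORSRules": [
            {
                "AllowedHeaders": list(allowed_headers),
                "AllowedMethods": methods,
                "AllowedOrigins": list(allowed_origins),
                "ExposeHeaders": list(expose_headers),
                "MaxAgeSeconds": max_age_seconds,
            }
        ]
    }


print(make_cors_configuration(["https://www.example.com"], allowed_methods=("get", "put")))
```

The returned dictionary is meant to be passed as the CORSConfiguration argument of put_bucket_cors.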

In [66]:
import boto3
from botocore.exceptions import NoCredentialsError, PartialCredentialsError, ClientError
import json

def set_bucket_cors(s3_client, bucket_name):
    """
    Set the CORS configuration for an S3 bucket.

    :param s3_client
    :param bucket_name: Name of the S3 bucket.    
    """
    # Define the CORS configuration
    cors_configuration = {
        "CORSRules": [
            {
                "AllowedHeaders": ["*"],
                "AllowedMethods": ["GET", "PUT", "POST", "DELETE"],
                "AllowedOrigins": ["*"],
                "ExposeHeaders": ["ETag"],
                "MaxAgeSeconds": 3000
            }
        ]
    }

    # put_bucket_cors takes the configuration as a Python dict,
    # so no conversion to a JSON string is needed

    try:
        # Set the CORS configuration
        s3_client.put_bucket_cors(
            Bucket=bucket_name,
            CORSConfiguration=cors_configuration
        )
        print(f'Successfully set the CORS configuration for bucket {bucket_name}.')
    except NoCredentialsError:
        print('Credentials not available.')
    except PartialCredentialsError:
        print('Incomplete credentials provided.')
    except ClientError as e:
        print(f'Client error: {e}')

In [67]:
# Example usage
set_bucket_cors(
    s3_client=s3_client,
    bucket_name=bucket_name
)

Successfully set the CORS configuration for bucket my-s3-python-v2.


In [68]:
# Retrieve the CORS configuration again to confirm the change
cors_config = get_bucket_cors(
    s3_client=s3_client,
    bucket_name=bucket_name
)

print(f'CORS Configuration: {cors_config}')

CORS Configuration: [{'AllowedHeaders': ['*'], 'AllowedMethods': ['GET', 'PUT', 'POST', 'DELETE'], 'AllowedOrigins': ['*'], 'ExposeHeaders': ['ETag'], 'MaxAgeSeconds': 3000}]
