# Connectors - AWS S3

In this tutorial, you'll learn how to use `ydata-sdk` to connect to AWS S3. 
The `S3Connector` allows you to list buckets, read/write files, and sample datasets stored in S3.

This is useful for integrating synthetic data workflows into cloud-native or hybrid ML pipelines.
Make sure your AWS credentials (e.g., access key and secret key) are available and grant both read and write access.

### Benefits of Integration
Integrating ydata-sdk with AWS S3 offers several key benefits:

- **Seamless Cloud Access:** Easily browse, read, and write data from S3 buckets using a unified SDK interface.
- **Cloud-Native Workflows:** Connect directly to your S3-based data lake to enable profiling, synthesis, and anonymization without local downloads.
- **SDK-Wide Compatibility:** All features of ydata-sdk, from Q&A generation to synthetic tabular data, can operate directly on S3-hosted files.
- **Scalable & Automated:** Ideal for automating recurring workflows or powering large-scale pipelines with S3 as a data backend.

Before running this notebook, make sure that:
1. You have an `aws_credentials.json` file with the credentials needed to access your AWS S3 storage (read and write permissions).
2. `ydata-sdk` is installed.
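
As a minimal sketch, the credentials file can be checked before it is handed to the connector. The field names `access_key_id` and `secret_access_key` are assumptions for illustration, and `load_credentials` is a hypothetical helper that is not part of `ydata-sdk`. Adjust the names to match what your file and `S3Connector` expect.

```python
import json
from pathlib import Path

# Hypothetical field names (an assumption); adjust to match your credentials file.
REQUIRED_KEYS = ("access_key_id", "secret_access_key")


def load_credentials(path, required_keys=REQUIRED_KEYS):
    """Load a JSON credentials file and fail early if a field is missing or empty."""
    credentials = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(credentials, dict):
        raise ValueError(f"{path} must contain a JSON object")
    # Treat absent keys and empty values the same way: both are unusable.
    missing = [key for key in required_keys if not credentials.get(key)]
    if missing:
        raise ValueError(f"{path} is missing required fields: {', '.join(missing)}")
    return credentials
```

Failing here gives a clearer error than a rejected request later in the notebook.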

### Authenticate with your YData account

In [None]:
# Authenticate with your ydata-sdk token - https://dashboard.ydata.ai/
import os

os.environ['YDATA_LICENSE_KEY'] = '{add-your-key}'
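
It is easy to run the cell above with the `{add-your-key}` placeholder still in place. The sketch below checks that `YDATA_LICENSE_KEY` is set to something other than the placeholder. `require_license_key` is a hypothetical helper and not part of `ydata-sdk`, and the check does not verify that the key is valid.

```python
import os

PLACEHOLDER = "{add-your-key}"


def require_license_key(env=os.environ, name="YDATA_LICENSE_KEY"):
    """Return the license key from `env`, or raise if it is unset or still the placeholder."""
    key = env.get(name, "").strip()
    if not key or key == PLACEHOLDER:
        raise RuntimeError(
            f"{name} is not set. Add your key from https://dashboard.ydata.ai/ before continuing."
        )
    return key
```

Calling `require_license_key()` right after setting the environment variable surfaces a missing key before any connector is created.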

## Create your AWS S3 connector

In [None]:
from ydata.utils.formats import read_json

def get_token(token_path: str):
    """
    Utility to load the token from .secrets directory,
    supporting both local and cloud environments.
    """
    return read_json(token_path)

token = get_token('insert-credentials-path')

In [None]:
# 🔗 Initialize the AWS S3 connector
from ydata.connectors import S3Connector

connector = S3Connector(**token)

## Read from your AWS S3 Storage

With the AWS S3 connector you can:
- Read a single file or a set of files from a folder
- Get a sample of the full dataset
- Write new data to a defined folder

In [None]:
# 🪣 List all S3 buckets

buckets = connector.list()
print("Buckets:", buckets)

In [None]:
# 📁 List files inside a specific S3 bucket

bucket_path = "s3://your-bucket-name/"
files = connector.list(path=bucket_path)
print("Files:", files)
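
The paths in this notebook follow the `s3://bucket/key` form. To illustrate how such a path splits into a bucket name and an object key, here is a standard-library sketch. `split_s3_path` is a hypothetical helper and not part of `ydata-sdk`; the connector itself takes the full path string.

```python
from urllib.parse import urlparse


def split_s3_path(path):
    """Split an s3://bucket/key path into (bucket, key)."""
    parsed = urlparse(path)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not a valid S3 path: {path!r}")
    # The bucket is the URL host; the key is the path without its leading slash.
    return parsed.netloc, parsed.path.lstrip("/")


bucket, key = split_s3_path("s3://your-bucket-name/path/to/data.csv")
print(bucket)  # your-bucket-name
print(key)     # path/to/data.csv
```

Because this uses generic URL parsing, object keys containing `?` or `#` would be cut off at that character, so treat it as a quick sanity check on paths rather than a full parser.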

In [None]:
# 📥 Read a file from S3
from ydata.connectors.filetype import FileType

file_path = "s3://your-bucket-name/path/to/data.csv"
df = connector.read_file(
    path=file_path,
    file_type=FileType.CSV
)
df.head()

In [None]:
# 🔍 Read a sample from the S3 file

sample_df = connector.read_sample(
    path=file_path,
    file_type=FileType.CSV
)
sample_df.head()

## Write to your AWS S3 Storage

In [None]:
# 📤 Write a file back to S3

output_path = "s3://your-bucket-name/path/to/output.csv"
connector.write_file(df, path=output_path)
print(f"File written to {output_path}")
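
Writing to a path different from the input avoids overwriting the source file. One way to build such a path is to insert a suffix before the file extension, as in the sketch below. `derive_output_path` and the `_output` suffix are hypothetical and not part of `ydata-sdk`.

```python
import posixpath


def derive_output_path(input_path, suffix="_output"):
    """Return `input_path` with `suffix` inserted before the file extension."""
    # posixpath keeps forward slashes on every platform, which suits S3 paths.
    stem, extension = posixpath.splitext(input_path)
    return f"{stem}{suffix}{extension}"


print(derive_output_path("s3://your-bucket-name/path/to/data.csv"))
# s3://your-bucket-name/path/to/data_output.csv
```

The result can be passed as `output_path` in the write cell above.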