# S3Connector - Quick Start

The S3Connector lets you read data from and write data to Amazon Simple Storage Service (S3) and integrate it with YData's platform.
Reading a dataset from S3 directly into a YData `Dataset` makes it available to the Data Quality, Data Synthetisation and Preprocessing blocks.

The following tutorial covers:
- How to read data from S3
- How to read a sample of the data from S3
- How to write data to S3
- (Advanced) Developer utilities

In [None]:
# Import the necessary packages
from ydata.connectors import S3Connector
from ydata.connectors.filetype import FileType
from ydata.utils.formats import read_json

In [None]:
# Load your credentials from a file
token = read_json('{insert-path-to-credentials}')

In [None]:
# Instantiate the Connector
connector = S3Connector(**token)
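
The `**token` unpacking above passes every key in the credentials file to the connector as a keyword argument. As a minimal sketch of that pattern, the snippet below writes a throwaway JSON file and loads it back with the standard `json` module. The key names `access_key_id` and `secret_access_key` and the `FakeConnector` class are hypothetical stand-ins, not the S3Connector API; check the connector's signature for the keys it actually expects.

```python
import json
import tempfile
from pathlib import Path


# Hypothetical stand-in for a connector; the real S3Connector arguments may differ.
class FakeConnector:
    def __init__(self, access_key_id, secret_access_key):
        self.access_key_id = access_key_id
        self.secret_access_key = secret_access_key


# Hypothetical credentials; never commit real keys to version control.
credentials = {"access_key_id": "EXAMPLE_ID", "secret_access_key": "EXAMPLE_SECRET"}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "credentials.json"
    path.write_text(json.dumps(credentials))
    # Load the file back into a plain dict, as a JSON reader would.
    loaded = json.loads(path.read_text())

# Each key of the dict becomes a keyword argument of the constructor.
fake = FakeConnector(**loaded)
```

A key in the file that the constructor does not accept raises a `TypeError`, so the file should contain only the fields the connector expects.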

In [None]:
# Load a dataset
data = connector.read_file('s3://{insert-bucket}/{insert-filepath}', file_type=FileType.CSV)
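
The path passed to `read_file` is an S3 URI: the part after `s3://` up to the first slash is the bucket, and the rest is the object key. The sketch below splits a URI that way using only the standard library. `split_s3_uri` is a hypothetical helper written for illustration, not part of the connector.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key). Hypothetical helper."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri!r}")
    # The bucket is the network location; the key is the path without its leading slash.
    return parsed.netloc, parsed.path.lstrip("/")


bucket, key = split_s3_uri("s3://my-bucket/raw/2021/data.csv")
```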

In [None]:
# The file_type argument is optional. If it is omitted, the file type is inferred from the path.
parquet_data = connector.read_file('s3://{insert-bucket}/{insert-filepath}.parquet')
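
To illustrate how a file type can be inferred from a path, here is a minimal sketch that maps the file extension to an enum member. `SketchFileType` and `infer_file_type` are hypothetical names; this is one plausible approach, not necessarily how the connector does it.

```python
from enum import Enum
from pathlib import PurePosixPath


class SketchFileType(Enum):
    """Hypothetical stand-in for the library's FileType."""
    CSV = "csv"
    PARQUET = "parquet"


def infer_file_type(path):
    # Take the final extension, lowercase it, and drop the leading dot.
    suffix = PurePosixPath(path).suffix.lower().lstrip(".")
    try:
        return SketchFileType(suffix)
    except ValueError:
        raise ValueError(f"Cannot infer file type from {path!r}") from None


inferred = infer_file_type("s3://my-bucket/tables/events.parquet")
```

A path without a recognisable extension cannot be resolved this way, which is the case where passing `file_type` explicitly helps.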

In [None]:
# For a quick glimpse, we can load a small subset of the data - let's say 100 rows
small_data = connector.read_sample('s3://{insert-bucket}/{insert-filepath}', sample_size=100)
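
The idea behind reading a sample is to stop after a fixed number of rows instead of loading the whole file. The sketch below does this for CSV text with the standard library by taking the first `sample_size` rows. How `read_sample` selects its rows is not stated here, so taking the head of the file is an assumption, and `read_head` is a hypothetical helper.

```python
import csv
import io
from itertools import islice


def read_head(stream, sample_size):
    """Return at most sample_size rows from a CSV stream. Hypothetical helper."""
    reader = csv.DictReader(stream)
    # islice stops consuming the reader once sample_size rows have been read.
    return list(islice(reader, sample_size))


# In-memory CSV with a header and 1000 data rows, standing in for a remote file.
csv_text = "id,value\n" + "\n".join(f"{i},{i * 10}" for i in range(1000))
rows = read_head(io.StringIO(csv_text), sample_size=100)
```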

In [None]:
# Now imagine we want to store the sampled data.
connector.write_file(small_data, 's3://{insert-bucket}/{insert-filepath}')

In [None]:
# Alternatively, we can write a new DataFrame
import numpy as np
import pandas as pd
dummy_df = pd.DataFrame(np.random.randn(30, 4), columns=list('ABCD'))
connector.write_file(dummy_df, 's3://{insert-bucket}/{insert-filepath}', write_index=True)

## Advanced Features
Connectors provide developer utilities that let Data Scientists navigate S3 storage from code.

* Check if a bucket exists
* List the contents of a bucket

In [None]:
# We can check if a certain bucket exists
connector.check_bucket('{insert-bucket}')

In [None]:
# We can check the contents of a certain bucket
connector.list(bucket_name='{insert-bucket}')

In [None]:
# We can also list only the contents under a given prefix
connector.list('{insert-bucket}', prefix='{insert-prefix}')
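
S3 object keys form a flat namespace, and a prefix is matched as a plain string against the start of each key rather than as a directory. The sketch below reproduces that behaviour on an in-memory list of keys. `list_keys` and the key names are hypothetical and do not call S3.

```python
def list_keys(keys, prefix=""):
    """Return the keys that start with prefix, sorted. Hypothetical helper."""
    return sorted(k for k in keys if k.startswith(prefix))


# Hypothetical object keys in a bucket.
keys = [
    "raw/2021/a.csv",
    "raw/2022/b.csv",
    "raw-archive/c.csv",
    "clean/d.parquet",
]

# A trailing slash restricts the match to the "raw/" folder-like prefix.
under_raw = list_keys(keys, prefix="raw/")

# Without the slash, "raw-archive/..." also matches, because matching is by string.
loose = list_keys(keys, prefix="raw")
```

Ending a prefix with `/` is therefore the usual way to avoid picking up sibling paths that share the same leading characters.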