# S3Connector - Quick Start

The S3Connector lets you read data from and write data to Amazon Simple Storage Service (S3) and integrate it with YData's platform.
Reading a dataset from S3 directly into a YData `Dataset` makes it available to the Data Quality, Data Synthetisation and Preprocessing blocks.

The S3Connector allows the user to perform the following actions:
- **S3Connector.check_bucket** - Validates whether a given bucket is available.
- **S3Connector.ls** - Lists the files available at a given bucket path.
- **S3Connector.read** - Reads from a folder or file/blob given a certain file_type (Parquet or CSV).
- **S3Connector.read_sample** - Reads a sample (sample_size) from a folder or file/blob given a certain file_type (Parquet or CSV).

This tutorial covers:
- How to read data from S3
- How to read a sample of the data from S3
- How to write data to S3


In [2]:
# Import the necessary packages
from ydata.connectors import S3Connector
from ydata.connectors.filetype import FileType
from ydata.utils.formats import read_json

In [3]:
# Load your credentials from a file
#token = read_json('{insert-path-to-credentials}')

token = read_json('s3_credentials.json')

In [4]:
# Instantiate the Connector
connector = S3Connector(**token)
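
The `**token` syntax above unpacks the loaded credentials dictionary into keyword arguments. The sketch below illustrates that pattern with the standard library only. `DemoConnector` and the key names `access_key_id` and `secret_access_key` are assumptions made for illustration; check the S3Connector documentation for the fields it actually expects.

```python
import json
import tempfile
from pathlib import Path


class DemoConnector:
    """Stand-in for a connector; the keyword names are assumptions, not the ydata API."""

    def __init__(self, access_key_id, secret_access_key):
        self.access_key_id = access_key_id
        self.secret_access_key = secret_access_key


# Write a placeholder credentials file (never commit real keys to version control).
credentials_path = Path(tempfile.mkdtemp()) / "s3_credentials.json"
credentials_path.write_text(
    json.dumps({"access_key_id": "<your-key-id>", "secret_access_key": "<your-secret>"})
)

# Load the JSON into a dict, then unpack it as keyword arguments.
with credentials_path.open() as handle:
    demo_token = json.load(handle)

demo_connector = DemoConnector(**demo_token)
print(demo_connector.access_key_id)  # <your-key-id>
```

Every key in the JSON file must match a parameter name of the constructor, otherwise Python raises a `TypeError` when unpacking.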




In [None]:
# Load a dataset
data = connector.read_file('s3://{insert-bucket}/{insert-filepath}', file_type=FileType.CSV)
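
The path passed to `read_file` is an S3 URI of the form `s3://bucket/key`. As a rough illustration of how such a URI splits into a bucket name and an object key, the sketch below uses only Python's standard library. `split_s3_uri` is a hypothetical helper written for this tutorial, not part of the `ydata` package.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split an S3 URI into (bucket, key). Hypothetical helper, not a ydata API."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError(f"Expected an s3:// URI, got: {uri!r}")
    # The bucket is the network location; the key is the path without its leading slash.
    return parsed.netloc, parsed.path.lstrip("/")


bucket, key = split_s3_uri("s3://my-bucket/folder/data.csv")
print(bucket)  # my-bucket
print(key)     # folder/data.csv
```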

In [None]:
# The file_type argument is optional. If not provided, we will infer it from the path you have provided.
parquet_data = connector.read_file('s3://{insert-bucket}/{insert-filepath}.parquet', file_type=FileType.PARQUET)
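
The comment above notes that the file type can be inferred from the path. A minimal sketch of how extension-based inference could work is shown below, assuming the extension maps directly to the file type. `SketchFileType` and `infer_file_type` are hypothetical stand-ins; the connector's actual inference logic may differ.

```python
from enum import Enum
from pathlib import PurePosixPath


class SketchFileType(Enum):
    """Stand-in for ydata's FileType, defined here so the sketch is self-contained."""
    CSV = "csv"
    PARQUET = "parquet"


def infer_file_type(path: str) -> SketchFileType:
    """Guess the file type from the path's extension (assumed behaviour)."""
    suffix = PurePosixPath(path).suffix.lower().lstrip(".")
    try:
        return SketchFileType(suffix)
    except ValueError:
        raise ValueError(f"Cannot infer file type from path: {path!r}") from None


print(infer_file_type("s3://my-bucket/folder/data.parquet"))  # SketchFileType.PARQUET
```

Passing `file_type` explicitly, as the cells above do, avoids any ambiguity when a path has no extension or points at a folder.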

The `sep` and `has_header` parameters are only considered for `FileType.CSV` files.
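
To make the role of these two parameters concrete, here is a small standard-library sketch of what a separator and a header flag typically mean when parsing CSV text. It assumes `sep` and `has_header` behave as in common CSV readers; `parse_csv` is a hypothetical helper and does not show how the connector is implemented.

```python
import csv
import io


def parse_csv(text, sep=",", has_header=True):
    """Parse CSV text into (columns, rows); illustrative only."""
    rows = list(csv.reader(io.StringIO(text), delimiter=sep))
    if has_header:
        # The first row supplies the column names.
        columns, rows = rows[0], rows[1:]
    else:
        # No header row: generate positional column names instead.
        columns = [f"col_{i}" for i in range(len(rows[0]))]
    return columns, rows


columns, rows = parse_csv("id;name\n1;Ada\n2;Linus\n", sep=";", has_header=True)
print(columns)  # ['id', 'name']
print(rows)     # [['1', 'Ada'], ['2', 'Linus']]
```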


In [None]:
# For a quick glimpse, we can load a small subset of the data - let's say 100 rows
small_data = connector.read_sample('s3://{insert-bucket}/{insert-filepath}', sample_size=100)

In [None]:
# Now imagine we want to store the sampled data.
connector.write_file(small_data, 's3://{insert-bucket}/{insert-filepath}', file_type=FileType.CSV)

In [None]:
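# Alternatively, we can write a new DataFrame
import numpy as np
import pandas as pd
# pandas.util.testing.makeDataFrame has been removed from recent pandas releases,
# so build an equivalent 30x4 frame of random floats directly.
dummy_df = pd.DataFrame(np.random.randn(30, 4), columns=list('ABCD'))
connector.write_file(dummy_df, 's3://{insert-bucket}/{insert-filepath}', write_index=True)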

## Advanced Features
Connectors provide developer utilities that enable Data Scientists to navigate S3 Storage via code blocks.

* Check if a bucket exists
* List the contents of a bucket

In [None]:
# We can check if a certain bucket exists
connector.check_bucket('{insert-bucket}')

In [None]:
# We can check the contents of a certain bucket
connector.list(bucket_name='{insert-bucket}')

In [None]:
# We can check the contents under a given prefix
connector.list('{insert-bucket}', prefix='{insert-prefix}')
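
S3 stores objects under flat keys, so "folders" are simply keys that share a prefix. The sketch below shows what filtering by prefix means, using an in-memory list of made-up keys in place of a real bucket. `list_keys` is a hypothetical helper, and the format returned by the connector's own listing call is not shown in this tutorial.

```python
def list_keys(keys, prefix=""):
    """Return the keys that start with `prefix`, in sorted order."""
    return sorted(k for k in keys if k.startswith(prefix))


# Made-up object keys standing in for a bucket's contents.
bucket_keys = [
    "raw/2021/sales.csv",
    "raw/2022/sales.csv",
    "processed/sales.parquet",
]

print(list_keys(bucket_keys, prefix="raw/"))
# ['raw/2021/sales.csv', 'raw/2022/sales.csv']
```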