# Google Cloud Storage Connector - Quick Start

The GCS connector lets you read and write data in Google Cloud Storage and integrate it with YData's platform.
Reading a dataset from GCS directly into a YData `Dataset` makes it available to the Data Quality, Data Synthesis and Preprocessing blocks.

The GCSConnector allows the user to perform the following actions:
- **GCSConnector.check_blob** - Validates whether a given blob is available.
- **GCSConnector.delete_blob_if_exists** - Deletes a given blob if it exists. This action is only possible if the provided credentials have delete access.
- **GCSConnector.read** - Reads from a folder or file/blob given a certain file_type (Parquet or CSV).
- **GCSConnector.read_sample** - Reads a sample (sample_size) from a folder or file/blob given a certain file_type (Parquet or CSV).

This tutorial covers:
- How to read data from GCS
- How to read a sample of the data from GCS
- How to write data to GCS

In [1]:
# Import the necessary packages
from ydata.connectors import GCSConnector
from ydata.connectors.filetype import FileType
from ydata.utils.formats import read_json

In [2]:
# Load your credentials from a file.
# Replace the path below with the path to your own credentials file,
# e.g. read_json('{insert-path-to-credentials}')
token = read_json('gcs_credentials.json')
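
Since the next cell reads `token['project_id']`, it can help to check the credentials before instantiating the connector. The following is a minimal sketch using only the standard library. It assumes the credentials file is a standard Google service-account key file and that `read_json` returns a plain dictionary, as `json.load` does. The names `EXPECTED_KEYS`, `missing_keys` and `load_keyfile` are hypothetical and not part of the ydata package.

```python
import json

# Fields present in a standard Google service-account key file.
# Which of them the connector actually requires is an assumption here;
# this tutorial only reads 'project_id' explicitly.
EXPECTED_KEYS = ("type", "project_id", "private_key", "client_email")


def missing_keys(token: dict) -> list:
    """Return the expected keys that are absent from the credentials dict."""
    return [key for key in EXPECTED_KEYS if key not in token]


def load_keyfile(path: str) -> dict:
    """Load a JSON key file and fail early if expected fields are missing."""
    with open(path, encoding="utf-8") as handle:
        token = json.load(handle)
    absent = missing_keys(token)
    if absent:
        raise KeyError(f"Credentials file is missing: {', '.join(absent)}")
    return token
```

Calling `load_keyfile('gcs_credentials.json')` in place of `read_json` would surface an incomplete key file with a clear error rather than a failure later in the connector.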

In [3]:
# Instantiate the Connector
connector = GCSConnector(project_id=token['project_id'], keyfile_dict=token)

In [None]:
# Load a dataset
# The file_type argument is optional. If omitted, it is inferred from the path you provide.
data = connector.read_file('gs://{insert-bucket}/{insert-filepath}', file_type=FileType.CSV)
data.head(100)
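
The cell above notes that `file_type` can be inferred from the path. How the connector does this internally is not shown in this tutorial; the sketch below illustrates one plausible approach, based on the file extension. `SketchFileType` and `infer_file_type` are hypothetical names introduced for illustration and are not the ydata implementation.

```python
from enum import Enum
from pathlib import PurePosixPath


class SketchFileType(Enum):
    """Hypothetical stand-in for ydata's FileType, for illustration only."""
    CSV = "csv"
    PARQUET = "parquet"


def infer_file_type(path: str) -> SketchFileType:
    """Guess the file type from the extension of the last path segment."""
    suffix = PurePosixPath(path).suffix.lower().lstrip(".")
    for file_type in SketchFileType:
        if suffix == file_type.value:
            return file_type
    # No recognised extension (e.g. a folder path): the caller must be explicit.
    raise ValueError(f"Cannot infer file type from path: {path!r}")
```

Under this assumption, a path with no recognised extension, such as a folder, cannot be inferred, which is one reason to pass `file_type` explicitly when reading folders.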

In [None]:
# For a quick glimpse, we can load a small subset of the data - let's say 100 rows
very_small_data = connector.read_sample('gs://{insert-bucket}/{insert-filepath}', sample_size=100, file_type=FileType.CSV)

In [None]:
# Now imagine we want to store the sampled data.
connector.write_file(very_small_data, 'gs://{insert-bucket}/{insert-filepath}')

In [None]:
# Alternatively, we can write a new DataFrame.
# pandas.util.testing was removed in pandas 2.0, so we build a small frame directly.
import pandas as pd
dummy_df = pd.DataFrame({'A': [1.0, 2.0, 3.0], 'B': [4.0, 5.0, 6.0]})
connector.write_file(dummy_df, 'gs://{insert-bucket}/{insert-filepath}', write_index=True)

## Advanced
Advanced features enable you to manage Google Cloud Storage directly through the connector.

In [None]:
# Read multiple CSV files from a folder using a glob pattern
data = connector.read('gs://{insert-bucket}/{insert-folder}/*.csv', file_type=FileType.CSV)

# or by pointing at the folder itself
data = connector.read('gs://{insert-bucket}/{insert-folder}/', file_type=FileType.CSV)

In [None]:
# Delete a specific blob
connector.delete_blob_if_exists('gs://{insert-bucket}/{insert-filepath}')

In [None]:
# List the contents under a given bucket
connector.ls('gs://{insert-bucket}/')

In [None]:
# List the contents under a given path within a bucket
connector.ls('gs://{insert-bucket}/{insert-path}')
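
Every path in this tutorial follows the `gs://{bucket}/{path}` form. As a small illustration of how such a path breaks down, the sketch below splits it into a bucket name and a blob path using plain string operations. `split_gcs_uri` is a hypothetical helper, not part of the connector, and it makes no claim about how the connector parses paths internally.

```python
def split_gcs_uri(uri: str) -> tuple:
    """Split 'gs://bucket/path/to/blob' into ('bucket', 'path/to/blob')."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"Not a GCS URI: {uri!r}")
    # Everything up to the first '/' is the bucket; the rest is the blob path.
    bucket, _, blob_path = uri[len(prefix):].partition("/")
    if not bucket:
        raise ValueError(f"Missing bucket name in: {uri!r}")
    return bucket, blob_path
```

A bucket-only path such as `gs://my-bucket/` yields an empty blob path, which corresponds to listing the bucket root.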