# Connectors - Google Cloud Storage (GCS)

In this notebook, you'll learn how to use `ydata-sdk` to connect to Google Cloud Storage (GCS) using the `GCSConnector`.

This enables seamless access to GCS-hosted files for profiling, generating synthetic data, and anonymization workflows.

Before you begin:
1. Make sure you have a Google service account key.
2. Confirm that the key grants read and write access to the GCS buckets you plan to use.

### Benefits of Integration
Integrating ydata-sdk with Google Cloud Storage (GCS) offers several key benefits:

- **Direct Cloud Access:** Load and store data directly from GCS buckets using a unified SDK interface.
- **Cloud-Native Workflows:** Avoid manual file downloads and local storage by operating fully in the cloud.
- **Credential Security:** Use Google service account keys securely and consistently across environments.
- **Integrated Tooling:** Unlock data profiling, synthesis, and Q&A generation directly from GCS data sources.
- **Scalability & Automation:** Power reproducible pipelines and ML workflows at scale using cloud infrastructure.

Before running this notebook:
1. Ensure you have a `gcs_credentials.json` file with the credentials needed to access the data in your GCS buckets (read and write permissions).
2. Make sure `ydata-sdk` is installed.
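
Before handing the key file to the connector, it can help to confirm that it looks like a Google service account key. The sketch below is a minimal check, assuming the standard service account JSON layout (`type`, `project_id`, `private_key`, `client_email`). `missing_key_fields` is a hypothetical helper written for this notebook, not part of `ydata-sdk`.

```python
import json
from pathlib import Path

# Fields found in a standard Google service account key file (assumed layout).
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")

def missing_key_fields(path):
    """Return the required fields that are absent from the JSON key file at `path`."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return [field for field in REQUIRED_FIELDS if field not in data]
```

Calling `missing_key_fields("gcs_credentials.json")` returns an empty list when every expected field is present. The check only inspects the file's structure; it does not verify the permissions granted to the key.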

### Authenticate with your YData account

In [None]:
# Authenticate with your ydata-sdk token - https://dashboard.ydata.ai/
import os

os.environ['YDATA_LICENSE_KEY'] = '{add-your-key}'
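
A common slip is running the cell above with the placeholder still in place. As a minimal sketch, the hypothetical helper below (not part of `ydata-sdk`) checks that `YDATA_LICENSE_KEY` holds something other than an empty string or the `{add-your-key}` placeholder. It does not validate the key itself.

```python
import os

# Placeholder text used in the authentication cell of this notebook.
PLACEHOLDER = "{add-your-key}"

def license_key_is_set(env=os.environ):
    """Return True when YDATA_LICENSE_KEY is non-empty and not the placeholder."""
    value = env.get("YDATA_LICENSE_KEY", "").strip()
    return bool(value) and value != PLACEHOLDER
```

The `env` parameter accepts any mapping, so the check can be tried against a plain dictionary without touching the real environment.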

## Create your GCS connector

In [None]:
from ydata.utils.formats import read_json

def get_token(token_path: str):
    """
    Utility to load the token from .secrets directory,
    supporting both local and cloud environments.
    """
    return read_json(token_path)

token = get_token('insert-credentials-path')

In [None]:
# 🔗 Initialize the GCS connector
from ydata.connectors import GCSConnector

# `token` is the service account key loaded in the previous cell
connector = GCSConnector("insert-bucket-name",
                         keyfile_dict=token)

## Read from your Google Cloud Storage

With the GCS connector you can:
- Read a single file or a set of files from a folder
- Get a sample of the full dataset
- Write new data to a defined folder
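
The read and write calls in this notebook take `gs://` paths. The sketch below shows one way to assemble such a path from a bucket name and folder segments without doubling or dropping slashes. `gcs_uri` is a hypothetical helper, not part of `ydata-sdk`, and the bucket and folder names are placeholders.

```python
def gcs_uri(bucket, *parts):
    """Join a bucket name and path segments into a gs:// URI."""
    # Drop empty segments and surrounding slashes so the join stays clean.
    segments = [p.strip("/") for p in parts if p and p.strip("/")]
    return "gs://" + "/".join([bucket.strip("/")] + segments)

# Placeholder names, for illustration only
example_path = gcs_uri("insert-bucket-name", "insert-prefix/", "data.csv")
```

Building paths this way keeps the bucket and prefix in one place when the same locations are reused across read, sample, and write calls.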

In [None]:
# 📁 List blobs (files) inside a specific bucket

blobs = connector.list(key='insert-prefix', 
                       bucket_name='insert-bucket-name')
print("Files:", blobs)

In [None]:
# 📥 Read a file (e.g., CSV) from GCS
from ydata.connectors.filetype import FileType

file_path = "gs://path-to-file/data.csv"

df = connector.read_file(
    path=file_path,
    file_type=FileType.CSV
)
df.head()

In [None]:
# 🔍 Read a sample from the GCS file

sample_df = connector.read_sample(
    path=file_path,
    file_type=FileType.CSV
)
sample_df.head()

## Write to your Google Cloud Storage

In [None]:
# 📤 Write a file back to GCS

output_path = "gs://path-to-file/output.csv"
connector.write_file(df, path=output_path)
print(f"File written to {output_path}")