# Connectors - Azure Blob Storage

In this tutorial, you'll learn how to use `ydata-sdk` to interact with Azure Blob Storage.
The `AzureBlobConnector` simplifies connecting to your Azure storage account, allowing 
you to list containers, read and sample files, and write processed or synthetic data back.

### Benefits of Integration
Integrating ydata-sdk with Azure Blob Storage offers several key benefits:

- **Unified Data Access:** Directly interact with data stored in Azure without the need for custom scripts or SDK juggling.
- **Secure and Scalable:** Leverage Azure's secure and scalable cloud infrastructure for enterprise-grade data operations.
- **Data-Centric AI Ready:** Instantly apply ydata-sdk tools—profiling, anonymization, synthetic data generation—on data stored in Azure.
- **Faster Prototyping and Automation:** Simplify workflows for experimentation or CI/CD pipelines with consistent, programmatic access to Azure blobs.
- **Modular and Extensible:** Combine with other connectors or pipeline components in ydata-sdk to build reproducible, cloud-native ML pipelines.

Before running this notebook, make sure that:
1. You have an `azure_credentials.json` file with credentials that grant read and write access to your Azure Blob Storage account.
2. `ydata-sdk` is installed.
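
A quick pre-flight check can catch a missing or malformed credentials file before the connector is created. The sketch below uses only the Python standard library. The helper name `load_credentials` is hypothetical and not part of ydata-sdk. This tutorial does not specify which keys the file must contain, so the function only verifies that the file exists and holds a JSON object.

```python
import json
from pathlib import Path


def load_credentials(path: str) -> dict:
    """Load a JSON credentials file and fail early with a clear message.

    Hypothetical helper for illustration; it is not part of ydata-sdk.
    """
    file = Path(path)
    if not file.is_file():
        raise FileNotFoundError(f"Credentials file not found: {file}")
    with file.open(encoding="utf-8") as handle:
        data = json.load(handle)
    # The connector is later called with **token, so a JSON object is expected.
    if not isinstance(data, dict):
        raise ValueError("Expected a JSON object of connector arguments")
    return data
```

Calling `load_credentials("azure_credentials.json")` returns the parsed dictionary, or raises before any network call is attempted.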

### Authenticate with your YData account

In [None]:
# Authenticate with your ydata-sdk token - https://dashboard.ydata.ai/
import os

os.environ['YDATA_LICENSE_KEY'] = '{add-your-key}'
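
It is easy to run the cell above with the `{add-your-key}` placeholder still in place. As a minimal sketch, assuming you want to detect that case before going further, the hypothetical helper below (not part of ydata-sdk) checks that `YDATA_LICENSE_KEY` is set to something other than the placeholder. It does not validate the key against the YData service.

```python
import os

PLACEHOLDER = "{add-your-key}"  # placeholder string used in this notebook


def license_key_configured(env=None) -> bool:
    """Return True when YDATA_LICENSE_KEY is set and is not the placeholder.

    `env` defaults to os.environ; any mapping can be passed instead.
    """
    env = os.environ if env is None else env
    key = env.get("YDATA_LICENSE_KEY", "").strip()
    return bool(key) and key != PLACEHOLDER
```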

## Create your Azure Blob Storage connector

In [None]:
from ydata.utils.formats import read_json

def get_token(token_path: str):
    """
    Load the Azure credentials from the JSON file at `token_path`.
    The returned mapping is passed to the connector as keyword arguments.
    """
    return read_json(token_path)

token = get_token('insert-credentials-path')

In [None]:
# 🔗 Initialize the Azure Blob connector
from ydata.connectors import AzureBlobConnector

connector = AzureBlobConnector(**token)

## Read from your Azure Blob Storage

With the Azure Blob Storage connector you can:
- Read a single file or a set of files from a container
- Read a sample of a full dataset
- Write new data to a specified blob

In [None]:
# 📦 List all containers in the Azure Storage Account
containers = connector.list()
print("Containers:", containers)

In [None]:
# 📁 List all blobs in a specific container

# Replace with your actual container and prefix path
container_path = "abfs://your-container-name"
blobs = connector.list(path=container_path)
print("Blobs:", blobs)
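
A listing often mixes file types, so it can help to narrow it down before reading. The sketch below assumes the listing can be treated as an iterable of path strings; the actual return type of `connector.list` is not shown in this tutorial, so adapt the input accordingly. `filter_by_suffix` is a hypothetical helper, and the example paths are stand-ins.

```python
def filter_by_suffix(paths, suffix=".csv"):
    """Keep only the paths that end with the given suffix (case-insensitive)."""
    wanted = suffix.lower()
    return [p for p in paths if str(p).lower().endswith(wanted)]


# Stand-in for a listing result; replace with the real return value.
example = [
    "abfs://your-container-name/path/to/data.csv",
    "abfs://your-container-name/path/to/notes.txt",
]
print(filter_by_suffix(example))
```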

In [11]:
# 📥 Read a CSV file from Azure Blob Storage
from ydata.connectors.filetype import FileType

csv_path = "https://your-storage-url/path/to/data.csv"

df = connector.read_file(
    path=csv_path,
    file_type=FileType.CSV
)
df.head()

INFO: 2024-06-11 22:08:56,902 Successfully opened session 01ef283f-328f-115a-a977-2da27926d77a
Dataset

Shape: (5000, 13)
Schema:
         Column Variable type
0            id        string
1           age           int
2        gender           int
3        height           int
4        weight         float
5         ap_hi           int
6         ap_lo           int
7   cholesterol           int
8          gluc           int
9         smoke           int
10         alco           int
11       active           int
12       cardio           int




In [None]:
# 🔍 Read a sample of the CSV file

sample_df = connector.read_sample(
    path="abfs://your-container-name/path/to/data.csv",
    file_type=FileType.CSV
)
sample_df.head()

## Write to your Azure Blob Storage

In [17]:
# 📤 Write data back to Azure Blob Storage

output_path = "abfs://your-container-name/path/to/output.csv"
connector.write_file(df, path=output_path)
print(f"File written to {output_path}")
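
The paths in this notebook follow the `abfs://<container>/<blob path>` shape. As a minimal sketch, assuming you prefer to assemble such strings from parts rather than typing them by hand, the hypothetical helper `abfs_path` (plain string handling, not part of ydata-sdk) joins a container name and path segments while avoiding doubled slashes.

```python
def abfs_path(container: str, *parts: str) -> str:
    """Join a container name and path segments into an abfs:// path string."""
    # Strip stray slashes and drop segments that are empty after stripping.
    segments = [part.strip("/") for part in parts if part.strip("/")]
    return "abfs://" + "/".join([container.strip("/"), *segments])


output_path = abfs_path("your-container-name", "path/to", "output.csv")
print(output_path)
```

The resulting string can be passed as the `path` argument in the write cell above.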

