# Azure Blob Storage Connector - Quick Start

The Azure Blob Storage connector lets you read and write data in Azure Blob Storage and integrate it with YData's platform.
Reading a dataset from Azure Blob Storage directly into a YData `Dataset` makes it available to the Data Quality, Data Synthetisation and Preprocessing blocks.

The AzureBlobConnector allows the user to perform the following actions:
- **AzureBlobConnector.read_file** - Reads data from a provided blob or folder. Supports both CSV and Parquet files.
- **AzureBlobConnector.read_sample** - Reads a sample of `sample_size` rows from a provided blob.
- **AzureBlobConnector.write_file** - Writes a file into a defined blob or path.
- **AzureBlobConnector.ls** - Returns the list of blobs available under a folder.
- **AzureBlobConnector.delete_blob_if_exists** - Deletes a given blob if it exists.


This tutorial covers:
- How to read data from Azure Blob Storage
- How to read a sample of the data from Azure Blob Storage
- How to write data to Azure Blob Storage

In [5]:
from ydata.connectors import AzureBlobConnector
from ydata.connectors.filetype import FileType
from ydata.utils.formats import read_json

In [None]:
# Instantiate the connector
connector = AzureBlobConnector(account_name='{insert-account-name}',
                               account_key='{insert-account-key}')
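
Pasting the account key directly into a notebook makes it easy to leak. A minimal sketch of one alternative, assuming the credentials are exported as environment variables: the variable names `AZURE_STORAGE_ACCOUNT` and `AZURE_STORAGE_KEY` and the helper `load_azure_credentials` are illustrative choices for this example, not something the connector requires.

```python
import os


def load_azure_credentials(env=None):
    """Build the connector's keyword arguments from environment variables.

    The variable names used here are an assumption made for this sketch.
    """
    env = os.environ if env is None else env
    required = ("AZURE_STORAGE_ACCOUNT", "AZURE_STORAGE_KEY")
    # Fail early with a clear message if either variable is unset or empty.
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "account_name": env["AZURE_STORAGE_ACCOUNT"],
        "account_key": env["AZURE_STORAGE_KEY"],
    }
```

The returned dictionary uses the same keyword names as the constructor call above, so it could be unpacked as `AzureBlobConnector(**load_azure_credentials())`.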

In [6]:
# Load a dataset
# The file_type argument is optional. If omitted, it is inferred from the path provided.
data = connector.read_file('abfs://{insert-blob-name}/{insert-file-path}.csv', file_type=FileType.CSV)
data.head(100)

idx,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,...,V21,V22,V23,V24,V25,V26,V27,V28,Amount,Class
0,0.0,-1.359807,-0.072781,2.536347,1.378155,-0.338321,0.462388,0.239599,0.098698,0.363787,...,-0.018307,0.277838,-0.110474,0.066928,0.128539,-0.189115,0.133558,-0.021053,149.62,0
1,84784.0,-1.263051,1.191247,1.876061,-0.329401,-0.101887,-0.363037,0.594946,0.206058,-0.147843,...,-0.106123,-0.247052,-0.220087,0.434852,0.324459,0.137817,-0.553111,-0.234804,1.00,0
2,0.0,1.191857,0.266151,0.166480,0.448154,0.060018,-0.082361,-0.078803,0.085102,-0.255425,...,-0.225775,-0.638672,0.101288,-0.339846,0.167170,0.125895,-0.008983,0.014724,2.69,0
3,84785.0,-0.476385,0.819241,1.783225,1.262337,0.158102,-0.469522,0.808097,-0.293043,-0.543071,...,0.217016,0.863862,-0.053844,0.620216,-0.349329,-0.260047,-0.102630,-0.055400,25.00,0
4,1.0,-1.358354,-1.340163,1.773209,0.379780,-0.503198,1.800499,0.791461,0.247676,-1.514654,...,0.247998,0.771679,0.909412,-0.689281,-0.327642,-0.139097,-0.055353,-0.059752,378.66,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,84816.0,-1.290459,0.753220,1.765556,0.222141,-0.077401,-0.559196,0.096817,0.508429,-0.814842,...,-0.031442,-0.235994,-0.239864,0.586838,0.346136,0.268368,-0.098641,-0.071095,1.18,0
96,35.0,1.386397,-0.794209,0.778224,-0.864708,-1.064132,0.351296,-1.191455,0.052686,-0.304404,...,-0.228727,-0.123522,-0.131025,-0.929668,0.181379,1.194928,0.000531,0.019911,30.90,0
97,84817.0,0.913043,-0.187099,0.194711,0.562217,0.219358,0.806584,-0.048158,0.286140,-0.078557,...,0.190388,0.573681,0.031548,-0.585174,0.151313,0.496746,0.017620,0.012060,81.68,0
98,35.0,-1.063236,1.418191,1.086673,1.241440,0.002306,0.045902,0.514121,0.241252,-0.154500,...,-0.057228,0.314153,-0.129863,0.114284,-0.027812,-0.259937,0.118754,-0.008956,20.22,0
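
The load cell above notes that `file_type` can be inferred from the path. How the connector does this internally is not shown here; the sketch below is only one plausible way to map a path's extension to a file type. The `InferredType` enum and the `infer_file_type` function are hypothetical names for illustration and are not part of `ydata.connectors`.

```python
from enum import Enum
from pathlib import PurePosixPath


class InferredType(Enum):
    """Hypothetical stand-in for the two formats the connector supports."""
    CSV = "csv"
    PARQUET = "parquet"


def infer_file_type(path):
    """Guess the file type from the extension of a blob path."""
    # Take the extension of the last path component, case-insensitively.
    suffix = PurePosixPath(path).suffix.lower().lstrip(".")
    try:
        return InferredType(suffix)
    except ValueError:
        raise ValueError(
            f"Cannot infer file type from {path!r}; pass file_type explicitly"
        ) from None
```

Under this approach a folder path has no extension to inspect, so passing `file_type` explicitly is the safer choice when reading a folder.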


In [None]:
# For a quick glimpse, we can load a small subset of the data - let's say 100 rows
very_small_data = connector.read_sample('abfs://{insert-blob-name}/{insert-file-path}.csv', sample_size=100, file_type=FileType.CSV)
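
Sampling avoids loading the whole file when only a quick look is needed. The notebook does not say whether `read_sample` takes the first rows or a random subset, so the sketch below only illustrates the simplest variant, reading the first `sample_size` rows of a CSV stream and stopping there. It uses the standard library only, and `read_csv_sample` is a hypothetical helper, not the connector's implementation.

```python
import csv
import io
from itertools import islice


def read_csv_sample(stream, sample_size):
    """Return the first `sample_size` rows of a CSV text stream as dictionaries."""
    reader = csv.DictReader(stream)
    # islice stops pulling rows once the sample is full, so the rest is never parsed.
    return list(islice(reader, sample_size))


# Usage with an in-memory stream standing in for a blob download.
csv_text = "Time,Amount\n0,149.62\n1,2.69\n2,378.66\n"
sample = read_csv_sample(io.StringIO(csv_text), sample_size=2)
```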

In [12]:
# Now imagine we want to store the sampled data.
connector.write_file(very_small_data, 'abfs://{insert-blob-name}/{insert-file-path}.csv', file_type=FileType.CSV)

# Advanced

In [None]:
# List the blobs available under a given folder
connector.ls('abfs://{insert-blob-name}/{insert-folder-path}')
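
In a container with a flat namespace, a "folder" is just a shared name prefix, so listing a folder amounts to filtering blob names by that prefix. The sketch below shows that idea on a plain list of names. `list_under_folder` is a hypothetical helper for illustration and says nothing about how `ls` is implemented.

```python
def list_under_folder(blob_names, folder):
    """Return the blob names that sit under `folder`, treating it as a name prefix."""
    # Normalise to exactly one trailing slash so "data" does not match "database/...".
    prefix = folder.rstrip("/") + "/"
    return sorted(name for name in blob_names if name.startswith(prefix))


names = ["data/a.csv", "data/b.csv", "database/c.csv", "other/d.csv"]
under_data = list_under_folder(names, "data")
```

The trailing slash matters: without it, a prefix of `data` would also pick up `database/c.csv`.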

In [None]:
# Delete a specific blob
connector.delete_blob_if_exists('abfs://{insert-blob-name}/{insert-file-path}.csv', file_type=FileType.CSV)

In [None]:
# Read multiple CSV files from a folder
data = connector.read_file('abfs://{insert-blob-name}/{insert-folder-path}', file_type=FileType.CSV)