# Azure Data Lake Storage Connector - Quick Start

The AzureBlob storage connector lets you read and write data in Azure Blob Storage and Azure Data Lake Storage and integrate it with YData's platform.
Reading a dataset from Azure Blob Storage directly into a YData `Dataset` makes it available to the Data Quality, Data Synthesis and Preprocessing blocks.

The AzureBlobConnector allows the user to perform the following actions:
- **AzureBlobConnector.read_file** - Reads a file from a given path. Returns a Dataset object.
- **AzureBlobConnector.read_sample** - Reads a sample of `sample_size` rows from a file. Returns a Dataset object.
- **AzureBlobConnector.read** - Reads data from a given path, including multiple files in a folder.
- **AzureBlobConnector.write_file** - Writes a Dataset to a given path.
- **AzureBlobConnector.delete_blob_if_exists** - Deletes a blob, if it exists.

This tutorial covers:
- How to read data from Azure Data Lake Storage
- How to read data (sample) from Azure Data Lake Storage
- How to write data to Azure Data Lake Storage
- How to read data from an Azure Synapse Data Lake

```python
from ydata.connectors import AzureBlobConnector
from ydata.connectors.filetype import FileType
```

```python
# Instantiate the connector with your storage account name and key
connector = AzureBlobConnector(account_name='ydatasynapse',
                               account_key='{insert-account-key}')
```
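
Hard-coding the account key in a notebook exposes it to anyone who can read or share the notebook. A minimal sketch of a hypothetical helper (not part of the YData SDK) that reads the credentials from environment variables instead; the variable names `AZURE_STORAGE_ACCOUNT_NAME` and `AZURE_STORAGE_ACCOUNT_KEY` are assumptions, so substitute whatever your environment defines:

```python
import os


def azure_credentials_from_env(
    name_var: str = "AZURE_STORAGE_ACCOUNT_NAME",  # assumed variable name
    key_var: str = "AZURE_STORAGE_ACCOUNT_KEY",    # assumed variable name
) -> dict:
    """Return the storage account name and key read from environment variables."""
    # Collect every missing or empty variable so the error names all of them at once.
    missing = [v for v in (name_var, key_var) if not os.environ.get(v)]
    if missing:
        raise RuntimeError(f"missing environment variable(s): {', '.join(missing)}")
    return {"account_name": os.environ[name_var], "account_key": os.environ[key_var]}
```

The returned dictionary can then be unpacked into the connector, e.g. `connector = AzureBlobConnector(**azure_credentials_from_env())`, assuming the connector accepts `account_name` and `account_key` as keyword arguments as in the cell above.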

```python
# Load a dataset.
# The file_type argument is optional; if it is omitted, the file type is inferred from the path.
#
# Azure Data Lake Storage shows file URLs in the HTTPS form:
#   https://{insert-host-name}.dfs.core.windows.net/{insert-container-name}/{insert-file-path}
# The connector expects the abfss:// form instead:
#   abfss://{insert-container-name}@{insert-host-name}.dfs.core.windows.net/{insert-file-path}

data = connector.read_file('abfss://ydata-test@ydatasynapse.dfs.core.windows.net/teste/Synthetic_test.csv',
                           file_type=FileType.CSV)
data.head(100)
```

| idx | Name | Surname | Gender | Email | Date of birth | Age | Company | Street Num | Street Name | City | ... | Zeros_col_44 | Zeros_col_45 | Zeros_col_46 | Zeros_col_47 | Zeros_col_48 | Zeros_col_49 | const_col_0 | const_col_1 | const_col_2 | const_col_3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Phillip | Tran | 0 | qnewton@example.net | 1988-10-14 | 33 | Keller, Gomez and Bruce | 751 | Brandi Circles | Gibbstown | ... | 0.000000 | -1.699533 | 1.716764 | 1.216391 | -0.587719 | 0.268611 | 1 | teste | 10 | USA |
| 1 | Adrian | Schmidt | 0 | juliemurray@example.org | 1983-11-18 | 38 | Carlson and Sons | 49047 | Lisa Square | West Jaredhaven | ... | 0.000000 | 0.631256 | -0.168280 | 0.657332 | 0.203724 | 0.000000 | 1 | teste | 10 | USA |
| 2 | Victor | Cole | 0 | james87@example.net | 1958-05-25 | 64 | Robertson, Patton and Harper | 1945 | Davis Freeway | Port Ericside | ... | 0.000000 | -0.994021 | 1.080202 | 0.600146 | 0.062191 | -0.834428 | 1 | teste | 10 | USA |
| 3 | Stephanie | Hernandez | 1 | robert48@example.org | 1906-09-29 | 117 | Rhodes-Ross | 2981 | Randy Ramp | South Shelley | ... | -0.880854 | -1.821816 | 0.000000 | 0.802958 | -0.723867 | 0.000000 | 1 | teste | 10 | USA |
| 4 | Taylor | Allen | 1 | andre62@example.net | 1932-05-28 | 91 | Sandoval PLC | 724 | Christopher Ports | East Jesse | ... | 0.000000 | -0.687172 | -1.230364 | 0.569923 | 0.018501 | 0.000000 | 1 | teste | 10 | USA |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | Jessica | Oliver | 1 | yking@example.net | 1919-01-22 | 104 | Gilbert Ltd | 9903 | Acosta Stravenue | Williamstad | ... | 0.000000 | -1.925364 | 0.559183 | 1.088353 | -3.160234 | 0.000000 | 1 | teste | 10 | USA |
| 96 | David | Johnson | 0 | angela50@example.net | 1925-10-30 | 97 | Jacobson PLC | 66061 | Wilson Unions | Samanthaview | ... | 0.000000 | -0.911804 | 0.000000 | 1.577182 | 0.854939 | -0.575339 | 1 | teste | 10 | USA |
| 97 | Brittany | Chase | 1 | mary88@example.net | 1911-08-27 | 112 | Harvey Group | 77188 | Robert Underpass | Hillhaven | ... | 1.313577 | -0.821517 | 0.179651 | 1.421257 | 0.922028 | 0.000000 | 1 | teste | 10 | USA |
| 98 | Lance | Ortiz | 0 | christinajohnson@example.com | 1976-09-24 | 46 | Martinez PLC | 200 | Taylor Landing | Port Chadfurt | ... | 0.000000 | 0.100336 | 1.466962 | -1.133585 | 0.456118 | 0.000000 | 1 | teste | 10 | USA |
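
The HTTPS-to-`abfss://` path conversion described in the loading cell is mechanical, so it can be automated. A minimal sketch of a hypothetical `https_to_abfss` helper (not part of the YData SDK), assuming the URL copied from the Azure portal has the form `https://<account>.dfs.core.windows.net/<container>/<path>` or its `blob.core.windows.net` equivalent:

```python
from urllib.parse import urlparse


def https_to_abfss(url: str) -> str:
    """Convert an ADLS HTTPS URL into the abfss://<container>@<account>.dfs.core.windows.net/<path> form."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        raise ValueError(f"expected an https:// URL, got: {url!r}")
    # The account name is the first label of the host; the rest identifies the endpoint.
    account, _, suffix = parsed.netloc.partition(".")
    if suffix not in ("dfs.core.windows.net", "blob.core.windows.net"):
        raise ValueError(f"not an Azure storage endpoint: {parsed.netloc!r}")
    # The first path segment is the container; everything after it is the file path.
    container, _, file_path = parsed.path.lstrip("/").partition("/")
    if not container or not file_path:
        raise ValueError(f"URL must contain a container and a file path: {url!r}")
    # Blob-endpoint URLs are mapped onto the dfs endpoint, which the abfss scheme addresses.
    return f"abfss://{container}@{account}.dfs.core.windows.net/{file_path}"
```

For example, `https_to_abfss('https://ydatasynapse.dfs.core.windows.net/ydata-test/teste/Synthetic_test.csv')` returns `'abfss://ydata-test@ydatasynapse.dfs.core.windows.net/teste/Synthetic_test.csv'`, the path used in the loading cell.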


```python
# For a quick glimpse, load a small subset of the data - let's say 100 rows
very_small_data = connector.read_sample('abfss://{insert-container-name}@{insert-host-name}.dfs.core.windows.net/{insert-file-path}.csv',
                                        sample_size=100,
                                        file_type=FileType.CSV)
```

```python
# Store the sampled data back in the Data Lake
connector.write_file(very_small_data,
                     'abfss://{insert-container-name}@{insert-host-name}.dfs.core.windows.net/{insert-file-path}.csv',
                     file_type=FileType.CSV)
```

## Advanced

```python
# Delete a specific blob, if it exists
connector.delete_blob_if_exists('abfs://{insert-blob-name}/{insert-file-path}.csv', file_type=FileType.CSV)
```

```python
# Read multiple CSV files from a folder
data = connector.read('abfss://{insert-container-name}@{insert-host-name}.dfs.core.windows.net/{insert-file-path}.csv',
                      file_type=FileType.CSV)
```

### Example - Reading data from Azure Synapse ADLS

```python
# Read from a Synapse Data Lake
connector = AzureBlobConnector(account_name='{insert-synapse-data-lake-accountname}',
                               account_key='{insert-synapse-data-lake-accountkey}')

# File path as shown for ADLS:
#   https://{insert-host-name}.blob.core.windows.net/{insert-container-name}/{insert-file-path}
# needs to be converted to (note the dfs endpoint instead of blob):
#   abfs://{insert-container-name}@{insert-host-name}.dfs.core.windows.net/{insert-file-path}

data = connector.read_file('abfs://{insert-container-name}@{insert-host-name}.dfs.core.windows.net/{insert-file-path}.csv',
                           file_type=FileType.CSV)
```
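
The earlier cells use the `abfss://` scheme, while this Synapse example uses `abfs://`. In the Hadoop ABFS driver convention both address the same `dfs.core.windows.net` endpoint and differ only in that `abfss` goes over TLS. A minimal sketch of a hypothetical `build_abfs_path` helper (not part of the YData SDK) that assembles either form from its parts, so the long template does not have to be edited by hand in every cell:

```python
def build_abfs_path(container: str, account: str, file_path: str, secure: bool = True) -> str:
    """Build <scheme>://<container>@<account>.dfs.core.windows.net/<file_path>.

    secure=True yields the TLS-encrypted abfss:// scheme; secure=False yields abfs://.
    """
    if not container or not account:
        raise ValueError("container and account must be non-empty")
    scheme = "abfss" if secure else "abfs"
    # Drop leading slashes so the path is not joined to the host with a double slash.
    return f"{scheme}://{container}@{account}.dfs.core.windows.net/{file_path.lstrip('/')}"
```

For example, `build_abfs_path('ydata-test', 'ydatasynapse', 'teste/Synthetic_test.csv')` returns the `abfss://` path used in the first loading cell; passing `secure=False` returns the `abfs://` form used above.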
