# Azure Data Lake Storage Connector - Quick Start

The AzureBlobConnector lets you read and write data in Azure Blob Storage and Azure Data Lake Storage and integrate it with YData's platform.
Reading a dataset from Azure Blob Storage directly into a YData `Dataset` makes it available to the Data Quality, Data Synthesis and Preprocessing blocks.

The AzureBlobConnector methods used in this tutorial are:
- **AzureBlobConnector.read_file** - Reads a file from the given path. Returns a Dataset object.
- **AzureBlobConnector.read_sample** - Reads a sample (sample_size rows) from the file at the given path.
- **AzureBlobConnector.write_file** - Writes a Dataset to the given path.
- **AzureBlobConnector.read** - Reads multiple files from a folder.
- **AzureBlobConnector.delete_blob_if_exists** - Deletes a specific blob if it exists.

This tutorial covers:
- How to read data from Azure Data Lake Storage
- How to read a sample of the data from Azure Data Lake Storage
- How to write data to Azure Data Lake Storage
- How to read data from a Synapse Data Lake

In [3]:
from ydata.connectors import AzureBlobConnector
from ydata.connectors.filetype import FileType
from ydata.utils.formats import read_json

In [4]:
# Instantiate the connector with your storage account credentials.
# Never commit a real account key to a notebook; use a placeholder or load it from a secret store.
connector = AzureBlobConnector(account_name='ydatasynapse',
                               account_key='{insert-account-key}')
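Since the connector takes an `account_name` and an `account_key`, one way to keep the key out of the notebook is to read both from environment variables. The following is a minimal sketch, not part of the YData API: the helper `load_azure_credentials` and the variable names `AZURE_STORAGE_ACCOUNT` and `AZURE_STORAGE_KEY` are assumptions chosen for illustration.

```python
import os


def load_azure_credentials(env=None):
    """Return connector keyword arguments read from environment variables.

    The variable names below are assumptions for this sketch; use whatever
    names your deployment defines.
    """
    env = os.environ if env is None else env
    required = ("AZURE_STORAGE_ACCOUNT", "AZURE_STORAGE_KEY")
    # Fail early with a clear message if a variable is unset or empty.
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return {
        "account_name": env["AZURE_STORAGE_ACCOUNT"],
        "account_key": env["AZURE_STORAGE_KEY"],
    }


# Demonstration with an explicit mapping instead of the real environment.
demo_env = {"AZURE_STORAGE_ACCOUNT": "myaccount", "AZURE_STORAGE_KEY": "not-a-real-key"}
credentials = load_azure_credentials(demo_env)
print(credentials["account_name"])
```

With the variables set in the real environment, the returned dictionary could then be unpacked into the constructor, for example `AzureBlobConnector(**load_azure_credentials())`.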

In [None]:
# Load a dataset
# The file_type argument is optional. If omitted, it is inferred from the path you provide.
# Azure Data Lake file paths must be converted from the HTTPS form:
#   https://{insert-host-name}.dfs.core.windows.net/{insert-container-name}/paysim.csv
# to the abfss form:
#   abfss://{insert-container-name}@{insert-host-name}.dfs.core.windows.net/{insert-file-path}

data = connector.read_file('abfss://{insert-container-name}@{insert-host-name}.dfs.core.windows.net/{insert-file-path}.csv', file_type=FileType.CSV)
data.head(100)
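The path conversion described in the comments above can be automated. This is a minimal sketch using only the standard library; the helper name `https_to_abfss` and the example account and container names are hypothetical and not part of the connector.

```python
from urllib.parse import urlparse


def https_to_abfss(url: str) -> str:
    """Convert https://<host>/<container>/<path> to abfss://<container>@<host>/<path>."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError(f"expected an https URL, got: {url!r}")
    # The first path segment is the container; the remainder is the file path.
    container, _, file_path = parsed.path.lstrip("/").partition("/")
    if not container or not file_path:
        raise ValueError(f"URL must contain a container and a file path: {url!r}")
    return f"abfss://{container}@{parsed.netloc}/{file_path}"


# Hypothetical account and container names, for illustration only.
print(https_to_abfss("https://myaccount.dfs.core.windows.net/mycontainer/paysim.csv"))
```

The resulting string can be passed as the path argument of `connector.read_file`.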

In [None]:
# For a quick glimpse, we can load a small subset of the data - let's say 100 rows
very_small_data = connector.read_sample('abfss://{insert-container-name}@{insert-host-name}.dfs.core.windows.net/{insert-file-path}.csv', sample_size=100, file_type=FileType.CSV)

In [12]:
# Now imagine we want to store the sampled data.
connector.write_file(very_small_data, 'abfss://{insert-container-name}@{insert-host-name}.dfs.core.windows.net/{insert-file-path}.csv', file_type=FileType.CSV)

# Advanced

In [None]:
# Delete a specific blob
connector.delete_blob_if_exists('abfs://{insert-blob-name}/{insert-file-path}.csv', file_type=FileType.CSV)

In [None]:
# Reading multiple CSV files from a folder
data = connector.read('abfss://{insert-container-name}@{insert-host-name}.dfs.core.windows.net/{insert-file-path}.csv', file_type=FileType.CSV)

### Example - Reading data from Synapse ADLS

In [None]:
# Reading from a Synapse Data Lake
connector = AzureBlobConnector(account_name='{insert-synapse-data-lake-accountname}',
                               account_key='{insert-synapse-data-lake-accountkey}')

# A file path copied from ADLS looks like:
#   https://{insert-resource-name}.blob.core.windows.net/{insert-container-name}/{insert-file-path}
# It needs to be converted to:
#   abfs://{insert-container-name}@{insert-resource-name}.dfs.core.windows.net/{insert-file-path}

data = connector.read_file('abfs://{insert-container-name}@{insert-host-name}.dfs.core.windows.net/{insert-file-path}.csv', file_type=FileType.CSV)

#https://ydata-test@ydatasynapse.dfs.core.windows.net/paysim.csv
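The Synapse path conversion noted in the cell above (from the `blob.core.windows.net` HTTPS URL to an `abfs://` URI on the `dfs.core.windows.net` endpoint) can be sketched as a small helper. The function name `blob_url_to_abfs` and the example container name are hypothetical; the sketch assumes the storage account exposes both endpoints under the same account name.

```python
from urllib.parse import urlparse

BLOB_SUFFIX = ".blob.core.windows.net"
DFS_SUFFIX = ".dfs.core.windows.net"


def blob_url_to_abfs(url: str, scheme: str = "abfs") -> str:
    """Rewrite https://<account>.blob.core.windows.net/<container>/<path>
    as <scheme>://<container>@<account>.dfs.core.windows.net/<path>."""
    parsed = urlparse(url)
    host = parsed.netloc
    if not host.endswith(BLOB_SUFFIX):
        raise ValueError(f"not a blob endpoint URL: {url!r}")
    # Everything before the blob suffix is the storage account name.
    account = host[: -len(BLOB_SUFFIX)]
    # The first path segment is the container; the remainder is the file path.
    container, _, file_path = parsed.path.lstrip("/").partition("/")
    if not account or not container or not file_path:
        raise ValueError(f"URL must contain account, container and file path: {url!r}")
    return f"{scheme}://{container}@{account}{DFS_SUFFIX}/{file_path}"


# Hypothetical container name, for illustration only.
print(blob_url_to_abfs("https://ydatasynapse.blob.core.windows.net/mycontainer/paysim.csv"))
```

Passing `scheme="abfss"` yields the TLS variant of the same URI, which is the form used in the earlier cells of this tutorial.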