# Module 6: Azure Blob Storage

In the digital world, data is often stored in cloud environments. Understanding how to work with data in a cloud environment is essential for a Data Engineer. In this training we focus on data stored in Azure Blob Storage.

During this training we will go into detail about Azure Blob Storage and how to work with the data stored in it using Python. The training follows this outline:
1. What is Azure Blob Storage?
2. Connecting with Azure Blob Storage
3. Azure Blob Storage operations 

Enjoy!

### Section 1: What is Azure Blob Storage? (15 min)

- Talk about Microsoft Azure (small introduction)
- Name several services, and why it can be relevant for a data engineer
- Move to Blob Storage

Azure Blob Storage is an online place where data can be stored; it's as simple as that. As with almost any data storage location, we can let Python do the heavy lifting for us: there are Python libraries that allow us to communicate with this kind of storage and retrieve information from the stored data.

Azure Blob Storage is a storage solution for the cloud, optimized for storing huge amounts of unstructured data. Unstructured data is data that usually doesn't follow a data model, such as text, images, or videos. This makes Azure Blob Storage an ideal place to quickly store and access your (unstructured) data.

One can view Azure Blob Storage as a file system in which data or files are stored hierarchically. Once data is stored in Azure Blob Storage, you can access it using the Azure Blob Storage REST API, which we'll look at further on.
The structure of Azure Blob Storage is described in the image below.

![blob1.png](attachment:blob1.png)

In the image above, you can see three important things: the storage account, the containers within the storage account, and the blobs within the containers. These are the core of what makes up Azure Blob Storage. Let's have a look at each of them.

**The storage account**
A storage account provides a unique namespace in Azure for your data: it creates a space to store your objects and allows you to find those objects again. Every object that you store in Azure Storage has an address that includes your unique account name. The combination of the account name and the Blob Storage endpoint forms the base address for the objects in your storage account. For example, if your storage account is named newstorageaccount, then the default endpoint for Blob Storage is: https://newstorageaccount.blob.core.windows.net
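
To make the addressing scheme concrete, here is a minimal sketch that assembles the address of a container or blob from its parts. The helper name `blob_url` and the container and blob names are hypothetical, and the sketch assumes HTTPS; only the account name and the `blob.core.windows.net` suffix come from the example above.

```python
from urllib.parse import quote


def blob_url(account_name: str, container: str = "", blob_name: str = "") -> str:
    """Build the address of a storage account, a container, or a blob."""
    # Base address: the account name plus the default Blob Storage endpoint suffix.
    url = f"https://{account_name}.blob.core.windows.net"
    if container:
        url += f"/{container}"
    if container and blob_name:
        # Blob names may contain spaces or other characters that need escaping;
        # "/" is left alone so that folder-like names stay readable.
        url += f"/{quote(blob_name, safe='/')}"
    return url


print(blob_url("newstorageaccount"))
print(blob_url("newstorageaccount", "images", "2021/cat picture.png"))
```

The first call prints the base address of the account; the second adds a container and an escaped blob name to it.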

**Containers**
A container organizes a set of blobs, similar to a directory in a file system. A storage account can include an unlimited number of containers, and a container can store an unlimited number of blobs.

**Blobs**
Blobs are the lowest level within Azure Blob Storage. They are the files that you store within your Blob Storage. There are three different types of blobs: block blobs, append blobs, and page blobs. In this training we will only be using block blobs. For now, all you need to know about the different blob types is that they exist, and that each of them has a different functionality. See the documentation if you want more information: https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blob-pageblob-overview?tabs=dotnet.

With this as a basis, we can move on to working with Azure Blob Storage. As with every data-related subject, we can use Python. During this training we will start with a focus on the following core operations:
- Create a container.
- Upload a file to block blob.
- List blobs.
- Download a blob to file.
- Delete a blob.
- Delete the container.

Afterwards, we will put them into practice, so that it all becomes familiar.
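
As a preview, the six operations listed above could be sketched as follows with the `azure-storage-blob` package (version 12). This is a minimal sketch rather than the definitive way to do it: the function names `connect` and `blob_lifecycle` are hypothetical, and the blob is downloaded into memory instead of to a file to keep the example short.

```python
def connect(connection_string: str):
    """Create a BlobServiceClient from a connection string."""
    # Imported here so the rest of the sketch can be read (and tried with a
    # stand-in client) without the SDK installed: pip install azure-storage-blob
    from azure.storage.blob import BlobServiceClient

    return BlobServiceClient.from_connection_string(connection_string)


def blob_lifecycle(service_client, container_name: str, blob_name: str, data: bytes):
    """Run the six core operations once and return what was observed."""
    # 1. Create a container; this returns a client for the new container.
    container = service_client.create_container(container_name)

    # 2. Upload data as a block blob (the default blob type of upload_blob).
    container.upload_blob(name=blob_name, data=data, overwrite=True)

    # 3. List the blobs in the container.
    names = [blob.name for blob in container.list_blobs()]

    # 4. Download the blob again (here: fully into memory).
    downloaded = container.download_blob(blob_name).readall()

    # 5. Delete the blob.
    container.delete_blob(blob_name)

    # 6. Delete the container.
    container.delete_container()

    return names, downloaded
```

Against a real storage account this could be called as `blob_lifecycle(connect(connection_string), "demo-container", "hello.txt", b"Hello, blob!")`, where the container and blob names are only examples.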

https://docs.microsoft.com/en-us/samples/azure-samples/azure-sdk-for-python-storage-blob-upload-download/upload-download-blobs-python/
https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/storage/azure-storage-blob/samples/blob_samples_hello_world.py

- Create a Storage Account using the Azure Portal.
- Create a container.
- Upload a file to block blob.
- List blobs.
- Download a blob to file.
- Delete a blob.
- Delete the container.

In [None]:
key_1 = "o3qLInEpzd+NxLrTEOIVmlxFhbY+cq8zuPMRQzsIiNXkY85M+XcfntK3q0iv7vXOKvlvAPX3gmKB+AStJy9zyQ=="
connection_string = f"DefaultEndpointsProtocol=https;AccountName=pythonblobstorage;AccountKey={key_1};EndpointSuffix=core.windows.net"
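
A connection string is a semicolon-separated list of `Key=Value` pairs, as the cell above shows. The sketch below splits one into its parts, and shows one possible way to keep the account key out of the notebook by reading the connection string from an environment variable. The function names and the placeholder key are hypothetical, and the variable name `AZURE_STORAGE_CONNECTION_STRING` is an assumed convention, not something this training prescribes.

```python
import os


def parse_connection_string(connection_string: str) -> dict:
    """Split 'Key=Value;Key=Value' pairs into a dictionary."""
    parts = {}
    for pair in connection_string.split(";"):
        if not pair:
            continue  # tolerate a trailing semicolon
        # Split on the first "=" only: account keys are base64 and may end in "=".
        key, _, value = pair.partition("=")
        parts[key] = value
    return parts


def load_connection_string(variable: str = "AZURE_STORAGE_CONNECTION_STRING") -> str:
    """Read the connection string from an environment variable."""
    value = os.environ.get(variable)
    if value is None:
        raise RuntimeError(f"Environment variable {variable} is not set")
    return value


example = (
    "DefaultEndpointsProtocol=https;AccountName=pythonblobstorage;"
    "AccountKey=<your-account-key>;EndpointSuffix=core.windows.net"
)
print(parse_connection_string(example)["AccountName"])
```

Splitting on the first `=` only is the important design choice here: a naive `split("=")` would cut off the padding at the end of the account key.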

### Section 2: Connecting with Azure Blob Storage (45 min)

- First connect to the Blob Storage
- Then have a look at the different files and accessing them, the existing structure
- How to download a blob
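
The steps above (connecting, inspecting the existing structure, and downloading a blob) could look roughly like this. It is a sketch under assumptions: `describe_account` and `download_to_file` are hypothetical helper names, and `service_client` is assumed to be an `azure.storage.blob.BlobServiceClient`, for example one created with `BlobServiceClient.from_connection_string(connection_string)`.

```python
from pathlib import Path


def describe_account(service_client) -> dict:
    """Map each container name to the names of the blobs it holds."""
    structure = {}
    for container in service_client.list_containers():
        container_client = service_client.get_container_client(container.name)
        structure[container.name] = [b.name for b in container_client.list_blobs()]
    return structure


def download_to_file(container_client, blob_name: str, target_dir: str) -> Path:
    """Download one blob and write it into target_dir, returning the local path."""
    # Keep only the last part of the blob name, so "2021/report.csv" is
    # saved as "report.csv" inside target_dir.
    target = Path(target_dir) / Path(blob_name).name
    target.parent.mkdir(parents=True, exist_ok=True)
    # download_blob returns a stream; readall() loads the full content into memory.
    target.write_bytes(container_client.download_blob(blob_name).readall())
    return target
```

Reading the whole blob with `readall()` is fine for small files; for very large blobs a streaming approach would be the better choice.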

### Section 3: Azure Blob Storage operations (60 min)

- Create your own containers
- Fill them with your own blobs
- Delete specific blobs
- Delete container
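
For filling a container with your own blobs and deleting specific ones, a possible sketch is shown below. The helper names `upload_directory` and `delete_blobs_with_prefix` are hypothetical, and `container_client` is assumed to be an `azure.storage.blob.ContainerClient`, for example the object returned by `BlobServiceClient.create_container`.

```python
from pathlib import Path


def upload_directory(container_client, local_dir: str, prefix: str = "") -> list:
    """Upload every file under local_dir and return the blob names used."""
    uploaded = []
    root = Path(local_dir)
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue  # skip sub-directories themselves, only upload files
        # Use forward slashes so sub-folders show up as folder-like blob names.
        blob_name = prefix + path.relative_to(root).as_posix()
        with path.open("rb") as handle:
            container_client.upload_blob(name=blob_name, data=handle, overwrite=True)
        uploaded.append(blob_name)
    return uploaded


def delete_blobs_with_prefix(container_client, prefix: str) -> list:
    """Delete every blob whose name starts with prefix; return the deleted names."""
    # Collect the names first, so nothing is deleted while the listing is running.
    names = [b.name for b in container_client.list_blobs(name_starts_with=prefix)]
    for name in names:
        container_client.delete_blob(name)
    return names
```

Deleting the container itself afterwards is a single call: `container_client.delete_container()`.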

#### Assignment 1: The ElementTree library 1

First import the ElementTree library (the usual way is to import the library 'as ET').
Then load the XML document into a variable called 'tree'.
After that, get the root of the 'tree' variable.

Lastly, print the document using the already existing statement.