# Volumes 
- Volumes allow role-based access control, with permissions managed through Unity Catalog
- Use **Volumes** for data that needs to stay in file form
- e.g. raw CSVs, JSON logs, images, PDFs, intermediate files
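
Files in a volume are addressed with paths of the form `/Volumes/<catalog>/<schema>/<volume>/<path>`, the pattern used by every path in this notebook. The following is a minimal sketch of a helper that builds such paths. The function name `volume_path` is hypothetical and is not a Databricks API. It only assembles the string and does not check that anything exists.

```python
from posixpath import join


def volume_path(catalog: str, schema: str, volume: str, *parts: str) -> str:
    """Build /Volumes/<catalog>/<schema>/<volume>/<parts...> (hypothetical helper).

    It only joins strings; it does not check that the volume or file exists.
    """
    for name in (catalog, schema, volume):
        # Each of the three names must be a single, non-empty path segment
        if not name or "/" in name:
            raise ValueError(f"invalid catalog/schema/volume name: {name!r}")
    # Strip stray slashes so a leading "/" in a part cannot reset the path
    cleaned = [p.strip("/") for p in parts if p.strip("/")]
    return join("/Volumes", catalog, schema, volume, *cleaned)


print(volume_path("workspace", "default", "raw_data", "ECommerceMessyData.csv"))
# /Volumes/workspace/default/raw_data/ECommerceMessyData.csv
```

The resulting string is the kind of path passed to `spark.read` and `%fs ls` later in this notebook.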

## Volumes vs Delta Tables
- Use **Delta Tables** for structured data that we query with SQL
- They are optimized for querying (file-level statistics for data skipping, optional indexes) and support ACID transactions and time travel

## Load files
- Create Volume `raw_data` under Data Ingestion -> workspace -> default

![CreateVolume.png](./CreateVolume.png "CreateVolume.png")

- Upload the file `ECommerceMessyData.csv` to the volume

- The volume and the uploaded file now appear in the Catalog

![CatalogVolume.png](./CatalogVolume.png "CatalogVolume.png")

- Load the file with Spark

```python
# header=true: use the first line of the file as column names
orders_from_volume = spark.read.option("header", "true").csv("/Volumes/workspace/default/raw_data/ECommerceMessyData.csv")
orders_from_volume.count()
```

```
20
```

```python
display(orders_from_volume)
```

Output (only 10 of the 20 rows are shown here):

| order_id | product_name | category | country | payment_method | total_amount |
|---|---|---|---|---|---|
| 1 | LAPTOP_PRO | Electronics | USA | Credit Card | 1299.99 |
| 2 | office chair | Furniture | Canada | Debit Card | 299.5 |
| 3 | wireless_Mouse | Electronics | UK | PayPal | 49.99 |
| 4 | DESK_LAMP | Furniture | USA | Credit Card | 79.99 |
| 5 | notebook_SET | Stationery | Germany | Bank Transfer | 29.99 |
| 6 | USB Cable | Accessories | France | PayPal | 15.99 |
| 7 | mechanical_KEYBOARD | Electronics | Australia | Debit Card | 159.99 |
| 8 | MONITOR_4K | Electronics | India | Credit Card | 599.99 |
| 9 | ergonomic_CHAIR | Furniture | Canada | PayPal | 449.99 |
| 10 | wireless HEADPHONES | Electronics | USA | Credit Card | 199.99 |


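```python
orders_from_volume.printSchema()
```

```
root
 |-- order_id: string (nullable = true)
 |-- product_name: string (nullable = true)
 |-- category: string (nullable = true)
 |-- country: string (nullable = true)
 |-- payment_method: string (nullable = true)
 |-- total_amount: string (nullable = true)
```

Every column is read as `string`, including `order_id` and `total_amount`. The CSV reader does not infer types unless `inferSchema` is enabled or an explicit schema is supplied.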
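
Python's standard `csv` module treats a header row the same way Spark's CSV reader does here. The first line becomes the column names, and every value stays a string until it is converted explicitly. The sketch below uses the first three rows of the `display()` output above. The `sample` text is copied from that output; it is not read from the volume.

```python
import csv
import io

# First rows of ECommerceMessyData.csv, as shown in the display() output above
sample = """order_id,product_name,category,country,payment_method,total_amount
1,LAPTOP_PRO,Electronics,USA,Credit Card,1299.99
2,office chair,Furniture,Canada,Debit Card,299.5
3,wireless_Mouse,Electronics,UK,PayPal,49.99
"""

# DictReader takes its keys from the header line, like option("header", "true")
rows = list(csv.DictReader(io.StringIO(sample)))

for column, value in rows[0].items():
    # Every value is a str, mirroring the all-string schema printed by Spark
    print(f"{column}: {type(value).__name__} = {value!r}")
```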



- Process the data in Spark: here, keep only the columns needed downstream

```python
# Keep only the columns needed downstream (the count() result was never shown, so it is dropped)
processed_orders = orders_from_volume.select("order_id", "product_name", "total_amount")
display(processed_orders)
processed_orders.printSchema()
```

Output (only 10 of the 20 rows are shown here):

| order_id | product_name | total_amount |
|---|---|---|
| 1 | LAPTOP_PRO | 1299.99 |
| 2 | office chair | 299.5 |
| 3 | wireless_Mouse | 49.99 |
| 4 | DESK_LAMP | 79.99 |
| 5 | notebook_SET | 29.99 |
| 6 | USB Cable | 15.99 |
| 7 | mechanical_KEYBOARD | 159.99 |
| 8 | MONITOR_4K | 599.99 |
| 9 | ergonomic_CHAIR | 449.99 |
| 10 | wireless HEADPHONES | 199.99 |

```
root
 |-- order_id: string (nullable = true)
 |-- product_name: string (nullable = true)
 |-- total_amount: string (nullable = true)
```
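
The selected columns are still messy. Product names mix upper case, underscores and spaces, and `total_amount` is a string. The sketch below applies a hypothetical cleaning rule in plain Python to the ten rows shown above. It lower-cases names, turns underscores into single spaces, and parses amounts as `Decimal` so that sums are exact. The rule and the name `clean_name` are illustrative assumptions, not part of the original notebook.

```python
import re
from decimal import Decimal

# (order_id, product_name, total_amount) as they appear in the output above
raw_rows = [
    ("1", "LAPTOP_PRO", "1299.99"),
    ("2", "office chair", "299.5"),
    ("3", "wireless_Mouse", "49.99"),
    ("4", "DESK_LAMP", "79.99"),
    ("5", "notebook_SET", "29.99"),
    ("6", "USB Cable", "15.99"),
    ("7", "mechanical_KEYBOARD", "159.99"),
    ("8", "MONITOR_4K", "599.99"),
    ("9", "ergonomic_CHAIR", "449.99"),
    ("10", "wireless HEADPHONES", "199.99"),
]


def clean_name(name: str) -> str:
    """Hypothetical rule: underscores -> spaces, collapse whitespace, lower-case."""
    return re.sub(r"\s+", " ", name.replace("_", " ")).strip().lower()


# Cast the string columns: order_id to int, total_amount to an exact Decimal
cleaned = [(int(oid), clean_name(name), Decimal(amount)) for oid, name, amount in raw_rows]
total = sum(amount for _, _, amount in cleaned)

print(cleaned[0])  # (1, 'laptop pro', Decimal('1299.99'))
print(total)       # 3185.41
```

In the notebook itself, the same rule would more naturally be written with `pyspark.sql.functions`, for example `lower`, `regexp_replace`, and `Column.cast("decimal(10,2)")`, so that it runs on all rows in Spark.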



## Write to Volumes

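- Create another volume, `processed_data`
- Write the data in CSV format with Spark
- Write the data in Parquet format with Spark
- Spark writes each output path as a directory of part files rather than a single file, as the listing at the end shows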

```python
# mode("overwrite") replaces any previous output at this path
processed_orders.write.mode("overwrite").option("header", "true").csv("/Volumes/workspace/default/processed_data/ECommerceCleanData.csv")
```

```python
processed_orders.write.mode("overwrite").parquet("/Volumes/workspace/default/processed_data/ECommerceCleanData.parquet")
```
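
Both writes above create a directory at the given path rather than a single file. Spark typically writes one `part-*` file per partition, plus bookkeeping files whose names start with `_` (such as `_SUCCESS`). With `header=true`, each CSV part file carries its own header row. The sketch below shows in plain Python how such a CSV output directory could be read back. It assumes part files match `part-*.csv` and builds a fake directory in a temp folder so it runs anywhere. The file names are illustrative, not copied from this volume.

```python
import csv
import tempfile
from pathlib import Path


def read_spark_csv_dir(directory: Path) -> list[dict[str, str]]:
    """Read every part-*.csv file in a Spark CSV output directory.

    The glob pattern skips bookkeeping files such as _SUCCESS or hidden .crc files.
    Each part file is parsed with its own header row.
    """
    rows: list[dict[str, str]] = []
    for part in sorted(directory.glob("part-*.csv")):
        with part.open(newline="") as f:
            rows.extend(csv.DictReader(f))
    return rows


# Build a fake output directory that mimics the layout (names are illustrative)
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "ECommerceCleanData.csv"
    out.mkdir()
    (out / "_SUCCESS").write_text("")
    (out / "part-00000.csv").write_text(
        "order_id,product_name,total_amount\n1,LAPTOP_PRO,1299.99\n2,office chair,299.5\n"
    )
    (out / "part-00001.csv").write_text(
        "order_id,product_name,total_amount\n3,wireless_Mouse,49.99\n"
    )
    result = read_spark_csv_dir(out)

print([r["order_id"] for r in result])  # ['1', '2', '3']
```

Inside Databricks, the simpler route is to read the directory path with `spark.read`, which handles the part files itself.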

## Read Parquet Data

```python
# Parquet files carry their own schema, so no header option is needed
parquet_data = spark.read.parquet("/Volumes/workspace/default/processed_data/ECommerceCleanData.parquet")
display(parquet_data)
```

Output (only 10 of the 20 rows are shown here):

| order_id | product_name | total_amount |
|---|---|---|
| 1 | LAPTOP_PRO | 1299.99 |
| 2 | office chair | 299.5 |
| 3 | wireless_Mouse | 49.99 |
| 4 | DESK_LAMP | 79.99 |
| 5 | notebook_SET | 29.99 |
| 6 | USB Cable | 15.99 |
| 7 | mechanical_KEYBOARD | 159.99 |
| 8 | MONITOR_4K | 599.99 |
| 9 | ergonomic_CHAIR | 449.99 |
| 10 | wireless HEADPHONES | 199.99 |


## List volume contents

```
%fs ls "/Volumes/workspace/default/raw_data"
```

| path | name | size | modificationTime |
|---|---|---|---|
| dbfs:/Volumes/workspace/default/raw_data/ECommerceMessyData.csv | ECommerceMessyData.csv | 1120 | 1768913924000 |

```
%fs ls "/Volumes/workspace/default/processed_data"
```

| path | name | size | modificationTime |
|---|---|---|---|
| dbfs:/Volumes/workspace/default/processed_data/ECommerceCleanData.csv/ | ECommerceCleanData.csv/ | 0 | 1768915686873 |
| dbfs:/Volumes/workspace/default/processed_data/ECommerceCleanData.parquet/ | ECommerceCleanData.parquet/ | 0 | 1768915686873 |

Both outputs are directories (trailing `/`, size `0`) that hold Spark's part files. `modificationTime` is given in milliseconds since the Unix epoch.
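
The `modificationTime` values in the listings above are easier to read as dates. The sketch below assumes they are milliseconds since the Unix epoch, which their magnitude indicates. It converts the two values to UTC using integer arithmetic, so no precision is lost to floating point. The helper name `from_epoch_millis` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_epoch_millis(ms: int) -> datetime:
    """Convert a millisecond Unix timestamp (as in %fs ls output) to an aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)


raw_upload = from_epoch_millis(1768913924000)       # raw_data/ECommerceMessyData.csv
processed_write = from_epoch_millis(1768915686873)  # processed_data outputs

print(raw_upload.isoformat())        # 2026-01-20T12:58:44+00:00
print(processed_write - raw_upload)  # 0:29:22.873000
```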
