# Stream Customer Data from Cloud Files to Delta Lake Using Auto Loader
1. Read files from cloud storage using Auto Loader
2. Transform the DataFrame to add the following columns
  * file_path: Cloud file path of the source file
  * ingestion_date: Timestamp at which the record was ingested
3. Write the transformed data stream to a Delta Lake table

1. Read files using Auto Loader

In [0]:
%run ../utils/utils

In [0]:
gizmobox_mount_point = mount_adls("gizmobox")

In [0]:
data = spark.readStream \
    .format("cloudFiles") \
    .option("cloudFiles.format", "json") \
    .option("cloudFiles.schemaLocation", f"{gizmobox_mount_point}/landing/operational_data/customers_autoloader/_schema") \
    .option("cloudFiles.inferColumnTypes", "true") \
    .option("cloudFiles.schemaHints", "date_of_birth DATE, member_since DATE, created_timestamp TIMESTAMP") \
    .load(f"{gizmobox_mount_point}/landing/operational_data/customers_autoloader")
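The option chain above can also be assembled as a plain dictionary, which makes it easier to reuse across sources. The following is a minimal sketch, not part of the original notebook: the helper name `autoloader_options` and its parameters are hypothetical, and it assumes the schema location always sits in a `_schema` folder under the source path, as it does in the cell above.

```python
def autoloader_options(source_path, schema_hints=None, file_format="json"):
    """Build the Auto Loader option map for a source folder (hypothetical helper)."""
    options = {
        "cloudFiles.format": file_format,
        # Assumption: schema tracking files live under <source_path>/_schema
        "cloudFiles.schemaLocation": f"{source_path}/_schema",
        "cloudFiles.inferColumnTypes": "true",
    }
    if schema_hints:
        # schema_hints maps column name -> SQL type, e.g. {"date_of_birth": "DATE"}
        options["cloudFiles.schemaHints"] = ", ".join(
            f"{name} {sql_type}" for name, sql_type in schema_hints.items()
        )
    return options


# Intended use inside the notebook (requires an active Spark session):
# source = f"{gizmobox_mount_point}/landing/operational_data/customers_autoloader"
# hints = {"date_of_birth": "DATE", "member_since": "DATE", "created_timestamp": "TIMESTAMP"}
# data = spark.readStream.format("cloudFiles").options(**autoloader_options(source, hints)).load(source)
```

Keeping the hints in a dictionary avoids hand-editing a long comma-separated string when a column type changes.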

2. Transform the DataFrame to add the following columns
* file_path: Cloud file path of the source file
* ingestion_date: Timestamp at which the record was ingested

In [0]:
from pyspark.sql.functions import col, current_timestamp

data = data.withColumn("file_path", col("_metadata.file_path")) \
    .withColumn("ingestion_date", current_timestamp())

3. Write the transformed data stream to a Delta Lake table

In [0]:
stream_query = write_stream(
  data=data,
  table_name="bronze.customers_autoloader",
  checkpoint=f"{gizmobox_mount_point}/landing/operational_data/customers_autoloader/_checkpoint_stream",
)
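The `write_stream` helper comes from `../utils/utils`, and its body is not shown in this notebook. As an assumption only, a helper with this signature could look roughly like the sketch below. The name `write_stream_sketch`, the `output_mode` parameter, and the append default are illustrative and not taken from the original utility.

```python
def write_stream_sketch(data, table_name, checkpoint, output_mode="append"):
    """Hypothetical stand-in for write_stream; the real helper is in ../utils/utils."""
    return (
        data.writeStream
        .format("delta")
        .outputMode(output_mode)
        # The checkpoint records which files were already processed,
        # so a restarted stream resumes instead of reloading everything.
        .option("checkpointLocation", checkpoint)
        # toTable starts the query and returns a StreamingQuery handle
        .toTable(table_name)
    )
```

The returned handle is what allows a later `stop()` call on the query, as in the final cell of this notebook.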


In [0]:
%sql
select * from bronze.customers_autoloader;

In [0]:
stream_query.stop()