# Stream Customer Data from Cloud Files to Delta Lake
1. Read files from cloud storage using the DataStreamReader API
2. Transform the dataframe to add the following columns
  * file path: Cloud file path of the source file
  * ingestion date: Current timestamp at ingestion
3. Write the transformed data stream to a Delta Lake table

1. Read files using the DataStreamReader API

In [0]:
%run ../utils/utils

In [0]:
gizmobox_mount_point = mount_adls("gizmobox")

In [0]:
from pyspark.sql.types import StructType, StructField, IntegerType, StringType, DateType, TimestampType

schema = StructType(
    fields=[
        StructField("customer_id", IntegerType()),
        StructField("customer_name", StringType()),
        StructField("date_of_birth", DateType()),
        StructField("telephone", StringType()),
        StructField("email", StringType()),
        StructField("member_since", DateType()),
        StructField("created_timestamp", TimestampType()),
    ]
)


In [0]:
data = read_stream(
    file_name="landing/operational_data/customers_stream",
    mount_point=gizmobox_mount_point,
    schema=schema
)
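
`read_stream` is defined in `../utils/utils`, which is not shown in this notebook. The following is a minimal sketch of what such a helper might look like, not the actual implementation. It assumes the landing files are JSON (the format is not stated here, so `file_format` and its default are hypothetical) and takes the session as an explicit, hypothetical `spark` parameter, whereas the notebook helper presumably relies on the global `spark`.

```python
def read_stream(spark, file_name, mount_point, schema, file_format="json"):
    """Hypothetical stand-in for the helper in ../utils/utils.

    Returns a streaming DataFrame over the files that land in
    <mount_point>/<file_name>. The "json" default is an assumption.
    """
    path = f"{mount_point}/{file_name}"
    return (
        spark.readStream        # DataStreamReader
        .format(file_format)
        .schema(schema)         # file sources expect an explicit schema by default
        .load(path)
    )
```

Passing the schema explicitly, as the cell above does, avoids relying on schema inference, which is disabled for streaming file sources by default.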

2. Transform the dataframe to add the following columns
* file path: Cloud file path of the source file
* ingestion date: Current timestamp at ingestion

In [0]:
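from pyspark.sql.functions import col, current_timestamp

data = (
    data
    # _metadata is the hidden file-source column; file_path holds the source file's full path
    .withColumn("file_path", col("_metadata.file_path"))
    # Time at which the row was ingested by the stream
    .withColumn("ingestion_date", current_timestamp())
)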

3. Write the transformed data stream to a Delta Lake table

In [0]:
stream_query = write_stream(
    data=data,
    table_name="bronze.customers_stream",
    checkpoint=f"{gizmobox_mount_point}/landing/operational_data/customers_stream/_checkpoint_stream",
)
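
`write_stream` also comes from `../utils/utils` and is not shown. As a rough sketch under assumptions, such a helper could wrap the DataStreamWriter API as below. The append output mode is an assumption, not something this notebook confirms.

```python
def write_stream(data, table_name, checkpoint):
    """Hypothetical stand-in for the helper in ../utils/utils.

    Appends each micro-batch of `data` to the Delta table `table_name`
    and returns the running StreamingQuery.
    """
    return (
        data.writeStream                           # DataStreamWriter
        .format("delta")
        .outputMode("append")                      # assumed: new rows only
        .option("checkpointLocation", checkpoint)  # stream progress is persisted here
        .toTable(table_name)                       # starts the query
    )
```

Because progress is stored at the checkpoint location, restarting the query with the same checkpoint resumes from the files that have not yet been processed instead of re-reading everything.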

In [0]:
%sql
select * from bronze.customers_stream;

In [0]:
stream_query.stop()
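
A streaming query keeps running until it is stopped, and re-running the write cell replaces the `stream_query` handle while the earlier query stays active. One possible way to clean up in that situation is sketched below. `stop_all_streams` is a hypothetical helper name, and `spark` is assumed to be the notebook's active session.

```python
def stop_all_streams(spark):
    """Stop every streaming query still active on this session.

    Returns the ids of the queries that were stopped.
    """
    stopped = []
    for query in spark.streams.active:  # list of running StreamingQuery objects
        query.stop()
        stopped.append(query.id)
    return stopped
```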