# Module 1 - Exercise


## Exercise 1: Navigate the Lakehouse and Read Data

In this exercise, you will connect to a Lakehouse in Microsoft Fabric and read a table from it using PySpark.

Steps:

1) Create a Lakehouse: In Microsoft Fabric, create a Lakehouse to store structured and unstructured data.
2) Load Data: Ingest a dataset into your Lakehouse (e.g., California Housing Prices). The code below assumes this has already been done.
3) Use PySpark: Load the data from its Delta table in your Lakehouse.

```python
# Step 1: Set up the Spark session
# Fabric notebooks already provide a `spark` session; getOrCreate() returns
# that existing session instead of starting a new one.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("Fabric_Lakehouse_Demo").getOrCreate()

# Step 2: Define the Lakehouse table path (Delta table)
# Replace the <...> placeholders with your own values before running.
lakehouse_table_path = "abfss://<your-container>@<your-storage-account>.dfs.core.windows.net/delta/your-lakehouse-table"

# Step 3: Read the data from the Delta table
df = spark.read.format("delta").load(lakehouse_table_path)

# Step 4: Display the first 5 rows of the dataset
df.show(5)
```
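
The table path in Step 2 has to be edited by hand before the cell will run. As a minimal sketch, the hypothetical helper `build_abfss_path` below (not part of Fabric or PySpark) assembles the same URI shape from its parts, so the container and storage account names live in one place. The values `mycontainer` and `mystorageaccount` are made-up placeholders.

```python
def build_abfss_path(container: str, account: str, relative_path: str) -> str:
    """Assemble an abfss:// URI in the same shape as the path used in Step 2.

    Hypothetical helper for illustration; it only formats a string and
    does not check that the storage location exists.
    """
    # Drop any leading slash so the path part never starts with '//'.
    cleaned = relative_path.lstrip("/")
    return f"abfss://{container}@{account}.dfs.core.windows.net/{cleaned}"


# Placeholder names; substitute your own container and storage account.
lakehouse_table_path = build_abfss_path(
    "mycontainer", "mystorageaccount", "delta/your-lakehouse-table"
)
print(lakehouse_table_path)
```

The resulting string can be passed to `spark.read.format("delta").load(...)` exactly as in Step 3.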


## Exercise 2: Create a Simple Data Pipeline

In this exercise, you will create a basic Data Pipeline in Microsoft Fabric using Notebooks and Pipelines. You will extract data from a source, transform it using PySpark, and then load it back into a Lakehouse.

Steps:

1) Create a Data Pipeline: Set up a data pipeline in Microsoft Fabric.
2) Extract: Read data from an external source (e.g., a CSV file or a database) and load it into the Lakehouse.
3) Transform: Perform some simple data transformation using PySpark (e.g., filtering and aggregation).
4) Load: Write the transformed data back to your Lakehouse in Delta format.

```python
# Step 1: Read data from a source (e.g., CSV)
# Reuses the `spark` session created in Exercise 1.
source_path = "abfss://<your-container>@<your-storage-account>.dfs.core.windows.net/raw/california_housing.csv"
df = spark.read.csv(source_path, header=True, inferSchema=True)

# Step 2: Perform transformation (e.g., keep rows where 'median_house_value' > 200000)
filtered_df = df.filter(df["median_house_value"] > 200000)

# Step 3: Write the transformed data back to the Lakehouse as a Delta table
# mode("overwrite") replaces any data already stored at the target path.
target_path = "abfss://<your-container>@<your-storage-account>.dfs.core.windows.net/delta/transformed_california_housing"
filtered_df.write.format("delta").mode("overwrite").save(target_path)
```
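
The Transform step mentions aggregation, while the code above only filters. A minimal sketch of adding an aggregation follows. It assumes the dataset has an `ocean_proximity` column (present in some versions of California Housing Prices, but not confirmed by this exercise), and `average_value_by_group` is a hypothetical helper name. It uses only the standard DataFrame methods `filter`, `groupBy`, and `avg`.

```python
def average_value_by_group(df, group_col="ocean_proximity", min_value=200000):
    """Filter to rows above min_value, then average the house value per group.

    `df` is expected to be a PySpark DataFrame with a 'median_house_value'
    column. `group_col` defaults to 'ocean_proximity', which is an assumption
    about the dataset's columns.
    """
    # A SQL expression string; same filter as df["median_house_value"] > min_value.
    expensive = df.filter(f"median_house_value > {min_value}")
    # One output row per distinct value of group_col; the result column
    # is named 'avg(median_house_value)'.
    return expensive.groupBy(group_col).avg("median_house_value")


# Usage inside the notebook, after `df` has been read in Step 1:
# summary_df = average_value_by_group(df)
# summary_df.show()
```

The summary DataFrame can be written to the Lakehouse with the same `write.format("delta")` call used in Step 3.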
