# Practice Notebook

In this notebook, you will familiarize yourself with Fabric Lakehouses and how to interact with them using Fabric notebooks. You will also explore various features within the notebook.

# Step 1: Attach a Lakehouse

First, attach a Lakehouse to your notebook. This simplifies reading and writing data from within your notebooks.

To do this:

1. Select **Lakehouses**.
2. Select **Add Lakehouse**.
3. Provide the name for your Lakehouse. You will use this Lakehouse for the rest of the exercises. **Suggested Name: FC_Workshop**
4. In other notebooks, you can attach to the same Lakehouse by selecting **+Data sources** and **Existing Lakehouse**.


# Step 2: Change your notebook environment

Next, before starting the notebook session, change the default notebook environment: select **Environment**, then select your environment from the list.

![Select Environment and then select your environment from the list](https://synapseaisolutionsa.blob.core.windows.net/public/Fabric-Conference/AttachEnv.png)


# Step 3: Install libraries

In addition to the libraries provided by the notebook environment, you can install libraries for the current notebook session using ```%pip install```.
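Once an install cell has run, it can be useful to confirm which version of a package the session actually picked up. The following is a minimal sketch using the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical, and the package names are taken from the install log in this notebook.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string for `package_name`, or None if absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Report the packages this notebook expects to find after the install cell
for name in ("chat-magics", "pydantic"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

If a package is reported as not installed, rerunning the install cell in the same session is usually the first thing to try.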


#### Run the cell below to install the required packages for Copilot


In [None]:

#Run this cell to install the required packages for Copilot
%pip install https://aka.ms/chat_magics-0.0.0-py3-none-any.whl
%load_ext chat_magics



Collecting chat-magics==0.0.0
  Downloading https://aka.ms/chat_magics-0.0.0-py3-none-any.whl (17 kB)
Collecting ds-copilot@ https://aka.ms/ds_copilot-0.0.0-py3-none-any.whl (from chat-magics==0.0.0)
  Downloading https://aka.ms/ds_copilot-0.0.0-py3-none-any.whl (44 kB)
Collecting chat-magics-fabric@ https://aka.ms/chat_magics_fabric-0.0.0-py3-none-any.whl (from chat-magics==0.0.0)
  Downloading https://aka.ms/chat_magics_fabric-0.0.0-py3-none-any.whl (124 kB)
Collecting pydantic==2.4.2 (from chat-magics-fabric@ https://aka.ms/chat_magics_fabric-0.0.0-py3-none-any.whl->chat-magics==0.0.0)
  Downloading pydantic-2.4.2-py3-none-any.whl (395 kB)

### **Chat_magic commands**

* Using installed DSCopilot package.
* Service: fabric
* Model: gpt-35-turbo-16k




#### **AI Processor Disclaimer**


#### Data Privacy and Security

Chat-magics is powered by Azure OpenAI Service and is subject to the supplemental terms of use for
[Microsoft Azure Previews](https://azure.microsoft.com/en-us/support/legal/preview-supplemental-terms/).
Azure OpenAI is fully controlled by Microsoft.

In order to generate a response, chat-magics uses (a) your prompt or input and, when appropriate,
(b) additional data that is retrieved through the grounding process to provide more relevant,
contextual responses. This information is sent to Azure OpenAI, where it is processed and an
output is generated. Therefore, data processed by Azure OpenAI can include:

- Your prompt or input
- Grounding data
- Chat-magics response or output

Your Data:

- Is not used to train models.
- Is not available to other customers.
- Is stored for up to 30 days and may be reviewed by Microsoft employees for abuse monitoring.

Here is your current privacy configuration:  
* You ARE sharing previous messages sent to and replies from the LLM.  
* You ARE sharing the contents of cells that you have executed.  
* You ARE sharing the outputs of cells that you have executed.  
* You ARE sharing the schemas of data sources in your notebook.  
* You ARE sharing sample data from data sources in your notebook.  
* You ARE sharing schemas from external data sources.  
* You ARE automatically scanning for external data sources.  


Consider what sharing settings are appropriate when working with sensitive or confidential data. You can change your sharing settings at any time by using the command: `%set_sharing_level`




### **Usage**
Use the `%chat_magics` command to display this help message.

<details>
<summary>Expand for details...</summary>

To get started, try something like the following:

    %%code
    Load my_data.csv from the current folder into a pandas dataframe.

#### Main Commands
* `%%chat` - Ask questions about your notebook state or let the chat-magics help you understand or author it.
* `%%code` - Generate code to work with or visualize your data.
* `%%describe` - Describe a loaded dataframe.
* `%%add_comments` - Add comments to the code in a cell.
* `%%fix_errors` - Fix errors in a cell.
* `%%translate` - Translate code from one language to another.

#### Configuration Commands
* `%set_output` - Set whether code responses are generated to the current cell, next cell, cell output, a variable, or not at all.
* `%set_language` - Set the default language for generated code.
* `%set_sharing_level` - Control how much chat, code, output, and data is shared with the configured AI processor.

#### Context Commands
* `%pin` & `%unpin` - Pin and unpin dataframes to better focus AI responses.
* `%new_task` - Provide overall guidance for generated responses. Clears history.
* `%ignore` - Ignore a cell so that it is not processed by chat-magics.

#### Detailed Usage
For detailed information on each command, use the `?` (help operator), e.g. `%%code?` or `%set_sharing_level?`. Also, note that some magic commands may not be available depending on the AI service you have configured.

</details>




# Step 4: Practice loading and writing data from Lakehouse

## Write data to Lakehouse

In [None]:
import pandas as pd
from sklearn.datasets import fetch_california_housing

# Load the California housing dataset from scikit-learn
housing = fetch_california_housing()

# Convert the dataset to a pandas DataFrame and append the target column
data = pd.DataFrame(housing.data, columns=housing.feature_names)
data['target'] = housing.target

# Reset to a default integer index (the index is not written to the table)
data.reset_index(drop=True, inplace=True)

# Convert the pandas DataFrame to a Spark DataFrame
# (`spark` is the session that Fabric notebooks predefine)
spark_df = spark.createDataFrame(data)

# Save the Spark DataFrame as a Delta table in the attached Lakehouse
spark_df.write.format("delta").mode("overwrite").save("Tables/practiceData")



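By default, Delta tables reject column names that contain spaces or any of the characters `,;{}()=`, as well as newlines and tabs. The California housing feature names are already safe, but other datasets you write to the Lakehouse may not be. The following is a minimal sketch of one way to clean names before the write; `sanitize_column_names` is a hypothetical helper, not part of pandas, Spark, or Fabric.

```python
import re

# Characters Delta rejects in column names by default:
# space , ; { } ( ) newline tab =
_INVALID_CHARS = re.compile(r"[ ,;{}()\n\t=]+")


def sanitize_column_names(names):
    """Return Delta-friendly versions of `names`, keeping every name unique."""
    cleaned = []
    used = set()
    for name in names:
        # Collapse each run of invalid characters into a single underscore
        base = _INVALID_CHARS.sub("_", str(name)).strip("_") or "col"
        # Append a numeric suffix if the cleaned name collides with an earlier one
        candidate, suffix = base, 1
        while candidate in used:
            candidate = f"{base}_{suffix}"
            suffix += 1
        used.add(candidate)
        cleaned.append(candidate)
    return cleaned


print(sanitize_column_names(["MedInc", "House Age", "Ave Rooms (mean)"]))
```

With a pandas DataFrame, this could be applied as `data.columns = sanitize_column_names(data.columns)` before calling `spark.createDataFrame(data)`.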

## Read data from Lakehouse

If the `practiceData` table does not appear in the Lakehouse, refresh the list of tables.

![Refresh list of tables](https://synapseaisolutionsa.blob.core.windows.net/public/Fabric-Conference/RefreshTables.png)

In [None]:
df = spark.sql("SELECT * FROM FC_Workshop.practiceData")
display(df)
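The cell above reads the table through Spark SQL using the Lakehouse-qualified name. Because the write step saved to the relative path `Tables/practiceData`, the same data can likely also be loaded by path when that Lakehouse is attached as the default. The following is a minimal sketch covering both routes; `read_lakehouse_table` is a hypothetical helper for this workshop, and `spark` is assumed to be the session that the Fabric notebook provides.

```python
def read_lakehouse_table(spark, table_name, lakehouse=None):
    """Read a Lakehouse table by qualified name (SQL) or by relative Delta path.

    Hypothetical helper for this workshop, not part of any Fabric API.
    """
    # Guard against typos or unexpected characters in the identifiers
    for identifier in (table_name, lakehouse):
        if identifier is not None and not identifier.isidentifier():
            raise ValueError(f"Unexpected table or Lakehouse name: {identifier!r}")

    if lakehouse is not None:
        # Same route as the cell above: Spark SQL with a qualified name
        return spark.sql(f"SELECT * FROM {lakehouse}.{table_name}")

    # Path-based route: mirrors the relative path used when writing
    return spark.read.format("delta").load(f"Tables/{table_name}")
```

In a notebook cell, `df = read_lakehouse_table(spark, "practiceData", lakehouse="FC_Workshop")` would issue the same query as above, while omitting `lakehouse` would load from `Tables/practiceData` instead.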

## Use Copilot Chat Magics

To learn more, you can read [Overview of Chat Magics in Fabric](https://learn.microsoft.com/en-us/fabric/get-started/copilot-notebooks-chat-magics).

### Instant query and code generation
The ```%%chat``` command allows you to ask questions about the state of your notebook. 

The ```%%code``` command generates code for data manipulation or visualization.


### Commenting and debugging
The ```%%add_comments``` and ```%%fix_errors``` commands add comments to your code and fix errors, respectively. This helps keep your notebook readable and error-free.

In [None]:
%%chat 
How can I create a histogram to see the distribution of HouseAge


##### ATTENTION: AI-generated code can include errors or operations you didn't intend. Review the code generated carefully before running it.
To create a histogram to visualize the distribution of the "HouseAge" column, you can use the `hist` function from pandas or PySpark.

Here's an example using pandas:

1. **Histogram of HouseAge using Pandas:**
This plot will show the distribution of the "HouseAge" column using a histogram.
```python
import pandas as pd
import matplotlib.pyplot as plt

# Assuming your dataframe is named 'df'
plt.figure(figsize=(10, 6))
plt.hist(df['HouseAge'], bins=20, color='blue', edgecolor='black')
plt.xlabel('House Age')
plt.ylabel('Frequency')
plt.title('Distribution of House Age')
plt.show()
```

Note: Make sure to adjust the figure size, number of bins, and color according to your preference.

If your dataframe is a PySpark dataframe, you need to convert it to a pandas dataframe first before plotting. Here's an example:

2. **Histogram of HouseAge using PySpark:**
This plot will show the distribution of the "HouseAge" column using a histogram.
```python
import pandas as pd
import matplotlib.pyplot as plt

# Assuming your PySpark dataframe is named 'spark_df'
pandas_df = spark_df.toPandas()

plt.figure(figsize=(10, 6))
plt.hist(pandas_df['HouseAge'], bins=20, color='blue', edgecolor='black')
plt.xlabel('House Age')
plt.ylabel('Frequency')
plt.title('Distribution of House Age')
plt.show()
```

Note: Make sure to adjust the figure size, number of bins, and color according to your preference.

Remember to replace `'df'` or `'spark_df'` with the actual name of your dataframe variable.
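The generated PySpark example above converts the whole DataFrame with `toPandas()`, which collects every column to the driver. For a single-column histogram, selecting only the needed column first keeps that transfer smaller. The following is a minimal sketch; `columns_to_pandas` is a hypothetical helper, and it relies only on the standard `select` and `toPandas` methods of a Spark DataFrame.

```python
def columns_to_pandas(spark_df, *columns):
    """Collect only the named columns of a Spark DataFrame into pandas."""
    if not columns:
        raise ValueError("Name at least one column to collect")
    # Project first so that only the requested columns reach the driver
    return spark_df.select(*columns).toPandas()
```

For example, `plt.hist(columns_to_pandas(df, "HouseAge")["HouseAge"], bins=20)` would plot the same distribution without collecting the other columns.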

In [None]:
%%code 
Load practiceData table into a Spark dataframe and convert it into a Pandas dataframe 

**chat-magics** generated the following cell.  Tokens: 23

Remember that AI can make mistakes, so carefully review code before executing.

In [None]:
df = spark.table('fc_workshop.practicedata')
pandas_df = df.toPandas()

In [None]:
df_spark = spark.read.table("practiceData")
df_pandas = df_spark.toPandas()


In [None]:
%%code
Filter out rows where HouseAge in df_pandas >= 80

**chat-magics** generated the following cell.  Tokens: 19

Remember that AI can make mistakes, so carefully review code before executing.

In [None]:
pandas_df_filtered = pandas_df[pandas_df['HouseAge'] < 80]

In [None]:
df_filtered = df_pandas[df_pandas['HouseAge'] < 80]


In [None]:
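# Inverse filter: keep only rows where HouseAge >= 80
# (this overwrites df_filtered from the previous cell)
df_filtered = df_pandas[df_pandas['HouseAge'] >= 80]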
