# Saving the Final Data Set

First, install the required Python libraries if you have not already done so. See
[Installing Required Python Libraries](../00_Installing_Required_Python_Libraries.md).

If you're new to Python, you might be interested in [Introduction to Python Lists and Dictionaries for Data Science](../01_Introduction_to_Python_Data_Types.md).

Begin by importing the required packages.

In [1]:
import pandas as pd

## Run the data access notebooks

In [2]:
%run '../03_01_Accessing_and_Exploring_Data/01_Accessing_and_Reading_Local_Files.ipynb'
%run '../03_01_Accessing_and_Exploring_Data/02_Accessing_and_Reading_Data_Lake_Files.ipynb'
%run '../03_01_Accessing_and_Exploring_Data/03_Accessing_and_Reading_Database-Data_Lakehouse_Data.ipynb'

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   customerSubscrCode  3 non-null      int64 
 1   customerSubscrStat  3 non-null      object
dtypes: int64(1), object(1)
memory usage: 180.0+ bytes
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 22 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   ID                      5000 non-null   float64
 1   LostCustomer            5000 non-null   float64
 2   regionPctCustomers      5000 non-null   float64
 3   numOfTotalReturns       5000 non-null   float64
 4   wksSinceLastPurch       5000 non-null   float64
 5   basktPurchCount12Month  5000 non-null   float64
 6   LastPurchaseAmount      5000 non-null   float64
 7   AvgPurchaseAmount12     5000 non-null   float64
 8   AvgPurchase

## Run the data joining notebook

In [3]:
%run './01_Combining_Data.ipynb'

## Run the data transformation notebook

In [4]:
%run './02_Transforming_and_Enriching_Data.ipynb'

## Save as a CSV file

In [5]:
df.to_csv('../../data/output/customer_churn_abt.csv', index=False)
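
One way to confirm an export like this is to read the file back and compare it with the original. The following is a minimal sketch, not part of the notebook: `sample_df` is a small stand-in for `df` with made-up values (the column names are borrowed from the output above), and the file is written to a temporary directory rather than to the real output path.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Stand-in for the notebook's `df`; the values are made up for illustration.
sample_df = pd.DataFrame(
    {
        "ID": [1.0, 2.0, 3.0],
        "LostCustomer": [0.0, 1.0, 0.0],
        "customerSubscrStat": ["Platinum", "Gold", "Silver"],
    }
)

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = Path(tmp_dir) / "customer_churn_abt.csv"

    # index=False keeps the row index out of the file
    sample_df.to_csv(csv_path, index=False)

    # Read the file back while the temporary directory still exists
    reloaded = pd.read_csv(csv_path)

# Compare the basic structure of the original and the reloaded data
same_shape = reloaded.shape == sample_df.shape
same_columns = list(reloaded.columns) == list(sample_df.columns)
print(same_shape, same_columns)
```

If `index=False` were omitted, the row index would be written as an extra unnamed column, and the reloaded frame would have one more column than the original.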

## Export to Snowflake (if applicable)

If we had access to Snowflake and wanted to save the DataFrame as a table there, we could use the following code, which reads the connection settings from a JSON file.

```python
import json

import snowflake.connector
from snowflake.connector.pandas_tools import write_pandas

# JSON file holding the keyword arguments passed to connect()
sf_credential = '../../keys/snowflake_cred.json'

with open(sf_credential, "r") as f:
    sf_credentials_dict = json.load(f)

conn = snowflake.connector.connect(**sf_credentials_dict)

try:
    # Create the table if it does not exist and replace any existing contents
    success, nchunks, nrows, output = write_pandas(
        conn=conn,
        df=df,
        table_name="customer_churn_abt",
        auto_create_table=True,
        overwrite=True
    )
finally:
    # Close the connection even if the write fails
    conn.close()
```
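
Because the connection settings come from a JSON file, it can help to check the file before trying to connect. The sketch below is one possible approach, not part of the original notebook: `load_credentials` is a hypothetical helper, and the key names `account` and `user` are an assumption about what the credentials file contains, since its contents are not shown here. The demonstration uses a throwaway file with placeholder values.

```python
import json
import tempfile
from pathlib import Path


def load_credentials(path, required_keys=("account", "user")):
    """Load a JSON credentials file and check that the expected keys exist.

    Hypothetical helper; the default key names are an assumption.
    """
    with open(path, "r") as f:
        credentials = json.load(f)

    # Collect any expected keys that are absent from the file
    missing = [key for key in required_keys if key not in credentials]
    if missing:
        raise KeyError(f"Credentials file is missing keys: {missing}")
    return credentials


# Demonstration with a temporary file and placeholder values (not real credentials)
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "snowflake_cred.json"
    demo_path.write_text(json.dumps({"account": "my_account", "user": "my_user"}))
    demo_credentials = load_credentials(demo_path)

print(sorted(demo_credentials))
```

Failing early with a clear message about a missing key is usually easier to diagnose than an error raised later during the connection attempt.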

## Export to Google Cloud Storage (if applicable)

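If we had access to Google Cloud Storage (GCS) and wanted to save the DataFrame there as a Parquet file, we could use the following code. Writing to a `gcs://` path from pandas relies on the `gcsfs` package being installed.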

```python
# Path to the service account key file used to authenticate with GCS
gcs_key = '../../keys/gel-sas1-writer.json'

bucket_name = 'sas1-learn'
output_filename = 'customer_churn_abt.parquet'
output_path = f'gcs://{bucket_name}/data/{output_filename}'

# storage_options is passed through to the file system layer for authentication
df.to_parquet(
    output_path,
    engine='pyarrow',
    storage_options={'token': gcs_key}
)

print(f"DataFrame written to {output_path}")
```