<a href="https://colab.research.google.com/github/zachtahajian5/pandas-tour/blob/main/pandas_demo_2_11.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [91]:
import pandas as pd
import random
import os
import sqlite3
from google.colab import drive
drive.mount('/content/drive')


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In this part of the pandas mastery tour, we're going to work through several I/O-related tasks:

1. First we'll demonstrate how to read a CSV into our program as a pandas DataFrame object. We'll read that CSV file in from root > data > raw.

2. We're going to create a copy of the 'raw' data by using .copy() on the raw DataFrame.

3. We'll do some arbitrary transformations to our copied DataFrame to create the illusion of "raw to processed" data.

4. We'll write the 'processed' copy to a format of our choosing; in this case, we'll keep it as a CSV. Because we're working in a Google Colab notebook, we'll create a variable called "processed_csv_path" to hold the path to our root > data > processed location.
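
The four steps above can be sketched end to end as follows. This is a minimal illustration only: it uses a temporary directory in place of Google Drive, and the `demo_*` names, the two sample rows, and the output file name are all made up so the sketch runs on its own.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical stand-in for the project root (the notebook itself uses Google Drive).
demo_root = Path(tempfile.mkdtemp())
demo_raw_dir = demo_root / "data" / "raw"
demo_processed_dir = demo_root / "data" / "processed"
demo_raw_dir.mkdir(parents=True)
demo_processed_dir.mkdir(parents=True)

# Made-up sample rows so the sketch does not depend on the real io.csv.
pd.DataFrame({"Platform": ["PS4", "Xbox"], "Sales": [100, 200]}).to_csv(
    demo_raw_dir / "io.csv", index=False
)

# 1. Read the raw CSV into a DataFrame.
demo_raw = pd.read_csv(demo_raw_dir / "io.csv")

# 2. Work on a copy so the raw DataFrame stays untouched.
demo_df = demo_raw.copy()

# 3. Apply a placeholder transformation.
demo_df["Sales"] = demo_df["Sales"] * 2

# 4. Write the processed copy back out as a CSV.
demo_csv_path = demo_processed_dir / "demo_sales_scaled.csv"
demo_df.to_csv(demo_csv_path, index=False)
```

Keeping the raw and processed folders separate means the original file is never overwritten by a later step.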

In [92]:
df_raw = pd.read_csv("/content/drive/MyDrive/pandas-mini-project/data/raw/io.csv")
# Inspect the raw data to confirm we loaded the right file.
# head() and tail() return 5 rows by default.

print("first 5 rows:")
print(df_raw.head())
print()
print("last 5 rows:")
print(df_raw.tail())

first 5 rows:
  Platform  Sales
0      PS4   2604
1     Xbox   1263
2   Switch   3272
3       PC   3166
4      PS4   1861

last 5 rows:
   Platform  Sales
95       PC   3998
96      PS4   1083
97     Xbox   1446
98   Switch   4264
99       PC   4971


In [93]:
df = df_raw.copy()
# Inspect the copy to confirm it matches the original data.
print("first 5 rows:")
print(df.head())
print()
print("last 5 rows:")
print(df.tail())

first 5 rows:
  Platform  Sales
0      PS4   2604
1     Xbox   1263
2   Switch   3272
3       PC   3166
4      PS4   1861

last 5 rows:
   Platform  Sales
95       PC   3998
96      PS4   1083
97     Xbox   1446
98   Switch   4264
99       PC   4971
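
Comparing printed rows by eye only covers a handful of records. A programmatic check is one alternative; the sketch below uses made-up sample data and hypothetical names (`original`, `duplicate`) rather than the notebook's own DataFrames.

```python
import pandas as pd

# Made-up sample data standing in for the raw DataFrame.
original = pd.DataFrame({"Platform": ["PS4", "Xbox"], "Sales": [2604, 1263]})
duplicate = original.copy()  # copy() makes a deep copy by default

# The copy holds the same values but is a separate object.
same_values = duplicate.equals(original)
same_object = duplicate is original

# Changing the copy should leave the original untouched.
duplicate.loc[0, "Sales"] = 0
original_first_sale = original.loc[0, "Sales"]
```

Here `same_values` is true before the modification, `same_object` is false, and `original_first_sale` still holds the original value afterwards.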


Our arbitrary transformation will multiply every value in the "Sales" column by a scalar, in this case 2.

In [94]:
df["Sales"] = df["Sales"] * 2
#verifying transformation took place.
print(df.head())

  Platform  Sales
0      PS4   5208
1     Xbox   2526
2   Switch   6544
3       PC   6332
4      PS4   3722
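
Printing `head()` confirms the first few rows only. One way to check the whole column is to compare it against the raw values; the sketch below assumes a small sample built from the first three rows shown above, with hypothetical names `raw` and `scaled`.

```python
import pandas as pd

# Sample rows taken from the first three records shown above.
raw = pd.DataFrame(
    {"Platform": ["PS4", "Xbox", "Switch"], "Sales": [2604, 1263, 3272]}
)
scaled = raw.copy()
scaled["Sales"] = scaled["Sales"] * 2

# Every scaled value should be exactly twice its raw counterpart.
sales_doubled = (scaled["Sales"] == raw["Sales"] * 2).all()

# Columns we did not touch should be unchanged.
platform_unchanged = scaled["Platform"].equals(raw["Platform"])
```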


We'll make a copy of our interim data and assign it to "df_processed" to distinguish our working data from our completed data.

In [95]:
df_processed = df.copy()

Now we'll write df_processed to a CSV file in root > data > processed.

In [96]:
processed_dir = "/content/drive/MyDrive/pandas-mini-project/data/processed"
os.makedirs(processed_dir, exist_ok=True)
processed_csv_path = os.path.join(processed_dir, "overviews_sales_scaled.csv")
df_processed.to_csv(processed_csv_path, index=False)

print("Saved to:", processed_csv_path)

Saved to: /content/drive/MyDrive/pandas-mini-project/data/processed/overviews_sales_scaled.csv
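
The `index=False` argument keeps the row index out of the file. The sketch below illustrates the difference using in-memory CSV text instead of a file on Drive; the `sample` DataFrame and the other variable names are made up for the example.

```python
import io

import pandas as pd

# Made-up sample standing in for df_processed.
sample = pd.DataFrame({"Platform": ["PS4", "Xbox"], "Sales": [5208, 2526]})

# to_csv() returns the CSV text when no path is given.
text_with_index = sample.to_csv()
text_without_index = sample.to_csv(index=False)

# Read both versions back to see which columns come out.
cols_with_index = pd.read_csv(io.StringIO(text_with_index)).columns.tolist()
cols_without_index = pd.read_csv(io.StringIO(text_without_index)).columns.tolist()
```

With the index written, reading the file back yields an extra `Unnamed: 0` column; with `index=False`, only `Platform` and `Sales` remain.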


We forgot! We also needed to export this processed DataFrame to an Excel file and a SQL database.


In [97]:
processed_excel_path = os.path.join(processed_dir, "overviews_sales_scaled.xlsx")
df_processed.to_excel(processed_excel_path, index=False)
print("Saved to:", processed_excel_path)

Saved to: /content/drive/MyDrive/pandas-mini-project/data/processed/overviews_sales_scaled.xlsx


Finally, we'll export our data to a SQLite database.

In [98]:
db_path = os.path.join(processed_dir, "overview_sales_scaled.db")
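# Open a connection to the SQLite database (the file is created if it doesn't exist).
conn = sqlite3.connect(db_path)
# Write the DataFrame to a table named "Overview", replacing the table if it already exists.
df_processed.to_sql("Overview", conn, if_exists="replace", index=False)
# Close the database connection.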
conn.close()
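
To confirm the table was written, we can read it back with a query. The sketch below is an illustration under a few assumptions: it uses an in-memory SQLite database (`":memory:"`) and a made-up `sample` DataFrame instead of the file on Drive, and it wraps the connection in `contextlib.closing` so it is closed even if an error occurs.

```python
import sqlite3
from contextlib import closing

import pandas as pd

# Made-up sample standing in for df_processed.
sample = pd.DataFrame({"Platform": ["PS4", "Xbox"], "Sales": [5208, 2526]})

# ":memory:" creates a temporary database that vanishes when the connection closes.
with closing(sqlite3.connect(":memory:")) as demo_conn:
    sample.to_sql("Overview", demo_conn, if_exists="replace", index=False)
    # Read the table back into a new DataFrame.
    restored = pd.read_sql_query('SELECT Platform, Sales FROM "Overview"', demo_conn)
```

The same query should work against the notebook's database file by passing `db_path` to `sqlite3.connect` instead of `":memory:"`.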