# Notebook 1 – Data Loading

This notebook handles the first step of the workflow: loading the raw CSV files into the database.  
It takes the data stored in `data/*.csv` and writes each file to its corresponding table, establishing the foundation for all subsequent cleaning and analysis.

The goal here is accuracy and traceability: every record is loaded as-is, with no transformation of the values (only the column names are stripped of whitespace and lowercased), so that the raw source data remains a reliable point of reference for later steps.

In [1]:
# import the necessary libraries
from dotenv import load_dotenv
import os
import pandas as pd
from sqlalchemy import create_engine

In [2]:
# load the environment variables from the .env file
load_dotenv()

DB_SERVER = os.getenv("DB_SERVER")
DB_PORT = os.getenv("DB_PORT")
DB_NAME = os.getenv("DB_NAME")
DB_USER = os.getenv("DB_USER")
DB_PASSWORD = os.getenv("DB_PASSWORD")
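
`os.getenv` returns `None` for a variable that is not set, so a missing `.env` entry would only surface later as a malformed connection string. A minimal sketch of an early check is shown here; the helper names `missing_env_vars` and `require_env_vars` are hypothetical and not part of the notebook.

```python
import os

# The five settings the notebook reads from the environment.
REQUIRED_VARS = ["DB_SERVER", "DB_PORT", "DB_NAME", "DB_USER", "DB_PASSWORD"]


def missing_env_vars(names, environ=os.environ):
    """Return the names that are unset or empty in the given mapping."""
    return [name for name in names if not environ.get(name)]


def require_env_vars(names, environ=os.environ):
    """Raise a descriptive error if any required variable is missing."""
    missing = missing_env_vars(names, environ)
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")


# Possible usage right after load_dotenv():
# require_env_vars(REQUIRED_VARS)
```

Passing the mapping as a parameter keeps the check easy to try against a plain dictionary before pointing it at the real environment.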

In [3]:
# initialize an SQLAlchemy engine for connecting to the database
conn_string = f"postgresql://{DB_USER}:{DB_PASSWORD}@{DB_SERVER}:{DB_PORT}/{DB_NAME}"

db = create_engine(conn_string)
print(f"Initialized engine at {db.url}")

Initialized engine at postgresql://postgres:***@localhost:5432/fundamentals
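
One caveat with building the URL through an f-string: a password containing characters such as `@` or `/` would be misread as part of the URL structure. The sketch below percent-encodes the credentials with the standard library before interpolating them. The function name `build_conn_string` and the sample password are made up for illustration; the notebook itself does not do this.

```python
from urllib.parse import quote


def build_conn_string(user, password, server, port, name):
    """Build a PostgreSQL URL with percent-encoded credentials."""
    # safe="" makes quote() encode every reserved character, including "/" and "@".
    user_enc = quote(user, safe="")
    password_enc = quote(password, safe="")
    return f"postgresql://{user_enc}:{password_enc}@{server}:{port}/{name}"


# Made-up credentials, only to show the encoding.
example = build_conn_string("postgres", "p@ss/word", "localhost", "5432", "fundamentals")
print(example)
# → postgresql://postgres:p%40ss%2Fword@localhost:5432/fundamentals
```

The resulting string can be passed to `create_engine` in the same way as `conn_string` above.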


In [4]:
# write the data in the CSVs to the database
for name in ["companies", "financials", "industry_benchmarks"]:
    path = f"../data/{name}.csv"
    df = pd.read_csv(path)
    df.columns = df.columns.str.strip().str.lower()  # to sanitize column names
    recs = df.to_sql(name, db, if_exists="append", index=False)
    print(f"{recs} records inserted into `{name}` from {path}")

12 records inserted into `companies` from ../data/companies.csv
152 records inserted into `financials` from ../data/financials.csv
36 records inserted into `industry_benchmarks` from ../data/industry_benchmarks.csv
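
Because the loop uses `if_exists="append"`, running this cell a second time would insert every record again. The sketch below illustrates that behavior on a throwaway table. It assumes an in-memory SQLite database as a stand-in for PostgreSQL, and the table name `demo` and its two rows are invented for the example.

```python
import sqlite3

import pandas as pd

# In-memory SQLite stands in for the real database (assumption for this demo).
conn = sqlite3.connect(":memory:")

# Made-up rows, not taken from the project's CSV files.
df = pd.DataFrame({"ticker": ["AAA", "BBB"], "revenue": [1.0, 2.0]})


def row_count(table):
    """Return the number of rows currently stored in the table."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


# The first load creates the table and inserts the rows.
df.to_sql("demo", conn, if_exists="append", index=False)
first_run = row_count("demo")

# A rerun with "append" inserts the same rows a second time.
df.to_sql("demo", conn, if_exists="append", index=False)
second_run = row_count("demo")

# "replace" drops and recreates the table, so the count matches the DataFrame again.
df.to_sql("demo", conn, if_exists="replace", index=False)
after_replace = row_count("demo")

print(first_run, second_run, after_replace)
```

Note that `if_exists="replace"` recreates the table from the DataFrame's inferred types, so it may not be appropriate if the tables were created beforehand with a specific schema; truncating the tables before a reload is another option.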
