# Personal Budget Data Preparation

## Plan

### income_log table

1. Import mock income_log data (two files)
2. Combine the dataframes into one dataframe
3. Sort the resulting dataframe by date
4. Export the data as a SQL file with the prefix `final_`

### savings_log table

1. Import mock savings_log data (two files)
2. Sort each dataframe by date
3. Calculate the sum of the `change` column for each dataframe
4. If the sum is negative, change some cells in the `change` column to positive
5. Add a new cumulative column called `balance` to each dataframe using `cumsum`
6. Combine the dataframes into one dataframe
7. Try to move the `balance` column so that it comes right after the `change` column
8. Sort the resulting dataframe by date
9. Export the dataframe to a SQL file with the prefix `final_`
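
The savings_log steps above could be sketched roughly as follows. This is a minimal sketch, not the notebook's implementation: the two small frames are made-up stand-ins for the mock files, the column names are assumed, `prepare` is a hypothetical helper, and the rule used for step 4 (flip the largest withdrawal until the sum is no longer negative) is only one possible reading of "change some cells". The export step is left out.

```python
import pandas as pd

# Made-up stand-ins for the two mock savings_log files (assumed columns).
savings_1 = pd.DataFrame({
    "user_id": [1, 1, 1],
    "date": ["2024-03-01", "2024-01-15", "2024-02-10"],
    "change": [-50, 20, -10],
})
savings_2 = pd.DataFrame({
    "user_id": [2, 2],
    "date": ["2024-02-01", "2024-01-05"],
    "change": [-5, 30],
})

def prepare(df):
    """Hypothetical helper covering steps 2-5 for a single dataframe."""
    df = df.copy()
    df["date"] = pd.to_datetime(df["date"])
    # Step 2: sort by date and renumber the index.
    df = df.sort_values("date", kind="stable").reset_index(drop=True)
    # Steps 3-4: while the total is negative, flip the largest withdrawal.
    while df["change"].sum() < 0:
        idx = df["change"].idxmin()
        df.loc[idx, "change"] = -df.loc[idx, "change"]
    # Step 5: running balance per dataframe.
    df["balance"] = df["change"].cumsum()
    return df

# Step 6: combine, step 7: put balance right after change, step 8: sort again.
savings_log = pd.concat([prepare(savings_1), prepare(savings_2)], ignore_index=True)
savings_log = savings_log[["user_id", "date", "change", "balance"]]
savings_log = savings_log.sort_values("date", kind="stable").reset_index(drop=True)

print(savings_log)
```

Because `balance` is computed before the frames are combined, each user's running total stays independent of the other user's rows after the final sort.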

### users table

Write a SQL file containing two users, each with a `name`, `username`, and `password`.
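
A file like this could be generated with a few lines of plain Python. The sketch below is illustrative only: the two users, the `users.sql` filename, and the `sql_literal` helper are all invented here, and the plan does not say whether the application expects plain or hashed passwords, so the values are left as obvious placeholders.

```python
# Placeholder users; names, usernames, and passwords are invented for illustration.
users = [
    {"name": "Alice Example", "username": "alice", "password": "change-me-1"},
    {"name": "Bob O'Neil", "username": "bob", "password": "change-me-2"},
]

def sql_literal(value):
    """Quote a string for SQL, doubling any embedded single quotes."""
    return "'" + str(value).replace("'", "''") + "'"

statements = [
    "INSERT INTO users (name, username, password) VALUES "
    f"({sql_literal(u['name'])}, {sql_literal(u['username'])}, {sql_literal(u['password'])});"
    for u in users
]

# Hypothetical output filename.
with open("users.sql", "w", encoding="UTF-8") as file:
    file.write("\n".join(statements) + "\n")
```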

## Implementation

In [84]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

### income_log table

#### 1. Import mock income_log data (two files)

In [85]:
income_log_1 = pd.read_csv("raw_income_log_01.csv")
income_log_2 = pd.read_csv("raw_income_log_02.csv")

In [86]:
income_log_1

|  | user_id | date | source | amount | notes |
| --- | --- | --- | --- | --- | --- |
| 0 | 1 | 2025-04-25 | Delivery Job | 24 |  |
| 1 | 1 | 2025-01-06 | Event Organizing | 91 |  |
| 2 | 1 | 2024-04-29 | Event Organizing | 86 |  |
| 3 | 1 | 2025-06-21 | Delivery Job | 25 |  |
| 4 | 1 | 2024-02-03 | Delivery Job | 82 |  |
| ... | ... | ... | ... | ... | ... |
| 995 | 1 | 2025-06-15 | Delivery Job | 51 |  |
| 996 | 1 | 2025-05-10 | Event Organizing | 64 |  |
| 997 | 1 | 2024-06-09 | Delivery Job | 96 |  |
| 998 | 1 | 2024-12-03 | Delivery Job | 16 |  |


In [87]:
income_log_2

|  | user_id | date | source | amount | notes |
| --- | --- | --- | --- | --- | --- |
| 0 | 2 | 2025-07-03 | Freelance | 63 |  |
| 1 | 2 | 2024-12-11 | Shop | 91 | Quisque arcu libero, rutrum ac, lobortis vel, ... |
| 2 | 2 | 2024-08-31 | Freelance | 18 | Duis bibendum, felis sed interdum venenatis, t... |
| 3 | 2 | 2025-04-02 | Freelance | 68 |  |
| 4 | 2 | 2024-08-23 | Freelance | 86 |  |
| ... | ... | ... | ... | ... | ... |
| 995 | 2 | 2024-09-07 | Freelance | 99 |  |
| 996 | 2 | 2024-09-14 | Freelance | 28 |  |
| 997 | 2 | 2024-01-19 | Shop | 72 | Mauris sit amet eros. |
| 998 | 2 | 2024-07-08 | University | 40 |  |


#### 2. Combine dataframes into one dataframe

In [88]:
income_log = pd.concat([income_log_1, income_log_2], ignore_index=True)

In [89]:
income_log

|  | user_id | date | source | amount | notes |
| --- | --- | --- | --- | --- | --- |
| 0 | 1 | 2025-04-25 | Delivery Job | 24 |  |
| 1 | 1 | 2025-01-06 | Event Organizing | 91 |  |
| 2 | 1 | 2024-04-29 | Event Organizing | 86 |  |
| 3 | 1 | 2025-06-21 | Delivery Job | 25 |  |
| 4 | 1 | 2024-02-03 | Delivery Job | 82 |  |
| ... | ... | ... | ... | ... | ... |
| 1995 | 2 | 2024-09-07 | Freelance | 99 |  |
| 1996 | 2 | 2024-09-14 | Freelance | 28 |  |
| 1997 | 2 | 2024-01-19 | Shop | 72 | Mauris sit amet eros. |
| 1998 | 2 | 2024-07-08 | University | 40 |  |
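
The effect of `ignore_index=True` is easiest to see on a tiny example. The frames below are made-up stand-ins for `income_log_1` and `income_log_2`, reduced to two columns; the variable names `kept` and `fresh` are used only in this sketch.

```python
import pandas as pd

# Tiny made-up frames standing in for income_log_1 and income_log_2.
a = pd.DataFrame({"user_id": [1, 1], "amount": [24, 91]})
b = pd.DataFrame({"user_id": [2, 2], "amount": [63, 91]})

kept = pd.concat([a, b])                      # original labels are kept, so they repeat
fresh = pd.concat([a, b], ignore_index=True)  # labels are renumbered from 0

print(kept.index.tolist())
print(fresh.index.tolist())
```

Without `ignore_index=True`, the combined frame would carry duplicate index labels, which makes label-based lookups such as `.loc[0]` return more than one row.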


#### 3. Sort resulting dataframe by date

In [90]:
# Convert date column to datetime
income_log["date"] = pd.to_datetime(income_log["date"])

In [91]:
# Sort
income_log = income_log.sort_values("date", ascending=True)

In [92]:
# Reset dataframe index
income_log = income_log.reset_index(drop=True)

In [93]:
income_log

|  | user_id | date | source | amount | notes |
| --- | --- | --- | --- | --- | --- |
| 0 | 2 | 2024-01-01 | University | 44 |  |
| 1 | 1 | 2024-01-01 | Event Organizing | 96 |  |
| 2 | 2 | 2024-01-01 | Freelance | 99 |  |
| 3 | 2 | 2024-01-01 | Freelance | 94 | Quisque porta volutpat erat. |
| 4 | 2 | 2024-01-01 | Freelance | 73 |  |
| ... | ... | ... | ... | ... | ... |
| 1995 | 1 | 2025-08-22 | Delivery Job | 35 |  |
| 1996 | 1 | 2025-08-22 | Delivery Job | 25 |  |
| 1997 | 2 | 2025-08-23 | Freelance | 76 |  |
| 1998 | 2 | 2025-08-23 | University | 25 | Aliquam augue quam, sollicitudin vitae, consec... |
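
In the sorted output above, rows that share a date (for example 2024-01-01) are not ordered by `user_id`. When sorting on a single column, `sort_values` uses quicksort by default, which does not guarantee the relative order of ties. If a reproducible order matters for the exported file, one option is to add a secondary sort key. The sketch below uses a made-up three-row frame and assumes `user_id` is an acceptable tie-breaker; `ignore_index=True` renumbers the index in the same call.

```python
import pandas as pd

# Made-up rows in the shape of income_log, with two entries on the same date.
df = pd.DataFrame({
    "user_id": [2, 1, 2],
    "date": ["2024-01-02", "2024-01-01", "2024-01-01"],
    "amount": [10, 20, 30],
})
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d")

# Sort by date first, then by user_id to break ties deterministically.
sorted_df = df.sort_values(["date", "user_id"], ignore_index=True)

print(sorted_df)
```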


#### 4. Export data as SQL file with prefix `final_`

In [94]:
# Convert income_log date column back to string
income_log["date"] = income_log["date"].dt.strftime("%Y-%m-%d")

In [95]:
def sql_text(value):
    # Missing values become NULL; embedded single quotes are doubled so the statement stays valid
    return "NULL" if pd.isna(value) else "'" + str(value).replace("'", "''") + "'"

with open("final_income_log.sql", "w", encoding="UTF-8") as file:
    for _, row in income_log.iterrows():
        sql = f"INSERT INTO income_log (user_id, date, source, amount, notes) VALUES ({row['user_id']}, '{row['date']}', {sql_text(row['source'])}, {row['amount']}, {sql_text(row['notes'])});\n"
        file.write(sql)
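
One way to sanity-check statements like the ones written above is to run them against an in-memory SQLite database. The sketch below is self-contained and makes several assumptions: the rows are made up, `sql_literal` is a hypothetical helper, and the `CREATE TABLE` schema is a guess, since the real table definition and target database are not shown in this notebook. Doubling single quotes is standard SQL, but other dialects may need additional escaping.

```python
import sqlite3

def sql_literal(value):
    """Return NULL for None, otherwise a quoted string with single quotes doubled."""
    if value is None:
        return "NULL"
    return "'" + str(value).replace("'", "''") + "'"

# Made-up rows in the shape of income_log (user_id, date, source, amount, notes).
rows = [
    (1, "2024-01-01", "Event Organizing", 96, None),
    (2, "2024-01-01", "Freelance", 94, "Client's deposit"),
]
script = "".join(
    "INSERT INTO income_log (user_id, date, source, amount, notes) VALUES "
    f"({user_id}, {sql_literal(date)}, {sql_literal(source)}, {amount}, {sql_literal(notes)});\n"
    for user_id, date, source, amount, notes in rows
)

conn = sqlite3.connect(":memory:")
# Assumed schema; the real table definition is not part of this notebook.
conn.execute(
    "CREATE TABLE income_log "
    "(user_id INTEGER, date TEXT, source TEXT, amount INTEGER, notes TEXT)"
)
conn.executescript(script)
loaded = conn.execute("SELECT notes FROM income_log ORDER BY user_id").fetchall()
conn.close()

print(loaded)
```

If a note containing an apostrophe were written without escaping, `executescript` would raise a syntax error here instead of loading the row.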