## Transformation
In this Python notebook, I explain the thought process behind implementing the Transformation portion of the exam specifications. The transformation process involves (1) converting email addresses to lowercase, (2) converting product names to uppercase, and (3) calculating the total amount spent by each user.

Note that running this Python notebook may result in errors, because the notebook and the original `transformation.py` file live in different directories. To see the Transformation script in action, run `transformation.py` from its original directory instead.

### Importing files
The transformation works on the dataframes produced by the processing step.

We start by importing `processing.py` into `transformation.py` and retrieving the processed dataframes so that they are easy to use.

In [None]:
import processing
import pandas as pd

users = processing.process_users.get_dataframe()
transactions = processing.process_transactions.get_dataframe()
pricing = processing.process_pricing.get_dataframe()

This lets us reuse the dataframes that were already processed in the previous step, so the transformations do not have to repeat that work.

### Converting email addresses to lowercase
We convert the email addresses in the `users` dataframe to lowercase with a small helper function.

In [None]:
def email_lowercase(users):
    users['email'] = users['email'].str.lower()

email_lowercase(users)

pandas exposes vectorized string methods through the `.str` accessor, so the whole `email` column is converted in a single call. Note that the function modifies the `users` dataframe in place.
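To illustrate how the `.str` accessor behaves, here is a minimal sketch on made-up data. The `sample_users` dataframe and its values are assumptions for illustration only; the real `users` dataframe comes from `processing.py`.

```python
import pandas as pd

# Hypothetical sample data, not the real `users` dataframe.
sample_users = pd.DataFrame({
    "user_id": [1, 2, 3],
    "email": ["Alice@Example.COM", "BOB@example.com", None],
})

# .str.lower() is applied element-wise to the whole column.
# Missing values are skipped and remain missing instead of raising an error.
sample_users["email"] = sample_users["email"].str.lower()

print(sample_users)
```

The missing email in the third row stays missing, so the conversion does not need a separate null check.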

### Converting product names to uppercase
We convert the product names in the `transactions` dataframe to uppercase with a small helper function.

In [None]:
def product_upper(transactions):
    transactions['product'] = transactions['product'].str.upper()

product_upper(transactions)

Once again, the `.str` accessor converts every value in the column to uppercase in a single call.
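Both helper functions modify their input dataframe in place. If the original values need to be preserved, one possible alternative is to return a new dataframe instead. The sketch below is only an illustration: the name `product_upper_copy` and the sample data are hypothetical and not part of `transformation.py`.

```python
import pandas as pd

def product_upper_copy(transactions: pd.DataFrame) -> pd.DataFrame:
    """Return a new dataframe with uppercase product names (hypothetical helper)."""
    # assign() builds a new dataframe and leaves the input untouched.
    return transactions.assign(product=transactions["product"].str.upper())

# Hypothetical sample data, not the real `transactions` dataframe.
sample = pd.DataFrame({"user_id": [1, 2], "product": ["laptop", "Phone case"]})
result = product_upper_copy(sample)

print(result)
```

Here `sample` keeps its original casing while `result` holds the uppercase names.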

### Summing up expenditures by user
We take the `transactions` dataframe and sum up the total amount that each user has spent.

In [None]:
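def total_amount(transactions, users):
    # Sum the amount spent per user; the result has the columns user_id and amount
    summed_df = transactions.groupby('user_id')['amount'].sum().reset_index()
    # Attach each user's total to the users dataframe
    merged_df = pd.merge(users, summed_df, on='user_id')
    # Rename the summed column to match the target schema and return the result
    return merged_df.rename(columns={'amount': 'total_spent'})

users = total_amount(transactions, users)

We sum the amounts per user and store the result in `summed_df`, which contains two columns: `user_id` and the summed `amount`. Since the specifications say that the transformed data should look like `Users: user_id, name, email, date_joined, total_spent`, we merge the totals into `users`, rename `amount` to `total_spent`, and return the merged dataframe so that it can be loaded into PostgreSQL in the next section. Note that `pd.merge` performs an inner join by default, so users without any transactions do not appear in the result.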
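The aggregation and merge described above can be sketched on a small example. The sample dataframes below are made up for illustration. The `how="left"` join and the zero fill are assumptions about how users without transactions might be handled; the specifications quoted here do not say which behavior is expected.

```python
import pandas as pd

# Hypothetical sample data standing in for the processed dataframes.
sample_users = pd.DataFrame({
    "user_id": [1, 2, 3],
    "name": ["Alice", "Bob", "Carol"],
    "email": ["alice@example.com", "bob@example.com", "carol@example.com"],
    "date_joined": ["2023-01-01", "2023-02-15", "2023-03-30"],
})
sample_transactions = pd.DataFrame({
    "user_id": [1, 1, 2],
    "amount": [10.0, 5.5, 20.0],
})

# One row per user with the summed amount, renamed to the target column name.
totals = (
    sample_transactions.groupby("user_id", as_index=False)["amount"]
    .sum()
    .rename(columns={"amount": "total_spent"})
)

# A left join keeps users who have no transactions (assumed to be desirable here).
merged = sample_users.merge(totals, on="user_id", how="left")
# Users without transactions get NaN from the join; treat that as zero spent.
merged["total_spent"] = merged["total_spent"].fillna(0.0)

print(merged)
```

With the default inner join, the third user would be dropped from the result; the left join keeps that row with a `total_spent` of zero.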

### Summary
1. We import the `processing.py` file from the previous section to avoid code repetition.
2. We transform the strings in each dataframe using pandas and string manipulation.
3. We sum the amount spent by each user from the `transactions` table and merge the totals into the `users` table for data loading in the next section.