## Data Loading
In this Python notebook, I explain the thought process behind implementing the Data Loading portion of the exam specifications. The data loading logic itself lives in the `dataloading.py` file, but it involves external applications such as Docker to make the database accessible on the user's side.

Note that running this Python notebook may result in errors, because this notebook and the original `dataloading.py` file are in different directories. To see the data loading script in action, run `dataloading.py` from its original directory instead.

### Creating a Docker container
We start by creating a `docker-compose.yml` file that contains the basic details of the PostgreSQL database that we will be instantiating and running. It follows the basic template of most docker-compose files that create a Docker container running PostgreSQL.

In [None]:
version: '3.8'

services:
  db:
    image: postgres:latest
    environment:
      POSTGRES_USER: myuser
      POSTGRES_PASSWORD: mypassword
      POSTGRES_DB: mydatabase
    ports:
      - "5432:5432"
    volumes:
      - db_data:/var/lib/postgresql/data

volumes:
  db_data:


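Next, we create a `Dockerfile` that automatically obtains the latest version of the PostgreSQL image along with its initialization scripts. We can then run the following commands to get started with our Docker PostgreSQL database.

- `docker pull postgres`

- `docker run --name my_postgres -e POSTGRES_PASSWORD=mysecretpassword -p 5432:5432 -d postgres`

The `-p 5432:5432` flag publishes the container's port so that scripts on the host can reach the database at `localhost:5432`. The value given to `POSTGRES_PASSWORD` must also match the password used in the connection string later on.

Once the PostgreSQL database is running in a Docker container, we can proceed with loading the data into it with `dataloading.py`.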
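Before running the loading script, it can help to confirm that the container is actually reachable from the host. The following is a minimal sketch using only the standard library; the helper names `port_is_open` and `wait_for_port` are hypothetical and not part of `dataloading.py`. It only checks that something is listening on the port, so an open port does not guarantee that PostgreSQL has finished initializing.

```python
import socket
import time


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def wait_for_port(host: str, port: int, attempts: int = 10, delay: float = 1.0) -> bool:
    """Poll host:port up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if port_is_open(host, port):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling `wait_for_port("localhost", 5432)` right after `docker run` would give the container a few seconds to start before the script tries to connect.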

### Creating the Connection from Python
We use the SQLAlchemy library to connect to our PostgreSQL database and to run SQL queries from our Python scripts.

In [1]:
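from sqlalchemy import create_engine

# Credentials for the PostgreSQL instance running in Docker.
dbname = "postgres"
user = "postgres"
password = "munch"
host = "localhost"
port = "5432"

# The plain "postgresql://" scheme uses SQLAlchemy's default driver, psycopg2.
connection_string = f"postgresql://{user}:{password}@{host}:{port}/{dbname}"
engine = create_engine(connection_string)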

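We provide the basic login details needed to connect to the database and create the engine. Note that `create_engine` does not open a connection right away; the first connection is made when the engine is first used. Once the engine is ready, we can use it to load the dataframes we've been working with into the database.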
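Hardcoding the password in the notebook is fine for a local exercise, but the same connection string could also be assembled from environment variables. The sketch below is one possible approach and not what `dataloading.py` does: the function name `build_connection_string` is hypothetical, and reading the `PGUSER`, `PGPASSWORD`, `PGHOST`, `PGPORT` and `PGDATABASE` variables is an assumption about how the credentials would be supplied. The user and password are URL-escaped so that characters such as `@` or `/` do not break the URL.

```python
import os
from urllib.parse import quote_plus


def build_connection_string(env=os.environ) -> str:
    """Build a PostgreSQL URL from environment-style settings.

    The variable names and their defaults are assumptions for this sketch.
    """
    user = env.get("PGUSER", "postgres")
    password = env.get("PGPASSWORD", "")
    host = env.get("PGHOST", "localhost")
    port = env.get("PGPORT", "5432")
    dbname = env.get("PGDATABASE", "postgres")
    # Escape the user and password so special characters stay inside their fields.
    return (
        f"postgresql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{dbname}"
    )
```

The returned string can be passed to `create_engine` in place of the hardcoded `connection_string`.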

### Uploading the dataframe
We store the code for uploading the pandas dataframe in the `upload_data` function.

In [None]:
import transformation

def upload_data():
    users = transformation.users
    users.to_sql('users', engine, if_exists='replace', index=False)
    print("Data uploaded successfully!")

The function takes the `users` table that we transformed in the previous section and uploads it using the engine we created with SQLAlchemy. Because `if_exists='replace'` is set, any existing `users` table is dropped and recreated on each run. We then test this by looking at the result we receive from running the `query_data` function.

In [None]:
import pandas as pd
def query_data():
    query = 'SELECT * FROM users'
    df = pd.read_sql(query, engine)
    print("Data fetched successfully!")
    print(df)

Once run, the query selects all the data from the `users` table and stores it in a pandas dataframe. We notice that pandas automatically adds a default integer index to the dataframe, even though we removed the index in the previous section. This index is generated by `pd.read_sql` when it builds the dataframe and is not stored in the table, since we uploaded with `index=False`. Either way, this shows that the data is indeed loaded into the database.
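The index behaviour can be reproduced without the Docker database. The sketch below is an illustration under stated assumptions: it swaps PostgreSQL for an in-memory SQLite database through the standard `sqlite3` module, and the `round_trip` helper and the sample rows are made up for the example. It writes a dataframe with a custom index using `index=False`, reads it back, and prints the resulting columns and index.

```python
import sqlite3

import pandas as pd


def round_trip(frame: pd.DataFrame, table: str = "users") -> pd.DataFrame:
    """Write `frame` to an in-memory SQLite table and read it back."""
    conn = sqlite3.connect(":memory:")
    try:
        # index=False keeps the dataframe index out of the stored table.
        frame.to_sql(table, conn, if_exists="replace", index=False)
        return pd.read_sql(f"SELECT * FROM {table}", conn)
    finally:
        conn.close()


# Made-up sample rows with a non-default index.
sample = pd.DataFrame({"name": ["ana", "ben"], "age": [31, 27]}, index=[10, 20])
result = round_trip(sample)

print(list(result.columns))  # only the data columns come back
print(list(result.index))    # a fresh default index, not the original 10 and 20
```

The original index values never reach the table, and the index on the returned dataframe is simply the default one that pandas assigns to any new dataframe.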