### Loading Data into the PostgreSQL Database

In this step, we load the CSV files created during data cleaning as dataframes and write those dataframes to the `movies` PostgreSQL database using SQLAlchemy.

Note that SQLAlchemy relies on the `psycopg2` package to connect to a PostgreSQL database. If it is not already installed, run `pip install psycopg2` in your terminal or command prompt.
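
Before connecting, it can help to confirm that the driver is importable in the current environment. The snippet below is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical and not part of SQLAlchemy or psycopg2.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if `module_name` can be imported in this environment."""
    return importlib.util.find_spec(module_name) is not None


# psycopg2 is the default driver SQLAlchemy uses for "postgresql://" URLs
if not is_installed("psycopg2"):
    print("psycopg2 not found; run `pip install psycopg2` (or psycopg2-binary)")
```

If the message is printed, install the package and restart the notebook kernel before continuing.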

In [2]:
# Import packages
import pandas as pd
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

In [3]:
# Define folder containing the CSV files
root = "Database Data/"

# Read CSV files saved from cleaning process
movies_df = pd.read_csv(root + "movies.csv")
actors_df = pd.read_csv(root + "actors.csv")
movie_cast_df = pd.read_csv(root + "movie_cast.csv")
genres_df = pd.read_csv(root + "genres.csv")
movie_genres_df = pd.read_csv(root + "movie_genres.csv")

In [198]:
# Connect to "movies" database
# Note: Password was replaced by **** after running code
engine = create_engine('postgresql://postgres:****@localhost:5432/movies')
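
Rather than typing the password into the cell and masking it afterwards, one option is to read it from an environment variable and assemble the connection URL in code. The following is a sketch under stated assumptions: `build_postgres_url` is a hypothetical helper and `MOVIES_DB_PASSWORD` is a hypothetical variable name, neither of which appears in the original notebook. The user and password are percent-encoded so that characters such as `@` or `/` do not break the URL.

```python
import os
from urllib.parse import quote_plus


def build_postgres_url(user, password, host="localhost", port=5432, database="movies"):
    """Assemble a postgresql:// URL, percent-encoding the user and password."""
    return (
        f"postgresql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# MOVIES_DB_PASSWORD is a hypothetical variable name; set it in your shell first
password = os.environ.get("MOVIES_DB_PASSWORD", "")
db_url = build_postgres_url("postgres", password)

# db_url can then be passed to create_engine(db_url) in place of the literal string
```

With this approach the password never appears in the saved notebook, so there is nothing to mask after running the cell.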

In [200]:
# Load data into database using dataframe.to_sql() with loop
# engine.begin() opens a single transaction: it commits if every table
# loads and rolls back all of them if any load raises an error
tables_dict = {
    "movies": movies_df,
    "actors": actors_df,
    "movie_cast": movie_cast_df,
    "genres": genres_df,
    "movie_genres": movie_genres_df
}

try:
    with engine.begin() as conn:
        # Pass the connection (not the engine) so every write
        # takes part in the same transaction
        for key, df in tables_dict.items():
            df.to_sql(key, conn, if_exists="append", index=False)
except Exception as e:
    print(f"An error occurred: {e}")

No error message was printed, which indicates that the data was written to the PostgreSQL database.
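
The aim of the load step is all-or-nothing behaviour: if one table fails to load, no partial data should remain. The sketch below illustrates that idea with the standard-library `sqlite3` module and an in-memory database, used here only as a stand-in for PostgreSQL so the example runs without a server. The deliberately duplicated `genre_id` is an assumption made to force an error.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for PostgreSQL
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE genres (genre_id INTEGER PRIMARY KEY, genre_name TEXT)")
conn.commit()

good_rows = [(1, "Romance"), (2, "Documentary")]
bad_rows = [(3, "News"), (3, "Sport")]  # duplicate key triggers an error

try:
    # "with conn" commits on success and rolls back if an exception is raised
    with conn:
        conn.executemany("INSERT INTO genres VALUES (?, ?)", good_rows)
        conn.executemany("INSERT INTO genres VALUES (?, ?)", bad_rows)
except sqlite3.IntegrityError as e:
    print(f"An error occurred: {e}")

# Both batches were rolled back together, so the table is still empty
row_count = conn.execute("SELECT COUNT(*) FROM genres").fetchone()[0]
print(row_count)
```

Counting rows after a load, as in the last two lines, is also a more direct success check than relying on the absence of an error message.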

### Querying Sample Data From PostgreSQL Tables

The data has been loaded into the `movies` PostgreSQL database, so we can now view the results. To run SQL code in a Jupyter notebook, we first need to install the `ipython-sql` extension and connect to the `movies` database, as shown in the steps below:

In [201]:
# Install and load the ipython-sql extension to run SQL code
!pip install ipython-sql
%load_ext sql

In [202]:
# Connect to database (Note: Password has been replaced with **** after running code)
%sql postgresql://postgres:****@localhost:5432/movies

The connection was successful. Sample output from each table in the PostgreSQL database is shown below.

#### movies

In [203]:
%%sql

SELECT *
FROM movies
LIMIT 5

 * postgresql://postgres:***@localhost:5432/movies
5 rows affected.


| movie_id | movie_title | release_year | length_minutes | avg_rating | num_ratings |
|---|---|---|---|---|---|
| tt0000009 | Miss Jerry | 1894 | 45.0 | 5.3 | 206 |
| tt0000147 | The Corbett-Fitzsimmons Fight | 1897 | 100.0 | 5.3 | 475 |
| tt0000574 | The Story of the Kelly Gang | 1906 | 70.0 | 6.0 | 832 |
| tt0000591 | The Prodigal Son | 1907 | 90.0 | 4.4 | 20 |
| tt0000615 | Robbery Under Arms | 1907 |  | 4.3 | 24 |


#### actors

In [204]:
%%sql

SELECT *
FROM actors
LIMIT 5

 * postgresql://postgres:***@localhost:5432/movies
5 rows affected.


| actor_id | actor_name | birth_year | death_year |
|---|---|---|---|
| nm0000001 | Fred Astaire | 1899 | 1987.0 |
| nm0000002 | Lauren Bacall | 1924 | 2014.0 |
| nm0000003 | Brigitte Bardot | 1934 |  |
| nm0000004 | John Belushi | 1949 | 1982.0 |
| nm0000005 | Ingmar Bergman | 1918 | 2007.0 |


#### movie_cast

In [205]:
%%sql

SELECT *
FROM movie_cast
LIMIT 5

 * postgresql://postgres:***@localhost:5432/movies
5 rows affected.


| movie_id | actor_id | known_for |
|---|---|---|
| tt0000009 | nm0063086 | True |
| tt0000009 | nm0183823 | True |
| tt0000009 | nm1309758 | True |
| tt0000574 | nm0846887 | True |
| tt0000574 | nm0846894 | True |


#### genres

In [206]:
%%sql

SELECT *
FROM genres
LIMIT 5

 * postgresql://postgres:***@localhost:5432/movies
5 rows affected.


| genre_id | genre_name |
|---|---|
| 1 | Romance |
| 2 | Documentary |
| 3 | News |
| 4 | Sport |
| 5 | Action |
