# SQL analysis notebook
## Purpose
- Analyse the data stored in the *swapi.db* database
- Generate tables for use in the visualization notebook and the final_report
- Draw insights and identify patterns
## Data
- All data used in this notebook comes from the swapi.db database. Its tables are pulled from the SWAPI API and generated in the api_ingestion notebook.
## Results
- Running this notebook writes the generated tables to this folder: ../Output/tables/

In [113]:
# Importing tools needed 
import sqlite3
import pandas as pd

In [114]:
# connect to the Database (swapi.db)
conn = sqlite3.connect("../Database/swapi.db")

### Loading the tables 

In [151]:
# Helper that loads a whole table from the database into a DataFrame
def connect(table):
    # The table name is interpolated directly into the SQL text,
    # so only pass trusted, hard-coded table names.
    return pd.read_sql(f"select * from {table}", conn)

# Use the "connect" helper to load the tables
people = connect("people")
starships = connect("starships")
species = connect("species")
planets = connect("planets")
films = connect("films")
vehicles = connect("vehicles")
films_characters = connect("films_characters")
films_planets = connect("films_planets")
films_species = connect("films_species")
people_starships = connect("people_starships")
people_vehicles = connect("people_vehicles")
people_species = connect("people_species")
planet_residents = connect("planet_residents")
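
Because table names cannot be passed as bound SQL parameters, a loader like the one above relies on the caller supplying a valid name. A minimal sketch of one way to guard against typos, assuming the name is checked against SQLite's own `sqlite_master` catalogue first. The `load_table` helper and the in-memory demo database are hypothetical and are not part of this notebook:

```python
import sqlite3
import pandas as pd

def load_table(table, conn):
    """Load a whole table, but only if it exists in the database (hypothetical helper)."""
    # Table names cannot be bound as SQL parameters, so check the name
    # against the database catalogue before interpolating it.
    known = {row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")}
    if table not in known:
        raise ValueError(f"unknown table: {table!r}")
    return pd.read_sql(f'SELECT * FROM "{table}"', conn)

# In-memory database standing in for swapi.db; the row below is made up.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE people (name TEXT, height TEXT)")
demo_conn.execute("INSERT INTO people VALUES ('Demo Person', '100')")
demo_conn.commit()

demo_people = load_table("people", demo_conn)
```

A misspelled name then fails with a clear `ValueError` instead of a database error raised from inside `pd.read_sql`.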

In [152]:
# Helper that saves a newly created table as a CSV file in ../Output/tables/
def save(new_table, name):
    new_table.to_csv(f"../Output/tables/{name}.csv", index=False)
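
Writing a CSV fails if the output folder does not exist yet. The sketch below shows one possible variant that creates the folder on demand, assuming `pathlib` is acceptable here. The `save_table` name, its `out_dir` parameter and the temporary directory are illustrative assumptions, not part of the notebook:

```python
from pathlib import Path
import tempfile
import pandas as pd

def save_table(new_table, name, out_dir):
    """Write a DataFrame to <out_dir>/<name>.csv, creating the folder if needed."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    target = out_path / f"{name}.csv"
    new_table.to_csv(target, index=False)
    return target

# Made-up table, written to a throwaway directory and read back.
demo = pd.DataFrame({"name": ["A", "B"], "height": [1, 2]})
with tempfile.TemporaryDirectory() as tmp:
    written = save_table(demo, "demo", Path(tmp) / "Output" / "tables")
    round_trip = pd.read_csv(written)
```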

### Tallest people

In [155]:
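# Drop people whose height is unknown
tallest_ppl = people[people["height"] != "unknown"].copy()
# Note: "height" is stored as text, so this sort compares strings, not numbers
tallest_ppl = tallest_ppl.sort_values(by="height", ascending=False)[["name", "height"]]
save(tallest_ppl, "tallest_ppl")
tallest_ppl.head(5)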

|  | name | height |
|---|---|---|
| 7 | R5-D4 | 97 |
| 2 | R2-D2 | 96 |
| 73 | R4-P17 | 96 |
| 46 | Dud Bolt | 94 |
| 28 | Wicket Systri Warrick | 88 |


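> The first row of the output table is **R5-D4** with a height of **97**. Because `height` is stored as text, the sort is alphabetical rather than numeric (for example, "97" ranks above "172"), so this table does not identify the tallest person; the column must be converted to numbers first.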
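
Sorting a text column compares strings character by character, so a two-digit value can outrank a three-digit one. A minimal sketch of a numeric sort, assuming `pd.to_numeric` with `errors="coerce"` is used to turn non-numeric entries such as "unknown" into missing values. The three demo rows are made up and do not come from swapi.db:

```python
import pandas as pd

# Made-up rows that mimic the shape of the people table.
demo_people = pd.DataFrame({
    "name": ["short", "tall", "missing"],
    "height": ["97", "172", "unknown"],
})

# Text sort: "97" beats "172" because "9" > "1".
text_order = (demo_people[demo_people["height"] != "unknown"]
    .sort_values(by="height", ascending=False)["name"]
    .tolist())

# Numeric sort: convert first, drop values that could not be parsed.
heights = pd.to_numeric(demo_people["height"], errors="coerce")
tallest_numeric = (demo_people
    .assign(height=heights)
    .dropna(subset=["height"])
    .sort_values(by="height", ascending=False)[["name", "height"]])
```

Here `text_order` puts "short" first, while `tallest_numeric` puts "tall" first.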

### People with the highest mass

In [147]:
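# Drop people whose mass is unknown
heaviest_ppl = people[people["mass"] != "unknown"].copy()
# Note: "mass" is stored as text, so this ascending sort compares strings;
# "1,358" comes first only because "," sorts before any digit
heaviest_ppl = heaviest_ppl.sort_values(by="mass", ascending=True)[["name", "mass"]]
save(heaviest_ppl, "heaviest_ppl")
heaviest_ppl.head(5)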

|  | name | mass |
|---|---|---|
| 15 | Jabba Desilijic Tiure | 1,358 |
| 69 | Dexter Jettster | 102 |
| 17 | Jek Tono Porkins | 110 |
| 12 | Chewbacca | 112 |
| 22 | Bossk | 113 |


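> The person with the highest mass is **Jabba Desilijic Tiure**, with a mass of **1,358**. The remaining rows are not a ranking by mass: `mass` is stored as text and sorted in ascending string order, so Jabba appears first only because the comma in "1,358" sorts before any digit.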
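
A value written with a thousands separator cannot be parsed as a number until the separator is removed. The sketch below strips the commas before converting, assuming a comma is the only non-numeric character besides the "unknown" marker. The demo rows are made up and do not come from swapi.db:

```python
import pandas as pd

# Made-up rows; "1,358" mimics a mass written with a thousands separator.
demo_people = pd.DataFrame({
    "name": ["light", "heavy", "missing"],
    "mass": ["102", "1,358", "unknown"],
})

# Remove the separator, then convert; "unknown" becomes NaN and is dropped.
mass_num = pd.to_numeric(
    demo_people["mass"].str.replace(",", "", regex=False),
    errors="coerce")
heaviest_numeric = (demo_people
    .assign(mass=mass_num)
    .dropna(subset=["mass"])
    .sort_values(by="mass", ascending=False)[["name", "mass"]])
```

With numeric values, `ascending=False` gives a true heaviest-first ranking.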

### Oldest people

In [143]:
# Drop people whose birth year is unknown
oldest_ppl = people[people["birth_year"] != "unknown"].copy()
# Extract the leading digits (e.g. "896BBY" -> 896) so the sort is numeric
oldest_ppl["birth_year_num"] = (oldest_ppl["birth_year"]
    .str.extract(r"(\d+)", expand=False)
    .astype(int))
oldest_ppl = (oldest_ppl
    .sort_values(by="birth_year_num", ascending=False)[["name", "birth_year"]])
save(oldest_ppl, "oldest_ppl")
oldest_ppl.head(3)

|  | name | birth_year |
|---|---|---|
| 18 | Yoda | 896BBY |
| 15 | Jabba Desilijic Tiure | 600BBY |
| 12 | Chewbacca | 200BBY |


> As we can see in the output table, the oldest person is **Yoda**, born *896 years before the Battle of Yavin (896BBY)*.
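
The pattern `(\d+)` keeps only the digits before any decimal point, so a fractional birth year would be truncated. A small sketch of a pattern that keeps the fraction, assuming every year is written as a number followed by a suffix such as "BBY". The sample values are hypothetical:

```python
import pandas as pd

# Hypothetical birth-year strings, including a fractional year and a missing one.
demo_years = pd.Series(["896BBY", "41.9BBY", "unknown"], name="birth_year")

# Capture an integer part and an optional fractional part.
# expand=False returns a Series; values with no match become NaN.
years_before_yavin = pd.to_numeric(
    demo_years.str.extract(r"(\d+(?:\.\d+)?)", expand=False),
    errors="coerce")
```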

### Oldest films

In [141]:
films["release_date"] = pd.to_datetime(films["release_date"])
oldest_films = (films
    .sort_values("release_date", ascending=True))[["title", "release_date"]]
save(oldest_films, "oldest_films")
oldest_films.head(5)

|  | title | release_date |
|---|---|---|
| 0 | A New Hope | 1977-05-25 |
| 1 | The Empire Strikes Back | 1980-05-17 |
| 2 | Return of the Jedi | 1983-05-25 |
| 3 | The Phantom Menace | 1999-05-19 |
| 4 | Attack of the Clones | 2002-05-16 |


> As we can see in the output table, the oldest movie is **"A New Hope"**, released on 1977-05-25.

### Film episode_id ascending

In [121]:
episode_asc = films.sort_values("episode_id", ascending=True)[['title','episode_id']]
save(episode_asc, "episode_asc")
episode_asc

|  | title | episode_id |
|---|---|---|
| 3 | The Phantom Menace | 1 |
| 4 | Attack of the Clones | 2 |
| 5 | Revenge of the Sith | 3 |
| 0 | A New Hope | 4 |
| 1 | The Empire Strikes Back | 5 |
| 2 | Return of the Jedi | 6 |


> As we can see in the output table, the first episode in story order is **"The Phantom Menace"** (episode 1).

### Movies with the most characters

In [138]:
films_characters_num = films_characters.copy()
# Keep only the numeric id from each character URL (the second-to-last path segment)
films_characters_num["characters"] = (films_characters_num["characters"]
    .str.split('/').str[-2])
# Count the rows per film title
films_by_character_count = (films_characters_num
    .value_counts("title")
    .reset_index(name="character_count")
    .sort_values(by="character_count", ascending=False))
save(films_by_character_count, "films_by_character_count")
films_by_character_count.head(3)

|  | title | character_count |
|---|---|---|
| 0 | Attack of the Clones | 40 |
| 1 | Revenge of the Sith | 34 |
| 2 | The Phantom Menace | 34 |


> As we can see in the output table, the movie with the most characters is **"Attack of the Clones"**, with 40 characters.
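
Counting rows per title gives the number of characters only if no film-character pair is listed twice. A minimal sketch of a count that is robust to duplicate rows, assuming distinct character ids are what should be counted. The link rows below are made up:

```python
import pandas as pd

# Made-up film-character links; character id "2" is listed twice for Film A.
demo_links = pd.DataFrame({
    "title": ["Film A", "Film A", "Film A", "Film B"],
    "characters": ["1", "2", "2", "1"],
})

# nunique counts each character id once per film, ignoring repeated rows.
character_counts = (demo_links
    .groupby("title")["characters"]
    .nunique()
    .reset_index(name="character_count")
    .sort_values(by="character_count", ascending=False))
```

If the link table has no duplicates, this gives the same numbers as a plain row count.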

### Most frequent characters

In [159]:
# Extract each person's numeric id from their URL so it can be joined on
people["id"] = people["url"].str.split('/').str[-2]
# Count how many films each character id appears in
most_frequent_characters = (films_characters_num["characters"]
    .value_counts()
    .reset_index(name="movie_count"))
most_frequent_characters = (most_frequent_characters
    .rename(columns={"characters": "id"}))
# Look up each character's name by id
most_frequent_characters_people = (most_frequent_characters
    .merge(people, on="id", how="left"))
most_frequent_characters = (most_frequent_characters_people
    .sort_values("movie_count", ascending=False)[["name", "movie_count"]])
save(most_frequent_characters, "most_frequent_characters")
most_frequent_characters.head(5)

|  | name | movie_count |
|---|---|---|
| 0 | C-3PO | 6 |
| 1 | R2-D2 | 6 |
| 2 | Obi-Wan Kenobi | 6 |
| 3 | Yoda | 5 |
| 4 | Palpatine | 5 |


> As we can see in the output table, the most frequent characters are **C-3PO, R2-D2 and Obi-Wan Kenobi**, who each appear in 6 films.
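
When several characters share the top count, a fixed `head(n)` can cut through a tie. One possible way to select everyone tied for first place is `DataFrame.nlargest` with `keep="all"`, sketched here on the five rows from the output table above. The `counts` and `top_with_ties` names are illustrative:

```python
import pandas as pd

# The five rows shown in the output table above.
counts = pd.DataFrame({
    "name": ["C-3PO", "R2-D2", "Obi-Wan Kenobi", "Yoda", "Palpatine"],
    "movie_count": [6, 6, 6, 5, 5],
})

# Ask for the single largest count, but keep every row tied with it.
top_with_ties = counts.nlargest(1, "movie_count", keep="all")
```

Here `top_with_ties` holds the three characters with 6 films rather than just one of them.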

### ...