# SWAPI API Ingestion Notebook

This notebook is responsible for **pulling data from the Star Wars API (SWAPI)** and storing it in our **SQLite database (`swapi.db`)**.  

## Purpose
- Keep all data organized in relational tables (`people`, `planets`, `films`, `species`, etc.).
- Ensure the database is ready for analysis and visualization in later notebooks.

## Workflow
1. **Pull data from SWAPI** using Python `requests`.
2. **Save the pulled data as JSON files** in the `Data/` folder.
3. **Connect to the SQLite database** (`swapi.db`) and create its tables from `schema.sql`.
4. **Store the data in the database**.

> By running this notebook, the database will be populated with SWAPI data and ready for analysis in subsequent notebooks. Each request fetches only the first page of a resource (10 records), because SWAPI paginates its results.


In [1]:
import requests 
import pandas as pd
import sqlite3

### Requesting data from the *Star Wars API (SWAPI)*

In [2]:
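# Fetch the first page of the people resource (SWAPI returns 10 records per page)
people = requests.get("https://swapi.dev/api/people/")
people.raise_for_status()  # stop on an HTTP error instead of parsing a bad response
people = people.json()
# Keep only the records; the response also carries "count", "next", and "previous"
people = pd.DataFrame(people["results"])
people.to_json("../Data/people.json", orient="records", indent=2)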

In [3]:
films = requests.get("https://swapi.dev/api/films/")
films = films.json()
films = pd.DataFrame(films["results"])
films.to_json("../Data/films.json", orient="records", indent=2)

In [4]:
starships = requests.get("https://swapi.dev/api/starships/")
starships = starships.json()
starships = pd.DataFrame(starships["results"])
starships.to_json("../Data/starships.json", orient="records", indent=2)

In [5]:
species = requests.get("https://swapi.dev/api/species/")
species = species.json()
species = pd.DataFrame(species["results"])
species.to_json("../Data/species.json", orient="records", indent=2)

In [6]:
vehicles = requests.get("https://swapi.dev/api/vehicles/")
vehicles = vehicles.json()
vehicles = pd.DataFrame(vehicles["results"])
vehicles.to_json("../Data/vehicles.json", orient="records", indent=2)

In [7]:
planets = requests.get("https://swapi.dev/api/planets/")
planets = planets.json()
planets = pd.DataFrame(planets["results"])
planets.to_json("../Data/planets.json", orient="records", indent=2)
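
Each request above returns only the first page of a resource. SWAPI responses include a `next` field holding the URL of the following page (`null` on the last page), so one way to collect every record is to follow those links until they run out. The sketch below is a hypothetical helper, not part of the original notebook: `fetch_all_pages` and its `get_json` parameter are assumed names, and the demonstration uses two fake in-memory pages so it runs without network access.

```python
def fetch_all_pages(first_url, get_json):
    """Collect the "results" of every page by following "next" links.

    `get_json` is any callable that takes a URL and returns the decoded
    JSON body as a dict (hypothetical parameter, chosen for this sketch).
    """
    rows = []
    url = first_url
    while url:
        page = get_json(url)
        rows.extend(page["results"])
        url = page.get("next")  # None on the last page ends the loop
    return rows


# Offline demonstration with two fake pages keyed by made-up URLs
fake_pages = {
    "page-1": {"results": [{"name": "A"}], "next": "page-2"},
    "page-2": {"results": [{"name": "B"}], "next": None},
}
all_rows = fetch_all_pages("page-1", fake_pages.__getitem__)
print(len(all_rows))
```

In this notebook, `get_json` could be `lambda url: requests.get(url).json()`, and the returned list could be passed to `pd.DataFrame` in place of `people["results"]`.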

### Creating **swapi.db** tables using **schema.sql**

In [8]:
import sqlite3

# Connect (creates the DB if it doesn't exist)
conn = sqlite3.connect("../Database/swapi.db")

# Enable foreign keys
conn.execute("PRAGMA foreign_keys = ON;")

# Read the schema.sql file
with open("../Database/schema.sql", "r") as f:
    schema_sql = f.read()

# Execute all SQL commands
conn.executescript(schema_sql)

<sqlite3.Cursor at 0x740cdb9ad140>
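
The `PRAGMA foreign_keys = ON;` line matters because SQLite does not enforce foreign keys by default, and the setting applies per connection. The sketch below shows the effect on an in-memory database. The schema here is hypothetical (the real `schema.sql` is not shown in this notebook), and `demo_schema` and `demo_conn` are names introduced only for this example.

```python
import sqlite3

# Hypothetical two-table schema, standing in for the real schema.sql
demo_schema = """
CREATE TABLE planets (
    name TEXT PRIMARY KEY
);
CREATE TABLE people (
    name TEXT PRIMARY KEY,
    homeworld TEXT REFERENCES planets(name)
);
"""

demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("PRAGMA foreign_keys = ON;")
demo_conn.executescript(demo_schema)

# A row whose homeworld exists in planets is accepted
demo_conn.execute("INSERT INTO planets (name) VALUES ('Tatooine')")
demo_conn.execute(
    "INSERT INTO people (name, homeworld) VALUES ('Luke Skywalker', 'Tatooine')"
)

# A row pointing at a missing planet is rejected once enforcement is on
try:
    demo_conn.execute(
        "INSERT INTO people (name, homeworld) VALUES ('Nobody', 'Nowhere')"
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)
demo_conn.close()
```

With enforcement on, parent rows have to be inserted before the rows that reference them, so the load order of the tables can matter if the schema declares foreign keys.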

### Inserting rows into each table in **swapi.db**

#### Helper function for creating link tables from columns that contain URL lists

In [14]:
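def extra_table(
    json_file,
    owner_col,
    list_col,
    table_name
):
    """Write one row per (owner, list item) pair to `table_name`.

    `json_file` is the file name in ../Data/ without the .json extension.
    """
    df = pd.read_json(f"../Data/{json_file}.json")
    # explode() turns each element of the list column into its own row
    exploded = (
        df[[owner_col, list_col]]
        .explode(list_col)
    )
    # "replace" drops any existing table of this name before writing
    exploded.to_sql(
        table_name,
        conn,
        if_exists="replace",
        index=False
    )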
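
To make the reshaping step concrete, here is a plain-Python equivalent of what `explode` does for these link tables, using only the standard library. It is a sketch under assumptions: `link_rows` is a hypothetical helper, and the sample records are illustrative values in the shape of SWAPI results. One difference is deliberate: `explode` keeps a row with a missing value when a list is empty, while this version skips empty lists.

```python
import sqlite3


def link_rows(records, owner_key, list_key):
    """Yield one (owner, item) pair per element of each record's list."""
    for record in records:
        for item in record[list_key]:
            yield record[owner_key], item


# Illustrative records in the shape of SWAPI people results
sample_people = [
    {
        "name": "Luke Skywalker",
        "starships": [
            "https://swapi.dev/api/starships/12/",
            "https://swapi.dev/api/starships/22/",
        ],
    },
    {"name": "C-3PO", "starships": []},  # empty list produces no rows
]

pairs = list(link_rows(sample_people, "name", "starships"))

# Load the pairs into an in-memory link table
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE people_starships (name TEXT, starships TEXT)")
demo_conn.executemany("INSERT INTO people_starships VALUES (?, ?)", pairs)
row_count = demo_conn.execute(
    "SELECT COUNT(*) FROM people_starships"
).fetchone()[0]
print(row_count)
demo_conn.close()
```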

#### People table 

In [15]:
people_df = pd.read_json("../Data/people.json")
people_df = people_df[[
    "name",
    "height",
    "mass",
    "hair_color",
    "skin_color",
    "eye_color",
    "birth_year",
    "gender",
    "homeworld"
]]
people_df.to_sql(
    "people",
    conn,
    if_exists="append",
    index=False
)
extra_table("people", "name", "starships", "people_starships")
extra_table("people", "name", "vehicles", "people_vehicles")
extra_table("people", "name", "species", "people_species")

#### Films table

In [16]:
films_df = pd.read_json("../Data/films.json")
films_df = films_df[[
    "title",
    "episode_id",
    "opening_crawl",
    "director",
    "producer",
    "release_date"
]]
films_df.to_sql(
    "films",
    conn,
    if_exists="append",
    index=False
)
extra_table("films", "title", "characters", "films_characters")
extra_table("films", "title", "planets", "films_planets")
extra_table("films", "title", "species", "films_species")

#### Planets table

In [17]:
planets_df = pd.read_json("../Data/planets.json")
planets_df = planets_df[[
    "name",
    "rotation_period",
    "orbital_period",
    "diameter",
    "climate",
    "gravity",
    "terrain",
    "surface_water",
    "population"
]]
planets_df.to_sql(
    "planets",
    conn,
    if_exists="append",
    index=False
)
extra_table("planets", "name", "residents", "planet_residents")

#### Species table 

In [18]:
species_df = pd.read_json("../Data/species.json")
species_df = species_df[[
    "name",
    "classification",
    "designation",
    "average_height",
    "skin_colors",
    "hair_colors",
    "eye_colors",
    "average_lifespan",
    "language",
    "homeworld"
]]
species_df.to_sql(
    "species",
    conn,
    if_exists="append",
    index=False
)

10

#### Vehicles table

In [19]:
vehicles_df = pd.read_json("../Data/vehicles.json")
vehicles_df = vehicles_df[[
    "name",
    "model",
    "manufacturer",
    "cost_in_credits",
    "length",
    "max_atmosphering_speed",
    "crew",
    "passengers",
    "cargo_capacity",
    "vehicle_class"
]]
vehicles_df.to_sql(
    "vehicles",
    conn,
    if_exists="append",
    index=False
)

10

#### Starships table

In [21]:
starships_df = pd.read_json("../Data/starships.json")
starships_df = starships_df[[
    "name",
    "model",
    "manufacturer",
    "cost_in_credits",
    "length",
    "max_atmosphering_speed",
    "crew",
    "passengers",
    "cargo_capacity",
    "consumables",
    "hyperdrive_rating",
    "MGLT",
    "starship_class"
]]
starships_df.to_sql(
    "starships",
    conn,
    if_exists="append",
    index=False
)

10

In [None]:
conn.commit()
conn.close()
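
After the load, a quick row count per table can confirm that every insert landed where it was meant to. The following is a minimal sketch, assuming only that the database is a regular SQLite file. `table_row_counts` is a hypothetical helper, demonstrated here on an in-memory database with made-up tables; in this notebook it would be given a fresh connection to `../Database/swapi.db`.

```python
import sqlite3


def table_row_counts(connection):
    """Return {table_name: row_count} for every user table in the database."""
    names = [
        row[0]
        for row in connection.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        )
    ]
    # Names come from sqlite_master itself, so double-quoting them is enough
    return {
        name: connection.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
        for name in names
    }


# Demonstration on a throwaway in-memory database
demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
CREATE TABLE films (title TEXT);
CREATE TABLE planets (name TEXT);
INSERT INTO films VALUES ('A New Hope');
""")
counts = table_row_counts(demo_conn)
print(counts)
demo_conn.close()
```

A table that reports zero rows, or one that reports more rows than expected, points to a cell that wrote to the wrong table or was run more than once with `if_exists="append"`.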