# SpaceX API Data Ingestion and SQL Analysis

This exercise is designed to assess your Python and SQL skills while working with real-world data. You will interact with the SpaceX API, store the data in a structured format, and perform analysis using Python and SQL.

## Overview

SpaceX provides a public API that exposes detailed information about their launches, rockets, crew members, and more. This exercise uses the following API endpoints:

- **Launches:** `/v4/launches` - Provides detailed information about SpaceX launches, including launch dates, rocket IDs, and crew IDs.
- **Rockets:** `/v4/rockets` - Provides information about SpaceX rockets, such as their names and IDs.
- **Crew:** `/v4/crew` - Provides information about astronauts who have been part of SpaceX missions, including their names, agencies, and associated launches.

Your objective is to:

1. Explore the SpaceX API and understand the data returned by these endpoints.
2. Set up a Python class to manage and process the data.
3. Store the data in an SQLite database.
4. Write Python functions to interact with the database and answer key questions.

## Step 1: Explore the SpaceX API

### Task

1. Visit the [SpaceX API documentation](https://github.com/r-spacex/SpaceX-API) and familiarise yourself with the structure of the data returned by the `/v4/launches`, `/v4/rockets`, and `/v4/crew` endpoints.

2. Write Python code to:
   - Fetch data from these three endpoints using the `requests` library.
   - Print a sample of the data (e.g., the first 2-3 records) to understand its structure and the fields available.

3. Take note of the key fields you will need for further processing, such as `id`, `rocket`, and `date_utc` in the `launches` endpoint and `name` and `agency` in the `crew` endpoint.

### Example Output
For `/v4/launches`, you should identify fields like:

- `id`: The unique identifier for a launch.
- `rocket`: The ID of the rocket used in the launch.
- `date_utc`: The UTC date and time of the launch.
- `crew`: A list of crew member IDs associated with the launch.
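
Printing whole records can be hard to read, so it may help to trim each one down to the fields listed above. The following is a minimal sketch, not part of the exercise solution: `summarize_launch` is a hypothetical helper, and `example_record` is a hand-written stand-in with placeholder IDs rather than real API output.

```python
def summarize_launch(record):
    """Return only the fields this exercise relies on from one launch record."""
    return {
        "id": record["id"],
        "rocket": record["rocket"],
        "date_utc": record["date_utc"],
        # .get() guards against records where the crew list is absent
        "crew": record.get("crew", []),
    }


# Hand-written record for illustration; the IDs are placeholders, not real API values.
example_record = {
    "id": "launch-001",
    "rocket": "rocket-001",
    "date_utc": "2020-05-30T19:22:00.000Z",
    "crew": ["crew-001", "crew-002"],
    "name": "Example Mission",
    "success": True,
}

print(summarize_launch(example_record))
```

Applied to the real data, the same helper could be mapped over the first few fetched records instead of printing them in full.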

Write the code below.

In [None]:
import requests

# URLs of the three SpaceX API endpoints used in this exercise
launches_url = "https://api.spacexdata.com/v4/launches"
rockets_url = "https://api.spacexdata.com/v4/rockets"
crew_url = "https://api.spacexdata.com/v4/crew"

# Fetch and explore launches data
launches_response = requests.get(launches_url, timeout=30)
launches_response.raise_for_status()  # stop early on an HTTP error
launches_data = launches_response.json()
print("Sample Launch Data:", launches_data[:3])

# Fetch and explore rockets data
rockets_response = requests.get(rockets_url, timeout=30)
rockets_response.raise_for_status()
rockets_data = rockets_response.json()
print("Sample Rocket Data:", rockets_data[:3])

# Fetch and explore crew data
crew_response = requests.get(crew_url, timeout=30)
crew_response.raise_for_status()
crew_data = crew_response.json()
print("Sample Crew Data:", crew_data[:3])

## Step 2: Set Up a Python Class

### Task
1. Create a Python class `SpaceXDataHandler` to manage and process the data.
2. Implement the following features:
   - An attribute `launches` to store launch data.
   - An attribute `days_since_last_launch` that maps each launch ID to the number of days between that launch and the one before it.
   - A method `fetch_launch_data` to fetch launch data from the SpaceX API and populate the `launches` attribute.
   - A method `calculate_days_since_last_launch` to compute this gap for every launch, with the launches ordered by `date_utc`.

3. Print out a sample of the `days_since_last_launch` attribute to verify your implementation.

### Expected Output
You should see a sample of `days_since_last_launch` like:

```
Days since last launch for each launch (sample): {'5eb87d47ffd86e000604b38a': 456, '5eb87d48ffd86e000604b38b': 123, ...}
```
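
The `date_utc` values are ISO 8601 strings ending in `Z`, so they need parsing before any day counts can be computed. The sketch below shows one way to do that with the standard library; `parse_utc` and `days_between` are hypothetical helper names, and the timestamps are made up for illustration.

```python
from datetime import datetime


def parse_utc(timestamp):
    """Parse an ISO 8601 string such as '2020-05-30T19:22:00.000Z' as aware UTC."""
    # Before Python 3.11, datetime.fromisoformat() rejects a trailing 'Z',
    # so swap it for an explicit UTC offset first.
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def days_between(earlier, later):
    """Whole days elapsed from one timestamp string to another."""
    return (parse_utc(later) - parse_utc(earlier)).days


# Illustrative timestamps, not taken from the API.
print(days_between("2020-01-01T00:00:00.000Z", "2020-01-31T12:00:00.000Z"))  # 30
```

Note that `.days` truncates to whole days, so the extra twelve hours in the example do not count.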

In [None]:
from datetime import datetime

import requests

class SpaceXDataHandler:
    def __init__(self):
        self.launches = []
        self.days_since_last_launch = {}

    def fetch_launch_data(self):
        """Fetch all launches from the SpaceX API into self.launches."""
        launches_url = "https://api.spacexdata.com/v4/launches"
        response = requests.get(launches_url, timeout=30)
        response.raise_for_status()
        self.launches = response.json()

    def calculate_days_since_last_launch(self):
        """Map each launch ID to the days elapsed since the previous launch."""
        # Sort chronologically so each launch is compared with the one before it
        dated = sorted(
            (datetime.fromisoformat(launch['date_utc'].replace('Z', '+00:00')), launch['id'])
            for launch in self.launches
        )
        previous_date = None
        for launch_date, launch_id in dated:
            # The earliest launch has no predecessor, so its value is None
            self.days_since_last_launch[launch_id] = (
                (launch_date - previous_date).days if previous_date else None
            )
            previous_date = launch_date

# Instantiate the class and verify implementation
spacex_data = SpaceXDataHandler()
spacex_data.fetch_launch_data()
spacex_data.calculate_days_since_last_launch()
print("Days since last launch for each launch (sample):", dict(list(spacex_data.days_since_last_launch.items())[:5]))

## Step 3: Store Data in SQLite

### Task
1. Set up an SQLite database `spacex_data.db`.
2. Create three tables:
   - `launches`: To store launch data with columns `launch_id`, `rocket_id`, and `launch_date`.
   - `rockets`: To store rocket data with columns `rocket_id` and `rocket_name`.
   - `crew`: To store crew data with columns `crew_id`, `name`, `agency`, and `launch_id` (one row per crew member per launch).

3. Insert data into these tables from the SpaceX API.

4. Print a sample of each table to verify the data insertion.
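
Because the `crew` table carries a `launch_id` column, a crew member who flew more than once needs one row per launch. The sketch below illustrates this on an in-memory database with placeholder rows. It assumes a composite key on `(crew_id, launch_id)`, which is a design choice made here and not something the task prescribes; with a key on `crew_id` alone, `INSERT OR IGNORE` would silently drop the later launches.

```python
import sqlite3

# In-memory database so the sketch leaves spacex_data.db untouched.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE crew (
           crew_id TEXT, name TEXT, agency TEXT, launch_id TEXT,
           PRIMARY KEY (crew_id, launch_id)
       )"""
)

# Toy rows (placeholder IDs and names): one crew member flew on two launches.
rows = [
    ("crew-001", "Example Astronaut", "NASA", "launch-001"),
    ("crew-001", "Example Astronaut", "NASA", "launch-002"),
    ("crew-001", "Example Astronaut", "NASA", "launch-002"),  # duplicate, ignored
]
conn.executemany(
    "INSERT OR IGNORE INTO crew (crew_id, name, agency, launch_id) VALUES (?, ?, ?, ?)",
    rows,
)
conn.commit()

stored = conn.execute("SELECT crew_id, launch_id FROM crew ORDER BY launch_id").fetchall()
print(stored)  # [('crew-001', 'launch-001'), ('crew-001', 'launch-002')]
```

Both launches survive, while the exact duplicate is skipped, so the insert step can be rerun safely.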

In [None]:
import sqlite3

# Connect to SQLite database (the file is created if it does not exist)
conn = sqlite3.connect('spacex_data.db')
c = conn.cursor()

# Create tables
c.execute('''CREATE TABLE IF NOT EXISTS launches (launch_id TEXT PRIMARY KEY, rocket_id TEXT, launch_date TEXT)''')
c.execute('''CREATE TABLE IF NOT EXISTS rockets (rocket_id TEXT PRIMARY KEY, rocket_name TEXT)''')
# A crew member can fly on several launches, so the key is the (crew_id, launch_id) pair
c.execute('''CREATE TABLE IF NOT EXISTS crew (crew_id TEXT, name TEXT, agency TEXT, launch_id TEXT,
             PRIMARY KEY (crew_id, launch_id))''')

# Insert data into launches
for launch in spacex_data.launches:
    c.execute('''INSERT OR IGNORE INTO launches (launch_id, rocket_id, launch_date) VALUES (?, ?, ?)''',
              (launch['id'], launch['rocket'], launch['date_utc']))

# Insert data into rockets
for rocket in rockets_data:
    c.execute('''INSERT OR IGNORE INTO rockets (rocket_id, rocket_name) VALUES (?, ?)''',
              (rocket['id'], rocket['name']))

# Insert data into crew: one row per crew member per launch
for member in crew_data:
    for launch_id in member['launches']:
        c.execute('''INSERT OR IGNORE INTO crew (crew_id, name, agency, launch_id) VALUES (?, ?, ?, ?)''',
                  (member['id'], member['name'], member['agency'], launch_id))

conn.commit()

# Verify data insertion
print("Sample Launches:", c.execute('SELECT * FROM launches LIMIT 5').fetchall())
print("Sample Rockets:", c.execute('SELECT * FROM rockets LIMIT 5').fetchall())
print("Sample Crew:", c.execute('SELECT * FROM crew LIMIT 5').fetchall())

## Step 4: Write Python Functions

### Task
1. Write a function `get_crew_by_launch_id` that takes a `launch_id` as input and retrieves the associated crew members (name and agency) from the SQLite database.
2. Test this function with a valid `launch_id` to ensure it works correctly.

In [None]:
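# Define the function to fetch crew by launch ID
def get_crew_by_launch_id(launch_id):
    """Return a list of (name, agency) tuples for the given launch."""
    c.execute('''SELECT name, agency FROM crew WHERE launch_id = ?''', (launch_id,))
    return c.fetchall()

# Test the function with a launch that actually carried crew
# (most launches are uncrewed, so the first launch would return an empty list)
sample_launch_id = c.execute('SELECT launch_id FROM crew LIMIT 1').fetchone()[0]
crew = get_crew_by_launch_id(sample_launch_id)
print(f"Crew for launch {sample_launch_id}: {crew}")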
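
The function above depends on the notebook-wide cursor `c`. A variant that takes the connection as a parameter is easier to test in isolation. The sketch below is one possible shape: `crew_for_launch` is a hypothetical name, and the in-memory table holds placeholder data that only mirrors the columns of the `crew` table.

```python
import sqlite3


def crew_for_launch(conn, launch_id):
    """Return (name, agency) rows for one launch using the given connection."""
    cursor = conn.execute(
        "SELECT name, agency FROM crew WHERE launch_id = ? ORDER BY name",
        (launch_id,),
    )
    return cursor.fetchall()


# Toy in-memory database with placeholder data, mirroring the crew table's columns.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE crew (crew_id TEXT, name TEXT, agency TEXT, launch_id TEXT)")
demo_conn.executemany(
    "INSERT INTO crew VALUES (?, ?, ?, ?)",
    [
        ("crew-001", "Astronaut B", "NASA", "launch-001"),
        ("crew-002", "Astronaut A", "ESA", "launch-001"),
        ("crew-003", "Astronaut C", "JAXA", "launch-002"),
    ],
)

print(crew_for_launch(demo_conn, "launch-001"))  # [('Astronaut A', 'ESA'), ('Astronaut B', 'NASA')]
print(crew_for_launch(demo_conn, "no-such-launch"))  # []
```

An unknown `launch_id` yields an empty list instead of an error, which matches the behaviour of `fetchall()` on a query with no matching rows.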