# **SpaceX Falcon 9 First Stage Landing Prediction**


# Collecting the data

On its website, SpaceX advertises Falcon 9 rocket launches at a price of 62 million dollars, while other providers charge upward of 165 million dollars per launch. The primary reason for this saving is SpaceX's ability to reuse the first stage of the rocket. Therefore, if we can determine how likely the first stage is to land successfully, we can estimate the cost of a launch.

This information can be useful for a competing company that wants to bid against SpaceX for a rocket launch. We will gather and format data from an API to predict the success of the Falcon 9 first stage landing. 

Below are examples of a successful landing and of unsuccessful landings.

<div style="text-align:center">
<img src="./../Resources/success_landing.gif" alt="Success" style="height:200px; width:auto; display:inline-block; margin:auto;">
<img src="./../Resources/unsuccess_landing.gif" alt="Unsuccess" style="height:200px; width:auto; display:inline-block; margin:auto;">
</div>    

## Objectives


- Request data from the SpaceX API
- Clean the requested data

----

## Import Libraries

In [1]:
import requests
import pandas as pd
import numpy as np
import datetime

# # Setting this option will print all columns of a dataframe
# pd.set_option('display.max_columns', None)
# # Setting this option will print all of the data in a feature
# pd.set_option('display.max_colwidth', None)

## Define Auxiliary Functions
Define auxiliary functions to retrieve data from the SpaceX API

In [2]:
# Takes the dataset and uses the rocket column to call the API and return the list of booster names
def get_booster_version(data):
    """Retrieves the booster version for each rocket launch"""
    return [requests.get(f"https://api.spacexdata.com/v4/rockets/{x}").json()['name'] for x in data['rocket'] if x]
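
`get_booster_version` sends one request per row, even though the `rocket` column repeats the same few IDs, and its `if x` filter silently drops empty entries, which would leave the result shorter than the dataframe. The sketch below is one possible alternative, not part of the original notebook: `lookup_names` and `fake_fetch` are hypothetical names, and the IDs are made up so the example runs offline.

```python
from typing import Callable, Dict, Iterable, List, Optional


def lookup_names(ids: Iterable[Optional[str]],
                 fetch_name: Callable[[str], str]) -> List[Optional[str]]:
    """Resolve each ID to a name, calling fetch_name once per distinct ID."""
    cache: Dict[str, str] = {}
    names: List[Optional[str]] = []
    for item_id in ids:
        if not item_id:
            # Keep a placeholder so the result stays aligned with the input rows.
            names.append(None)
            continue
        if item_id not in cache:
            cache[item_id] = fetch_name(item_id)
        names.append(cache[item_id])
    return names


# Stand-in for the real API call (assumption: made-up IDs, no network access).
calls = []


def fake_fetch(rocket_id: str) -> str:
    calls.append(rocket_id)
    return {"id-falcon1": "Falcon 1", "id-falcon9": "Falcon 9"}[rocket_id]


result = lookup_names(["id-falcon1", "id-falcon1", None, "id-falcon9"], fake_fetch)
print(result)      # ['Falcon 1', 'Falcon 1', None, 'Falcon 9']
print(len(calls))  # 2
```

With the real API, `fetch_name` could be `lambda x: requests.get(f"https://api.spacexdata.com/v4/rockets/{x}").json()['name']`.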
            

In [3]:
# Takes the dataset and uses the launchpad column to call the API and append the data to the list

def get_launch_site(data):
    """Retrieves the launch site data for each rocket launch"""
    for x in data['launchpad']:
        if x:
            response = requests.get(f"https://api.spacexdata.com/v4/launchpads/{x}").json()
            Longitude.append(response['longitude'])
            Latitude.append(response['latitude'])
            LaunchSite.append(response['name'])

In [4]:
# Takes the dataset and uses the payloads column to call the API and append the data to the lists

def get_payload_data(data):
    """Retrieves the payload data for each rocket launch"""
    for load in data['payloads']:
        if load:
            response = requests.get(f"https://api.spacexdata.com/v4/payloads/{load}").json()
            PayloadMass.append(response['mass_kg'])
            Orbit.append(response['orbit'])

In [5]:
# Takes the dataset and uses the cores column to call the API and append the data to the lists

def get_core_data(data):
    """Retrieves the core data for each rocket launch"""
    for core in data['cores']:
        if core['core'] is not None:
            response = requests.get(f"https://api.spacexdata.com/v4/cores/{core['core']}").json()
            Block.append(response['block'])
            ReusedCount.append(response['reuse_count'])
            Serial.append(response['serial'])
        else:
            Block.append(None)
            ReusedCount.append(None)
            Serial.append(None)
        Outcome.append(f"{core['landing_success']} {core['landing_type']}")    
#         Outcome.append(str(core['landing_success'])+' '+str(core['landing_type']))
        Flights.append(core['flight'])
        GridFins.append(core['gridfins'])
        Reused.append(core['reused'])
        Legs.append(core['legs'])
        LandingPad.append(core['landpad'])
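
The `Outcome` value packs two fields into one string, which is why values such as `True ASDS`, `False Ocean`, and `None None` appear later in the dataframe. As a minimal sketch, the snippet below rebuilds that string from a core record and splits it back into its parts. `format_outcome` mirrors the f-string used in `get_core_data`; `parse_outcome` is a hypothetical helper that is not part of the original notebook.

```python
from typing import Optional, Tuple


def format_outcome(core: dict) -> str:
    """Build the Outcome string the same way get_core_data does."""
    return f"{core['landing_success']} {core['landing_type']}"


def parse_outcome(outcome: str) -> Tuple[Optional[bool], Optional[str]]:
    """Split an Outcome string back into (landing_success, landing_type)."""
    success_text, type_text = outcome.split(" ", 1)
    # "True"/"False" map to booleans; anything else (such as "None") maps to None.
    success = {"True": True, "False": False}.get(success_text)
    landing_type = None if type_text == "None" else type_text
    return success, landing_type


print(format_outcome({"landing_success": True, "landing_type": "ASDS"}))  # True ASDS
print(format_outcome({"landing_success": None, "landing_type": None}))    # None None
print(parse_outcome("False Ocean"))                                       # (False, 'Ocean')
```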

In [6]:
spacex_url="https://api.spacexdata.com/v4/launches/past"

In [7]:
response = requests.get(spacex_url)

In [8]:
# print(response.content)

### Request and parse the SpaceX launch data using the GET request

In [9]:
# Check if the request was successful (status code 200)
if response.status_code == 200:
    # Decode the response content as a JSON object
    json_data = response.json()
    
    # Use json_normalize method to convert the JSON result into a dataframe
    spacex_data = pd.json_normalize(json_data)
else:
    print('Request failed with status code:', response.status_code)
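
If the request fails, the cell above only prints a message, so `spacex_data` is never defined and the next cell stops with a `NameError`. One possible alternative is to raise immediately on an HTTP error. The sketch below assumes a hypothetical helper `fetch_json` that accepts any object exposing `get(url, timeout=...)`; `FakeClient` and `FakeResponse` are stand-ins so the example runs without network access.

```python
from typing import Any


def fetch_json(client: Any, url: str, timeout: float = 10.0) -> Any:
    """GET url with client and return the decoded JSON, raising on HTTP errors."""
    response = client.get(url, timeout=timeout)
    response.raise_for_status()  # raises for error statuses instead of continuing
    return response.json()


# Offline stand-ins that mimic the small part of the requests interface used above.
class FakeResponse:
    def __init__(self, payload, ok=True):
        self._payload = payload
        self._ok = ok

    def raise_for_status(self):
        if not self._ok:
            raise RuntimeError("HTTP error")

    def json(self):
        return self._payload


class FakeClient:
    def get(self, url, timeout=None):
        return FakeResponse([{"flight_number": 1}])


launches = fetch_json(FakeClient(), "https://api.spacexdata.com/v4/launches/past")
print(launches)  # [{'flight_number': 1}]
```

In the notebook, the `requests` module itself can be passed as the client, for example `pd.json_normalize(fetch_json(requests, spacex_url))`, because `requests.get` accepts a `timeout` argument and its response provides `raise_for_status()` and `json()`.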


In [10]:
# Get the head of the dataframe    
print(spacex_data.head())

       static_fire_date_utc  static_fire_date_unix    net  window  \
0  2006-03-17T00:00:00.000Z           1.142554e+09  False     0.0   
1                      None                    NaN  False     0.0   
2                      None                    NaN  False     0.0   
3  2008-09-20T00:00:00.000Z           1.221869e+09  False     0.0   
4                      None                    NaN  False     0.0   

                     rocket success  \
0  5e9d0d95eda69955f709d1eb   False   
1  5e9d0d95eda69955f709d1eb   False   
2  5e9d0d95eda69955f709d1eb   False   
3  5e9d0d95eda69955f709d1eb    True   
4  5e9d0d95eda69955f709d1eb    True   

                                            failures  \
0  [{'time': 33, 'altitude': None, 'reason': 'mer...   
1  [{'time': 301, 'altitude': 289, 'reason': 'har...   
2  [{'time': 140, 'altitude': 35, 'reason': 'resi...   
3                                                 []   
4                                                 []   

             

Much of the data consists of IDs. For instance, the "rocket" column holds only an identification number and no information about the rocket itself. We will therefore query the API again, using the IDs provided for each launch, to retrieve the details. Specifically, we will use the "rocket", "payloads", "launchpad", and "cores" columns.

In [11]:
# Subset of dataframe keeping only the needed features and the flight number, and date_utc.
spacex_data = spacex_data[['rocket', 'payloads', 'launchpad', 'cores', 'flight_number', 'date_utc']]

# Remove rows with multiple cores (Falcon rockets with two extra boosters) and rows with multiple payloads on a single rocket.
spacex_data = spacex_data[spacex_data['cores'].map(len)==1]
spacex_data = spacex_data[spacex_data['payloads'].map(len)==1]

# Since payloads and cores are lists of size 1 we will also extract the single value in the list and replace the feature.
spacex_data['cores'] = spacex_data['cores'].map(lambda x : x[0])
spacex_data['payloads'] = spacex_data['payloads'].map(lambda x : x[0])

# Convert date_utc to a datetime and keep only the date, dropping the time
spacex_data['date'] = pd.to_datetime(spacex_data['date_utc']).dt.date

# Keep only launches up to and including 2020-11-13
spacex_data = spacex_data[spacex_data['date'] <= datetime.date(2020, 11, 13)]
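
To make the list filtering above concrete, here is a small sketch on made-up rows shaped like the API output. The dataframe `toy` and its values are assumptions for illustration only. It applies the same two steps: keep rows whose `cores` and `payloads` lists have exactly one element, then unwrap that element.

```python
import pandas as pd

# Made-up rows shaped like the API output: 'cores' and 'payloads' hold lists.
toy = pd.DataFrame({
    "flight_number": [1, 2, 3],
    "cores": [
        [{"core": "a"}],
        [{"core": "b"}, {"core": "c"}, {"core": "d"}],  # three cores: dropped
        [{"core": "e"}],
    ],
    "payloads": [["p1"], ["p2"], ["p3", "p4"]],         # last row has two payloads: dropped
})

# Keep only rows whose lists contain exactly one element.
mask = (toy["cores"].map(len) == 1) & (toy["payloads"].map(len) == 1)
single = toy[mask].copy()

# Unwrap the one-element lists so each cell holds the value itself.
single["cores"] = single["cores"].map(lambda items: items[0])
single["payloads"] = single["payloads"].map(lambda items: items[0])

print(single["flight_number"].tolist())  # [1]
print(single["payloads"].tolist())       # ['p1']
```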

* From <code>rocket</code> we get the booster name.

* From <code>payloads</code> we get the mass of the payload and the orbit it is going to.

* From <code>launchpad</code> we get the name of the launch site being used, the longitude, and the latitude.

* From <code>cores</code> we get the outcome of the landing, the type of the landing, the number of flights with that core, whether grid fins were used, whether the core is reused, whether legs were used, the landing pad used, the block of the core (a number used to separate versions of cores), the number of times this specific core has been reused, and the serial of the core.

The data obtained from these requests will be stored in lists and used to create a new dataframe.
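
Because the lists are filled by separate functions, and some of those functions skip rows with an empty ID while `get_core_data` always appends, the lists can end up with different lengths. `pd.DataFrame` raises a `ValueError` in that case. A minimal sketch of a check that reports which column is off before the dataframe is built is shown below; `column_lengths` and `assert_same_length` are hypothetical helpers, and the values are made up.

```python
from typing import Dict, List


def column_lengths(columns: Dict[str, List]) -> Dict[str, int]:
    """Return the length of every column list."""
    return {name: len(values) for name, values in columns.items()}


def assert_same_length(columns: Dict[str, List]) -> int:
    """Raise ValueError if the lists differ in length; otherwise return that length."""
    lengths = column_lengths(columns)
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Column lengths differ: {lengths}")
    return next(iter(lengths.values()), 0)


# Made-up values, only to show the check.
n_rows = assert_same_length({"Orbit": ["LEO", "GTO"], "PayloadMass": [525.0, 3170.0]})
print(n_rows)  # 2
```

In the notebook, the same check could be applied to the dictionary of lists just before it is passed to `pd.DataFrame`.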

In [12]:
#Global variables 
BoosterVersion = []
PayloadMass = []
Orbit = []
LaunchSite = []
Outcome = []
Flights = []
GridFins = []
Reused = []
Legs = []
LandingPad = []
Block = []
ReusedCount = []
Serial = []
Longitude = []
Latitude = []

In [13]:
BoosterVersion = get_booster_version(spacex_data)

BoosterVersion[0:5]

['Falcon 1', 'Falcon 1', 'Falcon 1', 'Falcon 1', 'Falcon 9']

In [14]:
get_launch_site(spacex_data)

LaunchSite[0:5]

['Kwajalein Atoll',
 'Kwajalein Atoll',
 'Kwajalein Atoll',
 'Kwajalein Atoll',
 'CCSFS SLC 40']

In [15]:
get_payload_data(spacex_data)

PayloadMass[0:5]

[20, None, 165, 200, None]

In [16]:
get_core_data(spacex_data)

In [17]:
# Combine the columns into a dictionary
launch_dict = {'FlightNumber': list(spacex_data['flight_number']),
'Date': list(spacex_data['date']),
'BoosterVersion':BoosterVersion,
'PayloadMass':PayloadMass,
'Orbit':Orbit,
'LaunchSite':LaunchSite,
'Outcome':Outcome,
'Flights':Flights,
'GridFins':GridFins,
'Reused':Reused,
'Legs':Legs,
'LandingPad':LandingPad,
'Block':Block,
'ReusedCount':ReusedCount,
'Serial':Serial,
'Longitude': Longitude,
'Latitude': Latitude}

In [18]:
# Create Pandas dataframe from launch_dict
launch_data = pd.DataFrame(launch_dict)

In [19]:
# Show the head of the dataframe
launch_data.head()

Unnamed: 0,FlightNumber,Date,BoosterVersion,PayloadMass,Orbit,LaunchSite,Outcome,Flights,GridFins,Reused,Legs,LandingPad,Block,ReusedCount,Serial,Longitude,Latitude
0,1,2006-03-24,Falcon 1,20.0,LEO,Kwajalein Atoll,None None,1,False,False,False,,,0,Merlin1A,167.743129,9.047721
1,2,2007-03-21,Falcon 1,,LEO,Kwajalein Atoll,None None,1,False,False,False,,,0,Merlin2A,167.743129,9.047721
2,4,2008-09-28,Falcon 1,165.0,LEO,Kwajalein Atoll,None None,1,False,False,False,,,0,Merlin2C,167.743129,9.047721
3,5,2009-07-13,Falcon 1,200.0,LEO,Kwajalein Atoll,None None,1,False,False,False,,,0,Merlin3C,167.743129,9.047721
4,6,2010-06-04,Falcon 9,,LEO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B0003,-80.577366,28.561857


### Filter the dataframe to only include `Falcon 9` launches

In [20]:
# Filter the launch_data dataframe using the BoosterVersion column to keep only the Falcon 9 launches.
data_falcon9 = launch_data[launch_data['BoosterVersion']!='Falcon 1']

In [21]:
data_falcon9.head()

Unnamed: 0,FlightNumber,Date,BoosterVersion,PayloadMass,Orbit,LaunchSite,Outcome,Flights,GridFins,Reused,Legs,LandingPad,Block,ReusedCount,Serial,Longitude,Latitude
4,6,2010-06-04,Falcon 9,,LEO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B0003,-80.577366,28.561857
5,8,2012-05-22,Falcon 9,525.0,LEO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B0005,-80.577366,28.561857
6,10,2013-03-01,Falcon 9,677.0,ISS,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B0007,-80.577366,28.561857
7,11,2013-09-29,Falcon 9,500.0,PO,VAFB SLC 4E,False Ocean,1,False,False,False,,1.0,0,B1003,-120.610829,34.632093
8,12,2013-12-03,Falcon 9,3170.0,GTO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B1004,-80.577366,28.561857


In [22]:
data_falcon9.shape

(90, 17)

Now that we have removed some rows, we should reset the FlightNumber column.


In [23]:
# Reset the FlightNumber column because some rows were removed
data_falcon9 = data_falcon9.copy()
data_falcon9.loc[:,'FlightNumber'] = range(1, data_falcon9.shape[0]+1)
data_falcon9

Unnamed: 0,FlightNumber,Date,BoosterVersion,PayloadMass,Orbit,LaunchSite,Outcome,Flights,GridFins,Reused,Legs,LandingPad,Block,ReusedCount,Serial,Longitude,Latitude
4,1,2010-06-04,Falcon 9,,LEO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B0003,-80.577366,28.561857
5,2,2012-05-22,Falcon 9,525.0,LEO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B0005,-80.577366,28.561857
6,3,2013-03-01,Falcon 9,677.0,ISS,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B0007,-80.577366,28.561857
7,4,2013-09-29,Falcon 9,500.0,PO,VAFB SLC 4E,False Ocean,1,False,False,False,,1.0,0,B1003,-120.610829,34.632093
8,5,2013-12-03,Falcon 9,3170.0,GTO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B1004,-80.577366,28.561857
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
89,86,2020-09-03,Falcon 9,15600.0,VLEO,KSC LC 39A,True ASDS,2,True,True,True,5e9e3032383ecb6bb234e7ca,5.0,12,B1060,-80.603956,28.608058
90,87,2020-10-06,Falcon 9,15600.0,VLEO,KSC LC 39A,True ASDS,3,True,True,True,5e9e3032383ecb6bb234e7ca,5.0,13,B1058,-80.603956,28.608058
91,88,2020-10-18,Falcon 9,15600.0,VLEO,KSC LC 39A,True ASDS,6,True,True,True,5e9e3032383ecb6bb234e7ca,5.0,12,B1051,-80.603956,28.608058
92,89,2020-10-24,Falcon 9,15600.0,VLEO,CCSFS SLC 40,True ASDS,3,True,True,True,5e9e3033383ecbb9e534e7cc,5.0,12,B1060,-80.577366,28.561857


## Data Wrangling


In [24]:
# Some rows in the dataset have missing values
data_falcon9.isnull().sum()

FlightNumber       0
Date               0
BoosterVersion     0
PayloadMass        5
Orbit              0
LaunchSite         0
Outcome            0
Flights            0
GridFins           0
Reused             0
Legs               0
LandingPad        26
Block              0
ReusedCount        0
Serial             0
Longitude          0
Latitude           0
dtype: int64

### Dealing with Missing Values

In [25]:
# Calculate the mean value of PayloadMass column
PayloadMass_mean = data_falcon9.PayloadMass.mean()

# Replace the missing values with the mean (assign the result back instead of calling replace(..., inplace=True) on the column)
data_falcon9['PayloadMass'] = data_falcon9['PayloadMass'].fillna(PayloadMass_mean)
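
To make the arithmetic of mean imputation explicit, the sketch below performs the same step on a plain Python list. `impute_mean` is a hypothetical helper and the payload masses are made up. The mean is computed from the observed values only, and each missing entry is then replaced by it.

```python
from typing import List, Optional


def impute_mean(values: List[Optional[float]]) -> List[float]:
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


# Made-up payload masses in kg: the mean of 500 and 700 is 600.
print(impute_mean([500.0, None, 700.0]))  # [500.0, 600.0, 700.0]
```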

In [26]:
data_falcon9.isnull().sum()

FlightNumber       0
Date               0
BoosterVersion     0
PayloadMass        0
Orbit              0
LaunchSite         0
Outcome            0
Flights            0
GridFins           0
Reused             0
Legs               0
LandingPad        26
Block              0
ReusedCount        0
Serial             0
Longitude          0
Latitude           0
dtype: int64

There are no missing values left in the dataset except in <code>LandingPad</code>.

In [27]:
print(f"Total number of launches: {len(launch_data)}")
print(f"Average payload mass (kg): {launch_data['PayloadMass'].mean()}")
print(f"Most common booster version: {launch_data['BoosterVersion'].mode()[0]}")

Total number of launches: 94
Average payload mass (kg): 5919.16534090909
Most common booster version: Falcon 9


In [28]:
# Export the Falcon 9 dataframe to a CSV file
data_falcon9.to_csv('./../0_DataSets/dataset_part_1.csv', index=False)