# **SpaceX Falcon 9 First Stage Landing Prediction**


## Web Scraping Falcon 9 and Falcon Heavy Launch Records from Wikipedia


This notebook scrapes historical Falcon 9 launch records from the Wikipedia page titled "List of Falcon 9 and Falcon Heavy launches".

[https://en.wikipedia.org/wiki/List_of_Falcon\_9\_and_Falcon_Heavy_launches](https://en.wikipedia.org/wiki/List_of_Falcon\_9\_and_Falcon_Heavy_launches?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDS0321ENSkillsNetwork26802033-2022-01-01)


![](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DS0321EN-SkillsNetwork/labs/module\_1\_L2/images/Falcon9\_rocket_family.svg)


Below are examples of a successful first-stage landing and an unsuccessful one.

<div style="text-align:center">
<img src="./../resources/success_landing.gif" alt="Success" style="height:200px; width:auto; display:inline-block; margin:auto;">
<img src="./../resources/unsuccess_landing.gif" alt="Unsuccess" style="height:200px; width:auto; display:inline-block; margin:auto;">
</div>  

More specifically, the launch records are stored in an HTML table, shown below:


![](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DS0321EN-SkillsNetwork/labs/module\_1\_L2/images/falcon9-launches-wiki.png)


## Objectives

* Scrape Falcon 9 launch records with `BeautifulSoup`:
    * Extract a Falcon 9 launch records HTML table from Wikipedia
    * Parse the table and convert it into a Pandas data frame

## Import Libraries

In [1]:
# !pip3 install beautifulsoup4
# !pip3 install requests
# !pip3 install html5lib

In [2]:
import sys
import requests
from bs4 import BeautifulSoup
import re
import unicodedata
import pandas as pd

## Define Auxiliary Functions

In [3]:
def date_time(table_cells):
    """
    Return the date and time strings from an HTML table cell.
    Input: a table data cell (<td>) element from a launch row
    """
    return [text.strip() for text in table_cells.strings][0:2]

def booster_version(table_cells):
    """
    Return the booster version from an HTML table cell.
    Input: a table data cell (<td>) element from a launch row
    """
    # Keep every other string in the cell, then drop the last one kept
    out = ''.join([text for i, text in enumerate(table_cells.strings) if i % 2 == 0][0:-1])
    return out

def landing_status(table_cells):
    """
    Return the landing status from an HTML table cell.
    Input: a table data cell (<td>) element from a launch row
    """
    # The status is the first string in the cell
    out = [text for text in table_cells.strings][0]
    return out


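def get_mass(table_cells):
    """
    Return the payload mass text up to and including "kg", or 0 if the cell is empty.
    Input: a table data cell (<td>) element from a launch row
    """
    mass = unicodedata.normalize("NFKD", table_cells.text).strip()
    if mass:
        new_mass = mass[0:mass.find("kg") + 2]
    else:
        new_mass = 0
    return new_mass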


def extract_column_from_header(row):
    """
    Return the column name from an HTML table header cell.
    Input: a table header cell (<th>) element
    """
    # Drop line breaks, links, and footnote markers from the header cell
    if row.br:
        row.br.extract()
    if row.a:
        row.a.extract()
    if row.sup:
        row.sup.extract()

    column_name = ' '.join(row.contents)

    # Skip purely numeric names (the function returns None for them)
    if not column_name.strip().isdigit():
        column_name = column_name.strip()
        return column_name
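
To see what the cell helpers return, they can be tried on a small hand-written row. The sketch below is only an illustration: the HTML snippet is made up to resemble a launch row (it is not copied from Wikipedia), the two helpers are repeated so the block runs on its own, and the built-in `html.parser` is used instead of `html5lib` to avoid an extra dependency.

```python
from bs4 import BeautifulSoup


def date_time(table_cells):
    # Same logic as the notebook helper: first two strings of the cell
    return [text.strip() for text in table_cells.strings][0:2]


def landing_status(table_cells):
    # Same logic as the notebook helper: first string of the cell
    return [text for text in table_cells.strings][0]


# Hypothetical row fragment, written by hand for this example
html = (
    "<table><tr>"
    "<td>7 January 2020,<br/>02:19:21</td>"
    "<td>Success<sup>[1]</sup></td>"
    "</tr></table>"
)
cells = BeautifulSoup(html, "html.parser").find_all("td")

date, time = date_time(cells[0])
print(date.strip(","), "|", time)   # the date still carries a trailing comma
print(landing_status(cells[1]))     # the footnote marker is a separate string
```

The trailing comma on the date is why the parsing loop later calls `strip(',')` on the first element.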


In [4]:
static_url = "https://en.wikipedia.org/w/index.php?title=List_of_Falcon_9_and_Falcon_Heavy_launches"

### Request the Falcon 9 launch Wikipedia page from its URL

First, let's send an HTTP GET request for the Falcon 9 launch page and keep the HTTP response.


In [5]:
# Send an HTTP GET request for the Falcon 9 launch page and keep the response body as text
data = requests.get(static_url).text
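
The request above has no timeout and does not check the status code, so a blocked or failed request would silently hand an error page to the parser. A slightly more defensive variant might look like the sketch below. The function name `fetch_html`, the 10-second timeout, and the `User-Agent` string are assumptions made for this example; Wikimedia asks automated clients to identify themselves, so a descriptive value of your own is advisable.

```python
import requests


def fetch_html(url, session=None, timeout=10):
    """Return the page body as text, raising on HTTP errors (hypothetical helper)."""
    session = session or requests.Session()
    # Placeholder User-Agent; replace it with one that identifies your project
    headers = {"User-Agent": "falcon9-scraper-example/0.1 (educational use)"}
    response = session.get(url, headers=headers, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx responses instead of parsing an error page
    response.raise_for_status()
    return response.text
```

With this helper, the cell above could be written as `data = fetch_html(static_url)`.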

In [6]:
# Create a BeautifulSoup object from the response text
soup = BeautifulSoup(data, 'html5lib')

In [7]:
# Print the page title to verify that the BeautifulSoup object was created properly
print(soup.title)

<title>List of Falcon 9 and Falcon Heavy launches - Wikipedia</title>


### Extract all column/variable names from the HTML table header

In [8]:
# Finding all tables on the wiki page
html_tables=soup.find_all('table')

In [9]:
# The fifth table (index 4) is the target table that contains the launch records
first_launch_table = html_tables[4]
# print(first_launch_table)

In [10]:
column_names = []

# Find all table rows in the target table
rows = first_launch_table.find_all('tr')

# Extract the header row (first row) and get all columns
header_row = rows[0]
columns = header_row.find_all('th')

column_names = [col.text.strip() for col in columns ]
 

In [11]:
# Print the column names
print(column_names)

['Flight No.', 'Date andtime (UTC)', 'Version,booster[b]', 'Launchsite', 'Payload[c]', 'Payload mass', 'Orbit', 'Customer', 'Launchoutcome', 'Boosterlanding']
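
Names such as `'Date andtime (UTC)'` lose a space because the `<br>` tag between the two words contributes no text, and `'Version,booster[b]'` keeps its footnote marker. One possible way to get cleaner names is sketched below: it replaces each `<br>` with a space and removes `<sup>` elements before reading the text. The helper name `clean_header` and the sample HTML are assumptions for illustration. Adopting it would change the dictionary keys, so the `del` statements in the next section would need the new names.

```python
from bs4 import BeautifulSoup


def clean_header(th):
    """Return header text with footnote markers removed and <br> turned into a space."""
    for sup in th.find_all("sup"):
        sup.decompose()          # drop footnote markers such as [b]
    for br in th.find_all("br"):
        br.replace_with(" ")     # keep the word boundary the line break implied
    # Collapse any runs of whitespace into single spaces
    return " ".join(th.get_text().split())


# Hand-written header row that mimics the structure of the launch table
html = (
    "<table><tr>"
    '<th>Date and<br/>time (<a href="#">UTC</a>)</th>'
    "<th>Version,<br/>booster<sup>[b]</sup></th>"
    "</tr></table>"
)
soup_example = BeautifulSoup(html, "html.parser")
names = [clean_header(th) for th in soup_example.find_all("th")]
print(names)
```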


## Create a data frame by parsing the launch HTML tables

We will create an empty dictionary with keys from the column names extracted in the previous step. Later, this dictionary will be converted into a Pandas data frame.


In [12]:
# Creating an empty dictionary with keys from the extracted column names. This dictionary will be converted into a Pandas dataframe
launch_dict= dict.fromkeys(column_names)

# Remove the columns that will be re-added below under cleaned-up names
del launch_dict['Date andtime (UTC)']
del launch_dict['Version,booster[b]']
del launch_dict['Launchsite']
del launch_dict['Payload[c]']
del launch_dict['Launchoutcome']
del launch_dict['Boosterlanding']

# Initialize each launch_dict value as an empty list
launch_dict['Flight No.'] = []
launch_dict['Launch site'] = []
launch_dict['Payload'] = []
launch_dict['Payload mass'] = []
launch_dict['Orbit'] = []
launch_dict['Customer'] = []
launch_dict['Launch outcome'] = []

# Add some new columns
launch_dict['Version Booster']=[]
launch_dict['Booster landing']=[]
launch_dict['Date']=[]
launch_dict['Time']=[]

`launch_dict` will be filled with launch records extracted from the table rows. HTML tables on Wikipedia pages often contain unexpected annotations and other noise, such as reference links, missing values (N/A), and inconsistent formatting.
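
Because of this noise, an expression like `cell.a.string` raises `AttributeError` whenever a cell has no link. A small fallback helper is one way to guard against that. The sketch below is an assumption-laden illustration, not part of the original lab: `cell_text` is a hypothetical name, and the sample row is hand-written.

```python
from bs4 import BeautifulSoup


def cell_text(cell):
    """Return the first link's text if present, else the cell's text, else None."""
    if cell is None:
        return None
    # Remove reference markers first so their links are not mistaken for content
    for sup in cell.find_all("sup"):
        sup.decompose()
    target = cell.a if cell.a is not None else cell
    text = target.get_text(strip=True)
    return text or None


# Hand-written cells: one with a link and a reference, one plain, one empty
html = (
    "<table><tr>"
    '<td><a href="#">CCSFS</a>, SLC-40<sup><a href="#">[1]</a></sup></td>'
    "<td>N/A</td>"
    "<td></td>"
    "</tr></table>"
)
cells = BeautifulSoup(html, "html.parser").find_all("td")
print([cell_text(c) for c in cells])
```

Returning `None` for empty cells keeps missing values visible as `NaN`/`None` in the resulting data frame instead of as empty strings.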

In [13]:
extracted_row = 0

# Iterate over every launch table on the page
for table_number, table in enumerate(soup.find_all('table', "wikitable plainrowheaders collapsible")):
    # Iterate over the rows of the table
    for rows in table.find_all("tr"):
        # A launch row starts with a header cell that holds the flight number.
        # Reset the flag for every row so a stale value is never reused.
        flag = False
        if rows.th and rows.th.string:
            flight_number = rows.th.string.strip()
            flag = flight_number.isdigit()
        # Get the data cells of the row
        row = rows.find_all('td')
        # If the row is a launch row, save its cells in the dictionary
        if flag:
            extracted_row += 1

            # Flight number
            launch_dict["Flight No."].append(flight_number)
            datatimelist = date_time(row[0])

            # Date
            date = datatimelist[0].strip(',')
            launch_dict["Date"].append(date)

            # Time
            time = datatimelist[1]
            launch_dict["Time"].append(time)

            # Booster version (fall back to the link text if nothing was extracted)
            bv = booster_version(row[1])
            if not bv:
                bv = row[1].a.string
            print(bv)
            launch_dict["Version Booster"].append(bv)

            # Launch site
            launch_site = row[2].a.string
            launch_dict["Launch site"].append(launch_site)

            # Payload
            payload = row[3].a.string
            launch_dict["Payload"].append(payload)

            # Payload mass
            payload_mass = get_mass(row[4])
            launch_dict["Payload mass"].append(payload_mass)

            # Orbit
            orbit = row[5].a.string
            launch_dict["Orbit"].append(orbit)

            # Customer (use the link text if there is a link, else the cell text)
            if row[6].a is not None:
                customer = row[6].a.string
            else:
                customer = row[6].string
            launch_dict["Customer"].append(customer)

            # Launch outcome
            launch_outcome = list(row[7].strings)[0]
            launch_dict["Launch outcome"].append(launch_outcome)

            # Booster landing
            booster_landing = landing_status(row[8])
            launch_dict["Booster landing"].append(booster_landing)
            

F9 B5
F9 B5
F9 B5
F9 B5
F9 B5
F9 B5
F9 B5
F9 B5B1058.1
F9 B5
F9 B5
F9 B5
F9 B5B1058.2
F9 B5
F9 B5B1049.6
F9 B5
F9 B5B1060.2
F9 B5B1058.3
F9 B5B1051.6
F9 B5
F9 B5
F9 B5B1061.1
F9 B5
F9 B5B1049.7
F9 B5B1058.4
F9 B5
F9 B5
F9 B5
F9 B5B1051.8
F9 B5B1058.5
F9 B5B1060.5
F9 B5
F9 B5B1049.8
F9 B5B1058.6
F9 B5
F9 B5B1060.6
F9 B5
F9 B5B1061.2
F9 B5B1060.7
F9 B5B1049.9
F9 B5B1051.10
F9 B5
F9 B5
F9 B5B1067.1
F9 B5
F9 B5B1062.2
F9 B5
F9 B5
F9 B5B1049.10
F9 B5B1062.3
F9 B5B1067.2
F9 B5B1058.9
F9 B5B1063.3
F9 B5B1060.9
F9 B5
F9 B5
F9 B5
F9 B5
F9 B5
F9 B5
F9 B5
F9 B5
F9 B5
F9 B5
F9 B5
F9 B5
F9 B5
F9 B5B1052.4
F9 B5
F9 B5
F9 B5
F9 B5
F9 B5
F9 B5B1067.4
F9 B5
F9 B5
F9 B5
F9 B5
F9 B5
F9 B5B1061.8
F9 B5B1062.7
F9 B5B1060.13
F9 B5
F9 B5B1061.9
F9 B5B1073.2
F9 B5B1058.13
F9 B5
F9 B5
F9 B5B1051.13
F9 B5
F9 B5
F9 B5
F9 B5B1073.3
F9 B5
F9 B5
F9 B5B1069.2
F9 B5
F9 B5
F9 B5
F9 B5B1067.6
F9 B5B1073.4
F9 B5
F9 B5
F9 B5B1060.14
F9 B5
F9 B5B1062.10
F9 B5
F9 B5
F9 B5
F9 B5B1049.11
F9 B5
F9 B5
F9 B5 
F9 B5
F9 B5
F9 B5


In [14]:
# Create dataframe
df = pd.DataFrame(launch_dict)
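
`pd.DataFrame` accepts a dictionary of lists only when every list has the same length, which holds here because the loop appends exactly one value per key for each launch row. If the loop is ever changed, a quick length check gives a clearer error than the one pandas raises. The sketch below uses a hypothetical helper name, `check_equal_lengths`, and a two-row sample dictionary in place of `launch_dict`.

```python
import pandas as pd


def check_equal_lengths(columns):
    """Raise ValueError listing the per-column lengths if they differ."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Column lengths differ: {lengths}")


# Small stand-in for launch_dict
sample = {"Flight No.": ["78", "79"], "Orbit": ["LEO", "Sub-orbital"]}
check_equal_lengths(sample)
sample_df = pd.DataFrame(sample)
print(sample_df.shape)
```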

In [15]:
df.head()

Unnamed: 0,Flight No.,Payload mass,Orbit,Customer,Launch site,Payload,Launch outcome,Version Booster,Booster landing,Date,Time
0,78,"15,600 kg",LEO,SpaceX,CCSFS,Starlink,Success\n,F9 B5,Success,7 January 2020,02:19:21
1,79,"12,050 kg",Sub-orbital,NASA,KSC,Crew Dragon in-flight abort test,Success\n,F9 B5,No attempt\n,19 January 2020,15:30
2,80,"15,600 kg",LEO,SpaceX,CCSFS,Starlink,Success\n,F9 B5,Success,29 January 2020,14:07
3,81,"15,600 kg",LEO,SpaceX,CCSFS,Starlink,Success\n,F9 B5,Failure,17 February 2020,15:05
4,82,"1,977 kg",LEO,NASA,CCSFS,SpaceX CRS-20,Success\n,F9 B5,Success,7 March 2020,04:50


In [16]:
df.tail()

Unnamed: 0,Flight No.,Payload mass,Orbit,Customer,Launch site,Payload,Launch outcome,Version Booster,Booster landing,Date,Time
131,209,"6,000 kg",LEO,OneWeb,CCSFS,OneWeb #17,Success\n,F9 B5,Success,9 March 2023,19:13
132,210,"2,852 kg",LEO,NASA,KSC,SpaceX CRS-27,Success\n,F9 B5B1073.7,Success,15 March 2023,00:30
133,211,"~16,200 kg",LEO,SpaceX,VSFB,Starlink Group 2-8,Success\n,F9 B5,Success,17 March 2023,19:26
134,212,"~7,000 kg",GTO,SES,CCSFS,SES-18 and SES-19,Success\n,F9 B5,Success,17 March 2023,23:38
135,213,"~17,400 kg",LEO,SpaceX,CCSFS,Starlink Group 5-5,Success\n,F9 B5B1067.10,Success,24 March 2023,15:43


In [17]:
df.describe()

Unnamed: 0,Flight No.,Payload mass,Orbit,Customer,Launch site,Payload,Launch outcome,Version Booster,Booster landing,Date,Time
count,136,136,136,136,136,130,136,136,136,136,136
unique,136,72,9,25,3,88,2,54,3,133,132
top,78,"15,600 kg",LEO,SpaceX,CCSFS,Starlink,Success\n,F9 B5,Success,5 October 2022,15:43
freq,1,24,95,75,71,39,133,83,130,2,2


In [18]:
df = df.replace('\n','', regex=True)
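
The `regex=True` argument matters in the call above. Without it, `replace` only matches cells whose entire value equals `'\n'`, so a trailing newline in `'Success\n'` would be left alone. A minimal sketch on a made-up two-row frame shows the difference:

```python
import pandas as pd

# Made-up sample with the same kind of trailing newlines as the scraped data
sample = pd.DataFrame(
    {
        "Launch outcome": ["Success\n", "Failure\n"],
        "Booster landing": ["No attempt\n", "Success"],
    }
)

# Whole-value matching: nothing equals "\n" exactly, so nothing changes
unchanged = sample.replace("\n", "")

# Regex matching: the newline is removed wherever it appears inside a string
cleaned = sample.replace("\n", "", regex=True)

print(cleaned["Launch outcome"].tolist())
```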

In [19]:
df.head()

Unnamed: 0,Flight No.,Payload mass,Orbit,Customer,Launch site,Payload,Launch outcome,Version Booster,Booster landing,Date,Time
0,78,"15,600 kg",LEO,SpaceX,CCSFS,Starlink,Success,F9 B5,Success,7 January 2020,02:19:21
1,79,"12,050 kg",Sub-orbital,NASA,KSC,Crew Dragon in-flight abort test,Success,F9 B5,No attempt,19 January 2020,15:30
2,80,"15,600 kg",LEO,SpaceX,CCSFS,Starlink,Success,F9 B5,Success,29 January 2020,14:07
3,81,"15,600 kg",LEO,SpaceX,CCSFS,Starlink,Success,F9 B5,Failure,17 February 2020,15:05
4,82,"1,977 kg",LEO,NASA,CCSFS,SpaceX CRS-20,Success,F9 B5,Success,7 March 2020,04:50
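
The `Payload mass` column is still text at this point (for example `"15,600 kg"` or `"~16,200 kg"`), and `get_mass` stores the integer `0` for empty cells. If numeric values are needed later, one option is a small parser like the sketch below. The name `parse_mass_kg` is hypothetical and the step is not part of the original notebook; it takes the first number in the string, so a range would yield only its lower bound.

```python
import re


def parse_mass_kg(text):
    """Return the first number in a mass string as a float, or None if there is none."""
    if not text:
        # Covers None, empty strings, and the 0 placeholder from get_mass
        return None
    match = re.search(r"\d[\d,]*(?:\.\d+)?", str(text))
    if match is None:
        return None
    # Remove thousands separators before converting
    return float(match.group().replace(",", ""))


print(parse_mass_kg("15,600 kg"), parse_mass_kg("~16,200 kg"), parse_mass_kg(0))
```

It could be applied with `df["Payload mass"].map(parse_mass_kg)` if a numeric column is wanted.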


In [20]:
# Export the data frame to a CSV file
df.to_csv('./../data/spacex_web_scraped.csv', index=False)