# **SpaceX Falcon 9 First Stage Landing Prediction**
## Web Scraping Falcon 9 and Falcon Heavy Launch Records from Wikipedia

Estimated time needed: **40** minutes


In this lab, you will use web scraping to collect historical Falcon 9 launch records from the Wikipedia page titled `List of Falcon 9 and Falcon Heavy launches`:

[https://en.wikipedia.org/wiki/List_of_Falcon\_9\_and_Falcon_Heavy_launches](https://en.wikipedia.org/wiki/List_of_Falcon\_9\_and_Falcon_Heavy_launches?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDS0321ENSkillsNetwork26802033-2022-01-01)


![](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DS0321EN-SkillsNetwork/labs/module\_1\_L2/images/Falcon9\_rocket_family.svg)


A successful Falcon 9 first-stage landing looks like this:


![](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/api/Images/landing\_1.gif)


Several examples of an unsuccessful landing are shown here:


![](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/api/Images/crash.gif)


More specifically, the launch records are stored in an HTML table, shown below:


![](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DS0321EN-SkillsNetwork/labs/module\_1\_L2/images/falcon9-launches-wiki.png)


<h3>Objectives</h3>

* Use BeautifulSoup to scrape Falcon 9 launch records from an HTML table on Wikipedia
* Parse the table and convert it into a pandas DataFrame

<h3>Import Libraries</h3>

In [1]:
!pip3 install beautifulsoup4
!pip3 install requests
print("Installation complete.")

Installation complete.


In [2]:
import sys
import requests
from bs4 import BeautifulSoup
import re
import unicodedata
import pandas as pd
print("All libraries have been imported.")

All libraries have been imported.


<h3>Define Useful Functions</h3>

In [3]:
def date_time(table_cells):
    """
    Return the date and time from an HTML table cell.
    Input: a table data cell (<td>) element.
    """
    return [data_time.strip() for data_time in list(table_cells.strings)][0:2]

def booster_version(table_cells):
    """
    Return the booster version from an HTML table cell.
    Input: a table data cell (<td>) element.
    """
    out=''.join([booster_version for i,booster_version in enumerate( table_cells.strings) if i%2==0][0:-1])
    return out

def landing_status(table_cells):
    """
    Return the landing status from an HTML table cell.
    Input: a table data cell (<td>) element.
    """
    out=[i for i in table_cells.strings][0]
    return out


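def get_mass(table_cells):
    """
    Return the payload mass text, up to and including "kg", from an HTML table cell.
    Input: a table data cell (<td>) element. Returns 0 if the cell is empty.
    """
    # NFKD normalization replaces non-breaking spaces with plain spaces
    mass=unicodedata.normalize("NFKD", table_cells.text).strip()
    if mass:
        # Note: if "kg" is absent, find() returns -1 and only the first character is kept
        new_mass=mass[0:mass.find("kg")+2]
    else:
        new_mass=0
    return new_mass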


def extract_column_from_header(row):
    """
    Return the column name from an HTML table header cell, or None if the
    cell text is purely numeric (a flight-number row header).
    Input: a table header cell (<th>) element. Note that the first <br>, <a>
    and <sup> elements inside the cell are removed from it in place.
    """
    if (row.br):
        row.br.extract()
    if row.a:
        row.a.extract()
    if row.sup:
        row.sup.extract()
        
    column_name = ' '.join(row.contents)
    
    if not(column_name.strip().isdigit()):
        column_name = column_name.strip()
        return column_name    
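
`get_mass` keeps the payload mass as text such as `4,700 kg`. For numeric analysis later on, that text may need converting to a number. The sketch below is one possible way to do it, assuming masses use comma thousands separators followed by `kg`; `mass_to_kg` is a hypothetical helper and not part of the lab.

```python
import re
import unicodedata


def mass_to_kg(text):
    """Return the first number in a mass string as a float, or None if there is none."""
    # NFKD turns non-breaking spaces (U+00A0) into plain spaces.
    cleaned = unicodedata.normalize("NFKD", text).strip()
    match = re.search(r"\d[\d,]*(?:\.\d+)?", cleaned)
    if match is None:
        return None
    return float(match.group().replace(",", ""))


print(mass_to_kg("4,700\u00a0kg"))  # 4700.0
print(mass_to_kg("~14,000 kg"))     # 14000.0
print(mass_to_kg("C"))              # None
```

Returning `None` for cells without a number keeps missing masses distinguishable from a real value of zero.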


### TASK 1: Request the Falcon9 Launch Wiki page from its URL


First, send an HTTP GET request for the Falcon 9 launch page and keep the HTML text of the HTTP response.


To keep the lab tasks consistent, you will scrape the data from a snapshot of the `List of Falcon 9 and Falcon Heavy launches` Wikipedia page as updated on `9th June 2021`.


In [4]:
static_url = "https://en.wikipedia.org/w/index.php?title=List_of_Falcon_9_and_Falcon_Heavy_launches&oldid=1027686922"
response = requests.get(static_url).text
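
The `oldid` query parameter is what pins the request to one fixed revision of the page, so the scraped tables do not change as the article is edited. As a small illustration using only the standard library, the URL from the cell above can be split into its parts:

```python
from urllib.parse import parse_qs, urlsplit

static_url = (
    "https://en.wikipedia.org/w/index.php"
    "?title=List_of_Falcon_9_and_Falcon_Heavy_launches&oldid=1027686922"
)

parts = urlsplit(static_url)
query = parse_qs(parts.query)  # maps each parameter name to a list of values

print(parts.netloc)        # en.wikipedia.org
print(query["title"][0])   # List_of_Falcon_9_and_Falcon_Heavy_launches
print(query["oldid"][0])   # 1027686922
```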

In [5]:
response[0:100]

'<!DOCTYPE html>\n<html class="client-nojs" lang="en" dir="ltr">\n<head>\n<meta charset="UTF-8"/>\n<title'

In [6]:
soup = BeautifulSoup(response, "html.parser")
print(soup.prettify()[0:500], '\n==============\n', soup.prettify()[5000:5500])

<!DOCTYPE html>
<html class="client-nojs" dir="ltr" lang="en">
 <head>
  <meta charset="utf-8"/>
  <title>
   List of Falcon 9 and Falcon Heavy launches - Wikipedia
  </title>
  <script>
   document.documentElement.className="client-js";RLCONF={"wgBreakFrames":false,"wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["","January","February","March","April","May","June","July","August","September","October","November","December"],"wgRequ 
 ;skin=vector">
  </script>
  <meta content="" name="ResourceLoaderDynamicStyles"/>
  <link href="/w/load.php?lang=en&amp;modules=site.styles&amp;only=styles&amp;skin=vector" rel="stylesheet"/>
  <meta content="MediaWiki 1.40.0-wmf.17" name="generator"/>
  <meta content="origin" name="referrer"/>
  <meta content="origin-when-crossorigin" name="referrer"/>
  <meta content="origin-when-cross-origin" name="referrer"/>
  <meta content="noindex,nofollow,max-image-preview:standard" name="robots"/>
 

In [7]:
soup.title

<title>List of Falcon 9 and Falcon Heavy launches - Wikipedia</title>

In [8]:
title_soup = soup.find_all('title')
title_soup

[<title>List of Falcon 9 and Falcon Heavy launches - Wikipedia</title>]
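
`soup.title` returns the first `<title>` tag, while `find_all('title')` returns a list of every match, which is why the second output is wrapped in brackets. A minimal sketch on a hand-written HTML string (an assumption for illustration, not the Wikipedia page):

```python
from bs4 import BeautifulSoup

html = "<html><head><title>Demo page</title></head><body><p>Hi</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

first_title = soup.title             # a single Tag, or None if there is no <title>
all_titles = soup.find_all("title")  # a list-like result holding every match

print(first_title.string)  # Demo page
print(len(all_titles))     # 1
```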

### TASK 2: Extract all column/variable names from the HTML table header


In [9]:
html_tables = soup.find_all('table')

In [10]:
first_launch_table = html_tables[2]
print(str(first_launch_table)[0:500])

<table class="wikitable plainrowheaders collapsible" style="width: 100%;">
<tbody><tr>
<th scope="col">Flight No.
</th>
<th scope="col">Date and<br/>time (<a href="/wiki/Coordinated_Universal_Time" title="Coordinated Universal Time">UTC</a>)
</th>
<th scope="col"><a href="/wiki/List_of_Falcon_9_first-stage_boosters" title="List of Falcon 9 first-stage boosters">Version,<br/>Booster</a> <sup class="reference" id="cite_ref-booster_11-0"><a href="#cite_note-booster-11">[b]</a></sup>
</th>
<th scope
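
Indexing with `html_tables[2]` depends on the page layout at this particular revision. A possibly sturdier alternative, and the approach the parsing loop in Task 3 takes, is to select tables by CSS class. The sketch below runs on a made-up HTML fragment; the fragment and its contents are assumptions for illustration only.

```python
from bs4 import BeautifulSoup

html = """
<table class="infobox"><tr><td>not a launch table</td></tr></table>
<table class="wikitable plainrowheaders collapsible">
  <tr><th scope="col">Flight No.</th><th scope="col">Launch site</th></tr>
  <tr><th scope="row">1</th><td>CCAFS</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# class_ matches any table whose class list contains "wikitable".
launch_tables = soup.find_all("table", class_="wikitable")

# Column headers are the <th> cells marked scope="col".
headers = [
    th.get_text(strip=True)
    for th in launch_tables[0].find_all("th", scope="col")
]

print(len(launch_tables))  # 1
print(headers)             # ['Flight No.', 'Launch site']
```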


In [11]:
th_elements = first_launch_table.find_all('th')
len(th_elements)

17

In [12]:
for th in th_elements:
    print(extract_column_from_header(th))

Flight No.
Date and time ( )

Launch site
Payload
Payload mass
Orbit
Customer
Launch outcome

None
None
None
None
None
None
None


In [13]:
column_names = []

for count,i in enumerate(th_elements):
    extract = extract_column_from_header(i)
    if extract is not None and len(extract) > 0:
        print(count, extract)
        column_names.append(extract)
    else:
        print(f'{count} {extract} --- --- --- [Discarded]')

column_names

0 Flight No.
1 Date and time ( )
2  --- --- --- [Discarded]
3 Launch site
4 Payload
5 Payload mass
6 Orbit
7 Customer
8 Launch outcome
9  --- --- --- [Discarded]
10 None --- --- --- [Discarded]
11 None --- --- --- [Discarded]
12 None --- --- --- [Discarded]
13 None --- --- --- [Discarded]
14 None --- --- --- [Discarded]
15 None --- --- --- [Discarded]
16 None --- --- --- [Discarded]


['Flight No.',
 'Date and time ( )',
 'Launch site',
 'Payload',
 'Payload mass',
 'Orbit',
 'Customer',
 'Launch outcome']

In [14]:
print(column_names)

['Flight No.', 'Date and time ( )', 'Launch site', 'Payload', 'Payload mass', 'Orbit', 'Customer', 'Launch outcome']
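
The empty parentheses in `Date and time ( )` and the blank `Version, Booster` entry come from `extract_column_from_header` removing the `<a>` elements that hold that text. The helper also modifies the parsed tree in place, so running it twice on the same headers gives different results. One possible alternative, sketched here with a hypothetical `header_text` helper on two header cells adapted from the table output above, works on a copy and only drops footnote markers:

```python
import copy

from bs4 import BeautifulSoup

html = (
    '<table><tr>'
    '<th scope="col">Date and<br/>time '
    '(<a href="/wiki/Coordinated_Universal_Time">UTC</a>)</th>'
    '<th scope="col"><a href="/wiki/List_of_Falcon_9_first-stage_boosters">'
    'Version,<br/>Booster</a> '
    '<sup class="reference"><a href="#cite_note-booster-11">[b]</a></sup></th>'
    '</tr></table>'
)
soup = BeautifulSoup(html, "html.parser")


def header_text(th):
    """Return readable header text without modifying the parsed tree."""
    clone = copy.copy(th)              # work on a copy, keep the original intact
    for sup in clone.find_all("sup"):  # drop footnote markers such as [b]
        sup.decompose()
    return clone.get_text(" ", strip=True)


names = [header_text(th) for th in soup.find_all("th")]
print(names)
```

With this approach the second header keeps its text (`Version, Booster`) and the first keeps `UTC`, so fewer columns would need to be added by hand later. The column names would then differ from the ones used in the rest of this lab.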


### TASK 3: Create a data frame by parsing the launch HTML tables


In [15]:
launch_dict = dict.fromkeys(column_names)
print(f'All Keys:\n{launch_dict} \n')

del launch_dict['Date and time ( )']
print(f'Removed \'Date and time ( )\':\n{launch_dict} \n')

launch_dict['Flight No.'] = []
launch_dict['Launch site'] = []
launch_dict['Payload'] = []
launch_dict['Payload mass'] = []
launch_dict['Orbit'] = []
launch_dict['Customer'] = []
launch_dict['Launch outcome'] = []

launch_dict['Version Booster'] = []
launch_dict['Booster landing'] = []
launch_dict['Date'] = []
launch_dict['Time'] = []

print(f'Full Launch Dictionary:\n{launch_dict}')

All Keys:
{'Flight No.': None, 'Date and time ( )': None, 'Launch site': None, 'Payload': None, 'Payload mass': None, 'Orbit': None, 'Customer': None, 'Launch outcome': None} 

Removed 'Date and time ( )':
{'Flight No.': None, 'Launch site': None, 'Payload': None, 'Payload mass': None, 'Orbit': None, 'Customer': None, 'Launch outcome': None} 

Full Launch Dictionary:
{'Flight No.': [], 'Launch site': [], 'Payload': [], 'Payload mass': [], 'Orbit': [], 'Customer': [], 'Launch outcome': [], 'Version Booster': [], 'Booster landing': [], 'Date': [], 'Time': []}
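
`dict.fromkeys(column_names)` sets every value to `None`, which is why each key is then reassigned to its own empty list. A dict comprehension does the same in one step. The sketch below, using a shortened list of column names, also shows why `dict.fromkeys(names, [])` is best avoided: every key would share a single list.

```python
column_names = ["Flight No.", "Launch site", "Payload"]

# One fresh list per key.
launch_dict = {name: [] for name in column_names}
launch_dict["Flight No."].append("1")
print(launch_dict)  # {'Flight No.': ['1'], 'Launch site': [], 'Payload': []}

# Pitfall: fromkeys reuses the same list object for every key.
shared = dict.fromkeys(column_names, [])
shared["Flight No."].append("1")
print(shared["Payload"])  # ['1']
```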


In [16]:
extracted_row = 0

for table_number,table in enumerate(soup.find_all('table',"wikitable plainrowheaders collapsible")):

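    for rows in table.find_all("tr"):

        # A launch row starts with a <th> whose text is a numeric flight number.
        # Reset the flag on every row so a value from an earlier row is never reused.
        flag = False
        if rows.th and rows.th.string:
            flight_number = rows.th.string.strip()
            flag = flight_number.isdigit()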

        row = rows.find_all('td')

        if flag:
            extracted_row += 1

            datatimelist = date_time(row[0])
            launch_dict['Flight No.']+=[flight_number]

            date = datatimelist[0].strip(',')
            launch_dict['Date']+=[date]

            time = datatimelist[1]
            launch_dict['Time']+=[time]

            bv=booster_version(row[1])
            if not(bv):
                bv=row[1].a.string
            launch_dict['Version Booster']+=[bv]

            launch_site = row[2].a.string
            launch_dict['Launch site']+=[launch_site]

            payload = row[3].a.string
            launch_dict['Payload']+=[payload]

            payload_mass = get_mass(row[4])
            launch_dict['Payload mass']+=[payload_mass]

            orbit = row[5].a.string
            launch_dict['Orbit']+=[orbit]

            if str(row[6])[0:7] == '<td>Var':
                print('HERE !!', row[6])
                customer = 'Various'
                print(f'Customer Name: {customer}')
            elif str(row[6])[0:26] == '<td><a href="/wiki/Turkmen':
                print('HERE !!', row[6])
                customer = 'Turkmenistan National Space Agency'
                print(f'Customer Name: {customer}')
            else:
                print('>>>Test', row[6].a.string)
                customer = row[6].a.string
            launch_dict['Customer']+=[customer]

            launch_outcome = list(row[7].strings)[0]
            launch_dict['Launch outcome']+=[launch_outcome]

            booster_landing = landing_status(row[8])
            launch_dict['Booster landing']+=[booster_landing]

>>>Test SpaceX
>>>Test NASA
>>>Test NASA
>>>Test NASA
>>>Test NASA
>>>Test MDA
>>>Test SES
>>>Test Thaicom
>>>Test NASA
>>>Test Orbcomm
>>>Test AsiaSat
>>>Test AsiaSat
>>>Test NASA
>>>Test NASA
>>>Test USAF
>>>Test ABS
>>>Test NASA
HERE !! <td><a href="/wiki/Turkmenistan_National_Space_Agency" title="Turkmenistan National Space Agency">Turkmenistan National<br/>Space Agency</a><sup class="reference" id="cite_ref-95"><a href="#cite_note-95">[88]</a></sup>
</td>
Customer Name: Turkmenistan National Space Agency
>>>Test NASA
>>>Test Orbcomm
>>>Test NASA
>>>Test SES
>>>Test NASA
>>>Test SKY Perfect JSAT Group
>>>Test Thaicom
>>>Test ABS
>>>Test NASA
>>>Test SKY Perfect JSAT Group
>>>Test Iridium Communications
>>>Test NASA
>>>Test EchoStar
>>>Test SES
>>>Test NRO
>>>Test Inmarsat
>>>Test NASA
>>>Test Bulsatcom
>>>Test Iridium Communications
>>>Test Intelsat
>>>Test NASA
>>>Test NSPO
>>>Test USAF
>>>Test Iridium Communications
>>>Test SES S.A.
>>>Test KT Corporation
>>>Test NASA
>>>Test Iri
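
The special cases for `Various` and the Turkmenistan customer exist because `row[6].a.string` raises an error when the cell has no link and returns `None` when the link contains a `<br/>`. One possible generalisation, shown as a sketch with a hypothetical `cell_text` helper, is to collect all text in the cell while skipping footnote markers:

```python
from bs4 import BeautifulSoup

html = (
    '<table><tr>'
    '<td><a href="/wiki/NASA" title="NASA">NASA</a></td>'
    '<td>Various</td>'
    '<td><a href="/wiki/Turkmenistan_National_Space_Agency">'
    'Turkmenistan National<br/>Space Agency</a>'
    '<sup class="reference"><a href="#cite_note-95">[88]</a></sup>\n</td>'
    '</tr></table>'
)
cells = BeautifulSoup(html, "html.parser").find_all("td")


def cell_text(td):
    """Return the visible text of a cell, ignoring <sup> footnote markers."""
    parts = [
        s.strip()
        for s in td.find_all(string=True)
        if s.strip() and s.find_parent("sup") is None
    ]
    return " ".join(parts)


customers = [cell_text(td) for td in cells]
print(customers)
# ['NASA', 'Various', 'Turkmenistan National Space Agency']
```

Unlike `.a.string`, this returns all text in the cell, including anything outside the first link, so results can differ from the loop above for cells that hold more than one name.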

In [17]:
for column_name, values in launch_dict.items():
    print(f'{column_name}: {len(values)}')

Flight No.: 121
Launch site: 121
Payload: 121
Payload mass: 121
Orbit: 121
Customer: 121
Launch outcome: 121
Version Booster: 121
Booster landing: 121
Date: 121
Time: 121
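
Equal lengths matter because `pd.DataFrame` raises a `ValueError` when the lists in a dictionary differ in length. A short check like the following sketch, run here on made-up sample data, can point to the column that fell behind:

```python
launch_dict = {
    "Flight No.": ["1", "2", "3"],
    "Launch site": ["CCAFS", "CCAFS", "CCAFS"],
    "Payload": ["Dragon Spacecraft Qualification Unit", "Dragon"],  # one value short
}

lengths = {name: len(values) for name, values in launch_dict.items()}
expected = max(lengths.values())
short_columns = [name for name, n in lengths.items() if n != expected]

print(lengths)        # {'Flight No.': 3, 'Launch site': 3, 'Payload': 2}
print(short_columns)  # ['Payload']
```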


In [18]:
list(launch_dict.keys())

['Flight No.',
 'Launch site',
 'Payload',
 'Payload mass',
 'Orbit',
 'Customer',
 'Launch outcome',
 'Version Booster',
 'Booster landing',
 'Date',
 'Time']
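
The `Date` and `Time` columns hold separate strings such as `4 June 2010` and `18:45`. If a single timestamp is more convenient, the two can be combined with the standard library. This is a sketch that assumes every value follows those two formats and that month names are parsed in an English locale; the times are UTC according to the table header. `to_utc_datetime` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def to_utc_datetime(date_text, time_text):
    """Combine '4 June 2010' and '18:45' into a timezone-aware UTC datetime."""
    naive = datetime.strptime(f"{date_text} {time_text}", "%d %B %Y %H:%M")
    return naive.replace(tzinfo=timezone.utc)


launch = to_utc_datetime("4 June 2010", "18:45")
print(launch.isoformat())  # 2010-06-04T18:45:00+00:00
```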

In [19]:
df = pd.DataFrame(launch_dict)
df

|     | Flight No. | Launch site | Payload | Payload mass | Orbit | Customer | Launch outcome | Version Booster | Booster landing | Date | Time |
| --- | ---------- | ----------- | ------- | ------------ | ----- | -------- | -------------- | --------------- | --------------- | ---- | ---- |
| 0   | 1   | CCAFS | Dragon Spacecraft Qualification Unit | 0 | LEO | SpaceX | Success\n | F9 v1.0B0003.1 | Failure | 4 June 2010 | 18:45 |
| 1   | 2   | CCAFS | Dragon | 0 | LEO | NASA | Success | F9 v1.0B0004.1 | Failure | 8 December 2010 | 15:43 |
| 2   | 3   | CCAFS | Dragon | 525 kg | LEO | NASA | Success | F9 v1.0B0005.1 | No attempt\n | 22 May 2012 | 07:44 |
| 3   | 4   | CCAFS | SpaceX CRS-1 | 4,700 kg | LEO | NASA | Success\n | F9 v1.0B0006.1 | No attempt | 8 October 2012 | 00:35 |
| 4   | 5   | CCAFS | SpaceX CRS-2 | 4,877 kg | LEO | NASA | Success\n | F9 v1.0B0007.1 | No attempt\n | 1 March 2013 | 15:10 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 116 | 117 | CCSFS | Starlink | 15,600 kg | LEO | SpaceX | Success\n | F9 B5B1051.10 | Success | 9 May 2021 | 06:42 |
| 117 | 118 | KSC | Starlink | ~14,000 kg | LEO | SpaceX | Success\n | F9 B5B1058.8 | Success | 15 May 2021 | 22:56 |
| 118 | 119 | CCSFS | Starlink | 15,600 kg | LEO | SpaceX | Success\n | F9 B5B1063.2 | Success | 26 May 2021 | 18:59 |
| 119 | 120 | KSC | SpaceX CRS-22 | 3,328 kg | LEO | NASA | Success\n | F9 B5B1067.1 | Success | 3 June 2021 | 17:29 |
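
Some scraped values still carry a trailing newline (`Success\n`, `No attempt\n`), and `Payload mass` mixes the integer `0` with strings such as `4,700 kg`. A possible clean-up step is sketched below on three rows taken from the output above. The new column name `Payload mass (kg)` is an assumption, as is treating every mass as a whole number.

```python
import pandas as pd

df = pd.DataFrame({
    "Launch outcome": ["Success\n", "Success", "Success\n"],
    "Booster landing": ["Failure", "No attempt\n", "Success"],
    "Payload mass": [0, "525 kg", "~14,000 kg"],
})

# Strip stray whitespace, including trailing newlines, from the text columns.
for column in ["Launch outcome", "Booster landing"]:
    df[column] = df[column].str.strip()

# Keep only the digits of each mass and convert them to integers (kilograms).
# Assumption: 0 stands for "no mass listed" rather than a real measurement.
digits = df["Payload mass"].astype(str).str.replace(r"\D", "", regex=True)
df["Payload mass (kg)"] = pd.to_numeric(digits)

print(df)
```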


<h3>Export DataFrame to .CSV</h3>

In [20]:
df.to_csv('spacex_web_scraped.csv', index=False)
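
`index=False` keeps the row index out of the file. Without it, reading the file back adds a column that pandas labels `Unnamed: 0`. A small in-memory round trip, using `io.StringIO` instead of a file on disk and a two-row sample frame, shows the difference:

```python
import io

import pandas as pd

df = pd.DataFrame({"Flight No.": ["1", "2"], "Launch site": ["CCAFS", "CCAFS"]})

with_index = io.StringIO()
df.to_csv(with_index)  # the index is written as an unnamed first column
with_index.seek(0)
cols_with_index = pd.read_csv(with_index).columns.tolist()
print(cols_with_index)     # ['Unnamed: 0', 'Flight No.', 'Launch site']

without_index = io.StringIO()
df.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = pd.read_csv(without_index).columns.tolist()
print(cols_without_index)  # ['Flight No.', 'Launch site']
```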

<h3 style="width:100%;padding:12px;border-radius:4px;color:#fff;margin-top:256px;background:linear-gradient(90deg,#999,#333);box-shadow:0 0 0px #333;">Other Info</h3>

## Authors


<a href="https://www.linkedin.com/in/yan-luo-96288783/?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDS0321ENSkillsNetwork26802033-2022-01-01">Yan Luo</a>


<a href="https://www.linkedin.com/in/nayefaboutayoun/?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDS0321ENSkillsNetwork26802033-2022-01-01">Nayef Abou Tayoun</a>


## Change Log


| Date (YYYY-MM-DD) | Version | Changed By | Change Description          |
| ----------------- | ------- | ---------- | --------------------------- |
| 2021-06-09        | 1.0     | Yan Luo    | Tasks updates               |
| 2020-11-10        | 1.0     | Nayef      | Created the initial version |


Copyright © 2021 IBM Corporation. All rights reserved.
