# Introduction

<center><img src="https://i.imgur.com/9hLRsjZ.jpg" height=400></center>

This dataset was scraped from [nextspaceflight.com](https://nextspaceflight.com/launches/past/?page=1) and includes all the space missions since the beginning of the Space Race between the USA and the Soviet Union in 1957!

### Install Package with Country Codes

In [1]:
%pip install iso3166

Collecting iso3166
  Downloading iso3166-2.0.2-py3-none-any.whl (8.5 kB)
Installing collected packages: iso3166
Successfully installed iso3166-2.0.2
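The `Location` column stores launch sites as comma-separated text that ends with the country, for example `"LC-39A, Kennedy Space Center, Florida, USA"`. The country part has to be extracted before `iso3166` can look up a code for it. The sketch below does this with pandas and assumes the country is always the last comma-separated field. The sample values are copied from this dataset's first rows.

```python
import pandas as pd

# Sample Location values taken from the dataset's first rows
locations = pd.Series([
    "LC-39A, Kennedy Space Center, Florida, USA",
    "Site 200/39, Baikonur Cosmodrome, Kazakhstan",
    "LC-9, Taiyuan Satellite Launch Center, China",
])

# Assumption: the country is the text after the last comma
countries_only = locations.str.split(",").str[-1].str.strip()
print(countries_only.tolist())  # ['USA', 'Kazakhstan', 'China']
```

Some names may not match the spelling the `iso3166` package uses. Those would still need a manual mapping before the lookup.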


### Upgrade Plotly

Run the cell below if you are working with Google Colab.

In [2]:
%pip install --upgrade plotly

Collecting plotly
  Downloading plotly-5.3.1-py2.py3-none-any.whl (23.9 MB)
Collecting tenacity>=6.2.0
  Downloading tenacity-8.0.1-py3-none-any.whl (24 kB)
Installing collected packages: tenacity, plotly
  Attempting uninstall: plotly
    Found existing installation: plotly 4.4.1
    Uninstalling plotly-4.4.1:
      Successfully uninstalled plotly-4.4.1
Successfully installed plotly-5.3.1 tenacity-8.0.1


### Import Statements

In [3]:
import numpy as np
import pandas as pd
import plotly.express as px
import matplotlib.pyplot as plt
import seaborn as sns

# These might be helpful:
from iso3166 import countries
from datetime import datetime, timedelta
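The `Date` strings look like `"Fri Aug 07, 2020 05:12 UTC"`. The sketch below uses only the standard library to turn one into a timezone-aware `datetime` and assumes every value follows this layout. It also tries a date-only fallback format, on the unverified assumption that some launches have no recorded time of day.

```python
from datetime import datetime, timezone

def parse_launch_date(text: str) -> datetime:
    """Parse strings like 'Fri Aug 07, 2020 05:12 UTC' into aware UTC datetimes."""
    # Drop the trailing 'UTC' label; the timezone is attached explicitly below
    cleaned = text.replace(" UTC", "").strip()
    # Try the full date-and-time layout first, then an assumed date-only layout
    for fmt in ("%a %b %d, %Y %H:%M", "%a %b %d, %Y"):
        try:
            return datetime.strptime(cleaned, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError(f"Unrecognised date format: {text!r}")

launch = parse_launch_date("Fri Aug 07, 2020 05:12 UTC")
print(launch.isoformat())  # 2020-08-07T05:12:00+00:00
```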

### Notebook Presentation

In [4]:
pd.options.display.float_format = '{:,.2f}'.format
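`pd.options.display.float_format` accepts any callable that turns a float into a string. The format spec `'{:,.2f}'` adds a thousands separator and rounds to two decimal places. It only changes how floats are displayed, not the stored values. A quick check in plain Python:

```python
fmt = '{:,.2f}'.format

# The comma inserts thousands separators; .2f keeps two decimal places
print(fmt(1160.0))  # 1,160.00
print(fmt(29.75))   # 29.75
print(fmt(5000))    # 5,000.00
```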

### Load the Data

In [102]:
df_data = pd.read_csv('mission_launches.csv')

# Preliminary Data Exploration


In [103]:
df_data.shape

(4324, 9)
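Before dropping rows, it helps to check which columns actually have gaps. Called with no arguments, `dropna()` removes a row if *any* of its columns is missing. The sketch below runs on a small hypothetical frame with made-up values:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: Price is missing for some launches
sample = pd.DataFrame({
    "Organisation": ["SpaceX", "CASC", "RVSN USSR", "NASA"],
    "Price": ["50.0", np.nan, np.nan, "1160.0"],
    "Mission_Status": ["Success", "Success", "Failure", "Success"],
})

# Count missing values per column
missing = sample.isna().sum()
print(missing)

# Share of rows that dropna() would keep
kept = len(sample.dropna()) / len(sample)
print(f"{kept:.0%} of rows survive dropna()")  # 50% of rows survive dropna()
```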

In [104]:
# dropna() with no arguments drops every row that has a missing value in any column
df_data_clean = df_data.dropna()

In [105]:
df_data_clean.head()

|   | Unnamed: 0 | Unnamed: 0.1 | Organisation | Location | Date | Detail | Rocket_Status | Price | Mission_Status |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | SpaceX | LC-39A, Kennedy Space Center, Florida, USA | Fri Aug 07, 2020 05:12 UTC | Falcon 9 Block 5 \| Starlink V1 L9 & BlackSky | StatusActive | 50.0 | Success |
| 1 | 1 | 1 | CASC | Site 9401 (SLS-2), Jiuquan Satellite Launch Ce... | Thu Aug 06, 2020 04:01 UTC | Long March 2D \| Gaofen-9 04 & Q-SAT | StatusActive | 29.75 | Success |
| 3 | 3 | 3 | Roscosmos | Site 200/39, Baikonur Cosmodrome, Kazakhstan | Thu Jul 30, 2020 21:25 UTC | Proton-M/Briz-M \| Ekspress-80 & Ekspress-103 | StatusActive | 65.0 | Success |
| 4 | 4 | 4 | ULA | SLC-41, Cape Canaveral AFS, Florida, USA | Thu Jul 30, 2020 11:50 UTC | Atlas V 541 \| Perseverance | StatusActive | 145.0 | Success |
| 5 | 5 | 5 | CASC | LC-9, Taiyuan Satellite Launch Center, China | Sat Jul 25, 2020 03:13 UTC | Long March 4B \| Ziyuan-3 03, Apocalypse-10 & N... | StatusActive | 64.68 | Success |
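The `Unnamed: 0` and `Unnamed: 0.1` columns look like leftover row indices from earlier CSV exports. They repeat the index and carry no mission data. The sketch below drops every column whose name starts with `Unnamed`. It runs on a small in-memory CSV that mimics the file's header, with illustrative rows.

```python
import io
import pandas as pd

# Illustrative CSV with the same leftover index columns as mission_launches.csv
csv_text = """Unnamed: 0,Unnamed: 0.1,Organisation,Price
0,0,SpaceX,50.0
1,1,CASC,29.75
"""
df = pd.read_csv(io.StringIO(csv_text))

# Keep only columns whose names do not start with 'Unnamed'
df = df.loc[:, ~df.columns.str.startswith("Unnamed")]
print(df.columns.tolist())  # ['Organisation', 'Price']
```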


## Data Cleaning - Check for Missing Values and Duplicates


In [106]:
df_data_clean = df_data_clean.drop_duplicates(subset=['Organisation','Detail','Date'])

In [107]:
df_data_clean.shape

(963, 9)
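`drop_duplicates(subset=...)` treats two rows as the same launch when they agree on the listed columns, ignoring all other columns. `duplicated()` applies the same rule, so counting its flags first shows how many rows will be removed. The rows in this sketch are hypothetical:

```python
import pandas as pd

# Hypothetical rows: the second row repeats the first launch
launches = pd.DataFrame({
    "Organisation": ["SpaceX", "SpaceX", "CASC"],
    "Detail": ["Falcon 9 | Starlink", "Falcon 9 | Starlink", "Long March 2D | Gaofen"],
    "Date": ["Fri Aug 07, 2020 05:12 UTC", "Fri Aug 07, 2020 05:12 UTC",
             "Thu Aug 06, 2020 04:01 UTC"],
    "Price": ["50.0", "50.0", "29.75"],
})

key = ["Organisation", "Detail", "Date"]
# duplicated() flags every repeat after the first occurrence (keep='first')
n_dupes = launches.duplicated(subset=key).sum()
deduped = launches.drop_duplicates(subset=key)
print(n_dupes, len(deduped))  # 1 2
```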

## Descriptive Statistics

In [108]:
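# Remove any literal '$' signs, then try to convert Price to numbers.
# Caution: errors='ignore' returns the column unchanged (still text) if any value fails to parse.
df_data_clean.Price = df_data_clean.Price.astype(str).str.replace('$', '', regex=False)
df_data_clean.Price = pd.to_numeric(df_data_clean.Price, errors='ignore')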

df_data_clean.sort_values('Price', ascending=True).head()

|      | Unnamed: 0 | Unnamed: 0.1 | Organisation | Location | Date | Detail | Rocket_Status | Price | Mission_Status |
|---|---|---|---|---|---|---|---|---|---|
| 3683 | 3683 | 3683 | NASA | LC-39A, Kennedy Space Center, Florida, USA | Thu Apr 04, 1968 12:00 UTC | Saturn V \| Apollo 6 | StatusRetired | 1160.0 | Partial Failure |
| 3149 | 3149 | 3149 | NASA | LC-39A, Kennedy Space Center, Florida, USA | Mon May 14, 1973 17:30 UTC | Saturn V \| Skylab 1 | StatusRetired | 1160.0 | Success |
| 3180 | 3180 | 3180 | NASA | LC-39A, Kennedy Space Center, Florida, USA | Tue Dec 19, 1972 19:24 UTC | Saturn V \| Apollo 17 | StatusRetired | 1160.0 | Success |
| 3243 | 3243 | 3243 | NASA | LC-39A, Kennedy Space Center, Florida, USA | Sun Apr 16, 1972 17:54 UTC | Saturn V \| Apollo 16 | StatusRetired | 1160.0 | Success |
| 3384 | 3384 | 3384 | NASA | LC-39A, Kennedy Space Center, Florida, USA | Sun Jan 31, 1971 21:03 UTC | Saturn V \| Apollo 14 | StatusRetired | 1160.0 | Success |
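An ascending sort should put the cheapest launches first, yet the Saturn V launches at 1160.0 appear at the top, even though the first preview showed prices such as 29.75 and 50.0. This suggests `Price` is still text and was sorted alphabetically. With `errors='ignore'`, `pd.to_numeric` returns the column unchanged as soon as one value fails to parse, and pandas 2.2 deprecates this option. The sketch below shows a stricter conversion. It assumes the only non-numeric characters are `$` signs, thousands separators, and whitespace, and its sample values are hypothetical.

```python
import pandas as pd

# Hypothetical raw prices stored as text, including a thousands separator
raw = pd.Series(["50.0", "1,160.0", "29.75", "$ 450.0"])

price = pd.to_numeric(
    raw.str.replace(r"[$,\s]", "", regex=True),  # strip '$', commas and whitespace
    errors="coerce",                             # anything still unparseable becomes NaN
)
print(price.dtype)                   # float64
print(price.sort_values().tolist())  # [29.75, 50.0, 450.0, 1160.0]
```

With `errors='coerce'`, a bad value becomes `NaN` and no longer silently blocks the conversion. `price.isna().sum()` then shows how many prices need a closer look.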


# Number of Launches per Company


In [16]:
organisations = df_data_clean.Organisation.value_counts()
organisations

CASC               157
NASA               149
SpaceX              99
ULA                 98
Arianespace         96
Northrop            83
ISRO                67
MHI                 37
VKS RF              33
US Air Force        26
Roscosmos           23
Kosmotras           22
Eurockot            13
Rocket Lab          13
ILS                 13
Martin Marietta      9
Lockheed             8
Boeing               7
JAXA                 3
RVSN USSR            2
Sandia               1
EER                  1
Virgin Orbit         1
ESA                  1
ExPace               1
Name: Organisation, dtype: int64
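The pie chart below shows each organisation's share of launches. `value_counts(normalize=True)` returns the same percentages directly in pandas. `df_data_clean` only contains rows that survived `dropna()`, so these are shares of launches with a recorded price. A sketch on a hypothetical column:

```python
import pandas as pd

# Hypothetical organisation column with four launches
orgs = pd.Series(["CASC", "NASA", "CASC", "SpaceX"], name="Organisation")

# normalize=True returns fractions that sum to 1 instead of raw counts
shares = orgs.value_counts(normalize=True)
print((shares * 100).round(1))
```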

In [17]:
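# In px.pie, `names` sets the slice labels; `labels` is only for renaming axis/legend titles
fig = px.pie(names=organisations.index,
             values=organisations.values,   # launches per organisation
             title="Launches per Company",
             hole=0.6)                      # draw as a donut chart
fig.update_traces(textposition='inside', textfont_size=15, textinfo='percent')

fig.show()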
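Many organisations have only one to three launches in the cleaned data, and these show up as slivers in the pie. One option, which the original chart does not use, is to merge organisations below a cut-off into a single "Other" slice before plotting. The cut-off here is a hypothetical choice, and the counts are a subset of those listed above.

```python
import pandas as pd

# Subset of the launch counts listed above
counts = pd.Series({"CASC": 157, "NASA": 149, "SpaceX": 99, "JAXA": 3, "ESA": 1, "EER": 1})

# Hypothetical cut-off: organisations with fewer launches are merged into 'Other'
threshold = 5
small = counts < threshold
grouped = counts[~small].copy()
grouped["Other"] = counts[small].sum()
print(grouped)
```

Passing `grouped.index` and `grouped.values` to `px.pie` in place of `organisations` would then draw the merged chart.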

# Number of Active versus Retired Rockets


In [18]:
status = df_data_clean.Rocket_Status.value_counts()
status

StatusActive     585
StatusRetired    378
Name: Rocket_Status, dtype: int64

In [56]:
# Count launches for each (organisation, rocket status) pair
df_active_vs_retired = (
    df_data_clean.groupby(['Organisation', 'Rocket_Status'], as_index=False)
    .size()
    .rename(columns={'size': 'count'})
)

df_active_vs_retired.sort_values('count')

|    | Organisation | Rocket_Status | count |
|---|---|---|---|
| 32 | Virgin Orbit | StatusActive | 1 |
| 2 | Boeing | StatusActive | 1 |
| 24 | Sandia | StatusActive | 1 |
| 5 | EER | StatusRetired | 1 |
| 6 | ESA | StatusActive | 1 |
| 8 | ExPace | StatusActive | 1 |
| 21 | RVSN USSR | StatusRetired | 2 |
| 1 | Arianespace | StatusRetired | 3 |
| 12 | JAXA | StatusActive | 3 |
| 3 | Boeing | StatusRetired | 6 |
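The grouped result above has one row per organisation and status. `pd.crosstab` produces the same counts as a wide table with one column per status, so active and retired launches can be compared side by side. The rows in this sketch are hypothetical:

```python
import pandas as pd

# Hypothetical launches
launches = pd.DataFrame({
    "Organisation": ["Boeing", "Boeing", "Boeing", "ESA", "RVSN USSR"],
    "Rocket_Status": ["StatusActive", "StatusRetired", "StatusRetired",
                      "StatusActive", "StatusRetired"],
})

# Rows: organisations; columns: rocket status; cells: launch counts (0 when absent)
table = pd.crosstab(launches["Organisation"], launches["Rocket_Status"])
print(table)
```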


In [53]:
g_bar = px.bar(df_active_vs_retired, 
               x='Organisation', 
               y='count',
               title='Number of Active versus Retired Rockets',
               color='Rocket_Status', 
               barmode='group',
              )

g_bar.update_layout(xaxis_title='Organisation',
                    yaxis_title='Number of Launches',
                    xaxis={'categoryorder':'total descending'},
                    yaxis=dict(type='log'),
                    )

g_bar.show()

# How Expensive are the Launches? 
 

In [83]:
# The ten most frequent price points (not the ten highest prices)
top10_category = df_data_clean.Price.value_counts().head(10)
top10_category

450.0    136
200.0     75
40.0      55
62.0      41
30.8      38
109.0     37
50.0      34
64.68     34
90.0      32
29.75     32
Name: Price, dtype: int64
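`value_counts()` shows the most common price points, but it does not show how expensive launches are overall. Once `Price` is numeric, `describe()` summarises the whole distribution: count, mean, quartiles and extremes. The median is less affected than the mean by a few very expensive launches. The prices in this sketch are hypothetical, and the dataset's prices appear to be in millions of USD.

```python
import pandas as pd

# Hypothetical launch prices (assumed USD millions)
price = pd.Series([29.75, 50.0, 62.0, 450.0, 1160.0], name="Price")

summary = price.describe()
print(summary[["count", "mean", "50%", "max"]])
print(f"Mean {price.mean():.2f} vs median {price.median():.2f}")
```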

In [84]:
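bar = px.bar(x=top10_category.index,
             y=top10_category.values,
             title='Ten Most Common Launch Prices',
             labels={'x': 'Price (USD millions)', 'y': 'Number of Launches'})
# Treat prices as labels so the bars are evenly spaced
bar.update_xaxes(type='category')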

bar.show()