# SpaceX Data Science Project

## Table of Contents
* [Introduction](#introduction)
* [Methods](#methods)
* [Results and Discussion](#results-and-discussion)

## Introduction
[SpaceX](https://www.spacex.com) is a privately owned aerospace manufacturer and space transportation company. In 2015 the company achieved its first successful recovery of a rocket's first stage when [Falcon 9 flight 20](https://en.wikipedia.org/wiki/Falcon_9_flight_20) made a vertical landing after launch. This flight was part of a larger effort to develop a reusable launch system, which dramatically reduces cost and waste. Since then, there have been numerous other landing attempts, both successful and unsuccessful. The probability of a successful landing likely depends on many launch and rocket variables. Therefore, the objective of this project was to develop a model that can predict whether a launch will have a successful first stage landing.

## Methods
### Data acquisition
Data were collected by calling the SpaceX API and scraping the [SpaceX Wikipedia page](https://en.wikipedia.org/wiki/SpaceX). Data were filtered to contain only those relating to Falcon 9 launches; the specific variables collected are listed below.

#### SpaceX API
* Flight Number - The number of the flight, with 1 indicating the first Falcon 9 flight
* Date - The date of the flight
* Booster Version - The type of booster used in the launch
* Payload Mass - The mass of the payload in kg
* Orbit - The type of orbit the payload was going into (ex. Near Earth Orbit)
* Launch Site - The ID of the site the rocket was launched from
* Outcome - The outcome (success/failure) and the method of landing (ex. drone ship)
* Flights - The number of previous flights
* Grid Fins - Whether grid fins were present on the rocket
* Reused - Indicates if the rocket was used prior to this launch
* Legs - Indicates if the ship has landing legs
* Landing pad - Indicates the type of landing pad used by the first stage
* Block - The core block number
* Reused Count - The number of times the rocket has been reused
* Serial - The capsule serial
* Longitude - The longitude of the launch site
* Latitude - The latitude of the launch site
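
As a rough sketch, one launch record from the API could be flattened into some of the fields above as follows. The nested layout (`cores`, `gridfins`, `landing_type`, and so on) is an assumption about the API's launch records and should be checked against the API itself. The helper name `flatten_launch`, the combined `Outcome` string format, and all sample values are hypothetical. Fields that need follow-up lookups by ID (booster version, payload mass, orbit, coordinates) are left out.

```python
def flatten_launch(launch):
    """Flatten one launch record (a dict) into a flat row.

    Assumes the record has a `cores` list whose first entry describes
    the first stage. That layout is an assumption, not taken from the project.
    """
    core = launch["cores"][0]
    return {
        "FlightNumber": launch["flight_number"],
        "Date": launch["date_utc"][:10],  # keep only YYYY-MM-DD
        # Combine the success flag and the landing type, e.g. "True ASDS"
        "Outcome": f"{core['landing_success']} {core['landing_type']}",
        "Flights": core["flight"],
        "GridFins": core["gridfins"],
        "Reused": core["reused"],
        "Legs": core["legs"],
        "LandingPad": core["landpad"],
    }


# Hand-written sample record with made-up values (no network call).
sample = {
    "flight_number": 1,
    "date_utc": "2020-01-01T12:00:00.000Z",
    "cores": [{
        "flight": 2, "gridfins": True, "legs": True, "reused": True,
        "landing_success": True, "landing_type": "ASDS", "landpad": "pad-id",
    }],
}
row = flatten_launch(sample)
print(row)
```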

#### Scraping Wikipedia
* Flight Number - The number of the flight, with 1 being the first Falcon 9 launch
* Launch Site - The launch site of the rocket
* Payload - The payload the rocket was launching
* Payload mass - The mass of the payload in kg
* Orbit - The type of orbit that was intended (ex. Low Earth Orbit)
* Customer - The customer that SpaceX was launching the rocket for (ex. NASA)
* Launch outcome - The outcome of the launch (success/failure)
* Version - The version of the booster (F9)
* Booster - The version of the F9 booster used
* Booster landing - Indicates the success, failure, or lack of attempt at re-landing the booster
* Date - The date of launch
* Time - The time of launch

### Data Cleaning
Data were checked for the presence of missing values. There were 5 instances in which `PayloadMass` was missing. This was dealt with by imputing the mean `PayloadMass` for these cases.
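
A minimal sketch of mean imputation with pandas, assuming the data live in a DataFrame with a `PayloadMass` column. The toy values below are made up for illustration and are not the project's data.

```python
import numpy as np
import pandas as pd

# Toy data with made-up values; NaN marks a missing payload mass.
df = pd.DataFrame({"PayloadMass": [500.0, np.nan, 1500.0, np.nan, 4000.0]})

# Count missing values before imputing.
n_missing = int(df["PayloadMass"].isna().sum())

# Mean of the observed values (NaN entries are skipped by default).
mean_mass = df["PayloadMass"].mean()

# Replace the missing entries with that mean.
df["PayloadMass"] = df["PayloadMass"].fillna(mean_mass)
print(n_missing, df["PayloadMass"].tolist())
```

Mean imputation keeps every row but shrinks the variance of the column, which is a reasonable trade-off when only a handful of values are missing.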

As described above, the landing outcome feature indicates both the success or failure of the landing and the type of landing. Therefore, a new feature, `Outcome`, was created to indicate the success (1) or failure (0) of the landing. This was later used as the prediction target.
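
One way this recoding could look in pandas is sketched below. It assumes the raw outcome is stored as a string of the form `"<success> <landing type>"` (for example `"True ASDS"`) in a column given the hypothetical name `LandingOutcome`. Both the format and the column name are assumptions, and the sample values are made up. Treating a missing or unattempted landing as 0 is also a choice made here for illustration.

```python
import pandas as pd

# Made-up raw outcomes in an assumed "<success> <landing type>" format.
df = pd.DataFrame({
    "LandingOutcome": ["True ASDS", "False ASDS", "None None",
                       "True RTLS", "False Ocean"],
})

# 1 when the string starts with "True", otherwise 0.
df["Outcome"] = df["LandingOutcome"].str.startswith("True").astype(int)
print(df)
```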

### Feature Selection


The features `Orbit`, `LaunchSite`, `LandingPad`, and `Serial` were categorical variables. These were one-hot encoded and then merged with 