# SpaceX Falcon 9 First Stage Landing Prediction - Data Wrangling

## Objective
In this notebook, we will perform data wrangling on the SpaceX launch data to prepare it for analysis and machine learning modeling. The goal is to create a binary classification label that indicates whether the first stage landing was successful.

## Table of Contents
1. Import Required Libraries
2. Load and Explore the Dataset
3. Analyze Launch Sites
4. Analyze Orbits
5. Analyze Landing Outcomes
6. Create Binary Classification Labels
7. Export the Cleaned Dataset

---

## 1. Import Required Libraries
We'll start by importing the necessary libraries for data manipulation and analysis.

In [None]:
# Import required libraries
import pandas as pd  # For data manipulation and analysis
import numpy as np   # For numerical operations

---

## 2. Load and Explore the Dataset
Load the SpaceX launch data from the CSV file and display the first few rows to understand the data structure.

In [None]:
# Load the SpaceX launch data from CSV file
df = pd.read_csv('spacex_launch_data.csv')

# Display the first few rows to understand the data structure
df.head()

Unnamed: 0,FlightNumber,Date,BoosterVersion,PayloadMass,Orbit,LaunchSite,Outcome,Flights,GridFins,Reused,Legs,LandingPad,Block,ReusedCount,Serial,Longitude,Latitude
0,6,2010-06-04,Falcon 9,6123.547647,LEO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B0003,-80.577366,28.561857
1,8,2012-05-22,Falcon 9,525.0,LEO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B0005,-80.577366,28.561857
2,10,2013-03-01,Falcon 9,677.0,ISS,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B0007,-80.577366,28.561857
3,11,2013-09-29,Falcon 9,500.0,PO,VAFB SLC 4E,False Ocean,1,False,False,False,,1.0,0,B1003,-120.610829,34.632093
4,12,2013-12-03,Falcon 9,3170.0,GTO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B1004,-80.577366,28.561857
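
Before analyzing individual columns, it can help to check each column's data type and share of missing values. The sketch below is only an illustration: it uses a small hand-built frame (the hypothetical name `sample`) that copies a few columns from the rows shown above, not the full CSV.

```python
import numpy as np
import pandas as pd

# Hand-built sample mirroring a few columns of the first rows shown above
# (illustrative only; the notebook itself works on the full `df`).
sample = pd.DataFrame({
    "FlightNumber": [6, 8, 10, 11, 12],
    "PayloadMass": [6123.547647, 525.0, 677.0, 500.0, 3170.0],
    "Orbit": ["LEO", "LEO", "ISS", "PO", "GTO"],
    "Outcome": ["None None", "None None", "None None", "False Ocean", "None None"],
    "LandingPad": [np.nan] * 5,  # empty in the rows displayed above
})

# Percentage of missing values per column
missing_pct = sample.isnull().mean() * 100
print(missing_pct)

# Data type of each column
print(sample.dtypes)
```

Running the same two expressions on `df` would show which columns need attention before modeling.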


---

## 3. Analyze Launch Sites
Calculate and display the number of launches at each launch site to understand the distribution of launches across different locations.

In [None]:
# Calculate the number of launches on each site
# Apply value_counts() on 'LaunchSite' column to get the count of launches per site
launch_site_counts = df['LaunchSite'].value_counts()
launch_site_counts

LaunchSite
CCSFS SLC 40    55
KSC LC 39A      22
VAFB SLC 4E     13
Name: count, dtype: int64
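
The raw counts can also be read as shares of all launches. A minimal sketch, assuming the counts printed above are re-entered by hand so the example runs without the CSV:

```python
import pandas as pd

# Launch counts copied from the output above
launch_site_counts = pd.Series(
    {"CCSFS SLC 40": 55, "KSC LC 39A": 22, "VAFB SLC 4E": 13},
    name="count",
)

# Share of all launches per site, as a percentage rounded to one decimal
launch_site_share = (launch_site_counts / launch_site_counts.sum() * 100).round(1)
print(launch_site_share)
```

On the full dataframe, `df['LaunchSite'].value_counts(normalize=True)` should give the same proportions as fractions.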

---

## 4. Analyze Orbits
Count how many launches targeted each orbit type to understand the variety of missions.

In [None]:
# Count the number of launches for each orbit type
# Apply value_counts() on the 'Orbit' column to see the distribution of orbit types
orbit_counts = df['Orbit'].value_counts()
orbit_counts

Orbit
GTO      27
ISS      21
VLEO     14
PO        9
LEO       7
SSO       5
MEO       3
ES-L1     1
HEO       1
SO        1
GEO       1
Name: count, dtype: int64

---

## 5. Analyze Landing Outcomes
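Analyze the different landing outcomes to understand which indicate successful landings and which indicate failures. Each label combines a landing result (`True`, `False`, or `None`) with a landing type: `Ocean` for a targeted ocean region, `RTLS` for a ground pad, and `ASDS` for a drone ship.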

In [None]:
# Count how often each landing outcome occurs
# value_counts() returns the unique landing outcomes and their frequencies
landing_outcomes = df['Outcome'].value_counts()
landing_outcomes

Outcome
True ASDS      41
None None      19
True RTLS      14
False ASDS      6
True Ocean      5
False Ocean     2
None ASDS       2
False RTLS      1
Name: count, dtype: int64

### 5.1 Examine Each Landing Outcome
Display each landing outcome with its index to help identify which outcomes represent failures.

In [None]:
# Iterate through each landing outcome with its index
# This helps us identify which outcomes represent failures
for i, outcome in enumerate(landing_outcomes.keys()):
    print(i, outcome)

0 True ASDS
1 None None
2 True RTLS
3 False ASDS
4 True Ocean
5 False Ocean
6 None ASDS
7 False RTLS


### 5.2 Identify Failed Landing Outcomes
Create a set of outcomes where the first stage did not land successfully. These will be classified as failures (Class = 0).

In [None]:
# Create a set of outcomes where the first stage did not land successfully
# Based on the enumeration above, positions 1, 3, 5, 6 and 7 are the failed landings
# These are the outcomes whose label starts with 'False' or 'None'
bad_outcomes = set(landing_outcomes.keys()[[1, 3, 5, 6, 7]])
bad_outcomes

{'False ASDS', 'False Ocean', 'False RTLS', 'None ASDS', 'None None'}
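
Selecting by position depends on the order that `value_counts()` happens to return. As an alternative sketch (not the notebook's method), the same set can be derived from the labels themselves, assuming every failed landing is labeled with something other than `True`. The names `outcomes` and `bad_outcomes_by_prefix` are illustrative.

```python
# Outcome labels copied from the enumeration above
outcomes = [
    "True ASDS", "None None", "True RTLS", "False ASDS",
    "True Ocean", "False Ocean", "None ASDS", "False RTLS",
]

# Treat every outcome that does not start with "True" as a failed landing
bad_outcomes_by_prefix = {o for o in outcomes if not o.startswith("True")}
print(sorted(bad_outcomes_by_prefix))
```

This yields the same five labels as the index-based selection and would keep working if the counts, and therefore the ordering, changed.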

---

## 6. Create Binary Classification Labels
Create a new column 'Class' that represents whether the landing was successful (1) or failed (0). This will be our target variable for machine learning models.

In [None]:
# Create a landing outcome label from Outcome column
# Class = 0 if the outcome is in bad_outcomes (failed landing)
# Class = 1 if the outcome is not in bad_outcomes (successful landing)
df['Class'] = df['Outcome'].apply(lambda x: 0 if x in bad_outcomes else 1)

# Extract the landing class values as a numpy array
landing_class = df['Class'].values
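
The `apply` with a lambda works row by row. A vectorized variant using `Series.isin` is sketched below on a small hand-built `outcome` series (an illustrative stand-in for `df['Outcome']`); it is meant as an equivalent alternative, not a replacement for the cell above.

```python
import pandas as pd

bad_outcomes = {"False ASDS", "False Ocean", "False RTLS", "None ASDS", "None None"}

# Small illustrative sample of Outcome values taken from the labels above
outcome = pd.Series(["None None", "False Ocean", "True Ocean", "True ASDS", "None ASDS"])

# isin() marks failed landings; negating and casting gives 1 for success, 0 for failure
class_vectorized = (~outcome.isin(bad_outcomes)).astype(int)

# Same labels as the apply/lambda version used in the notebook
class_apply = outcome.apply(lambda x: 0 if x in bad_outcomes else 1)
print(class_vectorized.tolist())
```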

### 6.1 Verify the Class Column
Display the first few rows of the Class column to verify that the binary labels have been created correctly.

In [None]:
# Re-assign landing_class to the Class column (a no-op, since the column was already set above)
df['Class'] = landing_class

# Display the first 8 rows of the Class column to verify the binary labels
df[['Class']].head(8)

Unnamed: 0,Class
0,0
1,0
2,0
3,0
4,0
5,0
6,1
7,1
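
Because `Class` is 0 or 1, its mean is the overall landing success rate. The sketch below reproduces that figure from the outcome counts shown in section 5, re-entered by hand so the example runs on its own; `n_success` and `success_rate` are illustrative names.

```python
import pandas as pd

# Outcome counts copied from the value_counts() output in section 5
landing_outcomes = pd.Series({
    "True ASDS": 41, "None None": 19, "True RTLS": 14, "False ASDS": 6,
    "True Ocean": 5, "False Ocean": 2, "None ASDS": 2, "False RTLS": 1,
})
bad_outcomes = {"False ASDS", "False Ocean", "False RTLS", "None ASDS", "None None"}

# Launches whose outcome is not in bad_outcomes count as successes
n_success = landing_outcomes[~landing_outcomes.index.isin(bad_outcomes)].sum()
success_rate = n_success / landing_outcomes.sum()
print(f"{n_success} of {landing_outcomes.sum()} landings succeeded ({success_rate:.1%})")
```

On the full dataframe, `df['Class'].mean()` should give the same rate.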


### 6.2 View the Complete Dataset
Display the first 10 rows of the complete dataset with the new Class column included.

In [None]:
# Display the first 10 rows of the complete dataset
# This shows all columns including the newly created 'Class' column
df.head(10)

Unnamed: 0,FlightNumber,Date,BoosterVersion,PayloadMass,Orbit,LaunchSite,Outcome,Flights,GridFins,Reused,Legs,LandingPad,Block,ReusedCount,Serial,Longitude,Latitude,Class
0,6,2010-06-04,Falcon 9,6123.547647,LEO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B0003,-80.577366,28.561857,0
1,8,2012-05-22,Falcon 9,525.0,LEO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B0005,-80.577366,28.561857,0
2,10,2013-03-01,Falcon 9,677.0,ISS,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B0007,-80.577366,28.561857,0
3,11,2013-09-29,Falcon 9,500.0,PO,VAFB SLC 4E,False Ocean,1,False,False,False,,1.0,0,B1003,-120.610829,34.632093,0
4,12,2013-12-03,Falcon 9,3170.0,GTO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B1004,-80.577366,28.561857,0
5,13,2014-01-06,Falcon 9,3325.0,GTO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B1005,-80.577366,28.561857,0
6,14,2014-04-18,Falcon 9,2296.0,ISS,CCSFS SLC 40,True Ocean,1,False,False,True,,1.0,0,B1006,-80.577366,28.561857,1
7,15,2014-07-14,Falcon 9,1316.0,LEO,CCSFS SLC 40,True Ocean,1,False,False,True,,1.0,0,B1007,-80.577366,28.561857,1
8,16,2014-08-05,Falcon 9,4535.0,GTO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B1008,-80.577366,28.561857,0
9,17,2014-09-07,Falcon 9,4428.0,GTO,CCSFS SLC 40,None None,1,False,False,False,,1.0,0,B1011,-80.577366,28.561857,0


---

## 7. Export the Cleaned Dataset
Save the cleaned and processed dataframe to a CSV file for use in subsequent analysis and modeling steps.

In [None]:
# Save the cleaned dataframe to a CSV file
# index=False prevents pandas from writing row numbers to the file
df.to_csv('spacex_launch_data_clean.csv', index=False)
print("Data successfully saved to 'spacex_launch_data_clean.csv'")

Data successfully saved to 'spacex_launch_data_clean.csv'
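
A quick round-trip check can confirm that the export reloads cleanly. This sketch writes a tiny stand-in frame (the hypothetical name `clean`) to an in-memory buffer rather than to disk, so it does not touch `spacex_launch_data_clean.csv`.

```python
import io
import pandas as pd

# Tiny stand-in for the cleaned frame (illustrative values from the rows above)
clean = pd.DataFrame({
    "FlightNumber": [6, 14],
    "Outcome": ["None None", "True Ocean"],
    "Class": [0, 1],
})

# Write to an in-memory buffer instead of disk, then read it back
buffer = io.StringIO()
clean.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

# With index=False, no extra unnamed index column appears on reload
print(reloaded.columns.tolist())
```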


---

## Summary
In this data wrangling notebook, we have:
1. ✅ Loaded the SpaceX launch data
2. ✅ Analyzed the distribution of launches across different launch sites
3. ✅ Examined the variety of orbit types in the dataset
4. ✅ Identified and categorized landing outcomes
5. ✅ Created a binary classification label (Class: 0 = Failed, 1 = Success)
6. ✅ Exported the cleaned dataset for further analysis

The cleaned dataset is now ready for exploratory data analysis, visualization, and machine learning modeling.