## 📋 Analysis Questions

1. **Which launch site has the largest number of successful launches?**
2. **Which launch site has the highest launch success rate?**
3. **Which payload range has the highest launch success rate?**
4. **Which payload range has the lowest launch success rate?**
5. **Which F9 Booster version has the highest launch success rate?**

## 📦 Import Libraries

In [1]:
import pandas as pd
import numpy as np

# Display settings
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', None)

## 📂 Load Data

In [2]:
# Read the SpaceX launch data
df = pd.read_csv('spacex_launch_data_clean.csv')

print(f"Dataset loaded: {len(df)} launch records")
print(f"Columns: {', '.join(df.columns)}")

Dataset loaded: 90 launch records
Columns: FlightNumber, Date, BoosterVersion, PayloadMass, Orbit, LaunchSite, Outcome, Flights, GridFins, Reused, Legs, LandingPad, Block, ReusedCount, Serial, Longitude, Latitude, Class


### Dataset Overview

In [3]:
# Display first few rows
df.head()

|  | FlightNumber | Date | BoosterVersion | PayloadMass | Orbit | LaunchSite | Outcome | Flights | GridFins | Reused | Legs | LandingPad | Block | ReusedCount | Serial | Longitude | Latitude | Class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 2010-06-04 | Falcon 9 | 6123.547647 | LEO | CCSFS SLC 40 | None None | 1 | False | False | False | NaN | 1.0 | 0 | B0003 | -80.577366 | 28.561857 | 0 |
| 1 | 8 | 2012-05-22 | Falcon 9 | 525.0 | LEO | CCSFS SLC 40 | None None | 1 | False | False | False | NaN | 1.0 | 0 | B0005 | -80.577366 | 28.561857 | 0 |
| 2 | 10 | 2013-03-01 | Falcon 9 | 677.0 | ISS | CCSFS SLC 40 | None None | 1 | False | False | False | NaN | 1.0 | 0 | B0007 | -80.577366 | 28.561857 | 0 |
| 3 | 11 | 2013-09-29 | Falcon 9 | 500.0 | PO | VAFB SLC 4E | False Ocean | 1 | False | False | False | NaN | 1.0 | 0 | B1003 | -120.610829 | 34.632093 | 0 |
| 4 | 12 | 2013-12-03 | Falcon 9 | 3170.0 | GTO | CCSFS SLC 40 | None None | 1 | False | False | False | NaN | 1.0 | 0 | B1004 | -80.577366 | 28.561857 | 0 |


In [4]:
# Dataset info
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 90 entries, 0 to 89
Data columns (total 18 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   FlightNumber    90 non-null     int64  
 1   Date            90 non-null     object 
 2   BoosterVersion  90 non-null     object 
 3   PayloadMass     90 non-null     float64
 4   Orbit           90 non-null     object 
 5   LaunchSite      90 non-null     object 
 6   Outcome         90 non-null     object 
 7   Flights         90 non-null     int64  
 8   GridFins        90 non-null     bool   
 9   Reused          90 non-null     bool   
 10  Legs            90 non-null     bool   
 11  LandingPad      64 non-null     object 
 12  Block           90 non-null     float64
 13  ReusedCount     90 non-null     int64  
 14  Serial          90 non-null     object 
 15  Longitude       90 non-null     float64
 16  Latitude        90 non-null     float64
 17  Class           90 non-null     int64
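
The `info()` output lists `Date` as `object`, meaning the dates are stored as plain strings. If date arithmetic or grouping by year is needed later, one option is to parse the column at load time. The sketch below is only an illustration: it reads three rows copied from the preview above out of an in-memory CSV, and the `sample_csv` and `sample` names and the reduced column set are assumptions made for brevity.

```python
import io
import pandas as pd

# Three rows taken from the df.head() preview, reduced to four columns.
sample_csv = io.StringIO(
    "FlightNumber,Date,LaunchSite,Class\n"
    "6,2010-06-04,CCSFS SLC 40,0\n"
    "8,2012-05-22,CCSFS SLC 40,0\n"
    "11,2013-09-29,VAFB SLC 4E,0\n"
)

# parse_dates converts the named column to datetime64 while reading.
sample = pd.read_csv(sample_csv, parse_dates=["Date"])

print(sample.dtypes)
print(sample["Date"].dt.year.tolist())
```

The same `parse_dates=["Date"]` argument should apply when reading the full file, assuming every value in `Date` follows the `YYYY-MM-DD` pattern seen in the preview.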

In [5]:
# Basic statistics
print(f"Total Launches: {len(df)}")
print(f"Successful Launches: {df['Class'].sum()}")
print(f"Failed Launches: {len(df) - df['Class'].sum()}")
print(f"Overall Success Rate: {(df['Class'].mean() * 100):.2f}%")

Total Launches: 90
Successful Launches: 60
Failed Launches: 30
Overall Success Rate: 66.67%


---

## 🎯 Question 1: Which site has the MOST SUCCESSFUL LAUNCHES?

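We'll count the launches with `Class == 1` at each launch site. Note that `Class` appears to mirror the landing result recorded in `Outcome` (in the preview, rows with `None None` or `False Ocean` have `Class` 0), so "successful" here most likely refers to the first-stage landing.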

In [6]:
# Count successful launches by site
success_by_site = df[df['Class'] == 1].groupby('LaunchSite').size().sort_values(ascending=False)

print("Successful Launches by Site:")
print("=" * 50)
print(success_by_site)
print("\n" + "=" * 50)
print(f"✓ Answer: {success_by_site.idxmax()} with {success_by_site.max()} successful launches")

Successful Launches by Site:
LaunchSite
CCSFS SLC 40    33
KSC LC 39A      17
VAFB SLC 4E     10
dtype: int64

✓ Answer: CCSFS SLC 40 with 33 successful launches


---

## 📊 Question 2: Which site has the HIGHEST LAUNCH SUCCESS RATE?

Success rate = (Successful Launches / Total Launches) × 100%
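
The formula above can be sketched as a small reusable helper, which may be convenient because the same calculation is repeated for sites, payload ranges, and booster versions. The helper name `success_summary` and the toy data are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

def success_summary(frame: pd.DataFrame, by: str, outcome: str = "Class") -> pd.DataFrame:
    """Return successes, totals, and success rate (%) per group, best first."""
    summary = frame.groupby(by, observed=True)[outcome].agg(Successful="sum", Total="count")
    # Success rate = successful / total * 100, computed from the unrounded counts.
    summary["Success_Rate_Pct"] = (summary["Successful"] / summary["Total"] * 100).round(2)
    return summary.sort_values("Success_Rate_Pct", ascending=False)

# Toy data (not the real dataset): site A has 3 of 4 successes, site B has 1 of 2.
toy = pd.DataFrame({
    "LaunchSite": ["A", "A", "A", "A", "B", "B"],
    "Class": [1, 1, 1, 0, 1, 0],
})

result = success_summary(toy, "LaunchSite")
print(result)
```

Computing the percentage from the raw counts, rather than from an already rounded rate, avoids compounding rounding error, although for the figures in this notebook both routes give the same two-decimal result.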

In [7]:
# Calculate success rate by site
success_rate = df.groupby('LaunchSite').agg({
    'Class': ['sum', 'count', 'mean']
}).round(4)

success_rate.columns = ['Successful', 'Total', 'Success_Rate']
success_rate['Success_Rate_Pct'] = (success_rate['Success_Rate'] * 100).round(2)
success_rate = success_rate.sort_values('Success_Rate', ascending=False)

print("Launch Success Rate by Site:")
print("=" * 70)
print(success_rate)
print("\n" + "=" * 70)
print(f"✓ Answer: {success_rate.index[0]} with {success_rate['Success_Rate_Pct'].iloc[0]}% success rate")

Launch Success Rate by Site:
              Successful  Total  Success_Rate  Success_Rate_Pct
LaunchSite                                                     
KSC LC 39A            17     22        0.7727             77.27
VAFB SLC 4E           10     13        0.7692             76.92
CCSFS SLC 40          33     55        0.6000             60.00

✓ Answer: KSC LC 39A with 77.27% success rate


---

## 📦 Questions 3 & 4: Which payload range has the HIGHEST and LOWEST launch success rate?

We'll categorize payload mass into ranges and analyze success rates for each range.
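
By default `pd.cut` treats bins as right-closed intervals, so a value equal to an edge (for example 2000) lands in the lower range, and any value beyond the outermost edges is assigned `NaN`. The sketch below uses made-up masses to compare a capped top edge with an open-ended one built with `np.inf`; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Illustrative payload masses in kg (invented, not taken from the dataset).
masses = pd.Series([500.0, 2000.0, 9600.0, 15600.0])

labels = ['0-2000', '2000-4000', '4000-6000', '6000-8000', '8000-10000', '10000+']
edges_capped = [0, 2000, 4000, 6000, 8000, 10000, 15000]
edges_open = [0, 2000, 4000, 6000, 8000, 10000, np.inf]

# With a finite top edge, 15600 falls outside every interval and becomes NaN.
capped = pd.cut(masses, bins=edges_capped, labels=labels)
# With np.inf as the top edge, the last bin is genuinely open-ended.
open_ended = pd.cut(masses, bins=edges_open, labels=labels)

print(capped.tolist())
print(open_ended.tolist())
```

Rows whose range is `NaN` are silently left out of a later `groupby`, so comparing the sum of group totals against `len(df)` is a cheap check that no launches were lost during binning.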

In [8]:
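# Create payload range bins (right-closed intervals: (0, 2000], (2000, 4000], ...).
# Caveat: the last edge is 15000, so any payload above 15,000 kg falls outside every
# bin, is assigned NaN, and is dropped by the groupby below. The output that follows
# accounts for only 77 of the 90 launches, which is consistent with this.
# Replacing 15000 with np.inf would make the '10000+' bin truly open-ended.
df['PayloadRange'] = pd.cut(
    df['PayloadMass'],
    bins=[0, 2000, 4000, 6000, 8000, 10000, 15000],
    labels=['0-2000', '2000-4000', '4000-6000', '6000-8000', '8000-10000', '10000+']
)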

# Analyze success rate by payload range
payload_analysis = df.groupby('PayloadRange', observed=True).agg({
    'Class': ['sum', 'count', 'mean']
}).round(4)

payload_analysis.columns = ['Successful', 'Total', 'Success_Rate']
payload_analysis['Success_Rate_Pct'] = (payload_analysis['Success_Rate'] * 100).round(2)
payload_analysis = payload_analysis.sort_values('Success_Rate', ascending=False)

print("Launch Success Rate by Payload Range (kg):")
print("=" * 70)
print(payload_analysis)
print("\n" + "=" * 70)
print(f"✓ HIGHEST success rate: {payload_analysis.index[0]} kg with {payload_analysis['Success_Rate_Pct'].iloc[0]}% success rate")
print(f"✓ LOWEST success rate: {payload_analysis.index[-1]} kg with {payload_analysis['Success_Rate_Pct'].iloc[-1]}% success rate")

Launch Success Rate by Payload Range (kg):
              Successful  Total  Success_Rate  Success_Rate_Pct
PayloadRange                                                   
10000+                 2      2        1.0000            100.00
8000-10000             7      8        0.8750             87.50
2000-4000             19     27        0.7037             70.37
0-2000                 7     12        0.5833             58.33
4000-6000              8     16        0.5000             50.00
6000-8000              6     12        0.5000             50.00

✓ HIGHEST success rate: 10000+ kg with 100.0% success rate
✓ LOWEST success rate: 6000-8000 kg with 50.0% success rate
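
The lowest-rate line above names a single range, yet the table shows two ranges at 50%. One way to surface ties is to select every range whose rate equals the extreme value rather than taking the last row. The sketch below rebuilds the table from the counts printed above; `payload_table` is an illustrative name.

```python
import pandas as pd

# Counts copied from the payload-range table printed above.
payload_table = pd.DataFrame(
    {"Successful": [2, 7, 19, 7, 8, 6], "Total": [2, 8, 27, 12, 16, 12]},
    index=["10000+", "8000-10000", "2000-4000", "0-2000", "4000-6000", "6000-8000"],
)
rate = payload_table["Successful"] / payload_table["Total"]

# Keep every range that matches the extreme value instead of a single position.
lowest = rate[rate == rate.min()].index.tolist()
highest = rate[rate == rate.max()].index.tolist()

print("Lowest:", lowest)
print("Highest:", highest)
print("Launches covered:", payload_table["Total"].sum())
```

The final line also totals the launches represented in the table, which is a quick way to compare coverage against the 90 records in the dataset.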


---

## 🚀 Question 5: Which F9 Booster version has the HIGHEST launch success rate?

We'll analyze the success rate for each Falcon 9 booster version used across launches.

In [9]:
# Analyze success rate by booster version
booster_analysis = df.groupby('BoosterVersion').agg({
    'Class': ['sum', 'count', 'mean']
}).round(4)

booster_analysis.columns = ['Successful', 'Total', 'Success_Rate']
booster_analysis['Success_Rate_Pct'] = (booster_analysis['Success_Rate'] * 100).round(2)
booster_analysis = booster_analysis.sort_values('Success_Rate', ascending=False)

print("Launch Success Rate by Booster Version:")
print("=" * 70)
print(booster_analysis)
print("\n" + "=" * 70)
print(f"✓ Answer: {booster_analysis.index[0]} with {booster_analysis['Success_Rate_Pct'].iloc[0]}% success rate")

Launch Success Rate by Booster Version:
                Successful  Total  Success_Rate  Success_Rate_Pct
BoosterVersion                                                   
Falcon 9                60     90        0.6667             66.67

✓ Answer: Falcon 9 with 66.67% success rate
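
Because `BoosterVersion` holds a single value in this dataset, the ranking above cannot distinguish between versions. A simple guard is to count distinct values before ranking a column. If a finer-grained grouping is wanted, `Block` is one candidate, though treating it as a proxy for booster version is an assumption and not something this notebook establishes. The `rankable` helper and the toy values below are invented for illustration.

```python
import pandas as pd

# Toy frame (values invented) mirroring two notebook columns plus Class.
toy = pd.DataFrame({
    "BoosterVersion": ["Falcon 9"] * 6,
    "Block": [1.0, 1.0, 1.0, 5.0, 5.0, 5.0],
    "Class": [0, 0, 1, 1, 1, 0],
})

def rankable(frame: pd.DataFrame, column: str) -> bool:
    """A column can only be ranked if it has at least two distinct values."""
    return frame[column].nunique() >= 2

print(rankable(toy, "BoosterVersion"))
print(rankable(toy, "Block"))

# Success rate (%) per Block value in the toy data.
by_block = toy.groupby("Block")["Class"].mean().mul(100).round(2)
print(by_block)
```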


---

## 📈 Key Findings Summary

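### 🏆 Top Performers:
1. **Best Site by Total Success**: CCSFS SLC 40, with 33 successful launches out of 55
2. **Best Site by Success Rate**: KSC LC 39A at 77.27% (17 of 22), narrowly ahead of VAFB SLC 4E at 76.92% (10 of 13)
3. **Optimal Payload Range**: 10000+ kg at 100%, though based on only 2 launches; 8000-10000 kg follows at 87.5% (7 of 8). The lowest rate, 50%, is shared by 4000-6000 kg and 6000-8000 kg
4. **Best Booster Version**: Not determinable from this dataset, because `BoosterVersion` contains a single value (Falcon 9, 66.67% overall)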

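### 💡 Insights:
- Success rates differ by site: KSC LC 39A (77.27%) and VAFB SLC 4E (76.92%) are nearly level, while CCSFS SLC 40 trails at 60%
- Payload mass is associated with success rate: the two heaviest ranges score highest, and the 4000-6000 kg and 6000-8000 kg ranges score lowest
- `BoosterVersion` has only one value here, so this analysis cannot show how reliability changed across booster versions
- The payload-range table accounts for 77 of the 90 launches, so 13 launches are not assigned to any range and are excluded from those rates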

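### 🎯 Recommendations:
- Where site choice is flexible, favor the sites with higher success rates (KSC LC 39A, VAFB SLC 4E) for critical missions
- Consider payload planning within the higher-success ranges, keeping in mind that the top two ranges rest on only 2 and 8 launches
- Add data that distinguishes booster versions before drawing conclusions about which version is most reliable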