## Are We Alone in Space?
### Analyzing NASA’s Exoplanet Data with Python

---

In this project, we’ll explore relationships in real scientific data from NASA’s Exoplanet Archive.

Our goal is to practice exploratory data analysis (EDA): visualizing data, computing basic metrics, and interpreting patterns carefully.

---


### Goals

**1. Load the Data**  

**2. Practice Pure Python Skills**  

**3. Plot Relationships**  

**4. Compute Descriptive Statistics**  



### Questions

These are some of the questions we will tackle:

**1. Planet Size vs. Planet Mass**  
Common sense would suggest that larger planets also have more mass, but this might not be the case for gas planets. We will explore this relationship.

**2. Discovery Year vs. Planet Size**  
Relationships between two variables are sometimes more complex. Discovery year and planet size may be correlated, since better instruments might enable the discovery of smaller planets. We will analyze whether the data support this.

**3. Discovery Methods**  
Different discovery methods have been used over the years. We will assess which one of them has found the most planets.

---

## Column descriptions 
Below are the columns we use in the notebook and what they represent.

- **pl_name:** Planet name (string)
- **pl_rade:** Planet radius in Earth radii (float)
- **pl_bmasse:** Planet mass in Earth masses (float)
- **discoverymethod:** Discovery method (categorical — e.g., Transit, Radial Velocity)
- **hostname:** Host star name (string)
- **disc_year:** Discovery year (integer year)


---
## 1. Load Data
In this section we import libraries and load the dataset using `pandas`. We’ll display the first few rows and some basic information about the dataset.

In [None]:
# ============================================================
# Import libraries and load the dataset
#
# ============================================================
from pathlib import Path

# Import the pandas library as pd
import pandas as pd

# Run this line to load in the data
df = pd.read_csv(Path(r'udacity_exoplanet.csv'))

# Print out the first 5 rows to verify it was loaded correctly


## 2. Inspect the DataFrame
Use a suitable pandas method to inspect column dtypes and missing values.

In [1]:
#TODO:  Inspect the dataframe
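
One possible approach, sketched on a small made-up DataFrame rather than the real dataset: `DataFrame.info()` prints each column's dtype and non-null count, and `isna().sum()` gives the number of missing values per column. The name `toy` and every value in it are invented for illustration.

```python
import pandas as pd

# Toy stand-in for the real dataset; all values are invented for illustration.
toy = pd.DataFrame({
    "pl_name": ["Toy-1 b", "Toy-1 c", "Toy-2 b"],
    "pl_rade": [1.2, None, 11.0],
    "pl_bmasse": [2.0, 5.5, None],
    "disc_year": [2010, 2010, 2015],
})

# info() prints the dtype and non-null count of every column
toy.info()

# isna().sum() counts the missing values in each column
missing = toy.isna().sum()
print(missing)
```

On the real data the same two calls would be made on `df`.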

## 3. Examine Numeric Summaries

Use a suitable pandas method to summarize the numeric columns and their distributions (e.g., count, mean, median, and mode).


In [2]:
#TODO: Examine Numeric Summaries
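
A minimal sketch of one way to do this, again on invented toy values: `describe()` reports count, mean, standard deviation, min, quartiles, and max for numeric columns, with the `50%` row being the median. It does not report the mode, so that is computed separately here with `Series.mode()`.

```python
import pandas as pd

# Toy values, invented for illustration only.
toy = pd.DataFrame({
    "pl_rade": [1.0, 2.0, 2.0, 11.0],
    "pl_bmasse": [1.0, 5.0, 6.0, 300.0],
})

# describe() summarizes every numeric column in one table
summary = toy.describe()
print(summary)

# The median equals the "50%" row of describe(); the mode needs its own call
median_radius = toy["pl_rade"].median()
mode_radius = toy["pl_rade"].mode().tolist()
print(median_radius, mode_radius)
```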

## 4. Create a Sorted List of Unique Planet and Host Star Names

**Instruction:**
- Do NOT use pandas unique()/value_counts().
- Use only basic Python: lists, sets, sorted.

**Goal:**
Show the total unique counts and the first 20 names as samples.

### Part A - Follow these steps for the Planet Names

- Step1: convert the `pl_name` column into a plain Python list


- Step2: remove duplicates using a set, then sort the results into a list

- Step3: print the total number of unique planets


- Step4: print a small sample (first 20) of the unique planet names

In [3]:
#TODO:  Create a sample of 20 unique planet names

# Step1: convert the `pl_name` column into a plain Python list
raw_names = df['pl_name'].tolist()
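
Steps 2 to 4 could be sketched as follows, assuming the list contains only strings. The list `toy_names` is a hypothetical stand-in for `raw_names`, and its entries are invented.

```python
# Hypothetical stand-in for df['pl_name'].tolist(); the names are invented.
toy_names = ["Toy-2 b", "Toy-1 b", "Toy-1 c", "Toy-1 b"]

# Step 2: a set drops duplicates; sorted() returns a new alphabetical list
unique_names = sorted(set(toy_names))

# Step 3: total number of unique planets
print(f"Unique planets: {len(unique_names)}")

# Step 4: slicing past the end is safe, so [:20] works for short lists too
print(unique_names[:20])
```

The same pattern applies to any column that has been converted to a plain list.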

### Part B - Follow these steps for the Host Star Names

- Step1: convert the `hostname` column into a plain Python list


- Step2: remove duplicates and sort the host star names


- Step3: print the total number of unique host stars


- Step4: print a small sample (first 20) of the unique host star names

In [4]:
#TODO:  Create a sample of 20 unique host star names

# Step1: convert the `hostname` column into a plain Python list
raw_hosts = df['hostname'].tolist()

## 5. On average, how many planets does each star have?

**Instruction:**
Use only basic Python (no pandas or polars aggregations).

**Hint**
Each row represents one planet orbiting one host star. Starter code for Steps 1 and 2 has been given to you.

- Step1: create a dictionary to count how many planets orbit each star

- Step2: loop over each host star entry (one per planet)
    - If the star is already in the dictionary, increment its count; otherwise, add the star to the dictionary with a count of 1

- Step3: compute the total number of planets

- Step4: compute the total number of unique host stars

- Step5: compute the average number of planets per star and print the result

In [None]:
#TODO:  Calculate on average, how many planets does each star have

# Step1: create a dictionary to count how many planets orbit each star
planets_per_star = {}

# Step2: loop over each host star entry -- finish this step's code
for host in raw_hosts:
    pass  # TODO: increment the count for `host`, or add it with a count of 1
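
The five steps above can be sketched on a toy list like this. The names `toy_hosts` and `toy_counts` are hypothetical, and the host names are invented.

```python
# Hypothetical stand-in for raw_hosts: one entry per planet, names invented.
toy_hosts = ["Toy-1", "Toy-1", "Toy-2", "Toy-1", "Toy-3"]

# Steps 1 and 2: count how many times each host star appears
toy_counts = {}
for host in toy_hosts:
    if host in toy_counts:
        toy_counts[host] += 1
    else:
        toy_counts[host] = 1

# Step 3: every planet is counted exactly once across all stars
total_planets = sum(toy_counts.values())

# Step 4: each dictionary key is one unique host star
unique_stars = len(toy_counts)

# Step 5: average number of planets per star
average = total_planets / unique_stars
print(f"{average:.2f} planets per star")
```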


## 6. Which THREE host stars have the MOST known planets?
**Instruction:**
Use only basic Python (lists, dictionaries, sorted).

- Step1: convert the dictionary into a list of (star, planet_count) pairs


- Step2: sort the list by planet count (highest first)


- Step3: select the top three stars


- Step4: print the results


In [5]:
#TODO: Determine the three host stars that have the most planets

# Step1: convert the dictionary into a list of (star, planet_count) pairs
star_planet_pairs = list(planets_per_star.items())
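
A minimal sketch of Steps 2 to 4 on an invented dictionary (`toy_counts` is a hypothetical stand-in for `planets_per_star`). `sorted()` is stable, so stars with equal counts keep their original dictionary order.

```python
# Hypothetical stand-in for planets_per_star; names and counts are invented.
toy_counts = {"Toy-1": 3, "Toy-2": 1, "Toy-3": 5, "Toy-4": 2}

# Step 1: list of (star, planet_count) pairs
pairs = list(toy_counts.items())

# Step 2: sort by the count (second element of each pair), highest first
ranked = sorted(pairs, key=lambda pair: pair[1], reverse=True)

# Step 3: keep the first three entries
top_three = ranked[:3]

# Step 4: print the results
for star, count in top_three:
    print(f"{star}: {count} planets")
```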


## 7. Visual Inspection

### Histogram of planet radii
**Instruction:**

Create a histogram of planet radii using seaborn and matplotlib. Use log-scale if the distribution is skewed. Be sure to add a title and label the axes.

In [None]:
#TODO: Create a histogram of planet radii using seaborn

### Bar Chart of Top Discovery Methods 
**Instruction:**

Count the top discovery methods and plot them as a horizontal bar chart.
- Assign y-variable to hue (required for palette)
- Hide legend (it would be redundant)
- Be sure to add a title and label the axes.

In [None]:
#TODO: Create a bar chart of the top discovery methods

### Scatter plot of Radius vs. Mass
**Instruction:**
Use hvplot.pandas to make an interactive scatter plot of planet radius vs planet mass. 
- Optionally color by discovery method.
- Be sure to add a title and label the axes.

In [None]:
#TODO: Create interactive scatter plot using hvplot with planet names on hover

### Boxplot of Orbital Periods
**Instruction:**
Create an interactive boxplot of orbital periods (on a log scale) using hvplot.pandas to inspect the distribution and outliers. Be sure to add a title and label the axes.

In [None]:
#TODO: Create interactive box plot using hvplot

### Scatter plot of Radius vs. Discovery Year

**Instruction:**
Use hvplot.pandas to create a scatter plot of planet radius versus discovery year to explore trends over time. Be sure to add a title and label the axes.

In [6]:
#TODO: Create an interactive scatter plot using hvplot with planet names on hover


## 8. Variable Relationships and Statistics

### Correlation Matrix of Numeric Variables 

**Instruction:**
Use pandas to create **one** correlation matrix for the two numeric variables planet radius and planet mass, and use it to quantify the relationship between them.


In [None]:
#TODO:  Create correlation matrix
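
A minimal sketch with invented values: `DataFrame.corr()` computes pairwise Pearson correlations by default and ignores rows where either value is missing. For heavily skewed data, `method="spearman"` is an alternative worth comparing.

```python
import pandas as pd

# Toy values, invented for illustration; one radius is missing on purpose.
toy = pd.DataFrame({
    "pl_rade": [1.0, 2.0, 4.0, 11.0, None],
    "pl_bmasse": [1.0, 5.0, 17.0, 300.0, 50.0],
})

# Select only the two columns of interest, then build the 2x2 matrix
corr = toy[["pl_rade", "pl_bmasse"]].corr()
print(corr)
```

The diagonal is always 1, so the off-diagonal entry is the number to interpret.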

### Create a scatter plot of radius versus discovery year to explore trends over time 

**Instruction:**
- Use hvplot.pandas to create a scatter plot of radius versus discovery year
- Be sure to add a title and label the axes 

In [None]:
#TODO: Create a scatter of planet radius per discovery year

### Compute radius statistics (mean and median) by discovery year

**Instruction:**
- For each discovery year compute the **mean** and **median** of planet radius (`pl_rade`) 
- Sort the results by year in ascending order.
- Use pandas groupby and aggregations
- Show the first 10 years as a sample

In [7]:
#TODO: Compute mean and median of planet radius per discovery year




#Show the first 10 years as a sample
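
The instructions above could be sketched like this on invented values. `groupby(...).agg([...])` produces one column per statistic, and `sort_index()` makes the ascending year order explicit.

```python
import pandas as pd

# Toy values, invented for illustration.
toy = pd.DataFrame({
    "disc_year": [2012, 2010, 2010, 2012, 2011, 2012],
    "pl_rade": [2.0, 1.0, 3.0, 10.0, 4.0, 3.0],
})

# One row per discovery year, one column per statistic
radius_by_year = (
    toy.groupby("disc_year")["pl_rade"]
    .agg(["mean", "median"])
    .sort_index()
)

# Show the first 10 years as a sample
print(radius_by_year.head(10))
```

In this toy data the 2012 mean (5.0) is pulled above the median (3.0) by a single large planet, which is the kind of gap worth watching for in the real results.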


## 9. Interactive visualization of radius statistics over time
Create an interactive plot using hvplot to visualize how mean and median planet radius change over discovery years. This interactive visualization allows you to zoom, pan, and hover to see exact values.

- Use hvplot.line() to create separate line plots for each statistic, then overlay them using the * operator.
- Add appropriate labels, colors, and formatting options.
- Be sure to add a title and label the axes.

In [None]:
#TODO:  Create Interactive Line Plots

radius_col = 'pl_rade'
year_col = 'disc_year'

#compute mean and median of planet radius per discovery year 

#create interactive plots using hvplot 

#combine and overlay the plots

## 10. 3D Visualization: Exploring Planet Characteristics

Create an interactive 3D scatter plot using the plotly.express package to visualize the relationships between planet radius, mass, and orbital period simultaneously. This lets you explore how these fundamental planet characteristics relate to one another and how they vary by discovery method.

- Be sure to import the plotly.express library 
- Prepare the data by removing duplicates before creating the scatter plot 
- Color the points by discovery method 
- Use logarithmic scales for all three axes to handle the wide range of values
- Use fig.update_traces() to adjust marker size for better visibility.
- Be sure to add a title and label the axes.

In [None]:
#TODO: Create Interactive Plot in plotly.express