## Are We Alone in Space?
### Analyzing NASA’s Exoplanet Data with Python

---

In this project, we’ll explore relationships in real scientific data from NASA’s Exoplanet Archive.

Our goal is to practice exploratory data analysis (EDA): visualizing data, computing basic metrics, and interpreting patterns carefully.

---


### Goals

**1. Load the Data**  

**2. Practice Pure Python Skills**  

**3. Plot Relationships**  

**4. Compute Descriptive Statistics**  



### Questions

These are some of the questions we will tackle:

**1. Planet Size vs. Planet Mass**  
Common sense would suggest that larger planets also have more mass, but this might not be the case for gas planets. We will explore this relationship.

**2. Discovery Year vs. Planet Size**  
Some relationships between two variables are less direct. Discovery year and planet size might be correlated, since better instruments could enable the discovery of smaller planets. We will analyze whether the data supports this.

**3. Discovery Methods**  
Different discovery methods have been used over the years. We will assess which one of them has found the most planets.

---
## Column descriptions 
Below are the columns we use in the notebook and what they represent.

- **pl_name:** Planet name (string)
- **pl_rade:** Planet radius in Earth radii (float)
- **pl_bmasse:** Planet mass in Earth masses (float)
- **discoverymethod:** Discovery method (categorical — e.g., Transit, Radial Velocity)
- **hostname:** Host star name (string)
- **disc_year:** Discovery year (integer)


---
## 1. Load Data
In this section we import libraries and load the dataset using `pandas`. We’ll display the first few rows and some basic information about the dataset.
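
As a rough sketch of the loading and inspection steps, the example below reads a tiny in-memory CSV instead of the real file. The rows, planet names, and values are invented for illustration only; the column names follow the list above.

```python
import io

import pandas as pd

# Toy stand-in for the CSV file; every row and value here is invented.
sample_csv = io.StringIO(
    "pl_name,hostname,discoverymethod,disc_year,pl_rade,pl_bmasse\n"
    "Toy-1 b,Toy-1,Transit,2015,1.2,2.0\n"
    "Toy-1 c,Toy-1,Transit,2016,2.5,\n"
    "Toy-2 b,Toy-2,Radial Velocity,2009,,310.0\n"
)

sample_df = pd.read_csv(sample_csv)

print(sample_df.head())      # first rows (all three in this toy case)
sample_df.info()             # dtypes and non-null counts per column
print(sample_df.describe())  # summary statistics for the numeric columns
```

Empty fields in the CSV become `NaN`, which is why `info()` reports fewer non-null values for the radius and mass columns.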

In [None]:
# ============================================================
# Import libraries and load the dataset
#
# ============================================================
from pathlib import Path

# Import the pandas library as pd


# Run this line to load in the data
df = pd.read_csv(Path(r'udacity_exoplanet.csv'))

# Print out the first 5 rows to verify it was loaded correctly


In [None]:
# ============================================================
# Inspect dataframe structure
#
# Instruction:
# Use a suitable pandas method to inspect column dtypes and missing values.
# ============================================================


In [None]:
# ============================================================
# Examine numeric summaries
#
# Instruction:
# Use a suitable pandas method to summarize the numeric columns and their
# distributions (count, mean, standard deviation, min/max, quartiles).
# ============================================================


In [None]:
# ============================================================
# Create sorted lists of unique planet names and host star names
#
# Instruction:
# Do NOT use pandas unique()/value_counts().
# Use only basic Python: lists, sets, sorted.
#
# Goal:
# Show the total unique counts and the first 20 names as samples.
# ============================================================


# -------------------------
# Part A: Planet names
# -------------------------

# Step: convert the `pl_name` column into a plain Python list


# Step: remove duplicates using a set, then sort the results into a list

# Step: print the total number of unique planets


# Step: print a small sample (first 20) of the unique planet names



In [None]:
# ============================================================
# Part B: Host star names
#
# Instruction:
# Do NOT use pandas unique()/value_counts().
# Use only basic Python: lists, sets, sorted.
# ============================================================

# Step: convert the `hostname` column into a plain Python list


# Step: remove duplicates and sort the host star names


# Step: print the total number of unique host stars


# Step: print a small sample (first 20) of the unique host star names
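
The set-then-sort pattern asked for in the two cells above can be sketched on a toy list. The names below are hypothetical placeholders; in the notebook the list would come from a DataFrame column, for example via `.tolist()`.

```python
# Hypothetical host-star entries, one per planet row (not taken from the dataset).
host_entries = ["Star B", "Star A", "Star B", "Star C", "Star A", "Star B"]

# A set keeps one copy of each name; sorted() turns it into an alphabetical list.
unique_hosts = sorted(set(host_entries))

print(len(unique_hosts))   # number of distinct names
print(unique_hosts[:20])   # slicing is safe even when fewer than 20 names exist
```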


In [None]:
# ============================================================
# On average, how many planets does each star have?
#
# Instruction:
# Use only basic Python (no pandas aggregation).
#
# Hint:
# Each row represents one planet orbiting one host star.
# ============================================================

# Step: create a dictionary to count how many planets orbit each star


# Step: loop over each host star entry (one per planet)


    # Step: if the star is already in the dictionary, increment its count


    # Step: otherwise, add the star to the dictionary with a count of 1


# Step: compute the total number of planets


# Step: compute the total number of unique host stars


# Step: compute the average number of planets per star


# Step: print the result


In [None]:
# ============================================================
# Which THREE host stars have the MOST known planets?
#
# Instruction:
# Use only basic Python (lists, dictionaries, sorted).
# ============================================================

# Step: convert the dictionary into a list of (star, planet_count) pairs


# Step: sort the list by planet count (highest first)


# Step: select the top three stars


# Step: print the results
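
One possible shape for the counting, averaging, and ranking steps is sketched below on a toy list. The star names are hypothetical, and the variable names are illustrative rather than required by the exercise.

```python
# Hypothetical host-star entries, one per planet row (not taken from the dataset).
host_entries = ["Star B", "Star A", "Star B", "Star C", "Star A", "Star B"]

# Count planets per star with a plain dictionary.
planet_counts = {}
for star in host_entries:
    if star in planet_counts:
        planet_counts[star] += 1
    else:
        planet_counts[star] = 1

# Each entry is one planet, so the counts sum to the number of planets.
total_planets = sum(planet_counts.values())
total_stars = len(planet_counts)
average_planets = total_planets / total_stars

# Sort (star, count) pairs by count, highest first, and keep the top three.
ranked = sorted(planet_counts.items(), key=lambda pair: pair[1], reverse=True)
top_three = ranked[:3]

print(f"Average planets per star: {average_planets:.2f}")
for star, count in top_three:
    print(star, count)
```

Because `sorted` is stable, stars with equal counts keep the order in which they first appeared in the data.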



In [None]:
# ============================================================
# Histogram of planet radii
#
# Instruction:
# Create a histogram of planet radii using seaborn and matplotlib. Use a log scale if the distribution is skewed.
# Be sure to add a title and label the axes.
# ============================================================


# import seaborn as sns and matplotlib.pyplot as plt


# Create a histogram of planet radii using seaborn


In [None]:
# ============================================================
# Top discovery methods (bar chart)
#
# Instruction:
# Count the top discovery methods and plot them as a horizontal bar chart.
# Assign the y variable to `hue` so the palette is applied,
# and hide the legend (it would be redundant).
# Be sure to add a title and label the axes.
# ============================================================
# import the seaborn library as sns


# Create a bar chart of the top discovery methods







In [None]:
# ============================================================
# Scatter — radius vs mass
#
# Instruction:
# Use hvplot.pandas to make a scatter plot of planet radius vs planet mass. Optionally color by discovery method.
# Be sure to add a title and label the axes.
# ============================================================
# import the hvplot.pandas library


# Create interactive scatter plot using hvplot with planet names on hover


In [None]:
# ============================================================
# Boxplot — orbital periods
#
# Instruction:
# Create a boxplot of orbital periods (on a log scale) to inspect distribution and outliers.
# Be sure to add a title and label the axes.
# ============================================================
# import the hvplot.pandas library


# Create interactive box plot using hvplot


In [None]:
# ============================================================
# Correlation matrix
#
# Instruction:
# Use the pandas library to create a correlation matrix for key numeric variables, i.e. planet radius, mass, and orbital period
# to identify relationships between these planetary characteristics. Use a color gradient to 
# make the correlations easier to interpret.
# ============================================================
# Compute the correlation matrix and display it with a color gradient
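
A minimal sketch of the correlation step on a toy frame follows. The values are invented, and `pl_orbper` is an assumed name for the orbital period column (it is the name used in the NASA archive, but the column list in this notebook does not confirm it).

```python
import pandas as pd

# Toy values, invented for illustration only.
toy = pd.DataFrame({
    "pl_rade": [1.0, 2.0, 4.0, 11.0],
    "pl_bmasse": [1.0, 5.0, 17.0, 300.0],
    "pl_orbper": [365.0, 10.0, 60.0, 4300.0],  # assumed column name
})

# Pairwise Pearson correlations between the numeric columns.
corr = toy.corr()
print(corr.round(2))

# In a notebook, corr.style.background_gradient(cmap="coolwarm") would add
# the color gradient (it needs matplotlib and jinja2 installed).
```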


In [None]:
# ============================================================
# Radius vs discovery year
#
# Instruction:
# Use hvplot.pandas to create a scatter plot of planet radius versus discovery year to explore trends over time.
# Be sure to add a title and label the axes.
# ============================================================


# Create interactive scatter plot using hvplot with planet names on hover


In [None]:
# ============================================================
# Compute radius statistics by year
#
# Instruction:
# For each discovery year, compute the mean and median of planet radius (`pl_rade`) and sort the results by year in ascending order.
# Use pandas groupby and aggregations.
# Show the first 10 years as a sample.
# ============================================================

# Compute mean and median of planet radius per discovery year



# Show the first 10 years as a sample
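
The per-year aggregation could look roughly like the sketch below, shown on a toy frame with invented values. The name `radius_by_year` is illustrative.

```python
import pandas as pd

# Toy values, invented for illustration only.
toy = pd.DataFrame({
    "disc_year": [2011, 2009, 2011, 2009, 2011],
    "pl_rade": [2.0, 12.0, 4.0, 10.0, 3.0],
})

# Group by year, compute both statistics at once, and sort by year ascending.
radius_by_year = (
    toy.groupby("disc_year")["pl_rade"]
    .agg(["mean", "median"])
    .sort_index()
)

print(radius_by_year.head(10))
```

`groupby` already sorts the group keys by default, so `sort_index()` only makes the ordering explicit.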


### Interactive visualization of radius statistics over time
Create an interactive plot using hvplot to visualize how the mean and median planet radius change over discovery years. The plot lets you zoom, pan, and hover to see exact values.

In [None]:
# ============================================================
# Interactive line plot of radius statistics over time
#
# Instruction:
# Create an interactive visualization using hvplot to show how mean and median planet radius 
# change over discovery years. Use hvplot.line() to create separate line plots for each statistic,
# then overlay them using the * operator. Add appropriate labels, colors, and formatting options.
# Be sure to add a title and label the axes.
# ============================================================

# Compute mean and median of planet radius per discovery year

# Create interactive line plots using hvplot


# Combine and overlay the plots


---
## 3D Visualization: Exploring Planet Characteristics
Create an interactive 3D scatter plot to visualize relationships between planet radius, mass, and orbital period simultaneously. This allows you to explore complex relationships between these fundamental planet characteristics and see how they vary by discovery method.
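
Before plotting on logarithmic axes, rows with missing or non-positive values need to be dropped. The sketch below shows only that preparation step on a toy frame with invented values; the plotting call itself is left out. `pl_orbper` is an assumed name for the orbital period column.

```python
import pandas as pd

# Toy values, invented for illustration only.
toy = pd.DataFrame({
    "pl_rade": [1.0, 2.0, None, 11.0],
    "pl_bmasse": [1.0, None, 17.0, 300.0],
    "pl_orbper": [365.0, 10.0, 60.0, 4300.0],  # assumed column name
    "discoverymethod": ["Transit", "Transit", "Radial Velocity", "Imaging"],
})

axes = ["pl_rade", "pl_bmasse", "pl_orbper"]

# Keep only rows where all three plotted quantities are present.
plot_df = toy.dropna(subset=axes)

# Log scales are undefined for zero or negative values, so filter those too.
plot_df = plot_df[(plot_df[axes] > 0).all(axis=1)]

print(len(plot_df), "rows remain for plotting")
```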

In [None]:
# ============================================================
# 3D scatter plot of planet characteristics
#
# Instruction:
# Create an interactive 3D scatter plot using plotly.express to visualize the relationships 
# between planet mass, radius, and orbital period simultaneously. Color the points by discovery 
# method and use logarithmic scales for all three axes to handle the wide range of values.
# Use fig.update_traces() to adjust marker size for better visibility.
# Be sure to add a title and label the axes.
# ============================================================
# import the library plotly.express as px


# Prepare the data by removing missing values


# Create the interactive 3D scatter plot using Plotly


# Make the dots smaller


# Show the interactive plot
