# Assessment 11 - Python 4 and DAV 2/3

The objective of this assessment is to evaluate your understanding of Python programming for data analysis using external libraries such as pandas, matplotlib, seaborn, and numpy, as well as working with Python environments and modules. You will work with a dataset containing information about stars. You will clean and prepare the data, then analyze it to gain insights into the characteristics of, and relationships between, different types of stars.

## Dataset Description

The dataset contains the following columns:

- **Temperature (K)** = The surface temperature of the star, measured in Kelvin (K).
- **Luminosity(L/Lo)** = The total amount of energy emitted by a star per unit time. In this column, luminosity is expressed relative to the Sun’s luminosity (Lo). A star with a luminosity of 10 Lo is ten times as luminous as the Sun.
- **Radius(R/Ro)** = The star’s size relative to the radius of the Sun (Ro). For instance, a star with a radius of 0.5 Ro has half the radius of the Sun.
- **Apparent Magnitude(m)** = The brightness of a star as seen from Earth. The scale is logarithmic and inverted: lower values correspond to brighter stars, and higher values to dimmer ones.
- **Absolute magnitude(Mv)** = The intrinsic brightness of a celestial object. It is the apparent magnitude the object would have if it were placed 10 parsecs (about 32.6 light years) away from Earth. This is used to compare the true brightness of stars regardless of their distance from Earth.
- **Star color** (white, Red, Blue, Yellow, yellow-orange, etc.) = The color of a star is directly linked to its surface temperature. Cooler stars appear red, while hotter stars appear blue. Common colors listed in astronomy include white, red, blue, yellow, and yellow-orange. These colors can provide immediate visual clues about a star’s temperature.
- **Spectral Class** (O, B, A, F, G, K, M) = This classification is based on the absorption lines in a star's spectrum, which correspond to surface temperature. The main spectral classes from hottest to coolest are O, B, A, F, G, K, M. Each class can also have subcategories indicating temperature and other spectral features.
- **Star type** (Red Dwarf, Brown Dwarf, White Dwarf, Main Sequence, SuperGiants, HyperGiants) = This column categorizes stars based on their evolutionary stage and other physical characteristics:
    - *Red Dwarf*: Small, cool, long-lived stars on the main sequence.
    - *Brown Dwarf*: 'Failed' stars that do not have enough mass to sustain nuclear fusion.
    - *White Dwarf*: Very hot, small, dense remnants of stars that have exhausted their nuclear fuel.
    - *Main Sequence*: Stars that are currently fusing hydrogen into helium in their cores, including our Sun.
    - *SuperGiants*: Extremely large and luminous stars, much larger than the Sun, at a later stage of their evolution.
    - *HyperGiants*: Rare, extremely massive and luminous stars, exhibiting high rates of mass loss.

## 1. Create Your Environment
- Open your terminal, check your conda version, and update conda.
- Create an environment named 'star_env' and install Python (version 3.7) in it.
- Activate 'star_env' and install numpy, matplotlib, seaborn, and pandas into it.
- Check which packages are installed in 'star_env'.

**Take a screenshot of your terminal with all the commands shown**

## 2. Data Loading and Exploration
Load the dataset ***star_dataset.csv*** into a pandas DataFrame, display the first rows of your DataFrame, and display its shape.
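
As a rough illustration of the three steps (load, preview, shape), here is a minimal sketch on a made-up three-row CSV held in memory. The `csv_text` contents and the `toy_df` name are assumptions for illustration only, not the real dataset.

```python
import io

import pandas as pd

# Hypothetical three-row stand-in for a CSV file on disk.
csv_text = """Temperature (K),Star color,Spectral Class
3068,Red,M
25000,Blue,B
5800,Yellow,G
"""

# pd.read_csv accepts a file path or any file-like object.
toy_df = pd.read_csv(io.StringIO(csv_text))

print(toy_df.head())   # first rows (up to five by default)
print(toy_df.shape)    # (number of rows, number of columns)
```

With a real file, the file-like object would be replaced by the path to the CSV.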

In [1]:
#Your code goes here

## 3. Data Cleaning and Preparation
Remove rows that contain missing values:

***Hint***: You can use a new DataFrame method called *dropna()*.
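
To show how *dropna()* behaves, here is a small sketch on a toy DataFrame. The column values are invented for illustration; the point is only that rows with any missing value are dropped and that the original DataFrame is left unchanged.

```python
import numpy as np
import pandas as pd

# Toy data with one missing value in each column (illustrative only).
toy_df = pd.DataFrame({
    "Temperature (K)": [3068, np.nan, 5800],
    "Star color": ["Red", "Blue", None],
})

print(toy_df.isna().sum())  # number of missing values per column

# dropna() returns a new DataFrame; toy_df itself is not modified.
cleaned = toy_df.dropna()
print(cleaned.shape)
```

Because *dropna()* returns a new object, the result has to be assigned to a variable to be kept.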

In [2]:
#Your code goes here

Identify and remove duplicate rows from the DataFrame, then display the ***shape*** of the resulting DataFrame:
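
One possible way to see duplicates before removing them is sketched below on a toy DataFrame (the values are made up). It relies on the pandas methods `duplicated()` and `drop_duplicates()`.

```python
import pandas as pd

# Toy data in which the second row repeats the first (illustrative only).
toy_df = pd.DataFrame({
    "Star color": ["Red", "Red", "Blue"],
    "Spectral Class": ["M", "M", "B"],
})

print(toy_df.duplicated())          # True marks a repeat of an earlier row

deduped = toy_df.drop_duplicates()  # keeps the first occurrence by default
print(deduped.shape)
```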

In [3]:
#Your code goes here

## 4. Data Reshaping
Complete the code below to pivot the DataFrame from long to wide format, and display the result:

In [None]:
import pandas as pd

data = {
    'Star Name': [
        'Sirius', 'Alpha Centauri', 'Proxima Centauri', 'Betelgeuse', 'Vega', 'Aldebaran', 'Antares',
        'Sirius', 'Alpha Centauri', 'Proxima Centauri', 'Betelgeuse', 'Vega', 'Aldebaran', 'Antares'
    ],
    'Attribute': [
        'Distance (light years)', 'Distance (light years)', 'Distance (light years)', 'Distance (light years)', 
        'Distance (light years)', 'Distance (light years)', 'Distance (light years)',
        'Luminosity (L/Lo)', 'Luminosity (L/Lo)', 'Luminosity (L/Lo)', 'Luminosity (L/Lo)', 
        'Luminosity (L/Lo)', 'Luminosity (L/Lo)', 'Luminosity (L/Lo)'
    ],
    'Value': [
        8.6, 4.37, 4.24, 642.5, 25, 65.1, 550,
        25.4, 1.519, 0.00156, 100000, 50, 400, 13000
    ]
}
# Create the DataFrame from the dictionary
star_df = pd.DataFrame(data)
print("Original Dataframe:\n", star_df)

# Pivoting the DataFrame
#Your code goes here

# Display the pivoted DataFrame
#Your code goes here
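
For reference, long-to-wide pivoting can be sketched on an unrelated toy table. The column names `Object`, `Quantity`, and `Reading` are hypothetical and chosen only to show the idea: one column supplies the new row labels, one supplies the new column headers, and one supplies the cell values.

```python
import pandas as pd

# Long format: one row per (object, quantity) pair. Values are illustrative.
long_df = pd.DataFrame({
    "Object": ["A", "B", "A", "B"],
    "Quantity": ["X", "X", "Y", "Y"],
    "Reading": [1.0, 2.0, 3.0, 4.0],
})

# Wide format: one row per object, one column per quantity.
wide_df = long_df.pivot(index="Object", columns="Quantity", values="Reading")
print(wide_df)
```

Note that `pivot()` raises an error if the same index/column pair appears more than once, so it assumes each pair is unique.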


## 5. Modules and Calculations

Download the module named `star_calculations_module.py` and complete the functions for the following calculations:

**distance_from_earth_parsecs(Mv, m):**
   - Calculate the distance ***(in parsecs)*** between each star and Earth based on its absolute and apparent magnitudes using the formula:
     
     **distance_parsecs = 10 ^ ((m - Mv + 5)/5)**

     Where:
     - ***m*** is the apparent magnitude
     - ***Mv*** is the absolute magnitude

**parsecs_to_lightYears(distance_parsec):**
  - Convert the distance returned by the previous function from ***parsecs*** to ***light years***

    Where:
    
    ***1 Parsec = 3.2616 light years***

These functions will be implemented in the `star_calculations_module.py` module. Implement these calculations using **Python** and **NumPy**.
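
A quick way to sanity-check the formula is to try magnitudes whose answer is known in advance. The sketch below uses plain Python and a hypothetical helper name (`magnitudes_to_parsecs`), which is not part of the provided module. By the definition of absolute magnitude, a star with m equal to Mv should come out at exactly 10 parsecs.

```python
# Worked check of distance_parsecs = 10 ^ ((m - Mv + 5) / 5).
# Note: in Python, ^ is bitwise XOR; exponentiation is written **.

PARSEC_IN_LIGHT_YEARS = 3.2616  # conversion factor given in this section


def magnitudes_to_parsecs(m, Mv):  # hypothetical helper name
    """Return the distance in parsecs from apparent and absolute magnitude."""
    return 10 ** ((m - Mv + 5) / 5)


d_same = magnitudes_to_parsecs(m=4.0, Mv=4.0)  # m == Mv     -> 10 parsecs
d_far = magnitudes_to_parsecs(m=9.0, Mv=4.0)   # m - Mv == 5 -> 100 parsecs

print(d_same, d_far)
print(d_same * PARSEC_IN_LIGHT_YEARS)  # about 32.6 light years
```

The same expression also works element-wise when `m` and `Mv` are NumPy arrays or pandas columns.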


Once the module is completed, complete the following instructions:
- Import the `star_calculations_module.py` module into your Python environment. (You may have to restart the kernel for it to work.)
- Create a new column called 'Distance (Light Years)' in your DataFrame: calculate each star's distance from Earth in parsecs, then convert it to light years, using the module's functions.

**Note:** Add the following lines of code after importing the module to reload it, so that any recent changes are recognized (change 'module_name' to the name of your module):

*import importlib*

*# Reload the module to ensure any recent changes are recognized*

*importlib.reload(module_name)*
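
The reload pattern can be tried out on any already-imported module. The sketch below uses the standard-library `json` module purely as a stand-in for your own module.

```python
import importlib
import json  # stand-in for your own module

# reload() re-executes the module's source and returns the module object.
json = importlib.reload(json)

print(json.dumps({"reloaded": True}))
```

One caveat: names brought in with `from module_name import some_function` are not refreshed by `reload()`, so calling functions as `module_name.some_function(...)` is the safer style while the module is still being edited.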
    

In [5]:
#Your code goes here

Print and save the DataFrame, showing the new column:

In [6]:
#Your code goes here

## 6. Data Visualization
Create a new folder called 'Figures' that will be used to store all the graphs created and saved in this exercise.
For each graph, include a descriptive title, axis labels, and a legend (if applicable).
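
A minimal sketch of the save workflow, assuming matplotlib and a toy bar chart: create the folder if it does not exist, label the figure, then save it into the folder. The file name `toy_bar_chart.png` and the plotted values are placeholders.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripts; omit inside a notebook
import matplotlib.pyplot as plt

# Create the output folder; exist_ok avoids an error if it already exists.
figures_dir = Path("Figures")
figures_dir.mkdir(exist_ok=True)

fig, ax = plt.subplots()
ax.bar(["A", "B", "C"], [3, 5, 2], label="toy counts")  # illustrative data
ax.set_title("Toy bar chart")
ax.set_xlabel("Category")
ax.set_ylabel("Count")
ax.legend(title="Series")

out_path = figures_dir / "toy_bar_chart.png"  # placeholder file name
fig.savefig(out_path, bbox_inches="tight")
plt.close(fig)
```

In a notebook, call `savefig` before or in the same cell as the figure is shown, since a displayed figure may already be closed in later cells.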

1. Create and save a bar chart for 'Spectral Class'

In [7]:
#Your code goes here

2. Create and save a pie chart for 'Star type'

In [8]:
#Your code goes here

3. Create and save a box plot using seaborn to provide a summary of the distribution and spread of data for Spectral Class vs Temperature (K).

Provide a short analysis of the box plot, noting whether any outliers are visible. Explain the meaning of the varying sizes and positions of the boxes. Which spectral classes have the highest/lowest temperatures?
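
When reading a box plot, it can help to know what the box and the outlier points stand for. The sketch below computes them by hand with NumPy on made-up temperatures. It assumes the common convention (seaborn's default) that points more than 1.5 times the interquartile range beyond the box are drawn as outliers.

```python
import numpy as np

# Illustrative temperatures for one group; the last value is deliberately extreme.
temps = np.array([3000, 3200, 3400, 3600, 3800, 12000])

# The box spans Q1 to Q3; the line inside the box is the median.
q1, median, q3 = np.percentile(temps, [25, 50, 75])
iqr = q3 - q1  # box height: the spread of the middle 50% of the data

# Points beyond these fences are plotted individually as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = temps[(temps < lower_fence) | (temps > upper_fence)]

print(q1, median, q3)
print(outliers)
```

A taller box therefore means a wider spread within that group, and a box sitting higher on the axis means higher typical values.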

In [9]:
#Your code goes here

**[Your analysis goes here]**

4. Create a bar plot showing the median distance of each star type from Earth.

Write a short analysis explaining which types of stars are closest to/farthest from Earth.
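
The median per group can be computed with `groupby`. The sketch below uses hypothetical column names (`Group`, `Distance`) and toy values, and also prints the mean to show why the median is often preferred when a group contains an extreme value.

```python
import pandas as pd

# Toy data: group A contains one very large distance (illustrative only).
toy_df = pd.DataFrame({
    "Group": ["A", "A", "A", "B", "B"],
    "Distance": [10.0, 12.0, 500.0, 40.0, 60.0],
})

medians = toy_df.groupby("Group")["Distance"].median()
means = toy_df.groupby("Group")["Distance"].mean()

print(medians)  # the median of A is barely affected by the extreme value
print(means)    # the mean of A is pulled far upward by it
```

The resulting Series can be passed to a plotting call, with its index as the categories and its values as the bar heights.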


In [10]:
#Your code goes here

**[Your analysis goes here]**

5. **(BONUS FOR FUN!)** Create an ***interactive*** scatter plot using a new library called *plotly.express* that compares the Absolute magnitude(Mv) and Apparent Magnitude (m) grouped by Star type. Complete the code provided below:

   Where
   - x = Absolute magnitude(Mv)
   - y = Apparent Magnitude (m)
   - title = *[Write a descriptive title]*
   - color = Star type

***Hover your mouse over the graph and use the options that appear above it to explore all the awesome things you can see and do with an interactive graph!***

*Tip: Try drawing a box inside the graph with your mouse*

Write a brief analysis comparing Absolute magnitude(Mv) and Apparent Magnitude (m): which star types have the lowest/highest magnitude values, and what does this mean?

In [None]:
import plotly.express as px

# Scatter plot
fig = px.scatter(df, x='YOUR CODE GOES HERE', y='YOUR CODE GOES HERE', 
                 title='YOUR CODE GOES HERE', color = 'YOUR CODE GOES HERE')
fig.show()


**[WRITE YOUR ANALYSIS HERE]**