# **Week 2: Data Visualization in Astronomy**

Visualization is a key aspect of analyzing astronomical data, allowing us to identify patterns, distributions, and relationships. Below are some examples of the different visualization techniques used in the notebook and their significance in astronomy.

In [None]:
!pip install corner
!pip install astropy
!pip install seaborn

In [None]:
# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from astropy.table import Table
from mpl_toolkits.mplot3d import Axes3D
import corner

# **Scatter Plot – Absolute Magnitude vs Size for Milky Way Satellites**

A scatter plot is used to examine relationships between two variables. In this case, we plot the absolute magnitude of Milky Way satellite galaxies against their size (half-light radius) in parsecs.

**What does this plot show?**

* Each point represents a satellite galaxy of the Milky Way.
* The x-axis shows the size (half-light radius) and the y-axis shows the absolute V-band magnitude (M$_V$), a measure of how intrinsically bright the galaxy is.
* Brighter galaxies appear higher on the plot due to the inverted y-axis (astronomical convention).
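
To make the magnitude convention concrete, the sketch below converts absolute magnitudes to luminosities in solar units using $L/L_\odot = 10^{-0.4\,(M_V - M_{V,\odot})}$. It assumes $M_{V,\odot} = 4.83$ for the Sun; the function name `magnitude_to_luminosity` and the example magnitudes are illustrative and are not taken from the catalog.

```python
import numpy as np

M_V_SUN = 4.83  # assumed absolute V-band magnitude of the Sun


def magnitude_to_luminosity(M_V):
    """Convert absolute V-band magnitude to V-band luminosity in solar units."""
    return 10.0 ** (-0.4 * (np.asarray(M_V, dtype=float) - M_V_SUN))


# Illustrative magnitudes only (not read from the catalog)
example_mags = np.array([-13.5, -8.5, -3.5])
example_lums = magnitude_to_luminosity(example_mags)

for M, L in zip(example_mags, example_lums):
    print(f"M_V = {M:6.1f}  ->  L_V ~ {L:.2e} L_sun")
```

Each step of 5 magnitudes corresponds to a factor of 100 in luminosity, which is why more negative values of M$_V$ mean brighter galaxies.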

In [None]:
# Example 1: Scatter Plot – Absolute Magnitude vs Half-light Radius for Milky Way Satellites
# -----------------------------------------------------------------------------------------

# Read the dwarf galaxy data from the online database
dsph_mw = Table.read('https://raw.githubusercontent.com/apace7/local_volume_database/main/data/dwarf_mw.csv')

# Print the first few rows to inspect the data
print(dsph_mw[:5])

# Extract the half-light radius and absolute magnitude columns
r_half = dsph_mw['rhalf_sph_physical']  # Half-light radius (pc)
M_V = dsph_mw['M_V']                    # Absolute V-band magnitude

# Create the scatter plot
plt.figure(figsize=(8,6))
plt.scatter(r_half, M_V, color='blue', marker='o')
#plt.plot(r_half, M_V, 'o', color='blue')
plt.gca().invert_yaxis()  # Brighter galaxies have lower (more negative) magnitudes
plt.gca().set_xscale('log')
plt.xlabel("Half-light Radius (pc)")
plt.ylabel("Absolute Magnitude (M$_V$)")
plt.title("Milky Way Satellite Galaxies: M$_V$ vs Half-light Radius")
plt.show()

**Multi-panels and Error bars**

In [None]:
# Read the dwarf galaxy data from the online database
dsph_mw = Table.read('https://raw.githubusercontent.com/apace7/local_volume_database/main/data/dwarf_mw.csv')

# Extract columns
r_half = dsph_mw['rhalf_sph_physical']  # Half-light radius (pc)
M_V = dsph_mw['M_V']                    # Absolute V-band magnitude

# Error columns
r_half_em = dsph_mw['rhalf_sph_physical_em']  # Lower error on r_half
r_half_ep = dsph_mw['rhalf_sph_physical_ep']  # Upper error on r_half
M_V_em = dsph_mw['M_V_em']                    # Lower error on M_V
M_V_ep = dsph_mw['M_V_ep']                    # Upper error on M_V

# Set up side-by-side panels
fig, axes = plt.subplots(1, 2, figsize=(14, 6), sharey=True)

# Left panel: Simple scatter plot
axes[0].scatter(r_half, M_V, color='blue', marker='o')
#axes[0].invert_yaxis()
axes[0].set_xscale('log')
axes[0].set_xlabel("Half-light Radius (pc)")
axes[0].set_ylabel("Absolute Magnitude (M$_V$)")
axes[0].set_title("MW Satellites: M$_V$ vs r$_{half}$ (Scatter)")

# Right panel: Scatter plot with error bars
axes[1].errorbar(
    r_half, M_V,
    xerr=[r_half_em, r_half_ep],
    yerr=[M_V_em, M_V_ep],
    fmt='o', color='orange', mec='k', mew=0.75, ecolor='gray', capsize=3,
    label='MW Satellites'
)
axes[1].invert_yaxis()
axes[1].set_xscale('log')
axes[1].set_xlabel("Half-light Radius (pc)", fontsize=14)
axes[1].set_title("MW Satellites: M$_V$ vs r$_{half}$ (Error Bars)", fontsize=20)
axes[1].grid(True)

# Adjust tick parameters and axes properties directly
axes[1].tick_params(axis='x', labelsize=14, top=True, direction='in')
axes[1].tick_params(axis='y', labelsize=14, right=True, direction='in')
#axes[1].tick_params(axis='x', which='minor', direction='in')
#axes[1].tick_params(axis='y', which='minor', direction='in')

# Add legend to the plot
# Some useful options:
# - loc: location of the legend (e.g., 'upper right', 'lower left', 'best')
# - fontsize: size of the legend text
# - frameon: whether to draw a frame around the legend
axes[1].legend(loc='best', fontsize=14, frameon=True)

plt.tight_layout()
plt.show()
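
The `xerr` and `yerr` arguments above are passed as a pair of arrays because the errors are asymmetric. A minimal sketch of that convention, using made-up values (the arrays below are hypothetical and not taken from the catalog): when the error has shape `(2, N)`, the first row is the lower error and the second row is the upper error, so each bar runs from `y - lower` to `y + upper`.

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative values only (hypothetical, not from the catalog)
radius = np.array([30.0, 120.0, 450.0])       # half-light radius (pc)
mag = np.array([-3.0, -6.5, -10.2])           # absolute magnitude
mag_err_lower = np.array([0.3, 0.2, 0.1])     # size of the error toward smaller values
mag_err_upper = np.array([0.5, 0.2, 0.3])     # size of the error toward larger values

# Stack into shape (2, N): row 0 = lower errors, row 1 = upper errors
yerr = np.vstack([mag_err_lower, mag_err_upper])

fig, ax = plt.subplots(figsize=(6, 4))
ax.errorbar(radius, mag, yerr=yerr, fmt='o', capsize=3)
ax.set_xscale('log')
ax.invert_yaxis()  # brighter (more negative) magnitudes at the top
ax.set_xlabel("Half-light Radius (pc)")
ax.set_ylabel("Absolute Magnitude (M$_V$)")
plt.show()
```

Both rows must hold non-negative sizes, not the end points of the bars.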

## Hands-on Exercise:

Try the following tasks to practice loading, summarizing, and plotting catalog data in Python:

1. **Load the data for M31 satellite galaxies**  
   *Hint: Use the following code to load the data:*  
   ```python
   dsph_m31 = Table.read('https://raw.githubusercontent.com/apace7/local_volume_database/main/data/dwarf_m31.csv')
   ```

2. **Extract the half-light radius and absolute magnitude columns**  
   - Use: `dsph_m31['rhalf_sph_physical']` and `dsph_m31['M_V']`

3. **Perform math operations:**  
   - Calculate the mean and median of the half-light radius.
   - Find the brightest and faintest satellite (minimum and maximum absolute magnitude).

4. **Plot the results:**  
   - Create a scatter plot of absolute magnitude vs half-light radius for the M31 satellites, similar to the example given above.

*Tip: Use functions like `np.mean()`, `np.median()`, `np.min()`, and `np.max()` for your calculations.*

In [None]:

# 1. Load the data for M31 satellite galaxies
dsph_m31 = Table.read('https://raw.githubusercontent.com/apace7/local_volume_database/main/data/dwarf_m31.csv')

# 2. Extract the half-light radius and absolute magnitude columns
r_half_m31 = dsph_m31['rhalf_sph_physical']  # Half-light radius (pc)
M_V_m31 = dsph_m31['M_V']                    # Absolute V-band magnitude

# 3. Perform math operations
mean_r_half = np.mean(r_half_m31)
median_r_half = np.median(r_half_m31)
brightest_M_V = np.min(M_V_m31)   # Most negative = brightest
faintest_M_V = np.max(M_V_m31)    # Most positive = faintest

print("Mean half-light radius:", mean_r_half)
print("Median half-light radius:", median_r_half)
print("Brightest satellite (lowest M_V):", brightest_M_V)
print("Faintest satellite (highest M_V):", faintest_M_V)

# 4. Plot the results (scatter plot)
plt.figure(figsize=(8,6))
plt.scatter(r_half_m31, M_V_m31, color='green', marker='o')
plt.gca().invert_yaxis()  # Brighter galaxies have lower (more negative) magnitudes
plt.gca().set_xscale('log')
plt.xlabel("Half-light Radius (pc)")
plt.ylabel("Absolute Magnitude (M$_V$)")
plt.title("M31 Satellite Galaxies: M$_V$ vs Half-light Radius")
plt.show()
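
Real catalogs often have missing measurements. The sketch below assumes missing values are stored as `NaN` (whether and how the M31 table marks missing entries is an assumption here, so inspect the columns first) and uses a small made-up array to show how the NaN-aware NumPy functions differ from the plain ones.

```python
import numpy as np

# Made-up half-light radii (pc) with two missing entries; not catalog data
r_half_demo = np.array([250.0, np.nan, 900.0, 120.0, np.nan, 400.0])

# Plain statistics propagate NaN
print("np.mean:     ", np.mean(r_half_demo))       # nan

# NaN-aware versions ignore the missing entries
print("np.nanmean:  ", np.nanmean(r_half_demo))    # 417.5
print("np.nanmedian:", np.nanmedian(r_half_demo))  # 325.0
print("np.nanmin:   ", np.nanmin(r_half_demo))     # 120.0
print("np.nanmax:   ", np.nanmax(r_half_demo))     # 900.0

# Count how many values are actually available
n_valid = np.count_nonzero(~np.isnan(r_half_demo))
print("Valid entries:", n_valid)                   # 4
```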

# **Histogram – Distribution of Galaxy Magnitudes**

Histograms are useful for understanding the distribution of a dataset. Here, we examine the frequency of different MW satellite magnitudes.

**Why is it important in astronomy?**

* Helps categorize galaxies into brightness classes.
* Can reveal whether a sample is dominated by bright or faint galaxies.

In [None]:
# Example 2: Histogram - Distribution of Galaxy Magnitudes
# --------------------------------------------------------
plt.figure(figsize=(8,6))
plt.hist(M_V, bins=10, color='purple', edgecolor='black', alpha=0.7)
#plt.hist(M_V, bins=10, color='purple', edgecolor='black', alpha=0.7, cumulative=True)
plt.xlabel("M$_V$")
plt.ylabel("Frequency")
#plt.ylabel("N(<M$_V$)")
plt.title("Histogram of Galaxy Magnitudes")
plt.gca().invert_xaxis()  # Follows the convention that brighter galaxies have lower magnitude values
plt.show()

Can you think of alternative ways to plot histograms in Python❓
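
One possible answer, as a sketch rather than the only approach: compute the bin counts yourself with `np.histogram`, then draw an outline-only histogram with `histtype='step'`. The magnitudes below are synthetic stand-ins (drawn from a normal distribution with arbitrary parameters), not the catalog values.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in for M_V (arbitrary parameters, not catalog data)
np.random.seed(0)
mags = np.random.normal(loc=-8.0, scale=3.0, size=60)

# Option 1: get the numbers without plotting
counts, edges = np.histogram(mags, bins=10)
print("Counts per bin:", counts)
print("Bin edges:", np.round(edges, 2))

# Option 2: outline-only histogram, reusing the same bin edges
fig, ax = plt.subplots(figsize=(8, 6))
ax.hist(mags, bins=edges, histtype='step', color='purple')
ax.set_xlabel("M$_V$")
ax.set_ylabel("Frequency")
ax.invert_xaxis()  # brighter (more negative) magnitudes on the right
plt.show()
```

Having the counts as an array makes it easy to normalize them or accumulate them with `np.cumsum(counts)` before plotting.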

# **Heatmap – Stellar Density Distribution**

Heatmaps help represent the density of objects in 2D space. Here, we generate random positions for stars and plot their stellar density distribution.

**Why is it important in astronomy?**

* Used to visualize star clusters, galaxy distributions, or gas clouds.
* Helps in understanding how stars/gas/etc. are spatially distributed.

In [None]:
# Example 3: Heatmap - Stellar Density Distribution
# --------------------------------------------------
# Draw random x, y positions of stars from a normal (Gaussian) distribution
x = np.random.normal(loc=0, scale=1.0, size=500)
y = np.random.normal(loc=0, scale=1.0, size=500)

plt.figure(figsize=(8,6))
# sns.kdeplot() with cmap="magma" draws a kernel density estimate (KDE) plot,
# which shows regions with high stellar concentrations
sns.kdeplot(x=x, y=y, cmap="magma", fill=True)
#plt.hist2d(x, y, bins=20, cmap="magma")
#plt.colorbar(label="Number of stars")
plt.xlabel("X Position (kpc)")
plt.ylabel("Y Position (kpc)")
plt.title("Simulated Stellar Density Map")
plt.show()
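
A KDE smooths the data; a binned heatmap shows the raw counts instead. The following sketch assumes the same kind of simulated positions as above (regenerated here with an arbitrary seed so the block is self-contained) and uses `np.histogram2d` with `pcolormesh` as one possible alternative.

```python
import numpy as np
import matplotlib.pyplot as plt

# Simulated star positions (arbitrary seed; same distribution as above)
np.random.seed(1)
x_demo = np.random.normal(loc=0, scale=1.0, size=500)
y_demo = np.random.normal(loc=0, scale=1.0, size=500)

# Count stars in a 20 x 20 grid; H[i, j] is the count in x-bin i and y-bin j
H, xedges, yedges = np.histogram2d(x_demo, y_demo, bins=20)

fig, ax = plt.subplots(figsize=(8, 6))
# pcolormesh expects rows to run along y, so the count array is transposed
mesh = ax.pcolormesh(xedges, yedges, H.T, cmap="magma")
fig.colorbar(mesh, ax=ax, label="Number of stars per bin")
ax.set_xlabel("X Position (kpc)")
ax.set_ylabel("Y Position (kpc)")
ax.set_title("Binned Stellar Density Map")
plt.show()
```

With only 500 points the binned map looks noisier than the KDE, which is the trade-off between showing raw counts and showing a smoothed estimate.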

**Understanding the Normal Distribution**

A normal distribution is described by the probability density function (PDF):

\begin{align}
f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}
\end{align}

where:

* $\mu$ (mean) is the center of the distribution (`loc=0` in the code).
* $\sigma$ (standard deviation) determines spread (`scale=1.0` in the code).
* The `size=500` parameter specifies that 500 random values are generated.
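
As a quick check of the formula, the sketch below codes the PDF directly and compares it with random draws. The helper name `normal_pdf`, the grid limits, and the sample size are choices made for this illustration.

```python
import numpy as np


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal probability density, written exactly as in the formula above."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))


# The PDF should integrate to 1 (simple Riemann sum on a fine grid)
grid = np.linspace(-8, 8, 16001)
dx = grid[1] - grid[0]
area = np.sum(normal_pdf(grid)) * dx
print(f"Area under the PDF: {area:.6f}")

# Fraction of random draws that fall within one standard deviation of the mean
np.random.seed(42)
samples = np.random.normal(loc=0, scale=1.0, size=100_000)
frac_within_1sigma = np.mean(np.abs(samples) < 1.0)
print(f"Fraction within 1 sigma: {frac_within_1sigma:.3f}")
```

About 68% of the draws land within one $\sigma$ of $\mu$, which is why error bars quoted as "1$\sigma$" are commonly read as a roughly 68% interval.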

**Why Use a Normal Distribution in Astronomy?**

* Many astronomical quantities (e.g., measurement errors, velocities, stellar distributions) are often well approximated by a normal distribution.
* Used in statistical noise modeling for telescope data.
* ...



# **3D Visualization – Star Positions in Space**

A 3D scatter plot is used to visualize the spatial distribution of stars in three dimensions.

**Why is it important in astronomy?**

* Helps analyze the structure of star clusters, galaxies, and cosmic filaments.
* Can reveal large-scale patterns in the universe.

In [None]:
# Example 4: 3D Visualization - Star Positions in Space
# ------------------------------------------------------
fig = plt.figure(figsize=(8,6))
ax = fig.add_subplot(111, projection='3d')

# Generating random 3D positions for 100 stars
np.random.seed(42)
x = np.random.uniform(-100, 100, 100)
y = np.random.uniform(-100, 100, 100)
z = np.random.uniform(-100, 100, 100)

ax.scatter(x, y, z, color='red', marker='o')
ax.set_xlabel("X (pc)")
ax.set_ylabel("Y (pc)")
ax.set_zlabel("Z (pc)")
ax.set_title("3D Distribution of Stars")
plt.show()


**Differences Between `random.uniform` and `random.normal` for Generating Random Numbers**

1. Uniform Distribution (`np.random.uniform`)

* Generates random numbers where every value within a given range has an equal probability of occurring.
* Defined by two parameters: `low` (minimum, included) and `high` (maximum, excluded), so values are drawn from the half-open interval [`low`, `high`).
* Example:
```python
np.random.uniform(low=0, high=10, size=5)
```
This generates 5 numbers uniformly distributed between 0 and 10.

2. Normal Distribution (`np.random.normal`)

* Generates random numbers following a bell-shaped Gaussian distribution.
* Defined by two parameters: `loc` (mean) and `scale` (standard deviation).
* Example:
```python
np.random.normal(loc=0, scale=1.0, size=5)
```
This generates 5 numbers centered around 0 with a spread of 1.
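
The difference is easier to see with many draws. The sketch below uses an arbitrary seed and sample size and compares summary statistics of the two generators: uniform samples stay inside their range, while normal samples cluster around the mean with no hard limits.

```python
import numpy as np

np.random.seed(42)
n_draws = 100_000  # arbitrary sample size for this illustration

u = np.random.uniform(low=0, high=10, size=n_draws)
n = np.random.normal(loc=0, scale=1.0, size=n_draws)

# Uniform: bounded by [low, high), mean near the midpoint,
# standard deviation near (high - low) / sqrt(12)
print(f"uniform: min={u.min():.3f}, max={u.max():.3f}, "
      f"mean={u.mean():.3f}, std={u.std():.3f}")
print(f"expected uniform std: {10 / np.sqrt(12):.3f}")

# Normal: unbounded, mean near loc, standard deviation near scale
print(f"normal:  min={n.min():.3f}, max={n.max():.3f}, "
      f"mean={n.mean():.3f}, std={n.std():.3f}")
```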

# **Corner Plots - Visualizing Multidimensional Distributions**

Corner plots show the one-dimensional distribution of each parameter together with the two-dimensional distribution of every parameter pair. They are widely used to present results from Bayesian inference and MCMC sampling in astronomy.


**Why use corner plots?**

* Helps visualize correlations between parameters (e.g., how stellar mass correlates with luminosity).
* Used in MCMC sampling results (e.g., estimating galaxy halo properties).
* Provides statistical insights by displaying parameter distributions.

In [None]:
# Simulating stellar parameter distributions
np.random.seed(42)
num_samples = 500

# Generate synthetic data for star properties
stellar_mass = np.random.normal(loc=1.0, scale=0.2, size=num_samples)  # Solar masses
luminosity = np.random.normal(loc=5.0, scale=1.0, size=num_samples)  # Solar luminosities
temperature = np.random.normal(loc=6000, scale=500, size=num_samples)  # Kelvin

# Combine into an array
data = np.vstack([stellar_mass, luminosity, temperature]).T

# Create corner plot (raw strings keep the LaTeX backslashes from being read as escape sequences)
figure = corner.corner(data, labels=[r"Mass (M$_{\odot}$)", r"Luminosity (L$_{\odot}$)", "Temperature (K)"],
                       quantiles=[0.16, 0.5, 0.84], show_titles=True, title_kwargs={"fontsize": 9})

plt.show()
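
The numbers behind the `quantiles=[0.16, 0.5, 0.84]` argument can be computed directly. The sketch below regenerates the same kind of synthetic samples and evaluates the 16th, 50th, and 84th percentiles (the median and an approximately 1$\sigma$ interval for a normal distribution), along with the correlation matrix. Because the three parameters here are drawn independently, the correlations come out close to zero, so this synthetic example will not show the tilted contours that correlated parameters produce.

```python
import numpy as np

# Same synthetic setup as the cell above
np.random.seed(42)
num_samples = 500
stellar_mass = np.random.normal(loc=1.0, scale=0.2, size=num_samples)   # Solar masses
luminosity = np.random.normal(loc=5.0, scale=1.0, size=num_samples)     # Solar luminosities
temperature = np.random.normal(loc=6000, scale=500, size=num_samples)   # Kelvin
data = np.vstack([stellar_mass, luminosity, temperature]).T

# Percentiles per column: each result has one entry per parameter
q16, q50, q84 = np.percentile(data, [16, 50, 84], axis=0)
for name, lo, med, hi in zip(["Mass", "Luminosity", "Temperature"], q16, q50, q84):
    print(f"{name}: {med:.2f} (+{hi - med:.2f} / -{med - lo:.2f})")

# Correlation matrix between the three parameters (columns of data)
corr = np.corrcoef(data, rowvar=False)
print(np.round(corr, 2))
```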

# **Conclusion**
These visualization techniques help astronomers interpret and analyze vast datasets efficiently. From simple scatter plots and histograms to density maps, 3D views, and corner plots, these tools provide insights into stellar distributions, satellite galaxy properties, and more. 🚀