# Project 6: Cepheid Variable Period-Luminosity Relation

## Astronomy Context

Cepheid variables are pulsating stars whose brightness oscillates with very regular periods, typically between about 1 and 100 days. Their pulsation period is tightly correlated with their intrinsic luminosity: the longer the period, the more luminous the star. This makes them "standard candles": measure the period to infer the true luminosity, compare it with the observed brightness, and the distance follows. Edwin Hubble used this method to measure distances to galaxies, which led to the discovery of the universe's expansion.

## Project Overview

In this project, we:
1. Download Cepheid catalog data from the OGLE survey via VizieR and Astroquery
2. Organize and clean the data using Pandas DataFrames
3. Fit the Period-Luminosity relation: magnitude = a * log10(period) + b
4. Perform statistical analysis including chi-squared tests
5. Create publication-quality visualizations
6. Analyze residuals and estimate uncertainties

In [12]:
# Import required libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from astropy.io import fits
from astropy.table import Table
import requests
from io import StringIO
import warnings
warnings.filterwarnings('ignore')

from astroquery.vizier import Vizier
from astropy.coordinates import SkyCoord
import astropy.units as u

## Data Acquisition with Astroquery

Download Cepheid variable data from the OGLE (Optical Gravitational Lensing Experiment) survey from VizieR using the astroquery library.

1. Set the Vizier row limit to 100 to keep the download small (useful if you switch to a larger Cepheid catalog)
2. Query catalog 'J/AcA/58/163', which contains VI light curves of LMC classical Cepheids (it can be swapped for another Cepheid catalog)
3. Extract the essential columns: OGLE star IDs, periods, and intensity-mean I-band magnitudes
4. Clean the data by removing entries with missing period or magnitude values
5. Save the cleaned dataset to CSV

The resulting dataset contains Cepheid periods and I-band magnitudes for fitting the Period-Luminosity relation. All stars in this catalog belong to the same galaxy, the Large Magellanic Cloud (LMC), so they lie at nearly the same distance and their apparent magnitudes can be compared directly.

In [29]:
# Set up Vizier
Vizier.ROW_LIMIT = 100

def download_ogle_lmc_cepheids():
    """
    Download Cepheid data (VI light curves of LMC classical Cepheids) from VizieR.
    Returns DataFrame with periods and magnitudes.
    """
    print("Downloading OGLE LMC Cepheid data...")
    
    try:
        # Query an OGLE-III LMC Cepheid catalog
        catalog = 'J/AcA/58/163'
        catalogs = Vizier.get_catalogs(catalog)
        
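        if len(catalogs) == 0:
            print("No data found.")
            # Return an empty frame so the code below does not index a missing table
            return pd.DataFrame(columns=['StarID', 'Period', 'I_mag'])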
        
        table = catalogs[0]
        df = table.to_pandas()
        
        print(f"Successfully downloaded {len(df)} Cepheids from LMC")
        
        # Create clean dataset with required columns
        clean_data = {
            'StarID': df['OGLE'],
            'Period': df['Per'],
            'I_mag': df['<Imag>']
        }

        result_df = pd.DataFrame(clean_data)

        # Drop invalid data
        result_df = result_df.dropna(subset=['Period', 'I_mag'])
        print(f"After cleaning: {len(result_df)} stars")

        return result_df
        
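    except Exception as e:
        print(f"Error downloading data: {e}")
        # Return an empty frame instead of None so len() and column access still work
        return pd.DataFrame(columns=['StarID', 'Period', 'I_mag'])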

# Download the data
cepheids_df = download_ogle_lmc_cepheids()

# Display results
print(f"\nFinal dataset: {len(cepheids_df)} Cepheids from LMC")
print(cepheids_df)

print(f"\nPeriod range: {cepheids_df['Period'].min():.1f} to {cepheids_df['Period'].max():.1f} days")
print(f"Magnitude range: {cepheids_df['I_mag'].min():.1f} to {cepheids_df['I_mag'].max():.1f} mag")

# Save to CSV
cepheids_df.to_csv('ogle_lmc_cepheids.csv', index=False)
print("\nData saved to 'ogle_lmc_cepheids.csv'")

Downloading OGLE LMC Cepheid data...
Successfully downloaded 23 Cepheids from LMC
After cleaning: 23 stars

Final dataset: 23 Cepheids from LMC
    StarID     Period   I_mag
0       15  11.394330  14.663
1    10959   5.104732  15.080
2    10922   8.586319  14.355
3    30044   3.156445  14.515
4    13290  18.001330  13.858
5    15404  14.267731  14.394
6    46771   4.233113  14.396
7    15662  13.062168  14.322
8    64106   5.089158  14.116
9    94835   3.656794  14.468
10   34606   7.941375  14.124
11   49313   8.597815  14.621
12  117011   4.659557  15.380
13   58179   9.981881  14.095
14  123188   1.430079  16.347
15   69984   9.243522  13.291
16      11  11.509272  13.962
17      24   2.981438  14.752
18   20290   6.112741  14.042
19     113   5.464574  14.767
20       3   4.818801  13.903
21      11   6.259100  14.231
22    4732   5.746628  15.039

Period range: 1.4 to 18.0 days
Magnitude range: 13.3 to 16.3 mag

Data saved to 'ogle_lmc_cepheids.csv'


## Data Exploration and Cleaning

In [30]:
# Define CepheidAnalyzer class
# Implement explore_data method for basic statistics
# Check data quality and remove invalid entries
# Display dataset information
# Data stored in a pandas DataFrame; the download above provides the columns
# 'StarID', 'Period', 'I_mag' ('I_mag_error', 'Type', etc. would have to be added separately)
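
The cell above only outlines the exploration step, so here is a minimal sketch of what it could look like. Only the names `CepheidAnalyzer` and `explore_data` come from the notes; the constructor, the returned dictionary, and the duplicate-ID check are assumptions made for illustration. The data are the 23 rows printed in the download output, typed in by hand so the cell runs without network access.

```python
import pandas as pd


class CepheidAnalyzer:
    """Hypothetical minimal analyzer; only the class and method names come from the notes."""

    def __init__(self, df):
        self.df = df.copy()

    def explore_data(self):
        """Drop missing or non-physical rows and return basic statistics."""
        df = self.df.dropna(subset=["Period", "I_mag"])
        df = df[df["Period"] > 0].reset_index(drop=True)
        self.df = df
        return {
            "n_stars": len(df),
            # Rows whose StarID appears more than once (all occurrences are counted)
            "n_duplicate_ids": int(df["StarID"].duplicated(keep=False).sum()),
            "summary": df[["Period", "I_mag"]].describe(),
        }


# The 23 rows shown in the download output above
cepheids_df = pd.DataFrame({
    "StarID": [15, 10959, 10922, 30044, 13290, 15404, 46771, 15662,
               64106, 94835, 34606, 49313, 117011, 58179, 123188, 69984,
               11, 24, 20290, 113, 3, 11, 4732],
    "Period": [11.394330, 5.104732, 8.586319, 3.156445, 18.001330, 14.267731,
               4.233113, 13.062168, 5.089158, 3.656794, 7.941375, 8.597815,
               4.659557, 9.981881, 1.430079, 9.243522, 11.509272, 2.981438,
               6.112741, 5.464574, 4.818801, 6.259100, 5.746628],
    "I_mag": [14.663, 15.080, 14.355, 14.515, 13.858, 14.394,
              14.396, 14.322, 14.116, 14.468, 14.124, 14.621,
              15.380, 14.095, 16.347, 13.291, 13.962, 14.752,
              14.042, 14.767, 13.903, 14.231, 15.039],
})

analyzer = CepheidAnalyzer(cepheids_df)
stats = analyzer.explore_data()
print(f"Stars after cleaning: {stats['n_stars']}")
print(f"Rows sharing a StarID: {stats['n_duplicate_ids']}")
print(stats["summary"])
```

The duplicate check is included because StarID 11 appears twice in the printed table; whether those rows are the same star is something the sketch does not decide.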

## Period-Luminosity Relation Fitting

In [31]:
# Define P-L model and fit using curve_fit
# Calculate parameters and uncertainties
# period_luminosity_model function: magnitude = a * log10(period) + b
# Using scipy.optimize.curve_fit for fitting

# curve_fit returns parameter uncertainties
# Error propagation for fitted parameters
# Magnitude errors included in fitting
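
The fitting step described above could be sketched as follows. The model function and the use of `scipy.optimize.curve_fit` follow the notes; the uniform 0.1 mag uncertainty is an assumption, because the downloaded DataFrame carries no error column, and the starting guess `p0` is arbitrary. The arrays repeat the 23 rows from the download output.

```python
import numpy as np
from scipy.optimize import curve_fit


def period_luminosity_model(period, a, b):
    """P-L relation: magnitude = a * log10(period) + b."""
    return a * np.log10(period) + b


periods = np.array([
    11.394330, 5.104732, 8.586319, 3.156445, 18.001330, 14.267731,
    4.233113, 13.062168, 5.089158, 3.656794, 7.941375, 8.597815,
    4.659557, 9.981881, 1.430079, 9.243522, 11.509272, 2.981438,
    6.112741, 5.464574, 4.818801, 6.259100, 5.746628,
])
i_mags = np.array([
    14.663, 15.080, 14.355, 14.515, 13.858, 14.394,
    14.396, 14.322, 14.116, 14.468, 14.124, 14.621,
    15.380, 14.095, 16.347, 13.291, 13.962, 14.752,
    14.042, 14.767, 13.903, 14.231, 15.039,
])

# ASSUMPTION: a uniform 0.1 mag uncertainty stands in for real photometric errors
mag_err = np.full_like(i_mags, 0.1)

popt, pcov = curve_fit(
    period_luminosity_model, periods, i_mags,
    p0=[-3.0, 17.0],       # arbitrary starting guess
    sigma=mag_err,
    absolute_sigma=True,   # treat sigma as real 1-sigma errors
)
a, b = popt
# 1-sigma parameter uncertainties from the diagonal of the covariance matrix
a_err, b_err = np.sqrt(np.diag(pcov))

print(f"slope a     = {a:.3f} +/- {a_err:.3f}")
print(f"intercept b = {b:.3f} +/- {b_err:.3f}")
```

With `absolute_sigma=True` the reported uncertainties scale directly with the assumed 0.1 mag, so they are only as trustworthy as that placeholder. Setting `absolute_sigma=False` instead rescales the covariance by the scatter of the residuals, which may be the safer choice until real magnitude errors are available.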

## Visualization

In [32]:
# Create P-L relation plot with fitted line
# Include residuals subplot
# log10(period) on x-axis, magnitude on y-axis
# Fitted line overlaid on data points
# Error bars included
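
A possible layout for the plot described above is a two-panel figure with the fit on top and the residuals underneath. This is a sketch under the same assumptions as before: the 0.1 mag error bars are placeholders, and the figure size, colors, and labels are arbitrary choices. The magnitude axis is inverted so that brighter stars appear higher, following the usual astronomical convention.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit


def period_luminosity_model(period, a, b):
    """P-L relation: magnitude = a * log10(period) + b."""
    return a * np.log10(period) + b


periods = np.array([
    11.394330, 5.104732, 8.586319, 3.156445, 18.001330, 14.267731,
    4.233113, 13.062168, 5.089158, 3.656794, 7.941375, 8.597815,
    4.659557, 9.981881, 1.430079, 9.243522, 11.509272, 2.981438,
    6.112741, 5.464574, 4.818801, 6.259100, 5.746628,
])
i_mags = np.array([
    14.663, 15.080, 14.355, 14.515, 13.858, 14.394,
    14.396, 14.322, 14.116, 14.468, 14.124, 14.621,
    15.380, 14.095, 16.347, 13.291, 13.962, 14.752,
    14.042, 14.767, 13.903, 14.231, 15.039,
])
mag_err = np.full_like(i_mags, 0.1)  # ASSUMPTION: placeholder error bars

(a, b), _ = curve_fit(period_luminosity_model, periods, i_mags, p0=[-3.0, 17.0])
log_p = np.log10(periods)
residuals = i_mags - period_luminosity_model(periods, a, b)

fig, (ax_fit, ax_res) = plt.subplots(
    2, 1, sharex=True, figsize=(7, 7),
    gridspec_kw={"height_ratios": [3, 1]},
)

# Top panel: data with error bars and the fitted line
ax_fit.errorbar(log_p, i_mags, yerr=mag_err, fmt="o", color="tab:blue",
                capsize=3, label="OGLE LMC Cepheids")
grid = np.linspace(periods.min(), periods.max(), 200)
ax_fit.plot(np.log10(grid), period_luminosity_model(grid, a, b),
            color="tab:red", label=f"fit: m = {a:.2f} log10(P) + {b:.2f}")
ax_fit.invert_yaxis()  # smaller magnitude means brighter
ax_fit.set_ylabel("I-band magnitude")
ax_fit.legend()

# Bottom panel: residuals (data minus model)
ax_res.errorbar(log_p, residuals, yerr=mag_err, fmt="o", color="tab:blue", capsize=3)
ax_res.axhline(0.0, color="tab:red", linestyle="--")
ax_res.set_xlabel(r"$\log_{10}(P\,/\,\mathrm{days})$")
ax_res.set_ylabel("Residual (mag)")

fig.tight_layout()
plt.show()
```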

## Statistical Analysis

In [33]:
# Calculate chi-squared, R², other goodness-of-fit metrics
# Basic residual analysis
# chi2 = np.sum((residuals / errors) ** 2)
# reduced_chi2 = chi2 / degrees_of_freedom
# Explicitly calculated and displayed
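
The goodness-of-fit formulas in the comments above can be written out as in the sketch below. The chi-squared and reduced chi-squared expressions are taken from the notes; R² and the RMS residual are computed in the standard way. As before, the uniform 0.1 mag error is an assumption, so the absolute chi-squared value depends entirely on that placeholder.

```python
import numpy as np
from scipy.optimize import curve_fit


def period_luminosity_model(period, a, b):
    """P-L relation: magnitude = a * log10(period) + b."""
    return a * np.log10(period) + b


periods = np.array([
    11.394330, 5.104732, 8.586319, 3.156445, 18.001330, 14.267731,
    4.233113, 13.062168, 5.089158, 3.656794, 7.941375, 8.597815,
    4.659557, 9.981881, 1.430079, 9.243522, 11.509272, 2.981438,
    6.112741, 5.464574, 4.818801, 6.259100, 5.746628,
])
i_mags = np.array([
    14.663, 15.080, 14.355, 14.515, 13.858, 14.394,
    14.396, 14.322, 14.116, 14.468, 14.124, 14.621,
    15.380, 14.095, 16.347, 13.291, 13.962, 14.752,
    14.042, 14.767, 13.903, 14.231, 15.039,
])
mag_err = np.full_like(i_mags, 0.1)  # ASSUMPTION: placeholder uncertainties

popt, _ = curve_fit(period_luminosity_model, periods, i_mags, p0=[-3.0, 17.0])
residuals = i_mags - period_luminosity_model(periods, *popt)

# Chi-squared and reduced chi-squared
chi2 = np.sum((residuals / mag_err) ** 2)
degrees_of_freedom = len(periods) - len(popt)  # data points minus fitted parameters
reduced_chi2 = chi2 / degrees_of_freedom

# Coefficient of determination and RMS scatter
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((i_mags - i_mags.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot
rms = np.sqrt(np.mean(residuals ** 2))

print(f"chi2         = {chi2:.1f}")
print(f"dof          = {degrees_of_freedom}")
print(f"reduced chi2 = {reduced_chi2:.2f}")
print(f"R^2          = {r_squared:.3f}")
print(f"RMS residual = {rms:.3f} mag")
```

A reduced chi-squared far above 1 would suggest that the assumed errors are too small or that the relation has intrinsic scatter the straight-line model does not capture; a value far below 1 would suggest overestimated errors.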

## Distance Estimation

In [34]:
# Simple distance calculation using P-L relation
# distance_modulus calculation included
# Explanation in markdown cells
# Demonstration: m = M + μ → μ = m - M
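
The distance step can be illustrated with the distance modulus, μ = m - M = 5 log10(d / 10 pc). The sketch below makes several assumptions that are not in the notes above: an LMC distance modulus of 18.5 mag (roughly the commonly adopted value) is used to turn the apparent-magnitude fit into an absolute-magnitude relation, extinction is ignored, the coefficients `a_fit` and `b_fit` are placeholders to be replaced with the fitted values, and the target star is hypothetical.

```python
import numpy as np

MU_LMC = 18.5  # ASSUMPTION: adopted LMC distance modulus in mag


def absolute_magnitude(period, a, b, mu_calibrator=MU_LMC):
    """Absolute magnitude from an apparent P-L fit made at a known distance modulus."""
    return a * np.log10(period) + (b - mu_calibrator)


def distance_modulus(apparent_mag, absolute_mag):
    """mu = m - M (extinction ignored)."""
    return apparent_mag - absolute_mag


def distance_pc(mu):
    """Invert mu = 5 * log10(d / 10 pc) to get the distance in parsecs."""
    return 10.0 ** (mu / 5.0 + 1.0)


# PLACEHOLDER coefficients: substitute the a and b obtained from the fit
a_fit, b_fit = -3.0, 17.0

# HYPOTHETICAL Cepheid in another galaxy: 10-day period, observed at m = 24.0 mag
target_period = 10.0
target_mag = 24.0

M = absolute_magnitude(target_period, a_fit, b_fit)
mu = distance_modulus(target_mag, M)
d_pc = distance_pc(mu)

print(f"M  = {M:.2f} mag")
print(f"mu = {mu:.2f} mag")
print(f"d  = {d_pc / 1e6:.2f} Mpc")

# Sanity check: the calibrator's own modulus corresponds to about 50 kpc
print(f"LMC distance for mu = {MU_LMC}: {distance_pc(MU_LMC) / 1e3:.1f} kpc")
```

Because every star in the sample sits in the LMC, the fit alone gives only apparent magnitudes; an external distance to the LMC (or another calibrator) is what sets the zero point of the absolute relation.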