# Tutorial 2: Galaxy Catalogues and Lensing

## Learning Objectives
By the end of this tutorial, you will understand:
- How to work with realistic galaxy catalogue data
- How gravitational lensing affects observed galaxy properties
- The relationship between intrinsic and observed galaxy sizes
- How to simulate lensing effects on galaxy catalogues

## Introduction

In the previous tutorial, we learned how to generate kappa maps from power spectra. Now we'll explore how these kappa values affect real galaxy observations. When light from distant galaxies passes through the gravitational field of intervening matter, several observable properties change:

- **Size**: Galaxies appear larger or smaller
- **Shape**: Galaxies become elongated (shear)
- **Brightness**: Surface brightness is conserved, but total flux changes
- **Position**: Slight deflections (usually negligible for most surveys)
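The size and brightness effects above follow from the lensing magnification. The standard weak-lensing form including shear is μ = 1/[(1-κ)² - |γ|²]. Below is a minimal sketch; the function name `lensing_magnification` is hypothetical. Setting γ = 0 recovers the shear-free form μ = 1/(1-κ)², which is what the catalogue code later in this tutorial uses.

```python
import numpy as np

def lensing_magnification(kappa, gamma=0.0):
    """Magnification from convergence kappa and shear magnitude |gamma|.

    mu = 1 / ((1 - kappa)^2 - |gamma|^2); gamma = 0 gives the shear-free
    form mu = 1 / (1 - kappa)^2.
    """
    kappa = np.asarray(kappa, dtype=float)
    gamma = np.abs(np.asarray(gamma, dtype=float))
    return 1.0 / ((1.0 - kappa)**2 - gamma**2)

# Flux scales with mu. Surface brightness is conserved, so the image area
# also scales with mu, and the geometric-mean size scales with sqrt(mu)
mu = lensing_magnification(0.02, 0.01)
flux_factor = mu
size_factor = np.sqrt(mu)
print(f"mu = {mu:.4f}, flux x{flux_factor:.4f}, size x{size_factor:.4f}")
```

Shear also changes the shape of the image (the ellipticity). This tutorial focuses only on how the overall size changes.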

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import healpy as hp
from scipy import stats
from astropy.cosmology import Planck18
from astropy import units as u
from astropy.coordinates import SkyCoord
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

# Set up plotting
plt.style.use('default')
plt.rcParams['figure.figsize'] = (10, 8)
plt.rcParams['font.size'] = 12
sns.set_palette("husl")

## 1. Creating a Synthetic Galaxy Catalogue

Let's start by creating a realistic synthetic galaxy catalogue with typical properties observed in surveys like SDSS, DES, or LSST.

In [None]:
def generate_galaxy_catalogue(n_galaxies=10000, area_deg2=100, seed=42):
    """
    Generate a synthetic galaxy catalogue
    
    Parameters:
    n_galaxies: number of galaxies
    area_deg2: survey area in square degrees
    seed: random seed
    """
    np.random.seed(seed)
    
    # Random positions on sky
    # For simplicity, use a small square patch
    ra_range = np.sqrt(area_deg2)  # degrees
    dec_range = np.sqrt(area_deg2)  # degrees
    
    ra = np.random.uniform(0, ra_range, n_galaxies)
    dec = np.random.uniform(-dec_range/2, dec_range/2, n_galaxies)
    
    # Redshift distribution (simplified)
    # Sum of an exponential and a gamma-distributed draw
    z = np.random.exponential(0.3, n_galaxies) + np.random.gamma(2, 0.2, n_galaxies)
    z = np.clip(z, 0.1, 3.0)  # Realistic redshift range
    
    # Galaxy types (0=elliptical, 1=spiral)
    galaxy_type = np.random.choice([0, 1], n_galaxies, p=[0.3, 0.7])
    
    # Intrinsic sizes (half-light radius in arcsec)
    # Different size distributions for different types
    r_half_intrinsic = np.zeros(n_galaxies)
    
    # Ellipticals: larger, log-normal distribution
    elliptical_mask = galaxy_type == 0
    r_half_intrinsic[elliptical_mask] = np.random.lognormal(np.log(1.5), 0.5, np.sum(elliptical_mask))
    
    # Spirals: smaller, different log-normal
    spiral_mask = galaxy_type == 1
    r_half_intrinsic[spiral_mask] = np.random.lognormal(np.log(0.8), 0.6, np.sum(spiral_mask))
    
    # Apparent magnitudes (simplified, no k-corrections)
    # Magnitude increases with redshift due to distance
    distance_modulus = 5 * np.log10(Planck18.luminosity_distance(z).to(u.pc).value) - 5
    absolute_mag = np.random.normal(-20, 1.5, n_galaxies)  # Absolute magnitude
    apparent_mag = absolute_mag + distance_modulus
    
    # Add photometric scatter (the magnitude limit is applied below)
    apparent_mag += np.random.normal(0, 0.1, n_galaxies)
    
    # Create catalogue
    catalogue = pd.DataFrame({
        'galaxy_id': np.arange(n_galaxies),
        'ra': ra,
        'dec': dec,
        'redshift': z,
        'galaxy_type': galaxy_type,
        'r_half_intrinsic': r_half_intrinsic,
        'apparent_mag': apparent_mag,
        'absolute_mag': absolute_mag
    })
    
    # Apply magnitude limit (typical for surveys). This cut uses unlensed
    # magnitudes, so magnification bias is not modelled here
    mag_limit = 24.5
    catalogue = catalogue[catalogue['apparent_mag'] < mag_limit].reset_index(drop=True)
    
    return catalogue

# Generate the catalogue
print("Generating galaxy catalogue...")
galaxy_cat = generate_galaxy_catalogue(n_galaxies=20000, area_deg2=25)

print(f"Generated catalogue with {len(galaxy_cat)} galaxies")
print(f"Redshift range: {galaxy_cat['redshift'].min():.2f} - {galaxy_cat['redshift'].max():.2f}")
print(f"Size range: {galaxy_cat['r_half_intrinsic'].min():.2f} - {galaxy_cat['r_half_intrinsic'].max():.2f} arcsec")
print(f"Magnitude range: {galaxy_cat['apparent_mag'].min():.1f} - {galaxy_cat['apparent_mag'].max():.1f}")

# Display first few rows
print("\nFirst 5 galaxies:")
print(galaxy_cat.head())
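The magnitude cut interacts with the distance modulus. A galaxy at luminosity distance d_L survives the cut only if its absolute magnitude is brighter than m_lim - DM, where DM = 5 log10(d_L / 10 pc). The sketch below uses hypothetical helper names and takes distances directly in parsecs; the notebook instead obtains them from `Planck18.luminosity_distance`. It shows why distant galaxies in a flux-limited sample are preferentially intrinsically bright (Malmquist bias).

```python
import numpy as np

def distance_modulus(d_L_pc):
    """Distance modulus DM = 5 log10(d_L / 10 pc) = 5 log10(d_L / pc) - 5."""
    return 5.0 * np.log10(np.asarray(d_L_pc, dtype=float)) - 5.0

def faintest_detectable_absolute_mag(d_L_pc, mag_limit=24.5):
    """Faintest absolute magnitude that still passes the apparent-magnitude limit."""
    return mag_limit - distance_modulus(d_L_pc)

# Illustrative distances: 10 pc, 1 Mpc, 1 Gpc
for d in [10.0, 1e6, 1e9]:
    print(f"d_L = {d:.0e} pc -> DM = {distance_modulus(d):.1f}, "
          f"M_max = {faintest_detectable_absolute_mag(d):.1f}")
```

The catalogue draws absolute magnitudes around -20. Once DM exceeds 24.5 + 20 = 44.5, the cut therefore keeps only galaxies brighter than that mean.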

## 2. Visualizing the Galaxy Catalogue

Let's explore the properties of our synthetic galaxy catalogue.

In [None]:
# Create comprehensive plots of catalogue properties
fig, axes = plt.subplots(2, 3, figsize=(18, 12))

# Sky positions
scatter = axes[0,0].scatter(galaxy_cat['ra'], galaxy_cat['dec'], 
                          c=galaxy_cat['redshift'], s=1, cmap='viridis', alpha=0.6)
axes[0,0].set_xlabel('RA (degrees)')
axes[0,0].set_ylabel('Dec (degrees)')
axes[0,0].set_title('Galaxy Positions (colored by redshift)')
plt.colorbar(scatter, ax=axes[0,0], label='Redshift')
axes[0,0].set_aspect('equal')

# Redshift distribution
axes[0,1].hist(galaxy_cat['redshift'], bins=50, alpha=0.7, density=True, 
              color='skyblue', edgecolor='black')
axes[0,1].set_xlabel('Redshift')
axes[0,1].set_ylabel('Probability Density')
axes[0,1].set_title('Redshift Distribution')
axes[0,1].grid(True, alpha=0.3)

# Size distribution by type
ellipticals = galaxy_cat[galaxy_cat['galaxy_type'] == 0]
spirals = galaxy_cat[galaxy_cat['galaxy_type'] == 1]

axes[0,2].hist(ellipticals['r_half_intrinsic'], bins=30, alpha=0.7, 
              label='Ellipticals', color='red', density=True)
axes[0,2].hist(spirals['r_half_intrinsic'], bins=30, alpha=0.7, 
              label='Spirals', color='blue', density=True)
axes[0,2].set_xlabel('Half-light Radius (arcsec)')
axes[0,2].set_ylabel('Probability Density')
axes[0,2].set_title('Intrinsic Size Distribution by Type')
axes[0,2].legend()
axes[0,2].grid(True, alpha=0.3)

# Magnitude vs redshift
scatter2 = axes[1,0].scatter(galaxy_cat['redshift'], galaxy_cat['apparent_mag'], 
                           c=galaxy_cat['galaxy_type'], s=1, alpha=0.6, cmap='coolwarm')
axes[1,0].set_xlabel('Redshift')
axes[1,0].set_ylabel('Apparent Magnitude')
axes[1,0].set_title('Magnitude vs Redshift (colored by type)')
plt.colorbar(scatter2, ax=axes[1,0], label='Galaxy Type', ticks=[0, 1])
axes[1,0].grid(True, alpha=0.3)

# Size vs redshift
axes[1,1].scatter(galaxy_cat['redshift'], galaxy_cat['r_half_intrinsic'], 
                 c=galaxy_cat['galaxy_type'], s=1, alpha=0.6, cmap='coolwarm')
axes[1,1].set_xlabel('Redshift')
axes[1,1].set_ylabel('Half-light Radius (arcsec)')
axes[1,1].set_title('Size vs Redshift (colored by type)')
axes[1,1].grid(True, alpha=0.3)

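# Elliptical fraction vs redshift
# Types are assigned independently of redshift, so this should be flat (~0.3)
# apart from shot noise: a sanity check rather than true evolution
z_bins = np.linspace(0.1, 3.0, 15)
z_centers = (z_bins[1:] + z_bins[:-1]) / 2
elliptical_fraction = []

for i in range(len(z_bins)-1):
    mask = (galaxy_cat['redshift'] >= z_bins[i]) & (galaxy_cat['redshift'] < z_bins[i+1])
    if np.sum(mask) > 0:
        frac = np.sum(galaxy_cat[mask]['galaxy_type'] == 0) / np.sum(mask)
        elliptical_fraction.append(frac)
    else:
        elliptical_fraction.append(np.nan)  # Empty bin: leave a gap instead of plotting a spurious 0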

axes[1,2].plot(z_centers, elliptical_fraction, 'o-', linewidth=2, markersize=6)
axes[1,2].set_xlabel('Redshift')
axes[1,2].set_ylabel('Elliptical Fraction')
axes[1,2].set_title('Elliptical Fraction vs Redshift')
axes[1,2].grid(True, alpha=0.3)
axes[1,2].set_ylim(0, 1)

plt.tight_layout()
plt.show()

# Print some statistics
print(f"\nCatalogue Statistics:")
print(f"Total galaxies: {len(galaxy_cat)}")
print(f"Ellipticals: {np.sum(galaxy_cat['galaxy_type'] == 0)} ({100*np.sum(galaxy_cat['galaxy_type'] == 0)/len(galaxy_cat):.1f}%)")
print(f"Spirals: {np.sum(galaxy_cat['galaxy_type'] == 1)} ({100*np.sum(galaxy_cat['galaxy_type'] == 1)/len(galaxy_cat):.1f}%)")
print(f"Median redshift: {np.median(galaxy_cat['redshift']):.2f}")
print(f"Median size: {np.median(galaxy_cat['r_half_intrinsic']):.2f} arcsec")
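The bin-by-bin loop used for the elliptical fraction can also be written in vectorized form with `np.histogram`. The sketch below uses a hypothetical helper name. It returns NaN for empty bins. Unlike the loop, `np.histogram` includes the right edge in its last bin.

```python
import numpy as np

def binned_fraction(values, flags, edges):
    """Fraction of True flags in each bin of `values`; NaN where a bin is empty."""
    values = np.asarray(values, dtype=float)
    flags = np.asarray(flags, dtype=bool)
    counts, _ = np.histogram(values, bins=edges)
    hits, _ = np.histogram(values, bins=edges, weights=flags.astype(float))
    with np.errstate(invalid='ignore', divide='ignore'):
        return np.where(counts > 0, hits / counts, np.nan)

z = np.array([0.2, 0.3, 1.2, 1.4, 1.5])
is_elliptical = np.array([True, False, True, True, False])
print(binned_fraction(z, is_elliptical, edges=[0.0, 1.0, 2.0, 3.0]))
```

With the catalogue, the equivalent call would be `binned_fraction(galaxy_cat['redshift'], galaxy_cat['galaxy_type'] == 0, z_bins)`.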

## 3. Adding Lensing Effects to the Catalogue

Now we'll apply gravitational lensing effects to our galaxy catalogue using a convergence (κ) map. To keep this notebook self-contained, we generate a simplified stand-in for the maps from the previous tutorial.

In [None]:
def get_kappa_at_positions(ra, dec, kappa_map, nside):
    """
    Get kappa values at galaxy positions from HEALPix map
    
    Parameters:
    ra, dec: galaxy positions in degrees
    kappa_map: HEALPix kappa map
    nside: HEALPix nside parameter
    """
    # Convert RA/Dec to theta/phi for HEALPix
    theta = np.radians(90 - dec)  # Colatitude
    phi = np.radians(ra)  # Longitude
    
    # Get pixel indices
    pix_indices = hp.ang2pix(nside, theta, phi)
    
    # Return kappa values
    return kappa_map[pix_indices]

def apply_lensing_effects(catalogue, kappa_values):
    """
    Apply lensing effects to galaxy catalogue
    
    Parameters:
    catalogue: galaxy catalogue DataFrame
    kappa_values: convergence values at galaxy positions
    """
    cat_lensed = catalogue.copy()
    
    # Calculate magnification (simplified: ignore shear)
    # μ = 1 / (1 - κ)²
    magnification = 1.0 / (1.0 - kappa_values)**2
    
    # Size scaling: observed size = intrinsic size × sqrt(magnification)
    size_scaling = np.sqrt(magnification)
    cat_lensed['r_half_observed'] = catalogue['r_half_intrinsic'] * size_scaling
    
    # Flux scaling: observed flux = intrinsic flux × magnification
    # Convert to magnitude: mag_obs = mag_intrinsic - 2.5 * log10(μ)
    mag_correction = -2.5 * np.log10(magnification)
    cat_lensed['apparent_mag_lensed'] = catalogue['apparent_mag'] + mag_correction
    
    # Add lensing-related columns
    cat_lensed['kappa'] = kappa_values
    cat_lensed['magnification'] = magnification
    cat_lensed['size_scaling'] = size_scaling
    cat_lensed['mag_correction'] = mag_correction
    
    return cat_lensed

# Generate a toy kappa map (a simplified stand-in for the Tutorial 1 maps)
print("Generating kappa map for lensing...")

# For this tutorial, we'll create a simple kappa map
# In practice, you'd use the more sophisticated version from Tutorial 1
def simple_kappa_map(nside=256, z_source=1.0, seed=42):
    """Simplified kappa map generation"""
    np.random.seed(seed)
    
    # Toy map: white noise smoothed with a Gaussian beam.
    # This is NOT drawn from a lensing power spectrum; use Tutorial 1 for that.
    npix = hp.nside2npix(nside)
    
    # Uncorrelated Gaussian noise, with a rough scaling of amplitude with source redshift
    sigma_kappa = 0.01 * z_source
    kappa = np.random.normal(0, sigma_kappa, npix)
    
    # Smooth the map to introduce spatial correlations (0.5 degree FWHM).
    # Smoothing also lowers the per-pixel standard deviation well below sigma_kappa
    kappa_smoothed = hp.smoothing(kappa, fwhm=np.radians(0.5))
    
    return kappa_smoothed

# Generate kappa map
nside = 256
median_z = np.median(galaxy_cat['redshift'])
kappa_map = simple_kappa_map(nside=nside, z_source=median_z)

print(f"Generated kappa map with nside={nside}")
print(f"Kappa statistics: mean={np.mean(kappa_map):.6f}, std={np.std(kappa_map):.6f}")
print(f"Kappa range: [{np.min(kappa_map):.6f}, {np.max(kappa_map):.6f}]")
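Why does the printed standard deviation come out well below `0.01 * z_source`? A rough flat-sky estimate helps here. Smoothing white noise of per-pixel std σ_pix with a Gaussian beam of width σ_b reduces the std by about sqrt(Ω_pix / (4π σ_b²)). The helper below is a hypothetical back-of-the-envelope check, not a healpy function. The estimate is only approximate in this case, because the 0.5° beam is comparable to the nside=256 pixel size.

```python
import numpy as np

def smoothed_white_noise_std(sigma_pix, nside, fwhm_deg):
    """Approximate std of white noise after Gaussian smoothing (flat-sky estimate).

    Var_smoothed ~= sigma_pix^2 * Omega_pix / (4 pi sigma_b^2), where
    Omega_pix = 4 pi / (12 nside^2) is the HEALPix pixel area in steradians.
    """
    omega_pix = 4.0 * np.pi / (12.0 * nside**2)
    sigma_b = np.radians(fwhm_deg) / (2.0 * np.sqrt(2.0 * np.log(2.0)))  # FWHM -> sigma
    return sigma_pix * np.sqrt(omega_pix / (4.0 * np.pi * sigma_b**2))

ratio = smoothed_white_noise_std(1.0, nside=256, fwhm_deg=0.5)
print(f"Smoothing reduces the std by a factor of roughly {ratio:.2f}")
```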

In [None]:
# Get kappa values at galaxy positions
print("Looking up kappa values at galaxy positions (nearest HEALPix pixel)...")
galaxy_kappa = get_kappa_at_positions(galaxy_cat['ra'], galaxy_cat['dec'], 
                                     kappa_map, nside)

# Apply lensing effects
print("Applying lensing effects to galaxy catalogue...")
galaxy_cat_lensed = apply_lensing_effects(galaxy_cat, galaxy_kappa)

print(f"\nLensing effects applied to {len(galaxy_cat_lensed)} galaxies")
print(f"Kappa at galaxy positions: [{np.min(galaxy_kappa):.6f}, {np.max(galaxy_kappa):.6f}]")
print(f"Magnification range: [{np.min(galaxy_cat_lensed['magnification']):.4f}, {np.max(galaxy_cat_lensed['magnification']):.4f}]")
print(f"Size scaling range: [{np.min(galaxy_cat_lensed['size_scaling']):.4f}, {np.max(galaxy_cat_lensed['size_scaling']):.4f}]")

# Display sample of lensed catalogue
print("\nSample of lensed catalogue:")
columns_to_show = ['galaxy_id', 'redshift', 'r_half_intrinsic', 'r_half_observed', 
                  'kappa', 'magnification', 'size_scaling']
print(galaxy_cat_lensed[columns_to_show].head(10))
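The printed ranges stay close to 1 because |κ| ≪ 1. In this limit, the exact relations in `apply_lensing_effects` reduce to a size scaling of 1/(1-κ) ≈ 1 + κ and a magnitude shift of 5 log10(1-κ) ≈ -(5/ln 10) κ ≈ -2.17 κ. This short check compares the exact and first-order forms over a few illustrative κ values:

```python
import numpy as np

kappa = np.array([-0.02, -0.01, 0.0, 0.01, 0.02])

# Exact shear-free relations, as used in apply_lensing_effects
mu = 1.0 / (1.0 - kappa)**2
size_exact = np.sqrt(mu)                 # equals 1 / (1 - kappa)
dmag_exact = -2.5 * np.log10(mu)         # equals 5 log10(1 - kappa)

# First-order (weak-lensing) approximations
size_linear = 1.0 + kappa
dmag_linear = -5.0 / np.log(10.0) * kappa

for k, se, sl, de, dl in zip(kappa, size_exact, size_linear, dmag_exact, dmag_linear):
    print(f"kappa={k:+.2f}  size {se:.5f} vs {sl:.5f}  dmag {de:+.5f} vs {dl:+.5f}")
```

The differences scale as κ², so the first-order forms are more than adequate for the κ values in this notebook.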

## 4. Comparing Intrinsic vs Observed Properties

Let's examine how lensing changes the observed galaxy properties compared to their intrinsic values.

In [None]:
# Comprehensive comparison of intrinsic vs observed properties
fig, axes = plt.subplots(2, 3, figsize=(18, 12))

# Size comparison
axes[0,0].scatter(galaxy_cat_lensed['r_half_intrinsic'], 
                 galaxy_cat_lensed['r_half_observed'], 
                 c=galaxy_cat_lensed['kappa'], s=1, alpha=0.6, cmap='RdBu_r')
# Add 1:1 line
size_min = min(galaxy_cat_lensed['r_half_intrinsic'].min(), 
              galaxy_cat_lensed['r_half_observed'].min())
size_max = max(galaxy_cat_lensed['r_half_intrinsic'].max(), 
              galaxy_cat_lensed['r_half_observed'].max())
axes[0,0].plot([size_min, size_max], [size_min, size_max], 'k--', alpha=0.8, label='No lensing')
axes[0,0].set_xlabel('Intrinsic Half-light Radius (arcsec)')
axes[0,0].set_ylabel('Observed Half-light Radius (arcsec)')
axes[0,0].set_title('Size: Intrinsic vs Observed')
axes[0,0].legend()
axes[0,0].grid(True, alpha=0.3)

# Magnitude comparison
scatter_mag = axes[0,1].scatter(galaxy_cat_lensed['apparent_mag'], 
                               galaxy_cat_lensed['apparent_mag_lensed'], 
                               c=galaxy_cat_lensed['kappa'], s=1, alpha=0.6, cmap='RdBu_r')
mag_min = min(galaxy_cat_lensed['apparent_mag'].min(), 
             galaxy_cat_lensed['apparent_mag_lensed'].min())
mag_max = max(galaxy_cat_lensed['apparent_mag'].max(), 
             galaxy_cat_lensed['apparent_mag_lensed'].max())
axes[0,1].plot([mag_min, mag_max], [mag_min, mag_max], 'k--', alpha=0.8, label='No lensing')
axes[0,1].set_xlabel('Intrinsic Apparent Magnitude')
axes[0,1].set_ylabel('Lensed Apparent Magnitude')
axes[0,1].set_title('Magnitude: Intrinsic vs Lensed')
axes[0,1].legend()
axes[0,1].grid(True, alpha=0.3)
plt.colorbar(scatter_mag, ax=axes[0,1], label='κ')

# Size scaling vs kappa
axes[0,2].scatter(galaxy_cat_lensed['kappa'], galaxy_cat_lensed['size_scaling'], 
                 s=1, alpha=0.6, color='green')
axes[0,2].axhline(1, color='red', linestyle='--', alpha=0.8, label='No scaling')
axes[0,2].axvline(0, color='red', linestyle='--', alpha=0.8)
axes[0,2].set_xlabel('Convergence κ')
axes[0,2].set_ylabel('Size Scaling Factor')
axes[0,2].set_title('Size Scaling vs Convergence')
axes[0,2].legend()
axes[0,2].grid(True, alpha=0.3)

# Histograms of size differences
size_ratio = galaxy_cat_lensed['r_half_observed'] / galaxy_cat_lensed['r_half_intrinsic']
axes[1,0].hist(size_ratio, bins=50, alpha=0.7, density=True, color='orange', edgecolor='black')
axes[1,0].axvline(1, color='red', linestyle='--', alpha=0.8, linewidth=2, label='No change')
axes[1,0].set_xlabel('Observed Size / Intrinsic Size')
axes[1,0].set_ylabel('Probability Density')
axes[1,0].set_title('Distribution of Size Ratios')
axes[1,0].legend()
axes[1,0].grid(True, alpha=0.3)

# Magnitude differences
mag_diff = galaxy_cat_lensed['apparent_mag_lensed'] - galaxy_cat_lensed['apparent_mag']
axes[1,1].hist(mag_diff, bins=50, alpha=0.7, density=True, color='purple', edgecolor='black')
axes[1,1].axvline(0, color='red', linestyle='--', alpha=0.8, linewidth=2, label='No change')
axes[1,1].set_xlabel('Magnitude Change (Lensed - Intrinsic)')
axes[1,1].set_ylabel('Probability Density')
axes[1,1].set_title('Distribution of Magnitude Changes')
axes[1,1].legend()
axes[1,1].grid(True, alpha=0.3)

# Lensing effects by galaxy type
elliptical_sizes = size_ratio[galaxy_cat_lensed['galaxy_type'] == 0]
spiral_sizes = size_ratio[galaxy_cat_lensed['galaxy_type'] == 1]

axes[1,2].hist(elliptical_sizes, bins=30, alpha=0.7, label='Ellipticals', 
              color='red', density=True)
axes[1,2].hist(spiral_sizes, bins=30, alpha=0.7, label='Spirals', 
              color='blue', density=True)
axes[1,2].axvline(1, color='black', linestyle='--', alpha=0.8, linewidth=2)
axes[1,2].set_xlabel('Observed Size / Intrinsic Size')
axes[1,2].set_ylabel('Probability Density')
axes[1,2].set_title('Size Ratios by Galaxy Type')
axes[1,2].legend()
axes[1,2].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Print statistics
print(f"\nLensing Effect Statistics:")
print(f"Mean size scaling: {np.mean(size_ratio):.4f} ± {np.std(size_ratio):.4f}")
print(f"Median size scaling: {np.median(size_ratio):.4f}")
print(f"Fraction of galaxies magnified (size > intrinsic): {np.sum(size_ratio > 1) / len(size_ratio):.3f}")
print(f"Fraction of galaxies demagnified (size < intrinsic): {np.sum(size_ratio < 1) / len(size_ratio):.3f}")
print(f"\nMagnitude changes:")
print(f"Mean magnitude change: {np.mean(mag_diff):.4f} ± {np.std(mag_diff):.4f}")
print(f"Fraction brightened: {np.sum(mag_diff < 0) / len(mag_diff):.3f}")
print(f"Fraction dimmed: {np.sum(mag_diff > 0) / len(mag_diff):.3f}")
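Even when κ is symmetric about zero, the mean size ratio sits slightly above 1. Because 1/(1-κ) = 1 + κ + κ² + ..., averaging gives ⟨1/(1-κ)⟩ ≈ 1 + ⟨κ⟩ + ⟨κ²⟩. A two-galaxy toy example makes this asymmetry visible:

```python
import numpy as np

# Two galaxies with equal and opposite convergence
kappa = np.array([-0.1, 0.1])
size_ratio = 1.0 / (1.0 - kappa)

print(f"Size ratios: {size_ratio}")
print(f"Mean size ratio: {size_ratio.mean():.5f}")
print(f"Second-order estimate 1 + <k> + <k^2>: {1 + kappa.mean() + np.mean(kappa**2):.5f}")
```

So roughly half the galaxies can be magnified while the mean ratio still exceeds 1 by about the variance of κ. For the small κ values in this notebook, that offset is negligible.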

## 5. Observational Challenges and Systematic Effects

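In real observations, several factors complicate the detection and measurement of lensing effects: atmospheric seeing, pixelization, photometric noise, and selection cuts. We model each of these below in simplified form.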

In [None]:
def add_observational_effects(catalogue, seeing_fwhm=0.8, pixel_scale=0.2, noise_level=0.1):
    """
    Add realistic observational effects to the catalogue
    
    Parameters:
    catalogue: galaxy catalogue
    seeing_fwhm: atmospheric seeing FWHM in arcsec
    pixel_scale: detector pixel scale in arcsec/pixel
    noise_level: relative noise level
    """
    cat_obs = catalogue.copy()
    
    # Seeing convolution broadens size measurements.
    # Approximating galaxy and PSF as Gaussians, half-light radii add in quadrature,
    # and a Gaussian PSF has a half-light radius of FWHM / 2
    psf_r_half = seeing_fwhm / 2.0
    size_seeing_broadened = np.sqrt(cat_obs['r_half_observed']**2 + psf_r_half**2)
    cat_obs['r_half_seeing'] = size_seeing_broadened
    
    # Pixelization effects
    # Add small random errors due to finite pixel size
    pixel_error = np.random.normal(0, pixel_scale * 0.1, len(cat_obs))
    cat_obs['r_half_pixelized'] = cat_obs['r_half_seeing'] + pixel_error
    cat_obs['r_half_pixelized'] = np.maximum(cat_obs['r_half_pixelized'], pixel_scale)  # Minimum size
    
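    # Measurement noise
    # For background-limited photometry, S/N scales linearly with flux.
    # Assumed normalization: S/N = 10 at the survey magnitude limit (m = 24.5)
    mag_ref, snr_ref = 24.5, 10.0
    snr = snr_ref * 10**(-0.4 * (cat_obs['apparent_mag_lensed'] - mag_ref))
    # Size measurement errors increase for fainter (lower S/N) galaxies
    size_error = noise_level * cat_obs['r_half_pixelized'] / np.sqrt(snr)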
    size_measurement_error = np.random.normal(0, size_error)
    cat_obs['r_half_measured'] = cat_obs['r_half_pixelized'] + size_measurement_error
    cat_obs['r_half_measured'] = np.maximum(cat_obs['r_half_measured'], pixel_scale/2)
    
    # Add error estimates
    cat_obs['r_half_error'] = size_error
    cat_obs['snr'] = snr
    
    return cat_obs

# Add observational effects
print("Adding observational effects...")
galaxy_cat_observed = add_observational_effects(galaxy_cat_lensed)

print(f"Added observational effects to {len(galaxy_cat_observed)} galaxies")
print(f"Size measurement errors: mean={np.mean(galaxy_cat_observed['r_half_error']):.4f} arcsec")
print(f"SNR range: [{np.min(galaxy_cat_observed['snr']):.1f}, {np.max(galaxy_cat_observed['snr']):.1f}]")
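The quadrature rule for seeing relies on a Gaussian approximation. For a circular 2D Gaussian of width σ, the fraction of flux enclosed within radius r is 1 - exp(-r²/2σ²). The half-light radius is therefore σ·sqrt(2 ln 2) = FWHM/2. Convolving two Gaussians adds their σ², so their half-light radii add in quadrature. Here is a quick numerical check with hypothetical helper names:

```python
import numpy as np

def enclosed_fraction(r, sigma):
    """Fraction of a circular 2D Gaussian's flux inside radius r."""
    return 1.0 - np.exp(-r**2 / (2.0 * sigma**2))

def gaussian_half_light_radius(sigma):
    """Radius enclosing half the flux: sigma * sqrt(2 ln 2) = FWHM / 2."""
    return sigma * np.sqrt(2.0 * np.log(2.0))

fwhm = 0.8  # seeing FWHM in arcsec
sigma_psf = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
r_psf = gaussian_half_light_radius(sigma_psf)
print(f"PSF sigma = {sigma_psf:.3f} arcsec, PSF half-light radius = {r_psf:.3f} arcsec")
print(f"Enclosed fraction at r_half: {enclosed_fraction(r_psf, sigma_psf):.3f}")

# Convolving Gaussians adds sigma^2, so half-light radii add in quadrature
sigma_gal = 0.5
r_conv = gaussian_half_light_radius(np.sqrt(sigma_gal**2 + sigma_psf**2))
r_quad = np.hypot(gaussian_half_light_radius(sigma_gal), r_psf)
print(f"Convolved r_half = {r_conv:.4f} arcsec, quadrature sum = {r_quad:.4f} arcsec")
```

Real galaxy profiles (exponential discs, de Vaucouleurs spheroids) are not Gaussian, so the quadrature rule is only an approximation for them.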

In [None]:
# Compare different stages of size measurement
fig, axes = plt.subplots(2, 2, figsize=(15, 12))

# Evolution of size measurements through observational pipeline
stages = ['r_half_intrinsic', 'r_half_observed', 'r_half_seeing', 'r_half_measured']
stage_labels = ['Intrinsic', 'Lensed', 'Seeing-convolved', 'Measured']
colors = ['blue', 'green', 'orange', 'red']

for stage, label, color in zip(stages, stage_labels, colors):
    axes[0,0].hist(galaxy_cat_observed[stage], bins=50, alpha=0.6, 
                  label=label, color=color, density=True)

axes[0,0].set_xlabel('Half-light Radius (arcsec)')
axes[0,0].set_ylabel('Probability Density')
axes[0,0].set_title('Size Distribution Evolution')
axes[0,0].legend()
axes[0,0].grid(True, alpha=0.3)
axes[0,0].set_xlim(0, 3)

# Size measurement accuracy vs magnitude
fractional_error = galaxy_cat_observed['r_half_error'] / galaxy_cat_observed['r_half_observed']
scatter = axes[0,1].scatter(galaxy_cat_observed['apparent_mag_lensed'], fractional_error, 
                           c=galaxy_cat_observed['snr'], s=1, alpha=0.6, cmap='viridis')
axes[0,1].set_xlabel('Apparent Magnitude')
axes[0,1].set_ylabel('Fractional Size Error')
axes[0,1].set_title('Size Measurement Precision vs Magnitude')
axes[0,1].set_yscale('log')
plt.colorbar(scatter, ax=axes[0,1], label='SNR')
axes[0,1].grid(True, alpha=0.3)

# Lensing signal vs noise
true_size_change = (galaxy_cat_observed['r_half_observed'] - 
                   galaxy_cat_observed['r_half_intrinsic']) / galaxy_cat_observed['r_half_intrinsic']
measured_size_change = (galaxy_cat_observed['r_half_measured'] - 
                       galaxy_cat_observed['r_half_intrinsic']) / galaxy_cat_observed['r_half_intrinsic']

axes[1,0].scatter(true_size_change, measured_size_change, 
                 c=galaxy_cat_observed['snr'], s=1, alpha=0.6, cmap='plasma')
change_range = [-0.2, 0.2]
axes[1,0].plot(change_range, change_range, 'k--', alpha=0.8, label='Perfect measurement')
axes[1,0].set_xlabel('True Fractional Size Change')
axes[1,0].set_ylabel('Measured Fractional Size Change')
axes[1,0].set_title('Lensing Signal: True vs Measured')
axes[1,0].legend()
axes[1,0].grid(True, alpha=0.3)

# Selection effects: detection completeness
# Galaxies below a certain size/SNR threshold would not be detected
size_threshold = 0.5  # arcsec
snr_threshold = 5.0

detected = ((galaxy_cat_observed['r_half_measured'] > size_threshold) & 
           (galaxy_cat_observed['snr'] > snr_threshold))

axes[1,1].hist(galaxy_cat_observed['r_half_measured'], bins=50, alpha=0.7, 
              label='All galaxies', color='blue', density=True)
axes[1,1].hist(galaxy_cat_observed[detected]['r_half_measured'], bins=50, alpha=0.7, 
              label=f'Detected (size>{size_threshold}", SNR>{snr_threshold})', 
              color='red', density=True)
axes[1,1].axvline(size_threshold, color='red', linestyle='--', alpha=0.8)
axes[1,1].set_xlabel('Measured Half-light Radius (arcsec)')
axes[1,1].set_ylabel('Probability Density')
axes[1,1].set_title('Selection Effects on Size Distribution')
axes[1,1].legend()
axes[1,1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"\nObservational Effects Summary:")
print(f"Detection completeness: {np.sum(detected) / len(galaxy_cat_observed):.3f}")
print(f"Mean fractional size error: {np.mean(fractional_error):.4f}")
print(f"Median fractional size error: {np.median(fractional_error):.4f}")
correlation = np.corrcoef(true_size_change, measured_size_change)[0,1]
print(f"Correlation between true and measured size changes: {correlation:.3f}")
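When choosing an S/N cut such as `snr_threshold`, it helps to translate it into a magnitude. For background-limited photometry, S/N scales linearly with flux, so S/N = S/N_ref · 10^(-0.4 (m - m_ref)). A threshold S/N_cut then corresponds to m_cut = m_ref + 2.5 log10(S/N_ref / S/N_cut). The reference point used below (S/N = 10 at m = 24.5) is an illustrative assumption, and the helper names are hypothetical.

```python
import numpy as np

def snr_from_mag(mag, mag_ref=24.5, snr_ref=10.0):
    """Background-limited S/N: proportional to flux, normalized at (mag_ref, snr_ref)."""
    return snr_ref * 10.0**(-0.4 * (np.asarray(mag, dtype=float) - mag_ref))

def mag_at_snr(snr_cut, mag_ref=24.5, snr_ref=10.0):
    """Magnitude at which the S/N drops to snr_cut."""
    return mag_ref + 2.5 * np.log10(snr_ref / snr_cut)

print(f"S/N at m = 22.0: {snr_from_mag(22.0):.0f}")
print(f"S/N = 5 reached at m = {mag_at_snr(5.0):.2f}")
```

Under this normalization, an S/N > 5 cut only removes galaxies fainter than m ≈ 25.25. That is beyond a 24.5 catalogue limit, so the size threshold would dominate the selection.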

## 6. Exercise: Detecting Lensing Signals

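Now let's try to recover the lensing signal from our simulated data. Unlike real observations, the simulation gives us each galaxy's true intrinsic size, so we can form observed-to-intrinsic size ratios directly. With real data, one would instead compare the mean observed sizes of galaxies in different κ bins.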

In [None]:
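def detect_lensing_signal(catalogue, bin_by='kappa', n_bins=10):
    """
    Attempt to detect the lensing signal by binning galaxies
    
    Parameters:
    catalogue: catalogue with lensing and observational columns
    bin_by: 'kappa' or 'position' (bins in RA)
    n_bins: number of bins between the 5th and 95th percentiles
    """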
    # Use only well-detected galaxies
    good_detections = ((catalogue['r_half_measured'] > 0.5) & 
                      (catalogue['snr'] > 5) & 
                      (catalogue['r_half_error'] / catalogue['r_half_measured'] < 0.3))
    
    cat_clean = catalogue[good_detections].copy()
    print(f"Using {len(cat_clean)} well-detected galaxies ({len(cat_clean)/len(catalogue):.2f} of total)")
    
    # Calculate size ratios
    size_ratio_true = cat_clean['r_half_observed'] / cat_clean['r_half_intrinsic']
    size_ratio_measured = cat_clean['r_half_measured'] / cat_clean['r_half_intrinsic']
    
    # Choose the quantity to bin by
    if bin_by == 'kappa':
        bin_values = cat_clean['kappa']
        bin_label = 'Convergence κ'
    elif bin_by == 'position':
        # Bin by spatial position (RA)
        bin_values = cat_clean['ra']
        bin_label = 'RA (degrees)'
    else:
        raise ValueError(f"bin_by must be 'kappa' or 'position', got {bin_by!r}")
    
    # Create bins
    bins = np.linspace(np.percentile(bin_values, 5), np.percentile(bin_values, 95), n_bins + 1)
    bin_centers = (bins[1:] + bins[:-1]) / 2
    
    # Calculate statistics in each bin
    mean_size_ratio_true = []
    mean_size_ratio_measured = []
    std_size_ratio_measured = []
    n_galaxies_per_bin = []
    
    for i in range(len(bins) - 1):
        mask = (bin_values >= bins[i]) & (bin_values < bins[i+1])
        if np.sum(mask) > 10:  # Need enough galaxies per bin
            mean_size_ratio_true.append(np.mean(size_ratio_true[mask]))
            mean_size_ratio_measured.append(np.mean(size_ratio_measured[mask]))
            std_size_ratio_measured.append(np.std(size_ratio_measured[mask]) / np.sqrt(np.sum(mask)))
            n_galaxies_per_bin.append(np.sum(mask))
        else:
            mean_size_ratio_true.append(np.nan)
            mean_size_ratio_measured.append(np.nan)
            std_size_ratio_measured.append(np.nan)
            n_galaxies_per_bin.append(0)
    
    return (bin_centers, mean_size_ratio_true, mean_size_ratio_measured, 
           std_size_ratio_measured, n_galaxies_per_bin)

# Detect lensing signal binned by kappa
print("Attempting to detect lensing signal...")
(kappa_bins, true_ratios, measured_ratios, 
 ratio_errors, n_per_bin) = detect_lensing_signal(galaxy_cat_observed, bin_by='kappa')

# Plot results
fig, axes = plt.subplots(1, 2, figsize=(15, 6))

# Size ratio vs kappa bins
valid_bins = ~np.isnan(measured_ratios)
axes[0].errorbar(kappa_bins[valid_bins], np.array(measured_ratios)[valid_bins], 
                yerr=np.array(ratio_errors)[valid_bins], 
                fmt='ro-', label='Measured', capsize=5, markersize=6)
axes[0].plot(kappa_bins[valid_bins], np.array(true_ratios)[valid_bins], 
            'bo-', label='True', markersize=6)
axes[0].axhline(1, color='black', linestyle='--', alpha=0.8, label='No lensing')
axes[0].set_xlabel('Convergence κ (bin centers)')
axes[0].set_ylabel('Mean Size Ratio (observed/intrinsic)')
axes[0].set_title('Lensing Signal Detection')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Number of galaxies per bin
axes[1].bar(kappa_bins[valid_bins], np.array(n_per_bin)[valid_bins], 
           width=(kappa_bins[1] - kappa_bins[0]) * 0.8, alpha=0.7, color='green')
axes[1].set_xlabel('Convergence κ (bin centers)')
axes[1].set_ylabel('Number of Galaxies')
axes[1].set_title('Sample Size per Bin')
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Calculate detection significance
# Null hypothesis: the mean size ratio does not depend on kappa (a constant).
# Testing against a constant rather than 1 matters because seeing broadening
# pushes every measured ratio above 1, regardless of lensing.
valid_measured = np.array(measured_ratios)[valid_bins]
valid_errors = np.array(ratio_errors)[valid_bins]
weights = 1.0 / valid_errors**2
constant_fit = np.sum(weights * valid_measured) / np.sum(weights)  # inverse-variance weighted mean
deviation = (valid_measured - constant_fit) / valid_errors
chi_squared = np.sum(deviation**2)
dof = len(valid_measured) - 1  # one fitted parameter (the constant)
p_value = stats.chi2.sf(chi_squared, dof)

print(f"\nLensing Detection Analysis:")
print(f"Valid bins: {np.sum(valid_bins)}")
print(f"Total galaxies in analysis: {np.sum(np.array(n_per_bin)[valid_bins])}")
print(f"χ² = {chi_squared:.2f} for {dof} degrees of freedom (p = {p_value:.3g})")
print(f"Reduced χ² = {chi_squared/dof:.2f}")
print(f"Maximum deviation from a constant ratio: {np.max(np.abs(deviation)):.2f}σ")

if p_value < 0.05:  # Rough ~2σ threshold
    print("✓ Lensing signal potentially detected!")
else:
    print("✗ No significant lensing signal detected.")
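Fitting a line to the binned ratios is a complementary test. In the weak-lensing limit the size ratio is 1/(1-κ) ≈ 1 + κ. An approximately κ-independent multiplicative bias c (for example, from seeing) then gives ratio ≈ c(1 + κ), so the slope-to-intercept ratio should be close to 1. A straight-line fit therefore separates the lensing trend from constant offsets. The sketch below implements inverse-variance weighted least squares using the closed-form normal equations. The helper name is hypothetical, and the data are toy numbers, not results from this notebook. With the notebook's arrays, you could pass `kappa_bins[valid_bins]`, the valid measured ratios, and their errors.

```python
import numpy as np

def weighted_linear_fit(x, y, sigma):
    """Inverse-variance weighted least-squares fit y = a + b x.

    Returns (a, b, sigma_b) from the closed-form normal equations.
    """
    x, y, sigma = (np.asarray(v, dtype=float) for v in (x, y, sigma))
    w = 1.0 / sigma**2
    S, Sx, Sy = w.sum(), (w * x).sum(), (w * y).sum()
    Sxx, Sxy = (w * x * x).sum(), (w * x * y).sum()
    delta = S * Sxx - Sx**2
    a = (Sxx * Sy - Sx * Sxy) / delta
    b = (S * Sxy - Sx * Sy) / delta
    sigma_b = np.sqrt(S / delta)
    return a, b, sigma_b

# Toy binned data: ratio = offset * (1 + kappa); the constant offset
# stands in for a kappa-independent bias such as seeing broadening
kappa_centers = np.array([-0.004, -0.002, 0.0, 0.002, 0.004])
ratios = 1.3 * (1.0 + kappa_centers)
errors = np.full_like(ratios, 0.001)

a, b, sigma_b = weighted_linear_fit(kappa_centers, ratios, errors)
print(f"intercept = {a:.3f}, slope = {b:.3f} +/- {sigma_b:.3f}")
print(f"slope / intercept = {b / a:.3f} (expected ~1 for ratio proportional to 1 + kappa)")
```

The significance of the trend is then roughly b / sigma_b. This statistic is insensitive to the overall offset that dominates a χ² test against a ratio of exactly 1.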

## Summary and Key Takeaways

In this tutorial, you learned:

1. **Galaxy Catalogue Structure**: Realistic galaxy catalogues contain positions, redshifts, sizes, magnitudes, and types, with appropriate correlations between these properties.

2. **Lensing Effects on Observations**: 
   - Magnification depends on convergence; neglecting shear, μ = 1/(1-κ)²
   - Fluxes scale as μ
   - Sizes scale as √μ, i.e. as 1/(1-κ)

3. **Observational Challenges**:
   - Atmospheric seeing broadens galaxy images
   - Pixelization introduces measurement errors
   - Noise increases measurement uncertainty
   - Selection effects bias the observed sample

4. **Detection Strategies**:
   - Bin galaxies by lensing strength (kappa)
   - Compare size distributions between bins
   - Require large samples for statistical significance
   - Account for observational systematics

5. **Statistical Considerations**:
   - Lensing effects are typically small (~few percent)
   - Large samples needed for detection
   - Careful error analysis essential
   - Selection effects can mimic or hide lensing signals

## Next Steps

In the next tutorial, we'll focus specifically on:
- Analyzing galaxy size distributions in detail
- Understanding convergence tests
- Comparing small patches to full surveys
- Statistical methods for robust analysis

## Questions for Further Exploration

1. How would increasing the survey area affect lensing detectability?
2. What if we had perfect size measurements (no observational errors)?
3. How do different galaxy selection criteria affect the lensing analysis?
4. Can we use other galaxy properties besides size to detect lensing?
5. How do systematic errors in redshift measurements propagate to lensing analysis?