
# Background on APOGEE, the Cannon, Notes on Lab 2
Astro 128/256 (UC Berkeley, 2024)

## APOGEE

<img src="https://www.sdss.org/wp-content/uploads/2014/07/apogee_v3.jpg" width=500 height=500>


* APOGEE = Apache Point Observatory Galactic Evolution Experiment
* APOGEE is a near-IR spectroscopy survey that is part of the Sloan Digital Sky Survey
* APOGEE (APOGEE-1): 2011-2014, part of SDSS-III (2008-2014)
* APOGEE-2: 2014-2020, part of SDSS-IV
* APOGEE-2 maps the dynamical and chemical patterns of Milky Way stars with data from:
 * 1-meter New Mexico State University (NMSU) Telescope
 * 2.5-meter Sloan Foundation Telescope at the Apache Point Observatory (APO) in New Mexico (APOGEE-2N)
 * 2.5-meter du Pont Telescope at Las Campanas Observatory (LCO) in Chile (APOGEE-2S)
* References: [Holtzman et al. 2015](https://arxiv.org/abs/1501.04110), [Majewski et al. 2017](https://arxiv.org/abs/1509.05420)
* APOGEE-3 (Milky Way Mapper): 2021-2025(?), part of [SDSS-V](https://www.sdss5.org)

### APOGEE Key Science Goals

* What is the history of star formation and chemical enrichment of the Milky Way?
* What are the dynamics of the disk, bulge, and halo of the Milky Way?
* What is the age distribution of stars in the Milky Way?
* Do planet-hosting stars have different properties than stars that have no planets?

### APOGEE Technical Details

* Bright-time observations at APO and LCO
* Duration: Fall 2014 - Fall 2020
* Fiber Complement: 300 fibers per 7 deg$^2$ plate (APO 2.5-m) or 3.5 deg$^2$ plate (LCO 2.5-m), or 10 fibers (APO 1-m)
* Wavelength Range: $1.51$-$1.70$ $\mu$m
* Spectral Resolution: R$\sim$22,500
* APOGEE-1+2 Sample Size: $\sim$263,000 stars (APOGEE-3: 4 million stars)
* Signal-to-Noise Goal: S/N $>$ 100 per pixel
* Radial Velocity Precision: $\sim$200 m/s
* Element Abundance Precision: $\sim$0.1 dex for 20 calibrated species
* Magnitude Limit: H-band magnitude $\lesssim 13$
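
To make the resolution figure concrete, here is a minimal sketch that converts the quoted R into the size of one resolution element, using $R = \lambda / \Delta\lambda$. The mid-band wavelength of 1.6 $\mu$m and the helper name `resolution_element` are illustrative assumptions, not survey specifications.

```python
# Back-of-the-envelope numbers implied by the quoted resolving power.
C_KM_S = 299_792.458  # speed of light in km/s

def resolution_element(wavelength_angstrom, resolving_power):
    """Return (delta_lambda in Angstrom, delta_v in km/s) for R = lambda / delta_lambda."""
    delta_lambda = wavelength_angstrom / resolving_power
    delta_v = C_KM_S / resolving_power
    return delta_lambda, delta_v

R = 22_500            # resolving power quoted in the list above
lam_mid = 16_000.0    # assumed mid-band wavelength in Angstrom (1.6 micron)
dlam, dv = resolution_element(lam_mid, R)
print(f"delta_lambda ~ {dlam:.2f} A, delta_v ~ {dv:.1f} km/s")
```

With these numbers, one resolution element is about 0.7 Angstrom wide, or about 13 km/s in velocity, so the quoted radial velocity precision of $\sim$200 m/s corresponds to roughly 1.5% of a resolution element.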

* The APOGEE-2 survey footprint, overlaid on an infrared image of the Milky Way. Each dot shows a position where APOGEE obtains at least 250 stellar spectra.

<img src="https://www.sdss.org/wp-content/uploads/2014/06/apogee2-survey-high.jpg" width=700 height=700>

* Example APOGEE spectra:

<img src="https://blog.sdss.org/wp-content/uploads/2015/01/apogee_tempsequence_new2.png?w=300" width=800 height=800>


## Stellar Parameters for APOGEE Spectra

### ASPCAP


* APOGEE Stellar Parameters and Chemical Abundances Pipeline
* Fits physically motivated stellar atmosphere models to all of the APOGEE stellar spectra
 * 1D, plane-parallel Kurucz stellar models ($T > 3500$ K)
* Uses basic $\chi^2$ minimization
* Written in IDL
* Largely treated as "ground truth" for the APOGEE Survey
* Reference: [Garcia-Perez et al. 2015](http://adsabs.harvard.edu/abs/2015arXiv151007635G)
* However, it is very slow to run, making it impractical for fitting $\sim 10^5$-$10^6$ stellar spectra
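
The $\chi^2$ minimization idea can be sketched on a toy problem. This is not ASPCAP: `toy_model` is a hypothetical stand-in for a synthetic spectrum (a single Gaussian line whose depth depends on temperature), the data are simulated, and the fit is a brute-force search over a one-parameter grid.

```python
import numpy as np

def chi2(flux, sigma, model):
    """Chi-squared between an observed spectrum and one model spectrum."""
    return np.sum(((flux - model) / sigma) ** 2)

def toy_model(wave, teff):
    """Hypothetical stand-in for a synthetic spectrum: one absorption line
    whose depth shrinks linearly with temperature. Not a real atmosphere model."""
    depth = 0.8 - 1.0e-4 * (teff - 3500.0)
    return 1.0 - depth * np.exp(-0.5 * ((wave - 16000.0) / 2.0) ** 2)

rng = np.random.default_rng(0)
wave = np.linspace(15980.0, 16020.0, 200)   # wavelength grid in Angstrom
sigma = np.full(wave.size, 0.01)            # per-pixel uncertainty (S/N = 100)
true_teff = 4800.0
flux = toy_model(wave, true_teff) + rng.normal(0.0, sigma)

# Evaluate chi^2 for every model on the grid and keep the minimum.
teff_grid = np.arange(3500.0, 6001.0, 50.0)
chi2_grid = np.array([chi2(flux, sigma, toy_model(wave, t)) for t in teff_grid])
best_teff = teff_grid[np.argmin(chi2_grid)]
print(f"best-fit Teff on the grid: {best_teff:.0f} K (true value {true_teff:.0f} K)")
```

Each grid point needs a full model evaluation, and a real fit searches several parameters at once over thousands of pixels, which gives a sense of why this approach becomes expensive for very large samples.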

### The Cannon 

* Named in homage to Annie Jump Cannon
* Data-driven model for determining stellar parameters, also called stellar "labels" (e.g., T$_{\rm eff}$, log(g), [Fe/H])
* The idea is to select a set of stars whose derived labels we trust
* For example, stars whose labels are very well determined by ASPCAP or by other high-resolution spectra
 * This set is called the "Training Set"
* Then build a model that transfers the labels of the training set to all other APOGEE stars
* Reference: [Ness et al. 2015](https://arxiv.org/abs/1501.07604)
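
The two steps above (train on trusted labels, then transfer labels to other stars) can be sketched with a quadratic-in-labels spectral model of the kind described by Ness et al. (2015). This is a simplified illustration under stated assumptions: the data are synthetic, the labels are assumed to be already offset and scaled, the fit is unweighted least squares (it ignores per-pixel uncertainties and the intrinsic scatter term), and the function names `design_matrix`, `train`, and `infer_labels` are hypothetical.

```python
import numpy as np

def design_matrix(labels):
    """Quadratic label vector per star: [1, l_i, l_i * l_j for i <= j]."""
    labels = np.atleast_2d(labels)
    n_star, n_label = labels.shape
    i, j = np.triu_indices(n_label)
    return np.hstack([np.ones((n_star, 1)), labels, labels[:, i] * labels[:, j]])

def train(flux, labels):
    """Training step: least-squares coefficients theta for every pixel at once.
    flux has shape (n_star, n_pixel); theta has shape (n_coeff, n_pixel)."""
    theta, *_ = np.linalg.lstsq(design_matrix(labels), flux, rcond=None)
    return theta

def infer_labels(flux_star, theta, start, n_iter=20, eps=1e-4):
    """Test step: Gauss-Newton fit of one star's labels with theta held fixed."""
    labels = np.array(start, dtype=float)
    for _ in range(n_iter):
        model = design_matrix(labels)[0] @ theta
        # Finite-difference Jacobian d(model)/d(label), shape (n_pixel, n_label).
        jac = np.column_stack([
            (design_matrix(labels + eps * np.eye(labels.size)[k])[0] @ theta - model) / eps
            for k in range(labels.size)
        ])
        step, *_ = np.linalg.lstsq(jac, flux_star - model, rcond=None)
        labels += step
    return labels

# Synthetic "survey": 200 stars, 3 scaled labels, 50 pixels.
rng = np.random.default_rng(1)
n_star, n_label, n_pixel = 200, 3, 50
true_labels = rng.uniform(-1.0, 1.0, size=(n_star, n_label))
n_coeff = 1 + n_label + n_label * (n_label + 1) // 2
theta_true = np.vstack([
    np.ones((1, n_pixel)),                                          # continuum level
    rng.normal(0.0, 0.05, size=(n_label, n_pixel)),                 # linear terms
    rng.normal(0.0, 0.005, size=(n_coeff - 1 - n_label, n_pixel)),  # quadratic terms
])
flux = design_matrix(true_labels) @ theta_true
flux += rng.normal(0.0, 0.002, size=flux.shape)

theta = train(flux[:150], true_labels[:150])               # training set: first 150 stars
recovered = infer_labels(flux[150], theta, start=np.zeros(n_label))
print("recovered:", recovered)
print("true:     ", true_labels[150])
```

The training step is linear in the coefficients, so it reduces to one least-squares solve per pixel. The test step is nonlinear in the labels, which is why it needs an iterative optimizer.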

## Lab 2

* You're going to build a version of "The Cannon" and measure stellar labels for APOGEE stars.
  * You will follow, more or less, the method of Ness et al. (2015)

* A few important concepts to think about:
  * __Training Set__: The set of data from which your data-driven model will be calibrated (or "trained")
  * __Over-fitting__: A modeling error that occurs when a function is fit too closely to a limited set of data points
  * __Cross-validation__: Any of various model validation techniques for assessing how the results of a statistical analysis will generalize to an independent data set
  * For this lab, you'll use only part of your data to train and the other part to validate; this is one way to guard against over-fitting
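
The train/validate split described above can be illustrated with a toy example. This sketch uses synthetic one-dimensional data and polynomial fits rather than spectra; the 70/30 split fraction and the polynomial degrees are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-1.0, 1.0, size=40)
y = np.sin(2.0 * x) + rng.normal(0.0, 0.1, size=x.size)   # noisy synthetic data

# Random split: 70% of the points train the model, the rest validate it.
order = rng.permutation(x.size)
n_train = int(0.7 * x.size)
train_idx, valid_idx = order[:n_train], order[n_train:]

def rms(residual):
    """Root-mean-square of a residual array."""
    return np.sqrt(np.mean(residual ** 2))

results = {}
for degree in (1, 3, 12):
    # Fit on the training points only, then score on both subsets.
    coeffs = np.polyfit(x[train_idx], y[train_idx], degree)
    train_rms = rms(y[train_idx] - np.polyval(coeffs, x[train_idx]))
    valid_rms = rms(y[valid_idx] - np.polyval(coeffs, x[valid_idx]))
    results[degree] = (train_rms, valid_rms)
    print(f"degree {degree:2d}: train RMS = {train_rms:.3f}, validation RMS = {valid_rms:.3f}")
```

The training RMS can only go down as the model gets more flexible, so it cannot diagnose over-fitting on its own. A validation RMS that stalls or rises while the training RMS keeps falling is the usual warning sign.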
 
* __Checkpoint 1:__  
  * This checkpoint covers data gathering and spectral exploration
  * Note that the data gathering is non-trivial, and figuring it out is part of this lab