This repo has code and notebooks for our final project.
Team ExoPlanet focused on helping astronomers and scientists understand the different machine learning algorithms used to detect exoplanets. We use data from NASA's Kepler and TESS satellite missions, which record star brightness over time as light curves; periodic dips in brightness that may indicate a transiting planet are called threshold crossing events (TCEs). We apply existing planet validation algorithms to these data and compare the results on a user-friendly website. In addition, we built our own detection model, which slightly improves the accuracy of exoplanet validation. We intend our website to contribute to peer and industry learning about exoplanet validation using machine learning techniques rather than manual visual inspection.
The W2P model builds in part on earlier work on the AstroNet model by Shallue and Vanderburg, with some further inspiration from Firmino.
- Get the list of TCEs from the Kepler website
- Download the raw data files (.FITS) for all TCEs from the Mikulski Archive; a script creates a batch file that retrieves them one by one
- Process the data files into global and local vectors representing light curves, using the existing Kepler processing pipeline
- Create PNGs of the light curves and move them to S3 for use in Tableau
- Use the global and local vectors to build the w2p CNN model for TCE classification, and add those results to runs from Robovetter and Autovetter
- Add classification results from Triceratops
- Create the output file used by Tableau
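The "global and local vectors" step above can be sketched with plain NumPy: phase-fold the light curve on the candidate period and median-bin the folded curve into fixed-length views, in the spirit of the AstroNet preprocessing. This is only an illustrative sketch — the toy light curve, the bin counts (201 global, 61 local), and the helper names are assumptions, not the pipeline's actual values.

```python
import numpy as np

def phase_fold(time, period, t0):
    """Fold observation times on the candidate period, centered on the transit epoch t0."""
    half = period / 2
    return np.mod(time - t0 + half, period) - half

def median_bin(phase, flux, num_bins, width):
    """Median-bin a folded light curve into a fixed-length view vector."""
    edges = np.linspace(-width / 2, width / 2, num_bins + 1)
    view = np.empty(num_bins)
    for i in range(num_bins):
        mask = (phase >= edges[i]) & (phase < edges[i + 1])
        view[i] = np.median(flux[mask]) if mask.any() else 0.0
    return view

# Toy light curve: flat flux with a 1% dip once per `period` days.
time = np.linspace(0, 90, 5000)
period, t0 = 10.0, 5.0
flux = np.ones(5000)
flux[np.abs(phase_fold(time, period, t0)) < 0.2] = 0.99

phase = phase_fold(time, period, t0)
global_view = median_bin(phase, flux, num_bins=201, width=period)  # whole orbit
local_view = median_bin(phase, flux, num_bins=61, width=2.0)       # zoom on the transit
```

The global view captures the full orbital phase while the local view zooms in on the transit itself; both are fixed-length, so they can feed a CNN directly.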
- /join-to-tess: This folder contains notebooks used to join Kepler space telescope threshold crossing events (TCEs) to TESS space telescope data. It is used in the classification with the `triceratops` model. The Kepler dataset is called `full_tce_list.csv` and can be found in the `kepler-robovetter` folder. The TESS dataset is called `CTL_v8_ExoFOP-TESS.csv` and can be downloaded from ExoFOP-TESS.
  - `join.ipynb`: takes in the Kepler TCE list file and finds the corresponding TESS object IDs.
  - `Get Target Pixel File Counts.ipynb`: finds the number of target pixel files each TESS object ID has. This file is necessary for the `triceratops` tool classification.
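The Kepler-to-TESS join performed in `join.ipynb` can be sketched with a pandas left join. Everything below is a hypothetical miniature: the column names (`kepid`, `tic_id`), the ID values, and the join key are stand-ins — the real notebook works on the full `full_tce_list.csv` and `CTL_v8_ExoFOP-TESS.csv` files and may match on different columns.

```python
import pandas as pd

# Hypothetical stand-ins for full_tce_list.csv and CTL_v8_ExoFOP-TESS.csv;
# the real files have many more columns, and the matching key may differ.
kepler_tces = pd.DataFrame({
    "kepid": [10797460, 10811496, 10848459],
    "tce_period": [9.49, 19.90, 1.74],
})
tess_ctl = pd.DataFrame({
    "kepid": [10797460, 10848459],        # assumed shared identifier for the join
    "tic_id": [377780790, 268644785],     # made-up TESS object IDs
})

# A left join keeps every Kepler TCE and attaches a TESS object ID where one exists.
joined = kepler_tces.merge(tess_ctl, on="kepid", how="left")
```

TCEs without a TESS counterpart come through with a missing `tic_id`, which is useful downstream for knowing which candidates `triceratops` cannot classify.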
- /triceratops: This folder contains our running version of the Triceratops model. See the more detailed description below.
  - `triceratops.ipynb`: takes in planet candidate entries and outputs the probability of each being a planet, as well as classifications (false positive or planet candidate) derived from those probabilities.
  - `join.ipynb`: takes in the results of the above classification as well as the w2p classification and merges the datasets together.
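Turning probabilities into the two labels can be sketched as a simple threshold rule. The 0.5 cutoff and the function name here are illustrative assumptions, not necessarily what `triceratops.ipynb` uses.

```python
def classify(prob_planet, threshold=0.5):
    """Map a planet probability to a label; the 0.5 cutoff is illustrative."""
    return "planet candidate" if prob_planet >= threshold else "false positive"

probs = [0.93, 0.12, 0.55, 0.07]
labels = [classify(p) for p in probs]
# labels -> ["planet candidate", "false positive", "planet candidate", "false positive"]
```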
- /w2p: This folder contains our exoplanet classification model.
  - `create_tableau_data_file.ipynb`: reads in the output csv file from the Triceratops folder (which begins with the output csv from the w2p model) and creates `forweb3.csv`, the data file used for our Tableau visualization.
  - `exoplanet_model_v3.ipynb`: creates the w2p deep learning model that classifies TCEs as either planets or non-planets. It outputs a csv file with its results into /processed_data; for our core CNN model that file is `w2p_cnn_final.csv`.
  - `forweb3.csv`: see above.
  - `get_light_curves.py`: master script for retrieving raw light curve data files from online archives. Calls `make_light_curve_batch.py`, which creates a batch file (`get_kepler.sh`), and then runs the batch file (with timing).
  - `make_dataset.py`: runs the entire Kepler processing pipeline on the raw light curve (.FITS) files downloaded into /raw_data and stores the results in /processed_data. This step can be parallelized.
  - `make_png.py`: makes PNG light curves from the processed files. `s3_upload_png.py` then moves them to an S3 bucket.
  - /processed_data: This folder holds the processed light curve data, stored in two files that contain the training data: `globalbinned_df.csv` and `localbinned_df.csv`. Not stored on GitHub due to space constraints.
    - `w2p_cnn_final.csv`: the output file from the model showing its classifications.
  - /light_curve_png: this folder stores all the light curve PNGs that are created and then uploaded to S3.
  - /raw_data
    - `make_light_curve_batch.py`: Python script that creates a batch file in the light_curves directory, which can be run to download the thousands of .FITS light curves required for analysis.
    - /light_curves: this folder stores all the downloaded FITS files. Not stored on GitHub due to size.
    - `get_kepler.sh`: batch file that retrieves light curves and stores them in this directory.
  - `s3_upload_png.py`: takes the PNGs created by `make_png.py` and uploads them to an S3 bucket so they can be used in Tableau visualizations.
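A two-branch CNN over the global and local views — the general architecture the w2p model follows — can be sketched in Keras. This is a minimal illustration, not the actual `exoplanet_model_v3.ipynb` network: the input lengths, filter counts, and layer sizes here are assumptions.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Illustrative input lengths; the real binned vectors may use different sizes.
global_in = keras.Input(shape=(201, 1), name="global_view")
local_in = keras.Input(shape=(61, 1), name="local_view")

def conv_branch(x, filters):
    """One small 1D-convolutional branch: convolve, pool, flatten."""
    x = layers.Conv1D(filters, kernel_size=5, activation="relu")(x)
    x = layers.MaxPooling1D(pool_size=2)(x)
    return layers.Flatten()(x)

# Each view gets its own convolutional branch; the features are then merged.
merged = layers.concatenate([conv_branch(global_in, 16), conv_branch(local_in, 16)])
dense = layers.Dense(64, activation="relu")(merged)
output = layers.Dense(1, activation="sigmoid", name="planet_prob")(dense)

model = keras.Model(inputs=[global_in, local_in], outputs=output)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```

The sigmoid output is a planet probability per TCE, which is what gets thresholded into the planet / non-planet labels written to `w2p_cnn_final.csv`.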
- `catalog_tab3.twbx`: Tableau workbook. Published to Tableau Public and embedded in the project website.
The `triceratops` tool is used to validate planet candidates, and it uses data from the TESS space telescope.

The `triceratops` package can be installed with the following command:

`pip install triceratops`

More on `triceratops` can be found in the tool creators' triceratops repo.
- https://docs.astropy.org/en/stable/io/fits/
- https://exoplanetarchive.ipac.caltech.edu/docs/API_kepcandidate_columns.html
- https://exoplanetarchive.ipac.caltech.edu/docs/API_tce_columns.html
- https://exoplanetarchive.ipac.caltech.edu/cgi-bin/TblView/nph-tblView?app=ExoTbls&config=tce