This repo has code and notebooks for our final project.
Team ExoPlanet focused on helping astronomers and scientists understand the different machine learning algorithms used to detect exoplanets. We use data from NASA's Kepler and TESS satellite missions, which record star brightness over time as light curves; periodic dips in brightness that may indicate a transiting planet are called threshold crossing events (TCEs). We apply existing planet validation algorithms to these data and compare the results on a user-friendly website. In addition, we built our own detection model, which slightly improves the accuracy of exoplanet validation. We intend our website to contribute to peer and industry learning about exoplanet validation using machine learning techniques rather than manual visual inspection.
The W2P model builds in part on earlier work on the AstroNet model by Shallue and Vanderburg, with some further inspiration from Firmino.
- Get the list of TCEs from the Kepler website
- Download the raw data files (.FITS) for all TCEs from the Mikulski Archive; a script creates a batch file that retrieves them one by one
- Process the data files into global and local vectors representing light curves, using the existing Kepler processing pipeline
- Create PNGs of the light curves and move them to S3 for use in Tableau
- Use the global and local vectors to build the w2p CNN model for TCE classification, and add those results to runs from Robovetter and Autovetter
- Add classification results from Triceratops
- Create the output file used by Tableau
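The "global and local vectors" step above can be sketched with plain NumPy: phase-fold the light curve on the candidate period and median-bin the folded curve into fixed-length views, in the spirit of the AstroNet preprocessing. This is only an illustrative sketch — the toy light curve, the bin counts (201 global, 61 local), and the helper names are assumptions, not the pipeline's actual values.

```python
import numpy as np

def phase_fold(time, period, t0):
    """Fold observation times on the candidate period, centered on the transit epoch t0."""
    half = period / 2
    return np.mod(time - t0 + half, period) - half

def median_bin(phase, flux, num_bins, width):
    """Median-bin a folded light curve into a fixed-length view vector."""
    edges = np.linspace(-width / 2, width / 2, num_bins + 1)
    view = np.empty(num_bins)
    for i in range(num_bins):
        mask = (phase >= edges[i]) & (phase < edges[i + 1])
        view[i] = np.median(flux[mask]) if mask.any() else 0.0
    return view

# Toy light curve: flat flux with a 1% dip once per `period` days.
time = np.linspace(0, 90, 5000)
period, t0 = 10.0, 5.0
flux = np.ones(5000)
flux[np.abs(phase_fold(time, period, t0)) < 0.2] = 0.99

phase = phase_fold(time, period, t0)
global_view = median_bin(phase, flux, num_bins=201, width=period)  # whole orbit
local_view = median_bin(phase, flux, num_bins=61, width=2.0)       # zoom on the transit
```

The global view captures the full orbital phase while the local view zooms in on the transit itself; both are fixed-length, so they can feed a CNN directly.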
- /join-to-tess: This folder contains notebooks used to join Kepler space telescope threshold crossing events (TCEs) to TESS space telescope data. It is used in the classification with the `triceratops` model. The Kepler dataset is called `full_tce_list.csv` and can be found in the `kepler-robovetter` folder. The TESS dataset is called `CTL_v8_ExoFOP-TESS.csv` and can be downloaded from ExoFOP-TESS.
  - `join.ipynb`: takes in the Kepler TCE list file and finds the corresponding TESS object IDs.
  - `Get Target Pixel File Counts.ipynb`: finds the number of target pixel files each TESS object ID has. This file is necessary for the `triceratops` tool classification.
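The Kepler-to-TESS join performed in `join.ipynb` can be sketched with a pandas left join. Everything below is a hypothetical miniature: the column names (`kepid`, `tic_id`), the ID values, and the join key are stand-ins — the real notebook works on the full `full_tce_list.csv` and `CTL_v8_ExoFOP-TESS.csv` files and may match on different columns.

```python
import pandas as pd

# Hypothetical stand-ins for full_tce_list.csv and CTL_v8_ExoFOP-TESS.csv;
# the real files have many more columns, and the matching key may differ.
kepler_tces = pd.DataFrame({
    "kepid": [10797460, 10811496, 10848459],
    "tce_period": [9.49, 19.90, 1.74],
})
tess_ctl = pd.DataFrame({
    "kepid": [10797460, 10848459],        # assumed shared identifier for the join
    "tic_id": [377780790, 268644785],     # made-up TESS object IDs
})

# A left join keeps every Kepler TCE and attaches a TESS object ID where one exists.
joined = kepler_tces.merge(tess_ctl, on="kepid", how="left")
```

TCEs without a TESS counterpart come through with a missing `tic_id`, which is useful downstream for knowing which candidates `triceratops` cannot classify.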
- /triceratops: This folder contains our running version of the Triceratops model. See the more detailed description below.
  - `triceratops.ipynb`: takes in planet candidate entries and outputs the probability of each being a planet, as well as classifications (false positive or planet candidate) derived from those probabilities.
  - `join.ipynb`: takes in the results of the above classification as well as the w2p classification and merges the datasets together.
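Turning probabilities into the two labels can be sketched as a simple threshold rule. The 0.5 cutoff and the function name here are illustrative assumptions, not necessarily what `triceratops.ipynb` uses.

```python
def classify(prob_planet, threshold=0.5):
    """Map a planet probability to a label; the 0.5 cutoff is illustrative."""
    return "planet candidate" if prob_planet >= threshold else "false positive"

probs = [0.93, 0.12, 0.55, 0.07]
labels = [classify(p) for p in probs]
# labels -> ["planet candidate", "false positive", "planet candidate", "false positive"]
```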
- /w2p: This folder contains our exoplanet classification model.
  - `create_tableau_data_file.ipynb`: reads in the output csv file from the Triceratops folder (which begins with the output csv from the w2p model) and creates `forweb3.csv`, the data file used for our Tableau visualization.
  - `exoplanet_model_v3.ipynb`: creates the w2p deep learning model that classifies TCEs as either planets or non-planets. It outputs a csv file with its results into /processed_data; for our core CNN model that file is `w2p_cnn_final.csv`.
  - `forweb3.csv`: see above.
  - `get_light_curves.py`: master script for retrieving raw light curve data files from online archives. Calls `make_light_curve_batch.py`, which creates a batch file (`get_kepler.sh`), and then runs the batch file (with timing).
  - `make_dataset.py`: runs the entire Kepler processing pipeline on the raw light curve (.FITS) files downloaded into /raw_data and stores the results in /processed_data. This step can be parallelized.
  - `make_png.py`: makes PNG light curves from the processed files. `s3_upload_png.py` then moves them to an S3 bucket.
  - /processed_data: This folder holds the processed light curve data, stored in two files that contain the training data: `globalbinned_df.csv` and `localbinned_df.csv`. Not stored on GitHub due to space constraints.
    - `w2p_cnn_final.csv`: the output file from the model showing its classifications.
  - /light_curve_png: this folder stores all the light curve PNGs that are created and then uploaded to S3.
  - /raw_data
    - `make_light_curve_batch.py`: Python script that creates a batch file in the light_curves directory, which can be run to download the thousands of .FITS light curves required for analysis.
    - /light_curves: this folder stores all the downloaded FITS files. Not stored on GitHub due to size.
    - `get_kepler.sh`: batch file that retrieves light curves and stores them in this directory.
  - `s3_upload_png.py`: takes the PNGs created by `make_png.py` and uploads them to an S3 bucket so they can be used in Tableau visualizations.
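A two-branch CNN over the global and local views — the general architecture the w2p model follows — can be sketched in Keras. This is a minimal illustration, not the actual `exoplanet_model_v3.ipynb` network: the input lengths, filter counts, and layer sizes here are assumptions.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Illustrative input lengths; the real binned vectors may use different sizes.
global_in = keras.Input(shape=(201, 1), name="global_view")
local_in = keras.Input(shape=(61, 1), name="local_view")

def conv_branch(x, filters):
    """One small 1D-convolutional branch: convolve, pool, flatten."""
    x = layers.Conv1D(filters, kernel_size=5, activation="relu")(x)
    x = layers.MaxPooling1D(pool_size=2)(x)
    return layers.Flatten()(x)

# Each view gets its own convolutional branch; the features are then merged.
merged = layers.concatenate([conv_branch(global_in, 16), conv_branch(local_in, 16)])
dense = layers.Dense(64, activation="relu")(merged)
output = layers.Dense(1, activation="sigmoid", name="planet_prob")(dense)

model = keras.Model(inputs=[global_in, local_in], outputs=output)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```

The sigmoid output is a planet probability per TCE, which is what gets thresholded into the planet / non-planet labels written to `w2p_cnn_final.csv`.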
- `catalog_tab3.twbx`: Tableau workbook. Published to Tableau Public and embedded in the project website.
The `triceratops` tool is used to validate planet candidates, and it uses data from the TESS space telescope.

The `triceratops` package can be installed with the following command:

`pip install triceratops`

More on `triceratops` can be found in the tool creators' triceratops repo.
- https://docs.astropy.org/en/stable/io/fits/
- https://exoplanetarchive.ipac.caltech.edu/docs/API_kepcandidate_columns.html
- https://exoplanetarchive.ipac.caltech.edu/docs/API_tce_columns.html
- https://exoplanetarchive.ipac.caltech.edu/cgi-bin/TblView/nph-tblView?app=ExoTbls&config=tce