# Exoplanet Detection Technical Report:

### Objective:

Design a neural network to detect planets passing in front of their stars by analyzing observed brightness. 

***Answer the question:*** <br>
Of the roughly 5,000 stars visible in the night sky, how many have detectable planets?
<br>

### Transit Detection Method:	

An exoplanet "transit" occurs when a planet passes in front of its star and can be observed as a decrease in the brightness measured from the star.
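
As a rough illustration of how large that decrease is, the sketch below uses the standard approximation that the fractional dip equals the ratio of the planet's disk area to the star's disk area, (R_planet / R_star)^2. The function name and the reference radii are illustrative and are not taken from this project's code or data.

```python
# Approximate reference radii in meters (illustrative, not from this project's data).
R_SUN = 6.957e8
R_JUPITER = 6.9911e7
R_EARTH = 6.371e6


def transit_depth(planet_radius, star_radius):
    """Fractional drop in brightness while the planet is in front of the star.

    Simplified: ignores limb darkening and grazing transits.
    """
    return (planet_radius / star_radius) ** 2


depth_jupiter = transit_depth(R_JUPITER, R_SUN)
depth_earth = transit_depth(R_EARTH, R_SUN)

print(f"Jupiter-size planet, Sun-like star: {depth_jupiter:.2%}")
print(f"Earth-size planet, Sun-like star:   {depth_earth:.4%}")
```

Under these assumptions, a Jupiter-size planet dims a Sun-like star by about 1%, while an Earth-size planet dims it by less than 0.01%, which is why small planets are much harder to pick out of a noisy light curve.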

***Advantages:*** <br>
>Can be applied to many stars more easily than other exoplanet detection methods. <br>

***Disadvantages:*** <br>
>Only a small fraction of all planets happen to have an orbital plane aligned so that their transits can be seen from Earth. <br>
This method is known to have a high false positive rate. <br>

# Data Collection:

### Data Source:
***Recorded by:*** <br>
Kepler Spacecraft
> Designed to detect planets. <br>
> Campaign 4 time span used.

***Light Curves:***  <br>
Magnitude of brightness measured every 30 minutes for 66 days. <br>
The corrected flux values were used, which account for systematic error and background light. <br>

***Planets Added to Training Data:*** <br>
1100 light curves for stars with confirmed planets were added to the training data. <br>
This gives the model more examples of transits to "learn" from. <br>

### Obtaining Data:
Wget scripts in batch files for campaign 4 downloaded the FITS files from NASA's Bulk Data API. <br>
For computational concerns, the data used in this project was limited to a small portion of this Kepler campaign.<br>
> This could be scaled to a server to process more data.

### Extracting Relevant Data:
The FITS files contained the light curves along with many other readings from the spacecraft. <br>
The relevant data was extracted from each FITS file, and the usable information was stored separately. <br>

The light curves from stars with confirmed planets were randomly selected from several timeframes. <br>
>This ensures that the missing values don't occur in some systematic pattern. <br>

### Handling Missing Values: <br>

***Isolated Missing Values:*** <br>
Calculated by taking the mean of the adjacent values.<br> 
This method was chosen to ensure the isolated missing values won't stand out in the data. <br>

***Consecutive Missing Values:*** <br>
The data contained some stretches of consecutive missing values lasting up to 20 hours. <br>
These were filled in with the next non-null value in the time series.<br>
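
The two rules above can be sketched as a single pass over one light curve. This is a minimal illustration, not the project's actual code: the function name `fill_missing` is hypothetical, missing readings are assumed to be represented as `None`, and the handling of a gap at the very end of the series (where no later value exists) is an assumption, since the text does not say what was done in that case.

```python
def fill_missing(flux):
    """Fill None entries in a list of flux readings.

    - An isolated gap with a reading on both sides gets the mean of its neighbors.
    - A longer run of gaps is backfilled with the next non-null reading.
    - ASSUMPTION: a gap at the end of the series (no later reading) is filled
      with the last known reading instead.
    """
    filled = list(flux)
    n = len(filled)
    i = 0
    while i < n:
        if flux[i] is not None:
            i += 1
            continue
        # Find the end of this run of missing values; j is the first index after it.
        j = i
        while j < n and flux[j] is None:
            j += 1
        prev_value = flux[i - 1] if i > 0 else None
        next_value = flux[j] if j < n else None
        if j - i == 1 and prev_value is not None and next_value is not None:
            filled[i] = (prev_value + next_value) / 2
        elif next_value is not None:
            for k in range(i, j):
                filled[k] = next_value
        elif prev_value is not None:
            for k in range(i, j):
                filled[k] = prev_value
        i = j
    return filled


example = [1.0, None, 3.0, None, None, 6.0]
print(fill_missing(example))
```

In the example, the single gap between 1.0 and 3.0 becomes their mean, and the two-reading gap is filled with the next available value, 6.0.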

# Data Exploration:

### Solar Flares:
A relatively common feature in these light curves is flares (the stellar equivalent of solar flares). <br>
At first glance they look like outliers, but they are most likely massive explosions that greatly increase the brightness of the star for a short time.
<img src="assets/solar_flare.png">

### Exoplanet Transit:
This is a good example of what the light curve looks like with several very obvious transits. <br>
You can see the sharp decreases in brightness when the planet passes in front of the star. <br>
<img src="assets/transit_light_curve.png">

# Preprocessing:

### Setting the Time Interval:

To compare data retrieved from different Kepler missions, datasets were limited to the size of the smallest dataset, which was about 66 days long. <br>
> The disadvantage is that this is likely to exclude transit events for planets with an orbital period of more than 66 days. <br>

This is a relatively small time window. For example, the shortest orbital period in our solar system is Mercury's, at 88 days. <br>
> However, most known exoplanets have an orbital period within this window.<br>

This could be due to the fact that the transit detection method is more sensitive to planets close to their stars: they transit more often, so more dips fall within a fixed observation window. <br>
Planets with small orbits are also more likely to have orbital planes that line up with our line of sight. <br>
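
The geometric point about small orbits can be made concrete with a common approximation: for a circular orbit with a randomly oriented orbital plane, the chance that a distant observer sees a transit is roughly the star's radius divided by the orbital distance. The sketch below applies it to two solar system orbits; the function name and constants are illustrative and not part of this project's code.

```python
R_SUN = 6.957e8   # solar radius in meters
AU = 1.496e11     # astronomical unit in meters


def transit_probability(star_radius, orbit_radius):
    """Approximate chance that a randomly oriented circular orbit shows a transit.

    Simplified: assumes the planet is much smaller than the star.
    """
    return star_radius / orbit_radius


p_mercury = transit_probability(R_SUN, 0.387 * AU)  # Mercury-like orbit
p_earth = transit_probability(R_SUN, 1.0 * AU)      # Earth-like orbit

print(f"Mercury-like orbit: {p_mercury:.2%}")
print(f"Earth-like orbit:   {p_earth:.2%}")
```

Under this approximation a Mercury-like orbit has about a 1.2% chance of being aligned for a transit, compared with under 0.5% for an Earth-like orbit.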

### Scaling:
To compare the magnitude of light from different stars, the flux levels of each star were normalized to a common scale. <br>
This preserves transit events, while allowing stars of different brightness to be compared.<br>
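
The text does not say which scaling was used, so the following is only one plausible reading: each light curve is standardized separately to zero mean and unit standard deviation. The function name `standardize` and the toy flux values are hypothetical.

```python
from statistics import mean, pstdev


def standardize(flux):
    """Rescale one light curve to zero mean and unit standard deviation.

    ASSUMPTION: per-star standardization; the original scaling method is not specified.
    """
    center = mean(flux)
    spread = pstdev(flux)
    if spread == 0:
        # A perfectly flat curve carries no shape information.
        return [0.0 for _ in flux]
    return [(value - center) / spread for value in flux]


# Two toy stars with the same 20% dip but a tenfold difference in brightness.
faint_star = [10.0, 10.0, 8.0, 10.0]
bright_star = [100.0, 100.0, 80.0, 100.0]

scaled_faint = standardize(faint_star)
scaled_bright = standardize(bright_star)
print(scaled_faint)
print(scaled_bright)
```

After scaling, both toy curves are numerically the same, so the dip is kept while the difference in absolute brightness is removed.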

# Neural Network:

### Model: <br>
***One Dimensional Convolutional Neural Network:*** <br>
Convolutional neural networks (CNNs) are often used in image processing and pattern detection problems where the order of the input values is important.<br> 
Transit events create sharp dips in the brightness of their stars. This CNN learns to recognize these patterns and predicts whether the star has an exoplanet. <br>
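
To show how a convolutional filter can respond to a dip, here is a toy sketch in plain Python. It is not the project's model: the kernel weights are hand-picked for illustration, whereas a trained CNN learns its weights from the data, and the helper names are hypothetical.

```python
def conv1d(signal, kernel):
    """Slide the kernel along the signal (no padding) and sum the products."""
    width = len(kernel)
    return [
        sum(signal[i + k] * kernel[k] for k in range(width))
        for i in range(len(signal) - width + 1)
    ]


def relu(values):
    """Keep positive responses and zero out the rest."""
    return [max(0.0, v) for v in values]


def max_pool1d(values, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


# Toy scaled light curve: flat, with one two-sample dip at indices 5 and 6.
light_curve = [0.0, 0.0, 0.0, 0.0, 0.0, -1.0, -1.0, 0.0, 0.0, 0.0, 0.0, 0.0]

# Hand-picked kernel that responds when the middle is lower than the edges.
dip_kernel = [0.5, -1.0, -1.0, 0.5]

feature_map = relu(conv1d(light_curve, dip_kernel))
pooled = max_pool1d(feature_map, 3)

print(feature_map)
print(pooled)
```

The feature map peaks where the kernel is centered on the dip, and pooling keeps that peak while shortening the sequence, which is the basic mechanism the stacked convolution and pooling layers rely on.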

### Topology/Architecture:
***Layers:***<br>
> 3 Convolutional Layers <br>
A Pooling Layer After Each Convolutional Layer (3 total) <br>
2 Hidden Layers <br>
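
The sketch below traces how the sequence length shrinks through three convolution and pooling blocks. The input length follows from the data description (one reading every 30 minutes for 66 days, assuming no gaps), but the kernel size and pool size are hypothetical, since the report does not list the hyperparameters.

```python
SAMPLES_PER_DAY = 48                 # one reading every 30 minutes
INPUT_LENGTH = 66 * SAMPLES_PER_DAY  # 66-day light curve

# ASSUMPTION: hypothetical hyperparameters, not taken from the project.
KERNEL_SIZE = 5
POOL_SIZE = 4


def conv_output_length(length, kernel_size):
    """Length after a 1D convolution with stride 1 and no padding."""
    return length - kernel_size + 1


def pool_output_length(length, pool_size):
    """Length after non-overlapping pooling (partial windows dropped)."""
    return length // pool_size


def trace_lengths(input_length, n_blocks=3):
    """Sequence length at the input and after every conv and pooling layer."""
    lengths = [input_length]
    for _ in range(n_blocks):
        lengths.append(conv_output_length(lengths[-1], KERNEL_SIZE))
        lengths.append(pool_output_length(lengths[-1], POOL_SIZE))
    return lengths


lengths = trace_lengths(INPUT_LENGTH)
print(lengths)
```

With these assumed settings the 3,168-sample input is reduced to 48 positions per filter before it reaches the hidden layers, which keeps the fully connected part of the network small.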

# Results:

***Training Validation Set Accuracy:*** <br>
85% Accuracy (baseline 31%) <br>

***Predictions:*** <br>
98 Exoplanet stars predicted: <br>
>12 true exoplanet stars in predictions (15x better than chance) <br>
30% of all exoplanet stars detected (12 of 40 stars) <br>
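
The figures above can be checked with a few lines of arithmetic. The counts come from the results; the total number of stars is an assumption (5,000, matching the number used in the question), because the report does not state the size of the prediction set.

```python
predicted_positive = 98   # stars the model flagged as having a planet
true_positive = 12        # flagged stars that really have a planet
actual_positive = 40      # all stars with a planet in the prediction set
total_stars = 5000        # ASSUMPTION: size of the prediction set is not stated

precision = true_positive / predicted_positive  # share of flagged stars that are real
recall = true_positive / actual_positive        # share of real planets that were flagged
base_rate = actual_positive / total_stars       # hit rate of random guessing
lift = precision / base_rate                    # improvement over random guessing

print(f"precision: {precision:.1%}")
print(f"recall:    {recall:.1%}")
print(f"lift:      {lift:.1f}x")
```

Under the 5,000-star assumption, the lift comes out at roughly 15x, consistent with the "15x better than chance" figure.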

### Answer to the Question:
If we generalize these results to the 5000 stars visible in the night sky, we can estimate: <br>
>About 40 stars have planets that transit between their star and Earth at least once every 66 days. <br>
The model would generate a list of candidates that contains about 30% of the true planets. <br> 