***
# Week 12 Higgs Boson Case Study
MSDS 7333 Quantifying the World  
*Allison Roderick, Jenna Ford, and Will Arnost* 
***

## Table of Contents

<a href='#Section_1'> 1. Introduction </a>  
<a href='#Section_2'> 2. Question </a>  
<a href='#Section_3'> 3. Methods </a>  
<a href='#Section_3_a'> &nbsp;&nbsp;&nbsp; a. Dataset </a>  
<a href='#Section_3_b'> &nbsp;&nbsp;&nbsp; b. Neural Network Structure </a>  
<a href='#Section_3_c'> &nbsp;&nbsp;&nbsp; c. Other Considerations </a>  
<a href='#Section_4'> 4. Modeling </a>  
<a href='#Section_5'> 5. Results </a>  
<a href='#Section_6'> 6. Conclusion </a>  
<a href='#Section_7'> 7. References </a>  
<a href='#Section_8'> 8. Code </a>  

%%html
<style>
  table {margin-left: 0 !important;}
</style>

<a id = 'Section_1'></a>

## 1. Introduction

This week's case study involves replicating the results produced in the paper "Searching for Exotic Particles in High-Energy Physics with Deep Learning" by Baldi, Sadowski, and Whiteson [1]. The 2014 paper looks to distinguish between particle collisions that produce exotic particles and those that do not. The authors investigate the use of deep neural networks to improve accuracy over other methods. 

We will attempt to replicate that paper's neural network architecture and performance. The packages used in the paper are outdated, so we will use TensorFlow to build our network. We hope to get as close to their AUC as possible.
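
Because AUC is the yardstick for this comparison, it helps to be concrete about what it measures. The block below is a minimal, framework-free sketch that computes the area under the ROC curve from the rank-sum (Mann-Whitney) identity. The function name `roc_auc` and the toy inputs are our own illustration and are not taken from the paper.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: iterable of 0/1 class labels (1 = signal, 0 = background)
    scores: iterable of model outputs, higher meaning "more signal-like"
    """
    pairs = sorted(zip(scores, labels))
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative example")

    rank_sum_pos = 0.0
    i = 0
    while i < len(pairs):
        # Find the run of tied scores; every member gets the average rank.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        n_pos_in_run = sum(1 for k in range(i, j) if pairs[k][1] == 1)
        rank_sum_pos += avg_rank * n_pos_in_run
        i = j

    # U counts the (positive, negative) pairs that are ordered correctly.
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


# Toy example: four collisions, two of them signal.
toy_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(toy_auc)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so a replication can be judged by how far its validation AUC falls from the value reported in [1].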

<a id = 'Section_2'></a>

## 2. Question

Given the following paper: https://arxiv.org/pdf/1402.4735.pdf

Build a replica neural network with the paper’s architecture using TensorFlow. If possible, begin to train on the data located here: https://archive.ics.uci.edu/ml/datasets/HIGGS. How close can you get to the original results?

To facilitate quicker training, you may increase the batch size temporarily (this has a small impact on the final result but can speed up your calculations significantly). You do not need to train a final result using the paper’s parameters; only the code for your model is required in your final submission.

Include in your report:
- Based on the class notes and discussion, suggest improvements to the procedure. What are standard practices now versus when this paper was written? What kind of improvements do they provide?
- How would you quantify whether your result duplicated the paper’s?


<a id = 'Section_3'></a>

## 3. Methods

This section gives an overview of what we know about the data and how we prepared the dataset for modeling.

<a id = 'Section_3_a'></a>

### 3a. Dataset

The dataset contains 11 million observations and 29 columns. The target column indicates whether the collision produced exotic particles. The remaining 28 columns are numeric features. There are no missing values in the data.

To replicate the methods of the paper, we will use a sample of 2.6M records for model training and 100K records for validation.
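
As a rough illustration of that split, the sketch below draws disjoint sets of row indices for training and validation. It is a minimal example under our own assumptions: the helper `split_indices`, the fixed seed, and uniform random sampling are illustrative choices, not the paper's documented procedure. The demonstration uses sizes scaled down by a factor of 1,000 so it runs instantly.

```python
import random


def split_indices(n_rows, n_train, n_val, seed=0):
    """Return disjoint, sorted lists of row indices for training and validation."""
    if n_train + n_val > n_rows:
        raise ValueError("requested more rows than the dataset contains")
    rng = random.Random(seed)
    # Draw every needed row once without replacement, then cut the draw in two.
    # Sampling once guarantees the two subsets never overlap.
    chosen = rng.sample(range(n_rows), n_train + n_val)
    return sorted(chosen[:n_train]), sorted(chosen[n_train:])


# Scaled-down demonstration: 11,000 rows instead of 11 million,
# 2,600 training rows instead of 2.6M, 100 validation rows instead of 100K.
train_idx, val_idx = split_indices(n_rows=11_000, n_train=2_600, n_val=100)
print(len(train_idx), len(val_idx))
```

The resulting index lists can then be used to select rows from whatever in-memory representation of the dataset is in use.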

<a id = 'Section_3_b'></a>

### 3b. Neural Network Structure

To replicate the study, we want to create a network with the following structure:

- 5 densely connected layers
  - 4 layers with 300 hidden units and tanh activation functions
  - 1 output layer with a linear activation function
- learning rate of 0.05
  - The learning rate decayed by a factor of 1.0000002 every batch update until it reached a minimum of $10^{-6}$
- weight decay coefficient of $1 \times 10^{-5}$
- <!>Weights were initialized from a normal distribution with zero mean and standard deviation 0.1 in the first layer, 0.001 in the output layer, and 0.05 in all other hidden layers.
- Mini-batch size of 100
- <!>A momentum term increased linearly over the first 200 epochs from 0.9 to 0.99, at which point it remained constant
- <!>Training ended when the momentum had reached its maximum value and the minimum error on the validation set (500,000 examples) had not decreased by more than a factor of 0.00001 over 10 epochs. This early stopping prevented overfitting and resulted in each neural network being trained for 200-1000 epochs
- <!> Autoencoder pretraining was performed by training a stack of single-hidden-layer autoencoder networks as in reference [9] of the paper, then fine-tuning the full network using the class labels. Each autoencoder in the stack used tanh hidden units and linear outputs, and was trained with the same initialization scheme, learning algorithm, and stopping parameters as in the fine-tuning stage. When training with dropout, we increased the learning rate decay factor to 1.0000003, and only ended training when the momentum had reached its maximum value and the error on the validation set had not decreased for 40 epochs.
- <!> Input features were standardized over the entire train/test set with mean zero and standard deviation one, except for those features with values strictly greater than zero – these we scaled so that the mean value was one.

<a id = 'Section_3_c'></a>

### 3c. Other Considerations

TensorFlow is much newer than the Theano library used in the original paper. Methods for training neural networks have also changed since the paper was written in 2014. We are not able to replicate every aspect of the original network, so we explain the differences here.
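
Two of the training details listed in Section 3b, the per-batch learning-rate decay and the linear momentum ramp, are simple enough to write out by hand regardless of which framework applies them. The sketch below is a framework-free illustration under stated assumptions: the function names are our own, and we read "decayed by a factor of 1.0000002" as dividing the learning rate by that factor at every batch update.

```python
def learning_rate(batch_step, initial=0.05, decay=1.0000002, minimum=1e-6):
    """Learning rate after `batch_step` updates.

    Assumption: the rate is divided by `decay` at every batch update
    and never drops below `minimum`.
    """
    return max(initial / decay ** batch_step, minimum)


def momentum(epoch, start=0.9, end=0.99, ramp_epochs=200):
    """Momentum rising linearly from `start` to `end` over `ramp_epochs`
    epochs, then held constant at `end`."""
    fraction = min(max(epoch / ramp_epochs, 0.0), 1.0)
    return start + fraction * (end - start)


# With 2.6M training rows and mini-batches of 100 there are
# 26,000 batch updates per epoch.
steps_per_epoch = 2_600_000 // 100
for epoch in (0, 100, 200, 1000):
    step = epoch * steps_per_epoch
    print(epoch, learning_rate(step), momentum(epoch))
```

Values produced this way could be passed to an optimizer at the start of each batch or epoch; how that hookup is done depends on the framework and is not shown here.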

<a id = 'Section_4'></a>

## 4. Modeling

<a id = 'Section_5'></a>

## 5. Results

<a id = 'Section_6'></a>

## 6. Conclusion

<a id = 'Section_7'></a>

## 7. References

1. Baldi, P., Sadowski, P., and Whiteson, D. (2014). "Searching for Exotic Particles in High-Energy Physics with Deep Learning." arXiv:1402.4735. https://arxiv.org/pdf/1402.4735.pdf

<a id = 'Section_8'></a>

## 8. Code