# Convolutional Neural Network

## CNN1D

### Inputs (7)

|Feature|Description|
|:--:|:--:|
|j1_ptrel|ratio of the pT of each constituent to the pT of the jet|
|j1_etarot|rotated eta of each constituent|
|j1_phirot|rotated phi of each constituent|
|j1_erel|ratio of the energy of each constituent to the pT of the jet|
|j1_deltaR|sqrt((Δeta)^2 + (Δphi)^2)|
|j1_costhetarel|cos (angle (constituent, jet))|
|j1_pdgid|PDG ID number of the constituent|
|(j_index)|Jet index, used to group constituents into jets; dropped before training|

MaxParticles: 100
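
As a rough illustration of the `j1_deltaR` entry in the table above, the sketch below computes the angular distance between a constituent and the jet axis. The function name `delta_r` and the wrapping of the azimuthal difference into [-pi, pi) are assumptions made for this example; the dataset may have been produced with a different convention.

```python
import numpy as np

def delta_r(eta_c, phi_c, eta_jet, phi_jet):
    """Angular distance between a constituent and the jet axis."""
    d_eta = eta_c - eta_jet
    # Assumed convention: wrap the azimuthal difference into [-pi, pi)
    d_phi = (phi_c - phi_jet + np.pi) % (2 * np.pi) - np.pi
    return np.sqrt(d_eta**2 + d_phi**2)

# Made-up coordinates: a constituent at (eta, phi) = (0.3, 0.4) and a jet axis at (0, 0)
print(delta_r(0.3, 0.4, 0.0, 0.0))  # approximately 0.5
```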

### Labels (5)

|Label|Description|
|:--:|:--:|
|j_g|Gluon jet|
|j_q|Light-quark jet|
|j_w|W-boson jet|
|j_z|Z-boson jet|
|j_t|Top-quark jet|
|(j_index)|Jet index; dropped before training|

### Model structure

    Model: "model"
    _________________________________________________________________
    Layer (type)                Output Shape                Param #   
    =================================================================
    input (InputLayer)          [(None, 100, 7)]            0         
    _________________________________________________________________
    conv1_relu_1 (Conv1D)       (None, 100, 8)              232       
    _________________________________________________________________
    conv1_relu_2 (Conv1D)       (None, 50, 4)               132       
    _________________________________________________________________
    conv1_relu_3 (Conv1D)       (None, 17, 2)               34        
    _________________________________________________________________
    flatten (Flatten)           (None, 34)                  0         
    _________________________________________________________________
    fc1_relu (Dense)            (None, 32)                  1120      
    _________________________________________________________________
    rnn_densef (Dense)          (None, 5)                   165       
    =================================================================
    Total params: 1,683
    Trainable params: 1,683
    Non-trainable params: 0
    _________________________________________________________________


#### Input Shape: (100, 7) *100 particles, 7 features*

#### Conv1D Layers (3)

    Filters:                8 + 4 + 2
    Kernel_size:            4 + 4 + 4
    Strides:                1 + 2 + 3
    Regularizer:            Lasso regularization (l = 1e-4)
    Activation function:    Relu
    Kernel initializer:     he_normal
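
The output shapes and parameter counts in the model summary can be checked by hand from these settings. The sketch below assumes `padding='same'` (which the model code in this notebook uses) and a bias term per filter; the helper name `conv1d_same` is made up for this example.

```python
import math

def conv1d_same(length_in, channels_in, filters, kernel_size, stride):
    """Output length and parameter count of a Conv1D layer with 'same' padding and a bias."""
    length_out = math.ceil(length_in / stride)                # 'same' padding: ceil(L / stride)
    n_params = kernel_size * channels_in * filters + filters  # weights + one bias per filter
    return length_out, n_params

length, channels = 100, 7  # input shape (100, 7)
conv_summary = []
for filters, kernel_size, stride in [(8, 4, 1), (4, 4, 2), (2, 4, 3)]:
    length, n_params = conv1d_same(length, channels, filters, kernel_size, stride)
    conv_summary.append((length, filters, n_params))
    channels = filters  # the next layer sees this layer's filters as its input channels

print(conv_summary)  # [(100, 8, 232), (50, 4, 132), (17, 2, 34)]
```

The three tuples match the `Output Shape` and `Param #` columns of the summary for `conv1_relu_1`, `conv1_relu_2` and `conv1_relu_3`.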

#### Dense Layers (1)

    Perceptrons:            32
    Activation function:    Relu
    Regularizer:            Lasso regularization (l = 1e-4)
    Kernel initializer:     lecun_uniform

#### Output layer (1)

    Output:                 5-class Classification
    Activation function:    Softmax
    Kernel initializer:     lecun_uniform
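
To make the role of the softmax activation concrete, here is a minimal NumPy sketch. It is not the Keras implementation, and the input scores are made up; it only shows that softmax turns five raw scores into five probabilities that sum to one.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)  # avoid overflow in exp
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

# Hypothetical raw scores for one jet, ordered as (j_g, j_q, j_w, j_z, j_t)
logits = np.array([2.0, 1.0, 0.1, -1.0, 0.5])
probs = softmax(logits)
print(probs.round(3), probs.sum())
```

The class with the largest raw score (here the first one) also gets the largest probability.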

##### Learning rate:         1e-4

##### Optimizer:             Adam

##### Loss function:         categorical_crossentropy

##### Metrics:               Accuracy


A convolutional neural network (CNN) is a deep neural network architecture originally developed for image data. [Useful Reading](https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53)

It is a good introduction to how deep learning techniques for image classification can be applied to our jet classification project.

In [None]:
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, Flatten, Conv1D
from tensorflow.keras.models import Model
from tensorflow.keras.regularizers import l1
from tensorflow.keras.optimizers import Adam
import h5py
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
from sklearn.metrics import roc_curve, auc
import matplotlib.pyplot as plt
from tqdm import tqdm

# Data Preprocess

For the CNN, we use low-level (per-constituent) features as input, because they tend to give better results than high-level features. First of all, just like in DNN, we need to take all the features and labels we need from the dataset.

In [None]:
# To use one data file:
h5File = h5py.File('data/processed-pythia82-lhc13-all-pt1-50k-r1_h022_e0175_t220_nonu_withPars_truth_0.z', 'r')
treeArray = h5File['t_allpar_new'][()]

h5File.close()

print(treeArray.shape)

# List of features to use
features = ['j1_ptrel', 'j1_etarot', 'j1_phirot', 'j1_erel', 'j1_deltaR', 'j1_costhetarel', 'j1_pdgid', 'j_index']

# List of labels to use
labels = ['j_g', 'j_q', 'j_w', 'j_z', 'j_t', 'j_index']

# Convert to dataframe
features_labels_df = pd.DataFrame(treeArray,columns=list(set(features+labels)))
features_labels_df = features_labels_df.drop_duplicates()

features_df = features_labels_df[features]
labels_df = features_labels_df[labels]
labels_df = labels_df.drop_duplicates()

# Convert to numpy array 
features_val = features_df.values
labels_val = labels_df.values     

if 'j_index' in features:
    features_val = features_val[:,:-1] # drop the j_index feature
if 'j_index' in labels:
    labels_val = labels_val[:,:-1] # drop the j_index label
    print(labels_val.shape)

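Then, we need to regroup the data from constituents (particles) into jets, so that each jet becomes a fixed-size array of shape (MaxParticles, number of features). The constituents of each jet are sorted by descending `j1_ptrel`, then truncated or zero-padded to `MaxParticles`. We use tqdm to show the preprocessing progress.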

In [None]:
MaxParticles = 100
features_2dval = np.zeros((len(labels_df), MaxParticles, len(features)-1))
for i in tqdm(range(0, len(labels_df))):
    features_df_i = features_df[features_df['j_index']==labels_df['j_index'].iloc[i]]
    index_values = features_df_i.index.values
    features_val_i = features_val[np.array(index_values), :]
    nParticles = len(features_val_i)
    features_val_i = features_val_i[features_val_i[:, 0].argsort()[::-1]] # sort descending by ptrel
    if nParticles > MaxParticles:
        features_val_i =  features_val_i[0:MaxParticles, :]
    else:        
        features_val_i = np.concatenate([features_val_i, np.zeros((MaxParticles-nParticles, len(features)-1))])
    features_2dval[i, :, :] = features_val_i
features_val = features_2dval
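
The sort, truncate and zero-pad logic inside the loop above can be seen in isolation on a toy jet. This is only a sketch: the helper name `pad_or_truncate` and the toy values are made up for illustration.

```python
import numpy as np

def pad_or_truncate(constituents, max_particles):
    """Sort rows by the first column (descending), then zero-pad or truncate to max_particles rows."""
    ordered = constituents[constituents[:, 0].argsort()[::-1]]
    out = np.zeros((max_particles, constituents.shape[1]))
    n_keep = min(len(ordered), max_particles)
    out[:n_keep] = ordered[:n_keep]
    return out

# Toy jet with 3 constituents and 2 features (ptrel, etarot); values are made up
toy_jet = np.array([[0.2, 0.1], [0.5, -0.3], [0.3, 0.0]])
padded = pad_or_truncate(toy_jet, max_particles=4)     # 3 sorted rows + 1 row of zeros
truncated = pad_or_truncate(toy_jet, max_particles=2)  # the 2 rows with the highest ptrel
print(padded)
print(truncated)
```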

Then, we will do our favorite `train_test_split` using the scikit-learn package, keeping 20% of the jets as the test set.

In [None]:
X_train, X_test, y_train, y_test = train_test_split(features_val, labels_val, test_size=0.2, random_state=42)

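The last step is to standardize our X (predictor) input so that every feature has a comparable scale for the CNN model. The scaler is fitted on the training set only and then applied to both the training and the test set.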

In [None]:
#Normalize conv inputs
reshape_X_train_val = X_train.reshape(X_train.shape[0]*X_train.shape[1], X_train.shape[2])
scaler = preprocessing.StandardScaler().fit(reshape_X_train_val)
for p in range(X_train.shape[1]):
    X_train[:,p,:] = scaler.transform(X_train[:, p, :])
    X_test[:,p,:] = scaler.transform(X_test[:, p, :])    

if 'j_index' in labels:
    labels = labels[:-1]
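
The cell above pools every (jet, particle) row to get one mean and one standard deviation per feature, then applies them to each particle slot in turn. The NumPy sketch below reproduces that idea on made-up data without scikit-learn. It is a stand-in for illustration, not the `StandardScaler` code itself, and it shows that broadcasting gives the same result as the per-slot loop.

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up data with shape (jets, particles, features)
X_toy = rng.normal(loc=5.0, scale=2.0, size=(6, 4, 3))

# Per-feature statistics pooled over every jet and every particle slot
flat = X_toy.reshape(-1, X_toy.shape[-1])
mean = flat.mean(axis=0)
std = flat.std(axis=0)  # population standard deviation (ddof=0)

# Broadcasting applies the same shift and scale to all particle slots at once
X_scaled = (X_toy - mean) / std

# Equivalent loop over particle slots, mirroring the structure of the cell above
X_loop = np.empty_like(X_toy)
for p in range(X_toy.shape[1]):
    X_loop[:, p, :] = (X_toy[:, p, :] - mean) / std

print(np.allclose(X_scaled, X_loop))
```

With a fitted scikit-learn scaler, the loop could likewise be replaced by transforming the reshaped 2D array and reshaping the result back to 3D.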

# Train

Construct the model from the description at the top of the page. To understand why the model is constructed this way, this [Useful Reading](https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53) is really helpful.

In [None]:
l1Reg = 0.0001
Inputs = Input(shape = (100,7))
x = Conv1D(filters=8, kernel_size=4, strides=1, padding='same',
               kernel_initializer='he_normal', use_bias=True, name='conv1_relu_1',
               activation = 'relu', kernel_regularizer=l1(l1Reg))(Inputs)
x = Conv1D(filters=4, kernel_size=4, strides=2, padding='same',
               kernel_initializer='he_normal', use_bias=True, name='conv1_relu_2',
               activation = 'relu', kernel_regularizer=l1(l1Reg))(x)
x = Conv1D(filters=2, kernel_size=4, strides=3, padding='same',
               kernel_initializer='he_normal', use_bias=True, name='conv1_relu_3',
               activation = 'relu', kernel_regularizer=l1(l1Reg))(x)
x = Flatten()(x)
x = Dense(32, activation='relu', kernel_initializer='lecun_uniform', 
              name='fc1_relu', kernel_regularizer=l1(l1Reg))(x)
predictions = Dense(5, activation='softmax', kernel_initializer='lecun_uniform', name='rnn_densef')(x)
model = Model(inputs=Inputs, outputs=predictions)
model.summary()

Since we have five output labels, we will use `categorical_crossentropy` as our loss function.

In [None]:
adam = Adam(learning_rate=0.0001)  # `lr` is a deprecated alias in recent Keras releases
model.compile(optimizer=adam, loss='categorical_crossentropy', metrics=['accuracy'])
# X_train and y_train are in-memory NumPy arrays, so the generator-only
# `use_multiprocessing` and `workers` arguments are not needed
history = model.fit(X_train, y_train, batch_size=1024, epochs=100,
                    validation_split=0.25, shuffle=True)
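
For intuition about what the loss measures, here is a minimal NumPy sketch of categorical cross-entropy on one-hot labels. It is not the Keras implementation; the clipping constant `eps` and the example predictions are assumptions made for illustration.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean cross-entropy between one-hot labels and predicted class probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # guard against log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=-1)))

# Two made-up jets: a gluon jet predicted fairly well, a top jet predicted at chance level
y_true = np.array([[1, 0, 0, 0, 0],
                   [0, 0, 0, 0, 1]])
y_pred = np.array([[0.7, 0.1, 0.1, 0.05, 0.05],
                   [0.2, 0.2, 0.2, 0.2, 0.2]])
loss = categorical_crossentropy(y_true, y_pred)
print(loss)  # mean of -ln(0.7) and -ln(0.2)
```

Only the probability assigned to the true class enters the sum, so a model that always predicts a uniform 0.2 for each of the five classes has a loss of ln(5), roughly 1.61.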

# Evaluation

Just like in DNN, we will use the same methods to evaluate our training performance for CNN: the loss curves for the training and validation samples, and one ROC curve per label.

In [None]:
def learningCurveLoss(history):
    plt.figure()
    plt.plot(history.history['loss'], linewidth=1)
    plt.plot(history.history['val_loss'], linewidth=1)
    plt.title('Model Loss over Epochs')
    plt.legend(['training sample loss','validation sample loss'])
    plt.xlabel('epoch')
    plt.ylabel('loss')
    plt.show()

In [None]:
learningCurveLoss(history)

In [None]:
def makeRoc(features_val, labels_val, labels, model, outputSuffix=''):
    labels_pred = model.predict(features_val)
    df = pd.DataFrame()
    fpr = {}
    tpr = {}
    auc1 = {}
    plt.figure()       
    for i, label in enumerate(labels):
        df[label] = labels_val[:,i]
        df[label + '_pred'] = labels_pred[:,i]
        fpr[label], tpr[label], threshold = roc_curve(df[label],df[label+'_pred'])
        auc1[label] = auc(fpr[label], tpr[label])
        plt.plot(fpr[label],tpr[label],label='%s tagger, AUC = %.1f%%'%(label.replace('j_',''),auc1[label]*100.))
    plt.xlabel("Background Efficiency")
    plt.ylabel("Signal Efficiency")
    plt.xlim([-0.05, 1.05])
    plt.ylim(0.001,1.05)
    plt.grid(True)
    plt.legend(loc='lower right')
    plt.title('%s ROC Curve'%(outputSuffix))
    #plt.savefig('%s_ROC_Curve.png'%(outputSuffix))
    return labels_pred

In [None]:
y_pred = makeRoc(X_test, y_test, labels, model, outputSuffix='Conv1d')
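
The AUC shown in each legend entry has a simple interpretation: it is the probability that a randomly chosen signal jet gets a higher score than a randomly chosen background jet. The sketch below computes it directly from that definition on a toy example. The function name `auc_by_pairs` and the scores are made up, and because it compares every signal/background pair it is only practical for small samples.

```python
import numpy as np

def auc_by_pairs(y_true, y_score):
    """Fraction of (signal, background) pairs in which the signal scores higher; ties count as half."""
    y_true = np.asarray(y_true).astype(bool)
    y_score = np.asarray(y_score, dtype=float)
    sig = y_score[y_true]
    bkg = y_score[~y_true]
    diff = sig[:, None] - bkg[None, :]  # all pairwise score differences
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

# Toy one-vs-rest problem: 2 background jets and 2 signal jets
toy_auc = auc_by_pairs([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(toy_auc)  # 0.75
```

Three of the four signal/background pairs are ordered correctly, hence 0.75. In `makeRoc`, each label is treated this way in turn, with that label as signal and the other four as background.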

### Exercise

Congratulations on finishing the CNN1D tutorial. Now think about how a CNN2D would work, and finish reading this [article](https://towardsdatascience.com/understanding-1d-and-3d-convolution-neural-network-keras-9d8f76e29610).