## Neural Networks

The first approach we used to tackle the task was to build a neural network (NN) that predicts the mass ratio and the chirp mass, so that we could infer their distributions.
We opted for this approach because of its flexibility: the architecture of the network can be tailored to the problem, from its depth and the number of nodes per layer to the activation functions and the objective of the task.
The following sections present how the task was modeled and which architectures we tried in order to predict the mass ratios and the chirp masses.

## Regression Vs Classification
Since the quantities whose distributions we want to learn are continuous, the problem can be modeled in different ways depending on how we want to deal with them:
* Keeping their continuous nature, the task can be formulated as a **regression**, where the algorithm, starting from the initial features of each record, tries to predict the value of the target quantity as closely as possible by approximating its complex dependence on the input values. This task can become extremely computationally heavy as the complexity of the data increases, because of the size of the network needed to capture such intricate relationships;
* A different approach to continuous target values is to discretize them by assigning a label to each range of values, so that each data record is characterized by its own label. The problem can then be tackled with a **classification** algorithm, which tries to predict the label instead of the exact value of the quantity. As the number of labels increases, the complexity of the task grows.
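
The discretization described in the second point can be sketched as follows. This is only an illustration, assuming equal-width bins over a known range; the helper names `make_edges` and `discretize`, the range `[0, 1]`, and the sample values are hypothetical and not taken from the analysis. The choice of 20 bins simply mirrors the 20 output nodes of the softmax network shown later.

```python
from bisect import bisect_right

def make_edges(lo, hi, n_bins):
    """Return the n_bins - 1 interior edges of equal-width bins on [lo, hi]."""
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(1, n_bins)]

def discretize(value, edges):
    """Map a continuous value to an integer label in 0 .. len(edges).

    Values below the first edge fall in bin 0 and values above the last
    edge fall in the last bin, so out-of-range inputs are absorbed by
    the two outer bins.
    """
    return bisect_right(edges, value)

# Hypothetical example: a quantity in [0, 1] split into 20 classes
edges = make_edges(0.0, 1.0, 20)
labels = [discretize(q, edges) for q in (0.03, 0.52, 0.97)]
print(labels)
```

With more bins each label covers a narrower range of the target, which is where the growth in task complexity mentioned above comes from.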

## Direct Prediction Vs SoftMax Approach

When predicting the label in a classification algorithm, its `objective` can be modified in order to gain more insight into what the model is learning. By objective we mean the way the output of the task is presented.
We refer to a direct prediction when the output consists of just the label inferred from the input values. An alternative approach is to return the probability of each label being assigned to a particular record.
This is obtained by using the `Softmax` function as the output activation function:
$$
\sigma(z)_j  = \frac{e^{z_j}}{\sum_{k} e^{z_k}}
$$
Here each output node value $z_j$ is mapped to the probability $\sigma(z)_j$ that label $j$ is assigned to the record given as input to the NN; the number of output nodes equals the number of possible labels.
By inspecting the resulting probability distribution, several quantities can be inferred, such as the most probable label and the dispersion of the distribution, which gives insight into how well the model is learning.
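
As a minimal sketch of the formula above, the snippet below computes the softmax of a few output-node values, picks the most probable label, and measures the dispersion of the distribution. Using the Shannon entropy as the dispersion measure is an assumption made here for illustration (the text does not say which measure was used), and the values in `z` are made up.

```python
import math

def softmax(z):
    """Softmax of a list of output-node values z_j."""
    m = max(z)  # subtracting the maximum leaves the result unchanged but avoids overflow
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(p):
    """Shannon entropy in nats: 0 for a one-hot distribution, log(K) for a uniform one."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

z = [2.0, 1.0, 0.1]                            # hypothetical output-node values for 3 labels
p = softmax(z)
best = max(range(len(p)), key=p.__getitem__)   # index of the most probable label
spread = entropy(p)                            # assumed dispersion measure
```

A distribution whose entropy stays close to $\log K$ for $K$ labels is nearly uniform, meaning the model is not separating the labels for that record.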

## Tested Architectures
For each of the architectures proposed below, several variants were tested in a grid-search fashion in order to improve the capabilities of the algorithms.
Due to the extreme complexity of the data, neither the regression tasks nor the classification ones yielded satisfying results. Every NN was implemented with the `PyTorch` Python package.
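
A grid search of this kind can be sketched as below. The search space, the helper names `configurations` and `grid_search`, and the scoring function are all hypothetical: the text does not list which hyperparameters or values were explored, and `dummy_evaluate` is only a placeholder for training a network and returning its validation loss.

```python
from itertools import product

# Hypothetical search space; the values actually explored are not listed in the text
grid = {
    "hidden_size": [4, 8, 16],
    "n_hidden_layers": [1, 2],
    "learning_rate": [1e-2, 1e-3],
}

def configurations(grid):
    """Yield one dict per combination of hyperparameter values."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))

def grid_search(grid, evaluate):
    """Return the configuration with the lowest score given by `evaluate`."""
    return min(configurations(grid), key=evaluate)

def dummy_evaluate(cfg):
    """Placeholder score standing in for a train-and-validate run."""
    return abs(cfg["hidden_size"] - 8) + cfg["learning_rate"]

best = grid_search(grid, dummy_evaluate)
```

The number of runs is the product of the list lengths (12 in this toy grid), which is why the grid has to stay small when each evaluation means training a network.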

### Regression Architecture
The network is designed with fully connected layers that converge into a single output node.
In the fashion of a convolutional network, the algorithm has parallel layers, disconnected from one another, so that each group can focus on learning different aspects of the input data before being recombined into a single value.

In [None]:
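import torch
import torch.nn as nn


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # Three parallel input layers, each mapping the 6 input features to 4 nodes
        self.in1 = nn.Linear(6, 4)
        self.in2 = nn.Linear(6, 4)
        self.in3 = nn.Linear(6, 4)
        # self.in4 = nn.Linear(6, 4)

        # One hidden layer per branch
        self.lin1_1 = nn.Linear(4, 4)
        self.lin1_2 = nn.Linear(4, 4)
        self.lin1_3 = nn.Linear(4, 4)

        # The 3 x 4 branch outputs are recombined into a single value
        self.out = nn.Linear(12, 1)

    # x represents our data, with shape (batch_size, 6)
    def forward(self, x):
        # Branch 1
        x1 = self.in1(x)
        x1 = torch.relu(x1)
        x1 = self.lin1_1(x1)
        x1 = torch.relu(x1)

        # Branch 2
        x2 = self.in2(x)
        x2 = torch.relu(x2)
        x2 = self.lin1_2(x2)
        x2 = torch.relu(x2)

        # Branch 3
        x3 = self.in3(x)
        x3 = torch.relu(x3)
        x3 = self.lin1_3(x3)
        x3 = torch.relu(x3)

        # Concatenate the branches along the feature axis: (batch_size, 12)
        output = torch.cat((x1, x2, x3), dim=1)
        output = self.out(output)
        # The final ReLU restricts the prediction to non-negative values
        output = torch.relu(output)

        return output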

<center style="margin-left: 10%; margin-right: 10%; background-color: #eeeeee; padding-top: 10px; padding-bottom: 10px;"><figure><img src="Figures/lcp/NN.png" width="60%" height="30%"><figcaption></figcaption></figure></center><br>

<center style="margin-left: 10%; margin-right: 10%; background-color: #eeeeee; padding-top: 10px; padding-bottom: 10px;"><figure><img src="Figures/lcp/regression_distribution.png" width="100%" height="30%"><figcaption></figcaption></figure></center><br>

### Softmax Architecture
The NN is made of fully connected layers of increasing size, i.e. with an increasing number of nodes.
In the final layer, the softmax function is computed over the output nodes, which yields the probability distribution over the labels for a given input.

In [None]:
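import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # Layers of increasing size: 5 input features -> 10 -> 15 -> 20 labels
        self.input = nn.Linear(5, 10)
        self.linear1 = nn.Linear(10, 15)
        self.output = nn.Linear(15, 20)

    # x represents our data
    def forward(self, x):
        x = self.input(x)
        x = torch.relu(x)

        x = self.linear1(x)
        x = torch.relu(x)

        x = self.output(x)
        # Normalize over the last (label) axis, so that every record gets its own
        # probability distribution; dim=0 would normalize across a batch of records
        x = F.softmax(x, dim=-1)

        return x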
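
The choice of softmax axis matters as soon as the network receives a batch of records. The check below is a small sketch of this, assuming a batch of shape `(4, 20)`, i.e. 4 records and 20 labels; the random values in `logits` stand in for the output of the last linear layer and are not results from the analysis.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 20)          # hypothetical batch: 4 records, 20 labels

probs = F.softmax(logits, dim=-1)    # normalize over the labels of each record
row_sums = probs.sum(dim=-1)         # one value per record, each equal to 1
predicted = probs.argmax(dim=-1)     # most probable label for each record

# Normalizing over dim=0 instead mixes the records of the batch:
# every column sums to 1, so the rows are no longer probability distributions
mixed = F.softmax(logits, dim=0)
column_sums = mixed.sum(dim=0)
```

If the network were trained with `nn.CrossEntropyLoss` (the text does not say which loss was used), note that this loss expects unnormalized scores and applies the log-softmax internally, so the explicit softmax would be kept only for inspecting the probabilities.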

<center style="margin-left: 10%; margin-right: 10%; background-color: #eeeeee; padding-top: 10px; padding-bottom: 10px;"><figure><img src="Figures/lcp/NNsm.png" width="100%" height="30%"><figcaption></figcaption></figure></center><br>


<center style="margin-left: 10%; margin-right: 10%; background-color: #eeeeee; padding-top: 10px; padding-bottom: 10px;"><figure><img src="Figures/lcp/sm_distribution.png" width="100%" height="30%"><figcaption></figcaption></figure></center><br>
