<a href="https://www.kaggle.com/code/william2020/multi-layer-perception-in-mlx-explained?scriptVersionId=187289609" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# Multi-Layer Perceptron in MLX Explained

In this notebook, we will go through the `MLP` class definition line by line to understand how it constructs a Multi-Layer Perceptron (MLP) using the MLX framework.

### Multi-Layer Perceptrons (MLPs)
A Multi-Layer Perceptron (MLP) is a type of feedforward artificial neural network that consists of multiple layers of neurons: an input layer, one or more hidden layers, and an output layer. Each neuron in a layer is connected to every neuron in the subsequent layer, and non-linear activation functions between layers let the model represent relationships that a purely linear map cannot. MLPs are widely used for tasks such as classification, regression, and pattern recognition because they can learn complex relationships in data.

### MLX Framework
MLX is a high-performance array computing library designed specifically for machine learning on Apple silicon. It provides a NumPy-like API and supports automatic differentiation, lazy computation, and seamless execution on both CPU and GPU. MLX leverages the unified memory architecture of Apple devices, allowing efficient computation without the need for explicit data transfers between CPU and GPU, making it an excellent choice for developing and training machine learning models.

By combining the strengths of MLPs with the efficiency of the MLX framework, we can build powerful neural network models optimized for Apple's hardware.

### First, let's start off with the full MLP class...

In [None]:
!pip install mlx

In [None]:
import mlx.nn as nn
import mlx.core as mx

In [None]:
class MLP(nn.Module):
    def __init__(
        self, num_layers: int, input_dim: int, hidden_dim: int, output_dim: int
    ):
        super().__init__()
        layer_sizes = [input_dim] + [hidden_dim] * num_layers + [output_dim]
        self.layers = [
            nn.Linear(idim, odim)
            for idim, odim in zip(layer_sizes[:-1], layer_sizes[1:])
        ]

    def __call__(self, x):
        for l in self.layers[:-1]:
            x = mx.maximum(l(x), 0.0)
        return self.layers[-1](x)

## Let's break this Multi-Layer Perceptron down line by line

### 1. Class Definition

```python
    class MLP(nn.Module):
```

Defines a new class `MLP` that inherits from `nn.Module`, the base class for all neural network modules in MLX.

### 2. Initialization Method

```python
    def __init__(self, num_layers: int, input_dim: int, hidden_dim: int, output_dim: int):
```

- The `__init__` method is the constructor of the class.
- Parameters:
    - `num_layers`: Number of hidden layers in the MLP.
    - `input_dim`: Dimension of the input features.
    - `hidden_dim`: Dimension of the hidden layers.
    - `output_dim`: Dimension of the output layer.

### 3. Superclass Initialization

```python
    super().__init__()
```

Calls the constructor of the superclass nn.Module to initialize the base class.

### 4. Layer Sizes Definition

```python
    layer_sizes = [input_dim] + [hidden_dim] * num_layers + [output_dim]
```

- Creates a list layer_sizes that defines the sizes of each layer in the network.
- Example: If input_dim=784, hidden_dim=128, num_layers=2, and output_dim=10, then layer_sizes will be [784, 128, 128, 10].
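As a quick check, the same expression can be evaluated in plain Python, with no MLX required. The concrete dimensions are simply the illustrative values from the example above:

```python
# Illustrative values taken from the example above.
input_dim, hidden_dim, num_layers, output_dim = 784, 128, 2, 10

# One entry for the input, one per hidden layer, and one for the output.
layer_sizes = [input_dim] + [hidden_dim] * num_layers + [output_dim]
print(layer_sizes)  # [784, 128, 128, 10]

# N sizes describe N - 1 linear layers, i.e. num_layers + 1 of them.
print(len(layer_sizes) - 1)  # 3
```

Note that `num_layers` counts hidden layers, so the network ends up with one more linear layer than `num_layers`.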

### 5. Creating the Layers

```python
        self.layers = [
            nn.Linear(idim, odim)
            for idim, odim in zip(layer_sizes[:-1], layer_sizes[1:])
        ]
```

- Uses a list comprehension to create the linear layers of the MLP.
- zip(layer_sizes[:-1], layer_sizes[1:]) pairs each input dimension with the corresponding output dimension for each layer.
- nn.Linear(idim, odim) creates a linear layer with input dimension idim and output dimension odim.
- These layers are stored in self.layers.
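The pairing done by `zip` can be inspected without MLX. The sketch below reuses the illustrative sizes `[784, 128, 128, 10]` and also estimates the parameter count, assuming each linear layer holds a weight matrix plus a bias vector (the bias is an assumption here; it holds when `nn.Linear` is created with its bias enabled):

```python
layer_sizes = [784, 128, 128, 10]

# Pair each layer's input size with its output size.
pairs = list(zip(layer_sizes[:-1], layer_sizes[1:]))
print(pairs)  # [(784, 128), (128, 128), (128, 10)]

# Assumption: every layer has idim * odim weights plus odim bias terms.
param_counts = [idim * odim + odim for idim, odim in pairs]
print(param_counts)       # [100480, 16512, 1290]
print(sum(param_counts))  # 118282
```

Because consecutive pairs share a size, the output of one layer always matches the input expected by the next.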

### 6. Forward Method

```python
    def __call__(self, x):
```

Defines the forward pass of the network. This method is called when the instance is used as a function, e.g., model(x).
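The calling mechanism is ordinary Python rather than anything specific to MLX. A minimal sketch with a hypothetical toy class (`Scale` is made up for illustration and is not part of MLX):

```python
class Scale:
    """Hypothetical toy class, not part of MLX."""

    def __init__(self, factor):
        self.factor = factor

    def __call__(self, x):
        # Runs whenever the instance is called like a function.
        return self.factor * x


double = Scale(2)
print(double(21))           # 42
print(double.__call__(21))  # 42, the same call written out explicitly
```

In the same way, `model(x)` on an `MLP` instance runs the body of `MLP.__call__`.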

### 7. Applying the Layers

```python
        for l in self.layers[:-1]:
            x = mx.maximum(l(x), 0.0)
```

- Iterates over all layers except the last one.
- Applies each layer to the input x and uses the ReLU activation function (mx.maximum(l(x), 0.0)) to introduce non-linearity.
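To see what the element-wise maximum does to a layer's output, here is a plain-Python stand-in for `mx.maximum(values, 0.0)`. It works on a list rather than an MLX array, and the input values are made up for illustration:

```python
def relu(values):
    """Plain-Python stand-in for mx.maximum(values, 0.0)."""
    return [max(v, 0.0) for v in values]


# Hypothetical outputs of a linear layer before activation.
pre_activations = [-2.0, -0.5, 0.0, 1.5, 3.0]
print(relu(pre_activations))  # [0.0, 0.0, 0.0, 1.5, 3.0]
```

Negative entries are clamped to zero while positive entries pass through unchanged, which is what makes the stacked layers non-linear.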

### 8. Final Layer

```python
        return self.layers[-1](x)
```

- Applies the last layer to x, which by this point holds the output of the final hidden layer, and returns the result.
- The last layer does not use an activation function, allowing the network to output raw scores for tasks like classification.
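For classification, raw scores are commonly turned into probabilities with a softmax applied outside the model. The `MLP` class itself does not do this; the sketch below is a plain-Python illustration with made-up scores, not part of the notebook's code:

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, -1.0, 0.5]  # hypothetical raw scores for three classes
probs = softmax(logits)

# The predicted class is the index of the largest probability.
predicted = max(range(len(probs)), key=probs.__getitem__)
print(predicted)  # 0
```

Raw scores may be negative or larger than 1, which is why the final layer must not pass through ReLU.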

## Conclusion

The MLP class defines a simple multi-layer perceptron with a customizable number of hidden layers. It uses linear layers and ReLU activation functions, except for the final layer which outputs raw scores. This structure is typical for feedforward neural networks used in various machine learning tasks.