# Feedforward Network
---
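A feedforward network passes information in one direction, from input to output, with no cycles or feedback connections. It is built from three kinds of layers:\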
1. Input Layer: Each neuron represents a feature of the input data.
2. Hidden Layer(s): learn the complex patterns in the data. Each neuron computes a weighted sum of its inputs and then applies a non-linear activation function; the non-linearity is what lets the network model patterns that a purely linear map cannot.
3. Output Layer: Each neuron represents a class in classification or a prediction in regression.

At a high level, neural network layers are categorized as input, hidden, and output layers. The input layer receives the data, the output layer produces task-specific predictions, and the hidden layer(s) learn representations. "Hidden layer" is a broad category that includes many specialized layer types such as dense (fully connected), convolutional, recurrent, pooling, normalization, dropout, and attention.
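To make the three layer roles concrete, here is a minimal NumPy sketch of one forward pass through an input, a single hidden layer, and an output layer. The layer sizes, the choice of ReLU, and the helper names `relu` and `forward` are assumptions for illustration only.

```python
import numpy as np

def relu(z):
    """Non-linear activation: keeps positive values, zeroes out the rest."""
    return np.maximum(0.0, z)

def forward(x, params):
    """Forward pass: input -> hidden (ReLU) -> output (linear)."""
    W1, b1, W2, b2 = params
    h = relu(x @ W1.T + b1)   # hidden layer: weighted sum, then non-linearity
    return h @ W2.T + b2      # output layer: one value per output neuron

rng = np.random.default_rng(0)
n_features, n_hidden, n_outputs = 4, 8, 3   # hypothetical sizes
params = (
    rng.normal(size=(n_hidden, n_features)), np.zeros(n_hidden),
    rng.normal(size=(n_outputs, n_hidden)), np.zeros(n_outputs),
)

x = rng.normal(size=(5, n_features))   # batch of 5 samples, one column per feature
y = forward(x, params)
print(y.shape)  # (5, 3)
```

Each row of `x` is one sample, so the input "layer" is just the feature vector itself; only the hidden and output layers carry weights.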

# Training an FNN
Training means adjusting the weights of the neurons to minimize the error between the predicted output and the actual output. This is typically done with backpropagation and gradient descent.\
1. **Forward Propagation**: the input data passes through the network and the output is calculated.
2. **Loss Calculation**: the error between prediction and target is measured, typically with MSE for regression and cross-entropy for classification.
3. **Backpropagation**: the error is propagated back through the network. The gradient of the loss function with respect to each weight is calculated, and the weights are adjusted using gradient descent.
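As a rough illustration of the loss-calculation step, the two losses named above can be sketched in NumPy as follows. The function names and sample values are made up for the example, and the MSE here averages over all elements; other conventions differ by a constant factor.

```python
import numpy as np

def mse(y_pred, y_true):
    """Mean squared error, averaged over every element."""
    return np.mean((y_pred - y_true) ** 2)

def cross_entropy(probs, labels, eps=1e-12):
    """Mean negative log-probability of the correct class.

    probs:  (batch, classes) array of predicted probabilities
    labels: (batch,) array of integer class ids
    """
    picked = probs[np.arange(len(labels)), labels]  # probability given to the true class
    return -np.mean(np.log(picked + eps))           # eps guards against log(0)

# Regression: squared errors are 1 and 4, so the mean is 2.5
reg_loss = mse(np.array([2.0, 4.0]), np.array([1.0, 2.0]))
print(reg_loss)  # 2.5

# Classification: two samples, two classes
probs = np.array([[0.5, 0.5], [0.25, 0.75]])
labels = np.array([0, 1])
clf_loss = cross_entropy(probs, labels)
print(clf_loss)
```

Cross-entropy only looks at the probability assigned to the correct class, so a confident wrong prediction is penalized far more heavily than an uncertain one.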

## Forward pass

The forward pass computes predictions from inputs using the model's current weights. For a dense layer:

$$z = W x + b, \quad a = \phi(z)$$

Each layer's output becomes the next layer's input; the final layer produces $y_{pred}$. The forward pass alone does not update weights: updates happen afterwards, once the loss has been computed and backpropagation has produced the gradients.

In [None]:
# Minimal NumPy demo: single dense layer forward + manual backward (MSE)
import numpy as np
np.random.seed(0)
# batch_size=3, input_dim=4, output_dim=2
x = np.random.randn(3, 4)
W = np.random.randn(2, 4)
b = np.random.randn(2)
# targets (one-hot-like for simplicity)
y_true = np.array([[1, 0], [0, 1], [1, 0]])
# Forward: z = x W^T + b (identity activation)
z = x.dot(W.T) + b
y_pred = z
# Mean squared error (average over batch): 1/N * 0.5 * sum||y_pred - y_true||^2
loss = np.mean(0.5 * np.sum((y_pred - y_true)**2, axis=1))
print('Loss before update:', loss)
# Backward: gradients wrt outputs, weights, biases (batch-averaged)
dL_dy = (y_pred - y_true) / x.shape[0]  # shape (3,2)
dW = dL_dy.T.dot(x)  # shape (2,4)
db = np.sum(dL_dy, axis=0)  # shape (2,)
# Gradient step (SGD)
lr = 0.1
W -= lr * dW
b -= lr * db
# Forward again to see loss decrease
z2 = x.dot(W.T) + b
y_pred2 = z2
loss2 = np.mean(0.5 * np.sum((y_pred2 - y_true)**2, axis=1))
print('Loss after one SGD step:', loss2)

## Backpropagation

Backpropagation computes gradients of the loss with respect to each parameter using the chain rule and the values cached during the forward pass, then the optimizer uses those gradients to update weights.

Typical steps:

1. Compute the loss $L$ from predictions and targets.
2. Compute output-layer error:\
  $\delta^L = \partial L / \partial z^L = (\partial L / \partial a^L) \odot \phi'(z^L)$.
3. Backpropagate errors layer-by-layer:\
  $\delta^{l} = \big((W^{l+1})^T \delta^{l+1}\big) \odot \phi'(z^l)$.
4. Compute gradients, summing over the batch:\
  $\partial L / \partial W^l = \sum_{batch} \delta^l (a^{l-1})^T$, $\quad \partial L / \partial b^l = \sum_{batch} \delta^l$.

An optimizer (SGD, Adam, etc.) uses these gradients to update parameters (e.g. $W \leftarrow W - \eta \nabla_W L$, where $\eta$ is the learning rate). Modern frameworks perform these steps automatically via reverse-mode automatic differentiation.
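The steps above can be sketched for a two-layer network in NumPy. This is only an illustration under assumed choices: a tanh hidden layer, an identity output, a batch-averaged squared-error loss, samples stored as rows (so $(W^{l+1})^T \delta^{l+1}$ becomes `delta2 @ W2`), and made-up sizes. A finite-difference check on one weight is included as a sanity test of the hand-derived gradient.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4))                     # batch of 5 samples, 4 features
y = rng.normal(size=(5, 2))                     # regression targets
W1, b1 = rng.normal(size=(3, 4)), np.zeros(3)   # hidden layer, 3 units (tanh)
W2, b2 = rng.normal(size=(2, 3)), np.zeros(2)   # output layer, 2 units (identity)

def loss_fn(W1, b1, W2, b2):
    """Batch-averaged loss: 1/N * 0.5 * sum ||y_pred - y||^2."""
    a1 = np.tanh(x @ W1.T + b1)
    z2 = a1 @ W2.T + b2
    return 0.5 * np.sum((z2 - y) ** 2) / len(x)

# Forward pass, caching the values backpropagation needs
z1 = x @ W1.T + b1
a1 = np.tanh(z1)
z2 = a1 @ W2.T + b2

# Step 2: output-layer error (identity activation, so phi'(z2) = 1);
# the 1/N factor is folded in here so later sums give batch averages
delta2 = (z2 - y) / len(x)
# Step 3: propagate the error back through W2, then through tanh
delta1 = (delta2 @ W2) * (1.0 - a1 ** 2)        # tanh'(z) = 1 - tanh(z)^2
# Step 4: parameter gradients, summed over the batch
dW2, db2 = delta2.T @ a1, delta2.sum(axis=0)
dW1, db1 = delta1.T @ x, delta1.sum(axis=0)

# Sanity check: central finite difference on a single entry of W1
eps = 1e-6
W1_plus, W1_minus = W1.copy(), W1.copy()
W1_plus[0, 0] += eps
W1_minus[0, 0] -= eps
numeric = (loss_fn(W1_plus, b1, W2, b2) - loss_fn(W1_minus, b1, W2, b2)) / (2 * eps)
print(abs(numeric - dW1[0, 0]))  # should be very close to zero
```

A smooth activation such as tanh is used here because a finite-difference check can behave erratically near the kink of ReLU.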

In [2]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.losses import SparseCategoricalCrossentropy
from tensorflow.keras.metrics import SparseCategoricalAccuracy


In [4]:
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0


Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11490434/11490434 ━━━━━━━━━━━━━━━━━━━━ 3s 0us/step


In [6]:
model = Sequential([
    # Input: Flatten has no weights; it reshapes each 28x28 pixel image into a 784-element vector
    Flatten(input_shape=(28, 28)),
    # Hidden layer: 128 fully connected neurons with ReLU activation
    Dense(128, activation='relu'),
    # Output layer (not a hidden layer): 10 neurons, one per digit class;
    # softmax turns the scores into class probabilities
    Dense(10, activation='softmax')
])

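As a quick check on the size of this architecture, the number of trainable parameters can be worked out by hand. The helper `dense_params` below is a hypothetical name used only for this sketch; it counts one weight per input-output pair plus one bias per output neuron.

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

n_inputs = 28 * 28                      # Flatten: 784 values, no trainable parameters
hidden = dense_params(n_inputs, 128)    # 784 * 128 + 128
output = dense_params(128, 10)          # 128 * 10 + 10
total = hidden + output
print(hidden, output, total)  # 100480 1290 101770
```

Almost all of the parameters sit in the first dense layer, because it connects every one of the 784 pixels to each of the 128 hidden neurons.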


## Model is compiled with
- Adam optimizer
- Sparse Categorical Crossentropy loss function
- Sparse Categorical Accuracy metric

It is then trained for 5 epochs on the training data and evaluated on the test set.

In [7]:
model.compile(optimizer=Adam(),
              loss=SparseCategoricalCrossentropy(),
              metrics=[SparseCategoricalAccuracy()])
model.fit(x_train, y_train, epochs=5)

test_loss, test_acc = model.evaluate(x_test, y_test)
print(f'\nTest accuracy: {test_acc}')

Epoch 1/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 12s 5ms/step - loss: 0.2576 - sparse_categorical_accuracy: 0.9268
Epoch 2/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 9s 5ms/step - loss: 0.1151 - sparse_categorical_accuracy: 0.9661
Epoch 3/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 10s 5ms/step - loss: 0.0785 - sparse_categorical_accuracy: 0.9765
Epoch 4/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 9s 5ms/step - loss: 0.0578 - sparse_categorical_accuracy: 0.9824
Epoch 5/5
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 9s 5ms/step - loss: 0.0434 - sparse_categorical_accuracy: 0.9867
313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - loss: 0.0905 - sparse_categorical_accuracy: 0.9720

Test accuracy: 0.972000002861023
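The `1875/1875` and `313/313` counters in the output are batches, not samples. A short sketch of where they come from, assuming the Keras default `batch_size` of 32 (the calls above do not set it) and the standard MNIST split of 60,000 training and 10,000 test images:

```python
import math

batch_size = 32                      # assumed: Keras default when none is passed
n_train, n_test = 60_000, 10_000     # standard MNIST split sizes

steps_per_epoch = math.ceil(n_train / batch_size)   # weight updates per epoch
eval_steps = math.ceil(n_test / batch_size)         # last test batch is partial
print(steps_per_epoch, eval_steps)  # 1875 313
```

So each epoch performs 1875 gradient updates, and five epochs give 9375 updates in total.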
