# Ch 2 - Mathematical Building Blocks of Neural Networks

Understanding deep learning requires familiarity with many simple mathematical concepts: tensors, tensor operations, differentiation, gradient descent, and so on.

## 2.1 A First Look at a Neural Network



The problem we’re trying to solve here is to classify grayscale images of handwritten digits (28 × 28 pixels) into their 10 categories (0 through 9). We’ll use the MNIST dataset, a classic in the machine-learning community, which has been around almost as long as the field itself and has been intensively studied. It’s a set of 60,000 training images, plus 10,000 test images, assembled by the National Institute of Standards and Technology (the NIST in MNIST) in the 1980s. You can think of “solving” MNIST as the “Hello World” of deep learning—it’s what you do to verify that your algorithms are working as expected.

Note on classes and labels:
- In machine learning, a category in a classification problem is called a class. Data points are called samples. The class associated with a specific sample is called a label.

![MNIST](Images/02_01.jpg)



In [None]:
from keras.datasets import mnist

(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

*train_images* and *train_labels* form the training set, the data that the model will learn from.

The model will then be tested on the test set, *test_images* and *test_labels*.

The images are encoded as NumPy arrays, and the labels are an array of digits, ranging from 0 to 9. The images and labels have a one-to-one correspondence.

#### Training Data:

In [None]:
train_images.shape

In [None]:
len(train_labels)

In [None]:
train_labels

#### Testing Data:

In [None]:
test_images.shape

In [None]:
len(test_labels)

In [None]:
test_labels

The workflow will be as follows: First, we’ll feed the neural network the training data, *train_images* and *train_labels*. The network will then learn to associate images and labels. Finally, we’ll ask the network to produce predictions for *test_images*, and we’ll verify whether these predictions match the labels from *test_labels*.

#### The Network Architecture

In [None]:
from keras import models
from keras import layers

In [None]:
network = models.Sequential()
network.add(layers.Dense(512, activation='relu', input_shape=(28 * 28,)))
network.add(layers.Dense(10, activation='softmax'))

The core building block of a neural network is the layer, a data-processing module that you can think of as a filter for data. Some data goes in, and it comes out in a more useful form. Specifically, layers extract representations out of the data fed into them—hopefully, representations that are more meaningful for the problem at hand. Most of deep learning consists of chaining together simple layers that will implement a form of progressive data distillation. A deep-learning model is like a sieve for data processing, made of a succession of increasingly refined data filters—the layers.


Here, our network consists of a sequence of two Dense layers, which are densely connected (also called fully connected) neural layers. The second (and last) layer is a 10-way softmax layer, which means it will return an array of 10 probability scores (summing to 1). Each score will be the probability that the current digit image belongs to one of our 10 digit classes.
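
To make the two-layer structure concrete, here is a minimal NumPy sketch of the forward pass such a network would perform. It assumes each Dense layer computes `activation(input @ W + b)`. The weights `W1`, `b1`, `W2`, `b2` and the input batch `x` are random stand-ins invented for illustration, not the values Keras would learn.

```python
import numpy as np

def relu(x):
    # Element-wise max(x, 0)
    return np.maximum(x, 0.0)

def softmax(x):
    # Subtract the row-wise max before exponentiating for numerical stability
    shifted = x - x.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)

# Hypothetical random weights with the same shapes as the two Dense layers
W1 = rng.normal(scale=0.01, size=(28 * 28, 512))
b1 = np.zeros(512)
W2 = rng.normal(scale=0.01, size=(512, 10))
b2 = np.zeros(10)

# A batch of 4 fake flattened "images" with values in [0, 1)
x = rng.random((4, 28 * 28))

hidden = relu(x @ W1 + b1)         # shape (4, 512)
probs = softmax(hidden @ W2 + b2)  # shape (4, 10), each row sums to 1

print(probs.shape, probs.sum(axis=1))
```

Each row of `probs` holds 10 non-negative scores that sum to 1, which is what lets us read them as class probabilities.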


To make the network ready for training, we need to pick three more things, as part of the compilation step:

- A loss function: how the network will be able to measure its performance on the training data, and thus how it will be able to steer itself in the right direction.

- An optimizer: the mechanism through which the network will update itself based on the data it sees and its loss function.

- Metrics to monitor during training and testing: here, we'll only care about the accuracy (the fraction of the images that were correctly classified).
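
As a rough illustration of the loss and the metric described above, the sketch below computes categorical crossentropy and accuracy by hand with NumPy. The helper names, the `eps` clipping constant, and the tiny 3-class arrays are assumptions made for this example; Keras has its own implementations.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip predictions away from 0 to avoid log(0)
    y_pred = np.clip(y_pred, eps, 1.0)
    # Per-sample loss is -sum(true * log(pred)); average over the batch
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

def accuracy(y_true, y_pred):
    # Fraction of samples whose highest-scoring class matches the label
    return float(np.mean(np.argmax(y_true, axis=1) == np.argmax(y_pred, axis=1)))

# Three samples, three classes (one-hot labels and predicted probabilities)
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype="float32")
y_pred = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.6, 0.1]], dtype="float32")

loss = categorical_crossentropy(y_true, y_pred)
acc = accuracy(y_true, y_pred)
print(loss, acc)
```

The third sample is misclassified, so the accuracy is 2/3, and its low predicted probability for the true class (0.1) dominates the average loss.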

#### The Compilation Step

In [None]:
network.compile(optimizer='rmsprop',
                loss='categorical_crossentropy',
                metrics=['accuracy'])

Before training, we'll preprocess the data by reshaping it into the shape the network expects and scaling it so that all values are in the [0, 1] interval. Previously, our training images, for instance, were stored in an array of shape (60000, 28, 28) of type uint8 with values in the [0, 255] interval. We transform it into a float32 array of shape (60000, 28 * 28) with values between 0 and 1.

#### Preparing the Image Data

In [None]:
train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype('float32') / 255

test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype('float32') / 255
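
The effect of this preprocessing can be checked on a small synthetic array without downloading MNIST. In the sketch below, `fake_images` is a made-up stand-in for a few uint8 images, and `reshape` is given `-1` for the first dimension so that the sample count is inferred instead of hard-coded.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a few MNIST images: uint8 values in [0, 255], shape (5, 28, 28)
fake_images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Flatten each 28 x 28 image into a vector of 784 values
flat = fake_images.reshape((-1, 28 * 28))

# Convert to float32 and scale into the [0, 1] interval
scaled = flat.astype("float32") / 255

print(scaled.shape, scaled.dtype, scaled.min(), scaled.max())
```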

#### Preparing the Labels

In [None]:
from keras.utils import to_categorical

train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)
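
Categorically encoding the labels means turning each integer label into a one-hot vector. A minimal NumPy sketch of the idea follows; the `one_hot` helper is a hypothetical name used here for illustration and is not the Keras implementation.

```python
import numpy as np

def one_hot(labels, num_classes):
    labels = np.asarray(labels)
    # Start with all zeros, one row per label
    encoded = np.zeros((len(labels), num_classes), dtype="float32")
    # Set a single 1 in each row, at the column given by the label
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

encoded = one_hot([5, 0, 4], num_classes=10)
print(encoded)
```

Taking `np.argmax(encoded, axis=1)` recovers the original integer labels, so no information is lost by the encoding.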

Next, we train the network, which in Keras is done with a call to the network's fit method: we fit the model to its training data.

In [None]:
network.fit(train_images, train_labels, epochs=5, batch_size=128)
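
To get a feel for what `epochs=5` and `batch_size=128` imply, the sketch below counts the mini-batches in one pass over 60,000 samples, assuming one weight update per batch. The `iterate_batches` helper is a hypothetical name for illustration; it ignores details such as the per-epoch shuffling that Keras applies by default.

```python
import math

num_samples = 60000
batch_size = 128
epochs = 5

# One weight update per mini-batch; the last batch may be smaller
steps_per_epoch = math.ceil(num_samples / batch_size)
total_updates = steps_per_epoch * epochs

def iterate_batches(n, size):
    # Yield (start, end) index pairs that cover range(n) in chunks of `size`
    for start in range(0, n, size):
        yield start, min(start + size, n)

batches = list(iterate_batches(num_samples, batch_size))
last_batch_size = batches[-1][1] - batches[-1][0]

print(steps_per_epoch, total_updates, last_batch_size)
```

Under these assumptions, each epoch consists of 469 batches (the last one holding 96 samples), for 2,345 updates over the five epochs.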

Two quantities are displayed during training: the loss of the network over the training data, and the accuracy of the network over the training data.

In [None]:
test_loss, test_acc = network.evaluate(test_images, test_labels)
print('test_acc:', test_acc)

The test-set accuracy turns out to be 97.8% (the exact figure varies slightly from run to run), which is quite a bit lower than the training-set accuracy. This gap between training accuracy and test accuracy is an example of overfitting: machine-learning models tend to perform worse on new data than on their training data.

## 2.2 Data Representations for Neural Networks

### 2.2.1 Scalars (0D Tensors)

### 2.2.2 Vectors (1D Tensors)

### 2.2.3 Matrices (2D Tensors)

### 2.2.4 3D Tensors and Higher-Dimensional Tensors

### 2.2.5 Key Attributes

### 2.2.6 Manipulating Tensors in NumPy

### 2.2.7 The Notion of Data Batches

### 2.2.8 Real-World Examples of Data Tensors

### 2.2.9 Vector Data

### 2.2.10 Timeseries Data or Sequence Data

### 2.2.11 Image Data

### 2.2.12 Video Data

## 2.3 The Gears of Neural Networks: Tensor Operations

### 2.3.1 Element-Wise Operations

### 2.3.2 Broadcasting

### 2.3.3 Tensor Dot

### 2.3.4 Tensor Reshaping

### 2.3.5 Geometric Interpretation of Tensor Operations

### 2.3.6 A Geometric Interpretation of Deep Learning

## 2.4 The Engine of Neural Networks: Gradient-Based Optimization

### 2.4.1 What's a Derivative?

### 2.4.2 Derivative of a Tensor Operation: the Gradient

### 2.4.3 Stochastic Gradient Descent

### 2.4.4 Chaining Derivatives: the Backpropagation Algorithm

## 2.5 Looking Back at our First Example