# Ch 3 - Getting Started with Neural Networks

## 3.1 Anatomy of a Neural Network

Training a neural network revolves around the following objects:

- **Layers**, which are combined into a **network** (or **model**)

- The **input data** and corresponding **targets**

- The **loss function**, which defines the feedback signal used for learning

- The **optimizer**, which determines how learning proceeds



![NN1](Images/03_01.jpg)


The network maps the input data to predictions. The loss function then compares these predictions to the targets, producing a loss value: a measure of how well the network's predictions match what was expected. The optimizer uses this loss value to update the network's weights. 
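The loop described above can be sketched in plain Python. This is only an illustrative toy, not how Keras implements training: the one-weight model, the data, and the names `predict`, `mse_loss`, `loss_gradient`, and `train` are all assumptions made for the example.

```python
# Toy model: prediction = w * x, with a single learnable weight w.
inputs = [1.0, 2.0, 3.0, 4.0]
targets = [2.0, 4.0, 6.0, 8.0]  # made-up data following the rule y = 2x


def predict(w, xs):
    """The 'network': maps input data to predictions."""
    return [w * x for x in xs]


def mse_loss(predictions, ts):
    """The loss function: compares predictions to targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, ts)) / len(ts)


def loss_gradient(w, xs, ts):
    """Derivative of mean((w*x - t)^2) with respect to w."""
    return sum(2 * (w * x - t) * x for x, t in zip(xs, ts)) / len(xs)


def train(w=0.0, learning_rate=0.01, steps=200):
    """The optimizer: uses the loss gradient to update the weight."""
    history = []
    for _ in range(steps):
        predictions = predict(w, inputs)
        history.append(mse_loss(predictions, targets))
        w -= learning_rate * loss_gradient(w, inputs, targets)
    return w, history


final_w, loss_history = train()
print(f"learned w = {final_w:.3f}")
print(f"first loss = {loss_history[0]:.3f}, last loss = {loss_history[-1]:.6f}")
```

The loss shrinks at every step because each update moves `w` against the gradient of the loss, which is the feedback signal the figure describes.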



### 3.1.1 Layers: the Building Blocks of Deep Learning


The **layer** is the fundamental data structure in neural networks. It is a data-processing module that takes one or more tensors as input and outputs one or more tensors. 

Some layers are stateless, but they more frequently have a state: the layer's **weights**, one or several tensors learned with stochastic gradient descent, which together contain the network's knowledge. 
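To make the distinction between stateful and stateless layers concrete, here is a minimal sketch in plain Python. `TinyDense` and `relu` are hypothetical names invented for this example; they are not Keras classes, and a real layer would operate on tensors rather than nested lists.

```python
import random


class TinyDense:
    """A hypothetical, minimal dense layer: output = inputs . kernel + bias."""

    def __init__(self, input_dim, units, seed=0):
        rng = random.Random(seed)
        # The layer's state: a weight matrix and a bias vector.
        self.kernel = [
            [rng.uniform(-0.1, 0.1) for _ in range(units)]
            for _ in range(input_dim)
        ]
        self.bias = [0.0] * units

    def __call__(self, batch):
        # batch: a list of samples, each a list of input_dim floats.
        outputs = []
        for sample in batch:
            row = [
                sum(x * self.kernel[i][j] for i, x in enumerate(sample))
                + self.bias[j]
                for j in range(len(self.bias))
            ]
            outputs.append(row)
        return outputs


def relu(batch):
    """A stateless layer: no weights, only an element-wise transformation."""
    return [[max(0.0, value) for value in row] for row in batch]


layer = TinyDense(input_dim=3, units=2)
batch = [[1.0, 2.0, 3.0], [0.0, -1.0, 0.5]]
activations = relu(layer(batch))
print(len(activations), len(activations[0]))  # samples, units
```

In this sketch, training would adjust `kernel` and `bias`, while `relu` has nothing to learn.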

Different layers are appropriate for different tensor formats and different types of data processing. 

- **Dense Layers**: Simple vector data stored in 2D tensors of shape (samples, features) is often processed by densely connected layers, also called fully connected or dense layers (the Dense class in Keras).

- **Recurrent Layers**: Sequence data stored in 3D tensors of shape (samples, timesteps, features) is typically processed by recurrent layers such as an LSTM layer.

- **2D Convolution Layers**: Image data stored in 4D tensors is usually processed by 2D convolution layers (Conv2D).
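The tensor ranks in the list above can be illustrated with nested Python lists. The helpers `zeros` and `tensor_shape` are hypothetical, and the concrete sizes are made up. The source does not state the layout of the 4D image tensor, so the (samples, height, width, channels) ordering used here is an assumption.

```python
def zeros(*shape):
    """Build a nested list of zeros with the given shape."""
    if len(shape) == 1:
        return [0.0] * shape[0]
    return [zeros(*shape[1:]) for _ in range(shape[0])]


def tensor_shape(nested):
    """Return the shape of a regular, non-empty nested list as a tuple."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0]
    return tuple(shape)


vector_data = zeros(8, 20)        # (samples, features)
sequence_data = zeros(8, 10, 16)  # (samples, timesteps, features)
image_data = zeros(8, 28, 28, 3)  # assumed (samples, height, width, channels)

for name, data in [
    ("vector", vector_data),
    ("sequence", sequence_data),
    ("image", image_data),
]:
    shape = tensor_shape(data)
    print(f"{name}: rank {len(shape)}, shape {shape}")
```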


Building deep-learning models in Keras is done by clipping together compatible layers to form useful data-transformation pipelines. The notion of **layer compatibility** here refers specifically to the fact that every layer will only accept input tensors of a certain shape and will return output tensors of a certain shape.




#### Example:
(A dense layer with 32 output units)




```python
from keras import layers

layer = layers.Dense(32, input_shape=(784, ))
```

This creates a layer that only accepts 2D tensors whose feature dimension (axis 1) is 784. Axis 0, the batch dimension, is unspecified, so any batch size is accepted. The layer returns a tensor whose feature dimension has been transformed to 32.

This layer can only be connected to a downstream layer that expects 32-dimensional vectors as its input.

When using Keras, you don't have to worry about compatibility because the layers you add to your models are dynamically built to match the shape of the incoming layer. 

```python
from keras import models

model = models.Sequential()
model.add(layers.Dense(32, input_shape=(784, )))
model.add(layers.Dense(32))
```

The second layer didn't receive an input shape argument. Instead, it automatically inferred its input shape from the output shape of the layer that came before.
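The shape inference can be followed by hand. The sketch below is plain Python, not Keras code, and `infer_dense_stack` is a hypothetical helper. It assumes each dense layer has a bias term, which is the Keras default for `Dense`.

```python
def infer_dense_stack(input_dim, units_per_layer):
    """Propagate feature dimensions through a stack of dense layers.

    Returns one (input_dim, output_dim, parameter_count) tuple per layer.
    """
    summary = []
    current_dim = input_dim
    for units in units_per_layer:
        # Weight matrix of shape (current_dim, units) plus one bias per unit.
        parameter_count = current_dim * units + units
        summary.append((current_dim, units, parameter_count))
        current_dim = units  # the next layer's input is this layer's output
    return summary


for in_dim, out_dim, params in infer_dense_stack(784, [32, 32]):
    print(f"Dense: {in_dim} -> {out_dim} ({params} parameters)")
```

For the two-layer model above, the first layer maps 784 features to 32, so the second layer is built with an input dimension of 32.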

### 3.1.2 Models: Networks of Layers

The most common deep-learning model is a linear stack of layers, mapping a single input to a single output. You will also be exposed to a broader variety of network topologies:

- Two-branch networks

- Multihead networks

- Inception blocks

The topology of a network defines a **hypothesis space**. We defined machine learning as "searching for useful representations of some input data, within a predefined space of possibilities, using guidance from a feedback signal." By choosing a network topology, you constrain your **space of possibilities** (hypothesis space) to a specific series of tensor operations, mapping input data to output data. Then you will look for a good set of values for the weight tensors involved in these tensor operations. 


Picking the right **network architecture** is more art than science, and although there are some best practices and principles you can rely on, only practice can help you become a proper neural-network architect. 

### 3.1.3 Loss Functions and Optimizers: Keys to Configuring the Learning Process


Once you define a network architecture, you still have to choose two more things:

- **Loss function (objective function)**: The quantity that will be minimized during training. It represents a measure of success for the task at hand.

- **Optimizer**: Determines how the network will be updated based on the loss function. It implements a specific variant of stochastic gradient descent (SGD).


A neural network that has multiple outputs may have multiple loss functions (one per output). But the gradient-descent process must be based on a single scalar loss value; so, for multiloss networks, all losses are combined (via averaging) into a single scalar quantity.
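As a rough illustration of reducing several losses to one scalar, the sketch below averages per-output loss values. `combine_losses` is a hypothetical helper, the loss values are made up, and the optional `weights` argument is an assumption added for the example rather than something stated above.

```python
def combine_losses(losses, weights=None):
    """Reduce several per-output loss values to a single scalar.

    With no weights this is a plain average. `weights` is a hypothetical
    extension for outputs that should count more or less.
    """
    if weights is None:
        weights = [1.0] * len(losses)
    if len(weights) != len(losses):
        raise ValueError("need exactly one weight per loss")
    return sum(w * l for w, l in zip(weights, losses)) / sum(weights)


output_a_loss = 4.0  # made-up loss value for the first output
output_b_loss = 0.5  # made-up loss value for the second output

print(combine_losses([output_a_loss, output_b_loss]))
print(combine_losses([output_a_loss, output_b_loss], weights=[1.0, 3.0]))
```

Gradient descent then operates on this single number, so every output's error influences the weight update.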


Choosing the right objective function for the right problem is important. Your network will take any shortcut it can to minimize the loss, so if the objective doesn't fully correlate with success for the task at hand, your network will end up doing things you might not want.


When it comes to common problems such as classification, regression, and sequence prediction, there are simple guidelines you can follow to choose the correct loss. Only when you're working on truly new research problems will you have to develop your own objective functions.

For example:

- For a two-class classification problem, you'll use binary crossentropy

- For a many-class classification problem, you'll use categorical crossentropy

- For a regression problem, you'll use mean-squared error

- For a sequence-learning problem, you'll use connectionist temporal classification (CTC)
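The first three losses in the list are simple enough to write out by hand. The following is a minimal sketch in plain Python, not the Keras implementation. The function names, the `eps` clipping constant, and the sample values are assumptions for illustration. CTC is considerably more involved and is left out.

```python
import math


def binary_crossentropy(targets, probabilities, eps=1e-7):
    """targets are 0 or 1; probabilities are the predicted P(class = 1)."""
    total = 0.0
    for t, p in zip(targets, probabilities):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(targets)


def categorical_crossentropy(one_hot_targets, probabilities, eps=1e-7):
    """Each target row is one-hot; each probability row sums to 1."""
    total = 0.0
    for target_row, prob_row in zip(one_hot_targets, probabilities):
        total += -sum(
            t * math.log(max(p, eps)) for t, p in zip(target_row, prob_row)
        )
    return total / len(one_hot_targets)


def mean_squared_error(targets, predictions):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)


print(binary_crossentropy([1, 0], [0.9, 0.2]))
print(categorical_crossentropy([[0, 1, 0]], [[0.25, 0.5, 0.25]]))
print(mean_squared_error([1.0, 2.0], [1.0, 4.0]))
```

In each case a confident, correct prediction gives a loss near zero, and the loss grows as predictions move away from the targets.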



## 3.2 Introduction to Keras

Keras is a deep-learning framework for Python that provides a convenient way to define and train almost any kind of deep-learning model. Keras was initially developed for researchers, with the aim of enabling fast experimentation. It is distributed under the MIT license, which means it can be freely used in commercial projects.

Keras has the following features:

- Allows the same code to run seamlessly on CPU or GPU.

- Has a user-friendly API that makes it easy to quickly prototype deep-learning models.

- Has built-in support for convolutional networks (for computer vision), recurrent networks (for sequence processing), and any combination of both.

- Supports arbitrary network architectures: multi-input or multi-output models, layer sharing, model sharing, and so on. This means Keras is appropriate for building essentially any deep-learning model, from a generative adversarial network to a neural Turing machine.









### 3.2.1 Keras, TensorFlow, Theano, and CNTK

### 3.2.2 Developing with Keras: a Quick Overview

## 3.3 Setting Up a Deep-Learning Workstation

### 3.3.1 Jupyter Notebooks: the Preferred Way to Run Deep-Learning Experiments

### 3.3.2 Getting Keras Running: Two Options

### 3.3.3 Running Deep-Learning Jobs in the Cloud: Pros and Cons

### 3.3.4 What is the Best GPU for Deep Learning?

## 3.4 Classifying Movie Reviews: a Binary Classification Example

### 3.4.1 The IMDB Dataset

### 3.4.2 Preparing the Data

### 3.4.3 Building Your Network

### 3.4.4 Validating Your Approach

### 3.4.5 Using a Trained Network to Generate Predictions on New Data

### 3.4.6 Further Experiments

### 3.4.7 Wrapping Up

## 3.5 Classifying Newswires: a Multiclass Classification Example

### 3.5.1 The Reuters Dataset

### 3.5.2 Preparing the Data

### 3.5.3 Building Your Network

### 3.5.4 Validating Your Approach

### 3.5.5 Generating Predictions on New Data

### 3.5.6 A Different Way to Handle the Labels and the Loss

### 3.5.7 The Importance of Having Sufficiently Large Intermediate Layers

### 3.5.8 Further Experiments

### 3.5.9 Wrapping Up

## 3.6 Predicting House Prices: a Regression Example

### 3.6.1 The Boston Housing Price Dataset

### 3.6.2 Preparing the Data

### 3.6.3 Building Your Network

### 3.6.4 Validating Your Approach Using K-fold Validation

### 3.6.5 Wrapping Up