# Fundamentals of Deep Learning
## Table of Contents
1. [The Neural Network](#Chapter-1.-The-Neural-Network)
    1. The Neuron
    1. Feed-Forward Neural Networks
    1. Linear Neurons
    1. Sigmoid, Tanh, and ReLU Neurons
    1. Softmax Output Layers

# Chapter 1. The Neural Network
## The Neuron
![1-6](https://www.safaribooksonline.com/library/view/fundamentals-of-deep/9781491925607/assets/fodl_0106.png)

Figure 1-6. A functional description of a biological neuron’s structure

The neuron receives its inputs along antennae-like structures called `dendrites`. Each of these incoming connections is dynamically strengthened or weakened based on how often it is used (this is how we learn new concepts!), and it’s the strength of each connection that determines the contribution of the input to the neuron’s output. After being weighted by the strength of their respective connections, the inputs are summed together in the `cell body`. This sum is then transformed into a new signal that’s propagated along the cell’s `axon` and sent off to other neurons.

![1-7](https://www.safaribooksonline.com/library/view/fundamentals-of-deep/9781491925607/assets/fodl_0107.png)

Figure 1-7. Schematic for a neuron in an artificial neural net

Just as in biological neurons, our artificial neuron takes in some number of inputs, $x_1, x_2, \cdots, x_n$, each of which is multiplied by a specific weight, $w_1, w_2, \cdots, w_n$. These weighted inputs are, as before, summed together to produce the `logit` of the neuron, $z=\sum_{i=1}^n w_ix_i$. In many cases, the logit also includes a `bias`, which is a constant (not shown in the figure). The logit is then passed through a function $f$ to produce the output $y = f(z)$. This output can be transmitted to other neurons.

Let’s reformulate the inputs as a vector $x = [x_1, x_2, \cdots, x_n]$ and the weights of the neuron as $w = [w_1, w_2, \cdots, w_n]$. Then we can re-express the output of the neuron as $y=f(x \cdot w + b)$, where $b$ is the bias term.
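As a minimal sketch in plain Python, the vectorized form $y = f(x \cdot w + b)$ can be computed directly. The function name `neuron_output` and the identity activation used in the example are illustrative assumptions, not part of the text.

```python
from typing import Callable, Sequence


def neuron_output(x: Sequence[float], w: Sequence[float], b: float,
                  f: Callable[[float], float]) -> float:
    """Compute y = f(x . w + b) for a single artificial neuron."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    # Logit: weighted sum of the inputs plus the bias term
    z = sum(xi * wi for xi, wi in zip(x, w)) + b
    # Pass the logit through the activation function f
    return f(z)


# Example with an identity activation, so y equals the logit:
# 1*0.5 + 2*(-1) + 3*2 + 0.5 = 5.0
y = neuron_output([1.0, 2.0, 3.0], [0.5, -1.0, 2.0], b=0.5, f=lambda z: z)
print(y)  # 5.0
```

Passing `f` as a parameter keeps the weighted-sum step separate from the choice of nonlinearity, which mirrors how the text treats the logit and the activation as two distinct stages.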

## Feed-Forward Neural Networks
![1-9](https://www.safaribooksonline.com/library/view/fundamentals-of-deep/9781491925607/assets/fodl_0109.png)

Figure 1-9. A simple example of a feed-forward neural network with three layers (input, one hidden, and output) and three neurons per layer

The bottom layer of the network pulls in the input data. The top layer of neurons (output nodes) computes our final answer. The middle layer(s) of neurons are called the hidden layers, and we let $w_{i,j}^{(k)}$ be the weight of the connection between the $i^{th}$ neuron in the $k^{th}$ layer and the $j^{th}$ neuron in the $(k+1)^{st}$ layer. These weights constitute our parameter vector, $\theta$; our ability to solve problems with neural networks depends on finding the optimal values to plug into $\theta$.

We note that in this example, connections only traverse from a lower layer to a higher layer. There are no connections between neurons in the same layer, and there are no connections that transmit data from a higher layer to a lower layer. These neural networks are called `feed-forward` networks.
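A minimal sketch of how a feed-forward network computes its output one layer at a time, assuming every layer is fully connected to the next and every neuron uses the same activation $f$ (both assumptions for illustration). The names `forward`, `weights`, `biases`, and `relu` are hypothetical; `weights[k][i][j]` plays the role of $w_{i,j}^{(k)}$, zero-indexed. Because data only moves from layer $k$ to layer $k+1$, a single loop over the layers suffices.

```python
from typing import Callable, List, Sequence


def relu(z: float) -> float:
    return max(0.0, z)


def forward(x: Sequence[float],
            weights: List[List[List[float]]],
            biases: List[List[float]],
            f: Callable[[float], float]) -> List[float]:
    """Feed-forward pass through fully connected layers.

    weights[k][i][j]: weight from neuron i in layer k to neuron j in layer k + 1.
    biases[k][j]: bias of neuron j in layer k + 1.
    """
    activations = list(x)  # layer 0 is the raw input
    for W, b in zip(weights, biases):
        n_out = len(W[0])
        # Neuron j in the next layer sums weighted outputs of every neuron i below it
        activations = [
            f(sum(activations[i] * W[i][j] for i in range(len(activations))) + b[j])
            for j in range(n_out)
        ]
    return activations


# Three layers with three neurons each, as in Figure 1-9
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
ones = [[1.0] * 3 for _ in range(3)]
out = forward([1.0, 2.0, 3.0], [identity, ones], [[0.0] * 3, [0.0] * 3], relu)
print(out)  # [6.0, 6.0, 6.0]
```

The nested lists make the $w_{i,j}^{(k)}$ indexing explicit; a practical implementation would store each layer's weights as a matrix and use a linear algebra library instead.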

## Linear Neurons
![1-10](https://www.safaribooksonline.com/library/view/fundamentals-of-deep/9781491925607/assets/fodl_0110.png)

Figure 1-10. An example of a linear neuron
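A linear neuron passes its logit through a linear function, $f(z) = az + c$ for some constants $a$ and $c$ (the letter $c$ is used here only to avoid clashing with the bias $b$; in the code they are named `slope` and `intercept`, which are illustrative names). The sketch below also checks why such neurons are limited: composing two linear functions yields another linear function, so stacking layers of linear neurons adds no expressive power.

```python
from functools import partial
from typing import Sequence


def linear_activation(z: float, slope: float = 1.0, intercept: float = 0.0) -> float:
    """Linear neuron activation f(z) = slope * z + intercept."""
    return slope * z + intercept


def linear_neuron(x: Sequence[float], w: Sequence[float], bias: float,
                  slope: float = 1.0, intercept: float = 0.0) -> float:
    # Logit z = x . w + bias, then apply the linear activation
    z = sum(xi * wi for xi, wi in zip(x, w)) + bias
    return linear_activation(z, slope, intercept)


# f2(f1(z)) = 3 * (2z + 1) - 4 = 6z - 1, which is again linear in z
f1 = partial(linear_activation, slope=2.0, intercept=1.0)
f2 = partial(linear_activation, slope=3.0, intercept=-4.0)
for z in (-1.0, 0.0, 2.5):
    assert f2(f1(z)) == linear_activation(z, slope=6.0, intercept=-1.0)
```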

## Sigmoid, Tanh, and ReLU Neurons
### Sigmoid
$$f(z)=\frac{1}{1+e^{-z}}$$

![1-11](https://www.safaribooksonline.com/library/view/fundamentals-of-deep/9781491925607/assets/fodl_0111.png)

Figure 1-11. The output of a sigmoid neuron as z varies
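A minimal Python sketch of the sigmoid. The two-branch form is an implementation choice (not from the text) that avoids an `OverflowError` from `math.exp` for large negative $z$; the output approaches 0 and 1 at the extremes and equals 0.5 at $z = 0$, matching the curve in Figure 1-11.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic sigmoid f(z) = 1 / (1 + e^{-z}), computed without overflow."""
    if z >= 0:
        # Here e^{-z} <= 1, so the exponential cannot overflow
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the equivalent form e^{z} / (1 + e^{z})
    ez = math.exp(z)
    return ez / (1.0 + ez)


print(sigmoid(0.0))      # 0.5
print(sigmoid(-1000.0))  # 0.0 (the naive formula would call math.exp(1000) and overflow)
```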

### Tanh
Tanh neurons use a similar kind of S-shaped nonlinearity, but instead of ranging from 0 to 1, the output of tanh neurons ranges from −1 to 1.

![1-12](https://www.safaribooksonline.com/library/view/fundamentals-of-deep/9781491925607/assets/fodl_0112.png)

Figure 1-12. The output of a tanh neuron as z varies
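The "similar S-shaped nonlinearity" can be made precise: tanh is a rescaled and shifted sigmoid, $\tanh(z) = 2\sigma(2z) - 1$, where $\sigma(z) = \frac{1}{1+e^{-z}}$ is the sigmoid defined above. A short check in Python, using the standard library's `math.tanh` as the reference (the helper names are illustrative):

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def tanh_via_sigmoid(z: float) -> float:
    """tanh expressed through the sigmoid: tanh(z) = 2 * sigmoid(2z) - 1."""
    return 2.0 * sigmoid(2.0 * z) - 1.0


for z in (-3.0, -0.5, 0.0, 0.5, 3.0):
    # Agrees with math.tanh up to floating-point rounding
    assert abs(tanh_via_sigmoid(z) - math.tanh(z)) < 1e-12
    # Output lies in (-1, 1) rather than (0, 1)
    assert -1.0 < math.tanh(z) < 1.0
```

The scaling by 2 and the shift by −1 are exactly what move the output range from (0, 1) to (−1, 1).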

### ReLU
A rectified linear unit (ReLU) neuron uses the function $f(z)=\max(0,z)$, resulting in a characteristic hockey-stick-shaped response, as shown in Figure 1-13.

![1-13](https://www.safaribooksonline.com/library/view/fundamentals-of-deep/9781491925607/assets/fodl_0113.png)

Figure 1-13. The output of a ReLU neuron as z varies
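A one-line Python sketch of the ReLU activation (the name `relu` is illustrative). Negative logits are clipped to zero and positive logits pass through unchanged, which produces the hockey-stick shape of Figure 1-13.

```python
def relu(z: float) -> float:
    """Rectified linear unit: f(z) = max(0, z)."""
    return max(0.0, z)


print([relu(z) for z in (-2.0, -0.5, 0.0, 0.5, 2.0)])  # [0.0, 0.0, 0.0, 0.5, 2.0]
```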

## Softmax Output Layers
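A softmax output layer is used when the network's output should be a probability distribution over a set of mutually exclusive labels, so we require the sum of all the outputs to be equal to 1. Letting $z_i$ be the logit of the $i^{th}$ softmax neuron, we can achieve this normalization by setting its output to: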

$$y_i=\frac{e^{z_i}}{\sum_j e^{z_j}}$$
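A minimal sketch of the softmax in plain Python (the function name is illustrative). Subtracting the largest logit before exponentiating is a standard implementation trick, not described in the text, that prevents overflow without changing the outputs: the common factor $e^{-\max_k z_k}$ appears in both numerator and denominator and cancels.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """y_i = e^{z_i} / sum_j e^{z_j}, computed in a numerically stable way."""
    # Shift so the largest exponent is 0; the shift cancels in the ratio
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print(probs)       # approximately [0.659, 0.242, 0.099]
print(sum(probs))  # 1.0 up to floating-point rounding
```

Without the shift, logits such as 1000 would make `math.exp` overflow; with it, `softmax([1000.0, 1000.0])` returns `[0.5, 0.5]`.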