# Crime Prediction with Recurrent Neural Networks #

## Outline
 - Why do we need Recurrent Neural Networks?
 - Introduction to RNNs
 - Long Short-Term Memory Cell
 - Gated Recurrent Unit
 - Introduction to the Crime Dataset 
 - Implementation of RNN, LSTM and GRU
 - Parameter Tuning and Evaluation

## Why do we need Recurrent Neural Networks? - An Example from Natural Language Processing

### Is this a __positive__ or a __negative__ statement?
<br>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; __“In France, I had a great time”__

#### One Solution:
- Use bag-of-words to transform the sentence into a vector
- Use a Feed Forward Network to predict the class of the given sentence

<br>

#### Notice:
- The order of the words is not taken into account
- Each sentence / document is a single observation
- The classification of the next sentence is independent of the previous sentence
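
A minimal sketch of the first point, assuming a tiny hand-written vocabulary and a hypothetical helper `bag_of_words` (neither is part of the original slides): two sentences with the same words in a different order end up with the same vector.

```python
from collections import Counter

def bag_of_words(sentence, vocabulary):
    """Count how often each vocabulary word occurs in the sentence."""
    tokens = sentence.lower().replace(",", "").split()
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]

# Toy vocabulary built by hand from the example sentence (an assumption).
vocabulary = ["a", "france", "great", "had", "i", "in", "time"]

original = bag_of_words("In France, I had a great time", vocabulary)
shuffled = bag_of_words("Time great a had I, France in", vocabulary)

print(original)              # one count per vocabulary word
print(original == shuffled)  # word order is lost in the vector
```

Because both vectors are identical, a Feed Forward Network fed with them cannot tell the two sentences apart.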



## Let's change our prediction task:

### What will be the next word?
<br>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; __“In France, I had a great ...?”__
<br>
#### What's different?
- Each word is an observation at a given point in time
- The next word depends mainly on the previous words -> the order is important
- Each word might be represented as a single vector

<br>

#### Notice:
- Hard to solve with a Feed Forward Network: it expects a fixed-size input and has no memory of earlier words
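
A small sketch of how the example sentence turns into next-word training pairs. The helper `next_word_pairs` is a hypothetical name used only for illustration; note that the contexts grow with every step, which is what a fixed-size input layer struggles with.

```python
def next_word_pairs(sentence):
    """Return (previous words, next word) pairs for every position."""
    tokens = sentence.replace(",", "").split()
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

pairs = next_word_pairs("In France, I had a great time")
for context, target in pairs:
    print(context, "->", target)
```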


## How can we capture these properties in a Neural Network?

### What will be the next word?
<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; __“In France, I had a great ...?”__
<br>
### Recap Feed Forward Network
<img src="presentation_pics/neural_network1.png" alt="NN" style="width: 400px;"/>
<br>
[Source:  Alisa's Presentation on NN Primer]

## Let's simplify our representation and the input size

### What will be the next word?
<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; __“great ...?”__
<br>
<img src="presentation_pics/NN_simplified.png" alt="NN" style="width: 400px;"/>
<br>
[Source:  fastai Practical Deep Learning for Coders, Lesson 6: RNNs]


## How to add another preceding timestep

### What will be the next word?
<br>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; __“a great ...?”__
<br>
<img src="presentation_pics/RNN2.png" alt="NN" style="width: 400px;"/>
<br>
[Source:  fastai Practical Deep Learning for Coders, Lesson 6: RNNs]



## How to add another preceding timestep

### What will be the next word?
<br>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; __“had a great ...?”__
<br>
<img src="presentation_pics/RNN3.png" alt="NN" style="width: 600px;"/>
<br>
[Source:  fastai Practical Deep Learning for Coders, Lesson 6: RNNs]




## Adding an arbitrary number of preceding words

### What will be the next word?
<br>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; __“In France, I had a great ...?”__<br>
<img src="presentation_pics/RNN_loop.png" alt="NN" style="width: 600px;"/>
<br>
[Source:  fastai Practical Deep Learning for Coders, Lesson 6: RNNs]

## Yet another representation

<br>

<img src="presentation_pics/RNN_alt.png" alt="NN" style="width: 600px;"/>
<br>
[Source:  fastai Practical Deep Learning for Coders, Lesson 6: RNNs]

### [Placeholder Math forward]  
<br>
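\begin{align}
h_{t-1} &= \mathrm{sigmoid}(0 + X_{t-1} W_{in}) \\
h_t &= \mathrm{sigmoid}(h_{t-1} W_h + X_t W_{in}) \\
h_{t+1} &= \mathrm{sigmoid}(h_t W_h + X_{t+1} W_{in}) \\
Y &= \mathrm{softmax}(h_{t+1} W_{out})
\end{align}

$Y$ is the predicted distribution over the next word.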
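
The forward equations above can be sketched in plain Python. This is an illustration under assumptions, not the implementation used later: the sizes, the random weights, and the helper names (`vec_mat`, `rnn_forward`, `one_hot`) are all made up for the example.

```python
import math
import random

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def softmax(v):
    m = max(v)                                  # subtract max for stability
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]

def vec_mat(v, M):
    """Row vector v (length n) times matrix M (n x k)."""
    return [sum(v[i] * M[i][j] for i in range(len(v)))
            for j in range(len(M[0]))]

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]

def rnn_forward(inputs, W_in, W_h, W_out):
    """Apply h = sigmoid(h W_h + x W_in) per step, then softmax(h W_out)."""
    h = [0.0] * len(W_h)                        # the "0" in the first equation
    for x in inputs:
        h = sigmoid(add(vec_mat(h, W_h), vec_mat(x, W_in)))
    return softmax(vec_mat(h, W_out))

rng = random.Random(0)
vocab_size, hidden_size = 5, 3                  # toy sizes (assumption)
W_in = random_matrix(vocab_size, hidden_size, rng)
W_h = random_matrix(hidden_size, hidden_size, rng)
W_out = random_matrix(hidden_size, vocab_size, rng)

# Three one-hot encoded words standing in for X_{t-1}, X_t, X_{t+1}.
inputs = [one_hot(i, vocab_size) for i in (0, 3, 1)]
y = rnn_forward(inputs, W_in, W_h, W_out)
print(y)                                        # probabilities over the vocabulary
```

The same three weight matrices are reused at every timestep, which is what the loop in the diagram expresses.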

### [Placeholder Math derivation]  

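\begin{align}
\frac{\partial Y}{\partial X_{t-1}} = \frac{\partial Y}{\partial h_{t+1}} \cdot \frac{\partial h_{t+1}}{\partial h_t} \cdot \frac{\partial h_t}{\partial h_{t-1}} \cdot \frac{\partial h_{t-1}}{\partial X_{t-1}}
\end{align}

with

\begin{align}
h_{t-1} &= \mathrm{sigmoid}(X_{t-1} W_{in}) \\
h_t &= \mathrm{sigmoid}(h_{t-1} W_h + X_t W_{in}) \\
h_{t+1} &= \mathrm{sigmoid}(h_t W_h + X_{t+1} W_{in}) \\
Y &= \mathrm{softmax}(h_{t+1} W_{out})
\end{align}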

# Why not use a plain RNN for the implementation?
- Issue with long-term dependencies: back to our example
# “In France, I had a great time and I learnt some of the _____ language.”
 
- The relevant information ("France") lies too far back for the RNN to connect it to the missing word
- This is caused by the "vanishing gradient problem"




# Vanishing Gradient Problem
- RNNs are trained with backpropagation through time
- The larger the gap between timesteps, the longer the product of gradients, and we keep multiplying very small numbers (small gradients)
- The small factors come from the activation function (tanh), whose derivative is at most 1 and close to 0 when it saturates
- Crucial earlier timesteps therefore stop influencing later timesteps: the gradient vanishes

\begin{align}
\frac{\partial J_2}{\partial W}=\frac{\partial J_2}{\partial y_2}...
\end{align}

<img src="presentation_pics/vanishing_gradient.png" alt="vanishing_gradient" style="width: 600px;"/>
[Source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/]
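
The shrinking product can be reproduced with a scalar RNN. This is a sketch under assumptions: a single hidden unit with `h_t = tanh(w * h_{t-1} + x_t)`, and a weight `w = 0.5` and constant inputs chosen only for illustration.

```python
import math

def gradient_through_time(w, inputs):
    """Return |dh_t / dh_0| after each step of h_t = tanh(w * h_{t-1} + x_t)."""
    h = 0.0
    grad = 1.0
    history = []
    for x in inputs:
        h = math.tanh(w * h + x)
        grad *= w * (1.0 - h * h)   # chain rule factor dh_t / dh_{t-1}
        history.append(abs(grad))
    return history

history = gradient_through_time(w=0.5, inputs=[1.0] * 20)
for step, g in enumerate(history, start=1):
    print(step, g)
```

Every factor is at most 0.5 in magnitude here, so the gradient with respect to the first state shrinks at least geometrically with the number of timesteps.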

# LSTM Network - Long Short-Term Memory Network

# LSTM
<img src="presentation_pics/LSTM.png" alt="LSTM" style="width: 800px;"/>
[Source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/]
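
A minimal sketch of one LSTM step, following the standard gate equations described in the post cited above, with scalar states to keep it short. The parameter layout (`params`, one `(w_h, w_x, bias)` triple per gate) and all values are illustrative assumptions.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def lstm_step(x, h_prev, c_prev, params):
    """One scalar LSTM step; params maps gate name -> (w_h, w_x, bias)."""
    def pre(gate):
        w_h, w_x, b = params[gate]
        return w_h * h_prev + w_x * x + b

    f = sigmoid(pre("forget"))            # how much of the old cell state to keep
    i = sigmoid(pre("input"))             # how much of the candidate to write
    c_tilde = math.tanh(pre("candidate")) # proposed new content
    o = sigmoid(pre("output"))            # how much of the cell state to expose
    c = f * c_prev + i * c_tilde
    h = o * math.tanh(c)
    return h, c

# Hypothetical settings: forget gate almost open, input gate almost closed.
params = {
    "forget": (0.0, 0.0, 10.0),
    "input": (0.0, 0.0, -10.0),
    "candidate": (0.5, 0.5, 0.0),
    "output": (0.0, 0.0, 0.0),
}

h, c = 0.0, 1.0
for _ in range(50):
    h, c = lstm_step(x=1.0, h_prev=h, c_prev=c, params=params)
print(c)   # the cell state stays close to its initial value
```

With these gate settings the cell state is carried almost unchanged across 50 timesteps, which is the behaviour a plain RNN lacks.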

## GRU Network - Gated Recurrent Unit ##

- Gated cells (LSTM/GRU) can keep track of information throughout the time series
- They take the output of previous timesteps into account

## GRU
<img src="presentation_pics/GRU.png" alt="GRU" style="width: 800px;"/>

[Source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/]
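
A minimal sketch of one GRU step in the formulation used by the post cited above, again with scalar states. The parameter layout (`params`, one `(w_h, w_x, bias)` triple per gate) and all values are illustrative assumptions.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def gru_step(x, h_prev, params):
    """One scalar GRU step; params maps gate name -> (w_h, w_x, bias)."""
    w_zh, w_zx, b_z = params["update"]
    w_rh, w_rx, b_r = params["reset"]
    w_hh, w_hx, b_h = params["candidate"]

    z = sigmoid(w_zh * h_prev + w_zx * x + b_z)   # update gate
    r = sigmoid(w_rh * h_prev + w_rx * x + b_r)   # reset gate
    h_tilde = math.tanh(w_hh * (r * h_prev) + w_hx * x + b_h)
    return (1.0 - z) * h_prev + z * h_tilde       # blend old state and candidate

# Hypothetical settings: update gate almost closed, so the state is kept.
params = {
    "update": (0.0, 0.0, -10.0),
    "reset": (0.0, 0.0, 0.0),
    "candidate": (0.5, 0.5, 0.0),
}

h = 0.8
for _ in range(50):
    h = gru_step(x=1.0, h_prev=h, params=params)
print(h)   # stays close to the initial 0.8
```

Unlike the LSTM, the GRU has no separate cell state: a single update gate decides how much of the previous hidden state is kept.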

## Application / Use Case
### Where should the head of the Chicago Police Force send his patrols?
<img src="presentation_pics/crime_intro.png" alt="Crime intro" style="width: 800px;"/>

### [Placeholder Interactive Crime Map]
The map could be used to show that crime changes over time, which makes this a time series problem.
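
One way to frame such a time series for a recurrent model is a sliding window: the last few observations are the input, the next one is the target. The sketch below uses a hypothetical helper `make_windows` and made-up daily counts that are not taken from the crime dataset.

```python
def make_windows(series, window):
    """Split a series into (past window, next value) training pairs."""
    return [(series[i:i + window], series[i + window])
            for i in range(len(series) - window)]

# Made-up daily crime counts, for illustration only (not from the dataset).
daily_counts = [12, 15, 11, 18, 20, 17, 14, 16]

pairs = make_windows(daily_counts, window=3)
for past, target in pairs:
    print(past, "->", target)
```

The window length is a design choice that would need tuning on the real data.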

## Implementation of RNN, LSTM and GRU

## Parameter Tuning and Evaluation