# Theano Tutorial

## Numpy refresher

Every **row** is an example, every **column** is a feature

In [1]:
import numpy
import numpy as np


In [2]:
# A np matrix
A = numpy.asarray([[1., 2], [3, 4], [5, 6]])
print(A)
# access a specific element
print(A[2, 0])

[[ 1.  2.]
 [ 3.  4.]
 [ 5.  6.]]
5.0


### np Broadcasting
The smaller array (or scalar) is broadcast across the larger array so that they
have compatible shapes.

In [3]:
a = numpy.asarray([1.0, 2.0, 3.0])
b=2
a*b

array([ 2.,  4.,  6.])
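
To connect broadcasting with the row/column convention above, here is a small sketch (the array `X` and the variable names are illustrative, not from the original notebook) that subtracts the per-feature mean from every example. The mean has shape `(2,)` and is broadcast across the three rows.

```python
import numpy as np

# Three examples (rows) with two features (columns); values are illustrative.
X = np.asarray([[1., 2.], [3., 4.], [5., 6.]])

# Per-feature mean: one entry per column, shape (2,)
feature_mean = X.mean(axis=0)

# (3, 2) - (2,): the vector is broadcast across every row
X_centered = X - feature_mean

print(feature_mean)
print(X_centered)
```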

## Theano Baby Steps
Let's make a simple function: add two numbers together

From now on, we will use *Variable* to mean *symbol*

In [4]:
import numpy
import theano.tensor as T
from theano import function
x = T.dscalar('x') # variable (symbol)
y = T.dscalar('y') # variable (symbol)
z = x + y # z is a Variable which represents addition of x and y
f = function([x, y], z)

### Theano flags
THEANO_FLAGS='device=gpu0,floatX=float32,blas.ldflags="-L/usr/lib/ -lblas"'

#### With debugging
THEANO_FLAGS='device=gpu0,floatX=float32,exception_verbosity=high,optimizer=None,blas.ldflags="-L/usr/lib/ -lblas"'
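
The flags are read from the `THEANO_FLAGS` environment variable, so they can be given on the command line or set from Python before Theano is imported. A minimal sketch of the second option follows; the helper `build_theano_flags` is hypothetical (it is not part of Theano), and the flag values are the ones shown above.

```python
import os

def build_theano_flags(**flags):
    """Hypothetical helper: join key=value pairs into a THEANO_FLAGS string."""
    return ",".join("{}={}".format(k, v) for k, v in sorted(flags.items()))

# Set this before the first `import theano`: the flags are read at import time.
os.environ["THEANO_FLAGS"] = build_theano_flags(device="gpu0", floatX="float32")
print(os.environ["THEANO_FLAGS"])
```

Setting the variable after Theano has been imported has no effect on the running session.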


## Using Pretty Print !!

In [5]:
from theano import pp
print(pp(z))


(x + y)


### A Theano function
*The first argument* is a list of input Variables. *The second* is a single Variable or a list of Variables. The **second** argument **is what we want to see as output**.

In [6]:
# Eval fun
f(2,3)

array(5.0)

In [7]:
numpy.allclose(f(16.3, 12.1), 28.4+1e-4)

True
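
The check above passes even though `1e-4` was added to the expected value, because `numpy.allclose` compares with a tolerance: by default it tests `|a - b| <= atol + rtol * |b|` with `rtol=1e-05` and `atol=1e-08`. The sketch below reproduces that rule in plain Python (the function name `within_tolerance` is ours) to show where the threshold lies for this example.

```python
def within_tolerance(a, b, rtol=1e-05, atol=1e-08):
    """Scalar version of the comparison rule documented for numpy.allclose."""
    return abs(a - b) <= atol + rtol * abs(b)

result = 16.3 + 12.1

# A difference of 1e-4 is inside the allowance of roughly 2.84e-4.
print(within_tolerance(result, 28.4 + 1e-4))
# A difference of 1e-3 is outside it.
print(within_tolerance(result, 28.4 + 1e-3))
```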

*dscalar* is **not a class**. Therefore, x and y are not instances of dscalar. **x and y are instances of TensorVariable**.
Names are not required, but they can help with debugging.

### A side note: using `eval` instead of `function`
As a (typing) shortcut, you can just use a variable's eval method

In [8]:
numpy.allclose(z.eval({x : 16.3, y : 12.1}), 28.4)

True

### Adding matrices is similar
Using `x = T.dmatrix('x')` instead of `T.dscalar('x')`
#### Broadcasting:
Using broadcasting, it is possible to add scalars to matrices, vectors to matrices, scalars to vectors, etc.
#### Many types are available.
E.g. `fscalar, fvector, fmatrix, frow, fcol, ftensor3, ftensor4`
See also [Basic Tensor Functionality](http://deeplearning.net/software/theano/library/tensor/basic.html#libdoc-tensor-creation)

### Exercise
Modify and execute this code to compute this expression: `a ** 2 + b ** 2 + 2 * a * b`.

In [9]:
import theano
a = theano.tensor.vector() # declare variable
out = a + a ** 10               # build symbolic expression
f = theano.function([a], out)   # compile function
print(f([0, 1, 2]))

[    0.     2.  1026.]


In [10]:
import theano
a = theano.tensor.vector() # declare variable
b = theano.tensor.vector() # declare variable
out = a ** 2 + b ** 2 + 2 * a * b            # build symbolic expression
f = theano.function([a,b], out)   # compile function
print(f([0, 1, 2], [5, 6, 7]))

[ 25.  49.  81.]


### Logistic Regression example.
Using `T.exp` or `T.tanh`

In [11]:
x = T.dmatrix('x')
s = 1 / (1 + T.exp(-x))
logistic = theano.function([x], s)
logistic([[0, 1], [-1, -2]])

array([[ 0.5       ,  0.73105858],
       [ 0.26894142,  0.11920292]])

In [12]:
s2 = (1 + T.tanh(x / 2)) / 2
logistic = theano.function([x], s2)
logistic([[0, 1], [-1, -2]])

array([[ 0.5       ,  0.73105858],
       [ 0.26894142,  0.11920292]])
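
The two cells give the same numbers because `1 / (1 + exp(-x))` and `(1 + tanh(x / 2)) / 2` are the same function. A quick plain-Python check using only the standard library (the function names are illustrative):

```python
import math

def sigmoid_exp(x):
    """Logistic function written with exp."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_tanh(x):
    """The same function written with tanh."""
    return (1.0 + math.tanh(x / 2.0)) / 2.0

for x in [0.0, 1.0, -1.0, -2.0]:
    # Both forms agree up to floating-point rounding.
    assert math.isclose(sigmoid_exp(x), sigmoid_tanh(x), rel_tol=1e-9)
    print(x, round(sigmoid_exp(x), 8))
```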

## Computing More than one Thing at the Same Time
Theano supports functions with multiple outputs

In [13]:
a, b = T.dmatrices('a', 'b') # is like (matlab's) deal(T.dmatrix)
diff = a - b
abs_diff = abs(diff)
diff_squared = diff**2
f = theano.function([a, b], [diff, abs_diff, diff_squared])

print(f([[1, 1], [1, 1]], [[0, 1], [2, 3]]))


[array([[ 1.,  0.],
       [-1., -2.]]), array([[ 1.,  0.],
       [ 1.,  2.]]), array([[ 1.,  0.],
       [ 1.,  4.]])]


### Default Value for an Argument
The `In` class allows you to specify properties of your function's parameters.

In [14]:
from theano import In ## <<-- NOTE THIS IMPORT.  
from theano import function
x, y = T.dscalars('x', 'y')
z = x + y
f = function([x, In(y, value=1)], z)
print(f(33))
print(f(33, 2))

34.0
35.0


Default values can also be assigned by name.

In [15]:
x, y, w = T.dscalars('x', 'y', 'w')
z = (x + y) * w
f = function([x, In(y, value=1), In(w, value=2, name='w_by_name')], z)
f(33, w_by_name=1, y=0)

array(33.0)
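
As a rough analogy (plain Python, not Theano), `In(y, value=1)` plays the role of a default argument, and `name=` sets the keyword under which the argument is passed. The function below only illustrates the calling convention.

```python
def f(x, y=1, w_by_name=2):
    """Plain-Python stand-in for the compiled function above: (x + y) * w."""
    return (x + y) * w_by_name

print(f(33))                    # uses the defaults y=1 and w=2
print(f(33, w_by_name=1, y=0))  # same call as in the Theano cell
```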

## Shared Variables [(link)](http://deeplearning.net/software/theano/tutorial/examples.html#using-shared-variables)
Used to make a function with an internal state. For example, an accumulator, or parameters (network weights) of a NN.

It is **called a shared variable** because its **value is shared between many functions**.
They *have an internal value* that defines the value taken by this symbolic variable in *all the functions* that use it.

In [16]:
from theano import shared ## <<-- NOTE THIS IMPORT.  
state = shared(0)
inc = T.iscalar('inc')
accumulator = function([inc], state, updates=[(state, state+inc)]) ## Note updates =..

### The `updates` parameter of `function`
`updates` must be supplied with **a list of pairs** of the form *(shared-variable, new expression)*. It **can also be a dictionary** whose keys are shared-variables and values are the new expressions


The state can be reset with the `.set_value()` method:

In [17]:
print(state.get_value())
print(accumulator(1))
print(state.get_value())
print(accumulator(300))
print(state.get_value())
state.set_value(-1)
print(accumulator(3))
print(state.get_value())


0
0
1
1
301
-1
2
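
The printed sequence shows that a call returns the value of `state` from *before* the update is applied. A plain-Python sketch of that behaviour follows; `SharedSketch` and `make_function` are hypothetical stand-ins, not Theano APIs, and they evaluate eagerly instead of building a graph.

```python
class SharedSketch(object):
    """Hypothetical stand-in for a Theano shared variable."""
    def __init__(self, value):
        self._value = value

    def get_value(self):
        return self._value

    def set_value(self, value):
        self._value = value

def make_function(output, updates):
    """Return a callable that evaluates `output`, then applies `updates`.

    `output` and each update expression are callables of the input `inc`.
    """
    def compiled(inc):
        result = output(inc)  # read the old state first
        new_values = [(var, expr(inc)) for var, expr in updates]
        for var, value in new_values:  # then commit all updates
            var.set_value(value)
        return result
    return compiled

state = SharedSketch(0)
accumulator = make_function(lambda inc: state.get_value(),
                            [(state, lambda inc: state.get_value() + inc)])

log = [accumulator(1), state.get_value(), accumulator(300), state.get_value()]
state.set_value(-1)
log += [accumulator(3), state.get_value()]
print(log)
```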


In [18]:
# define more than one function to use the same shared variable. 
decrementor = function([inc], state, updates=[(state, state-inc)]) 
decrementor(2)
print(state.get_value())
decrementor(2)
print(state.get_value())

0
-2


### `givens`: when a formula uses a `shared` variable but you do not want to use its value
`givens` is a mechanism that lets you replace any part of your formula with a different expression that evaluates to a tensor of the same shape and dtype.

In [19]:
fn_of_state = state * 2 + inc
foo = T.scalar(dtype=state.dtype) # The type of foo must match the shared variable we are replacing with the ``givens``
skip_shared = function([inc, foo], fn_of_state, givens=[(state, foo)]) # <<-- NOTE THIS
print(skip_shared(1, 3))
print(state.get_value())
print(skip_shared(1, 3))
print(state.get_value())


7
-2
7
-2
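
One way to picture `givens` is as a substitution applied to the expression before it is evaluated. The sketch below models the expression as a function of a name-to-value mapping; it is a plain-Python illustration with made-up names, not how Theano implements graph replacement.

```python
shared_values = {"state": -2}  # value currently held by the shared variable

def fn_of_state(env):
    """Same formula as above: state * 2 + inc."""
    return env["state"] * 2 + env["inc"]

def skip_shared(inc, foo):
    # givens=[(state, foo)]: evaluate with `state` replaced by `foo`
    env = dict(shared_values, inc=inc)
    env["state"] = foo
    return fn_of_state(env)

print(skip_shared(1, 3))
print(shared_values["state"])  # the shared value itself is untouched
```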


### Copying functions:
Creating similar functions but with different shared variables or updates.

The `swap` parameter is a dictionary of shared variables to exchange.

In [20]:
new_state = theano.shared(34)
new_accumulator = accumulator.copy(swap={state:new_state}) # <<-- NOTE THIS, and the swap dictionary
new_accumulator(100)
print(new_state.get_value())

134


The tutorial next creates a copy with the updates removed, using the `delete_updates` parameter (set to `False` by default):
`null_accumulator = accumulator.copy(delete_updates=True)`
It doesn't work here. It looks like a [tutorial bug](https://stackoverflow.com/questions/37911325/theano-tutorial-unusedinputerror-theano-function). A workaround is implemented below.

In [21]:
null_accumulator = theano.function([inc], state,
    on_unused_input='ignore')  # no `updates`, so `state` is left untouched



In [22]:
print(null_accumulator(9000))
print(state.get_value())

-2
-2


## Random Variables
The way to put randomness into Theano's computations **is to put random variables in your graph**. Theano will **allocate a NumPy RandomStream object** (a random number generator) for each such variable and **draw from it as necessary**.

Other distributions are listed [here](http://deeplearning.net/software/theano/library/tensor/raw_random.html#libdoc-tensor-raw-random).

There are 2 other implementations, based on MRG31k3p and CURAND. **RandomStreams only works on the CPU**, MRG31k3p works on the CPU and GPU, and CURAND only works on the GPU.
#### Using MRG Version 
`from theano.sandbox.rng_mrg import MRG_RandomStreams as RandomStreams`

In [23]:
from theano.tensor.shared_randomstreams import RandomStreams # <<-- NOTE THIS
from theano import function
srng = RandomStreams(seed=234) # SEED all of the random variables allocated by a RandomStreams
rv_u = srng.uniform((2,2)) # defining a RV 
rv_n = srng.normal((2,2)) # defining a RV 
f = function([], rv_u)
g = function([], rv_n, no_default_updates=True)    #Not updating rv_n.rng
nearly_zeros = function([], rv_u + rv_u - 2 * rv_u)

In [24]:
print(f())
print(f())

[[ 0.12672381  0.97091597]
 [ 0.13989098  0.88754825]]
[[ 0.31971415  0.47584377]
 [ 0.24129163  0.42046081]]


When we add the extra argument `no_default_updates=True` to function (as in g), then **the random number generator state is not affected** by calling the returned function. So, for example, **calling g multiple times will return the same numbers.**

In [25]:
print(g())
print(g())

[[ 0.37328447 -0.65746672]
 [-0.36302373 -0.97484625]]
[[ 0.37328447 -0.65746672]
 [-0.36302373 -0.97484625]]
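
The difference between `f` and `g` can be mimicked with Python's standard `random` module: a normal draw advances the generator state, while restoring the saved state after a draw makes every call return the same numbers. This is only an analogy for `no_default_updates=True`; the functions `draw_and_advance` and `draw_without_update` are illustrative.

```python
import random

rng = random.Random(234)  # seeded generator, analogous to RandomStreams(seed=234)

def draw_and_advance():
    """Like f: each call consumes random numbers and advances the state."""
    return [rng.random() for _ in range(4)]

def draw_without_update():
    """Like g: the generator state is restored, so the draw repeats."""
    saved = rng.getstate()
    values = [rng.normalvariate(0.0, 1.0) for _ in range(4)]
    rng.setstate(saved)
    return values

print(draw_and_advance() != draw_and_advance())        # successive calls differ
print(draw_without_update() == draw_without_update())  # repeated calls agree
```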


Seed just one random variable by seeding or assigning to the `.rng` attribute, using `.rng.set_value()`

In [26]:
rng_val = rv_u.rng.get_value(borrow=True) 
rng_val.seed(89234) 
rv_u.rng.set_value(rng_val, borrow=True)

### Copying Random State Between Theano Graphs
This is useful, for example, if you are trying to initialize the state of a model from the parameters of a pickled version of a previous model. It can be achieved by copying elements of the `state_updates` parameter.
An example is in this [link](http://deeplearning.net/software/theano/tutorial/examples.html#copying-random-state-between-theano-graphs).

## Logistic Regression on MNIST example
See [this notebook](TF-Tut1@Theano.ipynb), which works through the TensorFlow tutorial in Theano.

## Looping (`scan` function)

#### Solving issues with BLAS is [here](http://stackoverflow.com/questions/6789368/how-to-make-sure-the-numpy-blas-libraries-are-available-as-dynamically-loadable)

#### Comments from the doc with Anna:
Deals with building symbolic graphs over loops. The syntax is as follows:


`[rval, updates] = theano.scan(fn, sequences=None, outputs_info=None, non_sequences=None, n_steps=None, name=None)`

* `fn`: a function handle
* `sequences`: vectors over which we iterate, sending a single element to each iteration of scan
* `outputs_info`: list of initializations for the previous rvals needed in an iteration
* `non_sequences`: inputs that do not change between iterations
* `n_steps`: number of iterations
* `name`: name of the iteration, useful for debugging

When there are no sequences, the first parameter of `fn` is **the output of the prior call to `fn`** (or the initial value on the first iteration), *followed by the inputs from all non-sequences.*

The general order of function parameters to fn is:

`sequences (if any), prior result(s) (if needed), non-sequences (if any)`


If we need only part of the return values for the next iteration, we initialize `outputs_info` with the needed values and insert `None` for every rval we don't need. The rvals we don't need must be at the end.
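
The argument order can be made concrete with a plain-Python model of the loop that `scan` builds. This is a deliberately reduced sketch under stated assumptions: one output, a single tap of -1, eager evaluation on Python lists, and a hypothetical name `scan_sketch`. The real `theano.scan` builds a symbolic graph and supports much more.

```python
def scan_sketch(fn, sequences=None, outputs_info=None, non_sequences=None, n_steps=None):
    """Eager, single-output model of the scan calling convention.

    fn is called as fn(*sequence_items, prior_result, *non_sequences).
    Either `sequences` or `n_steps` must be given.
    """
    sequences = sequences or []
    non_sequences = non_sequences or []
    if n_steps is None:
        n_steps = min(len(s) for s in sequences)
    prior = outputs_info
    results = []
    for t in range(n_steps):
        args = [s[t] for s in sequences]   # sequence items come first
        if outputs_info is not None:
            args.append(prior)             # then the prior result
        args.extend(non_sequences)         # non-sequences always come last
        prior = fn(*args)
        results.append(prior)
    return results

# Counter: no sequences, so the prior result is the first parameter.
print(scan_sketch(lambda xprev: xprev + 1, outputs_info=3, n_steps=5))

# Scaled running sum: sequence item, prior result, non-sequence.
print(scan_sketch(lambda xi, cprev, scale: cprev + scale * xi,
                  sequences=[[1, 1, 3]], outputs_info=0, non_sequences=[2]))
```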


In [27]:
k=0
for i in range(10):
    k=k+1
print(k)

10


### A counter and a range with theano scan

In [28]:

i = T.iscalar('i')
i0 = T.iscalar('i0')
rval, _ = theano.scan(lambda xprev: xprev+1, outputs_info=[i0], n_steps=i)
f_range = theano.function([i,i0], rval)
f_last = theano.function([i,i0], rval[-1])
print(f_range(5, 3))
print(f_last(6, 3))

[4 5 6 7 8]
9


In [29]:
# visualization for the theano graph
import theano.d3viz as d3v

### A recurrence with scan: Fibonacci Series
NOTE: in **`outputs_info`** we need to indicate the **`initial`** variable (`x0` here) and the arguments that will be passed to the function at each step. The key **`taps`** is a list of indices into the previous results to be used as arguments: -1 and -2 mean the last and the penultimate entries. By default, taps is set to [-1]. ([ref](http://nbviewer.jupyter.org/gist/triangleinequality/1350873eebea33973e41))

In [30]:
i = T.iscalar('i')
x0 = T.ivector('x0') ## <<-- NOTE: This is a VECTOR, with 2 elements for init the series
rval, _ = theano.scan(lambda xn2, xn1: xn2+xn1, outputs_info=[{'initial':x0, 'taps':[-2,-1]}], n_steps=i)
f_fibo = theano.function([i, x0], rval)



In [31]:
f_fibo(50, np.asarray([0,1], dtype=np.int32)) ## NOTE OVERFLOW IN THE LAST RESULTS ENTRIES

array([          1,           2,           3,           5,           8,
                13,          21,          34,          55,          89,
               144,         233,         377,         610,         987,
              1597,        2584,        4181,        6765,       10946,
             17711,       28657,       46368,       75025,      121393,
            196418,      317811,      514229,      832040,     1346269,
           2178309,     3524578,     5702887,     9227465,    14930352,
          24157817,    39088169,    63245986,   102334155,   165580141,
         267914296,   433494437,   701408733,  1134903170,  1836311903,
       -1323752223,   512559680,  -811192543,  -298632863, -1109825406], dtype=int32)
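
The negative entries come from 32-bit wrap-around: the 46th value, 2971215073, exceeds the largest `int32` (2147483647). The sketch below reproduces the recurrence in plain Python, with the two taps passed explicitly and the wrap-around emulated by hand (the helper `wrap_int32` is ours).

```python
def wrap_int32(v):
    """Emulate two's-complement int32 overflow for a Python int."""
    return (v + 2**31) % 2**32 - 2**31

def fibo(n_steps, x0):
    xn2, xn1 = x0  # taps [-2, -1]: the two most recent values
    out = []
    for _ in range(n_steps):
        xn = wrap_int32(xn2 + xn1)
        out.append(xn)
        xn2, xn1 = xn1, xn  # slide the window of taps
    return out

values = fibo(50, [0, 1])
first_bad = next(i for i, v in enumerate(values) if v < 0)
print(first_bad, values[first_bad - 1], values[first_bad])
```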

In [63]:
import tensorflow as tf

In [68]:
i = T.iscalar('i')
x0 = T.ivector('x0') ## <<-- NOTE: This is a VECTOR, with 2 elements for init the series
elems = np.array([1, 0, 0, 0, 0, 0])
# rval, _ = theano.scan(lambda e,x: (x[1], x[0]+x[1]), sequences=elems, outputs_info=[(np.array(0), np.array(1))])
# f_fibo = theano.function([i, x0], rval)

initializer = (np.array(0), np.array(1))
fibonaccis = tf.scan(lambda a, _: (a[1], a[0] + a[1]), elems, initializer)
print(fibonaccis)


Tensor("scan_4/TensorArrayPack:0", shape=(6, 2), dtype=int32)


### Adding flow control to scan loops ([ref](http://nbviewer.jupyter.org/gist/triangleinequality/1350873eebea33973e41))
We can end the loop early if a condition is met, for example on overflow.


In [32]:
def fib(f_m_2, f_m_1):  # taps [-2, -1]: the older value is passed first
    ret = f_m_2 + f_m_1
    next_ret = ret + f_m_1  # the value the following step would produce
    return ret, theano.scan_module.until(next_ret < 0)  ## NOTE THE 2nd TERM for Stopping condition
i = T.iscalar('i')
x0 = T.ivector('x0') 
rval, _ = theano.scan(fib, outputs_info=[{'initial':x0, 'taps':[-2,-1]}], n_steps=i)
f_fibo = theano.function([i, x0], rval)

f_fibo(50, np.asarray([0,1], dtype=np.int32))

array([         1,          2,          3,          5,          8,
               13,         21,         34,         55,         89,
              144,        233,        377,        610,        987,
             1597,       2584,       4181,       6765,      10946,
            17711,      28657,      46368,      75025,     121393,
           196418,     317811,     514229,     832040,    1346269,
          2178309,    3524578,    5702887,    9227465,   14930352,
         24157817,   39088169,   63245986,  102334155,  165580141,
        267914296,  433494437,  701408733, 1134903170, 1836311903], dtype=int32)
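
In plain Python the same early exit is a `break` that fires after the current value has been recorded, which is why the last in-range value still appears in the output. A sketch follows, with the int32 limit checked by hand; the names are illustrative, and it models the behaviour seen above, not Theano's internals.

```python
INT32_MAX = 2**31 - 1

def fibo_until_overflow(n_steps, x0):
    f_m_2, f_m_1 = x0  # taps [-2, -1]
    out = []
    for _ in range(n_steps):
        ret = f_m_2 + f_m_1
        out.append(ret)  # the current step is always kept
        if ret + f_m_1 > INT32_MAX:  # the next value would not fit in int32
            break
        f_m_2, f_m_1 = f_m_1, ret
    return out

values = fibo_until_overflow(50, [0, 1])
print(len(values), values[-1])
```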

### Iterating over vectors/matrices/tensors
Pass `scan` a **`sequences`** argument holding the object we would like to iterate over. NOTE: It iterates over the first dimension of the tensor: for a vector, over the entries; for a matrix, over the rows...
#### NOTE about fn arguments order
The general order of function parameters to fn is:

`sequences (if any), prior result(s) (if needed), non-sequences (if any)`


In [41]:
x = T.fvector('x')
rval, _ = theano.scan(fn=lambda xi, cprev: cprev+xi, outputs_info=0.0, sequences=x)
f_cumsum = theano.function([x], rval)
X=np.asarray([1,1,3,5,11,-9], dtype=np.float32)
print(f_cumsum(X))
print(rval.eval({x: X}))

[  1.   2.   5.  10.  21.  12.]
[  1.   2.   5.  10.  21.  12.]
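
For comparison, the same running sum in plain Python 3 is `itertools.accumulate` from the standard library, which also threads the previous result into the next step:

```python
import itertools

X = [1, 1, 3, 5, 11, -9]

# Each output is the previous output plus the current element.
running_sum = list(itertools.accumulate(X))
print(running_sum)
```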


## Using shared variables with `scan`
[link with gibbs sampling](http://deeplearning.net/software/theano/library/scan.html#using-shared-variables-gibbs-sampling)

**Highlights**
* The `updates` dictionary links a shared variable with its updated value after k steps.
* If we use shared variables but do not iterate over them, we do not need to pass them as arguments. **However, passing them to the scan function is good practice**: it results in a simpler computational graph, which **speeds up the optimization and the execution**.
* To pass shared variables to `scan`, put them in a list given as the `non_sequences` argument.
* Using the **`strict` flag**: with `strict=True`, scan assumes that all the necessary shared variables in `fn` are passed as part of `non_sequences`. The user has to ensure this; otherwise, it will result in an error.


## RNN example with `scan`
The RNN function is:

![caption](http://deeplearning.net/software/theano/_images/math/f319ccb886e8ddd6d3f4ff4dc8895d6fddacf054.png)
This is a useless RNN, just used for demo.

`u` is the sequence we iterate over; `x` and `y` are the two outputs.

* For scan, what matters is that the variables representing the different time taps are in the same order as the one in which these taps are given. The variables themselves must also respect an order, since this is how scan figures out which argument represents what.

some more details are [here](http://deeplearning.net/software/theano/library/scan.html#multiple-outputs-several-taps-values-recurrent-neural-network-with-scan)

In [43]:
## NOTE in function def, first vars are vars we seq on (u), then the state vars (x,y). 
# Convention here (not a must) is to have first the earlier time stamps (e.g. x_tm3 before x_tm1)
def oneStep(u_tm4, u_t, x_tm3, x_tm1, y_tm1, W, W_in_1, W_in_2, W_feedback, W_out): 
    ## NOTE          vvvv    We do the dot product in the opposite direction, since we seq. on row vectors
    x_t = T.tanh(T.dot(x_tm1, W) + \
                 T.dot(u_t,   W_in_1) + \
                 T.dot(u_tm4, W_in_2) + \
                 T.dot(y_tm1, W_feedback))
    y_t = T.dot(x_tm3, W_out)
    return x_t, y_t

In [50]:
W = T.matrix('W')
W_in_1 = T.matrix('W_in_1')
W_in_2 = T.matrix('W_in_2')
W_feedback = T.matrix('W_feedback')
W_out = T.matrix('W_out')

u = T.matrix('u') # input as  sequence of (row) vectors
## Initial state for x0 and y0
x0 = T.matrix('x0') # initial state of x has to be a matrix, since
                # it has to cover x[-3]
y0 = T.vector('y0') # y0 is just a vector since scan has only to provide
                # y[-1]

([x_vals, y_vals], updates) = theano.scan(fn=oneStep, 
                                          sequences={'input':u, 'taps':[-4,0]},   # <<-- NOTE THIS
                                          outputs_info=[{'initial':x0, 'taps':[-3,-1]}, y0],  # <<-- NOTE THIS
                                          non_sequences=[W, W_in_1, W_in_2, W_feedback, W_out],
                                          strict=True) 
# for the second output y, scan adds -1 in output_taps by default
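
To see which time steps each argument of `oneStep` receives, here is a NumPy loop that mirrors the taps. It is a sketch under assumptions: the sizes and random weights are made up, it assumes that with sequence taps `[-4, 0]` scan runs `len(u) - 4` steps pairing `u[t-4]` with `u[t]`, and it assumes that `x0` stacks `x[-3], x[-2], x[-1]` as rows.

```python
import numpy as np

rng = np.random.RandomState(0)
n_in, n_hid, n_out, n_time = 2, 3, 2, 8  # illustrative sizes

W = rng.randn(n_hid, n_hid) * 0.1
W_in_1 = rng.randn(n_in, n_hid) * 0.1
W_in_2 = rng.randn(n_in, n_hid) * 0.1
W_feedback = rng.randn(n_out, n_hid) * 0.1
W_out = rng.randn(n_hid, n_out) * 0.1

u = rng.randn(n_time, n_in)           # sequence of row vectors
x_hist = list(np.zeros((3, n_hid)))   # assumed layout: x[-3], x[-2], x[-1]
y_prev = np.zeros(n_out)              # y[-1]

x_vals, y_vals = [], []
for t in range(4, n_time):            # first step that has a u[t-4]
    u_tm4, u_t = u[t - 4], u[t]
    x_tm3, x_tm1 = x_hist[-3], x_hist[-1]
    x_t = np.tanh(x_tm1.dot(W) + u_t.dot(W_in_1) +
                  u_tm4.dot(W_in_2) + y_prev.dot(W_feedback))
    y_t = x_tm3.dot(W_out)            # y depends on the state three steps back
    x_hist.append(x_t)
    y_prev = y_t
    x_vals.append(x_t)
    y_vals.append(y_t)

print(np.asarray(x_vals).shape, np.asarray(y_vals).shape)
```

Because `y_t` reads `x[t-3]`, the first three `y` values depend only on the initial state.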

In [51]:
d3v.d3viz(x_vals, 'pics/x_vals.html')
!firefox pics/x_vals.html