# Solving a trivial problem using Theano

## Goal

In this exercise we will solve the same problem as in the previous lesson, but this time using a dedicated library.

Why would we want to use it? Because as our equations get larger (and we need large equations to solve difficult tasks), writing everything ourselves becomes more and more difficult. In particular, calculating the gradient of the cost function using the approximate method might not work, even if we manage to write it down. A dedicated library like Theano can do that for huge functions with little effort and deliver precise results.

Although Theano is no longer supported and is slowly becoming stale, I find it better for explaining machine learning basics, so using it in this tutorial is a conscious decision. Once you have a sound understanding of the fundamentals, it will be much easier to grasp other frameworks. Besides, in later examples we will switch to a higher-level framework (Keras), which is very much alive and can use either Theano or TensorFlow, which is much more popular today, as its backend.

The most important information here is that with these libraries we represent all our operations in a symbolic way. This means that when I write:
```
c = a + b
```
and "a" and "b" are Theano variables, "c" also becomes a Theano variable, and it doesn't store the result of adding "a" and "b" but represents the operation itself. In fact, neither "a" nor "b" has any particular number assigned at this point. To actually calculate something, I first need to compile a function that evaluates "c" and then explicitly run it for some particular values of "a" and "b".

Let's see how this will work in our trivial example.
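To make the idea of "representing the operation itself" more concrete, here is a minimal toy sketch in plain Python. The `Var` and `Add` classes are hypothetical and written only for this illustration; they are not Theano's API and say nothing about how Theano is implemented internally. The point is only that `a + b` can build a description of the addition, which is evaluated later once values are supplied.

```python
# Toy illustration only: these classes are hypothetical and are not Theano's API.
class Var:
    """A named placeholder with no value attached."""

    def __init__(self, name):
        self.name = name

    def __add__(self, other):
        # Adding builds a new node describing the operation; nothing is computed.
        return Add(self, other)

    def evaluate(self, values):
        # Look up the concrete value supplied at evaluation time.
        return values[self.name]


class Add(Var):
    """A node that remembers its two operands."""

    def __init__(self, left, right):
        self.left = left
        self.right = right
        self.name = "({} + {})".format(left.name, right.name)

    def evaluate(self, values):
        return self.left.evaluate(values) + self.right.evaluate(values)


a = Var("a")
b = Var("b")
c = a + b  # c describes "a + b"; no numbers are involved yet

print(c.name)                            # (a + b)
print(c.evaluate({"a": 1.5, "b": 2.0}))  # 3.5
```

In this toy version, calling `evaluate` plays roughly the role that compiling and running a function plays in Theano.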

In [1]:
import theano
import numpy

LEARNING_RATE = 0.6
NUM_UPDATES = 10

Using cuDNN version 7301 on context None
Mapped name None to device cuda: GeForce GTX 970 (0000:01:00.0)


Now we will define two Theano variables that will store our inputs and expected outputs:

In [2]:
x = theano.tensor.fvector('x')
target = theano.tensor.fscalar('target')

We used the data type 'fvector', which stands for 'vector of floats' (in our case it holds only 2 elements, but that is still a vector), for the inputs and 'fscalar' for the expected output.

Now let's define our internal parameters (aka weights). Please note that we provide the initial values to the constructor.

In [3]:
W = theano.shared(numpy.asarray([0.2, 0.7]), 'W')

And now the three "functions" we had before:

In [4]:
y = (x * W).sum()

cost = theano.tensor.sqr(target - y)

gradients = theano.tensor.grad(cost, [W])

See how elegant this is in symbolic notation? Note in particular the gradient function: it accepts the cost expression and a list of Theano variables representing the weights, and returns one gradient expression per variable. Here we have only one set of weights, but soon, when we introduce multi-layered models, this list will become much longer.
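For a cost this small, the gradient can still be checked by hand. The sketch below uses plain Python rather than Theano, and its function names are made up for this illustration. It compares the hand-derived gradient, d cost / d w_i = -2 * (target - y) * x_i, with a finite-difference approximation, using the same inputs, weights and target as this lesson.

```python
def output(x, w):
    # y = sum of element-wise products of inputs and weights
    return sum(xi * wi for xi, wi in zip(x, w))


def cost(x, w, target):
    # Squared difference between the expected and the actual output
    return (target - output(x, w)) ** 2


def analytic_gradient(x, w, target):
    # Hand-derived: d cost / d w_i = -2 * (target - y) * x_i
    error = target - output(x, w)
    return [-2.0 * error * xi for xi in x]


def numeric_gradient(x, w, target, eps=1e-6):
    # Central finite differences, perturbing one weight at a time
    grads = []
    for i in range(len(w)):
        w_plus = list(w)
        w_plus[i] += eps
        w_minus = list(w)
        w_minus[i] -= eps
        grads.append((cost(x, w_plus, target) - cost(x, w_minus, target)) / (2 * eps))
    return grads


x = [1.0, 0.5]
w = [0.2, 0.7]
target = 20.0

exact = analytic_gradient(x, w, target)
approx = numeric_gradient(x, w, target)
print(exact)   # approximately [-38.9, -19.45]
print(approx)  # close to the values above, up to rounding error
```

With only two weights both approaches are easy; the symbolic approach pays off when the expression is too large to differentiate by hand.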

Now we need to define the way we will be updating the weights:

In [5]:
W_updated = W - LEARNING_RATE * gradients[0]
updates = [(W, W_updated)]

We create a list of the weights to be updated. Each element of the list is a pair: a Theano variable representing the weights and an expression that describes how they should be updated.

Finally we can compile our function:

In [6]:
f = theano.function([x, target], [y, W], updates=updates)

The first argument is a list of input variables: to perform an update we need to provide an example, that is, the inputs and the expected output. The second argument specifies the outputs of the function; in this case we are interested in the current output of the model as well as the current weights. The third argument is the list of updates, which tells Theano how the weights should be updated.

Now all we need to do is run it:

In [7]:
for update in range(NUM_UPDATES + 1):
    # Use new names so the symbolic variable y is not overwritten
    y_value, w_value = f([1.0, 0.5], 20.0)
    print("Output before update {} is {}".format(update + 1, y_value))
print("Final weights values: w1: {} w2: {}".format(w_value[0], w_value[1]))

Output before update 1 is 0.55
Output before update 2 is 29.7250011593
Output before update 3 is 15.1374988407
Output before update 4 is 22.4312508695
Output before update 5 is 18.7843744203
Output before update 6 is 20.6078128623
Output before update 7 is 19.6960935326
Output before update 8 is 20.1519532518
Output before update 9 is 19.924023365
Output before update 10 is 20.037988322
Output before update 11 is 19.9810058367
Final weights values: w1: 15.744804669385768 w2: 8.472402334692886


As you can see, the result is nearly identical to our result from the previous lesson. Because the values returned by our Theano function "f" are calculated before the updates are applied, we ran one extra update to get comparable output.
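The ordering described above (outputs first, updates second) can be illustrated with a plain-Python stand-in for the compiled function. This is only a sketch under stated assumptions: `make_function` is a hypothetical helper, the Python list takes the place of the shared variable, and the gradient is written out by hand instead of being derived by Theano.

```python
LEARNING_RATE = 0.6
NUM_UPDATES = 10


def make_function(initial_weights):
    # Hypothetical stand-in for the compiled function;
    # this list plays the role of the shared variable W.
    weights = list(initial_weights)

    def f(x, target):
        # 1. Compute the outputs from the current weights.
        y = sum(xi * wi for xi, wi in zip(x, weights))
        returned_weights = list(weights)
        # 2. Apply the update W <- W - LEARNING_RATE * gradient,
        #    where gradient_i = -2 * (target - y) * x_i.
        error = target - y
        for i, xi in enumerate(x):
            weights[i] -= LEARNING_RATE * (-2.0 * error * xi)
        # 3. Return the values computed in step 1, i.e. before the update.
        return y, returned_weights

    return f


f = make_function([0.2, 0.7])
outputs = []
for update in range(NUM_UPDATES + 1):
    y_value, w_value = f([1.0, 0.5], 20.0)
    outputs.append(y_value)
    print("Output before update {} is {:.4f}".format(update + 1, y_value))
print("Final weights values: w1: {:.4f} w2: {:.4f}".format(w_value[0], w_value[1]))
```

The first value printed is the output for the initial weights, which is why eleven calls are needed to see the effect of ten updates. The numbers agree with the Theano output above to several decimal places; the small differences are consistent with Theano doing part of the arithmetic in 32-bit floats.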