## A simple logistic regression example

Import the required modules and set the number of classes.

In [None]:
# -*- coding: utf-8 -*-
import numpy
import theano
import theano.tensor as T
rng = numpy.random

num_classes = 10

# More than two classes switches the notebook to the softmax (multi-class) model
multiclass = num_classes > 2

print("Multiclass case is: %s" % multiclass)

Define the number of input neurons (the features), the number of examples, and the training rate.

In [None]:
#N = number of examples
N = 1000
#feats = number of input neurons
feats = 784
#training rate
tr_rate = 0.1 

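Define a tuple with two entries:
the first entry is a matrix of size N (number of examples) by feats (number of features) of random numbers drawn from a standard normal distribution (mean 0, standard deviation 1).
The second entry is a vector of size N (number of examples) of integer class labels between 0 and num_classes - 1 (either 0 or 1 in the two-class case).
training_steps is the number of gradient descent updates performed during training.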

In [None]:
D = (rng.randn(N, feats), rng.randint(size=N, low=0, high=num_classes))
training_steps = 1000

Declare the symbolic variables.
x will represent the input, i.e. a matrix of random numbers of size feats for each example (this is the D[0] entry defined above).
y will represent the output, i.e. the class each example belongs to: class 0 or class 1 in the two-class case, or an integer class index in the multi-class case.

In [None]:
# Declare Theano symbolic variables
x = T.matrix("x")
if(multiclass is False):
    y = T.vector("y")
else:
    y = T.ivector("y")

Define the weights and the bias.
In the two-class case there is just one weight associated with each feature and just one bias, since there is just one output neuron.
In the multi-class case there is one weight per feature for each class (a matrix of size feats by num_classes) and one bias per class.
The weights are randomly initialised; the bias can be initialised to 0.0 or a small value.

In [None]:
if(multiclass is False):
    w = theano.shared(rng.randn(feats), name="w")
    b = theano.shared(0.01, name="b")
else:
    w = theano.shared(rng.randn(feats, num_classes), name="w")
    b = theano.shared(numpy.full(num_classes, 0.01), name="b")
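
To make the two weight layouts concrete, here is a minimal NumPy sketch of the shapes involved. It is not part of the Theano model; the sizes and the names `n_examples`, `n_feats`, `n_classes`, `w_bin` and `w_multi` are hypothetical and chosen only to keep the output small.

```python
import numpy as np

# Hypothetical small sizes, chosen only to keep the example readable.
n_examples, n_feats, n_classes = 4, 3, 5

rng = np.random.RandomState(0)
x = rng.randn(n_examples, n_feats)

# Two-class case: one weight per feature and a single scalar bias.
w_bin = rng.randn(n_feats)
b_bin = 0.01
scores_bin = np.dot(x, w_bin) + b_bin        # one score per example

# Multi-class case: one weight column and one bias per class.
w_multi = rng.randn(n_feats, n_classes)
b_multi = np.full(n_classes, 0.01)
scores_multi = np.dot(x, w_multi) + b_multi  # the bias broadcasts over the rows

print(scores_bin.shape)    # (4,)
print(scores_multi.shape)  # (4, 5)
```

In the multi-class layout each row of the result holds one score per class for a single example.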

Optional printing of the initial model weights and bias

In [None]:
print("Initial model:")
print(w.get_value())
print(b.get_value())

Construct the actual model.
Sigma represents the sigmoid
<br>
$$ \sigma({\bf x}) = \frac{1}{1+\exp(-{\bf x}\cdot{\bf w}-b)} $$
<br>

which is expressed in theano as T.nnet.sigmoid().
For a multi-class classification, sigma will be represented by a vector 

$$ \sigma_{1}, \dots, \sigma_{j}, \dots, \sigma_{num\_classes} $$ where

<br>
$$ \sigma_{j}({\bf x}) = \frac{\exp({\bf x}\cdot{\bf w}_{j}+b_{j})}{\sum_{i=1}^{num\_classes}\exp({\bf x}\cdot{\bf w}_{i}+b_{i})} $$
<br>

and the theano function representing it is called T.nnet.softmax().
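
As an illustration of the two formulas, the following is a plain NumPy sketch of the sigmoid and the softmax. It is not Theano's implementation; the helper names `sigmoid` and `softmax` and the example scores are assumptions made for this example.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    """Row-wise softmax of a 2-D array of scores."""
    # Subtracting the row maximum does not change the result,
    # but it avoids overflow in exp for large scores.
    shifted = z - z.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

# Made-up scores for two examples and three classes.
scores = np.array([[2.0, 1.0, 0.1],
                   [0.0, 0.0, 0.0]])
probs = softmax(scores)

print(sigmoid(0.0))          # 0.5
print(probs.sum(axis=1))     # each row sums to 1
print(probs.argmax(axis=1))  # index of the largest probability in each row
```

With only two classes and the second score fixed at 0, the first softmax output equals the sigmoid of the first score, which is why the two-class model can use a single output neuron.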

For a 2-class classification, the prediction can be either 0 or 1 (the two classes): it is 1 if sigma is greater than 0.5 and 0 otherwise.
The cost function is defined by 

<br>
$$ 
error({\bf w}) = -\frac{1}{N} \sum_{i=1}^{N} [ y^i \ln (\sigma({\bf{x^i}})) + (1-y^i) \ln (1 - \sigma({\bf{x^i}})) ] 
$$
<br>

where the superscript represents the $i^{th}$ example.
For a multi-class classification, the cost function is modified to be:

<br>
$$ 
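error({\bf w}) = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{num\_classes} 1\{ y^{i}=j\} \ln (\sigma_{j}({\bf{x^i}})) 
$$
<br>

where $1\{y^{i}=j\}$ is 1 if example $i$ belongs to class $j$ and 0 otherwise, so only the log-probability of the true class contributes. In the code below this is computed by selecting, for each example, the entry of the log of the T.nnet.softmax() output that corresponds to its true class. The prediction will be the output class with the highest value, i.e. the argmax of the output sigma.
The cost adds an L2 regularisation term, 0.01 times the sum of the squared weights, to reduce the possibility of overfitting by keeping larger weights in check.
Finally, theano will calculate the gradient of the cost function, which is used for approximating the solution using gradient descent.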
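
The multi-class cost can be checked by hand on a tiny example. The following NumPy sketch assumes the class probabilities are already available as an array; the function name `multiclass_cost` and the numbers are hypothetical, and the penalty factor 0.01 mirrors the one used in this notebook.

```python
import numpy as np

def multiclass_cost(probs, y, w, penalty=0.01):
    """Mean negative log-probability of the true class plus an L2 penalty."""
    n = y.shape[0]
    # For each example i, pick the probability assigned to its true class y[i].
    true_class_probs = probs[np.arange(n), y]
    xent = -np.mean(np.log(true_class_probs))
    return xent + penalty * np.sum(w ** 2)

# Made-up probabilities for two examples and three classes.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
y = np.array([0, 1])       # true classes of the two examples
w = np.zeros((4, 3))       # zero weights, so the penalty term vanishes

cost = multiclass_cost(probs, y, w)
print(cost)                # equals -(ln 0.7 + ln 0.8) / 2
```

Indexing with `probs[np.arange(n), y]` plays the role of the indicator in the formula: for every example it keeps only the term of the true class.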



In [None]:
# Construct Theano expression graph
#sigma = 1 / (1 + T.exp(-T.dot(x, w) - b))        # Equivalent explicit form of the sigmoid
if(multiclass is False):
    sigma = T.nnet.sigmoid(T.dot(x,w) + b)        # Probability that target = 1
    prediction = sigma > 0.5                      # The prediction thresholded
    print("Two classes")
else:
    sigma = T.nnet.softmax(T.dot(x,w) + b)        # Probability of each class
    prediction = T.argmax(sigma, axis=1)          # The class with highest probability
    print("%i classes" % num_classes)
 
# Cross-entropy loss function, averaged over the examples so that it is a scalar
if( multiclass is False):
    xent = T.mean(-y * T.log(sigma) - (1-y) * T.log(1-sigma))
else:
    xent = -T.mean(T.log(sigma)[T.arange(y.shape[0]), y])
    
cost = xent + 0.01 * (w ** 2).sum()               # Regularisation 
gw, gb = T.grad(cost, [w, b])                     # Gradient of the cost (T.grad needs a scalar cost)

We create the theano functions.
train takes as inputs the features of each example (x) and the class of each example (y), and returns the cross-entropy.
Each call to train updates the weights and biases by subtracting the gradient multiplied by the training rate (a small rate helps to avoid overshooting the minimum).
predict takes the features of each example and returns the predicted class.

In [None]:
# Compile
train = theano.function(
          inputs=[x,y],
          outputs=[xent],
          updates=((w, w - tr_rate * gw), (b, b - tr_rate * gb)),
          allow_input_downcast=True)
predict = theano.function(inputs=[x], outputs=prediction)
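
The update rule passed to `updates` can be illustrated without Theano. The sketch below is a NumPy version for the two-class case only; it writes out the gradient by hand instead of calling T.grad, and the toy data and the helper name `cost_and_grads` are assumptions made for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_and_grads(x, y, w, b, penalty=0.01):
    """Mean cross-entropy with an L2 penalty, and its gradients w.r.t. w and b."""
    n = x.shape[0]
    sigma = sigmoid(np.dot(x, w) + b)
    cost = np.mean(-y * np.log(sigma) - (1 - y) * np.log(1 - sigma))
    cost += penalty * np.sum(w ** 2)
    gw = np.dot(x.T, sigma - y) / n + 2 * penalty * w
    gb = np.mean(sigma - y)
    return cost, gw, gb

rng = np.random.RandomState(0)
x = rng.randn(50, 3)
y = (x[:, 0] > 0).astype(float)   # toy labels that depend on the first feature
w = np.zeros(3)
b = 0.0
tr_rate = 0.1

costs = []
for step in range(100):
    cost, gw, gb = cost_and_grads(x, y, w, b)
    costs.append(cost)
    # Same rule as in the updates argument: move against the gradient.
    w = w - tr_rate * gw
    b = b - tr_rate * gb

print("first cost: %.4f, last cost: %.4f" % (costs[0], costs[-1]))
```

With zero initial weights every example gets probability 0.5, so the first cost is ln 2; the cost then decreases as the updates are applied.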

Perform the actual training on the data. 
Each step updates the weights and bias, so that the model performs better and gets closer to the target solution.

In [None]:
# Train
for i in range(training_steps):
    train(D[0], D[1])

Optional printing of the final model weights and bias

In [None]:
print("Final model:")
print(w.get_value())
print(b.get_value())

Printing of the target values (the classes) and the prediction by our model.

In [None]:
print("target values for D:")
print(D[1])
print("prediction on D:")
print(predict(D[0]))

Calculate the errors, i.e. the numbers of examples that have not been classified correctly and output the accuracy result.

In [None]:
# Count the examples whose predicted class differs from the target
error = int(numpy.sum(predict(D[0]) != D[1]))

correct_guesses = N - error
accuracy = correct_guesses * 100.0 / N

print("")
print("correct predictions = %i over %i examples" % (correct_guesses, N))
print("accuracy = %.1f%%" % accuracy)