### Softmax Regression implemented in TensorFlow

##### Author: Xunzhe Wen

Load the MNIST data with TensorFlow.

In [13]:
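# This notebook uses the TensorFlow 1.x API (placeholders, sessions and the MNIST tutorial helper below).
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data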

Download MNIST (if it is not already present), load it with one-hot labels, and inspect the sizes of the training, test and validation sets.

In [14]:
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)  # one_hot=True: each label is a length-10 vector

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


In [15]:
print('Training set  :', mnist.train.images.shape, mnist.train.labels.shape)
print('Test set      :', mnist.test.images.shape, mnist.test.labels.shape)
print('Validation set:', mnist.validation.images.shape, mnist.validation.labels.shape)

Training set  : (55000, 784) (55000, 10)
Test set      : (10000, 784) (10000, 10)
Validation set: (5000, 784) (5000, 10)


The training set has 55000 samples, the test set 10000 and the validation set 5000. Each image has been flattened from $28\times28$ pixels into a one-dimensional $1\times784$ vector. Each label is one of the ten digits 0 to 9, stored as a one-hot vector of length 10.<br>
<br>
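To make the two data conventions above concrete, here is a small pure-Python sketch (no TensorFlow) of flattening a $28\times28$ image into 784 values and one-hot encoding a label. It assumes row-major order (the image is flattened row by row); `flatten` and `one_hot` are hypothetical helpers written for illustration, not functions from the MNIST loader.

```python
# Pure-Python sketch of the two preprocessing conventions described above.
# Assumption: pixels are flattened row by row (row-major order).

def flatten(image):
    """Flatten a 2-D list (rows of pixels) into a 1-D list, row by row."""
    return [pixel for row in image for pixel in row]

def one_hot(label, num_classes=10):
    """Return a list with 1.0 at position `label` and 0.0 elsewhere."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec

# Hypothetical 28x28 image whose pixel value encodes its own (row, col) position.
image = [[r * 28 + c for c in range(28)] for r in range(28)]
flat = flatten(image)
print(len(flat))          # 784
print(flat[2 * 28 + 5])   # pixel at row 2, column 5 -> 61
print(one_hot(3))         # [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

Under this layout, pixel $(r, c)$ ends up at index $28r + c$ of the flattened vector.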
After loading the data, we train a classification model with the Softmax Regression algorithm, which handles multi-class classification. Softmax Regression estimates a probability for each class and outputs the class with the highest probability.<br>
<br>
The mechanism of Softmax Regression: first compute a score (feature) for each class,
$$ \text{feature}_i=\sum_j W_{i,j}x_j+b_i $$
where $i$ indexes the class, $j$ indexes the pixel in an image, $W$ is the weight matrix and $b$ is the bias. Then apply softmax to the vector of all features (written $x$ in the next two equations), which is an element-wise exponential followed by normalization:
$$ \text{softmax}(x)=\text{normalize}(\exp(x)) $$
The probability of the $i^{th}$ class is therefore:
$$ \text{softmax}(x)_i=\frac{\exp(x_i)}{\sum_j \exp(x_j)} $$
Written with matrix operations:
$$ y=\text{softmax}(Wx+b) $$
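The formulas above can be traced by hand on a tiny case. The sketch below is plain Python rather than TensorFlow, and the 3-class, 2-pixel weights are made-up numbers. It computes $\text{feature}_i=\sum_j W_{i,j}x_j+b_i$, applies softmax, and picks the most probable class. Subtracting the maximum score before exponentiating is a standard numerical-stability trick that is not in the original text; it cancels in the ratio, so the probabilities are unchanged.

```python
import math

def softmax(logits):
    """Exponentiate, then normalize so the outputs sum to 1.

    Subtracting max(logits) first leaves the result unchanged (it cancels in
    the ratio) but keeps math.exp from overflowing for large scores.
    """
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def features(W, x, b):
    """feature_i = sum_j W[i][j] * x[j] + b[i], with W stored as classes x pixels."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]

# Hypothetical toy model: 3 classes, 2 "pixels".
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [0.0, 0.0, 0.5]
x = [2.0, 1.0]

scores = features(W, x, b)                                   # [2.0, 1.0, 3.5]
probs = softmax(scores)                                      # sums to 1
predicted = max(range(len(probs)), key=probs.__getitem__)    # index of largest probability
print(scores, probs, predicted)
```

The class with the largest score (class 2 here) also gets the largest probability, because the exponential is monotonic.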

We now implement Softmax Regression in TensorFlow. Some background first: before TensorFlow was proposed, NumPy was commonly used for intensive numerical computation. NumPy is implemented in C and Fortran, but the result of each operation is returned to Python, and transferring data between languages after every operation adds latency. TensorFlow instead runs the computation outside Python, so intermediate results do not have to be passed back to Python after every single operation.

Create a session. `tf.InteractiveSession()` installs itself as the default session, so the operations below run in it. Data and operations in different sessions remain independent.<br>
A placeholder reserves a slot for input data and specifies its type and shape. Here `[None, 784]` means any number of images, each a 784-dimensional vector.

In [17]:
sess = tf.InteractiveSession()               # becomes the default session
x = tf.placeholder(tf.float32, [None, 784])  # batch of flattened images; None = any batch size



Next, create Variable objects for the weights and biases. Variables store the model parameters: they persist across training steps and can be updated at each iteration.<br>
Here the parameters are initialized to zeros, but for more complicated models such as CNNs, RNNs or deeper fully-connected networks, the choice of initialization matters much more.

In [18]:
W = tf.Variable(tf.zeros([784, 10]))  # weights: 784 pixels x 10 classes
b = tf.Variable(tf.zeros([10]))       # one bias per class
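A quick sanity check, in plain Python, of what zero initialization means for this model: with $W=0$ and $b=0$ every feature is 0 for any input, so softmax assigns probability $1/10$ to each class and the initial cross-entropy is $-\log(0.1)=\log 10\approx 2.3026$. Because Softmax Regression has no hidden layer, each class row of $W$ receives a different gradient as soon as the labels differ, so training can move away from this symmetric starting point. The input vector below is a made-up placeholder; any values give the same result.

```python
import math

# Zero-initialized parameters, stored as classes x pixels.
num_classes, num_pixels = 10, 784
W = [[0.0] * num_pixels for _ in range(num_classes)]
b = [0.0] * num_classes

x = [0.5] * num_pixels  # hypothetical input image; the result does not depend on it

# Every feature is 0, so every exp(feature) is 1 and each probability is 1/10.
feats = [sum(w * p for w, p in zip(row, x)) + bi for row, bi in zip(W, b)]
exps = [math.exp(f) for f in feats]
probs = [e / sum(exps) for e in exps]

# Cross-entropy against any one-hot label (class 7 here) is -log(0.1) = log(10).
initial_loss = -math.log(probs[7])
print(probs[0], initial_loss)
```

A starting loss near $\log 10$ is therefore what one would expect to observe before training begins.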

Define the Softmax Regression model in TensorFlow:

In [19]:
y = tf.nn.softmax(tf.matmul(x, W) + b)  # [batch, 784] x [784, 10] -> [batch, 10]; b is broadcast

`tf.nn.softmax` belongs to `tf.nn`, which contains many neural-network building blocks, and `tf.matmul` is matrix multiplication. Because each row of `x` is one image, the code computes $xW+b$ with `W` of shape $784\times10$, the transposed layout of $Wx+b$ above. Once the loss is defined, TensorFlow computes the derivatives and performs gradient descent automatically.<br>
The loss function measures how far the model's predictions are from the true labels, and the goal of training is to decrease the loss until it reaches a local or global optimum.<br>
For multi-class classification, a commonly used loss function is cross-entropy:<br>
$$ H_{y'}(y)= -\sum_i y'_i\log(y_i) $$
where $y$ is the predicted probability distribution and $y'$ is the true distribution (the one-hot label).
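With a one-hot label, every term of the cross-entropy sum vanishes except the true class $k$, so $H_{y'}(y)=-\log(y_k)$: the loss only depends on the probability assigned to the correct digit. The plain-Python sketch below computes this per sample and then averages over a batch, which is the same sum-over-classes, mean-over-batch pattern that the TensorFlow code in the next cell expresses with `tf.reduce_sum` and `tf.reduce_mean`. The batch values are made up for illustration.

```python
import math

def cross_entropy(y_true, y_pred):
    """H_{y'}(y) = -sum_i y'_i * log(y_i) for a single sample.

    Terms with y'_i == 0 contribute nothing and are skipped, which also
    avoids evaluating log(0) for classes the label does not select.
    """
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t > 0)

def mean_cross_entropy(labels, preds):
    """Per-sample cross-entropy (a sum over classes), averaged over the batch."""
    per_sample = [cross_entropy(t, p) for t, p in zip(labels, preds)]
    return sum(per_sample) / len(per_sample)

# Hypothetical batch: two samples, three classes.
labels = [[0, 1, 0], [1, 0, 0]]                  # one-hot: true classes 1 and 0
preds = [[0.2, 0.7, 0.1], [0.5, 0.25, 0.25]]     # predicted probabilities
batch_loss = mean_cross_entropy(labels, preds)
print(batch_loss)  # (-log 0.7 - log 0.5) / 2, about 0.5249
```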

In [20]:
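y_ = tf.placeholder(tf.float32, [None, 10])  # true one-hot labels
# Sum -y'*log(y) over the 10 classes (axis 1), then average over the batch.
# Note: tf.log(y) can yield inf/NaN if a predicted probability underflows to 0.
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), axis=1))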

The placeholder `y_` holds the true labels used in the cross-entropy computation. `tf.reduce_sum` sums over the second dimension (the 10 classes) of each sample, and `tf.reduce_mean` averages the result over the batch.<br>
We now have the Softmax Regression model and the cross-entropy loss; with one more ingredient, an optimization algorithm, training can begin. Stochastic Gradient Descent (SGD) is a commonly used one. For more information on optimization algorithms, see https://github.com/vince-xunzhe/Convex-Optimizer-in-Deep-Learning.<br>
During training, TensorFlow computes the derivatives automatically from the computation graph and reduces the loss by updating the parameters with the gradients obtained through backpropagation.<br>
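For this particular model the gradient TensorFlow derives has a simple closed form. With logits $z=Wx+b$, $p=\text{softmax}(z)$ and loss $-\sum_i y'_i\log p_i$, the derivative with respect to each logit is $p_i-y'_i$, so $\partial L/\partial W_{i,j}=(p_i-y'_i)\,x_j$ and $\partial L/\partial b_i=p_i-y'_i$, averaged over the batch. The sketch below applies one hand-derived gradient-descent step in plain Python on a made-up two-class dataset; it is an illustration of the update rule, not TensorFlow's internal implementation.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def forward(W, b, x):
    """p = softmax(Wx + b), with W stored as classes x features."""
    return softmax([sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)])

def loss(W, b, xs, ys):
    """Mean cross-entropy over the batch."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = forward(W, b, x)
        total -= sum(t * math.log(pi) for t, pi in zip(y, p) if t > 0)
    return total / len(xs)

def sgd_step(W, b, xs, ys, lr):
    """One gradient-descent step on the batch; updates W and b in place.

    Uses dL/dz_i = p_i - y'_i, so dL/dW[i][j] = (p_i - y'_i) * x_j and
    dL/db[i] = p_i - y'_i, each averaged over the batch.
    """
    n = len(xs)
    gW = [[0.0] * len(W[0]) for _ in W]
    gb = [0.0] * len(b)
    for x, y in zip(xs, ys):
        p = forward(W, b, x)
        for i in range(len(b)):
            d = (p[i] - y[i]) / n
            gb[i] += d
            for j, xj in enumerate(x):
                gW[i][j] += d * xj
    for i in range(len(b)):
        b[i] -= lr * gb[i]
        for j in range(len(W[i])):
            W[i][j] -= lr * gW[i][j]

# Hypothetical toy data: 2 features, 2 classes, zero-initialized parameters.
xs = [[1.0, 0.0], [0.0, 1.0]]
ys = [[1.0, 0.0], [0.0, 1.0]]
W = [[0.0, 0.0], [0.0, 0.0]]
b = [0.0, 0.0]

before = loss(W, b, xs, ys)      # log(2): both classes start at probability 0.5
sgd_step(W, b, xs, ys, lr=0.5)   # same learning rate as the notebook
after = loss(W, b, xs, ys)
print(before, after, W)
```

After one step the weights move toward the correct class for each input and the loss drops below $\log 2$.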

In [21]:
train_step=tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

The learning rate is set to 0.5, and the objective to minimize is `cross_entropy`. We then create the global variable initializer, `tf.global_variables_initializer()`, and execute its `run` method:

In [22]:
tf.global_variables_initializer().run()

Finally, we train iteratively. At each step, `mnist.train.next_batch(100)` draws 100 training samples to form a mini-batch, which is fed to the placeholders for one training step (Stochastic Gradient Descent). Each mini-batch step is much cheaper than computing the gradient on the whole training set, so training is faster.

In [23]:
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)  # mini-batch of 100 training samples
    train_step.run({x: batch_xs, y_: batch_ys})       # one SGD step on this mini-batch
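The loop above relies on `mnist.train.next_batch(100)` to hand out mini-batches. As an illustration only, here is a hypothetical stand-alone sampler (not the TensorFlow helper itself) that shuffles the example indices once per epoch and then slices them into consecutive mini-batches, so every example is used exactly once per epoch. With 1000 steps of 100 samples, the loop sees 100,000 examples, roughly 1.8 passes over the 55,000 training images.

```python
import random

def minibatches(num_examples, batch_size, seed=0):
    """Yield lists of example indices for one epoch: shuffle once, then slice.

    Hypothetical helper for illustration; this is not mnist.train.next_batch.
    A final partial batch is kept rather than dropped.
    """
    rng = random.Random(seed)
    indices = list(range(num_examples))
    rng.shuffle(indices)
    for start in range(0, num_examples, batch_size):
        yield indices[start:start + batch_size]

# One epoch over the 55000 training examples in mini-batches of 100.
batches = list(minibatches(55000, 100))
print(len(batches), len(batches[0]))  # 550 100
```

Shuffling per epoch (rather than sampling each batch independently with replacement) is one common design choice; both give unbiased mini-batch gradient estimates.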

Training is now complete. Next we evaluate the model's accuracy on the test set.

In [25]:
correct_prediction=tf.equal(tf.argmax(y,1),tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

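`tf.argmax(y, 1)` returns, for each image, the index of the class with the highest predicted probability, and `tf.argmax(y_, 1)` returns the index of the true label. `tf.equal` compares the two element-wise to mark each prediction as correct or not; casting the result to `float32` and taking the mean gives the accuracy.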
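The same argmax, compare, cast and average pipeline can be written in a few lines of plain Python, which may help when reading the TensorFlow version. The predictions and labels below are made-up values; the hypothetical `argmax` helper returns the first index on ties, whereas `tf.argmax` does not guarantee which index it returns in that case.

```python
def argmax(row):
    """Index of the largest entry (first one on ties)."""
    return max(range(len(row)), key=row.__getitem__)

def accuracy(preds, labels):
    """Fraction of rows whose predicted class matches the one-hot label."""
    correct = [argmax(p) == argmax(t) for p, t in zip(preds, labels)]  # like tf.equal
    return sum(correct) / len(correct)  # True/False count as 1/0, like cast + mean

# Hypothetical predictions for 3 samples over 2 classes; all true labels are class 1.
preds = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]
labels = [[0, 1], [0, 1], [0, 1]]
print(accuracy(preds, labels))  # 2 of 3 correct
```

Read the same way, the value printed below means that 91.93% of the test images were classified correctly.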

In [26]:
print(accuracy.eval({x:mnist.test.images, y_:mnist.test.labels}))

0.9193
