# Multi-class classification with Softmax regression

- https://www.youtube.com/watch?v=MFAnsx1y9ZI&index=13&list=PLlMkM4tgfjnLSOjrEJN31gZATbcj_MpUm

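- Multinomial classification could be done by building 3 independent classifiers, one that picks out each class. Instead of keeping three separate 1x3 weight matrices, stack them into a single 3x3 matrix $W$. Then the $Y$ obtained from $Y = W X$ is a 3-dimensional vector with one score per class.
- How do we turn the values in $Y$ into probabilities? Apply the softmax function.
- $\mathrm{softmax}(y_i) = \frac{\exp(y_i)}{\sum_k \exp(y_k)}$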
- https://www.tensorflow.org/api_docs/python/tf/nn/softmax
- ``log_softmax`` is also defined: https://www.tensorflow.org/versions/r0.10/api_docs/python/nn/classification
- https://www.tensorflow.org/get_started/mnist/beginners
- https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/tutorials/mnist/mnist_softmax.py : computing the cross-entropy directly from the softmax output may be numerically unstable, so ``tf.nn.softmax_cross_entropy_with_logits()`` is used instead.
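As a quick check of the bullets on class scores and softmax, here is a minimal sketch in plain Python (no TensorFlow). It takes the `W` values and the first training row from the run printed further down, computes the scores as a row vector times the 3x3 matrix (the same orientation as `tf.matmul(X, W)` in the code), and applies softmax. Subtracting the maximum score before exponentiating is a common stability trick added here as an assumption; it is not in the original notes, and the helper name `softmax` is only illustrative.

```python
import math

def softmax(scores):
    """Softmax with the max subtracted first, so exp() cannot overflow."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# W as printed in the run further down: 3 input features x 3 classes.
W = [[0.36091304, 0.2101326, 0.22026467],
     [0.1851213, -0.57366562, -0.51584959],
     [0.28643727, 0.12737894, -0.28107858]]
x = [1.0, 2.0, 1.0]  # first training row

# One score per class: y_j = sum_i x_i * W[i][j]
logits = [sum(x[i] * W[i][j] for i in range(3)) for j in range(3)]
probs = softmax(logits)
print(probs)  # three probabilities that sum to 1
```

The result should agree with the first row of `softmax(X*W)` in the printed output, roughly `[0.78, 0.125, 0.095]`.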

## What cost function should we train with?
- Use cross-entropy.
- $D (S, L) = -\sum_i L_i \log(S_i)$, where $S$ is the prediction output vector from softmax and $L$ is the label vector (e.g. [0, 1, 0]).
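To make the formula concrete, the sketch below evaluates $D(S, L)$ in plain Python for the two `hypothesis` rows printed in the run further down, both labeled `[0, 0, 1]`. The function name `cross_entropy` is only for illustration. Because $L$ is one-hot, only the log-probability of the true class survives the sum.

```python
import math

def cross_entropy(S, L):
    """D(S, L) = -sum_i L_i * log(S_i); entries with L_i == 0 add nothing."""
    return -sum(l * math.log(s) for s, l in zip(S, L) if l != 0)

# Softmax outputs for the first two training rows, copied from the run below.
S1 = [0.77999818, 0.12544645, 0.09455537]
S2 = [0.91047555, 0.05848143, 0.03104303]
label = [0, 0, 1]

d1 = cross_entropy(S1, label)  # -log(0.09455537), about 2.3586
d2 = cross_entropy(S2, label)  # -log(0.03104303), about 3.4724
mean_cost = (d1 + d2) / 2      # batch average, about 2.9155
print(d1, d2, mean_cost)
```

Both rows put most of their probability on class 0 while the label is class 2, which is why the per-sample costs are large before training.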

### Ref.
- http://stackoverflow.com/questions/40675182/tensorflow-log-softmax-tf-nn-logtf-nn-softmaxpredict-tf-nn-softmax-cross-ent

In [1]:
import numpy as np
import tensorflow as tf

In [3]:
xy = np.loadtxt('train-softmax.txt', unpack=True, dtype='float32')
xdata = np.transpose(xy[0:3])
ydata = np.transpose(xy[3:])
print ('xdata: ', xdata)
print ('ydata: ', ydata)
print ('notice that each row corresponds to one data input')

xdata:  [[ 1.  2.  1.]
 [ 1.  3.  2.]
 [ 1.  3.  4.]
 [ 1.  5.  5.]
 [ 1.  7.  5.]
 [ 1.  2.  5.]
 [ 1.  6.  6.]
 [ 1.  7.  7.]]
ydata:  [[ 0.  0.  1.]
 [ 0.  0.  1.]
 [ 0.  0.  1.]
 [ 0.  1.  0.]
 [ 0.  1.  0.]
 [ 0.  1.  0.]
 [ 1.  0.  0.]
 [ 1.  0.  0.]]
notice that each row corresponds to one data input


In [52]:
# TF Graph Input
X = tf.placeholder ("float", [None,3])
Y = tf.placeholder ("float", [None,3])

input_dim = 3
output_dim = 3
W = tf.Variable(tf.random_uniform([input_dim, output_dim], -1,1))
hypothesis = tf.nn.softmax (tf.matmul(X, W))

#cross entropy
cost = tf.reduce_mean(-tf.reduce_sum(Y*tf.log(hypothesis), axis=1))

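```
# Numerically stable alternative: the op applies softmax itself,
# so pass the raw scores X*W as logits, not the softmax output.
cost = tf.reduce_mean( 
    tf.nn.softmax_cross_entropy_with_logits (labels=Y, logits=tf.matmul(X, W)))
```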
learning_rate = 0.1
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

with tf.Session() as ss:
    ss.run(tf.global_variables_initializer())
    feed_dict = {X:xdata[:2], Y:ydata[:2]}
    print ('W:', ss.run(W))
    print ('X', ss.run(X, feed_dict = {X:xdata}))
    print ('Y', ss.run(Y, feed_dict = {Y:ydata}))
    print ('softmax(X*W)', ss.run(tf.nn.softmax(tf.matmul(X,W)), 
                                  feed_dict={X:xdata[:2]}))
    print ('hypothesis=', ss.run(hypothesis, feed_dict))
    print ('tf.log(hypothesis)=',
          ss.run(
          tf.log(hypothesis), feed_dict))
    print ('Y*tf.log(hypothesis)=',
          ss.run(Y*tf.log(hypothesis), feed_dict))
    print ('tf.reduce_sum(Y*tf.log(hypothesis), axis=1)=',
          ss.run(tf.reduce_sum(Y*tf.log(hypothesis), axis=1), feed_dict))
    print ('cost = tf.reduce_mean(-tf.reduce_sum) = ', 
           ss.run(cost, feed_dict))

    feed_dict = {X:xdata, Y:ydata}
    
    for step in range(20001):
        ss.run(optimizer, feed_dict)
        if step%2000==0:
            print ('{}/2001 cost={} '.format(
                step,
                ss.run(cost, feed_dict))
            )
            
    # see the learning result
    print ('--- learning result ---')
    re = ss.run(hypothesis, feed_dict)
    rIndx = np.argmax(re, axis=1)  # predicted class index for each training row
    print ('hypothesis=', re)
    print ('predicted Indx=', rIndx)
    
    testX = [ [1,11,7], [1,3,4], [1,1,0]]
    # tf.argmax replaces the deprecated tf.arg_max
    print ('test output: ', ss.run(tf.argmax(hypothesis,1), 
                                   feed_dict={X:testX}))

W: [[ 0.36091304  0.2101326   0.22026467]
 [ 0.1851213  -0.57366562 -0.51584959]
 [ 0.28643727  0.12737894 -0.28107858]]
X [[ 1.  2.  1.]
 [ 1.  3.  2.]
 [ 1.  3.  4.]
 [ 1.  5.  5.]
 [ 1.  7.  5.]
 [ 1.  2.  5.]
 [ 1.  6.  6.]
 [ 1.  7.  7.]]
Y [[ 0.  0.  1.]
 [ 0.  0.  1.]
 [ 0.  0.  1.]
 [ 0.  1.  0.]
 [ 0.  1.  0.]
 [ 0.  1.  0.]
 [ 1.  0.  0.]
 [ 1.  0.  0.]]
softmax(X*W) [[ 0.77999818  0.12544645  0.09455537]
 [ 0.91047555  0.05848143  0.03104303]]
hypothesis= [[ 0.77999818  0.12544645  0.09455537]
 [ 0.91047555  0.05848143  0.03104303]]
tf.log(hypothesis)= [[-0.24846369 -2.07587624 -2.35856962]
 [-0.09378823 -2.839046   -3.47238088]]
Y*tf.log(hypothesis)= [[-0.         -0.         -2.35856962]
 [-0.         -0.         -3.47238088]]
tf.reduce_sum(Y*tf.log(hypothesis), axis=1)= [-2.35856962 -3.47238088]
cost = tf.reduce_mean(-tf.reduce_sum) =  1.44017
0/2001 cost=1.2516590356826782 
2000/2001 cost=0.8216826915740967 
4000/2001 cost=0.8047826290130615 
6000/2001 cost=0.79774117469
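
One detail in the printed output is worth checking by hand: the logged `cost` of 1.44017 is not the mean of the two per-sample values printed just above it (2.3586 and 3.4724, whose mean is about 2.9155). One reading consistent with the numbers is that softmax was applied a second time to the already-normalized `hypothesis` before the cross-entropy was taken, which is what happens when a softmax output is passed as `logits` to `tf.nn.softmax_cross_entropy_with_logits`. The sketch below only reproduces the arithmetic in plain Python; the helper names are illustrative and this is not a claim about what the original session actually ran.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    t = sum(e)
    return [v / t for v in e]

def mean_ce(probs, labels):
    """Mean over rows of -log(probability assigned to the true class)."""
    return sum(-math.log(p[l.index(1)]) for p, l in zip(probs, labels)) / len(probs)

# `hypothesis` rows and labels for the first two samples, copied from the output.
hypothesis = [[0.77999818, 0.12544645, 0.09455537],
              [0.91047555, 0.05848143, 0.03104303]]
labels = [[0, 0, 1], [0, 0, 1]]

direct = mean_ce(hypothesis, labels)                       # about 2.9155
twice = mean_ce([softmax(h) for h in hypothesis], labels)  # about 1.4402
print(direct, twice)
```

Applying softmax to values that already lie in [0, 1] squeezes the probabilities toward uniform, so a cost computed this way cannot get close to zero even when the predictions are perfect.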

## Conclusion

Use this only to get a rough picture of what is going on. With only 8 data points, we probably shouldn't expect a really good fit, should we?
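
For that rough picture, the training loop can also be written out by hand. The sketch below is plain Python on the same 8 rows and relies on the standard result that the gradient of the mean cross-entropy with respect to $W$ is $X^T (S - L) / n$. The zero initialization, the learning rate of 0.01, and the 2000 steps are my assumptions, chosen so that plain gradient descent decreases the cost steadily; they differ from the random initialization and the rate of 0.1 used in the TensorFlow cell, and all function names are illustrative.

```python
import math

# Training data copied from the xdata / ydata printout.
X = [[1, 2, 1], [1, 3, 2], [1, 3, 4], [1, 5, 5],
     [1, 7, 5], [1, 2, 5], [1, 6, 6], [1, 7, 7]]
Y = [[0, 0, 1], [0, 0, 1], [0, 0, 1], [0, 1, 0],
     [0, 1, 0], [0, 1, 0], [1, 0, 0], [1, 0, 0]]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    t = sum(e)
    return [v / t for v in e]

def predict(W, x):
    """Class probabilities for one row: softmax of the row vector x times W."""
    logits = [sum(x[i] * W[i][j] for i in range(3)) for j in range(3)]
    return softmax(logits)

def mean_cross_entropy(W):
    return sum(-math.log(predict(W, x)[y.index(1)]) for x, y in zip(X, Y)) / len(X)

def gradient_step(W, lr):
    """One gradient descent update using grad = X^T (S - L) / n."""
    grad = [[0.0] * 3 for _ in range(3)]
    for x, y in zip(X, Y):
        p = predict(W, x)
        for i in range(3):
            for j in range(3):
                grad[i][j] += x[i] * (p[j] - y[j]) / len(X)
    return [[W[i][j] - lr * grad[i][j] for j in range(3)] for i in range(3)]

W = [[0.0] * 3 for _ in range(3)]     # all-zero start: every class gets 1/3
initial_cost = mean_cross_entropy(W)  # equals log(3) for uniform predictions
for _ in range(2000):
    W = gradient_step(W, lr=0.01)
final_cost = mean_cross_entropy(W)
print(initial_cost, final_cost)
```

Printing `predict(W, x)` for a few rows after training gives a feel for how confident the model is on such a small dataset.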