## Polynomial Regression
Apply polynomial regression to a real-world problem with multi-dimensional input data.

##### Data
There are 3 classes with 50 instances each.<br>
The first 40 samples of each class are used as the training set, and the last 10 samples as the test set.

**Number of attributes**: 5 (4-dimensional input (data), 1-dimensional target (class label))
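The per-class split described above can also be written with array slicing instead of a loop. The following is a minimal sketch, assuming the samples are stored class by class (50 rows per class); the helper name `split_per_class` and the synthetic `demo_x` / `demo_t` arrays are illustrative stand-ins, not the real data files.

```python
import numpy as np

def split_per_class(x, t, n_classes=3, n_per_class=50, n_train=40):
    """Split class-ordered samples into train/test sets by position within each class."""
    x = np.asarray(x).reshape(n_classes, n_per_class, -1)
    t = np.asarray(t).reshape(n_classes, n_per_class)
    n_features = x.shape[-1]
    # first n_train rows of every class go to training, the rest to testing
    tr_x = x[:, :n_train].reshape(-1, n_features)
    ts_x = x[:, n_train:].reshape(-1, n_features)
    tr_t = t[:, :n_train].reshape(-1)
    ts_t = t[:, n_train:].reshape(-1)
    return tr_x, ts_x, tr_t, ts_t

# Synthetic stand-in data (NOT the real dataset): 150 rows, 4 features
demo_x = np.arange(150 * 4, dtype=float).reshape(150, 4)
demo_t = np.repeat([1, 2, 3], 50)  # assumed label values, for illustration only

tr_x, ts_x, tr_t, ts_t = split_per_class(demo_x, demo_t)
print(tr_x.shape, ts_x.shape, tr_t.shape, ts_t.shape)
# (120, 4) (30, 4) (120,) (30,)
```

Row 40 of the training set is then the first sample of the second class, and row 0 of the test set is sample 40 of the first class.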

In [31]:
import pandas as pd
import numpy as np
from numpy.linalg import inv,matrix_power
from scipy.io import loadmat

### Load and preprocess data

In [120]:
data_dir = './data/'
x = loadmat(data_dir+'5_X.mat')['X']
t = loadmat(data_dir+'5_T.mat')['T']
# print(tr)
# print(ts)

x_dim = 4
t_dim = 1

# prepare to divide into training and testing
x = x.reshape(3,50,4)
t = t.reshape(3,50)

# divide into train and test
n_per_class = 50 
tr_x = np.zeros((120,x_dim))
ts_x = np.zeros((30,x_dim))
tr_t = np.zeros(120)
ts_t = np.zeros(30)
for cls in [0,1,2]:
    tr_x[(cls)*40:(cls+1)*40] = x[cls][:40]
    ts_x[(cls)*10:(cls+1)*10] = x[cls][40:]
    tr_t[(cls)*40:(cls+1)*40] = t[cls][:40]
    ts_t[(cls)*10:(cls+1)*10] = t[cls][40:]

N_tr = len(tr_t)
N_ts = len(ts_t)

print(tr_x[119])

[ 6.9  3.1  5.4  2.1]


### Apply regression for class estimation by minimizing the non-regularized error function

The sum-of-squares error function is:

$$E(w) = \frac{1}{2}\sum_{n=1}^{N}\left(y(x_n,w)-t_n\right)^{2} $$
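As a small numeric illustration, the error function can be evaluated directly from a vector of predictions and targets. This is a sketch only: the function names and the three-element `demo_y` / `demo_t` values are made up for the example. The RMS form, `sqrt(2E/N)`, is the quantity the code further down reports.

```python
import numpy as np

def sum_of_squares_error(y_pred, t):
    """E(w) = 1/2 * sum_n (y_n - t_n)^2 for already-computed predictions y_n."""
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    t = np.asarray(t, dtype=float).ravel()
    return 0.5 * np.sum((y_pred - t) ** 2)

def rms_error(y_pred, t):
    """E_RMS = sqrt(2 * E(w) / N), which makes errors comparable across set sizes."""
    n = np.asarray(t).size
    return np.sqrt(2.0 * sum_of_squares_error(y_pred, t) / n)

# Made-up predictions and targets, for illustration only
demo_y = [1.0, 2.5, 3.0]
demo_t = [1.0, 2.0, 3.0]
print(sum_of_squares_error(demo_y, demo_t))  # 0.125
print(rms_error(demo_y, demo_t))
```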

However, because the input $x$ is now multi-dimensional, the polynomial function must be generalized. Here is an example for order 2 (**M=2**):


$$y(x,w) = w_0 + \sum_{i=1}^{D}w_i x_i + \sum_{i=1}^{D}\sum_{j=1}^{D}w_{ij} x_i x_j $$

Note that in this problem the input dimension is $D = 4$, so the M=2 model has $1 + 4 + 16 = 21$ weights.
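To make the M=2 expansion concrete, one row of the design matrix can be built from the outer product of the input with itself. This is a minimal sketch; `quadratic_features` is a hypothetical helper name, and it keeps both $x_i x_j$ and $x_j x_i$, exactly as the double sum is written.

```python
import numpy as np

def quadratic_features(x):
    """Return [1, x_1..x_D, x_i*x_j for all i, j] for a single input vector x."""
    x = np.asarray(x, dtype=float).ravel()
    # np.outer(x, x)[i, j] == x[i] * x[j]; ravel() flattens it row by row
    return np.concatenate(([1.0], x, np.outer(x, x).ravel()))

phi = quadratic_features([6.9, 3.1, 5.4, 2.1])
print(phi.shape)  # (21,)
```

Because the full double sum is kept, the terms for $(i, j)$ and $(j, i)$ are equal, so several entries of the feature vector are duplicates of each other.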

In [124]:
'''
Compared with problem 4, the only difference is the form of 'phi'; the solution for W is still
(phi^T * phi)^-1 * phi^T * t.
phi = [[x1^0,x1^1....x1^M],[x2^0,x2^1....x2^M], .... [xN^0,xN^1....xN^M]], where each of x1, x2, ... xN is a
vector.
t = [t1,t2,t3 ... ,tN], the class label of each x_n
'''
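def VecPower(vec, power):
    """Return all order-`power` products of the entries of `vec` as a flat list (power >= 1)."""
    vec = np.asarray(vec).reshape(1,len(vec))
    mat_prev = 1
    for i in range(power):
        # a (k,1) column times the (1,D) row yields every product with one more factor
        mat = np.dot(mat_prev,  vec)
        mat_prev = mat.reshape(-1,1) # reshape as (_,1)
    
    result = list(mat.reshape(1,-1)[0])
    # round to 3 decimals, which also hides floating-point noise such as 47.61000000000001
    result = [float(format(x, '.3f')) for x in result]

    return result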

tr_rms_ary = []
ts_rms_ary = []
Ms = [1,2]
for M in Ms:
    # Initialise the design matrix, the target vector and the weights for this M.
    # Phi_mat starts as the x^0 column [[1],[1], ... ,[1]], kept as a list of lists so that
    # higher-order terms can be appended to each row.
    Phi_mat = [list(x) for x in np.ones((N_tr,1))]
    T = np.zeros((N_tr,1))
    W = []

    for r_idx in range(N_tr):
        for c_idx in range(M):
            # append the terms of each order for the same x_n to its row
            Phi_mat[r_idx] = Phi_mat[r_idx] + VecPower(tr_x[r_idx],c_idx+1)
            if r_idx == 119:
                print(Phi_mat[119])

        T[r_idx][0] = tr_t[r_idx]

    Phi_mat = np.asarray(Phi_mat)
    
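    # Solve the normal equations: W = (Phi^T Phi)^-1 Phi^T T
    # Note: for M >= 2 the columns for x_i*x_j and x_j*x_i are identical, so Phi^T Phi is
    # singular and the result of inv() is numerically unreliable.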
    phi_transpose = Phi_mat.transpose()
    W_tmp = inv(np.dot(phi_transpose , Phi_mat))
    W = np.dot( np.dot(W_tmp , phi_transpose) , T)

#     print(T)
    print(W)
#     print(W.T[0])
#     print(len(W))
    
    
    # Root mean square
    def SumOverDim(sum_n, w, idx, x):
        val = 0
        if sum_n == 1:
            for d in range(len(x)):
                val += w[idx]*x[d]
                idx += 1
        elif sum_n == 2:
            for d_i in range(len(x)):
                for d_j in range(len(x)):
                    val += w[idx]*x[d_i]*x[d_j]
                    idx += 1
                    
        return val, idx
                
    def poly_func (x, w_ary, m):
        y = w_ary[0]
        w_idx = 1
        for i in range(1,m+1): 
            val,w_idx = SumOverDim(i,w_ary,w_idx,x)          
            y += val

        return y

    def Cal_rms(length, input_data, target_data, W, m):
        Err = 0
        for n in range(length):
            # W has shape (K, 1); W.transpose()[0] extracts the weights as a flat 1-D array
            Err += (poly_func(input_data[n], W.transpose()[0], m) - target_data[n])**2 
        Err /= 2
        RMS_err = np.sqrt((2*Err)/length)
        
        return RMS_err
    
    RMS_err_tr = Cal_rms(N_tr, tr_x, tr_t, W, M)
    RMS_err_ts = Cal_rms(N_ts, ts_x, ts_t, W, M)
    tr_rms_ary.append(RMS_err_tr)
    ts_rms_ary.append(RMS_err_ts)
    
    print('training rms-Err for M = %s is %s.' % (str(M),str(RMS_err_tr)))
    print('testing rms-Err for M = %s is %s.' % (str(M),str(RMS_err_ts)))

[1.0, 6.9, 3.1, 5.4, 2.1]
[[ 1.18784287]
 [-0.14574959]
 [-0.00561512]
 [ 0.28516771]
 [ 0.50820959]]
training rms-Err for M = 1 is 0.226279143015.
testing rms-Err for M = 1 is 0.171124893337.
[1.0, 6.9, 3.1, 5.4, 2.1]
[1.0, 6.9, 3.1, 5.4, 2.1, 47.61, 21.39, 37.26, 14.49, 21.39, 9.61, 16.74, 6.51, 37.26, 16.74, 29.16, 11.34, 14.49, 6.51, 11.34, 4.41]
[[ -2.93370657e+03]
 [  7.91788753e+02]
 [  1.79349757e+02]
 [ -1.29746368e+02]
 [ -3.47921991e+01]
 [ -3.72590599e+01]
 [ -5.06103513e+00]
 [  7.25351435e+11]
 [ -6.30271188e+00]
 [ -3.49653083e+01]
 [  3.41726097e-01]
 [ -3.77224091e+12]
 [  2.26118373e+01]
 [ -7.25351435e+11]
 [  3.77224091e+12]
 [  1.18238994e+00]
 [  3.12321849e+00]
 [  4.93067984e+00]
 [ -1.29949572e+01]
 [ -4.74951388e+00]
 [  6.25962140e-01]]
training rms-Err for M = 2 is 128.884818701.
testing rms-Err for M = 2 is 115.754339859.
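The M = 2 weights above contain cancelling pairs of magnitude around 1e11 to 1e12, and the RMS error jumps from about 0.23 to about 129. This is consistent with the duplicated columns in the design matrix: $x_i x_j$ and $x_j x_i$ are the same feature, so $\Phi^T\Phi$ is singular and `inv` cannot be trusted. One possible remedy, sketched below under stated assumptions, is to keep each monomial only once and solve the least-squares problem with `np.linalg.lstsq` instead of an explicit inverse. The helper names are hypothetical and the data is synthetic, so the printed numbers say nothing about the real dataset.

```python
import numpy as np
from itertools import combinations_with_replacement

def design_matrix(X, M):
    """Columns: 1, then every distinct monomial of order 1..M (x_i*x_j kept only once)."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    cols = [np.ones(n)]
    for order in range(1, M + 1):
        # index tuples with i <= j <= ... so symmetric duplicates never appear
        for idx in combinations_with_replacement(range(d), order):
            cols.append(np.prod(X[:, list(idx)], axis=1))
    return np.column_stack(cols)

def fit_least_squares(Phi, t):
    """Minimise ||Phi w - t||^2 without forming (Phi^T Phi)^-1 explicitly."""
    w, *_ = np.linalg.lstsq(Phi, t, rcond=None)
    return w

# Synthetic stand-in data (NOT the real dataset): an exactly quadratic target
rng = np.random.default_rng(0)
demo_X = rng.uniform(0.0, 5.0, size=(120, 4))
demo_t = 1.0 + demo_X[:, 0] - 0.5 * demo_X[:, 1] * demo_X[:, 2]

Phi = design_matrix(demo_X, M=2)
w = fit_least_squares(Phi, demo_t)
rms = np.sqrt(np.mean((Phi @ w - demo_t) ** 2))
print(Phi.shape)  # (120, 15)
print(rms)
```

With duplicates removed, the M = 2 model has 1 + 4 + 10 = 15 weights rather than 21. Any evaluation code must then use the same monomial ordering as the design matrix, which is simplest if predictions are computed as `Phi @ w`.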


In [123]:
for i in [6.9  ,3.1,  5.4,  2.1]:
    for j in [6.9  ,3.1  ,5.4,  2.1]:
        print(i*j)

47.61000000000001
21.39
37.260000000000005
14.490000000000002
21.39
9.610000000000001
16.740000000000002
6.510000000000001
37.260000000000005
16.740000000000002
29.160000000000004
11.340000000000002
14.490000000000002
6.510000000000001
11.340000000000002
4.41


In [75]:
# test array and list concatenation
tst = []
tst.append([4,3,2,1,1])
tst.append([1,1])
print(tst)
tst[0]= tst[0]+[5,6,7,8888]
print(tst)
tst_ary = np.asarray(tst)
print(tst_ary)
print(type(tst_ary))
print(tst_ary.shape)

ary = np.zeros((3,1))
# ary[0] = np.concatenate(ary[0], np.array([444,4]))
# print(np.concatenate(ary[0], np.array([444,4])))

lst =  [list(x) for x in np.ones((N_tr,1))]
print(lst[0]+[4,5,6])

# test on matrix power
print(np.power(np.array([1,2,3,4]),2))

# test on dot
x_tmp = np.array([1,2,3,4])
print(x_tmp.reshape(1,4))
print(np.atleast_2d(x_tmp).T)
print(np.dot(np.atleast_2d(x_tmp).T,  x_tmp.reshape(1,4)))
dotary = np.dot(np.atleast_2d(x_tmp).T,  x_tmp.reshape(1,4)).reshape(1,-1)
print('dot-result = %s, type = %s' % (dotary,type(dotary)))
print('dot-result,first-element = ',dotary[0])
print('dot-result = %s, type = %s' % (list(dotary[0]),type(list(dotary[0]))))
print(np.dot(np.atleast_2d(x_tmp).T,  1))
print(np.dot(1,  x_tmp.reshape(1,4)).reshape(4,1))


[[4, 3, 2, 1, 1], [1, 1]]
[[4, 3, 2, 1, 1, 5, 6, 7, 8888], [1, 1]]
[list([4, 3, 2, 1, 1, 5, 6, 7, 8888]) list([1, 1])]
<class 'numpy.ndarray'>
(2,)
[1.0, 4, 5, 6]
[ 1  4  9 16]
[[1 2 3 4]]
[[1]
 [2]
 [3]
 [4]]
[[ 1  2  3  4]
 [ 2  4  6  8]
 [ 3  6  9 12]
 [ 4  8 12 16]]
dot-result = [[ 1  2  3  4  2  4  6  8  3  6  9 12  4  8 12 16]], type = <class 'numpy.ndarray'>
dot-result,first-element =  [ 1  2  3  4  2  4  6  8  3  6  9 12  4  8 12 16]
dot-result = [1, 2, 3, 4, 2, 4, 6, 8, 3, 6, 9, 12, 4, 8, 12, 16], type = <class 'list'>
[[1]
 [2]
 [3]
 [4]]
[[1]
 [2]
 [3]
 [4]]


In [107]:
num = format(0.88888889, '.2f')
print(type(num))
print(float(num))
print(round(12.565687412, 2))

<class 'str'>
0.89
12.57
