## Singular Value Decomposition

Singular Value Decomposition (SVD) is a generalisation of eigendecomposition that applies to any matrix, square or not. It is especially intuitive when looking for hidden factors in data. Let's verify some of its properties.

In [31]:
import pandas as pd
import numpy as np
np.set_printoptions(suppress=True)
pd.options.display.float_format = '{:,.3f}'.format

In [2]:
#Let's take this dataset
a = [[0,0],[1,2],[2,3],[3,6],[4,8],[5,9]]
b = ['X','Y']
dat = pd.DataFrame(a,columns = b)
dat

Unnamed: 0,X,Y
0,0,0
1,1,2
2,2,3
3,3,6
4,4,8
5,5,9


In [3]:
#Let's do the eigendecomposition first 
#This way we can verify the properties of SVD

In [4]:
#Gram matrix dat^T dat (the data is not mean-centred, so this is not strictly the covariance matrix)
C = dat.T @ dat

In [5]:
#eigendecomposition of C
eigenvalues, eigenvectors = np.linalg.eig(C)

In [6]:
#Let's sort them now
idx = eigenvalues.argsort()[::-1]   
eigenvalues= eigenvalues[idx]
eigenvectors = eigenvectors[:,idx]

In [7]:
eigenvalues

array([248.75477858,   0.24522142])

In [8]:
eigenvectors

array([[-0.46939609, -0.88298772],
       [-0.88298772,  0.46939609]])

#### Now let's use SVD

In [9]:
#Let's use the svd function from the np.linalg module
U, s, VT = np.linalg.svd(dat, full_matrices=False)

In [10]:
#U relates each row of the data to the themes
U 

array([[-0.        , -0.        ],
       [-0.14173072, -0.11269112],
       [-0.2274768 ,  0.72251284],
       [-0.42519217, -0.33807335],
       [-0.56692289, -0.45076447],
       [-0.65266896,  0.38443949]])

In [11]:
#The strengths of the themes: NumPy returns the singular values as a 1-D array (the diagonal of the Sigma matrix)
s

array([15.77196179,  0.49519836])

In [12]:
#VT relates each theme to the columns of the data
VT

array([[-0.46939609, -0.88298772],
       [ 0.88298772, -0.46939609]])
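
A quick sanity check, as a minimal sketch: multiplying the three factors back together should reproduce the original data. The array `A` re-creates the six-row dataset from above as a plain NumPy array, and the name `A_rebuilt` is introduced here purely for illustration.

```python
import numpy as np

# Same six points as the dataset above (columns X and Y)
A = np.array([[0, 0], [1, 2], [2, 3], [3, 6], [4, 8], [5, 9]], dtype=float)

# Reduced SVD: U is 6x2, s holds the 2 singular values, VT is 2x2
U, s, VT = np.linalg.svd(A, full_matrices=False)

# Rebuild A from its factors: A = U @ diag(s) @ VT
A_rebuilt = U @ np.diag(s) @ VT

print(np.allclose(A, A_rebuilt))  # True
```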

### Relationship between the Matrices

Let's now look at how these matrices are related. The key property to verify is that U multiplied by the diagonal matrix of singular values is in fact the projection of the original dataset onto the principal components (the eigenvectors found above).

In [13]:
#Let's denote the eigenvectors matrix as X. This is the same as the principal components
X = eigenvectors
X

array([[-0.46939609, -0.88298772],
       [-0.88298772,  0.46939609]])

In [14]:
#Let's project the data now by doing the basis transformation
datn = np.linalg.inv(X) @ dat.T
datn.T

Unnamed: 0,0,1
0,0.0,0.0
1,-2.235372,0.055804
2,-3.587755,-0.357787
3,-6.706115,0.167413
4,-8.941486,0.223218
5,-10.29387,-0.190374


In [15]:
#Now let's compute the U*s matrix
#First we have to turn the 1-D array s into a diagonal matrix
ST = np.diag(s)

In [16]:
#U*ST
U @ ST

array([[ -0.        ,  -0.        ],
       [ -2.23537153,  -0.05580446],
       [ -3.58775534,   0.35778717],
       [ -6.70611458,  -0.16741337],
       [ -8.94148611,  -0.22321782],
       [-10.29386992,   0.1903738 ]])

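Hence the property is verified: the first column matches the projection exactly, and the second matches up to sign, because eigenvectors and singular vectors are only determined up to sign.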
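
The same comparison can be done numerically rather than by eye. The sketch below is self-contained and makes two assumptions that differ from the cells above: it uses `np.linalg.eigh` (intended for symmetric matrices) instead of `np.linalg.eig`, and it compares absolute values so that the sign ambiguity does not matter. The names `projection` and `scores` are illustrative.

```python
import numpy as np

A = np.array([[0, 0], [1, 2], [2, 3], [3, 6], [4, 8], [5, 9]], dtype=float)

# Eigendecomposition of A^T A, sorted by descending eigenvalue
eigenvalues, eigenvectors = np.linalg.eigh(A.T @ A)
order = eigenvalues.argsort()[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

U, s, VT = np.linalg.svd(A, full_matrices=False)

projection = A @ eigenvectors   # data expressed in the eigenvector basis
scores = U * s                  # equivalent to U @ np.diag(s)

# Each column agrees up to an overall sign, so compare magnitudes
print(np.allclose(np.abs(projection), np.abs(scores)))  # True
# Squared singular values equal the eigenvalues of A^T A
print(np.allclose(s ** 2, eigenvalues))                 # True
```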

### So what can we infer?

1. SVD is a matrix factorisation method that generalises eigendecomposition.
2. It overcomes the main limitation of eigendecomposition, which requires the matrix to be square and diagonalizable.
3. It enables us to decompose the data and find hidden themes in a more succinct way.
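
The first point can be made concrete with a short derivation. Here $A$ stands for the data matrix and $\Sigma$ for the diagonal matrix of singular values (symbols introduced here for illustration), and the reduced SVD is assumed, so $\Sigma$ is square.

```latex
A = U \Sigma V^{T}, \qquad U^{T} U = I, \qquad V^{T} V = I

A^{T} A = V \Sigma U^{T} U \Sigma V^{T} = V \Sigma^{2} V^{T}

A V = U \Sigma V^{T} V = U \Sigma
```

The second line says that the columns of $V$ are eigenvectors of $A^{T}A$ with eigenvalues equal to the squared singular values. The third line is the property checked above: projecting the data onto those eigenvectors gives $U\Sigma$.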


## Practice Questions

In [18]:
myfoodratings = pd.read_csv('Ratings/MyFoodRatings.csv')
myfoodratings

Unnamed: 0,Name,Chicken,Mutton,Paneer,ChowMein,SpringRolls,Momo,Sushi,Ramen,Tempura
0,A,5,5,5,0,0,0,0,0,0
1,B,4,4,4,0,0,0,0,0,0
2,C,3,3,3,0,0,0,0,0,0
3,D,2,2,2,0,0,0,0,0,0
4,E,0,0,0,2,2,2,0,0,0
5,F,0,0,0,1,1,1,0,0,0
6,G,0,0,0,5,3,4,0,0,0
7,H,0,0,0,4,4,4,0,0,0
8,I,0,0,0,0,0,0,2,2,4
9,J,0,0,0,0,0,0,1,1,1


In [20]:
myfoodratings.drop('Name',axis=1,inplace=True)
myfoodratings

Unnamed: 0,Chicken,Mutton,Paneer,ChowMein,SpringRolls,Momo,Sushi,Ramen,Tempura
0,5,5,5,0,0,0,0,0,0
1,4,4,4,0,0,0,0,0,0
2,3,3,3,0,0,0,0,0,0
3,2,2,2,0,0,0,0,0,0
4,0,0,0,2,2,2,0,0,0
5,0,0,0,1,1,1,0,0,0
6,0,0,0,5,3,4,0,0,0
7,0,0,0,4,4,4,0,0,0
8,0,0,0,0,0,0,2,2,4
9,0,0,0,0,0,0,1,1,1


In [21]:
U, s, VT = np.linalg.svd(myfoodratings, full_matrices=False)

In [44]:
pd.DataFrame(U)

Unnamed: 0,0,1,2,3,4,5,6,7,8
0,-0.68,0.0,-0.0,0.0,0.0,-0.0,-0.227,-0.408,-0.544
1,-0.544,0.0,-0.0,0.0,0.0,-0.0,-0.073,-0.132,0.824
2,-0.408,0.0,-0.0,0.0,0.0,-0.0,-0.055,0.901,-0.132
3,-0.272,-0.0,0.0,-0.0,-0.0,0.0,0.796,-0.066,-0.088
4,0.0,-0.326,-0.0,0.0,-0.29,-0.0,0.498,0.0,0.0
5,0.0,-0.163,0.0,-0.0,-0.145,0.0,-0.059,0.0,0.0
6,0.0,-0.664,0.0,-0.0,0.747,0.0,0.0,0.0,0.0
7,0.0,-0.652,0.0,-0.0,-0.58,0.0,-0.234,0.0,0.0
8,0.0,0.0,-0.541,-0.837,0.0,0.084,0.0,0.0,0.0
9,0.0,0.0,-0.194,0.162,0.0,0.369,0.0,0.0,0.0


In [24]:
s

array([12.72792206, 10.57703788,  8.84826058,  1.24205263,  1.06125858,
        0.40692755,  0.        ,  0.        ,  0.        ])

In [57]:
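#Note: this cell (In [57]) was run after s was reassigned in In [55] below, so the output shows those singular values
np.diag(s[:3])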

array([[17.32050808,  0.        ,  0.        ],
       [ 0.        , 13.85640646,  0.        ],
       [ 0.        ,  0.        , 10.39230485]])

In [45]:
pd.DataFrame(VT)

Unnamed: 0,0,1,2,3,4,5,6,7,8
0,-0.577,-0.577,-0.577,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0
1,0.0,0.0,0.0,-0.638,-0.512,-0.575,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,-0.476,-0.558,-0.68
3,0.0,0.0,0.0,0.0,0.0,0.0,0.296,0.626,-0.721
4,-0.0,-0.0,-0.0,0.653,-0.756,-0.051,-0.0,-0.0,-0.0
5,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,0.828,-0.545,-0.133
6,0.0,0.0,0.0,-0.408,-0.408,0.816,0.0,0.0,0.0
7,-0.816,0.408,0.408,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0
8,0.0,-0.707,0.707,0.0,0.0,0.0,0.0,0.0,0.0
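
One way to use these factors, sketched under stated assumptions: keep only the strongest `k` themes and rebuild an approximation of the ratings. The array `R` is typed in from the ten rows displayed above (the CSV itself may contain more rows, so the numbers need not match the printed `s` exactly), and `k = 3` is an illustrative choice, not something the notebook prescribes.

```python
import numpy as np

# Ratings copied from the table displayed above (rows A..J, 9 dishes)
R = np.array([
    [5, 5, 5, 0, 0, 0, 0, 0, 0],
    [4, 4, 4, 0, 0, 0, 0, 0, 0],
    [3, 3, 3, 0, 0, 0, 0, 0, 0],
    [2, 2, 2, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 2, 2, 2, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 5, 3, 4, 0, 0, 0],
    [0, 0, 0, 4, 4, 4, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 2, 2, 4],
    [0, 0, 0, 0, 0, 0, 1, 1, 1],
], dtype=float)

U, s, VT = np.linalg.svd(R, full_matrices=False)

k = 3  # number of themes to keep (illustrative)
R_k = U[:, :k] @ np.diag(s[:k]) @ VT[:k, :]

# Frobenius-norm error of the rank-k approximation; it equals the
# root of the sum of the squared singular values that were dropped
error = np.linalg.norm(R - R_k)
print(error, np.sqrt(np.sum(s[k:] ** 2)))
```

The two printed numbers should agree, which shows that the discarded singular values measure exactly how much information the truncation loses.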


In [49]:
data = pd.read_csv('Ratings/FoodRatings_all_same_5.csv')
data.drop('Name',axis=1,inplace=True)
U, s, VT = np.linalg.svd(data, full_matrices=True)

In [53]:
s

array([17.32050808, 17.32050808, 17.32050808,  0.        ,  0.        ,
        0.        ,  0.        ,  0.        ,  0.        ])

In [54]:
data = pd.read_csv('Ratings/FoodRatings_all_same_5_4_3.csv')
data.drop('Name',axis=1,inplace=True)
U, s, VT = np.linalg.svd(data, full_matrices=True)

In [55]:
s

array([17.32050808, 13.85640646, 10.39230485,  0.        ,  0.        ,
        0.        ,  0.        ,  0.        ,  0.        ])
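
The contents of the two CSV files are not shown, so the following is only a guess at their structure. A hypothetical matrix with three groups of 4 raters and 3 dishes, where each group gives one constant rating (5, 4 and 3), reproduces the singular values printed above. The matrix `M` and its layout are assumptions made for this sketch.

```python
import numpy as np

# Hypothetical layout (the CSV contents are not shown): three groups of
# 4 raters x 3 dishes, each group giving one constant rating
ratings = [5, 4, 3]
M = np.zeros((12, 9))
for g, r in enumerate(ratings):
    M[4 * g:4 * g + 4, 3 * g:3 * g + 3] = r

# Only the singular values are needed here
s = np.linalg.svd(M, compute_uv=False)
print(s[:3])

# A constant block of value r with 4 rows and 3 columns has a single
# non-zero singular value r * sqrt(4 * 3)
print(np.array(ratings) * np.sqrt(12))
```

Under this assumption, each theme's strength is the rating scaled by the square root of the block size, which is why equal ratings give equal singular values and ratings of 5, 4, 3 give values in the ratio 5:4:3.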

In [61]:
#Two vectors are orthogonal when their dot product is zero
a = [1,2,3]
b = [-2,1,0]
np.dot(a,b)

0
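
A dot product of 0 means `a` and `b` are orthogonal. The notebook does not say why this check is here; presumably it points at the fact that the singular vectors returned by SVD are mutually orthogonal and of unit length. A minimal sketch of that check on the six-row dataset from the start of this section:

```python
import numpy as np

A = np.array([[0, 0], [1, 2], [2, 3], [3, 6], [4, 8], [5, 9]], dtype=float)
U, s, VT = np.linalg.svd(A, full_matrices=False)

# Rows of VT are orthonormal: VT @ VT.T is the identity
print(np.allclose(VT @ VT.T, np.eye(2)))  # True
# Columns of U are orthonormal too: U.T @ U is the identity
print(np.allclose(U.T @ U, np.eye(2)))    # True
```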