# Understanding SVD

We first build some intuition for the singular value decomposition (SVD) using a small toy dataset: a user-movie rating matrix in which every user has rated every movie.

In [None]:
import numpy as np
import pandas as pd

np.set_printoptions(formatter={'float': lambda x: "{0:0.2f}".format(x)})

# We should look at this data first, and then see what happens when we add another row
#df = pd.DataFrame([[1,2,8,9,3,3],[2,1,9,8,4,2],[2,2,6,8,2,3],
#                   [9,7,2,3,1,1],[1,1,1,2,8,7],[2,2,3,2,8,8],
#                   [7,9,2,2,2,3],[9,8,2,3,1,3]], 
#                  columns=["horror1","horror2","drama1","drama2","art1","art2"], 
#                  index=["u0","u1","u2","u3","u4","u5","u6","u7"])


# A little change: the same data with one extra user (u8) appended as a ninth row
df = pd.DataFrame([[1,2,8,9,3,3],[2,1,9,8,4,2],[2,2,6,8,2,3],
                   [9,7,2,3,1,1],[1,1,1,2,8,7],[2,2,3,2,8,8],
                   [7,9,2,2,2,3],[9,8,2,3,1,3],[7,1,1,9,2,8]], 
                  columns=["horror1","horror2","drama1","drama2","art1","art2"], 
                  index=["u0","u1","u2","u3","u4","u5","u6","u7","u8"])


df

Let us extract the data as a matrix and subtract each user's mean rating from that user's row, so that every row has mean zero.

In [None]:
A = df.values
# Per-user (row) mean, kept as a column vector so it broadcasts across the movie columns
means = np.mean(A, axis=1, keepdims=True)
#print(means)
A = A - means
print(A)
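
As a quick sanity check on the centering step, the sketch below rebuilds the same toy ratings as a plain NumPy array and confirms that every row has mean zero afterwards. The names `R`, `row_means`, and `A_centered` are illustrative and do not appear in the notebook.

```python
import numpy as np

# Same toy ratings as in the DataFrame above (users u0..u8, six movies).
R = np.array([[1, 2, 8, 9, 3, 3], [2, 1, 9, 8, 4, 2], [2, 2, 6, 8, 2, 3],
              [9, 7, 2, 3, 1, 1], [1, 1, 1, 2, 8, 7], [2, 2, 3, 2, 8, 8],
              [7, 9, 2, 2, 2, 3], [9, 8, 2, 3, 1, 3], [7, 1, 1, 9, 2, 8]],
             dtype=float)

# keepdims=True keeps the means as a (9, 1) column, so the subtraction
# broadcasts across the six movie columns.
row_means = R.mean(axis=1, keepdims=True)
A_centered = R - row_means

# Every user's centered ratings should now average to zero.
print(np.allclose(A_centered.mean(axis=1), 0.0))

# The per-movie (column) means are generally nonzero, but they sum to zero,
# because every row sums to zero.
col_means = A_centered.mean(axis=0)
print(np.isclose(col_means.sum(), 0.0))
```

Because each centered row sums to zero, the centered matrix has rank at most 5, which is worth keeping in mind when reading the singular values later.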

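Let us also examine the average rating of each movie. Note that `A` has already been row-centered at this point, so these are averages of the centered ratings, not of the raw scores.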

In [None]:
print(np.mean(A,axis=0))

## Compute SVD

In [None]:
U, S, VT = np.linalg.svd(A, full_matrices=False)

print("U = \n", U, "\n")
print("S = ", S, "\n")
print("VT = \n", VT, "\n")
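
To see what the three factors mean, the following sketch checks the defining properties of the reduced SVD on the same toy data: the factors multiply back to the centered matrix, the columns of `U` and the rows of `VT` are orthonormal, and the singular values come out in non-increasing order. The array `R` and the name `A_c` are illustrative stand-ins for the notebook's data.

```python
import numpy as np

# Same toy ratings as above, row-centered (one row per user).
R = np.array([[1, 2, 8, 9, 3, 3], [2, 1, 9, 8, 4, 2], [2, 2, 6, 8, 2, 3],
              [9, 7, 2, 3, 1, 1], [1, 1, 1, 2, 8, 7], [2, 2, 3, 2, 8, 8],
              [7, 9, 2, 2, 2, 3], [9, 8, 2, 3, 1, 3], [7, 1, 1, 9, 2, 8]],
             dtype=float)
A_c = R - R.mean(axis=1, keepdims=True)

# Reduced SVD: U is (9, 6), S holds 6 singular values, VT is (6, 6).
U, S, VT = np.linalg.svd(A_c, full_matrices=False)
print(U.shape, S.shape, VT.shape)

# The factors reproduce the matrix: A_c = U @ diag(S) @ VT.
print(np.allclose(U @ np.diag(S) @ VT, A_c))

# Columns of U and rows of VT are orthonormal.
print(np.allclose(U.T @ U, np.eye(6)))
print(np.allclose(VT @ VT.T, np.eye(6)))

# Singular values are sorted from largest to smallest.
print(np.all(np.diff(S) <= 0))

# Row-centering makes every row sum to zero, so the rank is at most 5
# and the last singular value is zero up to rounding error.
print(S[-1] < 1e-8)
```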

## Dimension Reduction

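We project the data onto the first $k$ left singular vectors (optionally scaled by the corresponding singular values). The result $U_k^T A$ has $k$ rows and one column per movie, so each movie is described by $k$ coordinates instead of one rating per user. This reduced representation keeps the directions associated with the largest singular values, which capture most of the variation in the data.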

In [None]:
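k = 3
# Note: U[:,:k].T @ A already equals diag(S[:k]) @ VT[:k], so the commented-out
# variant below would scale by the singular values a second time.
#Ak = np.diag(S[:k]) @ np.transpose(U[:,:k]) @ A
# Coordinates of each movie (column of A) along the first k left singular vectors
Ak = np.transpose(U[:,:k])  @ A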
print(Ak)
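
The sketch below, again on an illustrative copy of the toy data, makes the projection concrete. It checks that projecting onto the first $k$ left singular vectors gives the first $k$ rows of `VT` scaled by the singular values, shows the analogous reduced representation of the users, and measures how much is lost by keeping only $k$ components. The names `movies_k`, `users_k`, and `A_k` are assumptions made for this example.

```python
import numpy as np

# Same toy ratings as above, row-centered (one row per user).
R = np.array([[1, 2, 8, 9, 3, 3], [2, 1, 9, 8, 4, 2], [2, 2, 6, 8, 2, 3],
              [9, 7, 2, 3, 1, 1], [1, 1, 1, 2, 8, 7], [2, 2, 3, 2, 8, 8],
              [7, 9, 2, 2, 2, 3], [9, 8, 2, 3, 1, 3], [7, 1, 1, 9, 2, 8]],
             dtype=float)
A_c = R - R.mean(axis=1, keepdims=True)
U, S, VT = np.linalg.svd(A_c, full_matrices=False)

k = 3
Uk, Sk, VTk = U[:, :k], S[:k], VT[:k, :]

# Movies in k dimensions: shape (k, 6), one column per movie.
movies_k = Uk.T @ A_c
print(np.allclose(movies_k, np.diag(Sk) @ VTk))

# Users in k dimensions: shape (9, k), one row per user.
users_k = A_c @ VTk.T
print(np.allclose(users_k, Uk * Sk))

# Rank-k approximation of the centered ratings.
A_k = Uk @ np.diag(Sk) @ VTk

# The Frobenius-norm error equals the root of the sum of squares of the
# discarded singular values.
err = np.linalg.norm(A_c - A_k)
print(np.isclose(err, np.sqrt(np.sum(S[k:] ** 2))))

# Fraction of the total squared singular values kept by the first k components.
print(np.sum(Sk ** 2) / np.sum(S ** 2))
```

Trying a few values of `k` and watching the retained fraction is one simple way to choose how many dimensions to keep.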