## Matrix factorization

$$
\operatorname{Data}_{m \times n}=U_{m \times m} \Sigma_{m \times n} V_{n \times n}^{T}
$$

The decomposition produces the matrix $\Sigma$, whose only nonzero entries lie on its diagonal; every other entry is 0. These diagonal entries are the singular values of the data matrix, and by convention they are sorted from largest to smallest.
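As a quick check of the factorization above, the following sketch rebuilds a small matrix from its three factors. The example matrix `data` and the helper name `full_sigma` are illustrative assumptions, not part of the original text. Note that `np.linalg.svd` returns the singular values as a 1-D array, so the $m \times n$ matrix $\Sigma$ has to be built explicitly.

```python
import numpy as np

def full_sigma(singular_values, shape):
    """Place the singular values on the diagonal of an m x n zero matrix."""
    sigma = np.zeros(shape)
    k = len(singular_values)
    sigma[:k, :k] = np.diag(singular_values)
    return sigma

# Hypothetical 3 x 2 example matrix (m = 3, n = 2)
data = np.array([[1.0, 2.0],
                 [3.0, 4.0],
                 [5.0, 6.0]])

U, s, VT = np.linalg.svd(data)       # U: 3 x 3, s: shape (2,), VT: 2 x 2
Sigma = full_sigma(s, data.shape)    # 3 x 2, nonzero only on the diagonal

reconstructed = U @ Sigma @ VT
print(np.allclose(reconstructed, data))  # True: the product recovers data
print(np.all(s[:-1] >= s[1:]))           # True: values sorted largest first
```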

## SVD in Python

In [3]:
import numpy as np
U, Sigma, VT = np.linalg.svd([[1,1],[7,7]])

In [4]:
U

array([[-0.14142136, -0.98994949],
       [-0.98994949,  0.14142136]])

In [5]:
Sigma

array([10.,  0.])

In [6]:
VT

array([[-0.70710678, -0.70710678],
       [-0.70710678,  0.70710678]])

In [7]:
def loadExData():
    return[[1, 1, 1, 0, 0],
           [2, 2, 2, 0, 0],
           [1, 1, 1, 0, 0],
           [5, 5, 5, 0, 0],
           [1, 1, 0, 2, 2],
           [0, 0, 0, 3, 3],
           [0, 0, 0, 1, 1]]

In [8]:
data = loadExData()
U, Sigma, VT = np.linalg.svd(data)
U.shape

(7, 7)

In [9]:
Sigma.shape

(5,)

In [10]:
VT.shape

(5, 5)

In [11]:
Sigma

array([9.72140007e+00, 5.29397912e+00, 6.84226362e-01, 4.96619610e-16,
       1.57294073e-16])

In [12]:
Sig3 = np.diag(Sigma[:3])  # 3x3 diagonal matrix of the three largest singular values
print(U[:,:3] @ Sig3 @ VT[:3,:])  # rank-3 reconstruction of data

[[ 1.00000000e+00  1.00000000e+00  1.00000000e+00 -6.04659552e-16
  -5.95010152e-16]
 [ 2.00000000e+00  2.00000000e+00  2.00000000e+00  6.70904304e-16
   6.90419943e-16]
 [ 1.00000000e+00  1.00000000e+00  1.00000000e+00 -7.44630052e-16
  -7.34980653e-16]
 [ 5.00000000e+00  5.00000000e+00  5.00000000e+00 -4.35415592e-16
  -3.66026653e-16]
 [ 1.00000000e+00  1.00000000e+00 -7.77156117e-16  2.00000000e+00
   2.00000000e+00]
 [-5.55111512e-17 -1.38777878e-16  5.55111512e-17  3.00000000e+00
   3.00000000e+00]
 [-6.93889390e-18 -4.85722573e-17  1.38777878e-17  1.00000000e+00
   1.00000000e+00]]
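The reconstruction above keeps three singular values because the remaining two are numerically zero. A minimal sketch, assuming a hypothetical helper `rank_k_approx`, shows how the reconstruction error changes with the number of singular values kept. The data matrix is the one returned by `loadExData()`, repeated here so the block runs on its own.

```python
import numpy as np

def rank_k_approx(U, s, VT, k):
    """Rebuild the matrix from the k largest singular values only."""
    return U[:, :k] @ np.diag(s[:k]) @ VT[:k, :]

data = np.array([[1, 1, 1, 0, 0],
                 [2, 2, 2, 0, 0],
                 [1, 1, 1, 0, 0],
                 [5, 5, 5, 0, 0],
                 [1, 1, 0, 2, 2],
                 [0, 0, 0, 3, 3],
                 [0, 0, 0, 1, 1]], dtype=float)

U, s, VT = np.linalg.svd(data)

# Frobenius norm of the difference between data and each rank-k rebuild
errors = []
for k in range(1, len(s) + 1):
    err = np.linalg.norm(data - rank_k_approx(U, s, VT, k))
    errors.append(err)
    print(k, err)
```

With this data the error should fall to floating-point noise once `k` reaches 3, which matches the three non-negligible entries of `Sigma`.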


## Collaborative filtering–based recommendation engines
### Measuring similarity 

In [20]:
import numpy as np

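def ecludSim(inA, inB):
    # Euclidean distance mapped to (0, 1]; identical vectors score 1.0
    return 1.0/(1.0+np.linalg.norm(inA-inB))

def pearsSim(inA, inB):
    # Pearson correlation rescaled from [-1, 1] to [0, 1]
    if len(inA)<3: return 1.0  # fewer than 3 points: treat as perfectly correlated
    return 0.5+0.5*np.corrcoef(inA, inB, rowvar=0)[0][1]

def cosSim(inA, inB):
    # Cosine similarity rescaled from [-1, 1] to [0, 1]; expects column matrices
    num = float((inA.T * inB)[0, 0])  # 1x1 matrix product, extracted as a scalar
    denom = np.linalg.norm(inA)*np.linalg.norm(inB)
    return 0.5+0.5*(num/denom)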

In [21]:
myMat=np.asmatrix(loadExData())  # np.mat was removed in NumPy 2.0
ecludSim(myMat[:,0],myMat[:,4])

0.13367660240019172

In [22]:
ecludSim(myMat[:,0],myMat[:,0])

1.0

In [23]:
cosSim(myMat[:,0],myMat[:,4])

0.5472455591261534

In [24]:
cosSim(myMat[:,0],myMat[:,0])

0.9999999999999999

In [25]:
pearsSim(myMat[:,0],myMat[:,4])

0.23768619407595826
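The functions above rely on `np.matrix` semantics, where `inA.T * inB` is a matrix product. As a sketch only, the same three measures can be written for plain 1-D NumPy arrays. The names `euclid_sim`, `pearson_sim`, and `cos_sim` are hypothetical and do not appear in the original code.

```python
import numpy as np

def euclid_sim(a, b):
    """Euclidean distance mapped to (0, 1]."""
    return 1.0 / (1.0 + np.linalg.norm(a - b))

def pearson_sim(a, b):
    """Pearson correlation rescaled from [-1, 1] to [0, 1]."""
    if len(a) < 3:
        return 1.0
    return 0.5 + 0.5 * np.corrcoef(a, b)[0, 1]

def cos_sim(a, b):
    """Cosine similarity rescaled from [-1, 1] to [0, 1]."""
    return 0.5 + 0.5 * np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Same matrix as loadExData(), stored as a regular ndarray
data = np.array([[1, 1, 1, 0, 0],
                 [2, 2, 2, 0, 0],
                 [1, 1, 1, 0, 0],
                 [5, 5, 5, 0, 0],
                 [1, 1, 0, 2, 2],
                 [0, 0, 0, 3, 3],
                 [0, 0, 0, 1, 1]], dtype=float)

col0, col4 = data[:, 0], data[:, 4]   # columns compared in the cells above
print(euclid_sim(col0, col4))
print(cos_sim(col0, col4))
print(pearson_sim(col0, col4))
```

Under these assumptions the printed values should agree with the matrix-based results above up to floating-point rounding.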

### Recommending untasted dishes