# Matrix Factorization 

Let's say we work in a vinyl record and CD store downtown. A customer comes in and asks us for a recommendation. We ask the obvious question, "What do you like?" and she responds, "I like Taylor Swift and Carrie Underwood." There are several ways we might come up with a recommendation for her. One is to think about regular customers of our store who bought albums by those artists and recall what else they bought. Perhaps we notice that recently, people who bought Taylor Swift CDs also bought CDs by Miranda Lambert, so we recommend Miranda Lambert to the new customer. This is a two-step process. First, we determine which previous customers are most similar to the person standing in front of us. Second, we look at what those previous customers bought and use that information to make recommendations to our current customer.

Another way we might come up with a recommendation is as follows. We know Taylor Swift and Carrie Underwood CDs share certain features. They both, obviously, have prominent female vocals. They both feature singer-songwriters. They both have country influences and no PBR&B influences (the term PBR&B, aka hipster R&B and R neg B, is a portmanteau of PBR--Pabst Blue Ribbon, the hipster beer of choice--and R&B). Then we think, "Hey, Miranda Lambert CDs also have prominent female vocals and country influences but no PBR&B influences," and we recommend Miranda Lambert to our new customer. With this recommendation method we extract a set of features from the CDs this person likes and then think about what other CDs share these features.

Let's check this out a bit further. We will restrict ourselves to two features (country and PBR&B influences), five artists (Taylor Swift, Miranda Lambert,  Carrie Underwood, Jhené Aiko, and The Weeknd) and two customers (Jake and Ann).  As owners of the vinyl record store we have gone through and meticulously rated artists on these features. 

|Artist| Country| PBR&B|
|:-----------|:------:|:------:|
| Taylor Swift | 0.90 | 0.05|
|Miranda Lambert | 0.98| 0.00|
|Carrie Underwood|0.95| 0.03|
| Jhené Aiko | 0.01 | 0.99 |
| The Weeknd | 0.03 | 0.98 |

So Taylor Swift exudes a lot of country influences (0.90) but little PBR&B (0.05).

When customers come into our store we ask them on a scale of 0 to 5 how well they like country and how well they like PBR&B:

|Customer| Country| PBR&B|
|:-----------|:------:|:------:|
| Jake | 5 | 1|
|Ann | 0| 5|

Suppose Jake comes into the store,  has never heard of Miranda Lambert, and we are trying to predict how he might rate her. Jake rated country music a 5 and Miranda is 0.98 country so we multiply those numbers together.

$$5 \times 0.98 = 4.9$$

We do the same for the PBR&B numbers and add them together to get our estimate of Jake's rating of Miranda Lambert.

$$rating_{Jake,Miranda} =  5 \times 0.98 + 1 \times 0.00 = 4.9 + 0.00 = 4.9$$


#### What are Ann's ratings of Taylor Swift and Jhené Aiko?

$$rating_{a,ts} = 0 \times 0.90 + 5 \times 0.05 = 0.25$$

$$rating_{a,ja} = 0 \times 0.01  + 5 \times  0.99 = 4.95$$
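The same arithmetic can be sketched in a few lines of plain Python. The feature scores and preferences are copied from the two tables above; the names `artist_features`, `customer_prefs`, and `predict_rating` are made up for this illustration.

```python
# Feature scores for each artist (from the artist table above).
artist_features = {
    "Taylor Swift": {"Country": 0.90, "PBR&B": 0.05},
    "Miranda Lambert": {"Country": 0.98, "PBR&B": 0.00},
    "Carrie Underwood": {"Country": 0.95, "PBR&B": 0.03},
    "Jhené Aiko": {"Country": 0.01, "PBR&B": 0.99},
    "The Weeknd": {"Country": 0.03, "PBR&B": 0.98},
}

# How much each customer likes each feature, on a 0 to 5 scale.
customer_prefs = {
    "Jake": {"Country": 5, "PBR&B": 1},
    "Ann": {"Country": 0, "PBR&B": 5},
}

def predict_rating(customer, artist):
    """Multiply matching feature values and add up the products."""
    prefs = customer_prefs[customer]
    feats = artist_features[artist]
    return sum(prefs[f] * feats[f] for f in prefs)

print(round(predict_rating("Ann", "Taylor Swift"), 2))  # 0.25
print(round(predict_rating("Ann", "Jhené Aiko"), 2))    # 4.95
```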

### a slight change

Let's change this scenario a bit. Suppose we still rate our artists as above, but this time instead of asking customers about how well they like country and PBR&B we ask them how well they like various artists and we get something like the following (a question mark indicates that that customer has not rated that artist):

|Customer | Taylor Swift | Miranda Lambert | Carrie Underwood | Jhené Aiko | The Weeknd |
|:-----------|:------:|:------:|:---------:|:------:|:--------:|
|Jake|5|?|5|2|2|
|Clara|2|?|?|4|5|
|Kelsey|5|5|5|2|?|
|Ann|2|3|?|5|5|
|Jessica|2|1|?|5|?|



<h3 style="color:red">Q1. Ratings</h3>

<span style="color:red">Can you create a DataFrame called `R` (for ratings) representing the information in the above table? The question marks should be represented as not a number. The index should be the customer name.</span>

In [7]:
import numpy as np
from pandas import DataFrame
import pandas as pd

## to be done
r = {'Customer': ['Jake', 'Clara', 'Kelsey', 'Ann', 'Jessica'], 
     'Taylor': [5,2, 5, 2, 2], 'Miranda': [None,None,5,3,1], 
     'Carrie': [5,None,5,None,None], 'Jhené': [2,4,2,5,5], 'The Weeknd': [2, 5, None, 5,None]}

R = DataFrame(r, columns = ['Customer', 'Taylor', 'Miranda', 'Carrie', 'Jhené', 'The Weeknd'])
R = R.set_index('Customer')
R

|Customer|Taylor|Miranda|Carrie|Jhené|The Weeknd|
|:-----------|:------:|:------:|:---------:|:------:|:--------:|
|Jake|5|NaN|5.0|2|2.0|
|Clara|2|NaN|NaN|4|5.0|
|Kelsey|5|5.0|5.0|2|NaN|
|Ann|2|3.0|NaN|5|5.0|
|Jessica|2|1.0|NaN|5|NaN|
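As a quick check on how the question marks end up stored, here is a small sketch using just the Miranda Lambert column. It assumes pandas' usual behavior of turning `None` in a numeric list into NaN; the variable name `miranda` is only for illustration.

```python
import pandas as pd

# Miranda Lambert's column: Jake and Clara have not rated her.
miranda = pd.Series([None, None, 5, 3, 1],
                    index=["Jake", "Clara", "Kelsey", "Ann", "Jessica"])

print(miranda.dtype)         # float64, because NaN is a floating point value
print(miranda.isna().sum())  # 2 missing ratings
print(miranda.mean())        # 3.0, the mean skips the missing ratings
```

This is also why the columns with missing ratings display as 5.0 rather than 5.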


Let's back up a step to see what we've done. When customers come in they rate various artists. That's table *R* above. And in our heads we figure that Jake likes country music and Miranda Lambert is a country artist so we recommend her to him. 

### asking the customers directly
Instead of doing stuff in our head, suppose we ask customers directly how well they like country and PBR&B. So we would get a table like:


|Customer| Country| PBR&B|
|:-----------|:------:|:------:|
| Jake | 5 | 2|
| Clara | 2 | 4.5|
| Kelsey | 5 | 2|
|Ann | 2.5| 5|
| Jessica | 1.5 | 5|

In addition we rate various artists on these features as well:

|Artist| Country| PBR&B|
|:-----------|:------:|:------:|
| Taylor Swift | 0.90 | 0.05|
|Miranda Lambert | 0.98| 0.00|
|Carrie Underwood|0.95| 0.03|
| Jhené Aiko | 0.01 | 0.99 |
| The Weeknd | 0.03 | 0.98 |

Let's make a DataFrame for the customer ratings (call it *P*) and one for the artists (call it *Q*):

In [9]:
p = {'Customer': ['Jake', 'Clara', 'Kelsey', 'Angel', 'Jordyn'], 'Country': [5, 2, 5, 2.5, 1.5], 'PBR&B': [2, 4.5, 2, 5, 5]}
P = DataFrame(p)
P = P.set_index('Customer')
print(P)
q = {'Artist': ['Taylor', 'Miranda', 'Carrie', 'Jhené', 'The Weeknd'],
     'Country': [0.90, 0.98, 0.95, 0.01, 0.03], 
     'PBR&B': [0.05, 0.00, 0.03, 0.99, 0.98]}
Q = DataFrame(q)
Q = Q.set_index('Artist')
print(Q)

          Country  PBR&B
Customer                
Jake          5.0    2.0
Clara         2.0    4.5
Kelsey        5.0    2.0
Angel         2.5    5.0
Jordyn        1.5    5.0
            Country  PBR&B
Artist                    
Taylor         0.90   0.05
Miranda        0.98   0.00
Carrie         0.95   0.03
Jhené          0.01   0.99
The Weeknd     0.03   0.98



# Matrix Factorization
For matrix factorization we don't tell the algorithm a preset list of features (female vocal, country, PBR&B, etc.). Instead we give the algorithm a chart (matrix) like the following:



|Customer | Taylor Swift | Miranda Lambert | Carrie Underwood | Jhené Aiko | The Weeknd |
|:-----------|:------:|:------:|:---------:|:------:|:--------:|
|Jake|5|?|5|2|2|
|Clara|2|?|?|4|5|
|Kelsey|5|5|5|2|?|
|Ann|2|3|?|5|5|
|Jessica|2|1|?|5|?|


and ask the algorithm to extract a set of features from this data. To anthropomorphize this even further, it is like asking the algorithm, *Okay algorithm, given 2 features (or some other number of features), call them feature 1 and feature 2, can you come up with the P and Q matrices?*  These extracted features are not going to be something like 'female vocals' or 'country influence'. In fact, we don't care what these features represent. Again, we are going to ask the algorithm to extract features that are hidden in that table above. In order to make this sound a bit fancier than 'hidden features', data scientists use the Latin word for 'lie hidden', *lateo*, and call these **latent features**.

The inputs to the matrix factorization algorithm are the data in the chart shown above and the number of latent features to use (for example, 2). Our eventual goal is to calculate $\hat{R}$, a table of estimated ratings. That is, a table similar to the above but with all the numbers filled in:

|Customer | Taylor Swift | Miranda Lambert | Carrie Underwood |Jhené Aiko| The Weeknd |
|:-----------|:------:|:------:|:---------:|:------:|:--------:|
|Jake|4.92|**4.78**|4.94|1.79|2.17|
|Clara|2.16|**2.97**|**1.64**|4.30|4.62|
|Kelsey|4.98|4.91|4.96|2.12|**2.52**|
|Ann|2.04|2.99|**1.45**|4.79|5.13|
|Jessica|1.79|2.80|**1.16**|4.89|**5.22**|

The bolded numbers are those that were blank in the original chart but predicted by our algorithm. The unbolded numbers are predicted values that have an actual value in the original table.  From our original data we see that Jake gave a rating of 5 to both Taylor Swift and Carrie Underwood, and the algorithm's estimates for those are 4.92 and 4.94---pretty good!

To get these predicted values we use latent features as an intermediary. Let's say we have two features: *feature 1* and *feature 2*. And, to keep things simple, let's just look at how to get Jake's rating of Taylor Swift.  Jake's rating is based solely on these two features, and for Jake these features are not equal in importance but are weighted differently. For example, Jake might weight these features:

|       -   | Feature 1 | Feature 2 |
|:-----|:----:|:----:|
| Jake |0.717 | 2.309 |


So feature 2 is much more influential in Jake's rating than feature 1 is.  

We are going to have these feature weights for all our users and, again, by convention we call the resulting matrix, *P*:



|       -   | Feature 1 | Feature 2 |
|:-----|:----:|:----:|
| Jake | 0.717 | 2.309 |
| Clara | 1.875 | 0.437 |
| Kelsey | 0.861 | 2.288 |
| Ann | 2.10 | 0.295 |
| Jessica | 2.14 | 0.145 |


The word 'matrix' just means a table of numbers, just like we have above.

The other thing we need is how these features are represented in Taylor Swift: how much "Feature 1-iness" is there in Taylor Swift? So we need a table of weights for the artists and, again by convention, we call this matrix *Q*:


 
| Artist | Feature 1 | Feature 2 |
|:-----|:----:|:----:|
| Taylor Swift | 0.705 | 1.913 |
| Miranda Lambert | 1.189 | 1.700 |
| Carrie Underwood | 0.407| 2.015 |
| Jhené Aiko | 2.276 | 0.072 |
| The Weeknd | 2.419 | 0.191 |


Now back to our task of predicting how Jake will rate Taylor Swift. We take Jake's weights for these features


|       -   | Feature 1 | Feature 2 |
|:-----|:----:|:----:|
| Jake | 0.717 | 2.309 |

and Taylor Swift's:

| Artist   | Feature 1 | Feature 2 |
|:-----|:----:|:----:|
| Taylor Swift  | 0.705 | 1.913 |

Multiply together Jake's and Taylor Swift's values for each feature:


|       -   | Feature 1 | Feature 2 |
|:-----|:----:|:----:|
| Jake| 0.717 | 2.309 |
| Taylor Swift  | 0.705 | 1.913 |
| **Product** | 0.505| 4.417 |

Then add those products up to get the predicted rating, *r*:

$$ r =0.505 + 4.417 = 4.92$$

## Dot Product

This operation is called the dot product. A list of numbers, for example, Jake's weights for the features: [0.717, 2.309] is called a **vector**. A dot product is performed on two vectors of equal length and produces a single value. It is defined as follows:

Let A and B be two vectors of equal length *n*. Then

$$A \cdot  B = \sum_{i=1}^nA_iB_i=A_1B_1+A_2B_2+A_3B_3+\dots+A_nB_n$$

So, for example, if

$$A = [1, 3, 5, 7, 9]$$

and 

$$B = [2, 4, 6, 8, 10]$$

then the dot  product of A and B is

$$A \cdot B = 1 \times 2 + 3 \times 4 + 5 \times 6 + 7 \times 8 + 9 \times 10 =  190$$

 

So above we determined Jake's rating of Taylor Swift by getting the dot product of Jake, *J* and Taylor Swift, *S*:

$$J \cdot  S = 0.717 \times 0.705 +  2.309 \times 1.913 = 4.92$$
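Here is a small sketch of the dot product in code, first written out by hand and then with NumPy's built-in `np.dot`. The helper name `dot` and the variable names are just for illustration; the numbers are the ones used above.

```python
import numpy as np

def dot(a, b):
    """Dot product of two equal-length sequences of numbers."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))

A = [1, 3, 5, 7, 9]
B = [2, 4, 6, 8, 10]
print(dot(A, B))     # 190
print(np.dot(A, B))  # 190, NumPy gives the same answer

jake = [0.717, 2.309]
taylor = [0.705, 1.913]
print(round(dot(jake, taylor), 2))  # 4.92
```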

And, since I am giving things fancy names in this section, recall that the table from users to feature weights is matrix *P* and the table from artists to feature weights is matrix *Q*. Once we have *P* and *Q* it is easy to make predictions.

## Multiplying matrices
Great. We now have an estimate of how Jake will rate Taylor Swift. Now we want to do this for all user-artist pairs to get $\hat{R}$  (the little hat over the *R* indicates it is our estimate of the ratings). The actual ratings are in the matrix *R* above. We get $\hat{R}$ by multiplying the *P* and *Q* matrices together.  Here's the thing about multiplying matrices. To multiply two matrices, the first matrix needs to have the same number of columns as the second has rows. If you look at *P* and *Q* above you can see that this is not the case: *P* has 2 columns but *Q* has 5 rows. To make this work out mathematically, we need to flip one of the matrices on-end so that the rows become the columns. Let's do this for matrix Q. So *Q* originally is 



| Artist  | Feature 1 | Feature 2 |
|:-----|:----:|:----:|
| Taylor Swift | 0.705 | 1.913 |
| Miranda Lambert | 1.189 | 1.700 |
| Carrie Underwood | 0.407| 2.015 |
| Jhené Aiko | 2.276 | 0.072 |
| The Weeknd | 2.419 | 0.191 |



and flipped:

|feature: | Taylor Swift | Miranda Lambert | Carrie Underwood | Jhené Aiko | The Weeknd |
|:-----------|:------:|:------:|:---------:|:------:|:--------:|
|1|0.705|1.189|0.407|2.276|2.419|
|2|1.913|1.700|2.015|0.072|0.191|


This flipping of the table (or matrix) is called transposing the matrix.  If the original matrix is called *Q* the transpose of the matrix is indicated by $Q^T.$ 

So now when you see $Q^T$ you don't need to freak out. Just think, oh, I just flip the matrix so rows become columns!

Let's see how to do that in Pandas:

 

In [21]:
q = {'Artist': ['Taylor', 'Miranda', 'Carrie', 'Jhené', 'The Weeknd'],
     'feature_1': [0.705, 1.189, 0.407, 2.276, 2.419], 
     'feature_2': [1.913, 1.7, 2.015, 0.072, 0.191]}
Q = DataFrame(q)
Q = Q.set_index('Artist')
Q

| Artist | feature_1 | feature_2 |
|:-----|:----:|:----:|
| Taylor | 0.705 | 1.913 |
| Miranda | 1.189 | 1.700 |
| Carrie | 0.407 | 2.015 |
| Jhené | 2.276 | 0.072 |
| The Weeknd | 2.419 | 0.191 |


and transposed:

In [22]:
Q.T

| Artist | Taylor | Miranda | Carrie | Jhené | The Weeknd |
|:-----------|:------:|:------:|:---------:|:------:|:--------:|
| feature_1 | 0.705 | 1.189 | 0.407 | 2.276 | 2.419 |
| feature_2 | 1.913 | 1.700 | 2.015 | 0.072 | 0.191 |


Cool. And our estimate of the ratings equals:

$$\hat{R} = PQ^T$$

or in our case of customers and artists:

$$\hat{R} =\begin{bmatrix}
0.717 & 2.309 \\
1.875 & 0.437 \\
0.861 & 2.288 \\
2.100 & 0.295 \\
2.140 & 0.145
\end{bmatrix}  \times
 \begin{bmatrix}
0.705 & 1.189 & 0.407 & 2.276 & 2.419  \\
1.913 & 1.700 & 2.015 & 0.072 & 0.191
\end{bmatrix} $$
and when we do this multiplication we will get the filled in version of our estimated ratings table:


|Customer | Taylor Swift | Miranda Lambert | Carrie Underwood | Jhené Aiko | The Weeknd |
|:-----------|:------:|:------:|:---------:|:------:|:--------:|
|Jake|-|-|-|-|-|
|Clara|-|-|-|-|-|
|Kelsey|-|-|-|-|-|
|Ann|-|-|-|-|-|
|Jessica|-|-|-|-|-|

Here is how we multiply matrices *P* and $Q^T$ together.  To get the value of the first row, first column of our result (in our case Jake's estimated rating of Taylor Swift) we take the dot product of the first row of *P* and the first column of $Q^T.$  




$$\hat{R}_{Jake,Taylor} = 0.717 \times 0.705 + 2.309 \times 1.913 = 4.92$$

|Customer | Taylor Swift | Miranda Lambert | Carrie Underwood | Jhené Aiko | The Weeknd |
|:-----------|:------:|:------:|:---------:|:------:|:--------:|
|Jake|4.92|-|-|-|-|
|Clara|-|-|-|-|-|
|Kelsey|-|-|-|-|-|
|Ann|-|-|-|-|-|
|Jessica|-|-|-|-|-|

To get the estimated value for row one column two (Jake's rating of Miranda Lambert) we take the dot product of the first row of *P* and the second column of  $Q^T:$




$$\hat{R}_{Jake,Miranda} = 0.717 \times 1.189 + 2.309 \times 1.700 = 4.78$$



|Customer | Taylor Swift | Miranda Lambert | Carrie Underwood | Jhené Aiko | The Weeknd |
|:-----------|:------:|:------:|:---------:|:------:|:--------:|
|Jake|4.92|4.78|-|-|-|
|Clara|-|-|-|-|-|
|Kelsey|-|-|-|-|-|
|Ann|-|-|-|-|-|
|Jessica|-|-|-|-|-|

and so on.
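To see the "and so on" carried out for every cell at once, here is a sketch using NumPy arrays instead of DataFrames. The values are copied from the latent-feature *P* and *Q* tables above; rows are in the same order as those tables.

```python
import numpy as np

# Rows: Jake, Clara, Kelsey, Ann, Jessica
P = np.array([[0.717, 2.309],
              [1.875, 0.437],
              [0.861, 2.288],
              [2.100, 0.295],
              [2.140, 0.145]])

# Rows: Taylor Swift, Miranda Lambert, Carrie Underwood, Jhené Aiko, The Weeknd
Q = np.array([[0.705, 1.913],
              [1.189, 1.700],
              [0.407, 2.015],
              [2.276, 0.072],
              [2.419, 0.191]])

R_hat = P @ Q.T  # every cell is the dot product of a row of P and a column of Q^T

print(R_hat.shape)                # (5, 5)
print(np.round(R_hat[0, :2], 2))  # Jake's estimates for Taylor and Miranda: [4.92 4.78]
```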

#### Pandas
To multiply 2 matrices together in Pandas we use the `dot` method:


In [23]:
df1 = DataFrame({0: [0,3,6,9], 1: [1,4,7, 10], 2: [2,5,8, 11]})
df2 = DataFrame({0: [0, 4, 8], 1: [1, 5, 9], 2: [2, 6, 10], 3: [3, 7, 11]})
print(df1)
print("--------------")
print(df2)
df1.dot(df2)

   0   1   2
0  0   1   2
1  3   4   5
2  6   7   8
3  9  10  11
--------------
   0  1   2   3
0  0  1   2   3
1  4  5   6   7
2  8  9  10  11


| | 0 | 1 | 2 | 3 |
|:-----|:----:|:----:|:----:|:----:|
| 0 | 20 | 23 | 26 | 29 |
| 1 | 56 | 68 | 80 | 92 |
| 2 | 92 | 113 | 134 | 155 |
| 3 | 128 | 158 | 188 | 218 |
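The example above multiplies a 4 x 3 table by a 3 x 4 table, so the columns of the first line up with the rows of the second. As a sketch of what happens when they don't line up, here are two 5 x 2 NumPy arrays standing in for *P* and *Q* (the values are placeholders, only the shapes matter):

```python
import numpy as np

P = np.ones((5, 2))  # 5 customers x 2 features (placeholder values)
Q = np.ones((5, 2))  # 5 artists x 2 features (placeholder values)

try:
    P @ Q            # 2 columns against 5 rows: the shapes do not line up
except ValueError as err:
    print("cannot multiply:", err)

R_hat = P @ Q.T      # (5 x 2) times (2 x 5) works and gives a 5 x 5 table
print(R_hat.shape)   # (5, 5)
```

This is the reason we transpose *Q* before multiplying.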




<h3 style="color:red">Q2. Predicting Ratings</h3>

<span style="color:red">Here is a question. Using P and Q can we predict how each customer will rate each artist, $\hat{R}$? I want it to look something like </span>

|Customer | Taylor | Miranda | Carrie | Jhené | The Weeknd |
|:-----------|:------:|:------:|:---------:|:------:|:--------:|
|Jake|4.600|4.90|4.810|2.030|2.110|
|Clara|2.025|1.96|2.035|4.475|4.470|
|Kelsey|4.600|4.90|4.810|2.030|2.110|
|Angel|2.500|2.45|2.525|4.975|4.975|
|Jordyn|1.600|1.47|1.575|4.965|4.945|




In [24]:
p = {'Customer': ['Jake', 'Clara', 'Kelsey', 'Angel', 'Jordyn'], 'Country': [5, 2, 5, 2.5, 1.5], 'PBR&B': [2, 4.5, 2, 5, 5]}
P = DataFrame(p)
P = P.set_index('Customer')
q = {'Artist': ['Taylor', 'Miranda', 'Carrie', 'Jhené', 'The Weeknd'],
     'Country': [0.90, 0.98, 0.95, 0.01, 0.03], 
     'PBR&B': [0.05, 0.00, 0.03, 0.99, 0.98]}
Q = DataFrame(q)
Q = Q.set_index('Artist')

print("PREDICTED RATINGS")
# your work here
P.dot(Q.T)

PREDICTED RATINGS


|Customer | Taylor | Miranda | Carrie | Jhené | The Weeknd |
|:-----------|:------:|:------:|:---------:|:------:|:--------:|
|Jake|4.600|4.90|4.810|2.030|2.110|
|Clara|2.025|1.96|2.035|4.475|4.470|
|Kelsey|4.600|4.90|4.810|2.030|2.110|
|Angel|2.500|2.45|2.525|4.975|4.975|
|Jordyn|1.600|1.47|1.575|4.965|4.945|


Once we have *P* and *Q* it is easy to generate estimated ratings. But how do we get these matrices *P* and *Q*?

## How do we get Matrices P and Q?
There are several common ways to derive these matrices. One method is called **stochastic gradient descent.** The basic idea is this. We are going to randomly select values for *P* and *Q*.  For example, we would randomly select initial values for Jake:

Jake = [0.03, 0.88]

and randomly select initial values for Taylor Swift:

Taylor = [ 0.73,  0.49]

So with those initial values we get a prediction of 
$$J \cdot  S = 0.03 \times 0.73 +  0.88 \times 0.49 = 0.45$$

which is a particularly bad guess considering Jake really gave Taylor Swift a '5'. So we adjust those values. We underestimated Jake's rating of Taylor Swift, so we boost the values, maybe to something like:

Jake = [0.12, 0.83]

Taylor = [ 0.80,  0.47]

and now we get:

$$J \cdot  S = 0.12 \times 0.80 +  0.83 \times 0.47 = 0.49$$

That is better than before, but we still underestimated, so we adjust and try again. And adjust and try again. We repeat this process thousands of times until our predicted values get close to the actual values. The general algorithm is

1. generate random values for the P and Q matrices
2. using these P and Q matrices estimate the ratings (for ex., Jake's rating of Taylor Swift).
3. compute the error between the actual rating and our estimated rating (for example, Jake actually gave Taylor Swift a '5' but using P and Q we estimated the rating to be 0.45, so our error was 4.55). Let's call this $e_{Jake,Taylor}$
4. using this error adjust P and Q to improve our estimate
5. If our total error rate is small enough or we have gone through a bunch of iterations (for ex., 4000) terminate the algorithm. Else go to step 2.

For how simple this algorithm is, it works surprisingly well. And that is the algorithm we are going to implement.
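To make the five steps concrete, here is a minimal NumPy sketch. It is only one way to arrange things, not the implementation this section builds up to: the names `factorize` and `squared_error` are made up, the values for `alpha` (a small step size) and `beta` (a small penalty on large weights) are borrowed from the function signature later in this section, and step 5 is simplified to a fixed number of passes with no early stopping.

```python
import numpy as np

def factorize(R, k=2, steps=5000, alpha=0.0002, beta=0.02, seed=0):
    """SGD sketch. R is a 2-D float array with np.nan for missing ratings."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = rng.random((n_users, k))  # step 1: random starting values
    Q = rng.random((n_items, k))
    observed = [(u, i) for u in range(n_users) for i in range(n_items)
                if not np.isnan(R[u, i])]
    for _ in range(steps):        # step 5 (simplified): fixed number of passes
        for u, i in observed:
            e = R[u, i] - P[u] @ Q[i]  # steps 2 and 3: estimate, then error
            p_old = P[u].copy()
            # step 4: nudge both weight vectors in the direction that shrinks e
            P[u] += alpha * (2 * e * Q[i] - beta * P[u])
            Q[i] += alpha * (2 * e * p_old - beta * Q[i])
    return P, Q

def squared_error(R, P, Q):
    """Sum of squared errors over the ratings that actually exist."""
    return np.nansum((R - P @ Q.T) ** 2)

nan = np.nan
R = np.array([[5, nan, 5, 2, 2],     # Jake
              [2, nan, nan, 4, 5],   # Clara
              [5, 5, 5, 2, nan],     # Kelsey
              [2, 3, nan, 5, 5],     # Ann
              [2, 1, nan, 5, nan]])  # Jessica

P, Q = factorize(R)
print(squared_error(R, P, Q))
```

Calling `factorize(R, steps=0)` returns the random starting matrices, which is handy for comparing the error before and after training.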

To help with debugging, I am not going to generate new random *P* and *Q* matrices on every run; the code below starts from fixed initial values instead. Once we are sure the algorithm works we can make the switch to random values.

Let me specify step 4 a bit more, and first let me specify my notation.

If I have a matrix, *P*

    0  1
    2  3
    4  5
    
then, with the first subscript giving the row and the second the column,

$$P_{00} = 0$$
$$P_{01} = 1$$
$$P_{10} = 2$$

and so on.

Then my formulas for step 4 are:


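$$P_{pf} = P_{pf} + \alpha (2e_{pq}Q_{qf} - \beta P_{pf})$$
$$Q_{qf} = Q_{qf} + \alpha (2e_{pq}P_{pf} - \beta Q_{qf})$$

where $e_{pq}$ is the error for customer *p* and artist *q*, *f* indexes the latent features, and $\alpha$ is a small constant value. We want it to be small to prevent overshooting the minimum.

$\beta$ is introduced to avoid overfitting; it shrinks the weights slightly at each step so they do not grow too large.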
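Here is a sketch of a single update for Jake and Taylor Swift, starting from the random values used earlier. It follows the sign convention of the Python function below (the $\beta$ term is subtracted) and borrows that function's default `alpha` and `beta`. One simplification to note: this sketch computes both updates from the old values, whereas the function below updates *P* first and then uses the new value when updating *Q*.

```python
import numpy as np

alpha, beta = 0.0002, 0.02      # defaults from the function signature below
jake = np.array([0.03, 0.88])   # random starting weights from the text
taylor = np.array([0.73, 0.49])
actual = 5                      # Jake's real rating of Taylor Swift

before = jake @ taylor
e = actual - before             # e_{Jake,Taylor}

# Compute both updates from the old values, then compare predictions.
new_jake = jake + alpha * (2 * e * taylor - beta * jake)
new_taylor = taylor + alpha * (2 * e * jake - beta * taylor)
after = new_jake @ new_taylor

print(round(before, 4), round(after, 4))
```

With such a small `alpha` the prediction only creeps up a little in one step, which is why the algorithm needs thousands of iterations.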

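When I implemented this with plain Python (not using numpy or pandas) my function looked like the following. Here missing ratings are stored as 0, and the `transpose` and `dot` helpers shown are minimal stand-ins for the ones the function relies on:


    def transpose(M):
        """Return the transpose of a list-of-lists matrix."""
        return [list(col) for col in zip(*M)]

    def dot(a, b):
        """Dot product of two equal-length lists."""
        return sum(x * y for x, y in zip(a, b))

    def matrix_factorization(R, P, Q, K, steps=5000, alpha=0.0002, beta=0.02):
        # R: ratings (0 = not rated), P: users x K, Q: artists x K
        Q = transpose(Q)  # work with Q as K x artists
        for step in range(steps):
            for i in range(len(R)):
                for j in range(len(R[i])):
                    if R[i][j] > 0:  # only learn from ratings that exist
                        eij = R[i][j] - dot(P[i],transpose(Q)[j])
                        for k in range(K):
                            P[i][k] = P[i][k] + alpha * (2 * eij * Q[k][j] - beta * P[i][k])
                            Q[k][j] = Q[k][j] + alpha * (2 * eij * P[i][k] - beta * Q[k][j])
            # total squared error plus the regularization penalty
            e = 0
            for i in range(len(R)):
                for j in range(len(R[i])):
                    if R[i][j] > 0:
                        e = e + pow(R[i][j] - dot(P[i],transpose(Q)[j]), 2)
                        for k in range(K):
                            e = e + (beta/2) * (pow(P[i][k],2) + pow(Q[k][j],2))
            if e < 0.001:  # stop early once the error is small enough
                break
        return P, transpose(Q)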


Again, to help you debug, iterating through the Pandas implementation just once gave me:

**$\hat{R}$**  (the first line of this output also shows that my error at the zeroth iteration was about 235.)

    0 235.491377938
    Artist      Taylor   Miranda    Carrie     Nicki    Ariana
    Customer                                                  
    Jake      0.737250  0.623423  0.371328  0.195428  0.142329
    Clara     0.684926  0.368316  0.505351  0.266068  0.244532
    Kelsey    0.604395  0.614587  0.225689  0.118728  0.061554
    Angel     0.292046  0.291816  0.112974  0.059435  0.032488
    Jordyn    0.497745  0.395119  0.270304  0.142272  0.109821

**P**

              feature_1  feature_2
    Customer                      
    Jake       0.643028   0.383676
    Clara      0.092696   0.821839
    Kelsey     0.774896   0.086086
    Angel      0.362096   0.052973
    Jordyn     0.372433   0.315929
    
**Q**    
    
             feature_1  feature_2
    Artist                       
    Taylor    0.696106   0.754893
    Miranda   0.752767   0.363256
    Carrie    0.225768   0.589438
    Nicki     0.118739   0.310354
    Ariana    0.046968   0.292245


Good luck!

In [32]:
# None becomes NaN in the DataFrame, marking a rating that is missing
r = {'Customer': ['Jake', 'Clara', 'Kelsey', 'Angel', 'Jordyn'], 
     'Taylor': [5,2, 5, 2, 2], 'Miranda': [None,None,5,3,1], 
     'Carrie': [5,None,5,None,None], 'Nicki': [2,4,2,5,5], 'Ariana': [2, 5, None, 5,None]}

R = DataFrame(r, columns = ['Customer', 'Taylor', 'Miranda', 'Carrie', 'Nicki', 'Ariana'])
R = R.set_index('Customer')

p = {'Customer': ['Jake', 'Clara', 'Kelsey', 'Angel', 'Jordyn'], 'feature_1': [0.64132372, 0.092069, 0.77184994, 0.36048553,0.37160684], 'feature_2': [0.3808661, 0.82043744, 0.08276139, 0.05087073, 0.3147877]}
P = DataFrame(p)
P = P.set_index('Customer')


q = {'Artist': ['Taylor', 'Miranda', 'Carrie', 'Nicki', 'Ariana'],
     'feature_1': [0.69314147, 0.75093344, 0.22309714, 0.11611411, 0.04559444], 
     'feature_2': [0.75344936, 0.36297899, 0.58856832, 0.30807273, 0.29029353]}
Q = DataFrame(q)
Q = Q.set_index('Artist')




def matrix_factorization(R, P, Q,  steps=5000, alpha=0.0002, beta=0.02):
    for step in range(steps):
        Rhat = P.dot(Q.T)
        ## your work here
        
        
        
        
        
        ## my code to compute the total squared error (plus the beta penalty); uncomment once you have added the eij computation
        #eij2 = eij.apply(np.square)
        #e = np.nansum(eij2.values)
        #e2 = (beta / 2) * (P.apply(np.square).values.sum() + Q.apply(np.square).values.sum())
        #e += e2
        #if step % 250 ==0:
        #    print (step, e)
    
    return P,Q 

(p, q) = matrix_factorization(R, P, Q)
print(p)
print(q)
print(p.dot(q.T))

          feature_1  feature_2
Customer                      
Jake       0.641324   0.380866
Clara      0.092069   0.820437
Kelsey     0.771850   0.082761
Angel      0.360486   0.050871
Jordyn     0.371607   0.314788
         feature_1  feature_2
Artist                       
Taylor    0.693141   0.753449
Miranda   0.750933   0.362979
Carrie    0.223097   0.588568
Nicki     0.116114   0.308073
Ariana    0.045594   0.290294
Artist      Taylor   Miranda    Carrie     Nicki    Ariana
Customer                                                  
Jake      0.731491  0.619838  0.367243  0.191801  0.139804
Clara     0.681975  0.366939  0.503424  0.263445  0.242366
Kelsey    0.597358  0.609649  0.220908  0.115119  0.059217
Angel     0.288196  0.289166  0.110364  0.057529  0.031204
Jordyn    0.494753  0.393313  0.268178  0.140126  0.108324
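Since the function body is still empty, *P* and *Q* come back unchanged in the output above. As a hint for the error bookkeeping only (not the update step), here is a sketch of how missing ratings behave when you subtract DataFrames. It uses just the Jake and Kelsey rows and the Taylor and Miranda columns from the cell above, to keep the printout small.

```python
import numpy as np
import pandas as pd

customers = ["Jake", "Kelsey"]
R = pd.DataFrame({"Taylor": [5, 5], "Miranda": [np.nan, 5]}, index=customers)
P = pd.DataFrame({"feature_1": [0.64132372, 0.77184994],
                  "feature_2": [0.3808661, 0.08276139]}, index=customers)
Q = pd.DataFrame({"feature_1": [0.69314147, 0.75093344],
                  "feature_2": [0.75344936, 0.36297899]},
                 index=["Taylor", "Miranda"])

Rhat = P.dot(Q.T)
eij = R - Rhat  # NaN wherever R has no rating
print(eij)

# nansum skips the NaN cells, so missing ratings add nothing to the error.
e = np.nansum(np.square(eij.values))
print(e)
print(int(eij.notna().sum().sum()))  # 3 ratings actually contribute
```

Because the NaN cells propagate through `R - Rhat`, the same `eij` table tells you both the size of each error and which cells to skip.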
