# <center>Recommendation Engine by PySpark </center>
<center> Xinyi Bian</center>

# PART I: Intro
## Two types of recommendation engines
1. Content-Based Filtering: Based on features of items
2. Collaborative Filtering: Based on similar user preferences

## Two types of ratings
1. Explicit ratings
2. Implicit ratings

## Latent features
Features that are contained in data, but aren't directly observable

## Matrix factorization
1. ALS
2. SVD
3. SGD

## Rank of Factor Matrices
Number of Latent Features
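
To make the idea of rank concrete, here is a minimal plain-Python sketch (not Spark) of how two factor matrices produce a predicted rating. The factor values are invented for illustration, and `user_factors`, `movie_factors`, and `predict` are hypothetical names.

```python
# Hypothetical factor matrices with rank 2 (two latent features).
# The numbers are invented for illustration, not learned from data.
user_factors = {
    "James Alking": [1.0, 0.5],
    "Jack Bauer": [0.2, 1.5],
}
movie_factors = {
    "Shrek": [2.0, 1.0],
    "Sneakers": [0.5, 3.0],
}

def predict(user, movie):
    """Predicted rating = dot product of the user's and the movie's latent vectors."""
    return sum(u * m for u, m in zip(user_factors[user], movie_factors[movie]))

rank = len(next(iter(user_factors.values())))  # number of latent features
print(rank)
print(predict("James Alking", "Shrek"))
print(predict("Jack Bauer", "Sneakers"))
```

A higher rank gives each user and movie a longer vector, which lets the model capture more patterns at the cost of more parameters to fit.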

# PART II: Data preparation for Spark ALS

In [None]:
# Import monotonically_increasing_id and show R
from pyspark.sql.functions import monotonically_increasing_id
R.show()

    +----------------+-----+----+----------+--------+
    |            User|Shrek|Coco|Swing Kids|Sneakers|
    +----------------+-----+----+----------+--------+
    |    James Alking|    3|   4|         4|       3|
    |Elvira Marroquin|    4|   5|      null|       2|
    |      Jack Bauer| null|   2|         2|       5|
    |     Julia James|    5|null|         2|       2|
    +----------------+-----+----+----------+--------+

In [None]:
# Use the to_long() function to convert the dataframe to the "long" format.
ratings = to_long(R)
ratings.show()

    +----------------+----------+------+
    |            User|     Movie|Rating|
    +----------------+----------+------+
    |    James Alking|     Shrek|     3|
    |    James Alking|      Coco|     4|
    |    James Alking|Swing Kids|     4|
    |    James Alking|  Sneakers|     3|
    |Elvira Marroquin|     Shrek|     4|
    |Elvira Marroquin|      Coco|     5|
    |Elvira Marroquin|  Sneakers|     2|
    |      Jack Bauer|      Coco|     2|
    |      Jack Bauer|Swing Kids|     2|
    |      Jack Bauer|  Sneakers|     5|
    |     Julia James|     Shrek|     5|
    |     Julia James|Swing Kids|     2|
    |     Julia James|  Sneakers|     2|
    +----------------+----------+------+
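
`to_long()` is not defined in this notebook. As a rough illustration of the reshaping it performs, the sketch below does the same wide-to-long conversion in plain Python on the rows shown above. `wide_rows` and `to_long_rows` are hypothetical names, and a Spark version would operate on DataFrames rather than lists.

```python
# Wide format: one row per user, one column per movie (None = no rating).
movie_columns = ["Shrek", "Coco", "Swing Kids", "Sneakers"]
wide_rows = [
    ("James Alking", [3, 4, 4, 3]),
    ("Elvira Marroquin", [4, 5, None, 2]),
    ("Jack Bauer", [None, 2, 2, 5]),
    ("Julia James", [5, None, 2, 2]),
]

def to_long_rows(rows, columns):
    """Emit one (user, movie, rating) tuple per non-null cell."""
    return [
        (user, movie, rating)
        for user, user_ratings in rows
        for movie, rating in zip(columns, user_ratings)
        if rating is not None
    ]

long_rows = to_long_rows(wide_rows, movie_columns)
print(len(long_rows))
for row in long_rows[:4]:
    print(row)
```

Null cells are dropped, which is why the long table has fewer rows than users times movies.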

In [None]:
# Get unique users and repartition to 1 partition
users = ratings.select("User").distinct().coalesce(1)

# Create a new column of unique integers called "userId" in the users dataframe.
users = users.withColumn("userId", monotonically_increasing_id()).persist()
users.show()

    +----------------+------+
    |            User|userId|
    +----------------+------+
    |Elvira Marroquin|     0|
    |      Jack Bauer|     1|
    |    James Alking|     2|
    |     Julia James|     3|
    +----------------+------+
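
The single partition matters because `monotonically_increasing_id()` guarantees unique, increasing IDs but not consecutive ones. Spark's documentation describes the current implementation as placing the partition ID in the upper 31 bits and the row's position within its partition in the lower 33 bits. The plain-Python sketch below mimics that layout to show why several partitions would leave large gaps; `simulated_id` is a hypothetical helper, not a Spark function.

```python
def simulated_id(partition_id, row_in_partition):
    """Mimic the documented layout: partition ID in the high bits, row index in the low 33 bits."""
    return (partition_id << 33) + row_in_partition

# One partition: IDs are consecutive, starting at 0.
single = [simulated_id(0, i) for i in range(4)]

# Two partitions with two rows each: the second partition starts at 2**33.
split = [simulated_id(p, i) for p in range(2) for i in range(2)]

print(single)
print(split)
```

With one partition the IDs come out as 0, 1, 2, 3, matching the `userId` column above.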

In [None]:
# Extract the distinct movies
movies = ratings.select("Movie").distinct() 

# Repartition the data to have only one partition.
movies = movies.coalesce(1) 

# Create a new column of movieId integers. 
movies = movies.withColumn("movieId", monotonically_increasing_id()).persist() 

# Join the ratings, users and movies dataframes
movie_ratings = ratings.join(users, "User", "left").join(movies, "Movie", "left")
movie_ratings.show()

    +----------+----------------+------+------+-------+
    |     Movie|            User|Rating|userId|movieId|
    +----------+----------------+------+------+-------+
    |     Shrek|    James Alking|     3|     2|      3|
    |      Coco|    James Alking|     4|     2|      1|
    |Swing Kids|    James Alking|     4|     2|      2|
    |  Sneakers|    James Alking|     3|     2|      0|
    |     Shrek|Elvira Marroquin|     4|     0|      3|
    |      Coco|Elvira Marroquin|     5|     0|      1|
    |  Sneakers|Elvira Marroquin|     2|     0|      0|
    |      Coco|      Jack Bauer|     2|     1|      1|
    |Swing Kids|      Jack Bauer|     2|     1|      2|
    |  Sneakers|      Jack Bauer|     5|     1|      0|
    |     Shrek|     Julia James|     5|     3|      3|
    |Swing Kids|     Julia James|     2|     3|      2|
    |  Sneakers|     Julia James|     2|     3|      0|
    +----------+----------------+------+------+-------+

# PART III: MovieLens Data

In [None]:
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('Recommendations').getOrCreate()
movies = spark.read.csv("Downloads/ml-latest-small/movies.csv",header=True)
ratings = spark.read.csv("Downloads/ml-latest-small/ratings.csv",header=True)
print(ratings.columns)
ratings.show()

    ['userId', 'movieId', 'rating', 'timestamp']
    +------+-------+------+----------+
    |userId|movieId|rating| timestamp|
    +------+-------+------+----------+
    |     1|     31|   2.5|1260759144|
    |     1|   1029|   3.0|1260759179|
    |     1|   1061|   3.0|1260759182|
    |     1|   1129|   2.0|1260759185|
    |     1|   1172|   4.0|1260759205|
    |     1|   1263|   2.0|1260759151|
    |     1|   1287|   2.0|1260759187|
    |     1|   1293|   2.0|1260759148|
    |     1|   1339|   3.5|1260759125|
    |     1|   1343|   2.0|1260759131|
    |     1|   1371|   2.5|1260759135|
    |     1|   1405|   1.0|1260759203|
    |     1|   1953|   4.0|1260759191|
    |     1|   2105|   4.0|1260759139|
    |     1|   2150|   3.0|1260759194|
    |     1|   2193|   2.0|1260759198|
    |     1|   2294|   2.0|1260759108|
    |     1|   2455|   2.5|1260759113|
    |     1|   2968|   1.0|1260759200|
    |     1|   3671|   3.0|1260759117|
    +------+-------+------+----------+
    only showing top 20 rows

## Calculate sparsity
Let's see how much of the `ratings` matrix is actually empty.

In [None]:
# Count the total number of ratings in the dataset
numerator = ratings.select("rating").count()

# Count the number of distinct userIds and distinct movieIds
num_users = ratings.select("userId").distinct().count()
num_movies = ratings.select("movieId").distinct().count()

# Set the denominator equal to the number of users multiplied by the number of movies
denominator = num_users * num_movies

# Divide the numerator by the denominator
sparsity = (1.0 - (numerator *1.0)/denominator)*100
print("The ratings dataframe is ", "%.2f" % sparsity + "% empty.")

The ratings dataframe is  98.36% empty.
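
The same calculation can be checked by hand on the small matrix from Part II, which has 4 users, 4 movies, and 13 observed ratings. This is a plain-Python sketch; `sparsity_pct` is a hypothetical helper name.

```python
def sparsity_pct(n_ratings, n_users, n_items):
    """Percentage of the user-by-item matrix that has no rating."""
    return (1.0 - n_ratings / (n_users * n_items)) * 100

# Toy matrix from Part II: 4 users, 4 movies, 13 observed ratings.
toy = sparsity_pct(n_ratings=13, n_users=4, n_items=4)
print("%.2f%% empty" % toy)
```

Three of the sixteen cells are null, so the toy matrix is 18.75% empty, far denser than the MovieLens data.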

## Summary metrics
Now that we know a little more about the dataset, let's look at some general summary metrics of the `ratings` dataset: how many ratings each movie has received and how many ratings each user has provided.

In [None]:
# Import the requisite packages
from pyspark.sql.functions import col

# Group data by userId, count ratings
ratings.groupBy("userId").count().show()

    +------+-----+
    |userId|count|
    +------+-----+
    |   296|   20|
    |   467|   64|
    |   125|  210|
    |   451|   52|
    |   666|   40|
    |     7|   88|
    |    51|   31|
    |   124|   85|
    |   447|   87|
    |   591|   30|
    |   307|   72|
    |   475|  655|
    |   574|  342|
    |   613|   53|
    |   169|  113|
    |   205|  206|
    |   334|   34|
    |   544|  268|
    |   577|  279|
    |   581|   49|
    +------+-----+
    only showing top 20 rows

In [None]:
# Min num ratings for movies
print("Movie with the fewest ratings: ")
ratings.groupBy("movieId").count().select("count").agg({"count": "min"}).show()

# Avg num ratings per movie
print("Avg num ratings per movie: ")
ratings.groupBy("movieId").count().select("count").agg({"count": "avg"}).show()

# Min num ratings for user
print("User with the fewest ratings: ")
ratings.groupBy("userId").count().select("count").agg({"count": "min"}).show()

# Avg num ratings per users
print("Avg num ratings per user: ")
ratings.groupBy("userId").count().select("count").agg({"count": "avg"}).show()

    Movie with the fewest ratings: 
    +----------+
    |min(count)|
    +----------+
    |         1|
    +----------+
    
    Avg num ratings per movie: 
    +------------------+
    |        avg(count)|
    +------------------+
    |11.030664019413193|
    +------------------+
    
    User with the fewest ratings: 
    +----------+
    |min(count)|
    +----------+
    |        20|
    +----------+
    
    Avg num ratings per user: 
    +------------------+
    |        avg(count)|
    +------------------+
    |149.03725782414307|
    +------------------+

## Format the schema
Spark's implementation of ALS requires that `movieId`s and `userId`s be provided as integer datatypes. Many datasets need to be prepared accordingly in order for them to function properly with Spark. A common issue is that Spark thinks numbers are strings, and vice versa.

In [None]:
# Use .printSchema() to see the datatypes of the ratings dataset
ratings.printSchema()

    root
     |-- userId: string (nullable = true)
     |-- movieId: string (nullable = true)
     |-- rating: string (nullable = true)
     |-- timestamp: string (nullable = true)

In [None]:
# Tell Spark to convert the columns to the proper data types
ratings = ratings.select(ratings.userId.cast("integer"), ratings.movieId.cast("integer"), ratings.rating.cast("double"))

# Call .printSchema() again to confirm the columns are now in the correct format
ratings.printSchema()

    root
     |-- userId: integer (nullable = true)
     |-- movieId: integer (nullable = true)
     |-- rating: double (nullable = true)

## Create test/train splits and build an ALS model
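ALS parameters and hyperparameters:
- userCol: name of the column containing user IDs
- itemCol: name of the column containing item IDs
- ratingCol: name of the column containing ratings
- rank: number of latent features
- maxIter: maximum number of iterations
- regParam: regularization parameter (lambda)
- alpha: confidence parameter, only used with implicit ratings
- nonnegative=True: constrains the factor matrices to non-negative values
- coldStartStrategy="drop": drops rows with NaN predictions (users or items unseen during training), which would otherwise break evaluation after a test/train split
- implicitPrefs: True for implicit ratings, False for explicit ratings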

In [None]:
# Import the required functions
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.recommendation import ALS
from pyspark.ml.tuning import ParamGridBuilder, CrossValidator

# Create test and train set
(train, test) = ratings.randomSplit([0.80, 0.20], seed = 1234)

# Create ALS model
als = ALS(userCol="userId", itemCol="movieId", ratingCol="rating", coldStartStrategy="drop", nonnegative = True, implicitPrefs = False)

# Confirm that a model called "als" was created
type(als)

pyspark.ml.recommendation.ALS
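
To show what "alternating" means in ALS, here is a minimal plain-Python sketch with rank 1 on the toy ratings from Part II. It is a simplified illustration under stated assumptions, not Spark's implementation: Spark solves small linear systems for rank greater than 1 and, per its documentation, scales the regularization by the number of ratings per user or item. The names `observed`, `objective`, and `solve_side` are hypothetical.

```python
# Observed ratings from the toy matrix in Part II: (user index, movie index, rating).
observed = [
    (0, 0, 3), (0, 1, 4), (0, 2, 4), (0, 3, 3),
    (1, 0, 4), (1, 1, 5), (1, 3, 2),
    (2, 1, 2), (2, 2, 2), (2, 3, 5),
    (3, 0, 5), (3, 2, 2), (3, 3, 2),
]
n_users, n_movies, reg = 4, 4, 0.1

def objective(u, v):
    """Squared error on observed ratings plus an L2 penalty on the factors."""
    err = sum((r - u[i] * v[j]) ** 2 for i, j, r in observed)
    return err + reg * (sum(x * x for x in u) + sum(x * x for x in v))

def solve_side(fixed, side):
    """Closed-form ridge solution for one side while the other side is held fixed."""
    size = n_users if side == 0 else n_movies
    num = [0.0] * size
    den = [reg] * size
    for triple in observed:
        own, other, r = triple[side], triple[1 - side], triple[2]
        num[own] += r * fixed[other]
        den[own] += fixed[other] ** 2
    return [n / d for n, d in zip(num, den)]

u = [1.0] * n_users   # user factors (rank 1: one number per user)
v = [1.0] * n_movies  # movie factors
history = [objective(u, v)]
for _ in range(10):            # plays the role of maxIter
    u = solve_side(v, side=0)  # fix movies, solve for users
    v = solve_side(u, side=1)  # fix users, solve for movies
    history.append(objective(u, v))

print(round(history[0], 3), round(history[-1], 3))
```

Each half-step minimizes the objective exactly for one side, so the recorded objective never increases from one iteration to the next.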

## Tune the ALS model

In [None]:
# Import the requisite items
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.tuning import ParamGridBuilder, CrossValidator

# Add hyperparameters and their respective values to param_grid
param_grid = ParamGridBuilder() \
            .addGrid(als.rank, [10, 50, 100, 150]) \
            .addGrid(als.maxIter, [5, 50, 100, 200]) \
            .addGrid(als.regParam, [.01, .05, .1, .15]) \
            .build()
           
# Define evaluator as RMSE and print the number of parameter combinations in param_grid
evaluator = RegressionEvaluator(metricName="rmse", labelCol="rating", predictionCol="prediction") 
print ("Num models to be tested: ", len(param_grid))

 Num models to be tested:  64

## Build the cross validation pipeline

In [None]:
# Build cross validation using CrossValidator
cv = CrossValidator(estimator=als, estimatorParamMaps=param_grid, evaluator=evaluator, numFolds=5)

# Confirm cv was built
print(cv)

CrossValidator_40e3a38b924ccf0ac669
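
It helps to estimate the cost of this search before running it. The plain-Python sketch below reproduces the grid size and counts the model fits implied by 5-fold cross validation, using the values from the cells above; the variable names are illustrative.

```python
from itertools import product

ranks = [10, 50, 100, 150]
max_iters = [5, 50, 100, 200]
reg_params = [0.01, 0.05, 0.1, 0.15]
num_folds = 5

# Every combination of the three hyperparameters is one candidate model.
grid = list(product(ranks, max_iters, reg_params))

# Each candidate is fit once per fold; the best one is then refit on all of `train`.
cv_fits = len(grid) * num_folds
total_fits = cv_fits + 1

print(len(grid), cv_fits, total_fits)
```

With 64 candidates and 5 folds this comes to 320 fits plus one final refit, so trimming the grid is often worthwhile on larger datasets.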

## Best Model and Best Model Parameters

In [None]:
#Fit cross validator to the 'train' dataset
model = cv.fit(train)

#Extract best model from the cv model above
best_model = model.bestModel

In [None]:
# Print the type of best_model
print(type(best_model))

# Extract the ALS model parameters
print("**Best Model**")

# Rank is exposed directly on the fitted model
print("  Rank:", best_model.rank)

# MaxIter and RegParam live on the parent ALS estimator (reached via the underlying Java object)
print("  MaxIter:", best_model._java_obj.parent().getMaxIter())

# Print "RegParam"
print("  RegParam:", best_model._java_obj.parent().getRegParam())

    <class 'pyspark.ml.recommendation.ALS'>
    **Best Model**
      Rank: 50
      MaxIter: 100
      RegParam: 0.1

## Model Performance Evaluation

Now that we have a model that is trained on our data and tuned through cross validation, we can see how it performs on the `test` dataframe. To do this, we'll calculate the RMSE.

In [None]:
test_predictions = best_model.transform(test)

# View the predictions 
test_predictions.show()

# Calculate and print the RMSE of test_predictions
RMSE = evaluator.evaluate(test_predictions)
print(RMSE)

    +------+-------+------+------------------+
    |userId|movieId|rating|        prediction|
    +------+-------+------+------------------+
    |   380|    463|   3.0| 4.093334993256898|
    |   460|    471|   5.0| 4.789751482535894|
    |   440|    471|   3.0| 2.440344619907418|
    |   306|    471|   3.0|3.3247629567900976|
    |    19|    471|   3.0| 3.067333162723295|
    |   299|    471|   4.5| 5.218491499885204|
    |   537|    471|   5.0|  5.69083471617962|
    |   241|    471|   4.0| 3.816546176254299|
    |    23|    471|   3.5| 2.539020466532909|
    |   195|    471|   3.0| 3.355342979133588|
    |   487|    471|   4.0| 3.105186392315445|
    |   242|    471|   5.0| 5.893115933597325|
    |    30|    471|   4.0| 4.017221049024606|
    |   516|   1088|   3.0|3.3911144131643005|
    |   111|   1088|   3.5| 4.504826156475481|
    |    57|   1088|   4.0| 3.024549429857915|
    |    54|   1088|   5.0| 5.235519746597422|
    |    19|   1088|   3.0|  3.70884171609874|
    |   387|   1088|   4.0| 4.070842875474657|
    |   514|   1088|   3.0| 2.313176685047038|
    +------+-------+------+------------------+
    only showing top 20 rows
    
    0.6332304339145925

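An RMSE of 0.633 means that the model's predictions are typically off by roughly 0.63 rating points, above or below the actual ratings in the `test` set. Because RMSE squares the errors before averaging, large misses weigh more heavily than small ones.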
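
As a rough check on how RMSE is computed, the plain-Python sketch below applies the formula to the first five rows of the prediction output above and compares it with the mean absolute error (MAE). Five rows are far too few to reproduce the full test-set figure; the point is only to show the calculation.

```python
from math import sqrt

# (rating, prediction) pairs copied from the first five rows of the output above.
pairs = [
    (3.0, 4.093334993256898),
    (5.0, 4.789751482535894),
    (3.0, 2.440344619907418),
    (3.0, 3.3247629567900976),
    (3.0, 3.067333162723295),
]

errors = [pred - actual for actual, pred in pairs]
rmse = sqrt(sum(e * e for e in errors) / len(errors))
mae = sum(abs(e) for e in errors) / len(errors)

# RMSE is never smaller than MAE; the gap grows when a few errors are large.
print("RMSE: %.3f  MAE: %.3f" % (rmse, mae))
```

On these rows the single miss of more than one rating point pulls the RMSE noticeably above the MAE.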

## Generate Recommendations

In [None]:
# Generate top n recommendations for all users
ALS_recommendations = best_model.recommendForAllUsers(n)  # n is an integer

## Clean up Recommendation output

In [None]:
ALS_recommendations.show()

    +------+---------------------+
    |userId|      recommendations|
    +------+---------------------+
    |   360|[[65037, 4.491346]...|
    |   246|[[3414, 4.8967672]...|
    |   346|[[4565, 4.9247236]...|
    |   476|[[83318,4.9556283]...|
    |   367|[[4632, 4.7018986]...|
    |   539|[[1172, 5.2528191]...|
    |   599|[[6413, 4.7284415]...|
    |   220|[[80,   4.4857406]...|
    +------+---------------------+

In [None]:
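# Register the dataframe as a temporary view so it can be queried with SQL
# (createOrReplaceTempView replaces the deprecated registerTempTable)
ALS_recommendations.createOrReplaceTempView("ALS_recs_temp")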

In [None]:
# The `explode` command separates the items within the `recommendations` column
exploded_recs = spark.sql("""SELECT userId,
                          explode(recommendations) AS MovieRec
                          FROM ALS_recs_temp""")
exploded_recs.show()

    +------+---------------------------------------+
    |userId|                               MovieRec|
    +------+---------------------------------------+
    |   360|{"movieId": 65037, "rating": 4.4913464}|
    |   360|{"movieId": 59684, "rating": 4.4832921}|
    |   360|{"movieId": 31435, "rating": 4.4822811}|
    |   360|{"movieId": 593, "rating": 4.456215}   |
    |   360|{"movieId": 67504, "rating": 4.4028492}|
    |   360|{"movieId": 83411, "rating": 4.3391834}|
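
For readers new to `explode`, the plain-Python sketch below shows the same flattening on the first three recommendations for user 360 from the output above. The function here is a hypothetical stand-in written for illustration, not Spark's `explode`.

```python
# One row per user, with a nested list of (movieId, rating) recommendations.
# Values copied from the first rows of the exploded output above.
nested = [
    (360, [(65037, 4.4913464), (59684, 4.4832921), (31435, 4.4822811)]),
]

def explode(rows):
    """Yield one flat (userId, movieId, prediction) row per nested recommendation."""
    for user_id, recs in rows:
        for movie_id, prediction in recs:
            yield (user_id, movie_id, prediction)

flat = list(explode(nested))
for row in flat:
    print(row)
```

One input row with three nested recommendations becomes three output rows, each repeating the `userId`.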

In [None]:
# Perform a SQL query to make the output readable.
# LATERAL VIEW explode exposes each exploded element under the alias `movieIds_and_ratings`.
clean_recs = spark.sql("""SELECT userId,
                       movieIds_and_ratings.movieId AS movieId,
                       movieIds_and_ratings.rating AS prediction
                       FROM ALS_recs_temp
                       LATERAL VIEW explode(recommendations) exploded_table AS movieIds_and_ratings""")

clean_recs.show()

    +------+-------+----------+
    |userId|movieId|prediction|
    +------+-------+----------+
    |   360|  65037|  4.491346|
    |   360|  59684|  4.491346|
    |   360|  34135|  4.491346|
    |   360|    593|  4.453185|
    |   360|  67504|  4.389951|
    |   360|  83411|  4.389944|

In [None]:
# Join the movies table to get the movies' information
clean_recs.join(movie_info, ["movieId"], "left").show()

    +------+-------+----------+--------------------+
    |userId|movieId|prediction|               title|
    +------+-------+----------+--------------------+
    |   360|  65037|  4.491346|        Ben X (2007)|
    |   360|  59684|  4.491346| Lake of Fire (2006)|
    |   360|  34135|  4.491346|Rory O Shea Was H...|

The ALS output includes all movies for all users, whether they've seen them or not.

In [None]:
# The `rating` column contains the user's original rating; a null rating means the user hasn't seen the movie yet
clean_recs.join(movie_ratings, ["userId", "movieId"], "left").show()

    +------+-------+----------+------+
    |userId|movieId|prediction|rating|
    +------+-------+----------+------+
    |   173|    318|  4.947126|  null|
    |   150|    318|  4.066513|   5.0|
    |   369|    318|  4.514297|   5.0|

In [None]:
# By filtering the `rating` column, the output contains only the predictions for movies users haven't seen yet
clean_recs.join(movie_ratings, ["userId", "movieId"], "left").filter(movie_ratings.rating.isNull()).show()

    +------+-------+----------+------+
    |userId|movieId|prediction|rating|
    +------+-------+----------+------+
    |   173|    318|  4.947126|  null|
    |    27|    318|  4.523860|  null|
    |   515|    318|  5.165822|  null|
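
The left-join-then-filter pattern can be illustrated in plain Python with the three rows shown in the earlier join output. This is a sketch with hypothetical names (`predictions`, `known_ratings`), not the Spark code itself.

```python
# Predictions as (userId, movieId) -> prediction, copied from the outputs above.
predictions = {
    (173, 318): 4.947126,
    (150, 318): 4.066513,
    (369, 318): 4.514297,
}
# Ratings these users have already given.
known_ratings = {(150, 318): 5.0, (369, 318): 5.0}

# Left join: keep every prediction, attach the rating if one exists (else None).
joined = [
    (user, movie, pred, known_ratings.get((user, movie)))
    for (user, movie), pred in predictions.items()
]

# Keep only rows where the rating is missing, i.e. movies the user has not rated.
unseen = [row for row in joined if row[3] is None]
print(unseen)
```

Only user 173 survives the filter, since users 150 and 369 have already rated movie 318.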

## Do Recommendations Make Sense?

In [None]:
original_ratings.show()


    +------+-------+------+--------------------+--------------------+  
    |userId|movieId|rating|               title|              genres|  
    +------+-------+------+--------------------+--------------------+  
    |    26|      1|     5|      ToyStory(1995)|Adventure|Animati...|
    |    26|   2542|     5|LockStock&TwoSmok...|Comedy|Crime|Thri...|
    +------+-------+------+--------------------+--------------------+

In [None]:
# Look at user 60's ratings
print("User 60's Ratings:")
original_ratings.filter(col("userId") == 60).sort("rating", ascending=False).show()

# Look at the movies recommended to user 60
print("User 60's Recommendations:")
recommendations.filter(col("userId") == 60).show()

# Look at user 63's ratings
print("User 63's Ratings:")
original_ratings.filter(col("userId") == 63).sort("rating", ascending=False).show()

# Look at the movies recommended to user 63
print("User 63's Recommendations:")
recommendations.filter(col("userId") == 63).show()

    User 60's Ratings:
    +------+-------+------+--------------------+--------------------+
    |userId|movieId|rating|               title|              genres|
    +------+-------+------+--------------------+--------------------+
    |    60|    858|     5|  GodfatherThe(1972)|         Crime|Drama|
    |    60|    235|     5|        EdWood(1994)|        Comedy|Drama|
    |    60|   1732|     5|BigLebowskiThe(1998)|        Comedy|Crime|
    |    60|   2324|     5|LifeIsBeautiful(L...|Comedy|Drama|Roma...|
    |    60|   3949|     5|RequiemforaDream(...|               Drama|
    |    60|    541|     5|   BladeRunner(1982)|Action|Sci-Fi|Thr...|
    |    60|   5995|     5|    PianistThe(2002)|           Drama|War|
    |    60|   6350|     5|Laputa:Castleinth...|Action|Adventure|...|
    |    60|   7361|     5|EternalSunshineof...|Drama|Romance|Sci-Fi|
    |    60|   8638|     5|  BeforeSunset(2004)|       Drama|Romance|
    |    60|   8981|     5|        Closer(2004)|       Drama|Romance|
    |    60|  27803|     5|SeaInsideThe(Mara...|               Drama|
    |    60|  30749|     5|   HotelRwanda(2004)|           Drama|War|
    +------+-------+------+--------------------+--------------------+
    only showing top 20 rows
    
    User 60's Recommendations:
    +------+-------+----------+--------------------+--------------------+
    |userId|movieId|prediction|               title|              genres|
    +------+-------+----------+--------------------+--------------------+
    |    60|  83318|  5.810963|       GoatThe(1921)|              Comedy|
    |    60|  83411|  5.810963|          Cops(1922)|              Comedy|
    |    60|  73344|  5.315315|ProphetA(UnProphè...|         Crime|Drama|
    |    60|   3309| 5.2298656|    Dog'sLifeA(1918)|              Comedy|
    |    60|   8609| 5.2298656|OurHospitality(1923)|              Comedy|
    |    60|  72647| 5.2298656|   Zorn'sLemma(1970)|               Drama|
    |    60|   5059| 5.2298656|LittleDieterNeeds...|         Documentary|
    |    60|   8797| 5.2298656|      Salesman(1969)|         Documentary|
    |    60|  25764| 5.2298656|  CameramanThe(1928)|Comedy|Drama|Romance|
    |    60|   7074| 5.2298656|  NavigatorThe(1924)|              Comedy|
    |    60|  31547| 5.2298656|LessonsofDarkness...|     Documentary|War|
    |    60|   4405| 5.2298656|LastLaughThe(Letz...|               Drama|
    |    60|  26400| 5.2298656| GatesofHeaven(1978)|         Documentary|
    |    60|  80599| 5.2298656|BusterKeaton:AHar...|         Documentary|
    |    60|  92494| 5.1418443|DylanMoran:Monste...|  Comedy|Documentary|
    |    60|   3216| 5.1418443|VampyrosLesbos(Va...|Fantasy|Horror|Th...|
    |    60|   6918| 5.1184077|UnvanquishedThe(A...|               Drama|
    |    60|  40412| 5.0673676|DeadMan'sShoes(2004)|      Crime|Thriller|
    |    60|  52767|  5.043912|          21Up(1977)|         Documentary|
    |    60|   8955| 5.0317564|      Undertow(2004)|Crime|Drama|Thriller|
    +------+-------+----------+--------------------+--------------------+
    
    User 63's Ratings:
    +------+-------+------+--------------------+--------------------+
    |userId|movieId|rating|               title|              genres|
    +------+-------+------+--------------------+--------------------+
    |    63|      1|     5|      ToyStory(1995)|Adventure|Animati...|
    |    63|     16|     5|        Casino(1995)|         Crime|Drama|
    |    63|    260|     5|StarWars:EpisodeI...|Action|Adventure|...|
    |    63|    318|     5|ShawshankRedempti...|         Crime|Drama|
    |    63|    592|     5|        Batman(1989)|Action|Crime|Thri...|
    |    63|   1193|     5|OneFlewOvertheCuc...|               Drama|
    |    63|   1198|     5|RaidersoftheLostA...|    Action|Adventure|
    |    63|   1214|     5|         Alien(1979)|       Horror|Sci-Fi|
    |    63|   1221|     5|Godfather:PartIIT...|         Crime|Drama|
    |    63|   1259|     5|     StandbyMe(1986)|     Adventure|Drama|
    |    63|   1356|     5|StarTrek:FirstCon...|Action|Adventure|...|
    |    63|   1639|     5|    ChasingAmy(1997)|Comedy|Drama|Romance|
    |    63|   2797|     5|           Big(1988)|Comedy|Drama|Fant...|
    |    63|   2858|     5|AmericanBeauty(1999)|       Drama|Romance|
    |    63|   2918|     5|FerrisBueller'sDa...|              Comedy|
    |    63|   3114|     5|     ToyStory2(1999)|Adventure|Animati...|
    |    63|   3176|     5|TalentedMr.Ripley...|Drama|Mystery|Thr...|
    |    63|   3481|     5|  HighFidelity(2000)|Comedy|Drama|Romance|
    |    63|   3578|     5|     Gladiator(2000)|Action|Adventure|...|
    |    63|   4306|     5|         Shrek(2001)|Adventure|Animati...|
    +------+-------+------+--------------------+--------------------+
    only showing top 20 rows
    
    User 63's Recommendations:
    +------+-------+----------+--------------------+--------------------+
    |userId|movieId|prediction|               title|              genres|
    +------+-------+----------+--------------------+--------------------+
    |    63|  92210| 4.8674645|DisappearanceofHa...|Adventure|Animati...|
    |    63| 110873| 4.8674645|CentenarianWhoCli...|Adventure|Comedy|...|
    |    63|   9010| 4.8588977|LoveMeIfYouDare(J...|       Drama|Romance|
    |    63| 108583|  4.836118|FawltyTowers(1975...|              Comedy|
    |    63|   8530| 4.8189244|   DearFrankie(2004)|       Drama|Romance|
    |    63|  83318|  4.813581|       GoatThe(1921)|              Comedy|
    |    63|  83411|  4.813581|          Cops(1922)|              Comedy|
    |    63|  65037| 4.7906556|          BenX(2007)|               Drama|
    |    63|  54328|  4.688013|MyBestFriend(Monm...|              Comedy|
    |    63|   3437|  4.678849|     CoolasIce(1991)|               Drama|
    |    63|   2924|  4.675808|DrunkenMaster(Jui...|       Action|Comedy|
    |    63|   1196| 4.6633716|StarWars:EpisodeV...|Action|Adventure|...|
    |    63|  27156| 4.6382804|NeonGenesisEvange...|Action|Animation|...|
    |    63|  26865| 4.6308517|FistofLegend(Jing...|        Action|Drama|
    |    63|   5244| 4.6302986|ShogunAssassin(1980)|    Action|Adventure|
    |    63|  93320| 4.6302986|TrailerParkBoys(1...|        Comedy|Crime|
    |    63|  50641| 4.6302986|  House(Hausu)(1977)|Comedy|Fantasy|Ho...|
    |    63|   6598|  4.624031|StepIntoLiquid(2002)|         Documentary|
    |    63|   7502| 4.6232696|BandofBrothers(2001)|    Action|Drama|War|
    |    63|  73344|  4.609774|ProphetA(UnProphè...|         Crime|Drama|
    +------+-------+----------+--------------------+--------------------+
    

It looks like the model picked up on user 60's preference for drama, crime, and comedy, and on user 63's preference for action, adventure, and drama.