
### Alternating Least Squares (ALS) for Recommendation Systems

Alternating Least Squares (ALS) is a matrix-factorization algorithm for collaborative filtering. It learns a latent-factor vector for each user and each item from historical ratings, and predicts a rating as the dot product of the two vectors. Users with similar rating histories end up with similar factors, so each user's recommendations reflect both their own ratings and those of similar users. Interpreting the output means reading the predicted ratings as preference scores that rank the content shown to each user.
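
To make the factorization concrete, here is a sketch of the objective that plain ALS minimizes. The notation is an assumption introduced for illustration and does not come from the notebook: $r_{ui}$ is the rating user $u$ gave item $i$, $\Omega$ is the set of observed (user, item) pairs, $\mathbf{u}_u$ and $\mathbf{v}_i$ are the latent vectors, and $\lambda$ plays the role of `regParam`.

$$
\min_{U,V}\ \sum_{(u,i)\in\Omega}\left(r_{ui}-\mathbf{u}_u^{\top}\mathbf{v}_i\right)^2+\lambda\left(\sum_u\lVert\mathbf{u}_u\rVert^2+\sum_i\lVert\mathbf{v}_i\rVert^2\right)
$$

With the item vectors held fixed, the problem splits into one ridge regression per user, where $\Omega_u$ denotes the items rated by user $u$:

$$
\mathbf{u}_u=\left(\sum_{i\in\Omega_u}\mathbf{v}_i\mathbf{v}_i^{\top}+\lambda I\right)^{-1}\sum_{i\in\Omega_u}r_{ui}\,\mathbf{v}_i
$$

The item update is symmetric, and the algorithm alternates between the two. Spark's documentation describes a variant that additionally scales $\lambda$ by the number of ratings per user or item, so the formulas above should be read as the textbook form rather than Spark's exact update.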

In [None]:
from pyspark.sql import SparkSession
from pyspark.sql.functions import col
from pyspark.ml.recommendation import ALS
from pyspark.ml.evaluation import RegressionEvaluator

In [None]:
# Initialize Spark session
spark = SparkSession.builder \
    .appName("ALS Recommendation System") \
    .getOrCreate()

In [None]:
# Read the users, movies, and ratings CSV files into DataFrames
users_spark = spark.read.csv("/FileStore/tables/users.csv", header=True, inferSchema=True)
movies_spark = spark.read.csv("/FileStore/tables/movies.csv", header=True, inferSchema=True)
ratings_spark = spark.read.csv("/FileStore/tables/ratings.csv", header=True, inferSchema=True)

In [None]:
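# ALS requires numeric user and item ids; StringIndexer maps arbitrary ids to numeric indices.
# Note: ratings.csv already stores UserId and MovieId as integers, so the ALS model below
# reads those columns directly and does not use UserIndex or MovieIndex.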
from pyspark.ml.feature import StringIndexer

user_indexer = StringIndexer(inputCol="UserId", outputCol="UserIndex", handleInvalid="skip")
movie_indexer = StringIndexer(inputCol="MovieId", outputCol="MovieIndex", handleInvalid="skip")

users_spark = user_indexer.fit(users_spark).transform(users_spark)
movies_spark = movie_indexer.fit(movies_spark).transform(movies_spark)
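
As a rough illustration of what the indexing step does, the sketch below reimplements the idea in plain Python. It assumes the default ordering documented for `StringIndexer` (most frequent label first, ties broken alphabetically) and imitates `handleInvalid="skip"` by dropping unseen labels. The function names and the sample ids are hypothetical and are not part of the notebook or of PySpark.

```python
from collections import Counter


def fit_string_index(values):
    """Map each distinct label to a float index, most frequent label first.

    Ties are broken alphabetically. Illustrative only; not PySpark code.
    """
    counts = Counter(str(v) for v in values)
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: float(i) for i, label in enumerate(ordered)}


def transform_skip_invalid(values, mapping):
    """Index the values, dropping labels that were not seen when fitting."""
    return [mapping[str(v)] for v in values if str(v) in mapping]


# Hypothetical user ids, not taken from users.csv
user_ids = ["u7", "u3", "u7", "u9", "u3", "u7"]
mapping = fit_string_index(user_ids)

# "u_new" was not seen during fitting, so it is skipped
indexed = transform_skip_invalid(user_ids + ["u_new"], mapping)

print(mapping)
print(indexed)
```

An index is only useful to ALS if it is attached to the rows being trained on, which is why the indexer would normally be fitted and applied on the ratings DataFrame when the ids are strings.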

In [None]:
# Split data into training and test sets
(training, test) = ratings_spark.randomSplit([0.8, 0.2], seed=42)

In [None]:
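# Build the recommendation model using ALS on the training data.
# coldStartStrategy="drop" removes predictions for users or movies that were
# absent from the training split, so the RMSE below is not NaN.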
als = ALS(maxIter=10, regParam=0.01, userCol="UserId", itemCol="MovieId", ratingCol="Rating",
          coldStartStrategy="drop")
model = als.fit(training)
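
To show what `als.fit` does conceptually, here is a minimal NumPy sketch of the alternating updates. It is not Spark's implementation: it runs on a single machine, uses unscaled L2 regularization, and the function name `als_sketch`, its parameters, and the toy ratings are all assumptions made for illustration.

```python
import numpy as np


def als_sketch(triples, n_users, n_items, rank=2, reg=0.1, n_iter=10, seed=0):
    """Factorize (user, item, rating) triples by alternating ridge regressions."""
    rng = np.random.default_rng(seed)
    U = rng.normal(scale=0.1, size=(n_users, rank))  # user factors
    V = rng.normal(scale=0.1, size=(n_items, rank))  # item factors

    # Group the observed ratings by user and by item
    by_user = {u: [] for u in range(n_users)}
    by_item = {i: [] for i in range(n_items)}
    for u, i, r in triples:
        by_user[u].append((i, r))
        by_item[i].append((u, r))

    def solve_row(fixed, observed):
        # Ridge regression for one row: (F^T F + reg * I) x = F^T r
        idx = [j for j, _ in observed]
        r = np.array([value for _, value in observed], dtype=float)
        F = fixed[idx]
        return np.linalg.solve(F.T @ F + reg * np.eye(rank), F.T @ r)

    for _ in range(n_iter):
        # Hold item factors fixed and update every user that has ratings
        for u, observed in by_user.items():
            if observed:
                U[u] = solve_row(V, observed)
        # Hold user factors fixed and update every item that has ratings
        for i, observed in by_item.items():
            if observed:
                V[i] = solve_row(U, observed)
    return U, V


# Hypothetical toy ratings: (user index, item index, rating)
toy = [(0, 0, 5), (0, 1, 4), (1, 0, 4), (1, 2, 1), (2, 1, 5), (2, 2, 2)]
U, V = als_sketch(toy, n_users=3, n_items=3)

# Training error of the fitted factors
errors = [r - U[u] @ V[i] for u, i, r in toy]
train_rmse = float(np.sqrt(np.mean(np.square(errors))))

# Top-2 items for user 0, ranked by predicted score
scores = V @ U[0]
top2 = np.argsort(-scores)[:2]

print(train_rmse)
print(top2)
```

The ranking step at the end is the same idea as asking the fitted model for each user's top items: score every item with a dot product and keep the highest scores.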

In [None]:
# Evaluate the model by computing the RMSE on the test data
predictions = model.transform(test)
evaluator = RegressionEvaluator(metricName="rmse", labelCol="Rating", predictionCol="prediction")
rmse = evaluator.evaluate(predictions)
print(f"Root Mean Squared Error (RMSE) = {rmse}")
predictions.show()

Root Mean Squared Error (RMSE) = 2.2554362946483786
+------+-------+------+-----------+
|UserId|MovieId|Rating| prediction|
+------+-------+------+-----------+
|    85|     10|     4|  0.0783415|
|    85|     17|     1|-0.29120976|
|    85|     20|     2|  4.0072727|
|    85|     34|     3|  3.1405997|
|    65|     23|     4|  1.1245769|
|    65|     41|     1|  2.2354593|
|    65|     49|     1| 0.47916606|
|    53|     42|     3|  2.1885614|
|    78|      6|     3| 0.56656164|
|    34|     12|     5|   1.036366|
|    28|     37|     3| 0.94793254|
|    76|     33|     2|0.052800775|
|    26|      7|     4|  1.4488953|
|    26|     19|     5|  1.0592656|
|    26|     37|     5|   1.029136|
|    44|      6|     4|   2.867073|
|    44|     49|     1|  4.9170256|
|    12|     10|     4|  1.9511747|
|    12|     31|     2|  1.7514855|
|    12|     36|     3|  1.1424004|
+------+-------+------+-----------+
only showing top 20 rows
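
Several predictions in the output above fall outside the range of the observed ratings (for example, a negative prediction). The sketch below recomputes RMSE by hand on the first five displayed rows and shows the effect of clipping predictions to a rating scale. The 1 to 5 scale is an assumption inferred from the ratings shown, and the helper names `rmse` and `clip` are hypothetical.

```python
import math


def rmse(labels, predictions):
    """Root mean squared error between two equal-length sequences."""
    squared = [(y - p) ** 2 for y, p in zip(labels, predictions)]
    return math.sqrt(sum(squared) / len(squared))


def clip(value, low=1.0, high=5.0):
    """Clamp a prediction to the assumed rating scale [low, high]."""
    return max(low, min(high, value))


# First five rows of the predictions table above
labels = [4, 1, 2, 3, 4]
preds = [0.0783415, -0.29120976, 4.0072727, 3.1405997, 1.1245769]

raw_rmse = rmse(labels, preds)
clipped_rmse = rmse(labels, [clip(p) for p in preds])

print(raw_rmse)
print(clipped_rmse)
```

When every true rating lies inside the scale, clipping cannot increase the error of any single prediction, so the clipped RMSE is never worse than the raw one. It does not address the underlying fit, which may also benefit from tuning `rank`, `regParam`, and `maxIter`.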



In [None]:
# Generate top 10 movie recommendations for each user
userRecs = model.recommendForAllUsers(10)
userRecs.show(truncate=False)

+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|UserId|recommendations                                                                                                                                                           |
+------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|20    |[{18, 7.1888847}, {10, 6.4883485}, {40, 6.3231626}, {11, 6.1776}, {14, 5.650364}, {35, 5.391718}, {42, 5.2807508}, {38, 5.171051}, {36, 5.051551}, {3, 4.9758244}]        |
|40    |[{27, 5.1210904}, {25, 4.7762384}, {23, 3.987914}, {45, 3.9577198}, {37, 2.9865966}, {33, 2.8086948}, {13, 2.7648032}, {10, 2.647931}, {17, 2.547537}, {5, 2.1752563}]    |
|100   |[{8, 4.977813}, {2, 4.790382}, {44, 3.9966474}, {46, 3.9600406}, {19, 3.9260445}, {38, 3.790