# Recommend Products using SparkML

This notebook is written in **Python** so the default cell type is Python. However, you can use different languages by using the `%LANGUAGE` syntax. Python, Scala, SQL, and R are all supported.

To run this notebook, register for a free Databricks
[Community Edition account](https://community.cloud.databricks.com/).

## Import modules

In [3]:
from pyspark.sql import SparkSession
from pyspark import SparkContext
import pyspark.sql.functions as F

from pyspark.mllib.recommendation import ALS, MatrixFactorizationModel, Rating

## Design Schema

In [5]:
rate_schema = "`userid` string, `accoid` string, `rating` INT"
accos_schema = "`id` string, `title` string, `location` string, `price` int, `rooms` int, `rating` float"

## Read csv files to Spark DataFrame

In [7]:
rating_file_location = '/FileStore/tables/rating-1.csv'
dfRates = spark.read.csv(rating_file_location, rate_schema)

print(dfRates.count())
dfRates.show(5)

In [8]:
print(dfRates.count())

In [9]:
accos_file_location = "/FileStore/tables/accommodation-1.csv"
dfAccos = spark.read.csv(accos_file_location, accos_schema)

print(dfAccos.count())
dfAccos.show(5)

## Aggregations

In [11]:
dfRates.show(3)

In [12]:
df_agg = dfRates.agg(F.count('userid').alias('num_ratings'),
            F.countDistinct('userid').alias('distinct_users_rating'),
            F.max('rating').alias('best_rating'),
            F.min('rating').alias('worst_rating'),
            F.avg('rating').alias('avg_rating')
           )
df_agg.show()
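
To make the columns of `df_agg` easier to interpret, here is a minimal plain-Python sketch that computes the same five statistics on a few made-up ratings. The rows are hypothetical and are not taken from `rating-1.csv`; the sketch only mirrors what each aggregate means and does not use Spark.

```python
# Hypothetical (userid, accoid, rating) rows; not taken from rating-1.csv.
rows = [
    ("1", "100", 4),
    ("1", "101", 2),
    ("2", "100", 5),
    ("3", "102", 3),
]

ratings = [rating for _, _, rating in rows]

summary = {
    # F.count('userid'): number of rows with a non-null userid
    "num_ratings": sum(1 for user, _, _ in rows if user is not None),
    # F.countDistinct('userid'): number of different users who rated something
    "distinct_users_rating": len({user for user, _, _ in rows}),
    # F.max('rating') and F.min('rating')
    "best_rating": max(ratings),
    "worst_rating": min(ratings),
    # F.avg('rating')
    "avg_rating": sum(ratings) / len(ratings),
}
print(summary)
```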

## Left Join

In [14]:
df_leftjoined = dfAccos.join(dfRates, dfAccos.id == dfRates.accoid, how = 'left')
print(df_leftjoined.count())
df_leftjoined.show(3)

In [15]:
df_leftjoined.select('*').where(df_leftjoined.id == '100').show()

## Right Join

In [17]:
df_rightjoined = dfAccos.join(dfRates, dfAccos.id == dfRates.accoid, how = 'right')
print(df_rightjoined.count())
df_rightjoined.show(3)

In [18]:
df_rightjoined.select('*').where(df_rightjoined.id == '100').show()

In [19]:
df_rightjoined.select('*').where(df_rightjoined.accoid == '101').show()

## Inner Join

In [21]:
df_innerjoined = dfAccos.join(dfRates, dfAccos.id == dfRates.accoid, how = 'inner')
print(df_innerjoined.count())
df_innerjoined.show(3)

In [22]:
df_innerjoined.select('*').where(df_innerjoined.id == '100').show()

In [23]:
df_innerjoined.select('*').where(df_innerjoined.accoid == '101').show()
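
The three joins above differ only in which unmatched rows they keep. The following plain-Python sketch illustrates that difference on two tiny, made-up tables (the ids, titles, and ratings are hypothetical). It imitates the join semantics by hand and is not how Spark executes a join.

```python
# Hypothetical miniature tables; ids, titles, and ratings are made up.
accos = {"100": "Beach house", "101": "City loft", "102": "Mountain cabin"}
rates = [("1", "100", 4), ("2", "100", 5), ("3", "101", 3), ("4", "999", 1)]


def join(how):
    """Return (id, title, userid, accoid, rating) tuples for the given join type."""
    out = []
    rated_ids = set()
    for userid, accoid, rating in rates:
        if accoid in accos:
            # Matching rows appear in every join type.
            out.append((accoid, accos[accoid], userid, accoid, rating))
            rated_ids.add(accoid)
        elif how == "right":
            # Rating without a matching accommodation: left-side columns are null.
            out.append((None, None, userid, accoid, rating))
    if how == "left":
        # Accommodation without any rating: right-side columns are null.
        for acco_id, title in accos.items():
            if acco_id not in rated_ids:
                out.append((acco_id, title, None, None, None))
    return out


for how in ("left", "right", "inner"):
    print(how, len(join(how)))
```

In this toy data the left join keeps the unrated accommodation `102`, the right join keeps the rating for the unknown accommodation `999`, and the inner join drops both.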

## Train the model and recommend products with the model

### Train the model

In [26]:
# rank = number of latent factors; iterations = number of ALS passes
model = ALS.train(dfRates.rdd, rank=20, iterations=20)
print(type(model))
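
For intuition about what alternating least squares does, here is a toy rank-1 sketch in plain Python. It is a simplified illustration under stated assumptions (made-up ratings, a single latent factor, an assumed regularization value `lam`) and does not reproduce Spark's implementation. Each pass solves for the user factors with the item factors held fixed, then the reverse.

```python
# Hypothetical (user, item, rating) triples; not taken from rating-1.csv.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 2.0), (2, 2, 1.0)]
lam = 0.1  # regularization strength (an assumed value)

users = {u: 1.0 for u, _, _ in ratings}  # one latent factor per user (rank 1)
items = {i: 1.0 for _, i, _ in ratings}  # one latent factor per item


def loss():
    """Regularized squared error over the observed ratings."""
    err = sum((r - users[u] * items[i]) ** 2 for u, i, r in ratings)
    reg = lam * (sum(x * x for x in users.values()) + sum(y * y for y in items.values()))
    return err + reg


def solve(targets, fixed, pick):
    """Closed-form least-squares update of one side while the other stays fixed."""
    for t in targets:
        num = den = 0.0
        for triple in ratings:
            mine, other, r = pick(triple)
            if mine == t:
                num += r * fixed[other]
                den += fixed[other] ** 2
        targets[t] = num / (den + lam)


initial_loss = loss()
for _ in range(20):  # 20 alternations, mirroring the iteration count used above
    solve(users, items, lambda t: (t[0], t[1], t[2]))  # update users, items fixed
    solve(items, users, lambda t: (t[1], t[0], t[2]))  # update items, users fixed
final_loss = loss()

print(initial_loss, final_loss)
```

A predicted rating in this sketch is simply `users[u] * items[i]`; with a higher rank, each factor becomes a vector and the product becomes a dot product.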

### Save the model

In [28]:
model.save(sc, '/FileStore/model_recommendation.ml')

### Use the trained model to predict which accommodations each user might be interested in

In [30]:
allPredictions = None
for USER_ID in range(0, 100):
  dfUserRatings = dfRates.filter(dfRates.userid == USER_ID).rdd.map(lambda r: r.accoid).collect()
  rddPotential  = dfAccos.rdd.filter(lambda x: x[0] not in dfUserRatings)
  pairsPotential = rddPotential.map(lambda x: (USER_ID, x[0]))
  predictions = model.predictAll(pairsPotential).map(lambda p: (str(p[0]), str(p[1]), float(p[2])))
  predictions = predictions.takeOrdered(5, key=lambda x: -x[2]) # top 5
  print("predicted for user={0}".format(USER_ID))
  if allPredictions is None:
    allPredictions = predictions
  else:
    allPredictions.extend(predictions)

In [31]:
type(allPredictions)

In [32]:
allPredictions

### Prediction for user '1'

In [34]:
dfUserRatings = dfRates.filter(dfRates.userid == '1').rdd.map(lambda r: r.accoid).collect()
rddPotential  = dfAccos.rdd.filter(lambda x: x[0] not in dfUserRatings)
pairsPotential = rddPotential.map(lambda x: ('1', x[0]))

In [35]:
pairsPotential.collect()

In [36]:
predictions = model.predictAll(pairsPotential).map(lambda p: (str(p[0]), str(p[1]), float(p[2])))

In [37]:
predictions = predictions.takeOrdered(5, key=lambda x: -x[2]) # top 5

#### These are the five accommodations recommended to user '1'. The quality of the recommendations is not great (the predicted ratings are not very high) because the dataset is so small. Still, this lab illustrates the process you would follow to create product recommendations.

In [39]:
predictions
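
The filter-then-rank logic used for user '1' can be sketched in plain Python as follows. The scores and the set of already-rated accommodations are hypothetical, and `heapq.nsmallest` with a negated key is used here as a local analogue of `takeOrdered(5, key=lambda x: -x[2])`.

```python
import heapq

# Hypothetical predicted ratings as (userid, accoid, score) tuples.
scored = [
    ("1", "100", 2.1), ("1", "101", 3.7), ("1", "102", 0.9),
    ("1", "103", 4.2), ("1", "104", 3.0), ("1", "105", 1.5),
    ("1", "106", 3.9),
]
already_rated = {"104"}  # accommodations the user has rated before (assumed)

# Keep only accommodations the user has not rated yet.
candidates = [p for p in scored if p[1] not in already_rated]

# Smallest five by negated score, i.e. the five highest scores, best first.
top5 = heapq.nsmallest(5, candidates, key=lambda p: -p[2])
print(top5)
```

Negating the score turns "take the smallest" into "take the largest", which is why the notebook passes `-x[2]` as the ordering key.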