<i>Copyright (c) Microsoft Corporation. All rights reserved.</i>

<i>Licensed under the MIT License.</i>

# Goal

This notebook is an example of a notebook that can be executed on Databricks as part of a pipeline. It
is the default notebook used by [create_and_run_databricks_pipeline.ipynb](create_and_run_databricks_pipeline.ipynb).

Upload this notebook to your Databricks workspace at the location you specify as `notebook_path` in [create_and_run_databricks_pipeline.ipynb](create_and_run_databricks_pipeline.ipynb).

It depends on several files existing on DBFS. The easiest way to create those files is to run through Section 2 of [als_movie_o16n.ipynb](als_movie_o16n.ipynb), making sure the paths of the serialized objects match the paths specified here.
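Before running the rest of the notebook, it can help to confirm that the required files are actually present. The following is a minimal sketch, assuming the default paths defined further down in this notebook; `find_missing` and `REQUIRED_PATHS` are hypothetical names introduced only for illustration.

```python
import os

# Assumption: these mirror the default paths defined later in this notebook.
REQUIRED_PATHS = [
    "/dbfs/FileStore/top10/top10_dbsecrets.json",
    "/dbfs/FileStore/top10/models/mvl-als-reco.mml",
]

def find_missing(paths):
    """Return the entries of `paths` that do not exist on the local filesystem."""
    return [p for p in paths if not os.path.exists(p)]

missing = find_missing(REQUIRED_PATHS)
if missing:
    print("Missing files:", missing)
else:
    print("All required files found.")
```

If anything is reported missing, rerun the relevant part of [als_movie_o16n.ipynb](als_movie_o16n.ipynb) or adjust the paths below.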

**TODO:** Make sure defaults are consistent.

## File Imports

In [4]:
import os
import json
import time ## to time the writes

import azureml.core
from azureml.core import Workspace
from azureml.core.model import Model

import pyspark
from pyspark.ml.recommendation import ALSModel

print("PySpark version:", pyspark.__version__)
print("Azure ML SDK version:", azureml.core.VERSION)

In [5]:
## parameters:
## True downloads the model from the Azure ML workspace; False uses the copy already on DBFS
download_from_workspace = False

In [6]:
# location of aml_config:
ws_config_path = '/dbfs/FileStore/top10'
# location of the secrets file for cosmosdb:
secrets_path = '/dbfs/FileStore/top10/top10_dbsecrets.json'
## name of model in the model registry
model_name = 'mvl-als-reco.mml'
## where to download the model to:
model_download_dir = '/dbfs/FileStore/top10/models'

## other data required:
# Columns
userCol = "UserId"
itemCol = "MovieId"


## Load Information about the workspace to retrieve the model

## Load the model

In [9]:
if download_from_workspace:
  ## add service principal authentication to make this work without interaction
  ws = Workspace.from_config(os.path.join(ws_config_path,'aml_config','config.json'))
  ## get workspace details if you want:
  ## ws.get_details()
  current_model_info = Model(ws, name = model_name)
  model_path=current_model_info.download(target_dir = model_download_dir, exist_ok = True)
  print('downloaded model to: %s' %(model_path))
else:
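  model_path = os.path.join(model_download_dir, model_name)
  assert os.path.exists(model_path), 'model not found at: %s' % model_path
  print('Using model already at: %s' %(model_path))
## Spark expects a dbfs: URI rather than the /dbfs mount path; convert only the first occurrence
model = ALSModel.load(path=model_path.replace('/dbfs', 'dbfs:', 1))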
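The cell above passes the model location to Spark as a `dbfs:` URI, while `os.path` calls use the `/dbfs` mount path. The sketch below shows one way to make that conversion explicit and limited to the leading prefix; `to_dbfs_uri` is a hypothetical helper, not part of any library.

```python
def to_dbfs_uri(local_path):
    """Convert a '/dbfs/...' mount path to a 'dbfs:/...' URI.

    Paths that do not start with '/dbfs/' are returned unchanged.
    """
    prefix = "/dbfs/"
    if local_path.startswith(prefix):
        return "dbfs:/" + local_path[len(prefix):]
    return local_path

print(to_dbfs_uri("/dbfs/FileStore/top10/models/mvl-als-reco.mml"))
# prints: dbfs:/FileStore/top10/models/mvl-als-reco.mml
```

Anchoring on the prefix avoids rewriting a `/dbfs` substring that happens to appear later in the path.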

## Make new recommendations

In [11]:
## Create new scores:
recs = model.recommendForAllUsers(10)

## Write to CosmosDB

In [13]:
## load the info to connect to cosmosDB
with open(secrets_path) as json_data:
    writeConfig = json.load(json_data)
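A quick check of the loaded configuration can catch a malformed secrets file before the write starts. This is a sketch under assumptions: the key names in `EXPECTED_KEYS` are the ones commonly used with the `azure-cosmosdb-spark` connector, and your secrets file may use a different set. `missing_config_keys` is a hypothetical helper, and the placeholder values are not real credentials.

```python
import json

# Assumption: typical connector settings; adjust to match your secrets file.
EXPECTED_KEYS = {"Endpoint", "Masterkey", "Database", "Collection"}

def missing_config_keys(config, expected=EXPECTED_KEYS):
    """Return the expected keys that are absent from `config`, sorted by name."""
    return sorted(set(expected) - set(config))

# Placeholder config with one key deliberately left out.
example = json.loads(
    '{"Endpoint": "<endpoint>", "Masterkey": "<key>", "Database": "<database>"}'
)
print(missing_config_keys(example))
# prints: ['Collection']
```

In this notebook the same check could be applied to `writeConfig` right after it is loaded.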

In [14]:
## write new scores to CosmosDB
## observed timings:
##   first write: about 1.80 minutes
##   2-8 nodes: 4.82 minutes (first overwrite), 4.64 minutes (second), 4.87 minutes (third)
##   15-16 nodes: 1.27 minutes, 1.23 minutes
tic = time.time()
recs.withColumn("id",recs[userCol].cast("string")).select("id", "recommendations."+ itemCol)\
    .write.format("com.microsoft.azure.cosmosdb.spark").mode('overwrite').options(**writeConfig).save()
toc = time.time()
print("Rewriting results took %.2f seconds" % (toc - tic))
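The write above reshapes each row of `recs` into one document per user: the user ID becomes a string `id`, and the item IDs are pulled out of the `recommendations` array. The plain-Python sketch below mimics that reshaping on a single made-up row, assuming the column names `UserId` and `MovieId` set earlier; `to_document` and `sample_row` are illustrative only and do not use Spark.

```python
user_col, item_col = "UserId", "MovieId"

def to_document(row):
    """Build the per-user document: string id plus the list of recommended item IDs."""
    return {
        "id": str(row[user_col]),
        item_col: [rec[item_col] for rec in row["recommendations"]],
    }

# Made-up row in the shape returned by recommendForAllUsers (values are not real data).
sample_row = {
    "UserId": 42,
    "recommendations": [
        {"MovieId": 7, "rating": 4.9},
        {"MovieId": 3, "rating": 4.5},
    ],
}
print(to_document(sample_row))
# prints: {'id': '42', 'MovieId': [7, 3]}
```

The predicted ratings are dropped, so each stored document holds only the ordered list of recommended items.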