# Foundations of Artificial Intelligence and Machine Learning
## A Program by IIIT-H and TalentSprint

### To be done in Lab

The objective of this experiment is to use kNN to build a simple movie recommender, that is, to predict whether a user will watch a given movie.

In this experiment we will use a subset of the original MovieLens dataset.

Consider the problem of recommending movies to users. We have M Users and N Movies. 
Now, we want to predict whether a given test user $x$ will watch movie $y$.

User $x$ has seen some movies in the past and not others. We will use $x$'s movie-watching history as the feature vector for our recommendation system.

We will use KNN to find the K nearest neighbour users (users with similar taste) to $x$, and make predictions based on their entries for movie $y$.

A user has either seen a movie (1) or not seen it (0). We can represent this as a matrix of size M×N (M rows and N columns). In the code, this matrix is stored as a dictionary keyed by (userId, movieId) pairs.

Each element of the matrix is either zero or one. If the $(u, m)$ entry in this matrix is 1, then the $u^{th}$ user has seen movie $m$.
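
As a small illustration of this representation, the sketch below stores a made-up 3×4 seen/not-seen matrix as a dictionary keyed by (user, movie) pairs. The values in `toy_matrix` and the names `toy_matrix` and `toy_seen` are hypothetical and are not taken from the dataset.

```python
# Hypothetical 3 users x 4 movies binary matrix (illustrative values only)
toy_matrix = [
    [1, 0, 1, 0],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
]

# The same information as a dictionary keyed by (user index, movie index)
toy_seen = {
    (u, m): value
    for u, row in enumerate(toy_matrix)
    for m, value in enumerate(row)
}

print(toy_seen[(0, 2)])  # entry for user 0, movie 2
print(toy_seen[(1, 0)])  # entry for user 1, movie 0
```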

#### Training set
M×N binary matrix indicating seen/not-seen.
#### Test set
L test cases, each an $(x, y)$ pair. $x$ is an N-dimensional binary vector whose $y^{th}$ entry is missing; that entry is what we want to predict.


### Data Source

* AIML_DS_MOVIE-TRAIN_SMALLSUBSETOFMOVIELENSDATASET.csv

* AIML_DS_MOVIE-TEST_SMALLSUBSETOFMOVIELENSDATASET.csv

These have been taken (and modified) from:
http://kevinmolloy.info/teaching/cs504_2017Fall/

This is a small subset of the original movielens dataset.
https://grouplens.org/datasets/movielens/




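The code below provides a cosine-style similarity for finding nearest neighbours: it computes the dot product of two users' binary vectors, which is the numerator of the cosine similarity. Larger values mean more similar users.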
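
For reference, the full cosine similarity of two vectors $a$ and $b$ is $\frac{a \cdot b}{\|a\| \, \|b\|}$, whereas the `distance` function defined later in this notebook uses only the dot product. The sketch below shows the normalized version on two made-up vectors; the function name `cosine_similarity` and the toy vectors are assumptions for illustration and are not part of the lab code.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here to avoid division by zero
    return dot / (norm_a * norm_b)

# Hypothetical watch histories for two users over four movies
user_a = [1, 0, 1, 1]
user_b = [1, 1, 1, 0]
print(cosine_similarity(user_a, user_b))
```

Normalizing by the vector lengths keeps users who have watched a very large number of movies from looking similar to everyone.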

In [None]:
# Importing required packages
import pandas as pd

In [None]:
## Setting up the files

Train_set = "../DS/AIML_DS_MOVIE-TRAIN_SMALLSUBSETOFMOVIELENSDATASET.csv"
Test_set = "../DS/AIML_DS_MOVIE-TEST_SMALLSUBSETOFMOVIELENSDATASET.csv"

In [None]:
## Loading the data from set up files
rated = pd.read_csv(Train_set, converters={"userId":int, "movieId":int})
rated.describe()

In [None]:
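# IDs are assumed to be non-negative integers. Add 1 so that range(userCount)
# and range(movieCount) include the largest ID present in the data.
userCount = max(rated.userId) + 1
movieCount = max(rated.movieId) + 1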

In [None]:
seen = {}
for x in rated.values:
    seen[(int(x[0]), int(x[1]))] = 1

In [None]:
allUsersMovies = [(u,m) for u in range(userCount) for m in range(movieCount)]

In [None]:
for x in allUsersMovies:
    if x not in seen:
        seen[x] = 0

Now that we have the data loaded into a dictionary, let us write the distance function to use it. Given two users, $u_1$ and $u_2$, and a movie $mx$, we must ignore the entries for $mx$ while computing the distance, because the entry for $mx$ is what we are trying to predict.

In [None]:
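# This is the dot product of the two users' binary vectors (the numerator of
# the cosine similarity): the number of movies both users have seen,
# excluding mx. Higher means more similar.
def distance(u1, u2, mx):
    d = 0 - seen[(u1, mx)] * seen[(u2, mx)]  # cancel the contribution of mx
    for m in range(movieCount):
        d += seen[(u1, m)] * seen[(u2, m)]
    return d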

def kNN(k, givenUser, givenMovie):
    distances = []
    for u in range(userCount):
        if u != givenUser:
            distances.append([distance(u, givenUser, givenMovie), u])
    distances.sort()
    distances.reverse() ## Because this is a similarity score: higher = closer
    return distances[:k] 

def prediction(k, givenUser, givenMovie):
    neighbours = kNN(k, givenUser, givenMovie)
    howmanySaw = sum([seen[(u, givenMovie)] for d, u in neighbours])
    return 2 * howmanySaw > k      ### True (predict 1) if more than half of the similar users have seen this movie, otherwise False (predict 0).
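
The same neighbour search and majority vote can be traced by hand on a tiny example. The sketch below is a self-contained toy version that does not use the lab's `seen` dictionary; the data in `toy_users` and the names `overlap` and `toy_prediction` are hypothetical.

```python
# Hypothetical toy data: each list is one user's history over four movies (1 = seen)
toy_users = {
    "a": [1, 1, 0, 1],
    "b": [1, 1, 0, 1],
    "c": [1, 0, 1, 0],
    "d": [0, 0, 1, 0],
    "x": [1, 1, 0, 0],  # test user whose entries we try to predict
}

def overlap(v1, v2, skip):
    """Count movies both users have seen, ignoring the movie at index `skip`."""
    return sum(p * q for i, (p, q) in enumerate(zip(v1, v2)) if i != skip)

def toy_prediction(k, user, movie):
    """Majority vote over the k users with the largest overlap with `user`."""
    scored = [
        (overlap(toy_users[u], toy_users[user], movie), u)
        for u in toy_users if u != user
    ]
    scored.sort(reverse=True)  # larger overlap = more similar
    votes = sum(toy_users[u][movie] for _, u in scored[:k])
    return 2 * votes > k

print(toy_prediction(3, "x", 3))  # neighbours a, b, c vote on movie 3
print(toy_prediction(3, "x", 2))  # neighbours a, b, c vote on movie 2
```

For movie 3, the three nearest users are `a`, `b` and `c`; two of them have seen it, so the vote is in favour. For movie 2, only `c` has seen it, so the vote goes against.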
        

### Exercise 1

Verify the code above and check that it works.

In [None]:
# Your Answer Here

### Exercise 2 

Change the distance function to compute the Euclidean distance, and see if the prediction changes. Remember to modify the kNN function to pick the smallest distances: do not call reverse()!

In [None]:
## Your Code Here
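
As a starting point, here is a minimal sketch of the Euclidean distance on plain lists rather than on the lab's `seen` dictionary. The function name `euclidean`, the `skip` parameter and the toy vectors are assumptions for illustration; adapting the idea to the dictionary is left as the exercise.

```python
import math

def euclidean(v1, v2, skip):
    """Euclidean distance between two vectors, ignoring the entry at index `skip`."""
    return math.sqrt(sum((p - q) ** 2
                         for i, (p, q) in enumerate(zip(v1, v2)) if i != skip))

# Hypothetical watch histories over four movies
target = [1, 1, 0, 0]
others = {"a": [1, 1, 0, 1], "b": [0, 0, 1, 1], "c": [1, 0, 0, 0]}

# Sort ascending: the smallest distance is the nearest neighbour
ranked = sorted((euclidean(v, target, skip=3), name) for name, v in others.items())
print(ranked)
```

Because this is a true distance, the nearest neighbours come first after an ascending sort, which is why no reversal is needed.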

### Exercise 3

Change the distance function to compute the Manhattan distance, and see if the prediction changes. Remember to modify the kNN function to pick the smallest distances: do not call reverse()!

In [None]:
## Your Code Here
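
One observation worth checking while doing this exercise: for 0/1 entries, $|p - q| = (p - q)^2$, so the Manhattan distance equals the squared Euclidean distance. The sketch below demonstrates this on made-up lists; the function names and toy vectors are hypothetical and not part of the lab code.

```python
def manhattan(v1, v2, skip):
    """Manhattan (L1) distance between two vectors, ignoring index `skip`."""
    return sum(abs(p - q) for i, (p, q) in enumerate(zip(v1, v2)) if i != skip)

def squared_euclidean(v1, v2, skip):
    """Squared Euclidean distance between two vectors, ignoring index `skip`."""
    return sum((p - q) ** 2 for i, (p, q) in enumerate(zip(v1, v2)) if i != skip)

# Hypothetical watch histories over four movies
target = [1, 1, 0, 0]
others = {"a": [1, 1, 0, 1], "b": [0, 0, 1, 1], "c": [1, 0, 0, 0]}

for name, v in others.items():
    # The two columns agree for every binary vector
    print(name, manhattan(v, target, 3), squared_euclidean(v, target, 3))
```

Since the square root is monotonic, Manhattan and Euclidean distances order the neighbours of a binary vector in the same way.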

### Summary

In this experiment we learnt how to build a recommendation system using a kNN classifier.