# Collaborative filtering

```python
from src.utils import load_dataset

from sklearn.metrics.pairwise import cosine_similarity
import numpy as np
import pandas as pd
```

Collaborative filtering is one of the most common techniques for building recommender systems. This notebook first applies user-user collaborative filtering in two steps:
1. **Find Similar Users**:
    - Compute the cosine similarity between users based on their song ratings.
    - Suggest the users with the highest similarity scores (excluding the user themself) as people to follow.

2. **Recommend Songs**:
    - Predict the rating a user would give to each song they haven't listened to, as a similarity-weighted average of all users' ratings for that song.
    - Recommend the songs with the highest predicted ratings.
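
A minimal sketch of both steps on a tiny, made-up ratings matrix (plain NumPy, not the project's dataset), using the same similarity-weighted average as the function further down. Here 0 stands for "not rated", as in the user-song matrix.

```python
import numpy as np

# Made-up ratings: rows are users, columns are songs, 0 means "not rated".
ratings = np.array([
    [5, 0, 3, 0],   # user 0
    [4, 2, 3, 0],   # user 1
    [0, 5, 0, 4],   # user 2
], dtype=float)

# Step 1: cosine similarity between every pair of users (same result as
# sklearn's cosine_similarity for rows with a non-zero norm).
unit = ratings / np.linalg.norm(ratings, axis=1, keepdims=True)
sim = unit @ unit.T

# Step 2: predicted rating = similarity-weighted average of all users' ratings.
pred = sim @ ratings / np.abs(sim).sum(axis=1, keepdims=True)

# For user 0: the most similar *other* user and the best-scoring unrated song.
order = np.argsort(sim[0])[::-1]
most_similar = order[order != 0][0]
unrated = np.flatnonzero(ratings[0] == 0)
best_song = unrated[np.argmax(pred[0, unrated])]
print(most_similar, best_song)  # 1 1
```

User 0 shares no rated songs with user 2, so their similarity is 0 and user 2's ratings do not influence user 0's predictions at all.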

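```python
# Build the user-song matrix: one row per user, one column per song,
# star ratings as values and 0 where a user has not rated a song
dataframe = load_dataset()
user_song_matrix = dataframe.pivot_table(index='User_Name', columns='Song', values='Star_Rating', fill_value=0)
```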
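
The pivot above assumes `load_dataset()` returns long-format data with one row per rating and the columns `User_Name`, `Song` and `Star_Rating`. A toy frame with invented values shows what `pivot_table` does with such data. Note that its default `aggfunc` is the mean, so repeated ratings of the same song by the same user are averaged rather than kept separately.

```python
import pandas as pd

# Made-up long-format ratings with the column names the pivot above expects.
ratings = pd.DataFrame({
    "User_Name": ["Alice", "Alice", "Bob", "Bob", "Bob"],
    "Song": ["Song1", "Song2", "Song1", "Song3", "Song3"],
    "Star_Rating": [5, 3, 4, 2, 4],
})

# One row per user, one column per song; Bob's two ratings of Song3 are
# averaged (default aggfunc="mean") and missing (user, song) pairs become 0.
matrix = ratings.pivot_table(index="User_Name", columns="Song",
                             values="Star_Rating", fill_value=0)
print(matrix)
```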

```python
users = user_song_matrix.index.tolist()
users
```

```
['Alice',
 'Bob',
 'Charlie',
 'David',
 'Emily',
 'Frank',
 'Grace',
 'Hannah',
 'Ivy',
 'Jack',
 'Karen',
 'Liam',
 'Monica',
 'Nancy',
 'Oliver',
 'Paul',
 'Quincy',
 'Rachel',
 'Steve',
 'Tom']
```

```python
def recommender_system_collaborative(user_song_matrix, user_name):
    # Compute user-user cosine similarity
    user_similarity = cosine_similarity(user_song_matrix)
    user_similarity_df = pd.DataFrame(user_similarity, index=user_song_matrix.index, columns=user_song_matrix.index)

    # 1) Recommend similar users: the 5 highest similarities, excluding the user themself
    top_users = (
        user_similarity_df[user_name]
        .drop(user_name)
        .sort_values(ascending=False)
        .head(5)
        .to_dict()
    )

    # 2) Recommend songs
    # Predicted rating = similarity-weighted average of every user's ratings
    # (unrated songs count as 0)
    user_predictions = user_similarity @ user_song_matrix.to_numpy() / np.abs(user_similarity).sum(axis=1, keepdims=True)
    user_predictions_df = pd.DataFrame(user_predictions, index=user_song_matrix.index, columns=user_song_matrix.columns)
    sorted_user_predictions = user_predictions_df.loc[user_name].sort_values(ascending=False)

    # Filter out songs the user has already rated/listened to
    user_data = user_song_matrix.loc[user_name]
    already_listened = user_data[user_data > 0].index.tolist()
    recommendations = sorted_user_predictions[~sorted_user_predictions.index.isin(already_listened)]
    top_song_recommendations = recommendations.head(10).index.tolist()

    return top_users, top_song_recommendations, user_similarity_df, user_predictions_df

# Use the recommender system for one user (here 'Alice')
similar_users, recommended_songs, user_similarity_df, _ = recommender_system_collaborative(user_song_matrix, 'Alice')

print("Similar Users to Follow:")
print(similar_users)
print("\nSongs to Listen To:")
print(recommended_songs)
```

```
Similar Users to Follow:
{'Emily': 0.3908736874708199, 'Monica': 0.25723820209050213, 'Liam': 0.23947253431888973, 'Karen': 0.20918717073754478, 'Jack': 0.20837327131300545}

Songs to Listen To:
['Song234', 'Song264', 'Song166', 'Song94', 'Song24', 'Song136', 'Song214', 'Song146', 'Song216', 'Song133']
```


```python
user_similarity_df
```

| User_Name | Alice | Bob | Charlie | David | Emily | Frank | Grace | Hannah | Ivy | Jack | Karen | Liam | Monica | Nancy | Oliver | Paul | Quincy | Rachel | Steve | Tom |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Alice | 1.0 | 0.0 | 0.0 | 0.0 | 0.390874 | 0.035852 | 0.0 | 0.171132 | 0.123824 | 0.208373 | 0.209187 | 0.239473 | 0.257238 | 0.0 | 0.183597 | 0.0 | 0.108991 | 0.119319 | 0.0 | 0.0 |
| Bob | 0.0 | 1.0 | 0.0 | 0.552581 | 0.0 | 0.0 | 0.0 | 0.133466 | 0.05807 | 0.0 | 0.0 | 0.111847 | 0.0 | 0.054076 | 0.124482 | 0.236298 | 0.108823 | 0.117585 | 0.560933 | 0.0 |
| Charlie | 0.0 | 0.0 | 1.0 | 0.0 | 0.22339 | 0.326499 | 0.322373 | 0.0 | 0.0 | 0.231434 | 0.459809 | 0.218511 | 0.0 | 0.186944 | 0.0 | 0.0 | 0.254993 | 0.0 | 0.0 | 0.466016 |
| David | 0.0 | 0.552581 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.112675 | 0.227733 | 0.46793 | 0.217362 | 0.0 | 0.770631 | 0.0 |
| Emily | 0.390874 | 0.0 | 0.22339 | 0.0 | 1.0 | 0.072663 | 0.328256 | 0.151527 | 0.080624 | 0.223481 | 0.410995 | 0.222492 | 0.0 | 0.162226 | 0.254616 | 0.0 | 0.299856 | 0.0 | 0.0 | 0.0 |
| Frank | 0.035852 | 0.0 | 0.326499 | 0.0 | 0.072663 | 1.0 | 0.036384 | 0.101663 | 0.018136 | 0.023623 | 0.178188 | 0.185212 | 0.075741 | 0.094048 | 0.0 | 0.0 | 0.116676 | 0.144115 | 0.0 | 0.191028 |
| Grace | 0.0 | 0.0 | 0.322373 | 0.0 | 0.328256 | 0.036384 | 1.0 | 0.0 | 0.0 | 0.0 | 0.402075 | 0.211474 | 0.0 | 0.263828 | 0.0 | 0.0 | 0.230661 | 0.0 | 0.0 | 0.0 |
| Hannah | 0.171132 | 0.133466 | 0.0 | 0.0 | 0.151527 | 0.101663 | 0.0 | 1.0 | 0.231103 | 0.048593 | 0.248408 | 0.141479 | 0.156317 | 0.083592 | 0.331384 | 0.188681 | 0.177236 | 0.22365 | 0.0 | 0.0 |
| Ivy | 0.123824 | 0.05807 | 0.0 | 0.0 | 0.080624 | 0.018136 | 0.0 | 0.231103 | 1.0 | 0.17987 | 0.09877 | 0.15259 | 0.184889 | 0.0 | 0.205241 | 0.222005 | 0.140652 | 0.134505 | 0.0 | 0.0 |
| Jack | 0.208373 | 0.0 | 0.231434 | 0.0 | 0.223481 | 0.023623 | 0.0 | 0.048593 | 0.17987 | 1.0 | 0.145239 | 0.127445 | 0.172526 | 0.0 | 0.157199 | 0.28726 | 0.154495 | 0.0 | 0.0 | 0.357409 |


## Improving the Recommendations



1. **More Data**: As with many machine learning tasks, more data usually leads to more accurate recommendations. This includes more user interaction data, richer metadata about items, and more detailed user profiles.

2. **Data Quality**: Make sure the data is clean, relevant, and as free of biases and errors as possible. Removing outliers or irrelevant records can improve the performance of recommendation algorithms.

3. **Feature Engineering**: Extract more relevant features from the existing data. For song recommendations, features such as tempo, lyric sentiment, or time of listening could be useful.

4. **Diversity in Recommendations**: Make sure the recommendations aren't limited to popular items. Introducing diversity into the results exposes users to a wider range of choices. There are several ways to do this; a common one is to re-rank the candidate list so that items too similar to those already selected are penalized.

5. **Hybrid Models**: Instead of relying solely on content-based or collaborative filtering, combine them. Hybrid models can leverage the strengths of both methods to provide more accurate recommendations.

6. **Consider Context**: Recommendations can be context-dependent. For instance, the best song to recommend might change depending on whether the user is at the gym or relaxing at home.

7. **Feedback Loop**: Let users give feedback on the recommendations and use that feedback to fine-tune the recommendation algorithm.

8. **Cold Start Problem**: New users and new items have little or no interaction history, which makes them hard to handle with collaborative filtering alone. Content-based filtering or hybrid models can help here.
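
For the **Diversity in Recommendations** point, one widely used technique is Maximal Marginal Relevance (MMR) re-ranking: candidates are picked greedily, each time choosing the item that maximizes `lam * relevance - (1 - lam) * (highest similarity to an already-picked item)`. The sketch below is a minimal version under the assumption that you already have relevance scores (for example predicted ratings) and a pairwise item similarity matrix for the candidates; `mmr_rerank` and `lam` are hypothetical names, not part of this project.

```python
import numpy as np

def mmr_rerank(scores, item_sim, k, lam=0.7):
    """Greedy Maximal Marginal Relevance re-ranking (hypothetical helper).

    scores   : 1-D array of relevance scores, one per candidate item
    item_sim : square array of pairwise similarities between the same candidates
    k        : number of items to return
    lam      : trade-off; 1.0 = pure relevance, 0.0 = pure diversity
    """
    candidates = list(range(len(scores)))
    selected = []
    while candidates and len(selected) < k:
        def mmr(i):
            # Penalty: the highest similarity to anything already picked.
            redundancy = max((item_sim[i][j] for j in selected), default=0.0)
            return lam * scores[i] - (1 - lam) * redundancy
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return selected

# Items 0 and 1 are near-duplicates; item 2 is less relevant but different.
scores = np.array([0.9, 0.85, 0.6])
item_sim = np.array([
    [1.0, 0.95, 0.1],
    [0.95, 1.0, 0.1],
    [0.1, 0.1, 1.0],
])
print(mmr_rerank(scores, item_sim, k=2, lam=0.5))  # [0, 2]
```

With `lam=1.0` the penalty vanishes and the result is the plain relevance ranking `[0, 1]`; lowering `lam` trades some relevance for variety.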


## Evaluating the Model

### 1. Offline Evaluation:
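   - **Precision and Recall**: Precision is the fraction of recommended items that are relevant; recall is the fraction of relevant items that appear among the recommendations. Both are usually computed over the top *k* recommendations (Precision@k, Recall@k).
   - **Mean Average Precision (MAP)**: A common ranking metric that also accounts for order: for each user it averages the precision at every rank where a relevant item appears, then averages that value over all users.
   - **Normalized Discounted Cumulative Gain (NDCG)**: Measures ranking quality by giving more credit to relevant items near the top of the list (with a logarithmic discount by position), normalized by the score of the ideal ranking.
   - **Root Mean Squared Error (RMSE)**: For rating prediction, RMSE measures how far the predicted ratings are from the actual, held-out ratings.
   - **Diversity, Novelty, and Serendipity**: These metrics measure how diverse, new, or surprising the recommendations are; which of them matter depends on the goals of the recommendation system.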
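
The offline metrics above can be sketched in a few lines for a single user. This is a minimal version assuming binary relevance, where `relevant` would be, for example, songs held out from that user's ratings; the function names are illustrative, and MAP would be the mean of `average_precision_at_k` over all users.

```python
import numpy as np

def precision_recall_at_k(recommended, relevant, k):
    """Precision@k and recall@k for one user (binary relevance)."""
    relevant = set(relevant)
    hits = len(set(recommended[:k]) & relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def average_precision_at_k(recommended, relevant, k):
    """Average of precision@i over the ranks i where a relevant item appears."""
    relevant = set(relevant)
    hits, total = 0, 0.0
    for pos, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            total += hits / pos
    denom = min(len(relevant), k)
    return total / denom if denom else 0.0

def ndcg_at_k(recommended, relevant, k):
    """NDCG@k with binary relevance; the item at 1-based rank p gets weight 1 / log2(p + 1)."""
    relevant = set(relevant)
    dcg = sum(1.0 / np.log2(pos + 2)
              for pos, item in enumerate(recommended[:k]) if item in relevant)
    # Ideal ranking: all relevant items (up to k) placed at the top.
    idcg = sum(1.0 / np.log2(pos + 2) for pos in range(min(len(relevant), k)))
    return dcg / idcg if idcg > 0 else 0.0

def rmse(predicted, actual):
    """Root mean squared error between predicted and held-out ratings."""
    predicted, actual = np.asarray(predicted, float), np.asarray(actual, float)
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))

# Made-up example: two relevant songs, found at ranks 1 and 3.
recommended = ["Song3", "Song7", "Song1", "Song9"]
relevant = {"Song1", "Song3"}
print(precision_recall_at_k(recommended, relevant, k=4))  # (0.5, 1.0)
print(average_precision_at_k(recommended, relevant, k=4))
print(ndcg_at_k(recommended, relevant, k=4))
print(rmse([4, 3], [5, 3]))
```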

### 2. Online Evaluation:
   - **A/B Testing**: Split users into different groups and provide different versions of recommendations to each group. Compare metrics like click-through rate, conversion rate, time spent on the platform, etc.
   - **Multi-Armed Bandit Testing**: An extension of A/B testing that dynamically adjusts the proportion of users seeing each version based on ongoing results.
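
One common bandit strategy is Beta-Bernoulli Thompson sampling. Each recommender variant keeps a Beta posterior over its click-through rate, and every incoming user is routed to the variant whose sampled rate is highest, so traffic shifts toward the better variant as evidence accumulates. The sketch below simulates this with made-up click rates; `thompson_sampling` and `true_ctr` are hypothetical names, and a real deployment would record actual clicks instead of simulating them.

```python
import numpy as np

def thompson_sampling(true_ctr, n_users, seed=0):
    """Route simulated users between recommender variants with Thompson sampling.

    true_ctr : click-through rate of each variant (unknown to the algorithm; simulated here)
    Returns how many users were shown each variant.
    """
    rng = np.random.default_rng(seed)
    n_arms = len(true_ctr)
    clicks = np.zeros(n_arms)   # successes per variant
    skips = np.zeros(n_arms)    # failures per variant
    shown = np.zeros(n_arms, dtype=int)
    for _ in range(n_users):
        # Draw a plausible CTR for each variant from its Beta(1 + clicks, 1 + skips) posterior.
        sampled = rng.beta(1 + clicks, 1 + skips)
        arm = int(np.argmax(sampled))
        shown[arm] += 1
        # Simulate whether this user clicks on the chosen variant's recommendations.
        if rng.random() < true_ctr[arm]:
            clicks[arm] += 1
        else:
            skips[arm] += 1
    return shown

shown = thompson_sampling(true_ctr=[0.05, 0.15], n_users=2000)
print(shown)  # most traffic should end up on the second variant
```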

### 3. User Studies:
   - **User Surveys and Interviews**: Sometimes, qualitative feedback can provide insights that quantitative metrics miss. Asking users directly how they feel about the recommendations can provide valuable information.
   - **User Engagement**: Track how users are interacting with the recommended items. Are they clicking on them, spending time with them, purchasing them?

# Item-Item Collaborative Filtering Recommendations

Item-Item Collaborative Filtering recommends items based on their similarity to items the user has shown preference for. 

1. **Compute Item Similarity**:
    - For each pair of items (in our case, songs), compute their similarity. Common metrics include the Pearson correlation or cosine similarity.
    - This results in an item-item similarity matrix.

2. **Predict User Ratings**:
    - For items the user hasn't interacted with, predict a rating as a weighted average of the user's ratings of other items, where each weight is the similarity between that item and the item in question.

3. **Recommend Items**:
    - Recommend items that have the highest predicted ratings for the user.
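
Step 1 mentions Pearson correlation as an alternative to cosine similarity. The sketch below compares the two on a made-up matrix laid out like `user_song_matrix`. For Pearson it treats 0 as missing, so only users who rated both songs contribute (`DataFrame.corr` drops missing values pairwise); `min_periods=2` is an arbitrary illustrative threshold. The cosine version works on the zero-filled matrix, which is what the function below computes with scikit-learn.

```python
import numpy as np
import pandas as pd

# Made-up user x song matrix in the same layout as user_song_matrix (0 = not rated).
matrix = pd.DataFrame(
    [[5, 4, 0, 1],
     [4, 5, 1, 0],
     [1, 2, 5, 4],
     [0, 1, 4, 5]],
    index=["u1", "u2", "u3", "u4"],
    columns=["SongA", "SongB", "SongC", "SongD"],
)

# Pearson item-item similarity over co-rated users only (0 -> NaN first).
rated = matrix.replace(0, np.nan)
pearson_sim = rated.corr(method="pearson", min_periods=2)

# Cosine item-item similarity on the zero-filled matrix, for comparison.
values = matrix.to_numpy(dtype=float)
unit = values / np.linalg.norm(values, axis=0, keepdims=True)
cosine_sim = pd.DataFrame(unit.T @ unit, index=matrix.columns, columns=matrix.columns)

print(pearson_sim.round(2))
print(cosine_sim.round(2))
```

Pearson centers each song's ratings on its own mean, so it captures whether two songs are rated above or below average by the same users. Cosine on zero-filled data also rewards songs that are simply rated by the same people.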


```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

def item_item_recommendation(user_song_matrix, user_name, n_recommendations=5):
    # Item-item cosine similarity: transpose so that songs become rows
    item_similarity = pd.DataFrame(cosine_similarity(user_song_matrix.T),
                                   index=user_song_matrix.columns,
                                   columns=user_song_matrix.columns)

    # The user's rating for every song (0 = not rated)
    user_data = user_song_matrix.loc[user_name]

    # Predict a score for each song the user hasn't rated yet:
    # similarity-weighted sum of the user's ratings, divided by the total
    # absolute similarity of that song to all songs (unrated songs count as 0)
    missing_items = user_data[user_data == 0].index
    item_scores = {}
    for item in missing_items:
        similar_items = item_similarity[item]
        predicted_score = sum(user_data * similar_items) / sum(abs(similar_items))
        item_scores[item] = predicted_score

    # Sort by predicted score and keep the top n_recommendations
    recommended_items = sorted(item_scores, key=item_scores.get, reverse=True)[:n_recommendations]

    return recommended_items

# Try it for Alice
recommended_songs = item_item_recommendation(user_song_matrix, 'Alice')
print(recommended_songs)
```

```
['Song13', 'Song243', 'Song93', 'Song73', 'Song103']
```
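
With both a user-user and an item-item scorer available, a simple hybrid is a weighted blend of their scores. The sketch below is a minimal version under two assumptions: you have per-song scores from each method for the same user, for example a row of `user_predictions_df` (returned by `recommender_system_collaborative`, with already-rated songs filtered out) and the `item_scores` dict from `item_item_recommendation` wrapped in a `pd.Series` (the function as written returns only song names, so it would need a small change). `blend_scores` and `weight_a` are hypothetical names. Blending two collaborative scorers is only one option; the same pattern would apply to combining a content-based score with a collaborative one, as in the hybrid models described earlier.

```python
import pandas as pd

def blend_scores(scores_a, scores_b, weight_a=0.5):
    """Weighted blend of two recommenders' scores for one user (hypothetical helper).

    Each input is a pandas Series indexed by song. Scores are min-max scaled to
    [0, 1] first so that neither recommender dominates just because of its scale.
    """
    def minmax(s):
        # Rescale to [0, 1]; a constant series becomes all zeros.
        span = s.max() - s.min()
        return (s - s.min()) / span if span > 0 else s * 0.0

    # Align on the union of songs; a song missing from one source scores 0 there.
    a, b = minmax(scores_a).align(minmax(scores_b), join="outer", fill_value=0.0)
    return (weight_a * a + (1 - weight_a) * b).sort_values(ascending=False)

# Made-up scores from the two recommenders for one user.
user_based = pd.Series({"Song1": 2.0, "Song2": 1.0, "Song3": 0.0})
item_based = pd.Series({"Song2": 0.9, "Song3": 0.8, "Song4": 0.1})
print(blend_scores(user_based, item_based))
```

Song2 comes out on top because both recommenders rank it highly, even though neither ranks it first on its own scale.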
