# Laboratorium 5 - rekomendacje grupowe

## Przygotowanie

 * pobierz i wypakuj dataset: https://files.grouplens.org/datasets/movielens/ml-latest-small.zip
   * więcej możesz poczytać tutaj: https://grouplens.org/datasets/movielens/
 * [opcjonalnie] Utwórz wirtualne środowisko
 `python3 -m venv ./recsyslab5`
 * zainstaluj potrzebne biblioteki:
 `pip install numpy pandas matplotlib`

## Część 1. - przygotowanie danych

In [1]:
# importujemy wszystkie potrzebne pakiety

import math
import numpy as np
import pandas

from random import choice, sample
from statistics import mean, stdev

from reco_utils import *

In [2]:
# wczytujemy oceny uytkownikow i obliczamy (za pomocą collaborative filtering) wszystkie przewidywane oceny filmow

raw_ratings = pandas.read_csv('ml-latest-small/ratings.csv').drop(columns=['timestamp'])
movies = list(raw_ratings['movieId'].unique())
users = list(raw_ratings['userId'].unique())
ratings = get_predicted_ratings(raw_ratings)
ratings

Total error: 213556.8724544077
Total error: 207017.59365106368
Total error: 200918.88739705057
Total error: 195218.44311324565
Total error: 189879.00275124036
Total error: 184867.65131561962
Total error: 180155.2221530232
Total error: 175715.79585786397
Total error: 171526.27602626398
Total error: 167566.028467732
Total error: 163816.57310848142
Total error: 160261.3198743124
Total error: 156885.34145994327
Total error: 153675.17717633606
Total error: 150618.66309351297
Total error: 147704.78452075043
Total error: 144923.54753233978
Total error: 142265.86678866882
Total error: 139723.46734490912
Total error: 137288.7985030201
Total error: 134954.95806272564
Total error: 132715.62557577426
Total error: 130565.00341484732
Total error: 128497.76464160919
Total error: 126509.00680370304
Total error: 124594.2109129405
Total error: 122749.20496042933
Total error: 120970.13141219056
Total error: 119253.41820353366
Total error: 117595.75281418597
Total error: 115994.05906075976
Total error: 11

Unnamed: 0,1,2,3,4,5,6,7,8,9,10,...,193565,193567,193571,193573,193579,193581,193583,193585,193587,193609
1,8,6,3,7,10,7,8,6,4,5,...,7,6,7,10,10,7,5,8,5,9
2,8,10,0,10,9,6,10,3,10,8,...,0,10,0,10,6,0,5,8,4,10
3,10,1,5,10,4,9,5,6,5,10,...,2,0,4,6,6,3,2,0,10,0
4,6,6,4,5,5,6,6,6,7,6,...,5,6,6,4,5,5,7,4,3,4
5,6,0,5,5,0,9,5,4,0,5,...,8,2,10,6,9,0,0,0,10,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
606,6,6,6,6,6,6,6,6,6,6,...,6,6,6,6,6,6,6,6,6,6
607,7,5,10,1,8,10,3,10,4,4,...,8,1,9,5,1,10,7,3,6,5
608,6,6,6,7,6,5,6,6,6,7,...,6,6,6,6,6,7,6,6,6,6
609,1,7,3,9,10,6,7,5,5,5,...,10,10,5,10,8,6,0,1,10,1


In [3]:
# wczytujemy nazwy filmow i kategorie

movies_metadata = pandas.read_csv('ml-latest-small/movies.csv').set_index('movieId')
movies_metadata

Unnamed: 0_level_0,title,genres
movieId,Unnamed: 1_level_1,Unnamed: 2_level_1
1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
2,Jumanji (1995),Adventure|Children|Fantasy
3,Grumpier Old Men (1995),Comedy|Romance
4,Waiting to Exhale (1995),Comedy|Drama|Romance
5,Father of the Bride Part II (1995),Comedy
...,...,...
193581,Black Butler: Book of the Atlantic (2017),Action|Animation|Comedy|Fantasy
193583,No Game No Life: Zero (2017),Animation|Comedy|Fantasy
193585,Flint (2017),Drama
193587,Bungo Stray Dogs: Dead Apple (2018),Action|Animation


In [4]:
# wczytujemy przykladowe grupy uzytkownikow
groups = pandas.read_csv('groups.csv').values.tolist()
groups

[[111, 307, 474, 599, 414],
 [469, 182, 232, 448, 600],
 [508, 581, 497, 402, 566],
 [300, 515, 245, 568, 507],
 [2, 371, 252, 518, 37],
 [269, 360, 469, 287, 308],
 [243, 527, 418, 118, 370],
 [186, 559, 327, 553, 314]]

In [5]:
# przygotowujemy funkcje pomocnicza

def describe_group(group, N=10):
    print(f'\n\nUser ids: {group}')
    group_size = len(group)
    
    mean_stdev = ratings.loc[group].std(axis=0).mean()
    median_stdev = ratings.loc[group].std(axis=0).median()
    std_stdev = ratings.loc[group].std(axis=0).std()
    print(f'\nMean ratings deviation: {mean_stdev}')
    print(f'Median ratings deviation: {median_stdev}')
    print(f'Standard deviation of ratings deviation: {std_stdev}')
    
    average_scores = ratings.iloc[group].mean(axis=0)
    average_scores = average_scores.sort_values()
    best_movies = [(movies_metadata['title'][movie_id], average_scores[movie_id]) for movie_id in list(average_scores[-N:].index)]
    worst_movies = [(movies_metadata['title'][movie_id], average_scores[movie_id]) for movie_id in list(average_scores[:N].index)]
    
    print('\nBest movies:')
    for movie, score in best_movies[::-1]:
        print(f'{movie}, {score}*')
    print('\nWorst movies:')
    for movie, score in worst_movies:
        print(f'{movie}, {score}*')

describe_group(groups[7])



User ids: [186, 559, 327, 553, 314]

Mean ratings deviation: 3.2637528944479266
Median ratings deviation: 3.361547262794322
Standard deviation of ratings deviation: 0.9110521258998032

Best movies:
Broadway Danny Rose (1984), 9.0*
Just Friends (2005), 9.0*
Iron Man (1931), 9.0*
Oh, Hello: On Broadway (2017), 8.8*
Pekka ja Pätkä salapoliiseina (1957), 8.6*
Chain of Fools (2000), 8.6*
Tomorrow Never Dies (1997), 8.6*
Partly Cloudy (2009), 8.6*
Shake Hands with the Devil (2007), 8.6*
Solaris (2002), 8.6*

Worst movies:
Wicker Man, The (2006), 1.8*
Night of the Living Dead (1990), 2.6*
Russians Are Coming, the Russians Are Coming, The (1966), 2.8*
Bad Asses on the Bayou (2015), 2.8*
Inferno (2016), 2.8*
Lean on Me (1989), 2.8*
Lion in Winter, The (1968), 2.8*
See No Evil, Hear No Evil (1989), 3.0*
Who's That Knocking at My Door? (1967), 3.0*
Mary and Max (2009), 3.0*


## Część 2. - algorytmy proste

In [6]:
# zdefiniujmy interfejs dla wszystkich algorytmow rekomendacyjnych

class Recommender:
    def recommend(self, movies, ratings, group, size):
        pass


# jako pierwszy zaimplementujemy algorytm losowy - dla porownania
    
class RandomRecommender(Recommender):
    def __init__(self):
        self.name = 'random'
        
    def recommend(self, movies, ratings, group, size):
        return sample(movies, size)

In [7]:
# algorytm rekomendujacy filmy o najwyzszej sredniej ocen

class AverageRecommender(Recommender):
    def __init__(self):
        self.name = 'average'
    
    def recommend(self, movies, ratings, group, size):
        return list(ratings.iloc[group].mean(axis=0).sort_values(ascending=False).index[:size])

In [8]:
# algorytm rekomendujacy filmy o najwyzszej sredniej ocen,
#   ale rownoczesnie wykluczajacy te filmy, ktore otrzymaly choc jedna ocene ponizej thresholdu

class AverageWithoutMiseryRecommender(Recommender):
    def __init__(self, score_threshold):
        self.name = 'average_without_misery'
        self.score_threshold = score_threshold
        
    def recommend(self, movies, ratings, group, size):
        return list(ratings.iloc[group].mean(axis=0).sort_values(ascending=False)[ratings.iloc[group].min(axis=0) >= self.score_threshold].index[:size])

In [22]:
# algorytm uwzgledniajacy preferencje tylko jednego uzytkownika w kazdej iteracji

class FairnessRecommender(Recommender):
    def __init__(self):
        self.name = 'fairness'
        
    def recommend(self, movies, ratings, group, size):
        result = []
        iter_idx = 0
        users_idx = {user_id: 0 for user_id in group}
        while len(result) < size:
            user_id = group[iter_idx % len(group)]
            top_movie = ratings.loc[user_id].sort_values(ascending=False).index[users_idx[user_id]]
            if top_movie not in result:
                result.append(top_movie)
            iter_idx += 1
            users_idx[user_id] += 1
        return result

In [24]:
# wybrany algorytm wyborczy (dyktatura, glosowanie proste, Borda, Copeland)

class VotingRecommender(Recommender):
    def __init__(self):
        self.name = 'voting'
    
    def recommend(self, movies, ratings, group, size):
        # worst user movie gets 0, second worst 1, etc.
        movies_scores = {movie: 0 for movie in movies}
        for user_id in group:
            for i, movie_id in enumerate(ratings.loc[user_id].sort_values(ascending=False).index):
                movies_scores[movie_id] += i
        return list(sorted(movies_scores, key=movies_scores.get, reverse=True)[:size])

In [26]:
# algorytm zachlanny, aproksymujacy metode Proportional Approval Voting
#   w kazdej iteracji wybieramy ten film, ktory najbardziej zwieksza zadowolenie zgodnie z punktacja PAV

class ProportionalApprovalVotingRecommender(Recommender):
    def __init__(self, threshold):
        self.threshold = threshold
        self.name = 'PAV'
        
    def recommend(self, movies, ratings, group, size):
        # first element gets 1 another one 1/2 another 1/3 etc.
        result = []
        movies_scores = {movie: 0 for movie in movies}
        for user_id in group:
            for i, movie_id in enumerate(ratings.loc[user_id].sort_values(ascending=False).index):
                movies_scores[movie_id] += 1 / (i + 1)
        while len(result) < size:
            best_movie = None
            best_score = 0
            for movie in movies:
                if movie in result:
                    continue
                score = movies_scores[movie]
                if score > best_score:
                    best_score = score
                    best_movie = movie
            result.append(best_movie)
        return result

## Część 3. - funkcje celu

In [12]:
# dwie funkcje pomocnicze:
#  - znajdujaca ulubione filmy danego uzytkownika
#  - obliczajaca sume ocen wystawionych przez uzytkownika wszystkim filmom w rekomendacji

def top_n_movies_for_user(ratings, movies, user_id, n):
    movies = []
    for movie_id in ratings.loc[user_id].sort_values(ascending=False).index:
        if len(movies) >= n:
            break
        movies.append(movie_id)
    return movies

def total_score(recommendation, user_id, ratings):
    return sum([ratings.loc[user_id][movie_id] for movie_id in recommendation])

In [13]:
# funkcja obliczajaca zadowolenie pojedynczego uzytkownika
#  - iloraz zadowolenia z wygenerowanej rekomendacji oraz zadowolenia z hipotetycznej rekomendacji idealnej
def overall_user_satisfaction(recommendation, user_id, movies, ratings):
    return total_score(recommendation, user_id, ratings) / total_score(top_n_movies_for_user(ratings, movies, user_id, len(recommendation)), user_id, ratings)

# funkcja celu - srednia z zadowolenia wszystkich uzytkownikow w grupie
def overall_group_satisfaction(recommendation, group, movies, ratings):
    return mean([overall_user_satisfaction(recommendation, user_id, movies, ratings) for user_id in group])

# funkcja celu - roznica miedzy maksymalnym i minimalnym zadowolenie w grupie
def group_disagreement(recommendation, group, movies, ratings):
    return max([overall_user_satisfaction(recommendation, user_id, movies, ratings) for user_id in group]) - min([overall_user_satisfaction(recommendation, user_id, movies, ratings) for user_id in group])

## Część 4. - Sequential Hybrid Aggregation

In [28]:
# algorytm balansujacy pomiedzy wyborem elementow o najwyzszej sredniej ocen
#   i o najwyzszej minimalnej ocenie
#   wyliczajacy w kazdej iteracji parametr alfa - jak na wykladzie
class SequentialHybridAggregationRecommender(Recommender):
    def __init__(self):
        self.name = 'sequential_hybrid_aggregation'

    def recommend(self, movies, ratings, group, size):
        # Calculate average score and least score for each movie in the group
        avg_score = ratings.loc[group].mean(axis=0)
        least_score = ratings.loc[group].min()
        alpha = 1

        # Create a dictionary to store scores for each movie
        score = {movie: 0 for movie in ratings.columns}

        recommendation = []

        # Iterate through the specified number of recommendations
        for _ in range(size):
            # Calculate the score for each movie using the weighted average
            score.update(
                {movie: (1 - alpha) * avg_score.loc[movie] + alpha * least_score.loc[movie] for movie in movies})

            # Set the score to -1 for movies already recommended
            score.update({movie: -1 for movie in recommendation})

            # Find the movie with the maximum score
            max_score_movie = max(score.items(), key=lambda k: k[1])[0]
            recommendation.append(max_score_movie)

            # Update alpha using the group disagreement function
            alpha = group_disagreement(recommendation, group, movies, ratings)

        return recommendation
    

## Część 5. - porównanie algorytmów

In [29]:
recommenders = [
    RandomRecommender(),
    AverageRecommender(),
    AverageWithoutMiseryRecommender(5),
    FairnessRecommender(),
    VotingRecommender(),
    ProportionalApprovalVotingRecommender(5),
    SequentialHybridAggregationRecommender()
]

recommendation_size = 10

# dla kazdego algorytmu:
#  - wygenerujmy jedna rekomendacje dla kazdej grupy
#  - obliczmy wartosci obu funkcji celu dla kazdej rekomendacji
#  - obliczmy srednia i odchylenie standardowe dla obu funkcji celu
#  - wypiszmy wyniki na konsole

for recommender in recommenders:
    print(f'\n\n{recommender.name}')
    overall_group_satisfactions = []
    group_disagreements = []
    for group in groups:
        recommendation = recommender.recommend(movies, ratings, group, recommendation_size)
        overall_group_satisfaction_value = overall_group_satisfaction(recommendation, group, movies, ratings)
        group_disagreement_value = group_disagreement(recommendation, group, movies, ratings)
        overall_group_satisfactions.append(overall_group_satisfaction_value)
        group_disagreements.append(group_disagreement_value)
        print(f'\nRecommendation: {recommendation}')
        print(f'Overall group satisfaction: {overall_group_satisfaction_value}')
        print(f'Group disagreement: {group_disagreement_value}')
    print(f'\nMean overall group satisfaction: {mean(overall_group_satisfactions)}')
    print(f'Standard deviation of overall group satisfaction: {stdev(overall_group_satisfactions)}')
    print(f'Mean group disagreement: {mean(group_disagreements)}')
    print(f'Standard deviation of group disagreement: {stdev(group_disagreements)}')



random

Recommendation: [79, 115819, 8391, 96411, 4844, 6820, 26828, 1043, 130052, 27311]
Overall group satisfaction: 0.8266190476190476
Group disagreement: 0.08214285714285707

Recommendation: [5116, 3813, 6157, 122906, 100383, 91660, 103335, 11, 33154, 113705]
Overall group satisfaction: 0.8097142857142857
Group disagreement: 0.21571428571428564

Recommendation: [2307, 3531, 59026, 971, 115210, 6232, 3808, 947, 116505, 8614]
Overall group satisfaction: 0.5840000000000001
Group disagreement: 0.25000000000000006

Recommendation: [104, 2822, 4831, 26776, 27769, 89837, 90243, 66785, 1807, 7620]
Overall group satisfaction: 0.6880000000000001
Group disagreement: 0.15000000000000002

Recommendation: [5445, 7984, 170597, 178, 123200, 6788, 25788, 5787, 32584, 8454]
Overall group satisfaction: 0.61
Group disagreement: 0.29

Recommendation: [7782, 4723, 165947, 1394, 50685, 27865, 428, 151, 1719, 122898]
Overall group satisfaction: 0.612
Group disagreement: 0.15999999999999992

Recommendatio