## Getting one example of a preferred book per user

The data is extracted from the user/book rating matrix.
For each user, all books rated 4 or higher (the maximum on the Goodreads scale is 5) are collected, and one of them is picked at random.
The output is a CSV file with the columns user_id, book_id and title.
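
The selection described above can also be expressed without an explicit loop. The following is a minimal sketch on made-up toy data, not the code used in this notebook; it assumes a pandas version that provides `DataFrameGroupBy.sample` (1.1 or later) and reuses the column names `user_id`, `book_id_gr` and `rating` from the ratings file.

```python
import pandas as pd

# Toy ratings table; the column names mirror the ones used in this notebook,
# the values are made up for illustration.
ratings = pd.DataFrame({
    'user_id':    [1, 1, 1, 2, 2, 3],
    'book_id_gr': [10, 11, 12, 10, 13, 14],
    'rating':     [5, 4, 2, 3, 4, 5],
})

# Keep only liked books (rating of 4 or 5)
liked = ratings[ratings['rating'] >= 4]

# Draw one liked book per user; random_state makes the draw repeatable
one_per_user = (
    liked.groupby('user_id')
         .sample(n=1, random_state=0)
         .reset_index(drop=True)
)
print(one_per_user)
```

Each user with at least one liked book ends up with exactly one row; users with no rating of 4 or higher drop out at the filtering step.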

In [None]:
# Import Libraries
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import random
from collections import Counter
%matplotlib inline

In [None]:
def get_random_good_book(df):
    """ Pick one random liked book per user

    Args:
        df (:obj:`DataFrame`): pandas DataFrame with the user id in the first
            column and the list of liked book ids in the second column
    Returns:
        :obj:`dict`: dictionary mapping each user id to one randomly picked book id
    """
    user_gb = {}
    for i in range(df.shape[0]):
        # Positional access: column 0 is the user id, column 1 the list of books
        user_gb[df.iloc[i, 0]] = random.choice(df.iloc[i, 1])
    return user_gb
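
Because the pick relies on the unseeded global `random` module, every run of the notebook may produce a different book per user. If a repeatable output file is wanted, one option is a seeded local generator. The sketch below is a hypothetical variant (the name `pick_random_liked_book` and the `seed` parameter are not part of the notebook) and runs on made-up toy data.

```python
import random

import pandas as pd


def pick_random_liked_book(df, seed=None):
    """Hypothetical variant: first column is the user id, second column a list of book ids."""
    rng = random.Random(seed)  # local generator, leaves the global random state untouched
    return {user: rng.choice(books)
            for user, books in zip(df.iloc[:, 0], df.iloc[:, 1])}


# Toy input in the same shape as the grouped DataFrame built in this notebook
toy = pd.DataFrame({
    'user_id': [1, 2],
    'list_books_4': [[10, 11, 12], [13]],
})

first = pick_random_liked_book(toy, seed=42)
second = pick_random_liked_book(toy, seed=42)
print(first == second)  # same seed, same picks
```

Passing `seed=None` keeps the behaviour non-deterministic, as in the function above.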

In [3]:
# Read in data of users ratings and book id
user_all = pd.read_csv('user_id_rating_book_all.csv')

# Extract books with high ratings
max_ratings = user_all[user_all['rating'] >= 4]
# Group by user id and add column list_books_4
test = max_ratings.groupby('user_id')['book_id_gr'].apply(list).reset_index(name='list_books_4')

In [5]:
# Get the random (good) book and create new DataFrame
user_gb = get_random_good_book(test)
df_user_gb = pd.DataFrame(list(user_gb.items()), columns=['user_id', 'book_id'])

In [6]:
# Combine book id with title 
bookssf = pd.read_csv('books-scifi-authors.csv')
user_book_title = bookssf[bookssf['book_id'].isin(list(user_gb.values()))]
# Keep only the book_id and title columns by dropping all others by position
user_book = user_book_title.drop(user_book_title.columns[[0, 1, 3, 4, 5, 6, 7, 8]], axis=1)

u_b = pd.merge(df_user_gb, user_book, on='book_id')
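
`pd.merge` performs an inner join by default, so a user whose picked book has no entry in the titles file silently disappears from the result. A quick way to check for this is a left join with `indicator=True`. The sketch below uses made-up toy frames (`picks` and `titles` are illustrative names, not variables from this notebook).

```python
import pandas as pd

# Toy stand-ins for the picked books and the title lookup table
picks = pd.DataFrame({'user_id': [1, 2, 3], 'book_id': [10, 13, 99]})
titles = pd.DataFrame({'book_id': [10, 13], 'title': ['Book A', 'Book B']})

# Left join keeps every user; the _merge column reports where each row came from.
# validate='many_to_one' raises an error if a book_id appears twice in titles.
checked = pd.merge(picks, titles, on='book_id', how='left',
                   indicator=True, validate='many_to_one')

# Rows marked 'left_only' are picks with no matching title
missing = checked[checked['_merge'] == 'left_only']
print(len(missing), "user(s) without a matching title")
```

If `missing` is empty, the inner join used above loses no users.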

In [7]:
u_b.head(5)

Unnamed: 0,user_id,book_id,title
0,1073,9623,Dandelion Wine
1,1272,5863651,"Blood of Ambrose (Morlock Ambrosius, #1)"
2,1440,92769,"Heir of Sea and Fire (Riddle-Master, #2)"
3,125382,92769,"Heir of Sea and Fire (Riddle-Master, #2)"
4,1639,3852641,"Little Brother (Little Brother, #1)"


In [100]:
u_b.to_csv('user_book_fav.csv', index=False)