Topic-Level Bayesian Surprise and Serendipity for Recommender Systems

This repository contains the code and dataset for the paper "Topic-Level Bayesian Surprise and Serendipity for Recommender Systems", Tonmoy Hasan and Razvan Bunescu, ACM RecSys 2023.

Machine Configuration:

The experiments were conducted on a distributed system utilizing a cluster of multiple computers, typically ranging from 10 to 30, which operated simultaneously. Each computer had the following configuration:

Dual 24-Core Intel Xeon Gold 6248R CPU @ 3.00GHz (48 cores per node)
384GB RAM (8GB per core)
100GBit EDR Infiniband Interconnect

Running the code on a personal computer or a single machine may significantly increase the processing time, possibly taking 5-10 days to complete some operations. So it is recommended to run the code on a distributed system if you have the opportunity. Please feel free to reach out to us regarding this.

Experiment Replication:

To replicate our experiments, follow the steps below:

Get Raw Dataset:
- From the following link, download the raw dataset named "goodread_reviews_dedup.json.gz" that we have used in our experiments: Raw Dataset
Create the 'Books' Set:
- Option 1: Use the provided code "create_Books_set.py" from the "data_creation" folder to create the "Books" set, where each file represents a book with a maximum of 10,000 tokens.
- Option 2: Download the pre-created "Books" set from the following link: Books
Create the 'Users' Set:
- Option 1: Use the provided code "create_Users_set.py" from the "data_creation" folder to create the "Users" set, where each file represents a user's reading history. Each user includes the following fields: "book_id", "user_id", "rating", "timestring", "timestamp".
- Option 2: Download the pre-created "Users" set from the following link: Users
Train the LDA Model:
- Option 1: After creating the dataset, train an LDA model using the books that contain at least 1000 tokens. You can use the provided code "train_lda_model.py"
- Option 2: Use our pre-trained model available in the following link: pre_trained_lda_model
Topic Distribution Creation:
- Use the pre-trained LDA model to generate topic distributions for all the books in the "Books" set. Save the topic distributions in the "topic_distrs" folder. You can accomplish this by running the "create_topic_distr.py" file.
Create User Preferences:
- We generated time series of user preferences for all 26,374 users using five different models separately, three of which are based on online learning of normally distributed preference vectors. You can generate user preferences for each model using the provided code in the "create_user_preferences" folder.
Surprising and Serendipitous Item Recommendation:
- Run the code from the "surprise" and "serendipity" folders to get the final results.

Annotated Dataset:

The repository also includes the annotated dataset for four reference users. The files "User1.txt," "User2.txt," "User3.txt," and "User4.txt" contain human annotations for these users. Each file provides the following information in chronological order of the users' reading history:

Book_number
Book_id
Book_title
Human_identified_main_topic
Summary
Human decision
Reason

For books that are surprising in terms of their topic, the annotation for "Human decision" is "Surprising", otherwise, the annotation is "Not surprising."

Name		Name	Last commit message	Last commit date
Latest commit History 21 Commits
annotated_users		annotated_users
create_user_preferences		create_user_preferences
data_creation		data_creation
model		model
serendipity		serendipity
surprise		surprise
topic_distribution_creation		topic_distribution_creation
LICENSE		LICENSE
README.md		README.md
bayesian_surprise.py		bayesian_surprise.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

annotated_users

annotated_users

create_user_preferences

create_user_preferences

data_creation

data_creation

model

model

serendipity

serendipity

surprise

surprise

topic_distribution_creation

topic_distribution_creation

LICENSE

LICENSE

README.md

README.md

bayesian_surprise.py

bayesian_surprise.py

Repository files navigation

Topic-Level Bayesian Surprise and Serendipity for Recommender Systems

Machine Configuration:

Experiment Replication:

Get Raw Dataset:

Create the 'Books' Set:

Create the 'Users' Set:

Train the LDA Model:

Topic Distribution Creation:

Create User Preferences:

Surprising and Serendipitous Item Recommendation:

Annotated Dataset:

If you have any questions or need assistance, please feel free to reach out to us at thasan1@charlotte.edu

About

Releases

Packages

Languages

License

Ton-moy/surprise-and-serendipity

Folders and files

Latest commit

History

Repository files navigation

Topic-Level Bayesian Surprise and Serendipity for Recommender Systems

Machine Configuration:

Experiment Replication:

Get Raw Dataset:

Create the 'Books' Set:

Create the 'Users' Set:

Train the LDA Model:

Topic Distribution Creation:

Create User Preferences:

Surprising and Serendipitous Item Recommendation:

Annotated Dataset:

If you have any questions or need assistance, please feel free to reach out to us at thasan1@charlotte.edu

About

Resources

License

Stars

Watchers

Forks

Languages