Skip to content

Final code submission for the master thesis "Cultural differences in various Wikipedia language versions - An investigation using a movie recommender system"

Notifications You must be signed in to change notification settings

voitijaner/Movie-RSs-Master-Thesis-Submission-Voit

Repository files navigation

Movie-RSs-Master-Thesis-Submission-Voit

Final code submission for the master thesis "Cultural differences in various Wikipedia language versions - An investigation using a movie recommender system"

Python environemnts

  1. For data transformations, the RDF2Vec implementations (https://github.com/IBCNServices/pyRDF2Vec/tree/4c534b4e1e6df8e6c2f0ae18a43283d5ce320858. Version from Mar 29, 2020) and the similar movie extraction a python 3.7 eviroment is required. This includes the following files:
  • side_data_to_movie_list.py
  • similar_movie_extractions.py
  • statistic_calculations.py
  1. The movie GraphLab Create recommender system needs a python 2.7 environemt. Additional requirements can be found in https://turi.com/download/install-graphlab-create-command-line.html. This environment is used for the following files:
  • item_similarity_recommender_genre_performance.py
  • item_similarity_recommender_number_of_recommendations_productionCountry_genres.py
  • item_similarity_recommender_overall_performance.py
  • item_similarity_recommender_year_performance.py

Intital data which is needed to run the program

Note: The files 1. lang_mapping_list_2016.pkl and 2. final_movie_genre_year_county_list.csv / .pkl already exist. They will be overwritten by 1. similar_movie_extractions.py and 2. side_data_to_movie_list.py files, but are needed for the opposite program part.

├───data
│   ├───cosine_similarities
│   ├───lang_versions # https://wiki.dbpedia.org/downloads-2016-10 (Mappingbased Objects and Instance Types Transitive for each langauge version, .ttl format)
│   │   └───mapping_to_other_languages 	# https://wiki.dbpedia.org/downloads-2016-10 (Interlanguage Links for the english language version, .ttl format)
│   ├───movielense	# https://grouplens.org/datasets/movielens/1m/ (all files, without folder)
│   │   └───movielense_mapping	# https://github.com/sisinflab/LODrecsys-datasets/blob/master/Movielens1M/MappingMovielens2DBpedia-1.2.tsv (renamed to movielensmapping.csv)
│   └───similar_movies
│       └───genre

File definition

Each file contains method definitions or program parts, even if they are duplicated in other files. With this techinique so it is easier to review and understand each of the submitted files individually.

  1. similar_movie_extractions.py
    Creates with the use of RDF2Vec python library mentioned above the embeddings for the movies and stores the similar movie lists.

  2. side_data_to_movie_list.py
    Transforms the MovieLense movie side data and enriches it with production country informations from Wikidata.

  3. statistic_calculations.py
    Calculates the sparsity of the data sets at different points of data preprocessing.

  4. item_similarity_recommender_overall_performance.py
    Calculates the overall performance over the whole data set for each language version with different k nearest neighbours

  5. item_similarity_recommender_number_of_recommendations_productionCountry_genres.py
    Calculates the number of recommended production countries and genres per language version over the whole dataset

  6. item_similarity_recommender_genre_performance.py
    Calculates the genre specific performances for each language version.

  7. item_similarity_recommender_year_performance.py
    Calculates the year specific performance for each langauge verison.

Order of execution

The file 1) similar_movie_extractions.py has to be run before the item_similarity_recommender files 4) - 6)

About

Final code submission for the master thesis "Cultural differences in various Wikipedia language versions - An investigation using a movie recommender system"

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages