wiflore/IBM_Articles_Recomender

Recommendations with IBM

Introduction

For this project I analyzed the interactions that users have with articles on the IBM Watson Studio platform and made recommendations to them about new articles they might like. Below is an example of what the dashboard could look like displaying articles on the IBM Watson Platform.

Though the above dashboard just shows the newest articles, you could imagine having a recommendation board available here that shows the articles most pertinent to a specific user.

To determine which articles to show to each user, I performed a study of the data available on the IBM Watson Studio platform. I created an account on the platform here to become part of the community and get a better understanding of the data.

Tasks

I. Exploratory Data Analysis

Before making recommendations of any kind, I explored the data. There are some basic, required questions about the data that are answered throughout the rest of the notebook. I used this space to explore before diving into the details of the recommendation system in the later sections.

II. Rank Based Recommendations

To get started building recommendations, I first found the most popular articles based simply on the number of interactions. Since there are no ratings for any of the articles, I assumed that the articles with the most interactions are the most popular. These are the articles we might recommend to new users (or to anyone, depending on what we know about them).
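The ranking idea can be sketched in a few lines. This is a minimal illustration, not the notebook's code: the `(user_id, article_id)` log, the article ids, and the `top_articles` helper are all hypothetical.

```python
from collections import Counter


def top_articles(interactions, n=3):
    """Return the n article ids with the most interactions, most popular first."""
    counts = Counter(article_id for _user_id, article_id in interactions)
    # Sort by count (descending), then by article id so ties are deterministic
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [article_id for article_id, _count in ranked[:n]]


# Hypothetical interaction log: one (user_id, article_id) pair per view
interactions = [
    (1, "a10"), (2, "a10"), (3, "a10"),
    (1, "a20"), (2, "a20"),
    (3, "a30"),
]

top = top_articles(interactions, n=2)
print(top)
```

Because the ranking ignores who is asking, the same list is returned for every user, which is why it suits new users with no history.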

III. User-User Based Collaborative Filtering

To build better recommendations for the users of IBM's platform, we could look at users who are similar in terms of the items they have interacted with. The items one user has interacted with could then be recommended to similar users who have not yet seen them. This would be a step in the right direction towards more personal recommendations for the users.
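A rough sketch of this idea, assuming similarity is measured as the number of articles two users share. The data, the helper names, and the choice of similarity measure are assumptions made for illustration and may differ from the notebook.

```python
def build_user_items(interactions):
    """Map each user id to the set of article ids they interacted with."""
    user_items = {}
    for user_id, article_id in interactions:
        user_items.setdefault(user_id, set()).add(article_id)
    return user_items


def similar_users(user_id, user_items):
    """Other users with at least one shared article, most overlap first."""
    seen = user_items[user_id]
    scores = [
        (len(seen & items), other)
        for other, items in user_items.items()
        if other != user_id
    ]
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [other for overlap, other in scores if overlap > 0]


def user_user_recs(user_id, user_items, m=5):
    """Recommend up to m unseen articles, taken from the most similar users first."""
    seen = user_items[user_id]
    recs = []
    for other in similar_users(user_id, user_items):
        for article_id in sorted(user_items[other] - seen):
            if article_id not in recs:
                recs.append(article_id)
            if len(recs) == m:
                return recs
    return recs


# Hypothetical interaction log
interactions = [
    (1, "a10"), (1, "a20"),
    (2, "a10"), (2, "a20"), (2, "a30"),
    (3, "a20"), (3, "a40"),
    (4, "a50"),
]
user_items = build_user_items(interactions)
recs = user_user_recs(1, user_items, m=5)
print(recs)
```

User 4 shares nothing with user 1, so "a50" is never suggested; a user with no neighbours at all would get an empty list and could fall back to the rank-based approach.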

IV. Content Based Recommendations

Many machine learning algorithms require the input to be represented as a fixed-length feature vector. When it comes to texts, one of the most common fixed-length features is bag-of-words. Despite their popularity, bag-of-words features have two major weaknesses: they lose the ordering of the words and they also ignore the semantics of the words. For example, "powerful," "strong" and "Paris" are equally distant. Paragraph Vector is an unsupervised algorithm that learns fixed-length feature representations from variable-length pieces of text, such as sentences, paragraphs, and documents. The algorithm represents each document by a dense vector which is trained to predict words in the document. Its construction gives the algorithm the potential to overcome the weaknesses of bag-of-words models. The authors report empirical results in which Paragraph Vectors outperform bag-of-words models as well as other techniques for text representations, and in which the method achieves new state-of-the-art results on several text classification and sentiment analysis tasks.
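Both bag-of-words weaknesses can be checked on a toy example. The sketch below uses only the standard library; the three-word vocabulary and the two sample sentences are made up for illustration.

```python
import math
from collections import Counter

# Toy vocabulary: each word gets its own dimension in a bag-of-words vector
vocab = ["powerful", "strong", "Paris"]


def one_hot(word):
    """Bag-of-words vector of a single word over the toy vocabulary."""
    return [1.0 if w == word else 0.0 for w in vocab]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Weakness 1: semantics are ignored, so every pair of distinct words is equally far apart
d_powerful_strong = euclidean(one_hot("powerful"), one_hot("strong"))
d_powerful_paris = euclidean(one_hot("powerful"), one_hot("Paris"))
d_strong_paris = euclidean(one_hot("strong"), one_hot("Paris"))

# Weakness 2: word order is lost, so these two sentences get identical counts
bag_a = Counter("dog bites man".split())
bag_b = Counter("man bites dog".split())

print(d_powerful_strong, d_powerful_paris, d_strong_paris)
print(bag_a == bag_b)
```

A learned dense representation such as Paragraph Vector is meant to place "powerful" closer to "strong" than to "Paris", which a one-dimension-per-word encoding cannot do.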

Doc2vec (aka paragraph2vec, aka sentence embeddings) extends the word2vec algorithm to unsupervised learning of continuous representations for larger blocks of text, such as sentences, paragraphs, or entire documents.

https://arxiv.org/abs/1405.4053
https://rare-technologies.com/doc2vec-tutorial/

V. Matrix Factorization

Finally, I took a machine learning approach to building recommendations. Using the user-item interactions, I built a matrix decomposition. With the decomposition, I got an idea of how well I could predict new articles an individual might interact with (spoiler alert: it isn't great). I then discussed which methods I might use moving forward and how I might test how well the recommendations work for engaging users.
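As a rough sketch of the decomposition step, the example below applies NumPy's singular value decomposition to a small, made-up user-item matrix and measures how well the top k latent features reconstruct it. Choosing SVD, the matrix values, and the `reconstruct` helper are assumptions for illustration; the notebook's train/test evaluation is not reproduced here.

```python
import numpy as np

# Hypothetical user-item matrix: rows are users, columns are articles,
# 1.0 means the user interacted with the article
user_item = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
], dtype=float)

# Decompose into user factors (u), singular values (s), and article factors (vt)
u, s, vt = np.linalg.svd(user_item, full_matrices=False)


def reconstruct(k):
    """Rebuild the user-item matrix from the top k latent features."""
    return u[:, :k] @ np.diag(s[:k]) @ vt[:k, :]


# Frobenius reconstruction error for each number of latent features
errors = [float(np.linalg.norm(user_item - reconstruct(k))) for k in range(1, 5)]
for k, err in enumerate(errors, start=1):
    print(f"k={k}: reconstruction error {err:.3f}")
```

Reconstruction error on the same matrix always shrinks as k grows, so it says little about prediction quality; judging how well new interactions are predicted needs held-out data, which is where results tend to look less impressive.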
