Recommendations with IBM

Introduction

For this project I analyzed the interactions that users have with articles on the IBM Watson Studio platform and made recommendations to them about new articles they might like. Below is an example of what a dashboard displaying articles on the IBM Watson platform could look like.

Though the above dashboard just shows the newest articles, one could imagine a recommendation board here that shows the articles most pertinent to a specific user.

To determine which articles to show to each user, I performed a study of the data available on the IBM Watson Studio platform. I created an account on the platform to become part of its community and to get a better understanding of its data.

Tasks

I. Exploratory Data Analysis

Before making recommendations of any kind, I explored the data. The notebook poses some basic, required questions about the data that the rest of the analysis depends on. I used this section to explore before diving into the details of the recommendation system in the later sections.

II. Rank Based Recommendations

To get started building recommendations, I first found the most popular articles based simply on the number of interactions. Since there are no ratings for any of the articles, I assumed that the articles with the most interactions are the most popular. These are the articles we might recommend to new users (or to anyone, depending on what we know about them).
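The ranking idea can be illustrated with a minimal sketch. The `(user_id, article_id)` pair layout and the function name `top_articles` are assumptions made for this example, not names taken from the notebook.

```python
from collections import Counter


def top_articles(interactions, n=3):
    """Return the n article ids with the most interactions, most popular first."""
    # Count one interaction per (user_id, article_id) row.
    counts = Counter(article_id for _user_id, article_id in interactions)
    # Sort by count (descending), then by article id so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [article_id for article_id, _count in ranked[:n]]


# Hypothetical toy data: each tuple is (user_id, article_id).
interactions = [
    (1, "a"), (2, "a"), (3, "a"),
    (1, "b"), (2, "b"),
    (3, "c"),
]

print(top_articles(interactions, n=2))  # ['a', 'b']
```

Because the ranking ignores who the user is, it works for users with no history, which is why it suits the new-user case described above.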

III. User-User Based Collaborative Filtering

To build better recommendations for the users of IBM's platform, we can look at users who are similar in terms of the items they have interacted with. Items that one user has read can then be recommended to similar users who have not yet seen them. This is a step towards more personal recommendations.
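A minimal sketch of this approach follows. It assumes similarity is measured by the number of articles two users share; the notebook may use a different measure, and the function names and toy data here are hypothetical.

```python
from collections import defaultdict


def build_user_items(interactions):
    """Map each user id to the set of article ids they interacted with."""
    user_items = defaultdict(set)
    for user_id, article_id in interactions:
        user_items[user_id].add(article_id)
    return user_items


def similar_users(user_id, user_items):
    """Return other users ordered by how many articles they share with user_id."""
    target = user_items[user_id]
    scores = []
    for other, items in user_items.items():
        if other == user_id:
            continue
        overlap = len(target & items)
        if overlap > 0:
            scores.append((overlap, other))
    # Largest overlap first; break ties by user id for determinism.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [other for _overlap, other in scores]


def user_user_recs(user_id, user_items, m=5):
    """Recommend up to m unseen articles taken from the most similar users."""
    seen = user_items[user_id]
    recs = []
    for other in similar_users(user_id, user_items):
        for article_id in sorted(user_items[other] - seen):
            if article_id not in recs:
                recs.append(article_id)
            if len(recs) == m:
                return recs
    return recs


# Hypothetical toy data: each tuple is (user_id, article_id).
interactions = [
    (1, "a"), (1, "b"),
    (2, "a"), (2, "b"), (2, "c"),
    (3, "a"), (3, "d"),
    (4, "e"),
]
user_items = build_user_items(interactions)

print(user_user_recs(1, user_items))  # ['c', 'd']
```

User 2 shares two articles with user 1 and user 3 shares one, so article `c` is suggested before `d`. User 4 has no overlap and contributes nothing.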

IV. Content Based Recommendations

Many machine learning algorithms require the input to be represented as a fixed-length feature vector. When it comes to texts, one of the most common fixed-length features is bag-of-words. Despite their popularity, bag-of-words features have two major weaknesses: they lose the ordering of the words and they ignore the semantics of the words. For example, "powerful," "strong" and "Paris" are equally distant. Paragraph Vector is an unsupervised algorithm that learns fixed-length feature representations from variable-length pieces of text, such as sentences, paragraphs, and documents. The algorithm represents each document by a dense vector which is trained to predict words in the document. Its construction gives the algorithm the potential to overcome the weaknesses of bag-of-words models. Empirical results show that Paragraph Vectors outperform bag-of-words models as well as other techniques for text representations. Finally, it achieves new state-of-the-art results on several text classification and sentiment analysis tasks.

Doc2vec (aka paragraph2vec, aka sentence embeddings) adapts the word2vec algorithm to the unsupervised learning of continuous representations for larger blocks of text, such as sentences, paragraphs or entire documents.
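The two bag-of-words weaknesses mentioned above can be seen in a small standard-library sketch. The helper names and example sentences are made up for illustration. This is not the Paragraph Vector algorithm itself, only a demonstration of the problem it addresses.

```python
import math
from collections import Counter
from itertools import combinations


def bag_of_words(text, vocabulary):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


def one_hot(word, vocabulary):
    """Represent a single word as a bag-of-words vector over the vocabulary."""
    return [1 if entry == word else 0 for entry in vocabulary]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Weakness 1: word order is lost, so opposite meanings get identical vectors.
doc_a = "the dog bit the man"
doc_b = "the man bit the dog"
vocabulary = sorted(set(doc_a.split()) | set(doc_b.split()))
vec_a = bag_of_words(doc_a, vocabulary)
vec_b = bag_of_words(doc_b, vocabulary)
print(vec_a == vec_b)  # True

# Weakness 2: semantics are ignored, so every pair of distinct words is
# equally far apart, whether or not the words are related in meaning.
words = ["paris", "powerful", "strong"]
distances = [
    euclidean(one_hot(a, words), one_hot(b, words))
    for a, b in combinations(words, 2)
]
print(len(set(distances)) == 1)  # True
```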

https://arxiv.org/abs/1405.4053
https://rare-technologies.com/doc2vec-tutorial/

V. Matrix Factorization

Finally, I took a machine learning approach to building recommendations. Using the user-item interactions, I built a matrix decomposition. With the decomposition, I got an idea of how well I could predict new articles an individual might interact with (spoiler alert: not very well). I then discussed which methods I might use moving forward and how I might test how well the recommendations engage users.
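To make the idea of decomposing a user-item matrix concrete, here is a hedged standard-library sketch. It factorizes a small 0/1 interaction matrix into user and item factors with stochastic gradient descent. This is a stand-in technique chosen so the example needs no third-party packages; the source does not say which decomposition the notebook uses, and the matrix, function names, and hyperparameters below are all assumptions.

```python
import random


def factorize(matrix, k=2, steps=2000, lr=0.05, seed=0):
    """Approximate matrix as U times V-transposed using k latent features."""
    rng = random.Random(seed)
    n_users, n_items = len(matrix), len(matrix[0])
    U = [[rng.uniform(0, 0.5) for _ in range(k)] for _ in range(n_users)]
    V = [[rng.uniform(0, 0.5) for _ in range(k)] for _ in range(n_items)]
    for _ in range(steps):
        for i in range(n_users):
            for j in range(n_items):
                err = matrix[i][j] - predict(U, V, i, j)
                for f in range(k):
                    u, v = U[i][f], V[j][f]
                    # Gradient step on the squared error for this entry.
                    U[i][f] += lr * err * v
                    V[j][f] += lr * err * u
    return U, V


def predict(U, V, i, j):
    """Predicted interaction score for user i and item j."""
    return sum(u * v for u, v in zip(U[i], V[j]))


def squared_error(matrix, U, V):
    """Sum of squared differences between the matrix and its reconstruction."""
    return sum(
        (matrix[i][j] - predict(U, V, i, j)) ** 2
        for i in range(len(matrix))
        for j in range(len(matrix[0]))
    )


# Hypothetical interaction matrix: rows are users, columns are articles,
# 1 means the user interacted with the article and 0 means they did not.
matrix = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

U0, V0 = factorize(matrix, steps=0)  # untrained factors (same seed)
U, V = factorize(matrix)             # trained factors
error_before = squared_error(matrix, U0, V0)
error_after = squared_error(matrix, U, V)
print(error_after < error_before)
```

In a real evaluation the error would be measured on held-out interactions rather than on the training matrix, which is where limited predictive power tends to show up.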

wiflore/IBM_Articles_Recomender
