# Publication recommendation system

## 1. Read the dataset linking Wikipedia articles to publications and build a bipartite graph of the relation

In [1]:
import os
import sys
import numpy as np
import pandas as pd
import networkx as nx
%matplotlib inline
import matplotlib.pyplot as plt

In [2]:
%load_ext autoreload
%autoreload 2

### Read Wikipedia references from a TSV file

In [3]:
base_path = '../data/raw'
processed_path = '../data/processed'

In [4]:
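# Read the TSV data and parse the timestamp column as datetimes
# (infer_datetime_format is deprecated since pandas 2.0 and no longer needed)
df = pd.read_csv(os.path.join(base_path, 'enwiki.tsv'), sep='\t', parse_dates=['timestamp'])

# pandas parses the page title "NaN" (a real Wikipedia page name) as a missing value;
# restore it as the string 'NaN'
df.page_title = df.page_title.fillna("NaN")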

df.head(5)

| | page_id | page_title | rev_id | timestamp | type | id |
|---|---|---|---|---|---|---|
| 0 | 2867096 | Mu Aquilae | 503137751 | 2012-07-19 16:08:41 | doi | 10.1051/0004-6361:20078357 |
| 1 | 2867096 | Mu Aquilae | 508363722 | 2012-08-20 22:56:21 | arxiv | astro-ph/0604502 |
| 2 | 2867096 | Mu Aquilae | 508363722 | 2012-08-20 22:56:21 | arxiv | astro-ph/0003329 |
| 3 | 2867096 | Mu Aquilae | 508363722 | 2012-08-20 22:56:21 | arxiv | 0708.1752 |
| 4 | 2867096 | Mu Aquilae | 503137751 | 2012-07-19 16:08:41 | doi | 10.1051/0004-6361:20064946 |
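
The `fillna("NaN")` step is needed because `pd.read_csv` treats the literal string `NaN` as a missing-value marker by default. The following minimal sketch reproduces the effect on a tiny made-up table (the rows and identifiers are illustrative, not taken from `enwiki.tsv`) and shows `keep_default_na=False` as one possible alternative to the fix used above.

```python
import io

import pandas as pd

# Made-up TSV with a few of the enwiki.tsv columns; the second row stands in
# for the Wikipedia article whose title is literally "NaN".
sample_tsv = (
    "page_id\tpage_title\ttype\tid\n"
    "1\tMu Aquilae\tdoi\t10.1000/example-1\n"
    "2\tNaN\tisbn\t0000000000\n"
)

# Default parsing: the string "NaN" is read as a missing value.
parsed = pd.read_csv(io.StringIO(sample_tsv), sep="\t")
n_missing = int(parsed.page_title.isna().sum())

# Fix used in this notebook: turn the missing title back into the string "NaN".
restored = parsed.page_title.fillna("NaN")

# Alternative: switch off the default missing-value markers while reading.
# This applies to every column, not only page_title.
raw = pd.read_csv(io.StringIO(sample_tsv), sep="\t", keep_default_na=False)

print(n_missing)
print(restored.tolist())
print(raw.page_title.tolist())
```

The `fillna` approach only touches `page_title`, which is why it may be the safer choice when other columns contain genuinely missing values.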


Add a new book reference to the DataFrame: the page "Lager" citing ISBN 9780937381502.

In [5]:
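# One-row DataFrame with the same columns as df; the timestamp is stored as a
# pd.Timestamp so the column keeps its datetime dtype after concatenation
df_new = pd.DataFrame({
    'page_id': [19555312],
    'page_title': ['Lager'],
    'rev_id': [0],
    'timestamp': [pd.Timestamp('2015-02-09 17:01:25')],
    'type': ['isbn'],
    'id': ['9780937381502'],
})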

In [6]:
# DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent call
df = pd.concat([df, df_new], ignore_index=True)

In [7]:
book_title = 'Designing Great Beers: The Ultimate Guide to Brewing Classic Beer Styles'
df[df.id == '9780937381502']

| | page_id | page_title | rev_id | timestamp | type | id |
|---|---|---|---|---|---|---|
| 390251 | 5963160 | Mash ingredients | 640564057 | 2015-01-01 21:40:40 | isbn | 9780937381502 |
| 3594658 | 812938 | India pale ale | 646372580 | 2015-02-09 17:24:48 | isbn | 9780937381502 |
| 3794695 | 19555312 | Lager | 0 | 2015-02-09 17:01:25 | isbn | 9780937381502 |


### Create a directed bipartite graph of references from Wikipedia pages to publications

**Create a bipartite graph connecting wiki pages and publications**

In [43]:
# Import the project module containing functions for reading data from Wikipedia
# and working with the graph-based recommendation system
sys.path.append('../src')
from recomm.graph_rank import GraphRank

Create a GraphRank object, a graph-based model for publication recommendation.

In [9]:
gr = GraphRank()

In [10]:
gr.build_graph(df, 'page_title', 'page_id', 'type', 'id')
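
The internals of `GraphRank` are not shown in this notebook. As a rough illustration of the idea, the sketch below builds a directed bipartite graph with `networkx` from a small made-up reference table and counts how often other publications are cited by the same pages. The node naming (`('page', page_id)` for pages, `(type, id)` for publications), the helper `co_cited`, and the co-citation count are assumptions made for this example, not the module's actual implementation or its ranking formula.

```python
from collections import Counter

import networkx as nx
import pandas as pd

# Made-up reference table with the same columns that are passed to build_graph.
refs = pd.DataFrame({
    "page_id": [1, 1, 2, 2, 2, 3],
    "page_title": ["Page A", "Page A", "Page B", "Page B", "Page B", "Page C"],
    "type": ["isbn", "isbn", "isbn", "isbn", "doi", "isbn"],
    "id": ["111", "222", "111", "222", "10.1/x", "222"],
})

# Directed bipartite graph: every edge points from a page to a publication.
graph = nx.DiGraph()
for row in refs.itertuples(index=False):
    page = ("page", row.page_id)        # hypothetical page node key
    publication = (row.type, row.id)    # publication node key, e.g. ('isbn', '111')
    graph.add_node(page, bipartite=0, title=row.page_title)
    graph.add_node(publication, bipartite=1)
    graph.add_edge(page, publication)


def co_cited(graph, publication):
    """Count, for each other publication, the pages that cite both it and `publication`."""
    counts = Counter()
    for page in graph.predecessors(publication):   # pages citing the publication
        for other in graph.successors(page):       # everything those pages cite
            if other != publication:
                counts[other] += 1
    return counts.most_common()


ranking = co_cited(graph, ("isbn", "111"))
print(ranking)
```

Storing publications as `(type, id)` tuples mirrors the identifiers used in the calls below, such as `('isbn', '9780937381502')`.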

**Test of the recommendation system**

* Publications related to "Designing Great Beers"

In [44]:
gr.find_most_relevant(('isbn','9780937381502'), 10)

Original publication: ('isbn', '9780937381502') 
Title: Designing Great Beers: The Ultimate Guide to Brewing Classic ... 


3 pages referring to the publication:
 ['Mash ingredients', 'India pale ale', 'Lager'] 


Number of categories for level 2 publications: 3
Rank: 1 
Citations: 6
ID: ('isbn', '0195367138')
Source: https://books.google.com/books?isbn=0195367138
Title: The Oxford Companion to Beer 

Rank: 2 
Citations: 2
ID: ('isbn', '9781466881952')
Source: https://books.google.com/books?isbn=9781466881952
Title: The Encyclopedia of Beer: The Beer Lover's Bible - A ... 

Rank: 3 
Citations: 2
ID: ('isbn', '9780299188948')
Source: https://books.google.com/books?isbn=9780299188948
Title: The Best Breweries and Brewpubs of Illinois: Searching for ... 

Rank: 4 
Citations: 2
ID: ('isbn', '9780865715561')
Source: https://books.google.com/books?isbn=9780865715561
Title: Fermenting Revolution: How to Drink Beer and Save the World 

Rank: 5 
Citations: 1
ID: ('isbn', '9780937381694')
Source