The goal of this notebook is to build functions that recommend internships from the data we scraped, and then to choose the function that makes the best recommendations.

In [1]:
import pandas as pd

In [2]:
sim = pd.read_csv('../data_for_notebooks/recommendation_matrix.csv') 
sim_1 = pd.read_csv('../data_for_notebooks/recommendation_matrix_wo_tfidf.csv')
sim_2 = pd.read_csv('../data_for_notebooks/recommendation_matrix_w_lem.csv')
sim_3 = pd.read_csv('../data_for_notebooks/recommendation_matrix_w_lem_wo_tfidf.csv')
df = pd.read_csv('../data_for_notebooks/recomm_df.csv')
sim_cat = pd.read_csv('../data_for_notebooks/recommendation_df_cat.csv')

Thus we have the following dataframes loaded:

**sim** : similarity matrix made with stemming and TF-IDF

**sim_1** : similarity matrix made with stemming but without TF-IDF

**sim_2** : similarity matrix made with lemmatization and TF-IDF

**sim_3** : similarity matrix made with lemmatization but without TF-IDF

**sim_cat** : similarity matrix made from the category column, with stemming but without TF-IDF

**df** : dataframe with all the internships

In [3]:
# setting id column as index
sim.set_index('id', inplace = True)
sim_1.set_index('id', inplace = True)
sim_2.set_index('id', inplace = True)
sim_3.set_index('id', inplace = True)
sim_cat.set_index('id', inplace = True)

Making the recommendation function:

In [4]:
def make_recs(sim, df, i, n):
    '''
    Returns a dataframe of the top n recommendations, based on the given similarity matrix and 
    dataframe, for a user who viewed the internship with id i.
    
    INPUT:
    sim - similarity matrix(dataframe)
    df - original dataframe with all the data
    i - id of the internship that was viewed by the user
    n - top n recommendations to be made to the user 
    
    OUTPUT:
    recs_df - dataframe consisting of the recommended internships
    
    '''
    ith_series = sim.loc[:,str(i)]
    ith_series = ith_series.sort_values(ascending = False)
    recs = ith_series.head(n+1).index.tolist()
    
    # Several internships can tie for the maximum similarity value, in which case i itself may not
    # appear among the top n+1 entries. If i is present we remove it; otherwise we drop the last entry.
    if i in recs: 
        recs.remove(i)
    else:
        recs = recs[:-1]
        
    # Select the rows in the order given by recs. Filtering with .isin alone would return them in the
    # order they appear in df, i.e. in ascending order of id.
    recs_df = df.set_index('id').loc[recs].reset_index()
    
    return recs_df
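
The ordering issue mentioned in the comments can be seen on a small made-up frame. The sketch below uses a hypothetical `toy_df` and a hand-written ranked list `recs` (neither comes from the scraped data) and compares filtering with `.isin` alone against selecting by id with `.loc`, which is one way to keep the ranking order.

```python
import pandas as pd

# Hypothetical toy data standing in for the internships dataframe
toy_df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'job_title': ['a', 'b', 'c', 'd'],
})

# Ranked ids, most similar first (made up for illustration)
recs = [4, 1, 3]

# .isin keeps the rows in the order they appear in toy_df
filtered_only = toy_df[toy_df.id.isin(recs)]

# Indexing by id and selecting with .loc[recs] follows the order of recs
ordered = toy_df.set_index('id').loc[recs].reset_index()

print(filtered_only.id.tolist())  # [1, 3, 4]
print(ordered.id.tolist())        # [4, 1, 3]
```

The first result loses the ranking, while the second returns the rows in the order of `recs`.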

Running an example for each of the similarity matrices:

In [5]:
# original internship viewed by the user
df[df.id == 110]

Unnamed: 0,id,job_title,company_name,job_loc,details,category,compensation,start,end,skills,href
106,110,hr recruitment intern,infytalent hr services,pune,we are looking for hr traineeresponsib...,human resources recruiter,paid,2019-02-17,2019-12-30,agreeableness,http://letsintern.com/internship/Human-Resourc...


### Recommended internships using the different similarity matrices :

### for sim :

In [6]:
make_recs(sim, df, 110, 3)

Unnamed: 0,id,job_title,company_name,job_loc,details,category,compensation,start,end,skills,href
0,499,hr recruitment intern,infytalent hr services,anywhere in india,we are looking for hr traineeresponsib...,human resources recruiter,paid,2019-02-17,2019-12-30,no skills preferred,http://letsintern.com/internship/Human-Resourc...
1,303,hr recruitment executive,brilliant seeker,thane,end to end recruitmentindividual has t...,human resources assistant,paid,2018-11-05,2019-10-30,"english comprehension,computer literacy",http://letsintern.com/internship/Human-Resourc...
2,203,hr / human resource / recruitment / administra...,manpho,anywhere in india,we are seeking hr (human resource) rep...,human resources recruiter,unpaid,2019-03-05,2019-03-29,no skills preferred,http://letsintern.com/internship/Human-Resourc...


### for sim_1 :

In [7]:
make_recs(sim_1, df, 110, 3) 

Unnamed: 0,id,job_title,company_name,job_loc,details,category,compensation,start,end,skills,href
0,499,hr recruitment intern,infytalent hr services,anywhere in india,we are looking for hr traineeresponsib...,human resources recruiter,paid,2019-02-17,2019-12-30,no skills preferred,http://letsintern.com/internship/Human-Resourc...
1,225,laravel backend developer intern,intact,anywhere in india,we are looking for back-end developers...,web developer,paid,2019-01-06,2019-06-30,"javascript,php,jquery,ajax",http://letsintern.com/internship/Web-Developer...
2,303,hr recruitment executive,brilliant seeker,thane,end to end recruitmentindividual has t...,human resources assistant,paid,2018-11-05,2019-10-30,"english comprehension,computer literacy",http://letsintern.com/internship/Human-Resourc...


### for sim_2 :

In [8]:
make_recs(sim_2, df, 110, 3)

Unnamed: 0,id,job_title,company_name,job_loc,details,category,compensation,start,end,skills,href
0,499,hr recruitment intern,infytalent hr services,anywhere in india,we are looking for hr traineeresponsib...,human resources recruiter,paid,2019-02-17,2019-12-30,no skills preferred,http://letsintern.com/internship/Human-Resourc...
1,303,hr recruitment executive,brilliant seeker,thane,end to end recruitmentindividual has t...,human resources assistant,paid,2018-11-05,2019-10-30,"english comprehension,computer literacy",http://letsintern.com/internship/Human-Resourc...
2,492,hr / human resource / recruitment / administra...,manpho,bangalore,we are seeking hr (human resource) rep...,human resources recruiter,unpaid,2019-03-05,2019-03-29,analytical skills,http://letsintern.com/internship/Human-Resourc...


### for sim_3 :

In [9]:
make_recs(sim_3, df, 110,3)

Unnamed: 0,id,job_title,company_name,job_loc,details,category,compensation,start,end,skills,href
0,499,hr recruitment intern,infytalent hr services,anywhere in india,we are looking for hr traineeresponsib...,human resources recruiter,paid,2019-02-17,2019-12-30,no skills preferred,http://letsintern.com/internship/Human-Resourc...
1,225,laravel backend developer intern,intact,anywhere in india,we are looking for back-end developers...,web developer,paid,2019-01-06,2019-06-30,"javascript,php,jquery,ajax",http://letsintern.com/internship/Web-Developer...
2,303,hr recruitment executive,brilliant seeker,thane,end to end recruitmentindividual has t...,human resources assistant,paid,2018-11-05,2019-10-30,"english comprehension,computer literacy",http://letsintern.com/internship/Human-Resourc...


### for sim_cat :

In [10]:
make_recs(sim_cat, df, 110, 3)

Unnamed: 0,id,job_title,company_name,job_loc,details,category,compensation,start,end,skills,href
0,1,hr executive - recruitment,engenia technologies,gurgaon,we are seeking a hr recruiter who will...,human resources recruiter,paid,2019-03-02,2019-08-28,hr practices,http://letsintern.com/internship/Human-Resourc...
1,160,flight attendant (cabin crew )for fresher,ht pvt ltd,anywhere in india,congratulation to all job seekers ther...,human resources recruiter,paid,2019-03-01,,no skills preferred,http://letsintern.com/internship/Human-Resourc...
2,207,recruiter,soven developer,anywhere in india,looking for a recruiter - intern for a...,human resources recruiter,paid,2018-12-09,2019-01-15,human resource practices,http://letsintern.com/internship/Human-Resourc...
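
To repeat this kind of side-by-side comparison for other ids without re-running each cell, the recommended ids from every matrix can be collected in one place. The following is a minimal sketch under stated assumptions: `top_n_ids`, `toy_a` and `toy_b` are hypothetical names with made-up values, and the helper simply drops the viewed id instead of reproducing the tie handling in `make_recs`.

```python
import pandas as pd

def top_n_ids(sim, i, n):
    """Return the ids of the n items most similar to item i, excluding i itself."""
    # Columns are string ids (as after pd.read_csv); the index holds integer ids
    scores = sim.loc[:, str(i)].drop(i, errors='ignore')
    return scores.sort_values(ascending=False).head(n).index.tolist()

ids = [1, 2, 3, 4]
cols = [str(i) for i in ids]

# Two hypothetical similarity matrices that disagree about item 1
toy_a = pd.DataFrame([[9, 5, 1, 3],
                      [5, 9, 2, 4],
                      [1, 2, 9, 6],
                      [3, 4, 6, 9]], index=ids, columns=cols)
toy_b = pd.DataFrame([[9, 1, 7, 3],
                      [1, 9, 2, 4],
                      [7, 2, 9, 6],
                      [3, 4, 6, 9]], index=ids, columns=cols)

matrices = {'toy_a': toy_a, 'toy_b': toy_b}

# One entry per matrix: the top 2 ids recommended for a user who viewed id 1
comparison = {name: top_n_ids(m, 1, 2) for name, m in matrices.items()}
print(comparison)  # {'toy_a': [2, 4], 'toy_b': [3, 4]}
```

With the real data, the dictionary would instead map names such as `'sim'` or `'sim_cat'` to the loaded matrices.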


After a lot of mixing and matching across different ids, we decided to take a weighted sum of sim_cat and sim_1, where sim_cat is used as is and sim_1 is multiplied by 0.3. We do this because the weighted sum gives a good mix of relevant recommendations and serendipitous ones. Many of the recommendations from sim_1 are a little offbeat, which helps keep the recommendations fresh and the user's interest intact. The weights of 1 and 0.3 come from experimentation and from comparing the scales of the two matrices (the largest value in sim_1 is 520, whereas the largest in sim_cat is 5).

We took this approach because recommendation systems are a mix of science and art, and this was the art portion: there was no exact way to determine which recommendation system was performing best.
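
How the weight shifts the ranking can be illustrated on toy matrices. In the sketch below, `toy_cat`, `toy_text` and `rank_for` are hypothetical names with made-up values (they are not the real sim_cat or sim_1); the point is only that a small weight lets the category match decide the order, while a larger weight lets the text similarity override it.

```python
import pandas as pd

ids = [10, 20, 30]
cols = [str(i) for i in ids]  # column labels are strings, as after pd.read_csv

# Hypothetical category-based similarity (small scale): items 10 and 20 share a category
toy_cat = pd.DataFrame([[5, 5, 0],
                        [5, 5, 0],
                        [0, 0, 5]], index=ids, columns=cols)

# Hypothetical text-based similarity (larger scale): item 30 reads more like item 10
toy_text = pd.DataFrame([[50, 2, 10],
                         [2, 40, 1],
                         [10, 1, 60]], index=ids, columns=cols)

def rank_for(i, text_weight):
    """Rank the other ids for item i under toy_cat + text_weight * toy_text."""
    # pandas aligns the two frames on index and column labels before adding
    combined = toy_cat + text_weight * toy_text
    scores = combined[str(i)].drop(i)
    return scores.sort_values(ascending=False).index.tolist()

print(rank_for(10, 0.3))  # [20, 30]
print(rank_for(10, 1.0))  # [30, 20]
```

With a weight of 0.3 the same-category item 20 ranks first for item 10; with a weight of 1.0 the textually closer item 30 overtakes it.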

In [11]:
def combine_make_recs(df, i, n):
    '''
    Takes the weighted sum of two similarity matrices, sim_cat and 0.3 * sim_1 (both read from the
    notebook's global scope), and returns recommendations using the combined matrix.
    
    INPUT:
    df - original dataframe with all the data
    i - id of the internship that was viewed by the user
    n - top n recommendations to be made to the user 
    
    OUTPUT:
    recs_df - dataframe consisting of the recommended internships
    
    '''
    sim_combined = 0.3*sim_1 + sim_cat
    recs_df = make_recs(sim_combined, df, i, n)
    return recs_df

In [12]:
combine_make_recs(df, 110, 3)

Unnamed: 0,id,job_title,company_name,job_loc,details,category,compensation,start,end,skills,href
0,499,hr recruitment intern,infytalent hr services,anywhere in india,we are looking for hr traineeresponsib...,human resources recruiter,paid,2019-02-17,2019-12-30,no skills preferred,http://letsintern.com/internship/Human-Resourc...
1,303,hr recruitment executive,brilliant seeker,thane,end to end recruitmentindividual has t...,human resources assistant,paid,2018-11-05,2019-10-30,"english comprehension,computer literacy",http://letsintern.com/internship/Human-Resourc...
2,255,mba internship program with highrise at ahmedabad,highrise management services pvt ltd,ahmedabad,"dear candidate,highrise is one of the ...",human resources professional,unpaid,2019-03-31,2019-07-30,"human resource situation handling,hum...",http://letsintern.com/internship/Human-Resourc...
