This project is to predict how fit the candidate is based on their available information (job title and the number of connection on Linkedin), utilizing Doc2Vec, BERT, ELMo embeddings and Learning to Rank model. By calculating the similarity score based on the embeddings of the desired job query and job title, scaling the number of connection, and then starring top 10 fittest applicants, LGBMRanker will be deployed to predict their ranking using the metric ndcg score.

**Data Description:**

- id : unique identifier for candidate (numeric)

- job_title : job title for candidate (text)

- location : geographical location for candidate (text)

- connections: number of connections candidate has, 500+ means over 500 (text)

- Output (desired target): fit - how fit the candidate is for the role? (numeric, probability between 0-1)

Keywords: “Aspiring human resources” or “seeking human resources”

# Table of contents
- [Check datatype,  duplicates & missing values](#nan)
- [Doc2Vec](#d2v)
- [BERT](#bt)
- [ELMo](#elmo)
- [Learning to Rank](#ltr)
- [Conclusion](#cl)


In [1]:
import numpy as np
import pandas as pd
pd.set_option('max_colwidth', 400)
import gensim
import gensim.downloader as api
import re
from gensim.corpora import Dictionary
from gensim.utils import simple_preprocess
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import MinMaxScaler
from nltk.corpus import stopwords
from nltk import download
from sentence_transformers import SentenceTransformer
import lightgbm 
from sklearn.model_selection import train_test_split, StratifiedKFold
from sklearn.metrics import ndcg_score
import random
from tqdm import tqdm

import tensorflow_hub as hub
import tensorflow as tf
from tensorflow.python.framework.ops import disable_eager_execution
tf.compat.v1.disable_eager_execution()

download('stopwords')  # Download stopwords list.
stop_words = stopwords.words('english')

import warnings
warnings.filterwarnings('ignore')

[nltk_data] Downloading package stopwords to /Users/thao/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


In [2]:
df=pd.read_csv('potential-talents - Aspiring human resources - seeking human resources.csv')
df.head()

Unnamed: 0,id,job_title,location,connection,fit
0,1,2019 C.T. Bauer College of Business Graduate (Magna Cum Laude) and aspiring Human Resources professional,"Houston, Texas",85,
1,2,Native English Teacher at EPIK (English Program in Korea),Kanada,500+,
2,3,Aspiring Human Resources Professional,"Raleigh-Durham, North Carolina Area",44,
3,4,People Development Coordinator at Ryan,"Denton, Texas",500+,
4,5,Advisory Board Member at Celal Bayar University,"İzmir, Türkiye",500+,


In [3]:
print(f' This dataset has {df.shape[0]} rows and {df.shape[1]} columns.')

 This dataset has 104 rows and 5 columns.


# Check datatype,  duplicates & missing values <a class="anchor" id="nan"></a>

In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 104 entries, 0 to 103
Data columns (total 5 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   id          104 non-null    int64  
 1   job_title   104 non-null    object 
 2   location    104 non-null    object 
 3   connection  104 non-null    object 
 4   fit         0 non-null      float64
dtypes: float64(1), int64(1), object(3)
memory usage: 4.2+ KB


In [5]:
df.isna().sum()

id              0
job_title       0
location        0
connection      0
fit           104
dtype: int64

In [6]:
df.duplicated().sum()

0

In [7]:
# we'll use only 2 columns job_title and connection
docs=df[['job_title','connection']]
docs.head()

Unnamed: 0,job_title,connection
0,2019 C.T. Bauer College of Business Graduate (Magna Cum Laude) and aspiring Human Resources professional,85
1,Native English Teacher at EPIK (English Program in Korea),500+
2,Aspiring Human Resources Professional,44
3,People Development Coordinator at Ryan,500+
4,Advisory Board Member at Celal Bayar University,500+


# Doc2Vec <a class="anchor" id="d2v"></a>

In [8]:
# prepare dataset for doc2vec by tokenizing and removing stop words

def preprocess(sentence):
    return [w for w in simple_preprocess(sentence) if w not in stop_words]

data = [preprocess(d) for d in docs['job_title']]

# look at the first 5 values
for i in range (5):
    print(f"Job title {i}: {docs['job_title'][i]}")
    print('After preprocessing:', data[i],'\n')
    print('=============================\n')

Job title 0: 2019 C.T. Bauer College of Business Graduate (Magna Cum Laude) and aspiring Human Resources professional
After preprocessing: ['bauer', 'college', 'business', 'graduate', 'magna', 'cum', 'laude', 'aspiring', 'human', 'resources', 'professional'] 


Job title 1: Native English Teacher at EPIK (English Program in Korea)
After preprocessing: ['native', 'english', 'teacher', 'epik', 'english', 'program', 'korea'] 


Job title 2: Aspiring Human Resources Professional
After preprocessing: ['aspiring', 'human', 'resources', 'professional'] 


Job title 3: People Development Coordinator at Ryan
After preprocessing: ['people', 'development', 'coordinator', 'ryan'] 


Job title 4: Advisory Board Member at Celal Bayar University
After preprocessing: ['advisory', 'board', 'member', 'celal', 'bayar', 'university'] 




In [9]:
# Create the tagged document needed for Doc2Vec
def create_tagged_document(list_of_list_of_words):
    for i, list_of_words in enumerate(list_of_list_of_words):
        yield TaggedDocument(list_of_words, [i])

train_data = list(create_tagged_document(data))

# sanity check
print(train_data[:1])

[TaggedDocument(words=['bauer', 'college', 'business', 'graduate', 'magna', 'cum', 'laude', 'aspiring', 'human', 'resources', 'professional'], tags=[0])]


In [10]:
# Initiate the Doc2Vec model
doc2vec = Doc2Vec(min_count=1, epochs=100)

# Build the Vocabulary
doc2vec.build_vocab(train_data)

# Train the Doc2Vec model
doc2vec.train(train_data, total_examples=doc2vec.corpus_count, epochs=doc2vec.epochs)

# sanity check
print("Raw text:", docs.loc[11,'job_title'])
print("\nAfter preprocessing: ", preprocess(docs.loc[11,'job_title']))
print("\nVector:", doc2vec.infer_vector(preprocess(docs.loc[11,'job_title'])))

Raw text: SVP, CHRO, Marketing & Communications, CSR Officer | ENGIE | Houston | The Woodlands | Energy | GPHR | SPHR

After preprocessing:  ['svp', 'chro', 'marketing', 'communications', 'csr', 'officer', 'engie', 'houston', 'woodlands', 'energy', 'gphr', 'sphr']

Vector: [-0.10691799 -0.06836569 -0.19530652 -0.0626753   0.01557998 -0.17214178
 -0.04185408  0.00810375 -0.03722889 -0.10382488 -0.04980865 -0.06761736
 -0.04846491 -0.0109671   0.04851419 -0.136051    0.02338213 -0.13134462
  0.00915116  0.1314455   0.04632595  0.10297962  0.13377146 -0.01888299
  0.09488532 -0.06988629 -0.07636788 -0.07854041 -0.17132972 -0.13502465
  0.12179243  0.1369402   0.0614278   0.02229712  0.00998656  0.16442414
 -0.14905068 -0.06417432 -0.09183645 -0.11946054  0.00405092  0.03931822
 -0.02637834 -0.1921826  -0.01984078 -0.02502044  0.10387113 -0.08810337
  0.15761821  0.18379055  0.00273399  0.12167688  0.09034173  0.01714149
  0.05855346 -0.02097488 -0.02265622 -0.06827357 -0.09534511 -0.04513

In [11]:
query = 'Aspiring human resources'  

# embed all job titles and calculate Cosine Similarity with the query
doc2vec.random.seed(11)
docs['Doc2Vec_Cosine_Similarity'] = docs.apply((lambda row: cosine_similarity(
                                        doc2vec.infer_vector(preprocess(row['job_title'])).reshape(1,-1), 
                                        doc2vec.infer_vector(preprocess(query)).reshape(1,-1))[0][0]), axis=1)

docs.sort_values(by='Doc2Vec_Cosine_Similarity', ascending=False).head(20)

Unnamed: 0,job_title,connection,Doc2Vec_Cosine_Similarity
77,Human Resources Generalist at Schwan's,500+,0.996022
70,"Human Resources Generalist at ScottMadden, Inc.",500+,0.995471
93,Seeking Human Resources Opportunities. Open to travel and relocation.,415,0.995366
94,Student at Westfield State University,57,0.995169
75,Aspiring Human Resources Professional | Passionate about helping to create an inclusive and engaging work environment,212,0.995004
102,Always set them up for Success,500+,0.994992
87,Human Resources Management Major,18,0.994867
103,Director Of Administration at Excellence Logging,500+,0.994567
101,Business Intelligence and Analytics at Travelers,49,0.994506
90,Lead Official at Western Illinois University,39,0.994412


In [12]:
# turn 500+ connection into 500 
docs['connection'] = np.where(docs['connection'] == '500+ ', 500, docs['connection'])

# scale connection
scaler = MinMaxScaler()
docs[['Scaled_Connection']] = scaler.fit_transform(docs[['connection']])

# calculate ranking by weighted sum of cosine_similarity and scaled_connection
docs['Doc2Vec_Ranking']=docs['Doc2Vec_Cosine_Similarity']*0.8 + docs['Scaled_Connection']*0.2

# sort ranking
docs.sort_values(by='Doc2Vec_Ranking', ascending=False).head(20)

Unnamed: 0,job_title,connection,Doc2Vec_Cosine_Similarity,Scaled_Connection,Doc2Vec_Ranking
77,Human Resources Generalist at Schwan's,500,0.996022,1.0,0.996818
70,"Human Resources Generalist at ScottMadden, Inc.",500,0.995471,1.0,0.996377
102,Always set them up for Success,500,0.994992,1.0,0.995994
103,Director Of Administration at Excellence Logging,500,0.994567,1.0,0.995654
12,Human Resources Coordinator at InterContinental Buckhead Atlanta,500,0.993857,1.0,0.995086
74,"Nortia Staffing is seeking Human Resources, Payroll & Administrative Professionals!! (408) 709-2621",500,0.993494,1.0,0.994796
66,"Human Resources, Staffing and Recruiting Professional",500,0.993116,1.0,0.994493
26,Aspiring Human Resources Management student seeking an internship,500,0.9931,1.0,0.99448
84,RRP Brand Portfolio Executive at JTI (Japan Tobacco International),500,0.993054,1.0,0.994443
52,Seeking Human Resources HRIS and Generalist Positions,500,0.992563,1.0,0.994051


# BERT <a class="anchor" id="bt"></a>

In [13]:
# use pre-trained BERT model: all-mpnet-base-v2
model = SentenceTransformer('all-mpnet-base-v2')
print(model)

SentenceTransformer(
  (0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
  (2): Normalize()
)


In [14]:
# calculate Cosine Similarity between the query and all job titles
random.seed(22)
docs['BERT_Cosine_Similarity'] = docs.apply((lambda row: cosine_similarity(
                                                model.encode(row['job_title']).reshape(1,-1),
                                                model.encode(query).reshape(1,-1))[0][0]),axis=1)

# sort it    
docs.sort_values(by='BERT_Cosine_Similarity', ascending=False).head(20)

Unnamed: 0,job_title,connection,Doc2Vec_Cosine_Similarity,Scaled_Connection,Doc2Vec_Ranking,BERT_Cosine_Similarity
45,Aspiring Human Resources Professional,44,0.987335,0.086172,0.807102,0.879234
2,Aspiring Human Resources Professional,44,0.991057,0.086172,0.81008,0.879234
20,Aspiring Human Resources Professional,44,0.99269,0.086172,0.811386,0.879234
96,Aspiring Human Resources Professional,71,0.990583,0.140281,0.820523,0.879234
32,Aspiring Human Resources Professional,44,0.992645,0.086172,0.81135,0.879234
57,Aspiring Human Resources Professional,44,0.991254,0.086172,0.810238,0.879234
16,Aspiring Human Resources Professional,44,0.980862,0.086172,0.801924,0.879234
48,Aspiring Human Resources Specialist,1,0.985467,0.0,0.788373,0.864964
5,Aspiring Human Resources Specialist,1,0.990545,0.0,0.792436,0.864964
59,Aspiring Human Resources Specialist,1,0.99092,0.0,0.792736,0.864964


BERT tends to perform better than doc2vec when top rankings are job titles including exactly all the keywords.

In [15]:
# calculate ranking by weighted sum of cosine_similarity and scaled_connection
docs['BERT_Ranking']=docs['BERT_Cosine_Similarity']*0.8 + docs['Scaled_Connection']*0.2

# sort ranking
docs.sort_values(by='BERT_Ranking', ascending=False).head(20)

Unnamed: 0,job_title,connection,Doc2Vec_Cosine_Similarity,Scaled_Connection,Doc2Vec_Ranking,BERT_Cosine_Similarity,BERT_Ranking
66,"Human Resources, Staffing and Recruiting Professional",500,0.993116,1.0,0.994493,0.697636,0.758108
96,Aspiring Human Resources Professional,71,0.990583,0.140281,0.820523,0.879234,0.731443
32,Aspiring Human Resources Professional,44,0.992645,0.086172,0.81135,0.879234,0.720622
16,Aspiring Human Resources Professional,44,0.980862,0.086172,0.801924,0.879234,0.720622
20,Aspiring Human Resources Professional,44,0.99269,0.086172,0.811386,0.879234,0.720622
57,Aspiring Human Resources Professional,44,0.991254,0.086172,0.810238,0.879234,0.720622
45,Aspiring Human Resources Professional,44,0.987335,0.086172,0.807102,0.879234,0.720622
2,Aspiring Human Resources Professional,44,0.991057,0.086172,0.81008,0.879234,0.720622
29,Seeking Human Resources Opportunities,390,0.99337,0.779559,0.950608,0.696302,0.712953
27,Seeking Human Resources Opportunities,390,0.99437,0.779559,0.951408,0.696302,0.712953


# ELMo <a class="anchor" id="elmo"></a>

In [16]:
elmo = hub.Module("https://tfhub.dev/google/elmo/3", trainable=True)

def create_elmo_embeddings(x):
  embeddings = elmo([x], signature="default", as_dict=True)["elmo"]
  with tf.compat.v1.Session() as sess:
    sess.run(tf.compat.v1.global_variables_initializer())
    sess.run(tf.compat.v1.tables_initializer())
    # return average of ELMo features
    return sess.run(tf.reduce_mean(embeddings,1))

# sanity check
print('Embedding for the query: ', create_elmo_embeddings(query))
print('\nEmbedding vector shape', create_elmo_embeddings(query).shape)      

INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore
2023-09-11 21:54:10.144017: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.1 SSE4.2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.


Embedding for the query:  [[-0.05690157 -0.08409429 -0.21298319 ...  0.20266978  0.16399087
  -0.24327344]]
INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore



Embedding vector shape (1, 1024)


In [17]:
# calculate Cosine Similarity between the query and all job titles
random.seed(77)
docs['ELMO_Cosine_Similarity'] = docs.apply((lambda row: cosine_similarity(
                                                    create_elmo_embeddings(row['job_title']).reshape(1,-1),
                                                    create_elmo_embeddings(query).reshape(1,-1))[0][0]),axis=1)

# sort it    
docs.sort_values(by='ELMO_Cosine_Similarity', ascending=False)[['job_title','ELMO_Cosine_Similarity']].head(20)

INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


Unnamed: 0,job_title,connection,Doc2Vec_Cosine_Similarity,Scaled_Connection,Doc2Vec_Ranking,BERT_Cosine_Similarity,BERT_Ranking,ELMO_Cosine_Similarity
96,Aspiring Human Resources Professional,71,0.990583,0.140281,0.820523,0.879234,0.731443,0.730009
2,Aspiring Human Resources Professional,44,0.991057,0.086172,0.81008,0.879234,0.720622,0.730009
45,Aspiring Human Resources Professional,44,0.987335,0.086172,0.807102,0.879234,0.720622,0.730009
57,Aspiring Human Resources Professional,44,0.991254,0.086172,0.810238,0.879234,0.720622,0.730009
20,Aspiring Human Resources Professional,44,0.99269,0.086172,0.811386,0.879234,0.720622,0.730009
32,Aspiring Human Resources Professional,44,0.992645,0.086172,0.81135,0.879234,0.720622,0.730009
16,Aspiring Human Resources Professional,44,0.980862,0.086172,0.801924,0.879234,0.720622,0.730009
23,Aspiring Human Resources Specialist,1,0.990798,0.0,0.792638,0.864964,0.691971,0.720284
48,Aspiring Human Resources Specialist,1,0.985467,0.0,0.788373,0.864964,0.691971,0.720284
35,Aspiring Human Resources Specialist,1,0.990959,0.0,0.792767,0.864964,0.691971,0.720284


# Learning to Rank <a class="anchor" id="ltr"></a> 

In [32]:
df_ltr=docs.sort_values('Doc2Vec_Ranking', ascending=False)[['Doc2Vec_Cosine_Similarity','BERT_Cosine_Similarity','ELMO_Cosine_Similarity','Scaled_Connection']]
# set 0 as the default value for target rank
df_ltr['Target_Rank']=0
df_ltr

Unnamed: 0,Doc2Vec_Cosine_Similarity,BERT_Cosine_Similarity,ELMO_Cosine_Similarity,Scaled_Connection,Target_Rank
77,0.996022,0.589669,0.445697,1.000000,0
70,0.995471,0.559501,0.411359,1.000000,0
102,0.994992,0.296588,0.274777,1.000000,0
103,0.994567,0.335996,0.269287,1.000000,0
12,0.993857,0.541241,0.402998,1.000000,0
...,...,...,...,...,...
0,0.930990,0.721934,0.428323,0.168337,0
18,0.930845,0.721934,0.428323,0.168337,0
13,0.928390,0.721934,0.428323,0.168337,0
71,0.964931,0.638739,0.488449,0.008016,0


In [20]:
# assume that top 10 applicants are considered to be the fittest ones, hence they'll be starred or 
# we'll set 1 for their target rank
df_ltr.iloc[:10]['Target_Rank']=1
df_ltr

Unnamed: 0,Doc2Vec_Cosine_Similarity,BERT_Cosine_Similarity,ELMO_Cosine_Similarity,Scaled_Connection,Target_Rank
77,0.996022,0.589669,0.445697,1.000000,1
70,0.995471,0.559501,0.411359,1.000000,1
102,0.994992,0.296588,0.274777,1.000000,1
103,0.994567,0.335996,0.269287,1.000000,1
12,0.993857,0.541241,0.402998,1.000000,1
...,...,...,...,...,...
0,0.930990,0.721934,0.428323,0.168337,0
18,0.930845,0.721934,0.428323,0.168337,0
13,0.928390,0.721934,0.428323,0.168337,0
71,0.964931,0.638739,0.488449,0.008016,0


In [22]:
X=df_ltr[['Doc2Vec_Cosine_Similarity','BERT_Cosine_Similarity','ELMO_Cosine_Similarity','Scaled_Connection']]
y=df_ltr['Target_Rank']

# train test split
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, test_size=0.2, random_state=1)

Since Scikit learn does not support the metric ndcg score, let's create a custom randomized search cv.

In [23]:
param_dic = {
    'learning_rate': list(np.arange(0.001, 0.1, 0.001)),
    'n_estimators': list(range(5, 101)),
    'num_leaves': list(range(5, 51)),
    'max_depth': list(range(1, 20)),
    'min_split_gain':list(10**i for i in range (-10,1)),
    'min_child_samples':list(range(5, 21))}

def randomized_search_cv_custom(x_train_total, y_train_total):
    random.seed(88)    
    x_train_total, y_train_total = x_train_total.reset_index(drop=True), y_train_total.reset_index(drop=True)
    
    # we would use the same default value of n_iter=10 as RandomizedSearchCV, 
    # and store the mean score of each iteration into these 2 lists
    train_scores = []
    test_scores = []

    # create a list to save the random param set
    param_list=[]
    skf = StratifiedKFold(n_splits=5)

    for iteration in tqdm(range(10)):
        # for each iteration, we would take a random param set from param_dic and save it into a list
        random_params = {k: random.choice(v) for k, v in param_dic.items()}
        param_list.append(random_params)
        
        # each random param set would have 5 train and test scores  
        trainscores_folds = []
        testscores_folds = []

        for fold_number, (train_index, test_index) in enumerate(skf.split(X_train, y_train)):
            
            # select datapoints based on test_index and train_index
                x_train_fold = x_train_total.iloc[train_index]
                y_train_fold = y_train_total.iloc[train_index]
                x_test_fold = x_train_total.iloc[test_index]
                y_test_fold = y_train_total.iloc[test_index]

#             # sanity check     
#                 print('Iter', iteration)
#                 print('Fold number', fold_number)
#                 print('Train index', list(x_train_fold.index))
#                 print('Test index', list(x_test_fold.index))
#                 print('--------------------------------------')
                
            # call and fit the classifier on the x_train_fold    
                gbm = lightgbm.LGBMRanker(objective='lambdarank', n_jobs=-1, **random_params)
                query_train = [x_train_fold.shape[0]]
                gbm.fit(x_train_fold, y_train_fold, group=query_train, eval_metric='ndcg')

            # predict x_test_fold and append the ndcg score in the testscores_folds
                test_pred = gbm.predict(x_test_fold)
                y_test_df = pd.DataFrame({"relevance_score": y_test_fold, "predicted_ranking": test_pred})
                true_relevance = y_test_fold.sort_values(ascending=False)
                relevance_score = y_test_df.sort_values("predicted_ranking", ascending=False)
                testscores_folds.append(ndcg_score([true_relevance.to_numpy()], 
                                                   [relevance_score["relevance_score"].to_numpy()]))

            # predict x_train_fold and append the ndcg score in the trainscores_folds
                train_pred = gbm.predict(x_train_fold)
                y_train_df = pd.DataFrame({"relevance_score": y_train_fold, "predicted_ranking": train_pred})
                train_true_relevance = y_train_fold.sort_values(ascending=False)
                train_relevance_score = y_train_df.sort_values("predicted_ranking", ascending=False)
                trainscores_folds.append(ndcg_score([train_true_relevance.to_numpy()], 
                                                    [train_relevance_score["relevance_score"].to_numpy()]))

    # append the mean score for each random param set
    train_scores.append(np.mean(np.array(trainscores_folds)))
    test_scores.append(np.mean(np.array(testscores_folds)))
    
    # get the highest validation score and best param set
    highest_test_score=max(test_scores)
    highest_test_score_index=np.argmax(test_scores)
    best_param=param_list[highest_test_score_index]
    return highest_test_score, best_param

# sanity check 
#randomized_search_cv_custom(X_train, y_train)

In [24]:
# apply custom randomized search on train set
highest_test_score, best_param = randomized_search_cv_custom(X_train, y_train)
print('Highest validation score', highest_test_score)
print('\nBest params:', best_param)

100%|████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:01<00:00,  6.87it/s]

Highest validation score 0.8770910987887561

Best params: {'learning_rate': 0.051000000000000004, 'n_estimators': 29, 'num_leaves': 26, 'max_depth': 6, 'min_split_gain': 1, 'min_child_samples': 5}





In [28]:
# call and fit the best model
gbm_optimal = lightgbm.LGBMRanker(**best_param, random_state=100)
query_train = [X_train.shape[0]]
gbm_optimal.fit(X_train, y_train, group=query_train, eval_metric='ndcg')

# predict and put into the dataframe
X_train["Predicted_relevance_score"] = gbm_optimal.predict(X_train)
X_train['Target_Rank'] = y_train
X_test["Predicted_relevance_score"] = gbm_optimal.predict(X_test)
X_test['Target_Rank'] = y_test

# sort 
X_train_sorted = X_train.sort_values("Predicted_relevance_score", ascending=False)
X_test_sorted = X_test.sort_values("Predicted_relevance_score", ascending=False)
X_train_sorted.head()

Unnamed: 0,Doc2Vec_Cosine_Similarity,BERT_Cosine_Similarity,ELMO_Cosine_Similarity,Scaled_Connection,Predicted_relevance_score,Target_Rank
52,0.992563,0.591348,0.484473,1.0,1.672627,1
70,0.995471,0.559501,0.411359,1.0,1.672627,1
77,0.996022,0.589669,0.445697,1.0,1.672627,1
26,0.9931,0.63158,0.628849,1.0,1.672627,1
103,0.994567,0.335996,0.269287,1.0,1.672627,1


In [30]:
# calculate ndcg score
y_train_true=[y_train.sort_values(ascending=False).to_numpy()]
y_train_score=[X_train_sorted['Target_Rank'].to_numpy()]
y_test_true=[y_test.sort_values(ascending=False).to_numpy()]
y_test_score=[X_test_sorted['Target_Rank'].to_numpy()]

ndcg_train=ndcg_score(y_train_true, y_train_score, k=10)
ndcg_test=ndcg_score(y_test_true, y_test_score, k=10)

print('nDCG@10 score on TRAIN set:', round(ndcg_train,2))
print('nDCG@10 score on TEST set:', round(ndcg_test,2))
print('nDCG@10 score DIFFERENCE:', round(ndcg_train-ndcg_test,2))

nDCG@10 score on TRAIN set: 1.0
nDCG@10 score on TEST set: 1.0
nDCG@10 score DIFFERENCE: 0.0


We've built a desired ranking model with ndcg score are all absolute on both train and test sets.

# Conclusion <a class="anchor" id="cl"></a> 

This project features how to rank job applicants based on a fitness score - the weighted sum of similarity score and scaled connection - based on 3 kinds of embeddings (Doc2Vec, BERT, and ELMo). After starring some (or top 10) candidates as the fittest ones, a learning-to-rank model with features including 3 similarity scores and scaled connection will re-rank and evaluate the top 10 resulting positions based on the metric ndcg score. And with the custom randomized search cv, we've found the best learning-to-rank model with ndcg score = 1.0 on both train and test sets.

**Room for improvement:** we can filter out candidates who in the first place should not be in this list by setting a threshold for the similarity score between the job title and the query. For example, after sorting and considering the relevance between the bottom job titles and the query, we can determine a cut-off point such that a candidate will be considered potentially fit if their similarity score is >= 0.5 or >= 0.6. 