<a href="https://colab.research.google.com/github/tolerevo/00-hola-mundo/blob/main/Resume_AI.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# How to Use this Notebook

If you want to play around with the code, you'll need to hit **File** > **Save a copy in Drive** and then work in that copy.

---

This is the code used in [this video](https://youtu.be/Kpm8rEywBDQ) where I try to explore why resumes might be getting (auto)rejected and, in the process, explore resume creation under different effort conditions.

One treatment was an AI-assisted resume - this thing.

In this notebook, I attempt to make a modularized resume generator with the assistance of AI. Given a job posting and a pre-existing resume, I will create several blocks of code that will generate or assist in the generation of:

- Contact Information
- Objective
- Work Experience
- Skills & Interests

This notebook uses forms to make it a little less scary for people who are code-averse (and also because I didn't comment my code, so it's pretty hard to parse lmao). Anyway, if you do want to see the code, just click **show code**. The bulk of the stuff is in the helper function cell.

In [None]:
#@title Run Once at the Start
#@markdown Installs and imports the packages necessary for this bad boy.
#@markdown
#@markdown Provide the path to a folder containing your resume and job posting in CSV format.
#@markdown
#@markdown Examples here for [resume](https://docs.google.com/spreadsheets/d/1TnO_vdqVnmw1QKXLppENw58n6HXbfSuVmsOckUv2Jbg/edit?usp=sharing) and [job posting](https://docs.google.com/spreadsheets/d/1SDFtY45m3m5C2r9wokx8eNvv1oH-5pJgZuUCvJ2YY7w/edit?usp=sharing).
#@markdown Because I don't want to spend time data cleaning, ur gonna need to format your resume and posting like the examples provided above otherwise this isn't gonna work lmao.
#@markdown
#@markdown Include the .csv at the end of the filenames. If the document isn't a .csv, buddy, u gotta go make it a .csv. This system simply is not robust.

PATH = '' #@param {type:"string"}

!pip install pysummarization
!pip install pyinflect
!pip install fuzzywuzzy

import os
import re
import string
import math

import pandas as pd
import pyinflect  # registers the token._.inflect extension used by make_past / make_present
import spacy
from fuzzywuzzy import fuzz
from spacy.matcher import Matcher
from spacy.util import filter_spans

from google.colab import files

from pysummarization.nlpbase.auto_abstractor import AutoAbstractor
from pysummarization.tokenizabledoc.simple_tokenizer import SimpleTokenizer
from pysummarization.abstractabledoc.top_n_rank_abstractor import TopNRankAbstractor

nlp = spacy.load('en_core_web_sm')
pd.set_option('display.max_colwidth', 200)

path = PATH
posting_filename = ''#@param{type:"string"}
resume_filename = ''#@param{type:"string"}

# os.path.join avoids doubled or missing slashes between the folder and the filename
posting = pd.read_csv(os.path.join(path, posting_filename))
resume = pd.read_csv(os.path.join(path, resume_filename))
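The setup cell expects the two CSVs to look like the linked example sheets. Judging only from the column names the later cells read, the posting needs `job_title`, `job_summary`, `job_responsibilities` and `job_skills`, and the resume needs `my_titles` and `my_actions`. The stdlib sketch below shows that shape with made-up, hypothetical rows; the real example sheets may have extra columns, and the helper code also reads resume rows by position, so check the linked resume sheet for the exact column order.

```python
import csv
import io

# Hypothetical posting: title and summary sit in the first row, and each
# responsibility / skill gets its own row (cells a column doesn't need stay blank).
POSTING_CSV = """job_title,job_summary,job_responsibilities,job_skills
Data Analyst,We are hiring an analyst to support reporting.,Build weekly dashboards,Experience with SQL
,,Present findings to stakeholders,Strong communication skills
"""

# Hypothetical resume: one past action per row, labelled with its job title.
RESUME_CSV = """my_titles,my_actions
Research Assistant,Cleaned survey data in Python
Research Assistant,Presented results at lab meetings
"""

def read_columns(text):
    """Return {column name: [non-blank values]} for a CSV string."""
    reader = csv.DictReader(io.StringIO(text))
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            if value:  # blank cells are skipped, like dropping NaN in pandas
                columns[name].append(value)
    return columns

# *_demo names so running this in the notebook doesn't overwrite `posting` / `resume`
posting_demo = read_columns(POSTING_CSV)
resume_demo = read_columns(RESUME_CSV)
print(posting_demo["job_responsibilities"])
print(resume_demo["my_titles"])
```

Blank cells in a short column are exactly what pandas turns into `NaN`, which is why those values need dropping before they reach spaCy.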

In [None]:
#@title Helper Functions
#@markdown This block contains helper functions. Don't judge my documentation. If you find that the text isn't being processed correctly with your examples, it's probably a problem with some of the functions in here. These bad boys are *not robust* lmao.

# ---------------------------------------------------
# Career Summary
# ---------------------------------------------------

def get_title(text):
  """
  Given a title from a job posting, retain the key noun components to avoid copy-paste.
  """
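  # one or more consecutive nouns
  pattern = [{'POS': 'NOUN', 'OP': '+'}]

  # instantiate a Matcher instance
  matcher = Matcher(nlp.vocab)

  # Add pattern to matcher (spaCy v3 signature: key, list of patterns)
  matcher.add("noun-phrases", [pattern])

  # apply text-preprocessing
  text = prepro(text)

  # create spacy object
  doc = nlp(text)

  # call the matcher to find matches
  matches = matcher(doc)

  spans = [doc[start:end] for _, start, end in matches]

  # keep the longest non-overlapping spans, in document order
  filtered_spans = filter_spans(spans)

  # fall back to the cleaned-up title if no run of nouns was found
  if not filtered_spans:
    return text
  return str(filtered_spans[0])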

def summarize_posting(text):
  """
  Given a paragraph string, return the top-scoring half (rounded up) of the
  sentences picked by pysummarization as a form of text summary.
  """
  # Object of automatic summarization.
  auto_abstractor = AutoAbstractor()
  # Set tokenizer.
  auto_abstractor.tokenizable_doc = SimpleTokenizer()
  # Set delimiter for making a list of sentences.
  auto_abstractor.delimiter_list = [".", "\n", "?"]
  # Object of abstracting and filtering document.
  abstractable_doc = TopNRankAbstractor()
  # Summarize document.
  result_dict = auto_abstractor.summarize(text, abstractable_doc)

  # scoring_data holds (sentence_index, score) pairs listed in the same order as the
  # sentences in summarize_result, so pair them up instead of indexing summarize_result
  # with sentence_index (which points into the full document, not the filtered list).
  scored = list(zip(result_dict['scoring_data'], result_dict['summarize_result']))
  scored.sort(key=lambda pair: pair[0][1], reverse=True)

  # return 1/2 of the most important sentences (rounded up) as a summary.
  n_keep = math.ceil(len(scored) / 2)
  summary = [sentence for _, sentence in scored[:n_keep]]

  return '\n'.join(summary).strip()

def gen_objective(job_title, job_summary):
  """
  Prompt input for objective. Return a final objective.
  """
  objective = 'I am a young professional and I am ready to be your {} because'.format(job_title)
  addon = input('Job description summary:\n"{}"\n\nYour current objective is:\n{}\n\nWhy are you a good fit for this job?: \n> '.format(job_summary, objective))
  if addon == '':
    objective = objective + '!'
  else:
    objective = objective + ' ' + addon
  print()
  print('Your new objective is:\n{}'.format(objective))
  return objective


# ---------------------------------------------------
# Work Experience
# ---------------------------------------------------

def prepro(t, punc = False):
  """
  Pre-process any strings: lowercase, strip digits and surrounding whitespace. Punctuation is only removed when
  punc=True, because deleting it causes some problems with rule-based matching. We hate robust code.
  """
  # lower case
  t = t.lower()
  # remove numbers
  t = re.sub(r'\d+', '', t)
  # remove punctuation
  if punc:
    t = t.translate(str.maketrans('', '', string.punctuation))
  # remove white space
  t = t.strip()
  return t

def make_past(text):
  """
  Given a string, if there are verbs present, return the string with verbs converted to past tense (if applicable)
  """

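  doc_dep = nlp(text)
  out = []
  for token in doc_dep:
    word = token.text
    if token.tag_ in ['VBP', 'VBZ', 'VBG', 'VB']:
      # pyinflect returns None when it can't inflect a word, so fall back to the original
      word = token._.inflect("VBD") or token.text
    # rebuild token by token so e.g. "use" isn't also rewritten inside "user"
    out.append(word + token.whitespace_)
  return ''.join(out)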

def make_present(text):
  """
  Given a string, if there are verbs present, return the string with verbs converted to present tense (if applicable)
  """

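  doc_dep = nlp(text)
  out = []
  for token in doc_dep:
    word = token.text
    if token.tag_ in ['VBP', 'VBZ', 'VBG', 'VB']:
      # pyinflect returns None when it can't inflect a word, so fall back to the original
      word = token._.inflect("VBZ") or token.text
    # rebuild token by token so e.g. "use" isn't also rewritten inside "user"
    out.append(word + token.whitespace_)
  return ''.join(out)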

def get_actions(responsibilities):
  """
  Given a list of responsibilities from a job posting, return a summary of those responsibilities as action items
  """
  actions = []

  pattern=[{'POS': 'VERB', 'OP': '+'},
           {'POS': 'ADJ', 'OP': '*'},
           {'POS': 'ADP', 'OP': '*'},
           {'POS': 'NOUN', 'OP': '+'}]

  # instantiate a Matcher instance
  matcher = Matcher(nlp.vocab)

  # Add pattern to matcher
  matcher.add("verb-phrases", [pattern])

  for r in responsibilities:
    # apply text-preprocessing
    #r = prepro(r)

    # create spacy object
    doc = nlp(r)

    # call the matcher to find matches
    matches = matcher(doc)

    spans = [doc[start:end] for _, start, end in matches]

    filtered_spans = filter_spans(spans)

    for span in filtered_spans:
      t = make_past(str(span))
      t = t[0].upper() + t[1:]
      if t[-1] != '.':
        t = t+'.'
      actions.append(t)

  return actions

def match_actions(job_actions, strOptions):
  """
  Given a set of strings detailing job actions, find each element's best match in a second list of past actions.
  Return a list of tuples detailing each job action, its best match, and the corresponding score.
  """
  matches = []
  for action in job_actions:
    str2Match = action
    best_option = strOptions[0]
    best_score = fuzz.token_sort_ratio(str2Match, strOptions[0])
    # manually comparing using fuzz.token_sort_ratio over process.extractOne because the latter's choices are kinda whack.
    for option in strOptions:
      option_score = fuzz.token_sort_ratio(str2Match, option)
      if option_score >= best_score:
        best_score = option_score
        best_option = option
    matches.append((str2Match, best_option, best_score))
    # for applications and resumes with closer matches, it would be interesting to set a cap on similarity to avoid repeat entries

  return matches

def match_action_to_job(job_actions, past_actions, resume):
  """
  This function is pretty disgusting ngl. It is heavily reliant on messy formatting restricted to this pass at code.
  Anyway, it takes the job-posting actions, finds the closest past action in the resume for each one, and files the
  job action under the resume job title of that best match. Your original resume actions are then added under their
  own titles. Returns a dict of {job title: [actions]}.
  """
  # match each job-posting action to its closest past action (instead of relying on a global `matches`)
  matches = match_actions(job_actions, past_actions)
  exp_in_job = []

  # attribute each job-posting action to the title of its best-matching resume action
  for action, best_match, _score in matches:
    match_i = resume[resume['my_actions'] == best_match].index.values
    exp_in_job.append((action, resume.loc[match_i, 'my_titles'].values[0]))

  # keep the original resume experience under its own titles too
  for job, action in resume[['my_titles', 'my_actions']].dropna().values:
    exp_in_job.append((action, job))

  # group actions by title, keeping titles in first-seen order
  action_job_matches = {}
  for action, title in exp_in_job:
    action_job_matches.setdefault(title, []).append(action)
  return action_job_matches

# --------------------------------------
# Skills & Interests
# --------------------------------------

def extract_interests(sq_list):
  """
  Given a list of skills and qualifications on a job posting, extract a set of interests.
  """
  interests = []
  # may start with a noun, absolutely cannot have a verb, must end with a noun
  pattern = [{'POS': 'NOUN', 'OP': '*'},
            {'POS': 'VERB', 'OP': '!'},
            {'POS': 'NOUN', 'OP': '+'}]

  # instantiate a Matcher instance
  matcher = Matcher(nlp.vocab)

  # Add pattern to matcher
  matcher.add("noun-phrases", [pattern])

  for item in sq_list:
    doc = nlp(prepro(item, punc = True))
    matches = matcher(doc)
    spans = [doc[start:end] for _, start, end in matches]
    for span in filter_spans(spans):
      for chunk in span.noun_chunks:
        interests.append(chunk.text)

  return interests

def extract_skills(sq_list):
  """
  Given a list of skills and qualifications on a job posting, extract a set of skills
  as present-tense verb phrases.
  """
  output = []
  # must start with a verb, may have prepositions and adjectives, must end with a noun
  pattern = [{'POS': 'VERB', 'OP': '+'},
            {'POS': 'ADP', 'OP': '*'},
            {'POS': 'ADJ', 'OP': '*'},
            {'POS': 'NOUN', 'OP': '+'}]

  # instantiate a Matcher instance
  matcher = Matcher(nlp.vocab)

  # Add pattern to matcher (spaCy v3 signature: key, list of patterns)
  matcher.add("verb-phrases", [pattern])

  for item in sq_list:
    doc = nlp(prepro(item, punc = True))
    matches = matcher(doc)
    spans = [doc[start:end] for _, start, end in matches]
    for span in filter_spans(spans):
      output.append(make_present(str(span)))
  return output
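Every extractor in the helper cell follows the same recipe: tag the text with spaCy, run a `Matcher` pattern over the part-of-speech tags, then keep the longest non-overlapping matches with `filter_spans`. The plain-Python sketch below mimics that recipe for the `get_actions` pattern (one or more `VERB`, any `ADJ`, any `ADP`, one or more `NOUN`) so you can see what survives without loading a model. It is a simplified stand-in, not spaCy itself: tags are hand-assigned, the pattern is written as a regex over one-letter tag codes, and `filter_longest` only approximates the longest-span-wins rule that spaCy documents for `filter_spans`.

```python
import re

# Map POS tags to one-letter codes so a token pattern can be written as a regex.
TAG_CODES = {"VERB": "V", "ADJ": "J", "ADP": "P", "NOUN": "N"}

# Same shape as the get_actions pattern: VERB+ ADJ* ADP* NOUN+
ACTION_PATTERN = re.compile(r"V+J*P*N+")

def find_spans(tagged, pattern):
    """Return every (start, end) token span whose tag codes fully match pattern."""
    codes = "".join(TAG_CODES.get(tag, "x") for _, tag in tagged)
    return [
        (start, end)
        for start in range(len(codes))
        for end in range(start + 1, len(codes) + 1)
        if pattern.fullmatch(codes[start:end])
    ]

def filter_longest(spans):
    """Keep longer spans first, drop any span overlapping a kept one,
    and return the survivors in document order."""
    kept = []
    for start, end in sorted(spans, key=lambda s: (-(s[1] - s[0]), s[0])):
        if all(end <= k_start or start >= k_end for k_start, k_end in kept):
            kept.append((start, end))
    return sorted(kept)

# Hand-tagged example sentence (tags chosen for illustration, not from a model).
tagged = [("Build", "VERB"), ("weekly", "ADJ"), ("sales", "NOUN"),
          ("dashboards", "NOUN"), ("and", "CCONJ"), ("present", "VERB"),
          ("findings", "NOUN")]

spans = filter_longest(find_spans(tagged, ACTION_PATTERN))
phrases = [" ".join(word for word, _ in tagged[s:e]) for s, e in spans]
print(phrases)  # ['Build weekly sales dashboards', 'present findings']
```

Swapping `ACTION_PATTERN` for `re.compile(r"N+")` gives roughly the noun-run pattern that `get_title` uses.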

In [None]:
#@title Personal Information
#@markdown This information isn't stored in any way, but if you don't feel comfortable putting stuff in, that's fine. It's literally just for the string export, and you can add it urself in a local file.
name = ''#@param{type:'string'}
number = ''#@param{type:'string'}
email = ''#@param{type:'string'}
portfolio = ''#@param{type:'string'}

In [None]:
#@title Objective / Mission Statement
#@markdown Run this cell to generate a resume objective. It'll prompt you to input an objective. You will need to use your brain for this one. Or just leave it blank because apparently objectives on resumes are outdated (they are good for padding with buzzwords from the listing though).
#@markdown
#@markdown I use a truly garbage system to print out the top half (rounded up) of the most important sentences in the job_summary - a summary of a summary basically.
#@markdown
job_title = get_title(posting.job_title[0])
job_summary = summarize_posting(posting.job_summary[0])
objective = gen_objective(job_title, job_summary)
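`summarize_posting` leaves the sentence scoring to pysummarization and then keeps the top-scoring half. The sketch below isolates that keep-the-top-half step using a deliberately crude stand-in scorer (the average corpus frequency of a sentence's words), which is not how pysummarization ranks sentences. It splits on the same `.`, `?` and newline delimiters as the notebook; unlike the notebook, it returns the kept sentences in their original order rather than in score order.

```python
import math
import re
from collections import Counter

def split_sentences(text):
    """Split on '.', '?' and newlines, dropping empty pieces."""
    return [part.strip() for part in re.split(r"[.?\n]", text) if part.strip()]

def summarize_top_half(text):
    """Score sentences by average word frequency and keep the top half (rounded up)."""
    sentences = split_sentences(text)
    words = [re.findall(r"[a-z']+", s.lower()) for s in sentences]
    freq = Counter(w for ws in words for w in ws)
    scores = [sum(freq[w] for w in ws) / len(ws) if ws else 0.0 for ws in words]
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    n_keep = math.ceil(len(sentences) / 2)
    # report the kept sentences in their original order
    return [sentences[i] for i in sorted(ranked[:n_keep])]

# Hypothetical job summary
posting_text = ("We need an analyst. The analyst will build data reports. "
                "Reports go to the sales team? You will love data.")
print("\n".join(summarize_top_half(posting_text)))
```

Keeping "half, rounded up" means a one-sentence summary always survives, while a long summary still gets cut down.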

In [None]:
#@title Work Experience
#@markdown Here's what happens when you run this cell:
#@markdown * Extracts verb phrases from the responsibilities listed in the job posting and converts them into past tense
#@markdown * Compares those phrases to the work experience in your resume using fuzzy matching
#@markdown * Attributes those phrases to the job title with the highest matching experience, then adds your original resume experience under each title

# convert job responsibilities to past experience phrases (blank CSV cells load as NaN, so drop them first)
job_actions = get_actions(posting.job_responsibilities.dropna())

# match experiences to resume experiences and jobs
past_actions = resume.my_actions.dropna().values
matches = match_actions(job_actions, past_actions)
action_job_matches = match_action_to_job(job_actions, past_actions, resume)
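`match_actions` scores pairs with fuzzywuzzy's `fuzz.token_sort_ratio`, which sorts the words of each string before comparing them so word order stops mattering, and `match_action_to_job` then files each job action under the title of its best match. The sketch below approximates both steps with the standard library: `difflib.SequenceMatcher` on token-sorted strings stands in for fuzzywuzzy, so scores may differ from the real ones. The resume rows and actions are hypothetical.

```python
import re
from difflib import SequenceMatcher

def token_sort(text):
    """Lowercase, keep alphanumeric words, and sort them alphabetically."""
    return " ".join(sorted(re.findall(r"[a-z0-9]+", text.lower())))

def token_sort_score(a, b):
    """0-100 similarity of two strings after token sorting (difflib stand-in)."""
    return round(100 * SequenceMatcher(None, token_sort(a), token_sort(b)).ratio())

def group_by_best_title(job_actions, resume_rows):
    """Attach each job action to the title of its closest past action.

    resume_rows is a list of (title, past_action) pairs. Ties go to the first
    row here, whereas match_actions keeps the last tied option.
    """
    grouped = {}
    for action in job_actions:
        title, _ = max(resume_rows, key=lambda row: token_sort_score(action, row[1]))
        grouped.setdefault(title, []).append(action)
    return grouped

# Hypothetical inputs
demo_rows = [("Barista", "Trained new staff on espresso drinks"),
             ("Research Assistant", "Analyzed survey data and built reports")]
demo_actions = ["Built weekly data reports.", "Trained team members."]

for title, actions in group_by_best_title(demo_actions, demo_rows).items():
    print(title)
    for action in actions:
        print("-", action)
```

Sorting tokens first is what lets "data reports, built" and "built data reports" score as identical.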

In [None]:
#@title Skills & Interests
#@markdown Here's what happens when you run this cell:
#@markdown * Extracts noun clusters and verb phrases from the skills & qualifications listed in the job posting
#@markdown * Sorts them into interests and skills depending on the format - there's a pretty high chance of repetition between these lists so watch out.
interests = extract_interests(posting.job_skills.dropna())
skills = extract_skills(posting.job_skills.dropna())
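Since `interests` and `skills` tend to repeat each other (and themselves), one possible clean-up pass is sketched below. It is a hypothetical addition, not part of the original notebook: `dedupe_lists` and its word-overlap rule are assumptions chosen for illustration. It drops case-insensitive duplicates within each list, keeping first-seen order, then drops any interest whose words all appear inside a single skill phrase.

```python
def dedupe_lists(skills, interests):
    """Return (skills, interests) with repeats removed (hypothetical clean-up)."""
    def unique(items):
        # drop case-insensitive duplicates, keeping the first spelling seen
        seen = set()
        out = []
        for item in items:
            key = item.strip().lower()
            if key and key not in seen:
                seen.add(key)
                out.append(item.strip())
        return out

    skills = unique(skills)
    skill_words = [set(s.lower().split()) for s in skills]
    # an interest is redundant if all its words appear in one skill phrase
    interests = [
        i for i in unique(interests)
        if not any(set(i.lower().split()) <= words for words in skill_words)
    ]
    return skills, interests

skills_demo = ["manages data pipelines", "writes sql queries", "manages data pipelines"]
interests_demo = ["data pipelines", "sql", "machine learning", "SQL"]
print(dedupe_lists(skills_demo, interests_demo))
```

In the notebook this would be applied as `skills, interests = dedupe_lists(skills, interests)` before writing the resume, if you want it.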

In [None]:
#@title Write Resume
#@markdown If you've successfully run all the cells above, this cell should run and end with a prompt to download a .txt of a lightly formatted resume.
#@markdown
#@markdown **It will not be a good resume.** It will require a lot of human input to cull the noise and check for relevancy and accuracy to your actual experiences.
#@markdown
#@markdown In an ideal world, this new resume will serve as a starting point for resume customizations and will ultimately rely on you to fix it up and make it legible for humans.
#@markdown
#@markdown In our actual world, it's the messy result of a 5-hour coding session featured in [this video](https://youtu.be/Kpm8rEywBDQ).
# initialize. not really necessary but why not.
new_resume = ''

# header information
new_resume += '{}\n{} | {} | {}'.format(name, number, email, portfolio)

# objective header
new_resume += '\n\nOBJECTIVE\n'+('─' * 50)

# add objective
new_resume += '\n{}\n\n'.format(objective)

# work experience header
new_resume += 'WORK EXPERIENCE\n'+('─' * 50) +'\n'

# add jobs and experiences
for t in action_job_matches.keys():
  new_resume += "{}\n".format(t)
  for a in action_job_matches[t]:
    new_resume += '- {}\n'.format(a)
  new_resume += '\n'

# skills header
new_resume += 'SKILLS\n'+('─' * 50) +'\n'
new_resume += ', '.join(skills) + '\n\n'

# interests header
new_resume += 'INTERESTS\n'+('─' * 50) +'\n'
new_resume += ', '.join(interests)

# write the string to a .txt file (UTF-8 for the '─' rules) and prompt a download
out_filename = '{} Resume.txt'.format(job_title)
with open(out_filename, 'w', encoding='utf-8') as textfile:
  textfile.write(new_resume)
files.download(out_filename)
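The write-up cell builds one long string and hands it to Colab's `files.download`. To check the layout without Colab, the same formatting can be pulled into a plain function. `format_resume` and its arguments are hypothetical names; the section headers and 50-character rule mirror the cell above, and the sample values are made up.

```python
RULE = "─" * 50

def format_resume(name, number, email, portfolio, objective,
                  action_job_matches, skills, interests):
    """Lay out the resume text the same way as the Write Resume cell."""
    parts = [f"{name}\n{number} | {email} | {portfolio}"]
    parts.append(f"\n\nOBJECTIVE\n{RULE}\n{objective}\n\n")
    parts.append(f"WORK EXPERIENCE\n{RULE}\n")
    for title, actions in action_job_matches.items():
        parts.append(f"{title}\n")
        parts.extend(f"- {action}\n" for action in actions)
        parts.append("\n")
    parts.append(f"SKILLS\n{RULE}\n{', '.join(skills)}\n\n")
    parts.append(f"INTERESTS\n{RULE}\n{', '.join(interests)}")
    return "".join(parts)

demo_resume = format_resume(
    "Ada Example", "555-0100", "ada@example.com", "example.com",
    "I am a young professional and I am ready to be your analyst!",
    {"Research Assistant": ["Built weekly data reports."]},
    ["writes sql queries"], ["machine learning"])
print(demo_resume)
```

Keeping the layout in a pure function means it can be checked with ordinary asserts, and the Colab-only download stays a one-line step at the end.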
