# Experimental notebook on using Langchain

### Covered experiments: 
1. OpenAI API call 
1. PyPDF2 file reading
1. Prompt template
1. Chain 
1. Agent
1. Memory

In [1]:
import os
from dotenv import load_dotenv

load_dotenv()

True

## OpenAI API call

In [127]:
from langchain.llms import OpenAI

llm = OpenAI(temperature=0.9, max_tokens=3000)
# name = llm.predict("t a fency name for this.")
# print(name)

## Read the resume PDF file

In [2]:
PDF_path = '/Users/yusali/Library/CloudStorage/OneDrive-UniversityofToronto/Yussaaa/JOB/Resume_Yusa Li_MLEngineer.pdf'

In [102]:
# importing required modules
import PyPDF2
 
# creating a pdf file object
pdfFileObj = open(PDF_path, 'rb')
 
# creating a pdf reader object
pdfReader = PyPDF2.PdfReader(pdfFileObj)
 
# printing number of pages in pdf file
print(len(pdfReader.pages))
 
# creating a page object
pageObjb = pdfReader.pages[0]
pdf_text = pageObj.extract_text()
# extracting text from page
print(pdf_text)
 
# closing the pdf file object
pdfFileObj.close()

1
Yusa Li Toronto, ON, M4Y 0E8 | 780-707-7844 | yusa.li@mail.utoronto.ca | linkedin.com/in/yusa-li/ | github.com/yussaaa  Enthusiastic ML Engineer with experience in Canadian CPI production, 4+ years of experience on end-to-end ML projects such as NLP classification, named entity extraction, anomaly detection, ML pipeline orchestration and Azure cloud deployment HIGHLIGHTS and TECHNICAL SKILLS          • Self-starter with strong initiative in identifying and solving problems through research, development and testing • Advanced analytical skills developed through projects in areas of finance, retail, marketing using millions of rows dataset  • Analytics & Visualization: SQL · Tableau · Power BI · SparkSQL · BigQuery · Azure Synapse · Matplotlib · Seaborn · Plotly • Big Data & Cloud:                Spark · Databricks · Azure · AWS · GCP · Terraform  • Data Science tools:             PyTorch · TensorFlow · scikit-learn · Pandas · NumPy · Spacy · Label Studio · Great Expectations • Program

In [3]:
## pdftotext preserves the formats from the PDF while PyPDF2 lost them and only returns the text

import pdftotext
# Load your PDF
with open(PDF_path, "rb") as f:
    pdf = pdftotext.PDF(f)
# Read all the text into one string
pdftotext_text = "\n\n".join(pdf)

## Prompot Templates

In [139]:
from langchain.prompts import PromptTemplate

prompt_template_resume_fix = PromptTemplate(
    input_variables =['resume'],
    template = "Help me fix the grammar issues in the following resume \n {resume} \n Bold any changes with ** around the text"
)
p = prompt_template_resume_fix.format(resume=pdftotext_text)
# print(p)

In [140]:
output = llm.predict(p)

In [141]:
len(output)

4162

In [167]:
prompt_template_resume_fix

PromptTemplate(input_variables=['resume'], output_parser=None, partial_variables={}, template='Help me fix the grammar issues in the following resume \n {resume} \n Bold any changes with ** around the text', template_format='f-string', validate_template=True)

## Chains

### Simple chain

In [165]:
from langchain.chains import LLMChain

# Simple chain
chain = LLMChain(llm=llm, prompt=)
response = chain.run(pdftotext_text)

ValidationError: 1 validation error for LLMChain
prompt
  value is not a valid dict (type=type_error.dict)

In [116]:
from langchain.chains import LLMChain

# Simple chain
chain = LLMChain(llm=llm, prompt=prompt_template_resume_fix)
chain.run(pdftotext_text)

'\n\nYusa Li\nToronto, ON, M4Y 0E8 | 780-707-7844 | yusa.li@mail.utoronto.ca | linkedin.com/in/yusa-li/ | github.com/yussaaa\nEnthusiastic ML Engineer with experience in Canadian CPI production, 4+ years of experience on end-to-end ML projects such as NLP classification, named entity extraction, anomaly detection, ML pipeline orchestration and Azure cloud deployment\nHIGHLIGHTS and TECHNICAL SKILLS\n• Self-starter with strong initiative in identifying and solving problems through research, development, and testing\n• Advanced analytical skills developed through projects in areas of finance, retail, and marketing using millions of rows of data\n• Analytics & Visualization: SQL, Tableau, Power BI, SparkSQL, BigQuery, Azure Synapse, Matplotlib, Seaborn, Plotly\n• Big Data & Cloud: Spark, Databricks, Azure, AWS, GCP, Terraform\n• Data Science tools: PyTorch, TensorFlow, scikit-learn, Pandas, NumPy, Spacy, Label Studio, Great Expectations\n• Programming: Python'

### Sequential chains

In [10]:
llm = OpenAI(temperature=0.7)

prompt_template_name = PromptTemplate(
    input_variables =['cuisine'],
    template = "I want to open a restaurant for {cuisine} food. Suggest a fency name for this."
)

name_chain =LLMChain(llm=llm, prompt=prompt_template_name, output_key="restaurant_name")

In [11]:
llm = OpenAI(temperature=0.7)

prompt_template_items = PromptTemplate(
    input_variables = ['restaurant_name'],
    template="Suggest some menu items for {restaurant_name}."
)

food_items_chain =LLMChain(llm=llm, prompt=prompt_template_items, output_key="menu_items")

In [12]:
from langchain.chains import SequentialChain

chain = SequentialChain(
    chains = [name_chain, food_items_chain],
    input_variables = ['cuisine'],
    output_variables = ['restaurant_name', "menu_items"]
)

In [13]:
chain({"cuisine": "Chinese"})

{'cuisine': 'Chinese',
 'restaurant_name': '\n\nDragon Palace',
 'menu_items': '\n\n-Stir-Fried Prawns with Garlic and Chili\n-Honey-Glazed Barbecued Pork\n-Crispy Duck with Hoisin Sauce\n-Wok-Fried Beef with Oyster Sauce\n-Spicy Kung Pao Chicken\n-Szechuan-Style Vegetable Medley\n-Cantonese-Style Chow Mein\n-Egg Fried Rice\n-Vegetable Spring Rolls\n-Hot and Sour Soup\n-Mango Pudding'}

## Agents

In [None]:
from langchain.agents.load_tools import get_all_tool_names

get_all_tool_names()

In [24]:
from langchain.agents import AgentType, initialize_agent, load_tools
from langchain.llms import OpenAI

llm = OpenAI(temperature=0)

# The tools we'll give the Agent access to. Note that the 'llm-math' tool uses an LLM, so we need to pass that in.
tools = load_tools(["serpapi", "llm-math"], llm=llm)

# Finally, let's initialize an agent with the tools, the language model, and the type of agent we want to use.
agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True)

# Let's test it out!
agent.run("What was the GDP of US in 2022 plus 5?")



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3m I need to find the GDP of US in 2022
Action: Search
Action Input: US GDP in 2022[0m
Observation: [36;1m[1;3m$25.46 trillion[0m
Thought:[32;1m[1;3m I need to add 5 to this number
Action: Calculator
Action Input: 25.46 + 5[0m
Observation: [33;1m[1;3mAnswer: 30.46[0m
Thought:[32;1m[1;3m I now know the final answer
Final Answer: The GDP of US in 2022 plus 5 is $30.46 trillion.[0m

[1m> Finished chain.[0m


'The GDP of US in 2022 plus 5 is $30.46 trillion.'

In [26]:
# install this package: pip install wikipedia

# The tools we'll give the Agent access to. Note that the 'llm-math' tool uses an LLM, so we need to pass that in.
tools = load_tools(["wikipedia", "llm-math"], llm=llm)

# Finally, let's initialize an agent with the tools, the language model, and the type of agent we want to use.
agent = initialize_agent(
    tools, 
    llm, 
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, 
    verbose=True
)

# Let's test it out!
agent.run("When was Elon musk born? What is his age right now in 2023?")



[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3m I need to find out when Elon Musk was born and then calculate his age.
Action: Wikipedia
Action Input: Elon Musk[0m
Observation: [36;1m[1;3mPage: Elon Musk
Summary: Elon Reeve Musk ( EE-lon; born June 28, 1971) is a business magnate and investor. Musk is the founder, chairman, CEO and chief technology officer of SpaceX;  angel investor, CEO, product architect and former chairman of Tesla, Inc.; owner, chairman and CTO of X Corp.; founder of the Boring Company; co-founder of Neuralink and OpenAI; and president of the Musk Foundation. He is the wealthiest person in the world, with an estimated net worth of US$226 billion as of September 2023, according to the Bloomberg Billionaires Index, and $249 billion according to Forbes, primarily from his ownership stakes in both Tesla and SpaceX.Musk was born in Pretoria, South Africa, and briefly attended the University of Pretoria before immigrating to Canada at age 18, acquiring c

'Elon Musk was born in 1971 and is 52 years old in 2023.'

## Memory

https://python.langchain.com/docs/modules/memory/

#### Note: 
Be careful with the memory setting. All the history will be passed to the API called and incur cost. 

Could use ConversationBufferWindowMemory with small k to limit the size of the history

from langchain.memory import ConversationBufferWindowMemory

from langchain.chains import ConversationChain

memory = ConversationBufferWindowMemory(k=3)

convo = ConversationChain(
    llm=OpenAI(temperature=0.7),
    memory=memory
)
convo.run("Who won the first cricket world cup?")

In [29]:
convo.run("How much is 5+5?")

' 10.'

In [30]:
convo.run("Who was the captain of the winning team?")

" I'm sorry, I don't know the answer to that question."

In [34]:
print(convo.memory.buffer)

Human: Who was the captain of the winning team?
AI:  I'm sorry, I don't know the answer to that question.


# Parse job posting from LinkedIn

### Attempt using only BS4, failed due to the Show More button

In [55]:
import requests
from bs4 import BeautifulSoup

URL = "https://www.linkedin.com/jobs/view/3664919987"

page = requests.get(URL)

soup = BeautifulSoup(page.content, "html.parser")

In [41]:
soup.find('h1').text

'Machine Learning Engineer - Contract'

In [45]:
soup.find('a', class_='topcard__org-name-link topcard__flavor--black-link').text.strip()

'Wave HQ'

In [46]:
soup.find('span', class_='topcard__flavor topcard__flavor--bullet').text.strip()


'Toronto, Ontario, Canada'

In [47]:
soup.find('span', class_='posted-time-ago__text topcard__flavor--metadata').text.strip()


'1 month ago'

In [87]:
soup.find('div', class_='show-more-less-html__markup relative overflow-hidden')


In [None]:
soup

### Need selenium to click the "Show More" button to reveal the full job description

In [4]:
test_job_url = 'https://www.linkedin.com/jobs/view/3664919987'

In [11]:
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()

driver.get("https://www.linkedin.com/jobs/view/3664919987")

show_more_button = driver.find_element(By.CLASS_NAME, "show-more-less-html__button")
driver.execute_script("arguments[0].click();", show_more_button)

updated_html = driver.page_source

soup = BeautifulSoup(updated_html, "html.parser")

driver.quit()

NameError: name 'BeautifulSoup' is not defined

In [90]:
job_name = soup.find('h1').text
job_name

'Machine Learning Engineer - Contract'

In [96]:
company_name = soup.find('a', class_='topcard__org-name-link topcard__flavor--black-link').text.strip()
company_name

'Wave HQ'

In [92]:
job_location = soup.find('span', class_='topcard__flavor topcard__flavor--bullet').text.strip()
job_location

'Toronto, Ontario, Canada'

In [93]:
posted_time = soup.find('span', class_='posted-time-ago__text topcard__flavor--metadata').text.strip()
posted_time

'1 month ago'

In [94]:
job_description = soup.find('div', class_='show-more-less-html__markup relative overflow-hidden').text.strip()
job_description

"We believe small businesses are at the heart of our communities, and championing them is worth fighting for. We empower small business owners to manage their finances fearlessly, by offering the simplest, all-in-one financial management solution they can't live without.About The RoleWe are looking for a Machine Learning Engineer who will strengthen our capacity to improve the scalability, maintainability and adaptability of our ML practice. This individual will be part of the ML team and report to the Director of Data.The ML team is responsible for developing machine learning models as point solutions for our functional and product stakeholders within the business. We leverage MLOps to accelerate and systematize model development and the management of machine learning infrastructure.Machine Learning is part of Wave’s wider Data function, and works closely with Data Engineers, Analytic Engineers, and Data Analysts who comprise the Analytics and Data Operations and Platform groups. Our 

In [97]:
info = [job_name, company_name, job_location, posted_time, job_description]

In [100]:
'; '.join(info)

"Machine Learning Engineer - Contract; Wave HQ; Toronto, Ontario, Canada; 1 month ago; We believe small businesses are at the heart of our communities, and championing them is worth fighting for. We empower small business owners to manage their finances fearlessly, by offering the simplest, all-in-one financial management solution they can't live without.About The RoleWe are looking for a Machine Learning Engineer who will strengthen our capacity to improve the scalability, maintainability and adaptability of our ML practice. This individual will be part of the ML team and report to the Director of Data.The ML team is responsible for developing machine learning models as point solutions for our functional and product stakeholders within the business. We leverage MLOps to accelerate and systematize model development and the management of machine learning infrastructure.Machine Learning is part of Wave’s wider Data function, and works closely with Data Engineers, Analytic Engineers, and 

In [145]:
info_dict = {'Job Name': job_name, 
            'Company Name': company_name, 
            'Job Location': job_location, 
            'Posted Time': posted_time, 
            'Job Description': job_description}

In [148]:
job_descp = info_dict['Job Description']

### JD summarization

In [162]:
from langchain.prompts import PromptTemplate

prompt_template_JD_summary = PromptTemplate(
    input_variables =['jd'],
    template = "Summarize the following Job description into two parts. 1. What this role need to do 2. What skills does this role require 3. Benefits of working in this company \
                Each part should have 5 bullet points, shorten the bullet points into keywords only, and make sure to include all the technologies. Make sure all the tools and tech mentioned are covered \
                Output to be in markdown format, with bold part tile, bullet point name \n\n \
                {jd}"
)
p = prompt_template_JD_summary.format(jd=job_descp)

In [176]:
from langchain.prompts import PromptTemplate

prompt_template_JD_summary = PromptTemplate(
    input_variables =['jd'],
    template = "Summarize the following Job description into two parts. 1. What this role need to do 2. What skills does this role require 3. Benefits of working in this company \
                In the end of each part, put 2 new lines\
                Each part should have 5 bullet points, shorten the bullet points into keywords only, and make sure to include all the technologies. Make sure all the tools and tech mentioned are covered \
                Output to be in markdown format, with bold part tile, bullet point name \n\n \
                {jd}"
)
p = prompt_template_JD_summary.format(jd=job_descp)

In [177]:
jd_summary = llm.predict(p)

In [180]:
jd_summary.strip().split('\n\n')

['**What this role need to do**\n- Analyze & engineer features\n- Train & deploy models\n- Maintain machine learning sys.\n- Automate system architectures\n- Build & deploy ML models',
 '**What skills does this role require**\n- 3+yrs exp. ML sys.\n- ML Algorithms:DT,GB,NB,SVM\n- Python & SQL\n- AWS, SageMaker, Docker\n- MLOps, NLP, LLM, LLM Frameworks',
 '**Benefits of working in this company**\n- Diverse & Inclusive Culture\n- Work from Where You Work Best\n- Investment in Health & Wellness\n- Learning Experiences & Education\n- Fair Compensation & Perks']

# Backend helper function testing

In [5]:
from src.linkedin_jd_parser import linkedin_jd_parser

In [6]:
job_test = linkedin_jd_parser(test_job_url)

In [7]:
job_test['Job Description']

"We believe small businesses are at the heart of our communities, and championing them is worth fighting for. We empower small business owners to manage their finances fearlessly, by offering the simplest, all-in-one financial management solution they can't live without.About The RoleWe are looking for a Machine Learning Engineer who will strengthen our capacity to improve the scalability, maintainability and adaptability of our ML practice. This individual will be part of the ML team and report to the Director of Data.The ML team is responsible for developing machine learning models as point solutions for our functional and product stakeholders within the business. We leverage MLOps to accelerate and systematize model development and the management of machine learning infrastructure.Machine Learning is part of Wave’s wider Data function, and works closely with Data Engineers, Analytic Engineers, and Data Analysts who comprise the Analytics and Data Operations and Platform groups. Our 

### Note on the useage between ChatOpenAI and OpenAI API

The LLM OpenAI API is more like a general language model to predict the next word. It had hard time to understand the prompt instructions. It only make prediction based on the given words. 

Whild the ChatOpenAI API is the same as th ChatGPT service is using. The instructions could be easily understood. Thus, in this use case, ChatOpenAI API should be used instead. 

In [41]:
from src.langchain_helper import summarize_job_description, resume_jd_skill_match

In [42]:
jd_summary = summarize_job_description(job_test['Job Description'])

In [68]:
jd_summary_test = llm.predict(f"Summarize the following Job description {job_test['Job Description']} into three parts. 1. What this role need to do 2. What skills does this role require 3. Benefits of working in this company \
                Each part should have 5 bullet points, shorten the bullet points into keywords only, and make sure to include all the technologies. Make sure all the tools and tech mentioned are covered \
                Output to be in markdown format, with bold part tile, bullet point name \n\n \
                ")

In [69]:
print(jd_summary_test)

**Role Responsibilities:**

- Strengthen ML practice scalability, maintainability, and adaptability
- Develop machine learning models for functional and product stakeholders
- Leverage MLOps for model development and management
- Collaborate with Agile team and Wave stakeholder teams
- Automate and maintain system architecture for machine learning

**Required Skills:**

- 3+ years of experience in production machine learning systems
- Strong foundational knowledge in machine learning
- Proficiency in Python, SQL, AWS, Amazon SageMaker, and Docker
- Practical knowledge of MLOps and building pipelines
- Bonus: experience with natural language processing and LLM frameworks

**Benefits of Working at Wave:**

- Flexible work location (office or remote)
- Support for personal growth and learning
- Investment in health and wellness
- Competitive compensation and office perks
- Inclusive and diverse company culture


In [None]:
prompt_template_JD_summary = PromptTemplate(
    input_variables =['jd'],
    template = "Summarize the following Job description {jd} into three parts. 1. What this role need to do 2. What skills does this role require 3. Benefits of working in this company \
                Each part should have 5 bullet points, shorten the bullet points into keywords only, and make sure to include all the technologies. Make sure all the tools and tech mentioned are covered \
                Output to be in markdown format, with bold part tile, bullet point name \n\n \
                "
)

name_chain = LLMChain(llm=llm, prompt=prompt_template_JD_summary)

response = name_chain.run(prompt_template_JD_summary)

In [55]:
print(jd_summary)



**1. What this role need to do**
* Develop innovative solutions with Cloud technologies
* Build and maintain microservices
* Design and implement scalable systems
* Deploy and maintain applications
* Troubleshoot and debug systems

**2. What skills does this role require**
* Expertise in Cloud technologies
* Knowledge of microservices
* Experience with various programming languages
* Proficiency in database management
* Ability to manage large scale systems

**3. Benefits of working in this company**
* Competitive salary and benefits
* Opportunity to work with cutting-edge technologies
* Exposure to software engineering best practices
* Flexible working hours and remote work
* Opportunity to learn and grow


In [15]:
skills_required = jd_summary.strip().split('\n\n')[1]

In [16]:
from src.langchain_helper import resume_jd_skill_match


In [17]:
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.chains import SequentialChain

import os
from dotenv import load_dotenv
load_dotenv()

from langchain.chat_models.openai import ChatOpenAI

llm = ChatOpenAI(model_name='gpt-3.5-turbo', temperature=0.0, max_tokens=3000)

def resume_jd_skill_match(resume, job_skills):

    prompt_template_JD_summary = PromptTemplate(
    input_variables = ['resume','job_skills'],
    template = "Compare and show the skills listed in {job_skills}\
                are 1. Fully matched \
                    2. Partial matched \
                    3. Not matched \
                with the skills in {resume}"
)

    chain = LLMChain(llm=llm, prompt=prompt_template_JD_summary)

    response = chain.run({"resume": resume, 
                          "job_skills": job_skills})

    return response


In [49]:
test_match = resume_jd_skill_match2(pdftotext_text, skills_required)

In [56]:
print(skills_required)

- Develop: React, Node.js, MongoDB, GraphQL
- Design: UI/UX
- Maintain: Database
- Integrate: 3rd-Party APIs
- Troubleshoot: Bugs


In [31]:
llm = ChatOpenAI(model_name='gpt-3.5-turbo', temperature=0.0, max_tokens=3000)

prompt_template_JD_summary = PromptTemplate(
    input_variables = ['resume','job_skills'],
    template = "Compare and show the skills listed in {job_skills}\
                are 1. Fully matched \
                    2. Partial matched \
                    3. Not matched \
                with the skills in {resume}"
)

    chain = LLMChain(llm=llm, prompt=prompt_template_JD_summary)

    response = chain.run({"resume": resume, 
                          "job_skills": job_skills})


1. Fully matched: Design: UI/UX, Integrate: 3rd-Party APIs, Troubleshoot: Bugs
2. Partial matched: Develop: React, Node.js, MongoDB, GraphQL, Maintain: Database
3. Not matched: Machine Learning Engineer, Analytics & Visualization, Big Data & Cloud, Data Science tools, Programming, Certification, Self-starter, Advanced analytical skills, NLP classification, Named entity extraction, Anomaly detection, ML pipeline orchestration, Azure cloud deployment, SQL, Tableau, Power BI, SparkSQL, BigQuery, Azure Synapse, Matplotlib, Seaborn, Plotly, Spark, Databricks, Azure, AWS, GCP, Terraform, PyTorch, TensorFlow, scikit-learn, Pandas, NumPy, Spacy, Label Studio, Great Expectations, Python, R, Linux/Bash, Git, Docker, JavaScript, Flask, Azure Data Scientist Associate, AWS Solution Architect Associate, Data Validation Pipeline, Multi-hop data transferring project, Named Entity Recognition, Outlier Detection, Amazon product historical price tracking tool, Keepa, WebFOCUS, Testing Plan, Business Int

In [57]:
## Test the chat backend API 

In [15]:
from src.langchain_helper_chatapi import summarize_job_description, resume_jd_skill_match

In [32]:
llm = OpenAI(temperature=0.3)

def summarize_job_description(jd):
    """_summary____

    Args:
        jd (_type_): _description_
    """

    # prompt_template_JD_summary = PromptTemplate(
    # input_variables =['jd'],
    template = "Summarize the following Job description {jd} into three parts. 1. What this role need to do 2. What skills does this role require 3. Benefits of working in this company \
                Each part should have 5 bullet points, shorten the bullet points into keywords only, and make sure to include all the technologies. Make sure all the tools and tech mentioned are covered \
                Output to be in markdown format, with bold part tile, bullet point name \n\n \
                "
# )

    # name_chain = LLMChain(llm=llm, prompt=prompt_template_JD_summary)

    # response = name_chain.run(prompt_template_JD_summary)
    response = llm.predict(template)

    return response

In [45]:
llm = ChatOpenAI(model_name='gpt-3.5-turbo', temperature=0.2, max_tokens=3000)
template = "Summarize and extract information from the job description {jd} into three parts. 1. What this role need to do 2. What skills does this role require 3. Benefits of working in this company \
                Each part should have 5 bullet points, shorten the bullet points into keywords only, and make sure to include all the technologies. Make sure all the tools and technologies mentioned are in the output \
                Output to be in markdown format, with bold part tile, bullet point name \n\n \
                "
response = llm.call_as_llm(template)

In [26]:
js_sum_2 = summarize_job_description(job_test['Job Description'])

In [46]:
print(response)

**1. What this role needs to do:**

- Develop and maintain software applications and systems
- Collaborate with cross-functional teams to gather requirements and design solutions
- Conduct code reviews and ensure adherence to coding standards
- Troubleshoot and debug software issues
- Implement and maintain software documentation

**2. Skills required for this role:**

- Proficiency in programming languages such as Java, Python, and C++
- Experience with web development frameworks like Angular and React
- Knowledge of database management systems such as MySQL and MongoDB
- Familiarity with version control systems like Git
- Strong problem-solving and analytical skills

**3. Benefits of working in this company:**

- Competitive salary and benefits package
- Opportunities for professional growth and career advancement
- Collaborative and inclusive work environment
- Cutting-edge technologies and tools
- Work-life balance initiatives


In [35]:
llm.memory

AttributeError: 'ChatOpenAI' object has no attribute 'memory'

In [13]:
print(js_sum_2)

**1. What this role need to do:**

- Develop and maintain software applications
- Collaborate with cross-functional teams to gather requirements and design solutions
- Write clean, efficient, and scalable code
- Conduct unit testing and debugging of applications
- Stay updated with emerging technologies and industry trends

**2. What skills does this role require:**

- Proficiency in programming languages such as Java, Python, and C++
- Experience with web development frameworks like React and Angular
- Strong knowledge of database management systems, such as MySQL and MongoDB
- Familiarity with version control systems like Git
- Excellent problem-solving and analytical skills

**3. Benefits of working in this company:**

- Competitive salary and benefits package
- Opportunities for career growth and professional development
- Collaborative and inclusive work environment
- Cutting-edge technologies and tools
- Work-life balance and flexible work arrangements


In [27]:
print(js_sum_2)

**1. What this role need to do:**

- Develop and maintain software applications
- Collaborate with cross-functional teams to gather requirements and design solutions
- Write clean, efficient, and maintainable code
- Conduct unit testing and debugging of applications
- Stay up-to-date with industry trends and technologies

**2. What skills does this role require:**

- Strong proficiency in programming languages such as Java, Python, and C++
- Experience with web development frameworks like Angular and React
- Knowledge of database management systems such as MySQL and MongoDB
- Familiarity with version control systems like Git
- Understanding of software development methodologies and best practices

**3. Benefits of working in this company:**

- Competitive salary and benefits package
- Opportunities for professional growth and career advancement
- Collaborative and inclusive work environment
- Cutting-edge technologies and tools
- Work-life balance and flexible work arrangements


In [67]:
job_test['Job Description']

"We believe small businesses are at the heart of our communities, and championing them is worth fighting for. We empower small business owners to manage their finances fearlessly, by offering the simplest, all-in-one financial management solution they can't live without.About The RoleWe are looking for a Machine Learning Engineer who will strengthen our capacity to improve the scalability, maintainability and adaptability of our ML practice. This individual will be part of the ML team and report to the Director of Data.The ML team is responsible for developing machine learning models as point solutions for our functional and product stakeholders within the business. We leverage MLOps to accelerate and systematize model development and the management of machine learning infrastructure.Machine Learning is part of Wave’s wider Data function, and works closely with Data Engineers, Analytic Engineers, and Data Analysts who comprise the Analytics and Data Operations and Platform groups. Our 

## Prompt engineering with ChatOpenAi api

In [47]:
from langchain.chat_models import ChatOpenAI
from langchain.prompts.chat import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    AIMessagePromptTemplate,
    HumanMessagePromptTemplate,
)
from langchain.schema import AIMessage, HumanMessage, SystemMessage

In [67]:
chat = ChatOpenAI(model_name='gpt-3.5-turbo', temperature=0.1, max_tokens=3000)

In [77]:

system_message_prompt = SystemMessagePromptTemplate(
    prompt=PromptTemplate(
            template="You are summarizing a job description for a chatbot.\
                    Provide a concise summary of the job description in three parts.\
                    Each part has 5 bullet points. Make sure all the skills, technologies are covered.",
            input_variables=[]
            )
)

human_message_prompt = HumanMessagePromptTemplate(
        prompt=PromptTemplate(
            template="The following is the job description {jd}?",
            input_variables=["jd"],
        )
    )
chat_prompt_template = ChatPromptTemplate.from_messages([system_message_prompt, human_message_prompt])
chat = ChatOpenAI(temperature=0.9)
chain = LLMChain(llm=chat, prompt=chat_prompt_template)
response_jd_summary = chain.run(job_test['Job Description'])

print(response_jd_summary)


Summary:
1. Responsibilities:
   - Strengthen the scalability, maintainability, and adaptability of the machine learning (ML) practice
   - Develop ML models as solutions for functional and product stakeholders
   - Collaborate with Agile teams and stakeholders to build and deploy models that address business objectives
   - Analyze and engineer features using large amounts of data from multiple sources
   - Automate and maintain a system architecture that supports ML at scale

2. Requirements:
   - 3+ years of experience implementing and maintaining production ML systems
   - Strong foundational knowledge in ML and experience with various classification models
   - Proficiency in Python, SQL, AWS, Amazon SageMaker, and Docker
   - Practical knowledge of MLOps and building pipelines for model training and deployment
   - Bonus: experience with natural language processing and large language models

3. Company Culture:
   - Self-motivated and autonomous work style
   - Value collaboratio

In [78]:
print(response_jd_summary.split('\n\n')[1])

2. Requirements:
   - 3+ years of experience implementing and maintaining production ML systems
   - Strong foundational knowledge in ML and experience with various classification models
   - Proficiency in Python, SQL, AWS, Amazon SageMaker, and Docker
   - Practical knowledge of MLOps and building pipelines for model training and deployment
   - Bonus: experience with natural language processing and large language models


In [79]:
system_message_prompt = SystemMessagePromptTemplate(
    prompt=PromptTemplate(
            template="You are comparing the skills listed in a job description with the skills mentioned in a resume. \
                    Categorize skills in the job description skills into 1. Fully matched, 2. Partial matched, 3. Not matched \
                    ",
            input_variables=[]
            )
)

human_message_prompt = HumanMessagePromptTemplate(
        prompt=PromptTemplate(
            template="Job description skills: {job_skill_required} \n\
                    Resume {resume} ",
            input_variables=['resume','job_skill_required'],
        )
    )
chat_prompt_template = ChatPromptTemplate.from_messages([system_message_prompt, human_message_prompt])
chain = LLMChain(llm=chat, prompt=chat_prompt_template)
response = chain.run({'resume': resume_skills_summary, 
                        'job_skill_required': response_jd_summary.split('\n\n')[1]})

print(response)

Fully matched skills:
- Proficiency in Python, SQL, AWS, Amazon SageMaker, and Docker

Partial matched skills:
- 3+ years of experience implementing and maintaining production ML systems
- Strong foundational knowledge in ML and experience with various classification models
- Practical knowledge of MLOps and building pipelines for model training and deployment

Not matched skills:
- Bonus: experience with natural language processing and large language models

Note: The provided resume does not explicitly mention experience with ML systems, ML foundational knowledge, MLOps, or NLP.


In [63]:
llm = OpenAI(temperature=0.1, max_tokens=3000)

def summarize_resume_skills(resume):

    prompt_template_resume_skill_summary = PromptTemplate(
    input_variables =['resume'],
    template = "List all the skills/ technoolgies mentioned in the follow resume {resume}"
)

    name_chain = LLMChain(llm=llm, prompt=prompt_template_resume_skill_summary)

    response = name_chain.run(resume)

    return response


In [64]:
resume_skills_summary = summarize_resume_skills(pdftotext_text)

In [65]:
print(resume_skills_summary)


Skills/Technologies: 
- Self-starter with strong initiative in identifying and solving problems through research, development and testing
- Advanced analytical skills
- Analytics & Visualization: SQL, Tableau, Power BI, SparkSQL, BigQuery, Azure Synapse, Matplotlib, Seaborn, Plotly
- Big Data & Cloud: Spark, Databricks, Azure, AWS, GCP, Terraform
- Data Science tools: PyTorch, TensorFlow, scikit-learn, Pandas, NumPy, Spacy, Label Studio, Great Expectations
- Programming: Python, SQL (MySQL, PostgreSQL), R, Linux/Bash, Git, Docker, JavaScript, React, Flask
- Certification: Azure Data Scientist Associate, AWS Solution Architect Associate


In [2]:
if True: 
    a = 3
a

3