# Resume Document Content Generation

The task is to:

- Randomly pick a job and a set of skills related to that job
- Build a prompt message from the job and skill set
- Send the prompt message to the OpenAI API
- Map the OpenAI API response into a `pandas` `DataFrame`
- Save the `DataFrame` to a gzip-compressed Parquet file

## Import dependencies

In [None]:
import pandas as pd
import openai
import os
import random

from pathlib import Path

## Get the OpenAI API key

The OpenAI API key must be stored in an environment variable named `OPENAI_API_KEY`.

In [None]:
openai.api_key = os.getenv("OPENAI_API_KEY")
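
`os.getenv` returns `None` when the variable is missing, so a missing key would only surface later, at the first API call. A minimal sketch of failing early instead; `require_env` is a hypothetical helper, not part of the notebook or of the `openai` package:

```python
import os


def require_env(name):
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

It could be used as `openai.api_key = require_env("OPENAI_API_KEY")`.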

## Jobs and skill sets

We use several jobs as data labels, each with a list of related skills. "Fullstack Developer" has no list of its own; it draws on the frontend, backend, and DevOps lists.

In [None]:
jobs = [
    "Frontend Developer", "Backend Developer", "Fullstack Developer", "DevOps",
    "Data Engineer",
]

skills = {
    "Frontend Developer": [
        "HTML", "CSS", "JavaScript", "TypeScript",
        "ReactJS", "VueJS", "AngularJS", "Flutter",
        "PHP", "Linux", "Dart",
    ],
    "Backend Developer": [
        "JavaScript", "TypeScript", "NodeJS", "Java",
        "Kotlin", "Rust", "Go", "Python",
        "Django", "SQL", "MongoDB", "Kafka",
        "PHP", "Ruby", "Linux",
    ],
    "DevOps": [
        "Azure DevOps", "Amazon AWS", "Kubernetes", "Docker",
        "Java", "Kotlin", "JavaScript", "TypeScript",
        "Go", "Linux",
    ],
    "Data Engineer": [
        "Python", "SQL", "Machine Learning",
    ],
}
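
The `skills` mapping has no key for every entry in `jobs`, which is easy to miss when new jobs are added. A small sketch for spotting such jobs, run here on a shortened copy of the data; `jobs_without_skills` and the `demo_` names are hypothetical and used only for illustration:

```python
def jobs_without_skills(jobs, skills):
    """Return the jobs that have no list of their own in the skills mapping."""
    return [job for job in jobs if job not in skills]


# Shortened copy of the notebook's data, for illustration only.
demo_jobs = ["Frontend Developer", "Fullstack Developer", "Data Engineer"]
demo_skills = {
    "Frontend Developer": ["HTML", "CSS"],
    "Data Engineer": ["Python", "SQL"],
}

print(jobs_without_skills(demo_jobs, demo_skills))
```

Any job reported here needs special handling before `skills[job]` is looked up, otherwise the lookup raises a `KeyError`.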

## Create a prompt message

The message is built from a randomly selected number of years of experience, a job, and a set of skills for that job.

In [None]:
def create_message_and_job():
    experience = random.randint(0, 20)
    job = random.choice(jobs)

    # More than five years of experience counts as senior.
    level = "Senior" if experience > 5 else "Junior"

    if job == "Fullstack Developer":
        # Fullstack has no skill list of its own, so combine the other lists.
        all_skills = skills["Frontend Developer"] + skills["Backend Developer"] + skills["DevOps"]
    else:
        all_skills = skills[job]

    # Draw with replacement, then drop duplicates while keeping the drawn order.
    drawn = random.choices(all_skills, k=random.randint(1, len(all_skills)))
    stack = list(dict.fromkeys(drawn))

    status = level + " " + job
    if len(stack) == 1:
        skill_stack = stack[0]
    else:
        # Comma-separate all but the last skill, then append the last one after "and".
        skill_stack = ", ".join(stack[:-1]) + " and " + stack[-1]
    content = (
        "Create a resume for a " + status + " with " + str(experience)
        + " years of experience in " + skill_stack + "."
    )

    return {
        "job": status,
        "message": [{"role": "user", "content": content}],
    }
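
The skill list in the prompt has to read naturally whether one, two, or many skills were drawn. A minimal sketch of that joining step on its own; `join_skills` is a hypothetical helper name, not something the notebook defines:

```python
def join_skills(stack):
    """Join skill names into a phrase such as 'HTML, CSS and Dart'."""
    if not stack:
        return ""
    if len(stack) == 1:
        return stack[0]
    # Comma-separate all but the last skill, then append the last one after "and".
    return ", ".join(stack[:-1]) + " and " + stack[-1]


for demo in (["HTML"], ["HTML", "CSS"], ["HTML", "CSS", "Dart"]):
    print(join_skills(demo))
```

Slicing with `stack[:-1]` keeps every skill except the last, so no skill is lost from the phrase.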

## Generate resume documents

### Generate resume content

The resume content is generated through the OpenAI API with the `gpt-3.5-turbo` large language model (LLM); ChatGPT was originally based on the same model family. The function returns a dictionary that holds the job title with its seniority level as the label, together with the generated resume text.

In [None]:
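def generate_resume():
    messages_and_job = create_message_and_job()

    # openai.ChatCompletion is the interface of openai package versions before 1.0.
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=messages_and_job["message"],
    )

    return {
        "job": messages_and_job["job"],
        # The generated text is in the first choice of the response.
        "resume": response["choices"][0]["message"]["content"],
    }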
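
The indexing `response["choices"][0]["message"]["content"]` can be tried without calling the API. The sketch below uses a hand-written dictionary that mimics only the fields the function reads; real responses carry more fields, and `extract_content` and `fake_response` are hypothetical names:

```python
def extract_content(response):
    """Return the text of the first choice in a chat-completion-style response."""
    return response["choices"][0]["message"]["content"]


# Hand-written stand-in for an API response, not real model output.
fake_response = {
    "choices": [
        {"message": {"role": "assistant", "content": "Placeholder resume text"}},
    ],
}

print(extract_content(fake_response))
```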


### Generate multiple resumes

This function calls the previous `generate_resume` function `k` times to create resume content. The resumes are then collected into a `pandas` `DataFrame`, which is returned. If a call fails, the error is printed and generation stops, so the `DataFrame` may contain fewer than `k` rows.

In [None]:
def generate_resumes(k):
    resume_jobs = []
    resumes = []

    for _ in range(k):
        try:
            resume = generate_resume()
        except Exception as e:
            # Stop at the first failed call and keep what was generated so far.
            print(e)
            break
        resume_jobs.append(resume["job"])
        resumes.append(resume["resume"])

    return pd.DataFrame({"jobs": resume_jobs, "resumes": resumes})
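
Because the loop stops at the first error, the result can hold fewer than `k` rows. A sketch of that behavior with a stub in place of the API call; `collect_rows` and `make_flaky_generator` are hypothetical names, and the rows are collected as plain dictionaries rather than in a `DataFrame`:

```python
def collect_rows(generate, k):
    """Call `generate` up to k times, stopping at the first exception."""
    rows = []
    for _ in range(k):
        try:
            rows.append(generate())
        except Exception as error:
            print(f"Stopped after {len(rows)} rows: {error}")
            break
    return rows


def make_flaky_generator(fail_on):
    """Build a stub generator that raises on its `fail_on`-th call."""
    calls = {"count": 0}

    def generate():
        calls["count"] += 1
        if calls["count"] == fail_on:
            raise RuntimeError("simulated API failure")
        return {"jobs": "Junior DevOps", "resumes": f"stub resume {calls['count']}"}

    return generate


# Four rows requested, but the third call fails, so only two are collected.
rows = collect_rows(make_flaky_generator(fail_on=3), k=4)
```

A list of dictionaries like `rows` can be passed to `pd.DataFrame(rows)` to get the same two-column table.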

Let's generate 4 resumes into a `DataFrame`.

In [None]:
df_resumes = generate_resumes(4)

Then save them to a gzip-compressed Parquet file. `DataFrame.to_parquet` needs a Parquet engine such as `pyarrow` or `fastparquet` to be installed.

In [None]:
filepath = Path("test_results/df.resumes.gzip")
filepath.parent.mkdir(parents=True, exist_ok=True)

df_resumes.to_parquet(filepath, compression="gzip")

In [None]:
pd.read_parquet(filepath)