# Resume Document Content Generation

The task is to:

- Randomly pick a job and a set of skills related to that job
- Create a prompt message from the job and skill set
- Send the prompt message to the OpenAI API
- Map the OpenAI API response into a `pandas` `DataFrame`
- Save the `DataFrame` to a `parquet` file compressed with `gzip`

## Import dependencies

In [None]:
import pandas as pd
import openai
import os
import random
import json

from pathlib import Path
from tqdm import tqdm

## Get the OpenAI API key

The OpenAI API key has to be stored in an environment variable named `OPENAI_API_KEY`.

In [None]:
openai.api_key = os.getenv("OPENAI_API_KEY")

## Jobs and skill sets

Each job serves as a data label and has a list of related skills. Both are loaded from `skills.json`.

In [None]:
with open("skills.json") as f:
    skills = json.load(f)
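
The contents of `skills.json` are not shown here. The rest of the code only requires that it maps each job title to a list of skill names. The sketch below illustrates that assumed shape; the job titles, skills, and the `example_*` names are made up for illustration and are not taken from the real file.

```python
import json

# Hypothetical contents of skills.json: each job title maps to a list of skills.
# The job titles and skills are illustrative, not taken from the real file.
example_skills_json = """
{
  "Data Engineer": ["Python", "SQL", "Apache Spark", "Airflow"],
  "Frontend Developer": ["JavaScript", "TypeScript", "React", "CSS"]
}
"""

example_skills = json.loads(example_skills_json)
example_jobs = list(example_skills.keys())

# Basic shape check: every job maps to a non-empty list of strings.
for job, job_skills in example_skills.items():
    assert isinstance(job_skills, list) and job_skills
    assert all(isinstance(skill, str) for skill in job_skills)
```

A check like this can catch an empty skill list early; an empty list would otherwise make the random skill selection fail later.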

In [None]:
jobs = list(skills.keys())

## Create a prompt message

The message is built from a randomly selected number of years of experience, a job, and a set of skills for that job.

In [None]:
def create_message_and_job():
    experience = random.randint(0, 40)
    job = random.choice(jobs)

    job_skills = skills[job]

    # Draw skills with replacement, then drop duplicates while keeping their order.
    stack = list(dict.fromkeys(random.choices(job_skills, k=random.randint(1, len(job_skills)))))

    # Join all but the last skill with commas and the last one with "and".
    skill_stack = (", ".join(stack[:-1]) + " and " + stack[-1]) if len(stack) > 1 else stack[0]
    content = f"Create a resume for {job}, with {experience} years of experience in {skill_stack}."

    return {
        "job": job,
        "stack": stack,
        "message": [{ "role": "user", "content": content }],
    }
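
Joining the skills into a readable list is easy to get wrong at the edges (a single skill, or exactly two). As a minimal sketch, the joining step could be pulled into a small helper and checked on its own. `join_skills` is a hypothetical name introduced here, not part of the notebook.

```python
def join_skills(stack):
    """Join skill names into an English list, e.g. 'A, B and C'."""
    if not stack:
        return ""
    if len(stack) == 1:
        return stack[0]
    # All but the last item are comma-separated; the last one follows "and".
    return ", ".join(stack[:-1]) + " and " + stack[-1]


print(join_skills(["Python"]))                   # Python
print(join_skills(["Python", "SQL"]))            # Python and SQL
print(join_skills(["Python", "SQL", "Docker"]))  # Python, SQL and Docker
```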

## Generate resume documents

### Generate resume content

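Resume content is generated through the OpenAI API with the `gpt-3.5-turbo` LLM (Large Language Model). ChatGPT was based on the same LLM. The code uses `openai.ChatCompletion`, the interface provided by versions of the `openai` package before 1.0. The output is a dictionary containing the job (used as the label), the selected skill stack, and the generated resume content.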

In [None]:
def generate_resume():
    messages_and_job = create_message_and_job()
    
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=messages_and_job["message"],
    )

    return {
        "job": messages_and_job["job"],
        "stack": messages_and_job["stack"],
        "resume": response["choices"][0]["message"]["content"],
    }


### Generate multiple resumes

This function calls the previous `generate_resume` function `k` times to create resume content. The resumes are then collected into a `pandas` `DataFrame`, which is returned. If a request fails, the loop stops and the resumes generated so far are kept.

In [None]:
def generate_resumes(k):
    resume_jobs = []
    resume_skills = []
    resumes = []

    for _ in tqdm(range(k)):
        try:
            resume = generate_resume()
            resume_jobs.append(resume["job"])
            resume_skills.append(resume["stack"])
            resumes.append(resume["resume"])
        except Exception as e:
            # Stop at the first failed request; resumes generated so far are kept.
            print(e)
            break

    # Build the DataFrame in one step from the collected columns.
    df = pd.DataFrame({"jobs": resume_jobs, "skills": resume_skills, "resumes": resumes})

    return df
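
The loop above gives up at the first exception. For transient failures, one alternative is to retry a call a few times with increasing delays before giving up. The following is only a sketch of that idea: `call_with_retries`, its parameters, and the `flaky_generate_resume` stand-in are hypothetical names, and no real API call is made.

```python
import time


def call_with_retries(func, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(); after a failure wait base_delay * 2**attempt and try again."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            # Re-raise once the last allowed attempt has failed.
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


# Stand-in for generate_resume that fails twice before succeeding.
calls = {"count": 0}


def flaky_generate_resume():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("temporary failure")
    return {"job": "example job", "stack": ["example skill"], "resume": "example text"}


# Record the delays instead of actually sleeping, to keep the demo fast.
delays = []
result = call_with_retries(flaky_generate_resume, sleep=delays.append)
print(result["job"], delays)
```

In the notebook, `generate_resume` could be passed as `func`. Whether retrying is appropriate depends on the kind of error; an invalid API key, for instance, will not recover.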

Let's generate 10 resumes into a `DataFrame`.

In [None]:
df_resumes = generate_resumes(10)

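Then save them to a `parquet` file. If the file already exists, the new resumes are appended to the stored ones and duplicate resumes are dropped.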

In [None]:
filepath = Path("../data/df.resumes.gzip")
filepath.parent.mkdir(parents=True, exist_ok=True)

In [None]:
# On the first run the file does not exist yet, so there is nothing to merge.
if filepath.exists():
    df_old = pd.read_parquet(filepath)
    df_all = pd.concat([df_old, df_resumes]).drop_duplicates(subset=["resumes"])
else:
    df_all = df_resumes

df_all.to_parquet(filepath, compression="gzip")
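
The merge step can be tried in memory without touching the file. This sketch uses two small made-up frames (`df_old_example`, `df_new_example`) with the same columns as above, and shows that rows whose `resumes` text already exists are dropped. Passing `ignore_index=True` and resetting the index is an optional choice made here to get a clean 0..n-1 index.

```python
import pandas as pd

# Made-up stand-ins for the stored and the newly generated resumes.
df_old_example = pd.DataFrame(
    {"jobs": ["a", "b"], "skills": [["x"], ["y"]], "resumes": ["r1", "r2"]}
)
df_new_example = pd.DataFrame(
    {"jobs": ["b", "c"], "skills": [["y"], ["z"]], "resumes": ["r2", "r3"]}
)

# Append the new rows, then keep the first occurrence of each resume text.
merged = (
    pd.concat([df_old_example, df_new_example], ignore_index=True)
    .drop_duplicates(subset=["resumes"])
    .reset_index(drop=True)
)

print(merged["resumes"].tolist())  # ['r1', 'r2', 'r3']
```

Deduplicating on `resumes` only avoids comparing the `skills` column, which holds lists.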

In [None]:
pd.read_parquet(filepath)