### Hello, let's connect on LinkedIn: https://www.linkedin.com/in/viviankaun/
### I've added basic file upload and download features for convenience. The tutorial is from
### https://learn.deeplearning.ai/courses/multi-ai-agent-syst

In [None]:
%pip install crewai==0.28.8 crewai_tools==0.1.6 langchain_community==0.0.29



In [None]:
import warnings
warnings.filterwarnings('ignore')
from crewai import Agent, Task, Crew

In [None]:
import os
from google.colab import userdata

# Read the OpenAI key from Colab's secrets panel (stored under the name 'open_ai_key').
open_ai_key = userdata.get('open_ai_key')
os.environ["OPENAI_MODEL_NAME"] = 'gpt-3.5-turbo'
os.environ["OPENAI_API_KEY"] = open_ai_key
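
Before building any agents, it can help to confirm that the credentials are actually set, since a missing key otherwise only surfaces later as a failed model or tool call. The helper below is a minimal sketch: `missing_env_vars` is a hypothetical name, and `SERPER_API_KEY` is listed on the assumption that `SerperDevTool` reads its key from that environment variable.

```python
import os

def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]

# Example with an explicit mapping, so the check does not depend on the real environment.
example_env = {"OPENAI_API_KEY": "sk-placeholder", "OPENAI_MODEL_NAME": "gpt-3.5-turbo"}
required = ["OPENAI_API_KEY", "OPENAI_MODEL_NAME", "SERPER_API_KEY"]  # last one is an assumption
print(missing_env_vars(required, example_env))  # ['SERPER_API_KEY']
```

In the notebook itself, calling `missing_env_vars(required)` with no second argument checks `os.environ` directly.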

## I don't think crewAI's FileReadTool supports PDF files at the moment, so let's upload a Markdown (.md) file.

In [None]:
# Upload files
from google.colab import files
uploaded = files.upload()

# List uploaded files (os is already imported in an earlier cell)
for filename in uploaded.keys():
    print(f'User uploaded file "{filename}" with length {len(uploaded[filename])} bytes.')

# Accessing the uploaded file
uploaded_filename = list(uploaded.keys())[0]
print(f'Uploaded file path: /content/{uploaded_filename}')

Saving fake_resume.md to fake_resume (2).md
User uploaded file "fake_resume (2).md" with length 3872 bytes.
Uploaded file path: /content/fake_resume (2).md
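
Given the note above that the resume should be a Markdown file, a quick extension check right after the upload can catch a PDF before it reaches the tools. This is only a sketch: `is_text_resume` and `TEXT_SUFFIXES` are hypothetical names, and the set of accepted suffixes is an assumption, not a list taken from the crewAI documentation.

```python
from pathlib import Path

# Assumed set of plain-text formats; adjust to whatever the tools in use accept.
TEXT_SUFFIXES = {".md", ".mdx", ".txt"}

def is_text_resume(filename):
    """Return True when the file extension looks like a plain-text resume."""
    return Path(filename).suffix.lower() in TEXT_SUFFIXES

print(is_text_resume("fake_resume (2).md"))  # True
print(is_text_resume("resume.pdf"))          # False
```

In the upload cell, this could be applied to `uploaded_filename` to raise an error early instead of passing an unsupported file to `FileReadTool`.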


In [None]:
from crewai_tools import (
  FileReadTool,
  ScrapeWebsiteTool,
  MDXSearchTool,
  SerperDevTool
)

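search_tool = SerperDevTool()      # web search
scrape_tool = ScrapeWebsiteTool()  # fetch the content of a web page
read_resume = FileReadTool(file_path=uploaded_filename)        # read the uploaded resume
semantic_search_resume = MDXSearchTool(mdx=uploaded_filename)  # semantic search over the resume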

Inserting batches in chromadb: 100%|██████████| 1/1 [00:01<00:00,  1.40s/it]


In [None]:
# Agent 1: Researcher
researcher = Agent(
    role="Tech Job Researcher",
    goal="Make sure to do amazing analysis on "
         "job posting to help job applicants",
    tools = [scrape_tool, search_tool],
    verbose=True,
    backstory=(
        "As a Job Researcher, your prowess in "
        "navigating and extracting critical "
        "information from job postings is unmatched. "
        "Your skills help pinpoint the necessary "
        "qualifications and skills sought "
        "by employers, forming the foundation for "
        "effective application tailoring."
    )
)

In [None]:
# Agent 2: Profiler
profiler = Agent(
    role="Personal Profiler for Engineers",
    goal="Do incredible research on job applicants "
         "to help them stand out in the job market",
    tools = [scrape_tool, search_tool,
             read_resume, semantic_search_resume],
    verbose=True,
    backstory=(
        "Equipped with analytical prowess, you dissect "
        "and synthesize information "
        "from diverse sources to craft comprehensive "
        "personal and professional profiles, laying the "
        "groundwork for personalized resume enhancements."
    )
)

In [None]:
# Agent 3: Resume Strategist
resume_strategist = Agent(
    role="Resume Strategist for Engineers",
    goal="Find all the best ways to make a "
         "resume stand out in the job market.",
    tools = [scrape_tool, search_tool,
             read_resume, semantic_search_resume],
    verbose=True,
    backstory=(
        "With a strategic mind and an eye for detail, you "
        "excel at refining resumes to highlight the most "
        "relevant skills and experiences, ensuring they "
        "resonate perfectly with the job's requirements."
    )
)

In [None]:
# Agent 4: Interview Preparer
interview_preparer = Agent(
    role="Engineering Interview Preparer",
    goal="Create interview questions and talking points "
         "based on the resume and job requirements",
    tools = [scrape_tool, search_tool,
             read_resume, semantic_search_resume],
    verbose=True,
    backstory=(
        "Your role is crucial in anticipating the dynamics of "
        "interviews. With your ability to formulate key questions "
        "and talking points, you prepare candidates for success, "
        "ensuring they can confidently address all aspects of the "
        "job they are applying for."
    )
)

In [None]:
# Task for Researcher Agent: Extract Job Requirements
research_task = Task(
    description=(
        "Analyze the job posting URL provided ({job_posting_url}) "
        "to extract key skills, experiences, and qualifications "
        "required. Use the tools to gather content and identify "
        "and categorize the requirements."
    ),
    expected_output=(
        "A structured list of job requirements, including necessary "
        "skills, qualifications, and experiences."
    ),
    agent=researcher,
    async_execution=True
)

In [None]:
# Task for Profiler Agent: Compile Comprehensive Profile
profile_task = Task(
    description=(
        "Compile a detailed personal and professional profile "
        "using the GitHub URL ({github_url}) and the personal write-up "
        "({personal_writeup}). Utilize tools to extract and "
        "synthesize information from these sources."
    ),
    expected_output=(
        "A comprehensive profile document that includes skills, "
        "project experiences, contributions, interests, and "
        "communication style."
    ),
    agent=profiler,
    async_execution=True
)

In [None]:
# Task for Resume Strategist Agent: Align Resume with Job Requirements
resume_strategy_task = Task(
    description=(
        "Using the profile and job requirements obtained from "
        "previous tasks, tailor the resume to highlight the most "
        "relevant areas. Employ tools to adjust and enhance the "
        "resume content. Make sure this is the best resume ever but "
        "don't make up any information. Update every section, "
        "including the initial summary, work experience, skills, "
        "and education. All to better reflect the candidate's "
        "abilities and how they match the job posting."
    ),
    expected_output=(
        "An updated resume that effectively highlights the candidate's "
        "qualifications and experiences relevant to the job."
    ),
    output_file="tailored_resume.md",
    context=[research_task, profile_task],
    agent=resume_strategist
)

In [None]:
# Task for Interview Preparer Agent: Develop Interview Materials

interview_preparation_task = Task(
    description=(
        "Create a set of potential interview questions and talking "
        "points based on the tailored resume and job requirements. "
        "Utilize tools to generate relevant questions and discussion "
        "points. Make sure to use these questions and talking points to "
        "help the candidate highlight the main points of the resume "
        "and how it matches the job posting."
    ),
    expected_output=(
        "A document containing key questions and talking points "
        "that the candidate should prepare for the initial interview."
    ),
    output_file="interview_materials.md",
    context=[research_task, profile_task, resume_strategy_task],
    agent=interview_preparer
)


In [None]:
job_application_crew = Crew(
    agents=[researcher,
            profiler,
            resume_strategist,
            interview_preparer],

    tasks=[research_task,
           profile_task,
           resume_strategy_task,
           interview_preparation_task],

    verbose=True
)
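
The `context` lists on the tasks above imply an ordering: the two tasks with `async_execution=True` have no dependencies, while the later tasks wait on their results. The sketch below models only that dependency structure with the standard library's `graphlib` (Python 3.9+); it is not how crewAI schedules work internally, and the `task_context` mapping is written by hand from the task definitions.

```python
from graphlib import TopologicalSorter

# Each task maps to the tasks listed in its `context` argument.
task_context = {
    "research_task": [],
    "profile_task": [],
    "resume_strategy_task": ["research_task", "profile_task"],
    "interview_preparation_task": [
        "research_task", "profile_task", "resume_strategy_task",
    ],
}

sorter = TopologicalSorter(task_context)
sorter.prepare()

stages = []
while sorter.is_active():
    # Everything returned together has no unmet dependencies and could run in parallel.
    ready = sorted(sorter.get_ready())
    stages.append(ready)
    sorter.done(*ready)

for number, stage in enumerate(stages, start=1):
    print(f"stage {number}: {stage}")
```

The first stage contains both research and profiling, which is consistent with marking those two tasks as asynchronous.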

In [None]:
job_application_inputs = {
    'job_posting_url': 'https://www.linkedin.com/jobs/search/?currentJobId=3997679058&distance=100&f_AL=true&f_E=2%2C3%2C4&f_EA=true&f_JIYN=true&f_JT=F%2CC%2CT&geoId=102095887&keywords=data%20engineer&origin=JOBS_HOME_KEYWORD_HISTORY&refresh=true',
    'github_url': 'https://github.com/viviankaun',
    'personal_writeup': """I am currently a Senior Data Engineer at DataTech Solutions in Seattle, where I lead the development of advanced data solutions that enhance data processing and analysis capabilities. My role involves leveraging cutting-edge technologies like Apache Spark and Kafka to build scalable data pipelines, ensuring that data flows seamlessly from source to destination. I am also adept at managing data warehousing solutions with Amazon Redshift and Snowflake, which significantly improves data retrieval and storage efficiency"""
}
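
The keys of this dictionary have to match the `{placeholder}` names used in the task descriptions. A small check like the one below can catch a misspelled or missing key before a run that takes several minutes. It is a sketch using plain `str.format`-style parsing; crewAI's own interpolation may behave differently, and the shortened templates and example URLs are illustrative only.

```python
from string import Formatter

def placeholders(template):
    """Return the set of {name} fields used in a format-style template."""
    return {field for _, field, _, _ in Formatter().parse(template) if field}

# Shortened versions of the task descriptions, for illustration.
templates = [
    "Analyze the job posting URL provided ({job_posting_url}) to extract key skills.",
    "Compile a profile using the GitHub ({github_url}) URLs, "
    "and personal write-up ({personal_writeup}).",
]
# Hypothetical inputs with one key deliberately left out.
inputs = {
    "job_posting_url": "https://example.com/job",
    "github_url": "https://github.com/example",
}

needed = set().union(*(placeholders(t) for t in templates))
missing = sorted(needed - set(inputs))
print(missing)  # ['personal_writeup']
```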

In [None]:
### this execution will take a few minutes to run
result = job_application_crew.kickoff(inputs=job_application_inputs)

[1m[95m [DEBUG]: == Working Agent: Tech Job Researcher[00m
[1m[95m [INFO]: == Starting Task: Analyze the job posting URL provided (https://www.linkedin.com/jobs/search/?currentJobId=3997679058&distance=100&f_AL=true&f_E=2%2C3%2C4&f_EA=true&f_JIYN=true&f_JT=F%2CC%2CT&geoId=102095887&keywords=data%20engineer&origin=JOBS_HOME_KEYWORD_HISTORY&refresh=true) to extract key skills, experiences, and qualifications required. Use the tools to gather content and identify and categorize the requirements.[00m
[1m[92m [DEBUG]: == [Tech Job Researcher] Task output: 

[00m
[1m[95m [DEBUG]: == Working Agent: Personal Profiler for Engineers[00m
[1m[95m [INFO]: == Starting Task: Compile a detailed personal and professional profile using the GitHub (https://github.com/joaomdmoura) URLs, and personal write-up (I am currently a Senior Data Engineer at DataTech Solutions in Seattle, where I lead the development of advanced data solutions that enhance data processing and analysis capabilities. M

In [None]:
from IPython.display import Markdown, display
display(Markdown("./tailored_resume.md"))

# Jane Smith

**Email:** jane.smith@example.com  
**Phone:** (987) 654-3210  
**LinkedIn:** [linkedin.com/in/janesmith](https://linkedin.com/in/janesmith)  
**GitHub:** [github.com/janesmith](https://github.com/janesmith)  
**Website:** [janesmith.com](https://janesmith.com)  

---

## Summary

Results-driven Data Engineer with 6 years of experience in designing, implementing, and maintaining scalable data pipelines and architectures. Expertise in ETL processes, data warehousing, and big data technologies. Proven ability to translate business requirements into technical solutions and optimize data workflows for improved performance and efficiency.

---

## Experience

### Senior Data Engineer  
**DataTech Solutions** — Seattle, WA  
*March 2019 – Present*

- Designed and built scalable data pipelines using Apache Spark and Kafka, improving data processing speed by 50%.
- Implemented and maintained data warehousing solutions with Amazon Redshift and Snowflake, enabling efficient data storage and retrieval.
- Developed ETL workflows using Python and Apache Airflow, automating data ingestion and transformation processes.
- Collaborated with data scientists and analysts to ensure data accuracy and integrity for advanced analytics and reporting.

### Data Engineer  
**Insight Analytics Inc.** — San Francisco, CA  
*August 2015 – February 2019*

- Created and managed data pipelines for large-scale data integration projects using Apache Nifi and SQL.
- Optimized existing ETL processes, resulting in a 30% reduction in data processing time.
- Utilized AWS services (S3, Glue, Lambda) to build and deploy data solutions in a cloud environment.
- Monitored and troubleshot data workflows to ensure reliability and performance.

### Junior Data Engineer  
**TechStart LLC** — Austin, TX  
*June 2013 – July 2015*

- Assisted in developing data pipelines and ETL processes for internal reporting systems.
- Contributed to database management tasks, including indexing and query optimization.
- Supported the migration of legacy systems to modern data platforms, enhancing data accessibility.

---

## Education

### Master of Science in Data Engineering  
**University of California** — Berkeley, CA  
*Graduated: May 2013*

- Relevant coursework: Data Warehousing, Big Data Technologies, Advanced SQL, Cloud Computing

### Bachelor of Science in Computer Science  
**University of Texas** — Austin, TX  
*Graduated: May 2011*

- Relevant coursework: Data Structures, Algorithms, Database Systems, Software Engineering

---

## Skills

- **Programming Languages:** Python, SQL, Java, Scala
- **Big Data Technologies:** Apache Spark, Kafka, Hadoop
- **Data Warehousing:** Amazon Redshift, Snowflake, Google BigQuery
- **ETL Tools:** Apache Airflow, Talend, Informatica
- **Cloud Platforms:** AWS (S3, Glue, Lambda), Google Cloud Platform (BigQuery, Dataflow)
- **Databases:** MySQL, PostgreSQL, MongoDB

---

## Certifications

- **Certified Data Engineer – Google Cloud Professional** — Google Cloud, 2021
- **AWS Certified Big Data – Specialty** — Amazon Web Services, 2020

---

## Projects

### Real-Time Data Pipeline

- Developed a real-time data pipeline using Apache Kafka and Spark Streaming to process and analyze streaming data from IoT devices.
- Implemented data aggregation and analytics features, providing real-time insights and alerts.

### Data Lake Implementation

- Designed and implemented a data lake architecture on AWS, integrating multiple data sources and providing a unified data repository.
- Developed ETL processes for data ingestion and transformation, enabling efficient querying and analysis.

---

## Languages

- **English:** Native
- **Spanish:** Intermediate

---

## Interests

- Machine Learning and AI
- Data Visualization
- Open-source contributions

---

*References available upon request.*

In [None]:
display(Markdown("./interview_materials.md"))

Based on the information extracted from the resume, here are some potential interview questions and talking points for the candidate to prepare:

1. Can you discuss a specific project where you designed and built scalable data pipelines using Apache Spark and Kafka? What challenges did you face, and how did you overcome them?
   
2. How have you utilized Amazon Redshift and Snowflake in your data warehousing solutions? Can you provide an example of a successful implementation?
   
3. Describe a scenario where you optimized ETL processes using Python and Apache Airflow. What was the outcome of your optimization efforts?
   
4. In what ways have you collaborated with data scientists and analysts to ensure data accuracy and integrity for advanced analytics and reporting?
   
5. Can you talk about a project where you created and managed data pipelines for large-scale data integration projects using Apache Nifi and SQL?
   
6. Explain how you utilized AWS services (S3, Glue, Lambda) to build and deploy data solutions in a cloud environment. What benefits did this approach bring to the project?
   
7. Share an experience where you monitored and troubleshot data workflows to ensure reliability and performance. How did you identify and resolve issues effectively?
   
8. How did your coursework in Data Warehousing, Big Data Technologies, Advanced SQL, and Cloud Computing contribute to your skills as a Data Engineer?

9. Discuss a certification you obtained, such as Certified Data Engineer – Google Cloud Professional or AWS Certified Big Data – Specialty. How has this certification impacted your work and skills?

10. Provide examples of projects you have worked on, such as the Real-Time Data Pipeline or Data Lake Implementation. What were the key challenges you faced, and how did you address them?

11. How do you approach problem-solving and collaboration when working on data-related projects with team members from different technical backgrounds?

12. What interests you most about Machine Learning and AI, and how do you stay updated on the latest advancements in these fields?

These questions and talking points are designed to help the candidate showcase their experiences, skills, and achievements relevant to the job requirements for Data Engineer positions.

## Download the result files for further preparation.

In [None]:
from google.colab import files
files.download('interview_materials.md')
files.download('tailored_resume.md')

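
As an alternative to downloading each file separately, the two outputs could be bundled into a single archive first. The sketch below uses only the standard library; `bundle_outputs` and the archive name are hypothetical, and files that do not exist are skipped rather than raising an error.

```python
import tempfile
import zipfile
from pathlib import Path

def bundle_outputs(paths, archive_name="job_application_outputs.zip"):
    """Write the files that exist into one zip archive and return the names added."""
    added = []
    with zipfile.ZipFile(archive_name, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in map(Path, paths):
            if path.is_file():
                archive.write(path, arcname=path.name)
                added.append(path.name)
    return added

# Demonstration in a temporary directory, so no real output files are needed.
with tempfile.TemporaryDirectory() as tmp:
    resume = Path(tmp) / "tailored_resume.md"
    resume.write_text("# Jane Smith\n")
    archive_path = Path(tmp) / "outputs.zip"
    print(bundle_outputs([resume, Path(tmp) / "missing.md"], archive_path))
```

In Colab, calling `bundle_outputs(["tailored_resume.md", "interview_materials.md"])` and then `files.download("job_application_outputs.zip")` would trigger one download instead of two.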