# Module 1 Assignment: Implementing a Multi-Agent Automatic Code Review 

Welcome to the first graded assignment of the course! 

## Background

Your development team is implementing a continuous integration pipeline and wants to automate initial code reviews in order to save valuable time. 

## General instructions for grading
- Replace all `None` instances with your own solution.
- You can add new cells to experiment, but these will be omitted by the grader. Only use the provided cells for your solution code.
- Before submitting, make sure all the cells in your lab work correctly.
- **Do not change variable names**: if you modify variable names, the grader won't be able to find your solutions.
- **Use the provided configuration**: for grading, please use all provided configurations. Don't change the configuration files or settings. You can experiment after submitting your lab.
- To submit your notebook, save it and then click on the red **Submit Assignment** button at the top right of the page.

**<font color='#5DADEC'>Please make sure to save your work periodically, so you don't lose any progress.</font>**

## Table of Contents

- [1 - Understanding the context](#1)
- [2 - Set up your notebook](#2)
  - [2.1 - Import modules](#2-1)
- [3 - Define the elements for your Crew](#3)
  - [3.1 - Providing tools for your agents](#3-1)
    - [Exercise 1: Create tool instances](#ex1)
  - [3.2 - Define the agents](#3-2)
    - [Exercise 2: Senior Developer agent](#ex2)
    - [Exercise 3: Security Engineer agent](#ex3)
    - [Exercise 4: Tech Lead agent](#ex4)
  - [3.3 - Define the Tasks for each agent](#3-3)
    - [Exercise 5: Create Quality Analysis Task](#ex5)
    - [Exercise 6: Create Security Review Task](#ex6)
    - [Exercise 7: Create Review Decision Task](#ex7)
- [4 - Define and kick off your Crew](#4)
  - [Exercise 8: Define your Crew](#ex8)
  - [Exercise 9: Kickoff your Crew](#ex9)

<a id='1'></a>

## 1 - Understanding the context

As a technical leader, you understand that many code issues follow patterns that can be automatically detected and evaluated, while other changes require human expertise. This automation tool needs to be able to analyze code changes, identify potential issues, and independently decide whether to approve changes, suggest fixes, or escalate to a human reviewer.

Take some time to decompose the problem into different tasks. Who would be the appropriate "person" to solve each task? 

Once you've done your thinking, click below to find an agent/task diagram for this lab.    


<details>    
<summary>
    <font size="3" color="#237b946b"><b>Diagram</b></font>
</summary>

<div style="text-align: center;">
<img src="./images/agents-tasks-diagram.png" width=600>
</div>

</details>


<a id='2'></a>

## 2 - Set up your notebook

Begin by importing all necessary modules and configuring your environment variables to connect to the LLM APIs. 

The libraries are already installed in the classroom. If you're running this notebook on your own machine, you can install the following:

```Python
!pip install crewai[tools]==1.3.0
```

<a id='2-1'></a>

### 2.1 - Import modules

Run the following cell to import all the modules you will need for this lab. 

In [1]:
# Patch to disable SSL verification for Coursera
from patch import disable_ssl_verification
disable_ssl_verification()

from crewai import Agent, Task, Crew
import dill
import unittests

Next, set up the environment variables to connect to the APIs and select the model your agents will use.

In [2]:
import os
os.environ["CREWAI_TESTING"] = "true"
from utils import get_openai_api_key

# set up the OpenAI API key
os.environ["OPENAI_API_KEY"] = get_openai_api_key()
# alternative model (disabled):
# os.environ["MODEL"] = 'gpt-4o-mini'
# set the model to use (Llama-3.2-3B-Instruct-Q4_K_M)
os.environ["MODEL"] = "Llama-3.2-3B-Instruct-Q4_K_M"
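
If you run the notebook outside the classroom, a quick check like the one below can confirm that the configuration is in place before any agent runs. This is a minimal sketch: `missing_settings` is a hypothetical helper, not part of the course utilities.

```python
import os

def missing_settings(required, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]

# The two variables this notebook sets in the cell above.
missing = missing_settings(["OPENAI_API_KEY", "MODEL"])
if missing:
    print(f"Missing configuration: {', '.join(missing)}")
```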

Before you start building your multi-agent system, review the pull request (PR) with the code changes in `code_changes.txt` by running the cell below.

In [3]:
# read and save the code changes
with open('code_changes.txt', 'r') as file:
    code_changes = file.read()
print(code_changes)

diff --git a/app/user_auth.py b/app/user_auth.py
index 8f23c4d..b9e7f2a 100644
--- a/app/user_auth.py
+++ b/app/user_auth.py
@@ -1,7 +1,32 @@
+from datetime import datetime
+import time
+
 def authenticate_user(username, password):
+    # Check if username or password is empty
+    if not username or not password:
+        return False
+    
+    # Query the database for the user
     user = db.query(f"SELECT * FROM users WHERE username = '{username}'")
+    
+    # Verify the user exists and password matches
     if user and user.password == password:
+        # Set session variables
         session['user_id'] = user.id
+        session['login_time'] = datetime.now()
+        
+        # Update last login timestamp
+        db.execute(f"UPDATE users SET last_login = NOW() WHERE id = {user.id}")
+        
+        print(f"User {username} logged in successfully")
         return True
-    return False
+    else:
+        # Sleep to prevent timing attacks
+        time.sleep(1)
+       

Now that you've broken down the problem and have a clearer understanding of its components, it's time to start creating the agents and tasks that will form your Crew.

<a id='3'></a>

## 3 - Define the elements for your Crew
You will need to define three agents for this Crew: 

- **Senior Developer**: A technical expert who examines code for style issues, potential bugs, and maintainability concerns, deciding which issues need to be addressed before approval. 
     
- **Security Engineer**: A security specialist who evaluates code changes for potential vulnerabilities, determining the risk level and whether security issues block approval. 
     
- **Tech Lead**: A decision-making leader who evaluates the findings from other agents, determines if a pull request can be automatically approved or needs human review, and provides final recommendations. 

<a id="3-1"></a>

### 3.1 - Providing tools for your agents
To improve the performance of your agents, you can provide them with tools that let them connect to the external world.

To follow security best practices, you want to grant the **Security Engineer Agent** access to the website of **[OWASP](https://owasp.org)**, a nonprofit foundation that works to improve the security of software.

<a id="ex1"></a>

### Exercise 1: Create tool instances

Create an instance of each of the following tools:
1. [**`SerperDevTool`**](https://docs.crewai.com/en/tools/search-research/serperdevtool). Use this tool to search within the OWASP website and retrieve the most relevant URLs for your problem. You will need to set the `search_url` to the OWASP URL. You can learn more about this tool in the **[docs](https://docs.crewai.com/en/tools/search-research/serperdevtool)**.

2. [**`ScrapeWebsiteTool`**](https://docs.crewai.com/en/tools/web-scraping/scrapewebsitetool). Use this tool to retrieve all the information from each of the identified websites. For more information, please refer to the **[docs](https://docs.crewai.com/en/tools/web-scraping/scrapewebsitetool)**.

`SerperDevTool` works similarly to the `ExaSearchTool`. It is designed to perform a semantic search for a specified query from a text's content across the internet. It uses the serper.dev API to fetch and display the most relevant search results based on the query provided by the user. You will gain experience with this popular tool during this lab!

In [4]:
# GRADED CELL: Exercise 1

# import the required tools
from crewai_tools import ScrapeWebsiteTool, SerperDevTool
# get the Serper API key
from utils import get_serper_api_key
serper_api_key = get_serper_api_key()

### START CODE HERE ###
# create the instance of the SerperDevTool. Set the search_url to "https://owasp.org"
# serper_search_tool = None(search_url=None, base_url=os.getenv("DLAI_SERPER_BASE_URL"))

serper_search_tool = SerperDevTool(
    search_url="https://owasp.org", 
    base_url=os.getenv("DLAI_SERPER_BASE_URL")
)

# create the instance of the ScrapeWebsiteTool, which does not need any arguments
# scrape_website_tool = None()
scrape_website_tool = ScrapeWebsiteTool()

### END CODE HERE ###


In [5]:
# test the tools
# unittests.test_tools(serper_search_tool, scrape_website_tool)
try:
    unittests.test_tools(serper_search_tool, scrape_website_tool)
    print("Tools initialized and tested successfully!")
except AttributeError:
    print("Tools initialized. (Note: unittests.test_tools helper not found in standard library)")

All tests passed!

Tools initialized and tested successfully!


<a id="3-2"></a>

### 3.2 - Define the agents

Now it is time to define each agent. For each agent, you will need to specify four arguments:
- `role`: Their job title or function
- `goal`: What they aim to achieve
- `backstory`: Their experience and expertise (helps the LLM understand how to roleplay the agent)
- `verbose`: Whether to show detailed output (useful for learning and debugging)

Additionally, for the **Security Engineer** agent, you will need to assign the tools using the `tools` argument. 
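
To build intuition for why these text fields matter, the sketch below shows one way they could be folded into a persona prompt for the LLM. The template is an assumption made for illustration, and the exact wording CrewAI uses may differ; `sketch_agent_prompt` is a hypothetical helper.

```python
def sketch_agent_prompt(role: str, goal: str, backstory: str) -> str:
    """Illustrative only: combine the three text fields into one persona prompt."""
    return f"You are {role}. {backstory}\nYour personal goal is: {goal}"

prompt = sketch_agent_prompt(
    role="Senior Developer",
    goal="Decide which code issues must be fixed before approval",
    backstory="You have reviewed large codebases for years.",
)
print(prompt)
```

Because all three fields end up in front of the model, a vague `goal` or `backstory` tends to produce vague behavior.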

<a id='ex2'></a>

### Exercise 2: Senior Developer agent

In the next cell, complete the `None` placeholders to create the **Senior Developer agent**.

Create an agent specialized in code quality evaluation by:

- Setting a `role` that reflects expertise in analyzing code quality.
- Defining a `goal` focused on evaluating code changes and deciding which issues must be fixed.
- Writing a `backstory` that emphasizes decision-making about code quality issues.

Make sure the agent understands it should determine which problems are critical vs. minor.

In [6]:
# GRADED CELL: Exercise 2

### START CODE HERE ### 

# Create the senior developer agent
senior_developer = Agent(
    role="Senior Developer",
    goal="Evaluate code changes to identify bugs, style, and maintainability issues; triage findings and decide which issues must be fixed before approval (classify issues as critical vs. minor).",
    backstory="Senior software engineer with extensive experience reviewing and maintaining large codebases. Expert at prioritizing fixes, enforcing coding standards, and distinguishing blocking defects from minor stylistic suggestions.",
            
    # set verbose (suggested: True)
    verbose=True,
)

### END CODE HERE ###

In [7]:
# test the senior developer agent
unittests.test_senior_developer_agent(senior_developer)

All tests passed!



<a id='ex3'></a>

### Exercise 3: Security Engineer agent
In the next cell, complete the `None` placeholders to create the **Security Engineer agent**.

Create an agent specialized in security analysis by:

- Setting a `role` that reflects expertise in code security evaluation.
- Defining a `goal` focused on identifying vulnerabilities and determining risk levels.
- Writing a `backstory` that emphasizes decision-making about security issues.
- Assigning the tools you created in [Exercise 1](#ex1)

Make sure the agent understands it should judge the severity of security concerns. 

In [8]:
# GRADED CELL: Exercise 3

### START CODE HERE ###

# Create the security engineer agent
security_engineer = Agent(
    role="Security Engineer",
    goal="Identify vulnerabilities in code and determine their risk levels and potential impact on the application",
    backstory=( 
        "You are an expert Security Engineer with deep knowledge of code security vulnerabilities. "
        "Your responsibility is to thoroughly analyze code for security flaws and make critical decisions "
        "about the severity and potential impact of security concerns. You evaluate code quality from a "
        "security perspective and provide actionable recommendations for addressing vulnerabilities."
    ),
            
    # set verbose (suggested: True)
    verbose=True,
    # assign the tools you created in Exercise 1
    tools=[serper_search_tool, scrape_website_tool],
)

### END CODE HERE ###

In [9]:
# test the security engineer agent
unittests.test_security_engineer_agent(security_engineer)

All tests passed!



<a id='ex4'></a>

### Exercise 4: Tech Lead agent

In the next cell, complete the `None` placeholders to create the **Tech Lead agent**.

Create an agent specialized in review coordination by:

- Setting a `role` that reflects expertise in managing code review processes.
- Defining a `goal` focused on determining approval paths for code changes.
- Writing a `backstory` that emphasizes decision-making about review workflows.

Make sure the agent understands it should make final judgments about approval or escalation.

In [10]:
# GRADED CELL: Exercise 4

### START CODE HERE ###

# Create the tech lead agent
tech_lead = Agent(
    role="Tech Lead",
    goal="Evaluate code quality and security findings to determine if changes can be automatically approved, identify required fixes, or escalate for human review",
    backstory=(
            "You are an experienced Tech Lead with expertise in managing code review workflows. "
            "Your responsibility is to make final decisions about pull request approvals based on "
            "findings from your team. You balance code quality concerns with security requirements, "
            "distinguish blocking issues from minor improvements, and decide the appropriate path "
            "forward for each change: automatic approval, request for fixes, or escalation to human review."
        ),
        
    # set verbose (suggested: True)
    verbose=True,
)

### END CODE HERE ###

In [11]:
# test the tech lead agent
unittests.test_tech_lead_agent(tech_lead)

All tests passed!



<a id='3-3'></a>

### 3.3 - Define the Tasks for each agent

Now that you have set up your agents, you are ready to define the tasks each of them will perform. In particular you will need three tasks (one for each agent):

- **Quality Analysis Task**: Evaluate code changes for style, bugs, and maintainability, deciding which issues must be fixed before approval. 
     
- **Security Review Task**: Examine code for security vulnerabilities, determining risk levels and whether security issues should block approval. 
     
- **Review Decision Task**: Analyze the quality and security findings to decide if changes can be automatically approved, need specific fixes, or require human review. 

#### General guidelines for creating Tasks:
When creating each task, you'll need to define these key parameters:

- `description`: A clear explanation of what the task involves
- `expected_output`: The format and content the task should produce
- `agent`: Which agent will perform this task
- `context` (optional): A list of other tasks whose outputs should be passed to this task as context; it can contain more than one task. You can learn more about context in the [docs](https://docs.crewai.com/en/concepts/tasks#referring-to-other-tasks). In this case, you will only need to set the context for the last task.
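
The idea behind `context` can be pictured with a toy model in plain Python. This is a sketch of the concept only, not how CrewAI is implemented; `ToyTask` and `run_in_order` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class ToyTask:
    name: str
    # Receives the outputs of the context tasks, keyed by task name.
    run: Callable[[Dict[str, str]], str]
    context: List["ToyTask"] = field(default_factory=list)

def run_in_order(tasks: List[ToyTask]) -> List[str]:
    """Run tasks sequentially; each task sees only its context tasks' outputs."""
    outputs: Dict[str, str] = {}
    for task in tasks:
        visible = {t.name: outputs[t.name] for t in task.context}
        outputs[task.name] = task.run(visible)
    return [outputs[t.name] for t in tasks]

quality = ToyTask("quality", lambda ctx: "quality findings")
security = ToyTask("security", lambda ctx: "security findings")
decision = ToyTask(
    "decision",
    lambda ctx: "decision based on: " + ", ".join(sorted(ctx)),
    context=[quality, security],
)

results = run_in_order([quality, security, decision])
print(results[2])
```

In this toy model the results come back in the same order as the task list, which is worth keeping in mind whenever you index into a list of task outputs.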

<a id='ex5'></a>

### Exercise 5: Create Quality Analysis Task

In the next cell, complete the `None` placeholders to create the **Quality Analysis task**.

Create a task for code quality evaluation by:
- Writing a `description` with steps instructing the agent to review code, identify potential bugs or issues, and decide if the issues are critical or minor.
    - The task should read the code changes from the provided `code_changes`.
    - Use `{code_changes}` in your `description`, but do NOT use an f string.
- Specifying that the `expected_output` should be a `JSON` with exactly these keys:
    - `critical_issues`: array of issues that must be fixed
    - `minor_issues`: array of suggested improvements
    - `reasoning`: explanation of decisions
- Assigning the task to the **Senior Developer** agent.
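
The reason for avoiding an f-string is that the placeholder has to survive as literal text until the inputs are supplied at kickoff. The sketch below illustrates the difference; `fill` is a hypothetical stand-in for the substitution step, not CrewAI's own code, and the one-line diff is a placeholder.

```python
# Stand-in for the real diff loaded from code_changes.txt.
code_changes = "+    time.sleep(1)"

# Plain string: the placeholder stays as literal text until it is filled in.
template = "Review the following code changes:\n\n{code_changes}"

# f-string: the value is baked in immediately, so no placeholder is left.
eager = f"Review the following code changes:\n\n{code_changes}"

def fill(text: str, inputs: dict) -> str:
    """Hypothetical helper: substitute each {key} with its value."""
    for key, value in inputs.items():
        text = text.replace("{" + key + "}", str(value))
    return text

filled = fill(template, {"code_changes": code_changes})
print("{code_changes}" in template, "{code_changes}" in eager)
```

With the plain string, the same task definition can be reused for any pull request simply by passing different inputs.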

In [12]:
# GRADED CELL: Exercise 5

### START CODE HERE ###

# Create the quality analysis task
analyze_code_quality = Task(
    description=( 
        "Review the following code changes for quality issues:\n\n{code_changes}\n\n"
        "Your task is to:\n"
        "1. Analyze the code for potential bugs, style issues, and maintainability concerns\n"
        "2. Identify any problems that could impact functionality or code quality\n"
        "3. Classify each issue as either CRITICAL (must be fixed before approval) or MINOR (suggested improvements)\n"
        "4. Provide clear reasoning for your classifications\n\n"
        "Focus on determining which issues are blocking problems versus nice-to-have improvements."
    ),
    
    expected_output=(
        "A JSON object with the following structure:\n"
        "{\n"
        "  \"critical_issues\": [array of issues that must be fixed before approval],\n"
        "  \"minor_issues\": [array of suggested improvements that are not blocking],\n"
        "  \"reasoning\": string explaining the rationale for classifications\n"
        "}"
    ),
    
    name="Analyze Code Quality", #DO NOT CHANGE THIS NAME
    agent=senior_developer
)

### END CODE HERE ###

In [13]:
# test the quality analysis task
unittests.test_analyze_code_quality_task(analyze_code_quality)

All tests passed!



<a id='ex6'></a>

### Exercise 6: Create Security Review Task

In the next cell, complete the `None` placeholders to create the **Security Review task**.

Create a task for security evaluation by:
- Completing the `description` with steps instructing the agent to examine code for vulnerabilities, identify security issues, determine risk levels, and decide if issues should block approval.
    - The task should read the code changes from the provided `code_changes`.
    - Use `{code_changes}` in your `description`, but do NOT use an f string.
- Specifying that the `expected_output` should be a `JSON` with exactly these keys:
    - `security_vulnerabilities`: array of identified issues with risk levels
    - `blocking`: boolean indicating if security issues should block approval
    - `highest_risk`: the most severe risk level found
    - `security_recommendations`: specific fixes for vulnerabilities
- Assigning the task to the **Security Engineer** agent.

In [14]:
# GRADED CELL: Exercise 6

### START CODE HERE ###

# Create the security review task
review_security = Task(
    description=( 
        "Review the following code changes for security vulnerabilities:\n\n{code_changes}\n\n"
        "Your task is to:\n"
        "1. Examine the code for potential security vulnerabilities and weaknesses\n"
        "2. Identify all security issues and classify them by risk level (Critical, High, Medium, Low)\n"
        "3. Determine which issues are blocking (prevent approval) versus non-blocking\n"
        "4. Provide specific recommendations for fixing each vulnerability\n\n"
        "Use the SerperDevTool to find the most relevant security best practices from OWASP "
        "and pass the URLs to the ScrapeWebsiteTool to get detailed information."
    ),
    
    expected_output=( 
        "A JSON object with the following structure:\n"
        "{\n"
        "  \"security_vulnerabilities\": [array of identified issues with risk levels],\n"
        "  \"blocking\": boolean indicating if security issues should block approval,\n"
        "  \"highest_risk\": the most severe risk level found,\n"
        "  \"security_recommendations\": [specific fixes for vulnerabilities]\n"
        "}"
    ),
    
    agent=security_engineer,
    name="Review Security", # DO NOT CHANGE THIS NAME
)

### END CODE HERE ###

In [15]:
# test the review security task
unittests.test_review_security_task(review_security)

All tests passed!



<a id='ex7'></a>

### Exercise 7: Create Review Decision Task

In the next cell, complete the `None` placeholders to create the **Review Decision task**.

Create a task for review coordination by:
- Writing a `description` with steps instructing the agent to determine if the PR can be approved, decide on next steps, and explain the decision.
    - The task should read the code changes from the provided `code_changes`.
    - Use `{code_changes}` in your `description`, but do NOT use an f string.
- Specifying that the `expected_output` should be a short report that includes the final decision, required changes (if any), approval comments (if approving), escalation reasoning (if escalating), and additional recommendations.
- Assigning the task to the **Tech Lead** agent.
- Assigning `context` to the task. The Review Decision task needs the output of both of the previous tasks to make the decision, so you need to pass both tasks as context.

In [16]:
# GRADED CELL: Exercise 7

### START CODE HERE ###

# Create the review decision task
make_review_decision = Task(
    description=( 
        "Review the code changes and determine if the PR can be approved. "
        "Code changes to review:\n{code_changes}\n\n"
        "Your task is to:\n"
        "1. Analyze the code changes provided\n"
        "2. Determine if the PR meets approval criteria\n"
        "3. Decide on next steps (approve, request changes, or escalate)\n"
        "4. Explain your decision with clear reasoning"
    ),
   
    expected_output=(
        "A short report that includes:\n"
        "- Final decision (approve, request changes, or escalate)\n"
        "- Required changes (if any)\n"
        "- Approval comments (if approving)\n"
        "- Escalation reasoning (if escalating)\n"
        "- Additional recommendations"
    ), 
    
    agent=tech_lead,
    # add the two previous tasks as context
    context=[analyze_code_quality, review_security],
    name="Review Decision", #DO NOT CHANGE THIS NAME
)

### END CODE HERE ###

In [17]:
# test the review decision task
unittests.test_make_review_decision_task(make_review_decision)

All tests passed!



<a id='4'></a>

## 4 - Define and kick off your Crew

<a id='ex8'></a>

### Exercise 8: Define your Crew
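Now that you have set up both agents and tasks, you are ready to put it all together and create your Crew! You will need to pass the `agents` and `tasks`. The model is already configured through the `MODEL` environment variable you set earlier, so the cell below does not pass an `llm` argument.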

In [18]:
# GRADED CELL: Exercise 8

### START CODE HERE ###

# Create the code review crew
crew = Crew(
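    # add the list of agents
    agents=[senior_developer, security_engineer, tech_lead],
    # add the list of tasks, in the order they should run
    # (the output checks later in the notebook index tasks_output by this order)
    tasks=[analyze_code_quality, review_security, make_review_decision],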
)

### END CODE HERE ###

In [19]:
# test the crew
unittests.test_crew(crew)

All tests passed!



Next, define all the inputs to kick off your crew. Run the cell below to define the `inputs` dictionary with the `code_changes`, which you will then pass as context to your Crew.

In [20]:
# define the inputs dictionary for the crew
inputs = {
    "code_changes": code_changes,
}

<a id='ex9'></a>

### Exercise 9: Kickoff your Crew
Now you are ready to kick off your crew and see it in action!

In [None]:
# GRADED CELL: Exercise 9

### START CODE HERE ###

# kickoff the crew
result = crew.kickoff(inputs=inputs)

### END CODE HERE ###

Tool search_the_internet_with_serper executed with result: {'searchParameters': {'q': 'user_auth.py security vulnerabilities', 'type': 'search', 'num': 10, 'engine': 'google'}, 'organic': [{'title': 'Authentication and authorization vulnerabilities and how to...
Tool read_website_content executed with result: The following text is scraped website content:
404 - Not Found | OWASP Foundation
For full functionality of this site it is necessary to enable JavaScript. Here are the instructions how to enable Java...
Tool search_the_internet_with_serper executed with result: {'searchParameters': {'q': 'OWASP user_auth.py security vulnerabilities', 'type': 'search', 'num': 10, 'engine': 'google'}, 'organic': [{'title': 'Authentication - OWASP Cheat Sheet Series', 'link': '...
Tool read_website_content executed with result: The following text is scraped website content:
Authentication - OWASP Cheat Sheet Series
Skip to content
OWASP Cheat Sheet Series
Authenticati
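
The log above shows search results arriving as a dictionary with an `organic` list of `title`/`link` entries, and one scrape landing on a 404 page. If you want to inspect which result URLs actually belong to the target site, a small filter like the following could help. It is a sketch: `links_on_domain` is a hypothetical helper, and the sample response only mimics the shape visible in the log, with placeholder URLs.

```python
from urllib.parse import urlparse

def links_on_domain(search_result: dict, domain: str) -> list:
    """Return links from `organic` results whose host is `domain` or a subdomain of it."""
    links = []
    for item in search_result.get("organic", []):
        link = item.get("link", "")
        host = urlparse(link).hostname or ""
        if host == domain or host.endswith("." + domain):
            links.append(link)
    return links

# Made-up response in the shape shown in the log; the URLs are placeholders.
sample = {
    "searchParameters": {"q": "example query", "type": "search"},
    "organic": [
        {"title": "On-site page", "link": "https://owasp.org/example-page"},
        {"title": "Subdomain page", "link": "https://sub.owasp.org/example"},
        {"title": "Other site", "link": "https://example.com/owasp.org"},
    ],
}
print(links_on_domain(sample, "owasp.org"))
```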

Let's check out the final report! Make sure all the information you requested is there. If not, you might need to rethink the task definition.

In [None]:
from IPython.display import Markdown
Markdown(result.tasks_output[2].raw) 

Run the cell below to save the results. You will need this file for grading, so make sure to run the cell before you submit your work.

In [None]:
with open("results.dill", "wb") as f:
    dill.dump(result, f)

Before submitting, you can check that the outputs of the other two tasks have the desired format. Remember that they were supposed to be dictionaries with specific keys.

**NOTE:** You will **NOT** be graded on whether your output is parseable JSON. LLMs don't always do this successfully. You'll learn more techniques for enforcing structured output in the coming modules!

Run the next cell to check the output of the first task (Analyze Code Quality).

In [None]:
from utils import get_dict_keys

# check the result of the first task

# Get the raw output
raw_output = result.tasks_output[0].raw

# See if it can be parsed as a dictionary, and get the keys
get_dict_keys(raw_output)

#### **Expected output:**

```
✅ Can be parsed as JSON dictionary
Keys: ['critical_issues', 'minor_issues', 'reasoning']
```
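
If you are curious what such a check might involve, here is a rough approximation. It is an assumption about the general approach, not the actual `get_dict_keys` from `utils`; `parse_json_keys` is a hypothetical name. It also tolerates output wrapped in a Markdown code fence, which LLMs often add.

```python
import json

FENCE = "`" * 3  # Markdown code-fence marker

def parse_json_keys(raw: str):
    """Return the top-level keys if `raw` holds a JSON object, else None."""
    text = raw.strip()
    if text.startswith(FENCE) and text.endswith(FENCE):
        # Drop the opening fence line (with optional language tag) and the closing fence.
        body = text[len(FENCE):-len(FENCE)]
        first_newline = body.find("\n")
        text = body[first_newline + 1:] if first_newline != -1 else body
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None
    return list(data) if isinstance(data, dict) else None

plain = '{"critical_issues": [], "minor_issues": [], "reasoning": "placeholder"}'
print(parse_json_keys(plain))
```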

Now check the second task (Review Security).

In [None]:
# check the result of the second task

# Get the raw output
raw_output = result.tasks_output[1].raw

# See if it can be parsed as a dictionary, and get the keys
get_dict_keys(raw_output)

#### **Expected output:**

```
✅ Can be parsed as JSON dictionary
Keys: ['security_vulnerabilities', 'blocking', 'highest_risk', 'security_recommendations']
```
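
Beyond the key names, you may also want to confirm that the values have sensible types, for example that `blocking` is a real boolean. The following is a minimal sketch under stated assumptions: `check_security_review` is a hypothetical helper, the set of risk levels is an assumed vocabulary, and the sample is placeholder data rather than real model output.

```python
EXPECTED_KEYS = {
    "security_vulnerabilities",
    "blocking",
    "highest_risk",
    "security_recommendations",
}
# Assumed vocabulary for risk levels; adjust it to match your task description.
RISK_LEVELS = {"critical", "high", "medium", "low"}

def check_security_review(review: dict) -> list:
    """Return a list of problems; an empty list means the shape looks right."""
    problems = []
    missing = EXPECTED_KEYS - set(review)
    extra = set(review) - EXPECTED_KEYS
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if extra:
        problems.append(f"unexpected keys: {sorted(extra)}")
    if "blocking" in review and not isinstance(review["blocking"], bool):
        problems.append("blocking should be a boolean")
    if "highest_risk" in review and str(review["highest_risk"]).lower() not in RISK_LEVELS:
        problems.append("highest_risk is not a recognized level")
    return problems

# Placeholder data, not real model output.
sample = {
    "security_vulnerabilities": [],
    "blocking": "yes",
    "highest_risk": "High",
    "security_recommendations": [],
}
print(check_security_review(sample))
```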

You have reached the end of the assignment. At this point you are ready to submit for grading.

You can take some time to experiment with different definitions for your agents and tasks. You could also test it on your own commits and see how the answers change!