## Code Review Agent
In this notebook, we will create a notebook agent which will take a github repository and filename as input and then it will review every file one by one and store the review in notion document.

### Prerequisites
- OpenAI API key: we will use gpt-4 as our LLM for this task. Get your api key from openai dashboard and add it as a environment variable in your code as it’s recommended to keep it private.
- Github personal access token: we will use this token to make requests to github API as an authenticated user because you can’t access your private repositories without this token and we are assuming that as an organization, you like to keep your repository private. Get your personal access token (PAT) from github settings.
- Notion API key: we will use this notion key to create notion pages and to add data into a notion document using notion API. Create your integration from here and add it in your document by clicking on 3 dots icon → connect to → your integration name.

## Let’s code it
Start by installing the required modules and dependencies


In [20]:
%pip install crewai



In [5]:
%pip install requests



In [6]:
%pip install crewai-tools



In [7]:
%pip install notion_client



Import all the modules and dependencies

In [30]:
from crewai import Crew,Agent,Task
from textwrap import dedent
from langchain.tools import tool
from notion_client import Client
import requests
import os

Make sure to keep all of your API keys private and store it as an environment variable

In [31]:
# secret = os.environ["OPENAI_API_KEY"]
# github_secret = os.environ['GITHUB_KEY']
# notion_key = os.environ['NOTION_KEY']

In [32]:
from google.colab import userdata
secret = userdata.get("OPENAI_API_KEY")
github_secret = userdata.get('GITHUB_KEY')
notion_key = userdata.get('NOTION_KEY')

We will create one notion page and we will keep pushing the review in that notion file instead of creating separate notion pages for every file. we will use `notion_client` module to interact with notion API.

Let’s create a helper function to create a notion page which will take project name as a parameter which will be used as a file name. Also get the `page_id` of a notion page in which you want to add all these documents.

In [33]:
def createNotionPage(projectName):
  parent = {"type": "page_id","page_id": '14b322bb-b8bc-80a4-98ca-c7f7c0090baa'}
  properties = {
      "title": {
          "type": "title",
          "title": [{ "type": "text", "text": { "content": f"Code review of {projectName}" } }]
      },
  }
  create_page_response = Client.pages.create(
    parent=parent,properties=properties
  )
  return create_page_response['id']


Now let’s create a helper method to get the tree structure of given github repository

In [34]:
# Make sure GitHub token is properly set
github_secret = os.getenv('GITHUB_KEY')
if not github_secret:
    raise ValueError("GitHub token not found. Please set GITHUB_KEY environment variable.")

# Global variable for file tree
global_path = ""

def getFileTree(owner, repo, path='', level=0):
    """
    Fetch and print the tree structure of a GitHub repository.
    """
    global global_path
    try:
        api_url = f"https://api.github.com/repos/{owner}/{repo}/contents/{path}"
        headers = {
            'Authorization': f'Bearer {github_secret}',
            'Accept': 'application/vnd.github.v3+json'
        }

        response = requests.get(api_url, headers=headers)
        response.raise_for_status()

        items = response.json()

        if not isinstance(items, list):
            print(f"Error: Expected list of items, got {type(items)}")
            return

        for item in items:
            if item['name'] in {'public', 'images', 'media', 'assets'}:
                continue

            global_path += f"{' ' * (level * 2)}- {item['name']}\n"

            if item['type'] == 'dir':
                getFileTree(owner, repo, item['path'], level + 1)

    except requests.exceptions.RequestException as e:
        print(f"Error accessing GitHub API: {e}")
        print(f"Response content: {response.content if 'response' in locals() else 'No response'}")
    except Exception as e:
        print(f"Unexpected error: {e}")

Define all of your tasks for your agents

In [35]:
# Tasks
from crewai import Agent,Task
import requests
import base64

class Tasks:
  def ReviewTask(agent,repo,context):
    return Task(
                agent=agent,
                description=
                f"""
                  Review the given file and provide detailed feedback and reviews about the file if it doesn't follow the industry code standards
                  Take the file path and file contents from contentAgent
                  Make changes in file content to make it better and return the changed content as updated_code in response
                  Return the below values in response
                  project_name: {repo}
                  file_path: file path
                  review: review_here
                  updated_code: updated content of file after making changes

                  Return the output which follows the below array structure and every element must be wrapped in multi-line string
                  In case of updated_code, add the full code as a multi-line string
                  Only return file content which got changed in updated_code, there are multiple changes in file content then send whole file content

                  Every array should follow this format:
                  [project_name,file_path,review,updated_code]

                  Don't return anything except array in above format
                """,
                context=context,
                expected_output="An array of 4 elements in a format given in description"

    )
  def NotionTask(agent,context,page_id):
    return Task(
        agent=agent,
        description=f"""
        You are given an array of 4 elements and a page id and you will have to add this data in notion
        Here is the id of notion page
        {page_id}
        Say 'Data is added successfully into notion' in case of success else return given array.
        """,
        context=context,
        expected_output="Text saying 'Data is added successfully into notion' in case of success and 'Could not add data in notion' in case of failure"
    )
  def getFilePathTask(agent,filetree,userInput):
    return Task(
        agent=agent,
        description=f"""
        You are given a tree structure of folder and userInput and first you have to decide whether it is a folder or file from given tree structure of a folder
        Follow this approach
        - If it's a file then return array with 1 element which contains the full path of that file in this folder structure
        - If it's a folder then return array of paths of sub files inside that folder, if there is a subfolder in given folder, then return paths for those files as well
        - If userInput is not present in given tree structure then just return empty array
        Please return FULL path of a given file in given folder tree structure
        For example if tree structure looks like this:
        - src
          - components
            - Login.jsx
            - Password.jsx
        - backend
          - api
        Then the full path of Login.jsx will be src/components/Login.jsx
        DON'T send every file content at once, send it one by one to reviewAgent
        here is the tree structure of folder:
        {filetree}
        here is user input
        {userInput}

        NOTE: ONLY RETURN ARRAY OF PATHS WITHOUT ANY EXTRA TEXT IN RESPONSE

        """,
        expected_output="""
        ONLY an array of paths
        For example:
        ['src/load/app.jsx','client/app/pages/404.js']
        """

    )
  def getFileContentTask(agent,owner,repo,path):
    return Task(
        agent=agent,
        description=f"""
        You are given a file path and you have to get the content of file and file name using github API
        here is the file path
        {path}
        here is owner name
        {owner}
        here is repo name
        {repo}
        Don't return anything except filename and content
        """,
        expected_output="filename and content of given file"
  )

Let’s create custom tools which our agents will use.

We will create 2 custom tools here:
- addToNotion: It will add the given content into notion document with given page_id
- getFileContents: It will return the content of given file using github API

We will `@tool` decorator from Langchain to define the custom tool

In [36]:
# Custom Tools
from pprint import pprint
import json
import ast
from langchain.tools import tool
class Tools():

    @tool("Add data to notion")
    def addToNotion(output,page_id):
      """
      Used to add an data given as input in notion document.
      """
      children = [
            {
              "object": "block",
              "type": "heading_2",
              "heading_2": {
                "rich_text": [{ "type": "text", "text": { "content": "🚀 File Name" } }]
              }
            },
            {
              "object": "block",
              "type": "paragraph",
              "paragraph": {
                "rich_text": [{ "type": "text", "text": { "content": output[1] } }]
              }
            },
            {
              "object": "block",
              "type": "heading_2",
              "heading_2": {
                "rich_text": [{ "type": "text", "text": { "content": "📝 Review" } }]
              }
            },
            {
              "object": "block",
              "type": "paragraph",
              "paragraph": {
                "rich_text": [{ "type": "text", "text": { "content": output[2] } }]
              }
            },
            {
              "object": "block",
              "type": "heading_2",
              "heading_2": {
                "rich_text": [{ "type": "text", "text": { "content": "💡 Updated code" } }]
              }
            },
            {
              "object": "block",
              "type": "code",
              "code": {
                "caption": [],
                "rich_text": [{
                  "type": "text",
                  "text": {
                    "content": output[3]
                  }
                }],
                "language": "markdown"
              }
            },
          ]
      add_data_response = Client.blocks.children.append(
          block_id=page_id,children=children
      )
      print(add_data_response)

    @tool("get file contents from given file path")
    def getFileContents(path,owner,repo):
      """
        used to get the content of given file using the given path, owner of repository and repository name
        url will look like this https://api.github.com/repos/{owner}/{repo}/{path}
      """
      if path.startswith("https://"):
        api_url = path
      else:
        api_url = f"https://api.github.com/repos/{owner}/{repo}/contents/{path}"

      # Add the Authorization header with the token
      headers = {'Authorization': f'token {github_secret}','X-GitHub-Api-Version': '2022-11-28'}
      # Make the request
      response = requests.get(api_url,headers=headers)

      # Check if the request was successful
      if response.status_code == 200:
          file_content = response.json()

          # Check the size of the file
          if file_content['size'] > 1000000:  # 1MB in bytes
              return "Skipped: File size is greater than 1 MB."

          # The content is Base64 encoded, so decode it
          content_decoded = base64.b64decode(file_content['content'])

          # Convert bytes to string
          content_str = content_decoded.decode('utf-8')

          # Check the number of lines in the file
          if len(content_str.split('\n')) > 500:
              return "Skipped: File contains more than 500 lines."
          return content_str
      else:
          # Handle errors (e.g., file not found, access denied)
          return f"Error: {response.status_code} - {response.reason}"

We will require 4 agents in our crew:

- ReviewAgent: It will review given file based on filename and file content
- NotionAgent: It will add the given data into given notion document
- ContentAgent: It will return the content of given file using github API
- PathAgent: It will return the array of full paths of files from the given

tree structure so that we can build the API url for that file
Let’s create our agents!

In [37]:
# Agents
class Agents:
  def ReviewAgent():
    return Agent(
            role='Senior software developer',
            goal=
            'Do code reviews on a given file to check if it matches industry code standards',
            backstory=
            "You're a Senior software developer at a big company and you need to do a code review on a given file content.",
            allow_delegation=False,
            verbose=True,
    )
  def NotionAgent():
    return Agent(
        role = "Notion api expert and content writer",
        goal = "Add given array data into notion document using addToNotion tool",
        backstory=
            "You're a notion api expert who can use addToNotion tool and add given data into notion document",
        allow_delegation=True,
        tools=[Tools.addToNotion],
        verbose=True,
    )
  def PathAgent():
    return Agent(
        role="File path extractor",
        goal = "Get the tree structure of folder and return full paths of the given file or files of given folder in array format",
        backstory = "You're a file path extractor who have created several file paths from given tree structure",
        allow_delegation=False,
        verbose=True,
    )
  def ContentAgent():
    return Agent(
        role="github api expert",
        goal="Get the content of given file using github API",
        backstory="You're github api expert who have extracted many file contents using github's api",
        verbose=True,
        allow_delegation=False,
        tools=[Tools.getFileContents]
    )

Now let’s create our crew class, we will call it ReviewCrew in which we will have a run method which will start the crew execution

ReviewCrew will accept 4 parameters:
- owner: name of github repository owner
- repo: name of github repository
- page_id: page id of the notion page in which we want to add the review
- path: full path of file which needs to be reviewed

In [38]:
from crewai import Process
class ReviewCrew:
  def __init__(self,owner,repo,page_id,path):
    self.owner = owner
    self.repo = repo
    self.page_id = page_id
    self.path = path
  def run(self):
    # Agents
    reviewAgent = Agents.ReviewAgent()
    contentAgent = Agents.ContentAgent()
    notionAgent = Agents.NotionAgent()
    # Tasks
    contentTask = Tasks.getFileContentTask(agent=contentAgent,owner=owner,repo=repo,path=path)
    reviewTask = Tasks.ReviewTask(agent=reviewAgent,repo=repo,context=[contentTask])
    notionTask = Tasks.NotionTask(agent=notionAgent,page_id=page_id,context=[reviewTask])
    # Crew
    crew = Crew(
      agents=[contentAgent,reviewAgent,notionAgent],
      tasks=[contentTask,reviewTask,notionTask],
      verbose=2, # You can set it to 1 or 2 to different logging levels
    )
		# Run the crew
    result = crew.kickoff()
    print(result)

Now we are only one step close to our final crew!

Let’s add the code to do following tasks:
- Take input from user
- Get tree structure of given github repository
- Get array of file paths which needs to be reviewed
- Traverse these paths one by one and review them one by one using our crew

In [41]:
import ast
# Take input from user
github_url = input("Provide github repo URL:")
userInput = input("Provide file/folder name you want to review:")

# Get owner and repo name from github url
split_url = github_url.split('/')
owner = split_url[3]
repo = split_url[4]

# Get the tree structure of github repository
print("Fetching repository structure...")
getFileTree(owner=owner, repo=repo)

if not global_path:
    print("Failed to fetch repository structure. Please check your GitHub token and repository permissions.")
    exit(1)

# Create path extraction crew
pathAgent = Agents.PathAgent()
pathTask = Tasks.getFilePathTask(agent=pathAgent, filetree=global_path, userInput=userInput)

# Create and run a crew for path extraction with verbose as boolean
path_crew = Crew(
    agents=[pathAgent],
    tasks=[pathTask],
    verbose=True  # Changed from 2 to True
)

try:
    # Get the paths from crew execution
    paths_result = path_crew.kickoff()
    print(f"Path extraction result: {paths_result}")

    # Reset global path
    global_path = ""

    # Create notion page
    page_id = createNotionPage(projectName=repo)

    # Convert the string result to an array
    paths = ast.literal_eval(paths_result)

    # Traverse the paths one by one and review them using ReviewCrew
    for path in paths:
        print(f"Reviewing file: {path}")
        reviewCrew = ReviewCrew(owner=owner, repo=repo, page_id=page_id, path=path)
        reviewCrew.run()

except Exception as e:
    print(f"Error during execution: {e}")


Provide github repo URL:https://api.github.com/wolfsbane9513/100x_buildathon
Provide file/folder name you want to review:
Fetching repository structure...


ERROR:root:LiteLLM call failed: litellm.AuthenticationError: AuthenticationError: OpenAIException - The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable
ERROR:root:LiteLLM call failed: litellm.AuthenticationError: AuthenticationError: OpenAIException - The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable
ERROR:root:LiteLLM call failed: litellm.AuthenticationError: AuthenticationError: OpenAIException - The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable


Error accessing GitHub API: 403 Client Error: Forbidden for url: https://api.github.com/repos/wolfsbane9513/100x_buildathon/contents/
Response content: b'{"message":"API rate limit exceeded for user ID 11406551. If you reach out to GitHub Support for help, please include the request ID AEA8:1E45EF:415C75F:80E408B:6746D771 and timestamp 2024-11-27 08:25:21 UTC.","documentation_url":"https://docs.github.com/rest/overview/rate-limits-for-the-rest-api","status":"403"}'
Failed to fetch repository structure. Please check your GitHub token and repository permissions.
[1m[95m# Agent:[00m [1m[92mFile path extractor[00m
[95m## Task:[00m [92m
        You are given a tree structure of folder and userInput and first you have to decide whether it is a folder or file from given tree structure of a folder
        Follow this approach
        - If it's a file then return array with 1 element which contains the full path of that file in this folder structure
        - If it's a folder then retu

In [40]:
def test_github_token():
    headers = {
        'Authorization': f'Bearer {github_secret}',
        'Accept': 'application/vnd.github.v3+json'
    }
    test_url = "https://api.github.com/wolfsbane9513/100x_buildathon"
    response = requests.get(test_url, headers=headers)
    print(f"Status code: {response.status_code}")
    if response.status_code == 200:
        print("Token is valid")
    else:
        print(f"Token error: {response.json()}")

test_github_token()

Status code: 403
Token error: {'message': 'API rate limit exceeded for user ID 11406551. If you reach out to GitHub Support for help, please include the request ID C964:C585A:2685BE1:4BE099D:6746D625 and timestamp 2024-11-27 08:19:49 UTC.', 'documentation_url': 'https://docs.github.com/rest/overview/rate-limits-for-the-rest-api', 'status': '403'}
