<a href="https://colab.research.google.com/github/vblagoje/notebooks/blob/main/haystack2x-demos/github_pr_writer_haystack2_x.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction

This notebook demonstrates how the Haystack 2.x framework can integrate with any service described by an OpenAPI specification, using automated GitHub Pull Request writing as the example. It shows how to invoke an OpenAPI service dynamically and feed its output into the context of a Large Language Model (LLM), a form of on-demand, service-based Retrieval-Augmented Generation (RAG).

## 1. Setup

This notebook generates GitHub Pull Request (PR) text.

First, install the required libraries and import the modules used in the following steps.

In [1]:
!pip install -q jsonref openapi3 git+https://github.com/deepset-ai/haystack.git


In [2]:
import getpass
import os

from haystack import Pipeline
from haystack.components.converters import OpenAPIServiceToFunctions
from haystack.components.connectors import OpenAPIServiceConnector
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.components.generators.utils import default_streaming_callback
from haystack.dataclasses import ChatMessage

## 2. API Key Input and System Initialization

Enter your OpenAI API key, then define the system message for the GitHub PR Expert and the URL of the OpenAPI specification for GitHub's compare-branches service.

In [3]:
llm_api_key = getpass.getpass("Enter LLM provider api key:")

Enter LLM provider api key:··········
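
For non-interactive runs, the key could also be read from an environment variable, with the prompt as a fallback. The following is a minimal sketch, not part of the original notebook; the helper name `read_api_key` and the variable name `OPENAI_API_KEY` are assumptions chosen for illustration.

```python
import getpass
import os


def read_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the key from the environment, prompting only when it is absent."""
    key = os.environ.get(env_var)
    if key:
        return key
    # Fall back to an interactive prompt that does not echo the input.
    return getpass.getpass(f"Enter value for {env_var}: ")
```

With this helper, `llm_api_key = read_api_key()` would replace the direct `getpass` call above.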


In [4]:
system_message = """
As the GitHub PR Expert, your enhanced role now includes the ability to analyze diffs provided by GitHub REST service.
You'll be given a JSON formatted string consisting of PR commits, description, authors etc. Your primary task is
crafting GitHub Pull Request text in markdown format, structured into five sections:

Why:
What:
How can it be used:
How did you test it:
Notes for the reviewer:

Always use these sections' names, don't rename them.

When provided with a diff link or output, you should review and interpret the changes to accurately describe them
in the PR. In cases where the diff is not clear or more context is needed, you should request additional information
or clarification. Continue to use markdown elements effectively to organize the PR content. Your goal is to offer
insightful, accurate descriptions of code changes, enhancing the understanding of the PR reviewer.
Do not use ```markdown and ``` delimiters, just start your response with ### Why markdown format directly.
"""
openapi_github_compare_branches_spec_url = "https://bit.ly/3tdRUM0"

## 3. Pipeline Creation and Configuration

This section sets up the core Haystack 2.x components: OpenAPIServiceToFunctions, OpenAIChatGenerator, and OpenAPIServiceConnector. They are assembled into pipelines that convert the specification into function definitions, invoke the GitHub service, and generate the PR text.

In [5]:
gen_func_pipeline = Pipeline()
gen_func_pipeline.add_component("spec_to_functions", OpenAPIServiceToFunctions())

functions_result = gen_func_pipeline.run(data={"sources":[openapi_github_compare_branches_spec_url],
                                               "system_messages":[system_message]})

In [6]:
invoke_service_pipe = Pipeline()
invoke_service_pipe.add_component("functions_llm", OpenAIChatGenerator(api_key=llm_api_key, model_name="gpt-3.5-turbo-0613"))
invoke_service_pipe.add_component("openapi_container", OpenAPIServiceConnector())
invoke_service_pipe.connect("functions_llm.replies", "openapi_container.messages")

gen_pipe = Pipeline()
gen_pipe.add_component("llm", OpenAIChatGenerator(api_key=llm_api_key, model_name="gpt-4-1106-preview", streaming_callback=default_streaming_callback))

## 4. User Input and PR Command Processing

Enter a GitHub PR command here. Make sure to mention the
project, the repo, and the branches involved.

In [7]:
user_prompt = input("Enter your GitHub PR command: ")
# Example: Compare branches main and test/benchmarks2.0, in project deepset-ai, repo haystack
# Example: Compare branches main and rafaelpadilla:add_bbox_transformations in project huggingface repo transformers

Enter your GitHub PR command: Compare branches main and test/benchmarks2.0, in project deepset-ai, repo haystack


In [8]:
messages = [ChatMessage.from_system("You are a helpful assistant capable of function calling."),
            ChatMessage.from_user(user_prompt)]

## 5. Processing OpenAPI Specification and GitHub Service Invocation
In this step, the notebook retrieves the OpenAPI specification for the GitHub compare branches service and transforms it into OpenAI function definitions. When a user inputs a command, the LLM generates the service invocation parameters from it. These parameters are then used to invoke the GitHub compare branches service dynamically, allowing real-time, context-sensitive interactions with GitHub's API.


Before doing that, let's review the GitHub OpenAPI service definition.


In [9]:
import json
import requests
from IPython.display import HTML

def render(jstr):
  # Accept either a JSON string or a JSON-serializable object.
  if not isinstance(jstr, str):
    jstr = json.dumps(jstr)
  return HTML("""
<script src="https://rawgit.com/caldwell/renderjson/master/renderjson.js"></script>
<script>
renderjson.set_show_to_level(1)
document.body.appendChild(renderjson(%s))
new ResizeObserver(google.colab.output.resizeIframeToContent).observe(document.body)
</script>
""" % jstr)

response = requests.get(openapi_github_compare_branches_spec_url)
response.raise_for_status()
render(response.json())

In [10]:
open_api_doc = functions_result["spec_to_functions"]["documents"][0]
openai_functions_definition = json.loads(open_api_doc.content)
openapi_spec = open_api_doc.meta["spec"]
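
To make the conversion step more concrete, here is a simplified sketch of how a single OpenAPI operation could be mapped to an OpenAI-style function definition like the one stored in `openai_functions_definition`. It is not the implementation of `OpenAPIServiceToFunctions`; the helper `operation_to_function` and the hand-written `example_operation` fragment are hypothetical and cover only path parameters with a `schema`.

```python
from typing import Any


def operation_to_function(path: str, method: str, operation: dict[str, Any]) -> dict[str, Any]:
    """Map one OpenAPI operation to an OpenAI-style function definition (simplified)."""
    properties: dict[str, Any] = {}
    required: list[str] = []
    for param in operation.get("parameters", []):
        schema = param.get("schema", {})
        properties[param["name"]] = {
            "type": schema.get("type", "string"),
            "description": param.get("description", ""),
        }
        if param.get("required", False):
            required.append(param["name"])
    return {
        # Fall back to a synthetic name when the operation has no operationId.
        "name": operation.get("operationId", f"{method}_{path}"),
        "description": operation.get("summary", ""),
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


# Hypothetical, hand-written fragment resembling a compare-branches operation.
example_operation = {
    "operationId": "compare_branches",
    "summary": "Compare two branches",
    "parameters": [
        {"name": "owner", "in": "path", "required": True, "schema": {"type": "string"}},
        {"name": "repo", "in": "path", "required": True, "schema": {"type": "string"}},
        {"name": "basehead", "in": "path", "required": True, "schema": {"type": "string"}},
    ],
}

function_def = operation_to_function(
    "/repos/{owner}/{repo}/compare/{basehead}", "get", example_operation
)
print(function_def["name"], function_def["parameters"]["required"])
```

The resulting dictionary has the `name`, `description`, and `parameters` keys that the `tools` parameter of a chat completion request expects for a function.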

In [11]:
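# Offer the generated function definition as a tool and force the LLM to call it.
# The connector then invokes the GitHub service, whose response includes PR commits, descriptions, and author information.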
tools_param = [{"type": "function", "function": openai_functions_definition}]
tool_choice = {"type": "function", "function": {"name": openai_functions_definition["name"]}}

service_response = invoke_service_pipe.run(data={"messages":[ChatMessage.from_user(user_prompt)],
                                                 "generation_kwargs": {"tools": tools_param,
                                                                       "tool_choice": tool_choice},
                                                 "service_openapi_spec": openapi_spec})
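
Conceptually, the connector turns the arguments of the LLM's tool call into a REST request. The sketch below illustrates that idea only; it is not how `OpenAPIServiceConnector` is implemented and it makes no network call. The helper `build_compare_url` is hypothetical, and the argument names `owner`, `repo`, and `basehead` follow GitHub's public REST documentation, so the specification behind the short URL may name them differently.

```python
import json

# Path template as documented for GitHub's "compare two commits" REST endpoint.
COMPARE_PATH = "/repos/{owner}/{repo}/compare/{basehead}"


def build_compare_url(arguments_json: str, base_url: str = "https://api.github.com") -> str:
    """Fill the path template with the arguments of an LLM tool call."""
    arguments = json.loads(arguments_json)
    return base_url + COMPARE_PATH.format(**arguments)


# Hypothetical arguments string, shaped like the "arguments" field of a tool call.
tool_call_arguments = json.dumps(
    {"owner": "deepset-ai", "repo": "haystack", "basehead": "main...test/benchmarks2.0"}
)
print(build_compare_url(tool_call_arguments))
# → https://api.github.com/repos/deepset-ai/haystack/compare/main...test/benchmarks2.0
```

This sketch skips URL-encoding and authentication, both of which a real client would need to handle.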

## 6. Generating GitHub PR Text with GPT-4 Model

Using a GPT-4 model (gpt-4-1106-preview), this section generates the text of the GitHub PR with the GitHub service data as context.

In [12]:
github_pr_prompt_messages = [ChatMessage.from_system(open_api_doc.meta["system_message"])] + service_response["openapi_container"]["service_response"]
final_result = gen_pipe.run(data={"llm": {"messages": github_pr_prompt_messages}})

### Why:
The purpose of this Pull Request is to introduce new benchmarking capabilities and improve existing testing automation within the Haystack project, a library used for building search systems. The changes include adding new benchmarking workflows, utilities, and integrations to run benchmarks and send results to external services such as Datadog.

### What:
The PR comprises several commits that introduce new files and modifications to the repository:
- Add GitHub Actions workflow for running benchmarks and sending results to Datadog.
- Include Python scripts for handling metrics and sending them to Datadog.
- Create benchmarking scripts that support indexing and retrieval functionalities.
- Set up pipelines for Elasticsearch indexing and retrieval in YAML configuration files.
- Utilize a utility module to assist with document retrieval for benchmarking.
- The last commit applies the 'black' code formatter for consistent code style.

### How can it be used:
- The GitHub Actions 

## 7. Displaying the Generated PR Text

The PR text was already streamed above during generation. Here it is rendered again as formatted Markdown.

In [13]:
from IPython.display import Markdown
Markdown(final_result["llm"]["replies"][0].content)

### Why:
The purpose of this Pull Request is to introduce new benchmarking capabilities and improve existing testing automation within the Haystack project, a library used for building search systems. The changes include adding new benchmarking workflows, utilities, and integrations to run benchmarks and send results to external services such as Datadog.

### What:
The PR comprises several commits that introduce new files and modifications to the repository:
- Add GitHub Actions workflow for running benchmarks and sending results to Datadog.
- Include Python scripts for handling metrics and sending them to Datadog.
- Create benchmarking scripts that support indexing and retrieval functionalities.
- Set up pipelines for Elasticsearch indexing and retrieval in YAML configuration files.
- Utilize a utility module to assist with document retrieval for benchmarking.
- The last commit applies the 'black' code formatter for consistent code style.

### How can it be used:
- The GitHub Actions workflow provides a scheduled benchmarking task that can be run via GitHub's CI/CD pipeline.
- Python scripts can be used for analyzing benchmark results and integrating with Datadog to visualize performance metrics.
- Benchmarking scripts make it possible to evaluate the indexing and retrieval performances across different setups, facilitating regression testing and performance analysis.
- Pipeline configurations allow users to specify and customize the indexing and retrieval processes for benchmarking.

### How did you test it:
The specifics of testing were not included in the PR description. However, standard practice would involve:
- Running the GitHub Actions workflow to ensure it completes successfully and triggers the benchmarking scripts.
- Validating that metric-related scripts correctly parse benchmark results and communicate with Datadog’s API.
- Checking the indexing and retrieval scripts by running them against predefined datasets and evaluating whether the outcomes meet expected performance benchmarks.
- Ensuring that pipeline configurations align with the overall functionality of the document stores and retriever components within Haystack.

### Notes for the reviewer:
- Reviewers should verify that all new scripts and actions are in line with the project's standards for maintainability and performance.
- The impact of these changes on the existing repository structure and workflows should be assessed.
- Attention should be given to the robustness and error handling within the scripts, especially when sending data to external services.
- As benchmark results can influence strategic decisions, it is critical to validate the accuracy of the implemented metrics and their correspondence to real-world search scenarios.
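
Since the system message asks for five fixed section names, the generated text can be checked for them before it is used. The following is a minimal sketch, assuming the sections appear as Markdown headings as in the output above; the helper `missing_sections` is hypothetical and not part of the notebook.

```python
import re

# Section names required by the system message.
REQUIRED_SECTIONS = [
    "Why",
    "What",
    "How can it be used",
    "How did you test it",
    "Notes for the reviewer",
]


def missing_sections(pr_text: str) -> list[str]:
    """Return the required section names that do not appear as Markdown headings."""
    found = set()
    for line in pr_text.splitlines():
        # Match headings such as "### Why:" or "## What", ignoring a trailing colon.
        match = re.match(r"^#{1,6}\s*(.+?):?\s*$", line)
        if match:
            found.add(match.group(1).strip())
    return [name for name in REQUIRED_SECTIONS if name not in found]


sample = "### Why:\nSome reason.\n\n### What:\nSome change.\n"
print(missing_sections(sample))
# → ['How can it be used', 'How did you test it', 'Notes for the reviewer']
```

In the notebook, the same check could be applied to `final_result["llm"]["replies"][0].content`.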

## Thank you, questions?

<a href="www.qr-code-generator.com/" border="0" style="cursor:default" rel="nofollow"><img src="https://chart.googleapis.com/chart?cht=qr&chl=https%3A%2F%2Fgithub.com%2Fvblagoje%2Fnotebooks%2Fblob%2Fmain%2Fhaystack2x-demos%2Fgithub_pr_writer_haystack2_x.ipynb&chs=180x180&choe=UTF-8&chld=L|2"></a>

## Links:
- https://github.com/deepset-ai/haystack/
- https://haystack.deepset.ai/community
- https://x.com/vladblagoje