<a href="https://colab.research.google.com/github/vblagoje/notebooks/blob/main/github_pr_writer_haystack2_x.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction

This notebook demonstrates how the Haystack 2.x framework can integrate with any service described by an OpenAPI specification, using automated GitHub Pull Request writing as the example. It shows how to dynamically invoke an OpenAPI service and feed its output into the context of a Large Language Model (LLM), a form of on-demand, service-based Retrieval-Augmented Generation (RAG).

## 1. Setup

This notebook demonstrates GitHub Pull Request (PR) text generation.

Let's install the necessary libraries and import the key modules used in the subsequent steps.

In [49]:
!pip install -q git+https://github.com/deepset-ai/haystack.git@openapi_container#egg=farm-haystack[preview]

DEPRECATION: git+https://github.com/deepset-ai/haystack.git@openapi_container#egg=farm-haystack[preview] contains an egg fragment with a non-PEP 508 name pip 25.0 will enforce this behaviour change. A possible replacement is to use the req @ url syntax, and remove the egg fragment. Discussion can be found at https://github.com/pypa/pip/issues/11617
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done


In [50]:
!pip install -q jsonref openapi3

In [51]:
import getpass
import os

from haystack.preview import Pipeline
from haystack.preview.components.builders.openapi_service_functions_builder import OpenAPIServiceFunctionsBuilder
from haystack.preview.components.connectors import OpenAPIServiceConnector
from haystack.preview.components.generators.chat import GPTChatGenerator
from haystack.preview.components.generators.utils import default_streaming_callback
from haystack.preview.dataclasses import ChatMessage

## 2. API Key Input and System Initialization

Begin by entering your OpenAI API key. Following this step, we initialize a system message for the GitHub PR Expert.

In [52]:
llm_api_key = getpass.getpass("Enter LLM provider api key:")

Enter LLM provider api key:··········


In [53]:
system_message = """
As the GitHub PR Expert, your enhanced role now includes the ability to analyze diffs provided by GitHub REST service.
You'll be given a JSON formatted string consisting of PR commits, description, authors etc. Your primary task is
crafting GitHub Pull Request text in markdown format, structured into five sections:

Why:
What:
How can it be used:
How did you test it:
Notes for the reviewer:

Always use these sections' names, don't rename them

When provided with a diff link or output, you should review and interpret the changes to accurately describe them
in the PR. In cases where the diff is not clear or more context is needed, you should request additional information
or clarification. Continue to use markdown elements effectively to organize the PR content. Avoid common mistakes
like misinterpreting the diff or providing irrelevant information. Your goal is to offer insightful, accurate
descriptions of code changes, enhancing the understanding of the PR reviewer.
"""
system_message = ChatMessage.from_system(system_message)


## 3. Pipeline Creation and Configuration

This section sets up the core Haystack 2.x components: OpenAPIServiceFunctionsBuilder, two GPTChatGenerator instances, and OpenAPIServiceConnector. They are wired into two pipelines: the first turns the GitHub PR command into a call to the GitHub service, and the second uses GPT-4 to write the PR text from the service response.

In [54]:
functions_builder = OpenAPIServiceFunctionsBuilder()
functions_llm = GPTChatGenerator(api_key=llm_api_key, model_name="gpt-3.5-turbo-0613")
openapi_container = OpenAPIServiceConnector()
llm = GPTChatGenerator(api_key=llm_api_key, model_name="gpt-4-1106-preview", streaming_callback=default_streaming_callback)

In [55]:
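pipe = Pipeline()
pipe.add_component("functions_builder", functions_builder)
pipe.add_component("functions_llm", functions_llm)
pipe.add_component("openapi_container", openapi_container)
# Function definitions derived from the OpenAPI spec are passed to the function-calling LLM
pipe.connect("functions_builder.functions", "functions_llm.generation_kwargs")
# The connector also receives the spec itself so it knows how to call the service
pipe.connect("functions_builder.service_openapi_spec", "openapi_container.service_openapi_spec")
# The LLM's function-call replies are executed against the service by the connector
pipe.connect("functions_llm.replies", "openapi_container.messages")

# A separate single-component pipeline writes the PR text from the service response
gen_pipe = Pipeline()
gen_pipe.add_component("llm", llm)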

## 4. User Input and PR Command Processing

Here, you enter a GitHub PR command in natural language. Make sure to mention the project (repository owner), the repository, and the two branches to compare.
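Why all three pieces matter, as a minimal sketch: assuming the spec describes GitHub's compare endpoint, `GET /repos/{owner}/{repo}/compare/{basehead}` (where `basehead` is `BASE...HEAD`), the function-calling LLM has to fill in every path parameter from your command. The helper `missing_params` and the sample payloads below are hypothetical and not part of Haystack; they only illustrate which fields a complete command lets the LLM resolve.

```python
import json

# Assumption: the service spec exposes GitHub's compare endpoint,
# GET /repos/{owner}/{repo}/compare/{basehead}, with basehead written as "BASE...HEAD"
REQUIRED_PARAMS = ("owner", "repo", "basehead")


def missing_params(arguments_json: str) -> list[str]:
    """Return required path parameters absent from a function-call payload (hypothetical helper)."""
    args = json.loads(arguments_json)
    return [name for name in REQUIRED_PARAMS if not args.get(name)]


# Arguments we would hope the LLM extracts from the example command
good = '{"owner": "deepset-ai", "repo": "haystack", "basehead": "main...pipeline_run_input"}'
# A command that omits the project leaves "owner" unresolved
bad = '{"repo": "haystack", "basehead": "main...pipeline_run_input"}'

print(missing_params(good))  # []
print(missing_params(bad))   # ['owner']
```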

In [56]:
user_prompt = input("Enter your GitHub PR command: ")
# Example: Compare branches main and pipeline_run_input, in project deepset-ai, repo haystack

Enter your GitHub PR command: Compare branches main and pipeline_run_input, in project deepset-ai, repo haystack


In [57]:
messages = [ChatMessage.from_system("You are a helpful assistant capable of function calling."),
            ChatMessage.from_user(user_prompt)]

## 5. Processing OpenAPI Specification and GitHub Service Invocation
In this step, the notebook fetches the OpenAPI specification of the GitHub compare-branches service and converts it into OpenAI function definitions. The function-calling LLM uses those definitions to extract the service parameters (project, repository, branches) from the user's command, and the connector then invokes the GitHub compare-branches service with them, so the PR context is fetched on demand for whichever branches the user names.


Before running it, let's review the GitHub OpenAPI service definition.


In [58]:
openapi_github_compare_branches_spec_url = "https://t.ly/eBODl"

In [59]:
import json
import requests
from IPython.display import HTML

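def render(jstr):
  # Render JSON as a collapsible tree; relies on google.colab.output, so it only works in Colab
  if not isinstance(jstr, str):
    jstr = json.dumps(jstr)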
  return HTML("""
<script src="https://cdn.jsdelivr.net/gh/caldwell/renderjson@master/renderjson.js"></script>
<script>
renderjson.set_show_to_level(1)
document.body.appendChild(renderjson(%s))
new ResizeObserver(google.colab.output.resizeIframeToContent).observe(document.body)
</script>
""" % jstr)

response = requests.get(openapi_github_compare_branches_spec_url)
response.raise_for_status()
render(response.json())
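The functions builder's job is to turn operations like the one above into function definitions the LLM can call. A minimal sketch of that idea, assuming a hand-trimmed, hypothetical spec fragment (the `operationId` `compare_branches` is made up) and the OpenAI function-calling shape of `name`, `description`, and a JSON Schema `parameters` object. This is not the actual OpenAPIServiceFunctionsBuilder implementation; it ignores `$ref`s, path-level parameters, and request bodies.

```python
from typing import Any

# Hypothetical, hand-trimmed OpenAPI 3 fragment; the real spec behind the URL above is larger
spec: dict[str, Any] = {
    "paths": {
        "/repos/{owner}/{repo}/compare/{basehead}": {
            "get": {
                "operationId": "compare_branches",
                "summary": "Compare two commits or branches",
                "parameters": [
                    {"name": "owner", "in": "path", "required": True,
                     "description": "Account owner of the repository", "schema": {"type": "string"}},
                    {"name": "repo", "in": "path", "required": True,
                     "description": "Repository name", "schema": {"type": "string"}},
                    {"name": "basehead", "in": "path", "required": True,
                     "description": "BASE...HEAD", "schema": {"type": "string"}},
                ],
            }
        }
    }
}

HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def spec_to_functions(spec: dict[str, Any]) -> list[dict[str, Any]]:
    """Turn each OpenAPI operation into an OpenAI-style function definition (simplified)."""
    functions = []
    for path, path_item in spec["paths"].items():
        for method, op in path_item.items():
            if method not in HTTP_METHODS:
                continue  # skip path-level keys such as "parameters" or "summary"
            properties, required = {}, []
            for param in op.get("parameters", []):
                # Copy the parameter schema and attach its description for the LLM
                schema = dict(param.get("schema", {"type": "string"}))
                if "description" in param:
                    schema["description"] = param["description"]
                properties[param["name"]] = schema
                if param.get("required"):
                    required.append(param["name"])
            functions.append({
                "name": op["operationId"],
                "description": op.get("summary", f"{method.upper()} {path}"),
                "parameters": {"type": "object", "properties": properties, "required": required},
            })
    return functions


functions = spec_to_functions(spec)
print(functions[0]["name"], functions[0]["parameters"]["required"])
# compare_branches ['owner', 'repo', 'basehead']
```

The parameter descriptions are copied into the schema because they are the only hint the LLM gets about what each argument means.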

In [60]:
# Run the function-calling pipeline; service_result holds the fetched GitHub data (PR commits, descriptions, author information, etc.)
service_result = pipe.run(data={"functions_builder": {"service_spec_url": openapi_github_compare_branches_spec_url},
                                "functions_llm": {"messages": messages}})

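Inside the pipeline, the connector turns the LLM's function-call reply into an actual HTTP request. A minimal sketch of that step, assuming a hypothetical `compare_branches` function call and a hard-coded mapping from operation name to method and path template; the real OpenAPIServiceConnector resolves this from the spec and may differ in detail.

```python
import json
from urllib.parse import quote

# Hypothetical function-call payload in the OpenAI shape: a name plus a JSON-encoded arguments string
function_call = {
    "name": "compare_branches",
    "arguments": '{"owner": "deepset-ai", "repo": "haystack", "basehead": "main...pipeline_run_input"}',
}

# Assumption: operation name -> (HTTP method, path template), normally looked up in the spec
OPERATIONS = {"compare_branches": ("GET", "/repos/{owner}/{repo}/compare/{basehead}")}
BASE_URL = "https://api.github.com"


def build_request(call: dict) -> tuple[str, str]:
    """Resolve a function call into an HTTP method and URL (path parameters only)."""
    method, template = OPERATIONS[call["name"]]
    args = json.loads(call["arguments"])
    # Percent-encode each value; "." and "-" are unreserved, so "main...x" stays unchanged
    path = template.format(**{key: quote(str(value), safe="") for key, value in args.items()})
    return method, BASE_URL + path


method, url = build_request(function_call)
print(method, url)
# GET https://api.github.com/repos/deepset-ai/haystack/compare/main...pipeline_run_input
```

To perform the call itself you could pass the result to `requests.request(method, url, timeout=30)`; a full connector would also handle query parameters, request bodies, and authentication headers, which this sketch leaves out.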
## 6. Generating GitHub PR Text with GPT-4 Model

Using the GPT-4 Turbo preview model (gpt-4-1106-preview), this section generates the GitHub PR text, with the GitHub service response supplied as context.

In [61]:
github_pr_prompt_messages = [system_message] + service_result["openapi_container"]["service_response"]
final_result = gen_pipe.run(data={"llm": {"messages": github_pr_prompt_messages}})


## 7. Displaying the Generated PR Text

The PR text was streamed while it was being generated; below, the final reply is rendered as formatted Markdown.

In [62]:
from IPython.display import Markdown, display
display(Markdown(final_result["llm"]["replies"][0].content))

### Why:
In order to enhance the usability and flexibility of the Haystack pipeline execution process, an update has been made to the `run` method interface. The development seeks to offer users an alternative invocation that simplifies providing input to the pipeline components.

### What:
The proposed change introduces several modifications to the `haystack/preview/pipeline.py` file. The key update is enabling users to run a pipeline without specifying component names in the input dictionary. With this update, you can directly pass input keys and values to the `run` method, and the pipeline will resolve the appropriate components internally.

### How can it be used:
Users can now call the `run` method on a Haystack pipeline in two ways:

1. Using a data dictionary where each key is the component name, which in turn maps to another dictionary containing input parameters for that component.
2. With a flat dictionary of input keys and values, allowing the pipeline's internal mechanisms to automatically determine the relevant components for these inputs.

For instance:

```python
result = pipeline.run(data={"word": "world"})
```
vs.
```python
result = pipeline.run(data={"hello": {"word": "world"}})
```

Both invocations will produce the same outcome, but the former offers a more succinct way of executing the pipeline.

### How did you test it:
A test case has been added to `test/preview/test_pipeline.py` that verifies the functionality of the simplified pipeline input. The test creates a pipeline with a "Hello" component and executes it using three different input structures:

1. Nested input with component names.
2. Flat input without component names.
3. Positional argument style.

The assert statements confirm that all invocation methods produce the expected output.

### Notes for the reviewer:
- This update comes with an "overloaded" version of the `run` method to handle different input structures.
- The test coverage seems appropriate, but additional edge cases might be considered for extensive testing.
- The "overload" mechanism is used for method dispatch based on input arguments.
- Be aware that the new simplified input method interacts with internal pipeline mechanisms which could be subject to future changes. It's marked as an evolving interface, so reviewing for potential long-term maintenance issues would be prudent.
- Please ensure the changes align with the project's conventions for overriding methods and handling pipeline inputs.
- A note has been added to the release notes `allow-simplified-pipeline-run-input-e3dd98ff38f0bc01.yaml` detailing this feature for users upgrading or reading the changelog.