# Introduction

This notebook demonstrates how the Haystack 2.x framework can integrate with any service described by an OpenAPI specification, using automated GitHub Pull Request (PR) writing as the example. It shows how to dynamically invoke an OpenAPI service and incorporate its output into the context of a Large Language Model (LLM), a form of on-demand, service-based Retrieval-Augmented Generation (RAG).

## 1. Setup

This notebook demos GitHub Pull Request (PR) text generation.

Let's install the necessary libraries and import the key modules used in the subsequent steps.

In [1]:
!pip install -q git+https://github.com/deepset-ai/haystack.git@openapi_container_v3#egg=farm-haystack[preview]

In [2]:
!pip install -q jsonref openapi3

In [3]:
import getpass
import os

from haystack.preview import Pipeline
from haystack.preview.components.converters import OpenAPIServiceToFunctions
from haystack.preview.components.connectors import OpenAPIServiceConnector
from haystack.preview.components.generators.chat import GPTChatGenerator
from haystack.preview.components.generators.utils import default_streaming_callback
from haystack.preview.dataclasses import ChatMessage

## 2. API Key Input and System Initialization

Begin by entering your OpenAI API key. Following this step, we initialize a system message for the GitHub PR Expert.

In [4]:
llm_api_key = getpass.getpass("Enter LLM provider api key:")

Enter LLM provider api key:··········
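For non-interactive runs, the key could also be read from an environment variable, with the prompt kept as a fallback. The following is a minimal sketch, not part of the original notebook: the helper name `resolve_api_key` is hypothetical, and `OPENAI_API_KEY` is assumed as the variable name by convention.

```python
import getpass
import os


def resolve_api_key(env_var="OPENAI_API_KEY", environ=None, prompt=getpass.getpass):
    """Return the key from the environment if set, otherwise prompt for it.

    `environ` and `prompt` are injectable so the helper can be exercised
    without touching the real environment or blocking on user input.
    """
    environ = os.environ if environ is None else environ
    key = environ.get(env_var, "").strip()
    if key:
        return key
    # Fall back to the same hidden prompt the notebook uses
    return prompt("Enter LLM provider api key:")


# Hypothetical usage in place of the cell above:
# llm_api_key = resolve_api_key()
```

Keeping the prompt as a fallback preserves the notebook's interactive behavior when the variable is not set.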


In [44]:
system_message = """
As the GitHub PR Expert, your enhanced role now includes the ability to analyze diffs provided by GitHub REST service.
You'll be given a JSON formatted string consisting of PR commits, description, authors etc. Your primary task is
crafting GitHub Pull Request text in markdown format, structured into five sections:

Why:
What:
How can it be used:
How did you test it:
Notes for the reviewer:

Always use these sections' names, don't rename them.

When provided with a diff link or output, you should review and interpret the changes to accurately describe them
in the PR. In cases where the diff is not clear or more context is needed, you should request additional information
or clarification. Continue to use markdown elements effectively to organize the PR content. Your goal is to offer
insightful, accurate descriptions of code changes, enhancing the understanding of the PR reviewer.
Do not use ```markdown and ``` delimiters, just start your response with ### Why markdown format directly.
"""
system_message = ChatMessage.from_system(system_message)


## 3. Pipeline Creation and Configuration

This section involves setting up the core components of the Haystack 2.x pipeline, which includes the OpenAPIServiceToFunctions, GPTChatGenerator, and OpenAPIServiceConnector. These components are connected to create a pipeline that processes and interprets the GitHub PR commands and data.

In [29]:
functions_converter = OpenAPIServiceToFunctions()
functions_llm = GPTChatGenerator(api_key=llm_api_key, model_name="gpt-3.5-turbo-0613")
openapi_container = OpenAPIServiceConnector()
llm = GPTChatGenerator(api_key=llm_api_key, model_name="gpt-4-1106-preview", streaming_callback=default_streaming_callback)

In [30]:
pipe = Pipeline()
pipe.add_component("functions_converter", functions_converter)
pipe.add_component("functions_llm", functions_llm)
pipe.add_component("openapi_container", openapi_container)
pipe.connect("functions_converter.functions", "functions_llm.generation_kwargs")
pipe.connect("functions_converter.service_openapi_spec", "openapi_container.service_openapi_spec")
pipe.connect("functions_llm.replies", "openapi_container.messages")

gen_pipe = Pipeline()
gen_pipe.add_component("llm", llm)

## 4. User Input and PR Command Processing

Here, the user enters a GitHub PR command. Make sure to mention the project, the repo, and the branches involved.

In [50]:
user_prompt = input("Enter your GitHub PR command: ")
# Example: Compare branches main and SearchApi:feat/add-searchapi-integration, in project deepset-ai, repo haystack
# Example: Compare branches main and rafaelpadilla:add_bbox_transformations in project huggingface repo transformers

Enter your GitHub PR command: Compare branches main and SearchApi:feat/add-searchapi-integration in project deepset-ai, repo haystack


In [51]:
messages = [ChatMessage.from_system("You are a helpful assistant capable of function calling."),
            ChatMessage.from_user(user_prompt)]

## 5. Processing OpenAPI Specification and GitHub Service Invocation

In this step, the notebook retrieves the OpenAPI specification for the GitHub compare branches service and transforms it into OpenAI function definitions. When a user inputs a command, the LLM generates the service call parameters from that input. These parameters are then used to dynamically invoke the GitHub compare branches service, allowing for real-time, context-sensitive interactions with GitHub's API.
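To make the conversion step concrete, here is a minimal sketch of how an OpenAPI operation could be mapped to an OpenAI-style function definition. It is an illustration under stated assumptions, not the implementation used by `OpenAPIServiceToFunctions`: the function `openapi_to_functions`, the toy spec, and the `compare_branches` operation id are all hypothetical, and only simple `parameters` entries are handled (no `$ref` resolution or request bodies).

```python
HTTP_METHODS = {"get", "post", "put", "patch", "delete"}


def openapi_to_functions(spec):
    """Map each OpenAPI operation to an OpenAI-style function definition."""
    functions = []
    for path, path_item in spec.get("paths", {}).items():
        for method, operation in path_item.items():
            # Path items may also hold non-operation keys such as "parameters"
            if method.lower() not in HTTP_METHODS:
                continue
            if "operationId" not in operation:
                continue  # no stable name to expose to the LLM
            properties, required = {}, []
            for param in operation.get("parameters", []):
                schema = dict(param.get("schema", {"type": "string"}))
                if "description" in param:
                    schema["description"] = param["description"]
                properties[param["name"]] = schema
                if param.get("required", False):
                    required.append(param["name"])
            functions.append({
                "name": operation["operationId"],
                "description": operation.get("description") or operation.get("summary", ""),
                "parameters": {"type": "object", "properties": properties, "required": required},
            })
    return functions


# Toy spec modeled on a compare-branches endpoint (hypothetical operationId)
toy_spec = {
    "paths": {
        "/repos/{owner}/{repo}/compare/{basehead}": {
            "get": {
                "operationId": "compare_branches",
                "summary": "Compare two branches",
                "parameters": [
                    {"name": "owner", "in": "path", "required": True, "schema": {"type": "string"}},
                    {"name": "repo", "in": "path", "required": True, "schema": {"type": "string"}},
                    {"name": "basehead", "in": "path", "required": True,
                     "description": "Branches to compare, written as BASE...HEAD",
                     "schema": {"type": "string"}},
                ],
            }
        }
    }
}

functions = openapi_to_functions(toy_spec)
print(functions[0]["name"], functions[0]["parameters"]["required"])
```

The resulting list has the shape an LLM with function calling expects: a name, a description, and a JSON Schema object describing the arguments.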


Before invoking the service, let's review the GitHub OpenAPI service definition.


In [20]:
openapi_github_compare_branches_spec_url = "https://t.ly/eBODl"

In [21]:
import json
import requests
from IPython.display import HTML

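def render(jstr):
  # Serialize non-string input (e.g. a parsed dict) to JSON before embedding it in the page
  if not isinstance(jstr, str):
    jstr = json.dumps(jstr)
  # Note: the resize call in the script below relies on google.colab, so this helper assumes a Colab runtime
  return HTML("""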
<script src="https://rawgit.com/caldwell/renderjson/master/renderjson.js"></script>
<script>
renderjson.set_show_to_level(1)
document.body.appendChild(renderjson(%s))
new ResizeObserver(google.colab.output.resizeIframeToContent).observe(document.body)
</script>
""" % jstr)

response = requests.get(openapi_github_compare_branches_spec_url)
response.raise_for_status()
render(response.json())
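Once the LLM has produced arguments for a function call, they have to be turned into a concrete HTTP request against the service. The sketch below shows one way the templated path from a spec could be filled in. It is an assumption-laden illustration, not how `OpenAPIServiceConnector` works internally: `build_request_url` is a hypothetical helper, and the argument values are made up for the example.

```python
import json
import re
from urllib.parse import quote


def build_request_url(base_url, path_template, arguments_json):
    """Fill a templated OpenAPI path with the arguments the LLM produced."""
    arguments = json.loads(arguments_json)
    names = re.findall(r"\{(\w+)\}", path_template)
    missing = [name for name in names if name not in arguments]
    if missing:
        raise ValueError(f"Missing path parameters: {missing}")
    # Percent-encode each value so characters such as '/' cannot alter the path structure
    path = re.sub(
        r"\{(\w+)\}",
        lambda match: quote(str(arguments[match.group(1)]), safe=""),
        path_template,
    )
    return base_url.rstrip("/") + path


# Hypothetical arguments, as an LLM function call might return them (JSON string)
llm_arguments = json.dumps({"owner": "deepset-ai", "repo": "haystack", "basehead": "main...feature"})
url = build_request_url(
    "https://api.github.com",
    "/repos/{owner}/{repo}/compare/{basehead}",
    llm_arguments,
)
print(url)
```

Validating the arguments before substitution surfaces an incomplete function call as a clear error instead of a malformed request.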

In [52]:
# The fetched data, which includes details like PR commits, descriptions, and author information
service_result = pipe.run(data={"functions_converter": {"service_spec_url": openapi_github_compare_branches_spec_url},
                                "functions_llm": {"messages": messages}})

## 6. Generating GitHub PR Text with GPT-4 Model

Using the gpt-4-1106-preview GPT-4 model, this section generates the textual content of the GitHub PR using the GitHub service data as context.

In [53]:
github_pr_prompt_messages = [system_message] + service_result["openapi_container"]["service_response"]
final_result = gen_pipe.run(data={"llm": {"messages": github_pr_prompt_messages}})

### Why
The purpose of this Pull Request is to introduce SearchApi integration into the Haystack project, allowing users to access search results from various engines including Google, Google Scholar, YouTube, and YouTube transcripts through a unified API.

### What
This PR consists of adding a new `SearchApi` class, along with associated updates and documentation changes to support the integration of SearchApi as a web search provider. The main changes are:

1. Modification of the `retriever/web.py` to include "SearchApi" as a search engine provider.
2. Adding the new `SearchApi` class inside `providers.py` to handle the actual SearchApi queries and result processing.
3. Updating the `WebSearch` component within `web.py` to support the new SearchApi provider.
4. Extending the `__init__.py` file within `components/websearch` to include the new `SearchApiWebSearch`.
5. Introducing a new `searchapi.py` file in `components/websearch` which defines the `SearchApiWebSearch` component.
6. Do

## 7. Displaying the Generated PR Text

Although the GitHub PR text was already streamed above, the generated text is also rendered below in a Markdown display component.

In [54]:
from IPython.display import Markdown
Markdown(final_result["llm"]["replies"][0].content)

### Why
The purpose of this Pull Request is to introduce SearchApi integration into the Haystack project, allowing users to access search results from various engines including Google, Google Scholar, YouTube, and YouTube transcripts through a unified API.

### What
This PR consists of adding a new `SearchApi` class, along with associated updates and documentation changes to support the integration of SearchApi as a web search provider. The main changes are:

1. Modification of the `retriever/web.py` to include "SearchApi" as a search engine provider.
2. Adding the new `SearchApi` class inside `providers.py` to handle the actual SearchApi queries and result processing.
3. Updating the `WebSearch` component within `web.py` to support the new SearchApi provider.
4. Extending the `__init__.py` file within `components/websearch` to include the new `SearchApiWebSearch`.
5. Introducing a new `searchapi.py` file in `components/websearch` which defines the `SearchApiWebSearch` component.
6. Documentation updates to describe the new SearchApi provider in `test/4084-agent-demo.md`.
7. Release notes entry (`add-searchapi-integration-bb9130485c3c9429.yaml`) mentioning the integration of SearchApi as a web search provider.
8. Test cases have been added (`test_web_search.py`) to ensure the correct functionality of the SearchApi provider within the web search module.

### How can it be used
After merging this PR, users can configure the Haystack pipeline to use the SearchApi provider for conducting searches across various engines. It can be done by specifying `SearchApi` as the provider when using the `WebSearch` component and providing necessary parameters such as the API key, top search results count (`top_k`), allowed domains, and any other search engine-specific parameters.

### How did you test it
Integration tests have been added for the new provider, as seen in the files `test_web_search.py`, `test_searchapi.py`, and relevant environment keys have been set up to test against the real SearchApi service. Tests cover basic functionality, handling of different `top_k` values, and ensure that proper exceptions are thrown for request timeouts or HTTP errors.

### Notes for the reviewer
- The reviewer should ensure that the added SearchApi class conforms to the existing architecture and coding standards of the Haystack project.
- Attention should be given to the exception handling part to confirm that it's robust and user-friendly.
- The documentation changes should be reviewed to ensure they provide clear and concise guidance to the end-users on how to use the new SearchApi integration.
- Additional focus on the tests is suggested to make sure they cover the crucial parts of the new functionality and are reliable indicators of the provider's correct behavior.
- Since the API key for SearchApi is required for real-time testing, confirm that no sensitive information is exposed and proper environment variable management is in place.
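The system message asks for five fixed section names, so a small check can confirm that a generated PR text actually follows that structure. This is a minimal sketch and not part of the original notebook: `missing_sections` is a hypothetical helper, and it assumes each section appears as a markdown heading, as in the output above.

```python
import re

# Section names required by the system message
REQUIRED_SECTIONS = [
    "Why",
    "What",
    "How can it be used",
    "How did you test it",
    "Notes for the reviewer",
]


def missing_sections(pr_text):
    """Return the required section names that have no markdown heading in pr_text."""
    found = re.findall(r"^#{1,6}[ \t]+(.+)$", pr_text, flags=re.MULTILINE)
    # Normalize headings: trim whitespace, drop a trailing colon, ignore case
    headings = {heading.strip().rstrip(":").lower() for heading in found}
    return [name for name in REQUIRED_SECTIONS if name.lower() not in headings]


sample = "### Why\nReason.\n\n### What\nChanges.\n"
print(missing_sections(sample))
```

With the notebook's variables, the same check could be applied as `missing_sections(final_result["llm"]["replies"][0].content)`.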

## Thank you, questions?

Links:
- This notebook: https://github.com/vblagoje/notebooks/blob/main/haystack2x-demos/github_pr_writer_haystack2_x.ipynb
- Haystack: https://github.com/deepset-ai/haystack/