<a href="https://www.kaggle.com/code/yatharthbisht/generating-openapi-specs-for-crms-using-gemini?scriptVersionId=208883138" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

 ## Preface <a class="anchor"  id="chapter1"></a>

***Expanding the Horizons of Context with Gemini 1.5 🌌✨***

The ability of a language model to retain and utilize information from previous interactions, known as its "context window," dramatically impacts its performance and potential applications. Gemini 1.5 represents a significant leap forward, boasting a context window capable of processing up to 2 million tokens—equivalent to roughly 100,000 lines of code, 10 years of text messages, or 16 average English novels. 📚💬 This expanded capacity unlocks new possibilities, minimizing reliance on traditional memory augmentation techniques like vector databases and Retrieval Augmented Generation (RAG) while empowering more direct methods such as in-context retrieval and many-shot prompting. 🚀 Context caching, a crucial element in managing this extensive memory, allows the model to efficiently access and process vast amounts of information, enabling coherent and informed responses. 🧠✅ This document explores the innovative applications of Gemini 1.5's expansive context window, focusing on context caching, comparing in-context and traditional retrieval methods, and showcasing a novel approach to many-shot prompting.

***Context Caching: Navigating the Vastness of Gemini's Memory 🗂️⚡***

Gemini 1.5's massive context window makes it important to handle repeated input efficiently. Context caching addresses this by storing a large block of input tokens once and letting later requests reference it instead of resending it; the cache is kept for a configurable time to live (TTL). 🔄 This reduces redundant processing and improves response times, especially when dealing with lengthy conversations or documents. 📑✨

For instance, when processing a long legal document, context caching allows the model to quickly access relevant clauses and precedents mentioned earlier, without needing to reprocess the entire document each time. ⚖️📜


***In-Context Retrieval vs. RAG: A Comparative Analysis 🤔🔍***

Traditional methods like RAG rely on external databases to augment the model's knowledge. Gemini 1.5's large context window facilitates in-context retrieval, where relevant information is directly embedded within the prompt. While RAG offers broader knowledge coverage by accessing external resources 🌐, it introduces latency and complexity. In-context retrieval, on the other hand, offers faster response times and simplified implementation, ideal for scenarios where the necessary information can be contained within the prompt. ⏩💡

For example, when summarizing a specific news article, including the article text directly in the prompt (in-context retrieval) is more efficient than querying a database of all news articles (RAG). 📰 However, if the task requires information beyond the immediate context, like comparing the article to historical events, RAG becomes necessary. 🕰️📘
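
As a rough illustration of the difference, the sketch below builds a prompt both ways. The word-overlap ranking is only a stand-in for a real retriever (which would normally use embeddings and a vector store), and the helper names and sample documents are hypothetical.

```python
import re

def tokenize(text):
    """Lowercase a string and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def build_in_context_prompt(documents, question):
    """In-context retrieval: every document goes into the prompt."""
    corpus = "\n\n".join(documents)
    return f"{corpus}\n\nQuestion: {question}"

def build_retrieval_prompt(documents, question, top_k=1):
    """Toy RAG step: keep only the top_k documents sharing the most words with the question."""
    query_words = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(tokenize(d) & query_words), reverse=True)
    corpus = "\n\n".join(ranked[:top_k])
    return f"{corpus}\n\nQuestion: {question}"

# Hypothetical two-document corpus
documents = [
    "The city council approved the new budget on Monday.",
    "The local football team won the regional final.",
]
question = "What did the council approve?"
in_context_prompt = build_in_context_prompt(documents, question)
retrieval_prompt = build_retrieval_prompt(documents, question, top_k=1)
```

The in-context prompt carries the whole corpus, while the retrieval prompt carries only what the ranking step selected, so anything the ranking misses never reaches the model.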


***Rethinking Many-Shot Prompting: Leveraging Long Examples 🖋️✨***

Many-shot prompting typically involves providing the model with numerous small examples to guide its behavior. With Gemini 1.5, we explore a novel approach: utilizing a smaller number of significantly longer examples. 📏📂

This approach leverages Gemini's extended context window to capture complex relationships and nuances within the examples. Instead of providing numerous short examples of code generation for different functions, a single long example demonstrating the development of a complete module could provide a richer learning experience. 💻📖

This allows the model to learn broader coding patterns and best practices, improving its ability to generate coherent and functional code for more complex tasks. 🎯 While this approach requires careful selection and crafting of the long examples, it has the potential to significantly improve the quality and relevance of the model's output. 🌟✅
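
A minimal sketch of how such a prompt might be assembled is shown here. The function name, the section markers, and the sample example are assumptions made for illustration; they are not part of any Gemini API.

```python
def build_long_example_prompt(examples, task):
    """Format a few long (input, output) pairs, followed by the new task."""
    sections = []
    for number, (example_input, example_output) in enumerate(examples, start=1):
        sections.append(
            f"### Example {number}\nInput:\n{example_input}\n\nOutput:\n{example_output}"
        )
    # The prompt ends with an open "Output:" for the model to complete
    sections.append(f"### Task\nInput:\n{task}\n\nOutput:")
    return "\n\n".join(sections)

# One long, complete example instead of many short snippets (hypothetical content)
examples = [
    (
        "Write a module that parses a CSV file.",
        "import csv\n\ndef parse(path):\n    with open(path, newline='') as f:\n        return list(csv.reader(f))",
    ),
]
prompt = build_long_example_prompt(examples, "Write a module that parses a JSON file.")
```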
********************************************************************************


 ## Part 1: Exploring Online Documentation Q&A using Gemini's Huge Context Window and the Concept of Context Caching <a class="anchor"  id="chapter1"></a>
 
In the first part of this notebook, we explore how to leverage Gemini's long-context feature, which allows prompting with up to 2 million tokens, to ask questions about virtually any documentation available online. 🌐

This capability is particularly valuable because many online documentation sites lack an integrated AI assistant to address general queries. ❓ With Gemini's vast context window, we can directly embed large portions of documentation into the prompt, enabling users to extract meaningful answers without needing to navigate complex pages manually. 🧠✨

Additionally, by utilizing context caching, we can efficiently manage and process this vast amount of information. Context caching lets us submit the documentation once and reuse it across queries, reducing processing times and enhancing the user experience. ⏩

This approach offers a unique advantage by:

- Simplifying Navigation: No need to scroll through lengthy documents for answers. 📜
- Enhancing Precision: Ensuring that specific, contextually relevant answers are retrieved. 🎯✅
- Saving Time: Quick access to accurate information, even in large and detailed documentations. ⏳💡

Gemini's long-context capability, combined with context caching, opens new possibilities for online documentation Q&A, making technical and non-technical information far more accessible and user-friendly. 🚀🔍
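
Before embedding a whole documentation site in a prompt, it can help to estimate whether it fits. The sketch below uses the rough heuristic of about four characters per token; the constant names and the helper functions are assumptions for illustration, and the estimate is not a substitute for a real tokenizer.

```python
CONTEXT_WINDOW_TOKENS = 2_000_000  # input limit quoted in this notebook
CHARS_PER_TOKEN = 4                # rough heuristic, not an exact tokenizer

def estimate_tokens(text):
    """Approximate the token count of a string from its length."""
    return len(text) // CHARS_PER_TOKEN

def fits_in_context(text):
    """Return True if the estimated token count is within the context window."""
    return estimate_tokens(text) <= CONTEXT_WINDOW_TOKENS

# Hypothetical stand-in for scraped documentation text
sample_documentation = "OpenAPI describes HTTP APIs. " * 1000
estimated = estimate_tokens(sample_documentation)
```

For an exact figure, the SDK's `count_tokens` method on a `GenerativeModel` can be used instead of this estimate.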

In [1]:
### Below is a simple web crawler that collects the URLs of the pages reachable from a site's root URL


import urllib.request
import urllib.error
from urllib.parse import urlparse, urljoin
from bs4 import BeautifulSoup
from queue import Queue
import json
class SimpleCrawler:
    def __init__(self, root_url):
        self.root_url = root_url.rstrip("/")
        self.base_host = urlparse(root_url).netloc
        self.visited = set()
        self.to_visit = Queue()
        self.to_visit.put(root_url)
        self.crawler_links = {}
        self.modified_urls = []

    def crawl(self):
        while not self.to_visit.empty():
            url = self.to_visit.get()
            if url in self.visited:
                continue
            reduced_url = url.replace(self.root_url, "", 1) or "/"
            self.modified_urls.append(reduced_url)
            print(f"Crawling: {url}")
            self.crawler_links[url] = []
            self.visited.add(url)
            links = self.fetch_links(url)
            for link in links:
                if urlparse(link).netloc == self.base_host and link.startswith(url):
                    self.crawler_links[url].append(link)
                    print(f"  Found relevant link: {link}")
                    if link not in self.visited:
                        self.to_visit.put(link)

    def fetch_links(self, url):
        try:
            response = urllib.request.urlopen(url)
            if response.info().get_content_type() != "text/html":
                return []
            html_content = response.read().decode("utf-8")
            soup = BeautifulSoup(html_content, "html.parser")
            links = [urljoin(url, a.get("href")) for a in soup.find_all("a", href=True)]
            return links
        except urllib.error.URLError as e:
            print(f"Error fetching {url}: {e}")
            return []

    def save_results(self):
        with open("crawler_with_found_links_new_.json", "w") as f:
            json.dump(self.crawler_links, f, indent=4)
        print("Saved crawler_with_found_links_new_.json")
        with open("crawler_only_urls_new.json", "w") as f:
            json.dump(self.modified_urls, f, indent=4)
        print("Saved crawler_only_urls_new.json")

def crawl_and_save_results(url):
    crawler = SimpleCrawler(url)
    crawler.crawl()
    crawler.save_results()
    return ("crawler_with_found_links_new_.json", "crawler_only_urls_new.json")

**Reason for using OpenAPI's documentation**


Using OpenAPI's documentation not only shows how we can query entire documentation sites available online, but also serves as a stepping stone towards building a proper OpenAPI specification, which is the main goal of this notebook.
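
To make the crawler's link filter easier to follow, here is a small standalone sketch of the same test it applies to each link: the link must be on the same host and must extend the URL of the page being crawled. The helper name `is_in_scope` and the candidate links are illustrative assumptions.

```python
from urllib.parse import urljoin, urlparse

def is_in_scope(link, page_url, base_host):
    """Mirror the crawler's filter: same host, and the link extends the current page's URL."""
    return urlparse(link).netloc == base_host and link.startswith(page_url)

page_url = "https://learn.openapis.org/"
base_host = urlparse(page_url).netloc

# Illustrative hrefs resolved the same way fetch_links() resolves them
candidates = [
    urljoin(page_url, "specification/"),        # relative link on the same site
    urljoin(page_url, "https://example.com/"),  # absolute link to another host
]
in_scope = [link for link in candidates if is_in_scope(link, page_url, base_host)]
```

Because the check uses the current page's URL rather than the root URL, links found on a sub-page are kept only if they point below that sub-page (for example, its own `#` anchors).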

In [None]:
paths1, paths2 = crawl_and_save_results("https://learn.openapis.org/")
print("Generated files")

In [None]:
import json
with open('/kaggle/working/crawler_with_found_links_new_.json', 'r') as file:
    data = json.load(file)
all_urls = []
for key, value in data.items():
    if isinstance(value, list):
        all_urls.extend(value)
print(all_urls)

In [None]:
all_urls = list(set(all_urls))
print(len(all_urls))

In [None]:
## The functions below extract the text from every page of the crawled documentation

import nest_asyncio
import asyncio
import aiohttp
nest_asyncio.apply()

async def fetch_text(session, url):
    try:
        async with session.get(url) as response:
            response_text = await response.text()
            soup = BeautifulSoup(response_text, "html.parser")
            body_text = soup.find("body").get_text()
            return body_text.strip()
    except Exception as e:
        print(f"Error fetching {url}: {e}")
        return ""

async def get_text_from_urls(url_list):
    async with aiohttp.ClientSession() as session:
        tasks = [fetch_text(session, url) for url in url_list]
        return await asyncio.gather(*tasks)

url_list = all_urls
text_list = await get_text_from_urls(url_list)
print(text_list)

In [None]:
def write_list_to_file(strings, filename="output.txt"):
    try:
        with open(filename, "w") as f:
            for string in strings:
                f.write(string + "\n")  # Add a newline after each string

    except Exception as e:
        print(f"An error occurred: {e}")


my_strings = text_list
filename = "openapi_cache_documentation.txt"

try:
    write_list_to_file(my_strings, filename)
    print(f"Strings successfully written to {filename}")

except Exception as e:  # Catch any potential errors during file operations
    print(f"An error occurred while writing to the file: {e}")

In [7]:
import os
import google.generativeai as genai
from google.generativeai import caching
import datetime
import time

In [9]:
from kaggle_secrets import UserSecretsClient
user_secrets = UserSecretsClient()
secret_value_0 = user_secrets.get_secret("GEMINI_API_KEY")


In [10]:
genai.configure(api_key=secret_value_0)


 ## Context Caching🗄️ <a class="anchor"  id="chapter1"></a>

Context caching is a feature provided by the Gemini API that allows you to store the initial input tokens sent to a model and reuse them for subsequent requests. In a typical AI workflow, you might send the same input tokens repeatedly. Context caching optimizes this process by caching these tokens, allowing you to reference them instead of resending the entire corpus for each request. This significantly reduces the cost 💰 and latency ⏱️, especially with large initial contexts. You can control the duration for which the cached tokens are stored (Time To Live or TTL), which defaults to 1 hour. The cost for caching is determined by the input token size and the duration of the cache. Context caching is currently supported for both Gemini 1.5 Pro and Gemini 1.5 Flash. It is important to note that context caching is available only for stable models with fixed versions (e.g., gemini-1.5-pro-001).

In our implementation, we leverage context caching because the text extracted from the OpenAPI documentation is very large. Resending this body of text with each individual query would be expensive and slow. 💸 By caching the entire OpenAPI documentation, we process it only once, significantly reducing the cost and latency of subsequent queries. This enables us to efficiently explore and analyze the OpenAPI documentation without incurring excessive computational costs. ✅

In [11]:
path_your_file = "/kaggle/input/context-for-openapi-generation-using-gemini/Long_Context_Openapi_Yatharth/openapi_cache_documentation.txt"
text_file = genai.upload_file(path=path_your_file)
while text_file.state.name == 'PROCESSING':
  print('Waiting for text file to be processed.')
  time.sleep(2)
  text_file = genai.get_file(text_file.name)

print(f'Text processing complete: {text_file.uri}')

Text processing complete: https://generativelanguage.googleapis.com/v1beta/files/i08wr3p7t1cw


In [12]:
cache = caching.CachedContent.create(
    model='models/gemini-1.5-flash-001',
    display_name='openapi_documentation', # used to identify the cache
    system_instruction=(
        'You are a master of documentation analysis, and your job is to answer '
        'the user\'s query based on the documentation text file you have access to.'
    ),
    contents=[text_file],
    ttl=datetime.timedelta(minutes=5),
)

In [13]:
model = genai.GenerativeModel.from_cached_content(cached_content=cache)

**We will be using the question-answering facility to extract some rules and regulations** that our OpenAPI spec generation agents will follow in the later stages of this project.

In [14]:
response = model.generate_content([(
    "You need to generate a complete set of step-by-step instructions for an agent that specializes in generating an OpenAPI specification given 2 OpenAPI specs, such that the generated spec is capable of performing the actions of both input specs. "
                  "You should also consider all the rules the agent should follow and the precautionary measures to take. "
               "You should treat the output as direct instructions to the agent and format it accordingly. Please refrain from using unnecessary headers, footers, titles, etc.")])
print(response.usage_metadata)

prompt_token_count: 405141
candidates_token_count: 599
total_token_count: 405740
cached_content_token_count: 405038



In [None]:
merge=response.text
print(merge)

In [None]:
regs= model.generate_content([(
            "Give my AI agent specializing in generating OpenAPI specifications a detailed, step-by-step guide on how it can generate an OpenAPI spec given some JSON responses as context, for a particular CRM and user preference. "
              "You should also consider all the rules the agent should follow and the precautionary measures to take. "
               "You should treat the output as direct instructions to the agent and format it accordingly. Please refrain from using unnecessary headers, footers, titles, etc.")])
print(regs.usage_metadata)
print(regs.text)

In [17]:
final_instruct=regs.text

In [None]:
print(final_instruct)

In [None]:
corr= model.generate_content([(
            "Give my AI agent specializing in correcting and checking the format of OpenAPI specifications a detailed list of precautions it should keep in mind, errors to look out for, and some common mistakes. "
              "You should also consider all of the script and look out for any mistakes. "
               "You should treat the output as direct instructions to the agent and format it accordingly. Please refrain from using unnecessary headers, footers, titles, etc.")])
print(corr.usage_metadata)
print(corr.text)

In [20]:
corrections_conduct=corr.text

*Please note that the above 3 generated answers will serve as instructions for agents in the future as we dive deeper.*

*Feel free to ask any questions of your own in the cell below*

In [None]:
qtion=input("Enter your question here")
qtions = model.generate_content([(f"You have been asked the following question: {qtion} Please answer it to the best of your ability.")])
print(qtions.text)  # print the model's answer, not the question string

**Why was context caching necessary?**

Notice how the prompt size in this scenario was extremely high (over 400 thousand tokens 😱!). Not only would it have been impossible to pass such a large prompt without Gemini's long-context capability, but resending it with every question would also have been incredibly costly. Since we are repeatedly querying the same body of information, the whole documentation was processed only once, saving both time ⏰ and money 💸.
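
The arithmetic behind this saving can be sketched with the token counts reported in the usage metadata above. The sketch assumes every question is about as long as the first one, and it deliberately ignores pricing and cache storage fees, which are not covered here; the function name is hypothetical.

```python
PROMPT_TOKENS = 405_141  # prompt_token_count reported above
CACHED_TOKENS = 405_038  # cached_content_token_count reported above

def fresh_input_tokens(num_queries, use_cache):
    """Input tokens that must be processed as new (non-cached) input across num_queries requests."""
    if use_cache:
        # The documentation is processed once when the cache is created;
        # each query then adds only its own question tokens.
        question_tokens = PROMPT_TOKENS - CACHED_TOKENS
        return CACHED_TOKENS + num_queries * question_tokens
    # Without a cache, the full prompt is resent with every query
    return num_queries * PROMPT_TOKENS

without_cache = fresh_input_tokens(3, use_cache=False)
with_cache = fresh_input_tokens(3, use_cache=True)
```

For the three instruction-generating queries in this part, that is roughly 1.2 million tokens of fresh input without a cache versus about 405 thousand with one.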

In the next section we will discuss how we will utilize the retrieved information to the max!!!💫


## Part 2: Generating OpenAPI Specifications using Gemini 🚀 <a class="anchor"  id="chapter1"></a>


Now we'll utilize the knowledge gathered so far to achieve our goal of generating OpenAPI specifications. The objective of this agentic framework is to generate any OpenAPI spec given only the user's preferences and the CRM's URL. In the example below, we'll use Copper CRM, a CRM platform specifically designed for businesses that use Google Workspace (formerly G Suite). It focuses on relationship management and sales automation within the Google ecosystem.

***Introduction*** 📖

APIs are the backbone of modern businesses, enabling seamless integration and interaction between applications. OpenAPI specifications (OpenAPI specs) are critical for documenting APIs, providing a standardized, machine-readable format that defines API endpoints, methods, request/response structures, and more. These specifications streamline development by enabling automated client and server code generation, testing, and integration. However, manually creating OpenAPI specs from extensive API documentation is a time-consuming ⏳ and error-prone process, particularly for businesses managing multiple APIs with large, complex documentation.
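
For readers who have not seen one, the sketch below shows roughly what a minimal OpenAPI 3.0 document looks like when written as a Python dictionary. The title, path, and descriptions are placeholders chosen for illustration and do not describe any real CRM's API; the helper only checks for the three top-level fields that OpenAPI 3.0 requires.

```python
import json

# Placeholder spec: one endpoint, one path parameter, one response
minimal_spec = {
    "openapi": "3.0.3",
    "info": {"title": "Example CRM API", "version": "1.0.0"},
    "paths": {
        "/leads/{id}": {
            "get": {
                "summary": "Fetch a lead by ID",
                "parameters": [
                    {"name": "id", "in": "path", "required": True, "schema": {"type": "integer"}}
                ],
                "responses": {"200": {"description": "The requested lead"}},
            }
        }
    },
}

def missing_top_level_fields(spec):
    """Return the required top-level OpenAPI 3.0 fields that are absent from spec."""
    return [field for field in ("openapi", "info", "paths") if field not in spec]

spec_json = json.dumps(minimal_spec, indent=2)
```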

***The Problem*** ⚠️

API documentation for platforms like CRMs (Customer Relationship Management tools) often spans thousands of lines of text, detailing numerous endpoints, parameters, and workflows. Manually converting this documentation into OpenAPI specs demands significant developer time and expertise. Moreover, inconsistencies in documentation formats and the sheer volume of text increase the likelihood of errors, delays, and missed opportunities for automation. These inefficiencies translate into increased costs 💰 for businesses and slow adoption of APIs in critical workflows.

***How Gemini’s Long Context Window Solves the Problem*** ✨

This project leverages Gemini 1.5’s groundbreaking long context window, capable of processing up to 2 million tokens at once. This capability is crucial for handling large API documentation in its entirety, without the need to truncate or split content into smaller chunks. By providing Gemini with the complete documentation as context, combined with user-provided specifications (e.g., "Focus only on endpoints for deals and contacts"), the model can generate precise, ready-to-use OpenAPI specs in YAML or JSON format. This eliminates the manual effort involved in parsing and converting documentation, reducing errors and accelerating API integration. ✅ This approach also facilitates customization, allowing users to specify exactly which parts of the API they want to include in the generated specification. This targeted approach enhances efficiency and reduces complexity in managing multiple APIs. 💯

In [21]:
user_pref=input("enter spec preference") ## for an example , I am using the following prompt : "I want to build a openapi spec, using copper developer api for fetching a lead by ID, creating a new lead and create people in BULK "
link=input("enter spec url") ##I am using copper CRM (https://developer.copper.com/) as reference , you can use any other CRM as well .

enter spec preference I want to build a openapi spec, using copper developer api for fetching a lead by ID, creating a new lead and create people in BULK 
enter spec url https://developer.copper.com/


In [None]:
paths1, paths2 = crawl_and_save_results(link)
print("Generated files")

In [None]:
with open("/kaggle/working/crawler_with_found_links_new_.json", 'r') as file:
    data = json.load(file)

all_urls = []
for key, value in data.items():
    if isinstance(value, list):
        all_urls.extend(value)
all_urls = list(set(all_urls))#for unique entries only
print(all_urls)

In [24]:
len(all_urls)

748


## Part 2(a): Comparing RAG (Retrieval Augmented Generation) based response with In-Context Generation <a class="anchor"  id="subsection1"></a>

As part of Gemini's long-context memory capabilities, the possibility of In-Context Generation (ICG) arises. ✨ In-Context Generation refers to a method where the language model generates a response based solely on the provided context within the prompt itself. 🧠 There’s no external retrieval or database lookup involved; the model leverages its internal understanding of the provided data to produce the output. This contrasts sharply with Retrieval Augmented Generation (RAG), which involves retrieving relevant information from an external knowledge base 📚 and incorporating it into the model’s response.

We will now analyze which of these two approaches—RAG and ICG—is more suitable for retrieving information from large text corpora. 📊 As an example, both frameworks will be tasked with retrieving relevant URLs 🌐 from a substantial collection of URLs extracted from the documentation's webpage. The relevance of these URLs will be determined based on user preferences provided at the start. ✅ This comparison aims to assess the strengths and weaknesses of each method when dealing with extensive amounts of information within Gemini's expanded context window. 🖥️
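
Once both approaches have produced a list of links, one simple way to put numbers on the comparison is to look at the overlap between the two sets. The sketch below is an assumption about how that could be done and is not part of the original experiment; the function name and the sample links are hypothetical.

```python
def compare_link_sets(rag_links, icg_links):
    """Summarize how two lists of URLs overlap."""
    rag, icg = set(rag_links), set(icg_links)
    union = rag | icg
    return {
        "shared": sorted(rag & icg),
        "only_rag": sorted(rag - icg),
        "only_icg": sorted(icg - rag),
        # Jaccard similarity: size of the intersection over size of the union
        "jaccard": len(rag & icg) / len(union) if union else 1.0,
    }

# Hypothetical results from the two approaches
rag_example = ["https://example.com/a", "https://example.com/b"]
icg_example = ["https://example.com/b", "https://example.com/c"]
report = compare_link_sets(rag_example, icg_example)
```

A low overlap does not say which approach is right; the links unique to each side still need to be checked against the user's request.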

In [None]:
!pip install llama_index
!pip install llama_index.embeddings.huggingface
!pip install llama_index.llms.gemini

In [None]:
## we will be using a simple llama-index based agentic RAG as our contender for team RAGs

from llama_index.core import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    StorageContext,
    Settings,
    load_index_from_storage
)
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core.node_parser import SentenceSplitter
from llama_index.llms.gemini import Gemini
import os
import warnings
warnings.filterwarnings('ignore')
from dotenv import load_dotenv

load_dotenv()


def main_match(user_req):

    file_path = "/kaggle/working/crawler_only_urls_new.json"

    Settings.llm  = Gemini(model="models/gemini-1.5-flash", api_key=secret_value_0)
    Settings.embed_model = HuggingFaceEmbedding(model_name="sentence-transformers/all-MiniLM-L6-v2")
    Settings.node_parser = SentenceSplitter(chunk_size=1024, chunk_overlap=200)

    reader = SimpleDirectoryReader(input_files=[file_path])
    documents = reader.load_data()
    nodes = Settings.node_parser.get_nodes_from_documents(documents, show_progress=True)

    vector_index = VectorStoreIndex(nodes)  # build the index directly from the nodes produced by the splitter above
    vector_index.storage_context.persist(persist_dir="storage_mini")
    storage_context = StorageContext.from_defaults(persist_dir="storage_mini")
    index = load_index_from_storage(storage_context)
    query_engine = index.as_query_engine()

    q6 = f"""
    You are a system that is adept at evaluating and retrieving links from a json file,
    and those links will be retrieved on the basis of the user's specifications and requests,
    i.e., you will receive a user request (for the purpose of making an openapi spec) and you will evaluate that request,
    each word of it. Then, you will use this information to retrieve possible matches from a json file containing links from specific documentation which might contain the information the user is looking for.
    The user request is as follows: {user_req}
    YOUR FINAL OUTPUT SHOULD ONLY CONTAIN A LIST OF THE MATCHES AND NOTHING ELSE, ABSOLUTELY NOTHING ELSE, AND THESE MATCHES SHOULD BE TRACEABLE AND FOUND DIRECTLY, AS IS, IN THE GIVEN JSON FILE. MAKE SURE TO ADHERE TO THESE INSTRUCTIONS.
    """
    resp6 = query_engine.query(q6)
    print(resp6)
    print(type(resp6))
    return resp6

output= main_match("I want to build a openapi spec, using copper developer api for fetching a lead by ID, creating a new lead and create people in BULK")
list_resp6 = [item.strip() for item in str(output).strip("[]").split("\n")]

print(list_resp6)
print(str(output))

In [None]:
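import ast
candidate = list_resp6[-3]  # the line of the RAG response expected to hold the list of matches
print(type(candidate))
new_list = ast.literal_eval(candidate)  # safer than eval(): only parses Python literals
print(new_list)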

In [None]:
def remove_duplicates(input_list):
    seen = set()
    result = []
    for item in input_list:
        if item not in seen:
            result.append(item)
            seen.add(item)
    return result
def get_links_for_modified_urls(modified_urls, root_url, filename="crawler_with_found_links_new_.json"):
    # Load the crawl results written by SimpleCrawler.save_results()
    file_path = f"/kaggle/working/{filename}"
    with open(file_path, "r") as f:
        crawler_data = json.load(f)

    all_found_links = []
    for url in modified_urls:
        full_url = root_url.rstrip("/") + url
        if full_url in crawler_data:
            all_found_links.extend([full_url] + crawler_data[full_url])
        else:
            print(f"Error: {full_url} not found in the data.")

    all_found_links = remove_duplicates(all_found_links)
    print("length of total links : " , len(all_found_links))

    print (all_found_links)
    return all_found_links
RAG_links=get_links_for_modified_urls(new_list,"https://developer.copper.com/")
print(RAG_links) ## the final output acquired through RAG

In [None]:
## now for in context retrieval by gemini derived links
print("content")
def join_strings(strings):
    return '\n'.join(strings)

# Example usage
strings = all_urls
corpus = join_strings(strings)
print(corpus)


In [26]:
#The above  output will be sent to gemini directly as prompt in order for it to identify all the relevant links
genai.configure(api_key=secret_value_0)

# Create the model
generation_config = {
  "temperature": 1,
  "top_p": 0.95,
  "top_k": 40,
  "max_output_tokens": 8192,
  "response_mime_type": "text/plain",
}

model = genai.GenerativeModel(
  model_name="gemini-1.5-pro",
  generation_config=generation_config,
  system_instruction="You are an excellent and unbiased text parser. You can efficiently and effectively extract smaller pieces of text from a large corpus , given the prompt for it. Your task here would be to extract links related to a particular prompt and return them. ",
)

chat_session = model.start_chat(
  history=[
    {
      "role": "user",
      "parts": [
          corpus
      ],
    },
  ]
)

response = chat_session.send_message(f"From the URLs given above, extract the URLs that might contain context related to the following user specification: {user_pref} \nPLEASE NOTE THAT YOU SHOULD RETURN AS MANY LINKS AS POSSIBLE, GIVEN THAT THEY ARE CONTEXTUALLY RELEVANT, AND MOST OF ALL DO NOT AT ANY COST RETURN ANYTHING APART FROM THE URLs THEMSELVES\n",
)
print(response.usage_metadata)
print(response.text)

prompt_token_count: 18555
candidates_token_count: 755
total_token_count: 19310

https://developer.copper.com/leads/fetch-a-lead-by-id.html
https://developer.copper.com/leads/fetch-a-lead-by-id.html#fetch-a-lead-by-id
https://developer.copper.com/leads/fetch-a-lead-by-id.html#sample-lead
https://developer.copper.com/leads/fetch-a-lead-by-id.html#example-requests
https://developer.copper.com/leads/create-a-new-lead.html
https://developer.copper.com/leads/create-a-new-lead.html#create-a-new-lead
https://developer.copper.com/leads/create-a-new-lead.html#create-new-lead
https://developer.copper.com/leads/create-a-new-lead.html#request-body
https://developer.copper.com/leads/create-a-new-lead.html#example-requests
https://developer.copper.com/people/bulk-create-people.html
https://developer.copper.com/people/bulk-create-people.html#bulk-create-people
https://developer.copper.com/people/bulk-create-people.html#example-requests
https://developer.copper.com/people/bulk-create-people.html#reques

In [27]:
ICR_corpus=response.text
# ICR stands for in context retrieval

ICR_links = ICR_corpus.splitlines()
ICR_links.pop()
print(ICR_links)


['https://developer.copper.com/leads/fetch-a-lead-by-id.html', 'https://developer.copper.com/leads/fetch-a-lead-by-id.html#fetch-a-lead-by-id', 'https://developer.copper.com/leads/fetch-a-lead-by-id.html#sample-lead', 'https://developer.copper.com/leads/fetch-a-lead-by-id.html#example-requests', 'https://developer.copper.com/leads/create-a-new-lead.html', 'https://developer.copper.com/leads/create-a-new-lead.html#create-a-new-lead', 'https://developer.copper.com/leads/create-a-new-lead.html#create-new-lead', 'https://developer.copper.com/leads/create-a-new-lead.html#request-body', 'https://developer.copper.com/leads/create-a-new-lead.html#example-requests', 'https://developer.copper.com/people/bulk-create-people.html', 'https://developer.copper.com/people/bulk-create-people.html#bulk-create-people', 'https://developer.copper.com/people/bulk-create-people.html#example-requests', 'https://developer.copper.com/people/bulk-create-people.html#request-body', 'https://developer.copper.com/peo

In-Context Generation proved to be significantly easier and more user-friendly. It required no integration with external databases or APIs, streamlining the process and saving valuable time ⏳. With Gemini's expanded context window, ICG could maintain a high fidelity to the provided data, ensuring that minimal information was lost. 🛡️ Furthermore, the risk of hallucinations—where a model generates irrelevant or incorrect outputs—was greatly reduced, making ICG a more reliable option for this use case.
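
Many of the returned links differ only in their `#fragment`, so fetching each one would download the same page several times. A possible clean-up step, shown here as a sketch rather than as part of the original pipeline, is to strip fragments and keep each page once; the helper name `unique_pages` is an assumption.

```python
from urllib.parse import urldefrag

def unique_pages(links):
    """Drop #fragments and keep the first occurrence of each page URL, in order."""
    seen = set()
    pages = []
    for link in links:
        page, _fragment = urldefrag(link)
        if page not in seen:
            seen.add(page)
            pages.append(page)
    return pages

# A few of the links returned above
sample_links = [
    "https://developer.copper.com/leads/fetch-a-lead-by-id.html",
    "https://developer.copper.com/leads/fetch-a-lead-by-id.html#sample-lead",
    "https://developer.copper.com/leads/create-a-new-lead.html#request-body",
]
pages = unique_pages(sample_links)
```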

In [None]:
### Now that we have received all the links from which we need to extract text, we will proceed with the ICR_links

In [None]:
## Extract text from all the relevant web pages; this will serve as the corpus from which we derive our context
import nest_asyncio
import asyncio
import aiohttp
from bs4 import BeautifulSoup
nest_asyncio.apply()

async def fetch_text(session, url):
    try:
        async with session.get(url) as response:
            response_text = await response.text()
            soup = BeautifulSoup(response_text, "html.parser")
            body_text = soup.find("body").get_text()
            return body_text.strip()
    except Exception as e:
        print(f"Error fetching {url}: {e}")
        return ""

async def get_text_from_urls(url_list):
    async with aiohttp.ClientSession() as session:
        tasks = [fetch_text(session, url) for url in url_list]
        return await asyncio.gather(*tasks)

url_list = ICR_links
text_list = await get_text_from_urls(url_list)
print(text_list)

In [30]:

### The agent below extracts the JSON (HTTP 200) responses from the text corpus; these will later be used as context for building the spec

genai.configure(api_key=secret_value_0)
def extract_https200_codes(text_input):
    generation_config = {
    "temperature": 1,
    "top_p": 0.95,
    "top_k": 40,
    "max_output_tokens": 8192,
    "response_mime_type": "text/plain",
    }

    model = genai.GenerativeModel(
    model_name="gemini-1.5-flash",
    generation_config=generation_config,
    system_instruction=
                        """
                    "You are a content extraction assistant tasked with identifying and extracting JSON response codes from large blocks of text. "
                    "Given a block of text, your role is to:\n\n"
                    "- Extract only the JSON response codes mentioned in the text.\n\n"
                    "Your output should consist solely of the JSON response codes found, with no additional text, explanations, or formatting and not any headings / headers like 'here's what i found'etc.\n\n"
                    "If no JSON response codes are found in the text, simply return 'NOT_FOUND' and nothing else."
                    "for some context , the desired output may look something like : "

                                        {
                      "people": [
                        {
                          "name": "My Contact",
                          "emails": [
                            {
                            "email": "mycontact_1233@noemail.com",
                            "category": "work"
                            }
                          ],
                          "address": {
                            "street": "123 Main Street",
                            "city": "Savannah",
                            "state": "Georgia",
                            "postal_code": "31410",
                            "country": "United States"
                          },
                          "phone_numbers": [
                            {
                            "number": "415-123-45678",
                            "category": "mobile"
                            }
                          ]
                        }
                      ]
                    }

                    "NOTICE ONLY THE FORMAT AND NOT THE CONTENT OF TEXT "
                    "MAKE SURE ALL AND EVERY BIT OF TEXT THAT IS REMOTELY RELATED TO JSON FORMAT IS RETRIEVED FROM THE ENTIRETY OF THE TEXT AND NOTHING IS LEFT OUT "
                    """,
    )

    chat_session = model.start_chat(
    history=[

    ]
    )
    a = f"Analyze the following text and return only the HTTP 200 codes and the JSON response codes:\n\n{text_input}"

    response = chat_session.send_message(a)
    return response.text



In [35]:
import time  # used by the time.sleep() pauses below; harmless if already imported earlier

def filter_list(input_list):
    keywords = ["YOUR_TOKEN_HERE", "NOT_FOUND"]
    return [item for item in input_list if not any(keyword in item for keyword in keywords)]
def remove_duplicates(input_list):
    seen = set()
    result = []
    for item in input_list:
        if item not in seen:
            result.append(item)
            seen.add(item)
    return result
def jresp_list(text_list):
    http_codes=[]
    for i in text_list:
        http_codes.append(extract_https200_codes(i))
        time.sleep(5)
    http_codes= filter_list(http_codes)
    http_codes= remove_duplicates(http_codes)
    return http_codes

jresp=jresp_list(text_list)

In [36]:
print(type(jresp))

<class 'list'>


In [37]:


### A similar agent to the one above, tasked with extracting curl commands from the text corpus. These are later converted to JSON responses in order to bolster our context and make our agents more robust.
def extract_curl(text_input):
    generation_config = {
    "temperature": 1,
    "top_p": 0.95,
    "top_k": 40,
    "max_output_tokens": 8192,
    "response_mime_type": "text/plain",
    }

    model = genai.GenerativeModel(
    model_name="gemini-1.5-flash",
    generation_config=generation_config,
    system_instruction=
                        """
                    "You are a content extraction assistant tasked with identifying and extracting Curl commands from large blocks of text. "
                    "Given a block of text, your role is to:\n\n"
                    "- Extract only the Curl commands present in the text.\n\n"

                A good example of a Curl Command would be :
                curl --location --request POST "https://api.copper.com/developer_api/v1/leads/{{example_leadconvert_id}}/convert" \
                --header "X-PW-AccessToken: YOUR_TOKEN_HERE" \
                --header "X-PW-Application: developer_api" \
                --header "X-PW-UserEmail: YOUR_EMAIL_HERE" \
                --header "Content-Type: application/json" \
                --data "{
                \"details\":{
                    \"person\":{
                    \"name\":\"John Doe\"
                    },
                    \"opportunity\":{
                    \"name\":\"Demo Project\",
                    \"pipeline_id\":213214,
                    \"pipeline_stage_id\":12345,
                    \"monetary_value\":1000
                    }
                }
                }"


                    "Your output should consist solely of the Curl commands found, with no additional text, explanations, or formatting and not any headings / headers like 'here's what i found'etc\n\n"
                    "If no Curl commands are found in the text, simply return 'NOT_FOUND' and nothing else."
                    "MAKE SURE ALL AND EVERY BIT OF TEXT THAT IS REMOTELY RELATED TO CURL QUERY IS RETRIEVED FROM THE ENTIRETY OF THE TEXT AND NOTHING IS LEFT OUT "
                    """,
    )

    chat_session = model.start_chat(
    history=[

    ]
    )
    a=f"Analyze the following text and return only the Curl commands:\n\n{text_input}"

    response = chat_session.send_message(a)
    return response.text
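
# Note: this overrides the earlier filter_list. Only "NOT_FOUND" is filtered here,
# because curl commands (like the example above) contain "YOUR_TOKEN_HERE".
def filter_list(input_list):
    keywords = ["NOT_FOUND"]
    return [item for item in input_list if not any(keyword in item for keyword in keywords)]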
def remove_duplicates(input_list):
    seen = set()
    result = []
    for item in input_list:
        if item not in seen:
            result.append(item)
            seen.add(item)
    return result
def curlresp_list(text_list):
    curl_codes=[]
    for i in text_list:
        curl_codes.append(extract_curl(i))
        time.sleep(5)
    curl_codes= filter_list(curl_codes)
    curl_codes= remove_duplicates(curl_codes)
    return curl_codes

curl=curlresp_list(text_list)


In [38]:



# Simple agent responsible for the conversion of CURL queries to JSON Response
def convert_curl_to_http_response(curl_command):
    generation_config = {
    "temperature": 1,
    "top_p": 0.95,
    "top_k": 40,
    "max_output_tokens": 8192,
    "response_mime_type": "text/plain",
    }

    model = genai.GenerativeModel(
    model_name="gemini-1.5-flash",
    generation_config=generation_config,
    system_instruction=

"""You are a JSON response generation agent designed to interpret and respond to curl API requests. Your task is to analyze each curl query, understand its purpose, and craft an appropriate JSON response that the API might return. Your responses should be accurate representations of what the API would likely send back, including default values, typical metadata, and inferred details from the request body. Use placeholders (e.g., {example_id}) where specific IDs or timestamps might be variable.

                              When creating JSON responses, ensure they contain all relevant fields in a structured format. For example, if a lead conversion request includes details about a person and an opportunity, your response should reflect those entities fully, with fields like id, name, status, and relevant timestamps. Always infer and include logical values for fields not specified in the request."""
                              """
                              Example conversion :
                                 Given Curl Query :
                                 curl --location --request POST "https://api.copper.com/developer_api/v1/leads/{{example_leadconvert_id}}/convert" \
                                    --header "X-PW-AccessToken: YOUR_TOKEN_HERE" \
                                    --header "X-PW-Application: developer_api" \
                                    --header "X-PW-UserEmail: YOUR_EMAIL_HERE" \
                                    --header "Content-Type: application/json" \
                                    --data "{
                                    \"details\":{
                                        \"person\":{
                                        \"name\":\"John Doe\"
                                        },
                                        \"opportunity\":{
                                        \"name\":\"Demo Project\",
                                        \"pipeline_id\":213214,
                                        \"pipeline_stage_id\":12345,
                                        \"monetary_value\":1000
                                        }
                                    }
                                    }"
                             Generated JSON response:
                             {
                                "id": "{{example_leadconvert_id}}",
                                "status": "Converted",
                                "converted_to": {
                                    "person": {
                                    "id": 987654,
                                    "name": "John Doe",
                                    "first_name": "John",
                                    "last_name": "Doe",
                                    "email": null,
                                    "phone_numbers": [],
                                    "address": null,
                                    "socials": [],
                                    "tags": [],
                                    "custom_fields": [],
                                    "date_created": 1672158444,
                                    "date_modified": 1672158444
                                    },
                                    "opportunity": {
                                    "id": 123456,
                                    "name": "Demo Project",
                                    "pipeline_id": 213214,
                                    "pipeline_stage_id": 12345,
                                    "monetary_value": 1000,
                                    "status": "Open",
                                    "date_created": 1672158444,
                                    "date_modified": 1672158444,
                                    "close_date": null,
                                    "tags": [],
                                    "custom_fields": []
                                    }
                                },
                                "original_lead": {
                                    "id": "{{example_leadconvert_id}}",
                                    "name": "Original Lead Name",
                                    "status": "Converted",
                                    "customer_source_id": 331242,
                                    "date_created": 1672157444,
                                    "date_modified": 1672158444,
                                    "date_last_contacted": null
                                }
                              }


                        "MAKE ABSOLUTELY SURE THAT YOUR OUTPUT IS SYNTACTICALLY CORRECT ,ACCURATE AND COMPLETE.ANY AND ALL ERRORS ARE TO BE AVOIDED ENTIRELY"
                        "Your output should consist solely of the converted commands found, with no additional text, explanations, or formatting and not any headings / headers like 'here's what i found'etc\n\n"
                    """,
    )

    chat_session = model.start_chat(
    history=[

    ]
    )
    a=f"Convert the following Curl command to a JSON response format:\n\n{curl_command}"

    response = chat_session.send_message(a)
    return response.text

def convert(curl_codes):
    curl2http=[]
    for i in curl_codes:
        a=convert_curl_to_http_response(i)
        curl2http.append(a)
        time.sleep(5)
    return curl2http

converted=convert(curl)

In [39]:
## COMBINED TEXT CORPUS: the extracted JSON responses plus the converted curl responses, which serve as the main input to our baseline OpenAPI generator
jresp.extend(converted)
combined_corpus="\n".join(jresp)
print(combined_corpus)

{
  "id": 8894157,
  "name": "Test Lead",
  "prefix": null,
  "first_name": "Test",
  "last_name": "Lead",
  "middle_name": null,
  "suffix": null,
  "address": {
    "street": "301 Howard St Ste 600",
    "city": "San Francisco",
    "state": "CA",
    "postal_code": "94105",
    "country": "US"
  },
  "assignee_id": 137658,
  "company_name": "Lead's Company",
  "customer_source_id": 331241,
  "details": "This is a demo description",
  "email": {
    "email": "address@workemail.com",
    "category": "work"
  },
  "monetary_value": 100,
  "socials": [
    {
      "url": "facebook.com/test_lead",
      "category": "facebook"
    }
  ],
  "status": "New",
  "status_id": 208231,
  "tags": [
    "tag 1",
    "tag 2"
  ],
  "title": "Title",
  "websites": [
    {
      "url": "www.workwebsite.com",
      "category": "work"
    }
  ],
  "phone_numbers": [
    {
      "number": "415-999-4321",
      "category": "mobile"
    },
    {
      "number": "415-555-1234",
      "category": "work"
   

## Part 2 {b}: Many-Shot Prompting   <a class="anchor"  id="subsection1"></a>


Many-shot prompting, a concept derived from Google Research, involves providing numerous example pairs to guide the language model in generating accurate outputs. 🧠 This approach is made possible by Gemini’s long context window, which allows the model to accommodate a vast number of examples without losing focus or coherence. ✨

In general, many-shot prompts can include hundreds or even thousands of examples. However, I explored a different approach here. 🚀

The dataset used for this experiment is sourced from the APIs Guru OpenAPI directory. It consists of 10 pairs of JSON inputs and their corresponding OpenAPI specifications, which are inherently large in size. 📂 These pairs serve as the example "shots" for the model.

To preserve the clarity and integrity of the system prompt, these examples will not be passed as part of the system prompt itself. Instead, the main instruction extracted from our OpenAPI documentation’s answers will remain prominent and standalone, acting as the primary directive. 🛡️ The example pairs will instead be passed as a history of responses, enabling the system to reference them without cluttering the primary instruction.

This strategy ensures that:

-->The **system prompt** remains focused, allowing the main instruction to stand out clearly. 🌟

-->The **benefits of many-shot prompting** are fully utilized, guiding the model with high-quality, contextual examples. 📊

-->Gemini’s long context window efficiently handles large datasets while maintaining **accuracy and coherence.** ✅

With this approach, the example pairs act as a powerful foundation, enabling the model to generate precise outputs while keeping the overall system design streamlined and effective. 🖥️✨
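
The idea of passing the example pairs as chat history rather than in the system prompt can be sketched as follows. This is a minimal illustration, not the exact code used in this notebook: the helper name `build_many_shot_history` and the toy strings are assumptions made for the example, while the `role`/`parts` dictionary layout mirrors the history format used in the cells that follow.

```python
from typing import Dict, List


def build_many_shot_history(inputs: List[str], outputs: List[str]) -> List[Dict]:
    """Interleave example inputs and outputs as alternating user/model turns."""
    if len(inputs) != len(outputs):
        raise ValueError("each example input needs exactly one example output")
    history = []
    for example_input, example_output in zip(inputs, outputs):
        # The "user" turn carries the example input (here: a JSON payload) ...
        history.append({"role": "user", "parts": [example_input]})
        # ... and the "model" turn carries the answer we want the model to imitate.
        history.append({"role": "model", "parts": [example_output]})
    return history


# Toy stand-ins for the real JSON / OpenAPI example files (hypothetical content).
demo_history = build_many_shot_history(
    ['{"id": 1}', '{"name": "x"}'],
    ["openapi: 3.0.0 # spec A", "openapi: 3.0.0 # spec B"],
)
print(len(demo_history))  # two turns per example pair
```

Because the examples live in the history, the system prompt can stay short, and the number of shots can be changed without touching the instruction text.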



In [40]:
import os
def extract_text_from_files(directory):
    text_list = []
    try:
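        # sorted() gives a deterministic order, so the i-th JSON file lines up with the i-th
        # OpenAPI file (assumes the paired files are named consistently in both directories)
        for file_name in sorted(os.listdir(directory)):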
            if file_name.endswith('.txt'):
                file_path = os.path.join(directory, file_name)
                with open(file_path, 'r', encoding='utf-8') as file:
                    content = file.read()
                    text_list.append(content)

        print(f"Successfully extracted text from {len(text_list)} files.")
        return text_list

    except Exception as e:
        print(f"An error occurred: {e}")
        return []


In [41]:
path = "/kaggle/input/context-for-openapi-generation-using-gemini/Long_Context_Openapi_Yatharth/long_ctxt_10/openapi_txt"
openapi_context = extract_text_from_files(path)
print(openapi_context)

Successfully extracted text from 10 files.


In [42]:
path = "/kaggle/input/context-for-openapi-generation-using-gemini/Long_Context_Openapi_Yatharth/long_ctxt_10/json_txt"
json_context = extract_text_from_files(path)
print(json_context)

Successfully extracted text from 10 files.


In [43]:
history_new = []

# Alternate user/model turns: each JSON input is followed by its OpenAPI spec.
for json_example, openapi_example in zip(json_context, openapi_context):
    history_new.append({"role": "user", "parts": [json_example]})
    history_new.append({"role": "model", "parts": [openapi_example]})

In [44]:
print(history_new)  # the new history is established, essentially making the agent think it is already adept at its assigned task



In [45]:
## The agent below builds our primary response, i.e. the first iteration of the OpenAPI spec that we will be generating.
def build_baseline_openapi_spec(combined_corpus, user_specifications):
    generation_config = {
    "temperature": 1,
    "top_p": 0.95,
    "top_k": 40,
    "max_output_tokens": 8192,
    "response_mime_type": "text/plain",
    }
    model = genai.GenerativeModel(
    model_name="gemini-1.5-flash",
    generation_config=generation_config,
    system_instruction=
                      f"""You are an assistant tasked with creating a baseline OpenAPI specification.
                              You will be provided with multiple JSON response objects (OpenAPI specifications) as context,
                              and user specifications outlining requirements for the API.
                              Your role is to:
                              - Analyze all provided scripts to understand the existing API functionalities.
                              - Identify common elements and best practices from the scripts.
                              - Incorporate the user specifications into the baseline OpenAPI spec in YAML format.
                              - Produce a baseline OpenAPI specification that includes the common elements.
                              - Make sure your output is syntactically correct and contextually viable, and should adhere to all the customer's demands from the provided scripts and meets the user's specifications.
                              "Follow the given instructions below to the letter in order to generate the best possible scenario":
                              {final_instruct}
                              Your output should be the baseline OpenAPI specification and nothing else, with no additional text, explanations, or formatting and not any headings / headers like 'here's what i found' etc."""
    )

    chat_session = model.start_chat(
    history=history_new
    )
    a=f"Here are the Json response scripts to use as context:\n\n{combined_corpus}\n\nUser specifications:\n\n{user_specifications}"

    response = chat_session.send_message(a)
    return response.text ,response.usage_metadata

zeroshot,tokens=build_baseline_openapi_spec(combined_corpus,user_pref)

In [48]:
print(tokens)

prompt_token_count: 400744
candidates_token_count: 2023
total_token_count: 402767



**We can see how "many-shot prompting" lives up to its name: even with a prompt of more than 400 thousand tokens, the model still manages to generate a coherent and accurate OpenAPI spec!**
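
To put that token count in perspective, the arithmetic below estimates what share of a context window the prompt occupies. It is only a rough sketch: the helper `context_usage` is hypothetical, and the 1,000,000-token window is an assumed figure that should be replaced with the documented limit of whichever model is actually used.

```python
def context_usage(prompt_tokens: int, context_window: int) -> float:
    """Return the fraction of the context window consumed by the prompt."""
    if context_window <= 0:
        raise ValueError("context_window must be positive")
    return prompt_tokens / context_window


# 400744 is the prompt_token_count printed above; the window size is an assumption.
ASSUMED_CONTEXT_WINDOW = 1_000_000
usage = context_usage(400_744, ASSUMED_CONTEXT_WINDOW)
print(f"{usage:.1%} of the assumed context window is used by the prompt")
```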

In [None]:
print(zeroshot)

In [None]:
## The agent below reduces hallucinations and missteps as much as possible. It iterates through every element of the context, refining the spec and confirming that the OpenAPI spec being generated really does contain everything the user asked for.

def merge_http200_scripts(script1, script2):
    generation_config = {
    "temperature": 1,
    "top_p": 0.95,
    "top_k": 40,
    "max_output_tokens": 8192,
    "response_mime_type": "text/plain",
    }
    ###########################
    #### add merging instructions here later to improve prompt quality
    ###########################

    model = genai.GenerativeModel(
    model_name="gemini-1.5-flash",
    generation_config=generation_config,
    system_instruction=
                      f"""
                      "You are a content merging assistant tasked with combining two HTTP 200 scripts into a single, more elaborate OpenAPI specification. "
                    "Given two scripts, your role is to:\n\n"
                    "- Understand all aspects of both scripts.\n"
                    "- Identify functionalities in script2 that are not explained or covered in script1.\n"
                    "- Feel free to ignore script2 if it does not accord with the user-defined preference for their OpenAPI specification, which is: {user_pref}. Do this very judiciously and with care, because we do not want any loss of context.\n"
                    "- Add those functionalities to script1, extending and elongating it appropriately.\n\n"
                    "The output should be a single, merged OpenAPI specification (HTTP 200 script) that combines the features of both scripts, "
                    "with all unique elements from script2 added to script1 where appropriate.\n\n"
                    "You need to follow the instructions below to the letter and not skip anything:"
                    {merge}
                    "Your output should be the merged OpenAPI specification and nothing else, with no additional text, explanations, or formatting, and no headings or headers like 'here's what I found', etc.\n\n"""
    )

    chat_session = model.start_chat(
    history=[

    ]
    )
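    # Build the user message; the stray quote characters and nested f-prefixes
    # from the earlier draft are removed so they do not leak into the prompt.
    a = (
        "Merge the following two HTTP 200 scripts into a single, more elaborate OpenAPI specification:\n\n"
        f"Script 1:\n{script1}\n\n"
        f"Script 2:\n{script2}"
    )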

    response = chat_session.send_message(a)
    return response.text , response.usage_metadata
overall=zeroshot
for i in jresp:
    temp,tokenz=merge_http200_scripts(overall,i)
    overall=temp
    print(tokenz)
    time.sleep(5)
print ("done")
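
The refinement loop above is, structurally, a left fold: the running spec is merged with one context snippet at a time. The sketch below shows that pattern in isolation. It assumes a hypothetical `merge_fn` callable standing in for the model-backed merge step, and `toy_merge` is a deliberately trivial placeholder, not the behaviour of the real agent.

```python
from functools import reduce
from typing import Callable, Iterable


def refine_iteratively(
    baseline: str,
    snippets: Iterable[str],
    merge_fn: Callable[[str, str], str],
) -> str:
    """Fold every snippet into the running spec, one merge call per snippet."""
    return reduce(merge_fn, snippets, baseline)


def toy_merge(spec: str, snippet: str) -> str:
    # Placeholder merge: append the snippet only if the spec does not contain it yet.
    return spec if snippet in spec else spec + "\n" + snippet


result = refine_iteratively("paths:", ["/leads", "/people", "/leads"], toy_merge)
print(result)
```

One consequence of this design is that the number of model calls grows linearly with the number of context snippets, and each call receives the full spec accumulated so far.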

In [49]:
## This is the final agent, responsible for making sure that the output generated by the previous agent is syntactically correct.
## While making this project, I repeatedly found that the final output seemed to contain everything but was not syntactically correct. One such response is used here as an example, which helps ensure that the final output of the whole project is delivered without errors.
def corrections(spec):
    generation_config = {
    "temperature": 1,
    "top_p": 0.95,
    "top_k": 40,
    "max_output_tokens": 8192,
    "response_mime_type": "text/plain",
    }
    # Note: unlike the baseline generation step, no many-shot examples are passed here.
    # This keeps the original spec being corrected from getting lost among all the examples.
    model = genai.GenerativeModel(
    model_name="gemini-1.5-pro",
    generation_config=generation_config,
    system_instruction=
                        f"""
                        Objective:\n"
                        "Your task is to review, correct, and refine OpenAPI specifications to ensure they are in a format "
                        "that is directly compatible with Postman and other API testing tools. Your goal is to produce a clean, "
                        "structured, and fully functional OpenAPI specification in YAML format, retaining all original content without any loss of information.\n\n"

                        "Follow the given instructions below to the letter while making a decision as to how to proceed :"
                        {corr}

                        "Final Deliverable:\n"
                        "The output should be a refined, fully compatible OpenAPI specification in YAML format that:\n"
                        "- Is immediately usable in Postman or similar tools without errors.\n"
                        "- Is clean, well-structured, and easy to read.\n"
                        "- Preserves all content from the original specification, without any loss or alteration of intent."

                        Here is some context showing what a well-formed OpenAPI spec should look like; please learn from it:
                        openapi: "3.0.0"
                        info:
                          version: 1.0.0
                          title: Swagger Petstore
                          license:
                            name: MIT
                        servers:
                          - url: http://petstore.swagger.io/v1
                        paths:
                          /pets:
                            get:
                              summary: List all pets
                              operationId: listPets
                              tags:
                                - pets
                              parameters:
                                - name: limit
                                  in: query
                                  description: How many items to return at one time (max 100)
                                  required: false
                                  schema:
                                    type: integer
                                    format: int32
                              responses:
                                200:
                                  description: A paged array of pets
                                  headers:
                                    x-next:
                                      description: A link to the next page of responses
                                      schema:
                                        type: string
                                  content:
                                    application/json:
                                      schema:
                                        $ref: "#/components/schemas/Pets"
                                default:
                                  description: unexpected error
                                  content:
                                    application/json:
                                      schema:
                                        $ref: "#/components/schemas/Error"
                            post:
                              summary: Create a pet
                              operationId: createPets
                              tags:
                                - pets
                              responses:
                                201:
                                  description: Null response
                                default:
                                  description: unexpected error
                                  content:
                                    application/json:
                                      schema:
                                        $ref: "#/components/schemas/Error"
                          /pets/{{petId}}:
                            get:
                              summary: Info for a specific pet
                              operationId: showPetById
                              tags:
                                - pets
                              parameters:
                                - name: petId
                                  in: path
                                  required: true
                                  description: The id of the pet to retrieve
                                  schema:
                                    type: string
                              responses:
                                200:
                                  description: Expected response to a valid request
                                  content:
                                    application/json:
                                      schema:
                                        $ref: "#/components/schemas/Pets"
                                default:
                                  description: unexpected error
                                  content:
                                    application/json:
                                      schema:
                                        $ref: "#/components/schemas/Error"
                        components:
                          schemas:
                            Pet:
                              required:
                                - id
                                - name
                              properties:
                                id:
                                  type: integer
                                  format: int64
                                name:
                                  type: string
                                tag:
                                  type: string
                            Pets:
                              type: array
                              items:
                                $ref: "#/components/schemas/Pet"
                            Error:
                              required:
                                - code
                                - message
                              properties:
                                code:
                                  type: integer
                                  format: int32
                                message:
                                  type: string
                    """,
    )

    chat_session = model.start_chat(
    history=[
        {
        "role": "user",
        "parts": [
            """Correct the following OpenAPI specification and make sure to return only the corrected openapi spec and nothing else ,no headings or comments like ,  'heres what i found ':
                    The spec:
                    openapi: 3.0.3
                    info:
                    title: Pharmaceutical Shop API
                    description: API for managing products, orders, and customers in a pharmaceutical shop.
                    version: 1.0.0
                    servers:
                    - url: https://api.pharmashop.com/v1
                        description: Production server
                    - url: https://sandbox.api.pharmashop.com/v1
                        description: Sandbox server for testing
                    paths:
                    /products:
                        get:
                        summary: Retrieve a list of products
                        description: Fetches a list of all available pharmaceutical products.
                        responses:
                            '200':
                            description: A list of products.
                            content:
                                application/json:
                                schema:
                                    type: array
                                    items:
                                    $ref: '#/components/schemas/Product'
                        post:
                        summary: Add a new product
                        description: Adds a new pharmaceutical product to the inventory.
                        requestBody:
                            description: Product to add
                            required: true
                            content:
                            application/json:
                                schema:
                                $ref: '#/components/schemas/Product'
                        responses:
                            '201':
                            description: Product created successfully.
                            content:
                                application/json:
                                schema:
                                    $ref: '#/components/schemas/Product'
                    /products/{productId}:
                        get:
                        summary: Retrieve a product by ID
                        description: Fetches details of a specific product by its ID.
                        parameters:
                            - name: productId
                            in: path
                            required: true
                            description: ID of the product to retrieve
                            schema:
                                type: string
                        responses:
                            '200':
                            description: Product details retrieved successfully.
                            content:
                                application/json:
                                schema:
                                    $ref: '#/components/schemas/Product'
                            '404':
                            description: Product not found.
                            content:
                                application/json:
                                schema:
                                    $ref: '#/components/schemas/Error'
                        put:
                        summary: Update a product by ID
                        description: Updates the details of a specific product by its ID.
                        parameters:
                            - name: productId
                            in: path
                            required: true
                            description: ID of the product to update
                            schema:
                                type: string
                        requestBody:
                            description: Updated product information
                            required: true
                            content:
                            application/json:
                                schema:
                                $ref: '#/components/schemas/Product'
                        responses:
                            '200':
                            description: Product updated successfully.
                            content:
                                application/json:
                                schema:
                                    $ref: '#/components/schemas/Product'
                            '404':
                            description: Product not found.
                            content:
                                application/json:
                                schema:
                                    $ref: '#/components/schemas/Error'
                        delete:
                        summary: Delete a product by ID
                        description: Removes a specific product from the inventory by its ID.
                        parameters:
                            - name: productId
                            in: path
                            required: true
                            description: ID of the product to delete
                            schema:
                                type: string
                        responses:
                            '204':
                            description: Product deleted successfully.
                            '404':
                            description: Product not found.
                            content:
                                application/json:
                                schema:
                                    $ref: '#/components/schemas/Error'
                    /orders:
                        get:
                        summary: Retrieve a list of orders
                        description: Fetches a list of all customer orders.
                        responses:
                            '200':
                            description: A list of orders.
                            content:
                                application/json:
                                schema:
                                    type: array
                                    items:
                                    $ref: '#/components/schemas/Order'
                        post:
                        summary: Create a new order
                        description: Places a new order for products.
                        requestBody:
                            description: Order to create
                            required: true
                            content:
                            application/json:
                                schema:
                                $ref: '#/components/schemas/Order'
                        responses:
                            '201':
                            description: Order created successfully.
                            content:
                                application/json:
                                schema:
                                    $ref: '#/components/schemas/Order'
                    /orders/{orderId}:
                        get:
                        summary: Retrieve an order by ID
                        description: Fetches details of a specific order by its ID.
                        parameters:
                            - name: orderId
                            in: path
                            required: true
                            description: ID of the order to retrieve
                            schema:
                                type: string
                        responses:
                            '200':
                            description: Order details retrieved successfully.
                            content:
                                application/json:
                                schema:
                                    $ref: '#/components/schemas/Order'
                            '404':
                            description: Order not found.
                            content:
                                application/json:
                                schema:
                                    $ref: '#/components/schemas/Error'
                        put:
                        summary: Update an order by ID
                        description: Updates the details of a specific order by its ID.
                        parameters:
                            - name: orderId
                            in: path
                            required: true
                            description: ID of the order to update
                            schema:
                                type: string
                        requestBody:
                            description: Updated order information
                            required: true
                            content:
                            application/json:
                                schema:
                                $ref: '#/components/schemas/Order'
                        responses:
                            '200':
                            description: Order updated successfully.
                            content:
                                application/json:
                                schema:
                                    $ref: '#/components/schemas/Order'
                            '404':
                            description: Order not found.
                            content:
                                application/json:
                                schema:
                                    $ref: '#/components/schemas/Error'
                        delete:
                        summary: Delete an order by ID
                        description: Cancels a specific order by its ID.
                        parameters:
                            - name: orderId
                            in: path
                            required: true
                            description: ID of the order to delete
                            schema:
                                type: string
                        responses:
                            '204':
                            description: Order deleted successfully.
                            '404':
                            description: Order not found.
                            content:
                                application/json:
                                schema:
                                    $ref: '#/components/schemas/Error'
                    /customers:
                        get:
                        summary: Retrieve a list of customers
                        description: Fetches a list of all registered customers.
                        responses:
                            '200':
                            description: A list of customers.
                            content:
                                application/json:
                                schema:
                                    type: array
                                    items:
                                    $ref: '#/components/schemas/Customer'
                        post:
                        summary: Register a new customer
                        description: Adds a new customer to the system.
                        requestBody:
                            description: Customer to register
                            required: true
                            content:
                            application/json:
                                schema:
                                $ref: '#/components/schemas/Customer'
                        responses:
                            '201':
                            description: Customer registered successfully.
                            content:
                                application/json:
                                schema:
                                    $ref: '#/components/schemas/Customer'
                    /customers/{customerId}:
                        get:
                        summary: Retrieve a customer by ID
                        description: Fetches details of a specific customer by their ID.
                        parameters:
                            - name: customerId
                            in: path
                            required: true
                            description:


                    """,
        ],
        },
        {
        "role": "model",
        "parts": [
            """
                                openapi: "3.0.0"
                                info:
                                  title: Pharmaceutical Shop API
                                  version: 1.0.0
                                  description: API for managing products, orders, and customers in a pharmaceutical shop.
                                servers:
                                  - url: https://api.pharmashop.com/v1
                                    description: Production server
                                  - url: https://sandbox.api.pharmashop.com/v1
                                    description: Sandbox server for testing
                                paths:
                                  /products:
                                    get:
                                      summary: Retrieve a list of products
                                      operationId: listProducts
                                      tags:
                                        - Products
                                      responses:
                                        '200':
                                          description: A list of products.
                                          content:
                                            application/json:
                                              schema:
                                                type: array
                                                items:
                                                  $ref: "#/components/schemas/Product"
                                        default:
                                          description: unexpected error
                                          content:
                                            application/json:
                                              schema:
                                                $ref: "#/components/schemas/Error"
                                    post:
                                      summary: Add a new product
                                      operationId: addProduct
                                      tags:
                                        - Products
                                      requestBody:
                                        description: Product to add
                                        required: true
                                        content:
                                          application/json:
                                            schema:
                                              $ref: "#/components/schemas/Product"
                                      responses:
                                        '201':
                                          description: Product created successfully.
                                          content:
                                            application/json:
                                              schema:
                                                $ref: "#/components/schemas/Product"
                                        default:
                                          description: unexpected error
                                          content:
                                            application/json:
                                              schema:
                                                $ref: "#/components/schemas/Error"
                                  /products/{productId}:
                                    get:
                                      summary: Retrieve a product by ID
                                      operationId: getProductById
                                      tags:
                                        - Products
                                      parameters:
                                        - name: productId
                                          in: path
                                          required: true
                                          description: ID of the product to retrieve
                                          schema:
                                            type: string
                                      responses:
                                        '200':
                                          description: Product details retrieved successfully.
                                          content:
                                            application/json:
                                              schema:
                                                $ref: "#/components/schemas/Product"
                                        default:
                                          description: unexpected error
                                          content:
                                            application/json:
                                              schema:
                                                $ref: "#/components/schemas/Error"
                                components:
                                  schemas:
                                    Product:
                                      type: object
                                      required:
                                        - id
                                        - name
                                        - price
                                      properties:
                                        id:
                                          type: string
                                        name:
                                          type: string
                                        price:
                                          type: number
                                          format: float
                                        description:
                                          type: string
                                        category:
                                          type: string
                                    Error:
                                      type: object
                                      required:
                                        - code
                                        - message
                                      properties:
                                        code:
                                          type: integer
                                          format: int32
                                        message:
                                          type: string
                                """,
        ],
        },
    ]
    )
    # Ask for the corrected spec only, in line with the user's stated preference.
    prompt = (
        "Correct the following OpenAPI specification. MAKE SURE YOUR RESPONSE "
        "CONTAINS NOTHING APART FROM THE OPENAPI SPECIFICATION: NO HEADINGS, "
        "COMMENTS, FOOTNOTES, OR NOTES ON WHAT YOU CHANGED, SIMPLY THE YAML OF "
        f"THE OPENAPI SPEC:\n{spec}\n"
        "Please make sure that the spec you generate is in accordance with the "
        f"user-defined preference, which is: {user_pref}"
    )

    response = chat_session.send_message(prompt)
    return response.text, response.usage_metadata

final_output, tokens = corrections(overall)
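
Even though the prompt asks for nothing but the YAML, the model can still wrap its reply in a Markdown code fence, as the printed output further down does. The sketch below shows one way to strip such a fence before saving or parsing the spec. The helper name `strip_code_fence` and the sample reply are illustrative assumptions, not part of the original notebook, and the function only handles a single fence around the whole reply.

```python
import textwrap

# Build the fence marker programmatically so this example stays easy to embed.
FENCE = "`" * 3


def strip_code_fence(text: str) -> str:
    """Return the reply without a surrounding Markdown code fence, if present."""
    lines = text.strip("\n").splitlines()
    # Drop an opening fence line such as the marker followed by "yaml".
    if lines and lines[0].lstrip().startswith(FENCE):
        lines = lines[1:]
        # Drop the matching closing fence line, if the reply has one.
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]
    # Remove any indentation shared by every line, then outer blank space.
    return textwrap.dedent("\n".join(lines)).strip()


# Hypothetical reply used only to exercise the helper.
sample_reply = FENCE + "yaml\nopenapi: 3.0.0\ninfo:\n  title: Copper API\n" + FENCE + "\n"
print(strip_code_fence(sample_reply))
```

In the notebook this could be applied as `clean_output = strip_code_fence(final_output)`. A full validity check would still need a YAML parser, for example `yaml.safe_load` from PyYAML, which is left out here to keep the sketch dependency-free.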


## FINAL OUTPUT✨✨  <a class="anchor"  id="chapter1"></a>


In [50]:
print(tokens)

prompt_token_count: 7087
candidates_token_count: 873
total_token_count: 7960
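
The three counts are related by a simple sum: prompt tokens plus candidate (output) tokens give the total. The sketch below parses the printed text above and checks that relationship. The helper name `parse_usage` is hypothetical, and it works on the printed text rather than on the SDK's usage object, so treat it as an illustration rather than the notebook's method.

```python
def parse_usage(printed: str) -> dict[str, int]:
    """Parse 'name: value' lines, as printed for the usage metadata, into a dict."""
    usage = {}
    for line in printed.splitlines():
        name, sep, value = line.partition(":")
        # Keep only lines that look like 'key: <integer>'.
        if sep and value.strip().isdigit():
            usage[name.strip()] = int(value)
    return usage


# The counts exactly as printed by the cell above.
printed = """\
prompt_token_count: 7087
candidates_token_count: 873
total_token_count: 7960
"""

usage = parse_usage(printed)

# Prompt and output tokens should add up to the reported total.
assert (
    usage["prompt_token_count"] + usage["candidates_token_count"]
    == usage["total_token_count"]
)

# Share of the total taken up by the prompt side of the exchange.
prompt_share = usage["prompt_token_count"] / usage["total_token_count"]
print(f"prompt share of total: {prompt_share:.1%}")
```

With these numbers the prompt side (the few-shot history plus the spec being corrected) accounts for roughly 89% of the tokens in this call.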



In [51]:
print(final_output)

```yaml
openapi: 3.0.0
info:
  title: Copper API
  version: v1
  description: API specification for interacting with Copper CRM. This includes fetching leads by ID, creating new leads, and creating people in bulk.
paths:
  /leads/{leadId}:
    get:
      summary: Get a lead by ID
      description: Retrieves a single lead from Copper CRM using its ID.
      parameters:
        - in: path
          name: leadId
          required: true
          schema:
            type: integer
            format: int64
          description: ID of the lead to retrieve.
      responses:
        '200':
          description: Successful operation
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Lead'
        '404':
          description: Lead not found
        '422':
          description: Unprocessable Entity
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Error'
  /leads: