# Queries with and without Azure OpenAI

So far, you have loaded your search engine **from two different data sources into two different indexes**. In this notebook we are going to try some example queries and then use the Azure OpenAI service to see if we can get a good answer to the user query.

The idea is that a user can ask a question about Computer Science (first datasource/index) or about Covid (second datasource/index), and the engine will respond accordingly.
This **Multi-Index** demo mimics the scenario where a company loads documents of different types and about completely different topics, and the search engine must respond with the most relevant results.

## Set up variables

In [1]:
import os
import urllib
import requests
import random
import json
from collections import OrderedDict
from IPython.display import display, HTML, Markdown
from typing import List
from operator import itemgetter

# LangChain Imports needed
from langchain_openai import AzureChatOpenAI
from langchain_openai import AzureOpenAIEmbeddings
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.retrievers import BaseRetriever
from langchain_core.callbacks import CallbackManagerForRetrieverRun
from langchain_core.documents import Document
from langchain_core.runnables import ConfigurableField


# Our own libraries needed
from common.prompts import DOCSEARCH_PROMPT
from common.utils import get_search_results

from dotenv import load_dotenv
load_dotenv("credentials.env")

True

In [2]:
# Setup the Payloads header
headers = {'Content-Type': 'application/json','api-key': os.environ['AZURE_SEARCH_KEY']}
params = {'api-version': os.environ['AZURE_SEARCH_API_VERSION']}

## Multi-Index Search queries

In [3]:
# Text-based Indexes that we are going to query (from Notebook 01 and 02)
index1_name = "cogsrch-index-files"
index2_name = "cogsrch-index-csv"
indexes = [index2_name, index1_name]

Try questions that you think might be answered or addressed in computer science papers from 2020-2021, or in medical publications about COVID from 2020-2021. Try comparing the results with the open version of ChatGPT.<br>
The idea is that the answers using Azure OpenAI only look at the information contained in these publications.

**Example Questions you can ask**:
- What is CLP?
- How do Markov chains work?
- What are some examples of reinforcement learning?
- What are the main risk factors for Covid-19?
- What medicine reduces inflammation in the lungs?
- Why doesn't Covid affect kids that much compared to adults?
- Does chloroquine really work against covid?
- Who won the 1994 soccer world cup? # This question should yield no answer if the system is correctly grounded

In [4]:
QUESTION = "What medicine reduces inflammation in the lungs?"

### Search on both indexes individually and aggregate results

#### **Note**: 
To standardize the indexes, **each index must contain these 6 mandatory fields**: `id, title, name, location, chunk, chunkVector`. This way each document can be treated the same way throughout the code. Also, **all indexes must have a semantic configuration**.

We are going to use Hybrid Queries: Text + Vector Search combined for optimal results!
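
A hybrid query returns a single ranked list even though the text search and the vector search each produce their own ranking. Azure AI Search documents Reciprocal Rank Fusion (RRF) as the method it uses to merge the two. The sketch below is only a toy illustration of that idea, not the service's implementation: the function name `reciprocal_rank_fusion`, the constant `c=60` and the document ids are assumptions made for this example.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, c=60):
    """Fuse several ranked lists of ids into one list (toy RRF sketch)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # A document earns more credit the higher it ranks in each list
            scores[doc_id] += 1.0 / (c + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

# Hypothetical rankings for the same query from the two search modes
text_ranking = ["doc_a", "doc_b", "doc_c"]
vector_ranking = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([text_ranking, vector_ranking])
print(fused)
```

Documents that rank well in both lists (here `doc_b` and `doc_a`) rise to the top, which is why hybrid queries tend to be more robust than either mode alone. In the query below, the semantic re-ranker is then applied on top of the fused results.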

In [5]:
agg_search_results = dict()
k = 10

for index in indexes:
    search_payload = {
        "search": QUESTION, # Text query
        "select": "id, title, name, location, chunk",
        "queryType": "semantic",
        "vectorQueries": [{"text": QUESTION, "fields": "chunkVector", "kind": "text", "k": k}], # Vector query
        "semanticConfiguration": "my-semantic-config",
        "captions": "extractive",
        "answers": "extractive",
        "count":"true",
        "top": k
    }

    r = requests.post(os.environ['AZURE_SEARCH_ENDPOINT'] + "/indexes/" + index + "/docs/search",
                     data=json.dumps(search_payload), headers=headers, params=params)
    print(r.status_code)

    search_results = r.json()
    agg_search_results[index]=search_results
    print("Index:", index, "Results Found: {}, Results Returned: {}".format(search_results['@odata.count'], len(search_results['value'])))

200
Index: cogsrch-index-csv Results Found: 70150, Results Returned: 10
200
Index: cogsrch-index-files Results Found: 118874, Results Returned: 10


### Display the top results (from both searches) based on the score

In [6]:
display(HTML('<h4>Top Answers</h4>'))

for index,search_results in agg_search_results.items():
    for result in search_results['@search.answers']:
        if result['score'] > 0.5: # Show answers that are at least 50% of the max possible score=1
            display(HTML('<h5>' + 'Answer - score: ' + str(round(result['score'],2)) + '</h5>'))
            display(HTML(result['text']))
            
print("\n\n")
display(HTML('<h4>Top Results</h4>'))

content = dict()
ordered_content = OrderedDict()


for index,search_results in agg_search_results.items():
    for result in search_results['value']:
        if result['@search.rerankerScore'] > 1:# Show answers that are at least 25% of the max possible score=4
            content[result['id']]={
                                    "title": result['title'],
                                    "chunk": result['chunk'], 
                                    "name": result['name'], 
                                    "location": result['location'] ,
                                    "caption": result['@search.captions'][0]['text'],
                                    "score": result['@search.rerankerScore'],
                                    "index": index
                                    }
    
# After results have been filtered we will sort and add them as an ordered list
for id in sorted(content, key= lambda x: content[x]["score"], reverse=True):
    ordered_content[id] = content[id]
    url = str(ordered_content[id]['location']) + os.environ['BLOB_SAS_TOKEN']
    title = str(ordered_content[id]['title']) if (ordered_content[id]['title']) else ordered_content[id]['name']
    score = str(round(ordered_content[id]['score'],2))
    display(HTML('<h5><a href="'+ url + '">' + title + '</a> - score: '+ score + '</h5>'))
    display(HTML(ordered_content[id]['caption']))






### Comments on Query results

As seen above, the semantic re-ranking feature of Azure AI Search is good. It gives answers (sometimes) and also the top results, with the corresponding file and the paragraph where the answer is possibly located.

Let's see if we can make this better with Azure OpenAI.

# Using Azure OpenAI

To use OpenAI to get a better answer to our question, the thought process is simple: let's **give the answer and the content of the documents from the search result to the GPT model as context and let it provide a better response**. This is what RAG (Retrieval Augmented Generation) is about.

Now, before we do this, we need to understand a few things first:

1) Chaining and Prompt Engineering
2) Embeddings

We will use a library called **LangChain** that wraps a lot of boilerplate code.
LangChain does a lot of the prompt engineering for us under the hood. For more information see [here](https://python.langchain.com/en/latest/index.html)

In [7]:
# Set the ENV variables that Langchain needs to connect to Azure OpenAI
os.environ["OPENAI_API_VERSION"] = os.environ["AZURE_OPENAI_API_VERSION"]

**Important Note**: Starting now, we will utilize OpenAI models. Please ensure that you have deployed the following models within the Azure OpenAI portal:

- text-embedding-ada-002 (or newer)
- gpt-35-turbo (1106 or newer)
- gpt-4-turbo (1106 or newer)

Reference for Azure OpenAI models (regions, limits, dimensions, etc): [HERE](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models)

## A gentle intro to chaining LLMs and prompt engineering

Chains refer to sequences of calls - whether to an LLM, a tool, or a data preprocessing step.

Azure OpenAI is one LLM provider that you can use, but there are others like Cohere, Hugging Face, etc.

Chains can be simple (i.e. Generic) or specialized (i.e. Utility).

* Generic — A single LLM is the simplest chain. It takes an input prompt and the name of the LLM and then uses the LLM for text generation (i.e. output for the prompt).

Here’s an example:

In [8]:
COMPLETION_TOKENS = 2500
llm = AzureChatOpenAI(deployment_name=os.environ["GPT35_DEPLOYMENT_NAME"], 
                      temperature=0, 
                      max_tokens=COMPLETION_TOKENS)

In [9]:
output_parser = StrOutputParser()
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are an assistant that give thorough responses to users."),
    ("user", "{input}. Give your response in {language}")
])

The | symbol is similar to a Unix pipe operator: it chains the different components together, feeding the output of one component as input into the next component.

In [10]:
chain = prompt | llm | output_parser
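
To make the pipe idea concrete without calling any model, here is a minimal sketch in plain Python. It is not how LangChain implements its runnables; the `Step` class and the `fake_llm` step are hypothetical names invented for this illustration. It only shows that `a | b` can be read as "run `a`, then feed its output to `b`".

```python
class Step:
    """Wraps a function so that steps can be chained with `|` (toy example)."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # `a | b` builds a new step that runs a, then feeds its output to b
        return Step(lambda value: other.func(self.func(value)))

    def invoke(self, value):
        return self.func(value)

# Three stand-in components: a prompt filler, a pretend model, and a parser
fill_prompt = Step(lambda d: f"{d['input']}. Give your response in {d['language']}")
fake_llm = Step(lambda text: f"[model reply to: {text}]")
to_upper = Step(str.upper)

toy_chain = fill_prompt | fake_llm | to_upper
print(toy_chain.invoke({"input": "What is CLP?", "language": "Spanish"}))
```

Each component only needs to accept the previous component's output, which is what makes it easy to swap the model or the parser without touching the rest of the chain.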

In [11]:
%%time
display(Markdown(chain.invoke({"input": QUESTION, "language": "Spanish"})))

El medicamento que reduce la inflamación en los pulmones se llama corticoide inhalado. Este tipo de medicamento se utiliza comúnmente para tratar enfermedades respiratorias como el asma y la enfermedad pulmonar obstructiva crónica (EPOC). Los corticoides inhalados ayudan a reducir la inflamación en los pulmones y a aliviar los síntomas respiratorios. Es importante seguir las indicaciones de un médico para el uso adecuado de este medicamento.

CPU times: user 64.9 ms, sys: 563 µs, total: 65.4 ms
Wall time: 1.56 s


**Note**: this is the first time you use OpenAI in this Accelerator, so if you get a Resource not found error, it is most likely because the name of your OpenAI model deployment is different from the environment variable set above, `os.environ["GPT35_DEPLOYMENT_NAME"]`.

Great! Now you know how to create a simple prompt and use a chain to answer a general question using ChatGPT knowledge.

It is important to note that we rarely use generic chains as standalone chains. More often they are used as building blocks for Utility chains (as we will see next). Also important to notice is that we are NOT using our documents or the result of the Azure Search yet, just the knowledge of ChatGPT on the data it was trained on.

**The second type of Chains are Utility:**

* Utility — These are specialized chains, comprised of many building blocks to help solve a specific task. For example, LangChain supports some end-to-end chains (such as `create_retrieval_chain` for QnA Doc retrieval, Summarization, etc).

We will build our own specific chain in this workshop to dig deeper and solve our use case of enhancing the results of Azure AI Search.


But before dealing with the utility chain we need, let's first review the concepts of Embeddings, Vector Search and RAG.

## Embeddings and Vector Search

From the Azure OpenAI documentation ([HERE](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/embeddings?tabs=python)): an embedding is a special format of data representation that can be easily utilized by machine learning models and algorithms. The embedding is an information dense representation of the semantic meaning of a piece of text. Each embedding is a vector of floating point numbers, such that the distance between two embeddings in the vector space is correlated with semantic similarity between two inputs in the original format. For example, if two texts are similar, then their vector representations should also be similar.
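
As a small illustration of "distance correlates with similarity", the sketch below computes cosine similarity, one common choice of measure, between made-up vectors. The three-dimensional vectors and their names are invented for this example; real embeddings come from the embedding model and have many more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up 3-dimensional "embeddings", purely for illustration
lung_inflammation = [0.9, 0.1, 0.2]
pulmonary_swelling = [0.8, 0.2, 0.3]
markov_chains = [0.1, 0.9, 0.4]

similar = cosine_similarity(lung_inflammation, pulmonary_swelling)
different = cosine_similarity(lung_inflammation, markov_chains)
print(round(similar, 3), round(different, 3))
```

The two medical phrases point in nearly the same direction, so their similarity is much higher than the similarity between a medical phrase and an unrelated computer science topic.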

### Why Do We Need Vectors?

Vectors are essential for several reasons:

- **Semantic Richness**: They convert the semantic meaning of text into mathematical vectors, capturing nuances that simple keyword searches miss. This makes them incredibly powerful for understanding and processing language.
- **Human-like Searching**: Searching using vector distances mimics the human approach to finding information based on context and meaning, rather than relying solely on exact word matches.
- **Efficiency in Scale**: Vector representations allow for efficient handling and searching of large datasets. By reducing complex text to numerical vectors, algorithms can quickly sift through vast amounts of information.

### Understanding LLM Tokens' Context Limitation

Large Language Models (LLMs) like GPT come with a token limit for each input, which poses a challenge when dealing with lengthy documents or extensive data sets. This limitation restricts the model's ability to understand and generate responses based on the full context of the information provided. It becomes crucial, therefore, to devise strategies that can effectively manage and circumvent this limitation to leverage the full power of LLMs.

To address this challenge, the solution incorporates several key steps:

1. **Segmenting Documents**: Breaking down large documents into smaller, manageable segments.
2. **Vectorization of Chunks**: Converting these segments into vectors, making them compatible with vector-based search techniques.
3. **Hybrid Search**: Employing both vector and text search methods to pinpoint the most relevant segments in relation to the query.
4. **Optimal Context Provision**: Presenting the LLM with the most pertinent segments, ensuring a balance between detail and brevity to stay within token limits.
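
The first step above, segmenting, can be sketched as a sliding window with overlap. This is a simplified, character-based illustration with assumed parameter values; the function name `split_into_chunks` is hypothetical, and it is not the chunking strategy that Azure AI Search applies.

```python
def split_into_chunks(text, chunk_size=200, overlap=50):
    """Split text into overlapping character windows (toy chunker)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks

sample = "word " * 100  # 500 characters of filler text
chunks = split_into_chunks(sample, chunk_size=200, overlap=50)
print(len(chunks), [len(c) for c in chunks])
```

The overlap means a sentence cut at a chunk boundary still appears whole in one of the neighboring chunks, at the cost of storing some text twice.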


Our ultimate goal is to rely solely on vector indexes and hybrid searches (vector + text). While it is possible to manually code parsers with OCR for various file types and develop a scheduler to synchronize data with the index, there is a more efficient alternative: **Azure AI Search has automated chunking strategies and vectorization**.

It's important to note that **document segmentation and vectorization have already been completed in Azure AI Search**, as seen in the `ordered_content` dictionary. This pre-processing step simplifies subsequent operations, ensuring rapid response times and adherence to the token limits of the chosen OpenAI model.


So really, our only job now is to make sure that the results from the Azure AI Search queries fit within the LLM context size, and then let it do its magic.
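
One simple way to keep the context within a budget is to take chunks in ranked order until the limit is reached. The sketch below assumes a hypothetical helper `fit_to_budget` and uses a crude word count as a stand-in for a real tokenizer (a library such as `tiktoken` would give actual token counts); the budget of 8 is arbitrary.

```python
def fit_to_budget(chunks, max_tokens, count_tokens):
    """Keep chunks in ranked order until the token budget is used up."""
    selected, used = [], 0
    for chunk in chunks:
        cost = count_tokens(chunk)
        if used + cost > max_tokens:
            break  # the next chunk would overflow the budget
        selected.append(chunk)
        used += cost
    return selected, used

def rough_token_count(text):
    # Crude stand-in for a tokenizer: counts whitespace-separated words
    return len(text.split())

ranked_chunks = ["a b c d", "e f g", "h i j k l", "m n"]
kept, used = fit_to_budget(ranked_chunks, max_tokens=8, count_tokens=rough_token_count)
print(kept, used)
```

Stopping at the first chunk that does not fit preserves the ranking order; an alternative design would skip that chunk and keep trying smaller ones further down the list.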

In [12]:
index1_name = "cogsrch-index-files"
index2_name = "cogsrch-index-csv"
indexes = [index1_name, index2_name]

To avoid duplicating code, we have put much of the code used above into functions. These functions are in the `common/utils.py` and `common/prompts.py` files. This way we can use these functions in the app that we will build later.

`get_search_results()` does the multi-index search and returns the combined ordered list of documents/chunks.
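
The real function lives in `common/utils.py` and is not reproduced here. As a rough sketch of the merging step only, assuming the same response shape used earlier in this notebook, the hypothetical helper `merge_ranked_results` below filters by reranker score, sorts across indexes and keeps the top `k`. The index names, ids and scores are made up.

```python
from collections import OrderedDict

def merge_ranked_results(results_by_index, k=10, reranker_threshold=1.0):
    """Combine raw search responses from several indexes into one ranked OrderedDict."""
    candidates = {}
    for index_name, response in results_by_index.items():
        for doc in response.get("value", []):
            score = doc.get("@search.rerankerScore", 0)
            if score > reranker_threshold:
                candidates[doc["id"]] = {"chunk": doc["chunk"], "score": score, "index": index_name}
    # Highest reranker score first, regardless of which index it came from
    ranked_ids = sorted(candidates, key=lambda doc_id: candidates[doc_id]["score"], reverse=True)
    return OrderedDict((doc_id, candidates[doc_id]) for doc_id in ranked_ids[:k])

# Made-up responses standing in for two index queries
fake_responses = {
    "index-a": {"value": [{"id": "a1", "chunk": "text of chunk a1", "@search.rerankerScore": 2.9},
                          {"id": "a2", "chunk": "text of chunk a2", "@search.rerankerScore": 0.7}]},
    "index-b": {"value": [{"id": "b1", "chunk": "text of chunk b1", "@search.rerankerScore": 3.4}]},
}
merged = merge_ranked_results(fake_responses, k=2)
print(list(merged))
```

Because the semantic reranker scores are on the same scale for every index, sorting on them gives a single ranking across indexes.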

In [13]:
k = 20  # play with this parameter and see the quality of the final answer
ordered_results = get_search_results(QUESTION, indexes, k=k, reranker_threshold=1)
print("Number of results:",len(ordered_results))

Number of results: 20


In [14]:
# Uncomment the below line if you want to inspect the ordered results
# ordered_results

Now let's create a Prompt Template that will ground the response only in the chunks retrieved by our hybrid AI Search.

In [15]:
template = """Answer the question thoroughly, based **ONLY** on the following context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

In [16]:
%%time 
# Creation of our custom chain
chain = prompt | llm | output_parser

try:
    display(Markdown(chain.invoke({"question": QUESTION, "context": ordered_results})))
except Exception as e:
    print(e)

Chloroquine is a medicine that has been shown to reduce inflammation in the lungs. It has been found to inhibit the release of inflammatory cytokines by human lung explants, specifically reducing the release of TNF-α, IL-6, CCL2, and CCL3. This suggests that chloroquine might mitigate the cytokine storm associated with severe pneumonia caused by coronaviruses. Therefore, chloroquine is a potential medicine for reducing inflammation in the lungs.

CPU times: user 31 ms, sys: 839 µs, total: 31.9 ms
Wall time: 2.02 s


### From GPT-3.5 to GPT-4

Now let's see how the response changes if we change to GPT-4


In [20]:
llm_2 = AzureChatOpenAI(deployment_name=os.environ["GPT4_DEPLOYMENT_NAME"], temperature=0.5, max_tokens=COMPLETION_TOKENS)
chain = prompt | llm_2 | output_parser

In [21]:
%%time
try:
    display(Markdown(chain.invoke({"question": QUESTION, "context": ordered_results})))
except Exception as e:
    print(e)

Based on the provided context, several medicines and therapeutic approaches have been mentioned that reduce inflammation in the lungs:

1. Chloroquine: In human lung parenchymal explants, a clinically achievable concentration of chloroquine (100 μM) inhibited the lipopolysaccharide-induced release of inflammatory cytokines such as TNF-α, IL-6, CCL2, and CCL3, which could also mitigate the cytokine storm associated with severe pneumonia caused by coronaviruses.

2. Recombinant IL-22: Administering recombinant IL-22 in vivo has been shown to reduce inflammation and fluid leak into the lung, demonstrating the potential of targeting the IL-22/IL-22BP axis for reducing influenza-induced pneumonia.

3. Low doses of radiation therapy (RT): Low doses of RT (0.5-1 Gy) increased IL-10 secretion and decreased IFNγ production in the culture supernatant of Poly(I:C)-stimulated human lung macrophages, suggesting a potential therapeutic strategy to counteract lung inflammation.

4. Anti-inflammatory pharmacological therapies: These therapies aim to inhibit excessive inflammation or manipulate the resulting physiological derangement causing respiratory failure. Cell-based therapy is an emerging approach that seeks to convert the inflammatory process from an injurious one to a reparative one.

5. IL-6 blockade with tocilizumab: In selected patients with severe, rapidly deteriorating COVID-19-related pneumonia, IL-6 blockade with tocilizumab could curb the cytokine storm, prevent ICU admission, and the requirement for mechanical ventilation.

6. Glucocorticoids: Although glucocorticoids are considered effective anti-inflammatory therapies for chronic inflammatory and immune diseases like asthma, they may be relatively ineffective in certain populations such as asthmatic smokers, and patients with chronic obstructive pulmonary disease (COPD) or cystic fibrosis (CF).

7. Escin: Escin, an agent with potent anti-inflammatory and anti-viral effects in lung injury, has been suggested as a potential add-on therapy in acute lung injury (ALI) related to COVID-19 infection.

8. Anti-inflammatory therapy in neonates: Early post-natal anti-inflammatory therapy could help in preventing the development of chronic lung disease (CLD) in preterm newborns who may be unable to activate the anti-inflammatory cytokine IL-10.

9. Noscapine: Noscapine, a cough medicine, has been shown to inhibit the bradykinin-enhanced cough response in humans, suggesting it may help attenuate the intense immunological reaction seen in the lung tissue during viral infections like SARS-CoV-2.

10. Pharmacological management of chronic obstructive pulmonary disease (COPD): Rapid-acting bronchodilators, systemic corticosteroids, and antibiotics are key to managing exacerbations of COPD and preventing exacerbations should also be a component of therapy for the disease.

11. Macrolide and ketolide antibiotics: These antibiotics may exert anti-inflammatory effects on the chronically inflamed airways in addition to their anti-infective action and may play a role in the management of asthma.

These are just some of the medicines and approaches mentioned in the context provided that are associated with reducing lung inflammation. It is important to note that the effectiveness and safety of these treatments can vary based on the specific condition being treated, the patient's overall health, and other factors. Clinical judgment and appropriate medical consultation are essential for the proper management of lung inflammation.

CPU times: user 21.7 ms, sys: 11.5 ms, total: 33.2 ms
Wall time: 1min 28s


#### As we can see, the model selection MATTERS!

We will dive deeper into this later, but for now, **look at the difference between GPT3.5 and GPT4, in quality and in response time**.

# Improving the Prompt and adding citations

We could see above that the answer given by GPT3.5 was very simple compared to GPT4, even when the prompt says "thorough responses to users". We could also see that there are no citations or references. **How do we know if the answer is grounded on the context or not?**

Let's see if these two issues can be improved by Prompt Engineering.<br>
In `common/prompts.py` we created a prompt called `DOCSEARCH_PROMPT`. Check it out!

Let's also create a custom Retriever class so we can plug it in easily within the chain building. 
Note: we can also use the Azure AI Search retriever class [HERE](https://python.langchain.com/docs/integrations/vectorstores/azuresearch), however we want to create a custom Retriever for the following reasons:

1) We want to do multi-index searches in one call
2) Easier to teach complex concepts of LangChain in this notebook
3) We want to use the REST API vs the Python Azure Search SDK

In [23]:
class CustomRetriever(BaseRetriever):
    
    topK : int
    reranker_threshold : int
    indexes: List
    sas_token: str = None
    
    def _get_relevant_documents(self, query: str) -> List[Document]:
        
        ordered_results = get_search_results(query, self.indexes, k=self.topK, 
                                             reranker_threshold=self.reranker_threshold, 
                                             sas_token=self.sas_token)
        top_docs = []
        for key,value in ordered_results.items():
            location = value["location"] if value["location"] is not None else ""
            top_docs.append(Document(page_content=value["chunk"], metadata={"source": location, "score":value["score"]}))

        return top_docs

In [28]:
# Create the retriever
retriever = CustomRetriever(indexes=indexes, topK=k, reranker_threshold=1, sas_token=os.environ['BLOB_SAS_TOKEN'])

In [30]:
# Test retriever
results = retriever.get_relevant_documents(QUESTION)
len(results)

20

In [31]:
# We can now create a dynamically configurable llm object that can change the model at runtime
dynamic_llm = AzureChatOpenAI(deployment_name=os.environ["GPT35_DEPLOYMENT_NAME"], 
                              temperature=0.5, max_tokens=COMPLETION_TOKENS).configurable_alternatives(
    # This gives this field an id
    # When configuring the end runnable, we can then use this id to configure this field
    ConfigurableField(id="model"),
    # This sets a default_key.
    # If we specify this key, the default LLM  (initialized above) will be used
    default_key="gpt35",
    # This adds a new option, with name `gpt4`
    gpt4=AzureChatOpenAI(deployment_name=os.environ["GPT4_DEPLOYMENT_NAME"], 
                         temperature=0.5, max_tokens=COMPLETION_TOKENS),
    # You can add more configuration options here
)

In [32]:
# Declaration of the chain with the dynamic llm and the new prompt
configurable_chain = (
    {
        "context": itemgetter("question") | retriever, # Passes the question to the retriever and the results are assign to context
        "question": itemgetter("question")
    }
    | DOCSEARCH_PROMPT  # Passes the input variables above to the prompt template
    | dynamic_llm   # Passes the finished prompt to the LLM
    | StrOutputParser()  # converts the output (Runnable object) to the desired output (string)
)

In [33]:
%%time

try:
    display(Markdown(configurable_chain.with_config(configurable={"model": "gpt35"}).invoke({"question": QUESTION})))
except Exception as e:
    print(e)

Chloroquine is a medicine that has been shown to reduce inflammation in the lungs. On human lung parenchymal explants, chloroquine concentration clinically achievable in the lung (100 μM) inhibited the lipopolysaccharide-induced release of TNF-α (by 76%), IL-6 (by 68%), CCL2 (by 72%) and CCL3 (by 67%). Beside its antiviral activity, chloroquine might also mitigate the cytokine storm associated with severe pneumonia caused by coronaviruses<sup><a href="https://doi.org/10.1093/cid/ciaa546" target="_blank">[1]</a></sup>.

Furthermore, chest irradiation at equivalent doses below 1 Gy has been used successfully in the past to treat pneumonia. It has been shown that low doses of radiation therapy (RT) protect the lung from inflammation. Nerve- and airway-associated macrophages (NAMs) and the pro-versus anti-inflammatory cytokine balance (IL6-IFNγ/IL-10) were recently shown to regulate lung inflammation. Chest irradiation using low doses of RT significantly increased the percentage of NAMs producing IL-10, leading to lung protection from inflammation<sup><a href="https://doi.org/10.1101/2020.05.11.077651" target="_blank">[3]</a></sup>.

In addition, the review discusses the role of the tight junction of the airway epithelium as the predominating structure conferring epithelial tightness and preventing exudate formation in the lungs, and the impact of inflammatory perturbations on their function. Inflammatory lung diseases predispose patients to severe lung failures like alveolar edema, respiratory distress syndrome, and acute lung injury. Preventing exacerbations should also be a component of therapy for the disease<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5203840/" target="_blank">[11]</a></sup>.

These findings suggest that chloroquine, low doses of radiation therapy, and therapies targeting the airway epithelium's tight junctions play a role in reducing inflammation in the lungs.

CPU times: user 384 ms, sys: 14.8 ms, total: 399 ms
Wall time: 8.45 s


As seen above, we were able to improve the quality and breadth of the answer and add citations with only prompt engineering!
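
Citations also make it possible to check grounding mechanically. The sketch below assumes the citation format seen in the answer above (`<sup><a href="...">[n]</a></sup>`) and compares the cited URLs with the sources that were retrieved. The helper names `cited_urls` and `ungrounded_citations` and the second URL are hypothetical; in the real chain the retrieved sources may carry a SAS token suffix that would need to be stripped before comparing.

```python
import re

def cited_urls(answer_html):
    """Return the href of every <sup><a href="...">[n]</a></sup> citation, in order."""
    pattern = r'<sup><a href="([^"]+)"[^>]*>\[\d+\]</a></sup>'
    return re.findall(pattern, answer_html)

def ungrounded_citations(answer_html, retrieved_sources):
    """List cited URLs that do not appear among the retrieved sources."""
    known = set(retrieved_sources)
    return [url for url in cited_urls(answer_html) if url not in known]

answer = ('Chloroquine inhibited cytokine release'
          '<sup><a href="https://doi.org/10.1093/cid/ciaa546" target="_blank">[1]</a></sup>. '
          'Another claim<sup><a href="https://example.com/unknown" target="_blank">[2]</a></sup>.')
sources = ["https://doi.org/10.1093/cid/ciaa546"]

print(cited_urls(answer))
print(ungrounded_citations(answer, sources))
```

An empty list from `ungrounded_citations` does not prove that every sentence is supported, but a non-empty one is a cheap signal that the model cited something it was never given.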

Let's try again with GPT-4.

In [34]:
%%time
try:
    display(Markdown(configurable_chain.with_config(configurable={"model": "gpt4"}).invoke({"question": QUESTION})))
except Exception as e:
    print(e)

Several medicines and treatments have been identified to reduce inflammation in the lungs:

1. Chloroquine has been shown to inhibit the lipopolysaccharide-induced release of various inflammatory cytokines such as TNF-α, IL-6, CCL2, and CCL3 in human lung parenchymal explants<sup><a href="https://doi.org/10.1093/cid/ciaa546" target="_blank">[1]</a></sup>.

2. Interleukin-22 (IL-22) has been found to protect the lung during infection by promoting tight junction formation and reducing pulmonary inflammation. This has been demonstrated in IL-22BP-knockout mice and normal human bronchial epithelial cells<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6917921/" target="_blank">[2]</a></sup>.

3. Low doses of radiation therapy (RT) can protect the lung from inflammation by increasing IL-10 secretion and decreasing IFNγ production, as well as reducing the percentage of human lung macrophages producing IL-6<sup><a href="https://doi.org/10.1101/2020.05.11.077651" target="_blank">[3]</a></sup>.

4. Cell-based therapy has been considered for acute lung injury, aiming to convert the inflammatory process from an injurious to a reparative one<sup><a href="https://doi.org/10.1055/s-0033-1351119" target="_blank">[4]</a></sup>.

5. IL6 blockade with tocilizumab has been used in selected patients with severe COVID-19-related pneumonia to curb the "cytokine storm," potentially preventing ICU admission and the requirement for mechanical ventilation<sup><a href="https://doi.org/10.1101/2020.04.20.20061861" target="_blank">[5]</a></sup>.

6. Glucocorticoids, despite being the most effective anti-inflammatory therapies for chronic inflammatory diseases like asthma, have limitations in certain patient populations, leading to the exploration of new therapeutic approaches<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7120595/" target="_blank">[6]</a></sup>.

7. Escin, which has potent anti-inflammatory and anti-viral effects, has been suggested as a potential add-on therapy in acute lung injury related to COVID-19 infection<sup><a href="https://doi.org/10.1002/jcph.1644" target="_blank">[7]</a></sup>.

8. Anti-inflammatory cytokines like IL-10 may be used in early post-natal anti-inflammatory therapy to help prevent the development of chronic lung disease (CLD) in preterm infants<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7102381/" target="_blank">[8]</a></sup>.

9. RNA interference (RNAi) has been considered for treating a range of respiratory conditions due to its specificity, potency, and reduced risk of toxic effects<sup><a href="https://doi.org/10.1021/mp070048k" target="_blank">[9]</a></sup>.

10. Traditional Chinese herbal abstractions, such as Tanshinone IIA (TIIA), have shown significant anti-inflammatory effects in animal models of pulmonary fibrosis<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4003542/" target="_blank">[10]</a></sup>.

11. Noscapine, a cough medicine, has been shown to inhibit bradykinin enhanced cough response in humans and has been suggested for trials to attenuate severe symptoms due to lung damage from viral infections<sup><a href="https://doi.org/10.1002/ddr.21676" target="_blank">[11]</a></sup>.

12. Systemic corticosteroids are used to manage exacerbations of chronic obstructive pulmonary disease (COPD)<sup><a href="https://www.ncbi.nlm.nih.gov/pubmed/27904303/" target="_blank">[12]</a></sup>.

13. Macrolide and ketolide antibiotics may have anti-inflammatory effects on the airways in asthma in addition to their anti-infective action<sup><a href="https://www.ncbi.nlm.nih.gov/pubmed/17353114/" target="_blank">[13]</a></sup>.

14. Herbal medicine, while often considered to have fewer side effects due to its natural origin, can also have anti-inflammatory effects, though there is a need for awareness of potential drug-induced lung injuries<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7123307/" target="_blank">[14]</a></sup>.

It is important to note that the effectiveness and safety of these treatments can vary, and they should be used under the guidance of healthcare professionals.

CPU times: user 148 ms, sys: 7.99 ms, total: 156 ms
Wall time: 1min 52s


#### As you can see, the answer from GPT4 is richer and includes all the relevant chunks. GPT3.5 tends to focus on the first and last chunks only

## Adding Streaming to improve user experience and performance

It is obvious by now that **GPT4 answers are better quality than GPT3.5**. Neither is incorrect, but GPT4 is better at understanding the context, following the prompt instructions and giving a comprehensive answer.

One way to make GPT4 feel faster is to stream the answer, so the user can see the response as it is generated. To do this, we simply need to call the method `stream` instead of `invoke`. More on Streaming and Callbacks in later notebooks, but for now, this is one simple way to do it:

In [35]:
for chunk in configurable_chain.with_config(configurable={"model": "gpt4"}).stream({"question": QUESTION}):
    print(chunk, end="", flush=True)

Several medicines and therapeutic approaches have been identified to reduce inflammation in the lungs:

1. Chloroquine has been shown to inhibit the release of various inflammatory cytokines such as TNF-α, IL-6, CCL2, and CCL3 in human lung parenchymal explants, which suggests it may mitigate the cytokine storm associated with severe pneumonia caused by coronaviruses<sup><a href="https://doi.org/10.1093/cid/ciaa546" target="_blank">[1]</a></sup>.

2. Interleukin-22 (IL-22) has a critical role in protecting the lung during infection, and administering recombinant IL-22 in vivo reduces inflammation and fluid leak into the lung<sup><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6917921/" target="_blank">[2]</a></sup>.

3. Low doses of radiation therapy (RT) have been used to increase IL-10 secretion by lung macrophages and decrease IFNγ production, which suggests a mechanism for regulating lung inflammation and favoring anti-inflammatory cytokine secretion by lung macrophages<sup><
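
In an application you usually want both the live output and the complete answer, for example to store it or to post-process the citations. The sketch below shows that pattern with a stand-in generator; `fake_token_stream` is a hypothetical substitute for `configurable_chain.stream(...)` so that the example runs without any model.

```python
import time

def fake_token_stream(text, delay=0.0):
    """Yield the text word by word, standing in for a chain's stream() call."""
    for word in text.split(" "):
        time.sleep(delay)  # simulates the wait between generated chunks
        yield word + " "

def stream_and_collect(stream):
    """Print chunks as they arrive and also return the complete answer."""
    parts = []
    for chunk in stream:
        print(chunk, end="", flush=True)
        parts.append(chunk)
    return "".join(parts)

full_answer = stream_and_collect(fake_token_stream("Several medicines reduce lung inflammation"))
```

The total generation time is unchanged; what improves is the time until the user sees the first words.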

# Summary
##### By using OpenAI, the answers to user questions are way better than taking just the results from Azure AI Search. So the summary is:
- Utilizing Azure AI Search, we conduct a multi-index hybrid search that identifies the top chunks of documents from each index.
- Subsequently, Azure OpenAI utilizes these extracted chunks as context, comprehends the content, and employs it to deliver optimal answers.
- Best of both worlds!

##### Important observations on this notebook:

1) Answers with GPT-3.5 are lower quality but much faster
2) Answers with GPT-3.5 sometimes failed to provide citations in the right format
3) Answers with GPT-4 are great quality but much slower
4) Answers with GPT-4 always provide good and diverse citations in the right format
5) Streaming the answers improves the user experience big time!

# NEXT
In the next notebook, we are going to see how we can treat complex and large documents separately, also using Vector Search.