# 2. HyDE RAG

<img src="https://humanloop.com/blog/rag-architectures/HyDe.png" height=300px>

In [2]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings, ChatGoogleGenerativeAI
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains import create_retrieval_chain
from langchain_community.vectorstores import FAISS
from langchain_community.document_loaders import PyPDFDirectoryLoader
from langchain_groq import ChatGroq



### Loading the LLM and the Embedding Models

In [3]:
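# Set the Groq and Google API keys
groq_api_key = ""
google_api_key = ""

# Chat model used for both the hypothetical answer and the final answer
llm = ChatGroq(groq_api_key=groq_api_key, model_name="Llama3-8b-8192")

# Embedding model used to index the chunks and to embed the hypothetical answer
Embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001", google_api_key=google_api_key)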
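
The keys above are left blank in the notebook. One way to fill them without pasting secrets into a cell is to read them from environment variables. The sketch below is a minimal helper under that assumption; `load_api_key` is a hypothetical name, and `GROQ_API_KEY` / `GOOGLE_API_KEY` are assumed variable names that you would need to set yourself.

```python
import os


def load_api_key(env_var: str) -> str:
    """Return the key stored in `env_var`, or raise a clear error if it is unset."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before running this notebook."
        )
    return key


# Possible usage in the cell above (assumed variable names):
# groq_api_key = load_api_key("GROQ_API_KEY")
# google_api_key = load_api_key("GOOGLE_API_KEY")
```

Failing early with a named variable tends to be easier to debug than an authentication error raised later by the first model call.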



### Loading and formatting the data

In [4]:
# Data Loader
loader = PyPDFDirectoryLoader("./pdf")  # Load PDFs from directory
docs = loader.load()

# Splitting / Creating Chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=50)
splits = text_splitter.split_documents(docs)

# Embedding the chunks and indexing them in a FAISS vector store
vectorstore = FAISS.from_documents(documents=splits, embedding=Embeddings)

# Creating Retriever
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 4})

# Defining the HyDE prompt template (asks the LLM for a hypothetical answer)
template = """For the given question try to generate a hypothetical answer
Only generate the answer and nothing else:
Question: {question}
"""
# Fill the template with the user question
Prompt = ChatPromptTemplate.from_template(template)
query = Prompt.format(question='What are some climate related risks?')

hypothetical_answer = llm.invoke(query).content
print(hypothetical_answer)

Sea-level rise, more frequent and severe heatwaves, droughts, and floods, melting of polar ice caps, intense storms, and changes in precipitation patterns.
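
The point of generating this hypothetical answer is that it tends to share more vocabulary and meaning with the stored chunks than the bare question does. The toy sketch below illustrates that effect with a bag-of-words stand-in for the embedding model. Everything in it is an assumption for illustration: `embed`, `cosine`, and `top_k` are hypothetical helpers, the two chunks are made-up sample text, and a real dense embedding would not give the raw question a score of exactly zero.

```python
import math
from collections import Counter


def embed(text):
    """Toy 'embedding': lowercase word counts (stand-in for a real embedding model)."""
    words = [w.strip(".,?!:;()").lower() for w in text.split()]
    return Counter(w for w in words if w)


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_text, chunks, k=1):
    """Return the k chunks most similar to query_text."""
    q = embed(query_text)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


# Made-up sample chunks, not taken from the PDF used in this notebook
chunks = [
    "Observed changes include sea level rise, glacier retreat and more heavy precipitation.",
    "The report was approved by member governments after line-by-line review.",
]
question = "What are some climate related risks?"
hypothetical = "Sea level rise, heatwaves, droughts, floods and heavy precipitation."

print("question score    :", cosine(embed(question), embed(chunks[0])))
print("hypothetical score:", cosine(embed(hypothetical), embed(chunks[0])))
print(top_k(hypothetical, chunks))
```

In this toy setup the question shares no words with the relevant chunk, while the hypothetical answer shares several, so only the latter ranks it first.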


In [5]:
# Retrieve using the hypothetical answer instead of the raw question
similar_docs = retriever.invoke(hypothetical_answer)

for doc in similar_docs:
    print(doc.page_content)
    print()



temperature means, desertification, decreasing precipitation, loss of biodiversity, land and forest degradation, glacial retreat and related impacts, ocean 
acidification, sea level rise and salinization. {2.1.2}

Very likely
Global sea
level rise
Glacier
retreat
Medium conﬁdence
Increase in 
compound
ﬂooding
Increase in 
agricultural 
& ecological 
drought
Increase 
in ﬁre
weather

conditions, which are increasingly attributed to human inﬂuence
Attribution of observed physical climate changes to human inﬂuence:
Virtually certain
Increase 
in hot 
extremes 
Upper 
ocean
acidiﬁcation
pH
Likely
Increase 
in heavy 
precipitation
Very likely
Global sea
level rise
Glacier
retreat

health challenges 36 ( very high confidence), flooding in coastal and other low-lying cities and regions ( high confidence), 
biodiversity loss in land, freshwater and ocean ecosystems ( medium to  very high confidence, depending on ecosystem),
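
The retrieved chunks above are short and often start or stop mid-sentence, which follows from `chunk_size=300` and `chunk_overlap=50`. The sketch below shows how those two numbers interact, using a plain fixed-width sliding window. It is a simplification: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraph and line breaks, so its boundaries will differ. `sliding_chunks` is a hypothetical helper name.

```python
def sliding_chunks(text, chunk_size=300, chunk_overlap=50):
    """Split text into fixed-width windows where neighbours share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap  # how far each window advances
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


# Synthetic 700-character text, used only to show the window arithmetic
sample = "".join(str(i % 10) for i in range(700))
pieces = sliding_chunks(sample)
print([len(p) for p in pieces])
print(pieces[0][-50:] == pieces[1][:50])
```

A larger `chunk_size` would keep more of each sentence together at the cost of less precise matches, so the value is worth tuning against the documents being indexed.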



### Creating the Prompt Template

In [6]:
template = """Answer the following question based on this context:

{context}

Question: {question}
"""

Prompt = ChatPromptTemplate.from_template(template)
# Helper that joins the retrieved chunks into one context string
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

formatted_docs = format_docs(similar_docs)

Query_Prompt = Prompt.format(
    context=formatted_docs,
    question="What are some climate related risks?",
)
print(Query_Prompt)

Human: Answer the following question based on this context:

temperature means, desertification, decreasing precipitation, loss of biodiversity, land and forest degradation, glacial retreat and related impacts, ocean 
acidification, sea level rise and salinization. {2.1.2}

Very likely
Global sea
level rise
Glacier
retreat
Medium conﬁdence
Increase in 
compound
ﬂooding
Increase in 
agricultural 
& ecological 
drought
Increase 
in ﬁre
weather

conditions, which are increasingly attributed to human inﬂuence
Attribution of observed physical climate changes to human inﬂuence:
Virtually certain
Increase 
in hot 
extremes 
Upper 
ocean
acidiﬁcation
pH
Likely
Increase 
in heavy 
precipitation
Very likely
Global sea
level rise
Glacier
retreat

health challenges 36 ( very high confidence), flooding in coastal and other low-lying cities and regions ( high confidence), 
biodiversity loss in land, freshwater and ocean ecosystems ( medium to  very high confidence, depending on ecosystem),

Question: What are some climate related risks?

### Final Output

In [7]:
response = llm.invoke(Query_Prompt)

print("Final answer :")
print(response.content)

Final answer :
According to the provided context, some climate-related risks include:

1. Global sea level rise
2. Glacier retreat
3. Compound flooding
4. Agricultural and ecological drought
5. Increase in fire weather conditions
6. Hot extremes
7. Upper ocean acidification
8. Heavy precipitation
9. Flooding in coastal and other low-lying cities and regions
10. Biodiversity loss in land, freshwater, and ocean ecosystems


### Testing on a question that the docs do not clearly answer

In [12]:
question = "Who are Policymakers?"

# Recreate the retriever with the same settings as before
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 4})

# Defining the HyDE prompt template (asks the LLM for a hypothetical answer)
template = """For the given question try to generate a hypothetical answer
Only generate the answer and nothing else:
Question: {question}
"""
# Fill the template with the user question
Prompt = ChatPromptTemplate.from_template(template)
query = Prompt.format(question=question)

hypothetical_answer = llm.invoke(query).content
print(hypothetical_answer)

Elected officials, government agencies, and specialized organizations that have the authority to make decisions that affect society, economy, and environment, such as presidents, prime ministers, congress members, regulatory bodies, and international organizations.


In [13]:
# Retrieve using the hypothetical answer instead of the raw question
similar_docs = retriever.invoke(hypothetical_answer)

template = """Answer the following question based on this context:

{context}

Question: {question}
"""

Prompt = ChatPromptTemplate.from_template(template)

# Helper that joins the retrieved chunks into one context string
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

formatted_docs = format_docs(similar_docs)

Query_Prompt = Prompt.format(context=formatted_docs, question=question)
print(Query_Prompt)

Human: Answer the following question based on this context:

interests, enable coordination and inform strategy setting but require adequate institutional capacity. Policy support is 
influenced by actors in civil society, including businesses, youth, women, labour, media, Indigenous Peoples, and local

development pathways
Civil 
society
Governments
Private 
sector
Conditions that enable 
individual and collective actions
• Inclusive governance 
• Diverse knowledges and values
• Finance and innovation
• Integration across sectors 
and time scales
• Ecosystem stewardship

communities. Effectiveness is enhanced by political commitment and partnerships between different groups in society. 
(high confidence) {2.2, 4.7}
C.6.3 Effective multilevel governence for mitigation, adaptation, risk management, and climate resilient development is

the part of the Intergovernmental Panel on Climate Change concerning the legal status of any country, territory, city or area or of 
its authorities, or 

In [14]:
response = llm.invoke(Query_Prompt)

print("Final answer :")
print(response.content)

Final answer :
Based on the provided context, Policymakers are not explicitly mentioned. However, it can be inferred that Policymakers are likely to be Governments, which are mentioned as one of the actors influencing policy support, along with actors in civil society, including businesses, youth, women, labour, media, Indigenous Peoples, and local communities.
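
Both runs above repeat the same three steps: generate a hypothetical answer, retrieve with it, then answer from the retrieved context. A minimal sketch of wrapping those steps in one function follows. `hyde_answer` is a hypothetical helper, not part of LangChain; the model and retriever are passed in as plain callables so the flow can be exercised with stand-ins, and the templates are copied from the cells above.

```python
from typing import Callable, List

HYDE_TEMPLATE = """For the given question try to generate a hypothetical answer
Only generate the answer and nothing else:
Question: {question}
"""

ANSWER_TEMPLATE = """Answer the following question based on this context:

{context}

Question: {question}
"""


def hyde_answer(
    question: str,
    generate: Callable[[str], str],
    retrieve: Callable[[str], List[str]],
) -> str:
    """Run the HyDE flow: hypothetical answer -> retrieval -> grounded answer."""
    # Step 1: ask the model for a hypothetical answer to the question
    hypothetical = generate(HYDE_TEMPLATE.format(question=question))
    # Step 2: retrieve chunks using the hypothetical answer, not the question
    chunks = retrieve(hypothetical)
    # Step 3: answer the original question from the retrieved context
    context = "\n\n".join(chunks)
    return generate(ANSWER_TEMPLATE.format(context=context, question=question))


# Possible wiring with the objects defined in this notebook (assumed, not run here):
# answer = hyde_answer(
#     "Who are Policymakers?",
#     generate=lambda prompt: llm.invoke(prompt).content,
#     retrieve=lambda text: [d.page_content for d in retriever.invoke(text)],
# )
```

Note that the final prompt still carries the original question. Only the retrieval step sees the hypothetical answer, which is why a weak hypothetical answer, as in the policymakers example, can pull in loosely related chunks.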
