# Advanced RAG with Re-Ranking | Groq + Ollama + LangChain + Cohere + Pinecone + Llama 3.1 8B

This Jupyter notebook demonstrates an advanced implementation of the Retrieval-Augmented Generation (RAG) technique, enhanced with document re-ranking, to generate precise summaries based on a given query. The process involves several key steps:

1. **Importing Libraries**: Sets up the environment by importing necessary Python libraries for PDF processing, text splitting, embeddings, and API interactions.
2. **Language Model Setup with Groq**: Initializes a language model on Groq's platform, using the Llama 3.1 8B model (`llama-3.1-8b-instant`) to generate responses.
3. **PDF Text Processing**: Reads and processes text from a PDF document, splitting it into manageable chunks for further processing.
4. **Document Embedding with Ollama**: Uses Ollama embeddings (`nomic-embed-text`) to convert text chunks into vector representations.
5. **Pinecone Indexing**: Sets up a Pinecone vector database for storing and querying document embeddings.
6. **Query Embedding and Retrieval**: Embeds a user query for similarity search in the Pinecone index, retrieving relevant document chunks.
7. **Re-Ranking with Cohere**: Applies Cohere's re-ranking model to refine the search results, ensuring the most relevant documents are selected.
8. **Summary Generation**: Uses the Groq language model to generate a comprehensive summary based on the context provided by the re-ranked documents.

This notebook serves as a comprehensive guide for implementing a sophisticated RAG system with re-ranking, showcasing the integration of multiple AI and NLP technologies to enhance information retrieval and summarization tasks.

Install the dependencies:

In [58]:
!pip install groq ollama cohere langchain pinecone-client PyPDF2 langchain_community langchain-groq python-dotenv


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.1.1[0m[39;49m -> [0m[32;49m24.1.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


Add the following environment variables so the notebook can call each API:

- GROQ_API_KEY
- PINECONE_API_KEY
- COHERE_API_KEY
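
A minimal sketch of a fail-fast check for these keys, assuming they are read from the process environment (for example after `load_dotenv()` has run). The helper name `missing_env_vars` and the placeholder values are hypothetical and not part of any library used here.

```python
import os

# The three keys this notebook expects to find in the environment.
REQUIRED_KEYS = ("GROQ_API_KEY", "PINECONE_API_KEY", "COHERE_API_KEY")


def missing_env_vars(keys=REQUIRED_KEYS, environ=None):
    """Return the names in `keys` that are unset or empty in `environ`."""
    env = os.environ if environ is None else environ
    return [key for key in keys if not env.get(key)]


# Demo with a made-up environment: one key set, one missing, one empty.
demo_env = {"GROQ_API_KEY": "placeholder-value", "COHERE_API_KEY": ""}
print(missing_env_vars(environ=demo_env))
# ['PINECONE_API_KEY', 'COHERE_API_KEY']
```

Calling `missing_env_vars()` with no arguments checks the real environment, which makes a missing key visible before any API client is created.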

In [59]:
import PyPDF2
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import OllamaEmbeddings
from langchain_groq import ChatGroq

#### Setup Language Model with Groq

Let's now set up the language model with Groq, using the Llama 3.1 model with 8 billion parameters.

In [60]:
from os import getenv
from dotenv import load_dotenv

# Load API keys from a local .env file into the environment
load_dotenv()

GROQ_API_KEY = getenv("GROQ_API_KEY")

# Chat model served by Groq; swap model_name to try another hosted model
llm_groq = ChatGroq(
    groq_api_key=GROQ_API_KEY,
    model_name="llama-3.1-8b-instant",
    # model_name="mixtral-8x7b-32768",
)

#### Read and Split PDF Text

Reads text from a specified PDF file, concatenates it into a single string, and then splits the text into manageable chunks for processing.

In [61]:
# Read the PDF file
pdf = PyPDF2.PdfReader("/Users/williamzebrowski/Library/Mobile Documents/com~apple~CloudDocs/groq/data/Merged_Document.pdf")
pdf_text = ""
for page in pdf.pages:
    pdf_text += page.extract_text()

# Split the text into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
texts = text_splitter.split_text(pdf_text)
texts[0]

"Two-step verification helps protect your account. When you log in, you'll be asked to provide a\none-time code that you'll receive through email, text, or an authenticator app. If one of the methods\nisn't working, try a different method. You'll also receive a backup code when you enable two-step\nverification, which lets you access your account. If you don't have your backup code, contact the\nFSA help center: https://studentaid.gov/help-center/contactIf you created an FSA ID with a SSN, it will take 1-3 days to be verified.\nIf a parent/contributor created an FSA ID without an SSN and successfully answered the knowledge\nbased identity questions, it can be used immediately. If unable to confirm identity, an email will be\nsent with a case number and instructions to submit additional documents. Contributors can complete"
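
To illustrate what `chunk_size` and `chunk_overlap` control, here is a simplified fixed-width sliding window. This is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on separators such as paragraph and line breaks before falling back to smaller pieces). `sliding_window_chunks` is a hypothetical helper used only for illustration.

```python
def sliding_window_chunks(text, chunk_size=1000, chunk_overlap=100):
    """Split `text` into fixed-width chunks where neighbours share `chunk_overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


# Tiny example so the overlap is easy to see.
print(sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```

Each chunk repeats the last character of the previous one, which is the role the 100-character overlap plays in the cell above: text that straddles a chunk boundary still appears whole in at least one chunk.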

#### Embed Documents with Ollama Embeddings

Let's pull down an Ollama embedding model.

If you have Ollama installed, run this in your terminal:

`ollama pull nomic-embed-text`

In [62]:
embeddings = OllamaEmbeddings(model="nomic-embed-text")

r1 = embeddings.embed_documents(
    texts
)
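
Vector search ranks these embeddings by a similarity measure. Cosine similarity is a common choice; whether the Pinecone index used later is configured with it depends on how that index was created, which this notebook does not show. The sketch below computes it for toy vectors, and `cosine_similarity` is a hypothetical helper rather than a library call.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy vectors: the first pair points the same way, the second pair is orthogonal.
print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # close to 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))            # 0.0
```

With the notebook's objects, `cosine_similarity(r1[0], embeddings.embed_query("..."))` would give a rough idea of how close a query is to the first chunk.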

#### Pinecone Index Setup and Upsert

Configures the Pinecone client, connects to an existing index named `ai-index`, and upserts the document embeddings with their source text as metadata. Also defines a helper function that embeds a query.

In [63]:
from os import getenv
from dotenv import load_dotenv
load_dotenv()
from pinecone import Pinecone

PINECONE_API_KEY=getenv("PINECONE_API_KEY")

pc = Pinecone(api_key=PINECONE_API_KEY)

index = pc.Index("ai-index")

for i in range(len(texts)):
    index.upsert([((str(i),r1[i],{"text":texts[i]}))])
    
print("done upserting...")

def get_query_embdedding(text):
    embedding=embeddings.embed_query(text)
    return embedding

done upserting...
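
The loop above sends one request per chunk. Pinecone's `upsert` also accepts a list of vectors, so the same data can be sent in batches. The sketch below builds records in dictionary form (`id`, `values`, `metadata`) and groups them. `build_records`, `batched`, and the batch size of 100 are illustrative choices, not values taken from this notebook.

```python
def build_records(texts, vectors):
    """Pair each chunk with its embedding, keeping the chunk text as metadata."""
    if len(texts) != len(vectors):
        raise ValueError("texts and vectors must have the same length")
    return [
        {"id": str(i), "values": vector, "metadata": {"text": text}}
        for i, (text, vector) in enumerate(zip(texts, vectors))
    ]


def batched(items, batch_size=100):
    """Yield consecutive slices of `items` holding at most `batch_size` elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Toy data standing in for `texts` and `r1`.
records = build_records(["a", "b", "c"], [[0.1], [0.2], [0.3]])
print([len(batch) for batch in batched(records, batch_size=2)])
# [2, 1]

# With the notebook's objects this would be used roughly as:
#   for batch in batched(build_records(texts, r1)):
#       index.upsert(vectors=batch)
```

Batching mainly reduces the number of network round trips; the stored vectors and metadata are the same as in the one-at-a-time loop.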


#### Cohere Setup and Query Embedding

Initializes the Cohere client with an API key, generates an embedding for a query, and performs a vector search in the Pinecone index to find similar documents.

In [64]:
import cohere
from dotenv import load_dotenv
load_dotenv()

COHERE_API_KEY=getenv("COHERE_API_KEY")
# init client
co = cohere.Client(COHERE_API_KEY)

query="what is two step verification?"

question_embedding=get_query_embdedding(query)

query_result = index.query(vector=question_embedding, top_k=5, include_metadata=True)
similar_texts = []
# Extract metadata from query result
docs = {x["metadata"]['text']: i for i, x in enumerate(query_result["matches"])}

# Print the top 5 results with their similarity scores
for i, match in enumerate(query_result["matches"]):
    text = match["metadata"]["text"]
    score = match["score"]
    print(f"Result {i}:")
    print(f"Text: {text}")
    print(f"Score: {score}\n")


Result 0:
Text: Two-step verification helps protect your account. When you log in, you'll be asked to provide a
one-time code that you'll receive through email, text, or an authenticator app. If one of the methods
isn't working, try a different method. You'll also receive a backup code when you enable two-step
verification, which lets you access your account. If you don't have your backup code, contact the
FSA help center: https://studentaid.gov/help-center/contactIf you created an FSA ID with a SSN, it will take 1-3 days to be verified.
If a parent/contributor created an FSA ID without an SSN and successfully answered the knowledge
based identity questions, it can be used immediately. If unable to confirm identity, an email will be
sent with a case number and instructions to submit additional documents. Contributors can complete
Score: 0.624541283

Result 1:
Text: If you entered an incorrect email, it's okay as long as the other items match. Once they log in with
their FSA ID, they'll
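
The `docs` dictionary in the cell above is keyed by chunk text, so identical chunks collapse into one entry and the similarity scores are not kept. A sketch of an alternative that preserves order and keeps each score, assuming matches have the shape accessed above (`match["metadata"]["text"]` and `match["score"]`). `unique_texts` is a hypothetical helper and the sample matches are made up.

```python
def unique_texts(matches):
    """Return (text, score) pairs in ranking order, skipping repeated texts."""
    seen = set()
    ordered = []
    for match in matches:
        text = match["metadata"]["text"]
        if text not in seen:
            seen.add(text)
            ordered.append((text, match["score"]))
    return ordered


# Made-up matches: the second and third carry the same chunk text.
sample_matches = [
    {"id": "0", "score": 0.62, "metadata": {"text": "chunk A"}},
    {"id": "7", "score": 0.55, "metadata": {"text": "chunk B"}},
    {"id": "9", "score": 0.51, "metadata": {"text": "chunk B"}},
]
print(unique_texts(sample_matches))
# [('chunk A', 0.62), ('chunk B', 0.55)]
```

The list of texts for re-ranking would then be `[text for text, _ in unique_texts(query_result["matches"])]`.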


#### Document Re-ranking with Cohere

Uses Cohere's re-ranking model to refine the search results based on relevance to the query, then prepares a template for generating a summary.

In [67]:
# Rerank the documents
rerank_docs = co.rerank(
    model="rerank-english-v3.0",
    query=query, 
    documents=list(docs.keys()), 
    top_n=5, 
    return_documents=True
)
print("rerank_docs...",rerank_docs)

# Extract reranked documents
reranked_texts = [doc.document.text for doc in rerank_docs.results]

context=" ".join(reranked_texts)

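# Prompt template; {context} and {question} are placeholders filled by str.format below.
# The original combined an f-string with .format(), which fails if the context contains braces.
template = (
    "Based on the following context : {context} "
    "generate precise summary related to question : {question} "
    "Do not remove necessary information related to context. "
    "Consider `\n` as newline character."
)
# Fill the template with the actual context and question.
filled_template = template.format(context=context, question=query)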

print(filled_template)


Based on the following context : Two-step verification helps protect your account. When you log in, you'll be asked to provide a
one-time code that you'll receive through email, text, or an authenticator app. If one of the methods
isn't working, try a different method. You'll also receive a backup code when you enable two-step
verification, which lets you access your account. If you don't have your backup code, contact the
FSA help center: https://studentaid.gov/help-center/contactIf you created an FSA ID with a SSN, it will take 1-3 days to be verified.
If a parent/contributor created an FSA ID without an SSN and successfully answered the knowledge
based identity questions, it can be used immediately. If unable to confirm identity, an email will be
sent with a case number and instructions to submit additional documents. Contributors can complete •provide increased security, and
•are generally faster than email or SMS text.
Step 1: Install an authenticator app
You can use an authentica
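
Re-ranking is a second pass: vector search returns candidates, then a relevance model scores each (query, document) pair and reorders them. The sketch below uses a crude word-overlap score as a stand-in for Cohere's model, purely to show the data flow. `tokenize`, `overlap_score`, and `rerank` are hypothetical helpers and the documents are made up.

```python
import re


def tokenize(text):
    """Lowercase `text` and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(query, document):
    """Fraction of query words that also appear in the document."""
    query_terms = tokenize(query)
    if not query_terms:
        return 0.0
    return len(query_terms & tokenize(document)) / len(query_terms)


def rerank(query, documents, top_n=5):
    """Return (original_index, score) pairs for the top_n documents, best first."""
    scored = [(i, overlap_score(query, doc)) for i, doc in enumerate(documents)]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # stable: ties keep input order
    return scored[:top_n]


candidates = [
    "Install an authenticator app.",
    "Two-step verification helps protect your account.",
    "Verification can take 1-3 days.",
]
print(rerank("what is two step verification?", candidates, top_n=2))
# [(1, 0.6), (2, 0.2)]
```

Each returned pair carries the document's position in the input list together with its score, which is what lets the caller map re-ranked entries back to the original candidates.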


#### Generate Summary with Groq

Configures the Groq client, sends the filled template to the chat model for processing, and prints the generated summary based on the context and query provided.

In [68]:
from os import getenv
from groq import Groq
from dotenv import load_dotenv
load_dotenv()

GROQ_API_KEY=getenv("GROQ_API_KEY")

client = Groq(
    api_key=GROQ_API_KEY ,
)

chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": filled_template,
        }
    ],
    model="llama-3.1-8b-instant",
)

print(chat_completion.choices[0].message.content)

Based on the context provided, two-step verification is a security feature that helps protect online accounts by requiring two separate forms of verification to access the account.

Two-step verification helps protect your account. When you log in, you'll be asked to provide a one-time code that you'll receive through email, text, or an authenticator app. This requires you to use a second form of verification, in addition to your regular login credentials.

The benefits of two-step verification include:

• Increased security: Two-step verification makes it harder for hackers to gain access to your account, even if they know your login credentials.

• Faster access: Two-step verification methods like authenticator apps are generally faster than email or SMS text.

Two-step verification can be set up using an authenticator app, such as Google Authenticator, Microsoft Authenticator, Authy, LastPass, or Duo Mobile. Alternatively, you can use SMS text or email to receive your authentication
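
The cell above sends instructions, context, and question as one user message. A sketch of an alternative that separates them, assuming the chat endpoint accepts a `system` role message alongside `user` messages. `build_messages` and `max_context_chars` are hypothetical, and the character cap is only a crude stand-in for a real token budget.

```python
def build_messages(context, question, max_context_chars=6000):
    """Build a chat message list with instructions separated from retrieved context."""
    trimmed = context[:max_context_chars]  # crude length cap, not token-aware
    return [
        {
            "role": "system",
            "content": (
                "Answer using only the provided context. "
                "If the context does not contain the answer, say so."
            ),
        },
        {
            "role": "user",
            "content": f"Context:\n{trimmed}\n\nQuestion: {question}",
        },
    ]


messages = build_messages("Two-step verification helps protect your account.",
                          "what is two step verification?")
print(messages[1]["content"])
```

The resulting list could be passed as the `messages` argument of `client.chat.completions.create` in place of the single user message.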

# Conclusion

This notebook walked through a Retrieval-Augmented Generation (RAG) pipeline with document re-ranking that produces a summary for a given query. It combined Groq for generation, Ollama for embeddings, Pinecone for vector search, and Cohere for re-ranking.

Key takeaways include:
- The ability to process and split PDF text into manageable chunks for further analysis.
- The use of Ollama embeddings to convert text chunks into vector representations, facilitating efficient document retrieval.
- The application of Pinecone's vector database for storing and querying document embeddings, enabling fast and scalable searches.
- The implementation of Cohere's re-ranking model to refine search results, ensuring the selection of the most relevant documents.
- The generation of comprehensive summaries using the Groq language model, based on the context provided by re-ranked documents.

Beyond serving as a practical guide to RAG with re-ranking, this notebook shows how separate retrieval, re-ranking, and generation services can be composed into a single pipeline. We hope it encourages further exploration and development of NLP applications.