# RAG
- RAG is a technique for augmenting an LLM's knowledge with additional data.
- LLMs can reason about a wide range of topics, but their knowledge is limited to the public data they were trained on, up to a specific cutoff date.
- To build AI applications that reason about private data, or about data published after a model's cutoff date, you need to supply the model with the specific information it needs.
- The process of retrieving the relevant information and inserting it into the model prompt is known as retrieval-augmented generation (RAG).
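The retrieve-then-insert flow described above can be sketched in plain Python. This is a toy under stated assumptions: a word-overlap score stands in for embedding similarity, the corpus is three made-up strings (the revenue figure is the one reported at the end of this notebook), and `retrieve` and `build_prompt` are hypothetical helpers, not LangChain APIs. The sketch returns the assembled prompt instead of calling an LLM.

```python
import re

def tokenize(text):
    """Lowercase word tokens; a crude stand-in for an embedding model."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, corpus, k=2):
    """Return the k passages that share the most words with the question."""
    q = tokenize(question)
    ranked = sorted(corpus, key=lambda doc: len(q & tokenize(doc)), reverse=True)
    return ranked[:k]

def build_prompt(question, passages):
    """Insert the retrieved passages into the prompt as grounding context."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

corpus = [
    "Q1 FY24 revenue was $7,192 million.",
    "Milvus is a vector database.",
    "Mixtral is a sparse mixture-of-experts model.",
]
question = "What was Q1 FY24 revenue?"
passages = retrieve(question, corpus, k=1)
prompt = build_prompt(question, passages)
print(prompt)
```

In the notebook below, `retrieve` corresponds to a Milvus similarity search over embeddings and `build_prompt` to the LangChain `PromptTemplate`.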

# LangChain
LangChain is a framework for developing applications powered by large language models (LLMs).

https://python.langchain.com/docs/introduction/

# NIM
NVIDIA NIM is a set of optimized, cloud-native microservices designed to shorten time to market and simplify the deployment of generative AI models anywhere: in the cloud, in the data center, or on GPU-accelerated workstations. It expands the developer pool by abstracting away the complexity of AI model development and packaging for production, using industry-standard APIs.

https://developer.nvidia.com/blog/nvidia-nim-offers-optimized-inference-microservices-for-deploying-ai-models-at-scale/

https://docs.api.nvidia.com/nim/reference/llm-apis

![image.png](images/NIM.png)
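"Industry-standard APIs" here means that NIM LLM endpoints follow the OpenAI-style chat completions format. The sketch below builds, but does not send, such a request with only the standard library. The base URL is an assumption (the hosted API catalog endpoint at the time of writing; a self-hosted NIM would use its own address), the key is a placeholder, and `build_chat_request` is a hypothetical helper.

```python
import json
import urllib.request

# Assumed base URL of the hosted API catalog; a self-hosted NIM has its own
BASE_URL = "https://integrate.api.nvidia.com/v1"

def build_chat_request(api_key, model, user_message, max_tokens=1024):
    """Build (without sending) an OpenAI-style chat completions request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("nvapi-placeholder", "mistralai/mixtral-8x7b-instruct-v0.1", "Hello")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```

`ChatNVIDIA`, used later in this notebook, wraps this kind of request so you do not have to build it by hand.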

# NVIDIA API Catalog
https://docs.api.nvidia.com/

- The NVIDIA API Catalog is a hosted platform for accessing a wide range of microservices online.
- You can test models in the catalog and then export them with an NVIDIA AI Enterprise license for on-premises or cloud deployment.

# Milvus Vector Store
https://milvus.io/docs

Milvus is a high-performance, highly scalable vector database that runs efficiently across a wide range of environments, from a laptop to large-scale distributed systems. It is available as both open-source software and a cloud service.

Milvus is an open-source project under LF AI & Data Foundation distributed under the Apache 2.0 license. Most contributors are experts from the high-performance computing (HPC) community, specializing in building large-scale systems and optimizing hardware-aware code.
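At its core, a vector store answers "which stored vectors are closest to this query vector?". The sketch below does this by brute force with cosine similarity over toy 3-dimensional vectors (made up for illustration). Milvus avoids scanning every vector by building approximate indexes such as HNSW or IVF; this exact search is what those indexes approximate. `cosine` and `top_k` are hypothetical helper names.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query, vectors, k=2):
    """Exact nearest-neighbour search: score every stored vector, keep the best k."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Toy "embeddings" keyed by document ID
store = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.7, 0.7, 0.0],
    "doc-c": [0.0, 0.0, 1.0],
}
results = top_k([1.0, 0.1, 0.0], store, k=2)
print(results)
```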

# Mistral mixtral-8x7b-instruct

https://docs.api.nvidia.com/nim/reference/mistralai-mixtral-8x7b-instruct


Mixtral 8x7B Instruct is a language model that can follow instructions, complete requests, and generate creative text formats. Mixtral 8x7B is a high-quality sparse mixture-of-experts (SMoE) model with open weights.
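"Sparse mixture of experts" means that each Mixtral layer holds 8 expert feed-forward blocks, and a router picks only 2 of them for each token, mixing their outputs with a softmax over the two selected router scores. The sketch below shows that top-2 routing on a single scalar input. The experts are toy scaling functions and the router logits are given directly instead of coming from a learned layer; `top2_moe` is a hypothetical name.

```python
import math

def top2_moe(x, gate_logits, experts):
    """Route input x to the 2 highest-scoring experts and mix their outputs."""
    top2 = sorted(range(len(experts)), key=lambda i: gate_logits[i], reverse=True)[:2]
    # Softmax over the two selected experts only, so their weights sum to 1
    exps = [math.exp(gate_logits[i]) for i in top2]
    weights = [e / sum(exps) for e in exps]
    # Only the 2 chosen experts are evaluated; the other 6 are skipped
    output = sum(w * experts[i](x) for w, i in zip(weights, top2))
    return output, top2

# Eight toy "experts": expert i multiplies its input by i + 1
experts = [lambda x, s=s: s * x for s in range(1, 9)]
logits = [0.1, 2.0, -1.0, 0.5, 1.0, 0.0, -0.5, 0.3]
out, chosen = top2_moe(1.0, logits, experts)
print(chosen, out)
```

Because only 2 of 8 experts run per token, the compute per token is a fraction of what the total parameter count suggests, which is the basis of Mixtral's inference-speed advantage.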

This model has been optimized through supervised fine-tuning and direct preference optimization (DPO) for careful instruction following. On MT-Bench, it reaches a score of 8.30, which made it the best open-source model at the time of its release, with performance comparable to GPT-3.5.

Mixtral outperforms Llama 2 70B on most benchmarks with 6x faster inference. At release, it was the strongest open-weight model with a permissive license and the best model overall in terms of cost/performance trade-offs. In particular, it matches or outperforms GPT-3.5 on most standard benchmarks.

Mixtral has the following capabilities:

- It handles a context of up to 32k tokens.
- It handles English, French, Italian, German, and Spanish.
- It shows strong performance in code generation.
- It can be fine-tuned into an instruction-following model that achieves a score of 8.3 on MT-Bench.

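```python
import os
from dotenv import dotenv_values

# Read the API key from keys/.env under the current working directory
ROOT_DIR = os.getcwd()
config = dotenv_values(os.path.join(ROOT_DIR, "keys", ".env"))

# ChatNVIDIA and NVIDIAEmbeddings read the key from this environment variable
api_key = config.get("NVIDIA_API_KEY")
if not api_key:
    raise ValueError("NVIDIA_API_KEY is missing from keys/.env")
os.environ["NVIDIA_API_KEY"] = api_key
```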

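```python
from langchain_nvidia_ai_endpoints import ChatNVIDIA, NVIDIAEmbeddings

# Chat model used for table summaries and answer generation
llm = ChatNVIDIA(model="mistralai/mixtral-8x7b-instruct-v0.1", max_tokens=1024)
# Embedding model; truncate="END" trims over-long inputs from the end
embedder_document = NVIDIAEmbeddings(model="NV-Embed-QA", truncate="END")

# Smoke test: confirm that the endpoint generates a response
print(llm.invoke("Say hello in one sentence.").content)
```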

```python
import requests

urls_content = []

# Q1-Q3 and Q4 press releases use slightly different URL slugs
url_template1 = "https://nvidianews.nvidia.com/news/nvidia-announces-financial-results-for-{quarter}-quarter-fiscal-{year}"
url_template2 = "https://nvidianews.nvidia.com/news/nvidia-announces-financial-results-for-{quarter}-quarter-and-fiscal-{year}"

# 4 quarters x fiscal years 2020-2024 = 20 press releases
for quarter in ["first", "second", "third", "fourth"]:
    for year in range(2020, 2025):
        template = url_template2 if quarter == "fourth" else url_template1
        url = template.format(quarter=quarter, year=year)
        response = requests.get(url, timeout=30)
        if response.status_code != 200:
            print(f"warning: {url} returned HTTP {response.status_code}")
        urls_content.append(response.content)
```

```python
# Extract the URL, title, text content, and tables from each HTML page
from bs4 import BeautifulSoup
import markdownify

def extract_url_title_content_tables(soup):
    url = ""
    title = ""
    tables = []
    try:
        title_tag = soup.find("title")
        if title_tag:
            title = title_tag.get_text(strip=True)

        og_url_meta = soup.find("meta", property="og:url")
        if og_url_meta:
            url = og_url_meta.get("content", "")

        # Convert each table to markdown, then remove it from the tree
        # so its cells are not duplicated in the plain-text content
        for table in soup.find_all("table"):
            tables.append(markdownify.markdownify(str(table)))
            table.decompose()

        # Collapse all whitespace runs into single spaces
        text_content = " ".join(soup.get_text(separator=" ", strip=True).split())

        return url, title, text_content, tables
    except Exception as e:
        print(f"parse error: {e}")
        # Same 4-tuple shape as the success path
        return "", "", "", []

parsed_htmls = []
for url_content in urls_content:
    soup = BeautifulSoup(url_content, "html.parser")
    url, title, content, tables = extract_url_title_content_tables(soup)
    parsed_htmls.append({"url": url, "title": title, "content": content, "tables": tables})
```
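To see what the title and `og:url` lookups do without BeautifulSoup, here is a stdlib-only sketch using `html.parser.HTMLParser`. It is an illustration, not what this notebook uses: it skips table handling entirely, and `MetaExtractor` plus the sample HTML are hypothetical.

```python
from html.parser import HTMLParser

class MetaExtractor(HTMLParser):
    """Collect the <title> text and the og:url meta tag using only the stdlib."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.url = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("property") == "og:url":
            self.url = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Text events inside <title> are accumulated into the title
        if self._in_title:
            self.title += data

html = ('<html><head><title>Q1 Results | Newsroom</title>'
        '<meta property="og:url" content="http://example.com/q1"></head></html>')
parser = MetaExtractor()
parser.feed(html)
print(parser.url, "|", parser.title.strip())
```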

```python
parsed_htmls[0]["url"]
```

```
'http://nvidianews.nvidia.com/news/nvidia-announces-financial-results-for-first-quarter-fiscal-2020'
```


````python
import os

# Summarize each markdown table with the LLM so it is indexed as concise prose
def get_table_summary(table, title, llm):
    res = ""
    try:
        prompt = f"""
                    [INST] You are a virtual assistant. Your task is to understand the content of TABLE in the markdown format.
                    TABLE is from "{title}". Summarize the information in TABLE into SUMMARY. SUMMARY MUST be concise. Return SUMMARY only and nothing else.
                    TABLE: ```{table}```
                    Summary:
                    [/INST]
                """
        result = llm.invoke(prompt)
        res = result.content
    except Exception as e:
        # Log and continue with an empty summary instead of stopping the loop
        print(f"Error: {e} while getting table summary from LLM")
        if not os.getenv("NVIDIA_API_KEY"):
            print("NVIDIA_API_KEY not set")
    return res


for parsed_item in parsed_htmls:
    title = parsed_item["title"]
    for idx, table in enumerate(parsed_item["tables"]):
        print(f"parsing tables in {title}...")
        # Replace the raw markdown table with its summary, in place
        parsed_item["tables"][idx] = get_table_summary(table, title, llm)
````

```
parsing tables in NVIDIA Announces Financial Results for First Quarter Fiscal 2020 | NVIDIA Newsroom...
parsing tables in NVIDIA Announces Financial Results for First Quarter Fiscal 2020 | NVIDIA Newsroom...
parsing tables in NVIDIA Announces Financial Results for First Quarter Fiscal 2020 | NVIDIA Newsroom...
parsing tables in NVIDIA Announces Financial Results for First Quarter Fiscal 2020 | NVIDIA Newsroom...
parsing tables in NVIDIA Announces Financial Results for First Quarter Fiscal 2020 | NVIDIA Newsroom...
parsing tables in NVIDIA Announces Financial Results for First Quarter Fiscal 2020 | NVIDIA Newsroom...
parsing tables in NVIDIA Announces Financial Results for First Quarter Fiscal 2021 | NVIDIA Newsroom...
parsing tables in NVIDIA Announces Financial Results for First Quarter Fiscal 2021 | NVIDIA Newsroom...
parsing tables in NVIDIA Announces Financial Results for First Quarter Fiscal 2021 | NVIDIA Newsroom...
parsing tables in NVIDIA Announces Financial Results for First Q
```
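One caveat of the loop above: when the LLM call fails, `get_table_summary` returns an empty string, and that empty string replaces the table and is later indexed as a document. A minimal sketch of a safer variant, assuming you would rather keep the raw markdown table than lose it. `summarize_tables` and `fake_summarize` are hypothetical; the stub stands in for the LLM so the example runs offline.

```python
def summarize_tables(parsed_items, summarize):
    """Replace each table with its summary, keeping the raw table if summarizing fails.

    `summarize(table, title)` is any callable returning a string; in the notebook
    this role is played by get_table_summary(table, title, llm).
    """
    for item in parsed_items:
        summaries = []
        for table in item["tables"]:
            summary = summarize(table, item["title"]).strip()
            # Fall back to the markdown table rather than indexing an empty string
            summaries.append(summary or table)
        item["tables"] = summaries
    return parsed_items

def fake_summarize(table, title):
    # Stand-in for the LLM: "fails" (returns "") on tables without a Revenue row
    return f"Summary of {title}" if "Revenue" in table else ""

items = [{"title": "Q1", "tables": ["| Revenue | 100 |", "| Other | 5 |"]}]
result = summarize_tables(items, fake_summarize)
print(result[0]["tables"])
```

With the notebook's objects, the call would be `summarize_tables(parsed_htmls, lambda t, title: get_table_summary(t, title, llm))`.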

```python
parsed_item.keys()
```

```
dict_keys(['url', 'title', 'content', 'tables'])
```

```python
len(parsed_htmls)
```

```
20
```

```python
parsed_htmls[0]['tables']
```

```
["In Q1 FY20, NVIDIA's GAAP revenue was $2,220 million, up 1% Q/Q and down 31% Y/Y. Gross margin was 58.4%, up 3.7% Q/Q and down 610 bps Y/Y. Operating expenses were $938 million, up 3% Q/Q and 21% Y/Y. Operating income was $358 million, up 22% Q/Q and down 72% Y/Y. Net income was $394 million, down 31% Q/Q and 68% Y/Y. Diluted earnings per share were $0.64, down 30% Q/Q and 68% Y/Y.",
 'Q1 FY20 revenue was $2.22 billion, up 1% quarter-over-quarter and down 31% year-over-year. Gross margin was 59.0%, up 300 bps QoQ and down 570 bps YoY. Operating expenses increased 16% YoY to $753 million. Operating income was $557 million, up 16% QoQ but down 61% YoY. Net income was $543 million, up 9% QoQ and down 58% YoY. Diluted EPS was $0.88, up 10% QoQ and down 57% YoY.',
 "In Q1 FY2020, NVIDIA's revenue was $2.22 billion, down from $3.21 billion in Q1 FY2019. Gross profit was $1.29 billion, with income from operations at $358 million. Net income stood at $394 million, or $0.64 per diluted share,
```

```python
parsed_item['url']
```

```
'http://nvidianews.nvidia.com/news/nvidia-announces-financial-results-for-fourth-quarter-and-fiscal-2024'
```

```python
parsed_item['title']
```

```
'NVIDIA Announces Financial Results for Fourth Quarter and Fiscal 2024 | NVIDIA Newsroom'
```


```python
parsed_item['tables'][0]
```

```
"In Q4 FY24, NVIDIA's GAAP revenue was $22,103 million, up 22% Q/Q and 265% Y/Y. Gross margin was 76.0%, an increase of 2.0 points Q/Q and 12.7 points Y/Y. Operating expenses were $3,176 million, up 6% Y/Y. Operating income was $13,615 million, a 31% increase Q/Q and 983% Y/Y. Net income stood at $12,285 million, up 33% Q/Q and 769% Y/Y, with diluted earnings per share at $4.93, a 33% increase Q/Q and 765% Y/Y."
```

# Splitter Model
- https://huggingface.co/intfloat/e5-large-v2
- https://api.python.langchain.com/en/latest/sentence_transformers/langchain_text_splitters.sentence_transformers.SentenceTransformersTokenTextSplitter.html
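`SentenceTransformersTokenTextSplitter` tokenizes the text with the model's tokenizer and cuts it into fixed-size token windows that overlap, then decodes each window back to text. The sketch below shows only the windowing arithmetic with this notebook's settings (200 tokens per chunk, 50 tokens of overlap, so each window starts 150 tokens after the previous one), on a list of integers standing in for token IDs. `split_tokens` is a hypothetical name. The decode step is also why the chunks printed later in this notebook appear lowercased with spaces around punctuation (for example `$ 7. 103 billion`): the e5-large-v2 tokenizer appears to be uncased.

```python
def split_tokens(tokens, tokens_per_chunk=200, chunk_overlap=50):
    """Slide a fixed-size window over a token list.

    Consecutive windows share `chunk_overlap` tokens, so each window starts
    tokens_per_chunk - chunk_overlap tokens after the previous one.
    """
    if chunk_overlap >= tokens_per_chunk:
        raise ValueError("chunk_overlap must be smaller than tokens_per_chunk")
    step = tokens_per_chunk - chunk_overlap
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + tokens_per_chunk])
        # Stop once a window has reached the end of the input
        if start + tokens_per_chunk >= len(tokens):
            break
        start += step
    return chunks

tokens = list(range(500))  # pretend these are 500 token IDs
chunks = split_tokens(tokens)
print([(c[0], c[-1]) for c in chunks])
```

The overlap means a sentence cut at a window boundary still appears whole in at least one chunk, at the cost of storing roughly a third more tokens with these settings.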

```python
from langchain_core.documents import Document
from langchain_text_splitters import SentenceTransformersTokenTextSplitter

# Split on the token boundaries of the e5-large-v2 tokenizer
TEXT_SPLITTER_MODEL = "intfloat/e5-large-v2"
TEXT_SPLITTER_CHUNK_SIZE = 200    # tokens per chunk
TEXT_SPLITTER_CHUNK_OVERLAP = 50  # tokens shared by consecutive chunks

text_splitter = SentenceTransformersTokenTextSplitter(
    model_name=TEXT_SPLITTER_MODEL,
    tokens_per_chunk=TEXT_SPLITTER_CHUNK_SIZE,
    chunk_overlap=TEXT_SPLITTER_CHUNK_OVERLAP,
)

documents = []

for parsed_item in parsed_htmls:
    metadata = {"title": parsed_item["title"], "url": parsed_item["url"]}
    # One document for the page text...
    documents.append(Document(page_content=parsed_item["content"], metadata=metadata))
    # ...and one per table summary, all sharing the page's title and URL
    for table_summary in parsed_item["tables"]:
        documents.append(Document(page_content=table_summary, metadata=dict(metadata)))

documents = text_splitter.split_documents(documents)
print(f"obtain {len(documents)} chunks")
```

```
obtain 746 chunks
```


```python
documents[0]
```

```
Document(metadata={'title': 'NVIDIA Announces Financial Results for First Quarter Fiscal 2020 | NVIDIA Newsroom', 'url': 'http://nvidianews.nvidia.com/news/nvidia-announces-financial-results-for-first-quarter-fiscal-2020'}, page_content='nvidia announces financial results for first quarter fiscal 2020 | nvidia newsroom " artificial intelligence computing leadership from nvidia platforms autonomous machines cloud & data center deep learning & ai design & pro visualization healthcare high performance computing self - driving cars gaming & entertainment other links developers industries shop drivers support about nvidia view all products gpu technology conference nvidia blog community careers technologies newsroom nvidia in brief exec bios nvidia blog podcast media assets in the news press contacts online press kits nvidia in brief exec bios nvidia blog podcast media assets in the news press contacts online press kits press release share tweet twitter share linkedin share facebook email i
```

```python
# A local file path makes langchain_milvus use Milvus Lite (embedded, no server needed)
URI = "./milvus_example.db"
```

```python
from langchain_milvus import Milvus

COLLECTION_NAME = "NVIDIA_Finance"

# Embed every chunk and store it in the collection
vectorstore = Milvus.from_documents(
    documents,
    embedder_document,
    collection_name=COLLECTION_NAME,
    # For a Milvus server, set URI to its address instead, e.g. "http://<host>:19530"
    connection_args={"uri": URI},
    drop_old=True,  # recreate the collection on every run
)
```

```python
docs = vectorstore.similarity_search_with_score("what are 2024 Q3 revenues? ")
docs
```

```
[(Document(metadata={'pk': 453541554316116439, 'title': 'NVIDIA Announces Financial Results for Third Quarter Fiscal 2022 | NVIDIA Newsroom', 'url': 'http://nvidianews.nvidia.com/news/nvidia-announces-financial-results-for-third-quarter-fiscal-2022'}, page_content='revenue for q3 fy22 was $ 7. 103 billion, a 9 % q / q and 50 % y / y increase. gross margin was 67. 0 %, up 30 bps q / q and 150 bps y / y. operating income was $ 3. 386 billion, a 10 % q / q and 70 % y / y increase. net income was $ 2. 973 billion, a 13 % q / q and 62 % y / y increase. diluted earnings per share were $ 1. 17, a 13 % q / q and 60 % y / y increase. operating expenses were up 9 % q / q and 25 % y / y.'),
  0.6031383275985718),
 (Document(metadata={'pk': 453541554316116672, 'title': 'NVIDIA Announces Financial Results for Fourth Quarter and Fiscal 2023 | NVIDIA Newsroom', 'url': 'http://nvidianews.nvidia.com/news/nvidia-announces-financial-results-for-fourth-quarter-and-fiscal-2023'}, page_content='period in q4
```
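The query above asks about Q3 2024, yet the top hit is the Q3 fiscal 2022 release: the embedding matches the shape of a quarterly summary more than the year. One mitigation is to restrict candidates by metadata. The sketch below does this in plain Python after retrieval, assuming that a 4-digit year in the question means NVIDIA's fiscal year and that it appears in the title as "Fiscal YYYY". The helpers are hypothetical and the hits and scores are made up. Filtering inside Milvus before ranking is usually better, since a post-filter can discard every hit; `langchain_milvus` exposes boolean filter expressions through an `expr` argument on its search methods, but check the expression syntax against your Milvus version.

```python
import re

def requested_year(question):
    """Hypothetical helper: pull a 4-digit year out of the question, if any."""
    match = re.search(r"\b(20\d\d)\b", question)
    return match.group(1) if match else None

def filtered_search(question, hits):
    """Keep only hits whose title mentions the requested fiscal year.

    `hits` is a list of (title, score) pairs, e.g. built from the
    (Document, score) tuples returned by similarity_search_with_score.
    """
    year = requested_year(question)
    if year is None:
        return hits
    return [(title, score) for title, score in hits if f"Fiscal {year}" in title]

# Made-up hits and scores for illustration
hits = [
    ("NVIDIA Announces Financial Results for Third Quarter Fiscal 2022", 0.60),
    ("NVIDIA Announces Financial Results for Third Quarter Fiscal 2024", 0.65),
]
result = filtered_search("what are 2024 Q3 revenues?", hits)
print(result)
```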

````python
from langchain_core.prompts import PromptTemplate

PROMPT_TEMPLATE = """[INST]You are a friendly virtual assistant and maintain a conversational, polite, patient, friendly and gender-neutral tone throughout the conversation.

Your task is to understand the QUESTION, read the Content list from the DOCUMENT delimited by ```, generate an answer based on the Content, and provide references used in answering the question in the format "[Title](URL)".
Do not depend on outside knowledge or fabricate responses.
DOCUMENT: ```{context}```

Your response should follow these steps:

1. The answer should be short, concise, and clear.
    * If detailed instructions are required, present them in an ordered list or bullet points.
2. If the answer to the question is not available in the provided DOCUMENT, ONLY respond that you couldn't find any information related to the QUESTION, and do not show references and citations.
3. Citation
    * ALWAYS start the citation section with "Here are the sources used to generate this response:" and follow with references in markdown link format [Title](URL) to support the answer.
    * Use bullets to display the reference [Title](URL).
    * You MUST ONLY use the URL extracted from the DOCUMENT as the reference link. DO NOT fabricate or use any link outside the DOCUMENT as reference.
    * Avoid over-citation. Only include references that were directly used in generating the response.
    * If no reference URL can be provided, remove the entire citation section.
    * The Citation section can include one or more references. DO NOT include the same URL as multiple references. ALWAYS append the citation section at the end of your response.
    * You MUST follow the below format as an example for this citation section:
      Here are the sources used to generate this response:
      * [Title](URL)
[/INST]
[INST]
QUESTION: {question}
FINAL ANSWER:[/INST]"""

prompt_template = PromptTemplate(template=PROMPT_TEMPLATE, input_variables=["context", "question"])
````

```python
def build_context(chunks):
    # Flatten the retrieved chunks into "Content | Title | URL" entries
    # so the LLM can cite the title and URL of each source
    context = ""
    for chunk in chunks:
        context += (
            "\n  Content: " + chunk.page_content
            + " | Title: (" + chunk.metadata["title"] + ")"
            + " | URL: (" + chunk.metadata.get("url", "source") + ")"
        )
    return context


def generate_answer(llm, vectorstore, prompt_template, question):
    # Retrieve the most similar chunks (k=4 by default), fill the prompt, and ask the LLM
    retrieved_chunks = vectorstore.similarity_search(question)
    context = build_context(retrieved_chunks)
    prompt = prompt_template.format(context=context, question=question)
    ans = llm.invoke(prompt)
    return ans.content


question = "what are 2024 Q1 revenues?"
```
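Mixtral's 32k-token window is large, but if you raise the number of retrieved chunks, the context can still grow past what you want to send. The sketch below is a hypothetical variant of `build_context` that keeps the same entry format but stops once a character budget would be exceeded, using characters as a rough proxy for tokens. `build_context_capped` and `max_chars` are assumptions, and `SimpleNamespace` objects stand in for LangChain `Document`s, since only `page_content` and `metadata` are read.

```python
from types import SimpleNamespace

def build_context_capped(chunks, max_chars=8000):
    """Same entry format as build_context, but stop before exceeding max_chars."""
    context = ""
    for chunk in chunks:
        entry = (f"\n  Content: {chunk.page_content}"
                 f" | Title: ({chunk.metadata['title']})"
                 f" | URL: ({chunk.metadata.get('url', 'source')})")
        if len(context) + len(entry) > max_chars:
            break
        context += entry
    return context

# Stand-ins for retrieved Document objects (toy content)
docs = [
    SimpleNamespace(page_content="q1 fy24 revenue was $ 7, 192 million.",
                    metadata={"title": "Q1 FY24", "url": "http://example.com/q1"}),
    SimpleNamespace(page_content="x" * 200, metadata={"title": "Long"}),
]
capped = build_context_capped(docs, max_chars=150)
print(capped)
```

Chunks arrive ranked by similarity, so cutting from the end drops the least relevant ones first.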

```python
generate_answer(llm, vectorstore, prompt_template, question)
```

```
"NVIDIA's Q1 fiscal 2024 revenue was $7,192 million.\n\nHere are the sources used to generate this response:\n- [NVIDIA Announces Financial Results for First Quarter Fiscal 2024 | NVIDIA Newsroom](http://nvidianews.nvidia.com/news/nvidia-announces-financial-results-for-first-quarter-fiscal-2024)"
```
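The prompt tells the model to cite only URLs that appear in the DOCUMENT, but nothing enforces it. A minimal post-check sketch: extract the markdown links from the answer and flag any URL that was not among the retrieved chunks. The helper names are hypothetical, and the second link in the sample answer is deliberately made up. In the notebook, the allowed URLs would be `[c.metadata["url"] for c in retrieved_chunks]`, which would require `generate_answer` to also return its chunks.

```python
import re

def cited_urls(answer):
    """Return the URLs of all markdown links [text](url) in the answer."""
    return re.findall(r"\[[^\]]*\]\((https?://[^)\s]+)\)", answer)

def check_citations(answer, allowed_urls):
    """Split cited URLs into ones found in the retrieved chunks and ones that are not."""
    allowed = set(allowed_urls)
    cited = cited_urls(answer)
    return [u for u in cited if u in allowed], [u for u in cited if u not in allowed]

source = "http://nvidianews.nvidia.com/news/nvidia-announces-financial-results-for-first-quarter-fiscal-2024"
answer = ("NVIDIA's Q1 fiscal 2024 revenue was $7,192 million.\n\n"
          "Here are the sources used to generate this response:\n"
          f"- [Q1 FY24]({source})\n"
          "- [Other](http://example.com/made-up)")
ok, unknown = check_citations(answer, [source])
print(unknown)
```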