# Vector & Graph Retrieval Augmented Generation

In this notebook we use the LlamaIndex toolkit to store and retrieve the dataset that serves as input to our LLM use case. The goal is to demonstrate how a standard LLM can be enhanced to answer use-case-specific questions with high accuracy.

## Introduction

## Imports

In [1]:
!pip install llama-index==0.9.43

Collecting llama-index==0.9.43
  Downloading llama_index-0.9.43-py3-none-any.whl (15.9 MB)
Collecting dataclasses-json (from llama-index==0.9.43)
  Downloading dataclasses_json-0.6.4-py3-none-any.whl (28 kB)
Collecting deprecated>=1.2.9.3 (from llama-index==0.9.43)
  Downloading Deprecated-1.2.14-py2.py3-none-any.whl (9.6 kB)
Collecting dirtyjson<2.0.0,>=1.0.8 (from llama-index==0.9.43)
  Downloading dirtyjson-1.0.8-py3-none-any.whl (25 kB)
Collecting httpx (from llama-index==0.9.43)
  Downloading httpx-0.26.0-py3-none-any.whl (75 kB)
Collecting openai>=1.1.0 (from llama-index==0.9.43)
  Downloading openai-1.12.0-py3-none-any.whl (226 kB)

In [2]:
%%capture
!pip install pypdf pyvis llama_index==0.9.43

In [9]:
# Import the llama_index components used in this notebook
from llama_index import VectorStoreIndex, SimpleDirectoryReader
from llama_index import ServiceContext
from llama_index.storage.storage_context import StorageContext
from llama_index.graph_stores import SimpleGraphStore
from llama_index import KnowledgeGraphIndex
from llama_index.llms import OpenAI
from llama_index.embeddings import OpenAIEmbedding

import pickle

from pyvis.network import Network

from ipywidgets import GridspecLayout
import ipywidgets as widgets

## API Key Setup

In [10]:
OPENAI_API_KEY = "tomoro_jpmc_event_2024"
API_BASE = "https://jpmcproxy-j2v7yxjplq-nw.a.run.app/v1"

# Point both the LLM and the embedding model at the same proxy endpoint.
llm = OpenAI(temperature=0, model="gpt-3.5-turbo", api_base=API_BASE, api_key=OPENAI_API_KEY)
embeds = OpenAIEmbedding(api_base=API_BASE, api_key=OPENAI_API_KEY)
# The service context bundles the LLM, the embedding model and the chunk size limit.
service_context = ServiceContext.from_defaults(llm=llm, chunk_size_limit=512, embed_model=embeds)
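
The cell above hard-codes the key and the proxy URL. A minimal sketch of reading them from the environment instead is shown here. The variable names `OPENAI_API_KEY` and `OPENAI_API_BASE` and the helper `load_api_settings` are assumptions for illustration, not something this notebook or the proxy requires.

```python
import os

# Hypothetical environment variable names: adjust to your own setup.
API_KEY_ENV_VAR = "OPENAI_API_KEY"
API_BASE_ENV_VAR = "OPENAI_API_BASE"


def load_api_settings(env=None):
    """Return (api_key, api_base) read from the environment.

    Raises RuntimeError if the key is missing. api_base is None when
    the base-URL variable is unset.
    """
    env = os.environ if env is None else env
    api_key = env.get(API_KEY_ENV_VAR)
    if not api_key:
        raise RuntimeError(f"Set the {API_KEY_ENV_VAR} environment variable first.")
    return api_key, env.get(API_BASE_ENV_VAR)
```

The returned values could then be passed as `api_key=` and `api_base=` to `OpenAI(...)` and `OpenAIEmbedding(...)` exactly as in the cell above.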




The documents we are querying relate to Alibaba (ticker: BABA): recent articles from Seeking Alpha and an earnings call transcript.

In [5]:
document_folder = './documents/'

### First a simple example implementing RAG via Vector Embeddings:

This code lets us query the documents: it retrieves the content most relevant to the question, then sends that content together with the question to the LLM.

In [11]:
# Load every file in the folder with LlamaIndex's directory reader
documents = SimpleDirectoryReader(document_folder).load_data()

# Chunk, embed and index the documents
simple_rag_index = VectorStoreIndex.from_documents(documents, service_context=service_context)

# Create a query engine on top of the index
rag_query_engine = simple_rag_index.as_query_engine()

# Run the question against the vector store; the retrieved context is passed to the LLM
response = rag_query_engine.query("How are Alibaba doing?")

print(f"Here is the model output: {response}")

Here is the model output: Alibaba is facing challenges in its business. The company's earnings have not been growing as expected, and it has suffered impairments in some of its ventures. Competition is increasing, both domestically and internationally, which is impacting Alibaba's market share. The company is trying to strengthen its position by investing more in its international activities, but investors have doubts about whether these expenses will justify the results. Additionally, concerns have been raised in Europe regarding the influx of Chinese small-ticket items, which are seen as unfair competition for European manufacturers. Overall, Alibaba's Cloud and Chinese e-commerce businesses have lost market share, and it remains uncertain if this trend can be reversed or if it will accelerate further.
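
To make the retrieval step concrete, here is a toy sketch of what "retrieve relevant content, then send it to the LLM" means. It is not LlamaIndex's implementation: the `embed` function is a bag-of-words stand-in for a real embedding model, the chunks are made-up sentences rather than text from the documents, and all function names are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, chunks, top_k=2):
    """Return the top_k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine_similarity(q, embed(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(question, context_chunks):
    """Combine the retrieved chunks and the question into one LLM prompt."""
    context = "\n".join(context_chunks)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


# Made-up chunks for illustration only.
chunks = [
    "cloud revenue growth slowed this quarter",
    "the weather in hangzhou was mild",
    "management discussed cloud revenue on the earnings call",
]
top = retrieve("what happened to cloud revenue", chunks, top_k=2)
prompt = build_prompt("what happened to cloud revenue", top)
```

Only the two chunks that share words with the question end up in the prompt; the unrelated one is left out, which is the point of retrieving before generating.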


### Example implementing RAG via Knowledge Graph:

In [None]:
graph_store = SimpleGraphStore()
storage_context = StorageContext.from_defaults(graph_store=graph_store)

kg_index = KnowledgeGraphIndex.from_documents(documents=documents,storage_context=storage_context,service_context=service_context)
kg_query_engine = kg_index.as_query_engine()
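
A knowledge graph index stores facts as (subject, predicate, object) triplets and looks them up by the entities mentioned in the question. The sketch below illustrates that idea with a hand-written store. `TinyGraphStore` and `retrieve_triplets` are hypothetical names, the triplets are typed in by hand rather than extracted by an LLM, and the matching is plain word lookup, which is much cruder than what the library does.

```python
from collections import defaultdict


class TinyGraphStore:
    """Toy triplet store: maps a subject to its (predicate, object) pairs."""

    def __init__(self):
        self._edges = defaultdict(list)

    def add_triplet(self, subject, predicate, obj):
        self._edges[subject.lower()].append((predicate, obj))

    def get(self, subject):
        return list(self._edges.get(subject.lower(), []))


def retrieve_triplets(question, store):
    """Return triplets whose subject appears as a word in the question."""
    words = {w.strip("?.,!").lower() for w in question.split()}
    found = []
    for word in sorted(words):
        for predicate, obj in store.get(word):
            found.append((word, predicate, obj))
    return found


# Hand-written triplets for illustration only.
store = TinyGraphStore()
store.add_triplet("Alibaba", "operates", "Cloud business")
store.add_triplet("Alibaba", "operates", "e-commerce business")
context = retrieve_triplets("How is Alibaba doing?", store)
```

The retrieved triplets would then be formatted as context for the LLM, in the same way the retrieved text chunks are in the vector-based approach.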


Save the knowledge graph index to a file so that it does not have to be rebuilt.

In [None]:
with open('kg_index.pkl', 'wb') as f:
    pickle.dump(kg_index, f)
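
For completeness, a minimal sketch of the matching load step. The helper names `save_object` and `load_object` are hypothetical, and whether a given index object pickles and unpickles cleanly depends on what it holds, so treat this as an assumption to verify rather than a guarantee.

```python
import pickle


def save_object(obj, path):
    """Serialize obj to path with pickle."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load_object(path):
    """Load a pickled object from path.

    Only unpickle files you created yourself: loading a pickle
    can execute arbitrary code.
    """
    with open(path, "rb") as f:
        return pickle.load(f)
```

In a later session, `kg_index = load_object('kg_index.pkl')` would restore the index saved above, after which `kg_index.as_query_engine()` can be called again.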

Export the index as a NetworkX graph (https://networkx.org/documentation/stable/tutorial.html) and render it as an interactive HTML page with pyvis.

In [None]:
g = kg_index.get_networkx_graph()
net = Network(notebook=True, cdn_resources="in_line", directed=True)
net.from_nx(g)
net.show("AlibabaGraph.html")

Ask a question through the knowledge graph, which collects the information relevant to the question being asked.

In [None]:
response = kg_query_engine.query('What are the most divergent opinions on Alibaba?')
print(response)

In [None]:
def rag_query(prompt):
    return rag_query_engine.query(prompt)

def kg_query(prompt):
    return kg_query_engine.query(prompt)

This is a simple UI that lets you ask questions about the documents using either technique.

In [None]:
grid = GridspecLayout(8,6)


submit_button = widgets.Button(
    description='Query',
    disabled=False,
    button_style='', # 'success', 'info', 'warning', 'danger' or ''
    tooltip='Press to submit',
    icon='lock',
    layout=widgets.Layout(height='auto', width='auto'),
)

input_box = widgets.Textarea(
    value=None,
    placeholder='Type a question about your documents',
    description='Question:',
    disabled=False,
    layout=widgets.Layout(height='auto', width='auto'),
    rows=4
)

results_box = widgets.Textarea(
    value=None,
    placeholder='Results...',
    description='Answer:',
    disabled=False,
    layout=widgets.Layout(height='auto', width='auto'),
    rows=4
)

# The options are replaced further down with the keys of filter_methods.
dropdown = widgets.Select(
    options=['RAG','KG Rag'],
    value='RAG',
    # rows=10,
    description='Method:',
    disabled=False
)
# Map each dropdown label to the function that answers with that technique.
filter_methods = {'RAG': rag_query, 'KG Rag': kg_query}
dropdown.options = list(filter_methods)

grid[4:7,5] = dropdown
grid[1:4,:5] = input_box
grid[1:3,5] = submit_button
grid[4:8,:5] = results_box

# Button callback: run the question through the selected method and show the answer.
def question(e):
    user_input = input_box.value
    method = filter_methods[dropdown.value]
    results_box.value = method(user_input).response

submit_button.on_click(question)

grid
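
The UI answers with one technique at a time. To compare both answers to the same question side by side, a small helper along these lines might be useful. `compare_methods` is a hypothetical name, and the sketch assumes each query function returns an object whose `str()` is the answer text, as the `print` calls earlier in the notebook suggest.

```python
def compare_methods(prompt, methods):
    """Run prompt through every method and collect the answers as text.

    methods maps a label to a callable taking the prompt. A failure in
    one method is recorded instead of aborting the whole comparison.
    """
    answers = {}
    for name, query_fn in methods.items():
        try:
            answers[name] = str(query_fn(prompt))
        except Exception as exc:
            answers[name] = f"[{name} failed: {exc}]"
    return answers
```

For example, `compare_methods("How are Alibaba doing?", filter_methods)` would return a dictionary with one entry per technique.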