# LlamaIndex Deep Dive

## Using LLMs

In [10]:
# Option 1: the paid OpenAI API; load_dotenv() reads API keys (e.g. OPENAI_API_KEY) from a local .env file

from llama_index.llms.openai import OpenAI

import dotenv

dotenv.load_dotenv()

True

In [3]:
from llama_index.llms.ollama import Ollama
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.core.embeddings import resolve_embed_model

In [4]:
# global settings

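# local BGE embedding model (BAAI/bge-small-en-v1.5)
Settings.embed_model = resolve_embed_model("local:BAAI/bge-small-en-v1.5")

# default LLM for indexes and query engines: Mistral served locally by Ollama
Settings.llm = Ollama(model="mistral", request_timeout=30.0)

# separate handle for the direct completion calls below
llm = Ollama(model="mistral", request_timeout=30.0)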

In [11]:
response = llm.complete("Rafael Nadal is ")
print(response)

 Rafael Nadal is a professional tennis player from Spain. He is widely regarded as one of the greatest tennis players of all time. Nadal has won a total of 20 Grand Slam titles, including a record 13 French Open titles, making him the most successful tennis player in history at Roland Garros. His dominant play on clay courts earned him the moniker "The King of Clay." In addition to his success in Grand Slam tournaments, Nadal has also held the No. 1 spot in the ATP rankings for a total of 209 weeks. He continues to compete at the highest level and is known for his impressive work ethic, tenacity, and powerful left-handed game.


In [16]:
# switch to llama2 model
llm = Ollama(model="llama2", request_timeout=60.0)
response = llm.complete("Rafael Nadal is ")
print(response)

Rafael Nadal is a professional tennis player from Spain. He is widely considered one of the greatest tennis players of all time, and has won numerous awards and accolades throughout his career. Some of his achievements include:

* 19 Grand Slam titles, including 4 French Open titles and 4 US Open titles.
* 5 ATP Finals titles.
* 3 Davis Cup titles with Spain.
* An Olympic gold medal in singles at the 2008 Beijing Olympics.
* Ranked as the world number one for a record 260 weeks.
* Won 87 ATP titles overall.

Nadal is known for his aggressive playing style, which includes his powerful topspin forehand and quick footwork around the court. He has been praised for his athleticism, work ethic, and mental toughness, and is widely regarded as one of the greatest tennis players of all time.


In [5]:
# load the source documents for context (the answers below come from a Markel shareholder letter)
documents = SimpleDirectoryReader("../data/llamaindex").load_data()
index = VectorStoreIndex.from_documents(documents)

In [19]:
query_engine = index.as_query_engine()

response = query_engine.query(
    "What's the latest news of Markel?"
)

print(response)

 The letter is a message from the Chief Executive Officer (CEO) of Markel, Thomas S. Gayner, to the shareholders. In this letter, he expresses his gratitude for their role as customers, associates, and/or shareholders of Markel. He also mentions that they have experienced significant growth in their share price since going public. Additionally, he refers to the annual meeting which will be held on May 17, 2023, and encourages attendance for the opportunity to connect with the management team and engage in conversations. The letter also touches upon the importance of their work and the impact it has on customers, associates, and shareholders. However, the letter does not provide any specific newsworthy information regarding recent developments or current events at Markel.


In [21]:
response = query_engine.query(
    "What's the financial results of Markel in 2022?"
)

print(response)

 In 2022, Markel reported total operating revenues of $11,675 million, gross written premiums of $13,202 million, a combined ratio of 92%, invested assets of $27,420 million, and invested assets per common share of $2,042.73. The net income (loss) to common shareholders was $(250) million, and the comprehensive income (loss) to shareholders was $(1,309) million. The company's shareholders' equity was $13,066 million, with a book value per common share of $929.27. Additionally, Markel reported a 5-year CAGR in book value per common share of 6% and a closing stock price per share of $1,317.49. The financial results represent the outcome of the dedication and effort put forth by the company's people.


## Loading Data (Ingestion)

In [22]:
from llama_index.core import VectorStoreIndex

vector_index = VectorStoreIndex.from_documents(documents)
vector_index.as_query_engine()

<llama_index.core.query_engine.retriever_query_engine.RetrieverQueryEngine at 0x7f5d294b51e0>
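The repr shows that `as_query_engine()` returns a `RetrieverQueryEngine`: it first retrieves the Nodes most relevant to the question, then asks the LLM to answer using them as context. The sketch below imitates that retrieve-then-synthesize flow in plain Python. It assumes a toy word-overlap score in place of real embeddings and a hypothetical stub, `fake_llm`, in place of Ollama; the prompt wording is illustrative and is not LlamaIndex's exact template.

```python
import re
from typing import Callable, List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase word set; a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, chunks: List[str], top_k: int = 2) -> List[str]:
    """Rank chunks by how many query words they share (toy similarity)."""
    q = tokenize(query)
    ranked = sorted(chunks, key=lambda c: len(q & tokenize(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(query: str, context: List[str]) -> str:
    """Stuff the retrieved chunks into a question-answering prompt."""
    joined = "\n---\n".join(context)
    return (
        "Context information is below.\n"
        f"{joined}\n"
        "Using only the context, answer the query.\n"
        f"Query: {query}\nAnswer:"
    )


def toy_query_engine(query: str, chunks: List[str],
                     llm: Callable[[str], str], top_k: int = 2) -> str:
    """Retrieve, then synthesize: the two stages a retriever query engine chains."""
    context = retrieve(query, chunks, top_k)
    return llm(build_prompt(query, context))


def fake_llm(prompt: str) -> str:
    """Hypothetical stub LLM: echoes the first context line so the flow is visible."""
    return prompt.splitlines()[1]


# made-up sample chunks for illustration
toy_chunks = [
    "The annual meeting will be held on May 17, 2023.",
    "Gross written premiums grew in 2022.",
    "Clay courts favour heavy topspin.",
]
print(toy_query_engine("When is the annual meeting?", toy_chunks, fake_llm, top_k=1))
# → The annual meeting will be held on May 17, 2023.
```

Swapping `fake_llm` for a real model call is the only change needed to turn the stub into a working pipeline; the retrieval step decides which facts the model ever sees, which is why the two Markel queries above return such different answers.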

In [6]:
from llama_index.core.node_parser import SentenceSplitter

# parse nodes
parser = SentenceSplitter()
nodes = parser.get_nodes_from_documents(documents)

# build index
index = VectorStoreIndex(nodes)
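`SentenceSplitter` turns each Document into Nodes, preferring to keep whole sentences together while staying under a chunk size, with some overlap between neighbouring chunks. A simplified sketch of that idea follows. It assumes word counts rather than the token counts LlamaIndex uses, and a naive regex for sentence boundaries; `split_sentences` and `chunk_sentences` are hypothetical helpers, not LlamaIndex functions.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Naive sentence split on ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def word_count(text: str) -> int:
    return len(text.split())


def chunk_sentences(text: str, max_words: int = 12, overlap_sentences: int = 1) -> List[str]:
    """Pack whole sentences into chunks of at most max_words words.

    The last `overlap_sentences` sentences of a finished chunk are repeated at
    the start of the next one so context survives the boundary. A single
    sentence longer than max_words becomes its own oversized chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    for sentence in split_sentences(text):
        size = sum(word_count(s) for s in current)
        if current and size + word_count(sentence) > max_words:
            chunks.append(" ".join(current))
            # carry the tail of the finished chunk into the next one
            tail = current[-overlap_sentences:] if overlap_sentences > 0 else []
            # drop the overlap if it would leave no room for the new sentence
            if sum(word_count(s) for s in tail) + word_count(sentence) > max_words:
                tail = []
            current = tail
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


sample_text = (
    "LlamaIndex loads documents. It splits them into nodes. "
    "Each node is embedded. Queries retrieve the closest nodes."
)
for chunk in chunk_sentences(sample_text, max_words=10, overlap_sentences=1):
    print(chunk)
# → LlamaIndex loads documents. It splits them into nodes.
# → It splits them into nodes. Each node is embedded.
# → Each node is embedded. Queries retrieve the closest nodes.
```

The overlap trades a little duplicated storage for retrieval robustness: a fact that straddles two chunks still appears whole in at least one of them.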

In [7]:
query_engine = index.as_query_engine()

response = query_engine.query(
    "What's the latest news of Markel?"
)

print(response)

 Markel Corporation's Chief Executive Officer, Thomas S. Gayner, expresses gratitude to customers, associates, and shareholders in a letter. He mentions that just as Cal Ripken Jr. and Bill Russell have been instrumental in their respective fields, the team at Markel is dedicated to their work and could not do it without the support of their stakeholders. Additionally, he invites everyone to attend the annual meeting on May 17, 2023, for a chance to connect with the management team and engage in thoughtful discussions. The letter also touches upon the growth in Markel's share price since going public and the importance of each win for customers, associates, and shareholders.


## Indexing

* A VectorStoreIndex is by far the most frequent type of Index you’ll encounter. It takes your Documents and splits them into Nodes, then creates a vector embedding of each Node's text, so that the Nodes most similar to a query can be retrieved and passed to the LLM.

* A Summary Index is a simpler form of Index best suited to queries where, as the name suggests, you are trying to generate a summary of the text in your Documents. It simply stores all of the Documents and returns all of them to your query engine.
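The practical difference between the two index types shows up at query time: a vector index ranks Nodes by embedding similarity and returns only the top matches, while a summary index hands every Node to the query engine. A toy sketch of that contrast, assuming a bag-of-words count vector as a stand-in for a real embedding model such as the BGE model set in `Settings.embed_model`; `ToyVectorIndex` and `ToySummaryIndex` are hypothetical classes, not LlamaIndex APIs.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorIndex:
    """Embeds every node up front; a query returns only the top-k most similar nodes."""

    def __init__(self, nodes: List[str]):
        self.entries: List[Tuple[str, Counter]] = [(n, embed(n)) for n in nodes]

    def retrieve(self, query: str, top_k: int = 1) -> List[str]:
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [text for text, _ in ranked[:top_k]]


class ToySummaryIndex:
    """Stores nodes as-is; every query gets all of them (suits 'summarize' questions)."""

    def __init__(self, nodes: List[str]):
        self.nodes = list(nodes)

    def retrieve(self, query: str) -> List[str]:
        return list(self.nodes)


# made-up sample nodes for illustration
toy_nodes = [
    "Markel reported gross written premiums for 2022.",
    "The annual meeting is on May 17, 2023.",
    "The letter thanks customers, associates and shareholders.",
]
print(ToyVectorIndex(toy_nodes).retrieve("When is the annual meeting?"))
# → ['The annual meeting is on May 17, 2023.']
print(len(ToySummaryIndex(toy_nodes).retrieve("Summarize the letter")))
# → 3
```

The vector index keeps prompts short on large corpora but can miss context spread across many Nodes; the summary index sees everything, at the cost of sending every Node to the LLM.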

## Storing

In [9]:
# persist the index to disk as files under persist_dir

index.storage_context.persist(persist_dir="../data/llamaindex/temp")

In [10]:
# reload the index from the persisted files

from llama_index.core import StorageContext, load_index_from_storage

storage_context = StorageContext.from_defaults(persist_dir="../data/llamaindex/temp")

# load index
index = load_index_from_storage(storage_context)
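Persisting matters because building a vector index means embedding every Node; reloading from `persist_dir` restores the stored vectors instead of recomputing them. The round trip can be illustrated with a stdlib-only sketch, assuming a single hypothetical JSON file (`toy_index.json`) mapping node text to its vector. LlamaIndex's real on-disk layout is spread over several JSON files and differs from this; `save_toy_index` and `load_toy_index` are hypothetical helpers.

```python
import json
import os
import tempfile
from typing import Dict, List


def save_toy_index(toy_index: Dict[str, List[float]], persist_dir: str) -> str:
    """Write node texts and their precomputed vectors to one JSON file."""
    os.makedirs(persist_dir, exist_ok=True)
    path = os.path.join(persist_dir, "toy_index.json")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as f:
        json.dump(toy_index, f)
    return path


def load_toy_index(persist_dir: str) -> Dict[str, List[float]]:
    """Read the index back without recomputing any embeddings."""
    with open(os.path.join(persist_dir, "toy_index.json"), encoding="utf-8") as f:
        return json.load(f)


# node text -> embedding vector (made-up numbers for illustration)
toy_index = {
    "The annual meeting is on May 17, 2023.": [0.1, 0.7, 0.2],
    "Gross written premiums grew in 2022.": [0.6, 0.1, 0.3],
}

with tempfile.TemporaryDirectory() as tmp:
    save_toy_index(toy_index, tmp)
    print(load_toy_index(tmp) == toy_index)  # → True
```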

In [None]:
# open-source vector store: Chroma

