# LlamaIndex Deep Dive

## Using LLMs

In [10]:
# Optional: use the paid OpenAI API (load_dotenv() reads settings such as the API key from a .env file)

from llama_index.llms.openai import OpenAI

import dotenv

dotenv.load_dotenv()

True

In [3]:
from llama_index.llms.ollama import Ollama
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.core.embeddings import resolve_embed_model

In [4]:
# global settings

# bge-small-en-v1.5 embedding model, resolved as a local model
Settings.embed_model = resolve_embed_model("local:BAAI/bge-small-en-v1.5")

# ollama: one local Mistral client, also registered as the global default LLM
llm = Ollama(model="mistral", request_timeout=30.0)
Settings.llm = llm

In [11]:
response = llm.complete("Rafael Nadal is ")
print(response)

 Rafael Nadal is a professional tennis player from Spain. He is widely regarded as one of the greatest tennis players of all time. Nadal has won a total of 20 Grand Slam titles, including a record 13 French Open titles, making him the most successful tennis player in history at Roland Garros. His dominant play on clay courts earned him the moniker "The King of Clay." In addition to his success in Grand Slam tournaments, Nadal has also held the No. 1 spot in the ATP rankings for a total of 209 weeks. He continues to compete at the highest level and is known for his impressive work ethic, tenacity, and powerful left-handed game.


In [16]:
# switch to llama2 model
llm = Ollama(model="llama2", request_timeout=60.0)
response = llm.complete("Rafael Nadal is ")
print(response)

Rafael Nadal is a professional tennis player from Spain. He is widely considered one of the greatest tennis players of all time, and has won numerous awards and accolades throughout his career. Some of his achievements include:

* 19 Grand Slam titles, including 4 French Open titles and 4 US Open titles.
* 5 ATP Finals titles.
* 3 Davis Cup titles with Spain.
* An Olympic gold medal in singles at the 2008 Beijing Olympics.
* Ranked as the world number one for a record 260 weeks.
* Won 87 ATP titles overall.

Nadal is known for his aggressive playing style, which includes his powerful topspin forehand and quick footwork around the court. He has been praised for his athleticism, work ethic, and mental toughness, and is widely regarded as one of the greatest tennis players of all time.


In [5]:
# load the documents in the data directory (the Markel shareholder letter and the Paul Graham essay) for context
documents = SimpleDirectoryReader("../data/llamaindex").load_data()
index = VectorStoreIndex.from_documents(documents)

In [19]:
query_engine = index.as_query_engine()

response = query_engine.query(
    "What's the latest news of Markel?"
)

print(response)

 The letter is a message from the Chief Executive Officer (CEO) of Markel, Thomas S. Gayner, to the shareholders. In this letter, he expresses his gratitude for their role as customers, associates, and/or shareholders of Markel. He also mentions that they have experienced significant growth in their share price since going public. Additionally, he refers to the annual meeting which will be held on May 17, 2023, and encourages attendance for the opportunity to connect with the management team and engage in conversations. The letter also touches upon the importance of their work and the impact it has on customers, associates, and shareholders. However, the letter does not provide any specific newsworthy information regarding recent developments or current events at Markel.


In [21]:
response = query_engine.query(
    "What's the financial results of Markel in 2022?"
)

print(response)

 In 2022, Markel reported total operating revenues of $11,675 million, gross written premiums of $13,202 million, a combined ratio of 92%, invested assets of $27,420 million, and invested assets per common share of $2,042.73. The net income (loss) to common shareholders was $(250) million, and the comprehensive income (loss) to shareholders was $(1,309) million. The company's shareholders' equity was $13,066 million, with a book value per common share of $929.27. Additionally, Markel reported a 5-year CAGR in book value per common share of 6% and a closing stock price per share of $1,317.49. The financial results represent the outcome of the dedication and effort put forth by the company's people.


## Loading Data (Ingestion)

In [22]:
from llama_index.core import VectorStoreIndex

vector_index = VectorStoreIndex.from_documents(documents)
vector_index.as_query_engine()

<llama_index.core.query_engine.retriever_query_engine.RetrieverQueryEngine at 0x7f5d294b51e0>

In [6]:
from llama_index.core.node_parser import SentenceSplitter

# parse nodes
parser = SentenceSplitter()
nodes = parser.get_nodes_from_documents(documents)

# build index
index = VectorStoreIndex(nodes)

In [7]:
query_engine = index.as_query_engine()

response = query_engine.query(
    "What's the latest news of Markel?"
)

print(response)

 Markel Corporation's Chief Executive Officer, Thomas S. Gayner, expresses gratitude to customers, associates, and shareholders in a letter. He mentions that just as Cal Ripken Jr. and Bill Russell have been instrumental in their respective fields, the team at Markel is dedicated to their work and could not do it without the support of their stakeholders. Additionally, he invites everyone to attend the annual meeting on May 17, 2023, for a chance to connect with the management team and engage in thoughtful discussions. The letter also touches upon the growth in Markel's share price since going public and the importance of each win for customers, associates, and shareholders.


## Indexing

* A VectorStoreIndex is by far the most frequent type of Index you’ll encounter. The Vector Store Index takes your Documents and splits them up into Nodes. It then creates vector embeddings of the text of every node, ready to be queried by an LLM.

* A Summary Index is a simpler form of Index best suited to queries where, as the name suggests, you are trying to generate a summary of the text in your Documents. It simply stores all of the Documents and returns all of them to your query engine.
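
As a rough illustration of the difference between the two index types, here is a minimal, dependency-free sketch. It is not LlamaIndex code: `ToyVectorIndex`, `ToySummaryIndex`, and the bag-of-words `embed` function are hypothetical stand-ins, and a real embedding model such as bge-small-en-v1.5 produces dense vectors rather than word counts.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorIndex:
    """Embeds every node once, then returns the top_k nodes closest to the query."""

    def __init__(self, nodes):
        self.entries = [(node, embed(node)) for node in nodes]

    def retrieve(self, query: str, top_k: int = 2):
        query_vec = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(query_vec, e[1]), reverse=True)
        return [node for node, _ in ranked[:top_k]]


class ToySummaryIndex:
    """Stores the nodes and returns all of them, whatever the query is."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def retrieve(self, query: str):
        return list(self.nodes)


nodes = [
    "Markel held its annual meeting in May",
    "Paul Graham wrote essays about programming",
    "The shareholder letter thanks Markel associates",
]

vector_hits = ToyVectorIndex(nodes).retrieve("Markel annual meeting", top_k=1)
summary_hits = ToySummaryIndex(nodes).retrieve("Markel annual meeting")

print(vector_hits)        # only the closest node
print(len(summary_hits))  # every node
```

The vector index returns only the node closest to the query, while the summary index hands every node to the caller regardless of the query.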

## Storing

In [9]:
# persist the index to files on disk

index.storage_context.persist(persist_dir="../data/llamaindex/temp")

In [10]:
# rebuild the storage context from the persisted files

from llama_index.core import StorageContext, load_index_from_storage

storage_context = StorageContext.from_defaults(persist_dir="../data/llamaindex/temp")

# load index
index = load_index_from_storage(storage_context)

In [11]:
# Chroma, an open-source vector store
!pip install chromadb

In [15]:
!pip install llama-index-vector-stores-chroma
!pip install llama-index-embeddings-huggingface

  Attempting uninstall: llama-index-core
    Found existing installation: llama-index-core 0.9.56
    Uninstalling llama-index-core-0.9.56:
      Successfully uninstalled llama-index-core-0.9.56
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
llama-index-vector-stores-chroma 0.0.1 requires llama-index-core<0.10.0,>=0.9.32, but you have llama-index-core 0.10.11 which is incompatible.
Successfully installed llama-index-core-0.10.11


In [16]:
import chromadb
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.core import StorageContext

In [17]:
# load some documents
documents = SimpleDirectoryReader("../data/llamaindex").load_data()

# initialize a persistent client, setting the path where data is saved
db = chromadb.PersistentClient(path="../data/llamaindex/chroma_db")

In [18]:
# create collection
chroma_collection = db.get_or_create_collection("quickstart")

# assign chroma as the vector store to the context
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

In [20]:
# create index
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)

In [26]:
# create query engine and try
query_engine = index.as_query_engine()
response = query_engine.query("Is Markel a good company? Say your conclusion first, then list your reasons as bullets.")
print(response)

 Based on the information provided in the text, Markel appears to be a successful company with a strong focus on creating value for its customers, associates, and shareholders. Here are some reasons why Markel may be considered a good company:

* The company has experienced significant growth over the past several years, with a CAGR (compound annual growth rate) of 10% between 2002 and 2022.
* Markel prioritizes a win-win-win culture, where all stakeholders - customers, associates, and shareholders - benefit from the company's operations.
* The company provides essential services to its customers, such as insurance, food, medical care, housing, and transportation, among others.
* Markel offers opportunities for associates to grow and create value, both personally and professionally, through problem-solving, innovation, and community involvement.
* Shareholders win when the company earns good returns on their capital investments, as evidenced by the company's growth trajectory and stron

In [27]:
response = query_engine.query("Give me a 200-word summary of the characteristics of Paul Graham.")
print(response)

 Paul Graham is an essayist, programmer, and entrepreneur. Before college, he spent most of his time outside school working on writing and programming. He began writing short stories but found them to be lacking in plot. His early programming experiences were with an IBM 1401 using an early version of Fortran, where he struggled to find uses for the machine due to a lack of data and limited capabilities.

In high school, Graham was envious of friends who had microcomputers, which revolutionized his perspective on computing. He eventually convinced his father to buy him a TRS-80, with which he began serious programming. In college, Graham planned to study philosophy but discovered it to be underwhelming and instead switched to Artificial Intelligence (AI). He was captivated by science fiction novels featuring intelligent computers and advancements in AI research.

Graham's writing career began again in 2021 when he wrote essays on various topics, ultimately leading him to reflect on his

In [35]:
# load the stored index and query

# initialize client
db = chromadb.PersistentClient(path="../data/llamaindex/chroma_db")

In [36]:
# get collection
chroma_collection = db.get_or_create_collection("quickstart")

# assign chroma as the vector_store to the context
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

In [37]:
# load your index from stored vectors
index = VectorStoreIndex.from_vector_store(
    vector_store, storage_context=storage_context
)

In [48]:
query_engine = index.as_query_engine()
response = query_engine.query("Who is the CEO of Markel, any introduction of him?")
print(response)

 Thomas S. Gayner is the Chief Executive Officer of Markel. He expresses his gratitude for the opportunity to serve in this role and lead the team at Markel. The text also mentions that he finds value in connecting with shareholders in person during the annual meeting and encourages attendance at the event for spontaneous conversations and thoughtful questions.


## Query

In [51]:
query_engine = index.as_query_engine()
response = query_engine.query(
    "If you are the CEO of Markel, please write an email to shareholder as 2023."
)
print(response)

 Subject: Exciting Updates and Future Aspiration as We Build One of the World's Great Companies – Markel's 2023 Shareholder Letter

Dear Valued Markel Shareholders,

First and foremost, we would like to express our deepest gratitude for your unwavering commitment and support. As we embark on another year together, we are thrilled to share some updates from the past year and our aspirations for 2023 and beyond.

Financial Performance:
We are pleased to report that Markel's financial performance continues to be strong. Our total operating revenues increased by X% in 2022, reaching $XXX million, while gross written premiums grew by Y%. Our combined ratio also improved from the previous year, standing at Z%. These achievements underscore our team's dedication and focus on delivering results for you as our shareholders.

People and Growth:
Our workforce has expanded significantly since we went public in 1986. Today, over 20,000 talented individuals are part of the Markel family. Each one br

In [54]:
from llama_index.core import VectorStoreIndex, get_response_synthesizer
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.postprocessor import SimilarityPostprocessor

# build index
index = VectorStoreIndex.from_documents(documents)

# configure retriever
retriever = VectorIndexRetriever(
    index=index,
    similarity_top_k=20,
)

# configure response synthesizer
response_synthesizer = get_response_synthesizer()

# assemble query engine
query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=response_synthesizer,
    node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.6)],
)

# query
response = query_engine.query("What did the author do growing up?")
print(response)

 The author's main activities outside of school before college were writing and programming. He wrote short stories, which he now acknowledges were awful with little plot. His first experiences with programming were on an IBM 1401 computer in ninth grade, where he was unable to write programs due to the lack of input options or knowledge for complex mathematical calculations. However, his perspective changed when he saw a friend use a microcomputer and eventually convinced his father to buy him one around 1980. With this new technology, he started programming in earnest, creating simple games, tools, and even a basic word processor. Despite his initial interest in philosophy as a college major, he became disillusioned with the field and switched to artificial intelligence instead due to its prominence in literature and media at the time.
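
The retriever and postprocessor configured above act as two filtering stages: first keep the `similarity_top_k` best-scoring nodes, then drop any node whose score falls below `similarity_cutoff`. The following is a simplified sketch of that flow, not the library's implementation. `ScoredNode`, `retrieve_top_k`, and `apply_cutoff` are hypothetical names, the scores are made up, and keeping nodes whose score equals the cutoff is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ScoredNode:
    text: str
    score: float  # similarity between the node and the query


def retrieve_top_k(scored, top_k):
    """Stage 1 (retriever): keep the top_k highest-scoring nodes."""
    return sorted(scored, key=lambda n: n.score, reverse=True)[:top_k]


def apply_cutoff(nodes, similarity_cutoff):
    """Stage 2 (postprocessor): drop nodes scoring below the cutoff."""
    return [n for n in nodes if n.score >= similarity_cutoff]


# Made-up scores for illustration only.
candidates = [
    ScoredNode("a", 0.82),
    ScoredNode("b", 0.41),
    ScoredNode("c", 0.67),
    ScoredNode("d", 0.59),
]

top = retrieve_top_k(candidates, top_k=3)
kept = apply_cutoff(top, similarity_cutoff=0.6)
print([n.text for n in kept])  # ['a', 'c']
```

With a large `similarity_top_k` such as 20, the cutoff is what mostly decides how much context reaches the response synthesizer. If no node clears the cutoff, the synthesizer receives no context at all.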


## Tracing and Debugging

In [59]:
import logging
import sys

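# basicConfig already attaches a stdout handler to the root logger;
# the extra StreamHandler makes every record print a second time as a bare message
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))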

In [57]:
response = query_engine.query("Does the PDF mention anything about Buffett?")
print(response)

INFO:httpx:HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"
HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"
HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"
HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"
HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"
 The Markel Shareholder Letter 2022 does not mention Warren Buffett or Berkshire Hathaway based on the provided text. The new context does not offer any new details about Buffett.


In [61]:
response = query_engine.query("Do you think Paul is a good Programmer?")
print(response)

INFO:httpx:HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"
HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"
HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"
HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"
HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"
HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"
HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"
HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"
HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"
HTTP Request: POST http://localhost:11434/api/ch

## Observability & Evaluating

In [62]:
# TODO: wandb integration

In [63]:
from llama_index.core.evaluation import FaithfulnessEvaluator

In [64]:
# define evaluator
evaluator = FaithfulnessEvaluator(llm=llm)

In [65]:
response = query_engine.query("Do you think Paul is a good Programmer?")

eval_result = evaluator.evaluate_response(response=response)
print(str(eval_result.passing))

INFO:httpx:HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"
HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"
HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"
HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"
HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"
HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"
HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"
HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"
HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:11434/api/chat "HTTP/1.1 200 OK"
HTTP Request: POST http://localhost:11434/api/ch

RuntimeError: asyncio.run() cannot be called from a running event loop
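
The `RuntimeError` above is what `asyncio` raises when `asyncio.run()` is called while an event loop is already running, which is the situation inside a Jupyter kernel. The sketch below reproduces the error with the standard library only. `evaluate` and `notebook_like_cell` are hypothetical stand-ins, not LlamaIndex functions.

```python
import asyncio


async def evaluate() -> bool:
    """Hypothetical stand-in for an async evaluation call."""
    await asyncio.sleep(0)
    return True


async def notebook_like_cell():
    # Inside this coroutine an event loop is already running, as in a Jupyter kernel.
    coro = evaluate()
    try:
        asyncio.run(coro)  # nested call: raises RuntimeError
        error = ""
    except RuntimeError as exc:
        coro.close()  # the coroutine never started; close it to avoid a warning
        error = str(exc)

    # Awaiting the coroutine on the already-running loop works.
    passing = await evaluate()
    return error, passing


error, passing = asyncio.run(notebook_like_cell())
print(error)
print(passing)
```

In a notebook, two workarounds are commonly used, neither of which is shown in the run above. One is to call `nest_asyncio.apply()` from the third-party `nest_asyncio` package before running the evaluator. The other is to `await` an async variant of the call directly, if the library offers one.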

## Cost Analysis

* Each call to a hosted LLM costs money. For instance, OpenAI’s gpt-3.5-turbo costs $0.002 / 1k tokens.

* LlamaIndex offers token predictors to predict token usage of LLM and embedding calls. This allows you to estimate your costs during 1) index construction, and 2) index querying, before any respective LLM calls are made.
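
The arithmetic behind such an estimate is simple: predicted tokens, divided by 1,000, times the price per 1k tokens. The sketch below does not use LlamaIndex's token predictors. `rough_token_count` is a hypothetical helper based on the assumed heuristic of about four characters per token, and the price is the $0.002 / 1k tokens figure quoted above, which may not match current pricing or the price of embedding calls.

```python
PRICE_PER_1K_TOKENS = 0.002  # USD, the gpt-3.5-turbo figure quoted above


def rough_token_count(text: str) -> int:
    """Very rough heuristic (assumption): about 4 characters per token."""
    return max(1, len(text) // 4)


def estimate_cost(token_count: int, price_per_1k: float = PRICE_PER_1K_TOKENS) -> float:
    """Cost in USD for a given number of tokens."""
    return token_count / 1000 * price_per_1k


# Placeholder text chunks standing in for parsed document nodes.
chunks = ["x" * 4000, "y" * 2000]

# 1) Index construction: every chunk is sent through a model once.
index_tokens = sum(rough_token_count(chunk) for chunk in chunks)
index_cost = estimate_cost(index_tokens)

# 2) Querying: the prompt holds the question plus one retrieved chunk.
question = "What did the author do growing up?"
query_tokens = rough_token_count(question) + rough_token_count(chunks[0])
query_cost = estimate_cost(query_tokens)

print(f"index: {index_tokens} tokens, about ${index_cost:.4f}")
print(f"query: {query_tokens} tokens, about ${query_cost:.4f}")
```

A real tokenizer gives more accurate counts than the character heuristic, and completion tokens add to the query cost, so treat the result as an order-of-magnitude figure.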