In [1]:
import os
from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv())
openai_api_key = os.environ["OPENAI_API_KEY"]

## Previous name: GPT Index

## Home Page: Pitch
* Unleash the power of LLMs over your data
    * Data Ingestion
        * Unstructured data: PDF, Text, Video, Images, etc.
        * Structured data: Excel, SQL, etc.
        * Semi-structured data: APIs (Slack, Salesforce, Notion, etc.)
    * Data Indexing
        * Store (save)
        * Index (find)
        * Integrate with vector stores and databases 
    * Query Interface
        * Accepts any input prompt over your data
        * Returns a knowledge-augmented response

## Home Page: Use Cases
* Document QA
* Data Augmented Chatbots
* Knowledge Agents
* Structured Analytics

## Home Page: Products
* LlamaIndex (Python)
* LlamaIndex.TS (TypeScript version)
* LlamaHub
    * Llama Packs
    * Data Loaders
    * Agent tools 
* SEC Insights: end-to-end app
* create-llama: CLI tool to scaffold a LlamaIndex app from the terminal

## Latest features
* [RAGs](https://github.com/run-llama/rags):
    * Build, customize, and use multiple ChatGPTs over your data, all with natural language.
    * RAGs is a Streamlit app that lets you create a RAG pipeline from a data source using natural language.
* [Llama Packs](https://llamahub.ai/). Interesting Llama Packs:
    * Resume screener
    * Gmail OpenAI agent
    * Deeplake multimodal retrieval
    * Sub-question Weaviate

## Documentation: structure
* Getting started
* Use cases
* Understanding LlamaIndex
    * Tutorial series 
* Optimizing
    * For when you already have a LlamaIndex app working and want to refine it further.
    * List of the first things you should try: embedding model, chunk size, customizations, etc.
    * Fine-tuning your model.
* Module guides
    * Guides to the individual components of LlamaIndex

## Documentation: Starter Tutorial

In [2]:
#pip install llama-index

#### Load Private Document

In [2]:
from llama_index import SimpleDirectoryReader

documents = SimpleDirectoryReader("data").load_data()

In [3]:
documents

[Document(id_='c27681cc-4d49-4dac-bdc4-a1d71e6b5a22', embedding=None, metadata={'file_path': 'data\\be-good.txt', 'file_name': 'be-good.txt', 'file_type': 'text/plain', 'file_size': 16710, 'creation_date': '2025-01-07', 'last_modified_date': '2025-01-07', 'last_accessed_date': '2025-01-07'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={}, text='Be good\n\nApril 2008(This essay is derived from a talk at the 2008 Startup School.)About a month after we started Y Combinator we came up with the\nphrase that became our motto: Make something people want.  We\'ve\nlearned a lot since then, but if I were choosing now that\'s still\nthe one I\'d pick.Another thing we tell founders is not to worry too much about the\nbusiness model, at least at first.  Not because making

#### Create Vector Database (LlamaIndex calls them "indexes")

In [4]:
from llama_index import VectorStoreIndex

index = VectorStoreIndex.from_documents(documents)

#### QA over private document

In [5]:
query_engine = index.as_query_engine()

response = query_engine.query("Summarize the article in less than 100 words.")

print(response)

The article discusses the importance of creating something people want and not focusing solely on making money in the early stages of a startup. It explores the idea that behaving like a charity can lead to success, using examples like Craigslist and Google. Additionally, it highlights how being benevolent can boost morale, attract help from others, and lead to better decision-making in startups. The article emphasizes the value of helping people, building a sense of mission, and creating a positive impact to increase the chances of startup success.


#### Save the vector database on your computer

In [6]:
index.storage_context.persist()

By default, this saves the data to a directory named `storage`; pass a `persist_dir` argument to choose a different location.
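
The reason for persisting is to avoid rebuilding the index on every run. The pattern can be sketched with the standard library alone. This is an illustration of the idea, not LlamaIndex's on-disk format: the file name `vectors.json` and the helpers `persist`, `load_or_build` and `expensive_build` are hypothetical names invented for this example.

```python
import json
import tempfile
from pathlib import Path


def persist(vectors, persist_dir):
    """Write the vectors to <persist_dir>/vectors.json (hypothetical layout)."""
    path = Path(persist_dir)
    path.mkdir(parents=True, exist_ok=True)
    (path / "vectors.json").write_text(json.dumps(vectors))


def load_or_build(persist_dir, build):
    """Reuse persisted vectors if they exist; otherwise build and persist them."""
    file = Path(persist_dir) / "vectors.json"
    if file.exists():
        return json.loads(file.read_text()), "loaded"
    vectors = build()
    persist(vectors, persist_dir)
    return vectors, "built"


def expensive_build():
    # Stand-in for the costly step (calling an embedding model on every chunk).
    return {"chunk-0": [0.1, 0.2], "chunk-1": [0.3, 0.4]}


with tempfile.TemporaryDirectory() as tmp:
    persist_dir = Path(tmp) / "storage"
    first, first_status = load_or_build(persist_dir, expensive_build)
    second, second_status = load_or_build(persist_dir, expensive_build)

print(first_status, second_status)
```

The second call finds the persisted file and skips the build step, which is the behavior you want from a persisted index.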

## Documentation: High-Level Concepts

#### RAG
* Your data is loaded
* Your data is indexed: prepared for queries
* When you ask a question, LlamaIndex retrieves the most relevant data from the vector database and passes it (the "context") together with your question to the LLM, which writes a conversational answer.
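
The three steps above can be sketched without any library. This is a toy illustration, not LlamaIndex's implementation: `embed` is a hypothetical bag-of-words stand-in for a real embedding model, and `retrieve` and `build_prompt` are names chosen only for this example.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: word counts (a real app would call an embedding model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(index, question, top_k=2):
    """Return the top_k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


def build_prompt(question, context_chunks):
    """Combine the retrieved context and the question into one LLM prompt."""
    context = "\n".join(context_chunks)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


# 1. Load: the chunks stand in for your loaded data.
chunks = [
    "Make something people want.",
    "Do not worry too much about the business model at first.",
    "Being good helps with morale.",
]
# 2. Index: store each chunk next to its embedding.
index = [(chunk, embed(chunk)) for chunk in chunks]

# 3. Query: retrieve the most relevant chunk and build the prompt for the LLM.
question = "What should founders make?"
context = retrieve(index, question, top_k=1)
prompt = build_prompt(question, context)
print(prompt)
```

In a real app, the prompt would then be sent to the LLM; that call is left out here so the sketch runs offline.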

#### Stages within RAG
1. Loading
2. Indexing: convert data into embeddings and metadata
3. Storing: store your embeddings and metadata
4. Querying
    * sub-queries
    * multi-step queries
    * hybrid strategies
5. Evaluation: checking how accurate, faithful and fast your responses to queries are

#### Important concepts within some of the previous stages
1. Loading
    * Document: container for a data source (PDF, API output, etc.).
    * Node: data chunk with metadata.
    * Connector or Reader: connects with data sources.
2. Indexing
    * Indexing: transforming data into embeddings with metadata and storing them in a vector database.
    * Embeddings: numerical representation of data.
4. Querying
    * Retrievers: how to retrieve relevant context from an index when given a query. The retrieval strategy is key to the performance of the app.
    * Routers: determine which retriever will be used based on each retriever's metadata and the query.
    * Node postprocessors: apply transformations, filtering and re-ranking logic to nodes.
    * Response synthesizers: given a query and a set of retrieved text chunks, generate the conversational response from an LLM.
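
How the querying components chain together can be sketched as follows. This is a simplified illustration under stated assumptions, not LlamaIndex's API: `ScoredNode`, `retriever`, `similarity_postprocessor`, `response_synthesizer` and `fake_llm` are hypothetical names, the similarity scores are made-up values, and routers are left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class ScoredNode:
    text: str
    score: float  # similarity to the query; higher means more relevant


def retriever(nodes, top_k):
    """Return the top_k highest-scoring nodes."""
    return sorted(nodes, key=lambda n: n.score, reverse=True)[:top_k]


def similarity_postprocessor(nodes, cutoff):
    """Filter out nodes whose score is below the cutoff."""
    return [n for n in nodes if n.score >= cutoff]


def response_synthesizer(query, nodes, llm):
    """Pack the query and the surviving chunks into one prompt and call the LLM."""
    context = "\n".join(n.text for n in nodes)
    return llm(f"Context:\n{context}\n\nQuestion: {query}")


def fake_llm(prompt):
    # Stand-in for a real LLM call: reports how many context lines it received.
    context = prompt.split("Context:\n", 1)[1].split("\n\nQuestion:", 1)[0]
    return f"answer based on {len(context.splitlines())} chunk(s)"


# Illustrative nodes with made-up similarity scores.
scored = [
    ScoredNode("Make something people want.", 0.82),
    ScoredNode("Craigslist is run like a charity.", 0.64),
    ScoredNode("April 2008", 0.11),
]

candidates = retriever(scored, top_k=3)                      # retrieval
kept = similarity_postprocessor(candidates, cutoff=0.5)      # postprocessing
answer = response_synthesizer("What is the motto?", kept, fake_llm)  # synthesis
print(answer)
```

Keeping each stage as a separate function mirrors the point of the list above: each stage can be swapped or tuned independently.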

#### Naming of the 3 main use cases
* Query Engines: ask questions about your data.
* Chat Engines: have a conversation with your data.
* Agents: automated decision makers.

## Documentation: Customization Tutorial

#### Starting point: basic RAG

In [7]:
from llama_index import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("In less than 100 words, what is the meaning of good for the author?")
print(response)

For the author, being good is not about claiming moral superiority or sanctimony. Instead, being good is seen as a practical approach that can benefit startups in various ways. It involves helping others, maintaining morale, and creating a sense of mission that drives perseverance and resilience. Goodness is linked to genuine care for users, a willingness to assist, and a commitment to making a positive impact, ultimately contributing to the success and survival of a startup.


#### Parse the document into smaller chunks

In [8]:
from llama_index import ServiceContext

service_context = ServiceContext.from_defaults(chunk_size=1000)

In [9]:
index = VectorStoreIndex.from_documents(
    documents, 
    service_context=service_context
)
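
To give a feel for what `chunk_size` controls, here is a minimal sketch of fixed-size chunking with overlap. It is a deliberate simplification: it counts whitespace-separated words, whereas a real splitter typically counts tokens and tries to respect sentence boundaries. The function name `split_into_chunks` is hypothetical.

```python
def split_into_chunks(text, chunk_size, chunk_overlap=0):
    """Split text into chunks of at most chunk_size words, each sharing
    chunk_overlap words with the previous chunk."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks


text = "one two three four five six seven eight nine ten"
chunks = split_into_chunks(text, chunk_size=4, chunk_overlap=1)
for chunk in chunks:
    print(chunk)
```

Smaller chunks give more precise retrieval hits but less surrounding context per hit; the overlap reduces the chance that a sentence is cut in half between two chunks.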

#### Use a different vector database

In [10]:
import chromadb
from llama_index.vector_stores import ChromaVectorStore
from llama_index import StorageContext

chroma_client = chromadb.PersistentClient()
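# get_or_create_collection avoids an error when the cell is re-run and the collection already exists
chroma_collection = chroma_client.get_or_create_collection("quickstart")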
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(
    vector_store=vector_store
)

In [11]:
index = VectorStoreIndex.from_documents(
    documents, 
    storage_context=storage_context
)

#### Retrieve more context when I query

In [12]:
query_engine = index.as_query_engine(similarity_top_k=5)

#### Use a different LLM

In [13]:
# from llama_index.llms import PaLM

# service_context = ServiceContext.from_defaults(llm=PaLM())

#### Use a different response mode

In [16]:
query_engine = index.as_query_engine(response_mode="tree_summarize")

#### Stream the response back

In [26]:
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

In [29]:
#query_engine = index.as_query_engine(streaming=True)
#response = query_engine.query("In less than 100 words, what is the meaning of good according to the author?")
#response.print_response_stream()

#### Use a chatbot instead of QA

In [30]:
#chat_engine = index.as_chat_engine()
#response = chat_engine.chat("In less than 100 words, what is the meaning of bad according to the author?")
#print(response)

#response = chat_engine.chat("Oh interesting, tell me more.")
#print(response)

## Documentation: The LlamaIndex Video Series
* Build a document chatbot from scratch
* Sub-questions
* Manage documents from a source that is constantly updating like Discord
* Combining SQL and Semantic Search

## Documentation: Use Cases
* QA
* Chatbots
* Agents
* Structured Data Extraction
* Multi-modal

## Documentation: Understanding (LI vs LC)
* Using LLMs
    * Different way of loading OpenAIEmbeddings than LC
    * Similar approach to Prompt templates 
* Loading
    * Very interesting: multi-purpose loader
    * Splitter, chunk_size, chunk_overlap
    * Creating chunks (nodes) manually
    * Adding metadata to document (copied to nodes)
    * Loading connectors from LlamaHub
* Indexing
    * Index types:
        * Vector store index
            * Nodes and embeddings
            * Semantic search
            * Top K Retrieval
        * Summary index
            * If you want to summarize the document 
        * Knowledge graph index
            * If your data is a set of interconnected concepts (a "graph") 
* Storing
    * by default, indexed data is stored only in memory
    * creating embeddings is expensive
    * store to avoid the time and cost of re-indexing
    * save: .persist()
    * load persisted index: load_index_from_storage()
* Querying
    * the most significant part of an LLM App
    * stages: retrieval, postprocessing, response synthesis.
    * customizing the stages of querying.
* Putting it all together
    * advanced techniques
    * how to build a full-stack app
        * React + Flask API
* Observability: tracing and debugging.
    * Logging
    * Callbacks to help debug
    * One-click observability with eval tools offered by partners (W&B, etc)
* Evaluation.
    * Response evaluation
    * Retrieval evaluation
    * Analyzing the cost of your app
        * MockLLM to predict token usage
        * MockEmbedding
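
The idea of predicting token usage before spending money can be sketched as follows. This is a rough illustration, not how `MockLLM` or `MockEmbedding` work internally: the 4-characters-per-token ratio is only a rule of thumb for English text, and the price is a placeholder to replace with your provider's current rate. The function names are hypothetical.

```python
import math

CHARS_PER_TOKEN = 4           # rule-of-thumb assumption for English text
PRICE_PER_1K_TOKENS = 0.0001  # placeholder price in USD, not a real quote


def estimate_tokens(text):
    """Approximate the token count from the character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_embedding_cost(chunks):
    """Return (total estimated tokens, estimated cost in USD) for all chunks."""
    total_tokens = sum(estimate_tokens(chunk) for chunk in chunks)
    return total_tokens, total_tokens / 1000 * PRICE_PER_1K_TOKENS


# Two dummy chunks of 4000 and 2000 characters.
chunks = ["a" * 4000, "b" * 2000]
tokens, cost = estimate_embedding_cost(chunks)
print(tokens, cost)
```

For real numbers, use the tokenizer that matches your model instead of a character ratio; the structure of the estimate stays the same.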

## Documentation: Optimizing
* Advanced prompt techniques
* Prompt engineering for RAG
* Advanced retrieval strategies
* Agentic strategies
    * OpenAI Agent
* Evaluation
* Fine-tuning
* Building performant RAG apps for production
    * General techniques
        * decoupling retrieval chunks vs synthesis chunks
        * structured retrieval for large document sets
        * dynamically retrieve chunks
        * optimize context embeddings
    * Long list of specific techniques
* Building RAG from scratch (lower-level)  