<a href="https://colab.research.google.com/github/victor-iyi/llm-examples/blob/main/LlamaIndex_Sandbox.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# LlamaIndex Tutorial


In [None]:
!pip install -q -U llama-index llama-index-core llama-index-readers-file llama-index-llms-ollama llama-index-embeddings-huggingface
!pip install -q -U python-dotenv

In [None]:
from google.colab import userdata

# Read the OpenAI API key from Colab secrets.
OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')

# Write the key to .env so that load_dotenv() can read it later.
!echo 'OPENAI_API_KEY='{OPENAI_API_KEY} > .env

In [None]:
# Download dataset
path = 'res/data/paul_graham'
!mkdir -p {path}
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O {path}/paul_graham_essay.txt


--2024-04-28 13:25:17--  https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 75042 (73K) [text/plain]
Saving to: ‘res/data/paul_graham/paul_graham_essay.txt’


2024-04-28 13:25:18 (15.9 MB/s) - ‘res/data/paul_graham/paul_graham_essay.txt’ saved [75042/75042]



In [None]:
import os

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

from dotenv import load_dotenv

load_dotenv()

True
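
The `True` above suggests that `load_dotenv()` found the `.env` file and set at least one variable, but it does not say which one. Since the later cells appear to rely on the default OpenAI-backed models, it can help to confirm that the key is really present before building the index. The sketch below is a minimal check using only the standard library; the helper names `require_env` and `mask` are hypothetical and not part of `python-dotenv` or LlamaIndex.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; check the .env file or the Colab secrets.")
    return value


def mask(secret: str, visible: int = 4) -> str:
    """Hide all but the last few characters so the value is safer to print."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]
```

In the notebook, `print(mask(require_env('OPENAI_API_KEY')))` would confirm the key was loaded without echoing it in full into the cell output.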

In [None]:
os.listdir()

['.config', 'res', '.env', 'sample_data']

In [None]:
documents = SimpleDirectoryReader(path).load_data()
index = VectorStoreIndex.from_documents(documents, show_progress=True)
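
Before the documents are embedded, the index splits them into smaller chunks (nodes). The sketch below illustrates the general idea with a toy word-based splitter that keeps some overlap between neighbouring chunks. It is an assumption-laden illustration only: `chunk_words`, its parameters, and the word-based sizing are hypothetical, and LlamaIndex's own splitter works differently (for example, it counts tokens and respects sentence boundaries).

```python
def chunk_words(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into chunks of at most `chunk_size` words.

    Consecutive chunks share `overlap` words, so content near a boundary
    appears in both neighbours. This is a toy stand-in, not LlamaIndex's splitter.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


# Tiny demonstration on made-up text.
demo_text = " ".join(f"w{i}" for i in range(12))
demo_chunks = chunk_words(demo_text, chunk_size=5, overlap=2)
```

Each chunk would then be embedded separately, which is why a single essay file can later yield several distinct source nodes.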

In [None]:
query_engine = index.as_query_engine()

In [None]:
response = query_engine.query("Who is the main character in the story?")
print(response)

The main character in the story is Paul Graham.


In [None]:
response = query_engine.query("What did the author do growing up?")
print(response)

The author, growing up, worked on writing short stories and programming. Initially, the author wrote short stories as a beginning writer, focusing more on characters with strong feelings rather than intricate plots. In terms of programming, the author started with early programming attempts on an IBM 1401 using Fortran, where the main challenge was the limited input options available. The author's interest in programming grew significantly with the introduction of microcomputers, particularly after acquiring a TRS-80 and exploring various programming projects like simple games and a word processor.


In [None]:
question = 'What is the moral of the story?' # @param {type: "string"}
response = query_engine.query(question)
print(response)

The moral of the story is that pursuing work that may not be prestigious but aligns with your genuine interests and motives can lead to valuable discoveries and keep you on a path that is true to yourself, rather than being swayed by the desire to impress others. This approach can help avoid common pitfalls and guide individuals towards meaningful and fulfilling endeavors.


In [None]:
response = query_engine.query('What does a computer scientist do?')
response.response

'A computer scientist typically engages in a variety of activities related to computing, such as programming, software development, research, analysis, and problem-solving. They may work on developing new algorithms, designing software applications, studying computational theory, building systems, conducting experiments, writing code, and exploring the capabilities and limitations of computers and software.'

In [None]:
response

Response(response='A computer scientist typically engages in a variety of activities related to computing, such as programming, software development, research, analysis, and problem-solving. They may work on developing new algorithms, designing software applications, studying computational theory, building systems, conducting experiments, writing code, and exploring the capabilities and limitations of computers and software.', source_nodes=[NodeWithScore(node=TextNode(id_='f36cf90b-5749-4ab8-b991-519732e43e70', embedding=None, metadata={'file_path': '/content/res/data/paul_graham/paul_graham_essay.txt', 'file_name': 'paul_graham_essay.txt', 'file_type': 'text/plain', 'file_size': 75042, 'creation_date': '2024-04-28', 'last_modified_date': '2024-04-28'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date',

In [None]:
len(response.source_nodes)

2
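
Two source nodes is consistent with the retriever returning the top 2 most similar chunks for each query, which is believed to be the default setting. The sketch below shows the general technique behind that step, cosine-similarity top-k retrieval, on made-up three-dimensional vectors. The function names and the toy vectors are hypothetical; real embeddings have hundreds or thousands of dimensions, and this is not LlamaIndex's actual implementation.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], chunk_vecs: dict[str, list[float]], k: int = 2):
    """Return the k chunk ids most similar to the query, best first, with their scores."""
    scored = [(cosine_similarity(query_vec, vec), chunk_id)
              for chunk_id, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(chunk_id, score) for score, chunk_id in scored[:k]]


# Made-up embeddings for three chunks and one query.
toy_chunks = {
    "chunk-a": [1.0, 0.0, 0.0],
    "chunk-b": [0.8, 0.6, 0.0],
    "chunk-c": [0.0, 0.0, 1.0],
}
results = top_k([1.0, 0.1, 0.0], toy_chunks, k=2)
```

Here `results` holds `chunk-a` followed by `chunk-b`, because their vectors point in nearly the same direction as the query, while `chunk-c` is orthogonal to it and is left out.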

In [None]:
response.source_nodes[0]

NodeWithScore(node=TextNode(id_='f36cf90b-5749-4ab8-b991-519732e43e70', embedding=None, metadata={'file_path': '/content/res/data/paul_graham/paul_graham_essay.txt', 'file_name': 'paul_graham_essay.txt', 'file_type': 'text/plain', 'file_size': 75042, 'creation_date': '2024-04-28', 'last_modified_date': '2024-04-28'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='38ba11bd-e502-4ae3-ac85-8a9cc558c4c3', node_type=<ObjectType.DOCUMENT: '4'>, metadata={'file_path': '/content/res/data/paul_graham/paul_graham_essay.txt', 'file_name': 'paul_graham_essay.txt', 'file_type': 'text/plain', 'file_size': 75042, 'creation_date': '2024-04-28', 'last_modified_date': '2024-04-28'}, hash='09f6925a5aa6eae5881d0e619e6ac3b920a
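
The raw repr of a source node is hard to read. A small helper can reduce each entry to its file name, similarity score, and a short text snippet. The sketch below assumes each item exposes `score`, `node.metadata`, and `node.text`, as the repr above suggests; `summarize_sources` is a hypothetical helper, and the stand-in objects exist only so the block can run without LlamaIndex installed.

```python
from types import SimpleNamespace


def summarize_sources(source_nodes, max_chars: int = 80) -> list[dict]:
    """Build one small dict per source node: file name, score, and a text snippet."""
    rows = []
    for item in source_nodes:
        # Collapse newlines and repeated spaces so the snippet fits on one line.
        text = " ".join(item.node.text.split())
        snippet = text[:max_chars] + ("..." if len(text) > max_chars else "")
        rows.append({
            "file_name": item.node.metadata.get("file_name"),
            "score": item.score,
            "snippet": snippet,
        })
    return rows


# Stand-in objects that mimic the attributes used above (not real LlamaIndex nodes).
fake_nodes = [
    SimpleNamespace(
        score=0.81,
        node=SimpleNamespace(
            metadata={"file_name": "paul_graham_essay.txt"},
            text="An example   passage\nused only for this demonstration.",
        ),
    ),
]
summary = summarize_sources(fake_nodes, max_chars=20)
```

In the notebook, `summarize_sources(response.source_nodes)` would be the equivalent call on the real response.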