<a href="https://colab.research.google.com/github/thegallier/configs/blob/main/Mistral_7b_instruct_feature_test.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Note: Responses from local models can be quite slow, especially with 8-bit quantization.

With 4-bit quantization, `mistralai/Mistral-7B-Instruct-v0.1` uses about 12 GB of VRAM and 8.5 GB of RAM. I used a T4 High-RAM instance for this notebook.
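
As a rough cross-check, the weights-only footprint can be estimated from the parameter count and the bit width. This is a back-of-the-envelope sketch: the parameter count of roughly 7.2 billion is an assumption, and measured usage will be higher because activations, the KV cache, the CUDA context and the embedding model all need memory on top of the weights.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate memory for the model weights alone, in GB (10**9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


# Assumption: roughly 7.2 billion parameters for Mistral-7B.
N_PARAMS = 7.2e9

for label, bits in [("fp16", 16), ("int8", 8), ("nf4", 4)]:
    print(f"{label}: ~{weight_memory_gb(N_PARAMS, bits):.1f} GB for weights")
```

The estimate ignores the small per-block constants that quantization stores alongside the weights.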

In [1]:
# !pip install edgartools
from edgar import *

In [2]:
!pip install git+https://github.com/run-llama/llama_index

Collecting git+https://github.com/run-llama/llama_index
  Cloning https://github.com/run-llama/llama_index to /tmp/pip-req-build-y2x50bh2
  Running command git clone --filter=blob:none --quiet https://github.com/run-llama/llama_index /tmp/pip-req-build-y2x50bh2
  Resolved https://github.com/run-llama/llama_index to commit 22544444fd001d8ff6b69788c26e65ea76969ff8
  Running command git submodule update --init --recursive -q
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done


In [3]:
!pip install transformers accelerate bitsandbytes

Collecting accelerate
  Downloading accelerate-0.25.0-py3-none-any.whl (265 kB)
Collecting bitsandbytes
  Downloading bitsandbytes-0.41.3.post2-py3-none-any.whl (92.6 MB)
Installing collected packages: bitsandbytes, accelerate
Successfully installed accelerate-0.25.0 bitsandbytes-0.41.3.post2


## Setup

### Data

In [4]:
from llama_index.readers import BeautifulSoupWebReader

url = "https://www.theverge.com/2023/9/29/23895675/ai-bot-social-network-openai-meta-chatbots"

documents = BeautifulSoupWebReader().load_data([url])

In [5]:
documents

[Document(id_='3c54bf12-8b73-40ae-8ebc-15bfb4500aca', embedding=None, metadata={'URL': 'https://www.theverge.com/2023/9/29/23895675/ai-bot-social-network-openai-meta-chatbots'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='8bd1ac6935d2b15aeb539b7d5502efa0116c547f02899a478795a82705825838', text="The synthetic social network is coming - The VergeSkip to main contentThe VergeThe Verge logo.The Verge homepageThe Verge homepageThe VergeThe Verge logo./Tech/Reviews/Science/Entertainment/MoreMenuExpandThe VergeThe Verge logo.MenuExpandPlatformer/Artificial Intelligence/TechThe synthetic social network is comingThe synthetic social network is coming / Between ChatGPT’s surprisingly human voice and Meta’s AI characters, our feeds may be about to change foreverBy  Casey Newton, a contributing editor who has been writing about tech for over 10 years. He founded Platformer, a newsletter about Big Tech and democracy. Sep 29, 2023, 1:30 PM UTC|CommentsShare

### LLM

This should run on a free-tier T4 instance.

In [4]:
import torch
from transformers import BitsAndBytesConfig
from llama_index.prompts import PromptTemplate
from llama_index.llms import HuggingFaceLLM

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)


llm = HuggingFaceLLM(
    model_name="mistralai/Mistral-7B-Instruct-v0.1",
    tokenizer_name="mistralai/Mistral-7B-Instruct-v0.1",
    query_wrapper_prompt=PromptTemplate("<s>[INST] {query_str} [/INST] </s>\n"),
    context_window=3900,
    max_new_tokens=256,
    model_kwargs={"quantization_config": quantization_config},
    # tokenizer_kwargs={},
    generate_kwargs={"temperature": 0.2, "top_k": 5, "top_p": 0.95},
    device_map="auto",
)
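
To see what the model actually receives, the wrapper template can be filled by hand. This is only an illustration: plain `str.format` stands in for `PromptTemplate` here, on the assumption that the `{query_str}` placeholder is substituted in the same way.

```python
# Same template string as the one passed to PromptTemplate above.
QUERY_WRAPPER = "<s>[INST] {query_str} [/INST] </s>\n"


def wrap_query(query_str: str) -> str:
    """Fill the wrapper template; str.format is a stand-in for PromptTemplate."""
    return QUERY_WRAPPER.format(query_str=query_str)


wrapped = wrap_query("What was the revenue in 2023?")
print(repr(wrapped))
```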


In [5]:
#import openai


In [5]:
!pip install llama_index



In [6]:
from llama_index import ServiceContext

#service_context = ServiceContext.from_defaults(llm=llm, embed_model="local:BAAI/bge-small-en-v1.5")
service_context = ServiceContext.from_defaults(llm=llm, embed_model="local")


### Index Setup

In [7]:
from edgar import *
import os
os.environ['EDGAR_IDENTITY']="peter decrem pdecrem@hotmail.com"
aapl=Company("aapl")
filings=aapl.get_filings(form="10-K")


In [7]:
aapl_html=filings.latest(1).html()

In [8]:
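# Write the filing HTML with an explicit encoding so the result does not depend on the platform default
with open("aaplhtml", "w", encoding="utf-8") as f:
    f.write(aapl_html)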

In [9]:
from llama_index.readers.file.flat_reader import FlatReader
from pathlib import Path

reader = FlatReader()
documents= reader.load_data(Path("aaplhtml"))


[nltk_data] Downloading package stopwords to
[nltk_data]     /usr/local/lib/python3.10/dist-
[nltk_data]     packages/llama_index/_static/nltk_cache...
[nltk_data]   Unzipping corpora/stopwords.zip.
[nltk_data] Downloading package punkt to
[nltk_data]     /usr/local/lib/python3.10/dist-
[nltk_data]     packages/llama_index/_static/nltk_cache...
[nltk_data]   Unzipping tokenizers/punkt.zip.


In [10]:
from llama_index import VectorStoreIndex

vector_index = VectorStoreIndex.from_documents(documents, service_context=service_context)

NameError: ignored

In [19]:
from llama_index import SummaryIndex

summary_index = SummaryIndex.from_documents(documents, service_context=service_context)

In [20]:
!pip install langchain sentence_transformers

Collecting langchain
  Downloading langchain-0.0.352-py3-none-any.whl (794 kB)
Collecting sentence_transformers
  Downloading sentence-transformers-2.2.2.tar.gz (85 kB)
  Preparing metadata (setup.py) ... done
Collecting jsonpatch<2.0,>=1.33 (from langchain)
  Downloading jsonpatch-1.33-py2.py3-none-any.whl (12 kB)
Collecting langchain-community<0.1,>=0.0.2 (from langchain)
  Downloading langchain_community-0.0.6-py3-none-any.whl (1.5 MB)
Collecting langchain-core<0.2,>=0.1 (from langchain)
  Downloading langchain_core-0.1.3-py3-none-any.whl (192 kB)

In [8]:
from langchain.embeddings.huggingface import HuggingFaceBgeEmbeddings
from llama_index import ServiceContext

embed_model = HuggingFaceBgeEmbeddings(model_name="BAAI/bge-base-en")

#service_context = ServiceContext.from_defaults(embed_model=embed_model,llm=llm)
service_context = ServiceContext.from_defaults(embed_model="local",llm=llm)

In [11]:
from llama_index import ServiceContext, set_global_service_context

set_global_service_context(service_context)

In [43]:
from llama_index.node_parser import (
    UnstructuredElementNodeParser,
)

node_parser = UnstructuredElementNodeParser(llm=llm)

In [12]:
from llama_index.node_parser import (
    UnstructuredElementNodeParser,
)

node_parser = UnstructuredElementNodeParser()
raw_nodes_2021 = node_parser.get_nodes_from_documents(documents)

[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.
100%|██████████| 4/4 [00:10<00:00,  2.73s/it]


# old
https://medium.com/@jerryjliu98/how-unstructured-and-llamaindex-can-help-bring-the-power-of-llms-to-your-own-data-3657d063e30d

In [44]:
raw_nodes_2021 = node_parser.get_nodes_from_documents(documents,llm=llm,embed_model="local")

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
__root__
  Invalid \escape: line 5 column 5 (char 151) (type=value_error.jsondecode; msg=Invalid \escape; doc={
"summary": "This table is a list of notes issued by the Nasdaq Stock Market LLC with various maturity dates and interest rates.",
"columns": [
{
"col\_name": "Trading symbol(s)",
"col\_type": "string"
},
{
"col\_name": "Name of each exchange on which registered",
"col\_type": "string"
},
{
"col\_name": "Common Stock, $0.00001 par value per share",
"col\_type": "string"
},
{
"col\_name": "AAPL",
"col\_type": "string"
},
{
"col\_name": "The Nasdaq Stock Market LLC",
"col\_type": "string"
},
{
"col\_name": "1.375% Notes due 2024",
"col\_type": "string"
},
{
"col\_name": "The Nasdaq Stock Market LLC",
"col\_type": "string"
}; pos=151; lineno=5; colno=5)
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/pydantic/v1/main.py", line 539, in parse_raw
    obj = load_str_bytes(
  File "

Validation error on structured response: 1 validation error for TableOutput
__root__
  Invalid \escape: line 5 column 5 (char 151) (type=value_error.jsondecode; msg=Invalid \escape; doc={
"summary": "This table is a list of notes issued by the Nasdaq Stock Market LLC with various maturity dates and interest rates.",
"columns": [
{
"col\_name": "Trading symbol(s)",
"col\_type": "string"
},
{
"col\_name": "Name of each exchange on which registered",
"col\_type": "string"
},
{
"col\_name": "Common Stock, $0.00001 par value per share",
"col\_type": "string"
},
{
"col\_name": "AAPL",
"col\_type": "string"
},
{
"col\_name": "The Nasdaq Stock Market LLC",
"col\_type": "string"
},
{
"col\_name": "1.375% Notes due 2024",
"col\_type": "string"
},
{
"col\_name": "The Nasdaq Stock Market LLC",
"col\_type": "string"
}; pos=151; lineno=5; colno=5)
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/pydantic/v1/main.py", line 539, in parse_raw
    obj = load_str_bytes(


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
 25%|██▌       | 1/4 [00:34<01:44, 34.83s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
__root__
  Invalid \escape: line 5 column 5 (char 376) (type=value_error.jsondecode; msg=Invalid \escape; doc={
"summary": "This table shows the potential impact of a hypothetical interest rate increase on an investment portfolio. It compares the fair value and annual interest expense of the portfolio in 2022 and 2023, assuming a 100 basis point increase in interest rates for all tenors. The table also includes the impact on term debt and investment portfolio.",
"columns": [
{
"col\_name": "Interest Rate Sensitive Instrument",
"col\_type": "string",
"summary": "The instrument that is sensitive to interest rate changes."
},
{
"col\_name": "Hypothetical Interest Rate Increase",
"col\_type": "string",
"summary": "The potential interest rate increase."
},
{
"col\_name": "Potential Impact 2023",
"col\_type": "str

Validation error on structured response: 1 validation error for TableOutput
__root__
  Invalid \escape: line 5 column 5 (char 376) (type=value_error.jsondecode; msg=Invalid \escape; doc={
"summary": "This table shows the potential impact of a hypothetical interest rate increase on an investment portfolio. It compares the fair value and annual interest expense of the portfolio in 2022 and 2023, assuming a 100 basis point increase in interest rates for all tenors. The table also includes the impact on term debt and investment portfolio.",
"columns": [
{
"col\_name": "Interest Rate Sensitive Instrument",
"col\_type": "string",
"summary": "The instrument that is sensitive to interest rate changes."
},
{
"col\_name": "Hypothetical Interest Rate Increase",
"col\_type": "string",
"summary": "The potential interest rate increase."
},
{
"col\_name": "Potential Impact 2023",
"col\_type": "string",
"summary": "The potential impact of the interest rate increase on the investment portfolio in 2023.

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
 50%|█████     | 2/4 [01:04<01:03, 31.88s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
__root__
  Invalid \escape: line 5 column 5 (char 386) (type=value_error.jsondecode; msg=Invalid \escape; doc={
"summary": "This table provides consolidated financial statements for a company for the years ended September 30, 2023, September 24, 2022, and September 25, 2021. It includes statements of operations, comprehensive income, balance sheets, shareholders' equity, and cash flows, as well as notes and a report from an independent registered public accounting firm.",
"columns": [
{
"col\_name": "Statement",
"col\_type": "string",
"summary": "The type of financial statement provided, such as statements of operations or balance sheets."
},
{
"col\_name": "Year",
"col\_type": "string",
"summary": "The year the financial statement is for, such as 2023 or 2022."
},
{
"col\_name": "Description",
"col\_type": 

Validation error on structured response: 1 validation error for TableOutput
__root__
  Invalid \escape: line 5 column 5 (char 386) (type=value_error.jsondecode; msg=Invalid \escape; doc={
"summary": "This table provides consolidated financial statements for a company for the years ended September 30, 2023, September 24, 2022, and September 25, 2021. It includes statements of operations, comprehensive income, balance sheets, shareholders' equity, and cash flows, as well as notes and a report from an independent registered public accounting firm.",
"columns": [
{
"col\_name": "Statement",
"col\_type": "string",
"summary": "The type of financial statement provided, such as statements of operations or balance sheets."
},
{
"col\_name": "Year",
"col\_type": "string",
"summary": "The year the financial statement is for, such as 2023 or 2022."
},
{
"col\_name": "Description",
"col\_type": "string",
"summary": "A brief description of the financial statement, such as 'Consolidated Statements of

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
 75%|███████▌  | 3/4 [01:33<00:30, 30.27s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
__root__
  Invalid \escape: line 5 column 5 (char 386) (type=value_error.jsondecode; msg=Invalid \escape; doc={
"summary": "This table provides consolidated financial statements for a company for the years ended September 30, 2023, September 24, 2022, and September 25, 2021. It includes statements of operations, comprehensive income, balance sheets, shareholders' equity, and cash flows, as well as notes and a report from an independent registered public accounting firm.",
"columns": [
{
"col\_name": "Statement",
"col\_type": "string",
"summary": "The type of financial statement provided, such as statements of operations or balance sheets."
},
{
"col\_name": "Year",
"col\_type": "string",
"summary": "The year the financial statement is for, such as 2023 or 2022."
},
{
"col\_name": "Description",
"col\_type": 

Validation error on structured response: 1 validation error for TableOutput
__root__
  Invalid \escape: line 5 column 5 (char 386) (type=value_error.jsondecode; msg=Invalid \escape; doc={
"summary": "This table provides consolidated financial statements for a company for the years ended September 30, 2023, September 24, 2022, and September 25, 2021. It includes statements of operations, comprehensive income, balance sheets, shareholders' equity, and cash flows, as well as notes and a report from an independent registered public accounting firm.",
"columns": [
{
"col\_name": "Statement",
"col\_type": "string",
"summary": "The type of financial statement provided, such as statements of operations or balance sheets."
},
{
"col\_name": "Year",
"col\_type": "string",
"summary": "The year the financial statement is for, such as 2023 or 2022."
},
{
"col\_name": "Description",
"col\_type": "string",
"summary": "A brief description of the financial statement, such as 'Consolidated Statements of

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
100%|██████████| 4/4 [02:01<00:00, 30.40s/it]
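
The log above shows the model emitting keys such as `col\_name`. A backslash followed by an underscore is not a valid JSON escape, which is why parsing into `TableOutput` fails. A minimal sketch of a cleanup step that drops such stray backslashes before parsing is shown below. The helper name `strip_invalid_escapes` and the shortened sample string are illustrative, and wiring this into `UnstructuredElementNodeParser` is not shown because it depends on library internals.

```python
import json
import re

# Characters that may legally follow a backslash inside a JSON string.
VALID_JSON_ESCAPES = set('"\\/bfnrtu')


def strip_invalid_escapes(raw: str) -> str:
    """Drop backslashes that do not begin a valid JSON escape (e.g. '\\_' -> '_')."""

    def fix(match: re.Match) -> str:
        char = match.group(1)
        # Keep valid escapes untouched; otherwise keep only the escaped character.
        return match.group(0) if char in VALID_JSON_ESCAPES else char

    return re.sub(r"\\(.)", fix, raw)


# Shortened, illustrative version of the model output from the log above.
raw_output = r'''{
"summary": "This table is a list of notes.",
"columns": [
{
"col\_name": "Trading symbol(s)",
"col\_type": "string"
}
]
}'''

parsed = json.loads(strip_invalid_escapes(raw_output))
print(parsed["columns"][0]["col_name"])
```

Scanning backslash pairs from left to right means an escaped backslash (`\\`) is consumed as one unit, so a following underscore is not mistaken for part of an invalid escape.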


In [13]:
base_nodes_2021, node_mappings_2021 = node_parser.get_base_nodes_and_mappings(
    raw_nodes_2021
)

In [28]:
# For comparison, this is the output obtained with OpenAI as the LLM
# DEFAULT_SUMMARY_QUERY_STR = """\
# What is this table about? Give a very concise summary (imagine you are adding a caption), \
# and also output whether or not the table should be kept.\
# """
# node_mappings_2021=
# {'id_19_table': TextNode(id_='id_19_table', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.PREVIOUS: '2'>: RelatedNodeInfo(node_id='id_19_table_ref', node_type=<ObjectType.INDEX: '3'>, metadata={'col_schema': 'Column: Trading symbol(s)\nType: string\nSummary: Symbols of the securities\n\nColumn: Name of each exchange on which registered\nType: string\nSummary: Exchanges where the securities are registered'}, hash='1a0eeb8b79c1493e5bb28bea141b31dd567b6c19505a333ec8d25d1f1891bafe'), <NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='171b23d5-fbdf-4b26-9965-b3949ed9ad24', node_type=<ObjectType.TEXT: '1'>, metadata={}, hash='a28dc6872e177d263c8500f0489a9cde2da28ff680a1484a054b02f9b44891e6')}, hash='f62b125c21afcada1fb8cc0267abb238a7268e98d4458dc831d6e8ca484edda7', text='                          Title of each class Trading symbol(s) Name of each exchange on which registered\n0  Common Stock, $0.00001 par value per share              AAPL               The Nasdaq Stock Market LLC\n1                       1.375% Notes due 2024                 —               The Nasdaq Stock Market LLC\n2                       0.000% Notes due 2025                 —               The Nasdaq Stock Market LLC\n3                       0.875% Notes due 2025                 —               The Nasdaq Stock Market LLC\n4                       1.625% Notes due 2026                 —               The Nasdaq Stock Market LLC\n5                       2.000% Notes due 2027                 —               The Nasdaq Stock Market LLC\n6                       1.375% Notes due 2029                 —               The Nasdaq Stock Market LLC\n7                       3.050% Notes due 2029                 —               The Nasdaq Stock Market LLC\n8                       0.500% Notes due 2031                 —               The Nasdaq Stock Market LLC\n9                       3.600% Notes due 2042                 —     
#           The Nasdaq Stock Market LLC', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'),
#  'id_423_table': TextNode(id_='id_423_table', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.PREVIOUS: '2'>: RelatedNodeInfo(node_id='id_423_table_ref', node_type=<ObjectType.INDEX: '3'>, metadata={'col_schema': 'Column: 2023\nType: Decline in fair value\nSummary: $3,089\n\nColumn: 2022\nType: Decline in fair value\nSummary: $4,022\n\nColumn: 2023\nType: Increase in annual interest expense\nSummary: $194\n\nColumn: 2022\nType: Increase in annual interest expense\nSummary: $201'}, hash='5de940ebcf421c25630886eacb5d6adcdb97d1cc97ceefd025fe7bcbdea6f722'), <NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='5dc54d7b-01a0-4b29-a786-67a8845658fd', node_type=<ObjectType.TEXT: '1'>, metadata={}, hash='583179470397fee8c28ff70216cb50268d695f313d5ab3b04efc26b28754c1e2')}, hash='339a6188fceea58e6327d0b148ad758a60235abd266b390525a593e8c8d5dbfb', text='          Interest Rate          Sensitive Instrument                Hypothetical Interest Rate Increase Potential Impact 2023   2022\n0  Investment portfolio  100 basis points, all tenors                Decline in fair value             $            3,089    $  4,022\n1             Term debt  100 basis points, all tenors  Increase in annual interest expense             $              194    $    201', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'),
#  'id_429_table': TextNode(id_='id_429_table', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.PREVIOUS: '2'>: RelatedNodeInfo(node_id='id_429_table_ref', node_type=<ObjectType.INDEX: '3'>, metadata={'col_schema': 'Column: Page\nType: string\nSummary: Index to page numbers of the financial statements\n\nColumn: Statement\nType: string\nSummary: Type of financial statement\n\nColumn: Year\nType: string\nSummary: Year of the financial statement'}, hash='77adc4947b56ab3bb5d091b991fc87dafc9126c55749ca96cbf32a11e40b9361'), <NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='ab7ee7a5-914e-418f-ba0b-e084e9581211', node_type=<ObjectType.TEXT: '1'>, metadata={}, hash='0152dc9fea23196d6e1b31447fefe40d939b8b1b3561954c88077ec14b2e1c57')}, hash='471e1e32a518f2f3eaf8ad54a70b8f38105ac6793fa8df5cb3930869a22d100c', text='                                                                                          Index to Consolidated Financial Statements Page\n0            Consolidated Statements of Operations for the years ended September 30, 2023, September 24, 2022 and September 25, 2021   28\n1  Consolidated Statements of Comprehensive Income for the years ended September 30, 2023, September 24, 2022 and September 25, 2021   29\n2                                                        Consolidated Balance Sheets as of September 30, 2023 and September 24, 2022   30\n3  Consolidated Statements of Shareholders’ Equity for the years ended September 30, 2023, September 24, 2022 and September 25, 2021   31\n4            Consolidated Statements of Cash Flows for the years ended September 30, 2023, September 24, 2022 and September 25, 2021   32\n5                                                                                         Notes to Consolidated Financial Statements   33\n6                                                                           Reports of Independent Registered Public Accounting 
# Firm   49', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'),
#  'id_737_table': TextNode(id_='id_737_table', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.PREVIOUS: '2'>: RelatedNodeInfo(node_id='id_737_table_ref', node_type=<ObjectType.INDEX: '3'>, metadata={'col_schema': 'Column: Page\nType: string\nSummary: Index to Consolidated Financial Statements\n\nColumn: Statement\nType: string\nSummary: Type of Financial Statement\n\nColumn: Year\nType: string\nSummary: Year of the Financial Statement'}, hash='cf9f43e4a8a46585daaf1f576d4654e891ea6906dd03bf9a8dca967f7c9e26ad'), <NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='94902292-639b-4f0d-b87b-5c5c960205d9', node_type=<ObjectType.TEXT: '1'>, metadata={}, hash='91da52f500b3efd4273f7ad51e866eef648247f9d8da7443a92d6538659f5f24')}, hash='5b455d485bfa1ed99b2b3e88a1efc4482962a61462893fa46e43a40cd3f048fd', text='                                                                                          Index to Consolidated Financial Statements Page\n0            Consolidated Statements of Operations for the years ended September 30, 2023, September 24, 2022 and September\xa025, 2021   28\n1  Consolidated Statements of Comprehensive Income for the years ended September 30, 2023, September 24, 2022 and September\xa025, 2021   29\n2                                                        Consolidated Balance Sheets as of September 30, 2023 and September 24, 2022   30\n3  Consolidated Statements of Shareholders’ Equity for the years ended September 30, 2023, September 24, 2022 and September\xa025, 2021   31\n4            Consolidated Statements of Cash Flows for the years ended September 30, 2023, September 24, 2022 and September\xa025, 2021   32\n5                                                                                         Notes to Consolidated Financial Statements   33\n6                                                                          Reports of Independent Registered Public 
# Accounting Firm*   49', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n')}

In [29]:
from llama_index.retrievers import RecursiveRetriever
from llama_index.query_engine import RetrieverQueryEngine
from llama_index import VectorStoreIndex

In [30]:
# construct top-level vector index + query engine
vector_index = VectorStoreIndex(base_nodes_2021)
vector_retriever = vector_index.as_retriever(similarity_top_k=1)
vector_query_engine = vector_index.as_query_engine(similarity_top_k=1)

In [31]:
from llama_index.retrievers import RecursiveRetriever

recursive_retriever = RecursiveRetriever(
    "vector",
    retriever_dict={"vector": vector_retriever},
    node_dict=node_mappings_2021,
    verbose=True,
)
query_engine = RetrieverQueryEngine.from_args(recursive_retriever)
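
The idea behind the recursive retriever can be sketched with a toy, library-free example: retrieve the best-matching base node, and if that node is a reference to a table, return the full table node from the mapping instead. This is not LlamaIndex's implementation. `ToyNode`, `recursive_retrieve` and the two-dimensional embeddings are hypothetical stand-ins, and only the `id_19_table` naming is borrowed from the mappings shown earlier.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ToyNode:
    node_id: str
    text: str
    embedding: List[float]
    ref_id: Optional[str] = None  # set when this node points at a full table


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def recursive_retrieve(
    query_embedding: List[float],
    base_nodes: List[ToyNode],
    node_dict: Dict[str, ToyNode],
) -> ToyNode:
    """Pick the top-1 base node; if it references a mapped node, return that instead."""
    best = max(base_nodes, key=lambda n: dot(query_embedding, n.embedding))
    if best.ref_id is not None and best.ref_id in node_dict:
        return node_dict[best.ref_id]
    return best


# Hand-made two-dimensional embeddings, purely for illustration.
base_nodes = [
    ToyNode("text_1", "Services net sales include ...", [1.0, 0.0]),
    ToyNode("id_19_table_ref", "Summary of listed securities", [0.0, 1.0], ref_id="id_19_table"),
]
node_dict = {
    "id_19_table": ToyNode("id_19_table", "Title of each class | Trading symbol(s) | ...", []),
}

hit = recursive_retrieve([0.1, 0.9], base_nodes, node_dict)
print(hit.node_id)
```

Embedding only the short table summaries while returning the full table text keeps similarity search focused on what each table is about.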

In [32]:
response

Response(response='The revenue in 2023 was $96,995 million.', source_nodes=[NodeWithScore(node=TextNode(id_='46374367-875d-46b8-b899-9cc3883ccc9a', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='8ea7fbf8-c956-4c53-8df0-b6e25b17b5dd', node_type=<ObjectType.DOCUMENT: '4'>, metadata={}, hash='dcd4d39a57d46207779b5736fb219986cbebc22c6aa6b397b325cc2477c427b3'), <NodeRelationship.PREVIOUS: '2'>: RelatedNodeInfo(node_id='e830cfb7-3228-4428-be1b-e21ff99f3d15', node_type=<ObjectType.TEXT: '1'>, metadata={}, hash='7542837f5c00170dbc2cd4ec16376462961786fd9ac45c1d39a96ac582b16816'), <NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='72604aef-11d9-4631-802b-52096f3960c2', node_type=<ObjectType.TEXT: '1'>, metadata={}, hash='294ec1934c50c0228cc4d1d18513c0dc123cd0f0bb7a5dc2ec5bdd1bcbac660d')}, hash='c7b8c7d36cd922be05cd6e33c22e4012e8814932bfcc07a0c12c51de734663a8', text='(2)\n\nSe

In [20]:
response = query_engine.query("What was the revenue in 2023?")
print(str(response))

Retrieving with query id None: What was the revenue in 2023?
Retrieving text node: (2)

Services net sales include amortization of the deferred value of services bundled in the sales price of certain products.

Total net sales include $8.2 billion of revenue recognized in 2023 that was included in deferred revenue as of September 24, 2022, $7.5 billion of revenue recognized in 2022 that was included in deferred revenue as of September 25, 2021, and $6.7 billion of revenue recognized in 2021 that was included in deferred revenue as of September 26, 2020.

The Company’s proportion of net sales by disaggregated revenue source was generally consistent for each reportable segment in Note 13, “Segment Information and Geographic Data” for 2023, 2022 and 2021, except in Greater China, where iPhone revenue represented a moderately higher proportion of net sales.

Note 3 – Earnings Per Share

The following table shows the computation of basic and diluted earnings per 

In [21]:
benchmark_df

NameError: ignored

In [None]:
from llama_index.llama_dataset import download_llama_dataset
from llama_index.llama_pack import download_llama_pack
from llama_index import VectorStoreIndex

# download and install dependencies for benchmark dataset
rag_dataset, documents = download_llama_dataset(
  "Uber10KDataset2021", "./data"
)

# build basic RAG system
index = VectorStoreIndex.from_documents(documents=documents)
query_engine = index.as_query_engine()

# evaluate using the RagEvaluatorPack
RagEvaluatorPack = download_llama_pack(
  "RagEvaluatorPack", "./rag_evaluator_pack"
)
rag_evaluator_pack = RagEvaluatorPack(
    rag_dataset=rag_dataset,
    query_engine=query_engine,
    show_progress=True,
)

############################################################################
# NOTE: If you have a lower-tier OpenAI API subscription, such as Usage    #
# Tier 1, you'll need a different batch_size and sleep_time_in_seconds.    #
# For Usage Tier 1, settings that seemed to work well were batch_size=5    #
# and sleep_time_in_seconds=15 (as of December 2023).                      #
############################################################################

benchmark_df = rag_evaluator_pack.run(
    batch_size=20,  # batches the number of openai api calls to make
    sleep_time_in_seconds=1,  # seconds to sleep before making an api call
)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
  5%|▌         | 1/20 [00:03<01:05,  3.46s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
 10%|█         | 2/20 [00:06<00:58,  3.23s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
 15%|█▌        | 3/20 [00:09<00:52,  3.09s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
 20%|██        | 4/20 [00:12<00:50,  3.13s/it]Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
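
The `batch_size` and `sleep_time_in_seconds` settings trade throughput against rate limits. The sketch below shows one plausible reading of them: process items in fixed-size batches and pause between batches. It is not the pack's actual implementation. `run_in_batches` and its injectable `sleep` argument are hypothetical names used for illustration.

```python
import time
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_in_batches(
    items: Sequence[T],
    call: Callable[[T], R],
    batch_size: int,
    sleep_time_in_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> List[R]:
    """Apply `call` to every item, pausing between consecutive batches."""
    results: List[R] = []
    for start in range(0, len(items), batch_size):
        if start > 0:
            sleep(sleep_time_in_seconds)  # pause before every batch except the first
        results.extend(call(item) for item in items[start:start + batch_size])
    return results


# Stand-in for an API call; 7 items with batch_size=3 gives 3 batches and 2 pauses.
squares = run_in_batches(list(range(7)), lambda x: x * x, batch_size=3, sleep_time_in_seconds=0.0)
print(squares)
```

With a smaller `batch_size` and a longer pause, fewer calls are made per unit of time, which is the effect the Usage Tier 1 settings in the note above aim for.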


In [24]:
# was benchmark_df = await rag_evaluator_pack.arun(
#     batch_size=20,  # batches the number of openai api calls to make
#     sleep_time_in_seconds=1,  # seconds to sleep before making an api call
# )

['__await__',
 '__class__',
 '__del__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__name__',
 '__ne__',
 '__new__',
 '__qualname__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'close',
 'cr_await',
 'cr_code',
 'cr_frame',
 'cr_origin',
 'cr_running',
 'send',
 'throw']

In [None]:
# https://llamahub.ai/l/llama_datasets-docugami_kg_rag-sec_10_q?from=llama_datasets
# https://llamahub.ai/l/llama_datasets-10k-uber_2021?from=llama_datasets
# https://huggingface.co/datasets/PatronusAI/financebench
# https://huggingface.co/datasets/him1411/EDGAR10-Q/viewer/default/train?p=14985
# https://openreview.net/pdf?id=cB3OdLInAr9 paper data is below for q and a over tables
# https://github.com/IBM/AITQA/tree/master

In [18]:
benchmark_df

NameError: ignored

In [None]:
# {
#   "Question": "How has anthropogenic climate change affected the distribution and seasonal activities of oceanic and coastal organisms?",
#   "Context": "Anthropogenic climate change has led to unprecedented changes in ocean and coastal ecosystems over millennia, significantly impacting marine life. These changes include alterations in the physical and chemical characteristics of the ocean, which have affected the timing of seasonal activities, distribution, and abundance of oceanic and coastal organisms, ranging from microbes to mammals. Specifically, since the 1950s, surface warming has caused marine taxa and communities to shift poleward at an average rate of 59.2 km per decade. Seasonal events among planktonic organisms and fish are occurring earlier by approximately 4.3 to 7.5 days and 3 days per decade, respectively. Furthermore, warming, acidification, and deoxygenation are altering ecological communities, leading to habitat loss, population declines, increased risk of species extirpations and extinctions, and rearrangements in marine food webs.",
#   "Correct Answer": "Anthropogenic climate change has led to fundamental changes in the ocean's physical and chemical characteristics, resulting in shifts in the geographic distribution and timing of seasonal activities of oceanic and coastal organisms. Marine species have shifted poleward at an average of 59.2 km per decade since the 1950s, and seasonal events for planktonic organisms and fish are occurring earlier by approximately 4.3 to 7.5 days and 3 days per decade, respectively."
# }
# {
#   "Question": "How is climate change affecting recreational fishing activities and the spread of marine-borne pathogens?",
#   "Context": "Climate change is significantly impacting marine environments and the activities associated with them. One of the notable effects is the shift of recreational fishing activities, which are moving poleward and diversifying in response to changing climate conditions. This shift in recreational fishing is attributed to the alterations in marine ecosystems and fish populations due to climate change. Additionally, there is an increasing risk and geographic spread of marine-borne pathogens, such as Vibrio sp., which is being exacerbated by the impacts of climate change on marine life. These changes reflect the broader effects of climate change on marine ecosystems and the activities and risks associated with them.",
#   "Correct Answer": "Climate change is causing recreational fishing activities to shift poleward and diversify due to changes in marine ecosystems and fish populations. Furthermore, it is increasing the geographic spread and risk of marine-borne pathogens like Vibrio sp."
# }
# {
#   "Question": "What are the main product lines of the Company, and what operating systems do they use?",
#   "Context": "The Company, in its latest report, outlines its primary product lines and their respective operating systems. These products include the iPhone line of smartphones, which operates on the iOS operating system. The iPhone line features various models such as iPhone 14 Pro, iPhone 14, iPhone 13, iPhone SE, iPhone 12, and iPhone 11. The Mac line of personal computers, which includes laptops like MacBook Air and MacBook Pro, as well as desktops like iMac, Mac mini, Mac Studio, and Mac Pro, operates on the macOS operating system. Additionally, the iPad line of multipurpose tablets, featuring models like iPad Pro, iPad Air, iPad, and iPad mini, operates on the iPadOS operating system.",
#   "Correct Answer": "The main product lines of the Company include the iPhone line of smartphones using the iOS operating system, the Mac line of personal computers using the macOS operating system, and the iPad line of multipurpose tablets using the iPadOS operating system."
# }
# {
#   "Question": "What products are included in the Company's Wearables, Home and Accessories category?",
#   "Context": "The Company's Wearables, Home and Accessories category features a diverse range of products. This category includes the AirPods line, which are wireless headphones available in different models like AirPods, AirPods Pro, and AirPods Max. Another product in this category is the Apple TV, a media streaming and gaming device operating on the tvOS operating system, and it includes models such as Apple TV 4K and Apple TV HD. Additionally, the Apple Watch line of smartwatches, operating on the watchOS operating system, includes various models like Apple Watch Ultra, Apple Watch Series 8, and Apple Watch SE. The category also comprises Beats products, the HomePod mini, and other accessories.",
#   "Correct Answer": "The Company's Wearables, Home and Accessories category includes AirPods (wireless headphones), Apple TV (media streaming and gaming device), Apple Watch (line of smartwatches), Beats products, HomePod mini, and other accessories."
# }
# {
#   "Question": "What are the key digital content and payment services offered by the Company?",
#   "Context": "In its financial report, the Company outlines its various digital content and payment services. For digital content, the Company operates platforms like the App Store, which allows customers to discover and download applications and digital content, including books, music, video games, and podcasts. It also offers subscription-based services such as Apple Arcade (a game subscription service), Apple Fitness+ (a personalized fitness service), Apple Music (a curated listening experience with on-demand radio stations), Apple News+ (a subscription news and magazine service), and Apple TV+ (offering exclusive original content and live sports). In terms of payment services, the Company provides Apple Card (a co-branded credit card) and Apple Pay (a cashless payment service).",
#   "Correct Answer": "The Company offers key digital content services like the App Store, Apple Arcade, Apple Fitness+, Apple Music, Apple News+, and Apple TV+. In payment services, it provides Apple Card and Apple Pay."
# }
# {
#   "Question": "What initiatives and measures has the Company implemented in terms of inclusion and diversity, employee engagement, and health and safety?",
#   "Context": "In addressing risk factors, the Company has taken several initiatives. For inclusion and diversity, the Company is dedicated to creating a workforce that reflects the communities it serves, focusing on increasing diverse representation, fostering an inclusive culture, and ensuring equitable pay and opportunity. Regarding employee engagement, the Company emphasizes open communication, encouraging team members to share feedback and conducting surveys to assess areas like career development and inclusivity. For health and safety, the Company is committed to protecting its team members by identifying workplace risks, providing safety and security training, and implementing specific programs for high-hazard environments. The Company has also enhanced its health and safety measures in response to the COVID-19 pandemic.",
#   "Correct Answer": "The Company has implemented initiatives for inclusion and diversity by working towards a more inclusive workforce, for employee engagement by promoting open communication and conducting surveys, and for health and safety by identifying workplace risks, providing safety training, and taking additional measures during the COVID-19 pandemic."
# }
# {
#   "Question": "What risks does the Company face regarding supply and pricing of components, and how does it affect their operations?",
#   "Context": "The Company's operations are significantly impacted by supply and pricing risks related to components. Many of these components, including those available from multiple sources, are subject to industry-wide shortages and commodity pricing fluctuations. A notable example is the global semiconductor industry, which is experiencing high demand and supply shortages. This situation has adversely affected the Company's ability to obtain sufficient quantities of components on commercially reasonable terms. Additionally, the Company relies on certain components that are available from single or limited sources, further exacerbating these risks. Suppliers' financial conditions and industry consolidations can also limit the Company's ability to secure components reasonably. Economic conditions affecting suppliers can influence the Company's operations, leading to potential supply shortages and price increases, which can materially affect the Company's business, operations, and financial condition.",
#   "Correct Answer": "The Company faces risks related to supply shortages and price fluctuations of components, especially due to industry-wide shortages and reliance on single or limited sources. These risks impact the Company's ability to secure components on commercially reasonable terms, potentially leading to adverse effects on its business, operations, and financial condition."
# }
# {
#   "Question": "What impact can adverse macroeconomic conditions, including foreign exchange fluctuations, have on the Company's business?",
#   "Context": "The Company's report highlights the potential impacts of adverse macroeconomic conditions on its business. These conditions include inflation, slower growth or recession, new or increased tariffs, changes in fiscal and monetary policy, and currency fluctuations, which can adversely affect consumer confidence and spending, thereby impacting demand for the Company’s products and services. Additionally, global or regional economic uncertainties can significantly impact the Company’s suppliers, contract manufacturers, logistics providers, and other partners, potentially leading to financial instability and credit issues. Economic downturns can also increase credit and collectibility risks, affect the Company's ability to issue new debt, reduce liquidity, and lead to declines in the value of financial instruments. Such factors can materially affect the Company’s business, results of operations, financial condition, and stock price.",
#   "Correct Answer": "Adverse macroeconomic conditions, including foreign exchange fluctuations, can reduce consumer confidence and spending, impacting demand for the Company's products and services. They can also affect the Company’s supply chain, increase credit and collectibility risks, limit the ability to issue new debt, reduce liquidity, and decrease the value of financial instruments, ultimately affecting the Company's business performance and stock price."
# }
# {
#   "Question": "How does the Company use interest rate swaps to manage interest rate risk?",
#   "Context": "In its financial report, the Company discloses its strategy for managing interest rate risk associated with its U.S. dollar-denominated fixed-rate notes. To mitigate this risk, the Company has engaged in interest rate swaps. These swaps are financial instruments that allow the Company to effectively convert the fixed interest rates of a portion of its notes into floating interest rates. This strategy is part of the Company’s broader risk management approach, which also includes using foreign currency swaps to manage foreign currency risk on its foreign currency-denominated notes by converting these notes to U.S. dollar-denominated notes.",
#   "Correct Answer": "The Company uses interest rate swaps to manage interest rate risk on its U.S. dollar-denominated fixed-rate notes by converting the fixed interest rates to floating interest rates on a portion of these notes."
# }
# {
#   "Question": "What strategies does the Company employ to protect and enhance its intellectual property (IP) rights?",
#   "Context": "The Company, operating in industries characterized by rapid technological advances, heavily relies on its ability to continuously introduce competitive products, services, and technologies. To protect and enhance its intellectual property, which includes patents, designs, copyrights, trademarks, and other forms of IP rights in the U.S. and various foreign countries, the Company engages in several strategies. These include ongoing research and development (R&D), licensing of intellectual property, and acquisition of third-party businesses and technology. The Company also regularly files for new patents, design, copyright, and trademark applications worldwide. Over time, it has accumulated a large portfolio of issued and registered intellectual property rights globally. While no single IP right is solely responsible for protecting the Company's products and services, the combination of these rights, along with the innovative skills and technical competence of its personnel, play a crucial role in differentiating and sustaining its business.",
#   "Correct Answer": "The Company employs strategies such as continuous research and development, licensing of intellectual property, and acquisition of third-party technology to protect and enhance its intellectual property. Additionally, it regularly files for new patents, designs, copyrights, and trademarks worldwide, accumulating a large portfolio of intellectual property rights globally."
# }
# {
#   "Question": "What are the different types of notes the Company has issued and their respective due dates as listed in the report?",
#   "Context": "In the financial report, the Company has listed various notes that it has issued along with their respective due dates. These notes are a part of the Company's debt structure and are traded on The Nasdaq Stock Market LLC. The notes listed include: 1.000% Notes due 2022, 1.375% Notes due 2024, 0.000% Notes due 2025, 0.875% Notes due 2025, 1.625% Notes due 2026, 2.000% Notes due 2027, 1.375% Notes due 2029, 3.050% Notes due 2029, 0.500% Notes due 2031, and 3.600% Notes due 2042. Each note is characterized by a different interest rate and a specific maturity date, reflecting the terms under which the Company has borrowed funds through these debt instruments.",
#   "Correct Answer": "The Company has issued various notes, including 1.000% Notes due 2022, 1.375% Notes due 2024, 0.000% Notes due 2025, 0.875% Notes due 2025, 1.625% Notes due 2026, 2.000% Notes due 2027, 1.375% Notes due 2029, 3.050% Notes due 2029, 0.500% Notes due 2031, and 3.600% Notes due 2042, all traded on The Nasdaq Stock Market LLC."
# }
# {
#   "Question": "How does the Company manage its interest rate risk related to its investment portfolio and outstanding debt?",
#   "Context": "The Company's exposure to interest rate risk primarily concerns its investment portfolio and outstanding debt, with a significant impact from U.S. interest rate fluctuations. These fluctuations affect interest earned on cash, cash equivalents, marketable securities, and the fair value of these securities, as well as hedging costs and interest on debt. The Company's investment strategy focuses on capital preservation and liquidity, investing in highly rated securities to minimize principal loss risk. A sensitivity analysis indicated that a 100 basis point increase in interest rates would decrease the fair market value of its investment portfolio by approximately $4.0 billion as of September 24, 2022, and $4.1 billion as of September 25, 2021. Additionally, the Company uses interest rate swaps to manage interest rate risk on its debt, which amounted to $110.1 billion as of September 24, 2022, and $118.7 billion as of September 25, 2021. These swaps convert fixed-rate payments to floating-rate, or vice versa, with gains and losses generally offset by corresponding changes in the hedging instrument. A 100 basis point increase in market interest rates would increase annual interest expense by $201 million and $186 million for the respective dates.",
#   "Correct Answer": "To manage interest rate risk, the Company invests in highly rated securities and uses interest rate swaps on its outstanding debt. The Company's investment policy focuses on capital preservation and liquidity support. A sensitivity analysis showed that a 100 basis point increase in interest rates would result in a $4.0 billion and $4.1 billion decrease in the fair market value of the investment portfolio as of September 24, 2022, and September 25, 2021, respectively. The Company also manages interest rate risk on its debt ($110.1 billion as of September 24, 2022, and $118.7 billion as of September 25, 2021) using interest rate swaps, which can effectively convert fixed-rate payments to floating-rate, or vice versa."
# }
# {
#   "Question": "How does interest rate risk affect the Company's investment portfolio, outstanding debt, and hedging strategies?",
#   "Context": "The Company’s financial report details how interest rate risk impacts its investment portfolio and outstanding debt. Interest rate fluctuations, particularly in U.S. rates, influence the interest earned on cash, cash equivalents, marketable securities, and the fair value of these securities, as well as the costs associated with hedging and interest paid on debt. The Company’s investment strategy prioritizes capital preservation and liquidity requirements, typically investing in highly rated securities and limiting credit exposure. A sensitivity analysis showed that a hypothetical 100 basis point increase in interest rates would decrease the fair market value of the investment portfolio by $4.0 billion and $4.1 billion, based on positions as of September 24, 2022, and September 25, 2021, respectively. As for debt, as of September 24, 2022, the Company had outstanding fixed-rate notes, and on September 25, 2021, it had both floating- and fixed-rate notes. The Company utilizes interest rate swaps to manage interest rate risk on its term debt, allowing it to convert fixed-rate payments to floating-rate payments or vice versa, with gains and losses on debt generally offset by corresponding hedging instruments.",
#   "Correct Answer": "Interest rate risk affects the Company’s investment portfolio and outstanding debt by impacting interest earnings, the fair value of securities, hedging costs, and interest payments on debt. A 100 basis point increase in interest rates could decrease the fair market value of the investment portfolio by approximately $4.0 billion to $4.1 billion. The Company uses interest rate swaps to manage risk on its term debt, converting between fixed-rate and floating-rate payments, with gains and losses on these swaps generally offsetting those on the debt."
# }
# {
#   "Question": "How can adverse macroeconomic conditions impact the Company's business, including its products, services, and financial stability?",
#   "Context": "The Company’s financial report discusses how various adverse macroeconomic conditions can impact its business operations and financial stability. These conditions include inflation, slower growth or recession, new or increased tariffs, trade barriers, changes in fiscal and monetary policy, tighter credit, higher interest rates, high unemployment, and currency fluctuations. These factors can lead to a decrease in consumer confidence and spending, adversely affecting the demand for the Company’s products and services. Additionally, global or regional economic downturns can significantly impact the Company’s suppliers, contract manufacturers, logistics providers, distributors, and cellular network carriers, leading to financial instability and credit issues. Economic downturns can also increase credit and collectibility risks on the Company’s trade receivables, limit its ability to issue new debt, reduce liquidity, and cause declines in the fair value of financial instruments. These factors, coupled with the ongoing impact of the COVID-19 pandemic, can materially affect the Company’s business results, operations, financial condition, and stock price.",
#   "Correct Answer": "Adverse macroeconomic conditions can significantly impact the Company’s business by reducing consumer confidence and spending, which affects demand for its products and services. These conditions can also lead to financial instability among the Company’s supply chain and channel partners, increase credit and collectibility risks, limit new debt issuance, reduce liquidity, and decrease the fair value of financial instruments. Such conditions, along with the effects of the COVID-19 pandemic, can materially affect the Company’s overall business performance and financial health."
# }
# {
#   "Question": "How does the Company manage foreign exchange risks and what was the potential impact of these risks as assessed in their latest report?",
#   "Context": "The Company employs various strategies to manage foreign exchange risks. It may use foreign currency forward and option contracts with financial institutions to hedge against risks associated with certain assets and liabilities, firmly committed transactions, forecasted future cash flows, and net investments in foreign subsidiaries. The Company also hedges portions of its forecasted foreign currency exposure related to revenue and inventory purchases, typically for up to 12 months. However, it may choose not to hedge certain exposures due to various reasons, including accounting considerations or the high economic cost of hedging. To assess the foreign currency risk, the Company performed a sensitivity analysis using a value-at-risk (VAR) model. This model used a Monte Carlo simulation to estimate the potential impact of exchange rate fluctuations. As of September 24, 2022, the VAR model indicated a maximum one-day loss in fair value of $1.0 billion, compared to $550 million as of September 25, 2021. These potential losses in the fair value of foreign currency instruments are generally offset by gains in the value of the underlying exposures.",
#   "Correct Answer": "The Company manages foreign exchange risks by using foreign currency forward and option contracts and may hedge its forecasted foreign currency exposure for up to 12 months. A value-at-risk model estimated a potential maximum one-day loss in fair value of $1.0 billion as of September 24, 2022, due to exchange rate fluctuations, compared to $550 million as of September 25, 2021. These losses are typically offset by gains in the value of the hedged exposures."
# }
# {
#   "Question": "How does the Company manage foreign currency risk associated with its assets, liabilities, transactions, and investments in foreign subsidiaries?",
#   "Context": "The Company employs various strategies to manage foreign currency risk. It enters into foreign currency forward and option contracts with financial institutions to hedge against risks associated with certain assets, liabilities, firmly committed transactions, forecasted future cash flows, and net investments in foreign subsidiaries. Additionally, the Company uses foreign currency contracts to offset foreign currency exchange gains and losses on its foreign currency-denominated debt issuances. The Company typically hedges portions of its forecasted foreign currency exposure related to revenue and inventory purchases for up to 12 months, although it may not hedge certain exposures for various reasons, such as accounting considerations or the high economic cost of hedging. To assess the risk, the Company conducted a sensitivity analysis using a value-at-risk (VAR) model, which estimated a maximum one-day loss in fair value of $1.0 billion as of September 24, 2022, compared to $550 million as of September 25, 2021. This VAR model is used as a risk estimation tool and does not necessarily represent actual potential losses. It should be noted that the losses incurred on hedging instruments are generally offset by gains in the fair value of the underlying exposures.",
#   "Correct Answer": "The Company manages foreign currency risk by using foreign currency forward and option contracts to hedge against risks associated with various financial aspects and net investments in foreign subsidiaries. It also uses these contracts to offset gains and losses on foreign currency-denominated debt issuances. The Company's sensitivity analysis estimated a potential maximum one-day loss in fair value of $1.0 billion as of September 24, 2022. However, losses on hedging instruments are typically offset by gains in the underlying exposures."
# }
# {
#   "Question": "How does the Company manage foreign exchange risks associated with its assets, liabilities, and transactions?",
#   "Context": "The Company's report details its approach to managing foreign exchange risks. It uses foreign currency forward and option contracts with financial institutions to mitigate risks associated with existing assets and liabilities, firmly committed transactions, forecasted future cash flows, and net investments in foreign subsidiaries. Additionally, the Company enters into foreign currency contracts to offset the foreign currency exchange gains and losses on its foreign currency–denominated debt issuances. The Company generally hedges parts of its forecasted foreign currency exposure related to revenue and inventory purchases, typically for up to 12 months. However, it may choose not to hedge certain exposures due to accounting considerations or the prohibitive economic cost. The Company uses a value-at-risk (VAR) model, specifically a Monte Carlo simulation, for sensitivity analysis to assess the potential impact of fluctuations in exchange rates. This analysis estimated a maximum one-day loss in fair value of $1.0 billion as of September 24, 2022, under normal market conditions, compared to $550 million as of September 25, 2021. The losses in fair value on hedging instruments are generally offset by increases in the value of the underlying exposures.",
#   "Correct Answer": "The Company manages foreign exchange risks through foreign currency forward and option contracts, hedging parts of its forecasted foreign currency exposure related to revenue and inventory purchases. It also uses a value-at-risk model for sensitivity analysis, which estimated a potential one-day loss in fair value of $1.0 billion as of September 24, 2022. The Company may not hedge certain exposures for specific reasons, and the losses on hedging instruments are generally offset by gains in the underlying exposures."
# }
# {
#   "Question": "What was the total of current liabilities for the Company as of September 24, 2022?",
#   "Context": "The Consolidated Balance Sheets of the Company, as of September 24, 2022, and September 25, 2021, provide detailed information on the Company's financial position. This includes the total current liabilities for both years. The current liabilities as of September 24, 2022, are composed of several items:\n- Accounts payable: $64,115 million\n- Other current liabilities: $60,845 million\n- Deferred revenue: $7,912 million\n- Commercial paper: $9,982 million\n- Term debt: $11,128 million",
#   "StepByStep": "To calculate the total of current liabilities as of September 24, 2022, we add up the individual components:\n1. Accounts payable: $64,115 million\n2. Other current liabilities: $60,845 million\n3. Deferred revenue: $7,912 million\n4. Commercial paper: $9,982 million\n5. Term debt: $11,128 million\nTotal = $64,115 million + $60,845 million + $7,912 million + $9,982 million + $11,128 million = $153,982 million",
#   "Correct Answer": "$153,982 million"
# }
# {
#   "Question": "What is the total amount of current assets as of September 24, 2022, according to the consolidated balance sheets?",
#   "Context": "The consolidated balance sheets in the financial report provide detailed information about the Company's assets as of September 24, 2022, and September 25, 2021. For September 24, 2022, the current assets are listed as follows:
#       - Cash and cash equivalents: $23,646 million
#       - Marketable securities: $24,658 million
#       - Accounts receivable, net: $28,184 million
#       - Inventories: $4,946 million
#       - Vendor non-trade receivables: $32,748 million
#       - Other current assets: $21,223 million
#   The total current assets are the sum of these individual asset categories.",
#   "StepByStep": [
#     "Step 1: Add the value of 'Cash and cash equivalents': $23,646 million",
#     "Step 2: Add the value of 'Marketable securities': $24,658 million",
#     "Step 3: Add the value of 'Accounts receivable, net': $28,184 million",
#     "Step 4: Add the value of 'Inventories': $4,946 million",
#     "Step 5: Add the value of 'Vendor non-trade receivables': $32,748 million",
#     "Step 6: Add the value of 'Other current assets': $21,223 million",
#     "Step 7: Calculate the total sum of these values"
#   ],
#   "Correct Answer": "$135,405 million"
# }
#
# {
#   "Question": "What was the change in the Company's cash and cash equivalents from September 25, 2021, to September 24, 2022?",
#   "Context": "The table from the Company's consolidated balance sheets provides the following information for the years ended September 25, 2021, and September 24, 2022:
#     - Cash and cash equivalents on September 25, 2021: $34,940 million
#     - Cash and cash equivalents on September 24, 2022: $23,646 million",
#   "StepByStep": [
#     "1. Identify the cash and cash equivalents for September 25, 2021, which is $34,940 million.",
#     "2. Identify the cash and cash equivalents for September 24, 2022, which is $23,646 million.",
#     "3. Calculate the change by subtracting the 2022 figure from the 2021 figure: $34,940 million - $23,646 million."
#   ],
#   "Correct Answer": "The change in the Company's cash and cash equivalents from September 25, 2021, to September 24, 2022, was a decrease of $11,294 million."
# }
# {
#   "Question": "What was the total amount of cash and cash equivalents held by the Company as of September 24, 2022?",
#   "Context": "According to the Company's Consolidated Balance Sheets from its 2022 Form 10-K report, the total current assets as of September 24, 2022, included various items. These items were: Cash and cash equivalents - $23,646 million, Marketable securities - $24,658 million, Accounts receivable, net - $28,184 million, Inventories - $4,946 million, Vendor non-trade receivables - $32,748 million, Other current assets - $21,223 million. The sum of these items gave the total current assets figure. This table provides a detailed view of the Company's current financial position, specifically highlighting its liquidity through cash and cash equivalents.",
#   "StepByStep": [
#     "Identify the 'Cash and cash equivalents' line in the current assets section of the balance sheet.",
#     "Note the amount listed next to 'Cash and cash equivalents' as of September 24, 2022."
#   ],
#   "Correct Answer": "$23,646 million"
# }
# {
#   "Question": "Calculate the total amount of 'Cash and cash equivalents' for the years 2021 and 2022.",
#   "Context": "The table from the Company's consolidated balance sheets shows the amounts for 'Cash and cash equivalents' for the years 2021 and 2022. For September 25, 2021, the amount was $34,940 million, and for September 24, 2022, it was $23,646 million. These figures represent the liquid assets available to the Company that are readily convertible to known amounts of cash.",
#   "StepByStep": [
#     "Identify the 'Cash and cash equivalents' amounts for the two years from the table.",
#     "For 2021: The amount listed is $34,940 million.",
#     "For 2022: The amount listed is $23,646 million.",
#     "Add the amounts for both years together."
#   ],
#   "Correct Answer": "The total amount of 'Cash and cash equivalents' for the years 2021 and 2022 is $58,586 million."
# }
# {
#   "Question": "What is the total value of the Company's current assets as of September 24, 2022?",
#   "Context": "The Company's consolidated balance sheet as of September 24, 2022, lists the following current assets (in millions): Cash and cash equivalents - $23,646, Marketable securities - $24,658, Accounts receivable, net - $28,184, Inventories - $4,946, Vendor non-trade receivables - $32,748, Other current assets - $21,223. To calculate the total current assets, the individual values of these assets need to be summed up.",
#   "StepByStep": [
#     "Step 1: Add Cash and cash equivalents ($23,646 million)",
#     "Step 2: Add Marketable securities ($24,658 million)",
#     "Step 3: Add Accounts receivable, net ($28,184 million)",
#     "Step 4: Add Inventories ($4,946 million)",
#     "Step 5: Add Vendor non-trade receivables ($32,748 million)",
#     "Step 6: Add Other current assets ($21,223 million)",
#     "Step 7: Sum the values obtained in steps 1 to 6"
#   ],
#   "Correct Answer": "$135,405 million"
# }
# {
#   "Question": "Calculate the total change in cash and cash equivalents for the Company from September 25, 2021, to September 24, 2022.",
#   "Context": "The Consolidated Balance Sheets table in the Company's financial report provides the following figures:
#   - Cash and cash equivalents as of September 25, 2021: $34,940 million
#   - Cash and cash equivalents as of September 24, 2022: $23,646 million
#   To calculate the total change in cash and cash equivalents during this period, we need to subtract the figure for 2021 from the figure for 2022.",
#   "StepByStep": [
#     "Step 1: Identify the cash and cash equivalents for September 25, 2021, which is $34,940 million.",
#     "Step 2: Identify the cash and cash equivalents for September 24, 2022, which is $23,646 million.",
#     "Step 3: Subtract the 2021 amount from the 2022 amount: $23,646 million - $34,940 million."
#   ],
#   "Correct Answer": "-$11,294 million"
# }
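
The arithmetic in the three examples above can be checked directly. The sketch below recomputes each answer from the figures quoted in the examples (in millions of dollars); the variable names are illustrative. It takes the change in cash as the 2022 balance minus the 2021 balance, which is the order that yields the negative figure given as the answer.

```python
# Figures (in millions of USD) copied from the examples above.
cash_2021 = 34_940
cash_2022 = 23_646

current_assets_2022 = {
    "Cash and cash equivalents": 23_646,
    "Marketable securities": 24_658,
    "Accounts receivable, net": 28_184,
    "Inventories": 4_946,
    "Vendor non-trade receivables": 32_748,
    "Other current assets": 21_223,
}

# Example 1: sum of the two year-end cash balances.
total_cash_two_years = cash_2021 + cash_2022

# Example 2: sum of all current asset line items.
total_current_assets = sum(current_assets_2022.values())

# Example 3: later balance minus earlier balance (negative means cash fell).
change_in_cash = cash_2022 - cash_2021

print(f"${total_cash_two_years:,} million")
print(f"${total_current_assets:,} million")
print(f"{change_in_cash:,} million")
```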

In [4]:
!pip install llama_index


Collecting llama_index
  Downloading llama_index-0.9.21-py3-none-any.whl (15.7 MB)
Collecting beautifulsoup4<5.0.0,>=4.12.2 (from llama_index)
  Downloading beautifulsoup4-4.12.2-py3-none-any.whl (142 kB)
Collecting dataclasses-json (from llama_index)
  Downloading dataclasses_json-0.6.3-py3-none-any.whl (28 kB)
Collecting deprecated>=1.2.9.3 (from llama_index)
  Downloading Deprecated-1.2.14-py2.py3-none-any.whl (9.6 kB)
Collecting httpx (from llama_index)
  Downloading httpx-0.26.0-py3-none-any.whl (75 kB)
Collecting openai>=1.1.0 (from llama_index)
  Downloading openai-1.6.1-py3-none-any.whl (225 kB)

In [13]:
!pip install pypdf



In [12]:
import llama_index

In [15]:
!llamaindex-cli download-llamadataset Uber10KDataset2021 --download-dir ./data

Successfully downloaded Uber10KDataset2021 to ./data


In [14]:
# https://python.langchain.com/docs/integrations/retrievers/kay


In [None]:
# uber dataset for training
# https://llamahub.ai/l/llama_datasets-10k-uber_2021?from=all

In [16]:
from llama_index import SimpleDirectoryReader
from llama_index.llama_dataset import LabelledRagDataset

rag_dataset = LabelledRagDataset.from_json("./data/rag_dataset.json")
documents = SimpleDirectoryReader(
    input_dir="./data/source_files"
).load_data()

In [19]:
response = query_engine.query("How many treasuries did apple hold?")
print(str(response))

Retrieving with query id None: How many treasuries did apple hold?
Retrieving text node: Treasury securities 19,406   —   ( 1,292 ) 18,114   35   5,468   12,611   U.S. agency securities 5,736   —   ( 600 ) 5,136   36   271   4,829   Non-U.S. government securities 17,533   6   ( 1,048 ) 16,491   —   11,332   5,159   Certificates of deposit and time deposits 1,354   —   —   1,354   1,034   320   —   Commercial paper 608   —   —   608   —   608   —   Corporate debt securities 76,840   6   ( 5,956 ) 70,890   20   12,627   58,243   Municipal securities 628   —   ( 26 ) 602   —   192   410   Mortgage- and asset-backed securities 22,365   6   ( 2,735 ) 19,636   —   344   19,292   Subtotal 144,470   18   ( 11,657 ) 132,831   1,125   31,162   100,544   Total  (2) $ 173,752   $ 30   $ ( 11,683 ) $ 162,099   $ 29,965   $ 31,590   $ 100,544

2022 Adjusted Cost Unrealized Gains Unrealized Losses Fair Value Cash and Cash Equivalents Current Marketable Securities Non-Curre

In [51]:
response

Response(response='[/', source_nodes=[NodeWithScore(node=TextNode(id_='20f1222a-a49d-48aa-a9f0-3770dbd3508d', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='b36e46cc-4121-4ddd-92ad-cd27925399e9', node_type=<ObjectType.DOCUMENT: '4'>, metadata={}, hash='dcd4d39a57d46207779b5736fb219986cbebc22c6aa6b397b325cc2477c427b3'), <NodeRelationship.PREVIOUS: '2'>: RelatedNodeInfo(node_id='61ebad6f-b6ce-49ab-a2cb-91a4b8ac17c5', node_type=<ObjectType.TEXT: '1'>, metadata={}, hash='c2989eb96fd5f51ce58d26e0d0d2bc57d8e9f9f76861a076830d2f83a456e328'), <NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='5b4a3dd0-c7d8-48de-b633-4838f7ee4aef', node_type=<ObjectType.TEXT: '1'>, metadata={}, hash='9d3a00881b0683d1cec91e35fa25479ff8ad83e7d3ef14f746d1850ab29c848c')}, hash='50c415621b2336e302fc3c9b03fc9a5e42db72aa3484966f73caf6fe9d86734b', text='Apple Inc. | 2023 Form 10-K | 31\n\nApple Inc.\

In [66]:
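# Close the backtick fence around the source text so the model can tell where it ends.
llm.complete(
    "Extract Apple's cash position from the following text:\n```\n"
    + response.source_nodes[0].node.text
    + "\n```"
)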

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


CompletionResponse(text='\nApple Inc. | 2023 Form 10-K | 31\n\nApple Inc.\n\nCONSOLIDATED STATEMENTS OF CASH FLOWS\n\n(In millions)\n\nYears ended September 30, 2023 September 24, 2022 September 25, 2021 Cash, cash equivalents and restricted cash, beginning balances $ 24,977 $ 35,929 $ 39,789 \n\nOperating activities: Net income 96,995 99,803 94,680 \n\nAdjustments to reconcile net income to cash generated by operating activities: Depreciation and amortization 11,519 11,104 11,284 \n\nShare-based compensation expense 10,833 9,038 7,906 \n\nOther ( 2,227 ) 1,006 1,006 \n\nChanges in operating assets and li', additional_kwargs={}, raw={'model_output': tensor([[    1,     1,   733,  ..., 12858,   304,   635]], device='cuda:0')}, delta=None)

In [65]:
response.source_nodes[0].node.text

'Apple Inc. | 2023 Form 10-K | 31\n\nApple Inc.\n\nCONSOLIDATED STATEMENTS OF CASH FLOWS\n\n(In millions)\n\nYears ended September 30, 2023 September 24, 2022 September 25, 2021 Cash, cash equivalents and restricted cash, beginning balances $ 24,977 \xa0 $ 35,929 \xa0 $ 39,789 \xa0  Operating activities: Net income 96,995 \xa0 99,803 \xa0 94,680 \xa0 Adjustments to reconcile net income to cash generated by operating activities: Depreciation and amortization 11,519 \xa0 11,104 \xa0 11,284 \xa0 Share-based compensation expense 10,833 \xa0 9,038 \xa0 7,906 \xa0  Other ( 2,227 ) 1,006 \xa0 ( 4,921 ) Changes in operating assets and liabilities: Accounts receivable, net ( 1,688 ) ( 1,823 ) ( 10,125 ) Vendor non-trade receivables 1,271 \xa0 ( 7,520 ) ( 3,903 ) Inventories ( 1,618 ) 1,484 \xa0 ( 2,642 ) Other current and non-current assets ( 5,684 ) ( 6,499 ) ( 8,042 ) Accounts payable ( 1,889 ) 9,448 \xa0 12,326 \xa0 Other current and non-current liabilities 3,031 \xa0 6,110 \xa0 7,475 \xa0 C

In [67]:
from typing import Any

from pydantic import BaseModel
from unstructured.partition.html import partition_html


In [31]:
!pip install InstructorEmbedding

Collecting InstructorEmbedding
  Downloading InstructorEmbedding-1.0.1-py2.py3-none-any.whl (19 kB)
Installing collected packages: InstructorEmbedding
Successfully installed InstructorEmbedding-1.0.1


In [None]:
!pip install sentence_transformers

In [68]:
from typing import Any, List
from InstructorEmbedding import INSTRUCTOR
from llama_index.embeddings.base import BaseEmbedding


class InstructorEmbeddings(BaseEmbedding):
    def __init__(
        self,
        instructor_model_name: str = "hkunlp/instructor-large",
        instruction: str = "Represent the Computer Science documentation or question:",
        **kwargs: Any,
    ) -> None:
        self._model = INSTRUCTOR(instructor_model_name)
        self._instruction = instruction
        super().__init__(**kwargs)

    # These methods must live at class level (not nested inside __init__),
    # otherwise the class never actually defines them.
    def _get_query_embedding(self, query: str) -> List[float]:
        embeddings = self._model.encode([[self._instruction, query]])
        return embeddings[0]

    def _get_text_embedding(self, text: str) -> List[float]:
        embeddings = self._model.encode([[self._instruction, text]])
        return embeddings[0]

    def _get_text_embeddings(self, texts: List[str]) -> List[List[float]]:
        embeddings = self._model.encode(
            [[self._instruction, text] for text in texts]
        )
        return embeddings

In [None]:
!pip install langchain

In [69]:
from langchain.embeddings.huggingface import HuggingFaceBgeEmbeddings
from llama_index import ServiceContext

embed_model = HuggingFaceBgeEmbeddings(model_name="BAAI/bge-base-en")

#service_context = ServiceContext.from_defaults(embed_model=embed_model,llm=llm)
service_context = ServiceContext.from_defaults(embed_model="local",llm=llm)

In [72]:
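# LangChain embedding classes expose embed_query / embed_documents rather than
# LlamaIndex's get_text_embedding, so call those directly.
embed_model.embed_query("It is raining cats and dogs here!")
# embed_documents expects a list of strings, not a single string.
embed_model.embed_documents(["It's raining and the cats are out"])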

In [70]:
# The LangChain model has no get_text_embedding method; embed_query is its single-text call.
embeddings = embed_model.embed_query(
    "It is raining cats and dogs here!"
)

In [73]:
from llama_index import ServiceContext, set_global_service_context
from llama_index.llms import OpenAI
from llama_index.embeddings import OpenAIEmbedding, HuggingFaceEmbedding
from llama_index.node_parser import (
    SentenceWindowNodeParser,
)
from llama_index.text_splitter import SentenceSplitter

# create the sentence window node parser w/ default settings
node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=3,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
)

# base node parser is a sentence splitter
text_splitter = SentenceSplitter()

#llm = OpenAI(model="gpt-3.5-turbo", temperature=0.1)
embed_model = HuggingFaceEmbedding(
    model_name="sentence-transformers/all-mpnet-base-v2", max_length=512
)
ctx = ServiceContext.from_defaults(
    llm=llm,
    embed_model=embed_model,
    # node_parser=node_parser,
)
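
To make the `window_size=3` setting concrete, here is a rough, library-free sketch of the idea as I understand it: each sentence becomes its own node, and up to `window_size` neighbouring sentences on each side are stored alongside it under the `window` metadata key. The naive regex splitter and the `sentence_windows` helper are illustrative stand-ins, not LlamaIndex's implementation.

```python
import re
from typing import Dict, List


def sentence_windows(text: str, window_size: int = 3) -> List[Dict[str, str]]:
    """Return one record per sentence plus its neighbours (hypothetical helper)."""
    # Naive split after sentence-ending punctuation; real parsers are more careful.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    nodes = []
    for i, sentence in enumerate(sentences):
        start = max(0, i - window_size)
        end = min(len(sentences), i + window_size + 1)
        nodes.append(
            {
                "original_text": sentence,
                "window": " ".join(sentences[start:end]),
            }
        )
    return nodes


nodes = sentence_windows("One. Two. Three. Four. Five.", window_size=1)
print(nodes[2])
```

The single sentence is what gets embedded and matched, while the wider window is what can later be handed to the LLM as context.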

In [35]:

set_global_service_context(service_context)

In [36]:
raw_nodes_2021 = node_parser.get_nodes_from_documents(documents,llm=llm,embed_model="local")

In [37]:
len(raw_nodes_2021)

625

In [38]:
import os
import pickle

if not os.path.exists("2021_nodes.pkl"):
    # raw_nodes_2021 = node_parser.get_nodes_from_documents(documents,llm=llm,embed_model="local:BAAI/bge-small-en-v1.5")
    raw_nodes_2021 = node_parser.get_nodes_from_documents(documents,llm=llm,embed_model="local")
    with open("2021_nodes.pkl", "wb") as f:
        pickle.dump(raw_nodes_2021, f)
else:
    with open("2021_nodes.pkl", "rb") as f:
        raw_nodes_2021 = pickle.load(f)

In [None]:
print(raw_nodes_2021)

### Helpful Imports / Logging

In [39]:
from llama_index.response.notebook_utils import display_response

In [74]:
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

## Basic Query Engine

### Compact (default)

In [75]:
query_engine = vector_index.as_query_engine(response_mode="compact")

response = query_engine.query("What are Apple's financial risks with respect to interest rates, inflation and foreign exchange?")

display_response(response)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


**`Final Response:`** [/

In [76]:
response

Response(response='[/', source_nodes=[NodeWithScore(node=TextNode(id_='82726765-43d7-492c-98ef-8d4030c3afb3', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='96761280-1add-4e51-b1f3-d1af28c43f4f', node_type=<ObjectType.DOCUMENT: '4'>, metadata={}, hash='458f4f616ef88556bd5bdcfaf788bbfba1aca4762edc538de07853990ef633a0'), <NodeRelationship.PREVIOUS: '2'>: RelatedNodeInfo(node_id='bdbc6346-4dac-4ec8-95e9-05d2b633e36b', node_type=<ObjectType.TEXT: '1'>, metadata={}, hash='91d8b35cb6e67f0746b52fb8fbb6db457278d71b152d36a58e8856252eeb45d9'), <NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='4357cf46-3a05-4b3f-a937-250f6f750b8f', node_type=<ObjectType.TEXT: '1'>, metadata={}, hash='7fd6ce578bd10222fa4598814e601072116f045a246deb89e8b47042ee560cb8')}, hash='7ddfdf856d367dbb4709dfdf9bb765a02de44cec8af362f4e4c722d4fba49034', text='Additionally, strengthening of foreign currenci

### Refine

In [77]:
query_engine = vector_index.as_query_engine(response_mode="refine")

response = query_engine.query("How do OpenAI and Meta differ on AI tools?")

display_response(response)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


**`Final Response:`** [/

In [None]:
response

### Tree Summarize

In [78]:
query_engine2 = vector_index.as_query_engine(response_mode="tree_summarize")

response = query_engine2.query("Does Apple have exposure to foreign exchange changes?")

display_response(response)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


**`Final Response:`** [/

In [83]:
response

Response(response='[/', source_nodes=[NodeWithScore(node=TextNode(id_='82726765-43d7-492c-98ef-8d4030c3afb3', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='96761280-1add-4e51-b1f3-d1af28c43f4f', node_type=<ObjectType.DOCUMENT: '4'>, metadata={}, hash='458f4f616ef88556bd5bdcfaf788bbfba1aca4762edc538de07853990ef633a0'), <NodeRelationship.PREVIOUS: '2'>: RelatedNodeInfo(node_id='bdbc6346-4dac-4ec8-95e9-05d2b633e36b', node_type=<ObjectType.TEXT: '1'>, metadata={}, hash='91d8b35cb6e67f0746b52fb8fbb6db457278d71b152d36a58e8856252eeb45d9'), <NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='4357cf46-3a05-4b3f-a937-250f6f750b8f', node_type=<ObjectType.TEXT: '1'>, metadata={}, hash='7fd6ce578bd10222fa4598814e601072116f045a246deb89e8b47042ee560cb8')}, hash='7ddfdf856d367dbb4709dfdf9bb765a02de44cec8af362f4e4c722d4fba49034', text='Additionally, strengthening of foreign currenci

In [80]:
# `response` is a Response object, so convert it to text before concatenating.
llm.complete("Summarize the following: " + str(response))
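
The three response modes used in this section (`compact`, `refine`, `tree_summarize`) differ mainly in how the retrieved chunks are fed to the LLM. The sketch below is my rough reading of those strategies, written against a stand-in `llm` callable; the helper names, prompts, and character-based packing are assumptions for illustration, not LlamaIndex's code.

```python
from typing import Callable, List

LLM = Callable[[str], str]  # stand-in for a real model call


def pack(chunks: List[str], max_chars: int) -> List[str]:
    """Greedily merge chunks so each merged block stays within max_chars."""
    packed, current = [], ""
    for chunk in chunks:
        candidate = chunk if not current else current + "\n" + chunk
        if current and len(candidate) > max_chars:
            packed.append(current)
            current = chunk
        else:
            current = candidate
    if current:
        packed.append(current)
    return packed


def refine(query: str, chunks: List[str], llm: LLM) -> str:
    """One LLM call per chunk, each call revising the previous answer."""
    answer = ""
    for chunk in chunks:
        answer = llm(f"Question: {query}\nAnswer so far: {answer}\nContext: {chunk}")
    return answer


def compact(query: str, chunks: List[str], llm: LLM, max_chars: int = 2000) -> str:
    """Like refine, but pack chunks together first to cut down on LLM calls."""
    return refine(query, pack(chunks, max_chars), llm)


def tree_summarize(query: str, chunks: List[str], llm: LLM, fanout: int = 2) -> str:
    """Summarize groups of chunks, then summarize the summaries, until one is left."""
    if not chunks:
        return ""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    level = list(chunks)
    while True:
        level = [
            llm(f"Question: {query}\nContext: " + "\n".join(level[i:i + fanout]))
            for i in range(0, len(level), fanout)
        ]
        if len(level) == 1:
            return level[0]


calls: List[str] = []


def fake_llm(prompt: str) -> str:
    calls.append(prompt)
    return f"summary#{len(calls)}"


chunks = ["a" * 10, "b" * 10, "c" * 10, "d" * 10]
refine("q", chunks, fake_llm)
print("refine calls:", len(calls))
```

Counting calls with the fake model is a cheap way to see why `refine` gets slow on a local model: it makes one call per retrieved chunk, while the other two modes usually make fewer.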

## Router Query Engine

In [84]:
from llama_index.tools import QueryEngineTool, ToolMetadata

vector_tool = QueryEngineTool(
    vector_index.as_query_engine(),
    metadata=ToolMetadata(
        name="vector_search",
        description="Useful for searching for specific facts."
    )
)

summary_tool = QueryEngineTool(
    summary_index.as_query_engine(response_mode="tree_summarize"),
    metadata=ToolMetadata(
        name="summary",
        description="Useful for summarizing an entire document."
    )
)
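
Conceptually, a router reads the tool descriptions, picks one tool (or several when `select_multi=True`), and forwards the query. The sketch below mimics that flow without any LLM: `ToolSketch`, `overlap_choose`, and `route` are hypothetical names, and the word-overlap selector is only a toy replacement for the LLM-based selection.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ToolSketch:
    name: str
    description: str
    run: Callable[[str], str]


def overlap_choose(query: str, tools: List[ToolSketch]) -> List[int]:
    """Toy selector: rank tools by how many words they share with the query."""
    words = set(re.findall(r"\w+", query.lower()))
    scores = [
        len(words & set(re.findall(r"\w+", tool.description.lower())))
        for tool in tools
    ]
    ranked = sorted(range(len(tools)), key=lambda i: -scores[i])
    relevant = [i for i in ranked if scores[i] > 0]
    return relevant or [0]  # fall back to the first tool when nothing matches


def route(
    query: str,
    tools: List[ToolSketch],
    choose: Callable[[str, List[ToolSketch]], List[int]],
    select_multi: bool = False,
) -> List[str]:
    """Run the selected tool(s); single-select keeps only the top choice."""
    picked = choose(query, tools)
    if not select_multi:
        picked = picked[:1]
    return [tools[i].run(query) for i in picked]


tools = [
    ToolSketch("vector_search", "Useful for searching for specific facts.",
               lambda q: f"vector_search <- {q}"),
    ToolSketch("summary", "Useful for summarizing an entire document.",
               lambda q: f"summary <- {q}"),
]

print(route("specific facts", tools, overlap_choose))
print(route("specific facts for an entire document", tools, overlap_choose,
            select_multi=True))
```

With multi-select, every chosen tool runs its own query, so the number of LLM calls (and the memory pressure on a local model) grows with the number of tools picked.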

### Single Selector

In [87]:
from llama_index.query_engine import RouterQueryEngine

query_engine = RouterQueryEngine.from_defaults(
    [vector_tool, summary_tool],
    service_context=service_context,
    select_multi=False
)

response = query_engine.query("Does the company do interest rate swaps?")

display_response(response)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


**`Final Response:`** [/

In [91]:
len(response.response)

2

In [95]:
response.response

'[/'

### Multi Selector

In [96]:
from llama_index.query_engine import RouterQueryEngine

query_engine = RouterQueryEngine.from_defaults(
    [vector_tool, summary_tool],
    service_context=service_context,
    select_multi=True,
)

response = query_engine.query("Summarize Apple's interest rate, foreign exchange and inflation risk, and the hedges they have, in 3 bullet points")

display_response(response)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
(the same message repeats for each further generation call; output truncated)

OutOfMemoryError: ignored

## SubQuestion Query Engine

In [None]:
from llama_index.tools import QueryEngineTool, ToolMetadata

vector_tool = QueryEngineTool(
    vector_index.as_query_engine(),
    metadata=ToolMetadata(
        name="vector_search",
        description="Useful for searching for specific facts."
    )
)

summary_tool = QueryEngineTool(
    summary_index.as_query_engine(response_mode="tree_summarize"),
    metadata=ToolMetadata(
        name="summary",
        description="Useful for summarizing an entire document."
    )
)

In [None]:
import nest_asyncio
nest_asyncio.apply()

In [None]:
from llama_index.query_engine import SubQuestionQueryEngine

query_engine = SubQuestionQueryEngine.from_defaults(
    [vector_tool, summary_tool],
    service_context=service_context,
    verbose=True,
)

response = query_engine.query("What was mentioned about Meta? How does it differ from how OpenAI is talked about?")

display_response(response)

## SQL Query Engine

Here, we download a sample SQLite database (Chinook) with 11 tables holding information about music, playlists, and customers. For this test, we limit the query engine to a few of those tables.

In [None]:
import locale
locale.getpreferredencoding = lambda: "UTF-8"

In [None]:
!curl -L https://www.sqlitetutorial.net/wp-content/uploads/2018/03/chinook.zip -o /content/chinook.zip
!unzip /content/chinook.zip

In [None]:
from sqlalchemy import create_engine, MetaData, Table, Column, String, Integer, select, column

engine = create_engine("sqlite:////content/chinook.db")

In [None]:
from llama_index import SQLDatabase

sql_database = SQLDatabase(engine)

In [None]:
from llama_index.indices.struct_store import NLSQLTableQueryEngine

query_engine = NLSQLTableQueryEngine(
    sql_database=sql_database,
    tables=["albums", "tracks", "artists"],
    service_context=service_context
)

In [None]:
response = query_engine.query("What are some albums? Limit it to 5.")

display_response(response)

In [None]:
response

In [None]:
response = query_engine.query("What are some artists? Limit it to 5.")

display_response(response)

This last query should require a more complex join, since tracks link to artists only through albums.

In [None]:
response = query_engine.query("What are some tracks from the artist AC/DC? Limit it to 3")

display_response(response)

In [None]:
response

In [None]:
print(response.metadata['sql_query'])
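
For comparison with whatever SQL the engine generates, here is a hand-written version of the join on a tiny in-memory database. The table and column names are assumed to mirror the Chinook schema (`artists`, `albums`, `tracks` linked by `ArtistId` and `AlbumId`), and the rows are placeholders rather than real Chinook data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE artists (ArtistId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE albums  (AlbumId INTEGER PRIMARY KEY, Title TEXT, ArtistId INTEGER);
    CREATE TABLE tracks  (TrackId INTEGER PRIMARY KEY, Name TEXT, AlbumId INTEGER);

    -- Placeholder rows for illustration only.
    INSERT INTO artists VALUES (1, 'AC/DC'), (2, 'Accept');
    INSERT INTO albums  VALUES (1, 'Album A', 1), (2, 'Album B', 2);
    INSERT INTO tracks  VALUES (1, 'Track 1', 1), (2, 'Track 2', 1),
                               (3, 'Track 3', 1), (4, 'Track 4', 1),
                               (5, 'Track 5', 2);
    """
)

# tracks -> albums -> artists: two joins are needed to filter tracks by artist name.
rows = conn.execute(
    """
    SELECT tracks.Name
    FROM tracks
    JOIN albums  ON tracks.AlbumId = albums.AlbumId
    JOIN artists ON albums.ArtistId = artists.ArtistId
    WHERE artists.Name = ?
    ORDER BY tracks.TrackId
    LIMIT 3
    """,
    ("AC/DC",),
).fetchall()

print(rows)
```

If the generated query in `response.metadata['sql_query']` lacks one of the two joins, that is a likely reason for an empty or wrong answer.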

## Programs

Depending on the LLM, you will have to test with either `OpenAIPydanticProgram` or `LLMTextCompletionProgram`.

In [None]:
from typing import List
from pydantic import BaseModel

from llama_index.program import OpenAIPydanticProgram, LLMTextCompletionProgram

class Song(BaseModel):
    """Data model for a song."""

    title: str
    length_seconds: int


class Album(BaseModel):
    """Data model for an album."""

    name: str
    artist: str
    songs: List[Song]

In [None]:
from llama_index.output_parsers import PydanticOutputParser

prompt_template_str = """\
Generate an example album, with an artist and a list of songs. \
Using the movie {movie_name} as inspiration.\
"""
program = LLMTextCompletionProgram.from_defaults(
    output_parser=PydanticOutputParser(Album),
    prompt_template_str=prompt_template_str,
    llm=llm,
    verbose=True,
)

In [None]:
output = program(movie_name="The Shining")

In [None]:
print(output)
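
As a rough mental model of a text-completion program: prompt the model for JSON, pull the JSON out of whatever text comes back, and validate it into typed objects. The sketch below does this with standard-library dataclasses and a fake model; `SongSketch`, `AlbumSketch`, `run_program`, and `fake_llm` are illustrative names, and none of this is the library's actual parsing logic.

```python
import json
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SongSketch:
    title: str
    length_seconds: int


@dataclass
class AlbumSketch:
    name: str
    artist: str
    songs: List[SongSketch]


def run_program(llm: Callable[[str], str], prompt: str) -> AlbumSketch:
    """Ask for JSON, extract it from the reply, and build typed objects."""
    raw = llm(prompt + "\nRespond with JSON only.")
    # Models often wrap JSON in extra prose; keep the outermost {...} span.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    data = json.loads(raw[start:end + 1])
    songs = [
        SongSketch(str(song["title"]), int(song["length_seconds"]))
        for song in data["songs"]
    ]
    return AlbumSketch(str(data["name"]), str(data["artist"]), songs)


def fake_llm(prompt: str) -> str:
    # Canned reply standing in for a real model.
    return (
        'Sure! {"name": "Example Album", "artist": "Example Artist", '
        '"songs": [{"title": "Track One", "length_seconds": 180}]} Hope that helps.'
    )


album = run_program(fake_llm, "Generate an example album.")
print(album)
```

Smaller local models tend to fail at the extraction or validation step, which is why the choice of program class depends on the LLM.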

## Data Agent

Similar to programs, OpenAI LLMs will use `OpenAIAgent`, while other LLMs will use `ReActAgent`.

In [None]:
from llama_index.agent import OpenAIAgent, ReActAgent

agent = ReActAgent.from_tools(
    [vector_tool, summary_tool],
    llm=llm,
    verbose=True
)

Tool usage seems fairly flaky with this model.
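
One plausible contributor to flaky tool use, offered as a guess rather than a diagnosis: a ReAct-style agent has to recover the tool name and arguments from free-form text, so any deviation from the expected `Thought` / `Action` / `Action Input` layout breaks the step. The parser below is a simplified illustration of that dependency; the regular expression and the `parse_react_step` name are mine, not LlamaIndex's.

```python
import json
import re
from typing import Optional, Tuple

# Expect "Action: <tool>" followed on the next line by "Action Input: {json}".
STEP_RE = re.compile(
    r"Action:\s*(?P<tool>[\w-]+)\s*\nAction Input:\s*(?P<args>\{.*\})",
    re.DOTALL,
)


def parse_react_step(text: str) -> Optional[Tuple[str, dict]]:
    """Return (tool_name, arguments) or None when the text is off-format."""
    match = STEP_RE.search(text)
    if match is None:
        return None
    try:
        args = json.loads(match.group("args"))
    except json.JSONDecodeError:
        return None
    return match.group("tool"), args


good = (
    "Thought: I need to look this up.\n"
    "Action: vector_search\n"
    'Action Input: {"input": "Meta"}'
)
bad = "I will use the vector_search tool to look up Meta."

print(parse_react_step(good))
print(parse_react_step(bad))
```

The second string expresses the same intent as the first, but a strict parser cannot act on it, which is the kind of failure a smaller instruct model may trigger often.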

In [None]:
response = agent.chat("Hello!")
print(response)

In [None]:
response = agent.chat("What was mentioned about Meta? How does it differ from how OpenAI is talked about?")
print(response)