# Advanced RAG with LlamaParse

<a href="https://colab.research.google.com/github/run-llama/llama_parse/blob/main/examples/demo_advanced.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook is a complete walkthrough of using LlamaParse with advanced indexing/retrieval techniques in LlamaIndex over Apple's 2021 10-K filing.

This lets us ask sophisticated questions that "naive" parsing/indexing techniques cannot handle with existing models.

Note: this example uses `llama_index >= 0.10.4`.

In [2]:
!pip install -qU llama-index
!pip install -qU llama-index-core==0.10.6.post1
!pip install -qU llama-index-embeddings-openai
!pip install -qU llama-index-postprocessor-flag-embedding-reranker
!pip install -qU git+https://github.com/FlagOpen/FlagEmbedding.git
!pip install -qU llama-parse

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
llama-index 0.11.8 requires llama-index-core<0.12.0,>=0.11.8, but you have llama-index-core 0.10.6.post1 which is incompatible.
llama-index-agent-openai 0.3.1 requires llama-index-core<0.12.0,>=0.11.0, but you have llama-index-core 0.10.6.post1 which is incompatible.
llama-index-cli 0.3.1 requires llama-index-core<0.12.0,>=0.11.0, but you have llama-index-core 0.10.6.post1 which is incompatible.
llama-index-embeddings-openai 0.2.4 requires llama-index-core<0.12.0,>=0.11.0, but you have llama-index-core 0.10.6.post1 which is incompatible.
llama-index-indices-managed-llama-cloud 0.3.0 requires llama-index-core<0.12.0,>=0.11.0, but you have llama-index-core 0.10.6.post1 which is incompatible.
llama-index-llms-openai 0.2.3 requires llama-index-core<0.12.0,>=0.11.7, but you have llama-index-core 0.10.6.post1 which

In [3]:
!wget "https://s2.q4cdn.com/470004039/files/doc_financials/2021/q4/_10-K-2021-(As-Filed).pdf" -O apple_2021_10k.pdf

--2024-09-12 01:06:23--  https://s2.q4cdn.com/470004039/files/doc_financials/2021/q4/_10-K-2021-(As-Filed).pdf
Resolving s2.q4cdn.com (s2.q4cdn.com)... 68.70.205.3, 68.70.205.2, 68.70.205.4, ...
Connecting to s2.q4cdn.com (s2.q4cdn.com)|68.70.205.3|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 789896 (771K) [application/pdf]
Saving to: ‘apple_2021_10k.pdf’


2024-09-12 01:06:23 (4.06 MB/s) - ‘apple_2021_10k.pdf’ saved [789896/789896]



Some OpenAI and LlamaParse details

In [5]:
# llama-parse is async-first; running async code in a notebook requires nest_asyncio
import nest_asyncio

nest_asyncio.apply()

import os
import getpass

# API access to llama-cloud
os.environ["LLAMA_CLOUD_API_KEY"] = getpass.getpass("Enter your LlamaIndex Cloud API Key")

# Using OpenAI API for embeddings/llms
os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API Key")

Enter your LlamaIndex Cloud API Key··········
Enter your OpenAI API Key··········
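The comment in the cell above says a notebook needs `nest_asyncio` to run async code. As a rough illustration of the underlying problem, here is a sketch using plain `asyncio` only (`parse_job` and `notebook_like_cell` are hypothetical stand-ins, not LlamaParse code): calling `asyncio.run()` while an event loop is already running, as is the case inside Jupyter, raises a `RuntimeError`.

```python
import asyncio


async def parse_job() -> str:
    """Hypothetical stand-in for an async parsing call."""
    await asyncio.sleep(0)
    return "parsed"


async def notebook_like_cell() -> str:
    """Simulate code that runs inside an already-active event loop."""
    coro = parse_job()
    try:
        # A nested asyncio.run() is refused while a loop is running.
        asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # avoid a "never awaited" warning
        return str(exc)
    return "no error"


message = asyncio.run(notebook_like_cell())
print(message)
```

`nest_asyncio.apply()` patches the event loop so that nested calls of this kind are allowed, which is why the cell calls it before anything else.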


In [6]:
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import VectorStoreIndex
from llama_index.core import Settings

embed_model = OpenAIEmbedding(model="text-embedding-3-small")
llm = OpenAI(model="gpt-3.5-turbo-0125")

Settings.llm = llm
Settings.embed_model = embed_model

## Using the new `LlamaParse` PDF reader for PDF parsing

We also compare two different retrieval/query engine strategies:
1. Using the raw Markdown text as nodes to build the index, with a simple query engine generating the results;
2. Using `MarkdownElementNodeParser` to parse the Markdown output of `LlamaParse`, and building a recursive retriever query engine for generation.

In [7]:
from llama_parse import LlamaParse

documents = LlamaParse(result_type="markdown").load_data("./apple_2021_10k.pdf")

Started parsing the file under job_id e189ef98-2058-45c1-b2b8-a7fda9da248d


In [8]:
print(type(documents))

<class 'list'>


In [9]:
print(len(documents))

82


In [11]:
print(type(documents[0]))

<class 'llama_index.core.schema.Document'>


In [13]:
print(dict(documents[0]).keys())

dict_keys(['id_', 'embedding', 'metadata', 'excluded_embed_metadata_keys', 'excluded_llm_metadata_keys', 'relationships', 'text', 'mimetype', 'start_char_idx', 'end_char_idx', 'text_template', 'metadata_template', 'metadata_seperator'])


In [14]:
from copy import deepcopy
from llama_index.core.schema import TextNode
from llama_index.core import VectorStoreIndex


def get_page_nodes(docs, separator="\n---\n"):
    """Split each document into page nodes on the given separator."""
    nodes = []
    for doc in docs:
        doc_chunks = doc.text.split(separator)
        for doc_chunk in doc_chunks:
            node = TextNode(
                text=doc_chunk,
                metadata=deepcopy(doc.metadata),
            )
            nodes.append(node)

    return nodes

In [15]:
page_nodes = get_page_nodes(documents)
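To see what the page split does without calling LlamaParse, the same logic can be tried on a plain string. This is only a sketch: `FakeDoc` and `split_pages` are hypothetical stand-ins for the `Document` and `TextNode` classes used above, and the sample text is made up.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDoc:
    """Hypothetical stand-in for a parsed document (text plus metadata)."""

    text: str
    metadata: dict = field(default_factory=dict)


def split_pages(docs, separator="\n---\n"):
    """Return one (text, metadata) pair per page, copying the metadata."""
    pages = []
    for doc in docs:
        for chunk in doc.text.split(separator):
            pages.append((chunk, dict(doc.metadata)))
    return pages


sample = FakeDoc(
    "# Page 1\n|a|b|\n|---|---|\n|1|2|\n---\n# Page 2\nclosing text",
    {"file": "demo.pdf"},
)
pages = split_pages([sample])
print(len(pages))  # 2
```

Note that the table's `|---|---|` delimiter row does not trigger a split, because that line starts with `|` and so never matches `"\n---\n"`.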

In [16]:
from llama_index.core.node_parser import MarkdownElementNodeParser

node_parser = MarkdownElementNodeParser(
    llm=OpenAI(model="gpt-3.5-turbo-0125"), num_workers=8
)
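Conceptually, an element parser separates table blocks from the surrounding prose so that each can be indexed differently. The sketch below is a deliberately simplified illustration of that idea, not the logic `MarkdownElementNodeParser` actually uses; `split_elements` is a hypothetical helper that treats consecutive lines starting with `|` as one table.

```python
def split_elements(markdown: str):
    """Group lines into ("table", text) and ("text", text) elements."""
    groups = []
    for line in markdown.splitlines():
        kind = "table" if line.lstrip().startswith("|") else "text"
        if groups and groups[-1][0] == kind:
            groups[-1][1].append(line)  # extend the current element
        else:
            groups.append((kind, [line]))  # start a new element
    return [(kind, "\n".join(lines)) for kind, lines in groups]


sample = "\n".join(
    [
        "# CONSOLIDATED STATEMENTS OF OPERATIONS",
        "| |Years ended|September 25, 2021|",
        "|---|---|---|",
        "|Net sales:|Products|$ 297,392|",
        "Notes follow the table.",
    ]
)
elements = split_elements(sample)
print([kind for kind, _ in elements])
```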

In [17]:
nodes = node_parser.get_nodes_from_documents(documents)


In [19]:
print(type(nodes))

<class 'list'>


In [20]:
print(len(nodes))

263


In [21]:
print(type(nodes[0]))

<class 'llama_index.core.schema.TextNode'>


In [22]:
print(dict(nodes[0]).keys())

dict_keys(['id_', 'embedding', 'metadata', 'excluded_embed_metadata_keys', 'excluded_llm_metadata_keys', 'relationships', 'text', 'mimetype', 'start_char_idx', 'end_char_idx', 'text_template', 'metadata_template', 'metadata_seperator'])


In [25]:
print(dict(nodes[0]).items())

dict_items([('id_', '8078449a-6974-458d-8d61-b0fa2f92c15d'), ('embedding', None), ('metadata', {}), ('excluded_embed_metadata_keys', []), ('excluded_llm_metadata_keys', []), ('relationships', {<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='56879057-0f8a-4af4-b027-6c0bcbbf14f5', node_type=<ObjectType.DOCUMENT: '4'>, metadata={}, hash='8e9fc24e7ae31d9e4baba136e3f03f08aad93d1045bee247f64d7de229d20428'), <NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='d0ef0d1d-3e8f-49cf-8b63-63466b2621f9', node_type=<ObjectType.INDEX: '3'>, metadata={'col_schema': 'Column: California\nType: text\nSummary: None\n\nColumn: 94-2404110\nType: text\nSummary: None'}, hash='5185d5642e76059e242d641ee853b16497ff520afef2420e80fc545e445875fa')}), ('text', 'UNITED STATES SECURITIES AND EXCHANGE COMMISSION\n\n Washington, D.C. 20549\n\n FORM 10-K\n\n(Mark One)\n\n☒ ANNUAL REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934\n\nFor the fiscal year ended September 25, 2021\n\nor\

In [18]:
base_nodes, objects = node_parser.get_nodes_and_objects(nodes)

In [30]:
print(dict(base_nodes[0]).items())

dict_items([('id_', '8078449a-6974-458d-8d61-b0fa2f92c15d'), ('embedding', None), ('metadata', {}), ('excluded_embed_metadata_keys', []), ('excluded_llm_metadata_keys', []), ('relationships', {<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='56879057-0f8a-4af4-b027-6c0bcbbf14f5', node_type=<ObjectType.DOCUMENT: '4'>, metadata={}, hash='8e9fc24e7ae31d9e4baba136e3f03f08aad93d1045bee247f64d7de229d20428'), <NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='d0ef0d1d-3e8f-49cf-8b63-63466b2621f9', node_type=<ObjectType.INDEX: '3'>, metadata={'col_schema': 'Column: California\nType: text\nSummary: None\n\nColumn: 94-2404110\nType: text\nSummary: None'}, hash='5185d5642e76059e242d641ee853b16497ff520afef2420e80fc545e445875fa')}), ('text', 'UNITED STATES SECURITIES AND EXCHANGE COMMISSION\n\n Washington, D.C. 20549\n\n FORM 10-K\n\n(Mark One)\n\n☒ ANNUAL REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934\n\nFor the fiscal year ended September 25, 2021\n\nor\

In [34]:
print(dict(objects[0]).items())

dict_items([('id_', 'd0ef0d1d-3e8f-49cf-8b63-63466b2621f9'), ('embedding', None), ('metadata', {'col_schema': 'Column: California\nType: text\nSummary: None\n\nColumn: 94-2404110\nType: text\nSummary: None'}), ('excluded_embed_metadata_keys', ['col_schema']), ('excluded_llm_metadata_keys', []), ('relationships', {<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='56879057-0f8a-4af4-b027-6c0bcbbf14f5', node_type=<ObjectType.DOCUMENT: '4'>, metadata={}, hash='8e9fc24e7ae31d9e4baba136e3f03f08aad93d1045bee247f64d7de229d20428'), <NodeRelationship.PREVIOUS: '2'>: RelatedNodeInfo(node_id='8078449a-6974-458d-8d61-b0fa2f92c15d', node_type=<ObjectType.TEXT: '1'>, metadata={}, hash='7a31757b3085897c40c7fae865530db81c84f7560d8c3de6f536b7a7343921dc'), <NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='c43760dd-76e1-4bfd-9b1c-45753b1c3644', node_type=<ObjectType.TEXT: '1'>, metadata={'table_df': "{'California': {0: '(State or other jurisdiction', 1: 'of incorporation or organization)'}, '9

In [35]:
objects[0].get_content()

"This table provides information about a specific entity's incorporation or organization details, including the state or jurisdiction of incorporation and the corresponding IRS Employer Identification Number.,\nwith the following columns:\n- California: None\n- 94-2404110: None\n"

In [36]:
# dump both indexed tables and page text into the vector index
recursive_index = VectorStoreIndex(nodes=base_nodes + objects + page_nodes)
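The index above mixes plain text nodes with `objects`, whose content is a table summary (as the `get_content()` output shows) rather than the table itself. One way to picture recursive retrieval is the toy sketch below: match the query against the summaries, then follow the reference to the full table. All names, ids, and table snippets here are hypothetical, and a naive word-overlap score stands in for embedding similarity.

```python
# Toy store: full tables keyed by id (hypothetical ids and contents).
tables = {
    "tbl-tax": "|Federal:| | | |\n|Deferred|(7,176)|(3,619)|(2,939)|",
    "tbl-ops": "|Net sales:|Products|$ 297,392|",
}

# Summaries are what the query is matched against; each points at a table id.
summaries = [
    ("Summary of federal state and foreign deferred income taxes", "tbl-tax"),
    ("Consolidated statements of operations with net sales", "tbl-ops"),
]


def overlap(query: str, text: str) -> int:
    """Count shared lowercase words (a crude stand-in for similarity)."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def recursive_retrieve(query: str) -> str:
    """Pick the best-matching summary, then return the table it references."""
    _, table_id = max(summaries, key=lambda item: overlap(query, item[0]))
    return tables[table_id]


print(recursive_retrieve("federal deferred income taxes"))
```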

In [37]:
print(page_nodes[31].get_content())

# Apple Inc.

# CONSOLIDATED STATEMENTS OF OPERATIONS

# (In millions, except number of shares which are reflected in thousands and per share amounts)

| |Years ended|September 25, 2021|September 26, 2020|September 28, 2019|
|---|---|---|---|---|
|Net sales:|Products|$ 297,392|$ 220,747|$ 213,883|
| |Services|$ 68,425|$ 53,768|$ 46,291|
| |Total net sales|$ 365,817|$ 274,515|$ 260,174|
|Cost of sales:|Products|$ 192,266|$ 151,286|$ 144,996|
| |Services|$ 20,715|$ 18,273|$ 16,786|
| |Total cost of sales|$ 212,981|$ 169,559|$ 161,782|
| |Gross margin|$ 152,836|$ 104,956|$ 98,392|
|Operating expenses:|Research and development|$ 21,914|$ 18,752|$ 16,217|
| |Selling, general and administrative|$ 21,973|$ 19,916|$ 18,245|
| |Total operating expenses|$ 43,887|$ 38,668|$ 34,462|
|Operating income| |$ 108,949|$ 66,288|$ 63,930|
|Other income/(expense), net| |$ 258|$ 803|$ 1,807|
|Income before provision for income taxes| |$ 109,207|$ 67,091|$ 65,737|
|Provision for income taxes| |$ 14,527|$ 9,6

In [38]:
from llama_index.postprocessor.flag_embedding_reranker import FlagEmbeddingReranker

reranker = FlagEmbeddingReranker(
    top_n=5,
    model="BAAI/bge-reranker-large",
)

recursive_query_engine = recursive_index.as_query_engine(
    similarity_top_k=5, node_postprocessors=[reranker], verbose=True
)


In [39]:
print(len(nodes))

263
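The recursive query engine configured earlier first retrieves `similarity_top_k=5` candidates and then lets the reranker keep the best `top_n=5`. The two-stage shape can be sketched as follows; the scores are made-up numbers and `retrieve_then_rerank` is a hypothetical helper, not part of LlamaIndex.

```python
def retrieve_then_rerank(candidates, first_stage, second_stage, top_k=5, top_n=5):
    """Keep top_k by the cheap score, then reorder them by the costly score."""
    shortlist = sorted(candidates, key=first_stage, reverse=True)[:top_k]
    return sorted(shortlist, key=second_stage, reverse=True)[:top_n]


# Made-up scores for three chunks: (embedding similarity, reranker score).
scores = {
    "cash flow table": (0.71, 0.93),
    "accounting policy text": (0.78, 0.40),
    "unrelated footnote": (0.30, 0.05),
}

ranked = retrieve_then_rerank(
    list(scores),
    first_stage=lambda c: scores[c][0],
    second_stage=lambda c: scores[c][1],
    top_k=2,
    top_n=2,
)
print(ranked)
```

In this toy run the reranker promotes the table chunk above the policy text. When `top_n` equals the number of retrieved candidates, the second stage can only reorder them, not discard any.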


## Setup Baseline

For comparison, we set up a naive RAG pipeline with default parsing and standard chunking, indexing, and retrieval.

In [40]:
from llama_index.core import SimpleDirectoryReader

reader = SimpleDirectoryReader(input_files=["apple_2021_10k.pdf"])
base_docs = reader.load_data()
raw_index = VectorStoreIndex.from_documents(base_docs)
raw_query_engine = raw_index.as_query_engine(
    similarity_top_k=5, node_postprocessors=[reranker]
)
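Why might standard chunking hurt table questions? The toy illustration below assumes a fixed-size character splitter, which is only an analogy, since the splitter actually used by the baseline may work on sentences and tokens. It shows how a table's header row and one of its data rows can land in different chunks. `fixed_size_chunks` is a hypothetical helper, and the rows are a shortened copy of figures that appear later in this notebook.

```python
def fixed_size_chunks(text: str, size: int, overlap: int = 0):
    """Split text into windows of `size` characters with optional overlap."""
    step = size - overlap
    return [text[i : i + size] for i in range(0, len(text), step)]


table = (
    "2021 2020 2019\n"
    "Current 8,257 6,306 6,384\n"
    "Deferred (7,176) (3,619) (2,939)\n"
)
chunks = fixed_size_chunks(table, size=40)

# Find which chunk holds the year header and which holds the Deferred row.
header_chunks = [c for c in chunks if "2021 2020 2019" in c]
deferred_chunks = [c for c in chunks if "Deferred" in c]
print(header_chunks[0] is deferred_chunks[0])  # False
```

Once the year header is separated from a row, nothing in that chunk says which column belongs to which year.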

## Using the new `LlamaParse` as the PDF parser and retrieving tables with two different methods
We compare the base query engine with the recursive query engine on tables.

### Table Query Task: Queries for Table Question Answering

In [41]:
query = "Purchases of marketable securities in 2020"

response_1 = raw_query_engine.query(query)
print("\n***********Basic Query Engine***********")
print(response_1)

response_2 = recursive_query_engine.query(query)
print("\n***********New LlamaParse+ Recursive Retriever Query Engine***********")
print(response_2)


***********Basic Query Engine***********
The purchases of marketable securities in 2020 amounted to $171.886 billion.
Retrieval entering f25a7114-d549-49ff-9e41-8f20f2d7303d: TextNode
Retrieving from object TextNode with query Purchases of marketable securities in 2020

***********New LlamaParse+ Recursive Retriever Query Engine***********
The purchases of marketable securities in 2020 amounted to $169.487 billion.
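The comparison cells in this section all repeat the same query-and-print pattern. A small helper could wrap it. This is a sketch that assumes only that each engine exposes a `.query(str)` method; `StubEngine` and `compare_engines` are hypothetical names, and the stub exists so the snippet runs without API keys.

```python
class StubEngine:
    """Hypothetical engine that returns a canned answer."""

    def __init__(self, answer: str):
        self.answer = answer

    def query(self, query: str) -> str:
        return f"{self.answer} (query: {query})"


def compare_engines(query: str, engines: dict) -> dict:
    """Run one query against every engine and print labelled responses."""
    responses = {}
    for label, engine in engines.items():
        responses[label] = engine.query(query)
        print(f"\n***********{label}***********")
        print(responses[label])
    return responses


results = compare_engines(
    "Purchases of marketable securities in 2020",
    {"Basic Query Engine": StubEngine("A"), "Recursive": StubEngine("B")},
)
```

In the notebook, `raw_query_engine` and `recursive_query_engine` would take the place of the two stubs.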


In [42]:
print(response_2.source_nodes[2].get_content())

Apple Inc. | 2021 Form 10-K

 Financial Statements

 Dilutive Effect of Potentially Dilutive Securities

The Company applies the treasury stock method to determine the dilutive effect of potentially dilutive securities. Potentially dilutive securities representing 62 million shares of common stock were excluded from the computation of diluted earnings per share for 2019 because their effect would have been antidilutive.

 Cash Equivalents and Marketable Securities

All highly liquid investments with maturities of three months or less at the date of purchase are classified as cash equivalents.

The Company’s investments in marketable debt securities have been classified and accounted for as available-for-sale. The Company classifies its marketable debt securities as either short-term or long-term based on each instrument’s underlying contractual maturity date. Unrealized gains and losses on marketable debt securities classified as available-for-sale are recognized in other comprehensive

In [43]:
query = "effective interest rates of all debt issuances in 2021"

response_1 = raw_query_engine.query(query)
print("\n***********Basic Query Engine***********")
print(response_1)

response_2 = recursive_query_engine.query(query)
print("\n***********New LlamaParse+ Recursive Retriever Query Engine***********")
print(response_2)


***********Basic Query Engine***********
0.75%, 1.43%, 1.43%
Retrieval entering c30aaa71-ca6c-4dd0-b40b-89f2904e4753: TextNode
Retrieving from object TextNode with query effective interest rates of all debt issuances in 2021
Retrieval entering 948345c1-9e8d-440b-b73c-51f65c0fdae5: TextNode
Retrieving from object TextNode with query effective interest rates of all debt issuances in 2021

***********New LlamaParse+ Recursive Retriever Query Engine***********
The effective interest rates of all debt issuances in 2021 ranged from 0.48% to 2.86%.


In [None]:
print(response_1.source_nodes[0].get_content())

Term Debt
As of September 25, 2021 , the Company had outstanding floating- and fixed-rate notes with varying maturities for an aggregate 
principal amount of $118.1 billion  (collectively the “Notes”). The Notes are senior unsecured obligations and interest is payable in 
arrears. The following table provides a summary of the Company’s term debt as of September 25, 2021  and September 26, 
2020 :
Maturities
(calendar year)2021 2020
Amount
(in millions)Effective
Interest RateAmount
(in millions)Effective
Interest Rate
2013 – 2020 debt issuances:
Floating-rate notes  2022 $ 1,750 0.48%  – 0.63% $ 2,250 0.60%  – 1.39%
Fixed-rate 0.000%  – 4.650%  notes 2022  – 2060  95,813 0.03%  – 4.78%  103,828 0.03%  – 4.78%
Second quarter 2021 debt issuance:
Fixed-rate 0.700%  – 2.800%  notes 2026  – 2061  14,000 0.75%  – 2.81%  —  — %
Fourth quarter 2021 debt issuance:
Fixed-rate 1.400%  – 2.850%  notes 2028  – 2061  6,500 1.43%  – 2.86%  —  — %
Total term debt  118,063  106,078 
Unamortized premium/
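Reading the two 2021 issuance rows directly from the table printed above gives a way to sanity-check both answers. A small worked check, with the ranges copied by hand from those rows:

```python
# Effective interest rate ranges (percent) copied from the printed table.
issuances_2021 = {
    "Second quarter 2021": (0.75, 2.81),
    "Fourth quarter 2021": (1.43, 2.86),
}

lowest = min(low for low, _ in issuances_2021.values())
highest = max(high for _, high in issuances_2021.values())
print(f"{lowest:.2f}% to {highest:.2f}%")  # 0.75% to 2.86%
```

By this reading the 2021 issuances span 0.75% to 2.86%. The 0.48% lower bound in the recursive engine's answer matches the floating-rate notes listed under the 2013 – 2020 issuances instead.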

In [44]:
query = "Impacts of the U.S. Tax Cuts and Jobs Act of 2017 on income taxes in 2020"

response_1 = raw_query_engine.query(query)
print("\n***********Basic Query Engine***********")
print(response_1)

response_2 = recursive_query_engine.query(query)
print("\n***********New LlamaParse+ Recursive Retriever Query Engine***********")
print(response_2)


***********Basic Query Engine***********
The U.S. Tax Cuts and Jobs Act of 2017 had an impact on income taxes in 2020, as evidenced by a decrease in the provision for income taxes compared to the prior year.
Retrieval entering 7ad8400e-f9ee-44c4-8775-e4c64364c86c: TextNode
Retrieving from object TextNode with query Impacts of the U.S. Tax Cuts and Jobs Act of 2017 on income taxes in 2020
Retrieval entering d13245cf-b146-4e73-a006-6e4ea039aa5c: TextNode
Retrieving from object TextNode with query Impacts of the U.S. Tax Cuts and Jobs Act of 2017 on income taxes in 2020
Retrieval entering cbfadbae-262f-4eab-99fc-b5df09aed6dc: TextNode
Retrieving from object TextNode with query Impacts of the U.S. Tax Cuts and Jobs Act of 2017 on income taxes in 2020
Retrieval entering 63d955b8-6583-491f-b16b-dc0393e50084: TextNod

In [46]:
print(response_1.source_nodes[0].get_content())

Other Income/(Expense), Net
The following table shows the detail of OI&E for 2021 , 2020  and 2019  (in millions):
2021 2020 2019
Interest and dividend income $ 2,843 $ 3,763 $ 4,961 
Interest expense  (2,645)  (2,873)  (3,576) 
Other income/(expense), net  60  (87)  422 
Total other income/(expense), net $ 258 $ 803 $ 1,807 
Note 5 – Income Taxe s
Provision for Income Taxes and Effective  Tax Rat e
The provision for income taxes for 2021 , 2020  and 2019 , consisted of the following (in millions):
2021 2020 2019
Federal:
Current $ 8,257 $ 6,306 $ 6,384 
Deferred  (7,176)  (3,619)  (2,939) 
Total  1,081  2,687  3,445 
State:
Current  1,620  455  475 
Deferred  (338)  21  (67) 
Total  1,282  476  408 
Foreign:
Current  9,424  3,134  3,962 
Deferred  2,740  3,383  2,666 
Total  12,164  6,517  6,628 
Provision for income taxes $ 14,527 $ 9,680 $ 10,481 
The foreign provision for income taxes is based on foreign pretax earnings of $68.7 billion , $38.1 billion  and $44.3 billion  in 2021 ,

In [47]:
query = "federal deferred tax in 2019-2021"

response_1 = raw_query_engine.query(query)
print("\n***********Basic Query Engine***********")
print(response_1)

response_2 = recursive_query_engine.query(query)
print("\n***********New LlamaParse+ Recursive Retriever Query Engine***********")
print(response_2)


***********Basic Query Engine***********
$3,619 million in 2019, $7,176 million in 2020, and $2,645 million in 2021.
Retrieval entering d13245cf-b146-4e73-a006-6e4ea039aa5c: TextNode
Retrieving from object TextNode with query federal deferred tax in 2019-2021
Retrieval entering 1e78f51e-337c-4394-b51c-44f3bf8abd19: TextNode
Retrieving from object TextNode with query federal deferred tax in 2019-2021

***********New LlamaParse+ Recursive Retriever Query Engine***********
$2,939 million in 2019, $3,619 million in 2020, and $(7,176) million in 2021.


In [48]:
query = "give me the deferred state income tax in 2019-2021 (include +/-)"

response_1 = raw_query_engine.query(query)
print("\n***********Basic Query Engine***********")
print(response_1)

response_2 = recursive_query_engine.query(query)
print("\n***********New LlamaParse+ Recursive Retriever Query Engine***********")
print(response_2)


***********Basic Query Engine***********
$1,282 million in 2019, $476 million in 2020, -$338 million in 2021
Retrieval entering d13245cf-b146-4e73-a006-6e4ea039aa5c: TextNode
Retrieving from object TextNode with query give me the deferred state income tax in 2019-2021 (include +/-)
Retrieval entering 1e78f51e-337c-4394-b51c-44f3bf8abd19: TextNode
Retrieving from object TextNode with query give me the deferred state income tax in 2019-2021 (include +/-)
Retrieval entering cbfadbae-262f-4eab-99fc-b5df09aed6dc: TextNode
Retrieving from object TextNode with query give me the deferred state income tax in 2019-2021 (include +/-)
Retrieval entering 7ad8400e-f9ee-44c4-8775-e4c64364c86c: TextNode
Retrieving from object TextNode with query give me the deferred state income tax in 2019-2021 (inc

In [49]:
print(response_2.source_nodes[0].get_content())

Summary of Federal, State, and Foreign current and deferred income taxes for the years 2019, 2020, and 2021, along with the provision for income taxes.,
with the following columns:
- 2021: None
- 2020: None
- 2019: None

| |2021|2020|2019|
|---|---|---|---|
|Federal:| | | |
|Current|$ 8,257|$ 6,306|$ 6,384|
|Deferred|(7,176)|(3,619)|(2,939)|
|Total|1,081|2,687|3,445|
|State:| | | |
|Current|1,620|455|475|
|Deferred|(338)|21|(67)|
|Total|1,282|476|408|
|Foreign:| | | |
|Current|9,424|3,134|3,962|
|Deferred|2,740|3,383|2,666|
|Total|12,164|6,517|6,628|
|Provision for income taxes|$ 14,527|$ 9,680|$ 10,481|
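Because the parsed table keeps the accounting convention of parentheses for negative amounts, a small converter makes the signs explicit and lets the rows be cross-checked. This is a sketch: `to_int` is a hypothetical helper, and the Federal rows are copied by hand from the table above (in millions; columns are 2021, 2020, 2019).

```python
def to_int(cell: str) -> int:
    """Convert '$ 8,257' or '(7,176)' to a signed integer."""
    cleaned = cell.replace("$", "").replace(",", "").strip()
    if cleaned.startswith("(") and cleaned.endswith(")"):
        return -int(cleaned[1:-1])  # parentheses mark a negative amount
    return int(cleaned)


# Federal rows copied from the table above.
federal = {
    "Current": ["$ 8,257", "$ 6,306", "$ 6,384"],
    "Deferred": ["(7,176)", "(3,619)", "(2,939)"],
    "Total": ["1,081", "2,687", "3,445"],
}

parsed = {row: [to_int(c) for c in cells] for row, cells in federal.items()}
sums = [c + d for c, d in zip(parsed["Current"], parsed["Deferred"])]
print(sums == parsed["Total"])  # True
```

With the signs applied, the column order gives federal deferred tax of -7,176 in 2021, -3,619 in 2020, and -2,939 in 2019 (in millions), and Current plus Deferred reproduces the Total row for each year.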



In [50]:
query = "current state taxes per year in 2019-2021 (include +/-)"

response_1 = raw_query_engine.query(query)
print("\n***********Basic Query Engine***********")
print(response_1)

response_2 = recursive_query_engine.query(query)
print("\n***********New LlamaParse+ Recursive Retriever Query Engine***********")
print(response_2)


***********Basic Query Engine***********
$1,620 million in 2019, $455 million in 2020, $475 million in 2021
Retrieval entering d13245cf-b146-4e73-a006-6e4ea039aa5c: TextNode
Retrieving from object TextNode with query current state taxes per year in 2019-2021 (include +/-)
Retrieval entering e166943a-9d99-42f2-8aef-02df5bbd3414: TextNode
Retrieving from object TextNode with query current state taxes per year in 2019-2021 (include +/-)
Retrieval entering 7ad8400e-f9ee-44c4-8775-e4c64364c86c: TextNode
Retrieving from object TextNode with query current state taxes per year in 2019-2021 (include +/-)
Retrieval entering 69f5f95f-90b9-463d-b3cb-e0269215f735: TextNode
Retrieving from object TextNode with query current state taxes per year in 2019-2021 (include +/-)
R