##### Compare Documents using SubQuestionQueryEngine
![alt text](image/2024-01-23_17-46.png "a title")

In [2]:
!pip install llama_index==0.9.31 pypdf python-dotenv





In [3]:

import nest_asyncio

nest_asyncio.apply()

In [4]:
import logging
import sys
import os

# logging.basicConfig(stream=sys.stdout, level=logging.INFO)
# logging.getLogger().handlers = []
# logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

import openai
from dotenv import load_dotenv
load_dotenv(".env", override=True)
openai.api_key = os.environ["OPENAI_API_KEY"]
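The cell above raises a bare `KeyError` if `OPENAI_API_KEY` is missing from `.env`. A minimal sketch of a friendlier check follows; `require_env` is a hypothetical helper name, not part of `python-dotenv` or `openai`.

```python
import os


def require_env(name: str) -> str:
    """Return a non-empty environment variable or raise a readable error.

    Hypothetical helper for illustration; not part of any library.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or shell environment."
        )
    return value
```

After `load_dotenv(".env", override=True)`, it could be used as `openai.api_key = require_env("OPENAI_API_KEY")`.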

In [5]:
from llama_index import SimpleDirectoryReader, VectorStoreIndex
from llama_index.response.pprint_utils import pprint_response
from llama_index.tools import QueryEngineTool, ToolMetadata
from llama_index.query_engine import SubQuestionQueryEngine

##### Load Uber and Lyft documents

In [6]:
lyft_docs = SimpleDirectoryReader(input_files=["data_pdfs/lyft_2021.pdf"]).load_data()
uber_docs = SimpleDirectoryReader(input_files=["data_pdfs/uber_2021.pdf"]).load_data()

In [7]:
print(f'Loaded Lyft 10-K with {len(lyft_docs)} pages')
print(f'Loaded Uber 10-K with {len(uber_docs)} pages')

Loaded Lyft 10-K with 238 pages
Loaded Uber 10-K with 307 pages


##### Build indices

In [8]:
lyft_index = VectorStoreIndex.from_documents(lyft_docs)
uber_index = VectorStoreIndex.from_documents(uber_docs)

##### Basic QA

In [9]:
lyft_engine = lyft_index.as_query_engine(similarity_top_k=3)


In [10]:
uber_engine = uber_index.as_query_engine(similarity_top_k=3)


In [11]:
response = lyft_engine.query('What is the revenue of Lyft in 2021? Answer in millions with page reference')


In [12]:
pprint_response(response)

Final Response: The revenue of Lyft in 2021 was $3.21 billion. (Page
reference: 63)


In [13]:
response = uber_engine.query('What is the revenue of Uber in 2021? Answer in millions, with page reference')


In [14]:
pprint_response(response)

Final Response: The revenue of Uber in 2021 was $17,455 million. (Page
reference: 57)
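The page references in the answers above are written by the LLM, so they may not match the pages that were actually retrieved. A minimal sketch for checking this against the retrieved chunks follows. It assumes the response exposes `source_nodes`, that each entry has `.node.metadata` and `.score`, and that the PDF loader stored `file_name` and `page_label` keys in the metadata; `page_references` is a hypothetical helper name.

```python
def page_references(response):
    """Collect (file_name, page_label, score) for each retrieved chunk.

    Assumes the llama_index response shape described above; missing
    metadata keys come back as None rather than raising.
    """
    refs = []
    for source in getattr(response, "source_nodes", []):
        meta = source.node.metadata
        refs.append((meta.get("file_name"), meta.get("page_label"), source.score))
    return refs
```

Calling `page_references(response)` right after a query should list up to three entries here, since both engines were built with `similarity_top_k=3`.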


##### Compare Uber and Lyft

In [15]:
query_engine_tools = [
    QueryEngineTool(
        query_engine=lyft_engine,
        metadata=ToolMetadata(name='lyft_10k', description='Provides information about Lyft financials for year 2021')
    ),
    QueryEngineTool(
        query_engine=uber_engine,
        metadata=ToolMetadata(name='uber_10k', description='Provides information about Uber financials for year 2021')
    ),
]

s_engine = SubQuestionQueryEngine.from_defaults(query_engine_tools=query_engine_tools)

In [16]:
response = s_engine.query('Compare and contrast the customer segments and geographies that grew the fastest')

Generated 4 sub questions.
[lyft_10k] Q: What were the customer segments that grew the fastest for Lyft in 2021?
[lyft_10k] Q: What were the geographies that grew the fastest for Lyft in 2021?
[uber_10k] Q: What were the customer segments that grew the fastest for Uber in 2021?
[uber_10k] Q: What were the geographies that grew the fastest for Uber in 2021?
[uber_10k] A: Chicago, Miami, New York City in the United States, Sao Paulo in Brazil, and London in the United Kingdom.
[lyft_10k] A: Riders who use Lyft to commute to and from work, explore their cities, spend more time at local businesses, and stay out longer knowing they can get a reliable ride home.
[lyft_10k] A: The geographies that grew the fastest for Lyft in 2021 were not explicitly mentioned in the provided context information.

In [17]:
print(response)

The customer segments that grew the fastest for Lyft in 2021 were riders who use the service for commuting, exploring their cities, supporting local businesses, and ensuring a reliable ride home. On the other hand, the customer segments that experienced the most growth for Uber in 2021 were related to membership programs such as Uber One, Uber Pass, Eats Pass, and Rides Pass.

Regarding the geographies that saw the fastest growth, Lyft did not specify any particular locations in the provided context information. In contrast, Uber experienced significant growth in cities like Chicago, Miami, and New York City in the United States, Sao Paulo in Brazil, and London in the United Kingdom.
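The log above shows the pattern at work: the question is split into sub-questions, each sub-question is routed to the tool named in brackets, and the answers are then combined into the final response. The toy sketch below illustrates only the routing step with stub tools. It is not how `SubQuestionQueryEngine` is implemented; `answer_sub_questions` and the stub answers are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def answer_sub_questions(
    sub_questions: List[Tuple[str, str]],
    tools: Dict[str, Callable[[str], str]],
) -> List[Tuple[str, str, str]]:
    """Route each (tool_name, question) pair to the matching tool.

    Returns (tool_name, question, answer) triples in the input order.
    """
    results = []
    for tool_name, question in sub_questions:
        if tool_name not in tools:
            raise KeyError(f"No tool registered under the name {tool_name!r}")
        results.append((tool_name, question, tools[tool_name](question)))
    return results


# Stub tools standing in for the real query engines (illustrative only).
stub_tools = {
    "lyft_10k": lambda q: f"(Lyft stub answer to: {q})",
    "uber_10k": lambda q: f"(Uber stub answer to: {q})",
}

stub_results = answer_sub_questions(
    [
        ("lyft_10k", "What were the geographies that grew the fastest for Lyft in 2021?"),
        ("uber_10k", "What were the geographies that grew the fastest for Uber in 2021?"),
    ],
    stub_tools,
)

for name, question, answer in stub_results:
    print(f"[{name}] Q: {question}")
    print(f"[{name}] A: {answer}")
```

In the real engine the tool names come from `ToolMetadata(name=...)`, which is why clear names and descriptions matter: they are what the sub-question generator sees when deciding where to send each question.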


In [18]:
response = s_engine.query('Compare revenue growth of Uber and Lyft from 2020 to 2021')

Generated 4 sub questions.
[uber_10k] Q: What was the revenue of Uber in 2020?
[uber_10k] Q: What was the revenue of Uber in 2021?
[lyft_10k] Q: What was the revenue of Lyft in 2020?
[lyft_10k] Q: What was the revenue of Lyft in 2021?
[uber_10k] A: $17,455
[lyft_10k] A: The revenue of Lyft in 2021 was $3,208,323.
[uber_10k] A: $11,139 million
[lyft_10k] A: Lyft's revenue in 2020 was $2,364,681.

In [19]:
print(response)

Uber's revenue grew by $6,316 million from 2020 to 2021, while Lyft's revenue increased by $843,642 from 2020 to 2021.
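The two absolute figures in this answer are not directly comparable. Uber's numbers are in millions, while Lyft's appear to be in thousands: the earlier answer gave Lyft's 2021 revenue as $3.21 billion, which matches $3,208,323 only if that figure is in thousands. Percentage growth avoids the unit mismatch. The sketch below recomputes it from the sub-answers printed above; `growth` is a hypothetical helper.

```python
def growth(previous: float, current: float) -> tuple:
    """Return (absolute change, percentage change) between two periods."""
    change = current - previous
    return change, change / previous * 100


# Figures copied from the sub-question answers above.
uber_change, uber_pct = growth(11_139, 17_455)        # USD millions
lyft_change, lyft_pct = growth(2_364_681, 3_208_323)  # assumed USD thousands

print(f"Uber: +{uber_change:,} million ({uber_pct:.1f}%)")
print(f"Lyft: +{lyft_change:,} thousand ({lyft_pct:.1f}%)")
```

On these figures Uber's revenue grew by roughly 56.7% and Lyft's by roughly 35.7%. Asking the engine for growth rates, or stating the units in the query, may yield a more comparable answer.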
