In [1]:
import os
os.environ["OPENAI_API_KEY"] = "..."
os.environ["HUGGINGFACE_API_KEY"] = "..."

### Impact of Embeddings on Quality with Sub Question Query

In this tutorial, we load longer text (Fellowship of the Ring) and utilize Llama-Index Sub Question Query to evlauate a complex question around Frodo's character evolution.

In addition, we will iterate through different embeddings and chunk sizes and use TruLens to select the best one.

In [2]:
# NOTE: This is ONLY necessary in jupyter notebook.
# Details: Jupyter runs an event-loop behind the scenes. 
#          This results in nested event-loops when we start an event-loop to make async queries.
#          This is normally not allowed, we use nest_asyncio to allow it for convenience.  
import nest_asyncio
nest_asyncio.apply()

In [3]:
# Import main tools for build app
from llama_index import VectorStoreIndex, SimpleDirectoryReader, ServiceContext
from llama_index.tools import QueryEngineTool, ToolMetadata
from llama_index.query_engine import SubQuestionQueryEngine

# load data
lotr = SimpleDirectoryReader(input_dir="data/lotr").load_data()

In [4]:
# Imports main tools for eval
from trulens_eval import TruLlama, Feedback, Tru, feedback
tru = Tru()

#hugs = feedback.Huggingface()
openai = feedback.OpenAI()

No .env found in /Users/jreini/Desktop/development/trulens/trulens_eval/examples/frameworks/llama_index or its parents. You may need to specify secret keys in another manner.


In [5]:
# iterate through embeddings and chunk sizes, evaluating each response's agreement with chatgpt using TruLens
chunk_sizes = [20, 32]
embeddings = ['text-embedding-ada-001','text-embedding-ada-002']
for embedding in(embeddings):
    for chunk_size in chunk_sizes:
        # initialize service context (set chunk size)
        service_context = ServiceContext.from_defaults(chunk_size=chunk_size)

        # build index and query engine
        query_engine = VectorStoreIndex.from_documents(lotr).as_query_engine(embed_model=embedding)

        # setup base query engine as tool
        query_engine_tools = [
            QueryEngineTool(
                query_engine=query_engine, 
                metadata=ToolMetadata(name='Lord of the Rings', description='Fellowship of the Ring')
            )
        ]

        query_engine = SubQuestionQueryEngine.from_defaults(query_engine_tools=query_engine_tools, service_context=service_context)

        # Question/answer relevance between overall question and answer.
        model_agreement = Feedback(openai.model_agreement).on_input_output()

        tc = TruLlama(app_id = f'subquery_engine_{embedding}_chunksize_{chunk_size}', app = query_engine, feedbacks = [model_agreement])

        response = tc.query('Describe Frodo Baggins growth from the beginning of his time at the shire to throwing the one ring in Mount Doom.')

✅ In model_agreement, input prompt will be set to *.__record__.main_input or `Select.RecordInput` .
✅ In model_agreement, input response will be set to *.__record__.main_output or `Select.RecordOutput` .
✅ app subquery_engine_text-embedding-ada-001_chunksize_20 -> default.sqlite
✅ feedback def. feedback_definition_hash_48818e5c65e8c38bac0635894efaa278 -> default.sqlite
Generated 3 sub questions.
[36;1m[1;3m[Lord of the Rings] Q: What is Frodo Baggins' initial state at the beginning of his time at the shire?
[0m[36;1m[1;3m[Lord of the Rings] A: 
Frodo Baggins is initially in a state of shock and confusion. He has just been attacked by the men of Carn Dûm and is now standing in the middle of a barrow-top, wearing thin white rags and crowned and belted with pale gold. He is surrounded by his friends, who are also in a state of shock and confusion.
[0m[33;1m[1;3m[Lord of the Rings] Q: What events occur in Frodo Baggins' journey from the beginning of his time at the shire to throwin

In [6]:
tru.run_dashboard()

Starting dashboard ...


Accordion(children=(VBox(children=(VBox(children=(Label(value='STDOUT'), Output())), VBox(children=(Label(valu…

Dashboard started at http://192.168.50.34:8502 .


<Popen: returncode: None args: ['streamlit', 'run', '--server.headless=True'...>

From our knowledge of Lord of the Rings, we know that Frodo is in fact aware of the danger of the One Ring, highlighting the issue with using ChatGPT alone for complex queries. We should then use model_agreement *not* as a measure of correctness, but rather agreement with an incorrect ChatGPT answer.

Assessing the four chain iterations using the TruLens leaderboard, we can see that *subquery_engine_text-embedding-ada-002_chunksize_20* has the lowest model agreement with ChatGPT (in other words, likely the best answer), along with the lowest cost, token usage and latency.

![subquery_leaderboard](subquery_leaderboard.png)

This example highlights the benefit of using llama-index for document-augmnented retrieval + the ability to break down complex queries into subquestions, along with the importance of using TruLens for evaluation to help us select a good embedding and an appropriate chunk size.