# Leia V3 Retriever Router

This version of Leia is a test case for a router-based query engine. Instead of handing the tools to the LLM explicitly, each "tool" is a separate index, and the router forwards the query to the most suitable one.
Using a retrieval router means we can have a very large number of tools and indexes without flooding the context.
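
The routing idea can be illustrated without LlamaIndex. The sketch below is a toy stand-in, not the library's implementation: it scores each tool description against the query with a bag-of-words cosine similarity (the real router works on embeddings), and the names `TOOLS`, `bag_of_words`, `cosine`, and `route` are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical tool descriptions, loosely modeled on the two indexes in this notebook.
TOOLS = {
    "LeiaActions": "documentation and descriptions for all implemented actions",
    "Documentation": "general information, user manual and roadmap",
}


def bag_of_words(text):
    """Lowercase the text and count its words (a crude stand-in for an embedding)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def route(query, tools, top_k=1):
    """Return the names of the top_k tools whose descriptions best match the query."""
    query_vec = bag_of_words(query)
    ranked = sorted(
        tools,
        key=lambda name: cosine(query_vec, bag_of_words(tools[name])),
        reverse=True,
    )
    return ranked[:top_k]


print(route("where is the user manual?", TOOLS))
```

Only the `top_k` selected tools are consulted, so the number of registered tools can grow without every description ending up in the prompt.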

In [1]:
import os
import json
import sys

In [2]:
# NOTE: This is ONLY necessary in a Jupyter notebook.
# Details: Jupyter runs an event loop behind the scenes.
#          This results in nested event loops when we start an event loop to make async queries.
#          This is normally not allowed; we use nest_asyncio to allow it for convenience.
import nest_asyncio

nest_asyncio.apply()
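
The restriction that `nest_asyncio` lifts can be reproduced with the standard library alone. This sketch assumes nothing beyond `asyncio`; the coroutine names `inner` and `outer` are made up for illustration.

```python
import asyncio


async def inner():
    return "done"


async def outer():
    # We are already inside a running event loop here, similar to code in a Jupyter cell.
    coro = inner()
    try:
        return asyncio.run(coro)  # starting a second loop is refused
    except RuntimeError as err:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return f"RuntimeError: {err}"


result = asyncio.run(outer())
print(result)
```

Without `nest_asyncio.apply()`, the inner `asyncio.run` call raises a `RuntimeError` instead of running the coroutine.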

## Read and set API keys

In [3]:
# Open and read the config file
with open('config.json', 'r') as config_file:
    config_data = json.load(config_file)

# Retrieve the API key from the config data
api_key = config_data['api_key']
os.environ['OPENAI_API_KEY'] = api_key
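
If `config.json` is missing or lacks the key, the cell above fails with a bare `FileNotFoundError` or `KeyError`. A slightly more defensive variant is sketched below; the helper name `load_api_key` and the fallback to an already-set environment variable are assumptions, not part of the original notebook.

```python
import json
import os


def load_api_key(path="config.json", env_var="OPENAI_API_KEY"):
    """Read 'api_key' from a JSON config file, falling back to the environment."""
    try:
        with open(path, "r") as config_file:
            key = json.load(config_file).get("api_key")
    except FileNotFoundError:
        key = None
    # Fall back to an environment variable that may already be set (assumption).
    key = key or os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"No API key found in {path} or ${env_var}")
    return key
```

It could then be used as `os.environ["OPENAI_API_KEY"] = load_api_key()`.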

## Logging

In [4]:
# Set up logging
import logging
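logging.basicConfig(stream=sys.stdout, level=logging.INFO)  # levels: DEBUG, INFO, WARNING, ERROR, CRITICAL
# NOTE: basicConfig already attaches a stdout handler; this second handler is why every log line in the outputs appears twice.
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))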

In [5]:
import tiktoken
from llama_index.callbacks import CallbackManager, TokenCountingHandler
from llama_index import VectorStoreIndex, SimpleDirectoryReader, ServiceContext

# you can set a tokenizer directly, or optionally let it default
# to the same tokenizer that was used previously for token counting
# NOTE: The tokenizer should be a function that takes in text and returns a list of tokens
token_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("gpt-4").encode,
    verbose=True,  # set to True to see usage printed to the console
)
callback_manager = CallbackManager([token_counter])

INFO:numexpr.utils:Note: NumExpr detected 10 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
Note: NumExpr detected 10 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
INFO:numexpr.utils:NumExpr defaulting to 8 threads.
NumExpr defaulting to 8 threads.
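
The tokenizer contract noted in the comments above (text in, list of tokens out) is all a counter needs. The toy class below is not LlamaIndex's `TokenCountingHandler`; it is a hypothetical `SimpleTokenCounter` that uses `str.split` as a stand-in tokenizer so it runs without `tiktoken`.

```python
class SimpleTokenCounter:
    """Accumulate prompt and completion token counts with a pluggable tokenizer."""

    def __init__(self, tokenizer=str.split):
        self.tokenizer = tokenizer  # any callable: text -> list of tokens
        self.prompt_tokens = 0
        self.completion_tokens = 0

    def record(self, prompt, completion):
        """Count the tokens of one LLM call."""
        self.prompt_tokens += len(self.tokenizer(prompt))
        self.completion_tokens += len(self.tokenizer(completion))

    @property
    def total_tokens(self):
        return self.prompt_tokens + self.completion_tokens


counter = SimpleTokenCounter()
counter.record("what can you do for me?", "I can answer questions.")
print(counter.prompt_tokens, counter.completion_tokens, counter.total_tokens)
```

Swapping in `tiktoken.encoding_for_model("gpt-4").encode` as the tokenizer would give counts based on the model's own tokenizer rather than whitespace splitting.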


## Set up query engine for each index separately

### Load the Actions index

In [6]:
from llama_index import StorageContext, load_index_from_storage

# rebuild storage context
storage_context = StorageContext.from_defaults(persist_dir="./storage/actions")
# load index
actions_index = load_index_from_storage(storage_context)
actions_engine = actions_index.as_query_engine(similarity_top_k=3)  # uses the default LLM, not GPT-4

INFO:llama_index.indices.loading:Loading all indices.
Loading all indices.


### Load the Controls index

### Load the Documentation index

In [7]:
# rebuild storage context
storage_context = StorageContext.from_defaults(persist_dir="./storage/documentation")
# load index
docu_index = load_index_from_storage(storage_context)
docu_engine = docu_index.as_query_engine(similarity_top_k=3)  # uses the default LLM, not GPT-4

INFO:llama_index.indices.loading:Loading all indices.
Loading all indices.


## Set up tool retriever query engine

In [8]:
from llama_index.tools import QueryEngineTool, ToolMetadata

query_engine_tools = [
    QueryEngineTool(
        query_engine=actions_engine,
        metadata=ToolMetadata(
            name="LeiaActions",
            description="Provides documentation and descriptions for all implemented LeiaActions. Try finding the appropriate action for a given user request and fill in the arguments. In case a required argument is not provided by context or user input, ask the user for it. LeiaActions are intended to be used under the hood of LiquidEarth, so the user should not be aware of them. To trigger an action, append the output of the tool to your answer in the following form: [TRIGGERACTION nameOfAction(arguments)]. Never return any actions that you are not sure about!"
        )
    ),
    QueryEngineTool(
        query_engine=docu_engine,
        metadata=ToolMetadata(
            name="Documentation",
            description="Provides general information, the user manual, and the roadmap for LiquidEarth. Pass the entire user request to the tool to find the appropriate documentation and use it for your response."
        )
    ),

]
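
The `LeiaActions` description asks the model to append actions in the form `[TRIGGERACTION nameOfAction(arguments)]`. How the application consumes that marker is not shown in this notebook; the following is a minimal sketch of one way to extract it, where the regex, the `extract_actions` helper, and the sample action name and argument are all hypothetical.

```python
import re

# Matches e.g. "[TRIGGERACTION SomeAction(arg1, arg2)]"; the argument string is kept raw.
ACTION_PATTERN = re.compile(r"\[TRIGGERACTION\s+(\w+)\((.*?)\)\]")


def extract_actions(response_text):
    """Split an LLM response into user-facing text and a list of (name, raw_args) pairs."""
    actions = ACTION_PATTERN.findall(response_text)
    visible_text = ACTION_PATTERN.sub("", response_text).strip()
    return visible_text, actions


text, actions = extract_actions(
    "Sure, switching the background. [TRIGGERACTION SwitchBackgroundColor(blue)]"
)
print(text)
print(actions)
```

Stripping the marker before display keeps the actions "under the hood", as the tool description requires.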

In [9]:
from llama_index.objects import ObjectIndex, SimpleToolNodeMapping

tool_mapping = SimpleToolNodeMapping.from_objects(query_engine_tools)
obj_index = ObjectIndex.from_objects(
    query_engine_tools,
    tool_mapping,
    VectorStoreIndex,
)


## Custom agent (WIP)

In [10]:
from llama_index.llms import OpenAI
from llama_index import ServiceContext

gpt4 = OpenAI(temperature=0, model="gpt-4")
service_context_gpt4 = ServiceContext.from_defaults(llm=gpt4, callback_manager=callback_manager)

### Create prompt template

In [11]:
from llama_index import Prompt
# define custom Prompt
TEMPLATE_STR = (
    "You are Leia, the LiquidEarth Intelligent Assistant. You are helping a user with a question about LiquidEarth. You are very smart and friendly and always in a great mood.\n"
    "In LiquidEarth, a 'Space' and a 'Project' are the same thing. We have provided documentation on the software and further information below. In some cases the metadata includes a 'Control' field that points to a UI element in the app associated with the described functionality. This is only for internal use. When describing controls to the user, use descriptions and names from the text, not the 'Control' values.\n"
    "If the user asks you to do something in LiquidEarth, look for the right 'LeiaAction' in the context and compile the function call based on the information and example. Do not use any actions that are not documented. Return the function call at the end of your response in curly braces.\n"
    "---------------------\n"
    "{context_str}"
    "\n---------------------\n"
    "Answer the question for a human to understand. Additionally, return the 'Control' properties in the order of operations in the following form at the end of your response: [[control1], [control2], ...]. Append the list to your response without further comment. If no controls are found, do not comment on it. Never include any controls that are not specified in the metadata field of the provided documentation. Do not interpret any controls from the text body. If the answer requires multiple steps, describe each step in detail. Given this information, please answer the question: {query_str}\n"
)
QA_TEMPLATE = Prompt(TEMPLATE_STR)
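
The template has two placeholders, `{context_str}` and `{query_str}`, and asks for a trailing control list of the form `[[control1], [control2], ...]`. The sketch below mimics both steps with plain Python: `str.format` stands in for the `Prompt` class, and the shortened template, the `split_controls` helper, and the control names are hypothetical.

```python
import re

# Shortened stand-in for TEMPLATE_STR with the same two placeholders.
MINI_TEMPLATE = (
    "---------------------\n"
    "{context_str}"
    "\n---------------------\n"
    "Given this information, please answer the question: {query_str}\n"
)

prompt_text = MINI_TEMPLATE.format(
    context_str="Open the settings panel to change colors.",
    query_str="How do I change the background color?",
)

# A trailing list such as "[[settingsButton], [colorPicker]]" at the end of a response.
CONTROL_LIST = re.compile(r"\[(\s*\[[^\[\]]+\]\s*,?)+\]\s*$")


def split_controls(response_text):
    """Separate the answer text from the trailing control list, if one is present."""
    match = CONTROL_LIST.search(response_text)
    if match is None:
        return response_text.strip(), []
    controls = [c.strip() for c in re.findall(r"\[([^\[\]]+)\]", match.group(0))]
    return response_text[: match.start()].strip(), controls


answer, controls = split_controls(
    "Open Settings, then pick a color. [[settingsButton], [colorPicker]]"
)
print(answer)
print(controls)
```

Separating the list from the answer lets the application highlight UI elements while showing the user only the prose.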

In [19]:
from llama_index.query_engine import ToolRetrieverRouterQueryEngine


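# NOTE: QA_TEMPLATE defined above is not passed to the engine yet (the custom agent is WIP).
query_engine = ToolRetrieverRouterQueryEngine(
    obj_index.as_retriever(),
    service_context=service_context_gpt4,
)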



In [None]:
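# Dirty hack: try to widen retrieval so more tools are considered per query.
# NOTE: this pokes at a private attribute and was never executed in this notebook;
#       passing similarity_top_k to obj_index.as_retriever(...) is probably the cleaner route.
query_engine.retriever._similarity_top_k = 10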

In [22]:
response = query_engine.query("what can you do for me?")

INFO:llama_index.query_engine.router_query_engine:Combining responses from multiple query engines.
Combining responses from multiple query engines.
LLM Prompt Token Usage: 112
LLM Completion Token Usage: 42


In [23]:
print(response)

Based on the context information, I can provide you with answers to frequently asked questions about the basics and guide you on how to change the background color of the LiquidEarth Workspace using the LeiaActionsSwitchBackgroundColor command.


## Chat with a prompt template (ToDo)

In [None]:
from llama_index.chat_engine.condense_question import CondenseQuestionChatEngine

custom_prompt = Prompt("""\
Given a conversation (between Human and Assistant) and a follow up message from Human, \
rewrite the message to be a standalone question that captures all relevant context \
from the conversation.

<Chat History>
{chat_history}

<Follow Up Message>
{question}

<Standalone question>
""")

# list of (human_message, ai_message) tuples
custom_chat_history = [
    (
        'Hello assistant, we are having an insightful discussion about Paul Graham today.',
        'Okay, sounds good.'
    )
]

# ToDo: the original cell used an undefined `index`; this wraps the router query engine built above instead.
chat_engine = CondenseQuestionChatEngine.from_defaults(
    query_engine=query_engine,
    condense_question_prompt=custom_prompt,
    chat_history=custom_chat_history,
    verbose=True
)
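
Before the condense step can run, the `(human_message, ai_message)` tuples have to be rendered into the `{chat_history}` slot. How LlamaIndex does this internally is not shown here; the following sketch is an assumed rendering with a hypothetical `format_history` helper, a shortened template, and made-up example messages.

```python
def format_history(history):
    """Render (human_message, ai_message) tuples as alternating labeled lines."""
    lines = []
    for human_message, ai_message in history:
        lines.append(f"Human: {human_message}")
        lines.append(f"Assistant: {ai_message}")
    return "\n".join(lines)


# Shortened stand-in for the condense prompt with the same two placeholders.
CONDENSE_TEMPLATE = (
    "<Chat History>\n{chat_history}\n\n"
    "<Follow Up Message>\n{question}\n\n"
    "<Standalone question>\n"
)

# Made-up conversation, for illustration only.
history = [("How do I create a Space?", "Use the New Space button.")]
condense_prompt = CONDENSE_TEMPLATE.format(
    chat_history=format_history(history),
    question="And how do I rename it?",
)
print(condense_prompt)
```

The LLM's completion of this prompt would be the standalone question that is then sent to the query engine.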

## Print token usage

In [None]:
print('Embedding Tokens: ', token_counter.total_embedding_token_count, '\n',
      'LLM Prompt Tokens: ', token_counter.prompt_llm_token_count, '\n',
      'LLM Completion Tokens: ', token_counter.completion_llm_token_count, '\n',
      'Total LLM Token Count: ', token_counter.total_llm_token_count)