# TRACE 

This notebook demonstrates TRACE, a KG-enhanced reader model that uses knowledge graphs (KGs) built from the retrieved documents to improve answer generation: https://arxiv.org/abs/2406.11460.

In [9]:
%pip install -q python-terrier accelerate pyterrier_t5
%pip install -q pyterrier_dr 
%pip install -q pyterrier_caching 
# %pip install -q pyterrier-rag

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Note: you may need to restart the kernel to use updated packages.


In [None]:
import pyterrier as pt
import pyterrier_rag as ptr 

## Retrieval Setup

Let's get a BM25 retriever from PyTerrier. We could also have used a faster PISA retriever, via [PyTerrier_Pisa](https://github.com/terrierteam/pyterrier_pisa).

In [11]:
sparse_index = pt.Artifact.from_hf('pyterrier/raghpq-terrier')
bm25 = pt.rewrite.tokenise() >> sparse_index.bm25(include_fields=['docno', 'text', 'title']) >> pt.rewrite.reset()




14:47:07.560 [main] WARN org.terrier.structures.BaseCompressingMetaIndex -- Structure meta reading data file directly from disk (SLOW) - try index.meta.data-source=fileinmem in the index properties file. 1.1 GiB of memory would be required.
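
For intuition about what `sparse_index.bm25(...)` computes, here is a minimal pure-Python sketch of textbook Okapi BM25 scoring. It assumes lowercase whitespace tokenisation, the common defaults k1=1.2 and b=0.75, and a Lucene-style IDF (with +1 inside the log to keep it positive). Terrier's own BM25 differs in tokenisation, stemming and IDF details, so its scores will not match this sketch exactly.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every document in `docs` against `query` with Okapi BM25.

    Tokenisation is a plain lowercase whitespace split (no stemming, no stopwords).
    """
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    avg_len = sum(len(toks) for toks in tokenised) / n_docs
    # document frequency: number of documents containing each term
    df = Counter(term for toks in tokenised for term in set(toks))
    scores = []
    for toks in tokenised:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Lucene-style IDF: always positive, rarer terms weigh more
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # term-frequency saturation (k1) and document-length normalisation (b)
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "The Eiffel Tower is in Paris",
    "Paris is the capital of France",
    "Berlin is the capital of Germany",
]
print(bm25_scores("capital of France", docs))
```

The second document matches all three query terms and ranks first; the third matches only "capital" and "of", and the first matches none and scores zero.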


## Reader Setup

We use the Llama-3-8B-Instruct model, served through the Launcher API, as the reader model that generates the final answer. The API is OpenAI-compatible, so we access it with `OpenAIBackend`.

In [None]:
from fastchat.model import get_conversation_template
from pyterrier_rag.backend import OpenAIBackend 
from pyterrier_rag.prompt import PromptTransformer
from pyterrier_rag.readers import Reader

LAUNCHER_API_KEY = "put_your_key_here"  # replace with your key 

# let us define the prompt 
system_message = r"""
You are a question answering system that answers based strictly on the provided structured knowledge triples, without using any prior knowledge.

You will be given a list of reasoning paths (each is a sequence of knowledge triples), which together support answering the question.

Instructions:
1. Combine relevant facts across the triples logically.
2. If multiple paths are available, prioritize the top-3 scoring ones.
3. Provide a concise, factual answer. Do NOT explain your reasoning.
4. Do NOT say things like "Based on the context..." or "The chain indicates..."
5. Never hallucinate - only answer what is directly entailed by the reasoning paths.
"""
prompt_text = """
Question: {{ query }}

Reasoning Paths:
{{ qcontext }}

Answer:"""

template = get_conversation_template("meta-llama-3.1-sp")
prompt = PromptTransformer(
    conversation_template=template,
    system_message=system_message,
    instruction=prompt_text,
    api_type="openai"
)

# Now we define the LLM backend used in the reader model 
llm_backend = OpenAIBackend(
    model_id="llama-3-8b-instruct", 
    api_key=LAUNCHER_API_KEY,
    generation_args={"temperature":0.0, "max_tokens":256},
    base_url="http://api.terrier.org/v1"
)

# finally we can obtain a reader 
reader = Reader(llm_backend, prompt)
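
The `{{ query }}` and `{{ qcontext }}` placeholders in `prompt_text` are filled in for each query before the request reaches the LLM. The sketch below mimics that substitution with a small regex (we assume `PromptTransformer` uses Jinja-style templating; the regex is only a stand-in) and shows one hypothetical way to render scored reasoning paths into `qcontext`. The `render` and `format_paths` helpers, the `<head; relation; tail>` triple syntax and the code-side top-3 cut are illustrative assumptions, not pyterrier_rag's actual formatting; in the notebook itself, the top-3 preference is only stated as an instruction in the system prompt.

```python
import re

# Same user-turn template as `prompt_text` above
PROMPT_TEXT = """
Question: {{ query }}

Reasoning Paths:
{{ qcontext }}

Answer:"""


def render(template: str, **fields: str) -> str:
    """Replace each {{ name }} placeholder with fields[name] (minimal Jinja-like substitution)."""
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: fields[m.group(1)], template)


def format_paths(scored_paths, top_k=3):
    """Hypothetical helper: keep the top_k highest-scoring paths, one numbered path per line."""
    best = sorted(scored_paths, key=lambda p: p[0], reverse=True)[:top_k]
    lines = []
    for rank, (_score, triples) in enumerate(best, start=1):
        chain = ", ".join(f"<{h}; {r}; {t}>" for h, r, t in triples)
        lines.append(f"{rank}. {chain}")
    return "\n".join(lines)


# (score, [triples]) pairs; toy values for illustration only
paths = [
    (0.9, [("Scott Derrickson", "nationality", "American")]),
    (0.4, [("Ed Wood", "nationality", "American")]),
    (0.7, [("Ed Wood", "occupation", "filmmaker")]),
    (0.1, [("Doctor Strange", "director", "Scott Derrickson")]),
]
print(render(PROMPT_TEXT,
             query="Were Scott Derrickson and Ed Wood of the same nationality?",
             qcontext=format_paths(paths)))
```

Only the three best-scoring paths reach the prompt; the 0.1-scored path is dropped before the LLM sees it.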

## TRACE model

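TRACE first converts each retrieved document into a set of knowledge triples, then constructs KG-based reasoning chains from these triples to identify the information that is useful for answering the question. We use `KnowledgeGraphExtractor` for the first step and `ReasoningChainGenerator` for the second. Because triple extraction uses the LLM, we optionally wrap the extractor in a `ScorerCache` keyed on `docno`, so the triples extracted for each document are reused rather than recomputed.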

In [None]:
from pyterrier_rag import KnowledgeGraphExtractor, ReasoningChainGenerator 
from pyterrier_dr import E5 
from pyterrier_caching import ScorerCache # optional 

kg_extractor = KnowledgeGraphExtractor(llm_backend) 
reasoning_chain_generator = ReasoningChainGenerator(
    llm_backend, 
    E5(), 
    dataset="hotpotqa",
    verbose=True
)

cache_path = "/nfs/pyterrier/cache"
kg_cache = ScorerCache(cache_path, kg_extractor, group=None, key="docno", value="knowledge_graph", pickle=True)  # cache the extracted KG triples for later use
trace = kg_cache >> reasoning_chain_generator >> reader  

bm25_trace = (bm25%10) >> trace # TRACE pipeline with BM25 retrieval 

Initialized ReasoningChainGenerator with E5.base() ranking model
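
To make the reasoning-chain step more concrete, the sketch below builds a single chain greedily from (head, relation, tail) triples: the first hop picks the triple that mentions the most question words, and each later hop must share an entity with the previous triple. This is a deliberately simplified stand-in: the real `ReasoningChainGenerator` is configured above with the LLM backend and an E5 ranking model, whereas this sketch swaps in a toy word-overlap gain and greedy search. The helper names and example triples are hypothetical.

```python
import re

Triple = tuple[str, str, str]


def tokens(text: str) -> set[str]:
    """Lowercased word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def greedy_chain(question: str, triples: list[Triple], max_hops: int = 3) -> list[Triple]:
    """Greedily build one reasoning chain from a document's triples.

    Hop 1 picks the triple covering the most question tokens; each later hop may
    only use a triple that shares an entity (head or tail) with the previous one.
    The chain stops when no connected triple covers a still-uncovered question token.
    """
    uncovered = tokens(question)
    remaining = list(triples)
    chain: list[Triple] = []
    while remaining and len(chain) < max_hops:
        if chain:
            head, _, tail = chain[-1]
            candidates = [t for t in remaining if {t[0], t[2]} & {head, tail}]
        else:
            candidates = remaining
        # gain = number of not-yet-covered question tokens this triple mentions
        scored = [(len(tokens(" ".join(t)) & uncovered), t) for t in candidates]
        if not scored:
            break
        gain, best = max(scored, key=lambda x: x[0])
        if gain == 0:
            break
        chain.append(best)
        remaining.remove(best)
        uncovered -= tokens(" ".join(best))
    return chain


triples = [
    ("Doctor Strange", "directed by", "Scott Derrickson"),
    ("Scott Derrickson", "nationality", "American"),
    ("Ed Wood", "nationality", "American"),
    ("Paris", "capital of", "France"),
]
question = "What is the nationality of the director of Doctor Strange?"
print(greedy_chain(question, triples))
```

On this example the chain stops after two hops: the only triple connected to `American` (the Ed Wood one) covers no remaining question word.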


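Let's build a simple (vanilla) RAG pipeline for comparison: it concatenates the top-10 BM25 passages into the prompt context and passes them directly to the reader, without any KG processing.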

In [None]:
from pyterrier_rag.prompt import Concatenator

system_message2 = r"""You are an expert Q&A system that is trusted around the world.
        Always answer the query using the provided context information,
        and not prior knowledge.
        rules to follow:
        1. Not directly reference the given context in your answer
        2. Avoid statements like 'Based on the context, ...' or
        'The context information ...' or anything along those lines."""

prompt_text2 = """
Question: {{ query }}

Context information is:{{ qcontext }}

Answer:"""

template = get_conversation_template("meta-llama-3.1-sp")
rag_prompt = PromptTransformer(
    conversation_template=template,
    system_message=system_message2,
    instruction=prompt_text2,
    api_type="openai"
)

bm25_reader = Reader(llm_backend, prompt=rag_prompt)
# vanilla RAG: concatenate the top-10 BM25 passages into the context, then generate
# note: this rebinds `bm25` from the retriever to the full RAG pipeline
bm25 = (bm25 % 10) >> Concatenator() >> bm25_reader
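
The baseline relies on `Concatenator()` to turn the top-10 retrieved passages of each query into the single `qcontext` string used by `prompt_text2`. As a rough sketch of that step, assuming rows shaped like a PyTerrier result frame (`qid`, `rank`, `title`, `text`) and a hypothetical `title: text` layout separated by blank lines, it could look like the following; the real `Concatenator` may order, format or truncate passages differently.

```python
from collections import defaultdict


def concatenate_context(results, max_docs=10):
    """Group ranked results by query and join each query's passages into one context string.

    `results` is a list of dicts with qid, rank, title and text keys (hypothetical
    stand-in for the rows of a PyTerrier result DataFrame).
    """
    by_qid = defaultdict(list)
    for row in results:
        by_qid[row["qid"]].append(row)
    contexts = {}
    for qid, rows in by_qid.items():
        rows.sort(key=lambda r: r["rank"])  # best-ranked passage first
        passages = [f"{r['title']}: {r['text']}" for r in rows[:max_docs]]
        contexts[qid] = "\n\n".join(passages)
    return contexts


results = [
    {"qid": "q1", "rank": 1, "title": "Ed Wood", "text": "Ed Wood was an American filmmaker."},
    {"qid": "q1", "rank": 0, "title": "Scott Derrickson", "text": "Scott Derrickson is an American director."},
    {"qid": "q2", "rank": 0, "title": "Paris", "text": "Paris is the capital of France."},
]
ctx = concatenate_context(results)
print(ctx["q1"])
```

Unlike TRACE, this passes raw passages to the reader, so the model has to find the relevant facts in the full text itself.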

## Evaluation

Now let's run a quick experiment on the first 50 HotpotQA dev questions, comparing the vanilla RAG pipeline with TRACE.

In [17]:
dataset = pt.get_dataset('rag:hotpotqa')
result = pt.Experiment(
    [bm25, bm25_trace],
    dataset.get_topics('dev').head(50),
    dataset.get_answers('dev'),
    [ptr.measures.F1, ptr.measures.EM],
    batch_size=25,
    verbose=True,
    precompute_prefix=True,
    names=['RAG', 'TRACE'],
    baseline=0
)
print(result)

Precomputing results of 50 topics on shared pipeline component (pt.apply.query() >> TerrierRetr(BM25) >> <pyterrier.terrier.rewrite.ResetQuery object at 0x7f77da169bb0>)
  warn("precompute_prefix with batch_size is very experimental. Please report any problems")
pt.Experiment precomputation: 100%|██████████| 2/2 [00:03<00:00,  1.59s/batches]
pt.Experiment: 100%|██████████| 4/4 [04:22<00:00, 65.71s/batches]

    name    EM        F1  EM +  EM -  EM p-value  F1 +  F1 -  F1 p-value
0    RAG  0.02  0.172222   NaN   NaN         NaN   NaN   NaN         NaN
1  TRACE  0.16  0.253494   7.0   0.0    0.006833  12.0  11.0    0.157522
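
`ptr.measures.EM` and `ptr.measures.F1` score each generated answer against the gold answers. The sketch below follows the widely used SQuAD-style definitions (lowercase, strip punctuation and the articles a/an/the, then compare); we assume, but have not verified, that pyterrier_rag normalises answers the same way. Under these definitions a verbose answer that contains the gold answer gets EM 0 but partial F1 credit.

```python
import re
import string
from collections import Counter

PUNCTUATION = set(string.punctuation)


def normalize_answer(s: str) -> str:
    """SQuAD-style normalisation: lowercase, drop punctuation and articles, squash whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in PUNCTUATION)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(prediction: str, gold_answers: list[str]) -> float:
    """1.0 if the normalised prediction equals any normalised gold answer, else 0.0."""
    return float(any(normalize_answer(prediction) == normalize_answer(g) for g in gold_answers))


def f1(prediction: str, gold_answers: list[str]) -> float:
    """Best token-level F1 between the prediction and any gold answer."""
    def f1_single(pred: str, gold: str) -> float:
        p, g = normalize_answer(pred).split(), normalize_answer(gold).split()
        overlap = sum((Counter(p) & Counter(g)).values())
        if overlap == 0:
            return 0.0
        precision, recall = overlap / len(p), overlap / len(g)
        return 2 * precision * recall / (precision + recall)
    return max(f1_single(prediction, g) for g in gold_answers)


print(exact_match("The United States.", ["United States"]))  # articles and punctuation ignored
print(exact_match("yes, both are American", ["yes"]), f1("yes, both are American", ["yes"]))
```

In the second case the gold token "yes" is found, so recall is 1, but precision is only 1/4, giving F1 = 0.4 with EM = 0.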



