### Step-0: Installing the required libraries

In [36]:
# !pip install pageindex
# !pip install python-dotenv
# !pip install cerebras-cloud-sdk

In [37]:
import os
from dotenv import load_dotenv

load_dotenv()
pageindex_api_key = os.getenv("PAGEINDEX_API_KEY")
cerebras_api_key = os.getenv("CEREBRAS_API_KEY")

In [39]:
from cerebras.cloud.sdk import AsyncCerebras

# Create the async client once and reuse it for every request
client = AsyncCerebras(api_key=cerebras_api_key)

async def call_llm(prompt, model="qwen-3-235b-a22b-instruct-2507", temperature=0):
    """Send a single-turn chat prompt to the Cerebras API and return the stripped reply text."""
    response = await client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
    )
    return response.choices[0].message.content.strip()
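Top-level `await`, as used in the cells below, only works inside Jupyter/IPython, which already runs an event loop. In a plain Python script you would start the loop with `asyncio.run`, and because `call_llm` is a coroutine you can also send several prompts at once with `asyncio.gather`. This is a sketch that assumes a hypothetical `fake_call_llm` stub so it runs offline without a Cerebras key; pass the real `call_llm` as `llm` to use it.

```python
import asyncio

# Hypothetical stand-in for call_llm so the sketch runs without network access
async def fake_call_llm(prompt, model="stub", temperature=0):
    await asyncio.sleep(0)  # yield control, as a real network call would
    return f"echo: {prompt}"

async def ask_many(prompts, llm=fake_call_llm):
    # Launch all requests concurrently; gather returns results in prompt order
    return await asyncio.gather(*(llm(p) for p in prompts))

if __name__ == "__main__":
    # Outside Jupyter there is no running event loop, so asyncio.run creates one
    answers = asyncio.run(ask_many(["first question", "second question"]))
    print(answers)
```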

### Step-1: PageIndex Tree Generation

In [3]:
from pageindex import PageIndexClient

# Client used to submit documents and fetch their tree index
pi_client = PageIndexClient(api_key=pageindex_api_key)

In [9]:
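import os

# Local copy of the FIA Power Unit Financial Regulations (the file must already exist in ./data)
pdf_path = os.path.join("./data", "fia_f1_power_unit_financial_regulations_issue_1_-_2022-08-16.pdf")
os.makedirs(os.path.dirname(pdf_path), exist_ok=True)

# Upload the PDF; the returned doc_id identifies the document in later calls
doc_id = pi_client.submit_document(pdf_path)["doc_id"]
print('Document Submitted:', doc_id)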

Document Submitted: pi-cmgb1z9gt00iq09oambspf9a0


In [12]:
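import pageindex.utils as utils

# The tree is only available once PageIndex has finished processing the upload
if pi_client.is_retrieval_ready(doc_id):
    # node_summary=True includes a summary for each node in the returned tree
    tree = pi_client.get_tree(doc_id, node_summary=True)['result']
    print('Simplified Tree Structure of the Document:')
    utils.print_tree(tree)
else:
    print("Processing document, please try again later...")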

Simplified Tree Structure of the Document:
[{'title': 'FORMULA 1 POWER UNIT FINANCIAL REGULATIO...',
  'node_id': '0000',
  'prefix_summary': 'This partial document outlines the struc...',
  'nodes': [{'title': 'Scope',
             'node_id': '0001',
             'summary': '## Scope\n\n1.1 These Power Unit Financial...'},
            {'title': 'Objectives',
             'node_id': '0002',
             'summary': 'This partial document outlines the objec...'},
            {'title': 'Accountability',
             'node_id': '0003',
             'summary': '## Accountability\n\n1.8 Each Power Unit M...'},
            {'title': '2. POWER UNIT MANUFACTURER OBLIGATIONS',
             'node_id': '0004',
             'summary': 'This partial document outlines the oblig...'},
            {'title': 'Cap on Relevant Costs',
             'node_id': '0005',
             'summary': '## Cap on Relevant Costs\n\n2.2 A Power Un...'},
            {'title': 'The Power Unit Cost Cap',
             'node
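The readiness check above runs once and asks you to re-run the cell by hand if processing is not finished. A small polling loop avoids that. This is a sketch: `wait_until_ready` and its `timeout`/`interval` defaults are hypothetical, not part of the `pageindex` package, and it only relies on `is_retrieval_ready` returning a truthy value once the tree is available.

```python
import time

def wait_until_ready(is_ready, timeout=300.0, interval=5.0):
    """Poll is_ready() until it returns True or `timeout` seconds elapse.

    Returns True if the document became ready, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if is_ready():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline
        time.sleep(min(interval, remaining))

# Usage in the notebook (assumes pi_client and doc_id from the cells above):
# if wait_until_ready(lambda: pi_client.is_retrieval_ready(doc_id)):
#     tree = pi_client.get_tree(doc_id, node_summary=True)['result']
```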

In [15]:
tree[0].keys()

dict_keys(['title', 'node_id', 'page_index', 'prefix_summary', 'text', 'nodes'])

In [16]:
len(tree)

1

In [20]:
len(tree[0]['nodes'])

39
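To choose a value for `id_` in the next cell, it helps to list every section with its `node_id`. This sketch walks the tree recursively and assumes only the `title`, `node_id` and optional `nodes` keys seen in `tree[0].keys()`; `outline` and `toy_tree` are illustrative names. In the notebook you would call `print("\n".join(outline(tree)))`.

```python
def outline(nodes, depth=0):
    """Return one indented line per node: '<node_id>  <title>'."""
    lines = []
    for node in nodes:
        lines.append("  " * depth + f"{node['node_id']}  {node['title']}")
        # Child sections live under 'nodes'; leaf nodes may omit the key
        lines.extend(outline(node.get("nodes", []), depth + 1))
    return lines

toy_tree = [
    {"title": "Root", "node_id": "0000", "nodes": [
        {"title": "Scope", "node_id": "0001"},
        {"title": "Objectives", "node_id": "0002"},
    ]},
]
print("\n".join(outline(toy_tree)))
```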

In [31]:
# Inspect one child section: its title and its full extracted text
id_ = 6
print(tree[0]['nodes'][id_]['title'])
print("------")
print(tree[0]['nodes'][id_]['text'])

Reporting Group
------
## Reporting Group

2.5 For the purposes of reporting Total Costs of the Reporting Group, a Power Unit Manufacturer's Reporting Group shall comprise the Power Unit Manufacturer together with, where the Power Unit Manufacturer has incurred less than $95 \%$ of the costs of the Power Unit Activities undertaken by or on behalf of the Power Unit Manufacturer in the Reporting Period, such additional entities within the Power Unit Manufacturer's Legal Group Structure as are determined in accordance with Article 2.6.
2.6 The additional entities to be included within the Reporting Group where a Power Unit Manufacturer has incurred less than $95 \%$ of the costs of the Power Unit Activities undertaken by or on behalf of that Power Unit Manufacturer in the Reporting Period shall be the entity (other than the Power Unit Manufacturer) within the Power Unit Manufacturer's Legal Group Structure that incurred the greatest amount of costs of the Power Unit Activities undertaken 

### Step-2: Reasoning-Based Retrieval with Tree Search

In [42]:
import json

query = "What are the conclusions in this document?"

# Drop the full section text so the LLM only sees node ids, titles and summaries
tree_without_text = utils.remove_fields(tree.copy(), fields=['text'])

search_prompt = f"""
You are given a question and a tree structure of a document.
Each node contains a node id, node title, and a corresponding summary.
Your task is to find all nodes that are likely to contain the answer to the question.

Question: {query}

Document tree structure:
{json.dumps(tree_without_text, indent=2)}

Please reply in the following JSON format:
{{
    "thinking": "<Your thinking process on which nodes are relevant to the question>",
    "node_list": ["node_id_1", "node_id_2", ..., "node_id_n"]
}}
Directly return the final JSON structure. Do not output anything else.
"""

tree_search_result = await call_llm(search_prompt)
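Only ids, titles and summaries go into `search_prompt`; each section's full `text` is left out so the prompt stays small. A recursive sketch of that pruning is below, assuming `utils.remove_fields` behaves roughly like this (its actual implementation may differ, for example in how it truncates values); `strip_fields` is a hypothetical name. Building a new structure instead of deleting keys in place keeps `tree` intact for the later `node_map[...]['text']` lookups; an in-place version would need a deep copy, since a shallow `tree.copy()` still shares the nested node dicts.

```python
import json

def strip_fields(data, fields=("text",)):
    """Return a copy of a nested dict/list structure without the given keys."""
    if isinstance(data, dict):
        return {k: strip_fields(v, fields) for k, v in data.items() if k not in fields}
    if isinstance(data, list):
        return [strip_fields(item, fields) for item in data]
    return data  # strings, numbers and None are returned unchanged

toy_tree = [{"title": "Scope", "node_id": "0001", "summary": "...", "text": "long body",
             "nodes": [{"title": "1.1", "node_id": "0002", "text": "more body"}]}]
pruned = strip_fields(toy_tree)
print(json.dumps(pruned, indent=2))
```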

In [44]:
# Map node_id -> node dict (including the full 'text') for lookups
node_map = utils.create_node_mapping(tree)
tree_search_result_json = json.loads(tree_search_result)

print("Reasoning Process:")
utils.print_wrapped(tree_search_result_json['thinking'])

print("\nRetrieved Nodes:")
for node_id in tree_search_result_json['node_list']:
    node = node_map[node_id]
    print(f"Node ID: {node['node_id']}\t Page: {node['page_index']}\t Title: {node['title']}")

Reasoning Process: 
The question asks for the conclusions in the document. Conclusions are typically found in sections
that summarize findings, decisions, or outcomes. In this regulatory document, the most likely
sections to contain conclusions are those that detail decisions made by the Cost Cap Adjudication
Panel, outcomes of investigations, or final determinations on compliance. The node titled 'Decision'
(node_id: 0022) explicitly describes the process and content of decisions made after hearings,
including whether a breach is found, what sanctions are imposed, and how compliance is confirmed.
This node is the most likely to contain conclusions. Additionally, the 'Review of Reporting
Documentation' (node_id: 0015) outlines possible outcomes of reviews, such as issuing a compliance
certificate or taking further action, which may also reflect conclusions. The 'Accepted Breach
Agreement' (node_id: 0019) and 'Appeals' (node_id: 0023) may contain final determinations or
resolutions, but
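`json.loads(tree_search_result)` assumes the model returns bare JSON, as the prompt requests. Models sometimes wrap the reply in a Markdown code fence or return ids that are not in the tree; either makes the cell above raise. A defensive sketch follows: `parse_llm_json` is a hypothetical helper that strips an optional fence and, if given the known ids, filters `node_list` against them. In the notebook you might call `parse_llm_json(tree_search_result, valid_ids=node_map.keys())`.

```python
import json
import re

FENCE = "`" * 3  # a Markdown code-fence marker

# Matches an optional fence (with or without a 'json' tag) around the payload
_FENCED = re.compile(r"^" + FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE + r"$", re.DOTALL)

def parse_llm_json(raw, valid_ids=None):
    """Parse an LLM's JSON reply, tolerating a surrounding code fence.

    If valid_ids is given, ids not in it are dropped from 'node_list'.
    """
    text = raw.strip()
    match = _FENCED.match(text)
    if match:
        text = match.group(1)
    result = json.loads(text)
    if valid_ids is not None:
        result["node_list"] = [n for n in result.get("node_list", []) if n in valid_ids]
    return result
```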

In [45]:
tree_search_result_json.keys()

dict_keys(['thinking', 'node_list'])

In [50]:
print(len(node_map))
print(node_map.keys())
print(node_map['0000'].keys())

40
dict_keys(['0000', '0001', '0002', '0003', '0004', '0005', '0006', '0007', '0008', '0009', '0010', '0011', '0012', '0013', '0014', '0015', '0016', '0017', '0018', '0019', '0020', '0021', '0022', '0023', '0024', '0025', '0026', '0027', '0028', '0029', '0030', '0031', '0032', '0033', '0034', '0035', '0036', '0037', '0038', '0039'])
dict_keys(['title', 'node_id', 'page_index', 'prefix_summary', 'text', 'nodes'])
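The 40 entries in `node_map` are consistent with the root node `0000` plus the 39 children counted by `len(tree[0]['nodes'])`. A mapping like this can be built with a depth-first walk. The sketch assumes `utils.create_node_mapping` does something equivalent (its actual implementation may differ) and uses the hypothetical name `build_node_map`. Because it stores references to the original node dicts rather than copies, `node_map[...]['text']` still returns the full section text.

```python
def build_node_map(nodes, mapping=None):
    """Flatten a PageIndex-style tree into {node_id: node}, depth-first."""
    if mapping is None:
        mapping = {}
    for node in nodes:
        mapping[node["node_id"]] = node
        # Recurse into child sections, if any
        build_node_map(node.get("nodes", []), mapping)
    return mapping

toy_tree = [{"node_id": "0000", "title": "Root", "nodes": [
    {"node_id": "0001", "title": "Scope"},
    {"node_id": "0002", "title": "Objectives", "nodes": [
        {"node_id": "0003", "title": "Sub-objective"}]},
]}]
toy_map = build_node_map(toy_tree)
print(len(toy_map), list(toy_map))
```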


### Step-3: Answer Generation

In [52]:
node_list = tree_search_result_json['node_list']
# Concatenate the full text of every retrieved node into one context string
relevant_context = "\n\n".join(node_map[node_id]['text'] for node_id in node_list)

print("Retrieved Context:")
utils.print_wrapped(relevant_context[:1000] + "...")


Retrieved Context: 
## Decision

7.26 Following a hearing, the judging panel shall make its decision, which shall:
(a) be reached unanimously or else by a majority vote with each member of the judging panel having
one vote and in the event of a deadlock the President of the Hearing having a further casting vote;
(b) be in writing in the English language;
(c) state the reasons for its decision;
(d) be notified to each of the FIA and the Respondent;
(e) in the event that a Power Unit Manufacturer is found to have been in breach of these Power Unit
Financial Regulations, contain details of:

(i) any sanction (which shall be determined in accordance with Article 9); and
(ii) the costs to be borne by the Power Unit Manufacturer, which shall be calculated by reference to
the reasonable costs incurred by the Cost Cap Administration and the Cost Cap Adjudication Panel in
connection with any investigation and/or adjudication. In the event that the reasonable costs
incurred by the Cost Cap Admin
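The join in the cell above concatenates raw section text without recording where each piece came from. Prefixing every chunk with its node id and title lets the answer model (and the reader) trace statements back to sections, and a character budget keeps a long `node_list` from overflowing the model's context window. This is a sketch: `build_context` and the 20,000-character default are assumptions, not values taken from PageIndex or Cerebras.

```python
def build_context(node_map, node_list, max_chars=20000):
    """Join the text of the selected nodes, labelling each with its id and title.

    Unknown ids are skipped; stops once adding a section would exceed max_chars.
    """
    parts, used = [], 0
    for node_id in node_list:
        node = node_map.get(node_id)
        if node is None:
            continue  # the LLM returned an id that is not in the tree
        chunk = f"[Node {node_id}: {node['title']}]\n{node.get('text', '')}"
        if used + len(chunk) > max_chars:
            break
        parts.append(chunk)
        used += len(chunk) + 2  # +2 reserves room for the "\n\n" separator
    return "\n\n".join(parts)
```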

In [53]:
answer_prompt = f"""
Answer the question based on the context:

Question: {query}
Context: {relevant_context}

Provide a clear, concise answer based only on the context provided.
"""

print('Generated Answer:\n')
answer = await call_llm(answer_prompt)
utils.print_wrapped(answer)

Generated Answer:

The conclusions in this document are the decisions made by the judging panel following a hearing,
which must be unanimous or majority-based, in writing, with stated reasons, and notified to the FIA
and the Respondent. If a Power Unit Manufacturer is found in breach of the Financial Regulations,
the conclusion includes details of any sanction and the costs they must bear. If compliant, the
conclusion instructs the Cost Cap Administration to issue a compliance certificate. These decisions
may be published (excluding confidential information) and can be reexamined within three months if
important new evidence emerges.
