![pageindex_banner](https://pageindex.ai/static/images/pageindex_banner.jpg)

<p align="center">Reasoning-based RAG&nbsp; ◦ &nbsp;No Vector DB&nbsp; ◦ &nbsp;No Chunking&nbsp; ◦ &nbsp;Human-like Retrieval</p>

<p align="center">
  <a href="https://vectify.ai">🏠 Homepage</a>&nbsp; • &nbsp;
  <a href="https://chat.pageindex.ai">🖥️ Platform</a>&nbsp; • &nbsp;
  <a href="https://docs.pageindex.ai/quickstart">📚 API Docs</a>&nbsp; • &nbsp;
  <a href="https://github.com/VectifyAI/PageIndex">📦 GitHub</a>&nbsp; • &nbsp;
  <a href="https://discord.com/invite/VuXuf29EUj">💬 Discord</a>&nbsp; • &nbsp;
  <a href="https://ii2abc2jejf.typeform.com/to/tK3AXl8T">✉️ Contact</a>&nbsp;
</p>

<div align="center">

[![Star us on GitHub](https://img.shields.io/github/stars/VectifyAI/PageIndex?style=for-the-badge&logo=github&label=⭐️%20Star%20Us)](https://github.com/VectifyAI/PageIndex) &nbsp;&nbsp; [![Follow us on X](https://img.shields.io/badge/Follow%20Us-000000?style=for-the-badge&logo=x&logoColor=white)](https://twitter.com/VectifyAI)

</div>

---


# Document QA with PageIndex Chat API

Similarity-based RAG built on vector databases has shown significant limitations in recent AI applications, and reasoning-based or agentic retrieval has become increasingly important.

[PageIndex Chat](https://chat.pageindex.ai/) is an AI assistant that lets you chat with multiple very long documents without worrying about limited context windows or context rot. It is based on [PageIndex](https://pageindex.ai/blog/pageindex-intro), a vectorless, reasoning-based RAG framework designed to give more transparent and reliable results, much as a human expert would.
<div align="center">
  <img src="https://docs.pageindex.ai/images/cookbook/vectorless-rag.png" width="70%">
</div>

You can now access PageIndex Chat through its API or SDK.

## 📝 Notebook Overview

This notebook demonstrates a minimal example of document analysis with the PageIndex Chat API on the recently released [NVIDIA 10-Q report](https://d18rn0p25nwr6d.cloudfront.net/CIK-0001045810/13e6981b-95ed-4aac-a602-ebc5865d0590.pdf).

### Install PageIndex SDK

In [2]:
%pip install -q --upgrade pageindex

### Setup PageIndex

In [25]:
from pageindex import PageIndexClient

# Get your PageIndex API key from https://dash.pageindex.ai/api-keys
PAGEINDEX_API_KEY = "Your API KEY"
pi_client = PageIndexClient(api_key=PAGEINDEX_API_KEY)
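
To avoid hardcoding the key in a shared notebook, it can be read from the environment instead. The sketch below is only an illustration: the helper `load_api_key` and the environment variable name `PAGEINDEX_API_KEY` are assumptions, not part of the PageIndex SDK.

```python
import os


def load_api_key(env_var="PAGEINDEX_API_KEY"):
    """Return the API key stored in an environment variable.

    Both this helper and the default variable name are hypothetical;
    pick whatever name suits your setup.
    """
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key


# Usage with the client created above (left as a comment so the sketch
# runs without the SDK installed):
# pi_client = PageIndexClient(api_key=load_api_key())
```

Failing fast with a clear message tends to be easier to debug than an authentication error raised later by the first API call.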

### Upload a document

In [4]:
import os, requests

pdf_url = "https://d18rn0p25nwr6d.cloudfront.net/CIK-0001045810/13e6981b-95ed-4aac-a602-ebc5865d0590.pdf"
pdf_path = os.path.join("../data", pdf_url.split('/')[-1])
os.makedirs(os.path.dirname(pdf_path), exist_ok=True)

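response = requests.get(pdf_url, timeout=60)
response.raise_for_status()  # stop on HTTP errors instead of saving an error page as a PDF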
with open(pdf_path, "wb") as f:
    f.write(response.content)
print(f"Downloaded {pdf_url}")

doc_id = pi_client.submit_document(pdf_path)["doc_id"]
print('Document Submitted:', doc_id)

Downloaded https://d18rn0p25nwr6d.cloudfront.net/CIK-0001045810/13e6981b-95ed-4aac-a602-ebc5865d0590.pdf
Document Submitted: pi-cmi73f7r7022y09nwn40paaom
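
The cell above derives the local file name with `pdf_url.split('/')[-1]`, which works for this URL but would keep any query string or fragment in the name. A slightly more defensive variant, shown here as a sketch with a hypothetical helper `local_path_for`, parses the URL first:

```python
import os
from urllib.parse import urlparse


def local_path_for(url, data_dir="../data"):
    """Build a local path from the last path segment of a URL.

    Hypothetical helper for illustration; query strings and fragments
    are ignored because only the URL path component is used.
    """
    name = os.path.basename(urlparse(url).path)
    if not name:
        raise ValueError(f"URL has no file name component: {url}")
    return os.path.join(data_dir, name)


# Usage, mirroring the cell above:
# pdf_path = local_path_for(pdf_url)
```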


### Check the processing status

In [22]:
from pprint import pprint

doc_info = pi_client.get_document(doc_id)
pprint(doc_info)

if doc_info['status'] == 'completed':
    print(f"\n Document ready! ({doc_info['pageNum']} pages)")
elif doc_info['status'] == 'processing':
    print("\n Document is still processing. Please wait and check again.")

{'createdAt': '2025-11-20T07:11:44.669000',
 'description': "This document is NVIDIA Corporation's Form 10-Q Quarterly "
                'Report for the period ending October 26, 2025, detailing its '
                'financial performance, operational results, market risks, and '
                'legal proceedings.',
 'id': 'pi-cmi73f7r7022y09nwn40paaom',
 'name': '13e6981b-95ed-4aac-a602-ebc5865d0590.pdf',
 'pageNum': 48,
 'status': 'completed'}

 Document ready! (48 pages)
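
Instead of re-running the status cell by hand, the check can be wrapped in a small polling loop. The following is a minimal sketch, not an SDK feature: `wait_until_ready`, its parameters, and the default intervals are assumptions. It relies only on the `status` field and the `'completed'` value shown in the output above, and treats every other status as "not ready yet".

```python
import time


def wait_until_ready(get_document, doc_id, poll_seconds=5.0,
                     timeout_seconds=600.0, sleep=time.sleep):
    """Call get_document(doc_id) repeatedly until its status is 'completed'.

    get_document is any callable returning a dict with a 'status' key,
    for example pi_client.get_document. Raises TimeoutError if the
    document is not ready within timeout_seconds.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        info = get_document(doc_id)
        if info.get("status") == "completed":
            return info
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Document {doc_id} not ready after {timeout_seconds} seconds "
                f"(last status: {info.get('status')!r})"
            )
        sleep(poll_seconds)  # wait before asking again


# Usage with the client from this notebook:
# doc_info = wait_until_ready(pi_client.get_document, doc_id)
```

Passing the lookup function in as an argument keeps the loop independent of the SDK, so it can be exercised with a stub before pointing it at the real client.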


### Ask a question about this document

In [24]:
query = "what is the revenue? Also show me which page I can find it."

for chunk in pi_client.chat_completions(
    messages=[{"role": "user", "content": query}],
    doc_id=doc_id,
    stream=True
):
    print(chunk, end='', flush=True)

I'll help you find the revenue information from NVIDIA's 10-Q report. Let me first check the document structure to locate the financial information.{"doc_name": "13e6981b-95ed-4aac-a602-ebc5865d0590.pdf"}Now let me get the content from pages that contain the revenue information. Based on the structure, revenue information should be in the financial statements section, which appears early in the document. Let me check pages 3-8 which should contain the Condensed Consolidated Statements of Income.{"doc_name": "13e6981b-95ed-4aac-a602-ebc5865d0590.pdf", "pages": "3-5"}Perfect! I found the revenue information. Let me provide you with the details:

## Revenue Information from NVIDIA's 10-Q Report

**You can find the revenue on Page 3** of the document in the **Condensed Consolidated Statements of Income**.

### Revenue Details:

**For the Three Months Ended October 26, 2025 (Q3 FY2026):**
- **Revenue: $57,006 million** ($57.0 billion)
- Compared to Q3 FY2025 (October 27, 2024): $35,082 mill