# RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval

This notebook demonstrates RAPTOR using **Anthropic Claude** for summarization and QA.

## Setup

Make sure you have a `.env` file in the project root with your API keys:
```
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...  # for embeddings
```
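The cells in this notebook do not show the `.env` file being loaded, so it can help to confirm that both keys are visible to the Python process before building a tree. The check below is a minimal sketch using only the standard library; the helper name `missing_api_keys` is hypothetical and not part of RAPTOR. If the keys are missing, one common option (an assumption, not something this project is confirmed to use) is to load the file with the `python-dotenv` package before running the check.

```python
import os

# Key names taken from the .env example above.
REQUIRED_KEYS = ("ANTHROPIC_API_KEY", "OPENAI_API_KEY")


def missing_api_keys(env=None, required=REQUIRED_KEYS):
    """Return the required key names that are unset or empty in `env`.

    Hypothetical helper for this notebook; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


missing = missing_api_keys()
if missing:
    print("Missing API keys:", ", ".join(missing))
else:
    print("All API keys found.")
```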

In [None]:
# Load the sample document
with open('../demo/sample.txt', 'r', encoding='utf-8') as file:
    text = file.read()

print(text[:100])

## How RAPTOR works

1. **Building**: RAPTOR builds a tree from the bottom up by recursively embedding, clustering, and summarizing chunks of text, so each higher level holds a more abstract summary of the level below. To build a tree from the text in `sample.txt`, call `RA.add_documents(text)` on a `RetrievalAugmentation` instance (`RA`).

2. **Querying**: At inference time, RAPTOR retrieves information from this tree, combining content from different abstraction levels so that answers can draw on an entire lengthy document. To query the tree, call `RA.answer_question`.
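The building loop described above can be illustrated with a toy, dependency-free sketch. Everything here is a placeholder chosen for readability, not RAPTOR's actual implementation: `toy_embed` (average word length) stands in for an embedding model, `toy_cluster` (sort and slice) stands in for real clustering, and `toy_summarize` (first three words of each text) stands in for an LLM summarizer. All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    text: str
    level: int
    children: List["Node"] = field(default_factory=list)


def toy_embed(text: str) -> float:
    # Placeholder 1-D "embedding": average word length. A real system
    # would call an embedding model and get a high-dimensional vector.
    words = text.split()
    return sum(len(w) for w in words) / len(words) if words else 0.0


def toy_cluster(nodes: List[Node], size: int) -> List[List[Node]]:
    # Placeholder clustering: sort by embedding, then cut into groups of
    # `size`, so nodes with similar embeddings land in the same group.
    ordered = sorted(nodes, key=lambda n: toy_embed(n.text))
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def toy_summarize(texts: List[str]) -> str:
    # Placeholder summarizer: keep the first three words of each text.
    # A real system would prompt an LLM here.
    return " / ".join(" ".join(t.split()[:3]) for t in texts)


def build_tree(chunks: List[str], cluster_size: int = 2) -> List[List[Node]]:
    """Return the tree as a list of layers, leaves first, root layer last."""
    if cluster_size < 2:
        raise ValueError("cluster_size must be at least 2 so layers shrink")
    layers = [[Node(text=c, level=0) for c in chunks]]
    # Repeat embed -> cluster -> summarize until a single root remains.
    while len(layers[-1]) > 1:
        parents = [
            Node(
                text=toy_summarize([n.text for n in group]),
                level=len(layers),
                children=group,
            )
            for group in toy_cluster(layers[-1], cluster_size)
        ]
        layers.append(parents)
    return layers


chunks = ["a b c", "dd ee ff", "ggg hhh iii", "jjjj kkkk llll"]
layers = build_tree(chunks)
print([len(layer) for layer in layers])  # layer sizes: [4, 2, 1]
```

Each pass groups the current layer and replaces every group with one summary node, so the layers shrink until a single root remains.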

### Building the tree (Claude + OpenAI embeddings)

In [None]:
from raptor import RetrievalAugmentation

In [None]:
# The default config uses Claude for summarization and QA, and OpenAI for embeddings
RA = RetrievalAugmentation()

# Build the tree
RA.add_documents(text)

### Querying from the tree

```python
question = "any question"
RA.answer_question(question)
```

In [None]:
question = "How did Cinderella reach her happy ending?"

answer = RA.answer_question(question=question)

print("Answer:", answer)

In [None]:
# Save the tree
SAVE_PATH = "../demo/cinderella"
RA.save(SAVE_PATH)

In [None]:
# Load the saved tree back from disk
RA = RetrievalAugmentation(tree=SAVE_PATH)

answer = RA.answer_question(question=question)
print("Answer:", answer)

## Using custom models

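RAPTOR is designed to be flexible. You can plug in any model for summarization, QA, or embeddings by subclassing the corresponding base class (`BaseSummarizationModel`, `BaseQAModel`, or `BaseEmbeddingModel`) and passing an instance to `RetrievalAugmentationConfig`.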
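As a rough illustration of what a custom embedding model might look like, the sketch below implements a toy embedder against a stand-in interface so that it runs without the library installed. Two things are assumptions: the stand-in class `EmbeddingModelInterface` is hypothetical and only mimics `BaseEmbeddingModel`, and the method name `create_embedding` is assumed to match what the library expects; check the base class in your installed version before relying on it. The hashed bag-of-words embedding is for demonstration only and is far weaker than a learned model.

```python
import hashlib
from abc import ABC, abstractmethod
from typing import List


class EmbeddingModelInterface(ABC):
    # Hypothetical stand-in for raptor's BaseEmbeddingModel so this sketch
    # runs on its own; the method name `create_embedding` is an assumption.
    @abstractmethod
    def create_embedding(self, text: str) -> List[float]:
        ...


class HashingEmbeddingModel(EmbeddingModelInterface):
    """Toy embedder: hashed bag-of-words counts, L2-normalized."""

    def __init__(self, dim: int = 16):
        self.dim = dim

    def create_embedding(self, text: str) -> List[float]:
        vec = [0.0] * self.dim
        for word in text.lower().split():
            # Map each word to a bucket using the first byte of its hash.
            digest = hashlib.sha256(word.encode("utf-8")).digest()
            vec[digest[0] % self.dim] += 1.0
        norm = sum(v * v for v in vec) ** 0.5
        # Leave the all-zero vector untouched to avoid dividing by zero.
        return [v / norm for v in vec] if norm else vec


model = HashingEmbeddingModel(dim=8)
embedding = model.create_embedding("Cinderella went to the ball")
print(len(embedding))
```

In the notebook itself, such a class would derive from `BaseEmbeddingModel` and be passed as `embedding_model=` in `RetrievalAugmentationConfig`, the same way `SBertEmbeddingModel()` is passed in the next cell.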

In [None]:
from raptor import (
    BaseSummarizationModel,
    BaseQAModel,
    BaseEmbeddingModel,
    SBertEmbeddingModel,
    RetrievalAugmentationConfig,
    ClaudeSummarizationModel,
    ClaudeQAModel,
)

# Use Claude for summarization + QA, SBert for embeddings
config = RetrievalAugmentationConfig(
    summarization_model=ClaudeSummarizationModel(),
    qa_model=ClaudeQAModel(),
    embedding_model=SBertEmbeddingModel(),
)

RA = RetrievalAugmentation(config=config)
RA.add_documents(text)

answer = RA.answer_question(question="How did Cinderella reach her happy ending?")
print("Answer:", answer)

## Benchmark: RAPTOR vs Flat Retrieval

Compare RAPTOR's hierarchical tree retrieval against standard flat FAISS retrieval.
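For orientation, flat retrieval scores every chunk embedding against the query embedding in a single pass and keeps the top `k`, with no hierarchy involved. The sketch below shows that idea with a brute-force cosine search in plain Python, used here in place of FAISS; the function names are hypothetical, the vectors are made-up toy values, and this is not a description of how `RaptorBenchmark` is implemented.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def flat_top_k(
    query_vec: Sequence[float],
    chunk_vecs: Sequence[Sequence[float]],
    k: int = 2,
) -> List[Tuple[int, float]]:
    """Return (chunk index, similarity) pairs for the k most similar chunks."""
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy 2-D embeddings for three chunks and one query.
chunk_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query_vec = [1.0, 0.1]
top = flat_top_k(query_vec, chunk_vecs, k=2)
print([index for index, _ in top])  # chunk indices, best first: [0, 2]
```

The hierarchical approach differs in that summary nodes from higher levels of the tree are also available as retrieval candidates, not just the original chunks.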

In [None]:
from raptor import RaptorBenchmark

benchmark = RaptorBenchmark()

questions = [
    "How did Cinderella reach her happy ending?",
    "What role did the birds play in the story?",
    "How were the stepsisters punished?",
]

report = benchmark.run(text, questions)
print(report.summary())