# Cost-Optimized Parsing with Auto-Mode

<a href="https://colab.research.google.com/github/run-llama/llama_parse/blob/main/examples/demo_auto_mode.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

![](diagram.jpg)

Many documents vary in complexity from page to page: some pages contain only text, while others contain images. Text-only pages can be handled by cheap parsing modes, whereas image-heavy pages need more advanced ones. In this notebook we show how to use "auto mode" in LlamaParse, which adaptively chooses a parsing mode for each page. You get high-quality results on the complex pages and pay the cheaper rate on the rest.
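The sketch below puts rough numbers on that trade-off by comparing upgrading only some pages with upgrading every page. The per-page costs are hypothetical placeholders, not LlamaParse pricing, and `estimate_cost` is an illustrative helper rather than part of any library:

```python
def estimate_cost(num_pages: int, num_upgraded: int,
                  base_cost_per_page: float, premium_cost_per_page: float) -> float:
    """Total cost when only `num_upgraded` pages use the premium mode.

    All costs are caller-supplied; no real pricing is assumed here.
    """
    if not 0 <= num_upgraded <= num_pages:
        raise ValueError("num_upgraded must be between 0 and num_pages")
    num_base = num_pages - num_upgraded
    return num_base * base_cost_per_page + num_upgraded * premium_cost_per_page


# Made-up numbers: a 40-page document where 8 pages need premium parsing,
# with premium assumed to cost 15x the base mode per page.
mixed = estimate_cost(40, 8, base_cost_per_page=1.0, premium_cost_per_page=15.0)
all_premium = estimate_cost(40, 40, base_cost_per_page=1.0, premium_cost_per_page=15.0)
print(mixed, all_premium)  # 152.0 600.0
```

The saving grows with the share of pages that stay on the cheaper mode. Substitute your own plan's per-page figures.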


In [None]:
!pip install llama-index
!pip install llama-index-core
!pip install llama-index-embeddings-openai llama-index-llms-openai
!pip install llama-parse

In [None]:
!mkdir -p data
!wget 'https://www.dropbox.com/scl/fi/sterddtajrf844ytvwlim/the-amazon-nova-family-of-models-technical-report-and-model-card.pdf?rlkey=0if0ct5diw70jifr9m8fikpsc&dl=0' -O './data/nova_technical_report.pdf'

Next, set up the async runtime, the API keys, and the OpenAI models used by LlamaIndex.

In [None]:
# llama-parse is async-first; running its async code inside a notebook (which already has a running event loop) requires nest_asyncio
import nest_asyncio

nest_asyncio.apply()

In [None]:
import os

# API access to llama-cloud
os.environ["LLAMA_CLOUD_API_KEY"] = "llx-xxx"

# Using OpenAI API for embeddings/llms
os.environ["OPENAI_API_KEY"] = "sk-proj-xxx"
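Hard-coding keys in a notebook makes them easy to commit by accident. As an alternative, the standard library's `getpass` can prompt for a key only when it is missing from the environment. `ensure_env_var` is a hypothetical helper, shown here as a minimal sketch:

```python
import os
from getpass import getpass


def ensure_env_var(name: str, prompt=getpass) -> str:
    """Set `name` in os.environ only if it is not already set.

    `prompt` defaults to getpass.getpass, so the key is not echoed;
    it is a parameter only so the helper is easy to test.
    """
    value = os.environ.get(name)
    if not value:
        value = prompt(f"Enter {name}: ")
        os.environ[name] = value
    return value


# Usage (instead of hard-coding the keys above):
# ensure_env_var("LLAMA_CLOUD_API_KEY")
# ensure_env_var("OPENAI_API_KEY")
```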

In [None]:
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import VectorStoreIndex
from llama_index.core import Settings

embed_model = OpenAIEmbedding(model="text-embedding-3-small")
llm = OpenAI(model="gpt-4o-mini")

Settings.llm = llm
Settings.embed_model = embed_model

## Using `LlamaParse` with Auto-Mode

We feed the Nova technical report into LlamaParse with auto-mode enabled, triggering on pages that contain an image or a table, and get back a Markdown representation of the document.

In [None]:
from llama_parse import LlamaParse

file_path = "data/nova_technical_report.pdf"

documents = LlamaParse(
    result_type="markdown",
    auto_mode=True,
    auto_mode_trigger_on_image_in_page=True,
    auto_mode_trigger_on_table_in_page=True,
    # auto_mode_trigger_on_text_in_page="<text_on_page>",
    # auto_mode_trigger_on_regexp_in_page="<regexp_on_page>",
).load_data(file_path)

Started parsing the file under job_id 638e9b31-eb09-43d2-bf08-b54cef51ddb1
.......
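The four `auto_mode_trigger_*` options above can be read as per-page conditions. The sketch below is only a mental model, not LlamaParse's implementation. It assumes that any single matching trigger is enough to upgrade a page, and `PageInfo` and `would_trigger` are hypothetical names. It also takes a plain Python regex, without the `/.../g` delimiters used later in this notebook:

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageInfo:
    """Hypothetical per-page summary; LlamaParse computes its own internally."""
    text: str
    has_image: bool = False
    has_table: bool = False


def would_trigger(page: PageInfo,
                  on_image: bool = False,
                  on_table: bool = False,
                  on_text: Optional[str] = None,
                  on_regexp: Optional[str] = None) -> bool:
    """Mental model: a page is upgraded if ANY enabled trigger matches."""
    if on_image and page.has_image:
        return True
    if on_table and page.has_table:
        return True
    if on_text is not None and on_text in page.text:
        return True
    if on_regexp is not None and re.search(on_regexp, page.text):
        return True
    return False


pages = [
    PageInfo("Abstract ...", has_image=True),
    PageInfo("1 Introduction ..."),
    PageInfo("Table 6 ...", has_table=True),
]
# Mirrors the image + table triggers enabled in the cell above.
print([would_trigger(p, on_image=True, on_table=True) for p in pages])  # [True, False, True]
```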

We'll also run the same document through the default markdown mode to compare.

In [None]:
base_documents = LlamaParse(result_type="markdown").load_data(file_path)

Started parsing the file under job_id ddca58f6-4919-4069-b864-616a5956696e
........

## Creating page nodes from our parsed documents

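This is a convenience step that makes the parsed pages easy to inspect: `get_page_nodes` splits each document on the `\n---\n` page separator and wraps every page in a `TextNode`.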

In [None]:
from copy import deepcopy
from llama_index.core.schema import TextNode


def get_page_nodes(docs, separator="\n---\n"):
    """Split each document into one node per page, using the page separator."""
    nodes = []
    for doc in docs:
        doc_chunks = doc.text.split(separator)
        for doc_chunk in doc_chunks:
            node = TextNode(
                text=doc_chunk,
                metadata=deepcopy(doc.metadata),
            )
            nodes.append(node)

    return nodes

In [None]:
page_nodes = get_page_nodes(documents)
base_page_nodes = get_page_nodes(base_documents)
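Because `page_nodes` is a 0-indexed list, page 11 of the PDF is `page_nodes[10]`. A tiny helper (hypothetical, not part of LlamaIndex) keeps that off-by-one arithmetic in one place. It assumes exactly one node per page. That holds only if the `\n---\n` separator never occurs inside a page's own content; a Markdown horizontal rule on a page, for example, could break it:

```python
def get_page(nodes, page_number: int):
    """Return the node for a 1-based page number (page 11 -> nodes[10])."""
    if not 1 <= page_number <= len(nodes):
        raise IndexError(f"page {page_number} is outside 1..{len(nodes)}")
    return nodes[page_number - 1]


# Usage in this notebook:
# print(get_page(page_nodes, 11).get_content())
# Sanity check that both parses produced the same number of pages:
# assert len(page_nodes) == len(base_page_nodes)
```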

## Triggering on images

Let's look at the first page, which has a complex diagram:

<img src="./page_1.png" width="500">


In the auto-mode output, the diagram has been converted into a [Mermaid](https://mermaid.js.org/) chart:

In [None]:
print(page_nodes[0].get_content())

# The Amazon Nova Family of Models:
# Technical Report and Model Card

Amazon Artificial General Intelligence

```mermaid
graph TD
    A[Text] --> B[Nova Lite]
    C[Image] --> B
    D[Video] --> E[Nova Pro]
    F[Code] --> E
    G[Docs] --> E
    B --> H[Text]
    B --> I[Code]
    E --> H
    E --> I
    J[Text] --> K[Nova Micro]
    L[Code] --> K
    K --> M[Text]
    K --> N[Code]
    O[Text] --> P[Nova Canvas]
    Q[Image] --> P
    P --> R[Image]
    S[Text] --> T[Nova Reel]
    U[Image] --> T
    T --> V[Video]
    
    style B fill:#f9f,stroke:#333,stroke-width:2px
    style E fill:#f9f,stroke:#333,stroke-width:2px
    style K fill:#f9f,stroke:#333,stroke-width:2px
    style P fill:#f9f,stroke:#333,stroke-width:2px
    style T fill:#f9f,stroke:#333,stroke-width:2px
    
    classDef input fill:#lightblue,stroke:#333,stroke-width:1px;
    class A,C,D,F,G,J,L,O,Q,S,U input;
    
    classDef output fill:#lightgreen,stroke:#333,stroke-width:1px;
    class H,I,M,N,R,V output;
```



Rendered, this Mermaid chart accurately reproduces the original diagram:

![Mermaid chart](./mermaid_render.png)
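To reuse the generated chart elsewhere, for example in a Markdown viewer that supports Mermaid, you can pull the fenced `mermaid` blocks out of a page's text. This is a minimal regex-based sketch. `extract_mermaid` is a hypothetical helper, and it ignores any block whose closing fence is missing, which can happen when output is truncated:

```python
import re

FENCE = "`" * 3  # built programmatically so this snippet can itself live in a fenced block
# Non-greedy match from an opening mermaid fence to the next closing fence;
# re.DOTALL lets '.' span the newlines inside the diagram.
MERMAID_BLOCK = re.compile(FENCE + r"mermaid\n(.*?)" + FENCE, re.DOTALL)


def extract_mermaid(markdown: str) -> list:
    """Return the body of every complete fenced Mermaid block in `markdown`."""
    return [body.strip() for body in MERMAID_BLOCK.findall(markdown)]


# Usage in this notebook:
# charts = extract_mermaid(page_nodes[0].get_content())
# print(charts[0].splitlines()[0])  # the diagram header, "graph TD" in the output above
```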


For comparison, standard mode does not capture the diagram at all; it extracts only the page text:

In [None]:
print(base_page_nodes[0].get_content())

# The Amazon Nova Family of Models: Technical Report and Model Card

# Amazon Artificial General Intelligence

# Abstract

We present Amazon Nova, a new generation of state-of-the-art foundation models that deliver frontier intelligence and industry-leading price performance. Amazon Nova Pro is a highly-capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. Amazon Nova Lite is a low-cost multimodal model that is lightning fast for processing images, video, documents and text. Amazon Nova Micro is a text-only model that delivers our lowest-latency responses at very low cost. Amazon Nova Canvas is an image generation model that creates professional grade images with rich customization controls. Amazon Nova Reel is a video generation model offering high-quality outputs, customization, and motion control. Our models were built responsibly and with a commitment to customer trust, security, and reliability. We report benchmarking results fo

## Triggering on tables

**Page 11** contains a table, so auto-mode switches that page to the higher-quality parsing mode instead of the default one.

<img src="./page_11.png" width="500">

Auto-mode accurately captures Table 6, including its headings and its subheading row (ROUGE-L / accuracy):

In [None]:
print(page_nodes[10].get_content())

# The Amazon Nova Family of Models

| Nova Micro | Nova Lite | Nova Pro |
|------------|-----------|----------|
| Nova Micro performance chart | Nova Lite performance chart | Nova Pro performance chart |

Figure 2: Text Needle-in-a-Haystack recall performance for Nova Micro (up-to 128k), Nova Lite (up-to 300k) and Nova Pro (up-to 300k) models.

| | SQuALITY | LVBench |
|------------|-----------|----------|
| | ROUGE-L | accuracy |
| Nova Pro | 19.8 ±8.7 | 41.6 ±2.5 |
| Nova Lite | 19.2 ±8.6 | 40.4 ±2.4 |
| Nova Micro | 18.8 ±8.6 | - |
| Claude 3.5 Sonnet (Jun) | 13.4 ±7.5 | - |
| Gemini 1.5 Pro (001) | - | 33.1 ±2.3 |
| Gemini 1.5 Pro (002) | 19.1 ±8.6 M | - |
| Gemini 1.5 Flash (002) | 18.1 ±8.4 M | - |
| GPT-4o | 18.8 ±8.6 | 30.8 ±2.3 |
| Llama 3 - 70B | 16.4 ±8.1 | - |
| Llama 3 - 8B | 15.3 ±7.9 | - |

Table 6: Text and Multimodal long context performance on SQuALITY (ROUGE-L) and LVBench (Accuracy). For SQuALITY, measurements for Claude 3.5 Sonnet, GPT-4o, Llama 3 70B and Llama 3 8

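In standard mode, by contrast, the ROUGE-L subheading is merged into the first data row, which shifts the columns. The Figure 2 charts are also flattened into a table of axis ticks: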

In [None]:
print(base_page_nodes[10].get_content())

# The Amazon Nova Family of Models

| |Nova Micro|Nova Lite|Nova Pro|
|---|---|---|---|
|10|10|10| |
|20|20|20| |
|30|30|30| |
|2| | |75|
|40|40|40| |
|[ 50|50|50|50|
|1 60|60|60| |
|70|70|70| |
| |80|80|80|
| |90|90|90|
|100|100|100| |
|3 3 8|3 3 4 %|3 8|3 3 8 8 3 8|

Figure 2: Text Needle-in-a-Haystack recall performance for Nova Micro (up-to 128k), Nova Lite (up-to 300k) and Nova Pro (up-to 300k) models.

| |SQuALITY|LVBench| |
|---|---|---|---|
|ROUGE-L|Nova Pro|19.8 ±8.7|41.6 ±2.5|
| |Nova Lite|19.2 ±8.6|40.4 ±2.4|
| |Nova Micro|18.8 ±8.6|-|
| |Claude 3.5 Sonnet (Jun)|13.4 ±7.5|-|
| |Gemini 1.5 Pro (001)|-|33.1 ±2.3|
| |Gemini 1.5 Pro (002)|19.1 ±8.6 M|-|
| |Gemini 1.5 Flash (002)|18.1 ±8.4 M|-|
| |GPT-4o|18.8 ±8.6|30.8 ±2.3|
| |Llama 3 - 70B|16.4 ±8.1|-|
| |Llama 3 - 8B|15.3 ±7.9|-|

Table 6: Text and Multimodal long context performance on SQuALITY (ROUGE-L) and LVBench (Accuracy). For SQuALITY, measurements for Claude 3.5 Sonnet, GPT-4o, Llama 3 70B and Llama 3 8B are taken from
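Because the auto-mode output keeps Table 6 as a well-formed Markdown table, it can be loaded back into Python rows. The sketch below is a minimal pipe-table reader (the hypothetical `parse_markdown_tables`), not a full GitHub-Flavored Markdown parser. It assumes cells contain no escaped `|` characters and simply drops the `|---|` delimiter rows:

```python
def parse_markdown_tables(markdown: str) -> list:
    """Return every pipe-delimited table in `markdown` as a list of rows.

    Each row is a list of stripped cell strings. Minimal sketch: assumes no
    escaped '|' inside cells and skips delimiter rows such as |---|:---:|.
    """
    tables, current = [], []
    for line in markdown.splitlines():
        stripped = line.strip()
        if len(stripped) > 1 and stripped.startswith("|") and stripped.endswith("|"):
            cells = [cell.strip() for cell in stripped[1:-1].split("|")]
            if all(cell and set(cell) <= set("-:") for cell in cells):
                continue  # header/body delimiter row
            current.append(cells)
        elif current:
            # Any non-table line (including a blank one) closes the current table.
            tables.append(current)
            current = []
    if current:
        tables.append(current)
    return tables


# Usage in this notebook: pick Table 6 out of page 11 by its header row.
# tables = parse_markdown_tables(page_nodes[10].get_content())
# table6 = next(t for t in tables if "SQuALITY" in t[0])
```

On the standard-mode output, the same call would still return rows, but the ROUGE-L subheading would sit in the data with shifted columns, so the structure would need manual repair.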

## Rendering charts

**Page 14** consists entirely of charts. Auto-mode detects them and uses premium processing to convert them into both a Markdown table and a Mermaid diagram, whereas standard markdown mode struggles to convert the charts into Markdown.

![](page_14.png)

In [None]:
print(page_nodes[13].get_content())

# The Amazon Nova Family of Models

| Model Family | Meta | Amazon | Google | Mistral AI | OpenAI | Anthropic |
|--------------|------|--------|--------|------------|--------|-----------|
| Time to First Token (sec) | 0.72 | 0.37 | 0.35 | 0.53 | 0.62 | 0.98 |
| Output Tokens per Second | 58 | 115 | 190 | 73 | 64 | 29 |
| Total Response Time (sec) | 2.9 | 1.4 | 0.9 | 2.4 | 2.7 | 4.0 |

```mermaid
graph TD
    subgraph "Time to First Token (sec)"
        A1[Llama 2 7B] --> 0.29
        A2[Nova Micro] --> 0.32
        A3[Gemini 1.5 Pro 1B] --> 0.35
        A4[Gemini 1.5 Pro 1.5B] --> 0.35
        A5[Mistral 8x7B] --> 0.36
        A6[Llama 2 13B] --> 0.36
        A7[Nova Lite] --> 0.37
        A8[Nova Pro] --> 0.38
        A9[Llama 2 70B] --> 0.42
        A10[GPT-3.5] --> 0.42
        A11[Llama 2 34B] --> 0.46
        A12[Mistral Large] --> 0.53
        A13[GPT-4] --> 0.62
        A14[Llama 2 34B] --> 0.72
        A15[Claude 2] --> 0.72
        A16[Claude 3 Sonnet] --> 0.87
        A17[Gem

The Markdown table at the top of this output neatly summarizes the three charts in a single table:

| Model Family | Meta | Amazon | Google | Mistral AI | OpenAI | Anthropic |
|--------------|------|--------|--------|------------|--------|-----------|
| Time to First Token (sec) | 0.72 | 0.37 | 0.35 | 0.53 | 0.62 | 0.98 |
| Output Tokens per Second | 58 | 115 | 190 | 73 | 64 | 29 |
| Total Response Time (sec) | 2.9 | 1.4 | 0.9 | 2.4 | 2.7 | 4.0 |

Standard mode, by contrast, does not do nearly as well:

In [None]:
print(base_page_nodes[13].get_content())

# The Amazon Nova Family of Models

# Model Family

| | |Meta| |Amazon| |Google|Mistral AI| | |OpenAI| |Anthropic| | | | | | | | | | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|1.0| | | | | | | | | | | | | |0.98| | | | | | | | | | | | | | | | | | |
|0.8| | | | | | | | | | | | | |0.87| | | | | | | | | | | | | | | | | | |
|0.6| | | | | | | | | | | |0.72|0.72| | | | | | | | | | | | | | | | | | | |
|0.4| | | | | | |0.42|0.42|0.46| | | | | | | | | | | | | | | | | | | | | | | |
|0.2|0.32|0.35|0.35|0.36|0.36|0.37|0.38| | | | | | | | | | | | | | | | | | | | | | | | | |
|0.0|9|1|2|1|8|8|2|2|8|8|2|8|1|8|2|8|2|J|2| | | | | | | | | | | | | |
|1| | | | | | | | | | | | |8|1|1|1|1|5|1|~|{|Tv|!0'A|3|3|8|3|A|2| | | |
| |283|250|200|210|190|163|157|157| | | | | | | | | | | | | | | | | | | | | | | | |
| |150| | | |124|115|113|100| | | | | | | | | | | | | | | | | | | | | | | | |
| |100| | | | |

## Text-only pages

**Page 3** is text only, and the auto-mode output for it is identical to the default markdown-mode output.

<img src="page_3.png" width="500">


In [None]:
print(page_nodes[2].get_content())

# The Amazon Nova Family of Models

# 1 Introduction

This document introduces Amazon Nova, a new generation of state-of-the-art foundation models that deliver frontier intelligence and industry-leading price performance.

# 1.1 Amazon Nova Pro, Lite, and Micro

Key capabilities of Amazon Nova Pro, Lite, and Micro include:

- Frontier intelligence: Amazon Nova models possess frontier intelligence, enabling them to understand and process complex language tasks with state-of-the-art accuracy. Amazon Nova Micro sets new standards in its intelligence tier in several text benchmarks such as Language Understanding (MMLU), Deep Reasoning (GPQA), Mathematics (MATH), and Multi-step Reasoning (Big-Bench Hard). Our multimodal models, Amazon Nova Pro and Lite, take text, images, documents, and video as input and generate text as output. These models set standards in several benchmarks such as Video Captioning (VATEX), Visual QA (TextVQA), Function Calling (BFCL), and multimodal agentic benchmarks 

In [None]:
print(base_page_nodes[2].get_content())

# The Amazon Nova Family of Models

# 1 Introduction

This document introduces Amazon Nova, a new generation of state-of-the-art foundation models that deliver frontier intelligence and industry-leading price performance.

# 1.1 Amazon Nova Pro, Lite, and Micro

Key capabilities of Amazon Nova Pro, Lite, and Micro include:

- Frontier intelligence: Amazon Nova models possess frontier intelligence, enabling them to understand and process complex language tasks with state-of-the-art accuracy. Amazon Nova Micro sets new standards in its intelligence tier in several text benchmarks such as Language Understanding (MMLU), Deep Reasoning (GPQA), Mathematics (MATH), and Multi-step Reasoning (Big-Bench Hard). Our multimodal models, Amazon Nova Pro and Lite, take text, images, documents, and video as input and generate text as output. These models set standards in several benchmarks such as Video Captioning (VATEX), Visual QA (TextVQA), Function Calling (BFCL), and multimodal agentic benchmarks 
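Rather than comparing pages one at a time by eye, you can list which pages actually differ between the two parses. This is a minimal sketch using the standard library's `difflib.SequenceMatcher`. `differing_pages` is a hypothetical helper that compares pages by position, so it assumes both lists have one entry per page in the same order:

```python
import difflib


def differing_pages(auto_pages, base_pages, threshold: float = 1.0):
    """Return (1-based page number, similarity) for pages that differ.

    Similarity is difflib.SequenceMatcher's ratio in [0, 1]; a page is
    reported when its ratio is below `threshold` (1.0 = any difference).
    zip() stops at the shorter list, so check the lengths match first.
    """
    result = []
    for number, (a, b) in enumerate(zip(auto_pages, base_pages), start=1):
        ratio = difflib.SequenceMatcher(None, a, b).ratio()
        if ratio < threshold:
            result.append((number, round(ratio, 3)))
    return result


# Usage in this notebook:
# auto_texts = [n.get_content() for n in page_nodes]
# base_texts = [n.get_content() for n in base_page_nodes]
# print(differing_pages(auto_texts, base_texts))
```

`SequenceMatcher` can be slow on long pages. Lowering `threshold` (e.g. to 0.95) reports only pages with more substantial differences.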

## Triggering on strings

Instead of triggering on specific structures, we can specify a string of interest: any page that contains it is switched to Premium mode. Here we re-parse the document, this time triggering on the word "agents".

In [None]:
file_path = "data/nova_technical_report.pdf"
agent_parser = LlamaParse(
    result_type="markdown",
    auto_mode=True,
    # auto_mode_trigger_on_image_in_page=True,
    # auto_mode_trigger_on_table_in_page=True,
    auto_mode_trigger_on_text_in_page="agents",
    # auto_mode_trigger_on_regexp_in_page="<regexp_on_page>",
)
agent_documents = agent_parser.load_data(file_path)

Started parsing the file under job_id dae5b4bf-25c4-4373-98bf-91325d2de5e2
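Before paying for a re-parse, you can get a rough preview of which pages a string trigger is likely to hit by scanning the text you have already parsed. This is only an approximation. The hypothetical `pages_containing` searches the parsed Markdown, and this notebook does not establish which text LlamaParse's trigger searches or whether it is case-sensitive. The results may therefore not match the JSON flags exactly:

```python
def pages_containing(page_texts, needle: str, case_sensitive: bool = True):
    """Return 1-based page numbers whose text contains `needle`."""
    if not case_sensitive:
        needle = needle.lower()
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if needle in (text if case_sensitive else text.lower())
    ]


# Usage in this notebook:
# print(pages_containing([n.get_content() for n in page_nodes], "agents"))
```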


In this example, the upgraded pages won't look very different once parsed, but we can verify which pages triggered auto-mode by looking at the [JSON output](https://github.com/run-llama/llama_parse/blob/main/examples/demo_json_tour.ipynb) of LlamaParse:

In [None]:
json_output = agent_parser.get_json_result(file_path)[0]

Pages that contain the word "agents" are marked with `triggeredAutoMode` set to `True`:

In [None]:
for page in json_output["pages"]:
    print(f"Page {page['page']}: {page['triggeredAutoMode']}")

Page 1: False
Page 2: False
Page 3: True
Page 4: False
Page 5: False
Page 6: False
Page 7: False
Page 8: True
Page 9: True
Page 10: False
Page 11: False
Page 12: False
Page 13: False
Page 14: False
Page 15: False
Page 16: False
Page 17: False
Page 18: False
Page 19: False
Page 20: False
Page 21: True
Page 22: False
Page 23: True
Page 24: False
Page 25: False
Page 26: False
Page 27: True
Page 28: False
Page 29: False
Page 30: False
Page 31: False
Page 32: False
Page 33: False
Page 34: False
Page 35: False
Page 36: False
Page 37: False
Page 38: False
Page 39: False
Page 40: False
Page 41: False
Page 42: False
Page 43: False
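Printing one line per page gets long for a 43-page report. As a more compact alternative, `triggered_pages` (a hypothetical helper) reads only the `page` and `triggeredAutoMode` keys used above:

```python
def triggered_pages(json_result: dict) -> list:
    """Return the page numbers whose triggeredAutoMode flag is truthy."""
    return [p["page"] for p in json_result["pages"] if p.get("triggeredAutoMode")]


# Usage in this notebook:
# upgraded = triggered_pages(json_output)
# print(upgraded, f"{len(upgraded)}/{len(json_output['pages'])} pages upgraded")
```

Based on the output above, this would report pages 3, 8, 9, 21, 23 and 27, i.e. 6 of the 43 pages.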


## Triggering on regular expressions

Finally, for a more complicated pattern of interest, we can specify a regular expression to trigger on. Here we look for any page whose text contains "agents" or "agentic", with either a lowercase or a capitalized first letter:


In [None]:
file_path = "data/nova_technical_report.pdf"
agentic_parser = LlamaParse(
    result_type="markdown",
    auto_mode=True,
    # auto_mode_trigger_on_image_in_page=True,
    # auto_mode_trigger_on_table_in_page=True,
    # auto_mode_trigger_on_text_in_page="agents",
    auto_mode_trigger_on_regexp_in_page="/(A|a)gent(s|ic)/g",
)
agentic_json_output = agentic_parser.get_json_result(file_path)[0]

Started parsing the file under job_id 1a1749ca-5fb0-430e-a2cf-23a4f34f8b09
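The pattern above is passed to LlamaParse with JavaScript-style `/.../g` delimiters. To sanity-check the pattern body locally, you can drop the delimiters and flag and use Python's `re`. This sketch assumes LlamaParse applies standard regex semantics. Note in particular that, without word boundaries, the pattern also matches inside longer words:

```python
import re

# Pattern body only: Python's re does not use /.../g delimiters.
pattern = re.compile(r"(A|a)gent(s|ic)")
# Variant with word boundaries (\b) so it no longer matches inside other words.
bounded = re.compile(r"\b(A|a)gent(s|ic)\b")

for text in ["multimodal agents", "Agentic workflows", "an agent", "reagents"]:
    print(f"{text!r}: {bool(pattern.search(text))} / bounded: {bool(bounded.search(text))}")
```

Only `"reagents"` behaves differently: the unbounded pattern matches it, while the bounded one does not. `"an agent"` matches neither, because the pattern requires an `s` or `ic` suffix.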


Examining the JSON output again, we can see that a different set of pages has been upgraded. Pages 1, 2 and 10 are now flagged, in addition to every page flagged by the plain-string trigger:

In [None]:
for page in agentic_json_output["pages"]:
    print(f"Page {page['page']}: {page['triggeredAutoMode']}")

Page 1: True
Page 2: True
Page 3: True
Page 4: False
Page 5: False
Page 6: False
Page 7: False
Page 8: True
Page 9: True
Page 10: True
Page 11: False
Page 12: False
Page 13: False
Page 14: False
Page 15: False
Page 16: False
Page 17: False
Page 18: False
Page 19: False
Page 20: False
Page 21: True
Page 22: False
Page 23: True
Page 24: False
Page 25: False
Page 26: False
Page 27: True
Page 28: False
Page 29: False
Page 30: False
Page 31: False
Page 32: False
Page 33: False
Page 34: False
Page 35: False
Page 36: False
Page 37: False
Page 38: False
Page 39: False
Page 40: False
Page 41: False
Page 42: False
Page 43: False
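To see exactly which pages the regular expression added, compare the two sets of flagged pages. A minimal sketch (`compare_triggers` is a hypothetical helper); the page lists below are copied from the two outputs above:

```python
def compare_triggers(first, second) -> dict:
    """Set comparison of two lists of triggered page numbers."""
    a, b = set(first), set(second)
    return {
        "only_first": sorted(a - b),
        "only_second": sorted(b - a),
        "both": sorted(a & b),
    }


# Pages flagged in the two runs above ("agents" string vs. the regex).
text_trigger_pages = [3, 8, 9, 21, 23, 27]
regex_trigger_pages = [1, 2, 3, 8, 9, 10, 21, 23, 27]
print(compare_triggers(text_trigger_pages, regex_trigger_pages))
# {'only_first': [], 'only_second': [1, 2, 10], 'both': [3, 8, 9, 21, 23, 27]}
```

In the notebook you could build these lists directly from `json_output` and `agentic_json_output` with `[p["page"] for p in result["pages"] if p["triggeredAutoMode"]]`.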


## Set up a Simple RAG Pipeline

Let's set up a simple RAG pipeline over the auto-mode parsed pages!

In [None]:
# index the auto-mode page nodes (page text, Markdown tables, and Mermaid charts)
vector_index = VectorStoreIndex(page_nodes)
query_engine = vector_index.as_query_engine(similarity_top_k=3)

In [None]:
response = query_engine.query(
    "Give me a comparison graph of time-to-first-token among all models"
)

In [None]:
print(str(response))

Here is a comparison of the Time to First Token (TTFT) for various models:

- **Llama 2 7B**: 0.29 sec
- **Nova Micro**: 0.32 sec
- **Gemini 1.5 Pro 1B**: 0.35 sec
- **Gemini 1.5 Pro 1.5B**: 0.35 sec
- **Mistral 8x7B**: 0.36 sec
- **Llama 2 13B**: 0.36 sec
- **Nova Lite**: 0.37 sec
- **Nova Pro**: 0.38 sec
- **Llama 2 70B**: 0.42 sec
- **GPT-3.5**: 0.42 sec
- **Llama 2 34B**: 0.46 sec
- **Mistral Large**: 0.53 sec
- **GPT-4**: 0.62 sec
- **Llama 2 34B**: 0.72 sec
- **Claude 2**: 0.72 sec
- **Claude 3 Sonnet**: 0.87 sec
- **Gemini 1.5 Pro**: 0.98 sec

This data can be visualized in a bar graph format, with the models on the x-axis and the corresponding TTFT values on the y-axis, showing the performance of each model in terms of response time.
