# Docling PDF Reader — RAG with rich grounding

In [None]:
from rich.pretty import pprint
import warnings
import os
from dotenv import load_dotenv

load_dotenv()
warnings.filterwarnings(action="ignore", category=FutureWarning, module="easyocr")
warnings.filterwarnings(action="ignore", category=UserWarning, module="pydantic")

## Basic usage

To load PDF data with Docling, we use a `DoclingPDFReader`.

To leverage Docling's rich metadata for grounding, we set:
- `export_type` to JSON, so that Docling's native (JSON) representation is used, and
- `chunk_docs` to `True`, so that the reader automatically chunks that JSON representation for us (otherwise we would get LlamaIndex Documents containing the raw JSON content)

In [None]:
from llama_index.readers.docling.base import DoclingPDFReader

reader = DoclingPDFReader(
    export_type=DoclingPDFReader.ExportType.JSON,  # rich JSON format or Markdown export
    chunk_docs=True,  # whether to chunk the docs already within the reader or return the raw content
)
nodes = reader.load_data(
    file_path="https://arxiv.org/pdf/2408.09869",  # PDF local path or URL (or iterable thereof)
)

Let's preview a sample node (chunk):

In [None]:
pprint(nodes[5], max_length=5, max_string=250, max_depth=2)
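
To make the effect of `chunk_docs=True` more concrete, here is a minimal, library-free sketch of the idea: a structured document is split into one chunk per text item, and each chunk carries its provenance along as metadata. The document layout and the key names (`main_text`, `prov`, `page`, `bbox`) are illustrative assumptions, not Docling's actual schema; the real keys can be inspected with `nodes[5].metadata`.

```python
import json

# Hypothetical, simplified stand-in for a JSON document export.
# The real Docling schema differs; these keys are assumptions for illustration.
raw_json = json.dumps({
    "name": "example.pdf",
    "main_text": [
        {"text": "Docling converts PDF documents.",
         "prov": {"page": 1, "bbox": [72.0, 700.0, 540.0, 715.0]}},
        {"text": "OCR support is optional.",
         "prov": {"page": 2, "bbox": [72.0, 650.0, 540.0, 665.0]}},
    ],
})


def chunk_with_provenance(doc_json: str) -> list[dict]:
    """Split a JSON document into chunks that keep page and bounding box."""
    doc = json.loads(doc_json)
    chunks = []
    for item in doc["main_text"]:
        chunks.append({
            "text": item["text"],
            "metadata": {
                "source": doc["name"],
                "page": item["prov"]["page"],
                "bbox": item["prov"]["bbox"],
            },
        })
    return chunks


chunks = chunk_with_provenance(raw_json)
for chunk in chunks:
    print(chunk["metadata"]["page"], chunk["text"])
```

Because the provenance travels with each chunk, whatever is retrieved later can be traced back to a location in the PDF.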

## RAG demo

Setting up the embedding model:

In [None]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

Setting up the vector store:

In [None]:
from tempfile import TemporaryDirectory
from llama_index.vector_stores.milvus import MilvusVectorStore

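# Keep a reference to the temporary directory so it is not cleaned up while in use
tmp_dir = TemporaryDirectory()

vector_store = MilvusVectorStore(
    # Use MILVUS_URL if set, otherwise fall back to a local database file
    uri=os.environ.get("MILVUS_URL", f"{tmp_dir.name}/milvus_demo.db"),
    collection_name="docling_collection",
    dim=len(embed_model.get_text_embedding("hi")),  # infer the dimension from a sample embedding
    overwrite=True,
)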
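
The two less obvious arguments in the cell above are `uri` and `dim`. The sketch below mirrors their logic using only the standard library: the URI falls back to a file in a temporary directory when `MILVUS_URL` is unset, and the dimension is measured by embedding a throwaway string instead of being hard-coded. `FakeEmbedder` and `resolve_uri` are hypothetical names introduced for illustration, and the stub's vector length is arbitrary.

```python
import os
from tempfile import TemporaryDirectory


class FakeEmbedder:
    """Hypothetical stand-in for an embedding model (not a real library class)."""

    def get_text_embedding(self, text: str) -> list[float]:
        # A real model returns a learned vector; the stub returns a fixed-size dummy.
        return [0.0] * 384


def resolve_uri(tmp_dir: TemporaryDirectory, env_var: str = "MILVUS_URL") -> str:
    """Prefer the environment variable, else a local file inside tmp_dir."""
    return os.environ.get(env_var, f"{tmp_dir.name}/milvus_demo.db")


tmp_dir = TemporaryDirectory()  # keep the reference so the directory stays alive
uri = resolve_uri(tmp_dir)
dim = len(FakeEmbedder().get_text_embedding("hi"))  # probe once to learn the vector size
print(uri, dim)
```

Probing the model once keeps the store's dimension in sync if the embedding model is swapped for another one.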

Setting up the index:

In [None]:
from llama_index.core import StorageContext, VectorStoreIndex

storage_context = StorageContext.from_defaults(vector_store=vector_store)

index = VectorStoreIndex(
    nodes=nodes,
    embed_model=embed_model,
    storage_context=storage_context,
    show_progress=True,
)

Generating embeddings:   0%|          | 0/83 [00:00<?, ?it/s]

Setting up the LLM:

In [None]:
from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI

HF_TOKEN = os.environ.get("HF_TOKEN")

llm = HuggingFaceInferenceAPI(
    token=HF_TOKEN,
    model_name="mistralai/Mixtral-8x7B-Instruct-v0.1",
)
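
Note that `os.environ.get("HF_TOKEN")` returns `None` when the variable is missing, so a missing token may only show up later, when the first request is made. An optional guard, sketched here with a hypothetical helper `require_env`, fails early with a clearer message. `DEMO_TOKEN` is a made-up variable name used only to keep the snippet runnable.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise if it is unset or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Demo only: DEMO_TOKEN is a made-up name, set here so the example runs anywhere.
os.environ.setdefault("DEMO_TOKEN", "dummy-value")
token = require_env("DEMO_TOKEN")
print(len(token) > 0)
```

In the notebook, the equivalent call would be `require_env("HF_TOKEN")` after `load_dotenv()` has run.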

We are now ready to ask questions about the document content.

As shown below, the result contains not only the response itself but also PDF-level grounding, including page number and bounding box information:

In [None]:
query_engine = index.as_query_engine(llm=llm)
query_res = query_engine.query("Can I use OCR with Docling?")
pprint(query_res, max_length=5, max_string=70, max_depth=4)
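
The grounding information lives on the retrieved source nodes of the response. Below is a minimal sketch of pulling it out, assuming each source node exposes `score` and `node.metadata` (as LlamaIndex's `NodeWithScore` does) and that the metadata uses the keys `page` and `bbox`. Those key names are assumptions, so check them against the preview of `nodes[5]` above. The stub objects only make the snippet runnable on its own; in the notebook you would pass `query_res.source_nodes` instead.

```python
from types import SimpleNamespace


def summarize_grounding(source_nodes) -> list[dict]:
    """Collect score, page and bounding box for each retrieved chunk."""
    summary = []
    for item in source_nodes:
        meta = item.node.metadata
        summary.append({
            "score": item.score,
            "page": meta.get("page"),  # assumed key name
            "bbox": meta.get("bbox"),  # assumed key name
        })
    return summary


# Stub data shaped like retrieval results (hypothetical values).
stub_nodes = [
    SimpleNamespace(
        score=0.83,
        node=SimpleNamespace(metadata={"page": 3, "bbox": [107.0, 91.0, 504.0, 132.0]}),
    ),
    SimpleNamespace(score=0.79, node=SimpleNamespace(metadata={"page": 5})),
]

grounding = summarize_grounding(stub_nodes)
for entry in grounding:
    print(entry)
```

Using `dict.get` keeps the helper tolerant of chunks that lack one of the keys; such entries simply report `None`.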