# Standalone Model Evaluation

This notebook allows you to deploy and evaluate the individual NIM models (Nemotron Parse and NeMoRetriever OCR) **outside** of the full NV-Ingest pipeline. This is useful for:
- Comparing model performance directly
- Testing models before integrating with NV-Ingest
- Debugging and development

**Prerequisites**:
1. You need an NGC account and API key (set in your `.env` file)
2. Docker must be installed and configured

**Note**: If you deployed NV-Ingest with the respective profiles, these models may already be running. The cells below will check for existing deployments first.
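
The existence check in the cells below is a `docker ps | grep` one-liner. For readers who prefer to stay in a Python cell, a rough equivalent could look like the sketch below. The function names (`parse_docker_ps`, `find_running`, `list_running_containers`) are hypothetical helpers and not part of NV-Ingest; the sketch assumes the Docker CLI is on `PATH`.

```python
import subprocess

def parse_docker_ps(ps_output: str) -> list[tuple[str, str]]:
    """Parse lines of the form '<name><TAB><image>' into (name, image) pairs."""
    rows = []
    for line in ps_output.splitlines():
        if "\t" in line:
            name, image = line.split("\t", 1)
            rows.append((name.strip(), image.strip()))
    return rows

def find_running(rows, needle):
    """Return the rows whose container name or image contains `needle`."""
    return [(name, image) for name, image in rows if needle in name or needle in image]

def list_running_containers():
    """Ask the Docker CLI for running containers (requires Docker on PATH)."""
    out = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}\t{{.Image}}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_docker_ps(out)
```

For example, `find_running(list_running_containers(), "nemotron-parse")` returns an empty list when no matching container is running. Matching on the image as well as the name mirrors what `grep` does on the full `docker ps` line.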

Check out the build.nvidia.com endpoints to view the model results in a UI:
- https://build.nvidia.com/nvidia/nemotron-parse
- https://build.nvidia.com/nvidia/nemoretriever-ocr-v1


## Nemotron Parse Standalone Evaluation

[Nemotron Parse](https://build.nvidia.com/nvidia/nemotron-parse) is NVIDIA's state-of-the-art model for document understanding. It extracts text, tables, and document structure in a single pass.

**Model Details:**
- Container: `nvcr.io/nim/nvidia/nemotron-parse:latest`
- Port: 8000 (standalone) or integrated via NV-Ingest
- GPU Memory: ~8-16GB depending on document complexity


In [None]:
%%bash

# Check if Nemotron Parse is already running (either via NV-Ingest or standalone)
echo "Checking for existing Nemotron Parse deployment..."

if docker ps | grep -q "nemotron-parse"; then
    echo "✓ Nemotron Parse is already running!"
    docker ps | grep nemotron-parse
else
    echo "Nemotron Parse not found. Deploying standalone container..."
    echo "This may take several minutes on first run..."
    
    # Create the NIM cache directory
    export LOCAL_NIM_CACHE=~/.cache/nim
    mkdir -p "$LOCAL_NIM_CACHE"
    
    # Deploy Nemotron Parse standalone
    # Replace the placeholder with your NGC API key (quoted so bash does not parse < and > as redirections)
    export NGC_API_KEY="<your-api-key>"
    docker run -d \
        --name nemotron-parse-standalone \
        --gpus all \
        --shm-size=16GB \
        -e NGC_API_KEY \
        -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
        -u $(id -u) \
        -p 8000:8000 \
        nvcr.io/nim/nvidia/nemotron-parse:latest

    
    echo "Nemotron Parse container started. Waiting for model to load..."
    echo "Check status with: docker logs -f nemotron-parse-standalone"
fi


Checking for existing Nemotron Parse deployment...
Nemotron Parse not found. Deploying standalone container...
This may take several minutes on first run...


Unable to find image 'nvcr.io/nim/nvidia/nemotron-parse:latest' locally
latest: Pulling from nim/nvidia/nemotron-parse
d9d352c11bbd: Pulling fs layer
4f4fb700ef54: Pulling fs layer
cebaad6571a9: Pulling fs layer
7848405aa76a: Pulling fs layer
3c313eb917ea: Pulling fs layer
1f7fdc9ac514: Pulling fs layer
d1c1b0ca7e57: Pulling fs layer
9f11a52127b5: Pulling fs layer
1d7d94b9f382: Pulling fs layer
f1af243d1e83: Pulling fs layer
3f893d6c8100: Pulling fs layer
fa8ec00ab1d4: Pulling fs layer
fa0fe06a747b: Pulling fs layer
77e2816d77cd: Pulling fs layer
abb9b93d3c5f: Pulling fs layer
1b59656f075d: Pulling fs layer
fb8084993d4d: Pulling fs layer
21a61217e20e: Pulling fs layer
8866bb33a5ff: Pulling fs layer
9f11a52127b5: Waiting
7dc054b813a9: Pulling fs layer
5c0a4f7c45dd: Pulling fs layer
1d7d94b9f382: Waiting
f137fd4a4db3: Pulling fs layer
e5992e1e6490: Pulling fs layer
2073c894355d: Pulling fs layer
f191c297ebee: Pulling fs layer
fa8ec00ab1d4: Waiting
6a8967222828: Pulling fs layer
15824fc38

0152151ec9e685ac24170a57ea00098c2e7289ad3b90ca00fc4bbfb67bc1387f
Nemotron Parse container started. Waiting for model to load...
Check status with: docker logs -f nemotron-parse-standalone


### Test Nemotron Parse

Once the model is loaded, you can test it with a sample document. The model accepts base64-encoded images or PDF pages.
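
Because loading can take several minutes, it may help to poll the readiness endpoint rather than re-running the health check by hand. The following is a minimal sketch, assuming the `/v1/health/ready` endpoint used in this notebook returns HTTP 200 once the model is loaded. `http_ready` and `wait_until_ready` are hypothetical helper names, and the 15-minute default limit is an arbitrary assumption.

```python
import time
import urllib.error
import urllib.request

def http_ready(url: str, timeout_s: float = 10.0) -> bool:
    """Return True if a GET to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status all count as "not ready"
        return False

def wait_until_ready(probe, max_wait_s=900.0, interval_s=5.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Call `probe()` until it returns True or `max_wait_s` seconds have passed."""
    deadline = clock() + max_wait_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A typical call would be `wait_until_ready(lambda: http_ready("http://localhost:8000/v1/health/ready"))`. Passing the probe as a callable keeps the retry loop independent of the HTTP client, so the same loop works for either model.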


In [11]:
import requests
import base64
import json
from pathlib import Path

# Sample image URL for testing
SAMPLE_IMAGE_URL = "https://assets.ngc.nvidia.com/products/api-catalog/nemotron-parse/example_2.jpg"

def test_nemotron_parse(
    image_source: str, 
    endpoint: str = "http://localhost:8000/v1/chat/completions",
    tool_type: str = "markdown_bbox"
):
    """
    Test Nemotron Parse with a document image.
    
    Based on: https://docs.nvidia.com/nim/vision-language-models/latest/examples/nemotron-parse/api.html
    
    Args:
        image_source: Either a URL (http/https) or a local file path (PNG, JPG)
        endpoint: The Nemotron Parse API endpoint
        tool_type: Extraction mode - one of:
            - "markdown_bbox": (default) Text with bounding boxes in markdown format
            - "markdown_no_bbox": Text in markdown format without bounding boxes
            - "detection_only": Bounding boxes and types only, no text transcription
    
    Returns:
        List of detected elements with text, bounding boxes, and types
    """
    # Determine if input is a URL or local file
    if image_source.startswith(("http://", "https://")):
        # Use URL directly - the API will download it
        image_url = image_source
    else:
        # Read local file and encode as base64
        with open(image_source, "rb") as f:
            image_data = base64.b64encode(f.read()).decode("utf-8")
        
        # Determine media type from file extension
        suffix = Path(image_source).suffix.lower()
        media_types = {".png": "image/png", ".jpg": "image/jpeg", ".jpeg": "image/jpeg"}
        media_type = media_types.get(suffix, "image/png")
        image_url = f"data:{media_type};base64,{image_data}"
    
    # Prepare the request payload per official API docs
    # Note: Text input is not supported - only images
    payload = {
        "model": "nvidia/nemotron-parse",
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": tool_type  # markdown_bbox, markdown_no_bbox, or detection_only
                }
            }
        ],
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": image_url
                        }
                    }
                ]
            }
        ],
        "temperature": 0.0
    }
    
    try:
        response = requests.post(endpoint, json=payload, timeout=120)
        response.raise_for_status()
        result = response.json()
        
        # Parse the tool call response
        tool_call = result["choices"][0]["message"]["tool_calls"][0]
        detections = json.loads(tool_call["function"]["arguments"])
        return detections
        
    except requests.exceptions.ConnectionError:
        return {"error": "Could not connect to Nemotron Parse. Make sure the container is running and model is loaded."}
    except KeyError as e:
        return {"error": f"Unexpected response format: {e}", "raw_response": result}
    except Exception as e:
        return {"error": str(e)}

# Check if service is ready
print("Checking Nemotron Parse health...")
try:
    health = requests.get("http://localhost:8000/v1/health/ready", timeout=10)
    if health.status_code == 200:
        print("✓ Nemotron Parse is ready!")
        print("\nTest with the sample image:")
        print(f'  result = test_nemotron_parse("{SAMPLE_IMAGE_URL}")')
        print("\nOr use a local file:")
        print('  result = test_nemotron_parse("./path/to/image.png")')
        print('\nTool types available:')
        print('  - "markdown_bbox" (default): Text + bounding boxes')
        print('  - "markdown_no_bbox": Text only, no bounding boxes')
        print('  - "detection_only": Bounding boxes only, no text')
    else:
        print(f"Service returned status: {health.status_code}")
except requests.exceptions.ConnectionError:
    print("⏳ Nemotron Parse is still loading. Please wait and try again.")
    print("   Check status with: !docker logs -f nemotron-parse-standalone")


Checking Nemotron Parse health...
✓ Nemotron Parse is ready!

Test with the sample image:
  result = test_nemotron_parse("https://assets.ngc.nvidia.com/products/api-catalog/nemotron-parse/example_2.jpg")

Or use a local file:
  result = test_nemotron_parse("./path/to/image.png")

Tool types available:
  - "markdown_bbox" (default): Text + bounding boxes
  - "markdown_no_bbox": Text only, no bounding boxes
  - "detection_only": Bounding boxes only, no text


In [12]:
result = test_nemotron_parse("https://assets.ngc.nvidia.com/products/api-catalog/nemotron-parse/example_2.jpg")

In [13]:
result

[[{'bbox': {'xmin': 0.0565878453038674,
    'ymin': 0.0648,
    'xmax': 0.9412497237569062,
    'ymax': 0.0781},
   'text': '_Appl. Sci._ **2023**, _13_, 12967 4 of 21',
   'type': 'Page-header'},
  {'bbox': {'xmin': 0.2754486187845304,
    'ymin': 0.3078,
    'xmax': 0.9412497237569062,
    'ymax': 0.5164},
   'text': 'The adopted experimental equipment for performing VHCF tests was an ultrasonic fatigue testing machine (UFTM). The UFTM is shown in Figure 1 and was developed at Politecnico di Torino for testing components in the chosen fatigue range. The UFTM can perform tests at \\(20\\pm 0.5\\) kHz under fully reversed tension-compression stress (_R_=-1), which significantly reduces the testing time compared to conventional testing apparatuses. To perform fatigue tests, the UFTM adopts the following devices: (i) an electric generator (Branson DCX 4 kW, not visible in Figure 1) which generates an electrical sinusoidal wave at 20 kHz; (ii) a piezoelectric transducer, responsible for c
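
The output above appears to be a nested list (one inner list of element dicts, each with `bbox`, `text`, and `type`), with bounding-box values that look normalized to the 0-1 range. Under that assumption, a small post-processing sketch might flatten the result, group text by element type, and convert boxes to pixel coordinates. The helper names are hypothetical, and the image width and height must be supplied by the caller.

```python
def flatten_detections(result):
    """Flatten [[{...}, ...], ...] into one list of element dicts."""
    if isinstance(result, dict):
        # test_nemotron_parse returns a dict only when it reports an error
        return []
    flat = []
    for page in result:
        if isinstance(page, list):
            flat.extend(page)
        else:
            flat.append(page)
    return flat

def texts_by_type(elements):
    """Group element text by its 'type' label (for example 'Page-header')."""
    grouped = {}
    for el in elements:
        grouped.setdefault(el.get("type", "unknown"), []).append(el.get("text", ""))
    return grouped

def bbox_to_pixels(bbox, width, height):
    """Scale a normalized bbox dict to integer pixel coordinates."""
    return (
        round(bbox["xmin"] * width),
        round(bbox["ymin"] * height),
        round(bbox["xmax"] * width),
        round(bbox["ymax"] * height),
    )
```

For instance, `texts_by_type(flatten_detections(result))` gives a quick per-type view of what was extracted, which is convenient when comparing against the OCR output later in this notebook.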

### Cleanup Nemotron Parse Standalone

Run this cell to stop and remove the standalone Nemotron Parse container when you're done testing.


In [14]:
%%bash

# Stop and remove the standalone Nemotron Parse container
if docker ps -a | grep -q "nemotron-parse-standalone"; then
    echo "Stopping Nemotron Parse standalone container..."
    docker stop nemotron-parse-standalone
    docker rm nemotron-parse-standalone
    echo "✓ Nemotron Parse standalone container removed."
else
    echo "No standalone Nemotron Parse container found."
fi


Stopping Nemotron Parse standalone container...
nemotron-parse-standalone
nemotron-parse-standalone
✓ Nemotron Parse standalone container removed.


## NeMoRetriever OCR Standalone Evaluation

[NeMoRetriever OCR](https://build.nvidia.com/nvidia/nemoretriever-ocr-v1/deploy) is NVIDIA's high-performance OCR model that provides faster and more accurate text extraction than PaddleOCR.

**Model Details:**
- Container: `nvcr.io/nim/nvidia/nemoretriever-ocr-v1:latest`
- Port: 8000 (the same host port as Nemotron Parse, so stop that container first or map a different host port)
- Model Name: `scene_text_ensemble`
- GPU Memory: ~4-8GB


In [None]:
%%bash

# Check if NeMoRetriever OCR is already running (either via NV-Ingest or standalone)
echo "Checking for existing NeMoRetriever OCR deployment..."

if docker ps | grep -q "nemoretriever-ocr"; then
    echo "✓ NeMoRetriever OCR is already running!"
    docker ps | grep nemoretriever-ocr
else
    echo "NeMoRetriever OCR not found. Deploying standalone container..."
    echo "This may take several minutes on first run..."
    
    # Create the NIM cache directory
    export LOCAL_NIM_CACHE=~/.cache/nim
    mkdir -p "$LOCAL_NIM_CACHE"
    
    # Deploy NeMoRetriever OCR standalone (host port 8000)
    # Replace the placeholder with your NGC API key (quoted so bash does not parse < and > as redirections)
    export NGC_API_KEY="<your-api-key>"
    docker run -d \
        --name nemotron-ocr-standalone \
        --gpus all \
        --shm-size=16GB \
        -e NGC_API_KEY \
        -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
        -u $(id -u) \
        -p 8000:8000 \
        nvcr.io/nim/nvidia/nemoretriever-ocr-v1:latest
    
    echo "NeMoRetriever OCR container started. Waiting for model to load..."
    echo "Check status with: docker logs -f nemotron-ocr-standalone"
fi


Checking for existing NeMoRetriever OCR deployment...
NeMoRetriever OCR not found. Deploying standalone container...
This may take several minutes on first run...


Unable to find image 'nvcr.io/nim/nvidia/nemoretriever-ocr-v1:latest' locally
latest: Pulling from nim/nvidia/nemoretriever-ocr-v1
76249c7cd503: Pulling fs layer
4f4fb700ef54: Pulling fs layer
2c012c05aa52: Pulling fs layer
e208c1addcef: Pulling fs layer
fd1d0313cdb1: Pulling fs layer
25b35e68ee46: Pulling fs layer
a5196ab7827e: Pulling fs layer
1029073a30ca: Pulling fs layer
e208c1addcef: Waiting
fd1d0313cdb1: Waiting
9a1ae0b54ca0: Pulling fs layer
25b35e68ee46: Waiting
b2653b5a2bd4: Pulling fs layer
a5196ab7827e: Waiting
9a1ae0b54ca0: Waiting
1029073a30ca: Waiting
0480cd94b346: Pulling fs layer
b2653b5a2bd4: Waiting
812e57e14e40: Pulling fs layer
2cc679c17d7f: Pulling fs layer
0480cd94b346: Waiting
812e57e14e40: Waiting
92fff3630921: Pulling fs layer
2cc679c17d7f: Waiting
92fff3630921: Waiting
16216e2fe824: Pulling fs layer
cef11905364f: Pulling fs layer
16216e2fe824: Waiting
5b3fe16cb4d1: Pulling fs layer
ba3f119ad6b1: Pulling fs layer
cef11905364f: Waiting
5b3fe16cb4d1: Waiting
823

1ecf924e797ef9f5f443f4314674a048c7ec41b9d6f0073b0aee8a63d002718c
NeMoRetriever OCR container started. Waiting for model to load...
Check status with: docker logs -f nemoretriever-ocr-standalone


### Test NeMoRetriever OCR

Once the model is loaded, you can test it with a sample image. The model accepts base64-encoded images and returns extracted text with bounding boxes.


In [30]:
import requests
import base64
from pathlib import Path

# Sample image URL for testing OCR
SAMPLE_OCR_IMAGE_URL = "https://assets.ngc.nvidia.com/products/api-catalog/nemotron-parse/example_2.jpg"

def test_nemoretriever_ocr(
    image_source: str, 
    endpoint: str = "http://localhost:8000/v1/infer"
):
    """
    Test NeMoRetriever OCR with an image.
    
    Based on: https://build.nvidia.com/nvidia/nemoretriever-ocr-v1/deploy
    
    Args:
        image_source: Either a URL (http/https) or a local file path (PNG, JPG)
        endpoint: The NeMoRetriever OCR API endpoint
    
    Returns:
        OCR results with extracted text and bounding boxes
    """
    # For URLs, download first then encode
    if image_source.startswith(("http://", "https://")):
        # Download image from URL
        img_response = requests.get(image_source, timeout=30)
        img_response.raise_for_status()
        image_data = base64.b64encode(img_response.content).decode("utf-8")
    else:
        # Read local file and encode as base64
        with open(image_source, "rb") as f:
            image_data = base64.b64encode(f.read()).decode("utf-8")
    
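    # Pick the media type from the file extension (query string ignored); default to JPEG
    suffix = Path(image_source.split("?", 1)[0]).suffix.lower()
    media_type = "image/png" if suffix == ".png" else "image/jpeg"
    
    # NeMoRetriever OCR expects objects with 'type' and 'url' fields
    # URL must be a data URL with base64 encoded image
    payload = {
        "input": [
            {
                "type": "image_url",
                "url": f"data:{media_type};base64,{image_data}"
            }
        ]
    }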
    
    headers = {
        "accept": "application/json",
        "Content-Type": "application/json"
    }
    
    try:
        response = requests.post(endpoint, headers=headers, json=payload, timeout=120)
        response.raise_for_status()
        result = response.json()
        return result
    except requests.exceptions.HTTPError as e:
        # Get detailed error from response
        try:
            error_detail = response.json()
        except ValueError:  # response body was not valid JSON
            error_detail = response.text
        return {"error": str(e), "detail": error_detail}
    except requests.exceptions.ConnectionError:
        return {"error": "Could not connect to NeMoRetriever OCR. Make sure the container is running and model is loaded."}
    except Exception as e:
        return {"error": str(e)}

# Check if service is ready
print("Checking NeMoRetriever OCR health...")
try:
    health = requests.get("http://localhost:8000/v1/health/ready", timeout=10)
    if health.status_code == 200:
        print("✓ NeMoRetriever OCR is ready!")
        print("\nTest with the sample image:")
        print(f'  result = test_nemoretriever_ocr("{SAMPLE_OCR_IMAGE_URL}")')
        print("\nOr use a local file:")
        print('  result = test_nemoretriever_ocr("./path/to/image.png")')
    else:
        print(f"Service returned status: {health.status_code}")
except requests.exceptions.ConnectionError:
    print("⏳ NeMoRetriever OCR is still loading. Please wait and try again.")
    print("   Check status with: !docker logs -f nemotron-ocr-standalone")


Checking NeMoRetriever OCR health...
✓ NeMoRetriever OCR is ready!

Test with the sample image:
  result = test_nemoretriever_ocr("https://assets.ngc.nvidia.com/products/api-catalog/nemotron-parse/example_2.jpg")

Or use a local file:
  result = test_nemoretriever_ocr("./path/to/image.png")


In [31]:
result = test_nemoretriever_ocr("https://assets.ngc.nvidia.com/products/api-catalog/nemotron-parse/example_2.jpg")

In [32]:
result

{'data': [{'index': 0,
   'text_detections': [{'text_prediction': {'text': 'Appl. Sci. 2023, 13, 12967',
      'confidence': 0.9621646701868591},
     'bounding_box': {'points': [{'x': 0.05840383097529411,
        'y': 0.06721106916666031},
       {'x': 0.20411844551563263, 'y': 0.06721106916666031},
       {'x': 0.20411844551563263, 'y': 0.07830017060041428},
       {'x': 0.05840383097529411, 'y': 0.07830017060041428}]}},
    {'text_prediction': {'text': 'oof 21', 'confidence': 0.5725537641363276},
     'bounding_box': {'points': [{'x': 0.9043284058570862,
        'y': 0.06817606836557388},
       {'x': 0.9392924308776855, 'y': 0.06817606836557388},
       {'x': 0.9392924308776855, 'y': 0.07534684240818024},
       {'x': 0.9043284058570862, 'y': 0.07534684240818024}]}},
    {'text_prediction': {'text': 'Table 1. Adopted material properties.',
      'confidence': 0.9481045938087516},
     'bounding_box': {'points': [{'x': 0.2790148854255676,
        'y': 0.11504768580198288},
       {'
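
Judging by the output above, each entry under `data` carries a `text_detections` list whose items hold a `text_prediction` (with `text` and `confidence`) and a four-point `bounding_box`. Assuming that shape, the sketch below walks the response, reduces each polygon to an axis-aligned box, and filters out low-confidence text. The helper names and the 0.8 threshold are illustrative assumptions, not values taken from the model documentation.

```python
def iter_ocr_detections(result):
    """Yield one dict per detection: text, confidence, and (xmin, ymin, xmax, ymax)."""
    for item in result.get("data", []):
        for det in item.get("text_detections", []):
            pred = det["text_prediction"]
            points = det["bounding_box"]["points"]
            xs = [p["x"] for p in points]
            ys = [p["y"] for p in points]
            yield {
                "text": pred["text"],
                "confidence": pred["confidence"],
                "bbox": (min(xs), min(ys), max(xs), max(ys)),
            }

def confident_text(result, min_confidence=0.8):
    """Return detected strings whose confidence is at least `min_confidence`."""
    return [
        d["text"]
        for d in iter_ocr_detections(result)
        if d["confidence"] >= min_confidence
    ]
```

With the response shown above, a threshold of 0.8 would drop the `'oof 21'` detection (confidence of about 0.57) while keeping the page header text. An error dict returned by `test_nemoretriever_ocr` has no `data` key, so both helpers simply produce nothing for it.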

### Cleanup NeMoRetriever OCR Standalone

Run this cell to stop and remove the standalone NeMoRetriever OCR container when you're done testing.


In [39]:
%%bash

# Stop and remove the standalone NeMoRetriever OCR container
if docker ps -a | grep -q "nemotron-ocr-standalone"; then
    echo "Stopping NeMoRetriever OCR standalone container..."
    docker stop nemotron-ocr-standalone
    docker rm nemotron-ocr-standalone
    echo "✓ NeMoRetriever OCR standalone container removed."
else
    echo "No standalone NeMoRetriever OCR container found."
fi


No standalone NeMoRetriever OCR container found.
