# Query the Nemotron Nano 12B v2 VL API

This notebook demonstrates how to interact with the NVIDIA Nemotron Nano 12B v2 VL API using Python. 
It covers launching the NIM container, sending basic vision requests, handling streaming, using reasoning capabilities, and performing text-only queries.

Reference: [NVIDIA Docs](https://docs.nvidia.com/nim/vision-language-models/latest/examples/nemotron-nano-12b-v2-vl/api.html)

## 1. Launch NIM

Before running the Python code, launch the Nemotron Nano 12B v2 VL NIM container.
Make sure `NGC_API_KEY` is set in your shell, then run the following commands in a terminal:

```bash
# Choose a container name for bookkeeping
export CONTAINER_NAME="nvidia-nemotron-nano-12b-v2-vl"
# Repository name and tag of the image, as listed in the NGC registry
Repository="nemotron-nano-12b-v2-vl"
Latest_Tag="1.5.0"

# Choose a VLM NIM Image from NGC 
export IMG_NAME="nvcr.io/nim/nvidia/${Repository}:${Latest_Tag}" 

# Choose a path on your system to cache the downloaded models 
export LOCAL_NIM_CACHE=~/.cache/nim 
mkdir -p "$LOCAL_NIM_CACHE" 

# Start the VLM NIM 
docker run -it --rm --name=$CONTAINER_NAME \
  --runtime=nvidia \
  --gpus all \
  --shm-size=32GB \
  -e NGC_API_KEY=$NGC_API_KEY \
  -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
  -u $(id -u) \
  -p 8000:8000 \
  $IMG_NAME
```
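
The container can take a while to download and load the model, so requests sent right after `docker run` may fail. The sketch below polls until the service reports ready. It assumes the NIM exposes a readiness endpoint at `/v1/health/ready` on port 8000; that path, and the helper names `http_probe` and `wait_until_ready`, are assumptions for illustration and are not taken from this notebook.

```python
import time
import urllib.error
import urllib.request


def http_probe(url: str = "http://localhost:8000/v1/health/ready") -> bool:
    """Return True if the (assumed) readiness endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status: not ready yet.
        return False


def wait_until_ready(probe=http_probe, timeout_s: float = 600.0,
                     interval_s: float = 5.0) -> bool:
    """Call `probe` repeatedly until it returns True or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Calling `wait_until_ready()` before the first request in the cells below avoids connection errors while the model is still loading. The probe is passed in as a callable so the polling loop can be exercised without a running container.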

## 2. Install Dependencies

In [None]:
!pip install openai requests

## 3. Basic Vision Query

Send a chat completion request that includes an image URL and asks the model to describe the image.

In [None]:
from openai import OpenAI

# The NIM serves an OpenAI-compatible API on port 8000; a local NIM does not check the API key.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What is in this image?"
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
                }
            }
        ]
    }
]

chat_response = client.chat.completions.create(
    model="nvidia/nemotron-nano-12b-v2-vl",
    messages=messages,
    max_tokens=1024,
    stream=False,
)

assistant_message = chat_response.choices[0].message
print(assistant_message.content)
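
The request above points at a public image URL. For a local file, one common approach with OpenAI-compatible vision APIs is to embed the image as a base64 data URL in the same `image_url` field. The helper below is a minimal sketch of that idea; `image_to_data_url` is a hypothetical name, and whether this NIM version accepts data URLs should be checked against the NVIDIA docs linked at the top.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Encode a local image file as a `data:<mime>;base64,<payload>` URL."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {path!r}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

The returned string would replace the `"url"` value in the `image_url` entry of `messages`; the rest of the request stays the same.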

## 4. Streaming Response

Set `stream=True` to receive the response incrementally and print each piece of text as it arrives.

In [None]:
stream = client.chat.completions.create(
    model="nvidia/nemotron-nano-12b-v2-vl",
    messages=messages,
    max_tokens=1024,
    stream=True,
)

print("Streaming response:")
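for chunk in stream:
    # Some chunks may carry no choices (for example, a trailing usage chunk); skip them.
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    if delta and delta.content:
        print(delta.content, end="", flush=True)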
print()
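
If the full text is needed after streaming (for logging or post-processing), the deltas can be accumulated instead of only printed. The sketch below shows one way to do that. `collect_stream` and `make_fake_chunk` are hypothetical helpers; the fake chunks only mimic the `choices[0].delta.content` shape used in the loop above so the example runs without a server.

```python
from types import SimpleNamespace


def collect_stream(chunks) -> str:
    """Join the text deltas from an iterable of streaming chunks."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # chunk without choices: nothing to append
        delta = chunk.choices[0].delta
        if delta and delta.content:
            parts.append(delta.content)
    return "".join(parts)


def make_fake_chunk(text):
    """Build a stand-in object shaped like a streaming chunk (illustration only)."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


# Offline demonstration with stand-in chunks, including an empty one.
demo_chunks = [
    make_fake_chunk("Hello"),
    SimpleNamespace(choices=[]),
    make_fake_chunk(None),
    make_fake_chunk(", world"),
]
full_text = collect_stream(demo_chunks)
print(full_text)
```

With a live server, passing the `stream` object returned by `client.chat.completions.create(..., stream=True)` to `collect_stream` would return the complete response text once the stream ends.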

## 5. Reasoning Capability

Enable reasoning by adding `/think` to the system prompt. Note: reasoning is not supported for video inputs.

In [None]:
reasoning_messages = [
    { "role": "system", "content": "/think" },
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What is in this image?"
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
                }
            }
        ]
    }
]

chat_response = client.chat.completions.create(
    model="nvidia/nemotron-nano-12b-v2-vl",
    messages=reasoning_messages,
    max_tokens=4096,
    stream=False,
)

print(chat_response.choices[0].message.content)
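
The reasoning request differs from the basic vision request only by the `/think` system message, so the two message lists can be built by one function. This is a small convenience sketch; `build_messages` is a hypothetical helper, not part of the OpenAI client or the NIM API.

```python
def build_messages(prompt, image_url=None, reasoning=False):
    """Build a chat `messages` list, optionally with an image and `/think`."""
    messages = []
    if reasoning:
        # As described above, `/think` in the system prompt enables reasoning.
        messages.append({"role": "system", "content": "/think"})
    content = [{"type": "text", "text": prompt}]
    if image_url is not None:
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    messages.append({"role": "user", "content": content})
    return messages
```

For example, `build_messages("What is in this image?", image_url=url, reasoning=True)` produces the same structure as `reasoning_messages`, and omitting `reasoning=True` produces the structure used in the basic vision query.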

## 6. Text-only Queries

The model can also be used as a standard LLM for text-only tasks.

In [None]:
text_messages = [
    { "role": "system", "content": "You are a helpful assistant" },
    { "role": "user", "content": "Create a detailed itinerary for a week-long adventure trip through Southeast Asia." }
]

chat_response = client.chat.completions.create(
    model="nvidia/nemotron-nano-12b-v2-vl",
    messages=text_messages,
    max_tokens=4096,
    stream=False,
)

print(chat_response.choices[0].message.content)