# Query the Nemotron Nano 12B v2 VL API

This notebook shows how to query the NVIDIA Nemotron Nano 12B v2 VL API from Python.
It covers launching the NIM container, sending a basic vision request, streaming a response, enabling reasoning, and running text-only queries.

Reference: [NVIDIA Docs](https://docs.nvidia.com/nim/vision-language-models/latest/examples/nemotron-nano-12b-v2-vl/api.html)

## 1. Launch NIM

Before running the Python code, launch the Nemotron Nano 12B v2 VL NIM container.
Run the following commands in a terminal. They assume that `NGC_API_KEY` is already set in your environment:

```bash
# Choose a container name for bookkeeping
export CONTAINER_NAME="nvidia-nemotron-nano-12b-v2-vl"
# Repository name and tag of the NIM image on NGC
REPOSITORY="nemotron-nano-12b-v2-vl"
LATEST_TAG="1.5.0"

# Choose a VLM NIM Image from NGC
export IMG_NAME="nvcr.io/nim/nvidia/${REPOSITORY}:${LATEST_TAG}"

# Choose a path on your system to cache the downloaded models 
export LOCAL_NIM_CACHE=~/.cache/nim 
mkdir -p "$LOCAL_NIM_CACHE" 

# Start the VLM NIM 
docker run -it --rm --name=$CONTAINER_NAME \
  --runtime=nvidia \
  --gpus all \
  --shm-size=32GB \
  -e NGC_API_KEY=$NGC_API_KEY \
  -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
  -u $(id -u) \
  -p 8000:8000 \
  $IMG_NAME
```
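The container may need some time to download and load the model before it accepts requests. The following is a minimal sketch of a readiness check, assuming the NIM exposes a health endpoint at `/v1/health/ready` on the port mapped above. The helper names `http_status` and `wait_until_ready`, the retry count, and the delay are illustrative choices, not part of the NIM API.

```python
import time
import urllib.error
import urllib.request


def http_status(url, timeout=5.0):
    """Return the HTTP status code of a GET request, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None  # connection refused, timeout, DNS failure, ...


def wait_until_ready(url="http://0.0.0.0:8000/v1/health/ready",  # assumed endpoint path
                     attempts=60, delay=10.0,
                     fetch=http_status, sleep=time.sleep):
    """Poll `url` until it returns HTTP 200. Return True on success, False otherwise."""
    for attempt in range(attempts):
        if fetch(url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay)  # wait before the next attempt
    return False
```

Calling `wait_until_ready()` in a notebook cell before the first request blocks until the endpoint answers or the attempts run out. The `fetch` and `sleep` parameters exist only so the loop can be exercised without a running server.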

## 2. Install Dependencies

In [None]:
!pip install openai requests

## 3. Basic Vision Query

Send a chat completion request that contains a text prompt and an image URL, then print the model's description of the image.

In [2]:
from openai import OpenAI

client = OpenAI(base_url="http://0.0.0.0:8000/v1", api_key="not-used")

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What is in this image?"
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
                }
            }
        ]
    }
]

chat_response = client.chat.completions.create(
    model="nvidia/nemotron-nano-12b-v2-vl",
    messages=messages,
    max_tokens=256,
    stream=False,
)

assistant_message = chat_response.choices[0].message
print(assistant_message.content)

This image captures a serene natural landscape, likely taken in a rural or semi-rural area. The boardwalk is a prominent feature, suggesting it's a place where people can walk and enjoy the natural surroundings, possibly for recreational or educational purposes. The tall grasses indicate a wetland or meadow ecosystem, which could be home to a variety of wildlife. The clear sky and lighting conditions suggest the photo was taken on a day with good weather, possibly in the morning or late afternoon when the light is softer. The absence of people and animals gives the scene a peaceful and untouched quality. Overall, the image conveys a sense of tranquility and the beauty of natural landscapes.
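The request above passes a public image URL. For an image stored on disk, one common approach with OpenAI-compatible vision endpoints is to embed the file as a base64 data URL in the same `image_url` field. The sketch below assumes this NIM accepts such data URLs. The helper names `image_to_data_url` and `local_image_message` are hypothetical.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path):
    """Encode a local image file as a base64 data URL."""
    path = Path(path)
    mime_type, _ = mimetypes.guess_type(path.name)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {path.name!r}")
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def local_image_message(prompt, path):
    """Build a user message shaped like the entries of the `messages` list above."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_to_data_url(path)}},
        ],
    }
```

A list such as `[local_image_message("What is in this image?", "photo.jpg")]` can then be passed as `messages` to `client.chat.completions.create`, where `photo.jpg` stands for any local image file.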



## 4. Streaming Response

Set `stream=True` to receive the response incrementally and print each piece of text as it arrives.

In [3]:
stream = client.chat.completions.create(
    model="nvidia/nemotron-nano-12b-v2-vl",
    messages=messages,
    max_tokens=1024,
    stream=True,
)

print("Streaming response:")
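for chunk in stream:
    # Skip chunks that carry no choices (some servers send these, e.g. for usage data)
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    if delta and delta.content:
        print(delta.content, end="", flush=True)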
print()

Streaming response:
This image shows a wooden boardwalk extending from the bottom of the frame into a grassy field. The boardwalk is bordered by tall, green grass, with some golden and reddish-brown plants visible on the right side. The grass is lush and extends to the horizon, where a line of trees marks the landscape's edge. Above, the sky is a vibrant blue, dotted with wispy white clouds. The scene is peaceful and untouched, conveying a sense of calm and natural beauty.
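The loop above only prints the streamed text. If the full response is also needed afterwards, the deltas can be accumulated while they are displayed. The sketch below uses a hypothetical helper, `collect_stream`, and demonstrates it offline with stand-in objects that mimic only the `choices[0].delta.content` shape of a streamed chunk. They are not real API responses.

```python
from types import SimpleNamespace


def collect_stream(chunks, on_text=None):
    """Concatenate the text deltas of a streamed chat completion."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:  # tolerate chunks without choices
            continue
        text = chunk.choices[0].delta.content
        if text:
            parts.append(text)
            if on_text is not None:
                on_text(text)  # e.g. print the piece as it arrives
    return "".join(parts)


def _fake_chunk(text):
    """Stand-in for a streamed chunk, used only for the offline demonstration."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo = [
    _fake_chunk("A wooden "),
    _fake_chunk(None),              # chunk with an empty delta
    SimpleNamespace(choices=[]),    # chunk without choices
    _fake_chunk("boardwalk."),
]
print(collect_stream(demo))  # prints "A wooden boardwalk."
```

With a live stream, `full_text = collect_stream(stream, on_text=lambda t: print(t, end="", flush=True))` would print incrementally and keep the complete text.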



## 5. Reasoning Capability

Enable reasoning by including `/think` in the system prompt. Note that reasoning is not supported for video inputs.

In [4]:
reasoning_messages = [
    { "role": "system", "content": "/think" },
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What is in this image?"
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
                }
            }
        ]
    }
]

chat_response = client.chat.completions.create(
    model="nvidia/nemotron-nano-12b-v2-vl",
    messages=reasoning_messages,
    max_tokens=4096,
    stream=False,
)

print(chat_response.choices[0].message.content)

The image shows a serene natural landscape with a wooden boardwalk (or pier) extending through a field of tall, green grass. The grassy area is surrounded by trees and bushes, with some vegetation having reddish or orange hues. The sky is bright blue with scattered white clouds, suggesting a sunny day. The scene appears to be a wetland or meadow environment, with the boardwalk providing a path through the lush greenery.  
In summary, the image contains a wooden boardwalk, tall green grass, trees/bushes in the background, and a clear blue sky with clouds.  
The answer is A wooden boardwalk through a field of tall green grass, with trees and bushes in the background under a bright blue sky with clouds.
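Since the reasoning request differs from the basic vision request only by the `/think` system message, the message list can be built by a small helper. This is a sketch for convenience. The function name `build_messages` and its parameters are hypothetical and not part of the API.

```python
def build_messages(prompt, image_url=None, reasoning=False):
    """Build a chat `messages` list, optionally with an image and the /think system prompt."""
    if image_url is None:
        content = prompt  # no image: plain string content
    else:
        content = [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_url}},
        ]
    messages = [{"role": "user", "content": content}]
    if reasoning:
        # The system prompt "/think" enables reasoning, as described above.
        messages.insert(0, {"role": "system", "content": "/think"})
    return messages
```

The result can be passed directly as `messages` to `client.chat.completions.create`, so the same prompt can be sent with and without reasoning by toggling one argument.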


## 6. Text-only Queries

The model can also be used as a standard LLM for text-only tasks.

In [5]:
text_messages = [
    { "role": "system", "content": "You are a helpful assistant" },
    { "role": "user", "content": "Create a detailed itinerary for a week-long adventure trip through Southeast Asia." }
]

chat_response = client.chat.completions.create(
    model="nvidia/nemotron-nano-12b-v2-vl",
    messages=text_messages,
    max_tokens=4096,
    stream=False,
)

print(chat_response.choices[0].message.content)

**Southeast Asia Adventure Itinerary: Week-Long Trip**

**Day 1: Bangkok, Thailand**  
- **Morning**: Arrival and check-in at a boutique hotel.  
- **Afternoon**: Visit the Grand Palace and Wat Pho (Temple of the Reclining Buddha).  
- **Evening**: Sunset boat ride along the Chao Phraya River, followed by a dinner at a riverside restaurant.  

**Day 2: Bangkok**  
- **Morning**: Explore Wat Arun (Temple of Dawn) and the historic Thonburi district.  
- **Afternoon**: Day trip to Ayutthaya (UNESCO World Heritage Site) to explore ancient temples.  
- **Evening**: Return to Bangkok; enjoy a street food tour in Khlong Toei or Sukhumvit.  

**Day 3: Bangkok to Chiang Mai, Thailand**  
- **Morning**: Flight to Chiang Mai (2.5 hours). Check-in at a hillside resort.  
- **Afternoon**: Visit the historic Old City, including Wat Phra Singh and the Three Kings Monument.  
- **Evening**: Relax at a local night market or enjoy a traditional Thai massage.  

**Day 4: Chiang Mai (Nature & Culture)**  