# Customizing Llama-3.3-Nemotron-Super-49B-v1.5 with NeMo Customizer

This notebook shows how to fine-tune the Llama-3.3-Nemotron-Super-49B-v1.5 model with NeMo Customizer. It is a minimal end-to-end example that covers deployment, data upload, LoRA fine-tuning, and inference.

## Step 0. Install Prerequisites

In [1]:
import os
from getpass import getpass

os.environ['NGC_API_KEY'] = getpass("Enter your NGC API Key")

Enter your NGC API Key ········


In [2]:
os.environ['HUGGINGFACE_HUB_TOKEN'] = getpass("Enter your Huggingface token")

Enter your Huggingface token ········
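Re-running the two cells above prompts for both secrets again. The following is a minimal sketch of a helper that prompts only when the variable is not already set. `ensure_env` is a hypothetical name introduced here for illustration, not part of any library.

```python
import os
from getpass import getpass


def ensure_env(name, prompt, prompt_fn=getpass):
    """Return os.environ[name], prompting for it only if it is unset or empty.

    `prompt_fn` is injectable so the helper can be exercised without a terminal.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        value = prompt_fn(prompt).strip()
        if not value:
            raise ValueError(f"{name} must not be empty")
        os.environ[name] = value
    return value
```

A call such as `ensure_env("NGC_API_KEY", "Enter your NGC API Key")` would then be safe to re-run, since it keeps a value that was already exported in the shell or entered earlier in the session.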


In [3]:
%%bash
chmod +x ./deploy-nmp-2510_nemotron.sh
./deploy-nmp-2510_nemotron.sh --progress

[1;32m[INFO][0m Starting NeMo Microservices deployment...
[1;32m[INFO][0m Detailed logs will be written to: /tmp/nemo-deploy.log

[1;32m[INFO][0m Step 1/8: Checking prerequisites [██████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░] 12%
[1;32m[INFO][0m Step 2/8: Downloading Helm chart [████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░] 25%
[1;32m[INFO][0m Step 3/8: Starting Minikube [██████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░] 37%
[1;32m[INFO][0m Step 4/8: Setting up NGC and Helm [█████████████████████████░░░░░░░░░░░░░░░░░░░░░░░░░] 50%
[1;32m[INFO][0m Step 5/8: Installing NeMo microservices [███████████████████████████████░░░░░░░░░░░░░░░░░░░] 62%
[1;32m[INFO][0m Step 6/8: Waiting for pods [█████████████████████████████████████░░░░░░░░░░░░░] 75%
[1;32m[INFO][0m Step 7/8: Checking pod health [███████████████████████████████████████████░░░░░░░] 87%
[1;32m[INFO][0m Step 8/8: Configuring DNS [██████████████████████████████████████████████████] 100%
[1;32m[INFO]

In [15]:
%%bash
source .venv/bin/activate
python -m ensurepip --upgrade
python -m pip install --upgrade pip setuptools wheel
pip install nemo-microservices==1.1.0 huggingface-hub==0.34.4

Looking in links: /tmp/tmp7ur17iwq
Collecting pip
  Downloading pip-25.2-py3-none-any.whl.metadata (4.7 kB)
Collecting wheel
  Downloading wheel-0.45.1-py3-none-any.whl.metadata (2.3 kB)
Downloading pip-25.2-py3-none-any.whl (1.8 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.8/1.8 MB[0m [31m41.3 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading wheel-0.45.1-py3-none-any.whl (72 kB)
Installing collected packages: wheel, pip
  Attempting uninstall: pip
    Found existing installation: pip 25.0.1
    Uninstalling pip-25.0.1:
      Successfully uninstalled pip-25.0.1
Successfully installed pip-25.2 wheel-0.45.1
/home/ubuntu/.venv/bin/pip
Collecting nemo-microservices==1.1.0
  Using cached nemo_microservices-1.1.0-py3-none-any.whl.metadata (17 kB)
Collecting huggingface-hub==0.34.4
  Using cached huggingface_hub-0.34.4-py3-none-any.whl.metadata (14 kB)
Collecting distro<2,>=1.7.0 (from nemo-microservices==1.1.0)
  Downloading distro-1.9.0-py3-none-any.whl.metadata (6.8
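The install cell runs in a `%%bash` subshell that activates `.venv`, so the notebook kernel only sees these packages if it uses the same environment. As a hedged sketch, the pinned versions can be checked from the kernel with the standard-library `importlib.metadata`. The helper names `installed_version` and `check_pins` are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_pins(pins):
    """Map each package to (expected, found) for pins that do not match."""
    mismatches = {}
    for package, expected in pins.items():
        found = installed_version(package)
        if found != expected:
            mismatches[package] = (expected, found)
    return mismatches
```

Calling `check_pins({"nemo-microservices": "1.1.0", "huggingface-hub": "0.34.4"})` returns an empty dictionary when both pins from the install command are satisfied in the kernel's environment.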

## Step 1. Initialize Client

In [16]:
from nemo_microservices import NeMoMicroservices

# Configure microservice host URLs
NEMO_BASE_URL = "http://nemo.test"
NIM_BASE_URL = "http://nim.test"
DATA_STORE_BASE_URL = "http://data-store.test"

# Initialize the client
nemo_client = NeMoMicroservices(
    base_url=NEMO_BASE_URL,
    inference_base_url=NIM_BASE_URL
)

## Step 2. Upload Data

In [18]:
from huggingface_hub import HfApi

# Define entity details
NAMESPACE = "nemotron-tutorial"
DATASET_NAME = "example-dataset"

# Initialize HF API client
hf_api = HfApi(endpoint=f"{DATA_STORE_BASE_URL}/v1/hf", token="")

# Create dataset repo in datastore
repo_id = f"{NAMESPACE}/{DATASET_NAME}"
hf_api.create_repo(repo_id, repo_type="dataset")

# Upload the dataset
hf_api.upload_file(
    repo_type="dataset",
    repo_id=repo_id,
    revision="main",
    path_or_fileobj="./dataset/training.jsonl",
    path_in_repo="training/training.jsonl",
)

hf_api.upload_file(
    repo_type="dataset",
    repo_id=repo_id,
    revision="main",
    path_or_fileobj="./dataset/validation.jsonl",
    path_in_repo="validation/validation.jsonl",
)

hf_api.upload_file(
    repo_type="dataset",
    repo_id=repo_id,
    revision="main",
    path_or_fileobj="./dataset/testing.jsonl",
    path_in_repo="testing/testing.jsonl",
)

training.jsonl: 100%|██████████| 633k/633k [00:00<00:00, 66.8MB/s]
validation.jsonl: 100%|██████████| 77.5k/77.5k [00:00<00:00, 10.9MB/s]
testing.jsonl: 100%|██████████| 82.6k/82.6k [00:00<00:00, 16.4MB/s]


CommitInfo(commit_url='', commit_message='Upload testing/testing.jsonl with huggingface_hub', commit_description='', oid='fe943486843c86e9077f6e6a9049d72d18734779', pr_url=None, repo_url=RepoUrl('', endpoint='https://huggingface.co', repo_type='model', repo_id=''), pr_revision=None, pr_num=None)
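A malformed line in any of the three `.jsonl` files would only surface later, once the training job reads the data. The sketch below checks that every non-blank line parses as a JSON object. It deliberately does not check field names, because this notebook does not state the record schema. `validate_jsonl` is a hypothetical helper.

```python
import json


def validate_jsonl(lines):
    """Check that each non-blank line is a JSON object.

    Returns (record_count, errors), where errors is a list of
    (line_number, message) tuples. Field names are not checked because
    the expected schema is not stated in this notebook.
    """
    count = 0
    errors = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            errors.append((number, "record is not a JSON object"))
            continue
        count += 1
    return count, errors
```

Because the function accepts any iterable of strings, an open file handle works directly, for example `validate_jsonl(open("./dataset/training.jsonl", encoding="utf-8"))`.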

In [19]:
# Register Dataset in NeMo Entity Store
response = nemo_client.datasets.create(
    name=DATASET_NAME,
    namespace=NAMESPACE,
    description="test dataset",
    files_url=f"hf://datasets/{NAMESPACE}/{DATASET_NAME}",
    project="customizer-tutorial",
    custom_fields={},
)
print(response)

Dataset(files_url='hf://datasets/nemotron-tutorial/example-dataset', id='dataset-GmW7sSZDkaNUjKMAvE9Hqx', created_at=datetime.datetime(2025, 10, 22, 0, 21, 2, 168799), custom_fields={}, description='test dataset', format=None, hf_endpoint=None, limit=None, name='example-dataset', namespace='nemotron-tutorial', project='customizer-tutorial', split=None, updated_at=datetime.datetime(2025, 10, 22, 0, 21, 2, 168802))


## Step 3. Run Fine Tuning

In [20]:
# Get all customization configurations
configs = nemo_client.customization.configs.list()

print(f"Found {len(configs.data)} configurations")
for config in configs.data:
    print(f"Config namespace: {config.namespace}")
    print(f"Config name: {config.name}")
    print(f"  Training options: {len(config.training_options)}")
    for option in config.training_options:
        print(f"    - {option.training_type}/{option.finetuning_type}: {option.num_gpus} GPUs")

Found 1 configurations
Config namespace: nvidia
Config name: nemotron-super-llama-3.3-49b@v1.5+A100
  Training options: 1
    - sft/lora: 4 GPUs
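When a deployment exposes more than one configuration, the listing can be filtered rather than read by eye. This is a minimal sketch that relies only on the attributes printed by the cell above (`namespace`, `name`, `training_options`, `training_type`, `finetuning_type`, `num_gpus`). `find_configs` and the stand-in objects are hypothetical; real code would pass `configs.data`.

```python
from types import SimpleNamespace


def find_configs(configs, training_type, finetuning_type, max_gpus=None):
    """Return 'namespace/name' for configs offering the requested option."""
    matches = []
    for config in configs:
        for option in config.training_options:
            if option.training_type != training_type:
                continue
            if option.finetuning_type != finetuning_type:
                continue
            if max_gpus is not None and option.num_gpus > max_gpus:
                continue
            matches.append(f"{config.namespace}/{config.name}")
            break  # one matching option is enough for this config
    return matches


# Stand-in objects mirroring the output above, for illustration only.
demo_configs = [
    SimpleNamespace(
        namespace="nvidia",
        name="nemotron-super-llama-3.3-49b@v1.5+A100",
        training_options=[
            SimpleNamespace(training_type="sft", finetuning_type="lora", num_gpus=4)
        ],
    )
]
print(find_configs(demo_configs, "sft", "lora"))
```

The returned `namespace/name` string has the same form as the `config` argument passed to the job creation call in this step.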


In [21]:
# OPTIONAL: set up WANDB key if you have it
os.environ['WANDB_API_KEY'] = getpass("Enter your WandB API Key")

Enter your WandB API Key ········


Create the fine-tuning job:

In [24]:
# Set up WandB API key for enhanced visualization
extra_headers = {}
if os.getenv('WANDB_API_KEY'):
    extra_headers['wandb-api-key'] = os.getenv('WANDB_API_KEY')

# Create a customization job with W&B integration
job = nemo_client.customization.jobs.create(
    config="nvidia/nemotron-super-llama-3.3-49b@v1.5+A100",
    dataset={
        "name": DATASET_NAME,
        "namespace": NAMESPACE
    },
    hyperparameters={
        "training_type": "sft",
        "finetuning_type": "lora",
        "epochs": 1,
        "batch_size": 16,
        "learning_rate": 0.0001,
        "lora": {
            "adapter_dim": 8
        }
    },
    output_model="nvidia/nemotron-super-lora@v1",
    extra_headers=extra_headers
)

print("Created job with W&B integration:")
print(f"Job ID: {job.id}")
print(f"Status: {job.status}")

Created job with W&B integration:
Job ID: cust-ULHUmRxu6kW8YekzzufuFv
Status: created
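To give a feel for what `adapter_dim` controls, the sketch below counts the trainable parameters LoRA adds to a single weight matrix, assuming `adapter_dim` corresponds to the LoRA rank. For a weight of shape d_out x d_in, LoRA trains two low-rank factors with rank * (d_in + d_out) parameters in total. The 8192 x 8192 shape is purely illustrative and is not read from the model's configuration.

```python
def lora_trainable_params(d_in, d_out, rank):
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    LoRA learns B (d_out x rank) and A (rank x d_in); the frozen base
    weight contributes nothing to the trainable count.
    """
    return rank * (d_in + d_out)


# Illustrative shape only; not read from the model's configuration.
d_in, d_out, rank = 8192, 8192, 8
added = lora_trainable_params(d_in, d_out, rank)
full = d_in * d_out
print(f"LoRA adds {added:,} parameters vs {full:,} in the full matrix "
      f"({added / full:.2%})")
```

For this illustrative shape the adapter is roughly 0.2% of the size of the full matrix. The count grows linearly with the rank, so doubling `adapter_dim` doubles the adapter's parameter count.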


In [27]:
nemo_client.customization.jobs.status(job.id)

CustomizationStatusDetails(created_at=datetime.datetime(2025, 10, 22, 0, 25, 36, 464597), status='running', updated_at=datetime.datetime(2025, 10, 22, 0, 25, 36, 464597), best_epoch=None, elapsed_time=0.0, epochs_completed=0, metrics=None, percentage_done=0.0, status_logs=[StatusLog(updated_at=datetime.datetime(2025, 10, 22, 0, 25, 36), detail=None, message='PVCCreated'), StatusLog(updated_at=datetime.datetime(2025, 10, 22, 0, 25, 36), detail=None, message='EntityHandler_0_Created'), StatusLog(updated_at=datetime.datetime(2025, 10, 22, 0, 25, 36, 464597), detail=None, message='created'), StatusLog(updated_at=datetime.datetime(2025, 10, 22, 0, 25, 36, 464597), detail='The training job is pending', message='TrainingJobPending'), StatusLog(updated_at=datetime.datetime(2025, 10, 22, 0, 25, 46), detail=None, message='EntityHandler_0_Pending'), StatusLog(updated_at=datetime.datetime(2025, 10, 22, 0, 25, 46), detail=None, message='EntityHandler_0_Completed'), StatusLog(updated_at=datetime.dat

In [53]:
import time

while True:
    status = nemo_client.customization.jobs.status(job.id)
    if status.status == "completed" or status.status == "failed":
        break
    time.sleep(5)

print(status.status)

completed


Wait for training to complete before moving on to inference.
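The polling loop above runs forever if the job never reaches `completed` or `failed`. A hedged sketch of the same loop with a timeout follows. `wait_for_status` is a hypothetical helper, and the default interval and timeout are arbitrary choices rather than recommended values.

```python
import time


def wait_for_status(fetch_status, terminal=("completed", "failed"),
                    interval=5.0, timeout=3600.0, sleep=time.sleep,
                    clock=time.monotonic):
    """Poll `fetch_status()` until it returns a terminal value.

    Raises TimeoutError if no terminal status is seen within `timeout`
    seconds. `sleep` and `clock` are injectable for testing.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status in terminal:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"still '{status}' after {timeout} seconds")
        sleep(interval)
```

With the client from Step 1, the call would look like `wait_for_status(lambda: nemo_client.customization.jobs.status(job.id).status)`. The same helper could wait on a model deployment by passing a function that returns its status and `terminal=("ready",)`.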

## Step 4. Run Inference

In [29]:
# Deploy the base model NIM with the NeMo Deployment Management Service
deployment = nemo_client.deployment.model_deployments.create(
    name="nemotron-super-llama-3.3-49b-v1.5",
    namespace="nvidia",
    config={
        "model": "nvidia/nemotron-super-llama-3.3-49b-v1.5",
        "nim_deployment": {
            "image_name": "nvcr.io/nim/nvidia/llama-3.3-nemotron-super-49b-v1.5",
            "image_tag": "1.13.1",
            "pvc_size": "200Gi",
            "gpu": 4,
            "additional_envs": {
                "NIM_GUIDED_DECODING_BACKEND": "outlines"
            }
        }
    }
)
print(deployment)

ModelDeployment(config=DeploymentConfig(created_at=None, custom_fields=None, description=None, external_endpoint=None, model='nvidia/nemotron-super-llama-3.3-49b-v1.5', name=None, namespace=None, nim_deployment=NIMDeploymentConfig(gpu=4, image_name='nvcr.io/nim/nvidia/llama-3.3-nemotron-super-49b-v1.5', image_tag='1.13.1', additional_envs={'NIM_GUIDED_DECODING_BACKEND': 'outlines'}, disable_lora_support=False, namespace=None, pvc_size='200Gi'), ownership=None, project=None, schema_version=None, updated_at=None), status_details=ModelDeploymentStatusDetails(status='pending', description='Model deployment created'), url='', async_enabled=False, created_at=datetime.datetime(2025, 10, 22, 1, 0, 2, 167944, tzinfo=TzInfo(0)), custom_fields=None, deployed=False, description=None, models=None, name='nemotron-super-llama-3.3-49b-v1.5', namespace='nvidia', ownership=None, project=None, schema_version=None, updated_at=None)


In [30]:
# Using the deployment object from the previous step
deployment_status = nemo_client.deployment.model_deployments.retrieve(
    namespace=deployment.namespace,
    deployment_name=deployment.name
)
print(deployment_status.status_details)

ModelDeploymentStatusDetails(status='ready', description='deployment "modeldeployment-nvidia-nemotron-super-llama-3-3-49b-v1-5" successfully rolled out\n')


Wait until the deployment status becomes 'ready' before proceeding. Deploying a larger model takes 10-20 minutes.

In [31]:
# List all available NIMs for inference by their IDs
available_nims = nemo_client.inference.models.list()
for nim in available_nims.data:
    print(nim.id)

nvidia/nemotron-super-llama-3.3-49b-v1.5
nvidia/nemotron-super-lora@v1


In [52]:
# Inference with base model
response = nemo_client.chat.completions.create(
    model="nvidia/nemotron-super-llama-3.3-49b-v1.5",
    messages=[
        {"role": "system", "content": "/no_think"},
        {"role": "user", "content": "How many 'r's are in 'strawberry'?"}
    ],
    temperature=0.7,
    max_tokens=200,
    stream=False
)
print(response.choices[0].message.content)

Let's go through the word **"strawberry"** step by step to count how many **'r's** are in it.

### Step 1: Write out the word clearly
**strawberry**

### Step 2: Break it down letter by letter
**s - t - r - a - w - b - e - r - r - y**

### Step 3: Identify and count the 'r's
Let's go through each letter and note when we see an **'r'**:

- s → not 'r'
- t → not 'r'
- **r** → 1st 'r'
- a → not 'r'
- w → not 'r'
- b → not 'r'
- e → not 'r'
- **r** → 2nd 'r'
- **r** → 3rd 'r'
- y → not 'r'

### Final Answer:
There are **3 '
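The answer above stops mid-sentence, which is consistent with the `max_tokens=200` limit being reached. The sketch below detects that case programmatically, assuming the response follows the OpenAI-style chat completion format, in which `choices[0].finish_reason` is `"length"` when generation was cut off. That field is an assumption here and is not shown in this notebook's output. `was_truncated` and the stand-in response are hypothetical.

```python
from types import SimpleNamespace


def was_truncated(response):
    """Return True if the first choice stopped because it hit max_tokens.

    Assumes an OpenAI-style response where choices[0].finish_reason is
    "length" on truncation; returns False if the field is absent.
    """
    choice = response.choices[0]
    return getattr(choice, "finish_reason", None) == "length"


# Stand-in response object for illustration; real code would pass the
# object returned by nemo_client.chat.completions.create(...).
demo_response = SimpleNamespace(choices=[SimpleNamespace(finish_reason="length")])
print(was_truncated(demo_response))  # True
```

If the check returns `True`, raising `max_tokens` on the request is the simplest way to get the complete answer.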


In [51]:
# Inference with fine-tuned model
response = nemo_client.chat.completions.create(
    model=job.output_model,
    messages=[
        {"role": "system", "content": "/no_think"},
        {"role": "user", "content": "What institutional structures still exist from medieval times?"}
    ],
    temperature=0.7,
    max_tokens=200,
    stream=False
)
print(response.choices[0].message.content)

Many institutional structures that originated in medieval times continue to exist today, often in evolved or adapted forms. These institutions have played significant roles in shaping modern society in areas such as governance, religion, education, law, and social organization. Here are some key examples:

---

### 1. **Monarchies**
- **Description**: While many medieval monarchies were absolute, some have evolved into constitutional monarchies.
- **Examples**:
  - United Kingdom (House of Windsor)
  - Japan (House of Yamato, one of the oldest continuous hereditary monarchies)
  - Sweden, Norway, Denmark, Netherlands, Belgium, and others in Europe
  - Saudi Arabia (House of Saud, with roots in the 18th century but with strong traditional authority)
- **Modern Role**: Primarily symbolic or ceremonial in most Western countries, but still influential in some others.

---

### 2. **The Catholic Church**
- **Description**: The Roman Catholic Church was a dominant institution in
