Llama3.2-1B ONNX Graph generated by olive auto-opt fails to run on DirectML execution provider #1892

Open
@RachidFK

Description

The Olive-optimized ONNX graph of Llama3.2-1B, generated with the `olive auto-opt` command from the getting-started example, fails to run on the DirectML execution provider with the following error:

2025-05-29 15:58:14.0223186 [E:onnxruntime:onnxruntime-genai, inference_session.cc:2094 onnxruntime::InferenceSession::Initialize] This session cannot use the graph capture feature as requested by the user  as all compute graph nodes have not been partitioned to the DmlExecutionProvider
Traceback (most recent call last):
  File "C:\Users\rashi\dev\rai14\RyzenAI-SW-14\llm\gpu_run\run_model.py", line 140, in <module>
    model = model_load(args.model_dir)
  File "C:\Users\rashi\dev\rai14\RyzenAI-SW-14\llm\gpu_run\run_model.py", line 31, in model_load
    model = og.Model(model_dir)
RuntimeError: This session cannot use the graph capture feature as requested by the user  as all compute graph nodes have not been partitioned to the DmlExecutionProvider

Could you please assist in debugging this error?
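The error indicates the exported model requests the graph capture feature, which DML only supports when every node in the graph is assigned to the DmlExecutionProvider. One workaround worth trying is to turn graph capture off in the generated `genai_config.json` so the session can fall back to mixed CPU/DML execution. The key name `enable_graph_capture` below is an assumption; inspect the config file Olive produced and adjust the name if your version uses a different one. A minimal sketch:

```python
import json
from pathlib import Path


def disable_graph_capture(config_path: str) -> bool:
    """Flip any graph-capture flag found in genai_config.json to off.

    NOTE: the key name 'enable_graph_capture' is an assumption; check the
    config file generated by your Olive / onnxruntime-genai versions.
    Returns True if a flag was found and changed.
    """
    path = Path(config_path)
    config = json.loads(path.read_text())
    changed = False

    def walk(node):
        # Recursively search nested dicts/lists for the flag
        nonlocal changed
        if isinstance(node, dict):
            for key, value in node.items():
                if key == "enable_graph_capture" and value not in (False, "0"):
                    node[key] = False
                    changed = True
                else:
                    walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(config)
    if changed:
        path.write_text(json.dumps(config, indent=2))
    return changed
```

After patching the config, re-run the model; if the session then initializes, the remaining issue is which operators fell back to CPU.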

I used the same commands as in the getting-started example, changing only two arguments (`--device` and `--provider`) to target the DML EP:

Optimizing the model runs successfully:

olive auto-opt \
    --model_name_or_path meta-llama/Llama-3.2-1B-Instruct \
    --trust_remote_code \
    --output_path models/llama/ao \
    --device gpu \
    --provider DmlExecutionProvider \
    --use_ort_genai \
    --precision int4 \
    --log_level 1

The script used to run the model fails with the error pasted above:

import onnxruntime_genai as og

def model_load(model_dir: str):
    model = og.Model(model_dir)
    return model

def get_tokenizer(model):
    tokenizer = og.Tokenizer(model)
    tokenizer_stream = tokenizer.create_stream()

    return tokenizer, tokenizer_stream



def run():
    prompt = "The sun is shining outside."

    # Load the model first; og.Tokenizer and og.GeneratorParams expect an
    # og.Model instance, not a path string
    model = model_load("path/to/olive/optimized/model")

    tokenizer, tokenizer_stream = get_tokenizer(model)

    input_tokens = tokenizer.encode(prompt)

    search_options = {'do_sample': True}

    params = og.GeneratorParams(model)
    params.set_search_options(**search_options)

    print("Creating generator")
    generator = og.Generator(model, params)

    generator.append_tokens(input_tokens)

    num_tokens = 0

    try:
        while not generator.is_done():
            generator.generate_next_token()
            
            new_token = generator.get_next_tokens()[0]
            
            print(tokenizer_stream.decode(new_token), end='', flush=True)

            num_tokens += 1

    except KeyboardInterrupt:
        print("  --control+c pressed, aborting generation--")

run()
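To find out which nodes were not partitioned to the DmlExecutionProvider, one option is to load the plain ONNX model with `onnxruntime.InferenceSession` (providers=`["DmlExecutionProvider", "CPUExecutionProvider"]`) with `sess_options.log_severity_level = 0` and capture the session-initialization log, which reports node placements. Since the exact wording of those log lines varies between onnxruntime versions, a rough helper that just counts provider names in captured log lines might look like this (the example log lines in the usage below are illustrative, not real output):

```python
import re
from collections import Counter


def summarize_ep_placements(log_lines):
    """Count how often each execution provider is mentioned in captured
    ORT session-initialization log lines.

    This is a rough diagnostic: ORT's verbose log format differs across
    versions, so the regex only looks for '<Name>ExecutionProvider' tokens.
    Mentions of CPUExecutionProvider suggest nodes that fell back to CPU.
    """
    counts = Counter()
    for line in log_lines:
        for provider in re.findall(r"\b(\w+ExecutionProvider)\b", line):
            counts[provider] += 1
    return counts
```

For example, feeding it captured lines such as `"Node 'Attn_0' assigned to CPUExecutionProvider"` would surface the operators (often int4/quantization-related ones) that the DML EP could not take, which is useful detail to include when reporting the issue.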

System configuration:

  • HW: AMD Ryzen AI 9 365 w/ Radeon 880M
  • GPU Driver: 32.0.21001.9024
  • OS: Windows 11

Conda Environment:

Package                    Version
-------------------------- -----------
accelerate                 1.7.0
alembic                    1.16.1
annotated-types            0.7.0
bitsandbytes               0.46.0
certifi                    2025.4.26
charset-normalizer         3.4.2
colorama                   0.4.6
coloredlogs                15.0.1
colorlog                   6.9.0
filelock                   3.18.0
flatbuffers                25.2.10
fsspec                     2025.5.1
greenlet                   3.2.2
huggingface-hub            0.32.2
humanfriendly              10.0
idna                       3.10
Jinja2                     3.1.6
lightning-utilities        0.14.3
Mako                       1.3.10
MarkupSafe                 3.0.2
ml_dtypes                  0.5.1
mpmath                     1.3.0
networkx                   3.4.2
numpy                      1.26.4
olive-ai                   0.10.0.dev0
onnx                       1.18.0
onnxruntime-directml       1.22.0
onnxruntime-genai-directml 0.8.0
onnxscript                 0.2.7
optimum                    1.25.3
optuna                     4.3.0
packaging                  25.0
pandas                     2.2.3
peft                       0.15.2
pip                        25.1
protobuf                   6.31.1
psutil                     7.0.0
pydantic                   2.11.5
pydantic_core              2.33.2
pyreadline3                3.5.4
python-dateutil            2.9.0.post0
pytz                       2025.2
PyYAML                     6.0.2
regex                      2024.11.6
requests                   2.32.3
safetensors                0.5.3
scipy                      1.15.3
setuptools                 78.1.1
six                        1.17.0
SQLAlchemy                 2.0.41
sympy                      1.14.0
tokenizers                 0.19.1
tomli                      2.2.1
torch                      2.7.0
torchmetrics               1.7.2
tqdm                       4.67.1
transformers               4.44.2
typing_extensions          4.13.2
typing-inspection          0.4.1
tzdata                     2025.2
urllib3                    2.4.0
wheel                      0.45.1
