Description
The Olive-optimized ONNX graph of Llama-3.2-1B, generated with the `olive auto-opt` command from the getting started example, fails to run on the DirectML execution provider with the following error:
```
2025-05-29 15:58:14.0223186 [E:onnxruntime:onnxruntime-genai, inference_session.cc:2094 onnxruntime::InferenceSession::Initialize] This session cannot use the graph capture feature as requested by the user as all compute graph nodes have not been partitioned to the DmlExecutionProvider
Traceback (most recent call last):
  File "C:\Users\rashi\dev\rai14\RyzenAI-SW-14\llm\gpu_run\run_model.py", line 140, in <module>
    model = model_load(args.model_dir)
  File "C:\Users\rashi\dev\rai14\RyzenAI-SW-14\llm\gpu_run\run_model.py", line 31, in model_load
    model = og.Model(model_dir)
RuntimeError: This session cannot use the graph capture feature as requested by the user as all compute graph nodes have not been partitioned to the DmlExecutionProvider
```
Could you please assist in debugging this error?
I used the same commands as the getting started example, changing only two arguments (`--device`, `--provider`) to run on the DML EP.
Optimizing the model runs successfully:
```shell
olive auto-opt \
    --model_name_or_path meta-llama/Llama-3.2-1B-Instruct \
    --trust_remote_code \
    --output_path models/llama/ao \
    --device gpu \
    --provider DmlExecutionProvider \
    --use_ort_genai \
    --precision int4 \
    --log_level 1
```
The script used to run the model fails with the error pasted above:
```python
import onnxruntime_genai as og


def model_load(model_dir: str):
    model = og.Model(model_dir)
    return model


def get_tokenizer(model):
    tokenizer = og.Tokenizer(model)
    tokenizer_stream = tokenizer.create_stream()
    return tokenizer, tokenizer_stream


def run():
    prompt = "The sun is shining outside."
    model = model_load("path/to/olive/optimized/model")
    tokenizer, tokenizer_stream = get_tokenizer(model)
    input_tokens = tokenizer.encode(prompt)

    search_options = {}
    search_options['do_sample'] = True
    params = og.GeneratorParams(model)
    params.set_search_options(**search_options)

    print("Creating generator")
    generator = og.Generator(model, params)
    generator.append_tokens(input_tokens)

    num_tokens = 0
    try:
        while not generator.is_done():
            generator.generate_next_token()
            new_token = generator.get_next_tokens()[0]
            print(tokenizer_stream.decode(new_token), end='', flush=True)
            num_tokens += 1
    except KeyboardInterrupt:
        print(" --control+c pressed, aborting generation--")


run()
```
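One thing I checked while debugging: with `--use_ort_genai`, Olive emits a `genai_config.json` alongside the model, and onnxruntime-genai reads its session and provider options from there. Below is a minimal sketch for inspecting those options; the config layout (`model.decoder.session_options.provider_options`) is an assumption based on typical Olive/GenAI output and may differ by version, and the demo config is a hypothetical stand-in, not my actual file.

```python
import json
import os
import tempfile


def read_provider_options(model_dir: str):
    """Return the provider_options list from a model's genai_config.json.

    The nesting below (model -> decoder -> session_options) is assumed from
    typical genai_config.json files and may vary across tool versions.
    """
    with open(os.path.join(model_dir, "genai_config.json")) as f:
        config = json.load(f)
    session = config["model"]["decoder"]["session_options"]
    return session.get("provider_options", [])


# Demo with a hypothetical config (stand-in for the real Olive output):
with tempfile.TemporaryDirectory() as d:
    sample = {
        "model": {
            "decoder": {
                "session_options": {"provider_options": [{"dml": {}}]}
            }
        }
    }
    with open(os.path.join(d, "genai_config.json"), "w") as f:
        json.dump(sample, f)
    print(read_provider_options(d))  # prints [{'dml': {}}]
```

If the printed options do not mention DML at all, the session would silently fall back to CPU, which could explain the partitioning complaint.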
System configuration:
- HW: AMD Ryzen AI 9 365 w/ Radeon 880M
- GPU Driver: 32.0.21001.9024
- OS: Windows 11
Conda Environment:
```
Package                    Version
-------------------------- -----------
accelerate                 1.7.0
alembic                    1.16.1
annotated-types            0.7.0
bitsandbytes               0.46.0
certifi                    2025.4.26
charset-normalizer         3.4.2
colorama                   0.4.6
coloredlogs                15.0.1
colorlog                   6.9.0
filelock                   3.18.0
flatbuffers                25.2.10
fsspec                     2025.5.1
greenlet                   3.2.2
huggingface-hub            0.32.2
humanfriendly              10.0
idna                       3.10
Jinja2                     3.1.6
lightning-utilities        0.14.3
Mako                       1.3.10
MarkupSafe                 3.0.2
ml_dtypes                  0.5.1
mpmath                     1.3.0
networkx                   3.4.2
numpy                      1.26.4
olive-ai                   0.10.0.dev0
onnx                       1.18.0
onnxruntime-directml       1.22.0
onnxruntime-genai-directml 0.8.0
onnxscript                 0.2.7
optimum                    1.25.3
optuna                     4.3.0
packaging                  25.0
pandas                     2.2.3
peft                       0.15.2
pip                        25.1
protobuf                   6.31.1
psutil                     7.0.0
pydantic                   2.11.5
pydantic_core              2.33.2
pyreadline3                3.5.4
python-dateutil            2.9.0.post0
pytz                       2025.2
PyYAML                     6.0.2
regex                      2024.11.6
requests                   2.32.3
safetensors                0.5.3
scipy                      1.15.3
setuptools                 78.1.1
six                        1.17.0
SQLAlchemy                 2.0.41
sympy                      1.14.0
tokenizers                 0.19.1
tomli                      2.2.1
torch                      2.7.0
torchmetrics               1.7.2
tqdm                       4.67.1
transformers               4.44.2
typing_extensions          4.13.2
typing-inspection          0.4.1
tzdata                     2025.2
urllib3                    2.4.0
wheel                      0.45.1
```