
com.microsoft.Attention do_rotary flag doesn't work on Apple Silicon #24528

@nklskyoy

Describe the issue

The com.microsoft.Attention contrib operator defines a do_rotary attribute (per its schema), but on Apple Silicon (ARM64) with onnxruntime-silicon, setting do_rotary=1 does not change the output: the result is identical to the one produced with do_rotary=0.
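
For context on why the outputs should differ, here is a minimal pure-Python sketch of a rotary position embedding applied to a single head vector. It rests on assumptions: that do_rotary applies a standard rotary embedding to Q and K, that pairs are interleaved as (x0, x1), (x2, x3), ..., and that the base is 10000. The helper name apply_rotary is hypothetical, and the actual kernel may use a different layout. The point is only that every position other than 0 gets rotated, so a working do_rotary=1 should not reproduce the do_rotary=0 output.

```python
import math

def apply_rotary(vec, position, base=10000.0):
    """Rotate consecutive pairs of `vec` by a position-dependent angle.

    Assumption: interleaved pairing and base 10000; this is an
    illustration, not the ONNX Runtime implementation.
    """
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("head size must be even")
    out = []
    for i in range(0, dim, 2):
        # Pair index is i / 2, so the frequency is base ** (-2 * (i / 2) / dim).
        theta = position * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x0, x1 = vec[i], vec[i + 1]
        out.extend([x0 * c - x1 * s, x0 * s + x1 * c])
    return out

# One head with head_size = 2 and seq_len = 5, matching the repro's sizes.
q = [1.0, 0.0]
rotated = [apply_rotary(q, pos) for pos in range(5)]
for pos, vec in enumerate(rotated):
    print(pos, vec)
```

Position 0 is left unchanged because its rotation angle is zero; all later positions move, while the vector norm is preserved.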

To reproduce

import numpy as np
import onnx
from onnx import helper, TensorProto, numpy_helper
import onnxruntime as ort

def attention_output(x, W, b, num_heads, qkv_hidden_size, do_rotary):
    # Create a minimal graph: one Attention node
    input_vi = helper.make_tensor_value_info("input", TensorProto.FLOAT, x.shape)
    output_vi = helper.make_tensor_value_info("output", TensorProto.FLOAT, x.shape)
    W_init = numpy_helper.from_array(W, name="weights")
    B_init = numpy_helper.from_array(b, name="bias")
    attn_node = helper.make_node(
        "Attention",
        inputs=["input", "weights", "bias"],
        outputs=["output"],
        domain="com.microsoft",
        num_heads=num_heads,
        unidirectional=0,
        qkv_hidden_sizes=[qkv_hidden_size]*3,
        do_rotary=do_rotary
    )
    graph = helper.make_graph(
        [attn_node],
        "MinimalAttnGraph",
        [input_vi],
        [output_vi],
        initializer=[W_init, B_init]
    )
    model = helper.make_model(
        graph,
        opset_imports=[helper.make_operatorsetid("com.microsoft", 1)]
    )

    sess = ort.InferenceSession(model.SerializeToString(),
                                providers=['CPUExecutionProvider'])
    return sess.run(None, {"input": x})[0]

if __name__ == "__main__":
    # Dummy data
    batch, seq_len, in_hid = 1, 5, 8
    num_heads, head_size = 4, 2
    hidden_size = num_heads * head_size

    x = np.random.rand(batch, seq_len, in_hid).astype(np.float32)
    W = np.random.rand(in_hid, hidden_size * 3).astype(np.float32)
    b = np.random.rand(hidden_size * 3).astype(np.float32)

    out0 = attention_output(x, W, b, num_heads, hidden_size, do_rotary=0)
    out1 = attention_output(x, W, b, num_heads, hidden_size, do_rotary=1)

    diff = np.linalg.norm(out0 - out1)
    print(f"Norm difference (do_rotary=0 vs 1): {diff:.6f}")

    # If outputs are identical, error out
    if diff == 0.0:
        raise RuntimeError(
            "do_rotary flag was ignored by the CPU Execution Provider: "
            "outputs are identical (norm difference == 0)"
        )

    print("Success: do_rotary had an effect on the outputs.")

Urgency

No response

Platform

Mac

OS Version

macOS 15.4 (24E248)

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.16.3

ONNX Runtime API

Python

Architecture

ARM64

Execution Provider

Default CPU

Execution Provider Library Version

No response
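
The environment fields above can be collected programmatically, which may help others confirm they are on the same setup. This is a small sketch; environment_report is a hypothetical helper name, and the onnxruntime fields stay None if the package is not importable.

```python
import platform

def environment_report():
    """Collect OS, architecture, and ONNX Runtime details for a bug report."""
    report = {
        "os": platform.platform(),
        "machine": platform.machine(),  # expected to be "arm64" on Apple Silicon
        "python": platform.python_version(),
        "onnxruntime": None,
        "providers": None,
    }
    try:
        import onnxruntime as ort
    except ImportError:
        return report  # onnxruntime not installed; leave its fields as None
    report["onnxruntime"] = ort.__version__
    report["providers"] = ort.get_available_providers()
    return report

for key, value in environment_report().items():
    print(f"{key}: {value}")
```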

Labels

stale (issues that have not been addressed in a while; categorized by a bot)
