Closed as not planned
Labels
stale (issues that have not been addressed in a while; categorized by a bot)
Description
Describe the issue
The com.microsoft.Attention contrib operator defines a do_rotary attribute (per its schema), but on Apple Silicon (ARM64) with onnxruntime-silicon, setting do_rotary=1 does not change the output.
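For context on why a difference is expected: a rotary position embedding rotates pairs of query/key components by an angle that depends on the token position, so enabling it should change the attention output for any sequence longer than one token. The following is only a NumPy sketch of the common "rotate-half" formulation, not ONNX Runtime's implementation; `apply_rotary` and its `base` parameter are hypothetical names introduced for illustration, and the exact layout used by the Attention kernel may differ.

```python
import numpy as np

def apply_rotary(x, base=10000.0):
    """Illustrative rotary embedding (rotate-half convention).

    x: array of shape (seq_len, head_size), head_size must be even.
    This is an assumption-based sketch, not the ORT kernel.
    """
    seq_len, head_size = x.shape
    half = head_size // 2
    # One rotation frequency per pair of components
    inv_freq = base ** (-np.arange(half, dtype=np.float64) / half)
    # Angle for each (position, frequency) pair: shape (seq_len, half)
    angles = np.outer(np.arange(seq_len), inv_freq)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # 2D rotation applied to each (x1, x2) pair
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=1)

rng = np.random.default_rng(0)
q = rng.random((5, 2))          # seq_len=5, head_size=2, as in the repro below
q_rot = apply_rotary(q)

# Position 0 has angle 0 and is unchanged; later positions are rotated.
print("position 0 unchanged:", np.allclose(q[0], q_rot[0]))
print("later positions changed:", not np.allclose(q[1:], q_rot[1:]))
```

Because every position after the first is rotated, the Q·K scores (and therefore the output) would be expected to differ between do_rotary=0 and do_rotary=1 when seq_len > 1.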
To reproduce
import numpy as np
import onnx
from onnx import helper, TensorProto, numpy_helper
import onnxruntime as ort


def attention_output(x, W, b, num_heads, qkv_hidden_size, do_rotary):
    # Create a minimal graph: one Attention node
    input_vi = helper.make_tensor_value_info("input", TensorProto.FLOAT, x.shape)
    output_vi = helper.make_tensor_value_info("output", TensorProto.FLOAT, x.shape)
    W_init = numpy_helper.from_array(W, name="weights")
    B_init = numpy_helper.from_array(b, name="bias")
    attn_node = helper.make_node(
        "Attention",
        inputs=["input", "weights", "bias"],
        outputs=["output"],
        domain="com.microsoft",
        num_heads=num_heads,
        unidirectional=0,
        qkv_hidden_sizes=[qkv_hidden_size] * 3,
        do_rotary=do_rotary,
    )
    graph = helper.make_graph(
        [attn_node],
        "MinimalAttnGraph",
        [input_vi],
        [output_vi],
        initializer=[W_init, B_init],
    )
    model = helper.make_model(
        graph,
        opset_imports=[helper.make_operatorsetid("com.microsoft", 1)],
    )
    sess = ort.InferenceSession(
        model.SerializeToString(),
        providers=["CPUExecutionProvider"],
    )
    return sess.run(None, {"input": x})[0]


if __name__ == "__main__":
    # Dummy data
    batch, seq_len, in_hid = 1, 5, 8
    num_heads, head_size = 4, 2
    hidden_size = num_heads * head_size
    x = np.random.rand(batch, seq_len, in_hid).astype(np.float32)
    W = np.random.rand(in_hid, hidden_size * 3).astype(np.float32)
    b = np.random.rand(hidden_size * 3).astype(np.float32)

    out0 = attention_output(x, W, b, num_heads, hidden_size, do_rotary=0)
    out1 = attention_output(x, W, b, num_heads, hidden_size, do_rotary=1)
    diff = np.linalg.norm(out0 - out1)
    print(f"Norm difference (do_rotary=0 vs 1): {diff:.6f}")

    # If outputs are identical, error out
    if diff == 0.0:
        raise RuntimeError(
            "do_rotary flag was ignored by the CPU Execution Provider: "
            "outputs are identical (norm difference == 0)"
        )
    print("Success: do_rotary had an effect on the outputs.")
Urgency
No response
Platform
Mac
OS Version
macOS 15.4 (24E248)
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.16.3
ONNX Runtime API
Python
Architecture
ARM64
Execution Provider
Default CPU
Execution Provider Library Version
No response