[IR] Implement __buffer__ for tensors #2241
Conversation
Pull Request Overview
This PR adds a new tofile method for writing tensor content as bytes to a file-like object. It updates the protocol definition in _protocols.py and provides concrete implementations in _core.py for tensors with different data layouts.
- Added the tofile method definition to the tensor protocol.
- Implemented two tofile methods in _core.py: one that invokes tobytes() for specially handled dtypes and another that writes raw byte content (a sketch follows the table below).
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
File | Description |
---|---|
onnxscript/ir/_protocols.py | Added tofile method signature to the tensor protocol. |
onnxscript/ir/_core.py | Implemented tofile methods to handle tensor serialization, including handling of non-native dtypes and raw data writing. |
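A minimal sketch of how the two tofile paths described above could look, assuming a numpy-backed tensor; the class and attribute names here are illustrative, not the PR's actual code:

```python
from typing import BinaryIO, Protocol

import numpy as np


class _SupportsToFile(Protocol):
    # Sketch of the protocol addition: write the tensor's content to a binary file object.
    def tofile(self, f: BinaryIO, /) -> None: ...


class _NumpyBackedTensor:
    """Illustrative stand-in for the concrete tensor classes in _core.py."""

    def __init__(self, array: np.ndarray) -> None:
        self._array = array

    def tobytes(self) -> bytes:
        # Stand-in for the dtype-specific packing used for specially handled dtypes.
        return self._array.tobytes()

    def tofile(self, f: BinaryIO, /) -> None:
        # Raw little-endian data can go straight to the file without an
        # intermediate bytes copy; specially handled dtypes would fall back
        # to tobytes() instead.
        f.write(memoryview(self._array))
```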
❌ 8 Tests Failed
WIP: before
For some reason ndarray to file is really slow
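A hedged way to reproduce that comparison (the array size and file names are made up, and timings will vary by machine and filesystem):

```python
import time

import numpy as np

arr = np.random.rand(64_000_000)  # ~512 MB of float64

t0 = time.perf_counter()
with open("via_tofile.bin", "wb") as f:
    arr.tofile(f)  # ndarray-to-file path
print("tofile:    ", time.perf_counter() - t0)

t0 = time.perf_counter()
with open("via_memoryview.bin", "wb") as f:
    f.write(memoryview(arr))  # write the buffer directly, no bytes copy
print("memoryview:", time.perf_counter() - t0)
```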
With memoryview
Calling contiguous on pytorch first:
14.954 <module> builder.py:1
└─ 14.954 create_model builder.py:3440
├─ 11.296 LlamaModel.save_model builder.py:517
│ ├─ 10.178 save ../../onnxscript/onnxscript/ir/_io.py:40
│ │ └─ 10.150 unload_from_model ../../onnxscript/onnxscript/ir/external_data.py:331
│ │ └─ 10.149 convert_tensors_to_external ../../onnxscript/onnxscript/ir/external_data.py:228
│ │ └─ 10.148 _write_external_data ../../onnxscript/onnxscript/ir/external_data.py:157
│ │ ├─ 6.712 BufferedWriter.write <built-in>
│ │ ├─ 3.135 LazyTensor.__buffer__ ../../onnxscript/onnxscript/ir/_core.py:114
│ │ │ └─ 3.120 LazyTensor.numpy ../../onnxscript/onnxscript/ir/_core.py:1003
│ │ │ └─ 3.116 LazyTensor._evaluate ../../onnxscript/onnxscript/ir/_core.py:967
│ │ │ └─ 3.116 tensor_func builder.py:555
│ │ │ ├─ 2.923 <lambda> builder.py:811
│ │ │ │ ├─ 2.227 Tensor.contiguous <built-in>
│ │ │ │ └─ 0.657 Tensor.to <built-in>
│ │ │ └─ 0.179 <lambda> builder.py:1068
│ │ │ └─ 0.179 Parameter.to <built-in>
│ │ └─ 0.301 [self] ../../onnxscript/onnxscript/ir/external_data.py
│ ├─ 0.909 remove <built-in>
│ └─ 0.204 collect <built-in>
├─ 2.429 LlamaModel.make_model builder.py:2277
│ ├─ 2.237 AutoModelForCausalLM.from_pretrained transformers/models/auto/auto_factory.py:452
│ │ [83 frames hidden] transformers, importlib, torch, trito...
│ └─ 0.186 LlamaModel.make_layer builder.py:2264
│ └─ 0.168 LlamaModel.make_attention builder.py:1632
├─ 0.938 LlamaModel.save_processing builder.py:452
│ ├─ 0.683 AutoTokenizer.from_pretrained transformers/models/auto/tokenization_auto.py:819
│ │ [4 frames hidden] transformers, <built-in>
│ └─ 0.255 PreTrainedTokenizerFast.save_pretrained transformers/tokenization_utils_base.py:2356
│ [2 frames hidden] transformers, <built-in>
└─ 0.201 AutoConfig.from_pretrained transformers/models/auto/configuration_auto.py:1013
[18 frames hidden] transformers, huggingface_hub, reques...
Pull Request Overview
This PR implements the buffer protocol for tensor classes to allow zero-copy access via memoryview, which helps improve performance when saving external tensor data. Key changes include:
- Updating tensor_adapters.py to ensure tensors are contiguous by calling .contiguous() before conversion.
- Replacing tobytes() with memoryview() in external_data.py for more efficient data access.
- Adding __buffer__ implementations to both the TensorBase and ExternalTensor classes in _protocols.py and _core.py (see the sketch after the table below).
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
File | Description |
---|---|
onnxscript/ir/tensor_adapters.py | Enforces contiguous tensor format before numpy conversion to accelerate common usage. |
onnxscript/ir/external_data.py | Uses memoryview(tensor) for writing external data and adjusts the release order accordingly. |
onnxscript/ir/_protocols.py & _core.py | Introduces and implements the buffer protocol in tensor base classes for zero-copy support. |
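A minimal sketch of the buffer-protocol idea (requires Python 3.12+ for `__buffer__`; the class and attribute names are assumptions, not the PR's code):

```python
import numpy as np


class _BufferBackedTensor:
    """Illustrative stand-in for TensorBase, holding a contiguous numpy array."""

    def __init__(self, array: np.ndarray) -> None:
        self._array = np.ascontiguousarray(array)

    def __buffer__(self, flags: int, /) -> memoryview:
        # Expose the underlying storage without copying, so memoryview(tensor)
        # and file.write(...) see the raw bytes directly.
        return memoryview(self._array)


tensor = _BufferBackedTensor(np.arange(4, dtype=np.float32))
view = memoryview(tensor)  # zero-copy view over the tensor's data
assert view.nbytes == 16  # 4 float32 elements
```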
Comments suppressed due to low confidence (3)
onnxscript/ir/tensor_adapters.py:84
- [nitpick] Add tests for the scenario where self.raw is already contiguous to verify that calling .contiguous() does not introduce an unnecessary copy or performance overhead.
self.raw: torch.Tensor = self.raw.contiguous()
onnxscript/ir/external_data.py:180
- [nitpick] Ensure test coverage for writing external tensor data using memoryview, especially for tensors with non-trivial memory layouts or edge-case sizes.
data_file.write(memoryview(tensor))
onnxscript/ir/_core.py:131
- [nitpick] Consider refactoring this code to remove the type ignore comment by ensuring that the array passed to memoryview is correctly typed, which may help prevent potential runtime issues.
return memoryview(array)  # type: ignore[arg-type]
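A hedged sketch of the kind of tests the suppressed comments ask for (pytest-style; the test names and layout are mine, and the first check relies on contiguous() reusing storage for already-contiguous tensors):

```python
import numpy as np
import torch


def test_contiguous_does_not_copy_contiguous_tensor():
    t = torch.arange(8, dtype=torch.float32)
    assert t.is_contiguous()
    # contiguous() on an already-contiguous tensor should reuse the same storage.
    assert t.contiguous().data_ptr() == t.data_ptr()


def test_memoryview_write_matches_tobytes(tmp_path):
    arr = np.arange(6, dtype=np.float32).reshape(2, 3)
    path = tmp_path / "data.bin"
    with open(path, "wb") as f:
        f.write(memoryview(arr))
    assert path.read_bytes() == arr.tobytes()
```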
gemma, after
Profile at /home/justinchu/dev/onnxruntime-genai/src/python/py/models/builder.py:3659
228.397 <module> builder.py:1
└─ 228.397 create_model builder.py:3440
├─ 220.427 Gemma3Model.save_model builder.py:517
│ └─ 218.916 save ../../onnxscript/onnxscript/ir/_io.py:40
│ └─ 218.713 unload_from_model ../../onnxscript/onnxscript/ir/external_data.py:331
│ └─ 218.707 convert_tensors_to_external ../../onnxscript/onnxscript/ir/external_data.py:228
│ └─ 218.686 _write_external_data ../../onnxscript/onnxscript/ir/external_data.py:157
│ ├─ 163.483 LazyTensor.__buffer__ ../../onnxscript/onnxscript/ir/_core.py:113
│ │ └─ 163.434 LazyTensor.numpy ../../onnxscript/onnxscript/ir/_core.py:1002
│ │ └─ 163.417 LazyTensor._evaluate ../../onnxscript/onnxscript/ir/_core.py:966
│ │ └─ 163.415 tensor_func builder.py:555
│ │ └─ 163.352 <lambda> builder.py:811
│ │ ├─ 149.903 Tensor.contiguous <built-in>
│ │ └─ 11.592 Tensor.to <built-in>
│ ├─ 52.574 BufferedWriter.write <built-in>
│ └─ 2.614 [self] ../../onnxscript/onnxscript/ir/external_data.py
├─ 3.872 Gemma3Model.make_model builder.py:2277
│ └─ 3.249 AutoModelForCausalLM.from_pretrained transformers/models/auto/auto_factory.py:452
└─ 3.501 Gemma3Model.save_processing builder.py:452
└─ 2.687 AutoTokenizer.from_pretrained transformers/models/auto/tokenization_auto.py:819
[2 frames hidden] transformers
before
208.994 <module> builder.py:1
└─ 208.974 create_model builder.py:3440
├─ 198.222 Gemma3Model.save_model builder.py:517
│ ├─ 195.303 save ../../onnxscript/onnxscript/ir/_io.py:40
│ │ └─ 194.712 unload_from_model ../../onnxscript/onnxscript/ir/external_data.py:332
│ │ └─ 194.692 convert_tensors_to_external ../../onnxscript/onnxscript/ir/external_data.py:229
│ │ └─ 194.075 _write_external_data ../../onnxscript/onnxscript/ir/external_data.py:157
│ │ ├─ 147.473 LazyTensor.tobytes ../../onnxscript/onnxscript/ir/_core.py:974
│ │ │ ├─ 108.904 LazyTensor._evaluate ../../onnxscript/onnxscript/ir/_core.py:934
│ │ │ │ └─ 108.904 tensor_func builder.py:555
│ │ │ │ └─ 108.836 <lambda> builder.py:811
│ │ │ │ ├─ 95.261 Tensor.contiguous <built-in>
│ │ │ │ └─ 11.653 Tensor.to <built-in>
│ │ │ └─ 38.567 TorchTensor.tobytes ../../onnxscript/onnxscript/ir/tensor_adapters.py:101
│ │ ├─ 43.618 BufferedWriter.write <built-in>
│ │ └─ 2.668 [self] ../../onnxscript/onnxscript/ir/external_data.py
│ └─ 2.672 remove <built-in>
├─ 5.073 Gemma3Model.make_model builder.py:2277
│ └─ 4.644 AutoModelForCausalLM.from_pretrained transformers/models/auto/auto_factory.py:452
│ [9 frames hidden] transformers, importlib
└─ 4.332 Gemma3Model.save_processing builder.py:452
└─ 3.677 AutoTokenizer.from_pretrained transformers/models/auto/tokenization_auto.py:819
[4 frames hidden] transformers
Removing a call to contiguous()
156.458 <module> builder.py:1
└─ 156.444 create_model builder.py:3440
├─ 145.216 Gemma3Model.save_model builder.py:517
│ └─ 143.432 save ../../onnxscript/onnxscript/ir/_io.py:40
│ └─ 142.773 unload_from_model ../../onnxscript/onnxscript/ir/external_data.py:330
│ └─ 142.755 convert_tensors_to_external ../../onnxscript/onnxscript/ir/external_data.py:227
│ └─ 142.395 _write_external_data ../../onnxscript/onnxscript/ir/external_data.py:157
│ ├─ 86.049 LazyTensor.__buffer__ ../../onnxscript/onnxscript/ir/_core.py:113
│ │ └─ 85.990 LazyTensor.numpy ../../onnxscript/onnxscript/ir/_core.py:1002
│ │ ├─ 65.119 TorchTensor.numpy ../../onnxscript/onnxscript/ir/tensor_adapters.py:79
│ │ │ └─ 65.111 Tensor.contiguous <built-in>
│ │ └─ 20.863 LazyTensor._evaluate ../../onnxscript/onnxscript/ir/_core.py:966
│ │ └─ 20.857 tensor_func builder.py:555
│ │ └─ 20.808 <lambda> builder.py:811
│ │ └─ 20.778 Tensor.to <built-in>
│ ├─ 53.200 BufferedWriter.write <built-in>
│ └─ 3.141 [self] ../../onnxscript/onnxscript/ir/external_data.py
├─ 5.426 Gemma3Model.make_model builder.py:2277
│ └─ 4.780 AutoModelForCausalLM.from_pretrained transformers/models/auto/auto_factory.py:452
│ [15 frames hidden] transformers, importlib, torch
└─ 4.430 Gemma3Model.save_processing builder.py:452
└─ 3.778 AutoTokenizer.from_pretrained transformers/models/auto/tokenization_auto.py:819
[5 frames hidden] transformers, <built-in>
301.946 <module> builder.py:1
└─ 301.946 create_model builder.py:3440
├─ 293.662 Gemma3Model.save_model builder.py:517
│ ├─ 283.700 save ../../onnxscript/onnxscript/ir/_io.py:40
│ │ └─ 283.546 unload_from_model ../../onnxscript/onnxscript/ir/external_data.py:331
│ │ └─ 283.541 convert_tensors_to_external ../../onnxscript/onnxscript/ir/external_data.py:228
│ │ └─ 283.519 _write_external_data ../../onnxscript/onnxscript/ir/external_data.py:157
│ │ ├─ 166.181 LazyTensor.__buffer__ ../../onnxscript/onnxscript/ir/_core.py:113
│ │ │ └─ 166.122 LazyTensor.numpy ../../onnxscript/onnxscript/ir/_core.py:1005
│ │ │ └─ 166.098 LazyTensor._evaluate ../../onnxscript/onnxscript/ir/_core.py:969
│ │ │ └─ 166.097 tensor_func builder.py:555
│ │ │ └─ 166.019 <lambda> builder.py:811
│ │ │ ├─ 145.567 Tensor.contiguous <built-in>
│ │ │ └─ 18.497 Tensor.to <built-in>
│ │ └─ 114.390 BufferedWriter.write <built-in>
│ └─ 9.655 remove <built-in>
├─ 3.822 Gemma3Model.make_model builder.py:2277
│ └─ 3.252 AutoModelForCausalLM.from_pretrained transformers/models/auto/auto_factory.py:452
└─ 3.100 Gemma3Model.save_processing builder.py:452
No builder contiguous(); yes: conditional TorchTensor contiguous() and numpy contiguous():
181.161 <module> builder.py:1
└─ 181.137 create_model builder.py:3440
├─ 169.306 Gemma3Model.save_model builder.py:517
│ ├─ 166.007 save ../../onnxscript/onnxscript/ir/_io.py:40
│ │ └─ 165.417 unload_from_model ../../onnxscript/onnxscript/ir/external_data.py:331
│ │ └─ 165.407 convert_tensors_to_external ../../onnxscript/onnxscript/ir/external_data.py:228
│ │ └─ 165.068 _write_external_data ../../onnxscript/onnxscript/ir/external_data.py:157
│ │ ├─ 96.420 LazyTensor.__buffer__ ../../onnxscript/onnxscript/ir/_core.py:113
│ │ │ └─ 96.343 LazyTensor.numpy ../../onnxscript/onnxscript/ir/_core.py:1005
│ │ │ ├─ 78.057 TorchTensor.numpy ../../onnxscript/onnxscript/ir/tensor_adapters.py:79
│ │ │ │ └─ 78.047 Tensor.contiguous <built-in>
│ │ │ └─ 18.281 LazyTensor._evaluate ../../onnxscript/onnxscript/ir/_core.py:969
│ │ │ └─ 18.280 tensor_func builder.py:555
│ │ │ └─ 18.217 <lambda> builder.py:811
│ │ │ └─ 18.213 Tensor.to <built-in>
│ │ ├─ 65.679 BufferedWriter.write <built-in>
│ │ └─ 2.952 [self] ../../onnxscript/onnxscript/ir/external_data.py
│ └─ 2.976 remove <built-in>
├─ 5.284 Gemma3Model.make_model builder.py:2277
│ └─ 4.782 AutoModelForCausalLM.from_pretrained transformers/models/auto/auto_factory.py:452
│ [13 frames hidden] transformers, importlib, torch
└─ 5.020 Gemma3Model.save_processing builder.py:452
└─ 4.356 AutoTokenizer.from_pretrained transformers/models/auto/tokenization_auto.py:819
[5 frames hidden] transformers, <built-in>
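The conditional-contiguous variant profiled above presumably guards the copy; a minimal sketch, assuming the adapter holds a torch.Tensor (the real logic lives in tensor_adapters.py and may differ):

```python
import torch


def _as_contiguous(raw: torch.Tensor) -> torch.Tensor:
    # Only materialize a contiguous copy when the layout actually requires it;
    # already-contiguous tensors are returned unchanged.
    return raw if raw.is_contiguous() else raw.contiguous()
```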
144.380 <module> builder.py:1
└─ 144.363 create_model builder.py:3440
├─ 134.189 Gemma3Model.save_model builder.py:517
│ ├─ 131.956 save ../../onnxscript/onnxscript/ir/_io.py:40
│ │ └─ 131.508 unload_from_model ../../onnxscript/onnxscript/ir/external_data.py:332
│ │ └─ 131.495 convert_tensors_to_external ../../onnxscript/onnxscript/ir/external_data.py:229
│ │ └─ 131.284 _write_external_data ../../onnxscript/onnxscript/ir/external_data.py:157
│ │ ├─ 78.125 LazyTensor.__buffer__ ../../onnxscript/onnxscript/ir/_core.py:113
│ │ │ └─ 77.989 LazyTensor.numpy ../../onnxscript/onnxscript/ir/_core.py:1005
│ │ │ ├─ 63.879 TorchTensor.numpy ../../onnxscript/onnxscript/ir/tensor_adapters.py:79
│ │ │ │ └─ 63.866 Tensor.contiguous <built-in>
│ │ │ └─ 14.103 LazyTensor._evaluate ../../onnxscript/onnxscript/ir/_core.py:969
│ │ │ └─ 14.103 tensor_func builder.py:555
│ │ │ └─ 13.997 <lambda> builder.py:811
│ │ │ └─ 13.993 Tensor.to <built-in>
│ │ ├─ 50.105 BufferedWriter.write <built-in>
│ │ └─ 3.019 memoryview.__exit__ <built-in>
│ └─ 1.938 remove <built-in>
├─ 5.209 Gemma3Model.save_processing builder.py:452
│ └─ 4.430 AutoTokenizer.from_pretrained transformers/models/auto/tokenization_auto.py:819
│ [5 frames hidden] transformers, <built-in>
└─ 3.634 Gemma3Model.make_model builder.py:2277
└─ 3.054 AutoModelForCausalLM.from_pretrained transformers/models/auto/auto_factory.py:452
[9 frames hidden] transformers, importlib
if not _IS_LITTLE_ENDIAN:
    # Need to copy because we are returning the underlying data directly
    array = array.view(array.dtype.newbyteorder("<")).copy()
return array.__buffer__(flags)
Check failure (Code scanning / lintrunner): MYPY/attr-defined Error
array = self.numpy()
assert array.data.c_contiguous, "Bug: The array should be contiguous"
assert self.dtype.itemsize == array.itemsize, "Bug: The itemsize should match"
return array.__buffer__(flags)
Check failure (Code scanning / lintrunner): MYPY/attr-defined Error
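The attr-defined failures likely come from numpy's stubs not declaring ndarray.__buffer__; one possible workaround, already hinted at in the suppressed comment above, is to build the memoryview directly instead of delegating to ndarray.__buffer__ (method-body sketch; self.numpy() is the accessor shown in the snippet above):

```python
import numpy as np


def __buffer__(self, flags: int, /) -> memoryview:
    array: np.ndarray = self.numpy()
    # memoryview() goes through the C-level buffer protocol, so no
    # ndarray.__buffer__ attribute needs to exist in the type stubs.
    return memoryview(array)
```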
Users can now call memoryview on tensors and avoid a copy when saving to external data. The TorchTensor class for PyTorch tensors gets this method automatically because it inherits from the TensorBase class.

Update: I was able to save some memory, but the overall time it takes to save seems to be longer. Needs more investigation.
Before | After
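Usage sketch (assumes this branch's __buffer__ support and Python 3.12+; wrapping a numpy array in ir.Tensor stands in for any tensor in the IR):

```python
import numpy as np

from onnxscript import ir

tensor = ir.Tensor(np.arange(10, dtype=np.float32))

with open("weights.bin", "wb") as f, memoryview(tensor) as view:
    # Bytes come straight from the tensor's buffer; no tobytes() copy is made.
    f.write(view)
```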