Feature description
A user should be able to specify whether the model data is saved in half precision, and whether it is converted back to full precision when loaded.
Feature motivation
The performance of model inference often does not degrade much when f32 values are truncated to half precision, though the lost precision may matter during training. To reduce the size of the exported model, it would be beneficial to save the data in bf16 (a truncated version of f32) and later load it back as f32 (for example, if the target architecture supports only f32 and up).
(Optional) Suggest a Solution
Use the half library to convert f32 to bf16. The half library also supports serde for (de)serialization. The user should be able to specify the target data type (f32) when loading.
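To make the proposed round trip concrete, here is a minimal sketch that uses only the standard library. It is not the half crate's implementation and not an existing API of this project: the function names (f32_to_bf16_bits, bf16_bits_to_f32, save_half, load_full) and the little-endian byte layout are assumptions made for illustration. A real implementation would more likely call half's bf16::from_f32 and to_f32 and rely on its serde support, as suggested above.

```rust
/// Convert an f32 to the 16 bits of a bf16 value.
/// bf16 keeps the sign, the 8 exponent bits and the top 7 mantissa bits of f32.
/// (Hypothetical helper; rounds to nearest, ties to even, instead of plain truncation.)
fn f32_to_bf16_bits(value: f32) -> u16 {
    let bits = value.to_bits();
    if value.is_nan() {
        // Set a mantissa bit that survives the shift so NaN cannot turn into infinity.
        return ((bits >> 16) as u16) | 0x0040;
    }
    // Bias depends on the lowest kept bit, which gives ties-to-even rounding.
    let rounding_bias = 0x7FFF + ((bits >> 16) & 1);
    ((bits + rounding_bias) >> 16) as u16
}

/// Widen bf16 bits back to f32 by restoring the dropped low 16 bits as zeros.
fn bf16_bits_to_f32(bits: u16) -> f32 {
    f32::from_bits((bits as u32) << 16)
}

/// Hypothetical "save in half precision": 2 bytes per weight, little-endian.
fn save_half(weights: &[f32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(weights.len() * 2);
    for &w in weights {
        out.extend_from_slice(&f32_to_bf16_bits(w).to_le_bytes());
    }
    out
}

/// Hypothetical "load back in full precision": read bf16 values and widen to f32.
fn load_full(bytes: &[u8]) -> Vec<f32> {
    bytes
        .chunks_exact(2)
        .map(|pair| bf16_bits_to_f32(u16::from_le_bytes([pair[0], pair[1]])))
        .collect()
}
```

With this layout the saved data takes half the space of the f32 original, and values that fit in 7 explicit mantissa bits (such as 1.0 or -2.5) survive the round trip exactly, while others come back with a relative error of at most about 2^-8.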