Save model data in half precision (bf16) and load back to the full precision f32 #202

antimora · 2023-03-06T20:37:47Z

Feature description

A user should be able to specify whether model data is saved in half precision (bf16), and whether it should be loaded back in full precision (f32).

Feature motivation

Model quality at inference time often does not degrade much when f32 values are cut down to half precision, although the loss may matter during training. To reduce the size of the exported model, it would be beneficial to save weights in bf16 (a truncated version of f32) and later load them back as f32 (for example, when the target architecture supports only f32 and wider types).
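To make the size and precision trade-off concrete, here is a minimal, dependency-free Rust sketch, assuming bf16 is produced by plain truncation as described above. bf16 keeps the sign bit, the full 8-bit exponent and the top 7 mantissa bits of an f32, so each value shrinks from 4 bytes to 2 while the numeric range stays the same; widening back to f32 is exact. The function names are illustrative only.

```rust
/// Convert an f32 to bf16 bits by truncation: keep the upper 16 bits
/// (sign, 8 exponent bits, top 7 mantissa bits) and drop the rest.
fn f32_to_bf16_truncate(x: f32) -> u16 {
    (x.to_bits() >> 16) as u16
}

/// Widen bf16 bits back to f32 by appending 16 zero bits; this step is exact.
fn bf16_to_f32(b: u16) -> f32 {
    f32::from_bits((b as u32) << 16)
}

fn main() {
    let weights: Vec<f32> = vec![1.0, -2.5, 3.14159, 0.001];
    let half: Vec<u16> = weights.iter().map(|&w| f32_to_bf16_truncate(w)).collect();
    let restored: Vec<f32> = half.iter().map(|&b| bf16_to_f32(b)).collect();

    // Storage halves: 4 bytes per f32 value vs 2 bytes per bf16 value.
    println!("f32 bytes:  {}", weights.len() * std::mem::size_of::<f32>());
    println!("bf16 bytes: {}", half.len() * std::mem::size_of::<u16>());

    // Values with short mantissas survive exactly; others lose low-order digits.
    for (w, r) in weights.iter().zip(&restored) {
        println!("{w} -> {r}");
    }
}
```

Because the exponent field is untouched, even f32::MAX stays finite after truncation; only the mantissa loses precision (3.14159 comes back as 3.140625).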

(Optional) Suggest a Solution

Use the half crate to convert f32 to bf16; half also supports serde for (de)serialization. The user should be able to specify the target data type when loading (e.g. f32).
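A sketch of the requested save/load flow, under stated assumptions: SavePrecision, save_weights, load_weights and the one-byte tag format are hypothetical names for illustration, not burn's API. With the half crate, the conversions would be bf16::from_f32 and bf16::to_f32 (and serde support comes through its serde feature); here they are reimplemented with std only so the example is self-contained. Unlike the plain truncation mentioned in the motivation, this version rounds to nearest even, which is what bf16 converters commonly do.

```rust
/// Hypothetical user-facing choice of on-disk precision (not a real burn API).
#[derive(Clone, Copy, Debug, PartialEq)]
enum SavePrecision {
    Full, // store f32 as-is (4 bytes per value)
    Half, // store bf16 (2 bytes per value)
}

/// f32 -> bf16 with round-to-nearest-even on the 16 dropped bits.
fn f32_to_bf16_rne(x: f32) -> u16 {
    let bits = x.to_bits();
    if x.is_nan() {
        // Keep NaN a NaN: truncate and force a mantissa bit so it cannot become infinity.
        return ((bits >> 16) as u16) | 0x0040;
    }
    // Adding 0x7FFF plus the lowest kept bit rounds up past the halfway point,
    // and on an exact tie only when the kept part is odd (ties to even).
    let lsb = (bits >> 16) & 1;
    (bits.wrapping_add(0x7FFF + lsb) >> 16) as u16
}

/// bf16 -> f32 is exact: append 16 zero mantissa bits.
fn bf16_to_f32(b: u16) -> f32 {
    f32::from_bits((b as u32) << 16)
}

/// Serialize weights with a one-byte tag recording the stored precision.
fn save_weights(weights: &[f32], precision: SavePrecision) -> Vec<u8> {
    let mut out: Vec<u8> = Vec::new();
    match precision {
        SavePrecision::Full => {
            out.push(0);
            for w in weights {
                out.extend_from_slice(&w.to_le_bytes());
            }
        }
        SavePrecision::Half => {
            out.push(1);
            for &w in weights {
                out.extend_from_slice(&f32_to_bf16_rne(w).to_le_bytes());
            }
        }
    }
    out
}

/// Deserialize, always returning full-precision f32 regardless of the stored format.
fn load_weights(bytes: &[u8]) -> Result<Vec<f32>, String> {
    let (&tag, body) = bytes.split_first().ok_or("empty buffer")?;
    let width = match tag {
        0 => 4,
        1 => 2,
        other => return Err(format!("unknown precision tag {other}")),
    };
    if body.len() % width != 0 {
        return Err(format!("payload length {} is not a multiple of {width}", body.len()));
    }
    let values: Vec<f32> = if width == 4 {
        body.chunks_exact(4)
            .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect()
    } else {
        body.chunks_exact(2)
            .map(|c| bf16_to_f32(u16::from_le_bytes([c[0], c[1]])))
            .collect()
    };
    Ok(values)
}

fn main() {
    let weights = vec![0.5f32, -1.75, 3.14159, 1.0e-3];
    let full = save_weights(&weights, SavePrecision::Full);
    let half = save_weights(&weights, SavePrecision::Half);
    println!("full: {} bytes, half: {} bytes", full.len(), half.len());

    // Loading always yields f32, so downstream code never sees bf16.
    let restored = load_weights(&half).expect("valid buffer");
    for (w, r) in weights.iter().zip(&restored) {
        println!("{w} -> {r}");
    }
}
```

Recording the stored precision in the file lets the loader decide how to widen the data, so the caller only chooses a precision at save time and always receives f32 on load.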

nathanielsimard mentioned this issue Mar 22, 2023

State serialization/deserialization overhaul #247

Merged

nathanielsimard closed this as completed in #247 Mar 23, 2023

jwhogg mentioned this issue Jun 27, 2024

Remove closed 'future improvements' #1935

Merged

