Open
Description
This feature aims to support memory-efficient training by enabling gradient checkpointing on specific layers of Lux models. Users will be able to mark individual layers that should trade extra compute for lower peak memory, i.e. discard their intermediate activations during the forward pass and recompute them during the backward pass. A sketch of one possible API follows the success criteria below.
Success when:
Gradient checkpointing can be toggled per layer.
A measurable reduction in peak memory usage is demonstrated via benchmarks on larger models.
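
Since this feature does not exist yet, here is a minimal sketch of what a per-layer opt-in could look like. The `Checkpointed` wrapper is a hypothetical name, not part of Lux's current API; the sketch assumes Lux v1's `AbstractLuxWrapperLayer` abstraction and relies on `Zygote.checkpointed`, which recomputes a call's forward pass during the backward pass instead of storing its intermediates.

```julia
using Lux, Random, Zygote

# Hypothetical wrapper layer: checkpoints whatever layer it wraps.
# `AbstractLuxWrapperLayer{:layer}` forwards parameter/state setup to `c.layer`.
struct Checkpointed{L <: Lux.AbstractLuxLayer} <: Lux.AbstractLuxWrapperLayer{:layer}
    layer::L
end

function (c::Checkpointed)(x, ps, st)
    # Zygote.checkpointed does not store intermediates of `c.layer(x, ps, st)`
    # during the forward pass; it reruns the call when the backward pass needs them.
    return Zygote.checkpointed(c.layer, x, ps, st)
end

# Usage: checkpoint only the large middle block of a model.
model = Chain(
    Dense(128 => 512, relu),
    Checkpointed(Chain(Dense(512 => 512, relu), Dense(512 => 512, relu))),
    Dense(512 => 10),
)

rng = Random.default_rng()
ps, st = Lux.setup(rng, model)
x = randn(Float32, 128, 16)

# Gradients flow through the checkpointed block as usual.
loss(p) = sum(abs2, first(model(x, p, st)))
grads = Zygote.gradient(loss, ps)
```

Wrapping only the largest blocks keeps the recompute overhead bounded while dropping the bulk of the stored activations. Whether the integration point should be a wrapper layer like this or a keyword on existing layers is an open design question.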
Metadata
Assignees
Labels
No labels