
Gradient Checkpointing Support #16

Open
@jakubMitura14

Description


This feature aims to support memory-efficient training by enabling gradient checkpointing on specific layers of Lux models. Users will be able to configure which layers trade extra compute in the backward pass for lower peak memory (see the sketch below for one possible approach).
Success when:
Gradient checkpointing can be toggled per layer.
Memory usage reduction is demonstrated via benchmarks on larger models.
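A minimal sketch of what per-layer checkpointing could look like, assuming a recent Lux (v1+) where custom layers subtype `Lux.AbstractLuxLayer` and Zygote is the AD backend. The `CheckpointedLayer` wrapper and its name are hypothetical, not an existing Lux API; `Zygote.checkpointed` is the existing utility that re-runs a call during the pullback instead of storing its intermediates.

```julia
# Illustrative sketch only -- CheckpointedLayer is a hypothetical wrapper,
# not part of Lux. Assumes Zygote as the AD backend.
using Lux, Random, Zygote

struct CheckpointedLayer{L} <: Lux.AbstractLuxLayer
    layer::L
end

# Delegate parameter/state initialization to the wrapped layer.
Lux.initialparameters(rng::AbstractRNG, c::CheckpointedLayer) =
    Lux.initialparameters(rng, c.layer)
Lux.initialstates(rng::AbstractRNG, c::CheckpointedLayer) =
    Lux.initialstates(rng, c.layer)

function (c::CheckpointedLayer)(x, ps, st)
    # Zygote.checkpointed discards the intermediates of this call on the
    # forward pass and re-runs the inner layer when the pullback needs them.
    return Zygote.checkpointed(c.layer, x, ps, st)
end
```

Wrapping a single layer inside a Chain would then toggle checkpointing for just that layer:

```julia
rng = Random.default_rng()
model = Chain(Dense(32 => 64, tanh),
              CheckpointedLayer(Dense(64 => 64, tanh)),
              Dense(64 => 1))
ps, st = Lux.setup(rng, model)
x = randn(Float32, 32, 16)
grads = Zygote.gradient(p -> sum(first(model(x, p, st))), ps)
```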
