xrsrke/elasticgoose

ElasticGoose: A fault-tolerance library for PyTorch

Features

  • Synchronous variables: let users define custom handlers for recovering from failures, resetting state, and synchronizing state across workers
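
The idea of a synchronous variable with user-defined handlers can be sketched roughly as follows. This is a minimal illustration only, not ElasticGoose's actual API: the class name `SyncVariable`, its methods (`commit`, `recover`, `reset`, `sync`), and the handler parameters (`on_recover`, `on_reset`, `on_sync`) are all hypothetical names chosen for this example. Peer values are passed in as a plain list here; in a real distributed job they would presumably be gathered with a collective such as those in `torch.distributed`.

```python
import copy
from typing import Any, Callable, List, Optional


class SyncVariable:
    """Hypothetical sketch of a value with recover/reset/sync handlers."""

    def __init__(
        self,
        value: Any,
        on_recover: Optional[Callable[[Any], Any]] = None,
        on_reset: Optional[Callable[[Any], Any]] = None,
        on_sync: Optional[Callable[[Any, List[Any]], Any]] = None,
    ) -> None:
        self.value = value
        self._initial = copy.deepcopy(value)    # value to return to on reset
        self._committed = copy.deepcopy(value)  # last known-good value
        self._on_recover = on_recover
        self._on_reset = on_reset
        self._on_sync = on_sync

    def commit(self) -> None:
        """Mark the current value as known-good."""
        self._committed = copy.deepcopy(self.value)

    def recover(self) -> None:
        """After a failure, roll back to the last committed value."""
        committed = copy.deepcopy(self._committed)
        self.value = self._on_recover(committed) if self._on_recover else committed

    def reset(self) -> None:
        """Return to the initial value and treat it as known-good."""
        initial = copy.deepcopy(self._initial)
        self.value = self._on_reset(initial) if self._on_reset else initial
        self.commit()

    def sync(self, peer_values: List[Any]) -> None:
        """Reconcile the local value with values reported by other workers."""
        if self._on_sync:
            self.value = self._on_sync(self.value, peer_values)
        elif peer_values:
            # Default assumption: adopt the first peer's value (e.g. rank 0).
            self.value = copy.deepcopy(peer_values[0])
        self.commit()


# Usage: a step counter that rolls back to its last commit after a failure.
step = SyncVariable(0, on_sync=lambda local, peers: max([local, *peers]))
step.value = 3
step.commit()        # step 3 is now the known-good state
step.value = 5       # uncommitted progress
step.recover()       # simulated failure: value returns to 3
step.sync([4, 2])    # custom handler keeps the furthest-along step, 4
```

Keeping the handlers as plain callables means the default behavior (roll back, reinitialize, copy from a peer) can be overridden per variable without subclassing.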

About

A fault-tolerant elastic training framework for PyTorch
