FLOR: Fast Low-Overhead Recovery
FLOR is a suite of machine learning tools for hindsight logging.
What is hindsight logging?
Hindsight logging is an optimistic logging practice favored by agile model developers. Model developers log training metrics such as loss and accuracy by default, and selectively restore additional training data (like tensor histograms, images, and overlays) post-hoc, if and when there is evidence of a problem.
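To make the idea concrete, here is a minimal sketch of hindsight logging on a toy training loop. It is not FLOR's API: the names `step`, `train`, and `replay` are hypothetical, the "model" is a single weight, and checkpoints are kept in an in-memory dict. The first pass logs only the loss; later, one epoch is restored from its checkpoint and re-run with extra logging.

```python
import copy

def step(state):
    """One toy 'epoch': a gradient-descent update on (w - 3)^2 with lr 0.1."""
    grad = 2 * (state["w"] - 3.0)
    state["w"] -= 0.1 * grad
    return (state["w"] - 3.0) ** 2  # loss after the update

def train(num_epochs):
    """Optimistic first pass: log only the loss, checkpoint every epoch."""
    state = {"w": 0.0}
    checkpoints, losses = {}, []
    for epoch in range(num_epochs):
        checkpoints[epoch] = copy.deepcopy(state)  # state at the start of the epoch
        losses.append(step(state))
    return checkpoints, losses

def replay(checkpoints, epoch):
    """Hindsight pass: restore one epoch and log details skipped the first time."""
    state = copy.deepcopy(checkpoints[epoch])
    w_before = state["w"]
    loss = step(state)
    return {"epoch": epoch, "w_before": w_before, "w_after": state["w"], "loss": loss}

checkpoints, losses = train(5)
detail = replay(checkpoints, 3)  # inspect only the epoch that looked suspicious
```

Because the toy update is deterministic, the replayed epoch reproduces the loss recorded in the first pass, and only the epoch of interest is recomputed.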
What tools does FLOR bundle?
A low-overhead background materializer. By our microbenchmarks, the background materializer cuts logging overheads by 75% on average. This tool lets you use your logger of choice in the background, e.g. TensorBoard, WandB, MLflow, and others. When your logging practice is optimistic, logging overheads are light; but if you're logging more conservatively, or hindsight logging (i.e. restoring) heavy volumes of data post-hoc, you should use this tool.
A periodic checkpointing library.
A SkipBlock API.
An instrumentation library.
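The general idea behind background materialization can be sketched as follows. This is an illustration under stated assumptions, not FLOR's implementation: `BackgroundLogger` is a hypothetical class, it uses a worker thread and a queue from the Python standard library, and `log_fn` stands in for whatever logger call you would normally make (FLOR's actual mechanism may differ, for example by using a separate process).

```python
import queue
import threading

class BackgroundLogger:
    """Hypothetical wrapper: hands log calls to a worker thread via a queue."""

    def __init__(self, log_fn):
        self._log_fn = log_fn            # stand-in for your logger of choice
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def log(self, tag, value, step):
        # Returns immediately; the training loop does not wait for the write.
        self._queue.put((tag, value, step))

    def _drain(self):
        while True:
            item = self._queue.get()
            if item is None:             # sentinel: no more records
                break
            self._log_fn(*item)

    def close(self):
        # Flush everything still queued, then stop the worker.
        self._queue.put(None)
        self._worker.join()

records = []
logger = BackgroundLogger(lambda tag, value, step: records.append((tag, value, step)))
for step in range(3):
    logger.log("loss", 1.0 / (step + 1), step)
logger.close()
```

The training loop pays only for enqueueing a record; the potentially slow write happens off the critical path. A thread keeps the sketch short, but CPU-heavy serialization in Python would still contend for the interpreter lock, which is one reason a real materializer might prefer a separate process.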
FLOR is licensed under the Apache v2 License.