Commits on Oct 15, 2018
  1. Broadcast optimizer options in addition to parameter state (#562)

    tgaddair committed Oct 15, 2018
    * Broadcast optimizer options in addition to parameter state
    
    * Added comment
    
    * Added tests for all the optimizer subclasses
    
    * Added comment
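
    A minimal sketch (not taken from the commit) of how this broadcast is typically invoked in horovod.torch; the toy model and optimizer are illustrative. With this change, hvd.broadcast_optimizer_state synchronizes optimizer options (e.g. learning rate) in addition to the per-parameter state tensors.

    ```python
    import torch
    import horovod.torch as hvd

    hvd.init()

    # Illustrative model and optimizer; any torch.optim subclass works the same way.
    model = torch.nn.Linear(10, 2)
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001 * hvd.size())

    # Broadcast initial parameters, then the optimizer state -- the broadcast now
    # covers optimizer options (lr, betas, ...) as well as per-parameter state.
    hvd.broadcast_parameters(model.state_dict(), root_rank=0)
    hvd.broadcast_optimizer_state(optimizer, root_rank=0)

    optimizer = hvd.DistributedOptimizer(optimizer,
                                         named_parameters=model.named_parameters())
    ```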
Commits on Sep 28, 2018
  1. FP16 support for GPU tensors in all frameworks (#529)

    tgaddair committed Sep 28, 2018
    * Initial support for FP16
    
    Bump version to a dev release
    
    Cast vars to fp16 before allreduce to compress gradients
    
    Abstracted compression algorithm into a class hierarchy and added algorithm flag to optimizer and allreduce signatures
    
    Changed compressor to set the dtype on initialization
    
    Resolved conflicts
    
    Additional conflicts
    
    Formatting
    
    More formatting
    
    Updated license
    
    Added fp16 compression for Keras
    
    Added arguments to keras examples
    
    Fixed imports
    
    * Added compression to tf.keras
    
    * Added PyTorch compression API
    
    Added unit tests
    
    Whitespace
    
    * Added C interfaces and types
    
    * Forward declare
    
    * Removed Half from older versions of PyTorch
    
    * Added error for old version of PyTorch
    
    * Removed reference to float16
    
    * Updated examples, added compression to the Keras model load
    
    * Cleaned imports
    
    * Removed dependency on enums
    
    * Updated unit tests
    
    * Test compatibility fix
    
    * Reverted version updates
    
    * Fixed message
    
    * Removed imports
    
    * Added cuda.HalfTensor to all PyTorch tests with CUDA
    
    * Only compare versions once
    
    * Renamed --fp16 in examples to --fp16-allreduce for clarity
    
    * Replaced assignment with set_
    
    * Modified compression algorithms to be stateless with optional context parameters
    
    * Removed optional ctx parameter
    
    * Replaced 0.4.2 with 1.0.0
    
    * Only run GPU tests with HalfTensors if fp16 is supported
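
    A minimal sketch of the new compression flag, shown here with horovod.torch; the same hvd.Compression.fp16 option was added to the TensorFlow and Keras wrappers, and the examples expose it as --fp16-allreduce. The model below is illustrative.

    ```python
    import torch
    import horovod.torch as hvd

    hvd.init()

    model = torch.nn.Linear(784, 10)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())

    # Gradients are cast to float16 before allreduce and back to the original
    # dtype afterwards; hvd.Compression.none keeps the old uncompressed behavior.
    optimizer = hvd.DistributedOptimizer(optimizer,
                                         named_parameters=model.named_parameters(),
                                         compression=hvd.Compression.fp16)
    ```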
Commits on Sep 26, 2018
  1. Added tf.keras support (#513)

    tgaddair committed Sep 26, 2018
    * Added support for tf.keras
    
    * Added unit tests
    
    * Refactoring
    
    * Fixed tests
    
    * Hide implementation modules
    
    * Moved _DistributedOptimizer into the impl file and wrapped with function
    
    * Added cooperative multiple inheritance
    
    * Backwards compatibility with TensorFlow versions less than 1.4.0
    
    * Removed duplicate headers
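
    A minimal sketch of the new horovod.tensorflow.keras wrapper; the toy model, optimizer, and callback list are illustrative rather than taken from the Horovod examples.

    ```python
    import tensorflow as tf
    import horovod.tensorflow.keras as hvd

    hvd.init()

    model = tf.keras.Sequential([tf.keras.layers.Dense(10, activation='softmax')])
    opt = tf.keras.optimizers.SGD(lr=0.01 * hvd.size())

    # Wrap the tf.keras optimizer so gradients are averaged across workers.
    opt = hvd.DistributedOptimizer(opt)
    model.compile(loss='sparse_categorical_crossentropy', optimizer=opt)

    callbacks = [
        # Broadcast initial variable states from rank 0 to all other processes.
        hvd.callbacks.BroadcastGlobalVariablesCallback(0),
    ]
    ```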
Commits on Sep 14, 2018
  1. Fixed issue with dynamically linking PyTorch module on macOS (#494)

    tgaddair committed Sep 14, 2018
    * Fixed issue with dynamically linking the PyTorch module on macOS due to a hidden symbol
    
    * Changed to Kleene star matching for consistency
Commits on Aug 2, 2018
  1. Added command line args to Keras ImageNet examples (#419)

    tgaddair committed Aug 2, 2018
    * Added command line args to Keras ImageNet example in line with those in the PyTorch examples
    
    * Addressed comments
    
    * Fixed checkpoint formatting
Commits on Jul 13, 2018
  1. Added custom load_model function to wrap the model optimizer with a Horovod DistributedOptimizer (#359)

    tgaddair committed Jul 13, 2018
    * Added custom load_model function to wrap the model optimizer with a Horovod DistributedOptimizer
    
    * Added Keras tests
    
    * Removed imports
    
    * Fixed license
    
    * Updated imagenet example
    
    * Fixed unit tests and API for compatibility with Keras 2.0.0 and TensorFlow 1.1.0
    
    * Added guarded import of Keras to avoid race conditions
    
    * Reverted formatting changes to example and only load model on rank 0 node
    
    * Added additional unit tests for optional parameters, fixed issue with key when using custom_optimizers
    
    * Added broadcast tests
    
    * Updated example
    
    * Clear session between tests to reset variables
    
    * Updated comment
    
    * Execute all Keras unit tests in a custom TensorFlow session
    
    * Added save_model for parity with PyTorch API
    
    * Added assertions
    
    * Revert "Added save_model for parity with PyTorch API"
    
    This reverts commit e6381f0.
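
    A minimal sketch of the load_model wrapper added here to horovod.keras; the checkpoint filename is a placeholder.

    ```python
    import horovod.keras as hvd

    hvd.init()

    # load_model restores the checkpoint and wraps its optimizer in a Horovod
    # DistributedOptimizer, so a resumed run keeps averaging gradients correctly.
    model = hvd.load_model('checkpoint-latest.h5')
    ```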
  2. Added PyTorch support for restoring optimizer state on model load and broadcast (#371)

    tgaddair committed Jul 13, 2018
    * Added PyTorch support for restoring optimizer state on model load
    
    * Fixed examples, cleaned up API
    
    * Imports
    
    * Just in time variable declaration
    
    * Updated API, fixed test
    
    * Replaced iteritems() with items() for Python 3
    
    * Python 3 compatibility issue
    
    * Added test for custom_state
    
    * Added comments, cache state_dict
    
    * Test not saving and loading the optimizer
    
    * Added broadcast_object function and changed optimizer state broadcast to use this instead
    
    * Fixed PyTorch 0.3.0 compatibility and updated the docs
    
    * Removed save_model and load_model
    
    * Removed check
    
    * Init tensor from size, removed extra whitespace
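
    A minimal sketch of the restore-then-broadcast pattern this commit enables in horovod.torch; the checkpoint path, model, and optimizer are illustrative placeholders.

    ```python
    import torch
    import horovod.torch as hvd

    hvd.init()

    model = torch.nn.Linear(10, 2)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    # Only rank 0 reads the checkpoint from disk ...
    if hvd.rank() == 0:
        checkpoint = torch.load('checkpoint.pt')
        model.load_state_dict(checkpoint['model'])
        optimizer.load_state_dict(checkpoint['optimizer'])

    # ... then parameters and the restored optimizer state are broadcast to the
    # other ranks before training resumes.
    hvd.broadcast_parameters(model.state_dict(), root_rank=0)
    hvd.broadcast_optimizer_state(optimizer, root_rank=0)
    ```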
Commits on Jul 3, 2018
  1. Made synchronous pure functional Horovod ops differentiable in PyTorch (#338)

    tgaddair committed Jul 3, 2018
    * Made synchronous pure functional Horovod ops differentiable in PyTorch and added unit tests to verify gradient correctness
    
    * Added comment
    
    * Fixed backward compatibility with PyTorch version 0.3.0
    
    * Fixed example code to use variable api
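
    A minimal sketch of backpropagating through a synchronous Horovod op in PyTorch after this change; the tensor and loss are illustrative.

    ```python
    import torch
    import horovod.torch as hvd

    hvd.init()

    x = torch.randn(4, requires_grad=True)

    # hvd.allreduce now records a backward function, so gradients flow through
    # the averaged result back to the input tensor.
    y = hvd.allreduce(x)
    loss = y.sum()
    loss.backward()

    print(x.grad)
    ```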
  2. Made Horovod ops differentiable in TensorFlow API (#331)

    tgaddair committed Jul 3, 2018
    * Made Horovod ops differentiable in TensorFlow and added unit tests to verify gradient correctness
    
    * Fixed allgather and broadcast gradients, and improved unit tests
    
    * Fixed broadcast gradient
    
    * Private gradients
    
    * Removed stop gradient
    
    * Fixed unit test by avoiding variable reuse
    
    * Fixed compatibility with TensorFlow 1.1
    
    * A TensorFlow 1.9 change prevents backpropagating through integer tensors, so we disabled tests on integer tensors for the time being
    
    * Added comment
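
    A minimal sketch of taking gradients through a Horovod op in TensorFlow graph mode (matching the TF 1.x versions of the period); the values are illustrative.

    ```python
    import tensorflow as tf
    import horovod.tensorflow as hvd

    hvd.init()

    x = tf.Variable([1.0, 2.0, 3.0])

    # hvd.allreduce registers a gradient, so tf.gradients can differentiate
    # through the averaged tensor.
    y = tf.reduce_sum(hvd.allreduce(x))
    grad = tf.gradients(y, x)[0]

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        print(sess.run(grad))
    ```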