Automatic parameter tuning for hierarchical allreduce, fusion buffer, and cycle time (#615)
tgaddair committed Dec 11, 2018
1 parent 7d90bd1 commit a5abcf0c90e500dca1ae5c85f2742e808fa9aa91
@@ -0,0 +1,6 @@
[submodule "third_party/lbfgs"]
path = third_party/lbfgs
url = https://github.com/yixuan/LBFGSpp.git
[submodule "third_party/eigen"]
path = third_party/eigen
url = https://github.com/eigenteam/eigen-git-mirror.git
62 LICENSE
@@ -271,4 +271,64 @@
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Horovod includes derived work from the following:

scikit-learn
Copyright (c) 2007–2018 The scikit-learn developers. All rights reserved.

Licensed under the New BSD License:

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

a. Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
b. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
c. Neither the name of the Scikit-learn Developers nor the names of
its contributors may be used to endorse or promote products
derived from this software without specific prior written
permission.


THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
DAMAGE.

The derived work can be found in the files:

- horovod/common/optim/gaussian_process.h
- horovod/common/optim/gaussian_process.cc

krasserm/bayesian-machine-learning (http://krasserm.github.io)
Copyright 2018 Martin Krasser. All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

The derived work can be found in the files:

- horovod/common/optim/bayesian_optimization.h
- horovod/common/optim/bayesian_optimization.cc
- horovod/common/optim/gaussian_process.h
- horovod/common/optim/gaussian_process.cc
@@ -1,3 +1,16 @@
recursive-include * *.h *.cc *.md

include LICENSE horovod.lds horovod.exp
prune .eggs

# prune eigen LGPL2
graft third_party/eigen/Eigen
exclude third_party/eigen/Eigen/Eigen
exclude third_party/eigen/Eigen/IterativeLinearSolvers
exclude third_party/eigen/Eigen/MetisSupport
exclude third_party/eigen/Eigen/Sparse
exclude third_party/eigen/Eigen/SparseCholesky
exclude third_party/eigen/Eigen/SparseLU
exclude third_party/eigen/Eigen/src/IterativeSolvers/*
exclude third_party/eigen/Eigen/src/OrderingMethods/Amd.h
exclude third_party/eigen/Eigen/src/SparseCholesky/*
@@ -0,0 +1,52 @@
// Copyright 2018 Uber Technologies, Inc. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// =============================================================================

#include "fusion_buffer_manager.h"

namespace horovod {
namespace common {

Status FusionBufferManager::InitializeBuffer(int64_t threshold, int device,
                                             std::shared_ptr<OpContext> context,
                                             std::function<void()> on_start_init,
                                             std::function<void()> on_end_init) {
  auto& elem = tensor_fusion_buffers_[std::make_tuple(device, context->framework())];
  auto& buffer = elem.first;
  int64_t& size = elem.second;
  if (size != threshold) {
    // The requested threshold changed: drop the cached buffer so that it is
    // reallocated below with the new size.
    buffer.reset();
    size = 0;
  }

  if (buffer == nullptr) {
    on_start_init();
    size = threshold;

    // Lazily allocate persistent buffer for Tensor Fusion and keep it
    // per device and framework until the threshold changes.
    Status status = context->AllocatePersistent(threshold, &buffer);
    on_end_init();

    return status;
  }

  return Status::OK();
}

std::shared_ptr<PersistentBuffer>& FusionBufferManager::GetBuffer(int device, Framework framework) {
  return tensor_fusion_buffers_[std::make_tuple(device, framework)].first;
}

} // namespace common
} // namespace horovod
@@ -0,0 +1,61 @@
// Copyright 2018 Uber Technologies, Inc. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// =============================================================================

#ifndef HOROVOD_FUSION_BUFFER_MANAGER_H
#define HOROVOD_FUSION_BUFFER_MANAGER_H

#include <functional>
#include <iostream>
#include <memory>
#include <tuple>
#include <unordered_map>
#include <utility>

#include "common.h"
#include "hashes.h"
#include "operations.h"

namespace horovod {
namespace common {

// Encapsulates the process of creating and destroying fusion buffers as the requested
// threshold is changed.
class FusionBufferManager {
public:
  // Initializes a buffer of the given threshold size if not already cached.
  // A cached buffer of a different size is released and reallocated.
  //
  // Args:
  //  threshold: Size of the buffer in bytes.
  //  device: Device ID to associate the buffer with.
  //  context: Op context of the framework used to allocate the buffer.
  //  on_start_init: Callback invoked before a new buffer is allocated.
  //  on_end_init: Callback invoked after the allocation attempt returns.
  Status InitializeBuffer(int64_t threshold,
                          int device, std::shared_ptr<OpContext> context,
                          std::function<void()> on_start_init,
                          std::function<void()> on_end_init);

  // Returns the buffer associated with the given device and framework, or null.
  std::shared_ptr<PersistentBuffer>& GetBuffer(int device, Framework framework);

private:
  // Memory buffers for Tensor Fusion. They are keyed off device ID and
  // framework; each entry pairs the buffer with the threshold size in bytes
  // it was allocated with.
  std::unordered_map<
      std::tuple<int, Framework>,
      std::pair<std::shared_ptr<PersistentBuffer>, int64_t>> tensor_fusion_buffers_;
};

} // namespace common
} // namespace horovod

#endif //HOROVOD_FUSION_BUFFER_MANAGER_H
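
The caching behavior of FusionBufferManager can be sketched in isolation. The example below is a simplified stand-in, not the Horovod implementation: MockBuffer, MockFusionBufferCache, and CountAllocations are hypothetical names introduced here, the cache is keyed by device ID only (the real map is keyed by device and framework), and allocation uses std::make_shared in place of OpContext::AllocatePersistent. It is meant only to show that a buffer is allocated once per threshold and reallocated when the threshold changes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a framework-owned persistent buffer.
struct MockBuffer {
  explicit MockBuffer(int64_t bytes) : data(static_cast<std::size_t>(bytes)) {}
  std::vector<char> data;
};

// Simplified cache keyed by device ID only (assumption: the framework
// dimension of the real key is dropped to keep the sketch short).
class MockFusionBufferCache {
public:
  // Returns true if a new buffer was allocated, false if the cached buffer
  // already had the requested size and was reused.
  bool InitializeBuffer(int64_t threshold, int device,
                        const std::function<void()>& on_start_init,
                        const std::function<void()>& on_end_init) {
    assert(threshold >= 0);
    auto& elem = buffers_[device];  // default entry: {nullptr, 0}
    auto& buffer = elem.first;
    int64_t& size = elem.second;
    if (size != threshold) {
      // Threshold changed: release the old buffer so it is reallocated.
      buffer.reset();
      size = 0;
    }

    if (buffer == nullptr) {
      on_start_init();
      size = threshold;
      buffer = std::make_shared<MockBuffer>(threshold);
      on_end_init();
      return true;
    }
    return false;
  }

  // Returns the cached buffer for the device, or null if none exists yet.
  std::shared_ptr<MockBuffer>& GetBuffer(int device) {
    return buffers_[device].first;
  }

private:
  std::map<int, std::pair<std::shared_ptr<MockBuffer>, int64_t>> buffers_;
};

// Counts how many allocations a sequence of thresholds triggers on device 0,
// using the on_end_init callback as the allocation hook.
int CountAllocations(const std::vector<int64_t>& thresholds) {
  MockFusionBufferCache cache;
  int allocations = 0;
  for (int64_t t : thresholds) {
    cache.InitializeBuffer(t, 0, [] {}, [&allocations] { ++allocations; });
  }
  return allocations;
}
```

In this sketch, repeated calls with the same threshold reuse the cached buffer, so a tuner that settles on one fusion threshold pays the allocation cost once; each change of threshold costs one release and one new allocation.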