[CUDA] Multi-GPU for CUDA Version #6138

Status: Open. Wants to merge 82 commits into base: master.

82 commits:
ee3923b
initialize nccl
shiyu1994 Oct 10, 2023
82668d0
Merge branch 'master' into nccl-dev
shiyu1994 Oct 26, 2023
6189cbb
Merge branch 'master' into nccl-dev
shiyu1994 Oct 27, 2023
f39f877
change year in header
shiyu1994 Nov 8, 2023
e513662
Merge branch 'master' into nccl-dev
shiyu1994 Nov 8, 2023
47f3e50
Merge branch 'nccl-dev' of https://github.com/Microsoft/LightGBM into…
shiyu1994 Nov 8, 2023
985780f
add implementation of nccl gbdt
shiyu1994 Nov 8, 2023
35b0ca1
add nccl topology
shiyu1994 Nov 9, 2023
7d36a14
clean up
shiyu1994 Nov 9, 2023
5470d99
Merge branch 'master' into nccl-dev
shiyu1994 Nov 9, 2023
7b47a1e
clean up
shiyu1994 Nov 9, 2023
839c375
Merge branch 'nccl-dev' of https://github.com/Microsoft/LightGBM into…
shiyu1994 Nov 9, 2023
8eaf3ad
Merge branch 'master' into nccl-dev
shiyu1994 Dec 15, 2023
cc72fc8
Merge branch 'master' into nccl-dev
shiyu1994 Dec 22, 2023
209e25d
set nccl info
shiyu1994 Jan 25, 2024
431f967
support quantized training with categorical features on cpu
shiyu1994 Feb 5, 2024
b07caf2
remove white spaces
shiyu1994 Feb 5, 2024
cf60467
add tests for quantized training with categorical features
shiyu1994 Feb 5, 2024
bf2f649
skip tests for cuda version
shiyu1994 Feb 5, 2024
2fc9525
fix cases when only 1 data block in row-wise quantized histogram cons…
shiyu1994 Feb 6, 2024
dce770c
remove useless capture
shiyu1994 Feb 6, 2024
f0c44fc
Merge branch 'master' into nccl-dev
shiyu1994 Feb 6, 2024
e2cb41f
Merge branch 'nccl-dev' of https://github.com/Microsoft/LightGBM into…
shiyu1994 Feb 6, 2024
f3985ef
fix inconsistency of gpu devices
shiyu1994 Feb 7, 2024
d000a41
fix creating boosting object from file
shiyu1994 Feb 7, 2024
ecdccd5
change num_gpu to num_gpus in test case
shiyu1994 Feb 7, 2024
dfa4419
fix objective initialization
shiyu1994 Feb 9, 2024
f4b8906
Merge branch 'nccl-dev' of https://github.com/Microsoft/LightGBM into…
shiyu1994 Feb 9, 2024
f0b22d1
fix c++ compilation warning
shiyu1994 Feb 9, 2024
617b3b2
fix lint errors
shiyu1994 Feb 9, 2024
6d090b2
Merge branch 'master' into fix-6257
shiyu1994 Feb 20, 2024
736ab8a
Merge branch 'master' into nccl-dev
shiyu1994 Feb 20, 2024
ad72d9f
Merge branch 'fix-6257' into nccl-dev
shiyu1994 Feb 20, 2024
2670f48
fix compilation warnings
shiyu1994 Feb 20, 2024
02b725b
change num_gpu to num_gpus in R test case
shiyu1994 Feb 20, 2024
3bfb784
add nccl synchronization in tree training
shiyu1994 Feb 20, 2024
fe1f592
fix global num data update
shiyu1994 Feb 21, 2024
a528bd6
merge master
shiyu1994 Feb 22, 2024
996d70b
fix ruff-format issues
shiyu1994 Feb 22, 2024
671bed3
merge master
shiyu1994 Feb 23, 2024
34610fb
use global num data in split finder
shiyu1994 Feb 23, 2024
041018b
Merge branch 'master' into nccl-dev
shiyu1994 Mar 6, 2024
e1b4512
explicit initialization of NCCLInfo members
shiyu1994 Mar 11, 2024
0a21b5f
Merge branch 'master' into nccl-dev
shiyu1994 Mar 25, 2024
be29624
Merge branch 'nccl-dev' of https://github.com/Microsoft/LightGBM into…
shiyu1994 Mar 25, 2024
06cfde4
Merge branch 'master' into nccl-dev
shiyu1994 Apr 11, 2024
75afe5e
Merge branch 'master' into nccl-dev
shiyu1994 May 20, 2024
1e6e4a1
Merge branch 'master' into nccl-dev
shiyu1994 Jun 30, 2024
614605c
merge master
shiyu1994 Oct 8, 2024
18babb0
Merge branch 'nccl-dev' of https://github.com/Microsoft/LightGBM into…
shiyu1994 Oct 8, 2024
11f4062
fix compilation
shiyu1994 Oct 8, 2024
b4c21c2
use CUDAVector
shiyu1994 Oct 9, 2024
70fe10f
use CUDAVector
shiyu1994 Oct 9, 2024
849a554
merge master
shiyu1994 Oct 18, 2024
19a2662
merge master
shiyu1994 Oct 18, 2024
6db879a
use CUDAVector
shiyu1994 Oct 25, 2024
b43f88b
use CUDAVector for cuda tree and column data
shiyu1994 Oct 25, 2024
582c760
update gbdt
shiyu1994 Oct 25, 2024
b9e143b
changes for cuda tree
shiyu1994 Oct 25, 2024
483e521
use CUDAVector for cuda column data
shiyu1994 Oct 25, 2024
950199d
fix bug in GetDataByColumnPointers
shiyu1994 Oct 25, 2024
f30ee85
Merge branch 'master' into nccl-dev
shiyu1994 Oct 25, 2024
d11991a
disable cuda by default
shiyu1994 Oct 25, 2024
4bb4411
Merge branch 'nccl-dev' of https://github.com/Microsoft/LightGBM into…
shiyu1994 Oct 25, 2024
b56b39e
fix single machine gbdt
shiyu1994 Oct 25, 2024
3bebc19
merge main
shiyu1994 Dec 17, 2024
47b4364
clean up
shiyu1994 Dec 17, 2024
a326c87
fix typo
shiyu1994 Dec 17, 2024
5f999e7
fix lint issues
shiyu1994 Dec 17, 2024
d8ea043
Merge branch 'master' into nccl-dev
shiyu1994 Dec 18, 2024
a0864dc
Merge branch 'nccl-dev' of https://github.com/Microsoft/LightGBM into…
shiyu1994 Dec 18, 2024
2f040b7
Merge branch 'master' into nccl-dev
shiyu1994 Dec 23, 2024
0cf1062
Merge branch 'nccl-dev' of https://github.com/Microsoft/LightGBM into…
shiyu1994 Dec 24, 2024
ae4cce6
use num_gpu instead of num_gpus
shiyu1994 Dec 24, 2024
266e02b
Merge branch 'master' into nccl-dev
shiyu1994 Feb 25, 2025
6aa2aff
fix compilation error
shiyu1994 Feb 25, 2025
b137216
fix cpp lint errors
shiyu1994 Feb 25, 2025
3e1452a
Merge branch 'master' into nccl-dev
shiyu1994 Mar 7, 2025
a86bb42
fix reset config for cuda data partition
shiyu1994 Mar 7, 2025
37fb144
fix subrow copy in cuda column data
shiyu1994 Mar 7, 2025
4fa4837
Merge branch 'nccl-dev' of https://github.com/Microsoft/LightGBM into…
shiyu1994 Mar 7, 2025
5bf50de
fix cmakelint errors
shiyu1994 Mar 7, 2025
Changes from commit ee3923b5d6018292f853df90238d20f6e3c62d13: initialize nccl
shiyu1994 committed Oct 10, 2023
5 changes: 5 additions & 0 deletions CMakeLists.txt
@@ -204,6 +204,7 @@ endif()

if(USE_CUDA)
    find_package(CUDA 11.0 REQUIRED)
    find_package(Nccl REQUIRED)
    include_directories(${CUDA_INCLUDE_DIRS})
    set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -Xcompiler=${OpenMP_CXX_FLAGS} -Xcompiler=-fPIC -Xcompiler=-Wall")

@@ -561,6 +562,10 @@ if(USE_GPU)
    target_link_libraries(lightgbm_objs PUBLIC ${OpenCL_LIBRARY} ${Boost_LIBRARIES})
endif()

if(USE_CUDA)
    target_link_libraries(lightgbm_objs PUBLIC ${NCCL_LIBRARY})
endif(USE_CUDA)

if(__INTEGRATE_OPENCL)
    # targets OpenCL and Boost are added in IntegratedOpenCL.cmake
    add_dependencies(lightgbm_objs OpenCL Boost)
70 changes: 70 additions & 0 deletions cmake/modules/FindNccl.cmake
@@ -0,0 +1,70 @@
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Tries to find NCCL headers and libraries.
#
# Usage of this module as follows:
#
# find_package(NCCL)
#
# Variables used by this module, they can change the default behaviour and need
# to be set before calling find_package:
#
# NCCL_ROOT - When set, this path is inspected instead of standard library
# locations as the root of the NCCL installation.
# The environment variable NCCL_ROOT overrides this variable.
#
# This module defines
# Nccl_FOUND, whether nccl has been found
# NCCL_INCLUDE_DIR, directory containing header
# NCCL_LIBRARY, directory containing nccl library
# NCCL_LIB_NAME, nccl library name
# USE_NCCL_LIB_PATH, when set, NCCL_LIBRARY path is also inspected for the
# location of the nccl library. This would disable
# switching between static and shared.
#
# This module assumes that the user has already called find_package(CUDA)

if (NCCL_LIBRARY)
  if(NOT USE_NCCL_LIB_PATH)
    # Don't cache NCCL_LIBRARY to enable switching between static and shared.
    unset(NCCL_LIBRARY CACHE)
  endif(NOT USE_NCCL_LIB_PATH)
endif()

if (BUILD_WITH_SHARED_NCCL)
  # libnccl.so
  set(NCCL_LIB_NAME nccl)
else ()
  # libnccl_static.a
  set(NCCL_LIB_NAME nccl_static)
endif (BUILD_WITH_SHARED_NCCL)

find_path(NCCL_INCLUDE_DIR
  NAMES nccl.h
  PATHS $ENV{NCCL_ROOT}/include ${NCCL_ROOT}/include)

find_library(NCCL_LIBRARY
  NAMES ${NCCL_LIB_NAME}
  PATHS $ENV{NCCL_ROOT}/lib/ ${NCCL_ROOT}/lib)

message(STATUS "Using nccl library: ${NCCL_LIBRARY}")

include(FindPackageHandleStandardArgs)
find_package_handle_standard_args(Nccl DEFAULT_MSG
  NCCL_INCLUDE_DIR NCCL_LIBRARY)

mark_as_advanced(
  NCCL_INCLUDE_DIR
  NCCL_LIBRARY
)
169 changes: 169 additions & 0 deletions src/boosting/cuda/nccl_gbdt.hpp
@@ -0,0 +1,169 @@
/*!
 * Copyright (c) 2021 Microsoft Corporation. All rights reserved.
 * Licensed under the MIT License. See LICENSE file in the project root for license information.
 */

#ifndef LIGHTGBM_BOOSTING_CUDA_NCCL_GBDT_HPP_
#define LIGHTGBM_BOOSTING_CUDA_NCCL_GBDT_HPP_

#ifdef USE_CUDA

#include "../gbdt.h"
#include <LightGBM/objective_function.h>
#include <LightGBM/network.h>
#include "cuda_score_updater.hpp"
#include <pthread.h>

namespace LightGBM {

template <typename GBDT_T>
class NCCLGBDT: public GBDT_T {
 public:
  NCCLGBDT();

  ~NCCLGBDT();

  void Init(const Config* gbdt_config, const Dataset* train_data,
            const ObjectiveFunction* objective_function,
            const std::vector<const Metric*>& training_metrics) override;

  void Boosting() override;

  void RefitTree(const std::vector<std::vector<int>>& /*tree_leaf_prediction*/) override {
    Log::Fatal("RefitTree is not supported for NCCLGBDT.");
  }

  bool TrainOneIter(const score_t* gradients, const score_t* hessians) override;

  const double* GetTrainingScore(int64_t* /*out_len*/) override {
    Log::Fatal("GetTrainingScore is not supported for NCCLGBDT.");
  }

  void ResetTrainingData(const Dataset* /*train_data*/, const ObjectiveFunction* /*objective_function*/,
                         const std::vector<const Metric*>& /*training_metrics*/) override {
    Log::Fatal("ResetTrainingData is not supported for NCCLGBDT.");
  }

  void ResetConfig(const Config* /*gbdt_config*/) override {
    Log::Fatal("ResetConfig is not supported for NCCLGBDT.");
  }

 private:
  struct BoostingThreadData {
    int gpu_index;
    ObjectiveFunction* gpu_objective_function;
    score_t* gradients;
    score_t* hessians;
    const double* score;

    BoostingThreadData() {
      gpu_index = 0;
      gpu_objective_function = nullptr;
    }
  };

  struct TrainTreeLearnerThreadData {
    int gpu_index;
    TreeLearner* gpu_tree_learner;
    const score_t* gradients;
    const score_t* hessians;
    bool is_first_time;
    int class_id;
    data_size_t num_data_in_gpu;
    std::unique_ptr<Tree> tree;

    TrainTreeLearnerThreadData() {
      gpu_index = 0;
      gpu_tree_learner = nullptr;
      gradients = nullptr;
      hessians = nullptr;
      is_first_time = false;
      class_id = 0;
      num_data_in_gpu = 0;
      tree.reset(nullptr);
    }
  };

  struct UpdateScoreThreadData {
    int gpu_index;
    ScoreUpdater* gpu_score_updater;
    TreeLearner* gpu_tree_learner;
    Tree* tree;
    int cur_tree_id;

    UpdateScoreThreadData() {
      gpu_index = 0;
      gpu_score_updater = nullptr;
      gpu_tree_learner = nullptr;
      tree = nullptr;
      cur_tree_id = 0;
    }
  };

  static void* BoostingThread(void* thread_data);

  static void* TrainTreeLearnerThread(void* thread_data);

  static void* UpdateScoreThread(void* thread_data);

  void Bagging(int /*iter*/) override {
    Log::Fatal("Bagging is not supported for NCCLGBDT.");
  }

  void InitNCCL();

  double BoostFromAverage(int class_id, bool update_scorer) override;

  void UpdateScore(const std::vector<std::unique_ptr<Tree>>& tree, const int cur_tree_id);

  void UpdateScore(const Tree* /*tree*/, const int /*cur_tree_id*/) {
    Log::Fatal("UpdateScore is not supported for NCCLGBDT.");
  }

  void RollbackOneIter() override {
    Log::Fatal("RollbackOneIter is not supported for NCCLGBDT.");
  }

  std::vector<double> EvalOneMetric(const Metric* metric, const double* score, const data_size_t num_data) const override;

  void SetCUDADevice(int gpu_id) const {
    if (gpu_list_.empty()) {
      CUDASUCCESS_OR_FATAL(cudaSetDevice(gpu_id));
    } else {
      CUDASUCCESS_OR_FATAL(cudaSetDevice(gpu_list_[gpu_id]));
    }
  }

  int GetCUDADevice(int gpu_id) const {
    if (gpu_list_.empty()) {
      return gpu_id;
    } else {
      return gpu_list_[gpu_id];
    }
  }

  int num_gpu_;
  int num_threads_;
  int master_gpu_device_id_;
  int master_gpu_index_;
  std::vector<int> gpu_list_;
  std::vector<std::unique_ptr<ObjectiveFunction>> per_gpu_objective_functions_;
  std::vector<std::unique_ptr<ScoreUpdater>> per_gpu_train_score_updater_;
  std::vector<std::unique_ptr<CUDAVector<score_t>>> per_gpu_gradients_;
  std::vector<std::unique_ptr<CUDAVector<score_t>>> per_gpu_hessians_;
  std::vector<std::unique_ptr<Dataset>> per_gpu_datasets_;
  std::vector<data_size_t> per_gpu_data_start_;
  std::vector<data_size_t> per_gpu_data_end_;
  std::vector<pthread_t> host_threads_;
  std::vector<BoostingThreadData> boosting_thread_data_;
  std::vector<TrainTreeLearnerThreadData> train_tree_learner_thread_data_;
  std::vector<UpdateScoreThreadData> update_score_thread_data_;
  std::vector<int> nccl_gpu_rank_;
  std::vector<ncclComm_t> nccl_communicators_;
  std::vector<std::unique_ptr<TreeLearner>> per_gpu_tree_learners_;
};

}  // namespace LightGBM

#endif  // USE_CUDA
#endif  // LIGHTGBM_BOOSTING_CUDA_NCCL_GBDT_HPP_
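The header keeps per-GPU contiguous row ranges (`per_gpu_data_start_`, `per_gpu_data_end_`) and a `GetCUDADevice` mapping that falls back to the logical index when `gpu_list_` is empty. A minimal standalone sketch of both, assuming an even contiguous split with the remainder given to the leading GPUs (the exact partitioning rule is not visible in this file, so that part is an assumption):

```cpp
#include <cstdint>
#include <vector>

// Hypothetical standalone sketch; data_size_t mirrors LightGBM's typedef.
using data_size_t = int32_t;

// Map a logical GPU index to a physical device id, as in GetCUDADevice():
// identity when no explicit gpu_list is configured.
int GetCUDADevice(const std::vector<int>& gpu_list, int gpu_index) {
  return gpu_list.empty() ? gpu_index : gpu_list[gpu_index];
}

// Split num_data rows into num_gpu contiguous [start, end) ranges,
// giving one extra row to each of the first (num_data % num_gpu) GPUs.
void PartitionRows(data_size_t num_data, int num_gpu,
                   std::vector<data_size_t>* start,
                   std::vector<data_size_t>* end) {
  start->resize(num_gpu);
  end->resize(num_gpu);
  const data_size_t base = num_data / num_gpu;
  const data_size_t rem = num_data % num_gpu;
  data_size_t cur = 0;
  for (int i = 0; i < num_gpu; ++i) {
    (*start)[i] = cur;
    cur += base + (i < rem ? 1 : 0);
    (*end)[i] = cur;
  }
}
```

With this rule, 10 rows over 4 GPUs yield ranges of sizes 3, 3, 2, 2, and every row belongs to exactly one GPU's range.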