Distributed Training of Decision Trees and Random Forests

Following the divide-and-conquer paradigm, this system treats the construction of each tree node as a task and runs as many of these tasks in parallel as possible.

This allows it to use all CPU cores in a cluster to train tree models over big data. The data must be stored on the Hadoop Distributed File System (HDFS) so that it can be loaded in parallel. For more details on how to run the system, please read the file "ReadMe.txt".
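The node-as-task idea can be illustrated with a minimal single-machine sketch. It is not TreeServer's actual code: the names `Sample`, `Node`, `Task`, `TreeBuilder`, and `predict` are hypothetical, the data has a single numeric feature with binary labels, splits are chosen by Gini impurity, and a shared in-memory queue served by threads stands in for a cluster of workers. Each task builds one node and, if the node is split, enqueues one new task per child.

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// All names below are hypothetical and for illustration only.
struct Sample { double x; int label; };  // one numeric feature, label 0 or 1

struct Node {
    bool leaf = true;
    int prediction = 0;         // majority label of the rows that reached this node
    double threshold = 0.0;     // internal nodes: go left when x < threshold
    int left = -1, right = -1;  // child indices in the node vector
};

// One task = "construct node `node` from `rows`".
struct Task { int node; std::vector<Sample> rows; int depth; };

// Gini impurity of n rows, of which pos are labeled 1.
static double gini(double pos, double n) {
    if (n == 0) return 0.0;
    double p = pos / n;
    return 2.0 * p * (1.0 - p);
}

class TreeBuilder {
public:
    std::vector<Node> build(std::vector<Sample> data, int max_depth, unsigned n_threads) {
        assert(n_threads > 0 && !data.empty());
        max_depth_ = max_depth;
        nodes_.assign(1, Node{});
        tasks_.clear();
        tasks_.push_back(Task{0, std::move(data), 0});  // the root is the first task
        pending_ = 1;
        std::vector<std::thread> pool;
        for (unsigned i = 0; i < n_threads; ++i) pool.emplace_back([this] { work(); });
        for (auto& t : pool) t.join();
        return nodes_;
    }

private:
    void work() {
        for (;;) {
            Task t;
            {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return !tasks_.empty() || pending_ == 0; });
                if (tasks_.empty()) return;  // nothing queued and nothing in flight
                t = std::move(tasks_.front());
                tasks_.pop_front();
            }
            process(t);
        }
    }

    void process(Task& t) {
        auto& rows = t.rows;
        const std::size_t n = rows.size();
        std::size_t pos = 0;
        for (const auto& s : rows) pos += s.label;
        Node result;
        result.prediction = (2 * pos >= n) ? 1 : 0;
        std::size_t best = 0;  // rows [0, best) go left; 0 means "do not split"
        if (t.depth < max_depth_ && pos != 0 && pos != n) {
            std::sort(rows.begin(), rows.end(),
                      [](const Sample& a, const Sample& b) { return a.x < b.x; });
            double best_score = gini(pos, n);  // a split must strictly improve on this
            std::size_t left_pos = 0;
            for (std::size_t i = 1; i < n; ++i) {
                left_pos += rows[i - 1].label;
                if (rows[i].x == rows[i - 1].x) continue;  // cannot split equal values
                double score = (i * gini(left_pos, i) + (n - i) * gini(pos - left_pos, n - i)) / n;
                if (score < best_score) { best_score = score; best = i; }
            }
        }
        std::vector<Sample> lrows, rrows;
        if (best > 0) {
            result.leaf = false;
            result.threshold = rows[best].x;
            lrows.assign(rows.begin(), rows.begin() + best);
            rrows.assign(rows.begin() + best, rows.end());
        }
        std::lock_guard<std::mutex> lk(m_);
        if (best > 0) {  // divide: each child becomes an independent task
            result.left = static_cast<int>(nodes_.size());
            result.right = result.left + 1;
            nodes_.resize(nodes_.size() + 2);
            tasks_.push_back(Task{result.left, std::move(lrows), t.depth + 1});
            tasks_.push_back(Task{result.right, std::move(rrows), t.depth + 1});
            pending_ += 2;
        }
        nodes_[t.node] = result;
        --pending_;  // this task is finished
        cv_.notify_all();
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::deque<Task> tasks_;
    std::vector<Node> nodes_;
    int pending_ = 0;  // tasks queued or in progress, guarded by m_
    int max_depth_ = 0;
};

// Walk from the root (index 0) to a leaf and return its majority label.
int predict(const std::vector<Node>& tree, double x) {
    int i = 0;
    while (!tree[i].leaf) i = (x < tree[i].threshold) ? tree[i].left : tree[i].right;
    return tree[i].prediction;
}
```

To try it, construct a `TreeBuilder`, call `build(data, max_depth, n_threads)`, and pass the returned node vector to `predict`; on most platforms the program must be linked with thread support (for example `-pthread` with GCC or Clang). Because sibling subtrees share no state other than the queue and the node vector, the number of runnable tasks grows as the tree deepens, which is what lets additional cores be kept busy.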

Configure the Running Environment

Please follow the deployment documentation here: https://yanlab19870714.github.io/yanda/gthinker/deploy.html

We additionally require you to install the Boost C++ library. You may need to update the Makefiles to run on your platform.

Contact

Da Yan: https://yanlab19870714.github.io/yanda

Video Demo: https://www.youtube.com/watch?v=4DnLv_OFlIg

Email: yanda@uab.edu

Contributors

CHOWDHURY, Md Mashiur Rahman (Mashiur)

YAN, Da (Daniel)

GUO, Guimu

