DiSC is a system for fast approximate score computation for learning multinomial Bayesian networks on large-scale distributed data. It employs decentralized computation using gossip algorithms, hashing techniques for load balancing, and a probabilistic approach for lowering resource consumption during score computation. DiSC significantly outperforms MapReduce-style computation for computing scores, which is a fundamental task during Bayesian structure learning.
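To make the idea of decentralized, gossip-based aggregation concrete, the following is a minimal sketch of the classic push-sum averaging protocol, simulated in a single process. It is not DiSC's actual algorithm, which is described in the publications below. The function name `push_sum_average`, its parameters, and the choice of uniformly random peers are assumptions made for illustration only. Each simulated node starts with a local value (for example, a local count of some variable configuration) and repeatedly shares half of its state with a randomly chosen node; every node's estimate approaches the global average without any central coordinator.

```python
import random


def push_sum_average(local_values, rounds=100, seed=0):
    """Simulate push-sum gossip; return each node's estimate of the average.

    Illustrative sketch only: a single-process simulation with uniformly
    random peer selection, not the protocol used by DiSC.
    """
    rng = random.Random(seed)
    n = len(local_values)
    sums = [float(v) for v in local_values]   # running "sum" share per node
    weights = [1.0] * n                       # running "weight" share per node

    for _ in range(rounds):
        next_sums = [0.0] * n
        next_weights = [0.0] * n
        for i in range(n):
            target = rng.randrange(n)         # random peer (may be the node itself)
            half_s, half_w = sums[i] / 2.0, weights[i] / 2.0
            # Keep one half locally and push the other half to the peer.
            next_sums[i] += half_s
            next_weights[i] += half_w
            next_sums[target] += half_s
            next_weights[target] += half_w
        sums, weights = next_sums, next_weights

    # Every node keeps half of its own weight each round, so weights stay positive.
    return [s / w for s, w in zip(sums, weights)]


if __name__ == "__main__":
    counts = [10, 20, 30, 40]                 # hypothetical per-node local counts
    estimates = push_sum_average(counts)
    # Multiplying an estimate by the number of nodes approximates the global count.
    print([round(e * len(counts), 3) for e in estimates])
```

Because the total sum and total weight are conserved in every round, each node's ratio approaches the true average, and multiplying by the number of nodes recovers an approximate global count of the kind a scoring function consumes.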
Praveen Rao, Anas Katib, Kobus Barnard, Charles Kamhoua, Kevin Kwiat, Laurent Njilla - Scalable Score Computation for Learning Multinomial Bayesian Networks Over Distributed Data. In the AAAI 2017 Workshop on Distributed Machine Learning (DML 2017), pages 498-504, San Francisco, CA, 2017.
Anas Katib, Praveen Rao, Kobus Barnard, Charles Kamhoua - Fast Approximate Score Computation on Large-Scale Distributed Data for Learning Multinomial Bayesian Networks. In the ACM Transactions on Knowledge Discovery from Data (TKDD), 13(2):14:1-14:40, 2019.
Arun Zachariah, Praveen Rao, Anas Katib, Monica Senapati, Kobus Barnard - A Gossip-Based System for Fast Approximate Score Computation in Multinomial Bayesian Networks. In the 35th IEEE International Conference on Data Engineering (ICDE), pages 1968-1971, Macau, China, 2019.
Faculty PI: Praveen Rao
PhD Students: Anas Katib, Arun Zachariah, Monica Senapati
Others: Kobus Barnard, Charles Kamhoua, Laurent Njilla, Kevin Kwiat
We would like to acknowledge the partial support of NSF Grant No. 1747751.