The "Gamma" operator

Last updated: April 7, 2018

This repository contains almost all the code I wrote for the "Gamma" operator project during my Ph.D. studies at the University of Houston.

The "Gamma" operator is a matrix operator that generates a summarization matrix (which we call the "Gamma" matrix) for a given input matrix. The "Gamma" matrix can then serve as an intermediate result for computing many linear machine learning models, including linear regression, PCA, the Naive Bayes classifier, and K-means clustering.
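To make the idea concrete, here is a minimal sketch in plain Python. It is not the code in this repository. It assumes the definition Gamma = Z Z^T, where each data point x_i with target y_i is augmented to z_i = [1, x_i, y_i]; the papers listed below are the authority on the exact definition. The function names and the toy data are made up for illustration.

```python
def gamma_matrix(points, targets):
    """Accumulate Gamma = sum_i z_i z_i^T in one pass over the data.

    Assumption: z_i = [1, x_i (d values), y_i], so Gamma is (d+2) x (d+2).
    """
    d = len(points[0])
    size = d + 2
    gamma = [[0.0] * size for _ in range(size)]
    for x, y in zip(points, targets):
        z = [1.0, *x, y]
        for a in range(size):
            for b in range(size):
                gamma[a][b] += z[a] * z[b]
    return gamma


def mean_and_covariance(gamma, d):
    """Recover the mean and the population covariance of X from Gamma alone."""
    n = gamma[0][0]                      # count of points
    linear_sum = gamma[0][1:d + 1]       # sum of x_i
    mean = [s / n for s in linear_sum]
    cov = [
        [gamma[a + 1][b + 1] / n - mean[a] * mean[b] for b in range(d)]
        for a in range(d)
    ]
    return mean, cov


def simple_regression(gamma):
    """Least-squares slope and intercept for d = 1, using only Gamma entries."""
    n, sx, sy = gamma[0][0], gamma[0][1], gamma[0][2]
    sxx, sxy = gamma[1][1], gamma[1][2]
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


# Toy data lying exactly on y = 2x + 1 (hypothetical example values).
points = [[1.0], [2.0], [3.0]]
targets = [3.0, 5.0, 7.0]
gamma = gamma_matrix(points, targets)
mean, cov = mean_and_covariance(gamma, d=1)
slope, intercept = simple_regression(gamma)
```

Once Gamma has been accumulated, the model computations above never touch the original data again, which is why a single summarization pass can feed several models.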

This research has been published in several papers, listed below:

The Gamma Matrix to Summarize Dense and Sparse Data Sets for Big Data Analytics
Carlos Ordonez, Yiqun Zhang, Wellington Cabrera
IEEE Transactions on Knowledge and Data Engineering (TKDE), 28(7): 1905-1918 (2016) [IEEEXplore] [PDF]

A Cloud System for Machine Learning Exploiting a Parallel Array DBMS
Yiqun Zhang, Carlos Ordonez, Lennart Johnsson
IEEE Proceedings of the DEXA Workshop 2017 [PDF]

The Gamma Operator for Big Data Summarization on an Array DBMS
Carlos Ordonez, Yiqun Zhang, Wellington Cabrera
Journal of Machine Learning Research (JMLR): Workshop and Conference Proceedings (BigMine 2014: 88-103) [PDF]

Time Complexity and Parallel Speedup to Compute the Gamma Summarization Matrix
Carlos Ordonez, Yiqun Zhang
Proc. Alberto Mendelzon International Workshop on Foundations of Data Management (AMW), 2016 [PDF]

Big Data Analytics Integrating a Parallel Columnar DBMS and the R Language
Yiqun Zhang, Carlos Ordonez, Wellington Cabrera
IEEE International Symposium on Cluster, Cloud and Grid Computing (CCGrid), 2016 [IEEEXplore] [PDF]

Repository structure

The earliest work in this project was published in the BigMine 2014 paper. The "Gamma" operator was initially written in C++ to run on SciDB. Those operators are in the gamma-scidb directory, in dense and sparse versions, including one with GPU acceleration (OpenACC). We later wanted to compare the SciDB implementation with Spark and Vertica, which is what the gamma-spark and gamma-vertica directories are for. The scalapack-gamma directory holds a ScaLAPACK prototype written by a previous student, Hadi Montakhabi.

We also tried K-means clustering on SciDB and Vertica, but I believe the Vertica version was never finished. The scidb-udos folder contains my custom SciDB operator for 2-D array loading, along with some other operators I had fun with while learning how to write SciDB operators. The tools folder contains scripts I used during development and experiments, as well as some small proof-of-concept programs.

Contact

I apologize for not having had enough time to polish all of this source code or to provide detailed documentation. The code here is not well engineered by my standards today, but it carries all the beautiful memories of my Ph.D. life. If you are interested in any of this work, please contact Dr. Carlos Ordonez by email at carlos at central dot uh dot edu.
