ABS: A System for Scalable Approximate Queries with Accuracy Guarantees
ABS is a parallel approximate query processing system for running interactive SQL queries over massive data. It lets users pick the desired point in the latency-accuracy space by running queries on data samples, and it returns approximate query results with accuracy guarantees.
ABS achieves this by exploiting a recent advance in scalable error estimation, the Analytical Bootstrap Method. Its compiler automatically chooses this fast method whenever possible, bringing the error estimation overhead down to seconds.
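The sketch below illustrates the two ideas above on a plain Python list standing in for a table column (an assumption for illustration only). It answers AVG from a uniform random sample, where smaller samples are cheaper but give wider intervals, and it attaches an error bound using the classic Monte Carlo (percentile) bootstrap. This is a swapped-in baseline, not ABS's Analytical Bootstrap Method: the resampling loop re-evaluates the aggregate hundreds of times, which is the kind of overhead the analytical method is meant to cut. The helpers `sample_rows` and `bootstrap_avg` are hypothetical and not part of ABS.

```python
import random
import statistics


def sample_rows(table, fraction, seed=0):
    """Draw a uniform random sample (without replacement) of the table.

    Hypothetical helper for illustration; not part of ABS.
    """
    rng = random.Random(seed)
    n = max(2, int(len(table) * fraction))
    return rng.sample(table, n)


def bootstrap_avg(sample, num_resamples=500, alpha=0.05, seed=0):
    """Return (AVG estimate, CI lower bound, CI upper bound) for a sample.

    Hypothetical helper using the classic Monte Carlo percentile bootstrap,
    NOT the Analytical Bootstrap Method: it recomputes AVG on num_resamples
    resamples drawn with replacement, so cost grows linearly with them.
    """
    rng = random.Random(seed)
    n = len(sample)
    estimates = sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(num_resamples)
    )
    lo = estimates[int(alpha / 2 * num_resamples)]
    hi = estimates[int((1 - alpha / 2) * num_resamples) - 1]
    return statistics.fmean(sample), lo, hi


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical "massive" column: 200,000 exponentially distributed values.
    table = [rng.expovariate(1 / 50.0) for _ in range(200_000)]
    for fraction in (0.001, 0.01):
        est, lo, hi = bootstrap_avg(sample_rows(table, fraction))
        print(f"fraction={fraction}: AVG ~ {est:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

Running it prints a wider interval for the 0.1% sample than for the 1% sample, which is the latency-accuracy trade-off in miniature.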
ABS is available for Hive and Shark, and supports HiveQL.
ABS is currently built on top of
- Hive 0.11
- Shark 0.9.1
- Spark 0.9.1
Note that ABS is different from ABM, the sequential implementation presented in our SIGMOD paper, which is implemented as a middle layer on top of MonetDB using Java and R.
ABS can be set up quickly in standalone mode:
- Clone the source
- Enter the ABS root folder
- Run the clean script
chmod +x clean.sh && ./clean.sh
- Compile using
After you have successfully compiled ABS, you can start the ABS CLI with
./bin/shark
or start the server with
./bin/shark --service sharkserver
ABS requires Spark 0.9.1 to run on a cluster. You can set up the cluster by following steps similar to those in Running Shark on a Cluster.
ABS extends both Hive and Shark. This repository contains all the code for Shark; for Hive, we provide a jar file, hive-exec-0.11.0-shark-0.9.1.jar. If you are interested in the Hive implementation, please check here.
Kai Zeng, Shi Gao, Jiaqi Gu, Barzan Mozafari, Carlo Zaniolo: ABS: a system for scalable approximate queries with accuracy guarantees. ACM SIGMOD 2014 (Best Demo Award)
Kai Zeng, Shi Gao, Barzan Mozafari, Carlo Zaniolo: The analytical bootstrap: a new method for fast error estimation in approximate query processing. ACM SIGMOD 2014
Kai Zeng, Shi Gao, Barzan Mozafari, Carlo Zaniolo: The analytical bootstrap: a new method for fast error estimation in approximate query processing. Technical Report CSD #130028, UCLA, 2013.