forked from tspurway/hustle

A column oriented, embarrassingly distributed relational OLAP database.

tonicmuroq/hustle

Hustle

Features

  • column oriented - super fast queries
  • distributed insert - Hustle is designed for petabyte-scale datasets in a distributed environment with massive write loads
  • compressed - bitmap indexes, lz4, and prefix trie compression
  • relational - join gigantic data sets
  • partitioned - smart shards
  • embarrassingly distributed (based on Disco)
  • embarrassingly fast (uses LMDB)
  • NoSQL - Python DSL
  • bulk, append-only semantics
  • highly available, horizontally scalable
  • REPL/CLI query interface
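The bitmap-index idea mentioned in the feature list can be illustrated with a toy sketch. This is not Hustle's code: it uses plain Python ints as uncompressed bitsets (Hustle's own indexes are compressed), and the columns and values are made up. It only shows why, assuming where clauses are resolved against per-value bitmap indexes, a condition like `(impressions.ad_id == 30010) & ...` reduces to bitwise operations.

```python
from collections import defaultdict


def build_bitmap_index(column):
    """Map each distinct value in a column to a bitmap of the row ids holding it.

    Python ints are arbitrary precision, so an int serves as an uncompressed
    bitset: bit i is set when row i has that value.
    """
    index = defaultdict(int)
    for row_id, value in enumerate(column):
        index[value] |= 1 << row_id
    return dict(index)


def rows_of(bitmap):
    """Decode a bitmap back into a sorted list of row ids."""
    rows = []
    row_id = 0
    while bitmap:
        if bitmap & 1:
            rows.append(row_id)
        bitmap >>= 1
        row_id += 1
    return rows


# A tiny table stored column by column (hypothetical toy data).
ad_id = [30010, 30011, 30010, 30010, 30012]
site_id = [1, 1, 2, 1, 2]

ad_index = build_bitmap_index(ad_id)
site_index = build_bitmap_index(site_id)

# (ad_id == 30010) & (site_id == 1) becomes a single AND of two bitmaps.
matches = ad_index[30010] & site_index[1]
print(rows_of(matches))  # [0, 3]
```

A range predicate such as `impressions.date < '2014-01-13'` fits the same scheme by OR-ing together the bitmaps of every qualifying value.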

Example Query

select(impressions.ad_id, impressions.date, h_sum(pix.amount), h_count(),
       where=((impressions.date < '2014-01-13') & (impressions.ad_id == 30010), 
               pix.date < '2014-01-13'),
       join=(impressions.site_id, pix.site_id),
       order_by=impressions.date)
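To make the example query's semantics concrete, here is a plain-Python sketch of what it computes over small in-memory lists. It is not how Hustle executes queries (Hustle distributes the work via Disco over its column store), and the rows are invented. It assumes that the non-aggregated columns (`ad_id`, `date`) act as the grouping key, as in SQL's `GROUP BY`, and that `h_count()` counts joined rows per group.

```python
from collections import defaultdict

# Toy rows standing in for the two tables (hypothetical data).
impressions = [
    {"ad_id": 30010, "date": "2014-01-10", "site_id": "a"},
    {"ad_id": 30010, "date": "2014-01-10", "site_id": "b"},
    {"ad_id": 30010, "date": "2014-01-12", "site_id": "a"},
    {"ad_id": 30011, "date": "2014-01-10", "site_id": "a"},
    {"ad_id": 30010, "date": "2014-01-14", "site_id": "a"},
]
pix = [
    {"site_id": "a", "date": "2014-01-11", "amount": 5},
    {"site_id": "b", "date": "2014-01-11", "amount": 2},
    {"site_id": "a", "date": "2014-01-15", "amount": 100},
]


def example_query(impressions, pix):
    # where: each table is filtered by its own condition before the join.
    imps = [r for r in impressions
            if r["date"] < "2014-01-13" and r["ad_id"] == 30010]
    pxs = [r for r in pix if r["date"] < "2014-01-13"]

    # join: hash join on impressions.site_id == pix.site_id.
    pix_by_site = defaultdict(list)
    for p in pxs:
        pix_by_site[p["site_id"]].append(p)

    # Assumed grouping on the non-aggregated columns (ad_id, date).
    groups = defaultdict(lambda: [0, 0])  # key -> [sum of amount, row count]
    for i in imps:
        for p in pix_by_site[i["site_id"]]:
            acc = groups[(i["ad_id"], i["date"])]
            acc[0] += p["amount"]
            acc[1] += 1

    # order_by: sort the result rows by impressions.date.
    rows = [(ad, date, total, count)
            for (ad, date), (total, count) in groups.items()]
    return sorted(rows, key=lambda row: row[1])


for row in example_query(impressions, pix):
    print(row)
```

Filtering each table before joining keeps the join input small, which matters far more at Hustle's scale than in this toy.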

BETA / EAP

Please note that this software is beta/early access. We hope you thoroughly enjoy wrangling unimaginably large datasets with it, but we really have no idea how it will perform in your particular installation. For help, please open a GitHub issue or email me at tspurway@gmail.com.

Installation

After cloning this repo, here are some considerations:

  • you will need Python 2.7 or higher; it probably won't work on 2.6 (due to an issue with pickling lambdas)
  • you need to install Disco 0.5 and its dependencies - get that working first
  • you need to install Hustle and its dependencies as follows:
cd hustle
sudo ./bootstrap.sh

Please refer to the Installation Guide for more details.
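The first two prerequisites can be checked before running `./bootstrap.sh`. The script below is a hypothetical helper, not part of Hustle; it assumes Disco's Python package is importable as `disco`, and it does not verify the Disco version or that a Disco master is running.

```python
import sys


def preflight_problems():
    """Return human-readable problems found before running bootstrap.sh."""
    problems = []
    # The README asks for Python 2.7 or higher; 2.6 is known not to work.
    if sys.version_info < (2, 7):
        problems.append("Python 2.7 or higher is required, found %d.%d"
                        % tuple(sys.version_info[:2]))
    # Disco should be installed and working before Hustle.
    try:
        import disco  # noqa: F401  (assumed import name of Disco's package)
    except ImportError:
        problems.append("the 'disco' Python package is not importable; "
                        "install Disco 0.5 first")
    return problems


if __name__ == "__main__":
    issues = preflight_problems()
    for issue in issues:
        print("problem: " + issue)
    if not issues:
        print("looks ready for ./bootstrap.sh")
```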

Documentation

http://chango.github.io/hustle/

Hustle Mailing List

Credits

Special thanks to the following open-source projects:
