
DataBass: Not Quite a DataBase

This isn't your average database. This will be the base of operations for expanding your data processing knowledge! The base of your data exploration in this class! It will cover the bassics of query execution that you will learn in class!

We present to you... the DATABASS... Bass... bass.

Getting Started

This is a simple Python-based analytical database for instructional purposes. See the system design for details.


git clone

# If you get an error like 'Fatal' or 'Access denied', try this instead:
#git clone

# turn on virtualenv

pip install click pandas numpy parsimonious readline

If you are a Columbia student and have a clic account, you can install and edit databass on clic. That way you can minimize computer environment issues:

# ssh into clic
ssh <your user name>

# create virtual environment and enable it
mkvirtualenv test
workon test

git clone
pip install click pandas numpy parsimonious readline

Take DataBass for a Spin

Do the following to run the DataBass console:

cd databass-public/src/engine

Below is an example session using the prompt. The user input is the text after the > character.

Welcome to DeepBass.
Type "help" for help, and "q" to exit
> help

List of commands

[query]                           runs query string
PARSE [query or expression str]   parse and print AST for expression or query
TRACE                             print stack trace of last error
SHOW TABLES                       print list of database tables
SHOW <tablename>                  print schema for <tablename>

You can see how simple expressions are parsed:

> parse 1+2*a
1.0 + 2.0 * a

> parse (1+2*a) / 10
(1.0 + 2.0 * a) / 10.0
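
The output above suggests that numeric literals are stored as floats and that parentheses are printed only where operator precedence requires them. The following is a minimal sketch of that behavior, not DataBass's actual parser: it borrows Python's standard `ast` module instead of the parsimonious grammar the project installs, and the names `render` and `parse_expr` are hypothetical.

```python
import ast

# Binary operators this sketch understands: AST node type -> (symbol, precedence).
_OPS = {
    ast.Add: ("+", 1),
    ast.Sub: ("-", 1),
    ast.Mult: ("*", 2),
    ast.Div: ("/", 2),
}


def render(node, parent_prec=0):
    """Render an expression AST as text, writing every number as a float."""
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return str(float(node.value))
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        symbol, prec = _OPS[type(node.op)]
        left = render(node.left, prec)
        # Operators are left-associative, so the right operand needs
        # parentheses when it has the same precedence as this operator.
        right = render(node.right, prec + 1)
        text = f"{left} {symbol} {right}"
        # Parenthesize only when the parent binds more tightly.
        return f"({text})" if prec < parent_prec else text
    raise ValueError(f"unsupported expression: {ast.dump(node)}")


def parse_expr(text):
    """Hypothetical helper: parse an arithmetic expression and print it back."""
    return render(ast.parse(text, mode="eval").body)


print(parse_expr("1+2*a"))         # 1.0 + 2.0 * a
print(parse_expr("(1+2*a) / 10"))  # (1.0 + 2.0 * a) / 10.0
```

Reusing Python's own expression grammar keeps the sketch short; a real SQL parser needs its own grammar because SQL syntax differs from Python's.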

Or the parsed query plan of a SQL query:

> parse SELECT 1+2*a AS a FROM data WHERE a > 1
Project(1.0 + 2.0 * a AS a)
  WHERE(a > 1.0)
	  Scan(data AS data)
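
The plan above reads bottom-up: Scan produces rows, WHERE filters them, and Project computes the output columns. One common way to execute such a plan is as a chain of iterators. The sketch below illustrates that idea with Python generators. It is an assumption made for illustration, not DataBass's actual operator classes (see the system design for those), and the function names and sample rows are made up.

```python
def scan(rows):
    """Scan: yield each row (a dict) of an in-memory table."""
    for row in rows:
        yield row


def where(child, predicate):
    """WHERE: pass through only the rows for which the predicate holds."""
    for row in child:
        if predicate(row):
            yield row


def project(child, exprs):
    """Project: build an output row from {alias: function(row)} expressions."""
    for row in child:
        yield {alias: fn(row) for alias, fn in exprs.items()}


# Made-up rows standing in for the `data` table.
data = [{"a": 1, "b": 2}, {"a": 2, "b": 3}, {"a": 5, "b": 0}]

# SELECT 1+2*a AS a FROM data WHERE a > 1
plan = project(
    where(scan(data), lambda r: r["a"] > 1.0),
    {"a": lambda r: 1.0 + 2.0 * r["a"]},
)
result = list(plan)
print(result)  # [{'a': 5.0}, {'a': 11.0}]
```

Each operator pulls rows from its child only when asked, so no intermediate result is materialized until `list(plan)` drains the pipeline.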

When the program starts, DataBass automatically crawls all subdirectories and loads any CSV files that it finds into memory. In our example, src/engine/data contains two CSV files: data.csv and iowa-liquor-sample.csv.
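
The crawl-and-load step described above can be sketched as follows. This is a simplified illustration under stated assumptions, not DataBass's actual loader: it uses only the standard library, the function name `load_csv_tables` is hypothetical, and it keeps every value as a string, whereas the real system reports typed columns.

```python
import csv
import os


def load_csv_tables(root):
    """Walk `root` recursively and load every CSV file into memory.

    Returns {table_name: [row_dict, ...]}, where the table name is the
    file name without its extension. Values are left as strings.
    """
    tables = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if filename.lower().endswith(".csv"):
                name = os.path.splitext(filename)[0]
                path = os.path.join(dirpath, filename)
                with open(path, newline="") as f:
                    tables[name] = list(csv.DictReader(f))
    return tables


if __name__ == "__main__":
    # Load tables from the current directory and list their names.
    for table_name in sorted(load_csv_tables(".")):
        print(table_name)
```

Under this naming assumption, a file such as data.csv would become a table named data, which matches the `Scan(data AS data)` line in the plan output.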

> show tables

> show data
Schema for data
a       <type 'int'>
b       <type 'int'>
c       <type 'int'>
d       <type 'int'>

You can execute a simple query, and it will print the query plan and then the result rows. Notice that SQL keywords need to be CAPITALIZED:

> SELECT 1
Project(1.0 AS attr0)
{'attr0': 1.0}

> select 1
('ERROR:', Rule 'query' didn't match at 'select 1' (line 1, column 1).)

> SELECT * FROM data
  Project(* AS None)
	Scan(data AS data)
{'a': 1, 'c': 3, 'b': 2, 'd': 4}
{'a': 1, 'c': 6, 'b': 5, 'd': 7}