Skip to content

The mission of this project is to build a query processing engine for Ad-Hoc OLAP queries. The query construct is based on an extended SQL syntax known as MF and EMF queries (i.e., Multi-Feature and Extended Multi-Feature queries). Extended-SQL introduces a new operator called PHI to standard SQL to extend & simplify OLAP SQL queries.

Notifications You must be signed in to change notification settings

vishwa5854/Extended-SQL

Repository files navigation

ESQL

Extended-SQL introduces a new operator called PHI to standard SQL to extend & simplify SQL queries (especially OLAP queries, including hierarchies, trends, etc).

This project is based on the philosophy of "Succint queries lead to efficient evaluation".

Extended-SQL lets you decouple group by attributes and aggregate functions by introducing a operation called as PHI in relational algebra, and a new clause called as SUCH THAT which operates on a new term called as GROUPING VARIABLES.

Mission

The mission of this project is to build a query processing engine for Ad-Hoc OLAP queries. The query construct is based on an extended SQL syntax known as MF and EMF queries (i.e., Multi-Feature and Extended Multi-Feature queries).

Normal SQL vs Extended-SQL

(TLDR; Extended-SQL is much more Simpler, Succint & Blazingly Fast)

Consider this OLAP table schema which records sales.

sales (cust, prod, day, month, year, state, quant, date)

  1. For each customer and product find avg quantity in states of "NY" and "CT". Traditional SQL query would look something like this involving multiple expensive table JOINS

    WITH t1 as (
        SELECT cust, prod, avg(quant)
        FROM sales
        WHERE state = 'NY'
        GROUP BY cust, prod
    ), t2 as (
        SELECT cust, prod, avg(quant)
        FROM sales
        WHERE state = 'CT'
        GROUP BY cust, prod
    ) select * from t1 inner join t2 on t1.cust = t2.cust and t1.prod = t2.prod;

    Extended-SQL Version (Note that this is an MF-Query, more on this later)

    SELECT cust, prod, avg(1.quant) as NY_avg_quant, avg(2.quant) as CT_avg_quant
    FROM sales
    GROUP BY cust, prod; 1, 2
    SUCH THAT 1.state = 'NY',
    2.state = 'CT';
  2. One may want to identify those months that were “significant” for the sales of a product: “For each product and sales, show the product’s average sale before and after each month.” (trends) Traditional SQL query would look something like this involving multiple expensive table JOINS

    DROP VIEW B1;
    CREATE VIEW B1 as
    SELECT x.prod, x.month, avg(y.quant) as xx
    FROM Sales x,
         Sales y
    WHERE x.prod = y.prod
      and x.month > y.month
    GROUP BY x.prod, x.month;
    
    DROP VIEW B2;
    CREATE VIEW B2 as
    SELECT x.prod, x.month, avg(y.quant) as yy
    FROM Sales x,
         Sales y
    WHERE x.prod = y.prod
      and x.month < y.month
    GROUP BY x.prod, x.month;
    
    SELECT B1.prod, B1.month, xx, yy
    FROM B1,
         B2
    WHERE B1.prod = B2.prod
      and B1.month = B2.month;

    Extended-SQL Version (Note that this is an EMF-Query, more on this later)

    SELECT prod,
       month,
       avg(X.quant),
       avg(Y.quant)
    FROM Sales
    GROUP BY prod, month; X , Y
    SUCH THAT X.prod = prod and X.month < month,
    Y.prod = prod and Y.month > month;

Pre-requisites

  1. You need postgres to be running either in your local or modify .env to point to a running postgres instance
docker compose build && docker compose up -d
  1. Set up the username, password so that .env matches
  2. You would need dev libraries for Postgres
# fedora users example
sudo dnf install postgresql-devel

How to run ?

You need to have postgres running in your local machine or change the configuration of postgres in the .env file.

  1. Create a virtual environment to run the code.
 python3 -m venv esql
  1. Spawn a shell inside the created virtual environment.
source esql/bin/activate
  1. Install dependencies
pip3 install -r requirements.txt
  1. Run the code
python3 generator.py <input_file> <options>

Sample Usage

  1. If you would like to run an emf query
python3 generator.py emf-inputs/1.txt
  1. If you would like to run an mf query
python3 generator.py mf-inputs/1.txt mf
  1. If you just wanna generate the code but don't want to run it automatically
python3 generator.py emf-inputs/1.txt dont-run

Installing Postgres & PGAdmin

I use docker to spin up Postgres and PGAdmin in my local using the postgres-pgadmin-compose.yml. The below command will spin up a docker compose with both PGAdmin and Postgres with credentials matching in .env file.

docker compose -f postgres-pgadmin-compose.yml up -d

About

The mission of this project is to build a query processing engine for Ad-Hoc OLAP queries. The query construct is based on an extended SQL syntax known as MF and EMF queries (i.e., Multi-Feature and Extended Multi-Feature queries). Extended-SQL introduces a new operator called PHI to standard SQL to extend & simplify OLAP SQL queries.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published