This project includes a volcano-style, iterator-model query processor written in Python. The program processes JSON-serialized queries against tables stored in the `.csv` file format. The query processor supports Select, Project, Join, Scan, and Limit operations, and can evaluate column references and aliases, constants, arithmetic operations, comparison operations, and logical operations. The code takes a modular approach: Join, Limit, Project, Scan, and Select are all subclasses of a base operator superclass, which allows additional operations to be implemented later.
To validate this model, I ran all of the example queries included in this project. The queries exercise multiple operators, and I made sure that every operator was used in at least one query. I manually verified the correctness of each query, since the expected output of each was short enough for a human to read.
Builds a hash table from the right input based on a join key, then iterates over rows from the left input to find matching rows efficiently. The operator combines matching records from both sides into a single result and returns them one by one.
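The build/probe flow described above can be sketched as follows; the function and parameter names are illustrative, not the project's actual API:

```python
def hash_join(left_rows, right_rows, left_key, right_key):
    """Yield merged rows where left[left_key] == right[right_key]."""
    # Build phase: index the right input by its join key.
    table = {}
    for r in right_rows:
        table.setdefault(r[right_key], []).append(r)
    # Probe phase: stream the left input and emit matches one by one.
    for l in left_rows:
        for r in table.get(l[left_key], []):
            yield {**l, **r}
```

Building on the (typically smaller) right side keeps the hash table compact, and the probe phase touches each left row exactly once.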
Loads and sorts both input tables by their join keys, then sequentially scans through them to find and combine matching rows.
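A hedged sketch of that sort-merge strategy (illustrative names, not the project's implementation): sort both inputs, then advance two cursors in lockstep, emitting the cross product of each run of equal keys.

```python
def sort_merge_join(left_rows, right_rows, left_key, right_key):
    """Yield merged rows where left[left_key] == right[right_key]."""
    left = sorted(left_rows, key=lambda r: r[left_key])
    right = sorted(right_rows, key=lambda r: r[right_key])
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][left_key], right[j][right_key]
        if lk < rk:
            i += 1          # left cursor is behind; advance it
        elif lk > rk:
            j += 1          # right cursor is behind; advance it
        else:
            # Find the end of the equal-key run on the right side.
            j_end = j
            while j_end < len(right) and right[j_end][right_key] == lk:
                j_end += 1
            # Pair every left row in the run with every matching right row.
            while i < len(left) and left[i][left_key] == lk:
                for k in range(j, j_end):
                    yield {**left[i], **right[k]}
                i += 1
            j = j_end
```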
Traverses the query plan tree and moves filter conditions (Select operators) as close as possible to the data sources (Scan operators).
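A minimal sketch of this transformation, assuming plans are nested dicts with an `"op"` field; the node shape here is hypothetical, not the project's actual JSON schema. It pushes a Select beneath a Project so the filter runs before the projection (this assumes the predicate only references columns that survive the projection):

```python
def push_down(node):
    """Recursively move Select nodes down the plan tree toward Scans."""
    if node["op"] == "select":
        child = push_down(node["child"])
        if child["op"] == "project":
            # Swap: filter first, then project.
            child["child"] = push_down({"op": "select",
                                        "predicate": node["predicate"],
                                        "child": child["child"]})
            return child
        return {"op": "select", "predicate": node["predicate"], "child": child}
    if "child" in node:
        node["child"] = push_down(node["child"])
    return node
```

Filtering earlier shrinks the row stream before the more expensive operators (especially joins) run, which is the payoff the optimization flag below controls.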
- All queries (`.json` files) and tables (`.csv` files) can be found in `data/`
- There are two example folders (`example1` and `example2`), each containing different queries and table data
- Set `CENGINE_OPTIMIZE=true/false` to enable/disable the predicate pushdown optimization
- Set `CENGINE_JOIN_ALGO=nested_loop/hash/sort_merge` to select nested-loop, hash, or sort-merge joins
Run a query:

```
CENGINE_OPTIMIZE=[true/false] CENGINE_JOIN_ALGO=[nested_loop/hash/sort_merge] python3 query_processor.py /path/to/query.json
```

For example:

```
CENGINE_OPTIMIZE=true CENGINE_JOIN_ALGO=hash python3 query_processor.py data/example2/query_open_order_line_totals.json
```

Run the benchmark:

```
python3 benchmark.py
```
- Since I had no idea where to start this project, I asked ChatGPT to generate a folder structure and some boilerplate code. The ideas for the modular approach came from ChatGPT; I filled in the rest of the details. I also had ChatGPT generate the benchmarking script, as well as most of the predicate pushdown code.