AA1: Add OFFSET to the LIMIT Clause
- Released: 9/13
- Due: 9/27 10AM
- Value: 3.75% of class grade
- Max team of 2
This assignment is part of the DataBass Series of Advanced Assignment. Take a look at the README in this directory to get an overview of the system and how to set it up. Then read these instructions. Since these advanced assignments are new and experimental, please be patient with us and ask any questions on Piazza.
The goal of this assignment is to modify the SQL parser, the operator implementations, and the interpretor to support the OFFSET statement in the LIMIT clause. After completing this assignment, DataBass will be able to eecute queries such as
SELECT * FROM data LIMIT 10 OFFSET 5 SELECT * FROM data LIMIT 10 OFFSET 2*5+1
But should fail for inputs with incorrect syntax
SELECT * FROM data LIMIT 10 OFFSET
In addition, syntactically correct queries should be parsed but fail to run (
foo is not defined):
SELECT 1 LIMIT 10 OFFSET foo
- Updated specs to reflect that negative OFFSET values should raise an exception
You will edit
parse_sql.py in two places. You will edit the PEG grammar to support the "OFFSET expression" syntax. OFFSET is an optional part of the LIMIT clause. If OFFSET is specified, then it should be followed by one or more white spaces, followed by an expression that evaluates to a number. The expression cannot reference any attributes.
Further, the Visitor object transforms the parsed abstract syntax tree into the query plan. Edit the
visit_limit method to take into account the new OFFSET syntax.
A SQL query (or any program) is just a string. The parser turns the string into a data structure that the system knows how to execute. For SQL, this data structure is a query plan. To do so, we typically use a grammar to define the syntax of the language. For example, let's say we want to support arithmetic expressions consisting of numbers and plus symbols. Then we would write a list of grammar rules such as the following:
grammar = Grammar( r""" exprstmt = ws expr ws expr = biexpr / value biexpr = value ws op ws expr value = ~"\d*\.?\d+"i op = "+" ws = ~"\s*"i wsp = ~"\s+"i """)
A rule is composed of a head that is the rule's name (e.g.,
exprstmt) and the body (e.g.,
ws expr ws). The body is a list of string literals (e.g.,
+) or another defined rule (e.g.,
ws). For example,
ws expr ws has three means that an expression is any number of white spaces (
ws) as defined by the regular expression (
\s*) followed by input characters that match an expression rule
expr followed by any number of white spaces.
expr = biexpr / value contains a
/ symbol. This means the input should either match
biexpr or, if it doesn't match, a single
biexpr is further defined as a
value followed by some whitespaces, a binary operator
+, some whitespace, and an
This grammar can parse expression strings such as
1 + 2 1.1 1 + 2 + 3
A grammar itself is not enough to magically parse an input string. What we want to turn the input string into an expression tree, such as
+ / \ 1 + / \ 2 3
Where each node is an object we have defined elsewhere. To do so, the library we use provides a class called
NodeVisitor. Each time the parser successfully matches a grammar rule (e.g.,
expr), it will try to call the corresponding visitor method (e.g.,
visit_expr()) or the default method
generic_visit(). We can implement these
visit_XXX rules to build our expression tree. The final parsed object is whatever the visitor method for the root rule
class Visitor(NodeVisitor): grammar = grammar def visit_expr(self, node, children): return children def visit_biexpr(self, node, children): return Expr(children, children, children[-1]) def visit_op(self, node, children): return node.text def visit_value(self, node, children): return Literal(float(node.text)) def generic_visit(self, node, children): f = lambda v: v and (not isinstance(v, str) or v.strip()) children = list(filter(f, children)) if len(children) == 1: return children return children
Each visitor method is passed the AST node of the matched rule, and the children for each nonterminal in the rule's body.
children is a list the same size as the rule's body, and each element is the return value of visiting the corresponding nonterminal. For example,
children for the
biexpr rule has 5 elements:
[<return value of visit_value>, None, <return value of visit_op>, None, <return value of visit_expr>]
This is why
visit_biexpr creates an
Expr object with the operator ("+"), and the left and right operands.
Finally, the following code parses "1 + 2 + 3" using our grammar and visitor implementation.
print Visitor().parse("1 + 2 + 3")
In this assignment you will edit the parser in
parse_sql.py to support the OFFSET syntax, and edit the LIMIT operator in
ops.py to take offset information and take it into account during query execution.
limit rule in the SQL grammar to recognize the
OFFSET [expression] syntax. The offset keyword in a query should be capitalized, followed by one or more spaces, and then an expression. Note that the offset can be an expression:
SELECT 1 FROM data LIMIT 1 OFFSET 1 SELECT 1 FROM data LIMIT 1 OFFSET 10 SELECT 1 FROM data LIMIT 1 OFFSET 1+10 SELECT 1 FROM data LIMIT 1 OFFSET 1*10+4
However, it is NOT valid for OFFSET to be negative!
Further, you can assume that the offset expression will only be tested with arithmetic expressions that well not reference any attributes. Thus do you don't need to worry about the following:
SELECT 1 FROM data LIMIT 1 OFFSET 'blue' SELECT 1 FROM data LIMIT 1 OFFSET data.attr + 1
Further, make sure to edit the rule's visitor so that it initializes the Limit operator properly.
The LIMIT Operator
At this point, databass will throw an exception during parsing because the Limit class' constructor needs to be modified to store the offset value. Edit
ops.py so that the Limit operator can take the offset as an optional paramater. At this point, you should be able to parse queries with OFFSET, but it will not run correctly.
Limit.__iter__() so that if an offset is specified, the execution engine will correctly return the results offset by the amount specified in the OFFSET clause.
The SQL specs state that LIMIT and OFFSET cannot be negative. Please make sure that if the OFFSET is negative, that you raise an exception (of any kind).
test_offset.py into the
databass/src/engine/ folder and then run it. All the tests should pass. Take a look at the file to see what we are testing. Feel free to add additional tests that we may not have included. Our autograder contains a more complete set of test cases that we will evaluate your submission with.
Go to the root of your databass codebase repository, and run
git pull to make sure you have the most recent version. You should see a submission script
Run the submission script
submit.py to package and zip your file. It will ask you to submit you and your partners' UNIs and specify the assignment. It will create a zip file in the current directory. Submit the generated ZIP file to Canvas.
python submit.py --help
Both partners should submit on their Canvas accounts to AA1.