Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unable to parse Trino statement with non-escaped limit column #3247

Closed
3 tasks done
john-bodley opened this issue Mar 30, 2024 · 12 comments
Closed
3 tasks done

Unable to parse Trino statement with non-escaped limit column #3247

john-bodley opened this issue Mar 30, 2024 · 12 comments

Comments

@john-bodley
Copy link
Contributor

According to the documentation LIMIT is not a reserved keyword, though it seems like SQLGlot is unable to parse statements referring to a column named limit unless it is escaped.

Before you file an issue

  • Make sure you specify the "read" dialect eg. parse_one(sql, read="spark")
  • Make sure you specify the "write" dialect eg. ast.sql(dialect="duckdb")
  • Check if the issue still exists on main

Fully reproducible code snippet
Please include a fully reproducible code snippet or the input sql, dialect, and expected output.

>>> parse_one("SELECT limit FROM foo.bar", read="trino")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/johnbodley/repos/github.com/sqlglot/sqlglot/__init__.py", line 124, in parse_one
    result = dialect.parse(sql, **opts)
  File "/Users/johnbodley/repos/github.com/sqlglot/sqlglot/dialects/dialect.py", line 490, in parse
    return self.parser(**opts).parse(self.tokenize(sql), sql)
  File "/Users/johnbodley/repos/github.com/sqlglot/sqlglot/parser.py", line 1163, in parse
    return self._parse(
  File "/Users/johnbodley/repos/github.com/sqlglot/sqlglot/parser.py", line 1229, in _parse
    expressions.append(parse_method(self))
  File "/Users/johnbodley/repos/github.com/sqlglot/sqlglot/parser.py", line 1458, in _parse_statement
    expression = self._parse_set_operations(expression) if expression else self._parse_select()
  File "/Users/johnbodley/repos/github.com/sqlglot/sqlglot/parser.py", line 2521, in _parse_select
    this = self._parse_query_modifiers(this)
  File "/Users/johnbodley/repos/github.com/sqlglot/sqlglot/parser.py", line 2669, in _parse_query_modifiers
    key, expression = parser(self)
  File "/Users/johnbodley/repos/github.com/sqlglot/sqlglot/parser.py", line 952, in <lambda>
    TokenType.LIMIT: lambda self: ("limit", self._parse_limit()),
  File "/Users/johnbodley/repos/github.com/sqlglot/sqlglot/parser.py", line 3620, in _parse_limit
    limit_exp = self.expression(
  File "/Users/johnbodley/repos/github.com/sqlglot/sqlglot/parser.py", line 1293, in expression
    return self.validate_expression(instance)
  File "/Users/johnbodley/repos/github.com/sqlglot/sqlglot/parser.py", line 1313, in validate_expression
    self.raise_error(error_message)
  File "/Users/johnbodley/repos/github.com/sqlglot/sqlglot/parser.py", line 1273, in raise_error
    raise error
sqlglot.errors.ParseError: Required keyword: 'expression' missing for <class 'sqlglot.expressions.Limit'>. Line 1, Col: 17.
  SELECT limit FROM foo.bar

Note that SQLGlot is able to parse the following statement if the limit column is escaped.

>>> parse_one('SELECT "limit" FROM foo.bar', read="trino")
Select(
  expressions=[
    Column(
      this=Identifier(this=limit, quoted=True))],
  from=From(
    this=Table(
      this=Identifier(this=bar, quoted=False),
      db=Identifier(this=foo, quoted=False))))

Official Documentation

@tobymao
Copy link
Owner

tobymao commented Mar 30, 2024

this is a big pain to deal with. considering most other engines limit is a reserved keyword, i'm not inclined to make this change. it breaks a lot of other things, so closing this as won't fix.

@tobymao tobymao closed this as not planned Won't fix, can't repro, duplicate, stale Mar 30, 2024
@barakalon
Copy link
Collaborator

barakalon commented Mar 30, 2024

Don't we just need to add LIMIT to the trino parser's ID_VAR_TOKENS?

@barakalon
Copy link
Collaborator

from sqlglot import parse_one
from sqlglot.dialects.trino import Trino
from sqlglot.tokens import TokenType

Trino.Parser.ID_VAR_TOKENS.add(TokenType.LIMIT)

q = """
SELECT limit FROM limit ORDER BY limit LIMIT 1
"""

print(repr(parse_one(q, dialect="trino")))
Select(
  expressions=[
    Column(
      this=Identifier(this=limit, quoted=False))],
  limit=Limit(
    expression=Literal(this=1, is_string=False)),
  from=From(
    this=Table(
      this=Identifier(this=limit, quoted=False))),
  order=Order(
    expressions=[
      Ordered(
        this=Column(
          this=Identifier(this=limit, quoted=False)),
        nulls_first=False)]))

@georgesittas
Copy link
Collaborator

georgesittas commented Mar 30, 2024

It's not that simple. I think your example works by accident and may have actually uncovered an interesting bug.

Basically, Dialect gets created before any of its subclasses, so TABLE_ALIAS_TOKENS is computed at that time as a function of Dialect.ID_VAR_TOKENS and it doesn't contain TokenType.LIMIT. This means that SELECT * FROM t LIMIT 1 correctly produces a Limit node instead of treating LIMIT as an alias of t.

Even if you add TokenType.LIMIT to a dialect's ID_VAR_TOKENS, it doesn't affect the TABLE_ALIAS_TOKENS of that dialect, so you still have the above behavior (the "accident"). I think the right approach would be to compute class attributes which depend on other class attributes in the metaclass to avoid this situation.

Using the above logic you can easily construct an example where Trino's parser would fail:

>>> from sqlglot import parse_one
>>> from sqlglot.dialects.trino import Trino
>>> from sqlglot.tokens import TokenType
>>>
>>> Trino.Parser.ID_VAR_TOKENS.add(TokenType.LIMIT)
>>>
>>> q = """
... SELECT a LIMIT 1
... """
>>>
>>> print(repr(parse_one(q, dialect="trino")))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "sqlglot/sqlglot/__init__.py", line 124, in parse_one
    result = dialect.parse(sql, **opts)
  File "sqlglot/sqlglot/dialects/dialect.py", line 490, in parse
    return self.parser(**opts).parse(self.tokenize(sql), sql)
  File "sqlglot/sqlglot/parser.py", line 1167, in parse
    return self._parse(
  File "sqlglot/sqlglot/parser.py", line 1236, in _parse
    self.raise_error("Invalid expression / Unexpected token")
  File "sqlglot/sqlglot/parser.py", line 1277, in raise_error
    raise error
sqlglot.errors.ParseError: Invalid expression / Unexpected token. Line 2, Col: 16.

SELECT a LIMIT 1
               ~

@tobymao
Copy link
Owner

tobymao commented Mar 30, 2024

don’t a ton of tests break when you add limit to id var?

@georgesittas
Copy link
Collaborator

Yes, if you add it in the base parser. If you add it to a dialect that doesn't override TABLE_ALIAS_TOKENS though, the same tests won't break for it (see above).

@tobymao
Copy link
Owner

tobymao commented Mar 30, 2024

i added limit to table alias tokens in base as well and all the tests broke.

@barakalon
Copy link
Collaborator

Ah I see.

FWIW, SELECT a LIMIT 1 isn't a valid query.

Here's trino's grammar.

Some tests on trino:

-- fails
SELECT 1 limit
-- succeeds
SELECT 1 AS limit

-- fails
SELECT limit FROM (SELECT 1 AS limit)
-- succeeds
SELECT "limit" FROM (SELECT 1 AS limit)

@john-bodley where did this come up? Does SELECT limit FROM foo.bar actually parse?

@john-bodley
Copy link
Contributor Author

@barakalon this toy example is from Superset performing a SELECT *-esque query where it enumerates all the columns (as opposed to using the wildcard operator).

Regardless, SELECT limit FROM foo.bar is a valid query in Trino, i.e., one doesn’t need to explicitly escape the limit column.

@barakalon
Copy link
Collaborator

Based on SELECT limit FROM (SELECT 1 AS limit) failing to parse in trino, I'm curious how SELECT limit FROM foo.bar could be valid?
Give me a bit to test with a table that has a limit column, but I'm curious if you've actually tested such a query?

@barakalon
Copy link
Collaborator

oh - ha!
I'm testing on superset, so I think its actually sqlglot that's failing 🤦

@georgesittas
Copy link
Collaborator

oh - ha! I'm testing on superset, so I think its actually sqlglot that's failing 🤦

Lol

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants