fix(exasol)!: fix parsing error in json_extract for exasol#7098
fix(exasol)!: fix parsing error in json_extract for exasol#7098nnamdi16 wants to merge 3 commits intotobymao:mainfrom
Conversation
* Generate static documentation pages & update the CI * Change CI so that changes are committed & pushed * Make it so that only python v3.10 triggers doc generation * Fix yaml syntax * PR feedback, fix push action # Conflicts: # .github/workflows/python-package.yml
| ), | ||
| # https://docs.exasol.com/db/latest/sql_references/functions/alphabeticallistfunctions/json_extract.htm | ||
| "JSON_EXTRACT": _build_extract_json(exp.JSONExtract), | ||
| "NOW": exp.CurrentTimestamp.from_arg_list, |
There was a problem hiding this comment.
We shouldn't do it like this, please add an entry in FUNCTION_PARSERS and get rid of the _parse_function override.
There was a problem hiding this comment.
Then how do we handle EMITS if we are not allowed to override _parse_function , which is required in using JSONExtract in exasol https://docs.exasol.com/db/latest/sql_references/functions/alphabeticallistfunctions/json_extract.htm
There was a problem hiding this comment.
You will parse the JSON_EXTRACT using the default parser (i.e., leveraging the superclass' logic) and then consume the tokens needed to handle the EMIT syntax.
sqlglot/dialects/exasol.py
Outdated
| exp.CurrentSchema: lambda *_: "CURRENT_SCHEMA", | ||
| exp.DateDiff: _date_diff_sql, | ||
| exp.DateAdd: _add_date_sql, | ||
| exp.JSONExtract: _json_extract_sql, |
There was a problem hiding this comment.
There shouldn't be a module-level function for this, add a method jsonextract_sql under Generator here.
| column.set("table", None) | ||
| return column | ||
|
|
||
| def _parse_field(self, *args, **kwargs): |
There was a problem hiding this comment.
@nnamdi16 this is still not correct, did you try adding a function parser as per my suggestion, earlier?
There was a problem hiding this comment.
I did for JSON_EXTRACT but I had issue parsing the EMITS that was why I resulted in overriding _parse_field
There was a problem hiding this comment.
I replied here: #7098 (comment). Is it still unclear? Can you please explain what you tried and didn't work?
There was a problem hiding this comment.
I first added a custom FUNCTION_PARSER for JSON_EXTRACT and tried to match the EMITS clause, but it never matched because the parser was still positioned at the closing)when my method ran. I then tried consuming the closing parenthesis myself using self._match_r_paren(), which let me see EMITS, but this broke SQLGlot’s normal parsing flow since the parenthesis was consumed twice. This resulted in errors like:
sqlglot.errors.ParseError: Expecting ). Line 1, Col: 177.I also tried calling super()._parse_function_call to reuse the default logic, but that caused parser state issues. @georgesittas
There was a problem hiding this comment.
Hmm I see. I think we need a mechanism to make _match_r_paren() optional, at least for when there's a registered FUNCTION_PARSER that handles the consumption of ) to support cases like this one.
One idea is to check whether parser is truthy (i.e., we detected a function parser) and whether the _prev token is ). If both these conditions hold, you could do _match(R_PAREN) instead of _match_r_paren(), to make the match non-failing.
There was a problem hiding this comment.
I would look into it thanks for your suggestion @georgesittas
| path = [e.expression, *e.args.get("expressions", [])] | ||
| sql = self.func("JSON_EXTRACT", e.this, *path) |
There was a problem hiding this comment.
| path = [e.expression, *e.args.get("expressions", [])] | |
| sql = self.func("JSON_EXTRACT", e.this, *path) | |
| sql = self.func("JSON_EXTRACT", e.this, e.expression, *e.expressions) |
| expression = super()._parse_field(*args, **kwargs) | ||
| if isinstance(expression, exp.JSONExtract) and self._match_texts("EMITS"): | ||
| schema = self._parse_schema() | ||
| expression.set("columns", schema.expressions) |
There was a problem hiding this comment.
Why are you setting columns? This doesn't look like the right way to represent EMITS in the AST, columns is used for a different purpose today, right?
What motivated this PR?
When parsing Exasol SQL that uses the
JSON_EXTRACTfunction, it results in parsing errors.See #6695
What was incorrect in the existing logic?
The Exasol dialect parser did not fully support the parsing of the
JSON_EXTRACTfunction. As a result, valid Exasol SQL such as:fails during parsing.
How does this PR resolve the issue?
This PR fixes the Exasol JSON_EXTRACT parsing error by correctly handling the required EMITS clause and multiple JSON path arguments. It also updates the Exasol generator to render JSON_EXTRACT(... ) EMITS (...) properly by emitting all parsed JSON paths and printing the EMITS column definitions.
Documentation and semantic notes
JSON_EXTRACTfunction in the Exasol dialect.