Respect case-sensitive json key path extractions #269

sgoley · 2022-09-29T06:31:11Z

Describe the bug
sqlfmt utility automatically formats json keys which can be case sensitive.
It will make all keys fully lower case and return nulls as values if the .sql keys do not access the actual case sensitive json keys.

To Reproduce
Use sqlfmt on any sql file that includes something like the following key path extractions:

src:salesperson.name <- all lowercase
SRC:salesperson.name <- column ref lower case
SRC:Salesperson.Name <- json keys have first letter capitals

Expected behavior
sqlfmt should be sensitive to adapter specific json extraction notations.

Example: Snowflake "dot" or "bracket" semi-structured notation

Example: Postgres "colon" semi-structured notation

Exception: Bigquery at least appears to have only stringified semi-structured extracts so this adapter should be safe as-is.

The text was updated successfully, but these errors were encountered:

tconbeer · 2022-09-29T15:04:49Z

Thanks for the report, @sgoley !

BigQuery is case-sensitive when accessing JSON properties using dot notation, also (mentioned in this discussion). I don't think I'll be solving the BQ issue, since parsing the names as field/property names instead of table/column names is out of scope as long as I'm the sole maintainer.

BUT if colons are rare enough to make parsing easy (e.g., they don't appear in other, similar contexts), I would consider respecting case sensitivity here. The hardest and most time-consuming part of this will just be researching the colon operator in all of the major dialects and determining how unique its usage as a JSON property accessor is. I know the colon can also be used to slice arrays in some dialects, but that should be easy to disambiguate from this operator (since it will be directly inside of square brackets). I don't know if there are other uses for the colon in SQL.

If you or anyone else could contribute this research, I'd be very grateful.

Until then, the workaround is to quote all json properties that are not snake_case:

src:salesperson.name
"SRC":salesperson.name
"SRC":"Salesperson"."Name"

gregbayer · 2023-02-21T23:43:44Z

I'm running into this issue in with DBT + Snowflake as well.
Example: After formating, parse_json(event_properties):numFieldsFilledIn becomes parse_json(event_properties):numfieldsfilledin which returns null.

tconbeer · 2023-02-22T01:08:24Z

Thanks. Still looking for help researching the various uses for the colon in different dialects

ryaminal · 2023-05-26T00:09:27Z

Oh man, i just got burned by this. took me forever to find the problem. This was for snowflake specifically.

I know pyfmt is opinionated without options, but perhaps one of the opinions could be to force quotation marks around the keys e.g. c:"FIELD_ONE" instead of just to lowering them or perhaps force bracket notation e.g. c['FIELD_ONE'] ?

jlucas91 · 2024-03-12T02:31:56Z

Coming over from #568.

Completely empathize with the challenges of sole maintenance and respect the call here. If I might suggest a documentation change to save others in my shoes similar pain - I would consider documenting the known cases where sqlfmt will break a project. We missed this change when implementing sqlfmt on a portion of our project, which ended up causing a significant outage for us in production (since case changes lead to silent failures). Ultimately the responsibility lies fully on us for not catching it. But given it appears to be a known concern - I might consider listing it in the Beta warning block (or similar) in the docs. We had (incorrectly in retrospect) assumed that given sqlfmt is now the default implementation in DBT cloud that it was fairly unlikely to break a dbt/snowflake project out of the box.

ryaminal · 2024-03-12T21:38:05Z

@tconbeer , i was looking at this to work on i). after a quick perusal of the code, it appears the casing happens here in node_manager.py.

would a modification to standardize_value in node_manager be where this work is done?

Is one solution for this to add a dialect-specific check for a SEMICOLON token(and others) that either:

doesn't touch the case
surrounds in double quotes
uses brackets

to simplify this work, would it be possible/advised that we break this in to dialect-specific implementation? not meaning a different implementation but slowly adding this for snowflake first then BQ or whatever.

my guess is doing snowflake first would solve the problem for the largest amount of users and potentially simplify this implementation.

brianschillaci · 2024-05-21T14:38:35Z

I just ran into this issue as well.
Something like properties:someRandomKey was formatted to properties:somerandomkey which broke the logic.

tconbeer added bug Something isn't working help wanted Extra attention is needed labels Sep 29, 2022

tconbeer changed the title ~~[Bug] Handling for non-string json key path extractions~~ Respect case-sensitive json key path extractions Sep 29, 2022

tconbeer mentioned this issue Sep 4, 2023

Snowflake JSON fields case sensitive keyword are wrongly lower-cased #475

Closed

tconbeer mentioned this issue Feb 8, 2024

Case-sentitive identifiers within object navigation expressions are lowercased #561

Closed

tconbeer mentioned this issue Mar 11, 2024

sqlfmt incorrect reformatted JSON access path, breaking all accesses. #568

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Respect case-sensitive json key path extractions #269

Respect case-sensitive json key path extractions #269

sgoley commented Sep 29, 2022 •

edited

Loading

tconbeer commented Sep 29, 2022

gregbayer commented Feb 21, 2023 •

edited

Loading

tconbeer commented Feb 22, 2023

ryaminal commented May 26, 2023

jlucas91 commented Mar 12, 2024

ryaminal commented Mar 12, 2024

brianschillaci commented May 21, 2024

Respect case-sensitive json key path extractions #269

Respect case-sensitive json key path extractions #269

Comments

sgoley commented Sep 29, 2022 • edited Loading

tconbeer commented Sep 29, 2022

gregbayer commented Feb 21, 2023 • edited Loading

tconbeer commented Feb 22, 2023

ryaminal commented May 26, 2023

jlucas91 commented Mar 12, 2024

ryaminal commented Mar 12, 2024

brianschillaci commented May 21, 2024

sgoley commented Sep 29, 2022 •

edited

Loading

gregbayer commented Feb 21, 2023 •

edited

Loading