More fully supported nested structs and arrays #520
Comments
Can you please start by creating a reprex using |
Here you go:
Note the result when I attempt to index the nested array with |
Slightly more minimal reprex:
library(dbplyr)
library(dplyr, warn.conflicts = FALSE)
db <- lazy_frame(
x = c(
list(data = list(list(data = list(id1 = 1, id2 = 2)))),
list(data = list(list(data = list(id1 = 3, id2 = 4))))
),
con = simulate_postgres()
)
db %>% mutate(y = x[["data"]])
#> <SQL>
#> SELECT `x`, `x`.`data` AS `y`
#> FROM `df`
db %>% mutate(y = x[["data"]][[1]])
#> Error: Can only index with strings
Created on 2020-10-15 by the reprex package (v0.3.0.9001)
If the generated SQL is correct, the problem probably lies in the database backend that you're using.
@hadley The generated sql does not appear to me to be correct. The SQL for extracting an element for an array is There does not seem to be any way with current dbplyr syntax to specify that one wants a particular item from an
(Actually, that's what I'd expect it to do - the missing option seems to be |
That is exactly what I fixed in this issue. This is what I see:
library(dbplyr)
library(dplyr, warn.conflicts = FALSE)
db <- lazy_frame(
x = list(data = list(list(data = list(id1 = 1, id2 = 2)))),
con = simulate_postgres()
)
db %>% mutate(y = x[["data"]][[1]])
#> <SQL>
#> SELECT `x`, `x`.`data`[1] AS `y`
#> FROM `df`
Oops - didn't realize you'd implemented the change. Sorry!
A commit made last year enabled `dbplyr` to support a single layer of struct nesting: 423820a

This commit converts the R syntax `parent_field$sub_field` to the SQL syntax `parent_field.sub_field`.

In my dataset, the data (from Snowplow) looks like, in a field called `contexts`:

Calling `contexts[["data"]]` returns, unexpectedly, `data[1]`. This is unexpected because `data` should be an array.

Calling `contexts[["data"]][["data"]]` (or `contexts[["data"]][["1"]]`) returns an error that `Expression "contexts"."data" is not of type ROW`. The same thing occurs if these are turned into separate steps in a subsequent `mutate` with an intermediate variable name.

Interestingly, the errors describe the SQL as being translated to `contexts.data.whateverelse`, rather than `contexts.data[1]`, which is what's returned.

So, I'm not sure if what's going on is that there's no way to specify indexing a nested array within a struct in `dbplyr`, or if the nested instructions are being translated in a funky way. But in any event, there doesn't seem to be a syntax for drilling deeper into the nested structure.
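For anyone on a dbplyr version without the numeric `[[` translation, one possible workaround is to pass the indexing expression through as literal SQL with `sql()`. This is only a minimal sketch: the column name `first_item` and the single-column `lazy_frame()` are made up for illustration, and the fragment `contexts.data[1]` is an assumption about what the backend accepts (struct-field and array-subscript syntax differs between databases), so adapt it to your dialect.

```r
library(dbplyr)
library(dplyr, warn.conflicts = FALSE)

# Hypothetical stand-in for the real table: only the column name matters here,
# because lazy_frame() never sends anything to a database.
db <- lazy_frame(contexts = 1, con = simulate_postgres())

# sql() marks the string as literal SQL, so dbplyr inserts it untranslated
# instead of trying to translate `[[` itself.
# ASSUMPTION: `contexts.data[1]` is valid in the target dialect.
q <- db %>% mutate(first_item = sql("contexts.data[1]"))

show_query(q)
```

Because the fragment bypasses translation entirely, dbplyr cannot check or quote it; any identifiers inside the string have to be quoted by hand if the backend requires it.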