Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Columns shouldn't be visible after select in SQL backends #1426

Closed
craigcitro opened this issue Sep 28, 2015 · 3 comments
Closed

Columns shouldn't be visible after select in SQL backends #1426

craigcitro opened this issue Sep 28, 2015 · 3 comments
Labels
Milestone

Comments

@craigcitro
Copy link
Contributor

@craigcitro craigcitro commented Sep 28, 2015

(apologies if this is a dupe, but I couldn't find the original)

Currently, for some SQL backends (at least sqlite), columns are still visible for filtering after a select, eg

src_sqlite(path='Databases.db') %>% tbl("Databases") %>% select(id, name) %>% filter(description != 'temp')

will still filter on the description column, even though it should no longer be visible. This works as expected for dataframes:

> df <- data_frame(x = runif(10), y = x*2)
> df %>% select(a = x) %>% filter(y > 1)
Error: object 'y' not found

One option here is to just collapse inside select, though it's not clear whether that would put too much pressure on the query optimizer.

@hadley hadley added this to the 0.5 milestone Oct 21, 2015
@krlmlr
Copy link
Member

@krlmlr krlmlr commented Nov 27, 2015

Could this be a feature of SQLite: The filter appears to "work" even after inserting a compute() between select() and filter(), which leads to the conclusion that SQLite treats unknown columns in a special way in WHERE (and also ORDER BY) clauses.

i <- copy_to(src_sqlite(":memory:", TRUE), iris)
i %>% arrange(bogus)
## ...
i %>% filter(bogus > 5)
## Source: sqlite 3.8.6 [:memory:]
## From: iris [0 x 5]
## Filter: bogus < 5 
## 
## Error: n not greater than 0L
i %>% filter(bogus < 5) %>% head
## [1] Sepal.Length Sepal.Width  Petal.Length Petal.Width  Species     
## <0 rows> (or 0-length row.names)

This is in addition to the behavior that a select() doesn't seem to hide the removed variables:

> i %>% select(Petal.Width) %>% filter(Species == "setosa") %>% explain
<SQL>
SELECT "Petal.Width" AS "Petal.Width"
FROM "iris"
WHERE "Species" = 'setosa'


<PLAN>
  selectid order from          detail
1        0     0    0 SCAN TABLE iris

Which, in turn, is probably related to #983 and #1000.

@craigcitro
Copy link
Contributor Author

@craigcitro craigcitro commented Nov 28, 2015

I'm not sure what you mean by "feature of SQLite" -- note that the corresponding query doesn't work in SQLite, i.e. if you have a table with columns a, b, and c, then

SELECT a FROM (SELECT b, c FROM t)

fails.

@momeara
Copy link

@momeara momeara commented Mar 1, 2016

A related issue is renaming with either rename or select is invisible before applying filter. The issue is that the select and filter are combined into a single SQL query and in postgres, it is illegal to reference an ALIAS in the WHERE clause (see http://www.postgresql.org/message-id/18195.1038530280@sss.pgh.pa.us)

As an example,

library(dplyr)
library(DBI)
data_repo <- src_postgres(...)

x <- copy_to(data_repo, data_frame(a=c(1,2,3), "x")
x %>% rename(b = a) %>% collect
 Source: local data frame [3 x 1]

       b
   (dbl)
 1     1
 2     2
 3     3

> y <- x %>% rename(b=a) %>% filter(b<2)
> y$query$sql
 <SQL> SELECT "a" AS "b"
 FROM "x"
 WHERE "b" < 2.0

> y %>% collect
 Error in postgresqlExecStatement(conn, statement, ...) :
   RS-DBI driver: (could not Retrieve the result : ERROR:  column "b" does not exist
 LINE 3: WHERE "b" < 2.0
               ^
 )

Note that including a compute in between the select and filter works around the problem.

This behavior is problematic because it is different different from how dplyr works for data.frames and it breaks the mental model of piping the results as immutable objects through the various verbs.

Ideally, it would be nice if dplyr was able back out the underlying table.column names from the ALIASes and use them in to construct the WHERE clause rather then using the alias directly.

@hadley hadley closed this in a5b741e Mar 17, 2016
@lock lock bot locked as resolved and limited conversation to collaborators Jun 9, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
4 participants