Skip to content

COPY = TRUE is creating a global temporary table of some kind. Not a session specific temporary table. #574

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
ben1787 opened this issue Jan 15, 2021 · 13 comments · Fixed by #579
Labels
backend 🕺 bug an unexpected problem or unintended behavior

Comments

@ben1787
Copy link

ben1787 commented Jan 15, 2021

This seems like a bug in the new version of dbplyr.

When I use copy = TRUE, or the underlying copy_to(..., temporary = TRUE) the table is not very temporary. Even if I create a new data connection to the db the table is still there.

First I am posting the reprex using dbplyr 1.4.4. Below that I will paste the reprex with dbplyr 2.0.0:

library(DBI)
library(RPostgreSQL)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(dbplyr)
#> 
#> Attaching package: 'dbplyr'
#> The following objects are masked from 'package:dplyr':
#> 
#>     ident, sql
ch = dbConnect(drv = dbDriver("PostgreSQL"),
                dbname = db,
                host = host,
                port = port,
                user = username,
                password = pwd)

tickers_hedged = data.frame(factset_id='No Hedged Tickers', date = Sys.Date(), instrument_type = as.character(NA))
tickers_hedged_temp_table = paste('tickers_hedged',format(Sys.time(),format = '%Y%m%d%H%M%OS6'),sep='_')
copy_to(dest = ch, df = tickers_hedged, name = tickers_hedged_temp_table, temporary = TRUE)
tbl(ch, tickers_hedged_temp_table)
#> # Source:   table<tickers_hedged_20210120005010.614228> [?? x 3]
#> # Database: postgres 11.0.7
#>   factset_id        date       instrument_type
#>   <chr>             <date>     <chr>          
#> 1 No Hedged Tickers 2021-01-20 <NA>

dbDisconnect(ch)
#> [1] TRUE


ch2 = dbConnect(drv = dbDriver("PostgreSQL"),
                dbname = db,
                host = host,
                port = port,
                user = username,
                password = pwd)
tbl(ch2, tickers_hedged_temp_table)
#> Error in postgresqlExecStatement(conn, statement, ...): RS-DBI driver: (could not Retrieve the result : ERROR:  relation "tickers_hedged_20210120005010.614228" does not exist
#> LINE 1: SELECT * FROM "tickers_hedged_20210120005010.614228" AS "zzz...
#>                       ^
#> )
#expected behavior

Created on 2021-01-20 by the reprex package (v0.3.0)

Now for dbplyr 2.0.0:

library(DBI)
library(RPostgreSQL)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(dbplyr)
#> 
#> Attaching package: 'dbplyr'
#> The following objects are masked from 'package:dplyr':
#> 
#>     ident, sql
ch = dbConnect(drv = dbDriver("PostgreSQL"),
                dbname = db,
                host = host,
                port = port,
                user = username,
                password = pwd)

tickers_hedged = data.frame(factset_id='No Hedged Tickers', date = Sys.Date(), instrument_type = as.character(NA))
tickers_hedged_temp_table = paste('tickers_hedged',format(Sys.time(),format = '%Y%m%d%H%M%OS6'),sep='_')
copy_to(dest = ch, df = tickers_hedged, name = tickers_hedged_temp_table, temporary = TRUE)
tbl(ch, tickers_hedged_temp_table)
#> # Source:   table<tickers_hedged_20210120005131.538403> [?? x 3]
#> # Database: postgres 11.0.7
#>   factset_id        date       instrument_type
#>   <chr>             <date>     <chr>          
#> 1 No Hedged Tickers 2021-01-20 <NA>

dbDisconnect(ch)
#> [1] TRUE

ch2 = dbConnect(drv = dbDriver("PostgreSQL"),
                dbname = db,
                host = host,
                port = port,
                user = username,
                password = pwd)
tbl(ch2, tickers_hedged_temp_table)
#> # Source:   table<tickers_hedged_20210120005131.538403> [?? x 3]
#> # Database: postgres 11.0.7
#>   factset_id        date       instrument_type
#>   <chr>             <date>     <chr>          
#> 1 No Hedged Tickers 2021-01-20 <NA>
#this should fail right? but it does not, it finds the table still

Created on 2021-01-20 by the reprex package (v0.3.0)

@ben1787
Copy link
Author

ben1787 commented Jan 15, 2021

Using Postgres btw.

@hadley hadley added the reprex needs a minimal reproducible example label Jan 19, 2021
@hadley

This comment has been minimized.

@ben1787

This comment has been minimized.

@hadley

This comment has been minimized.

@ben1787

This comment has been minimized.

@hadley

This comment has been minimized.

@ben1787

This comment has been minimized.

@ben1787

This comment has been minimized.

@hadley hadley closed this as completed in e2c383a Jan 21, 2021
@hadley

This comment has been minimized.

@hadley hadley reopened this Jan 21, 2021
@hadley
Copy link
Member

hadley commented Jan 21, 2021

As I suspected, it's not a problem for RPostgres, but is for RPostgreSQL:

library(dbplyr)
library(dplyr, warn.conflicts = FALSE)

# RPostgres --------------------------------------------------------------------
con1 <- DBI::dbConnect(RPostgres::Postgres(), dbname = "test")
con2 <- DBI::dbConnect(RPostgres::Postgres(), dbname = "test")

copy_to(con1, data.frame(x = 1), "df")
DBI::dbListTables(con1)
#> [1] "df"
DBI::dbListTables(con2)
#> character(0)

# RPostgresSQL ------------------------------------------------------------------
con1 <- DBI::dbConnect(RPostgreSQL::PostgreSQL(), dbname = "test")
con2 <- DBI::dbConnect(RPostgreSQL::PostgreSQL(), dbname = "test")

copy_to(con1, data.frame(x = 1), "df")
DBI::dbListTables(con1)
#> [1] "df" "df"
DBI::dbListTables(con2)
#> [1] "df" "df"

DBI::dbRemoveTable(con1, "df")
#> [1] TRUE

Created on 2021-01-21 by the reprex package (v0.3.0.9001)

@hadley
Copy link
Member

hadley commented Jan 21, 2021

The problem is that in dbplyr 2.0.0:

  • db_write_table() now calls DBI::dbWriteTable() instead of nine generics
    that formerly each did a small part: db_create_indexes(), db_begin(),
    db_rollback(), db_commit(), db_list_tables(), drop_drop_table(),
    db_has_table(), db_create_table(), and db_data_types(). You can
    now delete the methods for these generics.

This significantly reduces the complexity of dbplyr, but unfortunately RPostgresSQL doesn't support the temporary argument. I'm not sure how to work around this without a bunch of work.

One simple option would be to make temporary = TRUE throw an error; that way you at least know what's going wrong.

@hadley hadley added backend 🕺 bug an unexpected problem or unintended behavior and removed reprex needs a minimal reproducible example labels Jan 21, 2021
@ben1787
Copy link
Author

ben1787 commented Jan 22, 2021

I see. Thanks for that. Is RPostgres a more reliable package? I see there hasn't been an update to the RPostgreSQL package in 3.5 years.

@hadley
Copy link
Member

hadley commented Jan 22, 2021

Yes, it’s actively maintained and generally more reliable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend 🕺 bug an unexpected problem or unintended behavior
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants