Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using dplyr::copy_to(...) to append to an existing table on remote data source #3026

Closed
ChadEfaw opened this issue Aug 15, 2017 · 3 comments
Closed
Labels
feature a feature request or enhancement

Comments

@ChadEfaw
Copy link

ChadEfaw commented Aug 15, 2017

Hello all!

I am trying to append data to a remote data source (an RDS sitting on AWS) and want to use copy_to(...).

I'm used to using the pool package to do this, particularly dbwritetable(..., append = T), but the pool package does not seem to work on RStudio 3.4.

# Call appropriate libraries

library(dplyr)
library(data.table)

# Call in data 

data(mtcars)

setDT(mtcars, keep.rownames = TRUE)[]
mtcars <- as.data.frame(mtcars)

# Connect to data source

my_db <- src_mysql(
  dbname = "data_source_name",
  host = "data.client.com",
  user = "username",
  password = "password"
)

# Copy original data set to database 

copy_to(dest = my_db, name = "mtcars", df = mtcars, overwrite = F, temporary = F)

# look at tables on database 
# dataset 1 = mtcars
src_tbls(my_db)
[1] mtcars

# Create a new data set 

mtcars.1 <- data.frame(mtcars[(2:12),2,]*2)

names(mtcars.1)[1] <- "values"

names <- data.frame(names(mtcars)[2:12])
names(names)[1] <- "columns"

mtcars.1 <- 
  mtcars.1 %>%
  bind_cols(names)

mtcars.1 <- tidyr::spread(mtcars.1, key = columns,value = values)

vehicle <- data.frame(c("placeholder"))
names(vehicle)[1] <- "vehicle"

mtcars.2 <-
  mtcars.1 %>%
  bind_cols(vehicle, mtcars.1) %>%
  select(-am, -carb, -cyl, -disp, -drat, -gear, -hp, -mpg, -qsec, -vs, -wt)

names(mtcars.2)[2:12] <- c("am", "carb","cyl","disp","drat", "gear","hp","mpg", "qsec","vs","wt")


# Copy to database 

copy_to(dest = my_db, name = "mtcars", df = mtcars.2, overwrite = F)
db_insert_into(con = my_db$con, table = "mtcars", values = mtcars.2) # insert into


# I would like to add an Append = T statement to copy_to

I did have success with db_insert_into, but it times out easily. This is just a reproducible example, but I need to be able to append ~1M row dataset to a mysql database a number of times. I know I can simply write it to my mysql database and select into from the MySQL GUI.

Thanks!
Chad

@ChadEfaw ChadEfaw changed the title Using dplyr::copy_to(...) to append to a remote data source Using dplyr::copy_to(...) to append to an existing table on remote data source Aug 15, 2017
@krlmlr krlmlr added database feature a feature request or enhancement labels Aug 16, 2017
@edgararuiz-zz
Copy link

Hi @ChadEfaw , have you been able to open an issue in the pool package as well?

@ChadEfaw
Copy link
Author

@edgararuiz I wasn't sure if that was the appropriate place to post but I will open up this issue over there as well.

Thank you.

@hadley
Copy link
Member

hadley commented Oct 24, 2017

Please call the underlying DBI methods directly (i.e. dbWriteTable()). dplyr is an interface for working with data in a database, not for modifying remote tables.

@hadley hadley closed this as completed Oct 24, 2017
@lock lock bot locked as resolved and limited conversation to collaborators Jun 7, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
feature a feature request or enhancement
Projects
None yet
Development

No branches or pull requests

4 participants