Description
Thanks to the default "DBI" translation, most dplyr/dbplyr operations work against Databricks. Some operations that Databricks does support, for example var(), are still reported as not supported by dbplyr because they are missing from the default translation. Databricks uses "Spark SQL", which is underpinned by the Hive SQL syntax, so my suggestion would be to reuse the SQL variance translation currently in use for the Hive backend.
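As a rough illustration of that suggestion, the sketch below renders the same summarise() call against dbplyr's simulated Hive connection, so no Databricks warehouse is needed. The table and object names (hive_trips, hive_query, hive_sql) are made up for this example, and it only shows what the Hive backend generates; whether that translation should be reused unchanged for Databricks is the open question here.

```r
library(dplyr, warn.conflicts = FALSE)
library(dbplyr, warn.conflicts = FALSE)

# Simulated Hive connection: only SQL generation, nothing is executed.
# `hive_trips` is a stand-in for the real samples.nyctaxi.trips table.
hive_trips <- lazy_frame(trip_distance = 25, con = simulate_hive())

hive_query <- hive_trips %>%
  filter(trip_distance > 20) %>%
  summarise(var1 = var(trip_distance, na.rm = TRUE))

# Render the generated SQL as a character string and print it
hive_sql <- as.character(sql_render(hive_query))
cat(hive_sql, "\n")
```

If the Hive translation applies, the rendered query should call a variance function rather than raising the "not available in this SQL variant" error.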
The second issue is that copy_to() does not work, because the Databricks back end does not support transactions. I think a custom db_copy_to() method will be needed here.
Reprex
The following reprex contains the code used to connect to Databricks with a personal access token (PAT). It also confirms that Databricks supports variance: the SQL runs when called directly via DBI, while the equivalent var() call via dplyr errors out. Finally, it shows the error received when trying to use copy_to().
# https://docs.databricks.com/en/sql/language-manual/index.html
library(dbplyr)
library(dplyr)
library(DBI)
con <- dbConnect(
  odbc::odbc(),
  Driver = "/Library/simba/spark/lib/libsparkodbc_sb64-universal.dylib",
  Host = "rstudio-partner-posit-default.cloud.databricks.com",
  Port = 443,
  AuthMech = 3,
  HTTPPath = "/sql/1.0/warehouses/300bd24ba12adf8e",
  Protocol = "https",
  ThriftTransport = 2,
  SSL = 1,
  UID = "token",
  PWD = Sys.getenv("DATABRICKS_TOKEN")
)
# https://docs.databricks.com/en/sql/language-manual/functions/variance.html
dbGetQuery(con, "Select variance(trip_distance) as var1 from samples.nyctaxi.trips where trip_distance > 20")
#> var1
#> 1 3.369306
trips <- tbl(con, in_catalog("samples", "nyctaxi", "trips"))
trips %>%
  filter(trip_distance > 20) %>%
  summarise(var1 = var(trip_distance, na.rm = TRUE))
#> Error in `var()`:
#> ! `var()` is not available in this SQL variant.
#> Backtrace:
#> ▆
#> 1. ├─base::tryCatch(...)
#> 2. │ └─base (local) tryCatchList(expr, classes, parentenv, handlers)
#> 3. │ ├─base (local) tryCatchOne(...)
#> 4. │ │ └─base (local) doTryCatch(return(expr), name, parentenv, handler)
#> 5. │ └─base (local) tryCatchList(expr, names[-nh], parentenv, handlers[-nh])
#> 6. │ └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
#> 7. │ └─base (local) doTryCatch(return(expr), name, parentenv, handler)
#> 8. ├─base::withCallingHandlers(...)
#> 9. ├─base::saveRDS(...)
#> 10. ├─base::do.call(...)
#> 11. ├─base (local) `<fn>`(...)
#> 12. ├─global `<fn>`(input = base::quote("hexed-tuna_reprex.R"))
#> 13. │ └─rmarkdown::render(input, quiet = TRUE, envir = globalenv(), encoding = "UTF-8")
#> 14. │ └─knitr::knit(knit_input, knit_output, envir = envir, quiet = quiet)
#> 15. │ └─knitr:::process_file(text, output)
#> 16. │ ├─base::withCallingHandlers(...)
#> 17. │ ├─base::withCallingHandlers(...)
#> 18. │ ├─knitr:::process_group(group)
#> 19. │ └─knitr:::process_group.block(group)
#> 20. │ └─knitr:::call_block(x)
#> 21. │ └─knitr:::block_exec(params)
#> 22. │ └─knitr:::eng_r(options)
#> 23. │ ├─knitr:::in_input_dir(...)
#> 24. │ │ └─knitr:::in_dir(input_dir(), expr)
#> 25. │ └─knitr (local) evaluate(...)
#> 26. │ └─evaluate::evaluate(...)
#> 27. │ └─evaluate:::evaluate_call(...)
#> 28. │ ├─evaluate (local) handle(...)
#> 29. │ │ └─base::try(f, silent = TRUE)
#> 30. │ │ └─base::tryCatch(...)
#> 31. │ │ └─base (local) tryCatchList(expr, classes, parentenv, handlers)
#> 32. │ │ └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
#> 33. │ │ └─base (local) doTryCatch(return(expr), name, parentenv, handler)
#> 34. │ ├─base::withCallingHandlers(...)
#> 35. │ ├─base::withVisible(value_fun(ev$value, ev$visible))
#> 36. │ └─knitr (local) value_fun(ev$value, ev$visible)
#> 37. │ └─knitr (local) fun(x, options = options)
#> 38. │ ├─base::withVisible(knit_print(x, ...))
#> 39. │ ├─knitr::knit_print(x, ...)
#> 40. │ └─rmarkdown:::knit_print.tbl_sql(x, ...)
#> 41. │ ├─context$df_print(x)
#> 42. │ └─dbplyr:::print.tbl_sql(x)
#> 43. │ ├─dbplyr:::cat_line(format(x, ..., n = n, width = width, n_extra = n_extra))
#> 44. │ │ ├─base::cat(paste0(..., "\n"), sep = "")
#> 45. │ │ └─base::paste0(..., "\n")
#> 46. │ ├─base::format(x, ..., n = n, width = width, n_extra = n_extra)
#> 47. │ └─pillar:::format.tbl(x, ..., n = n, width = width, n_extra = n_extra)
#> 48. │ └─pillar:::format_tbl(...)
#> 49. │ └─pillar::tbl_format_setup(...)
#> 50. │ ├─pillar:::tbl_format_setup_dispatch(...)
#> 51. │ └─pillar:::tbl_format_setup.tbl(...)
#> 52. │ └─pillar:::df_head(x, n + 1)
#> 53. │ ├─base::as.data.frame(head(x, n))
#> 54. │ └─dbplyr:::as.data.frame.tbl_sql(head(x, n))
#> 55. │ ├─base::as.data.frame(collect(x, n = n))
#> 56. │ ├─dplyr::collect(x, n = n)
#> 57. │ └─dbplyr:::collect.tbl_sql(x, n = n)
#> 58. │ ├─dbplyr::db_sql_render(x$src$con, x, cte = cte)
#> 59. │ └─dbplyr:::db_sql_render.DBIConnection(x$src$con, x, cte = cte)
#> 60. │ ├─dbplyr::sql_render(sql, con = con, ..., cte = cte)
#> 61. │ └─dbplyr:::sql_render.tbl_lazy(sql, con = con, ..., cte = cte)
#> 62. │ ├─dbplyr::sql_render(...)
#> 63. │ └─dbplyr:::sql_render.lazy_query(...)
#> 64. │ ├─dbplyr::sql_build(query, con = con, ...)
#> 65. │ └─dbplyr:::sql_build.lazy_select_query(query, con = con, ...)
#> 66. │ └─dbplyr:::get_select_sql(...)
#> 67. │ └─dbplyr::translate_sql_(select_expr, con, window = FALSE, context = list(clause = "SELECT"))
#> 68. │ └─base::lapply(...)
#> 69. │ └─dbplyr (local) FUN(X[[i]], ...)
#> 70. │ ├─dbplyr::escape(eval_tidy(x, mask), con = con)
#> 71. │ └─rlang::eval_tidy(x, mask)
#> 72. └─dbplyr (local) var(trip_distance, na.rm = TRUE)
#> 73. └─cli::cli_abort("{.fun {f}} is not available in this SQL variant.")
#> 74. └─rlang::abort(...)
# Copying data does not work
tbl_mtcars <- copy_to(con, mtcars)
#> Error in eval(expr, envir, enclos): nanodbc/nanodbc.cpp:1296: 00000: [Simba][ODBC] (11470) Transactions are not supported.
dbDisconnect(con)
Created on 2023-10-24 with reprex v2.0.2