Skip to content
Permalink
Branch: master
Find file Copy path
Find file Copy path
Fetching contributors…
Cannot retrieve contributors at this time
82 lines (59 sloc) 2.87 KB
---
title: "Reprexes for dbplyr"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{reprex}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
If you're reporting a bug in dbplyr, it is much easier for me to help you if you can supply a [reprex](https://reprex.tidyverse.org) that I can run on my computer. Creating reprexes for dbplyr is particularly challenging because you are probably using a database that you can't share with me. Fortunately, in many cases you can still demonstrate the problem even if I don't have the complete dataset, or even access to the database system that you're using.
This vignette outlines three approaches for creating reprexes that will work anywhere:
* Use `memdb_frame()`/`tbl_memdb()` to easily create datasets that live in an
in-memory SQLite database.
* Use `lazy_frame()`/`tbl_lazy()` to simulate SQL generation of dplyr pipelines.
* Use `translate_sql()` to simulate SQL generation of columnar expression.
```{r setup, message = FALSE}
library(dplyr)
library(dbplyr)
```
## Using `memdb_frame()`
The first place to start is with SQLite. SQLite is particularly appealing because it's completely embedded instead an R package so doesn't have any external dependencies. SQLite is designed to be small and simple, so it can't demonstrate all problems, but it's easy to try out and a great place to start.
You can easily create a SQLite in-memory database table using `memdb_frame()`:
```{r}
mf <- memdb_frame(g = c(1, 1, 2, 2, 2), x = 1:5, y = 5:1)
mf
mf %>%
group_by(g) %>%
summarise_all(mean, na.rm = TRUE)
```
Reprexes are easiest to understand if you create very small custom data, but if you do want to use an existing data frame you can use `tbl_memdb()`:
```{r}
mtcars_db <- tbl_memdb(mtcars)
mtcars_db %>%
count(cyl) %>%
show_query()
```
## Translating verbs
Many problems with dbplyr come down to incorrect SQL generation. Fortunately, it's possible to generate SQL without a database using `lazy_frame()` and `tbl_lazy()`. Both take an `con` argument which takes a database "simulator" like `simulate_postgres()`, `simulate_sqlite()`, etc.
```{r}
x <- c("abc", "def", "ghif")
lazy_frame(x = x, con = simulate_postgres()) %>%
head(5) %>%
show_query()
lazy_frame(x = x, con = simulate_mssql()) %>%
head(5) %>%
show_query()
```
If you isolate the problem to incorrect SQL generation, it would be very helpful if you could also suggest more appropriate SQL.
## Translating individual expressions
In some cases, you might be able to track the problem down to incorrect translation for a single column expression. In that case, you can make your reprex even simpler with `translate_sql()`:
```{r}
translate_sql(substr(x, 1, 2), con = simulate_postgres())
translate_sql(substr(x, 1, 2), con = simulate_sqlite())
```
You can’t perform that action at this time.