Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pivot_wider gives incorrect output when pivoting database table #693

Closed
eipi10 opened this issue Aug 11, 2021 · 2 comments · Fixed by #696
Closed

pivot_wider gives incorrect output when pivoting database table #693

eipi10 opened this issue Aug 11, 2021 · 2 comments · Fixed by #696

Comments

@eipi10
Copy link
Contributor

eipi10 commented Aug 11, 2021

I have a database table listing which instructors teach which courses. Some courses have more than one instructor and therefore can appear in more than one row in the database table. I'm trying to pivot the data to wide format so that each course has only one row. However, the pivoting operation gives incorrect output when run on the database table (but gives correct output on a local data frame).

Below is a reproducible example with a simplified data frame. The column seq (sequence) marks the primary and secondary instructors (values 1 and 2, respectively). Note that when pivot_wider is run on the database table, column instructor.id_1 has the wrong instructor ID value, while column instructor.name_2 has the wrong instructor name value. However, with a local data frame, pivot_wider works as expected.

library(tidyverse)
library(dbplyr)
#> 
#> Attaching package: 'dbplyr'
#> The following objects are masked from 'package:dplyr':
#> 
#>     ident, sql

# Fake data
x = tibble(instructor.id=rep(c("01","02"), each=2), 
           instructor.name=rep(c("Curie", "Arrhenius"), each=2),
           course=c("CHEM 1", "CHEM 2", "CHEM 1", "CHEM 2")) %>% 
  group_by(course) %>% 
  mutate(seq=1:n()) %>% 
  pivot_longer(c(instructor.id, instructor.name))

x
#> # A tibble: 8 x 4
#> # Groups:   course [2]
#>   course   seq name            value    
#>   <chr>  <int> <chr>           <chr>    
#> 1 CHEM 1     1 instructor.id   01       
#> 2 CHEM 1     1 instructor.name Curie    
#> 3 CHEM 2     1 instructor.id   01       
#> 4 CHEM 2     1 instructor.name Curie    
#> 5 CHEM 1     2 instructor.id   02       
#> 6 CHEM 1     2 instructor.name Arrhenius
#> 7 CHEM 2     2 instructor.id   02       
#> 8 CHEM 2     2 instructor.name Arrhenius

# Desired output (using local data frame)
x %>% pivot_wider(names_from=c(name, seq), values_from=value)
#> # A tibble: 2 x 5
#> # Groups:   course [2]
#>   course instructor.id_1 instructor.name_1 instructor.id_2 instructor.name_2
#>   <chr>  <chr>           <chr>             <chr>           <chr>            
#> 1 CHEM 1 01              Curie             02              Arrhenius        
#> 2 CHEM 2 01              Curie             02              Arrhenius

# With a database, same code gives incorrect output
memdb_frame(x) %>% pivot_wider(names_from=c(name, seq), values_from=value)
#> # Source:   lazy query [?? x 5]
#> # Database: sqlite 3.35.5 [:memory:]
#>   course instructor.id_1 instructor.name_1 instructor.id_2 instructor.name_2
#>   <chr>  <chr>           <chr>             <chr>           <chr>            
#> 1 CHEM 1 02              Curie             02              Curie            
#> 2 CHEM 2 02              Curie             02              Curie

Created on 2021-08-11 by the reprex package (v2.0.0)

@hadley hadley transferred this issue from tidyverse/tidyr Aug 16, 2021
@snowdj
Copy link

snowdj commented Mar 24, 2022

I got the same error. It only happens when using dbplyr with database, but it works properly when using dplyr with dataframe.

`> sessionInfo()
R version 4.0.4 (2021-02-15)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)

Matrix products: default

locale:
[1] LC_COLLATE=English_Canada.1252 LC_CTYPE=English_Canada.1252
[3] LC_MONETARY=English_Canada.1252 LC_NUMERIC=C
[5] LC_TIME=English_Canada.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] lavaan_0.6-9 naniar_0.6.1 psych_2.1.9 esquisse_1.1.0 lme4_1.1-27.1
[6] Matrix_1.3-2 dbplyr_2.1.1 odbc_1.3.3 DBI_1.1.2 glue_1.6.2
[11] readxl_1.3.1 forcats_0.5.1 stringr_1.4.0 dplyr_1.0.8 purrr_0.3.4
[16] readr_2.1.1 tidyr_1.1.4 tibble_3.1.6 ggplot2_3.3.5 tidyverse_1.3.1

loaded via a namespace (and not attached):
[1] minqa_1.2.4 colorspace_2.0-3 ellipsis_0.3.2 rio_0.5.29
[5] rprojroot_2.0.2 rsconnect_0.8.25 visdat_0.5.3 snakecase_0.11.0
[9] fs_1.5.2 rstudioapi_0.13 DT_0.20 bit64_4.0.5
[13] fansi_1.0.2 lubridate_1.8.0 xml2_1.3.3 splines_4.0.4
[17] mnormt_2.0.2 knitr_1.37 jsonlite_1.7.2 nloptr_1.2.2.3
[21] packrat_0.7.0 broom_0.7.11 shiny_1.7.1 compiler_4.0.4
[25] httr_1.4.2 backports_1.4.1 assertthat_0.2.1 fastmap_1.1.0
[29] lazyeval_0.2.2 cli_3.2.0 later_1.3.0 htmltools_0.5.2
[33] tools_4.0.4 gtable_0.3.0 Rcpp_1.0.7 cellranger_1.1.0
[37] jquerylib_0.1.4 vctrs_0.3.8 nlme_3.1-152 xfun_0.29
[41] openxlsx_4.2.5 rvest_1.0.2 mime_0.12 lifecycle_1.0.1
[45] pacman_0.5.1 MASS_7.3-54 scales_1.1.1 vroom_1.5.7
[49] hms_1.1.1 promises_1.2.0.1 parallel_4.0.4 flexdashboard_0.5.2
[53] yaml_2.2.1 curl_4.3.2 datamods_1.2.0 sass_0.4.0
[57] stringi_1.7.6 boot_1.3-28 zip_2.2.0 rlang_1.0.2
[61] pkgconfig_2.0.3 evaluate_0.14 lattice_0.20-41 htmlwidgets_1.5.4
[65] bit_4.0.4 tidyselect_1.1.2 here_1.0.1 magrittr_2.0.2
[69] R6_2.5.1 generics_0.1.2 pillar_1.7.0 haven_2.4.3
[73] foreign_0.8-81 withr_2.5.0 janitor_2.1.0 modelr_0.1.8
[77] crayon_1.5.0 shinyWidgets_0.6.3 utf8_1.2.2 tmvnsim_1.0-2
[81] plotly_4.10.0 tzdb_0.2.0 rmarkdown_2.11 grid_4.0.4
[85] data.table_1.14.2 pbivnorm_0.6.0 blob_1.2.2 reprex_2.0.1
[89] digest_0.6.29 xtable_1.8-4 httpuv_1.6.5 stats4_4.0.4
[93] munsell_0.5.0 viridisLite_0.4.0 bslib_0.3.1 phosphoricons_0.1.1`

@mgirlich
Copy link
Collaborator

The issue is fixed in the current dev version which you can install with

devtools::install_github("tidyverse/dbplyr")

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants