tidyr::separate should prevent duplicated column names #255

Closed
holgerbrandl opened this issue Oct 24, 2016 · 1 comment
Labels
feature a feature request or enhancement strings 🎻

Example:

iris %>% separate(Species, c("Petal.Length"), remove = FALSE) %>% glimpse()
Observations: 150
Variables: 6
$ Sepal.Length <dbl> 5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9, 5.4,...
$ Sepal.Width  <dbl> 3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1, 3.7,...
$ Petal.Length <dbl> 1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5, 1.5,...
$ Petal.Width  <dbl> 0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2, 0.2, 0.1, 0.2,...
$ Species      <fctr> setosa, setosa, setosa, setosa, setosa, setosa, setos...
$ Petal.Length <chr> "setosa", "setosa", "setosa", "setosa", "setosa", "set...

The example does not really separate anything; it just illustrates the problem of tidyr creating a semantically invalid data frame with a duplicated column name.

tidyr::separate should throw an error in such a case, or at least overwrite the existing column and print a warning.
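The requested check can be sketched in base R as a guard that runs before the columns are created. The helper name `assert_no_name_clash` is hypothetical and is not part of tidyr; it only illustrates the behaviour asked for here, assuming `remove = FALSE` so that every existing column is kept.

```r
# Hypothetical helper, not part of tidyr's API: stop if any requested
# new column name is already present in the data frame.
assert_no_name_clash <- function(data, into) {
  clash <- intersect(into, names(data))
  if (length(clash) > 0) {
    stop("New column name(s) already exist in the data: ",
         paste(clash, collapse = ", "), call. = FALSE)
  }
  invisible(data)
}

# "Petal.Length" already exists in iris, so the guard raises an error;
# tryCatch() captures the message instead of aborting the script.
result <- tryCatch(
  assert_no_name_clash(iris, c("Petal.Length")),
  error = function(e) conditionMessage(e)
)
print(result)

# A name that is not yet used passes the check silently.
assert_no_name_clash(iris, c("Species.Name"))
```

With `remove = TRUE` the separated column itself is dropped, so a real implementation would probably exclude it from the comparison; this sketch does not handle that case.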

> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.5 (Yosemite)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_0.5.0     purrr_0.2.2     readr_1.0.0     tidyr_0.6.0    
[5] tibble_1.2      ggplot2_2.1.0   tidyverse_1.0.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.7      assertthat_0.1   R6_2.1.3         grid_3.3.1      
 [5] plyr_1.8.4       gtable_0.2.0     DBI_0.5-1        magrittr_1.5    
 [9] scales_0.4.0     stringi_1.1.1    tools_3.3.1      munsell_0.4.3   
[13] colorspace_1.2-6

pkopps commented Oct 25, 2016

Having the same issue with gather(); not sure if I could be doing something differently.

> sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] tidyr_0.6.0    stringr_1.1.0  dplyr_0.5.0    openxlsx_3.0.0 reshape2_1.4.1 ggplot2_2.1.0  pander_0.6.0   readr_1.0.0   
[9] knitr_1.14    

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.7      magrittr_1.5     munsell_0.4.3    colorspace_1.2-6 R6_2.2.0         highr_0.6        plyr_1.8.4      
 [8] tools_3.3.0      grid_3.3.0       gtable_0.2.0     DBI_0.5-1        yaml_2.1.13      lazyeval_0.2.0   assertthat_0.1  
[15] digest_0.6.10    tibble_1.1       formatR_1.4      rsconnect_0.5    evaluate_0.9     labeling_0.3     stringi_1.1.1   
[22] scales_0.4.0   
> names(final)
 [1] "csa"                         "WK_NUM_IN_YR"                "opening_percent_score"       "procedures_percent_score"   
 [5] "resolution_percent_score"    "account_promo_percent_score" "customer_care_percent_score" "documentation_percent_score"
 [9] "hold_transfer_percent_score" "closing_percent_score"       "total_percent_score"

> filter(final, csa == agent, WK_NUM_IN_YR == week) %>% gather(csa, WK_NUM_IN_YR) %>% names()
[1] "csa"          "WK_NUM_IN_YR" "csa"          "WK_NUM_IN_YR"
> filter(final, csa == agent, WK_NUM_IN_YR == week) %>% gather(csa, WK_NUM_IN_YR)
        csa WK_NUM_IN_YR                         csa WK_NUM_IN_YR
1  clkjedel           12       opening_percent_score    100.00000
2  clkjedel           12       opening_percent_score    100.00000
3  clkjedel           12    procedures_percent_score     58.82353
4  clkjedel           12    procedures_percent_score     64.70588
5  clkjedel           12    resolution_percent_score     68.42105
6  clkjedel           12    resolution_percent_score     21.05263
7  clkjedel           12 account_promo_percent_score      0.00000
8  clkjedel           12 account_promo_percent_score      0.00000
9  clkjedel           12 customer_care_percent_score    100.00000
10 clkjedel           12 customer_care_percent_score    100.00000
11 clkjedel           12 documentation_percent_score     80.00000
12 clkjedel           12 documentation_percent_score     80.00000
13 clkjedel           12 hold_transfer_percent_score      0.00000
14 clkjedel           12 hold_transfer_percent_score      0.00000
15 clkjedel           12       closing_percent_score    100.00000
16 clkjedel           12       closing_percent_score    100.00000
17 clkjedel           12         total_percent_score     68.00000
18 clkjedel           12         total_percent_score     60.00000
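One likely reading of the output above: the first two arguments of gather() after the data are the names of the new key and value columns, so `gather(csa, WK_NUM_IN_YR)` asks for new columns called `csa` and `WK_NUM_IN_YR` alongside the existing ones. A minimal sketch of a call that avoids the clash follows. The data frame `scores` and the names `score_type` and `score` are made up for illustration and are not taken from the original data.

```r
library(tidyr)

# Made-up toy data with the same shape as the problem:
# two identifier columns plus several score columns.
scores <- data.frame(
  csa = c("agent_a", "agent_a"),
  WK_NUM_IN_YR = c(12, 12),
  opening_percent_score = c(100, 100),
  closing_percent_score = c(100, 90),
  stringsAsFactors = FALSE
)

# Give the key and value columns names that are not already in use,
# and exclude the identifier columns from the gathering.
long <- gather(scores, key = "score_type", value = "score",
               -csa, -WK_NUM_IN_YR)

names(long)
anyDuplicated(names(long))  # 0 means no duplicated column names
```

This works around the duplicated names from the caller's side; it does not change the fact that gather() accepts key or value names that collide with existing columns.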
