Reshape two columns of data with spread #296

Closed
jstitlow opened this issue Apr 20, 2017 · 3 comments
Labels
reprex needs a minimal reproducible example

Comments

@jstitlow

Sorry, not a bug, just a suggestion.

This doesn't work using the spread function:
spread(df, Genotype, D)
Error: duplicate identifiers for rows (1,2,3...etc)

image

But this does:
spread(df, Genotype, D)

image

Is there a way to ignore the identifiers when the values are not correlated, and just fill the new columns directly?

Maybe something like this:
spread(df, Genotype, D, Literal = TRUE)

@hadley
Member

hadley commented Jun 23, 2017

Could you please rework your reproducible example to use the reprex package? That makes it easier to see both the input and the output, formatted in such a way that I can easily re-run it in a local session.

@hadley hadley added the reprex needs a minimal reproducible example label Jun 23, 2017
@markdly
Contributor

markdly commented Aug 17, 2017

Here's a minimal reprex based on the original post and a possible workaround solution.

reprex based on original post

#==== minimal reprex for original post ====
suppressPackageStartupMessages(library(tidyverse))

df <- tribble(
  ~D, ~Genotype,
  1.2, "GFP",
  3.9, "GFP",
  5.5, "GFP",
  2.7, "WT",
  4.8, "WT")

# As expected, spreading directly fails: no other column uniquely identifies the rows within each Genotype
df %>% spread(Genotype, D)
#> Error: Duplicate identifiers for rows (1, 2, 3), (4, 5)

# Adding a rowname column before spreading lets spread() run,
# but this is still not the desired output
df %>%
  rownames_to_column() %>%
  spread(Genotype, D)
#> # A tibble: 5 x 3
#>   rowname   GFP    WT
#> *   <chr> <dbl> <dbl>
#> 1       1   1.2    NA
#> 2       2   3.9    NA
#> 3       3   5.5    NA
#> 4       4    NA   2.7
#> 5       5    NA   4.8

Possible workaround solution
The desired output contains values for GFP and WT on the same row. Assuming the GFP and WT values are already ordered correctly, this can be achieved by adding a row id within each group and then spreading.

#==== Workaround solution for desired output ====
df %>% 
  group_by(Genotype) %>%
  mutate(group_row = 1:n()) %>%
  spread(Genotype, D)
#> # A tibble: 3 x 3
#>   group_row   GFP    WT
#> *     <int> <dbl> <dbl>
#> 1         1   1.2   2.7
#> 2         2   3.9   4.8
#> 3         3   5.5    NA
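
For readers on tidyr 1.0.0 or later, where `pivot_wider()` supersedes `spread()`, the same idea can be sketched as below. This is an adaptation rather than part of the original reprex: the variable name `wide`, the use of `row_number()` in place of `1:n()`, and the explicit `ungroup()` are choices made here for illustration. It still assumes the values within each Genotype are already in the intended order.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Same toy data as the reprex above
df <- tribble(
  ~D, ~Genotype,
  1.2, "GFP",
  3.9, "GFP",
  5.5, "GFP",
  2.7, "WT",
  4.8, "WT")

wide <- df %>%
  group_by(Genotype) %>%
  mutate(group_row = row_number()) %>%   # position of each value within its Genotype
  ungroup() %>%
  # group_row is the only remaining id column, so each row of the result
  # holds the n-th value of every Genotype; missing positions become NA
  pivot_wider(names_from = Genotype, values_from = D)

wide
```

The result should have the same three rows as the `spread()` workaround, with `NA` in the third row of `WT` because that group has only two values.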

@hadley
Member

hadley commented Nov 15, 2017

@markdly thanks for the reprex and workaround!
