spread changes factor order #47

manuelreif · 2014-12-15T12:13:10Z

Hi, and thank you for this package!

One question arose when using tidyr version 0.2.0 from CRAN.
I want to reshape a long table into a wide one, which works, but the ordering of the result is unexpected.
When using unite(), the new variable is of type character, which carries no information about the category ordering, so after spread() the output columns are sorted alphabetically. Is there any way to keep the factor-ordering information, or to avoid unite() altogether by passing more than one key to spread()? I fear this is currently not possible.
Would this be possible in future versions? It would be extremely helpful for creating tables automatically when the column order is given a priori.

library(tidyr)  # unite(), spread() and the %>% pipe

# creating some variables
year <- c(rep(2006,4),rep(2007,4),rep(2006,4),rep(2007,4),rep(2006,4),rep(2007,4))
f1 <- factor(rep(c("m","w","gesamt"),each=8),levels=c("m","w","gesamt"))
f2 <- factor(rep(letters[1:4],6),levels=letters[1:4])
val <- round(rnorm(24),2)

# creating a data.frame
d1 <- data.frame(year = year,f1,f2,val)

d1
   year     f1 f2   val
1  2006      m  a -0.92
2  2006      m  b  0.93
3  2006      m  c  1.10
4  2006      m  d -1.04
5  2007      m  a  0.02
6  2007      m  b -0.22
7  2007      m  c  1.00
8  2007      m  d -0.50
9  2006      w  a  1.56
10 2006      w  b -0.52
11 2006      w  c -1.51
12 2006      w  d  0.50
13 2007      w  a -0.25
14 2007      w  b -0.56
15 2007      w  c -0.31
16 2007      w  d  0.50
17 2006 gesamt  a  0.74
18 2006 gesamt  b -1.90
19 2006 gesamt  c  0.44
20 2006 gesamt  d  0.46
21 2007 gesamt  a -0.91
22 2007 gesamt  b  1.20
23 2007 gesamt  c  0.03
24 2007 gesamt  d -0.41

# from long --> to wide
d1 %>% unite(univar,f1,f2) %>% spread(univar,val)

  year gesamt_a gesamt_b gesamt_c gesamt_d   m_a   m_b m_c   m_d   w_a   w_b   w_c w_d
1 2006     0.74     -1.9     0.44     0.46 -0.92  0.93 1.1 -1.04  1.56 -0.52 -1.51 0.5
2 2007    -0.91      1.2     0.03    -0.41  0.02 -0.22 1.0 -0.50 -0.25 -0.56 -0.31 0.5
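A minimal base-R sketch of what seems to be happening, assuming unite() behaves like paste() here: the two factors are pasted into a plain character vector, and sorting the unique values of a character vector is alphabetical. Rebuilding a factor whose levels follow the order of first appearance recovers the intended m, w, gesamt order. The names grid, united and united_f are only for illustration.

```r
# Levels in the order the table should show them (as in the example above)
f1 <- factor(c("m", "w", "gesamt"), levels = c("m", "w", "gesamt"))
f2 <- factor(c("a", "b", "c", "d"), levels = c("a", "b", "c", "d"))

# All f1/f2 combinations; the first argument of expand.grid() varies fastest
grid <- expand.grid(f2 = f2, f1 = f1)

# Roughly what unite(univar, f1, f2) produces: a plain character vector
united <- paste(grid$f1, grid$f2, sep = "_")

# Character keys carry no level information, so sorting is alphabetical
alpha_order <- sort(unique(united))
print(alpha_order)  # starts with the gesamt_* keys

# Turning the key back into a factor with levels in order of first appearance
# restores the intended order m_*, w_*, gesamt_*
united_f <- factor(united, levels = unique(united))
print(levels(united_f))
```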

Thank you!
Manuel


mrdwab · 2015-04-01T04:15:17Z

This also seems to affect the "id" variables. See, for example, here: http://stackoverflow.com/q/29381069/1270695

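library(tidyr)  # gather(), unite(), spread() and the %>% pipe

# stringsAsFactors = TRUE makes name a factor, as in the str() output below
df = data.frame(name=c("B","B","A","A"),
                group=c("g1","g2","g1","g2"),
                V1=c(10,40,20,30),
                V2=c(6,3,1,7),
                stringsAsFactors = TRUE)

gather(df, Var, Val, V1:V2) %>% 
  unite(VarG, Var, group) %>% 
  spread(VarG, Val)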

Note the factor levels of "name" in the input and output: the input lists B first, but the output rows follow the level order (A, B).

> str(.Last.value)
'data.frame':   2 obs. of  5 variables:
 $ name : Factor w/ 2 levels "A","B": 1 2
 $ V1_g1: num  20 10
 $ V1_g2: num  30 40
 $ V2_g1: num  1 6
 $ V2_g2: num  7 3
> str(df)
'data.frame':   4 obs. of  4 variables:
 $ name : Factor w/ 2 levels "A","B": 2 2 1 1
 $ group: Factor w/ 2 levels "g1","g2": 1 2 1 2
 $ V1   : num  10 40 20 30
 $ V2   : num  6 3 1 7
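Assuming the goal is to keep the order of first appearance, one workaround is to relevel name before reshaping. A base-R sketch of just that step (stringsAsFactors = TRUE is used so that name is a factor, as in the str() output; recent R versions no longer convert strings by default):

```r
df <- data.frame(name  = c("B", "B", "A", "A"),
                 group = c("g1", "g2", "g1", "g2"),
                 V1    = c(10, 40, 20, 30),
                 V2    = c(6, 3, 1, 7),
                 stringsAsFactors = TRUE)

# data.frame() creates the factor levels in sorted order, so "A" comes first
print(levels(df$name))

# Relevel so the levels follow the order of first appearance ("B", "A");
# the data values themselves are unchanged
df$name <- factor(df$name, levels = unique(as.character(df$name)))
print(levels(df$name))
```

Since spread() appears to order the rows by the levels of the id variable, running the same gather()/unite()/spread() pipeline on the relevelled df would be expected to return the B row first; that expectation is not shown anywhere in the thread.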

dataRangler · 2015-04-01T17:25:56Z

I asked the SO question :-) I am a beginner, but when I looked at the spread.R source code, line 79 seems to be where the sorting starts. Is it necessary? I am new to R and GitHub and do not know how to test this yet.

# Add in missing values, if necessary
if (length(overall) < n) {
  overall <- match(seq_len(n), overall, nomatch = NA)
} else {
  overall <- order(overall)
}
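To see what this branch does on its own, here is a small base-R illustration with made-up cell ids (overall_full, overall_gap and n are hypothetical stand-ins for the variables in the snippet). When every output cell occurs once, order() and match(seq_len(n), ...) give the same permutation; when a cell is missing, match() leaves an NA in its slot.

```r
n <- 4L

# Case 1: every output cell occurs exactly once (a permutation of 1:n)
overall_full <- c(4L, 1L, 3L, 2L)
full_order <- order(overall_full)
full_match <- match(seq_len(n), overall_full, nomatch = NA)
print(full_order)                         # 2 4 3 1
print(identical(full_order, full_match))  # TRUE

# Case 2: cell 2 is absent from the data, so its slot becomes NA
overall_gap <- c(4L, 1L, 3L)
gap_match <- match(seq_len(n), overall_gap, nomatch = NA)
print(gap_match)                          # 2 NA 3 1
```

Read this way, the branch only places each value into its numbered cell. The alphabetical order seems to come earlier, from how the cell numbers are assigned to character keys, which would also explain why a factor key keeps its level order.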

dataRangler · 2015-04-02T22:17:35Z

I've just found that dplyr::summarise() sorts as well. Is this a deliberate design choice?

df %>% 
  group_by(name) %>% 
  summarise(n())
Source: local data frame [2 x 2]

  name n()
1    A   2
2    B   2
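Grouped summaries generally come back in sorted key order, and for a factor that means level order, so the level order is what controls the output. A base-R analogue using table() (not dplyr itself) shows the effect; name_relevel is a hypothetical relevelled copy:

```r
name <- factor(c("B", "B", "A", "A"))  # default levels: "A", "B"
counts_default <- table(name)
print(names(counts_default))           # "A" "B"

# Same data, levels in order of first appearance
name_relevel <- factor(name, levels = c("B", "A"))
counts_relevel <- table(name_relevel)
print(names(counts_relevel))           # "B" "A"
```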

hadley · 2015-08-24T15:36:32Z

Instead of unite() do:

library(dplyr)  # mutate()
library(tidyr)  # spread()

d1 %>% mutate(univar = f1:f2, f1 = NULL, f2 = NULL) %>% spread(univar,val)
# OR
d1 %>% mutate(univar = interaction(f1, f2), f1 = NULL, f2 = NULL) %>% spread(univar,val)
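Both suggestions keep the key as a factor, and spread() then appears to lay out the new columns in the key's level order. The two constructors order their levels differently, though. A base-R check using the levels of f1 and f2 from the original example (the grid is only there to give both factors equal length):

```r
f1 <- factor(c("m", "w", "gesamt"), levels = c("m", "w", "gesamt"))
f2 <- factor(c("a", "b", "c", "d"), levels = c("a", "b", "c", "d"))

# ':' and interaction() work element-wise, so build all combinations first
grid <- expand.grid(f2 = f2, f1 = f1)

# f1:f2 -> levels "m:a", "m:b", ...; the second factor varies fastest
colon_levels <- levels(grid$f1:grid$f2)

# interaction() by default lets the FIRST factor vary fastest: "m.a", "w.a", ...
inter_levels <- levels(interaction(grid$f1, grid$f2))

# lex.order = TRUE gives the same ordering as f1:f2 (with "." as separator)
inter_lex_levels <- levels(interaction(grid$f1, grid$f2, lex.order = TRUE))

print(colon_levels)
print(inter_levels)
print(inter_lex_levels)
```

So with the default interaction() the columns would presumably come out grouped by f2 (m.a, w.a, gesamt.a, m.b, ...), while f1:f2 or interaction(f1, f2, lex.order = TRUE) groups them by f1, as asked in the original question.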

hadley closed this as completed Aug 24, 2015
