Applying median to an integer vector fails on a big grouped_df #893

Closed
alexfun opened this issue Jan 15, 2015 · 19 comments

@alexfun alexfun commented Jan 15, 2015

This was originally reported at http://forums.udacity.com/questions/100254695/problems-with-dplyr-040. It appears to be specific to dplyr v0.4.0.

First download the dataset from https://s3.amazonaws.com/udacity-hosted-downloads/ud651/pseudo_facebook.tsv

Then, create the dataset pf using pf <- read.delim('~/Downloads/pseudo_facebook.tsv'), changing the file location to be appropriate to your system.

The bug occurs when running

pf.age_and_gender <- 
    group_by(subset(pf, !is.na(gender)), age, gender)

pf.fc_by_age_gender <- summarise(pf.age_and_gender, 
                             mean_friend_count = mean(friend_count), 
                             median_friend_count = median(friend_count),
                             n = n())

with error message Error: loss of precision when attempting to convert a numeric to an integer.


@alexfun alexfun commented Jan 15, 2015

Here is a 100-row sample of pf.age_and_gender that still has this problem:

> dput(pf.age_and_gender)
structure(list(userid = c(1098988L, 1172446L, 1937635L, 1265714L, 
1277736L, 1439717L, 2165663L, 2148397L, 1136925L, 2025751L, 2038585L, 
1578885L, 1058901L, 1065692L, 1326082L, 1704287L, 1576382L, 1345756L, 
1570027L, 1744542L, 1004934L, 2117197L, 1221204L, 1785869L, 1719619L, 
1285499L, 1879129L, 1964655L, 1125749L, 2034606L, 2132778L, 1288107L, 
2087314L, 1346325L, 2137975L, 1737129L, 1785189L, 1121087L, 1336821L, 
2082410L, 1897013L, 1697564L, 1066025L, 1587124L, 1209737L, 1496299L, 
1961416L, 1287823L, 1245409L, 2101632L, 1714273L, 1819972L, 1901036L, 
2089023L, 1106807L, 1931800L, 1440377L, 2082950L, 1000513L, 1357852L, 
1789951L, 1056617L, 1507382L, 1592690L, 1265127L, 1526734L, 1046691L, 
1211132L, 1052130L, 1853560L, 1717021L, 1985335L, 1349255L, 1144761L, 
1829214L, 1415050L, 1639969L, 1282129L, 1855430L, 1039404L, 1254469L, 
1775534L, 1064101L, 1763012L, 1916597L, 1785760L, 1358236L, 1423187L, 
1518215L, 1967747L, 1093999L, 1107076L, 1014093L, 1905140L, 1098363L, 
1497801L, 1624731L, 1792167L, 1878511L, 1165927L), age = c(29L, 
14L, 61L, 108L, 21L, 53L, 59L, 68L, 17L, 20L, 26L, 50L, 25L, 
53L, 45L, 27L, 48L, 38L, 28L, 51L, 26L, 21L, 39L, 24L, 36L, 99L, 
22L, 20L, 20L, 42L, 39L, 23L, 53L, 25L, 93L, 66L, 16L, 23L, 14L, 
16L, 63L, 20L, 40L, 18L, 25L, 15L, 16L, 33L, 54L, 19L, 69L, 23L, 
56L, 34L, 23L, 33L, 18L, 49L, 21L, 47L, 19L, 18L, 103L, 21L, 
21L, 49L, 17L, 17L, 66L, 50L, 28L, 15L, 33L, 108L, 63L, 22L, 
25L, 24L, 33L, 24L, 39L, 41L, 37L, 39L, 55L, 18L, 19L, 72L, 17L, 
23L, 27L, 41L, 25L, 38L, 31L, 57L, 19L, 78L, 29L, 36L), dob_day = c(12L, 
27L, 26L, 2L, 24L, 18L, 4L, 28L, 12L, 27L, 5L, 23L, 8L, 15L, 
25L, 20L, 2L, 10L, 10L, 22L, 1L, 1L, 10L, 28L, 18L, 28L, 12L, 
16L, 6L, 14L, 5L, 3L, 17L, 15L, 29L, 16L, 25L, 1L, 7L, 12L, 27L, 
18L, 28L, 28L, 13L, 30L, 11L, 23L, 8L, 23L, 6L, 25L, 12L, 25L, 
27L, 14L, 25L, 22L, 26L, 10L, 6L, 18L, 8L, 17L, 5L, 18L, 11L, 
25L, 26L, 25L, 9L, 2L, 1L, 8L, 27L, 14L, 12L, 22L, 7L, 20L, 28L, 
20L, 26L, 28L, 20L, 18L, 11L, 2L, 13L, 9L, 5L, 17L, 11L, 21L, 
18L, 18L, 1L, 29L, 21L, 23L), dob_year = c(1984L, 1999L, 1952L, 
1905L, 1992L, 1960L, 1954L, 1945L, 1996L, 1993L, 1987L, 1963L, 
1988L, 1960L, 1968L, 1986L, 1965L, 1975L, 1985L, 1962L, 1987L, 
1992L, 1974L, 1989L, 1977L, 1914L, 1991L, 1993L, 1993L, 1971L, 
1974L, 1990L, 1960L, 1988L, 1920L, 1947L, 1997L, 1990L, 1999L, 
1997L, 1950L, 1993L, 1973L, 1995L, 1988L, 1998L, 1997L, 1980L, 
1959L, 1994L, 1944L, 1990L, 1957L, 1979L, 1990L, 1980L, 1995L, 
1964L, 1992L, 1966L, 1994L, 1995L, 1910L, 1992L, 1992L, 1964L, 
1996L, 1996L, 1947L, 1963L, 1985L, 1998L, 1980L, 1905L, 1950L, 
1991L, 1988L, 1989L, 1980L, 1989L, 1974L, 1972L, 1976L, 1974L, 
1958L, 1995L, 1994L, 1941L, 1996L, 1990L, 1986L, 1972L, 1988L, 
1975L, 1982L, 1956L, 1994L, 1935L, 1984L, 1977L), dob_month = c(4L, 
12L, 10L, 11L, 8L, 9L, 12L, 2L, 6L, 10L, 6L, 6L, 8L, 6L, 8L, 
12L, 7L, 10L, 12L, 1L, 1L, 1L, 6L, 12L, 2L, 5L, 1L, 8L, 4L, 7L, 
3L, 1L, 9L, 4L, 2L, 11L, 11L, 1L, 1L, 9L, 9L, 9L, 5L, 11L, 4L, 
6L, 11L, 12L, 11L, 3L, 12L, 10L, 9L, 12L, 11L, 8L, 11L, 11L, 
8L, 11L, 1L, 3L, 7L, 10L, 1L, 8L, 10L, 12L, 11L, 3L, 1L, 10L, 
1L, 8L, 1L, 6L, 1L, 4L, 9L, 5L, 7L, 6L, 6L, 5L, 3L, 3L, 8L, 6L, 
7L, 4L, 7L, 5L, 5L, 4L, 1L, 8L, 9L, 9L, 2L, 4L), gender = structure(c(2L, 
2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 
1L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 
2L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 
2L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 
1L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 
2L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 
2L, 1L, 2L), .Label = c("female", "male"), class = "factor"), 
    tenure = c(710L, 65L, 1926L, 1203L, 125L, 196L, 768L, 1399L, 
    584L, 685L, 465L, 6L, 413L, 541L, 2223L, 72L, 1922L, 425L, 
    23L, 208L, 314L, 683L, 573L, 114L, 265L, 213L, 702L, 270L, 
    557L, 517L, 670L, 867L, 959L, 164L, 1520L, 916L, 441L, 42L, 
    81L, 10L, 55L, 156L, 1037L, 393L, 163L, 63L, 592L, 392L, 
    373L, 1386L, 1287L, 777L, 2191L, 223L, 565L, 448L, 344L, 
    163L, 246L, 928L, 274L, 233L, 806L, 474L, 18L, 145L, 183L, 
    280L, 1483L, 453L, 565L, 392L, 511L, 2035L, 1707L, 499L, 
    32L, 383L, 1923L, 278L, 321L, 209L, 737L, 355L, 503L, 639L, 
    524L, 1247L, 391L, 327L, 124L, 284L, 576L, 591L, 483L, 759L, 
    349L, 1197L, 1786L, 340L), friend_count = c(174L, 56L, 20L, 
    991L, 24L, 111L, 62L, 1252L, 428L, 1164L, 57L, 4L, 14L, 48L, 
    553L, 19L, 58L, 19L, 21L, 3L, 0L, 87L, 42L, 59L, 64L, 131L, 
    246L, 454L, 93L, 0L, 119L, 221L, 38L, 0L, 60L, 74L, 224L, 
    14L, 58L, 207L, 61L, 376L, 61L, 217L, 115L, 0L, 149L, 15L, 
    113L, 299L, 18L, 110L, 147L, 125L, 141L, 365L, 278L, 115L, 
    105L, 1L, 98L, 31L, 205L, 269L, 64L, 70L, 347L, 320L, 159L, 
    39L, 95L, 98L, 1L, 477L, 109L, 28L, 3L, 74L, 88L, 45L, 767L, 
    47L, 196L, 34L, 6L, 408L, 895L, 188L, 33L, 586L, 9L, 4L, 
    56L, 150L, 379L, 52L, 707L, 40L, 155L, 18L), friendships_initiated = c(114L, 
    50L, 4L, 156L, 16L, 88L, 23L, 1005L, 334L, 782L, 26L, 1L, 
    14L, 27L, 214L, 15L, 14L, 8L, 17L, 2L, 0L, 43L, 22L, 35L, 
    32L, 81L, 60L, 373L, 52L, 0L, 78L, 179L, 27L, 0L, 19L, 28L, 
    106L, 12L, 34L, 173L, 53L, 345L, 42L, 188L, 93L, 0L, 68L, 
    13L, 104L, 192L, 8L, 54L, 34L, 22L, 94L, 348L, 51L, 58L, 
    103L, 0L, 76L, 21L, 137L, 197L, 59L, 43L, 214L, 47L, 65L, 
    21L, 65L, 76L, 1L, 292L, 39L, 18L, 3L, 35L, 46L, 41L, 365L, 
    44L, 87L, 21L, 3L, 247L, 681L, 124L, 23L, 55L, 4L, 4L, 17L, 
    95L, 179L, 17L, 84L, 24L, 50L, 17L), likes = c(32L, 0L, 0L, 
    250L, 0L, 0L, 1L, 557L, 3L, 96L, 0L, 2L, 5L, 1L, 84L, 1L, 
    21L, 1L, 1L, 9L, 0L, 13L, 2L, 19L, 65L, 1L, 34L, 51L, 3L, 
    0L, 6L, 22L, 11L, 0L, 66L, 7L, 29L, 0L, 10L, 51L, 35L, 582L, 
    0L, 2L, 0L, 0L, 0L, 0L, 209L, 1415L, 39L, 9L, 69L, 0L, 0L, 
    16L, 81L, 0L, 0L, 0L, 115L, 1L, 0L, 10L, 3L, 9L, 80L, 272L, 
    71L, 145L, 2137L, 0L, 0L, 56L, 651L, 4L, 9L, 94L, 1470L, 
    1L, 3191L, 0L, 102L, 0L, 2L, 88L, 277L, 1477L, 0L, 0L, 0L, 
    0L, 8L, 1439L, 95L, 91L, 0L, 0L, 290L, 23L), likes_received = c(11L, 
    0L, 11L, 439L, 1L, 0L, 1L, 39L, 19L, 315L, 0L, 2L, 2L, 5L, 
    191L, 0L, 15L, 10L, 19L, 2L, 0L, 12L, 0L, 64L, 25L, 0L, 14L, 
    86L, 75L, 0L, 4L, 0L, 5L, 0L, 15L, 0L, 17L, 0L, 3L, 58L, 
    31L, 438L, 18L, 12L, 0L, 0L, 0L, 1L, 102L, 1527L, 10L, 5L, 
    106L, 0L, 0L, 1L, 174L, 18L, 19L, 0L, 12L, 0L, 0L, 122L, 
    0L, 28L, 248L, 370L, 89L, 31L, 271L, 1L, 0L, 190L, 224L, 
    1L, 0L, 47L, 1909L, 0L, 1250L, 0L, 141L, 0L, 0L, 12L, 479L, 
    16L, 1L, 10L, 5L, 0L, 18L, 533L, 52L, 45L, 4L, 2L, 336L, 
    2L), mobile_likes = c(32L, 0L, 0L, 250L, 0L, 0L, 1L, 0L, 
    3L, 96L, 0L, 2L, 5L, 0L, 21L, 0L, 21L, 1L, 1L, 0L, 0L, 13L, 
    1L, 19L, 60L, 0L, 33L, 42L, 3L, 0L, 6L, 19L, 8L, 0L, 49L, 
    0L, 29L, 0L, 0L, 51L, 0L, 457L, 0L, 2L, 0L, 0L, 0L, 0L, 209L, 
    926L, 39L, 9L, 27L, 0L, 0L, 10L, 81L, 0L, 0L, 0L, 90L, 1L, 
    0L, 9L, 3L, 9L, 80L, 188L, 0L, 54L, 2111L, 0L, 0L, 19L, 35L, 
    4L, 9L, 94L, 643L, 1L, 3134L, 0L, 102L, 0L, 0L, 49L, 277L, 
    0L, 0L, 0L, 0L, 0L, 8L, 1439L, 95L, 56L, 0L, 0L, 222L, 0L
    ), mobile_likes_received = c(11L, 0L, 10L, 324L, 0L, 0L, 
    1L, 29L, 6L, 290L, 0L, 2L, 2L, 5L, 61L, 0L, 11L, 10L, 17L, 
    2L, 0L, 12L, 0L, 43L, 23L, 0L, 9L, 43L, 24L, 0L, 4L, 0L, 
    2L, 0L, 6L, 0L, 17L, 0L, 2L, 47L, 7L, 212L, 13L, 10L, 0L, 
    0L, 0L, 0L, 73L, 458L, 10L, 5L, 59L, 0L, 0L, 0L, 116L, 10L, 
    19L, 0L, 7L, 0L, 0L, 37L, 0L, 10L, 197L, 132L, 21L, 1L, 126L, 
    1L, 0L, 136L, 66L, 1L, 0L, 30L, 699L, 0L, 652L, 0L, 87L, 
    0L, 0L, 10L, 331L, 2L, 0L, 10L, 2L, 0L, 12L, 452L, 34L, 25L, 
    3L, 1L, 231L, 0L), www_likes = c(0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 557L, 0L, 0L, 0L, 0L, 0L, 1L, 63L, 1L, 0L, 0L, 0L, 9L, 
    0L, 0L, 1L, 0L, 5L, 1L, 1L, 9L, 0L, 0L, 0L, 3L, 3L, 0L, 17L, 
    7L, 0L, 0L, 10L, 0L, 35L, 125L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    489L, 0L, 0L, 42L, 0L, 0L, 6L, 0L, 0L, 0L, 0L, 25L, 0L, 0L, 
    1L, 0L, 0L, 0L, 84L, 71L, 91L, 26L, 0L, 0L, 37L, 616L, 0L, 
    0L, 0L, 827L, 0L, 57L, 0L, 0L, 0L, 2L, 39L, 0L, 1477L, 0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 35L, 0L, 0L, 68L, 23L), www_likes_received = c(0L, 
    0L, 1L, 115L, 1L, 0L, 0L, 10L, 13L, 25L, 0L, 0L, 0L, 0L, 
    130L, 0L, 4L, 0L, 2L, 0L, 0L, 0L, 0L, 21L, 2L, 0L, 5L, 43L, 
    51L, 0L, 0L, 0L, 3L, 0L, 9L, 0L, 0L, 0L, 1L, 11L, 24L, 226L, 
    5L, 2L, 0L, 0L, 0L, 1L, 29L, 1069L, 0L, 0L, 47L, 0L, 0L, 
    1L, 58L, 8L, 0L, 0L, 5L, 0L, 0L, 85L, 0L, 18L, 51L, 238L, 
    68L, 30L, 145L, 0L, 0L, 54L, 158L, 0L, 0L, 17L, 1210L, 0L, 
    598L, 0L, 54L, 0L, 0L, 2L, 148L, 14L, 1L, 0L, 3L, 0L, 6L, 
    81L, 18L, 20L, 1L, 1L, 105L, 2L)), .Names = c("userid", "age", 
"dob_day", "dob_year", "dob_month", "gender", "tenure", "friend_count", 
"friendships_initiated", "likes", "likes_received", "mobile_likes", 
"mobile_likes_received", "www_likes", "www_likes_received"), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -100L), vars = list(
    age, gender), drop = TRUE, indices = list(38L, 1L, c(45L, 
71L), c(36L, 39L, 46L), c(66L, 67L, 88L), 8L, c(56L, 85L), c(43L, 
61L), c(60L, 86L, 96L), 49L, 41L, c(9L, 27L, 28L), c(4L, 21L, 
58L, 63L, 64L), 26L, 75L, c(54L, 89L), c(31L, 37L, 51L), c(23L, 
77L), 79L, c(12L, 92L), c(33L, 44L, 76L), 20L, 10L, 90L, 15L, 
    c(18L, 70L), 98L, 0L, 94L, 78L, c(47L, 55L, 72L), 53L, 24L, 
    99L, 82L, c(17L, 93L), 80L, c(22L, 30L, 83L), 42L, c(81L, 
    91L), 29L, 14L, 59L, 16L, c(57L, 65L), 11L, 69L, 19L, c(13L, 
    32L), 5L, 48L, 84L, 52L, 95L, 6L, 2L, 74L, 40L, c(35L, 68L
    ), 7L, 50L, 87L, 97L, 34L, 25L, 62L, 3L, 73L), group_sizes = c(1L, 
1L, 2L, 3L, 3L, 1L, 2L, 2L, 3L, 1L, 1L, 3L, 5L, 1L, 1L, 2L, 3L, 
2L, 1L, 2L, 3L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 3L, 1L, 1L, 
1L, 1L, 2L, 1L, 3L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L), biggest_group_size = 5L, labels = structure(list(
    age = c(14L, 14L, 15L, 16L, 17L, 17L, 18L, 18L, 19L, 19L, 
    20L, 20L, 21L, 22L, 22L, 23L, 23L, 24L, 24L, 25L, 25L, 26L, 
    26L, 27L, 27L, 28L, 29L, 29L, 31L, 33L, 33L, 34L, 36L, 36L, 
    37L, 38L, 39L, 39L, 40L, 41L, 42L, 45L, 47L, 48L, 49L, 50L, 
    50L, 51L, 53L, 53L, 54L, 55L, 56L, 57L, 59L, 61L, 63L, 63L, 
    66L, 68L, 69L, 72L, 78L, 93L, 99L, 103L, 108L, 108L), gender = structure(c(1L, 
    2L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 
    2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 
    2L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 
    2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 
    2L, 2L, 2L, 1L, 2L, 1L, 2L), .Label = c("female", "male"), class = "factor")), class = "data.frame", row.names = c(NA, 
-68L), .Names = c("age", "gender"), vars = list(age, gender)))


@rickyars rickyars commented Jan 21, 2015

What's really weird is that it's not consistent:

set.seed(0)

foo <- data.frame(var = sample(letters, 100, replace=TRUE), val = sample(1:9, 100, replace=TRUE))

foo %>%
  group_by(var) %>%
  summarise(median=median(val)) 

set.seed(3)

foo <- data.frame(var = sample(letters, 100, replace=TRUE), val = sample(1:9, 100, replace=TRUE))

foo %>%
  group_by(var) %>%
  summarise(median=median(val))

The first data frame works. The second doesn't. The only difference is the random seed.


@alexfun alexfun commented Jan 21, 2015

I think it depends on whether the median for the first group evaluates to numeric or integer.

In your first example, the median for var == "a" is numeric, whereas in the second example the median for var == "a" is an integer.

Doing the following makes the second example work:

set.seed(3)

foo <- data.frame(var = sample(letters, 100, replace=TRUE), val = sample(1:9, 100, replace=TRUE))

foo$var = gsub("a", "b1", foo$var)

foo %>%
    group_by(var) %>%
    summarise(median=median(val))
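
One way to check this hypothesis without dplyr is to look at the type of the median in each group using base R only. This is a rough sketch that reuses the foo data frame from above; median_types is a name made up for illustration, and the exact groups and types depend on the R version's random number generator, so no output is shown.

```r
set.seed(3)
foo <- data.frame(var = sample(letters, 100, replace = TRUE),
                  val = sample(1:9, 100, replace = TRUE))

# Type of median(val) within each group, in sorted group order
median_types <- sapply(split(foo$val, foo$var),
                       function(v) typeof(median(v)))

# A mix of "integer" and "double" means the groups disagree on the type
print(median_types)
print(table(median_types))
```

If the first entry is "integer" and a later one is "double", that matches the pattern described above.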


@jrvianna jrvianna commented Feb 4, 2015

Hello, having the same problem (using version 0.4.1). In my case, it is also related to the variable being summarized, not only the grouping variable. The example with my data:

df <- data.frame(group = c("G", "G", "G", "G", "G", "G", "G", "G", 
"G", "G", "G", "G", "G", "G", "G", "G", "G", "G", "G", "G", "G", 
"G", "G", "G", "G", "G", "G", "G", "G", "G", "G", "G", "G", "G", 
"G", "G", "G", "G", "G", "G", "G", "G", "G", "G", "G", "G", "G", 
"G", "G", "G", "G", "G", "G", "G", "G", "G", "G", "G", "G", "G", 
"G", "G", "G", "G", "G", "G", "G", "G", "G", "G", "G", "G", "G", 
"G", "G", "G", "G", "G", "G", "G", "G", "G", "G", "G", "G", "G", 
"G", "G", "G", "G", "G", "G", "G", "G", "G", "G", "G", "G", "G", 
"G", "G", "G", "G", "G", "G", "G", "G", "G", "G", "G", "G", "G", 
"G", "G", "G", "G", "G", "G", "G", "G", "G", "G", "G", "G", "G", 
"G", "G", "G", "G", "G", "G", "G", "G", "G", "G", "G", "G", "G", 
"G", "G", "G", "G", "G", "G", "G", "G", "G", "G", "G", "G", "G", 
"G", "G", "G", "G", "G", "G", "G", "G", "G", "G", "G", "G", "G", 
"G", "G", "G", "G", "G", "G", "G", "G", "G", "G", "G", "G", "G", 
"G", "G", "G", "G", "G", "G", "G", "G", "G", "G", "G", "G", "G", 
"G", "G", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", 
"C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", 
"C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C", "C"), 
baseline_iop = c(10L, 12L, 11L, 18L, 10L, 12L, 14L, 12L, 19L, 
12L, 15L, 17L, 21L, 9L, 18L, 13L, 7L, 27L, 13L, 6L, 6L, 10L, 
11L, 8L, 18L, 14L, 9L, 6L, 19L, 18L, 12L, 14L, 16L, 12L, 18L, 
12L, 10L, 11L, 14L, 14L, 11L, 12L, 16L, 17L, 8L, 15L, 7L, 14L, 
18L, 11L, 26L, 16L, 10L, 12L, 16L, 16L, 15L, 16L, 12L, 16L, 13L, 
18L, 16L, 15L, 13L, 15L, 15L, 17L, 15L, 11L, 21L, 16L, 17L, 14L, 
18L, 19L, 11L, 16L, 14L, 14L, 18L, 10L, 14L, 24L, 16L, 14L, 17L, 
8L, 11L, 13L, 13L, 8L, 20L, 11L, 19L, 11L, 16L, 15L, 5L, 17L, 
17L, 12L, 9L, 15L, 11L, 20L, 14L, 9L, 19L, 11L, 11L, 18L, 11L, 
14L, 7L, 11L, 14L, 11L, 17L, 11L, 12L, 15L, 13L, 13L, 8L, 15L, 
10L, 14L, 16L, 8L, 14L, 20L, 5L, 14L, 13L, 12L, 10L, 15L, 11L, 
12L, 8L, 12L, 14L, 12L, 14L, 13L, 14L, 13L, 14L, 13L, 16L, 16L, 
15L, 11L, 21L, 13L, 16L, 12L, 13L, 6L, 10L, 9L, 13L, 12L, 16L, 
13L, 15L, 11L, 11L, 12L, 18L, 14L, 16L, 18L, 6L, 14L, 14L, 21L, 
4L, 12L, 14L, 15L, 16L, 23L, 16L, 15L, 14L, 17L, 15L, 13L, 12L, 
21L, 15L, NA, 16L, 13L, 15L, 16L, 14L, 18L, 16L, 16L, 19L, 13L, 
15L, 15L, 13L, 15L, 13L, 12L, 20L, 17L, 16L, 14L, 12L, 14L, 12L, 
16L, 18L, 14L, 13L, 14L, 14L, 15L, 13L, 21L, 15L, 14L, 19L), 
    baseline_vfi = c(67L, 84L, 93L, 84L, 93L, 74L, 94L, 79L, 
    98L, 83L, 94L, 93L, NA, 65L, 95L, 84L, 59L, 94L, 69L, 56L, 
    NA, 95L, NA, 85L, NA, 87L, 63L, 81L, 75L, 97L, 97L, 79L, 
    72L, NA, 96L, 92L, 61L, 73L, NA, 93L, 90L, 76L, 84L, 71L, 
    94L, 98L, 50L, 94L, 94L, 98L, 94L, 98L, 95L, 91L, 82L, 91L, 
    95L, 97L, 96L, NA, NA, 99L, 98L, 94L, 98L, NA, 94L, 100L, 
    NA, 89L, 96L, 99L, 75L, 96L, 98L, 96L, NA, NA, 90L, NA, 96L, 
    98L, 96L, 97L, NA, 96L, 96L, 85L, NA, 98L, 94L, 89L, 96L, 
    98L, 86L, 95L, 97L, 98L, 74L, 95L, 93L, 99L, NA, 98L, 79L, 
    95L, NA, 97L, 98L, 78L, 95L, 89L, NA, 88L, 75L, 90L, 97L, 
    83L, 90L, 95L, 98L, 74L, 59L, 94L, NA, 81L, 93L, 98L, NA, 
    99L, 59L, 97L, 35L, 81L, 72L, 93L, 99L, 92L, 56L, 76L, 70L, 
    73L, 82L, 94L, 95L, 91L, 99L, 72L, 74L, NA, 99L, 99L, 96L, 
    97L, 100L, 70L, 96L, 98L, 91L, 87L, NA, 63L, 92L, 61L, 84L, 
    80L, 98L, 55L, 92L, 90L, 99L, 95L, 93L, 97L, 92L, 96L, 99L, 
    99L, 61L, 96L, 99L, 99L, 99L, 98L, 90L, 83L, 82L, 99L, NA, 
    79L, 53L, NA, 100L, 99L, 99L, 100L, 99L, 99L, 100L, 99L, 
    NA, 96L, 99L, 100L, 100L, 99L, 99L, 98L, 100L, 98L, 99L, 
    100L, NA, 99L, 97L, 99L, 98L, 99L, 100L, 99L, 100L, 100L, 
    100L, 100L, 99L, 100L, 100L, 99L, 97L))

df %>% group_by(group) %>% summarise(median(baseline_iop, na.rm = TRUE))

Source: local data frame [2 x 2]

  group median(baseline_iop, na.rm = TRUE)
1     C                                 15
2     G                                 14

df %>% group_by(group) %>% summarise(median(baseline_vfi, na.rm = TRUE))

Error: loss of precision when attempting to convert a numeric to an integer

Hope this helps, thanks for the great package!


@markshiz markshiz commented Feb 20, 2015

Same problem on 0.4.1


@djhocking djhocking commented Mar 20, 2015

Similar problem here:

> str(obs_per_day)
Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 584090 obs. of  3 variables:
 $ series_id  : chr  "1" "1" "1" "1" ...
 $ date       : Date, format: "2005-07-19" "2005-07-20" "2005-07-21" "2005-07-22" ...
 $ obs_per_day: int  1 1 1 1 1 1 1 1 1 1 ...
 - attr(*, "vars")=List of 1
  ..$ : symbol series_id
 - attr(*, "drop")= logi TRUE
> obs_per_day %>%
+     dplyr::group_by(series_id) %>%
+     dplyr::summarise(median_freq = median(obs_per_day, na.rm = T)
+ )
Error: loss of precision when attempting to convert a numeric to an integer

It does not happen without the grouping. Changing the grouping variable to an alphanumeric one made no difference. I also tried to make sure that the NAs were integers, but it didn't help:

dplyr::mutate(obs_per_day = ifelse(is.na(obs_per_day), NA_integer_, obs_per_day)) %>%


@gabrielflorit gabrielflorit commented Apr 9, 2015

This doesn't seem to happen if you multiply the median calculation by a float. The following works fine:

summarise(median = median(x) * 1.0)
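
A small base-R sketch of what the multiplication changes, assuming the error is tied to the type of the value each group returns (the vector x below is made up for illustration): an integer result becomes a double, so every group yields the same type. as.numeric() has the same effect.

```r
x <- c(3L, 1L, 2L)              # made-up integer data with odd length

typeof(median(x))               # "integer"
typeof(median(x) * 1.0)         # "double"
typeof(as.numeric(median(x)))   # "double"
```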


@teramonagi teramonagi commented Apr 23, 2015

@gabrielflorit
LGTM!!! It looks perfect!!!


@mbannert mbannert commented Apr 24, 2015

Got the same problem, and was able to 'fix' it by multiplying by 1.0, too. However, I don't feel like this should be the solution... ?

@romainfrancois romainfrancois self-assigned this Apr 24, 2015

@romainfrancois romainfrancois commented Apr 24, 2015

You definitely should not have to resort to tricks like multiplying by 1.0. I'll pick it up.


@romainfrancois romainfrancois commented Apr 24, 2015

Right, I get:

gdf <- df %>% group_by(group)
# grab 1-based indices (dplyr internally uses 0-based)
# lapply always returns a list, even when all groups have the same size
idx <- lapply( attr(gdf, "indices"), function(i) i + 1 )

so:

> median( gdf$baseline_vfi[ idx[[1]] ], na.rm = TRUE ) %>% str
 int 99
> median( gdf$baseline_vfi[ idx[[2]] ], na.rm = TRUE ) %>% str
 num 93

The first gives us an int and the second gives us a num. That's what's confusing the internals.
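
The type difference can be reproduced in base R alone. As far as I understand the default median method, an odd number of integer values returns the middle element unchanged (an integer), while an even number returns the mean of the two middle elements (a double). The vectors below are made up for illustration.

```r
odd     <- c(10L, 20L, 30L)        # 3 values: the middle one is returned
even    <- c(10L, 20L, 30L, 40L)   # 4 values: the two middle ones are averaged
with_na <- c(10L, 20L, 30L, NA)    # dropping the NA leaves 3 values

str(median(odd))                    #  int 20
str(median(even))                   #  num 25
str(median(with_na, na.rm = TRUE))  #  int 20
```

So whether a group yields an integer or a double depends on how many non-missing values it contains, which would explain why the failure looks data dependent.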


@romainfrancois romainfrancois commented Apr 24, 2015

Right now, dplyr assumes that the expression will give the same type for every group and guards against violations of that assumption, hence the error message: loss of precision when attempting to convert a numeric to an integer

Perhaps we could have some sort of promotion mechanism instead.

e.g. for the first few groups we get int and then for the next groups we get num, so we switch to collecting numeric values. @hadley ?
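
To illustrate the idea only, here is a rough R sketch of such a promotion step. It is not dplyr's implementation (which lives in C++); collect_with_promotion is a hypothetical helper, and it handles only the integer-to-double case discussed here.

```r
# Hypothetical helper: collect one scalar per group, starting with integer
# storage and promoting to double the first time a group returns a double.
collect_with_promotion <- function(groups, fun) {
  out <- integer(length(groups))
  for (i in seq_along(groups)) {
    value <- fun(groups[[i]])
    if (is.integer(out) && is.double(value)) {
      # R would coerce on assignment anyway; the explicit step mirrors
      # what typed storage would have to do.
      out <- as.double(out)
    }
    out[i] <- value
  }
  out
}

groups <- list(c(1L, 2L, 3L), c(1L, 2L, 3L, 4L))  # made-up groups
res <- collect_with_promotion(groups, median)
print(typeof(res))
print(res)
```

The first group returns an integer and the second a double, so the collected vector ends up as double and no value is truncated.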


@hadley hadley commented Apr 24, 2015

@romainfrancois that seems like a reasonable strategy to me. Ideally we should be able to use the same code for this, combine and the joins.


@romainfrancois romainfrancois commented Apr 25, 2015

Well it's not exactly the same situation, but I guess some logic can be shared.

@hadley hadley added this to the 0.5 milestone May 19, 2015

@vnijs vnijs commented Jun 18, 2015

Another (smaller) example (0.4.2 on Mac)

structure(list(price = c(580L, 650L, 630L, 706L, 1080L, 3082L, 
3328L, 4229L, 1895L, 3546L, 752L, 13003L, 814L, 6115L, 645L, 
3749L, 2926L, 765L, 1140L, 1158L), cut = structure(c(2L, 4L, 
4L, 2L, 3L, 2L, 2L, 3L, 4L, 1L, 1L, 3L, 2L, 4L, 3L, 3L, 1L, 2L, 
2L, 2L), .Label = c("Good", "Ideal", "Premium", "Very Good"), class = "factor")), row.names = c(NA, 
-20L), .Names = c("price", "cut"), class = "data.frame") %>%
  group_by(cut) %>% 
  select(price) %>% 
  summarise(price = median(price))


@romainfrancois romainfrancois commented Jul 15, 2015

Seems fixed now. Can everyone please try their examples and see if there is still a problem?


@vnijs vnijs commented Jul 15, 2015

I tested my example with version 0.4.2.9002 from github and it works. I also ran tests and examples from my main package and everything works great. Thanks @romainfrancois


@jrvianna jrvianna commented Jul 15, 2015

Tested my example and other data and all are working fine (0.4.2.9002). Thanks!


@nxskok nxskok commented Sep 21, 2016

It works on my example that failed last year.

@lock lock bot locked as resolved and limited conversation to collaborators Jun 8, 2018