augment makes 'OBJECT' columns #287

Closed
romainfrancois opened this issue Mar 5, 2018 · 8 comments

Comments

@romainfrancois

This is what was troubling tidyverse/dplyr#3349

Not sure this is a broom issue; it might be a legacy from lm. Just passing the 🏀 along.

library(broom)
library(purrr)

objects <- lm(y ~ ., data = freeny) %>%
  augment() %>% 
  keep(is.object)

objects %>% 
  map(class)
#> $y
#> [1] "numeric"
#> 
#> $.resid
#> [1] "numeric"
#> 
#> $.cooksd
#> [1] "numeric"
#> 
#> $.std.resid
#> [1] "numeric"

objects %>% 
  map(attr, "class")
#> $y
#> NULL
#> 
#> $.resid
#> NULL
#> 
#> $.cooksd
#> NULL
#> 
#> $.std.resid
#> NULL

Created on 2018-03-05 by the reprex package (v0.2.0).

@romainfrancois
Author

Well actually ... 👉 stats::model.frame

mod <- lm(y ~ ., data = freeny)
y <- model.frame(mod)$y
is.object(y)
#> [1] TRUE
attr(y, "class")
#> NULL

Created on 2018-03-05 by the reprex package (v0.2.0).

@romainfrancois
Author

I've just added some defensive code in dplyr.
tidyverse/dplyr#3394

This is not a broom issue per se; broom is an innocent bystander to what model.frame does. I'll leave this open so that you can decide whether you want defensive code as well.
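
For anyone who wants a similar guard at the R level, here is a minimal sketch. It is not the dplyr patch referenced above, which lives in dplyr's internals. The helper names `has_stale_object_bit()` and `clear_stale_object_bit()` are hypothetical, and the sketch assumes that `unclass()` resets the object flag on a vector that has no class attribute.

```r
# Hypothetical helper: TRUE when is.object() reports an object but there is
# no class attribute to back it up (the situation shown in this thread).
has_stale_object_bit <- function(x) {
  is.object(x) && is.null(attr(x, "class"))
}

# Hypothetical helper: pass every affected column through unclass(), which is
# assumed here to clear the flag; all other columns are left untouched.
clear_stale_object_bit <- function(df) {
  df[] <- lapply(df, function(col) {
    if (has_stale_object_bit(col)) unclass(col) else col
  })
  df
}

mf <- model.frame(y ~ ., data = freeny)
cleaned <- clear_stale_object_bit(mf)

# Names of the columns that are still affected after cleaning
names(Filter(has_stale_object_bit, cleaned))
```

On an R version that does not show the problem, the helper should simply return the data frame with its columns unchanged.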

@MichaelChirico
Contributor

MichaelChirico commented Mar 5, 2018

Since I didn't catch all of what was going on from the above, here's what's happening:

is.object(freeny$y)
# [1] TRUE
attr(freeny$y, 'class')
# [1] "ts"
class(freeny$y)
# [1] "ts"

# ts attribute wiped by model.frame
class(model.frame(y ~ ., data = freeny)$y)
# [1] "numeric"
attr(model.frame(y ~ ., data = freeny)$y, 'class')
# NULL

# but still:
is.object(model.frame(y ~ ., data = freeny)$y)
# [1] TRUE

This is actually partially documented behavior of stats::model.frame.default:

Unless na.action = NULL, time-series attributes will be removed from the variables found (since they will be wrong if NAs are removed).

And lo and behold:

class(model.frame(y ~ ., data = freeny, na.action = NULL)$y)
# [1] "ts"

The unexpected part is that the class attribute has been removed, yet is.object() still returns TRUE, i.e. the internal object flag was never cleared. This looks like a bug to me; here is the guilty line:

# before this, class(data$y) is 'ts'
data <- .External2(C_modelframe, formula, rownames, variables, 
    varnames, extras, extranames, subset, na.action)
# now, it's 'numeric'

To reproduce this in isolation, we can call the C routine directly with the following arguments:

formula = y ~ lag.quarterly.revenue + price.index + income.level + market.potential
rownames = c("1962.25", "1962.5", "1962.75", "1963", "1963.25", "1963.5", "1963.75", "1964", "1964.25", "1964.5", "1964.75", "1965", "1965.25", "1965.5", "1965.75", "1966", "1966.25", "1966.5", "1966.75", "1967", "1967.25", "1967.5", "1967.75", "1968", "1968.25", "1968.5", "1968.75", "1969", "1969.25", "1969.5", "1969.75", "1970", "1970.25", "1970.5", "1970.75", "1971", "1971.25", "1971.5", "1971.75")
variables = list(structure(c(8.79236, 8.79137, 8.81486, 8.81301, 8.90751, 8.93673, 8.96161, 8.96044, 9.00868, 9.03049, 9.06906, 9.05871, 9.10698, 9.12685, 9.17096, 9.18665, 9.23823, 9.26487, 9.28436, 9.31378, 9.35025, 9.35835, 9.39767, 9.4215, 9.44223, 9.48721, 9.52374, 9.5398, 9.58123, 9.60048, 9.64496, 9.6439, 9.69405, 9.69958, 9.68683, 9.71774, 9.74924, 9.77536, 9.79424), .Tsp = c(1962.25, 1971.75, 4), class = "ts"), c(8.79636, 8.79236, 8.79137, 8.81486, 8.81301, 8.90751, 8.93673, 8.96161, 8.96044, 9.00868, 9.03049, 9.06906, 9.05871, 9.10698, 9.12685, 9.17096, 9.18665, 9.23823, 9.26487, 9.28436, 9.31378, 9.35025, 9.35835, 9.39767, 9.4215, 9.44223, 9.48721, 9.52374, 9.5398, 9.58123, 9.60048, 9.64496, 9.6439, 9.69405, 9.69958, 9.68683, 9.71774, 9.74924, 9.77536), c(4.70997, 4.70217, 4.68944, 4.68558, 4.64019, 4.62553, 4.61991, 4.61654, 4.61407, 4.60766, 4.60227, 4.5896, 4.57592, 4.58661, 4.57997, 4.57176, 4.56104, 4.54906, 4.53957, 4.51018, 4.50352, 4.4936, 4.46505, 4.44924, 4.43966, 4.42025, 4.4106, 4.41151, 4.3981, 4.38513, 4.3732, 4.3277, 4.32023, 4.30909, 4.30909, 4.30552, 4.29627, 4.27839, 4.27789), c(5.8211, 5.82558, 5.83112, 5.84046, 5.85036, 5.86464, 5.87769, 5.89763, 5.92574, 5.94232, 5.95365, 5.9612, 5.97805, 6.00377, 6.02829, 6.03475, 6.03906, 6.05046, 6.05563, 6.06093, 6.07103, 6.08018, 6.08858, 6.10199, 6.11207, 6.11596, 6.12129, 6.122, 6.13119, 6.14705, 6.15336, 6.15627, 6.16274, 6.17369, 6.16135, 6.18231, 6.18768, 6.19377, 6.2003), c(12.9699, 12.9733, 12.9774, 12.9806, 12.9831, 12.9854, 12.99, 12.9943, 12.9992, 13.0033, 13.0099, 13.0159, 13.0212, 13.0265, 13.0351, 13.0429, 13.0497, 13.0551, 13.0634, 13.0693, 13.0737, 13.077, 13.0849, 13.0918, 13.095, 13.0984, 13.1089, 13.1169, 13.1222, 13.1266, 13.1356, 13.1415, 13.1444, 13.1459, 13.152, 13.1593, 13.1579, 13.1625, 13.1664))
varnames = c("y", "lag.quarterly.revenue", "price.index", "income.level", "market.potential")
extras = list()
extranames = NULL
subset = NULL
na.action = 'na.omit'

Then comparing

class(.External2(stats:::C_modelframe, formula, rownames, variables,
                 varnames, extras, extranames, subset, na.action)$y)
# [1] "numeric"
class(.External2(stats:::C_modelframe, formula, rownames, variables, 
                 varnames, extras, extranames, subset, NULL)$y)
# [1] "ts"

This differs from the simple behavior of stats::na.omit, adding to the argument that the behavior of model.frame is unintentional:

DF <- data.frame(x = c(1, 2, 3), y = c(0, 10, NA))
is.object(DF$y)
# [1] FALSE
class(DF$y) <- 'foo'
is.object(DF$y)
# [1] TRUE
class(na.omit(DF)$y)
# [1] "numeric"
is.object(na.omit(DF)$y)
# [1] FALSE

I've just passed this along to r-devel.

@romainfrancois
Author

Thanks for taking the time to report the bug upstream :-)

@IndrajeetPatil
Contributor

@alexpghayes Maybe this is already fixed after the release of R 3.6.0?
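
One way to check this on a given installation is a short script like the one below. It is only a sketch, and `still_present` is a name made up here. It reports the running R version and whether the symptom from this thread is still observable, without assuming either outcome.

```r
# Rebuild the model frame column that triggered the original report.
y <- model.frame(y ~ ., data = freeny)$y

# The symptom: is.object() is TRUE although no class attribute exists.
still_present <- is.object(y) && is.null(attr(y, "class"))

cat(R.version.string, "- stale object bit present:", still_present, "\n")
```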

@github-actions

This issue has been automatically closed due to inactivity.

@github-actions

github-actions bot commented Jul 2, 2021

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Jul 2, 2021