A time version of expand #85
Comments
Huge +1 for a function such as this. I'm currently implementing it myself using [...]. The function signature by @matthieugomez looks good. I would probably add something whereby you can specify default values for the missing rows that are added. Although grouped data frames aren't supported by tidyr (yet?), I would include a method for a [...].

I think this function would also be useful in ggvis, e.g. running the following code:

df <- data_frame(
  date = c(as.Date("2014-01-01"), as.Date("2014-01-03")),
  sales = c(10, 13)
)
df %>%
  ggvis(~date, ~sales) %>%
  layer_lines

we get a linearly increasing graph, whereas I think it makes sense to set the value of the missing date (2014-01-02) to 0. If this wasn't done in ggvis, you could at least call this proposed function on the data set before visualising it.

Alternate name: [...]
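To make the "default value for missing rows" idea concrete, here is a minimal base R sketch. The helper name fill_dates() and its arguments are hypothetical (they are not part of tidyr or ggvis), and it assumes daily data.

```r
# Hypothetical helper, not a tidyr/ggvis function: adds a row for every
# missing day between the first and last date and fills the value column
# with a default. Assumes daily data.
fill_dates <- function(df, date_col, value_col, default = 0) {
  # Every day between the first and last observed date
  all_dates <- seq(min(df[[date_col]]), max(df[[date_col]]), by = "day")
  full <- data.frame(all_dates)
  names(full) <- date_col

  # Left join: dates absent from df get NA in the value column
  out <- merge(full, df, by = date_col, all.x = TRUE)

  # Replace NA with the default (this also overwrites NAs already in df)
  out[[value_col]][is.na(out[[value_col]])] <- default
  out
}

df <- data.frame(
  date = as.Date(c("2014-01-01", "2014-01-03")),
  sales = c(10, 13)
)
filled <- fill_dates(df, "date", "sales")
```

Plotting filled instead of df would show the dip to 0 on 2014-01-02 rather than a straight line between the two observed points.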
In case you're interested, I've knocked up this package very quickly based on existing code: https://github.com/Mullefa/inflate
I think this could just be an additional argument to expand:

df %>% expand(name, year, values = list(year = 1999:2003))

Or maybe:

seq_range <- function(x, period) {
  rng <- range(x, na.rm = TRUE)
  seq(rng[1], rng[2], by = period)
}
df %>% expand(name, year, values = list(year = function(x) seq_range(x, 1)))

(Note that [...]
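For readers who want to see what this proposal would produce, here is a rough base R sketch of the same idea. The values argument is only proposed in this thread, so the sketch uses expand.grid() and merge() instead; the sample data and the observed column are made up for illustration.

```r
# seq_range() is taken from the comment above; the data and the use of
# expand.grid() are illustrative assumptions, not tidyr's implementation.
seq_range <- function(x, period) {
  rng <- range(x, na.rm = TRUE)
  seq(rng[1], rng[2], by = period)
}

df <- data.frame(
  name = c("a", "a", "b"),
  year = c(1999, 2001, 2000),
  stringsAsFactors = FALSE
)

# Cross every name with every year in the overall observed range
grid <- expand.grid(
  name = unique(df$name),
  year = seq_range(df$year, 1),
  stringsAsFactors = FALSE
)

# Left join so that combinations absent from df become explicit rows
df$observed <- TRUE
full <- merge(grid, df, by = c("name", "year"), all.x = TRUE)
```

Here full has one row per name and year, with observed set to NA for the combinations that were not in the original data.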
Ideally this gives two functionalities: [...]

I think your proposal only handles the second case. The first functionality is useful for datasets with non-overlapping dates across groups. In certain cases, filling between the across-group min and the across-group max can make the dataset 10x bigger.
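A sketch of what filling within each group's own range could look like, assuming that is what the first case means. The function name expand_within_group() is hypothetical, and the sketch assumes a numeric time variable (unlist() would drop the Date class).

```r
# Hypothetical helper: builds, for each group, the sequence from that
# group's own minimum to its own maximum. Assumes a numeric time column.
expand_within_group <- function(df, group_col, time_col, period = 1) {
  # One time vector per group, each expanded over its own range
  pieces <- lapply(split(df[[time_col]], df[[group_col]]), function(t) {
    rng <- range(t, na.rm = TRUE)
    seq(rng[1], rng[2], by = period)
  })

  # Stack the per-group sequences back into a data frame
  out <- data.frame(
    rep(names(pieces), lengths(pieces)),
    unlist(pieces, use.names = FALSE),
    stringsAsFactors = FALSE
  )
  names(out) <- c(group_col, time_col)
  out
}

df <- data.frame(
  name = c("a", "a", "b", "b"),
  year = c(1999, 2001, 2005, 2006),
  stringsAsFactors = FALSE
)

# 5 rows: "a" gets 1999 to 2001, "b" gets 2005 to 2006
within <- expand_within_group(df, "name", "year")
```

Filling both groups over the overall range 1999 to 2006 would give 16 rows for the same data, which is the size blow-up described above.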
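I think the best way forward would be to figure out how to solve the vector case, i.e. write a good [...]:

seq_time <- function(x, period) {
  # Reject values that are not multiples of period (beyond a small tolerance)
  if (any(x %% period > 1e-6)) {
    stop("Time vector is not a regular sequence", call. = FALSE)
  }
  # Fill every step between the observed minimum and maximum
  rng <- range(x, na.rm = TRUE)
  seq(rng[1], rng[2], by = period)
}
seq_time(c(1, 2, 5, 10), 1)       # full sequence 1, 2, ..., 10
seq_time(c(1, 2, 5, 9.9, 10), 1)  # error: 9.9 is off the grid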
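One possible refinement, offered only as a sketch: the check above measures each value against multiples of period counted from zero, so a regular series such as 0.5, 1.5, 3.5 would be rejected, and an NA in x makes the if() condition fail. The variant below, with the hypothetical name seq_time_tol() and a hypothetical tol argument, measures offsets from the observed minimum instead and ignores NAs.

```r
# Hypothetical variant of seq_time(): regularity is checked relative to
# the smallest observed value, with a two-sided tolerance, ignoring NAs.
seq_time_tol <- function(x, period, tol = 1e-6) {
  rng <- range(x, na.rm = TRUE)

  # Offset of each value from the grid that starts at the minimum
  offset <- (x - rng[1]) %% period

  # Distance to the nearest grid point, so values just below a grid
  # point (offset close to period) are also accepted
  off_grid <- pmin(offset, period - offset) > tol
  if (any(off_grid, na.rm = TRUE)) {
    stop("Time vector is not a regular sequence", call. = FALSE)
  }

  seq(rng[1], rng[2], by = period)
}

ok <- seq_time_tol(c(0.5, 1.5, 3.5), 1)
bad <- tryCatch(seq_time_tol(c(1, 2, 5, 9.9, 10), 1), error = function(e) e)
```

In this example ok contains the filled sequence and bad holds the error condition raised for the irregular input.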
Thanks!
For datasets where one variable corresponds to a group and another to a time, a useful function would add rows for missing dates, making missing observations explicit.

texpand(mydata, name, year) is different from expand(mydata, name, year) in that it adds all dates between the min and the max of the last argument (2002 in the previous dataset).

This defines texpand: [...]

Another direction, rather than creating texpand and tlag, would be to define a new class of dataset, tbl_panel, which is a group + time. When setting the panel type, this checks that the time variable has no missing value and that there are no duplicate times by group. Then tidyr/dplyr has a special expand and a special lag for them.
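The two checks described for tbl_panel could be sketched in base R roughly as follows. The constructor name as_panel() and the attribute names are assumptions made for illustration; neither tidyr nor dplyr provides this class.

```r
# Hypothetical constructor for the proposed tbl_panel class. It runs the
# two checks described above and records which columns are group and time.
as_panel <- function(df, group_col, time_col) {
  # Check 1: the time variable has no missing values
  if (anyNA(df[[time_col]])) {
    stop("Time variable has missing values", call. = FALSE)
  }

  # Check 2: no (group, time) pair appears more than once
  if (anyDuplicated(df[c(group_col, time_col)]) > 0) {
    stop("Duplicate times within a group", call. = FALSE)
  }

  structure(
    df,
    class = c("tbl_panel", class(df)),
    group_col = group_col,
    time_col = time_col
  )
}

mydata <- data.frame(
  name = c("a", "a", "b"),
  year = c(1999, 2001, 2000),
  stringsAsFactors = FALSE
)
panel <- as_panel(mydata, "name", "year")

# A duplicated (name, year) pair is rejected
dup <- tryCatch(
  as_panel(rbind(mydata, mydata[1, ]), "name", "year"),
  error = function(e) e
)
```

Methods for expand and lag could then dispatch on the tbl_panel class and read the group and time columns from the attributes.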