Fill downup updown #504
Can you explain why you'd want to do this?
* Update lazyeval compat file
* Unquote scalar quosure with !!
* Use as_string(ensym()) rather than quo_name(enquo()). This is a much more robust way of capturing symbols
* Add uncount to ref index
* Build reference index w/ parens
* Fixes #480
A situation where I do this

I have values that are only known at particular times, and I need to fill each value both forwards and backwards in time.

A particular example

I work with clinical trial data, which is often provided in multiple files. When building an analysis data set, some information may only be recorded at certain events/times but needs to be filled forwards/backwards throughout a related time period. It is only valid to fill up/down within certain groupings (e.g. subject, day, part of study), and with many subjects and many groups this filling takes a noticeable amount of time. Filling may also be done within different groupings for different variables.

A simplified concrete example:

suppressPackageStartupMessages({
  library(dplyr)
})
# Weight is only recorded at event_type = 1, but is considered
# valid across the entire event_num.
# If 'wt' is not defined for a given event_num, it may be
# carried forwards from a prior run or backwards from a following run.
df <- tibble::tribble(
  ~subject, ~time, ~event_type, ~event_num, ~wt,
         1,     1,           0,          1,  NA,
         1,     2,           0,          1,  NA,
         1,     3,           1,          1,  20,
         1,     4,           0,          1,  NA,
         1,     5,           0,          1,  NA,
         1,     1,           0,          2,  NA,
         1,     2,           0,          2,  NA,
         1,     3,           1,          2,  NA,
         1,     4,           0,          2,  NA,
         1,     5,           0,          2,  NA,
         1,     1,           0,          3,  NA,
         1,     2,           0,          3,  NA,
         1,     3,           1,          3,  30,
         1,     4,           0,          3,  NA,
         1,     5,           0,          3,  NA,
)
# Fill wt down/up within event_num for each subject,
# then down/up within subject only.
df %>%
  group_by(subject, event_num) %>%
  tidyr::fill(wt, .direction = 'down') %>%
  tidyr::fill(wt, .direction = 'up') %>%
  group_by(subject) %>%
  tidyr::fill(wt, .direction = 'down') %>%
  tidyr::fill(wt, .direction = 'up') %>%
  ungroup()
#> # A tibble: 15 x 5
#>    subject  time event_type event_num    wt
#>      <dbl> <dbl>      <dbl>     <dbl> <dbl>
#>  1       1     1          0         1    20
#>  2       1     2          0         1    20
#>  3       1     3          1         1    20
#>  4       1     4          0         1    20
#>  5       1     5          0         1    20
#>  6       1     1          0         2    20
#>  7       1     2          0         2    20
#>  8       1     3          1         2    20
#>  9       1     4          0         2    20
#> 10       1     5          0         2    20
#> 11       1     1          0         3    30
#> 12       1     2          0         3    30
#> 13       1     3          1         3    30
#> 14       1     4          0         3    30
#> 15       1     5          0         3    30

Created on 2018-10-24 by the reprex package
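For comparison, here is a base-R sketch of the same two-stage fill. `fill_downup()` is a hypothetical helper written for this illustration, not a tidyr function. It assumes rows are already sorted by time within each group, as they are in the reprex. After a fill down, only leading NAs can remain, so "down then up" amounts to filling down and then giving those leading NAs the first observed value. That can be done in one vectorized step per group instead of two separate passes.

```r
# Hypothetical helper (not part of tidyr): equivalent to filling down and then
# up on a single vector, done in one vectorized step.
fill_downup <- function(x) {
  ok <- !is.na(x)
  if (!any(ok)) return(x)   # nothing observed in this group: leave it as is
  idx <- cumsum(ok)         # position of the most recent observed value
  idx[idx == 0] <- 1        # leading NAs borrow the first observed value
  x[ok][idx]
}

# Same data as the reprex; rows are assumed sorted by time within each group.
df <- data.frame(
  subject    = 1,
  time       = rep(1:5, times = 3),
  event_type = rep(c(0, 0, 1, 0, 0), times = 3),
  event_num  = rep(1:3, each = 5),
  wt         = c(NA, NA, 20, NA, NA,
                 NA, NA, NA, NA, NA,
                 NA, NA, 30, NA, NA)
)

# Stage 1: fill within subject and event_num; stage 2: within subject only.
df$wt <- ave(df$wt, df$subject, df$event_num, FUN = fill_downup)
df$wt <- ave(df$wt, df$subject, FUN = fill_downup)
df$wt
#> [1] 20 20 20 20 20 20 20 20 20 20 30 30 30 30 30
```

Event 2 has no recorded weight, so it stays NA after stage 1 and only picks up 20 from event 1 in the subject-level stage. This matches the reprex output.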
* Add missing tests for spread with fill
* Remove duplicate test for gather. The test below the removed one, with the same name, covers exactly the same test cases (and more)
* Add missing test for id with high dimension
* Use tibble in test-spread.r
* Ensure /tests is lint-free
* Resolve conflict and small style points
To quiet glue deprecation message
I also had several occasions where this type of functionality would be useful. However, I'd phrase them slightly differently: replace missing values based on the closest row (by some column, usually time). In case of an equal distance, use If data is ordered by the reference column then these
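The "closest row" variant can be sketched in base R as follows. `fill_nearest()` is a hypothetical helper, not a tidyr function. The tie-breaking rule in the comment above is cut off, so this sketch assumes that equally distant candidates are resolved in favour of the earlier row (smaller reference value). Only originally observed values are used as sources, and the rows do not need to be sorted.

```r
# Hypothetical helper, not part of tidyr: replace each NA in `x` with the value
# from the observed row whose reference value `t` (e.g. time) is closest.
# Assumption: ties go to the earlier row (smaller t); the rule intended in the
# original comment is not recoverable.
fill_nearest <- function(x, t) {
  known <- which(!is.na(x))               # only originally observed rows are sources
  if (length(known) == 0) return(x)       # nothing to fill from
  for (i in which(is.na(x))) {
    d <- abs(t[known] - t[i])             # distance to every observed row
    best <- known[d == min(d)]            # all rows at the minimum distance
    x[i] <- x[best[which.min(t[best])]]   # tie-break: smallest t wins
  }
  x
}

wt   <- c(NA, 10, NA, NA, NA, 40)
time <- 1:6
fill_nearest(wt, time)
#> [1] 10 10 10 10 40 40
```

Row 4 (time 4) is equally far from times 2 and 6, so under the assumed tie rule it takes 10. The loop compares every missing row with every observed row. That keeps the sketch simple, but it is quadratic in the worst case.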
Also url in order to generate CNAME
* Add CODE_OF_CONDUCT, CONTRIBUTING, ISSUE_TEMPLATE, and SUPPORT
OK. I think I totally hosed this PR by trying to sync it with current master. :/ Burn it to the ground and start again? I can't see a solution...
In the future, you might try
Add options to fill() to fill down-then-up and up-then-down.
This is to replace a common idiom of mine, i.e.
which could become
Depending on the number of groups and the number of variables to fill, avoiding the current duplicate call to fill() can give significant speed savings.
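The two proposed orders are not interchangeable. Both fill leading and trailing NAs, but an NA between two observed values takes the previous value under down-then-up and the next value under up-then-down. The base-R sketch below shows this on one vector. `fill_down()` and `fill_up()` are hypothetical helpers written for this illustration, not tidyr functions. Current tidyr releases accept `.direction = "downup"` and `.direction = "updown"` in `fill()` for these two cases.

```r
# Hypothetical base-R helpers (not tidyr functions).
# fill_down(): last observation carried forward; leading NAs stay NA.
fill_down <- function(x) {
  seen <- cumsum(!is.na(x))          # number of observed values so far
  c(NA, x[!is.na(x)])[seen + 1]      # 0 -> NA, k -> k-th observed value
}
# fill_up(): next observation carried backward, via reversal.
fill_up <- function(x) rev(fill_down(rev(x)))

x <- c(NA, 1, NA, 2, NA)

# The current two-call idiom, in each order.
fill_up(fill_down(x))  # down then up ("downup")
#> [1] 1 1 1 2 2
fill_down(fill_up(x))  # up then down ("updown")
#> [1] 1 1 2 2 2
```

Only the interior NA (position 3) differs, which is why the proposal offers both orders rather than a single "both directions" option.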