Combiners #10
Comments
⌨️ Activity: Switch to a new branch
Before you edit any code, create a local branch called "combiners" and push that branch up to the remote location "origin" (which is the GitHub host of your repository).
The first two lines aren't strictly necessary when you don't have any new branches, but it's a good habit to head back to
Comment on this issue once you've created and pushed the "combiners" branch. |
a |
⌨️ Activity: Add a data combiner
Write
|
Check your progress
Some answers to compare to your own:
1. Inspect the console output. Which task steps ( The
2. Inspect the value of Here's what my
⌨️ Activity: Explore
|
a |
⌨️ Activity: Use the combiner target downstream
It's time to reap the rewards from your first combiner.
When you've got it, share the image in 3_visualize/out/data_coverage.png as a comment. I'll respond when I see your comment. |
Great, you have a combiner hooked up from start to finish, and you probably learned some things along the way! It's time to add a second combiner that serves a different purpose - here, rather than produce a target that contains the data of interest, we'll produce a combiner target that summarizes the outputs of interest (in this case the state-specific .png files we've already created).
⌨️ Activity: Add a summary combiner
Don't write another combiner
Last time, you wrote your own combiner. This time you just need to check out
Prepare the task plan and task makefile to use
|
|
You're getting close! The last step for this second combiner is to connect it to the main pipeline. But this isn't trivial, because right now your code in
One function, two outputs?
To connect both combiners to the main pipeline - and more broadly to follow pipelining best practices, ensuring that our pipeline's reproducibility is robust to modification - we need
Let's take a moment to decide which effects of the task table we want to be visible. For this we need to check our project plans, because what we want does differ by project...ahh, here they are:
In this course project we won't ever need to revisit the state-specific data tables again, so we don't need to carry those
Great! So we only have two outputs that need to be represented by
This challenge should be ringing bells for you, because we've actually solved it twice already.
There are actually a few ways to implement this general strategy. So far we've created summary files, but in this case, the output of
⌨️ Activity: Make a multi-output target
For this course, let's go with option 3 from the list above.
Test
Add any comments, questions, or revelations to a comment on this issue. I'll respond when I see your comment. |
I wonder if the combiner target filtering could be more built-in? Perhaps some using a named vector in |
You're down to the last task for this issue! I hope you'll find this one rewarding. After all your hard work, you're now in a position to create a leaflet map that will give you interactive access to the locations, identities, and timeseries plots of the Upper Midwest's oldest gages, all in one .html map. Ready?
Use the plots downstream
Test
Make a pull request
It's finally time to submit your work.
I'll respond when I see your PR. |
Agreed. This pattern does make you want more customization in the combiners for sure. |
Yep. That was a pain point as I was working on this course. Also noted in this issue: DOI-USGS/scipiper#113 |
So far we've implemented split and apply operations; now it's time to explore combine operations in scipiper pipelines.
In this issue you'll add two combiners to serve different purposes - the first will combine all of the annual observation tallies into one giant table, and the second will summarize the set of state-specific timeseries plots generated by the task table. We'll use the second combiner in a target within the main remake.yml, downstream of the whole task table, to illustrate how task tables can fit into the flow of a longer pipeline.
Background
Approach
Combiners in scipiper are functions that accept a set of task-step targets as arguments and produce a single file or R object as output. We define their corresponding targets within the task remakefile (as opposed to within the main remake.yml) for several reasons:
remake::diagram(remake_file='remake.yml') and searching in vain for targets such as WI_tally:
For the same reasons that purrr and dplyr allow you to implement split, apply, and combine operations all in a single expression, it's conceptually tidier to think of those three operations as a bundle within scipiper pipelines, and therefore to code them all in the same place.
The number of inputs to a combiner should change if the number of tasks changes. If we hand-coded a combiner target with a command function that accepted a set of inputs (e.g., command: combine_tallies(WI_tally, MI_tally, [etc])), we'd need to manually edit the inputs to that function anytime we changed the states vector. That would be a pain and would make our pipeline susceptible to human error if we forgot or made a mistake in that editing. It's safer if we can automatically generate that list of inputs along with the rest of the task remakefile.
Implementation
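As a rough sketch of what "automatically generating that list of inputs" means, the base R snippet below builds a combiner command string from a vector of task names. The helper build_combiner_command() and the example states vector are hypothetical and only illustrate the idea; this is not how scipiper does it internally, and in practice create_task_makefile() writes these targets for you.

```r
# Hypothetical task names; in the course these come from the states vector
states <- c("WI", "MI", "MN")

# Hypothetical helper (not part of scipiper): builds the text of a combiner
# command from a function name, the task names, and the final step's suffix
build_combiner_command <- function(fun_name, task_names, step_suffix) {
  inputs <- paste0(task_names, "_", step_suffix)
  sprintf("%s(%s)", fun_name, paste(inputs, collapse = ", "))
}

cmd <- build_combiner_command("combine_tallies", states, "tally")
print(cmd)
```

Because the inputs are derived from states, adding or removing a state changes the generated command with no hand editing.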
The scipiper way to use combiners is to work with the finalize_funs, final_targets, as_promises, and tickquote_combinee_objects arguments to create_task_makefile(). Once you've set up these arguments properly, create_task_makefile() will write combiner targets into your task remakefile for you, autopopulating the arguments to match the final step of each task.
The finalize_funs argument is a vector of one or more function names, each of which will be called as the command to create a combiner target. Note that you don't get to specify anything else about your combiners here except their names. You can write your own combiner function or you can use the built-in combine_to_ind() combiner for a common type of combining (you'll see when we try it out). If you write your own combiner function, it should be defined within one of the sources or packages specified in create_task_makefile().
You don't get choices about the list of arguments that a combiner function will accept. If the output of your combiner will be a file, your combiner function must accept arguments that are a summary filename, the output from task 1, the output from task 2, and so on, in that order, and your combiner must write the output to the filename given by the summary filename argument. If the output of your combiner will be an R object, your combiner function should skip straight to the outputs from the tasks (and not accept an initial filename argument) and should return an R object. The declaration of the combiner function should therefore be either function(out_file, ...) or function(...) (where the name of the out_file argument is up to you).
The final_targets argument is a vector of target names for your one or more combiner targets. This argument's length needs to exactly match that of finalize_funs, and its values will be mapped 1:1 to those functions. If a finalize_fun (combiner) will return a file, the corresponding final_targets value should be a filename. If the combiner will return an R object, the final_targets value should be a valid object target name.
The utility of as_promises=TRUE will become clearer with an illustration shortly. For now you can use as_promises=FALSE.
We've been working with tickquote_combinee_objects=FALSE largely because you don't have any combiners. Once you add them, you should pretty much always use tickquote_combinee_objects=TRUE. This argument helps create_task_makefile() format the remakefile correctly across a range of possible apply and combine operations.
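The two combiner signatures described above can be sketched in base R as follows. This is a minimal illustration, not the course solution: the function names combine_tallies and combine_tally_file, the column names, and the toy data frames are all assumptions, and the sketch assumes each task's final step yields a data frame.

```r
# Object combiner: accepts only the task outputs and returns an R object
combine_tallies <- function(...) {
  # Row-bind every task output (assumed to be data frames) into one table
  do.call(rbind, list(...))
}

# File combiner: the first argument is the summary filename, followed by the
# task outputs; the result is written to that filename
combine_tally_file <- function(out_file, ...) {
  combined <- do.call(rbind, list(...))
  write.csv(combined, out_file, row.names = FALSE)
}

# Toy stand-ins for the outputs of two tasks (hypothetical values)
WI_tally <- data.frame(state = "WI", year = c(2000, 2001), n_obs = c(10, 12))
MI_tally <- data.frame(state = "MI", year = c(2000, 2001), n_obs = c(7, 9))

# Object-style call: returns the combined table
all_tallies <- combine_tallies(WI_tally, MI_tally)

# File-style call: writes the combined table to a temporary file
out_file <- tempfile(fileext = ".csv")
combine_tally_file(out_file, WI_tally, MI_tally)
```

Using ... in both declarations lets the same function serve any number of tasks, which is what allows the generated command to grow or shrink with the task list.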