potential to add check that data files are committed #242
Comments
I think this is a great feature, but to be robust, it would need a more substantial solution like the ideas discussed in #9. Unless the user specifies which files are inputs (and should therefore be committed), it's very hard for workflowr to guess what should be done. Without this extra infrastructure, we could do something like the following:
A big caveat of the above is that this check only works for files in Thoughts? |
I agree it would be a nice feature to have despite inevitable caveats. |
I think it is useful. What if an Rmd file saves a file to data/ rather than loading it?
Do we also want to emit a warning if the file is not tracked? (Possibly yes?)
|
The more I think about the caveats, the more I don't like this feature. It's a good point that users can write to The problem is that we don't have any clever way to really know what the code is doing as far as input/output. Some examples:
# Is the code reading or writing?
x <- customPkg::customFunc("data/file.txt")
# Regexes require the file path to be a contiguous string
x <- read.table(file.path("data", "file.txt"))
# If the user sets knit_root_dir to analysis/ and uses the here package to resolve file paths.
# Workflowr would look for the file in analysis/data/file.txt b/c it's not executing the code
x <- read.table(here::here("data/file.txt"))
This problem reminds me of how the drake package handles dependencies in Rmd files. You have to use its custom function |
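To make the regex caveat concrete, here is a minimal sketch that scans the three example lines for quoted strings beginning with data/. It is not workflowr code: the helper name `extract_data_paths` and the pattern are assumptions made purely for illustration.

```r
# Hypothetical helper (not part of workflowr): pull quoted string literals
# that start with "data/" out of lines of R code, without executing them.
extract_data_paths <- function(code_lines) {
  pattern <- "[\"']data/[^\"']+[\"']"
  hits <- unlist(regmatches(code_lines, gregexpr(pattern, code_lines)))
  # Drop the surrounding quotes and remove duplicates
  unique(gsub("[\"']", "", hits))
}

examples <- c(
  'x <- customPkg::customFunc("data/file.txt")',
  'x <- read.table(file.path("data", "file.txt"))',
  'x <- read.table(here::here("data/file.txt"))'
)

# One result per example line
found <- lapply(examples, extract_data_paths)
```

Under these assumptions the first and third lines yield `data/file.txt` and the second yields nothing, because the path is assembled by `file.path()`. Even where a path is found, the pattern cannot say whether the call reads or writes the file, or which directory the path is relative to.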
How feasible would it be to add a check that any files being sourced or loaded are included in the repo?
For example, it seems to be an easy mistake to have a line like:
dat = readRDS('data/my_dat.rds')
in my Rmd file but forget to publish
data/my_dat.rds
The ability to check for this kind of thing could be helpful, although maybe not so straightforward?
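One rough way to approach such a check, sketched here under assumptions rather than as anything workflowr implements: collect quoted data/ paths from the Rmd source with a regular expression and compare them with the files Git tracks. The function names `git_tracked_files` and `untracked_data_paths` are hypothetical, and the approach only sees paths written as a single string literal.

```r
# Hypothetical sketch, not workflowr's API.

# List files tracked by Git in a project (requires git on the PATH).
git_tracked_files <- function(project = ".") {
  system2("git", c("-C", shQuote(project), "ls-files"), stdout = TRUE)
}

# Return the data/ paths mentioned in the Rmd text that are not tracked.
untracked_data_paths <- function(rmd_lines, tracked) {
  pattern <- "[\"']data/[^\"']+[\"']"
  hits <- unlist(regmatches(rmd_lines, gregexpr(pattern, rmd_lines)))
  paths <- unique(gsub("[\"']", "", hits))
  setdiff(paths, tracked)
}

rmd_lines <- c(
  "dat = readRDS('data/my_dat.rds')",
  "meta <- read.csv(\"data/meta.csv\")"
)
# Stand-in for git_tracked_files(); the file names are made up for the example
tracked <- c("analysis/index.Rmd", "data/meta.csv")

not_committed <- untracked_data_paths(rmd_lines, tracked)
if (length(not_committed) > 0) {
  message("Referenced but not committed: ",
          paste(not_committed, collapse = ", "))
}
```

In a real project, `tracked` would come from `git_tracked_files()` run at the project root, and `rmd_lines` from `readLines()` on the Rmd file.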