Prevent reprex() from successfully running if it is not a reprex #191

Closed
prosoitos opened this issue May 14, 2018 · 9 comments

@prosoitos commented May 14, 2018

This thought was inspired by the conversation on this thread.

The purpose of the package reprex seems a little unclear to me:

  • is it to render code for sharing (as suggested by the main package description: "Render bits of R code for sharing")?
  • is it to render reprexes for sharing (as suggested by the package name and the sentence: "What is a reprex? It’s a reproducible example, as coined by Romain Francois.")?

Since reprex does more than just render bits of R code (it also runs that code in a fresh environment and gives helpful error messages if the code is not a reprex), and given its name, I think the intent is more likely the latter.

If the purpose is to produce nicely formatted reprexes, I wonder whether it is a good idea for reprex(), when run on non-reprexes, to output nicely formatted, reprex-looking chunks that are not reprexes. Admittedly, those do include an error message. But that message seems to be ignored by some new R users who think that they are posting reprexes, for instance on the RStudio Community site.

This also has the effect of blurring the concept of reprex with the idea of formatted code.

The tidyverse functions are often stricter than their base R counterparts: they will not run, and will instead give an error message, when there are data type inconsistencies, for instance (while the base R functions would give an output, for better or for worse). I really like this behaviour. I like the idea of functions not running if fed inadequate code. I was wondering whether having reprex() output only the error message (and not a reprex-looking chunk) when run on a non-reprex might make the package more in line with this philosophy, as well as make the actual concept of a reprex much clearer to new R users.

Then one could safely say "reprex() outputs a reprex", which is not always the case now. And asking people to use the package reprex would necessarily mean asking them to make a reprex in the first place (since without a reprex, the code would not run and would just give an error message). People would never post what they think is a reprex when it isn't. It might make helping them easier, but mostly, it might be more pedagogical about the actual concept of a reprex (which seems to be surrounded by a fair amount of confusion amongst people new to R).

@jennybc (Member) commented May 15, 2018

I don't think I quite follow. Can you show an example of "now" vs. what you are proposing? Or sketch the change you propose?

@prosoitos (Author) commented May 15, 2018

Sure. Sorry for not being clearer.

I feel that the package reprex as it is now might be contributing to some confusion between reprex (the package, the function, or the output of the function) and a reprex (reproducible example).

I find the current behaviour to be consistent with the stated package description, both on GitHub ("Render bits of R code for sharing") and in the help file (Description: "Convenience wrapper that uses the 'rmarkdown' package to render small snippets of code to target formats that include both code and output.").

But if the package goal is only to render code and output, then its name is misleading: rendering formatted code is in itself independent from the concept of reprex.

This is a problem because a lot of users view it not as a formatting/copy-paste convenience tool but as a tool to create reprexes (and obviously this is why it was created). But of course the package does not magically turn code into reproducible examples. You have to feed reprexes to it to get (formatted and "pastable") reprexes out of it.

Basically, non-reprex in, non-reprex out.

This blog post, for instance, which is used in the RStudio Community FAQ on reprex, highlights this confusion: it really sounds like you install the package, copy some code, run reprex::reprex() and tada! "Everything that you need to post a reprex is now automatically stored on your clipboard!" But of course, that is not true if your code was not a reprex.

Even the function's console output says "Rendered reprex is on the clipboard." (even when the rendered output is not a reprex at all).

If "reprex", as used in this console output or by this blog author, means the output of the function reprex(), that works. But calling something a "reprex" whether or not it is one (= a reproducible example) is confusing and problematic.

And this, unsurprisingly, leads to posts on the RStudio Community where people say that they are posting a reprex when they are not because, for instance, they are loading local data to which we don't have access.

To make all of this a lot simpler and clear up these semantic confusions, which (I think) lead to real conceptual confusion, I suggest that the goal of the package be to create rendered reprexes (as the name really suggests) rather than rendered code. When fed a reprex, reprex() would behave as it does now and output a rendered reprex, as promised. But when fed a non-reprex, instead of still reporting "Rendered reprex is on the clipboard." and producing nicely formatted code that can be happily pasted into a forum (with, admittedly, an error at the bottom of it), the function could simply print that error message in the console and do none of the other things it currently does. No rendered anything. Nothing to paste. The error message could of course be informative so that the new R user is not left hanging (for instance: "This is not a reprex. Are you sure you have loaded the necessary libraries to run your code and that you are not using data files from your computer?").

This way, one could simply say that reprex() outputs reprexes. "Rendered reprex is on the clipboard." would only appear when it is indeed a reprex that is on the clipboard. When telling new R users to post a reprex, it would be clear that we actually mean, well, a reprex. Not just the output of reprex() that is called "reprex" even when it is not a reprex. Getting the function to run would teach users to put together a reprex rather than potentially confusing them about what a reprex actually is.

Side note: I truly love the package! Thank you for all the fantastic work!!

@prosoitos (Author) commented May 15, 2018

Note: if the ability to "render bits of R code", independently of whether they are reprexes or not, were deemed useful, then there could be another function (for instance reprex::render()) which would render code without running it and with the console output "Rendered code is on the clipboard."

@hadley (Member) commented May 15, 2018

reprex() already goes to considerable lengths to ensure the code is reproducible (e.g. running in a clean session and in a temporary directory). Of course, the user can always work around this. But there is no way to tell whether code is fully reproducible or not (because sometimes generating an error is the whole point of the reprex).

I think you are trying to solve a human communication problem with code.

@prosoitos (Author) commented May 15, 2018

(e.g.. running in a clean session and in temporary directory)

Totally! And the package is phenomenal for that!

So, what about outputting an error message in the console (and no viewer or paste capabilities) when the code contains a path? Maybe it would give too many false positives? Loading local data seems to be the most common situation where people produce reprex() outputs which are not reprexes.

@prosoitos (Author) commented May 15, 2018

Or, if false positives are a concern, at least a warning in the console along the lines of "Your code contains a path. Are you sure that it will run on another machine?" with a "Y/n" prompt, so that the user could just answer "Y" and get an output if they know what they are doing or really want output even though it is not a reprex. It would give people who are unaware of the local data issue pause, and they would probably realize what they are doing wrong.

I feel that this would fit the goal of the reprex package to make helping people easier.

Help them help me help them 😉

@jennybc (Member) commented May 15, 2018

I agree with @hadley that we've arrived at the frontier between the technical constraints that reprex can provide and human issues.

I have legitimate uses for reprex that read from a file. For example, sometimes people aren't at liberty to share an Excel file with me, but they can still use reprex to show me some puzzle they are having with readxl.

Beyond that, parsing the code and deciding whether it reads from a path or not is not a trivial ask. Ditto for parsing results and judging if the error is "legitimate" or "invalid". reprex just dumps user's code into a templated .R file and renders it. It does no analysis of the code or its results.

We're planning a short video (or 2 or 3) to help those who'd rather see reprex in that form. I'll make sure to hit some of these classic gotchas, the same way I did in the rOpenSci Community Call (see, e.g., slide 10).

@jennybc closed this as completed May 15, 2018

@prosoitos (Author) commented May 16, 2018

OK. Thank you for the time given to my suggestion! And thank you for the extra resources to help new users.

@jennybc (Member) commented May 16, 2018

Another thing to keep on the radar is the eventual possibility of suggesting that people run their reprex on http://cloud.rstudio.com. The combination of that and the reprex package would reinforce all possible aspects of "self-contained". But we're not there yet.
