Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementation Suggestion: packages mentioned in document take precedence over packages in user library #61

Open
yogat3ch opened this issue Nov 5, 2020 · 9 comments

Comments

@yogat3ch
Copy link
Contributor

yogat3ch commented Nov 5, 2020

Hi @yonicd,
As I'm diving deeper into the package and making pull requests for some features that I feel would be helpful, I'm noticing an issue that seems to happen at random and I think it's due to the way sinew searches the search list and all possible packages to generate the possible packages.

For example, I have an Rmd document that loads a couple of libraries, and uses certain packages via :: throughout. Randomly, sinew will select a package that's not used anywhere else in the Rmd:

 √ stats::filter        (2) 
 √ dplyr::arrange       (1) 
 √ utils::head          (1) 
 √ dplyr::select        (1) 
 √ dplyr::rename_at     (2) 
 √ dplyr::vars          (1) 
 √ sos::matches         (2) < Here
 √ stringr::str_replace (2) 
 √ stringr::str_detect  (1) 

I've been using sinew to namespace all of the R documents on my computer, and it does this quite often. There's probably a number of documents and scripts that are no longer going to work and will need debugging as a result.

To remedy this issue, I was thinking that it would make more sense for sinew to search the documents for all of the following ways in which namespaces are loaded:

  • library
  • require
  • attach
  • loadNamespace
  • roxygen2 @importFrom & @import
  1. grep can be used to identify lines in the document with these functions, the lines can be parsed to calls (for all but roxygen comments), and namespaces can be extracted by identifying the appropriate arguments. Resulting package names are then matched with those in the user library to eliminate the possibility of extraneous arguments to these functions being extracted.
    roxygen comments can simply be parsed with regex.

  2. Namespaces are then aggregated from all calls using :: and added to the list of possible packages.

  3. If a function is encountered that is not in the namespaces of these packages, then the user library is searched (and asked for verification if ask = TRUE).

Unlike the other simple feature additions that I've submitted as PRs, making this change will require substantial revisions so I wanted to hear your thoughts on such a change.

Looking forward to your response,
Stephen

@yonicd
Copy link
Owner

yonicd commented Nov 5, 2020

thanks for breathing life into the package again.

This is indeed a heavier lift and departs a bit from the current logic.

The way the package is set up now it is using the parser to figure out function names for the script context. That would have to stay put instead of grepping.

the order of the lookup is defined by the searchpath and what is defined in the global environment. you can see that in the pretty_utils

happy to discuss further

@yogat3ch
Copy link
Contributor Author

yogat3ch commented Nov 6, 2020

thanks for breathing life into the package again.

Certainly! I'm getting a lot of mileage out of it parsing all of the R code I've written over the last couple years. The experience of using it on such a diversity of code in an iterative fashion has had the side effect of running into all kinds of edge cases and exceptions where things bug out or are difficult to handle properly - so that's the inspiration!
Thank you for creating such a useful package - I'm so glad it exists!

The way the package is set up now it is using the parser to figure out function names for the script context. That would have to stay put instead of grepping.

This makes sense. I imagine that the premises for this is that the majority of use cases are going to be people using sinew on the document/workspace they have in the open session?
I think we can definitely preserve this behavior if users have come to expect it.

the order of the lookup is defined by the searchpath and what is defined in the global environment. you can see that in the pretty_utils

Yes, I did notice this. I don't quite yet understand the logic of how the functions from the packages on the searchpath are made to take precedence in the selection. I'll have to delve more into this to figure out how the the how functions derived from the code can supplement those derived from searchpath.
I take is sos::matches above is being grabbed because sos is loaded by sinew?

I've rolled some supporting functions for extracting the namespaces loaded in Rmd/R docs and then acquiring their respective exported functions in list format similar to what's recommend for force.

I'll see about submitting a PR tomorrow with the custom functions so you can see what I'm talking about here.

@yogat3ch
Copy link
Contributor Author

yogat3ch commented Nov 6, 2020

I gave this some more thought since yesterday evening. What do you think about adding a parameter called lookup_order. It would take a character vector (or list, see below for explanation) of any of the following package listings (with the default order as shown):

  • "searchpaths"/"sp" - the search path
  • "document"/"doc" - the document
  • "library"/"lib" - the user library
  • "sos" - package namespaces found with sos::findFn

The user could re-arrange the arguments as they see fit to determine the order in which namespaces are applied to the document.

This could even be structured as a list where they could supply namespace lists like that to force as well as these single character vectors to indicate a "path", and lookup would proceed down the list. It could potentially replace the force argument altogether.

It would require some reworking of how namespaces are attributed, but probably wouldn't be too difficult.

Thoughts?

@yogat3ch
Copy link
Contributor Author

yogat3ch commented Nov 6, 2020

As a follow-up I just opened #63 with those supporting functions I mentioned on a branch entitled lookup_order that might be the future home of said proposal
.

@yogat3ch
Copy link
Contributor Author

yogat3ch commented Nov 9, 2020

Hi @yonicd,
Just checking in as I haven't gotten a response on this yet. I'm still unsure as to which branch is the branch for tests as I don't see another active branch right now.
I'm reluctant to write tests for these unless you're planning on integrating them... any word on what you plan to do in that regard?

@yonicd
Copy link
Owner

yonicd commented Nov 9, 2020

There is a testing branch. I was looking at your PR over the weekend, and they look good. I'll continue going over them in the next few days.

Thanks again for all the content.

@yonicd
Copy link
Owner

yonicd commented Nov 10, 2020

I migrated the tests to the master branch. You can add them there now, the GHA CI will pick them up on PRs. thanks

@yogat3ch
Copy link
Contributor Author

Ok, thank you! I'll create the tests soon.

@yogat3ch
Copy link
Contributor Author

yogat3ch commented Nov 10, 2020

Ok, so looking back over the pull requests, all of the PR's implement changes to user-input features (when ask = TRUE) except for #63.
Testing #63 would be straightforward, but I don't feel like we've finished the discussion on this thread to the point where I understand that we're moving forward with adding a lookup order option and the functions in #63 will actually be put into use. If/when we do decide to implement this change, I can write tests for it.

Regarding the other PRs: I've searched around and I don't see anything suggesting that it's possible to actually run tests on functions that solicit user inputs. As far as I can tell, I don't think tests can be written for these PRs.
I've been using the modified sinew with ask = TRUE to parse through 100s of files on my computer and have run into numerous bugs/exceptions that I've corrected in the subsequent commits.
I'll finish up name spacing the remaining documents today and see if I encounter any more issues. Other than that I'm not sure what else to do for tests, unless you have some ideas?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants