write_bib() could respect the CITATION file if any #2028

Open
cderv opened this issue Jul 19, 2021 · 9 comments

@cderv
Collaborator

cderv commented Jul 19, 2021

This would allow package-specific handling, such as keeping only the first URL when several are given, to be applied.

For example, bookdown uses a CITATION file to implement this kind of logic:
https://github.com/rstudio/bookdown/blob/0de8f113fd1f8f9d8d140b754c5120eca1e27a0a/inst/CITATION#L12

Currently, write_bib() overrides the auto= argument of utils::citation() with metadata from the DESCRIPTION file. This leads to different output from the two functions.

knitr::write_bib("bookdown")
#> Warning in utils::citation(..., lib.loc = lib.loc): no date field in DESCRIPTION
#> file of package 'bookdown'
#> @Manual{R-bookdown,
#>   title = {bookdown: Authoring Books and Technical Documents with R Markdown},
#>   author = {Yihui Xie},
#>   year = {2021},
#>   note = {https://github.com/rstudio/bookdown,
#> https://pkgs.rstudio.com/bookdown/},
#> }
#> 
#> @Book{bookdown2016,
#>   title = {bookdown: Authoring Books and Technical Documents with {R} Markdown},
#>   author = {Yihui Xie},
#>   publisher = {Chapman and Hall/CRC},
#>   address = {Boca Raton, Florida},
#>   year = {2016},
#>   note = {ISBN 978-1138700109},
#>   url = {https://bookdown.org/yihui/bookdown},
#> }

utils::toBibtex(utils::citation("bookdown"))
#> @Manual{,
#>   title = {bookdown: Authoring Books and Technical Documents with R Markdown},
#>   author = {Yihui Xie},
#>   note = {R package version 0.22.12},
#>   url = {https://github.com/rstudio/bookdown},
#> }
#> 
#> @Book{,
#>   title = {bookdown: Authoring Books and Technical Documents with {R} Markdown},
#>   author = {Yihui Xie},
#>   publisher = {Chapman and Hall/CRC},
#>   address = {Boca Raton, Florida},
#>   year = {2016},
#>   note = {ISBN 978-1138700109},
#>   url = {https://bookdown.org/yihui/bookdown},
#> }

This came up in rstudio/rmarkdown-cookbook#348 (comment)
Handling the CITATION file when it exists might also work for development packages that are not yet on CRAN.

Opening this to track the idea and see whether we have other use cases for it.

@yihui
Owner

yihui commented Jul 19, 2021

I've made a change to show only one URL if multiple are provided. This does not fully fulfill the original request, but perhaps the OP would be happy enough.

It may be a little tricky to fulfill the original request because we will have to tell which entry in citation(auto = TRUE) is the package citation and not duplicate it with the entry generated from citation(auto = FALSE).

@cderv
Collaborator Author

cderv commented Jul 20, 2021

Oh I see the logic now. I previously missed it.

We don't get the results I expected because part of what the CITATION file produces is filtered out.

knitr/R/citation.R

Lines 109 to 114 in cab26ef

if (system.file('CITATION', package = pkg) == '') return()
cites = citation(pkg, auto = FALSE)
cites = Filter(x = cites, function(cite) {
  # exclude entries identical to citation(pkg, auto = TRUE)
  !isTRUE(grepl('R package version', cite$note))
})

That is why the "keep only the first URL" processing that we put in the CITATION file is lost: the content from DESCRIPTION takes precedence in the function.
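To see the effect of that filter in isolation, here is a minimal sketch that uses two hand-made entries built with utils::bibentry(). The titles, author, and publisher are made up for illustration and do not come from any real CITATION file.

```r
# Two made-up entries standing in for what citation(pkg, auto = FALSE) might return
cites <- c(
  utils::bibentry("Manual", title = "pkg: An Example Package",
                  author = "Jane Doe", year = "2021",
                  note = "R package version 1.0.0"),
  utils::bibentry("Book", title = "The Example Book",
                  author = "Jane Doe", year = "2016",
                  publisher = "Example Press")
)

# Same predicate as in the knitr snippet: drop any entry whose note looks
# like the auto-generated "R package version ..." text
kept <- Filter(function(cite) {
  !isTRUE(grepl("R package version", cite$note))
}, cites)

n_kept <- length(kept)   # only the Book entry survives
kept_title <- kept$title
```

The isTRUE() wrapper matters for entries without a note field: grepl() on a missing field returns a zero-length result, which isTRUE() treats as FALSE, so such entries are kept.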

The change you've made provides that, but the URL is still set in note = and not in url =.
Maybe that is not so important, but I was curious why we got this difference in the first place.

Now I understand that the CRAN URL is used when the URL field contains multiple URLs:

knitr/R/citation.R

Lines 75 to 80 in cab26ef

# don't use the CRAN URL if the package has provided its own URL
if (identical(meta$Repository, 'CRAN') && !is.null(meta$URL)) {
  # however, the package may have provided multiple URLs, in which case we
  # still use the CRAN URL
  if (!grepl('[, ]', meta$URL)) meta$Repository = NULL
}

so we'll always have this difference for CRAN packages.
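The rule in that snippet can be tried on plain lists instead of real package metadata. This is only a sketch: keeps_cran_repository() is a hypothetical helper name, the example.org URLs are placeholders, and the lists stand in for what packageDescription() would return.

```r
# Hypothetical helper: TRUE if the rule leaves Repository = 'CRAN' in place,
# i.e. the case in which the CRAN URL would still be used
keeps_cran_repository <- function(meta) {
  if (identical(meta$Repository, 'CRAN') && !is.null(meta$URL)) {
    # a comma or space in URL is taken to mean "more than one URL"
    if (!grepl('[, ]', meta$URL)) meta$Repository <- NULL
  }
  identical(meta$Repository, 'CRAN')
}

single_url <- keeps_cran_repository(
  list(Repository = 'CRAN', URL = 'https://example.org/pkg'))
multi_url <- keeps_cran_repository(
  list(Repository = 'CRAN', URL = 'https://example.org/pkg, https://example.org/docs'))
other_repo <- keeps_cran_repository(
  list(Repository = 'RSPM', URL = 'https://example.org/pkg'))
```

With a single URL the Repository field is dropped, with several URLs it is kept, and a Repository value other than 'CRAN' never matches the rule at all.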

Maybe that is not worth changing.

@cderv
Collaborator Author

cderv commented Aug 17, 2021

Now I understand that the CRAN URL is used when the URL field contains multiple URLs

@yihui Why do we use the CRAN URL here instead of the first URL provided?

If we keep this rule, we may also need to support RSPM as a repo so that the URL is changed to the CRAN one.

Currently, we don't get the same information for the same package installed from CRAN or from RSPM:

> withr::with_temp_libpaths({
+   install.packages("trackdown", lib = .libPaths()[1], repos = "https://cran.rstudio.com")
+   knitr::write_bib("trackdown")
+   install.packages("trackdown", lib = .libPaths()[1], repos = "https://packagemanager.rstudio.com/all/__linux__/focal/latest")
+   knitr::write_bib("trackdown")
+ })

@Manual{R-trackdown,
  title = {trackdown: Collaborative Writing and Editing of R Markdown (or Sweave)
Documents in Google Drive},
  author = {Emily Kothe and Claudio {Zandonella Callegher} and Filippo Gambarota and Janosch Linkersdörfer and Mathew Ling},
  year = {2021},
  note = {R package version 1.0.0},
  url = {https://CRAN.R-project.org/package=trackdown},
}

@Manual{R-trackdown,
  title = {trackdown: Collaborative Writing and Editing of R Markdown (or Sweave)
Documents in Google Drive},
  author = {Emily Kothe and Claudio {Zandonella Callegher} and Filippo Gambarota and Janosch Linkersdörfer and Mathew Ling},
  year = {2021},
  note = {https://github.com/claudiozandonella/trackdown/},
}

This is because:

  • citation("trackdown", auto = FALSE) will not create a URL field. It will only do so for the CRAN repo when there are multiple URLs.
  • citation("trackdown", auto = TRUE) will, but the whole citation is ignored when merged with the previous one (it is filtered out by the check isTRUE(grepl("R package version", cite$note))).

I am not sure what the best solution is for making this function work for more packages, but I believe we could simply support RSPM alongside CRAN so that the URL field is set to the package's CRAN page.

diff --git a/R/citation.R b/R/citation.R
index e22f6b27..a002bb4c 100644
--- a/R/citation.R
+++ b/R/citation.R
@@ -73,7 +73,7 @@ write_bib = function(
     cite = citation(pkg, auto = if (pkg != 'base') {
       meta = packageDescription(pkg, lib.loc = lib.loc)
       # don't use the CRAN URL if the package has provided its own URL
-      if (identical(meta$Repository, 'CRAN') && !is.null(meta$URL)) {
+      if (meta$Repository %in% c('CRAN', 'RSPM') && !is.null(meta$URL)) {
         # however, the package may have provided multiple URLs, in which case we
         # still use the CRAN URL
         if (!grepl('[, ]', meta$URL)) meta$Repository = NULL

A simple fix that would also set a URL when RSPM is used.
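One detail that may be worth checking in the %in% version: when a DESCRIPTION has no Repository field, meta$Repository is NULL, and NULL %in% c('CRAN', 'RSPM') has length zero, so the condition passed to if may not be a single TRUE or FALSE. The sketch below wraps the test in isTRUE(); from_cran_like() is a hypothetical helper name, and plain lists stand in for packageDescription() output.

```r
# Hypothetical helper: NULL-safe variant of the condition in the diff
from_cran_like <- function(meta) {
  isTRUE(meta$Repository %in% c('CRAN', 'RSPM')) && !is.null(meta$URL)
}

# Zero-length result when Repository is missing
n_missing <- length(NULL %in% c('CRAN', 'RSPM'))

rspm_pkg <- from_cran_like(list(Repository = 'RSPM', URL = 'https://example.org/pkg'))
dev_pkg  <- from_cran_like(list(URL = 'https://example.org/pkg'))  # no Repository field
no_url   <- from_cran_like(list(Repository = 'CRAN'))              # no URL field
```

identical(meta$Repository, 'CRAN') in the original line does not have this problem, because identical() always returns a single TRUE or FALSE.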

@yihui
Owner

yihui commented Aug 20, 2021

+      if (meta$Repository %in% c('CRAN', 'RSPM') && !is.null(meta$URL)) {

Sure. We can certainly do that.

Why do we use the CRAN URL here instead of the first URL provided?

I don't remember exactly, but it's probably because it's not robust to split multiple URLs by commas---one URL can contain a comma. Perhaps splitting by ", " (comma followed by space) is safe enough. We can do that, too.
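As a rough illustration of the "comma followed by space" idea, here is a sketch with a hypothetical first_url() helper and placeholder URLs. It assumes URLs are separated by a comma plus whitespace and does not handle fields separated by whitespace alone.

```r
# Hypothetical helper: take the first URL from a DESCRIPTION-style URL field,
# splitting only on a comma that is followed by whitespace
first_url <- function(url) {
  strsplit(url, ",\\s+")[[1]][1]
}

# A comma inside the first URL is preserved, because no whitespace follows it
with_comma <- first_url("https://example.org/a,b, https://example.org/docs")
only_one   <- first_url("https://example.org/pkg")
```

Splitting on a bare comma instead would cut the first URL at "https://example.org/a", which is the robustness concern described above.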

It may also be because I wanted to support the "canonical" CRAN URL when there are multiple URLs, if I must pick one URL.

@dmurdoch
Contributor

I've just had a discussion about the URL that citation() provides. It started here and continued offline. There were two issues:

  • If the package provides two URLs, citation() doesn't know which of them to use, so it was using a note entry, and overwriting the package version information that was recorded there. This bug has been fixed in R-devel, but it still affects write_bib() in earlier versions.
  • If the package provides just one, but it was installed from CRAN, BioC, or R-forge, citation() uses canonical URLs from those sites rather than the URL provided by DESCRIPTION. I called that a bug, and after some back and forth I agreed that sometimes the DESCRIPTION URL is right, and sometimes the canonical one is right. Basically if you want the reader to find out more about the package, use the first one; if you want to know exactly what code was used in the paper, use the second. (Package version numbers aren't guaranteed to uniquely identify the package code in general, but on CRAN and BioC they do.)

BTW, a reason not to use the first URL when two are provided is that the definition of what kind of thing should go in the DESCRIPTION file URL field is very loose. It is not at all required that the first one is about the package, it might be the author's home page.

I've got a patch to write_bib() that works around the bug in citation() so that version info isn't lost in the two-URL case. I'll put in a PR for that soon. Would you also be interested in adding an argument to write_bib() to choose which kind of URL to use?
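To make the "which kind of URL" choice concrete, here is a sketch of what such a switch could look like. Everything in it is an assumption for illustration: pick_url() and its prefer argument are hypothetical names, not the interface of write_bib(), and only the CRAN canonical URL form seen earlier in this thread is handled.

```r
# Hypothetical helper: choose between the DESCRIPTION URL and the canonical
# repository URL; only CRAN is handled in this sketch
pick_url <- function(meta, prefer = c("description", "canonical")) {
  prefer <- match.arg(prefer)
  canonical <- if (identical(meta$Repository, "CRAN")) {
    paste0("https://CRAN.R-project.org/package=", meta$Package)
  }
  # fall back to the DESCRIPTION URL when no canonical URL is available
  if (prefer == "canonical" && !is.null(canonical)) canonical else meta$URL
}

meta <- list(Package = "trackdown", Repository = "CRAN",
             URL = "https://github.com/claudiozandonella/trackdown/")
url_about <- pick_url(meta, "description")  # where to learn about the package
url_exact <- pick_url(meta, "canonical")    # where the installed code came from
```

This mirrors the distinction drawn above: the DESCRIPTION URL helps a reader find out more about the package, while the canonical URL identifies where the installed code came from.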

@zeileis

zeileis commented Jun 14, 2023

Thanks, Duncan! Just a small addition: The citation() functionality will also be extended soon in R-devel to recognize packages installed via remotes::install_github() and adapt the "note" and "url" fields correspondingly (indicating the exact commit that was installed). Kurt will incorporate a patch that he has discussed briefly with Gabor.

@yihui
Owner

yihui commented Jun 15, 2023

Would you also be interested in adding an argument to write_bib() to choose which kind of URL to use?

@dmurdoch Yes. Thanks!

@dmurdoch
Contributor

@yihui: OK, I'll include both changes in the PR. I'll wait until I see the citation() changes for GitHub before submitting it, so they'll be covered too.

@dmurdoch
Contributor

This is done now in #2264. I think this fixes other issues here too, though I'm not completely sure about the RSPM intention.
