Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

integrate jcheng5/knitsql as knitr engine #1241

Merged
merged 5 commits into from Jul 13, 2016

Conversation

javierluraschi
Copy link
Contributor

@javierluraschi javierluraschi commented Jul 12, 2016

This PR adds a SQL engine for knitr to allow chunks based on DBI that look like:

``{r}
library(DBI)
conn <- dbConnect(RSQLite::SQLite(), dbname = "sql.sqlite")
number <- 42
``

``{sql output, conn = conn}
SELECT ?number
``

``{r}
output
``
##   42
## 1 42

This change comes from integrating https://github.com/jcheng5/knitsql adding a sample .Rmd file and removing the yaml configuration feature.

eng_sql = function(options) {
# Return char vector of sql interpolation param names
varnames_from_sql <- function(conn, sql) {
varPos <- DBI::sqlParseVariables(conn, sql)
Copy link
Contributor

@jjallaire jjallaire Jul 12, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are going to want the DBI package to be a Suggests dependencies, which implies that we should preface our calls to DBI with a loadable check (see

knitr/R/utils.R

Line 795 in 07b79a9

loadable = function(pkg) requireNamespace(pkg, quietly = TRUE)
).

Copy link
Owner

@yihui yihui Jul 12, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Putting DBI in Suggests is enough. If it is not installed or loadable, R will just signal an error here. I use loadable() when the package is not absolutely needed, which is not true here.

Copy link
Contributor Author

@javierluraschi javierluraschi Jul 13, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to only use suggests

@yihui yihui added this to the v1.14 milestone Jul 12, 2016
@javierluraschi
Copy link
Contributor Author

@javierluraschi javierluraschi commented Jul 13, 2016

Here is the output with the pr feedback changes:

screen shot 2016-07-12 at 11 55 53 pm

@jjallaire
Copy link
Contributor

@jjallaire jjallaire commented Jul 13, 2016

Another thought aimed at preventing performance problems when exploring large datasets: if there is no ouptut.var provided then I think we should automatically append a LIMIT clause to SELECT statements. This is also the default behavior of e.g. dplyr which will fully realize the dataset only for collect() but only show 10 records for print(). We could add a max.display or max.print chunk option to control the number of records printed.


args <- if (length(names) > 0) {
setNames(
mget(names, inherits = inherits),
Copy link
Owner

@yihui yihui Jul 13, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we also need envir = env for mget().

@jjallaire
Copy link
Contributor

@jjallaire jjallaire commented Jul 13, 2016

A related suggestion: @yihui What would you think about importing the tibble (https://cran.r-project.org/web/packages/tibble/index.html) package for printing data frames within knitr? I guess you could also just check whether it is installed and use it in that case. Alternatively we could do the same thing in rmarkdown.

@yihui
Copy link
Owner

@yihui yihui commented Jul 13, 2016

I think the LIMIT suggestion makes sense, although I'm not sure it can be done reliably.

I definitely prefer tibble's printing behavior over R's default, so yes, if (loadable('tibble')) print(tibble::as_tibble(result)) else print(result) will be nice.

@jjallaire
Copy link
Contributor

@jjallaire jjallaire commented Jul 13, 2016

Another more robust way to implement the LIMIT suggestions is to use
dbSendQuery/dbFetch(n=10)/dbClearResult rather than dbGetQuery in cases
where no variable is being assigned.

Glad you like the tibble idea, Javier let's do it as Yihui suggests (and
add a Suggests dependency for tibble)

On Wed, Jul 13, 2016 at 5:41 AM, Yihui Xie notifications@github.com wrote:

I think the LIMIT suggestion makes sense, although I'm not sure it can be
done reliably.

I definitely prefer tibble's printing behavior over R's default, so yes, if
(loadable('tibble')) print(tibble::as_tibble(result)) else print(result)
will be nice.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#1241 (comment), or mute
the thread
https://github.com/notifications/unsubscribe/AAGXx2ItOmvB21f9mbmTcqiFqw2nIXg5ks5qVLLKgaJpZM4JK1i8
.

@yihui
Copy link
Owner

@yihui yihui commented Jul 13, 2016

Oh yeah, passing the limit as a function argument should be more robust than manipulating the SQL code directly.

@jjallaire
Copy link
Contributor

@jjallaire jjallaire commented Jul 13, 2016

@javierluraschi Could you merge master into this PR (I want to provide instructions for installing from the PR branch and I want the branch to include the recent Rcpp work we did)

@javierluraschi
Copy link
Contributor Author

@javierluraschi javierluraschi commented Jul 13, 2016

@jjallaire merged to master + added tibble dependency.

@@ -101,7 +101,8 @@ Suggests:
jpeg,
XML,
RCurl,
DBI
DBI,
tibble
Copy link
Contributor

@jjallaire jjallaire Jul 13, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You also need the bit that Yihui provided to use tibble to do the printing when it's available (i.e. loadable).

Copy link
Contributor Author

@javierluraschi javierluraschi Jul 13, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks updated.

@yihui yihui merged commit 49aa561 into yihui:master Jul 13, 2016
2 checks passed
@jjallaire
Copy link
Contributor

@jjallaire jjallaire commented Jul 13, 2016

A couple more changes I think we still need here:

  1. Add the envir = env to the mget call as suggested by @yihui
  2. Add the behavior of throttling the number of records retrieved when we are only displaying the data (I'd say 10 records is okay but I could live with 15 or 20). This should be done via dbSendQuery/dbFetch(n=10)/dbClearResult.
  3. Add a max.display option which lets the user override the default number of displayed records.

@yihui
Copy link
Owner

@yihui yihui commented Jul 13, 2016

Yeah these are simple enough, so I'm adding them now.

yihui added a commit that referenced this issue Jul 13, 2016
@yihui
Copy link
Owner

@yihui yihui commented Jul 13, 2016

All done now. For 2, the chunk option is limit, which defaults to 10. Users can set it to -1 to fetch all records.

yihui added a commit to yihui/knitr-examples that referenced this issue Jul 13, 2016
@jjallaire
Copy link
Contributor

@jjallaire jjallaire commented Jul 13, 2016

I have one more idea that I may want to explore for data display (will send a PR tomorrow or Monday but just wanted to put it out there for discussion). If we know we are in rmarkdown we could display the data using kable. Furthermore, the html_document format in rmarkdown could attach stickytableheaders to the kable output (as I do in flexdashboard: https://github.com/rstudio/flexdashboard/blob/master/inst/www/stickytableheaders/jquery.stickytableheaders.js). These two things together would make it pretty easy to display more like 1000 records and not have it overwhelm the document.

@yihui
Copy link
Owner

@yihui yihui commented Jul 13, 2016

Coincidently I also thought for a while about using kable()! 😄

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 10, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants