Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: SQL engine with interlaced output #2093

Open
3 tasks done
BrianDiggs opened this issue Jan 7, 2022 · 2 comments
Open
3 tasks done

Feature request: SQL engine with interlaced output #2093

BrianDiggs opened this issue Jan 7, 2022 · 2 comments
Labels
feature Feature requests help needed We need your help with the issue/PR

Comments

@BrianDiggs
Copy link
Contributor

Short description

This is a request for a new (or modified) language engine that would process multiple SQL statements within a single block and insert the results after each statement.

Current capabilities

The existing SQL engine submits the entire block in a single execution (via either DBI::dbExecute or DBI::dbSendQuery) and pulls all the results at once in the latter case. Depending on the back end DBI implementation, the SQL database will either execute the first statement and discard the rest, or the SQL database will execute all the statements submitted, but the returned results are squashed together into a structure based on the results of the first statement, possibly omitting some of the returned data in the process. These results are then output in a single table.

Desired behavior

My request is for a new language engine (or a modification to the existing SQL language engine if the changes are not too breaking) which treats SQL much like R is currently treated:

  • In the case of multiple statements, the input is parsed into separate individual statements
  • Each statement is run sequentially
  • The output (in this case the returned data rather than simply captured output) is collected for each statement
  • When there is output, the code block is ended and an output block is rendered with a new code block starting with the next statement
  • Perhaps, for efficiency, full line comments are removed before submitting submitting to the database

In the case of an SQL block with a single statement, this would behave as the current engine does.

Motivation

The desired behavior could be achieved by splitting the SQL code block into separate statements. However, the use case that I am thinking of gets the entire SQL script from an external file. I have an existing .sql file which is an exploration of where data is in a database and how it is related. I want to keep that as a complete, self contained SQL file which can be executed in a database front end. However, I would also like to be able to create a document that shows what the result of running the commands are, but keep that separate from the SQL file itself. In particular, this can be useful for tracking changes separately between what is requested of the database (the SQL file) and what the database returns (the output woven with the input). The SQL file being separate allows that to be edited using tools best suited for it.

Example

RMarkdown source file

Here is a minimal .Rmd file that I can use to show the behavior I would like. Using the RSQLite DBI back end to remove a dependency on an external database.

---
output: 
  html_document: 
    keep_md: yes
---
```{r}
library("odbc")
library("RSQLite")
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
```

```{sql, connection = con}
SELECT 1 AS A;
SELECT 2 AS A, 'B' AS B
```

```{r}
DBI::dbDisconnect(conn = con)
```

Excerpt of intermediate markdown file

This is an excerpt of the intermediate .md file for just the portion related to the SQL engine block. It is worth noting that the RSQLite back end does not seem to support sending multiple statements at once anyway. There is a warning on the console:

Warning message:
Ignoring remaining part of query: SELECT 2 AS A, 'B' AS B 

Excerpt of .md file:

```sql
SELECT 1 AS A;
SELECT 2 AS A, 'B' AS B
```


<div class="knitsql-table">


Table: 1 records

|  A|
|--:|
|  1|

</div>

Excerpt of intermediate markdown file, using a different driver supporting multiple statements

If I use a DBI driver which does support multiple statements being submitted at once (such as the SQL Server ODBC driver), the same section looks like (reproducible code not shown because it requires a connection to an SQL Server which I can not reproduce):

```sql
SELECT 1 AS A;
SELECT 2 AS A, 'B' AS B
```


<div class="knitsql-table">


Table: 2 records

|A  |
|:--|
|1  |
|2  |

</div>

Desired excerpt of intermediate markdown file

The output that I would like to see for that same section (which in this case can be generated by splitting the two statements into two separate SQL blocks, but which does not work as a general solution):

```sql
SELECT 1 AS A;
```


<div class="knitsql-table">


Table: 1 records

|  A|
|--:|
|  1|

</div>

```sql
SELECT 2 AS A, 'B' AS B
```


<div class="knitsql-table">


Table: 1 records

|  A|B  |
|--:|:--|
|  2|B  |

</div>

Implementation ideas

There is a function sqlparseR::sql_split() which looks like it would do the job of splitting a set of strings into separate SQL statements. Two other SQL parsers, RSQLParser and queryparser, do not seem to have support for splitting up multiple statements. For R chunks, that splitting takes place in evaluate::evaluate(), but for SQL it looks like it would need to go straight into eng_sql() since that does not use the evaluate hook.


By filing an issue to this repo, I promise that

  • I have fully read the issue guide at https://yihui.org/issue/.
  • I have provided the necessary information about my issue.
    • If I'm asking a question, I have already asked it on Stack Overflow or RStudio Community, waited for at least 24 hours, and included a link to my question there.
    • If I'm filing a bug report, I have included a minimal, self-contained, and reproducible example, and have also included xfun::session_info('knitr'). I have upgraded all my packages to their latest versions (e.g., R, RStudio, and R packages), and also tried the development version: remotes::install_github('yihui/knitr').
    • If I have posted the same issue elsewhere, I have also mentioned it in this issue.
  • I have learned the Github Markdown syntax, and formatted my issue correctly.

I understand that my issue may be closed if I don't fulfill my promises.

@yihui yihui added feature Feature requests help needed We need your help with the issue/PR labels Mar 24, 2022
@yihui
Copy link
Owner

yihui commented Mar 24, 2022

I think this feature request definitely makes sense, and I'm supportive of it. Unfortunately I don't really have much expertise in SQL or databases, so someone has to help.

I feel the first step appears to be the most critical, i.e., splitting the SQL statements. I'm a little concerned about sqlparseR since it requires Python and a Python package. I wonder if there could be a pure R solution like what I did for R code: https://github.com/yihui/xfun/blob/5557263bfd064efcfe7f241f094f8e5b73e427c8/R/string.R#L145-L168 Bascially all we need is a way to verify if a SQL statement is valid without executing it.

@cderv
Copy link
Collaborator

cderv commented Mar 24, 2022

Bascially all we need is a way to verify if a SQL statement is valid without executing it

We could check into what dbplyr is doing maybe... it will translate dplyr pipeline to SQL statement so doing some R / SQL stuff for sure.

There is also a set of packages that could be helpful or source of inspiration.

Maybe queryparser can help validate a query. 🤔

This is a project I can definitely look at - but now right now. If someone want to give it a shot, anyone welcomed!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Feature requests help needed We need your help with the issue/PR
Projects
None yet
Development

No branches or pull requests

3 participants