Improve readability of subqueries #638

Closed
mgirlich opened this issue Apr 13, 2021 · 4 comments · Fixed by #790
Labels
feature (a feature request or enhancement), verb trans 🤖 (Translation of dplyr verbs to SQL)

Comments

@mgirlich
Collaborator

The SQL generated by dbplyr would be much easier to understand if one could add custom subquery names or even create CTEs ("WITH" clauses). For example,

library(dplyr, warn.conflicts = FALSE)
library(dbplyr, warn.conflicts = FALSE)

tbl_lazy(nycflights13::flights) %>% 
  group_by(dest) %>% 
  summarise(
    count = n(),
    dist = mean(distance, na.rm = TRUE),
    delay = mean(arr_delay, na.rm = TRUE)
  ) %>% 
  filter(delay, count > 20, dest != "HNL")

produces (I have already added some indentation to improve readability a bit):

SELECT *
FROM (
  SELECT `dest`, COUNT(*) AS `count`, AVG(`distance`) AS `dist`, AVG(`arr_delay`) AS `delay`
  FROM `df`
  GROUP BY `dest`
) `q01`
WHERE ((`delay`) AND (`count` > 20.0) AND (`dest` != 'HNL'))

This would be a bit easier to read with a custom name for the subquery:

SELECT *
FROM (
  SELECT `dest`, COUNT(*) AS `count`, AVG(`distance`) AS `dist`, AVG(`arr_delay`) AS `delay`
  FROM `df`
  GROUP BY `dest`
) `destination_summary`
WHERE ((`delay`) AND (`count` > 20.0) AND (`dest` != 'HNL'))

and even better with a CTE:

WITH `destination_summary` AS (
  SELECT `dest`, COUNT(*) AS `count`, AVG(`distance`) AS `dist`, AVG(`arr_delay`) AS `delay`
  FROM `df`
  GROUP BY `dest`
)
SELECT *
FROM `destination_summary`
WHERE ((`delay`) AND (`count` > 20.0) AND (`dest` != 'HNL'))

I haven't thought about the syntax yet (CTEs in particular might be a bit tricky), but if you like the idea I'll try to come up with a concept.

@krlmlr
Member

krlmlr commented May 14, 2021

I think the translation of dplyr to SQL with CTEs would be fairly straightforward: each (un-optimized) pipe step would correspond to a derived CTE. Would love to see this!
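
To make the "one pipe step, one CTE" idea concrete, here is a minimal base-R sketch. It is not how dbplyr implements anything; `render_cte()` and the step names are hypothetical, and the SQL is a shortened version of the example above. Each step is a named piece of SQL that may read from an earlier step by name, and the last step becomes the final query.

```r
# Hypothetical helper, NOT part of dbplyr: render a chain of query steps as CTEs.
# `steps` is a named character vector. Each element holds the SQL for one step
# and may refer to earlier steps by name. The last element is the final query.
render_cte <- function(steps) {
  stopifnot(is.character(steps), length(steps) >= 1, !is.null(names(steps)))
  n <- length(steps)
  if (n == 1) {
    # A single step needs no WITH clause.
    return(unname(steps))
  }
  # Every step except the last becomes one named CTE.
  ctes <- sprintf("`%s` AS (\n%s\n)", names(steps)[-n], steps[-n])
  paste0("WITH ", paste(ctes, collapse = ",\n"), "\n", steps[[n]])
}

# Assumed example steps, modelled on the query in the first comment.
steps <- c(
  destination_summary = paste(
    "  SELECT `dest`, COUNT(*) AS `count`, AVG(`arr_delay`) AS `delay`",
    "  FROM `df`",
    "  GROUP BY `dest`",
    sep = "\n"
  ),
  final = paste(
    "SELECT *",
    "FROM `destination_summary`",
    "WHERE ((`count` > 20.0) AND (`dest` != 'HNL'))",
    sep = "\n"
  )
)

cat(render_cte(steps), "\n", sep = "")
```

The sketch only concatenates strings; naming the steps, quoting identifiers per backend, and deciding which steps can be merged are the parts that would need real design work.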

@mgirlich mgirlich mentioned this issue May 19, 2021
@hadley
Member

hadley commented Dec 8, 2021

I wish I found the CTE syntax more compelling. Other people seem to like it, but it doesn't seem like much of an improvement to me. That doesn't mean that you shouldn't keep working on it, just don't expect me to be excited 😄

@hadley hadley added feature a feature request or enhancement verb trans 🤖 Translation of dplyr verbs to SQL labels Dec 8, 2021
@mgirlich mgirlich mentioned this issue Mar 10, 2022
@bryanwhiting

I'm a nobody, but I saw this was recently launched and had to chime in and say thank you!! I want to say I've been trying to think through how to do this for my own use cases for the last 6 months. I use show_query() extensively when creating Rmarkdown pages (specifically, I used https://github.com/line/conflr to publish analyses to Confluence). My non-DS colleagues need a SQL query they can use to replicate my results. @hadley the subquery flow gets incredibly messy once you have a few queried tables that you join together. Subquery flow also doesn't seem to follow the dplyr style of top-down readability, whereas CTEs enable easy readability. Regardless, thanks for building this @mgirlich !

@mgirlich
Collaborator Author

@bryanwhiting Thanks a lot for your comment. I am very pleased to see that others benefit from my contributions 😄
