Skip to content

Improve readability of subqueries #638

@mgirlich

Description

@mgirlich

dbplyr generated SQL would be much easier to understand if one could add custom subquery names or even create CTE ("with" clause). For example

library(dplyr, warn.conflicts = FALSE)
library(dbplyr, warn.conflicts = FALSE)

tbl_lazy(nycflights13::flights) %>% 
  group_by(dest) %>% 
  summarise(
    count = n(),
    dist = mean(distance, na.rm = TRUE),
    delay = mean(arr_delay, na.rm = TRUE)
  ) %>% 
  filter(delay, count > 20, dest != "HNL")

produces (I added some indentation to improve readability already a bit)

SELECT *
FROM (
  SELECT `dest`, COUNT(*) AS `count`, AVG(`distance`) AS `dist`, AVG(`arr_delay`) AS `delay`
  FROM `df`
  GROUP BY `dest`
) `q01`
WHERE ((`delay`) AND (`count` > 20.0) AND (`dest` != 'HNL'))

This would be a bit easier with a custom name for the subquery

SELECT *
FROM (
  SELECT `dest`, COUNT(*) AS `count`, AVG(`distance`) AS `dist`, AVG(`arr_delay`) AS `delay`
  FROM `df`
  GROUP BY `dest`
) `destination_summary`
WHERE ((`delay`) AND (`count` > 20.0) AND (`dest` != 'HNL'))

and even better with a CTE

WITH `destination_summary` AS (
  SELECT `dest`, COUNT(*) AS `count`, AVG(`distance`) AS `dist`, AVG(`arr_delay`) AS `delay`
  FROM `df`
  GROUP BY `dest`
)
SELECT *
FROM `destination_summary`
WHERE ((`delay`) AND (`count` > 20.0) AND (`dest` != 'HNL'))

I haven't thought about syntax yet (especially CTE might be a bit tricky) but if you like the idea I'll try to come up with a concept.

Metadata

Metadata

Assignees

No one assigned

    Labels

    dplyr verbs 🤖Translation of dplyr verbs to SQLfeaturea feature request or enhancement

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions