Using dplyr::dense_rank without attaching dplyr causes the function replacement to fail #1231

Closed
@multimeric

Here is a strange edge case. When all of the following conditions hold, dplyr::dense_rank is not translated to the SQL DENSE_RANK() function:

  • dplyr::dense_rank is passed as the function argument to dplyr::across()
  • dplyr is not attached via library()
  • The function is referenced through the namespace, as dplyr::dense_rank

Here are some examples. First, the failure case. Note how vec_rank (which is a dplyr function, not an SQL function) appears in the generated SQL:

> dbplyr::lazy_frame(a=5:1, b=1:5) |> dplyr::mutate(dplyr::across(dplyr::everything(), dplyr::dense_rank)) |> dplyr::show_query()
<SQL>
SELECT
  vec_rank(`a`, 'dense' AS `ties`, 'na' AS `incomplete`) AS `a`,
  vec_rank(`b`, 'dense' AS `ties`, 'na' AS `incomplete`) AS `b`
FROM `df`
Warning messages:
1: Named arguments ignored for SQL vec_rank
2: Named arguments ignored for SQL vec_rank

However, if we simply drop the dplyr:: prefix from dense_rank, the translation works as expected:

> dbplyr::lazy_frame(a=5:1, b=1:5) |> dplyr::mutate(dplyr::across(dplyr::everything(), dense_rank)) |> dplyr::show_query()
<SQL>
SELECT
  CASE
WHEN (NOT((`a` IS NULL))) THEN DENSE_RANK() OVER (PARTITION BY (CASE WHEN ((`a` IS NULL)) THEN 1 ELSE 0 END) ORDER BY `a`)
END AS `a`,
  CASE
WHEN (NOT((`b` IS NULL))) THEN DENSE_RANK() OVER (PARTITION BY (CASE WHEN ((`b` IS NULL)) THEN 1 ELSE 0 END) ORDER BY `b`)
END AS `b`
FROM `df`

It also works if we keep the dplyr:: prefix but call dense_rank directly instead of going through across():

> dbplyr::lazy_frame(a=5:1, b=1:5) |> dplyr::mutate(a = dplyr::dense_rank(a)) |> dplyr::show_query()
<SQL>
SELECT
  CASE
WHEN (NOT((`a` IS NULL))) THEN DENSE_RANK() OVER (PARTITION BY (CASE WHEN ((`a` IS NULL)) THEN 1 ELSE 0 END) ORDER BY `a`)
END AS `a`,
  `b`
FROM `df`
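
A possible workaround, sketched from the two working cases above and not verified against any particular dbplyr version: wrap the namespaced call in a formula, so that across() receives an expression containing a dplyr::dense_rank() call instead of the function object itself. The assumption here is that the resulting call is translated the same way as the direct mutate(a = dplyr::dense_rank(a)) case; the variable name query is only for illustration.

```r
# Assumes dplyr and dbplyr are installed; neither is attached with library().
# Hypothetical workaround: pass a formula to across() rather than the bare
# dplyr::dense_rank function, so the call itself reaches the SQL translator.
query <- dbplyr::lazy_frame(a = 5:1, b = 1:5) |>
  dplyr::mutate(dplyr::across(dplyr::everything(), ~ dplyr::dense_rank(.x)))

# Inspect the generated SQL; the hope is DENSE_RANK() rather than vec_rank().
dplyr::show_query(query)
```

If the assumption holds, the output should contain DENSE_RANK() for both columns and no vec_rank call.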
