Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add <dbplyr_table_ident> #1284

Merged
merged 35 commits into from
Jun 13, 2023
Merged

Add <dbplyr_table_ident> #1284

merged 35 commits into from
Jun 13, 2023

Conversation

mgirlich
Copy link
Collaborator

@mgirlich mgirlich commented May 12, 2023

Currently, there are multiple classes that can be used to create a sql tbl:

  • <ident>
  • <dbplyr_schema>
  • <dbplyr_catalog>
  • <Id> (from DBI)
  • <sql>
  • <SQL>

This is quite messy, needs lots of testing and lots of extra code to support every case. Further, these classes have some annoying properties (e.g. see #1205). This PR improves the situation by adding a new class <dbplyr_table_ident> which replaces these classes (at least internally for now).

Its using vctrs::rcrd() under the hood. In most cases we don't really need (or want) anything but a single table but it might still be nice to have the vector properties.

There are a couple of questions though:

  1. in_schema() and in_catalog() support <ident_q>, e.g. in_schema(ident_q("a b"), "table") gives <SCHEMA> a b.`table`. I don't think it is used often and I don't really see the need for it. Supporting this would make the implementation more complicated.
  2. Should we deprecate <dbplyr_schema> and <dbplyr_catalog>? We use it internally anyway, so package authors might need to adapt to it anyway. Also, they would likely face the same issues we had with the schema and catalog classes.
  3. I would love to get rid of the sql field in the table ident class. Unfortunately, it is needed to support <ident_q>. Could we deprecate <ident_q> for table identifiers? I don't really see a good use case for them. I think they rather make problems than provide actual benefit.

Closes #1204. Closes #1205. Closes #1261. Closes #1139. Closes #1264.

@mgirlich
Copy link
Collaborator Author

mgirlich commented May 12, 2023

@hadley What's your opinion here?
@krlmlr This might be interesting for you and dm. Feel free to provide feedback.

@hadley
Copy link
Member

hadley commented May 12, 2023

Basic idea seems reasonable to me. Answer to questions below:

  1. As long as there's someway to opt-out of quoting, no need to keep ident_q()
  2. I'm fine with it; I've never found the idea of giving special names to various levels of namespacing terribly compelling.
  3. No strong feelings and no recollection of why it was set up this was in the first place, so as long as (1) is handled, then I'm happy.

@mgirlich
Copy link
Collaborator Author

As long as there's someway to opt-out of quoting, no need to keep ident_q()

At which detail level to you want to be able to opt out of quoting?

  1. Either quote everything (i.e. catalog, schema, and table) or nothing, e.g.
new_table_ident("catalog", "schema", "table", quote = TRUE)
#> "catalog"."schema"."table"
new_table_ident("catalog", "schema", "table", quote = FALSE)
#> catalog.schema.table
  1. Can toggle quoting for catalog, schema, and table separately, e.g.
new_table_ident("catalog", "schema", "table")
#> "catalog"."schema"."table"
new_table_ident("catalog", "schema", "table", quote_schema = FALSE)
#> "catalog".schema."table"
  1. Should it be possible to provide a qualified table name, e.g.
new_table_ident(sql("catalog.schema.table"))

The 3rd case doesn't really need to be implemented separately as it can be covered in both cases, but I think it might affect the user interface.

By the way, what's your use case to provide already quoted identifiers?

@hadley
Copy link
Member

hadley commented May 15, 2023

All I want is someway to provide a string that dbplyr will leave as is — this is just a useful escape hatch if we hit some weird database or weird table name where our normal approach doesn't work. So in terms of the user interface, option 3 is what I was thinking of (although I'm not tied to sql()).

@mgirlich mgirlich marked this pull request as ready for review May 25, 2023 11:01
@mgirlich
Copy link
Collaborator Author

@hadley You can skim over this PR to get a rough idea on how as_table_ident() affects the code.
My main questions are:

  • do we want to export the <dbplyr_table_ident> class? (together with new/as_table_ident())
  • It would be nice to get rid of the classes <dbplyr_catalog>, <dbplyr_schema()> and <ident_q> in the future. But its not easy to get rid of a class. Maybe we could do that by deprecating escape.dbplyr_schema/dbplyr_catalog/ident_q(). What do you think?

R/table-ident.R Outdated Show resolved Hide resolved
check_character(types, allow_null = TRUE)
check_named(types)
check_bool(temporary)

tryCatch(
dbWriteTable(
con,
name = dbi_quote(table, con),
name = table_ident_to_id(table),
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before we passed a quoted string of class <SQL>. It has benefits to instead pass an <Id> object (see #1261). As I had to adapt this anyway, I thought I might just change it directly.

@@ -187,11 +187,6 @@ db_write_table.DBIConnection <- function(con,

# Utility functions ------------------------------------------------------------

dbi_quote <- function(x, con) UseMethod("dbi_quote")
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dbi_quote() basically wasn't used anywhere.

R/table-ident.R Outdated
check_character(catalog, call = error_call)
check_logical(quoted, call = error_call)

n <- vctrs::vec_size_common(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not really necessary to support vectors but it might be handy at some point.

),
.frequency = "regularly",
.frequency_id = paste0("dbplyr_", f)
)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should hopefully push people to avoid using ident_q() and sql() unless necessary.

R/tbl-sql.R Show resolved Hide resolved
@mgirlich
Copy link
Collaborator Author

mgirlich commented Jun 7, 2023

@hadley do you think you find time in the next 2 weeks to skim over this PR? It's okay if you're too busy just wanted to know whether I should wait for this PR before tackling some other issues or not 😄

@hadley
Copy link
Member

hadley commented Jun 7, 2023

Sorry, I lost the notification. I'll add it to my to do list and try and take a look this week.

Copy link
Member

@krlmlr krlmlr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea! I'll watch this space to make the necessary changes in dm. Is there a fallback that works in CRAN and dev dplyr?

R/table-ident.R Outdated Show resolved Hide resolved
@@ -21,7 +21,7 @@
df %>% compute(name = in_schema("main", "db1"), temporary = FALSE)
Condition
Error in `db_save_query.DBIConnection()`:
! Can't save query to table `main`.`db1`.
! Can't save query to table main.db1.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this intended?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is on purpose:

  • this might help to disencourage people to use format() when they should actually use escape()
  • the backticks aren't the correct way to escape names in many databases. So, the error message might confuse people that dbplyr didn't escape the table identifier correctly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do generally prefer to use some indicator of the start and end of the string because it's not uncommon to accidentally add a leading or trailing space, which is then really hard to discover.

R/table-ident.R Outdated Show resolved Hide resolved
R/table-ident.R Outdated Show resolved Hide resolved
R/table-ident.R Outdated Show resolved Hide resolved
R/tbl-sql.R Show resolved Hide resolved
R/verb-joins.R Outdated Show resolved Hide resolved
@@ -21,7 +21,7 @@
df %>% compute(name = in_schema("main", "db1"), temporary = FALSE)
Condition
Error in `db_save_query.DBIConnection()`:
! Can't save query to table `main`.`db1`.
! Can't save query to table main.db1.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do generally prefer to use some indicator of the start and end of the string because it's not uncommon to accidentally add a leading or trailing space, which is then really hard to discover.

if (!isFALSE(temporary)) {
cli_abort(c(
"RPostgreSQL backend does not support creation of temporary tables",
i = "Either set {.code temporary = FALSE} or switch to {.pkg RPostgres}"
))
}

# RPostgreSQL doesn't handle `Id()` or `SQL()` correctly, so we can only pass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to warn about this? Or do we think so few people still use RPostgreSQL that it's not worth it?

R/build-sql.R Outdated Show resolved Hide resolved
R/db-sql.R Outdated Show resolved Hide resolved
R/table-ident.R Outdated Show resolved Hide resolved
@mgirlich mgirlich merged commit 17980ed into main Jun 13, 2023
@mgirlich mgirlich deleted the new-table-ident branch June 13, 2023 07:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants