Implement check_keys #1619

hadley · 2016-01-12T15:51:21Z

Duplicate keys in both x and y are very rarely desirable, so joins should warn about them (though we will need an option to turn the warning off in case you actually want them).

krlmlr · 2016-01-25T20:51:17Z

I wonder how to achieve this for SQL sources.

hadley · 2016-03-01T15:17:49Z

Or if keys are missing (#1590)

hadley · 2016-05-22T18:52:04Z

For now, let's make this an explicit check_keys() function. This could be a simple wrapper around count(); the main challenge will be producing nice output, particularly if there are many problems.
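As a rough illustration of the count-based idea, here is a minimal sketch. The name `duplicate_keys()` and the example data are hypothetical and not part of dplyr; the sketch only reports duplicated key combinations and does not address output formatting.

```r
library(dplyr)

# Hypothetical helper (not a dplyr function): returns every combination of
# the key columns that occurs more than once, with its count in `n`.
duplicate_keys <- function(.data, ...) {
  .data %>%
    ungroup() %>%
    count(..., name = "n") %>%
    filter(n > 1)
}

# Made-up example data: the combination (2, "b") appears twice.
df <- tibble(id = c(1, 2, 2, 3), grp = c("a", "b", "b", "c"), value = 1:4)
res <- duplicate_keys(df, id, grp)
res
```

An empty result means the selected columns form a key; calling head() on the result would limit the report to the first few collisions.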

krlmlr · 2016-05-22T20:22:19Z

I have been using similar functionality in my scripts. I think it suffices to report the first few collisions.

Should check_keys() call compute() for SQL sources?

rtaph · 2016-12-01T19:37:04Z

@hadley I agree it is rarely desirable to require that neither x nor y has duplicates, but I would argue that it is common to want the keys in y to be free of duplicates.

For left joins in particular, being able to require that the key in y has no duplicates ensures that the conceptual meaning of a row is preserved through the join.
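To make the concern concrete, here is a small sketch with made-up tables (`orders` and `customers` are illustrative names only). A duplicated key in y makes a left join return more rows than x has, so a row of the result no longer corresponds to exactly one row of x.

```r
library(dplyr)

# Made-up data: each row of `orders` is one order.
orders <- tibble(order_id = 1:3, customer = c("a", "b", "c"))

# `customers` has a duplicated key: "b" appears twice.
customers <- tibble(
  customer = c("a", "b", "b"),
  city = c("Oslo", "Bern", "Rome")
)

joined <- left_join(orders, customers, by = "customer")

nrow(orders)  # 3
nrow(joined)  # 4: order 2 now appears twice, once per matching row in y
```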

hadley · 2017-02-23T00:39:56Z

A natural implementation would be something like this:

check_keys <- function(.data, ...) {
  keys <- select(.data, ...)
  keys <- ungroup(keys)
  # Keep row ids in a separate copy so they don't make every group unique
  ids <- mutate(keys, `__id` = row_number())

  # Check that keys are unique
  dups <- group_by_all(keys)
  dups <- summarise(dups, n = n())
  dups <- filter(dups, n > 1)
  dups_keys <- semi_join(ids, dups, by = names(keys))

  # Check that keys are not missing
  miss <- filter_all(keys, any_vars(is.na(.)))
  miss_keys <- semi_join(ids, miss, by = names(keys))

  list(duplicated = dups_keys, missing = miss_keys)
}

But this requires both group_by_all() and filter_all()

hadley · 2017-02-23T00:40:36Z

I think we'll probably need two functions: one that throws an error if anything is wrong, the other should return a tibble with one row per problem.
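One possible shape for such a pair is sketched below. The names `key_problems()` and `assert_keys()`, the `.row` and `problem` columns, and the error message are all hypothetical, and the sketch assumes dplyr 1.0.4 or later for across() and if_any().

```r
library(dplyr)

# Hypothetical: returns one row per problem, tagged with the problem type
# and the position of the offending row in the input (`.row`).
key_problems <- function(.data, ...) {
  keys <- .data %>%
    ungroup() %>%
    select(...) %>%
    mutate(.row = row_number())
  key_cols <- setdiff(names(keys), ".row")

  # Rows whose key combination occurs more than once
  dups <- keys %>%
    group_by(across(all_of(key_cols))) %>%
    filter(n() > 1) %>%
    ungroup() %>%
    mutate(problem = "duplicated")

  # Rows with a missing value in any key column
  miss <- keys %>%
    filter(if_any(all_of(key_cols), is.na)) %>%
    mutate(problem = "missing")

  bind_rows(dups, miss)
}

# Hypothetical: throws an error if anything is wrong, otherwise returns
# the input invisibly so it can sit in the middle of a pipeline.
assert_keys <- function(.data, ...) {
  problems <- key_problems(.data, ...)
  if (nrow(problems) > 0) {
    stop(
      nrow(problems), " key problem(s) found; first rows: ",
      paste(head(problems$.row, 5), collapse = ", "),
      call. = FALSE
    )
  }
  invisible(.data)
}

# Made-up data: rows 2 and 3 share a key, row 4 has a missing key.
df <- tibble(id = c(1, 2, 2, NA), x = 1:4)
problems <- key_problems(df, id)
problems
```

With this data, calling assert_keys(df, id) would stop with an error, while a table with unique, non-missing ids would pass through unchanged.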

krlmlr · 2017-02-23T08:37:33Z

Should this be part of tidyr, because we're only using exported verbs here? How does this interact with set_key() (#1792)?

romainfrancois · 2018-05-30T08:18:56Z

We have group_by_all and filter_all now. I don't really understand what this is about though.

krlmlr · 2018-05-30T12:40:25Z

I think this is about a utility function that verifies if a selection of "key columns" defines a key on the tibble, i.e. that for each unique combination of values in the "key columns" there is at most one row.

romainfrancois · 2018-05-30T12:48:56Z

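Something like this, then, conceptually, using some of the new tools from #3574:

check_keys <- function(.data, ...) {
  # Number of rows in each group defined by the key columns
  sizes <- group_by(.data, ...) %>%
    group_rows() %>%
    lengths()
  # The columns form a key if every group has exactly one row
  all(sizes == 1L)
}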

mtcars %>% 
  check_keys(mpg, cyl, qsec)

Internalizing this by reusing some of the code from group_by would be quicker.

krlmlr · 2018-05-30T13:01:43Z

Maybe we can just look at group_data() ?
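A minimal sketch of what looking at group_data() might involve, assuming the `.rows` list-column that group_data() returns for a grouped data frame. The helper name `is_key()` and the example data are hypothetical.

```r
library(dplyr)

# Hypothetical helper: TRUE if every combination of the key columns
# identifies at most one row.
is_key <- function(.data, ...) {
  groups <- .data %>%
    ungroup() %>%
    group_by(...) %>%
    group_data()
  # `.rows` holds the row indices belonging to each group
  all(lengths(groups$.rows) == 1L)
}

# Made-up data: (a, b) is a key, but a alone is not.
df <- tibble(a = c(1, 1, 2), b = c("x", "y", "x"))
is_key(df, a, b)  # TRUE
is_key(df, a)     # FALSE
```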

romainfrancois · 2018-05-30T13:11:16Z

What do you mean? Instead of group_rows()?

hadley · 2018-05-30T13:28:57Z

Just leave this one for me - it's more of a user interface issue.

hadley · 2019-12-11T20:02:28Z

I now think this is out of scope for dplyr, and better belongs in its own package, e.g. https://github.com/krlmlr/dm.

hadley added the feature label Mar 1, 2016

hadley added this to the 0.5 milestone Mar 1, 2016

hadley modified the milestones: future, 0.5 Mar 1, 2016

krlmlr mentioned this issue Mar 1, 2016

New arguments "unique_indexes" and "indexes" to compute() #1550

Merged

hadley modified the milestones: 0.5, future May 22, 2016

hadley modified the milestones: future, 0.5 May 26, 2016

This was referenced Dec 1, 2016

feature request : add merge indicator after a merge in dplyr #2183

Closed

Preventing Addition of Rows in left_join #2278

Closed

krlmlr mentioned this issue Feb 21, 2017

FR: UPDATE- and UPSERT-like functionality #2075

Closed

hadley added the verbs 🏃‍♀️ label Feb 21, 2017

hadley changed the title ~~In joins, warn if both x and y have duplicate keys~~ Implement check_join function Feb 22, 2017

hadley changed the title ~~Implement check_join function~~ Implement check_keys Feb 22, 2017

romainfrancois assigned hadley May 30, 2018

hadley closed this as completed Dec 11, 2019
