Vectorize add_suffixes() #6642

Merged

DavisVaughan
Member

Closes #6636

Alternate approach that truly vectorizes the calculation. I'm about 95% sure this is correct. It passes all tests, and we have a decent number of tests that check corner cases of `suffix`, so I'm fairly confident about it.
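
For readers following along, here is a minimal sketch of what a vectorized suffixing loop of this kind could look like. Only the `x <- c(y, x)` and `dup <- duplicated(x)` steps appear in the diff hunk quoted in this review; the function name `add_suffixes_sketch()`, the empty-suffix guard, the `while` loop, and the final step that drops `y` are assumptions made for illustration, not necessarily the merged code.

```r
# Hypothetical reconstruction for illustration; not the code merged in this PR.
add_suffixes_sketch <- function(x, y, suffix) {
  # Assumed guard: an empty suffix can never resolve a duplicate,
  # so the loop below would not terminate without it.
  if (identical(suffix, "")) {
    return(x)
  }

  # Put `y` first so duplicated() never flags a name that only clashes
  # as the "first" occurrence, i.e. the copy that lives in `y`.
  x <- c(y, x)
  dup <- duplicated(x)

  # Each pass suffixes every current duplicate at once. A further pass is
  # only needed when a newly suffixed name collides with an existing name.
  while (any(dup)) {
    x[dup] <- paste0(x[dup], suffix)
    dup <- duplicated(x)
  }

  # Drop the `y` part again and return only the (possibly suffixed) `x` names.
  x[seq_along(x) > length(y)]
}

add_suffixes_sketch(c("a", "b", "c"), c("a", "b"), ".x")
# "a.x" "b.x" "c"

# Corner case: suffixing "a" produces "a.x", which already exists,
# so a second pass is needed.
add_suffixes_sketch(c("a", "a.x"), "a", ".x")
# "a.x" "a.x.x"
```

The second call shows the rare situation in which adding a suffix creates a fresh duplicate and the loop runs more than once.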

@DavisVaughan DavisVaughan force-pushed the feature/faster-suffix-adding branch from 17d6b92 to a79a798 Compare January 17, 2023 21:54
@DavisVaughan DavisVaughan requested a review from hadley January 17, 2023 22:12
x <- c(y, x)

# Never marks the "first" duplicate (i.e. never anything in `y`)
dup <- duplicated(x)
Member

Since we're doing performance optimisations, I think this might be a little faster:

dup <- anyDuplicated(x)
# anyDuplicated() returns a single index, or 0 when nothing is duplicated
while (dup > 0) {
  x[dup] <- paste0(x[dup], suffix)
  dup <- anyDuplicated(x)
}

Member Author

Would it? If there are multiple columns that are duplicated between x and y then my approach tackles them all in one pass. We only ever do a 2nd pass at all in the super rare case where adding a suffix created another duplication.

anyDuplicated() returns exactly one value: the location of the first duplicated value, or 0 if there are no duplicates. So we'd have to do at least as many passes as there are duplicates in x and y.
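
To make the pass-count argument concrete, here is a small base R sketch. The helpers `passes_vectorized()` and `passes_one_at_a_time()` are hypothetical names introduced only for this comparison. They count how many loop iterations each strategy needs on the same input and are not code from the PR.

```r
# Hypothetical helpers for illustration; neither is the code merged in this PR.

# Suffix every duplicate flagged by duplicated() in one pass; count the passes.
passes_vectorized <- function(x, suffix) {
  n <- 0L
  dup <- duplicated(x)
  while (any(dup)) {
    x[dup] <- paste0(x[dup], suffix)
    dup <- duplicated(x)
    n <- n + 1L
  }
  list(names = x, passes = n)
}

# Suffix only the single location reported by anyDuplicated() per pass.
passes_one_at_a_time <- function(x, suffix) {
  n <- 0L
  dup <- anyDuplicated(x)
  while (dup > 0L) {
    x[dup] <- paste0(x[dup], suffix)
    dup <- anyDuplicated(x)
    n <- n + 1L
  }
  list(names = x, passes = n)
}

# Three names that each occur twice.
x <- c("a", "b", "c", "a", "b", "c")

duplicated(x)     # logical vector with one element per name
anyDuplicated(x)  # a single integer: index of the first duplicate, or 0

passes_vectorized(x, ".x")$passes     # 1
passes_one_at_a_time(x, ".x")$passes  # 3
```

On this input both strategies produce the same names. The `duplicated()` version needs one pass, while the `anyDuplicated()` version needs one pass per duplicated name.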

Member Author

@DavisVaughan DavisVaughan Jan 18, 2023

Related to r-lib/vctrs#1239, since it would be nice to be able to use vec_detect_duplicate(x, ignore = "first") here instead of duplicated().

Member

Oh yeah, good point.

@DavisVaughan DavisVaughan force-pushed the feature/faster-suffix-adding branch from 3b71f6c to 2816ada Compare January 18, 2023 14:48
@DavisVaughan DavisVaughan merged commit 2af5c86 into tidyverse:main Jan 18, 2023
@DavisVaughan DavisVaughan deleted the feature/faster-suffix-adding branch January 18, 2023 14:48