-
-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tr: calculate complement set early #6340
Conversation
GNU testsuite comparison:
|
The MacOS failures look unrelated to the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks really good! Just two nitpicks, and a worry about a potentially new bug: tr -dts asdf qwe
, see below.
I also don't have a Mac so my hands are tied. My comment was to make it easier to review c: |
An additional issue was found while reviewing uutils#6340, check [this thread][1]. A summary is: - `tr` ignores the `-t`/`--truncate-set1` flag when not translating - Not translating is defined as `-d` was passed, or one set was passed. [1]: uutils#6340 (comment)
Fixes uutils#6163 and adds a test to verify that a regression is not caused. Instead of inverting the conditions to check (e.g. delete characters **not** present in set1) invert set1 when passed the complement flag (`-c`, `-C`, `--complement`). This is done by calculating set1 then "inverting" it by subtracting from the "full" (universe) set (0..=u8::MAX). This fixes issue 6163 because it was caused by a combination of the `-c` and `-t` flag. `-c` is the abovementioned complement flag and `-t`/`--truncate-set1` truncates set1 to the length of set2. What happened in issue 6163 is that `set1={b'Y'}` and `set2={b'Z'}`, when truncated set1 stays the same and we proceed. The problem is GNU utils does not consider set1 to be `{b'Y'}`, but the complement of `{b'Y'}`, that is `U \ {b'Y'}={0, 1, ..., b'X', b'Z', ...}`, thus it is truncated to `{0}`. We can verify this by doing: `printf '\0' | tr -c -t Y Z`, which prints `Z` to stdout as expected. Additionally, by calculating the complement of set1 we no longer need to consider the complement flag when doing the translate operation, this allows us to delete a lot of code.
An additional issue was found while reviewing uutils#6340, check [this thread][1]. A summary is: - `tr` ignores the `-t`/`--truncate-set1` flag when not translating - Not translating is defined as `-d` was passed, or one set was passed. [1]: uutils#6340 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
GNU testsuite comparison:
|
Ah, I missed the "spelling mistakes". Personally I believe the spellchecker is configured too strictly. For this PR, please somehow make the spellchecker stop complaining about these:
You'll find in the same file examples how to use |
Those are new, you didn't miss them, I did T-T |
An additional issue was found while reviewing uutils#6340, check [this thread][1]. A summary is: - `tr` ignores the `-t`/`--truncate-set1` flag when not translating - Not translating is defined as `-d` was passed, or one set was passed. [1]: uutils#6340 (comment)
No worries, all is good ^^ |
I gave it a try to improving perf by rewriting the set from a I think delaying the translation from args to sets might help, but it doesn't seem necessary to me. Is there a benchmark framework used to test |
Yes, the TranslateOperation is order-sensitive. I was mostly thinking about the process of computing the complement itself. And I totally agree, let's keep it as-is for now, and worry about performance later. (And when you do, MEASURE first! I feel like this would make a difference, but feelings aren't reliable enough.) I'm not aware of any consistent benchmarking scheme. However, there are a few |
GNU testsuite comparison:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Jup, still LGTM
Fixes #6163 and adds a test to verify that a regression is not caused.
Instead of inverting the conditions to check (e.g. delete characters not present in set1) invert set1 when passed the complement flag (
-c
,-C
,--complement
). This is done by calculating set1 then "inverting" it by subtracting from the "full" (universe) set (0..=u8::MAX).This fixes issue 6163 because it was caused by a combination of the
-c
and-t
flag.-c
is the abovementioned complement flag and-t
/--truncate-set1
truncates set1 to the length of set2. What happened in issue 6163 is thatset1={b'Y'}
andset2={b'Z'}
, when truncated set1 stays the same and we proceed. The problem is GNU utils does not consider set1 to be{b'Y'}
, but the complement of{b'Y'}
, that isU \ {b'Y'}={0, 1, ..., b'X', b'Z', ...}
, thus it is truncated to{0}
.We can verify this by doing:
printf '\0' | tr -c -t Y Z
, which printsZ
to stdout as expected.Additionally, by calculating the complement of set1 we no longer need to consider the complement flag when doing the translate operation, this allows us to delete a lot of code.