Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify invalid encoding handling #3832

Merged
merged 3 commits into from
Aug 17, 2022

Conversation

tertsdiepraam
Copy link
Member

@tertsdiepraam tertsdiepraam commented Aug 17, 2022

The InvalidEncodingHandling for uucore::Args was introduced (for good reason) in #1852, but seemed to be unnecessarily complex. A few things stuck out to me:

  • No util should panic on invalid input, so the Panic variant can be removed.
  • We don't want to print anything when we encounter an invalid string (GNU doesn't do that either).
  • All utils are using accept_any (and not accept_lossy or accept_complete)
  • If only accept_any is used, then the intermediate ConversionResult is useless and can be removed.
  • Finally, the collect_str was overly complex and becomes much nicer when split into two smaller methods.

This cleans up a lot, but I don't think it's quite done yet. In particular:

  • I don't think any util should use collect_ignore, but I didn't want to go through every util that does this (which are 23 utils) to check :)
  • We should be able to get rid of the construction of a Vec and instead create an Iterator.
  • It bugs me that uucore::Args does not require ExactSizeIterator, requiring collecting it into a Vec to get the length, like in Use clap::ArgAction in true and false #3784.
  • Clap's invalid encoding handling is improving steadily and we should look into using that.

We never want utilities to panic on invalid input and it is not currently in use, so it can be removed safely.
Outside of tests, only `accept_any` was used, meaning that this unnecessarily complicated the code. The behaviour of `accept_any` is now the default (and only) option.
The previous encoding handling was unnecessarily complex. This commit removes the enum that specifies the handling and instead has two separate methods to collect the strings either with lossy conversion or by ignoring invalidly encoded strings.
@sylvestre sylvestre merged commit 87e3899 into uutils:main Aug 17, 2022
sylvestre added a commit to tertsdiepraam/coreutils that referenced this pull request Aug 18, 2022
sylvestre added a commit to tertsdiepraam/coreutils that referenced this pull request Aug 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants