Skip to content

fix: silently ignore unrecognized trailing alphabetic tokens after pure numbers#286

Open
0xSoftBoi wants to merge 3 commits intouutils:mainfrom
0xSoftBoi:fix/ignore-trailing-noise-tokens
Open

fix: silently ignore unrecognized trailing alphabetic tokens after pure numbers#286
0xSoftBoi wants to merge 3 commits intouutils:mainfrom
0xSoftBoi:fix/ignore-trailing-noise-tokens

Conversation

@0xSoftBoi
Copy link
Copy Markdown

@0xSoftBoi 0xSoftBoi commented Apr 6, 2026

Summary

  • GNU date accepts inputs like 8j and 8 j, treating the leading number as an hour and silently discarding the unrecognized trailing word-token. parse_datetime currently rejects these with InvalidInput.
  • This PR matches GNU behavior for these cases while keeping all existing error paths intact.

Implementation

  • Added Item::Noise variant for unrecognized alphabetic tokens.
  • Added noise_token() as the last alternative in parse_item(), so it only fires after every other parser has already failed.
  • In DateTimeBuilder::try_from, a Noise item is accepted only when it directly follows a Pure number item (prev_was_pure guard). This ensures 8j and 8 j succeed while bogus +1 day, 2025-01-01 abcdef, and NotADate still return errors.
  • Added noise_after_pure_number regression test.

Test Plan

  • All 134+ existing tests pass
  • New regression test covers both fixed inputs and error cases

Fixes #279

0xSoftBoi and others added 2 commits April 5, 2026 21:08
GNU date accepts bare "UT" and "ut" as a synonym for UTC (+0).
parse_datetime rejected them because the abbreviation was absent from
the named-timezone lookup table in timezone_name_to_offset().

Add "ut" => Ok("+0") immediately after the existing "utc" entry and
add a regression test that verifies all four case variants are
accepted and resolve to a UTC-offset-0 instant.

Fixes uutils#280

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…re numbers

GNU date accepts inputs like '8j' and '8 j', treating the number as an
hour and silently discarding the unrecognized trailing word-token. This
commit matches that behaviour.

Implementation:
- Add Item::Noise variant for unrecognized alphabetic tokens
- Add noise_token() as the last alternative in parse_item(), so it only
  fires after every other parser has failed
- In DateTimeBuilder::try_from, accept Noise only when it directly follows
  a Pure number item (prev_was_pure guard); reject it anywhere else so
  that leading garbage (e.g. 'bogus +1 day') and post-date garbage
  (e.g. '2025-01-01 abcdef') still produce errors
- Add noise_after_pure_number regression test covering both '8j' and '8 j'

Fixes uutils#279
@0xSoftBoi 0xSoftBoi changed the title fix(offset): accept ut / UT as a UTC timezone abbreviation fix: silently ignore unrecognized trailing alphabetic tokens after pure numbers Apr 6, 2026
@codspeed-hq
Copy link
Copy Markdown
Contributor

codspeed-hq bot commented Apr 6, 2026

Merging this PR will degrade performance by 13.27%

❌ 2 regressed benchmarks
✅ 19 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

Benchmark BASE HEAD Efficiency
parse_invalid_input 27.2 µs 31.4 µs -13.27%
parse_weekday 30 µs 31.5 µs -4.83%

Comparing 0xSoftBoi:fix/ignore-trailing-noise-tokens (118bb5a) with main (7440dd9)

Open in CodSpeed

@codecov
Copy link
Copy Markdown

codecov bot commented Apr 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.31%. Comparing base (7440dd9) to head (118bb5a).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #286      +/-   ##
==========================================
+ Coverage   99.30%   99.31%   +0.01%     
==========================================
  Files          20       21       +1     
  Lines        3894     3953      +59     
  Branches      122      123       +1     
==========================================
+ Hits         3867     3926      +59     
  Misses         26       26              
  Partials        1        1              
Flag Coverage Δ
macos_latest 99.31% <100.00%> (+0.01%) ⬆️
ubuntu_latest 99.31% <100.00%> (+0.01%) ⬆️
windows_latest 14.09% <20.89%> (+0.04%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@sylvestre
Copy link
Copy Markdown
Contributor

please run cargo fmt

@tomagsx
Copy link
Copy Markdown

tomagsx commented Apr 6, 2026

Review bump on this PR. I see the current blockers are the formatting failure and the CodSpeed regression flag; I can follow up on the formatting immediately, but I’d still appreciate a maintainer read on whether the GNU-compat behavior for trailing noise-after-pure-number is acceptable before this drifts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

date input: unrecognized trailing tokens rejected (e.g. 8j, 8 j)

3 participants