Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#3505 Add date modification parsing #5476

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

Conversation

dcechano
Copy link

@dcechano dcechano commented Oct 29, 2023

This commit addresses #3505. I added a parse_date_modification.rs module to uucore::parser and used it in touch.rs. I added tests to touch.rs to prove the parser works and is not buggy. In theory these changes can also help with similar issues in date.rs but I wanted to keep the PR to one issue at a time.

Thank you for considering my pull request and I welcome feedback if you guys have any.

@github-actions
Copy link

GNU testsuite comparison:

GNU test error: tests/touch/relative. tests/touch/relative is passing on 'main'. Maybe you have to rebase?

Comment on lines 348 to 351
return match date_from_modifier(modifier, parsed) {
Ok(new_date) => Ok(datetime_to_filetime(&new_date.and_utc())),
Err(e) => Err(e),
};
Copy link
Sponsor Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it could be simplified with:

Suggested change
return match date_from_modifier(modifier, parsed) {
Ok(new_date) => Ok(datetime_to_filetime(&new_date.and_utc())),
Err(e) => Err(e),
};
return date_from_modifier(modifier, parsed).map(|new_date| datetime_to_filetime(&new_date.and_utc()));

Comment on lines 363 to 370
if let Ok((parsed, modifier)) = NaiveDateTime::parse_and_remainder(s, fmt) {
if modifier.is_empty() {
return Ok(datetime_to_filetime(&parsed.and_utc()));
}
return match date_from_modifier(modifier, parsed) {
Ok(new_date) => Ok(datetime_to_filetime(&new_date.and_utc())),
Err(e) => Err(e),
};
Copy link
Sponsor Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can probably be simplified too by something like

Suggested change
if let Ok((parsed, modifier)) = NaiveDateTime::parse_and_remainder(s, fmt) {
if modifier.is_empty() {
return Ok(datetime_to_filetime(&parsed.and_utc()));
}
return match date_from_modifier(modifier, parsed) {
Ok(new_date) => Ok(datetime_to_filetime(&new_date.and_utc())),
Err(e) => Err(e),
};
if let Ok((parsed, modifier)) = NaiveDateTime::parse_and_remainder(s, fmt) {
return if modifier.is_empty() {
Ok(datetime_to_filetime(&parsed.and_utc()))
} else {
date_from_modifier(modifier, parsed).map(|new_date| datetime_to_filetime(&new_date.and_utc()))
};
}

if modifier.is_empty() {
return Ok(datetime_to_filetime(&parsed));
}
return match date_from_modifier(modifier, parsed) {
Copy link
Sponsor Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same, Err(e) => Err(e),
can often be simplified

@sylvestre
Copy link
Sponsor Contributor

Lot of work probably went into this, bravo
please add integration tests here too:
https://github.com/uutils/coreutils/blob/main/tests/by-util/test_touch.rs

));
}
date = if time >= 0 {
date.add(Months::new((12 * time) as u32))
Copy link
Sponsor Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

codecov suggests that this line isn't covered, could you please fix it? thanks

if i >= bytes.len() {
break;
}
if let n @ 48..=57 = bytes[i] {
Copy link
Sponsor Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please replace 48 & 57 by variable names

}

fn parse_num(&mut self) -> Result<i64, ParseError> {
while self.cursor < self.haystack.len() && self.haystack[self.cursor].is_ascii_whitespace()
Copy link
Sponsor Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please create a function:

fn skip_whitespace(&mut self) {
    while self.cursor < self.haystack.len() && self.haystack[self.cursor].is_ascii_whitespace() {
        self.cursor += 1;
    }
}

}

fn parse_unit(&mut self) -> Result<ChronoUnit, ParseError> {
while self.cursor < self.haystack.len() && self.haystack[self.cursor].is_ascii_whitespace()
Copy link
Sponsor Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and replace this one too

}
let bytes = &self.haystack[self.cursor..].to_ascii_lowercase();
match bytes[0] {
b'd' => {
Copy link
Sponsor Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i can probably by simplified by something like:

    let units = [
        (b'd', "day", ChronoUnit::Day),
        (b'w', "week", ChronoUnit::Week),
        (b'm', "month", ChronoUnit::Month),
        (b'y', "year", ChronoUnit::Year),
        (b'h', "hour", ChronoUnit::Hour),
        (b's', "second", ChronoUnit::Second),

    ];
    for &(byte, unit_str, chrono_unit) in &units {
        if bytes.starts_with(unit_str) {
            self.cursor += unit_str.len();
            return Ok(chrono_unit);
        }
    }

Copy link
Author

@dcechano dcechano Oct 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for reviewing this PR. I can implement the changes later this evening.

Copy link
Member

@tertsdiepraam tertsdiepraam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sylvestre, you seem to have a pretty good grasp on this already. Shouldn't this be fixed in parse_datetime?

@dcechano
Copy link
Author

@sylvestre, you seem to have a pretty good grasp on this already. Shouldn't this be fixed in parse_datetime?

I actually did consider this, but I figured it is not technically parsing a date but a modifier to a date. I can move the changes over to parse_datetime if needed.

@tertsdiepraam
Copy link
Member

Yes, that's what parse datetime is for. It's supposed to parse the format described here: https://www.gnu.org/software/coreutils/manual/html_node/Date-input-formats.html, which includes this part I believe. Porting it would be great!

@dcechano dcechano marked this pull request as draft October 30, 2023 23:19
@dcechano dcechano marked this pull request as ready for review October 31, 2023 02:41
@dcechano
Copy link
Author

I made the suggested changes including adding integration testing. I also increased testing to satisfy the code coverage bot, however I'm not sure I was able to get everything. I apologize in advance if the I did something incorrectly; I don't know git very well.

@sylvestre
Copy link
Sponsor Contributor

what do you think about moving it into https://github.com/uutils/parse_datetime/ ?

@dcechano
Copy link
Author

dcechano commented Oct 31, 2023

what do you think about moving it into https://github.com/uutils/parse_datetime/ ?

Oh, I see. I misunderstood @tertsdiepraam. Yes, I'm willing to move my work to parse_datetime. However, I may need more time to investigate the best way to integrate them smoothly.

One observation is that parse_datetime uses Regex for parsing which will render some of my algorithm redundant. Another is that parse_datetime already appears to do something similar but evidently is not parsing '2022-05-15 +01 Month', as pointed out in issue #3505.

In short, I just need to a day or two to reorient myself. But I am confident I can figure it out since I already gained a decent understanding while working on this PR.

@dcechano
Copy link
Author

dcechano commented Nov 2, 2023

@sylvestre @tertsdiepraam I ran into some interesting issues while working to transfer this PR to parse_datetime. I have opened an issue that I think warrants your inputs. Your guidance would be invaluable.

@sylvestre
Copy link
Sponsor Contributor

@dcechano sorry but do you have an update ? thanks

@dcechano
Copy link
Author

@dcechano sorry but do you have an update ? thanks

Yes sir. I am very sorry for the long delay but working out a solution that covers edge cases turned out to be quite challenging. I have been working in the parse_datetime repo as suggested and getting a PR ready.

What has been taking so long is that I had to implement the core logic (which was challenging in its own right) then integrate it into the larger library via lib.rs. That took some unanticipated refactoring, discovering bugs/edge cases ect. I had to find a way to take arbitrary strings with arbitrary formats and figure out how to hand off the appropriate part to chrono and take the remaining and process it via parse-datetime. This was quite a battle to get right. But I am very happy with the result.

After the PR, which I have set a hard goal of getting in by the end of this week, parse_datetime will be able to take arbitrary input like 08-31-2022 12:30:00 +1 month and spit out the result. The only restriction is that the time format has to be in a currently supported format.
I am just cleaning up a bit to make it presentable and implementing more tests to make sure that the code I am submitting is sufficiently robust.

@dcechano
Copy link
Author

dcechano commented Jan 5, 2024

@sylvestre & @tertsdiepraam I have a PR in that is a major step in addressing #3505. I'm looking forward to hearing what you guys think. Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants