-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generalize function calculating normalized distance between date/time values #2543
Conversation
@@ -83,7 +83,7 @@ public double getDistance() { | |||
} | |||
|
|||
private double calculateDistance(Temporal start, Temporal end) { | |||
double distance = ((double) ChronoUnit.YEARS.between(start, end)) / 100; | |||
double distance = ((double) ChronoUnit.MINUTES.between(start, end)) / (365 * 24 * 60 * 100); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason we couldn't go down to milliseconds? e.g.
private static final long ONE_YEAR_IN_MILLIS = 365 * 24 * 60 * 60 * 1000L;
private double calculateDistance(Temporal start, Temporal end) {
double distance = ((double) (ChronoUnit.MILLIS.between(start, end))) / ONE_YEAR_IN_MILLIS;
distance = Math.abs(distance);
return Math.min(distance, 1.0);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried going down to milliseconds and hit a numeric overflow not realizing I should use the L
suffix so that the denominator is calculated using longs instead of integers. My mistake, I will fix that on the next commit.
Before making the change I will need a clarification: Right now all date/time distances are normalized within the range from 0 to 1. In your initial implementation you set a difference of 100 years as a ceiling when calculating date/time distances. That is, any difference greater than 100 years results in a distance of 1.
In your suggestion above you lower that ceiling even further to one year. Any date/time difference greater than a year will now result in a distance of one. I suspect that was by mistake and we should stick to the 100 year ceiling, right?
On the other hand, could we even raise that ceiling instead? Is there a reason 100 years should be the limit over which all date/time differences result in a distance of 1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No I did it deliberately, but this approach is pretty arbitrary and I'm not really happy it in general TBH.
The issue is that normalising to a finite range is hard when the potential difference in date values in infinite.
I think that the difference in dates that are already quite close are more significant (people will probably mostly be working in the range of few days-months at most from today) so a better strategy would probably be a logarithmic function asymptotic to 1.0. If you felt like tackling that also, that would be a significant improvement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have amended my commit, this time using a version of the following function:
|x|
f(x) = --------
|x| + c
When c > 0, f has a range of [0, 1) and is asymptotic to 1 as shown in the following graph (where c = 1):
In our case the actual calculateDistance
function implemented has the form:
|time diff|
distance = -----------------------
|time diff| + 50 years
The number 50 is arbitrary but it keeps to the previous implementation of having 100 years as the ceiling. That is, a 50 year difference still results in a 0.5 distance.
I have edited the title of the PR to reflect its new scope.
Please let me know what you think.
ACTUAL_DATE_TIME_ONE_CENTURY_DIFFERENCE | ||
}; | ||
; | ||
assertThat( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is quite hard to read. I suggest pulling out some well-named variables for the expected and actual values in this assertion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in amended commit.
b1b5a75
to
c877a06
Compare
db8b65e
to
2788cff
Compare
distance = Math.abs(distance); | ||
return Math.min(distance, 1.0); | ||
long absoluteTimeDifference = Math.abs((ChronoUnit.MILLIS.between(start, end))); | ||
return (double) absoluteTimeDifference / (absoluteTimeDifference + 50 * ONE_YEAR_IN_MILLIS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having played around with this in a spreadsheet a bit I suggest changing the 50 to 2. This means a 10 year difference gives a distance of 0.83 and 100 years 0.98 which seems about right as a tradeoff for more range with smaller differences.
Otherwise, great stuff - exactly what I had in mind!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion implemented in amended commit.
Great stuff, thanks for contributing! |
Right now, multiple value matching for date/time values is broken for actual-to-expected differences smaller than a year.
For instance, given the matchers below:
Then (assuming the current date is
2023-12-20
) a GET request to the following URI should be matched:/resource?datetime=2023-12-20T00:00:00.000Z&datetime=2023-12-30T00:00:00.000Z
Instead, we get a "request was not matched" message.
The root cause is in the granularity of the calculation of the distance between the actual and the expected date/time. With the current implementation any date/time pair with a smaller than a year difference has a distance of zero. That means an actual date/time value that's ten minutes after the expected one has the same distance from it as one that's ten months after that.
So, in the example above, all four combinations of the
now
andnow +10 days
matchers on one side and the two date/time query parameters on the other have a distance of zero. This makes multi-value matching unusable for use cases such as retrieving records for the first day of each month within a year, for example.This PR increases the granularity to the level of minutes allowing the user to also use multi-value matching for the range between a minute and a year. I didn't want to change the logic of the current implementation so I just moved from years to minutes which means the problem remains when matching any number of date/time pairs with a difference below the minute level. Keeping to the current logic, minutes was the most granular level that could be reached. When we turn to seconds functionality seems to break, probably due to floating point overflow/loss of precision.
I have added two tests that break with the current implementation and also tweaked a few existing ones so that they take under account leap years, which wasn't an issue before.
As a side note, the same problem exists for actual-to-expected differences of over 100 years. A date/time pair which is 100 years apart has the same distance (one) as a pair that's 2000 years apart.
References
An older, related PR: #1925
Submitter checklist
#help-contributing
or a project-specific channel like#wiremock-java