Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generalize function calculating normalized distance between date/time values #2543

Merged
merged 1 commit into from
Feb 9, 2024

Conversation

papiomytoglou
Copy link
Contributor

@papiomytoglou papiomytoglou commented Dec 20, 2023

Right now, multiple value matching for date/time values is broken for actual-to-expected differences smaller than a year.

For instance, given the matchers below:

  "request": {
    "method": "GET",
    "urlPath": "/resource",
    "queryParameters": {
      "datetime" : {
        "hasExactly" : [
          {
            "equalToDateTime": "now",
            "truncateActual": "first hour of day",
            "truncateExpected": "first hour of day"
          },
          {
            "equalToDateTime": "now +10 days",
            "truncateActual": "first hour of day",
            "truncateExpected": "first hour of day"
          }
        ]
      }
    }
  }

Then (assuming the current date is 2023-12-20) a GET request to the following URI should be matched:
/resource?datetime=2023-12-20T00:00:00.000Z&datetime=2023-12-30T00:00:00.000Z
Instead, we get a "request was not matched" message.

The root cause is in the granularity of the calculation of the distance between the actual and the expected date/time. With the current implementation any date/time pair with a smaller than a year difference has a distance of zero. That means an actual date/time value that's ten minutes after the expected one has the same distance from it as one that's ten months after that.

So, in the example above, all four combinations of the now and now +10 days matchers on one side and the two date/time query parameters on the other have a distance of zero. This makes multi-value matching unusable for use cases such as retrieving records for the first day of each month within a year, for example.

This PR increases the granularity to the level of minutes allowing the user to also use multi-value matching for the range between a minute and a year. I didn't want to change the logic of the current implementation so I just moved from years to minutes which means the problem remains when matching any number of date/time pairs with a difference below the minute level. Keeping to the current logic, minutes was the most granular level that could be reached. When we turn to seconds functionality seems to break, probably due to floating point overflow/loss of precision.

I have added two tests that break with the current implementation and also tweaked a few existing ones so that they take under account leap years, which wasn't an issue before.

As a side note, the same problem exists for actual-to-expected differences of over 100 years. A date/time pair which is 100 years apart has the same distance (one) as a pair that's 2000 years apart.

References

An older, related PR: #1925

Submitter checklist

  • Recommended: Join WireMock Slack to get any help in #help-contributing or a project-specific channel like #wiremock-java
  • The PR request is well described and justified, including the body and the references
  • The PR title represents the desired changelog entry
  • The repository's code style is followed (see the contributing guide)
  • Test coverage that demonstrates that the change works as expected
  • For new features, there's necessary documentation in this pull request or in a subsequent PR to wiremock.org

@papiomytoglou papiomytoglou requested a review from a team as a code owner December 20, 2023 09:28
@tomakehurst tomakehurst self-assigned this Feb 6, 2024
@@ -83,7 +83,7 @@ public double getDistance() {
}

private double calculateDistance(Temporal start, Temporal end) {
double distance = ((double) ChronoUnit.YEARS.between(start, end)) / 100;
double distance = ((double) ChronoUnit.MINUTES.between(start, end)) / (365 * 24 * 60 * 100);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason we couldn't go down to milliseconds? e.g.

private static final long ONE_YEAR_IN_MILLIS = 365 * 24 * 60 * 60 * 1000L;

  private double calculateDistance(Temporal start, Temporal end) {
    double distance = ((double) (ChronoUnit.MILLIS.between(start, end))) / ONE_YEAR_IN_MILLIS;
    distance = Math.abs(distance);
    return Math.min(distance, 1.0);
  }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried going down to milliseconds and hit a numeric overflow not realizing I should use the L suffix so that the denominator is calculated using longs instead of integers. My mistake, I will fix that on the next commit.

Before making the change I will need a clarification: Right now all date/time distances are normalized within the range from 0 to 1. In your initial implementation you set a difference of 100 years as a ceiling when calculating date/time distances. That is, any difference greater than 100 years results in a distance of 1.

In your suggestion above you lower that ceiling even further to one year. Any date/time difference greater than a year will now result in a distance of one. I suspect that was by mistake and we should stick to the 100 year ceiling, right?

On the other hand, could we even raise that ceiling instead? Is there a reason 100 years should be the limit over which all date/time differences result in a distance of 1?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No I did it deliberately, but this approach is pretty arbitrary and I'm not really happy it in general TBH.

The issue is that normalising to a finite range is hard when the potential difference in date values in infinite.

I think that the difference in dates that are already quite close are more significant (people will probably mostly be working in the range of few days-months at most from today) so a better strategy would probably be a logarithmic function asymptotic to 1.0. If you felt like tackling that also, that would be a significant improvement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have amended my commit, this time using a version of the following function:

          |x|
f(x) = --------
       |x| + c

When c > 0, f has a range of [0, 1) and is asymptotic to 1 as shown in the following graph (where c = 1):
graph

In our case the actual calculateDistance function implemented has the form:

                 |time diff|
distance = -----------------------
           |time diff| + 50 years

The number 50 is arbitrary but it keeps to the previous implementation of having 100 years as the ceiling. That is, a 50 year difference still results in a 0.5 distance.

I have edited the title of the PR to reflect its new scope.

Please let me know what you think.

ACTUAL_DATE_TIME_ONE_CENTURY_DIFFERENCE
};
;
assertThat(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is quite hard to read. I suggest pulling out some well-named variables for the expected and actual values in this assertion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in amended commit.

@papiomytoglou papiomytoglou reopened this Feb 8, 2024
@papiomytoglou papiomytoglou changed the title Fix multi-value matching for smaller actual-to-expected date/time differences Generalize function calculating normalized distance of date/time values Feb 8, 2024
@papiomytoglou papiomytoglou changed the title Generalize function calculating normalized distance of date/time values Generalize function calculating normalized distance between date/time values Feb 9, 2024
distance = Math.abs(distance);
return Math.min(distance, 1.0);
long absoluteTimeDifference = Math.abs((ChronoUnit.MILLIS.between(start, end)));
return (double) absoluteTimeDifference / (absoluteTimeDifference + 50 * ONE_YEAR_IN_MILLIS);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having played around with this in a spreadsheet a bit I suggest changing the 50 to 2. This means a 10 year difference gives a distance of 0.83 and 100 years 0.98 which seems about right as a tradeoff for more range with smaller differences.

Otherwise, great stuff - exactly what I had in mind!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion implemented in amended commit.

@tomakehurst tomakehurst merged commit db70771 into wiremock:master Feb 9, 2024
7 checks passed
@tomakehurst
Copy link
Member

Great stuff, thanks for contributing!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants