Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add be_within 'fuzzy' matcher #77

Closed
jgebal opened this issue Oct 17, 2016 · 32 comments · Fixed by #1076
Closed

Add be_within 'fuzzy' matcher #77

jgebal opened this issue Oct 17, 2016 · 32 comments · Fixed by #1076
Assignees

Comments

@jgebal
Copy link
Member

jgebal commented Oct 17, 2016

For numbers we should allow a comparison with a percent_off deviation.
The syntax:
ut.expect( a_actual =>1.23 ).to_( be_within( a_pct => 1, a_expected => 1.23456789 ) )
should give success
ut.expect( a_actual =>1.2 ).to_( be_within( a_pct => 1, a_expected => 1.23456789 ) )
should give success

Calculation logic.
a_actual between (a_expected - ((a_expected * a_pct)/100) ) and (a_expected + ((a_expected * a_pct)/100) )

@jgebal jgebal added this to the version3 milestone Oct 17, 2016
@jgebal jgebal self-assigned this Oct 18, 2016
@jgebal jgebal added the ready label Oct 18, 2016
@jgebal
Copy link
Member Author

jgebal commented Nov 21, 2016

Similar should be available for dates, timestamps and intevals (once suported)

@Pazus Pazus modified the milestones: version3.1, version3 Dec 7, 2016
@Pazus
Copy link
Member

Pazus commented Dec 7, 2016

Rescheduled to version 3.1

@jgebal jgebal removed the ready label Jan 18, 2017
@proldan
Copy link
Contributor

proldan commented Mar 22, 2017

What should pct mean in dates and intervals? Should it be a percentage of a day for dates?

@Shoelace
Copy link
Member

that may not make a whole lot of sense..
more useful for dates would be if it meant "number of seconds difference"
ie so 1-jan-2017 10:12:13 would match 1-jan-2017 10:12:14

i guess you could say 50% would mean +/- 12hours

proldan added a commit to proldan/utPLSQL that referenced this issue Mar 30, 2017
@jgebal
Copy link
Member Author

jgebal commented May 15, 2017

I think that for dates, timestamps and intervals we should not think of within in terms of PCT.
For those data-types we should use interval as a measure of allowed deviation.
so the syntax woule be like:

ut.expect(sysdate).to_( be_within( interval '1' day, to_date( '20170515', 'yyyy-mm-dd' ) ) );

@jgebal jgebal removed this from the v3.1.0 milestone Feb 5, 2018
@jgebal jgebal removed the task label Feb 5, 2018
@jgebal jgebal added the idea label Jun 19, 2018
@MickeVr
Copy link

MickeVr commented Oct 16, 2018

I customized my utplsql implementation to be able to compare floats. However, instead of a relative precision, I used an absolute precision, but it sounds like the same idea as I already applied. Would it be an idea to make this absolute / relative distinction in a parameter?

@Shoelace
Copy link
Member

that could work so instead of
ut.expect( a_actual =>1.23 ).to_( be_within( a_pct => 1, a_expected => 1.23456789 ) )
you could also have
ut.expect( a_actual =>1.23 ).to_( be_within( a_amt => 0.123, a_expected => 1.23456789 ) )

that works for date/timestamp/interval comparison better then a_pct also and keeps it somewhat consistent to a date/timpstamp comparitson could be
ut.expect( a_actual =>SYSDATE ).to_( be_within( a_amt => INTERVAL '1' WEEK, a_expected => DATE '2018-10-17' ) )

@Shoelace
Copy link
Member

actually that does that not work well becuase cannot tell the difference between a_amt and a_pct for number comparision.
maybe it actually needs to be 2 diffrent matchers of be_within_pct and be_within_amt

@jgebal
Copy link
Member Author

jgebal commented Oct 17, 2018

I would keep the be_within and add one of:

  • parameter use_pct varchar2 := 'N'
  • separate matcher be_within_pct

Separate matcher seems more clear to use but both are good solutions.

Thinking of code readability when matchers are used:

ut.expect(v_actual).to_be_within(0.1, v_expected, 'Y');
ut.expect(v_actual).to_be_within_pct(0.1, v_expected);

--or maybe
ut.expect(v_actual).to_be_within(0.1).percent_of(v_expected);
--and without percent
ut.expect(v_actual).to_be_within(0.1).of(v_expected);

Which syntax do you find best and why you think it's best?

@jgebal
Copy link
Member Author

jgebal commented Oct 17, 2018

After looking at the issue second time I think that absolute distance is a better option than pct.
Maybe we shouldnt bother with PCT at all.
Absolute distance can be applied to types like:
Number, float, interval, date, timestamp

@Shoelace
Copy link
Member

Shoelace commented Oct 17, 2018

true and if you really want a pct value you can calculate it on the fly.
ie for 5% you could
ut.expect( a_actual =>1.23 ).to_be_within( a_amt => 1.23456789 *(5/100), a_expected => 1.23456789 )
which i admit is a little unwieldy.. but does make it clear what you are trying to do.

@jgebal jgebal added this to the v3.1.4 milestone Nov 20, 2018
@PhilippSalvisberg PhilippSalvisberg removed this from the v3.1.4 milestone Sep 6, 2019
@lwasylow lwasylow self-assigned this May 20, 2020
@lwasylow
Copy link
Member

So we can have options as follow:

  1. date
    @jgebal - I know you wanted to modify syntax ?
    syntax: ut.expect( a_actual =>1.23 ).to_be_within( a_amt => 1, a_expected => to_date('12/09/2020','DDMM/RRR' )
    message: Actual: 2020-05-21T20:14:07 (date) was expected to be within absolute distance of 1 from 2020-05-23T20:14:07 (date)

  2. number:
    syntax: ut.expect( a_actual =>1).to_be_within( a_amt => 1, a_expected => 3 )
    message: Actual: 1 (number) was expected to be within absolute distance of 1 from 3 (number)

Now dates one can be a bit of crude to see but I feel this provides an easy to write and udnerstand syntax that ask for absolute distance between two values actual and expected

We can also implement this , would be follow same syntax?

  • timestamp_unconstrained
  • timestamp_tz_unconstrained
  • timestamp_ltz_unconstrained
  • yminterval_unconstrained
  • dsinterval_unconstrained

@lwasylow
Copy link
Member

After some thought I think this should be an argument of equal. For me it's looking like an equality matcher with a margin of error. E.g. to_equal(4).within(5).
Approach of to_be_within(interval '1' day).of(sysdate+2) breaking idea of expected being part of matcher (which is something we can workaround) but also word of is key word so we would be looking for replacement like ofvalue etc.

@lwasylow
Copy link
Member

Actually having a little read the word tolerance or with_tolerance sounds like a better match than within and we could provide abbreviation wt

@lwasylow
Copy link
Member

lwasylow commented May 23, 2020

Can't make a mind up :) I also like an idea of new one called to_be_close as implemented in python https://www.python.org/dev/peps/pep-0485/#proposed-implementation with relative and and absolute tolerance. Gives a lot of flexibility and as new matcher won't imply that works for compounds like cursor etc. E.g
expect(2).to_be_close(3).abs_tol(1) something in this lines

@jgebal
Copy link
Member Author

jgebal commented May 24, 2020

How about going with:

ut.expect(v_actual).to_be_within(0.1, v_expected);
ut.expect(v_actual).to_be_within_pct(0.1, v_expected);
ut.expect(sysdate).to_be_within( interval '1' day, to_date( '20170515', 'yyyy-mm-dd' ) );

@lwasylow
Copy link
Member

Sounds good to me.

@lwasylow
Copy link
Member

Questions should we be supporting intervals as argument of within.
Generally, I think interval express distance by itself and comparing a distance within a distance is a bit counter intuitive ?

@jgebal
Copy link
Member Author

jgebal commented Jun 14, 2020

In my opinion, INTERVAL is a measure of time.
So if some task takes 1 hour I would express it as an interval.
I could write a test to check if task takes an hour +/- 5 minutes.
That seems to me a very legitimate use case.
ut.expect( l_actual_interval ).to_be_within( interval '5' minute ).of_( interval '1' hour );

I agree that interval can also be used to capture date/timestamp difference.
In that case, the matcher should be applied to dates/timestamps.
ut.expect( l_actual_date ).to_be_within( interval '5' minute ).of_( timestamp '2020-06-14 22:11:34');
@pesse @Pazus @PhilippSalvisberg @Shoelace - comments?

@PhilippSalvisberg
Copy link
Member

PhilippSalvisberg commented Jun 15, 2020

The definition of the INTERVAL DAY TO SECOND Data Type is

INTERVAL DAY TO SECOND stores a period of time in terms of days, hours, minutes, and seconds. This data type is useful for representing the precise difference between two datetime values.

I like the Interval literals to express an interval in a human readable way. This helps especially for this use case. A comparison/assert with a tolerance. As @lwasylow I'd prefer a syntax as follows:

ut.expect(l_actual_interval).to_be(l_expected_interval).with_tolerance(interval '5' minute);

and for the second example

ut.expect(l_actual_date).to_be(timestamp '2020-06-14 22:11.34').with_tolerance(interval '5' minute);

It's a well know equal comparison with an optional tolerance.

The tolerance can be an interval (year to month or day to second) or a number. Tolerance is applicable for datetime, timestamp, number comparisons. In case of datetime/timestamp a number tolerance is interpreted in days. This would be consistent with the datetime arithmetic of the Oracle Database.

@jgebal
Copy link
Member Author

jgebal commented Jun 16, 2020

I think there are two aspects here

  • should we allow for interval/interval/interval as acual/expexted/tolerance
  • what syntax shoul be used for this expectation?

Seems like you're supportive of the firts but suggest to use alternative syntax to (...).to_be_within(...).of_(...)

@PhilippSalvisberg
Copy link
Member

PhilippSalvisberg commented Jun 16, 2020

should we allow for interval/interval/interval as acual/expexted/tolerance

Yes.

what syntax shoul be used for this expectation?

I favor (...).to_be(...).with_tolerance(...) over (...).to_be_within(...).of_(...). For me it is clearer which is the "tolerance" and which the value to compare to. However, there might be other things to consider like the return type of a method (making a fluent API harder or easier) and the consistency to other methods. I have not looked at these aspects.

@lwasylow
Copy link
Member

For me a tolerance feel a better being second as for first part of name this can probably be modified there are multiple options like:
nearly_equal or approx_equal instead of to_be

I think with_tolerance also is a much clearer expresion and falls into a format as we passing expected first and then a modifier of that comparision whether is a join by or a distance.

@jgebal
Copy link
Member Author

jgebal commented Jun 16, 2020

One more:
ut.expect(...).to_be_close_to(...).with_tolerance(...);

@PhilippSalvisberg
Copy link
Member

Why change the first part and not keep the existing variant to_be? I see with_tolerance optional because the default is a tolerance of zero.

@lwasylow
Copy link
Member

We need to change first part as currently we bene looking on to be within(3).of_(5) where 3 was a distance and 5 was a point from distance was measured.

@PhilippSalvisberg
Copy link
Member

PhilippSalvisberg commented Jun 17, 2020

@lwasylow you were suggesting nearly_equal or approx_equal and @jgebal to_be_close_to instead of to_be. So, my question was why? Just because you like it more or is there some limitation regarding overloading of to_be?

@lwasylow
Copy link
Member

@PhilippSalvisberg, apologies I misread your post. For some reason I assumed you reffered to to_be_within. There is no restriction in to_be. As for syntax I personally think to_be is not descriptive and can be be easily misunderstood with equal. Personally I would prefer something more descriptive as well as distance being mandatory value positive, with zero is just a worse version of equal.
Hope that makes sense and clears.

@PhilippSalvisberg
Copy link
Member

@lwasylow no problem.

As for syntax I personally think to_be is not descriptive and can be be easily misunderstood with equal

Actually that's what I like. Standalone to_be really means "to be equal to ... with zero tolerance". You can express it explicitly like this

ut.expect(l_actual).to_be(l_expected).with_tolerance(0);

or like this

ut.expect(l_actual).to_be(l_expected);

And it feels right. And also with non-default values for tolerance like this:

ut.expect(l_actual).to_be(l_expected).with_tolerance(0.5);

I am therefore voting against the introduction of additional keywords. Because I think it unnecessarily bloats the syntax of utPLSQL.

We don't have to agree . Do whatever you and @jgebal think is best.

@jgebal
Copy link
Member Author

jgebal commented Jun 21, 2020

@PhilippSalvisberg, @lwasylow

Keep in mind that utPSLQL suppoers two flavors of syntax for expectations:

  • shortcut syntax: ut.expect(l_actual).to_**matcher(...)[.options(...)...]**;
  • passing matcher to expectation: ut.expect(l_actual).to_( matcher(...)[.options(...)...] );

So for fuzzy matcher we would need to name it be / ut_be.
That way we would end up with syntax like:
ut.expect( 1 ).to_( be( 2 ).with_tolerance( 1 ) );
ut.expect( 1 ).not_to( be( 2 ).with_tolerance( 1 ) );
ut.expect( 1 ).to_be( 2 ).with_tolerance( 1 );
ut.expect( 1 ).not_to_be( 2 ).with_tolerance( 1 );

The be in OOP it s used to represent equal identity in sense of "the same object instance" not "the same value" or "identical object".

I think that be is too strong for this matcher and actually in PL/SQL it is quite impossible for two variables to point to (represent) the same object instance as assignments in PLSQL are done by deep copy of a variable.

We could have used be for equality but since we have decided to use to_equal I se no value in changing it now.

Overloading to_equal to support tolerance would also cause trouble with support for non-scalar data-types.

It will not be possible to apply with_tolerance to cursor, object type, nested table.

So we would need to add safety net for invalid syntax like:
ut.expect( l_actual_cursor ).to_be( l_expected_cursor ).with_tolerance( 1 );

The be_within is described as part of RSpec, which was as a reference for Domain Specific Language specification in utPLSQL expectations:
https://relishapp.com/rspec/rspec-expectations/v/3-9/docs/built-in-matchers/be-within-matcher

Thanks for all the inputs.

I'm in favor of the following:

begin
  ut.expect(interval).to_be_within(interval).of_(interval);

  ut.expect(date).to_be_within(interval).of_(interval);
  ut.expect(timestamp).to_be_within(interval).of_(timestamp);
  ut.expect(timestamp_tz).to_be_within(interval).of_(timestamp_tz);
  ut.expect(timestamp_ltz).to_be_within(interval).of_(timestamp_ltz);

  ut.expect(number).to_be_within(number).of_(number);
  ut.expect(number).to_be_within_pct(number).of_(number);
end;

@PhilippSalvisberg
Copy link
Member

Thank you, @jgebal for the detailed answer. You got me with the link to RSpec (and because utPLSQL is based on that) and the explanation regarding the technical issues.

So, I support your approach.

@pesse
Copy link
Member

pesse commented Jun 22, 2020

@jgebal 's comments - as always a treasure of knowledge! ❤

@lwasylow lwasylow linked a pull request Feb 3, 2022 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants