Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enhancement(remap): add integer division #5353

Merged
merged 3 commits into from
Dec 13, 2020
Merged

Conversation

StephenWakely
Copy link
Contributor

Closes #3729

Discussion in that issue stalled, so I have moved forward to try to satisfy all the issues raised there..

This PR makes two changes.

  • Ordinary division is now always float division. Before, if the first parameter was an Integer it would be Integer division. So 9 / 12 == 0. Now it will be 9 / 12 == 0.75. It does mean that even if the result is a whole number it will be a float - 4 / 2 == 2.0

  • There is a new integer division operator - //. So 9 // 12 == 0. Float parameters are cast to an integer.

Signed-off-by: Stephen Wakely fungus.humungus@gmail.com

Signed-off-by: Stephen Wakely <fungus.humungus@gmail.com>
Signed-off-by: Stephen Wakely <fungus.humungus@gmail.com>
Copy link
Contributor

@JeanMertz JeanMertz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

It does mean that even if the result is a whole number it will be a float - 4 / 2 == 2.0

Does it make sense to check if we have a fraction and return an integer if we don't (if both lhs and rhs are integers)?

e.g. v == (v as i64) as f64

@StephenWakely
Copy link
Contributor Author

Does it make sense to check if we have a fraction and return an integer if we don't (if both lhs and rhs are integers)?

e.g. v == (v as i64) as f64

This is how the json parser works - if it can be an integer it becomes an integer, otherwise a float. But, I personally prefer to return a consistent type. Am open to objections if anyone prefers otherwise?

@JeanMertz
Copy link
Contributor

Personally, having 4 / 2 return 2.0 isn't "consistent" to me. I'd prefer it if we return an integer if both sides are integers. I am not advocating to return an integer if someone does 4.0 / 2, that should still return 2.0 (same for 4 / 2.0).

@StephenWakely
Copy link
Contributor Author

Personally, having 4 / 2 return 2.0 isn't "consistent" to me.

The inconsistency I mean is 4 / 2 would return an Integer, but 5 / 2 would return a Float. In a real example, where at least of those sides would be a variable sometimes .foo / 2 would be a Integer and other times it would be a Float - the type would only be determined at runtime.

@JeanMertz
Copy link
Contributor

The inconsistency I mean is 4 / 2 would return an Integer, but 5 / 2 would return a Float.

Right. To me, I would describe this not as inconsistency, but as "the most useful result". I get that from a type-level perspective, it's inconsistent, but as a user, I wouldn't want 5 / 2 to resolve to 3.

We could have / do "what you most likely want" and have // do what is "technically correct", maybe?

In a real example, where at least of those sides would be a variable sometimes .foo / 2 would be a Integer and other times it would be a Float - the type would only be determined at runtime.

I don't see how that's a problem in this case. If, at runtime, .foo is a float, then the result will be a float as well.

@StephenWakely
Copy link
Contributor Author

If, at runtime, .foo is a float, then the result will be a float as well.

If I've understood your suggestion, then if .foo == 4 then the result would be an int.

All told though, I don't think it will have a negative impact to how things work, so I am happy to make the change. The fact it is an Int will not prevent you from doing anything that you could do if it was a Float. Plus it does remain consistent with the Json parsing.

@bruceg
Copy link
Member

bruceg commented Dec 3, 2020

Unless we have a pattern of converting a Float into an Integer when possible (note that this must include both the rounding discussed here and overflow detection), then I would dissent. Operations should have a consistent result dependent on the types of the arguments but not on the values. So, A + B has one result type. A * B has one result type. So also A / B should have a single result type, particularly if A is an event field. I usually work exclusively with integers and so have normalized the silent truncation, but mathematically a Float result is much less surprising.

Parsing JSON where a number may be parsed as either an Integer or a Float depending on the value is a different case. There is a textual difference between floats and integers that makes such an automatic conversion much more obvious and less surprising.

Of course, if everything downstream for this treats Integers and Floats equivalently, this is all pretty much moot, and the only reasonable thing is to do whatever is easiest. I don't think that is actually the case, though.

@jamtur01 jamtur01 added domain: vrl Anything related to the Vector Remap Language transform: remap Anything `remap` transform related labels Dec 3, 2020
@StephenWakely
Copy link
Contributor Author

@binarylogic We have reached a bit of a stalemate. Do you have an opinion on this.

5 / 2 == 2.5 (always a float)

So should:

4 / 2 == 2 (an integer)

or should

4 / 2 == 2.0 (a float)

?

@binarylogic
Copy link
Contributor

binarylogic commented Dec 4, 2020

🧑‍⚖️

I lean towards consistent types, especially since Vector will be writing to some storages that require a strict schema. For example, Elasticsearch might see an integer as a field's first value and set the field type as integer, and then subsequent float values would be unexpectedly rounded.

Therefore, I vote for:

5 / 2 = 2.5
4 / 2 = 2.0

My rationale is:

  • 5 / 2 = 2.5 is the least surprising.
  • 4 / 2 = 2.0 fits into my consistent types argument above.

Copy link
Member

@bruceg bruceg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks ok except for conversion issues. I'm not sure if that's in scope for this issue.

@@ -315,14 +318,26 @@ impl Value {
let err = || Error::Div(self.kind(), rhs.kind());

let value = match self {
Value::Integer(lhv) => (lhv / i64::try_from(&rhs).map_err(|_| err())?).into(),
Value::Integer(lhv) => (lhv as f64 / f64::try_from(&rhs).map_err(|_| err())?).into(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one of those silent data corruption problems that Rust doesn't warn about. i64 as f64 is not a lossless conversion (it will lose precision for values > 2^53, see https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=d9425e2eef14bb569653638ab3076ff5), so try_from may be appropriate on both sides.

I doubt this is the only place this shows up, so it might not be in scope for this issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there such a try_from function between float and int? I can't find any.

The f64::try_from(Value that we are using here is just doing under the hood:

        match value {
            Value::Integer(v) => Ok(*v as f64),
            Value::Float(v) => Ok(*v),
            _ => Err(Error::Coerce(value.kind(), Kind::Float)),
        }

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought there was, but I can't find it now so I must be mistaken. It would have to be emulated:

    let result = value as f64;
    match result as i64 {
        value => Ok(result),
        _ => Err(Error::Coerce()),
    }

Again, it might be out of scope for this particular PR, but something to consider making an issue of.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't see how that emulation would work. Rust won't let you compare an i64 with an f64, so you would have to cast result back to an f64 - by which point the comparison is pointless.

I can't quite see a good way around this, so I think I will close this and raise an issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's exactly what this is doing, by casting value to f64 and then back to i64, any conversion failure will cause that value to be different than the original.

I agree, though, that should probably go in a new issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, yes.. Good point.

I'm not sure that helps though if the number is a power of 2. Take 36028797018963968 as i64. Converted to a f64 that is 36028797018963970. But that number converted back to i64 is back to the original 36028797018963968.

https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=1b03fa41be91569e49e7446e2b87ecff


let value = match &self {
Value::Integer(lhv) => (lhv / i64::try_from(&rhs).map_err(|_| err())?).into(),
Value::Float(lhv) => (*lhv as i64/ i64::try_from(&rhs).map_err(|_| err())?).into(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here too, values larger than 2^63 silently turn into gibberish: 50000000000000000000.0 as i64 == 9223372036854775807.

Signed-off-by: Stephen Wakely <fungus.humungus@gmail.com>
@StephenWakely
Copy link
Contributor Author

Have raised #5530 to look into the issues of converting between floats and ints.

@StephenWakely StephenWakely merged commit 565bc52 into master Dec 13, 2020
@StephenWakely StephenWakely deleted the 3729_integer_division branch December 13, 2020 09:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain: vrl Anything related to the Vector Remap Language transform: remap Anything `remap` transform related
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support plain (non-float) integer division in remap transform
5 participants