Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

coprocessor: refine the 'convert_*_to_int' functions #4962

Merged
merged 20 commits into from Jul 5, 2019

Conversation

Projects
None yet
6 participants
@lonng
Copy link
Contributor

commented Jun 24, 2019

Signed-off-by: Lonng heng@lonng.org

What have you changed? (mandatory)

  1. add functions to retrieve boundary of integer types.
  2. add function convert_*_to_int and related tests.
  3. convert_json_to_int directly use Json::cast_as_int which behavior is not correct. This will be fixed in the next PR.

What are the type of the changes? (mandatory)

The currently defined types are listed below, please pick one of the types for this PR by removing the others:

  • New feature (change which adds functionality)
  • Improvement (change which is an improvement to an existing feature)

How has this PR been tested? (mandatory)

Unit tests

@lonng lonng added the C: Copr label Jun 24, 2019

@lonng lonng changed the title coprocessor: refine the 'convert_int_to_int' coprocessor: refine the 'convert_{int,uint,float}_to_int' Jun 24, 2019

@lonng lonng changed the title coprocessor: refine the 'convert_{int,uint,float}_to_int' coprocessor: refine the 'convert_{int,uint,float,bytes}_to_int' Jun 25, 2019

@lonng lonng changed the title coprocessor: refine the 'convert_{int,uint,float,bytes}_to_int' coprocessor: refine the 'convert_*_to_int' functions Jun 25, 2019

@lonng lonng force-pushed the lonng:integer-bound branch from 12f2f86 to 91f99be Jun 26, 2019

@lonng lonng referenced this pull request Jun 27, 2019

Merged

coprocessor: refactor convert to uint functions #4987

1 of 1 task complete

@lonng lonng force-pushed the lonng:integer-bound branch from e7763e6 to 739cf16 Jun 27, 2019

Show resolved Hide resolved src/coprocessor/codec/convert.rs Outdated
Show resolved Hide resolved src/coprocessor/codec/convert.rs Outdated
Show resolved Hide resolved src/coprocessor/codec/convert.rs Outdated
Show resolved Hide resolved src/coprocessor/codec/convert.rs Outdated
Show resolved Hide resolved src/coprocessor/codec/convert.rs
Show resolved Hide resolved src/coprocessor/codec/convert.rs
Show resolved Hide resolved src/coprocessor/codec/convert.rs Outdated
Show resolved Hide resolved src/coprocessor/codec/convert.rs Outdated
Show resolved Hide resolved src/coprocessor/codec/convert.rs Outdated
}};
}

/// `convert_int_to_int` converts an int value to a diferent int value.

This comment has been minimized.

Copy link
@H-ZeX

H-ZeX Jun 28, 2019

Contributor

the diferent should be different

This comment has been minimized.

Copy link
@H-ZeX

H-ZeX Jun 28, 2019

Contributor

what if add comment "If val <lower_bound, return Ok(lower_bound), if val> upper_bound, return Ok(upper_bound)"

This comment has been minimized.

Copy link
@lonng

lonng Jul 1, 2019

Author Contributor

I feel that will be redundant.

}

let exp = box_try!((&valid_float[e_idx.unwrap() + 1..]).parse::<i64>());
let e_idx = e_idx.unwrap();
let exp = box_try!((&valid_float[e_idx + 1..]).parse::<i64>());
if exp > 0 && int_cnt > (i64::MAX - exp) {
// (exp + inc_cnt) overflows MaxInt64. Add warning and return original float string
ctx.warnings

This comment has been minimized.

Copy link
@H-ZeX

H-ZeX Jun 30, 2019

Contributor

why this is exp+int_cnt? , n1 e n2 = n1 * 10^n2

Show resolved Hide resolved src/coprocessor/codec/convert.rs Outdated

@lonng lonng referenced this pull request Jul 1, 2019

Open

coprocessor: add functions convert * to float #4998

3 of 4 tasks complete
///
/// # Panics
///
/// Panics if the `tp` is not one of `FieldTypeTp::Tiny`, `FieldTypeTp::Short`,

This comment has been minimized.

Copy link
@breeswish

breeswish Jul 3, 2019

Member

Panics if tp is not an integer field type tp?

Show resolved Hide resolved src/coprocessor/codec/convert.rs Outdated
#[inline]
pub fn convert_uint_to_int(val: u64, upper_bound: i64, tp: FieldTypeTp) -> Result<i64> {
pub fn convert_int_to_int(ctx: &mut EvalContext, val: i64, tp: FieldTypeTp) -> Result<i64> {

This comment has been minimized.

Copy link
@breeswish

breeswish Jul 3, 2019

Member

Prefer to re-order these convert_xxx_to_xxx functions by name, especially convert_float_to_uint.

This comment has been minimized.

Copy link
@lonng

lonng Jul 4, 2019

Author Contributor

All convert functions are ordered by the target type, int -> uint -> float -> bytes -> datetime -> duration -> decimal -> json. The convert_float_to_uint is exsiting in previous implementation and will be re-ordered in #4987.

/// `FieldTypeTp::Int24`, `FieldTypeTp::Long`, `FieldTypeTp::LongLong`,
/// `FieldTypeTp::Bit`, `FieldTypeTp::Set`, `FieldTypeTp::Enum`
#[inline]
pub fn integer_unsigned_upper_bound(tp: FieldTypeTp) -> u64 {

This comment has been minimized.

Copy link
@breeswish
/// Panics if the `tp` is not one of `FieldTypeTp::Tiny`, `FieldTypeTp::Short`,
/// `FieldTypeTp::Int24`, `FieldTypeTp::Long`, `FieldTypeTp::LongLong`,
#[inline]
pub fn integer_signed_upper_bound(tp: FieldTypeTp) -> i64 {

This comment has been minimized.

Copy link
@breeswish
Show resolved Hide resolved src/coprocessor/codec/convert.rs Outdated
Show resolved Hide resolved src/coprocessor/codec/convert.rs Outdated
Show resolved Hide resolved src/coprocessor/codec/convert.rs
Show resolved Hide resolved src/coprocessor/codec/convert.rs
Show resolved Hide resolved src/coprocessor/codec/convert.rs

lonng added some commits Jun 24, 2019

coprocessor: refine the 'convert_int_to_int'
Signed-off-by: Lonng <heng@lonng.org>
coprocessor: refine the covert function 'convert_bytes_to_int'
Signed-off-by: Lonng <heng@lonng.org>
add #[inline] attr
Signed-off-by: Lonng <heng@lonng.org>
coprocessor: implement the convert * to int family functions
Signed-off-by: Lonng <heng@lonng.org>
add unit tests for overflow
Signed-off-by: Lonng <heng@lonng.org>
fix 'float_str_to_int_string'
Signed-off-by: Lonng <heng@lonng.org>
address comment & fix CI failure
Signed-off-by: Lonng <heng@lonng.org>
address comment
Signed-off-by: Lonng <heng@lonng.org>
address comment
Signed-off-by: Lonng <heng@lonng.org>
fix CI fuilure
Signed-off-by: Lonng <heng@lonng.org>

@lonng lonng force-pushed the lonng:integer-bound branch 4 times, most recently from c803fa7 to ec74b53 Jul 4, 2019

Show resolved Hide resolved src/coprocessor/codec/convert.rs Outdated
match s.rfind(|c| c != '9' && c != '+' && c != '-') {
Some(idx) => {
int_str.push_str(&s[..idx]);
let next_char = char::from_u32(s.chars().nth(idx).unwrap() as u32 + 1).unwrap();

This comment has been minimized.

Copy link
@H-ZeX

H-ZeX Jul 4, 2019

Contributor

From nth's doc

Note that all preceding elements, as well as the returned element, will be consumed from the iterator. That means that the preceding elements will be discarded, and also that calling nth(0) multiple times on the same iterator will return different elements.

This is str contain valid decimal, so I think use &[u8] but not &str is better, and we can index the char in O(1) time

This comment has been minimized.

Copy link
@lonng

lonng Jul 4, 2019

Author Contributor

We cannot get a char in &[u8] actually.

address comment
Signed-off-by: Lonng <heng@lonng.org>

@lonng lonng force-pushed the lonng:integer-bound branch from ec74b53 to 10f84a1 Jul 4, 2019

address comment
Signed-off-by: Lonng <heng@lonng.org>
Show resolved Hide resolved src/coprocessor/codec/convert.rs Outdated
/// Returns the overflow error if the value exceeds the boundary.
pub fn convert_float_to_uint(fval: f64, upper_bound: u64, tp: FieldTypeTp) -> Result<u64> {
// TODO any performance problem to use round directly?
/// Converts a byte arrays to an i64 in best effort.

This comment has been minimized.

Copy link
@iosmanthus

iosmanthus Jul 4, 2019

Contributor

How about using combinator instead of match:

    let s = str::from_utf8(bytes).or_else(|err| {
        ctx.handle_truncate(true)?;
        let (valid, _) = bytes.split_at(err.valid_up_to());
        Ok(unsafe { str::from_utf8_unchecked(valid) })
    })?;

This comment has been minimized.

Copy link
@lonng

lonng Jul 4, 2019

Author Contributor

There is no much different from using combinator. The combinator uses match also and should construct an extra closure.

Show resolved Hide resolved src/coprocessor/codec/convert.rs
Show resolved Hide resolved src/coprocessor/codec/convert.rs
Ok(val) => convert_int_to_int(ctx, val, tp),
Err(_) => {
ctx.handle_overflow(Error::overflow("BIGINT", &vs))?;
let val = if vs.starts_with('-') {
) -> Result<i64> {
// It's OK to clone because duration only occupies 8 bytes after #4858 merged
let dur = dur.round_frac(DEFAULT_FSP)?;
let val = Decimal::try_from(dur)?.as_i64_with_ctx(ctx)?;

This comment has been minimized.

Copy link
@breeswish

breeswish Jul 4, 2019

Member

Use to_decimal()? like above?

This comment has been minimized.

Copy link
@lonng

lonng Jul 4, 2019

Author Contributor

to_decimal has been removed.

dur: Duration,
tp: FieldTypeTp,
) -> Result<i64> {
// It's OK to clone because duration only occupies 8 bytes after #4858 merged

This comment has been minimized.

Copy link
@breeswish

breeswish Jul 4, 2019

Member

#4858 is already merged

// TODO: avoid this clone after refactor the `Time`
let mut t = dt.clone();
t.round_frac(DEFAULT_FSP)?;
let val = t.to_decimal()?.as_i64_with_ctx(ctx)?;

This comment has been minimized.

Copy link
@breeswish

breeswish Jul 4, 2019

Member

I believe using as_i64_with_ctx(ctx), and/or supplying ctx to convert_int_to_int can be incorrect. For example, when as_i64_with_ctx produces Res::Overflow, there will be an overflow warning in ctx && this line produces i64::MIN or i64::MAX. Suppose our tp is TinyInt, then in Line 198 another warning will be generated.

// It's OK to clone because duration only occupies 8 bytes after #4858 merged
let dur = dur.round_frac(DEFAULT_FSP)?;
let val = Decimal::try_from(dur)?.as_i64_with_ctx(ctx)?;
convert_int_to_int(ctx, val, tp)

This comment has been minimized.

Copy link
@breeswish

breeswish Jul 4, 2019

Member

Same issue as above. Accepting ctx multiple times may result in multiple warnings being generated in a single UDF function call?


/// Converts a `Duration` to an i64 value
#[inline]
pub fn convert_duration_to_int(

This comment has been minimized.

Copy link
@breeswish

breeswish Jul 4, 2019

Member

LGTM, except for the multi warning issue.


/// Converts a `DateTime` to an i64 value
#[inline]
pub fn convert_datetime_to_int(

This comment has been minimized.

Copy link
@breeswish

breeswish Jul 4, 2019

Member

LGTM, except for the multi warning issue.

This comment has been minimized.

Copy link
@lonng

lonng Jul 4, 2019

Author Contributor

This PR is too large, the next PR contains a macro which uses as_i64 to avoid multiple warning issues.

let vs = get_valid_int_prefix(ctx, s)?;
bytes_to_int_without_context(vs.as_bytes()).map_err(|_| Error::overflow("BIGINT", &vs))
}

/// `bytes_to_uint` converts a byte arrays to an u64 in best effort.
pub fn bytes_to_uint(ctx: &mut EvalContext, bytes: &[u8]) -> Result<u64> {

This comment has been minimized.

Copy link
@breeswish

breeswish Jul 4, 2019

Member

The implementation of this function looks to be incorrect.

This comment has been minimized.

Copy link
@lonng

lonng Jul 4, 2019

Author Contributor

The function will be addressed in the next PR (convert * to uint).

ctx: &mut EvalContext,
valid_float: &'b str,
valid_float: &'a str,
) -> Result<Cow<'a, str>> {
let mut dot_idx = None;
let mut e_idx = None;

This comment has been minimized.

Copy link
@breeswish

breeswish Jul 4, 2019

Member

Looks like there is no such logic in TiDB:

            '0'..='9' => {
                if e_idx.is_none() {
                    if dot_idx.is_none() {
                        int_cnt += 1;
                    }
                    digits_cnt += 1;
                }
            }

This comment has been minimized.

Copy link
@lonng

lonng Jul 4, 2019

Author Contributor

The original use different way to implement the logic. I kept it and fixed some corner case.

This comment has been minimized.

Copy link
@breeswish

breeswish Jul 5, 2019

Member

For the first version I prefer to keep the code identical as much as possible, especially considering that we have proposed many changes in this PR. I don't think we have tests that can cover 100% branches so that differences are likely to cause many kind of bugs.

}

let mut int_str = String::with_capacity(s.len());
match s.rfind(|c| c != '9' && c != '+' && c != '-') {

This comment has been minimized.

Copy link
@breeswish

breeswish Jul 4, 2019

Member

Why it is testing against c != '9' && c != '+' && c != '-'? In TiDB's roundIntStr implementation + and - is not checked at all.

This comment has been minimized.

Copy link
@lonng

lonng Jul 4, 2019

Author Contributor

I handle the "+"/"-" in this function to make float_str_to_int_string more clear.

lonng added some commits Jul 4, 2019

address comment
Signed-off-by: Lonng <heng@lonng.org>
@lonng

This comment has been minimized.

Copy link
Contributor Author

commented Jul 4, 2019

/run-all-tests

lonng added some commits Jul 5, 2019

address comment
Signed-off-by: Lonng <heng@lonng.org>
pub fn convert_float_to_uint(fval: f64, upper_bound: u64, tp: FieldTypeTp) -> Result<u64> {
// TODO any performance problem to use round directly?
/// Converts a byte arrays to an i64 in best effort.
pub fn convert_bytes_to_int(ctx: &mut EvalContext, bytes: &[u8], tp: FieldTypeTp) -> Result<i64> {

This comment has been minimized.

Copy link
@breeswish
ctx: &mut EvalContext,
valid_float: &'b str,
valid_float: &'a str,
) -> Result<Cow<'a, str>> {
let mut dot_idx = None;
let mut e_idx = None;

This comment has been minimized.

Copy link
@breeswish

breeswish Jul 5, 2019

Member

For the first version I prefer to keep the code identical as much as possible, especially considering that we have proposed many changes in this PR. I don't think we have tests that can cover 100% branches so that differences are likely to cause many kind of bugs.

@lonng lonng referenced this pull request Jul 5, 2019

Merged

coprocessor: add functions convert * to decimal #5029

3 of 3 tasks complete
return Ok(Cow::Borrowed("0"));
let dot_idx = dot_idx.unwrap();
// NOTE: to make compatible with TiDB, +0.5 -> 1, -0.5 -> -1
let int_str = if int_cnt == 0 && valid_float.starts_with('-') {

This comment has been minimized.

Copy link
@Deardrops

Deardrops Jul 5, 2019

do we need handle the case - valid_float.starts_with('+') ?

if valid_len == 0 {
Ok("0")
} else {
Ok(&s[..valid_len])
}
}

fn round_int_str(num_next_dot: char, s: &str) -> Cow<'_, str> {

This comment has been minimized.

Copy link
@Deardrops
}
return Ok(Cow::Borrowed(&valid_float[..dot_idx.unwrap()]));
return Ok(Cow::Borrowed(int_str));

This comment has been minimized.

Copy link
@H-ZeX

H-ZeX Jul 5, 2019

Contributor

Why this is not Cow::Owned(int_str))

This comment has been minimized.

Copy link
@lonng

lonng Jul 5, 2019

Author Contributor

It's not Owned in all situation.

if exp > 0 && int_cnt > (i64::MAX - exp) {
// (exp + inc_cnt) overflows MaxInt64. Add warning and return original float string
// (exp + int_cnt) overflows MaxInt64. Add warning and return original float string
ctx.warnings
.append_warning(Error::overflow("BIGINT", &valid_float));
return Ok(Cow::Owned(valid_float.to_owned()));

This comment has been minimized.

Copy link
@H-ZeX

H-ZeX Jul 5, 2019

Contributor

Why this is not Cow::Borrowed(valid_float)

This comment has been minimized.

Copy link
@lonng

lonng Jul 5, 2019

Author Contributor

Nice catch.

@breeswish

This comment has been minimized.

Copy link
Member

commented Jul 5, 2019

/run-integration-tests

Show resolved Hide resolved src/coprocessor/codec/convert.rs
s.len()
};
int_str.push('1');
int_str.extend((0..zero_count).map(|_| '0'));

This comment has been minimized.

Copy link
@iosmanthus

iosmanthus Jul 5, 2019

Contributor

use std::iter::repeat instead:

            if zero_count > 0 {
                int_str.extend(std::iter::repeat('0').take(zero_count));
            }
@AndreMouche
Copy link
Member

left a comment

LGTM

@lonng lonng merged commit 65c5f8c into tikv:master Jul 5, 2019

5 checks passed

DCO All commits are signed off!
Details
idc-jenkins-ci-tikv/integration-common-test Jenkins job succeeded.
Details
idc-jenkins-ci-tikv/integration-compatibility-test Jenkins job succeeded.
Details
idc-jenkins-ci-tikv/integration-ddl-test Jenkins job succeeded.
Details
idc-jenkins-ci/test Jenkins job succeeded.
Details

@lonng lonng deleted the lonng:integer-bound branch Jul 5, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.