Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement compress,uncompress,uncompressed_length #3856

Merged
merged 8 commits into from Dec 5, 2018

Conversation

Projects
None yet
6 participants
@niedhui
Copy link
Contributor

commented Nov 28, 2018

Signed-off-by: niedhui niedhui@gmail.com

What have you changed? (mandatory)

implement builtin encryption functions: Compress, Uncompress, UncompressedLength

What are the type of the changes? (mandatory)

Improvement

How has this PR been tested? (mandatory)

unit test

Does this PR affect documentation (docs/docs-cn) update? (mandatory)

No

Does this PR affect tidb-ansible update? (mandatory)

No

Refer to a related PR or issue link (optional)

#3275

implement compress,uncompress,uncompressed_length
Signed-off-by: niedhui <niedhui@gmail.com>
@ice1000

This comment has been minimized.

Copy link
Member

commented Nov 28, 2018

Thank you!

@sre-bot

This comment has been minimized.

Copy link
Collaborator

commented Nov 28, 2018

Hi contributor, thanks for your PR.

This patch needs to be approved by someone of admins. They should reply with "/ok-to-test" to accept this PR for running test automatically.

@huachaohuang
Copy link
Contributor

left a comment

Thanks, the implementation looks good, some things need to be explained though.

match e.read_to_end(&mut vec) {
Ok(len) => {
let mut cap = 4 + len;
if vec[len - 1] == 32 {

This comment has been minimized.

Copy link
@huachaohuang

huachaohuang Nov 29, 2018

Contributor

Maybe add a comment for this behavior.

This comment has been minimized.

Copy link
@niedhui

niedhui Nov 29, 2018

Author Contributor

PTAL

let mut decoder = ZlibDecoder::new(&input[4..]);
let mut vec = Vec::with_capacity(len);
match decoder.read_to_end(&mut vec) {
Ok(decoded_len) if len >= decoded_len && decoded_len != 0 => Ok(Some(Cow::Owned(vec))),

This comment has been minimized.

Copy link
@huachaohuang

huachaohuang Nov 29, 2018

Contributor

Can you explain the logic behind these comparisons?

This comment has been minimized.

Copy link
@niedhui

niedhui Nov 29, 2018

Author Contributor

I some comments to explain these.

In the current implementation of TiDB, the first four bytes length is ignored , it try to uncompress , if success, return the string

In MySQL, if the first four bytes length is less than the actual 'uncompressed string' , it will return null and generate a warning ZLIB: Not enough room in the output buffer (probably, length of uncompressed data was corrupted). Maybe it only read exact bytes of data, so caused corruped.

This comment has been minimized.

Copy link
@breeswish

breeswish Dec 3, 2018

Member

Nice catch! Would you also like to file an issue in TiDB?

add comments to explain some obscure logic
Signed-off-by: niedhui <niedhui@gmail.com>
@breeswish
Copy link
Member

left a comment

Thanks a lot! Rest LGTM.

row: &[Datum],
) -> Result<Option<Cow<'a, [u8]>>> {
let input = try_opt!(self.children[0].eval_string(ctx, row));
if input.is_empty() {

This comment has been minimized.

Copy link
@breeswish

breeswish Dec 3, 2018

Member

Maybe add a comment as TiDB: Empty strings are stored as empty strings

This comment has been minimized.

Copy link
@niedhui

niedhui Dec 4, 2018

Author Contributor

done

let mut decoder = ZlibDecoder::new(&input[4..]);
let mut vec = Vec::with_capacity(len);
match decoder.read_to_end(&mut vec) {
Ok(decoded_len) if len >= decoded_len && decoded_len != 0 => Ok(Some(Cow::Owned(vec))),

This comment has been minimized.

Copy link
@breeswish

breeswish Dec 3, 2018

Member

Nice catch! Would you also like to file an issue in TiDB?

cap += 1;
}

let mut wtr = Vec::with_capacity(cap);

This comment has been minimized.

Copy link
@breeswish

breeswish Dec 3, 2018

Member

OMG GitHub lost my comment..

There is several re-allocation here, maybe it can be eliminated in the following ways:

  1. Create a Vec with capacity=original length (we asserted that in most cases compressed length < original length)
  2. Resize to 4 bytes (fill with 0)
  3. encoder = ZlibEncoder::new(vec)
  4. encoder.write_all(..)
  5. encoder.finish()
  6. append . if necessary
  7. modify first 4 bytes

This comment has been minimized.

Copy link
@niedhui

niedhui Dec 4, 2018

Author Contributor

@breeswish thanks. code updated. PTAL


let mut wtr = Vec::with_capacity(cap);
wtr.write_u32::<LittleEndian>(input.len() as u32).unwrap();
wtr.extend_from_slice(&vec);

This comment has been minimized.

Copy link
@breeswish

breeswish Dec 3, 2018

Member

Personally I prefer to write use byteorder::xxx at the scope of this function, because it is not useful for the rest of the file. Also apply to uncompress.

This comment has been minimized.

Copy link
@niedhui

niedhui Dec 4, 2018

Author Contributor

updated. PTAL
but now there are use byteorder::{ByteOrder, LittleEndian}; in three functions of the single file :(

niedhui added some commits Dec 4, 2018

more doc & some refactor
Signed-off-by: niedhui <niedhui@gmail.com>
Merge branch 'master' of https://github.com/tikv/tikv into niedhui/co…
…mpress

Signed-off-by: niedhui <niedhui@gmail.com>

# Conflicts:
#	src/coprocessor/dag/expr/scalar_function.rs
@breeswish
Copy link
Member

left a comment

Looks nice!

Show resolved Hide resolved src/coprocessor/dag/expr/builtin_encryption.rs Outdated
use more accurate capacity
Signed-off-by: niedhui <niedhui@gmail.com>
@breeswish

This comment has been minimized.

Copy link
Member

commented Dec 5, 2018

Good job!

@breeswish breeswish added the S: LGT1 label Dec 5, 2018

@huachaohuang
Copy link
Contributor

left a comment

Rest LGTM

let len = LittleEndian::read_u32(&input[0..4]) as usize;
let mut decoder = ZlibDecoder::new(&input[4..]);
let mut vec = Vec::with_capacity(len);
// if the length of uncompressed string is less than the length we read from the first

This comment has been minimized.

Copy link
@huachaohuang

huachaohuang Dec 5, 2018

Contributor

This comment seems wrong:
"the length of uncompressed string" = decoded_len
"the length we read from the first four bytes" = len
It is totally fine if decoded_len < len.

This comment has been minimized.

Copy link
@niedhui

niedhui Dec 5, 2018

Author Contributor

sorry , fixed

niedhui added some commits Dec 5, 2018

fix comment typo
Signed-off-by: niedhui <niedhui@gmail.com>
Merge branch 'master' of https://github.com/tikv/tikv into niedhui/co…
…mpress

Signed-off-by: niedhui <niedhui@gmail.com>

# Conflicts:
#	src/coprocessor/dag/expr/scalar_function.rs
@huachaohuang
Copy link
Contributor

left a comment

LGTM

@huachaohuang huachaohuang merged commit 51952fc into tikv:master Dec 5, 2018

3 checks passed

DCO All commits are signed off!
Details
ci/circleci: test Your tests passed on CircleCI!
Details
jenkins-ci-tikv/build Jenkins job succeeded.
Details

@AndreMouche AndreMouche referenced this pull request Dec 6, 2018

Open

coprocessor/expression: push down scalar functions #3275

333 of 469 tasks complete

@niedhui niedhui deleted the niedhui:niedhui/compress branch Dec 24, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.