Batch Top N Executor #4825

Merged
merged 7 commits into from Jun 5, 2019

3 participants
@breeswish
Member

commented Jun 3, 2019

What have you changed? (mandatory)

This PR adds a batch Top N executor.
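
For readers unfamiliar with the technique, a Top N operator is commonly built on a bounded binary heap. The sketch below only illustrates that general idea on plain integers. The function name `top_n_smallest` is hypothetical, and this is not the executor's actual code, which works on column batches and order-by expressions.

```rust
use std::collections::BinaryHeap;

/// Returns the `n` smallest values of `input` in ascending order.
/// Hypothetical illustration of a bounded-heap Top N.
fn top_n_smallest(input: impl IntoIterator<Item = i64>, n: usize) -> Vec<i64> {
    // `BinaryHeap` is a max-heap, so the root is the largest value kept so far.
    let mut heap: BinaryHeap<i64> = BinaryHeap::new();
    for value in input {
        heap.push(value);
        if heap.len() > n {
            // Evict the current largest so that at most `n` values remain.
            heap.pop();
        }
    }
    heap.into_sorted_vec()
}
```

The heap never holds more than `n + 1` values, so memory use is bounded by N rather than by the input size.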

What is the type of the changes? (mandatory)

  • New feature (change which adds functionality)

How has this PR been tested? (mandatory)

Unit test

Top N executor
Signed-off-by: Breezewish <breezewish@pingcap.com>

@breeswish breeswish force-pushed the breeswish:__batch/top_n branch from 2c81436 to b19714f Jun 3, 2019

@breeswish

Member Author

commented Jun 3, 2019

/run-integration-tests

@breeswish breeswish marked this pull request as ready for review Jun 3, 2019

@breeswish breeswish added the C: Copr label Jun 3, 2019

/// are placed behind a reference counter. However, there won't be a place other than
/// `HeapItemUnsafe` holding this reference counter, so Rc won't break
/// `BatchTopNExecutor: Send`.
source_column_data: Pin<Rc<LazyBatchColumnVec>>,

@breeswish

breeswish Jun 3, 2019

Author Member

For reviewers: This property needs to be verified carefully. Please help confirm that this use of Rc is correct. @sticnarf @lonng

This comment was marked as resolved.

@sticnarf

sticnarf Jun 4, 2019

Contributor

There is a risk. BinaryHeap comes from the standard library, so we cannot be sure that it won't move HeapItemUnsafe to other threads (although it would be unreasonable for BinaryHeap to do so).

This comment was marked as resolved.

@sticnarf

sticnarf Jun 4, 2019

Contributor

If it did, we might get data races when multiple Rcs are dropped at the same time.

@sticnarf

sticnarf Jun 4, 2019

Contributor

Sorry, not actually a problem

@breeswish

breeswish Jun 4, 2019

Author Member

Yes, not actually a problem since HeapItemUnsafe remains !Sync + !Send.
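
The reasoning in this thread can be illustrated with a small standalone sketch. It assumes a hypothetical `Confined` type (not from the PR) that keeps every clone of an `Rc` in private fields and never hands one out, which is the condition under which a manual `Send` impl is argued to be sound here.

```rust
use std::rc::Rc;
use std::thread;

/// Hypothetical stand-in for an executor that owns every clone of its `Rc`.
struct Confined {
    source: Rc<Vec<i64>>,
    items: Vec<Rc<Vec<i64>>>,
}

// SAFETY (assumption of this sketch): all clones of `source` live inside this
// struct and move together with it, so the non-atomic reference count is only
// ever touched by one thread at a time.
unsafe impl Send for Confined {}

impl Confined {
    fn new(data: Vec<i64>, copies: usize) -> Self {
        let source = Rc::new(data);
        let items = (0..copies).map(|_| Rc::clone(&source)).collect();
        Confined { source, items }
    }

    /// Returns a plain value; an `Rc` is never handed out.
    fn total(&self) -> i64 {
        self.items.iter().map(|rc| rc.iter().sum::<i64>()).sum()
    }

    fn ref_count(&self) -> usize {
        Rc::strong_count(&self.source)
    }
}

/// Moves the whole struct to another thread, uses it there, and drops it there.
fn sum_on_other_thread(c: Confined) -> i64 {
    thread::spawn(move || c.total()).join().unwrap()
}
```

The argument holds only while no `Rc` clone escapes the struct. Exposing one through a public method would make the `Send` impl unsound.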

is_ended: bool,
}

unsafe impl<Src: BatchExecutor + Send> Send for BatchTopNExecutor<Src> {}

@breeswish

breeswish Jun 3, 2019

Author Member

For reviewers: This property needs to be verified carefully. Please help confirm that the Send bound can be satisfied. @sticnarf @lonng

Address comments
Signed-off-by: Breezewish <breezewish@pingcap.com>
@breeswish

Member Author

commented Jun 3, 2019

/run-integration-tests

@breeswish breeswish referenced this pull request Jun 3, 2019

Merged

Batch Top N Layered Benchmarks #4827

1 of 1 task complete

@breeswish breeswish changed the title Batch Top N Executor WIP: Batch Top N Executor Jun 3, 2019

@breeswish

Member Author

commented Jun 3, 2019

According to the benchmark in #4827, there is a case where the current implementation is less efficient, so I am marking this PR as WIP.

@breeswish breeswish added the S: WIP label Jun 3, 2019

}
}

impl<'a> Eq for ScalarValueRef<'a> {}

@lonng

lonng Jun 4, 2019

Contributor

Why not put the Eq in derive directly?

@breeswish

breeswish Jun 4, 2019

Author Member

good catch

/// not dropped). Thus it is called unsafe.
pub struct HeapItemUnsafe {
/// A pointer to the `order_is_desc` field in `BatchTopNExecutor`.
order_is_desc_ptr: NonNull<Pin<Vec<bool>>>,

@sticnarf

sticnarf Jun 4, 2019

Contributor

We shouldn't point to the order_is_desc field in BatchTopNExecutor. A move of BatchTopNExecutor will invalidate it. We can point to the inner slice of the vec instead. That address is stable.

@sticnarf

sticnarf Jun 4, 2019

Contributor

BTW, for fields that don't change after the executor is created, we can simply store a Box<[T]> in the executor; it will save one word.
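
As a rough check of the "one word" claim, the sketch below compares the two layouts. It assumes the current standard library layout of `Vec<T>` (pointer, capacity, length), which is not a language guarantee. `word_counts` and `freeze` are hypothetical helper names.

```rust
use std::mem::size_of;

/// Returns the sizes of `Vec<bool>` and `Box<[bool]>`, measured in machine words.
fn word_counts() -> (usize, usize) {
    let word = size_of::<usize>();
    (size_of::<Vec<bool>>() / word, size_of::<Box<[bool]>>() / word)
}

/// Converts a vector that will no longer grow into a boxed slice.
/// The capacity field is dropped; only pointer and length remain.
fn freeze(flags: Vec<bool>) -> Box<[bool]> {
    flags.into_boxed_slice()
}
```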

@breeswish

breeswish Jun 4, 2019

Author Member

I think order_is_desc is actually pinned through to the inner slice. Otherwise it would violate what Pin provides.

Box<[T]> is a good suggestion!

@sticnarf

sticnarf Jun 4, 2019

Contributor

I don't think so. Pin ensures the inner slice (Deref::Target) is pinned, but Vec itself is not protected. Illustration: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=f17281079b78b1294e7106a92f767d64
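
A minimal standalone sketch of the point above (not the contents of the linked playground): moving a `Vec` relocates its header but leaves the heap buffer where it was. The function name is hypothetical.

```rust
/// Returns `(buffer_unchanged, header_unchanged)` after moving a `Vec`.
fn addresses_after_move() -> (bool, bool) {
    let flags: Vec<bool> = vec![true, false, true];
    let header_before = &flags as *const Vec<bool> as usize;
    let buffer_before = flags.as_ptr() as usize;

    // Moving the Vec into a Box relocates its (pointer, capacity, length)
    // header from the stack to the heap...
    let moved: Box<Vec<bool>> = Box::new(flags);
    let header_after = &*moved as *const Vec<bool> as usize;
    // ...but the buffer holding the elements is not reallocated.
    let buffer_after = moved.as_ptr() as usize;

    (buffer_before == buffer_after, header_before == header_after)
}
```

This is why a raw pointer to the field breaks when the executor moves, while a pointer to the slice contents stays valid as long as the buffer is not reallocated or freed.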

@breeswish

breeswish Jun 4, 2019

Author Member

Updated, PTAL again.

breeswish added some commits Jun 4, 2019

Remove Pin to be more clear
Signed-off-by: Breezewish <breezewish@pingcap.com>
derive Eq
Signed-off-by: Breezewish <breezewish@pingcap.com>
@breeswish

Member Author

commented Jun 4, 2019

/run-integration-tests


#[allow(clippy::transmute_ptr_to_ptr)]
fn process_batch_input(&mut self, mut data: LazyBatchColumnVec) -> Result<()> {
let src_schema_unbounded = unsafe { &*(self.src.schema() as *const _) };

@lonng

lonng Jun 4, 2019

Contributor

I think using std::mem::transmute(&self.src.schema()) to extend the lifetime explicitly is more readable, e.g.:

let src_schema_unbounded: &'static ... = unsafe {std::mem::transmute(&self.src.schema())}

Everyone can understand the meaning immediately.

@sticnarf

sticnarf Jun 4, 2019

Contributor

Looks like we diverge on this. Shall we add an unsafe utility function named "extend_lifetime"?

@lonng

lonng Jun 4, 2019

Contributor

@sticnarf I think it's better to add this function.
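
One possible shape for such a helper is sketched below. The signature is an assumption, not necessarily the function that ended up in the codebase. The caller must guarantee that the referent outlives every use of the returned reference.

```rust
/// Hypothetical helper: returns the same reference with a caller-chosen lifetime.
///
/// # Safety
/// The caller must ensure the referent stays alive, and is neither moved nor
/// mutated, for as long as the returned reference is used.
unsafe fn extend_lifetime<'a, 'b, T: ?Sized>(r: &'a T) -> &'b T {
    // Casting through a raw pointer erases the original lifetime.
    unsafe { &*(r as *const T) }
}

/// Example use: the extended reference never leaves this function,
/// so it cannot outlive `values`.
fn sum_with_extended_lifetime(values: &[i64]) -> i64 {
    // SAFETY: `values` is borrowed for the whole function body.
    let unbounded: &'static [i64] = unsafe { extend_lifetime(values) };
    unbounded.iter().sum()
}
```

Compared with an inline pointer cast or `transmute`, a named helper keeps the intent visible at each call site and gives the safety contract a single place to live.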


let eval_offset = self.eval_columns_buffer_unsafe.len();
let order_exprs_unbounded = unsafe { &*(&*self.order_exprs as *const [RpnExpression]) };
let data_unbounded = unsafe { &*(&*data as *const _) };

@lonng

lonng Jun 4, 2019

Contributor

(ditto)

breeswish added some commits Jun 4, 2019

Improve materialize performance
Signed-off-by: Breezewish <breezewish@pingcap.com>
@breeswish

Member Author

commented Jun 4, 2019

/run-integration-tests

@breeswish breeswish removed the S: WIP label Jun 4, 2019

@breeswish breeswish changed the title WIP: Batch Top N Executor Batch Top N Executor Jun 4, 2019

}
}

#[allow(clippy::transmute_ptr_to_ptr)]

@sticnarf

sticnarf Jun 5, 2019

Contributor

We can remove it now.

Remove unused clippy rule
Signed-off-by: Breezewish <breezewish@pingcap.com>
@lonng

lonng approved these changes Jun 5, 2019

@breeswish

Member Author

commented Jun 5, 2019

/rebuild

@sticnarf
Contributor

left a comment

LGTM
Some minor points:

  • HeapItemUnsafe needn't be pub. (But it cannot be constructed outside the module, so this is totally OK.)
  • Strictly speaking, I think whether BatchTopNExecutor is Send still depends on the implementation of Rc. For example, if Rc depended on something thread-local (we know it doesn't), then our BatchTopNExecutor would not be Send either.
@breeswish

Member Author

commented Jun 5, 2019

@sticnarf Thanks for pointing this out. Your concern is valid, and I cannot think of a safeguard that would protect us from such a change in the future. I will add a comment about this in a later PR.

@breeswish breeswish merged commit 7d3472e into tikv:master Jun 5, 2019

2 checks passed

DCO All commits are signed off!
Details
idc-jenkins-ci/test Jenkins job succeeded.
Details

@breeswish breeswish deleted the breeswish:__batch/top_n branch Jun 5, 2019

sticnarf added a commit to sticnarf/tikv that referenced this pull request Jun 10, 2019

Remove all Box of RPN functions
Signed-off-by: Yilin Chen <sticnarf@gmail.com>

Makefile: make sure gdb is installed before sse4.2 check (tikv#4832)

Signed-off-by: Kaige Ye <ye@kaige.org>

Upgrade sys-info (tikv#4760)

* *: upgrade sys-info crate

This fixes a problem with the next toolchain upgrade
where rust fails to link the native components of the crate.

Signed-off-by: Brian Anderson <andersrb@gmail.com>

* *: bump sys-info to 0.5.7

Signed-off-by: Brian Anderson <andersrb@gmail.com>

Batch Top N Executor (tikv#4825)

Signed-off-by: Breezewish <breezewish@pingcap.com>

Add help message in doc:go-client-api.md (tikv#4763)

* add help message in doc:go-client-api.md

Signed-off-by: yy <cacheyy@qq.com>

* update go-client-api.md

Signed-off-by: yy <cacheyy@qq.com>

Modify Makefile to distinguish between developer and packaging use cases (tikv#4687)

* make: Add new "dist_release" rules

To make the optimized build faster the existing "release" rules are going to be
changed such that they are not identical to the actual releases. Primarily they
will not have debuginfo by default and will use thinLTO instead of LTO.

This adds new "dist_release", etc rules for the CI/CD system to use.

For now they are identical to the existing rules. After CI is updated
the "release" rules will be changed.

Signed-off-by: Brian Anderson <andersrb@gmail.com>

* make: Document release rules

Signed-off-by: Brian Anderson <andersrb@gmail.com>

* Makefile: indicate use of fail_release

Signed-off-by: Brian Anderson <andersrb@gmail.com>

* Clarify the distinction in instruction set for release targets

Signed-off-by: Brian Anderson <andersrb@gmail.com>

Makefile: fix gdb check (tikv#4840)

Signed-off-by: Kaige Ye <ye@kaige.org>

pessimistic-txn: solve non-pessimistic-lock conflict (tikv#4801)

* txn: replace is_pessimistic_lock to for_update_ts in Lock

Signed-off-by: youjiali1995 <zlwgx1023@gmail.com>

* pessimistic-txn: overwrite optimistic lock in pessimistic_prewrite if
request's for_update_ts is greater than lock's for_update_ts

Signed-off-by: youjiali1995 <zlwgx1023@gmail.com>

* modify comment

Signed-off-by: youjiali1995 <zlwgx1023@gmail.com>

* address comment

Signed-off-by: youjiali1995 <zlwgx1023@gmail.com>

* address comment

Signed-off-by: youjiali1995 <zlwgx1023@gmail.com>

* address comment

Signed-off-by: youjiali1995 <zlwgx1023@gmail.com>

* address comment

Signed-off-by: youjiali1995 <zlwgx1023@gmail.com>

* add comment

Signed-off-by: youjiali1995 <zlwgx1023@gmail.com>

* return Error let TiDB to resolve lock

Signed-off-by: youjiali1995 <zlwgx1023@gmail.com>

* address comment

Signed-off-by: youjiali1995 <zlwgx1023@gmail.com>

* address comment

Signed-off-by: youjiali1995 <zlwgx1023@gmail.com>

coprocessor: add batch aggregate function BitAnd/BitOr/BitXor (tikv#4824)

Batch Top N Layered Benchmarks (tikv#4827)

* Add Top N benchmarks

Signed-off-by: Breezewish <breezewish@pingcap.com>

* Address some comments in previous PRs

Signed-off-by: Breezewish <breezewish@pingcap.com>

coprocessor: add batch aggregate function Max/Min (tikv#4837)

Implement RpnFunction MultiplyDecimal (tikv#4849)

Signed-off-by: Breezewish <breezewish@pingcap.com>

Add missing fsync calls in the snapshot module (tikv#4850)

Signed-off-by: Ben Pig Chu <benpichu@gmail.com>

use HTTP to enable jemalloc profile (tikv#4600)

use HTTP to enable jemalloc profile

Signed-off-by: Yang Keao <keao.yang@yahoo.com>

coprocessor: use servo_arc in BatchTopNExecutor (tikv#4854)

Signed-off-by: Yilin Chen <sticnarf@gmail.com>

Fix clippy warnings

Signed-off-by: Yilin Chen <sticnarf@gmail.com>

Fix test

Signed-off-by: Yilin Chen <sticnarf@gmail.com>

Add docs to function.rs

Signed-off-by: Yilin Chen <sticnarf@gmail.com>

Add example output of the macro in the test of the macro.

Signed-off-by: Yilin Chen <sticnarf@gmail.com>

fix broken url for configuration options (tikv#4856)

Signed-off-by: Yukang <moorekang@gmail.com>

shrink the latch waiting list (tikv#4844)

Signed-off-by: zhangjinpeng1987 <zhangjinpeng@pingcap.com>

Fix clippy

Signed-off-by: Yilin Chen <sticnarf@gmail.com>

scheduler use spin::Mutex (tikv#4829)

* scheduler use spinlock

Signed-off-by: zhangjinpeng1987 <zhangjinpeng@pingcap.com>

Better panic info

Signed-off-by: Yilin Chen <sticnarf@gmail.com>

breeswish added a commit to breeswish/tikv that referenced this pull request Jun 12, 2019

Batch Top N Executor (tikv#4825)
Signed-off-by: Breezewish <breezewish@pingcap.com>
