-
Notifications
You must be signed in to change notification settings - Fork 3.5k
[feat](query) use UUIDv7 for queryId #50126
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
run buildall |
TPC-H: Total hot run time: 33772 ms
|
TPC-DS: Total hot run time: 193903 ms
|
ClickBench: Total hot run time: 29.31 s
|
|
||
namespace doris { | ||
|
||
// Format: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please add a note that UUID v7 is currently used to generate values, and why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this note should be added for Java part as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dolfinus could please add a benchmark to show performance between of new next_uuid()
and _boost_uuid_generator()
use like : https://quick-bench.com/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have any experience in writing C++ code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
quick-bench doesn't support including third-party libraries like Boost.
I've tried to use https://github.com/google/benchmark locally, and got these results:
#include <atomic>
#include <boost/random/mersenne_twister.hpp>
#include <boost/uuid/uuid.hpp>
#include <boost/uuid/uuid_generators.hpp>
#include <boost/uuid/uuid_io.hpp>
#include <chrono>
#include <cstdint>
#include <mutex>
#include <random>
#include <benchmark/benchmark.h>
class BoostUUID4Generator {
public:
boost::uuids::uuid next_uuid() {
std::lock_guard<std::mutex> lock(_uuid_gen_lock);
return _boost_uuid_generator();
}
static BoostUUID4Generator* instance() {
static BoostUUID4Generator generator;
return &generator;
}
private:
boost::uuids::basic_random_generator<boost::random::mt19937> _boost_uuid_generator;
std::mutex _uuid_gen_lock;
};
class BoostUUID7Generator {
public:
boost::uuids::uuid next_uuid() {
std::lock_guard<std::mutex> lock(_uuid_gen_lock);
return _boost_uuid_generator();
}
static BoostUUID7Generator* instance() {
static BoostUUID7Generator generator;
return &generator;
}
private:
boost::uuids::time_generator_v7 _boost_uuid_generator;
std::mutex _uuid_gen_lock;
};
class CustomUUID7Generator {
public:
boost::uuids::uuid next_uuid() {
std::lock_guard<std::mutex> lock(_uuid_gen_lock);
uint64_t millis =
std::chrono::time_point_cast<std::chrono::milliseconds>(std::chrono::system_clock::now()).time_since_epoch()
.count();
uint16_t counter = _counter.fetch_add(1, std::memory_order_relaxed) & 0xFF;
// Generate random bits
std::uniform_int_distribution<uint64_t> dis;
uint64_t random = dis(gen);
boost::uuids::uuid uuid;
uuid.data[0] = static_cast<uint8_t>((millis >> 40) & 0xFF);
uuid.data[1] = static_cast<uint8_t>((millis >> 32) & 0xFF);
uuid.data[2] = static_cast<uint8_t>((millis >> 24) & 0xFF);
uuid.data[3] = static_cast<uint8_t>((millis >> 16) & 0xFF);
uuid.data[4] = static_cast<uint8_t>((millis >> 8) & 0xFF);
uuid.data[5] = static_cast<uint8_t>(millis & 0xFF);
// Next 4 bits: version (7)
// Next 12 bits: counter
uuid.data[6] = static_cast<uint8_t>(0x70 | ((counter >> 8) & 0x0F));
uuid.data[7] = static_cast<uint8_t>(counter & 0xFF);
// Next 2 bits: variant (2)
// Remaining 62 bits: random
uuid.data[8] = static_cast<uint8_t>(0x80 | ((random >> 56) & 0x3F));
uuid.data[9] = static_cast<uint8_t>((random >> 48) & 0xFF);
uuid.data[10] = static_cast<uint8_t>((random >> 40) & 0xFF);
uuid.data[11] = static_cast<uint8_t>((random >> 32) & 0xFF);
uuid.data[12] = static_cast<uint8_t>((random >> 24) & 0xFF);
uuid.data[13] = static_cast<uint8_t>((random >> 16) & 0xFF);
uuid.data[14] = static_cast<uint8_t>((random >> 8) & 0xFF);
uuid.data[15] = static_cast<uint8_t>(random & 0xFF);
return uuid;
}
static CustomUUID7Generator* instance() {
static CustomUUID7Generator generator;
return &generator;
}
private:
std::mutex _uuid_gen_lock;
std::atomic<uint16_t> _counter {0};
std::random_device rd;
std::mt19937_64 gen { std::mt19937_64(rd()) };
};
static void BoostUUID4(benchmark::State& state) {
// Code before the loop is not measured
BoostUUID4Generator* generator = BoostUUID4Generator::instance();
// Code inside this loop is measured repeatedly
for (auto _ : state) {
boost::uuids::uuid generated_uuid = generator->next_uuid();
// Make sure the variable is not optimized away by compiler
benchmark::DoNotOptimize(generated_uuid);
}
}
// Register the function as a benchmark
BENCHMARK(BoostUUID4);
static void BoostUUID7(benchmark::State& state) {
// Code before the loop is not measured
BoostUUID7Generator* generator = BoostUUID7Generator::instance();
// Code inside this loop is measured repeatedly
for (auto _ : state) {
boost::uuids::uuid generated_uuid = generator->next_uuid();
// Make sure the variable is not optimized away by compiler
benchmark::DoNotOptimize(generated_uuid);
}
}
// Register the function as a benchmark
BENCHMARK(BoostUUID7);
static void CustomUUID7(benchmark::State& state) {
// Code before the loop is not measured
CustomUUID7Generator* generator = CustomUUID7Generator::instance();
// Code inside this loop is measured repeatedly
for (auto _ : state) {
boost::uuids::uuid generated_uuid = generator->next_uuid();
// Make sure the variable is not optimized away by compiler
benchmark::DoNotOptimize(generated_uuid);
}
}
BENCHMARK(CustomUUID7);
BENCHMARK_MAIN();
g++ -O3 mybenchmark.cc -std=c++11 -isystem benchmark/include -Lbenchmark/build/src -lbenchmark -lpthread -o mybenchmark
./mybenchmark
2025-04-27T14:32:24+03:00
Running ./mybenchmark
Run on (8 X 4800 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x4)
L1 Instruction 32 KiB (x4)
L2 Unified 256 KiB (x4)
L3 Unified 8192 KiB (x1)
Load Average: 1.45, 1.49, 1.47
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
------------------------------------------------------
Benchmark Time CPU Iterations
------------------------------------------------------
BoostUUID4 25.7 ns 25.6 ns 24760690
BoostUUID7 60.8 ns 60.6 ns 10957843
CustomUUID7 53.2 ns 53.1 ns 12940347
This benchmark uses code after fixing #50126 (review) and https://github.com/apache/doris/pull/50126/files/4190be14312634b95ba1cbb4edcf1af8a5473314#diff-e0023bb1a4d322e4239f33381749a03e826aba045e5944e41795ad72dcbfd916. Without these fixes, CustomUUID7 took 10k nanoseconds, due to random generator initialization on every call.
BTW, Boost 1.86+ already includes generator for UUID7 which can be used almost as-is. And it has a code handling counter wraparound which current approach doesn't.
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
be/src/util/uuid_generator.h
Outdated
// | unix_ts_ms | ver | rand_a | | ||
// +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ||
// |var| rand_b | |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
private static final int VERSION = 7; | ||
private static final int VARIANT = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These fields are shifted on every method call. What do you think about storing them as bit mask?
private static final int VERSION = 7; | |
private static final int VARIANT = 2; | |
private static final long VERSION = 7 << 12; | |
private static final long VARIANT = 2 << 62; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I'll fix it. But it might sacrifice code readability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
public UUID nextUUID() { | ||
long timestamp = Instant.now().toEpochMilli(); | ||
|
||
int counter = COUNTER.getAndIncrement() & 0xFF; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mask for counter is too short:
1111111111111111111111111111111111111111111111110000000000000000 # 48 bit
0000000000000000000000000000000000000000000000000111000000000000 # 4 bit
0000000000000000000000000000000000000000000000000000000011111111 # 8 bit, should be 12
1111111111111111111111111111111111111111111111110111000011111111 # 4 random bits are always 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
LTGM, but there are 2 open comments about class description, please take a look. |
run buildall |
Thanks. done |
PR approved by anyone and no changes requested. |
TPC-H: Total hot run time: 33816 ms
|
TPC-DS: Total hot run time: 191996 ms
|
ClickBench: Total hot run time: 29.77 s
|
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
ClickBench: Total hot run time: 28.38 s
|
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
std::chrono::duration_cast<std::chrono::milliseconds>(now.time_since_epoch()) | ||
.count(); | ||
|
||
if (millis <= _last_timestamp) { |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
long timestamp = current >>> 12; | ||
long sequence = current & MAX_SEQUENCE; | ||
|
||
if (currentTimestamp < timestamp || (currentTimestamp == timestamp && sequence >= MAX_SEQUENCE)) { |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
run buildall |
|
||
private UUID generateUUID(long timestamp, long sequence) { | ||
// Get counter value (12 bits) | ||
int counter = (int) (sequence & 0xFFF); |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
TPC-H: Total hot run time: 33550 ms
|
TPC-DS: Total hot run time: 182862 ms
|
ClickBench: Total hot run time: 28.88 s
|
run buildall |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
May I propose an alternative? This PR can be splitted to a set of smaller ones:
First 2 PRs can be reviewed and merged without any hesitation about performance and implementation complexity. Also diff for each PR will be smalled than now. |
What problem does this PR solve?
Issue Number: close #49907
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)