Sql table benchmark #7
Conversation
… so that it's comparable to data_table benchmark
Codecov Report
@@            Coverage Diff            @@
##           schema_change      #7   +/- ##
==============================================
  Coverage       90.76%   90.76%
==============================================
  Files              82       82
  Lines            3442     3442
==============================================
  Hits             3124     3124
  Misses            318      318
Continue to review full report at Codecov.
// Insert the num_inserts_ of tuples into a DataTable in a single thread
// Insert the num_inserts_ of tuples into a SqlTable concurrently
// NOLINTNEXTLINE
BENCHMARK_DEFINE_F(SqlTableBenchmark, ConcurrentInsert)(benchmark::State &state) {
I am trying to understand how these benchmarks work. What does State represent? I couldn't find it defined anywhere.
It uses Google Benchmark (gbenchmark).
https://github.com/google/benchmark
  LOGGING_DISABLED);
  std::vector<storage::TupleSlot> read_order;
  for (uint32_t i = 0; i < num_reads_; ++i) {
    read_order.emplace_back(table_->Insert(&txn, *redo_, storage::layout_version_t(0)));
In this case, wouldn't the insert time also be included in the benchmark? Wouldn't it be more accurate to move the inserts to a setup function that isn't included in the benchmark time?
The measured time only includes code inside

for (auto _ : state) {
  ...
}

Basically, the final benchmark number is the items processed, set by

state.SetItemsProcessed(state.iterations() * num_inserts_);

divided by the time spent running the loop.
Google Benchmark may run each benchmark multiple times to get a more accurate result.
Ah, thanks for clarifying that.
  LOGGING_DISABLED);
  std::vector<storage::TupleSlot> read_order;
  for (uint32_t i = 0; i < num_reads_; ++i) {
    read_order.emplace_back(table_->Insert(&txn, *redo_, storage::layout_version_t(0)));
Same as above: wouldn't it be better to exclude the insert time from this benchmark?
See above
  std::vector<catalog::Schema::Column> new_columns(columns_.begin(), columns_.end() - 1);
  new_columns.emplace_back("", type::TypeId::BIGINT, false, col_oid);
  catalog::Schema new_schema(new_columns, storage::layout_version_t(1));
  table_->UpdateSchema(new_schema);
The schema update should probably also be excluded from the benchmark time. It is an expensive operation, and including it alone would make this benchmark slower than reading from a single version, so it wouldn't give us an accurate comparison.
See above
  // NOLINTNEXTLINE
  for (auto _ : state) {
    for (uint32_t i = 0; i < num_inserts_; ++i) {
      table_->Select(&txn, read_order[i], read_pr, pair.second, storage::layout_version_t(1));
For the sequential read benchmarks, shouldn't we be using Scan, not Select?
  std::vector<catalog::Schema::Column> new_columns(columns_.begin(), columns_.end() - 1);
  new_columns.emplace_back("", type::TypeId::BIGINT, false, col_oid);
  catalog::Schema new_schema(new_columns, storage::layout_version_t(1));
  table_->UpdateSchema(new_schema);
Same thing as above comment about update schema
  LOGGING_DISABLED);
  std::vector<storage::TupleSlot> read_order;
  for (uint32_t i = 0; i < num_reads_; ++i) {
    read_order.emplace_back(table_->Insert(&txn, *redo_, storage::layout_version_t(0)));
Same as above: the inserts shouldn't be included in the benchmark time.
A lot of the benchmarks are measuring read times, but the inserts and the schema update are included in those measurements. Isn't that measuring the wrong thing? A multi-versioned read will appear slower than a normal read simply because UpdateSchema is included in the multi-versioned read benchmark.
Looks good to me.
Add Benchmarks
SimpleInsert
SingleVersionSequentialRead
SingleVersionRandomRead
MultiVersionMismatchSequentialRead
MultiVersionMismatchRandomRead
MultiVersionMatchSequentialRead
MultiVersionMatchRandomRead
ConcurrentInsert
ConcurrentSingleVersionRead
ConcurrentMultiVersionRead