Sql table benchmark #7

Merged
merged 8 commits into schema_change from sql_table_benchmark on Mar 23, 2019

Conversation

@yangjuns (Collaborator) commented Mar 22, 2019

Add Benchmarks (a minimal fixture sketch follows the list):

  • SimpleInsert
  • SingleVersionSequentialRead
  • SingleVersionRandomRead
  • MultiVersionMismatchSequentialRead
  • MultiVersionMismatchRandomRead
  • MultiVersionMatchSequentialRead
  • MultiVersionMatchRandomRead
  • ConcurrentInsert
  • ConcurrentSingleVersionRead
  • ConcurrentMultiVersionRead
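
Each of these is a Google Benchmark fixture method. Here is a minimal sketch of the SimpleInsert shape, assuming fixture members like num_inserts_ that are quoted later in this review; the body is illustrative, not the PR's code:

#include <benchmark/benchmark.h>
#include <cstdint>

class SqlTableBenchmark : public benchmark::Fixture {
 protected:
  const uint32_t num_inserts_ = 100;  // illustrative value
};

// NOLINTNEXTLINE
BENCHMARK_DEFINE_F(SqlTableBenchmark, SimpleInsert)(benchmark::State &state) {
  for (auto _ : state) {  // only this loop is timed
    for (uint32_t i = 0; i < num_inserts_; ++i) {
      // table_->Insert(&txn, *redo_, storage::layout_version_t(0));
      benchmark::DoNotOptimize(i);  // stand-in for the insert call
    }
  }
  state.SetItemsProcessed(state.iterations() * num_inserts_);
}
BENCHMARK_REGISTER_F(SqlTableBenchmark, SimpleInsert);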

@yangjuns self-assigned this Mar 22, 2019
@yangjuns added the In Progress (This PR is in progress and not ready to be reviewed) label Mar 22, 2019
Yangjun Sheng added 2 commits March 23, 2019 12:54
@codecov-io commented Mar 23, 2019

Codecov Report

Merging #7 into schema_change will not change coverage.
The diff coverage is 100%.

@@              Coverage Diff               @@
##           schema_change       #7   +/-   ##
==============================================
  Coverage          90.76%   90.76%           
==============================================
  Files                 82       82           
  Lines               3442     3442           
==============================================
  Hits                3124     3124           
  Misses               318      318
Impacted Files Coverage Δ
src/storage/sql_table.cpp 99.17% <100%> (ø) ⬆️

Continue to review the full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7c88a17...7bf8981.

// Insert num_inserts_ tuples into a DataTable in a single thread
// Insert num_inserts_ tuples into a SqlTable concurrently
// NOLINTNEXTLINE
BENCHMARK_DEFINE_F(SqlTableBenchmark, ConcurrentInsert)(benchmark::State &state) {
@yashNaN (Owner):

I am trying to understand how these benchmarks work. What does State represent? I couldn't find it defined anywhere.

@yangjuns (Collaborator, Author) commented Mar 23, 2019:

It uses Google Benchmark (gbenchmark):
https://github.com/google/benchmark
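
For reference, a minimal standalone example of how benchmark::State drives the timing loop (illustrative only, not code from this PR):

#include <benchmark/benchmark.h>

// benchmark::State controls the measurement loop: iterating it repeats the
// timed region until the library has collected a stable sample.
static void BM_Noop(benchmark::State &state) {
  int x = 0;
  for (auto _ : state) {
    benchmark::DoNotOptimize(x);  // the timed work goes here
  }
}
BENCHMARK(BM_Noop);
BENCHMARK_MAIN();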

LOGGING_DISABLED);
std::vector<storage::TupleSlot> read_order;
for (uint32_t i = 0; i < num_reads_; ++i) {
read_order.emplace_back(table_->Insert(&txn, *redo_, storage::layout_version_t(0)));
@yashNaN (Owner):

In this case wouldn't the insert time also be included in the benchmark? Wouldn't it be more accurate to move it to some other setup function that isn't included in the benchmark time?

@yangjuns (Collaborator, Author):

The measured time only includes the code inside

for (auto _ : state) {
    ...
}

Basically, the final benchmark number is the value set by
state.SetItemsProcessed(state.iterations() * num_inserts_) divided by
the time spent running the loop.
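
If setup ever did need to run inside the loop, Google Benchmark can fence it off with state.PauseTiming() / state.ResumeTiming(). A minimal sketch, using a stand-in vector rather than the real table:

#include <benchmark/benchmark.h>
#include <vector>

// Sketch: PauseTiming()/ResumeTiming() exclude the per-iteration setup,
// so only the read loop is measured. The vector stands in for a table.
static void BM_ReadOnly(benchmark::State &state) {
  for (auto _ : state) {
    state.PauseTiming();               // setup is not measured
    std::vector<int> table(1000, 42);  // stand-in for the inserts
    state.ResumeTiming();              // timing resumes for the reads
    long sum = 0;
    for (int v : table) sum += v;      // stand-in for sequential reads
    benchmark::DoNotOptimize(sum);
  }
  state.SetItemsProcessed(state.iterations() * 1000);
}
BENCHMARK(BM_ReadOnly);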

@yangjuns (Collaborator, Author):

Google Benchmark may run each benchmark multiple times to get a more accurate result.

@yashNaN (Owner):

Ah, thanks for clarifying that.

LOGGING_DISABLED);
std::vector<storage::TupleSlot> read_order;
for (uint32_t i = 0; i < num_reads_; ++i) {
read_order.emplace_back(table_->Insert(&txn, *redo_, storage::layout_version_t(0)));
@yashNaN (Owner):

Same as above: wouldn't it be better to exclude the insert time from this benchmark?

@yangjuns (Collaborator, Author):

See above

std::vector<catalog::Schema::Column> new_columns(columns_.begin(), columns_.end() - 1);
new_columns.emplace_back("", type::TypeId::BIGINT, false, col_oid);
catalog::Schema new_schema(new_columns, storage::layout_version_t(1));
table_->UpdateSchema(new_schema);
@yashNaN (Owner):

The schema update should probably also not be included in the benchmark time. It is an expensive operation, and on its own it would make this benchmark slower than reading from a single version, so it wouldn't really give us an accurate comparison.

@yangjuns (Collaborator, Author):

See above

// NOLINTNEXTLINE
for (auto _ : state) {
for (uint32_t i = 0; i < num_inserts_; ++i) {
table_->Select(&txn, read_order[i], read_pr, pair.second, storage::layout_version_t(1));
@yashNaN (Owner):

For sequential read benchmarks, shouldn't we be using Scan, not Select?

std::vector<catalog::Schema::Column> new_columns(columns_.begin(), columns_.end() - 1);
new_columns.emplace_back("", type::TypeId::BIGINT, false, col_oid);
catalog::Schema new_schema(new_columns, storage::layout_version_t(1));
table_->UpdateSchema(new_schema);
@yashNaN (Owner):

Same as the above comment about the schema update.

LOGGING_DISABLED);
std::vector<storage::TupleSlot> read_order;
for (uint32_t i = 0; i < num_reads_; ++i) {
read_order.emplace_back(table_->Insert(&txn, *redo_, storage::layout_version_t(0)));
@yashNaN (Owner):

Same as above, about inserts not being included in the benchmark time.

@yashNaN (Owner) left a review:

A lot of the benchmarks are measuring read times, but the inserts and the schema update are included in those measurements. Isn't that measuring the wrong thing? A multi-versioned read will look slower than a normal read simply because the schema update is included in the multi-versioned read benchmark.
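
One way to exclude that setup while staying in the same framework is the fixture's SetUp, which Google Benchmark runs outside the timed region. A minimal sketch, with stand-in data rather than the real table and schema calls:

#include <benchmark/benchmark.h>
#include <vector>

// Sketch: population happens in SetUp, which is not timed; the benchmark
// body then measures reads only. All names here are illustrative.
class ReadFixture : public benchmark::Fixture {
 public:
  void SetUp(benchmark::State &state) override {
    data_.assign(1000, 7);  // stand-in for the inserts and UpdateSchema
  }
  void TearDown(benchmark::State &state) override { data_.clear(); }

 protected:
  std::vector<int> data_;
};

// NOLINTNEXTLINE
BENCHMARK_DEFINE_F(ReadFixture, ReadsOnly)(benchmark::State &state) {
  for (auto _ : state) {
    long sum = 0;
    for (int v : data_) sum += v;  // stand-in for the Select loop
    benchmark::DoNotOptimize(sum);
  }
}
BENCHMARK_REGISTER_F(ReadFixture, ReadsOnly);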

@yangjuns added the ReadyForReview (This PR is ready for review) label and removed the In Progress label Mar 23, 2019
@yashNaN (Owner) left a review:

Looks good to me.

@jrolli merged commit d81f871 into yashNaN:schema_change Mar 23, 2019
@yangjuns deleted the sql_table_benchmark branch March 25, 2019 19:08
yashNaN pushed a commit that referenced this pull request Sep 4, 2019
* Update friends

* Add builder

* Fix header typo

* Fix build error

* Doxygen update

* Improve bootstrap

* Fix build errors

* Implement allocation of DatabaseCatalog

* Add additional indexes

* Fix includes in database_catalog.h

* Fix namespace error

* Update index metadata for IndexSchema

* Fix build errors

* Add missing functions

* Fix build errors

* Update TPCC to use new API

* Various updates

* Update

* Snapshot before hackathon

* Fix index operations

* Add database catalog implementation file

* Implement DatabaseCatalog tear down function

* Add header to implementation

* Avoid double defs

* Add missing includes

* Catalog Accessor (#2)

* accessor functions

* lint

* Remove obsolete code

* PG_Type Bootstrapping (#3)

* PG_Type bootstrap

* Quick fixes

* Remove double def

* Fix undeclared variables

* Fix reinterpret

* Pass pointers and fix loops

* Change to unsigned

* Fix pci

* Fix C++17-isms

* catalog class fix (#4)

* fix builder build unique index

* some typos like `Initialize` -> `InitializeRow`

* fix typos

* Fix errors

* Catalog refactor (#6)

* Add pg_namespace & pg_attribute

* Change function names

* Add CreateIndex (#7)

* PG_Type bootstrap

* Quick fixes

* Add support for CreateIndex to catalog

* Add #define that was lost in merge

* Fix dumb merge conflict

* Remove lambda and remove memcpy in pg_type bootstrapping

* So much good stuff (#8)

* CreateTable.

* GetTableOid

* Add checks to GetTableOid.

* Fix memory leak in Get stuffs.

* DeleteTable.

* support getIndexOid

* SetTablePointer.

* GetTable.

* add comments for getClassOidKind

* Fix classes_name_index_ accesses.

* Remove dead code.

* Miscellaneous fixes

* Fix build errors and add copy constructor for IndexSchema::Column

* Fix build errors

* More Index catalog functions and general fixes (#9)

* PG_Type bootstrap

* Quick fixes

* Add support for CreateIndex to catalog

* Add #define that was lost in merge

* Fix dumb merge conflict

* Remove lambda and remove memcpy in pg_type bootstrapping

* Add GetIndex

* Implement GetIndexes

* Implement DeleteIndex

* Add missing StageDelete calls

* Remove using std::max to allocate buffer

* Add missing argument to declaration

* Add TODO for deleting of index schema

* Multiple fixes

* Fix errors

* Fix build errors

* Revert constant value expression to master

* Fix API calls

* Regexes are awesome!

* Will anyone see this?

* You've got a friend in me...

* Fix plan_node_json_test.cpp

* Fix syntax

* Fix Column constructors

* Fix parser/planner oddities

* Fix create table plan node

* Strip old assumptions out of deferred actions test

* Fix bwtree index test

* Fix plan node JSON test

* Fix typo

* Fix static declaration

* Fix TPCC loader

* Fix log test

* Fix tests

* Ninja format

* Update CreateDatabase API

* Finish bootstrap logic

* Add missing includes

* Add missing includes and fix typos

* Fix API call

* Fix API

* Fix catalog test

* Fixes

* Fix error in constant value expression

* Fix constructor error

* Fix clang error

* Fix BWTree Index test

* Fixes

* Ninja format

* Fixes?

* Ninja format

* Fix bug in IndexBuilder when computing GenericKey size that could lead to out of bounds access. Added a test to exercise this, along with stronger asserts throughout the code to check this as well.

* C++ is not Python...

* Matt's awesome!

* Matt's less awesome...

* Fix typo

* Add assert

* Fix typo

* Use larger buffer

* Improve error message?

* Fix test

* Fix missing StageDelete

* Fix missing index creation

* Fix missing logic error

* Fix missing search path

* Fix ptr error

* Fix buffer reuse in DeleteTable and DeleteIndex.

* Call teardown

* Cleanup buffer

* Cleanup once more?

* Fix catalog tests

* Add a must_abort_ flag to TransactionContext that is then checked in TransactionManager::Commit(). Currently, it's flipped in indexes (unique insert fails) and SqlTable (write-write conflict).

* Fix catalog tests

* Move ProcessDeferredActions after the unlink step.

* Uniqueness

* Or not...

* Clean up memory leak with varlen index keys.

* A couple TODOs. Currently passes all existing tests without leaking memory.

* clang-format.

* lint fixes.

* Add index test

* tidy fixes.

* Improve index test

* doxygen fixes.

* Remove bad assert

* Fix type

* Fix leak

* Format for Matt

* Fix Release mode build.

* Add search path test

* Fix tidy error in test

* DatabaseCatalog::CreateNamespace cleanup.

* DatabaseCatalog::DeleteNamespace cleanup.

* DatabaseCatalog::GetNamespaceOid cleanup.

* DatabaseCatalog::CreateAttribute cleanup.

* DatabaseCatalog::GetAttribute cleanup.

* DatabaseCatalog::GetAttributes cleanup.

* DatabaseCatalog::DeleteColumns cleanup.

* DatabaseCatalog::GetClassOidKind cleanup.

* DatabaseCatalog::GetIndexes cleanup.

* DatabaseCatalog::SetTablePointer and DatabaseCatalog::SetIndexPointer cleanup.

* Move that VarlenEntry helper method out of pg_attribute.h. Fixed up Doxygen after today's refactors.

* tidy.

* catalog.cpp cleanup.

* Address PR review stuffs.

* Refactor IndexKeySchema::Column signature.

* Fix TPCC schemas.h

* Update builder

* Update API calls

* Update API calls

* Update API calls

* Schema and IndexSchema API consolidation for use in templated catalog methods.

* Remove catalog folder from benchmarks because who benchmarks a catalog?

* Remove old TODO.

* More Schema and IndexSchema cleanup.

* More Schema and IndexSchema cleanup.

Still working on pg_attribute-related fixes, but we're falling down a template rabbit-hole that I might need to excavate myself from soon. It's just getting worse the more we templatize it.

* Explicit instantiation for pg_attribute functions finally compiles. DeleteColumns also removes the entries from pg_attribute.

* Add assert to IndexSchema constructor regarding is_unique and is_primary. Also added one in the IndexBuilder to make sure is_unique matches ConstraintType.
Labels: ReadyForReview (This PR is ready for review)
Projects: none yet
Linked issues: none yet
4 participants