Sql table benchmark #7

Merged
merged 8 commits into schema_change from sql_table_benchmark on Mar 23, 2019

Conversation

@yangjuns (Collaborator) commented Mar 22, 2019

Add Benchmarks (a minimal fixture sketch follows the list):

  • SimpleInsert
  • SingleVersionSequentialRead
  • SingleVersionRandomRead
  • MultiVersionMismatchSequentialRead
  • MultiVersionMismatchRandomRead
  • MultiVersionMatchSequentialRead
  • MultiVersionMatchRandomRead
  • ConcurrentInsert
  • ConcurrentSingleVersionRead
  • ConcurrentMultiVersionRead
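
Each of these is a Google Benchmark fixture method. Here is a minimal sketch of the SimpleInsert shape, assuming fixture members like num_inserts_ that are quoted later in this review; the body is illustrative, not the PR's code:

#include <benchmark/benchmark.h>
#include <cstdint>

class SqlTableBenchmark : public benchmark::Fixture {
 protected:
  const uint32_t num_inserts_ = 100;  // illustrative value
};

// NOLINTNEXTLINE
BENCHMARK_DEFINE_F(SqlTableBenchmark, SimpleInsert)(benchmark::State &state) {
  for (auto _ : state) {  // only this loop is timed
    for (uint32_t i = 0; i < num_inserts_; ++i) {
      // table_->Insert(&txn, *redo_, storage::layout_version_t(0));
      benchmark::DoNotOptimize(i);  // stand-in for the insert call
    }
  }
  state.SetItemsProcessed(state.iterations() * num_inserts_);
}
BENCHMARK_REGISTER_F(SqlTableBenchmark, SimpleInsert);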

@yangjuns self-assigned this Mar 22, 2019
@yangjuns added the In Progress (This PR is in progress and not ready to be reviewed) label Mar 22, 2019
Yangjun Sheng added 2 commits March 23, 2019 12:54
@codecov-io commented Mar 23, 2019

Codecov Report

Merging #7 into schema_change will not change coverage.
The diff coverage is 100%.

@@              Coverage Diff               @@
##           schema_change       #7   +/-   ##
==============================================
  Coverage          90.76%   90.76%           
==============================================
  Files                 82       82           
  Lines               3442     3442           
==============================================
  Hits                3124     3124           
  Misses               318      318
Impacted Files Coverage Δ
src/storage/sql_table.cpp 99.17% <100%> (ø) ⬆️

Continue to review the full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7c88a17...7bf8981.

// Insert num_inserts_ tuples into a DataTable in a single thread
// Insert num_inserts_ tuples into a SqlTable concurrently
// NOLINTNEXTLINE
BENCHMARK_DEFINE_F(SqlTableBenchmark, ConcurrentInsert)(benchmark::State &state) {
@yashNaN (Owner):

I am trying to understand how these benchmarks work. What does State represent? I couldn't find it defined anywhere.

@yangjuns (Collaborator, Author) commented Mar 23, 2019:

It uses Google Benchmark (gbenchmark):
https://github.com/google/benchmark
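
For reference, a minimal standalone example of how benchmark::State drives the timing loop (illustrative only, not code from this PR):

#include <benchmark/benchmark.h>

// benchmark::State controls the measurement loop: iterating it repeats the
// timed region until the library has collected a stable sample.
static void BM_Noop(benchmark::State &state) {
  int x = 0;
  for (auto _ : state) {
    benchmark::DoNotOptimize(x);  // the timed work goes here
  }
}
BENCHMARK(BM_Noop);
BENCHMARK_MAIN();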

LOGGING_DISABLED);
std::vector<storage::TupleSlot> read_order;
for (uint32_t i = 0; i < num_reads_; ++i) {
read_order.emplace_back(table_->Insert(&txn, *redo_, storage::layout_version_t(0)));
@yashNaN (Owner):

In this case wouldn't the insert time also be included in the benchmark? Wouldn't it be more accurate to move it to some other setup function that isn't included in the benchmark time?

@yangjuns (Collaborator, Author):

The measured time only includes the code inside

for (auto _ : state) {
    ...
}

Basically, the final benchmark number is the value set by
state.SetItemsProcessed(state.iterations() * num_inserts_) divided by
the time spent running the loop.
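
If setup ever did need to run inside the loop, Google Benchmark can fence it off with state.PauseTiming() / state.ResumeTiming(). A minimal sketch, using a stand-in vector rather than the real table:

#include <benchmark/benchmark.h>
#include <vector>

// Sketch: PauseTiming()/ResumeTiming() exclude the per-iteration setup,
// so only the read loop is measured. The vector stands in for a table.
static void BM_ReadOnly(benchmark::State &state) {
  for (auto _ : state) {
    state.PauseTiming();               // setup is not measured
    std::vector<int> table(1000, 42);  // stand-in for the inserts
    state.ResumeTiming();              // timing resumes for the reads
    long sum = 0;
    for (int v : table) sum += v;      // stand-in for sequential reads
    benchmark::DoNotOptimize(sum);
  }
  state.SetItemsProcessed(state.iterations() * 1000);
}
BENCHMARK(BM_ReadOnly);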

@yangjuns (Collaborator, Author):

Google Benchmark may run each benchmark multiple times to get a more accurate result.

@yashNaN (Owner):

Ah, thanks for clarifying that.

LOGGING_DISABLED);
std::vector<storage::TupleSlot> read_order;
for (uint32_t i = 0; i < num_reads_; ++i) {
read_order.emplace_back(table_->Insert(&txn, *redo_, storage::layout_version_t(0)));
@yashNaN (Owner):

Same as above: wouldn't it be better to exclude the insert time from this benchmark?

@yangjuns (Collaborator, Author):

See above

std::vector<catalog::Schema::Column> new_columns(columns_.begin(), columns_.end() - 1);
new_columns.emplace_back("", type::TypeId::BIGINT, false, col_oid);
catalog::Schema new_schema(new_columns, storage::layout_version_t(1));
table_->UpdateSchema(new_schema);
@yashNaN (Owner):

The schema update should probably also not be included in the benchmark time. It is an expensive operation, and on its own it would make this benchmark slower than reading from a single version, so it wouldn't really give us an accurate comparison.

@yangjuns (Collaborator, Author):

See above

// NOLINTNEXTLINE
for (auto _ : state) {
for (uint32_t i = 0; i < num_inserts_; ++i) {
table_->Select(&txn, read_order[i], read_pr, pair.second, storage::layout_version_t(1));
@yashNaN (Owner):

For sequential read benchmarks, shouldn't we be using Scan, not Select?

std::vector<catalog::Schema::Column> new_columns(columns_.begin(), columns_.end() - 1);
new_columns.emplace_back("", type::TypeId::BIGINT, false, col_oid);
catalog::Schema new_schema(new_columns, storage::layout_version_t(1));
table_->UpdateSchema(new_schema);
@yashNaN (Owner):

Same as the above comment about the schema update.

LOGGING_DISABLED);
std::vector<storage::TupleSlot> read_order;
for (uint32_t i = 0; i < num_reads_; ++i) {
read_order.emplace_back(table_->Insert(&txn, *redo_, storage::layout_version_t(0)));
@yashNaN (Owner):

Same as above, about inserts not being included in the benchmark time.

@yashNaN (Owner) left a review:

A lot of the benchmarks are measuring read times, but the inserts and the schema update are included in those measurements. Isn't that measuring the wrong thing? A multi-versioned read will look slower than a normal read simply because the schema update is included in the multi-versioned read benchmark.
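
One way to exclude that setup while staying in the same framework is the fixture's SetUp, which Google Benchmark runs outside the timed region. A minimal sketch, with stand-in data rather than the real table and schema calls:

#include <benchmark/benchmark.h>
#include <vector>

// Sketch: population happens in SetUp, which is not timed; the benchmark
// body then measures reads only. All names here are illustrative.
class ReadFixture : public benchmark::Fixture {
 public:
  void SetUp(benchmark::State &state) override {
    data_.assign(1000, 7);  // stand-in for the inserts and UpdateSchema
  }
  void TearDown(benchmark::State &state) override { data_.clear(); }

 protected:
  std::vector<int> data_;
};

// NOLINTNEXTLINE
BENCHMARK_DEFINE_F(ReadFixture, ReadsOnly)(benchmark::State &state) {
  for (auto _ : state) {
    long sum = 0;
    for (int v : data_) sum += v;  // stand-in for the Select loop
    benchmark::DoNotOptimize(sum);
  }
}
BENCHMARK_REGISTER_F(ReadFixture, ReadsOnly);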

@yangjuns added the ReadyForReview (This PR is ready for review) label and removed the In Progress label Mar 23, 2019
@yashNaN (Owner) left a review:

Looks good to me.

@jrolli merged commit d81f871 into yashNaN:schema_change Mar 23, 2019
@yangjuns deleted the sql_table_benchmark branch March 25, 2019 19:08
yashNaN pushed a commit that referenced this pull request Sep 4, 2019
* Update friends

* Add builder

* Fix header typo

* Fix build error

* Doxygen update

* Improve bootstrap

* Fix build errors

* Implement allocation of DatabaseCatalog

* Add additional indexes

* Fix includes in database_catalog.h

* Fix namespace error

* Update index metadata for IndexSchema

* Fix build errors

* Add missing functions

* Fix build errors

* Update TPCC to use new API

* Various updates

* Update

* Snapshot before hackathon

* Fix index operations

* Add database catalog implementation file

* Implement DatabaseCatalog tear down function

* Add header to implementation

* Avoid double defs

* Add missing includes

* Catalog Accessor (#2)

* accessor functions

* lint

* Remove obsolete code

* PG_Type Bootstrapping (#3)

* PG_Type bootstrap

* Quick fixes

* Remove double def

* Fix undeclared variables

* Fix reinterpret

* Pass pointers and fix loops

* Change to unsigned

* Fix pci

* Fix C++17-isms

* catalog class fix (#4)

* fix builder build unique index

* some typos like `Initialize` -> `InitializeRow`

* fix typos

* Fix errors

* Catalog refactor (#6)

* Add pg_namespace & pg_attribute

* Change function names

* Add CreateIndex (#7)

* PG_Type bootstrap

* Quick fixes

* Add support for CreateIndex to catalog

* Add #define that was lost in merge

* Fix dumb merge conflict

* Remove lambda and remove memcpy in pg_type bootstrapping

* So much good stuff (#8)

* CreateTable.

* GetTableOid

* Add checks to GetTableOid.

* Fix memory leak in Get stuffs.

* DeleteTable.

* support getIndexOid

* SetTablePointer.

* GetTable.

* add comments for getClassOidKind

* Fix classes_name_index_ accesses.

* Remove dead code.

* Miscellaneous fixes

* Fix build errors and add copy constructor for IndexSchema::Column

* Fix build errors

* More Index catalog functions and general fixes (#9)

* PG_Type bootstrap

* Quick fixes

* Add support for CreateIndex to catalog

* Add #define that was lost in merge

* Fix dumb merge conflict

* Remove lambda and remove memcpy in pg_type bootstrapping

* Add GetIndex

* Implement GetIndexes

* Implement DeleteIndex

* Add missing StageDelete calls

* Remove using std::max to allocate buffer

* Add missing argument to declaration

* Add TODO for deleting of index schema

* Multiple fixes

* Fix errors

* Fix build errors

* Revert constant value expression to master

* Fix API calls

* Regexes are awesome!

* Will anyone see this?

* You've got a friend in me...

* Fix plan_node_json_test.cpp

* Fix syntax

* Fix Column constructors

* Fix parser/planner oddities

* Fix create table plan node

* Strip old assumptions out of deferred actions test

* Fix bwtree index test

* Fix plan node JSON test

* Fix typo

* Fix static declaration

* Fix TPCC loader

* Fix log test

* Fix tests

* Ninja format

* Update CreateDatabase API

* Finish bootstrap logic

* Add missing includes

* Add missing includes and fix typos

* Fix API call

* Fix API

* Fix catalog test

* Fixes

* Fix error in constant value expression

* Fix constructor error

* Fix clang error

* Fix BWTree Index test

* Fixes

* Ninja format

* Fixes?

* Ninja format

* Fix bug in IndexBuilder when computing GenericKey size that could lead to out of bounds access. Added a test to exercise this, along with stronger asserts throughout the code to check this as well.

* C++ is not Python...

* Matt's awesome!

* Matt's less awesome...

* Fix typo

* Add assert

* Fix typo

* Use larger buffer

* Improve error message?

* Fix test

* Fix missing StageDelete

* Fix missing index creation

* Fix missing logic error

* Fix missing search path

* Fix ptr error

* Fix buffer reuse in DeleteTable and DeleteIndex.

* Call teardown

* Cleanup buffer

* Cleanup once more?

* Fix catalog tests

* Add a must_abort_ flag to TransactionContext that is then checked in TransactionManager::Commit(). Currently, it's flipped in indexes (unique insert fails) and SqlTable (write-write conflict).

* Fix catalog tests

* Move ProcessDeferredActions after the unlink step.

* Uniqueness

* Or not...

* Clean up memory leak with varlen index keys.

* A couple TODOs. Currently passes all existing tests without leaking memory.

* clang-format.

* lint fixes.

* Add index test

* tidy fixes.

* Improve index test

* doxygen fixes.

* Remove bad assert

* Fix type

* Fix leak

* Format for Matt

* Fix Release mode build.

* Add search path test

* Fix tidy error in test

* DatabaseCatalog::CreateNamespace cleanup.

* DatabaseCatalog::DeleteNamespace cleanup.

* DatabaseCatalog::GetNamespaceOid cleanup.

* DatabaseCatalog::CreateAttribute cleanup.

* DatabaseCatalog::GetAttribute cleanup.

* DatabaseCatalog::GetAttributes cleanup.

* DatabaseCatalog::DeleteColumns cleanup.

* DatabaseCatalog::GetClassOidKind cleanup.

* DatabaseCatalog::GetIndexes cleanup.

* DatabaseCatalog::SetTablePointer and DatabaseCatalog::SetIndexPointer cleanup.

* Move that VarlenEntry helper method out of pg_attribute.h. Fixed up Doxygen after today's refactors.

* tidy.

* catalog.cpp cleanup.

* Address PR review stuffs.

* Refactor IndexKeySchema::Column signature.

* Fix TPCC schemas.h

* Update builder

* Update API calls

* Update API calls

* Update API calls

* Schema and IndexSchema API consolidation for use in templated catalog methods.

* Remove catalog folder from benchmarks because who benchmarks a catalog?

* Remove old TODO.

* More Schema and IndexSchema cleanup.

* More Schema and IndexSchema cleanup.

Still working on pg_attribute-related fixes, but we're falling down a template rabbit-hole that I might need to excavate myself from soon. It's just getting worse the more we templatize it.

* Explicit instantiation for pg_attribute functions finally compiles. DeleteColumns also removes the entries from pg_attribute.

* Add assert to IndexSchema constructor regarding is_unique and is_primary. Also added one in the IndexBuilder to make sure is_unique matches ConstraintType.
Labels: ReadyForReview (This PR is ready for review)
Projects: none yet
Linked issues: none yet
4 participants