Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature Request: VReplication Defer Secondary Index Creation #11699

Closed
mattlord opened this issue Nov 12, 2022 · 0 comments · Fixed by #11700
Closed

Feature Request: VReplication Defer Secondary Index Creation #11699

mattlord opened this issue Nov 12, 2022 · 0 comments · Fixed by #11700

Comments

@mattlord
Copy link
Contributor

mattlord commented Nov 12, 2022

Feature Description

For MoveTables and Reshard , VReplication should defer secondary index creation for a table until after the table has been copied.

As a quick example, you can see ~ 20% improvement for a small dataset this way:

cd /tmp
mkdir /tmp/myidxtestdata
git clone https://github.com/mattlord/test_db.git
cd test_db

# Install MySQL 8.0.31: https://dev.mysql.com/downloads/mysql/
mysqld --initialize-insecure --datadir=/tmp/myidxtestdata
mysqld --no-defaults --datadir=/tmp/myidxtestdata &

# Load data with secondary indexes
time mysql -u root < ./employees.sql
# ~ 25 seconds on avg
mysql -u root -e "drop database employees"

# Load data without secondary indexes and add at the end
time mysql -u root < ./employees_deferred_keys.sql
# ~ 20 seconds on avg
mysql -u root -e "drop database employees"

mysqladmin -u root shutdown
rm -rf /tmp/myidxtestdata

In practice we've seen upwards of 10x improvement when importing tables that are hundreds of GiBs when creating the secondary indexes after the tables have been copied during a MoveTables.

The one-shot secondary index creation after table copy should be even faster with MySQL 8.0.31+ due to InnoDB support for parallel index builds:

InnoDB: InnoDB now supports parallel index builds, which improves index build performance. In particular, loading sorted index entries into a B-tree is now multithreaded. Previously, this action was performed by a single thread.

Use Case(s)

  • Moving/Importing very large tables
  • Resharding very large tables

Note: The value of doing this becomes exponential as the table size and number of secondary indexes grow. We've seen some cases where the time to copy a very large table goes from days to hours.

@mattlord mattlord self-assigned this Nov 12, 2022
@mattlord mattlord added this to Backlog in VReplication via automation Nov 12, 2022
@mattlord mattlord moved this from Backlog to Prioritized in VReplication Nov 12, 2022
@mattlord mattlord moved this from Prioritized to In progress in VReplication Nov 12, 2022
VReplication automation moved this from In progress to Done Jan 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

Successfully merging a pull request may close this issue.

1 participant