vreplication: performance & benchmarks (table copying) #7881

Merged: 6 commits, Apr 21, 2021

Conversation

@vmg vmg (Collaborator) commented Apr 16, 2021

Description

Over the upcoming weeks I'm going to be working on vreplication performance. It's a pretty complex part of Vitess' codebase and it hasn't received a lot of performance tuning in the past, so I'm hoping to get up to speed with the implementation and give it some perf attention.

Let's take a look at the changes in this PR, commit by commit:

  • endtoend: improve vreplication tests: Critically, there are currently no benchmarks for vreplication, and it turns out the system is complex enough (with interdependencies between vtctl, vtgate, vttablet and mysqld) that writing microbenchmarks for it is not feasible. Instead, to get a starting point for performance tuning, I've written some end-to-end performance tests, which give us benchmarking estimates and let us look at performance profiles for all the moving parts in the system. The first test included here tackles the simplest (possibly?) and most performance-sensitive part of vreplication: the initial table copy.

    For those following along at home, I might as well explain some vreplication basics:

    A vreplication stream links two vttablet instances by copying one or more tables between the underlying mysqld instances of each tablet and keeping them in sync. To accomplish this, the stream is divided into three phases: first, there is a table copying phase (what we're profiling and optimizing in this PR), which does bulk reads from the source tablet and sends the rows to the target for bulk inserting. Afterwards, there is a catchup phase, which applies any row changes that happened on the source while the initial copy was running. Once the stream has caught up and there's no significant replication lag between the two tablets, we switch to the final phase, where we replicate rows in real time from the MySQL binary log.

    The table copying phase is, for many vreplication workflows, the most performance-intensive part of the process. For instance, if you're using vreplication to re-shard a cluster, you'll want to copy the tables as fast as possible, and once the target has caught up, switch traffic to the new shards. Hence, you'd spend the vast majority of the time doing the copy and very little time actually streaming. Other uses, such as materializing views, make much more use of streaming, but streaming can usually keep up with the changes in the source cluster because most clusters are not write-heavy, so optimizing it doesn't feel as urgent to me. We'll look at streaming in the following weeks.

  • pprof: do not create profiles until we start profiling: this is a small cleanup to the profiling handlers we wrote for vtgate and vttablet. These handlers don't start profiling in their respective processes until they receive a USR2 signal, but they were previously creating an on-disk profile file as soon as the process started. Since our integration suite doesn't profile all the available tablets in the integration cluster, this resulted in several zero-sized profiles on disk for the tablets that were not being profiled.

  • endtoend: allow bulk-loading databases: another important cleanup for our endtoend suite. We need very large datasets in MySQL in order to gather realistic profiling benchmarks, and building these for each test run can be very expensive (expensive enough to make the profile-optimize-build cycle very painful). To speed up this operation, this commit improves Vitess' mysqlctl helper so that spawned mysqld instances can have a configurable secure-file-priv path. This flag allows us to use MySQL's LOAD DATA INFILE to bulk-load millions of rows from CSV files on disk.

  • vreplication: improve copy performance: this is the first actual optimization derived from the CPU profiles of our copy test. It tackles table copy performance on the target side (i.e. the tablet receiving the rows to be inserted locally). The result is very good: it reduces CPU usage by 60% and memory allocations by 90%. Unfortunately, the effective improvement in wall-clock time is not as large: between 15% and 20% faster runtime. (A rough sketch of the technique follows after this list.)

    Some thoughts on this: one of the issues we've seen in large production clusters that use vreplication is that, since the replication streams connect two vttablets, the replication process is competing for CPU cycles with the actual query serving code, which also runs in the vttablets. From this point of view, the massive CPU reduction in this optimization is a very good thing, because it means more spare CPU cycles to continue serving MySQL queries from the tablet. Why does it translate into only a 15% reduction in runtime? If we look at a syscall blocking profile of the tablet, the answer is clear: the optimization of the insert queries is comprehensive enough that it brings the runtime of the table copying process all the way down to MySQL insert performance. The copying process cannot go any faster because MySQL doesn't ingest our rows any faster.

    Not all is lost, of course: there are still things we can do to make MySQL ingest the rows faster, and I intend to look into that next week. Another side effect of the receiving vttablet now being bottlenecked on MySQL insert performance is that, in theory, there are no gains to be had by optimizing row generation in the source vttablet. In practice, the same issue we've just discussed also applies to the source tablet: the replication flow competes for CPU cycles with the actual query load, so any optimizations we make there, while not reducing the wall-clock runtime of the replication process, will result in more query throughput on production servers. I intend to tackle that next week too.
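To make the shape of that change concrete, here is a minimal sketch of the general technique (render a whole batch of rows into a single multi-value INSERT using a reusable buffer), under the assumption that per-row statement building was the allocation hot spot. The rowBuffer, writeValue and buildBulkInsert names are illustrative only, not the actual vreplication code:

package main

import (
	"fmt"
	"strings"
)

// rowBuffer is a minimal stand-in for the reusable Buffer type added in this
// PR (shown later in the review): Reset keeps the backing array, so once the
// buffer has grown to fit a batch, subsequent batches stop allocating.
type rowBuffer struct{ bytes []byte }

func (b *rowBuffer) Reset()               { b.bytes = b.bytes[:0] }
func (b *rowBuffer) WriteByte(c byte)     { b.bytes = append(b.bytes, c) }
func (b *rowBuffer) WriteString(s string) { b.bytes = append(b.bytes, s...) }
func (b *rowBuffer) String() string       { return string(b.bytes) }

// writeValue is a stand-in for per-type SQL encoding; the real code uses
// sqltypes.Value encoders that escape according to the column type.
func writeValue(buf *rowBuffer, val string) {
	buf.WriteByte('\'')
	buf.WriteString(strings.ReplaceAll(val, "'", "''"))
	buf.WriteByte('\'')
}

// buildBulkInsert renders one multi-value INSERT for a whole batch of rows
// instead of one statement per row, reusing buf across calls.
func buildBulkInsert(buf *rowBuffer, table string, columns []string, rows [][]string) string {
	buf.Reset()
	buf.WriteString("INSERT INTO ")
	buf.WriteString(table)
	buf.WriteString(" (")
	buf.WriteString(strings.Join(columns, ", "))
	buf.WriteString(") VALUES ")
	for i, row := range rows {
		if i > 0 {
			buf.WriteByte(',')
		}
		buf.WriteByte('(')
		for j, val := range row {
			if j > 0 {
				buf.WriteByte(',')
			}
			writeValue(buf, val)
		}
		buf.WriteByte(')')
	}
	return buf.String()
}

func main() {
	var buf rowBuffer
	rows := [][]string{{"1", "alice"}, {"2", "bob"}}
	fmt.Println(buildBulkInsert(&buf, "customer", []string{"id", "name"}, rows))
	// INSERT INTO customer (id, name) VALUES ('1','alice'),('2','bob')
}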

For now, let's wrap this up by taking a look at the impact of this optimization on the flame graphs for the vttablet process:

Before: (flame graph of the vttablet CPU profile before the optimization)

After: (flame graph of the vttablet CPU profile after the optimization)

We can see two clear columns in both profiles: the Go runtime (on the left) and our own code (on the right). The profile of our own code is further divided into two clear columns: gRPC streaming reads (left) and query preparation for the MySQL bulk inserts (right). The result of optimizing the query preparation step is significant, and it shows in how the relative CPU usage of gRPC, which used to be roughly 25% of the runtime, now becomes 75% of the total runtime. Obviously, the actual CPU usage of gRPC is constant between the two profiles, so the fact that it now dominates our CPU profile means that query preparation is using only residual amounts of CPU. The next possible optimization here is Protobuf (de)serialization for the incoming rows, which I intend to tackle next week.
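(As a rough back-of-the-envelope check: if gRPC accounted for about 25 units out of 100 in the old profile and its absolute cost is unchanged, then its being ~75% of the new profile implies the new total is around 33 units, i.e. the query-preparation side dropped from roughly 75 units to single digits, which is consistent with the ~60% CPU reduction quoted above.)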

Related Issue(s)

Checklist

  • Should this PR be backported?
  • Tests were added or are not required
  • Documentation was added or is not required

Deployment Notes

Impacted Areas in Vitess

Components that this PR will affect:

  • Query Serving
  • VReplication
  • Cluster Management
  • Build/CI
  • VTAdmin

@deepthi deepthi (Member) left a comment

🚀

Comment on lines 64 to 75
func (buf *Buffer) StringUnsafe() string {
	return *(*string)(unsafe.Pointer(&buf.bytes))
}

func (buf *Buffer) Reset() {
	buf.bytes = buf.bytes[:0]
}

Member

I'm surprised the linter didn't catch this. Exported methods need comments.

@@ -284,6 +284,36 @@ func (v Value) EncodeSQL(b BinWriter) {
}
}

// EncodeSQLStringBuilder is identical to EncodeSQL but it takes a strings.Builder
// as its writer, so it can be inlined for performance.
func (v Value) EncodeSQLStringBuilder(b *strings.Builder) {
Member

We have two sets of duplicated code in this file that are crying out for generics.
I don't suppose it is worth generating them (since it is only two copies).
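For illustration only, here is a sketch of the kind of type-parameterized helper this is alluding to. It is purely hypothetical: Go generics were not yet in a stable release when this PR was written, and the sqlWriter / encodeSQLString names are not from the Vitess codebase:

package main

import (
	"fmt"
	"strings"
)

// sqlWriter captures just the methods the encoder needs. Both *strings.Builder
// and a bytes-backed buffer type could satisfy it, and using it as a type
// parameter (rather than an interface value) lets the compiler specialize and
// potentially inline each instantiation.
type sqlWriter interface {
	WriteString(s string) (int, error)
	WriteByte(c byte) error
}

// encodeSQLString is a hypothetical shared implementation; the real encoders
// escape according to the column type rather than always quoting a string.
func encodeSQLString[W sqlWriter](val string, w W) {
	w.WriteByte('\'')
	w.WriteString(strings.ReplaceAll(val, "'", "''"))
	w.WriteByte('\'')
}

func main() {
	var sb strings.Builder
	encodeSQLString("O'Reilly", &sb)
	fmt.Println(sb.String()) // prints 'O''Reilly'
}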

@@ -211,7 +207,7 @@ func (vc *VitessCluster) AddKeyspace(t *testing.T, cells []*Cell, ksName string,
keyspace.VSchema = vschema
for _, cell := range cells {
if len(cell.Vtgates) == 0 {
fmt.Println("Starting vtgate")
t.Logf("Starting vtgate")
Member

Do we still get logs to console with this change?
I usually run go test -alsologtostderr -v.

@shlomi-noach shlomi-noach (Contributor) commented Apr 19, 2021

Noting that I'm going to review this, probably slowly 😛

@vmg vmg force-pushed the vmg/vrperf branch 3 times, most recently from b59e211 to 5df0ff8 on April 20, 2021 at 10:01
vmg added 5 commits on April 21, 2021, each signed off by Vicent Marti <vmg@strn.cat>.
}
defer conn.Close()

query := fmt.Sprintf("LOAD DATA INFILE '%s' INTO TABLE `%s` FIELDS TERMINATED BY ',' ENCLOSED BY '\"'", tmpbulk.Name(), table)
Contributor

This assumes you are importing a relation that exactly matches the target table. Just making sure: LOAD DATA INFILE will not work this way if:

  1. the target table is a materialized view and has more columns than the original table, or
  2. this is an Online DDL operation, and the target table may look altogether different from the source table

Collaborator Author

Right, this is only meant for loading data in the unit tests, so the author of the test must make sure that the CSV file it writes actually fits the target table. I'm not sure how we could enforce that programmatically.
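For readers unfamiliar with the pattern, here is a rough sketch of how a test might bulk-load a CSV this way, using the standard database/sql package rather than Vitess' own mysqlctl helpers. The DSN, file path and helper name are placeholders; the CSV must already sit under the server's secure-file-priv directory, and its columns must line up with the target table:

package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/go-sql-driver/mysql" // registers the "mysql" driver with database/sql
)

// bulkLoadCSV loads a CSV file that already lives under the server's
// secure-file-priv directory into the given table via LOAD DATA INFILE.
func bulkLoadCSV(db *sql.DB, csvPath, table string) error {
	query := fmt.Sprintf(
		"LOAD DATA INFILE '%s' INTO TABLE `%s` FIELDS TERMINATED BY ',' ENCLOSED BY '\"'",
		csvPath, table,
	)
	_, err := db.Exec(query)
	return err
}

func main() {
	db, err := sql.Open("mysql", "root@tcp(127.0.0.1:3306)/test")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	if err := bulkLoadCSV(db, "/var/lib/mysql-files/customers.csv", "customer"); err != nil {
		log.Fatal(err)
	}
}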
