
Improve MySQL restore performance #1131

Merged: ostinru merged 20 commits into master from readahead on Nov 23, 2021

Conversation

@ostinru (Contributor) commented on Oct 20, 2021

This patch improves wal-g-mysql backup-push/backup-fetch by splitting the single-stream backup into a multi-stream backup:

To keep the approach database-agnostic, we make no assumptions about the backup stream wal-g receives on stdin.
Instead, wal-g splits the backup stream into blocks of WALG_STREAM_SPLITTER_BLOCK_SIZE bytes (in storages/splitmerge/splitreader.go). The blocks are then distributed across WALG_STREAM_SPLITTER_PARTITIONS output streams, and each stream is compressed, encrypted, and uploaded separately.
The restore process does the opposite: it fetches multiple streams (utilizing the network via multiple connections), decrypts and decompresses each stream independently (utilizing more CPU cores), and then merges all streams back into one (in storages/splitmerge/mergewriter.go).
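
To make the fan-out/fan-in concrete, here is a minimal sketch of the split/merge idea. This is not the PR's actual code (that lives in storages/splitmerge/); the channel-based wiring, names, and signatures are illustrative assumptions.

```go
package splitmerge

import "io"

// splitReader reads src in fixed-size blocks and hands block i to
// partition i % partitions, round-robin. Each partition can then be
// compressed, encrypted, and uploaded by its own goroutine.
func splitReader(src io.Reader, partitions, blockSize int) []<-chan []byte {
	outs := make([]chan []byte, partitions)
	result := make([]<-chan []byte, partitions)
	for i := range outs {
		outs[i] = make(chan []byte)
		result[i] = outs[i]
	}
	go func() {
		defer func() {
			for _, ch := range outs {
				close(ch) // signal end-of-stream to every partition
			}
		}()
		for i := 0; ; i = (i + 1) % partitions {
			block := make([]byte, blockSize)
			n, err := io.ReadFull(src, block)
			if n > 0 {
				outs[i] <- block[:n] // only the final block may be short
			}
			if err != nil {
				return // io.EOF or io.ErrUnexpectedEOF: stream is over
			}
		}
	}()
	return result
}

// mergeWriter restores the original byte order on fetch: it pulls blocks
// from the partitions in the same round-robin order and writes them to dst.
func mergeWriter(dst io.Writer, parts []<-chan []byte) error {
	for i := 0; ; i = (i + 1) % len(parts) {
		block, ok := <-parts[i]
		if !ok {
			return nil // splitter closed the channels: stream complete
		}
		if _, err := dst.Write(block); err != nil {
			return err
		}
	}
}
```

Because blocks are assigned strictly round-robin, the merger needs no per-block sequence numbers: reading the partitions in the same fixed order reconstructs the stream.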

In my benchmarks of wal-g backup-fetch LATEST --turbo:

  • WAL-G vanilla: ~70 MB/s of data written to SSD
  • WAL-G patched (6 streams, 64 KB blocks): ~270 MB/s of data written to SSD

@ostinru requested review from mialinx and a team as code owners on October 20, 2021 18:18
Resolved review threads (outdated) on:

  • internal/backup_fetch_handler.go
  • internal/compression/compression_default.go
  • internal/config.go
  • internal/databases/mysql/backup_push_handler.go
  • pkg/storages/storage/splitter.go
  • pkg/storages/storage/mergewriter.go (two threads)

A review thread on this hunk:

```
		continue
	}
	rbytes := len(block)
	wbytes, err := sink.Write(block)
```

Contributor:

io.Writer may write the block partially without returning an error.
We should retry with the rest of the block until it is fully written or an error is returned.

Contributor Author:

According to the io.Writer documentation:
"Write must return a non-nil error if it returns n < len(p)."
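
For reference, the defensive loop the reviewer describes would look roughly like this (a hypothetical helper, not code from this PR; assumes the standard io import). Given the contract quoted above, it only matters for writers that violate io.Writer semantics:

```go
// writeFull keeps writing the remainder of block until everything is
// written or an error is returned. A conforming io.Writer already
// guarantees a non-nil error whenever n < len(p), so this is purely
// defensive.
func writeFull(w io.Writer, block []byte) (int, error) {
	written := 0
	for written < len(block) {
		n, err := w.Write(block[written:])
		written += n
		if err != nil {
			return written, err
		}
		if n == 0 {
			return written, io.ErrShortWrite // avoid spinning on a zero-progress writer
		}
	}
	return written, nil
}
```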

More resolved review threads (outdated) on:

  • pkg/storages/storage/mergewriter.go
  • pkg/storages/s3/s3reader.go
  • internal/uploader.go

A review thread on this hunk:

```
@@ -155,3 +196,19 @@ func (uploader *Uploader) UploadMultiple(objects []UploadObject) error {
	}
	return nil
}

func (uploader *Uploader) ChangeDirectory(relativePath string) {
```

Member:

I guess that SetFolder(storage.Folder) would be more universal.

Contributor Author:

SetFolder sounds like an absolute path. ChangeDirectory is like Unix cd.
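
To illustrate the distinction (a sketch with assumed field names, not the PR's exact code): ChangeDirectory resolves against the uploader's current folder, like Unix cd, while SetFolder would replace the location outright:

```go
// Assumes the Uploader carries a storage.Folder and that the Folder
// interface exposes GetSubFolder, as wal-g's storage.Folder does.
func (uploader *Uploader) ChangeDirectory(relativePath string) {
	// like `cd relativePath`: resolve relative to the current folder
	uploader.UploadingFolder = uploader.UploadingFolder.GetSubFolder(relativePath)
}

func (uploader *Uploader) SetFolder(folder storage.Folder) {
	// like jumping to an absolute path: discard the current location
	uploader.UploadingFolder = folder
}
```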

More resolved review threads (outdated) on:

  • internal/stream_push_helper.go
  • pkg/storages/splitmerge/channelreader.go

A review thread on this hunk:

```
)

type BackupStreamMetadata struct {
	Type string `json:"type"`
```

Member:

If this metadata does not consume much space, maybe just add it to the sentinel?

Contributor Author:

It will increase coupling between the uploader metadata and the sentinel.


Another thread, on this hunk:

```
}

func UploadBackupStreamMetadata(uploader UploaderProvider, metadata interface{}, backupName string) error {
	sentinelName := MetadataNameFromBackup(backupName)
```

Member:

Well, this name interferes with the Postgres metadata.json file but has a different purpose. Since this file contains only the stream-specific metadata, maybe name it something like stream_metadata.json?


Another thread, on this hunk:

```
}

	uploader := &SplitStreamUploader{
		Uploader: &Uploader{
```

Member:

Just pass the Uploader to the NewSplitStreamUploader to avoid constructing the internal uploader. Actually, can you use the UploaderProvider interface here?
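
What the reviewer suggests might look like this (hypothetical signature and fields, shown only to make the suggestion concrete):

```go
// NewSplitStreamUploader accepts an already-built uploader rather than
// constructing one internally, so any UploaderProvider implementation
// can be passed in.
func NewSplitStreamUploader(uploader UploaderProvider, partitions, blockSize int) *SplitStreamUploader {
	return &SplitStreamUploader{
		UploaderProvider: uploader,
		partitions:       partitions,
		blockSize:        blockSize,
	}
}
```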

Commit messages:

 * Improve download speed by splitting the backup stream into multiple blobs, each of which can be decoded & decompressed independently
 * Use a background readahead goroutine to separate the decoding and decompressing stages (sketched below, after the commit messages)
 * Add a concurrent range-based S3 reader (WIP; see the sketch right after this list)
 * Save the type of the backup in the sentinel
 * Use the sentinel in the backup-fetch command
 * Improve error handling
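
The range-based S3 reader mentioned in the list boils down to GET requests with an HTTP Range header, so several chunks of one object can be fetched concurrently. A minimal sketch using aws-sdk-go v1 (function name and wiring are illustrative):

```go
package s3read

import (
	"fmt"
	"io"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/s3"
)

// readRange fetches bytes [from, to] of an object via an HTTP Range
// header; running several of these in goroutines yields a concurrent,
// range-based download of a single blob.
func readRange(svc *s3.S3, bucket, key string, from, to int64) ([]byte, error) {
	out, err := svc.GetObject(&s3.GetObjectInput{
		Bucket: aws.String(bucket),
		Key:    aws.String(key),
		Range:  aws.String(fmt.Sprintf("bytes=%d-%d", from, to)),
	})
	if err != nil {
		return nil, err
	}
	defer out.Body.Close()
	return io.ReadAll(out.Body)
}
```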
…allelism.

  However, it is better to just increase `WALG_STREAM_SPLITTER_PARTITIONS`: in that case
  parallelism is increased without adding an unnecessary memory copy.
…hronously copy data with size greater than 2*blockSize.
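
The background readahead from the commits above, sketched (names are illustrative, not the PR's code): a goroutine keeps pulling blocks from the upstream stage into a buffered channel, so the downstream stage rarely waits on it; the channel depth is the readahead window.

```go
package readahead

import "io"

// readAhead decouples two pipeline stages: the goroutine keeps up to
// depth blocks in flight, so the consumer can decompress while the
// producer is still fetching/decoding.
func readAhead(src io.Reader, blockSize, depth int) <-chan []byte {
	out := make(chan []byte, depth) // buffered channel = readahead window
	go func() {
		defer close(out)
		for {
			buf := make([]byte, blockSize)
			n, err := io.ReadFull(src, buf)
			if n > 0 {
				out <- buf[:n]
			}
			if err != nil {
				return // io.EOF / io.ErrUnexpectedEOF: upstream done
			}
		}
	}()
	return out
}
```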
@ostinru merged commit a4caa14 into master on Nov 23, 2021
@ostinru deleted the readahead branch on November 23, 2021 at 14:53