
PostgreSQL - Reduce memory usage by not tracking metadata of the files backed up #1101

Merged: 3 commits, merged on Nov 25, 2021


@krnaveen14 (Contributor) commented Sep 26, 2021

Database name

PostgreSQL

Pull request description

Similar to issue #738, WAL-G consumes a huge amount of memory when backing up an instance with millions of files. This is caused by tracking the metadata of each file, maintaining the list of files in each TarFileSet, and finally marshalling the whole data structure to text to create the sentinel.json file. Although PR #740 addresses the marshalling part by streaming it through a pipe, the BackupSentinelDto data structure itself still uses a significant amount of memory.

Judging by how BackupSentinelDto is used, this data is needed only for configurations such as DELTA, INCREMENTAL, RATING_COMPOSER, catchup-push, etc. Since we don't use any of these, the Files and TarFileSets data in BackupSentinelDto is simply unused.

To reduce WAL-G's memory usage on instances with millions of files, and to make this available and configurable for the community as a whole, this PR implements NopTarBallComposer and NopBundleFiles, which do not track any file metadata, the list of files in each TarFileSet, or the list of TarFileSets itself.
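The Nop implementations follow the null-object pattern. A minimal sketch of that idea, assuming a simplified, hypothetical `bundleFiles` interface (WAL-G's real BundleFiles interface has more methods and different signatures):

```go
package main

import (
	"fmt"
	"sync"
)

// bundleFiles is a simplified, hypothetical file-tracking interface.
type bundleFiles interface {
	AddFile(name string, size int64)
	Count() int
}

// regularBundleFiles remembers every file, so memory grows with file count.
type regularBundleFiles struct {
	mu    sync.Mutex
	files map[string]int64
}

func newRegularBundleFiles() *regularBundleFiles {
	return &regularBundleFiles{files: make(map[string]int64)}
}

func (b *regularBundleFiles) AddFile(name string, size int64) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.files[name] = size
}

func (b *regularBundleFiles) Count() int {
	b.mu.Lock()
	defer b.mu.Unlock()
	return len(b.files)
}

// nopBundleFiles satisfies the same interface but discards everything,
// so its memory use stays constant however many files are added.
type nopBundleFiles struct{}

func (nopBundleFiles) AddFile(string, int64) {}

func (nopBundleFiles) Count() int { return 0 }

// newBundleFiles picks an implementation once; callers never branch again.
func newBundleFiles(withoutMetadata bool) bundleFiles {
	if withoutMetadata {
		return nopBundleFiles{}
	}
	return newRegularBundleFiles()
}

func main() {
	for _, nop := range []bool{false, true} {
		b := newBundleFiles(nop)
		for i := 0; i < 1000; i++ {
			b.AddFile(fmt.Sprintf("base/1/%d", i), 8192)
		}
		fmt.Printf("nop=%v tracked=%d\n", nop, b.Count())
	}
}
```

Because the implementation is chosen once when the backup starts, the code that walks the data directory can call `AddFile` unconditionally, and the tracking cost disappears without extra branching at the call sites.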

`--reduce-memory-usage` is supported for both local and remote backups.

Usage: `wal-g backup-push [path] --reduce-memory-usage`, or set the environment variable `WALG_REDUCE_MEMORY_USAGE=true`.
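One way a flag and an environment variable like these can be combined, sketched with only Go's standard library. The precedence (an explicitly set flag wins, then `WALG_REDUCE_MEMORY_USAGE`, then `false`) is an assumption for illustration; WAL-G resolves settings through its own configuration layer, not this code.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"strconv"
)

// reduceMemoryUsage resolves the option from args and the environment.
// Assumed precedence: explicit flag > WALG_REDUCE_MEMORY_USAGE > false.
func reduceMemoryUsage(args []string, getenv func(string) string) (bool, error) {
	fs := flag.NewFlagSet("backup-push", flag.ContinueOnError)
	flagVal := fs.Bool("reduce-memory-usage", false, "do not track files metadata")
	if err := fs.Parse(args); err != nil {
		return false, err
	}
	flagSet := false
	// Visit only walks flags that were actually set on the command line.
	fs.Visit(func(f *flag.Flag) {
		if f.Name == "reduce-memory-usage" {
			flagSet = true
		}
	})
	if flagSet {
		return *flagVal, nil
	}
	if v := getenv("WALG_REDUCE_MEMORY_USAGE"); v != "" {
		return strconv.ParseBool(v)
	}
	return false, nil
}

func main() {
	enabled, err := reduceMemoryUsage(os.Args[1:], os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Println("reduce memory usage:", enabled)
}
```

Go's `flag` package stops parsing at the first positional argument, so in this sketch the flag must come before `[path]`; it does not reflect how `wal-g` itself parses its arguments.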

(Successor to PR #1037)

Please provide steps to reproduce (if it's a bug)

Create 500+ databases with 1000+ tables in each (no data needs to be inserted). Run `wal-g backup-push` and observe WAL-G's memory usage grow steadily until it fails with OOM after exhausting the available memory (10 GB in our case).

@krnaveen14 (Contributor, Author):

@usernamedt I feel the name `--reduce-memory-usage` is rather naive. Any suggestions would be welcome.

Review thread on cmd/pg/backup_push.go (outdated, resolved)
@krnaveen14 (Contributor, Author):

@usernamedt Can you review and approve the workflows?

@krnaveen14 (Contributor, Author):


@usernamedt A Gentle Reminder!

@usernamedt (Member):


Hi! Currently, we plan to merge this PR first and then proceed to this one.

@usernamedt (Member):

Hi! We merged #1114. Can you please rebase this PR onto it?

@krnaveen14 (Contributor, Author):


Sure

…s backed up

By default, WAL-G tracks the metadata of every file it backs up. When millions of files are backed up (typically with hundreds of databases and thousands of tables in each), tracking this metadata alone can require gigabytes of memory.

If `--reduce-memory-usage` or `WALG_REDUCE_MEMORY_USAGE` is enabled, WAL-G does not track the metadata of backed-up files. This significantly reduces memory usage on instances with more than 100k files.
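To get a feel for how per-file tracking scales, the heap cost of a path-keyed metadata map can be measured directly. This Go sketch uses a hypothetical `fileMeta` record rather than WAL-G's real structures, so the numbers it prints are only indicative and say nothing definitive about WAL-G's actual per-file overhead.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// fileMeta is a hypothetical per-file record, not WAL-G's real type.
type fileMeta struct {
	IsSkipped     bool
	IsIncremented bool
	MTime         time.Time
}

// heapBytesFor fills a map with n entries keyed by data-directory-like
// paths and returns how much the live heap grew while the map was alive.
func heapBytesFor(n int) int64 {
	var before, after runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)

	m := make(map[string]fileMeta)
	for i := 0; i < n; i++ {
		// Unique key per i: ~1000 "relations" per "database" directory.
		m[fmt.Sprintf("base/%d/%d", 16384+i/1000, 16384+i%1000)] = fileMeta{MTime: time.Now()}
	}

	runtime.GC()
	runtime.ReadMemStats(&after)
	runtime.KeepAlive(m) // keep m reachable until after the measurement
	return int64(after.HeapAlloc) - int64(before.HeapAlloc)
}

func main() {
	for _, n := range []int{10_000, 100_000, 1_000_000} {
		b := heapBytesFor(n)
		fmt.Printf("%9d files: ~%d MiB live heap, ~%d bytes/file\n", n, b>>20, b/int64(n))
	}
}
```

The growth is roughly linear in the number of files, which is why skipping the bookkeeping entirely (rather than making it cheaper) matters most on instances with hundreds of thousands of relation files.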

Limitations

* Cannot be used with `rating-composer` or `copy-composer`
* Cannot be used with `delta-from-user-data`, `delta-from-name`, or `add-user-data`
…hout-files-metadata (WALG_WITHOUT_FILES_METADATA)
@@ -62,9 +62,8 @@ func chooseTablespaceSpecification(sentinelDtoSpec, spec *TablespaceSpec) *Table

// TODO : unit tests
// deltaFetchRecursion function composes Backup object and recursively searches for necessary base backup
-func deltaFetchRecursionOld(backupName string, folder storage.Folder, dbDataDirectory string,
+func deltaFetchRecursionOld(backup Backup, folder storage.Folder, dbDataDirectory string,
Review comment (Member):


This change is not related to uploading the files metadata, but it helps avoid downloading it twice.
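The diff only shows the parameter changing from `backupName string` to `backup Backup`. As a hedged illustration of why passing the object rather than its name can avoid a second download, here is a Go sketch in which a hypothetical `backup` type caches its sentinel with `sync.Once`; `fetch` stands in for reading the sentinel from storage, and `describeBase` is an invented example callee.

```go
package main

import (
	"fmt"
	"sync"
)

// sentinelDto is a hypothetical stand-in for a downloaded backup sentinel.
type sentinelDto struct {
	IncrementFrom *string // nil for a full backup
}

// backup lazily fetches its sentinel once and caches the result.
// fetch is a hypothetical hook standing in for a storage read.
type backup struct {
	Name  string
	fetch func(name string) (sentinelDto, error)

	once     sync.Once
	sentinel sentinelDto
	err      error
}

func (b *backup) Sentinel() (sentinelDto, error) {
	b.once.Do(func() { b.sentinel, b.err = b.fetch(b.Name) })
	return b.sentinel, b.err
}

// describeBase takes the *backup itself rather than its name, so a sentinel
// the caller already fetched is reused instead of downloaded again.
func describeBase(b *backup) (string, error) {
	s, err := b.Sentinel()
	if err != nil {
		return "", err
	}
	if s.IncrementFrom == nil {
		return b.Name + " is a full backup", nil
	}
	return b.Name + " is a delta from " + *s.IncrementFrom, nil
}

func main() {
	downloads := 0
	b := &backup{Name: "base_000000010000000000000003", fetch: func(string) (sentinelDto, error) {
		downloads++
		return sentinelDto{}, nil
	}}
	if _, err := b.Sentinel(); err != nil { // the caller inspects the sentinel first
		panic(err)
	}
	desc, err := describeBase(b) // the callee reuses the cached copy
	if err != nil {
		panic(err)
	}
	fmt.Println(desc, "| downloads:", downloads) // prints: base_000000010000000000000003 is a full backup | downloads: 1
}
```

`sync.Once` also caches a failed fetch, so this sketch never retries; a real implementation might prefer to retry on error.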
