PostgreSQL - Reduce memory usage by not tracking metadata of the files backed up #1101
Conversation
Force-pushed from de8a0b9 to 9e641b1
@usernamedt I feel naming of
@usernamedt Can you review and approve the workflows?
@usernamedt A gentle reminder!
Hi! Currently, we plan to merge this PR first and then proceed to this one. |
Hi! We merged #1114; can you please rebase this PR onto it?
Sure |
…s backed up

By default, WAL-G tracks metadata of the files backed up. If millions of files are backed up (typically in the case of hundreds of databases with thousands of tables each), tracking this metadata alone can require GBs of memory. If `--reduce-memory-usage` or `WALG_REDUCE_MEMORY_USAGE` is enabled, WAL-G does not track metadata of the files backed up. This significantly reduces memory usage on instances with more than 100k files.

Limitations:
* Cannot be used with `rating-composer`, `copy-composer`
* Cannot be used with `delta-from-user-data`, `delta-from-name`, `add-user-data`
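To illustrate the idea behind the flag, here is a minimal sketch of no-op metadata tracking. The interface and type names below are illustrative only (they echo, but are not, WAL-G's actual `BundleFiles` API): a regular tracker holds one entry per file, so memory grows with file count, while a no-op tracker discards everything.

```go
package main

import "fmt"

// BundleFiles abstracts how per-file backup metadata is tracked.
// (Interface and method names here are hypothetical, not WAL-G's exact API.)
type BundleFiles interface {
	AddFile(path string, size int64)
	Count() int
}

// RegularBundleFiles keeps an entry per file: O(n) memory for n files.
type RegularBundleFiles struct {
	files map[string]int64
}

func (r *RegularBundleFiles) AddFile(path string, size int64) { r.files[path] = size }
func (r *RegularBundleFiles) Count() int                      { return len(r.files) }

// NopBundleFiles discards metadata: O(1) memory regardless of file count.
type NopBundleFiles struct{}

func (NopBundleFiles) AddFile(path string, size int64) {}
func (NopBundleFiles) Count() int                      { return 0 }

func main() {
	reg := &RegularBundleFiles{files: map[string]int64{}}
	nop := NopBundleFiles{}
	for i := 0; i < 1000; i++ {
		p := fmt.Sprintf("base/16384/%d", i)
		reg.AddFile(p, 8192)
		nop.AddFile(p, 8192)
	}
	// The regular tracker grew with every file; the no-op tracker held nothing.
	fmt.Println(reg.Count(), nop.Count())
}
```

With millions of files, the difference between the two strategies is exactly the GBs of metadata the PR description mentions.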
…hout-files-metadata (WALG_WITHOUT_FILES_METADATA)
Force-pushed from 4859943 to 70260bc
```diff
@@ -62,9 +62,8 @@ func chooseTablespaceSpecification(sentinelDtoSpec, spec *TablespaceSpec) *Table

 // TODO : unit tests
 // deltaFetchRecursion function composes Backup object and recursively searches for necessary base backup
-func deltaFetchRecursionOld(backupName string, folder storage.Folder, dbDataDirectory string,
+func deltaFetchRecursionOld(backup Backup, folder storage.Folder, dbDataDirectory string,
```
This change is not related to the files metadata uploading, but helps to avoid downloading it twice.
Force-pushed from 70260bc to b7abd11
Database name
PostgreSQL
Pull request description
Similar to issue #738, WAL-G consumes huge amounts of memory when backing up an instance with millions of files. This is due to tracking the metadata of each file, maintaining the list of files in each TarFileSet, and finally marshalling the data structure to text to create the `sentinel.json` file. Even though PR #740 addresses the marshalling part by streaming it through a pipe, significant memory usage in the `BackupSentinelDto` data structure still persists.

Based on its usage, `BackupSentinelDto` is needed only for configurations such as `DELTA`, `INCREMENTAL`, `RATING_COMPOSER`, `catchup-push`, etc. Since we don't use any of the above-mentioned configs, the `Files` and `TarFileSets` data in `BackupSentinelDto` are simply unused.

To reduce the memory usage of WAL-G on instances with millions of files, and to make this available and configurable for the community as a whole, `NopTarBallComposer` and `NopBundleFiles` are implemented; they do not track any of the file metadata, the list of files in each TarFileSet, or the list of TarFileSets itself. `reduce-memory-usage` is supported on both local and remote backups.

Usage: `wal-g backup-push [path] --reduce-memory-usage`, or set the `WALG_REDUCE_MEMORY_USAGE=true` environment variable.

(Successor to PR #1037)
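A quick back-of-envelope shows why per-file metadata reaches GBs. The per-entry size below is an assumption for illustration (path string, mtime, flags, map overhead), not a measurement of WAL-G's actual structs:

```go
package main

import "fmt"

func main() {
	// Assumed rough footprint of one file's metadata entry in memory:
	// path string, timestamps, flags, plus map/slice bookkeeping.
	const entryBytes = 200

	// The repro scenario below: 500 databases x 1000 tables, and each
	// table typically has several relation files (heap, fsm, vm).
	files := 500 * 1000 * 3

	totalGB := float64(entryBytes*files) / 1e9
	fmt.Printf("%.1f GB\n", totalGB)
}
```

Even under these conservative assumptions the tracking map alone costs hundreds of MB, before counting the TarFileSets lists and the JSON marshalling buffers that `sentinel.json` creation adds on top.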
Please provide steps to reproduce (if it's a bug)
Create 500+ databases with 1000+ tables in each database (not necessary to insert any data). Execute WAL-G `backup-push` and observe that the memory usage of `wal-g` gradually increases over time until it fails with OOM after exhausting available memory (10 GB in our case).