
Optimization - CPU #11

Open
tasket opened this issue Dec 14, 2018 · 10 comments
Labels: help wanted, optimization

@tasket
Owner

tasket commented Dec 14, 2018

Changes that may improve throughput, especially for send:

  • Multithreaded encoding layer for compression and encryption
  • Other areas for concurrency: getting deltas, dedup init, send and receive main loops
  • Alternatives to tar streaming, such as direct file IO for internal: destination
  • Static buffers to avoid garbage collection
  • Structs, especially in deduplication code
  • Explore new Python optimization options
  • Tighten the main send loop, use locals
  • Use formats instead of + string concat
  • Quicker compression – see issue #23 (Zstandard compression)
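The "static buffers" item above can be sketched as follows. This is a minimal illustration, not Wyng's actual code: one preallocated buffer is refilled via `readinto()` instead of letting each `read()` allocate a fresh bytes object, which is what feeds the garbage collector.

```python
# Illustrative sketch: reuse a single preallocated buffer across reads.
# CHUNK and checksum_volume are hypothetical names, not Wyng internals.
import io

CHUNK = 128 * 1024          # hypothetical chunk size
buf = bytearray(CHUNK)      # allocated once, reused for every chunk
view = memoryview(buf)      # zero-copy slicing for short final reads

def checksum_volume(stream):
    """Consume a binary stream chunk-by-chunk without per-read allocation."""
    total = 0
    while True:
        n = stream.readinto(buf)   # fills the same buffer in place
        if n == 0:
            break
        total += sum(view[:n])     # stand-in for compress/encrypt work
    return total

result = checksum_volume(io.BytesIO(b"\x01" * (CHUNK + 10)))
```

The same pattern applies to any hot loop that currently does `data = f.read(chunksize)` per iteration.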
@tasket
Owner Author

tasket commented Dec 19, 2018

An optimization attempt was posted to the optimize1 branch. Unfortunately, the limited testing I did showed little if any difference.

I may try re-introducing some of these changes on top of alpha4 and do some more extensive trials.

@tasket
Owner Author

tasket commented Feb 22, 2019

(Note) Some optimization of prune/merge was recently done by setting LC_ALL=C and using -m merge where possible. In some cases this slashes pruning time by more than 75%.
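The LC_ALL=C trick works because it makes GNU sort compare raw bytes instead of doing locale-aware collation, and `-m` merges already-sorted inputs without a full re-sort. A sketch of invoking it from Python (the helper name and file paths are illustrative, not Wyng's actual code):

```python
# Illustrative sketch: merge pre-sorted manifest files with byte-order
# collation. merge_sorted_manifests is a hypothetical helper name.
import os
import subprocess

def merge_sorted_manifests(paths, out_path):
    env = dict(os.environ, LC_ALL="C")   # byte-wise comparison, much faster
    with open(out_path, "wb") as out:
        # sort -m assumes each input is already sorted (in the same order)
        subprocess.run(["sort", "-m", *paths], env=env, stdout=out, check=True)
```

Note that `-m` only produces correct output if every input was itself sorted under LC_ALL=C.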

@tasket tasket added the help wanted label Jul 27, 2020
@tasket tasket mentioned this issue Feb 20, 2024
@tasket
Owner Author

tasket commented Feb 23, 2024

#179 (comment)

An interesting observation.

The first 3 bursts of traffic are:

with 128K blocks
with 2048K blocks
with 2048K blocks and compression set to zstd:1

Those tests were run against a 500GB test volume.
The 4th burst is the 22TB volume.
I would expect the B/s to be in the same general ballpark, but there is something else going on.

[image attachment: traffic graph]

@tasket
Owner Author

tasket commented Feb 23, 2024

@alvinstarr I think what may be going on is the large difference in CPython's garbage-collection workload, due to dynamic buffering having to juggle larger buffers. (Hmmm. Does a 1MB buffer behave much differently?)

Wyng does not yet use static buffering for transfer operations. And I always suspected that locally-based archives would someday throw performance issues that were masked by net access into high relief (as your benchmark just did).

It would also be interesting to see the difference, for instance, with the helper script removed from the local transfer loop. That, in combination with static buffers, could make a big difference, IMO. However, the limitations of the zstandard lib I'm currently using preclude static buffering.

One really cheap (and safe) tweak you could try in the Wyng code is to remove the file IO buffering= parameter, letting CPython adjust automatically:

    # Open source volume and its delta bitmap as r, session manifest as w.
    with open(volpath,"rb", buffering=chunksize) as vf,    \

(Moved to issue 11.)

@tasket
Owner Author

tasket commented Feb 26, 2024

@alvinstarr I've tested a simple modification to Wyng that bypasses the tarfile streaming when the destination is a local filesystem. This improves the throughput by 17%.

The test parameters: same SSD source and destination, 2MB chunks, zstd:0, and no encryption.
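The tar-bypass idea can be sketched as follows: when the destination is a local filesystem, write each chunk straight to its own file instead of framing everything through a tar stream. Function and file names here are illustrative, not Wyng's actual code; the tarfile version is shown only for comparison.

```python
# Illustrative sketch of direct file IO vs. tar streaming for send.
import io
import os
import tarfile

def send_chunks_direct(chunks, dest_dir):
    """Write (name, data) chunks as plain files -- no tar framing."""
    os.makedirs(dest_dir, exist_ok=True)
    for name, data in chunks:
        with open(os.path.join(dest_dir, name), "wb") as f:
            f.write(data)

def send_chunks_tar(chunks, dest_stream):
    """The tarfile-streaming equivalent, for comparison."""
    with tarfile.open(fileobj=dest_stream, mode="w|") as tar:
        for name, data in chunks:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
```

The direct path skips per-chunk tar header construction and the extra copy through the stream, which is where the observed gain plausibly comes from.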

@tasket
Owner Author

tasket commented Feb 26, 2024

BTW, removing the buffering=chunksize parameter did not improve throughput.

@alvinstarr

Got sidetracked a bit.
I will take a look at your changes and see if they help our situation.

We have a backup and an incremental of our 27TB volume.
It's good to hear about the speed improvement, since it took 4 days to run the 27TB backup.
The first incremental took close to 2 days.

We are looking at copying our backups to a remote location so that we can have off-site storage.
As part of that process we started copying using rsync and scp, but we ran into bandwidth-delay product (BDP) problems.
To get around this we are using bbcp (https://www.slac.stanford.edu/~abh/bbcp/).
The speed improvement has been from 20 Mbps to 500 Mbps by using bbcp.
Not sure if you can leverage it, but bbcp can provide a huge speed improvement for large data sets.

@tasket
Owner Author

tasket commented Feb 29, 2024

@alvinstarr Sorry, I got sidetracked as well. I just posted the --tar-bypass optimization for send after fixing some bugs. Use it with send when the dest archive is local (file:/...). It will indicate it is bypassing the tar stream.

The kicker is that while I'm seeing throughput increase >17% on an archive with 2MB chunks, I also tried an archive with 1MB chunks. It appears that the 1MB chunk size is the fastest for sending an initial/whole volume, regardless of whether tar-bypass is used, and the gain in throughput going from 2MB to 1MB is about 25%.

Overall, sending to a 1MB-chunk archive while using --tar-bypass, I saw as much as a 37% gain in throughput.

The tar-bypass is considered experimental at this point, although I don't anticipate it causing any issues.

Thanks for the tip about bbcp for backup copies. To be useful inside Wyng, a copy/archive utility would have to handle streams from memory as well as files (this is why Python's tarfile lib was used). Getting a similar multi-thread, multi-stream boost in wyng send will probably require asyncio or one of the newer multiprocess options.
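As a rough sketch of the multi-stream idea, the CPU-bound compression step could be fanned out across a worker pool while the main loop keeps feeding chunks. This uses zlib as a stand-in for zstd (both release the GIL while compressing, so threads genuinely overlap); the function names are illustrative, not Wyng's actual code.

```python
# Illustrative sketch: parallel chunk compression with a thread pool.
import zlib
from concurrent.futures import ThreadPoolExecutor

def compress_chunk(data: bytes) -> bytes:
    return zlib.compress(data, 1)     # level 1, i.e. "quicker compression"

def compress_all(chunks, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves chunk order, which a backup stream requires.
        return list(pool.map(compress_chunk, chunks))
```

Ordering matters here: `map()` yields results in submission order, so the archive stream stays deterministic even though compression completes out of order.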

@tasket
Owner Author

tasket commented Feb 29, 2024

@alvinstarr PS - You may want to look into bbcp's behavior when updating existing file sets. The documentation has scant info on that subject and I could not figure out if it would skip files based on file timestamps, for example. It also doesn't seem to have a delete feature. So my own preference would be, after the initial bbcp copy, to use rsync -aH --delete to update the offsite backups. Of course, if you have doubts about the effect of a copy or update, you can always use Wyng's arch-check feature to check the integrity of the copy.

@alvinstarr

alvinstarr commented Mar 1, 2024

bbcp is definitely not a replacement for rsync, because of the delete and existing-file sync issues you mentioned.
Also, for lots of small files like we are running here, the best way to use bbcp is in pipe mode with tar on both ends.

bbcp may not integrate well into what you're doing, but it may be possible to leverage the knowledge and work that has gone into its network socket processing.
