Performance hits of user choices #81
There is already an issue for generic CPU optimization. Re-compression dominates the CPU cost, and the respective efficiency of both options depends on just how much the archive copy differs from the local volume being over-written (more difference = less efficiency). This type of issue is why backup tools like Wyng try to migrate to efficient compression libraries as soon as they can, bc uncompressed data chunks cannot be safely compared. Hashtype is already the fastest type, sha256. Chunk size IIRC was chosen to reduce the amount of metadata being sent over a slow network and to reduce the metadata that had to be verified in the Heads env. You might try an archive configured with smaller chunk sizes (the default is 65536) to see how that impacts send/receive ops.
Did a comparison to measure the virtualization and additional IO costs.
Windows-standalone-root was chosen because it's the biggest LVM I had on hand, weighing 16GB of backed-up data in the backup storage's compressed size (26465MiB on thin LVM, as reported by the Qubes OS manager). AppVM (Qubes OS mode):
Locally mounted LVM in dom0, same archive, but now without the IO + virtualization overhead:
There is another reason why --sparse can be slower: without sparse, the list of chunks to be sent is pre-fetched by the helper program, but with sparse it must wait for the local system to compress+compare before receiving the next chunk identifier. So that introduces latency. An idea for the future would be to overlap the comparison with the chunk fetching.
From #83 (comment)
Some clarifications. The 100% speed improvement was gained by using --sparse-write over --sparse on a locally mounted archive dir. It made no difference in my current tests over an sshfs-mounted, LUKS-mapped container on a loopback raw file. EDIT: Will verify the results of the SSHFS tuning advice (to see whether that is the 100% improvement expected here).
@tasket some results on a fresh Q4.1 install (and why I posted 3 bug reports). At the time of writing (2022-04-14):
Qubes 4.1 clean-install backup. root-autosnap is created at shutdown by a systemd system-shutdown hook at /usr/lib/systemd/system-shutdown/root-autosnap.shutdown:
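The hook's body did not survive in this thread; a minimal sketch of what such a script could look like, assuming a thin LVM snapshot in the default qubes_dom0 volume group (the VG/LV names are assumptions, not taken from the actual script):

```sh
#!/bin/sh
# Hypothetical reconstruction: refresh the root-autosnap thin snapshot at
# shutdown so Wyng can later back up a quiescent copy of dom0's root LV.
# The names qubes_dom0/root are assumptions.
lvm lvremove -f qubes_dom0/root-autosnap 2>/dev/null
lvm lvcreate -s -n root-autosnap qubes_dom0/root
```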
Interestingly enough, specifying vm-pool at arch-init still permits backing up root-autosnap from wyng. Basically, for the next tests, we vary the arch-init settings:
Most of the CPU operations happen in dom0, where wyng-backup seems to be waiting on IO. Unknowns: the cost of encryption (cannot test --encrypt=off on "Wyng 0.4.0alpha release 20220104"; bugs reported individually). Knowns:
@tasket !!!! Finally found a cheap provider to experiment with: veeble.org, $5 USD a month, 2GB RAM, 20GB SSD and 100TB bandwidth. Was able to duplicate the rsync.net subaccount setup based on basic user rights, and to create an rw account with an ro sub-account (in a subdir) used to specify what OEM image type is there (q41_insurgo here as an example), where the ssh authorized_keys is simply put somewhere else via an sshd_config override on user match:
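The override itself was elided above; a minimal sketch of the idea, with hypothetical account names and paths:

```
# sshd_config fragment (illustrative): keep authorized_keys outside the
# read-only account's home so the account cannot modify its own keys.
Match User q41-ro
    AuthorizedKeysFile /etc/ssh/authorized_keys.d/%u
```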
So safe state restoration as a service is totally feasible on cheap, storage-friendly VPS services (again, no 0.4 encryption testing, but I see no stopper there. Please fix #112 though!)

@tasket: you got an x230 laptop? This might go faster now. I see you are not active on Matrix?

Basically, the RW account is used by the OEM/org to create the archive under the RO account, which is made available for accessing backup archives on the condition of having one's public key in the authorized_keys above. The RO account is used in a qubes-ssh-specified app-qube per dom0 to retrieve trusted state archives, and it works pretty well as opposed to sshfs (now deprecated anyway...).

We will have a problem offering states as a service though. As of now, I see that the wyng helper script and errors are at a shared location; if multiple people were using the service at the same time, those should be isolated under different paths on the ssh server host. Want me to create an individual issue?

Some comparison of the performance differences of the current modes with wyng-backup defaults at arch-init. With/without --dedup and --sparse-write:
With/without --dedup and --sparse:

We see that the bandwidth consumed is strongly reduced, but CPU and processing time increase dramatically. Consequently, I think an emphasis should be made on the difference between sparse and sparse-write in the documentation. This has high impact and I wish I could trace what accounts for the difference a bit more. dom0 was using 10-20% CPU the whole time, so it was not busy enough to account for the difference in processing time. Load average on the server is 0.00 0.01 0.05, so if something could be done by the server to help the client speed things up (through the helper), that might be a nice avenue here. There seems to be no real reason to use all that bandwidth given the past results; something seems to be missing to catch and transmit only the needed changes faster. @tasket: thoughts? My intuition here is that the client could upload a bit more about its mapping to the server (4MB uploaded vs 361MB downloaded here, over an hour, while the previous tests completed in minutes on a 50mbit download link).
Conclusion: --dedup with --sparse vs --sparse-write
So having --sparse is:
@tlaurion I don't have an x230 but I do have a T430s which is internally almost identical. It currently has a basic Qubes 4.1 install and factory firmware. blake2 isn't required for v0.4; you can manually select sha256. zstd might give you a speed boost, but it could also mess things up because the format has been evolving recently, so I doubt the resulting "comparison chunks" will be reproducible (probably an issue bc the Python library and script library won't be identical). So bzip2 is still the safe bet. FWIW, I could now add gzip support to Wyng because the newer Python gzip lib allows overriding the time header info, which is required for consistent hashing.
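For reference, a small sketch of that deterministic-gzip idea in Python (the mtime override needs Python >= 3.8; the chunk data is just an illustration):

```python
import gzip
import hashlib

chunk = b"example chunk data"

# Fixing mtime=0 removes the timestamp from the gzip header, so identical
# plaintext always yields identical compressed bytes, and therefore a
# consistent hash, which is what chunk comparison requires.
blob = gzip.compress(chunk, compresslevel=6, mtime=0)
print(hashlib.sha256(blob).hexdigest())
```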
I think this is due to /tmp dir paths being static. I am already addressing this in v0.4, but if you need it working in v0.3 then open an issue. The benchmark is interesting; I would not have expected the added 80m.
Maybe Python isn't pushing the flush past its various io layers, or it may be an ssh/Internet buffering behavior. But yeah, I interpret this as mostly latency/waiting occurring when it shouldn't. Obviously sparse receive could be very valuable if this were resolved, so I'll definitely try to do so.
Yes, the procedural difference between sparse and non-sparse is that the latter sends an entire file list to the helper script in one batch, while sparse mode compares-then-requests each chunk individually. Doing it the current way actually presents an opportunity for reduced (not enlarged) processing time, but specific i/o behaviors may make it necessary to use asyncio to realize that potential. And yes, comparing all then sending the list to the helper would immediately improve performance, but that seems like the low road to me; we want CPU comparing and net i/o flowing simultaneously if possible.
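A rough sketch of that overlap idea with asyncio (differs_locally and fetch_from_helper are hypothetical placeholders, not Wyng functions):

```python
import asyncio

async def compare_producer(chunk_ids, queue):
    # CPU-bound compress+compare runs in a worker thread, feeding the
    # queue as soon as each differing chunk is identified.
    for cid in chunk_ids:
        if await asyncio.to_thread(differs_locally, cid):
            await queue.put(cid)
    await queue.put(None)  # sentinel: comparison finished

async def fetch_consumer(queue):
    # Network fetches proceed while comparison continues, so CPU work
    # and net i/o overlap instead of alternating.
    while (cid := await queue.get()) is not None:
        await fetch_from_helper(cid)

async def sparse_receive(chunk_ids):
    queue = asyncio.Queue(maxsize=64)  # bounded, to limit read-ahead
    await asyncio.gather(compare_producer(chunk_ids, queue),
                         fetch_consumer(queue))
```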
As per the (imperfect) PR proposed, I was able to integrate blake2 and zstd under Heads, and to remove the thin-provisioning-tools checks. zstd and blake2 are definitely speedier, so a little more detail on zstd not having consistent hashing would be welcome here for the next steps of testing. bzip2 is damn slow!
Yes, my initial tests of zstd files from different sources show they don't match. Under certain conditions they are very close in size, so I will look further with hexdump to see if the difference is just header info. blake2 isn't really faster than sha256, as the latter usually benefits from hw acceleration. However, blake2 is considered more secure as it has good resistance against length extension attacks. bzip2 does compare favorably to zstd speed when higher compression ratios are used. If you're OK with lower compression ratios (say 3.0:1 instead of 3.8:1) and compression speed is more important than net bandwidth, then gzip is a future possibility. Currently Wyng v0.3 cannot do gzip because it's geared to Python 3.5.
BTW, considering you are importing new tools into the Heads environment, the compression issue IIRC is resolved if the env has
BTW2... adding
@tlaurion After doing some manual tests with python-zstd and the 'zstd' command line tool, I have some good news... the output does match under the right conditions (see the zstandard issue linked below). The bad news: this was tested in a dom0 / fc32 system where both the Python library and the CLI tool use libzstd version 1.4.x. Newer Linux releases have a CLI command at version 1.5.x which does not yield matching output with the older library version. So for zstd to work with the Wyng sh script, for the time being you will have to use the older zstd v1.4.x in the Heads environment.
zstandard issue explaining the conditions for reproducibility: |
@tlaurion wyng-extract.sh has been updated in fix03 to make zstd compression reproducible and generally usable in this context. Compression levels 3-10 will give fast results with good size reduction.
@tasket Will look at it, but as stated in PR #104 the script contains bashisms that Heads' busybox (ash-compliant) doesn't like. I tried to remove some of those bashisms but broke the script doing so, leaving a trace of what needed to be removed to become more POSIX-compliant.
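For illustration, these are the kinds of constructs busybox ash rejects and their portable rewrites (generic examples with placeholder names, not lines taken from wyng-extract.sh):

```sh
# bashism: double-bracket test
# [[ $mode == sparse ]] && echo sparse
# POSIX/ash equivalent:
[ "$mode" = sparse ] && echo sparse

# bashism: arrays have no POSIX form
# vols=(root home); echo "${vols[0]}"
# ash workaround: positional parameters
set -- root home; echo "$1"

# bashism: process substitution
# while read -r line; do ...; done < <(some_cmd)
# ash workaround (note: the loop body runs in a subshell)
some_cmd | while read -r line; do echo "$line"; done
```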
From the compilation choices, I understood that blake2b is also hardware accelerated.
I think the priority will be to reduce restoration times, so I guess a combination of a higher compression level (zstd 3 is the default, right? So I should run tests with zstd 10-19?) and blake2b.
Will retest this; I'm not clear on the impact of the facebook/zstd#999 (comment) in our wyng-backup case. @tasket It's also confusing to learn that once dom0 is upgraded, the full backup archive will need to be redone? So basically, what I understand from this is that things will break if hashes are computed on the resulting compressed data and not on its origin blocks? This might be problematic?
@tasket I could also pack pigz instead of zstd and compare results with --sparse restoration. For the sake of state restoration as a service, there will be a choice to be made between archive lifetime and restoration speed over the network, on which as of now I don't have enough experimentation background.

The --sparse restoration results above came from the fix03 branch with wyng-backup default settings, used to back up over a local wyng qube, with the python script used to receive the archive. I only rsync'ed the archive over to the VPS for the network-based restoration tests shown, so any clear recommendations on settings to be tested at arch-init would be welcome to optimize network bandwidth and restoration time :)

Could also switch to testing the 0.4 branch from now on as well. I have not followed the improvements on that branch, but if the integrity contract is now built in (without encryption, or with it if it can be passed as an unattended option), I could start to test this instead, of course provided the wyng-extract script can be used with it going forward. Not to mix performance tests with long-term support concerns, but since states are meant to be selectable, I would definitely prefer directions that would not require recreating the archives too often :) As of now, just getting excited to have the PoC over Heads.
zstd level 10 will give about the same throughput as gzip/zlib level 4 but with noticeably better compression ratios. Feel free to experiment, but I personally wouldn't go above zstd 10; the setting I typically use is either 3 or 7. This benchmark chart gives a general idea of the differences. Keep in mind that for the wyng-extract.sh script in sparse mode, it must also do compression (in addition to decompression) in order to find/fetch only changed chunks.

When dom0 changes to zstd 1.5, some choices will have to be made. With Wyng-only operation, the "breakage" would manifest as dedup and remap becoming temporarily inefficient, but I would expect no data corruption. In particular, a remap op (where a mismatched snapshot is deleted and a new snapshot is paired) would result in a whole additional copy of the volume being added to the archive (although subsequent remaps of the same volume would not suffer this effect). IIRC the borg backup program standardized on zstd early and has issued many advisories to users to ditch and rebuild their archives after upgrading to avoid archives ballooning in size. For the time being, I will look for ways to advise/warn users, but I may put restrictions on which version can be used (already started this in the sh script). OTOH, a careful archive user/curator could discern when zstd has changed to 1.5 and then prune all the older sessions that were done with 1.4. I think for your use case with the sh script, disk space would be saved but bandwidth for dl updates is not saved. OTOH2, Ubuntu LTS already has 1.5 of the python3-zstd library, and that version is already in Debian Testing. Fedora lags badly, however, with no update between fc32 and fc37. Maybe consider backporting the 1.5 library to Fedora ourselves.

Hashing: I would use blake2b because the difference vs sha256 may not even be noticeable, as they are both far faster than most compression options.

Wyng 0.3 vs upgrading to v0.4alpha: The v0.4 format is going to change some more when alpha3 drops, but I don't anticipate any conversion roadblocks bc unencrypted data chunks will remain the same. There is already alpha1->alpha2 conversion that is done automatically, but I don't anticipate v0.3->v0.4 conversion until the end of alpha3. I still prefer to test the extractor sh script on v0.3 and then convert it to v0.4 later, mostly bc some tedious steps will have to be added to support the v0.4 format.

Verification of v0.4 archives: Think of it as mostly the same as v0.3, except you only need to do your own verification on archive.ini if the archive is unencrypted; archive.ini will verify the rest of the metadata and data. If the archive is encrypted then archive.ini is self-verifying.
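If you want to compare the levels discussed (3, 7, 10) on your own data, a trivial benchmark along these lines works (sample.img is a placeholder path; GNU time is assumed):

```sh
for lvl in 3 7 10; do
  # measure wall time and resulting size for each compression level
  /usr/bin/time -f "zstd -$lvl: %e s" zstd -$lvl -c sample.img > /tmp/out.$lvl.zst
  du -h /tmp/out.$lvl.zst
done
```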
@tlaurion Here is my updated survey of the situation, based on feedback from the zstd project and some recent tests I've made...

Assessment

Neither Zlib nor Gzip can match shell command output with Python lib output. This is unfortunate because Zlib output remains very consistent across versions, ranging from Fedora 32 through 36 and Python 3.5 through 3.11. Bzip2 output matches no matter what, across shell, Python and different versions. Zlib, Gzip and Bzip2 are mature, stable code bases.

Zstd can be very consistent between shell and Python output if the versions are similar. It's an encouraging sign, but the Zstd project is extremely noncommittal on the subject of reproducibility; if they so much as tweak a status message or fix a buffer-overflow vuln, we are to assume Zstd output will differ from the past.

Options
Other

Affects issue #54 – SSH/Rsync/remote: The extract shell script operates as a file batch processor, so the addition of remote access transfers ought to be straightforward.

Sparse mode: At this point I would make the script blockdev-only, which gets us past the busybox
Weird issue with ext4 while attempting to cp -alr an archive dir to another one. It seems there is a maximum number of possible references to the same blocks? Maybe the documentation should mention filesystem limits. As of now, we know ext4 might not be a perfect fit in terms of its fixed inode count (the maximum number of small files that can be created on an ext4 filesystem, determined at fs creation time) and this weird limit I encountered trying to archive an archive by doing a directory copy with hardlink tracking.
The hard-link limit for any single file on most Linux filesystems is about 65,000. Having any data that is quite that dedup-prone is a very small corner case. Wyng has its internal workaround, which you helped with via your feedback. But externally, no; nothing in GNU or Linux guards against it or works around it. That Wyng workaround could probably be enhanced so that links are kept to, say, 6,500 per file instead of 65,000. But I very much doubt it's a good idea to implement that before the "Cloud storage API" feature. But note... implementing an internal archive-copying feature could also be the answer.
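The cap is easy to demonstrate directly: on ext4 the per-inode link count maxes out at 65,000, so a loop like this stops with EMLINK (run in a throwaway directory):

```sh
mkdir /tmp/linktest && cd /tmp/linktest && touch f
i=0
while ln f "link$i" 2>/dev/null; do i=$((i+1)); done
echo "stopped after $i extra links"   # ~64999 on ext4 (EMLINK)
```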
Sorry if I am a bit rigid on documentation. I have a hard time wrapping my head around the current behavior: Qubes OS overhead on an AppVM's LVM disk used as backup storage is known to be high, so I thought of using --sparse-write to spread the CPU-pinning costs over 2 CPUs, giving it an edge over --sparse, but I don't see a direct benefit. Let me explain:

Doc today says:

Where the detailed doc says:

My understanding is that the present code is not parallelizing any work (in either mode; I think that's another ticket), so single-core performance would be the limit of the combined dom0 + qube virtualized IO of the backup storage in my use case (which happens over wyng-backups-vm storage).

I would still have expected --sparse-write (50% storage qube, 50% dom0) to speed up the receive operation over --sparse (100% CPU hit for local calculation and less pulling of the qube's stored backup data), but the results seem to be equal. Maybe you could clarify or give a bit more insight? Otherwise I will put timestamps in my scripts.
Pertinent notes on current archive.ini conf:
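The excerpt itself was elided; based on the settings discussed in this thread (bzip2 compression, 65536 chunk size, sha256 hashing) it would look roughly like the following, keeping in mind this is a hypothetical reconstruction whose key names may not match Wyng's actual archive.ini schema:

```ini
; hypothetical reconstruction, not the actual file
compression = bz2
chunksize = 65536
hashtype = sha256
```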
Where bz2 was chosen for Heads' current busybox support, and where I lost track of the chunksize and hashtype costs, so you may shed some light if you will! :)
Also note that https://git.busybox.net/buildroot/commit/?id=6bccac75ea3f8cd66bcde3747067add14b0c4f2c relies on a python script... so not gonna happen soon under Heads.