Integrated encryption #7

Closed · 6 tasks done
tasket opened this issue Dec 12, 2018 · 28 comments
Labels: enhancement (New feature or request), funding (Seeking funds for implementation), help wanted (Extra attention is needed)

tasket (Owner) commented Dec 12, 2018

Integrate an encryption layer that can also be used to verify metadata and data from the destination archive.

Looking for examples and discussion on applied cryptography techniques from best practices to implementations in various tools including qvm-backup, restic, Time Machine, etc.

Factors

  • Security
  • Efficiency
  • Stream-ability: No intermediate data storage before destination
  • Transport compatibility (ssh, https)
  • Storage compatibility (filesystem, share, "cloud")

Implementation checklist

  • Add encryption for data
  • Add compression and encryption for metadata
  • Full verification chain for metadata (issue #79, "Implement metadata validation")
  • Change volume dirs to use anonymous IDs
  • Assign safety ranges per key for each cipher and prevent nonce re-use
  • Implement key derivation and storage

Threat model

Wyng's threat model appears to be most similar to an encrypted database: A mass of data that is updated and curated periodically. Attackers gaining access to the entire volume ciphertext, possibly on successive occasions, may be assumed.

Security issues

Encryption scheme should be robust and have low interactivity and complexity as well as high isolation potential.

Isolation would be in the form of a Qubes-like environment where the Admin VM (e.g. Domain 0) running the backup process is blocked from direct network access, and encryption/decryption is performed only there. Wyng should be able to encrypt effectively in such an isolated environment.

Compatibility with Admin isolation also extends to how any guest containers/VMs are handled: Encryption and integrity verification cannot rely on the guest environments or their OS templates.

Encryption strategies

  1. LUKS or VeraCrypt on a loop device (which can be isolated) with backing in a remote/shared image file. For example: cryptsetup -> losetup -> sshfs. This solution is readily available but imposes a performance penalty of ~20% in a VM-isolated configuration. It also requires painstaking user setup in a Linux-specific environment, is difficult to integrate, and is a poor choice for remote/cloud use.

  2. Encfs - A FUSE file-encrypting layer may improve performance over a setup based on a loop device. It may also be simpler to set up or even integrate. Advantage: automatic filename (but not size or sequence) obfuscation. Drawback: issues with hardlinks in some encryption modes.

  3. CryFS - Another FUSE layer with built-in support for network transports. Complete file metadata obfuscation. Claims superior resistance to attack. Unknowns: Hardlink support, transport isolation potential.

  4. Direct crypto library/AES utilization - Uses no external layers, but requires painstaking attention to detail and review by a cryptographer if possible. This option may be a natural choice, given the simplicity of the archive chunk format; any issues around the implementation security should have direct analogues to a wide field of other implementations and their use cases. See initial comments on AES modes.

  5. Some encrypted backup tool that can accept a stream of named chunks with very low interactivity between the front end and back end (e.g. a 'push' model).

(After some deliberation and using Wyng with external encryption layers, this issue will be primarily concerned with an integrated solution similar to item 4.)

Types of data

Wyng keeps volume data and metadata as separate files, and the metadata validates the volume data.

See Issue #79 for specifics on metadata, which is expected to use separate encryption keys.

On commenting...

Following a core tenet of cryptography that the application must be understood thoroughly before making specific decisions, a substantial familiarity with Wyng is required to make sense of this issue (ye have been warned...).

It's suggested that making some incremental backups with Wyng and looking at the metadata under '/var/lib/wyng.backup' is a good starting point. In the source code, the classes under ArchiveSet() are instructive, in addition to merge_manifests() and the places where it's used.

@tasket tasket added enhancement New feature or request help wanted Extra attention is needed labels Dec 12, 2018
@tasket tasket added this to the v0.5 milestone Dec 12, 2018
@marmarek

I'd also include in requirements that it should not trust any single template (including VMs based on a specific template) with cleartext. While template compromise is unlikely and fatal already (for data of VMs based on it), spreading impact to all the VMs is even worse. This means encryption should be done either in dom0, or some entity (unikernel-like?) used only for backups encryption and nothing else. Or use different VMs depending on what data is encrypted (probably too complex).

tasket (Owner) commented Feb 17, 2020

Thanks, Marek. That is what I was alluding to in references to isolation potential and low interactivity (as well as in the Readme where it states that untrusted guest volumes are handled safely), but it's best to make it explicit.

I had some discussion with the author of CryFS about this issue, backing up from an isolated admin VM, but s/he didn't seem to appreciate why anyone would isolate encryption functions in a disconnected admin environment.

@tasket tasket added the funding Seeking funds for implementation label Jul 27, 2020
@tasket tasket modified the milestones: v0.5, v0.4 Jun 3, 2021
tasket (Owner) commented Jun 3, 2021

Changing the milestone to v0.4 as that will be the version that gets experimental encryption support.

tlaurion (Contributor) commented Jun 4, 2021

Discussion should move back to QubesOS/qubes-issues#1293

tasket (Owner) commented Jun 5, 2021

Locking for now due to extraneous noise.

Repository owner locked as off-topic and limited conversation to collaborators Jun 5, 2021
@tasket tasket mentioned this issue Jun 5, 2021
Repository owner deleted a comment from cm157 Jun 10, 2021
Repository owner deleted a comment Jun 10, 2021
Repository owner deleted a comment from cm157 Jun 10, 2021
Repository owner deleted a comment from cm157 Jun 10, 2021
Repository owner deleted a comment Jun 10, 2021
Repository owner unlocked this conversation Jun 10, 2021
tasket (Owner) commented Jun 10, 2021

AES mode selection –

Some interesting AES encryption modes (subject to change):

Non-authenticating

  • CBC
  • CTR

Authenticating

  • OCB3
  • GCM
  • SIV

OCB is said to be much faster than other authenticating modes.

IMHO, it's uncertain whether an authenticating mode is necessary here, for a couple of reasons: 1) Wyng already has a basis (hash manifests) for validating chunks of data. 2) The "hash last, then validate hash first" advocates appear to be basing their argument on attacks that mainly work on network data streams. If that is the case, then there is less to be concerned about in selecting between these modes.

It is also worth assessing the risk of decrypting an (initially) unvalidated ciphertext. My understanding is that a symmetric cipher like AES and popular hashing algorithms are closely related and fall under the class of finite state machines. Therefore, if the very next operation after decrypt() is always either hash + compare with manifest hash or discard, then I think this is safe and secure.
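
For illustration, a minimal sketch of that "decrypt, then immediately hash and compare against the manifest, or discard" ordering (assuming AES-256-CBC via the cryptography package and a SHA-256 digest list; the helper name and chunk layout are hypothetical, not Wyng's actual code):

    import hmac
    from hashlib import sha256
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    def decrypt_and_check(key: bytes, iv: bytes, ciphertext: bytes,
                          manifest_digest: bytes) -> bytes:
        # Decrypt first (padding handling omitted for brevity)...
        decryptor = Cipher(algorithms.AES(key), modes.CBC(iv)).decryptor()
        plaintext = decryptor.update(ciphertext) + decryptor.finalize()
        # ...then the very next step is hash + constant-time compare, or discard.
        if not hmac.compare_digest(sha256(plaintext).digest(), manifest_digest):
            raise ValueError("chunk failed manifest verification; discarding")
        return plaintext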

The largest issue in selecting a mode is probably in the degree of uniqueness required for the IV/nonce, which will be something to consider going forward.

IANAC – This is all open to debate so convince me otherwise. :)

tasket added a commit that referenced this issue Jun 19, 2021
tasket (Owner) commented Jun 19, 2021

Encryption work has started in branch wip04. It is at a POC stage and as yet cannot be used securely (!) as it leaves the key exposed.... Use only test volumes with this. The metadata will be saved under a separate 'wyng.backup040' dir instead of the usual location, so there is no need to change the meta dir manually for tests.

Encryption is enabled by default and uses AES-256-CBC mode cipher. Currently it encrypts+decrypts data only (not metadata). To this extent "it works" for send and receive.


After testing various cipher modes, I settled on CBC. It is quite secure ("catastrophic" failure of confidentiality is rare/limited) and, since the Python crypto libraries don't appear to be parallelized, it's one of the better performers too. SIV mode had less than half the throughput of CBC in my tests.

I also want to note that I took a fairly harmless liberty using encrypt() in the hope that collision resistance would be improved: A 128bit random "bolster" is added to the beginning of each plaintext chunk just before encrypting. With a chaining or cascading mode like CBC, I expect this should protect the actual data better than just the IV alone. Of course, informed comments on any of this are very welcome.
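
As an illustration of the "bolster" idea only (not the actual wip04 code; the function names, IV handling and PKCS7 padding here are assumptions), using pycryptodome's AES-256-CBC:

    import os
    from Crypto.Cipher import AES
    from Crypto.Util.Padding import pad, unpad

    def encrypt_chunk(key: bytes, chunk: bytes) -> bytes:
        iv = os.urandom(16)                       # per-chunk IV
        bolster = os.urandom(16)                  # 128-bit random prefix
        ct = AES.new(key, AES.MODE_CBC, iv=iv).encrypt(pad(bolster + chunk, AES.block_size))
        return iv + ct                            # IV stored alongside the ciphertext

    def decrypt_chunk(key: bytes, blob: bytes) -> bytes:
        iv, ct = blob[:16], blob[16:]
        pt = unpad(AES.new(key, AES.MODE_CBC, iv=iv).decrypt(ct), AES.block_size)
        return pt[16:]                            # strip the bolster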

tasket (Owner) commented Jun 22, 2021

New branch 'wip04b' created with the crypto library switched from pycryptodome to cryptography (python3-cryptography package).

The reason is that cryptography is benchmarking around 35% faster for the same AES-256-CBC cipher, meaning that pycryptodome is exacting a 50% performance penalty. This was too large to ignore so I decided to switch now before the code got too dependent on the slower library. Another plus is that cryptography has been available in OS repositories longer and more consistently.
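
A rough way to reproduce that kind of comparison (illustrative only; assumes both pycryptodome and the cryptography package are installed, and results will vary with library versions and hardware):

    import os, time
    from Crypto.Cipher import AES as PcdAES
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    key, iv = os.urandom(32), os.urandom(16)
    data = os.urandom(64 * 1024) * 1024           # 64 MiB, block-aligned

    t0 = time.perf_counter()
    PcdAES.new(key, PcdAES.MODE_CBC, iv=iv).encrypt(data)
    t1 = time.perf_counter()
    enc = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
    ct = enc.update(data) + enc.finalize()
    t2 = time.perf_counter()

    print(f"pycryptodome AES-256-CBC: {64 / (t1 - t0):.0f} MiB/s")
    print(f"cryptography AES-256-CBC: {64 / (t2 - t1):.0f} MiB/s")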

Another small change is that encryption now happens only just prior to the data buffer being sent, instead of the buffer being encrypted and then possibly not sent because of deduplication.

Also, MAC tests for receive/verify/diff are now done with the secrets library.


Some changes that are needed next:

Once these are implemented, we should have a reasonably secure encryption scheme for Wyng archives.

@DemiMarie

Encryption is enabled by default and uses AES-256-CBC mode cipher. Currently it encrypts+decrypts data only (not metadata). To this extent "it works" for send and receive.

AES-256-CBC is a poor choice. Not only is it very slow when encrypting (far more common than decrypting), it does not provide authentication, which leaves it vulnerable to chosen-ciphertext attacks. An AEAD cipher such as AES-256-GCM or ChaCha20-Poly1305 is a far better choice. AES is only a reasonable option on platforms where it is hardware accelerated; if portability to other platforms matters, use ChaCha20-Poly1305.

AEAD ciphers usually have short nonces, which must never repeat for a given key. These nonces are too short to be safely chosen at random. Since persistently storing a nonce is a recipe for disaster (consider qvm-volume revert), a fresh key must be generated whenever the process starts. The key can be generated from a master key and a long random nonce using any decent KDF. The nonce must be stored along with the data.

Finally, AEAD APIs should require the entire buffer to be passed in one operation. This makes them unsuitable for encrypting large individual messages. Instead, a streaming API should be used. Manually implementing such an API is error-prone.

Is https://download.libsodium.org/doc/secret-key_cryptography/secretstream an option? That provides a high-level API for encrypting a sequence of messages, which is what a backup system needs. I strongly recommend just using libsodium for this, rather than trying to implement something similar by hand.
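
Not secretstream itself, but a sketch of the pattern suggested above (a fresh per-run subkey derived from a master key plus a stored random salt, then an AEAD with a simple counter nonce), here using the cryptography package; all names and parameters are illustrative:

    import os
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.kdf.hkdf import HKDF
    from cryptography.hazmat.primitives.ciphers.aead import ChaCha20Poly1305

    def start_session(master_key: bytes):
        salt = os.urandom(32)                     # stored alongside the data
        subkey = HKDF(algorithm=hashes.SHA256(), length=32, salt=salt,
                      info=b"data-session").derive(master_key)
        return salt, ChaCha20Poly1305(subkey)

    salt, aead = start_session(os.urandom(32))
    for counter, chunk in enumerate([b"chunk-0", b"chunk-1"]):
        nonce = counter.to_bytes(12, "big")       # never repeats under this fresh subkey
        blob = aead.encrypt(nonce, chunk, None)   # ciphertext + 16-byte tag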

tasket (Owner) commented Jun 22, 2021

@DemiMarie

Wyng is already awash in data hashes, which puts it in the position of using CBC mode to its advantage. GCM is based on CTR mode, which has the potential for catastrophic confidentiality failures not present in CBC. Also see my comments above, where the Python library's CBC was found to be about as fast as GCM.

If you think Wyng's data verification is an issue, that should be addressed separately as it exists already and independently from encryption––and that will not change without compelling arguments.

OTOH, that is not to say an AEAD mode won't be added. I think SIV (rather slow) or GCM-SIV (faster, but currently unavailable from OS repository) would have acceptable confidentiality safeguards. But GCM confidentiality appears too weak on its own (and hence why GCM-SIV was developed). XChaCha20-Poly1305 (X- with beefed-up IV space) also looks interesting, although I'd prefer to see some discussion about non-stream applications and also a formalization of the ChaCha20/XChaCha20 protocol first. Note these recent developments (GCM-SIV and XChaCha20) indicate confidentiality weakness of prior modes.

The current emphasis on AEADs is controversial. Probably, if Wyng were a network protocol and not an at-rest storage format, I would agree AEADs are compelling. But that is not the case here.

Since persistently storing a nonce is a recipe for disaster (consider qvm-volume revert), a fresh key must be generated whenever the process starts. The key can be generated from a master key and a long random nonce using any decent KDF. The nonce must be stored along with the data.

My reading of current practice is that (besides nonce storage being accepted and usually mandatory) new keys are generated when the nonce/IV space is exhausted. A particular mode may also have a requirement that an IV is unpredictable. So I think it's more likely Wyng could use a nonce/IV that combines a unique counter with a random portion. A 64bit counter would accommodate (with the smallest chunk size) 2^64 * 65536 bytes, roughly a yottabyte, of backed-up disk space.
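
A minimal sketch of such a combined IV, with the capacity arithmetic in a comment (sizes and layout are illustrative, not a settled design):

    import os

    # Capacity of a 64-bit counter at the smallest (64 KiB) chunk size:
    # 2**64 chunks * 65536 bytes/chunk = 2**80 bytes, roughly 1.2 yottabytes.

    def next_iv(counter: int) -> bytes:
        assert counter < 2**64, "counter space exhausted; re-keying required"
        return counter.to_bytes(8, "big") + os.urandom(8)   # 128-bit IV: counter || random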

Is https://download.libsodium.org/doc/secret-key_cryptography/secretstream an option?

It depends. An incremental backup session records a series of data chunks to the archive. If the whole series must be considered a single stream, then Wyng loses both pruning capability and deduplication. So each chunk would need to be its own stream.... and I think we're back to the issues we face with the AES modes. That's why I stated early-on that the threat model looks more like the one for whole-disk encryption, which has its own trade-offs. If we break from that threat model we're probably looking at limiting the storage model to something that is not random-access.

Metadata is a bit of a different story. Because of the way Wyng processes it (funneling digest lists through merge-sort), other encryption modes can be used.

@DemiMarie

My reading of current practice is that (besides nonce storage being accepted and usually mandatory) new keys are generated when the nonce/IV space is exhausted. A particular mode may also have a requirement that an IV is unpredictable. So I think it's more likely Wyng could use a nonce/IV that combines a unique counter with a random portion. A 64bit counter would accommodate (with the smallest chunk size) 2^64 * 65536 bytes, roughly a yottabyte, of backed-up disk space.

One needs to store the nonce with the data, but one must never use a nonce that was read from disk or the network. Instead, one should generate a fresh key using a KDF whenever the process starts, and store the KDF inputs (except the secret seed) along with the data. Alternatively, one can use XChaCha20-Poly1305, which has a large enough nonce that it can be just generated at random at each startup.

It depends. An incremental backup session records a series of data chunks to the archive. If the whole series must be considered a single stream, then Wyng loses both pruning capability and deduplication. So each chunk would need to be its own stream.... and I think we're back to the issues we face with the AES modes. That's why I stated early-on that the threat model looks more like the one for whole-disk encryption, which has its own trade-offs. If we break from that threat model we're probably looking at limiting the storage model to something that is not random-access.

The whole-disk encryption threat model is intended for protection against loss of physical media, where chosen-ciphertext attacks are very difficult. That is not the case here.

tasket (Owner) commented Jun 22, 2021

but one must never use a nonce that was read from disk or the network

Re-use... correct?

The whole-disk encryption threat model is intended for protection against loss of physical media, where chosen-ciphertext attacks are very difficult. That is not the case here.

This is going a little far. FDE is deployed on network storage systems, and in office environments where repeated physical access (without losing media) is a part of the threat model.

tasket (Owner) commented Jun 22, 2021

I'm also curious why a chosen-ciphertext attack against data is an issue here. Wyng always works from the assumption that its metadata (digest list) is secure before any data is verified. Hence the 3rd item on the above checklist.

tasket (Owner) commented Jun 22, 2021

And I'll grant the threat model is not exactly like FDE, where some implementations will re-use IVs. That's why I proposed using a unique counter in the IV.

HW42 commented Jun 22, 2021

Note: I wrote this comment after reading up to #7 (comment). So this text doesn't take into consideration what has been posted after that.

I second @DemiMarie's opinion that this should use authenticated encryption and, if possible, some existing, more high-level API (I need to look up some details on libsodium before I comment on whether I think it's a good choice here (probably yes)).

You are right that this is probably harder to attack in practice than some network protocol, but given how fast modern AEADs are, there is no reason to build something fragile into a new design.

So each chunk would need to be its own stream.... and I think we're back to the issues we face with the AES modes.

I don't understand this argument. What issue do you have if you make each chunk a separate stream? That being said, if your chunks are small enough you can use the simpler AEAD interface instead of some streaming API.

That's why I stated early-on that the threat model looks more like the one for whole-disk encryption, which has its own trade-offs.

I think for backup software you need to support a bit more than FDE. In particular, non-local storage has stronger authentication requirements than local FDE (as it's currently in use). Qubes' built-in backup also supports strong authentication. FDE is also slowly moving toward authenticating things. For example, for system software (no encryption, only authentication) there's dm-verity, which provides strong authentication. dm-crypt+dm-integrity can provide only sector-level authentication, but that is still better than plain dm-crypt. In general, FDE makes a lot of compromises due to its requirements. For backup software you are in a much better position, so you can support better crypto.

OTOH, that is not to say an AEAD mode won't be added.

I would recommend against building in some separate crypto algorithm agility scheme for this use case. Choose a good algo+parameters. And if at some point it really turns out that there is a need to change it, that should be handled by a global format version change like other big changes.

Note these recent developments (GCM-SIV and XChaCha20) indicate confidentiality weakness of prior modes.

Those developments are mainly to support randomly generated IVs (and IIRC the SIV variants also aim to provide some nonce-reuse resistance); I don't think "confidentiality weakness" is a good way of saying that the prior modes should not be used with random IVs. Whether this is relevant to your usage depends a lot on how you plan to use it (see below).

Some changes that are needed next:

Key derivation and integration into the hierarchical metadata is probably the much trickier task (because here "just use an existing robust higher-level API" is probably not possible to the extent it is for the encrypt-a-chunk part). So I would suggest first drafting the plan for this and discussing that. Then you can take another look at the "how to encrypt a chunk" part (for example, if you derive a new key for each chunk anyway, IV re-use is not an issue).

HW42 commented Jun 22, 2021

FDE is deployed on network storage systems, and in office environments where repeated physical access (without losing media) is a part of the threat model.

Such attacks are not addressed by common FDE solutions like dm-crypt (AFAIK MS's BitLocker is very similar, but I'm not familiar with its details).

HW42 commented Jun 22, 2021

I'm also curious why a chosen-ciphertext attack against data is an issue here. Wyng always works from the assumption that its metadata (digest list) is secure before any data is verified. Hence the 3rd item on the above checklist.

So you already verify the hash of the encrypted data before decryption? Then you have a custom AEAD scheme, not just CBC. I read your previous comments as saying you don't do this and only verify after decryption.

@DemiMarie

but one must never use a nonce that was read from disk or the network

Re-use... correct?

No, I meant “use”. Otherwise one is vulnerable to a replay attack. The only time this is okay is if one has a hardware-enforced monotonic counter, but that is ~never the case in this context.

HW42 commented Jun 22, 2021

but one must never use a nonce that was read from disk or the network

Re-use... correct?

No, I meant “use”. Otherwise one is vulnerable to a replay attack. The only time this is okay is if one has a hardware-enforced monotonic counter, but that is ~never the case in this context.

I think you are talking past each other. You use the stored nonce to decrypt the data that was encrypted using it. What @DemiMarie means is that you should not use stored data to derive another nonce from it. So you should not do something like: counter = read_counter_from_disk(); counter += 1; save_counter_to_disk(counter); encrypt(data, iv=counter) because it risks nonce re-use (for example, after restoring a backup that contains an older stored counter).

tasket (Owner) commented Jun 22, 2021

So you already verify the hash of the encrypted data before decryption? Then you have a custom AEAD scheme, not just CBC. I read your previous comments as saying you don't do this and only verify after decryption.

You're confusing the role of metadata and data here. The lion's share of metadata is digest lists. There are too many assumptions being made here by people who have been disinterested in this project until this point.

No, I meant “use”. Otherwise one is vulnerable to a replay attack. The only time this is okay is if one has a hardware-enforced monotonic counter, but that is ~never the case in this context.

No, Wyng is not a network protocol, and if you paid attention you'd realize the data chunks are not validated like a network protocol. It's all-or-nothing. There is no "re-play" from error-correction schemes. The digest test already uses a time-invariant function. Anything else an attacker is likely to do is DoS. I'm fine with DoS.

So you should not do something like: counter = read_counter_from_disk(); counter += 1; save_counter_to_disk(counter); encrypt(data, iv=counter) because it risks nonce re-use (for example, after restoring a backup that contains an older stored counter).

WHY would I read and (yes) re-use any data like that and apply it in such a fashion???

If I store a counter, it can be in a protected header (metadata) such as archive.ini. And PLEASE don't repeat the error and say I can't trust the header either. It can be signed if that's really necessary, that is the whole point of issue 79. The current usage model assumes that the metadata is protected by the isolated admin environment; transitioning to encryption, that metadata will have to be verified before any data can be processed. And if you assume I'm going to use AES-CBC for metadata signing/verification, then /eyeroll.

I should also point out the GCM problems aren't limited to nonce re-use. When the underlying CTR mode fails, it is (I repeat) catastrophic: tons of data (or all of it) gets exposed. With CBC under the same conditions, only the identical repeat messages tend to be exposed.

Finally, key scheduling is better suited to network streams, but it won't be out of the question going forward. I still have to assume that unique IVs will be sufficient, because that's what the API documentation and application guides say, so the current tack is a reasonable starting point.


Here's the deal. I do not want this issue flooded with piles of best-practice nostrums from every use case under the sun applied indiscriminately, as is the fashion. Going forward, you can comment if A) you're a cryptographer or B) you demonstrate you've reviewed the Wyng format and present ideas about encryption in "Wyng-ese".

The ideas already in Wyng have to be respected or there will be little point in adding encryption to it.

Please also understand, this is being developed by a single person (me) in my spare time. The encryption feature will be introduced as experimental and probably stay experimental for some time––as happened with deduplication––barring some considerable increase in participation. Other projects have a lot more manpower, and can still get by with delaying (say) correct verification of system updates for over a decade; in non-experimental releases at that.

So, those are the terms and they are terrific. :-)

HW42 commented Jun 22, 2021

[...] There are too many assumptions being made here by people who have been disinterested in this project until this point.
[...]
So, those are the terms and they are terrific. :-)

I did look at the commit mentioning the ticket, but given your comment it wasn't clear what is done this way only because it's a very early version of the feature, and what your mid- to long-term plan is.

Anyway: I commented here because Marek asked me privately if I would have time to take a look. Unfortunately my comments had the opposite of the intended effect, and you perceived them as an outsider to the project trying to force "best-practice nostrums" on you. Sorry about this; I definitely didn't want to annoy you in the issue tracker of your spare-time project. So I will refrain from further comments for now. If you would like to discuss this or related topics in the future, feel free to contact me.

tasket (Owner) commented Jun 25, 2021

For perspective...

qubes-core-admin/qubes/backup.py:


DEFAULT_CRYPTO_ALGORITHM = 'aes-256-cbc'

@marmarek

DEFAULT_CRYPTO_ALGORITHM = 'aes-256-cbc'

That's literally the only occurrence of this constant in the code; it is not used anywhere :)
Currently the encryption uses https://github.com/Tarsnap/scrypt/blob/master/FORMAT - especially because it handles the HMAC properly (after encrypting) and has a proper KDF too.
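
The FORMAT document itself isn't reproduced here, but the general "encrypt, then HMAC over the ciphertext, with keys from an scrypt KDF" ordering it follows can be sketched as (parameters and framing are illustrative only, not the Tarsnap format):

    import os, hmac, hashlib
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    def seal(passphrase: bytes, plaintext: bytes) -> bytes:
        salt = os.urandom(16)
        keys = hashlib.scrypt(passphrase, salt=salt, n=2**14, r=8, p=1, dklen=64)
        enc_key, mac_key = keys[:32], keys[32:]
        nonce = os.urandom(16)
        enc = Cipher(algorithms.AES(enc_key), modes.CTR(nonce)).encryptor()
        body = salt + nonce + enc.update(plaintext) + enc.finalize()
        return body + hmac.new(mac_key, body, hashlib.sha256).digest()   # MAC last, over ciphertext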

tasket (Owner) commented Jun 25, 2021

@marmarek What do you think about AES-SIV mode?

tasket added a commit that referenced this issue Jul 7, 2021
tasket added a commit that referenced this issue Aug 4, 2021
Implement validation of metadata

Check bounds of metadata loading

Sync add cmd with dest
tasket added a commit that referenced this issue Aug 7, 2021
tasket added a commit that referenced this issue Aug 13, 2021
tasket added a commit that referenced this issue Aug 24, 2021
tasket added a commit that referenced this issue Aug 31, 2021
Change data cipher default to xchacha20

Issue #7
tasket added a commit that referenced this issue Aug 31, 2021
tasket added a commit that referenced this issue Aug 31, 2021
tasket (Owner) commented Aug 31, 2021

Good Morning.... The basic encryption implementation has been completed!

Usage notes

Upon new archive creation with arch-init, encryption is enabled by default. An unencrypted archive may be created with arch-init --encrypt=off, or a specific data cipher may be selected with arch-init --encrypt=<cipher>. The unencrypted mode still needs preliminary testing.

The rest is like using Wyng v0.3, although there are additional features slated for v0.4 that will cause further changes in its command syntax and format.

Compatibility: Wyng 0.4 (wip) is being tested on Fedora 32 (Qubes 4.1), Debian 11 and Ubuntu 21.04. Qubes 4.0 does seem like a possibility if A) encryption is not used, or B) suitable encryption library versions are ported to Fedora 25.

Technical notes

  • The user selects a data cipher (either AEAD or non-AEAD) and Wyng selects a matching AEAD cipher for authenticating metadata. XChaCha20-Poly1305 is now an option for data, and selecting XChaCha20 will now use XChaCha20-Poly1305 as the metadata cipher instead of AES-SIV. (Note that there are >3 sodium/NaCl based libraries now available for Python, and although they seem to vary a lot in quality one of them may be a better option for providing XChaCha20 in the future.)

  • Metadata and data are encrypted by separate keys derived with scrypt from a single passphrase and separate salts. There is no re-keying capability at present, although this could make a nice future addition.

  • For the IV/nonce, a counter is used, with the number-of-messages (or -blocks) safety bound for each cipher mode determining the counter size. For XChaCha20 and XChaCha20-Poly1305, three factors are concatenated together: a 32bit UTC time in seconds and 80 random bits in addition to the counter. This approach was chosen as an efficient way to prevent nonce re-use (a sketch of the key derivation and this nonce layout appears after these notes).

The counters for each key are updated in the remote metadata root archive.ini file as the volume data is being sent, alongside a more frequently updated mirror of the counters tracked in the local .salt file. On startup, the two versions of each counter are compared, the larger is taken, and then the counter update step-1 is added as a precaution. The counter is always advanced before incorporating it into a new IV.

The upper bound for each cipher's message counter plus other parameters for the IV:

    Cipher       IV/nonce size   Counter max   Random    Time (sec.)
    XChaCha20    192 bits        2^80 - 64     80 bits   32 bits
    AES-SIV      96 bits         2^48 - 64     48 bits   none
    AES-CBC*     128 bits        2^48 - 64*    80 bits   none
  • If the counter runs out, the current key is considered exhausted and no further data will be written. Currently, with the XChaCha20 cipher, that allows approximately 2^80 * 64Kbytes of source volume data to be written to the archive before it effectively becomes read-only. There is also a small emergency reserve of 64 (for future use) in case more data must be written to make an exhausted archive consistent. Archives that were initialized with a default chunk size >64K can store proportionally more data. The metadata counter is consumed at a bit more than 1/128 of the rate of the data counter.

  • Once the metadata root is verified at startup (currently done via the AEAD cipher), the hashes contained within are used to validate all other metadata + data, always in the context of the latest archive revision. IOW, archive.ini is always updated anytime there is a change in the archive (and is also time-stamped), so outdated metadata below the root won't be accepted even if it has valid AEAD tags.

  • Note for AES-CBC: This cipher is currently disabled as the entropy diffusion in the IV appears to be an issue. This could be enabled in the future once this concern is addressed, i.e. by taking the additional step of encrypting each IV before use. The counter limit in the above table refers to the number of AES blocks in this case.

  • A simple check of CPU flags is made for AES-NI hardware support when an AES cipher is selected, and a check for a Cryptodome library version >= 3.9 is made with XChaCha20.

  • For those who do not want to use or rely on AEAD ciphers for authenticating Wyng archives, the metadata root archive.ini can be signed by the user by some other means as a way of authenticating the entire archive. Signing the entire metadata file set is no longer necessary with versions >= 0.4.
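
As referenced in the notes above, a sketch of the key derivation and XChaCha20 nonce layout (the field order, scrypt parameters, and use of pycryptodome's ChaCha20_Poly1305 here are assumptions for illustration; Wyng's actual code may differ):

    import os, time, hashlib
    from Crypto.Cipher import ChaCha20_Poly1305       # Cryptodome >= 3.9 for XChaCha20

    passphrase = b"example passphrase"
    meta_salt, data_salt = os.urandom(32), os.urandom(32)

    # Separate metadata and data keys, derived with scrypt from one passphrase and separate salts.
    meta_key = hashlib.scrypt(passphrase, salt=meta_salt, n=2**14, r=8, p=1, dklen=32)
    data_key = hashlib.scrypt(passphrase, salt=data_salt, n=2**14, r=8, p=1, dklen=32)

    def xchacha_nonce(counter: int) -> bytes:
        # 192-bit nonce: 80-bit counter || 32-bit UTC seconds || 80 random bits
        assert counter < 2**80 - 64, "key exhausted (64-message emergency reserve kept back)"
        return (counter.to_bytes(10, "big")
                + int(time.time()).to_bytes(4, "big")
                + os.urandom(10))

    cipher = ChaCha20_Poly1305.new(key=data_key, nonce=xchacha_nonce(1))
    ct, tag = cipher.encrypt_and_digest(b"example chunk")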

@tasket tasket closed this as completed Aug 31, 2021