Stream Cipher Used to Encrypt Last File Block #9

Open
lipnitsk opened this Issue Aug 26, 2014 · 53 comments

lipnitsk commented Aug 26, 2014

From: https://defuse.ca/audits/encfs.htm

Exploitability: Unknown
Security Impact: High

As reported in [1], EncFS uses a stream cipher mode to encrypt the last file block. The change log says that the ability to add random bytes to a block was added as a workaround for this issue. However, it does not solve the problem, and is not enabled by default.

EncFS needs to use a block mode to encrypt the last block.

EncFS's stream encryption is unorthodox:

1. Run "Shuffle Bytes" on the plaintext.
    N[J+1] = Xor-Sum(i = 0 TO J) { P[i] }
    (N = "shuffled" plaintext value, P = plaintext)
2. Encrypt with (setIVec(IV), key) using CFB mode.
3. Run "Flip Bytes" on the ciphertext.
    This reverses bytes in 64-byte chunks.
4. Run "Shuffle Bytes" on the ciphertext.
5. Encrypt with (setIVec(IV + 1), key) using CFB mode.

Where setIVec(IV) = HMAC(globalIV || (IV), key), and:
    - 'globalIV' is an IV shared across the entire filesystem.
    - 'key' is the encryption key.
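
As a rough illustration, the "Shuffle Bytes" and "Flip Bytes" steps can be sketched in Python. This is a toy sketch of the transforms as described above, not EncFS's actual implementation:

```python
def shuffle_bytes(buf: bytes) -> bytes:
    # "Shuffle Bytes": each output byte is the running XOR of all
    # plaintext bytes up to and including that position.
    out = bytearray(len(buf))
    acc = 0
    for i, b in enumerate(buf):
        acc ^= b
        out[i] = acc
    return bytes(out)

def flip_bytes(buf: bytes) -> bytes:
    # "Flip Bytes": reverse the byte order within each 64-byte chunk.
    out = bytearray()
    for i in range(0, len(buf), 64):
        out.extend(buf[i:i + 64][::-1])
    return bytes(out)
```

Note that on a one-byte buffer both transforms return their input unchanged, which is why the single-byte attack quoted below goes through.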

This should be removed and replaced with something more standard. As far as I can see, this provides no useful security benefit; however, it is relied upon to prevent the attacks in [1]. This is security by obscurity.

Edit: [1] may be unavailable, so here it is from archive.org:

[Full-disclosure] Multiple Vulnerabilities in EncFS
From: Micha Riser (micha[at]povworld.org)
Date: Thu Aug 26 2010 - 07:05:18 CDT
(...)
3. Last block with single byte is insecure 
------------------------------------------------------- 
The CFB cipher mode is insecure if it is used twice with the same
initialization vector. In CFB, the first block of the plain text is XOR-ed with
the encrypted IV:
  C0 = P0 XOR Ek(IV)
Therefore, for two cipher blocks C0 and C0' encrypted with the same IV, it
holds that:
  C0 XOR C0' = (P0 XOR Ek(IV)) XOR (P0' XOR Ek(IV)) = P0 XOR P0'
This means that an attacker gets the XOR of the two plain texts. EncFs uses a 
modified version of CFB which additionally shuffles and reverses bytes. It is not 
clear however, if the modifications generally help against this problem. 

A security problem arises definitely if the last block contains only a single 
byte and an attacker has two versions of the last block. Operating on a single 
byte, the shuffle and reverse operation do nothing. What remains is a double 
encryption with CFB and XOR-ing the two cipher bytes gives the XOR of the 
two plain text bytes due to the reason described above. Encrypting the last 
block with a stream cipher instead of a block cipher saves at most 16 bytes 
(one cipher block). We think it would be better to sacrifice these bytes and in 
exchange rely only on a single encryption mode for all blocks which simplifies 
both the crypto analysis and the implementation.
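
The XOR relation above is easy to reproduce. In the sketch below, a SHA-256 hash of (key, IV) stands in for the block-cipher output Ek(IV) (an assumption for illustration only); any keystream reused under the same IV cancels out the same way:

```python
import hashlib

def ek(key: bytes, iv: bytes) -> int:
    # Stand-in for the first byte of Ek(IV); any fixed function of
    # (key, IV) exhibits the same IV-reuse problem.
    return hashlib.sha256(key + iv).digest()[0]

key, iv = b"k" * 16, b"same-iv-reused"
p0, p0_prime = 0x41, 0x5A          # two versions of a single-byte last block

c0 = p0 ^ ek(key, iv)              # C0  = P0  XOR Ek(IV)
c0_prime = p0_prime ^ ek(key, iv)  # C0' = P0' XOR Ek(IV), same IV

# XOR-ing the two ciphertext bytes recovers P0 XOR P0'.
assert c0 ^ c0_prime == p0 ^ p0_prime
```
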

vgough (Owner) commented Aug 29, 2014

Plan is to eliminate use of stream mode entirely in EncFS 2.x (for new filesystems). No plan for EncFS 1.x.

rfjakob (Collaborator) commented Oct 18, 2014

Do you already have a plan for what mode to use? CBC with ciphertext stealing seems to be a good option.

rfjakob (Collaborator) commented Oct 18, 2014

The other option would be to go with CTR for the whole file. With CTR, however, an attacker can flip single bits at will, so it would need to go with MACs enabled by default. If ecryptfs has MACs enabled by default (will check), we should probably do so too, anyway.

rfjakob (Collaborator) commented Oct 19, 2014

CTR has the additional problem that the XOR of two ciphertext files copied at two different times is the XOR of the plaintexts. To fix that leak you'd need random per-block IVs.

vgough (Owner) commented Oct 23, 2014

For Encfs2, I'm leaning towards GCM mode (as used in ZFS).

generalmanager commented Mar 1, 2015

@vgough Salsa20+Poly1305 would also be a viable (and very fast) alternative, as outlined by Thomas Ptacek in his blog:
http://sockpuppet.org/blog/2014/04/30/you-dont-want-xts/

rfjakob (Collaborator) commented Mar 1, 2015

Actually, I don't think large changes like that are necessary. Blockwise CBC works fine for everything but the last 16 bytes (the AES block size).
By padding the plaintext with 16 zero bytes, that problem goes away, at the cost of wasting 16 bytes.
I think this is the way to go.

lachesis (Contributor) commented Mar 21, 2015

Please don't invent a padding scheme; just pad with PKCS#7 like everyone else. :)
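
For reference, PKCS#7 always appends 1 to `block` bytes, each equal to the pad length; a minimal sketch:

```python
def pkcs7_pad(data: bytes, block: int = 16) -> bytes:
    # Always pads: a full extra block is added if len(data) is a multiple.
    n = block - (len(data) % block)
    return data + bytes([n]) * n

def pkcs7_unpad(padded: bytes) -> bytes:
    # The last byte states how many pad bytes to strip.
    n = padded[-1]
    if not 1 <= n <= len(padded) or padded[-n:] != bytes([n]) * n:
        raise ValueError("invalid PKCS#7 padding")
    return padded[:-n]
```

Note that the plaintext length is only recoverable from the final decrypted byte, which is relevant to the stat() discussion that follows.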

rfjakob (Collaborator) commented Mar 21, 2015

Thanks for the pointer! However, PKCS#7 seems to require that you read the last bytes of the ciphertext to get the plaintext length. This is one additional seek for every stat(); we should really avoid that, as it kills rsync performance.

rfjakob (Collaborator) commented Mar 21, 2015

(It's probably more than one seek, because the filesystem has to parse its internal data structures first to locate the data.)
So I think what we need is a "headerless" scheme, where you don't have to read any ciphertext to get the length.
Unconditionally adding 16 zero bytes (or any value) would do that:

pppppppppp 0000000000000000
                    ^---- 16 bytes zero padding
    ^-------------------- 10 bytes plaintext

AES encryption (16 byte blocks) ->

cccccccccccccccc 0000000000
                     ^--- 10 bytes of zeros
     ^------------------- 16 bytes encrypted data
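
The layout above can be sketched as follows (toy Python; the only assumption is that exactly one 16-byte block of zeros is appended). Because the pad guarantees the trailing partial block is all zeros, the ciphertext length is always plaintext length + 16, so no header read is needed:

```python
BS = 16  # AES block size

def pad_headerless(plain: bytes) -> bytes:
    # Unconditionally append one block of zeros.
    return plain + bytes(BS)

plain = b"p" * 10
buf = pad_headerless(plain)            # 26 bytes: 1 full block + 10 leftover
full_blocks, rem = divmod(len(buf), BS)

assert buf[full_blocks * BS:] == bytes(rem)  # leftover bytes are all zero
assert len(buf) == len(plain) + BS           # stat()-friendly length mapping
```
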

djtm commented May 14, 2015

Isn't it a security issue if you know that the last bytes will be (padded with) zero bytes? Maybe random bytes would be better?

rfjakob (Collaborator) commented May 14, 2015

RogerThiede commented May 14, 2015

akerl commented May 14, 2015

Trying to predict how to modify ciphers based on what vulnerabilities might be discovered in the future quickly becomes a wild goose chase. I suspect if you submitted a PR that improved the padding without affecting backwards compat, it would fare better.

JanKanis commented Jul 21, 2015

A random idea I just thought of:
Encode the file length (and other small useful metadata) in the encrypted filename. That would reduce the maximum filename length even more than it is now, so if that maximum is reached, substitute a hash of the filename and add the real file name to the end of the file data. That would encode metadata in the file contents only in the (rare) case where the filename is too long, so it wouldn't hurt rsync et al in the common case. And this would resolve the limited filename length problem as well.

vgough (Owner) commented Jul 24, 2015

In order to make lookups simple, it is preferable that encrypted filenames can be directly computed from plaintext filenames. That way a call to open("foo.txt") doesn't require a directory scan in order to find the encrypted file. Instead, we encrypt "foo.txt" and attempt to open the encrypted name.

Allowing hashed names, to extend allowable file lengths, doesn't hurt too badly since it could still be done without a directory traversal. Encoding metadata into filenames would thwart this, since I'm not aware of any portable way to do a prefix match or otherwise avoid walking the entire directory listing.
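
The "directly computable" property boils down to the name encryption being deterministic: the same plaintext name always maps to the same ciphertext name, so open() needs no directory scan. A toy sketch of just that property (this is not EncFS's actual name cipher; the XOR keystream here is purely illustrative):

```python
import hashlib

def _keystream(key: bytes, n: int) -> bytes:
    # Deterministic keystream derived only from the key, so the same
    # name always encrypts to the same result.
    out = bytearray()
    ctr = 0
    while len(out) < n:
        out.extend(hashlib.sha256(key + ctr.to_bytes(4, "big")).digest())
        ctr += 1
    return bytes(out[:n])

def encrypt_name(name: str, key: bytes) -> str:
    raw = name.encode()
    return bytes(a ^ b for a, b in zip(raw, _keystream(key, len(raw)))).hex()

def decrypt_name(enc: str, key: bytes) -> str:
    raw = bytes.fromhex(enc)
    return bytes(a ^ b for a, b in zip(raw, _keystream(key, len(raw)))).decode()

# open("foo.txt") can go straight to one backing file name:
key = b"secret-key"
assert encrypt_name("foo.txt", key) == encrypt_name("foo.txt", key)
assert decrypt_name(encrypt_name("foo.txt", key), key) == "foo.txt"
```
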

JanKanis commented Jul 24, 2015

Of course. I should have thought it through a bit longer.

vgough (Owner) commented Jul 24, 2015

No worries, I appreciate the ideas. I've wanted to do the same myself, just didn't figure out a way to make that work.

wasgehetdichdasan commented Aug 2, 2015

Is there a chance that there will - maybe ;-) - be a fix for the current version any time soon?

wasgehetdichdasan commented Aug 13, 2015

Is there no one who thinks they can make a quick fix?

rfjakob (Collaborator) commented Aug 14, 2015

wasgehetdichdasan commented Aug 18, 2015

Uhh. And what about a non-backwards-compatible version which is not 2.0?

h0nIg commented Oct 24, 2016

"However, it does not solve the problem, and is not enabled by default."

Could you please clarify which commit introduced the fix and which option is used to work around this issue?

ping @vgough

benrubson (Collaborator) commented Mar 10, 2017

"Could you please clarify which commit introduced the fix and which option is used to work around this issue?"

When you configure encfs in expert mode:

Add random bytes to each block header?
This adds a performance penalty, but ensures that blocks
have different authentication codes.  Note that you can
have the same benefits by enabling per-file initialization
vectors, which does not come with as great of performance
penalty.
Select a number of bytes, from 0 (no random bytes) to 8:

However, as the audit stated, it does not solve the problem.

danim7 commented Mar 13, 2017

Any ideas on what data format shall be implemented?

A disk format based on GCM mode could also help to fix the issues related to MAC headers.

benrubson (Collaborator) commented Apr 28, 2018

This is the most "important" EncFS security report.

"By padding the plaintext with 16 zero bytes, that problem goes away, at the cost of wasting 16 bytes. I think this is the way to go."

I like the idea, @rfjakob.
Rather simple, without changing all the existing work.
It could be a transitional change, before moving to a whole new format.
Did you perhaps already work on a patch for this, or would you, please?

We would then kill two birds with one stone, as we would also be able to close #10 👍

rfjakob (Collaborator) commented Apr 28, 2018

No, sorry, no plans of working on this. People who don't mind a format change can move to gocryptfs IMO.

benrubson (Collaborator) commented Apr 28, 2018

I'll see if I can work on this later on :)
I'm also wondering if such a modification would be reverse-write compatible.

"By padding the plaintext with 16 zero bytes, that problem goes away, at the cost of wasting 16 bytes. I think this is the way to go."

Actually, I think that padding with 15 bytes should be OK?

rfjakob (Collaborator) commented Apr 29, 2018

Right now I don't see why reverse-write would be a problem.

And yes, actually, 15 bytes should be enough.

benrubson (Collaborator) commented May 3, 2018

I then tried to implement the cipherBlockSize - 1 padding (15 bytes in the examples below), according to what you described above @rfjakob, adding 15 bytes at the end of each file (except 0-byte files).

Leading to:

pppppppppp 000000000000000
                    ^---- 15 bytes zero padding
    ^-------------------- 10 bytes plaintext

AES encryption (16 byte blocks) ->

cccccccccccccccc 000000000
                     ^---  9 bytes of zeros
     ^------------------- 16 bytes encrypted data

So, I have a working algorithm in the following situations:

  • normal mode: read and write;
  • reverse mode: read and write.

But reverse write seems impossible to achieve correctly. Below are some complicated situations.
Let's assume the block size is 4 KB.

  • We receive a 4 KB write request which comes at the end of the existing backing file. It could then be the last block of the file, so we would have to crop the last 15 bytes after decryption. But we are not sure this is the last block to be written, so we are not sure we should crop...
  • We receive a 1020-byte request which comes at the end of the existing backing file. If we assume this is the last block of the file, we should then crop 12 bytes, decrypt, crop 3 bytes and append. But are we sure this is the last block? Perhaps the calling application will come with another write call to complete the 1020 bytes already received...
  • We receive a 10-byte write request which comes at the end of the existing backing file. Is it some padding? Should we then crop 5 bytes of the previous block? Once again, here we are not sure...

benrubson (Collaborator) commented May 5, 2018

I think I will go with OneAndZeroes padding of each block, with a cipherBlockSize - 1 bytes padding for the last block.
We would then still be able to get the size of files without having to read the last block.
We would also be able to properly reverse-write, at a cost of one byte per blockSize.
I think it's worth it.

Any thoughts?

Thx 👍

Last block:

pppppppppp 100000000000000
                    ^---- 15 bytes OneAndZeroes padding
    ^-------------------- 10 bytes plaintext

AES encryption (16 byte blocks) ->

cccccccccccccccc 000000000
                     ^---  9 bytes of zeros
     ^------------------- 16 bytes encrypted data

Other blocks:

ppppppppppppppp 1
                ^--------  1 byte OneAndZeroes padding
    ^-------------------- 15 bytes plaintext

AES encryption (16 byte blocks) ->

cccccccccccccccc
     ^------------------- 16 bytes encrypted data
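
A OneAndZeroes (ISO/IEC 7816-4 style) pad is commonly implemented as a single 0x80 byte (the "1" bit) followed by zero bytes, which makes it unambiguous to strip; a sketch under that assumption:

```python
def one_and_zeroes_pad(data: bytes, pad_len: int) -> bytes:
    # Append 0x80 (the "1") then pad_len - 1 zero bytes.
    assert pad_len >= 1
    return data + b"\x80" + bytes(pad_len - 1)

def one_and_zeroes_unpad(padded: bytes) -> bytes:
    # Strip trailing zeros, then the single mandatory 0x80 marker.
    stripped = padded.rstrip(b"\x00")
    if not stripped.endswith(b"\x80"):
        raise ValueError("invalid OneAndZeroes padding")
    return stripped[:-1]

# Last block: 10 bytes of plaintext + 15 bytes of padding.
assert one_and_zeroes_unpad(one_and_zeroes_pad(b"p" * 10, 15)) == b"p" * 10
# Other blocks: 15 bytes of plaintext + 1 byte of padding.
assert one_and_zeroes_unpad(one_and_zeroes_pad(b"q" * 15, 1)) == b"q" * 15
```
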

rfjakob (Collaborator) commented May 5, 2018

Let's look at the difficulties; I think this should all work:

"But we are not sure this is the last block to be written, so we are not sure we should crop..."

Yes, we have to stat() the file to find out.

"But are we sure this is the last block? Perhaps the calling application will come with another write call to complete the 1020 bytes already received..."

Again, we can stat() the file to determine if it is the last block. Forward mode has to do this as well, right?

rfjakob (Collaborator) commented May 5, 2018

Another note:

"Perhaps the calling application will come with another write call to complete the 1020 bytes already received..."

This does not matter. In forward mode, the file always has to be consistent on disk. The user application may crash at any time and stop writing, but the data it has already written must be safe.

benrubson (Collaborator) commented May 5, 2018

Thx for your feedback @rfjakob 👍

I agree, if the cipher file is fully available locally.
You may be in a situation where the cipher file is not locally available, so you would not be able to stat() it (and so would not be able to know whether the block you have been asked to write is the last one of the file).
Think about, for example, downloading (or syncing, whatever the method used) some remote cipher files directly into a reverse-mounted EncFS.

rfjakob (Collaborator) commented May 5, 2018

Forward mode would not work either in this case, right?

benrubson (Collaborator) commented May 5, 2018

It would, because here you encode data, so you don't expect it to be a multiple of cipherBlockSize. If the block you are writing is at the end of the local (cipher) file, you assume this is the last block and compute a cipherBlockSize - 1 bytes padding.

rfjakob (Collaborator) commented May 5, 2018

"I agree, if the cipher file is fully available locally."

Can't we stat() the plaintext file instead?

benrubson (Collaborator) commented May 5, 2018

Unfortunately, this would not help.
Let's assume we receive a 4 KB (blockSize) cipher block.
According to the write call received, we have to write it at the end of the plaintext file. Perfect.
It could then be the last block of the plain file. But how can we be sure?
How can we then remove the last padding bytes that may exist?
Without padding every block as proposed above, I don't see how :|

rfjakob (Collaborator) commented May 5, 2018

If the write expanded the file, it must be the last block, and it must have padding.

rfjakob (Collaborator) commented May 5, 2018

(otherwise forward mode is buggy)

benrubson (Collaborator) commented May 5, 2018

Not necessarily. Think about a cipher file being downloaded directly into a reverse-write EncFS (so that it is written decrypted directly to the local disk).
Every block received and written will expand the plain file. But only the last one received (and written) will be the real last block of the plain file.

rfjakob (Collaborator) commented May 5, 2018

Then every block must have padding.

benrubson (Collaborator) commented May 5, 2018

A 15-byte padding?
Or a OneAndZeroes padding of each block, with a cipherBlockSize - 1 bytes padding for the last block?

rfjakob (Collaborator) commented May 5, 2018

Yes, 15 bytes.

Collaborator

rfjakob commented May 5, 2018

At that moment, it's the last block, right?

Collaborator

benrubson commented May 5, 2018

Look at these use cases :

Backup :
plain local -> EncFS reverse -> rsync to remote location

Restore :
rsync from remote location -> EncFS reverse -> plain local

I'm not sure the backup path would need to insert a 15-byte padding after every block.

Collaborator

rfjakob commented May 5, 2018

Interesting use case, but there are other problems:

plain local -> EncFS reverse -> rsync to remote location -> ciphertext

Now, let's assume the ciphertext contains 1000 bytes, and rsync happens to write() a chunk of data that ends at byte 1000. What does

EncFS reverse -> plain local

do?

Repository owner deleted a comment from rfjakob May 5, 2018

Collaborator

benrubson commented May 5, 2018

// strange duplicate part of your message above deleted

Yes, I think this is the last tricky case.
I already thought about this, and I think we need an additional internal buffer.

Let's take your example.

1000 % 16 = 8
We crop the last 8 bytes.
We decode.
We remove padding bytes if it looks like we can.
We write the plain data at the end of the plain file.
We return that we wrote 1000 bytes.
As 1000 < 4096, we keep the 1000 bytes in an internal buffer, as we may receive the next bytes of the block.

If we receive a write request with the next 1000 bytes, we will not read the previous 1000 bytes of the block from the plain file (as we have cropped some bytes), but will take them from our internal buffer.
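The crop-and-buffer idea above could be sketched like this (a hypothetical illustration; names and structure are invented, not EncFS code):

```python
# Hypothetical sketch of the crop + internal-buffer idea described above.
# Names are invented for illustration; this is not EncFS code.
CIPHER_BLOCK = 16    # AES block size
FS_BLOCK = 4096      # EncFS file-block size

def croppable(n: int) -> int:
    """How many bytes of a partial write can be CBC-decrypted right away."""
    return n - (n % CIPHER_BLOCK)

class PartialBlockBuffer:
    """Keeps the partial ciphertext of the current file block so a later
    write can complete the block without re-reading cropped plaintext."""
    def __init__(self):
        self.buf = b""

    def add(self, data: bytes):
        self.buf += data
        if len(self.buf) >= FS_BLOCK:
            full, self.buf = self.buf[:FS_BLOCK], self.buf[FS_BLOCK:]
            return full  # a complete block, ready for normal CBC handling
        return None      # still partial: keep buffering
```

With a 1000-byte write, croppable(1000) is 992, so the remaining 8 bytes (1000 % 16) have to wait; the buffer then supplies the first 1000 bytes when the next 1000 arrive, without re-reading the plain file.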

Collaborator

rfjakob commented May 5, 2018

I was curious if that use case really works, so I did:

a/zero -> reverse -> b/eNZPWSyw0rxU7T37UwNN3,n9  ----> cp
d/zero -> reverse -> c/eNZPWSyw0rxU7T37UwNN3,n9  <---/

And it seems to work at first glance:

$ md5sum a/zero d/zero 
2d56b031dc8683c233c016429084f870  a/zero
2d56b031dc8683c233c016429084f870  d/zero

So that was easy, let's overwrite the middle of the file with itself:

dd if=b/eNZPWSyw0rxU7T37UwNN3,n9 of=c/eNZPWSyw0rxU7T37UwNN3,n9 bs=123 seek=43 skip=43 count=1

Random garbage:

$ md5sum a/zero d/zero 
2d56b031dc8683c233c016429084f870  a/zero
a22fc0525129c3eb2fe1af2e4bc9fd5d  d/zero
Collaborator

rfjakob commented May 5, 2018

However, this (note the odd block size):

dd if=b/eNZPWSyw0rxU7T37UwNN3,n9 of=c/eNZPWSyw0rxU7T37UwNN3,n9 bs=123

works, and I'm not sure why.

$ md5sum a/zero d/zero 
2d56b031dc8683c233c016429084f870  a/zero
2d56b031dc8683c233c016429084f870  d/zero

On decryption, we have to know if it is the last block, because the last block is handled differently. Where do we have this information from?

Collaborator

benrubson commented May 5, 2018

I think every 123-byte block is written using the stream cipher (so this creates garbage), until we are ready to write enough bytes (up to blockSize) to read (stream-encode) them again and re-decode the whole block correctly using CBC.

Confirmed (here blockSize is 1024) :

VERBOSE FileNode::write offset 984, data size 123 [FileNode.cpp:247]
VERBOSE streamRead(data, 984, IV) [CipherFileIO.cpp:350]
VERBOSE Called blockWrite [CipherFileIO.cpp:420]
VERBOSE Called streamWrite [CipherFileIO.cpp:429]
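The split visible in the log can be summarized with a tiny sketch (simplified and hypothetical; the real logic lives in CipherFileIO.cpp):

```python
# Simplified, hypothetical sketch of the blockWrite/streamWrite split
# seen in the log above; not the actual CipherFileIO.cpp logic.
BLOCK_SIZE = 1024  # blockSize used in the log above

def choose_write_path(block_len: int) -> str:
    # A full block goes through the block cipher (blockWrite);
    # a short final block falls back to the stream cipher (streamWrite).
    return "blockWrite" if block_len == BLOCK_SIZE else "streamWrite"
```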
Collaborator

benrubson commented May 5, 2018

Strangely, in your failing example above, the file gets truncated by dd at the end of the 123-byte written block (I reproduced it).
There is a bug somewhere :)

Collaborator

rfjakob commented May 5, 2018

Oh, my bad! You are right, the truncation is what causes the garbage:

dd if=b/eNZPWSyw0rxU7T37UwNN3,n9 of=c/eNZPWSyw0rxU7T37UwNN3,n9 \
 bs=123 seek=43 skip=43 count=1 conv=notrunc

$ md5sum a/zero d/zero 
2d56b031dc8683c233c016429084f870  a/zero
2d56b031dc8683c233c016429084f870  d/zero

@benrubson benrubson referenced a pull request that will close this issue May 12, 2018

Open

Handle CBC padding #521
