#### Challenge 51: Compression Ratio Side-Channel Attacks

[Back to Index](CryptoPalsWalkthroughs_Cobb.ipynb)

In [1]:
from Crypto.Random import random
import cryptopals as cp
import zlib

<div class="alert alert-block alert-info">   

Internet traffic is often compressed to save bandwidth. Until recently, this included HTTPS headers, and it still includes the contents of responses.

Why does that matter?

Well, if you're an attacker with:

1. Partial plaintext knowledge and
2. Partial plaintext control and
3. Access to a compression oracle

You've got a pretty good chance to recover any additional unknown plaintext.

What's a compression oracle? You give it some input and it tells you how well the full message compresses, i.e. the length of the resultant output.

This is somewhat similar to the timing attacks we did way back in set 4 in that we're taking advantage of incidental side channels rather than attacking the cryptographic mechanisms themselves.

Scenario: you are running a MITM attack with an eye towards stealing secure session cookies. You've injected malicious content allowing you to spawn arbitrary requests and observe them in flight. (The particulars aren't terribly important, just roll with it.)
</div>
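To make the oracle idea concrete, here is a minimal sketch with the encryption step left out. A stream cipher would not change the length anyway. `SECRET` and `toy_oracle` are hypothetical names for illustration, not part of the challenge code. The sketch shows that an input repeating the secret compresses far better than an equally long input that shares nothing with it.

```python
import zlib

# Hypothetical secret sitting in the request; the attacker never sees it directly.
SECRET = b"sessionid=TmV2ZXIgcmV2ZWFsIHRoZSBXdS1UYW5nIFNlY3JldCE="

def toy_oracle(attacker_input: bytes) -> int:
    """Return the compressed length of the secret followed by attacker input."""
    return len(zlib.compress(SECRET + b"\n" + attacker_input))

# Repeating the secret lets DEFLATE emit one back-reference to the first copy...
matching = toy_oracle(SECRET)
# ...while same-length input built from consecutive ASCII codes starting at 'A'
# has no repeats and nothing in common with the secret, so it stays as literals.
unrelated = toy_oracle(bytes(range(65, 65 + len(SECRET))))

print(matching, unrelated)
```

The attacker only learns two numbers, but the gap between them already says "your input overlaps the secret".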

    
<div class="alert alert-block alert-info">   
    
So! Write this oracle:

`oracle(P) -> length(encrypt(compress(format_request(P))))`

Format the request like this:

```
POST / HTTP/1.1
Host: hapless.com
Cookie: sessionid=TmV2ZXIgcmV2ZWFsIHRoZSBXdS1UYW5nIFNlY3JldCE=
Content-Length: ((len(P)))
((P))
```
    
(Pretend you can't see that session id. You're the attacker.)

Compress using zlib or whatever.

Encryption... is actually kind of irrelevant for our purposes, but be a sport. Just use some stream cipher. Dealer's choice. Random key/IV on every call to the oracle.

And then just return the length in bytes.
    
</div>
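Why encryption is "kind of irrelevant" here: a stream cipher XORs the plaintext with a keystream of the same length, so the ciphertext is exactly as long as the compressed request. The sketch below uses a toy SHA-256 counter-mode keystream purely to show that property. `toy_stream_xor` is a hypothetical stand-in, not AES-CTR and not the `cp.AESEncrypt` helper used in this notebook, and it is not meant for real encryption.

```python
import hashlib
import os
import zlib

def toy_stream_xor(data: bytes, key: bytes, nonce: bytes) -> bytes:
    """XOR data with a SHA-256(key || nonce || counter) keystream.

    Toy construction for illustration only; encryption and decryption are the same call.
    """
    keystream = bytearray()
    counter = 0
    while len(keystream) < len(data):
        keystream += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    # zip stops at len(data), so unused keystream bytes are simply ignored.
    return bytes(d ^ k for d, k in zip(data, keystream))

compressed = zlib.compress(b"Cookie: sessionid=example\n" * 3)
key, nonce = os.urandom(32), os.urandom(8)   # fresh per call, as the challenge asks
ciphertext = toy_stream_xor(compressed, key, nonce)

print(len(compressed), len(ciphertext))      # always equal
```

A fresh key and nonce on each call change every ciphertext byte but never the length, and the length is all the attack needs.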

In [2]:
TRUE_SESSION_ID = 'TmV2ZXIgcmV2ZWFsIHRoZSBXdS1UYW5nIFNlY3JldCE='

In [3]:
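def challenge51_oracle(P):
    # Fresh random key and nonce on every call, as the challenge asks.
    key = random.Random.get_random_bytes(32)
    IV  = random.Random.get_random_bytes(8)
    
    # The literal '((' and '))' mirror the challenge's placeholder notation;
    # they are harmless, since the attacker knows the format either way.
    request = 'POST / HTTP/1.1\n' \
              'Host: hapless.com\n' \
              'Cookie: sessionid=TmV2ZXIgcmV2ZWFsIHRoZSBXdS1UYW5nIFNlY3JldCE=\n' \
              'Content-Length: ((' + str(len(P)) + '))\n' \
              '((' + P + '))'
    
    # Compress, then encrypt. CTR mode adds no padding, so the ciphertext
    # length follows the compressed length byte for byte.
    c_request = zlib.compress(request.encode())
    e_request = cp.AESEncrypt(c_request, key, 'CTR', IV)
    
    return len(e_request)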

<div class="alert alert-block alert-info">   
    
Now, the idea here is to leak information using the compression library. A payload of `sessionid=T` should compress just a little bit better than, say, `sessionid=S`.

There is one complicating factor. The DEFLATE algorithm operates in terms of individual bits, but the final message length will be in bytes. Even if you do find a better compression, the difference may not cross a byte boundary. So that's a problem.

You may also get some incidental false positives.

But don't worry! I have full confidence in you.

Use the compression oracle to recover the session id.

I'll wait.
    
</div>
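The byte-granularity problem is easy to see by scoring every base64 character for the first position of the session id. This sketch uses plain `zlib` lengths with no encryption, which is exactly what a stream cipher leaks. Its request template writes `Content-Length: N` without the notebook's `((...))` markers. `TEMPLATE` and `request_len` are hypothetical helpers for illustration. If several candidates tie for the shortest length, a one-character guess cannot single out the right one.

```python
import zlib
from collections import Counter

BASE64_CHARS = ("abcdefghijklmnopqrstuvwxyz"
                "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789+/=")

TEMPLATE = ("POST / HTTP/1.1\n"
            "Host: hapless.com\n"
            "Cookie: sessionid=TmV2ZXIgcmV2ZWFsIHRoZSBXdS1UYW5nIFNlY3JldCE=\n"
            "Content-Length: {length}\n"
            "{body}")

def request_len(payload: str) -> int:
    """Compressed length in bytes of the request carrying `payload`."""
    request = TEMPLATE.format(length=len(payload), body=payload)
    return len(zlib.compress(request.encode()))

# Score every candidate for the first character of the session id.
scores = {c: request_len("sessionid=" + c) for c in BASE64_CHARS}
shortest = min(scores.values())
tied = [c for c, n in scores.items() if n == shortest]

# How many candidates produced each output length.
print(Counter(scores.values()))
print(f"{len(tied)} candidate(s) tied at {shortest} bytes: {''.join(tied)}")
```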

Some Resources:

1. IACR paper: [Compression and Information Leakage of Plaintext](https://iacr.org/archive/fse2002/23650264/23650264.pdf)
2. Wikipedia article on the [CRIME attack](https://en.wikipedia.org/wiki/CRIME)
3. Wikipedia article on the [BREACH attack](https://en.wikipedia.org/wiki/BREACH)
4. Black Hat [slides on the BREACH attack](https://media.blackhat.com/us-13/US-13-Prado-SSL-Gone-in-30-seconds-A-BREACH-beyond-CRIME-Slides.pdf)
5. [Paper on the BREACH attack](http://breachattack.com/resources/BREACH%20-%20SSL,%20gone%20in%2030%20seconds.pdf)
6. [Thomas Pornin's Security Stack Exchange answer](https://security.stackexchange.com/questions/19911/crime-how-to-beat-the-beast-successor/19914#19914), which predated the CRIME presentation
7. [PoC code](https://gist.github.com/stamparm/3698401) by xorninja

This is going to be an _adaptive chosen-plaintext attack_, using side-channel information leaked by the compression function to infer the unknown session id.

We know:

- The compression algorithm (zlib) and the full format and contents of the request, except for the `sessionid` value.
- The session id is a 44-character base64-encoded string.
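The length alone says a fair amount about the value. Suppose the session id encodes 32 raw bytes; this is an inference from the length, not something the challenge states. Then base64 produces exactly 44 characters ending in a single `=`. The 43rd character is also restricted: it can only be one whose 6-bit value has its two low bits clear. A quick check:

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

# 32 bytes -> ceil(32 / 3) = 11 groups of 4 characters = 44 characters.
encoded = base64.b64encode(bytes(32))
print(len(encoded), encoded[-2:])   # 44 b'A='

# The last group carries 2 bytes (16 bits) in 3 characters (18 bits), so the
# lowest 2 bits of the 43rd character are always zero padding.
allowed_43rd = [c for i, c in enumerate(ALPHABET) if i % 4 == 0]
print("".join(allowed_43rd))        # AEIMQUYcgkosw048
```

Under that assumption, the final `=` is free and the 43rd position has 16 candidates instead of 64.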



In [4]:
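# My first attempt -- assumes no false positives.
base_64_chars = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789+/='
guess = ''
scores = {}

# Guess two characters per step: a correct pair extends the match against the real
# cookie by two characters, a bigger saving than one character gives, so it is more
# likely to show up in the byte length. The '~~' filler shrinks as the guess grows,
# so every payload has the same length.
for chunk in range(22):
    for char1 in base_64_chars:
        print('.', end='')
        for char2 in base_64_chars:
            this_guess = char1 + char2
            scores[this_guess] = challenge51_oracle('Cookie: sessionid=' + guess + this_guess + '~~'*(22-chunk))
    # Keep the pair with the shortest output (ties go to the pair tried first).
    guess += min(scores, key=scores.get)
    print('Guess so far: ', guess)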
    print('Guess so far: ', guess)
    
print()
print(f"My Guess: {guess}")
print(f"Actual:   {TRUE_SESSION_ID}")
      
assert(guess == TRUE_SESSION_ID)
      
print("Congrats -- you own my sessionid")

.................................................................Guess so far:  Tm
.................................................................Guess so far:  TmV2
.................................................................Guess so far:  TmV2ZX
.................................................................Guess so far:  TmV2ZXIg
.................................................................Guess so far:  TmV2ZXIgcm
.................................................................Guess so far:  TmV2ZXIgcmV2
.................................................................Guess so far:  TmV2ZXIgcmV2ZW
.................................................................Guess so far:  TmV2ZXIgcmV2ZWFs
.................................................................Guess so far:  TmV2ZXIgcmV2ZWFsIH
.................................................................Guess so far:  TmV2ZXIgcmV2ZWFsIHRo
.................................................................Guess so far:  
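For a sense of scale, the pairwise search tries every pair of candidates at each of its 22 steps. That costs far more queries than a one-character-at-a-time search would:

```python
ALPHABET_SIZE = 65   # 64 base64 characters plus '='
ID_LENGTH = 44

pair_queries = (ID_LENGTH // 2) * ALPHABET_SIZE ** 2   # 22 steps x 4225 pairs
single_queries = ID_LENGTH * ALPHABET_SIZE             # one query per candidate per position

print(pair_queries, single_queries)   # 92950 2860
```

That is roughly 32 times as many oracle calls, paid for a stronger signal per query.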

---

<div class="alert alert-block alert-info">  
Got it? Great.

Now swap out your stream cipher for CBC and do it again.

</div>

In [5]:
import pdb  # used further down to trap bad guesses

def challenge51_oracle_CBC(P):
    # Fresh random key and a one-block (16-byte) IV on every call.
    key = random.Random.get_random_bytes(32)
    IV  = random.Random.get_random_bytes(16)

    request = 'POST / HTTP/1.1\n' \
              'Host: hapless.com\n' \
              'Cookie: sessionid=TmV2ZXIgcmV2ZWFsIHRoZSBXdS1UYW5nIFNlY3JldCE=\n' \
              'Content-Length: ((' + str(len(P)) + '))\n' \
              '((' + P + '))'
    
    c_request = zlib.compress(request.encode())
    # The trailing True presumably enables padding in cp.AESEncrypt, which rounds
    # the ciphertext up to a whole number of 16-byte AES blocks.
    e_request = cp.AESEncrypt(c_request, key, 'CBC', IV, True)
    
    return len(e_request)

First attempt -- just re-use the code that worked against CTR mode:

In [6]:
guess = ''
scores = {}

for chunk in range(22):
    for char1 in base_64_chars:
        print('.', end='')
        for char2 in base_64_chars:
            this_guess = char1 + char2
            scores[this_guess] = challenge51_oracle_CBC(
                'Cookie: sessionid=' + guess + this_guess + '~~'*(22-chunk))
    guess += min(scores, key=scores.get)
    print('Guess so far: ', guess)
    
print()
print(f"My Guess: {guess}")
print(f"Actual:   {TRUE_SESSION_ID}")
      
assert(guess == TRUE_SESSION_ID)
      
print("Congrats -- you own my sessionid")

.................................................................Guess so far:  aa
.................................................................Guess so far:  aaaa
.................................................................Guess so far:  aaaaaa
.................................................................Guess so far:  aaaaaaaa
.................................................................Guess so far:  aaaaaaaaaa
.................................................................Guess so far:  aaaaaaaaaaaa
.................................................................Guess so far:  aaaaaaaaaaaaaa
.................................................................Guess so far:  aaaaaaaaaaaaaaaa
.................................................................Guess so far:  aaaaaaaaaaaaaaaaaa
.................................................................Guess so far:  aaaaaaaaaaaaaaaaaaaa
.................................................................Guess so far:  

AssertionError: 

Well, that failed badly...

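`min` returned `aa`, the first pair tried, at every step. That suggests the candidates were all tying at the same output length. Because CBC is a block mode, the compressed request is padded out to a whole number of 16-byte AES blocks, so small differences in compressed length aren't visible. To make the attack work, we have to force a compression difference big enough to change the number of AES blocks.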
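This sketch assumes the `True` flag passed to `cp.AESEncrypt` turns on PKCS#7 padding; the helper's internals aren't shown here. Under that assumption, the ciphertext length depends only on how many whole blocks the compressed data fills. The code shows how a whole range of compressed lengths collapses onto a single ciphertext length:

```python
BLOCK_SIZE = 16  # AES block size in bytes

def pkcs7_padded_len(n: int, block_size: int = BLOCK_SIZE) -> int:
    """Length after PKCS#7 padding, which always adds 1..block_size bytes."""
    return (n // block_size + 1) * block_size

for n in range(30, 50):
    print(n, "->", pkcs7_padded_len(n))
```

Every compressed length from 32 to 47 bytes maps to 48. A guess that saves a byte or two therefore usually changes nothing the attacker can see. Only a saving that moves the compressed length back across a multiple of 16 shows up.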

I'll try the approach as presented in the original CRIME presentation:

- DEFLATE (the algorithm behind zlib) can only replace a repeated string with a back-reference if the earlier copy lies within its sliding "window", so the window size determines whether a string gets replaced during compression.
- The edge of the window acts as a data boundary, so it can be used as a distinguisher.
- Construct two queries for each candidate character:
  1. Query with the guess inside the window of the cookie
  2. Query with the guess outside the window of the cookie
- If the guess is incorrect, it won't be replaced by a reference to the cookie in either request, so the oracle returns the same length for both queries.
- If the guess is correct, compression will replace it in query 1 but not in query 2, so the oracle returns different lengths for the two queries.
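The window effect itself can be observed directly by shrinking zlib's window. `zlib.compressobj` accepts a `wbits` argument: 9 to 15 for the zlib format, giving windows of 512 bytes to 32 KB. In this sketch, the same secret appears twice, about 1 KB apart; the secret, the filler, and the `compressed_len` helper are illustrative assumptions. With the default 32 KB window, the second copy can be encoded as a back-reference. With a 512-byte window, the first copy is out of reach and the second has to be stored as literals. This only illustrates the mechanism; it is not the padding search used below.

```python
import zlib

SECRET = b"sessionid=TmV2ZXIgcmV2ZWFsIHRoZSBXdS1UYW5nIFNlY3JldCE="
# The secret appears twice, roughly 1 KB apart.
data = SECRET + b"~" * 1000 + SECRET

def compressed_len(payload: bytes, wbits: int) -> int:
    """Length of zlib output when compressing with a 2**wbits-byte window."""
    comp = zlib.compressobj(level=9, method=zlib.DEFLATED, wbits=wbits)
    return len(comp.compress(payload) + comp.flush())

wide = compressed_len(data, 15)    # 32 KB window: second copy can point back at the first
narrow = compressed_len(data, 9)   # 512-byte window: the first copy is out of reach
print(wide, narrow)
```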

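I initially tried a padding of 2**15 (32,768 bytes), which is the default window size for zlib in Python (`wbits=15`), but didn't have any luck with that. Instead, the code below increments the padding length until exactly one candidate character produces a difference between the two queries. When several characters produce a difference, it narrows the candidate set to those characters and keeps going. Experiments starting from a padding length of 0 showed that 1294 is an effective starting point, so the code starts there. Since 1294 is far smaller than the 32 KB window, the difference found this way probably isn't the window edge itself. The two CBC outputs can only differ when their compressed lengths land on opposite sides of a 16-byte block boundary, and the padding search is what finds such a spot.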
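The code below does not use the following approach; it is shown only as one alternative way to turn a sub-block saving into a whole-block difference. The idea is to grow poorly compressible filler until a known-wrong guess just tips the ciphertext into an extra block. After that, a guess that compresses even one byte better can drop it back. `padded_oracle`, `find_block_edge`, and the filler alphabet are hypothetical names for this sketch. The oracle models CBC only through its PKCS#7-padded length and performs no real encryption.

```python
import random
import zlib

BLOCK_SIZE = 16
# Request prefix holding the secret cookie (challenge layout, without Content-Length).
REQUEST_PREFIX = (b"POST / HTTP/1.1\n"
                  b"Host: hapless.com\n"
                  b"Cookie: sessionid=TmV2ZXIgcmV2ZWFsIHRoZSBXdS1UYW5nIFNlY3JldCE=\n")

def padded_oracle(payload: bytes) -> int:
    """Hypothetical CBC-style oracle: PKCS#7-padded length of the compressed request."""
    n = len(zlib.compress(REQUEST_PREFIX + payload))
    return (n // BLOCK_SIZE + 1) * BLOCK_SIZE

def find_block_edge(probe: bytes, seed: int = 0, max_filler: int = 4096) -> bytes:
    """Return the shortest prefix of a fixed pseudo-random filler stream for which
    padded_oracle(filler + probe) is longer than padded_oracle(probe)."""
    rng = random.Random(seed)
    # Characters outside the base64 alphabet, so the filler cannot match the cookie value.
    alphabet = b"!#$%&*;<>?@^_|"
    stream = bytes(rng.choice(alphabet) for _ in range(max_filler))
    baseline = padded_oracle(probe)
    for size in range(1, max_filler + 1):
        filler = stream[:size]
        if padded_oracle(filler + probe) > baseline:
            return filler
    raise RuntimeError("no block edge found within max_filler bytes")

# '~' is not a base64 character, so this probe is a guaranteed wrong guess.
probe = b"sessionid=~"
filler = find_block_edge(probe)
print(len(filler), padded_oracle(filler[:-1] + probe), padded_oracle(filler + probe))
```

With the filler fixed, each candidate `c` could then be scored as `padded_oracle(filler + b"sessionid=" + c)`. A result one block shorter than the probe's is a hint worth keeping. Savings smaller than a byte still won't show, so survivors would need re-checking.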

In [9]:
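# Trying the algorithm described in the original CRIME presentation:
# for each candidate, compare a query with the junk padding before the guess
# against one with the padding after it.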

KNOWN_PRE_TEXT = 'POST / HTTP/1.1\n' \
                 'Host: hapless.com\n' \
                 'Cookie: sessionid='

guess = ''
for byte_idx in range(44):
    
    possible_chars = []
    padding_length = 1294
    char_found = False
    
    while not(char_found):
        
        if len(possible_chars) > 0:
            char_list = possible_chars
        else:
            char_list = base_64_chars
            
        possible_chars = []
        
        for guess_char in char_list:
            
            this_guess = KNOWN_PRE_TEXT + guess + guess_char 
            junk = '~'*padding_length

            guess1 = junk + this_guess
            guess1_len = challenge51_oracle_CBC(guess1)

            guess2 = this_guess + junk
            guess2_len = challenge51_oracle_CBC(guess2)

            if guess1_len != guess2_len:
                possible_chars.append(guess_char)
                
        if len(possible_chars) == 1:
            char_found = True
            guess += possible_chars[0]
            print(f"Guess so far:  {guess} found at padding_length={padding_length}")
            print(guess)
            # Trap bad guesses so we can debug.
            if not(guess[byte_idx] == TRUE_SESSION_ID[byte_idx]):
                pdb.set_trace()
        else:
            if (len(possible_chars) > 0) and (len(possible_chars) < 10):
                print(f'Possible Chars Reduced to: {possible_chars}')
            padding_length += 1
            
        if padding_length > 2**15:
            raise RuntimeError(f"No distinguishing padding found for byte {byte_idx}")
            
print()
print(f"My Guess: {guess}")
print(f"Actual:   {TRUE_SESSION_ID}")
      
assert(guess == TRUE_SESSION_ID)
      
print("Congrats -- you own my sessionid")

Guess so far:  T found at padding_length=1294
T
Guess so far:  Tm found at padding_length=1294
Tm
Possible Chars Reduced to: ['H', 'V']
Guess so far:  TmV found at padding_length=1295
TmV
Guess so far:  TmV2 found at padding_length=1294
TmV2
Guess so far:  TmV2Z found at padding_length=1294
TmV2Z
Possible Chars Reduced to: ['l', 'X']
Guess so far:  TmV2ZX found at padding_length=1295
TmV2ZX
Guess so far:  TmV2ZXI found at padding_length=1294
TmV2ZXI
Guess so far:  TmV2ZXIg found at padding_length=1294
TmV2ZXIg
Possible Chars Reduced to: ['a', 'c']
Guess so far:  TmV2ZXIgc found at padding_length=1295
TmV2ZXIgc
Guess so far:  TmV2ZXIgcm found at padding_length=1294
TmV2ZXIgcm
Guess so far:  TmV2ZXIgcmV found at padding_length=1294
TmV2ZXIgcmV
Guess so far:  TmV2ZXIgcmV2 found at padding_length=1294
TmV2ZXIgcmV2
Guess so far:  TmV2ZXIgcmV2Z found at padding_length=1294
TmV2ZXIgcmV2Z
Guess so far:  TmV2ZXIgcmV2ZW found at padding_length=1294
TmV2ZXIgcmV2ZW
Guess so far:  TmV2ZXIgcmV2ZWF f
