In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("lab05.ipynb")

# Lab 05: Binary Stream Ciphers

Binary stream ciphers, like the Linear Feedback Shift Register (LFSR), are great at producing statistically random sequences of 1's and 0's. In this lab we'll learn how to work with binary numbers a bit more in Python, and program a function to implement an LFSR in Python.

## Shift Operators

Bitwise shift operators are another kind of tool for bit manipulation. They let you move the bits around, which will be handy for creating bitmasks later on. In the past, they were often used to speed up certain mathematical operations, since they are very low-level operations that the computer can execute very quickly.

### Left Shift

The bitwise left shift operator (`<<`) moves the bits of its first operand to the left by the number of places specified in its second operand. It also takes care of inserting enough zero bits to fill the gap that arises on the right edge of the new bit pattern:

![](https://files.realpython.com/media/lshift.e06f1509d89f.gif)

Run the cell below to see this operation in action:

In [None]:
a = 0b100111
print( 'a:        ', format(a, 'b'), a )
print( 'a << 1:  ', format(a << 1, 'b'), a << 1 )
print( 'a << 2: ', format(a << 2, 'b'), a << 2 )
print( 'a << 3:', format(a << 3, 'b'), a << 3 )

You may have noticed that, in general, shifting bits to the left corresponds to multiplying the number by a **power of two**, with an exponent equal to the number of places shifted:

$$a << n = a \cdot 2^n$$

Think about why shifting each bit to the left is equivalent to doubling the value of the entire number. Discuss with your classmates in your group before moving on.
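As a quick sanity check on the formula above, here is a minimal sketch that compares each left shift against multiplication by the matching power of two. It reuses the value `0b100111` from the earlier cell; the loop and the variable name `shifted` are only illustrative.

```python
a = 0b100111  # 39, the same value used in the cell above

for n in range(4):
    shifted = a << n
    # Each left shift by one place doubles the value, so n shifts multiply by 2**n.
    assert shifted == a * 2**n
    print(n, format(shifted, 'b'), shifted)
```

If the assertion never fails, the shift and the multiplication agree for every `n` that was tried.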

### Right Shift

The bitwise right shift operator (`>>`) is analogous to the left one, but instead of moving bits to the left, it pushes them to the right by the specified number of places. The rightmost bits always get dropped:

![](https://files.realpython.com/media/rshift.9d585c1c838e.gif)

Every time you shift a bit to the right by one position, you halve its underlying value. Moving the same bit by two places to the right produces a quarter of the original value, and so on. When you add up all the individual bits, you’ll see that the same rule applies to the decimal number they represent:

In [None]:
a = 0b100111
print( 'a:     ', format(a, 'b'), a )
print( 'a >> 1: ', format(a >> 1, 'b'), a >> 1 )
print( 'a >> 2:  ', format(a >> 2, 'b'), a >> 2 )
print( 'a >> 3:   ', format(a >> 3, 'b'), a >> 3 )

Halving an odd number such as $157_{10}$ would produce a fraction. To get rid of it, the right shift operator automatically floors the result (rounds down to the nearest integer). It’s virtually the same as a floor division by a power of two:

$$ a >> n = \left\lfloor \frac{a}{2^n} \right\rfloor $$

That means in Python you can either shift a number 1 bit to the right, `a >> 1`, or floor divide by 2, `a // 2`, to obtain the same result.

In [None]:
print(5 >> 1)
print(5 // 2)
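The same comparison can be run over several values and shift amounts. The sketch below is only a spot check, and the test values (including the negative one) are arbitrary choices. It also shows that Python floors negative results toward negative infinity for both operators.

```python
for a in (39, 5, -5):
    for n in range(4):
        # Right-shifting by n places matches floor division by 2**n.
        assert a >> n == a // 2**n

print(-5 >> 1, -5 // 2)  # -3 -3
```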

## Bitmasks

A bitmask works like a graffiti stencil that blocks the paint from being sprayed on particular areas of a surface. It lets you isolate the bits to apply some function on them selectively. Bitmasking involves both the bitwise logical operators and the bitwise shift operators that you’ve read about.

There are a few common types of operations associated with bitmasks. You’ll take a quick look at some of them below.

### Getting a Bit

Remember, the bitwise AND operation matches up corresponding bits in two numbers and returns a 0 if both bits are 0, a 0 if one bit is 0 and the other is 1, and a 1 if both bits are 1.

This means that if you bitwise AND any number with a binary number of all 0's, you'll get back all zeros. That's because if your original number had a bit with value 0, `0 AND 0` returns 0, and if your original number had a bit with value 1, `1 AND 0` returns 0:

In [None]:
# 215 & 0
0b11010111 & 0b00000000

If you were to bitwise AND a number against a bitmask of all 1's, you'd get back the exact same number, since if your original number had a bit with value 0, `0 AND 1` returns 0, and if your original number has a bit with value 1, `1 AND 1` returns 1.

In [None]:
0b11010111 & 0b11111111

We can use these bitmasks (with the shift operators) to create a method to read the value of a particular bit in a number.

To read the value of a particular bit at a given position, you can use the bitwise AND (`&` operator) against a bitmask composed of all 0's except for one bit set to 1 at the desired location. The resulting number will have a 0 anywhere the bitmask had a 0, and retain the value of the bits anywhere the bitmask had a 1.

Example #1:

```
 number: 10100000
bitmask: 00010000  <-- will return value of bit at b_5
-----------------
result:  00000000
            ^ (the bit at b_5 was a 0)
```

In [None]:
0b10100000 & 0b10000

Example #2
```
 number: 10100000
bitmask: 00100000  <-- will return value of bit at b_6
-----------------
result:  00100000  (32 in decimal)
           ^ (the bit at b_6 was a 1)
```

In [None]:
0b10100000 & 0b100000

If you only want to know the bit value, 0 or 1, and not keep the rest of the output from the bitmask process, then all you need to do is right-shift the result (`>>`) by the appropriate number of places.

```
 number: 10100000
bitmask: 00100000
-----------------
result:  00100000
result >> 5 : 001
```

In [None]:
(0b10100000 & 0b100000) >> 5
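The bitmasks in the examples above were typed out by hand, but a left shift can build them. The sketch below prints the single-bit masks produced by `1 << k` for a few values of `k`; the name `mask_b6` is just an illustrative label for the mask from Example #2.

```python
# 1 << k places a single 1 bit k places to the left of the right-most bit.
for k in range(6):
    print(k, format(1 << k, '08b'))

mask_b6 = 1 << 5               # five shifts from the right-most bit lands on b_6
print(format(mask_b6, '08b'))  # 00100000, the bitmask used in Example #2
```

Notice that the shift amount is one less than the bit's index, because the right-most bit is $b_1$ and needs no shift at all.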

### Question 1

Write a function `get_bit` that takes in an integer (decimal or binary, remember Python treats them as the same) and the index of a bit in that number. The function should use the bitmask technique to return just a 0 or a 1 that matches the value at the indicated index in the provided number. **Remember, the right-most bit has an index value of 1!**

**Hint:** Think about how you can use the argument passed to `bit_index` to construct the bit mask that you'll need to apply. Then use the AND `&` operation and right-shift `>>` to return just the indicated bit value.

In [None]:
def get_bit(value, bit_index):
    bitmask = ...
    result = ...
    ...

In [None]:
get_bit(0b10100001, bit_index=1)

In [None]:
grader.check("q1")

### Setting a Bit

Defining a bit to have a value of 1 is called *setting a bit*, and the operation is similar to getting the value of a bit. You take advantage of the same bitmask as before, but instead of using bitwise AND, you use the bitwise OR operator (`|`, usually found near the Enter key on your keyboard).

Think about why this would be true. The bitwise OR operator will match up corresponding bits in two numbers and return:
* a 0 if both bits are 0, 
* a 1 if one bit is 0 and the other bit is 1, 
* a 1 if both bits are 1.

So if you wanted to set the bit at $b_4$ you could do:

```
number:  00010
bitmask: 01000 <-- b_4's value is 1, rest of bits are 0
--------------
result:  01010 (10 in decimal)
          ^ set to a 1 using the OR operation
```

In [None]:
0b00010 | 0b01000

### Question 2

Write a function, `set_bit`, that takes in an integer and the index of a bit in that number. The function should apply a bitmask that sets the bit at the indicated index to 1, and return that result.

In [None]:
def set_bit(value, bit_index):
    bitmask = ...
    result = ...
    ...

In [None]:
set_bit(0b10, bit_index = 4)

In [None]:
grader.check("q2")

### Unsetting a Bit

Defining a bit to have a value of 0 is called *unsetting a bit* or *clearing a bit*. You can use a bitmask and the bitwise AND (`&`) operator to accomplish this task.

```
number:  10010
bitmask: 11101 <-- b_2's value is 0, rest of bits are 1
--------------
result:  10000  (16 in decimal)
            ^ unset to a 0 using the AND operation
```

Notice how this is very similar to setting a bit, only the bitmask is inverted. That is, all the 0's are now 1's, and vice versa. It turns out, there's an operation that can do this for us! The invert (`~`) operation will invert an integer's binary form.

In [None]:
0b10010 & 0b11101

In [None]:
0b10010 & ~(0b00010)
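It can be surprising to print an inverted mask on its own, because Python integers are signed and have no fixed width. The sketch below, using the same five-bit values as the example above, shows that `~x` equals `-x - 1`, and that the inverted mask still behaves like `11101` when combined with AND. The variable names are illustrative.

```python
mask = 0b00010
inverted = ~mask

print(inverted)                           # -3, because ~x equals -x - 1 in Python
print(format(inverted & 0b11111, '05b'))  # 11101 when limited to five bits
print(format(0b10010 & inverted, '05b'))  # 10000
```

The negative number acts as if it had an endless run of 1's on its left, which is why every bit other than $b_2$ survives the AND.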

### Question 3

Write a function, `clear_bit`, that takes in an integer and the index of a bit in that number. The function should apply a bitmask that clears the bit at the indicated index (sets it to 0), and return that result.

In [None]:
def clear_bit(value, bit_index):
    bitmask = ...
    result = ...
    ...

In [None]:
clear_bit(0b11111111, bit_index=6)

In [None]:
grader.check("q3")

### Toggling a Bit

Lastly, sometimes you'll just want to toggle a bit from a 1 to a 0 or vice versa. We can accomplish that using a bitmask with a bitwise XOR (`^`) operation. Remember, the bitwise XOR operation returns:

* a 0 if the corresponding bits are the same (both 0's or both 1's)
* a 1 if the corresponding bits are different (one is a 0 and the other is a 1)

Example #1
```
number:  10010
bitmask: 00100 <-- b_3's value is 1, rest of bits are 0
--------------
result:  10110 (decimal value of 22)
           ^   toggled using a 1 in the bitmask with the XOR operation
```

In [None]:
0b10010 ^ 0b00100

Example #2
```
number:  10010
bitmask: 10000 <-- b_5's value is 1, rest of bits are 0
--------------
result:  00010 (decimal value of 2)
         ^     toggled using a 1 in the bitmask with the XOR operation
```

In [None]:
0b10010 ^ 0b10000
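One property of XOR worth noticing is that applying the same bitmask twice restores the original number. The short sketch below reuses the values from Example #1; `once` and `twice` are illustrative names.

```python
number = 0b10010
mask = 0b00100

once = number ^ mask   # toggles b_3 on
twice = once ^ mask    # toggles b_3 back off
print(format(once, '05b'), format(twice, '05b'))  # 10110 10010
```

This is the same property that lets a stream cipher encrypt and decrypt with one XOR against the same keystream.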

### Question 4

Write a function, `toggle_bit`, that takes in an integer and the index of a bit in that number. The function should apply a bitmask that toggles the bit at the indicated index and return the result.

In [None]:
def toggle_bit(value, bit_index):
    bitmask = ...
    result = ...
    ...

In [None]:
toggle_bit(0b10010, bit_index=3)

In [None]:
grader.check("q4")

## LFSR PRNGs

Now that you understand how to shift, get, set, unset, and toggle bit values, you know more than enough to write a function that can mimic the LFSR process of generating pseudorandom digits. Let's recall the process of an LFSR, but use the language of shift, get, set, unset, and toggle to describe the operations.

We'll define the LFSR system as having a current state defined by the bits $b_n, b_{n-1}, ..., b_2, b_1$ (this is like a row in the table representation that you've seen in class) and an output stream defined by $k_n, k_{n-1}, ..., k_2, k_1$ (these are the bits in the right-most column of the table representation from class).

An LFSR System:
* **gets** the value of the bit at $b_1$ in the current state
* The output stream is **left-shifted** by 1, and the value of the new $k_1$ is either **set** or **unset** to match $b_1$
* Using the LFSR definition, you **get** the value of some of the bits and **XOR** those values
* The current state is **right-shifted** by 1
* The result of the XOR is used to **set** or **unset** $b_n$, the left-most bit that the shift just vacated, in the shifted current state
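
The steps above can be traced with a plain Python list standing in for one row of the table, before any bitwise operators are involved. This is only a sketch under stated assumptions: the 4-bit size, the taps at $b_2$ and $b_1$, the seed, and the name `lfsr_table` are all invented for illustration and are not the LFSR used in this lab.

```python
def lfsr_table(seed_bits, taps, num_bits):
    """Trace an LFSR using a list laid out as [b_n, ..., b_2, b_1]."""
    state = list(seed_bits)
    n = len(state)
    output = []
    for _ in range(num_bits):
        # Get b_1, the right-most bit, and append it to the output stream.
        output.append(state[-1])
        # XOR together the tapped bits of the current state.
        new_bit = 0
        for tap in taps:
            new_bit ^= state[n - tap]  # b_tap lives at list index n - tap
        # Shift right: drop b_1 and place the new bit at the left end (b_n).
        state = [new_bit] + state[:-1]
    return output

# Hypothetical 4-bit example with taps at b_2 and b_1.
print(lfsr_table([1, 0, 0, 1], taps=(2, 1), num_bits=8))  # [1, 0, 0, 1, 1, 0, 1, 0]
```

The list version makes each row of the table visible, while the integer version built from shifts and bitmasks does the same work without storing the bits separately.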

### Question 5

Write a function, `lfsr8`, that implements an 8-bit LFSR system with taps at bits 5, 4, 3, and 1. The function should take in a seed value and an integer that represents how many bits the LFSR should output. Use the comments in the scaffolded function below to help you think through what you could do to complete each operation.

In [None]:
def lfsr8(seed, num_bits):
    """
            8 7 6 5 4 3 2 1 (bits == 8)
           ┌─┬─┬─┬─┬─┬─┬─┬─┐
        ┌─→│1│0│1│0│0│0│0│1├─→ output
        │  └─┴─┴─┴┬┴┬┴┬┴─┴┬┘
        └──────XOR┘ │ │   │
                └─XOR─┘   │
                      └XOR┘
                          (taps == 5, 4, 3, 1)
    """
    current_state = seed
    output = 0
    
    for i in range(num_bits):
        
        # get the value of the bit at b_1 in the current state
        output_bit = ...
        
        # left-shift the output stream by 1
        shifted_output = ...
        
        # if the output bit is 1, set k_1 to 1
        if output_bit == 1:
            output = ...
        else: # else, just leave k_1 as 0 (assign shifted_output to output "as is")
            output = ...
        
        # Get the value of the tapped bits (5, 4, 3, and 1) and XOR those values to generate a new bit
        new_bit = ...
        
        # Right-shift the current state by 1, creating room for a new bit at position 8
        shifted_current_state = ...
        
        # if the new_bit is 1, then set b_8 in shifted_current_state to 1 and assign to current_state
        if new_bit == 1:
            current_state = ...
        else: # otherwise, just leave b_8 as 0 by assigning shifted_current_state to current_state
            current_state = ...
    
    return output

In [None]:
format( lfsr8(0b10100001, 100), 'b')

In [None]:
grader.check("q5")

## A known-plaintext attack in action

When data is stored on the computer as a file, there's frequently some additional data tucked away in the file called *metadata* or a *file header*. This additional data is usually used by your computer to help identify what kind of data the file contains (i.e. is it an image, a spreadsheet, a database, etc.) so it knows what to do with a file when you double click on it in your file explorer. These *file headers* are usually very well structured to help the computer to know what data to find where. This structure, however, can be helpful to an attacker! Since the header is going to be the same for every file of the same type, if the attacker knows what kind of file was encrypted then they have some guaranteed known plaintext about the file.

Take for example the following blue square, saved as a *bitmap file*, which is an image file that ends in `.bmp`.

![Blue Square](blue.bmp)

Using a special tool called `xxd` we can inspect this file at the bit level. Run the cell below to show the first 78 bytes (624 bits) of the file.

In [None]:
!xxd -l 78 -b blue.bmp

The first column is a counter of how many bytes have been displayed. The counter keeps track in hexadecimal. The far-right column displays each byte (8 bits) as an ASCII character, if that value corresponds to a printable ASCII character. The middle columns show the data stored in the file.

Now, take this image of a gold square

![Gold Square](gold.bmp)

and inspect the first 78 bytes of its contents:

In [None]:
!xxd -l 78 -b gold.bmp

Visually inspecting the contents of both files shows that they have the exact same contents up to the row labeled `00000036`. That means that the first 54 bytes (432 bits) of data are identical in these two files! The actual data about the image itself must not start until row `00000036`, and everything before is part of the image *header*.
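
The visual comparison can also be done in code by counting how many leading bytes two byte strings share. The sketch below is a minimal illustration: the helper name `common_prefix_length` is hypothetical, and `file_a` and `file_b` are invented stand-ins rather than the contents of the lab's image files.

```python
def common_prefix_length(first, second):
    """Return how many leading bytes two bytes objects share."""
    count = 0
    for x, y in zip(first, second):
        if x != y:
            break
        count += 1
    return count

# Invented stand-ins: a shared 4-byte "header" followed by different "pixel" bytes.
file_a = b'BM\x00\x00' + b'\xff\x00\x00'
file_b = b'BM\x00\x00' + b'\x00\xd7\xff'
print(common_prefix_length(file_a, file_b))  # 4
```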

Let's see how knowing that all bitmap files of this type will start the same could be helpful to us as an attacker when someone reuses a keystream.

## Working with the binary

The code below will pull out the binary that represents the entirety of the `blue.bmp` and `gold.bmp` files and store them as integers named `plaintext_blue` and `plaintext_gold` respectively.

In [None]:
plaintext_blue = ''
plaintext_gold = ''

# Read each file as raw bytes, write every byte as 8 binary digits,
# then convert the whole string of digits into a single integer.
with open("blue.bmp", "rb") as file_object:
    my_file = file_object.read()
    for i in range(0, len(my_file)):
        plaintext_blue += format(my_file[i], '08b')

    plaintext_blue = int(plaintext_blue, 2)

with open("gold.bmp", "rb") as file_object:
    my_file = file_object.read()
    for i in range(0, len(my_file)):
        plaintext_gold += format(my_file[i], '08b')

    plaintext_gold = int(plaintext_gold, 2)
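
To see what this conversion does on a small scale, the sketch below applies the same idea to just two bytes, `BM`, the ASCII signature that commonly begins a bitmap file. The names `data`, `bit_string`, and `as_int` are illustrative. It also compares the result against the built-in `int.from_bytes`, which should give the same integer.

```python
data = b'BM'  # two example bytes, not read from the lab files

bit_string = ''.join(format(byte, '08b') for byte in data)
as_int = int(bit_string, 2)

print(bit_string)                             # 0100001001001101
print(as_int == int.from_bytes(data, 'big'))  # True
print(len(bit_string), as_int.bit_length())   # 16 15
```

The last line shows that a leading 0 bit is not counted once the bits are stored as an integer, so `format(as_int, 'b')` starts at the first 1 bit.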

Let's look at the last 100 binary digits of these files. We can see that while the start of the files was the same due to the header, the ends of these files are in fact different.

In [None]:
format(plaintext_blue, 'b')[-100:]

In [None]:
format(plaintext_gold, 'b')[-100:]

### Question 6

Use the `lfsr8` function to generate a keystream with the seed value of `0b11010101` that's long enough to encrypt each of these files. Use the same keystream to encrypt each plaintext. Store the ciphertext to `ciphertext_blue` and `ciphertext_gold` respectively. The ciphertexts should be integers!

**Hint:** Don't forget that the `.bit_length()` method can be used to figure out how many bits are needed to represent a number in binary!

In [None]:
keystream = lfsr8(0b11010101, plaintext_blue.bit_length())
ciphertext_blue = plaintext_blue ^ keystream
ciphertext_gold = plaintext_gold ^ keystream

In [None]:
grader.check("q6")

Looking at the first 400 bits of the ciphertexts, we can see that the plaintext bitmap headers were in fact encrypted to the exact same ciphertext.

In [None]:
print(format(ciphertext_blue, 'b')[0:400])

In [None]:
print(format(ciphertext_gold, 'b')[0:400])

It's this predictability that leads to the opportunity for a known-plaintext attack!

Suppose you suspected that the encrypted file was a bitmap (`.bmp`). You could XOR the intercepted ciphertext (for either file!) with the known header for bitmap files, which would produce the first 400 bits of the keystream. Unless the keystream has a long enough period, these 400 recovered bits would be more than enough to deduce the seed value of the LFSR. In fact, when the tap positions are known, obtaining $n$ consecutive bits of the keystream for an $n$-bit LFSR is enough to determine the seed value! Since we used an 8-bit LFSR to encrypt these images, knowing just the first 8 bits of the header would be enough to eventually produce the entire keystream and decrypt the entire file.
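
The XOR step of the attack can be sketched with small made-up numbers. Every value below is an invented 16-bit example rather than data from the lab files, and the variable names are illustrative. The sketch also shows a second consequence of reusing a keystream: XORing two ciphertexts cancels the keystream entirely.

```python
# All values are invented 16-bit examples, not data from the lab files.
known_header = 0b0100001001001101      # plaintext the attacker already knows
other_plaintext = 0b0100001011110000   # a second message, unknown to the attacker
secret_keystream = 0b1011001110001111  # what the sender's LFSR produced

ciphertext_1 = known_header ^ secret_keystream
ciphertext_2 = other_plaintext ^ secret_keystream  # same keystream reused

# XORing the ciphertext with the known plaintext exposes the keystream.
recovered = ciphertext_1 ^ known_header
print(recovered == secret_keystream)  # True

# The recovered keystream then decrypts the other message.
print((ciphertext_2 ^ recovered) == other_plaintext)  # True

# Even without any known plaintext, the keystream cancels between ciphertexts.
print((ciphertext_1 ^ ciphertext_2) == (known_header ^ other_plaintext))  # True
```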

## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit. **Please save before exporting!**

In [None]:
# Save your notebook first, then run this cell to export your submission.
grader.export(pdf=False)