In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("hw02A.ipynb")

In [None]:
def text_clean( text, LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'):
    """
    Arguments:
        text (str): a piece of text for cleaning
        LETTERS (str, optional): defines the alphabet of allowable characters
    Returns:
        (str): text with only the characters also found in LETTERS
               lower-case letters in text will be made upper-case  
    """

    cleaned_text = '' 
    
    for character in text: 
        if character.upper() in LETTERS:
            cleaned_text += character.upper()
    
    return cleaned_text

In [None]:
def text_block( text, size = 5 ):
    """
    Arguments:
        text (str): text to block
        size (int, optional): # of characters in a block
    Returns:
        (str): text blocked into groups of specified size
    """
    
    blocked_text = '' 
    
    for character in text: 
        if len(blocked_text.replace(' ', '') ) % size == 0 and len(blocked_text) != 0:
            blocked_text += ' '

        blocked_text += character
    
    return blocked_text

# Homework 02: Polyalphabetic Ciphers and Cryptanalysis
At this point, we know enough about polyalphabetic substitution ciphers and how to analyze them to complete a homework that covers the Vigenère cipher, the autokey cipher, and some concepts from the one-time pad (OTP).

## Imports 
To get you started, run the cell below to load functions that were written in earlier homework assignments. The included functions are:

* `text_clean`
* `text_block`

You can use these functions in any of your code below after you’ve imported them.

In [None]:
from hw02toolkit import *

## Preamble
Consider the earliest Caesar cipher program we wrote; it appears below.

In [None]:
# here is the message, already cleaned

message = "ATTACKATDAWN"

# here is the plain text alphabet lined up with the ciphertext alphabet.
# Notice that to change the key we would have to change the second alphabet.

PLAIN_LETTERS  = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
CAESAR_LETTERS = "XYZABCDEFGHIJKLMNOPQRSTUVW"

output = ''

for i in range(len(message)):
    num = PLAIN_LETTERS.find(message[i])
    output += CAESAR_LETTERS[num]
    
print(output)    

We can adapt these lines of code a little bit to build the Vigenère cipher. But <b>first</b> we need to have a key that is as long as our message.
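
As a rough sketch of that adaptation, the fixed `CAESAR_LETTERS` lookup can be replaced by modular arithmetic on letter positions, so that the shift can differ from letter to letter. The helper name `shift_letter` and the sample key `LEMONLEMONLE` are assumptions made for this illustration; they are not part of the homework toolkit.

```python
PLAIN_LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def shift_letter(letter, key_letter, letters=PLAIN_LETTERS):
    # Hypothetical helper: move `letter` forward by the position of `key_letter`.
    shift = letters.find(key_letter)
    return letters[(letters.find(letter) + shift) % len(letters)]

message = "ATTACKATDAWN"

# Using the same key letter everywhere behaves like a Caesar cipher
# (key letter X maps A -> X, matching the alphabets in the cell above).
caesar_output = ''.join(shift_letter(m, 'X') for m in message)
print(caesar_output)

# Letting the key letter change at each position makes the cipher polyalphabetic.
key = "LEMONLEMONLE"
poly_output = ''.join(shift_letter(m, k) for m, k in zip(message, key))
print(poly_output)
```

The only difference between the two calls is where the key letter comes from, which is why a key as long as the message is the first thing to build.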

## Question 1: Vigenère Key Generation 
When using the Vigenère cipher, you have the luxury of knowing exactly how long the plaintext or ciphertext is before you start to encipher or decipher. With other cipher types, such as stream ciphers, you don’t always have the complete message before you begin. For the Vigenère cipher, that means you can create the entire keystream from the keyword (or primer) before you start working on the message.

In the cell below write a function that will take in a Vigenère primer / keyword and generate the correct Vigenère keystream when provided the length of the message.

NOTE: You can assume that primer is already “cleaned” before it’s received in this function, so do NOT clean the primer string.

`Example:`

print( vigenere_keygen('TEST', 10) )
TESTTESTTE
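
One possible way to picture the keystream, sketched here with the standard-library `itertools` module rather than as the required solution, is to cycle through the primer and stop after `message_length` characters. The name `keystream_sketch` is made up for this illustration.

```python
from itertools import cycle, islice

def keystream_sketch(primer, message_length):
    # Hypothetical helper: repeat the primer endlessly and keep
    # only the first `message_length` letters.
    if not primer:
        raise ValueError("primer must not be empty")
    return ''.join(islice(cycle(primer), message_length))

print(keystream_sketch('TEST', 10))
```

A plain `while` loop that appends the primer and then slices the result does the same job; the choice is a matter of taste.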

In [None]:
def vigenere_keygen(primer, message_length):
    """
    Arguments:
        primer (str): the primer / keyword that will create the entire keystream
        message_length (int): # of characters in the CLEANED message (no spaces or punctuation)
    Returns:
        keystream (str): The entire keystream that can be combined with the message to encipher or decipher
    """

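    if not primer:
        raise ValueError("primer must contain at least one character")

    # Repeat the primer until the keystream covers the message, then trim the excess
    keystream = ''
    while len(keystream) < message_length:
        keystream += primer
    return keystream[:message_length]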

In [None]:
grader.check("q1")

## Question 2: Vigenère Cipher Function 
Write a function that implements the Vigenère cipher. The function should be able to encipher and decipher messages depending on the value of the `encipher` parameter. Your function should clean the provided message based on the provided `LETTERS` string using the `text_clean` function. Ciphertext output should be blocked into groups of 5 uppercase characters, and plaintext output should be returned as lowercase characters with no spaces.

<b>Hint: You can use your vigenere_keygen function from the previous question if you’d like, but there are no tests that will specifically ensure that you do.</b>

<i> Examples: </i>

print( vigenere('hospital', 'onaplaneaplaneisdue') )

VBSET TNPHD DPVXI DKIW

print( vigenere('hospital', 'VBSET TNPHD DPVXI DKIW', encipher=False) )

onaplaneaplaneisdue

<!-- BEGIN QUESTION -->



In [None]:
def vigenere(primer, message, encipher=True, LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'):
    """
    Arguments:
        primer (str): the primer / keyword that will be used to create the entire keystream
        message (str): either the plaintext or ciphertext to work with
        encipher (bool, optional): True --> encipher the message, False --> decipher
        LETTERS (str, optional): defines the alphabet of allowable characters
    Returns:
        (str): encrypted / decrypted version of message formatted to specifications
    """
    # Clean both inputs against the same alphabet, then stretch the keyword to the message length
    cleaned_keyword = text_clean(primer, LETTERS)
    cleaned_message = text_clean(message, LETTERS)
    keystream = vigenere_keygen(cleaned_keyword, len(cleaned_message))
    output = ''

    # Your code below this comment
    # Add the key letter's position to encipher, subtract it to decipher
    direction = 1 if encipher else -1
    for message_char, key_char in zip(cleaned_message, keystream):
        num = LETTERS.find(message_char)
        num2 = LETTERS.find(key_char)
        output += LETTERS[(num + direction * num2) % len(LETTERS)]

    # Ciphertext is blocked in groups of 5; plaintext is lowercase with no spaces
    if encipher:
        return text_block(output, 5)
    return output.lower()

Try these messages from recent headlines:

print( vigenere('Zelensky', 'Ukraine has sent reinforcements to Bakhmut') )

>'TOCEV FOFZW DIALB CHRQS EUOKD REWGG LYJLX YG'

print( vigenere('hospital', 'LZAAQ ELJZO ASQMW ZBZVX UFEOP OLTTR RPKIU TBAEA YWUTW YIEZU WCMKI NCSJH QHNZM VMBIE OR', encipher = False) )

>'elilillysaiditwouldimmediatelyreducethepriceofitsgenericversionofhumalog'

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

## Question 3: Autokey Cipher

First, by hand, encipher the message "Hello" with the keyword "key" using autokey. Save your solution to a string called 'solution' in the cell below.

In [None]:
solution = ...

<!-- END QUESTION -->



The autokey cipher does not allow you to compute the entire keystream from the start when deciphering messages, since you need to recover some of the plaintext before you can continue constructing the keystream. As a result, the function for the autokey cipher will need to be a bit different from the ones you've already written for the Caesar, affine, and now Vigenère ciphers. You will need to think carefully about how to extend the keystream after each letter you encipher or decipher so that it always has enough characters to finish processing the message.

**Examples**
```
>>> print( autokey('UNICORN', 'acceptthegreaterchallenge' ) )
UPKGD KGHGI VTTML VIYEL EIEIL

>>> print( autokey('unicorn', 'UPKGD KGHGI VTTML VIYEL EIEIL', False) )
acceptthegreaterchallenge
```
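
To make the deciphering constraint concrete, here is a minimal trace, assuming the standard 26-letter alphabet and inputs that are already cleaned: the keystream starts as just the keyword and grows by one letter each time a plaintext letter is recovered. The function name `autokey_decipher_trace` is hypothetical and exists only for this illustration.

```python
LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

def autokey_decipher_trace(keyword, ciphertext, letters=LETTERS):
    # Hypothetical helper: inputs are assumed to be cleaned, upper-case strings
    # and the keyword is assumed to be non-empty.
    keystream = keyword
    plaintext = ''
    for i, cipher_char in enumerate(ciphertext):
        # keystream[i] always exists: it holds the keyword plus i recovered letters.
        shift = letters.find(keystream[i])
        plain_char = letters[(letters.find(cipher_char) - shift) % len(letters)]
        plaintext += plain_char
        keystream += plain_char  # the recovered letter becomes future key material
    return plaintext, keystream

plaintext, keystream = autokey_decipher_trace('UNICORN', 'UPKGDKGHGIVTTMLVIYELEIEIL')
print(plaintext.lower())
print(keystream)
```

Printing `keystream` at the end shows the keyword followed by the recovered plaintext, which is exactly the keystream the sender could build up front when enciphering.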

In [None]:
def autokey(keyword, message, encipher = True, LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'):
    """
    Arguments:
        keyword (str): the primer / keyword that will be used to create the keystream
        message (str): either the plaintext or ciphertext to work with
        encipher (bool, optional): True --> encipher the message, False --> decipher
        LETTERS (str, optional): defines the alphabet of allowable characters
    Returns:
        (str): encrypted / decrypted version of message formatting to specifications
    """
    cleaned_keyword = text_clean(keyword, LETTERS)
    cleaned_message = text_clean(message, LETTERS)
    output = ''

    if encipher:
        # The keystream is the keyword followed by the plaintext itself
        keystream = (cleaned_keyword + cleaned_message)[:len(cleaned_message)]
        for i in range(len(cleaned_message)):
            num = LETTERS.find(cleaned_message[i])
            num2 = LETTERS.find(keystream[i])
            output += LETTERS[(num + num2) % len(LETTERS)]
        return text_block(output)

    # Deciphering: start with the keyword and append each recovered plaintext letter
    keystream = cleaned_keyword
    for i in range(len(cleaned_message)):
        num = LETTERS.find(cleaned_message[i])
        num2 = LETTERS.find(keystream[i])
        num3 = (num - num2) % len(LETTERS)
        output += LETTERS[num3].lower()
        keystream += LETTERS[num3]
    return output
  


In [None]:
#Try these:

print(autokey('UNICORN', 'acceptthegreaterchallenge'))
print(autokey('unicorn', 'UPKGD KGHGI VTTML VIYEL EIEIL', False)) 

In [None]:
grader.check("q3_2")

<!-- BEGIN QUESTION -->

## Question 4: Character Frequency

One goal of the Vigenère cipher is to disguise character frequencies, ideally producing a nearly uniform distribution over the letters of the alphabet so that an attacker cannot rely on simple frequency analysis to crack the message. Let's see how well your Vigenère function disguises a message.
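
As a sketch of what "disguising character frequencies" means in practice, the snippet below counts relative letter frequencies with `collections.Counter`. The helper name `letter_frequencies` and the sample string are assumptions made for illustration; they are not part of the homework toolkit.

```python
from collections import Counter

LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

def letter_frequencies(text, letters=LETTERS):
    # Hypothetical helper: relative frequency of each letter in `letters`,
    # ignoring case and any character outside the alphabet.
    counts = Counter(ch for ch in text.upper() if ch in letters)
    total = sum(counts.values())
    return {ch: (counts[ch] / total if total else 0.0) for ch in letters}

sample = "attack at dawn"
freqs = letter_frequencies(sample)

# Show the three most common letters in the sample.
for ch, freq in sorted(freqs.items(), key=lambda item: item[1], reverse=True)[:3]:
    print(ch, round(freq, 3))
```

Running the same helper on a plaintext and on its ciphertext gives two profiles to compare: the flatter the ciphertext profile, the less a frequency count reveals.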

Run the code cell below to load the sample plaintext from `hw02.txt` into the variable `plaintext`. The file contains the entire book, *The Scarlet Letter*.

In [None]:
with open('hw02.txt') as f:
    plaintext = f.read() 

In the code cell below, encipher this message using the `autokey` function with a keyword of your choosing. Save the result to the variable `ciphertext`.


In [None]:
ciphertext = autokey("Unicorn",plaintext[0:100000])

Run the following cell to import `matplotlib.pyplot` as `plt`.

In [None]:
import matplotlib.pyplot as plt

<!-- BEGIN QUESTION -->

Now create a graph of the character frequencies as before.

In [None]:
frequencies = ...

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

What do you notice about this plot? Type your response into the cell below:

_Type your answer here, replacing this text._

<!-- END QUESTION -->

# Submitting your work
You're done with this homework! All assignments in the course will be distributed as notebooks like this one, and you will submit your work by doing the following:
* Save your notebook
* Restart the kernel and run up to this cell.
* Run all the tests by running the cell containing `grader.check_all()`. Make sure they pass the way you expect them to.
* Run the cell below with the code `grader.export(...)`.
* Download the file named `labXX.zip`, found in the explorer pane on the left side of the screen.
* Upload `labXX-<date-time stamp>.zip` to the corresponding lab assignment on Canvas.

## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit.

When done exporting, find the `.zip` file in the file browser on the left side of the screen, right-click it, and select **Download**. You'll submit this `.zip` file to the assignment on Gradescope for grading.

In [None]:
grader.export(pdf=False, force_save=True)