In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("activity-12-decrypt-the-classics.ipynb")

# Activity 12: Decrypt the Classics

In **Activity 11: Encrypt the Classics**, you generated a ciphertext from a text file of a classic novel from Project Gutenberg (https://www.gutenberg.org). In this activity, you will decrypt a classmate's message without prior knowledge of the key. To do so, you'll implement a brute-force attack combined with $\chi^2$ scoring to automatically choose the most likely candidate plaintext.

## Step 1: Sharing Ciphertext

### Question 1.1

Revisit the notebook you created for Activity 11, in which you created the ciphertext used to create the bar chart. Add a cell to the end of the notebook to save this ciphertext to a text file (see Lab 02 for a reminder on how to do this!). Your ciphertext should contain only the 26 uppercase English letters: no spaces, punctuation, or other symbols.

Name this file as `activity-12-<first name initial><last name initial>.txt`. For example, Julius Caesar would name his file `activity-12-jc.txt`. Download this file to your local computer.
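As a reminder of the general pattern, here is a minimal sketch of writing a string to a text file. The ciphertext value and the temporary folder are stand-ins chosen for illustration; in your Activity 11 notebook you would use your own ciphertext variable and your own file name.

```python
import os
import tempfile

# Hypothetical ciphertext standing in for the one produced in Activity 11.
ciphertext = "XLMWMWEWIGVIXQIWWEKI"

# Keep only the 26 uppercase English letters, as the instructions require.
LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
clean = "".join(ch for ch in ciphertext.upper() if ch in LETTERS)

# Written to a temporary folder so this sketch does not touch your course files.
path = os.path.join(tempfile.gettempdir(), "activity-12-jc.txt")
with open(path, "w") as f:
    f.write(clean)
```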

### Question 1.2

Reply to the Activity 12 thread on EdSTEM with a post in which you include this file so your classmates can analyze it later on in this activity.

### Question 1.3

Download another student's ciphertext from EdSTEM to your local computer, and then upload it to the `activities` subfolder contained in the course folder. That's the same location as the file you're working in right now.

### Question 1.4

Load the ciphertext from your classmate into this notebook (see Activity 11 or Lab 02 for a refresher) as a string stored to the variable named `book`.

#### WARNING

The variable `book` will contain a VERY long string. So long, in fact, that Datahub will struggle if you try to print out the whole thing at one time. When testing your code throughout this activity, never attempt to print the entire string at once; instead, print a small slice of it (100 or even 1,000 characters is no issue).

In [None]:
...
    book = ...

print(book[0:1000])

In [None]:
grader.check("q1_4")

## Step 2: Write a Scoring Function

In order to help determine the correct keys, you'll need a function to help score the candidate plaintexts. Recall the method for $\chi^2$ scoring:

$$ \chi^2 = \sum_{i=A}^Z \frac{\left( A_i-E_i \right)^2}{E_i} $$

where $A_i$ is the actual count of letter $i$ in the message, and $E_i$ is the expected count of letter $i$ in a message of that length. You do this calculation once for each of the 26 letters of the alphabet, and then sum the 26 results to determine the total score, $\chi^2$.
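As a small worked example, consider a made-up three-letter alphabet (an illustration only, not real English data) in which A, B, and C are expected to appear with frequencies 0.5, 0.25, and 0.25. For the 8-letter message `AAAAAABC`, the expected counts are $8 \times 0.5 = 4$, $8 \times 0.25 = 2$, and $8 \times 0.25 = 2$, while the actual counts are 6, 1, and 1, so:

$$ \chi^2 = \frac{(6-4)^2}{4} + \frac{(1-2)^2}{2} + \frac{(1-2)^2}{2} = 1 + 0.5 + 0.5 = 2 $$

A message whose actual counts matched the expected counts exactly would score 0, which is why lower scores indicate a better match.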

Let's work through a few of the calculations on their own, and then ultimately put them together in a function to perform the scoring all at once.

### Question 2.1: Counting the Characters

Write a `for` loop below that iterates over `LETTERS` one character at a time, and for each character counts the number of occurrences of that character in the string named `book`. Each time you count a character, append the value to a list named `count`.
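If you need a reminder of the two building blocks involved, the short sketch below shows Python's built-in `str.count` method and `list.append` on a toy word. The names `word` and `tallies` are made up for this illustration and are not part of the activity.

```python
word = "BANANA"

# str.count returns how many times a substring appears in the string.
print(word.count("A"))  # 3
print(word.count("N"))  # 2
print(word.count("Z"))  # 0 (letters that never appear count as zero)

# list.append adds one value to the end of a list.
tallies = []
tallies.append(word.count("A"))
tallies.append(word.count("N"))
print(tallies)  # [3, 2]
```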

In [None]:
LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
count = []

for char in LETTERS:
    ...

In [None]:
grader.check("q2_1")

### Question 2.2: Computing the Expected Characters

Write another `for` loop below that computes the **expected** number of occurrences of each letter of the standard English alphabet that the string named `book` would contain if it were correctly deciphered. Remember, this calculation is based on the distribution of English letters (provided for you from A-Z in the list named `standard_frequencies`) and the length of `book`. Each time you compute an expected count, append the value to the list named `expected`.
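To see the arithmetic for a single letter, here is a minimal sketch using the frequency of `E` from `standard_frequencies` and a hypothetical message length of 1,000 (picked only to keep the numbers easy to read). The variable names are illustrative and not required by the activity.

```python
frequency_of_e = 0.12702  # the entry for 'E' in standard_frequencies
message_length = 1000     # hypothetical length, not the length of your book

# Expected count = relative frequency * total number of letters.
expected_e = frequency_of_e * message_length
print(round(expected_e, 2))  # 127.02
```

Notice that an expected count does not have to be a whole number, even though an actual count always is.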

In [None]:
LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
standard_frequencies = [0.08167, 0.01492, 0.02782, 0.04253, 0.12702, 0.02228, 0.02015, 0.06094, 0.06966, 0.00153, 0.00772, 0.04025, 0.02406, 0.06749, 0.07507, 0.01929, 0.00095, 0.05987, 0.06327, 0.09056, 0.02758, 0.00978, 0.02360, 0.00150, 0.01974, 0.00074]
length_of_book = ...
expected  = []

for i in standard_frequencies:
    ...
    
print(expected)

In [None]:
grader.check("q2_2")

### Question 2.3: Scoring the Ciphertext

We know that the ciphertext you took from a classmate is NOT plaintext, so we'd expect it to have a poor (large) $\chi^2$ score. Let's confirm that by computing the score in the cell below. Remember, the values for $A_i$ (the actual counts in the candidate plaintext, or in this case the ciphertext) are in the list `count`, and the values for $E_i$ (the expected counts for a message of this length) are in the list `expected`. Pull out the corresponding values from these two lists to make your calculation for each of the 26 letters.

In [None]:
sub_scores = []

...
    ...

print(sub_scores)

In [None]:
grader.check("q2_3")

### Question 2.4

Now that you've got all the `sub_scores` in a list, write a quick line of code that will sum them up to finish computing the $\chi^2$ score.

In [None]:
score = ...
print(score)

In [None]:
grader.check("q2_4")

## Step 3: Putting it all Together

Now that you've done all the individual steps of the process, let's put it all together to determine the keys used to create the message.

### Question 3.1: Functionize It

Write a function `chi_squared_score` that takes in a candidate plaintext (`str`) and returns the chi-squared score (`float`) of that string. The start of that function is in the cell below. Make sure that you now refer to the function's parameter, `candidate`, instead of the string `book` that you were working with in Steps 1 and 2. We can pass `book` to the function if we want to, but it should also correctly score any other string that gets passed in.

In [None]:
def chi_squared_score( candidate ):
    LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    standard_frequencies = [0.08167, 0.01492, 0.02782, 0.04253, 0.12702, 0.02228, 0.02015, 0.06094, 0.06966, 0.00153, 0.00772, 0.04025, 0.02406, 0.06749, 0.07507, 0.01929, 0.00095, 0.05987, 0.06327, 0.09056, 0.02758, 0.00978, 0.02360, 0.00150, 0.01974, 0.00074]
    
    count = ...
    expected  = ...
    sub_scores = ...
    
    length_of_candidate = ...
    
    ...
        ...
        
    ...
        ...
        
    ...
        ...
    
    score = ...
    
    return score

In [None]:
grader.check("q3_1")

### Question 3.2: Determine the Keys

The code below implements a brute-force attack on the ciphertext contained in `book` and uses your chi-squared function to score each candidate plaintext. An outer `for` loop selects each valid multiplicative key `km`, and for each of those keys an inner `for` loop selects each valid additive key `ka`. For each of the 12 × 26 = 312 key pairs, the code deciphers the text, computes the score, and stores `km`, `ka`, and the $\chi^2$ score together in a list. Each of these lists is appended to a list named `results`, which ends up holding all 312 of them. The code then sorts `results` by $\chi^2$ score so that the `km` and `ka` that generated the lowest score come first. Assuming everything before this cell worked correctly, you should just be able to run the cell below to generate the results.
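The sketch below illustrates where the 312 comes from and what deciphering with one key pair looks like. It assumes the common affine convention in which a letter at position $p$ is enciphered as $(k_m \cdot p + k_a) \bmod 26$; the function `toy_affine_decipher` is a hypothetical stand-in written for this illustration and is not the `affine` function from `activity12toolkit`, which may work differently.

```python
from math import gcd

# A multiplicative key is usable only if it shares no factor with 26.
valid_km = [k for k in range(1, 26) if gcd(k, 26) == 1]
print(valid_km)            # [1, 3, 5, 7, 9, 11, 15, 17, 19, 21, 23, 25]
print(len(valid_km) * 26)  # 312 key pairs in total

LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def toy_affine_decipher(ciphertext, km, ka):
    """Hypothetical helper, assuming encipherment was c = (km * p + ka) mod 26."""
    km_inverse = pow(km, -1, 26)  # modular inverse; needs Python 3.8 or later
    plain = ""
    for ch in ciphertext:
        p = (km_inverse * (LETTERS.index(ch) - ka)) % 26
        plain += LETTERS[p]
    return plain

print(toy_affine_decipher("RCLLA", 5, 8))  # HELLO
```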

**Note:** This next cell will likely run for a minute or two, so be patient. Keep in mind that it's deciphering the book you've imported 312 times, and for each of those candidates it's counting every character and then computing the $\chi^2$ score for the candidate. It's going to take a little while!

**Note 2:** While you don't need to fully understand the code below (especially the last line and the parts that make the progress bar run), you should be able to read it and get a feel for what the code is doing.
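If the sorting line is unfamiliar, this small sketch shows what `sorted` with `itemgetter(2)` does to a list of `[km, ka, score]` lists. The list `toy_results` and its scores are made up for illustration.

```python
from operator import itemgetter

# Each inner list is [km, ka, score]; these scores are invented.
toy_results = [[1, 0, 950.2], [5, 8, 31.7], [3, 4, 1204.9]]

# itemgetter(2) tells sorted to compare the lists by the item at index 2.
toy_results = sorted(toy_results, key=itemgetter(2))

print(toy_results[0])  # [5, 8, 31.7]
```

After sorting, the entry with the lowest score sits at index 0, so its keys can be read from index 0 and index 1 of that entry.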

In [None]:
from operator import itemgetter
from activity12toolkit import affine
from tqdm import tqdm

results = []

with tqdm(total=312) as pbar:
    for km in [1, 3, 5, 7, 9, 11, 15, 17, 19, 21, 23, 25]:
        for ka in range(26):
            candidate_text = affine(book, km, ka, encipher=False)
            one_result = [km, ka, chi_squared_score( candidate_text )]
            results.append( one_result )
            pbar.update(1)

results = sorted(results, key=itemgetter(2))
print('Done!')

Once the cell above finishes running, you can run the cell below to see your results.

In [None]:
print(f"The multiplicative key {results[0][0]} and additive key {results[0][1]} yield the lowest chi-squared score of {results[0][2]}")
deciphered_book = affine(book, results[0][0], results[0][1], encipher=False).lower()
print(f"A sample of the best candidate is: {deciphered_book[0:1000]}")

### Wrapping Up

Assuming everything seems reasonable with your solution, go ahead and post a response on EdSTEM replying to the person whose ciphertext you chose. Include the keys you determined and, if you can figure it out from the sample plaintext above, the title of the book they chose. If you need more text to work from, you can inspect the variable `deciphered_book`, which should contain the best guess at the plaintext book. **Warning:** It may be very long, so it's best not to print the whole thing out. You could either save it to a file or just print some of the characters using a slice.

In [None]:
print( deciphered_book[0:1000] )