In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("lab.ipynb")

# Lab 7 – Regular Expressions

## DSC 80, Fall 2022

### Due Date: Monday, November 14th at 11:59 PM

## Instructions
Much like in DSC 10, this Jupyter Notebook contains the statements of the problems and provides code and Markdown cells to display your answers to the problems. Unlike DSC 10, the notebook is *only* for displaying a readable version of your final answers. The coding will be done in an accompanying `lab.py` file that is imported into the current notebook.

Labs and programming assignments will be graded in (at most) two ways:
1. The functions and classes in the accompanying `lab.py` file will be tested (a la DSC 20),
2. The notebook may be graded (if it contains free response questions or asks you to draw plots).

**Do not change the function names in the `lab.py` file!**
- The functions in the `lab.py` file are how your assignment is graded, and they are graded by their name.
- If you changed something you weren't supposed to, just use git to revert! Ask us if you need help with this, or google around for `git revert`.

**Tips for working in the notebook**:
- The notebooks serve to present the questions and give you a place to present your results for later review.
- The notebooks in *lab assignments* are not graded (only the `lab.py` file is submitted and graded).
- The notebook serves as a nice environment for 'pre-development' and experimentation before designing your function in your `lab.py` file. You can write code here, but make sure that all of your real work is in the `lab.py` file.

**Tips for developing in the `lab.py` file**:
- Do not change the function names in the starter code; grading is done using these function names.
- Do not change the docstrings in the functions. These are there to tell you if your work is on the right track!
- You are encouraged to write your own additional helper functions to solve the lab! 
- Always document your code!

### Importing code from `lab.py`

* We import our `lab.py` file that's contained in the same directory as this notebook.
* We use the `autoreload` notebook extension to make changes to our `lab.py` file immediately available in our notebook. Without this extension, we would need to restart the notebook kernel to see any changes to `lab.py` in the notebook.
    - `autoreload` is necessary because, upon import, `lab.py` is compiled to bytecode (in the directory `__pycache__`). Subsequent imports of `lab` merely import the existing compiled python.
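
As a rough illustration of what `autoreload` automates, the sketch below edits a module after importing it and then calls `importlib.reload` by hand. The module name `demo_lab` and its `answer` function are made up for this example; they are not part of the lab.

```python
import importlib
import os
import sys
import tempfile

# Write a tiny throwaway module (hypothetical stand-in for lab.py).
tmp_dir = tempfile.mkdtemp()
module_path = os.path.join(tmp_dir, "demo_lab.py")
with open(module_path, "w") as f:
    f.write("def answer():\n    return 1\n")

sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()
import demo_lab

first = demo_lab.answer()

# Edit the file on disk; the already-imported module object is unchanged.
with open(module_path, "w") as f:
    f.write("def answer():\n    return 2222\n")
stale = demo_lab.answer()

# Reloading re-executes the source, which is what autoreload does for us.
importlib.reload(demo_lab)
fresh = demo_lab.answer()

print(first, stale, fresh)
```

With `%autoreload 2` enabled, the manual `importlib.reload` step should not be needed in the notebook.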

In [23]:
%load_ext autoreload
%autoreload 2

In [24]:
from lab import *

In [25]:
import pandas as pd
import numpy as np
import os
import re

***Note:*** While working on the lab, check the Campuswire post titled "Lab 7 Released!" for any clarifications.

## Question 1 – Practice with Regular Expressions 🛠

Regular expressions can be tricky, and the best way to gain familiarity with them is through lots of practice. In this question, you will work through ten exercises, each of which requires you to write a regular expression that matches strings that satisfy certain criteria. Make sure to take a close look at the doctests for each function in `lab.py`, as they provide useful guidance for the types of strings you should and shouldn't match.

***Notes:*** 
- We recommend having [regex101.com](https://regex101.com/) open while working.

- Each exercise has a star rating, between 1 (⭐️) and 3 (⭐️⭐️⭐️) stars, indicating its difficulty level (1 being the easiest, 3 being the hardest). If you are spending lots of time on 1-star exercises, take a close look at the syntax from lecture, as there is probably an easier way of writing the necessary pattern!
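
The starter functions check a pattern with `search`, which succeeds if the pattern matches *anywhere* in the string. The sketch below uses a made-up pattern (three digits, not one of the exercises) and a hypothetical helper `matches` to show why the anchors `^` and `$` usually matter.

```python
import re

def matches(pattern, string):
    # Same style of check the starter code uses: compile, then search.
    return re.compile(pattern).search(string) is not None

unanchored = r'\d{3}'    # three digits anywhere in the string
anchored = r'^\d{3}$'    # the whole string is exactly three digits

print(matches(unanchored, 'abc123def'))  # True
print(matches(anchored, 'abc123def'))    # False
print(matches(anchored, '123'))          # True
```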

<br>

### Exercise 1 (⭐️)

Write a regular expression that matches strings that have `'['` as the third character and `']'` as the sixth character.

<br>

### Exercise 2 (⭐️)

Write a regular expression that matches strings that are phone numbers that start with `'(858)'` and follow the format `'(xxx) xxx-xxxx'` (`'x'` represents a digit).

***Note:*** There is a space between `'(xxx)'` and `'xxx-xxxx'`.

<br>

### Exercise 3 (⭐️)

Write a regular expression that matches strings that:
- are between 6 and 10 characters long (inclusive),
- contain only alphanumeric characters, whitespace and `'?'`, and
- end with `'?'`.

<br>

### Exercise 4 (⭐️⭐️)

Write a regular expression that matches strings with exactly two `'$'`, one of which is at the start of the string, such that:
- the characters between the two `'$'` can be anything (including nothing) except the lowercase letters `'a'`, `'b'`, and `'c'` and the character `'$'`, and
- the characters after the second `'$'` can only be the **lowercase or uppercase** letters `'a'`/`'A'`, `'b'`/`'B'`, and `'c'`/`'C'`, with every `'a'`/`'A'` before every `'b'`/`'B'`, and every `'b'`/`'B'` before every `'c'`/`'C'`. There must be at least one `'a'` or `'A'`, at least one `'b'` or `'B'`, and at least one `'c'` or `'C'`.
    

<br>

### Exercise 5 (⭐️)
Write a regular expression that matches strings that represent valid Python file names, including the extension. 

***Note:*** For simplicity, assume that file names contain only letters, numbers, and underscores (`'_'`).

<br>

### Exercise 6 (⭐️)
Write a regular expression that matches strings that:
- are made up of only lowercase letters and exactly one underscore (`'_'`), and
- have at least one lowercase letter on both sides of the underscore.

<br>

### Exercise 7 (⭐️)
Write a regular expression that matches strings that start with and end with an underscore (`'_'`).

<br>

### Exercise 8 (⭐️)

Apple serial numbers are strings of length 1 or more that are made up of any characters, other than
- the uppercase letter `'O'`, 
- the lowercase letter `'i'`, and 
- the number `'1'`.

Write a regular expression that matches strings that are valid Apple serial numbers.

<br>

### Exercise 9 (⭐️⭐️)

ID numbers are formatted as `'SC-NN-CCC-NNNN'`, where 
- SC represents state code in uppercase (e.g. `'CA'`),
- NN represents a number with 2 digits (e.g. `'98'`),
- CCC represents a three letter city code in uppercase (e.g. `'SAN'`), and
- NNNN represents a number with 4 digits (e.g. `'1024'`).

Write a regular expression that matches strings that are ID numbers corresponding to the cities of `'SAN'` or `'LAX'`, or the state of `'NY'`. Assume that there is only one city named `'SAN'` and only one city named `'LAX'`.

<br>

### Exercise 10 (⭐️⭐️⭐️)

Write a function named `match_10` that takes in a string and:
- converts the string to lowercase,
- removes all non-alphanumeric characters (i.e. removes everything that is not in the `\w` character class), and the letter `'a'`, and
- returns a list of every **non-overlapping** three-character substring in the remaining string, starting from the beginning of the string.
   
For instance, consider the following doctest:

```py
>>> match_10('Ab..DEF')
['bde']
```

Here's how `match_10` should process `'Ab..DEF'`:

1. Convert to lowercase: `'ab..def'`.
2. Remove non-alphanumeric characters and the letter `'a'`: `'bdef'`.
3. Starting from the beginning of the string, there is only a single non-overlapping three character substring: `'bde'`. Hence, we return `['bde']`.

***Note:*** Perform your operations in the exact order described above, otherwise your code may not pass all the tests.
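
The three steps above can be sketched as follows. This is one possible reading of the steps, not the reference solution; the name `match_10_sketch` is made up to avoid clashing with the function in `lab.py`.

```python
import re

def match_10_sketch(string):
    # 1. Convert to lowercase.
    lowered = string.lower()
    # 2. Remove everything outside the \w character class, plus the letter 'a'.
    kept = re.sub(r'\W|a', '', lowered)
    # 3. findall scans left to right and never reuses characters,
    #    so the three-character chunks it returns do not overlap.
    return re.findall(r'\w{3}', kept)

print(match_10_sketch('Ab..DEF'))  # ['bde']
```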

In [6]:
import re
def match_5(string):
    """
    Write a regular expression that matches strings that represent valid 
    Python file names, including the extension.
    Note: For simplicity, assume that file names contains only letters, numbers, and underscores ('_').
    
    DO NOT EDIT THE DOCSTRING!
    >>> match_5("dsc80.py")
    True
    >>> match_5("dsc80py")
    False
    >>> match_5("dsc80..py")
    False
    >>> match_5("dsc80+.py")
    False
    """
    pattern = r'^[a-zA-Z_0-9]+(\.py)$'

    # Do not edit following code
    prog = re.compile(pattern)
    return prog.search(string) is not None

match_5("dsc80+.py")

False

In [None]:
grader.check("q1")

## Question 2 – Capturing Groups in Regular Expressions 📡

The dataset stored in `data/messy.txt` contains personal information from a fictional website that a user scraped from web server logs. Within this dataset, there are four fields that are of interest to you:
1. Email Addresses (assume they are alphanumeric usernames and domain names)
2. [Social Security Numbers](https://en.wikipedia.org/wiki/Social_Security_number#Structure)
3. Bitcoin Addresses (alphanumeric strings of long length)
4. Street Addresses

Create a function `extract_personal` that takes in a string containing the contents of a server log file (like `open('data/messy.txt').read()`) and returns a **tuple of four separate lists** containing values of the 4 pieces of information listed above (in the order listed above). Do **not** keep empty values.

***Note:*** Since this data is messy, your function will be allowed to miss ~5% of the records in each list. Good spot checking using certain useful substrings (e.g. `'@'` for emails) should help assure correctness! Your function will be tested on a sample of the file `messy.txt`.

***Hint:*** There are multiple "delimiters" in use in the file; there are few enough of them that you can safely determine what they are.
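
As a small illustration of capturing groups, the sketch below runs `re.findall` on a single invented log line. The layout of `sample` (the delimiters and the `ssn:` label) is an assumption for this example and may not match every record in `messy.txt`.

```python
import re

# One made-up record; the real file may use different delimiters.
sample = '7\ttest@test.com|ssn:423-00-9575,530 High Street\n'

# Without a group, findall returns the entire match, label included.
whole = re.findall(r'ssn:\d{3}-\d{2}-\d{4}', sample)

# With one capturing group, findall returns only the group's contents.
captured = re.findall(r'ssn:(\d{3}-\d{2}-\d{4})', sample)

# Spot check: the number of '@' characters bounds the number of emails.
num_at_signs = sample.count('@')

print(whole)         # ['ssn:423-00-9575']
print(captured)      # ['423-00-9575']
print(num_at_signs)  # 1
```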

In [28]:
# experiment with extract_personal using the file s below
fp = os.path.join('data', 'messy.txt')
s = open(fp, encoding='utf8').read()

In [21]:
fp = os.path.join('data', 'messy.txt')
s = open(fp, encoding='utf8').read()
pre_emails = re.findall(
        pattern=r'[\t,|#]([a-zA-Z0-9]+@[a-zA-Z0-9.]+)',
        string= s
    )
print(len(pre_emails))

print(len([match for match in pre_emails if match]))

872
872


In [106]:
def extract_personal(s):
    """
    Extracts email addresses, Social Security Numbers, Bitcoin addresses and street addresses from input file
    :param s: file name as a string
    :return: a tuple of four separate lists
    
    :Example:
    >>> fp = os.path.join('data', 'messy.test.txt')
    >>> s = open(fp, encoding='utf8').read()
    >>> emails, ssn, bitcoin, addresses = extract_personal(s)
    >>> emails[0] == 'test@test.com'
    True
    >>> ssn[0] == '423-00-9575'
    True
    >>> bitcoin[0] == '1BvBMSEYstWetqTFn5Au4m4GFg7xJaNVN2'
    True
    >>> addresses[0] == '530 High Street'
    True
    """
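    # Emails: alphanumeric username and domain, with a delimiter on each side.
    emails = re.findall(r'[\t,|#](\w+@[\w.]+)[\t,|#]', s)

    # SSNs follow the 'ssn:' label in the format ddd-dd-dddd.
    ssn = re.findall(r'ssn:(\d{3}-\d{2}-\d{4})', s)

    # Bitcoin addresses follow the 'bitcoin:' label and end at the first
    # non-word character.
    bitcoin = re.findall(r'bitcoin:(\w+)\W', s)

    # Street addresses: a house number and street name at the end of a line.
    # A single capturing group makes findall return the address strings directly.
    addresses = re.findall(r'[\t,|](\d+ [\w ]+)\n', s)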

    return emails, ssn, bitcoin, addresses

fp = os.path.join('data', 'messy.test.txt')
s = open(fp, encoding='utf8').read()
emails, ssn, bitcoin, addresses = extract_personal(s)
# Earlier address pattern, kept for reference: r'(\t|,|\|)([0-9]+ [A-Za-z0-9 ]+)\n'
addresses

['530 High Street']

In [29]:
# don't change this cell, but do run it -- it is needed for the tests
test_fp = os.path.join('data', 'messy.test.txt')
test_s = open(test_fp, encoding='utf8').read()
emails, ssn, bitcoin, addresses = extract_personal(test_s)

In [None]:
grader.check("q2")

## Question 3 – TF-IDF 📊

The dataset `data/reviews.txt` contains [Amazon reviews](http://jmcauley.ucsd.edu/data/amazon/) for ~200k phones and phone accessories. The dataset has already been "cleaned" for you. In this question, you will create a function that takes in the reviews dataset as a Series (with one entry per review) as well as a single review, and returns the word that "best summarizes the single review" using TF-IDF.

To do so, implement the two functions below.

#### `tfidf_data`

Create a function `tfidf_data` that takes in the reviews data as a Series (`reviews_ser`) and a single review (`review`) and returns a DataFrame indexed by the words in `review` with four columns:
- `'cnt'`: the number of times each word is found in the review 
- `'tf'`: the term frequency for each word
- `'idf'`: the inverse document frequency for each word
- `'tfidf'`: the TF-IDF for each word

You may use a `for`-loop. The words in the outputted DataFrame may appear in any order.

***Hint:*** You may need to use the [`'\b'` character](https://www.regular-expressions.info/wordboundaries.html) somewhere.
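
To make the four columns concrete, here is a toy calculation for a single word. It assumes the usual definitions (term frequency = occurrences of the word divided by the number of words in the review; inverse document frequency = log of the number of reviews divided by the number of reviews containing the word). The corpus and review below are invented for illustration.

```python
import re

import numpy as np
import pandas as pd

# Invented four-review corpus and a single review from it.
corpus = pd.Series([
    'great phone',
    'headphones are fine',
    'phone case is great',
    'love it',
])
review = 'phone case is great'
word = 'phone'

# Without word boundaries, 'phone' also matches inside 'headphones'.
loose = corpus.str.contains(word).sum()

# \b on both sides restricts the match to the standalone word.
pattern = r'\b' + re.escape(word) + r'\b'
strict = corpus.str.contains(pattern).sum()

cnt = len(re.findall(pattern, review))
tf = cnt / len(review.split())
idf = np.log(len(corpus) / strict)
tfidf = tf * idf

print(loose, strict)  # 3 2
print(cnt, tf)        # 1 0.25
```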
    
<br>

#### `relevant_word`

Create a function `relevant_word` that takes in the DataFrame that `tfidf_data` returns and returns the word that "best summarizes" the review. If there are multiple "best" summary words, return any one of them.

In [26]:
# experiment with tfidf_data using reviews_ser and review below 
fp = os.path.join('data', 'reviews.txt')
reviews_ser = pd.read_csv(fp, header=None).squeeze("columns")
review = open(os.path.join('data', 'review.txt'), encoding='utf8').read().strip()

In [96]:
review_len = len(review.split())
words = set(review.split())
out = pd.DataFrame(columns=['cnt', 'tf', 'idf', 'tfidf'], index=words)


for word in words:
    pattern = '\\b' + str(word) + '\\b'
    word_ct = len(re.findall(pattern=pattern, string=review))
    tf = word_ct / review_len
    idf = (
        np.log(reviews_ser.size/
        np.count_nonzero(reviews_ser.str.contains(pattern)))
                )
    tfidf= tf*idf
    out.loc[word] = [word_ct, tf, idf, tfidf]



In [109]:
out = out.astype(np.float64)
out['cnt'] = out['cnt'].astype(np.int64)
out['cnt'].sum()

'before' in out.index

def tfidf_data(reviews_ser, review):
    """
    :Example:
    >>> fp = os.path.join('data', 'reviews.txt')
    >>> reviews_ser = pd.read_csv(fp, header=None).squeeze("columns")
    >>> review = open(os.path.join('data', 'review.txt'), encoding='utf8').read().strip()
    >>> out = tfidf_data(reviews_ser, review)
    >>> out['cnt'].sum()
    85
    >>> 'before' in out.index
    True
    """
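    # Tokenize on whitespace; the review's length is its total word count.
    tokens = review.split()
    review_len = len(tokens)
    # Use a sorted list of the unique words so the index has a stable row order.
    words = sorted(set(tokens))
    out = pd.DataFrame(columns=['cnt', 'tf', 'idf', 'tfidf'], index=words)

    for word in words:
        # \b anchors the match to word boundaries, so 'the' does not match 'other'.
        pattern = r'\b' + re.escape(word) + r'\b'
        word_ct = len(re.findall(pattern, review))
        tf = word_ct / review_len
        # idf = log(number of reviews / number of reviews containing the word)
        num_with_word = np.count_nonzero(reviews_ser.str.contains(pattern))
        idf = np.log(reviews_ser.size / num_with_word)
        out.loc[word] = [word_ct, tf, idf, tf * idf]

    out = out.astype(np.float64)
    out['cnt'] = out['cnt'].astype(np.int64)

    return out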

def relevant_word(out):
    """
    :Example:
    >>> fp = os.path.join('data', 'reviews.txt')
    >>> reviews_ser = pd.read_csv(fp, header=None).squeeze("columns")
    >>> review = open(os.path.join('data', 'review.txt'), encoding='utf8').read().strip()
    >>> out = tfidf_data(reviews_ser, review)
    >>> relevant_word(out) in out.index
    True
    """
    return out['tfidf'].idxmax()

In [111]:
# don't change this cell, but do run it -- it is needed for the tests
fp = os.path.join('data', 'reviews.txt')
reviews_ser = pd.read_csv(fp, header=None).squeeze("columns")
review = open(os.path.join('data', 'review.txt'), encoding='utf8').read().strip()
q3_tfidf = tfidf_data(reviews_ser, review)

try:
    q3_rel = relevant_word(q3_tfidf)
except:
    q3_rel = None

q3_tfidf

| | cnt | tf | idf | tfidf |
|---|---|---|---|---|
| seen | 1 | 0.011765 | 4.104377 | 0.048287 |
| have | 1 | 0.011765 | 0.975412 | 0.011475 |
| than | 1 | 0.011765 | 1.792995 | 0.021094 |
| skin | 1 | 0.011765 | 4.875317 | 0.057357 |
| the | 3 | 0.035294 | 0.181416 | 0.006403 |
| really | 1 | 0.011765 | 1.896583 | 0.022313 |
| chunk | 1 | 0.011765 | 8.028953 | 0.094458 |
| create | 1 | 0.011765 | 5.814336 | 0.068404 |
| and | 5 | 0.058824 | 0.248188 | 0.014599 |
| it | 3 | 0.035294 | 0.247858 | 0.008748 |


In [115]:
q3_tfidf.loc['phone', 'idf']

0.8207030388978666

In [None]:
grader.check("q3")

## Questions 4 and 5 – Tweet Analysis 🐥

The dataset `data/ira.csv` contains tweets tagged by Twitter as likely being posted by the [Internet Research Agency](https://en.wikipedia.org/wiki/Internet_Research_Agency), the tweet factory facing allegations for attempting to influence US political elections.

Questions 4 and 5 will focus on the following:
- Question 4: Look at the hashtags present in the text and trends in their makeup.
- Question 5: Prepare the dataset for modeling by creating features out of the text fields.

### Question 4 – Hashtags #️⃣

You may assume that a hashtag is any string without whitespace that immediately follows a `'#'`.

#### `hashtag_list`

Create a function `hashtag_list` that takes in a Series of tweet texts and returns a Series containing a list of hashtags present in each tweet's text. If a tweet's text doesn't contain a hashtag, the Series should contain an empty list for that tweet. Don't include the `'#'` symbol in the lists that are returned (see the doctest for an example).

<br>

#### `most_common_hashtag`

Create a function `most_common_hashtag` that takes in a Series of hashtag lists (as is outputted by `hashtag_list`) and returns a Series consisting of a single hashtag per tweet: 
- If the tweet's text has no hashtags, the entry in the output Series should be `NaN`.
- If the tweet's text has one distinct hashtag, the entry in the output Series should be that hashtag.
- If the tweet's text has more than one hashtag, the entry in the output Series should be the most common hashtag in the tweet's text with respect to **the whole input Series**. If there is a tie for the most common, any of the most common can be returned.
    - For example, if the input Series was `pd.Series([[2], [2], [3, 2, 3]])`, the output would be `pd.Series([2, 2, 2])`. Even though `3` was more common in the third list than `2`, `2` is the most common among all hashtags in the Series.
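
The example above can be reproduced with the short sketch below. It is only one way to express the rule, using `collections.Counter` for the overall counts; the helper name `pick` is made up.

```python
from collections import Counter

import pandas as pd

lists = pd.Series([[2], [2], [3, 2, 3]])

# Count every hashtag across the whole input Series: 2 appears 3 times, 3 twice.
overall = Counter(tag for tags in lists for tag in tags)

def pick(tags):
    # No hashtags: missing value.
    if len(tags) == 0:
        return float('nan')
    # Among this tweet's own hashtags, take the one most common overall.
    return max(tags, key=lambda tag: overall[tag])

result = lists.apply(pick)
print(result.tolist())  # [2, 2, 2]
```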

In [63]:
# The doctests/public tests don't test your work on the `ira` data,
# but the hidden tests do.
# So, make sure to thoroughly test your work yourself!
fp = os.path.join('data', 'ira.csv')
ira = pd.read_csv(fp, names=['id', 'name', 'date', 'text'])

In [66]:
testdata = [['RT @DSC80: Text-cleaning is cool! #NLP https://t.co/xsfdw88d #NLP1 #NLP1']]
test = pd.DataFrame(testdata, columns=['text'])['text']
tuplist = test.str.findall('#([^\s]+)($|\s)')
tuplist.apply(lambda x: [tup[0] for tup in x])

0    [NLP, NLP1, NLP1]
Name: text, dtype: object

In [71]:
def hashtag_list(tweet_text):
    """
    :Example:
    >>> testdata = [['RT @DSC80: Text-cleaning is cool! #NLP https://t.co/xsfdw88d #NLP1 #NLP1']]
    >>> test = pd.DataFrame(testdata, columns=['text'])['text']
    >>> out = hashtag_list(test)
    >>> (out.iloc[0] == ['NLP', 'NLP1', 'NLP1'])
    True
    """
    # \S+ captures the run of non-whitespace characters right after each '#'.
    # With a single capturing group, findall returns the hashtags without '#'.
    return tweet_text.str.findall(r'#(\S+)')



testdata = [['RT @DSC80: Text-cleaning is cool! #NLP https://t.co/xsfdw88d #NLP1 #NLP1']]
test = hashtag_list(pd.DataFrame(testdata, columns=['text'])['text'])
most_common = pd.Series(test.sum()).value_counts().idxmax()
most_common

test

0    [NLP, NLP1, NLP1]
Name: text, dtype: object

In [None]:
grader.check("q4")

### Question 5 – Features 📋

Now, create a DataFrame of features from the `ira` data.  That is, create a function `create_features` that takes in a DataFrame `ira` that has just a single column, `'text'`, and returns a DataFrame with the same index as `ira` (i.e. the rows correspond to the same tweets) and the following columns:
* `'num_hashtags'`, the number of hashtags present in the tweet.
* `'mc_hashtags'`, the most common hashtag associated to the tweet (using the result of `most_common_hashtag` from Question 4).
* `'num_tags'`, the number of tags the tweet has (look for the presence of `'@'`).
* `'num_links'`, the number of hyperlinks present in the tweet.
    - A hyperlink is a string that starts with `'http://'` or `'https://'` and continues until the next whitespace character.
* `'is_retweet'`, a Boolean describing whether the tweet is a retweet. A retweet is a tweet that **begins** with `'RT'`.
* `'text'`, a version of the tweet's text that is cleaned according to the following steps, **in this exact order**:
    1. All meta-information above (retweet info, tags, hyperlinks, and hashtags) should be replaced with a single space.
    2. Everything other than letters, numbers, and spaces should be replaced with a single space.
    3. All letters should be lowercase.
    4. All words should be separated by exactly one space, and leading/trailing whitespace should be removed (stripped).
    
The columns in the outputted DataFrame must be in the order `['text', 'num_hashtags', 'mc_hashtags', 'num_tags', 'num_links', 'is_retweet']`. (Remember, the DataFrame that `create_features` is called on only has a single column, `'text'`.)

***Notes:***
- It's a good idea to make a helper function for each column.
- The `\w` character class in regex **does not** refer to letters, numbers, and spaces (or even just letters and numbers). As such, you can't use it here!
- `create_features` will take a while to run on the entire dataset – test it on a small sample first!
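
The cleaning steps and counts above can be tried on a single string before scaling up to the full dataset. The sketch below is a minimal illustration under stated assumptions: the helper name `clean_text` is made up, a tag is taken to be `'@'` followed by word characters, and a hashtag is `'#'` followed by non-whitespace characters.

```python
import re

# Assumed patterns for retweet info, tags, hyperlinks, and hashtags.
LINK = r'https?://\S+'
META = r'^RT|@\w+|' + LINK + r'|#\S+'

def clean_text(text):
    # 1. Replace meta-information with a single space.
    text = re.sub(META, ' ', text)
    # 2. Replace everything other than letters, numbers, and spaces.
    #    An explicit class is used because \w would also keep underscores.
    text = re.sub(r'[^A-Za-z0-9 ]', ' ', text)
    # 3. Lowercase.
    text = text.lower()
    # 4. Collapse runs of spaces and strip the ends.
    return re.sub(r' +', ' ', text).strip()

tweet = 'RT @DSC80: Text-cleaning is cool! #NLP https://t.co/xsfdw88d #NLP1 #NLP1'
cleaned = clean_text(tweet)
num_links = len(re.findall(LINK, tweet))
is_retweet = tweet.startswith('RT')

print(cleaned)                   # text cleaning is cool
print(num_links, is_retweet)     # 1 True
print(clean_text('snake_case'))  # snake case
```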

In [72]:
# The doctests/public tests don't test your work on the `ira` data,
# but the hidden tests do.
# So, make sure to thoroughly test your work yourself!
fp = os.path.join('data', 'ira.csv')
ira = pd.read_csv(fp, names=['id', 'name', 'date', 'text'])

In [80]:
pd.notna(np.nan)

False

In [90]:
def most_common_hashtag(tweet_lists):
    """
    :Example:
    >>> testdata = [['RT @DSC80: Text-cleaning is cool! #NLP https://t.co/xsfdw88d #NLP1 #NLP1']]
    >>> test = hashtag_list(pd.DataFrame(testdata, columns=['text'])['text'])
    >>> most_common_hashtag(test).iloc[0] == 'NLP1'
    True
    """
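    # Count how often each hashtag appears across the whole input Series.
    counts = tweet_lists.explode().value_counts()

    def helper(lst):
        # No hashtags in this tweet.
        if len(lst) == 0:
            return np.nan
        # Among this tweet's own hashtags, return the one that is most common
        # overall (a tweet with one distinct hashtag simply returns it).
        return max(lst, key=lambda hashtag: counts.loc[hashtag])

    return tweet_lists.apply(helper)
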
def create_features(ira):
    """
    Takes in the ira data and returns a DataFrame with the described features.
    :param ira: the input DataFrame
    :return: a DataFrame with specified features
    
    :Example:
    >>> testdata = [['RT @DSC80: Text-cleaning is cool! #NLP https://t.co/xsfdw88d #NLP1 #NLP1']]
    >>> test = pd.DataFrame(testdata, columns=['text'])
    >>> out = create_features(test)
    >>> anscols = ['text', 'num_hashtags', 'mc_hashtags', 'num_tags', 'num_links', 'is_retweet']
    >>> ansdata = [['text cleaning is cool', 3, 'NLP1', 1, 1, True]]
    >>> ans = pd.DataFrame(ansdata, columns=anscols)
    >>> (out == ans).all().all()
    True
    """
    # Patterns for the meta-information described in the question.
    rt = r'^RT'
    tag = r'@\w+'
    hrefs = r'https?://\S+'
    ht = r'#\S+'
    meta = '|'.join([rt, tag, hrefs, ht])

    text = ira['text']
    ht_lst = hashtag_list(text)

    def cleaning_helper(string):
        if pd.isna(string):
            return string
        # 1. Replace retweet info, tags, hyperlinks, and hashtags with a space.
        string = re.sub(meta, ' ', string)
        # 2. Replace everything other than letters, numbers, and spaces.
        #    (\w is avoided because it also matches underscores.)
        string = re.sub(r'[^A-Za-z0-9 ]', ' ', string)
        # 3. Lowercase, then 4. collapse repeated spaces and strip the ends.
        return re.sub(r' +', ' ', string.lower()).strip()

    out = pd.DataFrame(index=ira.index)
    out['text'] = text.apply(cleaning_helper)
    out['num_hashtags'] = ht_lst.apply(len)
    out['mc_hashtags'] = most_common_hashtag(ht_lst)
    out['num_tags'] = text.str.count(tag)
    out['num_links'] = text.str.count(hrefs)
    out['is_retweet'] = text.str.contains(rt)

    return out

In [94]:
testdata = [['RT @DSC80: Text-cleaning is cool! #NLP https://t.co/xsfdw88d #NLP1 #NLP1']]
test = pd.DataFrame(testdata, columns=['text'])
out = create_features(test)
anscols = ['text', 'num_hashtags', 'mc_hashtags', 'num_tags', 'num_links', 'is_retweet']
ansdata = [['text cleaning is cool', 3, 'NLP1', 1, 1, True]]
ans = pd.DataFrame(ansdata, columns=anscols)
(out==ans).all().all()

True

In [82]:
# don't change this cell, but do run it -- it is needed for the tests
# (yes, we know it says "hidden" – there are still truly hidden tests in this question)
fp_hidden = 'data/ira_test.csv'
ira_hidden = pd.read_csv(fp_hidden, header=None)
text_hidden = ira_hidden.iloc[:, -1:]
text_hidden.columns = ['text']

test_hidden = create_features(text_hidden)

In [None]:
grader.check("q5")

## Congratulations! You're done! 🏁

Submit your `lab.py` file to Gradescope. Note that you only need to submit the `lab.py` file; this notebook should not be uploaded.

Before submitting, you should ensure that all of your work is in the `lab.py` file. You can do this by running the doctests below, which will verify that your work passes the public tests **and** that your work is in the `lab.py` file. Run the cell below; you should see no output.

In [85]:
!python -m doctest lab.py

In addition, `grader.check_all()` will verify that your work passes the public tests.

---

To double-check your work, the cell below will rerun all of the autograder tests.

In [None]:
grader.check_all()