# Static Webscraping with BeautifulSoup


In this notebook, I will walk you through the basics of webscraping. As we will see, webscraping, or programmatically extracting data (usually text) from a webpage, is a common and useful part of a wide variety of natural language processing tasks. Webscraping is often the first step in an NLP pipeline, so the results we get from it can affect every subsequent step. Smart webscraping can also save you time up front that you would otherwise have to spend on cleanup and optimization later in the process.

## Our task in this notebook
I will take the Wikipedia page of the [list of all Roman Emperors](https://en.wikipedia.org/wiki/List_of_Roman_emperors) and recreate the tables in it in Python. This is a very common task in NLP. A lot of the time, we need to cross reference a source we are interested in with a secondary source like Wikipedia. I will show you the basics in this notebook.

## Goals
* Access a static webpage in Python using the requests library
* Navigate through raw html code to find the data we are interested in
* Arrange that data into a useful format
* Clean the data to fit whatever we want to do with it

In [None]:
## we want to scrape all of the tables from this wikipedia article and make our own tables of the same data
from IPython.display import IFrame
IFrame('https://en.wikipedia.org/wiki/List_of_Roman_emperors#Principate_(27_BC_–_AD_284)', width=1000, height=500)

## `requests`
`requests` is an incredibly useful Python package. It is not part of the standard library, but it ships with many distributions of Python (Anaconda, for example) and is easy to install. If you need to make HTTP requests from a .py file or notebook, you will often be using `requests` in some form. Let's take a look at how it works.

In [None]:
## note: although requests is very common, if this cell raises a ModuleNotFoundError, uncomment the line below to install it
# !python3 -m pip install requests
import requests

In [None]:
## using the .get method we can access any webpage
r = requests.get('https://en.wikipedia.org/wiki/List_of_Roman_emperors')

## let's see what r is
type(r)

requests.models.Response

In [None]:
## a Response object can give us all of the information we need
## we can call .text to see the raw html of a webpage
html = r.text
type(html)

str
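
In practice it is worth checking that the request actually succeeded before parsing anything. The sketch below wraps the `requests.get` call from above in a small helper. The function name `fetch_html`, the `User-Agent` string, and the 10-second timeout are illustrative assumptions, not part of the original notebook. Some sites reject requests that do not identify themselves, so sending a descriptive `User-Agent` is a common courtesy.

```python
import requests

# Hypothetical values chosen for illustration; replace with your own contact details.
HEADERS = {"User-Agent": "webscraping-tutorial/0.1 (contact: you@example.com)"}

def fetch_html(url, timeout=10):
    """Download a page and return its HTML as a string."""
    response = requests.get(url, headers=HEADERS, timeout=timeout)
    # raise_for_status() raises requests.HTTPError for 4xx and 5xx responses,
    # so we never try to parse an error page by mistake.
    response.raise_for_status()
    return response.text
```

Calling `fetch_html('https://en.wikipedia.org/wiki/List_of_Roman_emperors')` would then return the same kind of string as `r.text`.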

## `BeautifulSoup`
Now that we have our HTML output, we need to parse it as a structured form of text. Above we see that our html object is a string, meaning we can only use the attributes and methods of strings on it. While we could create a parser from scratch that lets us take advantage of the structured nature of the HTML code, as with many tasks in Python, we don't need to because one already exists for us. There are many packages that parse HTML, but we'll be using the most popular, [Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/).

*A note on copyright*
<br>
Just because you can access and scrape a website doesn't always mean you should. Most websites are public, and they have to be in order to share their information, but that does not mean the information is free to reuse. Most data on the internet can be used without a problem. These sources are generally protected under a license such as the MIT License or Creative Commons Attribution-ShareAlike 3.0. Generally, if a site does not have one of these licenses, you should stay away. Taking data without permission can expose you to legal risk and can break the terms of service of many sites.

In [None]:
## bs4 (Beautiful Soup 4) is included in many distributions of Python, but if this cell does not work try:
## !python3 -m pip install beautifulsoup4
from bs4 import BeautifulSoup

In [None]:
## the BeautifulSoup parser takes in any string and attempts to parse it as HTML (or XML)
soup = BeautifulSoup(r.text, features='html')
type(soup)

bs4.BeautifulSoup

In [None]:
## with this BeautifulSoup object, we can navigate through the tag soup in a systematic way
soup
## but now we need to isolate the data we want
## tip: your browser's developer tools (e.g. Chrome Developer Tools) highlight the HTML behind each part of the page

<!DOCTYPE html>
<html class="client-nojs" dir="ltr" lang="en">
<head>
<meta charset="utf-8"/>
<title>List of Roman emperors - Wikipedia</title>
<script>document.documentElement.className="client-js";RLCONF={"wgBreakFrames":false,"wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["","January","February","March","April","May","June","July","August","September","October","November","December"],"wgRequestId":"c5a92fb5-344d-4bd8-b7ce-b0524c5bd19d","wgCSPNonce":false,"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":false,"wgNamespaceNumber":0,"wgPageName":"List_of_Roman_emperors","wgTitle":"List of Roman emperors","wgCurRevisionId":1125655881,"wgRevisionId":1125655881,"wgArticleId":25791,"wgIsArticle":true,"wgIsRedirect":false,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["Articles containing Latin-language text","Articles with short description","Short description is different from Wikidata","Featu

In [None]:
## now we have an iterable of each tbody object
for tbody in soup.find_all('tbody'):
    print(tbody)

<tbody><tr>
<th scope="col" width="7%">Portrait
</th>
<th scope="col" width="17%"> Name<sup class="reference" id="cite_ref-57"><a href="#cite_note-57">[f]</a></sup>
</th>
<th scope="col" width="26%">Reign
</th>
<th scope="col" width="25%">Succession
</th>
<th scope="col" width="25%">Life details
</th></tr>
<tr>
<td><a class="image" href="/wiki/File:04.2022_Augustus_Bevilacqua_cropped.jpg"><img alt="bust" data-file-height="4916" data-file-width="3395" decoding="async" height="145" src="//upload.wikimedia.org/wikipedia/commons/thumb/a/a4/04.2022_Augustus_Bevilacqua_cropped.jpg/100px-04.2022_Augustus_Bevilacqua_cropped.jpg" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/a/a4/04.2022_Augustus_Bevilacqua_cropped.jpg/150px-04.2022_Augustus_Bevilacqua_cropped.jpg 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/a/a4/04.2022_Augustus_Bevilacqua_cropped.jpg/200px-04.2022_Augustus_Bevilacqua_cropped.jpg 2x" width="100"/></a>
</td>
<th scope="row" style="text-align:center; background:

In [None]:
## importantly, each element of this iterable can be navigated just like the whole tree
tbody = soup.find_all('tbody')[0]
for tr in tbody.find_all('tr')[1:]: ## the first row (index 0) holds the column names, so we skip it
    # print(tr)
    # print('_________')
    row = tr.find_all(['td','th'])
    print(row[0].a['href'], row[1].b.get_text(), row[1].small.get_text(), row[2].get_text(), row[3].get_text(), row[4].get_text(), sep='\n')
    print('_________')
## now we're getting somewhere

/wiki/File:04.2022_Augustus_Bevilacqua_cropped.jpg
Augustus
Caesar Augustus
16 January 27 BC – 19 August AD 14  (40 years, 7 months and 3 days)[g]

Grandnephew and adopted son of Julius Caesar. Gradually acquired further power through grants from, and constitutional settlements with, the Roman Senate.

23 September 63 BC – 19 August 14(aged 75)Born as Gaius Octavius; first elected Roman consul on 19 August 43 BC.Died of natural causes[53]

_________
/wiki/File:(Toulouse)_Tib%C3%A8re_-_Mus%C3%A9e_Saint-Raymond_Ra_342_b_(cropped).jpg
Tiberius
Tiberius Caesar Augustus
17 September 14 – 16 March 37(22 years, 5 months and 27 days)

Stepson, former son-in-law and adopted son of Augustus

16 November 42 BC – 16 March 37(aged 77)Died probably of natural causes, allegedly murdered at the instigation of Caligula[54]

_________
/wiki/File:Caligula_-_MET_-_14.37_(cropped_2).jpg
Caligula
Gaius Caesar Augustus Germanicus
18 March 37 – 24 January 41(3 years, 10 months and 6 days)

Grandnephew and ado
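
To see what those attribute lookups are doing, here is a small self-contained sketch on a hand-written table. The HTML snippet and its values are made up for illustration, and `'html.parser'` (Python's built-in parser) is named explicitly so the example does not depend on which optional parsers are installed.

```python
from bs4 import BeautifulSoup

# A tiny, made-up table that mimics the shape of one Wikipedia row.
snippet = """
<table><tbody>
<tr><th>Portrait</th><th>Name</th></tr>
<tr><td><a href="/wiki/File:Example.jpg">img</a></td>
    <th><b>Example</b><br/><small>Example Caesar Augustus</small></th></tr>
<tr><td></td><th><b>Nameless</b></th></tr>
</tbody></table>
"""

toy = BeautifulSoup(snippet, "html.parser")
rows = toy.find("tbody").find_all("tr")[1:]  # skip the header row

first = rows[0].find_all(["td", "th"])
print(first[0].a["href"])          # tag.a finds the first <a> inside the cell
print(first[1].b.get_text())       # text inside the <b> tag
print(first[1].small.get_text())   # text inside the <small> tag

second = rows[1].find_all(["td", "th"])
print(second[0].a)                 # no <a> in this cell, so the lookup gives None
print(second[1].small)             # no <small> in this cell either
```

Because a missing tag comes back as `None`, calling `.get_text()` on it would raise an `AttributeError`, so it pays to check for `None` before reading from a cell that may be empty.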

I am going to use a very versatile package called `re`, short for regular expressions (regex), to do some advanced string parsing. The regex function used below, `re.split`, takes in a pattern and a text and splits the text at every place the pattern matches. These patterns can look very complicated, but in this case it is `(?<=\))`, a lookbehind that means: *find every position that comes immediately after a closing parenthesis*. Splitting there separates the life dates, which end in a parenthesis such as `(aged 75)`, from the rest of the string.
<br>

You can read more about regex [here](https://librarycarpentry.org/lc-data-intro-archives/04-regular-expressions/index.html) and you can play around with your own regex at [regex101](https://regex101.com/).  
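
As a quick illustration of splitting on a lookbehind, the sketch below runs `re.split` on a shortened version of one of the life-details strings shown earlier. The variable names are mine, and splitting on a zero-width match like this requires Python 3.7 or later.

```python
import re

text = "23 September 63 BC – 19 August 14(aged 75)Born as Gaius Octavius; Died of natural causes"

# (?<=\)) is a lookbehind: it matches the empty position right after a ')'.
# The pattern is written as a raw string (r"...") so the backslash reaches re intact.
parts = re.split(r"(?<=\))", text)

life_date = parts[0]
rest = parts[1] if len(parts) > 1 else "None found."
print(life_date)
print(rest)
```

The first piece keeps its closing parenthesis because a lookbehind consumes no characters; it only marks where to cut.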

In [None]:
## go back to the full list of tbodys and populate a dictionary with all our data
import re
emp_dict = {}
for tbody in soup.find_all('tbody'):
    for tr in tbody.find_all('tr')[1:]:
        row = tr.find_all(['td','th'])
        if len(row) == 5: ## check if each row is of the correct length
            if row[0].a is not None: ## check if each row has an image
                img_url = f"en.wikipedia.org{row[0].a['href']}"

                life_details = re.split(r'(?<=\))', row[4].get_text()) ## split at the position right after each closing parenthesis ')'
                if len(life_details) > 1: ## if a ')' was found, the text after it holds the cause of death
                    life_date = life_details[0]
                    cod = life_details[1]
                else:
                    life_date = life_details[0]
                    cod = 'None found.'

                if row[1].small is not None: ## checking if there is a full name associated with an emperor
                    full_name = row[1].small.get_text()
                else:
                    full_name = 'None found.'

                emp_dict[row[1].b.get_text()] = (img_url, full_name, row[2].get_text(), row[3].get_text(), life_date, cod)

In [None]:
emp_dict

{'Augustus': ('en.wikipedia.org/wiki/File:04.2022_Augustus_Bevilacqua_cropped.jpg',
  'Caesar Augustus',
  '16 January 27 BC – 19 August AD 14\xa0\xa0(40\xa0years, 7\xa0months and 3\xa0days)[g]\n',
  'Grandnephew and adopted son of Julius Caesar. Gradually acquired further power through grants from, and constitutional settlements with, the Roman Senate.\n',
  '23 September 63 BC – 19 August 14(aged 75)',
  'Born as Gaius Octavius; first elected Roman consul on 19\xa0August\xa043\xa0BC.Died of natural causes[53]\n'),
 'Tiberius': ('en.wikipedia.org/wiki/File:(Toulouse)_Tib%C3%A8re_-_Mus%C3%A9e_Saint-Raymond_Ra_342_b_(cropped).jpg',
  'Tiberius Caesar Augustus',
  '17 September 14 – 16 March 37(22\xa0years, 5\xa0months and 27\xa0days)\n',
  'Stepson, former son-in-law and adopted son of Augustus\n',
  '16 November 42 BC – 16 March 37(aged 77)',
  'Died probably of natural causes, allegedly murdered at the instigation of Caligula[54]\n'),
 'Caligula': ('en.wikipedia.org/wiki/File:Caligula
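
The stored values still carry some scraping residue: the image paths have no `https://` scheme, and the text fields contain `\xa0` (non-breaking spaces) and trailing newlines. One way to tidy them, sketched here with the standard library only, is shown below. The helper name `clean_text` and the sample values are illustrative assumptions rather than part of the notebook's pipeline.

```python
from urllib.parse import urljoin

PAGE_URL = "https://en.wikipedia.org/wiki/List_of_Roman_emperors"

def clean_text(value):
    """Replace non-breaking spaces with regular ones and trim surrounding whitespace."""
    return value.replace("\xa0", " ").strip()

# urljoin resolves a site-relative href against the page it was found on.
href = "/wiki/File:04.2022_Augustus_Bevilacqua_cropped.jpg"
full_url = urljoin(PAGE_URL, href)
print(full_url)

reign = "17 September 14 – 16 March 37(22\xa0years, 5\xa0months and 27\xa0days)\n"
print(clean_text(reign))
```

Resolving against the page URL, rather than gluing strings together, also handles hrefs that are already absolute.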

In [None]:
import pickle

with open('emp_dict.pickle', 'wb') as handle:
    pickle.dump(emp_dict, handle, protocol=pickle.HIGHEST_PROTOCOL)

*Using `pandas`*
<br>
A dictionary is useful, but as your data grows, you'll find that a dictionary is too simple a data structure for most webscraped data. A `pandas` dataframe scales much better and makes tabular data much easier to work with. That being said, you should avoid filling a dataframe row by row in a loop, as we filled the dictionary above, because growing a dataframe one row at a time is slow. Instead, we can take the dictionary we created above and turn it into a dataframe in one step. This way we keep both forms of the data in case we need the dictionary representation later.

In [None]:
import pandas as pd ## you will almost always see pandas imported like this, the 'pd' alias is a very useful shorthand
## Let's see what happens when we input the dictionary directly
pd.DataFrame(emp_dict)
## it's close but not quite what we wanted

Unnamed: 0,Augustus,Tiberius,Caligula,Claudius,Nero,Galba,Otho,Vitellius,Vespasian,Titus,...,Andronikos II,Michael IX,Andronikos III,John V,John VI,Andronikos IV,John VII,Manuel II,John VIII,Constantine XI
0,en.wikipedia.org/wiki/File:04.2022_Augustus_Be...,en.wikipedia.org/wiki/File:(Toulouse)_Tib%C3%A...,en.wikipedia.org/wiki/File:Caligula_-_MET_-_14...,en.wikipedia.org/wiki/File:Claudius_crop_(crop...,en.wikipedia.org/wiki/File:Nero_Glyptothek_Mun...,en.wikipedia.org/wiki/File:Roman_emperor_Galba...,en.wikipedia.org/wiki/File:Paris_-_Mus%C3%A9e_...,en.wikipedia.org/wiki/File:Tunis_Bardo_Buste_8...,en.wikipedia.org/wiki/File:Naples_Archaeology_...,en.wikipedia.org/wiki/File:Titus_Ny_Carlsberg_...,...,en.wikipedia.org/wiki/File:153_-_Andronikos_II...,en.wikipedia.org/wiki/File:154_-_Michael_IX_Pa...,en.wikipedia.org/wiki/File:155_-_Andronikos_II...,en.wikipedia.org/wiki/File:157_-_John_V_Palaio...,en.wikipedia.org/wiki/File:156_-_John_VI_Kanta...,en.wikipedia.org/wiki/File:158_-_Andronikos_IV...,en.wikipedia.org/wiki/File:159_-_John_VII_Pala...,en.wikipedia.org/wiki/File:160_-_Manuel_II_Pal...,en.wikipedia.org/wiki/File:161_-_John_VIII_Pal...,en.wikipedia.org/wiki/File:162_-_Constantine_X...
1,Caesar Augustus,Tiberius Caesar Augustus,Gaius Caesar Augustus Germanicus,Tiberius Claudius Caesar Augustus Germanicus,Nero Claudius Caesar Augustus Germanicus,Servius Galba Caesar Augustus,Marcus Otho Caesar Augustus,Aulus Vitellius Germanicus Augustus,Caesar Vespasianus Augustus,Titus Caesar Vespasianus Augustus,...,Ἀνδρόνικος Δούκας Ἄγγελος Κομνηνὸς Παλαιολόγος,Μιχαὴλ Δούκας Ἄγγελος Κομνηνὸς Παλαιολόγος,Ἀνδρόνικος Δούκας Ἄγγελος Κομνηνός Παλαιολόγος,Ίωάννης Κομνηνός Παλαιολόγος,Ἰωάννης Ἄγγελος Κομνηνὸς Παλαιολόγος Καντακουζ...,Ἀνδρόνικος Κομνηνός Παλαιολόγος,Ίωάννης Παλαιολόγος,Μανουὴλ Παλαιολόγος,Ίωάννης Παλαιολόγος,Κωνσταντῖνος Δραγάσης Παλαιολόγος
2,"16 January 27 BC – 19 August AD 14 (40 years,...","17 September 14 – 16 March 37(22 years, 5 mont...","18 March 37 – 24 January 41(3 years, 10 months...","24 January 41 – 13 October 54(13 years, 8 mont...","13 October 54 – 9 June 68(13 years, 7 months a...",8 June 68 – 15 January 69(7 months and 7 days)\n,15 January – 16 April 69(3 months and 1 day)\n,19 April – 20 December 69(8 months and 1 day)\n,"1 July 69 – 23 June 79(9 years, 11 months and ...","24 June 79 – 13 September 81(2 years, 2 months...",...,"11 December 1282 – 24 May 1328(45 years, 5 mon...","21 May 1294 – 12 October 1320(26 years, 4 mont...",24 May 1328 – 15 June 1341(13 years and 22 day...,"15 June 1341 – 16 February 1391(49 years, 8 mo...","8 February 1347 – 10 December 1354(7 years, 10...","12 August 1376 – 1 July 1379(2 years, 10 month...",14 April – 17 September 1390(5 months and 3 da...,"16 February 1391 – 21 July 1425(34 years, 4 mo...","21 July 1425 – 31 October 1448(23 years, 4 mon...","6 January 1449 – 29 May 1453(4 years, 4 months..."
3,Grandnephew and adopted son of Julius Caesar. ...,"Stepson, former son-in-law and adopted son of ...","Grandnephew and adopted heir of Tiberius, grea...","Uncle of Caligula, grandnephew of Augustus, pr...","Grandnephew, stepson, son-in-law and adopted s...","Governor of Hispania Tarraconensis, revolted a...",Seized power through a coup against Galba\n,"Governor of Germania Inferior, proclaimed empe...",Seized power with support of the eastern legio...,Son of Vespasian\n,...,Son of Michael VIII; co-emperor since 8 Novemb...,"Son and co-ruler of Andronikos II, named co-em...","Son of Michael IX, named co-emperor between 13...","Son of Andronikos III, not formally crowned un...",Related to the Palaiologoi through his mother....,Son of John V and grandson of John VI; co-empe...,"Son of Andronikos IV, usurped the throne from ...",Son of John V and grandson of John VI; co-empe...,Son of Manuel II; co-emperor since before 1408...,Son of Manuel II and favored successor of his ...
4,23 September 63 BC – 19 August 14(aged 75),16 November 42 BC – 16 March 37(aged 77),31 August 12 – 24 January 41(aged 28),1 August 10 BC – 13 October 54(aged 63),15 December 37 – 9 June 68(aged 30),24 December 3 BC – 15 January 69(aged 70),28 April 32 – 16 April 69(aged 36),24 September 15 – 20/22 December 69(aged 54),17 November 9 – 23/24 June 79(aged 69),30 December 39 – 13 September 81(aged 41),...,25 March 1259 – 13 February 1332(aged 72),17 April 1277/1278 – 12 October 1320(aged 42/43),25 March 1297 – 15 June 1341(aged 44),18 June 1332 – 16 February 1391(aged 58),c. 1295 – 15 June 1383(aged approx. 88),11 April 1348 – 25/28 June 1385(aged 37),1370 – 22 September 1408(aged 38),27 June 1350 – 21 July 1425(aged 74),18 December 1392 – 31 October 1448(aged 55),8 February 1405 – 29 May 1453(aged 48)
5,Born as Gaius Octavius; first elected Roman co...,"Died probably of natural causes, allegedly mur...",Murdered in a conspiracy involving the Praetor...,Began the Roman conquest of Britain. Probably ...,Committed suicide after being deserted by the ...,Murdered by soldiers of the Praetorian Guard i...,Committed suicide after losing the Battle of B...,Murdered by Vespasian's troops[60]\n,Died of natural causes[61]\n,Died of natural causes[62]\n,...,Deposed by his grandson Andronikos III in 1328...,Allegedly died of grief due to the accidental ...,Last Emperor to effectively control Greece. Di...,"Reigned almost 50 years, but only held effecti...",Deposed by John V in another civil war and ret...,Deposed by John V in 1379 and fled to Galata i...,Died of natural causes[254]\n,"Suffered a stroke in 1422, whereafter the gove...",First emperor to visit Rome since Constans II....,The last Roman emperor. Died in battle at the ...


In [None]:
## by default, pandas treats each dictionary key as a column label, which is why the emperors ended up as columns
## we can instead use the keys as the row index with the 'from_dict' method and the orient='index' keyword argument
## the 'reset_index' method will then turn that index into a column and give us a numerical index for the rows
emp_df = pd.DataFrame.from_dict(emp_dict, orient='index').reset_index()
emp_df = emp_df.rename(columns={'index':'name',0:'img',1:'full_name',2:'reign',3:'succession',4:'life_dates',5:'cause_of_death'}) ## last, this is one way to rename columns
emp_df

| | name | img | full_name | reign | succession | life_dates | cause_of_death |
|---|---|---|---|---|---|---|---|
| 0 | Augustus | en.wikipedia.org/wiki/File:04.2022_Augustus_Be... | Caesar Augustus | 16 January 27 BC – 19 August AD 14 (40 years,... | Grandnephew and adopted son of Julius Caesar. ... | 23 September 63 BC – 19 August 14(aged 75) | Born as Gaius Octavius; first elected Roman co... |
| 1 | Tiberius | en.wikipedia.org/wiki/File:(Toulouse)_Tib%C3%A... | Tiberius Caesar Augustus | 17 September 14 – 16 March 37(22 years, 5 mont... | Stepson, former son-in-law and adopted son of ... | 16 November 42 BC – 16 March 37(aged 77) | Died probably of natural causes, allegedly mur... |
| 2 | Caligula | en.wikipedia.org/wiki/File:Caligula_-_MET_-_14... | Gaius Caesar Augustus Germanicus | 18 March 37 – 24 January 41(3 years, 10 months... | Grandnephew and adopted heir of Tiberius, grea... | 31 August 12 – 24 January 41(aged 28) | Murdered in a conspiracy involving the Praetor... |
| 3 | Claudius | en.wikipedia.org/wiki/File:Claudius_crop_(crop... | Tiberius Claudius Caesar Augustus Germanicus | 24 January 41 – 13 October 54(13 years, 8 mont... | Uncle of Caligula, grandnephew of Augustus, pr... | 1 August 10 BC – 13 October 54(aged 63) | Began the Roman conquest of Britain. Probably ... |
| 4 | Nero | en.wikipedia.org/wiki/File:Nero_Glyptothek_Mun... | Nero Claudius Caesar Augustus Germanicus | 13 October 54 – 9 June 68(13 years, 7 months a... | Grandnephew, stepson, son-in-law and adopted s... | 15 December 37 – 9 June 68(aged 30) | Committed suicide after being deserted by the ... |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 175 | Andronikos IV | en.wikipedia.org/wiki/File:158_-_Andronikos_IV... | Ἀνδρόνικος Κομνηνός Παλαιολόγος | 12 August 1376 – 1 July 1379(2 years, 10 month... | Son of John V and grandson of John VI; co-empe... | 11 April 1348 – 25/28 June 1385(aged 37) | Deposed by John V in 1379 and fled to Galata i... |
| 176 | John VII | en.wikipedia.org/wiki/File:159_-_John_VII_Pala... | Ίωάννης Παλαιολόγος | 14 April – 17 September 1390(5 months and 3 da... | Son of Andronikos IV, usurped the throne from ... | 1370 – 22 September 1408(aged 38) | Died of natural causes[254]\n |
| 177 | Manuel II | en.wikipedia.org/wiki/File:160_-_Manuel_II_Pal... | Μανουὴλ Παλαιολόγος | 16 February 1391 – 21 July 1425(34 years, 4 mo... | Son of John V and grandson of John VI; co-empe... | 27 June 1350 – 21 July 1425(aged 74) | Suffered a stroke in 1422, whereafter the gove... |
| 178 | John VIII | en.wikipedia.org/wiki/File:161_-_John_VIII_Pal... | Ίωάννης Παλαιολόγος | 21 July 1425 – 31 October 1448(23 years, 4 mon... | Son of Manuel II; co-emperor since before 1408... | 18 December 1392 – 31 October 1448(aged 55) | First emperor to visit Rome since Constans II.... |


Yay! 🎉 🥳 Our data has been scraped! 🥳 🎉
... but what can we do with it 🤔
<br>

Let's try to plot all of the ages of the emperors, as we have that data in the life_dates column

In [None]:
## we have a slight problem though...
## Take Trajan's row for instance
string = emp_df.loc[emp_df['name'] == 'Trajan'].life_dates.iloc[0] ## gets the contents of the life_dates cell in the Trajan row
print(string)
print(str.encode(string))
## what is \xc2\xa0??

18 September 53 – 7/11 August 117(aged 63)
b'18 September 53 \xe2\x80\x93 7/11 August 117(aged\xc2\xa063)'


It might not seem like it, but the difference between the two lines above will be very significant in cleaning our data to be used in visualizations.

These collections of letters and numbers preceded by a backslash are the bytes that encode characters which cannot be written in plain ASCII. In UTF-8, the encoding Python uses by default, `\xe2\x80\x93` is an en dash (–) and `\xc2\xa0` is a non-breaking space, which looks like an ordinary space but is a different character. Nothing is wrong with the encoding itself, but these look-alike characters will trip up our pattern matching, so we need to normalize them into their plain ASCII equivalents before we interact with them. We can use a package called `unidecode` for this. It is a third-party package, so you may need to install it first.

In [None]:
#!pip install unidecode
import unidecode
print(string)
print(unidecode.unidecode(string))
print(str.encode(unidecode.unidecode(string)))

18 September 53 – 7/11 August 117(aged 63)
18 September 53 - 7/11 August 117(aged 63)
b'18 September 53 - 7/11 August 117(aged 63)'
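
To see exactly which characters are hiding in that string, the standard-library `unicodedata` module can name them. This sketch is an addition for illustration: it uses a hard-coded copy of Trajan's string, and `unicodedata.normalize` is shown only as a standard-library alternative that handles the non-breaking space but, unlike `unidecode`, leaves the en dash in place.

```python
import unicodedata

sample = "18 September 53 \u2013 7/11 August 117(aged\xa063)"

# List every non-ASCII character with its code point, official Unicode name, and UTF-8 bytes.
for ch in sample:
    if ord(ch) > 127:
        print(f"U+{ord(ch):04X}", unicodedata.name(ch), ch.encode("utf-8"))

# NFKC normalization maps the non-breaking space to a regular space.
normalized = unicodedata.normalize("NFKC", sample)
print(normalized.encode("utf-8"))
```

The loop reports an `EN DASH` and a `NO-BREAK SPACE`, which are the two characters behind the escaped bytes above.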


In [None]:
def getAges(life_dates):
    ld = unidecode.unidecode(life_dates)
    age = re.search('(?<=aged )([0-9]+)|(?<=aged approx. )([0-9]+)', ld) ## more regex to extract the age
    if age:
        return int(age.group(0))
    else:
        return None ## there are some emperors for whom we have no dates

In [None]:
getAges(string)

63

In [None]:
## apply takes in a function and applies it to all of the members of a column
emp_df['age'] = emp_df['life_dates'].apply(getAges)
emp_df['age']

0      75.0
1      77.0
2      28.0
3      63.0
4      30.0
       ... 
175    37.0
176    38.0
177    74.0
178    55.0
179    48.0
Name: age, Length: 180, dtype: float64
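
Notice that the ages print as floats such as `75.0`. Here is a short sketch of why, using a made-up three-row series rather than the real data: when some rows have no value, pandas stores the gaps as `NaN`, which forces the column to `float64`. The nullable `'Int64'` dtype is one optional way to keep whole numbers.

```python
import pandas as pd

# Toy stand-in for the 'age' column: the middle value is missing.
ages = pd.Series([75, None, 28])
print(ages.dtype)            # float64, because NaN is a float

# Option 1: drop the missing rows before plotting or computing statistics.
known = ages.dropna()
print(known.mean())

# Option 2: convert to pandas' nullable integer dtype, which keeps the gap as <NA>.
ages_int = ages.astype("Int64")
print(ages_int.dtype)
```

Either option works for plotting; dropping the missing rows is the simpler choice when you only need the known ages.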

In [None]:
import plotly.express as px
fig = px.scatter(x=emp_df['name'], y=emp_df['age'])
fig.show()

## Reviewing what we learned
* The basics of the `requests` library
* Navigating HTML using BeautifulSoup
* How to construct a dictionary for our data
* Turning that dictionary into a `pandas` dataframe
* Cleaning our data for a specific purpose with `.apply`

As a challenge, try to do what I did for the ages of the emperors, but with the length that they reigned for. This is a much more difficult question and can be done in a couple different ways. You will likely have to use the `datetime` package in Python. If you have trouble or just want to show off how you did it, feel free to reach out and let me know at peter.nadel@tufts.edu
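
If you would like a starting point, one possible route (not the only one, and not using `datetime`) is to read the duration already written inside the parentheses of the `reign` column. The function name `reign_in_days` and the 365-day year and 30-day month approximations below are my own simplifying assumptions.

```python
import re

def reign_in_days(reign):
    """Approximate a reign length in days from text like '(22 years, 5 months and 27 days)'."""
    text = reign.replace("\xa0", " ")  # non-breaking spaces would break the pattern
    total = 0
    found = False
    # Rough conversion factors: assumed 365-day years and 30-day months.
    for unit, factor in (("year", 365), ("month", 30), ("day", 1)):
        match = re.search(rf"(\d+) {unit}s?", text)
        if match:
            total += int(match.group(1)) * factor
            found = True
    return total if found else None

print(reign_in_days("17 September 14 – 16 March 37(22\xa0years, 5\xa0months and 27\xa0days)\n"))
```

A `datetime`-based solution would be more exact, but dates before AD 1 fall outside what `datetime` can represent, so some rows will need special handling either way.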

# Thanks for reading