# Modern Computing in Simple Packages

## The Python Standard Library

The Python Standard Library provides a wealth of built-in packages and modules that can make your life easier. Before writing new functionality for your scripts and programs, it is good practice to check whether it has already been implemented in the Python Standard Library.

We are going to go through a number of nice features that the Python Standard Library provides.

### Handling Missing Keys with setdefault() and defaultdict()

### Handle KeyErrors automatically using setdefault()

The dictionary method setdefault() returns a default value if the given key does not exist in the dictionary. This is useful when you do not want to handle a KeyError every time you look up a key that is missing from the dictionary:

In [2]:
periodic_table = {'Hydrogen': 1, 'Helium': 2}
periodic_table['Carbon']

KeyError: 'Carbon'

To prevent the above KeyError, you can use the dictionary method setdefault(), which returns the value of the key you are referencing if it exists, and the value you specified as a default if it does not. For example:

In [3]:
periodic_table.setdefault('Hydrogen', 12)

1

In [4]:
periodic_table.setdefault('Helium', 12)

2

In [5]:
periodic_table.setdefault('Carbon', 12)

12

Note that setdefault() also adds the missing key to the dictionary, with the default as its value.

In [6]:
periodic_table

{'Carbon': 12, 'Helium': 2, 'Hydrogen': 1}
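If you only want a fallback value and do not want the dictionary to change, the dictionary method get() is a closely related alternative. The following is a minimal sketch contrasting the two; the variable names are illustrative and not part of the original example.

```python
# Sample data mirroring the dictionary used above
periodic_table = {'Hydrogen': 1, 'Helium': 2}

# get() returns the fallback but leaves the dictionary untouched
carbon_via_get = periodic_table.get('Carbon', 12)
keys_after_get = sorted(periodic_table)

# setdefault() returns the fallback and also stores it under the key
carbon_via_setdefault = periodic_table.setdefault('Carbon', 12)
keys_after_setdefault = sorted(periodic_table)

print(carbon_via_get, keys_after_get)
print(carbon_via_setdefault, keys_after_setdefault)
```

Use get() for read-only lookups and setdefault() when later code should find the key already present.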

### Create a dictionary with a default value using defaultdict()

The defaultdict class gives a dictionary a default value for nonexistent keys. defaultdict() takes a function that will be called whenever a key that does not exist in the dictionary is looked up. The function is supplied when the dictionary is created:

In [7]:
from collections import defaultdict

def not_an_element():
    return int()

no_error_periodic_table = defaultdict(not_an_element)
no_error_periodic_table['Hydrogen'] = 1
no_error_periodic_table['Helium'] = 2

In [8]:
no_error_periodic_table['Hydrogen']

1

In [9]:
no_error_periodic_table['Helium']

2

In [10]:
no_error_periodic_table['Blastium']

0

Constructors like int(), list(), or dict() return the empty value of their respective data types (0, [], and {}), which makes them handy default factories. One common use of defaultdict() is to create a counter that tallies how many times each key occurs in a given list of data.

For example, we are going to pull the first forty books that Google Books returns for the query "Berkeley" and count how many of them were published by each publisher:

In [11]:
from collections import defaultdict
from urllib.request import urlopen
import json

response = urlopen('https://www.googleapis.com/books/v1/volumes?q=berkeley&maxResults=40')
rawData = response.read().decode("utf-8")
book_data = json.loads(rawData)

# Here we create a dictionary whose default is int(), or zero
publisher_counter = defaultdict(int)

# We will go through all of the books that are related to berkeley and count
# how many books were published from a particular publisher
for item in book_data["items"]:
    
    # Note how we use the method setdefault() to set publisher to the string "None"
    # if there's no publisher in the response
    publisher = item["volumeInfo"].setdefault("publisher", "None")
    
    # The default nature of publisher_counter enables us to do this without any raised
    # exceptions
    publisher_counter[publisher] += 1
    
for publisher, count in publisher_counter.items():
    print(publisher, count)

U of Nebraska Press 1
Oxford University Press on Demand 1
None 8
Wm. B. Eerdmans Publishing 1
Stanford University Press 1
University Press of Kentucky 1
Transaction Publishers 1
Frog Books 1
Manchester University Press 1
Basic Books 1
Psychology Press 1
Filiquarian Publishing, LLC. 1
LSU Press 1
Springer Science & Business Media 1
Princeton Architectural Press 1
Yale University Press 1
North Atlantic Books 2
McGraw-Hill College 1
Oxford ; New York : Oxford University Press 1
Oxford University Press 1
Heritage Books 1
Oxford University Press, USA 2
University of Chicago Press 1
Cambridge University Press 1
Lexington Books 1
Genealogical Publishing Com 1
Columbia University Press 1
Indiana University Press 1
Cornell University Press 2
Arcadia Publishing 1


In addition to regular functions and default constructors, you can also use a lambda function to set the default value of the dictionary:

In [12]:
no_error_periodic_table = defaultdict(lambda: 999999)
no_error_periodic_table['Hydrogen'] = 1
no_error_periodic_table['Helium'] = 2
no_error_periodic_table['Blastium']

999999
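Another common default factory is list(), which lets you group items under a key without first checking whether the key exists. The sketch below uses made-up (title, publisher) pairs as a stand-in for an API response; the titles are hypothetical.

```python
from collections import defaultdict

# Hypothetical (title, publisher) pairs standing in for an API response
books = [
    ('Book A', 'Frog Books'),
    ('Book B', 'Basic Books'),
    ('Book C', 'Frog Books'),
]

# list() is the factory, so every missing key starts out as an empty list
titles_by_publisher = defaultdict(list)
for title, publisher in books:
    titles_by_publisher[publisher].append(title)

print(dict(titles_by_publisher))

# Looking up a missing key with [] would insert it, so use `in` to test
# for membership without adding anything
has_yale = 'Yale University Press' in titles_by_publisher
print(has_yale, len(titles_by_publisher))
```

Keep in mind that any lookup with square brackets on a defaultdict adds the key, which can quietly grow the dictionary if you only meant to check for it.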

### Count Items with Counter()

Counter() produces a dictionary whose keys are the distinct values in a list and whose values are the number of times each one appears. We are going to reuse our API example above to demonstrate how to build such a counting dictionary with Counter:

In [13]:
from collections import Counter
from urllib.request import urlopen
import json

response = urlopen('https://www.googleapis.com/books/v1/volumes?q=berkeley&maxResults=40')
rawData = response.read().decode("utf-8")
book_data = json.loads(rawData)

# We will go through all of the books that are related to berkeley to create a list of all
# of the publishers of the books we pulled
berkeley_list = list()
for item in book_data["items"]:
    berkeley_list.append(item["volumeInfo"].setdefault("publisher", "None"))

# Create a Counter object based on the list of publishers we have created
berkeley_counter = Counter(berkeley_list)
print(berkeley_counter)

Counter({'None': 8, 'North Atlantic Books': 2, 'Oxford University Press, USA': 2, 'Cornell University Press': 2, 'U of Nebraska Press': 1, 'Oxford University Press on Demand': 1, 'Wm. B. Eerdmans Publishing': 1, 'Stanford University Press': 1, 'University Press of Kentucky': 1, 'Transaction Publishers': 1, 'Frog Books': 1, 'Manchester University Press': 1, 'Basic Books': 1, 'Psychology Press': 1, 'Filiquarian Publishing, LLC.': 1, 'LSU Press': 1, 'Springer Science & Business Media': 1, 'Princeton Architectural Press': 1, 'Yale University Press': 1, 'McGraw-Hill College': 1, 'Oxford ; New York : Oxford University Press': 1, 'Oxford University Press': 1, 'Heritage Books': 1, 'University of Chicago Press': 1, 'Cambridge University Press': 1, 'Lexington Books': 1, 'Genealogical Publishing Com': 1, 'Columbia University Press': 1, 'Indiana University Press': 1, 'Arcadia Publishing': 1})


Notice now we have a dictionary that has the list of the publishers as keys and their counts as values. This is another way we can count the publishers in the given selection of books. 

We can easily find the most common publishers in the list by using the counter's most_common() method:

In [14]:
berkeley_counter.most_common(3)

[('None', 8), ('North Atlantic Books', 2), ('Oxford University Press, USA', 2)]

We just received the three most common publishers in our list.
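Since the API response changes over time, here is a small offline sketch of the same Counter features using a hand-written list of publisher names (sample data, not real results). It also shows two behaviors not demonstrated above: a missing key returns 0 without being added, and update() adds more observations to an existing counter.

```python
from collections import Counter

# Hand-written sample data standing in for the API results
publishers = ['Frog Books', 'None', 'Basic Books', 'None', 'Frog Books', 'None']
counter = Counter(publishers)

# The two most frequent entries, as (key, count) tuples
top_two = counter.most_common(2)
print(top_two)

# A missing key yields 0 instead of raising KeyError, and is not stored
missing = counter['Yale University Press']
print(missing, 'Yale University Press' in counter)

# update() adds to the existing counts rather than replacing them
counter.update(['Basic Books', 'Basic Books'])
print(counter['Basic Books'])
```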

You can also perform operations on two or more counters. Let's make a counter for the first forty books that Google Books has related to Stanford:

In [15]:
from collections import Counter
from urllib.request import urlopen
import json

response = urlopen('https://www.googleapis.com/books/v1/volumes?q=standford&maxResults=40')
rawData = response.read().decode("utf-8")
book_data = json.loads(rawData)

# We will go through all of the books that are related to stanford to create a list of all
# of the publishers of the books we pulled

stanford_list = list()
for item in book_data["items"]:
    stanford_list.append(item["volumeInfo"].setdefault("publisher", "None"))

stanford_counter = Counter(stanford_list)
print(stanford_counter)

Counter({'None': 17, 'On The Mark Press': 6, 'Springer Science & Business Media': 4, 'AuthorHouse': 2, 'CRC Press': 2, 'A&C Black': 1, 'Stanford University Press': 1, 'Pickpocket Publishing': 1, 'Cambridge University Press': 1, 'John Wiley & Sons': 1, 'Lexington Books': 1, 'Genealogical Publishing Com': 1, 'BiblioBazaar, LLC': 1, 'Xlibris Corporation': 1})


We can now do some interesting operations, such as combining two counters by adding their counts:

In [16]:
berkeley_counter + stanford_counter

Counter({'None': 25, 'On The Mark Press': 6, 'Springer Science & Business Media': 5, 'Genealogical Publishing Com': 2, 'Stanford University Press': 2, 'North Atlantic Books': 2, 'Cornell University Press': 2, 'Oxford University Press, USA': 2, 'AuthorHouse': 2, 'Cambridge University Press': 2, 'CRC Press': 2, 'Lexington Books': 2, 'Oxford University Press on Demand': 1, 'McGraw-Hill College': 1, 'Wm. B. Eerdmans Publishing': 1, 'University Press of Kentucky': 1, 'Transaction Publishers': 1, 'Frog Books': 1, 'Manchester University Press': 1, 'Psychology Press': 1, 'Basic Books': 1, 'A&C Black': 1, 'Princeton Architectural Press': 1, 'Pickpocket Publishing': 1, 'Oxford ; New York : Oxford University Press': 1, 'Oxford University Press': 1, 'University of Chicago Press': 1, 'LSU Press': 1, 'Heritage Books': 1, 'BiblioBazaar, LLC': 1, 'Xlibris Corporation': 1, 'Yale University Press': 1, 'Columbia University Press': 1, 'Filiquarian Publishing, LLC.': 1, 'Indiana University Press': 1, 'U of

Which publishers appear in both the Berkeley and the Stanford results? The intersection keeps the smaller of the two counts for each shared publisher:

In [17]:
berkeley_counter & stanford_counter

Counter({'None': 8, 'Cambridge University Press': 1, 'Lexington Books': 1, 'Springer Science & Business Media': 1, 'Genealogical Publishing Com': 1, 'Stanford University Press': 1})

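What are all of the publishers that appear in the Stanford results, the Berkeley results, or both? Note that for publishers found in both collections the union does not add the counts together; it keeps the larger of the two (for example, 'None' is 17 below, not 25):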

In [18]:
berkeley_counter | stanford_counter

Counter({'None': 17, 'On The Mark Press': 6, 'Springer Science & Business Media': 4, 'North Atlantic Books': 2, 'Cornell University Press': 2, 'Oxford University Press, USA': 2, 'AuthorHouse': 2, 'CRC Press': 2, 'Oxford University Press on Demand': 1, 'McGraw-Hill College': 1, 'Wm. B. Eerdmans Publishing': 1, 'Genealogical Publishing Com': 1, 'Stanford University Press': 1, 'University Press of Kentucky': 1, 'Transaction Publishers': 1, 'Frog Books': 1, 'Manchester University Press': 1, 'Psychology Press': 1, 'Basic Books': 1, 'A&C Black': 1, 'Princeton Architectural Press': 1, 'Pickpocket Publishing': 1, 'Oxford ; New York : Oxford University Press': 1, 'Oxford University Press': 1, 'University of Chicago Press': 1, 'LSU Press': 1, 'Heritage Books': 1, 'BiblioBazaar, LLC': 1, 'Xlibris Corporation': 1, 'Yale University Press': 1, 'Cambridge University Press': 1, 'Columbia University Press': 1, 'Filiquarian Publishing, LLC.': 1, 'Lexington Books': 1, 'Indiana University Press': 1, 'U of
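The arithmetic behind these operators is easier to verify on a small input. The sketch below uses two tiny hand-built counters (the counts are sample values chosen for illustration) and also shows subtraction, which was not used above.

```python
from collections import Counter

# Two small hand-built counters with sample counts
a = Counter({'None': 8, 'Lexington Books': 1, 'Frog Books': 1})
b = Counter({'None': 17, 'Lexington Books': 1, 'CRC Press': 2})

added = a + b    # counts are summed key by key
common = a & b   # intersection: the minimum of each count
union = a | b    # union: the maximum of each count
diff = a - b     # subtraction: results of zero or less are dropped

print(added)
print(common)
print(union)
print(diff)
```

All four operators return a new Counter and leave the operands unchanged.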

### Ordering dictionaries with OrderedDict()

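Before Python 3.7, a standard dictionary did not guarantee that its entries were kept in the order in which they were inserted (from Python 3.7 onward, standard dictionaries do preserve insertion order). If you want a dictionary that explicitly iterates in insertion order, and that offers extra methods for reordering, use OrderedDict: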

In [19]:
from collections import OrderedDict
berkeley_publishers = OrderedDict(berkeley_counter)

for publisher in berkeley_publishers:
    print(publisher)

U of Nebraska Press
Oxford University Press on Demand
None
Wm. B. Eerdmans Publishing
Stanford University Press
University Press of Kentucky
Transaction Publishers
Frog Books
Manchester University Press
Basic Books
Psychology Press
Filiquarian Publishing, LLC.
LSU Press
Springer Science & Business Media
Princeton Architectural Press
Yale University Press
North Atlantic Books
McGraw-Hill College
Oxford ; New York : Oxford University Press
Oxford University Press
Heritage Books
Oxford University Press, USA
University of Chicago Press
Cambridge University Press
Lexington Books
Genealogical Publishing Com
Columbia University Press
Indiana University Press
Cornell University Press
Arcadia Publishing


Notice that this is the same order in which the publishers were listed earlier in this section. The dictionary `berkeley_publishers` will return publishers in that order every time it is iterated through. This can be useful if you want to preserve the order in which the API returned the data.
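Beyond remembering insertion order, OrderedDict has methods for rearranging entries. The following is a minimal sketch, using a hand-written list of publisher names as sample data, that builds an OrderedDict sorted by count and then reorders it with move_to_end() and popitem().

```python
from collections import Counter, OrderedDict

# Hand-written sample data standing in for the API results
counter = Counter(['Frog Books', 'None', 'None', 'Basic Books', 'None', 'Frog Books'])

# most_common() yields (key, count) pairs from highest to lowest count,
# so the OrderedDict is built in that order
by_count = OrderedDict(counter.most_common())
order_by_count = list(by_count)
print(order_by_count)

# move_to_end() pushes an existing key to the back
by_count.move_to_end('None')
order_after_move = list(by_count)
print(order_after_move)

# popitem(last=False) removes and returns the first entry
first_key, first_value = by_count.popitem(last=False)
print(first_key, first_value)
```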

### Using deques

Let's say we now want a list of publishers ordered by how many books each has published in our sample. We can build it from the counter's most_common() method, which returns the publishers from most to least frequent:

In [20]:
berkeley_publisher_list_ordered = list()
for key, value in berkeley_counter.most_common():
    berkeley_publisher_list_ordered.append(key)

print(berkeley_publisher_list_ordered)

['None', 'North Atlantic Books', 'Oxford University Press, USA', 'Cornell University Press', 'U of Nebraska Press', 'Oxford University Press on Demand', 'Wm. B. Eerdmans Publishing', 'Stanford University Press', 'University Press of Kentucky', 'Transaction Publishers', 'Frog Books', 'Manchester University Press', 'Basic Books', 'Psychology Press', 'Filiquarian Publishing, LLC.', 'LSU Press', 'Springer Science & Business Media', 'Princeton Architectural Press', 'Yale University Press', 'McGraw-Hill College', 'Oxford ; New York : Oxford University Press', 'Oxford University Press', 'Heritage Books', 'University of Chicago Press', 'Cambridge University Press', 'Lexington Books', 'Genealogical Publishing Com', 'Columbia University Press', 'Indiana University Press', 'Arcadia Publishing']


Let's say we want to remove the most common and the least common publishers from our list. One way to do it is to convert `berkeley_publisher_list_ordered` into a deque, which lets us pop elements from both the beginning and the end:

In [21]:
from collections import deque

berkeley_publisher_list_ordered_deque = deque(berkeley_publisher_list_ordered)
berkeley_publisher_list_ordered_deque.pop()
berkeley_publisher_list_ordered_deque.popleft()
print(berkeley_publisher_list_ordered_deque)

deque(['North Atlantic Books', 'Oxford University Press, USA', 'Cornell University Press', 'U of Nebraska Press', 'Oxford University Press on Demand', 'Wm. B. Eerdmans Publishing', 'Stanford University Press', 'University Press of Kentucky', 'Transaction Publishers', 'Frog Books', 'Manchester University Press', 'Basic Books', 'Psychology Press', 'Filiquarian Publishing, LLC.', 'LSU Press', 'Springer Science & Business Media', 'Princeton Architectural Press', 'Yale University Press', 'McGraw-Hill College', 'Oxford ; New York : Oxford University Press', 'Oxford University Press', 'Heritage Books', 'University of Chicago Press', 'Cambridge University Press', 'Lexington Books', 'Genealogical Publishing Com', 'Columbia University Press', 'Indiana University Press'])


Notice that the first and last publishers in the ordered list ("None" and "Arcadia Publishing" respectively) were removed from the deque. The method pop() removes the rightmost element ("Arcadia Publishing") and popleft() removes the leftmost element ("None"). Keep in mind that many publishers are tied at a count of one, so "Arcadia Publishing" is simply the one that happens to come last.

Deques are useful in a variety of cases, but you can also always use slices to remove elements from the beginning and end of a list:

In [22]:
berkeley_publisher_list_ordered_copy = list(berkeley_publisher_list_ordered)
berkeley_publisher_list_ordered_copy = berkeley_publisher_list_ordered_copy[1:-1]
print(berkeley_publisher_list_ordered_copy)

['North Atlantic Books', 'Oxford University Press, USA', 'Cornell University Press', 'U of Nebraska Press', 'Oxford University Press on Demand', 'Wm. B. Eerdmans Publishing', 'Stanford University Press', 'University Press of Kentucky', 'Transaction Publishers', 'Frog Books', 'Manchester University Press', 'Basic Books', 'Psychology Press', 'Filiquarian Publishing, LLC.', 'LSU Press', 'Springer Science & Business Media', 'Princeton Architectural Press', 'Yale University Press', 'McGraw-Hill College', 'Oxford ; New York : Oxford University Press', 'Oxford University Press', 'Heritage Books', 'University of Chicago Press', 'Cambridge University Press', 'Lexington Books', 'Genealogical Publishing Com', 'Columbia University Press', 'Indiana University Press']
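Where a deque offers more than a slice is in repeated additions and removals at both ends. The sketch below, using single-letter placeholder values, shows three deque features not used above: a bounded deque created with maxlen, appendleft(), and rotate().

```python
from collections import deque

# A bounded deque keeps only the most recent maxlen items;
# the oldest item is discarded from the left when a new one is appended
recent = deque(maxlen=3)
for publisher in ['A', 'B', 'C', 'D']:
    recent.append(publisher)
print(recent)

# Elements can be added at either end
queue = deque(['B', 'C'])
queue.appendleft('A')
queue.append('D')
print(queue)

# rotate(1) moves the rightmost element to the front
queue.rotate(1)
print(queue)
```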


### Pretty print with pprint

So far we have used only the function `print()` to print our lists and dictionaries. The function `pprint()` from the pprint module formats them so that they are easier to read. Let's print that Berkeley publisher list in a better way:

In [23]:
from pprint import pprint

pprint(berkeley_publisher_list_ordered)

['None',
 'North Atlantic Books',
 'Oxford University Press, USA',
 'Cornell University Press',
 'U of Nebraska Press',
 'Oxford University Press on Demand',
 'Wm. B. Eerdmans Publishing',
 'Stanford University Press',
 'University Press of Kentucky',
 'Transaction Publishers',
 'Frog Books',
 'Manchester University Press',
 'Basic Books',
 'Psychology Press',
 'Filiquarian Publishing, LLC.',
 'LSU Press',
 'Springer Science & Business Media',
 'Princeton Architectural Press',
 'Yale University Press',
 'McGraw-Hill College',
 'Oxford ; New York : Oxford University Press',
 'Oxford University Press',
 'Heritage Books',
 'University of Chicago Press',
 'Cambridge University Press',
 'Lexington Books',
 'Genealogical Publishing Com',
 'Columbia University Press',
 'Indiana University Press',
 'Arcadia Publishing']
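pprint is especially helpful for nested structures such as JSON responses. The following sketch uses a small hand-written dictionary shaped loosely like one book entry (the values are made up for illustration) and shows the width and depth parameters, along with pformat(), which returns the formatted string instead of printing it.

```python
from pprint import pformat, pprint

# Hand-written sample data; the values are made up for illustration
volume = {
    'volumeInfo': {
        'title': 'A Hypothetical Title',
        'publisher': 'Frog Books',
        'industryIdentifiers': [
            {'type': 'ISBN_13', 'identifier': '0000000000000'},
        ],
    }
}

# width limits the line length used when wrapping the output
pprint(volume, width=60)

# depth limits how many nesting levels are shown; deeper levels become ...
pprint(volume, depth=2)

# pformat() returns the string rather than printing it
summary = pformat(volume, depth=1)
print(summary)
```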


This is just a sampling of the extremely useful modules and packages available in the Python Standard Library, which is documented here: https://docs.python.org/3/library/. If you can't find the functionality you need in the Python Standard Library, check PyPI, the large community-driven package index, at https://pypi.python.org/pypi.