# Modern Computing in Simple Packages

## The Python Standard Library

The Python standard library provides a wealth of built-in packages and modules that can make your life easier. It is good practice to see if functionality that you need in your scripts and programs has already been implemented in the Python standard library.

We are going to go through a number of nice features that the Python standard library provides.

### Handling missing keys with setdefault()

You may not want to handle a KeyError every time you look up a key that does not exist in a dictionary. The dictionary method setdefault() helps here: it returns a default value when the given key is missing.

In [1]:
periodic_table = {'Hydrogen': 1, 'Helium': 2}
periodic_table['Carbon']

KeyError: 'Carbon'

To prevent the above KeyError, you can use the dictionary method setdefault(). It returns the value stored under the key if the key exists, and the default you specify if it does not, as in the following example.

In [None]:
periodic_table.setdefault('Hydrogen', 12)

In [None]:
periodic_table.setdefault('Helium', 12)

In [None]:
periodic_table.setdefault('Carbon', 12)

Note that setdefault() also adds the previously missing key to the dictionary, with the default as its value.

In [None]:
periodic_table
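
The behavior described above can be sketched in one self-contained cell. The data mirrors the `periodic_table` example; the comparison with `dict.get()` is an addition for contrast and is not part of the original walkthrough.

```python
# Sample data mirroring the periodic_table example above.
periodic_table = {'Hydrogen': 1, 'Helium': 2}

# Existing key: the stored value is returned and the default is ignored.
hydrogen = periodic_table.setdefault('Hydrogen', 12)
print(hydrogen)  # 1

# Missing key: the default is returned and also stored under the new key.
carbon = periodic_table.setdefault('Carbon', 12)
print(carbon)  # 12
print(periodic_table)  # {'Hydrogen': 1, 'Helium': 2, 'Carbon': 12}

# get() also accepts a fallback, but it leaves the dictionary unchanged.
oxygen = periodic_table.get('Oxygen', 12)
print(oxygen)  # 12
print('Oxygen' in periodic_table)  # False
```

If you only need a fallback value for reading, `get()` is probably the safer choice; `setdefault()` fits when you want the key to exist afterwards.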

### Create a dictionary with a default value using defaultdict()

defaultdict creates a dictionary that supplies a default value for nonexistent keys. It takes a function that is called, with no arguments, whenever a missing key is looked up; the function's return value becomes the value for that key. The function is passed in when the dictionary is created.

In [None]:
from collections import defaultdict

def not_an_element():
    return int()

no_error_periodic_table = defaultdict(not_an_element)
no_error_periodic_table['Hydrogen'] = 1
no_error_periodic_table['Helium'] = 2

In [None]:
no_error_periodic_table['Hydrogen']

In [None]:
no_error_periodic_table['Helium']

In [None]:
no_error_periodic_table['Blastium']
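
One consequence worth seeing in isolation: a square-bracket lookup of a missing key does not just return the default, it also stores it. The sketch below uses a hypothetical `elements` dictionary rather than the one defined above.

```python
from collections import defaultdict

# Hypothetical dictionary; int() is the default factory, so missing keys become 0.
elements = defaultdict(int)
elements['Hydrogen'] = 1

# A membership test does not call the default factory.
print('Blastium' in elements)  # False

# A square-bracket lookup does: it returns int() == 0 and stores it.
print(elements['Blastium'])    # 0
print('Blastium' in elements)  # True
print(len(elements))           # 2
```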

Constructors like int(), list(), or dict() return empty values of their respective data types (0, [], and {}), which makes them handy default factories. One handy use of defaultdict is a counter that tallies how many times each key occurs in a list of data.

For example, let's pull the first forty books in Google Books that are related to "Berkeley" and count how many books were published by a particular publisher.

In [None]:
from collections import defaultdict
from urllib.request import urlopen
import json

response = urlopen('https://www.googleapis.com/books/v1/volumes?q=berkeley&maxResults=40')
rawData = response.read().decode("utf-8")
book_data = json.loads(rawData)

# Here we create a dictionary whose default is int(), or zero
publisher_counter = defaultdict(int)

# We will go through all of the books that are related to berkeley and count
# how many books were published from a particular publisher
for item in book_data["items"]:
    
    # Note how we use setdefault to fall back to the string "None" when the
    # response has no publisher
    publisher = item["volumeInfo"].setdefault("publisher", "None")
    
    # The default nature of publisher_counter enables us to do this without any raised
    # exceptions
    publisher_counter[publisher] += 1
    
for publisher, count in publisher_counter.items():
    print(publisher, count)

In addition to using regular functions and default constructors, you can also use a lambda function to set the default value of the dictionary.

In [None]:
no_error_periodic_table = defaultdict(lambda: 999999)
no_error_periodic_table['Hydrogen'] = 1
no_error_periodic_table['Helium'] = 2
no_error_periodic_table['Blastium']
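
Counting is not the only use for a default factory. As a minimal sketch, assuming some made-up (title, publisher) pairs in place of real API results, `list` can serve as the factory to group items under each key:

```python
from collections import defaultdict

# Hypothetical (title, publisher) pairs standing in for real API results.
books = [
    ("Book A", "Press One"),
    ("Book B", "Press Two"),
    ("Book C", "Press One"),
]

# list is the default factory, so every new key starts as an empty list.
titles_by_publisher = defaultdict(list)
for title, publisher in books:
    titles_by_publisher[publisher].append(title)

print(dict(titles_by_publisher))
# {'Press One': ['Book A', 'Book C'], 'Press Two': ['Book B']}
```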

### Count items with Counter()

Counter is a dictionary subclass that maps each distinct item in a list (or any other iterable) to the number of times it appears. We will reuse the API example above to show how Counter builds this tally for us.

In [None]:
from collections import Counter
from urllib.request import urlopen
import json

response = urlopen('https://www.googleapis.com/books/v1/volumes?q=berkeley&maxResults=40')
rawData = response.read().decode("utf-8")
book_data = json.loads(rawData)

# We will go through all of the books that are related to berkeley to create a list of all
# of the publishers of the books we pulled
berkeley_list = list()
for item in book_data["items"]:
    berkeley_list.append(item["volumeInfo"].setdefault("publisher", "None"))

# Create a Counter object based on the list of publishers we have created
berkeley_counter = Counter(berkeley_list)
print(berkeley_counter)

Now we have a dictionary-like object with the publishers as keys and their counts as values. This is another way to count the publishers in the given selection of books.

We can easily find the most common publishers with the counter's most_common() method.

In [None]:
berkeley_counter.most_common(3)

We just received the three most common publishers in our list.
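
Because the API results change over time, here is a small offline sketch of the same idea. The publisher names are made up for illustration.

```python
from collections import Counter

# Hypothetical publisher names standing in for the API results.
publishers = ["Press One", "Press Two", "Press One", "None", "Press One", "None"]

publisher_counter = Counter(publishers)

# most_common(n) returns a list of (item, count) tuples, highest count first.
top_two = publisher_counter.most_common(2)
print(top_two)  # [('Press One', 3), ('None', 2)]

# A Counter returns 0 for items it has never seen instead of raising KeyError.
print(publisher_counter["Press Nine"])  # 0
```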

You can also perform operations on two or more counters. Let's make a counter for the first forty books in Google Books related to Stanford.

In [None]:
from collections import Counter
from urllib.request import urlopen
import json

response = urlopen('https://www.googleapis.com/books/v1/volumes?q=stanford&maxResults=40')
rawData = response.read().decode("utf-8")
book_data = json.loads(rawData)

# We will go through all of the books that are related to stanford to create a list of all
# of the publishers of the books we pulled
stanford_list = list()
for item in book_data["items"]:
    stanford_list.append(item["volumeInfo"].setdefault("publisher", "None"))

stanford_counter = Counter(stanford_list)
print(stanford_counter)

We can now do some interesting operations, such as adding two counters together.

In [None]:
berkeley_counter + stanford_counter

Which publishers publish for both Berkeley and Stanford?

In [None]:
berkeley_counter & stanford_counter

What are all of the publishers that publish for either Stanford or Berkeley or both? Note that for publishers found in both collections, this keeps the larger of the two counts rather than adding them.

In [None]:
berkeley_counter | stanford_counter
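
To make the difference between the three operators concrete, here is a minimal sketch with two small hand-written counters. The names `a` and `b` and the publisher names are illustrative only.

```python
from collections import Counter

# Two hypothetical counters that share one publisher ("Press One").
a = Counter({"Press One": 3, "Press Two": 1})
b = Counter({"Press One": 2, "Press Three": 4})

total = a + b   # addition: counts for shared keys are summed
common = a & b  # intersection: shared keys only, with the smaller count
either = a | b  # union: every key, with the larger count

print(sorted(total.items()))   # [('Press One', 5), ('Press Three', 4), ('Press Two', 1)]
print(sorted(common.items()))  # [('Press One', 2)]
print(sorted(either.items()))  # [('Press One', 3), ('Press Three', 4), ('Press Two', 1)]
```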

### Ordering dictionaries with OrderedDict()

Before Python 3.7, a standard dictionary did not guarantee that its entries would be kept in the order in which they were inserted; from 3.7 onward, insertion order is preserved. If you need to support older versions, or want a dictionary built around ordering, use OrderedDict.

In [None]:
from collections import OrderedDict
berkeley_publishers = OrderedDict(berkeley_counter)

for publisher in berkeley_publishers:
    print(publisher)

Note that this is the same order in which the publishers were added earlier. Iterating over `berkeley_publishers` returns the publishers in that order every time. This can be useful if you want to preserve the order in which the API returned the data.
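
OrderedDict also has order-aware methods that a plain dictionary lacks. The sketch below, which uses made-up publisher counts, shows two of them: `move_to_end()` and `popitem(last=False)`.

```python
from collections import OrderedDict

# Hypothetical publisher counts, inserted in a known order.
publishers = OrderedDict()
publishers["Press One"] = 3
publishers["Press Two"] = 1
publishers["Press Three"] = 4

print(list(publishers))  # ['Press One', 'Press Two', 'Press Three']

# move_to_end() repositions an existing key without changing its value.
publishers.move_to_end("Press One")
print(list(publishers))  # ['Press Two', 'Press Three', 'Press One']

# popitem(last=False) removes and returns the first remaining entry.
first = publishers.popitem(last=False)
print(first)  # ('Press Two', 1)
```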

### Using deques

Let's say we now want a list of publishers ordered by how many books they have published in our sample. We can build it from most_common(), which returns the entries from most to least common when called without an argument.

In [None]:
berkeley_publisher_list_ordered = list()
for key, value in berkeley_counter.most_common():
    berkeley_publisher_list_ordered.append(key)

print(berkeley_publisher_list_ordered)

Let's say we want to remove the most common and least common publishers from our list. One way to do this is to convert `berkeley_publisher_list_ordered` into a deque, which lets us pop elements from both the beginning and the end.

In [None]:
from collections import deque

berkeley_publisher_list_ordered_deque = deque(berkeley_publisher_list_ordered)
berkeley_publisher_list_ordered_deque.pop()
berkeley_publisher_list_ordered_deque.popleft()
print(berkeley_publisher_list_ordered_deque)

Notice that the most popular publisher and the least popular publisher ("None" and "University of Chicago Press", respectively) were removed from the deque. The method pop() removes the last (rightmost) element ("University of Chicago Press") and popleft() removes the first (leftmost) element ("None").

Deques are useful in a variety of cases, but you can also use slices to remove elements from the beginning and end of a list.

In [None]:
berkeley_publisher_list_ordered_copy = list(berkeley_publisher_list_ordered)
berkeley_publisher_list_ordered_copy = berkeley_publisher_list_ordered_copy[1:-1]
print(berkeley_publisher_list_ordered_copy)
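
As a further sketch of what a deque can do at both ends, the cell below uses a made-up publisher list. The `maxlen` argument is an extra feature not used in the example above; it is shown here only to illustrate one case where a deque is more convenient than a list.

```python
from collections import deque

# Hypothetical publishers, ordered from most to least common.
ordered = deque(["None", "Press One", "Press Two", "Press Three"])

# pop() and popleft() return the element they remove.
least_common = ordered.pop()
most_common = ordered.popleft()
print(most_common, least_common)  # None Press Three
print(ordered)                    # deque(['Press One', 'Press Two'])

# appendleft() adds to the front, mirroring append() at the back.
ordered.appendleft("Press Zero")
print(ordered)  # deque(['Press Zero', 'Press One', 'Press Two'])

# With maxlen set, appending to a full deque discards items from the other end.
recent = deque(maxlen=2)
for name in ["Press One", "Press Two", "Press Three"]:
    recent.append(name)
print(recent)  # deque(['Press Two', 'Press Three'], maxlen=2)
```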

### Pretty print with pprint

We have been exclusively using the function `print()` to print out our lists and dictionaries; however, pprint can format our elements to make them easier to read. Let's print the Berkeley publisher list in a better way.

In [None]:
from pprint import pprint

pprint(berkeley_publisher_list_ordered)
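
pprint tends to be most helpful with nested data. The following sketch uses a hypothetical record, loosely shaped like a single book entry, to show the `width` parameter and the companion function `pformat()`, which returns the formatted text instead of printing it.

```python
from pprint import pprint, pformat

# Hypothetical nested record; the field names are illustrative only.
book = {
    "title": "An Example Title",
    "publisher": "Press One",
    "authors": ["First Author", "Second Author", "Third Author"],
    "identifiers": {"isbn10": "0000000000", "isbn13": "0000000000000"},
}

# width sets the target line length; structures that do not fit on one
# line are split across several lines.
pprint(book, width=60)

# pformat() builds the same text as a string, e.g. for logging.
text = pformat(book, width=60)
print(len(text.splitlines()) > 1)  # True
```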

This is just a sampling of the extremely useful modules and packages available in the Python standard library, which you can find here: https://docs.python.org/3/library/. If you cannot find the functionality you need in the standard library, check the Python Package Index (PyPI), a large community-driven repository of open-source packages, at https://pypi.python.org/pypi.

### Note on installing third-party packages

To really appreciate the full power of PyPI, we highly recommend [pip](https://pip.pypa.io/en/stable/) and [virtualenv](https://virtualenv.pypa.io/en/latest/index.html). Pip is Python's package installation utility, and it makes it very easy to install third-party packages from PyPI on your machine. If you have installed the latest version of Python 3, pip should already be included in the installation.

To install a package, run the following command.

`pip install [package_name]`

After the installation completes, all of your scripts will have access to the installed package. You may need sudo access on Linux or Mac machines, or administrator rights on Windows, to complete the installation; watch for permission-denied errors.

If you do not have administrator access, or if you simply do not want to modify your system, you can create a virtual environment where you can safely install all of the third-party packages in a separate folder away from your system files. Check [virtualenv](https://virtualenv.pypa.io/en/latest/index.html) for more information.

Now we are able to create our own modules and packages that can be included in other applications using the import statement. Next we will discuss how to create proper classes in our modules. We will use Python 3 to build object-oriented scripts and applications.