# 27 - Various Useful Modules

---

At this point in the course, everything that you really need to know to be an accomplished Python programmer has been covered. There are still some chapters left to explore, which I consider to contain mostly optional (though often very interesting) material. To finish off the mandatory part, I want to quickly highlight a few useful modules that do not need a chapter of their own. I will not give many details; once you know what the purpose of a module is, you can look up more information on it in the Python reference.

---

## `datetime`

The `datetime` module contains classes for the manipulation of dates and times, of which the most important ones are `datetime`, `timedelta`, `date`, and `time`. A `datetime` object has the attributes `year`, `month`, `day`, `hour`, `minute`, `second`, `microsecond`, and `tzinfo` (the last attribute provides time zone information). `date` and `time` objects have subsets of these attributes. Objects of these types are immutable.

I restrict myself to discussing the `datetime` and `timedelta` classes, though related functions and methods exist for the other classes. 

`datetime` objects hold a date and a time. Amongst the methods for datetime objects are:

- `now()` creates a `datetime` object that contains the current date and time. It is a class method, so you normally call it on the class itself, as `datetime.now()`.
- `datetime()` creates a `datetime` object using given arguments. The first three arguments are not optional, and are `year`, `month`, and `day`. The others, `hour`, `minute`, `second`, `microsecond`, and `tzinfo` are optional. Arguments can either be given in this order, or by specifying `<argument>=<value>`, with `<argument>` an argument name as specified here.

In [None]:
from datetime import datetime

print( datetime.now() )
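
The `datetime()` constructor from the list above can be sketched in the same way. The date used here is an arbitrary example, and the variable names `launch` and `midnight` are only illustrative.

```python
from datetime import datetime

# Positional arguments: year, month, day, hour, minute, second.
launch = datetime( 2024, 2, 29, 13, 45, 30 )
# Keyword arguments; the time parts that are left out default to zero.
midnight = datetime( year=2024, month=2, day=29 )

print( launch )
print( launch.year, launch.month, launch.day )
print( midnight.hour, midnight.minute, midnight.second )
```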

When printing `datetime` objects you get a specific format as output. If you want a different format (including printing such things as the day of the week) then `datetime` objects have methods, such as `strftime()`, that allow you to specify different kinds of formatting. For more information, see the Python reference.
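
As a minimal sketch of such formatting, the code below uses the `strftime()`, `isoformat()`, and `weekday()` methods of a `datetime` object. The date and the format string are arbitrary choices for illustration; the Python reference lists all available format codes.

```python
from datetime import datetime

moment = datetime( 2024, 2, 29, 13, 45, 30 )

# %d, %m, %Y, %H and %M are the zero-padded day, month, year, hour and minute.
formatted = moment.strftime( "%d/%m/%Y %H:%M" )
print( formatted )
# isoformat() gives the ISO 8601 notation.
print( moment.isoformat() )
# weekday() counts from Monday = 0 to Sunday = 6.
print( moment.weekday() )
```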

To calculate with `datetime` objects, you need `timedelta`. A `timedelta` object specifies a difference between two `datetime` objects. A `timedelta` object stores `days`, `seconds`, and `microseconds`. You can create `timedelta` objects with other period-representing arguments, but it only stores the three mentioned here; other arguments are recalculated into these three. 
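
The recalculation of other arguments into days, seconds, and microseconds can be illustrated with a small sketch. The chosen period is arbitrary.

```python
from datetime import timedelta

# Weeks, hours and minutes are accepted as arguments, but not stored as such.
period = timedelta( weeks=1, hours=25, minutes=30 )

print( period )
print( period.days, period.seconds, period.microseconds )
# total_seconds() expresses the whole period as one number of seconds.
print( period.total_seconds() )
```

One week and 25.5 hours are stored as 8 days plus 5400 seconds.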

You can perform all kinds of calculations with `timedelta` objects, but the most useful ones concern the difference between `datetime` objects. So you can add a `timedelta` object to a `datetime` object to get a new `datetime` object, or subtract two `datetime` objects from each other to get their difference as a `timedelta` object. 
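
Both calculations can be sketched with fixed dates, so that the outcome does not depend on the day on which you run the code. The dates and variable names are arbitrary examples.

```python
from datetime import datetime, timedelta

start = datetime( 2024, 12, 25, 9, 0, 0 )

# datetime + timedelta gives a new datetime.
later = start + timedelta( days=10, hours=3 )
print( later )

# datetime - datetime gives a timedelta.
difference = later - start
print( difference.days, difference.seconds )
```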

In [None]:
from datetime import datetime, timedelta

thisyear = datetime.now().year
xmasthisyear = datetime( thisyear, 12, 25, 23, 59, 59 )
thisday = datetime.now()
days = xmasthisyear - thisday

if days.days < 0:
    print( "Christmas will come again next year." )
elif days.days == 0:
    print( "It's Christmas!" )
else:
    print( "Only", days.days, "days to Christmas!" )

---

## `collections`

The `collections` module contains handy classes that allow you to manipulate iterables such as strings, tuples, lists, dictionaries, and sets. `collections` offers many interesting functionalities, most of which are a bit eccentric, making it unlikely that you will need to use them soon. I discuss two of them, namely the `Counter` class and the `deque` class.

A `Counter` object is similar to a dictionary, which contains items as keys, and for each of the items a "count" as value. You create a `Counter` object by providing the iterable of which you want to count the items as argument. It has some useful methods, such as:

- `most_common()` gets an integer as argument and returns a list containing the items that have the highest count, as many as the integer argument indicates. The items on the list are 2-tuples, the first element of a tuple being the counted item, and the second element being the count. They are ordered from most common to least common. If no integer argument is specified, the list contains all the items.
- `update()` gets an iterable as argument and "adds in" the items of the iterable.

In [None]:
from collections import Counter

data = [ "apple", "banana", "apple", "banana", "apple", "cherry" ]
c = Counter( data )
print( c )
print( c.most_common( 1 ) )

data2 = [ "apple", "orange", "cherry", "cherry", "cherry", "cherry" ]
c.update( data2 )
print( c )
print( c.most_common() )
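
Because a `Counter` object is similar to a dictionary, you can look up and iterate over the counts in the usual way. The sketch below counts the letters of an arbitrary example word.

```python
from collections import Counter

letters = Counter( "mississippi" )

# Counts are looked up like dictionary values.
print( letters["s"] )
# A missing key gives a count of zero instead of a KeyError.
print( letters["z"] )
# The usual dictionary iteration works as well.
for letter, count in letters.items():
    print( letter, count )
```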

A `deque` object is a list-like object that is supposed to be used as a "double-ended queue", i.e., a list for which items are added and removed at either end of the list. It supports methods that are similar to list methods, such as `append()`, `extend()`, and `pop()`, which work at the right side end of the list, but also has similar methods that work at the left side end of the list, such as `appendleft()`, `extendleft()`, and `popleft()`. For the rest, it has most of the methods that you expect a list to have. You create a `deque` object with the iterable which you want to turn into a `deque` as argument.

In [None]:
from collections import deque

dq = deque( [ 1, 2, 3 ] )
dq.appendleft( 4 )
dq.extendleft( [ 5, 6 ] )
print( dq )
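
A typical queue adds items at one end and removes them at the other. The sketch below shows this with `append()` and `popleft()`, and also shows that `extendleft()` adds its items one by one, so that they end up in reversed order. The contents are arbitrary examples.

```python
from collections import deque

queue = deque( [ "first", "second" ] )
queue.append( "third" )      # joins at the right side end
served = queue.popleft()     # leaves from the left side end
print( served, queue )

# extendleft() adds the items one by one, so their order is reversed.
numbers = deque( [ 1, 2, 3 ] )
numbers.extendleft( [ 5, 6 ] )
print( numbers )
```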

---

## `urllib`

The `urllib` module allows you to access web pages in the same way that you access files. There are two submodules of main interest: `urllib.request` contains functions to access Internet content, and `urllib.error` contains definitions for exceptions that can be raised. You can also use `urllib` to communicate with webpages; if you want to do so, you need to study the `urllib.parse` module. For now, I only give a simple example in which you want to open a webpage and read its contents.

In [None]:
from urllib.request import urlopen
from urllib.error import HTTPError, URLError
import sys

try:
    u = urlopen( "http://www.python.org" )
except HTTPError as e:
    print( "HTTP Error", e )
    sys.exit()
except URLError as e:
    print( "URL error", e )
    sys.exit()

for i in range( 5 ):
    text = u.readline()
    print( text )
    
u.close()

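Note that only `urlopen` is needed to open the page; `HTTPError` and `URLError` are imported solely to handle failures. Once you have opened a web address, it is returned as a file-like object, so you can use regular file methods such as `readline()` and `close()` on it. The lines that you read are byte strings, which is why they are displayed with a `b` prefix.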
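
To give a first impression of `urllib.parse`, the sketch below splits a web address into its components with `urlparse()` and builds a query string with `urlencode()`. It does not access the Internet. The path and the query parameters are made up for illustration.

```python
from urllib.parse import urlparse, urlencode

# Split a web address into its components (path and query are made up).
parts = urlparse( "http://www.python.org/about/?lang=en" )
print( parts.scheme, parts.netloc, parts.path, parts.query )

# Build a query string from a dictionary; the keys are made up as well.
query = urlencode( { "q": "useful modules", "page": 2 } )
print( query )
```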

---

## `glob`

The `glob` module provides a function `glob()` to produce lists of files based on a wildcard search pattern that is provided as argument. The wildcard search uses Unix conventions, most of which also hold on other systems. They are as follows:

- A question mark (`?`) in a file name indicates any single character.
- An asterisk (`*`) in a file name indicates any sequence of characters, including an empty one.
- A sequence of characters between square brackets (`[]`) indicates any of these characters; a dash may be used to indicate a sequence that runs from the character to the left of the dash to the character to the right of the dash.

For instance, the wildcard search "`A[0-9]?B.*`" looks for all files that start with the letter `A`, followed by a digit, followed by any character, followed by a `B`, with any extension. It depends on the operating system whether this is a case-sensitive or case-insensitive search.
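
To try out this pattern without depending on the files that happen to be on your system, the sketch below creates a few empty files in a temporary folder and searches that folder. The file names are made up for illustration, and `escape()` is used only to make sure that the name of the temporary folder itself is not interpreted as a pattern.

```python
import os
from glob import glob, escape
from tempfile import TemporaryDirectory

names = [ "A1xB.txt", "A12B.py", "AxyB.txt", "B1xA.txt" ]

with TemporaryDirectory() as folder:
    # Create empty files to search through.
    for name in names:
        open( os.path.join( folder, name ), "w" ).close()
    # escape() protects any special characters in the folder name itself.
    pattern = os.path.join( escape( folder ), "A[0-9]?B.*" )
    found = sorted( os.path.basename( path ) for path in glob( pattern ) )

print( found )
```

Only the first two names match: `AxyB.txt` has no digit after the `A`, and `B1xA.txt` does not start with an `A`.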

Do not confuse a wildcard search pattern with a regular expression. While they have some superficial resemblance (such as square brackets indicating "any of these characters" in both of them), they work quite differently. For example, a question mark indicates "any single character" in a wildcard search, but marks the preceding item as optional in a regular expression. Wildcard searches only support the patterns listed above (some of which have a different meaning for regular expressions), and are only used for `glob` and when directly communicating with the system via the command prompt.

In [None]:
from glob import glob

glist = glob( "PC0[0-9]*.*" )
for name in glist:
    print( name )

The `glob` module also contains a function `iglob()`, which has the same functionality as `glob()`, but produces an iterator instead of a list.

**Exercise**: Use `glob()` to list all Python files in the current directory.

In [None]:
# Python files.
from glob import glob


---

## `statistics`

The `statistics` module gives you access to various common statistical functions. All of these functions get as argument a sequence or iterator of numbers (integers or floats).

- `mean()` calculates the mean (or average) of a sequence of numbers
- `median()` calculates the median of a sequence of numbers, i.e., the "middle" number
- `mode()` calculates the mode of a sequence of numbers, i.e., the number that occurs most often
- `stdev()` calculates the sample standard deviation of a sequence of numbers
- `variance()` calculates the sample variance of a sequence of numbers

There are a few more functions in the `statistics` module, but these are the most-used ones. For more advanced statistical calculations, there are other modules available, which are discussed in a later chapter.

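These functions may raise a `StatisticsError`, for instance when the sequence is empty. This used to be particularly relevant for the `mode()` function: before Python 3.8, it raised a `StatisticsError` when no unique mode could be found. From Python 3.8 onwards, `mode()` instead returns the first of the most common numbers that it encounters.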

In [None]:
from statistics import mean, median, mode, stdev, variance, StatisticsError

data = [ 4, 5, 1, 1, 2, 2, 2, 3, 3, 3 ]

print( "mean:", mean( data ) )
print( "median:", median( data ) )
try:
    print( "mode:", mode( data ) )
except StatisticsError as e:
    print( e )
print( "st.dev.: {:.3f}".format( stdev( data ) ) )
print( "variance: {:.3f}".format( variance( data ) ) )

Note that for a sequence with an even number of numbers, the median is the average of the two "middle" numbers. There are different ways of calculating the median in case of an even number of numbers; if you want to use a different one, examine other functions in the `statistics` module.
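
Two of those other functions are `median_low()` and `median_high()`, which pick one of the two "middle" numbers instead of averaging them. A minimal sketch with arbitrary data:

```python
from statistics import median, median_low, median_high

data = [ 1, 2, 3, 4 ]

print( median( data ) )       # average of the two middle numbers
print( median_low( data ) )   # the smaller of the two middle numbers
print( median_high( data ) )  # the larger of the two middle numbers
```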

As for the mode, in the literature you find multiple definitions of what the mode is supposed to be. The general definition is "the most common number", but what if there are multiple of those? What if each number is unique? The version of the mode that the `statistics` module supports does not seem to be the most common one.
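
If you want all of the most common numbers, the `statistics` module offers `multimode()`. The sketch below assumes Python 3.8 or later, where this function is available. Note that it does not require a number to occur at least twice.

```python
from statistics import multimode

data = [ 4, 5, 1, 1, 2, 2, 2, 3, 3, 3 ]

# multimode() returns a list with every number that has the highest count.
print( multimode( data ) )
# If every number is unique, all of them are returned.
print( multimode( [ 7, 8, 9 ] ) )
```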

---

## What you learned

In this chapter, you learned about:

- `datetime` module
- `collections` module
- `urllib` module
- `glob` module
- `statistics` module

---

## Exercises

### Exercise 27.1

The code block below contains a sentence. Using the `Counter` class, list the five most common letters in that sentence, with their counts.

In [None]:
# Letter counts.
from collections import Counter

sentence = "Your mother was a hamster and your father smelled of elderberries."


### Exercise 27.2

Create a program that asks the user for numbers, until the user enters zero. It then prints the mean, median, and mode of these numbers. The `statistics` module can be used for the mean and median; however, for the mode, print all those numbers that have the highest count, even if that entails that you print more than one number. By definition, for a number to be the mode it must occur at least twice; so if every number only occurs once, there is no mode. Hint: Consider using the `Counter` class to construct the mode.

In [None]:
# Mean, median, mode.


---

## Python 2

`urllib` has been changed considerably between Python 2 and Python 3.

The `statistics` module does not exist for Python 2.

---

End of Chapter 27. Version 1.1. 