# Data containers in Python

Lars Tiede


## Basic container types

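We already know these: tuple, list, set, and dict. In the examples below, note that the set drops duplicate elements and the dict keeps only the last value assigned to a repeated key.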

In [1]:
mytuple = (0,1,"a",0,1)
print(mytuple)

(0, 1, 'a', 0, 1)


In [2]:
mylist = [0,1,"a",0,1]
print(mylist)

[0, 1, 'a', 0, 1]


In [3]:
myset = {0,1,"a",0,1}
print(myset)

{0, 1, 'a'}


In [4]:
mydict = {0:"A", 1:"B", "a":"C", 0:"D", 1:"E"}
print(mydict)

{0: 'D', 1: 'E', 'a': 'C'}


## More containers: `collections`

The [collections](https://docs.python.org/3/library/collections.html) module in the standard library gives us very useful container types beyond the basic ones.

In [5]:
import collections

*BTW: Python's documentation on `collections` is good reading for learners of the language.*

### `namedtuple`: tuples with named fields

Who likes this?

In [6]:
my_address = ("Ola Nordmann", "Noen vei 2", 9106, "Mjelde")
my_zipcode = my_address[2] # meh
print(my_address)

('Ola Nordmann', 'Noen vei 2', 9106, 'Mjelde')


That's right, nobody does. Therefore, `namedtuple`:

In [7]:
Address = collections.namedtuple("Address", ["name", "road", "zipcode", "town"])

my_address = Address("Ola Nordmann", "Noen vei 2", 9106, town = "Mjelde")
my_zipcode = my_address.zipcode # that's better
print(my_address)

Address(name='Ola Nordmann', road='Noen vei 2', zipcode=9106, town='Mjelde')


*namedtuple allows for more readable, self-documenting code.*
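
A `namedtuple` is still a tuple, so indexing, unpacking, and immutability all carry over. The sketch below reuses the `Address` type from above (the new road name is made up for illustration) and shows two helper methods, `_replace` and `_asdict`:

```python
import collections

Address = collections.namedtuple("Address", ["name", "road", "zipcode", "town"])
home = Address("Ola Nordmann", "Noen vei 2", 9106, "Mjelde")

# Still a regular tuple: indexing and unpacking work as before.
name, road, zipcode, town = home
same_zipcode = home[2] == home.zipcode

# Tuples are immutable, so _replace returns a modified copy.
moved = home._replace(road="En annen vei 7")  # hypothetical new address

# _asdict maps field names to values, which is handy for serialization.
as_dict = home._asdict()

print(moved.road)       # En annen vei 7
print(home.road)        # Noen vei 2 (the original is unchanged)
print(as_dict["town"])  # Mjelde
```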

### `defaultdict`: dict that calls a factory function for missing values

"factory function": a function whose purpose is to make some object and return it.

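In `defaultdict`: a function that takes no arguments and returns the default value for a missing key. In the example below, `int` is the factory: `int()` returns `0`, so every new key starts counting from zero.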

In [8]:
counts = collections.defaultdict(int)
s = "mississippi"
for c in s:
    counts[c] += 1
    
print(counts.items())

dict_items([('p', 2), ('m', 1), ('i', 4), ('s', 4)])


Compare this to what the same code would look like with a regular `dict`:

In [9]:
counts = {}
for c in s:
    if c in counts: # "if counts has a key c"
        counts[c] += 1
    else:
        counts[c] = 1

print(counts.items())

dict_items([('p', 2), ('m', 1), ('i', 4), ('s', 4)])
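
The factory does not have to be `int`. A common pattern is `defaultdict(list)` for grouping items under a key. The following is a minimal sketch with made-up sample words; it also shows one thing to watch out for: merely *reading* a missing key inserts it.

```python
import collections

words = ["apple", "avocado", "banana", "blueberry", "cherry"]  # hypothetical sample data

# list() returns an empty list, so every new key starts with [].
by_initial = collections.defaultdict(list)
for word in words:
    by_initial[word[0]].append(word)

groups = dict(by_initial)  # snapshot as a plain dict
print(groups["a"])  # ['apple', 'avocado']

# Gotcha: looking up a missing key with [] creates it.
had_z_before = "z" in by_initial  # False
_ = by_initial["z"]               # calls the factory and stores the result
has_z_after = "z" in by_initial   # True
```

If a lookup should not create the key, use `in` or `.get()` instead of indexing.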


### `Counter`

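A `dict` subclass that counts hashable objects, similar to bags or multisets in other languages. Looking up a key that was never counted returns 0 instead of raising a `KeyError`.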

In [10]:
c = collections.Counter('mississippi')
print(c)

Counter({'i': 4, 's': 4, 'p': 2, 'm': 1})


In [11]:
c = collections.Counter(["eggs", "ham"])
print(c)
print(c["bacon"])

Counter({'ham': 1, 'eggs': 1})
0


In [12]:
import random

c = collections.Counter(random.randint(1, 3) for _ in range(100000))
print(c)

Counter({1: 33587, 3: 33219, 2: 33194})
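
Beyond counting, `Counter` has a few multiset-style operations. This short sketch reuses the `'mississippi'` example to show `most_common` and counter arithmetic:

```python
import collections

c = collections.Counter("mississippi")

# The n most frequent elements as (element, count) pairs.
top_two = c.most_common(2)

# Total number of counted items.
total = sum(c.values())

# Addition sums counts; subtraction drops results that are zero or negative.
combined = c + collections.Counter("miss")
remaining = c - collections.Counter("miss")

print(top_two)
print(total)
print(remaining)
```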


### More `collections`

* `deque` - list-like container with fast appends and pops *on either end*
* `ChainMap` - single view on multiple mappings (no need to make a new mapping)
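* `OrderedDict` - dict that remembers the order in which entries were added (since Python 3.7 a regular `dict` does too, but `OrderedDict` adds reordering methods and order-sensitive equality)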
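
The three containers listed above can be sketched in a few lines each. All values and names here (`defaults`, `overrides`, and so on) are made up for illustration:

```python
import collections

# deque: append and pop on either end.
d = collections.deque([1, 2, 3])
d.appendleft(0)
d.append(4)
first = d.popleft()  # removes 0 from the left
last = d.pop()       # removes 4 from the right

# With maxlen, the oldest items fall off the other end.
recent = collections.deque(maxlen=3)
for i in range(5):
    recent.append(i)

# ChainMap: lookups search the mappings from left to right.
defaults = {"color": "blue", "size": "M"}  # hypothetical settings
overrides = {"color": "red"}
settings = collections.ChainMap(overrides, defaults)

# OrderedDict: entries can be reordered explicitly.
od = collections.OrderedDict([("a", 1), ("b", 2), ("c", 3)])
od.move_to_end("a")

print(list(d))                              # [1, 2, 3]
print(list(recent))                         # [2, 3, 4]
print(settings["color"], settings["size"])  # red M
print(list(od))                             # ['b', 'c', 'a']
```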