Pythonic with the `collections` module
<br>
The collections module implements specialized container datatypes providing alternatives to Python’s general purpose built-in containers, dict, list, set, and tuple.<br>
<br>
`defaultdict`<br>
`namedtuple`<br>
`Counter`<br>
`deque`<br>

In [7]:
from collections import defaultdict, namedtuple, Counter, deque
import csv
import random
from urllib.request import urlretrieve

### 1. `namedtuple`

A `namedtuple` is a convenient way to define a class without methods. It lets you store `dict`-like objects whose fields you can access by attribute. Let's first look at a classic `tuple`:

In [5]:
users = ("Jack", "John")
print(f"Another name for {users[0]} is {users[1]}")

Another name for Jack is John


Let's contrast that with a `namedtuple`:

In [23]:
Person = namedtuple("Person", "name children")

In [25]:
Jack = Person("Jack",["Timmy", "Jimmy"])

In [26]:
Jack.children

['Timmy', 'Jimmy']
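
A few other things a `namedtuple` offers can be sketched as follows. This is a minimal example that reuses the `Person` type from above; `_replace` and `_asdict` are part of the standard `namedtuple` API, while the variable names (`jack`, `john`, `as_dict`, `rebind_failed`) are only illustrative.

```python
from collections import namedtuple

Person = namedtuple("Person", "name children")
jack = Person("Jack", ["Timmy", "Jimmy"])

# Fields are reachable by attribute name or by position.
assert jack.name == jack[0] == "Jack"

# A namedtuple still unpacks like a regular tuple.
name, children = jack

# Tuples are immutable: rebinding a field raises AttributeError.
try:
    jack.name = "John"
    rebind_failed = False
except AttributeError:
    rebind_failed = True

# _replace returns a new tuple with some fields changed.
john = jack._replace(name="John")

# _asdict converts to a mapping keyed by field name.
as_dict = jack._asdict()

print(john)
print(as_dict)
print(rebind_failed)
```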

In [32]:
# Python code to demonstrate namedtuple()

from collections import namedtuple

# Declaring namedtuple()
# (the names are Turkish: ogrenci = student, isim = name,
#  soyisim = surname, dtarihi = date of birth)
ogrenci = namedtuple('ogrenci', ['isim', 'soyisim', 'dtarihi'])

# Adding values (note: the second field, soyisim, holds the age here)
S = ogrenci('Nandini', '19', '2541997')

# Access using index
print("The Student age using index is : ", end="")
print(S[1])

# Access using name
print("The Student name using keyname is : ", end="")
print(S.isim)


The Student age using index is : 19
The Student name using keyname is : Nandini


In [33]:
# Python code to demonstrate namedtuple() and
# Access by name, index and getattr()

# importing "collections" for namedtuple()
import collections

# Declaring namedtuple()
Student = collections.namedtuple('Student',['name','age','DOB'])

# Adding values
S = Student('Nandini','19','2541997')

# Access using index
print ("The Student age using index is : ",end ="")
print (S[1])

# Access using name
print ("The Student name using keyname is : ",end ="")
print (S.name)

# Access using getattr()
print ("The Student DOB using getattr() is : ",end ="")
print (getattr(S,'DOB'))


The Student age using index is : 19
The Student name using keyname is : Nandini
The Student DOB using getattr() is : 2541997


Attribute access makes the final string much more informative and elegant (f-strings help too, of course; now you know why we use Python >= 3.6). Here `user` is a `namedtuple` with `name` and `role` fields:

In [8]:
f'{user.name} is a {user.role}'

'bob is a coder'
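
The `user` object used in that cell is not defined anywhere in this notebook. A minimal self-contained sketch, assuming a hypothetical `User` type whose field names are inferred from the f-string, reproduces the result and contrasts it with plain index access:

```python
from collections import namedtuple

# Hypothetical type; the field names are inferred from the f-string above.
User = namedtuple("User", "name role")
user = User(name="bob", role="coder")

# Index access works but hides what each position means.
by_index = f"{user[0]} is a {user[1]}"

# Attribute access documents itself.
by_name = f"{user.name} is a {user.role}"

print(by_name)
```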

Conclusion: use a `namedtuple` wherever you can! They are easy to implement and make your code more readable.

### 2. `defaultdict`

I guess you are all too familiar with `KeyError` when using a `dict`, no? 

In [9]:
users = {'bob': 'coder'}

In [10]:
users['bob']
users['julian']  # oops

KeyError: 'julian'

You can get around it with dict's get method:

In [11]:
users.get('bob')

'coder'

In [12]:
users.get('julian') is None

True

But what if you need to build up a collection? Let's make a dict from the following list of tuples:

In [13]:
challenges_done = [('mike', 10), ('julian', 7), ('bob', 5),
                   ('mike', 11), ('julian', 8), ('bob', 6)]
challenges_done

[('mike', 10),
 ('julian', 7),
 ('bob', 5),
 ('mike', 11),
 ('julian', 8),
 ('bob', 6)]

In [14]:
challenges = {}
for name, challenge in challenges_done:
    challenges[name].append(challenge)

KeyError: 'mike'

A plain `dict` raises a `KeyError` here because the key has no list to append to yet. A `defaultdict` fixes that: you pass it a factory (here `list`), and it calls that factory to create the value the first time a missing key is accessed:

In [15]:
challenges = defaultdict(list)
for name, challenge in challenges_done:
    challenges[name].append(challenge)

challenges

defaultdict(list, {'bob': [5, 6], 'julian': [7, 8], 'mike': [10, 11]})
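
For comparison, here is a sketch of three ways to group the same data: a manual membership check, `dict.setdefault`, and `defaultdict`. It also shows `defaultdict(int)` for running totals. The variable names `manual`, `with_setdefault`, `grouped`, and `totals` are only illustrative.

```python
from collections import defaultdict

challenges_done = [('mike', 10), ('julian', 7), ('bob', 5),
                   ('mike', 11), ('julian', 8), ('bob', 6)]

# 1. Plain dict with an explicit membership check.
manual = {}
for name, challenge in challenges_done:
    if name not in manual:
        manual[name] = []
    manual[name].append(challenge)

# 2. dict.setdefault inserts the default only when the key is missing.
with_setdefault = {}
for name, challenge in challenges_done:
    with_setdefault.setdefault(name, []).append(challenge)

# 3. defaultdict calls list() for every missing key.
grouped = defaultdict(list)
for name, challenge in challenges_done:
    grouped[name].append(challenge)

# defaultdict(int) starts missing keys at 0, handy for running totals.
totals = defaultdict(int)
for name, challenge in challenges_done:
    totals[name] += challenge

print(dict(grouped))
print(dict(totals))
```

One thing to keep in mind: merely reading a missing key on a `defaultdict`, e.g. `grouped['nobody']`, inserts that key with the default value.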

### 3. `Counter`

One of my favorites. Say you want to count the most common words in a text:

In [16]:
words = """Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been 
the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and 
scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into 
electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of
Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus
PageMaker including versions of Lorem Ipsum""".split()
words[:5]

['Lorem', 'Ipsum', 'is', 'simply', 'dummy']

Before getting to know `collections` I would have written something like this:

In [17]:
common_words = {}

for word in words:
    if word not in common_words:
        common_words[word] = 0
    common_words[word] += 1

# sort dict by values descending and slice first 5 to get most common
for k, v in sorted(common_words.items(),
                   key=lambda x: x[1],
                   reverse=True)[:5]:
    print(k, v)

the 6
Lorem 4
Ipsum 4
of 4
and 3


Now compare this to using `Counter` and its `most_common` method:

In [18]:
Counter(words).most_common(5)

[('the', 6), ('Lorem', 4), ('Ipsum', 4), ('of', 4), ('and', 3)]
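
A `Counter` behaves like a `dict` with a few extras. The sketch below uses a short made-up sentence (not the Lorem Ipsum text above) to show zero defaults for missing keys, `update`, and subtraction; the variable names are illustrative.

```python
from collections import Counter

# Made-up sample text, only for illustration.
text = "the cat and the hat and the bat"
counts = Counter(text.split())

# Missing keys return 0 instead of raising KeyError.
missing = counts["dog"]

# update() adds counts from another iterable.
counts.update(["cat", "cat"])

# Counters support arithmetic; subtraction drops non-positive results.
remaining = counts - Counter(the=3)

print(counts.most_common(2))
print(remaining)
```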

WOW!

When I discovered this my mind was blown, and it was a signal to me that I had to study Python's [awesome standard library](https://docs.python.org/3/library/index.html) more, because it has these Pythonic idioms you can use right out of the box. They will make your code shorter (which usually means fewer bugs) and more elegant!

Another aha moment was [`deque`](https://pybit.es/collections-deque.html) which we will cover next.

### 4. `deque`

> Deques are a generalization of stacks and queues (the name is pronounced “deck” and is short for “double-ended queue”). Deques support thread-safe, memory efficient appends and pops from either side of the deque with approximately the same O(1) performance in either direction. - [docs](https://docs.python.org/3.7/library/collections.html)

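The basic operations described in the quote can be sketched as follows. All methods used here (`append`, `appendleft`, `pop`, `popleft`, `rotate`, and the `maxlen` argument) are part of the standard `deque` API; the values are arbitrary.

```python
from collections import deque

d = deque([1, 2, 3])

d.append(4)          # add on the right  -> deque([1, 2, 3, 4])
d.appendleft(0)      # add on the left   -> deque([0, 1, 2, 3, 4])
right = d.pop()      # removes and returns the rightmost item
left = d.popleft()   # removes and returns the leftmost item

d.rotate(1)          # shift every item one step to the right

# With maxlen, old items fall off the opposite end automatically.
last_three = deque(maxlen=3)
for n in range(5):
    last_three.append(n)

print(d, last_three)
```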



Lists in Python are awesome, probably one of your go-to data structures, because they are so widely used and convenient. 

However, certain operations (delete, insert) can get slow when your `list` grows - see [TimeComplexity](https://wiki.python.org/moin/TimeComplexity) for more details. 

If you need to add/remove at both ends of a collection, consider using a `deque`. Let's show this with a practical example using the `timeit` module to measure performance:

First we create two sequences of 10 million integers with `range`, storing one in a `list` and the other in a `deque`:

In [19]:
lst = list(range(10000000))
deq = deque(range(10000000))

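Let's remove and re-insert some values near the front of each sequence. A `list` is slow at this because every removal or insertion has to shift all the elements that follow ([Grokking Algorithms](https://pybit.es/grokking_algorithms.html) explains this really well). A `deque` can work from its nearest end, so here it is a big win. Keep in mind that inserting or removing in the middle of a long `deque` is still O(n).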

In [20]:
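def insert_and_delete(ds):
    # repeat 10 times: pick a value in 0..99, remove it, then put it back
    for _ in range(10):
        index = random.choice(range(100))
        ds.remove(index)          # removes the first item equal to index
        ds.insert(index, index)   # re-inserts the value at position index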

%timeit insert_and_delete(lst)

%timeit insert_and_delete(deq)

447 ms ± 45.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
83.7 µs ± 13.7 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
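
Outside a notebook, the same kind of comparison can be run with the `timeit` module directly. The sketch below is not the benchmark above: it uses a much smaller, arbitrary size (`N = 100_000`) so it finishes quickly, and it measures removal from the left end, where the difference tends to be most pronounced. The function names are hypothetical and absolute timings depend on the machine.

```python
import timeit
from collections import deque

N = 100_000  # arbitrary size, far smaller than the 10 million above

def drain_list_left(n=N):
    """Pop 1,000 items from the front of a list; each pop shifts the rest."""
    lst = list(range(n))
    for _ in range(1000):
        lst.pop(0)
    return len(lst)

def drain_deque_left(n=N):
    """Pop 1,000 items from the front of a deque; each pop is O(1)."""
    deq = deque(range(n))
    for _ in range(1000):
        deq.popleft()
    return len(deq)

# Construction time is included in both measurements.
list_time = timeit.timeit(drain_list_left, number=5)
deque_time = timeit.timeit(drain_deque_left, number=5)
print(f"list:  {list_time:.4f} s")
print(f"deque: {deque_time:.4f} s")
```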


So when performance matters you really want to explore the alternative data structures in the `collections` module. Another example of a performance win is `ChainMap`:

> A ChainMap groups multiple dicts or other mappings together to create a single, updateable view. If no maps are specified, a single empty dictionary is provided so that a new chain always has at least one mapping. - [docs](https://docs.python.org/3.7/library/collections.html#collections.ChainMap)

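A common use of `ChainMap` is layered configuration. The sketch below assumes three hypothetical layers (`cli_args`, `env_vars`, `defaults`) with invented keys and values; it illustrates the lookup order and the "updateable view" mentioned in the quote.

```python
from collections import ChainMap

# Hypothetical configuration layers, highest priority first.
cli_args = {"debug": True}
env_vars = {"debug": False, "port": 8080}
defaults = {"debug": False, "port": 8000, "host": "localhost"}

config = ChainMap(cli_args, env_vars, defaults)

# Lookups search the maps left to right and return the first hit.
debug = config["debug"]   # found in cli_args
port = config["port"]     # found in env_vars
host = config["host"]     # found in defaults

# Writes only touch the first mapping; the others stay unchanged.
config["port"] = 9000

# The view is live: changes to an underlying dict show up immediately.
defaults["timeout"] = 30

print(dict(config))
```

No data is copied when the chain is created, which is why it can be cheaper than merging the dicts into a new one.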
## Second day: practice using movie data

For [Code Challenge 13 - Highest Rated Movie Directors](https://pybit.es/codechallenge13.html) we used a nice movie data set. Let's import it here to see some of our newly learned `collections` data types in action.

Let's make a `defaultdict` of movies per director (keys = directors, values = list of movies). 

In [21]:
movie_data = 'https://raw.githubusercontent.com/pybites/challenges/solutions/13/movie_metadata.csv'
movies_csv = 'movies.csv'
urlretrieve(movie_data, movies_csv)

('movies.csv', <http.client.HTTPMessage at 0x10fee00f0>)

A `namedtuple` is ideal here to describe a movie so we can access movie attributes (e.g. `movie.score`):

In [22]:
Movie = namedtuple('Movie', 'title year score')

We need some CSV parsing here as well, but no worries, we've got you covered:

In [23]:
def get_movies_by_director(data=movies_csv):
    """Extracts all movies from csv and stores them in a dictionary
       where keys are directors and values are lists of movies (named tuples)"""
    directors = defaultdict(list)
    with open(data, encoding='utf-8') as f:
        for line in csv.DictReader(f):
            try:
                director = line['director_name']
                movie = line['movie_title'].replace('\xa0', '')
                year = int(line['title_year'])
                score = float(line['imdb_score'])
            except ValueError:
                continue

            m = Movie(title=movie, year=year, score=score)
            directors[director].append(m)

    return directors
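
To see the `try`/`except ValueError` filter in isolation, here is a sketch that runs the same parsing logic on a tiny in-memory CSV. The three rows are invented, and `parse_movies` is a hypothetical helper that accepts any file-like object; only the column names come from the function above.

```python
import csv
import io
from collections import defaultdict, namedtuple

Movie = namedtuple('Movie', 'title year score')

# Invented sample rows; the second one has an empty title_year.
SAMPLE = """director_name,movie_title,title_year,imdb_score
Jane Doe,First Film\xa0,2001,7.5
Jane Doe,Undated Film\xa0,,6.0
John Roe,Other Film\xa0,1999,8.1
"""

def parse_movies(fileobj):
    """Group Movie tuples by director, skipping rows that fail conversion."""
    directors = defaultdict(list)
    for line in csv.DictReader(fileobj):
        try:
            year = int(line['title_year'])      # int('') raises ValueError
            score = float(line['imdb_score'])
        except ValueError:
            continue
        title = line['movie_title'].replace('\xa0', '')
        directors[line['director_name']].append(Movie(title, year, score))
    return directors

sample_directors = parse_movies(io.StringIO(SAMPLE))
print(dict(sample_directors))
```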

In [24]:
directors = get_movies_by_director()

Looking up Christopher Nolan we get all his movies nicely stored in `Movie` `namedtuple` objects.

In [25]:
directors['Christopher Nolan']

[Movie(title='The Dark Knight Rises', year=2012, score=8.5),
 Movie(title='The Dark Knight', year=2008, score=9.0),
 Movie(title='Interstellar', year=2014, score=8.6),
 Movie(title='Inception', year=2010, score=8.8),
 Movie(title='Batman Begins', year=2005, score=8.3),
 Movie(title='Insomnia', year=2002, score=7.2),
 Movie(title='The Prestige', year=2006, score=8.5),
 Movie(title='Memento', year=2000, score=8.5)]

You can do a lot with this data set and [we challenge you to do so](https://pybit.es/codechallenge13.html), but for this example let's just get the 5 directors with the most movies. 

Of course we don't want to re-invent the wheel so we use `Counter`:

In [26]:
cnt = Counter()
for director, movies in directors.items():
    cnt[director] += len(movies)

cnt.most_common(5)

[('Steven Spielberg', 26),
 ('Woody Allen', 22),
 ('Martin Scorsese', 20),
 ('Clint Eastwood', 20),
 ('Ridley Scott', 17)]
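
The same count can also be built in a single expression, and a `defaultdict` of lists lends itself to other aggregates too. A sketch on a small invented `directors` mapping (the names and scores are placeholders, not values from the data set):

```python
from collections import Counter, defaultdict, namedtuple

Movie = namedtuple('Movie', 'title year score')

# Placeholder data standing in for the result of get_movies_by_director().
directors = defaultdict(list)
directors['Director A'] += [Movie('A1', 2000, 8.0), Movie('A2', 2004, 7.0)]
directors['Director B'] += [Movie('B1', 2010, 6.5)]

# Counter accepts a mapping of key -> count directly.
cnt = Counter({director: len(movies) for director, movies in directors.items()})

# Average score per director, rounded to one decimal.
averages = {
    director: round(sum(m.score for m in movies) / len(movies), 1)
    for director, movies in directors.items()
}

print(cnt.most_common(1))
print(averages)
```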

## Third day: more practice on your own data

We challenge you to find your own data set and try to use the new `collections` data structures yourself. 

Stuck at finding examples? We used `collections` quite a bit for our own [100 Days of Code](https://github.com/pybites/100DaysOfCode/blob/master/LOG.md):

```
$ python module_index.py | grep collections
collections        | stdlib | 001, 021, 023, 034, 036, 042, 045, 055, 057, 063, 076, 084, 086, 095, 096
```

### Time to share what you've accomplished!

Be sure to share your last couple of days' work on Twitter or Facebook. Use the hashtag **#100DaysOfCode**. 

Here are [some examples](https://twitter.com/search?q=%23100DaysOfCode) to inspire you. Consider including [@talkpython](https://twitter.com/talkpython) and [@pybites](https://twitter.com/pybites) in your tweets.

*See a mistake in these instructions? Please [submit a new issue](https://github.com/talkpython/100daysofcode-with-python-course/issues) or fix it and [submit a PR](https://github.com/talkpython/100daysofcode-with-python-course/pulls).*