# Chapter 1
# Data Structures and Algorithms

## 1.1 Unpacking a Sequence into Separate Variables

You want to unpack an N-element tuple or sequence into a collection of N variables.

This can be done with a simple assignment operation. The only requirement is that the number of variables and their structure match the sequence.

In [6]:
p = (4, 5)
a, b = p
a, b

(4, 5)

In [9]:
data = 'damn you beauty'.split(' ')
word1, word2, word3 = data

stuff = ['some stuff happens', (2017, 7, 12)]
event, (year, month, day) = stuff
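If the number of targets does not match the number of items, Python raises a `ValueError`. A minimal sketch of that failure, reusing the tuple `p` from the first cell (the names `x`, `y`, `z` and `message` are only for illustration):

```python
p = (4, 5)

try:
    x, y, z = p  # three targets, but p holds only two values
except ValueError as exc:
    message = str(exc)

print(message)
```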

**NOTE**: Unpacking works with any iterable object, not just tuples or lists. This includes strings, files, iterators, and generators.

In [10]:
s = 'Hello'
a, b, c, d, e = s
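The same assignment syntax should work for the other iterable kinds mentioned in the note. The sketch below uses a made-up generator (`countdown`), a `range`, and an in-memory file object from `io.StringIO` standing in for a real file:

```python
import io

def countdown(n):
    """Yield n, n-1, ..., 1."""
    while n > 0:
        yield n
        n -= 1

three, two, one = countdown(3)   # generator
low, high = range(2)             # range object

# A file-like object is iterated line by line, newline included
first_line, second_line = io.StringIO('spam\neggs\n')

print(three, two, one)
print(low, high)
print(repr(first_line), repr(second_line))
```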

When unpacking, if you want to discard certain values, you can just pick a throwaway variable name for them, such as `_`:

In [13]:
_, date = stuff
date

(2017, 7, 12)



## 1.2 Unpacking elements from Iterables of Arbitrary Length

You need to unpack N elements from an iterable, but the iterable may be longer than N elements, causing a "too many values to unpack" exception.

Python "star expressions" can be used to address this problem. For example, suppose you run a course and decide at the end of the semester that you're going to drop the first and last homework grades and only average the rest of them. A star expression makes it easy:

In [15]:
def drop_first_last(grades):
    first, *middle, last = grades
    # Average the grades that remain after dropping the first and last
    return sum(middle) / len(middle)
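To try the function out, here is a self-contained sketch with made-up grades. It assumes "average" means the arithmetic mean and uses `statistics.mean` for it:

```python
from statistics import mean

def drop_first_last(grades):
    # Drop the first and last grade, keep everything in between
    first, *middle, last = grades
    return mean(middle)

grades = [55, 80, 90, 100, 40]    # made-up sample data
result = drop_first_last(grades)  # averages 80, 90 and 100
print(result)
```

The function needs at least three grades: with exactly two, `middle` is an empty list and `mean` raises `StatisticsError`; with fewer than two, the unpacking itself raises `ValueError`.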

Another use case: suppose you have user records that consist of a name and email address, followed by an arbitrary number of phone numbers. You could unpack the records like this:

In [18]:
record = ('Tu', 'tu@random.com', '773-555-1212', '267-694-9395')
name, email, *phone_numbers = record
name

'Tu'

In [19]:
phone_numbers

['773-555-1212', '267-694-9395']

**NOTE**: The `phone_numbers` variable will always be a list, regardless of how many phone numbers are unpacked (including none). Code that uses `phone_numbers` therefore never has to check whether it is actually a list.
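A quick check of the "including none" case, using a made-up record that has no phone numbers at all:

```python
record = ('Tu', 'tu@random.com')   # name and email only
name, email, *phone_numbers = record

print(phone_numbers)        # an empty list, not None
print(len(phone_numbers))
```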

The starred variable can also be the first one in the list.

In [22]:
*trailing, current = [10, 9, 8, 7, 6, 10, 23, 100]
print(trailing)
print(current)

[10, 9, 8, 7, 6, 10, 23]
100


The star syntax is especially useful when iterating over a sequence of tuples of varying length.

In [23]:
records = [
    ('foo', 1, 2),
    ('bar', 'hello'),
    ('foo', 3, 4),
]

def do_foo(x, y):
    print('foo', x, y)
    
def do_bar(s):
    print('bar', s)
    
for tag, *args in records:
    if tag == 'foo':
        do_foo(*args)
    elif tag == 'bar':
        do_bar(*args)

foo 1 2
bar hello
foo 3 4
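As the number of tags grows, the `if`/`elif` chain could be swapped for a lookup table. This is only one possible variation, not part of the recipe: the `handlers` dictionary is a hypothetical name, and the handlers return strings instead of printing so the result is easy to inspect.

```python
records = [
    ('foo', 1, 2),
    ('bar', 'hello'),
    ('foo', 3, 4),
]

def do_foo(x, y):
    return f'foo {x} {y}'

def do_bar(s):
    return f'bar {s}'

# Hypothetical dispatch table mapping each tag to its handler
handlers = {'foo': do_foo, 'bar': do_bar}

# tag receives the first field, args the remaining fields as a list
results = [handlers[tag](*args) for tag, *args in records]
print(results)
```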


Star unpacking can also be useful when combined with string processing:

In [24]:
line = 'nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false'
uname, *fields, homedir, sh = line.split(':')
print(uname)
print(homedir)
print(fields)
print(sh)

nobody
/var/empty
['*', '-2', '-2', 'Unprivileged User']
/usr/bin/false
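Star unpacking can also be combined with a throwaway name such as `_` when the extra values are not needed, including inside a nested pattern. A small sketch with a made-up record:

```python
record = ('ACME', 50, 123.45, (12, 18, 2012))   # made-up sample record

# Keep the first field and the last element of the nested tuple;
# everything else is collected into the throwaway list _
name, *_, (*_, year) = record

print(name, year)
```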


In [1]:
record = ('ACME', 50)



## 1.3 Keeping the Last N Items

You want to keep a limited history of the last few items seen during iteration or during some other kind of processing.

Keeping a limited history is a perfect use for a `collections.deque`. For example, the following code performs a simple text match on a sequence of lines and yields the matching line along with the previous N lines of context when found:

In [4]:
from collections import deque

def search(lines, pattern, history=5):
    previous_lines = deque(maxlen=history)
    for line in lines:
        if pattern in line:
            yield line, previous_lines
        previous_lines.append(line)

When writing code to search for items, it is common to use a generator function involving `yield`, as shown in this recipe's solution. This decouples the process of searching from the code that uses the results.
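A sketch of how a caller might consume the generator. It repeats the `search()` function from the recipe so that it runs standalone; the sample lines are made up. Because `search()` yields the live deque, the caller takes a snapshot with `list()` if it wants to keep the context after the loop moves on.

```python
from collections import deque

def search(lines, pattern, history=5):
    previous_lines = deque(maxlen=history)
    for line in lines:
        if pattern in line:
            yield line, previous_lines
        previous_lines.append(line)

lines = ['alpha', 'beta', 'python one', 'gamma', 'delta', 'python two']

matches = []
for line, previous in search(lines, 'python', history=2):
    # Copy the deque: it keeps changing as the generator advances
    matches.append((line, list(previous)))

for line, context in matches:
    print(context, '->', line)
```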

Using `deque(maxlen=N)` creates a fixed-size queue. When new items are added and the queue is full, the oldest item is automatically removed. For example:

In [7]:
q = deque(maxlen=3)
q.append(1)
q.append(2)
q.append(3)
print(q)
q.append(4)
print(q)

deque([1, 2, 3], maxlen=3)
deque([2, 3, 4], maxlen=3)
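The `deque` constructor also accepts an iterable as its first argument, so the last N items of any iterable can be kept in a single call. A minimal sketch (the name `last_three` is only for illustration):

```python
from collections import deque

# Feed ten items through a deque that can hold only three
last_three = deque(range(10), maxlen=3)

print(last_three)
```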


More generally, a `deque` can be used whenever you need a simple queue structure. If you don't give a maximum size, you get an unbounded queue that lets you append and pop items on either end.

In [10]:
q = deque()
q.append(1)
q.append(2)
q.append(3)
q

deque([1, 2, 3])

In [12]:
q.appendleft(4)
q

deque([4, 1, 2, 3])

In [13]:
q.pop()
q

deque([4, 1, 2])

In [14]:
q.popleft()
q

deque([1, 2])

Adding or popping items from either end of a queue has O(1) complexity. This is unlike a list where inserting or removing items from the front of the list is O(N).
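The difference can be observed with a rough timing sketch. The helper names and the size `N` are arbitrary choices, and the absolute numbers depend on the machine, so no expected output is shown; on most systems draining the list from the front should be noticeably slower.

```python
from collections import deque
from timeit import timeit

N = 20_000   # arbitrary size, large enough to make the gap visible

def drain_list():
    items = list(range(N))
    while items:
        items.pop(0)       # shifts every remaining element: O(N) per pop

def drain_deque():
    items = deque(range(N))
    while items:
        items.popleft()    # O(1) per pop

list_time = timeit(drain_list, number=1)
deque_time = timeit(drain_deque, number=1)

print(f'list.pop(0):     {list_time:.4f} s')
print(f'deque.popleft(): {deque_time:.4f} s')
```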



## 1.4 Finding the Largest or Smallest N Items

You want to make a list of the largest or smallest N items in a collection.

The `heapq` module has two functions - `nlargest()` and `nsmallest()` - that do exactly what you want:

In [18]:
import heapq

nums = [1, 8, 23, 2, 7, -4, 18, 23, 42, 37, 2]
print(heapq.nlargest(3, nums))
print(heapq.nsmallest(3, nums))

[42, 37, 23]
[-4, 1, 2]


Both functions also accept a `key` parameter that allows them to be used with more complicated data structures. For example:

In [20]:
portfolio = [
    {'name': 'IBM', 'shares': 100, 'price': 91.1},
    {'name': 'AAPL', 'shares': 50, 'price': 543.22},
    {'name': 'FB', 'shares': 200, 'price': 21.09},
    {'name': 'HPQ', 'shares': 35, 'price': 31.75},
    {'name': 'YHOO', 'shares': 45, 'price': 16.35},
    {'name': 'ACME', 'shares': 75, 'price': 115.65}
]

cheap = heapq.nsmallest(3, portfolio, key=lambda s: s['price'])
expensive = heapq.nlargest(3, portfolio, key=lambda s: s['price'])

In [23]:
print(cheap)
print(expensive)

[{'name': 'YHOO', 'shares': 45, 'price': 16.35}, {'name': 'FB', 'shares': 200, 'price': 21.09}, {'name': 'HPQ', 'shares': 35, 'price': 31.75}]
[{'name': 'AAPL', 'shares': 50, 'price': 543.22}, {'name': 'ACME', 'shares': 75, 'price': 115.65}, {'name': 'IBM', 'shares': 100, 'price': 91.1}]
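Any callable works as the `key`. As an alternative to a `lambda`, the sketch below uses `operator.itemgetter` from the standard library to rank a shortened copy of the portfolio by share count instead of price:

```python
import heapq
from operator import itemgetter

portfolio = [
    {'name': 'IBM', 'shares': 100, 'price': 91.1},
    {'name': 'AAPL', 'shares': 50, 'price': 543.22},
    {'name': 'FB', 'shares': 200, 'price': 21.09},
]

# itemgetter('shares') behaves like lambda s: s['shares']
most_shares = heapq.nlargest(2, portfolio, key=itemgetter('shares'))
names = [s['name'] for s in most_shares]
print(names)
```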


Underneath the covers, these functions work by first converting the data into a list where items are ordered as a heap. For example:

In [24]:
nums = [1, 8, 2, 23, 7, -4, 18, 23, 42, 37, 2]

heap = list(nums)
heapq.heapify(heap)
heap

[-4, 2, 1, 23, 7, 2, 18, 23, 42, 37, 8]
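Note that the heapified list is not fully sorted. It only satisfies the heap invariant: each item at index `k` is less than or equal to its children at indices `2*k + 1` and `2*k + 2`. A short check of that property on the same data (the name `is_heap` is only for illustration):

```python
import heapq

nums = [1, 8, 2, 23, 7, -4, 18, 23, 42, 37, 2]
heap = list(nums)
heapq.heapify(heap)

n = len(heap)
# Compare every parent with whichever of its two children exist
is_heap = all(
    heap[k] <= heap[child]
    for k in range(n)
    for child in (2 * k + 1, 2 * k + 2)
    if child < n
)

print(is_heap)
print(heap[0] == min(nums))
```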

The most important feature of a heap is that `heap[0]` is always the smallest item. Moreover, subsequent items can be easily found using the `heapq.heappop()` function, which pops off the first item and replaces it with the next smallest item (an operation that takes O(log N) time, where N is the size of the heap). For example, to find the three smallest items, you would do this:

In [25]:
heapq.heappop(heap)

-4

In [26]:
heapq.heappop(heap)

1

In [27]:
heapq.heappop(heap)

2
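The three separate cells above can be condensed into one loop. A self-contained sketch that rebuilds the heap from the same numbers:

```python
import heapq

nums = [1, 8, 2, 23, 7, -4, 18, 23, 42, 37, 2]
heap = list(nums)
heapq.heapify(heap)

# Each heappop() removes and returns the current smallest item
three_smallest = [heapq.heappop(heap) for _ in range(3)]

print(three_smallest)
print(len(heap))   # the popped items are gone from the heap
```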

The `nlargest()` and `nsmallest()` functions are most appropriate if you are trying to find a relatively small number of items. If you are simply trying to find the single smallest or largest item (N=1), it is faster to use `min()` and `max()`. Similarly, if N is about the same size as the collection itself, it is usually faster to sort it first and take a slice (i.e., use `sorted(items)[:N]` or `sorted(items)[-N:]`). It should be noted that the actual implementation of `nlargest()` and `nsmallest()` is adaptive in how it operates and will carry out some of these optimizations on your behalf (e.g., using sorting if N is close to the same size as the input).
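The alternatives mentioned above give the same items, as the following sketch checks on the earlier list of numbers. One detail to keep in mind: `sorted(items)[-N:]` returns the largest items in ascending order, whereas `nlargest()` returns them in descending order.

```python
import heapq

nums = [1, 8, 23, 2, 7, -4, 18, 23, 42, 37, 2]

# N = 1: min() and max() give the same answer as the heap functions
smallest = min(nums)
largest = max(nums)

# Sorting and slicing as an alternative when N is large
smallest_by_sort = sorted(nums)[:3]
largest_by_sort = sorted(nums)[-3:]          # ascending order
largest_by_heap = heapq.nlargest(3, nums)    # descending order

print(smallest, largest)
print(smallest_by_sort)
print(largest_by_sort, largest_by_heap)
```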

