# Python Heap #

The `heapq` module provides an implementation of the heap queue algorithm, also known as the priority queue algorithm.

To create a heap, use a list initialized to [], or you can transform a populated list into a heap via function heapify().

The following functions are provided:

- `heapq.heappush(heap, item)`

    Push the value item onto the heap, maintaining the heap invariant.
    

- `heapq.heappop(heap)`

    Pop and return the smallest item from the heap, maintaining the heap invariant. If the heap is empty, IndexError is raised.
    

- `heapq.heappushpop(heap, item)`

    Push item on the heap, then pop and return the smallest item from the heap. The combined action runs more efficiently than heappush() followed by a separate call to heappop().
    

- `heapq.heapify(x)`

    Transform list x into a heap, in-place, in linear time.


- `heapq.heapreplace(heap, item)`

    Pop and return the smallest item from the heap, and also push the new item. The heap size doesn’t change. If the heap is empty, IndexError is raised. This is more efficient than heappop() followed by heappush(), and can be more appropriate when using a fixed-size heap. Note that the value returned may be larger than item! That constrains reasonable uses of this routine unless written as part of a conditional replacement:

    if item > heap[0]: item = heapreplace(heap, item)
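
The conditional replacement above is the core of a fixed-size heap. The following is a minimal sketch, assuming a hypothetical helper `keep_k_largest` (not part of `heapq`) that tracks the k largest values of a stream. It also contrasts `heappushpop()` with `heapreplace()` on a small heap.

```python
import heapq

def keep_k_largest(stream, k):
    """Hypothetical helper: return the k largest values of stream, largest first."""
    if k <= 0:
        return []
    heap = []  # min-heap holding at most k items; heap[0] is the smallest kept value
    for item in stream:
        if len(heap) < k:
            heapq.heappush(heap, item)
        elif item > heap[0]:
            # Conditional replacement: drop the smallest kept value, add the new one.
            heapq.heapreplace(heap, item)
    return sorted(heap, reverse=True)

print(keep_k_largest([5, 1, 9, 3, 7, 8], 3))  # [9, 8, 7]

# heappushpop() pushes first, so it never returns a value larger than item;
# heapreplace() pops first, so it can.
heap = [2, 4, 6]  # already satisfies the heap invariant
print(heapq.heappushpop(heap, 1))  # 1
print(heapq.heapreplace(heap, 1))  # 2
```

Because the heap never grows beyond k items, each update costs O(log k) regardless of how long the stream is.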

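There are two ways to create a heap:

- `heappush()`: create an empty list as the heap container, then push the values onto it one by one
- `heapify()`: transform an existing list into a heap in place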
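
Both approaches can be compared side by side. This is a sketch only: `is_min_heap` is a hypothetical helper written for illustration, and the two resulting lists may be laid out differently even though each satisfies the heap invariant.

```python
import heapq

def is_min_heap(items):
    """Hypothetical helper: True if every element is >= its parent at index (i - 1) // 2."""
    return all(items[(i - 1) // 2] <= items[i] for i in range(1, len(items)))

data = [1, 3, 5, 7, 9, 2, 4, 6, 8, 0]

# Way 1: start from an empty list and push one value at a time, O(n log n) overall.
pushed = []
for value in data:
    heapq.heappush(pushed, value)

# Way 2: copy the list and heapify it in place, in linear time.
heapified = list(data)
heapq.heapify(heapified)

print(is_min_heap(pushed), is_min_heap(heapified))  # True True
print(pushed[0], heapified[0])                      # 0 0 (the smallest item is always at index 0)
```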

### Example

#### heappush: adding elements to a heap

In [2]:
from heapq import heappush, heappop

In [2]:
heap = []
data = [1, 3, 5, 7, 9, 2, 4, 6, 8, 0]
for item in data:
    heappush(heap, item)

ordered = []
while heap:
    ordered.append(heappop(heap))

ordered
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]



In [3]:
data.sort()
data == ordered

True

#### heapify: turning a list into a heap in place

In [3]:
import heapq

data = [1,5,3,2,8,5]
heapq.heapify(data)
data

while data:
    print(data)
    print(heappop(data))


[1, 2, 3, 5, 8, 5]
1
[2, 5, 3, 5, 8]
2
[3, 5, 8, 5]
3
[5, 5, 8]
5
[5, 8]
5
[8]
8


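### Tuples as heap items

Tuples are compared element by element, so by default the first element decides the order and later elements only break ties.

A priority queue can therefore store `(priority, value)` tuples. Pushing them onto a heap keeps the entry with the smallest priority at the front: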

In [4]:
heap = []
data = [(1, 'J'), (4, 'N'), (3, 'H'), (2, 'O')]  # by default, the first element of each tuple decides the order
for item in data:
    heappush(heap, item)

while heap:
    item = heappop(heap) 
    print(item[0], ": ", item[1])

1 :  J
2 :  O
3 :  H
4 :  N
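
When two tuples share the same first element, comparison falls through to the second element, which fails if those values cannot be compared (two dicts, for example). A common workaround, sketched here with made-up task data, is to add an increasing counter as a tie-breaker:

```python
import heapq
import itertools

counter = itertools.count()  # 0, 1, 2, ... records insertion order
heap = []

# Illustrative data: two tasks share priority 2, and dicts do not support "<".
tasks = [(2, {'job': 'write'}), (1, {'job': 'plan'}), (2, {'job': 'review'})]
for priority, task in tasks:
    # (priority, count, task): the unique count settles ties before the dicts are compared
    heapq.heappush(heap, (priority, next(counter), task))

order = []
while heap:
    priority, _, task = heapq.heappop(heap)
    order.append(task['job'])

print(order)  # ['plan', 'write', 'review']
```

Entries with equal priority come out in the order they were pushed.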


### More Functions

The module also offers three general purpose functions based on heaps.

- `heapq.merge(*iterables)`

    Merge multiple sorted inputs into a single sorted output (for example, merge timestamped entries from multiple log files). Returns an iterator over the sorted values.

    Similar to sorted(itertools.chain(*iterables)) but returns an iterable, does not pull the data into memory all at once, and assumes that each of the input streams is already sorted (smallest to largest).


- `heapq.nlargest(n, iterable[, key])`: find the n largest elements

    Return a list with the n largest elements from the dataset defined by iterable. key, if provided, specifies a function of one argument that is used to extract a comparison key from each element in the iterable (for example, `key=str.lower`). Equivalent to: `sorted(iterable, key=key, reverse=True)[:n]`


- `heapq.nsmallest(n, iterable[, key])`: find the n smallest elements

    Return a list with the n smallest elements from the dataset defined by iterable. key, if provided, specifies a function of one argument that is used to extract a comparison key from each element in the iterable (for example, `key=str.lower`). Equivalent to: `sorted(iterable, key=key)[:n]`

The latter two functions perform best for smaller values of n. For larger values, it is more efficient to use the sorted() function. Also, when n==1, it is more efficient to use the builtin min() and max() functions.
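
As a small illustration of `heapq.merge()` described above, the sketch below merges two already-sorted inputs. The log entries are made up for the example. The `reverse` keyword argument used at the end is available in Python 3.5 and later.

```python
import heapq

# Each input must already be sorted (here by timestamp, the first tuple element).
log_a = [(1, 'a: start'), (4, 'a: stop')]
log_b = [(2, 'b: start'), (3, 'b: stop')]

# merge() returns a lazy iterator; list() consumes it so the result can be printed.
merged = list(heapq.merge(log_a, log_b))
print(merged)
# [(1, 'a: start'), (2, 'b: start'), (3, 'b: stop'), (4, 'a: stop')]

# With reverse=True, the inputs must be sorted from largest to smallest.
descending = list(heapq.merge([5, 3, 1], [6, 4, 2], reverse=True))
print(descending)  # [6, 5, 4, 3, 2, 1]
```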

#### nlargest

In [7]:
import heapq
li1 = [6, 7, 9, 4, 3, 5, 8, 10, 1]
heapq.heapify(li1)
print("The 3 largest numbers in list are : ",end="")
print(heapq.nlargest(3, li1))

The 3 largest numbers in list are : [10, 9, 8]


#### nsmallest

In [8]:
print("The 3 smallest numbers in list are : ",end="")
print(heapq.nsmallest(3, li1))

The 3 smallest numbers in list are : [1, 3, 4]
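
The equivalences stated earlier can be checked directly on the same list. This is only a quick sanity check and makes no claim about performance:

```python
import heapq

li1 = [6, 7, 9, 4, 3, 5, 8, 10, 1]

# nlargest / nsmallest return the same results as sorting and slicing.
same_largest = heapq.nlargest(3, li1) == sorted(li1, reverse=True)[:3]
same_smallest = heapq.nsmallest(3, li1) == sorted(li1)[:3]
print(same_largest, same_smallest)  # True True

# For n == 1, the builtin max() and min() give the same values more directly.
print(heapq.nlargest(1, li1)[0] == max(li1))   # True
print(heapq.nsmallest(1, li1)[0] == min(li1))  # True
```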


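#### Dictionaries as heap items

Python lets you put many kinds of objects into a heap, but <u>**the items placed in a heap must be comparable**</u> (that is, they must define an ordering, or a comparison key must be supplied)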

In [15]:
portfolio = [
    {'name': 'IBM', 'shares': 100, 'price': 91.1},
    {'name': 'AAPL', 'shares': 50, 'price': 543.22},
    {'name': 'FB', 'shares': 200, 'price': 21.09},
    {'name': 'HPQ', 'shares': 35, 'price': 31.75},
    {'name': 'YHOO', 'shares': 45, 'price': 16.35},
    {'name': 'ACME', 'shares': 75, 'price': 115.65}
]

cheap = heapq.nsmallest(3, portfolio)
cheap  # raises an error because no sort key was given (TypeError: '<' not supported between instances of 'dict' and 'dict')

In [13]:
# specify the sort key
cheap = heapq.nsmallest(3, portfolio, key=lambda s: s['price'])
cheap

[{'name': 'YHOO', 'shares': 45, 'price': 16.35},
 {'name': 'FB', 'shares': 200, 'price': 21.09},
 {'name': 'HPQ', 'shares': 35, 'price': 31.75}]

In [28]:
expensive = heapq.nlargest(3, portfolio, key=lambda s: s['price'])
expensive

[{'name': 'AAPL', 'price': 543.22, 'shares': 50},
 {'name': 'ACME', 'price': 115.65, 'shares': 75},
 {'name': 'IBM', 'price': 91.1, 'shares': 100}]
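
`nsmallest()` and `nlargest()` accept a `key`, but `heappush()` and `heapify()` do not. One way to keep dicts in a heap anyway, sketched here on a shortened copy of the portfolio, is to wrap each dict in a `(price, index, dict)` tuple:

```python
import heapq

portfolio = [
    {'name': 'IBM', 'shares': 100, 'price': 91.1},
    {'name': 'FB', 'shares': 200, 'price': 21.09},
    {'name': 'YHOO', 'shares': 45, 'price': 16.35},
]

# price orders the heap; the index breaks ties so the dicts themselves are never compared
heap = [(stock['price'], index, stock) for index, stock in enumerate(portfolio)]
heapq.heapify(heap)

names = []
while heap:
    price, _, stock = heapq.heappop(heap)
    names.append(stock['name'])

print(names)  # ['YHOO', 'FB', 'IBM'] (cheapest first)
```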

### Class Objects as heap items

Python is dynamically typed, so a heap can hold any kind of object, just as we stored `(priority, thing)` tuples in the previous section.

We can also store instances of our own classes, as long as they can be compared. In Python 3 this means defining `__lt__()`; `__cmp__()` is only consulted in Python 2:

In [20]:
# Python 3 uses __lt__ for ordering;
# __cmp__ is only consulted in Python 2

class Skill(object):

    def __init__(self, priority, description):
        self.priority = priority
        self.description = description
        print('New Level:', description)

    def __cmp__(self, other):
        # Python 2 only: cmp() does not exist in Python 3, where this method is never called
        return cmp(self.priority, other.priority)

    def __lt__(self, other):
        # heapq compares items with <, so order instances by priority
        return self.priority < other.priority

    def __repr__(self):
        return str(self.priority) + ": " + self.description

In [23]:
s1 = Skill(5, 'Proficient')
s2 = Skill(10, 'Expert')
s3 = Skill(1, 'Novice')

l = [s1, s2, s3]
heapq.heapify(l)
print("The 3 largest numbers in list are : ", end="")
print(heapq.nlargest(3, l))

while l:
    item = heappop(l) 
    print(item)

New Level: Proficient
New Level: Expert
New Level: Novice
The 3 largest numbers in list are : [10: Expert, 5: Proficient, 1: Novice]
1: Novice
5: Proficient
10: Expert
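
An alternative to writing `__lt__()` by hand is to let `dataclasses` generate the comparison methods. The sketch below assumes Python 3.7 or later. `PrioritizedSkill` is a hypothetical class name chosen for this example, and `field(compare=False)` excludes the description from comparisons so that only the priority decides the order.

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class PrioritizedSkill:
    priority: int
    # Excluded from comparisons: only priority is used for ordering.
    description: str = field(compare=False)

skills = [
    PrioritizedSkill(5, 'Proficient'),
    PrioritizedSkill(10, 'Expert'),
    PrioritizedSkill(1, 'Novice'),
]
heapq.heapify(skills)

popped = []
while skills:
    popped.append(heapq.heappop(skills).description)

print(popped)  # ['Novice', 'Proficient', 'Expert']
```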


### Exercise

#### Take the k largest elements of each column of an array

In [None]:
import numpy as np
import heapq

x = np.array(
    [
        [1, 2, 3, 4, 5, 6],
        [2, 3, 5, 7, 8, 1], 
        [7, 9, 6, 6, 3, 2], 
        [8, 9, 0, 1, 4, 7]
    ], 
    np.int32
)

In [30]:
# Solution 1: keep a min-heap of size 2 per column
cols = x.shape[1]
print(cols)

for col in range(cols):
    y = x[:, col]
    h = []
    for e in y:
        heapq.heappush(h, e)
        if len(h) > 2:
            heapq.heappop(h)
    print(h)

np.sort(x, axis=0)[-2:]

6
[7, 8]
[9, 9]
[5, 6]
[6, 7]
[5, 8]
[6, 7]


array([[7, 9, 5, 6, 5, 6],
       [8, 9, 6, 7, 8, 7]])

In [31]:
# Solution 2: nlargest on each column (each row of x.T)
[heapq.nlargest(2, col) for col in x.T]

[[8, 7], [9, 9], [6, 5], [7, 6], [8, 5], [7, 6]]
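
The same idea works without NumPy and for any k. This is a sketch under the assumption that the data is a plain list of equal-length rows. `top_k_per_column` is a hypothetical helper name, and `zip(*rows)` is used to iterate over the columns.

```python
import heapq

def top_k_per_column(rows, k):
    """Hypothetical helper: the k largest values of each column, largest first."""
    # zip(*rows) transposes the rows, yielding one tuple per column.
    return [heapq.nlargest(k, column) for column in zip(*rows)]

rows = [
    [1, 2, 3, 4, 5, 6],
    [2, 3, 5, 7, 8, 1],
    [7, 9, 6, 6, 3, 2],
    [8, 9, 0, 1, 4, 7],
]

print(top_k_per_column(rows, 2))
# [[8, 7], [9, 9], [6, 5], [7, 6], [8, 5], [7, 6]]
```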