# Elements of Programming Interviews
## Heaps
### Track 7: 11.1, 11.3, 11.4, 11.5, 11.7

### 11.1 - Merge Sorted Files

Write a program that takes as input a set of sorted sequences and computes the union of these sequences as a sorted sequence. 

For example, if the input is $[3, 5, 7], [0, 6], [0, 6, 28]$, then the output is $[0, 0, 3, 5, 6, 6, 7, 28]$.

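>The brute-force approach keeps a tracker index into each sequence, all starting at 0. At each step, find the minimum among the elements the trackers point to, append it to the output, and advance that sequence's tracker. With $k$ sequences and $n$ elements in total, each step scans $k$ candidates, so the merge costs $O(nk)$. Keeping the current front element of every sequence in a min-heap reduces each step to $O(\log k)$, for $O(n \log k)$ overall.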
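The tracker-index approach can be sketched as follows. `merge_brute_force` is a hypothetical name, and the sketch assumes the inputs are indexable lists. It rescans every tracker on each step, which is exactly the $O(k)$ per-element cost that the heap removes.

```python
def merge_brute_force(*sequences):
    """Merge sorted lists by scanning every tracker for the minimum (O(nk))."""
    trackers = [0] * len(sequences)  # next unread index in each sequence
    res = []
    while True:
        best = None  # index of the sequence whose current element is smallest
        for seq_num, seq in enumerate(sequences):
            idx = trackers[seq_num]
            if idx < len(seq) and (
                best is None or seq[idx] < sequences[best][trackers[best]]
            ):
                best = seq_num
        if best is None:  # every tracker has run off the end of its sequence
            return res
        res.append(sequences[best][trackers[best]])
        trackers[best] += 1


print(merge_brute_force([3, 5, 7], [0, 6], [0, 6, 28]))  # [0, 0, 3, 5, 6, 6, 7, 28]
```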


```python
import heapq
```

```python
def add_to_heap(heap, seq_num, it):
    """Push the next value from iterator `it`, tagged with its sequence number."""
    try:
        heapq.heappush(heap, (next(it), seq_num))
    except StopIteration:
        pass  # this sequence is exhausted


def merge_k_sorted_lists(*sequences):
    iters = list(map(iter, sequences))
    heap = []
    res = []
    # Seed the heap with the first value of each sequence.
    # Each heap entry is (value, index of the sequence it came from).
    for seq_num, it in enumerate(iters):
        add_to_heap(heap, seq_num, it)
    while heap:
        # Pop the global minimum, then refill from the same sequence.
        min_val, seq_num = heapq.heappop(heap)
        add_to_heap(heap, seq_num, iters[seq_num])
        res.append(min_val)
    return res
```

```python
merge_k_sorted_lists([0, 2, 4, 6], [1, 7], [3, 5, 8])
```

```
[0, 1, 2, 3, 4, 5, 6, 7, 8]
```
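For comparison, Python's standard library ships this technique as `heapq.merge`. It returns a lazy iterator rather than a list, so it can merge very large sorted inputs (for example, iterators over sorted files) without loading them fully into memory. A minimal usage sketch on the problem's example:

```python
import heapq

# heapq.merge lazily merges already-sorted iterables into one sorted iterator;
# list() forces it here only so the result can be printed.
merged = list(heapq.merge([3, 5, 7], [0, 6], [0, 6, 28]))
print(merged)  # [0, 0, 3, 5, 6, 6, 7, 28]
```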

### 11.3 - Sort An Almost Sorted Array
>Write a program which takes as input a very long sequence of numbers. Each number is at most $k$ away from its correctly sorted position. For example, no number in the sequence $[3, -1, 2, 6, 4, 5, 8]$ is more than 2 away from its final sorted position. Design an algorithm that outputs the numbers in the correct order.

Here $k$ is part of the input; inferring it from the data would mean reading the entire sequence first, which defeats the purpose for a very long input. The key insight is that the smallest of the first $k + 1$ elements must be the first element of the sorted output, since the true minimum sits at most $k$ positions from index 0. So keep a rolling min-heap of $k + 1$ elements: take the minimum as the next output value, push the next unread element, and drain the heap once the input is exhausted. This takes $O(n \log k)$ time and $O(k)$ extra space.

```python
def sort_almost_sorted(arr, k):
    """Sort arr in place, given that each element is at most k
    positions away from its sorted position."""
    heap = []
    # Seed the heap with the first k + 1 values (fewer if arr is short).
    for elem in arr[:k + 1]:
        heapq.heappush(heap, elem)
    write = 0  # next output position in arr
    # arr[k + 1:] is a copy, so overwriting arr[write] below is safe.
    for elem in arr[k + 1:]:
        # Push the next value and pop the heap minimum,
        # which is the next value in sorted order.
        arr[write] = heapq.heappushpop(heap, elem)
        write += 1
    # Drain the heap; successive pops come out in sorted order.
    while heap:
        arr[write] = heapq.heappop(heap)
        write += 1
    return arr
```

```python
almost_sorted = [3, -1, 2, 6, 4, 5, 8]
sort_almost_sorted(almost_sorted, 2), sort_almost_sorted([5, 10, 7, 8, 9], 3)
```

```
([-1, 2, 3, 4, 5, 6, 8], [5, 7, 8, 9, 10])
```
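Because the input is described as *very long*, it may not fit in a list at all. A minimal generator sketch, assuming the numbers arrive as any iterable: `sort_k_sorted_stream` (a hypothetical name) holds only $k + 1$ values at a time and yields each value as soon as it is known to be next in sorted order.

```python
import heapq
from itertools import islice


def sort_k_sorted_stream(stream, k):
    """Yield the values of a k-sorted iterable in sorted order,
    keeping at most k + 1 of them in memory."""
    it = iter(stream)
    # Buffer the first k + 1 values in a min-heap.
    heap = list(islice(it, k + 1))
    heapq.heapify(heap)
    for elem in it:
        # Push the new value and pop the smallest candidate; no later
        # value can belong before it in the sorted output.
        yield heapq.heappushpop(heap, elem)
    # Input exhausted: drain the heap in sorted order.
    while heap:
        yield heapq.heappop(heap)
```

For example, `list(sort_k_sorted_stream(iter([3, -1, 2, 6, 4, 5, 8]), 2))` gives `[-1, 2, 3, 4, 5, 6, 8]`. The output can be written out incrementally instead of being collected into a list.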

### 11.4 - Compute The K Closest Stars
>Consider a coordinate system for the Milky Way, in which Earth is at $(0, 0, 0)$. Model stars as points, and assume distances are in light years. The Milky Way consists of approximately $10^{12}$ stars, and their coordinates are stored in a file.

>How would you compute the $k$ stars which are closest to Earth?

The file is assumed to hold one star per line, as comma-separated $x, y, z$ coordinates:

```
34.968398,-6.487265, 5.126452
34.969448,-6.488250, 5.763424
34.967364,-6.492370, 5.174235
34.965735,-6.582322, 5.108906
```


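Keep a max-heap of the $k$ closest stars seen so far: push each new star and, once the heap holds more than $k$ entries, evict the farthest one. This is a single pass over the file in $O(n \log k)$ time with $O(k)$ memory, so the $10^{12}$ stars never need to be held at once.

```python
# Python's heapq only provides a min-heap, so emulate a max-heap
# by negating each distance before pushing it.
from heapq import heappush, heappop
from operator import neg


def get_distance(x, y, z):
    # Squared distance from Earth; skipping the square root
    # preserves the ordering and saves work.
    return x*x + y*y + z*z


def find_k_closest_stars(file_path, k):
    heap = []  # max-heap (via negation) of the k closest stars so far
    with open(file_path, 'r') as f:
        # Lazily parse [x, y, z] for every non-blank line in f.
        coord_gen = ([float(val) for val in line.split(',')]
                     for line in f if line.strip())
        for coord in coord_gen:
            distance = get_distance(coord[0], coord[1], coord[2])
            heappush(heap, (neg(distance), coord))
            # Evict the farthest star once the heap exceeds k entries.
            if len(heap) > k:
                heappop(heap)
    # Popping yields the farthest remaining star first...
    closest_stars = []
    while heap:
        _, coord = heappop(heap)
        closest_stars.append(coord)
    # ...so reverse to list the closest star first.
    return closest_stars[::-1]
```

```python
find_k_closest_stars('Misc/stars.txt', 4)
```

```
[[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [2.0, 2.0, 2.0], [3.0, 3.0, 3.0]]
```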
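The same bounded selection is available in the standard library as `heapq.nsmallest` with a `key` function. The sketch below is hypothetical: `k_closest_stars` takes any iterable of `"x,y,z"` lines (an open file or, for testing, an `io.StringIO`) instead of a path. The `heapq` documentation notes that `nsmallest` performs best for small $k$, which is the case here.

```python
import heapq
import io


def k_closest_stars(lines, k):
    """Return the k points closest to the origin, closest first,
    from an iterable of 'x,y,z' text lines."""
    coords = ([float(v) for v in line.split(',')]
              for line in lines if line.strip())
    # Key on squared distance; the square root does not change the order.
    return heapq.nsmallest(k, coords, key=lambda c: c[0]**2 + c[1]**2 + c[2]**2)


sample = io.StringIO("3,3,3\n0,0,0\n2,2,2\n1,1,1\n5,5,5\n")
print(k_closest_stars(sample, 2))  # [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
```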

### 11.5 - Compute The Median Of Online Data
>You want to compute the running median of a sequence of numbers. The sequence is presented to you in a streaming fashion: you cannot back up to read an earlier value, and you need to output the median after reading in each new element. For example:

>If the input is $1, 0, 3, 5, 2, 0, 1$,

>the output is $1, 0.5, 1, 2, 2, 1.5, 1$.

>Design an algorithm for computing the running median of a sequence.

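A naive approach keeps every previous value and re-examines them each time a new value arrives, which costs at least linear time per element. I want to avoid looking at all the previous values when reading in a new one.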
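For contrast, here is a baseline that keeps every value seen so far in a sorted list (the name `running_median_brute_force` is hypothetical). `bisect.insort` finds the insertion point in $O(\log n)$, but shifting the list elements is $O(n)$, so the whole stream costs $O(n^2)$. It is still handy as a reference to check the heap version against.

```python
import bisect


def running_median_brute_force(stream):
    """Baseline: maintain all values seen so far in a sorted list."""
    seen, medians = [], []
    for val in stream:
        bisect.insort(seen, val)  # O(log n) search, O(n) insertion
        n = len(seen)
        mid = n // 2
        if n % 2:  # odd count: the middle element
            medians.append(seen[mid])
        else:      # even count: average of the two middle elements
            medians.append((seen[mid - 1] + seen[mid]) / 2)
    return medians


print(running_median_brute_force([1, 0, 3, 5, 2, 0, 1]))  # [1, 0.5, 1, 2.0, 2, 1.5, 1]
```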

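The key is to use two heaps, because the median splits a collection into two equal halves: a max-heap holds the smaller half of the values read so far and a min-heap holds the larger half. Keeping their sizes within one of each other means the median is either the top of the larger heap or, when the sizes are equal, the average of the two tops. Each new value then costs $O(\log n)$ to insert and rebalance.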

```python
from heapq import heappush, heappop


def compute_running_median(stream):
    # min_heap holds the larger half of the values seen so far;
    # max_heap holds the smaller half, negated to emulate a max-heap.
    min_heap, max_heap, medians = [], [], []
    # Works for any iterable, so values can arrive one at a time.
    for val in stream:
        if min_heap and val > min_heap[0]:
            heappush(min_heap, val)
        else:
            heappush(max_heap, -val)
        # Rebalance so the heap sizes never differ by more than one.
        if len(min_heap) > len(max_heap) + 1:
            heappush(max_heap, -heappop(min_heap))
        elif len(max_heap) > len(min_heap) + 1:
            heappush(min_heap, -heappop(max_heap))
        # Read the median off the heap tops.
        if len(min_heap) == len(max_heap):
            medians.append((min_heap[0] - max_heap[0]) / 2)
        elif len(min_heap) > len(max_heap):
            medians.append(min_heap[0])
        else:
            medians.append(-max_heap[0])
    return medians
```

```python
compute_running_median([1, 0, 3, 5, 2, 0, 1])
```

```
[1, 0.5, 1, 2.0, 2, 1.5, 1]
```