## Median Maintenance

You are presented with a sequence of numbers, one by one; assume for simplicity that they are distinct. Each time you receive a new number, your responsibility is to reply with the median element of all the numbers you’ve seen thus far. Thus, after seeing the first 11 numbers, you should reply with the sixth-smallest one you’ve seen; after 12, the sixth- or seventh-smallest; after 13, the seventh-smallest; and so on.

Using heaps, we can solve the median maintenance problem in just logarithmic time per round. The key idea is to maintain two heaps H1 and H2 while satisfying two invariants. The first invariant is that H1 and H2 are balanced, meaning they each contain the same number of elements (after an even round) or that one contains exactly one more element than the other (after an odd round). The second invariant is that H1 and H2 are ordered, meaning every element in H1 is smaller than every element in H2.

For example, if the numbers so far have been 1, 2, 3, 4, 5, then H1 stores 1 and 2 and H2 stores 4 and 5; the median element 3 is allowed to go in either one, as either the maximum element of H1 or the minimum element of H2. If we’ve seen 1, 2, 3, 4, 5, 6, then the first three numbers are in H1 and the second three are in H2; both the maximum element of H1 and the minimum element of H2 are median elements.

One twist: H2 will be a standard heap, supporting Insert and ExtractMin, while H1 will be the “max” variant, supporting Insert and ExtractMax. This way, we can extract the median element with one heap operation, whether it’s in H1 or H2.
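To make the two invariants concrete, here is a minimal sketch that uses Python's standard `heapq` module rather than hand-written heap classes. `heapq` only provides a min-heap, so the max-heap H1 is simulated by storing negated values. That negation trick and the helper name `check_invariants` are assumptions of this sketch, not part of the description above.

```python
import heapq

def check_invariants(h1_neg, h2):
    """Return True if the two heaps are balanced and ordered.

    h1_neg: lower half, stored as negated values (simulated max-heap).
    h2:     upper half, stored as-is (min-heap).
    """
    balanced = abs(len(h1_neg) - len(h2)) <= 1
    # h1_neg[0] is the smallest negated value, i.e. minus the maximum of H1.
    ordered = not h1_neg or not h2 or -h1_neg[0] < h2[0]
    return balanced and ordered

# The numbers 1..5 from the example, with the median 3 placed in H1.
h1_neg = [-x for x in (1, 2, 3)]
h2 = [4, 5]
heapq.heapify(h1_neg)
heapq.heapify(h2)

print(check_invariants(h1_neg, h2))  # True
print(-h1_neg[0])                    # 3, the maximum of H1 and the median
print(h2[0])                         # 4, the minimum of H2
```

Placing 3 in H2 instead would satisfy the invariants equally well; the median would then be read from the top of H2.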

We still must explain how to update H1 and H2 each time a new element arrives so that they remain balanced and ordered. To figure out where to insert a new element x so that the heaps remain ordered, it’s enough to compute the maximum element y in H1 and the minimum element z in H2. If x is less than y, it has to go in H1; if it’s more than z, it has to go in H2; if it’s in between, it can go in either one. H1 and H2 stay balanced even after x is inserted, except for one case: In an even round 2k, if x is inserted into the bigger heap (with k elements), this heap will contain k + 1 elements while the other contains only k - 1 elements. But this imbalance is easy to fix: Extract the maximum or minimum element from H1 or H2, respectively (whichever contains more elements), and re-insert this element into the other heap.
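The update rule just described can be sketched compactly with `heapq`. This is an illustrative version under stated assumptions, not the implementation used in the cells that follow: the function name `running_medians` is hypothetical, H1 is again simulated with negated values, and in even rounds it reports the smaller of the two medians.

```python
import heapq

def running_medians(numbers):
    """Yield the median after each number (the smaller one in even rounds)."""
    h1 = []  # lower half as a max-heap, stored as negated values
    h2 = []  # upper half as a min-heap
    for x in numbers:
        # Ordered: x goes into H1 only if it is below H1's maximum.
        if not h1 or x < -h1[0]:
            heapq.heappush(h1, -x)
        else:
            heapq.heappush(h2, x)
        # Balanced: move one element across if a heap got two ahead.
        if len(h1) > len(h2) + 1:
            heapq.heappush(h2, -heapq.heappop(h1))
        elif len(h2) > len(h1) + 1:
            heapq.heappush(h1, -heapq.heappop(h2))
        # The larger heap holds the median; on a tie, report H1's maximum.
        yield -h1[0] if len(h1) >= len(h2) else h2[0]

print(list(running_medians([5, 1, 4, 2, 3, 6])))  # [5, 1, 4, 2, 3, 3]
```

Each round performs a constant number of heap operations, so the cost per round stays logarithmic in the number of elements seen so far.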

In [1]:
class MinHeap:

    def __init__(self, maxSize):
        self.maxSize = maxSize
        self.arr = [None]*maxSize
        self.heapSize = 0

    def heapify(self, i):
        # Sift the element at index i down until neither child is smaller.
        l = 2*i + 1
        r = 2*i + 2
        minimum = i
        if l < self.heapSize and self.arr[l] < self.arr[i]:
            minimum = l
        if r < self.heapSize and self.arr[r] < self.arr[minimum]:
            minimum = r
        if minimum != i:
            self.arr[i], self.arr[minimum] = self.arr[minimum], self.arr[i]
            self.heapify(minimum)

    def insert(self, x):
        if self.heapSize == self.maxSize:
            print("Overflow: Cannot insert any more elements.")
            return

        # Place x in the next free slot, then sift it up past larger parents.
        i = self.heapSize
        self.heapSize += 1
        self.arr[i] = x
        parent = (i - 1)//2
        while i != 0 and self.arr[parent] > self.arr[i]:
            self.arr[i], self.arr[parent] = self.arr[parent], self.arr[i]
            i = parent
            parent = (i - 1)//2

    def removeMin(self):
        if self.heapSize <= 0:
            return None

        # Move the last element to the root, shrink the heap, and sift down.
        minimum = self.arr[0]
        self.heapSize -= 1
        self.arr[0] = self.arr[self.heapSize]
        self.arr[self.heapSize] = None
        self.heapify(0)
        return minimum


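    def getMin(self):
        # Peek at the minimum without removing it (None if the heap is empty).
        return self.arr[0] if self.heapSize > 0 else None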
    
    def printHeap(self):
        heap = ''
        for i in range(self.heapSize):
            heap += "{}, ".format(self.arr[i])
        print(heap)

In [2]:
class MaxHeap:

    def __init__(self, maxSize):
        self.maxSize = maxSize
        self.arr = [None]*maxSize
        self.heapSize = 0

    def heapify(self, i):
        # Sift the element at index i down until neither child is larger.
        l = 2*i + 1
        r = 2*i + 2
        maximum = i
        if l < self.heapSize and self.arr[l] > self.arr[i]:
            maximum = l
        if r < self.heapSize and self.arr[r] > self.arr[maximum]:
            maximum = r
        if maximum != i:
            self.arr[i], self.arr[maximum] = self.arr[maximum], self.arr[i]
            self.heapify(maximum)

    def insert(self, x):
        if self.heapSize == self.maxSize:
            print("Overflow: Cannot insert any more elements.")
            return

        # Place x in the next free slot, then sift it up past smaller parents.
        i = self.heapSize
        self.heapSize += 1
        self.arr[i] = x
        parent = (i - 1)//2
        while i != 0 and self.arr[parent] < self.arr[i]:
            self.arr[i], self.arr[parent] = self.arr[parent], self.arr[i]
            i = parent
            parent = (i - 1)//2

    def removeMax(self):
        if self.heapSize <= 0:
            return None

        # Move the last element to the root, shrink the heap, and sift down.
        maximum = self.arr[0]
        self.heapSize -= 1
        self.arr[0] = self.arr[self.heapSize]
        self.arr[self.heapSize] = None
        self.heapify(0)
        return maximum


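    def getMax(self):
        # Peek at the maximum without removing it (None if the heap is empty).
        return self.arr[0] if self.heapSize > 0 else None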
    
    def printHeap(self):
        heap = ''
        for i in range(self.heapSize):
            heap += "{}, ".format(self.arr[i])
        print(heap)

In [6]:
# Read the input: one integer per line (blank lines are skipped).
with open('Median.txt', 'r') as f:
    data = [int(line) for line in f if line.strip()]



In [11]:
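H1 = MaxHeap(len(data))  # lower half; its maximum is the median in even rounds
H2 = MinHeap(len(data))  # upper half
median_sum = 0           # running sum of the medians, kept modulo 10000

for x in data:
    if H1.heapSize > H2.heapSize:
        # H1 holds one extra element, so this round must grow H2.
        if x < H1.getMax():
            # x belongs in the lower half: move H1's maximum to H2 first.
            H2.insert(H1.removeMax())
            H1.insert(x)
        else:
            H2.insert(x)
        # Even round: report the smaller of the two medians.
        median_sum = (median_sum + H1.getMax()) % 10000
    elif H1.heapSize < H2.heapSize:
        # H2 holds one extra element, so this round must grow H1.
        if x > H2.getMin():
            # x belongs in the upper half: move H2's minimum to H1 first.
            H1.insert(H2.removeMin())
            H2.insert(x)
        else:
            H1.insert(x)
        # Even round: report the smaller of the two medians.
        median_sum = (median_sum + H1.getMax()) % 10000
    else:
        # Equal sizes: the heap that receives x ends up holding the median.
        if H2.heapSize > 0 and x > H2.getMin():
            H2.insert(x)
            median_sum = (median_sum + H2.getMin()) % 10000
        else:
            H1.insert(x)
            median_sum = (median_sum + H1.getMax()) % 10000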

median_sum

1213
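
One way to gain confidence in a result like the one above is to compare the heap-based loop against a slow but easy-to-verify reference on a small input. The sketch below keeps all numbers in a sorted list with `bisect.insort`. The function name `median_sum_bruteforce` and the sample list are hypothetical, and it assumes the same convention as the loop above: the k-th smallest of 2k numbers in even rounds, with the running sum taken modulo 10000.

```python
import bisect

def median_sum_bruteforce(numbers, modulus=10000):
    """Sum of running medians modulo `modulus`, via a sorted list.

    Each insertion costs linear time, so this is only for checking
    small inputs, not a replacement for the two-heap approach.
    """
    seen = []
    total = 0
    for x in numbers:
        bisect.insort(seen, x)
        # Index (n - 1) // 2 is the middle element for odd n and the
        # smaller of the two middle elements for even n.
        total = (total + seen[(len(seen) - 1) // 2]) % modulus
    return total

sample = [5, 1, 4, 2, 3, 6]  # hypothetical data; medians are 5, 1, 4, 2, 3, 3
print(median_sum_bruteforce(sample))  # 18
```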