# Lesson 5: Solving Real-World Problems with Heaps in Python


Welcome! In this lesson, we go deeper into heaps by applying them to a practical problem and a follow-up exercise. Along the way, you will see how heaps, a tree-based data structure, lead to efficient solutions.

Before we begin, recall that a heap is a tree-based structure commonly used to implement priority queues. In a Min Heap, every parent node is less than or equal to its child nodes; in a Max Heap, every parent node is greater than or equal to its child nodes. This property is the foundation of our problem-solving approach with heaps.
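
To make the heap property concrete, here is a small sketch using Python's `heapq`, which stores a min heap in a plain list where the children of index `k` live at `2k + 1` and `2k + 2`. The helper name `is_min_heap` is made up for this illustration and is not part of `heapq`.

```python
import heapq

values = [9, 4, 7, 1, 8, 2]
heapq.heapify(values)  # rearranges the list in place into a min heap

def is_min_heap(items):
    """Return True if every parent is <= each of its children."""
    n = len(items)
    for parent in range(n):
        for child in (2 * parent + 1, 2 * parent + 2):
            if child < n and items[parent] > items[child]:
                return False
    return True

print(values[0])            # 1, the smallest element sits at index 0
print(is_min_heap(values))  # True
```

Note that the list is only partially ordered: the root is the minimum, but the rest of the list is not sorted.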

---

## Problem: Heap-based Median Finder

Consider this scenario: You're working on an algorithm for a real-time analytics engine that calculates the median value of a continuously growing dataset. For instance, an ad tech company might need to analyze click-stream data in real-time. Our first problem is to create a data structure that supports adding a number while ensuring efficient retrieval of the median at any given point.

**Note:** A median value is the middle number in a data set when arranged in ascending order. If there is an even number of data points, the median is the average of the two numbers in the middle. It is a measure of central tendency used in statistics.

---

## Naive Approach and its Limitations

One initial approach is to store each incoming number in a Python list and, whenever the median is needed, sort the list and compute it. However, sorting costs **O(n log n)** time on every median request, so the cost grows with the length of the list. This approach becomes inefficient when numbers are added and the median is retrieved frequently.
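
For comparison, the naive approach might look like the sketch below. The class and method names are hypothetical and chosen only for this illustration.

```python
class NaiveMedianFinder:
    """Baseline: store everything, sort on every median request."""

    def __init__(self):
        self.numbers = []

    def add_num(self, num):
        self.numbers.append(num)        # O(1) amortized

    def find_median(self):
        ordered = sorted(self.numbers)  # O(n log n) on every call
        mid = len(ordered) // 2
        if len(ordered) % 2 == 1:
            return float(ordered[mid])
        # Even count: average the two middle values
        return (ordered[mid - 1] + ordered[mid]) / 2.0

finder = NaiveMedianFinder()
for value in [5, 10, 3, 1]:
    finder.add_num(value)
print(finder.find_median())  # 4.0
```

Adding is cheap here, but every call to `find_median` pays for a full sort, which is the cost the two-heap design avoids.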

---

## Efficient Approach

A smarter way to solve this problem is to maintain two heaps:
- A **max heap** to store the smaller half of the numbers.
- A **min heap** for the larger half.

If the heaps are kept balanced in size, the median can be found in **O(1)** time, because it depends only on the two roots: the largest value of the smaller half (Max Heap) and the smallest value of the larger half (Min Heap). When both heaps hold the same number of elements, the median is the average of the two roots; when one heap holds one extra element, the median is that heap's root.

Adding an element takes **O(log n)** time, but it must preserve two invariants: every value in the Max Heap is less than or equal to every value in the Min Heap, and the heap sizes differ by at most 1. Pushing the new element straight onto one heap could break the ordering, so the element is routed through the heaps: push it onto the Min Heap, pop that heap's smallest value, and push the popped value onto the Max Heap. If the Max Heap is now larger than the Min Heap, move its largest value back to the Min Heap. Each step is a single heap push or pop, so the whole insertion stays **O(log n)**.

---

## Implementing the Solution

Let's delve into the implementation specifics. We'll use Python's built-in module `heapq`, which allows us to create a standard min heap. By storing numbers as negatives, we can simulate a max heap.
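
As a quick illustration of the negation trick, and of `heapq.heappushpop`, which pushes an item and then pops the smallest one in a single call, consider this minimal sketch. The variable names are only for this example.

```python
import heapq

# Max heap simulated by storing negated values in a min heap
max_heap = []
for value in [3, 10, 5]:
    heapq.heappush(max_heap, -value)

largest = -max_heap[0]                 # peek: negate again to recover the value
popped = -heapq.heappop(max_heap)      # remove the largest value
print(largest, popped, -max_heap[0])   # 10 10 5

# heappushpop: push 6, then pop and return the smallest element
min_heap = [4, 8]                      # already a valid min heap
smallest = heapq.heappushpop(min_heap, 6)
print(smallest, sorted(min_heap))      # 4 [6, 8]
```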

### Python Code

```python
import heapq

class MedianFinder:
    def __init__(self):
        # Initialize two heaps: small (max heap) and large (min heap)
        self.heaps = [], []

    def addNum(self, num):
        small, large = self.heaps
        # Push the number to the min heap, then push the smallest element from the min heap to the max heap
        heapq.heappush(small, -heapq.heappushpop(large, num))
        # Rebalance the heaps if needed
        if len(large) < len(small):
            heapq.heappush(large, -heapq.heappop(small))

    def findMedian(self):
        small, large = self.heaps
        # If the min heap (large) has more elements, return the smallest element from it
        if len(large) > len(small):
            return float(large[0])
        # Equal sizes: average the two roots. small stores negated values,
        # so large[0] - small[0] is the sum of the two middle values.
        return (large[0] - small[0]) / 2.0

```
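
One way to gain confidence in the two-heap logic is to compare it against `statistics.median` from the standard library on random data. The sketch below restates the same insertion and lookup steps as a generator, `running_medians` (a hypothetical name), so that the snippet runs on its own.

```python
import heapq
import random
import statistics

def running_medians(stream):
    """Yield the median after each element, using the two-heap technique."""
    small, large = [], []  # small: max heap via negation, large: min heap
    for num in stream:
        # Route the new value through large into small, then rebalance
        heapq.heappush(small, -heapq.heappushpop(large, num))
        if len(large) < len(small):
            heapq.heappush(large, -heapq.heappop(small))
        if len(large) > len(small):
            yield float(large[0])
        else:
            yield (large[0] - small[0]) / 2.0

random.seed(0)  # fixed seed so the run is repeatable
data = [random.randint(-50, 50) for _ in range(200)]
for count, median in enumerate(running_medians(data), start=1):
    assert median == statistics.median(data[:count])
print("all running medians match")
```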

## Lesson Summary
Today we built a data structure that finds the median of streaming data efficiently: by pairing a max heap with a min heap, insertion takes O(log n) time and median retrieval takes O(1) time. The same two-heap pattern applies whenever you need quick access to the middle of a changing dataset.

Understanding heaps well can also provide a significant advantage in technical interviews.

Now, to cement your understanding, you'll practice and apply what you've learned in the following exercise.

Happy coding!




## Developing an Efficient Real-time Data Tracker in Python

Alright, it's time for a real challenge! Suppose you're working with a big stream of data, and your task is to support two operations on it:

- Add a new integer to the stream.
- Calculate the middle element: if the number of elements is odd, it is the middle element of the sorted stream; if the number of elements is even, it is the larger of the two middle elements.

For example, `middleEl([1, 2, 3, 4, 5]) = 3` and `middleEl([1, 2, 3, 4, 5, 6]) = 4`.

Create a method `add_num(num: int) -> None` that takes an integer `num` and adds it to the data structure, and a method `middle_element() -> int` that takes no input and returns the middle element of the data added so far.

Remember, the idea is to keep it simple: aim for clean, understandable, and practical code. You do not need to handle edge cases or unusual inputs; the input data is always valid for this exercise. So put on your thinking cap and give it a try!

```python
import heapq

class MiddleElementFinder:
    def __init__(self):
        self.heaps = [], []

    def add_num(self, num: int) -> None:
        # implement this
        pass

    def middle_element(self) -> int:
        # implement this
        return 0

# Let's test the code
estimate_finder = MiddleElementFinder()
estimate_finder.add_num(5)
estimate_finder.add_num(10)
estimate_finder.add_num(3)
estimate_finder.add_num(1)
estimate_finder.add_num(7)

print(estimate_finder.middle_element()) # Expected output: 5
```

Below is a simple solution using two heaps. We use a max heap (simulated with negative numbers) for the lower half of the stream and a min heap for the upper half. We maintain the invariant that either the max heap has one extra element (when the total count is odd) or both heaps have equal size (when it is even). For even counts, the "middle" is the smallest element in the upper half, which is at least as large as the maximum of the lower half.

```python
import heapq

class MiddleElementFinder:
    def __init__(self):
        # 'left' is a max heap (simulate with negatives) for the lower half.
        # 'right' is a min heap for the upper half.
        self.left = []
        self.right = []
    
    def add_num(self, num: int) -> None:
        # Step 1: Always push the new number onto the left (max heap) as negative.
        heapq.heappush(self.left, -num)
        
        # Step 2: Move the largest element from left to right.
        heapq.heappush(self.right, -heapq.heappop(self.left))
        
        # Step 3: Balance the heaps.
        # If the right heap ends up with more elements, move one element back to left.
        if len(self.right) > len(self.left):
            heapq.heappush(self.left, -heapq.heappop(self.right))
    
    def middle_element(self) -> int:
        # When the count is odd, left has one extra element.
        if len(self.left) > len(self.right):
            return -self.left[0]
        # When even, by our invariant the middle element is the smallest element in right.
        return self.right[0]

# Testing the solution
estimate_finder = MiddleElementFinder()
estimate_finder.add_num(5)
estimate_finder.add_num(10)
estimate_finder.add_num(3)
estimate_finder.add_num(1)
estimate_finder.add_num(7)

print(estimate_finder.middle_element())  # Expected output: 5
```

### Explanation

1. **Adding a number (`add_num`)**:
   - We first push the new number into the left heap as its negative to simulate a max heap.
   - Then, we pop the largest number from left (stored as the most negative value, so it sits at the root) and push it into right.
   - Finally, if the right heap has more elements than the left, we move one element from right back to left. This ensures that either:
     - The left heap has one extra element (odd total count), or
     - Both heaps are equal (even total count).

2. **Finding the middle element (`middle_element`)**:
   - If the total number of elements is odd, the extra element in the left heap (i.e. its maximum, when converted back from negative) is the middle.
   - If the total number is even, the two middle numbers are the maximum of the left and the minimum of the right heaps. Because all elements in left are ≤ all in right, the larger of these two is the root of the right heap.

With this approach, `add_num` runs in O(log n) time and `middle_element` runs in O(1) time, and the code remains clean and understandable.
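
When testing your own solution, a brute-force reference can be handy. The sketch below computes the middle element by sorting; the function name `middle_by_sorting` is hypothetical, and it relies on the fact that index `len(nums) // 2` of the sorted data matches the definition used in this exercise.

```python
def middle_by_sorting(nums):
    """Reference answer: index len // 2 of the sorted data.

    For an odd count this is the single middle element; for an even count
    it is the larger of the two middle elements.
    """
    return sorted(nums)[len(nums) // 2]

print(middle_by_sorting([1, 2, 3, 4, 5]))     # 3
print(middle_by_sorting([1, 2, 3, 4, 5, 6]))  # 4
print(middle_by_sorting([5, 10, 3, 1, 7]))    # 5
```

Comparing `middle_element()` against this function after each `add_num` call is a simple way to catch invariant mistakes, at the cost of an O(n log n) sort per check.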