Skip to content

2. Hybrid Data Structure

Xin Wan edited this page Jul 8, 2018 · 4 revisions

Classic problems

  • To implement a data structure that can do two things:
  1. void insert(int) - insert a new element
  2. int median() get the median of all the inserted elements

What's more? 3. delete(int) delete one element from the data structure

  • Choice 1: HashedPriorityQueue
class HashedPriorityQueue { // we implement this by ourselves.
    Map<Integer, Integer> indexMap; //maintains the map of value to corresponding index
    int[] array;
}
  • Choice 2: Lazy deletion
标记哪些element is marked deleted
1. Set<Element> deleted
2. class Node {
    Element ele;
    boolean isDeleted;
}

// when the element to be removed is the top element:
while (!minHeap.isEmpty() && minHeap.peek().isDeleted()) {
    minHeap.pop();
}
  • Choice 3 TreeSet How to handle duplicates? (Important)
use a counter for the insert operation:    counter
insert(1)    0 (1, 0)
insert(1)    1 (1, 1)
insert(2)    2 (2, 2)
delete(1)     ----
insert(2)    3 (2, 3)

TreeSet<Node> orderedSet;
class Node implements Comparable<Node> {
    Integer value;
    int index;
    public int compareTo(Node another) {
          first, compare value
          then, compare index;
    } 
}

delete(2) -->
choice 1: remove the largest node with value 2 --> orderedSet.floor(new Node(2, counter))
choice 2: remove the smallest node with value 2 --> orderedSet.ceiling(new Node(2, 0));
  • Question 1.1 Median of stream data flow at any time
  • Question 1.3 20% percentile of steam data flow
  • Question 1.4: To Implement this data structure: insert(int), median(), delete(int)
  • Questions 1.5: Sliding window median numbers (For an array with size n, find all size K window medians)

Top k problems Set 2 - Top k Trend / Frequency

Find top k frequent visited urls(top k discussed tweets) from a long list of visited urls.

  • Version 1: offline algorithm. HashMap<url, frequencey> -> Top K largest problem
  • Version 2: online algorithm
  • Question 2.2 What if the input is a steam and whenever a url is visited, we should update the Top K frequent ones.
Operations:
read(url)
- the url's corresponding frequency++ -> HashMap<url, frequency>
- if the url is already in topK -> update()
- if the url is not in topK
-- compare with the least frequency in the top K -> min()
--- <= do nothing O(1)
--- >, pop() + offer()
  • Question 3: Design a cache that can support
  • put(key, time, value)
  • get(key, time)
May<key, SortedMap<Time, Value>>
most recent to given time -> floorKey() on time.
  • Question 3.1: 换要求
  • put(key, value) -> record the value at current timestamp
  • get(key, value) -> get the value at most recent to given time for the key.
// the timestamp is already increasing sequence
Map<Key, ArrayList<Time, Value>>
- binary search for get(key, time) operation.

Map + DDL

  • If you know the position of the node in DDL, remove operation is O(1)
  • Maintain the order of access sequence.
    • tail - most recently visited
    • head - least recently visited
  • Typical problem: LRU cache
  • Use Case: Move an element from an arbitrary position to another position with the requirement:
    • remove from arbitrary position - O(1)
    • add to the new position - O(1)

Classic Problems

  • Question 5: Design a cache that can support:
    • put(key, value)
    • delete(key)
    • get(key)
    • List getKMostRecentKeys(int k) -- most recently accessed k <key, value>s.

HashMap + DDL maintain the access order.

  • Question 6: Revisit Top K trend / Top k frequency online version. Is it possible to find a more efficient solution than what we already covered?

To Do Question

Clone this wiki locally