-
Notifications
You must be signed in to change notification settings - Fork 2
2. Hybrid Data Structure
Xin Wan edited this page Jul 8, 2018
·
4 revisions
- To implement a data structure that can do two things:
- void insert(int) - insert a new element
- int median() get the median of all the inserted elements
What's more? 3. delete(int) delete one element from the data structure
- Choice 1: HashedPriorityQueue
class HashedPriorityQueue { // we implement this by ourselves.
Map<Integer, Integer> indexMap; //maintains the map of value to corresponding index
int[] array;
}
- Choice 2: Lazy deletion
标记哪些element is marked deleted
1. Set<Element> deleted
2. class Node {
Element ele;
boolean isDeleted;
}
// when the element to be removed is the top element:
while (!minHeap.isEmpty() && minHeap.peek().isDeleted()) {
minHeap.pop();
}
- Choice 3 TreeSet How to handle duplicates? (Important)
use a counter for the insert operation: counter
insert(1) 0 (1, 0)
insert(1) 1 (1, 1)
insert(2) 2 (2, 2)
delete(1) ----
insert(2) 3 (2, 3)
TreeSet<Node> orderedSet;
class Node implements Comparable<Node> {
Integer value;
int index;
public int compareTo(Node another) {
first, compare value
then, compare index;
}
}
delete(2) -->
choice 1: remove the largest node with value 2 --> orderedSet.floor(new Node(2, counter))
choice 2: remove the smallest node with value 2 --> orderedSet.ceiling(new Node(2, 0));
- Question 1.1 Median of stream data flow at any time
- Question 1.3 20% percentile of steam data flow
- Question 1.4: To Implement this data structure: insert(int), median(), delete(int)
- Questions 1.5: Sliding window median numbers (For an array with size n, find all size K window medians)
Find top k frequent visited urls(top k discussed tweets) from a long list of visited urls.
- Version 1: offline algorithm. HashMap<url, frequencey> -> Top K largest problem
- Version 2: online algorithm
- Question 2.2 What if the input is a steam and whenever a url is visited, we should update the Top K frequent ones.
Operations:
read(url)
- the url's corresponding frequency++ -> HashMap<url, frequency>
- if the url is already in topK -> update()
- if the url is not in topK
-- compare with the least frequency in the top K -> min()
--- <= do nothing O(1)
--- >, pop() + offer()
- Question 3: Design a cache that can support
- put(key, time, value)
- get(key, time)
May<key, SortedMap<Time, Value>>
most recent to given time -> floorKey() on time.
- Question 3.1: 换要求
- put(key, value) -> record the value at current timestamp
- get(key, value) -> get the value at most recent to given time for the key.
// the timestamp is already increasing sequence
Map<Key, ArrayList<Time, Value>>
- binary search for get(key, time) operation.
- If you know the position of the node in DDL, remove operation is O(1)
-
Maintain the order of access sequence.
- tail - most recently visited
- head - least recently visited
- Use Case: Move an element from an arbitrary position to another position with the requirement:
- remove from arbitrary position - O(1)
- add to the new position - O(1)
- Design a cache that can support: put(key, value), delete(key), get(key), List getKMostRecentKeys(int k) -- most recently accessed k <key, value>s. (HashMap + DDL maintain the access order.)
- Design LRU Cache (https://leetcode.com/problems/lru-cache/description/) (Note: no matter get() or put(), before you move it to tail, don't forget to connect its prev and next nodes first.)