# Problem 1
> Given an array, find the median

## Quickselect algorithm
> Let n be the array length. If we can find the k-th largest number in the array for any k, we can find the median: it is the ((n+1)/2)-th largest number when n is odd, and the average of the (n/2)-th and (n/2+1)-th largest numbers when n is even.
> To find the k-th largest number, pick a pivot uniformly at random and collect the numbers strictly smaller than the pivot into S and the numbers strictly larger into L.
> * If |L|>=k, recursively find the k-th largest element in L
> * If n-|S|<k, recursively find the (k-(n-|S|))-th largest element in S
> * Otherwise, the pivot must be the k-th largest element.  
> Each step takes O(n) time to partition and then recurses into only one side, so T(n) = O(n) + T(|S|) or T(n) = O(n) + T(|L|). A random pivot shrinks the array by a constant factor in expectation, so the work forms a geometric series and the average time complexity is O(n).
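
One standard way to make the average-case claim concrete is sketched here. It assumes a constant $c$ such that partitioning $m$ elements costs at most $cm$. Call a pivot good if its rank lies in the middle half of the sorted order. A random pivot is good with probability at least $1/2$, and after a good pivot both S and L have at most $3n/4$ elements. So while the array size stays in $((3/4)^{j+1} n, (3/4)^j n]$, the expected number of partitions is at most 2, each costing at most $c(3/4)^j n$:

$$
\mathbb{E}[T(n)] \le \sum_{j \ge 0} 2c\left(\tfrac{3}{4}\right)^{j} n = 2cn \cdot \frac{1}{1 - 3/4} = 8cn = O(n).
$$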

In [1]:
import numpy as np

In [34]:
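def quickSelect(nums,k):
    # Return the k-th largest element of nums (1 <= k <= len(nums)).
    n = len(nums)
    i = np.random.randint(low=0, high=n)  # random pivot index in [0, n)
    p = nums[i]
    smaller = []
    larger = []
    for m in nums:
        if m<p:
            smaller.append(m)
        elif m>p:
            larger.append(m)
    if len(larger)>=k:
        # at least k elements exceed the pivot, so the answer is in larger
        return quickSelect(larger, k)
    elif len(smaller)>n-k:
        # fewer than k elements are >= pivot; skip those n-len(smaller) elements
        return quickSelect(smaller, k-(n-len(smaller)))
    else:
        return p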

In [42]:
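def findMedian(nums):
    n = len(nums)
    m = n//2  # integer division; int(n/2) goes through a float
    if n%2==1:
        return quickSelect(nums,m+1)  # odd n: the single middle element
    else:
        # even n: average of the two middle elements
        return (quickSelect(nums,m+1)+quickSelect(nums,m))/2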

In [57]:
array = [4,3,7,6,5,5,1,4,3,3,5,6,3,3,5,7,6,4,2]

In [45]:
quickSelect(array,6)

5

In [46]:
findMedian(array)

4

## Using three-way partition to reduce the space complexity from O(n) to O(1)
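> Rearrange the elements in place instead of building the lists S and L. Keep track of 3 positions: the end of the small block, the beginning of the medium block (i.e., elements equal to the pivot), and the beginning of the large block. Note that the implementation below still recurses on list slices, which are copies, so only the partition step uses O(1) extra space; a fully O(1) version would pass index bounds instead of slices.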

In [75]:
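def quickSelect(nums,k):
    # Return the k-th largest element of nums (1 <= k <= len(nums)); reorders nums in place.
    n = len(nums)
    i = np.random.randint(low=0, high=n)  # random pivot index in [0, n)
    p = nums[i]
    # Invariant: nums[:posS] < p, nums[posS:posM] unclassified,
    # nums[posM:posL] == p, nums[posL:] > p
    posS,posM,posL = 0,n-1,n
    # move the pivot to the end so that it starts the "equal" block
    nums[i] = nums[n-1]
    nums[n-1] = p

    while posS<posM:
        if nums[posS]<p:
            posS += 1
        elif nums[posS]==p:
            # grow the equal block leftwards; pull the displaced element to posS
            posM -= 1
            nums[posS] = nums[posM]
            nums[posM] = p
        else:
            # shift the equal block left by one to make room in the larger block
            posL -= 1
            posM -= 1
            nums[posL] = nums[posS]
            nums[posS] = nums[posM]
            nums[posM] = p
    print(p,nums)  # debug trace: pivot and partitioned list
    if n-posL>=k:
        return quickSelect(nums[posL:],k)  # note: slicing copies the sublist
    elif posS>n-k:
        return quickSelect(nums[:posS],k+posS-n)
    else:
        return p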

In [76]:
array = [4,3,7,6,5,5,1,4,3,3,5,6,3,3,5,7,6,4,2]

In [79]:
quickSelect(array, 6)

7 [3, 3, 2, 3, 3, 3, 1, 4, 4, 4, 5, 5, 6, 5, 6, 5, 6, 7, 7]
5 [3, 3, 2, 3, 3, 3, 1, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6]


5

In [85]:
array

[3, 3, 2, 3, 3, 3, 1, 4, 4, 4, 5, 5, 6, 5, 6, 5, 6, 7, 7]
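
The version above still copies the list when it slices for the recursive call. As a rough sketch of how O(1) extra space could be reached, the variant below keeps a window `[lo, hi]` of indices and loops instead of recursing. The name `quick_select_in_place` and the use of the standard-library `random` module are assumptions made for this illustration, not part of the original notebook.

```python
import random

def quick_select_in_place(nums, k):
    """Return the k-th largest element of nums (1 <= k <= len(nums)).

    Reorders nums in place; no slices and no recursion, so O(1) extra space.
    """
    if not 1 <= k <= len(nums):
        raise ValueError("k must be between 1 and len(nums)")
    lo, hi = 0, len(nums) - 1   # current search window, inclusive
    target = len(nums) - k      # index of the answer in ascending order
    while True:
        p = nums[random.randint(lo, hi)]  # random pivot value from the window
        # Three-way partition of nums[lo..hi]:
        #   nums[lo:lt] < p, nums[lt:i] == p,
        #   nums[i:gt+1] unclassified, nums[gt+1:hi+1] > p
        lt, i, gt = lo, lo, hi
        while i <= gt:
            if nums[i] < p:
                nums[lt], nums[i] = nums[i], nums[lt]
                lt += 1
                i += 1
            elif nums[i] > p:
                nums[i], nums[gt] = nums[gt], nums[i]
                gt -= 1
            else:
                i += 1
        # Now nums[lt..gt] holds every copy of p in its final sorted position.
        if target < lt:
            hi = lt - 1         # answer is among the smaller elements
        elif target > gt:
            lo = gt + 1         # answer is among the larger elements
        else:
            return p

array = [4,3,7,6,5,5,1,4,3,3,5,6,3,3,5,7,6,4,2]
print(quick_select_in_place(array, 6))  # 6th largest; prints 5
```

Because the window always loses at least the pivot itself, the loop terminates, and the input list is left partially sorted just as in the recursive version.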

## Median of medians algorithm
> Quickselect only guarantees an average running time of $O(n)$. In the worst case, when the smallest or largest element is always chosen as the pivot, it takes $O(n^2)$ time. The median of medians algorithm reduces the worst-case time complexity to $O(n)$.  
> The trick is to choose the pivot deterministically: split the array into groups of 5, take the median of each group, and recursively use the median of those medians as the pivot. This guarantees that the pivot lies between the 30th and 70th percentiles, so each recursion keeps at most about $7n/10$ elements. The running time then satisfies $T(n) \le T(n/5) + T(7n/10) + O(n)$, and since $1/5 + 7/10 < 1$ this solves to $T(n) = O(n)$ in the worst case.
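
The notebook does not include an implementation of this algorithm, so the following is only a minimal sketch under some assumptions: groups of 5 (the usual textbook choice), list-based partitioning as in the first `quickSelect` (so O(n) extra space), and the hypothetical names `median_of_medians_select` and `find_median_mom`.

```python
def median_of_medians_select(nums, k):
    """Return the k-th largest element of nums (1 <= k <= len(nums))."""
    n = len(nums)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(nums)")
    if n <= 5:
        # Small input: sorting a constant number of elements is O(1)
        return sorted(nums, reverse=True)[k - 1]

    # 1. Median of each group of (at most) 5 elements
    medians = []
    for start in range(0, n, 5):
        group = sorted(nums[start:start + 5])
        medians.append(group[len(group) // 2])

    # 2. Pivot = median of those medians, found recursively
    p = median_of_medians_select(medians, (len(medians) + 1) // 2)

    # 3. Partition around the pivot and recurse into one side only
    smaller = [x for x in nums if x < p]
    larger = [x for x in nums if x > p]
    not_smaller = n - len(smaller)   # number of elements >= p
    if len(larger) >= k:
        return median_of_medians_select(larger, k)
    if k > not_smaller:
        return median_of_medians_select(smaller, k - not_smaller)
    return p

def find_median_mom(nums):
    """Median via median_of_medians_select (average of the two middle values if n is even)."""
    n = len(nums)
    m = n // 2
    if n % 2 == 1:
        return median_of_medians_select(nums, m + 1)
    return (median_of_medians_select(nums, m + 1) + median_of_medians_select(nums, m)) / 2

array = [4,3,7,6,5,5,1,4,3,3,5,6,3,3,5,7,6,4,2]
print(median_of_medians_select(array, 6))  # 6th largest; prints 5
print(find_median_mom(array))              # prints 4
```

Unlike the randomized versions, this one involves no randomness, so repeated runs on the same input follow the same sequence of pivots.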