This problem was asked by Stripe.

Given an array of integers, find the first missing positive integer in linear time and constant space. In other words, find the lowest positive integer that does not exist in the array. The array can contain duplicates and negative numbers as well.

For example, the input [3, 4, -1, 1] should give 2. The input [1, 2, 0] should give 3.

You can modify the input array in-place.

My idea: sort the array, then compare each number to a counter i. The scan is linear after sorting, but the sort itself is O(n log n) - so does that count?

In [21]:
def lowest_pos_int(arr):
    i=1
    arr = sorted(arr)
    for num in arr:
        if num>i: return i
        elif num==i and num>0: i+=1
    return i

In [22]:
lowest_pos_int([3,4,-1,1])

2

In [23]:
lowest_pos_int([1,2,0])

3

In [24]:
lowest_pos_int([-3,-4,-1,-1])

1

In [25]:
lowest_pos_int([])

1

In [29]:
lowest_pos_int([6,1,1,2,2,3])

4

Now, to do it without cheating - without the first sort step. Hm.

Logically, there are two cases for an array a with len(a) = n: 1) the array contains every number from 1 to n, so n+1 is the answer. Or, 2), the array is missing at least one of 1 to n, and the answer is the smallest missing one, somewhere between 1 and n. Lucky us: the indices of our array are numbered 0 to n-1, which lines up exactly with the candidates 1 to n (index i stands for the value i+1).
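To make the bound concrete, here is a minimal sketch that only ever checks the candidates 1 to n+1. It assumes extra O(n) memory is fine (it builds a set), so it is not a solution to the problem as stated, and the name first_missing_positive_with_set is made up for illustration.

```python
def first_missing_positive_with_set(arr):
    # Assumption: extra O(n) memory is allowed here, unlike in the real problem.
    seen = set(arr)
    n = len(arr)
    # n values cannot cover all n + 1 candidates 1..n+1,
    # so this loop always returns before it runs out.
    for candidate in range(1, n + 2):
        if candidate not in seen:
            return candidate


print(first_missing_positive_with_set([3, 4, -1, 1]))
print(first_missing_positive_with_set([1, 2, 3]))
```

The second call is the "everything from 1 to n is present" case, where the answer falls just past the end of the array.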



Using this logic, and with thanks to geeksforgeeks for some ideas, we'll follow the following algorithm:

1) loop through indices

2) change a[a[i]-1] to None as long as a[i]-1 is in bounds

3) again loop, and return first index i+1 that is not None

In [9]:
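def lowest_pos_int(arr):
    for i in range(len(arr)):
        #mark the value arr[i] as seen by overwriting slot arr[i]-1
        if arr[i]>0 and arr[i]<=len(arr):
            arr[arr[i]-1]=None
    for j in range(len(arr)):
        #first slot that never got marked is the missing number
        if arr[j] is not None: return j+1
    return len(arr)+1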

In [10]:
lowest_pos_int([3,4,-1,1])

TypeError: '>' not supported between instances of 'NoneType' and 'int'

That didn't work - we're overwriting some values as None before we can check them. Would be nice to create a new array to keep track, but that costs O(n) extra space...
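For comparison, this is roughly what the separate-tracking-array version could look like. It is only a sketch to show the index marking without the overwrite problem: it uses O(n) extra space, so it breaks the constant-space rule, and the name lowest_pos_int_with_tracker is hypothetical.

```python
def lowest_pos_int_with_tracker(arr):
    n = len(arr)
    # seen[k] becomes True once the value k + 1 has shown up in arr
    seen = [False] * n
    for num in arr:
        # only values 1..n have a slot; everything else is ignored
        if 1 <= num <= n:
            seen[num - 1] = True
    for k in range(n):
        if not seen[k]:
            return k + 1
    # every value 1..n was present
    return n + 1


print(lowest_pos_int_with_tracker([3, 4, -1, 1]))
```

Because the marks live in their own list, the input values are never clobbered before they are read.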

What if, instead of number/None, we used positive and negative? First step: separate the positives from the non-positives.

[-2,5,0,-1,1] becomes [-2,-1,-1,5,1] (the 0 is rewritten as -1 so it counts as a negative), and we keep track of where the positives start - swapped=3.

Now, starting at position 3, use the sign flip technique on the slot each value points to - negs to pos, pos to negs.

Finally, check the list. The first index j that's either still neg before swapped or still pos from swapped onward is our guy, and j+1 is the answer - or else len(arr)+1 is.

In [25]:
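def separate(arr):
    #move all non-positive numbers to the front, return how many there are
    swap_pos=0
    for i in range(len(arr)):
        #treat 0 as a negative so only true positives stay in the back
        if arr[i]==0: arr[i]=-1
        if arr[i]<0:
            arr[swap_pos], arr[i] = arr[i], arr[swap_pos]
            swap_pos+=1
    return swap_pos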
                

In [26]:
a=[-2,5,0,-1,1]
separate(a)

3

In [27]:
a

[-2, -1, -1, 5, 1]

In [28]:
b = [1,2,3,4,5]
separate(b)

0

In [29]:
b

[1, 2, 3, 4, 5]

In [30]:
c = [-1,-2,-3,-4,-5]
separate(c)

5

In [31]:
c

[-1, -2, -3, -4, -5]

In [32]:
d = []
separate(d)

0

In [33]:
e = [1,2,3,4,-5]
separate(e)

1

In [34]:
e

[-5, 2, 3, 4, 1]

In [93]:
def lowest_missing_num(arr,swap):
    
    #start at swap. all a[i] for indices < swap should be neg.
    for i in range(swap,len(arr)):
        
        #index to change is value of the array minus 1
        #minus one so in the example [1,2,3], 3 gets a home
        #we return +1 later
        #careful with abs in case, as in [1,1,2,3], a num shows up twice
        pos = abs(arr[i])-1
        
        if pos<len(arr): 
            
            #indices that show up < swap should change from neg to pos, and >=swap should change pos to neg
            if pos<swap: 
                arr[pos]=abs(arr[pos])
            if pos>=swap: 
                arr[pos]=-1*abs(arr[pos])

    #now loop through and see what index didn't get swapped - 
    #first remaining neg or, post-swap, remaining pos gets returned
    for j in range(len(arr)):
        if j<swap:
            if arr[j]<0: return j+1
        if j>=swap: 
            if arr[j]>0: return j+1
    return len(arr)+1
    

In [94]:
a=[-2,5,0,-1,1]
swap = separate(a)
a

[-2, -1, -1, 5, 1]

In [95]:
lowest_missing_num(a,swap)

2

In [96]:
a=[3,4,-1,1]
swap = separate(a)
a

[-1, 4, 3, 1]

In [97]:
lowest_missing_num(a,swap)

2

In [98]:
a=[1,2,3,4,5]
swap = separate(a)
a

[1, 2, 3, 4, 5]

In [99]:
lowest_missing_num(a,swap)

6
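For convenience, the two steps can be folded into one function. This is just a condensed sketch of the same partition-then-sign-flip idea, not a different algorithm; the name first_missing_positive is made up, and like the cells above it modifies the input list in place.

```python
def first_missing_positive(arr):
    n = len(arr)

    # Step 1: move non-positives to the front; swap counts them.
    swap = 0
    for i in range(n):
        if arr[i] == 0:
            arr[i] = -1  # treat 0 as a negative
        if arr[i] < 0:
            arr[swap], arr[i] = arr[i], arr[swap]
            swap += 1

    # Step 2: for each value v in the positive part, mark slot v - 1.
    # Slots before swap start negative, so "seen" means flipped to positive;
    # slots from swap onward start positive, so "seen" means flipped to negative.
    for i in range(swap, n):
        pos = abs(arr[i]) - 1  # abs because this slot may already be flipped
        if pos < n:
            arr[pos] = abs(arr[pos]) if pos < swap else -abs(arr[pos])

    # Step 3: the first slot that was never flipped gives the answer.
    for j in range(n):
        seen = arr[j] > 0 if j < swap else arr[j] < 0
        if not seen:
            return j + 1
    return n + 1


print(first_missing_positive([3, 4, -1, 1]))
print(first_missing_positive([1, 2, 0]))
```

Each of the three loops runs once over the list and nothing beyond a few counters is allocated, which is where the linear time and constant space come from.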

What a horrible problem. The first solution is fine, and whatever we gained in efficiency through the n solutions --> n indices trick we lost in readability. It reminds me of integrals you can only solve if you know the perfect trick - ridiculous problems to test any sort of knowledge. 

I proved nothing using someone else's insight that n solutions lines up with n indices. Maybe I proved a tiny bit by understanding it, and a bit more by making the code work. In any case, if this was an interview problem, and I failed and was told the trick, I would have some choice words.