# Advanced numpy functions

## `vectorize` – Make a scalar function work on vectors

With `numpy.vectorize`, you can make a function that was written for individual numbers work on arrays as well.

In [None]:
import numpy as np

# Define a scalar function
def foo(x):
    if x % 2 == 1:
        return x**2
    else:
        return x/2

# On a scalar
print('x = 10 returns ', foo(10))
print('x = 11 returns ', foo(11))

# On a list, it doesn't work: the % operator is not defined between a list and an int
print('x = [10, 11, 12] returns ', foo([10, 11, 12]))  # Raises TypeError

Let’s vectorize `foo()` so that it will work on arrays.

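`numpy.vectorize` also accepts an optional `otypes` parameter that specifies the datatype of the output. Without it, NumPy calls the function on the first input element to infer the output type, so providing `otypes` avoids that extra call and makes the output type predictable. Keep in mind that `numpy.vectorize` is essentially a Python loop, provided for convenience rather than speed.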

In [None]:
foo_v = np.vectorize(foo, otypes=[float])

print('x = [10, 11, 12] returns ', foo_v([10, 11, 12]))
print('x = [[10, 11, 12], [1, 2, 3]] returns ', foo_v([[10, 11, 12], [1, 2, 3]]))
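
To see what `otypes` changes, the sketch below compares the output dtype with and without it. It redefines the same `foo` so that the snippet runs on its own; the names `foo_auto`, `foo_float`, `out_auto`, and `out_float` are made up for this illustration.

```python
import numpy as np

def foo(x):
    # Same scalar function as above: square odd numbers, halve even ones
    if x % 2 == 1:
        return x**2
    else:
        return x/2

foo_auto = np.vectorize(foo)                   # output type inferred from the first element
foo_float = np.vectorize(foo, otypes=[float])  # output type fixed up front

# 11 is odd, so the first result is the Python int 121
# and the inferred output dtype is an integer type
out_auto = foo_auto([11, 10])
out_float = foo_float([11, 10])

print(out_auto, out_auto.dtype)
print(out_float, out_float.dtype)
```

With inferred types, the result dtype depends on whichever element happens to come first, so passing `otypes` keeps the output consistent across inputs.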

## `apply_along_axis` – Apply a function column wise or row wise

Let's first create a 2D array to show this.

In [4]:
# Create a 4x10 random array
np.random.seed(100)
arr_x = np.random.randint(1,10,size=[4,10])
arr_x

array([[9, 9, 4, 8, 8, 1, 5, 3, 6, 3],
       [3, 3, 2, 1, 9, 5, 1, 7, 3, 5],
       [2, 6, 4, 5, 5, 4, 8, 2, 2, 8],
       [8, 1, 3, 4, 3, 6, 9, 2, 1, 8]])

Let’s understand this by solving the following question:

> *How to find the difference of the maximum and the minimum value in each row?*

The usual approach would be to write a for-loop that iterates over the rows and computes the maximum minus the minimum in each iteration.

That works, but it gets cumbersome if you want to do the same column wise or implement a more complex computation, and it takes more code.

You can do this elegantly using `numpy.apply_along_axis`.

It takes as arguments:

* Function that works on a 1D vector (func1d)
* Axis along which to apply func1d. For a 2D array, 1 is row wise and 0 is column wise.
* Array on which func1d should be applied.

Let’s implement this.

In [5]:
# Define func1d
def max_minus_min(x):
    return np.max(x) - np.min(x)

# Apply along the rows
print('Row wise: ', np.apply_along_axis(max_minus_min, 1, arr=arr_x))

# Apply along the columns
print('Column wise: ', np.apply_along_axis(max_minus_min, 0, arr=arr_x))

Row wise:  [8 8 6 8]
Column wise:  [7 8 2 7 6 5 8 5 5 5]
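
For comparison, here is a minimal sketch of the for-loop approach described earlier next to `numpy.apply_along_axis`. The array is copied from the output above so the snippet is self-contained; `row_diff_loop`, `row_diff_apply`, and `row_diff_builtin` are illustrative names. For this particular computation NumPy's reductions already take an `axis` argument, which is usually the simpler choice when a built-in exists.

```python
import numpy as np

# Same values as arr_x above, written out so the snippet does not depend on the seed
arr_x = np.array([[9, 9, 4, 8, 8, 1, 5, 3, 6, 3],
                  [3, 3, 2, 1, 9, 5, 1, 7, 3, 5],
                  [2, 6, 4, 5, 5, 4, 8, 2, 2, 8],
                  [8, 1, 3, 4, 3, 6, 9, 2, 1, 8]])

def max_minus_min(x):
    return np.max(x) - np.min(x)

# Explicit loop over the rows
row_diff_loop = np.array([max_minus_min(row) for row in arr_x])

# Same computation with apply_along_axis (axis=1 passes each row to the function)
row_diff_apply = np.apply_along_axis(max_minus_min, 1, arr_x)

# Built-in reductions with an axis argument
row_diff_builtin = arr_x.max(axis=1) - arr_x.min(axis=1)

print(row_diff_loop)
print(row_diff_apply)
print(row_diff_builtin)
```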


## `searchsorted` – Find the location to insert so the array will remain sorted

What does `numpy.searchsorted` do?

It gives the index position at which a number should be inserted into an already sorted array so that the array remains sorted.

In [6]:
# Example of searchsorted
x = np.arange(10)
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [7]:
print('Where should 5 be inserted?: ', np.searchsorted(x, 5))
print('Where should 5 be inserted (right)?: ', np.searchsorted(x, 5, side='right'))

Where should 5 be inserted?:  5
Where should 5 be inserted (right)?:  6
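
The `side` argument matters most when the value already occurs in the array. The following sketch, using a made-up array `a` with repeated values, shows both sides and checks that inserting at the returned index keeps the array sorted.

```python
import numpy as np

a = np.array([1, 2, 2, 2, 3])  # must already be sorted

left = np.searchsorted(a, 2)                 # index before the existing 2s
right = np.searchsorted(a, 2, side='right')  # index after the existing 2s
print(left, right)

# Inserting at the returned index leaves the array sorted
inserted = np.insert(a, left, 2)
print(inserted)

# searchsorted also accepts several values at once
many = np.searchsorted(a, [0, 2, 5])
print(many)
```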


## `newaxis` – How to add a new axis to a numpy array?

Sometimes you might want to convert a 1D array into a 2D array (like a spreadsheet) without adding any additional data.

You might need this in order to write a 1D array as a single column in a csv file, or you might want to concatenate it with another array of similar shape.

Whatever the reason, you can do this by inserting a new axis using `numpy.newaxis`.

Actually, using this you can raise an array of a lower dimension to a higher dimension.

In [8]:
# Create a 1D array
x = np.arange(5)
print('Original array: ', x)
print()

# Introduce a new column axis
x_col = x[:, np.newaxis]
print('x_col shape: ', x_col.shape)
print(x_col)
print()

# Introduce a new row axis
x_row = x[np.newaxis, :]
print('x_row shape: ', x_row.shape)
print(x_row)

Original array:  [0 1 2 3 4]

x_col shape:  (5, 1)
[[0]
 [1]
 [2]
 [3]
 [4]]

x_row shape:  (1, 5)
[[0 1 2 3 4]]
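
As an example of the concatenation use case mentioned above, this sketch turns two 1D arrays into columns and joins them side by side. The arrays `a` and `b` are made up for illustration. `np.newaxis` is an alias for `None`, so `a[:, None]` gives the same result.

```python
import numpy as np

a = np.arange(5)
b = np.arange(5) * 10

# Turn each 1D array into a (5, 1) column, then join along the column axis
table = np.concatenate([a[:, np.newaxis], b[:, np.newaxis]], axis=1)
print(table.shape)
print(table)

# np.newaxis is None, and reshape(-1, 1) produces the same column
same_as_none = np.array_equal(a[:, np.newaxis], a[:, None])
same_as_reshape = np.array_equal(a[:, np.newaxis], a.reshape(-1, 1))
print(same_as_none, same_as_reshape)
```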


## Other useful functions

### Digitize

Use `numpy.digitize` to return the index position of the bin each element belongs to.

In [13]:
# Create the array and bins
x = np.arange(10)
bins = np.array([0, 3, 6, 9])
print(x)
print()

# Get bin allotments
np.digitize(x, bins)

[0 1 2 3 4 5 6 7 8 9]



array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4])
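
By default each bin includes its left edge and excludes its right edge, which is why 3 falls in bin 2 above. The sketch below shows the `right=True` variant alongside the equivalent `numpy.searchsorted` calls, which hold for increasing bins. The variable names are illustrative.

```python
import numpy as np

x = np.arange(10)
bins = np.array([0, 3, 6, 9])

# Default: bins[i-1] <= x < bins[i]
idx_default = np.digitize(x, bins)

# right=True: bins[i-1] < x <= bins[i]
idx_right = np.digitize(x, bins, right=True)

print(idx_default)
print(idx_right)

# For increasing bins, digitize matches searchsorted with the opposite side
matches_default = np.array_equal(idx_default, np.searchsorted(bins, x, side='right'))
matches_right = np.array_equal(idx_right, np.searchsorted(bins, x, side='left'))
print(matches_default, matches_right)
```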

### Clip

Use `numpy.clip` to cap the numbers within a given cutoff range. All numbers less than the lower limit are replaced by the lower limit.

Likewise, all numbers greater than the upper limit are replaced by the upper limit.

In [14]:
# Cap all elements of x to lie between 3 and 8
np.clip(x, 3, 8)

array([3, 3, 3, 3, 4, 5, 6, 7, 8, 8])
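
Clipping is equivalent to taking an element-wise maximum with the lower limit followed by an element-wise minimum with the upper limit. The sketch below checks this and also shows a one-sided clip, passing `None` for the limit that should not apply. The variable names are illustrative.

```python
import numpy as np

x = np.arange(10)

# Two-sided clip and its element-wise equivalent
clipped = np.clip(x, 3, 8)
manual = np.minimum(np.maximum(x, 3), 8)
print(np.array_equal(clipped, manual))

# Only enforce the lower limit
lower_only = np.clip(x, 3, None)
print(lower_only)
```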

### Histogram and Bincount

Both `numpy.histogram` and `numpy.bincount` give the frequency of occurrences, but with certain differences.

While `numpy.histogram` gives the frequency counts of the bins, `numpy.bincount` gives the frequency count of every integer from 0 up to the maximum value in the array, including the values that did not occur. It works only on non-negative integers.

In [15]:
# Bincount example
x = np.array([1,1,2,2,2,4,4,5,6,6,6]) # doesn't need to be sorted
print(np.bincount(x)) # 0 occurs 0 times, 1 occurs 2 times, 2 occurs 3 times, 3 occurs 0 times, ...
print()

# Histogram example
counts, bins = np.histogram(x, [0, 2, 4, 6, 8])
print('Counts: ', counts)
print('Bins: ', bins)

[0 2 3 0 2 1 3]

Counts:  [2 3 3 3]
Bins:  [0 2 4 6 8]
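
The two functions coincide when the histogram uses one bin per integer. Here is a minimal sketch with the same `x`; the `edges` array is an assumption made for this illustration, chosen so that every integer from 0 to the maximum gets its own bin.

```python
import numpy as np

x = np.array([1, 1, 2, 2, 2, 4, 4, 5, 6, 6, 6])

bin_counts = np.bincount(x)

# Edges 0, 1, ..., max+1 give one bin per integer value
edges = np.arange(x.max() + 2)
hist_counts, _ = np.histogram(x, bins=edges)
print(bin_counts)
print(hist_counts)

# minlength pads the result with zeros for values beyond the maximum
padded = np.bincount(x, minlength=10)
print(padded)
```

Note that the last histogram bin includes its right edge, which is also why the value 6 is counted in the final bin of the example above.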
