## numpy (Numerical Python)

<p><a href="http://www.numpy.org/" target="_blank">Numpy</a> is the core library for scientific computing in Python.
It provides a high-performance multidimensional array object, and tools for working with these
arrays. If you are already familiar with MATLAB, you might find
<a href="https://docs.scipy.org/doc/numpy/user/numpy-for-matlab-users.html" target="_blank">this tutorial useful</a> to get started with Numpy.</p>

If you are going to work on data analysis or machine learning projects, then having a solid understanding of numpy is nearly mandatory, because other packages for data analysis (like pandas) are built on top of numpy, and the scikit-learn package which is used to build machine learning applications works heavily with numpy as well.

So what does numpy provide?

At the core, numpy provides the excellent ndarray objects, short for n-dimensional arrays.

In an ‘ndarray’ object, aka ‘array’, you can store multiple items of the same data type. It is the facilities around the array object that make numpy so convenient for performing math and data manipulation.

## How to create a numpy array?

There are multiple ways to create a numpy array, most of which are covered in this tutorial. However, one of the most common is to create one from a list or a list-like object by passing it to the `np.array` function.

```python
import numpy as np

# Create a 1d array from a list
list1 = [0,1,2,3,4]
arr1d = np.array(list1)

# Print the array and its type
print(type(arr1d))
arr1d
```

```
<class 'numpy.ndarray'>

array([0, 1, 2, 3, 4])
```

The key difference between an array and a list is that arrays are designed to handle vectorized operations while Python lists are not.

That means if you apply an arithmetic operation or a numpy function to an array, it is performed on every item in the array rather than on the array object as a whole.

Let’s suppose you want to add the number 2 to every item in the list. The intuitive way to do it is something like this:

`list1 + 2  # raises TypeError`

That is not possible with a list, but you can do it with an ndarray.

```python
# Add 2 to each element of arr1d
arr1d = arr1d + 2

arr1d
```

```
array([2, 3, 4, 5, 6])
```
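To make the contrast concrete, here is a minimal sketch comparing the list and array versions side by side. The names `added_list` and `added_arr` are illustrative only.

```python
import numpy as np

list1 = [0, 1, 2, 3, 4]

# Adding an int to a list raises TypeError (only list + list concatenation is defined)
try:
    list1 + 2
except TypeError as err:
    print("list + 2 failed:", err)

# With a plain list you have to loop over the items yourself
added_list = [item + 2 for item in list1]

# With an ndarray the addition is applied element-wise in one expression
added_arr = np.array(list1) + 2

print(added_list)  # [2, 3, 4, 5, 6]
print(added_arr)   # [2 3 4 5 6]
```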

Another characteristic is that **once a numpy array is created, you cannot increase its size**. To add items, you have to create a new array. Extending in place, by contrast, is natural for a list. Nevertheless, arrays have many other advantages.
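A small sketch of what "create a new array" means in practice: `np.append` leaves the original untouched and returns a separate, larger array, while a list grows in place.

```python
import numpy as np

arr = np.array([1, 2, 3])

# np.append does not modify `arr`; it builds and returns a new, larger array
bigger = np.append(arr, [4, 5])

print(arr)     # [1 2 3]  (unchanged)
print(bigger)  # [1 2 3 4 5]

# A list, by contrast, grows in place
lst = [1, 2, 3]
lst.append(4)
print(lst)     # [1, 2, 3, 4]
```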

So much for 1d arrays. You can also pass a list of lists to create a matrix-like 2d array.

```python
# Create a 2d array from a list of lists
list2 = [[0,1,2], [3,4,5], [6,7,8]]
arr2d = np.array(list2)
arr2d
```

```
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
```

You may also specify the datatype by setting the `dtype` argument. Some of the most commonly used numpy dtypes are: `float`, `int`, `bool`, `str` and `object`.

```python
# Create a float 2d array
arr2d_f = np.array(list2, dtype='float')
arr2d_f
```

```
array([[0., 1., 2.],
       [3., 4., 5.],
       [6., 7., 8.]])
```

The decimal point after each number is indicative of the `float` datatype. You can also convert it to a different datatype using the `astype` method.

```python
# Convert to 'int' datatype
arr2d_f.astype('int')
```

```
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
```

```python
# Convert to int, then to str datatype
arr2d_f.astype('int').astype('str')
```

```
array([['0', '1', '2'],
       ['3', '4', '5'],
       ['6', '7', '8']], dtype='<U21')
```

All items in a numpy array **must be of the same data type**, unlike lists. This is another significant difference.

However, if you are uncertain about what datatype your array will hold, or if you want to hold characters and numbers in the same array, you can set the dtype to `object`.

```python
# Create an object array to hold numbers as well as strings
arr1d_obj = np.array([1, 'a'], dtype='object')
arr1d_obj
```

```
array([1, 'a'], dtype=object)
```

Finally, you can always convert an array back to a python list using `tolist()`.

```python
# Convert an array back to a list
arr1d_obj.tolist()
```

```
[1, 'a']
```

## How to inspect the size and shape of a numpy array?

Let’s suppose you were handed a numpy array that you didn’t create yourself. What would you want to check in order to understand it?

* Whether it is a 1D, a 2D or a higher-dimensional array (`ndim`)

* How many items are present in each dimension (`shape`)

* What its datatype is (`dtype`)

* The total number of items in it (`size`)

```python
# Create a 2d array with 3 rows and 4 columns
list2 = [[1, 2, 3, 4],[3, 4, 5, 6], [5, 6, 7, 8]]
arr2 = np.array(list2, dtype='float')
arr2
```

```
array([[1., 2., 3., 4.],
       [3., 4., 5., 6.],
       [5., 6., 7., 8.]])
```

```python
# shape
print('Shape: ', arr2.shape)

# dtype
print('Datatype: ', arr2.dtype)

# size
print('Size: ', arr2.size)

# ndim
print('Num Dimensions: ', arr2.ndim)
```

```
Shape:  (3, 4)
Datatype:  float64
Size:  12
Num Dimensions:  2
```


## How to extract specific items from an array?

You can extract specific portions of an array using zero-based indexing, similar to how you would with Python lists.

But unlike lists, numpy arrays accept one index or slice per dimension, separated by commas, inside a single pair of square brackets.

```python
arr2
```

```
array([[1., 2., 3., 4.],
       [3., 4., 5., 6.],
       [5., 6., 7., 8.]])
```

```python
# Extract the first 2 rows and columns
arr2[:2, :2]
```

```
array([[1., 2.],
       [3., 4.]])
```

Additionally, numpy arrays support boolean indexing.

A boolean index array has the same shape as the array being filtered and contains only True and False values. Only the values at the True positions are retained in the output.

```python
# Get the boolean output by applying the condition to each element.
b = arr2 > 4
b
```

```
array([[False, False, False, False],
       [False, False,  True,  True],
       [ True,  True,  True,  True]])
```

```python
arr2[b]
```

```
array([5., 6., 5., 6., 7., 8.])
```
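Conditions can also be combined before indexing. The sketch below rebuilds `arr2` with the same values and keeps only items strictly between 2 and 6; note that element-wise `&`, `|` and `~` are used rather than Python's `and`/`or`/`not`.

```python
import numpy as np

arr2 = np.array([[1., 2., 3., 4.],
                 [3., 4., 5., 6.],
                 [5., 6., 7., 8.]])

# Combine conditions element-wise with & (and), | (or), ~ (not).
# The parentheses are required because & binds tighter than > and <.
mask = (arr2 > 2) & (arr2 < 6)
selected = arr2[mask]
print(selected)  # [3. 4. 3. 4. 5. 5.]
```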

## How to reverse the rows and the whole array?

Reversing an array works like reversing a list with `[::-1]`, but for a complete reversal you need to do it along every axis (dimension).

```python
arr2
```

```
array([[1., 2., 3., 4.],
       [3., 4., 5., 6.],
       [5., 6., 7., 8.]])
```

```python
# Reverse only the row positions
arr2[::-1, :]
```

```
array([[5., 6., 7., 8.],
       [3., 4., 5., 6.],
       [1., 2., 3., 4.]])
```

```python
# Reverse the row and column positions
arr2[::-1, ::-1]
```

```
array([[8., 7., 6., 5.],
       [6., 5., 4., 3.],
       [4., 3., 2., 1.]])
```

## How to represent missing values and infinity?

Missing values can be represented using the `np.nan` object, while `np.inf` represents infinity. Let’s place one of each in `arr2`.

```python
# Insert a nan and an inf
arr2[1,1] = np.nan  # not a number
arr2[1,2] = np.inf  # infinite
arr2
```

```
array([[ 1.,  2.,  3.,  4.],
       [ 3., nan, inf,  6.],
       [ 5.,  6.,  7.,  8.]])
```

```python
# Replace nan and inf with -1.
# Don't use arr2 == np.nan: nan never compares equal, not even to itself.
missing_bool = np.isnan(arr2) | np.isinf(arr2)
arr2[missing_bool] = -1
arr2
```

```
array([[ 1.,  2.,  3.,  4.],
       [ 3., -1., -1.,  6.],
       [ 5.,  6.,  7.,  8.]])
```
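The warning against `arr2 == np.nan` is easy to verify. This short sketch shows that the equality test finds nothing, and that `~np.isfinite(...)` is an equivalent, shorter way to catch both nan and inf.

```python
import numpy as np

x = np.array([1.0, np.nan, np.inf])

# nan is not equal to anything, not even itself, so == finds nothing
print(x == np.nan)      # [False False False]
print(np.isnan(x))      # [False  True False]

# np.isfinite is False for both nan and inf, so ~np.isfinite(x)
# selects the same positions as np.isnan(x) | np.isinf(x)
print(~np.isfinite(x))  # [False  True  True]
```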

## How to compute mean, min, max on the ndarray?

```python
# mean, max and min
print("Mean value is: ", arr2.mean())
print("Max value is: ", arr2.max())
print("Min value is: ", arr2.min())
```

```
Mean value is:  3.5833333333333335
Max value is:  8.0
Min value is:  -1.0
```


However, if you want to compute these statistics row-wise or column-wise, pass the `axis` argument, for example to `np.mean`, `np.amax` and `np.amin`. `axis=0` gives one result per column and `axis=1` gives one result per row.

```python
arr2
```

```
array([[ 1.,  2.,  3.,  4.],
       [ 3., -1., -1.,  6.],
       [ 5.,  6.,  7.,  8.]])
```

```python
# Row wise and column wise mean
print("Column wise mean: ", np.mean(arr2, axis=0))
print("Row wise mean: ", np.mean(arr2, axis=1))
print()

# Row wise and column wise max
print("Column wise maximum: ", np.amax(arr2, axis=0))
print("Row wise maximum: ", np.amax(arr2, axis=1))
print()

# Row wise and column wise min
print("Column wise minimum: ", np.amin(arr2, axis=0))
print("Row wise minimum: ", np.amin(arr2, axis=1))
```

```
Column wise mean:  [3.   2.33 3.   6.  ]
Row wise mean:  [2.5  1.75 6.5 ]

Column wise maximum:  [5. 6. 7. 8.]
Row wise maximum:  [4. 6. 8.]

Column wise minimum:  [ 1. -1. -1.  4.]
Row wise minimum:  [ 1. -1.  5.]
```


```python
# Cumulative Sum
np.cumsum(arr2)
```

```
array([ 1.,  3.,  6., 10., 13., 12., 11., 17., 22., 28., 35., 43.])
```
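Without an `axis` argument, `np.cumsum` works on the flattened array, which is why the result above is 1d. Passing `axis` keeps the 2d shape. A sketch using the same values as `arr2` at this point:

```python
import numpy as np

arr2 = np.array([[1., 2., 3., 4.],
                 [3., -1., -1., 6.],
                 [5., 6., 7., 8.]])

# Running total down each column (accumulating across rows)
col_cumsum = np.cumsum(arr2, axis=0)

# Running total along each row (accumulating across columns)
row_cumsum = np.cumsum(arr2, axis=1)

print(col_cumsum)
print(row_cumsum)
```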

## How to create a new array from an existing array?

If you assign a slice of an array to a new variable, you do not get a new array: you get a view that refers to the parent array's memory.

That means any changes you make to the new array are reflected in the parent array as well.

So to avoid disturbing the parent array, you need to make a copy of it using `copy()`.

```python
# Assign portion of arr2 to arr2a. Doesn't really create a new array.
arr2a = arr2[:2,:2]
arr2a[:1, :1] = 100  # 100 will reflect in arr2
arr2
```

```
array([[100.,   2.,   3.,   4.],
       [  3.,  -1.,  -1.,   6.],
       [  5.,   6.,   7.,   8.]])
```

```python
# Copy portion of arr2 to arr2b
arr2b = arr2[:2, :2].copy()
arr2b[:1, :1] = 101  # 101 will not reflect in arr2
print(arr2)
print()
print(arr2b)
```

```
[[100.   2.   3.   4.]
 [  3.  -1.  -1.   6.]
 [  5.   6.   7.   8.]]

[[101.   2.]
 [  3.  -1.]]
```
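If you are unsure whether you are holding a view or a copy, `np.shares_memory` can tell you. This sketch also shows that fancy indexing (indexing with a list of positions), unlike basic slicing, always produces a copy.

```python
import numpy as np

parent = np.arange(12).reshape(3, 4)

view = parent[:2, :2]           # basic slicing returns a view
copied = parent[:2, :2].copy()  # .copy() allocates new memory
fancy = parent[[0, 1]]          # fancy indexing also returns a copy

# np.shares_memory reports whether two arrays overlap in memory
print(np.shares_memory(parent, view))    # True
print(np.shares_memory(parent, copied))  # False
print(np.shares_memory(parent, fancy))   # False
```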


## Reshaping and Flattening Multidimensional arrays

Reshaping changes the arrangement of items so that the shape of the array changes while the total number of items stays the same.

Flattening, on the other hand, always converts a multi-dimensional array into a flat 1d array.

First, let’s reshape the `arr2` array from 3×4 to 4×3 shape.

```python
# Reshape a 3x4 array to 4x3 array
arr2.reshape(4, 3)
```

```
array([[100.,   2.,   3.],
       [  4.,   3.,  -1.],
       [ -1.,   6.,   5.],
       [  6.,   7.,   8.]])
```
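A sketch of two handy reshape details: one dimension can be given as `-1` so numpy infers it, and the new shape may have a different number of dimensions as long as the item count matches.

```python
import numpy as np

arr = np.arange(12)

# One dimension may be given as -1; numpy infers it from the total size
a = arr.reshape(3, -1)    # shape (3, 4)
b = arr.reshape(-1, 6)    # shape (2, 6)

# Reshaping can also change the number of dimensions
c = arr.reshape(2, 3, 2)  # shape (2, 3, 2)

print(a.shape, b.shape, c.shape)

# The total number of items must stay the same, otherwise reshape raises ValueError
try:
    arr.reshape(5, 3)
except ValueError as err:
    print("reshape failed:", err)
```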

### What is the difference between `flatten()` and `ravel()`?

There are two popular ways to flatten an array: the `flatten()` method and the `ravel()` method.

The difference is that `flatten()` always returns a copy, whereas `ravel()` returns a view of the parent array whenever possible. In that case, any change to the raveled array affects the parent as well, but `ravel()` is more memory-efficient because it avoids creating a copy.

```python
# Flatten it to a 1d array
arr2.flatten()
```

```
array([100.,   2.,   3.,   4.,   3.,  -1.,  -1.,   6.,   5.,   6.,   7.,
         8.])
```

```python
# Changing the flattened array does not change parent
b1 = arr2.flatten()
b1[0] = 101  # changing b1 does not affect arr2
arr2
```

```
array([[100.,   2.,   3.,   4.],
       [  3.,  -1.,  -1.,   6.],
       [  5.,   6.,   7.,   8.]])
```

```python
# Changing the raveled array changes the parent also.
b2 = arr2.ravel()
b2[0] = 101  # changing b2 changes arr2 also
arr2
```

```
array([[101.,   2.,   3.,   4.],
       [  3.,  -1.,  -1.,   6.],
       [  5.,   6.,   7.,   8.]])
```
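`ravel()` only returns a view when it can. For an input that is not contiguous in row-major order, such as a transposed array, it has to copy. A sketch that checks each case with `np.shares_memory`:

```python
import numpy as np

arr = np.arange(12, dtype=float).reshape(3, 4)

# Row-major contiguous input: ravel can return a view
r1 = arr.ravel()

# The transpose is not row-major contiguous, so a row-major ravel must copy
r2 = arr.T.ravel()

# flatten always copies
f = arr.flatten()

print(np.shares_memory(arr, r1))  # True
print(np.shares_memory(arr, r2))  # False
print(np.shares_memory(arr, f))   # False
```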

## How to create sequences, repetitions and random numbers using numpy?

The `np.arange` function comes in handy for creating custom number sequences as an `ndarray`.

```python
# 0 to 4 (start defaults to 0)
print(np.arange(5))
print()

# 0 to 9
print(np.arange(0, 10))
print()

# 0 to 9 with step of 2
print(np.arange(0, 10, 2))
print()

# 10 to 1, decreasing order
print(np.arange(10, 0, -1))
```

```
[0 1 2 3 4]

[0 1 2 3 4 5 6 7 8 9]

[0 2 4 6 8]

[10  9  8  7  6  5  4  3  2  1]
```


You can set the start and end positions using `np.arange`. But if you care about the number of items in the array, you have to calculate the appropriate step value manually.

Say you want to create an array of exactly 10 numbers between 1 and 50: can you work out what the step value would be?

Instead, you can use `np.linspace`, which takes the number of items directly.

```python
# Start at 1 and end at 50
np.linspace(start=1, stop=50, num=10, dtype=int)
```

```
array([ 1,  6, 11, 17, 22, 28, 33, 39, 44, 50])
```

Notice that because I explicitly forced the `dtype` to be `int`, the numbers are not equally spaced: the evenly spaced float values are rounded down to integers.
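If you do want the step that `np.linspace` used, `retstep=True` returns it along with the samples. This sketch leaves out the integer `dtype`, so no rounding happens, and also shows how `endpoint=False` changes the step.

```python
import numpy as np

# retstep=True returns a (samples, step) tuple
samples, step = np.linspace(start=1, stop=50, num=10, retstep=True)

print(step)  # about 5.444, i.e. (50 - 1) / 9
print(samples[:3])

# endpoint=False excludes `stop`, so the step becomes (50 - 1) / 10
samples2, step2 = np.linspace(1, 50, num=10, endpoint=False, retstep=True)
print(step2)  # about 4.9
```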

Similar to `np.linspace`, there is also `np.logspace`, whose values are evenly spaced on a logarithmic scale. In `np.logspace`, the sequence starts at base^start and ends at base^stop, with a default base of 10.

```python
# Limit the number of digits after the decimal to 2
np.set_printoptions(precision=2)

# Start at 10^1 and end at 10^50
np.logspace(start=1, stop=50, num=10, base=10)
```

```
array([1.00e+01, 2.78e+06, 7.74e+11, 2.15e+17, 5.99e+22, 1.67e+28,
       4.64e+33, 1.29e+39, 3.59e+44, 1.00e+50])
```

The `np.zeros` and `np.ones` functions let you create arrays of a desired shape where all the items are either 0s or 1s.

```python
np.zeros([2,2], dtype=int)
```

```
array([[0, 0],
       [0, 0]])
```

```python
np.ones([2,2], dtype=int)
```

```
array([[1, 1],
       [1, 1]])
```

### How to create repeating sequences?

`np.tile` repeats a whole list or array `n` times, whereas `np.repeat` repeats each item `n` times.

```python
a = [1,2,3]

# Repeat whole of 'a' two times
print('Tile:   ', np.tile(a, 2))

# Repeat each element of 'a' two times
print('Repeat: ', np.repeat(a, 2))
```

```
Tile:    [1 2 3 1 2 3]
Repeat:  [1 1 2 2 3 3]
```


### How to generate random numbers?

The `np.random` module provides functions to generate random numbers (and samples from statistical distributions) of any given shape.

```python
# Random numbers between [0,1) of shape 2,2
print(np.random.rand(2,2))
print()

# Normal distribution with mean=0 and variance=1 of shape 2,2
print(np.random.randn(2,2))
print()

# Random integers between [0, 10) of shape 2,2
print(np.random.randint(0, 10, size=[2,2]))
print()

# One random number between [0,1)
print(np.random.random())
print()

# Random numbers between [0,1) of shape 2,2
print(np.random.random(size=[2,2]))
print()

# Pick 10 items from a given list, with equal probability
print(np.random.choice(['a', 'e', 'i', 'o', 'u'], size=10))
print()

# Pick 10 items from a given list with a predefined probability 'p'
print(np.random.choice(['a', 'e', 'i', 'o', 'u'], size=10, p=[0.3, .1, 0.1, 0.4, 0.1]))  # picks more o's
```

```
[[0.48 0.47]
 [0.47 0.87]]

[[ 0.93  1.73]
 [ 0.51 -0.33]]

[[1 8]
 [1 9]]

0.13062548097479965

[[0.64 0.73]
 [0.72 0.85]]

['e' 'a' 'i' 'o' 'a' 'e' 'a' 'o' 'o' 'e']

['o' 'o' 'a' 'a' 'i' 'e' 'o' 'o' 'o' 'i']
```
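The outputs above change on every run. For reproducible results you can seed a generator. The sketch below uses `np.random.default_rng`, numpy's newer Generator interface (available since numpy 1.17); it is an alternative to the `np.random.*` calls used above, not part of the original example, and the seed `42` is arbitrary.

```python
import numpy as np

# Two generators created with the same seed produce identical streams
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

x = rng_a.random((2, 2))     # floats in [0, 1)
y = rng_b.random((2, 2))
print(np.array_equal(x, y))  # True

# Rough Generator counterparts of the legacy calls used above
ints = rng_a.integers(0, 10, size=(2, 2))  # like np.random.randint(0, 10, size=[2,2])
normals = rng_a.standard_normal((2, 2))    # like np.random.randn(2, 2)
vowels = rng_a.choice(['a', 'e', 'i', 'o', 'u'], size=10,
                      p=[0.3, 0.1, 0.1, 0.4, 0.1])
print(ints, normals, vowels, sep="\n")
```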


## How to get the unique items and the counts?

The `np.unique` function can be used to get the unique items. If you want the repetition counts of each item, set the `return_counts` parameter to `True`.

```python
arr_rand = np.random.randint(0, 10, size=10)
arr_rand
```

```
array([8, 8, 3, 5, 1, 0, 9, 8, 4, 4])
```

```python
# Get the unique items and their counts
uniqs, counts = np.unique(arr_rand, return_counts=True)
print("Unique items : ", uniqs)
print("Counts       : ", counts)
```

```
Unique items :  [0 1 3 4 5 8 9]
Counts       :  [1 1 1 2 1 3 1]
```


## How to get index locations that satisfy a given condition using `np.where`?

Previously you saw how to extract items from an array that satisfy a given condition. Boolean indexing, remember?

But sometimes you want to know the index positions of the items that satisfy a condition, so that you can work with those positions directly.

`np.where` locates the positions in the array where a given condition holds true.

```python
# Create an array
arr_rand = np.array([8, 8, 3, 7, 7, 0, 4, 2, 5, 2])
print("Array: ", arr_rand)

# Positions where value > 5
index_gt5 = np.where(arr_rand > 5)
print("Positions where value > 5: ", index_gt5)
```

```
Array:  [8 8 3 7 7 0 4 2 5 2]
Positions where value > 5:  (array([0, 1, 3, 4]),)
```


Once you have the positions, you can extract the corresponding items using the array’s `take` method.

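```python
# Take items at the given index positions.
# np.where returns a tuple, so pass its first element to get a 1d result.
arr_rand.take(index_gt5[0])
```

```
array([8, 8, 7, 7])
```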
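`np.where` with only a condition returns a tuple holding one index array per dimension, which is why `index_gt5` prints as a one-element tuple for a 1d array. On a 2d array you get row and column indices; `np.argwhere` gives the same positions as (row, col) pairs. A small sketch:

```python
import numpy as np

m = np.array([[1, 6, 3],
              [7, 2, 9]])

rows, cols = np.where(m > 5)  # one index array per axis
print(rows, cols)             # [0 1 1] [1 0 2]
print(m[rows, cols])          # [6 7 9]

# Same positions, one (row, col) pair per row of the result
print(np.argwhere(m > 5))
```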

`np.where` also accepts two more optional arguments, `x` and `y`. Wherever the condition is true, the result takes the value from `x`; elsewhere it takes the value from `y`.

Below, I create an array that holds the string *gt5* wherever the condition is true and *le5* everywhere else.

```python
# If value > 5, then yield 'gt5' else 'le5'
np.where(arr_rand > 5, 'gt5', 'le5')
```

```
array(['gt5', 'gt5', 'le5', 'gt5', 'gt5', 'le5', 'le5', 'le5', 'le5',
       'le5'], dtype='<U3')
```

Let’s find the location of the maximum and minimum values as well.

```python
arr_rand
```

```
array([8, 8, 3, 7, 7, 0, 4, 2, 5, 2])
```

```python
# Location of the max
print('Position of max value: ', np.argmax(arr_rand))

# Location of the min
print('Position of min value: ', np.argmin(arr_rand))
```

```
Position of max value:  0
Position of min value:  5
```


## How to import and export data as a csv file?

A standard way to import datasets is the `np.genfromtxt` function. It can import datasets from web URLs and handle missing values, multiple delimiters, an irregular number of columns, etc.

A less versatile alternative is `np.loadtxt`, which assumes the dataset has no missing values.

As an example, let’s read a .csv file from the URL below. Since all elements in a numpy array must be of the same data type, the last column, which contains text, is imported as *nan* by default.

By setting the `filling_values` argument you can replace the missing values with something else.

```python
# Turn off scientific notation
np.set_printoptions(suppress=True)

# Import data from csv file url
path = 'https://raw.githubusercontent.com/selva86/datasets/master/Auto.csv'
data = np.genfromtxt(path, delimiter=',', skip_header=1, filling_values=-999, dtype='float')
# see first 3 rows
data[:3]
```

```
array([[  18. ,    8. ,  307. ,  130. , 3504. ,   12. ,   70. ,    1. ,
        -999. ],
       [  15. ,    8. ,  350. ,  165. , 3693. ,   11.5,   70. ,    1. ,
        -999. ],
       [  18. ,    8. ,  318. ,  150. , 3436. ,   11. ,   70. ,    1. ,
        -999. ]])
```

## How to handle datasets that have both numbers and text columns?

If you really **must** have the text column as it is without replacing it with a placeholder, you can either set the `dtype` as *object* or as *None*.

```python
data2 = np.genfromtxt(path, delimiter=',', skip_header=1, dtype=None, encoding=None)
# see first 3 rows
data2[:3]
```

```
array([(18., 8, 307., 130, 3504, 12. , 70, 1, '"chevrolet chevelle malibu"'),
       (15., 8, 350., 165, 3693, 11.5, 70, 1, '"buick skylark 320"'),
       (18., 8, 318., 150, 3436, 11. , 70, 1, '"plymouth satellite"')],
      dtype=[('f0', '<f8'), ('f1', '<i8'), ('f2', '<f8'), ('f3', '<i8'), ('f4', '<i8'), ('f5', '<f8'), ('f6', '<i8'), ('f7', '<i8'), ('f8', '<U38')])
```
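With `dtype=None` the result is a structured array: each row is a record, and each column is a named field (`f0`, `f1` and so on by default) that you can pull out by name. The sketch below is self-contained: it reads a tiny in-memory CSV through `io.StringIO` instead of the URL, and its sample rows are made up for illustration.

```python
import io
import numpy as np

# Hypothetical sample rows loosely mimicking the Auto.csv layout (mpg, cylinders, name)
csv_text = io.StringIO("18,8,chevrolet\n15,8,buick\n")

data2 = np.genfromtxt(csv_text, delimiter=',', dtype=None, encoding=None)

print(data2.dtype.names)  # ('f0', 'f1', 'f2')
print(data2['f0'])        # numeric column: [18 15]
print(data2['f2'])        # text column: ['chevrolet' 'buick']
```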

Finally, `np.savetxt` lets you export the array as a csv file.

```python
# Save the array as a csv file
np.savetxt("out.csv", data, delimiter=",", fmt='%f')
```

## How to save and load numpy objects?

At some point, you will want to save large transformed numpy arrays to disk and load them back directly, without having to re-run the data transformation code.

Numpy provides the `.npy` and the `.npz` file types for this purpose.

If you want to store a single ndarray object, store it as a `.npy` file using `np.save`. This can be loaded back using `np.load`.

If you want to store more than one ndarray object in a single file, save them as a `.npz` file using `np.savez`.

```python
# Save single numpy array object as .npy file
np.save('myarray.npy', arr2d)

# Save multiple numpy arrays as a .npz file
np.savez('array.npz', arr2d_f, arr2d)
```

Load back the `.npy` file.

```python
# Load a .npy file
a = np.load('myarray.npy')
print(a)
```

```
[[0 1 2]
 [3 4 5]
 [6 7 8]]
```


Load back the `.npz` file.

```python
# Load a .npz file
b = np.load('array.npz')
# check the arrays inside it
print(b.files)
# get one of the arrays
b['arr_0']
```

```
['arr_0', 'arr_1']

array([[0., 1., 2.],
       [3., 4., 5.],
       [6., 7., 8.]])
```
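Positional arguments to `np.savez` get the default names `arr_0`, `arr_1` and so on. Passing the arrays as keyword arguments stores them under those names instead. A sketch that writes to a temporary directory; the names `floats` and `ints` are illustrative.

```python
import os
import tempfile
import numpy as np

arr2d = np.arange(9).reshape(3, 3)
arr2d_f = arr2d.astype(float)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'named.npz')
    # Keyword names become the keys inside the .npz archive
    np.savez(path, floats=arr2d_f, ints=arr2d)

    # np.load on a .npz returns a dict-like NpzFile; the with-block closes it
    with np.load(path) as npz:
        names = sorted(npz.files)
        loaded_ints = npz['ints']

print(names)  # ['floats', 'ints']
print(loaded_ints)
```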

## How to concatenate two numpy arrays column wise and row wise?

There are 3 different ways of concatenating two or more numpy arrays.

* Method 1: `np.concatenate` by changing the axis parameter to 0 and 1
* Method 2: `np.vstack` and `np.hstack`
* Method 3: `np.r_` and `np.c_`

All three methods provide the same output.

One key difference to notice is that unlike the other 2 methods, both `np.r_` and `np.c_` use square brackets to stack arrays. 

Let's create the arrays to be concatenated.

```python
a = np.zeros([4, 4])
b = np.ones([4, 4])
print(a)
print(b)
```

```
[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]
[[1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]]
```


### Stacking vertically (row wise)

```python
# using concatenate
np.concatenate([a, b], axis=0)
```

```
array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])
```

```python
# using vstack
np.vstack([a,b])
```

```
array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])
```

```python
# using np.r_
np.r_[a,b]
```

```
array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])
```

### Stacking horizontally (column wise)

```python
# using concatenate
np.concatenate([a, b], axis=1)
```

```
array([[0., 0., 0., 0., 1., 1., 1., 1.],
       [0., 0., 0., 0., 1., 1., 1., 1.],
       [0., 0., 0., 0., 1., 1., 1., 1.],
       [0., 0., 0., 0., 1., 1., 1., 1.]])
```

```python
# using hstack
np.hstack([a,b])
```

```
array([[0., 0., 0., 0., 1., 1., 1., 1.],
       [0., 0., 0., 0., 1., 1., 1., 1.],
       [0., 0., 0., 0., 1., 1., 1., 1.],
       [0., 0., 0., 0., 1., 1., 1., 1.]])
```

```python
# using np.c_
np.c_[a,b]
```

```
array([[0., 0., 0., 0., 1., 1., 1., 1.],
       [0., 0., 0., 0., 1., 1., 1., 1.],
       [0., 0., 0., 0., 1., 1., 1., 1.],
       [0., 0., 0., 0., 1., 1., 1., 1.]])
```
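The three methods agree for the 2d arrays above, but they can behave differently when the inputs are 1d. A short sketch comparing the resulting shapes:

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

# These three join 1d arrays end to end
print(np.hstack([x, y]).shape)       # (6,)
print(np.concatenate([x, y]).shape)  # (6,)
print(np.r_[x, y].shape)             # (6,)

# vstack treats each 1d array as a row
print(np.vstack([x, y]).shape)       # (2, 3)

# np.c_ treats each 1d array as a column
print(np.c_[x, y].shape)             # (3, 2)
```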

## How to sort a numpy array based on one or more columns?

Let’s try and sort a 2d array based on the first column.

```python
arr = np.random.randint(1,6, size=[8, 4])
arr
```

```
array([[3, 5, 4, 1],
       [2, 4, 3, 2],
       [3, 3, 4, 3],
       [1, 5, 3, 4],
       [1, 2, 1, 4],
       [2, 2, 1, 4],
       [2, 5, 4, 1],
       [1, 3, 1, 5]])
```

We have a random array of 8 rows and 4 columns.

If you use the `np.sort` function with `axis=0`, each column is sorted in ascending order independently of the others, which breaks the integrity of the rows.

In simple terms, the values in each row get mixed up with values from other rows.

```python
# Sort each column of arr
np.sort(arr, axis=0)
```

```
array([[1, 2, 1, 1],
       [1, 2, 1, 1],
       [1, 3, 1, 2],
       [2, 3, 3, 3],
       [2, 4, 3, 4],
       [2, 5, 4, 4],
       [3, 5, 4, 4],
       [3, 5, 4, 5]])
```

Since we don’t want the content of the rows to be disturbed, we'll have to resort to a function called `np.argsort`.

Let’s first understand what `np.argsort` does.

`np.argsort` returns the index positions of what would make a given 1d array sorted.

```python
# Get the index positions that would sort the array
x = np.array([1, 10, 5, 2, 8, 9])
sort_index = np.argsort(x)
print(sort_index)
```

```
[0 3 2 4 5 1]
```


How to interpret this?

In array `x`, the 0th item is the smallest, the 3rd item is the second smallest, and so on.

```python
x[sort_index]
```

```
array([ 1,  2,  5,  8,  9, 10])
```

Now, in order to sort our `arr`, we're going to do an `argsort` on the 1st column and use the resulting index positions to sort `arr`.

```python
arr
```

```
array([[3, 5, 4, 1],
       [2, 4, 3, 2],
       [3, 3, 4, 3],
       [1, 5, 3, 4],
       [1, 2, 1, 4],
       [2, 2, 1, 4],
       [2, 5, 4, 1],
       [1, 3, 1, 5]])
```

```python
# Argsort the first column
sorted_index_1stcol = arr[:, 0].argsort()

# Sort 'arr' by first column without disturbing the integrity of rows
arr[sorted_index_1stcol]
```

```
array([[1, 5, 3, 4],
       [1, 2, 1, 4],
       [1, 3, 1, 5],
       [2, 4, 3, 2],
       [2, 2, 1, 4],
       [2, 5, 4, 1],
       [3, 5, 4, 1],
       [3, 3, 4, 3]])
```

To sort it in decreasing order, simply reverse the argsorted index:

```python
# Descending sort
arr[sorted_index_1stcol[::-1]]
```

```
array([[3, 3, 4, 3],
       [3, 5, 4, 1],
       [2, 5, 4, 1],
       [2, 2, 1, 4],
       [2, 4, 3, 2],
       [1, 3, 1, 5],
       [1, 2, 1, 4],
       [1, 5, 3, 4]])
```

## How to sort a numpy array based on 2 or more columns?

You can do this using `np.lexsort` by passing a tuple of columns based on which the array should be sorted.

Just remember that the primary sort key (the column to sort by first) goes last, at the rightmost position inside the tuple.

```python
arr[sorted_index_1stcol]
```

```
array([[1, 5, 3, 4],
       [1, 2, 1, 4],
       [1, 3, 1, 5],
       [2, 4, 3, 2],
       [2, 2, 1, 4],
       [2, 5, 4, 1],
       [3, 5, 4, 1],
       [3, 3, 4, 3]])
```

```python
# Sort by column 1 (last key = primary), then break ties using column 0
lexsorted_index = np.lexsort((arr[:, 0], arr[:, 1]))
arr[lexsorted_index]
```

```
array([[1, 2, 1, 4],
       [2, 2, 1, 4],
       [1, 3, 1, 5],
       [3, 3, 4, 3],
       [2, 4, 3, 2],
       [1, 5, 3, 4],
       [2, 5, 4, 1],
       [3, 5, 4, 1]])
```
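Because the last key in the tuple is the primary one, a small table where ties are easy to see makes the ordering clear. The sketch below uses hypothetical data (column 0 as a group, column 1 as a score) and also shows one common trick for a descending secondary key on numeric data: negate it.

```python
import numpy as np

# Hypothetical data: column 0 = group, column 1 = score
table = np.array([[2, 30],
                  [1, 50],
                  [2, 10],
                  [1, 20]])

# Primary key is the LAST element of the tuple (column 0 here);
# ties are broken by the earlier key (column 1)
order = np.lexsort((table[:, 1], table[:, 0]))
print(table[order])
# [[ 1 20]
#  [ 1 50]
#  [ 2 10]
#  [ 2 30]]

# Descending score within each group: negate the numeric secondary key
order_desc = np.lexsort((-table[:, 1], table[:, 0]))
print(table[order_desc])
```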