Author: Sridhar Nerur
Disclaimer: There is absolutely nothing original about this tutorial. Most of the examples are based on Wes McKinney's wonderful book Python for Data Analysis (O'Reilly).
Overview:
There are some cool things that you can do with NumPy. A few of them are listed below:
1. Creating and manipulating multidimensional arrays. For example, you can perform vectorized arithmetic operations on them. NumPy also supports broadcasting, which applies an operation across arrays of different shapes.
2. Applying standard mathematical functions to arrays. It is very easy to multiply matrices, transpose them, etc.
3. Applying linear algebra concepts, computing Fourier transforms, generating random numbers/distributions, etc.
4. Integrating code written in other languages such as C (e.g., via the C API) and Fortran.

In [1]:
#import numpy
import numpy as np
print(np.__version__) #you can update using conda or pip

1.17.2


In [2]:
#Simple operations with multidimensional arrays
m = np.array([[1, 2, 3, 4],[10, 11, 12, 13]])
print(m)
print("Dimensions: ", m.shape)
print("m's type: ", type(m))
print("Type of data in m: ", m.dtype)


[[ 1  2  3  4]
 [10 11 12 13]]
Dimensions:  (2, 4)
m's type:  <class 'numpy.ndarray'>
Type of data in m:  int64


In [15]:
#let us try a few mathematical operations
print("m * 5 = \n", m * 5)
print("m + m = \n", m + m)

m * 5 = 
 [[ 5 10 15 20]
 [50 55 60 65]]
m + m = 
 [[ 2  4  6  8]
 [20 22 24 26]]


In [5]:
#how many dimensions are we dealing with
print(m.ndim)

2


In [6]:
#converting our ints to floats (astype returns a new array; m itself is unchanged)
m.astype('float64')

array([[ 1.,  2.,  3.,  4.],
       [10., 11., 12., 13.]])

In [7]:
#creating arrays 
#Example 1: Creating a 2 x 4 array filled with zeros
zeros_matrix = np.zeros((2,4))
print("Matrix of zeros: \n", zeros_matrix)
#to fill with ones use np.ones
#creating an array of 10 sequential numbers
v = np.arange(10)
print("Array of 10 sequential numbers: \n",v)
#creating an identity matrix
i = np.eye(3)#diagonal will have ones, and the remaining will be 0s
print("Identity matrix:\n", i)

Matrix of zeros: 
 [[0. 0. 0. 0.]
 [0. 0. 0. 0.]]
Array of 10 sequential numbers: 
 [0 1 2 3 4 5 6 7 8 9]
Identity matrix:
 [[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]


In [8]:
#some more operations
print(m) #our original matrix
print("Matrix times itself:\n", m * m) #multiply corresponding elements
print("Matrix raised to 0.5:\n", m ** 0.5)

[[ 1  2  3  4]
 [10 11 12 13]]
Matrix times itself:
 [[  1   4   9  16]
 [100 121 144 169]]
Matrix raised to 0.5:
 [[1.         1.41421356 1.73205081 2.        ]
 [3.16227766 3.31662479 3.46410162 3.60555128]]


In [9]:
#slicing one-dimensional numpy arrays - just like lists in Python
x = np.array([1,2,3,4,5])
print("1: gives:\n", x[1:])
print("2:4 gives:\n", x[2:4])

1: gives:
 [2 3 4 5]
2:4 gives:
 [3 4]


In [10]:
#Broadcasting....this is cool: the scalar 2 is applied to every element of the slice
x[2:4] *= 2
print(x)
#replace first two elements with 0
x[:2] = 0
print(x)

[1 2 6 8 5]
[0 0 6 8 5]
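
Broadcasting also works between arrays of different shapes, not just between an array and a scalar. The sketch below is a minimal illustration; the names grid, offsets, and column are made up for this example and do not appear elsewhere in the tutorial.

```python
import numpy as np

# A 2 x 3 array and a length-3 array (illustrative names)
grid = np.array([[1, 2, 3], [4, 5, 6]])
offsets = np.array([10, 20, 30])

# offsets is stretched across both rows of grid, without an explicit loop
shifted = grid + offsets
print(shifted)

# A column of shape (2, 1) is stretched across the columns instead
column = np.array([[100], [200]])
shifted_by_column = grid + column
print(shifted_by_column)
```

The shapes are compared from the last dimension backwards, and a dimension of size 1 (or a missing one) is stretched to match the other array.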


In [11]:
#one important difference between lists and arrays: a list slice is a copy, but an array slice is a view of the original data
aList = [1,2,3,4,5]
print("Original list:\n", aList)
a_slice = aList[:2] #get the first two elements
print("Slice:\n", a_slice)
#let us change the slice
a_slice[0] = 10
print("Slice:\n", a_slice)
#what about the original list?
print("List after slice is changed:\n", aList) #there should be no change
#Now, let us do it with numpy arrays and see what happens
x = np.array([1,2,3,4,5])
print("Original numpy array:\n", x)
x_slice = x[:2]
print("Original slice:\n", x_slice)
#now change x_slice
x_slice[0] = 10
print("Changed slice:\n", x_slice)
#was x affected?
print("x after change to slice:\n", x)

Original list:
 [1, 2, 3, 4, 5]
Slice:
 [1, 2]
Slice:
 [10, 2]
List after slice is changed:
 [1, 2, 3, 4, 5]
Original numpy array:
 [1 2 3 4 5]
Original slice:
 [1 2]
Changed slice:
 [10  2]
x after change to slice:
 [10  2  3  4  5]
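
If you want a slice that does not affect the original array, you can ask for an explicit copy. The following is a small sketch of that idea; x_copy is a name introduced here for illustration.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])

# .copy() gives the slice its own data, so changes do not reach x
x_copy = x[:2].copy()
x_copy[0] = 10
print("x after changing the copy:\n", x)

# np.shares_memory reports whether two arrays overlap in memory
print("Plain slice shares memory with x:", np.shares_memory(x, x[:2]))
print("Copy shares memory with x:", np.shares_memory(x, x_copy))
```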


In [4]:
#creating a matrix of random normally distributed data
data = np.random.randn(5,4)
print(data)

[[ 0.48001109 -0.68229372 -0.01895502 -0.83832784]
 [-0.22130272  0.84669793  0.29504343 -0.27998416]
 [-0.17829735  1.29341941  0.63302758 -1.14309771]
 [ 0.55650699  0.92536045 -0.52058964 -0.80836642]
 [-1.79998587  1.44956897  1.03108533 -1.32765066]]
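
Because randn draws new values on every run, your numbers will differ from the ones printed above. One way to get repeatable results is a seeded generator, sketched below. This assumes NumPy 1.17 or later (where np.random.default_rng is available); the seed 42 and the names rng and seeded_data are arbitrary choices for this example.

```python
import numpy as np

# The seed value 42 is arbitrary; any fixed integer works
rng = np.random.default_rng(42)
seeded_data = rng.standard_normal((5, 4))

# Re-creating the generator with the same seed reproduces the same draws
rng_again = np.random.default_rng(42)
same_data = rng_again.standard_normal((5, 4))
print(np.array_equal(seeded_data, same_data))
```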


In [5]:
#we could associate each of the 5 rows above with some names
names = np.array(["John", "Mary", "Peter", "Pollock", "Richards"])
#let us display Peter's data - row# 3
data[names == "Peter"]

array([[-0.17829735,  1.29341941,  0.63302758, -1.14309771]])

In [6]:
#you can use Boolean expressions too (combine masks with | and &, not the keywords or/and)
#example: display the rows corresponding to Mary and Richards
data[(names == 'Mary') | (names == "Richards")]

array([[-0.22130272,  0.84669793,  0.29504343, -0.27998416],
       [-1.79998587,  1.44956897,  1.03108533, -1.32765066]])
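
Boolean masks can also be negated or combined with a column slice. The sketch below uses a stand-in array called table (an assumption made for this example, filled with sequential integers rather than random values) so that the selected rows are easy to recognize.

```python
import numpy as np

names = np.array(["John", "Mary", "Peter", "Pollock", "Richards"])
# Stand-in data: row i holds the values 4*i through 4*i + 3
table = np.arange(20).reshape(5, 4)

# != or ~ selects every row except Peter's
not_peter = table[names != "Peter"]
also_not_peter = table[~(names == "Peter")]
print(not_peter)

# A mask on the rows can be combined with a slice on the columns
mary_last_two = table[names == "Mary", 2:]
print(mary_last_two)
```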

In [23]:
#let us replace all negative numbers in data with 0
#(the output below comes from a different random draw than the data printed earlier)
data[data < 0] = 0
data

array([[0.        , 0.        , 0.        , 1.2261069 ],
       [0.93799974, 0.        , 0.        , 0.00258978],
       [0.        , 0.        , 1.84358564, 0.        ],
       [0.        , 0.        , 0.3564609 , 0.        ],
       [0.        , 0.        , 1.05092698, 0.66160643]])

In [24]:
#slice the data and then transpose
data[1:4,2:].T

array([[0.        , 1.84358564, 0.3564609 ],
       [0.00258978, 0.        , 0.        ]])

In [34]:
#universal functions - unary ufuncs
x = np.array([2,3,4,5,6])
np.sqrt(x)
#you can try out other functions - exp, log, log2, abs, isnan, cos, ....

array([1.41421356, 1.73205081, 2.        , 2.23606798, 2.44948974])

In [36]:
#binary ufuncs
#consider the following arrays
a = np.array([10,12,5,8,11])
b = np.array([5, 18, 1, 7, 13])
print(np.maximum(a,b))
print(np.add(a,b))
print(np.subtract(a,b))
print(np.multiply(a,b))

[10 18  5  8 13]
[15 30  6 15 24]
[ 5 -6  4  1 -2]
[ 50 216   5  56 143]


In [11]:
#using conditional logic - where
a = np.array([2,3,-4,8,-3,-7])
#let us replace 0 and positive numbers with 1 and negative numbers with -1
np.where( a >= 0, 1, -1)

array([ 1,  1, -1,  1, -1, -1])
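
The second and third arguments of where can be arrays as well as scalars. As a rough sketch (the names clipped and negative_positions are made up here), this keeps the non-negative values and zeroes out the rest:

```python
import numpy as np

a = np.array([2, 3, -4, 8, -3, -7])

# Keep non-negative values as they are and replace negatives with 0
clipped = np.where(a >= 0, a, 0)
print(clipped)

# With only a condition, np.where returns a tuple of index arrays
negative_positions = np.where(a < 0)[0]
print(negative_positions)
```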

In [39]:
#mathematical and statistical methods
#we will use the numpy array "a" given below
a = np.array([1,3,5,7,9,4])
print("Mean: ", a.mean()) #or, np.mean(a)
print("Sum: ", a.sum())
print("Std. Deviation: ", a.std()) #var() for variance


Mean:  4.833333333333333
Sum:  29
Std. Deviation:  2.608745973749755


In [12]:
#multidimensional arrays
#note: a notebook cell displays only its last expression; wrap the others in print() to see them
a = np.array([[1,2,3],[4,5,6],[7,8,9]])
a.mean() #mean of all the numbers in the array
a.mean(axis = 0) #gives column averages
a.mean(axis = 1) #gives row averages
a.cumsum(0) #get cumulative sums for each column


array([[ 1,  2,  3],
       [ 5,  7,  9],
       [12, 15, 18]])

In [18]:
a.sum(axis=0)

array([12, 15, 18])

In [19]:
a.sum(axis=1)

array([ 6, 15, 24])
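
The axis argument combines naturally with broadcasting. The sketch below is one possible illustration, with names (col_means, centered, row_share) chosen for this example: it centers each column on zero and then expresses each element as a share of its row total.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# axis=0 collapses the rows, leaving one value per column
col_means = a.mean(axis=0)

# Subtracting the column means centers every column on zero
centered = a - col_means
print(centered)

# keepdims=True keeps a (3, 1) shape so the row totals broadcast across columns
row_share = a / a.sum(axis=1, keepdims=True)
print(row_share.sum(axis=1))
```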

In [47]:
#consider the following array
a = np.array([11,3,15,7,9,4])
#let us try a function called argmax - you should also try argmin
a.argmax() #returns index of the largest element in the array

2

In [53]:
#sorting (a.sort() sorts in place; np.sort(a) returns a sorted copy instead)
a.sort()
a

array([ 3,  4,  7,  9, 11, 15])

In [54]:
#sorted in reverse order
a[::-1]

array([15, 11,  9,  7,  4,  3])

In [3]:
#Using argsort to get an int array of indices
a = np.array([11,3,15,7,9,4])
a.argsort()

array([1, 5, 3, 4, 0, 2])

In [23]:
b = np.array([5,2,30,3,1])
b.argsort()

array([4, 1, 3, 0, 2])
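
The indices returned by argsort are mostly useful for reordering things. Here is a short sketch of that; the labels array is hypothetical and exists only to show a second array being reordered in step with the first.

```python
import numpy as np

a = np.array([11, 3, 15, 7, 9, 4])
order = a.argsort()

# Indexing with the result of argsort yields the sorted values
sorted_a = a[order]
print(sorted_a)

# The same indices can reorder a parallel array (labels are made up)
labels = np.array(["k", "b", "z", "d", "e", "c"])
sorted_labels = labels[order]
print(sorted_labels)

# Reversing the index array gives descending order
descending_a = a[order[::-1]]
print(descending_a)
```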

You may store your arrays on disk and retrieve them later using the save() and load() functions. You may also save multiple arrays in a single zip archive (.npz) using savez().

In [None]:
np.save('my_array', a) #NumPy appends the .npy extension to the file name
np.load('my_array.npy') #so the extension is needed when loading
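
For savez, a minimal sketch might look like the following. The file name my_arrays.npz and the keys first and second are assumptions made for this example, and a temporary directory is used so that no files are left behind.

```python
import os
import tempfile
import numpy as np

a = np.array([11, 3, 15, 7, 9, 4])
b = np.array([5, 2, 30, 3, 1])

# Write to a temporary directory so the example leaves no files behind
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "my_arrays.npz")
    # The keyword names (first, second) become the keys in the archive
    np.savez(path, first=a, second=b)

    # For .npz files, np.load returns a dict-like object
    with np.load(path) as archive:
        loaded_a = archive["first"]
        loaded_b = archive["second"]

print(loaded_a, loaded_b)
```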

Linear algebra is arguably the most important foundation for data science. It typically involves manipulation of matrices - multiplication, eigenvalues, determinants, decompositions, and so forth. NumPy makes this fairly easy.

In [57]:
#Matrix multiplication
x = np.array([[1,2,3],[4,5,6]])
y = np.array([[10,11,12],[7,8,9],[1,2,3]])
np.dot(x,y) #also, x.dot(y) or x @ y; y.dot(x) fails because a 3 x 3 matrix cannot be multiplied by a 2 x 3 matrix

array([[ 27,  33,  39],
       [ 81,  96, 111]])
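
The order of the operands matters because the inner dimensions must match. The sketch below illustrates this with the @ operator, which behaves like np.dot for two-dimensional arrays; the variable name product is introduced here for illustration.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])                 # shape (2, 3)
y = np.array([[10, 11, 12], [7, 8, 9], [1, 2, 3]])   # shape (3, 3)

# (2, 3) @ (3, 3) works and gives a (2, 3) result
product = x @ y
print(product.shape)

# Reversing the operands fails: (3, 3) @ (2, 3) has mismatched inner sizes
try:
    y @ x
except ValueError as err:
    print("Cannot multiply:", err)

# Transposing x makes the shapes line up: (3, 3) @ (3, 2)
print((y @ x.T).shape)
```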

In [59]:
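#inverse of a matrix
#note: y is singular (row 1 minus row 2 is [3, 3, 3] and row 2 minus row 3 is [6, 6, 6], so the rows are linearly dependent)
#the huge values below are floating-point artifacts, not a true inverse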
from numpy.linalg import inv
inv(y)

array([[-9.63336819e+14,  1.44500523e+15, -4.81668409e+14],
       [ 1.92667364e+15, -2.89001046e+15,  9.63336819e+14],
       [-9.63336819e+14,  1.44500523e+15, -4.81668409e+14]])
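
Before trusting an inverse, it can help to check whether the matrix is invertible at all. The following sketch uses matrix_rank and det from numpy.linalg for that; the 2 x 2 matrix z is a made-up example of a matrix that does invert cleanly.

```python
import numpy as np
from numpy.linalg import det, inv, matrix_rank

y = np.array([[10, 11, 12], [7, 8, 9], [1, 2, 3]])

# A rank below 3 for a 3 x 3 matrix means its rows are linearly dependent
print("Rank:", matrix_rank(y))
# The determinant is zero up to floating-point rounding
print("Determinant:", det(y))

# A well-behaved matrix (made up for this example) inverts cleanly
z = np.array([[2.0, 1.0], [1.0, 3.0]])
z_inv = inv(z)
# Multiplying a matrix by its inverse should give the identity
print(np.allclose(z @ z_inv, np.eye(2)))
```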

In [64]:
#return the main diagonal (x is 2 x 3, so the diagonal has two elements)
np.diag(x)

array([1, 5])

In [67]:
#eigenvalues and eigenvectors of a square matrix
#eig returns the eigenvalues first, then a matrix whose columns are the eigenvectors
from numpy.linalg import eig
eig(y)

(array([ 1.96241438e+01,  1.37585620e+00, -9.97885047e-16]),
 array([[ 0.807611  ,  0.68286933,  0.40824829],
        [ 0.57777082,  0.22339789, -0.81649658],
        [ 0.11809045, -0.69554501,  0.40824829]]))
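
One way to read the result of eig is to check the defining property, that multiplying the matrix by an eigenvector equals scaling that eigenvector by its eigenvalue. A small sketch of that check follows; the names values, vectors, first_value, and first_vector are chosen for this example.

```python
import numpy as np
from numpy.linalg import eig

y = np.array([[10, 11, 12], [7, 8, 9], [1, 2, 3]])
values, vectors = eig(y)

# Column i of vectors pairs with values[i]
first_value = values[0]
first_vector = vectors[:, 0]

# Check y v = lambda v for the first pair
print(np.allclose(y @ first_vector, first_value * first_vector))

# The same check for all pairs at once, broadcasting values over the columns
print(np.allclose(y @ vectors, vectors * values))
```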

In [69]:
#singular value decomposition
#svd returns three arrays: U, the singular values (as a 1-D array), and V transposed
from numpy.linalg import svd
a = np.array([[1,2,3],[4,5,6],[7,8,9]])
svd(a)

(array([[-0.21483724,  0.88723069,  0.40824829],
        [-0.52058739,  0.24964395, -0.81649658],
        [-0.82633754, -0.38794278,  0.40824829]]),
 array([1.68481034e+01, 1.06836951e+00, 3.33475287e-16]),
 array([[-0.47967118, -0.57236779, -0.66506441],
        [-0.77669099, -0.07568647,  0.62531805],
        [-0.40824829,  0.81649658, -0.40824829]]))
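
The three pieces returned by svd can be multiplied back together to recover the original matrix. The sketch below shows this, along with a rank-2 approximation; the names u, s, vt, reconstructed, and approx are chosen for this example.

```python
import numpy as np
from numpy.linalg import svd

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
u, s, vt = svd(a)

# s is a 1-D array of singular values; np.diag turns it into a diagonal matrix
reconstructed = u @ np.diag(s) @ vt
print(np.allclose(reconstructed, a))

# Keeping only the two largest singular values gives a rank-2 approximation;
# the third singular value of a is about zero, so very little is lost here
approx = u[:, :2] @ np.diag(s[:2]) @ vt[:2, :]
print(np.allclose(approx, a))
```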

End of NumPy tutorial. Check out the other methods under numpy.linalg.