### What is NumPy?

NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more.


At the core of the NumPy package is the ndarray object, which encapsulates n-dimensional arrays of homogeneous data types.

### NumPy Arrays vs. Python Sequences

- NumPy arrays have a fixed size at creation, unlike Python lists (which can grow dynamically). Changing the size of an ndarray will create a new array and delete the original.

- The elements in a NumPy array are all required to be of the same data type, and thus will be the same size in memory.

- NumPy arrays facilitate advanced mathematical and other types of operations on large amounts of data. Typically, such operations are executed more efficiently and with less code than is possible using Python’s built-in sequences.

- A growing plethora of scientific and mathematical Python-based packages are using NumPy arrays; though these typically support Python-sequence input, they convert such input to NumPy arrays prior to processing, and they often output NumPy arrays.
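
The first three points can be illustrated with a short sketch. The variable names are illustrative, and only standard NumPy calls are used:

```python
import numpy as np

a = np.array([1, 2, 3])

# "Growing" an array builds a new one; the original keeps its size.
b = np.append(a, 4)
print(a.size, b.size)

# Mixed inputs are upcast so that every item shares one dtype.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)

# A list needs an explicit loop; an array applies the operation element-wise.
py_list = [1, 2, 3]
doubled_list = [x * 2 for x in py_list]
doubled_array = a * 2
print(doubled_list, doubled_array)
```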

### Creating NumPy Arrays

In [7]:
# np.array
import numpy as np

# 1D (vector)
a = np.array([1,2,3])
print(a)
print(type(a))

# 2D (matrix)
b = np.array([[1,2,3],[4,5,6]])
print(b)

# 3D (tensor)
c = np.array([[[1,2],[3,4]],[[5,6],[7,8]]])
print(c)

[1 2 3]
<class 'numpy.ndarray'>
[[1 2 3]
 [4 5 6]]
[[[1 2]
  [3 4]]

 [[5 6]
  [7 8]]]


In [17]:
# dtype
f = np.array([1,2,3],dtype=float)
print(f)
b = np.array([-1,0,3],dtype=bool) # any non-zero value becomes True
print(b)
c = np.array([1,2,3],dtype=complex)
print(c)

[1. 2. 3.]
[ True False  True]
[1.+0.j 2.+0.j 3.+0.j]


In [19]:
# np.arange
np.arange(1,11,2)

array([1, 3, 5, 7, 9])

In [47]:
# with reshape (r,c)
print(np.arange(1,11).reshape(5,2))
print(np.arange(1,11).reshape(2,5))
print(np.arange(16).reshape(2,2,2,2)) # 4D array

[[ 1  2]
 [ 3  4]
 [ 5  6]
 [ 7  8]
 [ 9 10]]
[[ 1  2  3  4  5]
 [ 6  7  8  9 10]]
[[[[ 0  1]
   [ 2  3]]

  [[ 4  5]
   [ 6  7]]]


 [[[ 8  9]
   [10 11]]

  [[12 13]
   [14 15]]]]
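
When only some of the target dimensions are known, reshape can infer the remaining one. The sketch below assumes a 12-item array; `x` and `reshape_failed` are illustrative names:

```python
import numpy as np

x = np.arange(12)

# -1 asks NumPy to infer that dimension from the total number of items.
inferred_cols = x.reshape(3, -1)
inferred_rows = x.reshape(-1, 6)
print(inferred_cols.shape, inferred_rows.shape)

# The new shape must account for every item, otherwise reshape raises ValueError.
try:
    x.reshape(5, 2)
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("cannot reshape:", err)
```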


In [31]:
# np.ones and np.zeros
print(np.ones((3,4)))
print(np.zeros((3,4)))

[[1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]]
[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]


In [35]:
# np.random
np.random.random((3,4)) # uniform random floats in [0, 1)

array([[0.65259507, 0.74741441, 0.13356095, 0.22464966],
       [0.16422061, 0.95350663, 0.52240118, 0.7616577 ],
       [0.05482256, 0.69113026, 0.29027751, 0.12809851]])
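
The output above changes on every run. If reproducible numbers are needed, one option is NumPy's Generator interface with a fixed seed. This is a sketch of that alternative rather than what the cell above uses, and the seed value 42 is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

floats = rng.random((3, 4))                # uniform floats in [0, 1)
ints = rng.integers(1, 101, size=(3, 3))   # integers from 1 to 100 (upper bound excluded)

# Re-creating the generator with the same seed reproduces the same numbers.
floats_again = np.random.default_rng(seed=42).random((3, 4))
print(np.array_equal(floats, floats_again))
```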

In [43]:
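# np.linspace: generates evenly spaced points over an interval, both endpoints included by default (linspace stands for 'linearly spaced')
print(np.linspace(-10,10,10))
print(np.linspace(-10,10,10,dtype=int)) # values are converted to integers, so the spacing is no longer exactly even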

[-10.          -7.77777778  -5.55555556  -3.33333333  -1.11111111
   1.11111111   3.33333333   5.55555556   7.77777778  10.        ]
[-10  -8  -6  -4  -2   1   3   5   7  10]
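
Two optional parameters of np.linspace are often useful: `retstep` returns the spacing alongside the points, and `endpoint=False` leaves out the stop value. A small sketch with illustrative values:

```python
import numpy as np

# retstep=True returns a (points, step) pair.
points, step = np.linspace(0, 1, 5, retstep=True)
print(points)
print(step)

# endpoint=False excludes the stop value, so the step becomes (stop - start) / num.
open_ended = np.linspace(0, 1, 5, endpoint=False)
print(open_ended)
```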


In [45]:
# np.identity: creates identity matrix
np.identity(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

### Array Attributes

In [61]:
a1 = np.arange(10,dtype=np.int32)
a2 = np.arange(12,dtype=float).reshape(3,4)
a3 = np.arange(8).reshape(2,2,2) # There are 2 2D arrays of shape (2,2)

print(a1)
print(a2)
print(a3)

[0 1 2 3 4 5 6 7 8 9]
[[ 0.  1.  2.  3.]
 [ 4.  5.  6.  7.]
 [ 8.  9. 10. 11.]]
[[[0 1]
  [2 3]]

 [[4 5]
  [6 7]]]


In [63]:
# ndim: tells the number of dimensions (axes) of a given array

print(a1.ndim)
print(a2.ndim)
print(a3.ndim)

1
2
3


In [65]:
# shape: tells shape of a given array (i.e. number of items in each dimension)

print(a1.shape)
print(a2.shape)
print(a3.shape)

(10,)
(3, 4)
(2, 2, 2)


In [67]:
# size: tells number of items

print(a1.size)
print(a2.size)
print(a3.size)

10
12
8


In [77]:
# itemsize: tells how many bytes each item occupies in memory (RAM)

print(a1.itemsize) # int32 takes 4 bytes
print(a2.itemsize)
print(a3.itemsize) # int64 takes 8 bytes

4
8
8


In [75]:
# dtype: tells datatype of items

print(a1.dtype)
print(a2.dtype)
print(a3.dtype)

int32
float64
int64
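
These attributes are related: the memory taken by the array's data is the number of items times the size of each item, which NumPy also exposes as `nbytes`. A quick check, with `arr` as an illustrative array:

```python
import numpy as np

arr = np.arange(12, dtype=np.float64).reshape(3, 4)

# 12 items of 8 bytes each
print(arr.size, arr.itemsize, arr.nbytes)
print(arr.size * arr.itemsize == arr.nbytes)
```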


### Changing Datatype

In [85]:
# astype: returns a new array converted to the given dtype (handy for saving memory when the values are known to fit in a smaller type)

print(a3.dtype)
a3.astype(np.int32) # new int32 array taking half the space of int64; a3 itself is unchanged

int64


array([[[0, 1],
        [2, 3]],

       [[4, 5],
        [6, 7]]], dtype=int32)
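
Downcasting is only safe when every value fits in the target type. The sketch below uses illustrative values and np.iinfo to check the range first; values outside the range do not survive the conversion intact:

```python
import numpy as np

big = np.array([100, 200, 300], dtype=np.int64)

# Range of the target type: int8 holds -128 to 127.
limits = np.iinfo(np.int8)
fits = bool(big.min() >= limits.min and big.max() <= limits.max)
print(limits.min, limits.max, fits)

small = big.astype(np.int8)   # 200 and 300 are out of range for int8
print(small)
print(big.dtype)              # the original array keeps its dtype
```

Checking the range before calling astype avoids silently corrupted values.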

### Array Operations

In [87]:
a1 = np.arange(12).reshape(3,4)
a2 = np.arange(12,24).reshape(3,4)

print(a1)
print(a2)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
[[12 13 14 15]
 [16 17 18 19]
 [20 21 22 23]]


In [91]:
# scalar operations: between a scalar and a NumPy array (applied to every item)

# arithmetic
print(a1 ** 2)

# relational
print(a2 == 15)

[[  0   1   4   9]
 [ 16  25  36  49]
 [ 64  81 100 121]]
[[False False False  True]
 [False False False False]
 [False False False False]]


In [95]:
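# vector operations: element-wise between 2 NumPy arrays of the same shape

# arithmetic
print(a1 + a2)
print(a1 ** a2) # results beyond the int64 range overflow silently (hence the negative value in the output)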

[[12 14 16 18]
 [20 22 24 26]
 [28 30 32 34]]
[[                   0                    1                16384
              14348907]
 [          4294967296         762939453125      101559956668416
     11398895185373143]
 [ 1152921504606846976 -1261475310744950487  1864712049423024128
   6839173302027254275]]
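
The negative entry in the last row is a sign of integer overflow: 9**21 is larger than the biggest int64 value, so the result wraps around. One possible workaround, sketched here with illustrative arrays, is to compute in floating point and accept the loss of exact integer precision:

```python
import numpy as np

base = np.array([8, 9], dtype=np.int64)
exponent = np.array([20, 21], dtype=np.int64)

int_result = base ** exponent                        # 9**21 does not fit in int64
float_result = base.astype(np.float64) ** exponent   # approximate, but no wrap-around

print(np.iinfo(np.int64).max)
print(int_result)
print(float_result)
```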


### Array Functions

In [97]:
a1 = np.random.random((3,3))
a1 = np.round(a1*100)

print(a1) # randomly created array of whole-number floats in the range [0,100]

[[92. 50. 38.]
 [ 8. 10. 57.]
 [44. 11. 27.]]


In [111]:
# max/min/sum/prod

print(np.max(a1))
print(np.min(a1))
print(np.sum(a1))
print(np.prod(a1))

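# axis=0 -> one result per column (operates down the rows); axis=1 -> one result per row (operates across the columns)
print(np.prod(a1,axis=0)) # product of each column
print(np.max(a1,axis=1)) # finds max in each row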

92.0
8.0
337.0
10416345984000.0
[32384.  5500. 58482.]
[92. 57. 44.]
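
A way to remember the axis argument is that the named axis is the one that gets collapsed. The sketch below shows this on the shapes of the results, with `m` as an illustrative 3x4 array:

```python
import numpy as np

m = np.arange(12).reshape(3, 4)

col_sums = m.sum(axis=0)   # collapses the 3 rows: one value per column
row_sums = m.sum(axis=1)   # collapses the 4 columns: one value per row
print(col_sums.shape, row_sums.shape)

# keepdims=True keeps the collapsed axis with length 1.
kept = m.sum(axis=1, keepdims=True)
print(kept.shape)
```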


In [117]:
# mean/median/std/var

print(np.mean(a1,axis=1)) # mean of each row
print(np.median(a1,axis=1)) # median of each row
print(np.std(a1,axis=1)) # standard deviation of each row
print(np.var(a1,axis=1)) # variance of each row

[60.         25.         27.33333333]
[50. 10. 27.]
[23.15167381 22.6421436  13.47425529]
[536.         512.66666667 181.55555556]
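
By default np.std and np.var compute the population statistics (dividing by N). The `ddof` parameter switches to the sample versions (dividing by N - ddof). A sketch using the first row of the array above:

```python
import numpy as np

row = np.array([92.0, 50.0, 38.0])

pop_var = np.var(row)               # divides by N
sample_var = np.var(row, ddof=1)    # divides by N - 1
print(pop_var, sample_var)
```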


In [121]:
# trigonometric functions

np.sin(a1) # calculates sine of each item (items are interpreted as radians)

array([[-0.77946607, -0.26237485,  0.29636858],
       [ 0.98935825, -0.54402111,  0.43616476],
       [ 0.01770193, -0.99999021,  0.95637593]])

In [147]:
# dot product (only valid when no. of cols of 1st matrix = no. of rows of 2nd matrix)

a2 = np.arange(12).reshape(3,4)
a3 = np.arange(12,24).reshape(4,3)

print(a2)
print(a3)
np.dot(a2,a3)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
[[12 13 14]
 [15 16 17]
 [18 19 20]
 [21 22 23]]


array([[114, 120, 126],
       [378, 400, 422],
       [642, 680, 718]])
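
For 2D arrays the `@` operator gives the same result as np.dot, and a shape mismatch raises a ValueError. A short sketch with illustrative names:

```python
import numpy as np

left = np.arange(12).reshape(3, 4)
right = np.arange(12, 24).reshape(4, 3)

product = left @ right   # (3,4) @ (4,3) -> (3,3)
print(product.shape)
print(np.array_equal(product, np.dot(left, right)))

# (3,4) @ (3,4): inner dimensions 4 and 3 do not match.
try:
    left @ left
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)
```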

In [129]:
# log and exponentials

print(a1)
print(np.log(a1)) # calculates natural log of each item, i.e. ln(item)
print(np.exp(a1)) # calculates exponential of each item, i.e. e^(item)

[[92. 50. 38.]
 [ 8. 10. 57.]
 [44. 11. 27.]]
[[4.52178858 3.91202301 3.63758616]
 [2.07944154 2.30258509 4.04305127]
 [3.78418963 2.39789527 3.29583687]]
[[9.01762841e+39 5.18470553e+21 3.18559318e+16]
 [2.98095799e+03 2.20264658e+04 5.68572000e+24]
 [1.28516001e+19 5.98741417e+04 5.32048241e+11]]


In [139]:
# round/floor/ceil

a = np.random.random((2,3))*100
print(a)
print(np.round(a))
print(np.floor(a))
print(np.ceil(a))

[[81.13041972 14.73576436 44.50540401]
 [92.99632369 33.73222596 19.56477926]]
[[81. 15. 45.]
 [93. 34. 20.]]
[[81. 14. 44.]
 [92. 33. 19.]]
[[82. 15. 45.]
 [93. 34. 20.]]
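
Two details of np.round are easy to miss: values exactly halfway between two integers are rounded to the nearest even one, and a second argument sets the number of decimals. A sketch with illustrative values:

```python
import numpy as np

halves = np.array([0.5, 1.5, 2.5, 3.5])
rounded_halves = np.round(halves)     # ties go to the nearest even value
print(rounded_halves)

two_decimals = np.round(3.14159, 2)   # keep 2 decimal places
hundreds = np.round(1234.5678, -2)    # negative decimals round to tens, hundreds, ...
print(two_decimals, hundreds)
```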


### Indexing and Slicing

#### Indexing

In [163]:
a1 = np.arange(10)
a2 = np.arange(12).reshape(3,4)
a3 = np.arange(8).reshape(2,2,2)

print(a1)
print(a2)
print(a3)

[0 1 2 3 4 5 6 7 8 9]
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
[[[0 1]
  [2 3]]

 [[4 5]
  [6 7]]]


In [165]:
# 1D arrays

print(a1)
print(a1[0]) # positive indexing
print(a1[-1]) # negative indexing

[0 1 2 3 4 5 6 7 8 9]
0
9


In [167]:
# 2D arrays

print(a2)
a2[1,2] # (r,c) based on 0-based indexing

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]


6

In [177]:
# 3D arrays

print(a3)
a3[1,0,1] # (which 2D array, r, c) based on 0-based indexing

[[[0 1]
  [2 3]]

 [[4 5]
  [6 7]]]


5

#### Slicing

In [181]:
# 1D arrays

print(a1)
a1[2:5:2] # start:stop:step -> items at index 2 and 4 (stop is excluded)

[0 1 2 3 4 5 6 7 8 9]


array([2, 4])

In [197]:
# 2D arrays

print(a2)
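print(a2[0,:]) # 1st row with all columns
print(a2[:,2]) # 3rd column
print(a2[1:,1:3]) # rows from the 2nd onwards, 2nd and 3rd columns
print(a2[::2,::3]) # every 2nd row, every 3rd column (the four corners)
print(a2[::2,1::2]) # 1st and 3rd rows, 2nd and 4th columns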

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
[0 1 2 3]
[ 2  6 10]
[[ 5  6]
 [ 9 10]]
[[ 0  3]
 [ 8 11]]
[[ 1  3]
 [ 9 11]]


In [229]:
# 3D arrays

a3 = np.arange(27).reshape(3,3,3)
print(a3)
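print(a3[1]) # finds middle 2d array out of 3 2d arrays
print(a3[::2]) # 1st and 3rd 2d arrays
print(a3[0,1]) # 2nd row of 1st 2d array
print(a3[1,:,1]) # middle col of 2nd 2d array
print(a3[2,1:,1:]) # bottom-right 2x2 block of 3rd 2d array
print(a3[::2,0,::2]) # 1st and 3rd items of the 1st row in the 1st and 3rd 2d arrays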

[[[ 0  1  2]
  [ 3  4  5]
  [ 6  7  8]]

 [[ 9 10 11]
  [12 13 14]
  [15 16 17]]

 [[18 19 20]
  [21 22 23]
  [24 25 26]]]
[[ 9 10 11]
 [12 13 14]
 [15 16 17]]
[[[ 0  1  2]
  [ 3  4  5]
  [ 6  7  8]]

 [[18 19 20]
  [21 22 23]
  [24 25 26]]]
[3 4 5]
[10 13 16]
[[22 23]
 [25 26]]
[[ 0  2]
 [18 20]]
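
Basic slices like the ones above return views, not copies: writing through a slice changes the original array. The sketch below (illustrative names) shows this, along with `.copy()` for when independent data is wanted:

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)

view = grid[0, :]          # a view onto the first row
view[0] = 99               # also changes grid
print(grid[0, 0])

independent = grid[1, :].copy()   # explicit copy
independent[0] = -1               # grid is unaffected
print(grid[1, 0])
```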


### Iterating

In [243]:
# 1D arrays

print(a1)

for i in a1:
  print(i) # prints single element

[0 1 2 3 4 5 6 7 8 9]
0
1
2
3
4
5
6
7
8
9


In [245]:
# 2D arrays

print(a2)

for i in a2:
  print(i) # prints 1d array

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
[0 1 2 3]
[4 5 6 7]
[ 8  9 10 11]


In [247]:
# 3D arrays

print(a3)

for i in a3:
  print(i) # prints 2d array

[[[ 0  1  2]
  [ 3  4  5]
  [ 6  7  8]]

 [[ 9 10 11]
  [12 13 14]
  [15 16 17]]

 [[18 19 20]
  [21 22 23]
  [24 25 26]]]
[[0 1 2]
 [3 4 5]
 [6 7 8]]
[[ 9 10 11]
 [12 13 14]
 [15 16 17]]
[[18 19 20]
 [21 22 23]
 [24 25 26]]


In [249]:
# Loop over single items

for i in np.nditer(a3): # nditer visits every element one by one, whatever the number of dimensions
  print(i)

0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
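
If the position of each item is needed as well as its value, np.ndenumerate yields (index, value) pairs, and the `flat` attribute is another way to walk over single items. A sketch on a small illustrative array:

```python
import numpy as np

cube = np.arange(8).reshape(2, 2, 2)

# ndenumerate yields the full index tuple together with each value.
pairs = [(index, int(value)) for index, value in np.ndenumerate(cube)]
print(pairs[0], pairs[-1])

# flat iterates over the items as if the array were 1D.
flat_values = [int(v) for v in cube.flat]
print(flat_values)
```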


### Reshaping

In [251]:
# reshape: discussed above

In [263]:
# transpose: swaps rows and columns

print(a2)
print(np.transpose(a2))
print(a2.T) # shorthand for np.transpose(a2)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
[[ 0  4  8]
 [ 1  5  9]
 [ 2  6 10]
 [ 3  7 11]]
[[ 0  4  8]
 [ 1  5  9]
 [ 2  6 10]
 [ 3  7 11]]
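
For arrays with more than two dimensions, `.T` reverses the order of all axes, while np.transpose accepts an explicit axis order. A sketch with an illustrative 3D array:

```python
import numpy as np

t = np.arange(24).reshape(2, 3, 4)

reversed_axes = t.T                    # axes reversed: (2,3,4) -> (4,3,2)
swapped = np.transpose(t, (1, 0, 2))   # swap only the first two axes
print(reversed_axes.shape, swapped.shape)

# Item [i, j, k] of t sits at [j, i, k] in swapped.
print(swapped[2, 1, 3] == t[1, 2, 3])
```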


In [265]:
# ravel: flattens an array of any dimension to 1D

print(a3)
a3.ravel()

[[[ 0  1  2]
  [ 3  4  5]
  [ 6  7  8]]

 [[ 9 10 11]
  [12 13 14]
  [15 16 17]]

 [[18 19 20]
  [21 22 23]
  [24 25 26]]]


array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26])
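
A related method is flatten. The difference is that ravel returns a view where possible, while flatten always returns a copy. The sketch below checks this with np.shares_memory on a small contiguous array (names are illustrative):

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)

raveled = arr.ravel()       # view of the same data for a contiguous array
flattened = arr.flatten()   # always an independent copy
print(np.shares_memory(arr, raveled), np.shares_memory(arr, flattened))

raveled[0] = 100            # visible through arr
flattened[1] = -1           # not visible through arr
print(arr[0, 0], arr[0, 1])
```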

### Stacking
- Used when data comes from multiple sources (like an API, web scraping, a database, etc.)
- A way of joining data from those sources into a single array

In [273]:
# stacking means joining multiple numpy arrays

a4 = np.arange(12).reshape(3,4)
a5 = np.arange(12,24).reshape(3,4)
print(a4)
print(a5)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
[[12 13 14 15]
 [16 17 18 19]
 [20 21 22 23]]


In [279]:
# horizontal stacking: joins arrays side by side, increasing the no. of cols (no. of rows must match)

np.hstack((a4,a5,a4))

array([[ 0,  1,  2,  3, 12, 13, 14, 15,  0,  1,  2,  3],
       [ 4,  5,  6,  7, 16, 17, 18, 19,  4,  5,  6,  7],
       [ 8,  9, 10, 11, 20, 21, 22, 23,  8,  9, 10, 11]])

In [277]:
# vertical stacking: joins arrays one below the other, increasing the no. of rows (no. of cols must match)

np.vstack((a4,a5))

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23]])
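
For 2D arrays, both kinds of stacking can also be written with np.concatenate and an explicit axis. A sketch of the equivalence, with illustrative names:

```python
import numpy as np

top = np.arange(12).reshape(3, 4)
bottom = np.arange(12, 24).reshape(3, 4)

side_by_side = np.concatenate((top, bottom), axis=1)   # same as hstack for 2D arrays
stacked = np.concatenate((top, bottom), axis=0)        # same as vstack for 2D arrays

print(side_by_side.shape, stacked.shape)
print(np.array_equal(side_by_side, np.hstack((top, bottom))))
print(np.array_equal(stacked, np.vstack((top, bottom))))
```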

### Splitting
- The opposite of stacking
- Used when you need several arrays from a single data source (e.g. segregating college data branch-wise)

In [302]:
# horizontal splitting: cuts the array along vertical lines, i.e. divides the columns

print(a4)
print(np.hsplit(a4,2)) # split into 2 equal parts
np.hsplit(a4,5) # error --> 4 columns cannot be divided into 5 equal parts

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
[array([[0, 1],
       [4, 5],
       [8, 9]]), array([[ 2,  3],
       [ 6,  7],
       [10, 11]])]


ValueError: array split does not result in an equal division

In [300]:
# vertical splitting: cuts the array along horizontal lines, i.e. divides the rows

print(a5)
print(np.vsplit(a5,3)) # split into 3 equal parts
np.vsplit(a5,2) # error --> 3 rows cannot be divided into 2 equal parts

[[12 13 14 15]
 [16 17 18 19]
 [20 21 22 23]]
[array([[12, 13, 14, 15]]), array([[16, 17, 18, 19]]), array([[20, 21, 22, 23]])]


ValueError: array split does not result in an equal division
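
When an equal division is not possible, two alternatives avoid the error: np.array_split allows parts of unequal size, and hsplit/vsplit also accept a list of indices at which to cut. A sketch with an illustrative array:

```python
import numpy as np

data = np.arange(12, 24).reshape(3, 4)

# array_split allows unequal parts: 3 rows become a 2-row part and a 1-row part.
parts = np.array_split(data, 2)
part_shapes = [p.shape for p in parts]
print(part_shapes)

# A list of indices cuts before columns 1 and 3.
cols = np.hsplit(data, [1, 3])
col_shapes = [c.shape for c in cols]
print(col_shapes)
```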