# NumPy

**- NumPy provides much more efficient storage and data operations on arrays than built-in Python lists**
- Python's ease of use is largely attributed to dynamic typing
- A variable holds not only its value but also extra information about the type of that value
- Data may come in heterogeneous forms (text, images, audio)
- The first step in making such data analyzable is to transform it into arrays of numbers
- Efficient storage and manipulation of numerical arrays is fundamental to doing data science
- Python offers specialized tools for handling such numerical arrays: the NumPy and Pandas packages
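
A rough sketch of the storage difference described above, assuming CPython (where every list element is a full Python object). The variable names are illustrative only, and the exact list size varies by Python version:

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)

# A list stores pointers to separate int objects, each carrying its own type information.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(v) for v in py_list)

# An array stores the raw values contiguously, with a single dtype for the whole array.
array_bytes = np_array.nbytes

print("list :", list_bytes, "bytes")
print("array:", array_bytes, "bytes")
```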

In [2]:
import numpy as np

In [None]:
np.version.version

In [None]:
# Type "np." and press Tab to list the contents of the numpy namespace
np.<TAB>

In [None]:
# Display the built-in documentation (IPython syntax)
np?

## Creating Arrays

### Creating Arrays from Python Lists

- Use np.array to create arrays from Python lists

In [None]:
np.array([1, 2, 3])

- All elements of a NumPy array share the same type
- If the input types differ, NumPy upcasts where possible (below, the integers are upcast to floating point)

In [None]:
np.array([1, 2, 3.0])
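
A small check of the upcasting behaviour, as a sketch. The integer width is platform-dependent, so only the kind of dtype is relied on here:

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([1, 2, 3.0])

print(ints.dtype)   # an integer dtype (exact width depends on the platform)
print(mixed.dtype)  # float64

# np.result_type reports the dtype NumPy would choose when combining two types
combined = np.result_type(np.int64, np.float64)
print(combined)     # float64
```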

- We can set the data type of the array explicitly with the dtype keyword

In [None]:
np.array([1, 2, 3], dtype='float32')

- NumPy arrays can be multidimensional
- Multidimensional arrays can be initialized from a list of lists; each inner list becomes one row

In [None]:
np.array([[1, 2], [3, 4]])
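
The inner sequences do not have to be written out by hand. A minimal sketch that builds the rows with a list comprehension (the name `grid` is only for illustration):

```python
import numpy as np

# Each range becomes one row of the resulting two-dimensional array
grid = np.array([range(i, i + 3) for i in [2, 4, 6]])

print(grid)
print(grid.shape, grid.ndim)  # (3, 3) 2
```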

### Creating Arrays from Scratch

- For larger arrays, it is more efficient to create them from scratch using routines built into NumPy

In [None]:
# Create a length-10 integer array filled with zeros
# zeros(shape, dtype=float, order='C', *, like=None)

np.zeros(10, dtype = int)

In [None]:
# Create a 3x5 floating point array filled with 1s
# np.ones(shape, dtype=None, order='C', *, like=None)

np.ones((3,5), dtype=float)

In [None]:
# Create a 3x5 array filled with 3.14
# np.full(shape, fill_value, dtype=None, order='C', *, like=None)

np.full((3,5), 3.14)

In [None]:
# Create an array starting at 0, ending at 20, stepping by 2
# The interval includes the starting value but excludes the ending value
# arange([start,] stop[, step,], dtype=None, *, like=None)

np.arange(0, 20, 2)

In [None]:
# Create an array filled with a linear sequence
# Create an array of five values evenly spaced between 0 and 2 (both endpoints included)
# np.linspace( start, stop, num=50, endpoint=True, retstep=False, dtype=None, axis=0)

np.linspace(0, 2, 5)
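
The endpoint handling of `np.arange` and `np.linspace` differs, which is easy to overlook. A short sketch comparing the two (variable names are illustrative):

```python
import numpy as np

a = np.arange(0, 20, 2)                        # stop value is excluded
b = np.linspace(0, 2, 5)                       # stop value is included by default
c = np.linspace(0, 2, 5, endpoint=False)       # stop value excluded on request
d, step = np.linspace(0, 2, 5, retstep=True)   # also return the spacing

print(a[-1], a.size)  # 18 10
print(b)
print(c)
print(step)           # 0.5
```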

In [None]:
# Create a 3x3 array of uniformly distributed random values between 0 and 1

np.random.random((3,3))

In [None]:
# Create a 3x3 array of normally distributed (bell-curve) random values with mean 0 and standard deviation 1

np.random.normal(0, 1, (3,3))

In [None]:
# Create a 3x3 array of random integers in the half-open interval [0, 10)

np.random.randint(0, 10, (3,3))
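
Recent NumPy versions also offer the `Generator` interface through `np.random.default_rng`. The sketch below mirrors the three random cells above with that interface. The seed value is an arbitrary choice for this illustration:

```python
import numpy as np

rng = np.random.default_rng(seed=0)     # arbitrary seed, only for reproducibility

uniform = rng.random((3, 3))            # floats in [0, 1)
normal = rng.normal(0, 1, (3, 3))       # mean 0, standard deviation 1
integers = rng.integers(0, 10, (3, 3))  # integers in [0, 10)

print(uniform)
print(normal)
print(integers)
```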

In [None]:
# Create a 3x3 identity matrix

np.eye(3,3)

In [None]:
# Create an uninitialized array of three values (float64 by default; pass dtype=int for integers)
# The values will be whatever happens to already exist at that memory location

np.empty(3)
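
A quick check of the default dtype of `np.empty`, as a sketch. The contents are arbitrary, so only the dtype and shape are inspected:

```python
import numpy as np

e_default = np.empty(3)         # no dtype given
e_int = np.empty(3, dtype=int)  # integers requested explicitly

print(e_default.dtype)  # float64
print(e_int.dtype)      # an integer dtype (width depends on the platform)
```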

## Basics of NumPy Arrays

- This section covers basic **data manipulation** with NumPy arrays: accessing data and subarrays, and splitting, reshaping, and joining arrays

1. [ ] Attributes of arrays: Determining the dimension, shape, size, memory consumption, and data types of arrays.
2. [ ] Indexing of arrays: Getting and setting the values of individual array elements
3. [ ] Slicing of arrays: Getting and setting smaller subarrays within a larger array
4. [ ] Reshaping of a given array: Changing the shape of a given array
5. [ ] Joining and splitting of arrays: Combining multiple arrays into one, and splitting one array into many

### Array Attributes: NumPy

In [31]:
import numpy as np

np.random.seed(0) # seed for reproducibility

In [16]:
x1 = np.random.randint(10, size=6) # one-dimensional array
x2 = np.random.randint(10, size=(3,4)) # two-dimensional array
x3 = np.random.randint(10, size=(3,4,5)) # three-dimensional array

In [39]:
x1

array([3, 7, 5, 5, 0, 1])

In [40]:
x2

array([[5, 9, 3, 0],
       [5, 0, 1, 2],
       [4, 2, 0, 3]])

In [41]:
x3

array([[[2, 0, 7, 5, 9],
        [0, 2, 7, 2, 9],
        [2, 3, 3, 2, 3],
        [4, 1, 2, 9, 1]],

       [[4, 6, 8, 2, 3],
        [0, 0, 6, 0, 6],
        [3, 3, 8, 8, 8],
        [2, 3, 2, 0, 8]],

       [[8, 3, 8, 2, 8],
        [4, 3, 0, 4, 3],
        [6, 9, 8, 0, 8],
        [5, 9, 0, 9, 6]]])

In [42]:
# Each array has the attributes ndim (the number of dimensions), shape (the size of each dimension),
# size (the total number of elements), dtype (the data type), itemsize (the size in bytes of each element),
# and nbytes (the total size in bytes of the array)


In [48]:
print("x3 ndim:", x3.ndim)
print("x3 shape:", x3.shape)
print("x3 size:", x3.size)
print("x3 dtype:", x3.dtype)
print("x3 itemsize:", x3.itemsize, "bytes")
print("x3 nbytes:", x3.nbytes, "bytes")

x3 ndim: 3
x3 shape: (3, 4, 5)
x3 size: 60
x3 dtype: int64
x3 itemsize: 8 bytes
x3 nbytes: 480 bytes
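
The attributes are related to one another: `size` is the product of `shape`, and `nbytes` is `size * itemsize`. A minimal sketch that fixes the dtype to `int64` so the numbers do not depend on the platform:

```python
import numpy as np

x3 = np.zeros((3, 4, 5), dtype=np.int64)

print(x3.size == 3 * 4 * 5)                # True
print(x3.nbytes == x3.size * x3.itemsize)  # True

# A smaller dtype reduces the memory footprint without changing the shape
x3_small = x3.astype(np.int8)
print(x3_small.nbytes)                     # 60
```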


### Array Indexing: Accessing Single Elements

1. [x] Attributes of arrays: Determining the dimension, shape, size, memory consumption, and data types of arrays.
2. [ ] Indexing of arrays: Getting and setting the values of individual array elements
3. [ ] Slicing of arrays: Getting and setting smaller subarrays within a larger array
4. [ ] Reshaping of a given array: Changing the shape of a given array
5. [ ] Joining and splitting of arrays: Combining multiple arrays into one, and splitting one array into many

In [49]:
x1

array([3, 7, 5, 5, 0, 1])

In [50]:
x1[0] # square brackets access an individual array element

3

In [51]:
x1[4]

0

In [52]:
x1[-1] # negative indices count from the end of the array

1

In [53]:
x1[-2]

0

In [54]:
x2

array([[5, 9, 3, 0],
       [5, 0, 1, 2],
       [4, 2, 0, 3]])

In [55]:
# In a multidimensional array, use a comma-separated tuple of indices
# The index before the comma selects the row, and the index after it selects the column
x2[0,0]

5

In [59]:
x2[2,1]

2

In [57]:
x2[2,-1]

3

In [61]:
# Indexing can be used to modify the value of an item

x2[0,3] = 12

In [62]:
x2

array([[ 5,  9,  3, 12],
       [ 5,  0,  1,  2],
       [ 4,  2,  0,  3]])

In [64]:
x1

array([3, 7, 5, 5, 0, 1])

In [65]:
x1[0] = 3.14159 # assigning a float to an integer array silently truncates the value

In [66]:
x1

array([3, 7, 5, 5, 0, 1])
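
A short sketch of the truncation, and of one way to avoid it by converting the array to a floating-point dtype first (names are illustrative):

```python
import numpy as np

x = np.array([3, 7, 5])
x[0] = 3.14159   # stored as 3
x[1] = -2.7      # stored as -2: the fractional part is dropped, not rounded
print(x)

# Converting to float first keeps the fractional part
xf = x.astype(float)
xf[0] = 3.14159
print(xf)
```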

### Array Slicing: Accessing Subarrays

1. [x] Attributes of arrays: Determining the dimension, shape, size, memory consumption, and data types of arrays.
2. [x] Indexing of arrays: Getting and setting the values of individual array elements
3. [ ] Slicing of arrays: Getting and setting smaller subarrays within a larger array
4. [ ] Reshaping of a given array: Changing the shape of a given array
5. [ ] Joining and splitting of arrays: Combining multiple arrays into one, and splitting one array into many

- **Use square brackets with slice notation, marked by the colon (:) character, to access subarrays**

- **x[start:stop:step]**
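
Any of `start`, `stop`, and `step` may be omitted. One way to see which defaults are filled in is Python's built-in `slice` object, whose `indices` method resolves a slice against a given length. A small sketch:

```python
import numpy as np

x = np.arange(10)

# slice.indices(length) returns the concrete (start, stop, step) that will be used
print(slice(None, 5).indices(len(x)))         # (0, 5, 1)
print(slice(5, None).indices(len(x)))         # (5, 10, 1)
print(slice(None, None, -1).indices(len(x)))  # (9, -1, -1)

# x[start:stop:step] is equivalent to indexing with a slice object
same = np.array_equal(x[1:8:2], x[slice(1, 8, 2)])
print(same)                                   # True
```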

**a. Accessing subarrays in one dimension**


In [4]:
x = np.arange(10)

In [5]:
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [6]:
# first five elements
x[:5]

array([0, 1, 2, 3, 4])

In [7]:
# elements from index 5 onward
x[5:]

array([5, 6, 7, 8, 9])

In [8]:
# middle subarray (indices 4 to 6)
x[4:7]

array([4, 5, 6])

In [9]:
# every other element
x[::2]

array([0, 2, 4, 6, 8])

In [10]:
# every other element starting at index 1
x[1::2]

array([1, 3, 5, 7, 9])

In [12]:
# every other element starting at index 1 and stop before index 8
x[1:8:2]

array([1, 3, 5, 7])

In [13]:
# reverse the elements by using a step of -1

x[::-1]

array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])

In [14]:
# every other element in reverse order, starting from index 5

x[5::-2]

array([5, 3, 1])

**b. Access Multidimensional subarrays**

In [17]:
# Multidimensional slices work the same way: multiple slices separated by commas
x2

array([[9, 7, 9, 7],
       [2, 3, 3, 4],
       [0, 3, 7, 5]])

In [19]:
# two rows, three columns
x2[:2,:3]  # start defaults to 0 and step defaults to 1

array([[9, 7, 9],
       [2, 3, 3]])

In [20]:
# all rows, every other column

x2[:3, ::2]

array([[9, 9],
       [2, 3],
       [0, 7]])

In [24]:
x2

array([[9, 7, 9, 7],
       [2, 3, 3, 4],
       [0, 3, 7, 5]])

In [22]:
x2[::-1, :] # reverse the order of the rows

array([[0, 3, 7, 5],
       [2, 3, 3, 4],
       [9, 7, 9, 7]])

In [23]:
x2[:, ::-1] # reverse the order of the columns

array([[7, 9, 7, 9],
       [4, 3, 3, 2],
       [5, 7, 3, 0]])

In [25]:
# reversing rows and columns
x2[::-1, ::-1]

array([[5, 7, 3, 0],
       [4, 3, 3, 2],
       [7, 9, 7, 9]])

**c. Accessing array rows and columns**

In [28]:
x2

array([[9, 7, 9, 7],
       [2, 3, 3, 4],
       [0, 3, 7, 5]])

In [26]:
# all rows, first column (combine slice notation with an index)

x2[:,0]

array([9, 2, 0])

In [27]:
# first row, all columns
x2[0,:]

array([9, 7, 9, 7])

In [29]:
x2[0] # is equivalent to x2[0,:]

array([9, 7, 9, 7])

**d. Subarrays as no-copy views**
- Array slices return views rather than copies of the array data, so modifying a slice modifies the original array

In [31]:
print(x2)

[[9 7 9 7]
 [2 3 3 4]
 [0 3 7 5]]


In [32]:
# extract 2x2 subarray from x2

x2_sub = x2[:2,:2]

In [33]:
x2_sub

array([[9, 7],
       [2, 3]])

In [34]:
x2_sub[0,0] = 99

In [35]:
x2_sub

array([[99,  7],
       [ 2,  3]])

In [36]:
x2

array([[99,  7,  9,  7],
       [ 2,  3,  3,  4],
       [ 0,  3,  7,  5]])

**e. Creating copies of arrays**
- Use the copy() method to get an independent copy of the data

In [42]:
x2_sub_copy = x2[:2,:2].copy()

In [43]:
x2_sub_copy

array([[99,  7],
       [ 2,  3]])

In [44]:
x2_sub_copy[0,0] = 42

In [45]:
x2_sub_copy

array([[42,  7],
       [ 2,  3]])

In [46]:
x2

array([[99,  7,  9,  7],
       [ 2,  3,  3,  4],
       [ 0,  3,  7,  5]])
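
Whether two arrays share the same underlying data can be checked with `np.shares_memory`. A minimal sketch contrasting a view with a copy (the array here is a fresh example, not the `x2` above):

```python
import numpy as np

x2 = np.arange(12).reshape(3, 4)

view = x2[:2, :2]          # a slice: shares data with x2
copy = x2[:2, :2].copy()   # an independent copy

print(np.shares_memory(x2, view))  # True
print(np.shares_memory(x2, copy))  # False

view[0, 0] = 99   # changes x2 as well
copy[0, 0] = 42   # leaves x2 untouched
print(x2[0, 0])   # 99
```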

### Reshaping of Arrays

1. [x] Attributes of arrays: Determining the dimension, shape, size, memory consumption, and data types of arrays.
2. [x] Indexing of arrays: Getting and setting the values of individual array elements
3. [x] Slicing of arrays: Getting and setting smaller subarrays within a larger array
4. [ ] Reshaping of a given array: Changing the shape of a given array
5. [ ] Joining and splitting of arrays: Combining multiple arrays into one, and splitting one array into many

**- Use the reshape() method to change the shape of an array (the total number of elements must stay the same)**

In [48]:
grid = np.arange(1,10).reshape(3,3)

In [50]:
print(grid)

[[1 2 3]
 [4 5 6]
 [7 8 9]]
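
Two details of `reshape` worth seeing in action: a dimension given as `-1` is inferred from the array size, and a shape whose total size does not match raises a `ValueError`. A small sketch (the flag `error_raised` is only for illustration):

```python
import numpy as np

# -1 asks NumPy to infer that dimension from the total number of elements
grid = np.arange(1, 10).reshape(3, -1)
print(grid.shape)  # (3, 3)

# Nine elements cannot fill a 2x5 array
error_raised = False
try:
    np.arange(1, 10).reshape(2, 5)
except ValueError as err:
    error_raised = True
    print("ValueError:", err)
```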


**- Conversion of one-dimensional array into a two-dimensional row or column matrix**

In [51]:
x = np.array([1,2,3])

In [52]:
x

array([1, 2, 3])

In [53]:
x.reshape(1,3) # row vector via reshape

array([[1, 2, 3]])

In [54]:
x[np.newaxis, :] # row vector via newaxis

array([[1, 2, 3]])

In [55]:
x.reshape(3,1) # column vector via reshape

array([[1],
       [2],
       [3]])

In [56]:
x[:, np.newaxis] # column vector via newaxis

array([[1],
       [2],
       [3]])
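
`np.newaxis` is an alias for `None`, and `np.expand_dims` offers a function-call spelling of the same operation. A brief sketch showing that the variants agree:

```python
import numpy as np

x = np.array([1, 2, 3])

print(np.newaxis is None)        # True

row = x[None, :]                 # same as x[np.newaxis, :]
col = np.expand_dims(x, axis=1)  # same as x[:, np.newaxis]

print(row.shape)  # (1, 3)
print(col.shape)  # (3, 1)
```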

### Array Concatenation and Splitting

1. [x] Attributes of arrays: Determining the dimension, shape, size, memory consumption, and data types of arrays.
2. [x] Indexing of arrays: Getting and setting the values of individual array elements
3. [x] Slicing of arrays: Getting and setting smaller subarrays within a larger array
4. [x] Reshaping of a given array: Changing the shape of a given array
5. [ ] Joining and splitting of arrays: Combining multiple arrays into one, and splitting one array into many

**a. Concatenation of arrays**
- By using np.concatenate, np.vstack, and np.hstack

In [85]:
# One-dimensional array

x = np.array([1,2,3])
y = np.array([3,2,1])

In [86]:
np.concatenate([x, y])

array([1, 2, 3, 3, 2, 1])

In [87]:
# Concatenate more than two arrays at once

z = np.array([99,99,99])

In [88]:
np.concatenate([x, y, z])

array([ 1,  2,  3,  3,  2,  1, 99, 99, 99])

In [69]:
# Concatenation for two-dimensional array

grid = np.arange(1,7).reshape(2,3)

In [70]:
grid

array([[1, 2, 3],
       [4, 5, 6]])

In [71]:
np.concatenate([grid, grid]) # along axis=0 (the default): stacks the rows

array([[1, 2, 3],
       [4, 5, 6],
       [1, 2, 3],
       [4, 5, 6]])

In [72]:
np.concatenate([grid, grid], axis=1) # along axis=1: joins the columns side by side

array([[1, 2, 3, 1, 2, 3],
       [4, 5, 6, 4, 5, 6]])

**- Arrays of mixed dimensions can be combined more clearly with np.vstack and np.hstack**

In [74]:
x

array([1, 2, 3])

In [75]:
grid

array([[1, 2, 3],
       [4, 5, 6]])

In [76]:
np.vstack([x, grid]) # Vertically stack the arrays

array([[1, 2, 3],
       [1, 2, 3],
       [4, 5, 6]])

In [78]:
y = np.array([[99], [99]])

In [79]:
y

array([[99],
       [99]])

In [80]:
np.hstack([y, grid]) # horizontally stack the arrays

array([[99,  1,  2,  3],
       [99,  4,  5,  6]])
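
Why `np.vstack` helps with mixed dimensions: `np.concatenate` requires all inputs to have the same number of dimensions. The sketch below shows the error and an equivalent explicit form (the flag `error_raised` is only for illustration):

```python
import numpy as np

x = np.array([1, 2, 3])
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

# A 1-D and a 2-D array cannot be concatenated directly
error_raised = False
try:
    np.concatenate([x, grid])
except ValueError as err:
    error_raised = True
    print("ValueError:", err)

# Promoting x to a 1x3 row first gives the same result as np.vstack
stacked = np.concatenate([x[np.newaxis, :], grid], axis=0)
print(np.array_equal(stacked, np.vstack([x, grid])))  # True
```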

**b. Splitting of arrays**
- Use the functions np.split, np.vsplit, and np.hsplit
- Pass a list of indices giving the split points

In [89]:
x = [1,2,3,99,99, 3,2,1]

In [90]:
x

[1, 2, 3, 99, 99, 3, 2, 1]

In [93]:
x1,x2,x3 = np.split(x,[3,6]) # N split points leads to N+1 subarrays

In [94]:
print(x1,x2,x3)

[1 2 3] [99 99  3] [2 1]


In [95]:
# vertical split

grid = np.arange(16).reshape(4,4)
print(grid)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]


In [96]:
upper, lower = np.vsplit(grid,[2])

In [97]:
print(upper)
print(lower)

[[0 1 2 3]
 [4 5 6 7]]
[[ 8  9 10 11]
 [12 13 14 15]]


In [98]:
# horizontal split

left, right = np.hsplit(grid,[2])

In [99]:
print(left)
print(right)

[[ 0  1]
 [ 4  5]
 [ 8  9]
 [12 13]]
[[ 2  3]
 [ 6  7]
 [10 11]
 [14 15]]
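
Besides a list of split points, `np.split` also accepts an integer number of equal sections, and `np.array_split` tolerates sections of unequal size. A short sketch (the flag `error_raised` is only for illustration):

```python
import numpy as np

# An integer asks for that many equal-sized pieces
thirds = np.split(np.arange(9), 3)
print(thirds)

# np.array_split allows an uneven division: here the pieces have sizes 3, 3, 2
uneven = np.array_split(np.arange(8), 3)
print([piece.size for piece in uneven])  # [3, 3, 2]

# np.split refuses a division that is not exact
error_raised = False
try:
    np.split(np.arange(8), 3)
except ValueError as err:
    error_raised = True
    print("ValueError:", err)
```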


### Summary: Basics of NumPy Arrays

1. [x] Attributes of arrays: Determining the dimension, shape, size, memory consumption, and data types of arrays.
2. [x] Indexing of arrays: Getting and setting the values of individual array elements
3. [x] Slicing of arrays: Getting and setting smaller subarrays within a larger array
4. [x] Reshaping of a given array: Changing the shape of a given array
5. [x] Joining and splitting of arrays: Combining multiple arrays into one, and splitting one array into many

## Computation on NumPy Arrays: Universal Functions

In [100]:
# NumPy's universal functions (ufuncs) mirror Python's native arithmetic operators
# (addition, subtraction, multiplication, division) and apply them element-wise to arrays

In [101]:
x = np.arange(4)
print(x)


[0 1 2 3]


In [104]:
print("x = ", x)
print("x + 5 =", x + 5)
print("x -  5 =", x - 5)
print("x * 2 =", x * 2)
print("x / 2 =", x / 2)
print( "x // 2 =", x // 2)

x =  [0 1 2 3]
x + 5 = [5 6 7 8]
x -  5 = [-5 -4 -3 -2]
x * 2 = [0 2 4 6]
x / 2 = [0.  0.5 1.  1.5]
x // 2 = [0 0 1 1]


In [107]:
print("-x = ", -x) # negation
print("x ** 2 = ", x ** 2) # exponentiation
print("x % 2 =", x % 2) # Modulus

-x =  [ 0 -1 -2 -3]
x ** 2 =  [0 1 4 9]
x % 2 = [0 1 0 1]


In [109]:
np.add(x, 2)

array([2, 3, 4, 5])

In [110]:
np.subtract(x, 3)

array([-3, -2, -1,  0])

In [111]:
np.negative(x)

array([ 0, -1, -2, -3])

In [112]:
np.multiply(x, 4)

array([ 0,  4,  8, 12])

In [113]:
np.divide(x, 2)

array([0. , 0.5, 1. , 1.5])

In [114]:
np.floor_divide(x, 2)

array([0, 0, 1, 1])

In [115]:
np.power(x, 2)

array([0, 1, 4, 9])

In [116]:
np.mod(x, 2)

array([0, 1, 0, 1])
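
The arithmetic operators are wrappers around the named ufuncs, so both spellings give the same result. The sketch below also shows the `out` argument, which writes the result into an existing array, and a chained expression that follows the usual order of operations:

```python
import numpy as np

x = np.arange(4)

print(np.array_equal(x + 2, np.add(x, 2)))     # True
print(np.array_equal(x ** 2, np.power(x, 2)))  # True

# Write the result into a preallocated array instead of creating a new one
out = np.empty(4, dtype=x.dtype)
np.multiply(x, 10, out=out)
print(out)

# Operators can be combined; ** binds more tightly than the unary minus
result = -(0.5 * x + 1) ** 2
print(result)
```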

In [117]:
# Absolute function

x = np.array([-2, -1, 0, 1, 2])

In [118]:
abs(x)

array([2, 1, 0, 1, 2])

In [119]:
np.abs(x)

array([2, 1, 0, 1, 2])

In [120]:
np.absolute(x) # np.abs is the shorthand function

array([2, 1, 0, 1, 2])

**- Logarithms are available as the natural log (np.log), base-2 (np.log2), and base-10 (np.log10)**

In [121]:
x = [1, 2, 4, 10]

In [124]:
print("ln(x)",np.log(x))
print("log2(x)",np.log2(x))
print("log10(x)",np.log10(x))

ln(x) [0.         0.69314718 1.38629436 2.30258509]
log2(x) [0.         1.         2.         3.32192809]
log10(x) [0.         0.30103    0.60205999 1.        ]
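
For other bases, the change-of-base identity can be applied to `np.log`. A minimal sketch, using base 3 as an arbitrary example:

```python
import numpy as np

x = np.array([1, 2, 4, 10])

# log_b(x) = ln(x) / ln(b)
log3 = np.log(x) / np.log(3)
print(log3)

# The same identity reproduces np.log2
matches_log2 = np.allclose(np.log2(x), np.log(x) / np.log(2))
print(matches_log2)  # True
```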


## Aggregations: Min, Max and others

- One-dimensional aggregate operations

In [125]:
L = np.random.random(100) 

In [126]:
sum(L) # Python's built-in aggregation function

46.49012388313727

In [127]:
np.sum(L) # NumPy's aggregation function

46.490123883137294

In [129]:
%timeit sum(L)
%timeit np.sum(L)

8.33 µs ± 30.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
2.66 µs ± 21.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [130]:
big_array = np.random.random(1000000)

In [131]:
%timeit sum(big_array)
%timeit np.sum(big_array)

78.6 ms ± 382 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
193 µs ± 929 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [133]:
# Python built-in min and max
print(min(L))
print(max(L))

0.004377978446895026
0.9999707926217818


In [134]:
# NumPy built-in min and max

print(np.min(L))
print(np.max(L))

0.004377978446895026
0.9999707926217818


In [138]:
min(L), max(L), sum(L)

(0.004377978446895026, 0.9999707926217818, 46.49012388313727)

In [137]:
# NumPy aggregates short syntax

print(big_array.min(), big_array.max(), big_array.sum())

1.0202525901892301e-07 0.9999989458789282 499577.5280977442


**- Multi-dimensional aggregates**
- Aggregation along rows or columns

In [3]:
M = np.random.random((3,4))

In [4]:
M

array([[0.43894429, 0.80216422, 0.07477117, 0.04854035],
       [0.48822921, 0.63274277, 0.080721  , 0.54700134],
       [0.71298675, 0.35752035, 0.28493925, 0.21538613]])

In [6]:
sum(M) # Python's built-in sum iterates over the rows, so it returns the column sums

array([1.64016025, 1.79242734, 0.44043142, 0.81092782])

In [5]:
M.sum() # by default it will return the aggregate over the entire array

4.68394683342652

**- The axis keyword can be confusing**
- It specifies the dimension of the array that will be collapsed
- It does not specify the dimension that will be returned
- For a two-dimensional array, axis=0 collapses the rows and gives one result per column

In [7]:
M.min(axis = 0) # find the min values within each column by specifying axis=0

array([0.43894429, 0.35752035, 0.07477117, 0.04854035])

In [144]:
M.max(axis = 1) # find the max values within each row by specifying axis=1

array([0.80216422, 0.63274277, 0.71298675])
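
Checking the shape of the result makes the "collapsed dimension" idea concrete. A small sketch with a 3x4 array of known values (not the random `M` above):

```python
import numpy as np

M = np.arange(12).reshape(3, 4)

print(M.sum(axis=0).shape)                 # (4,)  axis 0 (rows) collapsed
print(M.sum(axis=1).shape)                 # (3,)  axis 1 (columns) collapsed
print(M.sum(axis=1, keepdims=True).shape)  # (3, 1) collapsed axis kept with size 1

# Python's built-in sum on a 2-D array behaves like a sum over axis 0
print(np.array_equal(sum(M), M.sum(axis=0)))  # True
```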

## Computation on Arrays: Broadcasting

**- NumPy universal functions can be used to vectorize operations**
- Another means of vectorizing operations is to use NumPy's broadcasting functionality
- Broadcasting is simply a set of rules for applying binary ufuncs (addition, subtraction, multiplication, etc.) to arrays of different sizes.

In [2]:
# Binary operation on arrays of the same size (performed element by element)

a = np.array([0, 1, 2])
b = np.array([5, 5, 5])

In [3]:
a + b

array([5, 6, 7])

**- What happens to a binary operation when the arrays have different sizes?**


### Rules of Broadcasting
- Rule 1: If the two arrays differ in their number of dimensions, the shape of the
one with fewer dimensions is padded with ones on its leading (left) side.

- Rule 2: If the shape of the two arrays does not match in any dimension, the array
with shape equal to 1 in that dimension is stretched to match the other shape.

- Rule 3: If in any dimension the sizes disagree and neither is equal to 1, an error is
raised.
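
The three rules can be written out as a small function on shape tuples. This is only a sketch: `broadcast_shape` is a hypothetical helper written for illustration, not part of NumPy, and it handles exactly two shapes.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: apply the three broadcasting rules to two shapes."""
    ndim = max(len(shape_a), len(shape_b))
    # Rule 1: pad the shorter shape with ones on its left side
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    for da, db in zip(a, b):
        if da == db or db == 1:
            result.append(da)  # sizes agree, or b is stretched (Rule 2)
        elif da == 1:
            result.append(db)  # a is stretched (Rule 2)
        else:
            # Rule 3: sizes disagree and neither is 1
            raise ValueError(f"shapes {shape_a} and {shape_b} are not compatible")
    return tuple(result)

print(broadcast_shape((3, 3), (3,)))   # (3, 3)
print(broadcast_shape((3,), (3, 1)))   # (3, 3)

incompatible = False
try:
    broadcast_shape((3, 2), (3,))
except ValueError as err:
    incompatible = True
    print("ValueError:", err)
```

The same three shape pairs appear in the examples of this section, so the printed results can be compared with what NumPy actually does.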

In [4]:
a + 5 # adding a scalar value

array([5, 6, 7])

- The case where only one of the arrays needs to be broadcast

In [5]:
M = np.ones((3,3))

In [6]:
M

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

In [8]:
M.shape

(3, 3)

In [9]:
a.shape # a has fewer dimensions than M, so only a is broadcast

(3,)

In [7]:
M + a

array([[1., 2., 3.],
       [1., 2., 3.],
       [1., 2., 3.]])

- Broadcasting of both the arrays

In [10]:
a = np.arange(3)
b = np.arange(3)[:,np.newaxis]

In [12]:
print(a.shape)
print(b.shape)

(3,)
(3, 1)


In [13]:
print(a)
print(b)

[0 1 2]
[[0]
 [1]
 [2]]


In [14]:
a + b

array([[0, 1, 2],
       [1, 2, 3],
       [2, 3, 4]])

- Example when two arrays are not compatible

In [17]:
M = np.ones((3,2))
a = np.arange(3)

In [18]:
print(M.shape)
print(a.shape)

(3, 2)
(3,)


In [19]:
M + a

ValueError: operands could not be broadcast together with shapes (3,2) (3,) 

In [20]:
# Solution: reshape a into a column of shape (3, 1) so that it can be stretched along the columns of M

a[:,np.newaxis].shape

(3, 1)

In [23]:
M

array([[1., 1.],
       [1., 1.],
       [1., 1.]])

In [24]:
a

array([0, 1, 2])

In [22]:
M + a[:,np.newaxis]

array([[1., 1.],
       [2., 2.],
       [3., 3.]])
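
A common place where this matters is subtracting per-row or per-column statistics. The sketch below centers a small example array both ways. Column means broadcast directly, while row means need the extra axis, either via `np.newaxis` or `keepdims=True`:

```python
import numpy as np

X = np.arange(6, dtype=float).reshape(3, 2)

# Column means have shape (2,), which broadcasts against (3, 2) as is
col_centered = X - X.mean(axis=0)

# Row means have shape (3,), so they must become (3, 1) before subtracting
row_centered = X - X.mean(axis=1)[:, np.newaxis]
row_centered_alt = X - X.mean(axis=1, keepdims=True)

print(col_centered.mean(axis=0))  # close to zero in every column
print(row_centered.mean(axis=1))  # close to zero in every row
```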