<a href="https://www.viharatech.com"><img src="https://miro.medium.com/v2/resize:fit:828/format:webp/1*on2Bpj6A5gfd-CzaFOAZRA.jpeg"></a>

<p style="font-family: Arial; font-size:3.75em;color:purple; font-style:bold"><br>
Introduction to Numpy:
</p><br>

<p style="font-family: Arial; font-size:1.25em;color:#2462C0; font-style:bold"><br>
Package for scientific computing with Python
</p><br>

Numerical Python, or "NumPy" for short, is a foundational package on which many of the most common data science packages are built.  NumPy provides high-performance multi-dimensional arrays which we can use as vectors or matrices.

The key features of numpy are:

- ndarrays: n-dimensional arrays whose elements all share one data type, which makes them fast and space-efficient.  A number of built-in ndarray methods allow rapid processing of data without explicit loops (e.g., computing the mean).
- Broadcasting: a set of rules that defines how operations behave between arrays of different shapes.
- Vectorization: applies numeric operations to entire ndarrays at once, without explicit Python loops.
- Input/Output: simplifies reading and writing of data from/to file.
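
As a small illustration of the "without using loops" point above, the sketch below computes a mean twice: once with an explicit Python loop and once with the built-in `mean` method. The array `values` is made up for this example.

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0])   # example data, chosen arbitrarily

# Loop version: accumulate the sum in Python, one element at a time
total = 0.0
for v in values:
    total += v
loop_mean = total / len(values)

# Vectorized version: a single built-in method call, no explicit loop
vectorized_mean = values.mean()

print(loop_mean, vectorized_mean)   # both are 2.5
```

On large arrays the vectorized call is usually much faster, because the loop runs in compiled code rather than in the Python interpreter.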

<b>Additional Recommended Resources:</b><br>
<a href="https://docs.scipy.org/doc/numpy/reference/">Numpy Documentation</a><br>
<i>Python for Data Analysis</i> by Wes McKinney<br>
<i>Python Data Science Handbook</i> by Jake VanderPlas



<p style="font-family: Arial; font-size:2.75em;color:purple; font-style:bold"><br>

Getting started with ndarray<br><br></p>

**ndarrays** are time- and space-efficient multidimensional arrays at the core of numpy.  As with the data structures in Week 2, let's get started by creating ndarrays using the numpy package.

<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>

How to create Rank 1 numpy arrays:
</p>

In [7]:
import numpy as np
a = np.array([[1,2,3,4] , [10,20,30,40] , [90,100,110,120]])
print(a)
print(a.ndim)  # number of dimensions
print('shape of the array  = ' , a.shape) # shape of the array 
print('size of the array = ', a.size)

[[  1   2   3   4]
 [ 10  20  30  40]
 [ 90 100 110 120]]
2
shape of the array  =  (3, 4)
size of the array =  12


In [17]:
a = np.arange(1,82).reshape(3,3,9)
a[2][1][5]

69
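
The chained form `a[2][1][5]` used above indexes one dimension at a time. A sketch of the equivalent single-step form, which passes all indices at once, might look like this (the name `cube` is chosen only for this example):

```python
import numpy as np

cube = np.arange(1, 82).reshape(3, 3, 9)   # values 1..81 as 3 blocks of 3 x 9

chained = cube[2][1][5]    # three separate indexing steps
single = cube[2, 1, 5]     # one indexing step with a tuple of indices

print(chained, single)                     # both are 69
print(cube.ndim, cube.shape, cube.size)    # 3 (3, 3, 9) 81
```

The single-step form is generally preferred because it avoids creating the intermediate sub-arrays.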

In [1]:
import numpy as np

an_array = np.array([3, 33, 333])  # Create a rank 1 array

print(type(an_array))              # The type of an ndarray is: "<class 'numpy.ndarray'>"

<class 'numpy.ndarray'>


In [3]:
print(an_array)

[  3  33 333]


In [2]:
# test the shape of the array we just created, it should have just one dimension (Rank 1)
print(an_array.shape)

(3,)


In [4]:
# because this is a rank 1 array, we need only one index to access each element
print(an_array[0], an_array[1], an_array[2]) 

3 33 333


In [5]:
an_array[0] = 888           # ndarrays are mutable, here we change an element of the array

print(an_array)

[888  33 333]


<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>

How to create a Rank 2 numpy array:</p>

A rank 2 **ndarray** is one with two dimensions.  Notice the format below of [ [row] , [row] ].  Two-dimensional arrays are well suited to representing matrices, which are often useful in data science.

In [18]:
another = np.array([[11,12,13],[21,22,23]])   # Create a rank 2 array

print(another)  # print the array

print("The shape is 2 rows, 3 columns: ", another.shape)  # rows x columns                   

print("Accessing elements [0,0], [0,1], and [1,0] of the ndarray: ", another[0, 0], ", ",another[0, 1],", ", another[1, 0])

[[11 12 13]
 [21 22 23]]
The shape is 2 rows, 3 columns:  (2, 3)
Accessing elements [0,0], [0,1], and [1,0] of the ndarray:  11 ,  12 ,  21


<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>

There are many ways to create numpy arrays:
</p>

Here we create several arrays with different shapes and different pre-filled values.  numpy has a number of built-in functions which help us quickly and easily create multidimensional arrays.

In [19]:
import numpy as np

# create a 2x2 array of zeros
ex1 = np.zeros((2,2) , dtype = int)      
print(ex1)                              

[[0 0]
 [0 0]]


## full

In [21]:
# create a 2x2 array filled with 9.0
ex2 = np.full((2,2), 9 , dtype = float)  
print(ex2)   

[[9. 9.]
 [9. 9.]]


In [23]:
# create a 2x2 matrix with the diagonal 1s and the others 0
ex3 = np.eye(2,2 , dtype = int)
print(ex3)  

[[1 0]
 [0 1]]


In [27]:
# create an array of ones
import numpy as np
ex4 = np.ones((1,2) , dtype = int)
print(ex4)    

[[1 1]]


In [28]:
# notice that the above ndarray (ex4) is actually rank 2, it is a 1x2 array (1 row, 2 columns)
print(ex4.shape)

# which means we need to use two indexes to access an element
print()
print(ex4[0,1])

(1, 2)

1


In [3]:
# create an array of random floats between 0 and 1
ex5 = np.random.random((2,2))
print(ex5)    

[[0.27871835 0.79315494]
 [0.57107792 0.87123751]]
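
A few more creation helpers are sketched below. It assumes NumPy 1.17 or newer for `np.random.default_rng`; the seed value and variable names are arbitrary choices for this example.

```python
import numpy as np

# evenly spaced integers: start (inclusive), stop (exclusive), step
evens = np.arange(0, 10, 2)
print(evens)                 # [0 2 4 6 8]

# a fixed number of evenly spaced points, both endpoints included
points = np.linspace(0.0, 1.0, 5)
print(points)                # [0.   0.25 0.5  0.75 1.  ]

# seeded random generator: the same seed reproduces the same values
rng = np.random.default_rng(seed=0)    # seed 0 is an arbitrary choice
ex_random = rng.random((2, 2))         # floats in the interval [0, 1)
print(ex_random.shape)       # (2, 2)
```

Seeding the generator is useful when a notebook has to produce the same "random" numbers every time it runs.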


<p style="font-family: Arial; font-size:2.75em;color:purple; font-style:bold"><br>

Array Indexing
<br><br></p>

<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>
Slice indexing:
</p>

Similar to the use of slice indexing with lists and strings, we can use slice indexing to pull out sub-regions of ndarrays.

In [29]:
import numpy as np

# Rank 2 array of shape (3, 4)
an_array = np.array([[11,12,13,14], [21,22,23,24], [31,32,33,34]])
print(an_array)

[[11 12 13 14]
 [21 22 23 24]
 [31 32 33 34]]


In [39]:
a = an_array[2 , : 3]
b = a[-1]
print(a)
print(b)

[31 32 33]
33


Use array slicing to get a 2 x 2 subarray consisting of the first 2 rows and the columns at index 1 and 2.

In [41]:
a_slice = an_array[:2, 1:3]  # rows 0 and 1 (stop index 2 is excluded), columns 1 and 2 (stop index 3 is excluded)
print(a_slice)

[[12 13]
 [22 23]]


In [42]:
## without slicing 

an_array[2][2]


33

When you modify a slice, you actually modify the underlying array.

In [43]:
an_array

array([[11, 12, 13, 14],
       [21, 22, 23, 24],
       [31, 32, 33, 34]])

In [44]:
a_slice

array([[12, 13],
       [22, 23]])

In [45]:
an_array

array([[11, 12, 13, 14],
       [21, 22, 23, 24],
       [31, 32, 33, 34]])

In [46]:
print("Before:", an_array[0, 1])   #inspect the element at 0, 1  
a_slice[0, 0] = 1000    # a_slice[0, 0] is the same piece of data as an_array[0, 1]
print("After:", an_array[0, 1])    

Before: 12
After: 1000
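
If the original array must stay unchanged, one option is to take an explicit copy of the slice. The sketch below contrasts a view with a copy; the names `view_slice` and `copy_slice` are made up for this example.

```python
import numpy as np

an_array = np.array([[11, 12, 13, 14], [21, 22, 23, 24], [31, 32, 33, 34]])

view_slice = an_array[:2, 1:3]          # a view: shares memory with an_array
copy_slice = an_array[:2, 1:3].copy()   # an independent copy

copy_slice[0, 0] = 1000                 # changes only the copy
print(an_array[0, 1])                   # 12

view_slice[0, 0] = 2000                 # changes the underlying array too
print(an_array[0, 1])                   # 2000

print(np.shares_memory(an_array, view_slice))   # True
print(np.shares_memory(an_array, copy_slice))   # False
```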


<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>

Use both integer indexing & slice indexing
</p>

We can use combinations of integer indexing and slice indexing to create different shaped matrices.

In [47]:
# Create a Rank 2 array of shape (3, 4)
an_array = np.array([[11,12,13,14], [21,22,23,24], [31,32,33,34]])
print(an_array)
an_array.shape

[[11 12 13 14]
 [21 22 23 24]
 [31 32 33 34]]


(3, 4)

In [25]:
# Using both integer indexing & slicing generates an array of lower rank
row_rank1 = an_array[1, :]    # Rank 1 view 

print(row_rank1, row_rank1.shape)  # notice only a single []

[21 22 23 24] (4,)


In [26]:
# Slicing alone: generates an array of the same rank as an_array
row_rank2 = an_array[1:2, :]  # Rank 2 view 

print(row_rank2, row_rank2.shape)   # Notice the [[ ]]

[[21 22 23 24]] (1, 4)


In [27]:
#We can do the same thing for columns of an array:

print()
col_rank1 = an_array[:, 1]
col_rank2 = an_array[:, 1:2]

print(col_rank1, col_rank1.shape)  # Rank 1
print()
print(col_rank2, col_rank2.shape)  # Rank 2


[12 22 32] (3,)

[[12]
 [22]
 [32]] (3, 1)


<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>

Array Indexing for changing elements:
</p>

Sometimes it's useful to use an array of indexes to access or change elements.

In [None]:
# Create a new array
an_array = np.array([[11,12,13], [21,22,23], [31,32,33], [41,42,43]])

print('Original Array:')
print(an_array)
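
One way to use index arrays, sketched here with the same 4 x 3 array, is to pick one element from each row and then modify those elements in place. The names `row_indices` and `col_indices` are chosen only for this example.

```python
import numpy as np

an_array = np.array([[11, 12, 13], [21, 22, 23], [31, 32, 33], [41, 42, 43]])

# one column index per row: column 0 for row 0, column 1 for row 1, ...
col_indices = np.array([0, 1, 2, 0])
row_indices = np.arange(4)               # [0 1 2 3]

# pairs up (0,0), (1,1), (2,2), (3,0) and picks one element from each row
selected = an_array[row_indices, col_indices]
print(selected)                          # [11 22 33 41]

# the same index arrays can be used to change those elements in place
an_array[row_indices, col_indices] += 100000
print(an_array)
```

After the update, only the four selected positions differ from the original array.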

<p style="font-family: Arial; font-size:2.75em;color:purple; font-style:bold"><br>
Boolean Indexing

<br><br></p>
<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>

Array Indexing for changing elements:
</p>

In [48]:
# create a 3x2 array
an_array = np.array([[11,12], [21, 22], [31, 32]])
print(an_array)

[[11 12]
 [21 22]
 [31 32]]


In [29]:
# create a filter which will be boolean values for whether each element meets this condition
filter = (an_array > 15)
filter

array([[False, False],
       [ True,  True],
       [ True,  True]])

Notice that the filter is an ndarray of the same shape as an_array. It holds True wherever the corresponding element of an_array is greater than 15 and False wherever the element is less than or equal to 15.

In [30]:
# we can now select just those elements which meet that criteria
print(an_array[filter])

[21 22 31 32]
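
Boolean masks can also be combined and used on the left-hand side of an assignment. The following is a minimal sketch using the same 3 x 2 array; the conditions are arbitrary examples.

```python
import numpy as np

an_array = np.array([[11, 12], [21, 22], [31, 32]])

# the mask can be written inline, without a separate variable
print(an_array[an_array > 15])                            # [21 22 31 32]

# combine conditions with & (and) and | (or); the parentheses are required
big_and_even = an_array[(an_array > 15) & (an_array % 2 == 0)]
print(big_and_even)                                       # [22 32]

# assigning through a mask changes only the selected elements
an_array[an_array % 2 == 0] += 100
print(an_array)
```

The final print shows that only the even elements were increased by 100.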


<p style="font-family: Arial; font-size:2.75em;color:purple; font-style:bold"><br>

Datatypes and Array Operations
<br><br></p>

<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>

Datatypes:
</p>

In [4]:
ex1 = np.array([11, 12]) # NumPy infers the data type; the default integer width is platform-dependent (int32 here, int64 on many systems)
print(ex1.dtype)

int32


In [5]:
ex2 = np.array([11.0, 12.0]) # NumPy infers the data type
print(ex2.dtype)

float64


In [6]:
ex3 = np.array([11, 21], dtype=np.int64) # You can also specify the data type explicitly
print(ex3.dtype)

int64


In [7]:
# you can use this to force floats into integers (the fractional part is truncated toward zero, which matches floor only for non-negative values)
ex4 = np.array([11.1,12.7], dtype=np.int64)
print(ex4.dtype)
print()
print(ex4)

int64

[11 12]


In [8]:
# you can use this to force integers into floats if you anticipate
# the values may change to floats later
ex5 = np.array([11, 21], dtype=np.float64)
print(ex5.dtype)
print()
print(ex5)

float64

[11. 21.]
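
An existing array can also be converted with `astype`. The sketch below, using made-up values, shows how plain conversion to an integer type differs from flooring or rounding first, which matters for negative numbers.

```python
import numpy as np

floats = np.array([11.1, 12.7, -12.7])        # example values, chosen arbitrarily

truncated = floats.astype(np.int64)           # drops the fractional part
floored = np.floor(floats).astype(np.int64)   # rounds down first
rounded = np.rint(floats).astype(np.int64)    # rounds to the nearest integer first

print(truncated)   # [ 11  12 -12]
print(floored)     # [ 11  12 -13]
print(rounded)     # [ 11  13 -13]
```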


<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>

Arithmetic Array Operations:

</p>

In [53]:
x = np.array([[111,112],[121,122]], dtype=int)  # the np.int alias was removed in NumPy 1.24; the built-in int works everywhere
y = np.array([[211.1,212.1],[221.1,222.1]], dtype=np.float64)

print('X_array = ' , x)
print()
print('y_array = ' , y)

X_array =  [[111 112]
 [121 122]]

y_array =  [[211.1 212.1]
 [221.1 222.1]]


In [32]:
# add
print(x + y)         # The plus sign works
print()
print(np.add(x, y))  # so does the numpy function "add"

[[322.1 324.1]
 [342.1 344.1]]

[[322.1 324.1]
 [342.1 344.1]]


In [33]:
# subtract
print(x - y)
print()
print(np.subtract(x, y))

[[-100.1 -100.1]
 [-100.1 -100.1]]

[[-100.1 -100.1]
 [-100.1 -100.1]]


In [34]:
# multiply
print(x * y)
print()
print(np.multiply(x, y))

[[23432.1 23755.2]
 [26753.1 27096.2]]

[[23432.1 23755.2]
 [26753.1 27096.2]]


In [35]:
# divide
print(x / y)
print()
print(np.divide(x, y))

[[0.52581715 0.52805281]
 [0.54726368 0.54930212]]

[[0.52581715 0.52805281]
 [0.54726368 0.54930212]]


In [54]:
# square root
x = [1,2,3,4]
print(np.sqrt(x))

[1.         1.41421356 1.73205081 2.        ]


In [55]:
# exponent (e ** x)
print(np.exp(x))

[ 2.71828183  7.3890561  20.08553692 54.59815003]


<p style="font-family: Arial; font-size:2.75em;color:purple; font-style:bold"><br>

Statistical Methods, Sorting, and <br> <br> Set Operations:
<br><br>
</p>

<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>

Basic Statistical Operations:
</p>

In [38]:
# set up a random 2 x 5 matrix (standard normal values scaled by 10)
arr = 10 * np.random.randn(2,5)
print(arr)

[[-6.68658506  6.53998555 -7.52995411 11.13502069  7.42649954]
 [-3.67800123 23.70505753  4.43984067  0.64073161 -3.48233117]]


In [39]:
# compute the mean for all elements
print(arr.mean())

3.2510264007736454


In [40]:
# compute the means by row
print(arr.mean(axis = 1))

[2.17699332 4.32505948]


In [41]:
# compute the means by column
print(arr.mean(axis = 0))

[-5.18229314 15.12252154 -1.54505672  5.88787615  1.97208419]


In [42]:
# sum all the elements
print(arr.sum())

32.51026400773645


In [43]:
# compute the median of each row
print(np.median(arr, axis = 1))

[6.53998555 0.64073161]


In [56]:
a = np.array([[1,2,3] , [5,6,7]])
a

array([[1, 2, 3],
       [5, 6, 7]])

In [57]:
a.sum()

24

In [60]:
a.mean()  # sum of the values / Num of observations => 24 / 6

4.0

In [64]:
np.median(a) # even number of values: mean of the two middle values, (3 + 5) / 2; odd number of values: the middle value


4.0

In [65]:
a.sum(axis = 1)

array([ 6, 18])

In [66]:
a.sum(axis = 0)

array([ 6,  8, 10])
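
The `axis` argument names the dimension that is collapsed by the reduction. The sketch below makes that visible through the result shapes, and uses `keepdims=True` so that the row means can be subtracted from each row directly. The name `centered` is made up for this example.

```python
import numpy as np

a = np.array([[1, 2, 3], [5, 6, 7]])

print(a.sum(axis=0).shape)   # (3,)  rows collapsed: one value per column
print(a.sum(axis=1).shape)   # (2,)  columns collapsed: one value per row

# keepdims=True keeps the collapsed axis with length 1, giving shape (2, 1)
row_means = a.mean(axis=1, keepdims=True)
centered = a - row_means     # each row now has mean 0
print(centered)
```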

<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>

Sorting:
</p>


In [44]:
# create a 10 element array of randoms
unsorted = np.random.randn(10)

print(unsorted)

[-0.22340038 -0.15396105  0.16790056 -0.27485611  1.11089076 -0.10363511
  0.66867835  1.19155913 -0.36038517  0.76006259]


In [45]:
# create a copy and sort it (named sorted_arr to avoid shadowing the built-in sorted)
sorted_arr = np.array(unsorted)
sorted_arr.sort()

print(sorted_arr)
print()
print(unsorted)

[-0.36038517 -0.27485611 -0.22340038 -0.15396105 -0.10363511  0.16790056
  0.66867835  0.76006259  1.11089076  1.19155913]

[-0.22340038 -0.15396105  0.16790056 -0.27485611  1.11089076 -0.10363511
  0.66867835  1.19155913 -0.36038517  0.76006259]


In [69]:
a = [1,2,3,4]
a.sort(reverse = True)
a

[4, 3, 2, 1]

In [None]:
# in-place sorting
unsorted.sort() 

print(unsorted)
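
Unlike the `sort` method of a Python list, the `sort` method of an ndarray has no `reverse` argument. A common workaround, sketched here with made-up values, is to sort ascending and then reverse the result. `np.argsort` is shown as well, since it returns the ordering rather than the sorted values.

```python
import numpy as np

values = np.array([30, 10, 40, 20, 50])   # example data, chosen arbitrarily

ascending = np.sort(values)     # returns a sorted copy; values is unchanged
descending = ascending[::-1]    # reversed view of the ascending result
order = np.argsort(values)      # indices that would sort values

print(ascending)    # [10 20 30 40 50]
print(descending)   # [50 40 30 20 10]
print(order)        # [1 3 0 2 4]
print(values)       # [30 10 40 20 50]
```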

<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>

Finding Unique elements:
</p>

In [46]:
array = np.array([1,2,1,4,2,1,4,2])

print(np.unique(array))

[1 2 4]


<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>

Set Operations with np.array data type:
</p>

In [48]:
s1 = np.array(['desk','chair','bulb'])
s2 = np.array(['lamp','bulb','chair'])
print(s1, s2)

['desk' 'chair' 'bulb'] ['lamp' 'bulb' 'chair']


In [49]:
print( np.intersect1d(s1, s2) ) 

['bulb' 'chair']


In [50]:
print( np.union1d(s1, s2) )

['bulb' 'chair' 'desk' 'lamp']


In [52]:
print( np.setdiff1d(s1, s2) )# elements in s1 that are not in s2

['desk']


In [53]:
print( np.isin(s1, s2) )  # for each element of s1: is it also in s2? (np.isin replaces np.in1d, which is deprecated since NumPy 2.0)

[False  True  True]


<p style="font-family: Arial; font-size:2.75em;color:purple; font-style:bold"><br>

Broadcasting:
<br><br>
</p>

Introduction to broadcasting. <br>
For more details, please see: <br>
https://docs.scipy.org/doc/numpy-1.10.1/user/basics.broadcasting.html

In [10]:
import numpy as np

start = np.zeros((4,3))
print(start)

[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]


In [11]:
# create a rank 1 ndarray with 3 values
add_rows = np.array([1, 0, 2])
print(add_rows)

[1 0 2]


In [12]:
y = start + add_rows  # add to each row of 'start' using broadcasting
print(y)

[[1. 0. 2.]
 [1. 0. 2.]
 [1. 0. 2.]
 [1. 0. 2.]]


In [13]:
# create an ndarray which is 4 x 1 to broadcast across columns
add_cols = np.array([[0,1,2,3]])
add_cols = add_cols.T

print(add_cols)

[[0]
 [1]
 [2]
 [3]]


In [14]:
# add to each column of 'start' using broadcasting
y = start + add_cols 
print(y)

[[0. 0. 0.]
 [1. 1. 1.]
 [2. 2. 2.]
 [3. 3. 3.]]


In [15]:
# this will just broadcast in both dimensions
add_scalar = np.array([1])  
print(start+add_scalar)

[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]


Example from the slides:

In [16]:
# create our 3x4 matrix
arrA = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]])
print(arrA)

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]


In [17]:
# create a list of 4 values (NumPy treats it as shape (4,), i.e. a 1 x 4 row, when broadcasting)
arrB = [0,1,0,2]
print(arrB)

[0, 1, 0, 2]


In [18]:
# add the two together using broadcasting
print(arrA + arrB)

[[ 1  3  3  6]
 [ 5  7  7 10]
 [ 9 11 11 14]]
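
The examples above work because the shapes are compatible: NumPy compares shapes from the trailing dimension backwards, and two dimensions are compatible when they are equal or one of them is 1. A minimal sketch for checking the resulting shape, using made-up arrays, could look like this:

```python
import numpy as np

arrA = np.ones((3, 4))

row = np.array([0, 1, 0, 2])          # shape (4,): acts like a 1 x 4 row
col = np.array([[10], [20], [30]])    # shape (3, 1): one value per row

print(np.broadcast(arrA, row).shape)  # (3, 4)
print(np.broadcast(arrA, col).shape)  # (3, 4)
print(np.broadcast(row, col).shape)   # (3, 4)

# shapes (3, 4) and (3,) do not line up: trailing dimensions 4 and 3 differ
try:
    arrA + np.array([1, 2, 3])
except ValueError as err:
    print("cannot broadcast:", err)
```

To add a length-3 vector down the rows, it would first have to be reshaped to (3, 1), as `col` is here.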


<p style="font-family: Arial; font-size:1.3em;color:#2462C0; font-style:bold"><br>

Binary Format:</p>

In [70]:
x = np.array([ 23.23, 24.24] )

In [71]:
np.save('an_array', x)

In [72]:
np.load('an_array.npy')

array([23.23, 24.24])
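
Besides the single-array `.npy` format, NumPy can write plain text files and bundle several arrays into one `.npz` archive. The sketch below writes into a temporary directory so that it leaves no files behind; the file names and arrays are hypothetical.

```python
import os
import tempfile

import numpy as np

x = np.array([[1.5, 2.5], [3.5, 4.5]])
y = np.array([10, 20, 30])

with tempfile.TemporaryDirectory() as tmp_dir:
    # plain-text format: human-readable, one row per line
    txt_path = os.path.join(tmp_dir, "x.csv")        # hypothetical file name
    np.savetxt(txt_path, x, delimiter=",")
    x_from_text = np.loadtxt(txt_path, delimiter=",")

    # several arrays in one binary .npz archive, stored under keyword names
    npz_path = os.path.join(tmp_dir, "arrays.npz")   # hypothetical file name
    np.savez(npz_path, x=x, y=y)
    with np.load(npz_path) as archive:
        y_from_archive = archive["y"]

print(x_from_text)
print(y_from_archive)   # [10 20 30]
```

Text files are easy to inspect and share, while the binary formats preserve the exact dtype and are faster for large arrays.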

<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>

Dot Product on Matrices and Inner Product on Vectors:

</p>

In [22]:
# determine the dot product of two matrices
a = np.array([[1,1],[2,2]])
b = np.array([[3,4],[4,4]])

print(a.dot(b))
print()
#print(np.dot(a, b))

[[ 7  8]
 [14 16]]



In [2]:
# determine the inner product of two vectors
import numpy as np
a1d = np.array([9 , 9 ])
b1d = np.array([10, 10])

print(a1d.dot(b1d))
print()
print(np.dot(b1d, a1d))

180

180
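
The `@` operator is another way to write these products, and it helps to contrast it with `*`, which multiplies element by element. A short sketch using the same arrays:

```python
import numpy as np

a = np.array([[1, 1], [2, 2]])
b = np.array([[3, 4], [4, 4]])

matrix_product = a @ b        # same result as a.dot(b)
elementwise = a * b           # a different operation: element by element

print(matrix_product)
print(elementwise)

a1d = np.array([9, 9])
b1d = np.array([10, 10])
print(a1d @ b1d)              # 180, the inner product of the two vectors
```

The matrix product here is `[[7, 8], [14, 16]]`, while the element-wise product is `[[3, 4], [8, 8]]`.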


<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>

Sum:
</p>

In [None]:
# sum elements in the array
ex1 = np.array([[11,12],[21,22]])

print(np.sum(ex1))          # add all members

In [None]:
print(np.sum(ex1, axis=0))  # columnwise sum

In [None]:
print(np.sum(ex1, axis=1))  # rowwise sum

<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>

Element-wise Functions: </p>

For example, let's compare the values of two arrays position by position and keep the larger value at each position.

In [None]:
# random array
x = np.random.randn(8)
x

In [None]:
# another random array
y = np.random.randn(8)
y

In [None]:
# returns element wise maximum between two arrays

np.maximum(x, y)

<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>

Reshaping array:
</p>

In [None]:
# grab values from 0 through 19 in an array
arr = np.arange(20)
print(arr)

In [None]:
# reshape to be a 4 x 5 matrix
arr.reshape(4,5)
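
Note that `reshape` returns a new array object and leaves `arr` itself one-dimensional. The sketch below shows this, along with `-1` for letting NumPy infer one dimension; the variable names are chosen only for this example.

```python
import numpy as np

arr = np.arange(20)

grid = arr.reshape(4, 5)       # returns a new 4 x 5 array object
print(arr.shape, grid.shape)   # (20,) (4, 5)

auto = arr.reshape(-1, 10)     # -1 asks NumPy to infer that dimension
print(auto.shape)              # (2, 10)

flat = grid.ravel()            # back to one dimension
print(flat.shape)              # (20,)

# 20 elements cannot fill a 3 x 7 grid, so this raises ValueError
try:
    arr.reshape(3, 7)
except ValueError as err:
    print("reshape failed:", err)
```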

<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>

Transpose:

</p>

In [58]:
# transpose
ex1 = np.array([[11,12],[21,22]])

ex1.T

array([[11, 21],
       [12, 22]])