<p style="font-family: Arial; font-size:3.75em;color:purple; font-style:bold"><br>
Introduction to numpy:
</p><br>

<p style="font-family: Arial; font-size:1.25em;color:#2462C0; font-style:bold"><br>
Package for scientific computing with Python
</p><br>

Numerical Python, or "NumPy" for short, is a foundational package on which many of the most common data science packages are built.  NumPy provides us with high-performance multi-dimensional arrays which we can use as vectors or matrices.  

The key features of numpy are:

- ndarrays: n-dimensional arrays of the same data type which are fast and space-efficient.  There are a number of built-in methods for ndarrays which allow for rapid processing of data without using loops (e.g., compute the mean).
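- Broadcasting: a set of rules that defines how arithmetic works between arrays of different shapes, for example adding a single row to every row of a matrix.
- Vectorization: applies an operation to every element of an ndarray in a single expression, replacing explicit Python loops.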
- Input/Output: simplifies reading and writing of data from/to file.
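To make the vectorization and built-in-method bullets concrete, here is a minimal sketch (the array values are arbitrary) comparing a Python loop with the equivalent vectorized expression:

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0])

# Loop version: Python visits each element and builds a list
doubled_loop = [v * 2 for v in values]

# Vectorized version: one expression, applied to every element by NumPy
doubled_vec = values * 2

# Built-in method: computes the mean without an explicit loop
mean_value = values.mean()

print(doubled_vec)  # [2. 4. 6. 8.]
print(mean_value)   # 2.5
```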

<b>Additional Recommended Resources:</b><br>
<a href="https://docs.scipy.org/doc/numpy/reference/">Numpy Documentation</a><br>
<i>Python for Data Analysis</i> by Wes McKinney<br>
<i>Python Data Science Handbook</i> by Jake VanderPlas



<p style="font-family: Arial; font-size:2.75em;color:purple; font-style:bold"><br>

Getting started with ndarray<br><br></p>

**ndarrays** are the time- and space-efficient multidimensional arrays at the core of numpy.  As with the data structures in Week 2, let's get started by creating some ndarrays using the numpy package.

<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>

How to create Rank 1 numpy arrays:
</p>

In [1]:
import numpy as np

an_array = np.array([3, 33, 333])  # Create a rank 1 array

print(type(an_array))              # The type of an ndarray is: "<class 'numpy.ndarray'>"

<class 'numpy.ndarray'>


In [2]:
# test the shape of the array we just created, it should have just one dimension (Rank 1)
print(an_array.shape)

(3,)


In [3]:
# because this is a rank 1 array, we need only one index to access each element
print(an_array[0], an_array[1], an_array[2]) 

3 33 333


In [4]:
an_array[0] = 888                 # ndarrays are mutable, here we change an element of the array

print(an_array)

[888  33 333]


<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>

How to create a Rank 2 numpy array:</p>

A rank 2 **ndarray** is one with two dimensions.  Notice the format below: [ [row], [row] ].  Two-dimensional arrays are well suited to representing matrices, which come up often in data science.

In [5]:
another = np.array([[11,12,13],[21,22,23]])   # Create a rank 2 array

print(another)  # print the array

print("The shape is 2 rows, 3 columns: ", another.shape)  # rows x columns                   

print("Accessing elements [0,0], [0,1], and [1,0] of the ndarray: ", another[0, 0], ", ",another[0, 1],", ", another[1, 0])

[[11 12 13]
 [21 22 23]]
The shape is 2 rows, 3 columns:  (2, 3)
Accessing elements [0,0], [0,1], and [1,0] of the ndarray:  11 ,  12 ,  21
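Besides `shape`, a few other attributes are handy for inspecting an array. Note that "rank" in this notebook means the number of dimensions (`ndim`), not the linear-algebra rank of a matrix. A minimal sketch reusing the same values:

```python
import numpy as np

another = np.array([[11, 12, 13], [21, 22, 23]])

print(another.ndim)   # 2 -> number of dimensions (the "rank" used in this notebook)
print(another.shape)  # (2, 3)
print(another.size)   # 6 -> total number of elements
print(another[1, 2])  # 23 -> row 1, column 2 (indices start at 0)
```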


<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>

There are many ways to create numpy arrays:
</p>

Here we create arrays of several shapes with different pre-filled values.  numpy has a number of built-in functions for quickly creating multidimensional arrays.

In [6]:
import numpy as np

# create a 2x2 array of zeros
ex1 = np.zeros((2,2))      
print(ex1)                              

[[ 0.  0.]
 [ 0.  0.]]


In [7]:
# create a 2x2 array filled with 9.0
ex2 = np.full((2,2), 9.0)  
print(ex2)   

[[ 9.  9.]
 [ 9.  9.]]


In [8]:
# create a 2x2 matrix with the diagonal 1s and the others 0
ex3 = np.eye(2)
print(ex3)  

[[ 1.  0.]
 [ 0.  1.]]


In [9]:
# create a 1x2 array of ones
ex4 = np.ones((1,2))
print(ex4)    

[[ 1.  1.]]


In [10]:
# notice that the above ndarray (ex4) is actually rank 2: it is a 1x2 array (1 row, 2 columns)
print(ex4.shape)

# which means we need to use two indexes to access an element
print()
print(ex4[0,1])

(1, 2)

1.0


In [11]:
# create an array of random floats in the half-open interval [0, 1)
ex5 = np.random.random((2,2))
print(ex5)    

[[ 0.60708015  0.45984756]
 [ 0.15594513  0.41336809]]
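Two other frequently used creation functions are `np.arange` (evenly spaced values with a given step) and `np.linspace` (a given number of evenly spaced points), and most creation functions accept a `dtype`. A short sketch with arbitrary ranges:

```python
import numpy as np

# Evenly spaced integers from 0 up to (but not including) 10, step 2
evens = np.arange(0, 10, 2)
print(evens)            # [0 2 4 6 8]

# 5 evenly spaced floats from 0 to 1, both endpoints included: 0.0, 0.25, 0.5, 0.75, 1.0
points = np.linspace(0, 1, 5)
print(points)

# np.zeros defaults to float64; pass dtype to get integers instead
int_zeros = np.zeros((2, 3), dtype=np.int64)
print(int_zeros.dtype)  # int64
```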


<p style="font-family: Arial; font-size:2.75em;color:purple; font-style:bold"><br>

Array Indexing
<br><br></p>

<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>
Slice indexing:
</p>

Similar to the use of slice indexing with lists and strings, we can use slice indexing to pull out sub-regions of ndarrays.

In [12]:
import numpy as np

# Rank 2 array of shape (3, 4)
an_array = np.array([[11,12,13,14], [21,22,23,24], [31,32,33,34]])
print(an_array)

[[11 12 13 14]
 [21 22 23 24]
 [31 32 33 34]]


Use array slicing to get a 2x2 subarray consisting of the first 2 rows and columns 1 and 2 (the second and third columns).

In [13]:
a_slice = an_array[:2, 1:3]
print(a_slice)

[[12 13]
 [22 23]]


Slicing returns a view of the original data, not a copy, so when you modify a slice you actually modify the underlying array.

In [14]:
print("Before:", an_array[0, 1])   #inspect the element at 0, 1  
a_slice[0, 0] = 1000    # a_slice[0, 0] is the same piece of data as an_array[0, 1]
print("After:", an_array[0, 1])    

Before: 12
After: 1000
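If you need a sub-array that can be changed without affecting the original, call `.copy()` on the slice. This sketch uses `np.shares_memory` to show which arrays share data:

```python
import numpy as np

an_array = np.array([[11, 12, 13, 14], [21, 22, 23, 24], [31, 32, 33, 34]])

view_slice = an_array[:2, 1:3]          # a view: shares memory with an_array
copy_slice = an_array[:2, 1:3].copy()   # an independent copy

print(np.shares_memory(an_array, view_slice))  # True
print(np.shares_memory(an_array, copy_slice))  # False

copy_slice[0, 0] = -1   # modifying the copy...
print(an_array[0, 1])   # ...leaves the original unchanged: 12
```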


<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>

Use both integer indexing & slice indexing
</p>

We can combine integer indexing and slice indexing to produce arrays of different shapes and ranks.

In [15]:
# Create a Rank 2 array of shape (3, 4)
an_array = np.array([[11,12,13,14], [21,22,23,24], [31,32,33,34]])
print(an_array)

[[11 12 13 14]
 [21 22 23 24]
 [31 32 33 34]]


In [16]:
# Using both integer indexing & slicing generates an array of lower rank
row_rank1 = an_array[1, :]    # Rank 1 view 

print(row_rank1, row_rank1.shape)  # notice only a single []

[21 22 23 24] (4,)


In [17]:
# Slicing alone: generates an array of the same rank as an_array
row_rank2 = an_array[1:2, :]  # Rank 2 view 

 
print()
print(row_rank2, row_rank2.shape)   # Notice the [[ ]]


[[21 22 23 24]] (1, 4)


In [18]:
#We can do the same thing for columns of an array:

print()
col_rank1 = an_array[:, 1]
col_rank2 = an_array[:, 1:2]

print(col_rank1, col_rank1.shape)  # Rank 1
print()
print(col_rank2, col_rank2.shape)  # Rank 2


[12 22 32] (3,)

[[12]
 [22]
 [32]] (3, 1)
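Going the other way, a rank 1 row or column can be turned back into a rank 2 array by inserting an axis of length 1. A short sketch using `np.newaxis` (an alias for `None`):

```python
import numpy as np

an_array = np.array([[11, 12, 13, 14], [21, 22, 23, 24], [31, 32, 33, 34]])
col_rank1 = an_array[:, 1]            # shape (3,)

# np.newaxis inserts a new axis of length 1 at that position
as_column = col_rank1[:, np.newaxis]  # shape (3, 1), same as an_array[:, 1:2]
as_row = col_rank1[np.newaxis, :]     # shape (1, 3)

print(as_column.shape, as_row.shape)  # (3, 1) (1, 3)
```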


<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>

Array Indexing for changing elements:
</p>

In [19]:
# Create a new array
an_array = np.array([[11,12,13], [21,22,23], [31,32,33], [41,42,43]])

print('Original Array:')
print(an_array)

Original Array:
[[11 12 13]
 [21 22 23]
 [31 32 33]
 [41 42 43]]


In [20]:
# Create an array of column indices: one column to pick from each row
indices = np.array([0, 1, 2, 0])

# Create an array of row indices: 0, 1, 2, 3
whichRows = np.arange(4)
print('\nRow indices picked : %s' % whichRows)


Row indices picked : [0 1 2 3]


In [21]:
# Select one element from each row
print('Values in the array at those indices: ',an_array[whichRows, indices])

Values in the array at those indices:  [11 22 33 41]


In [22]:
# Examine the pairings of whichRows and indices.  These are the elements we'll change next.
for row,col in zip(whichRows,indices):
    print(row, ", ",col)

0 ,  0
1 ,  1
2 ,  2
3 ,  0


In [23]:
# Change one element from each row using the indices selected
an_array[whichRows, indices] += 100000

print('\nChanged Array:')
print(an_array)


Changed Array:
[[100011     12     13]
 [    21 100022     23]
 [    31     32 100033]
 [100041     42     43]]
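Unlike slicing, indexing with integer arrays returns a new array rather than a view; only assigning through the index expression (as with `+=` above) writes back into the original. A minimal sketch:

```python
import numpy as np

an_array = np.array([[11, 12, 13], [21, 22, 23], [31, 32, 33], [41, 42, 43]])
rows = np.arange(4)
cols = np.array([0, 1, 2, 0])

picked = an_array[rows, cols]   # integer-array indexing returns a NEW array
picked[0] = -1                  # changing it does not touch an_array
print(an_array[0, 0])           # 11

an_array[rows, cols] += 100000  # assigning through the index DOES modify an_array
print(an_array[0, 0])           # 100011
```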


<p style="font-family: Arial; font-size:2.75em;color:purple; font-style:bold"><br>

Using Boolean Conditions to Select Data:
<br><br></p>

In [24]:
# create a 3x2 array
an_array = np.array([[11,12], [21, 22], [31, 32]])
print(an_array)

[[11 12]
 [21 22]
 [31 32]]


In [25]:
# create a filter which will be boolean values for whether each element meets this condition
filter = (an_array > 15)
filter

array([[False, False],
       [ True,  True],
       [ True,  True]], dtype=bool)

Notice that the filter is a boolean ndarray with the same shape as an_array: each element is True where the corresponding element of an_array is greater than 15, and False where it is less than or equal to 15.

In [26]:
# we can now select just those elements which meet that criterion
print(an_array[filter])

[21 22 31 32]


In [27]:
# More concisely, we can put the condition directly inside the brackets, without a separate filter array.

an_array[an_array > 15]

array([21, 22, 31, 32])

We can also modify elements in place by assigning through a similar boolean filter.  Let's add 100 to all the even values.

In [28]:
an_array[an_array % 2 == 0] +=100
print(an_array)

[[ 11 112]
 [ 21 122]
 [ 31 132]]
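Several conditions can be combined element-wise with `&` (and), `|` (or) and `~` (not). Python's `and`/`or` keywords do not work here (they raise a `ValueError` about an ambiguous truth value), and each comparison needs parentheses because `&` and `|` bind more tightly than `>` and `<`. A sketch on the original 3x2 values:

```python
import numpy as np

an_array = np.array([[11, 12], [21, 22], [31, 32]])

# Elements strictly between 15 and 30
in_range = (an_array > 15) & (an_array < 30)
print(an_array[in_range])   # [21 22]

# Elements below 15 or above 30
outside = (an_array < 15) | (an_array > 30)
print(an_array[outside])    # [11 12 31 32]
```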


<p style="font-family: Arial; font-size:2.75em;color:purple; font-style:bold"><br>

Datatypes and Array Operations
<br><br></p>

<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>

Datatypes:
</p>

In [29]:
ex1 = np.array([11, 12]) # NumPy infers an integer dtype from the values
print(ex1.dtype)         # int64 on most platforms (int32 on Windows before NumPy 2.0)

ex2 = np.array([11.0, 12.0]) # NumPy infers a floating-point dtype
print(ex2.dtype)

ex3 = np.array([11, 21], dtype=np.int64) # You can also specify the dtype explicitly
print(ex3.dtype)

int64
float64
int64
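To convert an existing array to another dtype, use `astype`, which returns a new array. Float-to-integer conversion truncates toward zero, and the same silent truncation happens when a float is assigned into an integer array. A short sketch with arbitrary values:

```python
import numpy as np

floats = np.array([1.7, -2.3, 3.9])

# astype returns a converted copy; the fractional part is dropped
as_ints = floats.astype(np.int64)
print(as_ints)   # [ 1 -2  3]

# Assigning a float into an integer array is also truncated silently
ints = np.array([10, 20, 30])
ints[0] = 9.99
print(ints)      # [ 9 20 30]
```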


<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>

Arithmetic Array Operations:

</p>

In [30]:
x = np.array([[111,112],[121,122]], dtype=np.int64)  # np.int was removed in NumPy 1.24; use int or np.int64
y = np.array([[211.1,212.1],[221.1,222.1]], dtype=np.float64)

print(x + y)
print(np.add(x, y))

[[ 322.1  324.1]
 [ 342.1  344.1]]
[[ 322.1  324.1]
 [ 342.1  344.1]]


In [31]:
print(x - y)
print(np.subtract(x, y))

[[-100.1 -100.1]
 [-100.1 -100.1]]
[[-100.1 -100.1]
 [-100.1 -100.1]]


In [32]:
print(x * y)
print(np.multiply(x, y))

[[ 23432.1  23755.2]
 [ 26753.1  27096.2]]
[[ 23432.1  23755.2]
 [ 26753.1  27096.2]]


In [33]:
print(x / y)
print(np.divide(x, y))

[[ 0.52581715  0.52805281]
 [ 0.54726368  0.54930212]]
[[ 0.52581715  0.52805281]
 [ 0.54726368  0.54930212]]


In [34]:
print(np.sqrt(x))

[[ 10.53565375  10.58300524]
 [ 11.          11.04536102]]


In [35]:
print(np.exp(x))

[[  1.60948707e+48   4.37503945e+48]
 [  3.54513118e+52   9.63666567e+52]]


<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>

Dot Product on Matrices and Inner Product on Vectors:

</p>

In [36]:
x2d = np.array([[1,1],[1,1]])
y2d = np.array([[2,2],[2,2]])

print(x2d.dot(y2d))
print()
print(np.dot(x2d, y2d))

[[4 4]
 [4 4]]

[[4 4]
 [4 4]]


In [37]:
a1d = np.array([9 , 9 ])
b1d = np.array([10, 10])

print(a1d.dot(b1d))
print()
print(np.dot(a1d, b1d))

180

180


In [38]:
print(x2d.dot(a1d))
print()
print(np.dot(x2d, a1d))

[18 18]

[18 18]
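Since Python 3.5, the `@` operator (backed by `np.matmul`) gives the same matrix product as `np.dot` for 1-D and 2-D arrays. It is worth contrasting with `*`, which multiplies element by element. A sketch reusing the arrays above:

```python
import numpy as np

x2d = np.array([[1, 1], [1, 1]])
y2d = np.array([[2, 2], [2, 2]])
a1d = np.array([9, 9])

# @ performs matrix multiplication, like np.dot for 2-D arrays
print(x2d @ y2d)   # [[4 4]
                   #  [4 4]]
print(x2d @ a1d)   # [18 18]

# * is element-wise multiplication, not a matrix product
print(x2d * y2d)   # [[2 2]
                   #  [2 2]]
```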


<p style="font-family: Arial; font-size:2.75em;color:purple; font-style:bold"><br>

Additional Common Array Operations
<br><br></p>

<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>

Sum / Transpose:

</p>

In [39]:
ex1 = np.array([[11,12],[21,22]])

print(np.sum(ex1))          # add all members

66


In [40]:
print(np.sum(ex1, axis=0))  # column-wise sum: collapses axis 0 (the rows), one result per column

[32 34]


In [41]:
print(np.sum(ex1, axis=1))  # row-wise sum: collapses axis 1 (the columns), one result per row

[23 43]
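The same `axis` argument works for other reductions such as `max` and `min`. Passing `keepdims=True` keeps the reduced axis with length 1, so the result lines up with the original array under broadcasting (covered further below). A sketch that normalizes each row by its sum:

```python
import numpy as np

ex1 = np.array([[11, 12], [21, 22]])

print(ex1.max(axis=0))   # largest value in each column: [21 22]
print(ex1.min(axis=1))   # smallest value in each row: [11 21]

# keepdims=True gives shape (2, 1) instead of (2,)
row_sums = ex1.sum(axis=1, keepdims=True)
normalized = ex1 / row_sums   # each row divided by its own sum
print(normalized.sum(axis=1))  # each row now sums to 1.0
```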


<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>

Element-wise Functions: </p>

In [42]:
x = np.random.randn(8)
x

array([ 0.39338158,  2.1250698 , -0.66045678, -0.52936457, -0.91157388,
        0.5687658 , -0.85955859, -0.29799528])

In [43]:
y = np.random.randn(8)
y

array([-0.85658354, -0.97600948,  2.01268296,  0.44037292,  0.51197434,
       -0.64134707, -0.71198389, -0.42090743])

In [44]:
# returns the element-wise maximum of two arrays

np.maximum(x, y)

array([ 0.39338158,  2.1250698 ,  2.01268296,  0.44037292,  0.51197434,
        0.5687658 , -0.71198389, -0.29799528])
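`np.maximum` is one of many element-wise functions (ufuncs) in NumPy. A sketch of a few others, using small fixed values (chosen arbitrarily in place of random ones so the results are predictable):

```python
import numpy as np

x = np.array([0.4, -2.1, 1.5])
y = np.array([-0.9, 0.5, 1.5])

print(np.minimum(x, y))  # element-wise minimum: -0.9, -2.1, 1.5
print(np.abs(x))         # absolute value of each element: 0.4, 2.1, 1.5
print(np.square(y))      # each element squared
print(np.sign(x))        # -1.0, 0.0 or 1.0 depending on each element's sign
```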

<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>

Reshaping arrays:
</p>

In [45]:
arr = np.arange(20)
print(arr)

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]


In [46]:
arr.reshape(4,5)

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])

In [47]:
ex1.T

array([[11, 21],
       [12, 22]])
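When reshaping, one dimension can be given as `-1` and NumPy infers it from the total number of elements; if the requested shape does not hold exactly that many elements, `reshape` raises a `ValueError`. A short sketch:

```python
import numpy as np

arr = np.arange(20)

# -1 asks NumPy to infer that dimension: 20 elements / 4 rows = 5 columns
grid = arr.reshape(4, -1)
print(grid.shape)     # (4, 5)

# 3 * 7 = 21 != 20, so this reshape is rejected
try:
    arr.reshape(3, 7)
except ValueError as err:
    print("cannot reshape:", err)

# .T swaps the axes: a (4, 5) array becomes (5, 4)
print(grid.T.shape)   # (5, 4)
```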

<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>

Indexing using where():</p>

In [48]:
x_1 = np.array([1,2,3,4,5])

y_1 = np.array([11,22,33,44,55])

filter = np.array([True, False, True, False, True])

In [49]:
out = np.where(filter, x_1, y_1)
print(out)

[ 1 22  3 44  5]


In [50]:
mat = np.random.rand(5,5)
mat

array([[ 0.49034694,  0.34063095,  0.58060335,  0.03005724,  0.91333017],
       [ 0.37862955,  0.69696005,  0.63449459,  0.545719  ,  0.79475375],
       [ 0.67132951,  0.02836454,  0.06147228,  0.4554479 ,  0.36542797],
       [ 0.31153395,  0.56318331,  0.60830565,  0.58305217,  0.88287609],
       [ 0.28258947,  0.09038758,  0.50576655,  0.21734769,  0.71394119]])

In [51]:
np.where( mat > 0.5, 1000, -1)

array([[  -1,   -1, 1000,   -1, 1000],
       [  -1, 1000, 1000, 1000, 1000],
       [1000,   -1,   -1,   -1,   -1],
       [  -1, 1000, 1000, 1000, 1000],
       [  -1,   -1, 1000,   -1, 1000]])
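Called with only a condition, `np.where` returns the indices where the condition is True (a tuple with one index array per dimension), which can then be used to index the array. A minimal sketch:

```python
import numpy as np

x_1 = np.array([1, 2, 3, 4, 5])

# One-argument form: a tuple of index arrays, here just one for the single dimension
idx = np.where(x_1 > 2)
print(idx[0])     # positions 2, 3 and 4
print(x_1[idx])   # [3 4 5]
```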

<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>

"any" or "all" conditionals:</p>

In [52]:
arr_bools = np.array([ True, False, True, True, False ])

In [53]:
arr_bools.any()

True

In [54]:
arr_bools.all()

False
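`any` and `all` also accept an `axis` argument, and they are most often applied to a comparison on numeric data. A short sketch with made-up values:

```python
import numpy as np

grid = np.array([[True, False, True],
                 [False, False, True]])

print(grid.any(axis=0))   # per column: [ True False  True]
print(grid.all(axis=1))   # per row:    [False False]

# Typical use: is at least one value negative?
values = np.array([3, 8, -1, 5])
print((values < 0).any())  # True
```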

<p style="font-family: Arial; font-size:2.75em;color:purple; font-style:bold"><br>

Broadcasting:
<br><br>
</p>

In [55]:
start   = np.array([
                    [11,12,13], 
                    [21*10,22*10,23*10], 
                    [31*100,32*100,33*100], 
                    [41*1000,42*1000,43*1000]
                   ])
print(start)
print()

addThis = np.array([1, 0, 1])
print(addThis)
print()

y = start + addThis  # add to each row of 'start' using broadcasting
print(y)

[[   11    12    13]
 [  210   220   230]
 [ 3100  3200  3300]
 [41000 42000 43000]]

[1 0 1]

[[   12    12    14]
 [  211   220   231]
 [ 3101  3200  3301]
 [41001 42000 43001]]
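Broadcasting compares shapes starting from the last dimension: two dimensions are compatible when they are equal or one of them is 1. Above, `addThis` has shape (3,), matching the trailing dimension of `start` (4, 3), so it is added to every row. The sketch below, reusing the values of `start`, shows a column of shape (4, 1), a scalar, and a shape that cannot be broadcast:

```python
import numpy as np

start = np.array([[11, 12, 13],
                  [210, 220, 230],
                  [3100, 3200, 3300],
                  [41000, 42000, 43000]])   # shape (4, 3)

# A (4, 1) column is stretched across the 3 columns,
# adding a different value to each ROW
per_row = np.array([[1], [2], [3], [4]])
print(start + per_row)

# A scalar broadcasts to every element
print(start * 2)

# Shapes (4, 3) and (4,) are incompatible: trailing sizes 3 and 4 differ
try:
    start + np.array([1, 2, 3, 4])
except ValueError as err:
    print("broadcast error:", err)
```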


<p style="font-family: Arial; font-size:2.75em;color:purple; font-style:bold"><br>

Statistical Methods, Sorting, and Set Operations:
<br><br>
</p>

<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>

Basic Statistical Operations:
</p>

In [56]:
arry = 10 * np.random.randn(2,4)
arry

array([[ -3.59030283,   1.85416813,  -9.67242144,  -0.93078953],
       [ -5.55620499,   8.92697876,   2.71537451,  29.53890676]])

In [57]:
arry.mean()

2.9107136721874141

In [58]:
arry.mean(axis = 1)

array([-3.08483642,  8.90626376])

In [59]:
arry.mean(axis = 0)

array([ -4.57325391,   5.39057345,  -3.47852346,  14.30405862])

In [60]:
arry.sum()

23.285709377499312
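Other common statistics follow the same pattern, with or without an `axis`. The sketch uses fixed values (an assumption, in place of `randn`) so the results can be checked by hand; `argmax` returns a position rather than a value:

```python
import numpy as np

arry = np.array([[1.0, 4.0, 2.0, 9.0],
                 [7.0, 3.0, 8.0, 6.0]])

print(arry.std())            # standard deviation of all elements
print(arry.min(axis=1))      # smallest value in each row: [1. 3.]
print(arry.argmax())         # 3: position of the maximum in the flattened array
print(arry.argmax(axis=0))   # row of the maximum in each column: [1 0 1 0]
print(arry.cumsum(axis=1))   # running total along each row
```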

<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>

Sorting:
</p>


In [61]:
unsorted = np.random.randn(10)
unsorted

array([ 0.45639976, -0.06337989,  0.60291537, -0.86527552, -0.11525533,
       -0.2869925 , -0.79359996,  0.72292863,  1.22576201,  0.30092086])

In [62]:
unsorted.sort() # in-place sorting: modifies unsorted itself and returns None
unsorted

array([-0.86527552, -0.79359996, -0.2869925 , -0.11525533, -0.06337989,
        0.30092086,  0.45639976,  0.60291537,  0.72292863,  1.22576201])
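If the original order must be kept, `np.sort` returns a sorted copy instead, and `np.argsort` returns the indices that would sort the array. A sketch with a small fixed array:

```python
import numpy as np

unsorted = np.array([3.2, -1.0, 0.5, 2.2])

sorted_copy = np.sort(unsorted)   # sorted COPY; `unsorted` is unchanged
order = np.argsort(unsorted)      # indices that would sort it: [1 2 3 0]
print(sorted_copy)
print(unsorted[order])            # same values as sorted_copy

descending = np.sort(unsorted)[::-1]   # reverse for descending order
print(descending)
```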

In [63]:
# Bonus: find the unique values in the array (len() of the result gives how many there are):
np.unique(unsorted)

array([-0.86527552, -0.79359996, -0.2869925 , -0.11525533, -0.06337989,
        0.30092086,  0.45639976,  0.60291537,  0.72292863,  1.22576201])
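With random floats every value is unique, so the result above simply matches the sorted array. On data with repeats, `return_counts=True` also reports how often each value occurs. A sketch with made-up labels:

```python
import numpy as np

labels = np.array([3, 1, 3, 2, 1, 3])

values, counts = np.unique(labels, return_counts=True)
print(values)        # [1 2 3] (sorted unique values)
print(counts)        # [2 1 3] (occurrences of each value)
print(len(values))   # 3 distinct values
```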

<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>

Set Operations with np.array data type:
</p>

In [64]:
s1 = np.array(['desk','chair','bulb'])
s2 = np.array(['lamp','bulb','chair'])

In [65]:
print(s1, s2)

['desk' 'chair' 'bulb'] ['lamp' 'bulb' 'chair']


In [66]:
np.intersect1d(s1, s2)

array(['bulb', 'chair'], 
      dtype='<U5')

In [67]:
np.union1d(s1, s2)

array(['bulb', 'chair', 'desk', 'lamp'], 
      dtype='<U5')

In [68]:
np.setdiff1d(s1, s2) # elements in s1 that are not in s2

array(['desk'], 
      dtype='<U5')

In [69]:
np.isin(s1, s2) # which elements of s1 are also in s2 (older code uses np.in1d, deprecated in NumPy 2.0)

array([False,  True,  True], dtype=bool)

<p style="font-family: Arial; font-size:2.75em;color:purple; font-style:bold"><br>

Read or Write to Disk:
<br><br>
</p>

<p style="font-family: Arial; font-size:1.3em;color:#2462C0; font-style:bold"><br>

Binary Format:</p>

In [70]:
x = np.array([ 23.23, 24.24] )

In [71]:
np.save('an_array', x)

In [72]:
np.load('an_array.npy')

array([ 23.23,  24.24])

<p style="font-family: Arial; font-size:1.3em;color:#2462C0; font-style:bold"><br>

Text Format:</p>

In [73]:
np.savetxt('array.txt', X=x, delimiter=',')

In [74]:
!cat array.txt

2.323000000000000043e+01
2.423999999999999844e+01


In [75]:
np.loadtxt('array.txt', delimiter=',')

array([ 23.23,  24.24])
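To store several arrays in one file, `np.savez` writes a `.npz` archive with each array under a keyword name. This sketch writes into a temporary directory (an assumption, so it leaves no files behind) and also shows the `fmt` argument of `savetxt`, which avoids the long scientific-notation text seen above:

```python
import os
import tempfile

import numpy as np

x = np.array([23.23, 24.24])
y = np.arange(6).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp:
    # Binary: several named arrays in one .npz archive
    npz_path = os.path.join(tmp, "arrays.npz")
    np.savez(npz_path, x=x, y=y)
    with np.load(npz_path) as data:   # the archive is closed when the block ends
        x_back = data["x"]
        y_back = data["y"]

    # Text: fmt controls how each number is written
    txt_path = os.path.join(tmp, "array.txt")
    np.savetxt(txt_path, x, fmt="%.2f")
    x_txt = np.loadtxt(txt_path)

print(x_back, y_back.shape, x_txt)
```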

<p style="font-family: Arial; font-size:2.75em;color:purple; font-style:bold"><br>

Extra useful ndarray operations:
<br><br>
</p>

<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>

Random Number Generation:
</p>

In [76]:
Y = np.random.normal(size = (1,5))[0]
print(Y)

[ 0.83494707 -1.611114    0.23000912  0.1887899   0.37229041]


In [77]:
Z = np.random.randint(low=2,high=50,size=4)
print(Z)

[17 29 14 33]


In [78]:
np.random.permutation(Z) #return a new ordering of elements in Z

array([33, 17, 29, 14])

In [79]:
np.random.uniform(size=4) #uniform distribution

array([ 0.6579628 ,  0.56194906,  0.94308759,  0.74637378])

In [80]:
np.random.normal(size=4) #normal distribution

array([-1.71352849, -3.05783782, -0.19683102,  0.2275861 ])
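The `np.random.*` functions above draw from a global, unseeded generator, so the output changes on every run. Since NumPy 1.17 the documentation recommends creating a `Generator` with `np.random.default_rng` for new code; with a fixed seed (42 here is arbitrary) results become reproducible. A sketch of roughly equivalent calls:

```python
import numpy as np

# A Generator with a fixed seed gives the same numbers on every run
rng = np.random.default_rng(seed=42)

normals = rng.normal(size=5)                   # standard normal samples
ints = rng.integers(low=2, high=50, size=4)    # high is exclusive, as in randint
uniforms = rng.uniform(size=4)                 # floats in [0, 1)
shuffled = rng.permutation(ints)               # a new ordering; ints is unchanged

# Re-creating the generator with the same seed reproduces the sequence
again = np.random.default_rng(seed=42).normal(size=5)
print(np.array_equal(normals, again))   # True
```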

<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>

Merging data sets:
</p>

In [81]:
K = np.random.randint(low=2,high=50,size=(2,2))
print(K)

print()
M = np.random.randint(low=2,high=50,size=(2,2))
print(M)

[[17  6]
 [25 43]]

[[39 44]
 [11 20]]


In [82]:
np.vstack((K,M))

array([[17,  6],
       [25, 43],
       [39, 44],
       [11, 20]])

In [83]:
np.hstack((K,M))

array([[17,  6, 39, 44],
       [25, 43, 11, 20]])

In [84]:
np.concatenate([K, M], axis = 0)

array([[17,  6],
       [25, 43],
       [39, 44],
       [11, 20]])

In [85]:
np.concatenate([K, M.T], axis = 1)

array([[17,  6, 39, 11],
       [25, 43, 44, 20]])
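`vstack`, `hstack` and `concatenate` join arrays along an existing axis, whereas `np.stack` creates a new axis. The sketch copies the `K` and `M` values printed above (fixed here instead of random):

```python
import numpy as np

K = np.array([[17, 6], [25, 43]])
M = np.array([[39, 44], [11, 20]])

# Joining along an existing axis: (2, 2) + (2, 2) -> (4, 2)
print(np.vstack((K, M)).shape)   # (4, 2)

# np.stack adds a new first axis: two (2, 2) arrays -> (2, 2, 2)
stacked = np.stack((K, M))
print(stacked.shape)             # (2, 2, 2)
print(stacked[1])                # recovers M

# For 1-D inputs, vstack treats each array as a row
print(np.vstack((np.array([1, 2]), np.array([3, 4]))))   # [[1 2]
                                                          #  [3 4]]
```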

<p style="font-family: Arial; font-size:2.75em;color:purple; font-style:bold"><br>

Speedtest: ndarrays vs lists
<br><br>
</p>

In [1]:
from numpy import arange
from timeit import Timer

size    = 100000
timeits = 10000

In [2]:
nd_array = arange(size)
type(nd_array)

numpy.ndarray

In [3]:
timer_numpy = Timer("nd_array.sum()", "from __main__ import nd_array")

print("Time taken by numpy ndarray: %.3e" % (timer_numpy.timeit(timeits)/timeits))

Time taken by numpy ndarray: 6.883e-05


In [4]:
a_list = list(range(size))
type(a_list)

list

In [5]:
timer_list = Timer("sum(a_list)", "from __main__ import a_list")

print("Time taken by list:  %.3e" % (timer_list.timeit(timeits)/timeits))

Time taken by list:  1.188e-03
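The `from __main__ import ...` setup string works in a notebook, but `timeit.timeit` also accepts a `globals` argument, which is often simpler. This sketch uses fewer repetitions (100, an arbitrary choice) so it runs quickly, and creates the array as int64 so the sum cannot overflow on platforms where the default integer is 32-bit. Absolute timings will vary by machine.

```python
import timeit

import numpy as np

size = 100_000
nd_array = np.arange(size, dtype=np.int64)
a_list = list(range(size))

# globals= lets the timed statements see nd_array and a_list directly
n = 100
t_numpy = timeit.timeit("nd_array.sum()", globals=globals(), number=n) / n
t_list = timeit.timeit("sum(a_list)", globals=globals(), number=n) / n

print("numpy: %.3e s  list: %.3e s  ratio: %.1fx" % (t_numpy, t_list, t_list / t_numpy))
```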
