# Tech Exp Course 2, Session 3: NumPy

**Instructor**: Wesley Beckner<br>

**Contact**: wesleybeckner@gmail.com

<br>

---

<br>

Today, we will jump into the **NumPy** package.


#### [``numpy``](http://numpy.org/): Numerical Python

NumPy is short for "Numerical Python" and contains tools for efficient manipulation of arrays of data.
If you have used other computational tools like IDL or MATLAB, NumPy should feel very familiar.

<br>

---

## Import Libraries

In [3]:
import numpy as np
np.random.seed(42)

## 3.1 NumPy Arrays

### 3.1.1 Creating NumPy Arrays

When we worked with lists, we saw that we could fill them with all sorts of datatypes. NumPy arrays are necessarily of one datatype:

In [4]:
# these will all be ints
np.array([1, 2, 3, 6, 5, 4])

array([1, 2, 3, 6, 5, 4])

In [5]:
# these will all be floats
np.array([1, 2, 3.14, 6, 5, 4])

array([1.  , 2.  , 3.14, 6.  , 5.  , 4.  ])

We can check the data types in the standard way, by calling `type` on each element:

In [19]:
arr = np.array([1, 2, 3, 6, 5, 4])
for i in arr:
  print(type(i))

<class 'numpy.int64'>
<class 'numpy.int64'>
<class 'numpy.int64'>
<class 'numpy.int64'>
<class 'numpy.int64'>
<class 'numpy.int64'>


We see that all elements are _upcast_ to the most general type in the array. For instance, because 3.14 is a float, all the other numbers in the array will be stored as floats:

In [20]:
for i in np.array([1, 2, 3.14, 6, 5, 4]):
  print(type(i))

<class 'numpy.float64'>
<class 'numpy.float64'>
<class 'numpy.float64'>
<class 'numpy.float64'>
<class 'numpy.float64'>
<class 'numpy.float64'>
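
Looping over the elements works, but the array also records its element type directly. Here is a minimal sketch using the `dtype` attribute and `np.result_type`; the names `int_arr` and `float_arr` are illustrative and do not come from the cells above.

```python
import numpy as np

int_arr = np.array([1, 2, 3, 6, 5, 4])
float_arr = np.array([1, 2, 3.14, 6, 5, 4])

# every array stores a single dtype shared by all of its elements
print(int_arr.dtype)    # the platform default integer, int64 on most systems
print(float_arr.dtype)  # float64

# result_type reports the type NumPy upcasts to when the two are mixed
print(np.result_type(int_arr, float_arr))  # float64
```

Reading `dtype` avoids the Python loop and gives the same answer for arrays of any size.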


We can also specify the datatypes in the array:

In [23]:
np.array([1, 2, 3.14, 6, 5, 4], dtype='float32')

array([1.  , 2.  , 3.14, 6.  , 5.  , 4.  ], dtype=float32)

#### 3.1.1.1 Exercise: Specify datatype

Create an array of 5 numbers whose datatype is a 16-bit integer. Make one of the numbers not a whole number. What happens to that number when it is stored in the 16-bit integer array?

In [None]:
# Cell for Ex 1

### 3.1.2 Creating Arrays from NumPy Methods

In [None]:
# create an array of 10 zeros
# how can we specify the datatype?
np.zeros(10)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [None]:
# create an array of 10 1's
np.ones(10)

array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

In [None]:
# fill an array of the following dimensions
# with value 42
np.full((2,3), 42)

array([[42, 42, 42],
       [42, 42, 42]])

In [None]:
# integers from start (inclusive) to stop (exclusive)
# with the given step size
np.arange(1, 10, 2)

array([1, 3, 5, 7, 9])

In [None]:
# create an array of evenly spaced points
# between start and stop (both inclusive);
# 5 points give 4 equal intervals
np.linspace(0, 10, 5)

array([ 0. ,  2.5,  5. ,  7.5, 10. ])
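
The difference between `arange` and `linspace` is easiest to see side by side. The sketch below uses the `retstep` option of `np.linspace` to recover the spacing; the variable names `points` and `step` are chosen only for illustration.

```python
import numpy as np

# retstep=True also returns the spacing between neighbouring points
points, step = np.linspace(0, 10, 5, retstep=True)
print(points)  # five points from 0 to 10, both ends included
print(step)    # 2.5

# arange excludes the stop value, so 10 never appears here
stepped = np.arange(0, 10, 2.5)
print(stepped)  # four points: 0, 2.5, 5, 7.5
```

As a rule of thumb, `linspace` fits when you know how many points you want, and `arange` fits when you know the step.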

In [None]:
# create an array of values drawn from a
# uniform distribution over [0, 1)
np.random.random(5)

array([0.37454012, 0.95071431, 0.73199394, 0.59865848, 0.15601864])

In [None]:
# create an array of values from a normal distribution
np.random.normal(loc=0, scale=1, size=(5,5))

array([[ 0.27904129,  1.01051528, -0.58087813, -0.52516981, -0.57138017],
       [-0.92408284, -2.61254901,  0.95036968,  0.81644508, -1.523876  ],
       [-0.42804606, -0.74240684, -0.7033438 , -2.13962066, -0.62947496],
       [ 0.59772047,  2.55948803,  0.39423302,  0.12221917, -0.51543566],
       [-0.60025385,  0.94743982,  0.291034  , -0.63555974, -1.02155219]])

In [29]:
# create an array of random integers between 5 and 10 with shape 2x2
np.random.randint(5, 11, (2,2))

array([[9, 5],
       [8, 6]])
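
The cells above use the legacy `np.random.*` functions, which share one global seed. NumPy also offers a `Generator` object, created with `np.random.default_rng`. The sketch below shows it as an alternative, not as the approach used in this session; the names `rng`, `draws` and `draws_inclusive` are illustrative.

```python
import numpy as np

# a Generator carries its own seed instead of relying on global state
rng = np.random.default_rng(42)

# integers() excludes `high` by default, like np.random.randint
draws = rng.integers(low=5, high=11, size=(2, 2))
print(draws)

# endpoint=True makes the upper bound inclusive instead
draws_inclusive = rng.integers(low=5, high=10, size=(2, 2), endpoint=True)
print(draws_inclusive)
```

The two generators produce different random streams, so the values will not match the `np.random.randint` output shown above even with the same seed.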

### 3.1.3 Exercise: Creating Arrays

a. Create a 5x5 array of ones with datatype `int16`

In [None]:
# Cell for Ex 3.1.3 a

b. Create an array of 10 numbers drawn from a uniform distribution between 0 and 1

In [None]:
# Cell for Ex 3.1.3 b

c. Create an array of 10 numbers drawn from a normal distribution centered at 80 with a standard deviation of 5

In [None]:
# Cell for Ex 3.1.3 c

## 3.2 NumPy Array Attributes

Common array attributes are `shape`, `dtype`, `size`, `nbytes`, and `itemsize`:

In [30]:
my_arr = np.random.randint(low=5, high=10, size=(5,5))
print(my_arr)

[[9 8 5 5 7]
 [7 6 8 8 7]
 [8 8 5 7 9]
 [7 9 5 6 8]
 [5 8 6 6 5]]


In [None]:
my_arr.shape

(5, 5)

In [None]:
my_arr.dtype

dtype('int64')

In [None]:
my_arr.size

25

In [None]:
my_arr.nbytes

200

In [None]:
my_arr.itemsize

8

### 3.2.1 Exercise: Conditional Check on Array Attributes

Write a conditional that checks that the total number of bytes of the array object `my_arr`, divided by the size of each item (in bytes), is equal to the number of items in the array (_hint: we covered the attributes above_).

In [None]:
# Cell for exercise 3.2.1

True

## 3.3 NumPy Array Slicing, Copying, Setting

Array slicing operates much the same way as with Python lists:

In [31]:
my_arr

array([[9, 8, 5, 5, 7],
       [7, 6, 8, 8, 7],
       [8, 8, 5, 7, 9],
       [7, 9, 5, 6, 8],
       [5, 8, 6, 6, 5]])

In [32]:
# grab the first row
my_arr[0]

array([9, 8, 5, 5, 7])

In [33]:
# grab the first element of the first row

# instead of this
print(my_arr[0][0])

# we do this
print(my_arr[0, 0])

9
9


We can time the two forms:

In [34]:
%%timeit
my_arr[0][0]

The slowest run took 49.03 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 5: 388 ns per loop


In [35]:
%%timeit
my_arr[0, 0]

The slowest run took 61.38 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 5: 171 ns per loop
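
The `%%timeit` magic only works in a notebook, but the standard-library `timeit` module gives a comparable measurement elsewhere. The sketch below uses a stand-in array `demo` (an assumption, not `my_arr`). The chained form is usually slower because it builds a temporary row before indexing it, though absolute timings vary by machine.

```python
import timeit
import numpy as np

demo = np.arange(25).reshape(5, 5)  # stand-in array for the timing

# demo[0][0] first builds a view of row 0, then indexes into that view
chained = timeit.timeit(lambda: demo[0][0], number=100_000)
# demo[0, 0] resolves both indices in a single lookup
single = timeit.timeit(lambda: demo[0, 0], number=100_000)

print(f"chained: {chained:.4f} s, single: {single:.4f} s")
```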


We can use the same slicing notation as with lists

`my_arr[start:stop:step]`

for n-dimensional arrays

`my_arr[start_1:stop_1:step_1, start_2:stop_2:step_2, ..., start_n:stop_n:step_n]`

In [36]:
# with arrays, we simply separate each dimension with a comma
my_arr[:2, :2]

array([[9, 8],
       [7, 6]])

Slices are views, not copies. This means we can set slices of arrays to new values, and the original object will change:

In [37]:
my_arr[:2, :2] = 0
my_arr

array([[0, 0, 5, 5, 7],
       [0, 0, 8, 8, 7],
       [8, 8, 5, 7, 9],
       [7, 9, 5, 6, 8],
       [5, 8, 6, 6, 5]])

In [38]:
my_arr[-2:, -2:] = 1
my_arr

array([[0, 0, 5, 5, 7],
       [0, 0, 8, 8, 7],
       [8, 8, 5, 7, 9],
       [7, 9, 5, 1, 1],
       [5, 8, 6, 1, 1]])

We can also step through an array slice:

In [39]:
# remember that we can use steps in slicing
my_arr[:, ::2] # the last number after :: is the step size

array([[0, 5, 7],
       [0, 8, 7],
       [8, 5, 9],
       [7, 5, 1],
       [5, 6, 1]])

We can use negative step sizes the way we do with lists. A negative step walks the axis from the end toward the beginning, so `::-1` is a convenient way to reverse the order of one or more dimensions of an array:

In [40]:
# reverse the rows
my_arr[::-1]

array([[5, 8, 6, 1, 1],
       [7, 9, 5, 1, 1],
       [8, 8, 5, 7, 9],
       [0, 0, 8, 8, 7],
       [0, 0, 5, 5, 7]])

In [41]:
# reverse the columns
my_arr[:, ::-1]

array([[7, 5, 5, 0, 0],
       [7, 8, 8, 0, 0],
       [9, 7, 5, 8, 8],
       [1, 1, 5, 9, 7],
       [1, 1, 6, 8, 5]])
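
The two reversals can also be combined in a single index. Here is a small sketch on a fresh array (`grid` is an illustrative name), with `np.flip` shown as an equivalent call:

```python
import numpy as np

grid = np.arange(6).reshape(2, 3)
print(grid)
# [[0 1 2]
#  [3 4 5]]

# a negative step on every axis reverses rows and columns together
flipped = grid[::-1, ::-1]
print(flipped)
# [[5 4 3]
#  [2 1 0]]

# np.flip with no axis argument reverses all axes as well
print(np.array_equal(flipped, np.flip(grid)))  # True
```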

Sometimes we want to create a copy of an array, despite the default slicing behavior. We can do this with the `.copy()` method; changes made to the copy leave the original untouched:

In [None]:
new_arr = my_arr.copy()
new_arr[:,:] = 0
print(my_arr)

[[0 0 5 5 7]
 [0 0 8 8 7]
 [8 8 5 7 9]
 [7 9 5 1 1]
 [5 8 6 1 1]]
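
To check whether two arrays share the same underlying data, NumPy provides `np.shares_memory`. The sketch below contrasts a slice with a copy on a small throwaway array; `base`, `view` and `clone` are illustrative names.

```python
import numpy as np

base = np.arange(6)

view = base[:3]          # basic slicing returns a view
clone = base[:3].copy()  # .copy() allocates new memory

print(np.shares_memory(base, view))   # True
print(np.shares_memory(base, clone))  # False

view[0] = 99   # writes through to base
clone[1] = -1  # only changes the copy
print(base)    # [99  1  2  3  4  5]
```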


### 3.3.1 Exercise: Array Setting and Slicing

Set all the even columns of `my_arr` to 0 and all the odd columns to 1. Count the columns starting from 1 (the first column is 1 and the last is 5) rather than from index 0 when deciding whether a column is even or odd.

In [None]:
# Cell for ex 3.3.1

array([[1, 0, 1, 0, 1],
       [1, 0, 1, 0, 1],
       [1, 0, 1, 0, 1],
       [1, 0, 1, 0, 1],
       [1, 0, 1, 0, 1]])

## 3.4 NumPy Array Reshaping, Concatenation, and Splitting

Reshaping is going to be a common task for us:

In [47]:
arr = np.arange(9)
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8])

In [48]:
# reshape into a 3x3 array
arr.reshape(3,3) # rows then columns

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

The product of the reshaped dimensions has to equal the total number of elements:

_`-1` will infer the proper dimension based on the other dimensions provided and the total number of elements_

In [61]:
# arr.reshape(4,2) # raises a ValueError: 9 elements cannot fill a 4x2 array
arr = np.arange(12)
arr.reshape(4,3) 

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])
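
Both rules can be seen in a few lines. This is a minimal sketch in which `arr12` is an illustrative name for a 12-element array:

```python
import numpy as np

arr12 = np.arange(12)

# -1 asks NumPy to work out the missing dimension: 12 / 3 = 4 columns
print(arr12.reshape(3, -1).shape)  # (3, 4)
print(arr12.reshape(-1, 6).shape)  # (2, 6)

# 12 elements cannot fill a 5-column array, so reshape raises ValueError
try:
    arr12.reshape(-1, 5)
except ValueError as err:
    print("reshape failed:", err)
```

Only one dimension may be given as `-1`, since NumPy can infer only a single unknown.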

A common manipulation in NumPy is to convert a 1-dimensional array into a 2-dimensional array. You will see this frequently when working with train/test datasets in machine learning.

In [63]:
arr = np.arange(9)
# reshape into 2 dimensions
arr.reshape(-1,1)

array([[0],
       [1],
       [2],
       [3],
       [4],
       [5],
       [6],
       [7],
       [8]])

In [65]:
# back to one dimension
arr.reshape(9)

array([0, 1, 2, 3, 4, 5, 6, 7, 8])
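
Two other spellings of the same round trip come up often in NumPy code: indexing with `np.newaxis` and flattening with `ravel`. The sketch below shows them as alternatives; `vec`, `column` and `flat` are illustrative names.

```python
import numpy as np

vec = np.arange(9)

# indexing with np.newaxis adds a length-1 axis, like reshape(-1, 1)
column = vec[:, np.newaxis]
print(column.shape)                                # (9, 1)
print(np.array_equal(column, vec.reshape(-1, 1)))  # True

# ravel flattens back to one dimension, like reshape(9) or reshape(-1)
flat = column.ravel()
print(flat.shape)                                  # (9,)
```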

We can also concatenate arrays

In [67]:
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
print(arr1, arr2)

[1 2 3] [4 5 6]


In [68]:
# now a single array
np.concatenate((arr1, arr2))

array([1, 2, 3, 4, 5, 6])

`vstack`, or vertical stack, will place the two arrays on top of each other:

In [69]:
np.vstack((arr1,arr2))

array([[1, 2, 3],
       [4, 5, 6]])

`hstack` will place them side by side (for 1-dimensional arrays this gives the same result as `concatenate`):

In [None]:
np.hstack((arr1,arr2))

array([1, 2, 3, 4, 5, 6])

In [71]:
arr1 = np.array([[1, 2, 3],[7,8,9]])
arr2 = np.array([4, 5, 6])
print(arr1)
print(arr2)

print(arr1.shape)

[[1 2 3]
 [7 8 9]]
[4 5 6]
(2, 3)


In [None]:
np.vstack((arr1, arr2))

array([[1, 2, 3],
       [7, 8, 9],
       [4, 5, 6]])
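
`vstack` and `hstack` can be read as shortcuts for `np.concatenate` along a chosen axis. The sketch below illustrates that relationship on 2-dimensional inputs; the names `top`, `row`, `stacked`, `joined` and `wide` are illustrative.

```python
import numpy as np

top = np.array([[1, 2, 3], [7, 8, 9]])
row = np.array([4, 5, 6])

# vstack treats the 1-D array as a single row of shape (1, 3)
stacked = np.vstack((top, row))

# concatenate needs matching dimensions, so we reshape the row ourselves
joined = np.concatenate((top, row.reshape(1, -1)), axis=0)
print(np.array_equal(stacked, joined))  # True

# axis=1 joins 2-D arrays side by side, which is what hstack does for them
wide = np.concatenate((top, top), axis=1)
print(wide.shape)                                   # (2, 6)
print(np.array_equal(wide, np.hstack((top, top))))  # True
```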

Lastly, we can also split arrays. We give the indices where the split should be performed; each index marks the start of a new sub-array along the chosen axis (rows by default):

In [76]:
arr = np.random.randint(5, 11, (10,10))
arr

array([[ 8,  9,  7,  7, 10,  8,  6,  6,  9, 10],
       [ 5,  9, 10,  8,  8,  8,  8,  8, 10, 10],
       [ 7,  6,  8,  5, 10,  5,  5,  5,  7, 10],
       [ 5,  8,  9,  5,  7, 10,  7,  5, 10,  9],
       [ 5,  7,  6,  8,  7, 10,  5,  8,  5, 10],
       [ 5,  6,  8,  8, 10,  6,  7,  5,  9,  5],
       [ 5,  7,  5,  6,  6,  8, 10,  9,  5,  5],
       [ 7, 10,  6,  9,  8,  6, 10,  8,  7,  7],
       [ 5, 10,  9,  8,  6, 10, 10,  7,  5,  5],
       [ 8,  7, 10,  9,  7,  8,  8,  7,  8,  7]])

In [82]:
a, b = np.split(arr, [5])
print(a)
print(b)

[[ 8  9  7  7 10  8  6  6  9 10]
 [ 5  9 10  8  8  8  8  8 10 10]
 [ 7  6  8  5 10  5  5  5  7 10]
 [ 5  8  9  5  7 10  7  5 10  9]
 [ 5  7  6  8  7 10  5  8  5 10]]
[[ 5  6  8  8 10  6  7  5  9  5]
 [ 5  7  5  6  6  8 10  9  5  5]
 [ 7 10  6  9  8  6 10  8  7  7]
 [ 5 10  9  8  6 10 10  7  5  5]
 [ 8  7 10  9  7  8  8  7  8  7]]


In [85]:
np.vsplit(arr, [2,4,6,8])

[array([[ 8,  9,  7,  7, 10,  8,  6,  6,  9, 10],
        [ 5,  9, 10,  8,  8,  8,  8,  8, 10, 10]]),
 array([[ 7,  6,  8,  5, 10,  5,  5,  5,  7, 10],
        [ 5,  8,  9,  5,  7, 10,  7,  5, 10,  9]]),
 array([[ 5,  7,  6,  8,  7, 10,  5,  8,  5, 10],
        [ 5,  6,  8,  8, 10,  6,  7,  5,  9,  5]]),
 array([[ 5,  7,  5,  6,  6,  8, 10,  9,  5,  5],
        [ 7, 10,  6,  9,  8,  6, 10,  8,  7,  7]]),
 array([[ 5, 10,  9,  8,  6, 10, 10,  7,  5,  5],
        [ 8,  7, 10,  9,  7,  8,  8,  7,  8,  7]])]

In [86]:
np.hsplit(arr, [5])

[array([[ 8,  9,  7,  7, 10],
        [ 5,  9, 10,  8,  8],
        [ 7,  6,  8,  5, 10],
        [ 5,  8,  9,  5,  7],
        [ 5,  7,  6,  8,  7],
        [ 5,  6,  8,  8, 10],
        [ 5,  7,  5,  6,  6],
        [ 7, 10,  6,  9,  8],
        [ 5, 10,  9,  8,  6],
        [ 8,  7, 10,  9,  7]]), array([[ 8,  6,  6,  9, 10],
        [ 8,  8,  8, 10, 10],
        [ 5,  5,  5,  7, 10],
        [10,  7,  5, 10,  9],
        [10,  5,  8,  5, 10],
        [ 6,  7,  5,  9,  5],
        [ 8, 10,  9,  5,  5],
        [ 6, 10,  8,  7,  7],
        [10, 10,  7,  5,  5],
        [ 8,  8,  7,  8,  7]])]
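
`vsplit` and `hsplit` correspond to `np.split` with `axis=0` and `axis=1`. Here is a short sketch on a smaller array (`grid` and the other names are illustrative), which also shows that an integer can be passed in place of the list of indices:

```python
import numpy as np

grid = np.arange(16).reshape(4, 4)

# vsplit and hsplit are np.split along axis 0 and axis 1 respectively
upper, lower = np.split(grid, [2], axis=0)
left, right = np.split(grid, [2], axis=1)
print(upper.shape, lower.shape)  # (2, 4) (2, 4)
print(left.shape, right.shape)   # (4, 2) (4, 2)

# an integer instead of a list asks for that many equal sections
quarters = np.split(grid, 4, axis=0)
print(len(quarters), quarters[0].shape)  # 4 (1, 4)

print(np.array_equal(left, np.hsplit(grid, [2])[0]))  # True
```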

### 3.4.1 Exercise: Reshaping and Concatenating

We'll practice a few of these methods we've learned.

1. make `arr2` match the shape of `arr1` using `reshape`
2. stack `arr1` on top of `arr2` using `vstack` and call this new array `arr`
3. replace all the even columns of `arr` with zeros (counting the first column as 1, as in exercise 3.3.1)
4. return the sum of `arr` using `arr.sum()`

starting code:

```
np.random.seed(42)
arr1 = np.random.randint(5, 11, (5,10))
arr2 = np.random.randint(5, 11, (10,5))
```

expected output:

```
374
```

In [88]:
np.random.seed(42)
arr1 = np.random.randint(5, 11, (5,10))
arr2 = np.random.randint(5, 11, (10,5))
print(arr1,end='\n\n')
print(arr2)

[[ 8  9  7  9  9  6  7  7  7  9]
 [ 8  7 10  9  6  8 10 10  6  8]
 [ 9  5  8  6 10  9  8  5  5  7]
 [ 7  6  8  8 10 10 10  7  8  8]
 [ 5  7  9  7  9  5  6  8  5  8]]

[[10  6  6  5  6]
 [ 9  6  8  8  8]
 [ 8  9  7 10  5]
 [ 8  6  8  6 10]
 [10 10  6  8 10]
 [ 9  6  6  8  6]
 [ 6 10  8 10 10]
 [ 8  5 10  9  9]
 [ 6  9  6  5  8]
 [ 8  8  9  5  9]]


In [97]:
# cell for Ex 3.4.1

374