# Lecture 4-1

# NumPy Basics

## Week 4 Monday

## Miles Chen, PhD

Based on Python Data Science Handbook by Jake VanderPlas

## ALWAYS do: `import numpy as np`
This is a convention that everyone follows. If you do not do this, other people will have a hard time reading your code.

In [2]:
import numpy as np

In [None]:
np.__version__

## Numpy arrays
- like lists, arrays are mutable
- unlike lists, arrays can only contain data of the same data type

## Making Arrays
- direct creation with `np.array()`
- Create a list with square brackets, and put that inside `np.array()`

In [None]:
np.array( [1,2,3] )

In an array, all the values have to be of the same datatype. This is different from lists, which can have any datatype, including functions. Arrays are also mutable.
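A minimal sketch of both properties (mutable, single data type). The name `arr` is only for illustration. Note that assigning a float into an integer array does not change the array's dtype; the value is converted to fit.

```python
import numpy as np

arr = np.array([1, 2, 3])
original_dtype = arr.dtype   # an integer dtype (exact width depends on the platform)

arr[0] = 10                  # arrays are mutable: element assignment works in place
arr[1] = 2.9                 # the float is converted to the integer dtype (truncated to 2)

print(arr)                          # [10  2  3]
print(arr.dtype == original_dtype)  # True: the dtype did not change
```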

In [None]:
a = np.array([1, 2, 3])
print(a) # a printed array looks different from the array([...]) form that IPython displays

In [None]:
print([1,2,3]) # a printed list has commas

A printed array has no commas. A printed list has commas.

In [None]:
type(a)

In the following code cells, I'll sometimes just produce the array directly and sometimes I will call print on the array. Get used to seeing both kinds of output.

### Upcasting

If you mix data types in an array, the values of the more restrictive types get upcast to the least restrictive type present, so that every element ends up with the same data type.

In [None]:
b = np.array([1, 2, 3.0, False, True])
print(b) # the 3.0 is a float and will upcast (coerce) other values to floats

In [None]:
c = np.array([1, 2, "3", True, False]) # upcast (coerced) to strings
print(c)
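One way to confirm what upcasting did is to inspect the `dtype` of the result. The sketch below repeats the two arrays above and adds a hypothetical third array `f` to show that you can also request a type explicitly with the `dtype` argument.

```python
import numpy as np

b = np.array([1, 2, 3.0, False, True])
c = np.array([1, 2, "3", True, False])

print(b.dtype)        # float64
print(c.dtype.kind)   # U, meaning a Unicode string dtype

# Passing dtype explicitly makes the intended type visible to the reader.
f = np.array([1, 2, 3], dtype=float)
print(f)              # [1. 2. 3.]
```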

### Arrays in Higher dimensions

If you provide a list of lists, you can create a multi-dimensional array. (Like a matrix)

In [3]:
d = np.array( [ [1,2,3] , [4,5,6] ] ) 
print(d)

[[1 2 3]
 [4 5 6]]


When you print a multidimensional array, the number of opening square brackets is the number of dimensions. The above array is 2 dimensional.

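If the rows have different lengths, NumPy cannot build a regular 2 dimensional array. Older NumPy versions (before 1.24) issue a deprecation warning and fall back to a 1 dimensional array of Python lists, as in the output below. NumPy 1.24 and later raise a `ValueError` instead.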

In [4]:
e = np.array([ [1,2,3],[4,5] ])
print(e)

[list([1, 2, 3]) list([4, 5])]
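Because the behavior depends on the NumPy version, a hedged sketch that runs either way is shown here. The names `rows`, `outcome`, and `ragged` are illustrative only. Requesting `dtype=object` explicitly builds the array of lists on both old and new versions.

```python
import numpy as np

rows = [[1, 2, 3], [4, 5]]   # rows of unequal length

try:
    e = np.array(rows)
    outcome = "object array"   # older NumPy: 1-D array of Python lists, with a warning
except ValueError:
    outcome = "ValueError"     # NumPy 1.24 and later refuse to build the array

print(outcome)

# Asking for dtype=object explicitly works in either case.
ragged = np.array(rows, dtype=object)
print(ragged.shape)   # (2,)
print(ragged.dtype)   # object
```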




## Other ways to make arrays

In [None]:
print(np.zeros(5)) # makes an array of 0s. similar to rep(0, 5) in R

In [None]:
print(np.zeros(5, dtype = int))  # default is to make floats, you can specify ints

In [None]:
print(np.zeros((2,4)))  # give dimensions as a tuple: makes an array 2x4

In [None]:
print(np.zeros((2,3,4))) # 3 dimensional array 2 x 3 x 4... 
# notice the order of creation: 2 'sheets' of 3 rows by 4 columns

In [None]:
np.zeros((2,3,4,5))
# make 2 'blocks', each with 3 'sheets', of 4 rows, and 5 columns

The output contains 2 'blocks'; each block holds 3 'sheets' of 4 rows by 5 columns.

In addition to `np.zeros` there is `np.ones` and `np.full` which can create new arrays.

In [None]:
print(np.ones(5))  # similar, but inserts ones

In [None]:
print(np.full((2,3), 1.2))# similar, but you specify one value that gets repeated
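To check how the shape tuple maps onto blocks, sheets, rows, and columns, you can look at `.shape` after peeling off leading dimensions. This is a small sketch; the names `blocks` and `filled` are illustrative.

```python
import numpy as np

blocks = np.zeros((2, 3, 4, 5))
print(blocks.shape)        # (2, 3, 4, 5)
print(blocks[0].shape)     # (3, 4, 5): one block of 3 sheets
print(blocks[0, 0].shape)  # (4, 5): one sheet of 4 rows by 5 columns

# np.ones accepts dtype just like np.zeros
print(np.ones((2, 3), dtype=int))

# np.full takes its data type from the fill value unless dtype is given
filled = np.full((2, 3), 1.2)
print(filled.dtype)        # float64
```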

## Making arrays of random numbers
The `np.random` functions shown here use the Mersenne Twister (MT19937) generator.
- All of these random generator functions begin with `np.random.`

In [None]:
np.random.seed(1)  # seed the generator for reproducibility

In [None]:
print(np.random.random(5))  # random.random for random values on the interval [0,1)

In [None]:
np.random.randn(5)
# random.randn for random normal from standard normal
# this command will produce 5 values

In [None]:
np.random.normal(10, 3, (2, 4))
# random.normal for random values from a normal distribution with mean 10 and sd 3
# arranged in a 2 x 4 matrix

In [None]:
np.random.randint(0, 10, 20) 
# select random integers from 0 inclusive to 10 exclusive
# and return 20 values

In [None]:
# simulate dice rolls
np.random.randint(1,7, 50)

More random generation at: https://numpy.org/doc/stable/reference/random/index.html
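The point of seeding is that the same seed reproduces the same values. The sketch below checks that, and also shows the `Generator` interface from the page linked above (`np.random.default_rng`) as an alternative to the functions used in this lecture. The names `first`, `second`, `rng`, and `rolls` are illustrative.

```python
import numpy as np

# Re-seeding with the same value restarts the same sequence.
np.random.seed(1)
first = np.random.random(3)
np.random.seed(1)
second = np.random.random(3)
print(np.array_equal(first, second))   # True

# Alternative: a Generator object carries its own seed and state.
rng = np.random.default_rng(1)
rolls = rng.integers(1, 7, size=50)    # dice rolls: 1 inclusive to 7 exclusive
print(rolls.min() >= 1, rolls.max() <= 6)
```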

## Array sequences
make sequences with

- `np.arange(start, stop, step)`  
-  makes an **a**rray **range** from start (inclusive) to stop (exclusive), by step

In [None]:
range(0, 10, 2) # range object in regular python

In [None]:
list(range(0, 10, 2))

In [None]:
np.arange(0, 10, 2)  # numpy's arange function

In [None]:
np.array(range(0,10,2)) # equivalent 'manual' creation

In [None]:
np.arange(0, 100, 5)

In [None]:
np.arange(20) # quickest

- `np.linspace(start, stop, num)`
- makes an array of **lin**ear **space**d values beginning with start, ending with stop (inclusive), with a length of num

In [None]:
np.linspace(0, 100, 11)

In [None]:
np.linspace(0, 100, 10)

In [None]:
np.linspace(0, 100, 10, endpoint = False)  
# optional parameter endpoint to exclude the stop value

In [None]:
np.linspace(0, 100, 9, endpoint = False) 
# with endpoint = False, the last number in the array depends on the output length
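The spacing that `np.linspace` chooses can be inspected with its `retstep=True` option, which returns the step alongside the array. A short sketch (variable names are illustrative): with the endpoint included the step is (stop - start) / (num - 1), and with `endpoint=False` it is (stop - start) / num.

```python
import numpy as np

_, step_11 = np.linspace(0, 100, 11, retstep=True)
_, step_10 = np.linspace(0, 100, 10, retstep=True)
values, step_open = np.linspace(0, 100, 10, endpoint=False, retstep=True)

print(step_11)      # 10.0, which is 100 / (11 - 1)
print(step_10)      # 100 / (10 - 1), about 11.11
print(step_open)    # 10.0, which is 100 / 10
print(values[-1])   # 90.0, the stop value itself is excluded
```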

# Array Attributes
- `array.ndim` for number of dimensions
- `array.shape` for the size of each dimension
- `array.dtype` for the data type 

In [None]:
x = np.ones((3,4))
print(x)

In [None]:
x.ndim

In [None]:
x.shape

In [None]:
x.dtype

In [None]:
y = np.arange(0, 12, 1)
print(y)

In [None]:
y.ndim

In [None]:
y.shape # a one dimensional array. Note that there's no second dimension.
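The attributes are related: `ndim` is always the length of the `shape` tuple. A quick sketch using the same two arrays (`ndim_matches` is an illustrative name):

```python
import numpy as np

x = np.ones((3, 4))
y = np.arange(0, 12, 1)

# ndim equals the number of entries in shape
ndim_matches = all(arr.ndim == len(arr.shape) for arr in (x, y))
print(ndim_matches)      # True

print(x.shape, y.shape)  # (3, 4) (12,)
print(x.dtype)           # float64
print(y.dtype.kind)      # i, meaning a signed integer dtype
```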

## Reshaping Arrays

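- `np.reshape(array, [new shape])` returns a reshaped array (a view of the same data when possible) and leaves the shape of the original array unchanged
    - you can also use the method `array.reshape(shape)`
- `array.T` is the transpose attribute; it returns the transposed array and leaves the original array unaffected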


In [None]:
j = np.arange(0,12,1)
print(j) # j is one dimensional

In [None]:
k = np.reshape(j, (3,4))  # note that it fills row-wise unlike R
print(k)

In [None]:
j # j is left unchanged

In [None]:
j.reshape(4,3)  # you can also call the method reshape() on the array j

In [None]:
j # j is left unchanged here as well

In [None]:
print(k)

In [None]:
print(k.T)  # the transpose of k

In [None]:
print(k) # calling k.T does not modify the original k array
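Although reshaping and transposing leave the shape of the original alone, the results here share their data with it. The sketch below uses `np.shares_memory` to check this, then shows that writing into the reshaped array is visible through `j`.

```python
import numpy as np

j = np.arange(0, 12, 1)
k = np.reshape(j, (3, 4))

print(np.shares_memory(j, k))    # True: k looks at the same data as j
print(np.shares_memory(k, k.T))  # True: the transpose does too

k[0, 0] = 100    # write through the reshaped array
print(j[0])      # 100
print(j.shape)   # (12,), the shape of j is still unchanged
```

If you need an independent array, call `.copy()` on the result.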

In [None]:
# can combine the above methods and steps into one:
l = np.arange(0,12,1).reshape((3,4)).T
# create a-range >> reshape >> transpose
print(l)

In [None]:
j

In [None]:
j.reshape((3, -1))  # using -1 for a dimension asks NumPy to figure out the number to use for that dimension

In [None]:
j.reshape((-1, 4))

In [None]:
j.reshape((2, -1, 2)) # two sheets, unknown number of rows, 2 columns
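The `-1` is filled in by dividing the total number of elements by the product of the other dimensions, so it only works when that division is exact. A small sketch (the name `message` is illustrative):

```python
import numpy as np

j = np.arange(0, 12, 1)

print(j.reshape((3, -1)).shape)     # (3, 4) because 12 / 3 = 4
print(j.reshape((2, -1, 2)).shape)  # (2, 3, 2) because 12 / (2 * 2) = 3

# 12 elements cannot be split into 5 equal rows, so this fails
try:
    j.reshape((5, -1))
    message = "reshaped"
except ValueError as err:
    message = f"ValueError: {err}"
print(message)
```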

In [None]:
y = np.arange(0,12, 1)
print(y)

In [None]:
y.shape

In [None]:
print(y.T) # the transpose of a one dimensional array doesn't suddenly give it a second dimension

In [None]:
y.T.shape

In [None]:
z = np.reshape(y, (1,12)) # the array now has two dimensions
print(z)

In [None]:
z.shape

In [None]:
print(z.T)  # with two dimensions, the transpose becomes a column

In [None]:
z.T.shape
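If the goal is a column, you do not have to go through a row and a transpose. The sketch below shows two other ways to get the same (12, 1) array: reshaping with `-1`, and indexing with `np.newaxis`. The names `col_a`, `col_b`, and `row` are illustrative.

```python
import numpy as np

y = np.arange(0, 12, 1)

col_a = y.reshape(-1, 1)    # as many rows as needed, 1 column
col_b = y[:, np.newaxis]    # add a new axis of length 1 after the existing one
row = y.reshape(1, -1)      # 1 row, as many columns as needed

print(col_a.shape)                   # (12, 1)
print(np.array_equal(col_a, col_b))  # True
print(np.array_equal(row.T, col_a))  # True
```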

# Subsetting and Slicing Arrays
- very similar to subsetting and slicing lists

In [None]:
y = np.arange(0,12, 1)
print(y)

In [None]:
y[4]

In [None]:
y.shape

In [None]:
y[4:6]

You can slice with a second colon. The array gets subset with `array[start:stop:step]`.

In [None]:
y[1:8:3]

In [None]:
np.arange(100)[:100:2] # to get even values

In [None]:
np.arange(0,100,2)
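As with lists, any of start, stop, and step can be left out, and negative values count from the end or step backwards. A few more cases on the same array:

```python
import numpy as np

y = np.arange(0, 12, 1)

print(y[1:8:3])   # [1 4 7]
print(y[::2])     # every second value, starting at index 0
print(y[-3:])     # the last three values
print(y[::-1])    # the whole array in reverse order
```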

Subsetting and slicing higher dimensional arrays is similar, and uses a comma to separate subsetting instructions for each dimension.

In [None]:
z = np.reshape(y, [3,4])
print(z)

In [None]:
print(z[1,2]) # returns what is at row index 1, col index 2

In [None]:
type(z[1,2]) # with only one value, the result is a NumPy integer scalar (e.g. numpy.int64). It is no longer an array.

In [None]:
print(z)
z[0:2, 0:2] # note the type remains a numpy array

In [None]:
print(z[2, :]) # returns row at index 2

In [None]:
print(z[2,])

In [None]:
z[2, :].shape  # the shape is one dimensional

In [None]:
print(z[:,2]) # returns column at index 2

In [None]:
z[:,2].shape # shape is one dimensional
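Indexing a dimension with a single integer drops that dimension, while a slice of length one keeps it. This sketch compares the resulting shapes:

```python
import numpy as np

z = np.arange(12).reshape(3, 4)

print(z[2, :].shape)     # (4,)   integer index: the row dimension is dropped
print(z[2:3, :].shape)   # (1, 4) slice: still two dimensional
print(z[:, 2].shape)     # (3,)   integer index: the column dimension is dropped
print(z[:, 2:3].shape)   # (3, 1) slice: still two dimensional
```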

#### Slices of numpy arrays are view objects, and automatically update if the original array is updated.

In [2]:
z = np.arange(12).reshape([3,4])
print(z)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]


In [None]:
# we use numpy array slicing to create z_sub, the top left corner of z
z_sub = z[:2, :2]
print(z_sub)

In [3]:
# I modify the first element of z to be 99.
z[0,0] = 99

In [None]:
print(z_sub)  # z_sub is updated, even though we never redefined it

In [None]:
z

In [4]:
z = np.arange(15).reshape([3,5]) # here z gets redefined to an entirely new object
# we are not modifying the object that used to be called z
# we created a new object, and the name z now points to the new object

In [5]:
z

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [None]:
print(z_sub)  # the view z_sub still points to the object formerly known as z, which was not modified 
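The steps above can be checked with `np.shares_memory`, which reports whether two arrays look at the same underlying data. This is a sketch of the same sequence: the view tracks changes to the original array, and stops being connected to the name `z` once that name is bound to a new object.

```python
import numpy as np

z = np.arange(12).reshape([3, 4])
z_sub = z[:2, :2]
print(np.shares_memory(z, z_sub))   # True: z_sub is a view of z's data

z[0, 0] = 99
print(z_sub[0, 0])                  # 99: the change is visible through the view

z = np.arange(15).reshape([3, 5])   # the name z now points to a new object
print(np.shares_memory(z, z_sub))   # False
print(z_sub[0, 0])                  # 99: still the old data
```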

## If you want a copy that will not update if the original is updated, use `array.copy()`

In [6]:
z_sub_copy = z[:2, :2].copy()
print(z_sub_copy)

[[0 1]
 [5 6]]


In [7]:
z[0,0] = 55 # modify the first element of z

In [8]:
print(z_sub_copy) # the copy remains unaffected by the change

[[0 1]
 [5 6]]


In [9]:
print(z)

[[55  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]]


In [10]:
z_snapshot = z[:, 3].copy()
print(z_snapshot)
print(z)
z[:, 3] = [1, 2, 3]
print(z)

[ 3  8 13]
[[55  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]]
[[55  1  2  1  4]
 [ 5  6  7  2  9]
 [10 11 12  3 14]]


In [None]:
print(z)
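A compact sketch of the same idea: a copy does not share data with the original, so later changes to the original do not reach it.

```python
import numpy as np

z = np.arange(15).reshape([3, 5])
z_sub_copy = z[:2, :2].copy()

print(np.shares_memory(z, z_sub_copy))   # False: the copy has its own data

z[0, 0] = 55
print(z[0, 0])            # 55
print(z_sub_copy[0, 0])   # 0: the copy keeps the old value
```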

Modifying the view object modifies the underlying array

In [None]:
z = np.arange(12).reshape((3,4))
print(z)

In [None]:
view = z[:2,:2]

In [None]:
view[0,0] = 99

In [None]:
view

In [None]:
z

In [None]:
type(view) # view objects themselves are arrays and have all the same methods and attributes

In [None]:
view.T

In [None]:
view.T.reshape((4,))

In [None]:
view # attributes like .T do not affect the original array
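Not every operation on a view returns another view. In the sketch below, `view.T.reshape((4,))` has to reorder the values, so NumPy returns a new array with its own data; writing into it leaves `z` alone. The name `flat` is illustrative, and `np.shares_memory` is used to check which case applies.

```python
import numpy as np

z = np.arange(12).reshape((3, 4))
view = z[:2, :2]

view[0, 0] = 99
print(z[0, 0])                     # 99: writing to the view changes z

flat = view.T.reshape((4,))
print(flat)                        # [99  4  1  5]
print(np.shares_memory(z, flat))   # False: this result is a copy

flat[0] = -1
print(z[0, 0])                     # 99: z is not affected by changes to the copy
```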