# NumPy Fundamentals: Array Basics

Until now, we have focused on Python basics and built-in functions. In order to access specific packages, like NumPy, we need to import them like so:

In [1]:
# import numpy under the shorthand alias np
import numpy as np

Using a standardized alias like `np` saves us a little bit of time later.

## NumPy Arrays
The most basic data structure in NumPy is an `ndarray`. 
The name `ndarray` is short for "n-dimensional array": it can store data in any number of dimensions.

Arrays are similar to lists in that they can be indexed with the same syntax; however, unlike lists, NumPy arrays must contain a single, homogeneous data type.
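To see what a homogeneous data type means in practice, here is a minimal sketch. The variable names are illustrative only; the point is that NumPy picks one `dtype` that every element must fit into.

```python
import numpy as np

# A list can mix types freely; an array converts everything to one dtype.
mixed_numbers = np.array([1, 2.5, 3])   # integers and a float
print(mixed_numbers, mixed_numbers.dtype.kind)   # kind 'f': every element is a float

# An integer array stays an integer array, even when we assign a float to it.
int_array = np.array([1, 2, 3])
int_array[0] = 2.9                      # the fractional part is dropped
print(int_array, int_array.dtype.kind)  # kind 'i': still integers
```

Because of this, it is worth checking `dtype` whenever an array holds values you did not expect.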

Let's start by defining a simple 1D array, which can also be referred to as a vector:

In [4]:
# Create a vector from a list
my_list = [1,2,3,4]
my_vector = np.array(my_list)
print(my_list, type(my_list))
print(my_vector, type(my_vector))

[1, 2, 3, 4] <class 'list'>
[1 2 3 4] <class 'numpy.ndarray'>


Using what we learned from Module 1, we can index the list and the vector exactly the same way:

In [5]:
# Index the first elements:
print(f"The first element of the list: { my_list[0] }")
print(f"The first element of the vector: { my_vector[0] }")

The first element of the list: 1
The first element of the vector: 1


With NumPy arrays, we can also make two-dimensional arrays. A 2D array looks like a list *of* lists. The syntax for slicing should look familiar from indexing lists and tuples: square brackets index the variable.

In [6]:
my_2d_array = np.array([[5,6,7], [7,8,9]])
print(f"2D array: \n{my_2d_array}\n")
print(f"Slice of 2D array: \n{my_2d_array[1][1:3]}")


2D array: 
[[5 6 7]
 [7 8 9]]

Slice of 2D array: 
[8 9]
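NumPy arrays also accept a row index and a column slice inside a single pair of brackets, separated by a comma. The sketch below uses a hypothetical array named `grid` with the same values as above and compares the two spellings.

```python
import numpy as np

grid = np.array([[5, 6, 7], [7, 8, 9]])

chained = grid[1][1:3]    # select row 1 first, then slice that row
combined = grid[1, 1:3]   # row index and column slice in one step
first_column = grid[:, 0] # every row, column 0

print(chained)       # [8 9]
print(combined)      # [8 9]
print(first_column)  # [5 7]
```

The comma form is the more common NumPy idiom, and it is the only one of the two that can select a column directly.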


We can also use indexing to reassign values, as we would with lists:

In [12]:
my_2d_array[1][1] = 0
print(my_2d_array)

[[5 6 7]
 [7 0 9]]


Similarly, we can index and reassign elements in 3D arrays:

In [13]:
my_3d_array = np.array([[[9,10], [11,12]], [[13,14], [15,16]]])
print(f"3D array: \n{my_3d_array}\n")
print(f"Slice of 3D array: \n{my_3d_array[0][1][1]}\n")
my_3d_array[1][1][0] = 5
print(f"New 3D array: \n{my_3d_array}")

3D array: 
[[[ 9 10]
  [11 12]]

 [[13 14]
  [15 16]]]

Slice of 3D array: 
12

New 3D array: 
[[[ 9 10]
  [11 12]]

 [[13 14]
  [ 5 16]]]


## Array Arithmetic

We will use the following arrays `u` and `v` for demonstration purposes.

In [14]:
u = np.array([1,2,3])
v = np.array([5,10,15])

NumPy arrays behave much like vectors do in calculus.

We can do simple operations such as scalar multiplication, where the scalar number is multiplied into each element of the array.

If we multiply `u` by the scalar value `3`, we should get a result of `[3,6,9]`. Let's check!

In [15]:
print(u * 3)

[3 6 9]


Many other scalar/vector operations are supported as well, such as addition, subtraction, division, etc.

In [16]:
print(u + 2)
print(u - 2)
print(u / 2)

[3 4 5]
[-1  0  1]
[0.5 1.  1.5]


NumPy offers functions for the following arithmetic operations, each of which works *element-wise*. This means the operation is done between the elements at corresponding indices.

| Element-wise Operation | Operator | NumPy function |
|------------------------|----------|----------------|
| Addition | `+` | `np.add()` |
| Subtraction | `-` | `np.subtract()` |
| Multiplication | `*` | `np.multiply()` |
| Division | `/` | `np.divide()` |
| Exponentiation | `**` | `np.power()` |
| Modulus | `%` | `np.mod()` |

In [20]:
print(u * v)

print(u * u)
print(u**2)

print(v / u)

print(v % u)

[ 5 20 45]
[1 4 9]
[1 4 9]
[5. 5. 5.]
[0 0 0]
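The named NumPy functions give the same results as the operators. As a quick check, this sketch redefines `u` and `v` so that it runs on its own and compares the two forms:

```python
import numpy as np

u = np.array([1, 2, 3])
v = np.array([5, 10, 15])

sum_uv = np.add(u, v)        # same as u + v
squares = np.power(u, 2)     # same as u ** 2
remainders = np.mod(v, u)    # same as v % u

print(sum_uv)
print(squares)
print(remainders)

# np.array_equal returns True when two arrays have the same shape and values
print(np.array_equal(np.multiply(u, v), u * v))  # True
```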


## Common Array Constructors

There are many ways to create an array in NumPy. The functions below are *convenience* constructors that make common ways of building an array easier.

In general, these constructors take two important parameters: `shape` and `dtype`. `shape` must be supplied by the user, while `dtype` usually defaults to `float`. `dtype` sets the data type of the array's elements.

`np.zeros(shape)` and `np.ones(shape)` create arrays filled with zeros or ones, respectively.

In [24]:
print(np.zeros(4))
print(np.ones(4))

[0. 0. 0. 0.]
[1. 1. 1. 1.]
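`shape` can also be a tuple, and `dtype` can be overridden. The following sketch (with illustrative variable names) shows both parameters in use:

```python
import numpy as np

zeros_2d = np.zeros((2, 3))        # shape given as a (rows, columns) tuple
ones_int = np.ones(4, dtype=int)   # override the default float dtype

print(zeros_2d)
print(zeros_2d.shape)   # (2, 3)
print(ones_int)         # [1 1 1 1]
```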


`np.full(shape,fill_value)` is similar in function to zeros and ones but will fill each element with `fill_value` rather than a 0 or 1.

This is a great constructor for filling arrays with non-standard values such as `np.nan` (which we will discuss below).

In [25]:
print(np.full(6,156))

print(np.full(3,np.nan))

[156 156 156 156 156 156]
[nan nan nan]


`np.tile(A,reps)` will create a single array by repeating an array `A` a number of times specified by `reps`.

In [26]:
print(np.tile(u,5))

[1 2 3 1 2 3 1 2 3 1 2 3 1 2 3]
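`reps` can also be a tuple, in which case the array is repeated along more than one dimension. A small sketch, redefining `u` so that it runs on its own:

```python
import numpy as np

u = np.array([1, 2, 3])

# Repeat u twice down the rows and twice across the columns
tiled_2d = np.tile(u, (2, 2))

print(tiled_2d)
# [[1 2 3 1 2 3]
#  [1 2 3 1 2 3]]
print(tiled_2d.shape)  # (2, 6)
```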


`np.identity(n)` creates an `n` x `n` square matrix in which the main diagonal elements are 1 and all others are 0.

Matrix-multiplying a matrix by an identity matrix leaves it unchanged, which is the matrix equivalent of multiplying a number by 1.

In [30]:
print(np.identity(3))

[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
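To illustrate, the sketch below multiplies a small example matrix (chosen arbitrarily) by the identity using the `@` matrix-multiplication operator. Note that `*` would instead multiply element-wise, as described in the arithmetic section.

```python
import numpy as np

matrix = np.array([[2, 3], [4, 5]])
identity = np.identity(2)

matrix_product = matrix @ identity      # matrix multiplication
elementwise_product = matrix * identity # element-wise multiplication

print(matrix_product)       # same values as matrix, stored as floats
print(elementwise_product)  # off-diagonal entries become 0

print(np.array_equal(matrix_product, matrix))  # True
```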


`np.empty(shape)` returns an array with uninitialized, arbitrary entries. This is usually faster than `np.zeros()` or `np.ones()`.

The elements are arbitrary and hold no meaning, so they should be assigned before they are read to get reproducible behavior.

In [31]:
np.empty(2)

array([2.05833592e-312, 2.33419537e-312])
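A typical pattern is to allocate with `np.empty` and then fill every element before using the array. This is only a sketch; the array name and the values stored are arbitrary choices for illustration.

```python
import numpy as np

squares = np.empty(5)        # contents are arbitrary at this point

for i in range(5):
    squares[i] = i ** 2      # assign every element before reading any of them

print(squares)
```

After the loop, every entry has a defined value, so the result is the same on every run.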

`np.nan` is a constant meaning "Not a Number."

NumPy, being an arithmetic package, uses `nan` to represent undefined values such as 0/0.

Because `nan` occurs when the math "doesn't work", NumPy will often emit a `RuntimeWarning` alongside the result. These warnings are not failures: the code still runs and returns `nan`.

In [33]:
print(np.nan)

print(np.log(-1))

print(np.array([0]) / 0)

nan
nan
[nan]
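Because `nan` never compares equal to anything, including itself, it is usually detected with `np.isnan`. The sketch below uses a made-up `data` array and also shows one way to silence the warning for a specific block with `np.errstate`.

```python
import numpy as np

data = np.array([1.0, np.nan, 3.0])

print(np.nan == np.nan)    # False: nan is not equal to itself
print(np.isnan(data))      # [False  True False]
print(np.nanmean(data))    # 2.0, the mean with nan values skipped

# Temporarily suppress the warnings for an operation we expect to produce nan
with np.errstate(divide="ignore", invalid="ignore"):
    result = np.array([0.0]) / 0

print(result)              # [nan]
```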




Another common need is to create an array that starts at one value and ends at another.

`np.linspace(start,stop,num)` (*linearly spaced elements*) creates an array of `num` evenly spaced numbers from `start` to `stop`, with both endpoints included. If `num` is omitted, it defaults to 50. This is best used when you know exactly how many elements you want but do not want to work out the step size yourself.

In [34]:
# using linspace with start, stop, and number of elements
print(np.linspace(1,5,5))

# using linspace without number of elements (num defaults to 50)
random_start_val = 30
random_stop_val = 50
print(np.linspace(random_start_val, random_stop_val))

[1. 2. 3. 4. 5.]
[30.         30.40816327 30.81632653 31.2244898  31.63265306 32.04081633
 32.44897959 32.85714286 33.26530612 33.67346939 34.08163265 34.48979592
 34.89795918 35.30612245 35.71428571 36.12244898 36.53061224 36.93877551
 37.34693878 37.75510204 38.16326531 38.57142857 38.97959184 39.3877551
 39.79591837 40.20408163 40.6122449  41.02040816 41.42857143 41.83673469
 42.24489796 42.65306122 43.06122449 43.46938776 43.87755102 44.28571429
 44.69387755 45.10204082 45.51020408 45.91836735 46.32653061 46.73469388
 47.14285714 47.55102041 47.95918367 48.36734694 48.7755102  49.18367347
 49.59183673 50.        ]
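If you want to know the spacing that `np.linspace` chose, the `retstep=True` option returns it along with the array. The sketch below also shows `endpoint=False`, which leaves `stop` out of the result. The values are chosen only for illustration.

```python
import numpy as np

# retstep=True returns a (samples, step) pair
points, step = np.linspace(0, 1, 5, retstep=True)
print(points)  # [0.   0.25 0.5  0.75 1.  ]
print(step)    # 0.25

# endpoint=False excludes the stop value
open_ended = np.linspace(0, 1, 4, endpoint=False)
print(open_ended)  # [0.   0.25 0.5  0.75]

# With num omitted, 50 points are generated
print(len(np.linspace(30, 50)))  # 50
```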


`np.arange(start,stop,step)` creates an array from `start` (inclusive) up to, but not including, `stop`, using a set `step` size.

Note: `np.arange` returns integers if the inputs are integers!

In [47]:
print(np.arange(1,6,2))

[1 3 5]
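A few more calls help show how the arguments and the resulting data type interact. This is a small sketch with illustrative variable names:

```python
import numpy as np

only_stop = np.arange(5)             # start defaults to 0, step defaults to 1
int_steps = np.arange(1, 6, 2)       # integer inputs give an integer array
float_steps = np.arange(0, 1, 0.25)  # a float step gives a float array

print(only_stop)    # [0 1 2 3 4]
print(int_steps)    # [1 3 5]
print(float_steps)  # [0.   0.25 0.5  0.75]
```

In every case the `stop` value itself is left out. When you need the endpoint included or a fixed number of elements, `np.linspace` is often the easier choice.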


Python's built-in `range(start,stop,step)` creates an object that produces values from `start` (inclusive) to `stop` (exclusive) in increments of `step`. Unlike `np.arange`, it does not build an array; it is mainly useful in `for` loops.

In [36]:
print(range(1,9,1))

for i in range(0,10,2):
    print(i)

range(1, 9)
0
2
4
6
8


## Determining Matrix Size

It's very common to need the number of elements in a list or array when programming. In Python, the words length, size, and shape all have different meanings!

The `len` function in Python is short for "length" and returns the number of items in a list. 

Let's see a quick example using our list from earlier: 

In [37]:
my_list_length = len(my_list)
print(f" The list \n {my_list} \n has {my_list_length} elements")

 The list 
 [1, 2, 3, 4] 
 has 4 elements


The `len` function works the same way for both lists and vectors like `my_vector`, which is a 1D array.

The code below should return the same values for `my_list` and `my_vector` because they have the same number of elements:

In [38]:
list_length = len(my_list)
vector_length = len(my_vector)

print(my_list, type(my_list))
print(f"List length: {list_length}\n")

print(my_vector, type(my_vector))
print(f"Vector length: {vector_length}")

[1, 2, 3, 4] <class 'list'>
List length: 4

[1 2 3 4] <class 'numpy.ndarray'>
Vector length: 4


What happens if we use `len` on a nested list? In this case, the function does not care about the contents of the nested lists. It will treat the nested lists like any other data type. 

Here's an example of a list containing three nested lists:

In [39]:
my_nested_list = [[1, 2], [3, 4, 5], [6]]
print(f"Number of elements in nested list: {len(my_nested_list)}")

Number of elements in nested list: 3


Even though there are 6 integers in total contained in the nested list, the `len` function returns 3 because it only counts the nested lists within the list as elements. 

We can show this by printing the types of each element:

In [40]:
for element in my_nested_list:
    print(type(element))

<class 'list'>
<class 'list'>
<class 'list'>


If we want the length of an individual nested list, we can specify which nested list and print its length using the following code:

In [41]:
# Length of the first nested list
print(len(my_nested_list[0]))   

# Length of the second nested list
print(len(my_nested_list[1]))

# Length of the third nested list   
print(len(my_nested_list[2]))   

2
3
1


How will `len()` work with a 2D array?

In [42]:
new_2d_array = np.array([[7, 8, 9], [10, 11, 12]])

print(f"Nested list: \n{my_nested_list} \nlength: {len(my_nested_list)}\n")
print(f"2D array: \n{new_2d_array} \nlength: {len(new_2d_array)}")

Nested list: 
[[1, 2], [3, 4, 5], [6]] 
length: 3

2D array: 
[[ 7  8  9]
 [10 11 12]] 
length: 2


Looks like `len` returns the number of rows in our matrix!

Even though we see six total elements in each, `len(my_nested_list) != len(new_2d_array)` because of the differences in structure.

### The `shape` and `size` properties
But what if we want the number of columns? Or the total number of elements in our matrix? 

With 2D arrays, we have a couple of properties that return similar information to `len`. We can use `size` to return the total number of elements and `shape` to return the total number of rows and columns (as a tuple).


In [44]:
# Get size and shape of 2D array
elements_in_array = my_2d_array.size
num_rows, num_cols = my_2d_array.shape
print(f" The array \n\n {my_2d_array} \n\n has {elements_in_array} elements ")
print(f" Its shape is: {my_2d_array.shape} ")
print(f" This means that there are {num_rows} rows and {num_cols} columns ")

 The array 

 [[5 6 7]
 [7 0 9]] 

 has 6 elements 
 Its shape is: (2, 3) 
 This means that there are 2 rows and 3 columns 


In [45]:
# Get size and shape of 3D array
elements_in_3d_array = my_3d_array.size
# The first axis counts the 2D layers; the last two axes are rows and columns
num_layers, num_rows, num_cols = my_3d_array.shape
print(f" The array \n\n {my_3d_array} \n\n has {elements_in_3d_array} elements ")
print(f" Its shape is: {my_3d_array.shape} ")
print(f" This means that there are {num_rows} rows, {num_cols} columns, and {num_layers} layers ")

 The array 

 [[[ 9 10]
  [11 12]]

 [[13 14]
  [ 5 16]]] 

 has 8 elements 
 Its shape is: (2, 2, 2) 
 This means that there are 2 rows, 2 columns, and 2 layers 


To summarize:
- `len`: returns the number of elements in a list/1D array *or* the length of the first dimension for arrays with 2 or more dimensions (the number of rows in a 2D array)
- `shape`: a tuple of numbers representing the size of each dimension
- `size`: the total number of elements
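
The summary above can be checked side by side. The sketch below builds three example arrays (the shapes are arbitrary choices) and prints `len`, `shape`, and `size` for each. It also prints `ndim`, the number of dimensions.

```python
import numpy as np

examples = {
    "1D": np.arange(4),          # shape (4,)
    "2D": np.zeros((2, 3)),      # shape (2, 3)
    "3D": np.zeros((2, 3, 4)),   # shape (2, 3, 4)
}

for name, arr in examples.items():
    print(f"{name}: len={len(arr)}, shape={arr.shape}, size={arr.size}, ndim={arr.ndim}")
    # len always matches the first entry of shape
    assert len(arr) == arr.shape[0]
```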