# The NumPy ndarray: A Multidimensional Array Object

1. [Introduction](#introduction)
2. [Creating ndarrays](#creating)
3. [Data Types for ndarrays](#datatypes)
4. [Arithmetic with NumPy Arrays](#arithmetic)
5. [Basic Indexing and Slicing](#indexing)
6. [Boolean Indexing](#boolean)
7. [Fancy Indexing](#fancy)
8. [Transposing Arrays and Swapping Axes](#transposing)

<a name="introduction"></a>
# Introduction

The N-dimensional array object (`ndarray`) is a fast, flexible container for large datasets.

It lets you perform mathematical operations on whole blocks of data using syntax similar to the equivalent operations between scalar elements.

`ndarrays` are generic multidimensional containers for homogeneous data - **all elements must be the same type**

Every array has a `shape` attribute that is a tuple indicating the size of each dimension. They also have a `dtype` attribute that indicates the data type of objects stored in the array.

Quick example:

1. Create an array
2. Multiply everything in the array by 10
3. Add the array to itself
4. Review the `shape` and `dtype` of the array

In [3]:
import numpy as np

myData  = np.array([[1.5, -0.1, 3], [0, -3, 6.5]])
myData

array([[ 1.5, -0.1,  3. ],
       [ 0. , -3. ,  6.5]])

In [4]:
myData * 10

array([[ 15.,  -1.,  30.],
       [  0., -30.,  65.]])

In [5]:
myData + myData

array([[ 3. , -0.2,  6. ],
       [ 0. , -6. , 13. ]])

In [6]:
print(myData.shape)
print(myData.dtype)

(2, 3)
float64


<a name="creating"></a>
# Creating ndarrays

The `array` function is the quickest way to create an ndarray. Any sequence-like object (even other arrays) can be provided as input.

The shape of the resulting array will depend on the input:

In [7]:
# Single list makes a 1-d array
data1 = [6, 7.5, 8, 0, 1]
arr1 = np.array(data1)
print(arr1)
print(arr1.shape)

[6.  7.5 8.  0.  1. ]
(5,)


In [8]:
# Nested sequence will make a 2-d array
data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]
arr2 = np.array(data2)
print(arr2)
print(arr2.shape)
print(arr2.ndim)

[[1 2 3 4]
 [5 6 7 8]]
(2, 4)
2


## Other array creation functions

In addition to `numpy.array`, there are many other creation functions, such as `zeros`, `ones`, and `empty`, which create arrays of the specified dimension(s) populated with 0s, 1s, or 'garbage' (uninitialized) values, respectively.

<img src="./myImages/table4.1_arrayCreation.png" width=500/>

In [9]:
# One-dimensional array
np.zeros(10)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [10]:
# Two-dimensional array
np.ones((3, 6))

array([[1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1.]])

In [11]:
# Three-dimensional array
# (values are uninitialized memory - they may happen to be 0, but that isn't guaranteed)
np.empty((2, 3, 2))

array([[[0., 0.],
        [0., 0.],
        [0., 0.]],

       [[0., 0.],
        [0., 0.],
        [0., 0.]]])

In [12]:
# arange ~ range
np.arange(15)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])
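A few of the other creation functions are worth a quick look too. Below is a minimal sketch using `zeros_like`, `ones_like`, `full`, and `eye`; the variable names (`template`, `filled`, etc.) are made up for illustration only.

```python
import numpy as np

# Hypothetical template array, used only for its shape and dtype
template = np.array([[1, 2, 3], [4, 5, 6]])

zeros_like = np.zeros_like(template)   # same shape and dtype as template, all 0s
ones_like = np.ones_like(template)     # same shape and dtype as template, all 1s
filled = np.full((2, 2), 7.5)          # 2 x 2 array where every element is 7.5
identity = np.eye(3)                   # 3 x 3 identity matrix (1s on the diagonal)

print(zeros_like.shape, zeros_like.dtype == template.dtype)
print(filled)
print(identity)
```

The `*_like` functions are handy when you already have an array and just want another one with matching shape and type.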

<a name="datatypes"></a>
# Data Types for ndarrays

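You can set the `dtype` explicitly during array creation. If not specified, NumPy infers the most reasonable type from the data - `float64` for floating-point values, and typically `int64` for Python integers on 64-bit platforms. Creation functions like `zeros` and `ones` default to `float64`.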

The significance of how arrays handle data types seems more important when dealing with lower-level languages. For now, it's enough to know that every array has only one type.

An array can be cast to a different type using `astype` (as long as the values can be converted).

By default, `astype` creates a new array (i.e. a copy) every time, even if the types are the same.

You can pass another array's `dtype` attribute to `astype` in order to cast one array to another's type.

<img src="./myImages/table4.2_dataTypes.png" width = 500/>

Note that some of the types above are referred to as `signed` or `unsigned`.  

`Signed` integers can be positive or negative, while `unsigned` integers can only be zero or positive.

Also note that you can use the "type code" (e.g. `"f8"` for `float64`) instead of the type.

In [13]:
arr1 = np.array([1, 2, 3], dtype=np.float64)
arr2 = np.array([1, 2, 3], dtype=np.int32)
print(arr1.dtype)
print(arr2.dtype)

float64
int32


In [14]:
# Int to float adds the .
arr = np.array([1, 2, 3, 4, 5])
print(arr)
print(arr.dtype)
arr_float = arr.astype(np.float64)
print(arr_float)
print(arr_float.dtype)

[1 2 3 4 5]
int64
[1. 2. 3. 4. 5.]
float64


In [15]:
# Float to int truncates decimal (WITHOUT ROUNDING!!)
arr = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])
print(arr)
arr_int = arr.astype(np.int32)
print(arr_int)

[ 3.7 -1.2 -2.6  0.5 12.9 10.1]
[ 3 -1 -2  0 12 10]


In [16]:
# Acceptable strings can be converted
# (np.bytes_ is used here because the np.string_ alias was removed in NumPy 2.0)
numeric_strings = np.array(["1.25", "-9.6", "42"], dtype=np.bytes_)
print(numeric_strings)
print(numeric_strings.astype(float))

[b'1.25' b'-9.6' b'42']
[ 1.25 -9.6  42.  ]


In [17]:
# Assign an array to another's type
int_array = np.arange(10)
calibers = np.array([.22, .270, .357, .380, .44, .50], dtype=np.float64)
print(int_array)
print(int_array.astype(calibers.dtype))

[0 1 2 3 4 5 6 7 8 9]
[0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
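To make the "type code" and "`astype` copies" points concrete, here is a minimal sketch. The variable names are hypothetical, and `np.shares_memory` is used only as a quick way to check whether two arrays overlap in memory.

```python
import numpy as np

# Type codes are short strings that stand in for the full dtype
arr_f8 = np.array([1, 2, 3], dtype="f8")   # same as np.float64
arr_i4 = np.array([1, 2, 3], dtype="i4")   # same as np.int32
arr_u1 = np.array([1, 2, 3], dtype="u1")   # same as np.uint8 (unsigned)

print(arr_f8.dtype, arr_i4.dtype, arr_u1.dtype)

# astype copies by default, even when the target type matches the current one
same_type = arr_f8.astype(np.float64)
same_type[0] = 99.0
print(arr_f8)                                # the original is untouched
print(np.shares_memory(arr_f8, same_type))   # no shared memory -> a real copy
```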


<a name="arithmetic"></a>
# Arithmetic with NumPy Arrays

**Important!!!**

**You can do batch operations without for loops using this**

This is called *vectorization*. It is also used in R, although I don't think I ever utilized it to the full extent.

1. Any arithmetic operation between equal-sized arrays is applied element-wise  
    - This means that comparisons between equal-sized arrays (e.g. >) will result in a same-sized boolean array  
2. Arithmetic ops with scalars will propagate the scalar to each element in the array  
3. Arithmetic ops between differently-sized arrays are more complex. This is called broadcasting and is covered in Appendix A.  

In [18]:
arr = np.array([[1., 2., 3.], [4., 5., 6.]])
arr

array([[1., 2., 3.],
       [4., 5., 6.]])

In [19]:
arr * arr

array([[ 1.,  4.,  9.],
       [16., 25., 36.]])

In [20]:
arr - arr

array([[0., 0., 0.],
       [0., 0., 0.]])

In [21]:
1 / arr

array([[1.        , 0.5       , 0.33333333],
       [0.25      , 0.2       , 0.16666667]])

In [22]:
arr2 = np.array([[0., 4., 1.], [7., 2., 12.]])
arr2

array([[ 0.,  4.,  1.],
       [ 7.,  2., 12.]])

In [23]:
arr2 > arr

array([[False,  True, False],
       [ True, False,  True]])
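Broadcasting between differently-sized arrays is only mentioned in passing above, so here is a minimal sketch of the two simplest cases. The `row` and `col` arrays are made up for illustration; this is not a full treatment of the broadcasting rules.

```python
import numpy as np

arr = np.array([[1., 2., 3.], [4., 5., 6.]])   # shape (2, 3)
row = np.array([10., 20., 30.])                # shape (3,)
col = np.array([[100.], [200.]])               # shape (2, 1)

# The 1-d row is stretched across both rows of arr
print(arr + row)

# The (2, 1) column is stretched across all three columns of arr
print(arr + col)
```

In both cases the smaller array is (conceptually) repeated along the axis where its size is 1 or missing, so the shapes line up before the element-wise operation is applied.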

<a name="indexing"></a>
# Basic Indexing and Slicing

This is complex and also important. There are a lot of ways you might want to subset an array.

## 1-d arrays

These act very similarly to lists - slice them with brackets.

A single element indexed from a 1-d array is a scalar; multi-element slices are just smaller arrays.

It's important to note that **no copy** is made here: a slice is a view, so if you grab a slice and modify it, the modifications are propagated to the original object.

The `copy()` method must be used to explicitly create and modify a copy without altering the original.

Below is a brief example of broadcasting. If you assign a single value to a multi-element slice of an array, that value will be broadcast to all those elements.

In [24]:
arr = np.arange(10)
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [25]:
# Slicing
print(arr[5])
print(arr[5:8])

5
[5 6 7]


In [26]:
# Broadcast a value
arr[5:8] = 12
arr

array([ 0,  1,  2,  3,  4, 12, 12, 12,  8,  9])

In [27]:
# same-sized tuple will assign 1-to-1
arr[5:8] = (15, 16, 17)
arr

array([ 0,  1,  2,  3,  4, 15, 16, 17,  8,  9])

In [28]:
# Assign a slice
arr_slice = arr[5:8]
arr_slice

array([15, 16, 17])

In [29]:
# Modify slice
arr_slice[1] = 12345
arr_slice

array([   15, 12345,    17])

In [30]:
# Gets propagated to original
arr

array([    0,     1,     2,     3,     4,    15, 12345,    17,     8,
           9])

In [31]:
# Assign to all values in an array with [:]
arr_slice[:] = 64
arr

array([ 0,  1,  2,  3,  4, 64, 64, 64,  8,  9])

In [32]:
arr[:] = 2
arr

array([2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
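The cells above show the view behavior, but not `copy()`. A minimal sketch of the difference is below; the names `view` and `independent` are just for illustration, and `np.shares_memory` is used as a quick check.

```python
import numpy as np

arr = np.arange(10)

view = arr[5:8]                  # a slice is a view onto the same memory
independent = arr[5:8].copy()    # copy() makes a separate array

view[0] = -1          # shows up in arr
independent[1] = -2   # does NOT show up in arr

print(arr)
print(np.shares_memory(arr, view))
print(np.shares_memory(arr, independent))
```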

## Multi-d Arrays

In a 2-d array, the element at each index is a one-dimensional array.

Access individual elements with chained brackets (`arr[0][2]`) or with a comma-separated list of indices (`arr[0, 2]`).

Axis 0 ~ rows

Axis 1 ~ columns

<img src="./myImages/figure4.1_arrayIndexing.png" width = 500/>

If you have a multi-d array and you provide fewer indices than there are total dimensions, it will grab the full lower-d array at that index. For example, if you use `[0]` on a 3-d array, you get the first 2-d array housed in it (see below).


In [33]:
# A 2-d array - "list" of 3 1-d arrays of length 3
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(arr2d.shape)
arr2d

(3, 3)


array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [34]:
# Giving only 1 index will grab the associated 1-d array:
arr2d[1]

array([4, 5, 6])

In [35]:
# Providing 2 indices will grab the specified element from the specified array
# Two ways to access
print(arr2d[0][2])
print(arr2d[0,2])

3
3


In [36]:
# A 3-d array
    # top level - sequence of length 2, each element is a 2-d array
    # next level - list of length 2, each element is a 1-d array
    # final level - sequence of length 3, each element is a scalar
arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
print(arr3d.shape)
arr3d

(2, 2, 3)


array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

In [37]:
# Using 1 index will grab the associated 2-d array
arr3d[0]

array([[1, 2, 3],
       [4, 5, 6]])

In [38]:
# Using 2 indices will grab the 1-d array at index 1 within the first 2-d array
print(arr3d[0,1])
arr3d[0][1]

[4 5 6]


array([4, 5, 6])

In [39]:
# Can 'broadcast' lower-dimensional objects (e.g. scalars) to these subsets
oldVals = arr3d[0].copy() # save
print(oldVals)
print(type(oldVals))
print('\n')
arr3d[0] = 42 # broadcast 42 to all values in the 2-d sub-array
print(arr3d)
arr3d[0] = oldVals # assign back
print('\n')
arr3d

[[1 2 3]
 [4 5 6]]
<class 'numpy.ndarray'>


[[[42 42 42]
  [42 42 42]]

 [[ 7  8  9]
  [10 11 12]]]




array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

## Indexing with slices

Using slices follows similar logic as above.

1. A 1-d array will behave just like a 1-d list: array[1:6] will get those elements from the array.
2. A 2-d array has two slice axes available (0 ("rows") and 1 ("columns")) and it behaves how you would expect - first slice is rows, second is columns
    - array[:2] means select the first two rows of the array
    - array[:2, 1:] means select element 1 to the end of the first two rows of the array
3. Also like before, if you just use an integer instead of a slice, you'll get a lower-d object returned.

<img src="./myImages/figure4.2_2darrayslicing.png" width = 500/>

In [40]:
# We saw 1-d array slicing above as well
arr = np.arange(10)
arr[1:6]

array([1, 2, 3, 4, 5])

In [41]:
# Grab the first two 1-d arrays from this 2-d array
print(arr2d)
arr2d[:2]

[[1 2 3]
 [4 5 6]
 [7 8 9]]


array([[1, 2, 3],
       [4, 5, 6]])

In [42]:
# Grab from index 1 to the end of the first two 1-d arrays
arr2d[:2, 1:]

array([[2, 3],
       [5, 6]])

In [43]:
# Grab columns 0 and 1 (i.e. up through index 1) from the last two rows
arr2d[1:,:2]

array([[4, 5],
       [7, 8]])

In [44]:
# Again, as we saw before, subsetting an axis with an integer returns a lower-dimensional array
# Below selects up to index 1 of the *second* (i.e. index 1) 1-d array
lowerDim_slice = arr2d[1, :2]
print(lowerDim_slice)
lowerDim_slice.shape

[4 5]


(2,)

In [45]:
# Select first two rows of the 3rd column
arr2d[:2, 2]

array([3, 6])

In [46]:
# Colon by itself gets whole axis
# Grab index 0 of all rows:
arr2d[:, :1]

array([[1],
       [4],
       [7]])

In [47]:
# broadcasting 1 value
arr2d[:2, 1:] = 0
arr2d

array([[1, 0, 0],
       [4, 0, 0],
       [7, 8, 9]])

In [48]:
# 2 values
arr2d[:2, 1:] = (1, 2)
arr2d

array([[1, 1, 2],
       [4, 1, 2],
       [7, 8, 9]])

In [49]:
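# 3 values - raises ValueError b/c shape (3,) can't be broadcast to the (2, 2) target
#arr2d[:2, 1:] = (1, 2, 3)

# 4 values (flat) - also raises ValueError b/c shape (4,) can't be broadcast to (2, 2)
#arr2d[:2, 1:] = (1, 2, 3, 4)

# 4 values - have to nest them to match the (2, 2) shape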
arr2d[:2, 1:] = [[1, 2], [3, 4]]
arr2d

array([[1, 1, 2],
       [4, 3, 4],
       [7, 8, 9]])
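The commented-out assignments above fail because of shape, not length. Here is a minimal sketch that tries each value against the `(2, 2)` target and records which ones NumPy accepts. The `candidates` and `results` names are hypothetical.

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
target_shape = arr2d[:2, 1:].shape   # (2, 2)

# Try values of a few different shapes against the (2, 2) target
candidates = [0, (1, 2), (1, 2, 3), (1, 2, 3, 4), [[1, 2], [3, 4]]]
results = []
for value in candidates:
    try:
        arr2d[:2, 1:] = value
        results.append(True)
    except ValueError:
        # Raised when the value's shape can't be broadcast to the target
        results.append(False)

print(target_shape)
print(results)
```

A scalar, a length-2 sequence (matching the last axis), and a nested 2 x 2 sequence all work; the flat length-3 and length-4 tuples do not.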

<a name="boolean"></a>
# Boolean Indexing

This seems sort of similar to R, except you don't have to add the `which()` call to get the True and False values.

If you apply a comparison (`==`, `>`, etc.) to an array, the result is an array of `True`/`False` values according to the comparison.

Instead of returning the True/False array itself, you can use it as an index to return a subsetted array containing the matching values.

You can also combine slices with boolean values.

In NumPy, you can use `~` to negate a boolean array (i.e. a condition).

Also, you CAN'T use the Python keywords `and` and `or` with boolean arrays; you have to use `&` and `|`.

**Selecting data from an array by boolean indexing and assigning the result to a new variable ALWAYS creates a copy of the data**

In [50]:
# 1-d array of names
names = np.array(["Bob", "Joe", "Will", "Bob", "Will", "Joe", "Joe"])
print(names.shape)
names

(7,)


array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'], dtype='<U4')

In [51]:
# 2-d array of values (7 rows, 2 columns)
data = np.array([[4, 7], [0, 2], [-5, 6], [0, 0], [1, 2], [-12, -4], [3, 4]])
print(data.shape)
data

(7, 2)


array([[  4,   7],
       [  0,   2],
       [ -5,   6],
       [  0,   0],
       [  1,   2],
       [-12,  -4],
       [  3,   4]])

In [52]:
# Create a boolean array of the names array where the elements equal "Bob"
names == "Bob"

array([ True, False, False,  True, False, False, False])

In [53]:
# Giving this array to the data 2-d array will grab the associated rows (i.e. index 0 [4,7] and 3 [0,0])
data[names == "Bob"]

array([[4, 7],
       [0, 0]])

In [54]:
# Can save the boolean array also
isBob = names == "Bob"
isBob

array([ True, False, False,  True, False, False, False])

In [55]:
# And then use it to subset
data[isBob]

array([[4, 7],
       [0, 0]])

In [56]:
# Grab index 1 (to the end, but only have 0 and 1 here) of the isBob rows:
sliceBob = data[isBob,1:]
sliceBob

array([[7],
       [0]])

In [57]:
# This example shows a difference between integers and slices that I didn't appreciate before. Might need to go back and update my notes above.
intBob = data[isBob,1]
intBob

array([7, 0])

In [58]:
# Notice how sliceBob and intBob are different!
# Slicing maintains the 2-dimensionality of the original array whereas selecting just an integer makes a 1-d array
print(sliceBob.shape)
print(intBob.shape)

(2, 1)
(2,)


In [59]:
# Show how != and ~ can produce the same results:
notBob1 = names != "Bob"
notBob2 = ~(names == "Bob")
print(notBob1)
print(notBob2)
print('\n')
print(notBob1 == notBob2)

[False  True  True False  True  True  True]
[False  True  True False  True  True  True]


[ True  True  True  True  True  True  True]


In [60]:
print(data[notBob1])
print('\n')
print(data[notBob2])

[[  0   2]
 [ -5   6]
 [  1   2]
 [-12  -4]
 [  3   4]]


[[  0   2]
 [ -5   6]
 [  1   2]
 [-12  -4]
 [  3   4]]


In [61]:
# Can assign conditions to objects and then feed them in (and also use the ~ to invert them)
nameGroup1 = (names == "Bob") | (names == "Will")
nameGroup1

array([ True, False,  True,  True,  True, False, False])

In [62]:
data[nameGroup1]

array([[ 4,  7],
       [-5,  6],
       [ 0,  0],
       [ 1,  2]])

In [63]:
data[~nameGroup1]

array([[  0,   2],
       [-12,  -4],
       [  3,   4]])

In [64]:
# You can use boolean arrays to assign multiple values
# Set all negative values to 0
data[data < 0] = 0
data

array([[4, 7],
       [0, 2],
       [0, 6],
       [0, 0],
       [1, 2],
       [0, 0],
       [3, 4]])

In [65]:
# Set all non-'Joe' rows to 7
notJoe = names != "Joe"
print(notJoe)
data[notJoe] = 7
data

[ True False  True  True  True False False]


array([[7, 7],
       [0, 2],
       [7, 7],
       [7, 7],
       [7, 7],
       [0, 0],
       [3, 4]])
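Two of the claims at the top of this section aren't demonstrated in the cells above: that boolean selection makes a copy, and that `and`/`or` don't work on boolean arrays. A minimal sketch of both is below, rebuilding `names` and `data` from scratch so it doesn't depend on earlier cells. `bob_rows` and `keyword_worked` are made-up names.

```python
import numpy as np

names = np.array(["Bob", "Joe", "Will", "Bob", "Will", "Joe", "Joe"])
data = np.array([[4, 7], [0, 2], [-5, 6], [0, 0], [1, 2], [-12, -4], [3, 4]])

# Boolean selection returns a copy, so editing it leaves data alone
bob_rows = data[names == "Bob"]
bob_rows[:] = 999
print(data[0])   # still the original first row

# The Python keyword `and` can't combine multi-element boolean arrays
try:
    mask = (names == "Bob") and (names == "Will")
    keyword_worked = True
except ValueError:
    # NumPy refuses to reduce a multi-element array to a single True/False
    keyword_worked = False

# The element-wise operators work
mask = (names == "Bob") | (names == "Will")
print(keyword_worked, mask.sum())
```

Note the parentheses around each comparison: `|` and `&` bind more tightly than `==`, so leaving them out changes the meaning.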

<a name="fancy"></a>
# Fancy Indexing

Fancy indexing describes indexing with integer arrays.

1. Providing a single list or 1-d array of integers as the subset - this will grab the associated rows in the specified order  
2. Providing multiple index arrays (one per axis) as the subset - this is a little strange. The elements of the index arrays are paired up into (row, column) tuples, and a 1-d array of the elements at those positions is returned  
3. **Fancy indexing will always COPY the data when you assign to a new variable**  

In [66]:
# Build an 8 x 4 array where the array's value is its index
arr = np.zeros((8, 4))
for i in range(8):
    arr[i] = i
arr

array([[0., 0., 0., 0.],
       [1., 1., 1., 1.],
       [2., 2., 2., 2.],
       [3., 3., 3., 3.],
       [4., 4., 4., 4.],
       [5., 5., 5., 5.],
       [6., 6., 6., 6.],
       [7., 7., 7., 7.]])

In [67]:
# Select a subset of the rows in a particular order
arr[[4, 3, 0, 6]]

array([[4., 4., 4., 4.],
       [3., 3., 3., 3.],
       [0., 0., 0., 0.],
       [6., 6., 6., 6.]])

In [68]:
# Can also use negative indices (but I don't like this)
arr[[-3, -5, -7]]

array([[5., 5., 5., 5.],
       [3., 3., 3., 3.],
       [1., 1., 1., 1.]])

In [69]:
# New array from 0 to 31
# np.arange(32) makes a 1-d array from 0-31
# reshape((8,4)) coerces it to an 8 x 4 2-d array
arr = np.arange(32).reshape((8,4))
arr

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])

In [70]:
# Provide two arrays as subset
# Returns a 1-d array
# from arr, this grabs:
    # row index 1, column index 0 -> 4
    # row index 5, column index 3 -> 23
    # row index 7, column index 1 -> 29
    # row index 2, column index 2 -> 10
# Basically goes through and grabs elements using the tuples (1, 0); (5, 3); (7, 1); and (2, 2) and puts them in a 1-d array
arr[[1, 5, 7, 2], [0, 3, 1, 2]]

array([ 4, 23, 29, 10])

In [71]:
# If you want to grab certain rows and then certain columns from those rows, one way is to use two separate index calls
arr[[1, 5, 7, 2]][:, [0, 3, 1, 2]]

array([[ 4,  7,  5,  6],
       [20, 23, 21, 22],
       [28, 31, 29, 30],
       [ 8, 11,  9, 10]])

In [72]:
# First bit selects row indices 1, 5, 7, and 2:
sub = arr[[1, 5, 7, 2]]
sub

array([[ 4,  5,  6,  7],
       [20, 21, 22, 23],
       [28, 29, 30, 31],
       [ 8,  9, 10, 11]])

In [73]:
# Second bit selects all the rows (:) and then re-orders the columns to be 0, 3, 1, 2
sub[:, [0, 3, 1, 2]]

array([[ 4,  7,  5,  6],
       [20, 23, 21, 22],
       [28, 31, 29, 30],
       [ 8, 11,  9, 10]])
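As an alternative to chaining two index calls, `np.ix_` builds the index arrays needed to select a rectangular region in one step. Below is a minimal sketch on a fresh 8 x 4 array (so it doesn't depend on the state of `arr` above). `rows`, `cols`, `chained`, and `one_step` are hypothetical names.

```python
import numpy as np

arr = np.arange(32).reshape((8, 4))
rows = [1, 5, 7, 2]
cols = [0, 3, 1, 2]

chained = arr[rows][:, cols]         # two separate index calls
one_step = arr[np.ix_(rows, cols)]   # same rectangular selection in one call

print(one_step)
print(np.array_equal(chained, one_step))
```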

In [74]:
# Assign the fancy index values from above (1,0) - 4; (5,3) - 23; (7,1) - 29; (2,2) - 10 all to 0
arr[[1, 5, 7, 2], [0, 3, 1, 2]] = 0
arr

array([[ 0,  1,  2,  3],
       [ 0,  5,  6,  7],
       [ 8,  9,  0, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22,  0],
       [24, 25, 26, 27],
       [28,  0, 30, 31]])
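Assigning through a fancy index (as above) modifies the original, but selecting with a fancy index and saving the result gives a copy. Here is a minimal sketch contrasting that with a basic slice, which gives a view. The names `picked`, `sliced`, and `row_after_copy_edit` are made up for illustration.

```python
import numpy as np

arr = np.arange(32).reshape((8, 4))

picked = arr[[4, 3, 0, 6]]   # fancy indexing -> new array (a copy)
picked[:] = -1               # modifying the copy...
row_after_copy_edit = arr[4].copy()
print(row_after_copy_edit)   # ...leaves the original row alone

sliced = arr[4:5]            # basic slicing -> view
sliced[:] = -1               # modifying the view...
print(arr[4])                # ...changes the original row
```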

<a name="transposing"></a>
# Transposing Arrays and Swapping Axes


Transposing an array just returns a "view" of the array; it doesn't copy anything.

Arrays have a `transpose` method and a `T` attribute.

In [75]:
arr = np.arange(15).reshape((3, 5))
arr

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [76]:
arr.transpose()

array([[ 0,  5, 10],
       [ 1,  6, 11],
       [ 2,  7, 12],
       [ 3,  8, 13],
       [ 4,  9, 14]])

In [None]:
arr.T

I'll have to improve my knowledge of matrix multiplication to take full advantage of this stuff.

The following shows how `T` can be used to quickly calculate the inner matrix product, which is the [dot product](https://algebra1course.wordpress.com/2013/02/19/3-matrix-operations-dot-products-and-inverses/) of a matrix's transpose (not its inverse) with the matrix itself.

The dot product can be calculated using `numpy.dot` or the `@` "infix" operator.

For 2-d arrays, `T` is actually a special case of the ndarray `swapaxes` method, which can swap any two axes.

In [None]:
# 5 x 3 array
arr = np.array([[0, 1, 0], [1, 2, -2], [6, 3, 2], [-1, 0, -1], [1, 0, 1]])
arr

In [None]:
# View its transpose
arr.T

In [None]:
# Manually calculate:
# First row, first column: 0*0 + 1*1 + 6*6 + -1*-1 + 1*1 = 39
# First row, second column: 0*1 + 1*2 + 6*3 + -1*0 + 1*0 = 20
# First row, third column: 0*0 + 1*-2 + 6*2 + -1*-1 + 1*1 = 12
# etc.

# Using numpy
np.dot(arr.T, arr)

In [None]:
# Using @
arr.T @ arr

In [None]:
arr.swapaxes(0,1)
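To tie the pieces of this section together, here is a minimal sketch that checks three things: that `T` returns a view, that `np.dot` and `@` agree, and that `swapaxes(0, 1)` matches `T` for a 2-d array. The name `gram` is just a label I picked for the `arr.T @ arr` result.

```python
import numpy as np

arr = np.array([[0, 1, 0], [1, 2, -2], [6, 3, 2], [-1, 0, -1], [1, 0, 1]])

# T is a view: no data is copied
print(np.shares_memory(arr, arr.T))

# np.dot and @ give the same 3 x 3 result
gram = arr.T @ arr
print(gram)
print(np.array_equal(gram, np.dot(arr.T, arr)))

# For a 2-d array, swapping axes 0 and 1 is the same as transposing
print(np.array_equal(arr.swapaxes(0, 1), arr.T))
```

The first row of `gram` should match the manual calculation above (39, 20, 12).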