# Chapter 10: Arrays & Dataframes

While [Chapter 7 <img height="12" style="display: inline-block" src="../static/link/to_nb.png">](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/develop/07_sequences/00_content.ipynb) and [Chapter 8 <img height="12" style="display: inline-block" src="../static/link/to_nb.png">](https://nbviewer.jupyter.org/github/webartifex/intro-to-python/blob/develop/07_sequences/00_content.ipynb)

A Python list, on the other hand, has a pointer to a contiguous buffer of pointers, each of which points to a Python object

In [None]:
numbers_list = [7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4]

In [None]:
numbers_list

## The `array` Type

[array() <img height="12" style="display: inline-block" src="../static/link/to_py.png">](https://docs.python.org/3/library/array.html#array.array) from the [array <img height="12" style="display: inline-block" src="../static/link/to_py.png">](https://docs.python.org/3/library/array.html) module in the [standard library <img height="12" style="display: inline-block" src="../static/link/to_py.png">](https://docs.python.org/3/library/index.html)

homogeneous

item instead of element

|  Slot   |  0  |  1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |  9  |  10 |  11 |
|  :--:   |:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
|**Value**| `7` | `11`| `8` | `5` | `3` | `12`| `2` | `6` | `9` | `10`| `1` | `4` |

| Code |  Slot  |    0    |    1    |    2    |    3    |    4    |    5    |    6    |    7    |    8    |    9    |    10   |    11   |
| :--: |  :--:  |  :---:  |  :---:  |  :---:  |  :---:  |  :---:  |  :---:  |  :---:  |  :---:  |  :---:  |  :---:  |  :---:  |  :---:  |
| `"b"`|**Bits**|`0b_0111`|`0b_1011`|`0b_1000`|`0b_0101`|`0b_0011`|`0b_1100`|`0b_0010`|`0b_0110`|`0b_1001`|`0b_1010`|`0b_0001`|`0b_0100`|

[PythonTutor <img height="12" style="display: inline-block" src="../static/link/to_py.png">](http://pythontutor.com/visualize.html#code=from%20array%20import%20array%0Anumbers_list%20%3D%20%5B7,%2011,%208,%205,%203,%2012,%202,%206,%209,%2010,%201,%204%5D%0Anumbers_array%20%3D%20array%28%22b%22,%20numbers_list%29&cumulative=false&curInstr=3&heapPrimitives=nevernest&mode=display&origin=opt-frontend.js&py=3&rawInputLstJSON=%5B%5D&textReferences=false)

In [None]:
from array import array

`"b"` because all numbers lie between `-128` and `127`, the range of a signed one-byte integer (`"B"` would cover `0` to `255`)
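As a minimal sketch of why the typecode matters, the value ranges of the signed `"b"` and the unsigned `"B"` typecodes can be derived from the item size. The variable names below are chosen for illustration only, and the printed ranges assume the usual item size of one byte.

```python
from array import array

# Derive the value ranges from the number of bits per item.
bits = 8 * array("b").itemsize  # 8 bits if an item occupies one byte
signed_range = (-(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
unsigned_range = (0, 2 ** bits - 1)

print(signed_range)    # (-128, 127) for one-byte items
print(unsigned_range)  # (0, 255) for one-byte items

# A value outside the signed range is rejected loudly.
try:
    array("b", [128])
except OverflowError as error:
    print("OverflowError:", error)

# The same value fits into the unsigned typecode.
small = array("B", [128, 255])
print(small)
```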

In [None]:
numbers_array = array("b", numbers_list)

In [None]:
id(numbers_array)

In [None]:
type(numbers_array)

In [None]:
numbers_array

In [None]:
numbers_array.typecode

[typecodes <img height="12" style="display: inline-block" src="../static/link/to_py.png">](https://docs.python.org/3/library/array.html#array.typecodes)

In [None]:
from array import typecodes

In [None]:
list(typecodes)

In [None]:
numbers_array.itemsize  # in bytes

### Memory Savings

[randint() <img height="12" style="display: inline-block" src="../static/link/to_py.png">](https://docs.python.org/3/library/random.html#random.randint) function from the [random <img height="12" style="display: inline-block" src="../static/link/to_py.png">](https://docs.python.org/3/library/random.html) module in the [standard library <img height="12" style="display: inline-block" src="../static/link/to_py.png">](https://docs.python.org/3/library/index.html)

In [None]:
import random

In [None]:
random.seed(42)

In [None]:
a_million_list = [random.randint(42, 87) for _ in range(1_000_000)]

In [None]:
a_million_list[:10]

In [None]:
a_million_list[-10:]

In [None]:
a_million_tuple = tuple(a_million_list)

In [None]:
a_million_dict = dict(enumerate(a_million_list))

In [None]:
a_million_array = array("b", a_million_list)

[getsizeof() <img height="12" style="display: inline-block" src="../static/link/to_py.png">](https://docs.python.org/3/library/sys.html#sys.getsizeof) function from the [sys <img height="12" style="display: inline-block" src="../static/link/to_py.png">](https://docs.python.org/3/library/sys.html) module in the [standard library <img height="12" style="display: inline-block" src="../static/link/to_py.png">](https://docs.python.org/3/library/index.html)

In [None]:
import sys

Sometimes, $1$ Megabyte is defined as $2^{20} = 1,048,576$ Bytes (cf., [source <img height="12" style="display: inline-block" src="../static/link/to_wiki.png">](https://en.wikipedia.org/wiki/Megabyte)). Here, we use $1,000,000$ Bytes.
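The two conventions differ by roughly 5%. A short sketch, using a small throwaway `list` named `numbers` (an assumption for illustration), converts the same size in bytes under both definitions.

```python
import sys

numbers = list(range(1_000))
size_in_bytes = sys.getsizeof(numbers)  # size of the list object itself

size_in_mb = size_in_bytes / 1_000_000  # decimal convention: 10**6 bytes
size_in_mib = size_in_bytes / 2 ** 20   # binary convention: 2**20 bytes

print(size_in_mb, size_in_mib)
print(size_in_mb / size_in_mib)  # ratio of the two conventions, about 1.048576
```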

In [None]:
sys.getsizeof(a_million_list) / 1_000_000  # in MB

In [None]:
sys.getsizeof(a_million_tuple) / 1_000_000  # in MB

In [None]:
sys.getsizeof(a_million_dict) / 1_000_000  # in MB

In [None]:
sys.getsizeof(a_million_array) / 1_000_000  # in MB

### Arrays are like Mutable Sequences

In [None]:
numbers_array

In [None]:
len(numbers_array)

In [None]:
for number in numbers_array:
    print(number, end=" ")

In [None]:
for number in reversed(numbers_array):
    print(number, end=" ")

In [None]:
number

In [None]:
id(number)

In [None]:
type(number)

In [None]:
0 in numbers_array

In [None]:
1 in numbers_array

In [None]:
2.0 in numbers_array

In [None]:
numbers_array[0]

In [None]:
last = numbers_array[-1]

In [None]:
last

In [None]:
id(last)

In [None]:
type(last)

In [None]:
numbers_array[:6]

In [None]:
numbers_array[::2]

In [None]:
numbers_array[::-1]

In [None]:
numbers_array_copy = numbers_array[:]

In [None]:
numbers_array_copy

In [None]:
numbers_array[0] = 420

Loud failure: an `array` raises an error instead of silently converting a value that does not fit.

In [None]:
numbers_array[0] = 42.87

In [None]:
numbers_array[0] = 42

In [None]:
numbers_array

In [None]:
numbers_array[:] = [40, 41, 42]

Assigning another `array` to a slice works and may change the length.

In [None]:
numbers_array[:] = array("b", range(6))

In [None]:
numbers_array

In [None]:
del numbers_array[0]

In [None]:
numbers_array

In [None]:
id(numbers_array)  # same memory location as before

In [None]:
numbers_array_copy

In [None]:
numbers_array[:] = numbers_array_copy  # restore original numbers

#### Standardized Interface

[shuffle() <img height="12" style="display: inline-block" src="../static/link/to_py.png">](https://docs.python.org/3/library/random.html#random.shuffle) function in the [random <img height="12" style="display: inline-block" src="../static/link/to_py.png">](https://docs.python.org/3/library/random.html) module

explain what is an interface / "protocol"
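One way to illustrate the idea of an interface or "protocol" is a custom class that is neither a `list` nor an `array` but still supports `len()`, indexing, and item assignment. The `Deck` class below is purely hypothetical. The sketch relies on `random.shuffle()` only needing these three behaviors from its argument.

```python
import random


class Deck:
    """A hypothetical container that is neither a list nor an array."""

    def __init__(self, cards):
        self._cards = list(cards)

    def __len__(self):
        return len(self._cards)

    def __getitem__(self, index):
        return self._cards[index]

    def __setitem__(self, index, value):
        self._cards[index] = value


deck = Deck(range(1, 13))
random.seed(42)
random.shuffle(deck)  # works: Deck supports len(), indexing, and item assignment

shuffled = [deck[i] for i in range(len(deck))]
print(shuffled)
```

Because the function only depends on behavior and not on a concrete type, any object "speaking" the same protocol can be passed in.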

In [None]:
numbers_array

In [None]:
random.seed(42)

In [None]:
random.shuffle(numbers_array)

In [None]:
numbers_array

In [None]:
numbers_array[:] = numbers_array_copy  # restore original numbers

In [None]:
numbers_array

In [None]:
numbers_list

In [None]:
random.seed(42)

In [None]:
random.shuffle(numbers_list)

In [None]:
numbers_list

In [None]:
numbers_list = list(numbers_array)  # restore original numbers

In [None]:
numbers_list

In [None]:
numbers_tuple = tuple(numbers_list)

In [None]:
numbers_tuple

In [None]:
random.seed(42)

In [None]:
random.shuffle(numbers_tuple)

#### Sequence Methods

`Sequence` [index() <img height="12" style="display: inline-block" src="../static/link/to_py.png">](https://docs.python.org/3/library/array.html#array.array.index) [count() <img height="12" style="display: inline-block" src="../static/link/to_py.png">](https://docs.python.org/3/library/array.html#array.array.count)

In [None]:
numbers_array

In [None]:
numbers_array.index(10)

no `start` argument before Python 3.10

In [None]:
a_million_array.index(42)

In [None]:
numbers_array.count(7)

In [None]:
numbers_array.count(13)

In [None]:
a_million_array.count(42)

`MutableSequence` [append() <img height="12" style="display: inline-block" src="../static/link/to_py.png">](https://docs.python.org/3/library/array.html#array.array.append) [extend() <img height="12" style="display: inline-block" src="../static/link/to_py.png">](https://docs.python.org/3/library/array.html#array.array.extend) [insert() <img height="12" style="display: inline-block" src="../static/link/to_py.png">](https://docs.python.org/3/library/array.html#array.array.insert) [reverse() <img height="12" style="display: inline-block" src="../static/link/to_py.png">](https://docs.python.org/3/library/array.html#array.array.reverse) [pop() <img height="12" style="display: inline-block" src="../static/link/to_py.png">](https://docs.python.org/3/library/array.html#array.array.pop) [remove() <img height="12" style="display: inline-block" src="../static/link/to_py.png">](https://docs.python.org/3/library/array.html#array.array.remove)

In [None]:
numbers_array

In [None]:
numbers_array.append(42)

In [None]:
numbers_array

In [None]:
numbers_array.extend([87, 99])

In [None]:
numbers_array

In [None]:
numbers_array.insert(0, -1)

In [None]:
numbers_array

In [None]:
numbers_array.sort()

In [None]:
sorted(numbers_array)

In [None]:
array(numbers_array.typecode, sorted(numbers_array))

In [None]:
numbers_array

In [None]:
numbers_array[:] = array(numbers_array.typecode, sorted(numbers_array))

In [None]:
numbers_array

[reverse() <img height="12" style="display: inline-block" src="../static/link/to_py.png">](https://docs.python.org/3/library/array.html#array.array.reverse)

In [None]:
numbers_array.reverse()

In [None]:
numbers_array

In [None]:
removed = numbers_array.pop()

In [None]:
removed

In [None]:
numbers_array

In [None]:
removed = numbers_array.pop(0)

In [None]:
removed

In [None]:
numbers_array

In [None]:
numbers_array.remove(87)

In [None]:
numbers_array.remove(87)

In [None]:
numbers_array

In [None]:
numbers_array[:] = numbers_array_copy  # restore original numbers

In [None]:
numbers_array

`array`-specific methods to save to and load from disk efficiently.
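A minimal sketch of such a round trip, assuming the `tofile()` and `fromfile()` methods are the ones meant here. The file name `numbers.bin` and the temporary folder are placeholders for illustration.

```python
import os
import tempfile
from array import array

numbers = array("b", [7, 11, 8, 5, 3, 12])

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "numbers.bin")  # hypothetical file name

    with open(path, "wb") as file:
        numbers.tofile(file)  # writes the raw machine values

    file_size = os.path.getsize(path)  # one byte per item for typecode "b"

    loaded = array("b")  # the typecode is not stored, so it must be given again
    with open(path, "rb") as file:
        loaded.fromfile(file, len(numbers))  # second argument: number of items

print(file_size, loaded)
```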

`array` objects are *not* efficient if we need to change the size.

[bug tracker](https://bugs.python.org/issue25737)

In [None]:
import collections.abc as abc

In [None]:
isinstance(numbers_array, abc.Sequence)

### The `memoryview` Type

In [None]:
view_into_numbers_array = memoryview(numbers_array)

In [None]:
id(view_into_numbers_array)

In [None]:
type(view_into_numbers_array)

In [None]:
view_into_numbers_array

In [None]:
numbers_array

In [None]:
numbers_array[0]

In [None]:
view_into_numbers_array[0]

In [None]:
view_into_numbers_array[0] = 42

In [None]:
numbers_array

In [None]:
numbers_array[:] = numbers_array_copy  # restore original numbers

In [None]:
numbers_array

In [None]:
isinstance(view_into_numbers_array, abc.Sequence)

In [None]:
len(view_into_numbers_array)

In [None]:
for number in view_into_numbers_array:
    print(number, end=" ")

In [None]:
for number in reversed(view_into_numbers_array):
    print(number, end=" ")

In [None]:
0 in view_into_numbers_array

In [None]:
1 in view_into_numbers_array

In [None]:
2.0 in view_into_numbers_array

A view is like metadata describing how to read the underlying memory.

In [None]:
view_into_numbers_array.obj

In [None]:
view_into_numbers_array.contiguous

In [None]:
view_into_numbers_array.itemsize  # in bytes

In [None]:
view_into_numbers_array.nbytes  # overall size of the data excluding the meta information

In [None]:
view_into_numbers_array.strides  # bytes to jump over to go from item to item

In [None]:
view_into_numbers_array.hex()

In [None]:
view_into_numbers_array.tobytes()

In [None]:
view_into_numbers_array.tolist()

In [None]:
for byte in view_into_numbers_array.tolist():
    print(byte, "->", hex(byte), sep="\t")

In [None]:
every_other_number = view_into_numbers_array[::2]

In [None]:
every_other_number

In [None]:
type(every_other_number)

In [None]:
len(every_other_number)

In [None]:
for number in every_other_number:
    print(number, end=" ")

In [None]:
every_other_number.obj

In [None]:
every_other_number.contiguous

In [None]:
every_other_number.itemsize  # in bytes

In [None]:
every_other_number.nbytes  # overall size of the data excluding the meta information

In [None]:
every_other_number.strides  # bytes to jump over to go from item to item

In [None]:
every_other_number[:] = array("b", range(101, 107))

In [None]:
numbers_array

In [None]:
numbers_array[:] = numbers_array_copy  # restore original numbers

In [None]:
numbers_array

## The `ndarray` Type

In [None]:
import numpy as np

In [None]:
np.version.version

In [None]:
numbers_ndarray = np.array(numbers_list)

In [None]:
id(numbers_ndarray)

In [None]:
type(numbers_ndarray)

In [None]:
numbers_ndarray

In [None]:
numbers_ndarray.data

In [None]:
type(numbers_ndarray.data)

In [None]:
numbers_ndarray.size  # = number of items

In [None]:
numbers_ndarray.itemsize  # in bytes

In [None]:
numbers_ndarray.dtype  # 64 bits => 8 bytes

In [None]:
numbers_ndarray.nbytes  # overall size of the data excluding the meta information

In [None]:
numbers_ndarray.strides  # bytes to jump over to go from item to item

owndata

In [None]:
numbers_ndarray.flags

### Memory Savings (continued)

In [None]:
sys.getsizeof(a_million_list) / 1_000_000  # in MB

In [None]:
sys.getsizeof(a_million_tuple) / 1_000_000  # in MB

In [None]:
sys.getsizeof(a_million_dict) / 1_000_000  # in MB

In [None]:
sys.getsizeof(a_million_array) / 1_000_000  # in MB

In [None]:
a_million_ndarray = np.array(a_million_list)

In [None]:
sys.getsizeof(a_million_ndarray) / 1_000_000  # in MB

In [None]:
a_million_ndarray.itemsize  # in bytes

In [None]:
a_million_ndarray.dtype

In [None]:
a_million_ndarray = np.array(a_million_list, dtype=np.int8)

In [None]:
sys.getsizeof(a_million_ndarray) / 1_000_000  # in MB

In [None]:
a_million_ndarray.itemsize  # in bytes

In [None]:
a_million_ndarray.dtype

### nd-Arrays are similar to Mutable Sequences

In [None]:
numbers_ndarray

In [None]:
len(numbers_ndarray)

In [None]:
for number in numbers_ndarray:
    print(number, end=" ")

In [None]:
for number in reversed(numbers_ndarray):
    print(number, end=" ")

 reminder

In [None]:
number

In [None]:
id(number)

In [None]:
type(number)

In [None]:
0 in numbers_ndarray

In [None]:
1 in numbers_ndarray

In [None]:
2.0 in numbers_ndarray

In [None]:
numbers_ndarray[0]

In [None]:
last = numbers_ndarray[-1]

In [None]:
last

In [None]:
id(last)

In [None]:
type(last)

In [None]:
numbers_ndarray[:6]

In [None]:
numbers_ndarray[::2]

In [None]:
numbers_ndarray[::-1]

In [None]:
numbers_ndarray_copy = numbers_ndarray[:]

In [None]:
numbers_ndarray_copy

In [None]:
numbers_ndarray[0] = 420

In [None]:
numbers_ndarray

Silent type coercion: the decimals are truncated.

In [None]:
numbers_ndarray[1] = 42.87

In [None]:
numbers_ndarray

This would work for a `list`; here, the size cannot change.

In [None]:
numbers_ndarray[:] = [40, 41, 42]

keeps len

In [None]:
numbers_ndarray[:] = list(range(12, 0, -1))

In [None]:
numbers_ndarray

In [None]:
numbers_ndarray[:] = range(12)  # must be of same length

In [None]:
numbers_ndarray

`del` worked for the `array` above.

In [None]:
del numbers_ndarray[0]

In [None]:
numbers_ndarray

In [None]:
id(numbers_ndarray)  # same memory location as before

The "copy" changed as well: slicing an `ndarray` creates a view.

In [None]:
numbers_ndarray_copy

In [None]:
numbers_ndarray_copy.flags

In [None]:
numbers_ndarray[:] = numbers_list  # restore original numbers

In [None]:
numbers_ndarray_copy
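The difference between a view and a real copy can be sketched with a small stand-alone example. The names `original`, `view`, and `copy` are chosen for illustration.

```python
import numpy as np

original = np.array([7, 11, 8, 5])
view = original[:]      # slicing creates a view onto the same memory
copy = original.copy()  # .copy() allocates new memory

original[0] = 99

print(view)  # the view reflects the change
print(copy)  # the copy keeps the old value

print(np.shares_memory(original, view))  # True
print(np.shares_memory(original, copy))  # False
print(view.flags.owndata, copy.flags.owndata)  # False True
```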

#### Standardized Interface (continued)

In [None]:
numbers_ndarray

In [None]:
random.seed(42)

In [None]:
random.shuffle(numbers_ndarray)

In [None]:
numbers_ndarray

In [None]:
numbers_ndarray[:] = numbers_array  # restore original numbers

In [None]:
numbers_ndarray

#### No Sequence Methods

`Sequence` `index()` `count()`

`MutableSequence` `append()` `extend()` `insert()` `reverse()` `pop()` `remove()`


### Constructors

what is a constructor, include built-ins

different docstring format

In [None]:
np.array?

In [None]:
y = np.array([1, 2.0, 3])

In [None]:
y

In [None]:
y.dtype

In [None]:
y = np.array([1, 2.0, 3 + 0j])

In [None]:
y

In [None]:
y.dtype

In [None]:
y = np.array([1, 2.0, 3], dtype=int)

In [None]:
y

In [None]:
y.dtype

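`np.bool_` -> `True`/`False`

`np.byte` -> `np.int8` (most likely on every platform)

`np.int_` (default) `np.int8` `np.int16` `np.int32` `np.int64`

`np.float64` (default) `np.float16` `np.float32` `np.float128` (platform-dependent)

`np.complex128` (default) `np.complex64` `np.complex256` (platform-dependent)

The plain aliases `np.int`, `np.float`, and `np.complex` for the built-in types were deprecated in NumPy 1.20 and removed in 1.24; pass `int`, `float`, or `complex` instead.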


In [None]:
type(np.int_(42))

In [None]:
type(np.int8(42))

In [None]:
np.zeros(3)

In [None]:
np.zeros(3, dtype=int)

copies shape and dtype

In [None]:
np.zeros_like(y)

`np.ones()` and `np.ones_like()` work analogously.

Empty does not mean `nan`: the items are whatever bits happen to be in memory.

In [None]:
np.empty(3)

In [None]:
np.empty(3, dtype=int)

In [None]:
np.empty_like(y)

constant vector

In [None]:
np.full(3, 42)

In [None]:
np.full_like(y, 42)

force dtype float

In [None]:
np.full(3, np.nan)

In [None]:
np.full(3, np.inf)

In [None]:
np.full(3, -np.inf)

In [None]:
np.arange(10)

In [None]:
np.arange(1, 11)

In [None]:
np.arange(1, 10, 3)

In [None]:
np.arange(99, 0, -11)

In [None]:
np.linspace(-10, 10, num=20)

In [None]:
np.linspace(-10, 10, num=21)

In [None]:
np.random.seed(42)

continuous uniform

In [None]:
np.random.random(10)

discrete uniform

In [None]:
np.random.randint(42, 87, size=10)

sample with replacement

In [None]:
np.random.choice(numbers_list, size=10)

sample without replacement

In [None]:
np.random.choice(numbers_list, size=10, replace=False)

`ValueError`

In [None]:
np.random.choice(numbers_list, size=20, replace=False)

[binomial()](https://docs.scipy.org/doc/numpy-1.16.0/reference/generated/numpy.random.binomial.html#numpy.random.binomial)

[exponential()](https://docs.scipy.org/doc/numpy-1.16.0/reference/generated/numpy.random.exponential.html#numpy.random.exponential)

[poisson()](https://docs.scipy.org/doc/numpy-1.16.0/reference/generated/numpy.random.poisson.html#numpy.random.poisson)

[standard_normal()](https://docs.scipy.org/doc/numpy-1.16.0/reference/generated/numpy.random.standard_normal.html#numpy.random.standard_normal)

[save()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.save.html) and [load()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.load.html) use a platform-independent binary format.

np.save("a_million_numbers.npy", a_million_ndarray)

In [None]:
a_million_numbers = np.load("a_million_numbers.npy")

In [None]:
a_million_numbers

In [None]:
a_million_numbers.size

In [None]:
a_million_numbers.dtype

[fromfile()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.fromfile.html) function and [tofile()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.tofile.html) method: neither the `dtype` nor the shape is stored in the file.

a_million_ndarray.tofile("a_million_numbers.bin")

In [None]:
a_million_numbers = np.fromfile("a_million_numbers.bin")

In [None]:
a_million_numbers

In [None]:
a_million_numbers.size

In [None]:
a_million_numbers.dtype

In [None]:
a_million_numbers = np.fromfile("a_million_numbers.bin", dtype=np.int8)

In [None]:
a_million_numbers

In [None]:
a_million_numbers.size

In [None]:
a_million_numbers.dtype

[loadtxt()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html), [genfromtxt()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.genfromtxt.html) (handles missing numbers), and [savetxt()](https://docs.scipy.org/doc/numpy/reference/generated/numpy.savetxt.html)

slow

np.savetxt("a_million_numbers.csv", X=a_million_ndarray, fmt="%d")

In [None]:
a_million_numbers = np.loadtxt("a_million_numbers.csv")

In [None]:
a_million_numbers

In [None]:
a_million_numbers.size

In [None]:
a_million_numbers.dtype

In [None]:
a_million_numbers = np.loadtxt("a_million_numbers.csv", dtype=np.int8)

In [None]:
a_million_numbers

In [None]:
a_million_numbers.size

In [None]:
a_million_numbers.dtype

excel with pandas

### Dimensionality

In [None]:
numbers_ndarray

In [None]:
numbers_ndarray.size

In [None]:
numbers_ndarray.strides  # bytes to jump over to go from item to item

In [None]:
numbers_ndarray.ndim

`.shape` is always a `tuple`, analogous to `.strides`; so, there is no real need for `.ndim`.


In [None]:
numbers_ndarray.shape  # same format as .strides

 axis

In [None]:
numbers_matrix = np.array([[7, 11, 8], [5, 3, 12], [2, 6, 9], [10, 1, 4]])

In [None]:
numbers_matrix

In [None]:
numbers_ndarray.reshape(4, 3)  # or pass a tuple; the size must stay the same

In [None]:
numbers_ndarray.reshape(4, 4)

In [None]:
numbers_matrix.size

In [None]:
numbers_matrix.strides  # bytes to jump over to go from item to item, row-wise or column-wise

In [None]:
numbers_matrix.ndim

In [None]:
numbers_matrix.shape  # product always equals the .size  =>  4 * 3 == 12

explain C-contiguous

In [None]:
numbers_matrix.flags
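C-contiguous means that the items are stored row by row in memory, so the last axis varies fastest. As a sketch, the strides of such an array can be derived from its shape and item size. The `matrix` below uses an explicit `np.int64` so that the numbers in the comments do not depend on the platform.

```python
import numpy as np

matrix = np.arange(12, dtype=np.int64).reshape(4, 3)

n_rows, n_cols = matrix.shape
# Jumping to the next row skips a whole row of items,
# jumping to the next column skips a single item.
expected_strides = (n_cols * matrix.itemsize, matrix.itemsize)

print(matrix.strides)    # (24, 8)
print(expected_strides)  # (24, 8)

# The transpose reuses the same memory with swapped strides
# and is, therefore, not C-contiguous anymore.
transposed = matrix.T
print(transposed.strides)                # (8, 24)
print(transposed.flags.c_contiguous)     # False
print(transposed.flags.f_contiguous)     # True
```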

 good for intermediate results

In [None]:
np.load("a_million_matrix.npy")

only raw bits

In [None]:
np.fromfile("a_million_matrix.bin")

In [None]:
np.fromfile("a_million_matrix.bin", dtype=np.int8).reshape(1000, 1000)

In [None]:
!md5sum a_million_*.bin

In [None]:
np.loadtxt("a_million_matrix.csv", delimiter=",")

In [None]:
np.loadtxt("a_million_matrix.csv", delimiter=",", dtype=np.int8)

#### Higher Dimensional Arrays are like "Nested Sequences"

In [None]:
numbers_matrix

In [None]:
len(numbers_matrix)

In [None]:
numbers_matrix[0]

In [None]:
len(numbers_matrix[0])

In [None]:
numbers_matrix.shape

so, a matrix is viewed as a sequence of rows

In [None]:
for row in numbers_matrix:
    for item in row:
        print(item, end=" ")

In [None]:
for item in numbers_matrix.flat:
    print(item, end=" ")

In [None]:
for row in reversed(numbers_matrix):
    for item in row:
        print(item, end=" ")

In [None]:
for row in reversed(numbers_matrix):
    for item in reversed(row):
        print(item, end=" ")

In [None]:
0 in numbers_matrix

In [None]:
1 in numbers_matrix

In [None]:
2.0 in numbers_matrix

#### Beyond two Dimensions

vectors neither rows nor columns

In [None]:
numbers_ndarray

In [None]:
numbers_ndarray.shape

 interpret vector as a row in matrix notation

In [None]:
numbers_row_vector = numbers_ndarray[np.newaxis, :]

In [None]:
numbers_row_vector

In [None]:
numbers_row_vector.shape

In [None]:
numbers_ndarray.reshape(1, 12)

interpret vector as a column in matrix notation

In [None]:
numbers_col_vector = numbers_ndarray[:, np.newaxis]

In [None]:
numbers_col_vector

In [None]:
numbers_col_vector.shape

In [None]:
numbers_ndarray.reshape(12, 1)

A 3D array can be viewed as a sequence of identically shaped matrices, or a "cube" of numbers.

In [None]:
three_matrices = a_million_ndarray[:36].reshape(3, 4, 3)

In [None]:
three_matrices

A 4D array can be viewed as a sequence of sequences with the same length containing identically shaped matrices, or two identically shaped "cubes" of numbers.

In [None]:
groups_of_three_matrices = a_million_ndarray[:72].reshape(2, 3, 4, 3)

In [None]:
groups_of_three_matrices

#### Multi-Dimensional Indexing & Slicing

In [None]:
numbers_matrix

In [None]:
numbers_matrix[0]

In [None]:
numbers_matrix[0, 0]

In [None]:
numbers_matrix[0][0]  # do not do this

same as

In [None]:
(numbers_matrix[0])[0]  # do not do this

First column or last row: whenever we select a single index, we drop a dimension.

In [None]:
numbers_matrix

In [None]:
numbers_matrix[:, 0] 

In [None]:
numbers_matrix[-1, :]  # ":" not needed => be explicit about the resulting dimension

In [None]:
numbers_matrix[:3, 0]

In [None]:
numbers_matrix[-1, 1:]
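How an index drops a dimension while a slice keeps it can be checked via the `.shape` of the results. The sketch below uses its own small `matrix` and illustrative names.

```python
import numpy as np

matrix = np.arange(12).reshape(4, 3)

first_column = matrix[:, 0]  # index on the second axis -> 1D
kept_column = matrix[:, :1]  # slice on the second axis -> still 2D
last_row = matrix[-1]        # index on the first axis -> 1D
single_item = matrix[0, 0]   # index on both axes -> scalar

print(first_column.shape)     # (4,)
print(kept_column.shape)      # (4, 1)
print(last_row.shape)         # (3,)
print(np.shape(single_item))  # ()
```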

In [None]:
numbers_matrix

In [None]:
numbers_matrix[:2, :2]

In [None]:
numbers_matrix[-2:, -2:]

In [None]:
numbers_matrix[::2, ::2]

 of course still mutable

In [None]:
numbers_matrix[-1, -1] = 87.42  # decimals are truncated

In [None]:
numbers_matrix

In [None]:
numbers_matrix[1] = range(40, 43)

In [None]:
numbers_matrix

In [None]:
numbers_matrix[::2, ::2] = np.arange(101, 105).reshape(2, 2)

In [None]:
numbers_matrix

broadcasting see below

In [None]:
numbers_matrix[::2, ::2] = 99

In [None]:
numbers_matrix

In [None]:
numbers_matrix[:] = numbers_ndarray.reshape(4, 3)  # restore original numbers

In [None]:
numbers_matrix

3D example

In [None]:
three_matrices

In [None]:
three_matrices[0]

last column of first matrix

In [None]:
three_matrices[0, :, -1]

last column of every matrix

In [None]:
three_matrices[:, :, -1]

In [None]:
np.array(three_matrices[:, :, -1].flat)

In [None]:
three_matrices[..., -1]

4D

In [None]:
groups_of_three_matrices

In [None]:
groups_of_three_matrices[0, :, :, -1]

In [None]:
groups_of_three_matrices[0, ..., -1]

In [None]:
groups_of_three_matrices[:, :, :, -1]

In [None]:
groups_of_three_matrices[..., -1]

#### Fancy Indexing

generalizes normal indexing

In [None]:
random_vector = a_million_ndarray[:12]

In [None]:
random_vector

In [None]:
selector = [0, -3, 0, 1]

In [None]:
random_vector[selector]

generalization

In [None]:
np.random.seed(42)

In [None]:
ndselector = np.random.randint(-random_vector.size, random_vector.size, size=(2, 2))

In [None]:
ndselector

In [None]:
random_vector[ndselector]

two index arrays

In [None]:
random_matrix = a_million_ndarray[:12].reshape(4, 3)

In [None]:
random_matrix

In [None]:
row_selector = [0, -3, 0, 1]
col_selector = [-1, 0, 1, 2]

In [None]:
random_matrix[row_selector, col_selector]

`np.array()` does not work with a generator expression as it needs to know the length in advance.

The list comprehension below gives the same result as above but is slower due to the object creation.

In [None]:
np.array([random_matrix[row, col] for row, col in zip(row_selector, col_selector)])
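What happens with a generator expression can be sketched as follows. `np.fromiter()` is shown as one possible alternative for one-dimensional data; it is not part of the original notes, and the variable names are illustrative.

```python
import numpy as np

# np.array() does not consume the generator; it wraps the generator
# object itself in a 0-dimensional array of dtype object.
wrapped = np.array(n ** 2 for n in range(5))
print(wrapped.shape, wrapped.dtype)  # () object

# np.fromiter() consumes an iterable item by item, given a dtype.
from_iter = np.fromiter((n ** 2 for n in range(5)), dtype=np.int64)
print(from_iter)  # [ 0  1  4  9 16]

# A list comprehension materializes all items first.
from_list = np.array([n ** 2 for n in range(5)])
print(from_list)  # [ 0  1  4  9 16]
```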

In [None]:
np.random.seed(42)

In [None]:
n_rows, n_cols = random_matrix.shape

In [None]:
row_ndselector = np.random.randint(-n_rows, n_rows, size=(2, 2))

In [None]:
row_ndselector

In [None]:
col_ndselector = np.random.randint(-n_cols, n_cols, size=(2, 2))

In [None]:
col_ndselector

In [None]:
random_matrix

In [None]:
random_matrix[row_ndselector, col_ndselector]

In [None]:
three_matrices

In [None]:
three_matrices[:, row_ndselector, col_ndselector]

### Generic `ndarray` Methods

`sum()` vs. `np.sum()` vs. `numbers_ndarray.sum()`
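The three ways of summing differ once an array has more than one dimension. A small sketch with a stand-alone `numbers` matrix (an illustrative name) shows that the built-in `sum()` iterates over the rows, whereas `np.sum()` and the `.sum()` method reduce over all items by default.

```python
import numpy as np

numbers = np.arange(1, 13).reshape(4, 3)

builtin_total = sum(numbers)      # adds up the rows one by one
function_total = np.sum(numbers)  # reduces over all items
method_total = numbers.sum()      # same as np.sum(numbers)

print(builtin_total)   # [22 26 30]
print(function_total)  # 78
print(method_total)    # 78
```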

In [None]:
numbers_ndarray # .fill  .copy()  astype()

In [None]:
numbers_ndarray.view()

In [None]:
numbers_ndarray.flags

In [None]:
numbers_ndarray[:].flags

In [None]:
numbers_ndarray[:].copy().flags

In [None]:
numbers_ndarray.copy

In [None]:
np.sin, np.cos, np.exp, np.log

In [None]:
numbers_matrix.ravel()  # makes view if possible

In [None]:
numbers_matrix.flatten()  # makes copy

### Reduction Methods

Reductions are applied on the whole array unless an `axis` is specified.

`sum()`, `prod()`, `min()`, `max()`, `argmin()`, `argmax()` (e.g., `np.unravel_index(a.argmin(), a.shape)`), `ptp()`

`nansum()`, etc. -> ignore `nan`s

`mean()`, `std()`, `var()`

`any()`, `all()`

as methods or functions (creates arrays)
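A few of these reductions can be sketched on a small example. The arrays `b` (integers) and `a` (floats with one `nan`) are made up for illustration.

```python
import numpy as np

b = np.array([[7, 11, 8], [5, 3, 12]])
a = np.array([[7.0, 11.0, 8.0], [5.0, np.nan, 12.0]])

total = b.sum()               # whole array
column_sums = b.sum(axis=0)   # collapse the rows
row_sums = b.sum(axis=1)      # collapse the columns

print(total)        # 46
print(column_sums)  # [12 14 20]
print(row_sums)     # [26 20]

# argmin() returns a position in the flattened array;
# np.unravel_index() translates it back into a (row, column) pair.
position = np.unravel_index(b.argmin(), b.shape)
print(position)  # row 1, column 1

nan_total = a.sum()         # nan propagates
clean_total = np.nansum(a)  # nan is ignored
print(nan_total, clean_total)  # nan 43.0
```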

[ufunc](https://docs.scipy.org/doc/numpy/reference/ufuncs.html)  [list of ufuncs](https://docs.scipy.org/doc/numpy/reference/ufuncs.html#available-ufuncs)

np.where

In [None]:
type(np.sin)

In [None]:
np.sin??

### Operations

In [None]:
numbers_ndarray + 100

broadcasting
- prepend ones to lower dimensional array's shape
- dimensions of size one are repeated (without copying)

everything is elementwise

`np.nan` propagates

np.inf
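The broadcasting rules can be sketched with a matrix, a one-dimensional `row`, and a `column` of shape `(4, 1)`. All names are illustrative. A stride of `0` indicates that the repeated items are not copied.

```python
import numpy as np

matrix = np.arange(12).reshape(4, 3)     # shape (4, 3)
row = np.array([100, 200, 300])          # shape (3,) -> treated as (1, 3)
column = np.array([[1], [2], [3], [4]])  # shape (4, 1)

plus_row = matrix + row        # the row is repeated for every row of matrix
plus_column = matrix + column  # the column is repeated for every column

print(plus_row[0])        # [100 201 302]
print(plus_column[:, 0])  # [ 1  5  9 13]

# np.broadcast_arrays() makes the "stretched" arrays explicit.
stretched_row, _ = np.broadcast_arrays(row, matrix)
print(stretched_row.shape)       # (4, 3)
print(stretched_row.strides[0])  # 0 -> no bytes to jump from row to row

# nan propagates through elementwise operations.
with_nan = np.array([1.0, np.nan]) + 1
print(with_nan)  # [ 2. nan]
```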

In [None]:
np.broadcast_arrays?

#### Masking

In [None]:
mask = numbers_ndarray < 10

In [None]:
mask

In [None]:
~mask

combine masks with `|` and `&`

fillna
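A short sketch of combining masks and of filling in missing values. The arrays and names are made up, and `np.where()` together with `np.isnan()` is only one possible way to replace `nan`s.

```python
import numpy as np

numbers = np.array([7, 11, 8, 5, 3, 12, 2, 6, 9, 10, 1, 4])

small = numbers < 4
large = numbers > 10

extremes = numbers[small | large]  # either condition holds
middle = numbers[~small & ~large]  # neither condition holds

print(extremes)  # [11  3 12  2  1]
print(middle)    # [ 7  8  5  6  9 10  4]

# Replace missing values with a constant.
with_gaps = np.array([1.0, np.nan, 3.0])
filled = np.where(np.isnan(with_gaps), 0.0, with_gaps)
print(filled)  # [1. 0. 3.]
```

Python's `and` and `or` do not work elementwise, which is why the bitwise operators are used on masks.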

### Linear Algebra

In [None]:
numbers_matrix.transpose()

In [None]:
numbers_matrix.T  # more convenient

In [None]:
another_matrix = np.arange(12).reshape((4, 3))

In [None]:
another_matrix

operator overloading

broadcasting

`+` and `*`
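As a sketch of operator overloading, `+` and `*` work item by item on two equally shaped arrays, whereas the `@` operator performs a matrix multiplication. The matrices `a` and `b` are made up for illustration.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[10, 20], [30, 40]])

elementwise_sum = a + b      # calls a.__add__(b) behind the scenes
elementwise_product = a * b  # item by item, not a matrix product
matrix_product = a @ b       # rows of a times columns of b

print(elementwise_sum)      # [[11 22] [33 44]]
print(elementwise_product)  # [[ 10  40] [ 90 160]]
print(matrix_product)       # [[ 70 100] [150 220]]
```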

## The `Series` Type

In [None]:
import pandas as pd

In [None]:
numbers_series = pd.Series(numbers_list)

In [None]:
id(numbers_series)

In [None]:
type(numbers_series)

In [None]:
numbers_series

## The `DataFrame` Type

In [None]:
orders = pd.read_csv("orders.csv")

In [None]:
orders.head()

## Further Resources

In [None]:
from IPython.display import YouTubeVideo
YouTubeVideo("ZB7BZMhfPgk", width="60%")

In his [SciPy Japan 2019 tutorial](https://www.youtube.com/watch?v=cYugp9IN1-Q) on "*Advanced Numpy*," Juan Nunez-Iglesias gives some more background on how views on arrays in memory work.

In [None]:
YouTubeVideo("cYugp9IN1-Q", width="60%")

In his [PyCon 2015 talk](https://www.youtube.com/watch?v=EEUXKG97YRw) titled "*Losing your Loops*," [Jake VanderPlas](http://vanderplas.com/), an astronomer and former [sklearn](https://scikit-learn.org) core developer, explains why `for`-loops are "slow" and how they can be replaced in [numpy](https://numpy.org/). A written-out version of this talk can be found in [this blog post](https://jakevdp.github.io/blog/2014/05/09/why-python-is-slow/).

In [None]:
YouTubeVideo("EEUXKG97YRw", width="60%")

In [None]:
YouTubeVideo("5rNu16O3YNE", width="60%")