# Announcements

* HW 9 is assigned, due Thu Nov 6th.
* Final project details are coming soon. We'll spend a significant amount of time on them at the end of the semester, replacing tutorials for the last few weeks.
* I'll continue to give short lectures on a couple of bonus topics after we transition to final projects.  Error-correcting codes and symbolic algebra are on my to-do list.  Other suggestions are welcome!

# Higher-Dimensional Arrays in NumPy

<a href="https://cms.cern/sites/cmsexperiment.web.cern.ch/files/styles/large/public/field/image/2011-05-30-hiY1.jpg?itok=Jsr3Yp5N" target="_blank"><img src="https://raw.githubusercontent.com/wlough/CU-Phys2600-Fall2025/main/lectures/img/lhc-heavy-ion.jpg" width=600px /></a>

## PHYS 2600: Scientific Computing

## Lecture 20

In [None]:
%matplotlib inline

import matplotlib.pyplot as plt
import numpy as np

## n-dimensional arrays

In math and science, _multi-dimensional_ data are common, and `numpy` is designed to handle them easily!  

Let's start in $d=2$, where the most familiar sort of array is the __table__.  For example, suppose we have some angle $\theta$ and angular speed $\omega$ measurements for a swinging pendulum:

| t   | theta | omega |
|-----|-------|-------|
| 0.0 | 0.79  | 0.0   |
| 0.2 | 0.71  | -0.74 |
| 0.4 | 0.50  | -1.35 |
| 0.6 | 0.19  | -1.69 |
| 0.8 | -0.16 | -1.70 |
| 1.0 | -0.47 | -1.39 |


We can enter this data as a `numpy` array:

In [None]:
pend_data = np.array(
    [
        [0.0, 0.79, 0.0],
        [0.2, 0.71, -0.74],
        [0.4, 0.50, -1.35],
        [0.6, 0.19, -1.69],
        [0.8, -0.16, -1.70],
        [1.0, -0.47, -1.39],
    ]
)

This is entered as a list of lists, and we can access the _rows_ of the table using a single index:

In [None]:
pend_data[0]  # First row, all columns

But now (unlike lists!) we can access individual array entries with a _two-dimensional index_:

In [None]:
pend_data[2, 1]  # Third row, second column

We can even access the _columns_ directly, using slice notation!  For example, a list of all $\omega$ measurements is the slice taking column `2` from all rows:

In [None]:
print(pend_data[:, 2])  # All rows, third column ("omega")
print(pend_data[0:3, 0:2])  # First three rows, first two columns (just t and theta)

Generalized indexing and masks also work in higher dimensions.  For example, suppose we want every number which is greater than zero:

In [None]:
print(pend_data > 0)
print(pend_data[pend_data > 0])

Of course, this is sort of a weird mash-up of $t$, $\theta$, and $\omega$ values.  Notice that we _lose the original array structure_ when using a 2-d mask like this - it wouldn't make much sense to keep it, since we no longer have uniform rows and columns after masking!

A more sensible use of a mask would be to get all rows where $\theta$ is positive:

In [None]:
theta_pos_mask = pend_data[:, 1] > 0
print(theta_pos_mask)  # note: 1-d array!
pend_data[theta_pos_mask, :]  # Get all rows where mask is True

Look carefully at how this works!  Ordinarily, a mask contains `True` or `False` for every entry in an array.  Here, `theta_pos_mask` has a `True` or `False` for _every row_, instead.

The final statement `pend_data[theta_pos_mask,:]` then should be read as "get every row where the mask is `True`, and every column."
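
As a small extension of the idea above, here is a sketch that combines a row mask with a _single column index_, to pull out only the $\omega$ values measured while $\theta$ was positive.  The variable name `omega_when_theta_pos` is just an illustrative choice, not something used elsewhere in the course.

```python
import numpy as np

# Same pendulum table as above: columns are t, theta, omega
pend_data = np.array(
    [
        [0.0, 0.79, 0.0],
        [0.2, 0.71, -0.74],
        [0.4, 0.50, -1.35],
        [0.6, 0.19, -1.69],
        [0.8, -0.16, -1.70],
        [1.0, -0.47, -1.39],
    ]
)

# One True/False per row: is theta (column 1) positive?
theta_pos_mask = pend_data[:, 1] > 0

# Rows where the mask is True, column 2 only (omega)
omega_when_theta_pos = pend_data[theta_pos_mask, 2]
print(omega_when_theta_pos)
```

The result is a 1-d array with one entry per selected row, since we asked for a single column rather than a slice of columns.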

Since we've been talking about memory and arrays, it should be noted that _any_ array is just a big chunk of contiguous memory.  "Dimensionality" is a useful fiction that lets us impose some extra structure!

For example, if we want to store a set of three 2-vectors (x,y), we could just store them end-to-end in a flat length-6 array.  Or, we can put them in a 3x2 array:

In [None]:
u, v, w = (3, -1), (0, 2), (-2, -4)

a1 = np.array([u[0], u[1], v[0], v[1], w[0], w[1]])
print(a1)
a2 = np.array([u, v, w])
print(a2)

Both arrays require the same amount of memory, but the dimensional structure makes it easier and clearer to work with `a2` compared to `a1`.

As a technical note, NumPy arrays are stored in __row-major__ format by default, which means `a2` looks exactly like `a1` in memory.  This can have performance implications - row operations are preferable to column operations - but in this class, we won't use any arrays large enough for the difference to matter.
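
One way to check the row-major claim for ourselves is to flatten the 2-d array and compare it to the flat one.  This is only a quick sketch: `.ravel()` and the `.flags` attribute are standard NumPy features, but we haven't covered them in this class.

```python
import numpy as np

a1 = np.array([3, -1, 0, 2, -2, -4])        # three 2-vectors stored end-to-end
a2 = np.array([(3, -1), (0, 2), (-2, -4)])  # the same vectors as a 3x2 array

# Flatten a2 in its default (row-major) order: row 0, then row 1, then row 2
flat = a2.ravel()
print(flat)

print(np.array_equal(flat, a1))  # same entries in the same order?
print(a2.flags["C_CONTIGUOUS"])  # is a2 laid out row-major in memory?
```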

For lists, the `len()` function was very useful for checking how many things are in a list variable.  For a two-dimensional array, the equivalent is `.shape`, which shows the length of _each dimension_:

In [None]:
print(a2)
print(a2.shape)

Note that like `.dtype`, `.shape` is a __property__ of the array itself, accessed with dot notation.  (A property is like a variable within a variable - it's associated with each array and we can access it like any other Python variable.)
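
For comparison, here is a short sketch of `.shape` next to two related array properties, `.ndim` (number of dimensions) and `.size` (total number of entries), along with what `len()` gives for a 2-d array.  The last two properties aren't needed for this lecture; they're included only to show how `.shape` relates to them.

```python
import numpy as np

a2 = np.array([(3, -1), (0, 2), (-2, -4)])

print(a2.shape)  # length of each dimension: (rows, columns)
print(a2.ndim)   # number of dimensions
print(a2.size)   # total number of entries = rows * columns
print(len(a2))   # len() only reports the length of the first axis (rows)
```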

Sometimes, it's useful to reorganize a NumPy array into a different shape - we can do this with the `.reshape()` method:

In [None]:
a2.reshape((1, 6))  # 1 row, 6 columns.  Note: same number of entries as original 3x2!
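
Here are a few more reshapes to try, as a rough sketch.  The new shape must always have the same total number of entries as the original; passing `-1` for one dimension asks NumPy to work out that length for us.  The names `b` and `c` are just placeholders.

```python
import numpy as np

a2 = np.array([(3, -1), (0, 2), (-2, -4)])  # shape (3, 2), 6 entries

b = a2.reshape((2, 3))  # 2 rows, 3 columns: still 6 entries
print(b)

c = a2.reshape(-1)      # -1 means "figure out this length": gives a flat array
print(c.shape)

# A shape with the wrong number of entries (4 * 2 = 8) raises an error
try:
    a2.reshape((4, 2))
except ValueError as err:
    print("reshape failed:", err)
```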

## Reductions

A __reduction__ is a many-to-one (or at least many-to-few) map: it compacts the information in a collection down to a single value or small set of values.

The most important reduction is probably `np.sum()`, which does what it says on the label, and the closely related `np.mean()`:

In [None]:
import numpy as np

print(np.sum(np.arange(100)))  # prints 100 * 99 / 2 = 4950
print(np.mean(np.arange(100)))  # = sum / 100

Less common, but still worth knowing about, are `np.prod()` (product of all array elements), and `np.cumsum()`, for a _cumulative_ or _running_ sum along the whole array.  (The cumulative sum isn't technically a reduction, but it's still useful!)

In [None]:
print(np.cumsum(np.arange(10)))
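
To see `np.prod()` next to `np.cumsum()`, here is a minimal sketch using the integers 1 through 5.  The product is just $5!$, and the last entry of the running sum matches the full sum.

```python
import numpy as np

n = np.arange(1, 6)  # [1 2 3 4 5]

print(np.prod(n))    # 1 * 2 * 3 * 4 * 5 = 120

running = np.cumsum(n)
print(running)       # running totals: 1, 1+2, 1+2+3, ...

# The final running total is the same as the ordinary sum
print(running[-1] == np.sum(n))
```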

Reductions can also be used on _part_ of an array, decreasing the number of dimensions:

In [None]:
z = np.arange(1, 9).reshape(2, 4)
print(z)
print(np.mean(z))
print(np.mean(z, axis=0))
print(np.mean(z, axis=1))

<img src="https://raw.githubusercontent.com/wlough/CU-Phys2600-Fall2025/main/lectures/img/numpy-2d.png" width=400px style="float:right;" />

In this case we need to give `np.mean()` the `axis=` keyword argument to tell it to average _along an axis_ (reduction to 1-d) instead of _over all entries_ (reduction to 0-d).  Note that axis `0` is the row-number axis, so using sum/mean with `axis=0` __collapses axis 0__ and just leaves us with _column_ averages.
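
A handy way to keep track of which axis gets collapsed is to look at `.shape` before and after the reduction.  Here's a short sketch with the same 2x4 array; `col_means` and `row_means` are illustrative names.

```python
import numpy as np

z = np.arange(1, 9).reshape(2, 4)  # [[1 2 3 4], [5 6 7 8]], shape (2, 4)

col_means = np.mean(z, axis=0)  # collapse axis 0 (rows): one mean per column
row_means = np.mean(z, axis=1)  # collapse axis 1 (columns): one mean per row

print(z.shape, "->", col_means.shape)  # (2, 4) -> (4,)
print(z.shape, "->", row_means.shape)  # (2, 4) -> (2,)
print(col_means)
print(row_means)
```

The axis we name in `axis=` is the one that disappears from the shape.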

Another pair of important reductions that we've met before are `np.any()` and `np.all()` - remember that they return `True` if any (or all) of the entries in an array satisfy some condition, and `False` otherwise.

In [None]:
z = np.array([[0, 1, 2], [-3, -4, 5], [-6, -9, -12]])
print(z)

print(np.any(z < 0))  # z contains any negatives?
print(np.any(z[0] < 0))  # Row 0 contains any negatives?
print(np.all(z < 0, axis=1))  # Which rows are all negative?
print(z[0])

The new feature this time is the `axis=` keyword, which again collapses the given axis down with the reduction we asked for (either `any` or `all`).  Here we collapse the _columns_, and we're left with a mask we can use on the _rows_:

In [None]:
row_mask = np.all(z < 0, axis=1)
z[row_mask, :]
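
The same trick works in the other direction.  As a sketch, collapsing the _rows_ with `axis=0` gives one `True`/`False` per column, which we can then use as a column mask.  Here we pick out the columns containing at least one positive entry; `col_mask` is just an illustrative name.

```python
import numpy as np

z = np.array([[0, 1, 2], [-3, -4, 5], [-6, -9, -12]])

# Collapse axis 0 (rows): does each column contain any positive number?
col_mask = np.any(z > 0, axis=0)
print(col_mask)

# All rows, but only the columns where the mask is True
print(z[:, col_mask])
```

Column 0 is dropped because its entries (0, -3, -6) are never strictly greater than zero.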

The other familiar example of a two-dimensional array is a __matrix__.  NumPy has an entire submodule, `numpy.linalg`, which has all the common routines (trace, determinant, eigenvalues) as well as matrix equation solvers and other goodies. We'll talk more about linear algebra in another lecture.

<img src="https://raw.githubusercontent.com/wlough/CU-Phys2600-Fall2025/main/lectures/img/3d-matrix.jpg" width=500px />

`numpy` also allows us to use _higher-dimensional arrays_, 3-d and higher.  These show up rarely!  If you were measuring the temperature $T$ of some three-dimensional volume, you might think of $T(x,y,z,t)$ as a 4d array.  But you can use a 2d one just as well: your lab notebook from that experiment would probably be written as a table of rows $(x,y,z,t,T)$.
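
To make the "grid versus table" point concrete, here is a rough sketch that stores the same data both ways.  To keep it short it uses only two coordinates, $T(x,y)$, instead of four, and the temperature formula is made up purely for illustration.  It also uses `np.meshgrid()` and `np.column_stack()`, which we haven't covered in class.

```python
import numpy as np

x = np.array([0.0, 1.0])       # 2 sample points in x
y = np.array([0.0, 0.5, 1.0])  # 3 sample points in y

# Hypothetical temperature field, T(x, y) = 20 + 5x + 2y, on the full grid
X, Y = np.meshgrid(x, y, indexing="ij")  # X and Y both have shape (2, 3)
T_grid = 20 + 5 * X + 2 * Y              # "grid" form: T_grid[i, j] = T(x[i], y[j])

# "Lab notebook" form: one row (x, y, T) per measurement
table = np.column_stack([X.ravel(), Y.ravel(), T_grid.ravel()])
print(T_grid.shape)  # (2, 3)
print(table.shape)   # (6, 3)

# Going back: reshape the T column into the grid shape
T_back = table[:, 2].reshape(len(x), len(y))
print(np.array_equal(T_back, T_grid))
```

Both forms hold the same six temperatures; the table simply writes the coordinates out explicitly instead of encoding them in the array indices.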

The only >2d array you're likely to encounter is a 3d one, and then usually only for making (some) three-dimensional plots and movies. 

(Note that movies are 3d arrays: an image is a 2d array, so a movie is a set of images that you display one after the other.)  
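
As a toy sketch of the "movie" idea, assume a grayscale movie stored with the frame number as axis 0, so the shape is (frames, height, width).  That axis ordering and the tiny sizes below are assumptions chosen for illustration.

```python
import numpy as np

n_frames, height, width = 4, 3, 5
movie = np.zeros((n_frames, height, width))

# Fill frame i with uniform brightness i, just so the frames differ
for i in range(n_frames):
    movie[i, :, :] = i

frame = movie[2]                   # a single frame is a 2-d image
print(frame.shape)                 # (3, 5)

time_avg = np.mean(movie, axis=0)  # average over frames: one 2-d image
print(time_avg.shape)              # (3, 5)
print(time_avg[0, 0])              # mean of 0, 1, 2, 3
```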

So I won't spend much time on them, except to tell you that the notation for indexing is just like 2d arrays but with more commas:

In [None]:
xyz = np.arange(8).reshape(2, 2, 2)
print(xyz)
print(xyz[0, :, :])
print(np.sum(xyz, axis=0))

<img src="https://raw.githubusercontent.com/wlough/CU-Phys2600-Fall2025/main/lectures/img/numpy-3d.png" width=400px style="float:left;margin:20px"/>

This adds a third axis, "axis 2", which we can slice and sum/reduce over as we saw before.
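
For instance, here is a quick sketch of slicing and summing along axis 2, using the same 2x2x2 array:

```python
import numpy as np

xyz = np.arange(8).reshape(2, 2, 2)

# Slice: index 0 along axis 2, everything along axes 0 and 1
print(xyz[:, :, 0])

# Reduce: summing over axis 2 collapses it, leaving a 2x2 array
sum_axis2 = np.sum(xyz, axis=2)
print(sum_axis2)
print(sum_axis2.shape)
```

Each entry of `sum_axis2` adds up one adjacent pair of numbers, for example `0 + 1` and `2 + 3` in the first row.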

## Tutorial 20

Open up `tut20` and let's get started!  We'll do some new plots once you're comfortable with higher-dimensional arrays.