# Tutorial 20: Higher Dimensions in NumPy

## PHYS 2600

In [None]:
# Import cell

%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

## T20.1 - Up to higher dimensions

Let's begin working with some higher-dimensional arrays!

### Part A

The following 2-dimensional array contains a set of velocity measurements $(v_x, v_y, v_z)$. Each row is one measurement, and the three columns hold the $v_x$, $v_y$, and $v_z$ components.


In [None]:
velo = np.array(
    [
        [2.7, 1.1, -0.4],
        [1.9, 1.9, 3.7],
        [-0.4, 0.0, -0.1],
        [1.5, -1.9, -2.7],
        [3.3, 0.4, 1.5],
        [7.1, -0.9, 1.3],
    ]
)

__Use two-dimensional indexing, masks, and slices to carry out the following operations:__

* Get the third $v_z$ measurement.

In [None]:
#

* Get the last three $v_x$ measurements as an array.

In [None]:
#

* Get an array containing all rows where the velocity component in the $y$-direction, $v_y$, is __negative__.  (There are two such rows.)

_(Hint: this requires a row mask!  Look back at the lecture notes to see how this works...)_
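As a reminder of how a row mask works, here is a minimal sketch on a small made-up array. The names `pts`, `row_mask`, and `selected` are hypothetical and not part of the exercise: a comparison on a single column gives a one-dimensional Boolean array, and indexing with it keeps whole rows.

```python
import numpy as np

# Hypothetical 4x2 array, used only for illustration
pts = np.array([[1.0, -2.0], [3.0, 4.0], [-5.0, 6.0], [7.0, -8.0]])

# Compare the first column to zero: one Boolean per row
row_mask = pts[:, 0] > 0
print(row_mask)

# Indexing with a 1-d Boolean array selects entire rows
selected = pts[row_mask]
print(selected)
```

The mask has one entry per row, so its length must match the first dimension of the array being indexed.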

In [None]:
#

### Part B

Use `reshape` and `arange` to make a grid of the numbers from 1 to 100, with 10 numbers in each row, and save it to a variable called `grid` (the cells below use that name). `print` it to see that you got it right!

In [None]:
#

Now, use a Boolean mask to set all __multiples of 2__ (even numbers) to 0.  

_(Hint: remember that `%` or `np.mod()` are what you want to test for multiples.  Remember that a mask has to be a Boolean array, explicitly; if you create an array of 0/1 instead of False/True, you'll get confusing results!)_
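To see why the 0/1 versus False/True distinction matters, here is a small sketch on a made-up one-dimensional array (the names `x`, `bool_mask`, and `int_mask` are hypothetical). An integer array used as an index is treated as a list of positions, not as a mask.

```python
import numpy as np

# Hypothetical small array, for illustration only
x = np.arange(1, 7)          # [1 2 3 4 5 6]

bool_mask = x % 3 == 0       # dtype bool: a proper mask
int_mask = x % 2             # dtype int, values 0/1: NOT a mask

# Boolean mask: assigns to the elements where the mask is True
y = x.copy()
y[bool_mask] = 0
print(y)

# Integer array: interpreted as positions 1, 0, 1, 0, ...,
# so this only ever touches y2[0] and y2[1]
y2 = x.copy()
y2[int_mask] = -1
print(y2)
```

If you ever end up with a 0/1 array, comparing it explicitly (for example `int_mask == 1`) turns it back into a Boolean mask.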



In [None]:
#
print(grid)

In [None]:
## You should see all of the even numbers set to zero in the print-out of `grid` above!
## Here's an automatic test to make sure:

# No positive even numbers should remain; the comparison with 0 goes outside count_nonzero()
assert np.count_nonzero(np.logical_and(grid > 0, grid % 2 == 0)) == 0

Now do the same to all __multiples of 3__, __multiples of 5__, and __multiples of 7__.

In [None]:
#

print(grid)

You should see from the printed grid that many of the numbers have been zeroed out, including 2, 3, 5, and 7 themselves (each one is a multiple of itself). The first number remaining after 1 is 11, which is the next prime number in sequence after 2, 3, 5, 7!

In fact, this is the start of a simple and ancient prime-number finding algorithm called the [__Sieve of Eratosthenes__](https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes).  Apart from 1, which is not prime, all of the numbers remaining in the grid above are prime numbers: the next prime to sieve with would be 11, and the first multiple of 11 not already crossed out by a smaller prime is $11^2 = 121$, which is greater than anything we have left.

(We didn't really need a two-dimensional array to apply the Sieve using Python, but if you're doing the algorithm by hand a grid like this is preferred - it makes it easier to spot patterns in the numbers that you're crossing out, e.g. all of the even columns disappeared when you removed multiples of 2.)
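For reference, the general algorithm can be sketched in one dimension as follows. This is only an illustrative version, not the solution to the exercise above: the function name `sieve` is made up here, and unlike the exercise it keeps the sieving primes themselves by starting each crossing-out at $p^2$.

```python
import numpy as np


def sieve(n):
    """Return all primes <= n using a Boolean-mask Sieve of Eratosthenes."""
    is_prime = np.ones(n + 1, dtype=bool)
    is_prime[:2] = False  # 0 and 1 are not prime
    p = 2
    while p * p <= n:
        if is_prime[p]:
            # Cross out multiples of p, starting from p*p;
            # smaller multiples were removed by smaller primes
            is_prime[p * p :: p] = False
        p += 1
    # Indices that are still True are the primes
    return np.nonzero(is_prime)[0]


primes = sieve(100)
print(primes)
```

The loop stops once $p^2 > n$, which is the same reasoning as above: for $n = 100$, nothing needs to be crossed out beyond the multiples of 7.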

## T20.2 - Down to lower dimensions

Now let's do some reduction exercises!

### Part A

Use a mask with `np.sum()` or `np.count_nonzero()` to count how many numbers are greater than 100 in the array below.  You should find __24__ as your answer.

_(Note: since `np.count_nonzero()` is more specialized, it tends to be faster than `np.sum()`.  Otherwise, they accomplish exactly the same task on a mask: counting how many `True` values there are.)_
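Here is a minimal sketch of the two counting approaches on a small made-up array (`vals` and the threshold 2.0 are hypothetical, chosen only for illustration):

```python
import numpy as np

# Hypothetical data, for illustration only
vals = np.array([[0.5, 2.5, 1.5], [3.5, 0.1, 4.0]])

mask = vals > 2.0                # Boolean array, same shape as vals
n_sum = np.sum(mask)             # True counts as 1, False as 0
n_cnt = np.count_nonzero(mask)   # counts the True entries directly
print(n_sum, n_cnt)
```

Both calls reduce over the whole array by default, so the two-dimensional shape of `vals` does not matter here.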

In [None]:
a = np.array(
    [
        [
            32.64173429,
            191.85703207,
            28.07613219,
            93.6758794,
            137.11334151,
            98.49036171,
            16.21920297,
            73.11380953,
            198.45163689,
            95.30971938,
            105.18180757,
            58.69108781,
        ],
        [
            64.34112696,
            120.99784718,
            79.40494167,
            150.56049688,
            75.27558455,
            169.26085959,
            119.46765466,
            5.15243732,
            63.46223024,
            13.83497847,
            76.09363069,
            15.07392756,
        ],
        [
            98.81659688,
            127.66361968,
            25.18028895,
            1.80201359,
            41.11615739,
            22.39522975,
            76.92570925,
            45.42721332,
            9.30257269,
            103.4348637,
            159.5006543,
            52.82242142,
        ],
        [
            115.98655928,
            146.78847103,
            126.29624116,
            35.91681471,
            26.33847396,
            38.53213651,
            111.75034001,
            161.22160872,
            174.9432789,
            158.26168031,
            143.29047058,
            155.76391194,
        ],
        [
            31.40801211,
            62.97866379,
            95.74896593,
            22.0901773,
            58.7692522,
            90.20673554,
            183.78983523,
            198.26328673,
            137.14519089,
            194.93600581,
            71.14449439,
            65.56166131,
        ],
    ]
)

In [None]:
#

In the exercise above, we ignored the two-dimensional structure of the array, and just asked about individual numbers.  Now let's try some more structured questions:

1. __How many rows in `a` have an average greater than 100?__  (There should be 2.)
2. __How many columns in `a` have an average greater than 100?__  (There should be 5.)

_(Hint: use `np.mean()` to take the average, and the `axis=` keyword to do it over rows or columns.  The array `a` has 5 rows and 12 columns, so if you're not sure which axis is which, you can look at the size of the array of averages produced by `np.mean()`.)_
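The shape check suggested in the hint looks like this on a small made-up array (`m` is hypothetical). The axis you pass is the one that gets collapsed, so the result has the length of the other axis.

```python
import numpy as np

# Hypothetical 2x3 array (2 rows, 3 columns), for illustration only
m = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

col_means = np.mean(m, axis=0)   # collapse the rows: one mean per column
row_means = np.mean(m, axis=1)   # collapse the columns: one mean per row

print(col_means.shape, row_means.shape)
print(col_means, row_means)
```

The array of averages can then be compared to a threshold and counted, exactly like any other mask.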

In [None]:
# 

### Part B

Going back to the `velo` array from T20.1: use an array reduction to find the __total velocity vector__, 
$$
\vec{v} = \sum_i \vec{v}_i = \sum_i (v_{x,i}, v_{y,i}, v_{z,i}).
$$

Save your answer to the variable `v_tot`.  (Your result should be an array of length 3; I've given you the correct answer as a test case.)

In [None]:
# v_tot = 
print(v_tot)

In [None]:
import numpy.testing as npt

npt.assert_allclose(v_tot, np.array([16.1, 0.6, 3.3]), atol=0.1)

Next, get an array from `velo` containing all rows where _all_ components of $v$ are __positive__.

_(Hint: this requires using the `np.all` reduction to create a mask, and then selecting from `velo` using that mask.)_
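As a sketch of how `np.all` turns a two-dimensional mask into a row mask, here is a made-up example with a different condition (the array `w` and the threshold 5 are hypothetical):

```python
import numpy as np

# Hypothetical 3x2 array, for illustration only
w = np.array([[1, 2], [3, 9], [4, 0]])

elementwise = w < 5                      # 2-d Boolean array, same shape as w
row_mask = np.all(elementwise, axis=1)   # True only if every entry in the row passes
print(row_mask)

# For comparison, np.any is True if at least one entry in the row passes
any_mask = np.any(elementwise, axis=1)
print(any_mask)

print(w[row_mask])
```

Without `axis=1`, `np.all` would reduce over the whole array and return a single Boolean, which is not usable as a row mask.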

In [None]:
# 

One more exercise with `velo`: get an array containing all rows where the __speed__ $v = \sqrt{v_x^2 + v_y^2 + v_z^2}$ is greater than 3.0 m/s.  There should be four such rows.

_(Hint: once again, you're creating a mask.  Since the formula for speed includes a sum, using `np.sum()` with the appropriate axis is recommended here. Another option is to use `np.linalg.norm`.)_
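The two options mentioned in the hint give the same per-row magnitudes, as this sketch on two made-up vectors shows (`vecs` is hypothetical):

```python
import numpy as np

# Hypothetical vectors, one per row, for illustration only
vecs = np.array([[3.0, 4.0, 0.0], [1.0, 2.0, 2.0]])

# Option 1: square, sum along each row, take the square root
mag_sum = np.sqrt(np.sum(vecs**2, axis=1))

# Option 2: Euclidean norm of each row
mag_norm = np.linalg.norm(vecs, axis=1)

print(mag_sum, mag_norm)
```

Either result is a one-dimensional array with one magnitude per row, which is the right shape to compare against a threshold and use as a row mask.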

In [None]:
# speed = 

## T20.3 - Rolling the dice


Let's go back to random numbers to see another application of higher-dimensional arrays.  Consider the random distribution of results obtained by rolling pairs of six-sided dice and taking the sum:

<a href="https://math.stackexchange.com/questions/1204396/why-is-the-sum-of-the-rolls-of-two-dices-a-binomial-distribution-what-is-define" target="_blank"><img src="https://raw.githubusercontent.com/wlough/CU-Phys2600-Fall2025/main/lectures/img/dice_histogram.png" width=400px /></a>

Even though the distribution of numbers 1-6 for a single die is uniform, we end up with a triangular-shaped distribution for the sum.  As shown in the plot above, this is simple combinatorics: there are many ways to make 7 by adding two dice, but only one way to make 2 or 12.  The distribution should be peaked at 7 and symmetric on both sides.

### Part A

Let's reproduce this result in Python - representing pairs of dice rolls is a good use case for a two-dimensional array!  Using `np.random.randint()`, __create an array called `rolls`__ containing 10,000 random pairs of integers between 1 and 6 (inclusive).  Then use `np.sum` to __create an array called `sums`__ which contains the sum of each pair of integers.  (The `sums` array will give us the distribution of fair rolls for a pair of six-sided dice.)

_(Hint: `rolls` should have 2 rows and 10,000 columns, which you can get directly from `np.random.randint` with the `size=...` keyword.  Giving `size` a tuple of numbers will create a higher-dimensional array of that shape.  To create `sums`, you'll need the `axis=...` keyword, since just calling `np.sum(rolls)` will add everything up and leave you with one number!)_
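Here is a sketch of the same pattern with different, made-up numbers: three four-sided dice thrown five times (the names `throws` and `totals` are hypothetical). Note that the upper bound of `np.random.randint` is exclusive.

```python
import numpy as np

# Hypothetical example: 3 four-sided dice, thrown 5 times.
# The upper bound is exclusive, so high=5 gives integers 1..4.
throws = np.random.randint(1, 5, size=(3, 5))
print(throws)

# Sum down each column (axis=0 collapses the 3 rows): one total per throw
totals = np.sum(throws, axis=0)
print(totals)

# Without axis=, everything is added into a single number
print(np.sum(throws))
```

With one die per row and one throw per column, `axis=0` is the axis to sum over; if the array were laid out the other way around, it would be `axis=1`.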



In [None]:
# rolls = np.random.randint(...)
# sums = np.sum(...)

In [None]:
assert rolls.shape == (2, 10000)
assert sums.shape == (10000,)

Now we'd like to plot the distribution of `sums`. Your first instinct is probably to use `plt.hist()`, but it turns out that you have to be a little careful with how the histogram bins work, since our dice rolls are all exactly integers.  So, I'm going to ask you to do the same plot _two ways:_

1. __Use the function `np.unique()` with argument `return_counts=True`__: this will return a tuple of 2 arrays - the first is the unique values in `sums` (which is every number from 2 to 12) and the second is the number of times each value occurs.  You can plug these two arrays into `plt.plot` - I suggest using a marker like `marker='s'` so you can see the discrete points being plotted.

2. __Use `plt.hist` on `sums` directly__, but make sure you pick the right number of bins (specified by the `bins=...` argument).  You should have the same number of bins as possible outcomes!

Plot both of these in the same cell. They should agree with each other, and they should both look like the ideal triangular distribution shown above. If the `plt.hist` plot disagrees with the `np.unique()` plot, that is a sign of one of the common binning mistakes.
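To illustrate how binning can go wrong with integer data, here is a sketch on a tiny made-up data set (`data` is hypothetical). It uses `np.histogram` to look at the bin counts without plotting, on the assumption that `plt.hist` interprets its `bins=` argument the same way. The explicit half-integer bin edges are shown as one alternative, not as the required approach for the exercise.

```python
import numpy as np

# Hypothetical integer data, for illustration only
data = np.array([1, 1, 2, 3, 3, 3, 4])

# Exact counts of each distinct value
values, counts = np.unique(data, return_counts=True)
print(values, counts)

# Too few bins: neighbouring integers get merged into one bin
merged, _ = np.histogram(data, bins=2)
print(merged)

# One bin per possible value, with edges at half-integers
edges = np.arange(0.5, 5.5, 1.0)   # 0.5, 1.5, 2.5, 3.5, 4.5
per_value, _ = np.histogram(data, bins=edges)
print(per_value)
```

When the bins line up with the possible outcomes, the histogram counts match the `np.unique` counts exactly.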


In [None]:
# 

### Part B

A common task with random numbers is __random sampling__, where we have some list of values and we want to choose sets of them randomly.  We could use `randint` to index the list, but this problem is so common that NumPy gives us a function to do it, `np.random.choice()`.
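As a quick sketch of how `np.random.choice` behaves, here is an example with a made-up list of outcomes (`colors` and `draws` are hypothetical names). Like `randint`, it accepts a tuple for `size=` to build a higher-dimensional array.

```python
import numpy as np

# Hypothetical list of outcomes, for illustration only
colors = ["red", "green", "blue"]

# Draw a 2x4 array of samples; sampling is with replacement by default,
# so the same outcome can appear more than once
draws = np.random.choice(colors, size=(2, 4))
print(draws)
print(draws.shape)
```

Each entry of `draws` is picked independently and with equal probability from the list.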

Create a list of possible outcomes for rolling one die, and then __use `np.random.choice` with the `size=` keyword argument__ to repeat what you did in part A, i.e. drawing 10,000 random rolls of a pair of dice and then creating the array of their sums.  __Copy/paste one of your plotting routines from above__ to visualize the result and make sure you're getting the same distribution.

In [None]:
# 

### Part C

What does the distribution look like for rolling __three__ six-sided dice at once?  __Ten__ dice?  (You'll need to be careful with how you adjust your bins to get a nice-looking distribution - especially for the case of ten dice, because the odds of actually rolling ten 1s at the same time are very low...)

In [None]:
# 

### Part D (optional challenge)

Certain board or role-playing games sometimes call for the use of a 100-sided die (also known as a "d100").  If you've ever seen a d100, you know they're large and somewhat impractical!  The largest common dice that are used for games are 20-sided dice ("d20").

To replace a d100, you could instead roll __five 20-sided dice__ ("5d20").  Simulate the results - how different would this distribution be from a uniform distribution from 1 to 100, which the d100 provides?  What if you rolled __two 10-sided dice__, multiplied the result of one of them by 10, and added them together - is that a better approximation to a single d100?

In [None]:
# 