# Assignment 25: NumPy Vector Operations #

### Goals for this Assignment ###

By the time you have completed this assignment, you should be able to:

- Use NumPy's vector operations to operate over many array elements at a time, without loops
- Use NumPy's _broadcasting_ with scalar values to effectively treat a single value as an array
- Use vector operations and broadcasting with masking to extract values meeting conditions
- Use `mean`, `min`, and `max` to gather basic statistical information over a data set represented as a NumPy array

## Step 1: Use Arithmetic and Logical Vector Operations in NumPy ##

### Background: Vector Operations in NumPy ###

When working with data sets, we often want to uniformly apply some operation to the data, or at least to some subset of the data.
Through the lens of NumPy, fancy indexing and masking can both be used to get subsets of data.
As for the operations we apply to data, this is where _vector operations_ come in.
The idea is that we can simultaneously apply a given operation to many values in an array, all at once, resulting in a new array.
To see this in action, consider the code in the following cell:

In [1]:
import numpy as np
first = np.array([3, 2, 7, 8])
second = np.array([1, 8, 2, 5])
result = first + second
print(first) # prints [3 2 7 8]
print(second) # prints [1 8 2 5]
print(result) # prints [ 4 10  9 13]

[3 2 7 8]
[1 8 2 5]
[ 4 10  9 13]


To explain the above code, the first three lines import NumPy and create some arrays, as normal.
The line `result = first + second` appears to add two entire arrays together.
What `+` means for NumPy arrays is to perform addition on each individual pair of elements, putting the results in some returned output array (named `result` in this code).
Looking at the results of the `print`s, `first` and `second` are unaffected by `+`; these still hold the same values they started with.
As for the values in `result`, `result[0]` is `first[0] + second[0]`, `result[1]` is `first[1] + second[1]`, `result[2]` is `first[2] + second[2]`, and `result[3]` is `first[3] + second[3]`.
That is, each index of `result` corresponds to adding together the values of `first` and `second` at the given index.
Even though `first + second` thus inherently applied `+` for as many times as we had array elements, there are no loops explicitly present in the code; any looping is actually handled by NumPy itself.
Additionally, NumPy implements `+` internally in optimized, precompiled code that can take advantage of specialized hardware-level instructions, which makes it as fast as possible, well beyond what a normal Python loop would allow you to do.
In short, we performed a loop-like operation without all the trouble of writing a loop, and we did it quickly, too.
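
To make the "loop-like" behavior concrete, the following sketch compares `first + second` against a hand-written loop. The helper name `add_with_loop` is made up for illustration and is not part of NumPy; it only spells out the element-by-element work that `+` performs for us.

```python
import numpy as np

first = np.array([3, 2, 7, 8])
second = np.array([1, 8, 2, 5])

def add_with_loop(a, b):
    # Hypothetical helper: does by hand what `a + b` does in one step.
    # Assumes `a` and `b` are single-dimensional arrays of the same length.
    result = np.zeros(len(a), dtype=a.dtype)
    for i in range(len(a)):
        result[i] = a[i] + b[i]
    return result

looped = add_with_loop(first, second)
vectorized = first + second
print(looped)      # prints [ 4 10  9 13]
print(vectorized)  # prints [ 4 10  9 13]
print(np.array_equal(looped, vectorized))  # prints True
```

Both versions produce the same array; the vector operation simply hides the loop.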

All the usual arithmetic operations can be applied in this fashion, for example:

In [2]:
subtract = first - second
print(subtract) # prints [ 2 -6  5  3]

multiply = first * second
print(multiply) # prints [ 3 16 14 40]

divide = first / second
print(divide) # prints [3.   0.25 3.5  1.6 ]

[ 2 -6  5  3]
[ 3 16 14 40]
[3.   0.25 3.5  1.6 ]


In all three cases above, the left operand of the arithmetic operator is taken from `first`, and the right operand is taken from `second`.
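
As a small additional sketch, the same pairwise behavior holds for operators not shown above, such as floor division (`//`) and remainder (`%`). Swapping the operands also swaps which array supplies the left operand, which matters for operators like `-`.

```python
import numpy as np

first = np.array([3, 2, 7, 8])
second = np.array([1, 8, 2, 5])

floor_divide = first // second  # pairwise floor division
remainder = first % second      # pairwise remainder
swapped = second - first        # left operand now comes from `second`

print(floor_divide)  # prints [3 0 3 1]
print(remainder)     # prints [0 2 1 3]
print(swapped)       # prints [-2  6 -5 -3]
```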

We also can apply numeric relational operators to NumPy arrays, as with:

In [3]:
nums1 = np.array([3, 2, 7, 4, 5])
nums2 = np.array([7, 2, 8, 4, 9])
res = nums1 < nums2
print(res) # prints [ True False  True False  True]
print(nums1 > nums2) # prints [False False False False False]
print(nums1 == nums2) # prints [False  True False  True False]

[ True False  True False  True]
[False False False False False]
[False  True False  True False]


For example, `res[0]` holds the result of `nums1[0] < nums2[0]`; this is `True` since `3 < 7`.
The results of all other cases follow by applying each operator pairwise to its operands.
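
The sketch below shows two more relational operators on the same arrays, and confirms that the result of a comparison is itself a NumPy array holding Boolean values.

```python
import numpy as np

nums1 = np.array([3, 2, 7, 4, 5])
nums2 = np.array([7, 2, 8, 4, 9])

res = nums1 < nums2
not_equal = nums1 != nums2
less_or_equal = nums1 <= nums2

print(res.dtype)      # prints bool
print(not_equal)      # prints [ True False  True False  True]
print(less_or_equal)  # prints [ True  True  True  True  True]
```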

### Try this Yourself ###

The next cell defines two different arrays, `arr1` and `arr2`.
The comments say which operation is requested for you to perform.
The first one is provided for you as an example.

In [7]:
arr1 = np.array([4, 8, 9, 0, 2, 0, 4])
arr2 = np.arange(7)
print(arr1)
print(arr2)
print() 

# +
print(arr1 + arr2) # should print [ 4  9 11  3  6  5 10]

# -
print()
print(arr1 - arr2) # should print [ 4  7  7 -3 -2 -5 -2]

# *
print()
print(arr1 * arr2) # should print [ 0  8 18  0  8  0 24]

# <
print()
print(arr1 < arr2) # should print [False False False  True  True  True  True]

# <=
print()
print(arr1 <= arr2) # should print [False False False  True  True  True  True]

# >
print()
print(arr1 > arr2) # should print [ True  True  True False False False False]

# >=
print()
print(arr1 >= arr2) # should print [ True  True  True False False False False]

[4 8 9 0 2 0 4]
[0 1 2 3 4 5 6]

[ 4  9 11  3  6  5 10]

[ 4  7  7 -3 -2 -5 -2]

[ 0  8 18  0  8  0 24]

[False False False  True  True  True  True]

[False False False  True  True  True  True]

[ True  True  True False False False False]

[ True  True  True False False False False]


## Step 2: Use Broadcasting to Treat a Scalar as an Array ##

### Background: Dimensionality and Broadcasting ###

When applying operations in this fashion with NumPy arrays, generally both arrays used must have the same size.
For example, the code in the following cell throws an exception:

In [9]:
has_three = np.array([2, 1, 8])
has_four = np.array([9, 3, 5, 1])

print(has_three)
print(has_four)

print(has_three + has_four)

[2 1 8]
[9 3 5 1]


ValueError: operands could not be broadcast together with shapes (3,) (4,) 

The code above specifically throws a `ValueError` exception, saying the operands could not be _broadcast_ together with shapes `(3,)`, `(4,)`.
The _shape_ of an array describes the array's _dimensionality_, along with the number of elements along each dimension.
So far, we have only dealt with so-called _single-dimensional_ data structures, meaning only one index needs to be provided to access any given element.
For example:

- Python lists: accessing `some_list[5]` will get the value at index `5`.
- Python dictionaries: accessing `some_dictionary["foo"]` gets the value for key `"foo"`.  In this case, `"foo"` behaves very similarly to an index.
- NumPy arrays: accessing `some_array[7]` will get the value at index `7`, similar to a Python list.

That all said, we can set up a _multidimensional_ data structure, wherein multiple indices need to be provided to get a particular value.
This is most commonly done by nesting one of the above data structures in another instance of itself, as with:

In [10]:
nested_list = [[0, 1, 2], [5], [12, 4]]
print(nested_list[1][0]) # prints 5
print(nested_list[2][1]) # prints 4
print(nested_list[0][2]) # prints 2

5
4
2


In this case, `nested_list` is said to specifically be a two-dimensional list, because it is a list which itself contains lists of integers.
To access any particular datum, we need to provide two indices, as with `nested_list[1][0]`, hence this is two-dimensional.
We can view this access as involving some implicit parentheses, namely `(nested_list[1])[0]`.
That is, starting from the outermost list (`nested_list`), we access the value at a given index.
In this case though, each value is _itself_ a list.
As such, from there we access individual elements in that returned list, giving us back whatever integer was stored in the innermost list.
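
The two-step access described above can be sketched by storing the intermediate list in a variable. The name `inner` is only for illustration.

```python
nested_list = [[0, 1, 2], [5], [12, 4]]

inner = nested_list[2]  # first index: selects one of the inner lists
value = inner[1]        # second index: selects an integer within that list

print(inner)  # prints [12, 4]
print(value)  # prints 4
print(value == nested_list[2][1])  # prints True
```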

In the same fashion as `nested_list` above, NumPy arrays can themselves contain other NumPy arrays, allowing for multidimensional arrays.
In practice, this is useful for representing information like a table, where a table can have many rows and columns; for example, we can have an array of rows, where each row is a separate array of values in the columns.
However, in the case of `has_three` and `has_four`, these are only single-dimensional arrays.
The specific shapes `(3,)` and `(4,)` refer to the number of elements in these arrays; these values themselves are actually tuples, but only one element is in the tuple.
If we were dealing with multidimensional arrays, more than one value would be in the tuple for the shape.
For example, the shape `(5, 9)` would refer to a two-dimensional array with five rows and nine columns per row.
Since we are only dealing with single-dimensional arrays here, the `3` and `4` refer only to the number of elements in `has_three` and `has_four`, respectively.
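
To see shapes directly, the sketch below inspects the `shape` and `ndim` attributes of a single-dimensional array and of a small two-dimensional one. The array `table` is made up for illustration.

```python
import numpy as np

has_three = np.array([2, 1, 8])
table = np.array([[1, 2, 3], [4, 5, 6]])  # made-up table: 2 rows, 3 columns per row

print(has_three.shape)  # prints (3,)
print(has_three.ndim)   # prints 1
print(table.shape)      # prints (2, 3)
print(table.ndim)       # prints 2
print(table[1][0])      # prints 4
```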

As for the term "broadcast" in the error message above, this is a NumPy-specific term for what is attempted when the arrays do **not** have the same length / dimensionality.
In that case, we have a bit of a problem, as there is at least one index for which we don't have both operands for the given operator.
Broadcasting effectively tries to convert one of the operands into an array of the expected size.
Broadcasting can't always be performed, as not everything can be treated as an array; in the above code, for example, NumPy couldn't apply broadcasting.
However, for our purposes, there is one important case where broadcasting _does_ work: when dealing with a _scalar_ (i.e., non-array) value.
When dealing with scalar values, semantically speaking, a new array is created which has a length of the expected array, and the scalar value is copied into every cell of the new array.
This represents the most common case of broadcasting by far.
The following cell shows an example of this broadcasting of a scalar value, specifically `5`:

In [17]:
# broadcasting

arr = np.array([3, 2, 8, 4, 5])
less_than_five_entries = arr < 5
print(less_than_five_entries) # [ True  True False  True False]

[ True  True False  True False]


As shown, `5` is treated as if it were a NumPy array.
Because `5` on its own has no elements to pair up with those of `arr`, we might expect this to throw an exception, but this is one of the cases where broadcasting applies.
Specifically, NumPy will see the following:

- We are attempting to apply `<` to a valid NumPy array, `arr`
- The righthand operand of `<` is **not** a NumPy array, but rather a scalar (non-array) value
- Semantically speaking, we can turn `5` into a NumPy array by making an array of the same size as `arr`, and copying `5` into each cell of this new array.  In other words, `5` in the code above effectively becomes `np.array([5, 5, 5, 5, 5])`, because there were five elements in `arr`.

I say "semantically speaking", because NumPy may or may not actually create a separate array, but rather treat this specially somehow; these details are abstracted away from us.
We can, however, treat this as if it does create a new NumPy array containing all `5`s.
Looking at the output of `print(less_than_five_entries)`, this ends up printing all the Boolean values resulting from `3 < 5`, `2 < 5`, `8 < 5`, `4 < 5`, and `5 < 5`, corresponding to each of the values in `arr`, in order.
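
The "as if" view of broadcasting can be checked with a short sketch: we build the array of `5`s explicitly with `np.full`, and compare the result against the broadcast version. The name `fives` is only for illustration, and this says nothing about what NumPy does internally.

```python
import numpy as np

arr = np.array([3, 2, 8, 4, 5])
fives = np.full(len(arr), 5)  # explicit array of five 5s

explicit = arr < fives   # comparison against the explicit array
broadcasted = arr < 5    # comparison against the scalar, using broadcasting

print(fives)        # prints [5 5 5 5 5]
print(explicit)     # prints [ True  True False  True False]
print(np.array_equal(explicit, broadcasted))  # prints True
```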

There are other situations in which broadcasting can occur, but they are beyond our scope in this assignment.
If you're curious, you can read more about broadcasting and these other situations in the [official NumPy documentation for broadcasting](https://numpy.org/devdocs/user/basics.broadcasting.html).

### Try this Yourself ###

The next cell defines an array named `arr3`, as well as a bunch of operations combined with a scalar value in the comments.
Similar to the prior step, print out the result of applying each operation and scalar value to `arr3`.
The first one is provided for you as an example.

In [12]:
arr3 = np.array([3, 2, 7, 8])

# + 5
print()
print(arr3 + 5) # should print [ 8  7 12 13]

# - 2
print()
print(arr3 - 2) # should print [1 0 5 6]

# * 4
print()
print(arr3 * 4) # should print [12  8 28 32]

# / 2
print()
print(arr3 / 2) # should print [1.5 1.  3.5 4. ]

# % 2
print()
print(arr3 % 2) # should print [1 0 1 0]

# > 2
print()
print(arr3 > 2) # should print [ True False  True  True]


[ 8  7 12 13]

[1 0 5 6]

[12  8 28 32]

[1.5 1.  3.5 4. ]

[1 0 1 0]

[ True False  True  True]


## Step 3: Combine with Masking to Extract Values Meeting Conditions ##

### Background: Combining Operations ###

At this point, we have covered:

- Using fancy indexing to access values using another NumPy array (or something iterable) of indices, resulting in a NumPy array
- Using masking to access values using another NumPy array (or something iterable) of Booleans, resulting in a NumPy array
- Using broadcasting to implicitly treat scalar values as NumPy arrays
- Using vector operations over NumPy arrays to create new NumPy arrays

We can now start to put all this functionality together to perform higher-level tasks efficiently, with surprisingly minimal code.
In terms of the design of NumPy itself, importantly, all these operations work with NumPy arrays and yield NumPy arrays, meaning we can easily combine these to perform bigger-picture tasks.
For example, consider the `evens` function in the following cell, which can be used to extract all the even numbers in a given input NumPy array `arr`:

In [13]:
def evens(arr):
    return arr[arr % 2 == 0]

print(evens(np.array([3, 7, 2, 8, 9]))) # prints [2 8]
print(evens(np.array([2, 4, 6, 8]))) # prints [2 4 6 8]
print(evens(np.array([3, 5, 7, 9]))) # prints []
print(evens(np.array([]))) # prints []

[2 8]
[2 4 6 8]
[]
[]


To explain the body of `evens`, the `return`ed expression evaluates the following, in order:

- `arr % 2` broadcasts `2`, and returns a NumPy array holding the remainder of each value in `arr` when divided by `2`.
- `arr % 2 == 0` then takes the array from `arr % 2`, broadcasts `0`, and compares each value to `0`.  The resulting NumPy array of Booleans will hold `True` at each index where the element was `0`, and `False` otherwise.  From a higher-level perspective, this yields a NumPy array of Booleans of the same length as `arr`, where each `True` means the corresponding element in `arr` is even, and each `False` means the corresponding element in `arr` is odd.
- `arr[arr % 2 == 0]` takes the array of Booleans from `arr % 2 == 0`, and uses this as a mask to create a new NumPy array of all the elements in `arr` which have `True` at the corresponding index in `arr % 2 == 0`.  Or, from a high-level perspective, this extracts out all the even elements into a new array.
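
The three steps above can be traced by giving each intermediate array a name. The names `remainders`, `mask`, and `result` are only for illustration; `evens` itself does all of this in one expression.

```python
import numpy as np

arr = np.array([3, 7, 2, 8, 9])

remainders = arr % 2    # step 1: remainder of each element when divided by 2
mask = remainders == 0  # step 2: True where the element is even
result = arr[mask]      # step 3: keep only the elements where the mask is True

print(remainders)  # prints [1 1 0 0 1]
print(mask)        # prints [False False  True  True False]
print(result)      # prints [2 8]
```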

Comparatively, if we were to define `evens` using Python lists and list comprehensions, this wouldn't be quite as short, as shown in the cell below.

In [18]:
def evens_list(lst):
    return [e for e in lst if e % 2 == 0]

print(evens_list([3, 7, 2, 8, 9])) # prints [2, 8]
print(evens_list([2, 4, 6, 8])) # prints [2, 4, 6, 8]
print(evens_list([3, 5, 7, 9])) # prints []
print(evens_list([])) # prints []

[2, 8]
[2, 4, 6, 8]
[]
[]


Even though the list comprehension lets us avoid a typical loop, this code is still a bit longer, primarily because of the need to introduce a variable `e` representing each individual element.
With the NumPy version, this is entirely implicit; we never individually access any given element.
Furthermore, the NumPy version is expected to be much faster, particularly on large inputs, because it internally uses vector operations to do practically everything.
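
If you would like to check the speed difference yourself, one possible approach is sketched below using Python's standard `timeit` module. The input size of 100,000 elements and the repeat count of 20 are arbitrary choices made for illustration, and the measured times will vary from machine to machine.

```python
import timeit
import numpy as np

def evens(arr):
    return arr[arr % 2 == 0]

def evens_list(lst):
    return [e for e in lst if e % 2 == 0]

values = list(range(100_000))  # arbitrary test data, as a Python list
arr = np.array(values)         # the same data, as a NumPy array

# Time 20 calls of each version; the results are total seconds for all 20 calls.
numpy_seconds = timeit.timeit(lambda: evens(arr), number=20)
list_seconds = timeit.timeit(lambda: evens_list(values), number=20)
print(f"NumPy: {numpy_seconds:.4f} s")
print(f"List comprehension: {list_seconds:.4f} s")

# Both versions should select exactly the same values.
same = evens(arr).tolist() == evens_list(values)
print(same)  # prints True
```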

### Try this Yourself ###

In the next cell, define a `greater_than_n` function which will take:

- An input NumPy array
- An integer `n`

Your `greater_than_n` function should return a NumPy array containing all values in the input NumPy array which are greater than the second parameter `n`.
Leave the calls to `greater_than_n` in place in order to test your code.

In [19]:
# Define your greater_than_n function here.
# Leave the calls below in place in order to test your code.
def greater_than_n(arr, n):
    return arr[arr > n]


print(greater_than_n(np.array([2, 8, 3, 6]), 3)) # should print [8 6]
print(greater_than_n(np.array([8, 4, 6, 1, 9]), 6)) # should print [8 9]
print(greater_than_n(np.array([10, 14, 27, 34]), 10)) # should print [14 27 34]
print(greater_than_n(np.array([]), 0)) # should print []

[8 6]
[8 9]
[14 27 34]
[]


## Step 4: Use `mean`, `min`, and `max` to Compute Basic Statistics ##

### Background: Statistical Operations with NumPy ###

NumPy has a number of statistical operations in place which can be performed over arrays; a full list can be seen [in the official NumPy documentation](https://numpy.org/doc/2.3/reference/routines.statistics.html).
For our purposes, we will only consider three such operations: `mean`, `min`, and `max`.
These perform the operations implied by their name, as demonstrated in the cell below:

In [20]:
arr = np.array([3, 7, 2, 9, 6, 4])
print(np.mean(arr)) # prints 5.166666666666667
print(np.min(arr)) # prints 2
print(np.max(arr)) # prints 9

5.166666666666667
2
9


As before, you can combine these operations to do quite a bit of work with one expression.
For example, the following cell prints the mean of all the numbers in `arr` which are greater than `3`:

In [21]:
print(np.mean(arr[arr > 3]))

6.5
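
As a further sketch of combining these operations, the code below applies `min` and `max` to the same masked selection, and then uses the result of `mean` itself as the scalar in a comparison. The names `selected` and `above_average` are only for illustration.

```python
import numpy as np

arr = np.array([3, 7, 2, 9, 6, 4])

selected = arr[arr > 3]  # all values greater than 3
print(selected)          # prints [7 9 6 4]
print(np.min(selected))  # prints 4
print(np.max(selected))  # prints 9

# The mean is a scalar, so it can be broadcast in a comparison like any other scalar.
above_average = arr[arr > np.mean(arr)]
print(above_average)     # prints [7 9 6]
```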


### Try this Yourself ###

In the next cell, define a function named `descriptive_stats`, which will print the smallest, largest, and average number in a given input NumPy array.
This output should be printed as follows:

```
Smallest: MIN
Largest: MAX
Average: MEAN
```

Leave the calls in the next cell in place in order to test your code.

In [24]:
# Define your descriptive_stats function here.
# Leave the calls in place below in order to test your code.
def descriptive_stats(arr):
    print(f'Smallest: {np.min(arr)}')
    print(f'Largest: {np.max(arr)}')
    print(f'Average: {np.mean(arr)}')
    

descriptive_stats(np.array([3, 7, 2, 9, 6, 4]))
# Above statement should print:
# Smallest: 2
# Largest: 9
# Average: 5.166666666666667

print()
descriptive_stats(np.array([3, 8, 1, 9]))
# Above statement should print:
# Smallest: 1
# Largest: 9
# Average: 5.25

Smallest: 2
Largest: 9
Average: 5.166666666666667

Smallest: 1
Largest: 9
Average: 5.25


## Step 5: Submit via Canvas ##

Be sure to **save your work**, then log into [Canvas](https://canvas.csun.edu/).  Go to the COMP 502 course, and click "Assignments" on the left pane.  Then click "Assignment 25", where you can upload the `25_numpy_vector_operations.ipynb` file.

You can turn in the assignment multiple times, but only the last version you submitted will be graded.

### Special Thanks to Dr. Glenn Bruns ###

Special thanks to [Dr. Glenn Bruns](https://csumb.edu/scd/glenn-bruns/) at California State University, Monterey Bay, for providing me with closely-related materials which were used in the creation of this assignment.