# Assignment 23: Slicing and NumPy Array Basics #

### Goals for this Assignment ###

By the time you have completed this assignment, you should be able to:

- Use _slicing_ to extract parts of a list into another list
- Use negative indices to access list elements from the end of a list
- Create arrays of different types with NumPy
- Use slicing on NumPy arrays, and understand that these slices return _views_ instead of new arrays

## Step 1: Use Slicing to Make List Copies ##

### Background: List Slicing in Python ###

In Python, `for...in` can be used to iterate over the contents of a whole list.
This works fine for operating over an entire list, but sometimes we only want to work with part of a list.
For example, in assignment 15, step 3, you needed to implement a `product_then_sum` function, which would compute the product of the first two elements of a list, and then add this product to the sum of the remaining elements of the list.
In that case, the intention was to use destructuring to isolate all elements after the first two elements, as with:

In [1]:
some_list = [5, 2, 8, 4, 7]
a, b, *rest = some_list
print(a) # prints 5
print(b) # prints 2
print(rest) # prints [8, 4, 7]

5
2
[8, 4, 7]


As shown, the code above ends up putting all elements after the second element into the list `rest`.
This demonstrates that destructuring can be used to isolate different parts of a list into a separate list.

That all said, destructuring is limited in its capacity to do this sort of isolation.
Say instead we want to define a `product_n_then_sum` function, which:

- Takes a list of numbers
- Takes some non-negative number `n`

Given these parameters, `product_n_then_sum` should find the product of the first `n` elements, and then add this product to the sum of the remaining list elements.
To be clear, the `product_n_then_sum` function is a generalization of the `product_then_sum` function, where the number of elements to get the product of is now a parameter, whereas this number is hard-coded to `2` for `product_then_sum`.
Phrased a different way, given `product_n_then_sum`, we could trivially implement `product_then_sum` as follows:

```python
def product_then_sum(input_list):
    return product_n_then_sum(input_list, 2)
```

This generalization with `product_n_then_sum` is problematic with destructuring, because destructuring requires us to know ahead of time exactly how many elements we want to extract in this manner.
For example, with:

```python
a, b, *rest = some_list
```

...the code's structure itself tells us we are bunching all the list elements from index `2` onwards into `rest`, as we destructure `a` and `b` first.
Similarly, with:

```python
x, y, z, *rest = some_other_list
```

...this shows we are cutting out the first three list elements (into `x`, `y`, and `z`, respectively), and grouping everything else into `rest`.

Importantly, because the number of list elements we extract here is determined by how many variables and commas we provide, this is not suitable for situations where we don't know how many elements we want at the moment when we write the code.
Or, phrased another way, while we _can_ write out `n` variables and commas for any `n`, we need to know the specific value of `n` to do so.
However, the whole point of parameterizing `n` with `product_n_then_sum` is that this `n` is no longer fixed, so destructuring won't work for this purpose.

One way to solve this issue is by introducing a `while` loop and manually working through the list indices.
However, there is another Python feature we have yet to see which can be of assistance here: _slicing_.
The idea with slicing is that we can create a copy of a list, wherein we specify ranges of indices to copy over into the new list.
This is done with square brackets (`[]`) and a colon (`:`).
An example is shown in the next cell, which implements a version of `product_n_then_sum`:

In [2]:
def product_n_then_sum(input_list, n):
    for_product = input_list[:n]
    for_sum = input_list[n:]
    product_value = 1
    for product_element in for_product:
        product_value *= product_element
    sum_value = 0
    for sum_element in for_sum:
        sum_value += sum_element
    return product_value + sum_value

print(product_n_then_sum([3, 2], 2)) # prints 6
print(product_n_then_sum([3, 2, 3], 2)) # prints 9
print(product_n_then_sum([3, 2, 3], 3)) # prints 18
print(product_n_then_sum([4, 3, 1, 1, 1], 2)) # prints 15
print(product_n_then_sum([4, 3, 1, 1, 1], 3)) # prints 14

6
9
18
15
14


To explain the above code a bit, the notation `input_list[:n]` copies over all elements in `input_list` _up to but not including_ index `n` into a new list.
The notation `input_list[n:]` copies all elements in `input_list` _starting at_ index `n` into a new list.
In this case, this means that `input_list[:n]` gives you a list of the first `n` elements, and `input_list[n:]` gives you a list of all subsequent elements.
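As a quick check of how the two slices fit together, the sketch below splits a list at every possible `n` and confirms that the two pieces always rebuild the original list. It also shows a more compact variant of the function, here called `product_n_then_sum_compact` (a hypothetical name, not part of the assignment), which assumes Python 3.8 or later for `math.prod`.

```python
import math

def product_n_then_sum_compact(input_list, n):
    # math.prod of an empty list is 1, and sum of an empty list is 0,
    # matching the starting values used in the loop-based version
    return math.prod(input_list[:n]) + sum(input_list[n:])

numbers = [4, 3, 1, 1, 1]
for n in range(len(numbers) + 1):
    # the two slices never overlap, and together they cover the whole list
    assert numbers[:n] + numbers[n:] == numbers

print(product_n_then_sum_compact(numbers, 2)) # prints 15
print(product_n_then_sum_compact(numbers, 3)) # prints 14
print(product_n_then_sum_compact(numbers, 0)) # prints 11
```

With `n` equal to `0`, the product slice is empty, so the product contributes `1` and every element goes into the sum.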

You can optionally have a number both before and after the colon (`:`), as with:

In [3]:
example_list = ["foo", "bar", "baz", "blah", "moo", "cow", "bull"]

print(example_list[2:3]) # prints ['baz']
print(example_list[1:5]) # prints ['bar', 'baz', 'blah', 'moo']
print(example_list[3:6]) # prints ['blah', 'moo', 'cow']
print(example_list[3:7]) # prints ['blah', 'moo', 'cow', 'bull']

print(example_list[6:6]) # prints []
print(example_list[6:5]) # prints []

['baz']
['bar', 'baz', 'blah', 'moo']
['blah', 'moo', 'cow']
['blah', 'moo', 'cow', 'bull']
[]
[]


As demonstrated in the prior cell, the number before the colon indicates an _inclusive_ index into the list.
This is where list elements will start being taken from for the result list.
The number after the colon indicates an _exclusive_ index into the list.
That is, the last value taken from the list will be whatever this value after the colon is, minus one.
This is why for `example_list[3:6]`, the last list element you see is `'cow'`, corresponding to index `5` (`6 - 1`) in `example_list`.
Similarly, `example_list[3:7]` ended up including `'bull'`, which was at index `6` (`7 - 1`) in `example_list`.
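One consequence of the inclusive start and exclusive end is that, when both indices are within the list and the start is not past the end, the slice contains exactly `stop - start` elements. The following sketch illustrates this; the variable names `start`, `stop`, and `piece` are chosen only for this example.

```python
example_list = ["foo", "bar", "baz", "blah", "moo", "cow", "bull"]

start, stop = 3, 6
piece = example_list[start:stop]

print(piece) # prints ['blah', 'moo', 'cow']
print(len(piece)) # prints 3
print(stop - start) # prints 3

# the first element of the slice comes from index start...
print(piece[0] == example_list[start]) # prints True
# ...and the last element comes from index stop - 1
print(piece[len(piece) - 1] == example_list[stop - 1]) # prints True
```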

The number before the colon defaults to `0`, meaning that the two slice expressions in the next cell are equivalent:

In [4]:
print(example_list[:2]) # prints ['foo', 'bar']
print(example_list[0:2]) # prints ['foo', 'bar']

['foo', 'bar']
['foo', 'bar']


This default of `0` is why `for_product` in the cell defining the `product_n_then_sum` function used `input_list[:n]`; `for_product` specifically was for the starting elements of the list (everything before index `n`).

The number after the colon defaults to the length of the list, meaning that the following two slice expressions are also equivalent:

In [5]:
print(example_list[3:]) # prints ['blah', 'moo', 'cow', 'bull']
print(example_list[3:len(example_list)]) # prints ['blah', 'moo', 'cow', 'bull']

['blah', 'moo', 'cow', 'bull']
['blah', 'moo', 'cow', 'bull']


Because of these defaults, it is very easy to make a copy of a list by using `[:]`, as with:

In [6]:
copied_list = example_list[:]
print(copied_list) # prints ['foo', 'bar', 'baz', 'blah', 'moo', 'cow', 'bull']

copied_list[0] = "apple"
print(example_list) # prints ['foo', 'bar', 'baz', 'blah', 'moo', 'cow', 'bull']
print(copied_list) # prints ['apple', 'bar', 'baz', 'blah', 'moo', 'cow', 'bull']

['foo', 'bar', 'baz', 'blah', 'moo', 'cow', 'bull']
['foo', 'bar', 'baz', 'blah', 'moo', 'cow', 'bull']
['apple', 'bar', 'baz', 'blah', 'moo', 'cow', 'bull']


As shown, slicing creates a _copy_ of the elements in the original list.
As a fair warning, later in this assignment we will see that a syntactically identical operation involving NumPy arrays will _not_ produce a copy, so care needs to be taken when slicing to ensure that it's on exactly what you think it is.
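To contrast copying with plain assignment, the sketch below binds a second name to the same list (`alias`) and also makes a slice copy (`copied`). Both names are illustrative. It uses Python's `is` operator, which checks whether two names refer to the very same object.

```python
example_list = ["foo", "bar", "baz"]

alias = example_list      # a second name for the same list
copied = example_list[:]  # a new list with the same elements

alias[0] = "apple"   # changes the one list shared by alias and example_list
copied[1] = "pear"   # changes only the copy

print(example_list) # prints ['apple', 'bar', 'baz']
print(copied) # prints ['foo', 'pear', 'baz']
print(alias is example_list) # prints True
print(copied is example_list) # prints False
```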

### Try this Yourself ###

The comments in the next cell ask you to construct and print various sublists from `example_list`, using slicing.
The first one is done for you.
Write out the remaining ones.

In [27]:
# redefined from before so you don't need to keep scrolling back and forth
example_list = ["foo", "bar", "baz", "blah", "moo", "cow", "bull"]

# wanted: ['bar', 'baz', 'blah']
print(example_list[1:4])

# wanted: ['foo', 'bar', 'baz', 'blah']

print(example_list[:4])

# wanted: ['moo']
print(example_list[4:5])

# wanted: ['cow', 'bull']
print(example_list[5:7])

# wanted: ['baz', 'blah']
print(example_list[2:4])

# wanted: ['cow']
print(example_list[5:6])


['bar', 'baz', 'blah']
['foo', 'bar', 'baz', 'blah']
['moo']
['cow', 'bull']
['baz', 'blah']
['cow']
['cow']


## Step 2: Use Negative Indices for List Access and Slicing ##

### Background: Negative List Indices in Python ###

In Python, we can also use negative integers to index into a list.
Negative integers start from the last element of the list, and go on from there.
This is illustrated in the cell below:

In [7]:
some_list = ["apple", "pear", "banana"]
print(some_list[-1]) # prints "banana"
print(some_list[-2]) # prints "pear"
print(some_list[-3]) # prints "apple"

banana
pear
apple


With negative indexing in mind, each element in a list can be individually accessed (or modified) using one of two possible indices:

- A non-negative integer, starting from the first element in the list, and moving towards the right.  This integer is zero-indexed.  In other words, the first element is at index `0`, the second at index `1`, the third at index `2`, and so on.
- A negative integer, starting from the last element in the list, and moving towards the left.  This integer is _one-indexed_, as the first element from the right is at index `-1`, and then the second element from the right is at index `-2`, and so on.
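The two numbering schemes are related by the length of the list: index `-k` refers to the same element as index `len(some_list) - k`. A minimal sketch of that correspondence (the loop variable `k` is just for illustration):

```python
some_list = ["apple", "pear", "banana"]

for k in range(1, len(some_list) + 1):
    # index -k and index len(some_list) - k name the same element
    print(-k, len(some_list) - k, some_list[-k])

# the loop prints:
# -1 2 banana
# -2 1 pear
# -3 0 apple
```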

We can even use negative indices for slicing, as shown in the following cell:

In [8]:
# redefined from before so you don't need to keep scrolling back and forth
example_list = ["foo", "bar", "baz", "blah", "moo", "cow", "bull"]

print(example_list[-3:]) # prints ['moo', 'cow', 'bull']
print(example_list[:-3]) # prints ['foo', 'bar', 'baz', 'blah']

['moo', 'cow', 'bull']
['foo', 'bar', 'baz', 'blah']


In the first line above, with `example_list[-3:]`, this says to start at index `-3`, and go until the end of the list.
`'moo'` is at index `-3`, therefore this starts at `'moo'` and proceeds to the end of the list.
In the second line above, with `example_list[:-3]`, this says to instead _end_ at index `-3` exclusively, meaning the last element copied into the result list will be at `-3 - 1 = -4`.
Sure enough, the result list starts at `0`, and the last element was `'blah'`, which was in the original `example_list` at index `-4`.
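The same length-based correspondence applies inside slices. The sketch below, using an illustrative positive count `n`, checks that slicing with `-n` matches slicing with `len(example_list) - n`, and that the two negative slices split the list into two pieces with nothing lost.

```python
example_list = ["foo", "bar", "baz", "blah", "moo", "cow", "bull"]

n = 3
head = example_list[:-n]  # everything except the last n elements
tail = example_list[-n:]  # only the last n elements

print(head == example_list[:len(example_list) - n]) # prints True
print(tail == example_list[len(example_list) - n:]) # prints True
print(head + tail == example_list) # prints True
```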

> As an aside, Python's handling of negative list indices is not universal, and you should not expect the same behavior from another language.
> In C and C++, negative indices access elements to the left of index `0`; because of other features in C and C++, such accesses oddly can have well-defined meaning in some circumstances.
> In Java (and in many other languages), it is a runtime error to access a list with a negative index.

### Try this Yourself ###

In the next cell, access the requested elements of `example_list` using negative indices, or slice `example_list` with negative indices.
Some are done for you as examples.
Be sure to use `print` to display the results.

In [40]:
# redefined from before so you don't need to keep scrolling back and forth
example_list = ["foo", "bar", "baz", "blah", "moo", "cow", "bull"]

# access "bull" with a negative index
print(example_list[-1])

# access "bar" with a negative index
print(example_list[-6])

# access "blah" with a negative index
print(example_list[-4])

# using slicing with a negative index, get the list ["bull"]
print(example_list[-1:])


# using slicing with a negative index, get the list ["cow", "bull"]
print(example_list[-2:])

bull
bar
blah
['bull']
['cow', 'bull']


## Step 3: Create NumPy Arrays ##

### Background: NumPy Arrays ###

[NumPy](https://numpy.org/) is a Python library for scientific computing applications.
For our purposes, the most important component of NumPy is its _arrays_.
NumPy arrays are similar to Python lists, with the following notable differences:

- The size of an array is fixed at creation time.  Whereas lists allow us to add and remove elements after the list is created (e.g., with the `append` method), arrays do not allow for such resizing.  With arrays, one would instead have to create a whole new array containing whatever elements one wants.
- The type of values put into an array is fixed at creation time.  While Python lists can contain elements with mixed types (e.g., `["foo", 3, True]`), NumPy arrays disallow this.  (Technically you _can_ make this work with NumPy, but the result is probably counterintuitively not what you want.)
- You have more fine-grained control over the size (in memory) of the array elements, and how they are represented.  For example, you can select the number of bits which will be used for integers, and whether or not they can be negative.  In contrast, normal Python integers are arbitrary precision (meaning they can take up an arbitrarily large amount of memory for a big enough integer), and can always be negative.
- NumPy arrays tend to make matrix-level operations easy, or at least easier than with lists.
- (Perhaps most importantly) many common operations, particularly arithmetic operations, are **much** faster with NumPy arrays than with lists, with speedups typically in the 2x to 100x range, and commonly around 50x.  This is because NumPy performs these operations in compiled code over densely packed, uniformly typed data, and can take advantage of special hardware available on most platforms (specifically [vector operations](https://en.wikipedia.org/wiki/Vector_processor)).  C and C++ programs can use this hardware too, but do not always do so.  Given that data sets are commonly very large, and pure Python is not particularly good for performance, NumPy offers a good solution here.
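To make a few of these differences concrete, here is a small sketch. It assumes NumPy is installed and imported under the conventional name `np`; the sample values are illustrative only.

```python
import numpy as np

values = np.array([1, 2, 3])

# arithmetic applies to every element at once, with no explicit loop
print(values * 2) # prints [2 4 6]

# the equivalent with a plain list needs a loop or a comprehension
print([v * 2 for v in [1, 2, 3]]) # prints [2, 4, 6]

# arrays have no append method, reflecting their fixed size
print(hasattr(values, "append")) # prints False

# mixing integers and floats still yields one element type for the whole array
mixed = np.array([1, 2.5])
print(mixed.dtype) # prints float64
```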

The cell below imports NumPy, creates an array of integers based on the integers in an input Python list, and prints out the values in that array.

In [9]:
import numpy as np

example_array = np.array([8, 3, 4, 5])

print(example_array[0]) # prints 8
print(example_array[1]) # prints 3
print(example_array[2]) # prints 4
print(example_array[3]) # prints 5

print(example_array[-1]) # prints 5
print(example_array[-2]) # prints 4
print(example_array[-3]) # prints 3
print(example_array[-4]) # prints 8

print(example_array.dtype) # prints int64

8
3
4
5
5
4
3
8
int64


The following is a line-by-line explanation of this code:

1. `import numpy as np`: loads in the NumPy library and makes it accessible.  By convention, the NumPy library should be imported as `np`, as performed in this line.
2. `example_array = np.array([8, 3, 4, 5])`: creates a new NumPy array, holding the values `8`, `3`, `4`, and `5`, in that order.  The array is then bound to the new variable `example_array`.  The type of the array elements is automatically determined by NumPy (more on that in a bit).
3. `print(example_array[0])`, followed by all the `print`s grouped in the same block: prints out the values in `example_array` at non-negative indices.  As shown, NumPy arrays can be accessed via non-negative indices in the same manner as Python lists.
4. `print(example_array[-1])`, followed by all the `print`s grouped in the same block: prints out the values in `example_array` at negative indices.  These behave the same as negative indices in Python lists.
5. `print(example_array.dtype)`: prints the type of the array elements.  In this case, this prints `int64`, meaning each array element is represented as a _signed 64-bit_ integer.  By "signed", we mean the value could be negative.  By 64-bit, we mean each integer occupies 64 bits (64 zeros or ones) in memory.

When the `array` function is used, the type of the array elements is automatically determined based on the actual elements passed.
In this case, NumPy picked a much larger type than strictly necessary.
The largest value in the array, `8`, can be represented with only 4 bits (exactly why is beyond our scope), so 64 bits per element is far more than needed.
Additionally, our input list does not contain any negative values, and therefore we don't need a representation capable of storing negative values; we can instead pick an _unsigned_ representation.

If we want to control the type of the array elements, we can pass the `dtype` keyword parameter when creating the array, like so:

In [10]:
# redeclaration of example_array with uint8 as the element type
example_array = np.array([8, 3, 4, 5], dtype=np.uint8)

print(example_array[0]) # prints 8
print(example_array[1]) # prints 3
print(example_array[2]) # prints 4
print(example_array[3]) # prints 5

print(example_array.dtype) # prints uint8

8
3
4
5
uint8


In this case, `np.uint8` corresponds to an unsigned 8-bit integer representation.
There are a wide variety of types possible; see [this page of the official NumPy documentation](https://numpy.org/doc/1.25/user/basics.types.html) for more information.
For our purposes, you will likely not need to pass `dtype`; the defaults tend to work, and err on the side of being larger than what is necessary.
This means your code will still probably work without passing `dtype`, but it may consume more memory than desired, especially if you start dealing with very large data sets (millions of rows).
In contrast, picking a type that is too small or otherwise not right can break your code, as illustrated in the following cell:

In [11]:
example_array[0] = -1

OverflowError: Python integer -1 out of bounds for uint8

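Since `uint8` cannot hold negative integers, the code raises an exception (here, an `OverflowError`) if we attempt to assign a negative integer into the array.
Older NumPy versions instead wrapped such a value around (to `255` in this case) rather than raising, so the exact behavior depends on the NumPy version installed.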
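If you are unsure which values a given integer type can hold, NumPy can report this. The sketch below uses `np.iinfo` to show the range of `uint8`, and the `nbytes` attribute to compare the memory used by the same four values under two element types. The variable names `limits`, `small`, and `large` are illustrative.

```python
import numpy as np

limits = np.iinfo(np.uint8)
print(limits.min) # prints 0
print(limits.max) # prints 255

small = np.array([8, 3, 4, 5], dtype=np.uint8)
large = np.array([8, 3, 4, 5], dtype=np.int64)

# four elements at 1 byte each, versus four elements at 8 bytes each
print(small.nbytes) # prints 4
print(large.nbytes) # prints 32
```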

Another common way to make NumPy arrays is with the `arange` function, like so:

In [41]:
another_example = np.arange(10)
print(another_example[0]) # prints 0
print(another_example[9]) # prints 9
print(len(another_example)) # prints 10
print(another_example) # prints [0 1 2 3 4 5 6 7 8 9]

0
9
10
[0 1 2 3 4 5 6 7 8 9]


As shown, `arange` creates an array containing the integers from `0` up to, but not including, the parameter.
This effectively does the same thing as `list(range(10))`, only the result is a NumPy array instead of a list.
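The correspondence with `range` can be checked directly, as in the sketch below. It converts the array back to a list with the `tolist` method for comparison, and also shows that `arange`, like `range`, accepts a start value as well as a stop value.

```python
import numpy as np

from_arange = np.arange(10)
from_range = list(range(10))

print(from_arange.tolist() == from_range) # prints True
print(type(from_arange).__name__) # prints ndarray
print(type(from_range).__name__) # prints list

# with two arguments, the first is the (inclusive) start
print(np.arange(3, 7)) # prints [3 4 5 6]
```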

### Try this Yourself ###

The next cell accesses various NumPy arrays at different positions, and `print`s the results.
Create arrays that will cause all the `print`s to display the values requested in the comments.

In [45]:
# define the integers array below
import numpy as np

integers = np.array([8, 3, 4, 5])

print(integers[0]) # should print 8
print(integers[1]) # should print 3


8
3


In [48]:
# define the booleans array below

booleans =  np.array([True,False,False])

print(booleans[0]) # should print True
print(booleans[1]) # should print False
print(booleans[2]) # should print False

True
False
False


## Step 4: Use Slicing to Create NumPy Array Views ##

### Background: NumPy Array Views ###

Like lists, we can use slicing to get subportions of a NumPy array, as shown in the cell below:

In [12]:
first = np.array([3, 2, 8, 9, 4])
second = first[1:3]
print(first) # prints [3 2 8 9 4]
print(second) # prints [2 8]

[3 2 8 9 4]
[2 8]


We can similarly use negative indices for slicing.
All this said, unlike with lists, slicing NumPy arrays does **not** yield a copy of the sliced portion, but rather a _view_ of the sliced portion.
With a view, we really have a window into the original array itself, as opposed to a copy of part of the original array.
We can see this if we modify `second`, as done in the following cell:

In [13]:
second[0] = 42
print(second) # prints [42  8]
print(first) # prints [ 3 42  8  9  4]

[42  8]
[ 3 42  8  9  4]


As expected, after running `second[0] = 42`, the element at `second[0]` is now `42`.
However, the element at `first[1]` is now _also_ `42`.
This is because `second` is a view into `first`, and `second[0]` actually directly refers to the same place as `first[1]`.
As such, if we change `second`, we will change `first`, and vice-versa; a change to `first` is similarly propagated to `second`, shown below:

In [14]:
first[2] = 57
print(first) # prints [ 3 42 57  9  4]
print(second) # prints [42 57]

[ 3 42 57  9  4]
[42 57]


This behavior of slicing returning views can lead to nasty surprises when modifying array elements.
However, on the flip side, this means that slicing is practically free over NumPy arrays, making it very cheap to access subportions of arrays.
Lists, in contrast, require copying over all the values in a slice, whether or not the slice will be modified.
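If you are ever unsure whether two arrays are connected in this way, NumPy offers ways to check. The sketch below uses `np.shares_memory` and the `base` attribute of an array; the names `view` and `list_slice` are illustrative.

```python
import numpy as np

first = np.array([3, 2, 8, 9, 4])
view = first[1:3]

# a slice of an array shares its underlying memory with the original
print(np.shares_memory(first, view)) # prints True
# a view records which array it is a window into
print(view.base is first) # prints True

# a list slice, in contrast, is an independent list
original_list = [3, 2, 8, 9, 4]
list_slice = original_list[1:3]
list_slice[0] = 42
print(original_list) # prints [3, 2, 8, 9, 4]
```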

When it comes to the sorts of large datasets that NumPy was designed to handle, this view-based behavior tends to make more sense.
After all, if desired, one can always use the `array` function to copy the values of another array-like value, including a view, like so:

In [15]:
first = np.array([3, 2, 8, 9, 4])
second = np.array(first[1:3])
print(first) # prints [3 2 8 9 4]
print(second) # prints [2 8]

second[0] = 42
print(first) # prints [3 2 8 9 4]
print(second) # prints [42  8]

[3 2 8 9 4]
[2 8]
[3 2 8 9 4]
[42  8]


In the above cell, the modification with `second[0] = 42` did not lead to any change in `first`.
This is because while `first[1:3]` did return a view over `first`, the subsequent call to `np.array` in the second line copied over all the values in the view into a new array `second`.
This effectively broke the connection to `first`, and therefore `first` was not modified when `second` was modified.
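As an alternative sketch of the same idea, arrays also have a `copy` method, which some find reads more clearly than wrapping the slice in `np.array`. The check with `np.shares_memory` at the end confirms the two arrays are independent.

```python
import numpy as np

first = np.array([3, 2, 8, 9, 4])
second = first[1:3].copy()  # copy the view's values into a new array

second[0] = 42
print(first) # prints [3 2 8 9 4]
print(second) # prints [42  8]
print(np.shares_memory(first, second)) # prints False
```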

### Try this Yourself ###

The comments in the next cell ask you to construct and print various views from `example_array`, using slicing.
The first one is done for you.
Write out the remaining ones.

In [59]:
example_array = np.arange(20)

# wanted: [0 1 2 3 4 5 6 7 8 9]
print(example_array[:10])

# wanted: [10 11 12 13 14 15 16 17 18 19]
print(example_array[-10:])

# wanted: [ 5  6  7  8  9 10 11 12 13 14]
print(example_array[5:15])

# wanted: [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16]
print(example_array[0:17])

# wanted: [17 18 19]
print(example_array[-3:])


[0 1 2 3 4 5 6 7 8 9]
[10 11 12 13 14 15 16 17 18 19]
[ 5  6  7  8  9 10 11 12 13 14]
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16]
[17 18 19]


## Step 5: Submit via Canvas ##

Be sure to **save your work**, then log into [Canvas](https://canvas.csun.edu/).  Go to the COMP 502 course, and click "Assignments" on the left pane.  From there, click "Assignment 23".  From there, you can upload the `23_slicing_and_numpy_array_basics.ipynb` file.

You can turn in the assignment multiple times, but only the last version you submitted will be graded.

### Special Thanks to Dr. Glenn Bruns ###

Special thanks to [Dr. Glenn Bruns](https://csumb.edu/scd/glenn-bruns/) at California State University, Monterey Bay, for providing me with closely-related materials which were used in the creation of this assignment.