# Week 4, Class 2: Array Indexing, Slicing, and Operations

## 1. Array Indexing: Accessing Individual Elements
Just like Python lists, NumPy arrays use zero-based indexing to access elements.

### 1.1. 1D Arrays (Vectors)
For a one-dimensional array, you use a single index inside square brackets `[]`.

In [1]:
import numpy as np

# Create a 1D array of sensor readings
sensor_readings = np.array([10.1, 10.5, 9.8, 11.2, 10.9])

print(f"Original array: {sensor_readings}")

# Access the first element (index 0)
first_reading = sensor_readings[0]
print(f"First reading: {first_reading}")

# Access the third element (index 2)
third_reading = sensor_readings[2]
print(f"Third reading: {third_reading}")

# Access the last element (using negative indexing)
last_reading = sensor_readings[-1]
print(f"Last reading: {last_reading}")

# You can also modify an element by assigning a new value to its index
sensor_readings[1] = 10.6
print(f"Array after modifying second element: {sensor_readings}")

Original array: [10.1 10.5  9.8 11.2 10.9]
First reading: 10.1
Third reading: 9.8
Last reading: 10.9
Array after modifying second element: [10.1 10.6  9.8 11.2 10.9]
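
As a small illustrative sketch of the index rules above (the name `readings` is made up for this example), negative indices other than `-1` also count from the end, and an index outside the valid range raises an `IndexError` rather than returning a default value:

```python
import numpy as np

# Same values as the original sensor array; the variable name is illustrative
readings = np.array([10.1, 10.5, 9.8, 11.2, 10.9])

# Negative indices count from the end: -1 is the last element, -2 the one before it
second_to_last = readings[-2]
print(f"Second-to-last reading: {second_to_last}")

# Valid indices run from -len(readings) to len(readings) - 1
out_of_bounds_caught = False
try:
    readings[5]  # the array has 5 elements, so index 5 does not exist
except IndexError as err:
    out_of_bounds_caught = True
    print(f"IndexError: {err}")
```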


### 1.2. 2D Arrays (Matrices)

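For 2D arrays (matrices), you specify both the **row index** and the **column index**, either as a comma-separated pair `[row_index, column_index]` or by chaining square brackets `[row_index][column_index]`. The comma form is preferred: it indexes in a single step, whereas chaining first builds an intermediate row and then indexes into that row.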

In [2]:
# Create a 2D array representing experimental data (e.g., rows are samples, columns are measurements)
experiment_data = np.array([
    [100, 200, 300],  # Sample 1: Measurement 1, 2, 3
    [110, 210, 310],  # Sample 2
    [120, 220, 320]   # Sample 3
])

print(f"Original 2D array:\n{experiment_data}")

# Access element at row 0, column 1 (value 200)
element_0_1 = experiment_data[0, 1]
print(f"\nElement at (0, 1): {element_0_1}")

# Access element at row 2, column 0 (value 120)
element_2_0 = experiment_data[2, 0]
print(f"Element at (2, 0): {element_2_0}")

# Access an entire row
row_1 = experiment_data[1, :]
print(f"Row 1 (Sample 2 data): {row_1}")

# Access an entire column
col_2 = experiment_data[:, 2]
print(f"Column 2 (Measurement 3 data): {col_2}")

# Modify an element
experiment_data[0, 0] = 99
print(f"\nArray after modifying element at (0,0):\n{experiment_data}")

Original 2D array:
[[100 200 300]
 [110 210 310]
 [120 220 320]]

Element at (0, 1): 200
Element at (2, 0): 120
Row 1 (Sample 2 data): [110 210 310]
Column 2 (Measurement 3 data): [300 310 320]

Array after modifying element at (0,0):
[[ 99 200 300]
 [110 210 310]
 [120 220 320]]
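
The two 2D indexing forms mentioned above agree when both indices are integers, but not when the first index is a slice. The following is a minimal sketch with illustrative variable names, not part of the original notebook:

```python
import numpy as np

data = np.array([
    [100, 200, 300],
    [110, 210, 310],
    [120, 220, 320],
])

# With two integer indices, both forms read the same element
tuple_form = data[2, 0]     # one indexing step
chained_form = data[2][0]   # data[2] yields row 2 first, then [0] indexes that row
print(tuple_form, chained_form)

# With a slice in the first position, the forms diverge
col_via_tuple = data[:, 0]  # column 0 -> [100 110 120]
row_via_chain = data[:][0]  # data[:] is the whole array, so [0] is row 0 -> [100 200 300]
print(col_via_tuple)
print(row_via_chain)
```

This is one reason to default to the comma form: it states the row and column selection in one place.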


## 2. Array Slicing: Extracting Subarrays

Slicing in NumPy uses the same syntax as Python lists but extends to multiple dimensions. It lets you extract contiguous blocks or regularly spaced subsets of an array. Unlike a list slice, which is a new list, a NumPy slice is a *view* into the original array: modifying the slice modifies the original array unless you explicitly make a copy with `.copy()`.

**Syntax:** `array[start:end:step]` (for each dimension)

### 2.1. 1D Array Slicing

In [3]:
data_series = np.arange(0, 100, 10)
print(f"Original 1D array: {data_series}")

# Get elements from index 2 up to (but not including) index 6
slice1 = data_series[2:6]
print(f"Slice data_series[2:6]: {slice1}")

# Get elements from the beginning up to index 5 (exclusive)
slice2 = data_series[:5]
print(f"Slice data_series[:5]: {slice2}")

# Get elements from index 3 to the end
slice3 = data_series[3:]
print(f"Slice data_series[3:]: {slice3}")

# Get every other element
every_other = data_series[::2]
print(f"Slice data_series[::2]: {every_other}")

# Reverse the array
reversed_arr = data_series[::-1]
print(f"Reversed array: {reversed_arr}")

Original 1D array: [ 0 10 20 30 40 50 60 70 80 90]
Slice data_series[2:6]: [20 30 40 50]
Slice data_series[:5]: [ 0 10 20 30 40]
Slice data_series[3:]: [30 40 50 60 70 80 90]
Slice data_series[::2]: [ 0 20 40 60 80]
Reversed array: [90 80 70 60 50 40 30 20 10  0]
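
Because a slice is a view, writes through the slice show up in the original array. The sketch below (variable names are illustrative) contrasts a plain slice with an explicit `.copy()`:

```python
import numpy as np

series = np.arange(0, 100, 10)    # [ 0 10 20 30 40 50 60 70 80 90]

view = series[2:5]                # a view: shares memory with series
view[0] = -1                      # writes through to series[2]
print(f"After writing to the view: {series}")

independent = series[2:5].copy()  # an explicit copy owns its own data
independent[1] = 999              # series[3] is left untouched
print(f"After writing to the copy: {series}")
```

If you plan to modify a slice but want to keep the source data intact, make the copy first.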


### 2.2. 2D Array Slicing

Slicing 2D arrays involves specifying slices for both rows and columns, separated by a comma.

In [4]:
matrix_data = np.array([
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16]
])
print(f"Original 2D array:\n{matrix_data}")

# Get the first two rows, all columns
sub_matrix1 = matrix_data[0:2, :]
print(f"\nFirst two rows:\n{sub_matrix1}")

# Get all rows, first two columns
sub_matrix2 = matrix_data[:, 0:2]
print(f"\nFirst two columns:\n{sub_matrix2}")

# Get a specific block
block = matrix_data[1:3, 1:3]
print(f"\nBlock (rows 1-2, cols 1-2):\n{block}")

# Get specific rows and columns with steps
sparse_selection = matrix_data[::2, ::2]
print(f"\nEvery other row, every other column:\n{sparse_selection}")

Original 2D array:
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]
 [13 14 15 16]]

First two rows:
[[1 2 3 4]
 [5 6 7 8]]

First two columns:
[[ 1  2]
 [ 5  6]
 [ 9 10]
 [13 14]]

Block (rows 1-2, cols 1-2):
[[ 6  7]
 [10 11]]

Every other row, every other column:
[[ 1  3]
 [ 9 11]]
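
Two details of 2D slicing are easy to miss, sketched here with illustrative names: an integer index removes its axis while a length-1 slice keeps it, and assigning to a slice updates the whole block in place.

```python
import numpy as np

m = np.arange(1, 17).reshape(4, 4)  # same values as matrix_data above

row_as_1d = m[1, :]    # integer index drops the row axis -> shape (4,)
row_as_2d = m[1:2, :]  # length-1 slice keeps the row axis -> shape (1, 4)
print(row_as_1d.shape, row_as_2d.shape)

# Assigning a scalar to a slice fills the selected block in place
m[1:3, 1:3] = 0
print(m)
```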


### 2.3. Boolean Indexing

Boolean indexing is a powerful way to select elements from an array that satisfy a certain condition. You create a boolean array (an array of `True`/`False` values) of the same shape as your data array, and then use it to "mask" or select elements.

In [5]:
temperature_readings = np.array([22.5, 23.1, 21.9, 24.0, 22.8, 25.0])
print(f"Original temperatures: {temperature_readings}")

# Create a boolean array: True where temperature > 23.0, False otherwise
high_temp_mask = temperature_readings > 23.0
print(f"Boolean mask for > 23.0: {high_temp_mask}")

# Use the boolean mask to select elements
high_temperatures = temperature_readings[high_temp_mask]
print(f"Temperatures > 23.0: {high_temperatures}")

# Combine conditions with the element-wise operators & (AND) and | (OR).
# Wrap each condition in parentheses: & and | bind more tightly than comparisons.
# Find temperatures between 22.0 and 24.0 (exclusive)
normal_range_mask = (temperature_readings > 22.0) & (temperature_readings < 24.0)
normal_temperatures = temperature_readings[normal_range_mask]
print(f"Temperatures between 22.0 and 24.0: {normal_temperatures}")

Original temperatures: [22.5 23.1 21.9 24.  22.8 25. ]
Boolean mask for > 23.0: [False  True False  True False  True]
Temperatures > 23.0: [23.1 24.  25. ]
Temperatures between 22.0 and 24.0: [22.5 23.1 22.8]
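
A boolean mask can do more than select. As a brief sketch (names are illustrative), it can also be counted, inverted with `~`, and used on the left-hand side of an assignment:

```python
import numpy as np

temps = np.array([22.5, 23.1, 21.9, 24.0, 22.8, 25.0])

high = temps > 23.0
n_high = high.sum()           # True counts as 1, so sum() counts the matches
not_high = temps[~high]       # ~ inverts a boolean mask
print(f"{n_high} readings above 23.0; the rest: {not_high}")

capped = temps.copy()         # copy so the original readings are preserved
capped[capped > 24.0] = 24.0  # assign through a mask: cap values at 24.0
print(f"Capped at 24.0: {capped}")
```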


In [6]:
# Boolean indexing also works for 2D arrays
matrix = np.array([[1, 2, 3], 
                   [4, 5, 6], 
                   [7, 8, 9]])
print(f"Original matrix:\n{matrix}")
mask_greater_than_5 = matrix > 5
print(f"Mask for > 5:\n{mask_greater_than_5}")
elements_greater_than_5 = matrix[mask_greater_than_5]
print(f"Elements > 5 (returned as a 1D array): {elements_greater_than_5}")

Original matrix:
[[1 2 3]
 [4 5 6]
 [7 8 9]]
Mask for > 5:
[[False False False]
 [False False  True]
 [ True  True  True]]
Elements > 5 (returned as a 1D array): [6 7 8 9]


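When a boolean mask has the same shape as a multi-dimensional array, the result is always a **1D array** containing the elements that satisfy the condition, because those elements generally do not form a rectangular block. (A 1D mask applied along a single axis selects whole rows or columns instead.)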
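
For contrast, here is a minimal sketch of a 1D mask applied to the rows of a 2D array. The mask has one entry per row, so the result keeps its 2D shape (variable names are illustrative):

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

# One boolean per row: is the first entry of the row greater than 1?
row_mask = matrix[:, 0] > 1
print(f"Row mask: {row_mask}")

# Indexing with a 1D mask keeps whole rows -> shape (2, 3)
selected_rows = matrix[row_mask]
print(f"Selected rows:\n{selected_rows}")
```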

### 2.4. Advanced Indexing (Fancy Indexing)

While slicing works for contiguous blocks or regular steps, **advanced indexing** (often called "fancy indexing") allows you to select non-contiguous elements or rows/columns using lists or arrays of indices. This returns a *copy* of the data, not a view.

#### Selecting Specific Rows or Columns by Index List

In [7]:
experiment_log = np.array([
    [10.1, 20.5, 30.2], # Exp 0
    [11.0, 21.1, 31.5], # Exp 1
    [12.5, 22.0, 32.8], # Exp 2
    [13.0, 23.5, 33.0], # Exp 3
    [14.2, 24.1, 34.5]  # Exp 4
])
print(f"Original Experiment Log:\n{experiment_log}")

# Select specific, non-contiguous rows
selected_rows = experiment_log[[0, 2, 4]]
print(f"\nSelected rows (0, 2, 4):\n{selected_rows}")

# Select specific, non-contiguous columns
selected_columns = experiment_log[:, [0, 2]]
print(f"\nSelected columns (0, 2):\n{selected_columns}")

# Select rows in a specific order
reordered_rows = experiment_log[[4, 3, 2, 1, 0]]
print(f"\nReordered rows (reverse):\n{reordered_rows}")

Original Experiment Log:
[[10.1 20.5 30.2]
 [11.  21.1 31.5]
 [12.5 22.  32.8]
 [13.  23.5 33. ]
 [14.2 24.1 34.5]]

Selected rows (0, 2, 4):
[[10.1 20.5 30.2]
 [12.5 22.  32.8]
 [14.2 24.1 34.5]]

Selected columns (0, 2):
[[10.1 30.2]
 [11.  31.5]
 [12.5 32.8]
 [13.  33. ]
 [14.2 34.5]]

Reordered rows (reverse):
[[14.2 24.1 34.5]
 [13.  23.5 33. ]
 [12.5 22.  32.8]
 [11.  21.1 31.5]
 [10.1 20.5 30.2]]
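
The copy-versus-view distinction can be checked directly with `np.shares_memory`. In this sketch the array holds placeholder values rather than the experiment log above, and the names are illustrative:

```python
import numpy as np

log = np.arange(15.0).reshape(5, 3)  # placeholder data: 5 rows x 3 columns

picked = log[[0, 2, 4]]  # fancy indexing -> independent copy
sliced = log[0:5:2]      # the same rows via a slice -> view

print(np.shares_memory(log, picked))  # False
print(np.shares_memory(log, sliced))  # True

picked[0, 0] = -1.0      # does not touch log
sliced[0, 0] = -2.0      # writes through to log[0, 0]
print(log[0, 0])
```

When the rows you need follow a regular step, a slice avoids the copy; fancy indexing is the tool for irregular selections.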


#### Selecting Arbitrary Elements (Using Pairs of Indices)

You can also select individual elements at specific (row, column) coordinates by passing two lists/arrays of indices. The first list specifies the row indices, and the second specifies the column indices. The result will be a 1D array where `result[i]` is `array[rows[i], cols[i]]`.

In [8]:
grid_data = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])
print(f"Original Grid Data:\n{grid_data}")

# Select elements at (0,0), (1,2), and (2,1)
row_indices = [0, 1, 2]
col_indices = [0, 2, 1] # Corresponding column for each row
selected_elements = grid_data[row_indices, col_indices]
print(f"\nSelected elements at (0,0), (1,2), (2,1): {selected_elements}")

Original Grid Data:
[[1 2 3]
 [4 5 6]
 [7 8 9]]

Selected elements at (0,0), (1,2), (2,1): [1 6 8]
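
Passing two index lists pairs them up element by element; it does not select the block where those rows and columns intersect. If the block is what you want, one option is `np.ix_`, sketched below with illustrative names:

```python
import numpy as np

grid = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

# Paired indices: elements at (0,0) and (2,2) -> [1 9]
paired = grid[[0, 2], [0, 2]]
print(f"Paired selection: {paired}")

# np.ix_ crosses rows {0, 2} with columns {0, 2} -> [[1 3] [7 9]]
block = grid[np.ix_([0, 2], [0, 2])]
print(f"Block selection:\n{block}")
```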


## 3. Basic Array Operations (Element-Wise)

NumPy applies arithmetic to entire arrays at once, with no explicit Python loop. These are called **vectorized operations**.

### 3.1. Scalar Operations

You can perform arithmetic operations between an array and a single number (scalar). The operation is applied to every element in the array.

In [9]:
# Array of initial concentrations
initial_concentrations = np.array([0.1, 0.2, 0.3, 0.4])
print(f"Original: {initial_concentrations}")

# Add a constant offset
offset = 0.05
new_concentrations = initial_concentrations + offset
print(f"Original + {offset}: {new_concentrations}")

# Multiply by a factor
factor = 2.5
scaled_concentrations = initial_concentrations * factor
print(f"Original * {factor}: {scaled_concentrations}")

# Divide by a value
divisor = 10.0
divided_values = initial_concentrations / divisor
print(f"Original / {divisor}: {divided_values}")

# Apply an exponent
exponent = 2
squared_values = initial_concentrations ** exponent
print(f"Original ** {exponent}: {squared_values}")

Original: [0.1 0.2 0.3 0.4]
Original + 0.05: [0.15 0.25 0.35 0.45]
Original * 2.5: [0.25 0.5  0.75 1.  ]
Original / 10.0: [0.01 0.02 0.03 0.04]
Original ** 2: [0.01 0.04 0.09 0.16]
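
Each operator above returns a new array and leaves its input unchanged. The augmented operators (`+=`, `*=`, and so on) overwrite the array instead. A short sketch, with illustrative names:

```python
import numpy as np

conc = np.array([0.1, 0.2, 0.3, 0.4])

# Chained scalar operations build a new array; conc itself is unchanged
scaled = (conc + 0.05) * 2.0
conc_snapshot = conc.copy()  # snapshot to show conc has not changed yet
print(f"scaled: {scaled}")
print(f"conc:   {conc}")

# An augmented assignment modifies the array in place
conc *= 10
print(f"conc after *= 10: {conc}")
```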


### 3.2. Array-Array Operations

When you combine two NumPy arrays with an arithmetic operator, the operation is applied **element-wise**: the first element of array A is paired with the first element of array B, and so on. For this to work, the arrays must have **compatible shapes**: either identical shapes, or shapes that can be broadcast to each other (covered in the next class). Because the loop over elements runs in compiled code rather than in the Python interpreter, these operations are typically much faster than an equivalent Python loop.

In [10]:
# Two arrays of the same shape
array_A = np.array([1, 2, 3])
array_B = np.array([10, 20, 30])

print(f"Array A: {array_A}")
print(f"Array B: {array_B}")

# Element-wise addition
sum_array = array_A + array_B
print(f"A + B: {sum_array}")

# Element-wise subtraction
diff_array = array_B - array_A
print(f"B - A: {diff_array}")

# Element-wise multiplication (NOT matrix multiplication)
prod_array = array_A * array_B
print(f"A * B (element-wise): {prod_array}")

# Element-wise division
div_array = array_B / array_A
print(f"B / A (element-wise): {div_array}")

Array A: [1 2 3]
Array B: [10 20 30]
A + B: [11 22 33]
B - A: [ 9 18 27]
A * B (element-wise): [10 40 90]
B / A (element-wise): [10. 10. 10.]
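
To make the element-wise rule concrete, the sketch below checks that `a * b` matches an explicit Python loop, then shows what happens when the shapes are not compatible. Variable names are illustrative:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

# The vectorized product gives the same result as pairing elements by hand
loop_product = np.array([x * y for x, y in zip(a, b)])
vector_product = a * b
print(f"Loop:       {loop_product}")
print(f"Vectorized: {vector_product}")

# Shapes (3,) and (4,) are neither equal nor broadcastable, so NumPy raises ValueError
c = np.array([1, 2, 3, 4])
try:
    a + c
    shapes_compatible = True
except ValueError as err:
    shapes_compatible = False
    print(f"ValueError: {err}")
```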


## Summary and Key Takeaways

* **Indexing** allows you to access individual elements in NumPy arrays using `array[index]` for 1D, and `array[row, col]` for 2D.
* **Slicing** extracts portions of arrays and returns views of the original data, not copies. Use `array[start:end:step]` for 1D, and `array[row_slice, col_slice]` for 2D.
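* **Boolean indexing** selects the elements that satisfy a condition, using a boolean mask.
* **Advanced (fancy) indexing** selects arbitrary rows, columns, or elements using lists or arrays of indices, and returns a copy rather than a view.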
* **Vectorized operations** perform element-wise arithmetic between arrays and scalars, or between compatible arrays, offering significant performance benefits over Python loops.

## Exercises

Complete the following exercises in a new Python script or a new Jupyter Notebook.

1.  **Accessing 2D Array Elements:**
    * Create a 4x3 NumPy array (matrix) representing sensor data, where rows are different sensors and columns are readings at different times.
        ```python
        sensor_matrix = np.array([
            [10.1, 10.2, 10.3],
            [11.0, 11.5, 11.1],
            [9.5, 9.7, 9.9],
            [12.0, 12.2, 12.1]
        ])
        ```
    * Print the reading from the 2nd sensor at the 3rd time point (using appropriate indexing).
    * Print all readings from the 1st time point (i.e., the first column).
    * Change the reading for the 4th sensor at the 2nd time point to `12.5`. Print the updated `sensor_matrix`.

2.  **Slicing Temperature Data:**
    * Create a 1D NumPy array of 15 temperature values (you can use `np.linspace` or `np.arange` and then add some random noise if you like, or just create a list of 15 numbers).
    * Extract and print the first 5 temperature readings.
    * Extract and print the last 3 temperature readings.
    * Extract and print every third temperature reading from the entire array.

3.  **Boolean Filtering for pH Levels:**
    * You have a NumPy array of pH measurements: `ph_values = np.array([6.8, 7.2, 6.5, 7.0, 7.8, 6.9, 7.1, 8.0])`.
    * Use Boolean indexing to select and print only the `ph_values` that are within the "neutral" range (between 6.9 and 7.1, inclusive).
    * Use Boolean indexing to select and print `ph_values` that are either "acidic" (less than 7.0) OR "basic" (greater than 7.5).

4.  **Vectorized Calculations on Experimental Data:**
    * You have an array of raw experimental measurements: `raw_measurements = np.array([5.2, 6.1, 5.8, 6.5, 5.9])`.
    * Each measurement needs to be converted to a new scale by:
        1. Adding `0.5` to each value.
        2. Then multiplying the result by `10.0`.
    * Perform these operations using vectorized NumPy operations (no explicit loops).
    * Print the `raw_measurements` and the `scaled_measurements`.

5.  **Comparing Two Datasets:**
    * You have two arrays representing two different runs of the same experiment:
        `run1_results = np.array([100, 105, 110, 115, 120])`
        `run2_results = np.array([98, 107, 108, 118, 122])`
    * Calculate the **difference** between `run2_results` and `run1_results` (element-wise). Store this in `difference_array`.
    * Calculate the **percentage difference** for each element: `((run2 - run1) / run1) * 100`. Store this in `percentage_diff_array`.
    * Print both `difference_array` and `percentage_diff_array`.