# 📘 Notebook 3: Lists, Tuples, and NumPy Arrays
This notebook will help you understand Python's core data structures: lists and tuples, and introduce you to NumPy arrays.

### 🧠 Why This Matters for Machine Learning
When working with datasets, you'll often use lists or NumPy arrays to store and process data efficiently.

## 📋 Lists: Ordered and Mutable
- Lists are ordered collections that can be changed (mutable).
  - Think of them as a locker room with a row of lockers that you can store things in.
- You can put whatever you like into the lockers, and even mix and match.
  - I.e., elements can be of any type: int, float, string, etc.
- Use square brackets `[]` to define a list.
- You can access elements using indices (starting at 0).
- You can add elements to the end of a list using the ``append()`` method.

**Example code below**

In [None]:
# Examples of using lists
models = ["Linear Regression", "Decision Tree", "KNN"]
print(models) # Prints the whole list
print(models[0])  # First item (model)
models.append("SVM")  # Add to list
print(models)
models[1] = "Random Forest"  # Update the item at a specific location (index 1 = 2nd position)
print(models)
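
The point about mixing types can be illustrated with a short sketch. The `experiment` list and its values are hypothetical, made up only for this illustration:

```python
# A single list holding a string, an int, a float, and a bool
# (hypothetical values, chosen only to show mixed types)
experiment = ["KNN", 5, 0.92, True]
print(experiment)

# len() reports how many items the list holds
print("Number of items:", len(experiment))

# Each element keeps its own type
for item in experiment:
    print(item, "is a", type(item).__name__)
```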

## 🔒 Tuples: Ordered and Immutable
- Tuples are like lists but cannot be changed after creation.
- Use parentheses `()` to define a tuple (instead of square brackets ``[]`` for lists).
- You still access values by index (starting at 0), just as with lists.
- Good for fixed values like model settings.

**Example code below**

In [None]:
# Examples of using tuples
settings = (0.01, 100)  # e.g., learning_rate, epochs
print(settings)
print(f"Learning rate: {settings[0]}, Epochs: {settings[1]}")
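
To see what "cannot be changed after creation" means in practice, here is a minimal sketch that reuses the `settings` tuple from the example above. Assigning to an index raises a `TypeError`; the names `learning_rate` and `epochs` used for unpacking are illustrative:

```python
settings = (0.01, 100)  # same values as the example above

# Tuples do not support item assignment, so this raises a TypeError
try:
    settings[1] = 150
except TypeError as err:
    print("Cannot modify a tuple:", err)

# The tuple is unchanged
print(settings)

# Unpacking gives each value a readable name
learning_rate, epochs = settings
print(f"Learning rate: {learning_rate}, Epochs: {epochs}")
```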

### 🔁 Converting Between Tuples and Lists

Similarly to casting variables (e.g., from float to int), you can convert between tuples and lists.

It's rare to need to do this, but here's why you might:
- Tuples are immutable, meaning you cannot change their contents after creation.
- Lists are mutable, so you can add, remove, or change elements.
- You might convert a tuple to a list if you need to update its contents, then convert it back to a tuple for consistency or protection.

**Use cases:**
- Store fixed settings as a tuple to prevent accidental changes.
- Convert to a list temporarily when changes are needed.

**Example code below**

In [None]:
# Converting between tuple and list
model_params = (0.01, 100)  # learning rate, epochs
print("Original tuple:", model_params)

# Convert to list to modify
param_list = list(model_params)
param_list[1] = 150  # Update number of epochs
print("Modified list:", param_list)

# Convert back to tuple
model_params = tuple(param_list)
print("Updated tuple:", model_params)

## 🔢 NumPy Arrays: Fast Numerical Structures

Remember what NumPy arrays are?
- NumPy arrays are efficient, typed collections for **numerical data**.
  - I.e., while a list can contain a mixture of ints, floats and strings, every element of a NumPy array shares one data type.
- You must first `import numpy as np`.
- Arrays can be created with a specified data type (e.g., `dtype=int`, `dtype=float`).
- Arrays support element-wise operations and many mathematical functions.
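
A short sketch of what a single, fixed data type looks like in practice. The variable names are hypothetical, and the exact integer width NumPy picks by default depends on your platform:

```python
import numpy as np

# All-integer input gives an integer array
ints = np.array([1, 2, 3])
print(ints.dtype)   # an integer type; the exact width is platform-dependent

# Mixing ints and floats promotes every element to float
mixed = np.array([1, 2.5, 3])
print(mixed, mixed.dtype)

# astype() returns a converted copy; the original array is unchanged
ints_as_float = ints.astype(float)
print(ints_as_float, ints_as_float.dtype)

# Element-wise operation: every value is doubled
doubled = ints * 2
print(doubled)
```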

**What makes NumPy arrays better for data processing and Machine Learning?!**
- You can read properties like ``shape`` and ``size`` directly as attributes, and compute summary values such as the average with built-in methods like ``mean()``. (``len()`` also works on arrays, but it only reports the length of the first dimension.)
- While lists are flexible, arrays are better for numerical computations and large data.
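- You can append or delete values using `np.append()` and `np.delete()`. Both return a new array rather than changing the original.
- You can represent **missing data** using `np.nan` (Not a Number), useful in preprocessing datasets as many real-world datasets have missing values. A plain Python list can hold `float('nan')` or `None` too, but it has no built-in NaN-aware functions such as `np.nanmean()`.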

**Example code below**

In [None]:
import numpy as np

# Creating NumPy arrays with specified type
array_int = np.array([1, 2, 3], dtype=int)
array_float = np.array([1, 2, 3], dtype=float)
print("Integer array:", array_int)
print("Float array:", array_float)

# Useful properties
print("Size:", array_float.size)   # Total number of elements (for a 1-D array, the same as len(array_float))
print("Mean:", array_float.mean()) # A quick way of getting the average value in the array

# Operations
print("Doubled:", array_float * 2)  # multiplies every number in the array by 2
print("Squared:", array_float ** 2) # squares every number in the array

# Adding and removing values
extended = np.append(array_float, [4.0, 5.0])
print("After append:", extended)
reduced = np.delete(extended, [0, 1])
print("After deletion:", reduced)

# Example with missing values
with_nan = np.array([1.0, np.nan, 3.0, 4.0])
print("Array with missing value:", with_nan)
print("Mean (ignoring NaN):", np.nanmean(with_nan))  # np.nanmean skips NaN; a plain .mean() would return nan
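
Because a plain `.mean()` does not skip missing values, it may help to compare the two side by side. This is a minimal sketch reusing the `with_nan` values from the cell above; `plain_mean`, `safe_mean`, and `missing` are illustrative names:

```python
import numpy as np

with_nan = np.array([1.0, np.nan, 3.0, 4.0])

# A regular mean propagates the NaN, so the result is nan
plain_mean = with_nan.mean()
print("Plain mean:", plain_mean)

# np.nanmean skips NaN entries: (1 + 3 + 4) / 3
safe_mean = np.nanmean(with_nan)
print("NaN-aware mean:", safe_mean)

# np.isnan marks which positions are missing
missing = np.isnan(with_nan)
print("Missing mask:", missing)
print("Number of missing values:", missing.sum())
```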

## 🎯 Tasks: Try it Yourself

1. Create a list of five machine learning model names. Then do the following:
- Print the list
- Access and update an element
- Add a new model to the list

In [2]:
models = ["linear","tree","quad","KNN","SVM"]

print("Models:", models)

models[1] = "decision tree"

models.append("random forest")

Models: ['linear', 'tree', 'quad', 'KNN', 'SVM']


2. Create a tuple containing a learning rate and number of epochs. Then do the following:
- Access and print each element
- Convert the tuple to a list, modify it, and convert back to a tuple

In [8]:
learning_rate = (0.01,100,0.001)

new = list(learning_rate)
new[1] = 99
new = tuple(new)
print(new)

(0.01, 99, 0.001)


3. Create a NumPy array from a list of numbers (e.g., loss values over epochs). Then do the following:
- Print the shape and perform a simple operation (e.g., subtract a number from every element)

In [11]:
import numpy as np
array_int = np.array([1,2,3,3,4,5,1,2])
print(array_int)
print("Shape:", array_int.shape)
print("After subtracting 1:", array_int - 1)

[1 2 3 3 4 5 1 2]
Shape: (8,)
After subtracting 1: [0 1 2 2 3 4 0 1]


## 💥 Mini Challenge
Simulate storing training accuracy for 5 epochs using a NumPy array. Multiply all values by 100 to show them as percentages.

In [15]:
accuracy = np.array([0.8, 0.85, 0.9, 0.95, 0.01])
print("Accuracy percentages : ", accuracy*100 )

Accuracy percentages :  [80. 85. 90. 95.  1.]


## 🤔 Reflection
- Why might it be better to use NumPy arrays instead of lists in machine learning workflows? Consider what you would have to write to double every value in a plain Python list, compared with ``array * 2`` on a NumPy array.
- When would a tuple be preferable over a list?

## ✅ Solutions

In [10]:
# Task 1
models = ["Linear Regression", "SVM", "KNN", "Random Forest", "Naive Bayes"]
print(models)
models[2] = "Decision Tree"
models.append("Neural Network")
print(models)

# Task 2
params = (0.01, 100)
print(params[0], params[1])
param_list = list(params)
param_list[1] = 150
params = tuple(param_list)
print(params)

# Task 3
import numpy as np
losses = np.array([0.8, 0.6, 0.5, 0.4, 0.35])
print(losses)
print("After subtraction:", losses - 0.1)

# Mini Challenge
accuracies = np.array([0.91, 0.93, 0.92, 0.94, 0.95])
print("Accuracies (%):", accuracies * 100)

['Linear Regression', 'SVM', 'KNN', 'Random Forest', 'Naive Bayes']
['Linear Regression', 'SVM', 'Decision Tree', 'Random Forest', 'Naive Bayes', 'Neural Network']
0.01 100
(0.01, 150)
[0.8  0.6  0.5  0.4  0.35]
After subtraction: [0.7  0.5  0.4  0.3  0.25]
Accuracies (%): [91. 93. 92. 94. 95.]
