# Week 4, Class 1: Introduction to NumPy

## 1. What is NumPy?

**NumPy** (Numerical Python) is the foundational library for numerical computation in Python.  
Its core feature is the `ndarray` object, which is a powerful, multi-dimensional array designed for efficient numerical operations.

**Why is NumPy so important for scientists?**

- **Performance:**  
  NumPy arrays store their elements in a contiguous block of memory, all with the same data type, whereas a Python list stores references to separate Python objects. This makes arrays much faster and more memory-efficient than lists for numerical work.  
  Operations on entire arrays run in compiled code without explicit Python loops, a technique called **vectorization**.

- **Vectorization:**  
  Instead of writing a `for` loop to multiply each element of a list by a number, you can simply multiply the entire array.  
  This makes code more concise, readable, and faster.

- **Ecosystem:**  
  NumPy is the bedrock of many other scientific libraries, including Pandas, SciPy, and Matplotlib.  
  A strong understanding of NumPy is a prerequisite for using these tools effectively.
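
As a small illustration of the vectorization point above, the sketch below scales a few values both ways. The data and the names `readings`, `scaled_loop`, and `scaled_vec` are made up for this example.

```python
import numpy as np

readings = [1.0, 2.5, 4.0]  # hypothetical sample data

# Loop version: build the result one element at a time
scaled_loop = []
for value in readings:
    scaled_loop.append(value * 3)

# Vectorized version: one expression acts on the whole array
scaled_vec = np.array(readings) * 3

print(scaled_loop)  # [3.0, 7.5, 12.0]
print(scaled_vec)   # the same three values, held in an ndarray
```

Both versions compute the same numbers; the vectorized one simply leaves the looping to NumPy.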


## 2. NumPy Arrays vs. Python Lists

While NumPy arrays may look similar to lists at first, they behave very differently in key ways.

| Feature           | Python List                                         | NumPy Array                                      |
|-------------------|-----------------------------------------------------|-------------------------------------------------|
| **Data Type**      | Can store elements of different data types          | Stores elements of a single, uniform data type  |
| **Size**           | Dynamically resizable                              | Fixed size upon creation                        |
| **Arithmetic**     | Requires explicit loops for element-wise math       | Supports fast, vectorized element-wise operations |
| **Memory Efficiency** | Less memory-efficient                           | Highly memory-efficient                         |

In [None]:
!pip install numpy

In [1]:
import numpy as np

# A standard Python list
my_list = [1, 2, 3, 4, 5]
print(f"Python List: {my_list}")

# A NumPy array
my_array = np.array([1, 2, 3, 4, 5])
print(f"NumPy Array: {my_array}")
print(f"Type of my_array: {type(my_array)}")

# What happens when we multiply
print(f"List * 2: {my_list * 2}") # Repeats the list
print(f"Array * 2: {my_array * 2}") # Multiplies each element

Python List: [1, 2, 3, 4, 5]
NumPy Array: [1 2 3 4 5]
Type of my_array: <class 'numpy.ndarray'>
List * 2: [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
Array * 2: [ 2  4  6  8 10]
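
The cell above covers the arithmetic row of the table. The data type and size rows can be sketched in a similar way. This is only a minimal illustration, and the variable names are chosen for the example.

```python
import numpy as np

# Uniform data type: mixing ints and a float upcasts every element to float
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)  # float64

# Fixed size: np.append does not grow the array in place.
# It returns a new, longer array and leaves the original unchanged.
original = np.array([1, 2, 3])
extended = np.append(original, 4)
print(original.size, extended.size)  # 3 4
```

Because every `np.append` call copies the data, growing an array element by element in a loop is usually slower than building a list first and converting it once.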


## 3. Creating NumPy Arrays

You can create NumPy arrays in several ways.

### 3.1. From a Python List or Tuple

This is the most common way to get started. Pass a list, a tuple, or a nested list (a list of lists) to `np.array()`.

In [2]:
# A one-dimensional array (vector) from a list
data_vector = np.array([1.5, 2.7, 3.1, 4.0])
print(f"1D Array (Vector):\n{data_vector}")

# A two-dimensional array (matrix) from a list of lists
my_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
data_matrix = np.array(my_list)
print(f"\n2D List (Matrix):\n{my_list}")
print(f"\n2D Array (Matrix):\n{data_matrix}")

1D Array (Vector):
[1.5 2.7 3.1 4. ]

2D List (Matrix):
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

2D Array (Matrix):
[[1 2 3]
 [4 5 6]
 [7 8 9]]


In [4]:
# Index a single element as [row, column], counting from 0
data_matrix[1, 1]

np.int64(5)
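
The heading of this subsection mentions tuples, and the last cell indexes a single element. The sketch below shows both ideas side by side; the names `from_tuple` and `grid` are placeholders for this example.

```python
import numpy as np

# np.array accepts tuples as well as lists
from_tuple = np.array((1.5, 2.7, 3.1))

# Nested sequences give a 2D array; indexing is [row, column], counting from 0
grid = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
center = grid[1, 1]   # second row, second column
first_row = grid[0]   # a 1D array holding the whole first row

print(from_tuple.shape)  # (3,)
print(center)            # 5
print(first_row)         # [1 2 3]
```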

### 3.2. Using Built-in Functions

NumPy provides convenient functions to create arrays pre-filled with values.

* `np.zeros(shape)`: Creates an array filled with zeros.
* `np.ones(shape)`: Creates an array filled with ones.
* `np.empty(shape)`: Creates an array without initializing its values. The contents are arbitrary (whatever was already in memory), not random numbers.
* `np.full(shape, fill_value)`: Creates an array filled with a specified value.

The `shape` is a tuple representing the dimensions of the array, e.g., `(3,)` for a 1D array of length 3, or `(2, 3)` for a 2x3 matrix.
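
`np.empty` is the one function in this list that the cells below do not demonstrate. A minimal sketch of how it is typically used, assuming you intend to overwrite every element before reading any of them:

```python
import numpy as np

# np.empty only allocates memory; the initial contents are arbitrary,
# so assign every element before reading from the array
buffer = np.empty((2, 3))
buffer[:] = 0.5       # fill the whole array in place

print(buffer.shape)   # (2, 3)
print(buffer.dtype)   # float64 (the default when dtype is not given)
```

Skipping the initialization step can save a little time for large arrays, but reading from an `np.empty` array before filling it gives meaningless values.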


In [14]:
# Create a 1D array of 5 zeros
zeros_vector = np.zeros((5, ))
print(f"Vector of zeros:\n{zeros_vector}")

# Create a 2x3 matrix of ones
ones_matrix = np.ones((2, 3))
print(f"\nMatrix of ones:\n{ones_matrix}")

# Create a 3x2 matrix filled with the value 7
full_matrix = np.full((3, 2), 7)
print(f"\nMatrix filled with 7s:\n{full_matrix}")

print(f"\n{zeros_vector.dtype}")

Vector of zeros:
[0. 0. 0. 0. 0.]

Matrix of ones:
[[1. 1. 1.]
 [1. 1. 1.]]

Matrix filled with 7s:
[[7 7]
 [7 7]
 [7 7]]

float64


In [10]:
full_nan_matrix = np.full((3, 2), np.nan)
print(f"Matrix filled with NaNs:\n{full_nan_matrix}")

Matrix filled with NaNs:
[[nan nan]
 [nan nan]
 [nan nan]]
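
Notice that the matrix of 7s printed as integers while the NaN matrix holds floats. When no `dtype` is given, `np.full` takes the data type from the fill value. A short sketch of that behavior, with variable names chosen for illustration:

```python
import numpy as np

int_filled = np.full((2, 2), 7)                 # integer fill value -> integer dtype
float_filled = np.full((2, 2), 7.0)             # float fill value -> float64
forced = np.full((2, 2), 7, dtype=np.float64)   # an explicit dtype overrides the inference

print(int_filled.dtype)    # an integer type (int64 on most platforms)
print(float_filled.dtype)  # float64
print(forced.dtype)        # float64
```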


### 3.3. Using Sequences

* `np.arange(start, stop, step)`: Similar to Python's `range()`, but returns a NumPy array. The `stop` value is excluded.
* `np.linspace(start, stop, num)`: Creates an array of `num` evenly spaced numbers from `start` to `stop`, with both endpoints included by default.

In [11]:
# Create an array from 0 to 9
numbers_array = np.arange(10)
print(f"Array from arange(10):\n{numbers_array}")

# Create an array of even numbers from 2 to 10
even_numbers = np.arange(2, 11, 2)
print(f"\nArray from arange(2, 11, 2):\n{even_numbers}")

# Create 5 evenly spaced numbers from 0 to 1
spaced_numbers = np.linspace(0, 1, 5)
print(f"\nArray from linspace(0, 1, 5):\n{spaced_numbers}")

Array from arange(10):
[0 1 2 3 4 5 6 7 8 9]

Array from arange(2, 11, 2):
[ 2  4  6  8 10]

Array from linspace(0, 1, 5):
[0.   0.25 0.5  0.75 1.  ]
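
The two functions treat the end of the interval differently, which is a common source of off-by-one surprises. The sketch below compares them on the same interval; the variable names are illustrative only.

```python
import numpy as np

# arange stops before `stop`, just like range()
stepped = np.arange(0, 1, 0.25)
print(stepped)        # 4 values; 1.0 is not included

# linspace includes `stop` by default
inclusive = np.linspace(0, 1, 5)
print(inclusive)      # 5 values; the last one is 1.0

# Passing endpoint=False makes linspace exclude `stop` as well
exclusive = np.linspace(0, 1, 4, endpoint=False)
print(exclusive)      # the same 4 values as the arange call
```

As a rule of thumb, `np.arange` fits best when you know the step, and `np.linspace` when you know how many points you need.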


## 4. Array Attributes

NumPy arrays have several useful attributes that provide information about their structure.

* `.shape`: A tuple indicating the dimensions of the array.
* `.ndim`: The number of dimensions (axes) of the array.
* `.size`: The total number of elements in the array.
* `.dtype`: The data type of the elements in the array.

In [12]:
data_1d = np.array([1, 2, 3, 4])
data_2d = np.array([[10, 20], [30, 40], [50, 60]])

print(f"Attributes for 1D array:")
print(f"  Shape: {data_1d.shape}")
print(f"  Dimensions: {data_1d.ndim}")
print(f"  Size: {data_1d.size}")
print(f"  Data type: {data_1d.dtype}")

print(f"\nAttributes for 2D array:")
print(f"  Shape: {data_2d.shape}")
print(f"  Dimensions: {data_2d.ndim}")
print(f"  Size: {data_2d.size}")
print(f"  Data type: {data_2d.dtype}")

Attributes for 1D array:
  Shape: (4,)
  Dimensions: 1
  Size: 4
  Data type: int64

Attributes for 2D array:
  Shape: (3, 2)
  Dimensions: 2
  Size: 6
  Data type: int64


In [15]:
# You can also specify the data type when creating an array
int_array = np.zeros((3,), dtype=np.int64)
print("Integer array attributes:")
print(f"  Array: {int_array}")
print(f"  Data type: {int_array.dtype}")

Integer array attributes:
  Array: [0 0 0]
  Data type: int64


In [18]:
# A 2D array with 1 row and 4 columns (note the double brackets in the output)
array14 = np.zeros((1, 4), dtype=np.int64)
print(array14)
print(array14[0, 1])

[[0 0 0 0]]
0


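The `dtype` is an important concept in NumPy. Because every element shares one `dtype`, NumPy knows the exact size of each element in memory and can process the whole array in fast compiled code instead of inspecting elements one by one.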
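
To make the role of `dtype` more concrete, the sketch below compares the memory used by two integer types and converts a float array to integers. The array names are chosen for this example.

```python
import numpy as np

small = np.arange(1000, dtype=np.int16)   # 2 bytes per element
large = np.arange(1000, dtype=np.int64)   # 8 bytes per element

print(small.itemsize, small.nbytes)  # 2 2000
print(large.itemsize, large.nbytes)  # 8 8000

# astype returns a converted copy; going from float to int drops the fractional part
truncated = np.array([1.9, 2.1, 3.7]).astype(np.int64)
print(truncated)  # [1 2 3]
```

Smaller types save memory but hold a narrower range of values, so choose a `dtype` that comfortably fits your data.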

## 5. Performance: For Loops vs. NumPy Vectorization

One of the main reasons to use NumPy is its superior performance for numerical operations. Let's use the `time_this` decorator we learned in Week 2 to empirically demonstrate the speed difference between a traditional Python `for` loop and a NumPy vectorized operation for a large dataset.

In [20]:
import numpy as np
import time

# Let's assume you have a time_this decorator defined somewhere in your code
# For this example, we'll redefine it here for clarity.
def time_this(func):
    """A decorator to measure the execution time of a function."""
    def wrapper(*args, **kwargs):
        start_time = time.time()
        result = func(*args, **kwargs)
        end_time = time.time()
        print(f"Function '{func.__name__}' took {end_time - start_time:.6f} seconds to execute.")
        return result
    return wrapper

# Define the number of elements to process
NUM_ELEMENTS = 10_000_000

# --- Method 1: Python for loop on a list ---
@time_this
def process_with_loop(my_list: list) -> list:
    """Multiplies each element of a list by 2 using a for loop."""
    result = []
    for element in my_list:
        result.append(element * 2)
    return result

# --- Method 2: NumPy vectorized operation on an array ---
@time_this
def process_with_numpy(my_array: np.ndarray) -> np.ndarray:
    """Multiplies each element of a NumPy array by 2 using a vectorized operation."""
    return my_array * 2

# Create the data
print(f"Creating a list and a NumPy array with {NUM_ELEMENTS} elements...")
start_time = time.time()
py_list = list(range(NUM_ELEMENTS))
end_time = time.time()
print(f"py_list creation: {end_time-start_time:.6f}")
start_time = time.time()
np_array = np.arange(NUM_ELEMENTS)
end_time = time.time()
print(f"np_array creation: {end_time-start_time:.6f}")
print("...Data creation complete.\n")

# Run the performance comparison
print("Running Python list for loop...")
_ = process_with_loop(py_list)

print("\nRunning NumPy vectorized operation...")
_ = process_with_numpy(np_array)

Creating a list and a NumPy array with 10000000 elements...
py_list creation: 0.482280
np_array creation: 0.035467
...Data creation complete.

Running Python list for loop...
Function 'process_with_loop' took 1.365672 seconds to execute.

Running NumPy vectorized operation...
Function 'process_with_numpy' took 0.034858 seconds to execute.
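
A single timed run can be noisy, since other processes on the machine affect the result. As an alternative sketch (not the approach used in the cell above), the standard-library `timeit` module can repeat each measurement and keep the fastest one. The function names and the smaller size `N` are assumptions made for this example.

```python
import timeit
import numpy as np

N = 100_000  # smaller than NUM_ELEMENTS so the repeated runs finish quickly
py_values = list(range(N))
np_values = np.arange(N)

def double_with_loop(values):
    """Double each element with a Python-level loop (list comprehension)."""
    return [v * 2 for v in values]

def double_with_numpy(values):
    """Double each element with a vectorized NumPy operation."""
    return values * 2

# Each measurement calls the function 10 times; take 5 measurements and keep the fastest
loop_best = min(timeit.repeat(lambda: double_with_loop(py_values), number=10, repeat=5))
numpy_best = min(timeit.repeat(lambda: double_with_numpy(np_values), number=10, repeat=5))

print(f"loop:  {loop_best / 10:.6f} s per call")
print(f"numpy: {numpy_best / 10:.6f} s per call")
```

The exact numbers depend on your machine, so no expected output is shown here. In a Jupyter Notebook, the `%timeit` magic offers a similar repeated measurement.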


## Summary and Key Takeaways

* **NumPy** is the core library for numerical computing in Python, built around the `ndarray` object.
* **NumPy arrays** are faster and more memory-efficient than Python lists for numerical data, and they support vectorized operations.
* You can create arrays from lists or tuples with `np.array()`, with helper functions such as `np.zeros()`, `np.ones()`, and `np.full()`, or with sequence generators such as `np.arange()` and `np.linspace()`.
* Key array attributes like `.shape`, `.ndim`, `.size`, and `.dtype` provide essential information about the array's structure.

## Exercises

Complete the following exercises in a new Python script or a new Jupyter Notebook.

1.  **Create a 1D Array:**
    * Create a NumPy array from the following list of experimental readings: `[1.2, 1.5, 1.8, 2.1, 1.9]`.
    * Print the array and its `dtype`.

2.  **Create a 2D Array (Matrix):**
    * Create a NumPy array (a 3x2 matrix) from the following list of lists: `[[10, 20], [30, 40], [50, 60]]`.
    * Print the array, its `shape`, and its `ndim`.

3.  **Use Built-in Functions:**
    * Create a 4x4 matrix of ones.
    * Create a 1D array of 7 elements, all filled with the number 9.
    * Print both arrays.

4.  **Vectorized Operation:**
    * Create a NumPy array for temperatures in Celsius: `celsius_temps = np.array([0, 10, 20, 30, 40])`.
    * Without using a `for` loop, calculate the equivalent Fahrenheit temperatures using the formula: `F = (C * 9/5) + 32`.
    * Print the new `fahrenheit_temps` array.

5.  **linspace Challenge:**
    * Create a NumPy array containing 11 evenly spaced values between 0 and 1 (inclusive).
    * Print the array and its `size`.