# Chapter 2: Introduction to NumPy

The NumPy (short for *Numerical Python*) package provides an efficient interface for storing and operating on dense data buffers. In some ways, NumPy arrays are like Python's built-in `list` type, but they provide much more efficient storage and data operations as the arrays grow larger.

In [1]:
import numpy as np
np.__version__

'1.22.3'

## Understanding Data Types in Python

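Python is a dynamically typed language (in contrast to statically typed languages like C and Java).
This means that a variable's type is not declared in advance; it is determined at runtime by the value currently bound to the name, so the same name can be rebound to values of different types: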

In [None]:
x = 4
x = "four"

The standard mutable multi-element container in Python is the list. A list can hold integers, strings, or, because of Python's dynamic typing, a mix of types:

In [5]:
L = list(range(10))
L

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [6]:
type(L[0])

int

In [7]:
L2 = [str(c) for c in L]
L2

['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

In [8]:
type(L2[0])

str

In [9]:
L3 = [True, "2", 3.0, 4]
[type(item) for item in L3]

[bool, str, float, int]

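This dynamic flexibility comes at a cost: each item in a Python `list` is a complete Python object that carries its own type information, reference count, and value.
By contrast, a NumPy array is fixed-type: every element has the same type, so the array can store the raw values in a single dense buffer, which makes storing and manipulating data much more efficient.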
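
As a rough illustration of that overhead, the following sketch compares the memory used by a list of integers with the memory used by a NumPy array holding the same values. The numbers are specific to CPython and to the 64-bit integer dtype chosen here, and `sys.getsizeof` only approximates the true footprint, so treat the result as an estimate rather than a benchmark.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)   # dtype chosen for illustration

# A list stores references to full Python int objects; each object carries
# its own type information and reference count in addition to the value.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(v) for v in py_list)

# A fixed-type array stores only the raw 8-byte values in one buffer.
array_bytes = np_arr.nbytes

print(list_bytes, array_bytes)
```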

Python has a built-in `array` module that can store a uniform type:

In [10]:
import array
L = list(range(10))
A = array.array('i', L)
A

array('i', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Here, `'i'` is a type code indicating the contents are integers.
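
The type code is enforced when values are added. The short sketch below inspects the type code and item size of an `array.array` and shows that appending a value of the wrong type is rejected. The item size for `'i'` depends on the platform, so no specific value is assumed.

```python
import array

A = array.array('i', range(10))
print(A.typecode, A.itemsize)   # itemsize is platform dependent

rejected = False
try:
    A.append(1.5)               # a float does not fit the 'i' type code
except TypeError as exc:
    rejected = True
    print("rejected:", exc)
```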

A more useful object is the NumPy `ndarray`, which adds efficient *operations* on the data (explored later in this chapter). An `ndarray` can be created from a Python list with `np.array`:

In [11]:
np.array([1, 4, 2, 5, 3])

array([1, 4, 2, 5, 3])

If the types in the given list do not match, `np.array` upcasts them to a common type where possible (here, integers are upcast to floating point):

In [12]:
np.array([3.14, 4, 2, 3])

array([3.14, 4.  , 2.  , 3.  ])
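
To see which type NumPy settles on, you can inspect the `dtype` attribute of the result. The sketch below also uses `np.result_type`, which reports the common type for a set of dtypes; the input values are arbitrary examples.

```python
import numpy as np

ints = np.array([1, 2, 3])
floats = np.array([1, 2.5, 3])
complexes = np.array([1, 2.5, 3j])

print(ints.dtype)        # platform default integer type
print(floats.dtype)      # float64
print(complexes.dtype)   # complex128

# Ask NumPy directly which type two dtypes are promoted to
common = np.result_type(np.int32, np.float64)
print(common)            # float64
```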

The `dtype` keyword can be used to explicitly set the data type:

In [13]:
np.array([1, 2, 3, 4], dtype='float32')

array([1., 2., 3., 4.], dtype=float32)

Finally, unlike Python lists, NumPy arrays can be explicitly multidimensional:

In [15]:
np.array([range(i, i + 3) for i in [2, 4, 6]])

array([[2, 3, 4],
       [4, 5, 6],
       [6, 7, 8]])

Especially for larger arrays, it is more efficient to create arrays from scratch using NumPy's built-in functions:

In [16]:
# Create a length 10 integer array filled with zeroes
np.zeros(10, dtype=int)

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [17]:
# Create a 3x5 floating-point array filled with 1s
np.ones((3, 5), dtype=float)

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [18]:
# Create a 3x5 array filled with 3.14
np.full((3, 5), 3.14)

array([[3.14, 3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14, 3.14]])

In [20]:
# Create an array filled with a linear sequence 0 to 20, stepping by 2
np.arange(0, 20, 2)

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

In [22]:
# Create an array of five values evenly spaced between 0 and 1
np.linspace(0, 1, 5)

array([0.  , 0.25, 0.5 , 0.75, 1.  ])
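
One difference between the two functions is easy to miss: `np.arange` excludes the stop value, while `np.linspace` includes it by default. A minimal sketch of the difference, using the `endpoint` keyword of `np.linspace` to exclude the stop value:

```python
import numpy as np

a = np.arange(0, 20, 2)                    # stop value 20 is excluded
b = np.linspace(0, 1, 5)                   # stop value 1 is included
c = np.linspace(0, 1, 5, endpoint=False)   # stop value 1 is excluded

print(a[-1], len(a))   # 18 10
print(b[-1], len(b))   # 1.0 5
print(c)
```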

In [23]:
# Create a 3x3 array of uniformly distributed random values between 0 and 1
np.random.random((3, 3))

array([[0.11951912, 0.41018534, 0.27538936],
       [0.58023666, 0.72172228, 0.51023194],
       [0.84901395, 0.57661238, 0.21919099]])

In [24]:
# Create a 3x3 array of normally distributed random values with mean 0 and SD 1
np.random.normal(0, 1, (3, 3))

array([[ 1.95286972, -0.2192877 ,  0.85160344],
       [-1.1392111 , -0.68290616, -2.05889299],
       [ 0.20443203,  0.76397592, -0.37174   ]])

In [25]:
# Create a 3x3 array of random integers in the interval [0, 10)
np.random.randint(0, 10, (3, 3))

array([[1, 3, 5],
       [6, 2, 8],
       [4, 6, 6]])
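
The same three kinds of random arrays can also be drawn from a `Generator` object created with `np.random.default_rng`, which is available in NumPy 1.17 and later. This is an alternative sketch rather than the approach used in this chapter; the seed value 42 is arbitrary and only makes the output repeatable.

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # arbitrary seed for repeatability

uniform = rng.random((3, 3))                   # uniform on [0, 1)
normal = rng.normal(0, 1, size=(3, 3))         # mean 0, standard deviation 1
integers = rng.integers(0, 10, size=(3, 3))    # integers in [0, 10)

print(uniform.shape, normal.shape, integers.shape)
```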

In [26]:
# Create a 3x3 identity matrix
np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [27]:
# Create an uninitialized array of three floats (float64 is the default dtype)
# The values will be whatever happens to already exist at that memory location
np.empty(3)

array([1., 1., 1.])

When constructing an array, you can specify the data type with a string:

In [28]:
np.zeros(10, dtype='int16')

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int16)

Or using the associated NumPy object:

In [29]:
np.zeros(10, dtype=np.int16)

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int16)
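
Both spellings refer to the same data type. A quick check, using `np.dtype` to turn the string into a dtype object:

```python
import numpy as np

a = np.zeros(10, dtype='int16')
b = np.zeros(10, dtype=np.int16)

print(a.dtype == b.dtype)           # True
print(np.dtype('int16').itemsize)   # 2 (bytes per element)
```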

## The Basics of NumPy Arrays

In [8]:
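np.random.seed(0)  # seed for reproducibility

x1 = np.random.randint(10, size=6)          # one-dimensional array
x2 = np.random.randint(10, size=(3, 4))     # two-dimensional array
x3 = np.random.randint(10, size=(3, 4, 5))  # three-dimensional array

# Number of dimensions, shape, total element count, data type,
# bytes per element, and total bytes
x3.ndim, x3.shape, x3.size, x3.dtype, x3.itemsize, x3.nbytes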

(3, (3, 4, 5), 60, dtype('int32'), 4, 240)
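
These attributes are related: `nbytes` equals `size` times `itemsize`. The sketch below fixes the dtype explicitly, because the default integer type (and therefore `itemsize` and `nbytes`) depends on the platform and NumPy version. The array of zeros is a stand-in for `x3` with the same shape.

```python
import numpy as np

arr = np.zeros((3, 4, 5), dtype=np.int32)   # dtype fixed explicitly

print(arr.ndim)      # 3
print(arr.shape)     # (3, 4, 5)
print(arr.size)      # 60
print(arr.itemsize)  # 4
print(arr.nbytes)    # 240
print(arr.nbytes == arr.size * arr.itemsize)   # True
```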

In [9]:
# Non-negative indices count from the start; negative indices count from the end
x1[0], x1[4], x1[-1], x1[-2]

(5, 7, 9, 7)

In [15]:
# Multidimensional arrays are indexed with comma-separated (row, column) indices
x2, x2[0, 0], x2[2, 0], x2[2, -1]

(array([[3, 5, 2, 4],
        [7, 6, 8, 8],
        [1, 6, 7, 7]]),
 3,
 1,
 7)

In [16]:
x2[0, 0] = 12
x2

array([[12,  5,  2,  4],
       [ 7,  6,  8,  8],
       [ 1,  6,  7,  7]])

In [17]:
# x1 has an integer dtype, so the float is silently truncated to 3
x1[0] = 3.14159
x1

array([3, 0, 3, 3, 7, 9])
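
Because this truncation happens without any warning, it is worth seeing in isolation. The sketch below uses a small made-up array; converting to a floating-point array with `astype` is one way to keep the fractional part.

```python
import numpy as np

x = np.array([5, 0, 3])
x[0] = 3.99            # truncated toward zero, not rounded
print(x)               # [3 0 3]

xf = x.astype(float)   # floating-point copy of the data
xf[0] = 3.99           # now the fractional part is kept
print(xf)
```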

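Array slicing accesses subarrays with the syntax `x[start:stop:step]`. Any omitted value takes its default: `start=0`, `stop` equal to the size of the dimension, and `step=1`.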

In [18]:
x = np.arange(10)
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [21]:
x[:5], x[5:], x[4:7], x[::2], x[1::2]

(array([0, 1, 2, 3, 4]),
 array([5, 6, 7, 8, 9]),
 array([4, 5, 6]),
 array([0, 2, 4, 6, 8]),
 array([1, 3, 5, 7, 9]))

In [23]:
x[::-1], x[5::-2]

(array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0]), array([5, 3, 1]))
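
With a negative step, the defaults for `start` and `stop` are swapped, so an omitted `start` means the last element. A few more cases, including a `stop` beyond the end of the array, which is clipped rather than raising an error:

```python
import numpy as np

x = np.arange(10)

print(x[7:2:-1])   # [7 6 5 4 3]
print(x[:3:-1])    # [9 8 7 6 5 4]
print(x[5:100])    # [5 6 7 8 9]
```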

In [25]:
x2

array([[12,  5,  2,  4],
       [ 7,  6,  8,  8],
       [ 1,  6,  7,  7]])

In [34]:
x2[:2, :3], x2[:3, ::2], x2[::-1, ::-1]

(array([[12,  5,  2],
        [ 7,  6,  8]]),
 array([[12,  2],
        [ 7,  8],
        [ 1,  7]]),
 array([[ 7,  7,  6,  1],
        [ 8,  8,  6,  7],
        [ 4,  2,  5, 12]]))

In [32]:
# First column, first row, and the first row again via the shorthand x2[0]
print(x2[:, 0], x2[0, :], x2[0])


[12  7  1] [12  5  2  4] [12  5  2  4]


It is important to note that array slices are *views* rather than *copies* of the array data. This differs from Python lists, where a slice is a copy.

In [30]:
print(x2)

[[12  5  2  4]
 [ 7  6  8  8]
 [ 1  6  7  7]]


In [37]:
# Take a 2x2 subarray of x2
x2_sub = x2[:2, :2]
# Modify the subarray
x2_sub[0, 0] = 99
# The original array is modified as well
print(x2)

[[99  5  2  4]
 [ 7  6  8  8]
 [ 1  6  7  7]]


Use the `copy()` method to explicitly copy the data within an array.

In [38]:
x2_sub_copy = x2[:2, :2].copy()
x2_sub_copy[0, 0] = 42
print(x2)

[[99  5  2  4]
 [ 7  6  8  8]
 [ 1  6  7  7]]
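
If you are unsure whether two arrays share data, `np.shares_memory` can tell you. A minimal sketch, using a made-up array rather than `x2`:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
view = a[:2, :2]             # slice: shares data with a
copied = a[:2, :2].copy()    # explicit copy: independent data

print(np.shares_memory(a, view))     # True
print(np.shares_memory(a, copied))   # False

view[0, 0] = 99      # visible through a
copied[0, 1] = 42    # not visible through a
print(a[0, 0], a[0, 1])   # 99 1
```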


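Arrays can be reshaped with the `reshape` method. The size of the original array must match the size of the reshaped array.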

In [39]:
# Create the numbers 1 through 9 in a 3x3 grid
grid = np.arange(1, 10).reshape((3, 3))
print(grid)

[[1 2 3]
 [4 5 6]
 [7 8 9]]
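
The total number of elements must stay the same when reshaping. The sketch below shows a valid reshape, the use of `-1` to let NumPy infer one dimension, and the `ValueError` raised when the sizes do not match:

```python
import numpy as np

flat = np.arange(12)

print(flat.reshape((3, 4)).shape)    # (3, 4)
print(flat.reshape((3, -1)).shape)   # (3, 4): -1 lets NumPy infer the length

failed = False
try:
    flat.reshape((5, 3))             # needs 15 elements, but there are 12
except ValueError as exc:
    failed = True
    print("reshape failed:", exc)
```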


In [43]:
x = np.array([1, 2, 3])
# Row vector via reshape and newaxis
x, x.reshape((1, 3)), x[np.newaxis, :]

(array([1, 2, 3]), array([[1, 2, 3]]), array([[1, 2, 3]]))

In [44]:
# Column vector via reshape and newaxis
x.reshape((3, 1)), x[:, np.newaxis]

(array([[1],
        [2],
        [3]]),
 array([[1],
        [2],
        [3]]))

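Arrays can be joined with `np.concatenate`, `np.vstack`, and `np.hstack`. `np.concatenate` takes a list or tuple of arrays as its first argument: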

In [47]:
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
z = [99, 99, 99]
np.concatenate([x, y]), np.concatenate([x, y, z])

(array([1, 2, 3, 3, 2, 1]), array([ 1,  2,  3,  3,  2,  1, 99, 99, 99]))

In [50]:
grid = np.array([[1, 2, 3], [4, 5, 6]])
np.concatenate([grid, grid]), np.concatenate([grid, grid], axis=1)

(array([[1, 2, 3],
        [4, 5, 6],
        [1, 2, 3],
        [4, 5, 6]]),
 array([[1, 2, 3, 1, 2, 3],
        [4, 5, 6, 4, 5, 6]]))

In [54]:
np.vstack([x, grid])

array([[1, 2, 3],
       [1, 2, 3],
       [4, 5, 6]])

In [55]:
y = np.array([[99], [99]])
np.hstack([grid, y])

array([[ 1,  2,  3, 99],
       [ 4,  5,  6, 99]])
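
For arrays like these, `np.vstack` and `np.hstack` behave like `np.concatenate` along a particular axis. The following sketch checks that equivalence for the inputs used above; `col` is just a new name for the column of 99s.

```python
import numpy as np

x = np.array([1, 2, 3])
grid = np.array([[1, 2, 3], [4, 5, 6]])

# vstack treats the 1-D array as a single row of shape (1, 3)
a = np.vstack([x, grid])
b = np.concatenate([x[np.newaxis, :], grid], axis=0)
print(np.array_equal(a, b))   # True

# hstack joins 2-D arrays along the second axis
col = np.array([[99], [99]])
c = np.hstack([grid, col])
d = np.concatenate([grid, col], axis=1)
print(np.array_equal(c, d))   # True
```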

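The opposite of concatenation is splitting, provided by `np.split`, `np.hsplit`, and `np.vsplit`. Each takes a list of indices giving the split points; *N* split points produce *N + 1* subarrays.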

In [57]:
x = [1, 2, 3, 99, 99, 3, 2, 1]
x1, x2, x3 = np.split(x, [3, 5])
print(x1, x2, x3)

[1 2 3] [99 99] [3 2 1]


In [59]:
grid = np.arange(16).reshape((4, 4))
upper, lower = np.vsplit(grid, [2])
upper, lower

(array([[0, 1, 2, 3],
        [4, 5, 6, 7]]),
 array([[ 8,  9, 10, 11],
        [12, 13, 14, 15]]))

In [60]:
left, right = np.hsplit(grid, [2])
left, right

(array([[ 0,  1],
        [ 4,  5],
        [ 8,  9],
        [12, 13]]),
 array([[ 2,  3],
        [ 6,  7],
        [10, 11],
        [14, 15]]))
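
Besides a list of split points, `np.split` also accepts an integer, which divides the array into that many equal parts and raises a `ValueError` if an equal division is impossible. `np.array_split` is a more forgiving variant that allows unequal parts. A short sketch with a made-up array:

```python
import numpy as np

x = np.arange(9)

parts = np.split(x, [3, 5])      # 2 split points give 3 pieces
thirds = np.split(x, 3)          # integer: 3 equal pieces
uneven = np.array_split(x, 4)    # 4 pieces of unequal length

print([p.tolist() for p in parts])    # [[0, 1, 2], [3, 4], [5, 6, 7, 8]]
print([len(p) for p in thirds])       # [3, 3, 3]
print([len(p) for p in uneven])       # [3, 2, 2, 2]
```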

## Computation on NumPy Arrays: Universal Functions