# 4 NumPy Basics: Arrays and Vectorized Computation

NumPy, short for Numerical Python, is one of the most important foundational packages for numerical computing in Python. Many computational packages that provide scientific functionality use NumPy's array objects as a standard interface for exchanging data. Much of what this chapter covers about NumPy also carries over to pandas.

Key features of NumPy include:

1. **ndarray**: An efficient multidimensional array that facilitates rapid array-oriented arithmetic operations and offers versatile broadcasting capabilities.

2. **Mathematical Functions**: Quick operations on entire arrays of data without the need for explicit loops.

3. **Data I/O Tools**: Facilities for reading/writing array data to disk and working with memory-mapped files.

4. **Linear Algebra, Random Number Generation, and Fourier Transform Capabilities**: Providing a comprehensive set of tools for these essential mathematical operations.

5. **C API**: NumPy offers a C API, enabling seamless connectivity with libraries written in C, C++, or FORTRAN. This interoperability enhances the integration of NumPy with other programming languages.

Because NumPy provides a comprehensive and well-documented C API, it is straightforward to pass data to external libraries written in a low-level language, and for those libraries to return data to Python as NumPy arrays. This has made Python a popular choice for wrapping legacy C, C++, or FORTRAN codebases and giving them a dynamic and accessible interface.

NumPy by itself does not provide modeling or scientific functionality, but an understanding of NumPy arrays and array-oriented computing will help you use tools with array computing semantics, such as pandas, much more effectively. This chapter covers the fundamentals; NumPy is a large topic, and more advanced features such as broadcasting are treated in greater depth in Appendix A: Advanced NumPy. Many of those advanced features are not needed to follow the rest of the book, but they become useful as you go deeper into scientific computing in Python.

In the context of data analysis applications, my primary emphasis will be on the following key functionalities:

1. **Fast Array-Based Operations**: Utilizing efficient operations for data manipulation and cleaning, including tasks such as subsetting, filtering, transformation, and various computational operations.

2. **Common Array Algorithms**: Employing standard array algorithms like sorting, unique operations, and set operations to handle data effectively.

3. **Efficient Descriptive Statistics**: Calculating descriptive statistics with efficiency and aggregating/summarizing data in a manner that optimizes computational resources.

4. **Data Alignment and Relational Manipulations**: Performing operations related to data alignment and manipulating relational aspects for merging and joining heterogeneous datasets seamlessly.

5. **Expressing Conditional Logic as Array Expressions**: Utilizing array expressions to articulate conditional logic, avoiding the need for traditional loops with if-elif-else branches and thereby enhancing code efficiency.

6. **Group-wise Data Manipulations**: Conducting manipulations on data in a group-wise fashion, encompassing aggregation, transformation, and the application of functions tailored to specific groups within the dataset.

NumPy provides a computational foundation for general numerical data processing. However, for statistical analysis or analytics, particularly on tabular data, pandas is often the preferred choice. pandas also provides more domain-specific functionality, such as time series manipulation, that is not present in NumPy. So while NumPy lays the groundwork for numerical computation, pandas is the tool to reach for in most analytics tasks involving structured or tabular data.

Array-oriented computing in Python dates back to 1995, when Jim Hugunin created the Numeric library. Over the next decade, many scientific programming communities began doing array programming in Python, but by the early 2000s the library ecosystem had become fragmented. In 2005, Travis Oliphant forged the NumPy project from the Numeric and Numarray projects, bringing the community together around a single array computing framework.

The significance of NumPy in numerical computations within Python arises from its design, specifically tailored for efficiency when handling large arrays of data. Several factors contribute to this efficiency:

1. **Contiguous Memory Storage**: NumPy internally organizes data in a continuous block of memory, independent of other native Python objects. This allows NumPy's C-based algorithms to operate on this memory seamlessly, free from the constraints of type checking or additional overhead. In contrast, native Python sequences lack this contiguous memory structure.

2. **C-Language Algorithms**: NumPy's extensive library of algorithms is implemented in the C language. This design choice enables these algorithms to efficiently process data, contributing to enhanced performance. The absence of type checking or other overhead in C-based operations further accelerates computation.

3. **Reduced Memory Usage**: NumPy arrays are optimized to consume significantly less memory compared to built-in Python sequences. This efficiency in memory utilization is crucial when handling large datasets.

4. **Vectorized Operations**: NumPy allows for complex computations to be performed on entire arrays without the need for explicit Python for loops. This vectorized approach eliminates the inefficiencies associated with looping through large sequences in regular Python code. As a result, NumPy operations outpace their counterparts in regular Python code, especially when dealing with substantial datasets.
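The memory claim in point 3 can be checked directly. The following is a rough sketch rather than a rigorous measurement: it compares the bytes in an array's data buffer with an estimate for an equivalent list (the list's pointer table plus each integer object). The names `n`, `arr`, and `lst` and the use of `sys.getsizeof` are illustrative choices, and exact figures vary by platform and Python version.

```python
import sys
import numpy as np

n = 100_000  # illustrative size
arr = np.arange(n, dtype=np.int64)
lst = list(range(n))

# Bytes in the array's contiguous data buffer: n elements * 8 bytes each
array_bytes = arr.nbytes

# Rough estimate for the list: the list object itself plus every int object
list_bytes = sys.getsizeof(lst) + sum(sys.getsizeof(x) for x in lst)

print(array_bytes, list_bytes)
```

The list estimate is several times larger because every element is a full Python object that the list references through a pointer.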

The performance difference between NumPy and pure Python becomes evident when considering a NumPy array of one million integers and its equivalent Python list:


In [1]:
import numpy as np

my_arr = np.arange(1_000_000)
my_list = list(range(1_000_000))

In [2]:
import numpy as np
np.random.seed(12345)
import matplotlib.pyplot as plt
plt.rc("figure", figsize=(10, 6))
np.set_printoptions(precision=4, suppress=True)

Now, let's perform a simple operation of multiplying each element by 2:

In [3]:
%timeit my_arr2 = my_arr * 2 

314 µs ± 22.3 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


In [4]:
%timeit my_list2 = [x * 2 for x in my_list] 

49.6 ms ± 3.59 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


The results highlight a substantial performance gap. NumPy's array-based operation takes about 314 microseconds per loop, while the equivalent list comprehension takes about 49.6 milliseconds per loop, more than 150 times longer on this machine. In general, NumPy-based algorithms are 10 to 100 times faster (or more) than their pure Python counterparts and use significantly less memory.

## 4.1 The NumPy ndarray: A Multidimensional Array Object

A crucial component of NumPy is its N-dimensional array object, referred to as ndarray. This data structure serves as a rapid and adaptable container for managing substantial datasets in Python. Arrays empower users to execute mathematical operations on entire blocks of data, employing syntax that mirrors the conventions used for equivalent operations between individual scalar elements. This capability not only enhances computational efficiency but also facilitates the manipulation and analysis of large datasets with a concise and intuitive syntax. The ndarray's versatility makes it a cornerstone for numerical computing in Python, providing a powerful tool for handling complex mathematical operations on multidimensional data.

To give you an idea of how NumPy enables batch computations with syntax similar to that of operations on scalar values of built-in Python types, consider a simple example. After importing NumPy, create a small array:


In [2]:
import numpy as np
data = np.array([[1.5, -0.1, 3], [0, -3, 6.5]])
data

array([[ 1.5, -0.1,  3. ],
       [ 0. , -3. ,  6.5]])

Now, mathematical operations can be performed with the array 'data':

In [5]:
data * 10

In [None]:
data + data

In the first example, all elements of the array have been multiplied by 10. In the second example, corresponding values in each "cell" of the array have been added to each other. This demonstrates how NumPy allows for efficient and concise batch operations on entire arrays, mimicking the syntax used for individual scalar values in regular Python operations.

It's noteworthy that in this chapter and throughout the book, we adhere to the standard NumPy convention of importing it as `import numpy as np`. While it's technically feasible to use `from numpy import *` in your code to avoid the need for `np.` prefixes, I strongly discourage adopting this practice. The NumPy namespace is extensive and includes functions with names that might clash with built-in Python functions (e.g., `min` and `max`). Adhering to standard conventions, such as importing NumPy with an alias (`np`), is generally recommended to prevent potential naming conflicts and maintain code clarity.
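One way to see why the star import is risky: after `from numpy import *`, the name `sum` would refer to NumPy's version, whose second positional argument is an axis rather than a start value. The sketch below calls both functions explicitly instead of performing the star import. `sum` is used here as an illustrative case alongside the `min` and `max` mentioned above.

```python
import numpy as np

# Built-in sum: the second argument is a start value added to the total
builtin_result = sum(range(5), -1)    # 0 + 1 + 2 + 3 + 4 + (-1)

# NumPy sum: the second positional argument is the axis to reduce over
numpy_result = np.sum(range(5), -1)   # sums along the last axis

print(builtin_result, numpy_result)
```

The same call text gives two different answers depending on which `sum` is in scope, which is the kind of silent change the `np.` prefix prevents.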

An ndarray in NumPy serves as a versatile multidimensional container designed for homogeneous data, meaning that all its elements must be of the same type. Each array possesses two essential attributes:

1. **Shape**: This is represented as a tuple, indicating the size of each dimension within the array. For instance:

In [6]:
data.shape


In this example, the array `data` has a shape of (2, 3), signifying two rows and three columns.

2. **Data Type (dtype)**: This attribute is an object that describes the type of data stored in the array. It can be queried as follows:

In [None]:
data.dtype

In this case, the elements of the 'data' array are of type float64. The dtype provides information about the nature of the data within the array.
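Because an ndarray is homogeneous, mixed input is converted to one common type when the array is created. The following minimal sketch illustrates this; the variable names are hypothetical.

```python
import numpy as np

mixed_numbers = np.array([1, 2, 3.5])   # two ints and a float
print(mixed_numbers.dtype)              # a single floating-point dtype
print(mixed_numbers)                    # the integers are stored as floats

flags_and_ints = np.array([True, 2])    # a bool and an int
print(flags_and_ints.dtype.kind)        # "i": an integer dtype
print(flags_and_ints)                   # True is stored as 1
```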

This chapter aims to provide a foundational understanding of using NumPy arrays, offering sufficient knowledge to follow the subsequent chapters of the book. While a deep comprehension of NumPy might not be imperative for many data analytical applications, acquiring proficiency in array-oriented programming and adopting a mindset oriented towards arrays is crucial for advancing towards expertise in scientific Python.

**Note:** Throughout the book, when the terms "array," "NumPy array," or "ndarray" are used in the text, they typically allude to the ndarray object within the NumPy library.

### Creating ndarrays

Creating an array is straightforward using the array function in NumPy. This function accepts any sequence-like object, including other arrays, and generates a new NumPy array containing the provided data. A common choice for conversion is a Python list:

In [7]:
data1 = [6, 7.5, 8, 0, 1]
arr1 = np.array(data1)
arr1

In this example, a Python list called `data1` is converted into a NumPy array named `arr1` using the `np.array` function. The resulting array `arr1` contains the data from the original list.

Nested sequences, such as a list of equal-length lists, will be converted into a multidimensional array using NumPy's array function. Here's an example:

In [8]:
data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]
arr2 = np.array(data2)
arr2

In this case, since `data2` is a list of lists, the resulting NumPy array `arr2` has two dimensions, with the shape automatically inferred from the data. We can verify this by checking the `ndim` (number of dimensions) and `shape` attributes:


In [9]:
arr2.ndim

In [None]:
arr2.shape

The `ndim` attribute indicates that the array has two dimensions, and the `shape` attribute specifies the size of each dimension (2 rows and 4 columns in this example).

Unless explicitly specified (see the section Data Types for ndarrays later in this chapter), `numpy.array` tries to infer a suitable data type for the array it creates. The data type is stored in a special `dtype` metadata object. In the previous examples:

In [10]:
arr1.dtype

In [None]:
arr2.dtype

In the first example (`arr1`), where the input data included floating-point numbers, NumPy inferred the data type `float64`. In the second example (`arr2`), where the input data consisted only of integers, NumPy inferred its default integer type, which is `int64` on most platforms (the default integer width is platform-dependent). This automatic inference ensures that the created array is typed appropriately for the data provided.

In addition to `numpy.array`, several other functions in NumPy facilitate the creation of new arrays. For instance, `numpy.zeros` and `numpy.ones` generate arrays filled with 0s or 1s, respectively, based on a specified length or shape. The `numpy.empty` function creates an array without initializing its values to any particular value. For higher-dimensional arrays, you can pass a tuple indicating the shape:

In [5]:
np.zeros(10)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [6]:
np.zeros((3, 6))

array([[0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.]])

In [7]:
np.empty((2, 3, 2))

array([[[4.68756817e-310, 0.00000000e+000],
        [1.01855798e-312, 9.54898106e-313],
        [1.14587773e-312, 1.06099790e-312]],

       [[1.23075756e-312, 1.20953760e-312],
        [1.08221785e-312, 9.76118064e-313],
        [1.14587773e-312, 1.90979621e-312]]])

**Caution:** It's essential to note that `numpy.empty` does not guarantee an array filled with zeros. This function returns uninitialized memory, potentially containing non-zero "garbage" values. Therefore, it should only be used if the intention is to populate the new array with specific data.
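A minimal sketch of the intended usage pattern for `numpy.empty`: allocate the array, then write every element before reading any of them. The names `n` and `squares` are illustrative.

```python
import numpy as np

n = 5
squares = np.empty(n)      # contents are arbitrary at this point
for i in range(n):
    squares[i] = i ** 2    # every slot is written before it is read
print(squares)
```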

`numpy.arange` serves as an array-valued counterpart to the built-in Python `range` function. It generates an array containing a sequence of numbers within the specified range. Here's an example:

In [3]:
np.arange(15)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In this case, `np.arange(15)` produces a NumPy array with values ranging from 0 to 14, similar to the output of the `range` function in Python. The primary distinction is that `numpy.arange` creates an array directly, offering a more convenient and array-oriented approach compared to the standard Python `range` function.
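Like `range`, `numpy.arange` also accepts start, stop, and step arguments. The short sketch below compares the two and shows one difference: `arange` accepts a non-integer step. The variable names are illustrative.

```python
import numpy as np

from_arange = np.arange(2, 11, 3)      # start, stop (exclusive), step
from_range = list(range(2, 11, 3))     # the same values from the built-in
print(from_arange, from_range)

# Unlike range, arange also accepts a non-integer step
halves = np.arange(0, 2, 0.5)
print(halves)
```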

The table below provides a concise overview of some important NumPy array creation functions:

Table 4.1: Some important NumPy array creation functions

| Function   | Description                                                                                                                                                                   |
|------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `array`    | Converts input data (list, tuple, array, or other sequence type) to an ndarray either by inferring a data type or explicitly specifying a data type; copies the input data by default                                                |
| `asarray`  | Converts input to an ndarray, but does not copy if the input is already an ndarray                                                                                           |
| `arange`   | Similar to the built-in `range` but returns an ndarray instead of a `range` object                                                                                            |
| `ones`, `ones_like` | Produces an array of all 1s with the given shape and data type; `ones_like` takes another array and produces a ones array of the same shape and data type                 |
| `zeros`, `zeros_like` | Similar to `ones` and `ones_like` but produces arrays of 0s instead                                                                                                              |
| `empty`, `empty_like` | Creates new arrays by allocating new memory, but does not populate with any values like `ones` and `zeros`                                                                   |
| `full`, `full_like`   | Produces an array of the given shape and data type with all values set to the indicated "fill value"; `full_like` takes another array and produces a filled array of the same shape and data type |
| `eye`, `identity`     | Creates a square N × N identity matrix (1s on the diagonal and 0s elsewhere); `eye` additionally accepts a separate column count and a diagonal offset                      |

These functions are valuable tools for efficiently creating arrays with specific shapes, data types, and fill values, catering to various needs in numerical computing with NumPy.
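A few of the functions in Table 4.1 in action. This is an illustrative sketch with hypothetical variable names; it also shows the copy behavior that distinguishes `array` from `asarray`.

```python
import numpy as np

base = np.array([[1, 2, 3], [4, 5, 6]])

ones = np.ones_like(base)        # same shape and dtype as base, filled with 1s
sevens = np.full((2, 2), 7.0)    # 2 x 2 array filled with 7.0
ident = np.eye(3)                # 3 x 3 identity matrix

copied = np.array(base)          # array copies its input by default
same = np.asarray(base)          # asarray returns an existing ndarray as-is

print(copied is base, same is base)
```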


### Data Types for ndarrays

The data type, represented by the `dtype` attribute, is a special object in NumPy that contains the metadata necessary for an ndarray to interpret a chunk of memory as a specific type of data. Here's an example:


In [8]:
arr1 = np.array([1, 2, 3], dtype=np.float64)
arr2 = np.array([1, 2, 3], dtype=np.int32)

In [9]:
arr1.dtype

dtype('float64')

In [10]:
arr2.dtype

dtype('int32')

In this case, `arr1` is explicitly assigned a data type of `float64`, indicating that each element in the array should be interpreted as a 64-bit floating-point number. Similarly, `arr2` is assigned a data type of `int32`, specifying that each element should be treated as a 32-bit integer. The `dtype` attribute provides insights into the nature of the data stored in the array.

Data types in NumPy contribute to its flexibility when dealing with data from various sources. These data types often directly map onto the underlying memory or disk representation, facilitating the reading and writing of binary data streams to disk. This feature also enables seamless integration with code written in low-level languages like C or FORTRAN.

Numerical data types in NumPy are named by combining a type name (e.g., float or int) with a number indicating the number of bits per element. For instance, a standard double-precision floating-point value, equivalent to Python's float object, occupies 8 bytes or 64 bits. In NumPy, this type is denoted as `float64`. The naming convention allows for clear and precise specification of the data type.
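The naming convention can be confirmed from the `itemsize` attribute of a dtype, which reports the size of one element in bytes. A small sketch (the list of type names is an arbitrary selection):

```python
import numpy as np

names = ["int8", "int32", "float32", "float64"]
# itemsize is reported in bytes; multiplying by 8 gives the bits in the name
bits = {name: np.dtype(name).itemsize * 8 for name in names}
print(bits)
```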

For a comprehensive listing of NumPy's supported data types, refer to Table 4.2. This information is useful for understanding and managing the memory representation of data in NumPy arrays.

**Table 4.2: NumPy Data Types**

| Type                     | Type Code | Description                                                                                         |
|--------------------------|-----------|-----------------------------------------------------------------------------------------------------|
| `int8`, `uint8`          | `i1`, `u1`| Signed and unsigned 8-bit (1 byte) integer types                                                    |
| `int16`, `uint16`        | `i2`, `u2`| Signed and unsigned 16-bit integer types                                                           |
| `int32`, `uint32`        | `i4`, `u4`| Signed and unsigned 32-bit integer types                                                           |
| `int64`, `uint64`        | `i8`, `u8`| Signed and unsigned 64-bit integer types                                                           |
| `float16`                | `f2`      | Half-precision floating point                                                                      |
| `float32`                | `f4` or `f`| Standard single-precision floating point; compatible with C float                                  |
| `float64`                | `f8` or `d`| Standard double-precision floating point; compatible with C double and Python float object         |
| `float128`               | `f16` or `g`| Extended-precision floating point                                                                 |
| `complex64`, `complex128`, `complex256` | `c8`, `c16`, `c32` | Complex numbers represented by two 32-, 64-, or 128-bit floats, respectively             |
| `bool`                   | `?`       | Boolean type storing True and False values                                                        |
| `object`                 | `O`       | Python object type; a value can be any Python object                                               |
| `bytes_` (`string_` before NumPy 2.0) | `S`       | Fixed-length ASCII string type (1 byte per character); for example, to create a string data type with length 10, use 'S10' |
| `str_` (`unicode_` before NumPy 2.0)  | `U`       | Fixed-length Unicode type (number of bytes platform-specific); same specification semantics as the `S` type (e.g., 'U10') |

It's important to note that there's no need to memorize all the NumPy data types, especially if you're a new user. In many cases, it suffices to be aware of the general kind of data you're working with, such as floating-point, complex, integer, Boolean, string, or general Python objects. 

As you become more experienced, you may find it beneficial to understand the specifics of data types, especially when dealing with memory and disk storage, particularly with large datasets. Having control over the storage type can be advantageous in optimizing performance and memory usage for your specific use cases. So, while it's not necessary to memorize every detail, having a general understanding of the available options can be valuable as you gain more experience with NumPy.
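To make the storage trade-off concrete, the sketch below allocates the same number of elements with two floating-point types and compares the bytes used. The length `n` and the variable names are illustrative.

```python
import numpy as np

n = 1_000_000  # illustrative length
as_float64 = np.ones(n, dtype=np.float64)
as_float32 = np.ones(n, dtype=np.float32)

print(as_float64.nbytes)   # 8 bytes per element
print(as_float32.nbytes)   # 4 bytes per element

# The type codes in Table 4.2 count bytes: "f4" is another spelling of float32
print(np.dtype("f4") == np.float32)
```

Halving the element size halves the memory, at the cost of reduced precision.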

You can explicitly convert or cast an array from one data type to another using the `astype` method of the ndarray:

In [14]:
arr = np.array([1, 2, 3, 4, 5])
arr.dtype


In [None]:
float_arr = arr.astype(np.float64)
float_arr


In [None]:
float_arr.dtype

In this example, integers were cast to floating point. If you cast floating-point numbers to an integer data type, the decimal part will be truncated:

In [15]:
arr = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])
arr


In [None]:
arr.astype(np.int32)

If you have an array of strings representing numbers, you can use `astype` to convert them to numeric form:

In [16]:
numeric_strings = np.array(["1.25", "-9.6", "42"], dtype=np.bytes_)
numeric_strings.astype(float)

Be cautious when using the `numpy.bytes_` type (named `numpy.string_` before NumPy 2.0, where the old alias was removed), as string data in NumPy is fixed-size and may truncate input without warning. If casting fails for some reason (e.g., a string that cannot be converted to `float64`), a `ValueError` will be raised.
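Both pitfalls can be reproduced in a few lines. This is a minimal sketch; the dtype `"S3"` and the sample strings are chosen only for illustration.

```python
import numpy as np

# "S3" holds at most 3 bytes per element; longer input is cut off silently
short = np.array(["12.75", "abc"], dtype="S3")
print(short)

try:
    short.astype(float)    # b"abc" is not a valid number
    cast_failed = False
except ValueError:
    cast_failed = True
print(cast_failed)
```

Note that `"12.75"` is stored as `b"12."`, so a later numeric conversion of that element would succeed with the wrong value rather than fail.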

You can also use another array’s dtype attribute:

In [17]:
int_array = np.arange(10)
calibers = np.array([.22, .270, .357, .380, .44, .50], dtype=np.float64)
int_array.astype(calibers.dtype)

There are shorthand type code strings you can use to refer to a dtype:

In [18]:
zeros_uint32 = np.zeros(8, dtype="u4")
zeros_uint32

Note that calling `astype` always creates a new array (a copy of the data), even if the new data type is the same as the old data type.
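A quick check of that copy behavior, using `np.shares_memory` to confirm that the result has its own data (variable names are illustrative):

```python
import numpy as np

arr = np.arange(5)
same_type = arr.astype(arr.dtype)   # no change of dtype, but still a new array

same_type[0] = 99                   # modifying the copy...
print(arr)                          # ...leaves the original untouched
print(np.shares_memory(arr, same_type))
```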

### Arithmetic with NumPy Arrays

Arrays in NumPy are crucial for expressing batch operations on data without the need for explicit for loops. This capability is often referred to as "vectorization" among NumPy users. Any arithmetic operations performed between equal-size arrays apply the operation element-wise:

In [19]:
arr = np.array([[1., 2., 3.], [4., 5., 6.]])
arr

In [None]:
arr * arr

In [None]:
arr - arr

Arithmetic operations with scalars propagate the scalar argument to each element in the array:

In [20]:
1 / arr

In [None]:
arr ** 2

Comparisons between arrays of the same size yield Boolean arrays:

In [21]:
arr2 = np.array([[0., 4., 1.], [7., 2., 12.]])
arr2

In [None]:
arr2 > arr

Evaluating operations between differently sized arrays is known as "broadcasting" and will be discussed in more detail in Appendix A: Advanced NumPy. However, a deep understanding of broadcasting is not necessary for most of this book.
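As a small preview of broadcasting, here is a sketch of one of its simplest forms: adding a one-dimensional array to each row of a two-dimensional array. The array `row` is a hypothetical example value.

```python
import numpy as np

arr = np.array([[1., 2., 3.], [4., 5., 6.]])   # shape (2, 3)
row = np.array([10., 20., 30.])                # shape (3,)

# The one-dimensional row is applied to each row of arr
result = arr + row
print(result)
```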

### Basic Indexing and Slicing

NumPy array indexing is a nuanced topic with various ways to select subsets of data or individual elements. One-dimensional arrays behave similarly to Python lists on the surface:

In [22]:
arr = np.arange(10)
arr

In [None]:
arr[5]

In [None]:
arr[5:8]

In [None]:
arr[5:8] = 12
arr

Assigning a scalar value to a slice, as in `arr[5:8] = 12`, broadcasts the value to the entire selection.

**Note:** An important distinction from Python's built-in lists is that array slices in NumPy are views on the original array. This implies that the data is not copied, and any modifications made to the view will be reflected in the source array. This is different from lists in Python, where slices create a new list with copied data.

To illustrate this behavior, let's create a slice of the array `arr`:

In [23]:
arr_slice = arr[5:8]
arr_slice

Now, if we modify values in `arr_slice`, the changes will be reflected in the original array `arr`:

In [24]:
arr_slice[1] = 12345
arr

Using the "bare" slice `[:]` will assign the specified value to all elements in the array:

In [25]:
arr_slice[:] = 64
arr

If you're new to NumPy, this behavior might be surprising, especially if you're accustomed to other array programming languages that copy data more eagerly. NumPy's design allows it to work efficiently with large arrays, and copying data for every operation could lead to performance and memory issues.

**Caution:** If you want a copy of a slice of an ndarray instead of a view, you need to explicitly copy the array, for example, `arr[5:8].copy()`. This behavior is also observed in pandas.
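The three cases side by side, as a minimal sketch with illustrative names: a list slice (a copy), an array slice (a view), and an explicit array copy.

```python
import numpy as np

py_list = list(range(10))
list_slice = py_list[5:8]        # list slices are copies
list_slice[0] = -1
print(py_list[5])                # unchanged

arr = np.arange(10)
view = arr[5:8]                  # array slices are views
view[0] = -1
print(arr[5])                    # changed through the view

independent = arr[5:8].copy()    # explicit copy
independent[1] = -2
print(arr[6])                    # unchanged
```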

With higher-dimensional arrays, you have more options. In a two-dimensional array, the elements at each index are no longer scalars but rather one-dimensional arrays:



In [26]:
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr2d[2]

Individual elements can be accessed recursively. However, to simplify this process, you can pass a comma-separated list of indices to select individual elements. The following examples are equivalent:

In [27]:
arr2d[0][2]


In [None]:
arr2d[0, 2]

This notation allows for more concise and readable indexing when working with multi-dimensional arrays.

In multidimensional arrays, if you omit later indices, the returned object will be a lower-dimensional ndarray consisting of all the data along the higher dimensions. For example, in the 2 × 2 × 3 array `arr3d`:

In [28]:
arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
arr3d

In [29]:
arr3d[0]

Here, `arr3d[0]` is a 2 × 3 array. Both scalar values and arrays can be assigned to `arr3d[0]`:

In [30]:
old_values = arr3d[0].copy()
arr3d[0] = 42
arr3d

In [None]:
arr3d[0] = old_values
arr3d

Similarly, `arr3d[1, 0]` gives you all the values whose indices start with `(1, 0)`, forming a one-dimensional array:

In [31]:
arr3d[1, 0]

This expression is equivalent to indexing in two steps:

In [32]:
x = arr3d[1]
x
x[0]

Note that in all these cases where subsections of the array have been selected, the returned arrays are views.

**Caution:** This multidimensional indexing syntax for NumPy arrays will not work with regular Python objects, such as lists of lists.
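For comparison, here is what happens when the comma-separated form is tried on a plain list of lists. This is a small sketch; `nested` is a hypothetical example.

```python
import numpy as np

nested = [[1, 2, 3], [4, 5, 6]]

try:
    nested[0, 2]                 # tuple index on a plain list
    list_error = None
except TypeError as exc:
    list_error = exc
print(type(list_error).__name__)

print(nested[0][2])              # lists need chained indexing
print(np.array(nested)[0, 2])    # the same element from an ndarray
```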

### Indexing with slices

Similar to one-dimensional objects like Python lists, ndarrays support slicing using the familiar syntax:

In [33]:
arr

In [None]:
arr[1:6]

However, when dealing with a two-dimensional array like `arr2d`, slicing works differently. For instance:

In [34]:
arr2d

In [None]:
arr2d[:2]

In this case, the slice `arr2d[:2]` selects the first two rows along axis 0. It's useful to interpret it as "select the first two rows of `arr2d`."

You can use multiple slices, similar to multiple indexes:

In [35]:
arr2d[:2, 1:]

When you slice like this, you always obtain array views with the same number of dimensions as the original array. By mixing integer indexes and slices, you get lower-dimensional slices. For example:

In [36]:
lower_dim_slice = arr2d[1, :2]

While `arr2d` is two-dimensional, `lower_dim_slice` is one-dimensional. Refer to Figure 4.2 for a visual representation. Note that a colon by itself means to take the entire axis, so you can slice only the higher-dimensional axes by writing, for example, `arr2d[:, :1]`.

![Figure 4.2: Two-dimensional array slicing](images/pda3_0402.png)
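The effect of mixing integers and slices on the number of dimensions can be checked with `shape`. In this sketch, `arr2d` is recreated so the block is self-contained, and the other names are illustrative.

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

row_as_1d = arr2d[1, :2]       # an integer index drops that axis
row_as_2d = arr2d[1:2, :2]     # a length-one slice keeps it
first_column = arr2d[:, :1]    # a bare colon takes the whole first axis

print(row_as_1d.shape, row_as_2d.shape, first_column.shape)
```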

You can also select specific columns or rows:

In [37]:
lower_dim_slice.shape

In [38]:
arr2d[:2, 2]

In [39]:
arr2d[:, :1]

Assigning values to a slice modifies the original array:

In [40]:
arr2d[:2, 1:] = 0
arr2d

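### Boolean Indexing

Boolean indexing selects the rows (or elements) of an array for which a Boolean array of matching length is `True`. The examples below use an array of names containing duplicates and a two-dimensional array with one row of data per name:
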
In [41]:
names = np.array(["Bob", "Joe", "Will", "Bob", "Will", "Joe", "Joe"])
data = np.array([[4, 7], [0, 2], [-5, 6], [0, 0], [1, 2],
                 [-12, -4], [3, 4]])
names
data

In [42]:
names == "Bob"

In [43]:
data[names == "Bob"]

In [44]:
data[names == "Bob", 1:]
data[names == "Bob", 1]

In [45]:
names != "Bob"
~(names == "Bob")
data[~(names == "Bob")]

In [46]:
cond = names == "Bob"
data[~cond]

In [47]:
mask = (names == "Bob") | (names == "Will")
mask
data[mask]

In [48]:
data[data < 0] = 0
data

In [49]:
data[names != "Joe"] = 7
data

In [50]:
arr = np.zeros((8, 4))
for i in range(8):
    arr[i] = i
arr

In [51]:
arr[[4, 3, 0, 6]]

In [52]:
arr[[-3, -5, -7]]

In [53]:
arr = np.arange(32).reshape((8, 4))
arr
arr[[1, 5, 7, 2], [0, 3, 1, 2]]

In [54]:
arr[[1, 5, 7, 2]][:, [0, 3, 1, 2]]

In [55]:
arr[[1, 5, 7, 2], [0, 3, 1, 2]]
arr[[1, 5, 7, 2], [0, 3, 1, 2]] = 0
arr

In [56]:
arr = np.arange(15).reshape((3, 5))
arr
arr.T

In [57]:
arr = np.array([[0, 1, 0], [1, 2, -2], [6, 3, 2], [-1, 0, -1], [1, 0, 1]])
arr
np.dot(arr.T, arr)

In [58]:
arr.T @ arr

In [59]:
arr
arr.swapaxes(0, 1)

In [60]:
samples = np.random.standard_normal(size=(4, 4))
samples

In [61]:
from random import normalvariate
N = 1_000_000
%timeit samples = [normalvariate(0, 1) for _ in range(N)]
%timeit np.random.standard_normal(N)

In [62]:
rng = np.random.default_rng(seed=12345)
data = rng.standard_normal((2, 3))

In [63]:
type(rng)

In [64]:
arr = np.arange(10)
arr
np.sqrt(arr)
np.exp(arr)

In [65]:
x = rng.standard_normal(8)
y = rng.standard_normal(8)
x
y
np.maximum(x, y)

In [66]:
arr = rng.standard_normal(7) * 5
arr
remainder, whole_part = np.modf(arr)
remainder
whole_part

In [67]:
arr
out = np.zeros_like(arr)
np.add(arr, 1)
np.add(arr, 1, out=out)
out

In [68]:
points = np.arange(-5, 5, 0.01) # 1000 equally spaced points
xs, ys = np.meshgrid(points, points)
ys

In [69]:
z = np.sqrt(xs ** 2 + ys ** 2)
z

In [70]:
import matplotlib.pyplot as plt
plt.imshow(z, cmap=plt.cm.gray, extent=[-5, 5, -5, 5])
plt.colorbar()
plt.title(r"Image plot of $\sqrt{x^2 + y^2}$ for a grid of values")

In [71]:
plt.draw()

In [72]:
plt.close("all")

In [73]:
xarr = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
yarr = np.array([2.1, 2.2, 2.3, 2.4, 2.5])
cond = np.array([True, False, True, True, False])

In [74]:
result = [(x if c else y)
          for x, y, c in zip(xarr, yarr, cond)]
result

In [75]:
result = np.where(cond, xarr, yarr)
result

In [76]:
arr = rng.standard_normal((4, 4))
arr
arr > 0
np.where(arr > 0, 2, -2)

In [77]:
np.where(arr > 0, 2, arr) # set only positive values to 2

In [78]:
arr = rng.standard_normal((5, 4))
arr
arr.mean()
np.mean(arr)
arr.sum()

In [79]:
arr.mean(axis=1)
arr.sum(axis=0)

In [80]:
arr = np.array([0, 1, 2, 3, 4, 5, 6, 7])
arr.cumsum()

In [81]:
arr = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
arr

In [82]:
arr.cumsum(axis=0)
arr.cumsum(axis=1)

In [83]:
arr = rng.standard_normal(100)
(arr > 0).sum() # Number of positive values
(arr <= 0).sum() # Number of non-positive values

In [84]:
bools = np.array([False, False, True, False])
bools.any()
bools.all()

In [85]:
arr = rng.standard_normal(6)
arr
arr.sort()
arr

In [86]:
arr = rng.standard_normal((5, 3))
arr

In [87]:
arr.sort(axis=0)
arr
arr.sort(axis=1)
arr

In [88]:
arr2 = np.array([5, -10, 7, 1, 0, -3])
sorted_arr2 = np.sort(arr2)
sorted_arr2

In [89]:
names = np.array(["Bob", "Will", "Joe", "Bob", "Will", "Joe", "Joe"])
np.unique(names)
ints = np.array([3, 3, 3, 2, 2, 1, 1, 4, 4])
np.unique(ints)

In [90]:
sorted(set(names))

In [91]:
values = np.array([6, 0, 0, 3, 2, 5, 6])
np.isin(values, [2, 3, 6])  # np.isin replaces np.in1d, which is deprecated in NumPy 2.0

In [92]:
arr = np.arange(10)
np.save("some_array", arr)

In [93]:
np.load("some_array.npy")

In [94]:
np.savez("array_archive.npz", a=arr, b=arr)

In [95]:
arch = np.load("array_archive.npz")
arch["b"]

In [96]:
np.savez_compressed("arrays_compressed.npz", a=arr, b=arr)

In [97]:
!rm some_array.npy
!rm array_archive.npz
!rm arrays_compressed.npz

In [98]:
x = np.array([[1., 2., 3.], [4., 5., 6.]])
y = np.array([[6., 23.], [-1, 7], [8, 9]])
x
y
x.dot(y)

In [99]:
np.dot(x, y)

In [100]:
x @ np.ones(3)

In [101]:
from numpy.linalg import inv, qr
X = rng.standard_normal((5, 5))
mat = X.T @ X
inv(mat)
mat @ inv(mat)

In [102]:
import random
position = 0
walk = [position]
nsteps = 1000
for _ in range(nsteps):
    step = 1 if random.randint(0, 1) else -1
    position += step
    walk.append(position)


In [103]:
plt.figure()

In [104]:
plt.plot(walk[:100])

In [105]:
nsteps = 1000
rng = np.random.default_rng(seed=12345)  # fresh random generator
draws = rng.integers(0, 2, size=nsteps)
steps = np.where(draws == 0, 1, -1)
walk = steps.cumsum()

In [106]:
walk.min()
walk.max()

In [107]:
(np.abs(walk) >= 10).argmax()

In [108]:
nwalks = 5000
nsteps = 1000
draws = rng.integers(0, 2, size=(nwalks, nsteps)) # 0 or 1
steps = np.where(draws > 0, 1, -1)
walks = steps.cumsum(axis=1)
walks

In [109]:
walks.max()
walks.min()

In [110]:
hits30 = (np.abs(walks) >= 30).any(axis=1)
hits30
hits30.sum() # Number that hit 30 or -30

In [111]:
crossing_times = (np.abs(walks[hits30]) >= 30).argmax(axis=1)
crossing_times

In [112]:
crossing_times.mean()

In [113]:
draws = 0.25 * rng.standard_normal((nwalks, nsteps))