In [2]:
import numpy as np
np.__version__

'1.18.1'

In [4]:
np.array(list(range(10)))

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

## Methods to create arrays from scratch

* `np.zeros()`    - array of 0's
* `np.ones()`     - array of 1's
* `np.full()`     - array of any value specified
* `np.arange()`   - array of evenly stepped values within a range
* `np.linspace()` - array of n evenly spaced values between 2 specified values
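As a minimal sketch of the creation functions listed above (the variable names are only illustrative), note that `np.arange` excludes the stop value, whereas `np.linspace` includes both endpoints by default:

```python
import numpy as np

zeros = np.zeros(5)                 # five floating-point zeros
ones = np.ones((2, 3), dtype=int)   # 2x3 array of integer ones
steps = np.arange(0, 10, 2)         # start, stop (exclusive), step
points = np.linspace(0, 1, 5)       # 5 values, both endpoints included

print(zeros)
print(ones)
print(steps)    # [0 2 4 6 8]
print(points)   # [0.   0.25 0.5  0.75 1.  ]
```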

In [5]:
np.full((3,5), 3.14)

array([[3.14, 3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14, 3.14]])

In [7]:
np.linspace(0.01,1, num=100).reshape(10,10)

array([[0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1 ],
       [0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.2 ],
       [0.21, 0.22, 0.23, 0.24, 0.25, 0.26, 0.27, 0.28, 0.29, 0.3 ],
       [0.31, 0.32, 0.33, 0.34, 0.35, 0.36, 0.37, 0.38, 0.39, 0.4 ],
       [0.41, 0.42, 0.43, 0.44, 0.45, 0.46, 0.47, 0.48, 0.49, 0.5 ],
       [0.51, 0.52, 0.53, 0.54, 0.55, 0.56, 0.57, 0.58, 0.59, 0.6 ],
       [0.61, 0.62, 0.63, 0.64, 0.65, 0.66, 0.67, 0.68, 0.69, 0.7 ],
       [0.71, 0.72, 0.73, 0.74, 0.75, 0.76, 0.77, 0.78, 0.79, 0.8 ],
       [0.81, 0.82, 0.83, 0.84, 0.85, 0.86, 0.87, 0.88, 0.89, 0.9 ],
       [0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99, 1.  ]])

In [8]:
np.empty((3,3))

array([[0.00000000e+000, 0.00000000e+000, 0.00000000e+000],
       [0.00000000e+000, 0.00000000e+000, 6.81810591e-321],
       [4.00526875e-307, 7.56577398e-307, 9.34600963e-307]])

In [9]:
np.random.seed(0)  # set seed for reproducibility

x1 = np.random.randint(10, size=6)  # One-dimensional array
x2 = np.random.randint(10, size=(3, 4))  # Two-dimensional array
x3 = np.random.randint(10, size=(3, 4, 5))  # Three-dimensional array

In [11]:
x3

array([[[8, 1, 5, 9, 8],
        [9, 4, 3, 0, 3],
        [5, 0, 2, 3, 8],
        [1, 3, 3, 3, 7]],

       [[0, 1, 9, 9, 0],
        [4, 7, 3, 2, 7],
        [2, 0, 0, 4, 5],
        [5, 6, 8, 4, 1]],

       [[4, 9, 8, 1, 1],
        [7, 9, 9, 3, 6],
        [7, 2, 0, 3, 5],
        [9, 4, 4, 6, 4]]])

## Array attributes
* `.ndim` - the number of dimensions
* `.shape` - the size of each dimension
* `.size` - the total number of elements in the array
* `.dtype` - the data type of the array
* `.itemsize` - lists the size (in bytes) of each **element** in the array
* `.nbytes` - lists the **total** size (in bytes) of the array
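The `itemsize` of an integer array depends on the platform's default integer type, so the sketch below fixes the `dtype` explicitly to get predictable numbers. The array `a` is a hypothetical example, not one of the notebook's arrays:

```python
import numpy as np

a = np.zeros((3, 4), dtype=np.float64)  # explicit 8-byte float dtype

print(a.ndim)      # 2
print(a.shape)     # (3, 4)
print(a.size)      # 12 elements
print(a.dtype)     # float64
print(a.itemsize)  # 8 bytes per element
print(a.nbytes)    # 96 bytes = itemsize * size
```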

In [12]:
x3.ndim

3

In [13]:
x2.shape

(3, 4)

In [17]:
x3.size

60

In [19]:
print("size:", x3.size, "elements")
print("itemsize:", x3.itemsize, "bytes")
print("nbytes:", x3.nbytes, "bytes")  # nbytes = itemsize * size

size: 60 elements
itemsize: 4 bytes
nbytes: 240 bytes


## Array indexing and slicing

In [21]:
x2

array([[3, 5, 2, 4],
       [7, 6, 8, 8],
       [1, 6, 7, 7]])

In [20]:
x2[::-1,] # reverse the order of the rows

array([[1, 6, 7, 7],
       [7, 6, 8, 8],
       [3, 5, 2, 4]])

In [28]:
x2[1::-1,2:0:-1] # grab small slice while reversing both rows and cols

array([[8, 6],
       [2, 5]])

## Subarrays as views, not copies

A slice returns a view onto the original array's data, not a copy, so changing a value in the subarray also modifies the original array!

In [30]:
x2

array([[3, 5, 2, 4],
       [7, 6, 8, 8],
       [1, 6, 7, 7]])

In [32]:
x2_sub = x2[:2,:2]
print(x2_sub)

[[3 5]
 [7 6]]


In [33]:
x2_sub[0,0] = 99
print(x2_sub)

[[99  5]
 [ 7  6]]


In [35]:
x2 # is also modified

array([[99,  5,  2,  4],
       [ 7,  6,  8,  8],
       [ 1,  6,  7,  7]])

Use the `.copy()` method to explicitly copy the data, so that the subarray is independent of the original array:

In [36]:
x2_sub_copy = x2[:2, :2].copy()
print(x2_sub_copy)

[[99  5]
 [ 7  6]]


In [37]:
x2_sub_copy[0, 0] = 42
print(x2_sub_copy)

[[42  5]
 [ 7  6]]


In [39]:
print(x2) # unchanged

[[99  5  2  4]
 [ 7  6  8  8]
 [ 1  6  7  7]]
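One way to check whether a subarray is a view or a copy, without modifying anything, is `np.shares_memory`. The sketch below uses a fresh array `a` (a hypothetical stand-in for `x2`):

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
a_view = a[:2, :2]          # basic slicing gives a view
a_copy = a[:2, :2].copy()   # explicit copy owns its own data

print(np.shares_memory(a, a_view))  # True
print(np.shares_memory(a, a_copy))  # False

a_view[0, 0] = 99   # visible through `a`
a_copy[0, 1] = 42   # not visible through `a`
print(a[0, 0], a[0, 1])  # 99 1
```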


## Arithmetic operators implemented by NumPy

Each of these arithmetic operators is a convenient wrapper around a specific function (ufunc) built into NumPy:

| Operator | Equivalent ufunc | Description |
| --- | ---- | ------ |
| `+` | `np.add` | Addition (e.g., 1 + 1 = 2) |
| `-` | `np.subtract` | Subtraction (e.g., 3 - 2 = 1) |
| `-` | `np.negative` | Unary negation (e.g., -2) |
| `*` | `np.multiply` | Multiplication (e.g., 2 * 3 = 6) |
| `/` | `np.divide` | Division (e.g., 3 / 2 = 1.5) |
| `//` | `np.floor_divide` | Floor division (e.g., 3 // 2 = 1) |
| `**` | `np.power` | Exponentiation (e.g., 2 ** 3 = 8) |
| `%` | `np.mod` | Modulus/remainder (e.g., 9 % 4 = 1) |
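The correspondence in the table can be checked directly. This is a small sketch on an illustrative array `v`, comparing each operator with its ufunc:

```python
import numpy as np

v = np.arange(1, 5)  # [1 2 3 4]

# Each pair holds the operator form and the equivalent ufunc call.
pairs = {
    "add": (v + 2, np.add(v, 2)),
    "subtract": (v - 2, np.subtract(v, 2)),
    "negative": (-v, np.negative(v)),
    "multiply": (v * 2, np.multiply(v, 2)),
    "divide": (v / 2, np.divide(v, 2)),
    "floor_divide": (v // 2, np.floor_divide(v, 2)),
    "power": (v ** 2, np.power(v, 2)),
    "mod": (v % 2, np.mod(v, 2)),
}

for name, (via_operator, via_ufunc) in pairs.items():
    print(name, np.array_equal(via_operator, via_ufunc))
```

Every line should print `True`, since the operators dispatch to these ufuncs for NumPy arrays.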

NumPy has many more ufuncs available, including hyperbolic trig functions, bitwise arithmetic, comparison operators, conversions from radians to degrees, rounding and remainders, and much more.

Another excellent source of more specialized ufuncs is the submodule `scipy.special`. If you want to compute some obscure mathematical function on your data, chances are it is implemented there.

## Advanced Ufunc Features

For large calculations, it is sometimes useful to be able to specify the array where the result of the calculation will be stored. Rather than creating a temporary array, this can be used to write computation results directly to the memory location where you'd like them to be. For all ufuncs, this can be done using the `out` argument of the function:

In [42]:
x = np.arange(5)
y = np.zeros(10)
z = np.zeros(10)
print(x)

[0 1 2 3 4]


In [45]:
print("Ufunc using 'out'")
%timeit np.power(2, x, out=y[::2])
print("using assignment operator")
%timeit z[::2] = 2 ** x
print(y)
print(z)

Ufunc using 'out'
2 µs ± 58.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
using assignment operator
1.39 µs ± 13.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
[ 1.  0.  2.  0.  4.  0.  8.  0. 16.  0.]
[ 1.  0.  2.  0.  4.  0.  8.  0. 16.  0.]


In [47]:
%load_ext memory_profiler

In [48]:
print("Ufunc using 'out'")
%memit np.power(2, x, out=y[::2])
print("using assignment operator")
%memit z[::2] = 2 ** x

Ufunc using 'out'
peak memory: 121.23 MiB, increment: 0.10 MiB
using assignment operator
peak memory: 121.23 MiB, increment: 0.00 MiB


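According to the book, the second method should use more memory, but `%memit` does not show a difference here:

> "If we had instead written `y[::2] = 2 ** x`, this would have resulted in the creation of a temporary array to hold the results of `2 ** x`, followed by a second operation copying those values into the `y` array. "

The temporary array is still created. With only five elements it occupies a few dozen bytes, which is far below what `%memit` can resolve, so the measurement above cannot show it. The same caveat applies to the timings: at this size the fixed overhead of each call dominates, and any benefit of `out` would only be expected to show up for large arrays.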
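To make the temporary array visible, the experiment can be repeated with a much larger input. The following is a rough sketch rather than a rigorous benchmark: it uses the standard-library `tracemalloc` module instead of `%memit`, and assumes a NumPy version that reports its array allocations to `tracemalloc`. The helper `peak_bytes` and the array size are illustrative choices:

```python
import tracemalloc
import numpy as np

n = 1_000_000
x_big = np.linspace(0, 10, n)   # float64 input, about 8 MB
y_big = np.zeros(2 * n)
z_big = np.zeros(2 * n)

def peak_bytes(func):
    """Return the peak number of bytes allocated while func() runs."""
    tracemalloc.start()
    func()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak

def with_out():
    np.power(2.0, x_big, out=y_big[::2])   # writes straight into y_big

def with_assignment():
    z_big[::2] = 2.0 ** x_big              # builds a temporary, then copies it

peak_out = peak_bytes(with_out)
peak_assign = peak_bytes(with_assignment)

print("peak with out:       ", peak_out, "bytes")
print("peak with assignment:", peak_assign, "bytes")
```

Under these assumptions, the assignment version should report a peak of roughly the size of `x_big` (the temporary holding `2.0 ** x_big`), while the `out` version should stay close to zero.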

## Aggregation functions

NumPy has fast built-in aggregation functions for working on arrays:

In [50]:
big_array = np.random.rand(1000000)
%timeit sum(big_array)     # built-in Python function
%timeit np.sum(big_array)  # NumPy function much faster

130 ms ± 1.62 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
750 µs ± 20.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


For `min`, `max`, `sum`, and several other NumPy aggregates, a shorter syntax is to use methods of the array object itself:

In [51]:
print(big_array.min(), big_array.max(), big_array.sum())

1.4057692298008462e-06 0.9999994392723005 500209.12067471276


Whenever possible, make sure that you are using the NumPy version of these aggregates when operating on NumPy arrays!

Aggregation functions take an additional argument specifying the axis along which the aggregate is computed.

The `axis` keyword specifies the dimension of the array that will be collapsed, rather than the dimension that will be returned. So specifying `axis=0` means that the first axis will be collapsed (for two-dimensional arrays, this means that values within each column will be aggregated).
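A small sketch of the `axis` behaviour described above, using an illustrative 3x4 array `m`:

```python
import numpy as np

m = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

col_sums = m.sum(axis=0)  # collapse the rows: one value per column
row_sums = m.sum(axis=1)  # collapse the columns: one value per row

print(col_sums)  # [12 15 18 21]
print(row_sums)  # [ 6 22 38]
print(m.sum())   # 66, aggregate over the whole array
```

The shape of the result is the input shape with the collapsed axis removed: `(4,)` for `axis=0` and `(3,)` for `axis=1`.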

### NaN-safe aggregates

Most aggregates have a NaN-safe counterpart that computes the result while ignoring missing values (NaN):

In [65]:
nan_test = np.random.random(5)
print(nan_test)

[0.0049466  0.25863997 0.62346477 0.90474173 0.71661557]


In [66]:
nan_test[2] = float('nan')
print(nan_test)

[0.0049466  0.25863997        nan 0.90474173 0.71661557]


In [70]:
np.sum(nan_test)

nan

In [69]:
np.nansum(nan_test)

1.8849438705821804
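The same pattern holds for other aggregates. This sketch uses a small hand-written array (`data` is illustrative) to compare a few regular aggregates with their NaN-safe counterparts:

```python
import numpy as np

data = np.array([1.0, 2.0, np.nan, 4.0])

print(np.mean(data), np.nanmean(data))  # nan, then the mean of 1, 2 and 4
print(np.max(data), np.nanmax(data))    # nan 4.0
print(np.min(data), np.nanmin(data))    # nan 1.0
```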