<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Why-NumPy?" data-toc-modified-id="Why-NumPy?-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Why NumPy?</a></span></li><li><span><a href="#1D-arrays" data-toc-modified-id="1D-arrays-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>1D arrays</a></span></li><li><span><a href="#2D-arrays" data-toc-modified-id="2D-arrays-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>2D arrays</a></span></li><li><span><a href="#Extending-arrays" data-toc-modified-id="Extending-arrays-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Extending arrays</a></span></li><li><span><a href="#Permuting-dimensions" data-toc-modified-id="Permuting-dimensions-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Permuting dimensions</a></span></li><li><span><a href="#Increasing-and-decreasing-dimensions" data-toc-modified-id="Increasing-and-decreasing-dimensions-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>Increasing and decreasing dimensions</a></span></li><li><span><a href="#Slicing" data-toc-modified-id="Slicing-7"><span class="toc-item-num">7&nbsp;&nbsp;</span>Slicing</a></span></li><li><span><a href="#Fancy-indexing" data-toc-modified-id="Fancy-indexing-8"><span class="toc-item-num">8&nbsp;&nbsp;</span>Fancy indexing</a></span></li><li><span><a href="#Boolean-array-indexing" data-toc-modified-id="Boolean-array-indexing-9"><span class="toc-item-num">9&nbsp;&nbsp;</span>Boolean array indexing</a></span></li><li><span><a href="#Elementwise-(array-to-array)-math" data-toc-modified-id="Elementwise-(array-to-array)-math-10"><span class="toc-item-num">10&nbsp;&nbsp;</span>Elementwise (array-to-array) math</a></span></li><li><span><a href="#Vector-math" data-toc-modified-id="Vector-math-11"><span class="toc-item-num">11&nbsp;&nbsp;</span>Vector math</a></span></li><li><span><a href="#Matrix-math" data-toc-modified-id="Matrix-math-12"><span class="toc-item-num">12&nbsp;&nbsp;</span>Matrix math</a></span></li><li><span><a 
href="#Broadcasting" data-toc-modified-id="Broadcasting-13"><span class="toc-item-num">13&nbsp;&nbsp;</span>Broadcasting</a></span></li><li><span><a href="#How-fast-is-Numpy's-array-math?" data-toc-modified-id="How-fast-is-Numpy's-array-math?-14"><span class="toc-item-num">14&nbsp;&nbsp;</span>How fast is Numpy's array math?</a></span></li><li><span><a href="#Arrays-of-objects" data-toc-modified-id="Arrays-of-objects-15"><span class="toc-item-num">15&nbsp;&nbsp;</span>Arrays of objects</a></span></li><li><span><a href="#Structured-arrays" data-toc-modified-id="Structured-arrays-16"><span class="toc-item-num">16&nbsp;&nbsp;</span>Structured arrays</a></span></li><li><span><a href="#Disk-I/O" data-toc-modified-id="Disk-I/O-17"><span class="toc-item-num">17&nbsp;&nbsp;</span>Disk I/O</a></span></li></ul></div>

# SciPy.org's [NumPy](http://www.numpy.org/)

Basically, NumPy [provides](https://www.oreilly.com/library/view/python-for-data/9781449323592/ch04.html):
1. Fast and space-efficient multidimensional arrays.
2. Vectorized arithmetic operations.
3. Broadcasting capabilities.
4. I/O of array data from/to disk.
5. Linear algebra operators.
6. Random number generators.
7. Fourier transform capabilities.
8. Tools for integrating code written in C, C++, and Fortran.

## Why NumPy?

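Speed: NumPy arrays store values of a single type in compact memory buffers, and operations on them run as compiled loops instead of interpreted Python code.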

In [3]:
try:
    import numpy as np
except ImportError:  # NumPy is not installed yet
    !pip3 install numpy
    import numpy as np

* Let's define a list and time the sum of its elements:

In [2]:
l = list(range(0,100000)); print(type(l), l[:10])
%timeit sum(l)

<class 'list'> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
1.54 ms ± 75.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


* And now, let's create a NumPy array and time the sum of its elements:

In [3]:
A = np.arange(0, 100000); print(type(A), A[:10])
%timeit np.sum(A)

<class 'numpy.ndarray'> [0 1 2 3 4 5 6 7 8 9]
98.7 µs ± 8.7 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


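* And what about a *pure* C implementation of an equivalent computation? The program prints the sum and the time spent inside `sum_array()`, in microseconds ("usegs"); that function is defined in `sum_array_lib.c`, which is not shown here: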

In [4]:
!cat sum_array.c
!gcc -O3 sum_array.c -o sum_array
%timeit !./sum_array

#include <stdio.h>
#include <time.h>
#include "sum_array_lib.c"

#define N 100000

int main() {
  double a[N];
  int i;
  clock_t start, end;
  double cpu_time;
  for(i=0; i<N; i++) {
    a[i] = i;
  }
  start = clock();
  double sum = sum_array(a,N);
  end = clock();
  printf("%f ", sum);
  cpu_time = ((double) (end - start)) / CLOCKS_PER_SEC;
  cpu_time *= 1000000;
  printf("%f usegs\n", cpu_time);
}
4999950000.000000 206.000000 usegs
4999950000.000000 150.000000 usegs
4999950000.000000 150.000000 usegs
4999950000.000000 150.000000 usegs
4999950000.000000 168.000000 usegs
4999950000.000000 171.000000 usegs
4999950000.000000 176.000000 usegs
4999950000.000000 195.000000 usegs
4999950000.000000 152.000000 usegs
4999950000.000000 166.000000 usegs
4999950000.000000 168.000000 usegs
4999950000.000000 173.000000 usegs
4999950000.000000 174.000000 usegs
4999950000.000000 173.000000 usegs
4999950000.000000 171.000000 usegs
4999950000.000000 183.000000 usegs
4999950000.000000 157.000000 usegs

* Another example:

In [5]:
# Example extracted from https://github.com/pyHPC/pyhpc-tutorial
lst = range(1000000)

for i in lst[:10]:
    print(i, end=' ')
print()

%timeit [i + 1 for i in lst] # A Python list comprehension (the interpreter handles each element as a Python int object)
x = [i + 1 for i in lst]

print(x[:10])

arr = np.arange(1000000) # A NumPy array of integers
%timeit arr + 1 # Use operator overloading for nice syntax, now iteration is in C with ints
y = arr + 1

print(y[:10])

0 1 2 3 4 5 6 7 8 9 
207 ms ± 4.56 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
3 ms ± 283 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
[ 1  2  3  4  5  6  7  8  9 10]
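* Outside IPython, where `%timeit` is not available, a similar comparison can be sketched with the standard `timeit` module. The array size and the number of runs below are arbitrary choices, and the measured times will depend on the machine:

```python
import timeit

import numpy as np

# Illustrative size: a Python list and a NumPy array with the same integers
n = 100_000
lst = list(range(n))
arr = np.arange(n)

runs = 10
# timeit.timeit() returns the total time in seconds for `number` calls
t_list = timeit.timeit(lambda: [i + 1 for i in lst], number=runs) / runs
t_numpy = timeit.timeit(lambda: arr + 1, number=runs) / runs

print(f"list comprehension: {t_list * 1e6:.1f} us per loop")
print(f"NumPy vectorized:   {t_numpy * 1e6:.1f} us per loop")

# Both approaches compute the same values
same = np.array_equal(np.array([i + 1 for i in lst]), arr + 1)
print(same)  # True
```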


* Searching NumPy's docstrings for a keyword (note that `np.lookfor()` was removed in NumPy 2.0, so this only works with older versions):

In [6]:
np.lookfor('invert')

Search results for 'invert'
---------------------------
numpy.bitwise_not
    Compute bit-wise inversion, or bit-wise NOT, element-wise.
numpy.matrix.getI
    Returns the (multiplicative) inverse of invertible `self`.
numpy.in1d
    Test whether each element of a 1-D array is also present in a second array.
numpy.isin
    Calculates `element in test_elements`, broadcasting over `element` only.
numpy.transpose
    Permute the dimensions of an array.
numpy.linalg.inv
    Compute the (multiplicative) inverse of a matrix.
numpy.linalg.pinv
    Compute the (Moore-Penrose) pseudo-inverse of a matrix.
numpy.random.SFC64
    BitGenerator for Chris Doty-Humphrey's Small Fast Chaotic PRNG.
numpy.linalg.tensorinv
    Compute the 'inverse' of an N-dimensional array.
numpy.linalg.matrix_power
    Raise a square matrix to the (integer) power `n`.

* Remember that, in IPython, you can press Tab to autocomplete names, or use a wildcard followed by `?` to list NumPy's contents:

In [7]:
np.*?
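* Outside IPython, a rough equivalent of the wildcard search can be sketched with `dir()` and the standard `fnmatch` module. `find_names` is a hypothetical helper written only for this example:

```python
import fnmatch

import numpy as np


def find_names(module, pattern):
    """Return the sorted names in `module` that match a shell-style wildcard."""
    return sorted(name for name in dir(module) if fnmatch.fnmatch(name, pattern))


print(find_names(np, "*inv*"))         # includes 'invert'
print(find_names(np.linalg, "*inv*"))  # includes 'inv', 'pinv' and 'tensorinv'
```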

## 1D arrays
In NumPy, a 1D array is a grid of values, all of the same type (the array's `dtype`), indexed by an integer (negative indices count from the end).

* Creating an empty array:

In [248]:
A = np.array([], dtype=np.uint8)
print(A)

[]


* A is an object of the "numpy.ndarray" class:

In [252]:
print(type(A))

<class 'numpy.ndarray'>


In [255]:
print(f"number of dimensions={A.ndim}, shape={A.shape}, type={A.dtype}")

number of dimensions=1, shape=(0,), type=uint8


* Creating an array using a list:

In [10]:
A = np.array([1, 2, 3])
print(type([1, 2, 3]))
print(type(A))

<class 'list'>
<class 'numpy.ndarray'>


* Native Python's [`len()`](https://docs.python.org/3.6/library/functions.html#len) also works:

In [14]:
print(len(A))

3


* Initialized arrays:

In [257]:
print(np.zeros(10))

[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]


In [259]:
print(np.ones(10))

[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]


In [262]:
print(np.empty(12)) # Uninitialized: arbitrary leftover values from memory, not random numbers

[ 0.          0.17043426  0.          0.60884981 -0.         -0.54254531
  0.          0.43708068 -0.         -0.09102712  0.          0.40845499]


In [264]:
print(np.arange(10))

[0 1 2 3 4 5 6 7 8 9]


In [258]:
print(np.linspace(1., 4., 6))

[1.  1.6 2.2 2.8 3.4 4. ]


In [263]:
print(np.random.rand(10))

[0.12225613 0.26157231 0.56872075 0.52115777 0.9546527  0.14133928
 0.97872724 0.90080234 0.22954118 0.3996211 ]
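* `np.random.rand()` returns different values on every run. When reproducible results are needed, a seeded generator can be used instead; this sketch uses `np.random.default_rng()` with an arbitrary seed of 42:

```python
import numpy as np

# Two generators created with the same seed produce the same sequence
rng = np.random.default_rng(42)
x = rng.random(10)           # 10 floats drawn uniformly from [0, 1)

rng_again = np.random.default_rng(42)
y = rng_again.random(10)

print(x)
print(np.array_equal(x, y))  # True: same seed, same numbers
```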


* Arrays can be created from different types of containers (here, a list and a tuple holding integers, floats and complex numbers; NumPy converts all of them to a common type, `complex128`):

In [16]:
C = [[1,1.0],(1+1j,.3)]
print(type(C), type(C[0]), type(C[1]))
X = np.array(C)
X

<class 'list'> <class 'list'> <class 'tuple'>


array([[1. +0.j, 1. +0.j],
       [1. +1.j, 0.3+0.j]])

* Accessing an element:

In [17]:
print(A, A[0], A[1])

[1 2 3] 1 2


In [18]:
A[0] = 0
print(A)

[0 2 3]


* Appending elements (`np.append()` returns a new array; it does not modify `A` in place, hence the reassignment):

In [19]:
A = np.append(A, 4)
A

array([0, 2, 3, 4])

## 2D arrays
A 2D (and, in general, an N-dimensional) array is a 2D (ND) grid of values, all of the same type, indexed by a tuple of integers (a pair for 2D, N integers for ND).

* Creating a 2D array from a list of two lists:

In [20]:
B = np.array([[1,2,3],[4,5,6]])
print(B)
print(B.shape)
print(B[1, 2]) # [row, column]
print(B[0, 1])

[[1 2 3]
 [4 5 6]]
(2, 3)
6
2


* With zeroes:

In [21]:
A = np.zeros((5,5))
print(A)

[[0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]]


* The default dtype is `float64`:

In [22]:
print(type(A[0][0]))

<class 'numpy.float64'>


* With ones:

In [23]:
A = np.ones((5,5))
print(A)

[[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]


* With an arbitrary scalar:

In [24]:
A = np.full((5,5), 2)
print(A)

[[2 2 2 2 2]
 [2 2 2 2 2]
 [2 2 2 2 2]
 [2 2 2 2 2]
 [2 2 2 2 2]]


* The identity matrix:

In [25]:
A = np.eye(5)
print(A)

[[1. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0.]
 [0. 0. 1. 0. 0.]
 [0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 1.]]


* With random data:

In [26]:
A = np.random.random((5,5))
print(A)

[[0.32890373 0.62877388 0.86907328 0.85953374 0.14791725]
 [0.60882391 0.31353135 0.86486938 0.729404   0.99800731]
 [0.47965179 0.50635352 0.13436516 0.40258373 0.10523697]
 [0.72980591 0.14461203 0.81009289 0.05912546 0.42760044]
 [0.57113811 0.37388072 0.9554644  0.03620718 0.74617999]]


In [27]:
# Each call returns different random values
A = np.random.random((5,5))
print(A)

[[0.7606302  0.21873343 0.25885536 0.43803029 0.32467363]
 [0.92475943 0.88446699 0.8737746  0.52593091 0.27847475]
 [0.21634782 0.7871315  0.21594953 0.69669189 0.2630949 ]
 [0.97300372 0.09730144 0.95424731 0.7666204  0.02014721]
 [0.53326832 0.72583889 0.13127373 0.37063115 0.77342639]]


* Uninitialized (with [arbitrary](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.empty_like.html) contents), with the same shape and dtype as an existing array:

In [28]:
B = np.empty_like(A) # The contents are arbitrary: whatever happened to be in memory
print(B)

[[0.7606302  0.21873343 0.25885536 0.43803029 0.32467363]
 [0.92475943 0.88446699 0.8737746  0.52593091 0.27847475]
 [0.21634782 0.7871315  0.21594953 0.69669189 0.2630949 ]
 [0.97300372 0.09730144 0.95424731 0.7666204  0.02014721]
 [0.53326832 0.72583889 0.13127373 0.37063115 0.77342639]]
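* When the contents matter, the other `*_like` functions copy the shape and dtype of an existing array but also initialize every element. A sketch with an arbitrary seeded random matrix and an arbitrary fill value of 7.5:

```python
import numpy as np

A = np.random.default_rng(0).random((5, 5))

Z = np.zeros_like(A)       # same shape and dtype as A, filled with 0
O = np.ones_like(A)        # ... filled with 1
F = np.full_like(A, 7.5)   # ... filled with 7.5

print(Z.shape, Z.dtype, Z.sum())  # (5, 5) float64 0.0
print(F[0, 0])                    # 7.5
```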


* With a 1D list comprehension:

In [29]:
A = np.array([i for i in range(5)])
print(A, A[1], A.shape)

[0 1 2 3 4] 1 (5,)


* With a 2D list comprehension:

In [30]:
A = np.array([[j+i*4 for j in range(4)] for i in range(5)])
print(A, A.shape)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]
 [16 17 18 19]] (5, 4)


* Accessing a row of a matrix:

In [31]:
A[1] # Get the second row (index 1)

array([4, 5, 6, 7])

* Accessing an element of a matrix:

In [32]:
A[1][2] # [row][column]

6

In [33]:
A[1,2] # [row, column]

6

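* Be careful with chained indexing: `A[1][2]` first creates the intermediate row `A[1]` and then indexes it, so it is slower than `A[1, 2]`: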

In [34]:
timeit A[1][2]

1.22 µs ± 70.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [35]:
timeit A[1,2] # This is faster than A[1][2]

503 ns ± 92.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


* Getting elements of a matrix using "integer array indexing":

In [36]:
print(A)
print(A[[0, 1, 2], [3, 2, 1]])

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]
 [16 17 18 19]]
[3 6 9]


* The same integer array indexing using list comprehensions:

In [37]:
print(A[np.array([i for i in range(3)]), np.array([i for i in range(3,0,-1)])])

[3 6 9]


* The same using [`np.arange()`](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.arange.html):

In [38]:
print(np.arange(3))
print(np.arange(3,0,-1))
print(A[np.arange(3), np.arange(3,0,-1)])

[0 1 2]
[3 2 1]
[3 6 9]
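* Integer array indexing pairs the index arrays element by element, so `A[[0, 1, 2], [3, 2, 1]]` selects `A[0, 3]`, `A[1, 2]` and `A[2, 1]`, not a $3\times 3$ block. A small check of that equivalence with an explicit loop:

```python
import numpy as np

A = np.arange(20).reshape(5, 4)
rows = [0, 1, 2]
cols = [3, 2, 1]

fancy = A[rows, cols]
# The same selection, built one (row, column) pair at a time
explicit = np.array([A[r, c] for r, c in zip(rows, cols)])

print(fancy, explicit)  # [3 6 9] [3 6 9]
```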


* Reshaping:

In [39]:
A.shape

(5, 4)

In [40]:
np.reshape(A, (10, 2))

array([[ 0,  1],
       [ 2,  3],
       [ 4,  5],
       [ 6,  7],
       [ 8,  9],
       [10, 11],
       [12, 13],
       [14, 15],
       [16, 17],
       [18, 19]])

In [41]:
B = np.reshape(A, (10, 2), order='C') # C -> C-like (row-major) order, the default
B

array([[ 0,  1],
       [ 2,  3],
       [ 4,  5],
       [ 6,  7],
       [ 8,  9],
       [10, 11],
       [12, 13],
       [14, 15],
       [16, 17],
       [18, 19]])

In [42]:
print(np.isfortran(B))

False

In [43]:
# As you can see, by default (order='C') NumPy reads the source matrix A and
# "writes" the destination matrix B row by row, i.e., the last index changes fastest:
# A[0, 0] == B[0, 0]
# A[0, 1] == B[0, 1]
# A[0, 2] == B[1, 0]
# A[0, 3] == B[1, 1]
# :
A

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19]])

In [44]:
print(A[0, 3], B[1, 1])

3 3


In [45]:
B = np.reshape(A, (10, 2), order='F') # F -> Fortran-like (column-major) order
B

array([[ 0,  2],
       [ 4,  6],
       [ 8, 10],
       [12, 14],
       [16, 18],
       [ 1,  3],
       [ 5,  7],
       [ 9, 11],
       [13, 15],
       [17, 19]])

In [46]:
# Using the Fortran ordering, NumPy reads A and writes B column by column,
# i.e., the first index changes fastest:
# A[0, 0] == B[0, 0]
# A[1, 0] == B[1, 0]
# A[2, 0] == B[2, 0]
# A[3, 0] == B[3, 0]
# A[4, 0] == B[4, 0]
# A[0, 1] == B[5, 0]
# :
A

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19]])

In [47]:
print(A[0, 1], B[5, 0])

1 1


In [48]:
np.isfortran(B)

True
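* The same two orderings can be seen by flattening an array with `ravel()`. The sketch also uses `-1` in `reshape()`, which asks NumPy to infer that dimension from the total number of elements:

```python
import numpy as np

A = np.arange(20).reshape(5, 4)

c_order = A.ravel(order='C')  # last index changes fastest (row by row)
f_order = A.ravel(order='F')  # first index changes fastest (column by column)
print(c_order[:6])            # first elements: 0, 1, 2, 3, 4, 5
print(f_order[:6])            # first elements: 0, 4, 8, 12, 16, 1

B = A.reshape(-1, 2)          # the -1 is inferred as 20 // 2 = 10
print(B.shape)                # (10, 2)
```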

* Views and copies:

In [49]:
# https://stackoverflow.com/questions/56090021/list-comprehension-python-prime-numbers
Primes_less_than_100 = np.array([x for x in range(2,100) if not any([x % y == 0 for y in range(2, int(x/2)+1)])])
Primes_less_than_100

array([ 2,  3,  5,  7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59,
       61, 67, 71, 73, 79, 83, 89, 97])

In [50]:
A = Primes_less_than_100 # No copy at all: A is just another name for the same array object
A

array([ 2,  3,  5,  7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59,
       61, 67, 71, 73, 79, 83, 89, 97])

In [51]:
A[0]=1
A

array([ 1,  3,  5,  7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59,
       61, 67, 71, 73, 79, 83, 89, 97])

In [52]:
Primes_less_than_100

array([ 1,  3,  5,  7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59,
       61, 67, 71, 73, 79, 83, 89, 97])

In [53]:
id(A)

4707562320

In [54]:
id(Primes_less_than_100)

4707562320

In [55]:
Primes_less_than_100 = np.array([x for x in range(2,100) if not any([x % y == 0 for y in range(2, int(x/2)+1)])])
A = np.copy(Primes_less_than_100)

In [56]:
id(A)

4707554448

In [57]:
id(Primes_less_than_100)

4707557744

In [58]:
Primes_less_than_100[0]

2

In [59]:
A[0] = 1

In [60]:
print(Primes_less_than_100[0], A[0])

2 1


In [61]:
timeit A=Primes_less_than_100 # Only binds a name, so it takes constant time regardless of the array size

74.8 ns ± 6.22 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)


In [62]:
%timeit A = np.copy(Primes_less_than_100)

5.77 µs ± 1.04 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
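* Besides plain aliasing (same object) and `np.copy()` (independent data), NumPy also has views: different array objects that share the same memory. A sketch distinguishing the three cases with `is`, the `.base` attribute and `np.shares_memory()`; the array `P` is an arbitrary example:

```python
import numpy as np

P = np.array([2, 3, 5, 7, 11, 13])

alias = P         # same object: no new array is created
view = P[1:4]     # basic slicing: a new array object over the same memory
copy = P.copy()   # independent data buffer

print(alias is P)                 # True
print(view is P, view.base is P)  # False True
print(np.shares_memory(P, view))  # True
print(np.shares_memory(P, copy))  # False

view[0] = -1      # writes through to P (P[1] becomes -1), but not to copy
print(P, copy)
```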


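* An example with a 4-dimensional array (`np.ndarray(shape)` is a low-level constructor that, like `np.empty()`, leaves the contents uninitialized, hence the arbitrary values below):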

In [63]:
A = np.ndarray((2,3,4,2))
A

array([[[[2.68156159e+154, 2.68156159e+154],
         [2.32501267e-314, 2.21594484e-314],
         [2.23082969e-314, 2.32506105e-314],
         [2.21480364e-314, 2.32504491e-314]],

        [[2.32506603e-314, 2.32506606e-314],
         [2.21473616e-314, 2.21462334e-314],
         [2.21462343e-314, 2.21462337e-314],
         [2.21462299e-314, 2.21525954e-314]],

        [[2.21481780e-314, 2.21525960e-314],
         [2.21480139e-314, 2.32506625e-314],
         [2.21540777e-314, 2.21540764e-314],
         [2.21475121e-314, 2.21539591e-314]]],


       [[[2.21671805e-314, 2.32507355e-314],
         [2.21466074e-314, 2.21557005e-314],
         [2.21575167e-314, 2.21551351e-314],
         [2.21475137e-314, 2.32506634e-314]],

        [[2.21491876e-314, 2.21656150e-314],
         [2.21464591e-314, 2.21487611e-314],
         [2.21551446e-314, 2.21579904e-314],
         [2.21664355e-314, 2.32506644e-314]],

        [[2.21486662e-314, 2.21472591e-314],
         [2.21475029e-314, 2.21475162e-314]

In [64]:
A.shape

(2, 3, 4, 2)

In [65]:
# The same can be done with:
A = np.ndarray(2*3*4*2).reshape(2, 3, 4, 2)
A.shape

(2, 3, 4, 2)

## Extending arrays

* Appending to the end of a 1D vector
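* A minimal sketch of common ways to extend arrays with `np.append()` and `np.concatenate()` (the values are arbitrary). Each call allocates a new array, so growing an array element by element inside a loop is slow; it is usually better to collect the values in a Python list and convert it once:

```python
import numpy as np

v = np.array([1, 2, 3])

# np.append() never works in place: it returns a new, longer array
w = np.append(v, 4)
print(v, w)  # [1 2 3] [1 2 3 4]

# np.concatenate() joins a sequence of array-likes along an existing axis
x = np.concatenate([v, [4, 5], np.array([6])])
print(x)     # [1 2 3 4 5 6]

# For 2D arrays, the axis decides whether rows or columns are added
M = np.zeros((2, 3), dtype=int)
more_rows = np.concatenate([M, np.ones((1, 3), dtype=int)], axis=0)
more_cols = np.concatenate([M, np.ones((2, 1), dtype=int)], axis=1)
print(more_rows.shape, more_cols.shape)  # (3, 3) (2, 4)
```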

## Permuting dimensions

* Simplest case: transposing a matrix:

In [178]:
A = np.arange(20).reshape(5,4)
print(A, A.shape)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]
 [16 17 18 19]] (5, 4)


In [180]:
print(np.transpose(A), np.transpose(A).shape)

[[ 0  4  8 12 16]
 [ 1  5  9 13 17]
 [ 2  6 10 14 18]
 [ 3  7 11 15 19]] (4, 5)


In [179]:
print(A.T, A.T.shape)

[[ 0  4  8 12 16]
 [ 1  5  9 13 17]
 [ 2  6 10 14 18]
 [ 3  7 11 15 19]] (4, 5)


* Transposing only has an effect when the number of dimensions is greater than 1; a 1D array is returned unchanged:

In [175]:
A = np.arange(10)
print(A.shape)

(10,)


In [176]:
print(A.T.shape)

(10,)


* By default, transposing reverses the order of all the dimensions:

In [183]:
A = np.arange(60).reshape(3,5,4)
print(A, A.shape)

[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]
  [12 13 14 15]
  [16 17 18 19]]

 [[20 21 22 23]
  [24 25 26 27]
  [28 29 30 31]
  [32 33 34 35]
  [36 37 38 39]]

 [[40 41 42 43]
  [44 45 46 47]
  [48 49 50 51]
  [52 53 54 55]
  [56 57 58 59]]] (3, 5, 4)


In [186]:
print(A.T, A.T.shape)

[[[ 0 20 40]
  [ 4 24 44]
  [ 8 28 48]
  [12 32 52]
  [16 36 56]]

 [[ 1 21 41]
  [ 5 25 45]
  [ 9 29 49]
  [13 33 53]
  [17 37 57]]

 [[ 2 22 42]
  [ 6 26 46]
  [10 30 50]
  [14 34 54]
  [18 38 58]]

 [[ 3 23 43]
  [ 7 27 47]
  [11 31 51]
  [15 35 55]
  [19 39 59]]] (4, 5, 3)
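* Other permutations can be requested explicitly. A sketch using the same $(3, 5, 4)$ array with `transpose()` and an axes tuple, `np.swapaxes()` and `np.moveaxis()`; all of them return views:

```python
import numpy as np

A = np.arange(60).reshape(3, 5, 4)

# With no arguments, the axes are reversed: (3, 5, 4) -> (4, 5, 3)
print(A.transpose().shape)         # (4, 5, 3)

# Explicit permutation: new axis i is old axis axes[i]
print(A.transpose(1, 0, 2).shape)  # (5, 3, 4)

# Exchange only two axes
print(np.swapaxes(A, 0, 2).shape)  # (4, 5, 3)

# Move one axis to a new position, keeping the others in order
print(np.moveaxis(A, 0, -1).shape) # (5, 4, 3)

# With the full reversal, element (i, j, k) ends up at (k, j, i)
print(A.T[2, 1, 0] == A[0, 1, 2])  # True
```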


## Increasing and decreasing dimensions

* Shape and dimensions:

In [192]:
A = np.arange(5)
print(A, A.shape, A.ndim)

[0 1 2 3 4] (5,) 1


* Increasing the dimensions on the right:

In [195]:
B = A[:, None]
print(B, B.shape, B.ndim)

[[0]
 [1]
 [2]
 [3]
 [4]] (5, 1) 2


* Increasing the dimensions on the left:

In [194]:
B = A[None, :]
print(B, B.shape, B.ndim)

[[0 1 2 3 4]] (1, 5) 2


* For convenience, NumPy provides the `np.newaxis` object, which is an alias for `None` (both are equivalent):

In [399]:
B = A[np.newaxis, :]
print(B, B.shape, B.ndim)

[[0 1 2 3 4]] (1, 5) 2
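* Dimensions can also be added with `np.expand_dims()` and removed with `np.squeeze()`, which drops axes of length 1. A short sketch of both directions:

```python
import numpy as np

A = np.arange(5)

B = np.expand_dims(A, axis=0)  # same as A[np.newaxis, :]
C = np.expand_dims(A, axis=1)  # same as A[:, np.newaxis]
print(B.shape, C.shape)        # (1, 5) (5, 1)

# np.squeeze() removes the dimensions of length 1
print(np.squeeze(C).shape)     # (5,)

# An axis can be given explicitly; it must have length 1
D = np.zeros((1, 3, 1))
print(np.squeeze(D, axis=0).shape)  # (3, 1)
print(np.squeeze(D).shape)          # (3,)

# reshape() can also drop length-1 dimensions
print(C.reshape(-1).shape)     # (5,)
```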


## Slicing

In [66]:
A = np.arange(50).reshape(5,10)
print(A)

[[ 0  1  2  3  4  5  6  7  8  9]
 [10 11 12 13 14 15 16 17 18 19]
 [20 21 22 23 24 25 26 27 28 29]
 [30 31 32 33 34 35 36 37 38 39]
 [40 41 42 43 44 45 46 47 48 49]]


* Get all rows of a matrix (the whole matrix):

In [67]:
print(A[:])

[[ 0  1  2  3  4  5  6  7  8  9]
 [10 11 12 13 14 15 16 17 18 19]
 [20 21 22 23 24 25 26 27 28 29]
 [30 31 32 33 34 35 36 37 38 39]
 [40 41 42 43 44 45 46 47 48 49]]


In [68]:
print(A[:,:])

[[ 0  1  2  3  4  5  6  7  8  9]
 [10 11 12 13 14 15 16 17 18 19]
 [20 21 22 23 24 25 26 27 28 29]
 [30 31 32 33 34 35 36 37 38 39]
 [40 41 42 43 44 45 46 47 48 49]]


In [69]:
timeit A[:]

527 ns ± 24 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [70]:
timeit A[:,:] # This is slightly slower than 'A[:]'

581 ns ± 31.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [71]:
# Notation: [start index : stop index : step]
# By default, start = 0, stop = length of the axis, step = 1
print(A[::])

[[ 0  1  2  3  4  5  6  7  8  9]
 [10 11 12 13 14 15 16 17 18 19]
 [20 21 22 23 24 25 26 27 28 29]
 [30 31 32 33 34 35 36 37 38 39]
 [40 41 42 43 44 45 46 47 48 49]]


In [72]:
timeit A[::] # Equivalent to 'A[:]'

475 ns ± 25.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [73]:
timeit A # Much faster: evaluating 'A' returns the existing object, while 'A[:]' creates a new view

46.6 ns ± 1.42 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)


In [74]:
print(A[0:])

[[ 0  1  2  3  4  5  6  7  8  9]
 [10 11 12 13 14 15 16 17 18 19]
 [20 21 22 23 24 25 26 27 28 29]
 [30 31 32 33 34 35 36 37 38 39]
 [40 41 42 43 44 45 46 47 48 49]]


In [75]:
print(A[0::])

[[ 0  1  2  3  4  5  6  7  8  9]
 [10 11 12 13 14 15 16 17 18 19]
 [20 21 22 23 24 25 26 27 28 29]
 [30 31 32 33 34 35 36 37 38 39]
 [40 41 42 43 44 45 46 47 48 49]]


In [76]:
print(A[:A.shape[0]])

[[ 0  1  2  3  4  5  6  7  8  9]
 [10 11 12 13 14 15 16 17 18 19]
 [20 21 22 23 24 25 26 27 28 29]
 [30 31 32 33 34 35 36 37 38 39]
 [40 41 42 43 44 45 46 47 48 49]]


In [77]:
print(A[:A.shape[0]:])

[[ 0  1  2  3  4  5  6  7  8  9]
 [10 11 12 13 14 15 16 17 18 19]
 [20 21 22 23 24 25 26 27 28 29]
 [30 31 32 33 34 35 36 37 38 39]
 [40 41 42 43 44 45 46 47 48 49]]


In [78]:
print(A[:A.shape[0]:1])

[[ 0  1  2  3  4  5  6  7  8  9]
 [10 11 12 13 14 15 16 17 18 19]
 [20 21 22 23 24 25 26 27 28 29]
 [30 31 32 33 34 35 36 37 38 39]
 [40 41 42 43 44 45 46 47 48 49]]


* Get all rows of a matrix, except the first one:

In [79]:
print(A[1:])

[[10 11 12 13 14 15 16 17 18 19]
 [20 21 22 23 24 25 26 27 28 29]
 [30 31 32 33 34 35 36 37 38 39]
 [40 41 42 43 44 45 46 47 48 49]]


In [80]:
print(A[1::])

[[10 11 12 13 14 15 16 17 18 19]
 [20 21 22 23 24 25 26 27 28 29]
 [30 31 32 33 34 35 36 37 38 39]
 [40 41 42 43 44 45 46 47 48 49]]


* Get the first two rows of a matrix:

In [81]:
print(A[0:2])

[[ 0  1  2  3  4  5  6  7  8  9]
 [10 11 12 13 14 15 16 17 18 19]]


* Get the even rows of a matrix:

In [82]:
print(A[0::2])

[[ 0  1  2  3  4  5  6  7  8  9]
 [20 21 22 23 24 25 26 27 28 29]
 [40 41 42 43 44 45 46 47 48 49]]


* Get the odd rows of a matrix:

In [83]:
print(A[1::2])

[[10 11 12 13 14 15 16 17 18 19]
 [30 31 32 33 34 35 36 37 38 39]]


* Get the odd columns of a matrix:

In [84]:
print(A[:,1::2])

[[ 1  3  5  7  9]
 [11 13 15 17 19]
 [21 23 25 27 29]
 [31 33 35 37 39]
 [41 43 45 47 49]]


* Getting the second row:

In [85]:
print(A[1,:])

[10 11 12 13 14 15 16 17 18 19]


* Getting the third column:

In [86]:
print(A[:,2])

[ 2 12 22 32 42]


* Getting a top-left $2\times 2$ submatrix:

In [87]:
print(A[:2,:2])

[[ 0  1]
 [10 11]]


* Getting a bottom-right $2\times 2$ submatrix:

In [88]:
print(A[A.shape[0]-2:,A.shape[1]-2:])

[[38 39]
 [48 49]]
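* Negative indices count from the end of each axis, so the same bottom-right block can be written without `A.shape`, and a negative step reverses an axis:

```python
import numpy as np

A = np.arange(50).reshape(5, 10)

corner = A[-2:, -2:]
print(corner)       # same block as A[A.shape[0]-2:, A.shape[1]-2:]

first_col_reversed = A[::-1, 0]
print(first_col_reversed)  # first column read from the bottom: 40, 30, 20, 10, 0
```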


* Slices are [views](https://docs.scipy.org/doc/numpy/glossary.html#term-view) of the same data:

In [299]:
A = np.arange(10)
print(A)
B = A[1:3] # B is simply a new view of A
B[:] = 1000
print(B)
print(A)

[0 1 2 3 4 5 6 7 8 9]
[1000 1000]
[   0 1000 1000    3    4    5    6    7    8    9]


In [300]:
A = np.arange(10)
print(A)
B = A[::-1]
B[1] = 1000
print(B)
print(A)

[0 1 2 3 4 5 6 7 8 9]
[   9 1000    7    6    5    4    3    2    1    0]
[   0    1    2    3    4    5    6    7 1000    9]


* Copying slices:

In [301]:
A = np.arange(10)
print(A)
B = A[::-1].copy()
B[1] = 1000
print(B)
print(A)

[0 1 2 3 4 5 6 7 8 9]
[   9 1000    7    6    5    4    3    2    1    0]
[0 1 2 3 4 5 6 7 8 9]


* Ellipsis (`...` stands for as many `:` as needed to index all the remaining dimensions):

In [396]:
A = np.arange(27).reshape(3,3,3)
print(A)

[[[ 0  1  2]
  [ 3  4  5]
  [ 6  7  8]]

 [[ 9 10 11]
  [12 13 14]
  [15 16 17]]

 [[18 19 20]
  [21 22 23]
  [24 25 26]]]


In [397]:
print(A[1,:,:])

[[ 9 10 11]
 [12 13 14]
 [15 16 17]]


In [398]:
print(A[1,...])

[[ 9 10 11]
 [12 13 14]
 [15 16 17]]
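* The ellipsis can also appear first, which is handy for indexing the last axis without knowing how many axes precede it. A quick check on the same $3\times 3\times 3$ array:

```python
import numpy as np

A = np.arange(27).reshape(3, 3, 3)

last = A[..., 0]   # equivalent to A[:, :, 0]
print(last)        # rows: [0 3 6], [9 12 15], [18 21 24]
```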


## Fancy indexing
Also called [advanced indexing](https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#advanced-indexing).

* Lets you select an arbitrary list of elements by passing lists (or arrays) of indices, one per dimension:

In [4]:
A = np.arange(100).reshape(10,10)
print(A)

[[ 0  1  2  3  4  5  6  7  8  9]
 [10 11 12 13 14 15 16 17 18 19]
 [20 21 22 23 24 25 26 27 28 29]
 [30 31 32 33 34 35 36 37 38 39]
 [40 41 42 43 44 45 46 47 48 49]
 [50 51 52 53 54 55 56 57 58 59]
 [60 61 62 63 64 65 66 67 68 69]
 [70 71 72 73 74 75 76 77 78 79]
 [80 81 82 83 84 85 86 87 88 89]
 [90 91 92 93 94 95 96 97 98 99]]


In [6]:
print(A[[0, 1, 2], [0, 1, 2]])

[ 0 11 22]


* Selecting arbitrary rows and columns of a matrix:

In [382]:
lst_of_rows = [1, 2, 4] # Using a list
print(A[lst_of_rows])

[[10 11 12 13 14 15 16 17 18 19]
 [20 21 22 23 24 25 26 27 28 29]
 [40 41 42 43 44 45 46 47 48 49]]


In [383]:
arr_of_rows = np.array([1, 2, 4]) # Using an ndarray
print(A[arr_of_rows])

[[10 11 12 13 14 15 16 17 18 19]
 [20 21 22 23 24 25 26 27 28 29]
 [40 41 42 43 44 45 46 47 48 49]]


In [387]:
lst_of_columns = [1, 2, 5]
sub_matrix = A[lst_of_rows][:, lst_of_columns]
print(sub_matrix)

[[11 12 15]
 [21 22 25]
 [41 42 45]]
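* The same submatrix can be selected in a single indexing step with `np.ix_()`, which builds index arrays that broadcast to a (rows × columns) grid. Without it, two index lists are paired element by element instead:

```python
import numpy as np

A = np.arange(100).reshape(10, 10)
rows = [1, 2, 4]
cols = [1, 2, 5]

sub = A[np.ix_(rows, cols)]   # 3x3 block: every row combined with every column
print(sub)

paired = A[rows, cols]        # A[1, 1], A[2, 2], A[4, 5]
print(paired)                 # [11 22 45]
```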


* Be careful: reading with advanced indexing always returns a copy of the data (in contrast with basic slicing, which returns a view):

In [13]:
B = A[[0, 1, 2], [0, 1, 2]]
print(B)

[ 0 11 22]


In [8]:
print(A)

[[ 0  1  2  3  4  5  6  7  8  9]
 [10 11 12 13 14 15 16 17 18 19]
 [20 21 22 23 24 25 26 27 28 29]
 [30 31 32 33 34 35 36 37 38 39]
 [40 41 42 43 44 45 46 47 48 49]
 [50 51 52 53 54 55 56 57 58 59]
 [60 61 62 63 64 65 66 67 68 69]
 [70 71 72 73 74 75 76 77 78 79]
 [80 81 82 83 84 85 86 87 88 89]
 [90 91 92 93 94 95 96 97 98 99]]


In [18]:
B[...] = -1

In [19]:
print(B)

[-1 -1 -1]


In [20]:
print(A)

[[ 0  1  2  3  4  5  6  7  8  9]
 [10 11 12 13 14 15 16 17 18 19]
 [20 21 22 23 24 25 26 27 28 29]
 [30 31 32 33 34 35 36 37 38 39]
 [40 41 42 43 44 45 46 47 48 49]
 [50 51 52 53 54 55 56 57 58 59]
 [60 61 62 63 64 65 66 67 68 69]
 [70 71 72 73 74 75 76 77 78 79]
 [80 81 82 83 84 85 86 87 88 89]
 [90 91 92 93 94 95 96 97 98 99]]
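* The copy only happens when reading. When the index arrays appear on the left-hand side of an assignment, NumPy writes directly into the original array:

```python
import numpy as np

A = np.arange(100).reshape(10, 10)

# Reading: B is a copy, so modifying it leaves A untouched
B = A[[0, 1, 2], [0, 1, 2]]
B[...] = -1
unchanged = (A[0, 0], A[1, 1], A[2, 2]) == (0, 11, 22)
print(unchanged)  # True

# Writing: the assignment goes straight into A
A[[0, 1, 2], [0, 1, 2]] = -1
print(A[0, 0], A[1, 1], A[2, 2])  # -1 -1 -1
```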


## Boolean array indexing

* Finding the elements greater than 12:

In [308]:
A = np.arange(20)
print(A, A.shape)
bool_idx = (A>12)
print(bool_idx, bool_idx.shape)

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19] (20,)
[False False False False False False False False False False False False
 False  True  True  True  True  True  True  True] (20,)


* Printing the elements greater than 12:

In [309]:
print(A[bool_idx])

[13 14 15 16 17 18 19]


* Getting the elements of an array that are smaller than 0:

In [344]:
A = (100*(0.5-np.random.rand(25))).astype(np.int16).reshape(5,5)
print(A)
print(A[A<0]) # The result is always 1D, and A[A<0].size <= A.size

[[ 14 -31  30 -20  29]
 [ -8 -44 -48  32  -4]
 [ 33  26 -35  27  19]
 [ -3  48  44  11  -5]
 [-26 -35   1   9   9]]
[-31 -20  -8 -44 -48  -4 -35  -3  -5 -26 -35]


* Setting to 0 the elements smaller than 0:

In [345]:
A[A<0] = 0
print(A)

[[14  0 30  0 29]
 [ 0  0  0 32  0]
 [33 26  0 27 19]
 [ 0 48 44 11  0]
 [ 0  0  1  9  9]]
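* Conditions can be combined with `&` (and), `|` (or) and `~` (not); the parentheses are required because these operators bind more tightly than `<` and `>`. `np.where(cond, x, y)` builds a new array instead of modifying one in place. A sketch on an arbitrary range of integers:

```python
import numpy as np

A = np.arange(-5, 6)       # -5, -4, ..., 4, 5

mask = (A > -3) & (A < 3)
print(A[mask])             # [-2 -1  0  1  2]
print(np.count_nonzero(mask))  # 5 selected elements

# Take 0 where A < 0 and A itself elsewhere; A is not modified
clipped = np.where(A < 0, 0, A)
print(clipped)             # [0 0 0 0 0 0 1 2 3 4 5]
```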


## Elementwise (array-to-array) math

* Assignment:

In [266]:
A = np.zeros((5,5), np.int32)
print(A)
A[:,:] = 1
print(A)

[[0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]]
[[1 1 1 1 1]
 [1 1 1 1 1]
 [1 1 1 1 1]
 [1 1 1 1 1]
 [1 1 1 1 1]]


In [269]:
A = np.zeros((5,5), np.int32)
print(A)
A[1:4,1:4] = 1 # Set to 1 rows 1 to 3 and columns 1 to 3 (the stop index 4 is excluded)
print(A)

[[0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]]
[[0 0 0 0 0]
 [0 1 1 1 0]
 [0 1 1 1 0]
 [0 1 1 1 0]
 [0 0 0 0 0]]


* Array-scalar addition:

In [270]:
A[1:4, 1:4] += 1
print(A)

[[0 0 0 0 0]
 [0 2 2 2 0]
 [0 2 2 2 0]
 [0 2 2 2 0]
 [0 0 0 0 0]]


* Elementwise addition:

In [277]:
B = np.ones((5,5), np.int32)
print(A)
print(B)
C = A + B
print(C)

[[0 0 0 0 0]
 [0 2 2 2 0]
 [0 2 2 2 0]
 [0 2 2 2 0]
 [0 0 0 0 0]]
[[1 1 1 1 1]
 [1 1 1 1 1]
 [1 1 1 1 1]
 [1 1 1 1 1]
 [1 1 1 1 1]]
[[1 1 1 1 1]
 [1 3 3 3 1]
 [1 3 3 3 1]
 [1 3 3 3 1]
 [1 1 1 1 1]]


* Elementwise subtraction:

In [278]:
D = C - B
print(D)

[[0 0 0 0 0]
 [0 2 2 2 0]
 [0 2 2 2 0]
 [0 2 2 2 0]
 [0 0 0 0 0]]


* Elementwise multiplication (not matrix multiplication!):

In [279]:
C = C * D
print(C)

[[0 0 0 0 0]
 [0 6 6 6 0]
 [0 6 6 6 0]
 [0 6 6 6 0]
 [0 0 0 0 0]]


* Floating-point (true) division:

In [280]:
C = C / B
print(C)

[[0. 0. 0. 0. 0.]
 [0. 6. 6. 6. 0.]
 [0. 6. 6. 6. 0.]
 [0. 6. 6. 6. 0.]
 [0. 0. 0. 0. 0.]]


In [282]:
print(1/C)

[[       inf        inf        inf        inf        inf]
 [       inf 0.16666667 0.16666667 0.16666667        inf]
 [       inf 0.16666667 0.16666667 0.16666667        inf]
 [       inf 0.16666667 0.16666667 0.16666667        inf]
 [       inf        inf        inf        inf        inf]]


* Integer (floor) division:

In [283]:
C = D // B
print(C)

[[0 0 0 0 0]
 [0 2 2 2 0]
 [0 2 2 2 0]
 [0 2 2 2 0]
 [0 0 0 0 0]]
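* With negative operands, `//` rounds toward minus infinity (it is not truncation), and `%` returns a remainder with the sign of the divisor, so that `a == b * (a // b) + a % b` always holds. A small check with arbitrary values:

```python
import numpy as np

a = np.array([7, -7, 7, -7])
b = np.array([2, 2, -2, -2])

q = a // b   # 3, -4, -4, 3
r = a % b    # 1, 1, -1, -1
t = a / b    # 3.5, -3.5, -3.5, 3.5 (true division always returns floats)

print(q, r, t)
print(np.array_equal(b * q + r, a))  # True
```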


* Absolute value:

In [219]:
A = (100*(np.random.rand(25) - 0.5)).astype(np.int16).reshape(5,5)
print(A)

[[ 31  17 -22 -45   6]
 [-45 -33 -28 -20  27]
 [  1  17  44 -38 -27]
 [ 12  39 -10  33  -5]
 [  0   0 -47 -47 -22]]


In [220]:
print(np.absolute(A))

[[31 17 22 45  6]
 [45 33 28 20 27]
 [ 1 17 44 38 27]
 [12 39 10 33  5]
 [ 0  0 47 47 22]]


## Vector math

In [236]:
A = np.arange(10)
print(A)

[0 1 2 3 4 5 6 7 8 9]


* Sum of all elements of an array:

In [237]:
print(np.sum(A))

45


* Compute the maximum of an array:

In [238]:
print(np.max(A))

9


* Scalar (dot) product:

In [243]:
B = A[::-1]
c = np.dot(A,B)
print(c)

120


In [240]:
c = sum(A_i*B_i for A_i, B_i in zip(A, B))
print(c)

120


In [247]:
c = sum(A[:]*B[:])
print(c)

120


In [244]:
c = A @ B
print(c)

120


In [245]:
c = np.inner(A, B)
print(c)

120


* Magnitude (norm):

In [225]:
import math
A = np.arange(10)+1
print(A)
print(np.linalg.norm(A)) # L2 norm
print(math.sqrt(sum(A_i**2 for A_i in A)))

[ 1  2  3  4  5  6  7  8  9 10]
19.621416870348583
19.621416870348583


In [233]:
print(np.linalg.norm(A, ord=1)) # L1 norm
print(sum(A))

55.0
55


In [231]:
print(np.linalg.norm(A, ord=0)) # "L0 norm": the number of nonzero elements (not a true norm)
print(np.count_nonzero(A))

10.0
10


In [232]:
print(np.linalg.norm(A, ord=4)) # L4 norm

12.61599928712066
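* All these norms are instances of the $p$-norm $\|a\|_p = \left(\sum_i |a_i|^p\right)^{1/p}$; as $p$ grows it approaches the largest absolute value, the $L_\infty$ norm. A sketch checking `np.linalg.norm()` against that formula; `p_norm` is a helper defined only for this example:

```python
import numpy as np

A = np.arange(10) + 1


def p_norm(x, p):
    """(sum |x_i|^p)^(1/p), written out explicitly."""
    return np.sum(np.abs(x) ** p) ** (1.0 / p)


print(np.isclose(np.linalg.norm(A, ord=4), p_norm(A, 4)))  # True

# L-infinity norm: the largest absolute value
print(np.linalg.norm(A, ord=np.inf))  # 10.0
print(np.max(np.abs(A)))              # 10
```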


## Matrix math

* Let's define a "chessboard" matrix:

In [100]:
A = np.array([[(i+j)%2 for j in range(10)] for i in range(10)])
print(A, A.shape)

[[0 1 0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0 1 0]] (10, 10)


... and a 1-column matrix:

In [101]:
B = np.array([[1] for i in range(10)])
print(B, B.shape)

[[1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]] (10, 1)


* Matrix-matrix product:

In [102]:
C = A @ B
print(C)

[[5]
 [5]
 [5]
 [5]
 [5]
 [5]
 [5]
 [5]
 [5]
 [5]]


* Matrix transpose:

In [106]:
print(C.T, C.T.shape, C.shape)

[[5 5 5 5 5 5 5 5 5 5]] (1, 10) (10, 1)


* Determinant (the chessboard matrix is singular, since it only has two distinct rows):

In [107]:
print(np.linalg.det(A))

0.0
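* Singularity can also be checked through the rank. Computed determinants of singular matrices may come out as tiny nonzero numbers, so this sketch compares with a tolerance (`np.isclose()`) rather than testing for exactly 0:

```python
import numpy as np

# The same "chessboard" matrix as above
A = np.array([[(i + j) % 2 for j in range(10)] for i in range(10)])

rank = np.linalg.matrix_rank(A)
print(rank)  # 2: only two linearly independent rows

print(np.isclose(np.linalg.det(A), 0.0))
```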


* Inverse:

In [108]:
R = np.random.rand(5,5)
iR = np.linalg.inv(R)
print(iR)

[[14.61741315 -6.37783338 -3.34302457 -5.58089024 -1.12000564]
 [-3.11010257  1.39827676  1.57147237 -0.44543376  1.67230113]
 [-8.3317683   3.49797781  2.28345506  5.09440659 -1.07510007]
 [-3.41378791  1.56348265 -0.53048724  1.34018703  1.96292206]
 [-7.91073324  4.45603337  2.05711732  4.21811861 -1.8207291 ]]


In [109]:
print(np.round(R @ iR))

[[ 1.  0.  0. -0.  0.]
 [-0.  1.  0. -0.  0.]
 [-0. -0.  1. -0. -0.]
 [ 0.  0.  0.  1. -0.]
 [-0.  0.  0. -0.  1.]]


In [110]:
print(R @ iR)

[[ 1.00000000e+00  0.00000000e+00  0.00000000e+00 -8.88178420e-16
   0.00000000e+00]
 [-1.77635684e-15  1.00000000e+00  0.00000000e+00 -4.44089210e-16
   4.44089210e-16]
 [-4.44089210e-16 -3.33066907e-16  1.00000000e+00 -6.66133815e-16
  -2.22044605e-16]
 [ 4.99600361e-16  1.11022302e-16  1.24900090e-16  1.00000000e+00
  -2.22044605e-16]
 [-1.87350135e-16  3.60822483e-16  8.84708973e-17 -9.36750677e-17
   1.00000000e+00]]


In [111]:
print(np.round(iR @ R))

[[ 1. -0. -0. -0.  0.]
 [-0.  1. -0. -0.  0.]
 [ 0.  0.  1.  0. -0.]
 [ 0.  0.  0.  1.  0.]
 [ 0.  0.  0.  0.  1.]]


In [112]:
print(iR @ R)

[[ 1.00000000e+00 -1.33226763e-15 -2.10942375e-15 -1.55431223e-15
   0.00000000e+00]
 [-5.55111512e-16  1.00000000e+00 -3.33066907e-16 -1.11022302e-16
   0.00000000e+00]
 [ 1.11022302e-15  1.88737914e-15  1.00000000e+00  1.11022302e-15
  -4.44089210e-16]
 [ 6.66133815e-16  4.44089210e-16  0.00000000e+00  1.00000000e+00
   0.00000000e+00]
 [ 6.66133815e-16  1.11022302e-15  5.55111512e-16  8.88178420e-16
   1.00000000e+00]]
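Rounding the output is not a reliable check. `np.allclose` compares the product with the identity up to a floating-point tolerance. To solve a linear system $Rx=b$, `np.linalg.solve` is usually preferable to forming the inverse explicitly: it is faster and more accurate. In the sketch below, the seed and the right-hand side `b` are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded generator (arbitrary seed)
R = rng.random((5, 5))           # a random 5x5 matrix is almost surely invertible
iR = np.linalg.inv(R)

# Compare against the identity with a tolerance instead of rounding
print(np.allclose(R @ iR, np.eye(5)))
print(np.allclose(iR @ R, np.eye(5)))

b = rng.random(5)
x = np.linalg.solve(R, b)        # solves R x = b without computing inv(R)
print(np.allclose(R @ x, b))     # x satisfies the system
print(np.allclose(x, iR @ b))    # same solution as using the inverse
```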


* Pseudo-inverse:

In [113]:
R = np.random.rand(5,4)
iR = np.linalg.pinv(R)
print(iR)

[[ 1.61445095  0.26001173 -3.28981819 -0.84899614  2.49552109]
 [-2.84992312  1.22376388  2.55756301  1.56949005 -1.74520583]
 [ 0.8713956   0.48939598  0.1919096  -0.92692207 -0.31826767]
 [ 3.5535617  -3.08593886 -1.62164894 -0.38478791  2.23065642]]


In [114]:
print(np.round(R @ iR))

[[ 1. -0.  0. -0.  0.]
 [-0.  1.  0. -0.  0.]
 [ 0.  0.  1.  0. -0.]
 [-0. -0.  0.  1.  0.]
 [ 0.  0. -0.  0.  1.]]


In [115]:
print(R @ iR)

[[ 0.85108013 -0.05874054  0.15236694 -0.21813549  0.22911463]
 [-0.05874054  0.97683015  0.06010021 -0.08604222  0.09037288]
 [ 0.15236694  0.06010021  0.8441062   0.22318471 -0.23441798]
 [-0.21813549 -0.08604222  0.22318471  0.68047854  0.33560352]
 [ 0.22911463  0.09037288 -0.23441798  0.33560352  0.64750498]]


In [116]:
print(np.round(iR @ R))

[[ 1.  0. -0.  0.]
 [ 0.  1. -0. -0.]
 [ 0. -0.  1.  0.]
 [-0.  0.  0.  1.]]


In [117]:
print(iR @ R)

[[ 1.00000000e+00  0.00000000e+00 -4.44089210e-16  8.88178420e-16]
 [ 3.33066907e-16  1.00000000e+00 -1.11022302e-16 -1.11022302e-15]
 [ 5.55111512e-17 -2.22044605e-16  1.00000000e+00  1.80411242e-16]
 [-2.22044605e-16  4.44089210e-16  5.55111512e-16  1.00000000e+00]]
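Rounding hides an important detail here. For a $5\times 4$ `R` of full column rank, `iR @ R` is the $4\times 4$ identity. `R @ iR`, however, is not the $5\times 5$ identity, as the unrounded output shows. It is the orthogonal projector onto the column space of `R`, so it is symmetric and idempotent. The pseudo-inverse also gives the least-squares solution of an overdetermined system, which `np.linalg.lstsq` computes directly. In the sketch below, the seed and `b` are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)   # arbitrary seed
R = rng.random((5, 4))           # tall matrix, almost surely of rank 4
iR = np.linalg.pinv(R)           # shape (4, 5)

P = R @ iR                       # 5x5 projector onto the column space of R
left_ok = np.allclose(iR @ R, np.eye(4))   # iR is a left inverse
is_identity = np.allclose(P, np.eye(5))    # P is NOT the identity (rank 4)
is_projector = np.allclose(P @ P, P) and np.allclose(P, P.T)
print(left_ok, is_identity, is_projector)

b = rng.random(5)
x_pinv = iR @ b
x_lstsq = np.linalg.lstsq(R, b, rcond=None)[0]
print(np.allclose(x_pinv, x_lstsq))  # same least-squares solution
```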


* Vector (cross) product:
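A minimal sketch using `np.cross`. The input vectors are illustrative. The result is orthogonal to both operands:

```python
import numpy as np

x = np.array([1, 0, 0])
y = np.array([0, 1, 0])
z = np.cross(x, y)            # right-hand rule: x × y = z
print(z)                      # [0 0 1]

u = np.array([1., 2., 3.])
v = np.array([4., 5., 6.])
w = np.cross(u, v)            # [2*6 - 3*5, 3*4 - 1*6, 1*5 - 2*4] = [-3, 6, -3]
print(w, np.dot(w, u), np.dot(w, v))   # both dot products are 0
```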

## Broadcasting
In vectorized operations, NumPy "stretches" scalars and array dimensions of size 1 to match the size of the corresponding dimension of the other array(s). This is called broadcasting, and the smaller array is not actually copied.
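NumPy compares the two shapes dimension by dimension, starting from the last one. A missing leading dimension counts as 1, and each pair of sizes must either be equal or include a 1. `np.broadcast_shapes` (available since NumPy 1.20) applies this rule without allocating any array. The sketch below uses the shapes from this section:

```python
import numpy as np

s1 = np.broadcast_shapes((5, 3), (1,))    # (5, 3)
s2 = np.broadcast_shapes((5, 3), (3,))    # (5, 3): (3,) acts like (1, 3)
s3 = np.broadcast_shapes((5, 3), (5, 1))  # (5, 3)
print(s1, s2, s3)

try:
    np.broadcast_shapes((5, 3), (4, 1))   # 5 vs 4: unequal and neither is 1
    failed = False
except ValueError as e:
    failed = True
    print("ValueError:", e)
```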

In [118]:
A = np.ones((5,3))
print(A)

[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]


In [119]:
B = np.arange(1)
print(B)

[0]


In [120]:
B += 1
print(B)

[1]


* Broadcasting of a one-element array (shape `(1,)`, which behaves like a $1\times 1$ matrix):

In [121]:
print(A + B) # 'A' is 5x3 and 'B' has shape (1,)

[[2. 2. 2.]
 [2. 2. 2.]
 [2. 2. 2.]
 [2. 2. 2.]
 [2. 2. 2.]]


* Broadcasting of a 3-element vector (shape `(3,)`, which behaves like a $1\times 3$ matrix):

In [122]:
B = np.arange(3)
print(B)

[0 1 2]


In [123]:
print(A + B) # 'A' is 5x3 and 'B' has shape (3,), i.e. 1x3

[[1. 2. 3.]
 [1. 2. 3.]
 [1. 2. 3.]
 [1. 2. 3.]
 [1. 2. 3.]]


* Broadcasting of a $5\times 1$ matrix:

In [124]:
B = np.arange(5)
print(B)

[0 1 2 3 4]


In [125]:
B = B.reshape((5,1)) # (Rows, Columns)
print(B)

[[0]
 [1]
 [2]
 [3]
 [4]]


In [126]:
print(A + B)

[[1. 1. 1.]
 [2. 2. 2.]
 [3. 3. 3.]
 [4. 4. 4.]
 [5. 5. 5.]]


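* Two shapes can be broadcast together only if, comparing them dimension by dimension from the last one, each pair of sizes is equal or one of them is 1. Otherwise a `ValueError` ("operands could not be broadcast together ...") is raised: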

In [127]:
B = np.arange(4)[:, None]
print(B)

[[0]
 [1]
 [2]
 [3]]


In [128]:
print(A.shape)

(5, 3)


In [129]:
print(B.shape)

(4, 1)


In [130]:
try:
    A + B
except ValueError as e:
    print("ValueError exception:", e)

ValueError exception: operands could not be broadcast together with shapes (5,3) (4,1) 
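A size-1 axis like the one in `B` is useful when paired with a 1D array. A `(4, 1)` column and a `(3,)` vector broadcast to `(4, 3)`, which gives an outer product without Python loops. A sketch checked against `np.outer`:

```python
import numpy as np

a = np.arange(4)[:, None]   # shape (4, 1), like B above
b = np.arange(3)            # shape (3,), treated as (1, 3)
table = a * b               # broadcast to (4, 3): table[i, j] = i * j
print(table.shape)          # (4, 3)
print(np.array_equal(table, np.outer(np.arange(4), np.arange(3))))
```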


## How fast is NumPy's array math?

In [131]:
A = np.array([[(i*10+j) for j in range(10)] for i in range(10)])
print(A, A.shape)

[[ 0  1  2  3  4  5  6  7  8  9]
 [10 11 12 13 14 15 16 17 18 19]
 [20 21 22 23 24 25 26 27 28 29]
 [30 31 32 33 34 35 36 37 38 39]
 [40 41 42 43 44 45 46 47 48 49]
 [50 51 52 53 54 55 56 57 58 59]
 [60 61 62 63 64 65 66 67 68 69]
 [70 71 72 73 74 75 76 77 78 79]
 [80 81 82 83 84 85 86 87 88 89]
 [90 91 92 93 94 95 96 97 98 99]] (10, 10)


In [132]:
print(A[:1]) # First row (a matrix)

[[0 1 2 3 4 5 6 7 8 9]]


In [133]:
print(A[:1].shape)

(1, 10)


In [134]:
print(A[:1][0]) # The only row of that 1x10 matrix: a 1D vector (same as A[0])

[0 1 2 3 4 5 6 7 8 9]


In [135]:
print(A[:1][0].shape)

(10,)


In [136]:
B = A[:1][0]
print(B)

[0 1 2 3 4 5 6 7 8 9]


* Add `B[]` to all the rows of `A[][]` using scalar arithmetic:

In [137]:
C = np.empty_like(A)
def add():
    for i in range(A.shape[0]):      # rows
        for j in range(A.shape[1]):  # columns
            C[i, j] = A[i, j] + B[j]
%timeit add()
print(C)

125 µs ± 9.03 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
[[  0   2   4   6   8  10  12  14  16  18]
 [ 10  12  14  16  18  20  22  24  26  28]
 [ 20  22  24  26  28  30  32  34  36  38]
 [ 30  32  34  36  38  40  42  44  46  48]
 [ 40  42  44  46  48  50  52  54  56  58]
 [ 50  52  54  56  58  60  62  64  66  68]
 [ 60  62  64  66  68  70  72  74  76  78]
 [ 70  72  74  76  78  80  82  84  86  88]
 [ 80  82  84  86  88  90  92  94  96  98]
 [ 90  92  94  96  98 100 102 104 106 108]]


* Add `B[]` to all the rows of `A[][]`, vectorizing over each row:

In [138]:
C = np.empty_like(A)
def add():
    for i in range(A.shape[0]):  # one row at a time
        C[i, :] = A[i, :] + B
%timeit add()
print(C)

38.5 µs ± 779 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
[[  0   2   4   6   8  10  12  14  16  18]
 [ 10  12  14  16  18  20  22  24  26  28]
 [ 20  22  24  26  28  30  32  34  36  38]
 [ 30  32  34  36  38  40  42  44  46  48]
 [ 40  42  44  46  48  50  52  54  56  58]
 [ 50  52  54  56  58  60  62  64  66  68]
 [ 60  62  64  66  68  70  72  74  76  78]
 [ 70  72  74  76  78  80  82  84  86  88]
 [ 80  82  84  86  88  90  92  94  96  98]
 [ 90  92  94  96  98 100 102 104 106 108]]


* Add `B[]` to all the rows of `A[][]` using fully vectorized computation (broadcasting):

In [139]:
%timeit A + B # <- broadcasting is the fastest
C = A + B     # assignments made inside %timeit are not kept, so compute C here
print(C)

4.05 µs ± 64.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
[[  0   2   4   6   8  10  12  14  16  18]
 [ 10  12  14  16  18  20  22  24  26  28]
 [ 20  22  24  26  28  30  32  34  36  38]
 [ 30  32  34  36  38  40  42  44  46  48]
 [ 40  42  44  46  48  50  52  54  56  58]
 [ 50  52  54  56  58  60  62  64  66  68]
 [ 60  62  64  66  68  70  72  74  76  78]
 [ 70  72  74  76  78  80  82  84  86  88]
 [ 80  82  84  86  88  90  92  94  96  98]
 [ 90  92  94  96  98 100 102 104 106 108]]
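`%timeit` is an IPython magic. In a plain Python script, you can run the same comparison with the standard `timeit` module. The function names below are illustrative. Timings depend on the machine, so no figures are claimed here:

```python
import timeit
import numpy as np

A = np.array([[i * 10 + j for j in range(10)] for i in range(10)])
B = A[0]  # first row, shape (10,)

def add_loops():
    C = np.empty_like(A)
    for i in range(A.shape[0]):        # rows
        for j in range(A.shape[1]):    # columns
            C[i, j] = A[i, j] + B[j]
    return C

def add_rows():
    C = np.empty_like(A)
    for i in range(A.shape[0]):
        C[i, :] = A[i, :] + B          # one vectorized row at a time
    return C

def add_broadcast():
    return A + B                       # B is broadcast to every row

# All three versions compute the same result
assert np.array_equal(add_loops(), add_broadcast())
assert np.array_equal(add_rows(), add_broadcast())

for f in (add_loops, add_rows, add_broadcast):
    t = timeit.timeit(f, number=1000) / 1000   # average seconds per call
    print(f"{f.__name__}: {t * 1e6:.1f} µs per call")
```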


## Arrays of objects

* For example, an array of strings:

In [140]:
A = np.array(['hello', 'world!'])
print(A)
print(A.shape)
print(np.char.upper(A))

['hello' 'world!']
(2,)
['HELLO' 'WORLD!']
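This `A` does not hold generic Python objects. NumPy stores the strings with a fixed-width Unicode dtype sized to the longest element (`<U6` here), so a longer string assigned later is silently truncated. The sketch below shows this and uses `dtype=object` to hold arbitrary Python strings instead:

```python
import numpy as np

A = np.array(['hello', 'world!'])
print(A.dtype)              # <U6: Unicode strings of at most 6 characters
A[0] = 'greetings'
print(A)                    # ['greeti' 'world!']: truncated to 6 characters

O = np.array(['hello', 'world!'], dtype=object)
O[0] = 'greetings'
print(O)                    # ['greetings' 'world!']
```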


* Simulating a dictionary:

In [321]:
A = np.array([("Spain", 100), ("France", 200), ("Italy", 300)])
print(A) # Notice that all the elements are strings: NumPy converted them to a common dtype
print(A.shape)
print(A[:,0])
print(A[A[:,0] == "France"])
print(A[A[:,0] == "France"][:,1])
print("The value associated to the key France is", A[A[:,0] == "France"][:,1][0])

[['Spain' '100']
 ['France' '200']
 ['Italy' '300']]
(3, 2)
['Spain' 'France' 'Italy']
[['France' '200']]
['200']
The value associated to the key France is 200


* A Python dictionary is much faster for looking up a single key:

In [327]:
%timeit A[A[:,0] == "France"][:,1][0]
dictionary = {"Spain":100, "France":200, "Italy":300}
print(dictionary["France"])
%timeit dictionary["France"]

12.4 µs ± 559 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
200
68.4 ns ± 1.65 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)


* However, the difference is smaller when the search has to scan every entry, e.g. to get all the values except one:

In [337]:
others = [value for key, value in dictionary.items() if key != "France"]
print(others)
%timeit [value for key, value in dictionary.items() if key != "France"]

[100, 300]
1.12 µs ± 57.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [338]:
print(A[:,0] != "France")
print(A[A[:,0] != "France"])
print(A[A[:,0] != "France"][:,1])
print(A[A[:,0] != "France"][:,1].astype(np.int16))
%timeit A[A[:,0] != "France"][:,1].astype(np.int16)

[ True False  True]
[['Spain' '100']
 ['Italy' '300']]
['100' '300']
[100 300]
21.4 µs ± 1.24 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
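Because `np.array` converted every element to strings, the values above needed `astype` to become numbers again. The sketch below uses a different layout from the one above: keys and values live in two parallel arrays, so the values stay integers:

```python
import numpy as np

keys = np.array(["Spain", "France", "Italy"])
values = np.array([100, 200, 300])

france = values[keys == "France"][0]     # boolean mask built on the keys array
others = values[keys != "France"]        # every value except France's
print(france, others)                    # 200 [100 300]
print(values.dtype.kind)                 # i (signed integers)
```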


## Structured arrays

* Create a 1D array of (two) records, where each record has the structure (int32, float32, 10-byte string).

In [142]:
X = np.array([(1, 2., "Hello"), (3, 4., "World")],
             dtype=[("first", "i4"),("second", "f4"), ("third", "S10")])
# See https://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html
print(X)

[(1, 2., b'Hello') (3, 4., b'World')]


* Get the first element of every record:

In [143]:
print(X["first"])

[1 3]


* Get the first record:

In [144]:
print(X[0])

(1, 2., b'Hello')


* Get the second element of every record:

In [145]:
print(X["second"])

[2. 4.]


* Get the third element of every record:

In [146]:
print(X["third"])

[b'Hello' b'World']
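You can also filter and sort a structured array by its named fields. The sketch below reuses the same `X`. `np.sort` with `order=` sorts the records by the given field, and `astype(str)` converts the `S10` bytes to Python strings:

```python
import numpy as np

X = np.array([(1, 2., "Hello"), (3, 4., "World")],
             dtype=[("first", "i4"), ("second", "f4"), ("third", "S10")])

print(X.dtype.names)                     # ('first', 'second', 'third')
selected = X[X["first"] > 1]             # records whose 'first' field is > 1
print(selected)
by_second = np.sort(X, order="second")[::-1]   # sorted by 'second', descending
print(by_second)
texts = X["third"].astype(str)           # bytes -> str
print(texts)                             # ['Hello' 'World']
```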


## Disk I/O

* Output data to an ASCII file:

In [147]:
Data = np.array([[1., 200.], [2., 150.], [3., 250.]])
np.savetxt("data.txt", Data)
!cat data.txt

1.000000000000000000e+00 2.000000000000000000e+02
2.000000000000000000e+00 1.500000000000000000e+02
3.000000000000000000e+00 2.500000000000000000e+02


* Input data from an ASCII file:

In [148]:
np.genfromtxt('data.txt')

array([[  1., 200.],
       [  2., 150.],
       [  3., 250.]])
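By default, `savetxt` writes every value in `%.18e` format, as seen above. The sketch below writes a more readable file using `fmt=` and a `header=` line. `savetxt` prefixes the header with `# `, so `loadtxt` skips it as a comment. The file name is hypothetical, and a temporary directory keeps `data.txt` from being overwritten:

```python
import os
import tempfile
import numpy as np

Data = np.array([[1., 200.], [2., 150.], [3., 250.]])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data_fmt.txt")   # hypothetical file name
    np.savetxt(path, Data, fmt="%.1f", header="x y")
    with open(path) as f:
        text = f.read()
    print(text)               # "# x y", then "1.0 200.0", "2.0 150.0", ...
    back = np.loadtxt(path)   # lines starting with '#' are skipped

print(np.array_equal(back, Data))
```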

* Output data to a binary file (using the native endianness):

In [149]:
with open("data.float64", mode="wb") as ofile:  # closing the file flushes the data
    Data.tofile(ofile)

* Input data from a binary file (using the native endianness):

In [150]:
print(np.fromfile("data.float64", dtype=np.float64))

[  1. 200.   2. 150.   3. 250.]
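`tofile` writes only the raw values, so the shape and dtype are lost. That is why the 3x2 `Data` comes back as a flat array of 6 values. The sketch below restores the shape with `reshape`, which requires knowing the number of columns. It writes to a temporary file and lets a `with` block close it:

```python
import os
import tempfile
import numpy as np

Data = np.array([[1., 200.], [2., 150.], [3., 250.]])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.float64")
    with open(path, "wb") as ofile:    # file is flushed and closed on exit
        Data.tofile(ofile)
    flat = np.fromfile(path, dtype=np.float64)

restored = flat.reshape(-1, 2)         # we must know there are 2 columns
print(flat.shape, restored.shape)      # (6,) (3, 2)
print(np.array_equal(restored, Data))
```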


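* NumPy's `tofile()`/`fromfile()` and C's `fwrite()`/`fread()` both use the machine's native byte order, so on the same machine the files are interchangeable: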

In [151]:
!cat create_float64.c
!gcc create_float64.c -o create_float64
!./create_float64

#include <stdio.h>

#define N 10

int main() {
  double a[N];
  int i;
  FILE *ofile = fopen("data.float64", "wb");
  for(i=0; i<N; i++) {
    a[i] = i;
  }
  fwrite(a, sizeof(double), N, ofile);
  fclose(ofile);
  fprintf(stderr,"create_float64: done\n");
}
create_float64: done


In [152]:
print(np.fromfile("data.float64", dtype=np.float64))

[0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]


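* Specifying the endianness (big-endian here, which does not match the native little-endian data written above, hence the garbage values):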

In [153]:
print(np.fromfile("data.float64", dtype=">d"))
# (> = big-endian, d = double, see https://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html)

[0.00000e+000 3.03865e-319 3.16202e-322 1.04347e-320 2.05531e-320
 2.56124e-320 3.06716e-320 3.57308e-320 4.07901e-320 4.33197e-320]
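To write a file with a fixed byte order on any machine, convert the array with `astype` before writing, then read it back with the same explicit dtype. The sketch below picks big-endian (`>f8`) arbitrarily and uses a hypothetical file name:

```python
import os
import tempfile
import numpy as np

a = np.arange(10, dtype=np.float64)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data_be.float64")   # hypothetical file name
    a.astype(">f8").tofile(path)              # force big-endian 8-byte floats
    same = np.fromfile(path, dtype=">f8")     # read with the same byte order
    swapped = np.fromfile(path, dtype="<f8")  # read with the wrong byte order

print(np.array_equal(same, a))     # values preserved
print(np.array_equal(swapped, a))  # garbled: only 0.0 survives the byte swap
```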


* Make things easier with NumPy's own `.npy` format, which stores the dtype and shape together with the data:

In [154]:
A = (100*np.random.rand(2,3)).astype(np.uint16)
print(A)

[[45 74 96]
 [86 78 94]]


In [155]:
np.save("data.npy", A)  # np.save() opens, writes and closes the file itself

In [156]:
!ls data.npy

data.npy


In [157]:
print(np.load("data.npy"))

[[45 74 96]
 [86 78 94]]
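To store several arrays in one file, `np.savez` (or `np.savez_compressed`) writes an `.npz` archive. Each array is retrieved by the keyword name it was saved under. The sketch below uses illustrative arrays and a temporary directory:

```python
import os
import tempfile
import numpy as np

A = np.array([[45, 74, 96], [86, 78, 94]], dtype=np.uint16)
B = np.linspace(0., 1., 5)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "arrays.npz")     # hypothetical file name
    np.savez(path, A=A, B=B)                   # keyword names become the keys
    with np.load(path) as data:                # NpzFile, closed on exit
        names = sorted(data.files)             # ['A', 'B']
        A2, B2 = data["A"], data["B"]          # arrays are read into memory

print(names, A2.dtype, np.array_equal(A2, A), np.array_equal(B2, B))
```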
