# SciPy.org's [Numpy](http://www.numpy.org/)

Numpy provides a high-performance multidimensional array object.

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Why-numpy?" data-toc-modified-id="Why-numpy?-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Why numpy?</a></span></li><li><span><a href="#Creating-(simple)-arrays-in-Numpy" data-toc-modified-id="Creating-(simple)-arrays-in-Numpy-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Creating (simple) <a href="https://docs.scipy.org/doc/numpy-dev/user/quickstart.html" target="_blank">arrays</a> in Numpy</a></span></li><li><span><a href="#1D-arrays" data-toc-modified-id="1D-arrays-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>1D arrays</a></span></li><li><span><a href="#2D-arrays" data-toc-modified-id="2D-arrays-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>2D arrays</a></span></li><li><span><a href="#N-dimensional-arrays" data-toc-modified-id="N-dimensional-arrays-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>N-dimensional arrays</a></span></li><li><span><a href="#Slicing" data-toc-modified-id="Slicing-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>Slicing</a></span></li><li><span><a href="#Boolean-array-indexing" data-toc-modified-id="Boolean-array-indexing-7"><span class="toc-item-num">7&nbsp;&nbsp;</span>Boolean array indexing</a></span></li><li><span><a href="#Elementwise-(vectorial-vectorial-and-vectorial-scalar)-math" data-toc-modified-id="Elementwise-(vectorial-vectorial-and-vectorial-scalar)-math-8"><span class="toc-item-num">8&nbsp;&nbsp;</span>Elementwise (vectorial-vectorial and vectorial-scalar) math</a></span></li><li><span><a href="#Matricial-math" data-toc-modified-id="Matricial-math-9"><span class="toc-item-num">9&nbsp;&nbsp;</span>Matricial math</a></span></li><li><span><a href="#Broadcasting" data-toc-modified-id="Broadcasting-10"><span class="toc-item-num">10&nbsp;&nbsp;</span>Broadcasting</a></span></li><li><span><a href="#How-fast-is-Numpy's-array-math?" 
data-toc-modified-id="How-fast-is-Numpy's-array-math?-11"><span class="toc-item-num">11&nbsp;&nbsp;</span>How fast is Numpy's array math?</a></span></li><li><span><a href="#Structured-arrays" data-toc-modified-id="Structured-arrays-12"><span class="toc-item-num">12&nbsp;&nbsp;</span>Structured arrays</a></span></li><li><span><a href="#Disk-I/O" data-toc-modified-id="Disk-I/O-13"><span class="toc-item-num">13&nbsp;&nbsp;</span>Disk I/O</a></span></li></ul></div>

## Why numpy?

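NumPy stores the elements of an array in a contiguous, typed block of memory and processes them with compiled loops, so operations on whole arrays usually run much faster than the equivalent pure-Python code.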

In [1]:
try:
    import numpy as np
except ImportError:
    !pip3 install numpy
    import numpy as np

* Let's define a list and compute the sum of its elements, timing it:

In [2]:
l = list(range(0,100000)); print(type(l), l[:10])
%timeit sum(l)

<class 'list'> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
1.48 ms ± 59.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


* And now, let's create a NumPy array and time the sum of its elements:

In [3]:
A = np.arange(0, 100000); print(type(A), A[:10])
%timeit np.sum(A)

<class 'numpy.ndarray'> [0 1 2 3 4 5 6 7 8 9]
96.9 µs ± 10.7 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


* And what about a *pure* C implementation of an equivalent computation? (The program prints its own elapsed time, in microseconds, every time `%timeit` launches it.)

In [4]:
!cat sum_array.c
!gcc -O3 sum_array.c -o sum_array
%timeit !./sum_array

#include <stdio.h>
#include <time.h>
#include "sum_array_lib.c"

#define N 100000

int main() {
  double a[N];
  int i;
  clock_t start, end;
  double cpu_time;
  for(i=0; i<N; i++) {
    a[i] = i;
  }
  start = clock();
  double sum = sum_array(a,N);
  end = clock();
  printf("%f ", sum);
  cpu_time = ((double) (end - start)) / CLOCKS_PER_SEC;
  cpu_time *= 1000000;
  printf("%f usegs\n", cpu_time);
}
4999950000.000000 167.000000 usegs
4999950000.000000 167.000000 usegs
4999950000.000000 166.000000 usegs
4999950000.000000 174.000000 usegs
4999950000.000000 167.000000 usegs
4999950000.000000 167.000000 usegs
4999950000.000000 151.000000 usegs
4999950000.000000 167.000000 usegs
4999950000.000000 166.000000 usegs
4999950000.000000 166.000000 usegs
4999950000.000000 175.000000 usegs
4999950000.000000 197.000000 usegs
4999950000.000000 170.000000 usegs
4999950000.000000 168.000000 usegs
4999950000.000000 167.000000 usegs
4999950000.000000 174.000000 usegs
4999950000.000000 174.000000 usegs

* Another example:

In [5]:
# Example extracted from https://github.com/pyHPC/pyhpc-tutorial
lst = range(1000000)

for i in lst[:10]:
    print(i, end=' ')
print()

%timeit [i + 1 for i in lst] # A Python list comprehension (iteration happens in C but with PyObjects)
x = [i + 1 for i in lst]

print(x[:10])

arr = np.arange(1000000) # A NumPy array of integers
%timeit arr + 1 # Use operator overloading for nice syntax, now iteration is in C with ints
y = arr + 1

print(y[:10])

0 1 2 3 4 5 6 7 8 9 
209 ms ± 22.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
2.73 ms ± 280 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
[ 1  2  3  4  5  6  7  8  9 10]


* Searching NumPy's documentation for a keyword (`np.lookfor()` is available in NumPy 1.x; it was removed in NumPy 2.0):

In [6]:
np.lookfor('invert')

Search results for 'invert'
---------------------------
numpy.bitwise_not
    Compute bit-wise inversion, or bit-wise NOT, element-wise.
numpy.matrix.getI
    Returns the (multiplicative) inverse of invertible `self`.
numpy.in1d
    Test whether each element of a 1-D array is also present in a second array.
numpy.isin
    Calculates `element in test_elements`, broadcasting over `element` only.
numpy.transpose
    Permute the dimensions of an array.
numpy.linalg.inv
    Compute the (multiplicative) inverse of a matrix.
numpy.linalg.pinv
    Compute the (Moore-Penrose) pseudo-inverse of a matrix.
numpy.random.SFC64
    BitGenerator for Chris Doty-Humphrey's Small Fast Chaotic PRNG.
numpy.linalg.tensorinv
    Compute the 'inverse' of an N-dimensional array.
numpy.linalg.matrix_power
    Raise a square matrix to the (integer) power `n`.

* Remember that in IPython you can press the Tab key to complete a name, or use a wildcard to list everything that the `numpy` namespace provides:

In [7]:
np.*?

## Creating (simple) [arrays](https://docs.scipy.org/doc/numpy-dev/user/quickstart.html) in Numpy
A simple array is a grid of values, all of the same type, indexed by a tuple of nonnegative integers.
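
As a minimal sketch of that definition (the name `grid` is only illustrative), the following cell shows that all the elements share a single `dtype` and that an element is addressed with a tuple of indices:

```python
import numpy as np

# A 2x3 grid; mixing ints and a float makes NumPy pick one common type.
grid = np.array([[1, 2, 3], [4, 5, 6.5]])

print(grid.dtype)    # a single dtype for all the elements
print(grid.shape)    # the shape is a tuple with one entry per dimension
print(grid[(1, 2)])  # indexing with a tuple is the same as grid[1, 2]
```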

## 1D arrays

* Creating an empty array:

In [8]:
A = np.array([], dtype=np.uint8)
A

array([], dtype=uint8)

* Creating an array using a list:

In [9]:
A = np.array([1, 2, 3])
print(type([1, 2, 3]))
print(type(A))

<class 'list'>
<class 'numpy.ndarray'>


* Getting the number of dimensions of an array:

In [10]:
print(A.ndim)

1


* Printing an array:

In [11]:
print(A)

[1 2 3]


* Printing the *shape* (which always is a tuple) of an array:

In [12]:
print(A.shape)

(3,)


* Native Python's [`len()`](https://docs.python.org/3.6/library/functions.html#len) also works:

In [13]:
print(len(A))

3


* A more exotic definition using [`linspace()`](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.linspace.html):

In [14]:
np.linspace(1., 4., 6)

array([1. , 1.6, 2.2, 2.8, 3.4, 4. ])

* Arrays can be created from nested containers of different types (a list that holds a list and a tuple in this case); the elements are converted to a common type, here complex numbers:

In [15]:
C = [[1,1.0],(1+1j,.3)]
print(type(C), type(C[0]), type(C[1]))
X = np.array(C)
X

<class 'list'> <class 'list'> <class 'tuple'>


array([[1. +0.j, 1. +0.j],
       [1. +1.j, 0.3+0.j]])

* Accessing an element:

In [16]:
print(A, A[0], A[1])

[1 2 3] 1 2


In [17]:
A[0] = 0
print(A)

[0 2 3]


* Appending elements:

In [18]:
A = np.append(A, 4)
A

array([0, 2, 3, 4])
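
Note that `np.append()` returns a new array and leaves its argument untouched, which is why the result is assigned back to `A` above. A small sketch of this behaviour (the names `original`, `extended`, `values` and `squares` are illustrative):

```python
import numpy as np

original = np.array([0, 2, 3])
extended = np.append(original, 4)  # returns a new array

print(original)  # unchanged
print(extended)

# Growing an array element by element copies the data every time;
# collecting the values in a Python list and converting once is
# usually cheaper.
values = []
for i in range(5):
    values.append(i * i)
squares = np.array(values)
print(squares)
```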

## 2D arrays

* Creating a 2D array from a list of two lists (one per row):

In [19]:
B = np.array([[1,2,3],[4,5,6]])
print(B)
print(B.shape)
print(B[1, 2]) # [row, column]
print(B[0, 1])

[[1 2 3]
 [4 5 6]]
(2, 3)
6
2


* With zeroes:

In [20]:
A = np.zeros((5,5))
print(A)

[[0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]]


* The default dtype is `float64`:

In [21]:
print(type(A[0][0]))

<class 'numpy.float64'>


* With ones:

In [22]:
A = np.ones((5,5))
print(A)

[[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]


* With an arbitrary scalar:

In [23]:
A = np.full((5,5), 2)
print(A)

[[2 2 2 2 2]
 [2 2 2 2 2]
 [2 2 2 2 2]
 [2 2 2 2 2]
 [2 2 2 2 2]]


* The identity matrix:

In [24]:
A = np.eye(5)
print(A)

[[1. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0.]
 [0. 0. 1. 0. 0.]
 [0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 1.]]


* With random data:

In [25]:
A = np.random.random((5,5))
print(A)

[[0.61707668 0.53945173 0.96245513 0.14201763 0.66120875]
 [0.11083353 0.71041228 0.52073498 0.87548743 0.01253925]
 [0.66636854 0.50786554 0.20973364 0.45038123 0.56942655]
 [0.37435335 0.72671849 0.20362131 0.96984895 0.23458902]
 [0.63989705 0.60054282 0.76352767 0.96208433 0.74300137]]


In [26]:
# Always random
A = np.random.random((5,5))
print(A)

[[0.49982406 0.45940582 0.128115   0.15950017 0.64931378]
 [0.7241646  0.84063176 0.3966688  0.86722483 0.03855141]
 [0.65959691 0.21837672 0.72856096 0.17993583 0.33587541]
 [0.85181154 0.70062341 0.45821444 0.13289916 0.18458702]
 [0.19409244 0.64197529 0.05406954 0.33735665 0.39362688]]


* With uninitialized ([arbitrary](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.empty_like.html)) contents and the shape and dtype of a previously defined array:

In [27]:
B = np.empty_like(A) # The contents are whatever was in that memory
print(B)

[[0.49982406 0.45940582 0.128115   0.15950017 0.64931378]
 [0.7241646  0.84063176 0.3966688  0.86722483 0.03855141]
 [0.65959691 0.21837672 0.72856096 0.17993583 0.33587541]
 [0.85181154 0.70062341 0.45821444 0.13289916 0.18458702]
 [0.19409244 0.64197529 0.05406954 0.33735665 0.39362688]]


* With a 1D list comprehension:

In [28]:
A = np.array([i for i in range(5)])
print(A, A[1], A.shape)

[0 1 2 3 4] 1 (5,)


* With a 2D list comprehension:

In [29]:
A = np.array([[j+i*4 for j in range(4)] for i in range(5)])
print(A, A.shape)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]
 [16 17 18 19]] (5, 4)


* Accessing a row of a matrix:

In [30]:
A[1] # Get the second row (index 1)

array([4, 5, 6, 7])

* Accessing an element of a matrix:

In [31]:
A[1][2] # [row][column]

6

In [32]:
A[1,2] # [row, column]

6

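* Be careful, `A[1][2]` first extracts row 1 as a temporary array and then indexes it, so it is slower than `A[1,2]`:

In [33]:
%timeit A[1][2]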

814 ns ± 86.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [34]:
%timeit A[1,2] # This is faster than A[1][2]

384 ns ± 22.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
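
The two notations return the same element here, but they are not always interchangeable. The sketch below (with an illustrative `row` variable) shows the intermediate array created by chained indexing, and a case where chaining silently writes to a temporary copy:

```python
import numpy as np

A = np.arange(20).reshape(5, 4)

row = A[1]                # step 1 of A[1][2]: a temporary 1D view of row 1
print(np.shares_memory(row, A))
print(row[2] == A[1, 2])  # step 2 gives the same element as A[1, 2]

# With integer array indexing, A[[0, 1]] is a copy, so assigning
# through chained indexing does not modify A ...
A[[0, 1]][0, 0] = -1
print(A[0, 0])
# ... while a single indexing operation does.
A[[0, 1], 0] = -1
print(A[:2, 0])
```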


* Getting elements of a matrix using "integer array indexing":

In [35]:
print(A)
print(A[[0, 1, 2], [3, 2, 1]])

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]
 [16 17 18 19]]
[3 6 9]


* The same integer array indexing using list comprehensions:

In [36]:
print(A[np.array([i for i in range(3)]), np.array([i for i in range(3,0,-1)])])

[3 6 9]


* The same using [`np.arange()`](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.arange.html):

In [37]:
print(np.arange(3))
print(np.arange(3,0,-1))
print(A[np.arange(3), np.arange(3,0,-1)])

[0 1 2]
[3 2 1]
[3 6 9]


* Reshaping:

In [38]:
A.shape

(5, 4)

In [39]:
np.reshape(A, (10, 2))

array([[ 0,  1],
       [ 2,  3],
       [ 4,  5],
       [ 6,  7],
       [ 8,  9],
       [10, 11],
       [12, 13],
       [14, 15],
       [16, 17],
       [18, 19]])

In [40]:
B = np.reshape(A, (10, 2), order='C') # C -> C language (the default behaviour)
B

array([[ 0,  1],
       [ 2,  3],
       [ 4,  5],
       [ 6,  7],
       [ 8,  9],
       [10, 11],
       [12, 13],
       [14, 15],
       [16, 17],
       [18, 19]])

In [41]:
np.isfortran(B)

False

In [42]:
# As you can see, by default NumPy traverses 2D arrays row by row, both
# when the source matrix A is read and when the destination matrix B is written:
# A[0, 0] == B[0, 0]
# A[0, 1] == B[0, 1]
# A[0, 2] == B[1, 0]
# A[0, 3] == B[1, 1]
# :
A

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19]])

In [43]:
print(A[0, 3], B[1, 1])

3 3


In [44]:
B = np.reshape(A, (10, 2), order='F') # F -> Fortran language
B

array([[ 0,  2],
       [ 4,  6],
       [ 8, 10],
       [12, 14],
       [16, 18],
       [ 1,  3],
       [ 5,  7],
       [ 9, 11],
       [13, 15],
       [17, 19]])

In [45]:
# Using the Fortran ordering, NumPy traverses 2D arrays column by column:
# A[0, 0] == B[0, 0]
# A[1, 0] == B[1, 0]
# A[2, 0] == B[2, 0]
# A[3, 0] == B[3, 0]
# A[4, 0] == B[4, 0]
# A[0, 1] == B[5, 0]
# :
A

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19]])

In [46]:
print(A[0, 1], B[5, 0])

1 1


In [47]:
np.isfortran(B)

True
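
The two orderings can also be checked programmatically. This is a small sketch (the names `B_c` and `B_f` are illustrative) that compares the flattened sequences and rewrites the Fortran-order reshape with transposes:

```python
import numpy as np

A = np.arange(20).reshape(5, 4)

B_c = np.reshape(A, (10, 2), order='C')
B_f = np.reshape(A, (10, 2), order='F')

# C order: the elements are read and written row by row.
print(np.array_equal(B_c.ravel(order='C'), A.ravel(order='C')))
# F order: the elements are read and written column by column.
print(np.array_equal(B_f.ravel(order='F'), A.ravel(order='F')))
# The Fortran-order reshape can also be written with transposes.
print(np.array_equal(B_f, A.T.reshape(2, 10).T))
```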

* Views and copies (a plain assignment only creates a new name for the same array):

In [48]:
# https://stackoverflow.com/questions/56090021/list-comprehension-python-prime-numbers
Primes_less_than_100 = np.array([x for x in range(2,100) if not any([x % y == 0 for y in range(2, int(x/2)+1)])])
Primes_less_than_100

array([ 2,  3,  5,  7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59,
       61, 67, 71, 73, 79, 83, 89, 97])

In [49]:
A = Primes_less_than_100 # A second name for the same array (no data is copied)
A

array([ 2,  3,  5,  7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59,
       61, 67, 71, 73, 79, 83, 89, 97])

In [50]:
A[0]=1
A

array([ 1,  3,  5,  7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59,
       61, 67, 71, 73, 79, 83, 89, 97])

In [51]:
Primes_less_than_100

array([ 1,  3,  5,  7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59,
       61, 67, 71, 73, 79, 83, 89, 97])

In [52]:
id(A)

4616775840

In [53]:
id(Primes_less_than_100)

4616775840

In [54]:
Primes_less_than_100 = np.array([x for x in range(2,100) if not any([x % y == 0 for y in range(2, int(x/2)+1)])])
A = np.copy(Primes_less_than_100)

In [55]:
id(A)

4616327696

In [56]:
id(Primes_less_than_100)

4652538016

In [57]:
Primes_less_than_100[0]

2

In [58]:
A[0] = 1

In [59]:
print(Primes_less_than_100[0], A[0])

2 1


In [60]:
%timeit A = Primes_less_than_100 # Binding a name is much faster than copying, and the copy time grows with the size of the array

46.6 ns ± 1.82 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)


In [61]:
%timeit A = np.copy(Primes_less_than_100)

4.7 µs ± 341 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
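
Between a plain assignment and `np.copy()` there is a third case: a *view*, which is a new array object that shares its data with the original one (basic slices return views). A minimal sketch, using illustrative names (`primes`, `alias`, `view`, `copy`):

```python
import numpy as np

primes = np.array([2, 3, 5, 7, 11, 13])

alias = primes        # same object, no data copied
view = primes[:3]     # new array object sharing the same data
copy = primes.copy()  # new array object with its own data

print(alias is primes)
print(np.shares_memory(view, primes))
print(np.shares_memory(copy, primes))

view[0] = 1           # visible through 'primes' and 'alias'
copy[1] = 0           # only changes the copy
print(primes)
```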


## N-dimensional arrays

In [62]:
A = np.ndarray((2,3,4,2)) # Low-level constructor: the contents are uninitialized
A

array([[[[0.00000000e+000, 0.00000000e+000],
         [1.18575755e-322, 0.00000000e+000],
         [0.00000000e+000, 0.00000000e+000],
         [0.00000000e+000, 0.00000000e+000]],

        [[0.00000000e+000, 0.00000000e+000],
         [0.00000000e+000, 0.00000000e+000],
         [0.00000000e+000, 0.00000000e+000],
         [0.00000000e+000, 0.00000000e+000]],

        [[0.00000000e+000, 0.00000000e+000],
         [0.00000000e+000, 0.00000000e+000],
         [0.00000000e+000, 0.00000000e+000],
         [0.00000000e+000, 0.00000000e+000]]],


       [[[0.00000000e+000, 0.00000000e+000],
         [0.00000000e+000, 0.00000000e+000],
         [0.00000000e+000, 0.00000000e+000],
         [4.94065646e-323, 0.00000000e+000]],

        [[0.00000000e+000, 0.00000000e+000],
         [0.00000000e+000, 0.00000000e+000],
         [0.00000000e+000, 0.00000000e+000],
         [0.00000000e+000, 0.00000000e+000]],

        [[0.00000000e+000, 0.00000000e+000],
         [0.00000000e+000, 0.00000000e+000]

In [63]:
A.shape

(2, 3, 4, 2)

In [64]:
# The same can be done with:
A = np.ndarray(2*3*4*2).reshape(2, 3, 4, 2)
A.shape

(2, 3, 4, 2)
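
Since `np.ndarray()` leaves the memory uninitialized, functions such as `np.zeros()` or `np.empty()` are the more common way of allocating an N-dimensional array. A short sketch of allocation and indexing in four dimensions:

```python
import numpy as np

# np.zeros (or np.empty, when the initial contents do not matter)
# allocates an array with the requested shape.
A = np.zeros((2, 3, 4, 2))
print(A.ndim, A.shape, A.size)

# One index per dimension selects a single element.
A[1, 2, 3, 0] = 7.0
print(A[1, 2, 3, 0])

# Fewer indices select a sub-array with the remaining dimensions.
print(A[1].shape)
print(A[1, 2].shape)
```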

## Slicing

In [65]:
A = np.array([[j+i*5 for j in range(10)] for i in range(5)])
A

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [ 5,  6,  7,  8,  9, 10, 11, 12, 13, 14],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [15, 16, 17, 18, 19, 20, 21, 22, 23, 24],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]])

* Get all rows of a matrix (the whole matrix):

In [66]:
A[:]

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [ 5,  6,  7,  8,  9, 10, 11, 12, 13, 14],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [15, 16, 17, 18, 19, 20, 21, 22, 23, 24],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]])

In [67]:
A[:,:]

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [ 5,  6,  7,  8,  9, 10, 11, 12, 13, 14],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [15, 16, 17, 18, 19, 20, 21, 22, 23, 24],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]])

In [68]:
%timeit A[:]

511 ns ± 64.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [69]:
%timeit A[:,:] # This is slightly slower than 'A[:]'

559 ns ± 19.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [70]:
# Notation: [starting index : stopping index : step]
# By default, start = 0, stop = size of the dimension, step = 1
A[::]

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [ 5,  6,  7,  8,  9, 10, 11, 12, 13, 14],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [15, 16, 17, 18, 19, 20, 21, 22, 23, 24],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]])

In [71]:
%timeit A[::] # Identical to 'A[:]'

464 ns ± 20.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [72]:
%timeit A # But not to 'A' (no slicing, and therefore no new view, is involved)

40.6 ns ± 1 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)


In [73]:
A[0:]

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [ 5,  6,  7,  8,  9, 10, 11, 12, 13, 14],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [15, 16, 17, 18, 19, 20, 21, 22, 23, 24],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]])

In [74]:
A[0::]

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [ 5,  6,  7,  8,  9, 10, 11, 12, 13, 14],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [15, 16, 17, 18, 19, 20, 21, 22, 23, 24],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]])

In [75]:
A[:A.shape[0]]

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [ 5,  6,  7,  8,  9, 10, 11, 12, 13, 14],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [15, 16, 17, 18, 19, 20, 21, 22, 23, 24],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]])

In [76]:
A[:A.shape[0]:]

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [ 5,  6,  7,  8,  9, 10, 11, 12, 13, 14],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [15, 16, 17, 18, 19, 20, 21, 22, 23, 24],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]])

In [77]:
A[:A.shape[0]:1]

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [ 5,  6,  7,  8,  9, 10, 11, 12, 13, 14],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [15, 16, 17, 18, 19, 20, 21, 22, 23, 24],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]])

* Get all rows of a matrix, except the first one:

In [78]:
A[1:]

array([[ 5,  6,  7,  8,  9, 10, 11, 12, 13, 14],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [15, 16, 17, 18, 19, 20, 21, 22, 23, 24],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]])

In [79]:
A[1::]

array([[ 5,  6,  7,  8,  9, 10, 11, 12, 13, 14],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [15, 16, 17, 18, 19, 20, 21, 22, 23, 24],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]])

* Get the first two rows of a matrix:

In [80]:
A[0:2]

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [ 5,  6,  7,  8,  9, 10, 11, 12, 13, 14]])

* Get the rows with an even index (0, 2, 4):

In [81]:
A[0::2]

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]])

* Get the rows with an odd index (1, 3):

In [82]:
A[1::2]

array([[ 5,  6,  7,  8,  9, 10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19, 20, 21, 22, 23, 24]])

* Get the columns with an odd index (1, 3, 5, 7, 9):

In [83]:
A[:,1::2]

array([[ 1,  3,  5,  7,  9],
       [ 6,  8, 10, 12, 14],
       [11, 13, 15, 17, 19],
       [16, 18, 20, 22, 24],
       [21, 23, 25, 27, 29]])

* Getting the second row:

In [84]:
A[1,:]

array([ 5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

* Getting the third column:

In [85]:
A[:,2]

array([ 2,  7, 12, 17, 22])

* Getting a top-left $2\times 2$ submatrix:

In [86]:
A[:2,:2]

array([[0, 1],
       [5, 6]])

* Getting a bottom-right $2\times 2$ submatrix:

In [87]:
A[A.shape[0]-2:,A.shape[1]-2:]

array([[23, 24],
       [28, 29]])
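
The slices above can be written more compactly with negative indices, and they return views of `A` rather than copies. A minimal sketch (the name `corner` is illustrative):

```python
import numpy as np

A = np.array([[j + i*5 for j in range(10)] for i in range(5)])

# Negative indices count from the end, so this is the bottom-right 2x2 block.
corner = A[-2:, -2:]
print(corner)

# A negative step reverses the order of the rows.
print(A[::-1][0])  # the last row of A

# Basic slices are views: writing to the slice writes to A.
corner[:] = 0
print(A[-1, -1])
```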

## Boolean array indexing

* Finding the elements greater than a threshold (12 in this case):

In [88]:
bool_idx = (A>12)
print(bool_idx)

[[False False False False False False False False False False]
 [False False False False False False False False  True  True]
 [False False False  True  True  True  True  True  True  True]
 [ True  True  True  True  True  True  True  True  True  True]
 [ True  True  True  True  True  True  True  True  True  True]]


* Printing the elements greater than the threshold (the result is always a 1D array):

In [89]:
print(A[bool_idx])

[13 14 13 14 15 16 17 18 19 15 16 17 18 19 20 21 22 23 24 20 21 22 23 24
 25 26 27 28 29]
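
Boolean masks can be combined and can also appear on the left-hand side of an assignment. The following sketch reuses the matrix of the slicing section; the names `mask` and `B` are illustrative:

```python
import numpy as np

A = np.array([[j + i*5 for j in range(10)] for i in range(5)])

# Element-wise conditions are combined with & (and), | (or) and ~ (not);
# the parentheses are needed because of operator precedence.
mask = (A > 12) & (A % 2 == 0)
print(A[mask])

# A mask can also select the elements to be overwritten.
B = A.copy()
B[B > 12] = 12  # clip the values at 12
print(B.max())

# Counting the selected elements:
print(np.count_nonzero(A > 12))
```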


## Elementwise (vectorial-vectorial and vectorial-scalar) math

* Create a zero-filled matrix:

In [90]:
A = np.zeros((5,5), np.int32)
print(A)

[[0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]]


* Set to 1 the elements from coordinate (1,1) to coordinate (3,3) (the stop index, 4, is excluded):

In [91]:
A[1:4,1:4] = 1
print(A)

[[0 0 0 0 0]
 [0 1 1 1 0]
 [0 1 1 1 0]
 [0 1 1 1 0]
 [0 0 0 0 0]]


* Vectorial-scalar addition:

In [92]:
A[1:4, 1:4] += 1
print(A)

[[0 0 0 0 0]
 [0 2 2 2 0]
 [0 2 2 2 0]
 [0 2 2 2 0]
 [0 0 0 0 0]]


* A new matrix:

In [93]:
B = np.ones((5,5), np.int32)
print(B)

[[1 1 1 1 1]
 [1 1 1 1 1]
 [1 1 1 1 1]
 [1 1 1 1 1]
 [1 1 1 1 1]]


* Vectorial addition:

In [94]:
C = A + B
print(C)

[[1 1 1 1 1]
 [1 3 3 3 1]
 [1 3 3 3 1]
 [1 3 3 3 1]
 [1 1 1 1 1]]


* Vectorial subtraction:

In [95]:
D = C - B
print(D)

[[0 0 0 0 0]
 [0 2 2 2 0]
 [0 2 2 2 0]
 [0 2 2 2 0]
 [0 0 0 0 0]]


* Vectorial multiplication (not matrix multiplication!):

In [96]:
C = C * D
print(C)

[[0 0 0 0 0]
 [0 6 6 6 0]
 [0 6 6 6 0]
 [0 6 6 6 0]
 [0 0 0 0 0]]


* Floating-point vectorial division:

In [97]:
C = C / B
print(C)

[[0. 0. 0. 0. 0.]
 [0. 6. 6. 6. 0.]
 [0. 6. 6. 6. 0.]
 [0. 6. 6. 6. 0.]
 [0. 0. 0. 0. 0.]]


* Integer (floor) vectorial division:

In [98]:
C = D // B
print(C)

[[0 0 0 0 0]
 [0 2 2 2 0]
 [0 2 2 2 0]
 [0 2 2 2 0]
 [0 0 0 0 0]]


## Matricial math
Basic matrix computations.

* Let's define a "chessboard" matrix:

In [99]:
A = np.array([[(i+j)%2 for j in range(10)] for i in range(10)])
print(A, A.shape)

[[0 1 0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0 1 0]] (10, 10)


... and a 1-column matrix:

In [100]:
B = np.array([[1] for i in range(10)])
print(B, B.shape)

[[1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]
 [1]] (10, 1)


* Matrix-matrix product:

In [101]:
C = A @ B
print(C)

[[5]
 [5]
 [5]
 [5]
 [5]
 [5]
 [5]
 [5]
 [5]
 [5]]
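
To make the difference between `@` and `*` explicit, this sketch rebuilds the same two matrices and compares the matrix product with `np.matmul()`, with a row-wise sum, and with the element-wise product:

```python
import numpy as np

A = np.array([[(i + j) % 2 for j in range(10)] for i in range(10)])
B = np.ones((10, 1), dtype=A.dtype)

C = A @ B
# The @ operator is the matrix product, also available as np.matmul().
print(np.array_equal(C, np.matmul(A, B)))
# Multiplying by a column of ones adds up each row.
print(np.array_equal(C, A.sum(axis=1, keepdims=True)))
# In contrast, * is element-wise and broadcasts B over the columns of A.
print((A * B).shape)
```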


* Sum of all elements of a matrix:

In [102]:
print(np.sum(C))

50


In [103]:
print(np.sum(A))

50


* Compute the maximum of a matrix:

In [104]:
print(np.max(C))

5


* Matrix transpose:

In [105]:
print(C.T, C.T.shape, C.shape)

[[5 5 5 5 5 5 5 5 5 5]] (1, 10) (10, 1)


* Determinant:

In [106]:
np.linalg.det(A)

0.0
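
The zero determinant says that the chessboard matrix is singular (it has no inverse). As a complementary check, sketched here with `np.linalg.matrix_rank()`, the matrix has only two different rows and therefore rank 2:

```python
import numpy as np

A = np.array([[(i + j) % 2 for j in range(10)] for i in range(10)])

# The chessboard has only two different rows, so its rank is 2, not 10.
rank = np.linalg.matrix_rank(A)
print(rank)

# A square matrix is invertible only if its rank equals its size,
# which is equivalent to a nonzero determinant.
print(rank == A.shape[0])
print(np.isclose(np.linalg.det(A), 0.0))
```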

* Inverse:

In [107]:
R = np.random.rand(5,5)
iR = np.linalg.inv(R)
print(iR)

[[ 3.04615348 -3.99257752  2.18308157 -3.17264096  0.21177375]
 [ 2.81297654 -3.61776273  1.44009232  0.05570666 -1.53259797]
 [-5.74337648  6.78089523 -3.38472131  2.9005613   2.56130709]
 [ 5.66250708 -9.11317019  5.71411408 -2.10742506 -3.16855396]
 [-2.92155326  5.68913015 -2.67107639  1.27939171  1.3434498 ]]


In [108]:
np.round(R @ iR)

array([[ 1.,  0., -0.,  0.,  0.],
       [ 0.,  1.,  0., -0., -0.],
       [-0., -0.,  1., -0.,  0.],
       [-0., -0., -0.,  1.,  0.],
       [-0., -0., -0.,  0.,  1.]])

In [109]:
R @ iR

array([[ 1.00000000e+00,  8.88178420e-16, -4.44089210e-16,
         2.22044605e-16,  0.00000000e+00],
       [ 4.44089210e-16,  1.00000000e+00,  4.44089210e-16,
        -4.44089210e-16, -2.22044605e-16],
       [-4.44089210e-16, -1.77635684e-15,  1.00000000e+00,
        -2.22044605e-16,  0.00000000e+00],
       [-2.22044605e-16, -8.88178420e-16, -4.44089210e-16,
         1.00000000e+00,  0.00000000e+00],
       [-4.44089210e-16, -4.44089210e-16, -4.44089210e-16,
         5.55111512e-16,  1.00000000e+00]])

In [110]:
np.round(iR @ R)

array([[ 1., -0., -0., -0., -0.],
       [ 0.,  1.,  0.,  0.,  0.],
       [ 0.,  0.,  1.,  0., -0.],
       [-0., -0., -0.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  1.]])

In [111]:
iR @ R

array([[ 1.00000000e+00, -8.88178420e-16, -6.10622664e-16,
        -1.20389809e-15, -1.77635684e-15],
       [ 0.00000000e+00,  1.00000000e+00,  2.22044605e-16,
         2.77555756e-16,  8.88178420e-16],
       [ 0.00000000e+00,  4.44089210e-16,  1.00000000e+00,
         5.55111512e-17, -1.77635684e-15],
       [-8.88178420e-16, -4.44089210e-16, -4.44089210e-16,
         1.00000000e+00,  0.00000000e+00],
       [ 4.44089210e-16,  0.00000000e+00,  0.00000000e+00,
         1.38777878e-16,  1.00000000e+00]])

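* Pseudo-inverse (for a $5\times 4$ matrix of full column rank, `iR @ R` is the $4\times 4$ identity, but `R @ iR` is only a projection, not the identity):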

In [112]:
R = np.random.rand(5,4)
iR = np.linalg.pinv(R)
print(iR)

[[-1.07944754  0.64564343  0.90869325  1.13830672 -1.03016148]
 [ 1.59320169  0.41634817 -0.21057479 -1.86911381  0.69649517]
 [ 1.1421628  -2.37587356 -2.06435237 -1.86485779  6.30466481]
 [-0.48632967  2.01509728  1.70432222  2.46448329 -6.05242327]]


In [113]:
np.round(R @ iR)

array([[ 1.,  0., -0.,  0.,  0.],
       [ 0.,  1.,  0., -0., -0.],
       [-0.,  0.,  0.,  0.,  0.],
       [ 0., -0.,  0.,  1., -0.],
       [ 0., -0.,  0., -0.,  1.]])

In [114]:
R @ iR

array([[ 0.97202495,  0.10990743, -0.12082367,  0.01891353,  0.01251877],
       [ 0.10990743,  0.56819937,  0.474688  , -0.07430686, -0.04918331],
       [-0.12082367,  0.474688  ,  0.47816497,  0.08168717,  0.0540683 ],
       [ 0.01891353, -0.07430686,  0.08168717,  0.98721283, -0.00846376],
       [ 0.01251877, -0.04918331,  0.0540683 , -0.00846376,  0.99439788]])

In [115]:
np.round(iR @ R)

array([[ 1.,  0.,  0.,  0.],
       [-0.,  1., -0.,  0.],
       [-0., -0.,  1., -0.],
       [ 0.,  0.,  0.,  1.]])

In [116]:
iR @ R

array([[ 1.00000000e+00,  1.66533454e-16,  1.66533454e-16,
         2.77555756e-17],
       [-4.44089210e-16,  1.00000000e+00, -1.66533454e-16,
         4.16333634e-17],
       [-1.77635684e-15, -1.77635684e-15,  1.00000000e+00,
        -4.44089210e-16],
       [ 0.00000000e+00,  8.88178420e-16,  0.00000000e+00,
         1.00000000e+00]])
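
A typical use of the pseudo-inverse is solving an overdetermined linear system in the least-squares sense. The sketch below assumes a seeded random generator (only for repeatability) and an illustrative right-hand side `b`; it checks two of the Moore-Penrose conditions and compares the result with `np.linalg.lstsq()`:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable
R = rng.random((5, 4))          # 5 equations, 4 unknowns
b = rng.random(5)

iR = np.linalg.pinv(R)

# These identities hold even though R @ iR is not the identity.
print(np.allclose(R @ iR @ R, R))
print(np.allclose(iR @ R @ iR, iR))

# x = iR @ b is the least-squares solution of the system R x = b.
x = iR @ b
x_lstsq, *_ = np.linalg.lstsq(R, b, rcond=None)
print(np.allclose(x, x_lstsq))
```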

## Broadcasting
In vectorized operations, NumPy "stretches" scalars, and arrays that have a dimension of size 1, along that dimension to match the shape of the other operand, without actually copying the data.

In [117]:
A = np.ones((5,3))
A

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

In [118]:
B = np.arange(1)
B

array([0])

In [119]:
B += 1
B

array([1])

* Broadcasting of a 1-element array (treated as a $1\times 1$ matrix):

In [120]:
A + B # 'A' is 5x3 and 'B' has shape (1,)

array([[2., 2., 2.],
       [2., 2., 2.],
       [2., 2., 2.],
       [2., 2., 2.],
       [2., 2., 2.]])

* Broadcasting of a 3-element array (treated as a $1\times 3$ matrix):

In [121]:
B = np.arange(3)
B

array([0, 1, 2])

In [122]:
A + B # 'A' is 5x3 and 'B' has shape (3,)

array([[1., 2., 3.],
       [1., 2., 3.],
       [1., 2., 3.],
       [1., 2., 3.],
       [1., 2., 3.]])

* Broadcasting of a $5\times 1$ matrix:

In [123]:
B = np.arange(5)
B

array([0, 1, 2, 3, 4])

In [125]:
B = B.reshape((5,1)) # (Rows, Columns)
B

array([[0],
       [1],
       [2],
       [3],
       [4]])

In [126]:
A + B

array([[1., 1., 1.],
       [2., 2., 2.],
       [3., 3., 3.],
       [4., 4., 4.],
       [5., 5., 5.]])

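* The shapes are compared dimension by dimension, starting from the last one: two dimensions are compatible when they are equal or when one of them is 1. Otherwise a `ValueError` (`operands could not be broadcast together`) is raised: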

In [127]:
B = np.arange(4)[:, None]
B

array([[0],
       [1],
       [2],
       [3]])

In [128]:
A.shape

(5, 3)

In [129]:
B.shape

(4, 1)

In [130]:
try:
    A + B
except ValueError as e:
    print("ValueError exception:", e)

ValueError exception: operands could not be broadcast together with shapes (5,3) (4,1) 
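
The compatibility rule can be written down as a few lines of Python. The helper `broadcast_shape()` below is hypothetical (it is not part of NumPy) and is only a sketch of the rule, applied to the shapes used in this section:

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Return the broadcast shape of two shapes, or None if incompatible."""
    result = []
    # Compare the dimensions from the last one backwards; a missing
    # dimension behaves like a dimension of size 1.
    for k in range(1, max(len(shape_a), len(shape_b)) + 1):
        a = shape_a[-k] if k <= len(shape_a) else 1
        b = shape_b[-k] if k <= len(shape_b) else 1
        if a != b and a != 1 and b != 1:
            return None
        result.append(b if a == 1 else a)
    return tuple(reversed(result))

print(broadcast_shape((5, 3), (1,)))
print(broadcast_shape((5, 3), (3,)))
print(broadcast_shape((5, 3), (5, 1)))
print(broadcast_shape((5, 3), (4, 1)))  # incompatible shapes
```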


## How fast is Numpy's array math?

In [131]:
A = np.array([[(i*10+j) for j in range(10)] for i in range(10)])
print(A, A.shape)

[[ 0  1  2  3  4  5  6  7  8  9]
 [10 11 12 13 14 15 16 17 18 19]
 [20 21 22 23 24 25 26 27 28 29]
 [30 31 32 33 34 35 36 37 38 39]
 [40 41 42 43 44 45 46 47 48 49]
 [50 51 52 53 54 55 56 57 58 59]
 [60 61 62 63 64 65 66 67 68 69]
 [70 71 72 73 74 75 76 77 78 79]
 [80 81 82 83 84 85 86 87 88 89]
 [90 91 92 93 94 95 96 97 98 99]] (10, 10)


In [132]:
A[:1] # First row (a matrix)

array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])

In [133]:
A[:1].shape

(1, 10)

In [134]:
A[:1][0] # First row of a one-row matrix (a vector)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [135]:
A[:1][0].shape

(10,)

In [136]:
B = A[:1][0]
B

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

* Add `B[]` to all the rows of `A[][]` using scalar arithmetic:

In [137]:
C = np.empty_like(A)
def add():
    for i in range(A.shape[0]):     # Rows
        for j in range(A.shape[1]): # Columns
            C[i, j] = A[i, j] + B[j]
%timeit add()
print(C)

127 µs ± 5.14 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
[[  0   2   4   6   8  10  12  14  16  18]
 [ 10  12  14  16  18  20  22  24  26  28]
 [ 20  22  24  26  28  30  32  34  36  38]
 [ 30  32  34  36  38  40  42  44  46  48]
 [ 40  42  44  46  48  50  52  54  56  58]
 [ 50  52  54  56  58  60  62  64  66  68]
 [ 60  62  64  66  68  70  72  74  76  78]
 [ 70  72  74  76  78  80  82  84  86  88]
 [ 80  82  84  86  88  90  92  94  96  98]
 [ 90  92  94  96  98 100 102 104 106 108]]


* Add `B[]` to all the rows of `A[][]` using vectorial computation:

In [138]:
C = np.empty_like(A)
def add():
    for i in range(A.shape[0]): # Rows
        C[i, :] = A[i, :] + B
%timeit add()
print(C)

42.8 µs ± 2.4 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
[[  0   2   4   6   8  10  12  14  16  18]
 [ 10  12  14  16  18  20  22  24  26  28]
 [ 20  22  24  26  28  30  32  34  36  38]
 [ 30  32  34  36  38  40  42  44  46  48]
 [ 40  42  44  46  48  50  52  54  56  58]
 [ 50  52  54  56  58  60  62  64  66  68]
 [ 60  62  64  66  68  70  72  74  76  78]
 [ 70  72  74  76  78  80  82  84  86  88]
 [ 80  82  84  86  88  90  92  94  96  98]
 [ 90  92  94  96  98 100 102 104 106 108]]


* Add `B[]` to all the rows of `A[][]` using fully vectorial computation:

In [139]:
%timeit A + B # <- broadcasting is faster
C = A + B # %timeit does not keep the variables it assigns
print(C)

4.66 µs ± 297 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
[[  0   2   4   6   8  10  12  14  16  18]
 [ 10  12  14  16  18  20  22  24  26  28]
 [ 20  22  24  26  28  30  32  34  36  38]
 [ 30  32  34  36  38  40  42  44  46  48]
 [ 40  42  44  46  48  50  52  54  56  58]
 [ 50  52  54  56  58  60  62  64  66  68]
 [ 60  62  64  66  68  70  72  74  76  78]
 [ 70  72  74  76  78  80  82  84  86  88]
 [ 80  82  84  86  88  90  92  94  96  98]
 [ 90  92  94  96  98 100 102 104 106 108]]
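
Outside IPython, the same comparison can be made with the standard `timeit` module. This is a sketch with illustrative function names (`add_loops`, `add_broadcast`); the measured times depend on the machine, so no figures are shown:

```python
import timeit
import numpy as np

A = np.arange(100).reshape(10, 10)
B = A[0]

def add_loops():
    C = np.empty_like(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            C[i, j] = A[i, j] + B[j]
    return C

def add_broadcast():
    return A + B

# Both versions compute the same matrix ...
print(np.array_equal(add_loops(), add_broadcast()))

# ... but take different times.
for f in (add_loops, add_broadcast):
    seconds = timeit.timeit(f, number=1000)
    print(f"{f.__name__}: {seconds / 1000 * 1e6:.2f} µs per call")
```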


## Structured arrays

* Create a 1D array of (two) records, where each record has the structure (int, float, char[10]).

In [140]:
X = np.array([(1, 2., "Hello"), (3, 4., "World")],
             dtype=[("first", "i4"),("second", "f4"), ("third", "S10")]) # See struct
X

array([(1, 2., b'Hello'), (3, 4., b'World')],
      dtype=[('first', '<i4'), ('second', '<f4'), ('third', 'S10')])

* Get the first element of every record:

In [141]:
X["first"]

array([1, 3], dtype=int32)

* Get the first record:

In [142]:
X[0]

(1, 2., b'Hello')

* Get the second element of every record:

In [143]:
X["second"]

array([2., 4.], dtype=float32)

* Third element of every record:

In [144]:
X["third"]

array([b'Hello', b'World'], dtype='|S10')
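
Fields behave like ordinary arrays, so they can be used to filter, sort and update the records. A brief sketch on the same structured array:

```python
import numpy as np

X = np.array([(1, 2., "Hello"), (3, 4., "World")],
             dtype=[("first", "i4"), ("second", "f4"), ("third", "S10")])

# A field can be combined with boolean indexing to filter the records.
selected = X[X["second"] > 3.0]
print(selected)

# np.sort() can order the records by one of the fields.
descending = np.sort(X, order="second")[::-1]
print(descending["first"])

# A field is a view: writing to it modifies the records.
X["first"] += 10
print(X["first"])
```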

## Disk I/O

* Output data to an ASCII file:

In [145]:
Data = np.array([[1., 200.], [2., 150.], [3., 250.]])
np.savetxt("data.txt", Data)
!cat data.txt

1.000000000000000000e+00 2.000000000000000000e+02
2.000000000000000000e+00 1.500000000000000000e+02
3.000000000000000000e+00 2.500000000000000000e+02


* Input data from an ASCII file:

In [146]:
np.genfromtxt('data.txt')

array([[  1., 200.],
       [  2., 150.],
       [  3., 250.]])
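
`np.savetxt()` also accepts a format, a delimiter and a header line. The sketch below writes a small CSV file into a temporary folder (the file name `data.csv` and the column names are assumptions for the example) and reads it back with `np.loadtxt()`:

```python
import os
import tempfile
import numpy as np

Data = np.array([[1., 200.], [2., 150.], [3., 250.]])
path = os.path.join(tempfile.mkdtemp(), "data.csv")

# The header is written as a comment line (prefixed with "# ").
np.savetxt(path, Data, fmt="%.1f", delimiter=",", header="x,y")
with open(path) as f:
    text = f.read()
print(text)

# np.loadtxt() skips the comment line and parses the rest.
loaded = np.loadtxt(path, delimiter=",")
print(loaded)
```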

* Output data to a binary file (using the native endianness):

In [147]:
with open("data.float64", mode="wb") as ofile: # The file is flushed and closed on exit
    Data.tofile(ofile)

* Input data from a binary file (using the native endianness):

In [148]:
np.fromfile("data.float64", dtype=np.float64)

array([  1., 200.,   2., 150.,   3., 250.])

* NumPy and C use the same endianness (the native byte order of the machine):

In [149]:
!cat create_float64.c
!gcc create_float64.c -o create_float64
!./create_float64

#include <stdio.h>

#define N 10

int main() {
  double a[N];
  int i;
  FILE *ofile = fopen("data.float64", "wb");
  for(i=0; i<N; i++) {
    a[i] = i;
  }
  fwrite(a, sizeof(double), N, ofile);
  fclose(ofile);
  fprintf(stderr,"create_float64: done\n");
}
create_float64: done


In [150]:
np.fromfile("data.float64", dtype=np.float64)

array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])

* Specifying the endianness (reading little-endian data as big-endian gives meaningless values):

In [151]:
np.fromfile("data.float64", dtype=">d") # (> = big-endian, d = double, see struct)

array([0.00000e+000, 3.03865e-319, 3.16202e-322, 1.04347e-320,
       2.05531e-320, 2.56124e-320, 3.06716e-320, 3.57308e-320,
       4.07901e-320, 4.33197e-320])
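
The effect of the byte order can be reproduced on any machine by fixing the endianness when writing. This sketch writes to a temporary folder, reads the file with both byte orders, and recovers the values with `byteswap()`; the names `right` and `wrong` are illustrative:

```python
import os
import tempfile
import numpy as np

data = np.arange(10, dtype=np.float64)
path = os.path.join(tempfile.mkdtemp(), "data.float64")

# '<f8' forces little-endian 64-bit floats in the file, whatever the machine.
data.astype("<f8").tofile(path)

right = np.fromfile(path, dtype="<f8")  # same byte order as the file
wrong = np.fromfile(path, dtype=">f8")  # bytes read in the opposite order

print(np.array_equal(right, data))
print(np.array_equal(wrong, data))
# Swapping the bytes of every element recovers the original values.
print(np.array_equal(wrong.byteswap(), data))
```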

* Making things easier with NumPy's own `.npy` format, which stores the dtype and the shape along with the data:

In [167]:
ofile = open("data.npy", mode="wb")
A = (100*np.random.rand(2,3)).astype(np.uint16)
print(A)

[[79 86 76]
 [35 79 67]]


In [168]:
np.save(ofile, A)
ofile.close() # Make sure that the data is flushed to disk before loading it

In [169]:
!ls save*

save_data


In [170]:
np.load("data.npy")

array([[79, 86, 76],
       [35, 79, 67]], dtype=uint16)
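
`np.save()` also accepts a file name, in which case NumPy opens and closes the file itself, and `np.savez()` stores several arrays in a single `.npz` archive. A minimal sketch that writes to a temporary folder (the keyword names `first` and `second` are arbitrary labels chosen for the example):

```python
import os
import tempfile
import numpy as np

A = np.arange(6, dtype=np.uint16).reshape(2, 3)
B = np.linspace(0., 1., 5)
folder = tempfile.mkdtemp()

# Passing a file name lets NumPy open and close the file itself.
npy_path = os.path.join(folder, "data.npy")
np.save(npy_path, A)
A2 = np.load(npy_path)
print(A2.dtype, A2.shape)  # the dtype and the shape are restored

# np.savez() stores several named arrays in a single .npz file.
npz_path = os.path.join(folder, "data.npz")
np.savez(npz_path, first=A, second=B)
with np.load(npz_path) as archive:
    names = sorted(archive.files)
    B2 = archive["second"]
print(names)
```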