# Data science libraries - Chapter 3 - Lesson 1: Introduction to NumPy


## 1. Introduction

The main object of NumPy is the homogeneous multidimensional array. It is an array of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers. In NumPy, the dimensions are called `axes`.

For example, the coordinates of a point in 3D space, `[1, 2, 1]`, form an array with one axis. This axis contains 3 elements, so we say it has a length of 3.
In the example shown below, the array has 2 axes. The first axis has a length of 2, the second axis has a length of 3.

```
[[1., 0., 0.],
 [0., 1., 2.]]
```
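
As a quick check of the two examples above, the following sketch builds both arrays and reads their number of axes and the length of each axis. The variable names `point` and `grid` are illustrative only, and the `ndim` and `shape` attributes are described in the attribute table of this section.

```python
import numpy as np

# A point in 3D space: one axis of length 3
point = np.array([1, 2, 1])
print(point.ndim)   # number of axes
print(point.shape)  # length of each axis

# Two axes: the first of length 2, the second of length 3
grid = np.array([[1., 0., 0.],
                 [0., 1., 2.]])
print(grid.ndim)
print(grid.shape)
```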

The class of a NumPy array is called `ndarray`. It is also known by the alias `array`.

The most important attributes of an `ndarray` object are:

| Attribute | Description |
| ------- | ----------- |
| `ndarray.ndim` | the number of axes (dimensions) of the array. |
| `ndarray.shape` | the dimensions of the array. This is a tuple of integers indicating the size of the array in each dimension. For a matrix with `n` rows and `m` columns, the `shape` will be `(n,m)`. The length of the shape tuple is therefore the number of axes, `ndim`. |
| `ndarray.size` | the total number of elements in the array. This is equal to the product of the elements of `shape`. |
| `ndarray.dtype` | an object describing the type of the elements in the array. You can create or specify dtypes using standard Python types. In addition, NumPy provides its own types. `numpy.int32`, `numpy.int16` and `numpy.float64` are some examples. |
| `ndarray.itemsize` | the size in bytes of each element of the array. For example, an array of elements of type `float64` has an `itemsize` of 8 (=64/8), while one of type `int32` has an `itemsize` of 4 (=32/8). It is equivalent to `ndarray.dtype.itemsize`. |
| `ndarray.data` | the buffer containing the actual elements of the array. Normally we will not need to use this attribute as we will access the elements of an array using indexing functions. |

Some examples: 

In [None]:
import numpy as np

# Create a 2-dimensional array
a = np.arange(15).reshape(3, 5)
print(a)

# Inspect its attributes
print(a.shape)
print(a.ndim)
print(a.dtype.name)
print(a.itemsize)
print(a.size)
print(type(a))


# Create a 1-dimensional array
b = np.array([6, 7, 8])
print(b)
print(type(b))


[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]]
(3, 5)
2
int64
8
15
<class 'numpy.ndarray'>
[6 7 8]
<class 'numpy.ndarray'>
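
The relationships between these attributes can be checked directly. The sketch below is only an illustration: it confirms that the length of `shape` equals `ndim`, that `size` is the product of the axis lengths, and that `itemsize` follows from the `dtype`. The loop over three dtypes is an arbitrary choice made for the example.

```python
import numpy as np

a = np.arange(15).reshape(3, 5)

# The length of the shape tuple is the number of axes
print(len(a.shape) == a.ndim)

# The total number of elements is the product of the axis lengths
print(a.size == a.shape[0] * a.shape[1])

# itemsize depends only on the dtype: number of bits divided by 8
for dtype in (np.int16, np.int32, np.float64):
    arr = np.zeros(4, dtype=dtype)
    print(arr.dtype.name, arr.itemsize)
```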


## 2. Creating arrays

There are several ways to create NumPy arrays.

For example, you can create an array from a list or a standard Python tuple using the `array` function. The type of the resulting array is deduced from the type of the elements in the sequences.


In [None]:
import numpy as np

a = np.array([2, 3, 4])
print(a)
print(a.dtype)

b = np.array([1.2, 3.5, 5.1])
print(b)
print(b.dtype)

# Be careful: np.array expects a single sequence, not several scalar arguments.
# a = np.array(1, 2, 3, 4)  # WRONG: raises TypeError
a = np.array([1, 2, 3, 4])  # RIGHT

[2 3 4]
int64
[1.2 3.5 5.1]
float64

`array` transforms sequences of sequences into two-dimensional arrays, sequences of sequences of sequences into three-dimensional arrays, etc.

In [None]:
b = np.array([(1.5, 2, 3), (4, 5, 6)])
print(b)
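
To make the effect of nesting and of type deduction visible, here is a minimal sketch that prints the shape and `dtype` of a few arrays. The arrays `c` and `d` are hypothetical examples added for illustration; `dtype` is a standard keyword argument of `np.array`.

```python
import numpy as np

# Mixed ints and floats: the whole array is stored as float64
b = np.array([(1.5, 2, 3), (4, 5, 6)])
print(b.shape, b.dtype)

# Three levels of nesting give three axes
c = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
print(c.ndim, c.shape)

# The type can also be requested explicitly with the dtype argument
d = np.array([[1, 2], [3, 4]], dtype=np.float64)
print(d.dtype)
```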

The `zeros` function creates an array full of zeros, the `ones` function creates an array full of ones, and the `empty` function creates an array whose initial content is uninitialized: the values are whatever happened to be in memory and should not be relied on. By default, the `dtype` of the created array is `float64`, but it can be set with the `dtype` keyword argument.

In [None]:
print(np.zeros((3, 4)))
print(np.ones((2, 3, 4), dtype=np.int16))
print(np.empty((2, 3)))
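
A short sketch of how these three functions might be used in practice follows. Because the content of an array returned by `empty` depends on the state of memory, the example assigns values to it before reading them; the fill value `7.0` is arbitrary.

```python
import numpy as np

z = np.zeros((3, 4))
print(z.dtype)           # default dtype

o = np.ones((2, 3, 4), dtype=np.int16)
print(o.dtype, o.sum())  # 2 * 3 * 4 ones

# empty() only reserves memory, so assign values before reading them
e = np.empty((2, 3))
e[:] = 7.0
print(e)
```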

To create sequences of numbers, NumPy provides the `arange` function which is analogous to the Python `range` function, but returns an `array`.

In [None]:
print(np.arange(10, 30, 5))
print(np.arange(0, 2, 0.3))

When `arange` is used with floating point arguments, it is usually not possible to predict the number of elements obtained, because of finite floating point precision. For this reason, it is generally preferable to use the `linspace` function, which takes the number of elements you want as an argument instead of the step:

In [None]:
from numpy import pi

print(np.linspace(0, 2, 9))

x = np.linspace(0, 2 * pi, 100)
print(x)

f = np.sin(x)
print(f)


[0.   0.25 0.5  0.75 1.   1.25 1.5  1.75 2.  ]
[0.         0.06346652 0.12693304 0.19039955 0.25386607 0.31733259
 0.38079911 0.44426563 0.50773215 0.57119866 0.63466518 0.6981317
 0.76159822 0.82506474 0.88853126 0.95199777 1.01546429 1.07893081
 1.14239733 1.20586385 1.26933037 1.33279688 1.3962634  1.45972992
 1.52319644 1.58666296 1.65012947 1.71359599 1.77706251 1.84052903
 1.90399555 1.96746207 2.03092858 2.0943951  2.15786162 2.22132814
 2.28479466 2.34826118 2.41172769 2.47519421 2.53866073 2.60212725
 2.66559377 2.72906028 2.7925268  2.85599332 2.91945984 2.98292636
 3.04639288 3.10985939 3.17332591 3.23679243 3.30025895 3.36372547
 3.42719199 3.4906585  3.55412502 3.61759154 3.68105806 3.74452458
 3.8079911  3.87145761 3.93492413 3.99839065 4.06185717 4.12532369
 4.1887902  4.25225672 4.31572324 4.37918976 4.44265628 4.5061228
 4.56958931 4.63305583 4.69652235 4.75998887 4.82345539 4.88692191
 4.95038842 5.01385494 5.07732146 5.14078798 5.2042545  5.26772102
 5.33118753 5.394
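
The difference between the two functions can be sketched as follows. The element count of `arange` with a float step is printed rather than assumed, whereas `linspace` receives the count directly and, by default, includes both endpoints.

```python
import numpy as np

# With a float step, the element count follows from floating point rounding,
# so it is printed here rather than assumed
steps = np.arange(0, 2, 0.3)
print(len(steps), steps)

# linspace takes the count directly and includes both endpoints by default
points = np.linspace(0, 2, 9)
print(len(points), points[0], points[-1])
```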

## 3. Displaying arrays
When you print an array, NumPy displays it in a similar way to nested lists, but with the following layout:
 - the last axis is printed from left to right,
 - the second-to-last axis is printed from top to bottom,
 - the remaining axes are also printed from top to bottom, with each slice separated from the next by an empty line.

One-dimensional arrays are then printed as rows, two-dimensional arrays as matrices and three-dimensional arrays as lists of matrices.

In [2]:
import numpy as np
a = np.arange(6)
print(a)

b = np.arange(12).reshape(4, 3)
print(b)

c = np.arange(24).reshape(2, 3, 4)
print(c)

[0 1 2 3 4 5]
[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]


If an array is too large to be printed, NumPy automatically skips the central part of the array and prints only the corners:

In [None]:
print(np.arange(10000))
print(np.arange(10000).reshape(100, 100))

To disable this behaviour and force NumPy to print the entire array, you can change the print options using `set_printoptions`.

In [None]:
import sys
np.set_printoptions(threshold=sys.maxsize)
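
Since `set_printoptions` changes the setting globally, a temporary change may sometimes be preferable. The sketch below assumes NumPy 1.15 or later, where the `np.printoptions` context manager is available; it is one possible approach, not the only one.

```python
import sys
import numpy as np

before = np.get_printoptions()["threshold"]

# Inside the with block, arrays are printed in full
with np.printoptions(threshold=sys.maxsize):
    text = str(np.arange(1200))

# After the block, the previous setting is back in place
after = np.get_printoptions()["threshold"]
print(before == after)
print("..." in text)
```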

## 4. Basic operations

Arithmetic operators on arrays are applied element by element. A new array is created and filled with the result.

In [None]:
a = np.array([20, 30, 40, 50])
b = np.arange(4)
print(a - b)
print(b ** 2)
print(10 * np.sin(a))
print(a < 35)
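
The following sketch illustrates that an arithmetic or comparison operator returns a new array and leaves its operands untouched. The names `diff` and `mask` are chosen only for this example.

```python
import numpy as np

a = np.array([20, 30, 40, 50])
b = np.arange(4)

diff = a - b   # new array; a and b are left unchanged
print(diff)
print(a)

mask = a < 35  # comparisons are applied element by element too
print(mask, mask.dtype)
```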

The product operator `*` operates element by element on NumPy arrays.
The matrix product is computed with the `@` operator or with the `dot` function or method:

In [None]:
A = np.array([[1, 1], [0, 1]])
B = np.array([[2, 0], [3, 4]])
print(A * B)
print(A @ B)
print(A.dot(B))
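
To see how the two products differ, this sketch computes both and writes one entry of the matrix product out by hand. The names `elementwise`, `matrix` and `by_hand` are illustrative.

```python
import numpy as np

A = np.array([[1, 1], [0, 1]])
B = np.array([[2, 0], [3, 4]])

elementwise = A * B  # multiplies matching entries
matrix = A @ B       # rows of A times columns of B

# Entry (0, 0) of the matrix product, written out by hand
by_hand = A[0, 0] * B[0, 0] + A[0, 1] * B[1, 0]

print(elementwise)
print(matrix)
print(by_hand == matrix[0, 0])
print(np.array_equal(matrix, A.dot(B)))
```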

## 5. Universal functions

NumPy provides well-known mathematical functions such as `sin`, `cos` and `exp`. In NumPy, these are called "universal functions" (`ufunc`). They operate element by element on an array and produce an array as output.

In [None]:
B = np.arange(3)
print(B)

print(np.exp(B))
print(np.sqrt(B))

C = np.array([2., -1., 4.])
print(C)
print(np.add(B, C))
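
A few properties of universal functions can be verified with a short sketch. It relies on the `np.ufunc` class, of which functions such as `np.exp` and `np.add` are instances, and it compares `np.add` with the `+` operator.

```python
import numpy as np

B = np.arange(3)
C = np.array([2., -1., 4.])

# exp and add are instances of the np.ufunc class
print(isinstance(np.exp, np.ufunc), isinstance(np.add, np.ufunc))

# The + operator on arrays gives the same result as np.add
print(np.array_equal(np.add(B, C), B + C))

# The output has the same shape as the input
print(np.sqrt(B).shape == B.shape)
```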

Other useful functions, not all of which are ufuncs, include: `all`, `any`, `apply_along_axis`, `argmax`, `argmin`, `argsort`, `average`, `bincount`, `ceil`, `clip`, `conj`, `corrcoef`, `cov`, `cross`, `cumprod`, `cumsum`, `diff`, `dot`, `floor`, `inner`, `invert`, `lexsort`, `max`, `maximum`, `mean`, `median`, `min`, `minimum`, `nonzero`, `outer`, `prod`, `real`, `round`, `sort`, `std`, `sum`, `trace`, `transpose`, `var`, `vdot`, `vectorize`, `where`.
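
As an illustration, here is a small sketch that uses a handful of the functions listed above on a 3 x 4 array. The `axis` keyword, which is not covered in the text, selects the axis along which a reduction is applied; the array `m` is a made-up example.

```python
import numpy as np

m = np.arange(12).reshape(3, 4)

print(m.sum())                # sum over all elements
print(m.sum(axis=0))          # one sum per column
print(m.min(axis=1))          # one minimum per row
print(m.cumsum(axis=1))       # running sum along each row
print(np.argmax(m))           # index of the largest element in the flattened array
print(np.where(m > 8, m, 0))  # keep values above 8, replace the rest by 0
```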

## 6. Indexing, slicing and iteration

One-dimensional arrays can be indexed, sliced and iterated, much like lists and other Python sequences.

In [None]:
a = np.arange(10)
print(a)

# Get element
print(a[2])

print(a[2:5])

# equivalent to a[0:6:2] = 1000: every second element from index 0 up to (but excluding) 6
a[:6:2] = 1000
print(a)

# reversed array
print(a[::-1])

for i in a:
    print(i)

Multidimensional arrays can have one index per axis. These indices are given as a comma-separated tuple:

In [None]:
def f(x, y):
    return 10 * x + y

# We can create an array using a function
b = np.fromfunction(f, (5, 4), dtype=int)
print(b)

# Get element
print(b[2, 3])

# each row in the second column of b
# equivalent to : b[:, 1]
print(b[0:5, 1])

# each column in the second and third row of b
print(b[1:3, :])

# the last row. Equivalent to b[-1, :]
print(b[-1])

The bracketed expression in `b[i]` is treated as an `i` followed by as many instances of `:` as are needed to represent the remaining axes. NumPy also allows you to write this using dots, as `b[i, ...]`.

The dots (`...`) represent as many `:` as are needed to produce a complete indexing tuple. For example, if `x` is a 5-axis array, then:
 - `x[1, 2, ...]` is equivalent to `x[1, 2, :, :, :]`
 - `x[..., 3]` is equivalent to `x[:, :, :, :, 3]`
 - `x[4, ..., 5, :]` is equivalent to `x[4, :, :, 5, :]`.
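
These three equivalences can be checked on a concrete array. In the sketch below the shape `(5, 3, 2, 6, 4)` is arbitrary; it is chosen only so that every index used in the list above is valid.

```python
import numpy as np

# A 5-axis array; the shape is arbitrary and chosen only for illustration
x = np.arange(5 * 3 * 2 * 6 * 4).reshape(5, 3, 2, 6, 4)

print(np.array_equal(x[1, 2, ...], x[1, 2, :, :, :]))
print(np.array_equal(x[..., 3], x[:, :, :, :, 3]))
print(np.array_equal(x[4, ..., 5, :], x[4, :, :, 5, :]))

# The two fixed indices remove two axes, leaving the last three
print(x[1, 2, ...].shape)
```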


In [None]:
# a 3D array (two stacked 2D arrays)
c = np.array([[[  0,  1,  2],
             [ 10, 12, 13]],
             [[100, 101, 102],
             [110, 112, 113]]])

print(c.shape)

print(c[1, ...])

print(c[..., 2])

Iteration on multidimensional arrays is done with respect to the first axis:

In [None]:
b = np.array([[  0,  1,  2],
             [ 10, 12, 13],
             [100, 101, 102],
             [110, 112, 113]])

for row in b:
    print(row)


# iterate over every element of the array, across all dimensions
for element in b.flat:
    print(element)
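
The two iteration styles can be compared side by side. This sketch, on a smaller made-up array, collects the results into lists so that they can be inspected: iterating over the array yields rows, while `flat` visits every element in row-major order, which matches what nested loops produce.

```python
import numpy as np

b = np.array([[0, 1, 2],
              [10, 12, 13]])

# Iterating over b yields its rows (sub-arrays along the first axis)
rows = [row.tolist() for row in b]
print(rows)

# b.flat visits every element in row-major order
elements = [int(v) for v in b.flat]
print(elements)

# Nested loops give the same sequence as b.flat
nested = [int(v) for row in b for v in row]
print(nested == elements)
```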