# Introduction to Python for Open Source Geocomputation

![python](pics/python-logo-master-v3-TM.png)

* Instructor: Dr. Wei Kang
* Class Location and Time: ENV 336, Mon & Wed 12:30 pm - 1:50 pm

Content:

* Numpy
* A new data type: `numpy.array`
    * How to create an array
    * Array operations

# What is Numpy?

* The fundamental package for scientific computing with Python
* Nearly every scientist working in Python draws on the power of NumPy.
* NumPy brings the **computational power** of languages like C and Fortran to Python, a language much easier to learn and use. With this power comes **simplicity: a solution in NumPy is often clear and elegant**.
* Essential in many different realms:
    * NumPy lies at the core of a rich ecosystem of **data science** libraries 
<img src="pics/ds-landscape.png" width="500"/>
    * NumPy forms the basis of powerful **machine learning** libraries like [scikit-learn](https://scikit-learn.org/stable/), [SciPy](https://scipy.org/), [TensorFlow](https://www.tensorflow.org/), and [PyTorch](https://pytorch.org/)

    * NumPy is an essential component in the burgeoning Python **visualization landscape**, which includes Matplotlib, Seaborn, Plotly, Altair, Bokeh, Holoviz, Vispy, Napari, and PyVista, to name a few.


## What makes Numpy so important?

*arrays*: A very powerful data type essential to numerical computing: 
* sequences of data all of the _same type_
* behave a lot like lists, except for the constraint on the type of their elements.
    * There is a huge efficiency advantage when you know that **all elements of a sequence are of the same type**, so equivalent operations on arrays execute a lot **faster** than those on lists.

## Numpy `Array`  (or `ndarray`)

* homogeneous multidimensional array
    * a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers 
        * For the data types accepted in NumPy, read the [docs: Data type objects](https://numpy.org/doc/stable/reference/arrays.dtypes.html).
    * dimensions are called _axes_
* An Example: points' coordinates
    * one single point: one-dimensional array: `np.array([1,2])`
    * two or more points: two-dimensional array: 
        * two points: `np.array([[1,2], [3,4]])`
        * five points: `np.array([[1,2], [3,4],[5,6], [7,8], [9,10]])`

In [1]:
import numpy as np

In [2]:
a1 = np.array([1,2])
a1

array([1, 2])

In [3]:
a2 = np.array([[1,2], [3,4],[5,6], [7,8], [9,10]])
a2

array([[ 1,  2],
       [ 3,  4],
       [ 5,  6],
       [ 7,  8],
       [ 9, 10]])

### Motivation (1): What can a NumPy array be used for?

* An array can contain:
     * values of an experiment/simulation at discrete time steps, e.g., income, air pollution, crime rate, animal/plant occurrence
     * pixels of an image, grey-level or colour
     * signal recorded by a measurement device, e.g. sound wave
     * 3-D data measured at different X-Y-Z positions, e.g. MRI scan, digital elevation model
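
As a rough illustration of the cases listed above, the sketch below builds a few small arrays. All names and values are made up for the example; they are not real measurements.

```python
import numpy as np

# Hypothetical values of one variable (e.g., a crime rate) at 5 time steps
series = np.array([3.2, 3.5, 3.1, 2.9, 3.0])

# A tiny 3 x 4 grey-level "image": one intensity value per pixel
grey_image = np.zeros((3, 4))

# A colour image adds a third axis holding the red, green and blue channels
colour_image = np.zeros((3, 4, 3))

# A toy digital elevation model: one elevation value per cell of a 2 x 2 grid
dem = np.array([[120.0, 125.5],
                [118.2, 130.1]])

print(series.ndim, grey_image.ndim, colour_image.ndim, dem.ndim)  # 1 2 3 2
```

The number of axes follows the structure of the data: one axis for a time series, two for a grid or grey-level image, three once a channel axis is added.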

### Motivation (2): Efficiency of Numpy array - an example

* Problem description: Write a Python program that calculates the square of each number in a list, such that $x_i=i^2$, for $0\leq i < n$.

Two data types: 
* Python built-in type: list
* Numpy array

We use [`%timeit`](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-timeit) to measure the execution time of a Python statement or expression.

In [4]:
L = list(range(1000)) #produce a list of integers from 0 to 999
L

[0, 1, 2, ..., 997, 998, 999]


In [5]:
%timeit -n 1000 [i**2 for i in L]  

182 µs ± 3.77 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


In [6]:
import numpy as np
a = np.arange(1000) #produce an array of integers from 0 to 999
a

array([  0,   1,   2, ..., 997, 998, 999])

In [7]:
%timeit -n 1000 a**2

1.1 µs ± 439 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
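
`%timeit` is only available in IPython and Jupyter. In a plain Python script, a similar comparison can be sketched with the standard library `timeit` module, as below. The timings depend on the machine and will not match the numbers above; the names `t_list` and `t_array` are just chosen for this example.

```python
import timeit

import numpy as np

L = list(range(1000))   # Python list of integers from 0 to 999
a = np.arange(1000)     # NumPy array of integers from 0 to 999

# Both approaches produce the same squared values
squares_list = [i**2 for i in L]
squares_array = a**2
same_values = squares_list == squares_array.tolist()

# Total time in seconds for 1000 repetitions of each statement
t_list = timeit.timeit(lambda: [i**2 for i in L], number=1000)
t_array = timeit.timeit(lambda: a**2, number=1000)

print(f"list comprehension: {t_list / 1000 * 1e6:.1f} µs per loop")
print(f"NumPy array:        {t_array / 1000 * 1e6:.1f} µs per loop")
```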


# Importing Numpy

```python
import numpy as np
```

In [8]:
import numpy as np

In [9]:
dir(np) #function dir gives you the package's attributes and functions.

['ALLOW_THREADS',
 'AxisError',
 'BUFSIZE',
 ...
 '_financial_names',
 ...]

### Creating a Numpy Array

* create an array from a regular Python list or tuple using the `array` function.
```python
np.array(list/tuple)
```
* functions from Numpy to create special arrays
    * [`np.arange()`](https://numpy.org/doc/stable/reference/generated/numpy.arange.html): create evenly spaced values within a given interval.
    * [`np.linspace(start, stop, num=50)`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.linspace.html): create evenly spaced numbers over a specified interval.
    * [`np.ones(shape)`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ones.html#numpy.ones): create a new array of given shape and type, filled with ones.
    * [`np.zeros(shape)`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.zeros.html#numpy.zeros): create a new array of given shape and type, filled with zeros.
    * [`np.eye(N)`](https://numpy.org/devdocs/reference/generated/numpy.eye.html): create a 2-D array with ones on the diagonal and zeros elsewhere.
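
A minimal sketch of the creation functions listed above; the argument values and variable names are arbitrary choices for illustration.

```python
import numpy as np

r = np.arange(0, 10, 2)         # start, stop (excluded), step
lin = np.linspace(0, 1, num=5)  # 5 evenly spaced numbers from 0 to 1, both ends included
o = np.ones((2, 3))             # 2 rows and 3 columns, filled with 1.0
z = np.zeros(4)                 # one-dimensional array of four 0.0 values
e = np.eye(3)                   # 3 x 3 array with ones on the diagonal

print(r.tolist())    # [0, 2, 4, 6, 8]
print(lin.tolist())  # [0.0, 0.25, 0.5, 0.75, 1.0]
print(o.shape, z.shape, e.shape)  # (2, 3) (4,) (3, 3)
```

Note that `np.arange` excludes the stop value while `np.linspace` includes it by default, and that `np.ones`, `np.zeros` and `np.eye` return floating-point arrays unless a `dtype` is given.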

In [10]:
a1 = np.array([1,2])
a1

array([1, 2])

In [11]:
type(a1)

numpy.ndarray

In [12]:
a1.size

2

`array.size` gives the number of items in the array.

In [13]:
len(a1)

2

For a one-dimensional array, `len(array)` gives the same result as `array.size`.

In [14]:
a1.ndim

1

`array.ndim` gives the number of axes (dimensions) of the array.

In [15]:
a1.shape

(2,)

`array.shape` gives the dimensions of the array. This is a tuple of integers indicating the **size** of the array in each dimension. For a matrix with n rows and m columns, shape will be (n,m). The length of the shape tuple is therefore the number of axes, ndim.
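
The relation between `shape`, `ndim` and `size` also holds beyond two dimensions. A small sketch with a hypothetical three-dimensional array:

```python
import numpy as np

# Hypothetical stack of 2 layers, each a table with 3 rows and 4 columns
stack = np.zeros((2, 3, 4))

print(stack.shape)       # (2, 3, 4)
print(stack.ndim)        # 3
print(len(stack.shape))  # 3, the length of the shape tuple equals ndim
print(stack.size)        # 24, i.e. 2 * 3 * 4
```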

In [16]:
a1.dtype

dtype('int64')

`array.dtype` returns an object describing the type of the elements in the array.

In [17]:
a_str = np.array([1.0,2,"1"])
a_str

array(['1.0', '2', '1'], dtype='<U32')

In [18]:
a_str.dtype # Unicode string of up to 32 characters

dtype('<U32')
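
Because an array holds a single type, NumPy converts mixed input to one common type, which is why the numbers above became strings. The sketch below shows the same behaviour for integers and floats, and uses the `dtype` argument of `np.array` to request a type explicitly; the variable names are illustrative only.

```python
import numpy as np

ints = np.array([1, 2, 3])                   # all integers: integer dtype
mixed = np.array([1, 2.5, 3])                # integers and a float: all become floats
as_float = np.array([1, 2, 3], dtype=float)  # request floats explicitly

print(ints.dtype.kind)    # i (signed integer; the exact width depends on the platform)
print(mixed.dtype)        # float64
print(as_float.tolist())  # [1.0, 2.0, 3.0]
```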

In [19]:
a2 = np.array([[1,2], [3,4]])
a2

array([[1, 2],
       [3, 4]])

In [20]:
a2.ndim

2

In [21]:
a2.size

4

In [22]:
len(a2)

2

For a two-dimensional array, `len(array)` gives the number of rows, i.e., the size of the first dimension.
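
A short sketch of the difference between `len` and `size`, using a made-up array of three points:

```python
import numpy as np

pts = np.array([[1, 2], [3, 4], [5, 6]])  # three points, two coordinates each

print(len(pts))      # 3, the number of rows
print(pts.shape[0])  # 3, the same value as len(pts)
print(pts.size)      # 6, every element is counted
```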

In [23]:
a2.shape

(2, 2)

In [24]:
a2.dtype

dtype('int64')

In [25]:
a3 = np.array([[1,2], [3,4],[5,6], [7,8], [9,10]])
a3

array([[ 1,  2],
       [ 3,  4],
       [ 5,  6],
       [ 7,  8],
       [ 9, 10]])

In [26]:
a3.ndim

2

In [27]:
len(a3)

5

In [28]:
a3.size

10

In [29]:
a3.shape

(5, 2)

In [30]:
a3.dtype

dtype('int64')

## Further reading

* Read the [NumPy quickstart tutorial](https://numpy.org/doc/stable/user/quickstart.html) to learn more about NumPy's functionality.