# Python for High Performance Computing
# Conclusion
<hr style="border: solid 4px green">
<br>
<center> <img src="images/arc_logo.png" alt="Logo" style="float: center; width: 20%"></center>
<br>
## http://www.arc.ox.ac.uk
## support@arc.ox.ac.uk

## Overview
<hr style="border: solid 4px green">

### We have examined
* the sources of Python's slow performance in scientific computation
* `NumPy`
  * array types that are suited for scientific codes
  * the basis for other Python modules
* single-host execution
  * serial: `NumPy`
  * parallel: `numba`, `Cython`, `multiprocessing`
* distributed parallel execution
  * `mpi4py`

## What to use when?
<hr style="border: solid 4px green">

### Assuming greenfield programming
* `NumPy` should be the first port of call
  * vectorised operations
  * ufuncs and specialised class methods
  * the basis of further approaches
* `numba` and `Cython` are usually easy to apply
  * should be tried next
  * often large gains for little effort
* `multiprocessing` is relatively easy
  * optimal for mapping functions to data in parallel
  * best case: expensive functions and little data to map to
  * limited to single host execution
* `mpi4py` is relatively difficult
  * may pay off if multi-host execution dominates
  * inter-process communication should be kept to a minimum
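
A minimal sketch of the first step above, assuming only that `NumPy` is installed. The function names `norm_loop` and `norm_vectorised` are made up for illustration; they compare an interpreted loop with the equivalent ufunc and array-method call.

```python
import numpy as np


def norm_loop(x):
    """Euclidean norm computed element by element in the interpreter."""
    total = 0.0
    for value in x:
        total += value * value
    return total ** 0.5


def norm_vectorised(x):
    """Same result using a ufunc (np.square) and an array method (sum)."""
    return np.sqrt(np.square(x).sum())


x = np.linspace(0.0, 1.0, 10_001)

# Both versions agree up to floating-point rounding; the vectorised one
# moves the loop out of the interpreter and into compiled NumPy code.
assert np.isclose(norm_loop(x), norm_vectorised(x))
```

Timing the two functions (for example with `%timeit` in a notebook) is a quick way to judge whether vectorisation alone is enough before reaching for `numba`, `Cython` or `multiprocessing`.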
  
### But...
* first profile, then re-program
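
One possible way to follow this advice is the standard-library profiler. The sketch below uses `cProfile` and `pstats`; the functions `slow_sum_of_squares` and `workload` are hypothetical stand-ins for real application code.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Deliberately naive loop that should dominate the profile."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def workload():
    """Hypothetical application entry point."""
    return [slow_sum_of_squares(20_000) for _ in range(20)]


# Run the workload under the profiler.
profiler = cProfile.Profile()
profiler.runcall(workload)

# Collect the five most expensive entries, sorted by cumulative time.
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats("cumulative").print_stats(5)
profile_text = report.getvalue()
print(profile_text)
```

The report shows where the time is actually spent, so only the functions at the top of the list need to be considered for rewriting.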

## <span style="font-family: Courier New, Courier, monospace;">cython</span> and <span style="font-family: Courier New, Courier, monospace;">numba</span> (cont'd)
<hr style="border: solid 4px green">

### Which to use?
<br><br>

### Short answer: it depends.
<br><br>

### Long answer: it depends on several factors.
* `cython` is a safer bet
  * stable and mature
  * easier to distribute than `numba`, a better option for user-facing libraries
  * widely used across the scientific Python stack, including `NumPy`, `SciPy`, `pandas` and `scikit-learn`
  * shared-memory parallelism is easy to program
  * can compile almost any Python code, and can call C directly

* `numba` is maturing fast
  * increasing number of features
  * few libraries use `numba`
  * `numba` only accelerates code that uses scalars or N-dimensional arrays; built-in types (such as lists or dictionaries) and custom classes are not supported, or only in restricted form
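
As an illustration of the kind of code `numba` handles well, here is a minimal sketch of a scalar loop over a `NumPy` array. The function `pairwise_max_gap` is a made-up example, and the `try`/`except` fallback is an assumption added so that the snippet still runs as plain Python when `numba` is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback (assumption for portability): without numba the function
    # simply runs as ordinary, uncompiled Python.
    def njit(func):
        return func


@njit
def pairwise_max_gap(values):
    """Largest absolute difference between neighbouring elements."""
    largest = 0.0
    for i in range(values.shape[0] - 1):
        gap = abs(values[i + 1] - values[i])
        if gap > largest:
            largest = gap
    return largest


data = np.array([0.0, 1.5, 1.0, 4.0, 3.5])
print(pairwise_max_gap(data))
```

The function body uses only scalars and array indexing, which is the pattern described above; passing a plain Python list of mixed objects instead of an array is where problems would be expected.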
<br><br>

### Conclusion
* `cython` is the better option in general
  * suitable for larger projects
  * an entire module (including advanced Python features) can be "cythonized" easily
  * the bottlenecks for speed can be targeted with little effort
  * parallel processing is an easy add-on
* `numba` can be a very good idea
  * suitable for smaller projects
  * Python data types limited to what `numba` can handle
  * serial processing is acceptable

<img src="../../images/reusematerial.png" style="float: center; width: 90">
<br>
<br>