# Python for High Performance Computing
# Python revision
<hr style="border: solid 4px green ">
<br>
<center> <img src="images/arc_logo.png"; alt="Logo" style="float: center; width: 20%"></center>
<br>
## http://www.arc.ox.ac.uk
## support@arc.ox.ac.uk

## Using Python
<hr style="border: solid 4px green">

### Python can be run *interactively*
* directly executing the Python interpreter `python`
  * run Python interpreter without an input script file
  * interpreter runs as a Python shell (interactive Python runtime environment)
* indirectly, via a notebook
  * `IPython` -- enhanced Python shell, ideal for data manipulation and visualisation
  * `Jupyter` -- web browser-based interactive document
<br><br>

> The interactive mode is good for presentations, data inspection, *etc.*

## Using Python
<hr style="border: solid 4px green">

### Python can be run *non-interactively*
* supply the Python interpreter with an input script file
* example: `python myscript.py`
<br><br>

> The non-interactive mode is the normal way to execute production code.

## Interactive Python shell
<hr style="border: solid 4px green">

* interactive access to the interpreter via the `python` command (not Windows)
* type commands directly at the prompt

```bash
bash-prompt> python
>>> print "Hello World!"
>>> import numpy
>>> numpy.__version__
```

* use `CTRL-d` to exit
* good for testing simple operations
* however, can be rather limited
  * history lost on exit
  * no completion or integrated help
  * certain things do not work
  * not suitable for *production* code

## Interactive IPython shell
<hr style="border: solid 4px green">

### IPython extends the standard Python shell by a number of useful things
* TAB completion
* help (*e.g.* `help` or `?int`)
* "magic" commands, *e.g.*
  * `%run` (run python script within shell)
  * `%hist` (history of commands)
  * `%save` (save selected input lines to a file, to return to later)
* shell escape to operating system, *e.g.*
  * `!ls -l`

## Non-interactive Python runs
<hr style="border: solid 4px green">

### Use a text editor to edit a script
* script file `hello.py`
```python
print "Hello World!"
```
<br><br>

### The script can be executed
* running from the command-line: `python hello.py`
* running from within an IPython shell using "magic": `%run hello.py`

> *Note*: scripts are ideal for persistent, reusable, large (complex) code.

## <span style="font-family: Courier New, Courier, monospace;">jupyter-notebook</span>
<hr style="border: solid 4px green">

### <span style="font-family: Courier New, Courier, monospace;">jupyter-notebook</span> (formerly <span style="font-family: Courier New, Courier, monospace;">ipython-notebook</span>)
* front-end for IPython, a browser-based application
* very useful for presentations and interactive use
* but *not* so useful for production code
* definitely *not* useful in batch processing

### Starting the notebook (from the command line in Linux and MacOS)
* `jupyter-notebook`
* `jupyter notebook`
* `jupyter notebook --browser=firefox`
<br><br>

### Notebook presentations
* input is split into cells
* cells are "Markdown" (text) and "Code" (Python commands)
* cells are "executed" with `SHIFT+ENTER`
  * "Markdown" cells are displayed
  * "Code" cells execute code
<br><br>

### Example

In [1]:
print "Hello World"

Hello World


## Python data types
<hr style="border: solid 4px green">

### Python is "dynamically typed"
* no explicit type declarations
* data types are worked out from context when code is executed
* basic types include integers, floating point numbers, strings...

In [2]:
# assign values to variables
nmax = 11
pi = 3.14159
string1 = "double quotes"
string2 = 'or single quotes'
print nmax, pi
print string1 + " " + string2

11 3.14159
double quotes or single quotes


In [3]:
# find out type using the inbuilt function...
print type (nmax)
print type (pi)
print type (string1)

<type 'int'>
<type 'float'>
<type 'str'>


## Careful!
<hr style="border: solid 4px green">

* dynamic typing requires some care
* use explicit type casting when needed

In [4]:
# a long variable name
long_variable_name = 1

# is easily mistyped later
long_varaible_name = long_variable_name + 1

# valid code: no errors but buggy execution
print long_variable_name

1


In [5]:
# care with variables that are meant to be integers
nmax = pi

# ...you have just redefined nmax to be a float
print nmax, type (nmax)

3.14159 <type 'float'>


In [6]:
# So use an explicit cast if unsure, e.g.
nmax = int(pi)
print nmax, type (nmax)

3 <type 'int'>
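
A short sketch of what the explicit cast does, written to run under both Python 2 and Python 3. The variable names `truncated`, `truncated_negative` and `nearest` are illustrative only. Note that `int()` truncates towards zero rather than rounding.

```python
pi = 3.14159

# int() truncates towards zero; it does not round to the nearest integer
truncated = int(pi)
truncated_negative = int(-pi)

# round first when the nearest integer is wanted
nearest = int(round(2.71828))

# isinstance() checks the type directly
print(truncated)
print(nearest)
print(isinstance(truncated, int))
```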


## A very likely occurrence: integer division
<hr style="border: solid 4px green">

### Problem
```python
1/2
```
returns `0` rather than `0.5` in Python 2, because both operands are integers.
<br><br>

### Solutions
* cast divisor, *e.g.* `float(2)`
* cast *all* division operations to floating point operations by importing the Python 3 division
```python
from __future__ import division
```

In [7]:
nmax = 100  # integer
xmax = 2    # meant to be float but forgot to write 2.0
# simple division (integers)
dx   = xmax / nmax
print dx, type(dx)
# cast divisor to float
dx   = xmax / float(nmax)
print dx, type(dx)
# exercise: import python3 division

0 <type 'int'>
0.02 <type 'float'>
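
One way to make the intent explicit, independent of the Python version, is sketched below. It reuses the variable names from the cell above; `dx_floor` and `dx_true` are illustrative names. The `//` operator always performs floor division, while a `float()` cast on one operand always gives true division.

```python
nmax = 100  # number of intervals (integer)
xmax = 2    # upper bound, accidentally written as an integer

# floor division: discards the fractional part in Python 2 and Python 3 alike
dx_floor = xmax // nmax

# true division: casting one operand to float forces a floating point result
dx_true = xmax / float(nmax)

print(dx_floor)
print(dx_true)
```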


## Python data structures
<hr style="border: solid 4px green">

* lists
  * `[3, "a", 3.14, False]`
* dictionary (or "dicts")
  * `{"key1" : "value1", "key2" : "value2"}`
* tuples
  * `(1,2,3)` or `(1.2,)` or `(1,)`

In [8]:
lista = [3, "a", 3.14, False]
dictb = {"key1" : 1, "key2" : 2}
tuplec = (1,2,3)

In [9]:
# what do these do?
print lista[2]
print dictb["key2"]
print tuplec[2]

3.14
2
3
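
A few common operations on these structures are sketched below, using the same objects as the cells above. The names `first_two`, `sorted_keys` and `x, y, z` are introduced only for illustration.

```python
lista = [3, "a", 3.14, False]
dictb = {"key1": 1, "key2": 2}
tuplec = (1, 2, 3)

# lists grow in place and support slicing
lista.append("new")
first_two = lista[:2]

# dictionaries map keys to values; new keys can be added at any time
dictb["key3"] = 3
sorted_keys = sorted(dictb.keys())

# tuples can be unpacked into separate variables in one statement
x, y, z = tuplec

print(first_two)
print(sorted_keys)
print(x + y + z)
```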


## Immutable (read-only) objects
<hr style="border: solid 4px green">

Assigning to an element of a tuple raises an error because the tuple is an *immutable* object: an object whose state cannot be modified after it is created.

In [10]:
# the following work
lista[2] = 1
dictb["key2"] = 1

In [11]:
# but now try...
tuplec[2] = 1

TypeError: 'tuple' object does not support item assignment
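
A minimal sketch of how to deal with this: the error can be caught, and a "modified" tuple is obtained by building a new one. The names `assignment_failed` and `new_tuple` are illustrative.

```python
tuplec = (1, 2, 3)

# item assignment on a tuple raises TypeError, which can be caught
try:
    tuplec[2] = 1
    assignment_failed = False
except TypeError:
    assignment_failed = True

# to "change" a tuple, build a new one from a slice of the old one
new_tuple = tuplec[:2] + (1,)

print(assignment_failed)
print(new_tuple)
```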

## Python code indentation
<hr style="border: solid 4px green">

### Code blocks are indented
* loops
* conditionals
* functions

### Myths
* indentation should be exactly 4 spaces (it can be any number of spaces, as long as it is consistent within a block, see below)
* tabs and spaces cannot be mixed (Python 2 accepts a mix, although it is not a good idea; Python 3 rejects inconsistent mixing)

In [None]:
# change the number of white spaces below...
for n in range (4):
    print "Iteration", n

print "End of iteration"
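
Nested blocks simply add one more level of indentation. The sketch below (variable names `total_even` and `last_seen` are illustrative) shows which statements belong to which block.

```python
total_even = 0
for n in range(4):
    # first level: body of the loop
    if n % 2 == 0:
        # second level: body of the conditional
        total_even = total_even + n
    # back at the first level: still inside the loop
    last_seen = n

# no indentation: this runs once, after the loop has finished
print(total_even)
```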

## Python modules
<hr style="border: solid 4px green">

### Extending Python functionality
* suppose you need to find the square root of a number
* Python does not have a built-in `sqrt` function
* you need the standard library module `math` for that

In [14]:
# this gives an error (comment it out)
#print sqrt(25.0)
# need to import the math module
import math
# this still does not work (comment it out)
#print sqrt(25.0)
# you have to reference the function through its module
print math.sqrt(25.0)

5.0


## Importing modules
<hr style="border: solid 4px green">

### Two basic options
* `import module`
  * makes the entire module available in the current namespace; its contents are accessed as `module.name`
* `from module import name`
  * brings the `name` object into the current namespace as a local reference
  * use the variant

    `from module import name as name2`

    in order to
    * refer to an object using a shorter name (*e.g.* `np` instead of `numpy`)
    * avoid namespace conflicts (*e.g.* two objects with the same name belong to two separate packages)

> *Note*: avoid universal imports, *e.g.* `from module import *`
> * diminished code readability (not clear which module a function comes from)
> * polluted namespace (overriding variable and function names)
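
The import variants can be compared side by side using the `math` module. This is only a sketch; the alias `square_root` is a hypothetical name chosen for illustration.

```python
# import the whole module; names are reached through the module
import math
root_a = math.sqrt(25.0)

# import a single name directly into the current namespace
from math import sqrt
root_b = sqrt(25.0)

# import a name under an alias (square_root is a hypothetical alias)
from math import sqrt as square_root
root_c = square_root(25.0)

print(root_a == root_b == root_c)
```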

## Example
<hr style="border: solid 4px green">

* the standard library contains a module `random` for the generation of random numbers
* the `random` module contains a function `random()`, amongst others

In [18]:
import random
print random.random ()
print random.choice (["yes", "no", "maybe"])

0.77848290253
no
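
The numbers above differ from run to run. When repeatable results are needed (for example in tests), the generator can be seeded, as in this sketch; the seed value 42 is arbitrary.

```python
import random

# seeding makes the sequence of pseudo-random numbers repeatable
random.seed(42)
first_run = [random.random() for i in range(3)]

# the same seed reproduces the same sequence
random.seed(42)
second_run = [random.random() for i in range(3)]

print(first_run == second_run)
```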


## Python module structure
<hr style="border: solid 4px green">

### General: a directory containing
* `setup.py` -- package and distribution management
* `__init__.py` -- module initialiser, *e.g.* loads other modules
* Python files
* directories: tests, documentation, *etc.*
* sub-modules
<br><br>

### Simplest case: single file
* a file with extension `.py` that can be imported from Python
* Python environment specified at the top of the file
```python
#!/usr/bin/env python
```
* Python main at the bottom of the file (this allows use as "stand-alone" script or as imported module)
```python
if __name__ == "__main__":
    import sys
    some_function_defined_above(sys.argv)
```

> Linux and MacOS: if file is executable, locate interpreter from environment `$PATH`
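
Putting the pieces together, a complete single-file module might look like the sketch below. The file name `greet.py` and the function `make_greeting` are hypothetical, chosen only to illustrate the layout.

```python
#!/usr/bin/env python
"""greet.py: a hypothetical single-file module."""

def make_greeting(names):
    """Return a greeting for the given list of names."""
    if not names:
        return "Hello World!"
    return "Hello " + ", ".join(names) + "!"

if __name__ == "__main__":
    # executed only when run as a script, not when imported
    import sys
    print(make_greeting(sys.argv[1:]))
```

Run as `python greet.py Alice Bob`, the file acts as a script; after `import greet`, only `greet.make_greeting` is defined and nothing is printed.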

## Python functions
<hr style="border: solid 4px green">

* in-built functions exist, *e.g.* `int()`, `type()`, `range()`
* object methods are accessed with the dot operator (`object.method()`), *e.g.* `list.sort()`
* module functions are also accessed with the dot operator (`module.function()`), *e.g.* `math.sqrt()`
* defining your own function is easy, *e.g.*
```python
def my_square(x):
        return x*x
```

> More about in-built functions https://docs.python.org/2/library/functions.html

In [None]:
lista = [3, "a", 3.14, False]; print lista;
lista.sort(); print lista;
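
A slightly fuller sketch of the points above: a user-defined function with a default argument, and the difference between the `list.sort()` method and the built-in `sorted()`. The function `my_power` is a hypothetical example.

```python
def my_power(x, exponent=2):
    """Return x raised to exponent (defaults to squaring)."""
    return x ** exponent

values = [3, 1, 2]

# sorted() returns a new list and leaves the original untouched
ordered = sorted(values)

# list.sort() sorts in place and returns None
values.sort()

print(my_power(3))
print(my_power(2, exponent=3))
print(ordered == values)
```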

## Python functions and variable scoping
<hr style="border: solid 4px green">

### This can be confusing for beginners
* variables are local to the scope in which they are assigned

In [19]:
def cat_change ():
    cat = "purring"
    print "inside: cat is " + cat

cat = "meowing"
cat_change ()
print "outside: cat is " + cat

inside: cat is purring
outside: cat is meowing


## Python functions and variable scoping (cont'd)
<hr style="border: solid 4px green">

### A variable defined in the main body of a file is global
* visible throughout the file
* also visible inside any file which imports that file
<br><br>

### A variable defined inside a function is local to that function
* it is accessible from the point at which it is defined until the end of the function
* it exists for as long as the function is executing

In [1]:
def cat_change ():
    cat_local = "purring"
    cat_global = "also purring"
    print "inside: global cat is " + cat_global
    print "inside: local cat is " + cat_local

cat_global = "meowing"
cat_change ()
# the following prints "meowing"
print "outside: global cat is " + cat_global
# the following produces an error: comment line out
#print "outside: local cat is " + cat_local

inside: global cat is also purring
inside: local cat is purring
outside: global cat is meowing


## Python functions and variable scoping (cont'd)
<hr style="border: solid 4px green">

* the global variable is visible inside the function, but assigning to it there creates a new local variable, so the change has only local effect
* declared as global, the function can change the global variable, with global effect
```python
global cat_global
cat_global = "also purring"
```
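
The two cases can be compared in one self-contained sketch, assuming the code runs at the top level of a script or notebook cell. The function names `cat_rebind_local` and `cat_rebind_global` are illustrative.

```python
cat_global = "meowing"

def cat_rebind_local():
    # plain assignment creates a new local variable; the global is untouched
    cat_global = "purring locally"
    return cat_global

def cat_rebind_global():
    # the global statement makes the assignment act on the module-level name
    global cat_global
    cat_global = "also purring"

cat_rebind_local()
after_local = cat_global    # still the original value

cat_rebind_global()
after_global = cat_global   # now changed by the function

print(after_local)
print(after_global)
```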

## Python functions and variable scoping (cont'd)
<hr style="border: solid 4px green">

### All this confusion is avoidable
* globals are *not* a good idea
* functions are passed arguments
* OOP encapsulates data with code
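
As a sketch of the argument-passing alternative, the cat example can be rewritten so that the function receives the state and returns the new one, without touching any global. This variant of `cat_change` is an illustration, not the version used in the cells above.

```python
def cat_change(cat):
    """Return the new state instead of modifying a global variable."""
    if cat == "meowing":
        return "purring"
    return cat

cat = "meowing"
new_cat = cat_change(cat)

print("before: cat is " + cat)
print("after: cat is " + new_cat)
```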

## Exercise : Warm-up 
<hr style="border: solid 4px green">

### Median from a list
Define a function `my_median()` that takes a list of years and computes the median based on the elapsed number of years since 1900 for each entry in the list.

The function should return the median *and* the value either side of the median as a list.  For instance, if `years = [1989, 1955, 2011, 1943, 1975]` then the result should be `[55, 75, 89]`

Assume the input list is not ordered and has an odd number of elements $N$, where $N \geq 3$.
1. Use the above example list as input to check you get the right answer.
2. The list needs to be sorted -- how?
3. List indexing may help you to get the final result, *e.g.* `list[3:5]`.
4. Try generating a random list for input using the `random` module.

Hint: the `list.append()` method is very useful for this exercise.

> Type your solution in the notebook, or use an editor to create a separate script in a file, or use an IDE.  Use the approach you are most comfortable with...

## Solution : Warm-up
<hr style="border: solid 4px green">

The solution...

In [2]:
# %load warm_up_1.py

""" A function to calculate the median of a list"""

def my_median(years):
    ages = []
    for y in years:
        ages.append(y - 1900)

    ages.sort()
    mid = len(ages) // 2  # floor division: index of the middle element

    return ages[mid-1:mid+2]

years = [1989, 1955, 2011, 1943, 1975]
print my_median(years)


[55, 75, 89]


## Exercise : Warm-up II (bonus)
<hr style="border: solid 4px green">

### Using an external file
Now read the list of years from a text file, `years.txt`, which should have the total number of years $N$ in the first line, followed by a numbered list of years:
```
total number of years
1 year1
2 year2
...
```
<br><br>

To generate an input file `years.txt`, you can try the following code:

```python
import sys
output = open(filename, "w")
...
output.write("{0:2d} {1:2d}\n".format(1, years[0]))
```
<br><br>

To read an input file `years.txt`, you may use the following:
```python
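infile = open(filename, "r")  # avoid the name input, which shadows a built-in
line = infile.readline()
line = line.rstrip()  # strings are immutable: keep the returned copy
tokens = line.split()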
```
<br><br>

Check the in-built or web documentation for help with these standard library functions.
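
The write and read patterns above can be combined into one runnable sketch. It assumes a short sample list and writes to a temporary directory (via the standard `tempfile` module) so that no existing file is overwritten; the names `years_read` and `count` are illustrative. The `with` statement closes each file automatically.

```python
import os
import tempfile

years = [1989, 1955, 2011]

# hypothetical location: a file inside a fresh temporary directory
filename = os.path.join(tempfile.mkdtemp(), "years.txt")

# write the count on the first line, then one numbered year per line
with open(filename, "w") as output:
    output.write("{0:d}\n".format(len(years)))
    for i, year in enumerate(years):
        output.write("{0:2d} {1:4d}\n".format(i + 1, year))

# read the file back: first line is the count, the rest are "index year"
years_read = []
with open(filename, "r") as infile:
    count = int(infile.readline().strip())
    for line in infile:
        tokens = line.split()
        years_read.append(int(tokens[1]))

print(count)
print(years_read)
```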

## Solution : Warm-up II (bonus)
<hr style="border: solid 4px green">

The solution...

In [None]:
%load warm_up_2.py

## Performance: measuring time
<hr style="border: solid 4px green">

### Two ways of measuring processing time
* *from outside code* -- measure overall walltime of an entire application or function
* *from within code* -- measure execution time of a critical part of the application

## Performance: timing from outside code
<hr style="border: solid 4px green">

### Linux timing function <span style="font-family: Courier New, Courier, monospace;">time</span>
Can be used to measure the execution time of a complete script or command, *e.g.*
```
$ time python example.py
real 0m1.051s
user 0m1.022s
sys  0m0.028s
```

* `real` -- actual (wallclock) time
* `user` -- cumulative CPU time spent in user code by all threads of the application
* `sys` -- cumulative CPU time spent in system (kernel) tasks on behalf of the application (*e.g.* memory allocation)
* *Problem*: `user` + `sys` can exceed `real` (if threads work in parallel)
<br><br>

### Python function <span style="font-family: Courier New, Courier, monospace;">timeit</span>
Usage
* import module `timeit` and use `timeit.timeit` or
* use the magic IPython command `%timeit`

By default, `timeit`
* measures time by running the code several times
* which gives statistical significance to measurements, especially small runtimes
<br><br>

Flexible control over the number of times a code is run
```
%timeit -n <iterations> -r <repeats>  <code_snippet>
```
* `-n <iterations>` -- how many times to execute `<code_snippet>`
* `-r <repeats>` -- how many times to repeat the timer (default 3)
<br><br>

> More information
> * `%timeit?`
> * https://docs.python.org/2/library/timeit.html
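
Outside IPython, the same measurements can be made with the `timeit` module directly. The sketch below times an arbitrary statement, `sum(range(100))`, chosen only for illustration; `number` plays the role of `-n` and `repeat` the role of `-r`.

```python
import timeit

# time a statement given as a string; number sets the iterations per run
t_total = timeit.timeit("sum(range(100))", number=1000)

# repeat the whole measurement several times and keep the best run
runs = timeit.repeat("sum(range(100))", number=1000, repeat=3)
t_best = min(runs)

print("total for 1000 iterations: {0:.6f} s".format(t_total))
print("best time per iteration:   {0:.3e} s".format(t_best / 1000))
```

Taking the minimum of the repeats is a common choice, since larger values usually reflect interference from other processes rather than the code itself.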

## Performance: timing from inside code
<hr style="border: solid 4px green">

### The <span style="font-family: Courier New, Courier, monospace;">time</span> module has two functions
* `clock()` -- returns CPU (process) time on Linux and Mac OS, and wall-clock time since the first call to `clock()` on Windows
* `time()` -- returns current time in seconds since the Epoch (01/01/1970 on Linux and Mac OS)
* both depend on the system time and underlying system time functions

### Which one to use?
* Linux and Mac OS: `time.time()` gives the better precision (microsecond accuracy)
* Windows: `time.clock()` gives the better precision
* example: run the following test

In [20]:
import time

def func (sleepTime):
    # do nothing for sleepTime seconds
    time.sleep (sleepTime)

# measure process time
t0 = time.clock ()
func (2.0)
print time.clock() - t0, "seconds process time"

# measure wall time
t0 = time.time()
func (2.0)
print time.time() - t0, "seconds wall time"

0.001471 seconds process time
2.00540304184 seconds wall time


## Performance: timing from inside code (cont'd)
<hr style="border: solid 4px green">

### Conclusion
* make good use of `timeit` for benchmarking and testing
  * *Pros*: automatically measures a mean value
  * *Cons*: restrictive when used on isolated portions of code from within scripts
* use `time` when benchmarking portions of code in scripts
  * ideally, run the same piece of code a number of times and average
  * choose between `time.time()` and `time.clock()` depending on the system
  * platform-independent solution: `timeit.default_timer()` points to one of the above
<br><br>
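
A minimal sketch of the "run several times and average" advice, using `timeit.default_timer()` as the platform-independent clock. The function `work` is a hypothetical workload and the repeat count of 5 is arbitrary.

```python
import timeit

def work(n):
    """Hypothetical workload: sum of the first n squares."""
    return sum(i * i for i in range(n))

repeats = 5
timings = []
for r in range(repeats):
    # take a timestamp before and after the critical section
    t0 = timeit.default_timer()
    result = work(10000)
    timings.append(timeit.default_timer() - t0)

average = sum(timings) / len(timings)
print("average over {0} runs: {1:.6f} s".format(repeats, average))
```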

### Warning: <span style="font-family: Courier New, Courier, monospace;">timeit</span>
* can be used to time functions called from a script
* but is quite tedious
  * returned values are lost
  * the `timeit` command must repeat the `import` statements of the script
  * the `timeit` command must include argument initialisation
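
Some of this tedium can be reduced by passing a callable to `timeit.timeit` instead of a statement string, so that no imports or initialisation need to be repeated in a setup string. The sketch below assumes a small function `my_square` and an illustrative argument; the return value is still not available from `timeit` and has to be obtained by a separate call.

```python
import timeit

def my_square(x):
    return x * x

argument = 12.0

# a callable avoids repeating imports and initialisation in a setup string
elapsed = timeit.timeit(lambda: my_square(argument), number=10000)

# the return value is not available from timeit, so call once separately
value = my_square(argument)

print("10000 calls took {0:.6f} s".format(elapsed))
print(value)
```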

## Summary
<hr style="border: solid 4px green">

We have 
* reviewed some core Python basics
* introduced the IPython shell
* looked at the options for timing execution
<br><br>

> Useful links
> * https://docs.python.org/2/
> * https://www.codecademy.com/learn/python

<img src="../../images/reusematerial.png"; style="float: center; width: 90"; >