# Simple parallelisation of jobs

Python's standard-library module `multiprocessing` lets you use multi-processor (multi-core) machines to run jobs in parallel.

We consider the following simple case:
- We want to apply a Python function to many different inputs in parallel.
- The individual runs of the function are completely independent of each other (no communication between parallel jobs is necessary).

In [None]:
%%time
import numpy as np

# simple script to test a list of numbers for the
# prime-number property:
def is_prime(n):
    """
    tests whether an integer is a prime number
    
    input: the number to be tested
    return: the number if it is a prime and -1 otherwise
    """
        
    if n != 2 and n%2 == 0:
        return -1
    else:
        for i in range(3, int(np.sqrt(n) + 1)):
            if n%i == 0:
                return -1

    return n

# The built-in map function applies a function to each element of
# an iterable and returns a lazy iterator over the results;
# list() collects them into a list.
# Note that we would typically use a list comprehension for this task.
# However, map mirrors the pool.map call used for parallelisation below.
result = list(map(is_prime, range(2, 1000000)))
#print([i for i in result if i > 0])
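The comments in the cell above say that `map` and a list comprehension do the same job. A minimal sketch, using a hypothetical toy function `square` in place of `is_prime`, shows both forms side by side and why the cell wraps `map` in `list()`: in Python 3, `map` returns a lazy iterator that can only be consumed once.

```python
def square(n):
    """Hypothetical toy function standing in for is_prime."""
    return n * n


lazy = map(square, range(5))          # nothing has been computed yet
via_map = list(lazy)                  # list() forces the evaluation
via_comprehension = [square(n) for n in range(5)]

print(via_map)                        # [0, 1, 4, 9, 16]
print(via_map == via_comprehension)   # True
print(list(lazy))                     # [] because the iterator is exhausted
```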

In [None]:
%%time
# The same script as above but with parallelisation.
# The individual prime-number tests are independent of each other,
# so this is an ideal case for the multiprocessing module!
import numpy as np
import multiprocessing

# simple script to test a list of numbers for the
# prime-number property:
def is_prime(n):
    """
    tests whether an integer is a prime number
    
    input: the number to be tested
    return: the number if it is a prime and -1 otherwise
    """
        
    if n != 2 and n%2 == 0:
        return -1
    else:
        for i in range(3, int(np.sqrt(n) + 1)):
            if n%i == 0:
                return -1

    return n

# Initialise a process pool. Play with the number of processes to see
# the time difference; multiprocessing.cpu_count() gives you the
# number of CPUs / cores of your machine.
# Caveat: with the "spawn" start method (default on Windows, and on macOS
# since Python 3.8), worker processes cannot find functions defined
# interactively in a notebook. There, move is_prime into a separate .py
# module and import it, or run the code as a script.
print("Your machine has {} CPUs / cores".format(multiprocessing.cpu_count()))

# Perform the prime-number tests in parallel: pool.map takes a function
# and an iterable (typically a list) of arguments, which are evaluated
# in parallel. Unlike the built-in map, pool.map returns a complete list,
# not a lazy iterator. The with-statement shuts the pool down afterwards.
with multiprocessing.Pool(processes=4) as pool:
    result = pool.map(is_prime, range(2, 1000000))

#print([i for i in result if i > 0])
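Outside a notebook, the same comparison can be run as a standalone script. The sketch below assumes it is saved as a `.py` file and started with `python`; its `is_prime` is a slightly tightened variant of the one above (it rejects numbers below 2, skips even divisors and uses `math.isqrt`), and the upper bound of 100 000 is an arbitrary choice to keep the run short. The `if __name__ == "__main__":` guard is required whenever worker processes are started with "spawn", and the final `assert` checks that `pool.map` returns exactly the same ordered list as the serial `map`.

```python
import math
import multiprocessing
import time


def is_prime(n):
    """Return n if n is a prime number and -1 otherwise."""
    if n < 2:
        return -1
    if n % 2 == 0:
        return n if n == 2 else -1
    # Only odd divisors up to the integer square root need checking.
    for i in range(3, math.isqrt(n) + 1, 2):
        if n % i == 0:
            return -1
    return n


if __name__ == "__main__":
    numbers = range(2, 100_000)

    start = time.perf_counter()
    serial = list(map(is_prime, numbers))
    serial_seconds = time.perf_counter() - start

    start = time.perf_counter()
    # The with-statement shuts the worker processes down afterwards.
    with multiprocessing.Pool(processes=multiprocessing.cpu_count()) as pool:
        parallel = pool.map(is_prime, numbers)
    parallel_seconds = time.perf_counter() - start

    # pool.map keeps the input order, so both lists must be identical.
    assert parallel == serial
    print("primes found:", sum(1 for p in parallel if p > 0))
    print(f"serial: {serial_seconds:.3f} s, parallel: {parallel_seconds:.3f} s")
```

The measured times depend on your machine; for inputs this small, the parallel run is not guaranteed to be faster.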

**Notes:**

- The list returned by `pool.map` preserves the order of the input elements,
  even if the individual jobs finish in a different order!
- For small data samples, the parallelised version may be *slower* than the unparallelised one!
  In that case, the overhead of starting the worker processes and sending data to and from them
  is a non-negligible part of the total run time.
- The `pool.map` command only works with *single-argument functions*.
  If you need to pass multiple arguments to a function,
  you need to pack them into a single list / tuple argument.
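The first two notes can be checked with a small standalone-script sketch; the worker function `square` and the input values are hypothetical choices for illustration. It also uses `chunksize`, a real parameter of `Pool.map` that hands the inputs to the workers in batches; this is one way to reduce the per-call overhead when there are many cheap jobs.

```python
import multiprocessing


def square(n):
    """Hypothetical cheap worker function: process overhead dominates."""
    return n * n


if __name__ == "__main__":
    numbers = [5, 3, 8, 1]
    with multiprocessing.Pool(processes=2) as pool:
        # The result list follows the order of `numbers`, regardless of
        # which worker process finished first.
        ordered = pool.map(square, numbers)
        # chunksize=5 sends the ten inputs to the workers in two batches,
        # which means fewer round trips between the processes.
        batched = pool.map(square, range(10), chunksize=5)
    print(ordered)  # [25, 9, 64, 1]
    print(batched)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```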

In [None]:
# simple (useless) program to show how to pass multiple
# arguments to a function used with pool.map

import multiprocessing

# Of course, you would NEVER do the following
# in a real-life scenario (use numpy!):
def line(args):
    # The line function needs three arguments:
    # x, a and b. To parallelise it with multiprocessing,
    # we artificially pack them into one tuple argument
    # and unpack it inside the function.
    x, a, b = args
    return a * x + b

# The with-statement shuts the pool down once the work is done.
with multiprocessing.Pool(processes=2) as pool:
    result = pool.map(line, [(1, 2, 3), (4, 5, 6)])

print(result)  # [5, 26]
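Packing the arguments into a tuple by hand is not the only option: `Pool.starmap` (available since Python 3.3) unpacks each tuple into separate positional arguments, so the worker function can keep a normal signature. A standalone-script sketch of the same toy `line` function:

```python
import multiprocessing


def line(x, a, b):
    """Evaluate a * x + b; takes three separate arguments."""
    return a * x + b


if __name__ == "__main__":
    # starmap calls line(1, 2, 3) and line(4, 5, 6).
    arguments = [(1, 2, 3), (4, 5, 6)]
    with multiprocessing.Pool(processes=2) as pool:
        result = pool.starmap(line, arguments)
    print(result)  # [5, 26]
```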

**Note:**

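The `multiprocessing` module offers much more, for example explicit `Process` objects, communication between processes via queues and pipes, and asynchronous job submission with `Pool.apply_async` and `Pool.imap_unordered`. Check out its documentation if you need to deal with more complex job parallelisation!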
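As a taste of the asynchronous `Pool` methods, here is a minimal standalone-script sketch; the worker `square` and the inputs are hypothetical. `imap_unordered` yields results as soon as workers finish them, so their order is not guaranteed, while `apply_async` submits a single call and returns an `AsyncResult` whose `get()` blocks until the value is ready.

```python
import multiprocessing


def square(n):
    """Hypothetical toy worker function, for illustration only."""
    return n * n


if __name__ == "__main__":
    with multiprocessing.Pool(processes=4) as pool:
        # Results arrive in completion order, which may differ from
        # the input order; sort them if the order matters.
        unordered = list(pool.imap_unordered(square, range(10)))

        # Submit one call without blocking, then wait for its result.
        handle = pool.apply_async(square, (12,))
        single = handle.get(timeout=10)

    print(sorted(unordered))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
    print(single)             # 144
```

Both results are collected inside the `with` block, because leaving it terminates the worker processes.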