<div style="border: 2px solid #8A9AD0; margin: 1em 0.2em; padding: 0.5em;">

# Python - Multiprocessing

by [Helena Rasche](https://training.galaxyproject.org/hall-of-fame/hexylena/)

CC-BY licensed content from the [Galaxy Training Network](https://training.galaxyproject.org/)

**Questions**

- How can I parallelize code to make it run faster?
- What code is, or is not, a prime target for parallelisation?

**Objectives**

- Understanding how to parallelise code to make it run faster.
- Identifying how to decompose code into a parallel unit.

**Time Estimation: 30M**
</div>


<blockquote class="agenda" style="border: 2px solid #86D486;display: none; margin: 1em 0.2em">
<div class="box-title agenda-title" id="agenda">Agenda</div>
<p>In this tutorial, we will cover:</p>
<ol id="markdown-toc">
<li><a href="#threads-vs-processes" id="markdown-toc-threads-vs-processes">Threads vs Processes</a></li>
</ol>
</blockquote>
<h2 id="threads-vs-processes">Threads vs Processes</h2>
<p>In many languages, threads are extremely efficient: they are lightweight and need few resources to create. “Threads are cheap”. Python, however, has a major limitation in the Global Interpreter Lock (GIL): only one thread can be executing Python code at any one time, whether it’s the main thread or one of the others you’ve spawned. So we look to alternative concurrency mechanisms, like processes, to share the load across multiple CPU cores.</p>
<p>For <strong>threads</strong>, if you are doing “lightweight” work like fetching data from the web where a lot of time is spent waiting for the server to respond, but very little computational work within Python is required, then threads are a great fit!</p>
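<p>The “mostly waiting” case can be simulated without touching the network. The following is a minimal sketch, not part of the original tutorial: <code style="color: inherit">wait_for_server</code> is a hypothetical stand-in for a web request, and <code style="color: inherit">time.sleep</code> plays the role of the server’s response time.</p>

```python
import time
from concurrent.futures import ThreadPoolExecutor

def wait_for_server(task_id, delay=0.2):
    """Hypothetical stand-in for a web request: almost all time is spent waiting."""
    time.sleep(delay)  # a sleeping thread releases the GIL, so other threads can run
    return task_id

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as executor:
    # executor.map returns results in the same order as the inputs
    results = list(executor.map(wait_for_server, range(4)))
elapsed = time.perf_counter() - start

print(results)
print(f"4 waits of 0.2 s each took {elapsed:.2f} s in total")
```

<p>Because the four waits overlap, the total should be close to a single wait rather than the sum of all four.</p>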
<p>However, if you are doing computationally intensive work (imagine calculating prime numbers or similar complex tasks), then each thread will be contending for CPU time. Python runs as a single process and can only have one thread running at a time due to the <abbr title="Global Interpreter Lock">GIL</abbr>. So it will switch between the threads and try to make progress on each, but it cannot execute them truly simultaneously. Here we need to switch to processes.</p>
<p><strong>Processes</strong> are relatively heavier weight, as each one essentially starts a new Python process before executing the individual function. This is part of the reason for a “Pool”: it amortises the expensive cost of setting up processes, which can then do a lot of work quickly. Here each process can use its own CPU core fully, so processes make more sense for computationally expensive tasks.</p>
<p>Let’s dive straight into an example: here we’re using the <code style="color: inherit">multiprocessing</code> library which uses processes and is relatively simple to understand:</p>


In [None]:
from multiprocessing import Pool

<p>Importantly, we should define a <em>pure</em> function, i.e. a function that only works on the inputs available to it and doesn’t use or affect global state (in other words, one without side effects). Printing is OK.</p>


In [None]:
def f(x):
    return x * x
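
<p>To make the distinction concrete, here is a small sketch contrasting a pure function with an impure one. The names <code style="color: inherit">pure_square</code>, <code style="color: inherit">impure_square</code> and <code style="color: inherit">totals</code> are hypothetical and used only for illustration.</p>

```python
totals = []  # global state, touched only by the impure version

def impure_square(x):
    """Impure: mutates a global list, and its result depends on earlier calls."""
    totals.append(x * x)
    return len(totals)

def pure_square(x):
    """Pure: the result depends only on the argument."""
    return x * x

# The pure function gives the same answer every time it is called.
first_pure, second_pure = pure_square(3), pure_square(3)

# The impure function gives a different answer on the second call.
first_impure, second_impure = impure_square(3), impure_square(3)

print(first_pure, second_pure)
print(first_impure, second_impure)
```

<p>The impure version is a poor fit for a process pool: each worker process has its own copy of <code style="color: inherit">totals</code>, so changes made in a worker do not show up in the parent process.</p>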

<p>We spawn a process Pool with 5 processes, and then use the convenient map
function to send each input in turn to the specified function <code class="language-plaintext highlighter-rouge">f</code>.</p>


In [None]:
with Pool(5) as p:
    print(p.map(f, range(10)))

<p>This will create 5 new Python processes, and they will begin to consume tasks from the queue as quickly as they can. As soon as a process has finished handling one task, it will begin work on the next task.</p>
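<p>Tasks may finish in any order, but <code style="color: inherit">map</code> still returns the results in input order. The sketch below illustrates this with <code style="color: inherit">multiprocessing.pool.ThreadPool</code>, which offers the same <code style="color: inherit">map</code> and <code style="color: inherit">imap_unordered</code> methods as <code style="color: inherit">Pool</code>. Using threads here is an assumption made so that the example runs without starting new processes; <code style="color: inherit">slow_echo</code> and the delays are hypothetical.</p>

```python
import time
from multiprocessing.pool import ThreadPool  # same map interface as Pool, backed by threads

def slow_echo(delay):
    """Hypothetical task: wait for `delay` seconds, then return it."""
    time.sleep(delay)
    return delay

delays = [0.3, 0.1, 0.2]

with ThreadPool(3) as pool:
    # map blocks until every task is done and keeps the input order
    in_order = pool.map(slow_echo, delays)
    # imap_unordered yields each result as soon as its task finishes
    as_finished = list(pool.imap_unordered(slow_echo, delays))

print(in_order)
print(as_finished)
```

<p>With <code style="color: inherit">imap_unordered</code> the shortest delay will typically come back first, which can be handy when you want to process results as they arrive and do not care about their order.</p>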
<h2 id="pools--paralellism">Pools &amp; Parallelism</h2>
<p>Let’s do another example, the classic “print time”, to see how the tasks are actually executed across 4 processes.</p>


In [None]:
import time

def g(x):
    print(time.time())
    time.sleep(1)

with Pool(4) as p:
    print(p.map(g, range(12)))

<blockquote class="question" style="border: 2px solid #8A9AD0; margin: 1em 0.2em">
<div class="box-title question-title" id="question"><i class="far fa-question-circle" aria-hidden="true" ></i> Question</div>
<p>What did you see here?</p>
<p>Was it clear and easy to read?</p>
<br/><details style="border: 2px solid #B8C3EA; margin: 1em 0.2em;padding: 0.5em; cursor: pointer;"><summary>👁 View solution</summary>
<div class="box-title solution-title" id="solution"><button class="gtn-boxify-button solution" type="button" aria-controls="solution" aria-expanded="true"><i class="far fa-eye" aria-hidden="true" ></i> Solution<span class="fold-unfold fa fa-minus-square"></span></button></div>
<p>It prints four timestamps immediately, all around the same time. Then one second later it prints four more, and one second after that, a final four.
Our Pool of 4 processes runs as many calls of the function concurrently as it can, i.e. four at a time. Once one of those calls returns, that process moves on to the next task.
This is precisely the situation of e.g. 4 queues at the grocery; as soon as one customer is processed, the cashier immediately begins on the next, until no more remain.</p>
<p>You might also see a situation where the numbers are interleaved in a completely unreadable way. Here the processes write immediately, but there is no synchronisation or limit on who can write at one time, and the result is a mess of interleaved print statements.</p>
</details>
</blockquote>
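
<p>One way to sidestep the interleaved output described in the solution is to have each task <em>return</em> its timestamp and do all the printing from the parent. The following is a minimal sketch under some assumptions: <code style="color: inherit">g_quiet</code> is a hypothetical variant of <code style="color: inherit">g</code>, the sleep is shortened to 0.1 seconds, and <code style="color: inherit">ThreadPool</code> stands in for <code style="color: inherit">Pool</code> so that the example runs without starting new processes.</p>

```python
import time
from multiprocessing.pool import ThreadPool  # stand-in for Pool, same map interface

def g_quiet(x, delay=0.1):
    """Hypothetical variant of g: return the start time instead of printing it."""
    started = time.time()
    time.sleep(delay)
    return x, started

start = time.time()
with ThreadPool(4) as pool:
    records = pool.map(g_quiet, range(12))

# Only the parent prints, so the lines cannot interleave.
for x, started in records:
    print(f"task {x:2d} started at +{started - start:.2f} s")

elapsed = time.time() - start
print(f"total: {elapsed:.2f} s")
```

<p>With 12 tasks and 4 workers, the start times should fall into roughly three groups, about one delay apart.</p>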
<h2 id="when-to-parallelise">When to parallelise</h2>
<p>Some guidelines:</p>
<ul>
<li>When you can isolate a pure function</li>
<li>That does not require shared state, nor ordering</li>
<li>That does not modify global state (pure!)</li>
<li>That takes a significant amount of time, relative to the time it takes to rewrite your code to support parallelising.</li>
</ul>
<p>Some common examples of this are slow tasks like requesting data from multiple websites, e.g. using the <code style="color: inherit">requests</code> library, or doing some computationally expensive calculation. The last point, however, is especially important, as it is not always trivial to rearchitect your code to handle being spread across multiple threads or processes.</p>
<p>Let’s parallelize this program which fetches some metadata from multiple Galaxy servers:</p>


In [None]:
import requests
import time

servers = [
    "https://usegalaxy.org",
    "https://usegalaxy.org.au",
    "https://usegalaxy.eu",
    "https://usegalaxy.fr",
    "https://usegalaxy.be",
]

data = {}
start = time.time()
for url in servers:
    print(url)
    try:
        response = requests.get(url + "/api/version", timeout=2).json()
        data[url] = response['version_major']
    except requests.exceptions.ConnectTimeout:
        data[url] = None
    except requests.exceptions.ReadTimeout:
        data[url] = None


# How long did it take to execute
print(time.time() - start)

for k, v in data.items():
    print(k, v)

<p>If we look at this, we can see one hot spot in the code, where it’s quite slow, requesting data from a remote server. If we want to speed this up we’ll need to isolate it into a pure function. Here we can see a possibility for a function that requests the data, with the input of the server url, and output of the version.</p>


In [None]:
def fetch_version(server_url):
    # Use the function's own argument, not a global `url` variable.
    try:
        response = requests.get(server_url + "/api/version", timeout=2).json()
        return response['version_major']
    except requests.exceptions.ConnectTimeout:
        return None
    except requests.exceptions.ReadTimeout:
        return None

<p>This now lacks side effects (like modifying the <code style="color: inherit">data</code> object), and can be used in a map statement.</p>


In [None]:
start = time.time()
with Pool(4) as p:
    versions = p.map(fetch_version, servers)
    data = dict(zip(servers, versions))
print(time.time() - start)

for k, v in data.items():
    print(k, v)

<p>Same result, and now this is a lot faster!</p>
<h2 id="sizing-your-pool">Sizing your Pool</h2>
<p>This depends largely on profiling your code or knowing its performance characteristics. In the above example, very little computational work is executed as part of this function; it’s essentially all network I/O, with almost no CPU or memory usage.</p>
<p>As such, we can probably set our pool size to be very large, a multiple of our system’s processor count. It will spawn many processes that do very little.</p>
<p>If, however, this were a more complicated task (e.g. calculating a large number, machine learning, etc.), then we might wish to set our pool size to the number of CPUs - 1, as each process will potentially consume its CPU allocation completely, and we wish to have some left over for the managing program and any other work going on on the system.</p>
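<p>These two rules of thumb could be written down as follows. This is only a sketch: the variable names are hypothetical, and the multiplier of 4 for I/O-bound work is an arbitrary illustrative choice, not a recommendation from the tutorial.</p>

```python
import os

# os.cpu_count() can return None if the count cannot be determined
cpu_count = os.cpu_count() or 1

# CPU-bound work: leave one core free for the managing program and other work,
# but never go below one worker.
cpu_bound_size = max(1, cpu_count - 1)

# I/O-bound work: workers mostly wait, so a multiple of the core count is fine.
# The factor 4 is an assumption chosen purely for illustration.
io_bound_size = cpu_count * 4

print(f"CPUs: {cpu_count}")
print(f"pool size for CPU-bound work: {cpu_bound_size}")
print(f"pool size for I/O-bound work: {io_bound_size}")
```

<p>Either value can then be passed to <code style="color: inherit">Pool(...)</code>; profiling your own workload remains the better guide.</p>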
<blockquote class="question" style="border: 2px solid #8A9AD0; margin: 1em 0.2em">
<div class="box-title question-title" id="question-1"><i class="far fa-question-circle" aria-hidden="true" ></i> Question</div>
<p>Try changing the pool size and see what effect it has on runtime. Start from a value of 1, and go up to 5, the number of servers in our list.</p>
<br/><details style="border: 2px solid #B8C3EA; margin: 1em 0.2em;padding: 0.5em; cursor: pointer;"><summary>👁 View solution</summary>
<div class="box-title solution-title" id="solution-1"><button class="gtn-boxify-button solution" type="button" aria-controls="solution-1" aria-expanded="true"><i class="far fa-eye" aria-hidden="true" ></i> Solution<span class="fold-unfold fa fa-minus-square"></span></button></div>
<p>Given the small sample size (5 servers), and the variability of response times (in local testing between 3-8 seconds for the Pool=1 version), you can see varying results, but <em>generally</em> the runtime should decrease as the pool size increases. However, sometimes you will see the Pool=5 version take the same time as, or longer than, Pool=4.</p>
<p>Especially in the case that 1 server (or 1 request one time) dominates the request time, this can “hide” the improvements as the others complete quickly on the remaining N-1 processes.</p>
</details>
</blockquote>


In [None]:
results = []
for i in range(1, 6):
    start = time.time()
    with Pool(i) as p:
        versions = p.map(fetch_version, servers)
        data = dict(zip(servers, versions))
    duration = time.time() - start
    results.append(duration)

# Plot it
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot(range(1, 6), results)
ax.set(ylabel='time (s)', xlabel='Pool Size', title='Pool size vs runtime')
ax.grid()
# Uncomment if your notebook cannot display images inline.
# fig.savefig("pool-vs-runtime.png")
plt.show()

<p>You might see a result similar to the following:</p>
<p><img src="../../images/pool-vs-runtime.png" alt="graph of the above, showing a decrease in runtime from 2.6 seconds to approximately 1.1 seconds as the pool size increases from 1 to 5. As the pool size increases from 1-2 there is a large improvement, but as it increases from 4-5 the improvement is very small, on the order of milliseconds." width="640" height="480" loading="lazy" /></p>
<h2 id="threads">Threads</h2>
<p>Let’s convert our previous example from processes to threads, as processes aren’t strictly necessary for such a lightweight use case as fetching data from the internet, where you’re blocking on network rather than CPU resources.</p>


In [None]:
import concurrent.futures
import requests
import time

data = {}
start = time.time()
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    future_to_url = {executor.submit(fetch_version, url): url for url in servers}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            version = future.result()
            data[url] = version
        except Exception as exc:
            print('%r generated an exception: %s' % (url, exc))
print(time.time() - start)

for k, v in data.items():
    print(k, v)

<p>This is a bit more complicated to write, but again, if you’re not blocking on CPU resources, then threads are potentially about as efficient as a process pool.</p>
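<p>If you do not need to handle each result as it completes, <code style="color: inherit">ThreadPoolExecutor</code> also has a <code style="color: inherit">map</code> method that reads much like <code style="color: inherit">Pool.map</code>. The sketch below uses a hypothetical <code style="color: inherit">fake_fetch_version</code> that sleeps instead of contacting a server, and made-up <code style="color: inherit">example.org</code> URLs, so that it runs offline.</p>

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_fetch_version(server_url):
    """Hypothetical stand-in for fetch_version: waits briefly, returns a dummy value."""
    time.sleep(0.1)
    return "version-for-" + server_url

# Placeholder URLs for illustration only; nothing is requested from them.
example_servers = [
    "https://example.org/a",
    "https://example.org/b",
    "https://example.org/c",
]

with ThreadPoolExecutor(max_workers=4) as executor:
    # Results come back in the same order as example_servers
    versions = list(executor.map(fake_fetch_version, example_servers))

example_data = dict(zip(example_servers, versions))
for k, v in example_data.items():
    print(k, v)
```

<p>The trade-off is in error handling: with <code style="color: inherit">executor.map</code>, an exception raised inside a task is re-raised while you iterate over the results, whereas the <code style="color: inherit">as_completed</code> version above can catch and report errors per URL.</p>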
<blockquote class="question" style="border: 2px solid #8A9AD0; margin: 1em 0.2em">
<div class="box-title question-title" id="question-2"><i class="far fa-question-circle" aria-hidden="true" ></i> Question</div>
<p>What result did you get? Was it slower, faster, or about the same as processes?</p>
</blockquote>


# Key Points

- Code go brrrt.

# Congratulations on successfully completing this tutorial!

Please [fill out the feedback on the GTN website](https://training.galaxyproject.org/training-material/topics/data-science/tutorials/python-multiprocessing/tutorial.html#feedback) and check there for further resources!
