This small Python module implements four functions: map and starmap, and their async versions map_async and starmap_async.
- Provide an easy to use syntax for both map and starmap.
- Parallelize transparently whenever possible.
- Pass additional positional and keyword arguments to parallelized functions.
- Show a progress bar (requires tqdm as an optional package).
pip install tqdm # for progress bar support
pip install parmap
The examples below show unparallelized code followed by its parmap equivalent:
import parmap
# You want to do:
mylist = [1,2,3]
argument1 = 3.14
argument2 = True
y = [myfunction(x, argument1, mykeyword=argument2) for x in mylist]
# In parallel:
y = parmap.map(myfunction, mylist, argument1, mykeyword=argument2)
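To make the call shape above concrete, here is a minimal serial sketch of the same semantics. It is not parmap's implementation: `serial_map` and the body of `myfunction` are hypothetical names introduced only for illustration, and nothing here runs in parallel.

```python
def myfunction(x, scale, mykeyword=False):
    """Hypothetical worker: scale x and optionally negate the result."""
    result = x * scale
    return -result if mykeyword else result


def serial_map(function, iterable, *args, **kwargs):
    """Serial reference: each item becomes the first positional argument,
    followed by the same extra positional and keyword arguments."""
    return [function(x, *args, **kwargs) for x in iterable]


mylist = [1, 2, 3]

# The explicit list comprehension...
y_loop = [myfunction(x, 2, mykeyword=True) for x in mylist]
# ...and the map-style call with the extra arguments appended.
y_ref = serial_map(myfunction, mylist, 2, mykeyword=True)
```

Both lists hold the same values; the map-style call only moves the constant arguments out of the loop body.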
The progress bar requires tqdm (pip install tqdm):
# You want to do:
y = [myfunction(x) for x in mylist]
# In parallel, with a progress bar
y = parmap.map(myfunction, mylist, pm_pbar=True)
# Passing extra options to the tqdm progress bar
y = parmap.map(myfunction, mylist, pm_pbar={"desc": "Example"})
# You want to do:
z = [myfunction(x, y, argument1, argument2, mykey=argument3) for (x,y) in mylist]
# In parallel:
z = parmap.starmap(myfunction, mylist, argument1, argument2, mykey=argument3)
# You want to do:
listx = [1, 2, 3, 4, 5, 6]
listy = [2, 3, 4, 5, 6, 7]
param1 = 3.14
param2 = 42
listz = []
for (x, y) in zip(listx, listy):
    listz.append(myfunction(x, y, param1, param2))
# In parallel:
listz = parmap.starmap(myfunction, zip(listx, listy), param1, param2)
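The starmap form differs from map only in that each item is a tuple that gets unpacked into the leading positional arguments. A serial sketch of that behaviour, assuming a hypothetical `serial_starmap` helper and an illustrative `myfunction` body (neither comes from parmap):

```python
def myfunction(x, y, offset, scale, mykey=1):
    """Hypothetical worker combining two per-item values with constants."""
    return (x + y + offset) * scale * mykey


def serial_starmap(function, iterable, *args, **kwargs):
    """Serial reference: unpack each item, then append the constant arguments."""
    return [function(*item, *args, **kwargs) for item in iterable]


listx = [1, 2, 3]
listy = [2, 3, 4]

# Explicit loop over pairs.
z_loop = [myfunction(x, y, 10, 2, mykey=3) for (x, y) in zip(listx, listy)]
# Same computation in starmap form.
z_ref = serial_starmap(myfunction, zip(listx, listy), 10, 2, mykey=3)
```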
In this example, task1 uses 5 processes, while task2 uses 3 processes. Both tasks start computing simultaneously, and we print a message as soon as either task finishes, retrieving its result.
import parmap
def task1(item):
    return 2*item

def task2(item):
    return 2*item + 1

items1 = range(500000)
items2 = range(500)

with parmap.map_async(task1, items1, pm_processes=5) as result1:
    with parmap.map_async(task2, items2, pm_processes=3) as result2:
        data_task1 = None
        data_task2 = None
        task1_working = True
        task2_working = True
        while task1_working or task2_working:
            result1.wait(0.1)
            if task1_working and result1.ready():
                print("Task 1 has finished!")
                data_task1 = result1.get()
                task1_working = False
            result2.wait(0.1)
            if task2_working and result2.ready():
                print("Task 2 has finished!")
                data_task2 = result2.get()
                task2_working = False
# Further work with data_task1 or data_task2
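The wait/ready/get calls used above match the interface of the standard library's `multiprocessing.pool.AsyncResult`. The sketch below generalizes the polling loop to any number of tasks using only the standard library. It assumes thread pools (`ThreadPool`) instead of process pools so that it runs without a `__main__` guard, and `poll_until_done` is a hypothetical helper, not part of parmap.

```python
from multiprocessing.pool import ThreadPool


def double(item):
    return 2 * item


def poll_until_done(async_results, interval=0.05):
    """Poll several AsyncResult objects and collect each result
    as soon as its task reports ready()."""
    pending = dict(async_results)
    finished = {}
    while pending:
        for name in list(pending):
            pending[name].wait(interval)  # block briefly on this task
            if pending[name].ready():
                finished[name] = pending.pop(name).get()
    return finished


# Two independent pools, so both tasks run at the same time.
with ThreadPool(2) as pool_a, ThreadPool(2) as pool_b:
    results = poll_until_done({
        "task1": pool_a.map_async(double, range(1000)),
        "task2": pool_b.map_async(double, range(10)),
    })
```

Keeping the pending tasks in a dictionary avoids one boolean flag per task, which becomes unwieldy beyond two tasks.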
The existing functions have some usability limitations:
- The built-in Python function map is not able to parallelize.
- multiprocessing.Pool().map does not allow any additional argument to the mapped function.
- multiprocessing.Pool().starmap allows passing multiple arguments, but in order to pass a constant argument to the mapped function you need to convert it to an iterator using itertools.repeat(your_parameter).

parmap aims to overcome these limitations in the simplest possible way.
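For comparison, this is roughly what the itertools.repeat workaround looks like with the standard library alone. The sketch uses `multiprocessing.pool.ThreadPool`, which exposes the same `starmap` method as `multiprocessing.Pool`, so that it runs without a `__main__` guard; `scale` and `values` are illustrative names.

```python
import itertools
from multiprocessing.pool import ThreadPool


def scale(x, factor):
    return x * factor


values = [1, 2, 3]

with ThreadPool(2) as pool:
    # The constant factor has to be repeated and zipped with the data by hand.
    scaled = pool.starmap(scale, zip(values, itertools.repeat(10)))
```

With several constant arguments, every one of them needs its own `itertools.repeat(...)` inside the `zip`, and keyword arguments cannot be passed this way at all.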
- Create a pool for parallel computation automatically if possible.
- parmap.map(..., ..., pm_parallel=False) # disables parallelization
- parmap.map(..., ..., pm_processes=4) # use 4 parallel processes
- parmap.map(..., ..., pm_pbar=True) # show a progress bar (requires tqdm)
- parmap.map(..., ..., pm_pool=multiprocessing.Pool()) # use an existing pool, in this case parmap will not close the pool.
- parmap.map(..., ..., pm_chunksize=3) # size of chunks (see multiprocessing.Pool().map)
parmap.map() and parmap.starmap() (and their async versions) have their own arguments (pm_parallel, pm_pbar...). Those arguments are never passed to the underlying function. In the following example, myfun will receive myargument, but not pm_parallel. Do not write functions that require keyword arguments starting with pm_, as parmap may need them in the future.
parmap.map(myfun, mylist, pm_parallel=True, myargument=False)
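One way to picture this separation is a wrapper that splits its keyword arguments by prefix before calling the user's function. The sketch below is only an illustration of the idea, not parmap's actual code; `split_pm_kwargs` is a hypothetical helper.

```python
def split_pm_kwargs(kwargs, prefix="pm_"):
    """Separate wrapper-level options from the keyword arguments
    that should be forwarded to the mapped function."""
    own = {k: v for k, v in kwargs.items() if k.startswith(prefix)}
    forwarded = {k: v for k, v in kwargs.items() if not k.startswith(prefix)}
    return own, forwarded


own, forwarded = split_pm_kwargs({"pm_parallel": True, "myargument": False})
# own holds the options consumed by the wrapper;
# forwarded holds what the mapped function would receive.
```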
Additionally, there are other keyword arguments that should be avoided in the functions you write, because of parmap backwards compatibility reasons. The list of conflicting arguments is: parallel, chunksize, pool, processes, callback, error_callback and parmap_progress.
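A quick way to check your own functions against these rules is to inspect their signatures. The helper below is a hypothetical sketch built on the standard library's `inspect` module; the reserved names are copied from the list above, and the `pm_` prefix check follows the earlier recommendation.

```python
import inspect

# Names listed above as conflicting for backwards compatibility.
RESERVED_NAMES = {
    "parallel", "chunksize", "pool", "processes",
    "callback", "error_callback", "parmap_progress",
}


def conflicting_parameters(function):
    """Return the parameter names of `function` that are reserved
    or that start with the pm_ prefix."""
    names = inspect.signature(function).parameters
    return sorted(n for n in names if n in RESERVED_NAMES or n.startswith("pm_"))


def risky(x, pool=None, pm_flag=True, threshold=0.5):
    return x


def safe(x, threshold=0.5):
    return x


risky_conflicts = conflicting_parameters(risky)
safe_conflicts = conflicting_parameters(safe)
```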
This package started after this question, when I offered this answer, taking the suggestions of J.F. Sebastian for his answer.
- Davide Gerosa, Michael Kesden, "PRECESSION. Dynamics of spinning black-hole binaries with python." arXiv:1605.01067, 2016
- Thibault de Boissiere, Implementation of Deep learning papers, 2017
  - Wasserstein Generative Adversarial Networks arXiv:1701.07875
  - pix2pix arXiv:1611.07004
  - Improved Techniques for Training Generative Adversarial Networks arXiv:1606.03498
  - Colorful Image Colorization arXiv:1603.08511
  - Deep Feature Interpolation for Image Content Changes arXiv:1611.05507
  - InfoGAN arXiv:1606.03657
- Geoscience Australia, SIFRA, a System for Infrastructure Facility Resilience Analysis, 2017
- André F. Rendeiro, Christian Schmidl, Jonathan C. Strefford, Renata Walewska, Zadie Davis, Matthias Farlik, David Oscier, Christoph Bock, "Chromatin accessibility maps of chronic lymphocytic leukemia identify subtype-specific epigenome signatures and transcription regulatory networks", Nat. Commun. 7:11938, doi: 10.1038/ncomms11938 (2016).