parallel graph management and execution in heterogeneous computing
About the Pathos Framework
pathos is a framework for heterogenous computing. It provides a consistent
high-level interface for configuring and launching parallel computations
across heterogenous resources.
pathos provides configurable launchers for
parallel and distributed computing, where each launcher contains the
syntactic logic to configure and launch jobs in an execution environment.
Examples of launchers that plug into
pathos are: a queue-less MPI-based
pyina), a ssh-based launcher (in
pathos), and a multi-process
pathos provides a consistent interface for parallel and/or distributed
apply for each launcher, thus lowering the barrier
for users to extend their code to parallel and/or distributed resources.
The guiding design principle behind
pathos is that
should be drop-in replacements in otherwise serial code, and thus switching
to one or more of the
pathos launchers is all that is needed to enable
code to leverage the selected parallel or distributed computing resource.
This not only greatly reduces the time to convert a code to parallel, but it
also enables a single code-base to be maintained instead of requiring
parallel, serial, and distributed versions of a code.
pathos maps can be
nested, thus hierarchical heterogenous computing is possible by merely
selecting the desired hierarchy of
pathos framework is composed of several interoperating packages:
dill: a utility to serialize all of python
pox: utilities for filesystem exploration and automated builds
klepto: persistent caching to memory, disk, or database
multiprocess: better multiprocessing and multithreading in python
ppft: distributed and parallel python
pyina: MPI parallel
mapand cluster scheduling
pathos: graph management and execution in heterogenous computing
pathos package provides a few basic tools to make parallel and
distributed computing more accessible to the end user. The goal of
is to enable the user to extend their own code to parallel and distributed
computing with minimal refactoring.
pathos provides methods for configuring, launching, monitoring, and
controlling a service on a remote host. One of the most basic features
pathos is the ability to configure and launch a RPC-based service
on a remote host.
pathos seeds the remote host with the
script, which allows the remote host to inform the localhost of a port
that is available for communication.
Beyond the ability to establish a RPC service, and then post requests,
is the ability to launch code in parallel. Unlike parallel computing
performed at the node level (typically with MPI),
pathos enables the
user to launch jobs in parallel across heterogeneous distributed resources.
pathos provides distributed
pipe algorithms, where a mix of
local processors and distributed workers can be selected.
also provides a very basic automated load balancing service, as well as
the ability for the user to directly select the resources.
pool.map interface, yields a
map implementation that
hides the RPC internals from the user. With
pool.map, the user can launch
their code in parallel, and as a distributed service, using standard python
and without writing a line of server or parallel batch code.
RPC servers and communication in general is known to be insecure. However,
instead of attempting to make the RPC communication itself secure,
provides the ability to automatically wrap any distributes service or
communication in a ssh-tunnel. Ssh is a universally trusted method.
pathos has launched several distributed calculations
on national lab clusters, and to date has performed test calculations
that utilize node-to-node communication between several national lab clusters
and a user's laptop.
pathos allows the user to configure and launch
at a very atomistic level, through raw access to ssh and scp.
pathos is the core of a python framework for heterogeneous computing.
pathos is in active development, so any user feedback, bug reports, comments,
or suggestions are highly appreciated. A list of known issues is maintained
at http://trac.mystic.cacr.caltech.edu/project/pathos/query.html, with a public
ticket list at https://github.com/uqfoundation/pathos/issues.
pathos provides a configurable distributed parallel
to launching RPC service calls, with:
mapinterface that meets and extends the python
- the ability to submit service requests to a selection of servers
- the ability to tunnel server communications with ssh
pathos core is built on low-level communication to remote hosts using
ssh. The interface to ssh, scp, and ssh-tunneled connections can:
- configure and launch remote processes with ssh
- configure and copy file objects with scp
- establish an tear-down a ssh-tunnel
To get up and running quickly,
pathos also provides infrastructure to:
- easily establish a ssh-tunneled connection to a RPC server
The latest released version of
pathos is available from:
pathos is distributed under a 3-clause BSD license.
You can get the latest development version with all the shiny new features at: https://github.com/uqfoundation
If you have a new contribution, please submit a pull request.
Probably the best way to get started is to look at the documentation at
http://pathos.rtfd.io. Also see
for a set of scripts that demonstrate the configuration and launching of
communications with ssh and scp, and demonstrate the configuration and
execution of jobs in a hierarchical parallel workflow. You can run the test
python -m pathos.tests. Tunnels and other connections to
remote servers can be established with the
pathos_connect script (or with
python -m pathos). See
pathos_connect --help for more information.
pathos also provides a
portpicker script to select an open port
(also available with
python -m pathos.portpicker). The source code is
generally well documented, so further questions may be resolved by inspecting
the code itself. Please feel free to submit a ticket on github, or ask a
question on stackoverflow (@Mike McKerns).
If you would like to share how you use
pathos in your work, please send
an email (to mmckerns at uqfoundation dot org).
Important classes and functions are found here:
pathos.abstract_launcher[the worker pool API definition]
pathos.pools[all of the pathos worker pools]
pathos.core[the high-level command interface]
pathos.hosts[the hostname registry interface]
pathos.serial.SerialPool[the serial python worker pool]
pathos.parallel.ParallelPool[the parallelpython worker pool]
pathos.multiprocessing.ProcessPool[the multiprocessing worker pool]
pathos.threading.ThreadPool[the multithreading worker pool]
pathos.connection.Pipe[the launcher base class]
pathos.secure.Pipe[the secure launcher base class]
pathos.secure.Copier[the secure copier base class]
pathos.secure.Tunnel[the secure tunnel base class]
pathos.selector.Selector[the selector base class]
pathos.server.Server[the server base class]
pathos.profile[profiling in threads and processes]
pathos also provides two convenience scripts that are used to establish
secure distributed connections. These scripts are installed to a directory
on the user's
$PATH, and thus can be run from anywhere:
portpicker[get the portnumber of an open port]
pathos_connect[establish tunnel and/or RPC server]
--help as an argument to any of the above scripts will print out an
instructive help message.
If you use
pathos to do research that leads to publication, we ask that you
acknowledge use of
pathos by citing the following in your publication::
M.M. McKerns, L. Strand, T. Sullivan, A. Fang, M.A.G. Aivazis, "Building a framework for predictive science", Proceedings of the 10th Python in Science Conference, 2011; http://arxiv.org/pdf/1202.1056 Michael McKerns and Michael Aivazis, "pathos: a framework for heterogeneous computing", 2010- ; http://trac.mystic.cacr.caltech.edu/project/pathos
Please see http://trac.mystic.cacr.caltech.edu/project/pathos or http://arxiv.org/pdf/1202.1056 for further information.