feat: adding flux executor, using flux futures
Signed-off-by: vsoch <vsoch@users.noreply.github.com>
vsoch committed Aug 13, 2022
1 parent 417aad4 commit f132f1d
Showing 9 changed files with 484 additions and 4 deletions.
4 changes: 2 additions & 2 deletions docs/executing/cloud.rst
@@ -364,9 +364,9 @@ defined as `mem_mb` to Tibanna. Further, it will propagate the number of threads
a job intends to use, such that Tibanna can allocate it to the most cost-effective
cloud compute instance available.

-----------------------------------------------------------------
--------------------------------------------
Executing a Snakemake workflow via GA4GH TES
-----------------------------------------------------------------
--------------------------------------------

The task execution service (`TES <https://github.com/ga4gh/task-execution-schemas>`_) is an application programming interface developed by the Global Alliance for Genomics and Health (`GA4GH <https://www.ga4gh.org/>`_).
It is used to process workflow tasks in a cloud environment.
146 changes: 146 additions & 0 deletions docs/executor_tutorial/flux.rst
@@ -0,0 +1,146 @@

.. _tutorial-flux:

Flux Tutorial
-------------

.. _Snakemake: http://snakemake.readthedocs.io
.. _Snakemake Remotes: https://snakemake.readthedocs.io/en/stable/snakefiles/remote_files.html
.. _Python: https://www.python.org/


Setup
:::::

To go through this tutorial, you need the following software installed:

* Docker

`Flux-framework <https://flux-framework.org/>`_ is a flexible resource scheduler that can work on both high performance computing systems and cloud environments (e.g., Kubernetes).
Since it is more modern (e.g., it has an official Python API), we group it with the cloud executors. For this example, we will show you how to set up a single-node local Flux container to interact with Snakemake. You can use the Dockerfile in ``examples/flux``, which provides a container with Flux and Snakemake installed.
Note that we install Snakemake from source and bind it to ``/home/fluxuser/snakemake`` so that you can develop against it if desired.
First, build the container:

.. code-block:: console

    $ docker build -t flux-snake .

Then run the container, with or without a bind mount:

.. code-block:: console

    $ docker run -it --rm flux-snake

Once you have shelled into the container, you can view and start a Flux instance:

.. code-block:: console

    $ flux getattr size
    $ flux start --test-size=4
    $ flux getattr size

And see the resources available:

.. code-block:: console

    $ flux resource status
    STATUS NNODES NODELIST
     avail      4 5a74dc238d[98,98,98,98]

Step 1: Run Snakemake
:::::::::::::::::::::

Now let's run Snakemake with the Flux executor. There is an example ``Snakefile``
in the flux examples folder that will show running a "Hello World!" example,
and this file should be in your fluxuser home:


.. code-block:: console

    $ ls
    Snakefile  snakemake

Here is how to run the workflow:

.. code-block:: console

    $ snakemake --flux --cores=1

The flag above refers to:

- ``--flux``: tell Snakemake to use the Flux executor


Once you submit the workflow, you'll immediately see the familiar Snakemake console output.
The jobs themselves finish quickly, but the default wait between status checks is 10 seconds,
so the run takes a bit longer overall.
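The check loop described above can be sketched as a simple polling helper. This is an illustrative sketch only; the function name and parameters below are hypothetical and do not reflect Snakemake's actual internals:

```python
import time

def wait_until_done(check_status, seconds_between_checks=10, timeout=120):
    """Poll a status callback until it reports completion or we time out.

    `check_status` stands in for a call that asks the scheduler (here, Flux)
    whether the submitted job has finished.
    """
    waited = 0.0
    while waited < timeout:
        if check_status():
            return True
        time.sleep(seconds_between_checks)
        waited += seconds_between_checks
    return False
```

With a 10-second interval, even sub-second jobs are only observed as finished on the next check, which is why the workflow above takes tens of seconds end to end.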

.. code-block:: console

    Building DAG of jobs...
    Using shell: /usr/bin/bash
    Provided cores: 1 (use --cores to define parallelism)
    Rules claiming more threads will be scaled down.
    Job stats:
    job                         count    min threads    max threads
    ------------------------  -------  -------------  -------------
    all                             1              1              1
    multilingual_hello_world        2              1              1
    total                           3              1              1

    Select jobs to execute...

    [Fri Aug 12 21:09:32 2022]
    rule multilingual_hello_world:
        output: hello/world.txt
        jobid: 1
        reason: Missing output files: hello/world.txt
        wildcards: greeting=hello
        resources: tmpdir=/tmp

    Checking status for job ƒ3sWJLhD
    [Fri Aug 12 21:09:42 2022]
    Finished job 1.
    1 of 3 steps (33%) done
    Select jobs to execute...

    [Fri Aug 12 21:09:42 2022]
    rule multilingual_hello_world:
        output: hola/world.txt
        jobid: 2
        reason: Missing output files: hola/world.txt
        wildcards: greeting=hola
        resources: tmpdir=/tmp

    Checking status for job ƒ8JAY1Kd
    [Fri Aug 12 21:09:52 2022]
    Finished job 2.
    2 of 3 steps (67%) done
    Select jobs to execute...

    [Fri Aug 12 21:09:52 2022]
    localrule all:
        input: hello/world.txt, hola/world.txt
        jobid: 0
        reason: Input files updated by another job: hola/world.txt, hello/world.txt
        resources: tmpdir=/tmp

    [Fri Aug 12 21:09:52 2022]
    Finished job 0.
    3 of 3 steps (100%) done
    Complete log: .snakemake/log/2022-08-12T210932.564786.snakemake.log

At this point, you can inspect the local directory to see your job output!

.. code-block:: console

    $ ls
    Snakefile  hello  hola
    $ cat hello/world.txt
    hello, World!

See the `flux documentation <https://flux-framework.readthedocs.io/en/latest/quickstart.html#docker-recommended-for-quick-single-node-deployments>`_
for more detail. For now, let's look at how Snakemake interacts with Flux through the `Flux Python Bindings <https://flux-framework.readthedocs.io/projects/flux-workflow-examples/en/latest/job-submit-api/README.html>`_.
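The executor's futures-based pattern can be sketched with Python's standard library. The sketch below uses ``concurrent.futures`` as a stand-in for ``flux.job`` futures; the names (``run_greeting``, the pool) are illustrative, not Snakemake's or Flux's actual API:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_greeting(greeting):
    # Stand-in for a scheduled Flux job that produces "<greeting>, World!"
    time.sleep(0.1)
    return f"{greeting}, World!"

with ThreadPoolExecutor(max_workers=2) as pool:
    # Submit returns immediately with a future, as Flux job submission does
    futures = [pool.submit(run_greeting, g) for g in ("hello", "hola")]
    # Block on each future until the underlying "job" completes
    results = [f.result() for f in futures]

print(results)  # → ['hello, World!', 'hola, World!']
```

The real executor submits a jobspec to a Flux handle and then watches the returned future for completion, but the submit-then-wait shape is the same.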

The code for this example is provided in `examples/flux <https://github.com/snakemake/snakemake/tree/main/examples/flux>`_.
2 changes: 1 addition & 1 deletion docs/executor_tutorial/tutorial.rst
@@ -24,7 +24,7 @@ We ensured that no bioinformatics knowledge is needed to understand the tutorial
:maxdepth: 2

google_lifesciences

azure_aks
flux


11 changes: 11 additions & 0 deletions examples/flux/Dockerfile
@@ -0,0 +1,11 @@
FROM fluxrm/flux-sched:focal
# docker build -t flux-snake .
# docker run -it flux-snake
USER root
#ENV PATH /home/fluxuser/.local/bin:${PATH}
RUN wget --quiet https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda.sh && \
/bin/bash ~/miniconda.sh -b -p /opt/conda
ENV PATH=/opt/conda:/bin:$PATH
RUN pip install git+https://github.com/vsoch/snakemake@add/flux-executor
USER fluxuser
COPY ./Snakefile /home/fluxuser/Snakefile
19 changes: 19 additions & 0 deletions examples/flux/Snakefile
@@ -0,0 +1,19 @@
# By convention, the first pseudorule should be called "all"
# We're using the expand() function to create multiple targets
rule all:
input:
expand(
"{greeting}/world.txt",
greeting = ['hello', 'hola'],
),

# First real rule, this is using a wildcard called "greeting"
rule multilingual_hello_world:
output:
"{greeting}/world.txt",
shell:
"""
mkdir -p "{wildcards.greeting}"
sleep 5
echo "{wildcards.greeting}, World!" > {output}
"""
14 changes: 13 additions & 1 deletion snakemake/__init__.py
@@ -158,6 +158,7 @@ def snakemake(
wrapper_prefix=None,
kubernetes=None,
container_image=None,
flux=False,
tibanna=False,
tibanna_sfn=None,
google_lifesciences=False,
@@ -292,6 +293,7 @@ def snakemake(
wrapper_prefix (str): prefix for wrapper script URLs (default None)
kubernetes (str): submit jobs to Kubernetes, using the given namespace.
container_image (str): Docker image to use, e.g., for Kubernetes.
flux (bool): launch the workflow on a Flux cluster.
default_remote_provider (str): default remote provider to use instead of local files (e.g. S3, GS)
default_remote_prefix (str): prefix for default remote provider (e.g. name of the bucket).
tibanna (bool): submit jobs to AWS cloud using Tibanna.
@@ -402,7 +404,7 @@
assume_shared_fs = False
default_remote_provider = "GS"
default_remote_prefix = default_remote_prefix.rstrip("/")
if kubernetes:
if kubernetes or flux:
assume_shared_fs = False

# Currently preemptible instances only supported for Google LifeSciences Executor
@@ -709,6 +711,7 @@
google_lifesciences_regions=google_lifesciences_regions,
google_lifesciences_location=google_lifesciences_location,
google_lifesciences_cache=google_lifesciences_cache,
flux=flux,
tes=tes,
precommand=precommand,
preemption_default=preemption_default,
@@ -762,6 +765,7 @@
google_lifesciences_location=google_lifesciences_location,
google_lifesciences_cache=google_lifesciences_cache,
tes=tes,
flux=flux,
precommand=precommand,
preemption_default=preemption_default,
preemptible_rules=preemptible_rules,
@@ -2257,6 +2261,7 @@ def get_argument_parser(profile=None):
)

group_cloud = parser.add_argument_group("CLOUD")
group_flux = parser.add_argument_group("FLUX")
group_kubernetes = parser.add_argument_group("KUBERNETES")
group_tibanna = parser.add_argument_group("TIBANNA")
group_google_life_science = parser.add_argument_group("GOOGLE_LIFE_SCIENCE")
@@ -2356,6 +2361,12 @@
"are deleted at the shutdown step of the workflow.",
)

group_flux.add_argument(
"--flux",
action="store_true",
help="Execute your workflow on a flux cluster.",
)

group_tes.add_argument(
"--tes",
metavar="URL",
@@ -2912,6 +2923,7 @@ def open_browser():
drmaa_log_dir=args.drmaa_log_dir,
kubernetes=args.kubernetes,
container_image=args.container_image,
flux=args.flux,
tibanna=args.tibanna,
tibanna_sfn=args.tibanna_sfn,
google_lifesciences=args.google_lifesciences,
