<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-us">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1"/>
<link href="//netdna.bootstrapcdn.com/font-awesome/4.0.3/css/font-awesome.css" rel="stylesheet">
<link href="https://fonts.googleapis.com/css?family=Source+Sans+Pro" rel="stylesheet" type="text/css">
<link href="https://fonts.googleapis.com/css?family=Droid+Sans:400,700" rel="stylesheet">
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0-beta.2/css/bootstrap.min.css" integrity="sha384-PsH8R72JQ3SOdhVi3uxftmaW6Vc51MKb0q5P2rRUpPvrszuE4W1povHYgTpBfshb" crossorigin="anonymous">
<script src="https://code.jquery.com/jquery-3.2.1.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/popper.js/1.12.3/umd/popper.min.js" integrity="sha384-vFJXuSJphROIrBnz7yo7oB41mKfc8JzQZiCq4NCceLEaO4IHwicKwpJf9c9IpFgh" crossorigin="anonymous"></script>
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0-beta.2/js/bootstrap.min.js" integrity="sha384-alpBpkh1PFOepccYVYDB4do5UnbKysX5WZXm3XxPqe5iKTfUKjNkCk9SaVuEZflJ" crossorigin="anonymous"></script>
<title>How does SoS compare with other workflow engines | vatlab</title>
<meta property='og:title' content='How does SoS compare with other workflow engines - vatlab'>
<meta property='og:description' content='Over 200 workflow systems have been developed to date. Like any other software tools, many workflow systems are actively evolving with new features added from time to time. The goal of this blog post is to illustrate, by means of comparison to some of the most popular workflow systems similar to SoS, features and limitations of SoS as a conventional workflow system. It should be seen as a check-list of basic workflow features, in addition to the unique niche SoS places itself in the realm of workflow systems as explained in the next section and in other posts.'>
<meta property='og:url' content='https://vatlab.github.io/blog/post/comparison/'>
<meta property='og:site_name' content='vatlab'>
<meta property='og:type' content='article'><meta property='og:image' content='https://www.gravatar.com/avatar/969c44b6bc34a74a2f1d1dc59ddaa98f?s=256'><meta property='article:section' content='Post'><meta property='article:tag' content='SoS'><meta property='article:published_time' content='2018-03-29T00:00:00Z'/><meta property='article:modified_time' content='2018-03-29T00:00:00Z'/><meta name='twitter:card' content='summary'><meta name='twitter:site' content='@ScriptOfScripts'><meta name='twitter:creator' content='@ScriptOfScripts'>
<link rel="stylesheet" href="https://vatlab.github.io/blog//css/style.css"/><link rel='stylesheet' href='https://vatlab.github.io/blog/css/custom.css'></head>
<body>
<section class="section">
<div class="container">
<nav class="nav">
<div class="nav-left">
<a class="nav-item" href="https://vatlab.github.io/blog/"><h1 class="title is-4">vatlab
</h1></a>
</div>
<div class="nav-right">
<nav class="nav-item level is-mobile"><a class="level-item" href='mailto:bpeng@mdanderson.org' target='_blank' rel='noopener'>
<span class="icon">
<i class><svg viewbox='0 0 24 24' stroke-linecap='round' stroke-linejoin='round' stroke-width='2' aria-hidden='true'>
<path d="M4 4h16c1.1 0 2 .9 2 2v12c0 1.1-.9 2-2 2H4c-1.1 0-2-.9-2-2V6c0-1.1.9-2 2-2z"/>
<polyline points="22,6 12,13 2,6"/>
</svg></i>
</span>
</a><a class="level-item" href='https://github.com/vatlab' target='_blank' rel='noopener'>
<span class="icon">
<i class><svg viewbox='0 0 24 24' stroke-linecap='round' stroke-linejoin='round' stroke-width='2' aria-hidden='true'>
<path d="M9 19c-5 1.5-5-2.5-7-3m14 6v-3.87a3.37 3.37 0 0 0-.94-2.61c3.14-.35 6.44-1.54 6.44-7A5.44 5.44 0 0 0 20 4.77 5.07 5.07 0 0 0 19.91 1S18.73.65 16 2.48a13.38 13.38 0 0 0-7 0C6.27.65 5.09 1 5.09 1A5.07 5.07 0 0 0 5 4.77a5.44 5.44 0 0 0-1.5 3.78c0 5.42 3.3 6.61 6.44 7A3.37 3.37 0 0 0 9 18.13V22"/>
</svg></i>
</span>
</a><a class="level-item" href='https://twitter.com/ScriptOfScripts' target='_blank' rel='noopener'>
<span class="icon">
<i class><svg viewbox='0 0 24 24' stroke-linecap='round' stroke-linejoin='round' stroke-width='2' aria-hidden='true'>
<path d="M23 3a10.9 10.9 0 0 1-3.14 1.53 4.48 4.48 0 0 0-7.86 3v1A10.66 10.66 0 0 1 3 4s-4 9 5 13a11.64 11.64 0 0 1-7 2c9 5 20 0 20-11.5a4.5 4.5 0 0 0-.08-.83A7.72 7.72 0 0 0 23 3z"/>
</svg></i>
</span>
</a></nav>
</div>
</nav>
</div>
</section>
<section class="section">
<div class="container">
<div class="subtitle is-6 is-pulled-right">
<a class="subtitle is-6" href="https://vatlab.github.io/blog/tags/sos">#SoS</a>
</div>
<h2 class="subtitle is-6">March 29, 2018</h2>
<h1 class="title">How does SoS compare with other workflow engines</h1>
<div class="content">
<p>Over <a href="https://github.com/common-workflow-language/common-workflow-language/wiki/Existing-Workflow-systems" target="_blank">200 workflow systems</a> have been developed to date.
Like any other software tools, many workflow systems are actively evolving with new features added from time to time. The goal of this blog post is to illustrate, by means of
comparison to some of the most popular workflow systems similar to SoS, <strong>features and limitations of SoS as a conventional workflow system</strong>.
It should be seen as a check-list of basic workflow features, in addition to the unique niche SoS places itself in the realm of workflow systems as explained
in the next section and in <a href="https://vatlab.github.io/blog/" target="_blank">other posts</a>.</p>
<h2 id="how-does-sos-compare-with-nextflow-snakemake-bpipe-cwl-and-galaxy">How does SoS compare with Nextflow, Snakemake, Bpipe, CWL, and Galaxy</h2>
<p>Most workflow systems are designed for “consumers” of workflows and emphasize efficient execution of well-crafted workflows whose details are hidden from the user. In contrast,
<strong>SoS is designed for “developers” of workflows for ad hoc data processing, with an emphasis on lowering the barrier to using workflows in daily computational research</strong>.
The following tables compare basic features, workflow features, and built-in support for external tools and services between <a href="https://vatlab.github.io/sos-docs/" target="_blank">SoS</a>,
<a href="https://www.nextflow.io/" target="_blank">Nextflow</a>, <a href="https://snakemake.readthedocs.io/en/stable/" target="_blank">Snakemake</a>, <a href="https://github.com/ssadedin/bpipe" target="_blank">Bpipe</a>, <a href="https://github.com/common-workflow-language/common-workflow-language" target="_blank">CWL</a>, and <a href="https://usegalaxy.org/" target="_blank">Galaxy</a>.</p>
<div class="alert alert-success" role="alert">
<span class="alert-heading"><h5>Hint:</h5></span>
<ol>
<li><strong>Click on the rows of the table to expand/collapse detailed explanations.</strong></li>
<li>"(?)" indicates uncertain comparisons due to lack of information.</li>
<li>Information presented here can be inaccurate or obsolete due to rapid evolution of workflow engines.</li>
</ol>
We would very much appreciate it if you could <a href="https://github.com/vatlab/blog/issues/10" class="alert-link" target="_blank">send us your comments</a> or <a href="https://github.com/vatlab/blog" class="alert-link" target="_blank">pull requests</a> if you notice any problems with the table, or if you believe more workflow engines or features should be compared.
</div>
<h3 id="basic-information">Basic information</h3>
<div class="comparison">
<table>
<thead>
<tr>
<th align="left">Workflow</th>
<th align="left">SoS</th>
<th align="left">Nextflow</th>
<th align="left">Snakemake</th>
<th align="left">Bpipe</th>
<th align="left">CWL</th>
<th align="left">Galaxy</th>
</tr>
</thead>
<tbody>
<tr class="result">
<th align="left">Language</th>
<td align="left">Python based</td>
<td align="left">Groovy flavored</td>
<td align="left">GNU Make style, Python flavored</td>
<td align="left">Groovy flavored</td>
<td align="left">na</td>
<td align="left">na</td>
</tr>
<tr class="detail">
<td>
The scripting language for workflow specification</td>
<td>
SoS extends Python 3.6 with
<a href="https://vatlab.github.io/sos-docs/doc/documentation/SoS_Syntax.html">a number of SoS-specific syntax extensions</a> and
<a href="https://vatlab.github.io/sos-docs/doc/documentation/Targets_and_Actions.html">pre-defined functions</a>.
</td>
<td>Nextflow is based on Groovy syntax with Nextflow-defined functions and objects. See <a href="https://www.nextflow.io/docs/latest/script.html">here</a> for details.</td>
<td>Snakemake is written in Python and has the flavor of <code>Make</code> system in syntax and execution.</td>
<td>Bpipe is implemented in Groovy. Its syntax departs as little as possible from the simplicity of the shell script.</td>
<td>CWL workflows are specified in JSON or YAML format.</td>
<td>Galaxy's workflows are stored in JSON files together with GUI-related meta information</td>
</tr>
<tr class="result">
<th align="left">User interface</th>
<td align="left">CLI + Notebook (Jupyter)</td>
<td align="left">CLI</td>
<td align="left">CLI</td>
<td align="left">CLI</td>
<td align="left">CLI (cwltool)</td>
<td align="left">CLI + GUI</td>
</tr>
<tr class="detail">
<td>Primary methods for users to interact with the workflow engine</td>
<td>SoS provides two user interfaces: <a href="https://vatlab.github.io/sos-docs/doc/documentation/User_Interface.html">command line</a> (<code>sos</code> command) and <a href="https://vatlab.github.io/sos-docs/doc/documentation/Notebook_Interface.html#Execution-of-Workflows--15">Jupyter magics</a> (<code>%run</code>, <code>%sosrun</code>, etc.).</td>
<td>Nextflow workflows are executed with a <code>nextflow</code> command.</td>
<td>Snakemake workflows are executed with a <code>snakemake</code> command.</td>
<td>Bpipe workflows are executed with a <code>bpipe</code> command.</td>
<td>cwltool has a CLI, but other workflow engines could provide a GUI</td>
<td>Galaxy workflows are mostly executed using a web interface, but they can also be executed using a CLI.</td>
</tr>
<tr class="result">
<th align="left">File format</th>
<td align="left">.sos (plain text) and Jupyter notebook</td>
<td align="left">.nf (plain text)</td>
<td align="left">Snakefile (plain text)</td>
<td align="left">.pipe (groovy, plain text)</td>
<td align="left">.cwl and .yml (JSON/YAML)</td>
<td align="left">XML</td>
</tr>
<tr class="detail">
<td>Format(s) to save workflows</td>
<td>SoS workflows can be saved in a plain text <code>.sos</code> format, or be embedded in a Jupyter Notebook with SoS kernel.</td>
<td>Plain text file with <code>.nf</code> extension.</td>
<td>Plain text file named <code>Snakefile</code>, or with <code>*.rules</code> extension for rules from another file.</td>
<td>Plain text file with <code>.pipe</code>, or <code>*.groovy</code> extensions. </td>
<td>CWL documents are written in JSON or YAML, or a mix of the two</td>
<td>Galaxy files are saved by the framework and are not supposed to be edited directly.</td>
</tr>
<tr class="result">
<th align="left">IDE</th>
<td align="left">SoS Notebook (Jupyter)</td>
<td align="left">No</td>
<td align="left">No</td>
<td align="left">No</td>
<td align="left">No (cwltool)</td>
<td align="left">Yes (only for building DAG)</td>
</tr>
<tr class="detail">
<td>Integrated Development Environment</td>
<td>SoS uses SoS Notebook, a companion polyglot notebook environment based on Jupyter, as its IDE.</td>
<td>No dedicated IDE is available, but users can use IDEs that support Groovy (e.g. Eclipse, NetBeans) to edit (but not execute) Nextflow workflows.</td>
<td>No dedicated IDE is available, but syntax highlighting plugins are provided for some text editors.</td>
<td>No dedicated IDE is available, but editors supporting Groovy syntax can be used to facilitate pipeline development.</td>
<td>No IDE is provided for cwltool, but other task engines might provide one</td>
<td>A web interface is provided to create steps and connect them</td>
</tr>
</tbody>
</table>
</div>
<h3 id="workflow-features">Workflow features</h3>
<div class="comparison">
<table>
<thead>
<tr>
<th align="left">Workflow</th>
<th align="left">SoS</th>
<th align="left">Nextflow</th>
<th align="left">Snakemake</th>
<th align="left">Bpipe</th>
<th align="left">CWL</th>
<th align="left">Galaxy</th>
</tr>
</thead>
<tbody>
<tr class="result">
<th align="left">DAG Building</th>
<td align="left">Explicit DAG of steps by connecting steps, implicit by target matching</td>
<td align="left">Implicit DAG of steps from input/output</td>
<td align="left">Implicit DAG by files from pattern matching input/output</td>
<td align="left">Implicit DAG of steps from input/output</td>
<td align="left">Implicit DAG of steps from input/output</td>
<td align="left">Explicit DAG of steps by connecting steps</td>
</tr>
<tr class="detail">
<td>Methods and logic to construct dependency graphs connecting tasks in a workflow.
</td>
<td>SoS supports forward-style (sequentially numbered steps), makefile-style (target dependency), and mixed-style workflows and subworkflows, and a step can explicitly depend on other steps.</td>
<td>Nextflow specifies process with input and output, and creates DAG of steps (the processes).</td>
<td>Relies on <a href="http://snakemake.readthedocs.io/en/stable/snakefiles/rules.html">filename (pattern) matching</a> to determine execution sequence.</td>
<td>Bpipe specifies stages with input and output, and creates DAG from the stages.</td>
<td>DAG is constructed from source of steps</td>
<td>DAG of galaxy is built explicitly using its web interface.</td>
</tr>
<tr class="result">
<th align="left">Streaming processing</th>
<td align="left">No</td>
<td align="left">Yes</td>
<td align="left">Yes</td>
<td align="left">No</td>
<td align="left">Optional</td>
<td align="left">No</td>
</tr>
<tr class="detail">
<td>Ability to process task inputs/outputs as a stream of data.
</td>
<td>SoS "data" are passed around as files.</td>
<td>Processes in Nextflow can communicate via asynchronous FIFO queues, called channels in the Nextflow lingo.</td>
<td>From Snakemake 5.0 on, it is possible to <a href='http://snakemake.readthedocs.io/en/stable/snakefiles/rules.html?highlight=pipe#piped-output'>mark output files as pipes</a>. </td>
<td>
Input and output variables are files.</td>
<td>The input files can be "streamable" and may be handled by pipes</td>
<td>Galaxy does not support streaming between steps.</td>
</tr>
<tr class="result">
<th align="left">Subworkflow</th>
<td align="left">Yes</td>
<td align="left">Yes</td>
<td align="left">Yes</td>
<td align="left">Yes</td>
<td align="left">Yes</td>
<td align="left">Yes</td>
</tr>
<tr class="detail">
<td>Support for executing subworkflows, potentially loaded from another pipeline file.
</td>
<td>SoS provides a <code>sos_run(name)</code> function to dynamically execute a subworkflow.</td>
<td>Nextflow supports subworkflows through the use of <a href="https://www.nextflow.io/docs/edge/dsl2.html#modules">submodules</a>.</td>
<td>Rules can be loaded from other text files. Subworkflows can be achieved by setting input of one workflow explicitly as output of another workflow.</td>
<td>Bpipe's <code>run</code> keyword uses the <code>+</code> operator to connect selected stages into a pipeline. The <code>Load</code> statement can be used to import variables and pipeline stages from other files.</td>
<td>A CWL workflow can be used in place of a regular CWL step</td>
<td>Subworkflows are supported, although they cannot be generated dynamically as in other workflow tools.</td>
</tr>
<tr class="result">
<th align="left">Atomic Write</th>
<td align="left">Yes</td>
<td align="left">Yes</td>
<td align="left">Yes</td>
<td align="left">Yes</td>
<td align="left">Implementation dependent</td>
<td align="left">Likely yes</td>
</tr>
<tr class="detail">
<td>Generate output only when the step completes so that failed steps do not leave incomplete output files.
</td>
<td>SoS uses step signatures to track the output of steps and will remove partial output when a step fails.</td>
<td>Nextflow steps are executed in a stage area so all outputs are complete.</td>
<td>Snakemake uses .snakemake/incomplete_files to track partial output files from failed runs.</td>
<td>Output from failed steps is cleaned up so failed steps will not get in the way during re-execution</td>
<td>The CWL specification does not require atomic write, but individual workflow engines will likely implement it in some way</td>
<td>We could not find any information related to how galaxy recovers from failed steps. It is likely that its steps are staged so writes are atomic.</td>
</tr>
<tr class="result">
<th align="left">Named input/output</th>
<td align="left">Yes</td>
<td align="left">Yes</td>
<td align="left">Yes</td>
<td align="left">No</td>
<td align="left">Yes</td>
<td align="left">Yes</td>
</tr>
<tr class="detail">
<td>Label step input and output and use the labels to connect steps as flow of data
</td>
<td>SoS supports named input and output through keyword arguments in input and output statements, and refers to them with functions such as <code>named_output</code></td>
<td>The "from" part of input essentially names the input</td>
<td>Snakemake supports <a href="https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html">named input</a> through keyword arguments in input and output statements.</td>
<td>There seems to be no way to group input by names in Bpipe</td>
<td>CWL supports named output and the creation of data flow</td>
<td>Galaxy workflows explicitly label inputs and outputs</td>
</tr>
<tr class="result">
<th align="left">Modify and resume</th>
<td align="left">Yes</td>
<td align="left">Yes</td>
<td align="left">Yes</td>
<td align="left">Yes</td>
<td align="left">Yes (Optional for other engines)</td>
<td align="left">No (?)</td>
</tr>
<tr class="detail">
<td>Able to resume interrupted or modified workflow and ignore parts of the workflow that have been successfully executed</td>
<td>SoS automatically keeps signatures of steps and tasks and can ignore steps and tasks that have already
been executed, even if they were executed by a different workflow.</td>
<td>Nextflow keeps track of all the processes executed in your pipeline. If you modify some parts of your script, only the processes that are actually changed will be re-executed. The execution of the processes that are not changed will be skipped and the cached result used instead.</td>
<td>Similar to Make, Snakemake uses timestamps to determine modification status and resume points.</td>
<td>Uses customized timestamp signature (at millisecond resolution) of input / output to determine modification. By default <a href='https://github.com/ssadedin/bpipe/issues/157'>it does not</a> check status of command or script changes.</td>
<td>Pausing and resuming workflow is not part of the specification and is not required </td>
<td>No information on runtime signature or restart of failed jobs could be found.</td>
</tr>
<tr class="result">
<th align="left">Built-in remote execution</th>
<td align="left">Yes</td>
<td align="left">No</td>
<td align="left">No</td>
<td align="left">No</td>
<td align="left">No</td>
<td align="left">No</td>
</tr>
<tr class="detail">
<td>Send tasks to remote hosts for execution.</td>
<td>SoS can execute entire workflows or individual tasks on multiple remote hosts, with file synchronization between
heterogeneous file systems.</td>
<td>Nextflow can be executed on a variety of environments but it has to be started within the environments</td>
<td>Snakemake can be executed on a variety of environments but it has to be started within the environments</td>
<td>Bpipe can be executed on a variety of environments but it has to be started within the environments</td>
<td>The CWL specification does not contain any feature for remote execution.</td>
<td>Galaxy can be executed on a variety of environments but it has to be started within the environments</td>
</tr>
<tr class="result">
<th align="left">Task monitoring</th>
<td align="left">Command line and GUI (Notebook), with summary report</td>
<td align="left">Report traces and performance</td>
<td align="left">Report traces and performance</td>
<td align="left">Event notification</td>
<td align="left">Implementation dependent</td>
<td align="left">GUI to explore, share and reuse histories</td>
</tr>
<tr class="detail">
<td>Ability to send tasks to multiple isolated computing environments and manage them from the local host.
"Report traces and performance" means that benchmarking commands and outputs are logged, along with resources usage such as CPU hours and memory consumption.</td>
<td>SoS can monitor tasks through the Jupyter Notebook interface with magics (e.g. <code>%taskinfo</code>) to retrieve details about the tasks. It can also monitor the status of tasks through a command line interface (e.g. <code>sos status</code>).
A summary report could be generated with <a href="https://vatlab.github.io/sos-docs/doc/tutorials/Execution_of_Workflow.html" target="_blank">option <code>-p</code></a>.</td>
<td>Nextflow can generate <a href="https://www.nextflow.io/docs/latest/tracing.html#" target="_blank">complete reports</a> with details on CPU/task usage etc.</td>
<td><a href="http://snakemake.readthedocs.io/en/stable/tutorial/additional_features.html#benchmarking" target="_blank">Benchmarking</a> and <a href="http://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#log-files">logging</a></td>
<td><a href='http://docs.bpipe.org/Guides/Notifications/' target="_blank">Notification in Bpipe</a> can be configured to use Gmail or generic SMTP / XMPP protocols. It also provides commands such as <code>send, succeed, fail</code> for arbitrary notifications.</td>
<td>There is no mention of job monitoring in the CWL specification, but workflow engines should provide their own facilities for job monitoring</td>
<td>The GUI shows the status of each step with colors.</td>
</tr>
<tr class="result">
<th align="left">Process-oriented workflow</th>
<td align="left">Yes</td>
<td align="left">Yes</td>
<td align="left">No</td>
<td align="left">Yes</td>
<td align="left">Yes</td>
<td align="left">Yes</td>
</tr>
<tr class="detail">
<td>Workflows that are constructed and executed by steps to execute.</td>
<td>SoS' "forward-style" workflow specifies steps of workflows through sequential numbering, although a DAG could be constructed with target dependencies.</td>
<td>Nextflow executes specified workflow with specified input and parameters.</td>
<td>Snakemake workflow depends on filename wildcard pattern matching, not rule names, although <a href='http://snakemake.readthedocs.io/en/stable/snakefiles/rules.html?highlight=rules#handling-ambiguous-rules'>rule order</a> and <a href='http://snakemake.readthedocs.io/en/stable/snakefiles/rules.html?highlight=rules#priorities'>rule priorities</a> can be configured to change execution ordering.</td>
<td>Bpipe workflow is <a href='http://docs.bpipe.org/Guides/ParallelTasks/'>process-oriented and executed in parallel</a>. </td>
<td>CWL executes workflows from specified steps and inputs, not from desired output</td>
<td>Galaxy constructs and executes workflows as connected steps.</td>
</tr>
<tr class="result">
<th align="left">Output-oriented workflow</th>
<td align="left">Yes</td>
<td align="left">No</td>
<td align="left">Yes</td>
<td align="left">No</td>
<td align="left">No</td>
<td align="left">No</td>
</tr>
<tr class="detail">
<td>Workflows that are constructed and executed by the "outcome" of the workflow.</td>
<td>SoS' auxiliary steps specify the outcomes of steps and are called when a target is needed.</td>
<td>Nextflow executes specified workflow with specified input and parameters.</td>
<td>Snakemake workflows are output-oriented: execution ordering relies on filename patterns (with exceptions).</td>
<td>Bpipe does not use implicit file name pattern matching to construct pipelines, although it supports input file wildcards for running multiple stages simultaneously on different data.</td>
<td>CWL does not construct workflows implicitly from specified outcomes</td>
<td>Galaxy does not automatically build workflows from intended outcomes</td>
</tr>
</tbody>
</table>
</div>
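<p>The difference between process-oriented and output-oriented workflows in the table above can be illustrated with a minimal Python sketch. It is not taken from any of the compared engines: the rule table, the step names, and the helper functions <code>forward_order</code> and <code>backward_order</code> are all hypothetical and exist only to show the two ways an execution order can be derived.</p>

```python
from typing import Dict, List

# Hypothetical rule table: each output file maps to the inputs it needs.
RULES: Dict[str, List[str]] = {
    "report.html": ["stats.txt"],
    "stats.txt": ["aligned.bam"],
    "aligned.bam": ["reads.fastq"],
}


def forward_order(steps: List[str]) -> List[str]:
    """Process-oriented: execute the named steps in the order they are listed."""
    return list(steps)


def backward_order(target: str, rules: Dict[str, List[str]]) -> List[str]:
    """Output-oriented: start from the requested target and walk back
    through its dependencies, scheduling producers before consumers."""
    order: List[str] = []
    seen = set()

    def visit(name: str) -> None:
        if name in seen:
            return
        seen.add(name)
        for dep in rules.get(name, []):
            visit(dep)
        # Only files produced by some rule need a step to be executed;
        # plain source files such as reads.fastq are assumed to exist.
        if name in rules:
            order.append(name)

    visit(target)
    return order


if __name__ == "__main__":
    print(forward_order(["align_10", "stats_20", "report_30"]))
    print(backward_order("report.html", RULES))
```

<p>In the forward case the user states which steps to run, whereas in the backward case the user states only the desired file and the order is inferred from the dependencies. Real engines add pattern matching, signatures, and parallel scheduling on top of this idea.</p>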
<h3 id="built-in-support">Built-in support</h3>
<div class="comparison">
<table>
<thead>
<tr>
<th align="left">Workflow</th>
<th align="left">SoS</th>
<th align="left">Nextflow</th>
<th align="left">Snakemake</th>
<th align="left">Bpipe</th>
<th align="left">CWL</th>
<th align="left">Galaxy</th>
</tr>
</thead>
<tbody>
<tr class="result">
<th align="left">Docker</th>
<td align="left">Yes</td>
<td align="left">Yes</td>
<td align="left">Yes</td>
<td align="left">No</td>
<td align="left">Yes</td>
<td align="left">Yes</td>
</tr>
<tr class="detail">
<td>Support for docker</td>
<td>A <code>docker_image</code> option can execute scripts inside specified docker images.</td>
<td>Nextflow supports <a href="https://www.nextflow.io/docs/latest/docker.html">docker containers. You can
run all scripts in the specified docker image, or specify a docker image for each step.</a></td>
<td>Snakemake supports the use of <a href='http://snakemake.readthedocs.io/en/stable/snakefiles/deployment.html?highlight=container#running-jobs-in-containers'>rule level containers</a>.</td>
<td>Bpipe does not have built-in support for containers.</td>
<td>CWL specification supports docker</td>
<td>Galaxy steps can execute <code>docker run</code> command with docker-flavored images.</td>
</tr>
<tr class="result">
<th align="left">Singularity</th>
<td align="left">Yes</td>
<td align="left">Yes</td>
<td align="left">Yes</td>
<td align="left">No</td>
<td align="left">Yes</td>
<td align="left">Yes</td>
</tr>
<tr class="detail">
<td>Support for singularity</td>
<td>SoS supports singularity with action options <code>container</code> and <code>engine</code>; see <a href="https://vatlab.github.io/sos-docs/doc/tutorials/Singularity.html">SoS Singularity Guide</a> for details.</td>
<td>Nextflow <a href="https://www.nextflow.io/docs/latest/singularity.html">supports singularity containers</a>.
It works similarly to docker but with options such as <code>singularity.enabled=true</code>.</td>
<td>Snakemake supports the use of <a href='http://snakemake.readthedocs.io/en/stable/snakefiles/deployment.html?highlight=container#running-jobs-in-containers'>rule level containers</a>.</td>
<td>Bpipe does not have built-in support for containers.</td>
<td>Not mentioned in CWL specification but cwltool supports it</td>
<td>Galaxy supports Singularity containers.</td>
</tr>
<tr class="result">
<th align="left">PBS/Torque/LSF/SLURM</th>
<td align="left">Needs template</td>
<td align="left">Yes</td>
<td align="left">Direct or via template</td>
<td align="left">Yes</td>
<td align="left">Yes</td>
<td align="left">Yes</td>
</tr>
<tr class="detail">
<td>Ability to execute workflows on a PBS-style computer cluster
</td>
<td>SoS interacts with clusters through pre-configured templates and commands. It has been tested to work on
Torque, LSF, SLURM, and PBS</td>
<td>Nextflow supports Open grid, Univa grid, LSF, SLURM, PBS Works, Torque</td>
<td>Snakemake can interact with clusters through templates, or directly if the cluster supports <a href="http://www.drmaa.org/">DRMAA</a>.</td>
<td>Bpipe provides <a href='http://docs.bpipe.org/Guides/ResourceManagers/'>built-in support for some resource manager systems</a>, and a template-based system (<a href='http://docs.bpipe.org/Guides/ImplementingAResourceManager/'>adapter script</a>) to support implementing resource managers.</td>
<td>cwltool and other implementations support clusters</td>
<td>Galaxy can be deployed on clusters with steps executed on computing nodes.</td>
</tr>
<tr class="result">
<th align="left">HTCondor</th>
<td align="left">Require template (?)</td>
<td align="left">Yes</td>
<td align="left">Require template (?)</td>
<td align="left">Require template</td>
<td align="left">No (?)</td>
<td align="left">Yes</td>
</tr>
<tr class="detail">
<td>Ability to use <a href="https://research.cs.wisc.edu/htcondor/">HTCondor</a> to execute workflows on large collections of distributively owned computing resources.</td>
<td>Unknown, because we have not had a chance to configure SoS to run on an HTCondor system.</td>
<td>Nextflow <a href="https://www.nextflow.io/docs/latest/executor.html#htcondor">supports HTCondor</a></td>
<td>There is no built-in support for HTCondor, and we could not find existing Snakemake HTCondor job templates either.</td>
<td>There is no built-in support for HTCondor, but there seem to be <a href='https://github.com/GenomicParisCentre/eoulsan/blob/3b171735888f728b6804abcdfd7e7fce80b5218a/src/main/bin/bpipe-htcondor.sh'>third-party adapter scripts</a> for the HTCondor job scheduler.</td>
<td>There seems to be no built-in support for HTCondor</td>
<td>Galaxy supports HTCondor as described <a href="https://galaxyproject.org/cloudman/ht-condor/">here</a>.</td>
</tr>
<tr class="result">
<th align="left">Distributed Task Queue</th>
<td align="left">Yes (RQ)</td>
<td align="left">No</td>
<td align="left">No</td>
<td align="left">No</td>
<td align="left">No</td>
<td align="left">No</td>
</tr>
<tr class="detail">
<td>Ability to send tasks to distributed task queues such as <a href="http://python-rq.org/">RQ</a> and <a href="http://www.celeryproject.org/">Celery</a>.
</td>
<td>SoS supports RQ. Celery support is likely broken due to lack of maintenance.</td>
<td>Nextflow cannot submit tasks to external task queues</td>
<td>Snakemake cannot submit tasks to external task queues</td>
<td>Bpipe does not provide built-in support for external task queues</td>
<td>cwltool does not support external task queues</td>
<td>Galaxy does not support external task queues.</td>
</tr>
<tr class="result">
<th align="left">Distributed systems</th>
<td align="left">No</td>
<td align="left">Yes</td>
<td align="left">Experimental</td>
<td align="left">No</td>
<td align="left">Implementation dependent</td>
<td align="left">Yes</td>
</tr>
<tr class="detail">
<td>Ability to spawn the executions of pipeline tasks through a distributed cluster such as Apache Spark, Apache Ignite, Apache Mesos, and Kubernetes.</td>
<td>No</td>
<td>Nextflow supports distributed systems such as <a href="https://www.nextflow.io/docs/latest/ignite.html">Apache Ignite</a> and <a href="https://www.nextflow.io/docs/latest/kubernetes.html">Kubernetes</a></td>
<td>Snakemake 4.0 and later <a href='http://snakemake.readthedocs.io/en/stable/executable.html?highlight=Kubernetes'>supports experimental execution</a> in the cloud via Kubernetes.</td>
<td>No</td>
<td>No trace of support from cwltool but other workflow engines might support it</td>
<td>Galaxy can be deployed on top of Kubernetes as described <a href="https://github.com/galaxyproject/galaxy-kubernetes">here</a>.</td>
</tr>
<tr class="result">
<th align="left">Cloud Storage</th>
<td align="left">No</td>
<td align="left">Yes</td>
<td align="left">Yes</td>
<td align="left">No</td>
<td align="left">No</td>
<td align="left">Yes</td>
</tr>
<tr class="detail">
<td>Ability to make use of cloud storage (such as Amazon S3).</td>
<td>Not currently </td>
<td>Nextflow can <a href="https://www.nextflow.io/docs/latest/amazons3.html">access S3 storage</a></td>
<td>Snakemake can access files on <a href='http://snakemake.readthedocs.io/en/stable/snakefiles/remote_files.html'>cloud storage</a></td>
<td>No</td>
<td>No information could be found for support for cloud storage. This should again be implementation/engine specific.</td>
<td>Galaxy objects can be stored in a distributed object store or on Amazon S3 (see <a href="https://galaxyproject.org/object-store/">Galaxy Object Store</a>)</td>
</tr>
</tbody>
</table>
</div>
<h2 id="is-sos-for-you">Is SoS for you?</h2>
<p>SoS is not for everyone. As a workflow system:</p>
<ul>
<li>If you are looking for an industrial-grade workflow system for handling millions of large jobs, you should look for proven solutions such as <a href="https://github.com/spotify/luigi" target="_blank">Luigi</a>.</li>
<li>If you are aiming to create “portable” workflows that can be executed in various cluster and cloud environments, <a href="https://www.nextflow.io/" target="_blank">Nextflow</a> can be the first to try. <a href="https://snakemake.readthedocs.io/en/stable/" target="_blank">Snakemake</a> also has a wide user base and is a close draw with Nextflow in many aspects. <a href="https://github.com/ssadedin/bpipe" target="_blank">Bpipe</a> is also widely used but seems to be less popular than Nextflow and Snakemake.</li>
<li>If you are aiming at the creation of “general” workflows with no specific workflow engine in mind, <a href="https://github.com/common-workflow-language/common-workflow-language" target="_blank">CWL</a> is currently the best bet as CWL workflows can be executed by multiple workflow engines in different environments.</li>
<li>If you are looking for a script-less, GUI-based workflow system without the need for writing scripts, the answer is no because SoS is script based. <a href="https://usegalaxy.org/" target="_blank">Galaxy</a> can be a good choice, at least for bioinformatic applications.</li>
<li>If you are a <strong>Jupyter</strong> or <strong>JupyterLab</strong> user, the answer is most likely yes because SoS is embedded into SoS Notebook, which is by itself a polyglot notebook. You can enjoy all features of SoS Notebook and step into SoS only when needed.</li>
<li>If you would like to use <strong>a workflow system for daily exploratory data analysis and computational research</strong>, SoS should be the most usable since it is designed for interactive data analysis and execution of tasks on remote systems.</li>
</ul>
</div>
</div>
</section>
<section class="section">
<div class="container">
<div id="gh-comments">
<br/><br/>
<h5>COMMENTS</h5>
<div id="gh-comments-list"></div>
<a href="javascript:void(0)" id="gh-load-comments" class="btn" style="display:none">Load more comments</a>
</div>
<script type="text/javascript" src="https://vatlab.github.io/blog/js/github-comments.js"></script>
<script type="text/javascript">
DoGithubComments( 10 );
</script>
</div>
</section>
<section class="section">
<div class="container has-text-centered">
<p>© <a href="https://faculty.mdanderson.org/profiles/bo_peng.html">Bo Peng, Ph.D. / MD Anderson Cancer Center</a> All rights reserved</p>
</div>
</section>
<script type="application/javascript">
var doNotTrack = false;
if (!doNotTrack) {
window.ga=window.ga||function(){(ga.q=ga.q||[]).push(arguments)};ga.l=+new Date;
ga('create', 'UA-107286198-1', 'auto');
ga('send', 'pageview');
}
</script>
<script async src='https://www.google-analytics.com/analytics.js'></script>
</body>
<script>
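$(document).ready(function(){
// Clicking a summary row toggles its highlight and shows or hides the detail rows that follow it.
$(".result").click(function(){
$(this).toggleClass('expanded');
$(this).nextUntil('tr.result').slideToggle(100);
});
// Clicking a visible detail row hides it and toggles the highlight on the row before it.
$(".detail").click(function(){
$(this).prev().toggleClass('expanded');
$(this).slideToggle(100);
});
});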
</script>
</html>