JOSS Review: State of the field #217

Midnighter · 2022-08-07T22:37:45Z

This is discussed on p. 1, ll. 20-25, and seems too brief to me. It is also rather unclear what the statement

"few handle the heterogeneity which is prevalent in many experimental environments"

means specifically. The referenced frameworks are very general-purpose, and I'm quite sure that they can handle almost anything if programmed that way, just as specific pipelines need to be developed for shed-streaming.

There are also more Python-specific frameworks that are in wide use:

Rapidz itself has an integration for dask
Apache Airflow with sensors might be able to do the job
Ray
Dagster
Flyte

Not all of them are specialized for streaming data, but the key differentiator of shed-streaming could be made clearer.

The big cloud providers also provide proprietary solutions for streaming data.

It seems to me that the main benefit of shed-streaming is rather that it tightly integrates with an existing ecosystem maintained by NSLS-II. Overall, it appears to provide rather high-level, shallow interfaces and adapters to, for example, rapidz, bluesky, and automatic use of databroker for data provenance. I think both the documentation and the manuscript would greatly benefit from a figure similar to https://nsls-ii.github.io/_images/collection-overview.svg that shows how exactly shed-streaming fits into this ecosystem.

Midnighter mentioned this issue Aug 7, 2022

[REVIEW]: SHED: Streaming Heterogeneous Event Data Tracking with Provenance openjournals/joss-reviews#3119

