# Use cases

To understand how OSS development practices affect the development of data and metadata standards, it is informative to examine this cross-fertilization through a few use cases. As these examples show, some fields, such as astronomy, high-energy physics, and earth sciences, have a relatively long history of shared data resources from organizations such as LSST and CERN, while other fields have only recently become aware of the value of data sharing and its impact. These disparate histories inform how standards have evolved and how OSS practices have pervaded their development.

## Astronomy

One prominent example of a community-driven standard is the FITS (Flexible Image Transport System) file format standard, which was developed by observatories in the late 1970s and early 1980s ([Wells and Greisen 1979](#ref-wells1979fits)) to store image data in the visible and X-ray spectrum, and has since been adopted worldwide for astronomy data preservation and exchange. Essentially every software platform used in astronomy reads and writes the FITS format. It has been endorsed by the International Astronomical Union (IAU), as well as by funding agencies. Though the format has evolved over time, it follows the principle of “once FITS, always FITS”: the format cannot be changed in ways that break backwards compatibility. Among the features that make FITS so durable is that it was originally designed with a very restricted metadata schema: FITS records were sized to the lowest common denominator of word lengths in the computer systems of the time. Although FITS is compact, its ability to encode the coordinate frame alongside the pixels means that data from different observational instruments can be stored in this format and related to one another, rendering manual and error-prone procedures for conforming images obsolete.
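To make the restricted metadata schema concrete, the following is a minimal sketch of how a FITS header can be written and read using only the Python standard library. It assumes the fixed layout of the published FITS standard (80-character ASCII records padded into 2880-byte blocks, with the keyword in the first eight columns), which is not spelled out in the text above. The helper names `make_card`, `build_header`, and `parse_header` are hypothetical, and the sketch handles only simple values; it is not a substitute for a full FITS library.

```python
CARD_LEN = 80     # each header record ("card") is 80 ASCII characters
BLOCK_LEN = 2880  # headers are padded to a multiple of 2880 bytes (36 cards)


def make_card(keyword, value=None, comment=""):
    """Format one fixed-width card (simplified subset of the standard)."""
    if value is None:  # keyword-only card, such as END
        body = f"{keyword:<8}"
    else:
        if isinstance(value, bool):
            field = f"{'T' if value else 'F':>20}"  # logical, right-justified
        elif isinstance(value, (int, float)):
            field = f"{value:>20}"                  # number, right-justified
        else:
            quoted = "'" + str(value).ljust(8) + "'"
            field = f"{quoted:<20}"                 # string, left-justified
        body = f"{keyword:<8}= {field}"
        if comment:
            body += f" / {comment}"
    if len(body) > CARD_LEN:
        raise ValueError(f"card too long: {keyword}")
    return body.ljust(CARD_LEN)


def build_header(cards):
    """Join cards, append END, and pad with spaces to a full block."""
    text = "".join(cards) + make_card("END")
    text += " " * (-len(text) % BLOCK_LEN)
    return text.encode("ascii")


def parse_value(field):
    """Parse the value part of a card (no embedded-quote handling)."""
    field = field.strip()
    if field.startswith("'"):
        end = field.index("'", 1)
        return field[1:end].rstrip()
    field = field.split("/", 1)[0].strip()  # drop the trailing comment
    if field in ("T", "F"):
        return field == "T"
    try:
        return int(field)
    except ValueError:
        return float(field)


def parse_header(raw):
    """Read keyword/value pairs from header bytes until the END card."""
    header = {}
    for start in range(0, len(raw), CARD_LEN):
        card = raw[start:start + CARD_LEN].decode("ascii")
        keyword = card[:8].strip()
        if keyword == "END":
            break
        if card[8:10] == "= ":
            header[keyword] = parse_value(card[10:])
    return header


# Illustrative values only: a 100x100 16-bit image with one coordinate axis.
raw = build_header([
    make_card("SIMPLE", True, "conforms to the FITS standard"),
    make_card("BITPIX", 16, "bits per pixel"),
    make_card("NAXIS", 2, "number of axes"),
    make_card("NAXIS1", 100),
    make_card("NAXIS2", 100),
    make_card("CTYPE1", "RA---TAN", "coordinate type of axis 1"),
    make_card("CRVAL1", 150.5, "coordinate value at reference pixel"),
])
header = parse_header(raw)
```

Because every record has the same width and plain-text content, a reader needs nothing beyond fixed-offset slicing to recover the metadata, which is part of why files written decades ago remain readable. Coordinate keywords such as `CTYPE1` and `CRVAL1` sit in the same header as the image dimensions, which is how the coordinate frame travels with the pixels.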

## High-energy physics

In high-energy physics (HEP), standards for collecting data are well established and the community is fairly homogeneous, so standards have very high penetration ([Basaglia et al. 2023](#ref-Basaglia2023-dq)). A top-down approach is taken: within every large collaboration, standards are enforced and their adoption is centrally managed. Access to raw data is essentially impossible, and making it publicly available is both technically very hard and potentially ill-advised. Analysis tools are tuned specifically to the standards. Funders provide incentives to use the standards by requiring a data management plan that specifies how the data will be shared.

## Neuroscience

In contrast to astronomy and HEP, neuroscience has traditionally been a “cottage industry”, in which individual labs generate experimental data designed to answer specific experimental questions. While this model still exists, the field has also seen the emergence of new modes of data production that focus on generating large shared datasets designed to answer many different questions, more akin to the data generated in large astronomy data collection efforts ([Koch and Clay Reid 2012](#ref-Koch2012-ve)). This change has been brought about by a combination of three factors: technical advances in data acquisition, which now produces large, very high-dimensional and information-rich datasets; cultural changes, which have ushered in new norms of transparency and reproducibility; and funding initiatives that have encouraged this kind of data collection (including the US BRAIN Initiative and the Allen Institute for Brain Science). Neuroscience presents an interesting example because these changes are relatively recent, which means that standards for data and metadata in neuroscience have tended to adopt many of the elements of OSS development. Two salient examples in neuroscience are the Neurodata Without Borders file format for neurophysiology data ([Rübel et al. 2022](#ref-Rubel2022NWB)) and the Brain Imaging Data Structure standard for neuroimaging data ([Gorgolewski et al. 2016](#ref-Gorgolewski2016BIDS)). The latter in particular has adopted a

## Automated discovery

## Citizen science

Basaglia, T, M Bellis, J Blomer, J Boyd, C Bozzi, D Britzger, S Campana, et al. 2023. “Data Preservation in High Energy Physics.” *The European Physical Journal C* 83 (9): 795.

Gorgolewski, Krzysztof J, Tibor Auer, Vince D Calhoun, R Cameron Craddock, Samir Das, Eugene P Duff, Guillaume Flandin, et al. 2016. “The Brain Imaging Data Structure, a Format for Organizing and Describing Outputs of Neuroimaging Experiments.” *Scientific Data* 3 (June): 160044. <https://www.nature.com/articles/sdata201644>.

Koch, Christof, and R Clay Reid. 2012. “Observatories of the Mind.” *Nature* 483 (7390): 397–98. <http://dx.doi.org/10.1038/483397a>.

Rübel, Oliver, Andrew Tritt, Ryan Ly, Benjamin K Dichter, Satrajit Ghosh, Lawrence Niu, Pamela Baker, et al. 2022. “The Neurodata Without Borders Ecosystem for Neurophysiological Data Science.” *eLife* 11 (October).

Wells, Donald Carson, and Eric W Greisen. 1979. “FITS: A Flexible Image Transport System.” In *Image Processing in Astronomy*, 445.