Whole Tale is an NSF-funded Data Infrastructure Building Blocks (DIBBS) initiative to build a scalable, open source, web-based, multi-user platform for reproducible research, enabling the creation, publication, and execution of tales -- executable research objects that capture the data, code, and complete software environment used to produce research findings.
A beta version of the system is available at https://dashboard.wholetale.org.
The Whole Tale platform has been designed based on community input, gathered primarily through working groups and collaborations with researchers.
The Whole Tale project is involved in several initiatives to train researchers in reproducible practices and to support the use of Whole Tale in the classroom.
The goal of Whole Tale is to enable researchers to define and create computational environments that make it easy to manage the complete conduct of computational experiments and to expose them for analysis and reproducibility. The platform addresses two trends:
- improved transparency, so that researchers can run much more ambitious computational experiments
- better computational experiment infrastructure, allowing researchers to be more transparent
The Whole Tale platform is being developed to simplify the adoption of practices that improve the understandability and reproducibility of computational research.
Virtually all published discoveries today have data and computational components. The mismatch between traditional scientific dissemination practices and modern computational research practice leads to reproducibility concerns.
The Whole Tale platform supports computational reproducibility by enabling researchers to create and package the code, data, and information about the workflow and computational environment needed to review and reproduce the results of computational analyses reported in published research. The platform implements this by supporting explicit citation of externally referenced data and by capturing the artifacts and provenance information needed to understand, evaluate, and re-execute the computational processes and workflows at the time of publication.
Researchers are increasingly adopting practices to improve the transparency and reproducibility of their own computational research. Some are self-motivated to improve their own rigor and transparency while others are responding to the demands and requirements of academic communities and journals. Some are advanced tool users with sophisticated methods of packaging and distributing scientific software, often with automated testing and verification. Others are more concerned with the research product than learning new tools and infrastructure for sharing and transparency.
Academic societies, associations, and communities are responding to challenges in the reproducibility of published research by adopting recommendations, guidelines, and policies that affect publishers, editors, and researchers. Communities are beginning to adopt practices encouraging or requiring the sharing of code and data. Some are even implementing verification and evaluation processes to confirm the reproducibility of published work.
In response to the demand of academic communities to address problems of reproducibility and reuse, journal editors are increasingly adopting guidelines and enforcing policies for the sharing of data, code, and information about the software environment used in published research based on computational analysis.
The scholarly publication process has built-in mechanisms for anonymous peer review. Some communities are adopting practices to verify that published research can be replicated at various levels. Anonymous reviewers and curators play an important role in the quality of research artifacts.
Developers and operators of research data repositories are faced with the challenge of addressing the needs of their communities through support for new types of scholarly objects, methods of access, and processes for review and verification.
A tale is an executable research object that combines data (references), code (computational methods), computational environment, and narrative (traditional science story). Tales are captured in a standards-based format complete with metadata.
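As an illustration of the four components named above, a tale's contents might be sketched as a simple metadata record. This is a hypothetical sketch; the field names and values are illustrative assumptions, not the actual Whole Tale serialization format:

```python
# Hypothetical sketch of a tale's components as a metadata record.
# Field names and values are illustrative, not Whole Tale's real schema.
import json

tale = {
    "title": "Example computational analysis",
    # Data: external datasets are cited by reference rather than copied.
    "data": [
        {"identifier": "doi:10.5063/example", "type": "external-reference"},
    ],
    # Code: the computational methods used in the analysis.
    "code": {"entrypoint": "run.py"},
    # Computational environment: e.g. a container image capturing software.
    "environment": {"image": "example/analysis-env:1.0"},
    # Narrative: the traditional science story.
    "narrative": "README.md",
}

print(json.dumps(tale, indent=2))
```

In a standards-based packaging of this kind, the metadata record travels alongside the referenced files, so the object remains both human-readable and executable.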
Whole Tale is an ongoing NSF-funded Data Infrastructure Building Blocks (DIBBS) project, initiated in 2016 with expected completion in February 2023.