usnistgov/CDCS

Configurable Data Curation System (CDCS)

The Configurable Data Curation System (CDCS), or Curator, developed at NIST, provides a means for capturing, sharing, and transforming unstructured data into a structured format based on XML or JSON Schemas.

The CDCS can be viewed as a "loading dock" for scientific data. It serves as a means to collect and disseminate structured scientific data. It can be applied to any area and is agnostic to the type of data. "Curated" data is amenable to transformation to other formats, such as those used by existing computational tools. The data are organized using user-selected, community-developed templates encoded as XML or JSON Schemas. These templates are used to create data documents that are indexed in a database (PostgreSQL, SQLite, MongoDB).
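
As a rough illustration of what "transformation to other formats" can look like, the sketch below converts a small XML record to JSON using only the Python standard library. The record, its element names, and the helper functions `element_to_dict` and `xml_to_json` are hypothetical and are not taken from any CDCS template or from the CDCS code base.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical curated record; element names are illustrative only
# and do not come from a real CDCS template.
RECORD_XML = """
<material>
  <name>Al-6061</name>
  <density unit="g/cm3">2.70</density>
</material>
"""


def element_to_dict(element: ET.Element) -> dict:
    """Convert an element into a plain dict of attributes, text, and children."""
    node = dict(element.attrib)
    text = (element.text or "").strip()
    if text:
        node["text"] = text
    for child in element:
        # Children are grouped by tag name; repeated tags share one list.
        node.setdefault(child.tag, []).append(element_to_dict(child))
    return node


def xml_to_json(xml_text: str) -> str:
    """Parse an XML string and serialize it as indented JSON."""
    root = ET.fromstring(xml_text.strip())
    return json.dumps({root.tag: element_to_dict(root)}, indent=2)


if __name__ == "__main__":
    print(xml_to_json(RECORD_XML))
```

This simple mapping does not handle attribute names that collide with child tags or with the `text` key. A real conversion would normally be driven by the template's schema or by an XSLT stylesheet.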

The CDCS is implemented in Python, with the Django framework. It provides a Representational State Transfer (REST) API that allows other software to directly interact with it over a network. CDCS functions are available via the API, allowing for full automation.

More information regarding the CDCS can be found on the CDCS Website.

Context

The CDCS originated from the Materials Genome Initiative (MGI). In the MGI, data collections may be mutually incompatible and are often represented in diverse formats, which is a challenge to the distributed research goal envisaged by the MGI. Because the CDCS's underlying XML/JSON format can be transformed into virtually any other format using standard tools, the CDCS can serve as a data source for a wide variety of existing materials informatics efforts that span projects, groups, and organizations. Each project, group, or organization can run as many CDCS instances as needed. Individual CDCS repositories can be interconnected for federated searches and data sharing.

Getting started

CDCS Projects

Two types of systems are usually implemented using the CDCS:

  • Data repositories, such as the Materials Data Curation System (MDCS), which allow for the curation and dissemination of materials datasets in an online repository using predefined templates.
  • Data registries, such as the NIST Materials Resource Registry (NMRR), which allow for the registration of materials resources, bridging the gap between existing resources and end users. The Registry functions as a centrally located service, making the registered information available to the materials community for research.

Deployment

The CDCS can be deployed using several methods:

Tools

The CDCS provides a REST API for programmatic access to its features. Documentation of the REST endpoints is available at the /api/docs URL of a deployed system. The REST API can be accessed with any HTTP client, such as curl or the Python requests library. A Python library, PyCDCS, has also been implemented to facilitate interactions with the CDCS REST API.
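
A minimal sketch of reaching the documentation page from Python, using only the standard library. The helper names `docs_url` and `fetch_docs` and the address `http://localhost:8000` are assumptions made for illustration; only the /api/docs path comes from the text above. This is not the PyCDCS API.

```python
from urllib.parse import urljoin
from urllib.request import Request, urlopen


def docs_url(base_url: str) -> str:
    """Return the REST documentation URL for a deployed CDCS instance."""
    # Normalize to one trailing slash so urljoin keeps any path prefix.
    return urljoin(base_url.rstrip("/") + "/", "api/docs")


def fetch_docs(base_url: str, timeout: float = 10.0) -> tuple[int, str]:
    """Request the documentation page; return (HTTP status, content type)."""
    request = Request(docs_url(base_url), headers={"Accept": "text/html"})
    with urlopen(request, timeout=timeout) as response:
        return response.status, response.headers.get("Content-Type", "")


if __name__ == "__main__":
    # Hypothetical address; replace it with the URL of your own deployment.
    print(docs_url("http://localhost:8000"))
```

Calling `fetch_docs` requires a reachable CDCS instance. The same request could be made with curl or the requests library, as noted above.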

Disclaimer

NIST Disclaimer
