Pyradigm: PYthon based data structure to improve Dataset's InteGrity in Machine learning workflows

A common problem for machine learning developers is keeping track of the source of the extracted features and ensuring the integrity of the dataset (e.g. not mixing up data from different subjects and/or classes). This becomes very hard as the number of projects grows or when personnel changes are frequent. Both can break the chain of hyper-local information about a dataset, such as where the original data came from, how it was processed or quality controlled, how it was put together and by whom, and what some of the columns in the table mean. This package provides a Python data structure to encapsulate a machine learning dataset together with this key information. It is well suited to neuroimaging applications (or any other domain) where each sample needs to be uniquely identified with a subject ID (or something similar). Key-level correspondence across data, labels (e.g. 1 or 2), classnames (e.g. 'healthy', 'disease') and related information helps maintain data integrity, and offers a way to easily trace features back to the sources from which they were originally derived.
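
To make the idea of key-level correspondence concrete, here is a minimal sketch in plain Python. It is an illustration only and not pyradigm's actual API: the class name `KeyedDataset` and its methods `add_sample`, `get_class` and `sample` are hypothetical names chosen for this example. It assumes each sample is a fixed-length list of numbers identified by a unique subject ID.

```python
from collections import OrderedDict


class KeyedDataset:
    """Hypothetical illustration only; this is NOT pyradigm's API."""

    def __init__(self):
        # Three mappings share the same keys (sample IDs), so features,
        # numeric labels and class names cannot drift out of alignment.
        self._features = OrderedDict()
        self._labels = OrderedDict()
        self._classes = OrderedDict()

    def add_sample(self, sample_id, features, label, class_name):
        """Store one sample; reject duplicate IDs and mismatched lengths."""
        if sample_id in self._features:
            raise ValueError("duplicate sample ID: {!r}".format(sample_id))
        features = tuple(float(x) for x in features)
        if self._features:
            # Every sample must have as many features as the first one.
            expected = len(next(iter(self._features.values())))
            if len(features) != expected:
                raise ValueError(
                    "expected {} features, got {}".format(expected, len(features))
                )
        self._features[sample_id] = features
        self._labels[sample_id] = label
        self._classes[sample_id] = class_name

    def get_class(self, class_name):
        """Return the IDs of all samples that belong to one class."""
        return [sid for sid, cls in self._classes.items() if cls == class_name]

    def sample(self, sample_id):
        """Return (features, label, class name) for a single sample ID."""
        return (
            self._features[sample_id],
            self._labels[sample_id],
            self._classes[sample_id],
        )


dataset = KeyedDataset()
dataset.add_sample("sub-001", [0.1, 2.3, 4.5], label=1, class_name="healthy")
dataset.add_sample("sub-002", [0.7, 1.9, 3.2], label=2, class_name="disease")
```

Because every lookup goes through the subject ID, a feature vector can always be traced back to the subject it came from, and adding the same ID twice raises an error instead of silently overwriting data.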

For users of pandas, some elements of pyradigm's API/interface may look familiar. However, the aim of this data structure is not to offer an alternative to pandas, but to ease the machine learning workflow for neuroscientists by 1) offering several well-knit methods and useful attributes specifically geared towards neuroscience research, 2) offering utilities that combine multiple or advanced patterns of routine dataset handling, and 3) using more accessible language (compared to the hard-to-read pandas docs, which are aimed at an econometric audience) to better cater to neuroscience developers (esp. novices).

Thanks for checking it out. Your feedback will be appreciated.

Installation

pip install pyradigm

Usage

The Pyradigm Example notebook (PyradigmExample.ipynb in this repository) illustrates the usage.

Requirements

Packages: numpy
Python versions: only 2.7 is tested at the moment. I plan to support all the popular versions soon.

Support on Beerpay

Hey dude! Help me out for a couple of 🍻!
