Multi-language library for accessing HCS data stored in a variety of forms (.MAT, .CSV, MySQL, sqlite, etc.), to support development of exploration and analysis tools
Switch branches/tags
Nothing to show
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
README

README

The goal of this project is to provide a multi-language and
cross-platform set of libraries for accessing HCS data through a
single interface, regardless of the backend used to store the data.

The initial language targets are:
- Python
- Matlab
- Java(?) (might be easiest path for Matlab)

The initial backends to be supported are:
- CellProfiler 1.0 .MAT files
- CellProfiler 2.0 .MAT files (same as 1.0?)
- CellProfiler 2.0 MySQL databases (v1)
- CellProfiler 2.0 MySQL databases (v2)
- CellProfiler 2.0 sqlite3 databases
- GE INCell Analysis .XML files (code to convert these to CP2 sqlite
  files is available in the CellProfiler Analyst source)
- Whatever format iBrain2 uses
- Possibly, a new HDF5-based format

Goals:
- common interface for all backends
- concise code for all operations currently used in CPA and the
  multiple CellClassifiers
   - per-well and per-cell data (as well as per-image?)
   - classify cells by CPA's GentleBoost, SVM, RF-LDA, others...
   - select a random cell from an image/well/group/experiment
   - select a random cell of some particular class
   - fetch a feature's value for every cell in an
     image/well/group/experiment
   - score images, cells, wells

   

--------------------------------------------------
(older thoughts)

Needs from CPA's point of view:
      metadata, per-image, and per-object tables.
      .properties file should be subsumed in metadata and some reasonable user defaults.
      simple queries for:
             per-object feature data.
             per-feature data across all objects (possibly with well or image information).
             per-image or per-well summaries.
                       Fine for these to be generated on the fly, if fast and cached.
             fetching of every cell's feature vector (with exclusions would be nice, but not necessary) for classification.
                      ... with a possible function to be called on each cell, possibly.
             unified objects such as nucleus/cell/cytoplasm as single object.
             Eventually: virtual flow cytometry via polygon queries.
       

Questions:
        How should data be written, both initially (by CP or another tool)?
            format?  library support?
        How are annotations, classifications, etc. written into data for future use?
            should be through library.
            should include provenance tracking.
        Can we support data correction (plate layout, etc.) in a reasonable fashion?
        Provenance tracking within file itself?
            log messages, svn/git revision #s, actual code...
        Languages supported - matlab, python, java, C
        Formats - CP1.mat, CP2.mat (same?), HDF5, sqlite or mysql with CPA .properties file