.. -*- rest -*-
.. vim:syntax=rest

========================================
 Datarray: Numpy arrays with named axes
========================================

Scientists, engineers, mathematicians and statisticians don't just work with
matrices; they often work with structured data, just like you'd find in a
table. However, functionality for this is missing from Numpy, and there are
efforts to create something to fill the void. This is one of those efforts.

.. warning::

   This code is currently experimental, and its API *will* change! It is meant
   to be a place for the community to understand and develop the right
   semantics and have a prototype implementation that will ultimately
   (hopefully) be folded back into Numpy.

Datarray provides a subclass of Numpy ndarrays that supports:

- labeling individual dimensions (axes) with meaningful descriptions
- labeled 'ticks' along each axis
- indexing and slicing by named axis
- indexing on any axis with the tick labels instead of only integers
- reduction operations (like ``.sum``, ``.mean``, etc.) that accept named axis
  arguments instead of only integer indices

Prior Art
=========

At present, there is no accepted standard solution for dealing with tabular
data such as this. However, the following list of ad hoc and proposal-level
implementations shows that there is *definitely* a demand for it. For example,
in no particular order:

* `Tabular <http://bitbucket.org/elaine/tabular/src>`_ implements a
  spreadsheet-inspired datatype, with rows/columns, CSV and other I/O, and
  fancy tabular operations.
* `scikits.statsmodels <http://scikits.appspot.com/statsmodels>`_ sounded as
  though it had some features we'd eventually like to see implemented on top
  of something such as datarray, and
  `Skipper <http://scipystats.blogspot.com/>`_ seemed pretty interested in
  something like this himself.
* `scikits.timeseries <http://scikits.appspot.com/timeseries>`_ also has a
  time-series-specific object that's somewhat reminiscent of labeled arrays.
* `pandas <http://pandas.sourceforge.net/>`_ is based around a number of
  DataFrame-esque datatypes.
* `pydataframe <http://code.google.com/p/pydataframe/>`_ is supposed to be a
  clone of R's data.frame.
* `larry <http://github.com/kwgoodman/la>`_, or "labeled array," often comes
  up in discussions alongside pandas.
* `divisi <http://github.com/commonsense/divisi2>`_ includes labeled sparse
  and dense arrays.
* `pymvpa <https://github.com/PyMVPA/PyMVPA>`_ provides a ``Dataset`` class
  that encapsulates the data together with sets of attributes, matched in
  length, for the first two dimensions (samples and features). ``Dataset`` is
  not a subclass of the numpy array, so that it can hold other data structures
  (e.g. sparse matrices).
* `ptsa <http://git.debian.org/?p=pkg-exppsy/ptsa.git>`_ subclasses ndarray to
  provide per-dimension attributes, aiming to ease slicing and indexing by the
  values of those axis attributes.

Project Goals
=============

1. Get something akin to this into the numpy core.
2. Stick to basic functionality, so that projects like scikits.statsmodels and
   pandas can use it as a base datatype.
3. Make an interface that allows for simple, pretty manipulation that doesn't
   introduce confusion.
4. Oh, and make sure that the base numpy array is still accessible.

Code
====

You can find our sources and single-click downloads:

* `Main repository`_ on Github.
* Documentation_ for all releases and the current development tree.
* The `current trunk`_ as a tar/zip file.
* Downloads of all `available releases`_.

.. _main repository: http://github.com/fperez/datarray
.. _Documentation: http://fperez.github.com/datarray-doc
.. _current trunk: http://github.com/fperez/datarray/archives/master
.. _available releases: http://github.com/fperez/datarray/downloads
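
As a rough illustration of the named-axis semantics described at the top of
this file, here is a minimal sketch of a toy ndarray subclass. It is
hypothetical: ``NamedArray``, its ``axes`` attribute and its ``sum`` override
are illustrative names, not datarray's API (which is still changing). It only
covers naming axes and reducing by axis name; tick labels and name-aware
slicing are left out.

.. code-block:: python

   import numpy as np


   class NamedArray(np.ndarray):
       """Toy ndarray subclass with one name per axis (hypothetical, not datarray)."""

       def __new__(cls, data, axes):
           # View the input as this subclass and attach the axis names.
           obj = np.asarray(data).view(cls)
           axes = tuple(axes)
           if len(axes) != obj.ndim:
               raise ValueError("expected one name per dimension")
           obj.axes = axes
           return obj

       def __array_finalize__(self, obj):
           # Called for views and copies: inherit the source's names.
           # This toy does not adjust them when slicing drops a dimension.
           self.axes = getattr(obj, "axes", None)

       def _axis_index(self, axis):
           # Translate an axis name into its integer position.
           if isinstance(axis, str):
               return self.axes.index(axis)
           # Integer axes: range-check, then normalize negatives (e.g. -1).
           if not -self.ndim <= axis < self.ndim:
               raise ValueError("axis out of range")
           return axis % self.ndim

       def sum(self, axis=None, **kwargs):
           # Reduce over a named (or integer) axis and drop that name.
           # Extra arguments (e.g. dtype) are forwarded to ndarray.sum.
           data = np.asarray(self)
           if axis is None:
               return data.sum(**kwargs)
           i = self._axis_index(axis)
           remaining = self.axes[:i] + self.axes[i + 1:]
           return NamedArray(data.sum(axis=i, **kwargs), remaining)

For example, ``NamedArray(np.arange(6).reshape(2, 3), axes=("time", "channel")).sum(axis="time")``
returns a one-dimensional ``NamedArray`` with values ``[3, 5, 7]`` whose only
remaining axis is named ``"channel"``. Keeping names on the result of a
reduction is what lets later operations keep referring to axes by name; a
fuller implementation would also have to update names through slicing and
broadcasting.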