
RFC: In-memory sparse array interchange #840

Opened by @hameerabbasi (Contributor)

Motivation

The sparse (also called PyData/Sparse) development team has been working on integration efforts with the ecosystem, most notably with SciPy, scikit-learn and others, with CuPy, PyTorch, JAX and TensorFlow also on the radar. One of the challenges we were facing was the lack of (possibly zero-copy) interchange between the different sparse array implementations. We believe this may be a pain point for many sparse array implementations moving forward.

This mirrors an issue seen for dense arrays previously, where the DLPack protocol was one of the first things to be standardised. We're hoping to achieve community consensus on a solution to a similar problem.

Luckily, nearly all sparse array formats (with the possible exception of DOK) are collections of dense arrays underneath. In addition, this problem has already been solved for on-disk arrays by the binsparse specification. @willow-ahrens is a co-author of that spec, and is also a collaborator on the sparse work.

Proposal

We propose introducing two new methods on array-API-compliant sparse array objects (such as those in sparse), described below.

__binsparse_descriptor__

This method returns a dict equivalent to the parsed JSON binsparse descriptor of the array.

__binsparse__

This method returns a dict[str, Array] of __dlpack__-compatible arrays, which are the constituent arrays of the sparse array. Each key corresponds to the equivalent key in the descriptor.
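
For illustration, here is roughly what these two methods might return for a small CSR array (a hypothetical csr_array of shape 4x6 with 3 stored values); the descriptor layout and array key names follow binsparse conventions, but the concrete values are illustrative:

descriptor = csr_array.__binsparse_descriptor__()
# {
#     "binsparse": {
#         "version": "0.1",
#         "format": "CSR",
#         "shape": [4, 6],
#         "number_of_stored_values": 3,
#         "data_types": {
#             "pointers_to_1": "uint64",
#             "indices_1": "uint64",
#             "values": "float64",
#         },
#     }
# }

constituents = csr_array.__binsparse__()
# {
#     "pointers_to_1": <uint64 array of length 5, supports __dlpack__>,
#     "indices_1": <uint64 array of length 3, supports __dlpack__>,
#     "values": <float64 array of length 3, supports __dlpack__>,
# }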

Introduction of a from_binsparse function.

If a library supports sparse arrays, its from_binsparse function should accept (zero-copy when possible) objects that follow this __binsparse__ protocol and have an equivalent sparse format within the library.

Pseudocode implementation

Here's a pseudocode example using two libraries, xp1 and xp2, both supporting sparse arrays:

# In library code:
xp2_sparray = xp2.from_binsparse(xp1_sparray, ...)

# This pseudocode impl is common between `xp1` and `xp2`
def from_binsparse(x: object, /, *, device: device | None = None, copy: bool | None = None) -> array:
    binsparse_descr = getattr(x, "__binsparse_descriptor__", None)
    binsparse_impl = getattr(x, "__binsparse__", None)
    if binsparse_impl is None or binsparse_descr is None:
        raise TypeError(...)

    binsparse_descriptor = binsparse_descr()
    # Will raise an error if the format/descriptor is unsupported.
    sparse_type = _type_from_binsparse_descriptor(binsparse_descriptor)
    # Import each constituent array via DLPack (zero-copy when possible).
    constituent_arrays = binsparse_impl()
    my_constituent_arrays = {
        k: from_dlpack(arr, device=device, copy=copy)
        for k, arr in constituent_arrays.items()
    }
    return sparse_type.from_strided_arrays(my_constituent_arrays, shape=...)

Parallel implementation in sparse: pydata/sparse#764
Parallel implementation in SciPy: scipy/scipy#22553

Alternative solutions

There are formats for on-disk sparse-array interchange [1] [2], but none for in-memory interchange; binsparse is the one that comes closest to offering in-memory interchange.

Pinging possibly interested parties:

Updated on 2024.10.09 as agreed in #840 (comment).

Activity

@pearu commented on Sep 6, 2024

A couple of quick notes.

First, dlpack and binsparse are self-contained specifications: dlpack provides a protocol for sharing strided arrays between different array libraries, and binsparse provides a unified description of various sparse formats. This proposal tries to glue these together by implicitly extending the DLPack protocol, but that contradicts the semantics of DLPack, which "describes the memory layout of dense, strided, n-dimensional arrays". In addition, the Python Array API standard specifies that DLPack should not support sparse arrays.

My suggestion is to cook up a sparse array interchange protocol that may use dlpack and binsparse (as these are obviously relevant pieces of this problem) but in a cleaner way.

For instance, binsparse specifies that the binsparse-compatible object provides a 2-tuple (<sparse array descriptor>, <list of sparse array data as strided arrays>). So, the appropriate place to deploy the dlpack protocol is to require that the arrays in the <list of sparse array data as strided arrays> must support the dlpack protocol.

For instance 2, introduce a "Python specification of binsparse" (similar to the one in dlpack) that consists of:

  1. a from_binsparse function that accepts objects that implement the binsparse protocol per 2. below
  2. a __binsparse__ method on the objects representing sparse arrays that will return a 2-tuple (<sparse array descriptor>, <list of sparse array data as strided arrays>) which is used by the from_binsparse function of the consumer library (sketched below).
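
A minimal sketch of how a consumer might use this 2-tuple variant; from_dlpack is the consumer's DLPack import function, and _build_sparse_array is a hypothetical internal helper:

def from_binsparse(x):
    # One dunder returning (descriptor, constituent strided arrays).
    descriptor, strided_arrays = x.__binsparse__()
    # Each constituent array must support the DLPack protocol, so the
    # consumer imports each one via its own from_dlpack (zero-copy when possible).
    imported = [from_dlpack(arr) for arr in strided_arrays]
    return _build_sparse_array(descriptor, imported)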

The specification should include the C-side interface as well.

What do you think?

@hameerabbasi (Contributor, Author) commented on Sep 7, 2024

While I'm on board with the proposed __binsparse__ protocol, I'm of the opinion that sparse arrays should have as few user-facing API differences as possible compared to strided arrays, i.e. using sparse arrays should be as frictionless from the Array API side as using strided arrays. Allocating a different function based on sparse vs. dense arrays (from_dlpack vs from_binsparse) seems counter to that.

I'm not opposed to a C-side specification; I'd welcome it, so that the benefits can extend beyond the Python ecosystem. We do have to be mindful that the binsparse specification requires key-value maps, and we would need to maintain their relative order. Perhaps an iterable of 2-tuples makes more sense in this case.

@pearu commented on Sep 7, 2024

While I'm on board with the proposed __binsparse__ protocol, I'm of the opinion that sparse arrays should have as few user-facing API differences as possible compared to strided arrays, i.e. using sparse arrays should be as frictionless from the Array API side as using strided arrays.

Absolutely. I presume that users of sparse arrays are aware of the advantages and disadvantages of using a particular sparse format over other sparse formats or strided arrays. Using a particular sparse format is a choice of optimization method that must be supported by the user-facing API.

Allocating a different function based on sparse and dense arrays (from_dlpack vs from_binsparse) seems counter to that.

from_dlpack is specifically defined for strided arrays, both by the Python Array API and dlpack. So, I find abusing the from_dlpack function for sparse arrays more confusing than providing a new API function which would follow the same design pattern that dlpack introduced: the pair (from_foo, __foo__) provides a minimal API for sharing data between provider and consumer libraries that support a protocol foo.

Btw, if one considers strided and sparse arrays semantically equivalent (as in PyTorch, for instance), a more intuitive approach would be to use asarray instead of from_dlpack as a common API function for constructing strided or sparse arrays from provider library objects. However, the Python Array API does not support that, and existing implementations (PyTorch, JAX, ...) do not mix the functionalities of asarray and from_dlpack. I don't suggest doing that here either.

@hameerabbasi (Contributor, Author) commented on Sep 9, 2024

Btw, if one considers strided and sparse arrays semantically equivalent (as in PyTorch, for instance), a more intuitive approach would be to use asarray instead of from_dlpack as a common API function for constructing strided or sparse arrays from provider library objects.

While I agree with a common API, the disadvantage I find in this approach is that from_dlpack requires the producer to pass ownership to the consumer, while asarray requires the producer to maintain ownership of the consumed array. This, in turn, means that there's no way to have a zero-copy transfer in asarray using the from_dlpack function; i.e. one would have to do something like def asarray(x, ...): return from_dlpack(copy(x), ...) somewhere to maintain ownership semantics, incurring a copy. See #840 (comment)

My intention was to have an API that essentially could support zero-copy interchange across libraries, regardless of whether the consumed array was strided, sparse or something else.

@pearu commented on Sep 9, 2024

While I agree with a common API; the disadvantage I find in this approach is that since from_dlpack requires the producer to pass ownership to the consumer;

This is incorrect. When using from_dlpack(x), the producer keeps owning the memory of x. See https://dmlc.github.io/dlpack/latest/python_spec.html#semantics.

@hameerabbasi (Contributor, Author) commented on Sep 9, 2024

While I agree with a common API; the disadvantage I find in this approach is that since from_dlpack requires the producer to pass ownership to the consumer;

This is incorrect. When using from_dlpack(x), the producer keeps owning the memory of x. See https://dmlc.github.io/dlpack/latest/python_spec.html#semantics.

Ah, in that case, yes: asarray can use from_dlpack and a possible from_binsparse, and still be zero-copy. Thanks for the correction; asarray seems like the right format-agnostic API for this.
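
A minimal sketch of such a format-agnostic asarray dispatching on the two protocols (hypothetical consumer code; a real asarray has more responsibilities):

def asarray(obj, /, *, device=None, copy=None):
    # Sparse producers: zero-copy interchange via the binsparse protocol.
    if hasattr(obj, "__binsparse__"):
        return from_binsparse(obj, device=device, copy=copy)
    # Strided producers: zero-copy interchange via DLPack.
    if hasattr(obj, "__dlpack__"):
        return from_dlpack(obj, device=device, copy=copy)
    ...  # fall back to the buffer protocol, nested sequences, etc.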

@hameerabbasi (Contributor, Author) commented on Sep 11, 2024

I've updated the issue with @pearu's feedback.

Moved to Stage 0 in Proposals on Sep 16, 2024
Moved from Stage 0 to Stage 1 in Proposals on Sep 16, 2024

[16 remaining items hidden]

@pearu commented on Dec 3, 2024

Libraries may support more formats than are supported by the binsparse protocol, and may need to perform an additional conversion.

I am not sure how __binsparse_format__ will be helpful here. If a library implements a sparse format that the binsparse protocol does not support, then the binsparse protocol will be useless anyway: the binsparse protocol cannot convert a format that it does not support, so any necessary conversion should be done by the provider library.
If you disagree, please provide more details about the __binsparse_format__ specification, as I may be missing some points here.

@hameerabbasi (Contributor, Author) commented on Dec 3, 2024

any necessary conversion should be done by the provider library

I think you've hit the nail on the head with this part -- a conversion might be necessary, which may make the cost of constructing a capsule O(n), and therefore we're left with two options if we want to guarantee an O(1) conversion (both require a .asformat which can take binsparse descriptors):

  • Raise on unsupported formats. The consumer must choose whether to call .asformat on error.
  • Allow the consumer to query whether it supports the format via __binsparse_format__, and then raise or call .asformat as necessary (sketched below).
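
A rough sketch of the second option from the consumer's side; __binsparse_format__ and .asformat are as described above, while _supports_format, _closest_supported_descriptor and _import_arrays are hypothetical consumer helpers:

def from_binsparse(x):
    # O(1) handshake: inspect the format without touching any data.
    descriptor = x.__binsparse_format__()
    if not _supports_format(descriptor):
        # Either raise here, or request a supported format explicitly,
        # making the O(n) conversion visible at the call site.
        x = x.asformat(_closest_supported_descriptor(descriptor))
    # Only now import the constituent arrays.
    return _import_arrays(x.__binsparse__())
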
@pearu commented on Dec 3, 2024

An exception with a helpful message is better than a silent expensive conversion, imho.

@rgommers (Member) commented on Dec 4, 2024

The two-method format seems clearly preferred, indeed for the same reason as __dlpack_device__: it allows introspecting metadata for compatibility and "hand-shaking" (the consumer can change its request when it supports multiple devices/formats/etc.).

@pearu the analogy with __array_interface__ is not a good one. That method is numpy-specific, so the actual data is always already in memory and always in a single format (a strided array on CPU). A __binsparse__ call, just like a __dlpack__ call, requires the data to be in memory in a supported format, so for lazily evaluated formats or custom formats not part of the protocol, for example, a call to __binsparse__ may indeed require an expensive operation (and no, always raising an exception instead is definitely not the correct answer; there are always cases where the expensive thing is needed because no cheap thing is possible).

If anything, we've so far found a few corner cases where it would be helpful for more parts of the DLPack protocol to be introspectable separately from the actual "give me a capsule with a pointer to data in memory".

@pearu commented on Dec 4, 2024

A __binsparse__ call, just like a __dlpack__ call, requires the data to be in memory in a supported format

Sparse tensors of any format can be modeled as a pair

(shape-dtype-format specs, (<a list of dlpack-compatible objects that hold the indices data and values>))

(this is how we think about sparse tensors in PyTorch, for instance). Notice that the __binsparse__ structure does not (need to) hold raw pointers to memory, which is the case for the __dlpack__ structure. Hence, the __binsparse__ structure would be lazy by definition under this mental model.

Lazy access to sparse tensors is about accessing the shape-dtype-format specs without accessing the data of indices and values. Notice that the __dlpack__ capsule that holds pointers to raw memory is constructed when the __dlpack__ method is called (this is the point where data needs to be materialized in memory, not earlier). So, lazy access to a sparse tensor's shape-dtype-format data is possible as long as one does not trigger the creation of __dlpack__ capsules (read: materialize data in memory) earlier than necessary.
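
A small sketch of this mental model; lazy_sparse is a hypothetical lazily evaluated sparse tensor:

spec, arrays = lazy_sparse.__binsparse__()  # no raw pointers held here
# The wrappers in `arrays` are dlpack-compatible, but memory is only
# materialized when a consumer actually requests a capsule:
capsule = arrays[0].__dlpack__()  # data materialized here, not earlier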

In any case, the laziness property should be provided at the DLPack protocol level, not in the binsparse protocol, which just uses the DLPack protocol for accessing sparse tensor data.

@rgommers (Member) commented on Dec 4, 2024

Lazy access to sparse tensors is about accessing shape-dtype-format specs without accessing the data of indices and values.

This is the entire point of why two separate methods have to exist. Your answer agrees that there should be this separation; you're just moving the separation between accessing metadata and data to a place where it doesn't exist (__dlpack__). Coupling the introduction of a binsparse protocol to not-yet-proposed and probably infeasible changes to __dlpack__ isn't a good idea. Separation of concerns is nice :)

Please either go with the 2-method approach for binsparse itself instead, or propose a way of what a 1-method __binsparse__ should return to keep things lazy without any changes to DLPack.

@hameerabbasi (Contributor, Author) commented on Feb 19, 2025

I've modified the issue description to use the two-method version.

@hameerabbasi (Contributor, Author) commented on Mar 6, 2025

Looping in @BenBrock.

Based on scipy/scipy#22553 (comment): we discussed in the bi-weekly Binsparse meeting yesterday where the spec should live. It seems the most appropriate thing to do is (similar to DLPack) to let the binsparse repo take care of the descriptor and the naming conventions of the arrays, while the array-api repo specifies how that descriptor is used in Python-land.

For context, all the binsparse spec requires from back-ends is:

  1. A way to represent and interpret JSON (Python objects equivalent to the parsed JSON in our case).
  2. A key-value store of 1D and 2D arrays (a dict[str, SupportsDLPack] in our case).

So this seems to be a good separation of concerns.
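
In Python typing terms, these two requirements could be captured roughly as follows; SupportsBinsparse and SupportsDLPack are hypothetical Protocol names for illustration:

from typing import Any, Protocol

class SupportsDLPack(Protocol):
    def __dlpack__(self, **kwargs: Any) -> Any: ...

class SupportsBinsparse(Protocol):
    # 1. Python objects equivalent to the parsed-JSON binsparse descriptor.
    def __binsparse_descriptor__(self) -> dict[str, Any]: ...
    # 2. A key-value store of the 1D/2D constituent arrays.
    def __binsparse__(self) -> dict[str, SupportsDLPack]: ...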

@rgommers (Member) commented on Mar 8, 2025

That seems reasonable. I think a draft PR on this repo for review of the new methods in more detail would be useful. Based on feedback and working PRs to 2-3 array libraries that are all in sync, things can be tested and iterated on until both library and spec maintainers are happy, before merging anything.

@hameerabbasi (Contributor, Author) commented on Mar 11, 2025

PR is up at #912.

@hameerabbasi (Contributor, Author) commented on Apr 4, 2025

@rgommers Following the discussion in yesterday's meeting: one valid use-case for implementing __binsparse__ on dense arrays might be an array library like PyTorch or finch-tensor, which supports both dense and sparse arrays. In that case, it isn't clear that asarray should densify: indeed, it should keep the format as-is and attempt to use __dlpack__ to consume the data. However, in this case, arr.__binsparse__(descriptor=some_format) (or, to a user, xp2.from_binsparse(xp1_arr, descriptor=some_format)) might be suitable to convert to a given sparse array format, or even to a dense one.

Alternatively we can add array.asformat(some_descriptor) on both sparse and dense arrays if hasattr(xp, 'sparse') holds.

I'm open to both options.

@rgommers (Member) commented on Apr 4, 2025

That rationale isn't logically consistent. asarray(x_dense) will always use __dlpack__ and it's irrelevant whether __binsparse__ is present.

Can you please start simple, with only binsparse on sparse arrays, and then have a very concrete code example of where/how things can go wrong?

Alternatively we can add array.asformat(some_descriptor) on both sparse and dense arrays if hasattr(xp, 'sparse') holds.

As a reminder: we're only working on __binsparse__ and from_binsparse to work out any potential kinks and to guide libraries with sparse arrays on how to implement those two things in the same way. Once that is stable in multiple libraries and proven useful, we can add it to the standard. It's quite premature at this point to discuss a larger API surface.

hasattr(xp, 'sparse') is also a misconception. We need hasattr(xp, 'from_binsparse') or hasattr(an_array, '__binsparse__').


Metadata

Labels: API extension (Adds new functions or objects to the API.), Needs Discussion (Needs further discussion.), RFC (Request for comments. Feature requests and proposed changes.), topic: DLPack.

Status: Stage 1

Participants: @rgommers, @pearu, @hameerabbasi, @kgryte, @willow-ahrens