Implement support for sparse matrices #8

flying-sheep · 2019-03-01T14:39:03Z

scipy.sparse has its counterparts in the Matrix package.

R Matrix classes

Testable via is(m, class_name), or directly by parsing the letters in [dlni]([gst][CRT]|di|ge|s[yp]|t[rp])Matrix.

Symmetric and triangular matrices have no equivalent in scipy or numpy.

sparsity:

sparseMatrix
denseMatrix

type:

d__Matrix: dMatrix: double
l__Matrix: lMatrix: logical (=bool)
n__Matrix: nMatrix: pattern (=logical without NA?)
~~i__Matrix: iMatrix: integer~~ doesn’t really exist (yet?)

shape:

_g_Matrix: generalMatrix
_s_Matrix: symmetricMatrix
_t_Matrix: triangularMatrix

storage:

__CMatrix: CsparseMatrix: column-compressed
__RMatrix: RsparseMatrix: row-compressed
__TMatrix: TsparseMatrix: triplet
__pMatrix: packed dense matrix (symmetric or triangular)

special (combinations of shape and storage):

_diMatrix: diagonalMatrix (counts as sparse)
_geMatrix: general dense matrix
_s[yp]Matrix: symmetric dense matrix (unpacked or packed)
_t[rp]Matrix: triangular dense matrix (unpacked or packed)
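
As a rough illustration of the "parse the letters" approach, here is a minimal Python sketch that splits a class name into type, shape, and storage using the pattern above. The function name `parse_matrix_class` and the lookup tables are made up for this example; they only restate the lists in this comment and are not part of Matrix, scipy, or this project.

```python
import re

# Pattern taken from the description above, anchored to the whole name.
CLASS_RE = re.compile(r"^([dlni])([gst][CRT]|di|ge|s[yp]|t[rp])Matrix$")

TYPES = {"d": "double", "l": "logical", "n": "pattern", "i": "integer"}
SHAPES = {"g": "general", "s": "symmetric", "t": "triangular"}
STORAGE = {"C": "column-compressed", "R": "row-compressed", "T": "triplet"}
# Two-letter codes that combine shape and storage.
SPECIAL = {
    "di": ("diagonal", "sparse"),
    "ge": ("general", "dense, unpacked"),
    "sy": ("symmetric", "dense, unpacked"),
    "sp": ("symmetric", "dense, packed"),
    "tr": ("triangular", "dense, unpacked"),
    "tp": ("triangular", "dense, packed"),
}


def parse_matrix_class(name):
    """Return a dict describing an R Matrix class name, or None if it does not match."""
    m = CLASS_RE.match(name)
    if m is None:
        return None
    type_letter, rest = m.groups()
    if rest in SPECIAL:
        shape, storage = SPECIAL[rest]
    else:
        shape, storage = SHAPES[rest[0]], STORAGE[rest[1]]
    return {"type": TYPES[type_letter], "shape": shape, "storage": storage}


print(parse_matrix_class("dgCMatrix"))
print(parse_matrix_class("ltpMatrix"))
```

Note that the pattern accepts the `i` prefix even though integer matrices do not really exist yet, so a real implementation would probably want to reject or special-case it.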

Scipy class names

scipy.sparse does not have triangular or symmetric matrices, only general and diagonal sparse matrices. numpy has the dense ones, of course (ndarray).

The type is given as dtype; here you can have any numpy/scipy dtype, not only logical, double, and integer.

R doesn’t have float32; numpy/scipy doesn’t have NA.

These have equivalents in R:

spmatrix: Base class for all sparse matrices
csc_matrix: Compressed Sparse Column matrix
csr_matrix: Compressed Sparse Row matrix
coo_matrix: sparse matrix in COOrdinate format
dia_matrix: Sparse matrix with DIAgonal storage

These are not available in R:

dok_matrix: Dictionary Of Keys based sparse matrix
lil_matrix: Row-based linked list sparse matrix
bsr_matrix: Block Sparse Row matrix

Mapping

The final possible lossless mappings (lossless except for NA, which isn’t supported in python at all):

R	Python
`dgCMatrix`	`csc_matrix(dtype=float64)`
`lgCMatrix`/`ngCMatrix`	`csc_matrix(dtype=bool)`
`dgRMatrix`	`csr_matrix(dtype=float64)`
`lgRMatrix`/`ngRMatrix`	`csr_matrix(dtype=bool)`
`dgTMatrix`	`coo_matrix(dtype=float64)`
`lgTMatrix`/`ngTMatrix`	`coo_matrix(dtype=bool)`
`ddiMatrix`	`dia_matrix(dtype=float64)`
`ldiMatrix`	`dia_matrix(dtype=bool)`
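
A minimal sketch of how this table could be expressed as a lookup in Python. The names `R_TO_SCIPY` and `empty_for` are hypothetical and only illustrate the idea; pattern classes are written with the `n` prefix from the type list above. This is not the implementation used by any package.

```python
import numpy as np
from scipy import sparse

# R class name -> (scipy.sparse class, dtype), following the table above.
R_TO_SCIPY = {
    "dgCMatrix": (sparse.csc_matrix, np.float64),
    "lgCMatrix": (sparse.csc_matrix, bool),
    "ngCMatrix": (sparse.csc_matrix, bool),
    "dgRMatrix": (sparse.csr_matrix, np.float64),
    "lgRMatrix": (sparse.csr_matrix, bool),
    "ngRMatrix": (sparse.csr_matrix, bool),
    "dgTMatrix": (sparse.coo_matrix, np.float64),
    "lgTMatrix": (sparse.coo_matrix, bool),
    "ngTMatrix": (sparse.coo_matrix, bool),
    "ddiMatrix": (sparse.dia_matrix, np.float64),
    "ldiMatrix": (sparse.dia_matrix, bool),
}


def empty_for(r_class, shape):
    """Build an empty scipy matrix of the format and dtype mapped to `r_class`.

    Raises KeyError for classes without a lossless mapping (e.g. dsCMatrix).
    """
    cls, dtype = R_TO_SCIPY[r_class]
    return cls(shape, dtype=dtype)


m = empty_for("dgCMatrix", (2, 3))
print(m.format, m.dtype, m.shape)
```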

For lossy mappings, if we want, we can convert:

symmetricMatrix and triangularMatrix to csc_matrix (or csr_matrix, depending on layout)
bsr_matrix, lil_matrix, and dok_matrix to CsparseMatrix
*_matrix(dtype=int32) to dMatrix (no chance to convert int64 to R)
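
To illustrate the integer case with plain numpy (a sketch, not part of any conversion code): every int32 value survives a round trip through float64, which is what a dMatrix stores, while int64 values above 2**53 generally do not.

```python
import numpy as np

# Every int32 value fits exactly into a float64 (53-bit significand).
extremes = np.array(
    [np.iinfo(np.int32).min, np.iinfo(np.int32).max], dtype=np.int32
)
as_double = extremes.astype(np.float64)
assert (as_double.astype(np.int32) == extremes).all()

# int64 values above 2**53 are not all representable as float64.
big = np.array([2**53 + 1], dtype=np.int64)
round_tripped = big.astype(np.float64).astype(np.int64)
assert round_tripped[0] != big[0]
```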


ivirshup · 2019-04-04T05:35:36Z

This isn't currently implemented, right? (The docs make it sound like it is.)

If you want I've got some parts of this.

I'd also note that for converting from AnnData to SCE we typically want to transpose the matrix, but probably also want the sample data to stay contiguous (AnnData favors csr; I'm pretty sure SCE favors dgC/csc). This makes it pretty easy, since the underlying arrays are the same.
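
A minimal sketch of that point using only scipy (the variable names are illustrative): the `data`, `indices`, and `indptr` arrays of a CSR matrix can be reinterpreted as a CSC matrix with the shape reversed, and the result is the transpose.

```python
import numpy as np
from scipy import sparse

dense = np.array([
    [1.0, 0.0, 2.0],
    [0.0, 0.0, 3.0],
    [4.0, 5.0, 0.0],
    [0.0, 0.0, 0.0],
])
X = sparse.csr_matrix(dense)  # 4 x 3, row-compressed

# Same three arrays, read as column-compressed with the shape reversed.
Xt = sparse.csc_matrix((X.data, X.indices, X.indptr), shape=X.shape[::-1])

assert Xt.shape == (3, 4)
assert np.array_equal(Xt.toarray(), dense.T)
```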

flying-sheep · 2019-04-04T21:23:27Z

I created a skeleton here, but nothing is implemented yet. It would be very nice if you shared some code!

I am transposing the (so far only dense) matrices; the code should be the same once sparse ones are implemented.

fidelram · 2019-04-12T08:27:45Z

any progress on this issue?

ivirshup · 2019-04-14T07:07:48Z

I haven't quite had time to try and integrate this, but here's what I've got for code that transforms a scipy csr matrix into an R dgCMatrix (it gets transposed):

import numpy as np
from scipy import sparse

import rpy2.robjects as ro
from rpy2.robjects import pandas2ri, numpy2ri
from rpy2.robjects.conversion import localconverter

ro.r("library(Matrix)")

def dgc_to_csr(r_dgc):
    """Convert (and transpose) a dgCMatrix from R to a csr_matrix in python
    """
    with localconverter(ro.default_converter + pandas2ri.converter):
        X = sparse.csr_matrix(
                (
                    r_dgc.slots["x"], 
                    r_dgc.slots["i"], 
                    r_dgc.slots["p"]
                ),
                shape=tuple(ro.r("dim")(r_dgc))[::-1]
            )
    return X

def csr_to_dgc(csr):
    """Convert (and transpose) a csr matrix from python to an R dgCMatrix (not sure if type is consistent)
    """
    numeric = ro.r("as.numeric")
    with localconverter(ro.default_converter + ro.numpy2ri.converter):
        X = ro.r("sparseMatrix")(
            i=numeric(csr.indices),
            p=numeric(csr.indptr),
            x=numeric(csr.data),
            index1=False
        )
    return X

for i in range(10):
    X = sparse.rand(1000, 100, density=.1, format="csr")
    assert np.allclose(dgc_to_csr(csr_to_dgc(X)).todense(), X.todense())

ivirshup · 2019-04-14T07:12:53Z

Fair warning, I've got a comment in my notebook that says the csr_to_dgc isn't working, but I can't remember why I wrote that, and the round trip test passes.

Edit: Maybe this was it?

def csr_to_dgc(csr):
    """Convert (and transpose) a csr matrix from python to an R dgCMatrix (not sure if type is consistent)
    """
    numeric = ro.r("as.numeric")
    with localconverter(ro.default_converter + ro.numpy2ri.converter):
        X = ro.r("sparseMatrix")(
            i=numeric(csr.indices),
            p=numeric(csr.indptr),
            x=numeric(csr.data),
            dims=list(csr.shape[::-1]),
            index1=False
        )
    return X

Also, the transformation is lossy, because sparse arrays in python have no named indices. Otherwise it seems to work, and it passes a round trip test using some data from the Seurat integration tutorial.
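
A likely reason the explicit `dims` matters, sketched on the scipy side only (this is an assumption about what went wrong, not a confirmed diagnosis): when a compressed matrix is rebuilt from its three arrays without a shape, the missing dimension is inferred from the largest index, so trailing empty rows or columns are silently dropped. R's `sparseMatrix` also infers dimensions from the indices when `dims` is not given.

```python
import numpy as np
from scipy import sparse

# The last column is entirely empty.
dense = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 2.0, 0.0],
    [0.0, 0.0, 0.0],
])
X = sparse.csr_matrix(dense)

# Without a shape, the column count is inferred as max(indices) + 1.
inferred = sparse.csr_matrix((X.data, X.indices, X.indptr))
# With the shape passed explicitly, the empty column is preserved.
explicit = sparse.csr_matrix((X.data, X.indices, X.indptr), shape=X.shape)

print(inferred.shape, explicit.shape)
```

The round trip test in the earlier comment uses random matrices with density 0.1, where an all-empty trailing row or column is very unlikely, which may be why it passed without `dims`.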

flying-sheep · 2019-04-15T10:38:31Z

Thank you!

The lossiness is no problem if we use it for X, since we’ll have to treat the dimnames in a special way anyway to set the obs_names and var_names correctly.

flying-sheep pinned this issue Mar 1, 2019

flying-sheep mentioned this issue Apr 25, 2019

Scipy implementations #13

Merged

2 tasks

flying-sheep closed this as completed in #13 Apr 26, 2019

flying-sheep mentioned this issue Mar 4, 2020

anndata2ri milescsmith/s2a#2

Open

SNRNS mentioned this issue Mar 18, 2020

scanpy h5ad to pagoda2 object(s) kharchenkolab/pagoda2#91

Closed

flying-sheep added the enhancement New feature or request label Apr 29, 2020

ilibarra unpinned this issue May 17, 2020
