Implement support for sparse matrices #8

Closed
flying-sheep opened this issue Mar 1, 2019 · 6 comments · Fixed by #13
Labels
enhancement New feature or request

Comments

@flying-sheep
Collaborator

flying-sheep commented Mar 1, 2019

scipy.sparse has its counterparts in the Matrix package.

R Matrix classes

Testable via is(m, class_name), or directly by parsing the letters in [dlni]([gst][CRT]|di|ge|s[yp]|t[rp])Matrix.

Symmetric and triangular matrices have no equivalent in scipy or numpy.

sparsity:

  • sparseMatrix
  • denseMatrix

type:

  • d__Matrix: dMatrix: double
  • l__Matrix: lMatrix: logical (=bool)
  • n__Matrix: nMatrix: pattern (=logical without NA?)
  • i__Matrix: iMatrix: integer doesn’t really exist (yet?)

shape:

  • _g_Matrix: generalMatrix
  • _s_Matrix: symmetricMatrix
  • _t_Matrix: triangularMatrix

storage:

  • __CMatrix: CsparseMatrix: column-compressed
  • __RMatrix: RsparseMatrix: row-compressed
  • __TMatrix: TsparseMatrix: triplet
  • __pMatrix: packed dense matrix (symmetric or triangular)

special (combinations of shape and storage)

  • _diMatrix: diagonalMatrix (counts as sparse)
  • _geMatrix: general dense matrix
  • _s[yp]Matrix: symmetric dense matrix (unpacked or packed)
  • _t[rp]Matrix: triangular dense matrix (unpacked or packed)
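
The letter scheme above can be parsed mechanically. Below is a minimal sketch that uses the regular expression quoted earlier, with named groups added; `parse_matrix_class` and the lookup tables are hypothetical names for illustration, not part of Matrix, scipy, or this project.

```python
import re

# Regex from the description above, with named groups added for readability.
MATRIX_CLASS = re.compile(
    r"(?P<type>[dlni])(?P<structure>[gst][CRT]|di|ge|s[yp]|t[rp])Matrix"
)

TYPES = {"d": "double", "l": "logical", "n": "pattern", "i": "integer"}
SHAPES = {"g": "general", "s": "symmetric", "t": "triangular"}
SPARSE_STORAGE = {"C": "column-compressed", "R": "row-compressed", "T": "triplet"}
# Two-letter codes that combine shape and storage.
SPECIAL = {
    "di": ("diagonal", "diagonal (counts as sparse)"),
    "ge": ("general", "dense"),
    "sy": ("symmetric", "dense, unpacked"),
    "sp": ("symmetric", "dense, packed"),
    "tr": ("triangular", "dense, unpacked"),
    "tp": ("triangular", "dense, packed"),
}


def parse_matrix_class(name):
    """Split an R Matrix class name into its type, shape, and storage."""
    match = MATRIX_CLASS.fullmatch(name)
    if match is None:
        raise ValueError(f"not a recognised Matrix class name: {name!r}")
    structure = match["structure"]
    if structure in SPECIAL:
        shape, storage = SPECIAL[structure]
    else:
        # Sparse classes: first letter is the shape, second the storage.
        shape, storage = SHAPES[structure[0]], SPARSE_STORAGE[structure[1]]
    return {"type": TYPES[match["type"]], "shape": shape, "storage": storage}
```

For example, `parse_matrix_class("dgCMatrix")` yields a double, general, column-compressed description, while a name outside the scheme raises `ValueError`.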

Scipy class names

scipy.sparse does not have triangular or symmetric matrices, only general and diagonal sparse matrices. numpy has the dense ones, of course (ndarray).

The type is given as dtype. Here you can have everything available in numpy/scipy, not only logical, double, and integer.

R doesn’t have float32, and numpy/scipy doesn’t have NA. The following classes have equivalents in R:

  • spmatrix: Base class for all sparse matrices
  • csc_matrix: Compressed Sparse Column matrix
  • csr_matrix: Compressed Sparse Row matrix
  • coo_matrix: sparse matrix in COOrdinate format
  • dia_matrix: Sparse matrix with DIAgonal storage

These are not available in R:

  • dok_matrix: Dictionary Of Keys based sparse matrix
  • lil_matrix: Row-based linked list sparse matrix
  • bsr_matrix: Block Sparse Row matrix

Mapping

The final possible lossless mappings (lossless except for NA, which isn’t supported in python at all):

| R | Python |
| --- | --- |
| dgCMatrix | csc_matrix(dtype=float64) |
| lgCMatrix/ngCMatrix | csc_matrix(dtype=bool) |
| dgRMatrix | csr_matrix(dtype=float64) |
| lgRMatrix/ngRMatrix | csr_matrix(dtype=bool) |
| dgTMatrix | coo_matrix(dtype=float64) |
| lgTMatrix/ngTMatrix | coo_matrix(dtype=bool) |
| ddiMatrix | dia_matrix(dtype=float64) |
| ldiMatrix | dia_matrix(dtype=bool) |
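
One way to hold this mapping in code is a plain lookup table. The sketch below keeps the scipy class and dtype as strings so it runs without scipy installed; `R_TO_SCIPY` and `scipy_target` are hypothetical names, and the pattern classes are spelled with the `n` type prefix from the type list above.

```python
# R Matrix class -> (scipy.sparse class name, numpy dtype name)
R_TO_SCIPY = {
    "dgCMatrix": ("csc_matrix", "float64"),
    "lgCMatrix": ("csc_matrix", "bool"),
    "ngCMatrix": ("csc_matrix", "bool"),
    "dgRMatrix": ("csr_matrix", "float64"),
    "lgRMatrix": ("csr_matrix", "bool"),
    "ngRMatrix": ("csr_matrix", "bool"),
    "dgTMatrix": ("coo_matrix", "float64"),
    "lgTMatrix": ("coo_matrix", "bool"),
    "ngTMatrix": ("coo_matrix", "bool"),
    "ddiMatrix": ("dia_matrix", "float64"),
    "ldiMatrix": ("dia_matrix", "bool"),
}


def scipy_target(r_class):
    """Return the (class name, dtype name) pair for a losslessly mappable R class."""
    try:
        return R_TO_SCIPY[r_class]
    except KeyError:
        raise NotImplementedError(f"no lossless mapping for {r_class!r}") from None
```

Note that the reverse direction is ambiguous for bool: a `csc_matrix(dtype=bool)` could become either the logical or the pattern class, so a converter would have to pick one.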

For lossy mappings, if we want, we can convert:

  • symmetricMatrix and triangularMatrix to csc_matrix (or csr_matrix, depending on layout)
  • bsr_matrix, lil_matrix, and dok_matrix to CsparseMatrix
  • *_matrix(dtype=int32) to dMatrix (no chance to convert int64 to R)
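
A quick check of why int32 values can go through a double-typed matrix while int64 values cannot, assuming R's double is the same IEEE 754 binary64 format as Python's float. `survives_double` is a hypothetical helper for illustration.

```python
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1
INT64_MAX = 2**63 - 1


def survives_double(value):
    """True if an integer is unchanged after a round trip through a 64-bit float."""
    return int(float(value)) == value


# Every int32 fits in the 53-bit significand of a double.
int32_ok = survives_double(INT32_MIN) and survives_double(INT32_MAX)

# 2**53 + 1 is the smallest positive integer a double cannot represent,
# so the upper range of int64 is not preserved.
int64_ok = survives_double(2**53 + 1) and survives_double(INT64_MAX)
```

Here `int32_ok` is true and `int64_ok` is false, which matches the note that int64 has no lossless target in R.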
@flying-sheep flying-sheep pinned this issue Mar 1, 2019
@ivirshup
Member

ivirshup commented Apr 4, 2019

This isn't currently implemented, right? (The docs make it sound like it is.)

If you want I've got some parts of this.

I'd also note that for converting from AnnData to SCE we typically want to transpose the matrix, but also probably want the sample data to be contiguous (AnnData favors csr; pretty sure SCE favors dgC/csc). This makes it pretty easy, since the underlying arrays are the same.
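
A toy illustration of the point about shared arrays, written in plain Python rather than scipy so it stays self-contained. `dense_to_csr` and `csc_to_dense` are hypothetical helpers, not library functions: the CSR arrays of a matrix, read back as CSC with the shape reversed, describe its transpose.

```python
def dense_to_csr(rows):
    """Compress a dense list-of-lists into CSR arrays (data, indices, indptr)."""
    data, indices, indptr = [], [], [0]
    for row in rows:
        for col, value in enumerate(row):
            if value != 0:
                data.append(value)
                indices.append(col)
        indptr.append(len(data))
    return data, indices, indptr


def csc_to_dense(data, indices, indptr, shape):
    """Expand CSC arrays into a dense list-of-lists of the given (n_rows, n_cols)."""
    n_rows, n_cols = shape
    dense = [[0] * n_cols for _ in range(n_rows)]
    for col in range(n_cols):
        for k in range(indptr[col], indptr[col + 1]):
            dense[indices[k]][col] = data[k]
    return dense


matrix = [[1, 0, 2],
          [0, 0, 3]]  # 2 observations x 3 variables
data, indices, indptr = dense_to_csr(matrix)

# Same three arrays, interpreted as CSC with the shape reversed.
transposed = csc_to_dense(data, indices, indptr, shape=(3, 2))
```

No element is moved or copied between the two readings; only the interpretation of `indices` and `indptr` changes.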

@flying-sheep
Collaborator Author

I created a skeleton here, but nothing is implemented yet. It would be very nice if you shared some code!

I am transposing the (so far only dense) matrices, the code should be the same once sparse ones are implemented.

@fidelram

any progress on this issue?

@ivirshup
Member

I haven't quite had time to try and integrate this, but here's what I've got for code that transforms a scipy csr matrix into an R dgCMatrix and back (it gets transposed):

import numpy as np
from scipy import sparse

import rpy2.robjects as ro
from rpy2.robjects import pandas2ri, numpy2ri
from rpy2.robjects.conversion import localconverter

ro.r("library(Matrix)")

def dgc_to_csr(r_dgc):
    """Convert (and transpose) a dgCMatrix from R to a csr_matrix in python
    """
    with localconverter(ro.default_converter + pandas2ri.converter):
        X = sparse.csr_matrix(
                (
                    r_dgc.slots["x"], 
                    r_dgc.slots["i"], 
                    r_dgc.slots["p"]
                ),
                shape=tuple(ro.r("dim")(r_dgc))[::-1]
            )
    return X

def csr_to_dgc(csr):
    """Convert (and transpose) a csr matrix from python to an R dgCMatrix (not sure if type is consistent)
    """
    numeric = ro.r("as.numeric")
    with localconverter(ro.default_converter + numpy2ri.converter):
        X = ro.r("sparseMatrix")(
            i=numeric(csr.indices),
            p=numeric(csr.indptr),
            x=numeric(csr.data),
            index1=False
        )
    return X

for i in range(10):
    X = sparse.rand(1000, 100, density=.1, format="csr")
    assert np.allclose(dgc_to_csr(csr_to_dgc(X)).todense(), X.todense())

@ivirshup
Member

ivirshup commented Apr 14, 2019

Fair warning, I've got a comment in my notebook that says the csr_to_dgc isn't working, but I can't remember why I wrote that, and the round trip test passes.

Edit: Maybe this was it?

def csr_to_dgc(csr):
    """Convert (and transpose) a csr matrix from python to an R dgCMatrix (not sure if type is consistent)
    """
    numeric = ro.r("as.numeric")
    with localconverter(ro.default_converter + numpy2ri.converter):
        X = ro.r("sparseMatrix")(
            i=numeric(csr.indices),
            p=numeric(csr.indptr),
            x=numeric(csr.data),
            dims=list(csr.shape[::-1]),
            index1=False
        )
    return X

Also the transformation is lossy, due to not having named indices for sparse arrays in python. Otherwise that seems to work, and passes a round trip test using some data from the Seurat integration tutorial.
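
One plausible reason the explicit dims matter: when dimensions are not given, they have to be guessed from the index arrays, and trailing all-zero rows leave no trace there. The sketch below models that guess in plain Python; `inferred_dims` is a hypothetical helper, and whether this was the actual problem in the notebook is an assumption.

```python
def inferred_dims(indices, indptr):
    """Shape guessed from index arrays alone, for a column-compressed reading:
    (largest row index + 1, number of column pointers - 1)."""
    n_rows = max(indices) + 1 if indices else 0
    n_cols = len(indptr) - 1
    return n_rows, n_cols


# CSR arrays of a 2 x 4 matrix whose last column is entirely zero:
# [[5, 0, 0, 0],
#  [0, 0, 6, 0]]
data, indices, indptr = [5, 6], [0, 2], [0, 1, 2]
csr_shape = (2, 4)

true_transposed_shape = csr_shape[::-1]   # what dims=csr.shape[::-1] passes
guessed_shape = inferred_dims(indices, indptr)
```

Here the guess comes out one row short of the true transposed shape. A dense random test matrix almost never has an empty trailing column, which may be why a round trip on random data would not reveal the difference.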

@flying-sheep
Collaborator Author

flying-sheep commented Apr 15, 2019

Thank you!

The lossiness is no problem if we use it for X, since we’ll have to treat the dimnames in a special way anyway to set the obs_names and var_names correctly.
