Implement support for sparse matrices #8
Comments
This isn't currently implemented, right? (The docs make it sound like it is.) If you want, I've got some parts of this. I'd also note that for converting from AnnData to SCE we typically want to transpose the matrix, but also probably want the sample data to be continuous (AnnData favors |
I created a skeleton here, but nothing is implemented yet. It would be very nice if you shared some code! I am transposing the (so far only dense) matrices; the code should be the same once sparse ones are implemented.
Any progress on this issue?
I haven't quite had time to try and integrate this, but here's what I've got for code that transforms a scipy csr matrix into an R dgC matrix (it gets transposed):

    import numpy as np
    from scipy import sparse
    import rpy2.robjects as ro
    from rpy2.robjects import pandas2ri, numpy2ri
    from rpy2.robjects.conversion import localconverter

    ro.r("library(Matrix)")


    def dgc_to_csr(r_dgc):
        """Convert (and transpose) a dgCMatrix from R to a csr_matrix in python
        """
        with localconverter(ro.default_converter + pandas2ri.converter):
            X = sparse.csr_matrix(
                (
                    r_dgc.slots["x"],
                    r_dgc.slots["i"],
                    r_dgc.slots["p"]
                ),
                shape=tuple(ro.r("dim")(r_dgc))[::-1]
            )
        return X


    def csr_to_dgc(csr):
        """Convert (and transpose) a csr matrix from python to a R dgCMatrix (not sure if type is consistent)
        """
        print(csr.shape)
        numeric = ro.r("as.numeric")
        with localconverter(ro.default_converter + numpy2ri.converter):
            X = ro.r("sparseMatrix")(
                i=numeric(csr.indices),
                p=numeric(csr.indptr),
                x=numeric(csr.data),
                index1=False
            )
        return X


    for i in range(10):
        X = sparse.rand(1000, 100, density=.1, format="csr")
        assert np.allclose(dgc_to_csr(csr_to_dgc(X)).todense(), X.todense())
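The "(it gets transposed)" remark follows from the storage layouts: a dgCMatrix is column-compressed, so handing the CSR index arrays to a column-compressed constructor yields the transposed matrix. A minimal sketch of that idea using only scipy (no R involved), with a small made-up matrix:

```python
import numpy as np
from scipy import sparse

# Small example matrix, made up for illustration.
dense = np.array([[1.0, 0.0, 2.0],
                  [0.0, 3.0, 0.0]])
csr = sparse.csr_matrix(dense)

# Reinterpret the CSR arrays as CSC arrays and swap the shape:
# the row pointers of the original become column pointers.
csc_of_transpose = sparse.csc_matrix(
    (csr.data, csr.indices, csr.indptr),
    shape=csr.shape[::-1],
)

transposed_ok = np.array_equal(csc_of_transpose.toarray(), dense.T)
print(transposed_ok)
```

No data is copied or reordered here; only the interpretation of the three arrays changes.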
Fair warning, I've got a comment in my notebook that says the Edit: Maybe this was it?

    def csr_to_dgc(csr):
        """Convert (and transpose) a csr matrix from python to a R dgCMatrix (not sure if type is consistent)
        """
        numeric = ro.r("as.numeric")
        with localconverter(ro.default_converter + ro.numpy2ri.converter):
            X = ro.r("sparseMatrix")(
                i=numeric(csr.indices),
                p=numeric(csr.indptr),
                x=numeric(csr.data),
                dims=list(csr.shape[::-1]),
                index1=False
            )
        return X

Also the transformation is lossy, due to not having named indices for sparse arrays in python. Otherwise that seems to work, and passes a round trip test using some data from the Seurat integration tutorial.
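One likely reason the explicit dims argument matters: when a shape has to be inferred from the index arrays alone, trailing all-zero rows or columns cannot be recovered. The sketch below shows the same effect in scipy only. It is an analogy under that assumption, not a run of R's sparseMatrix, and the matrix is made up:

```python
import numpy as np
from scipy import sparse

# The last column is entirely zero (made-up example).
dense = np.array([[1.0, 0.0, 0.0],
                  [0.0, 2.0, 0.0]])
csr = sparse.csr_matrix(dense)

# Without a shape, scipy infers the dimensions from the index arrays.
inferred = sparse.csr_matrix((csr.data, csr.indices, csr.indptr))
# With a shape, the empty trailing column is kept.
explicit = sparse.csr_matrix((csr.data, csr.indices, csr.indptr), shape=csr.shape)

print(inferred.shape, explicit.shape)
```

A random 1000 x 100 matrix with density 0.1 will rarely have an empty trailing row or column, so a round-trip test on such data may pass even without the shape.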
Thank you! The lossiness is no problem if we use it for X, since we’ll have to treat the dimnames in a special way anyway to set the |
scipy.sparse has its counterparts in the Matrix package.

R Matrix classes

Testable via is(m, class_name), or directly by parsing the letters in [dlni]([gst][CRT]|di|ge|s[yp]|t[rp])Matrix. Symmetric and triangular matrices have no equivalent in scipy or numpy.

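The regular expression above can be applied as-is on the Python side. A minimal sketch, where `parse_matrix_class` is a hypothetical helper name (not part of any library) and the pattern is copied from the comment with anchors added:

```python
import re

# Pattern from the comment above. Group 1 is the value type letter,
# group 2 is either shape + storage (e.g. "gC") or a special suffix (e.g. "di").
MATRIX_CLASS_RE = re.compile(r"^([dlni])([gst][CRT]|di|ge|s[yp]|t[rp])Matrix$")


def parse_matrix_class(class_name):
    """Split an R Matrix class name into (type letter, remainder), or None."""
    match = MATRIX_CLASS_RE.match(class_name)
    if match is None:
        return None
    return match.group(1), match.group(2)


print(parse_matrix_class("dgCMatrix"))
print(parse_matrix_class("csr_matrix"))
```

Returning None for names that do not match keeps the caller free to fall back to an is(m, class_name) check on the R side.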
- sparsity:
  - sparseMatrix
  - denseMatrix
- type:
  - d__Matrix: dMatrix: double
  - l__Matrix: lMatrix: logical (=bool)
  - n__Matrix: nMatrix: pattern (=logical without NA?)
  - i__Matrix: iMatrix: integer; doesn’t really exist (yet?)
- shape:
  - _g_Matrix: generalMatrix
  - _s_Matrix: symmetricMatrix
  - _t_Matrix: triangularMatrix
- storage:
  - __CMatrix: CsparseMatrix: column-compressed
  - __RMatrix: RsparseMatrix: row-compressed
  - __TMatrix: TsparseMatrix: triplet
  - __pMatrix: packed dense matrix (symmetric or triangular)
- special (combinations of shape and storage):
  - _diMatrix: diagonalMatrix (counts as sparse)
  - _geMatrix: general dense matrix
  - _s[yp]Matrix: symmetric dense matrix (unpacked or packed)
  - _t[rp]Matrix: triangular dense matrix (unpacked or packed)

Scipy class names

Does not have triangular or symmetric matrices, only generic and diagonal sparse matrices. numpy has the dense ones of course (ndarray).

The type is given as dtype; here you can have everything in numpy/scipy, not only logical, double, and integer. R doesn’t have float32; numpy/scipy doesn’t have NA.

These have equivalents in R:

- spmatrix: Base class for all sparse matrices
- csc_matrix: Compressed Sparse Column matrix
- csr_matrix: Compressed Sparse Row matrix
- coo_matrix: sparse matrix in COOrdinate format
- dia_matrix: Sparse matrix with DIAgonal storage

These are not available in R:

- dok_matrix: Dictionary Of Keys based sparse matrix
- lil_matrix: Row-based linked list sparse matrix
- bsr_matrix: Block Sparse Row matrix

Mapping

The final possible lossless mappings (lossless except for NA, which isn’t supported in python at all):

- dgCMatrix <-> csc_matrix(dtype=float64)
- lgCMatrix/ngCMatrix <-> csc_matrix(dtype=bool)
- dgRMatrix <-> csr_matrix(dtype=float64)
- lgRMatrix/ngRMatrix <-> csr_matrix(dtype=bool)
- dgTMatrix <-> coo_matrix(dtype=float64)
- lgTMatrix/ngTMatrix <-> coo_matrix(dtype=bool)
- ddiMatrix <-> dia_matrix(dtype=float64)
- ldiMatrix <-> dia_matrix(dtype=bool)

For lossy mappings, if we want we can convert

- symmetricMatrix and triangularMatrix to csc_matrix (or csr_matrix, depending on layout)
- bsr_matrix, lil_matrix, and dok_matrix to CsparseMatrix
- *_matrix(dtype=int32) to dMatrix (no chance to convert int64 to R)
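
The mapping list above could be encoded as a lookup table plus a small normalisation step on the scipy side. This is only a sketch under assumptions: `SCIPY_TO_R` and `prepare_for_r` are hypothetical names, only the d and l classes are included (the pattern classes are left out so the reverse lookup stays unique), float32 is upcast because R has no float32, and any dtype not mentioned in the list is rejected rather than guessed at.

```python
import numpy as np
from scipy import sparse

# (scipy format, dtype) -> R class, following the lossless pairs listed above.
SCIPY_TO_R = {
    ("csc", np.dtype(np.float64)): "dgCMatrix",
    ("csc", np.dtype(np.bool_)): "lgCMatrix",
    ("csr", np.dtype(np.float64)): "dgRMatrix",
    ("csr", np.dtype(np.bool_)): "lgRMatrix",
    ("coo", np.dtype(np.float64)): "dgTMatrix",
    ("coo", np.dtype(np.bool_)): "lgTMatrix",
    ("dia", np.dtype(np.float64)): "ddiMatrix",
    ("dia", np.dtype(np.bool_)): "ldiMatrix",
}


def prepare_for_r(mat):
    """Return (matrix, R class name) with a format and dtype that R can hold."""
    # bsr, lil, and dok have no R counterpart: fall back to column-compressed.
    if mat.format not in ("csc", "csr", "coo", "dia"):
        mat = mat.tocsc()
    # int32 and float32 are stored as double on the R side.
    if mat.dtype in (np.int32, np.float32):
        mat = mat.astype(np.float64)
    key = (mat.format, mat.dtype)
    if key not in SCIPY_TO_R:
        # Covers int64 and anything else the list above does not mention.
        raise TypeError(f"no R counterpart for dtype {mat.dtype}")
    return mat, SCIPY_TO_R[key]


converted, r_class = prepare_for_r(sparse.lil_matrix(np.eye(3)))
print(converted.format, r_class)
```

Raising on int64 instead of silently casting mirrors the "no chance to convert int64 to R" note; a caller who knows the values are small can cast to int32 or float64 explicitly first.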