ComBat error "TypeError: data type not understood" #2

Closed
LuckyMD opened this issue Jan 9, 2019 · 14 comments

Comments

@LuckyMD
Contributor

LuckyMD commented Jan 9, 2019

Issue report for the issue posted in #1:
ComBat gives the following error: TypeError: data type not understood.

@jphe Could you clarify whether you are still using a sparse data matrix? The current ComBat implementation does not work with the sparse matrix format.

The ComBat function from www.github.com/mbuttner/maren_codes/ was designed to take pandas DataFrames as input, so the pandas DataFrame is not the problem. The code does have issues when a gene has 0 variance in its expression values, so you should filter out genes with constant expression values (usually genes with 0 expression).

It would also be good to know the output of type(data.T).
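The two checks above can be sketched as follows. This is only an illustration, assuming data is a cells-by-genes pandas DataFrame as in the tutorial; drop_constant_genes is a hypothetical helper name and is not part of the ComBat code or scanpy.

```python
import pandas as pd


def drop_constant_genes(data: pd.DataFrame) -> pd.DataFrame:
    """Return `data` (cells x genes) without zero-variance genes."""
    # A gene is constant when it takes a single value across all cells.
    # Counting distinct values avoids comparing a floating-point std to zero.
    keep = data.nunique(axis=0) > 1
    return data.loc[:, keep]


# Toy example: gene "g2" is constant (all zeros) and gets removed.
toy = pd.DataFrame(
    {"g1": [1.0, 2.0, 3.0], "g2": [0.0, 0.0, 0.0], "g3": [0.5, 0.5, 1.5]}
)
filtered = drop_constant_genes(toy)
print(type(filtered.T))         # the type that would be passed as data.T
print(list(filtered.columns))   # genes that survive the filter
```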

@LuckyMD
Contributor Author

LuckyMD commented Jan 9, 2019

@mbuttner do you have some insight into this error? I recall seeing it before. I believe you updated your code at some point to fix this. The code that leads to this is in #1

@mbuttner

mbuttner commented Jan 9, 2019

Hi,
in my implementation, I had to use a dense matrix:

data = pd.DataFrame(adata.X.todense())
batch = pd.Series(adata.obs['sample'])
batch = batch.reset_index()
data_cor = combat(data=data.T,batch=batch['sample'])

@LuckyMD
Contributor Author

LuckyMD commented Jan 9, 2019

@mbuttner Thanks for the confirmation. The case study also includes a call to adata.X.toarray() to generate a dense matrix. I hope that solves the issue here.

@jphe

jphe commented Jan 9, 2019

Hi,

Thanks for your reply. adata.X is a numpy.ndarray, because I ran adata.X = adata.X.toarray() before this step for the R issues, and data is a pandas DataFrame.
type(adata.X)
numpy.ndarray
type(data.T)
pandas.core.frame.DataFrame

I also tried "data = pd.DataFrame(adata.X.todense())", but it fails with "AttributeError: 'numpy.ndarray' object has no attribute 'todense'".

And yes, I have already filtered out genes with constant 0 expression in the QC step, following the tutorial: "sc.pp.filter_genes(adata, min_cells=20)".

Thanks,

@LuckyMD
Contributor Author

LuckyMD commented Jan 9, 2019

The .todense() call only works on a sparse matrix format, so that makes sense.
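One way to avoid depending on the storage format is to branch on scipy.sparse.issparse before converting. The sketch below assumes X stands in for adata.X; to_dense_frame is a hypothetical helper name, not something from the tutorial or the ComBat code.

```python
import numpy as np
import pandas as pd
from scipy import sparse


def to_dense_frame(X) -> pd.DataFrame:
    """Return X as a dense pandas DataFrame, whether X is sparse or dense."""
    if sparse.issparse(X):
        # toarray() returns a plain ndarray; todense() would return np.matrix.
        X = X.toarray()
    return pd.DataFrame(np.asarray(X))


dense_in = np.array([[0.0, 1.0], [2.0, 0.0]])
sparse_in = sparse.csr_matrix(dense_in)

# Both inputs give the same DataFrame, so the caller need not know the format.
print(to_dense_frame(dense_in).equals(to_dense_frame(sparse_in)))
```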

The rather uninformative error message doesn't really help. Could it be that you have 0 variance of a gene in your current dataset within a batch?
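A quick way to test for this is to count distinct values per gene inside each batch. This is a rough sketch, assuming data is cells-by-genes and batch holds one label per cell in the same row order; constant_within_batch is a hypothetical helper name.

```python
import pandas as pd


def constant_within_batch(data: pd.DataFrame, batch: pd.Series) -> pd.DataFrame:
    """Boolean table (batches x genes): True where a gene is constant in a batch."""
    # Group the rows (cells) by batch label, matched by position rather than
    # by index, then count the distinct values of each gene per batch.
    n_unique = data.groupby(batch.to_numpy()).nunique()
    return n_unique <= 1


# Toy example: "g2" varies over the whole dataset but is constant in batch "a".
data = pd.DataFrame({"g1": [1.0, 2.0, 3.0, 4.0], "g2": [0.0, 0.0, 1.0, 2.0]})
batch = pd.Series(["a", "a", "b", "b"])

flags = constant_within_batch(data, batch)
problem_genes = list(flags.columns[flags.any(axis=0)])
print(problem_genes)
```

A gene like g2 here passes a whole-dataset variance check, which is why the per-batch view is worth a separate look.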

@mbuttner

mbuttner commented Jan 9, 2019

Hi,
@jphe I am also not terribly familiar with the data conversion in Python. Can you show the output of
print(data.iloc[:10,:10]) and print(data.shape)? I would like to know if the shape of the data matrix is correct and that the matrix has valid entries.
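Beyond eyeballing the printout, the same questions can be checked programmatically. The sketch below assumes data is cells-by-genes (before the transpose) and batch has one label per cell; check_combat_inputs is a hypothetical helper name and the checks are only the ones raised in this thread.

```python
import numpy as np
import pandas as pd


def check_combat_inputs(data: pd.DataFrame, batch: pd.Series) -> list:
    """Return a list of human-readable problems; empty means none were found."""
    problems = []
    values = data.to_numpy()
    if not np.issubdtype(values.dtype, np.number):
        problems.append(f"non-numeric dtype: {values.dtype}")
    elif not np.isfinite(values).all():
        problems.append("matrix contains NaN or infinite entries")
    if data.shape[0] != len(batch):
        problems.append(
            f"{data.shape[0]} rows in data but {len(batch)} batch labels"
        )
    return problems


good = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]])
labels = pd.Series(["a", "b"])
print(check_combat_inputs(good, labels))

bad = good.copy()
bad.iloc[0, 0] = np.nan
print(check_combat_inputs(bad, labels.iloc[:1]))
```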

@jphe

jphe commented Jan 9, 2019

I use the same test data as the tutorial, from GSE92332_RAW.tar, and have checked that there are no 0-variance genes:
np.min(data.std(0).values)
0.02715344

I have also tried converting adata.X to sparse and then back to dense, but I still get the same error.

import scipy
adata.X=scipy.sparse.csr_matrix(adata.X)
data = pd.DataFrame(adata.X.todense())
batch = pd.Series(adata.obs['sample'])
batch = batch.reset_index()
data_cor = c.combat(data=data.T,batch=batch['sample'])

~/anaconda3/envs/py36/lib/python3.6/site-packages/numpy/core/numerictypes.py in issubdtype(arg1, arg2)
712 """
713 if not issubclass_(arg1, generic):
--> 714 arg1 = dtype(arg1).type
715 if not issubclass_(arg2, generic):
716 arg2_orig = arg2

TypeError: data type not understood
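For anyone reading this traceback later: the failing frame calls dtype(arg1), and NumPy raises TypeError there whenever it is handed something it cannot interpret as a dtype. Which object triggered it is not established in this thread, so the sketch below only shows how to probe candidates; numpy_understands is a hypothetical helper name.

```python
import numpy as np
import pandas as pd


def numpy_understands(candidate) -> bool:
    """True if np.dtype() accepts `candidate`, False if it raises TypeError."""
    try:
        np.dtype(candidate)
        return True
    except TypeError:
        return False


print(numpy_understands(np.float32))      # a regular NumPy scalar type
print(numpy_understands("not_a_dtype"))   # a string NumPy cannot parse

# Probe the dtype of every column in a DataFrame in one pass.
frame = pd.DataFrame({"x": [1.0, 2.0], "n": [1, 2]})
report = {name: numpy_understands(dt) for name, dt in frame.dtypes.items()}
print(report)
```

The same probe can be pointed at batch.dtype or data.dtypes to see whether either carries a dtype that the installed NumPy rejects.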

@jphe

jphe commented Jan 9, 2019

Hi,

@mbuttner The output is like the attached.

(attached screenshot: wx20190109-194935)

@LuckyMD
Contributor Author

LuckyMD commented Jan 9, 2019

It's quite strange that you are using the tutorial dataset and nonetheless get different results than I do. The version of ComBat that I have works for me. So to clarify, you have followed all the tutorial steps up until the batch correction part?

I will investigate and let you know if I can reproduce the error.

@LuckyMD
Contributor Author

LuckyMD commented Jan 9, 2019

So if I rerun the script, I get the results as expected. Could you possibly send me your adata object (or the data and batch objects), so that I can check whether it's the version of the combat function or the object?

Thanks

@mbuttner

mbuttner commented Jan 9, 2019

Hm, the output looks ok. Do you have the full traceback of the error message? Also, what is the output of sc.logging.print_versions()?

@LuckyMD
Contributor Author

LuckyMD commented Jan 9, 2019

Quoting @jphe: "@mbuttner The output is like the attached." (attached screenshot: wx20190109-194935)

So I have a different dataset in data when I get to this point in the tutorial script.

(attached image)

@jphe

jphe commented Jan 10, 2019

Hi,

Thanks, I have rebuilt an independent conda environment with all packages updated to the latest versions, and it works correctly now. Maybe some of the packages were too old.

Thanks again.

@LuckyMD
Contributor Author

LuckyMD commented Jan 10, 2019

I'm glad it's working for you now.

It would be good to know what package caused this in case it happens again though.
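One low-effort way to find out is to record package versions in both the broken and the working environment and compare them. This is a minimal sketch, assuming Python 3.8 or newer for importlib.metadata (the environment in the traceback above is Python 3.6, where sc.logging.print_versions() is the option mentioned earlier); the package list is a guess at what the tutorial depends on.

```python
from importlib import metadata


def package_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# The package list is an assumption, not taken from the tutorial itself.
snapshot = package_versions(
    ["numpy", "pandas", "scipy", "scanpy", "anndata", "patsy"]
)
for name, version in sorted(snapshot.items()):
    print(f"{name}=={version}")
```

Saving this output from each environment and comparing the two files would narrow down which package changed.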

LuckyMD closed this as completed Jan 10, 2019