write() function error: 'reserved name for dataframe columns' #255

Closed
rfenouil opened this issue Jul 22, 2020 · 13 comments

Comments

@rfenouil

Hello, I get an error message when trying to save intermediate results as a binary file using the adata.write() function.
The error seems to occur only when using the Seurat wrapper found here, not when following the tutorial with the 'pancreas' dataset.

See below for R and Python code to reproduce:

library(Seurat)
library(SeuratDisk)
library(SeuratWrappers)

curl::curl_download(url = 'http://pklab.med.harvard.edu/velocyto/mouseBM/SCG71.loom', destfile = "/data.loom")

ldat <- ReadVelocity(file = "/data.loom")
bm <- as.Seurat(x = ldat) 
bm[["RNA"]] <- bm[["spliced"]]
bm <- SCTransform(bm)
bm <- RunPCA(bm)
bm <- RunUMAP(bm, dims = 1:20)
bm <- FindNeighbors(bm, dims = 1:20)
bm <- FindClusters(bm)
DefaultAssay(bm) <- "RNA"
SaveH5Seurat(bm, filename = "/mouseBM.h5Seurat")
Convert("/mouseBM.h5Seurat", dest = "h5ad")

import scvelo as scv

scv.settings.verbosity = 3  # show errors(0), warnings(1), info(2), hints(3)
scv.settings.presenter_view = True  # set max width size for presenter view
scv.settings.set_figure_params('scvelo')  # for beautified visualization

adata = scv.read("/mouseBM.h5ad")

adata.write("/mouseBM_processed.h5ad")
Error
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/anndata/_io/utils.py", line 188, in func_wrapper
    return func(elem, key, val, *args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/anndata/_io/h5ad.py", line 241, in write_dataframe
    raise ValueError(f"{reserved!r} is a reserved name for dataframe columns.")
ValueError: '_index' is a reserved name for dataframe columns.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.7/dist-packages/anndata/_core/anndata.py", line 1852, in write_h5ad
    as_dense=as_dense,
  File "/usr/local/lib/python3.7/dist-packages/anndata/_io/h5ad.py", line 104, in write_h5ad
    write_attribute(f, "raw", adata.raw, dataset_kwargs=dataset_kwargs)
  File "/usr/lib/python3.7/functools.py", line 827, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/usr/local/lib/python3.7/dist-packages/anndata/_io/h5ad.py", line 126, in write_attribute_h5ad
    _write_method(type(value))(f, key, value, *args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/anndata/_io/h5ad.py", line 135, in write_raw
    write_attribute(f, "raw/var", value.var, dataset_kwargs=dataset_kwargs)
  File "/usr/lib/python3.7/functools.py", line 827, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/usr/local/lib/python3.7/dist-packages/anndata/_io/h5ad.py", line 126, in write_attribute_h5ad
    _write_method(type(value))(f, key, value, *args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/anndata/_io/utils.py", line 195, in func_wrapper
    ) from e
ValueError: '_index' is a reserved name for dataframe columns.

Above error raised while writing key 'raw/var' of <class 'h5py._hl.files.File'> from /.

Versions:

scvelo==0.2.1 scanpy==1.5.1 anndata==0.7.4 loompy==3.0.6 numpy==1.19.0 scipy==1.5.1 matplotlib==3.2.2 sklearn==0.23.1 pandas==1.0.5

Thank you for the great work and your help.

@rfenouil rfenouil added the bug Something isn't working label Jul 22, 2020
@VolkerBergen
Contributor

Running it directly in scvelo works fine

adata = scv.read('data/SCG71.loom',  backup_url='http://pklab.med.harvard.edu/velocyto/mouseBM/SCG71.loom')
adata.write('data/SCG71.h5ad')

Hence, something that Seurat includes in the file triggers the error. Could you please print adata and check whether there is any entry named '_index'?
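
For anyone running that check on their own object, here is a minimal sketch of one way to look for the reserved name. It assumes the object exposes obs, var and (optionally) raw.var tables with a columns attribute, as AnnData does. The helper name find_reserved_columns and the stand-in demo object are hypothetical and exist only so the sketch runs without anndata installed.

```python
from types import SimpleNamespace

RESERVED = "_index"  # the column name the traceback complains about


def find_reserved_columns(adata, reserved=RESERVED):
    """Return the names of the tables whose columns include `reserved`."""
    raw = getattr(adata, "raw", None)  # raw may be missing or None
    tables = {
        "obs": getattr(adata, "obs", None),
        "var": getattr(adata, "var", None),
        "raw.var": getattr(raw, "var", None),
    }
    return [
        name
        for name, table in tables.items()
        if table is not None and reserved in list(table.columns)
    ]


# Hypothetical stand-in that mimics only the attributes used above.
demo = SimpleNamespace(
    obs=SimpleNamespace(columns=["orig.ident", "seurat_clusters"]),
    var=SimpleNamespace(columns=["features"]),
    raw=SimpleNamespace(var=SimpleNamespace(columns=["_index", "features"])),
)

print(find_reserved_columns(demo))  # prints ['raw.var']
```

With a real object, find_reserved_columns(adata) should work the same way, since the helper only reads the columns of each table.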

@davisidarta

Any updates on this? I'm also having this issue when saving .h5ad files derived from .h5ad files created using SeuratDisk, but only after running scv.pp.moments(adata). The same error does not occur when saving the same .h5ad file after performing additional analysis in scanpy; it occurs only after calculating moments within scvelo.

@mihem

mihem commented Nov 16, 2020

I'm also having the same issue when running
adata.write(filename = "scvelo.h5ad")

For me, it also fails before running any scvelo step:

adata = scv.read("SeuratObject.h5ad")
adata.write(filename = "scvelo.h5ad")

raises:

ValueError: '_index' is a reserved name for dataframe columns.

It works fine, however, with the dataset that VolkerBergen suggested.

@VolkerBergen could you maybe specify where you would expect the "_index" entry to be?

My AnnData object looks like this in the summary:

obs: 'orig.ident', 'nCount_spliced', 'nFeature_spliced', 'nCount_unspliced', 'nFeature_unspliced', 'nCount_ambiguous', 'nFeature_ambiguous', 'nCount_RNA', 'nFeature_RNA', 'library', 'tissue', 'percent_mt', 'seurat_clusters', 'spliced_snn_res.0.3', 'label_new'
    var: 'features', 'ambiguous_features', 'spliced_features', 'unspliced_features'
    obsm: 'X_umap'
    layers: 'ambiguous', 'spliced', 'unspliced'

@mariafiruleva

I guess the source of the problem is the content of adata.__dict__['_raw'].__dict__.
Specifically, adata.__dict__['_raw'].__dict__['_var'] contains a dataframe with all features as rows and _index as a column name.
Renaming that column resolves the issue.

adata.__dict__['_raw'].__dict__['_var'] = adata.__dict__['_raw'].__dict__['_var'].rename(columns={'_index': 'features'})

@zehualilab

I guess, the source of problem is content of the df.__dict__['_raw'].__dict__.
Specifically, df.__dict__['_raw'].__dict__['_var'] contains dataframe with all features as rows and _index as column name.
Renaming resolves the issue.

adata.__dict__['_raw'].__dict__['_var'] = adata.__dict__['_raw'].__dict__['_var'].rename(columns={'_index': 'features'})

OMG!!!!OMG!!!!!OMG!!!!OMG!!!!!PROBLEM SOLVED!!!!!!!PROBLEM SOLVED!!!!!!!THX!!!!!!THX!!!!!!!!!!!

@WeilerP WeilerP closed this as completed May 30, 2021
@WeilerP WeilerP removed the bug Something isn't working label May 30, 2021
@genecell

I guess, the source of problem is content of the df.__dict__['_raw'].__dict__. Specifically, df.__dict__['_raw'].__dict__['_var'] contains dataframe with all features as rows and _index as column name. Renaming resolves the issue.

adata.__dict__['_raw'].__dict__['_var'] = adata.__dict__['_raw'].__dict__['_var'].rename(columns={'_index': 'features'})

This lets me save the AnnData .h5ad file, but I got the following error when plotting the dotplot:

f"Could not find keys '{not_found}' in columns of `adata.{dim}` or in"
KeyError: "Could not find keys '['AC004791.2', 'ALKBH5', 'APOBEC3A', 'ATHL1', 'BANK1', 'BCL9L', 'BST1', 'C1QA', 'C1QC', 'C1QTNF4', 'CALB2', 'CCR8', 'CD1C', 'CD8B', 'CDK15', 'CLEC10A', 'CMTM8', 'CXCL13', 'CYB561', 'DERL3', 'EOMES', 'FCER1A', 'FCGR3A', 'FGFBP2', 'FOXP3', 'FSCN1', 'GALNT2', 'GNG4', 'GZMK', 'HOXC6', 'HSPA6', 'IDO1', 'IFIT1', 'IFIT3', 'IGFL2', 'IGHG4', 'IL1B', 'IL1RN', 'IL7R', 'KLRF1', 'KRT5', 'KRT86', 'LAD1', 'LEF1', 'LINC00926', 'METRNL', 'MKI67', 'MS4A1', 'MTRNR2L8', 'MZB1', 'NR4A2', 'P2RY6', 'PASK', 'PEMT', 'PTGS2', 'PTPN13', 'PTPRS', 'RNASE1', 'ROR1.AS1', 'RP11.138A9.1', 'RP11.354E11.2', 'RP11.89C3.4', 'RPL34', 'RPL36A', 'RRM2', 'RSAD2', 'RTKN2', 'TLDC2', 'TLR8', 'TOR4A', 'TUBA4A', 'UBE2C', 'ZNF331']' in columns of `adata.obs` or in adata.raw.var_names."

I tried to delete the adata.raw:

del adata.raw

Now I can save the AnnData file, and the dotplot function works as well.

@paulitikka

If someone is still experiencing an issue with saving, also execute the following:
del(adata.var['_index'])  # after the "adata.__dict__['_raw'].__dict__['_var'] = adata.__dict__['_raw'].__dict__['_var'].rename(columns={'_index': 'features'}); del(adata.raw)" solution

@YY-SONG0718

del(adata.var['_index'])

Recently I encountered this error again after using the original solution for a while; this solved the issue, thanks!

@paulitikka

paulitikka commented Aug 15, 2022

You are welcome Yuyao!

@Mayank0512

I guess, the source of problem is content of the df.__dict__['_raw'].__dict__. Specifically, df.__dict__['_raw'].__dict__['_var'] contains dataframe with all features as rows and _index as column name. Renaming resolves the issue.

adata.__dict__['_raw'].__dict__['_var'] = adata.__dict__['_raw'].__dict__['_var'].rename(columns={'_index': 'features'})

Damn man that works....thanks so much....u are a true savior!!!
Thank youuuuuuuuu again

@weir12

weir12 commented Mar 20, 2023

I guess, the source of problem is content of the df.__dict__['_raw'].__dict__. Specifically, df.__dict__['_raw'].__dict__['_var'] contains dataframe with all features as rows and _index as column name. Renaming resolves the issue.

adata.__dict__['_raw'].__dict__['_var'] = adata.__dict__['_raw'].__dict__['_var'].rename(columns={'_index': 'features'})

BRAVO ! !!

@maximilianh

Oh boy, @mariafiruleva so many thanks!!

This command is a little easier to read, for me at least, and seems to do the same thing:

adata._raw._var.rename(columns={'_index': 'features'}, inplace=True)
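
A slightly more defensive variant of the same rename, sketched on a plain pandas DataFrame: it does nothing when '_index' is absent and avoids creating a duplicate column when 'features' already exists. The helper name rename_reserved_column and the suffixing scheme are assumptions made for illustration, not part of anndata or scvelo.

```python
import pandas as pd


def rename_reserved_column(df, reserved="_index", new_name="features"):
    """Return a copy of `df` with `reserved` renamed to a non-clashing name."""
    if reserved not in df.columns:
        return df  # nothing to fix
    candidate = new_name
    suffix = 1
    # Avoid duplicating an existing column such as 'features'.
    while candidate in df.columns:
        candidate = f"{new_name}_{suffix}"
        suffix += 1
    return df.rename(columns={reserved: candidate})


# Toy table shaped like the raw.var described above (gene names are made up).
raw_var = pd.DataFrame({"_index": ["Gene1", "Gene2"]}, index=["Gene1", "Gene2"])
fixed = rename_reserved_column(raw_var)
print(list(fixed.columns))  # prints ['features']
```

Following the comments above, the result would be assigned back with adata._raw._var = rename_reserved_column(adata._raw._var). Note that this relies on the same private attributes used in the thread, which may change between anndata versions.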

@Tianran1998

I guess, the source of problem is content of the df.__dict__['_raw'].__dict__. Specifically, df.__dict__['_raw'].__dict__['_var'] contains dataframe with all features as rows and _index as column name. Renaming resolves the issue.

adata.__dict__['_raw'].__dict__['_var'] = adata.__dict__['_raw'].__dict__['_var'].rename(columns={'_index': 'features'})

It works! Thank you very much!
