Question regarding batch effect removal step #5
Comments
Hello, Personally, I do not recommend removing batch effects when batch and cell type are confounded. In the paper we also show batch-removal results (UMAP figures) for a case where batch and cell type are confounded, and the correction tends to blur the separation between cell types. Best,
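A quick way to see whether batch and cell type are confounded in a cell summary file is to list, for each cell type, the batches it appears in. The following is a minimal sketch, not part of scVI-3D: it assumes a whitespace-delimited summary with `name`, `batch`, and `cell_type` columns like the excerpt quoted in this issue, the function names are hypothetical, and the two `GM12878-IMR90.R1` rows are made up purely for illustration.

```python
from collections import defaultdict


def batches_per_cell_type(summary_lines):
    """Map each cell type to the set of batches it appears in."""
    rows = [line.split() for line in summary_lines if line.strip()]
    header, records = rows[0], rows[1:]
    batch_col = header.index("batch")
    type_col = header.index("cell_type")
    seen = defaultdict(set)
    for rec in records:
        seen[rec[type_col]].add(rec[batch_col])
    return dict(seen)


def single_batch_cell_types(summary_lines):
    """Cell types observed in exactly one batch (candidates for confounding)."""
    mapping = batches_per_cell_type(summary_lines)
    return sorted(ct for ct, batches in mapping.items() if len(batches) == 1)


# Illustrative rows only; cell_x and cell_y are invented for this example.
example = """name batch cell_type
cell_1.txt IMR90-HAP1.R1 HAP1
cell_6.txt IMR90-HAP1.R1 IMR90
cell_x.txt GM12878-IMR90.R1 IMR90
cell_y.txt GM12878-IMR90.R1 GM12878
""".splitlines()

mapping = batches_per_cell_type(example)
confounded = single_batch_cell_types(example)
```

A cell type that shows up in only one batch cannot be distinguished from that batch by the model, which is the situation the reply above warns about. Whether that is severe enough to skip batch removal remains a judgment call.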
Thank you, Ye, for your quick response! I have already run the demo data, and it worked well. I could try running the Kim et al. dataset without the GM12878-IMR90.R1-related cells, but even if that works it would not help me much, since I still need those cells to be normalized. Perhaps I can run without batch removal turned on, as you suggested above. May I also ask whether you have a cell summary file for the Kim et al. dataset you provided in BandNorm? If so, I can try yours. Additionally, when I ran your script I found that scVI-3D.py uses a call from the scvi-tools package that has been deprecated: "scvi.data.setup_anndata(adata)" is now "scvi.model.SCVI.setup_anndata(adata)". You may need to update your code accordingly. Thanks
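One way to tolerate both spellings of the setup call mentioned above is a small compatibility helper. This is only a sketch under assumptions: `setup_anndata_compat` is a hypothetical name, not something scVI-3D or scvi-tools provides, and it relies solely on the two entry points named in this comment. The stand-in objects at the bottom exist so the sketch runs without scvi-tools installed.

```python
from types import SimpleNamespace


def setup_anndata_compat(scvi_module, adata, **kwargs):
    """Call whichever setup_anndata entry point the given scvi module exposes.

    Tries the newer scvi.model.SCVI.setup_anndata first, then falls back to
    the older scvi.data.setup_anndata.
    """
    model_cls = getattr(getattr(scvi_module, "model", None), "SCVI", None)
    new_style = getattr(model_cls, "setup_anndata", None)
    if callable(new_style):
        return new_style(adata, **kwargs)
    old_style = getattr(getattr(scvi_module, "data", None), "setup_anndata", None)
    if callable(old_style):
        return old_style(adata, **kwargs)
    raise AttributeError("no setup_anndata entry point found on this module")


# Stand-ins that mimic the two API layouts (not the real scvi package).
fake_new = SimpleNamespace(
    model=SimpleNamespace(SCVI=SimpleNamespace(setup_anndata=lambda a, **k: ("new", k)))
)
fake_old = SimpleNamespace(
    model=SimpleNamespace(SCVI=SimpleNamespace()),
    data=SimpleNamespace(setup_anndata=lambda a, **k: ("old", k)),
)

new_result = setup_anndata_compat(fake_new, adata=None, batch_key="batch")
old_result = setup_anndata_compat(fake_old, adata=None, batch_key="batch")
```

With the real package this would be called as `import scvi; setup_anndata_compat(scvi, adata, batch_key="batch")`. Checking for the attribute rather than parsing a version string avoids having to know exactly which release introduced the change.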
Yes, the summary file for Kim2020 is provided through the BandNorm package: https://sshen82.github.io/BandNorm/articles/BandNorm-tutorial.html#download-existing-single-cell-hi-c-data More specifically: https://pages.stat.wisc.edu/~sshen82/bandnorm/Summary/Kim2020_Summary.txt And yes, scvi-tools has been updated quite frequently since we launched scVI-3D. Thanks for pointing it out! We will make scVI-3D more robust to both newer and older versions. Thanks,
Thanks for the information! May I also ask a question about BandNorm? In the BandNorm tutorial you provided (https://sshen82.github.io/BandNorm/articles/BandNorm-tutorial.html), the same contact-region input files (format 1) can be passed to BandNorm for normalization. However, I don't see any mention of cell summary information when calling the main BandNorm function, "bandnorm_result = bandnorm(hic_df = hic_df, save = FALSE)", whereas scVI-3D has that option. Did I miss something, or is it simply not necessary? Thanks
Hello, Best, |
Thank you, Ye. I think I now have the information I needed. Regards
Hi Ye,
First, thanks for this great tool. I am trying to run it on the Kim et al. 2020 dataset, though not the copy you provided: I downloaded it elsewhere, where all cells are concatenated together, 16707 cells in total. That shouldn't be a problem; the resolution is still 500k, and I separated the cells by their IDs and converted the data to the format that scVI-3D can process. To account for batch effects, I also included a cell summary file as follows (sampled from the file):
name batch cell_type
cell_1.txt IMR90-HAP1.R1 HAP1
cell_2.txt IMR90-HAP1.R1 HAP1
cell_3.txt IMR90-HAP1.R1 HAP1
cell_4.txt IMR90-HAP1.R1 HAP1
cell_5.txt IMR90-HAP1.R1 HAP1
cell_6.txt IMR90-HAP1.R1 IMR90
cell_7.txt IMR90-HAP1.R1 HAP1
cell_8.txt IMR90-HAP1.R1 HAP1
cell_9.txt IMR90-HAP1.R1 HAP1
Although the cell summary shown above lacks the depth and sparsity columns found in the example file, I think it should be enough for batch removal.
However, when running the algorithm, I got the following error message after 400 epochs:
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/mmfs1/apps/spack/0.16.1/linux-rhel8-zen2/gcc-10.2.0/python-3.8.6-2pmflf74yv3epdgoav5gykxzbrdxl37l/lib/python3.8/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/mmfs1/scratch/sdontsay/lib/python3.8/site-packages/joblib/_parallel_backends.py", line 595, in __call__
return self.func(*args, **kwargs)
File "/mmfs1/scratch/sdontsay/lib/python3.8/site-packages/joblib/parallel.py", line 262, in __call__
return [func(*args, **kwargs)
File "/mmfs1/scratch/sdontsay/lib/python3.8/site-packages/joblib/parallel.py", line 262, in <listcomp>
return [func(*args, **kwargs)
File "/mmfs1/scratch/scVI-3D/scripts/scVI-3D.py", line 194, in normalize
imputeTmp = imputeTmp + model.get_normalized_expression(library_size = bandDepth, transform_batch = batchName)
File "/mmfs1/scratch/sdontsay/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/mmfs1/scratch/sdontsay/lib/python3.8/site-packages/scvi/model/base/_rnamixin.py", line 100, in get_normalized_expression
transform_batch = _get_batch_code_from_category(
File "/mmfs1/scratch/sdontsay/lib/python3.8/site-packages/scvi/model/_utils.py", line 243, in _get_batch_code_from_category
raise ValueError(f'"{cat}" not a valid batch category.')
ValueError: "GM12878-IMR90.R1" not a valid batch category.
"""
I don't understand how to fix this problem, since I included the batch information in the cell summary file. I checked the scvi-tools source code, and it looks like it can only account for known batches; is that correct? Another tool I used previously can handle the batch structure shown above, so I assumed I could simply pass in the batch information and have the effect removed. If that is not the case, please correct me. Thanks!
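The ValueError in the traceback is raised when the name passed as `transform_batch` is not among the batch categories the model knows about. One possible guard, sketched below, is to compare the requested batch names against the registered ones before calling `get_normalized_expression`. This is an assumption about how one might work around the error, not the scVI-3D authors' fix; `split_transform_batches` is a hypothetical helper, and the `registered` list here is made up to mimic a model that was set up on data containing only one batch.

```python
def split_transform_batches(requested, registered):
    """Split requested batch names into those the model knows and those it does not."""
    registered_set = set(registered)
    valid = [b for b in requested if b in registered_set]
    missing = [b for b in requested if b not in registered_set]
    return valid, missing


# Illustrative values. With real data, `registered` might be taken from
# something like adata.obs["batch"].unique(), assuming the column is
# named "batch" (an assumption, not confirmed by this thread).
requested = ["IMR90-HAP1.R1", "GM12878-IMR90.R1"]
registered = ["IMR90-HAP1.R1"]

valid, missing = split_transform_batches(requested, registered)
if missing:
    print("Skipping batches unknown to this model:", missing)
```

Only the names in `valid` would then be passed one at a time as `transform_batch`. Whether silently skipping a batch is acceptable depends on the analysis; reporting the missing names at least shows which subset of the data triggered the error.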