Combat Function for Batch Effects #398
This is a Python re-implementation of the ComBat batch-effect removal function, written by Brent Pedersen at the University of Utah; I slightly modified it to work with AnnData objects. I asked Brent for permission, and he is happy for us to use it.
Originally, the code was written in R for the SVA package:
The idea is taken from this paper:
The method was originally developed to adjust for batch effects in microarray data; nowadays, however, it is also commonly applied to scRNA-seq data. It fits a linear model to each gene and pools statistical power across genes by means of empirical Bayes (EB) to estimate per-gene correction factors.
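To make the description above concrete, here is a minimal NumPy sketch of a parametric ComBat-style adjustment without covariates. It is not the code from this PR or from sva; the function name `combat_sketch`, the samples-by-genes layout (as in an AnnData `.X`), and the tolerance are assumptions. Each gene is standardized, then the per-batch location and scale estimates are shrunk toward priors pooled across all genes (the EB step), following the parametric posterior updates used by sva's ComBat.

```python
import numpy as np


def combat_sketch(X, batch, max_iter=1000, tol=1e-4):
    """Simplified parametric ComBat-style batch adjustment (hypothetical helper).

    X: (n_samples, n_genes) array; batch: one label per sample.
    Assumes every gene has non-zero variance and every batch has >= 2 samples.
    Covariates and the non-parametric prior option are not handled.
    """
    X = np.asarray(X, dtype=float)
    batch = np.asarray(batch)
    levels = np.unique(batch)
    n = X.shape[0]

    # Design matrix: one indicator column per batch, no intercept.
    B = (batch[:, None] == levels[None, :]).astype(float)
    n_per = B.sum(axis=0)

    # Per-batch gene means by least squares, then the weighted grand mean
    # and the pooled per-gene residual variance.
    B_hat = np.linalg.solve(B.T @ B, B.T @ X)          # (k, p)
    grand_mean = (n_per / n) @ B_hat                   # (p,)
    var_pooled = ((X - B @ B_hat) ** 2).mean(axis=0)   # (p,)

    # Standardize every gene.
    Z = (X - grand_mean) / np.sqrt(var_pooled)

    out = np.empty_like(Z)
    for lev in levels:
        mask = batch == lev
        Zi = Z[mask]
        ni = Zi.shape[0]
        g_hat = Zi.mean(axis=0)          # per-gene location estimate
        d_hat = Zi.var(axis=0, ddof=1)   # per-gene scale estimate

        # EB hyperparameters estimated across all genes of this batch:
        # normal prior on the location, inverse-gamma prior on the scale.
        g_bar, t2 = g_hat.mean(), g_hat.var(ddof=1)
        m, s2 = d_hat.mean(), d_hat.var(ddof=1)
        a = (2 * s2 + m ** 2) / s2
        b = (m * s2 + m ** 3) / s2

        # Alternate the posterior-mean updates until they stabilize.
        g_old, d_old = g_hat, d_hat
        for _ in range(max_iter):
            g_new = (t2 * ni * g_hat + d_old * g_bar) / (t2 * ni + d_old)
            sum2 = ((Zi - g_new) ** 2).sum(axis=0)
            d_new = (0.5 * sum2 + b) / (ni / 2 + a - 1)
            change = max(np.max(np.abs(g_new - g_old) / np.abs(g_old)),
                         np.max(np.abs(d_new - d_old) / d_old))
            g_old, d_old = g_new, d_new
            if change < tol:
                break

        # Remove the shrunken batch location and scale.
        out[mask] = (Zi - g_old) / np.sqrt(d_old)

    # Undo the standardization.
    return out * np.sqrt(var_pooled) + grand_mean
```

Because the prior is estimated from all genes at once, genes with few or noisy observations in a batch borrow strength from the rest, which is what makes the method usable for small batches.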
I understand that @mbuttner also has an implementation of this; maybe we can combine the two and get the best of both approaches?
falexwolf left a comment
Thank you for the very well documented PR, @Marius1311! I have a few stylistic comments on the code below, which you should be able to address without much work.
But much more importantly, as this is a complete reimplementation, we need at least one unit test, ideally a few more. Looking forward to the updated PR!
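One kind of unit test a batch-correction PR could include is a property check on synthetic data: inject known per-batch shifts and assert that the corrected batches share the same per-gene mean. The sketch below is written in pytest style and uses a hypothetical stand-in, `correct_batch_means` (plain per-batch mean-centering, not ComBat), so that it runs on its own; in a real test the stand-in would be replaced by the function under test.

```python
import numpy as np


def correct_batch_means(X, batch):
    # HYPOTHETICAL stand-in for the function under test: it only re-centers
    # each batch on the overall mean (no scale adjustment, no EB shrinkage).
    X = np.asarray(X, dtype=float)
    batch = np.asarray(batch)
    out = X.copy()
    grand = X.mean(axis=0)
    for lev in np.unique(batch):
        mask = batch == lev
        out[mask] = X[mask] - X[mask].mean(axis=0) + grand
    return out


def test_removes_known_batch_shift():
    rng = np.random.default_rng(42)
    X = rng.normal(size=(30, 10))
    batch = np.repeat(["a", "b", "c"], 10)
    # Inject known shifts into two of the three batches.
    X[batch == "b"] += 2.0
    X[batch == "c"] -= 1.0
    corrected = correct_batch_means(X, batch)
    means = [corrected[batch == lev].mean(axis=0) for lev in ("a", "b", "c")]
    # After correction every batch should have the same per-gene mean.
    for m in means[1:]:
        np.testing.assert_allclose(m, means[0], atol=1e-10)
    # The output keeps the input's shape.
    assert corrected.shape == X.shape
```

A property test like this does not pin down exact numbers, so it is usually complemented by a regression test that compares against a trusted reference output.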
I second this initiative. I have used Brent's code and it works quite well. Naturally, having it integrated into Scanpy would be great.
On Mon, Dec 17, 2018 at 2:18 PM, Marius Lange wrote:
@Marius1311 commented on this pull request, in scanpy/preprocessing/combat.py:
@@ -0,0 +1,161 @@
+import numpy as np
+from scipy.sparse import issparse
+import pandas as pd
+import sys
+from numpy import linalg as la
+import patsy
+
+def design_mat(mod, batch_levels):
+    # require levels to make sure they are in the same order as we use in the
+    # rest of the script.
+    design = patsy.dmatrix("~ 0 + C(batch, levels=%s)" % str(batch_levels),
thanks, did that!
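The `design_mat` hunk quoted above builds the batch part of the design matrix with patsy: the formula `~ 0 + C(batch, levels=...)` drops the intercept, so each batch level gets its own indicator column, and passing the levels explicitly fixes the column order. As a dependency-free illustration, here is a NumPy sketch of that batch-only part; the name `batch_design` is hypothetical, and covariates (the `mod` argument) are not handled.

```python
import numpy as np


def batch_design(batch, batch_levels):
    """Hypothetical helper: one 0/1 indicator column per batch level.

    No intercept column is added, and columns follow the order of
    `batch_levels`, mirroring the batch-only part of the patsy formula.
    """
    batch = np.asarray(batch)
    levels = np.asarray(batch_levels)
    unknown = set(batch.tolist()) - set(levels.tolist())
    if unknown:
        raise ValueError(f"labels not in batch_levels: {sorted(unknown)}")
    # Broadcast compare: row i has a 1 in the column of sample i's batch.
    return (batch[:, None] == levels[None, :]).astype(float)


print(batch_design(["b", "a", "b"], ["a", "b"]).tolist())
# [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
```

Every row sums to one, which is why the intercept has to be dropped: with an intercept, the indicator columns would be collinear with it.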
Hey everyone, thanks for your feedback! In the latest commit, I have tried to address all of your comments, including the more stylistic ones, the references, the numba integration, the unit tests and so on. Have a look and let me know what you think. I won't be able to work on this any more this year because I am going on holiday. Merry Christmas, everyone!
This pull request was referenced on Jan 6, 2019.
Thank you very much! I merged this via the command line after adapting it to the private-module design.
I still get an AttributeError from patsy on my Mac that is cryptic to me, but the tests pass, and it also runs fine on the Linux server: