Unexpected behavior by decreasing genes #20

Closed
tlnagy opened this issue May 13, 2016 · 4 comments

@tlnagy
Owner

tlnagy commented May 13, 2016

While analyzing the results of the growth screens (#11), I noticed some weird behavior by the "decreasing" genes. These are the genes that are expected to drop out of the experiment as you increase the number of bottlenecks, and you can see this decrease with most combinations of growth screen parameters:

[Image: growth_screen_perf_param_dependence]

but you can also see that the AUROC scores then improve as you increase the number of bottlenecks further! The increasing genes behave as expected. This behavior might be the result of a bug in how I compute AUROCs. Since I only have summary values for now, it's hard to say what is causing this, but I want to make sure that this is explored fully.

@tlnagy
Owner Author

tlnagy commented May 13, 2016

Same data plotted differently

[Image: perf]

@martinkampmann
Collaborator

Yes, this is certainly unexpected. If anything, I would have expected an initial increase of AUC with the number of bottlenecks (since the phenotypes separate out more with each round of growth), and then maybe a decrease due to noise - not the opposite. It will probably be instructive to look at the raw counts in the sequencing data for sgRNAs targeting those genes (in scatter plots of t0 versus end point).

@tlnagy
Owner Author

tlnagy commented May 13, 2016

That is what I see for the "increasing" genes; only the "decreasing" ones have this weird behavior. I think I'll need to do some in-depth sleuthing to figure out what is going on here.

@tlnagy
Owner Author

tlnagy commented May 14, 2016

So this appears to be an artifact of the "increasing" genes being too effective and causing all other genes to drop out of the pool. These dropouts then have a pseudocount applied, and I end up comparing genes where all of their values at t10+ are pseudocounts only. Currently, the library is designed in such a way that increasing genes can double at twice the rate of the negative controls. After 20 bottlenecks, this is a 2^20-fold excess, which is almost half the pool at 1000x coverage. While this can explain why the AUROCs drop for a large number of bottlenecks, it doesn't explain why the AUROCs improve for the decreasing genes specifically.
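To make the dropout argument concrete, here is a rough sketch of how quickly the neutral guides would be diluted. It is a deterministic toy model, not the simulation code: it assumes the increasing guides gain a fixed 2x advantage over everything else at each bottleneck, that the pool is renormalized to the same sequencing depth each time, and that sampling noise and decreasing genes can be ignored. The function name and the 5% starting fraction are hypothetical, chosen only for illustration.

```python
def neutral_reads(n_bottlenecks, frac_increasing=0.05, advantage=2.0, coverage=1000.0):
    """Expected reads per neutral guide after n bottlenecks (toy model).

    frac_increasing: assumed starting fraction of the pool made up of
        increasing guides (hypothetical value).
    advantage: assumed relative growth advantage per bottleneck.
    coverage: average reads per guide at t0.
    """
    # Cumulative advantage of the increasing guides over the neutral ones.
    growth = advantage ** n_bottlenecks
    # Mean relative abundance of the pool; dividing by it renormalizes
    # the pool back to the original sequencing depth.
    mean_abundance = frac_increasing * growth + (1.0 - frac_increasing)
    return coverage / mean_abundance


for n in (0, 5, 10, 20):
    print(n, neutral_reads(n))
```

Under these assumed numbers the expected count per neutral guide falls below one read well before 20 bottlenecks, which is the regime where only the pseudocount would be left.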

I suspect it may be the following: decreasing genes are likely to already have an effect during the growth period, so they have a smaller value at t0. Dividing the t10 value by this smaller t0 value then yields a greater ratio than for the inactive genes, leading to a better separation at t10+.
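A small numeric sketch of that suspicion, with made-up counts: if both an inactive guide and a decreasing guide have dropped out by the end point, both end counts become the pseudocount, and the only thing left to distinguish them is their t0 value. The pseudocount of 1, the read counts, and the `log2_ratio` helper are all assumptions for illustration, not the values or code used in the analysis.

```python
import math

PSEUDOCOUNT = 1.0  # assumed value, for illustration only


def log2_ratio(t0, t_end, pseudocount=PSEUDOCOUNT):
    """log2(end / t0), replacing a zero end count with the pseudocount."""
    end = t_end if t_end > 0 else pseudocount
    return math.log2(end / t0)


# Both guides have zero reads at the end point (complete dropout).
# The decreasing guide was already depleted at t0 (hypothetical counts).
inactive = log2_ratio(t0=1000, t_end=0)
decreasing = log2_ratio(t0=250, t_end=0)

print("inactive:", inactive)
print("decreasing:", decreasing)
print("separation:", decreasing - inactive)
```

With identical pseudocounted end points, the decreasing guide ends up with the larger ratio purely because of its smaller t0, so the two groups separate even though neither has any real signal left at t10+.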

@martinkampmann and I decided the best way forward is to change the library design so that there are fewer increasing genes and they have a weaker phenotype.
