Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

metrics script always recomputes neighbors (overwrites knn-graph!) #48

Closed
mbuttner opened this issue Nov 26, 2019 · 16 comments
Closed

metrics script always recomputes neighbors (overwrites knn-graph!) #48

mbuttner opened this issue Nov 26, 2019 · 16 comments

Comments

@mbuttner
Copy link
Contributor

Hi,
I just realised that the scripts/metrics.py function always calls
scIB.preprocessing.reduce_data (line 100 at Master)
and recomputes the knn-graph EVEN IF we have run bbknn. Thus, the bbknn result is effectively overwritten.
Best,
Maren

@kridsadakorn
Copy link
Collaborator

This can be implied that I should be aware not to overwrite metrics' results to runIntegration's results, right?

@mbuttner
Copy link
Contributor Author

mbuttner commented Nov 26, 2019 via email

@kridsadakorn
Copy link
Collaborator

OK I got the point, so the result of Metrics.py is not correct in case of bkknn and cronos

@LuckyMD
Copy link
Collaborator

LuckyMD commented Nov 26, 2019

@mbuttner great that you spotted this! This is what we're trying to avoid with the unit tests. @mumichae can you take a look?

@mumichae
Copy link
Collaborator

@mbuttner Thanks for spotting it. I fixed it with 2cae16c on metric_fixes.
@kridsadakorn Try, if it works now.
@LuckyMD I'm not quite sure how to set up unit tests that test the script. Should I wrap the decision part into a function of the package or should I call the script in the unit test?

@LuckyMD
Copy link
Collaborator

LuckyMD commented Nov 26, 2019

For now call the script in the test. We'll work on speed another time.

@LuckyMD
Copy link
Collaborator

LuckyMD commented Nov 26, 2019

Also, we should try to merge PRs a bit quicker. At the moment we're working in our own PRs too much. Let's make specific PRs and merge quickly.

@mumichae
Copy link
Collaborator

mumichae commented Nov 26, 2019

Could we merge the metrics_fixes right now already? Then I would open a new branch for the script testing

@LuckyMD
Copy link
Collaborator

LuckyMD commented Nov 26, 2019

let me take a look at them later/tomorrow. Just open a new branch for script testing and pull metric fixes into that.

@mumichae
Copy link
Collaborator

@kridsadakorn Has this issue been resolved?

@kridsadakorn
Copy link
Collaborator

@mumichae How can I observe that the problem is gone? The ARI should not be equal to "1", right?

From Pancreas:

,human_pancreas_norm_bbknn_hvg0_knn,human_pancreas_norm_harmony_hvg0_embed,human_pancreas_norm_mnn_hvg0_full,human_pancreas_norm_scanorama_hvg0_embed,human_pancreas_norm_scanorama_hvg0_full,human_pancre
as_norm_seurat_hvg0_full,human_pancreas_norm_trvae_hvg0_embed,human_pancreas_norm_trvae_hvg0_full
NMI_cluster/label,0.46401801695452904,0.5042002919424581,0.6002282758722878,0.6123651119241046,0.6108255019101648,0.5679437049212369,0.5827635868924284,0.5388559433269677
ARI_cluster/label,0.16853642878982472,0.1514453414749276,0.20087193882907997,0.20956007009309266,0.20137486657273718,0.21995362399282753,0.20455721054365034,0.17647388431744748
ASW_label,,0.2852015053041666,0.509965711273253,0.42375880950884437,0.41808442771434784,0.3476724773645401,0.4333161190152168,0.273640975356102
ASW_label/batch,,0.32514855991659264,0.6220727460922302,0.5051653726518818,0.4626033526104369,0.4426532251891888,0.5113974121340542,0.3109629468094947
PCR_batch,,0.10250425238404949,0.06257392329011563,0.5766905597395535,0.5765105840018274,0.9603319486438531,0.39888335590252977,0.0166062273807448
cell_cycle_conservation,,0.8576605691022522,0.4196299205808169,0.3888340721973625,0.4194370296543745,0.6933786820742701,0.6689014648669532,0.9689979783426229
kBET,0.9811226904448863,0.3769181721402569,0.358476724390257,0.282121092915949,0.2860819109264171,0.63149884125723,0.3726876760945256,0.3726876760945256
iLISI,0.0,1.2251723136182102,1.1966802830580177,0.40344336386056434,0.40344336386056434,1.2357361497341808,0.4026809631922639,0.4026809631922639
cLISI,1.0,-0.0019265726299941832,-0.27309164785937856,0.6000196226157972,0.6000196226157972,0.37567982296272184,0.7273042624942341,0.7273042624942341

@kridsadakorn
Copy link
Collaborator

@mumichae I updated for my group meeting today. People questioned that, for BBKNN, there are no values for ASW_label, PCR_batch, cell_cycle_conservation. Are they not applicable for BBKNN or error from the code? If they are not applicable, should it be better to have a value like NA or "-", at least we know that it is not an error.

@mumichae
Copy link
Collaborator

The overwriting of knn graphs is solved (2cae16c), so I'll close this issue. We should discuss the other problems in new issues.

@mumichae
Copy link
Collaborator

@mumichae I updated for my group meeting today. People questioned that, for BBKNN, there are no values for ASW_label, PCR_batch, cell_cycle_conservation. Are they not applicable for BBKNN or error from the code? If they are not applicable, should it be better to have a value like NA or "-", at least we know that it is not an error.

For these measures we use PCA, which is not available in knn-based methods, due to missing embedding or corrected expression value output. In other words, we can simply not compute these metrics on corrected kNN graphs.

@mumichae
Copy link
Collaborator

@mumichae How can I observe that the problem is gone? The ARI should not be equal to "1", right?

From Pancreas:

,human_pancreas_norm_bbknn_hvg0_knn,human_pancreas_norm_harmony_hvg0_embed,human_pancreas_norm_mnn_hvg0_full,human_pancreas_norm_scanorama_hvg0_embed,human_pancreas_norm_scanorama_hvg0_full,human_pancre
as_norm_seurat_hvg0_full,human_pancreas_norm_trvae_hvg0_embed,human_pancreas_norm_trvae_hvg0_full
NMI_cluster/label,0.46401801695452904,0.5042002919424581,0.6002282758722878,0.6123651119241046,0.6108255019101648,0.5679437049212369,0.5827635868924284,0.5388559433269677
ARI_cluster/label,0.16853642878982472,0.1514453414749276,0.20087193882907997,0.20956007009309266,0.20137486657273718,0.21995362399282753,0.20455721054365034,0.17647388431744748
ASW_label,,0.2852015053041666,0.509965711273253,0.42375880950884437,0.41808442771434784,0.3476724773645401,0.4333161190152168,0.273640975356102
ASW_label/batch,,0.32514855991659264,0.6220727460922302,0.5051653726518818,0.4626033526104369,0.4426532251891888,0.5113974121340542,0.3109629468094947
PCR_batch,,0.10250425238404949,0.06257392329011563,0.5766905597395535,0.5765105840018274,0.9603319486438531,0.39888335590252977,0.0166062273807448
cell_cycle_conservation,,0.8576605691022522,0.4196299205808169,0.3888340721973625,0.4194370296543745,0.6933786820742701,0.6689014648669532,0.9689979783426229
kBET,0.9811226904448863,0.3769181721402569,0.358476724390257,0.282121092915949,0.2860819109264171,0.63149884125723,0.3726876760945256,0.3726876760945256
iLISI,0.0,1.2251723136182102,1.1966802830580177,0.40344336386056434,0.40344336386056434,1.2357361497341808,0.4026809631922639,0.4026809631922639
cLISI,1.0,-0.0019265726299941832,-0.27309164785937856,0.6000196226157972,0.6000196226157972,0.37567982296272184,0.7273042624942341,0.7273042624942341

I don't quite understand what you mean. The ARI scores are between 0 and 1. They are indeed quite small, which means that the cluster labels and cell type labels don't match very well according to ARI

@mbuttner
Copy link
Contributor Author

mbuttner commented Nov 28, 2019 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants