metrics script always recomputes neighbors (overwrites knn-graph!) #48

mbuttner · 2019-11-26T09:40:18Z

Hi,
I just realised that the scripts/metrics.py function always calls
scIB.preprocessing.reduce_data (line 100 at Master)
and recomputes the knn-graph EVEN IF we have run bbknn. Thus, the bbknn result is effectively overwritten.
Best,
Maren

The text was updated successfully, but these errors were encountered:

kridsadakorn · 2019-11-26T10:09:54Z

This can be implied that I should be aware not to overwrite metrics' results to runIntegration's results, right?

mbuttner · 2019-11-26T10:19:18Z

Erm, that problem lies within the `scripts/metrics.py` call, however, it does not overwrite the saved files, the problem is that the knn-graphs are recomputed while all metrics are calculated. The runIntegration's results remain unchanged. I consider it a bug within `scripts/metrics.py`, which needs fixing in order to secure comparability of bbknn/conos to all other tools.

On 26. Nov 2019, at 11:09, Kris ***@***.***> wrote: This can be implied that I should be aware not to overwrite metrics' results to runIntegration's results, right? — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#48>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AF2HCATYJMO632YFCKGMLU3QVTYXHANCNFSM4JRVEGVQ>.

Helmholtz Zentrum München

kridsadakorn · 2019-11-26T10:21:41Z

OK I got the point, so the result of Metrics.py is not correct in case of bkknn and cronos

LuckyMD · 2019-11-26T11:09:39Z

@mbuttner great that you spotted this! This is what we're trying to avoid with the unit tests. @mumichae can you take a look?

mumichae · 2019-11-26T12:11:44Z

@mbuttner Thanks for spotting it. I fixed it with 2cae16c on metric_fixes.
@kridsadakorn Try, if it works now.
@LuckyMD I'm not quite sure how to set up unit tests that test the script. Should I wrap the decision part into a function of the package or should I call the script in the unit test?

LuckyMD · 2019-11-26T17:06:43Z

For now call the script in the test. We'll work on speed another time.

LuckyMD · 2019-11-26T17:07:20Z

Also, we should try to merge PRs a bit quicker. At the moment we're working in our own PRs too much. Let's make specific PRs and merge quickly.

mumichae · 2019-11-26T17:20:40Z

Could we merge the metrics_fixes right now already? Then I would open a new branch for the script testing

LuckyMD · 2019-11-26T17:31:59Z

let me take a look at them later/tomorrow. Just open a new branch for script testing and pull metric fixes into that.

mumichae · 2019-11-27T17:40:00Z

@kridsadakorn Has this issue been resolved?

kridsadakorn · 2019-11-27T18:15:38Z

@mumichae How can I observe that the problem is gone? The ARI should not be equal to "1", right?

From Pancreas:

,human_pancreas_norm_bbknn_hvg0_knn,human_pancreas_norm_harmony_hvg0_embed,human_pancreas_norm_mnn_hvg0_full,human_pancreas_norm_scanorama_hvg0_embed,human_pancreas_norm_scanorama_hvg0_full,human_pancre
as_norm_seurat_hvg0_full,human_pancreas_norm_trvae_hvg0_embed,human_pancreas_norm_trvae_hvg0_full
NMI_cluster/label,0.46401801695452904,0.5042002919424581,0.6002282758722878,0.6123651119241046,0.6108255019101648,0.5679437049212369,0.5827635868924284,0.5388559433269677
ARI_cluster/label,0.16853642878982472,0.1514453414749276,0.20087193882907997,0.20956007009309266,0.20137486657273718,0.21995362399282753,0.20455721054365034,0.17647388431744748
ASW_label,,0.2852015053041666,0.509965711273253,0.42375880950884437,0.41808442771434784,0.3476724773645401,0.4333161190152168,0.273640975356102
ASW_label/batch,,0.32514855991659264,0.6220727460922302,0.5051653726518818,0.4626033526104369,0.4426532251891888,0.5113974121340542,0.3109629468094947
PCR_batch,,0.10250425238404949,0.06257392329011563,0.5766905597395535,0.5765105840018274,0.9603319486438531,0.39888335590252977,0.0166062273807448
cell_cycle_conservation,,0.8576605691022522,0.4196299205808169,0.3888340721973625,0.4194370296543745,0.6933786820742701,0.6689014648669532,0.9689979783426229
kBET,0.9811226904448863,0.3769181721402569,0.358476724390257,0.282121092915949,0.2860819109264171,0.63149884125723,0.3726876760945256,0.3726876760945256
iLISI,0.0,1.2251723136182102,1.1966802830580177,0.40344336386056434,0.40344336386056434,1.2357361497341808,0.4026809631922639,0.4026809631922639
cLISI,1.0,-0.0019265726299941832,-0.27309164785937856,0.6000196226157972,0.6000196226157972,0.37567982296272184,0.7273042624942341,0.7273042624942341

kridsadakorn · 2019-11-27T18:19:24Z

@mumichae I updated for my group meeting today. People questioned that, for BBKNN, there are no values for ASW_label, PCR_batch, cell_cycle_conservation. Are they not applicable for BBKNN or error from the code? If they are not applicable, should it be better to have a value like NA or "-", at least we know that it is not an error.

mumichae · 2019-11-27T18:29:21Z

The overwriting of knn graphs is solved (2cae16c), so I'll close this issue. We should discuss the other problems in new issues.

mumichae · 2019-11-27T18:31:03Z

@mumichae I updated for my group meeting today. People questioned that, for BBKNN, there are no values for ASW_label, PCR_batch, cell_cycle_conservation. Are they not applicable for BBKNN or error from the code? If they are not applicable, should it be better to have a value like NA or "-", at least we know that it is not an error.

For these measures we use PCA, which is not available in knn-based methods, due to missing embedding or corrected expression value output. In other words, we can simply not compute these metrics on corrected kNN graphs.

mumichae · 2019-11-27T18:38:16Z

@mumichae How can I observe that the problem is gone? The ARI should not be equal to "1", right?

From Pancreas:

,human_pancreas_norm_bbknn_hvg0_knn,human_pancreas_norm_harmony_hvg0_embed,human_pancreas_norm_mnn_hvg0_full,human_pancreas_norm_scanorama_hvg0_embed,human_pancreas_norm_scanorama_hvg0_full,human_pancre
as_norm_seurat_hvg0_full,human_pancreas_norm_trvae_hvg0_embed,human_pancreas_norm_trvae_hvg0_full
NMI_cluster/label,0.46401801695452904,0.5042002919424581,0.6002282758722878,0.6123651119241046,0.6108255019101648,0.5679437049212369,0.5827635868924284,0.5388559433269677
ARI_cluster/label,0.16853642878982472,0.1514453414749276,0.20087193882907997,0.20956007009309266,0.20137486657273718,0.21995362399282753,0.20455721054365034,0.17647388431744748
ASW_label,,0.2852015053041666,0.509965711273253,0.42375880950884437,0.41808442771434784,0.3476724773645401,0.4333161190152168,0.273640975356102
ASW_label/batch,,0.32514855991659264,0.6220727460922302,0.5051653726518818,0.4626033526104369,0.4426532251891888,0.5113974121340542,0.3109629468094947
PCR_batch,,0.10250425238404949,0.06257392329011563,0.5766905597395535,0.5765105840018274,0.9603319486438531,0.39888335590252977,0.0166062273807448
cell_cycle_conservation,,0.8576605691022522,0.4196299205808169,0.3888340721973625,0.4194370296543745,0.6933786820742701,0.6689014648669532,0.9689979783426229
kBET,0.9811226904448863,0.3769181721402569,0.358476724390257,0.282121092915949,0.2860819109264171,0.63149884125723,0.3726876760945256,0.3726876760945256
iLISI,0.0,1.2251723136182102,1.1966802830580177,0.40344336386056434,0.40344336386056434,1.2357361497341808,0.4026809631922639,0.4026809631922639
cLISI,1.0,-0.0019265726299941832,-0.27309164785937856,0.6000196226157972,0.6000196226157972,0.37567982296272184,0.7273042624942341,0.7273042624942341

I don't quite understand what you mean. The ARI scores are between 0 and 1. They are indeed quite small, which means that the cluster labels and cell type labels don't match very well according to ARI

mbuttner · 2019-11-28T07:30:46Z

@mumichae Thank you a lot for fixing this. BTW, I had the following results on the metrics before fixing: ,human_pancreas_bbknn_hvg2000_15_knn NMI_cluster/label,0.6349159232981765 ARI_cluster/label,0.3855720993273689 ASW_label, ASW_label/batch, cell_cycle_conservation, PCR_batch, kBET,0.9116536605920609 iLISI,0.0 cLISI,1.0 kBET and LISI scores are obviously different now :)

On 27. Nov 2019, at 19:38, mumichae ***@***.***> wrote: @mumichae <https://github.com/mumichae> How can I observe that the problem is gone? The ARI should not be equal to "1", right? From Pancreas: ,human_pancreas_norm_bbknn_hvg0_knn,human_pancreas_norm_harmony_hvg0_embed,human_pancreas_norm_mnn_hvg0_full,human_pancreas_norm_scanorama_hvg0_embed,human_pancreas_norm_scanorama_hvg0_full,human_pancre as_norm_seurat_hvg0_full,human_pancreas_norm_trvae_hvg0_embed,human_pancreas_norm_trvae_hvg0_full NMI_cluster/label,0.46401801695452904,0.5042002919424581,0.6002282758722878,0.6123651119241046,0.6108255019101648,0.5679437049212369,0.5827635868924284,0.5388559433269677 ARI_cluster/label,0.16853642878982472,0.1514453414749276,0.20087193882907997,0.20956007009309266,0.20137486657273718,0.21995362399282753,0.20455721054365034,0.17647388431744748 ASW_label,,0.2852015053041666,0.509965711273253,0.42375880950884437,0.41808442771434784,0.3476724773645401,0.4333161190152168,0.273640975356102 ASW_label/batch,,0.32514855991659264,0.6220727460922302,0.5051653726518818,0.4626033526104369,0.4426532251891888,0.5113974121340542,0.3109629468094947 PCR_batch,,0.10250425238404949,0.06257392329011563,0.5766905597395535,0.5765105840018274,0.9603319486438531,0.39888335590252977,0.0166062273807448 cell_cycle_conservation,,0.8576605691022522,0.4196299205808169,0.3888340721973625,0.4194370296543745,0.6933786820742701,0.6689014648669532,0.9689979783426229 kBET,0.9811226904448863,0.3769181721402569,0.358476724390257,0.282121092915949,0.2860819109264171,0.63149884125723,0.3726876760945256,0.3726876760945256 iLISI,0.0,1.2251723136182102,1.1966802830580177,0.40344336386056434,0.40344336386056434,1.2357361497341808,0.4026809631922639,0.4026809631922639 cLISI,1.0,-0.0019265726299941832,-0.27309164785937856,0.6000196226157972,0.6000196226157972,0.37567982296272184,0.7273042624942341,0.7273042624942341 I don't quite understand what you mean. The ARI scores are between 0 and 1. They are indeed quite small, which means that the cluster labels and cell type labels don't match very well according to ARI — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#48>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AF2HCAXAONDIDP2FICZ7MWDQV25BRANCNFSM4JRVEGVQ>.

Helmholtz Zentrum München

mumichae closed this as completed Nov 27, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

metrics script always recomputes neighbors (overwrites knn-graph!) #48

metrics script always recomputes neighbors (overwrites knn-graph!) #48

mbuttner commented Nov 26, 2019

kridsadakorn commented Nov 26, 2019

mbuttner commented Nov 26, 2019 via email

kridsadakorn commented Nov 26, 2019

LuckyMD commented Nov 26, 2019

mumichae commented Nov 26, 2019

LuckyMD commented Nov 26, 2019

LuckyMD commented Nov 26, 2019

mumichae commented Nov 26, 2019 •

edited

Loading

LuckyMD commented Nov 26, 2019

mumichae commented Nov 27, 2019

kridsadakorn commented Nov 27, 2019

kridsadakorn commented Nov 27, 2019

mumichae commented Nov 27, 2019

mumichae commented Nov 27, 2019

mumichae commented Nov 27, 2019

mbuttner commented Nov 28, 2019 via email

metrics script always recomputes neighbors (overwrites knn-graph!) #48

metrics script always recomputes neighbors (overwrites knn-graph!) #48

Comments

mbuttner commented Nov 26, 2019

kridsadakorn commented Nov 26, 2019

mbuttner commented Nov 26, 2019 via email

kridsadakorn commented Nov 26, 2019

LuckyMD commented Nov 26, 2019

mumichae commented Nov 26, 2019

LuckyMD commented Nov 26, 2019

LuckyMD commented Nov 26, 2019

mumichae commented Nov 26, 2019 • edited Loading

LuckyMD commented Nov 26, 2019

mumichae commented Nov 27, 2019

kridsadakorn commented Nov 27, 2019

kridsadakorn commented Nov 27, 2019

mumichae commented Nov 27, 2019

mumichae commented Nov 27, 2019

mumichae commented Nov 27, 2019

mbuttner commented Nov 28, 2019 via email

mumichae commented Nov 26, 2019 •

edited

Loading