Prioritization with MutationTaster is broken #509

Closed
xiamaz opened this issue May 18, 2022 · 13 comments · Fixed by #512
Labels
bug Something isn't working

Comments

@xiamaz
Collaborator

xiamaz commented May 18, 2022

Describe the bug
Currently prioritization using MutationTaster does not work and returns an error.

To Reproduce
Steps to reproduce the behavior:

  1. Filter Variants in a case.
  2. Select Tab "Prioritization"
  3. Select MutationTaster in Pathogenicity Prioritization, tick enable variant path-based prioritization
  4. Filter & Display

Expected behavior
Table including MutationTaster scores

Screenshots
image

Additional context
Possibly caused by unguarded access. https://github.com/bihealth/varfish-server/blob/7a05efc0ef30f7a6f43752e5b5ae46c4076b154a/variants/models.py#L2694

@xiamaz xiamaz added the bug Something isn't working label May 18, 2022
@xiamaz xiamaz changed the title MutationTaster is broken Prioritization with MutationTaster is broken May 18, 2022
@xiamaz
Collaborator Author

xiamaz commented May 18, 2022

It seems that the current MutationTaster API no longer returns bayes_prob_dc.

However, the Bayes probability is currently required to obtain a score from MutationTaster: https://github.com/bihealth/varfish-server/blob/7a05efc0ef30f7a6f43752e5b5ae46c4076b154a/variants/models.py#L2747

@xiamaz
Collaborator Author

xiamaz commented May 18, 2022

bayes_prob_dc has been replaced by tree_vote. We will need to check whether the data format can be converted correctly.

@stolpeo
Contributor

stolpeo commented May 18, 2022

@xiamaz Thanks for the bug hunt. We need to ask Dominik. I'll drop him an email if this is the correct way (or did you find it in the docs?). We should also replace record["bayes_prob_dc"] with record.get("tree_vote").
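
The difference between the unguarded lookup and the suggested `.get()` call can be sketched as follows. This is only an illustration: the record dictionary and both helper functions are hypothetical and are not taken from variants/models.py.

```python
# Hypothetical MT2021-style record: it carries "tree_vote" but no "bayes_prob_dc".
record = {
    "prediction": "disease causing (ClinVar)",
    "model": "simple_aae",
    "tree_vote": "43|57",
}


def read_bayes_prob_unguarded(rec):
    """Unguarded access: raises KeyError when the field is missing."""
    return rec["bayes_prob_dc"]


def read_field_guarded(rec, key, default=None):
    """Guarded access: returns `default` instead of raising."""
    return rec.get(key, default)


try:
    read_bayes_prob_unguarded(record)
except KeyError as exc:
    print("unguarded access failed, missing key:", exc)

# The guarded variant lets the caller decide how to handle a missing value.
print(read_field_guarded(record, "bayes_prob_dc"))
print(read_field_guarded(record, "tree_vote"))
```

Guarded access only avoids the exception; the caller still has to decide what a missing value means for the score.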

@your-highness
Collaborator

The tree vote is not a probabilistic measure but a measure of supporting_trees|conflicting_trees in MutationTaster's new 2021 model.

One possible formula is supporting_trees / (supporting_trees + conflicting_trees), but this might yield many similar scores and might penalize variants marked as pathogenic in ClinVar.

Example of MT2021 output:

id      chr     pos     ref     alt     transcript_stable       NCBI_geneid     prediction      model   tree_vote       note    splicesite      distance_from_splicesite        disease_mutation        polymorphism
1       21      33039603        A       C       ENST00000389995 6647    disease causing (ClinVar)       simple_aae      43|57                           ClinVar 
1       21      33039603        A       C       ENST00000270142 6647    disease causing (ClinVar)       simple_aae      42|58                           ClinVar
2       2       233391374       T       TC      ENST00000258385 1144    disease causing (fs/PTC)        complex_aae     195|5                   11
2       2       233391374       T       TC      ENST00000543200 1144    disease causing (fs/PTC)        complex_aae     192|8                   11
2       2       233391374       T       TC      ENST00000536614 1144    disease causing (fs/PTC)        complex_aae     184|16                  11
2       2       233391374       T       TC      ENST00000457943 1144    polymorphism    5utr    53|247                  11
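
For illustration only, the candidate formula supporting_trees / (supporting_trees + conflicting_trees) could be applied to the tree_vote strings above as in the sketch below. The function name is hypothetical, and the first number is assumed to be the supporting count, following the description above. Whether such a fraction is meaningful as a score is exactly what this thread questions.

```python
def tree_vote_fraction(tree_vote: str) -> float:
    """Return supporting / (supporting + conflicting) for a 'supporting|conflicting' string."""
    supporting, conflicting = (int(part) for part in tree_vote.split("|"))
    total = supporting + conflicting
    if total == 0:
        raise ValueError(f"tree_vote has no votes: {tree_vote!r}")
    return supporting / total


# tree_vote values copied from the example output above.
for vote in ("43|57", "195|5", "53|247"):
    print(vote, round(tree_vote_fraction(vote), 3))
# 43|57 0.43
# 195|5 0.975
# 53|247 0.177
```

Note that the ClinVar-annotated variant (43|57) ends up with a much lower fraction than the frameshift variant (195|5), which is the penalization concern raised above.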

@your-highness
Collaborator

> @xiamaz Thanks for the bug hunt. We need to ask Dominik. I'll drop him an email if this is the correct way (or did you find it in the docs?). We should also replace record["bayes_prob_dc"] with record.get("tree_vote").

I think one can still use the old MutationTaster interface with the Bayes probabilities at https://www.mutationtaster.org/ instead of https://www.genecascade.org/MutationTaster2021 .

@stolpeo
Contributor

stolpeo commented May 18, 2022

@your-highness (Johannes?) thanks for the input. If this is possible, we should patch it back to use the old interface and then create a ticket for adapting to the new interface as it seems to be more involved.

@xiamaz
Collaborator Author

xiamaz commented May 18, 2022

@stolpeo I have tested some changes to the current models that would be necessary. IMHO the cleanest implementation should separate MT2021 from MT86, since they follow very different principles.

@xiamaz
Collaborator Author

xiamaz commented May 18, 2022

@your-highness Reading https://www.genecascade.org/MutationTaster2021/info/#rf I'm not sure whether a probability calculation is a good idea with the current MT2021 classifier.

@your-highness
Collaborator

> @your-highness (Johannes?) thanks for the input. If this is possible, we should patch it back to use the old interface and then create a ticket for adapting to the new interface as it seems to be more involved.

Just read "Your-Highness" with a German accent and it almost sounds correct 😄

> @stolpeo I have tested some changes to the current models that would be necessary. IMHO the cleanest implementation should separate MT2021 from MT86, since they follow very different principles.

I second this: MT86 should be incorporated and kept.

> @your-highness Reading https://www.genecascade.org/MutationTaster2021/info/#rf I'm not sure whether doing a proba calculation is a good idea with the current MT2021 classifier.

We should ask the MT2021 developers how to infer a proper ranking. They do it anyway with MutationDistiller.

@domibln

domibln commented May 20, 2022

We do not rank variants with MutationTaster.

The Bayes classifier gave a Boolean output, and the float values were only indicating its internal confidence. In contrast to a 'score' (e.g. in CADD or RegulationSpotter), they do not reflect how deleterious a variant is thought to be. Do not use them at all: they don't provide any benefit over the prediction values and lead to the false idea that MT could rank the disease potential of variants.

With MutationTaster2021 it's a bit different: this is a RandomForest model and the 'tree vote' shows how many decision trees suggest deleterious:harmless. But it's still an internal marker and cannot rank deleteriousness - the model has not been trained to give such a metric. It's either deleterious or harmless. Don't use this score either (unless you want to test the classifier).

@stolpeo
Contributor

stolpeo commented May 20, 2022

@domibln Thanks for the input! I have now inspected our ranking algorithm. Currently, the ranking with MT is based on the prediction returned by MT (which is turned into an integer for easier comparison, with disease causing (automatic) being the highest rank), refined with the value in model. The Bayes probability is only used to break ties when prediction and model give two variants an equal rank; it does not change the overall ranking given by prediction/model in our algorithm. Would you suggest getting rid of this value completely, accepting possibly equally ranked variants?
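
A rough sketch of the ranking scheme described in this comment, to make the tie-breaking role of the Bayes probability concrete. This is not the VarFish implementation: the integer values, the ordering of the model values, and all names are assumptions, except that disease causing (automatic) is the highest prediction rank, as stated above.

```python
# Hypothetical integer ranks; only "disease causing (automatic)" being highest
# is taken from the comment above.
PREDICTION_RANK = {
    "disease causing (automatic)": 4,
    "disease causing": 3,
    "polymorphism": 2,
    "polymorphism (automatic)": 1,
}
# Hypothetical ordering of the MutationTaster model values.
MODEL_RANK = {"complex_aae": 3, "simple_aae": 2, "without_aae": 1}


def rank_key(variant, use_bayes_tiebreak=True):
    """Sort key: prediction first, then model, optionally the Bayes probability."""
    key = (PREDICTION_RANK[variant["prediction"]], MODEL_RANK[variant["model"]])
    if use_bayes_tiebreak:
        key += (variant.get("bayes_prob_dc", 0.0),)
    return key


variants = [
    {"id": "a", "prediction": "disease causing", "model": "simple_aae", "bayes_prob_dc": 0.60},
    {"id": "b", "prediction": "disease causing (automatic)", "model": "simple_aae", "bayes_prob_dc": 0.10},
    {"id": "c", "prediction": "disease causing", "model": "simple_aae", "bayes_prob_dc": 0.99},
    {"id": "d", "prediction": "polymorphism", "model": "complex_aae", "bayes_prob_dc": 0.999},
]

with_tiebreak = sorted(variants, key=rank_key, reverse=True)
without_tiebreak = sorted(variants, key=lambda v: rank_key(v, False), reverse=True)

print([v["id"] for v in with_tiebreak])     # ['b', 'c', 'a', 'd']
print([v["id"] for v in without_tiebreak])  # ['b', 'a', 'c', 'd']
```

Dropping the tie-breaker only changes the order of variants "a" and "c", which share the same prediction and model; they then keep their input order because Python's sort is stable.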

@stolpeo
Contributor

stolpeo commented May 20, 2022

Resolution Proposal
The new MT API interface (2021) relies on a new algorithm and returns different parameters. The old interface still exists. Therefore, we switch back to the old interface for now.

Affected Components
VarFish server

Affected Modules/Files

  • config/settings/base.py

Required Architectural Changes
None

Required Database Changes
None

Backport Possible?
Yes

Resolution Sketch
Change the global variable VARFISH_MUTATIONTASTER_REST_API_URL from

https://www.genecascade.org/MT2021/MT_API102.cgi

to

https://www.genecascade.org/MTc85/MT_API.cgi
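
As a minimal sketch, the change could look like the snippet below. The helper function and the environment-variable override are assumptions for illustration; only the setting name and the two URLs come from this proposal, and the actual config/settings/base.py may read its settings differently.

```python
import os

# URLs quoted in the resolution sketch above.
MT_API_URL_2021 = "https://www.genecascade.org/MT2021/MT_API102.cgi"
MT_API_URL_LEGACY = "https://www.genecascade.org/MTc85/MT_API.cgi"


def mutationtaster_api_url(environ=os.environ):
    """Return the MutationTaster REST API URL, defaulting to the old interface.

    Hypothetical helper: an environment variable of the same name as the
    setting may override the default.
    """
    return environ.get("VARFISH_MUTATIONTASTER_REST_API_URL", MT_API_URL_LEGACY)


# In a Django-style settings module, the global would then be assigned as:
VARFISH_MUTATIONTASTER_REST_API_URL = mutationtaster_api_url()
```

Keeping both URLs as named constants makes it easier to switch back once the MT2021 interface is supported.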

@stolpeo
Contributor

stolpeo commented May 20, 2022

Continue discussion for new implementation in #511

stolpeo added a commit that referenced this issue May 23, 2022
…nk-out alongside MT 2021 link-out

Closes: #509
Related-Issue: #509
Projected-Results-Impact: none
stolpeo added a commit that referenced this issue May 23, 2022
…nk-out alongside MT 2021 link-out (#512)

Closes: #509
Related-Issue: #509
Projected-Results-Impact: none
stolpeo added a commit that referenced this issue May 23, 2022
…ink-out

Related-Issue: #509
Projected-Results-Impact: none
stolpeo added a commit that referenced this issue May 23, 2022
…ink-out (#513)

Related-Issue: #509
Projected-Results-Impact: none
4 participants