Add REVEL Score #242

Closed
snesic opened this issue Nov 12, 2021 · 2 comments · Fixed by #358
Labels
enhancement New feature or request

Comments

@snesic
Contributor

snesic commented Nov 12, 2021

Feature Requirements:

Add REVEL score to VarFish so that the variants can be sorted and filtered by this score.

Use cases:

  1. Display REVEL score in "extra annotations" box in variant detail fold-out.
  2. Display REVEL score in the filtration results table in each row for each missense variant.
    The display should be in the same way as population frequency.
  3. Allow filtration by REVEL score threshold, i.e., we need to add it to the filter form.
    • Further discussion showed that this is not strictly required: we can sort by REVEL score and either look at the top N variants in descending order or stop considering variants with a REVEL score below X.
  4. Download TSV/Excel file that contains REVEL score, just as they have population frequencies.

Design Overview

Different strategies to implement this are discussed here. We decided to follow the option where the REVEL score is added to the extra annotations table. We further want to allow annotation of arbitrary values from the extra annotations in the result table (and downloads) and allow filtration by these values.

Steps

  1. Extra Annotations Data Update: Get the REVEL score into the extra annotations (TSV and table).
  2. Extra Annotations Into Results: Get extra annotation values from the extra annotations table into the VarFish results.
    At this point, extra annotations such as the REVEL score are visible to users, and users can already sort the results by them. We also need to ensure that the export works here.
    Care needs to be taken that the results table does not explode in terms of number of columns. Users should be able to show/hide the extra annotation columns, similar to how it is done for columns such as "distance to splice site."
  3. Extra Annotation Thresholds: Extend the filter form (and supporting code) such that it is aware of the extra annotations and thresholds can be set.
    • See comment above, no need to implement this at the moment.

Dependencies

  • (1.) independent
  • (3.) -> (2.)

Notes

  • The list of extra annotations is given by table ExtraAnnotationField.
  • Always annotate all extra annotation scores; all are hidden by default and the "Columns" multi-select allows showing them.

Design Details

1. Extra Annotations Data Update

  • Develop a small utility to take the "vanilla" (originally released) ExtraAnnos.tsv and ExtraAnnoFields.tsv files and augment them with additional scores. The "additional scores" could/should come from a TSV file with (genome-build, chromosome, 1-based position, reference, alternative, ...) columns.
  • This will allow users to create their own augmented extra annotation tables beyond the REVEL score.
  • In the future, the varfish-db-downloader Snakefiles could incorporate the most "popular" scores, but for now we do not want to add every possible score in there.
  • The annotation utility itself can live in varfish-db-downloader repository.

INPUT

# cat score-to-augment.tsv
chr1 1000000 G A 37.5
# cat ExtraAnno.tsv
[...]
GRCh37  1       1000000 1000000 592     G       A       [27.33, 8.54, 0.72, 0.48, 8.2, 2.7, 5.02, 2.91, 52.21, 0, 0, 0]
[...]
# cat ExtraAnnoField.tsv 
field   label
1       EncodeH3K27me3-sum
2       EncodeH3K27me3-max
3       EncodeH3K36me3-sum
4       EncodeH3K36me3-max
5       EncodeH3K79me2-sum
6       EncodeH3K79me2-max
7       EncodeH4K20me1-sum
8       EncodeH4K20me1-max
9       EncodeH2AFZ-sum
10      MMSp_acceptorIntron
11      MMSp_acceptor
12      MMSp_donor

USAGE

augment-extra-annotations.py \
    --input-annos=ExtraAnno.tsv \
    --input-annos-fields=ExtraAnnoFields.tsv \
    --input-score=score-to-augment.tsv \
    --genomebuild=GRCh37 \
    --field-name=REVEL_score \
    --output-annos=ExtraAnnos_with_REVEL.tsv \
    --output-annos-fields=ExtraAnnosFields_with_REVEL.tsv

RESULT

# cat ExtraAnnos_with_REVEL.tsv
GRCh37  1       1000000 1000000 592     G       A       [27.33, 8.54, 0.72, 0.48, 8.2, 2.7, 5.02, 2.91, 52.21, 0, 0, 0, 37.5]
# cat ExtraAnnosFields_with_REVEL.tsv
field   label
1       EncodeH3K27me3-sum
2       EncodeH3K27me3-max
3       EncodeH3K36me3-sum
4       EncodeH3K36me3-max
5       EncodeH3K79me2-sum
6       EncodeH3K79me2-max
7       EncodeH4K20me1-sum
8       EncodeH4K20me1-max
9       EncodeH2AFZ-sum
10      MMSp_acceptorIntron
11      MMSp_acceptor
12      MMSp_donor
13      REVEL_score
Affected Components
  • Additional script in varfish-db-downloader with documentation.
  • No change to varfish-db-downloader workflows or existing files.
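
A minimal sketch of the augmentation step described above, not the actual tool. It assumes that the score file has whitespace-separated chromosome, position, reference, alternative, and score columns (as in the INPUT example), that a leading "chr" can be stripped to match the chromosome names in the extra annotations file, and that variants without a score get a JSON null. All function names are hypothetical.

```python
import io
import json


def normalize_chrom(chrom):
    """Strip a leading 'chr' so 'chr1' and '1' compare equal (assumption)."""
    return chrom[3:] if chrom.startswith("chr") else chrom


def load_scores(handle):
    """Read (chrom, pos, ref, alt, score) records into a lookup dict."""
    scores = {}
    for line in handle:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, ref, alt, score = line.split()[:5]
        scores[(normalize_chrom(chrom), int(pos), ref, alt)] = float(score)
    return scores


def augment_annos(annos_in, annos_out, scores, genomebuild):
    """Append the score (or null) to the JSON list in the last column."""
    for line in annos_in:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        release, chrom, start, ref, alt = (
            fields[0], fields[1], fields[2], fields[5], fields[6])
        values = json.loads(fields[-1])
        key = (normalize_chrom(chrom), int(start), ref, alt)
        # Only rows of the requested genome build can receive a score.
        values.append(scores.get(key) if release == genomebuild else None)
        fields[-1] = json.dumps(values)
        annos_out.write("\t".join(fields) + "\n")


def augment_fields(fields_in, fields_out, field_name):
    """Copy the field table and append the new label with the next index."""
    last = 0
    for line in fields_in:
        if not line.strip():
            continue
        fields_out.write(line.rstrip("\n") + "\n")
        first = line.split()[0]
        if first.isdigit():
            last = max(last, int(first))
    fields_out.write(f"{last + 1}\t{field_name}\n")


# Demo with a shortened version of the record from the issue text.
scores = load_scores(io.StringIO("chr1 1000000 G A 37.5\n"))
out = io.StringIO()
augment_annos(
    io.StringIO("GRCh37\t1\t1000000\t1000000\t592\tG\tA\t[27.33, 8.54]\n"),
    out, scores, "GRCh37")
print(out.getvalue(), end="")
```

Keeping the score lookup in memory is only reasonable for small score files; for genome-wide scores a tabix-indexed lookup or a sorted merge would likely be needed.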

2. Extra Annotations Into Results

(a) augmenting the query infrastructure

  • Augment variants.queries such that the information from the ExtraAnno model is available in the results
  • Use ExtendQueryPartsCommentsJoin as a model (copy-paste-and-adjust) => ExtendQueryPartsExtraAnnoJoin
  • Develop in test-driven way, start with writing tests.

(b) adjust result table templates

  • paths: variants/templates/variants/filter_result/{header,row}.html
  • write all extra annotation fields in header and each row
  • use a jQuery/CSS trick for hiding the columns that should not be displayed (e.g., see how it is done for detail-frequencies-thousand-genomes)
  • the current selects for constraints and columns are in table.html, look for the string result-columns-selector
  • the list of extra annotation fields must be passed to the templates from variants.views

(c) for TSV/Excel downloads

  • module variants.file_exports
  • when (a) has been implemented, the data is available in the code that generates the download data
  • augment CaseExporterBase._yield_columns and CaseExporterBase._yield_small_vars to return the extra annotations in an appropriate way (break apart the JSON list that will be in the output)
  • probably it makes sense to start out with tests first
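
One way to "break apart the JSON list" for the download could look like the sketch below. It is not the CaseExporterBase code; the helper names are hypothetical, and it assumes that the extra annotation values arrive either as a Python list or as a JSON string, in the same order as the labels from ExtraAnnoField.

```python
import json


def split_extra_annos(anno_data, n_fields):
    """Return exactly n_fields cells from the stored annotation list.

    Accepts a list, a JSON string, or None (variant without annotations).
    Missing trailing values are padded with None, surplus values dropped.
    """
    if anno_data is None:
        values = []
    elif isinstance(anno_data, str):
        values = json.loads(anno_data)
    else:
        values = list(anno_data)
    values = values[:n_fields]
    return values + [None] * (n_fields - len(values))


def export_row(base_cells, anno_data, field_labels):
    """Append one cell per extra annotation field to an export row."""
    return list(base_cells) + split_extra_annos(anno_data, len(field_labels))


labels = ["MMSp_donor", "REVEL_score"]
header = ["chromosome", "position"] + labels
row = export_row(["1", 1000000], "[0, 37.5]", labels)
print(dict(zip(header, row)))
```

Padding to a fixed width keeps the header and every row aligned, even for variants that have no extra annotation record at all.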

(d) adjust documentation

  • describe the new feature in the documentation and also the release notes
Affected Components
  • variants.queries, variants.file_exports, variants.views, possibly more modules in variants
  • extend templates in variants/templates
  • of course, tests as appropriate

3. Extra Annotation Thresholds

  • most probably not required as results can be sorted by score
  • decision: postpone for now
@snesic snesic added the enhancement New feature or request label Nov 12, 2021
@holtgrewe
Collaborator

As discussed in Teams,

We should change the Requirements as follows.

### Feature Requirements
1. Display REVEL score in "extra annotations" box in variant detail fold-out.
2. Display REVEL score in the filtration results table in each row for each missense variant.
   The display should be in the same way as population frequency.
3. Allow filtration by REVEL score threshold, i.e., we need to add it to the filter form.
4. Download TSV/Excel file that contains REVEL score, just as they have population frequencies.

As for the change/design proposal, please update this section towards what we discussed, i.e., no full inclusion in the varfish-db-downloader Snakefiles, but rather an add-on tool to annotate extra annotations based on a chr/pos/ref/alt TSV. Annotating extra annotations with the REVEL score would be a special case of such a general-purpose tool. Please also propose where to change varfish-server with regard to requirements 1-4. This will most probably be the starting point for a design/architecture change document, but please provide some structure and start looking at the source code to identify suitable parts in the VarFish Server views/queries/etc.

IOW: Please rework your requirements. To answer your question from Teams, we will not have the REVEL score in the mainline of varfish-db-downloader. Rather, we should (1) create a tool to annotate the extra annos TSV file with further scores from TSV/tabix files, such as the downloaded REVEL score, and (2) provide users with instructions on how to extend their extra annos TSV file with REVEL scores, for example.

We further need a more refined plan of the required changes in varfish-server. I don't see a need for a new table, and query engine performance is not so critical; we will need to augment the design. Please start by studying the query engine source code.

@stolpeo
Contributor

stolpeo commented Dec 3, 2021

@holtgrewe @snesic Specifications are looking good to me. Just some comments:

Tool: I would emphasize, for the tool interface please use argparse (code can be copied from other tools in the DB downloader, e.g. https://github.com/bihealth/varfish-db-downloader/blob/master/tools/knowngeneaa.py ). Also, please locate the tool in the tools subfolder.
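
As an illustration of the argparse suggestion, a parser for the flags shown in the USAGE example could be sketched as follows. The help texts and the build_parser name are assumptions, and nothing here is copied from knowngeneaa.py.

```python
import argparse


def build_parser():
    """Build a parser for the flags listed in the USAGE example."""
    parser = argparse.ArgumentParser(
        description="Augment extra annotation TSV files with one more score.")
    parser.add_argument("--input-annos", required=True,
                        help="Path to the original extra annotations TSV")
    parser.add_argument("--input-annos-fields", required=True,
                        help="Path to the original extra annotation fields TSV")
    parser.add_argument("--input-score", required=True,
                        help="TSV with chrom/pos/ref/alt/score records")
    parser.add_argument("--genomebuild", required=True,
                        help="Genome build of the score file, e.g. GRCh37")
    parser.add_argument("--field-name", required=True,
                        help="Label of the new field, e.g. REVEL_score")
    parser.add_argument("--output-annos", required=True,
                        help="Path of the augmented extra annotations TSV")
    parser.add_argument("--output-annos-fields", required=True,
                        help="Path of the augmented fields TSV")
    return parser


# Parse the command line from the USAGE example (passed as an explicit list).
args = build_parser().parse_args([
    "--input-annos=ExtraAnno.tsv",
    "--input-annos-fields=ExtraAnnoFields.tsv",
    "--input-score=score-to-augment.tsv",
    "--genomebuild=GRCh37",
    "--field-name=REVEL_score",
    "--output-annos=ExtraAnnos_with_REVEL.tsv",
    "--output-annos-fields=ExtraAnnosFields_with_REVEL.tsv",
])
print(args.field_name, args.genomebuild)
```

argparse converts the dashes in option names to underscores, so `--input-annos-fields` becomes `args.input_annos_fields`.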

Query: Getting into the query structure can be daunting at first, but the actual change is not too complicated. I wondered if in the long term there could be a generic or automatic solution to add extra annotation fields to the query.

Tests: For the planning, setting up tests will probably take at least as long as the actual implementation, if not longer.

snesic added a commit to snesic/varfish-server that referenced this issue Feb 21, 2022
snesic added a commit to snesic/varfish-server that referenced this issue Mar 6, 2022
snesic added a commit to snesic/varfish-server that referenced this issue Mar 6, 2022
snesic added a commit to snesic/varfish-server that referenced this issue Mar 9, 2022
snesic added a commit to snesic/varfish-server that referenced this issue Mar 9, 2022
snesic added a commit to snesic/varfish-server that referenced this issue Mar 9, 2022
stolpeo pushed a commit that referenced this issue Mar 9, 2022
Closes: #242
Related-Issue: #242
Projected-Results-Impact: none