Revised parts of the developers documentation. (#274)
stolpeo committed Jan 11, 2022
1 parent 179358a commit 33ffcdb
Showing 4 changed files with 164 additions and 90 deletions.
173 changes: 99 additions & 74 deletions docs_manual/developer_database.rst
@@ -4,94 +4,119 @@
Database Import
===============

To prepare the VarFish database, follow `the instructions for the VarFish DB Downloader <https://github.com/bihealth/varfish-db-downloader>`_.
Downloading and processing the data can take multiple days.
First, download the pre-built database files that we provide and unpack them.
Please make sure that you have enough space available. The packed file takes up
31 GB, and unpacking it requires an additional 188 GB.
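
Checking the free space up front can save a failed download. The following is a minimal sketch, not part of VarFish: the helper name ``has_enough_space`` and the ``TARGET_DIR`` variable are hypothetical, and the 219 GB threshold is simply the sum of the two figures above.

```bash
# Hypothetical helper (not part of VarFish): succeed if DIR has at least
# REQUIRED_GB gigabytes of free space.
has_enough_space() (
    dir=$1
    required_gb=$2
    # df -Pk prints one POSIX-format line per file system; column 4 is the
    # available space in 1024-byte blocks.
    available_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    available_gb=$(( available_kb / 1024 / 1024 ))
    [ "$available_gb" -ge "$required_gb" ]
)

# 31 GB packed + 188 GB unpacked = 219 GB; TARGET_DIR is an assumed variable
# that defaults to the current directory.
if has_enough_space "${TARGET_DIR:-.}" 219; then
    echo "enough space"
else
    echo "not enough space" >&2
fi
```

The function body runs in a subshell, so its variables do not leak into the calling shell.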

The VarFish DB Downloader working folder consumes 1.7 TB for GRCh37 and 5.4 TB for GRCh38.
The pre-computed tables for VarFish consume 208 GB and the final
PostgreSQL database consumes 500 GB. Please make sure that there is enough free
space available. However, we recommend excluding the large databases:
frequency tables, extra annotations, and dbSNP. Also, keep in mind that
importing the whole database takes more than 24 hours, depending on the speed of your hard disk.
.. code-block:: bash
In the future, we plan to provide a pre-built package for import.
$ cd /plenty/space
$ wget https://file-public.bihealth.org/transient/varfish/varfish-server-background-db-20201006.tar.gz{,.sha256}
$ sha256sum -c varfish-server-background-db-20201006.tar.gz.sha256
$ tar xzvf varfish-server-background-db-20201006.tar.gz
We recommend excluding the large databases: frequency tables, extra
annotations, and dbSNP. Also, keep in mind that importing the whole database
takes more than 24 hours, depending on the speed of your hard disk.

This is a list of the possible imports, sorted by size:

=================== ==== ================== ===================================
=================== ==== ================== =============================
Component Size Exclude Function
=================== ==== ================== ===================================
=================== ==== ================== =============================
gnomAD_genomes 80G highly recommended frequency annotation
extra_annos 57G highly recommended diverse
dbSNP 56G highly recommended SNP annotation
gnomAD_exomes 6.0G highly recommended frequency annotation
knowngeneaa 4.5G highly recommended multiz alignment of 100 vertebrates
clinvar 2.4G highly recommended pathogenicity classification
ExAC 1.9G highly recommended frequency annotation
dbVar 623M recommended SNP annotation
thousand_genomes 312M recommended frequency annotation
gnomAD_SV 218M recommended SV frequency annotation
DGV 88M yes, import broken SV annotation
ensembl_regulatory 68M yes, import broken frequency annotation
gnomAD_constraints 13M yes, import broken frequency annotation
ensembltorefseq 8.6M identifier mapping
hgmd_public 6.3M yes, import broken gene annotation
ExAC_constraints 4.8M yes, import broken frequency annotation
hgnc 3.3M yes, import broken gene annotation
ensembltogenesymbol 1.8M yes, import broken identifier mapping
ensembl_genes 1.3M gene annotation
HelixMTdb 1.1M yes, import broken MT frequency annotation
MITOMAP 1.1M yes, import broken MT frequency annotation
refseq_genes 1.1M gene annotation
mtDB 514K yes, import broken MT frequency annotation
tads_hesc 258K domain annotation
tads_imr90 258K domain annotation
=================== ==== ================== ===================================
extra-annos 50G highly recommended diverse
dbSNP 32G highly recommended SNP annotation
thousand_genomes 6,5G highly recommended frequency annotation
gnomAD_exomes 6,0G highly recommended frequency annotation
knowngeneaa 4,5G highly recommended alignment annotation
clinvar 3,3G highly recommended pathogenicity classification
ExAC 1,9G highly recommended frequency annotation
dbVar 573M recommended SNP annotation
gnomAD_SV 250M recommended SV frequency annotation
ncbi_gene 151M gene annotation
ensembl_regulatory 77M frequency annotation
DGV 43M SV annotation
hpo 22M phenotype information
hgnc 15M gene annotation
gnomAD_constraints 13M frequency annotation
mgi 10M mouse gene annotation
ensembltorefseq 8,3M identifier mapping
hgmd_public 5,0M gene annotation
ExAC_constraints 4,6M frequency annotation
refseqtoensembl 2,0M identifier mapping
ensembltogenesymbol 1,6M identifier mapping
ensembl_genes 1,2M gene annotation
HelixMTdb 1,2M MT frequency annotation
refseqtogenesymbol 1,1M identifier mapping
refseq_genes 804K gene annotation
mim2gene 764K phenotype information
MITOMAP 660K MT frequency annotation
kegg 632K pathway annotation
mtDB 336K MT frequency annotation
tads_hesc 108K domain annotation
tads_imr90 108K domain annotation
vista 104K orthologous region annotation
acmg 16K disease gene annotation
=================== ==== ================== =============================

You can find the ``import_versions.tsv`` file in the root folder of the
package. This file determines which component (called ``table_group`` and
represented as a folder in the package) gets imported when the import command is
issued. To exclude a table, simply comment out (``#``) or delete the line.
Excluding tables that are not required for development can reduce time and space
consumption.
Excluding tables that are not required for development can reduce time and
space consumption. Also, the GRCh38 tables can be excluded.
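
Commenting out lines by hand works, but it can also be scripted. The sketch below assumes that ``import_versions.tsv`` is tab-separated with the columns ``build``, ``table_group`` and ``version``; the helper name ``exclude_table_groups`` is hypothetical and not part of VarFish.

```bash
# Hypothetical helper: prefix every line whose table_group (column 2) matches
# one of the given names with '#'. Lines already commented out are left alone.
exclude_table_groups() (
    file=$1
    shift
    for group in "$@"; do
        awk -F '\t' -v g="$group" '
            $2 == g && $1 !~ /^#/ { $0 = "#" $0 }
            { print }
        ' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
    done
)

# Demonstration on a throw-away file with made-up rows in the assumed layout.
demo=$(mktemp)
printf 'build\ttable_group\tversion\n' > "$demo"
printf 'GRCh37\tdbSNP\tb151\n' >> "$demo"
printf 'GRCh37\thgnc\tlatest\n' >> "$demo"
exclude_table_groups "$demo" dbSNP gnomAD_genomes
cat "$demo"
```

Running the helper twice is harmless, because lines that already start with ``#`` are skipped.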

A space-consumption-friendly version of the file would look like this::

build table_group version
#GRCh37 clinvar 20210728
#GRCh37 dbSNP b155
#GRCh37 dbVar 20210728
#GRCh37 DGV 2016
#GRCh37 DGV 2020
GRCh37 ensembl_genes r104
#GRCh37 ensembl_regulatory 20210728
GRCh37 ensembltogenesymbol 20210728
#GRCh37 ensembltorefseq 20210728
#GRCh37 ExAC r1
#GRCh37 ExAC_constraints r0.3.1
#GRCh37 extra_annos 20210728
#GRCh37 gnomAD_constraints v2.1.1
#GRCh37 gnomAD_exomes r2.1.1
#GRCh37 gnomAD_genomes r2.1.1
#GRCh37 gnomAD_SV v2.1
#GRCh37 HelixMTdb 20200327
#GRCh37 hgmd_public ensembl_r104
#GRCh37 hgnc 20210728
#GRCh37 knowngeneaa 20210728
#GRCh37 MITOMAP 20210728
#GRCh37 mtDB 20210728
GRCh37 refseq_genes r105
GRCh37 tads_hesc dixon2012
GRCh37 tads_imr90 dixon2012
#GRCh37 thousand_genomes phase3
#GRCh37 vista 20210728

To perform the import, issue::

$ python manage.py import_tables --tables-path varfish-db-downloader

Performing the import twice will automatically skip tables that are already imported.
To re-import tables, add the ``--force`` parameter to the command::
build table_group version
GRCh37 acmg v2.0
#GRCh37 clinvar 20200929
#GRCh37 dbSNP b151
#GRCh37 dbVar latest
GRCh37 DGV 2016
GRCh37 ensembl_genes r96
GRCh37 ensembl_regulatory latest
GRCh37 ensembltogenesymbol latest
GRCh37 ensembltorefseq latest
GRCh37 ExAC_constraints r0.3.1
#GRCh37 ExAC r1
#GRCh37 extra-annos 20200704
GRCh37 gnomAD_constraints v2.1.1
#GRCh37 gnomAD_exomes r2.1
#GRCh37 gnomAD_genomes r2.1
#GRCh37 gnomAD_SV v2
GRCh37 HelixMTdb 20190926
GRCh37 hgmd_public ensembl_r75
GRCh37 hgnc latest
GRCh37 hpo latest
GRCh37 kegg april2011
#GRCh37 knowngeneaa latest
GRCh37 mgi latest
GRCh37 mim2gene latest
GRCh37 MITOMAP 20200116
GRCh37 mtDB latest
GRCh37 ncbi_gene latest
GRCh37 refseq_genes r105
GRCh37 refseqtoensembl latest
GRCh37 refseqtogenesymbol latest
GRCh37 tads_hesc dixon2012
GRCh37 tads_imr90 dixon2012
#GRCh37 thousand_genomes phase3
GRCh37 vista latest
#GRCh38 clinvar 20200929
#GRCh38 dbVar latest
#GRCh38 DGV 2016

$ python manage.py import_tables --tables-path varfish-db-downloader --force
To perform the import, issue:

.. code-block:: bash
$ python manage.py import_tables --tables-path /plenty/space/varfish-server-background-db-20201006
Performing the import twice will automatically skip tables that are already
imported. To re-import tables, add the ``--force`` parameter to the command:

.. code-block:: bash
$ python manage.py import_tables --tables-path varfish-db-downloader --force
64 changes: 54 additions & 10 deletions docs_manual/developer_installation.rst
@@ -41,7 +41,9 @@ Install miniconda

miniconda helps to set up encapsulated Python environments.
This step is optional. You can also use pipenv, but in our experience,
resolving the dependencies in pipenv is terribly slow::
resolving the dependencies in pipenv is terribly slow.

.. code-block:: bash
$ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
$ bash Miniconda3-latest-Linux-x86_64.sh -b -p ~/miniconda3
@@ -54,7 +56,9 @@ resolving the dependencies in pipenv is terribly slow::
Clone git repository
--------------------

Clone the VarFish Server repository and switch into the checkout::
Clone the VarFish Server repository and switch into the checkout.

.. code-block:: bash
$ git clone https://github.com/bihealth/varfish-server
$ cd varfish-server
@@ -64,7 +68,9 @@ Clone the VarFish Server repository and switch into the checkout::
Install Python Requirements
---------------------------

With the conda/Python environment activated, install all the requirements::
With the conda/Python environment activated, install all the requirements.

.. code-block:: bash
$ for i in requirements/*; do pip install -r "$i"; done
@@ -73,41 +79,79 @@ Setup Database
--------------

Use the tool provided in ``utility/`` to set up the database. The name for the
database should be ``varfish``::
database should be ``varfish``.

.. code-block:: bash
$ bash utility/setup_database.sh
------------
Setup vue.js
------------

Use the tool provided in ``utility/`` to set up vue.js.

.. code-block:: bash
$ bash utility/setup_vue_dev.sh
Open an additional terminal and switch into the vue directory. Then install
the VarFish vue app.

.. code-block:: bash
$ cd varfish/vueapp
$ npm install
When finished, keep this terminal open to run the vue app.

.. code-block:: bash
$ npm run serve
-------------
Setup VarFish
-------------

First, create a ``.env`` file with the following content::
First, create a ``.env`` file with the following content.

.. code-block:: bash
export DATABASE_URL="postgres://varfish:varfish@127.0.0.1/varfish"
export CELERY_BROKER_URL=redis://localhost:6379/0
export PROJECTROLES_ADMIN_OWNER=root
export DJANGO_SETTINGS_MODULE=config.settings.local
If you wish to enable structural variants, add the following line::
If you wish to enable structural variants, add the following line.

.. code-block:: bash
export VARFISH_ENABLE_SVS=1
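
Before running any ``manage.py`` command, it can help to confirm that the variables from ``.env`` are actually set in the current shell. This is a minimal sketch; ``check_env_vars`` is a hypothetical helper and not part of VarFish.

```bash
# Hypothetical helper: report every named variable that is unset or empty
# and fail if at least one is missing.
check_env_vars() (
    missing=0
    for name in "$@"; do
        # Read the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    [ "$missing" -eq 0 ]
)

# Usage, after loading the file with `. ./.env`:
#   check_env_vars DATABASE_URL CELERY_BROKER_URL \
#       PROJECTROLES_ADMIN_OWNER DJANGO_SETTINGS_MODULE
```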
To create the tables in the VarFish database, run the ``migrate`` command.
This step can take a few minutes::
This step can take a few minutes.

.. code-block:: bash
$ python manage.py migrate
Once done, create a superuser for your VarFish instance. By default, the VarFish root user is named ``root`` (the
setting can be changed in the ``.env`` file with the ``PROJECTROLES_ADMIN_OWNER`` variable)::
setting can be changed in the ``.env`` file with the ``PROJECTROLES_ADMIN_OWNER`` variable).

.. code-block:: bash
$ python manage.py createsuperuser
Last, download the icon sets for VarFish and make scripts, stylesheets and icons available::
Last, download the icon sets for VarFish and make scripts, stylesheets and icons available.

.. code-block:: bash
$ python manage.py geticons -c bi cil fa-regular fa-solid gridicons octicon
$ python manage.py collectstatic
When done, open two terminals and start the VarFish server and the celery server::
When done, open two terminals and start the VarFish server and the celery server.

.. code-block:: bash
terminal1$ make server
terminal2$ make celery
15 changes: 10 additions & 5 deletions docs_manual/developer_kiosk.rst
@@ -12,14 +12,17 @@ organizing your cases properly. The mode serves only as a way to try out VarFish
Configuration
-------------

First, you need to download the VarFish annotator data (11 GB) and unpack it::
First, you need to download the VarFish annotator data (11 GB) and unpack it.

$ wget https://file-public.bihealth.org/transient/varfish/varfish-annotator-20191129.tar.gz
$ wget https://file-public.bihealth.org/transient/varfish/varfish-annotator-transcripts-20191129.tar.gz
.. code-block:: bash
$ wget https://file-public.bihealth.org/transient/varfish/varfish-annotator-{,transcripts-}20191129.tar.gz{,.sha256}
$ tar xzvf varfish-annotator-20191129.tar.gz
$ tar xzvf varfish-annotator-transcripts-20191129.tar.gz
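
The revised download command also fetches the ``.sha256`` files, but the listing never checks them. The following sketch verifies an archive before unpacking it. It assumes that each ``.sha256`` file refers to its archive by bare file name and that GNU ``sha256sum`` is available; ``verify_and_unpack`` is a hypothetical helper.

```bash
# Hypothetical helper: unpack ARCHIVE only if its checksum file validates.
verify_and_unpack() (
    archive=$1
    # sha256sum -c resolves file names relative to the current directory,
    # so change into the directory that holds the archive first.
    cd "$(dirname "$archive")" || exit 1
    name=$(basename "$archive")
    sha256sum -c "$name.sha256" && tar xzf "$name"
)

# Usage:
#   verify_and_unpack varfish-annotator-20191129.tar.gz
#   verify_and_unpack varfish-annotator-transcripts-20191129.tar.gz
```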
If you want to enable Kiosk mode, add the following lines to the ``.env`` file::
If you want to enable Kiosk mode, add the following lines to the ``.env`` file.

.. code-block:: bash
export VARFISH_KIOSK_MODE=1
export VARFISH_KIOSK_VARFISH_ANNOTATOR_REFSEQ_SER_PATH=/path/to/varfish-annotator-transcripts-20191129/hg19_refseq_curated.ser
@@ -32,7 +35,9 @@ If you want to enable Kiosk mode, add the following lines to the ``.env`` file::
Run
---

To run the kiosk mode, simply (re)start the web server and the celery server::
To run the kiosk mode, simply (re)start the web server and the celery server.

.. code-block:: bash
terminal1$ make serve
terminal2$ make celery
2 changes: 1 addition & 1 deletion utility/install_vue_dev.sh
@@ -3,7 +3,7 @@ echo "***********************************************"
echo "Installing Node.js"
echo "***********************************************"
curl -sL https://deb.nodesource.com/setup_12.x | sudo -E bash -
apt-get install -y nodejs
sudo apt-get install -y nodejs

echo "***********************************************"
echo "Installing Vue CLI and Init"