Revised parts of the developers documentation. (#274)

varfish-org · Jan 11, 2022 · 33ffcdb
1 parent 179358a

Showing 4 changed files with 164 additions and 90 deletions.
diff --git a/docs_manual/developer_database.rst b/docs_manual/developer_database.rst
@@ -4,94 +4,119 @@
 Database Import
 ===============
 
-To prepare the VarFish database, follow `the instructions for the VarFish DB Downloader <https://github.com/bihealth/varfish-db-downloader>`_.
-Downloading and processing the data can take multiple days.
+First, download the pre-built database files that we provide and unpack them.
+Please make sure that you have enough free disk space: the packed archive
+takes up 31 GB, and unpacking it requires an additional 188 GB.
 
-The VarFish DB Downloader working folder consumes 1.7Tb for GRCh37 and 5.4Tb for GRCh38.
-The pre-computed tables for VarFish consume 208Gb and the final
-postgres database consumes 500Gb. Please make sure that there is enough free
-space available. However, we recommend to exclude the large databases:
-Frequency tables, extra annotations and dbSNP. Also, keep in mind that
-importing the whole database takes >24h, depending on the speed of your HDD.
+.. code-block:: bash
 
-In the future, we plan to provide a pre-build package for import.
+    $ cd /plenty/space
+    $ wget https://file-public.bihealth.org/transient/varfish/varfish-server-background-db-20201006.tar.gz{,.sha256}
+    $ sha256sum -c varfish-server-background-db-20201006.tar.gz.sha256
+    $ tar xzvf varfish-server-background-db-20201006.tar.gz
+
+We recommend excluding the large databases: frequency tables, extra
+annotations, and dbSNP. Also keep in mind that importing the whole database
+takes more than 24 hours, depending on the speed of your hard disk.
 
 This is a list of the possible imports, sorted by size:
 
-===================  ====  ==================  ===================================
+===================  ====  ==================  =============================
 Component            Size  Exclude             Function
-===================  ====  ==================  ===================================
+===================  ====  ==================  =============================
 gnomAD_genomes       80G   highly recommended  frequency annotation
-extra_annos          57G   highly recommended  diverse
-dbSNP                56G   highly recommended  SNP annotation
-gnomAD_exomes        6.0G  highly recommended  frequency annotation
-knowngeneaa          4.5G  highly recommended  multiz alignment of 100 vertebrates
-clinvar              2.4G  highly recommended  pathogenicity classification
-ExAC                 1.9G  highly recommended  frequency annotation
-dbVar                623M  recommended         SNP annotation
-thousand_genomes     312M  recommended         frequency annotation
-gnomAD_SV            218M  recommended         SV frequency annotation
-DGV                  88M   yes, import broken  SV annotation
-ensembl_regulatory   68M   yes, import broken  frequency annotation
-gnomAD_constraints   13M   yes, import broken  frequency annotation
-ensembltorefseq      8.6M                      identifier mapping
-hgmd_public          6.3M  yes, import broken  gene annotation
-ExAC_constraints     4.8M  yes, import broken  frequency annotation
-hgnc                 3.3M  yes, import broken  gene annotation
-ensembltogenesymbol  1.8M  yes, import broken  identifier mapping
-ensembl_genes        1.3M                      gene annotation
-HelixMTdb            1.1M  yes, import broken  MT frequency annotation
-MITOMAP              1.1M  yes, import broken  MT frequency annotation
-refseq_genes         1.1M                      gene annotation
-mtDB                 514K  yes, import broken  MT frequency annotation
-tads_hesc            258K                      domain annotation
-tads_imr90           258K                      domain annotation
-===================  ====  ==================  ===================================
+extra-annos          50G   highly recommended  diverse
+dbSNP                32G   highly recommended  SNP annotation
+thousand_genomes     6.5G  highly recommended  frequency annotation
+gnomAD_exomes        6.0G  highly recommended  frequency annotation
+knowngeneaa          4.5G  highly recommended  alignment annotation
+clinvar              3.3G  highly recommended  pathogenicity classification
+ExAC                 1.9G  highly recommended  frequency annotation
+dbVar                573M  recommended         SNP annotation
+gnomAD_SV            250M  recommended         SV frequency annotation
+ncbi_gene            151M                      gene annotation
+ensembl_regulatory   77M                       frequency annotation
+DGV                  43M                       SV annotation
+hpo                  22M                       phenotype information
+hgnc                 15M                       gene annotation
+gnomAD_constraints   13M                       frequency annotation
+mgi                  10M                       mouse gene annotation
+ensembltorefseq      8.3M                      identifier mapping
+hgmd_public          5.0M                      gene annotation
+ExAC_constraints     4.6M                      frequency annotation
+refseqtoensembl      2.0M                      identifier mapping
+ensembltogenesymbol  1.6M                      identifier mapping
+ensembl_genes        1.2M                      gene annotation
+HelixMTdb            1.2M                      MT frequency annotation
+refseqtogenesymbol   1.1M                      identifier mapping
+refseq_genes         804K                      gene annotation
+mim2gene             764K                      phenotype information
+MITOMAP              660K                      MT frequency annotation
+kegg                 632K                      pathway annotation
+mtDB                 336K                      MT frequency annotation
+tads_hesc            108K                      domain annotation
+tads_imr90           108K                      domain annotation
+vista                104K                      orthologous region annotation
+acmg                 16K                       disease gene annotation
+===================  ====  ==================  =============================
 
 You can find the ``import_versions.tsv`` file in the root folder of the
 package. This file determines which component (called ``table_group`` and
 represented as folder in the package) gets imported when the import command is
 issued. To exclude a table, simply comment out (``#``) or delete the line.
-Excluding tables that are not required for development can reduce time and space
-consumption.
+Excluding tables that are not required for development reduces import time and
+disk usage. The GRCh38 tables can be excluded as well.
 
 A space-saving version of the file could look like this::
 
-    build   table_group version
-    #GRCh37 clinvar 20210728
-    #GRCh37 dbSNP   b155
-    #GRCh37 dbVar   20210728
-    #GRCh37  DGV 2016
-    #GRCh37  DGV 2020
-    GRCh37  ensembl_genes   r104
-    #GRCh37  ensembl_regulatory  20210728
-    GRCh37  ensembltogenesymbol 20210728
-    #GRCh37  ensembltorefseq 20210728
-    #GRCh37 ExAC    r1
-    #GRCh37  ExAC_constraints    r0.3.1
-    #GRCh37 extra_annos 20210728
-    #GRCh37  gnomAD_constraints  v2.1.1
-    #GRCh37 gnomAD_exomes   r2.1.1
-    #GRCh37 gnomAD_genomes  r2.1.1
-    #GRCh37 gnomAD_SV   v2.1
-    #GRCh37  HelixMTdb   20200327
-    #GRCh37  hgmd_public ensembl_r104
-    #GRCh37  hgnc    20210728
-    #GRCh37 knowngeneaa 20210728
-    #GRCh37  MITOMAP 20210728
-    #GRCh37  mtDB    20210728
-    GRCh37  refseq_genes    r105
-    GRCh37  tads_hesc   dixon2012
-    GRCh37  tads_imr90  dixon2012
-    #GRCh37 thousand_genomes    phase3
-    #GRCh37  vista   20210728
-
-To perform the import, issue::
-
-    $ python manage.py import_tables --tables-path varfish-db-downloader
-
-Performing the import twice will automatically skip tables that are already imported.
-To re-import tables, add the ``--force`` parameter to the command::
+    build	table_group	version
+    GRCh37	acmg	v2.0
+    #GRCh37	clinvar	20200929
+    #GRCh37	dbSNP	b151
+    #GRCh37	dbVar	latest
+    GRCh37	DGV	2016
+    GRCh37	ensembl_genes	r96
+    GRCh37	ensembl_regulatory	latest
+    GRCh37	ensembltogenesymbol	latest
+    GRCh37	ensembltorefseq	latest
+    GRCh37	ExAC_constraints	r0.3.1
+    #GRCh37	ExAC	r1
+    #GRCh37	extra-annos	20200704
+    GRCh37	gnomAD_constraints	v2.1.1
+    #GRCh37	gnomAD_exomes	r2.1
+    #GRCh37	gnomAD_genomes	r2.1
+    #GRCh37	gnomAD_SV	v2
+    GRCh37	HelixMTdb	20190926
+    GRCh37	hgmd_public	ensembl_r75
+    GRCh37	hgnc	latest
+    GRCh37	hpo	latest
+    GRCh37	kegg	april2011
+    #GRCh37	knowngeneaa	latest
+    GRCh37	mgi	latest
+    GRCh37	mim2gene	latest
+    GRCh37	MITOMAP	20200116
+    GRCh37	mtDB	latest
+    GRCh37	ncbi_gene	latest
+    GRCh37	refseq_genes	r105
+    GRCh37	refseqtoensembl	latest
+    GRCh37	refseqtogenesymbol	latest
+    GRCh37	tads_hesc	dixon2012
+    GRCh37	tads_imr90	dixon2012
+    #GRCh37	thousand_genomes	phase3
+    GRCh37	vista	latest
+    #GRCh38	clinvar	20200929
+    #GRCh38	dbVar	latest
+    #GRCh38	DGV	2016
 
-    $ python manage.py import_tables --tables-path varfish-db-downloader --force
+To perform the import, issue:
+
+.. code-block:: bash
+
+    $ python manage.py import_tables --tables-path /plenty/space/varfish-server-background-db-20201006
 
+Performing the import twice will automatically skip tables that are already
+imported. To re-import tables, add the ``--force`` parameter to the command:
+
+.. code-block:: bash
+
+    $ python manage.py import_tables --tables-path /plenty/space/varfish-server-background-db-20201006 --force
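
The space requirement above (31 GB for the archive plus 188 GB once unpacked, roughly 219 GB in total) and the table exclusions in ``import_versions.tsv`` can be prepared with a small shell sketch. It assumes POSIX ``df`` and ``awk``; the helper names ``check_free_space`` and ``exclude_table_groups`` are hypothetical and not part of VarFish. The second helper comments out every GRCh38 line and every ``table_group`` you name, leaving the header and already commented lines untouched.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for preparing the background database import.

# Succeed if the filesystem holding DIR has at least REQUIRED_GB free.
# GB is treated as GiB (1024^3 bytes), which errs on the safe side.
check_free_space() {
    local dir="$1" required_gb="$2" avail_kb
    # `df -P -k` prints one unwrapped line per filesystem in 1024-byte blocks;
    # column 4 of the second line is the available space.
    avail_kb=$(df -P -k "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $(( required_gb * 1024 * 1024 )) ]
}

# Read import_versions.tsv from FILE and print it to stdout with GRCh38 lines
# and the given table groups commented out with a leading "#".
exclude_table_groups() {
    local file="$1"; shift
    awk -v excl="$*" '
        BEGIN { FS = "\t"; n = split(excl, e, " "); for (i = 1; i <= n; i++) skip[e[i]] = 1 }
        NR == 1 || /^#/ { print; next }   # keep the header and commented lines
        $1 == "GRCh38" || ($2 in skip) { print "#" $0; next }
        { print }
    ' "$file"
}
```

For example, run ``check_free_space /plenty/space 219`` before downloading, and ``exclude_table_groups import_versions.tsv gnomAD_genomes extra-annos dbSNP > import_versions.tsv.new`` after unpacking. Writing to a new file avoids truncating the input through the redirection; review it before replacing the original.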
diff --git a/docs_manual/developer_installation.rst b/docs_manual/developer_installation.rst
@@ -41,7 +41,9 @@ Install miniconda
 
 miniconda helps to set up encapsulated Python environments.
 This step is optional. You can also use pipenv, but in our experience,
-resolving the dependencies in pipenv is terribly slow::
+resolving the dependencies in pipenv is terribly slow.
+
+.. code-block:: bash
 
     $ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
     $ bash Miniconda3-latest-Linux-x86_64.sh -b -p ~/miniconda3
@@ -54,7 +56,9 @@ resolving the dependencies in pipenv is terribly slow::
 Clone git repository
 --------------------
 
-Clone the VarFish Server repository and switch into the checkout::
+Clone the VarFish Server repository and switch into the checkout.
+
+.. code-block:: bash
 
     $ git clone https://github.com/bihealth/varfish-server
     $ cd varfish-server
@@ -64,7 +68,9 @@ Clone the VarFish Server repository and switch into the checkout::
 Install Python Requirements
 ---------------------------
 
-With the conda/Python environment activated, install all the requirements::
+With the conda/Python environment activated, install all the requirements.
+
+.. code-block:: bash
 
     $ for i in requirements/*; do pip install -r "$i"; done
 
@@ -73,41 +79,79 @@ Setup Database
 --------------
 
 Use the tool provided in ``utility/`` to set up the database. The name for the
-database should be ``varfish``::
+database should be ``varfish``.
+
+.. code-block:: bash
 
     $ bash utility/setup_database.sh
 
+------------
+Setup Vue.js
+------------
+
+Use the tool provided in ``utility/`` to set up Vue.js.
+
+.. code-block:: bash
+
+    $ bash utility/install_vue_dev.sh
+
+Open an additional terminal, switch into the Vue app directory, and install
+the dependencies of the VarFish Vue app.
+
+.. code-block:: bash
+
+    $ cd varfish/vueapp
+    $ npm install
+
+When finished, start the Vue development server and keep this terminal open.
+
+.. code-block:: bash
+
+    $ npm run serve
+
 -------------
 Setup VarFish
 -------------
 
-First, create a ``.env`` file with the following content::
+First, create a ``.env`` file with the following content.
+
+.. code-block:: bash
 
     export DATABASE_URL="postgres://varfish:varfish@127.0.0.1/varfish"
     export CELERY_BROKER_URL=redis://localhost:6379/0
     export PROJECTROLES_ADMIN_OWNER=root
     export DJANGO_SETTINGS_MODULE=config.settings.local
 
-If you wish to enable structural variants, add the following line::
+If you wish to enable structural variants, add the following line.
+
+.. code-block:: bash
 
     export VARFISH_ENABLE_SVS=1
 
 To create the tables in the VarFish database, run the ``migrate`` command.
-This step can take a few minutes::
+This step can take a few minutes.
+
+.. code-block:: bash
 
     $ python manage.py migrate
 
 Once done, create a superuser for your VarFish instance. By default, the VarFish root user is named ``root`` (the
-setting can be changed in the ``.env`` file with the ``PROJECTROLES_ADMIN_OWNER`` variable)::
+setting can be changed in the ``.env`` file with the ``PROJECTROLES_ADMIN_OWNER`` variable).
+
+.. code-block:: bash
 
     $ python manage.py createsuperuser
 
-Last, download the icon sets for VarFish and make scripts, stylesheets and icons available::
+Last, download the icon sets for VarFish and make scripts, stylesheets and icons available.
+
+.. code-block:: bash
 
     $ python manage.py geticons -c bi cil fa-regular fa-solid gridicons octicon
     $ python manage.py collectstatic
 
-When done, open two terminals and start the VarFish server and the celery server::
+When done, open two terminals and start the VarFish server and the celery server.
+
+.. code-block:: bash
 
     terminal1$ make serve
     terminal2$ make celery
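
The ``.env`` content from the setup steps can also be written by a script. The sketch below is one possible way to automate it, not part of the VarFish tooling: ``write_env`` is a hypothetical helper, and loading the file with ``source .env`` is an assumption that works because every line uses ``export``. Whether the ``Makefile`` targets read ``.env`` on their own is not covered here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the development .env file to TARGET.
# Pass "1" as the second argument to also enable structural variants.
write_env() {
    local target="$1" enable_svs="${2:-0}"
    # The quoted 'EOF' keeps the shell from expanding anything in the heredoc.
    cat > "$target" <<'EOF'
export DATABASE_URL="postgres://varfish:varfish@127.0.0.1/varfish"
export CELERY_BROKER_URL=redis://localhost:6379/0
export PROJECTROLES_ADMIN_OWNER=root
export DJANGO_SETTINGS_MODULE=config.settings.local
EOF
    if [ "$enable_svs" = "1" ]; then
        echo 'export VARFISH_ENABLE_SVS=1' >> "$target"
    fi
}
```

For example, run ``write_env .env 1`` once, then ``source .env`` in each terminal before ``python manage.py migrate`` or ``make celery`` so that the variables reach those processes.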
diff --git a/docs_manual/developer_kiosk.rst b/docs_manual/developer_kiosk.rst
@@ -12,14 +12,17 @@ organizing your cases properly. The mode serves only as a way to try out VarFish
 Configuration
 -------------
 
-First, you need to download the VarFish annotator data (11Gb) and unpack it::
+First, you need to download the VarFish annotator data (11 GB) and unpack it.
 
-    $ wget https://file-public.bihealth.org/transient/varfish/varfish-annotator-20191129.tar.gz
-    $ wget https://file-public.bihealth.org/transient/varfish/varfish-annotator-transcripts-20191129.tar.gz
+.. code-block:: bash
+
+    $ wget https://file-public.bihealth.org/transient/varfish/varfish-annotator-{,transcripts-}20191129.tar.gz{,.sha256}
     $ tar xzvf varfish-annotator-20191129.tar.gz
     $ tar xzvf varfish-annotator-transcripts-20191129.tar.gz
 
-If you want to enable Kiosk mode, add the following lines to the ``.env`` file::
+If you want to enable Kiosk mode, add the following lines to the ``.env`` file.
+
+.. code-block:: bash
 
     export VARFISH_KIOSK_MODE=1
     export VARFISH_KIOSK_VARFISH_ANNOTATOR_REFSEQ_SER_PATH=/path/to/varfish-annotator-transcripts-20191129/hg19_refseq_curated.ser
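
The ``wget`` command above also fetches a ``.sha256`` file for each archive, but the steps do not check them. A minimal sketch that verifies each archive before unpacking it follows. It assumes GNU ``sha256sum``, that it runs in the download directory, and that each checksum file uses the ``<hash>  <file name>`` format written by ``sha256sum``; ``verify_and_unpack`` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each ARCHIVE, check ARCHIVE.sha256 and only then
# unpack it. Stops at the first failure so a broken download is never unpacked.
verify_and_unpack() {
    local archive
    for archive in "$@"; do
        # sha256sum -c checks every "<hash>  <file>" line of the checksum file.
        sha256sum -c "${archive}.sha256" || return 1
        tar xzf "$archive" || return 1
    done
}
```

For example: ``verify_and_unpack varfish-annotator-20191129.tar.gz varfish-annotator-transcripts-20191129.tar.gz``.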
@@ -32,7 +35,9 @@ If you want to enable Kiosk mode, add the following lines to the ``.env`` file::
 Run
 ---
 
-To run the kiosk mode, simply (re)start the webserver server and the celery server::
+To run VarFish in kiosk mode, simply (re)start the web server and the celery server.
+
+.. code-block:: bash
 
     terminal1$ make serve
     terminal2$ make celery

diff --git a/utility/install_vue_dev.sh b/utility/install_vue_dev.sh
@@ -3,7 +3,7 @@ echo "***********************************************"
 echo "Installing Node.js"
 echo "***********************************************"
 curl -sL https://deb.nodesource.com/setup_12.x | sudo -E bash -
-apt-get install -y nodejs
+sudo apt-get install -y nodejs
 
 echo "***********************************************"
 echo "Installing Vue CLI and Init"
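
The change above prefixes ``apt-get`` with ``sudo``, matching the ``curl -sL https://deb.nodesource.com/setup_12.x | sudo -E bash -`` line, because installing packages requires root when the script is run by a regular user. If the script may also run as root (for example in a container without ``sudo`` installed), one common pattern is to add ``sudo`` only when needed. The sketch below assumes that scenario; ``sudo_prefix`` is a hypothetical helper, not part of the repository.

```shell
#!/usr/bin/env bash
# Print "sudo" for a non-root UID and nothing for root (UID 0).
# The UID defaults to the current user's; passing it explicitly eases testing.
sudo_prefix() {
    local uid="${1:-$(id -u)}"
    if [ "$uid" -eq 0 ]; then
        echo ""
    else
        echo "sudo"
    fi
}

SUDO=$(sudo_prefix)

# The two install lines rewritten with the prefix (commented out so the sketch
# has no side effects). ${SUDO:+$SUDO -E} expands to "sudo -E" only when SUDO
# is non-empty, because -E is a sudo option and would break a bare "bash".
#   curl -sL https://deb.nodesource.com/setup_12.x | ${SUDO:+$SUDO -E} bash -
#   $SUDO apt-get install -y nodejs
```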