Commit

formatting document
tkmamidi committed Jan 25, 2024
1 parent 7e5e303 commit a150750
Showing 4 changed files with 15 additions and 20 deletions.
16 changes: 5 additions & 11 deletions README.md
@@ -7,13 +7,11 @@ Markdown](https://github.com/uab-cgds-worthey/DITTO/actions/workflows/linting.ym

***!!! For research purposes only !!!***

> **_NOTE:_** In a past life, DITTO used a different remote Git management provider, [UAB
> ***NOTE:*** In a past life, DITTO used a different remote Git management provider, [UAB
> Gitlab](https://gitlab.rc.uab.edu/center-for-computational-genomics-and-data-science/sciops/ditto). It was migrated to
> Github in April 2023, and the Gitlab version has been archived.

**Aim:** We aim to develop a pipeline for accurate and rapid interpretation of genetic variants for pathogenicity using patient’s genotype (VCF) information.

We aim to develop a pipeline for accurate and rapid interpretation of genetic variants for pathogenicity using patient’s genotype (VCF) information.

## Usage

@@ -28,7 +26,7 @@ in this [GitHub repo](https://github.com/uab-cgds-worthey/DITTO-API).

### Setting up to use locally

> **_NOTE:_** This setup will allow one to annotate a VCF sample and make DITTO predictions. Currently tested only in Cheaha (UAB HPC). Docker versions may need to be explored later to make it
> ***NOTE:*** This setup will allow one to annotate a VCF sample and make DITTO predictions. Currently tested only in Cheaha (UAB HPC). Docker versions may need to be explored later to make it
> usable on Mac and Windows.
#### System Requirements
@@ -63,12 +61,11 @@ git clone https://github.com/uab-cgds-worthey/DITTO.git

Please follow the steps mentioned in [install_openCravat.md](docs/install_openCravat.md).

> **_NOTE:_** Current version of OpenCravat that we're using doesn't support "Spanning or overlapping deletions"
> ***NOTE:*** Current version of OpenCravat that we're using doesn't support "Spanning or overlapping deletions"
> variants, i.e., variants with `*` in the `ALT Allele` column. More on these variants
> [here](https://gatk.broadinstitute.org/hc/en-us/articles/360035531912-Spanning-or-overlapping-deletions-allele-).
> These will be ignored when running the pipeline.

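The note above says the pipeline ignores records whose `ALT` allele is `*`. If you would rather remove them before annotation (for example, so variant counts match between the input VCF and OpenCravat's output), a minimal `awk` sketch could look like this. It assumes a plain-text VCF with the standard column order (`ALT` in column 5) and drops any record in which at least one comma-separated `ALT` allele is exactly `*`. The function name `drop_spanning_deletions` is hypothetical and not part of DITTO.

```sh
#!/bin/bash
# Hypothetical helper: drop VCF records whose ALT column contains a
# spanning-deletion allele ("*"). Header lines are passed through unchanged.
# Reads the file given as $1, or stdin when no argument is given.
drop_spanning_deletions() {
  awk -F'\t' '
    /^#/ { print; next }               # keep ## meta lines and the #CHROM header
    {
      n = split($5, alts, ",")         # column 5 is ALT; it may list several alleles
      for (i = 1; i <= n; i++)
        if (alts[i] == "*") next       # skip the whole record
      print
    }
  ' "${1:--}"
}
```

Usage: `drop_spanning_deletions sample.vcf > sample.no_star.vcf`, or `zcat sample.vcf.gz | drop_spanning_deletions > sample.no_star.vcf` for compressed input (file names are placeholders). Dropping the whole record also discards any regular allele on the same line (e.g. `C,*`); splitting multi-allelic sites first with `bcftools norm -m-` would keep those.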
#### Run DITTO pipeline

Create an environment via conda or pip. Below is an example to install `nextflow`.
@@ -85,7 +82,6 @@ conda activate ditto-env
conda install bioconda::nextflow
```


Please make a samplesheet listing the VCF files (including their paths). Please make sure to edit the directory paths as needed.

```sh
@@ -103,14 +99,12 @@ To run on UAB Cheaha, please update the `model.job` file and submit a Slurm job
sbatch model.job
```


## Reproducing the DITTO model

Detailed instructions for reproducing the model are given in [build_DITTO.md](docs/build_DITTO.md)


## Contact information

For queries, send an email with a clear description to

Tarun Mamidi - tmamidi@uab.edu
Tarun Mamidi - <tmamidi@uab.edu>
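
Several README hunks above only delete one of two consecutive blank lines, and the `docs/build_DITTO.md` diff that follows removes the colon from `### Requirements:`. These are the kinds of issues that markdownlint's MD012 (no-multiple-blanks) and MD026 (no-trailing-punctuation) rules report. As a rough local pre-check, here is a sketch of an `awk` function (hypothetical name `check_md_format`) that flags both patterns while skipping fenced code blocks. It is not the project's linter, whose configuration is not shown in this commit.

```sh
#!/bin/bash
# Hypothetical helper that flags two formatting issues fixed by hand in this commit:
#   - two or more consecutive blank lines
#   - ATX headings ending in a colon, e.g. "### Requirements:"
# Lines inside fenced code blocks are skipped. Exit status is 1 if anything is found.
check_md_format() {
  awk '
    FNR == 1 { in_fence = 0; prev_blank = 0 }          # reset state for each file
    /^```/   { in_fence = !in_fence; prev_blank = 0; next }
    in_fence { next }
    /^[[:space:]]*$/ {
      if (prev_blank) { printf "%s:%d: consecutive blank lines\n", FILENAME, FNR; found = 1 }
      prev_blank = 1
      next
    }
    { prev_blank = 0 }
    /^#+ .*:[[:space:]]*$/ { printf "%s:%d: heading ends with a colon\n", FILENAME, FNR; found = 1 }
    END { exit found }
  ' "$@"
}
```

For example, `check_md_format README.md docs/*.md` prints one `file:line: message` entry per hit and returns a non-zero status if anything was found, so it could gate a pre-commit hook.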
10 changes: 3 additions & 7 deletions docs/build_DITTO.md
@@ -11,7 +11,6 @@ sources.

:fire: DITTO is currently trained on variants from ClinVar and is not intended for clinical use.


## System Requirements

*OS:*
@@ -30,18 +29,16 @@ sources.
- Storage: ~1TB (includes annotation databases from OpenCravat)
- RAM: ~50GB

> **_NOTE:_** We used 10 CPU cores, 50GB memory for training DITTO. The tuning and training process took ~16 hrs. Since
> ***NOTE:*** We used 10 CPU cores, 50GB memory for training DITTO. The tuning and training process took ~16 hrs. Since
> DITTO uses the TensorFlow framework, this process can potentially be accelerated using GPUs.

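To see how a machine compares with the figures above (~50GB RAM, ~1TB storage, and the 10 CPU cores used for training) before starting a multi-hour run, a rough Linux-only sketch follows. It reads `nproc`, `/proc/meminfo`, and `df`, reports sizes in GiB (so a nominal 50GB machine may show slightly under 50), and only prints OK or LOW per resource. The thresholds are copied from this document; the optional directory argument is an assumption about where the data and OpenCravat databases will live.

```sh
#!/bin/bash
# Rough pre-flight check against the resources documented for training DITTO:
# ~50 GB RAM, ~1 TB free storage, and the 10 CPU cores used by the authors.
# Linux-only sketch: reads /proc/meminfo; sizes are reported in GiB.
target_dir=${1:-.}   # assumption: directory that will hold data and OpenCravat databases

cpu_count()    { nproc; }
mem_total_gb() { awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' /proc/meminfo; }
free_disk_gb() { df -Pk -- "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'; }

# report LABEL HAVE WANT UNIT: prints one line and marks it LOW if HAVE < WANT
report() {
  local status=OK
  if (( $2 < $3 )); then status=LOW; fi
  printf '%-8s %6s %-5s (want ~%s): %s\n' "$1" "$2" "$4" "$3" "$status"
}

report CPUs    "$(cpu_count)"                  10   cores
report RAM     "$(mem_total_gb)"               50   GB
report Storage "$(free_disk_gb "$target_dir")" 1000 GB
```

A LOW line is not necessarily a blocker: the CPU figure is what the authors happened to use, and fewer cores mainly means a longer tuning and training run.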
## Installation

### Requirements:
### Requirements

- DITTO repo from GitHub
- OpenCravat with databases to annotate


To fetch the DITTO source code, change into a directory of your choice and run:

```sh
@@ -75,7 +72,7 @@ Download the latest ClinVar variants: [Download VCF](https://ftp.ncbi.nlm.nih.go
oc run clinvar.vcf.gz -l hg38 -t csv --package mypackage -d path/to/output/directory/
```

> **_NOTE:_** By default OpenCravat uses all available CPUs. Please specify the number of CPU cores using this parameter
> ***NOTE:*** By default OpenCravat uses all available CPUs. Please specify the number of CPU cores using this parameter
> in the above command: `--mp 2`. The minimum number of CPUs to use is 2.
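
Following the note above, the `oc run` command can be given an explicit core count instead of OpenCravat's use-all-CPUs default. A minimal sketch, assuming the job runs under Slurm (as on Cheaha) and uses `SLURM_CPUS_PER_TASK` when it is set (Slurm sets it when the job requests `--cpus-per-task`), falling back to 2 otherwise and never going below the documented minimum of 2. The helper name `pick_mp_cores` is hypothetical, and the paths and package name are the same placeholders as in the command above. It prints the command for review rather than running it.

```sh
#!/bin/bash
# Hypothetical helper: choose a value for OpenCravat's --mp option,
# never below the documented minimum of 2.
pick_mp_cores() {
  local cores=${1:-2}
  if (( cores < 2 )); then
    cores=2
  fi
  echo "$cores"
}

# Use the Slurm allocation when available, otherwise 2.
mp=$(pick_mp_cores "${SLURM_CPUS_PER_TASK:-2}")

# Same command as in the docs, with --mp appended; paths and package name are placeholders.
oc_cmd=(oc run clinvar.vcf.gz -l hg38 -t csv --package mypackage -d path/to/output/directory/ --mp "$mp")

# Print the command for review; run it with: "${oc_cmd[@]}"
printf '%s ' "${oc_cmd[@]}"; echo
```

Inside a job script, replacing the final `printf` line with `"${oc_cmd[@]}"` runs OpenCravat directly. Tying `--mp` to the Slurm allocation can keep OpenCravat from starting more workers than the job was given on a shared node.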
## Preprocessing
@@ -116,7 +113,6 @@ Follow the below steps to install and add more databases for annotation and befo

4. Follow the steps from Preprocessing above to parse, filter, process, tune and train DITTO.


## Benchmarking

Please follow the [Python notebook](../src/analysis/opencravat_latest_benchmarking-Consequence_80_20.ipynb) to benchmark
5 changes: 5 additions & 0 deletions docs/install_openCravat.md
@@ -1,9 +1,11 @@
# OpenCravat

Original documentation for OpenCravat can be found [here](https://open-cravat.readthedocs.io/en/latest/index.html).

## Installation

### Create conda environment

```sh
# create conda environment. Needed only the first time.
conda create -n opencravat
@@ -13,9 +15,11 @@ conda activate opencravat
```

### Install openCravat

```sh
pip3 install open-cravat==2.4.1
```

### Set Modules Directory

Use `oc config md` to see where the modules directory currently points. To change the modules directory, use `oc
@@ -24,6 +28,7 @@ config md [new directory]` to point OpenCravat to the new directory.
Test it by running the `oc config md` command. It should output the new modules directory.

### Install necessary modules for DITTO

```sh
oc module install-base

4 changes: 2 additions & 2 deletions src/analysis/filter.sh
@@ -1,7 +1,7 @@
# Filter the DITTO scores and other annotations after running the pipeline. Example tested on CAGI project

#!/bin/bash

# Filter the DITTO scores and other annotations after running the pipeline. Example tested on CAGI project

# Specify the input folder containing the CSV files
input_folder="/data/project/worthey_lab/projects/experimental_pipelines/tarun/DITTO/data/processed/CAGI_TR/"

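The only change to `src/analysis/filter.sh` moves `#!/bin/bash` to the first line. That matters when the script is executed directly (e.g. `./filter.sh`): the kernel only treats `#!` as an interpreter directive when those are the first two bytes of the file, so with the comment above it the line was an ordinary comment and the shell used depended on the caller. A minimal sketch of a check for other scripts (the helper name `has_shebang_first` is hypothetical):

```sh
#!/bin/bash
# Hypothetical helper: succeed only if the file's first two bytes are "#!",
# the only position where the kernel looks for an interpreter line.
has_shebang_first() {
  [[ $(head -c 2 -- "$1") == '#!' ]]
}
```

For example, `for f in src/analysis/*.sh; do has_shebang_first "$f" || echo "no shebang on line 1: $f"; done` lists scripts that would hit the same issue (the glob is an assumption about where the scripts live).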