TensorSV: Tensor Based Structural Variation Analysis

Moment Based SV Calling and Genotyping

T. Becker and D.G. Shin, "TensorSV: structural variation inference
using tensors and variable topology neural networks",
2020 IEEE BIBM, Seoul, Korea (South), 2020, pp. 1356-1360

Requirements:

python 3.6+
cython 0.29+
numpy 1.18+
matplotlib 3.2.1
h5py 2.10
pysam 0.15.2
hfm 0.1.8
tensorflow 1.15.0 (works with tensorflow-gpu 1.15.0 for GPU as well)
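
To see which of the packages listed above are already present in an environment, you can query their installed versions. The sketch below is not part of TensorSV. It uses only the standard library's importlib.metadata (Python 3.8+, so newer than the 3.6 minimum listed above), and it assumes the installed distribution names match the names in the list.

```python
from importlib import metadata

# Minimum or pinned versions copied from the requirements list above.
REQUIRED = {
    "cython": "0.29",
    "numpy": "1.18",
    "matplotlib": "3.2.1",
    "h5py": "2.10",
    "pysam": "0.15.2",
    "hfm": "0.1.8",
    "tensorflow": "1.15.0",
}


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def report(required=REQUIRED):
    """Return one human-readable line per requirement."""
    lines = []
    for name, wanted in required.items():
        have = installed_versions([name])[name]
        status = have if have is not None else "not installed"
        lines.append(f"{name}: wanted {wanted}, found {status}")
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

The report only lists versions side by side; it does not compare them, so checking each line against the wanted version is left to the reader.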

PIP Installation:

python3 -m pip install https://github.com/timothyjamesbecker/TensorSV/releases/download/0.0.1/tensorsv-0.0.1.tar.gz

Basic Usage:

(1) Start by extracting features from the BAM files using the hfm package. The script used here, extractor.py, is a high-level, multi-BAM-aware extraction runner that ships with the hfm package. You can install hfm from its git repo: https://github.com/timothyjamesbecker/hfm

extractor.py \
--ref_path ./reference_sequence.fa \
--in_path ./folder_of_bam_files/ \
--out_dir ./output_hdf5_files/ \
--seqs chr1,chr2, ... ,chr22,chrX,chrY,chrM \
--window 25 \
--branch 2 \
--cpus 12
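
If extraction is launched from another Python script, the command above can be assembled as an argument list. The sketch below uses only the flags shown in this README. The helper names are hypothetical, and the chromosome list assumes the elided part of --seqs stands for the remaining human autosomes, which the README does not spell out.

```python
import shlex


def human_seqs(prefix="chr"):
    """Assumed hg38-style names: autosomes 1-22, then X, Y and M."""
    names = [str(i) for i in range(1, 23)] + ["X", "Y", "M"]
    return [prefix + n for n in names]


def extractor_argv(ref_path, in_path, out_dir, seqs, window=25, branch=2, cpus=12):
    """Build the argument list for extractor.py from the flags shown above."""
    return [
        "extractor.py",
        "--ref_path", ref_path,
        "--in_path", in_path,
        "--out_dir", out_dir,
        "--seqs", ",".join(seqs),
        "--window", str(window),
        "--branch", str(branch),
        "--cpus", str(cpus),
    ]


argv = extractor_argv(
    "./reference_sequence.fa",
    "./folder_of_bam_files/",
    "./output_hdf5_files/",
    human_seqs(),
)
# Shell-quoted form of the same command (shlex.join needs Python 3.8+).
print(shlex.join(argv))
```

Once extractor.py is on your PATH, the same list can be passed to subprocess.run(argv, check=True).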

(2) Next, normalize and standardize the HFM hdf5 files, and capture targets if training is desired, using the TensorSV script data_prep.py shown below. The script can process samples in parallel, so setting --cpus to the number of samples is suggested when you have enough processors and memory. This step produces one *.norm.hdf file and one *.label.hdf file per sample. For training, you can run the data_prep.merge_samples function to mix together any samples that have undergone this process.

data_prep.py \
--vcf_in_path ./hgsv_hg38_hfm_server/hgsv.illumina.hg38.all.geno.vcf.gz \
--hfm_in_path ./hgsv_hg38_hfm_server/ \
--out_path ./hgsv_hg38_hfm_server/ \
--cpus 9
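
Because data_prep.py parallelizes per sample, a reasonable --cpus value is the number of samples, capped by the cores available. The helper below is a hypothetical sketch, not part of TensorSV. It does not account for memory, which step (2) also names as a constraint.

```python
import os


def pick_cpus(n_samples, available=None):
    """Return min(samples, cores), never less than 1."""
    if available is None:
        available = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, min(n_samples, available))


print(pick_cpus(9, available=12))   # 9: one process per sample
print(pick_cpus(40, available=12))  # 12: capped by the available cores
```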

(3) Now you can either train a new SV model using train_sv.py or use an existing one in the next step.

train_sv.py \
--in_path ./hgsv_hg38_hfm_server/tensors/hgsv.hg38.labels.hdf5 \
--sub_sample_not 0.75 \
--out_path ./hgsv_hg38_hfm_server/cnn_75/ \
--sv_types DEL \
--filters all \
--form cnn \
--levels 2,4 \
--cmxs 2,4 \
--batches 32,64,128 \
--epochs 10,25 \
--decays 1e-5,2e-5 \
--split_seed 0 \
--gpu_num 0 \
--verbose
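
Several train_sv.py options above take comma-separated lists. The README does not say how these lists are combined. If one assumes every combination is trained (a full grid search, which is an assumption here and not confirmed by the source), the size of the run can be estimated before starting it, as in this sketch:

```python
from itertools import product

# Values copied from the train_sv.py command above.
grid = {
    "levels": [2, 4],
    "cmxs": [2, 4],
    "batches": [32, 64, 128],
    "epochs": [10, 25],
    "decays": [1e-5, 2e-5],
}


def combinations(grid):
    """Expand a dict of option lists into one dict per combination."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


configs = combinations(grid)
print(len(configs))  # 48 = 2 * 2 * 3 * 2 * 2
```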

(4) Now you can run predict_sv.py on the normalized hdf5 files from step (2). If you have used training diagnostics and your folder contains true and comparable calls, this will also produce metrics on your calls to show your model's accuracy.

predict_sv.py \
--base_dir ./hgsv_hg38_hfm_server/ \
--run_dir ./hgsv_hg38_hfm_server/cnn_75/ \
--out_dir ./hgsv_hg38_hfm_server/cnn_75_result/ \
--samples HG00096,HG00268 \
--sv_type DEL \
--seqs chr19,chr20,chr21,chr22 \
--gpu_num 0
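
The README does not describe how predict_sv.py scores calls. As an illustration only, one common convention for comparing deletion calls to a truth set is reciprocal overlap: a call matches a true event when the shared span covers at least a chosen fraction of both intervals. The sketch below computes precision, recall and F1 under that convention. The function names are hypothetical, the 50% threshold is chosen for illustration, and all intervals are assumed to lie on a single chromosome.

```python
def reciprocal_overlap(a, b):
    """Smaller of the two overlap fractions for half-open (start, end) intervals."""
    shared = min(a[1], b[1]) - max(a[0], b[0])
    if shared <= 0:
        return 0.0
    # Dividing by the longer interval gives the smaller of the two fractions.
    return shared / max(a[1] - a[0], b[1] - b[0])


def score_calls(calls, truth, min_overlap=0.5):
    """Greedy one-to-one matching; returns (precision, recall, f1)."""
    unmatched = list(truth)
    true_pos = 0
    for call in calls:
        best = max(unmatched, key=lambda t: reciprocal_overlap(call, t), default=None)
        if best is not None and reciprocal_overlap(call, best) >= min_overlap:
            unmatched.remove(best)  # each true event can be matched only once
            true_pos += 1
    precision = true_pos / len(calls) if calls else 0.0
    recall = true_pos / len(truth) if truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up intervals for demonstration, not real calls.
truth = [(100, 200), (500, 800), (1000, 1100)]
calls = [(110, 210), (490, 700), (2000, 2100)]
precision, recall, f1 = score_calls(calls, truth)
print(precision, recall, f1)
```

In this toy input two of the three calls match a true event, so precision and recall are both 2/3.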

