HIV-TRACE is an application that identifies potential transmission clusters within a supplied FASTA file with an option to find potential links against the Los Alamos HIV Sequence Database.
- gcc >= 6.0.0
- python3 >= 3.5.1
- tn93 >= 1.0.6
HIV-TRACE requires both tn93 and python3 to be installed.
Install using pip
pip3 install biopython
pip3 install numpy
pip3 install scipy
pip3 install hivtrace
Tested with Python
hivtrace -i ./INPUT.FASTA -a resolve -r HXB2_prrt -t .015 -m 500 -g .05 -c
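If you prefer to launch the run from a Python script, the example command can be wrapped with `subprocess`. This is a minimal sketch: the argument list is copied verbatim from the command above, while `build_command` and `run_hivtrace` are hypothetical helper names and not part of HIV-TRACE.

```python
import shlex
import shutil
import subprocess

# Arguments copied verbatim from the example command above.
EXAMPLE_ARGS = ["-i", "./INPUT.FASTA", "-a", "resolve", "-r", "HXB2_prrt",
                "-t", ".015", "-m", "500", "-g", ".05", "-c"]


def build_command(args, executable="hivtrace"):
    """Return the full argv list for a hivtrace invocation."""
    return [executable, *args]


def run_hivtrace(args, executable="hivtrace"):
    """Run hivtrace and raise if it is not installed or exits with an error."""
    if shutil.which(executable) is None:
        raise FileNotFoundError(f"{executable} was not found on PATH")
    return subprocess.run(build_command(args, executable), check=True)


# Print the command line without executing it.
print(shlex.join(build_command(EXAMPLE_ARGS)))
```

Calling `run_hivtrace(EXAMPLE_ARGS)` would execute the command, provided that `hivtrace` is on your `PATH` and `./INPUT.FASTA` exists.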
A FASTA file, with nucleotide sequences to be analyzed. Each sequence will be aligned to the chosen reference sequence prior to network inference. Sequence names may include munged attributes, e.g. ISOLATE_XYZ|2005|SAN DIEGO|MSM
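As an illustration of how such pipe-delimited names can be unpacked downstream, here is a small sketch. The field labels (`isolate`, `year`, `location`, `risk_group`) are assumptions made for this example; HIV-TRACE does not prescribe them.

```python
def parse_sequence_name(name, fields=("isolate", "year", "location", "risk_group"), sep="|"):
    """Split a delimited sequence name into a dict of labeled attributes.

    The field labels are hypothetical; adjust them to match your own naming scheme.
    """
    values = name.split(sep)
    return dict(zip(fields, values))


attributes = parse_sequence_name("ISOLATE_XYZ|2005|SAN DIEGO|MSM")
print(attributes)
```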
Handle ambiguous nucleotides using one of the following strategies.
|resolve||count any resolutions that match as a perfect match|
|average||average all possible resolutions|
|skip||skip all positions with ambiguities|
|gapmm||count character-gap positions as 4-way mismatches, otherwise same as average|
For more details, please see the MBE paper.
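To make the difference between the strategies concrete, the sketch below applies `resolve`, `average`, and `skip` to a simple proportion-of-mismatches distance. It is a deliberate simplification and not the TN93 model or the tn93 implementation: it assumes gap-free, equal-length sequences, omits `gapmm`, and uses hypothetical function names.

```python
from itertools import product

# IUPAC nucleotide codes and the bases each one can resolve to.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def site_mismatch(a, b, strategy):
    """Mismatch contribution of one site in [0, 1], or None if the site is skipped."""
    ra, rb = IUPAC[a], IUPAC[b]
    if len(ra) == 1 and len(rb) == 1:
        return 0.0 if a == b else 1.0
    if strategy == "skip":
        return None
    if strategy == "resolve" and set(ra) & set(rb):
        return 0.0  # some resolution matches, so count a perfect match
    pairs = list(product(ra, rb))
    return sum(x != y for x, y in pairs) / len(pairs)  # average over resolutions


def p_distance(seq1, seq2, strategy="resolve"):
    """Proportion of mismatching sites (a stand-in for TN93, for illustration only)."""
    scores = [site_mismatch(a, b, strategy) for a, b in zip(seq1.upper(), seq2.upper())]
    scores = [s for s in scores if s is not None]
    return sum(scores) / len(scores) if scores else None


for strategy in ("resolve", "average", "skip"):
    print(strategy, p_distance("ACGR", "ATGA", strategy))
```

In this toy pair the ambiguous `R` (A or G) faces an `A`: `resolve` treats the site as a match, `average` counts it as half a mismatch, and `skip` drops the site from the comparison.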
The reference sequence to which all provided sequences will be aligned. It is assumed that the input sequences are in fact homologous to the reference and do not have too much indel variation.
|Path/to/FASTA/file||Path to a custom reference file|
Please refer to the landmarks of the HIV-1 genome if the presets are unfamiliar to you.
Two sequences will be connected with a putative link (subject to filtering, see below), if and only if their pairwise distance does not exceed this threshold.
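The thresholding rule can be sketched as follows, before any of the filtering mentioned above. The pairwise distances are made-up numbers for illustration, and `link_pairs` is a hypothetical helper, not an HIV-TRACE function.

```python
def link_pairs(distances, threshold):
    """Return the pairs whose distance does not exceed the threshold."""
    return [pair for pair, distance in distances.items() if distance <= threshold]


# Made-up pairwise distances keyed by (sequence_a, sequence_b).
example_distances = {
    ("seq1", "seq2"): 0.009,
    ("seq1", "seq3"): 0.015,
    ("seq2", "seq3"): 0.021,
}

print(link_pairs(example_distances, threshold=0.015))
```

Note that a distance exactly equal to the threshold still produces a link, since the rule is "does not exceed".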
Only sequences that overlap by at least this many non-gap characters will be included in distance calculations. Be sure to adjust this based on the length of the input sequences. You should aim to have at least 2/(distance threshold) aligned characters.
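The rule of thumb above is simple arithmetic. The sketch below computes the smallest whole number of aligned characters that satisfies 2/(distance threshold); `suggested_min_overlap` is a hypothetical helper name.

```python
import math


def suggested_min_overlap(distance_threshold):
    """Smallest integer overlap that is at least 2 / distance_threshold."""
    return math.ceil(2 / distance_threshold)


print(suggested_min_overlap(0.015))
```

For a distance threshold of 0.015 this gives 134 aligned characters, so an overlap setting of 500 comfortably meets the guideline.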
Affects only the Resolve option for handling ambiguities. Any sequence whose proportion of ambiguous nucleotides does not exceed the selected value [0 - 1] will have its ambiguities resolved (if possible); ambiguities in sequences with a higher proportion will be averaged. This mitigates spurious linkages due to highly ambiguous sequences.
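A rough sketch of that decision is shown below. It assumes that an ambiguous character is anything other than A, C, G, or T, and that gap characters are excluded from the denominator; both are assumptions of this example, and tn93 may count differently. The function names and the 0.05 default are illustrative only.

```python
def ambiguity_fraction(sequence):
    """Fraction of non-gap characters that are not A, C, G, or T."""
    bases = [c for c in sequence.upper() if c != "-"]
    if not bases:
        return 0.0
    ambiguous = sum(c not in "ACGT" for c in bases)
    return ambiguous / len(bases)


def strategy_for(sequence, max_fraction=0.05):
    """Resolve ambiguities for mostly clean sequences, average them otherwise."""
    return "resolve" if ambiguity_fraction(sequence) <= max_fraction else "average"


print(strategy_for("ACGTACGTACGTACGTACGR"))  # 1 ambiguous base out of 20
print(strategy_for("ACGTNNNNAC"))            # 4 ambiguous bases out of 10
```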
Screen for contaminants by marking or removing sequences that cluster with any of the contaminant IDs.
|remove||Remove spurious edges from the inferred network|
|report||Flag all sequences sharing a cluster with the reference|
|separately||Flag all sequences and report them via secondary tn93 command|
Use a phylogenetic test of conditional independence on each triangle in the network to remove spurious transitive connections which make A->B->C chains look like A-B-C triangles.
|remove||removes spurious transitive connections|
|report||reports spurious transitive connections|
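The filter operates on triangles, so its first step is to find every set of three mutually connected sequences. The sketch below shows only that enumeration step on a made-up edge list; the phylogenetic test of conditional independence itself is not reproduced here, and `find_triangles` is a hypothetical helper.

```python
from itertools import combinations


def find_triangles(edges):
    """Return all triples of nodes that are pairwise connected."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    triangles = []
    for a, b, c in combinations(sorted(neighbors), 3):
        if b in neighbors[a] and c in neighbors[a] and c in neighbors[b]:
            triangles.append((a, b, c))
    return triangles


# Made-up network: A, B, and C form a triangle; D hangs off C.
example_edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
print(find_triangles(example_edges))
```

This brute-force scan over all node triples is fine for small clusters; a large network would need a neighbor-intersection approach instead.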
Masks known DRAM (drug resistance-associated mutation) positions in the provided sequences.
|lewis||Mask (with ---) the list of codon sites defined in Lewis et al.|
|wheeler||Mask (with ---) the list of codon sites defined in Wheeler et al.|
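Masking a codon site means replacing its three nucleotides with `---`. The sketch below illustrates the idea on a toy sequence. The codon numbers are placeholders, not the Lewis et al. or Wheeler et al. lists, and the 1-based, in-frame numbering from the start of the sequence is an assumption of this example.

```python
def mask_codons(sequence, codon_sites, mask="---"):
    """Replace each listed codon (1-based, in frame from position 1) with the mask."""
    chars = list(sequence)
    for site in codon_sites:
        start = (site - 1) * 3
        if start + 3 <= len(chars):  # ignore sites beyond the end of the sequence
            chars[start:start + 3] = mask
    return "".join(chars)


# Placeholder codon sites 2 and 4 in a 4-codon toy sequence.
print(mask_codons("ATGAAACCCGGG", [2, 4]))
```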
Compare uploaded sequences to all public sequences, which are retrieved periodically from the Los Alamos HIV Sequence Database.
Specify output filename. If no output filename is provided, then the output filename will be <input_filename>.results.json
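If you post-process results in Python, the default naming rule above can be reproduced as in this sketch. `default_output_path` and `load_results` are hypothetical helpers, and nothing is assumed here about the structure of the JSON beyond it being valid JSON.

```python
import json


def default_output_path(input_filename):
    """Apply the documented default: <input_filename>.results.json."""
    return f"{input_filename}.results.json"


def load_results(input_filename, output_filename=None):
    """Load the results JSON from the given output path, or from the default path."""
    path = output_filename or default_output_path(input_filename)
    with open(path) as handle:
        return json.load(handle)


print(default_output_path("./INPUT.FASTA"))
```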
Viewing JSON files
You can either use the command
hivtrace_viz <path_to_json_file> or visit
https://veg.github.io/hivtrace-viz/ and click Load File.