
TRIgS - Tools for Rendering Ig Sequences

A collection of bioinformatics tools for sequence analysis, with an emphasis on Next-Generation Sequencing and Rep-Seq.

Online versions of some tools are available on our website.

Tools for Clonal Analysis
Tools for Junction Parsing and Results Manipulation
Tools for FASTA file Manipulation

This document illustrates the use of TRIgS in combination with other tools in a recent analysis.

Tools for Clonal Analysis

ClusterSeqs partitions sequences into clusters using single-linkage clustering. In a recent analysis, 426,000 junction sequences were clustered in just under 90 minutes. NeighbourDist analyses nearest-neighbour distances to guide partitioning, and can down-sample to handle large datasets.
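As an illustration of what single-linkage clustering means here, the following is a minimal sketch and not the ClusterSeqs implementation. The function names, the use of Hamming distance, and the restriction to equal-length sequences are all assumptions made for the example. Two sequences end up in the same cluster if they are connected by a chain of neighbours, each within `max_dist` of the next.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def single_linkage(seqs, max_dist):
    """Cluster sequences by single linkage (hypothetical helper).

    Sequences joined by a chain of neighbours, each within max_dist,
    are placed in the same cluster. Uses a simple union-find structure.
    """
    parent = list(range(len(seqs)))

    def find(i):
        # Follow parent links to the cluster root, shortening the path as we go
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            # Assumption: only equal-length sequences are compared
            if len(seqs[i]) == len(seqs[j]) and hamming(seqs[i], seqs[j]) <= max_dist:
                parent[find(i)] = find(j)

    clusters = {}
    for i, seq in enumerate(seqs):
        clusters.setdefault(find(i), []).append(seq)
    # Largest clusters first
    return sorted(clusters.values(), key=len, reverse=True)
```

With `max_dist=1`, the sequences `AAAA` and `AATT` fall into one cluster if `AAAT` is also present, even though they differ from each other at two positions. The all-against-all comparison in this sketch is quadratic in the number of sequences, so it is only suitable for small inputs.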

ClusterGraph converts output from ClusterSeqs or CD-HIT into a form that can be imported by Gephi.

ClusterStats produces summary statistics for the samples clustered in a ClusterSeqs output file.


Example Gephi output, produced from the analysis of an NGS-based heavy-chain repertoire

AnnotateTreeCmd creates annotated lineage trees and sequence alignments showing the point at which amino acid substitutions occur. It uses PHYLIP’s dnaml for ancestral reconstruction. Sequence numbering can be defined by the user, for example to match the numbering of a crystal structure, or to match a standard numbering scheme. If the sequences represent a B-cell clonal lineage, additional reports relating to variation in the CDRs can be produced.

Part of a tree produced by AnnotateTreeCmd, showing amino acid substitutions

                                                         1         1         1     
                            7        8         9         0         1         2     
                   6784567890124567890123456789012345678901234567890123456789012345
consensus_germ_vdj YYSDSDKSTAQSVQGRFTASKDSSNLYLHMNQLKTEDSAVYYCA-EW-GAFDYWGKGTMVTVTS
216                SVG...N..................F...........T......R.RY................
605                SVG......................F...........T......R.RY..........NGHCHI
742                PVG......................F...........T......R.ID................
154                SVG......................F...........T......R.RY................


Part of an alignment produced by AnnotateTreeCmd, showing custom numbering including deletions

RevertToGermlineCmd uses a simple approach to infer the germline ancestor of a B-cell variable region sequence, given the IMGT junction analysis. If a clonal lineage is available, the inferred germline can be used to root a phylogenetic tree, from which a more accurate germline can then be inferred using AnnotateTreeCmd.

Tools for Junction Parsing and Results Manipulation

IgBLASTPlus processes the output from NCBI's IgBLAST, providing a full junction analysis and summarising results in an IMGT-style tab-separated format. This makes it possible to use an in-house copy of IgBLAST in place of IMGT/HighV-QUEST. IgBLAST analyses of 3-4 million records will typically complete in under an hour.

The following tools work on tab- or comma-separated files such as those produced by IgBLASTPlus, IMGT, or Change-O:

ExtractFromIMGT is a flexible tool for extracting sequences in FASTA format. Options allow filtering by germlines, restriction of sequences to specific regions, and so on.

PlotGermline creates histograms showing germline usage.

Spectratype creates histograms showing CDR3 length distribution.
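The counting step behind a CDR3 length distribution can be sketched as follows. This is not the Spectratype code: the function name is hypothetical, and the column name must be supplied by the caller because it depends on which tool produced the file.

```python
import csv
from collections import Counter


def cdr3_length_counts(lines, column, delimiter="\t"):
    """Count CDR3 lengths in a delimited file (hypothetical helper).

    lines: an open file or any iterable of text lines, header first.
    column: name of the column holding the CDR3 or junction sequence.
    Rows with an empty value in that column are skipped.
    """
    reader = csv.DictReader(lines, delimiter=delimiter)
    return Counter(len(row[column]) for row in reader if row.get(column))
```

The returned `Counter` maps each length to the number of records with that length, which can then be passed to any plotting library to draw the histogram.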

AbIdentity calculates sequence identities compared to a target sequence and a germline.
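For readers unfamiliar with the measure, one common definition of sequence identity is sketched below. It assumes the two sequences have already been aligned to the same length, and it ignores positions where either sequence has a gap. AbIdentity may define or compute identity differently; the function name here is hypothetical.

```python
def percent_identity(a, b, gap="-"):
    """Percentage of matching positions between two pre-aligned sequences.

    Positions where either sequence has a gap character are excluded
    from both the match count and the denominator (an assumption of
    this sketch, not necessarily the convention used by AbIdentity).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)
```

Calling the function once against the target sequence and once against the germline gives the two values needed for an identity/divergence comparison.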

PlotIdentity creates repertoire identity/divergence plots.

ClusterExtract uses the sequence IDs in a ClusterSeqs output file to extract all records corresponding to a nominated cluster.

Tools for FASTA file Manipulation

A collection of small utilities that have proved useful in analysis pipelines.

CountRecords counts the number of records in a file, accounting for duplicates noted in the header.

FastaMatch filters records whose sequence or ID matches a regular expression.
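The idea can be sketched with the standard library alone. This is an illustration, not the FastaMatch source: the function names and the `field` parameter are hypothetical, the whole header line is treated as the ID, and the sketch assumes that matching records are the ones kept.

```python
import re


def read_fasta(lines):
    """Yield (record_id, sequence) pairs from an iterable of FASTA lines."""
    rec_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if rec_id is not None:
                yield rec_id, "".join(chunks)
            # Assumption: the full header line (minus '>') is the ID
            rec_id, chunks = line[1:].strip(), []
        elif line and rec_id is not None:
            chunks.append(line)
    if rec_id is not None:
        yield rec_id, "".join(chunks)


def fasta_match(lines, pattern, field="seq"):
    """Yield records whose sequence (field='seq') or ID (field='id')
    contains a match for the regular expression."""
    regex = re.compile(pattern)
    for rec_id, seq in read_fasta(lines):
        target = rec_id if field == "id" else seq
        if regex.search(target):
            yield rec_id, seq
```

Because both functions are generators, records are processed one at a time and the input file never has to be held in memory.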

FastaSample extracts a random sample of records.

FastaUniq removes duplicate records (identified by ID).
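De-duplication by ID amounts to remembering which IDs have already been seen. The sketch below works on `(record_id, sequence)` pairs and keeps the first record for each ID; whether FastaUniq keeps the first or a later occurrence is not stated here, so that choice is an assumption, as is the function name.

```python
def unique_by_id(records):
    """Yield (record_id, sequence) pairs, dropping any record whose ID
    has already been seen. The first occurrence of each ID is kept."""
    seen = set()
    for rec_id, seq in records:
        if rec_id not in seen:
            seen.add(rec_id)
            yield rec_id, seq
```

The original record order is preserved, and memory use grows only with the number of distinct IDs.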

FastaSampleUniq counts the number of unique sequences in a set of samples, accounting for duplicates noted in the header.

Installation and Usage

To use the tools, clone this repository or download and unzip the Zip file. All tools require Python 2 (version 2.7 or later) and Biopython. Additional dependencies and usage information are given in the links above for each tool.

Further Information

William D. Lees and Adrian J. Shepherd, “Utilities for High-Throughput Analysis of B-Cell Clonal Lineages,” Journal of Immunology Research, vol. 2015, Article ID 323506, 9 pages, 2015. doi:10.1155/2015/323506

Contact

william@lees.org.uk