TRIgS - Tools for Rendering Ig Sequences

A collection of bioinformatics tools for sequence analysis, with an emphasis on Next-Generation Sequencing and Rep-Seq.

Online versions of some tools are available on our website.

Tools for Clonal Analysis
Tools for Junction Parsing and Results Manipulation
Tools for FASTA file Manipulation

A separate document illustrates the use of TRIgS in combination with other tools in a recent analysis.

Tools for Clonal Analysis

ClusterSeqs partitions sequences into clusters using single-linkage clustering. In a recent analysis, 426,000 junction sequences were clustered in just under 90 minutes. NeighbourDist analyses nearest-neighbour distances to guide partitioning, and can down-sample to handle large datasets.
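
As a rough illustration of what single-linkage clustering means here, the sketch below groups sequences so that any two members of a cluster are connected by a chain of neighbours, each within a distance threshold. It is not the ClusterSeqs implementation: the function names, the use of Hamming distance on equal-length sequences, and the all-pairs comparison are assumptions made for the example. An all-pairs loop like this one would not scale to hundreds of thousands of junctions.

```python
from itertools import combinations


def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def single_linkage(seqs, max_dist):
    """Group sequences linked by chains of neighbours within max_dist.

    Hypothetical helper for illustration only; uses a union-find structure.
    """
    parent = list(range(len(seqs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        # Assumption of this sketch: only equal-length sequences are compared.
        if len(seqs[i]) == len(seqs[j]) and hamming(seqs[i], seqs[j]) <= max_dist:
            parent[find(i)] = find(j)

    clusters = {}
    for i, s in enumerate(seqs):
        clusters.setdefault(find(i), []).append(s)
    # Largest clusters first
    return sorted(clusters.values(), key=len, reverse=True)


junctions = ["CARDY", "CARDF", "CAKDF", "CTTTT"]
for members in single_linkage(junctions, max_dist=1):
    print(members)
```

Note that CARDY and CAKDF differ at two positions yet fall in the same cluster, because each is within one mismatch of CARDF. This chaining is the defining property of single linkage.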

ClusterGraph converts output from ClusterSeqs or CD-HIT into a form that can be imported by Gephi.

ClusterStats produces summary statistics for the samples clustered in a ClusterSeqs output file.


Example Gephi output, produced from the analysis of an NGS-based heavy-chain repertoire

AnnotateTreeCmd creates annotated lineage trees and sequence alignments showing the point at which amino acid substitutions occur. It uses PHYLIP’s dnaml for ancestral reconstruction. Sequence numbering can be defined by the user, for example to match the numbering of a crystal structure, or to match a standard numbering scheme. If the sequences represent a B-cell clonal lineage, additional reports relating to variation in the CDRs can be produced.

Part of a tree produced by AnnotateTreeCmd, showing amino acid substitutions

                                                         1         1         1     
                            7        8         9         0         1         2     
                   6784567890124567890123456789012345678901234567890123456789012345
consensus_germ_vdj YYSDSDKSTAQSVQGRFTASKDSSNLYLHMNQLKTEDSAVYYCA-EW-GAFDYWGKGTMVTVTS
216                SVG...N..................F...........T......R.RY................
605                SVG......................F...........T......R.RY..........NGHCHI
742                PVG......................F...........T......R.ID................
154                SVG......................F...........T......R.RY................


Part of an alignment produced by AnnotateTreeCmd, showing custom numbering including deletions

RevertToGermlineCmd uses a simple approach to infer the germline ancestor of a B-cell variable region sequence, given the IMGT junction analysis. If a clonal lineage is available, the inferred germline can be used to root a phylogenetic tree, from which a more accurate germline can then be inferred using AnnotateTreeCmd.

Tools for Junction Parsing and Results Manipulation

IgBLASTPlus processes the output from NCBI's IgBLAST, providing a full junction analysis and summarising results in an IMGT-style tab-separated format. This makes it possible to use an in-house copy of IgBLAST in place of IMGT/HighV-QUEST. IgBLAST analyses of 3-4 million records will typically complete in under an hour.

The following tools work on tab- or comma-separated files such as those produced by IgBLASTPlus, IMGT, or Change-O:

ExtractFromIMGT is a flexible tool for extracting sequences in FASTA format. Options allow filtering by germlines, restriction of sequences to specific regions, and so on.

PlotGermline creates histograms showing germline usage.

Spectratype creates histograms showing CDR3 length distribution.
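
The counting step behind a CDR3 length histogram can be sketched as follows. This is an illustration only, not the Spectratype code: the function name is hypothetical, and the column name `CDR3-IMGT` and the sample rows are assumptions chosen for the example. Records with an empty CDR3 field are skipped.

```python
import csv
from collections import Counter


def cdr3_length_counts(lines, column, delimiter="\t"):
    """Count CDR3 lengths in a delimited file, given as an iterable of lines."""
    counts = Counter()
    for row in csv.DictReader(lines, delimiter=delimiter):
        cdr3 = (row.get(column) or "").strip()
        if cdr3:  # ignore records with no CDR3 reported
            counts[len(cdr3)] += 1
    return counts


# Made-up records for illustration
data = (
    "Sequence ID\tCDR3-IMGT\n"
    "s1\tARDYW\n"
    "s2\tARDFW\n"
    "s3\tARGGYFDY\n"
    "s4\t\n"
)
counts = cdr3_length_counts(data.splitlines(), "CDR3-IMGT")
for length in sorted(counts):
    print(length, counts[length])
```

The resulting counts can be passed to any plotting library to draw the histogram.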

AbIdentity calculates sequence identities compared to a target sequence and a germline.
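
One common way to define sequence identity is the percentage of matching positions between two aligned sequences, ignoring gap columns. The sketch below uses that definition. Whether AbIdentity uses the same definition is not stated here, so treat the function name and the gap handling (`-` and `.`) as assumptions.

```python
def percent_identity(seq, ref):
    """Percentage of identical positions between two aligned sequences.

    Columns where either sequence has a gap ('-' or '.') are not counted.
    """
    if len(seq) != len(ref):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq.upper(), ref.upper()):
        if a in "-." or b in "-.":
            continue
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Made-up sequences: identity of a query to a target and to a germline
query = "ACGA"
print(percent_identity(query, "ACGT"))  # versus target
print(percent_identity(query, "ACGA"))  # versus germline
```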

PlotIdentity creates repertoire identity/divergence plots.

ClusterExtract uses the sequence IDs in a ClusterSeqs output file to extract all records corresponding to a nominated cluster.

Tools for FASTA file Manipulation

A collection of small utilities that have proved useful in analysis pipelines.

CountRecords counts the number of records in a file, accounting for duplicates noted in the header.

FastaMatch filters records whose sequence or ID matches a regular expression.
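
The idea of filtering FASTA records by regular expression can be sketched with the standard library alone. The tools themselves use BioPython; the small parser and the function names below are hypothetical stand-ins so that the example runs without extra dependencies. The pattern is applied to the full header line, or to the sequence when `field="sequence"`.

```python
import re


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def fasta_match(records, pattern, field="header"):
    """Yield records whose header (or sequence) contains a regex match."""
    regex = re.compile(pattern)
    for header, seq in records:
        target = header if field == "header" else seq
        if regex.search(target):
            yield header, seq


# Made-up records for illustration
lines = [">s1 IGHV1-2", "ACGT", "GGCC", ">s2 IGHV3-23", "TTTT"]
for header, seq in fasta_match(read_fasta(lines), r"IGHV3"):
    print(">" + header)
    print(seq)
```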

FastaSample extracts a random sample of records.

FastaUniq removes duplicate records (identified by ID).
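
Removing duplicates by ID amounts to remembering which IDs have already been seen. In the minimal sketch below, the function name is hypothetical and keeping the first occurrence of each ID is an assumption; FastaUniq may resolve duplicates differently.

```python
def uniq_by_id(records):
    """Yield (id, sequence) pairs, dropping later records with a repeated ID."""
    seen = set()
    for rec_id, seq in records:
        if rec_id not in seen:
            seen.add(rec_id)
            yield rec_id, seq


# Made-up records for illustration
records = [("s1", "ACGT"), ("s2", "TTTT"), ("s1", "ACGA")]
for rec_id, seq in uniq_by_id(records):
    print(rec_id, seq)
```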

FastaSampleUniq counts the number of unique sequences in a set of samples, accounting for duplicates noted in the header.

Installation and Usage

To use the tools, clone this repository or download and unzip the Zip file. All tools require Python 2 (v2.7 or later) and BioPython. Additional dependencies and usage information are given in the links above for each tool.

Further Information

William D. Lees and Adrian J. Shepherd, “Utilities for High-Throughput Analysis of B-Cell Clonal Lineages,” Journal of Immunology Research, vol. 2015, Article ID 323506, 9 pages, 2015. doi:10.1155/2015/323506

Contact

william@lees.org.uk