Bio::ToolBox - Tools for querying and analysis of genomic data
This package provides a number of Perl modules and scripts for working with common bioinformatic data. Much of bioinformatic data analysis revolves around tables of information, including lists of genomic annotation (genes, promoters, etc.) or defined regions of interest (epigenetic enrichment, transcription factor binding sites, etc.). This library works with these tables and provides a set of common tools for working with them.
- Opening and saving common tab-delimited text formats
- Support for BED, GFF, VCF, narrowPeak files
- Scoring intervals and annotation with datasets from microarray or sequencing experiments, including ChIPSeq, RNASeq, and more
- Support for Bam, BigWig, BigBed, wig, and USeq data formats
- Works with any genomic annotation in GTF, GFF3, and various UCSC formats, including refFlat, knownGene, genePred and genePredExt formats
The libraries provide a unified and integrated approach to analyses. In many cases, they provide an abstraction layer over a variety of different specialized Bio::Perl and related modules. Instead of writing numerous scripts specialized for each data format (wig, bigWig, Bam), one script can now work with virtually any data format.
Basic installation is simple with the standard Module::Build incantation. This will get you a minimal installation that will work with text files (BED, GFF, GTF, etc), but not binary files.
    perl ./Build.PL
    ./Build
    ./Build test
    ./Build install
To work with binary Bam and BigWig files, see advanced installation for further guidance. Most scripts should fail gently with warnings if required modules are missing.
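Before running scripts against binary files, it can help to check which optional Perl modules are visible to your interpreter. The following is a minimal sketch, not part of the distribution: `check_module` is a hypothetical helper, and the module names `Bio::DB::HTS` (Bam) and `Bio::DB::Big` (BigWig and BigBed) are assumptions about which adapters your installation uses. Confirm the exact list against the advanced installation guide.

```shell
#!/bin/sh
# Hypothetical helper: report whether the current perl can load a given module.
check_module() {
    if perl -M"$1" -e 1 >/dev/null 2>&1; then
        echo "$1: installed"
    else
        echo "$1: missing"
    fi
}

# Assumed adapter modules for binary formats; adjust to match
# the advanced installation guide for your setup.
for module in Bio::DB::HTS Bio::DB::Big; do
    check_module "$module"
done
```

A module reported as missing here only means that the `perl` on your `PATH` cannot load it; text-based formats should still work with the minimal installation.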
Released versions may be obtained through the CPAN repository using your favorite package manager.
For those so inclined, a dockerfile is provided for your convenience in building a Docker image following the advanced installation guide.
Several user-oriented library modules are included in this distribution for working with bioinformatic data. They provide a foundation for the included analysis scripts, and can be used for custom coding projects. See Libraries for more information.
The BioToolBox package comes complete with a suite of high-quality, production-ready scripts for a variety of analyses. A sampling of what can be done includes the following:
- Annotated feature collection and selection
- Data collection and scoring for features
- Data file format manipulation and conversion
- Low-level processing of sequencing data into customizable wig representation
Scripts have built-in documentation. Execute the script without any options to print a synopsis of available options, or add --help to print the full documentation.
See Script examples for more information.
Timothy J. Parnell, PhD
Huntsman Cancer Institute
University of Utah
Salt Lake City, UT, 84112
This package is free software; you can redistribute it and/or modify it under the terms of the Artistic License 2.0. For details, see the full text of the license in the file LICENSE.
This package is distributed in the hope that it will be useful, but it is provided "as is" and without any express or implied warranties. For details, see the full text of the license in the file LICENSE.