Covariance analysis on alignments of biological sequences.
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
alignment
alphabet
cpp
database
dca
diagnostic
external
filtering
graphing
papers/sca_conservation
pdb
sca
statistics
unittests
utils
.gitignore
LICENSE
README.md
setup_paths.m

README.md

MultiCOV

This is a package for performing covariance analysis on alignments of biological sequences. In particular, it allows one to run various versions of the statistical coupling analysis (SCA) framework, as well as several different approaches to direct coupling analysis (DCA). The package can work with arbitrary alphabets (e.g., DNA, RNA, or protein) or with mixed alignments containing sequences from several alphabets.

This is also the software used to generate the analyses and figures used in the paper

Teşileanu, T., Colwell, L. J., & Leibler, S. (2015). Protein Sectors: Statistical Coupling Analysis versus Conservation. PLOS Computational Biology, 11(2), e1004091. doi:10.1371/journal.pcbi.1004091

The relevant files for that paper can be found in the papers/sca_conservation folder.

Installation

Simply download and unzip the files in a folder of your choice. Most of the code is pure Matlab so it should work without any further trouble. It is recommended to run the setup_paths.m script before using any of the multiCOV functions. This makes sure to add all of the relevant folders to the paths so that all the functions can be properly accessed.

Some of the features of the package require compilation of C++ extensions. Estimating sequence weights, for example, is greatly accelerated by use of the C++ extensions, though it does work even without them. The functions related to Monte Carlo simulations rely on the C++ code and will not work if the extensions are not compiled.

The code for the extensions is in the cpp folder. On Unix-like machines (Mac or Linux), go to each of the folders and either run make (if you have it installed), or run the bash scripts called mexify.sh. Note that on Mac OS, you will need the Xcode command-line tools installed. On Windows, you may be able to compile the extensions using Cygwin (https://www.cygwin.com).

Usage

All the functions in the multiCOV package are extensively documented. After running the setup_paths.m script, you can use Matlab's help command to get an explanation for what a function does and how it's used. You can, for example, type help loadfasta to learn about the function used to load FASTA alignments. You can also type help alignment to see short descriptions of all the functions in the alignment folder.

multiCOV contains a large number of functions, which can make it hard to navigate the package. A good way to get started is to look at the scripts that are included with the package. A short tutorial, example.m, on using multiCOV can be found in the doc folder. The scripts in papers/sca_conservation, that were used for the protein sectors paper, are also a good way to get an introduction to the code. You can also look at the scripts in the experiments folder.

Trouble?

If you have any trouble or you think you've found a bug, please file a bug report at https://bitbucket.org/ttesileanu/multicov/issues.