A Fully Automatic Pipeline for Predicting Contact Maps of RNAs by Evolutionary Coupling Analysis
The RNAcmap predictor requires only a standard computer with about 32 GB of RAM to support in-memory operations for RNA sequences shorter than 500 nucleotides. The following software is required:
- BLASTN
- Infernal
- RNAstructure
- MATLAB (optional; required only if using mfDCA)
- Python3
- virtualenv or Anaconda
RNAcmap has been tested on Ubuntu 14.04, 16.04, and 18.04 operating systems.
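Before installing, you may want to confirm that the machine meets the memory recommendation above. The following is a minimal sketch, not part of RNAcmap. It assumes a Linux system (for POSIX `sysconf`), and the 10% margin `MARGIN` is an assumption that allows for memory reserved by firmware and the kernel on a nominal 32 GB machine.

```python
import os

REQUIRED_GB = 32   # recommended RAM for RNAs shorter than 500 nt (see above)
MARGIN = 0.9       # assumption: a nominal 32 GB machine reports slightly less usable RAM

def total_ram_gib() -> float:
    """Total physical memory in GiB, via POSIX sysconf (available on Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3

def meets_ram_requirement(total_gib: float, required_gb: float = REQUIRED_GB) -> bool:
    """True if the reported memory is within MARGIN of the recommendation."""
    return total_gib >= required_gb * MARGIN

if __name__ == "__main__":
    ram = total_ram_gib()
    verdict = "OK" if meets_ram_requirement(ram) else "too little"
    print(f"Total RAM: {ram:.1f} GiB; recommended: {REQUIRED_GB} GB; {verdict}")
```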
To install RNAcmap and its dependencies, run the following commands in a terminal:
git clone https://github.com/jaswindersingh2/RNAcmap.git
cd RNAcmap
Follow either the virtualenv column or the conda column of the table below to create a virtual environment and install RNAcmap's Python dependencies:

| Step | virtualenv | conda |
|---|---|---|
| 3. | virtualenv -p python3.6 venv_rnacmap | conda create -n venv_rnacmap python=3.6 |
| 4. | source ./venv_rnacmap/bin/activate | conda activate venv_rnacmap |
| 5. | pip install -r requirements.txt && deactivate | conda install --file requirements.txt && conda deactivate |
If Infernal is already installed on your system, add the path to its binaries to scripts/set_environments.sh.
Otherwise, run the following script to install Infernal 1.1.3 under 3rd_party/infernal:
./scripts/install_infernal.sh
If you have any problem downloading Infernal, please refer to the Infernal webpage.
If BLASTN is already installed on your system, add the path to its binaries to scripts/set_environments.sh.
Otherwise, run the following script to install the latest BLAST+ under 3rd_party/blast. This script has only been tested on 64-bit Ubuntu 18.04; if you have any problem downloading BLASTN, please refer to the BLASTN webpage.
./scripts/install_blast.sh
Install RNAfold, SPOT-RNA, or both, depending on which RNA secondary structure predictor you want to use. Installing RNAfold (part of the ViennaRNA suite) takes about 15-20 minutes; installing SPOT-RNA takes about 2-3 minutes.
./scripts/install_RNAfold.sh
and/or
./scripts/install_SPOT-RNA.sh
For more detailed instructions, please refer to the ViennaRNA and SPOT-RNA installation guides.
If NCBI's nt database is already available on your system, add its path to scripts/set_environments.sh.
Otherwise, download the reference database (NCBI's nt database) for BLASTN and Infernal with the following command. Make sure there is enough disk space on the system. In case of any issue, please refer to NCBI's database website.
wget -c "ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nt.gz" -O ./nt_database/nt.gz && gunzip ./nt_database/nt.gz
The downloaded database must be formatted before it can be used with BLASTN. To format it, use the following command. Make sure the system has enough space: the formatted database takes around 120 GB in addition to the approximately 270 GB from the previous step, and processing can take a few hours.
./ncbi-blast-2.10.0+/bin/makeblastdb -in ./nt_database/nt -dbtype nucl
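Because the download and formatting steps together need roughly 390 GB, it can help to check free space first. This is a minimal sketch using only the Python standard library; the sizes are the approximate figures quoted above (they grow as NCBI updates nt), and running it from the RNAcmap directory is an assumption.

```python
import shutil

# Approximate sizes quoted above, in GB; real sizes change as NCBI updates nt.
NT_FASTA_GB = 270      # uncompressed nt FASTA from the wget/gunzip step
NT_BLASTDB_GB = 120    # additional space for the makeblastdb output

def required_space_gb(already_downloaded: bool = False) -> int:
    """Space still needed; skip the FASTA size if it is already on disk."""
    return NT_BLASTDB_GB if already_downloaded else NT_FASTA_GB + NT_BLASTDB_GB

def free_space_gb(path: str = ".") -> float:
    """Free space on the filesystem that holds `path`, in GB (10**9 bytes)."""
    return shutil.disk_usage(path).free / 1e9

if __name__ == "__main__":
    need = required_space_gb()
    have = free_space_gb(".")  # assumption: run from the RNAcmap directory
    verdict = "enough" if have >= need else "NOT enough"
    print(f"Free: {have:.0f} GB, needed: ~{need} GB -> {verdict} space")
```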
To install the DCA predictor(s), run one or both of the following scripts:
./scripts/install_GREMLIN.sh
and/or
./scripts/install_plmc.sh
The RNAstructure suite is required to convert RNA secondary structures between formats and to remove pseudoknotted pairs. Either install it by following the official guide (registration is required) and add the path to its binaries to scripts/set_environments.sh, or install it with the following script:
./scripts/install_RNAstructure.sh
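To illustrate the kind of conversion tools like dot2ct perform, here is a small sketch that turns a dot-bracket string into a base-pair table and renders it in the common six-column CT layout (index, base, previous, next, partner, index). It is an illustration only, not RNAstructure's implementation; treating `[]`, `{}` and `<>` as extra bracket types (often used for pseudoknotted pairs) is an assumption about the input notation.

```python
def dot_to_pairs(structure: str) -> list:
    """Return partner[k] for 1-based positions k (0 = unpaired; index 0 unused)."""
    openers = {"(": ")", "[": "]", "{": "}", "<": ">"}
    closers = {close: open_ for open_, close in openers.items()}
    stacks = {open_: [] for open_ in openers}
    partner = [0] * (len(structure) + 1)
    for pos, ch in enumerate(structure, start=1):
        if ch in openers:
            stacks[ch].append(pos)
        elif ch in closers:
            stack = stacks[closers[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {pos}")
            i = stack.pop()
            partner[i], partner[pos] = pos, i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    if any(stacks.values()):
        raise ValueError("unbalanced brackets")
    return partner

def to_ct(seq: str, structure: str, title: str = "seq") -> str:
    """Render a CT-style string: index, base, prev, next, partner, index."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure lengths differ")
    partner = dot_to_pairs(structure)
    n = len(seq)
    lines = [f"{n} {title}"]
    for k in range(1, n + 1):
        nxt = k + 1 if k < n else 0
        lines.append(f"{k} {seq[k - 1]} {k - 1} {nxt} {partner[k]} {k}")
    return "\n".join(lines)
```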
Finally, run the following script to check whether RNAcmap can find its dependent executables:
./scripts/check_deps.sh
Expected output:
## checking infernal & Blastn
cmbuild OK
cmcalibrate OK
cmsearch OK
esl-reformat OK
blastn OK
## checking 3rd party tools
gremlin_cpp OK
plmc OK
RNAfold OK
ct2dot OK
dot2ct OK
RemovePseudoknots OK
SPOT-RNA.py OK
parse_blastn_local.pl OK
reformat.pl OK
get_ss.py OK
If any tool is not reported as OK, please revisit steps 6-12 to resolve the missing dependencies.
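A rough Python equivalent of this check is sketched below. It is not scripts/check_deps.sh itself: the executable names are copied from the expected output, but the assumption that each one is reachable through PATH may not match how the shell script actually locates them.

```python
import shutil

# Executable names copied from the expected output of scripts/check_deps.sh.
DEPENDENCIES = {
    "infernal & Blastn": ["cmbuild", "cmcalibrate", "cmsearch", "esl-reformat", "blastn"],
    "3rd party tools": ["gremlin_cpp", "plmc", "RNAfold", "ct2dot", "dot2ct",
                        "RemovePseudoknots", "SPOT-RNA.py", "parse_blastn_local.pl",
                        "reformat.pl", "get_ss.py"],
}

def check(names, which=shutil.which):
    """Map each executable name to True if `which` resolves it (PATH lookup by default)."""
    return {name: which(name) is not None for name in names}

def missing(results):
    """Names whose lookup failed."""
    return [name for name, found in results.items() if not found]

if __name__ == "__main__":
    for group, names in DEPENDENCIES.items():
        print(f"## checking {group}")
        for name, found in check(names).items():
            print(f"{name} {'OK' if found else 'NOT FOUND'}")
```

Since only one secondary structure predictor and one DCA tool are needed per run, a NOT FOUND for an alternative you did not install can usually be ignored.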
The following example command runs RNAcmap on a sample sequence. As input arguments, pass either RNAfold or SPOT-RNA as the secondary structure predictor and one of GREMLIN, plmc, or mfDCA as the DCA method:
./scripts/run_rnacmap.sh inputs/sample_seq.fasta SPOT-RNA GREMLIN
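If you drive the pipeline from Python, a thin wrapper can validate the arguments before launching the shell script. This is a hypothetical helper (`build_command` and `run_rnacmap` are not part of RNAcmap); the allowed option names come from the script's help page, and it assumes the current working directory is the RNAcmap root.

```python
import subprocess
from pathlib import Path

# Options accepted by ./scripts/run_rnacmap.sh, as listed in its help page.
SS_PREDICTORS = {"SPOT-RNA", "RNAfold"}
EC_TOOLS = {"GREMLIN", "plmc", "mfDCA"}

def build_command(fasta, ss_predictor, ec_tool):
    """Validate the arguments and return the argv list for run_rnacmap.sh."""
    if ss_predictor not in SS_PREDICTORS:
        raise ValueError(f"SSPredictor must be one of {sorted(SS_PREDICTORS)}")
    if ec_tool not in EC_TOOLS:
        raise ValueError(f"ECTOOL must be one of {sorted(EC_TOOLS)}")
    return ["./scripts/run_rnacmap.sh", str(fasta), ss_predictor, ec_tool]

def run_rnacmap(fasta, ss_predictor="SPOT-RNA", ec_tool="GREMLIN"):
    """Run the pipeline (from the RNAcmap root); raises CalledProcessError on failure."""
    if not Path(fasta).is_file():
        raise FileNotFoundError(fasta)
    subprocess.run(build_command(fasta, ss_predictor, ec_tool), check=True)

# Example: run_rnacmap("inputs/sample_seq.fasta", "SPOT-RNA", "GREMLIN")
```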
Running the script without arguments displays a help page:
./scripts/run_rnacmap.sh
======================================================================
RNAcmap: A Fully Automatic pipeline
for Predicting Contact Maps of RNAs by Evolutionary Coupling Analysis
======================================================================
Usage ./scripts/run_rnacmap.sh input.fasta SSPredictor ECTOOL
input.fasta Input RNA sequence in fasta format
SSPredictor Tool for predicting RNA secondary structure.
Available options: [SPOT-RNA|RNAfold]
ECTOOL Tool for calculating evolutionary coupling score.
Available options: [GREMLIN|plmc|mfDCA]
The final output is a "*.dca" file in the outputs folder, containing the Direct Coupling Analysis (DCA) results predicted by RNAcmap for the input RNA sequence.
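A common next step is ranking the predicted couplings. The sketch below is hypothetical and rests on an unverified assumption about the file layout: each non-empty, non-comment line is whitespace-separated, the first two fields are residue positions, and the last field is the coupling score. Check this against a real outputs/*.dca file before relying on it.

```python
def parse_dca(text):
    """Parse (i, j, score) tuples from the text of a .dca file.

    ASSUMPTION: fields are whitespace-separated, the first two are residue
    positions and the last is the coupling score; '#' lines are skipped.
    """
    pairs = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        pairs.append((int(fields[0]), int(fields[1]), float(fields[-1])))
    return pairs

def top_pairs(pairs, n, min_separation=0):
    """Return the n highest-scoring pairs with |i - j| >= min_separation."""
    kept = [p for p in pairs if abs(p[0] - p[1]) >= min_separation]
    return sorted(kept, key=lambda p: p[2], reverse=True)[:n]

# Example (hypothetical file name):
# with open("outputs/sample_seq.dca") as fh:
#     print(top_pairs(parse_dca(fh.read()), n=10, min_separation=4))
```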
Zhang, T., Singh, J., Litfin, T., Zhan, J., Paliwal, K., Zhou, Y., 2020. RNAcmap: A Fully Automatic Method for Predicting Contact Maps of RNAs by Evolutionary Coupling Analysis. (Under Review)
This Source Code Form is subject to the terms of the Mozilla Public License, v. 2.0. If a copy of the MPL was not distributed with this file, You can obtain one at https://mozilla.org/MPL/2.0/.
jaswinder.singh3@griffithuni.edu.au, tongchuan.zhang@griffithuni.edu.au, yaoqi.zhou@griffith.edu.au