A Fully Automatic Method for Predicting Contact Maps of RNAs by Evolutionary Coupling Analysis

tcgriffith/RNAcmap-1

 
 

RNAcmap

A Fully Automatic Pipeline for Predicting Contact Maps of RNAs by Evolutionary Coupling Analysis

SYSTEM REQUIREMENTS

Hardware Requirements:

RNAcmap requires only a standard computer with around 32 GB of RAM to support in-memory operations for RNA sequences shorter than 500 nucleotides.
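As a quick pre-flight check against the length limit above, the sequence length of an input FASTA file can be counted before starting a run. The following is a minimal sketch, not part of RNAcmap; the function names `fasta_length` and `check_length` are hypothetical, and the file is assumed to hold a single sequence.

```shell
# Hypothetical helper: print the number of residues in a single-sequence
# FASTA file. Header lines (starting with '>') are skipped and blanks,
# tabs, and carriage returns are ignored.
fasta_length() {
    awk '!/^>/ { gsub(/[ \t\r]/, ""); n += length($0) } END { print n + 0 }' "$1"
}

# Hypothetical helper: succeed when the sequence is shorter than the
# limit (default 500, the length the 32 GB figure is quoted for).
check_length() {
    len=$(fasta_length "$1")
    limit=${2:-500}
    if [ "$len" -lt "$limit" ]; then
        echo "OK: $len nt"
    else
        echo "WARNING: $len nt is outside the range the 32 GB figure covers"
        return 1
    fi
}
```

For example, `check_length inputs/sample_seq.fasta` could be run before launching the pipeline.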

Software Requirements:

RNAcmap has been tested on Ubuntu 14.04, 16.04, and 18.04 operating systems.

USAGE

Installation:

To install RNAcmap and its dependencies, run the following commands in a terminal:

  1. git clone https://github.com/jaswindersingh2/RNAcmap.git
  2. cd RNAcmap

Follow either the virtualenv column or the conda column of the table below to create a virtual environment and install the RNAcmap Python dependencies:

| Step | virtualenv | conda |
| --- | --- | --- |
| 3 | virtualenv -p python3.6 venv_rnacmap | conda create -n venv_rnacmap python=3.6 |
| 4 | source ./venv_rnacmap/bin/activate | conda activate venv_rnacmap |
| 5 | pip install -r requirements.txt && deactivate | conda install --file requirements.txt && conda deactivate |

If Infernal is already installed on the system, add the path to its binaries to the script scripts/set_environments.sh. If it is not installed, run the following script; Infernal 1.1.3 will be installed under 3rd_party/infernal:

  6. ./scripts/install_infernal.sh

If you run into any problem downloading Infernal, please refer to the Infernal webpage.

If BLASTN is already installed on the system, add the path to its binaries to the script scripts/set_environments.sh. If it is not installed, run the following script; the latest BLAST+ will be installed under 3rd_party/blast.

If you run into any problem downloading BLASTN, please refer to the BLASTN webpage, as the following command has only been tested on a 64-bit Ubuntu 18.04 system.

  7. ./scripts/install_blast.sh

Install RNAfold, SPOT-RNA, or both, depending on which RNA secondary structure predictor you want to use. Installing RNAfold (part of the ViennaRNA suite) takes 15-20 minutes; installing SPOT-RNA takes 2-3 minutes.

  8. ./scripts/install_RNAfold.sh and/or ./scripts/install_SPOT-RNA.sh

For more specific and detailed instructions, please refer to the ViennaRNA and SPOT-RNA guides.

If NCBI's nt database is already available on your system, add its path to the script scripts/set_environments.sh. Otherwise, download it as the reference database for BLASTN and Infernal with the following command. Make sure there is enough disk space on the system. In case of any issue, please refer to NCBI's database website.

  9. wget -c "ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nt.gz" -O ./nt_database/nt.gz && gunzip ./nt_database/nt.gz

The NCBI database needs to be formatted for use with BLASTN, which can be done with the following command. Please make sure the system has enough disk space: the formatted database is around 120 GB, in addition to approximately 270 GB from the previous step. Formatting can take a few hours.

  10. ./ncbi-blast-2.10.0+/bin/makeblastdb -in ./nt_database/nt -dbtype nucl
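Because the download and formatting steps together need roughly 390 GB (the sum of the two figures quoted above), it can help to confirm the free space before starting. The sketch below is an illustration only and is not part of RNAcmap; `free_gb` and `has_space` are hypothetical helper names, and the check relies on the POSIX output format of `df -P`.

```shell
# Hypothetical helper: free space, in whole GB, on the file system that
# holds the given directory. 'df -Pk' prints one data line per file
# system with the available space in 1024-byte blocks in column 4.
free_gb() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# Hypothetical helper: succeed only if at least $2 GB are free under $1.
has_space() {
    avail=$(free_gb "$1")
    need=$2
    [ "$avail" -ge "$need" ]
}
```

A possible use is `has_space ./nt_database 390 || echo "not enough disk space for the nt database"`, run before the download.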

To install the DCA predictor, please run the following script:

  11. ./scripts/install_GREMLIN.sh and/or ./scripts/install_plmc.sh

The RNAstructure suite is required to convert RNA secondary structures between formats and to remove pseudoknot pairs. Either install it by following the official guide (registration is required) and add the path to its binaries to the script scripts/set_environments.sh, or install it with the following script:

  12. ./scripts/install_RNAstructure.sh

Finally, run the following script to check whether RNAcmap can find the executables it depends on:

  13. ./scripts/check_deps.sh

Expected output:

## checking infernal & Blastn
cmbuild                   OK
cmcalibrate               OK
cmsearch                  OK
esl-reformat              OK
blastn                    OK
## checking 3rd party tools
gremlin_cpp               OK
plmc                      OK
RNAfold                   OK
ct2dot                    OK
dot2ct                    OK
RemovePseudoknots         OK
SPOT-RNA.py               OK
parse_blastn_local.pl     OK
reformat.pl               OK
get_ss.py                 OK

If any tool is not reported as OK, revisit steps 6-12 to resolve the missing dependencies.
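When a dependency is reported missing, it can be useful to test individual tool names by hand. The following is a minimal sketch of such a lookup using the standard `command -v` builtin; it is an assumption about how a check of this kind could work and is not the contents of scripts/check_deps.sh. The function name `check_tools` is hypothetical.

```shell
# Hypothetical helper: report whether each named executable can be found
# on the current PATH. Prints one line per tool and returns non-zero if
# at least one tool is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf '%-25s OK\n' "$tool"
        else
            printf '%-25s MISSING\n' "$tool"
            missing=1
        fi
    done
    return "$missing"
}
```

For instance, `check_tools cmbuild cmsearch blastn` looks up three of the tools listed above. It only sees tools on the PATH of the current shell, so paths configured in scripts/set_environments.sh would need to be in effect first.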

How To Use

Here is an example command that runs RNAcmap on a sample sequence. Pass either RNAfold or SPOT-RNA as the secondary structure predictor and one of GREMLIN, plmc, or mfDCA as the DCA method.

./scripts/run_rnacmap.sh inputs/sample_seq.fasta SPOT-RNA GREMLIN

A help page is displayed when the script is run without arguments:

./scripts/run_rnacmap.sh 

======================================================================
              RNAcmap: A Fully Automatic pipeline                     
 for Predicting Contact Maps of RNAs by Evolutionary Coupling Analysis
======================================================================
Usage ./scripts/run_rnacmap.sh input.fasta SSPredictor ECTOOL
         input.fasta    Input RNA sequence in fasta format
         SSPredictor    Tool for predicting RNA secondary structure.
                        Available options: [SPOT-RNA|RNAfold]
         ECTOOL         Tool for calculating evolutionary coupling score.
                        Available options: [GREMLIN|plmc|mfDCA]
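Since a full run can take a long time, the three arguments can be validated up front against the options listed in the help page. This is a minimal sketch under that assumption; `valid_choice` and `check_rnacmap_args` are hypothetical names and are not part of the RNAcmap scripts.

```shell
# Hypothetical helper: succeed if the first argument equals one of the
# remaining arguments.
valid_choice() {
    value=$1
    shift
    for option in "$@"; do
        if [ "$value" = "$option" ]; then
            return 0
        fi
    done
    return 1
}

# Hypothetical helper: check the input file and the two tool names
# against the options shown in the help page.
check_rnacmap_args() {
    fasta=$1
    ss=$2
    ec=$3
    [ -r "$fasta" ] || { echo "cannot read input file: $fasta" >&2; return 1; }
    valid_choice "$ss" SPOT-RNA RNAfold || { echo "unknown SSPredictor: $ss" >&2; return 1; }
    valid_choice "$ec" GREMLIN plmc mfDCA || { echo "unknown ECTOOL: $ec" >&2; return 1; }
}
```

It could be chained in front of the real command, for example `check_rnacmap_args inputs/sample_seq.fasta SPOT-RNA GREMLIN && ./scripts/run_rnacmap.sh inputs/sample_seq.fasta SPOT-RNA GREMLIN`.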

The final output is the "*.dca" file in the outputs folder. It contains the Direct Coupling Analysis (DCA) predictions made by RNAcmap for the given input RNA sequence.
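The layout of the "*.dca" file is not described here, so the following sketch rests on an explicit assumption: each data line holds at least three whitespace-separated fields (position i, position j, coupling score) and comment lines start with '#'. If the real file differs, the column numbers need adjusting. The function name `top_contacts` is hypothetical, and `sort -g` (general numeric sort, available in GNU coreutils) is used so that scores in scientific notation order correctly.

```shell
# Hypothetical helper: print the N highest-scoring lines of a file that
# is ASSUMED to contain "i j score" records. Lines starting with '#' or
# with fewer than three fields are ignored.
top_contacts() {
    awk '!/^#/ && NF >= 3' "$1" | LC_ALL=C sort -k3,3gr | head -n "$2"
}
```

For example, `top_contacts outputs/sample_seq.dca 20` would list the 20 strongest predicted couplings, where the file name is a placeholder for whatever "*.dca" file the run produced.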

References

If you use RNAcmap for your research, please cite the following paper:

Zhang, T., Singh, J., Litfin, T., Zhan, J., Paliwal, K., Zhou, Y., 2020. RNAcmap: A Fully Automatic Method for Predicting Contact Maps of RNAs by Evolutionary Coupling Analysis. (Under Review)

Other references:

[1] Nawrocki, E.P. and Eddy, S.R., 2013. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics, 29(22), pp.2933-2935.

[2] Hofacker, I.L., 2003. Vienna RNA secondary structure server. Nucleic acids research, 31(13), pp.3429-3431.

[3] Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N. and Bourne, P.E., 2000. The Protein Data Bank. Nucleic acids research, 28(1), pp.235-242.

[4] Singh, J., Hanson, J., Paliwal, K. and Zhou, Y., 2019. RNA secondary structure prediction using an ensemble of two-dimensional deep neural networks and transfer learning. Nature communications, 10(1), pp.1-13.

[5] Kamisetty, H., Ovchinnikov, S. and Baker, D., 2013. Assessing the utility of coevolution-based residue–residue contact predictions in a sequence-and structure-rich era. Proceedings of the National Academy of Sciences, 110(39), pp.15674-15679.

Licence

This Source Code Form is subject to the terms of the Mozilla Public License, v. 2.0. If a copy of the MPL was not distributed with this file, You can obtain one at https://mozilla.org/MPL/2.0/.

Contact

jaswinder.singh3@griffithuni.edu.au, tongchuan.zhang@griffithuni.edu.au, yaoqi.zhou@griffith.edu.au
