MRF: Missing Regions Finder

MRF is a virus comparative genomics tool. It takes a query fasta, reference fasta, reference gff3 as inputs and lists the missing coding sequences (complete and partial) of query genome with respect to the annotations of the reference genome. MRF can be run in single mode (one vs one) in command line and web interface or batch mode (one vs many) in command line.

To use a web version of this tool, please visit http://14.139.181.163/mrf

Dependencies

There are very minimal dependencies for running MRF-batch program. Verify if you have the following and install accordingly. Alternatively one can create a conda environment with requisite packages (scroll below)

Perl v5
Bash
Mummer
R > v3.6 and the following packages.
- RColorBrewer_1.1-2
- tidyr_1.1.2
- plyr_1.8.4
- gplots_3.0.1.1
- ggplot2_3.1.1
- dplyr_1.0.2

Installation

If you meet the above dependencies, clone the repository, cd into the directory and start running the programs.Otherwise you can intall through conda as below.

Through Conda

Clone this repository and follow the steps below

cd MRF
conda env create -f MRF.conda_env.yml
source activate MRF.env

Overview

single mode (This usage is equivalent to the web version of MRF)

To compare and list the missing coding sequences in a query genome with respect to a reference genome.

bash run-mrf-wrapper.sh -q query fasta -r reference fasta -f reference gff3

The above command generates several output files, out of which mrfOUT.mum.mcr contains the missing coding sequences and mrfOUT.mum.mr contains the missing regions.The program also creates a folder called circos_files in which the configuration and data files for generating a circos plot are present

To compare and list the missing genomic regions in a query genome with respect to a reference genome.

bash run-mrf-wrapper.sh -q query fasta -r reference fasta

The above command without the annotation file (GFF3), will just generate the missing regions.

To print usage

bash run-mrf-wrapper.sh -h

The above command prints help text as below.

Usage:  run-mrf-wrapper.sh -q query fasta -r reference fasta -f reference gff3

options: -m      mummer exact match length [default: 20]
     	 -l      False match length threshold.Values below this will be screened and confirmed if they are true matches [default: 15]
     	 -n      Negative offset. This option requires False match length as mandatory argument [default: 1]
     	 -p      Positive offset. This option requires False match length as mandatory argument [default: 1]
     	 -c      Partial coding sequences below this set threshold are not shown [default: 0]
     	 -o      output prefix [default: mrfOUT]
     	 -h      prints usage

batch mode

MRF can be used in batch mode (command line usage only) to analyze multiple query genomes against a single reference genome.

To see the available options, run the following command.

bash batch-run-mrf-wrapper.sh -h

Usage:    batch-run-mrf-wrapper.sh -d query_fasta_directory -r reference_fasta -f reference_gff3


options: -m      mummer exact match length [default: 20]
         -l      False match length threshold.Values below this will be screened and confirmed if they are true matches [default: 15]
         -n      Negative offset. This option requires False match length as mandatory argument [default: 1]
         -p      Positive offset. This option requires False match length as mandatory argument [default: 1]
         -c      Partial coding sequences below this set threshold are not shown [default: 0]
         -o      output prefix [default: mrfOUT]

Output parsing options:

         -I      Search by protein ids [provide comma separated]
         -N      Search by protein names [provide comma separated]
         -Y      Filter by missing coding sequences length [default: 0]
         -F      List first n proteins [default:10]
         -Z      List last n proteins [default:10]
         -R      List by range[m,n]
         -U      List only top n affected genomes [default: 10]
         -A      List all proteins

Plot options:

        -L      Number of missing coding sequences to show in heatmap [default: 15]
        -B      Number of genomes to show in barplot [default: 50]

        -h      prints usage

To analyze the genomes present in a directory called query against a reference genome, run the following command.

bash batch-run-mrf-wrapper.sh -d query -r Reference.fasta -f Reference.gff3

The above command produces three files called summaryAll.txt, cdsHeatmap.txt and cdsAll.txt. Apart from these three files, four easy to interpret figures, a Scatterplot, a Barplot and two Heatmaps (clustered and unclustered) are produced.

For detailed explanation along with the use cases refer to the manual

Name		Name	Last commit message	Last commit date
Latest commit History 44 Commits
test		test
MRF.conda_env.yml		MRF.conda_env.yml
MRF_batch.pl		MRF_batch.pl
README.md		README.md
batch-run-mrf-wrapper.sh		batch-run-mrf-wrapper.sh
genPlots.R		genPlots.R
missing_regions_finder.pl		missing_regions_finder.pl
parse_mrf_output.pl		parse_mrf_output.pl
run-mrf-wrapper.sh		run-mrf-wrapper.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test

test

MRF.conda_env.yml

MRF.conda_env.yml

MRF_batch.pl

MRF_batch.pl

README.md

README.md

batch-run-mrf-wrapper.sh

batch-run-mrf-wrapper.sh

genPlots.R

genPlots.R

missing_regions_finder.pl

missing_regions_finder.pl

parse_mrf_output.pl

parse_mrf_output.pl

run-mrf-wrapper.sh

run-mrf-wrapper.sh

Repository files navigation

MRF: Missing Regions Finder

About

Releases

Packages

Contributors 2

Languages

vinayciba/MRF

Folders and files

Latest commit

History

Repository files navigation

MRF: Missing Regions Finder

About

Resources

Stars

Watchers

Forks

Languages