Licence: GNU General Public License v3.0 (copy provided in directory)
Author: Tom van Wijk
Contact: tom_van_wijk@hotmail.com
Determines the multi-locus VNTR (Variable Number Tandem Repeat) type of a given Salmonella Enteritidis genome assembly.
- Linux operating system. This script was developed on Linux Ubuntu 16.04.
  WARNING: Experiences when using different operating systems may vary.
- Python 2.7.x
- Python libraries as listed in the import section of mistress.py
- NCBI BLAST 2.6.0+
- The reference directory supplied with this repository
- Clone the MISTReSS repository to the desired location on your system:
git clone https://github.com/tom-van-wijk/MISTReSS.git
- Add the location of the MISTReSS repository to your PATH variable:
export PATH=$PATH:/path/to/MISTReSS
(it is recommended to add this command to your ~/.bashrc file)
- Create the environment variable MISTRESS_REF, pointing to the reference subdirectory:
export MISTRESS_REF=/path/to/MISTReSS/reference_files
(it is recommended to add this command to your ~/.bashrc file)
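The installation steps above can be sanity-checked with a small helper. The following is only a sketch and not part of MISTReSS: the function name check_mistress_setup is hypothetical, it is written for Python 3 (independent of the Python 2.7 interpreter that mistress.py itself requires), and it assumes that BLAST is invoked through the blastn executable.

```python
import os
import shutil


def check_mistress_setup(environ=None, which=shutil.which):
    """Return a list of problems; an empty list means the setup looks fine."""
    environ = os.environ if environ is None else environ
    problems = []

    # MISTRESS_REF should point to the reference_files subdirectory.
    ref_dir = environ.get("MISTRESS_REF")
    if not ref_dir:
        problems.append("MISTRESS_REF is not set")
    elif not os.path.isdir(ref_dir):
        problems.append("MISTRESS_REF does not point to a directory: " + ref_dir)

    # Assumption: BLAST is called through the 'blastn' executable.
    for tool in ("mistress.py", "blastn"):
        if which(tool) is None:
            problems.append(tool + " was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_mistress_setup():
        print("WARNING: " + problem)
```

Running the file directly prints one warning per problem and prints nothing when both variables appear to be set up as described.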
Start the script with the following command:
mistress.py -i 'inputfile' -s 'pathogen' -o 'outputdir'
- 'inputfile': Location of input file. This should be a fully assembled Salmonella Enteritidis genome in .fasta/.fsa/.fna/.fa format.
  NOTE: To correctly determine the number of repeats, it is crucial to assemble your genome as accurately as possible. We recommend using the methods used in our paper: quality trim the fastq files with q=25 using erne-filter v2.1.1 and assemble with SPAdes v3.10.0.
  NOTE: When a tandem repeat is so long that it is not covered by a single read, the assembler will probably compress the tandem repeat by assembling multiple repeats as a single sequence. When using Illumina 2x150 bp, this will happen when SENTR5 n>13 and SENTR6 n>11. These cases are rare, but to type longer genotypes correctly as well, we recommend using at least Illumina 2x250 bp for Salmonella Enteritidis.
- 'pathogen': The serovar of the input strain. Currently, only "enteritidis" is supported.
  Default = "enteritidis"
- 'outputdir': Location of output directory. If none is specified, an output directory will be created in the parent directory of inputfile.
Also included in this repository is multi_mistress.py.
This script allows large batches of data to be typed with a single command.
Once the installation of mistress is complete, no additional dependencies have to be installed and no additional steps have to be taken;
you are ready to go.
This script will create an output directory with a subdirectory for each genome containing the mistress output.
Additionally, multy_mistress_output.txt will be created with an overview of all typed genomes.
Start multi_mistress with the following command:
multi_mistress.py -i 'inputdir' -o 'outputdir'
- 'inputdir': Location of input directory.
  This should only contain fully assembled genomes in .fasta/.fsa/.fna/.fa format.
- 'outputdir': Location of output directory.
  If none is specified, an output directory will be created in the input directory.
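Because the input directory should only contain assemblies, it can help to look for stray files before starting a batch. The sketch below is not taken from multi_mistress.py: the function name split_input_dir is hypothetical, and the exact, case-sensitive extension match is an assumption about how strictly files should be screened.

```python
from pathlib import Path

# Extensions named in the parameter description above.
ASSEMBLY_EXTENSIONS = {".fasta", ".fsa", ".fna", ".fa"}


def split_input_dir(input_dir):
    """Return (assemblies, other_files) for the files directly inside input_dir."""
    assemblies, others = [], []
    for path in sorted(Path(input_dir).iterdir()):
        if not path.is_file():
            continue  # subdirectories are ignored in this sketch
        # Assumption: extensions are compared exactly (case-sensitive).
        if path.suffix in ASSEMBLY_EXTENSIONS:
            assemblies.append(path.name)
        else:
            others.append(path.name)
    return assemblies, others
```

If the second list is not empty, those files can be moved elsewhere before multi_mistress.py is started.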
You can easily add your own pathogens to this tool by doing the following:
- Add a new 'Pathogen' element for the pathogen you want to add to reference_files/supported_pathogens.xml
- Add a reference_files/panel_'your_pathogen_name'.xml file with the VNTR sizes for your pathogen.
  'your_pathogen_name' in this file's filename, the Serovar attribute of the Vntrs element in this file and the Name element of the added Pathogen element in reference_files/supported_pathogens.xml all need to be identical.
  This file is required to be in the same format as the supplied panel file(s), including identical element and attribute names.
- Add a reference_files/primers_'your_pathogen_name'.fsa file with the primers that are used for your pathogen in the lab method.
  'your_pathogen_name' once again needs to be identical to the name used in the files mentioned above. This file needs to be in the same format as the supplied primer file(s). The primers in this file are named with a _P1 and a _P2 flag for forward and reverse primers respectively. The headers of the primer sequences (without the _P1 and _P2 flags) need to be identical to the Name attribute values of the vntr elements in reference_files/panel_'your_pathogen_name'.xml
- Now you can run mistress or multi_mistress using the -s flag.
  Use the value for this parameter that is identical to your pathogen's name
  in the panel and primer files.
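The naming rules for the primer file can be checked before a first run. The following sketch is an illustration only and is not part of MISTReSS: the function names are hypothetical, and the VNTR names from the panel file are passed in as a plain list so that no assumption about the XML layout is needed.

```python
def loci_from_primer_headers(fasta_text):
    """Map each locus name to the set of primer flags ('P1', 'P2') found for it."""
    loci = {}
    for line in fasta_text.splitlines():
        if not line.startswith(">"):
            continue
        fields = line[1:].split()
        if not fields:
            continue  # empty header line
        name, sep, flag = fields[0].rpartition("_")
        if sep and flag in ("P1", "P2"):
            loci.setdefault(name, set()).add(flag)
        else:
            # No recognisable flag: keep the whole header so it gets reported.
            loci.setdefault(fields[0], set())
    return loci


def check_primers(fasta_text, panel_names):
    """Return a list of inconsistencies between primer headers and panel names."""
    loci = loci_from_primer_headers(fasta_text)
    problems = []
    for name in sorted(loci):
        missing = sorted({"P1", "P2"} - loci[name])
        if missing:
            problems.append(name + ": missing " + ", ".join(missing))
        if name not in panel_names:
            problems.append(name + ": not in panel")
    for name in sorted(set(panel_names) - set(loci)):
        problems.append(name + ": no primers")
    return problems
```

An empty result means every locus in the panel has both a _P1 and a _P2 primer and no primer refers to a locus that the panel does not define.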
NOTE: Please keep in mind that the VNTR sizes used in the classical methods might be biased.
Also keep in mind that when the total size of a VNTR comes close to or exceeds the read length of the
sequencing technology used, the VNTR is unlikely to have been assembled correctly.
Testing with a set of traditionally typed samples is highly recommended.
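Such a test comes down to comparing the repeat numbers from the lab with those reported by the tool. The sketch below shows one possible way to do this and makes no assumption about the mistress output format: both sets of results are expected as plain dictionaries of the form {sample: {locus: repeat_number}}, and the function name compare_profiles is hypothetical.

```python
def compare_profiles(lab, in_silico):
    """Return {sample: [(locus, lab_value, tool_value), ...]} for every disagreement.

    A locus or sample that is missing from the in silico results is reported
    with a tool_value of None.
    """
    mismatches = {}
    for sample, lab_profile in lab.items():
        tool_profile = in_silico.get(sample, {})
        diffs = [
            (locus, expected, tool_profile.get(locus))
            for locus, expected in sorted(lab_profile.items())
            if tool_profile.get(locus) != expected
        ]
        if diffs:
            mismatches[sample] = diffs
    return mismatches
```

Loci that disagree consistently across samples may point to a biased size in the panel file, while isolated disagreements at high repeat numbers are more likely to be assembly problems of the kind described in the note above.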