Walid Korani edited this page Feb 15, 2017 · 13 revisions

SNP-ML

Machine learning tools for improving SNP calling in polyploid organisms.

The application recognized more than 90% of the true SNPs in simulated tetraploid, hexaploid, and octoploid data, and more than 80% of the true SNPs in real peanut data.
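The percentages above can be read as recall (sensitivity): the fraction of true SNPs that the predictor keeps. A minimal sketch of that calculation, assuming the predicted and known true SNPs are available as sets of variant keys; the `(chromosome, position)` keys and values below are hypothetical:

```python
def recall(predicted_true: set, known_true: set) -> float:
    """Fraction of the known true SNPs that were also predicted as true."""
    if not known_true:
        raise ValueError("known_true must not be empty")
    return len(predicted_true & known_true) / len(known_true)


# Hypothetical (chromosome, position) keys, for illustration only.
known = {("chr1", 100), ("chr1", 250), ("chr2", 40), ("chr3", 7)}
predicted = {("chr1", 100), ("chr1", 250), ("chr2", 40), ("chr5", 12)}
print(recall(predicted, known))  # 0.75
```

Variants predicted as true but absent from the known set (here `("chr5", 12)`) do not affect recall; they would count against precision instead.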

SNP-ML contains two tools:

  1. SNP-MLer (pronounced "snip miller"): a tool that trains on data to build a predictor for new data sets. SNP-ML includes a database of trainers for some species; if you need to work with different data sets, use this tool to build your own.
  2. SNP-ML (pronounced "snip mill"): a tool for predicting the true SNPs. You can apply it, with a suitable trainer from the database, to the VCF file produced by SNP calling with any pipeline (e.g. Samtools mpileup). The outputs are a VCF file containing only the predicted true SNPs and a CSV file containing the scores output by the trainer.

**SNP-MLer Usage:**

SNP-MLer [options] -iTP tp.vcf -iFP fp.vcf -o output_trainer

Arguments: -iTP: a VCF file containing the true SNPs.

-iFP: a VCF file containing the false SNPs.

-o: the name of the trainer; the trainer is saved under this name in the db folder.

Options: -skip: skips the listed attributes.

-custom: disables all attributes except the ones listed. Both -skip and -custom take a comma-delimited string of attributes from the following list (lg,n1,n2,mq,af1,qual,lg,n1n2). These attributes are described in the publication.
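One way to read the two options: -skip removes the listed attributes from the full set, while -custom starts from nothing and keeps only the listed ones. The sketch below illustrates that reading; it is not SNP-MLer's own parsing code, the helper names are hypothetical, and the duplicated `lg` in the documented list appears only once here:

```python
# Hypothetical helper illustrating -skip / -custom; not SNP-MLer's actual code.
# The documented list repeats "lg"; it is listed once here.
DEFAULT_ATTRIBUTES = ["lg", "n1", "n2", "mq", "af1", "qual", "n1n2"]


def parse_list(arg: str) -> list[str]:
    """Split a comma-delimited string such as "mq,qual" into names."""
    return [name.strip() for name in arg.split(",") if name.strip()]


def active_attributes(skip: str = "", custom: str = "") -> list[str]:
    """Return the attributes left active after applying -custom and -skip."""
    # -custom starts from only the listed attributes; otherwise start from all.
    requested = set(parse_list(custom)) if custom else set(DEFAULT_ATTRIBUTES)
    skipped = set(parse_list(skip))
    unknown = (requested | skipped) - set(DEFAULT_ATTRIBUTES)
    if unknown:
        raise ValueError(f"unknown attributes: {sorted(unknown)}")
    # Keep the default order so the feature columns line up between runs.
    return [a for a in DEFAULT_ATTRIBUTES if a in requested and a not in skipped]


print(active_attributes(skip="mq,qual"))      # ['lg', 'n1', 'n2', 'af1', 'n1n2']
print(active_attributes(custom="n1,n2,af1"))  # ['n1', 'n2', 'af1']
```

Returning the attributes in a fixed order matters because the trainer and the predictor must see the same feature columns.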

-m: uses the tree-bagging trainer.

-add_new1, -add_new2: two CSV files containing new attributes you want to include. The files must be plain CSV with no header row and no row names.
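A file in the format described above (plain CSV, no header row, no row names) can be written with Python's standard `csv` module. This sketch assumes one row per variant and one column per new attribute; the row layout, the helper name, and the values are assumptions for illustration, not documented requirements:

```python
import csv
import io


def write_extra_attributes(rows, handle) -> None:
    """Write attribute rows as CSV with no header row and no row names."""
    # lineterminator="\n" avoids the csv module's default "\r\n" endings.
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerows(rows)


# Hypothetical values: two new attributes for three variants.
extra = [[0.12, 35], [0.80, 12], [0.45, 28]]

buffer = io.StringIO()
write_extra_attributes(extra, buffer)
print(buffer.getvalue())
```

To write to disk instead, pass a handle opened with `open("extra_attrs.csv", "w", newline="")`; the file name is only an example.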

**SNP-ML Usage:**

SNP-ML [options] -i input.vcf -iM trainer -o outputs

Arguments: -i: a VCF file containing the SNPs to evaluate.

-iM: the name of the trainer; it is loaded from the db folder.

-o: the name for the VCF and CSV output files.

Options: -skip: skips the listed attributes.

-custom: disables all attributes except the ones listed. Both -skip and -custom take a comma-delimited string of attributes from the following list (lg,n1,n2,mq,af1,qual,lg,n1n2). These attributes are described in the publication.

-m: uses the tree-bagging trainer.

-c: the cut-off for the neural network prediction (default 0.5).
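The cut-off turns each per-variant score into a keep/discard decision. A minimal sketch, assuming scores lie in [0, 1] with higher meaning "more likely a true SNP", and assuming a score equal to the cut-off is kept; SNP-ML's exact comparison is not documented here:

```python
def classify(scores: list[float], cutoff: float = 0.5) -> list[bool]:
    """Label each score as a true SNP (True) or not (False).

    Assumes higher scores mean "more likely true" and that a score equal
    to the cut-off is kept; SNP-ML's exact comparison may differ.
    """
    if not 0.0 <= cutoff <= 1.0:
        raise ValueError("cutoff must be between 0 and 1")
    return [score >= cutoff for score in scores]


scores = [0.91, 0.50, 0.49, 0.07]   # hypothetical network outputs
print(classify(scores))              # [True, True, False, False]
print(classify(scores, cutoff=0.8))  # [True, False, False, False]
```

Raising the cut-off keeps fewer variants: it tends to remove more false SNPs at the cost of also discarding some true ones.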

-add_new2: a CSV file containing new attributes you want to include. The file must be plain CSV with no header row and no row names.

NOTE: The number of traits used by SNP-ML must be the same as in SNP-MLer. If you skipped, customized, or added new parameters in SNP-MLer, you must use the same settings with SNP-ML.
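One way to guard against mismatched settings is to record the active attribute list when training and compare it before predicting. The helper below is hypothetical and not part of SNP-ML; storing the list as `<trainer>.attrs.json` in the db folder is an assumption made for illustration:

```python
import json
import tempfile
from pathlib import Path


def save_settings(db_dir: Path, trainer: str, attributes: list[str]) -> Path:
    """Record the attributes a trainer was built with (hypothetical file layout)."""
    path = db_dir / f"{trainer}.attrs.json"
    path.write_text(json.dumps(attributes))
    return path


def check_settings(db_dir: Path, trainer: str, attributes: list[str]) -> None:
    """Raise ValueError if prediction uses different attributes than training."""
    saved = json.loads((db_dir / f"{trainer}.attrs.json").read_text())
    if saved != attributes:
        raise ValueError(
            f"trainer '{trainer}' was built with {saved}, "
            f"but prediction uses {attributes}"
        )


with tempfile.TemporaryDirectory() as tmp:
    db = Path(tmp)
    save_settings(db, "output_trainer", ["lg", "n1", "n2", "af1"])
    check_settings(db, "output_trainer", ["lg", "n1", "n2", "af1"])  # passes
    try:
        check_settings(db, "output_trainer", ["lg", "n1", "n2"])
    except ValueError as err:
        print(err)
```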
