SNP-ML
Machine learning tools for improving SNP calling in polyploid organisms.
The application could recognize more than 90% of the true SNPs in simulated data for tetra-, hexa- and octoploids, and more than 80% in real peanut data.
SNP-ML contains two tools:
- SNP-MLer (pronounced "snip miller"): a tool for training on data to prepare a predictor for new data sets. SNP-ML includes a database for some species; this tool is useful when the user needs to work with different data sets.
- SNP-ML (pronounced "snip mill"): a tool for predicting the true SNPs. The user can apply the output vcf file of SNP calling from any pipeline (e.g. SAMtools mpileup) to a suitable database using this tool. The outputs are a vcf file containing only the true SNPs and a csv file containing the output scores of the trainer.
**SNP-MLer Usage:**
SNP-MLer [options] -iTP tp.vcf -iFP FP_-fp.vcf -o output_trainer
Arguments:
- -iTP: a vcf file containing the true SNPs.
- -iFP: a vcf file containing the false SNPs.
- -o: the name of the trainer; the trainer is saved under this name in the db folder.

Options:
- -skip: skips the listed attributes.
- -custom: disables all attributes and enables only the listed ones. Both -skip and -custom take a comma-delimited string of attributes from the following list: (lg,n1,n2,mq,af1,qual,lg,n1n2). The attributes are described in the publication.
- -m: enables the tree-bagging trainer.
- -add_new1, -add_new2: two csv files containing new attributes that the user wants to include. Each file must be a csv without a header or row names.
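The -add_new1 and -add_new2 options expect csv files with no header and no row names. The following is a minimal sketch of writing such a file with Python's standard `csv` module. The file name `new_attrs.csv`, the helper `write_attribute_csv`, the example values, and the one-row-per-SNP layout are assumptions made for illustration; they are not taken from the tool itself.

```python
import csv

def write_attribute_csv(path, rows):
    """Write rows as plain csv: no header line and no row-name column."""
    with open(path, "w", newline="") as handle:
        csv.writer(handle).writerows(rows)

# Hypothetical attribute values, assumed here to be one row per SNP.
rows = [[0.12, 3], [0.87, 1], [0.45, 2]]
write_attribute_csv("new_attrs.csv", rows)
```

The resulting file can then be passed as, for example, `-add_new1 new_attrs.csv`.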
**SNP-ML Usage:**
SNP-ML [options] -i input.vcf -iM trainer -o outputs
Arguments:
- -i: a vcf file containing the SNPs for evaluation.
- -iM: the name of the trainer; it is loaded from the db folder.
- -o: the name of the vcf and csv output files.

Options:
- -skip: skips the listed attributes.
- -custom: disables all attributes and enables only the listed ones. Both -skip and -custom take a comma-delimited string of attributes from the following list: (lg,n1,n2,mq,af1,qual,lg,n1n2). The attributes are described in the publication.
- -m: enables the tree-bagging trainer.
- -c: the cut-off for the neural network prediction (default 0.5).
- -add_new2: a csv file containing new attributes that the user wants to include. The file must be a csv without a header or row names.
NOTE: THE NUMBER OF TRAITS IN SNP-ML SHOULD BE THE SAME AS IN SNP-MLER. IF YOU SKIP, CUSTOMIZE OR ADD NEW PARAMETERS IN SNP-MLER, YOU HAVE TO USE THE SAME SETTINGS WITH SNP-ML.
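One way to keep the two tools in sync is to build both command lines from a single set of options. The sketch below only assembles argument lists using the flags documented above; it does not run anything. The helper names, the file names, and the assumption that both tools are invoked as `SNP-MLer` and `SNP-ML` are illustrative, and treating -m as a shared setting is likewise an assumption.

```python
def shared_options(skip=None, custom=None, tree_bagging=False):
    """Build the option list that must match between training and prediction."""
    opts = []
    if skip:
        opts += ["-skip", ",".join(skip)]      # comma-delimited attribute string
    if custom:
        opts += ["-custom", ",".join(custom)]
    if tree_bagging:
        opts.append("-m")                      # assumed to be shared as well
    return opts

def trainer_command(tp_vcf, fp_vcf, trainer, opts):
    """Argument list for SNP-MLer (hypothetical executable name)."""
    return ["SNP-MLer", *opts, "-iTP", tp_vcf, "-iFP", fp_vcf, "-o", trainer]

def predictor_command(input_vcf, trainer, output, opts):
    """Argument list for SNP-ML (hypothetical executable name)."""
    return ["SNP-ML", *opts, "-i", input_vcf, "-iM", trainer, "-o", output]

# The same options object feeds both commands, so the settings cannot drift.
opts = shared_options(skip=["mq", "qual"])
train_cmd = trainer_command("tp.vcf", "fp.vcf", "my_trainer", opts)
predict_cmd = predictor_command("input.vcf", "my_trainer", "outputs", opts)
print(" ".join(train_cmd))
print(" ".join(predict_cmd))
```

The lists could be passed to `subprocess.run` once the actual way of launching the tools on a given system is known.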