GitHub - ybdong919/3GenomeSNP_pipeline: A pipeline to separate NGS sequencing data into chloroplast, mitochondrial, and nuclear genomes, and then call SNPs in each of the three genomes.

Getting Started with 3GenomeSNP

Steps to Use 3GenomeSNP:

Familiarize yourself with 3GenomeSNP by reading Getting Started with 3GenomeSNP.txt (this file), which is included in the pipeline folder.
Install all of the required free software listed under Prerequisite, add the programs to your PATH, and test that each one works by typing its command at the prompt: minia, bowtie2, samtools, a BLAST+ program such as blastx, or perl.
Create a directory for the 3GenomeSNP pipeline and copy the whole pipeline into it.
Upload all FASTQ data into the subfolder "Input_data".
If needed, adjust the thresholds in the subfolder "Threshold_set": edit Pident_Plength.txt to change the parameters used to produce the output file NE_contigs.fasta, or edit Missing_threshold.txt to change how SNP sites with missing data are removed.
Start the pipeline by running the shell script 3GenomeSNP.sh: type ./3GenomeSNP.sh at the command prompt.
Fifteen output files are generated in the subfolder "Output_results" in the 3GenomeSNP directory.
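
The tool check and the final output check in the steps above can be done from the shell. The sketch below is one possible way to do it and is not part of the pipeline: missing_tools and count_outputs are hypothetical helper names, and the executable names passed to them (for example samtools for SAMtools) are assumptions about how the programs are installed on your system.

```shell
#!/bin/sh
# Hypothetical helpers; not part of the 3GenomeSNP pipeline.

# Print each named program that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Count the regular files directly inside a directory
# (default: Output_results in the current directory).
count_outputs() {
    dir=${1:-Output_results}
    n=0
    for f in "$dir"/*; do
        if [ -f "$f" ]; then
            n=$((n + 1))
        fi
    done
    printf '%s\n' "$n"
}

# Example use (commented out so the sketch has no side effects):
#   missing_tools minia bowtie2 samtools perl   # prints nothing if all are found
#   ./3GenomeSNP.sh
#   count_outputs Output_results                # the steps above mention fifteen files
```

If missing_tools prints any names, fix the installation or PATH for those programs before starting the pipeline.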

Prerequisite:

Minia (http://minia.genouest.org/). Extend k-mer length to 100 by typing: make clean && make k=100
Bowtie2 (http://bowtie-bio.sourceforge.net/bowtie2/index.shtml)
SAMtools (http://samtools.sourceforge.net/)
Perl in Linux (http://www.perl.org/get.html)
Fastx_collapser (http://hannonlab.cshl.edu/fastx_toolkit/). Download it to the same directory as Minia.
Blast+ (http://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastDocs&DOC_TYPE=Download)

Input files:

Paired-end Illumina sequencing data files in FASTQ format.
Two input files in the "Threshold_set" subfolder with adjustable parameters for the output files:
i) Pident_Plength.txt is used to identify the contigs located in nuclear exon regions. The parameters "Pident" and "Plength" are the percentage of identical matches and the alignment length, respectively. The default settings are 75% and 99%.
ii) Missing_threshold.txt is used to remove loci with the given level of missing observations or higher; normally 10-20%. The default setting is 0%. (Optional)
A protein database of 38 plant species, compressed as a tarball in the folder "Input_data".

Output files:

Nu_contigs.fasta consists of de novo assembly contigs from all samples as a reference for nuclear SNP genotyping.
Nu_SNP_genotypes.txt includes nuclear SNP genotype data after removing SNPs that show the same genotype in all samples and SNPs located within 20 bases of either end of a contig.
Nu_clean_SNP_genotypes.txt includes the nuclear SNP genotype data from Nu_SNP_genotypes.txt after removing SNPs with missing data.
Nu_SNP_hap.txt is unphased haplotype data corresponding to Nu_SNP_genotypes.txt.
Nu_clean_SNP_hap.txt includes the haplotype data from Nu_SNP_hap.txt after removing SNPs with missing data.
NE_contigs.fasta consists of de novo assembly contigs in nuclear exon regions from all samples as a reference for SNP genotyping in exon regions.
NE_contigs_information.txt consists of protein information associated with the contigs in nuclear exon regions.
NE_SNP_genotypes.txt includes SNP genotype data in exon regions after removing SNPs that show the same genotype in all samples and SNPs located within 20 bases of either end of a contig.
NE_clean_SNP_genotypes.txt includes the SNP genotype data from NE_SNP_genotypes.txt after removing SNPs with missing data.
NE_SNP_hap.txt is unphased haplotype data corresponding to NE_SNP_genotypes.txt.
NE_clean_SNP_hap.txt includes the haplotype data from NE_SNP_hap.txt after removing SNPs with missing data.
Cp_SNPs.txt includes SNP data for the chloroplast genome.
Cp_clean_SNPs.txt includes the SNP data from Cp_SNPs.txt after removing SNPs with missing data.
Mt_SNPs.txt includes SNP data for the mitochondrial genome.
Mt_clean_SNPs.txt includes the SNP data from Mt_SNPs.txt after removing SNPs with missing data.
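
Several of the files above depend on the two thresholds in "Threshold_set": the NE contigs are selected using Pident and Plength, and the "clean" files are produced by removing SNPs with missing data. As a rough illustration of these two kinds of filter (not the pipeline's own scripts), the sketch below assumes hypothetical tab-separated layouts and a hypothetical missing-call code "N". It also assumes that both Pident and Plength act as minimums, and that a locus is dropped when its percentage of missing calls exceeds the threshold, so that a threshold of 0 keeps only loci with no missing calls. The real file layouts and cutoff rules may differ.

```shell
#!/bin/sh
# Hypothetical sketch, not the pipeline's own code. The column layouts and
# the "N" missing-call code are assumptions made for illustration.

# Usage: filter_hits MIN_PIDENT MIN_PLENGTH < hits.tsv
# Input columns (assumed): contig name, Pident (%), Plength (%).
# Keeps hits that meet both minimums.
filter_hits() {
    awk -F '\t' -v pid="$1" -v len="$2" '$2 >= pid && $3 >= len'
}

# Usage: filter_missing MAX_MISSING_PERCENT < genotypes.tsv
# Input columns (assumed): locus name, then one genotype call per sample.
# Keeps loci whose percentage of missing calls is at most the threshold.
filter_missing() {
    awk -F '\t' -v max="$1" '
        NF > 1 {
            missing = 0
            for (i = 2; i <= NF; i++)
                if ($i == "N") missing++
            if (100 * missing / (NF - 1) <= max) print
        }
    '
}
```

With the default settings described above, these would be called as filter_hits 75 99 and filter_missing 0.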
