Home

Welcome to the GP3 wiki!

Below are some links that may be useful for getting started with an already installed pipeline, troubleshooting pipeline problems, and understanding the pipeline setup.

Quick Start Guide

Pipeline Setup Logic

Troubleshooting

Pipeline Arguments

Argument	Usage	Type	Default	Explanation
-inputPLINK	required	str ending in .bed/.ped	NA	Full path to the PLINK file, ending in .bed or .ped; whitespace characters are not allowed
-phenoFile	required	str	NA	Full path to the populated phenotype file (a filled-in copy of sample_sheet_template.xlsx)
--config	required only if the .json config file is not in chunky's .home directory	str	searches chunky's .home directory	Configuration file produced by running `chunky config run_GWAS_analysis_pipeline.py`
--outDir	optional	str	current working directory	Full path to an already existing directory or location where you would like GP3 to build the project
--projectName	optional	str	year-month-day-hour-min-sec	Name of the project to be created in the outDir location; whitespace characters are not allowed
--startStep	optional	str	hwe	Point of the pipeline where you would like to start analysis (see --endStep for step names); requires --reanalyze
--endStep	optional	str	PCA_indi_graph (without --TGP) or PCA_TGP_graph (with --TGP)	options: if --TGP is not set -> [hwe, LD, maf, het, ibd, PCA_indi, or PCA_indi_graph]; if --TGP is set -> [hwe, LD, maf, het, ibd, outlier_removal, PCA_TGP, or PCA_TGP_graph]. Point of the pipeline where you would like to stop analysis. This step is inclusive!
--hweThresh	optional	float	1e-6	Filters out SNPs whose Hardy-Weinberg equilibrium test p-value is below this threshold, since strong deviations from equilibrium are likely due to genotyping error
--LDmethod	optional	str	indep	options: [indep, indep-pairwise, or indep-pairphase]; method used to calculate linkage disequilibrium. See the PLINK documentation for more information.
--VIF	optional	int	2	variance inflation factor for the indep LD pruning method only; the indep-pairwise and indep-pairphase methods do not use VIF
--rsq	optional	float	0.50	any floating point number between 0.0 and 1.0; r-squared threshold for the indep-pairwise or indep-pairphase LD pruning methods; the indep method does not use rsq
--windowSize	optional	int	50	any integer; the window size in kb for LD analysis
--stepSize	optional	int	5	any integer; number of variants to shift the window after each iteration
--maf	optional	float	0.05	any floating point number between 0.0 and 1.0; the remaining LD-pruned variants are filtered by MAF, and any variant with a MAF below the threshold is removed
--hetMethod	optional	str	meanStd	options: minMax or meanStd; method used to filter samples on heterozygosity. minMax filters on the F inbreeding coefficient, using --hetThresh as the maximum and --hetThreshMin as the minimum (0.10 and -0.10 by default, respectively). meanStd calculates a het_score, 1 - [observed(HOM)/total], and filters out any sample that is more than 3 standard deviations from the mean het_score. The number of standard deviations can be changed with the --het_std parameter
--het_std	optional	int or float	3	any floating point number or integer; when using --hetMethod meanStd, sets how many standard deviations from the mean het_score are allowed. A value of 3 is interpreted as +/-3 standard deviations from the mean het_score, calculated as 1 - [observed(HOM)/total]
--hetThresh	optional	float	0.10	any floating point number; filters out samples whose inbreeding coefficient is greater than this threshold (heterozygosity filtering); only used when minMax is selected for --hetMethod
--hetThreshMin	optional	float	-0.10	any floating point number; filters out samples whose inbreeding coefficient is smaller than this minimum threshold (heterozygosity filtering); only used when minMax is selected for --hetMethod
--sampleMiss	optional	float	0.03	any floating point number between 0.0 and 1.0; maximum missingness of genotype calls in a sample before the sample is filtered out, where 0 is no missing calls and 1 is all missing (0.03 means 3 percent of SNP calls are missing in a sample)
--snpMiss	optional	float	0.03	any floating point number between 0.0 and 1.0; maximum missingness of genotype calls for a SNP before the SNP is filtered out, where 0 is no missing calls and 1 is all missing (0.03 means 3 percent of sample calls are missing for a SNP)
--TGP	optional	flag	NA	setting this flag generates PCA plots with 1000 Genomes Project (TGP) data for its 5 superpopulations (AFR, AMR, EAS, EUR, SAS) merged into the given cohort dataset
--centerPop	optional	str	myGroup	options: the literal string myGroup, or a TGP group merged into the input dataset; when using --TGP, you can specify which population the PCs should be centered around for boxplots. By default, this is your group(s) listed in the sample sheet. You can instead pick a TGP superpopulation listed in the TGP_Sub_and_SuperPopulation_info.txt file. CASE SENSITIVE!
--outliers	optional	str	None	A tab-delimited txt file of FID and IID, one sample per line, listing outliers to remove from the sample set (PCA outlier removal); use the original FID and IID, not the 1-n names used for GENESIS formatting
--pcmat	optional	int	5	any integer; number of predicted admixture populations in the dataset, used in the GENESIS PCA calculation
--reanalyze	optional	flag	NA	add this flag when passing a dataset that has already been partially or fully analyzed by this pipeline. WARNING! May overwrite existing data!! Required if using --startStep, or if using --endStep on an already existing project
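The interaction between --startStep, --endStep, --TGP, and --reanalyze can be summarized as a small step-selection routine. The sketch below is a hypothetical illustration, not GP3's actual code: the step names and their order come from the --endStep options above, it assumes --startStep accepts the same names, and the function select_steps, its parameters, and its error messages are invented for illustration.

```python
# Hypothetical sketch of GP3 step selection; not the pipeline's real code.
# Step order is taken from the --endStep options in the table above.
STEPS_NO_TGP = ["hwe", "LD", "maf", "het", "ibd", "PCA_indi", "PCA_indi_graph"]
STEPS_TGP = ["hwe", "LD", "maf", "het", "ibd",
             "outlier_removal", "PCA_TGP", "PCA_TGP_graph"]


def select_steps(start_step=None, end_step=None, tgp=False,
                 reanalyze=False, project_exists=False):
    """Return the ordered steps to run; both endpoints are inclusive.

    None for start_step/end_step means the argument was not given.
    """
    steps = STEPS_TGP if tgp else STEPS_NO_TGP

    # --reanalyze is required whenever --startStep is used, or when
    # --endStep is used on a project that already exists.
    if start_step is not None and not reanalyze:
        raise ValueError("--startStep requires --reanalyze")
    if end_step is not None and project_exists and not reanalyze:
        raise ValueError("--endStep on an existing project requires --reanalyze")

    start = start_step or steps[0]  # default: hwe
    end = end_step or steps[-1]     # default: PCA_indi_graph or PCA_TGP_graph
    for name in (start, end):
        if name not in steps:
            raise ValueError(f"unknown step {name!r}; choose from {steps}")

    i, j = steps.index(start), steps.index(end)
    if i > j:
        raise ValueError("--startStep must not come after --endStep")
    return steps[i:j + 1]  # +1 because --endStep is inclusive
```

For example, select_steps(end_step="het") returns ["hwe", "LD", "maf", "het"], because the end step is included in the run.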

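The two --hetMethod options can be sketched as follows. This is an illustrative sketch, not GP3's implementation: it assumes the per-sample counts come from PLINK's --het report (O(HOM) observed homozygous calls, N(NM) non-missing calls, and the F coefficient), treats N(NM) as the "total" in the het_score formula, and uses the sample standard deviation for meanStd, since GP3's exact choice is not documented here. The helper names are hypothetical.

```python
import statistics


def het_score(obs_hom, n_nonmissing):
    """het_score = 1 - O(HOM)/total, as described for --hetMethod meanStd."""
    return 1 - obs_hom / n_nonmissing


def mean_std_outliers(samples, het_std=3):
    """Return IDs whose het_score lies more than het_std SDs from the mean.

    `samples` maps a sample ID to its (O(HOM), N(NM)) counts.
    """
    scores = {sid: het_score(o, n) for sid, (o, n) in samples.items()}
    mean = statistics.mean(scores.values())
    sd = statistics.stdev(scores.values())  # sample SD (an assumption)
    return {sid for sid, s in scores.items() if abs(s - mean) > het_std * sd}


def min_max_outliers(f_coeffs, het_thresh=0.10, het_thresh_min=-0.10):
    """Return IDs whose F coefficient is above het_thresh or below het_thresh_min."""
    return {sid for sid, f in f_coeffs.items()
            if f > het_thresh or f < het_thresh_min}
```

With the default thresholds, min_max_outliers({"s1": 0.02, "s2": 0.15, "s3": -0.12}) flags s2 and s3, the samples above 0.10 and below -0.10.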