GWAS Pipeline

This is the standard GWAS pipeline. Please notice that your GWAS sites is in buid37 coordinate system, if it is not the case, you can use the UCSC liftOver tool to perform the conversion to build37 system. Please also notice it is prefer that your GWAS dataset is forward strand, otherwise the SNPs which found in both reference panel and target panel that have an incompatible alleles types will be removed during the procedure of strand alignment check in this pipeline.

Requirement:

please download tools to the folder "software":

plink (please download the latest plink1.9, otherwise you will waste a lot of time!!!)
shapeit
impute2

other pre-requirements:

a cluster or you need a server with many cores
genotyped data(plink format, transform to bed/bim/fam before running the pipeline if your genotyped data is ped/map)
enough storage since very large files will be created during imputation procedure
R 3.4.3;gawk;python3.6.3

Procedures

1.Git clone this repository

git clone https://github.com/verasiwei/GWAS_python

2.users to edit the directory in config.py
3.On your terminal, cd to the folder "pythonscript"

$ python quality_control.py

Notice:

1.If IDs are not matched in genotype and phenotype data,you should provide "mylist.txt" file.

2.Answer each question on the screen during the process, do not need to include quotation markers

4.On your terminal, cd to the folder "pythonscript"

$ python imputation.py

Notice:

1.You can choose whether to use 1000 Genome Phase1 or Phase3 as the reference panel during this process, just answer the "Phase1" or "Phase3" when the question promted on the screen

2.It may takes a long time if you want to impute the whole chromosomes, all chromosomes are separated into more than 500 chunks totally with each chunk 5MB according to the physical position, also you need to have enough storage to save the outputs!!!

5.On your terminal, cd to the folder "pythonscript", a manhattan plot and a qqplot shown like the below will be created.

$ python post_imputation.py

Notice:

1.You should change the column of phenotype in the .fam file and provide a file of "mycovpca.txt" which includes the adjusted covariates like age, sex and PCs by yourself.

After running post_imputation.py, answer every question on the screen and you will get a manhattan plot and an annotation file which annotates the significant snps

You can comment or uncomment each function when you only want to do specific jobs like generating prs score or something else.

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
.vscode		.vscode
input		input
output		output
pythonscript		pythonscript
rawdata		rawdata
reference		reference
result		result
rscript		rscript
README.md		README.md
demo.py		demo.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

.vscode

.vscode

input

input

output

output

pythonscript

pythonscript

rawdata

rawdata

reference

reference

result

result

rscript

rscript

README.md

README.md

demo.py

demo.py

Repository files navigation

GWAS Pipeline

Requirement:

Procedures

About

Releases

Packages

Languages

verasiwei/Plink-Based-GWAS

Folders and files

Latest commit

History

Repository files navigation

GWAS Pipeline

Requirement:

Procedures

About

Resources

Stars

Watchers

Forks

Languages