|
22 | 22 | "cell_type": "markdown",
|
23 | 23 | "metadata": {},
|
24 | 24 | "source": [
|
25 |
| - "#### Set up environment" |
| 25 | + "### Set up the environment" |
26 | 26 | ]
|
27 | 27 | },
|
28 | 28 | {
|
|
50 | 50 | "cell_type": "markdown",
|
51 | 51 | "metadata": {},
|
52 | 52 | "source": [
|
53 |
| - "#### Download data" |
| 53 | + "### Load the data" |
54 | 54 | ]
|
55 | 55 | },
|
56 | 56 | {
|
57 | 57 | "cell_type": "markdown",
|
58 | 58 | "metadata": {},
|
59 | 59 | "source": [
|
60 |
| - "We here load the yeast cross dataset.\n", |
61 |
| - "The data used in this study have been preconverted into an hdf5 file. \n", |
| 60 | + "Here, we load the yeast cross dataset.\n", |
| 61 | + "The data used in this study have already been converted into an HDF5 file.\n", |
62 | 62 | "To process your own data, please use the limix command line binary (see [here](http://nbviewer.jupyter.org/github/limix/limix-tutorials/blob/master/preprocessing_QC/loading_files.ipynb))."
|
63 | 63 | ]
|
64 | 64 | },
|
|
81 | 81 | "cell_type": "markdown",
|
82 | 82 | "metadata": {},
|
83 | 83 | "source": [
|
84 |
| - "#### Set up data object" |
| 84 | + "### Set up the data object" |
85 | 85 | ]
|
86 | 86 | },
|
87 | 87 | {
|
88 | 88 | "cell_type": "markdown",
|
89 | 89 | "metadata": {},
|
90 | 90 | "source": [
|
91 |
| - "Phenotypes and genotypes are stored inside the HDF5 file. Load them into a dataframe and select the first 3 phenotypes. " |
| 91 | + "Both the phenotypes and the genotypes are stored inside an HDF5 file. Load the data into a dataframe; here, we focus on the first 3 phenotypes. " |
92 | 92 | ]
|
93 | 93 | },
|
94 | 94 | {
|
|
462 | 462 | "cell_type": "markdown",
|
463 | 463 | "metadata": {},
|
464 | 464 | "source": [
|
465 |
| - "## Normal distributed phenotypes and phenotype transformations" |
| 465 | + "## Check the model assumptions (are the data normal?)" |
466 | 466 | ]
|
467 | 467 | },
|
468 | 468 | {
|
469 | 469 | "cell_type": "markdown",
|
470 | 470 | "metadata": {},
|
471 | 471 | "source": [
|
472 |
| - "To explore the phenotypic data, we create a histogram of the phenotype values." |
| 472 | + "Here, we use histograms to look at the distributions of the phenotypes." |
473 | 473 | ]
|
474 | 474 | },
|
475 | 475 | {
|
|
562 | 562 | "cell_type": "markdown",
|
563 | 563 | "metadata": {},
|
564 | 564 | "source": [
|
565 |
| - "Some of the phenotypes deviate from a normal distribution.\n", |
566 |
| - "One of the assumptions of the linear regression model we use for association testing is that the model residuals are normal distrbuted.\n", |
567 |
| - "Violation of this assumption leads to biases in the analysis.\n", |
568 |
| - "We only have access to the residuals after fitting the model.\n", |
569 |
| - "Under the assumption that the model eplains only a small portion of phenotypic variation we can assess the phenotype values instead." |
| 565 | + "Your data will often deviate from a normal distribution (sometimes drastically, like Cadmium Chloride shown above).\n", |
| 566 | + "Unfortunately, one of the assumptions of the model that we use in GWAS is that the residuals are normally distributed.\n", |
| 567 | + "Violations of this assumption can result in model misspecification and thus biased parameter estimates." |
570 | 568 | ]
|
571 | 569 | },
|
572 | 570 | {
|
573 | 571 | "cell_type": "markdown",
|
574 | 572 | "metadata": {},
|
575 | 573 | "source": [
|
576 |
| - "### Transforming phenotypes" |
| 574 | + "### Variance stabilizing transformations; standardizing the phenotypes" |
577 | 575 | ]
|
578 | 576 | },
|
579 | 577 | {
|
580 | 578 | "cell_type": "markdown",
|
581 | 579 | "metadata": {},
|
582 | 580 | "source": [
|
583 |
| - "To make the data look more normal distrbuted, we apply two different phenotype transformations, the Box-Cox transformation and a non-parametric rank-based transformation." |
| 581 | + "There are a wide variety of methods to stabilize variance and make data normally distributed. Here, we explore the usefulness of the Box-Cox transformation as well as a (non-parametric) rank-based transformation." |
584 | 582 | ]
|
585 | 583 | },
|
586 | 584 | {
|
|
594 | 592 | "cell_type": "markdown",
|
595 | 593 | "metadata": {},
|
596 | 594 | "source": [
|
597 |
| - "The Box-Cox transformation makes the data \"more normal\" by fitting a power transformation with one parameter to the observed phenotypic data." |
| 595 | + "The Box-Cox transformation makes the data \"more normal\" by fitting a power transformation ($y^{\\lambda}$, where $\\lambda$ is found using maximum likelihood) to the observed phenotypic data." |
598 | 596 | ]
|
599 | 597 | },
|
600 | 598 | {
|
|
926 | 924 | "cell_type": "markdown",
|
927 | 925 | "metadata": {},
|
928 | 926 | "source": [
|
929 |
| - "#### Manhattan plot" |
| 927 | + "### Plotting the results" |
930 | 928 | ]
|
931 | 929 | },
|
932 | 930 | {
|
933 | 931 | "cell_type": "markdown",
|
934 | 932 | "metadata": {},
|
935 | 933 | "source": [
|
936 |
| - "A common way to visualize the results of a GWAS is a so-called Manhattan plot, where the $-log_{10}$ P-values are plotted against the genomic position.\n", |
| 934 | + "A common way to visualize the results from GWAS is by using a so-called Manhattan plot, where the $-log_{10}$ P-values are plotted against the genomic position.\n", |
937 | 935 | "\n",
|
938 | 936 | "The LIMIX function for producing Manhattan plots is ``limix.plot.plot_manhattan`` (see [here][1]).\n",
|
939 | 937 | "\n",
|
|
1026 | 1024 | "cell_type": "markdown",
|
1027 | 1025 | "metadata": {},
|
1028 | 1026 | "source": [
|
1029 |
| - "##### GWAS using linear regression on the transformed phenotypes:" |
| 1027 | + "### Conducting GWAS with the transformed phenotypes:" |
1030 | 1028 | ]
|
1031 | 1029 | },
|
1032 | 1030 | {
|
1033 | 1031 | "cell_type": "markdown",
|
1034 | 1032 | "metadata": {},
|
1035 | 1033 | "source": [
|
1036 |
| - "First we analyze the Box-Cox transformed phenotypes." |
| 1034 | + "First we perform GWAS with the Box-Cox transformed phenotypes." |
1037 | 1035 | ]
|
1038 | 1036 | },
|
1039 | 1037 | {
|
|
1059 | 1057 | "cell_type": "markdown",
|
1060 | 1058 | "metadata": {},
|
1061 | 1059 | "source": [
|
1062 |
| - "Next, we analyze the phenotypes transformed by the rank-based transformation." |
| 1060 | + "Next, we investigate the rank-transformed phenotypes." |
1063 | 1061 | ]
|
1064 | 1062 | },
|
1065 | 1063 | {
|
|
1085 | 1083 | "cell_type": "markdown",
|
1086 | 1084 | "metadata": {},
|
1087 | 1085 | "source": [
|
1088 |
| - "To compare the results of the various transformations, we plot the p-values against one another:" |
| 1086 | + "To compare the results of the transformations, we can plot the p-values against one another:" |
1089 | 1087 | ]
|
1090 | 1088 | },
|
1091 | 1089 | {
|
|
1641 | 1639 | },
|
1642 | 1640 | {
|
1643 | 1641 | "cell_type": "code",
|
1644 |
| - "execution_count": 29, |
| 1642 | + "execution_count": 25, |
1645 | 1643 | "metadata": {},
|
1646 | 1644 | "outputs": [
|
1647 | 1645 | {
|
1648 | 1646 | "data": {
|
1649 | 1647 | "text/plain": [
|
1650 |
| - "<matplotlib.legend.Legend at 0x1a5bb55d10>" |
| 1648 | + "<matplotlib.legend.Legend at 0x1a24e668d0>" |
1651 | 1649 | ]
|
1652 | 1650 | },
|
1653 |
| - "execution_count": 29, |
| 1651 | + "execution_count": 25, |
1654 | 1652 | "metadata": {},
|
1655 | 1653 | "output_type": "execute_result"
|
1656 | 1654 | },
|
|
1716 | 1714 | "covars_conditional= sp.concatenate((geno_df.loc[sample_idx].values[:,imax:imax+1], sp.ones((phenotype_vals.values.shape[0],1))),1)\n",
|
1717 | 1715 | " \n",
|
1718 | 1716 | "\n",
|
1719 |
| - "#run linear regression on each SNP\n", |
| 1717 | + "#run linear regression on each SNP, while conditioning on the top SNP as a covariate.\n", |
1720 | 1718 | "lm_conditional = qtl_test_lm(snps=geno_df.loc[sample_idx].values,pheno=phenotype_vals.values,covs=covars_conditional)\n",
|
1721 | 1719 | "\n",
|
1722 |
| - "#convert P-values to a DataFrame for nice output writing:\n", |
| 1720 | + "#convert P-values to a pandas DataFrame:\n", |
1723 | 1721 | "pvalues_lm_conditional = pd.DataFrame(data=lm_conditional.pvalues.T,index=positions,\n",
|
1724 | 1722 | " columns=phenotype_ID)"
|
1725 | 1723 | ]
|
1726 | 1724 | },
|
1727 | 1725 | {
|
1728 | 1726 | "cell_type": "code",
|
1729 |
| - "execution_count": 30, |
| 1727 | + "execution_count": 27, |
1730 | 1728 | "metadata": {},
|
1731 | 1729 | "outputs": [
|
1732 | 1730 | {
|
|
0 commit comments