|
564 | 564 | "source": [
|
565 | 565 | "Your data will often deviate from a normal distribution (sometimes drastically, like Cadmium Chloride shown above).\n",
|
566 | 566 | "However, one of the assumptions of the model that we use in GWAS is that the residuals are normally distributed.\n",
|
567 |
| - "Violations of this assumption can result in model misspecification and thus biased parameter estimates." |
| 567 | + "Violations of this assumption can result in model misspecification and biased parameter estimates." |
568 | 568 | ]
|
569 | 569 | },
|
570 | 570 | {
|
|
578 | 578 | "cell_type": "markdown",
|
579 | 579 | "metadata": {},
|
580 | 580 | "source": [
|
581 |
| - "There are a wide variety of methods to stabilize variance and make data normally distributed. Here, we explore the usefulness of the Box-Cox transformation as well as a (non-parametric) rank-based transformation." |
| 581 | + "There are a wide variety of methods to stabilize variance and make data normally distributed. Here, we explore the Box-Cox transformation as well as a (non-parametric) rank-based transformation." |
582 | 582 | ]
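As a sketch of what the Box-Cox step might look like in code (the phenotype vector `y` here is simulated, not the notebook's data; Box-Cox requires strictly positive values):

```python
import numpy as np
from scipy import stats

# Simulated right-skewed phenotype (hypothetical data).
rng = np.random.default_rng(0)
y = rng.lognormal(mean=0.0, sigma=0.8, size=500)

# scipy estimates the lambda that maximizes the log-likelihood of the
# transformed data under a normal model.
y_bc, lam = stats.boxcox(y)

# The transformed values should be closer to normal; compare skewness.
print(stats.skew(y), stats.skew(y_bc))
```

Because the data above are lognormal, the estimated `lam` should land near 0, where Box-Cox reduces to a log transform.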
|
583 | 583 | },
|
584 | 584 | {
|
|
615 | 615 | "cell_type": "markdown",
|
616 | 616 | "metadata": {},
|
617 | 617 | "source": [
|
618 |
| - "The rank transformation normalizes the data by converting the data to ranks and then transforming these ranks to the corresponding quantiles of a normal distribution. Because this transformation does not rely on a parameter (or actually one parameter per sample, namely the normal quantile), it is called non-parametric.\n", |
| 618 | + "The rank transformation normalizes the data by converting it to ranks and then transforming these ranks to the corresponding quantiles of a normal distribution. Because this transformation does not rely on or specify a parameter, it is considered non-parametric.\n", |
619 | 619 | "\n",
|
620 | 620 | "Before using a rank-based transformation, you should consider whether other models (e.g. the binomial model) are more appropriate for your data."
|
621 | 621 | ]
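The rank-based inverse normal transform described above can be sketched as follows (the input vector and the helper `rank_inverse_normal` are illustrative, not from the notebook; the offset `c = 3/8` is the Blom constant, one common convention for rescaling ranks):

```python
import numpy as np
from scipy import stats

# Hypothetical skewed phenotype vector.
rng = np.random.default_rng(1)
y = rng.exponential(scale=2.0, size=200)

def rank_inverse_normal(x, c=3.0 / 8.0):
    """Rank the values, rescale ranks into (0, 1), and map them
    through the normal quantile function."""
    ranks = stats.rankdata(x)
    quantiles = (ranks - c) / (len(x) - 2.0 * c + 1.0)
    return stats.norm.ppf(quantiles)

y_rint = rank_inverse_normal(y)
# The result is approximately standard normal regardless of the
# input distribution.
print(stats.skew(y_rint))
```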
|
|
831 | 831 | "cell_type": "markdown",
|
832 | 832 | "metadata": {},
|
833 | 833 | "source": [
|
834 |
| - "Next, we convert the P-values to a pandas DataFrame:" |
| 834 | + "Next, we convert the P-values into a pandas DataFrame:" |
835 | 835 | ]
|
836 | 836 | },
|
837 | 837 | {
|
|
1532 | 1532 | "cell_type": "markdown",
|
1533 | 1533 | "metadata": {},
|
1534 | 1534 | "source": [
|
1535 |
| - "False discovery rates (FDR) give an idea of the expected type-1 error rate at a given P-value threshold. If we are testing millions of hypotheses, then we might be willing to accept type-1 errors at a given rate, if in return we get more discoveries.\n", |
| 1535 | + "False discovery rates (FDR) give an idea of the expected type-1 error rate at a given *P*-value threshold. This measure gives a useful alternative to traditional Bonferroni correction, which bounds the so-called family-wise error rate (FWER), namely the probability of having at least a single type-1 error.\n", |
1536 | 1536 | "\n",
|
1537 |
| - "This measure gives a useful alternative to traditional Bonferroni correction, which bounds the so-called family-wise error rate (FWER), namely the probability of having at least a single type 1 error." |
| 1537 | + "That is, a *P*-value threshold is the rate at which truly null hypotheses are called significant.\n", |
| 1538 | + "\n", |
| 1539 | + "The FDR is the rate at which significant results are truly null. So an FDR of 5% means that, among all of the features that are called significant, 5% are expected to be false positives." |
1538 | 1540 | ]
|
1539 | 1541 | },
|
1540 | 1542 | {
|
|
1548 | 1550 | "cell_type": "markdown",
|
1549 | 1551 | "metadata": {},
|
1550 | 1552 | "source": [
|
1551 |
| - "Definition: minimum false discovery rate threshold that would allow the variable to be significant." |
| 1553 | + "These are, like _P_ values, a measure of significance for a given test. \n", |
| 1554 | + "\n", |
| 1555 | + "In practice, if one is willing to accept results at a given $q$ value (e.g. 0.03), then among the results with a lower $q$ value, a fraction $q$ of those are expected to be false positives (3% in this example)." |
1552 | 1556 | ]
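A minimal sketch of the Benjamini-Hochberg step-up procedure that produces such FDR-adjusted values (the `bh_adjust` helper is hypothetical, not a function used in the notebook):

```python
import numpy as np

def bh_adjust(pvals):
    """Benjamini-Hochberg FDR-adjusted P-values (step-up, monotone)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Raw BH values: p_(i) * m / i for the i-th smallest P-value.
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity from the largest P-value downwards.
    adj = np.minimum.accumulate(ranked[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.clip(adj, 0.0, 1.0)
    return out

# A feature is significant at FDR level q if its adjusted value is <= q.
print(bh_adjust([0.01, 0.02, 0.03, 0.5]))
```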
|
1553 | 1557 | },
|
1554 | 1558 | {
|
|
1638 | 1642 | {
|
1639 | 1643 | "data": {
|
1640 | 1644 | "text/plain": [
|
1641 |
| - "<matplotlib.legend.Legend at 0x1a24e668d0>" |
| 1645 | + "<matplotlib.legend.Legend at 0x1a23b30290>" |
1642 | 1646 | ]
|
1643 | 1647 | },
|
1644 | 1648 | "execution_count": 25,
|
|