
Commit a500145

small text things, cleaned up FDR/q-value defs
1 parent 648599c commit a500145

File tree

3 files changed

+66
-62
lines changed


Lecture-2-Linear-Regression.ipynb

Lines changed: 12 additions & 8 deletions
@@ -564,7 +564,7 @@
 "source": [
 "Your data will often deviate from a normal distribution (sometimes drastically, like Cadmium Chloride shown above).\n",
 "However, one of the assumptions of the model that we use in GWAS is that the residuals are normally distrbuted.\n",
-"Violations of this assumption can result in model misspecification and thus biased parameter estimates."
+"Violations of this assumption can result in model misspecification and biased parameter estimates."
 ]
 },
 {
@@ -578,7 +578,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"There are a wide variety of methods to stabilize variance and make data normally distributed. Here, we explore the usefulness of the Box-Cox transformation as well as a (non-parametric) rank-based transformation."
+"There are a wide variety of methods to stabilize variance and make data normally distributed. Here, we explore the Box-Cox transformation as well as a (non-parametric) rank-based transformation."
 ]
 },
 {
@@ -615,7 +615,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"The rank transformation normalizes the data by converting the data to ranks and then transforming these ranks to the corresponding quantiles of a normal distribution. Because this transformation does not rely on a parameter (or actually one parameter per sample, namely the normal quantile), it is called non-parametric.\n",
+"The rank transformation normalizes the data by converting the data to ranks and then transforming these ranks to the corresponding quantiles of a normal distribution. Because this transformation does not rely on or specify a parameter it is considered non-parametric.\n",
 "\n",
 "Before using a rank-based transformation, you should consider whether other models (e.g. the binomial model) are more appropriate for your data."
 ]
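
Note on the hunk above: the rank transformation it describes (data to ranks, ranks to normal quantiles) can be sketched in a few lines. This is only an illustration under stated assumptions, not the notebook's own code: the function name `rank_inverse_normal`, the `rank / (n + 1)` quantile offset, and the average-rank treatment of ties are all choices made here, and the sample values are made up.

```python
from statistics import NormalDist


def rank_inverse_normal(values):
    """Map each value to the standard-normal quantile of its rank.

    Assumptions of this sketch: ties share their average rank, and ranks are
    scaled by (n + 1) so every quantile lies strictly between 0 and 1.
    """
    n = len(values)
    # Indices of the observations, ordered from smallest to largest value.
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        # Extend the block [start, end] over all observations tied with it.
        end = start
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1  # ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf(rank / (n + 1)) for rank in ranks]


# Hypothetical, strongly skewed measurements (illustrative values only).
skewed = [0.1, 50.0, 0.4, 3.0, 0.2]
transformed = rank_inverse_normal(skewed)
```

Only the ordering of the input matters: the extreme value 50.0 ends up no further from the centre than the smallest value does, which is why the transformation tames heavy tails but also discards information about the original scale.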
@@ -831,7 +831,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Next, we convert the P-values to a pandas DataFrame:"
+"Next, we convert the P-values into a pandas DataFrame:"
 ]
 },
 {
@@ -1532,9 +1532,11 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"False discovery rates (FDR) give an idea of the expected type-1 error rate at a given P-value threshold. If we are testing millions of hypotheses, then we might be willing to accept type-1 errors at a given rate, if in return we get more discoveries.\n",
+"False discovery rates (FDR) give an idea of the expected type-1 error rate at a given *P*-value threshold. This measure gives a useful alternative to traditional Bonferroni correction, which bounds the so-called family-wise error rate (FWER), namely the probability of having at least a single type-1 error.\n",
 "\n",
-"This measure gives a useful alternative to traditional Bonferroni correction, which bounds the so-called family-wise error rate (FWER), namely the probability of having at least a single type 1 error."
+"That is, a *P* value is the rate at which truly null hypotheses are called significant.\n",
+"\n",
+"The FDR is the rate that which significant results are truly null. So an FDR rate of 5% means that - among all of the features that are called significant - 5% of these will be false positives."
 ]
 },
 {
@@ -1548,7 +1550,9 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Definition: minimum false discovery rate threshold that would allow the variable to be significant."
+"These are, like _P_ values, a measure of significance for a given test. \n",
+"\n",
+"In practice, if one is willing to accept a result at an FDR of the given $q$ value (e.g. 0.03) then among the results that have a lower $q$ value, $q$ percent of those will be false positives (3% in this example)."
 ]
 },
 {
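
Note on the two FDR hunks above: the removed definition ("minimum false discovery rate threshold that would allow the variable to be significant") can be made concrete with a small sketch. The diff does not show how the notebook computes its q-values, so the Benjamini-Hochberg step-up procedure used here is an assumption, as are the function name `benjamini_hochberg_qvalues` and the made-up P-values. A Bonferroni cut-off is included only for comparison with the FWER-based correction the text mentions.

```python
def benjamini_hochberg_qvalues(p_values):
    """Return Benjamini-Hochberg q-values in the same order as the input.

    The q-value of the test with the i-th smallest P-value (out of m tests)
    is the minimum of p * m / rank over all ranks >= i, capped at 1.
    """
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest P-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so the running minimum keeps the
    # q-values monotone in the P-values.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        q_values[index] = running_min
    return q_values


# Hypothetical P-values from five tests (illustrative values only).
p_values = [0.6, 0.001, 0.03, 0.008, 0.02]
q_values = benjamini_hochberg_qvalues(p_values)

alpha = 0.05
# Discoveries when controlling the FDR at alpha via q-values.
fdr_discoveries = sum(q <= alpha for q in q_values)
# Discoveries when controlling the FWER at alpha via Bonferroni.
bonferroni_discoveries = sum(p <= alpha / len(p_values) for p in p_values)
```

With these inputs the FDR-based cut-off keeps more tests than the Bonferroni cut-off, which is the trade-off the text describes: some expected false positives among the discoveries in exchange for more discoveries.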
@@ -1638,7 +1642,7 @@
 {
 "data": {
 "text/plain": [
-"<matplotlib.legend.Legend at 0x1a24e668d0>"
+"<matplotlib.legend.Legend at 0x1a23b30290>"
 ]
 },
 "execution_count": 25,
