Chapter10 (#36)
* Edits to ch 10 based on 11/10 meeting

* Update vaccine example to include data from paper
amandakube committed Nov 30, 2022
1 parent 4f2ed50 commit 2b047ac
Showing 14 changed files with 115 additions and 30 deletions.
Binary file added textbook/10/1/Causation.png
Binary file added textbook/10/1/Collider.png
Binary file added textbook/10/1/Confounder.png
37 changes: 30 additions & 7 deletions textbook/10/1/causality.ipynb
@@ -14,30 +14,52 @@
"* **Common response** (confounding): some other variable *Z* causes change in both *X* and *Y*\n",
"* **Common outcome** (colliding): changes in both *X* and *Y* cause change in some variable *Z*\n",
"\n",
"Well-designed studies, which we will discuss further in the next section, can help distinguish between the three scenarios which are often depicted using causal graphs. A *causal graph* is a graph where each node depicts a variable and each edge is directed (an arrow) pointing in the direction of a cause. The figure below shows causal graphs as well as examples for all three scenarios.\n",
"Well-designed studies, which we will discuss further in the next section, can help distinguish between the three scenarios which are often depicted using causal graphs. A *causal graph* is a graph where each node depicts a variable and each edge is directed (an arrow) pointing in the direction of a cause. The figure below shows a causal graph as well as an examples of a *causal association*.\n",
"\n",
"```{figure} ./causality.png\n",
"```{figure} ./Causation.png\n",
"---\n",
"align: center\n",
"---\n",
"Three Types of Association\n",
"An example of a causal association\n",
"```\n",
"\n",
"The first panel above shows a *causal association*. When we see a causal association between *X* and *Y* we can depict it with an arrow from the cause to the effect. For example jumping in the lake is the direct cause of getting wet so the arrow is drawn from jumping in the lake to getting wet."
"When we see a causal association between *X* and *Y* we can depict it with an arrow from the cause to the effect. For example jumping in the lake is the direct cause of getting wet so the arrow is drawn from jumping in the lake to getting wet."
]
},
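To make the direct-cause scenario in this cell concrete, here is a minimal simulation sketch. It assumes NumPy, and the variable names and the 5% "wet from rain" baseline are invented for illustration: *Y* is generated directly from *X*, so the association we measure is causal by construction.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# X: whether a person jumps in the lake (hypothetical 50/50 split)
jumped_in_lake = rng.random(n) < 0.5

# Y is generated directly from X: jumping guarantees getting wet;
# a small invented baseline (rain) makes a few others wet too
wet = jumped_in_lake | (rng.random(n) < 0.05)

# Strong positive association, causal by construction
print("r(jumped, wet):", np.corrcoef(jumped_in_lake, wet)[0, 1])
```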
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The second panel of the causal graph figure shows a false association between *X* and *Y* (the dotted line) that is present due to a *confounding* variable, *Z*. *Conditioning on* a confounding variable is best practice to remove the false association between *X* and *Y*. Conditioning on a variable means looking at only one value of the conditioned variable. For example: suppose we have a dataset that contains information about beach events. We plot ice cream sales and shark attacks and see that there is a positive association such that as ice cream sales increase so do shark attacks. Should we conclude that ice cream attracts sharks? Thinking more deeply about the problem, we realize that shark attacks increase when the weather is warm because there are more people in the ocean. Ice cream sales also increase during warm weather, therefore both variables have a common cause, weather. When we condition on weather and only consider ice cream sales and shark attacks in the summer months, the association disappears."
"## Confounding Variables\n",
"\n",
"The causal graph figure shows below a false association between *X* and *Y* (depicted by the dotted line) that is present due to a *confounding* variable, *Z*. \n",
"\n",
"```{figure} ./Confounder.png\n",
"---\n",
"align: center\n",
"---\n",
"An example of confounding\n",
"```\n",
"\n",
"*Conditioning on* a confounding variable is best practice to remove the false association between *X* and *Y*. Conditioning on a variable means looking at only one value of the conditioned variable. For example: suppose we have a dataset that contains information about beach events. We plot ice cream sales and shark attacks and see that there is a positive association such that as ice cream sales increase so do shark attacks. Should we conclude that ice cream attracts sharks? Thinking more deeply about the problem, we realize that shark attacks increase when the weather is warm because there are more people in the ocean. Ice cream sales also increase during warm weather, therefore both variables have a common cause: weather. When we condition on weather and only consider ice cream sales and shark attacks in the summer months, the association disappears."
]
},
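The confounding mechanism in this new cell is easy to simulate. A minimal sketch, assuming NumPy; the temperatures and effect sizes below are invented, not taken from the chapter. Both variables are driven by temperature alone, so they correlate overall, and the correlation disappears once we condition on a narrow temperature band.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Z: daily temperature, the confounder (invented units and effect sizes)
temp = rng.normal(70, 15, n)

# X and Y each depend on temperature, but not on each other
ice_cream_sales = 2.0 * temp + rng.normal(0, 10, n)
shark_attacks = 0.05 * temp + rng.normal(0, 1, n)

# Unconditioned, a spurious positive association appears
print("overall r:", np.corrcoef(ice_cream_sales, shark_attacks)[0, 1])

# Condition on Z: restrict to one narrow slice of temperature
warm = (temp > 80) & (temp < 85)
print("conditioned r:", np.corrcoef(ice_cream_sales[warm], shark_attacks[warm])[0, 1])
```

The first correlation is strongly positive even though neither variable affects the other; within the conditioned slice it is near zero.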
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The last panel of the causal graph figure depicts an association between *X* and *Y* is due to the *collider* variable, *Z*. We see false associations between two variables *X* and *Y* when both are causes of a third variable *Z* and we are conditioning on *Z*[^**]. For example: looking only at hospitalized patient data (conditioning on being hospitalized), we see a negative association between diabetes and heart disease such that those who have diabetes are less likely to have heart disease. However, it is known that diabetes is a risk factor of heart disease – having diabetes makes you more likely to develop heart disease – so we should see the opposite effect. This reversal in association occurs because we are only looking at hospitalized patients and both heart disease and diabetes are causes of hospitalization. Diabetes increases likelihood of heart disease and likelihood of hospitalization. Heart disease increases likelihood of hospitalization as well. If you are hopitalized for diabetes, it is less likely you also have heart disease. Therefore, those with diabetes in this sample of hospitalized patients have lower incidence of heart disease than those with diabetes in the general population, reversing the association between diabetes and heart disease.\n",
"## Colliding Variables\n",
"\n",
"The next causal graph figure depicts an association between *X* and *Y* is due to conditioning on the *collider* variable, *Z*. \n",
"\n",
"```{figure} ./Collider.png\n",
"---\n",
"align: center\n",
"---\n",
"An example of colliding\n",
"```\n",
"\n",
"We see false associations between two variables *X* and *Y* when both are causes of a third variable *Z* and we are conditioning on *Z*[^**]. For example: looking only at hospitalized patient data (conditioning on being hospitalized), we see a negative association between diabetes and heart disease such that those who have diabetes are less likely to have heart disease. However, it is known that diabetes is a risk factor of heart disease[^***] – having diabetes makes you more likely to develop heart disease – so we should see the opposite effect. This reversal in association occurs because we are only looking at hospitalized patients and both heart disease and diabetes are causes of hospitalization. Diabetes increases likelihood of heart disease and likelihood of hospitalization. Heart disease increases likelihood of hospitalization as well. If you are hopitalized for diabetes, it is less likely you also have heart disease. Therefore, those with diabetes in this sample of hospitalized patients have lower incidence of heart disease than those with diabetes in the general population, reversing the association between diabetes and heart disease.\n",
"\n",
"Since colliding is a difficult concept to grasp, consider another example. Suppose your friend is complaining about a recent date. The person she went to dinner with was very good-looking but had no sense of humor. Your friend comments that it seems all good-looking people have a bad sense of humor. You know that in reality looks and humor are not related. Your friend is conditioning on a collider by considering only people that she dates. She likely only dates people that meet a certain threshold of looks and humor. Those that are very good-looking don't need to have as good of a sense of humor to get a date whereas those who are less good-looking must have a better sense of humor. This creates a negative association between looks and humor that does not exist outside of her dating pool."
]
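The hospital example can be simulated the same way. A minimal sketch, assuming NumPy; the disease rates and hospitalization probabilities are invented for illustration. The two conditions are independent in the population, yet a negative association appears once we condition on the collider (being hospitalized).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# X and Y: independent in the full population (invented 10% rates)
diabetes = rng.random(n) < 0.10
heart_disease = rng.random(n) < 0.10

# Z: each condition independently raises the chance of hospitalization
p_hospital = 0.02 + 0.30 * diabetes + 0.30 * heart_disease
hospitalized = rng.random(n) < p_hospital

# No association in the general population...
print("population r:", np.corrcoef(diabetes, heart_disease)[0, 1])

# ...but a negative one among hospitalized patients only
print("hospitalized-only r:",
      np.corrcoef(diabetes[hospitalized], heart_disease[hospitalized])[0, 1])
```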
@@ -48,7 +70,8 @@
"source": [
"\n",
"[^*]: An association is often referred to as a correlation. Correlations are discussed in more detail in Chapter 17.\n",
"[^**]: A more thorough discussion of colliders is beyond the scope of this book, but interested readers are referred to *The Book of Why* by Judea Pearl and Dana Mackenzie."
"[^**]: A more thorough discussion of colliders is beyond the scope of this book, but interested readers are referred to *The Book of Why* by Judea Pearl and Dana Mackenzie.\n",
"[^***]: Glovaci D, Fan W, Wong ND. Epidemiology of Diabetes Mellitus and Cardiovascular Disease. Curr Cardiol Rep. 2019 Mar 4;21(4):21. doi: 10.1007/s11886-019-1107-y. PMID: 30828746"
]
}
],
Binary file removed textbook/10/1/causality.png