## 1. The Discovery of Handwashing
<p>In the mid-1800s, Dr. Ignaz Semmelweis was an obstetrician at Vienna General Hospital. Maternal death from puerperal fever was common at the time, but he was particularly concerned that the death rate in his clinic (Clinic 1) was much higher than the death rate in the hospital's other maternity clinic (Clinic 2). <em>So what was the difference between these two clinics?</em> Doctors and midwives worked in Clinic 1, while only midwives worked in Clinic 2. This led Dr. Semmelweis to hypothesize that doctors carried deadly "cadaverous particles" from their autopsies to their patients in Clinic 1.</p>
<p>In 1847, Dr. Semmelweis instituted a policy requiring doctors to wash their hands with a chlorine solution between performing autopsies and seeing patients. As the plot below shows, the maternal mortality rate dropped sharply afterward. Sadly, germ theory (the idea that microorganisms cause disease) was not widely accepted at the time, so most doctors rejected his hypothesis.</p>
<p><img src="https://assets.datacamp.com/production/project_1187/img/semmelweis_plot.png" alt="Line plot of maternal mortality rate in Clinic 1 at Vienna General Hospital" width="600px"></p>
<p>The two datasets you will use are from Dr. Semmelweis's original 1859 publication<sup>1</sup>. Here are the details:</p>
<div style="background-color: #efebe4; color: #05192d; text-align:left; vertical-align: middle; padding: 15px 25px 15px 25px; line-height: 1.6;">
<div style="font-size:20px"><b>datasets/clinic_data.csv</b></div>
This contains yearly clinic-level data on births and maternal deaths in each of the two maternity clinics at Vienna General Hospital.
<ul>
<li><b><code>year</code>:</b> each year from 1833 to 1858</li>
<li><b><code>births</code>:</b> total number of births in the clinic</li>
<li><b><code>deaths</code>:</b> number of maternal deaths in the clinic</li>
<li><b><code>clinic</code>:</b> clinic (either <code>clinic_1</code> or <code>clinic_2</code>). Doctors and midwives worked in Clinic 1, while only midwives worked in Clinic 2.</li>
</ul>
</div>
<div style="background-color: #efebe4; color: #05192d; text-align:left; vertical-align: middle; padding: 15px 25px 15px 25px; line-height: 1.6;">
<div style="font-size:20px"><b>datasets/hospital_data.csv</b></div>
This contains yearly hospital-level data on births and maternal deaths. 
<ul>
<li><b><code>year</code>:</b> each year from 1784 to 1848</li>
<li><b><code>births</code>:</b> total number of births at the hospital</li>
<li><b><code>deaths</code>:</b> number of maternal deaths at the hospital</li>
<li><b><code>hospital</code>:</b> hospital (either <code>Vienna</code> or <code>Dublin</code>). At the Vienna General Hospital where Dr. Semmelweis worked, doctors began performing pathological autopsies in 1823. At the Dublin Rotunda Hospital, doctors did not perform pathological autopsies at all.</li>
</ul>
</div>
<p><small><sup>1</sup><a href="http://graphics8.nytimes.com/images/blogs/freakonomics/pdf/the%20etiology,%20concept%20and%20prophylaxis%20of%20childbed%20fever.pdf">Ignaz Semmelweis: The etiology, concept, and prophylaxis of childbed fever.</a></small></p>
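The column descriptions above effectively define a schema for each file. Checking the loaded data against it can catch problems such as a misspelled clinic label or an out-of-range year before any rates are computed. The following is a minimal base-R sketch of such a check. `check_schema()` is a hypothetical helper written for illustration, and the toy input copies the first six rows of `clinic_data.csv` as printed in this notebook.

```r
# Hypothetical helper: checks that a data frame matches the column layout
# described for datasets/clinic_data.csv or datasets/hospital_data.csv.
check_schema <- function(df, group_col, allowed_groups, first_year, last_year) {
  required <- c("year", "births", "deaths", group_col)
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("missing columns: ", paste(missing_cols, collapse = ", "))
  }
  stopifnot(
    all(df$year >= first_year & df$year <= last_year),  # documented year range
    all(df[[group_col]] %in% allowed_groups),           # only known labels
    all(df$births > 0),                                  # rates need births > 0
    all(df$deaths >= 0 & df$deaths <= df$births)         # plausible counts
  )
  invisible(TRUE)
}

# Toy input: the first six rows of clinic_data.csv as printed in this notebook
clinic_sample <- data.frame(
  year = 1833:1838,
  births = c(3737, 2657, 2573, 2677, 2765, 2987),
  deaths = c(197, 205, 143, 200, 251, 91),
  clinic = "clinic_1",
  stringsAsFactors = FALSE
)

check_schema(clinic_sample, "clinic", c("clinic_1", "clinic_2"), 1833, 1858)
```

On the full data, the equivalent calls would be `check_schema(clinic_data, "clinic", c("clinic_1", "clinic_2"), 1833, 1858)` and `check_schema(hospital_data, "hospital", c("Vienna", "Dublin"), 1784, 1848)`. The helper stops at the first failed check rather than collecting every problem, which keeps it short.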

# Load the data
clinic_data <- read.csv("datasets/clinic_data.csv")
hospital_data <- read.csv("datasets/hospital_data.csv")


library(dplyr)

head(clinic_data)
head(hospital_data)

|   | year `<int>` | births `<int>` | deaths `<int>` | clinic `<chr>` |
|---|---|---|---|---|
| 1 | 1833 | 3737 | 197 | clinic_1 |
| 2 | 1834 | 2657 | 205 | clinic_1 |
| 3 | 1835 | 2573 | 143 | clinic_1 |
| 4 | 1836 | 2677 | 200 | clinic_1 |
| 5 | 1837 | 2765 | 251 | clinic_1 |
| 6 | 1838 | 2987 | 91 | clinic_1 |


|   | year `<int>` | births `<int>` | deaths `<int>` | hospital `<chr>` |
|---|---|---|---|---|
| 1 | 1784 | 1261 | 11 | Dublin |
| 2 | 1785 | 1292 | 8 | Dublin |
| 3 | 1786 | 1351 | 8 | Dublin |
| 4 | 1787 | 1347 | 10 | Dublin |
| 5 | 1788 | 1469 | 23 | Dublin |
| 6 | 1789 | 1435 | 25 | Dublin |


# Question 1 (death rates)

# Maternal death rate = deaths per birth, computed for every clinic-year
clinic_data <- clinic_data %>%
  mutate(death_rate = deaths / births)

# Same calculation for every hospital-year
hospital_data <- hospital_data %>%
  mutate(death_rate = deaths / births)
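Each `death_rate` value is simply deaths divided by births for one row. For the first Clinic 1 row (1833), 197 / 3737 ≈ 0.0527, or roughly 5.3 maternal deaths per 100 births. The sketch below repeats the calculation in base R on the six Clinic 1 rows printed earlier, as an alternative to the dplyr `mutate()` call. The `deaths_per_100` column is an illustrative addition, not part of the original analysis.

```r
# Six Clinic 1 rows copied from the head(clinic_data) output in this notebook
clinic_sample <- data.frame(
  year = 1833:1838,
  births = c(3737, 2657, 2573, 2677, 2765, 2987),
  deaths = c(197, 205, 143, 200, 251, 91),
  clinic = "clinic_1",
  stringsAsFactors = FALSE
)

# Base-R equivalent of mutate(death_rate = deaths / births)
clinic_sample$death_rate <- clinic_sample$deaths / clinic_sample$births

# Illustrative extra column: the same rate expressed per 100 births
clinic_sample$deaths_per_100 <- round(100 * clinic_sample$death_rate, 2)

clinic_sample
```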



# Average yearly death rate in each clinic before handwashing began in 1847
rate_by_clinic_pre_handwashing <- clinic_data %>%
  filter(year < 1847) %>%
  group_by(clinic) %>%
  summarise(avg_rate = mean(death_rate), .groups = "drop") %>%
  as.data.frame()

rate_by_clinic_pre_handwashing




| clinic `<chr>` | avg_rate `<dbl>` |
|---|---|
| clinic_1 | 0.07993925 |
| clinic_2 | 0.04787381 |
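Note that `avg_rate` is the unweighted mean of the yearly rates, so a year with few births counts as much as a year with many. A pooled rate, `sum(deaths) / sum(births)`, instead weights each year by its number of births, and the two can differ when yearly birth counts vary. A minimal base-R sketch that computes both, with `compare_rates()` as a hypothetical helper and the six printed Clinic 1 rows as toy input:

```r
# Hypothetical helper: for each group, compare the unweighted mean of yearly
# death rates (what the dplyr summary computes) with the pooled rate
# sum(deaths) / sum(births), which weights each year by its births.
compare_rates <- function(df, group_col) {
  groups <- split(df, df[[group_col]])
  data.frame(
    group = names(groups),
    mean_of_yearly_rates = vapply(groups, function(g) mean(g$deaths / g$births), numeric(1)),
    pooled_rate = vapply(groups, function(g) sum(g$deaths) / sum(g$births), numeric(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Six Clinic 1 rows copied from the head(clinic_data) output in this notebook
clinic_sample <- data.frame(
  year = 1833:1838,
  births = c(3737, 2657, 2573, 2677, 2765, 2987),
  deaths = c(197, 205, 143, 200, 251, 91),
  clinic = "clinic_1",
  stringsAsFactors = FALSE
)

compare_rates(clinic_sample, "clinic")
```

Applied to the real data, `compare_rates(subset(clinic_data, year < 1847), "clinic")` would put both measures side by side for each clinic; its `mean_of_yearly_rates` column should match `avg_rate` in the table above.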


# Average yearly death rate at Vienna before vs. after pathological autopsies began in 1823
rate_by_autopsies_introduced <- hospital_data %>%
  filter(hospital == "Vienna") %>%
  mutate(autopsies_introduced = year >= 1823) %>%
  group_by(autopsies_introduced) %>%
  summarise(avg_rate = mean(death_rate), .groups = "drop") %>%
  as.data.frame()

rate_by_autopsies_introduced





| autopsies_introduced `<lgl>` | avg_rate `<dbl>` |
|---|---|
| FALSE | 0.01166024 |
| TRUE | 0.05877959 |
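Both summaries follow the same pattern: split a yearly series at a cutoff year and average the yearly death rates on each side. A hypothetical helper, `rate_before_after()`, sketches that pattern in base R. It is exercised here on the six Dublin rows printed earlier with an arbitrary cutoff of 1787, chosen only for illustration.

```r
# Hypothetical helper: average yearly death rate before `cutoff` and from
# `cutoff` onward (year >= cutoff counts as "after", matching the split above).
rate_before_after <- function(df, cutoff) {
  period <- ifelse(df$year < cutoff, "before", "after")
  rates <- df$deaths / df$births
  avg <- tapply(rates, period, mean)  # mean of yearly rates per period
  data.frame(
    period = c("before", "after"),
    avg_rate = as.numeric(avg[c("before", "after")]),  # NA if a period is empty
    stringsAsFactors = FALSE
  )
}

# Six Dublin rows copied from the head(hospital_data) output in this notebook
dublin_sample <- data.frame(
  year = 1784:1789,
  births = c(1261, 1292, 1351, 1347, 1469, 1435),
  deaths = c(11, 8, 8, 10, 23, 25),
  hospital = "Dublin",
  stringsAsFactors = FALSE
)

rate_before_after(dublin_sample, 1787)
```

For example, `rate_before_after(subset(hospital_data, hospital == "Vienna"), 1823)` should reproduce the two averages above, and `rate_before_after(subset(clinic_data, clinic == "clinic_1"), 1847)` would give the Clinic 1 averages before and after handwashing.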
