# A Redesign Critique of John Snow's 1854 Cholera "Ghost Map"
### By Val Masters


![original_Snow](https://github.com/valhella/johnsnow/blob/master/images/snowmap.png?raw=true)

Figure 1. John Snow's map of the Soho area of London during the 1854 cholera epidemic. Circles show locations of water pumps and thin stacked lines represent deaths. From Snow, J., *On the Mode of Communication of Cholera*, 1855.

## Understanding the Creator's Context ##
Dr. John Snow (1813-1858) is famous for his data visualization of cholera deaths in 19th-century London. Snow's motivation in creating the "Ghost Map," as the 1854 map shown in Figure 1 is sometimes called, was to show that cholera was transmitted via dirty water. The prevailing scientific and popular opinion at the time was that cholera spread through the air (Koch & Denike 2006). 
His map was not entirely convincing to his professional contemporaries, as he was not able to obtain the data needed to calculate precise mortality ratios, the accepted epidemiological metric at the time (Koch & Denike 2009; Koch & Denike 2006). 


## Understanding the Critic's Context
<p> As a scientist, my urge when examining Snow's map was to delve deeply into the possibilities of the data: to examine what was there, what was missing, and what could be achieved with the available evidence. Most exciting to me were the hidden connections between the disease and life: how the human conditions of gender, religion, occupation, living conditions, and social status interacted with cholera transmission. Since Snow's goal was to illustrate the medium and hypercenters of cholera transmission, I will construct my redesign with the same goal in mind, but with the modern tools available to me. I will take a slightly different approach from showing just death: I will show how far people traveled (in time and distance) from infection point to death, which, given cholera's short incubation period, should support Snow's hypothesis that pumps were the source of the disease. The fact that people died in a certain location does not necessarily support the pump hypothesis; more important is where they contracted the disease. Death location could instead point to treatment centers or the homes of the afflicted's loved ones.

Unfortunately, I do not have data on the disease and location progression of individuals. Though I have data on how many people contracted the disease and died on specific days, as well as the locations of deaths, there is no unique identifier associated with each person, so I cannot know the fate of individuals, nor can I correlate the dates of deaths with the locations of the deaths. No location data for attack is given, probably because it was not recorded. The difficulties I have with the data support the notion that no data is raw, but rather cooked for a specific purpose: both Snow's decision to represent people impersonally and the decision of historians to run with this reflect a lack of importance accorded to the lives of the dead as individuals, who made important individual and culturally situated decisions. Most significant to Snow for the purposes of his map was where people died. However, a good deal more credit is due to him; since he worked mostly alone (and with no smartwatches), it would have been prohibitively difficult to track many individuals over time. He also collected some personal and community-specific data to support his case for the relation between water usage and cholera, though this is not seen in the map (Koch & Denike 2009).

    
 </p>




## The Iterative Design Process ##

I did not realize that I think in terms of independent and dependent variables until I looked back at my design notes. I insisted on structuring my thoughts this way, immediately jumping to a scientific worldview. I also considered a UXD perspective, though this was prompted by my peers' discussions. As my design process progressed, I incorporated more visual ways to structure my thinking, relating the various variables more concretely. Finally, my design coalesced around the idea of a journey map.

In [55]:
# Import html to display image-allows resizing
from IPython.display import Image
from IPython.core.display import HTML 
Image(url= "https://github.com/valhella/johnsnow/blob/master/images/redesign8may.jpg?raw=true", width = 500, height = 500)

In [56]:
from IPython.display import Image
from IPython.core.display import HTML 
Image(url= "https://github.com/valhella/johnsnow/blob/master/images/redesign13may.jpg?raw=true", width = 500, height = 500)

In [57]:
from IPython.display import Image
from IPython.core.display import HTML 
Image(url= "https://github.com/valhella/johnsnow/blob/master/images/redesign13may_2.jpg?raw=true", width = 500, height = 500)

In [58]:
from IPython.display import Image
from IPython.core.display import HTML 
Image(url = "https://github.com/valhella/johnsnow/blob/master/images/mapsketch.jpg?raw=true", width = 900, height = 500)

<p style="text-align: center;">My overly ambitious sketch of my redesign, featuring interactivity in the form of widgets and animated, clickable lines.</p>

## The Exploratory, Rhetorical, Empathetic Data Object ##

Modern epidemiologists would not find Snow's map terribly useful in directing specific action or further inquiry. Snow's map does not address other variables that could have been significant (an omission that makes his argument quite clear, at the expense of scientific rigor), nor does it show essential components such as the areas served by particular pumps and the associated mortality ratios, which would have been possible to calculate at the time (Koch & Denike 2009). Today, the general public might find Snow's map clear and helpful, but not necessarily emotionally compelling. Modern authors such as Koch and Denike (2009) have reconstructed Snow's map with greater rigor and statistical methods, so I will take a tack that I have not seen in the literature.

My visualization illustrates one way of reconciling the need for an epidemiologically useful graphic with the need for a personal, emotionally compelling one. I made up the data, but this could become a real data object for modern-day epidemics if a willing subset of the population transmitted GPS coordinates and health data for collection.

The graphic tracks two named individuals using a bird's-eye view on a 2-D map during the progression of their disease. The starting point of each progression, indicated by an opaque outline on the marker, marks where and when a person first developed symptoms and is labeled with name and date. The points tracking their movements (one location point per hour) go from large radius to small radius as their health worsens. Since the points are translucent, darker clusters represent more time spent in a location. The termination of each line is a marker outlined in black, showing death, or white, showing recovery. Each is labeled with a name and date so that users can follow up on the stories of the people.
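
The size encoding described here can be written as a small lookup function. This is a minimal sketch, assuming the 0 to 10 health scale and the four radius tiers used in the plotting code later in this notebook; the function name `radius_for_health` is hypothetical and does not appear in the original cells.

```python
def radius_for_health(health):
    """Map a health score (10 = perfectly healthy, 0 = dead) to a marker radius in pixels."""
    if health >= 9:
        return 16
    if health >= 6:
        return 8
    if health >= 3:
        return 4
    return 2


# Made-up hourly readings for one person, from healthy to dead
readings = [10, 9, 7, 5, 2, 0]
radii = [radius_for_health(h) for h in readings]
print(radii)  # [16, 16, 8, 4, 2, 2]
```

Keeping the tiers in one function means the same mapping can be reused for every person on the map instead of repeating the thresholds in each loop.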

The map would provide useful information to epidemiologists, such as interactions between infected individuals, visits to pumps, and rate of health decline. From an emotional standpoint, the graphic shows a snapshot of the end of named individuals' lives, which should allow viewers to better understand the human toll of the epidemic. 
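
As a rough illustration of how pump visits and distance traveled might be read off an hourly path, the sketch below uses the haversine formula for great-circle distance. Everything here is an assumption made for illustration: the helper names (`haversine_m`, `hours_near`, `path_length_m`), the 25-meter "visit" threshold, and the short sample path are not part of the original notebook. The pump coordinate is the map center used for the Broad St. pump in the plotting cell.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in decimal degrees."""
    r = 6371000.0  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def hours_near(path, pump, threshold_m=25.0):
    """Count hourly points that fall within threshold_m meters of the pump."""
    return sum(
        1 for lat, lon in path
        if haversine_m(lat, lon, pump[0], pump[1]) <= threshold_m
    )


def path_length_m(path):
    """Total distance in meters along consecutive hourly points."""
    return sum(haversine_m(*a, *b) for a, b in zip(path, path[1:]))


# Hypothetical inputs: the pump location and a made-up four-hour path
pump = (51.5132119, -0.13666)
path = [
    (51.51185, -0.1369),
    (51.51185, -0.1369),
    (51.5132, -0.1367),
    (51.5132119, -0.13666),
]

print(hours_near(path, pump))       # 2
print(round(path_length_m(path)))   # total meters walked along the path
```

With one point per hour, the count returned by `hours_near` approximates hours spent at a pump, which is the kind of summary an epidemiologist could compare across individuals.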


In [59]:
# Here is my fake data I constructed by making up people,
# a continuous health variable from 10, perfectly healthy, to 0, dead,
# and a path consisting of latitude and longitude points taken every hour 
# from when a person first shows symptoms of disease.

# import pandas
import pandas as pd

# read in the data
journey1 = pd.read_csv('https://raw.githubusercontent.com/valhella/johnsnow/master/data/journey1.csv')
journey2 = pd.read_csv('https://raw.githubusercontent.com/valhella/johnsnow/master/data/journey2.csv')

# print shape of dataset
print(journey1.shape)

# Printing out the first 5 rows
print(journey1.head(5))

(24, 5)
   personid  hour  x_latitude  y_longitude  health
0         1     1    51.51185      -0.1369      10
1         1     2    51.51185      -0.1369      10
2         1     3    51.51185      -0.1369      10
3         1     4    51.51185      -0.1369      10
4         1     5    51.51185      -0.1369      10


In [60]:
# create `locations` variable for each person by subsetting only Latitude and Longitude from the dataset 
locations1 = journey1[['x_latitude', 'y_longitude']]
locations2 = journey2[['x_latitude', 'y_longitude']]

# create `locations_list` variables for each person by transforming the DataFrame to list of lists 
locations_list1 = locations1[['x_latitude', 'y_longitude']].values.tolist()
locations_list2 = locations2[['x_latitude', 'y_longitude']].values.tolist()

In [61]:
# import numpy for array handling
import numpy as np

# create variables holding the array of health values for each person,
# reusing the DataFrames loaded above instead of downloading the CSVs again

# person 1
health1 = np.asarray(journey1['health'])

# person 2
health2 = np.asarray(journey2['health'])

In [62]:
# import the library
import folium

# Make map centered on broad st pump
m = folium.Map(location=[51.5132119,-0.13666], tiles='Stamen Toner', zoom_start=17)

#-----
# label start point for person 1
folium.Circle(locations_list1[0], radius=8, color='green', fill=True, fill_color='green', opacity = 1).add_child(folium.Popup('MaryJane first symptoms')).add_to(m)        

# print middle set of points for person 1
for point in range(1, len(locations1)-1):
    if health1[point] >= 9: 
        folium.CircleMarker(locations_list1[point], radius=16, color='green', fill=True, fill_color='green', opacity = 0.6).add_to(m)
    elif health1[point] >= 6 and health1[point] < 9:
         folium.CircleMarker(locations_list1[point], radius=8, color='green', fill=True, fill_color='green', opacity = 0.6).add_to(m)
    elif health1[point] >= 3 and health1[point] < 6:
        folium.CircleMarker(locations_list1[point], radius=4, color='green', fill=True, fill_color='green', opacity = 0.6).add_to(m)
    else:
        folium.CircleMarker(locations_list1[point], radius=2, color='green', fill=True, fill_color='green', opacity = 0.6).add_to(m)

# label end point for person 1
last_m=((len(locations_list1))-1)
folium.Circle(locations_list1[last_m], radius=8, color='black', fill=True, fill_color='green', opacity = 0.7).add_child(folium.Popup('MaryJane passes, August 6, 1854')).add_to(m)

#------
#label start point for person 2
folium.Circle(locations_list2[0], radius=8, color='blue', fill=True, fill_color='blue', opacity = 1).add_child(folium.Popup('Dennison first symptoms')).add_to(m)        

# print middle set of points for person 2
for point in range(1, len(locations2)-1):
    if health2[point] >= 9:
        folium.CircleMarker(locations_list2[point], radius=16, color='blue', fill=True, fill_color='blue', opacity = 0.6).add_to(m)  
    elif health2[point] >= 6 and health2[point] < 9:
        folium.CircleMarker(locations_list2[point], radius=8, color='blue', fill=True, fill_color='blue', opacity = 0.6).add_to(m)  
    elif health2[point] >= 3 and health2[point] < 6:
        folium.CircleMarker(locations_list2[point], radius=4, color='blue', fill=True, fill_color='blue', opacity = 0.6).add_to(m)  
    else:
        folium.CircleMarker(locations_list2[point], radius=2, color='blue', fill=True, fill_color='blue', opacity = 0.6).add_to(m)  

#label end point for person 2
last_d=((len(locations_list2))-1)
folium.Circle(locations_list2[last_d], radius=8, color='black', fill=True, fill_color='blue', opacity = 0.7).add_child(folium.Popup('Dennison passes, June 20, 1854')).add_to(m)

#-----

# import pumps data
pumps = pd.read_csv('pumps.csv')

# Subset the pumps DataFrame and select just ['X coordinate', 'Y coordinate'] columns
locations_pumps = pumps[['X coordinate', 'Y coordinate']]

# Transform the pumps DataFrame to list of lists in form of ['X coordinate', 'Y coordinate'] pairs
pumps_list = locations_pumps[['X coordinate', 'Y coordinate']].values.tolist()

# Create a for loop and plot the pumps data using folium (use previous map + add another layer)
map1 = m
for point in range(0, len(locations_pumps)):
    folium.Marker(pumps_list[point], popup=pumps['Pump Name'][point]).add_to(map1)

# call the map
map1
    

MaryJane and Dennison never met. They only lived two blocks away from one another, growing up in dirty South London. MaryJane lived on Kingly Street, and according to her friends she loved to visit Golden Square park. She worked as a launderer for the nearby orphanage. She contracted cholera on August 5th, 1854, after taking a swig of water she collected at nearby Warwick St. pump the day before. She died in her favorite park only 24 hours later. Dennison was an old man of ailing health when he contracted cholera, having lived only a block away from the Broad St. pump. He was a local tinkerer even in his old age, and he often set up his shed near the pump to attract customers. His business was ailing by the time he got sick, as people had stopped coming to the pump so often. Too late, he realized why. Dennison passed on June 20th, 1854. 

## Future Directions and Conclusion

In further iterations of this design, I would like to animate the paths and let users choose, with a slider widget, the number of people whose paths they want to see, in order to view aggregate trends or focus more on individual stories. They could also select a date range, which would show all the journeys that began and/or ended within that period. Associated text would feature profiles (including images) of the individuals for whom a large amount of information is available and whose families consented to the project, to increase empathetic connection and understanding of the human toll of the epidemic. Of course, this speculative design could become non-speculative for modern epidemics, where robust data profiles of individuals and their health and movements can be collected. In this situation, the ethics of data collection would need to be considered.

This future iteration would make greater use of modern data visualization structures (animation, interactivity) that would likely draw a user in more than Snow's paper map would. People love to slide the sliders. If someone knew when an ancestor died, they could focus the date range around that date and see if they could follow their relative's last days (or close call). While my current iteration of the redesign does not allow for this level of interactivity, it showcases the scientific and emotional potential of map-based objects. 
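
The date-range selection described in this section could be prototyped with a simple filter before any widget is attached. This is a minimal sketch, assuming a hypothetical summary table with one row per person and `start` and `end` columns; no such table exists in the current data files. MaryJane's dates follow the story above, while Dennison's onset date is a placeholder chosen only for illustration.

```python
import pandas as pd

# Hypothetical summary table: one row per person with journey start and end dates.
# Dennison's start date is an assumed placeholder, not a value from the dataset.
journeys = pd.DataFrame({
    "name": ["MaryJane", "Dennison"],
    "start": pd.to_datetime(["1854-08-05", "1854-06-19"]),
    "end": pd.to_datetime(["1854-08-06", "1854-06-20"]),
})


def journeys_in_range(df, first, last):
    """Return journeys that started and/or ended within [first, last], inclusive."""
    first, last = pd.Timestamp(first), pd.Timestamp(last)
    starts_in = df["start"].between(first, last)
    ends_in = df["end"].between(first, last)
    return df[starts_in | ends_in]


selected = journeys_in_range(journeys, "1854-08-01", "1854-08-31")
print(selected["name"].tolist())  # ['MaryJane']
```

A slider or date picker would then only need to pass its two endpoints to `journeys_in_range` and redraw the paths for the returned rows.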


## References ##

[Koch and Denike, 2009.](https://www.ncbi.nlm.nih.gov/pubmed/19716638) 

[Koch and Denike, 2006.](https://www.ncbi.nlm.nih.gov/pubmed/16457925)

My GitHub repository, in case any links break: https://valhella.github.io/johnsnow/
