# Task 1: Instructions

Import the data that John Snow collected about the cholera epidemic.

- Read about Dr. John Snow to the right.
- Load in the `pandas` module.
- Import the data `deaths.csv` and assign the resulting DataFrame to `deaths`.
- Print out the first rows of `deaths`.
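
As a rough sketch of these steps, the snippet below reads a small in-memory CSV instead of the real file so that it runs anywhere. The column names are taken from Task 2, but the three rows are made up for illustration; in the notebook you would pass the path to `deaths.csv` to `pd.read_csv()` instead.

```python
import io

import pandas as pd

# Made-up stand-in for deaths.csv (illustrative values, not Snow's data)
csv_text = """Death,X coordinate,Y coordinate
1,51.5134,-0.1379
1,51.5133,-0.1380
1,51.5132,-0.1381
"""

# In the notebook: deaths = pd.read_csv(<path to deaths.csv>)
deaths = pd.read_csv(io.StringIO(csv_text))

# Print the first rows of the DataFrame
print(deaths.head())
```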

## Good to know

This project is also designed to test your knowledge of pandas and Bokeh. If you'd like to refresh your memory, the recommended prerequisites for this project are [Data Manipulation with pandas](https://www.datacamp.com/courses/data-manipulation-with-pandas) and [Interactive Data Visualization with Bokeh](https://www.datacamp.com/courses/interactive-data-visualization-with-bokeh).

Even if you've finished all the DataCamp Python courses, you may still find this project challenging unless you consult some external _documentation_.
In that case, check out Karlijn's DataCamp pandas DataFrame [tutorial](https://www.datacamp.com/community/tutorials/pandas-tutorial-dataframe-python), Hugo's Hierarchical indices, groupby and pandas [tutorial](https://www.datacamp.com/community/tutorials/pandas-multi-index), and the pandas [cheat sheet](https://github.com/pandas-dev/pandas/raw/master/doc/cheatsheet/Pandas_Cheat_Sheet.pdf), which summarizes the basics of pandas DataFrames. You could also look at the official pandas [documentation](https://pandas.pydata.org/pandas-docs/stable/pandas.pdf).

Stack Overflow is also a useful resource. A handy search pattern is **example of ??? in pandas**, where **???** is what you need to do.

A big thank you to [Robin Wilson](http://blog.rtwilson.com/john-snows-famous-cholera-analysis-data-in-modern-gis-formats/) from Southampton University, who digitized John Snow’s original data and georeferenced it to the Ordnance Survey coordinate system. This allows us to analyze the data and overlay it on modern maps of the area.

# Task 2: Instructions

Check, rename columns, and describe the DataFrame.

- Summarize the content of `deaths` (from the previous exercise) with the `.info()` method.
- Prepare a dictionary that will be used to rename the `Death`, `X coordinate`, and `Y coordinate` columns to `death_count`, `x_latitude`, and `y_longitude`, respectively.
- Rename the columns of the dataset with the `.rename()` method.
- Describe the dataset with the `.describe()` method.
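
The steps above might look roughly like the following. The DataFrame here is built inline with made-up values so the snippet is self-contained; the names `new_names` and `summary` are only illustrative, and the notebook may expect different variable names.

```python
import pandas as pd

# Made-up stand-in for the DataFrame loaded in Task 1
deaths = pd.DataFrame({
    "Death": [1, 1, 1],
    "X coordinate": [51.5134, 51.5133, 51.5132],
    "Y coordinate": [-0.1379, -0.1380, -0.1381],
})

# Summarize column names, dtypes, and non-null counts
deaths.info()

# Dictionary mapping old column names to new ones
new_names = {
    "Death": "death_count",
    "X coordinate": "x_latitude",
    "Y coordinate": "y_longitude",
}

# .rename() returns a new DataFrame, so assign the result back
deaths = deaths.rename(columns=new_names)

# Descriptive statistics for the numeric columns
summary = deaths.describe()
print(summary)
```

Note that `.rename()` does not modify the DataFrame in place by default, which is why the result is reassigned to `deaths`.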

The following exercise may be helpful:

[Inspecting DataFrames](https://campus.datacamp.com/courses/pandas-foundations/data-ingestion-inspection?ex=2) from [`pandas` Foundations](https://www.datacamp.com/courses/pandas-foundations)

# Task 3: Instructions

Prepare and pre-process the data for plotting.

- Create a subset (called `locations`) of the original dataset, selecting only the `x_latitude` and `y_longitude` columns.
- Transform this subset into a list of `x_latitude` and `y_longitude` pairs and name it `deaths_list`.
- Check the length of this list (the number of pairs).
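
One possible way to do this is sketched below, using `.values.tolist()` as discussed in the Stack Overflow link. The input rows are made up so the example runs on its own.

```python
import pandas as pd

# Made-up stand-in for the renamed DataFrame from Task 2
deaths = pd.DataFrame({
    "death_count": [1, 1, 1],
    "x_latitude": [51.5134, 51.5133, 51.5132],
    "y_longitude": [-0.1379, -0.1380, -0.1381],
})

# Select two columns by passing a list of column names
locations = deaths[["x_latitude", "y_longitude"]]

# Convert the subset to a list of [x_latitude, y_longitude] pairs
deaths_list = locations.values.tolist()

# Number of pairs
print(len(deaths_list))
```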

The following links may be helpful:

- [Selecting columns using `[]`](https://campus.datacamp.com/courses/intermediate-python-for-data-science/dictionaries-pandas?ex=15) from [Intermediate Python for Data Science](https://www.datacamp.com/courses/intermediate-python-for-data-science).
- [Subselecting DataFrames with lists](https://campus.datacamp.com/courses/manipulating-dataframes-with-pandas/extracting-and-transforming-data?ex=8) from [Manipulating DataFrames with `pandas`](https://www.datacamp.com/courses/manipulating-dataframes-with-pandas).
- [Pandas DataFrame to list](https://stackoverflow.com/questions/23748995/pandas-dataframe-to-list) on Stack Overflow.

# Task 4: Instructions

Loop through the pre-processed data to create a map.

- Fill in the `len` function to loop through the data.
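
The index-based loop pattern this task refers to can be sketched in plain Python as follows. The coordinates are made up, and the map-drawing call from the notebook is deliberately left out; the `visited` list is a hypothetical placeholder for whatever the loop body does with each point.

```python
# Made-up stand-in for the list built in Task 3
deaths_list = [[51.5134, -0.1379], [51.5133, -0.1380], [51.5132, -0.1381]]

visited = []  # hypothetical placeholder for the work done per point

# len() gives the number of pairs, range() turns it into indices 0..n-1
for i in range(len(deaths_list)):
    lat, lon = deaths_list[i]
    # In the notebook, this is where a marker would be added to the map
    visited.append((lat, lon))
    print(f"point {i}: lat={lat}, lon={lon}")
```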

More info about loops can be found here: [Loops/LearnPython.org](https://www.learnpython.org/en/Loops).

Basic info about the folium library is found here: [Folium 0.5.0 documentation](http://python-visualization.github.io/folium/).

The map displayed in the notebook is also in the 2nd reprint of _On the Mode of Communication of Cholera (1855)_ that is publicly available [here](http://www.academia.dk/MedHist/Biblioteket/Print/snow_1855.html).

# Task 5: Instructions

Recreate The Ghost Map.

- Import the data `pumps.csv` and assign the resulting DataFrame to `pumps`.
- Create a subset `locations_pumps` of the original dataset (select only the `'X coordinate'` and `'Y coordinate'` columns).
- Transform this subset into a list of `'X coordinate'` and `'Y coordinate'` pairs and call it `pumps_list`.
- Create a `for` loop to plot all the points on a map (we will use the folium/Leaflet library again).

The following exercises may be helpful:

- [Selecting columns using `[]`](https://campus.datacamp.com/courses/intermediate-python-for-data-science/dictionaries-pandas?ex=15) from [Intermediate Python for Data Science](https://www.datacamp.com/courses/intermediate-python-for-data-science).
- [Dates in DataFrames](https://campus.datacamp.com/courses/pandas-foundations/time-series-in-pandas?ex=2) from [`pandas` Foundations](https://www.datacamp.com/courses/pandas-foundations).
- [Subselecting DataFrames with lists](https://campus.datacamp.com/courses/manipulating-dataframes-with-pandas/extracting-and-transforming-data?ex=8) from [Manipulating DataFrames with `pandas`](https://www.datacamp.com/courses/manipulating-dataframes-with-pandas).

# Task 6: Instructions

Reanalyze John Snow's data about the cholera outbreak.

- Import the data `dates.csv` as the DataFrame `dates` and parse the `date` column as the datetime data type.
- Create a new column `day_name` that contains the name of the day (Monday to Sunday) using the `.dt.day_name()` method.
- Create a new column `handle` that contains a Boolean (`True` or `False`) for whether or not the handle was present.
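
The sketch below illustrates one way to approach these steps. It rests on several assumptions: the column names `attacks` and `deaths` and all counts are made up, and the `handle` column is taken to be `True` for dates before 8 September 1854, the day the Broad Street pump handle was removed. The notebook may expect the opposite convention, so follow its hints if they differ.

```python
import io

import pandas as pd

# Made-up stand-in for dates.csv (column names and counts are assumptions)
csv_text = """date,attacks,deaths
1854-09-07,28,32
1854-09-08,12,30
1854-09-09,11,24
"""

# parse_dates converts the listed columns to datetime64 while reading
dates = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# Name of the weekday for each date
dates["day_name"] = dates["date"].dt.day_name()

# Assumed convention: True while the pump handle was still in place
handle_removed = pd.Timestamp("1854-09-08")
dates["handle"] = dates["date"] < handle_removed

print(dates)
```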

The following links may be helpful:

- [pandas.to_datetime](https://pandas.pydata.org/pandas-docs/version/0.20/generated/pandas.to_datetime.html).
- [Selecting columns using `[]`](https://campus.datacamp.com/courses/intermediate-python-for-data-science/dictionaries-pandas?ex=15) from [Intermediate Python for Data Science](https://www.datacamp.com/courses/intermediate-python-for-data-science).
- [Dates in DataFrames](https://campus.datacamp.com/courses/pandas-foundations/time-series-in-pandas?ex=2) from [`pandas` Foundations](https://www.datacamp.com/courses/pandas-foundations).

# Task 7: Instructions

Visualize the data about the Cholera Outbreak using the Bokeh library.

- Plot a line graph for cholera deaths vs. date.
- Plot a circle/point graph for cholera deaths vs. date.
- Plot a line graph for cholera attacks vs. date.
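
As a hedged illustration of the three plots, the snippet below builds a single Bokeh figure with two lines and one set of point markers. The data, column names, colors, and labels are assumptions made for the example, and `scatter` is used here for the point glyphs; the notebook may call a different glyph method.

```python
import pandas as pd
from bokeh.plotting import figure

# Made-up stand-in for the dates DataFrame (illustrative values only)
dates = pd.DataFrame({
    "date": pd.to_datetime(["1854-09-01", "1854-09-02", "1854-09-03"]),
    "attacks": [10, 8, 5],
    "deaths": [4, 6, 3],
})

# A datetime x-axis formats the tick labels as dates
p = figure(x_axis_type="datetime", title="Cholera outbreak (illustrative data)")

# Line graph: deaths vs. date
p.line(dates["date"], dates["deaths"], color="red", legend_label="Cholera deaths")

# Point graph: deaths vs. date
p.scatter(dates["date"], dates["deaths"], color="black", size=6)

# Line graph: attacks vs. date
p.line(dates["date"], dates["attacks"], color="orange", legend_label="Cholera attacks")
```

To display the figure, import `show` from `bokeh.plotting` and call `show(p)`; inside a notebook, call `output_notebook()` from `bokeh.io` first.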

The following exercises may be helpful:

- From [Interactive Data Visualization with Bokeh](https://campus.datacamp.com/courses/interactive-data-visualization-with-bokeh/):
    - [Plotting data from Pandas DataFrames](https://campus.datacamp.com/courses/interactive-data-visualization-with-bokeh/basic-plotting-with-bokeh?ex=12).
    - [Lines](https://campus.datacamp.com/courses/interactive-data-visualization-with-bokeh/basic-plotting-with-bokeh?ex=7).
    - [Customizing glyphs](https://campus.datacamp.com/courses/interactive-data-visualization-with-bokeh/basic-plotting-with-bokeh?ex=15).
    - [Selection and non-selection glyphs](https://campus.datacamp.com/courses/interactive-data-visualization-with-bokeh/basic-plotting-with-bokeh?ex=16).
    - [How to create legends](https://campus.datacamp.com/courses/interactive-data-visualization-with-bokeh/layouts-interactions-and-annotations?ex=14).
- [Dates in DataFrames](https://campus.datacamp.com/courses/pandas-foundations/time-series-in-pandas?ex=2) from [`pandas` Foundations](https://www.datacamp.com/courses/pandas-foundations).

# Task 8: Instructions

True or false?

- Given the data John Snow collected and The Ghost Map he created, is it `True` or `False` that he "knows nothing"?

Congratulations on completing the project!

Good luck on your data science journey! :)