# 🖥️ Altair 1: Health and Death in the Crimean War

**The following homework has 10 problems. To see how each problem will be graded, see the [rubrics linked in our syllabus](https://docs.google.com/document/d/1LEm11acAZC-MG5ylcJ5x7CpC2C6NcrTdU3yj3EgE_WA/edit?usp=sharing). This assignment will be graded out of 33 points.**

## To get started - Make a copy of this
- To make your own copy of this colab, click `File` --> `Save a copy in Drive`
- Alternatively, if you are more comfortable in another Jupyter environment, you may download this notebook (`File` --> `Download`) and open it in the environment you prefer.

## The goals of this assignment
- To apply different visual encodings to different data attributes
- To combine multiple visual encodings on the same chart (ex: bar + text)
- To make stylistic modifications of your visualizations
- To become more comfortable using AI assistants in your learning process. I encourage you to use ChatGPT, bard.google.com, or something else!

## Constraints and permissions
- You must use Altair for this assignment
- You **may** use the internet, videos, and AI (like ChatGPT) to help you make the charts (see the syllabus for more detailed instructions about AI).

## Updating the Altair Library

Altair frequently updates - faster than Jupyter or Colab typically update. So we'll run this code first to update the library.
- After running this code, go to the menu at the top of the screen and click `Runtime` --> `Restart Session`. This will make sure your updated version is active.
- I typically comment out this code after I do that step (put a `#` in front of the code) so that I don't keep running it over and over again

In [4]:
pip install -U altair vega_datasets

Defaulting to user installation because normal site-packages is not writeable
You should consider upgrading via the '/Library/Developer/CommandLineTools/usr/bin/python3 -m pip install --upgrade pip' command.
Note: you may need to restart the kernel to use updated packages.


# The Background: Your Data


## An Introduction to the data

> *Printed tables and all-in double columns, I do not think anyone will read. None but scientific men ever look in the Appendix of a Report. And this is for the public.* - **Florence Nightingale**

For this assignment, we'll use Florence Nightingale's famous dataset containing monthly casualty counts from the Crimean War (we saw her coxcomb plot in our first lecture slides). She originally published it in 1858 *(Nightingale, F. (1858) 'Notes on Matters Affecting the Health, Efficiency and Hospital Administration of the British Army'. RCIN 1075240)*.

This dataset underlies a historic example of data visualization being used to sway policy-makers.


## Loading the Data (I do this for you)


Below, I've loaded the data from a pre-existing vega dataset. Then, I changed the format a bit to make it easier to perform some different manipulations in Altair.

In [5]:
import altair as alt # Loading the altair library
from vega_datasets import data

crimea = data.crimea()
crimea


Unnamed: 0,date,wounds,other,disease
0,1854-04-01,0,110,110
1,1854-05-01,0,95,105
2,1854-06-01,0,40,95
3,1854-07-01,0,140,520
4,1854-08-01,20,150,800
5,1854-09-01,220,230,740
6,1854-10-01,305,310,600
7,1854-11-01,480,290,820
8,1854-12-01,295,310,1100
9,1855-01-01,230,460,1440


An important tip for the future is that **Altair is easiest to use with tidy data** (to read more about tidy data [you can follow this link](https://byuidatascience.github.io/python4ds/tidy-data.html)). In this case, it works best when there is a _single observation in each row of data_. In the original dataset (above), each row had 3 different observations - `wounds`, `other`, and `disease`.

Below, I rearranged the dataset so that each row has a _single_ observation, which we can do by adding a `cause` column that contains the value `wounds`, `disease`, or `other`.

In [6]:
# Melting the dataframe to a more tidy format
crimea = crimea.melt(id_vars=['date'], value_vars=['wounds', 'other', 'disease'], var_name='cause', value_name='count')
crimea = crimea.sort_values('date')
crimea

Unnamed: 0,date,cause,count
0,1854-04-01,wounds,0
48,1854-04-01,disease,110
24,1854-04-01,other,110
1,1854-05-01,wounds,0
49,1854-05-01,disease,105
...,...,...,...
22,1856-02-01,wounds,0
46,1856-02-01,other,100
47,1856-03-01,other,125
23,1856-03-01,wounds,0


**An explanation of the code above:** The `melt()` function is used to transform the dataframe from a wide format (where each column represents a separate variable) to a long format (where each row represents an observation).

- `id_vars=['date']`: This specifies that the date column should remain unchanged and serve as the identifier variable.
- `value_vars=['wounds', 'other', 'disease']`: This specifies the columns that you want to melt or unpivot into rows. In this case, you want to melt the columns wounds, other, and disease.
- `var_name='cause'`: The names of the original columns (wounds, other, disease) will now be stored in a new column called cause.
- `value_name='count'`: The values from the original columns will now be stored in a new column called count.
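
To see the wide-to-long reshaping on something small, here is a minimal sketch using a made-up table. The `wide` dataframe and its `apples`, `pears`, `fruit`, and `sold` names are hypothetical and are not part of the assignment data.

```python
import pandas as pd

# Hypothetical wide table: one row per date, one column per fruit
wide = pd.DataFrame({
    "date": ["2024-01-01", "2024-02-01"],
    "apples": [3, 4],
    "pears": [5, 6],
})

# Unpivot the two fruit columns into (fruit, sold) pairs, keeping `date` as the identifier
tidy = wide.melt(
    id_vars=["date"],
    value_vars=["apples", "pears"],
    var_name="fruit",
    value_name="sold",
)
print(tidy)
```

The two-row wide table becomes a four-row long table: each original row contributes one row per melted column.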

_tip: you can also ask chatGPT or another LLM for help in turning data into tidy data_

## Filtering your Data
**Helpful pandas reminder:** Remember that you can apply filters to your data. Below, I've created a `wounds` dataframe that only contains the observations related to wounds. You might find that you'll need this to answer a couple of the questions below.

In [7]:
# only keep rows in which the value in cause is 'wounds'
wounds = crimea[crimea.cause == 'wounds']
# only show the first 5 records (the default for head())
wounds.head()

Unnamed: 0,date,cause,count
0,1854-04-01,wounds,0
1,1854-05-01,wounds,0
2,1854-06-01,wounds,0
3,1854-07-01,wounds,0
4,1854-08-01,wounds,20
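
As a minimal sketch of the same filtering idea on a made-up table (the `sales` dataframe and its columns are hypothetical), the example below also shows `isin()` for keeping several values at once. One thing to watch: because `count` is also the name of a DataFrame method, a column called `count` has to be read with brackets rather than dot notation.

```python
import pandas as pd

# Hypothetical toy table, not the Crimean War data
sales = pd.DataFrame({
    "fruit": ["apple", "pear", "apple", "plum"],
    "count": [3, 5, 2, 7],
})

# Boolean mask: keep only the rows where fruit is 'apple'
apples = sales[sales["fruit"] == "apple"]

# Keep the rows that match any value in a list
apple_or_plum = sales[sales["fruit"].isin(["apple", "plum"])]

# `sales.count` would refer to the DataFrame.count method, so use brackets here
total_apples = apples["count"].sum()
print(total_apples)
```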


# Your Work - Replicate each Graph

## Problem 1
![problem1](https://drive.google.com/uc?id=1FRneHJmS6UmODBaHEoMKH1OmQmGL-sk6)

Replicate this line chart showing the number of deaths caused by wounds over time.
- I did not change the color or default dimensions of the chart




In [8]:
# Problem 1 - Your code below should generate the chart above



## Problem 2

Now change your code so that it uses an area chart to show `disease`, `wounds`, and `other`:
- I am using the default colors and dimensions
- Change the y-axis so that it says `number of deaths` instead of `count`

<img src="https://drive.google.com/uc?id=1IVabFsNQ0JmauAdxhetjmc4v-IgSyNva" alt="drawing" width="600"/>

In [9]:
# Problem 2: Your code here


## Problem 3

<img src="https://drive.google.com/uc?id=1W3HGKmnMExxHILlJpHomORpio6KRa5_G" alt="drawing" width="600"/>

- Create a bar chart looking at the number of wounds
- Set each bar's width to `14`.
- Change the color of each bar to CU gold.
  - It has the following RGB values: Red `207`,  Green `184`,  Blue `124`
  - It also has the following hex value: `#cfb87c` (in case that's easier)
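
If you ever need to go from RGB values to a hex string yourself, the conversion is just each channel written as two hexadecimal digits. The helper below is a small sketch; `rgb_to_hex` is a hypothetical name, not an Altair function.

```python
def rgb_to_hex(red, green, blue):
    """Format three 0-255 channel values as a CSS-style hex color string."""
    return "#{:02x}{:02x}{:02x}".format(red, green, blue)

# The RGB values listed for CU gold
cu_gold = rgb_to_hex(207, 184, 124)
print(cu_gold)  # → #cfb87c
```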

In [10]:
# Problem 3: Your Code here



## Problem 4
<img src="https://drive.google.com/uc?id=1Jlreg7TD4JEcuQYs4WFawW28lAizMyfZ" alt="Problem 5" width="600"/>

Now change it so that it is a **stacked bar graph**
- Use default colors and default axis labels
- Add a **tooltip** that includes the `cause` and `count` (for an example of interaction with the tooltip, [watch this video](https://drive.google.com/file/d/1EtCTttIPam9zT0YBPB9Hm36kBEg68tZu/view?usp=drive_link) )

In [11]:
# Problem 4: Your Code here



## Problem 5

<img src="https://drive.google.com/uc?id=1Mm4nCzkllfa0Idck9oBW7HLPh3U7N5hk" alt="Problem 6" width="600"/>

- Create a horizontal bar graph that contains the **sum** of each cause across _all the dates in the dataset_
- **Hint:** `sum()` can be used in a similar way that `mean()` is used in the [examples on this page](https://altair-viz.github.io/user_guide/transform/aggregate.html)

In [12]:
# Problem 5: Your Code here



## Problem 6

<img src="https://drive.google.com/uc?id=1NlZDAqBgzmbWLLP-4dUZbRnLjwOEjdTM" alt="Problem 7" width="600"/>

Using the same chart as the previous example, make the following customizations.
1. Add text labels to each of the bars
2. Remove the labels on the y-axis
3. Change the x-axis title to `"Total Deaths"`

In [13]:
# Problem 6: Your Code here



## Problem 7

<img src="https://drive.google.com/uc?id=1hXggKmsuKepwZ0nuHVELBxsLcWd8loC4" alt="Problem 8" width="600"/>

Let's turn our focus to distributions
- Create a histogram of the deaths caused by wounds
- Change the x-axis title to `number of deaths caused by wounds`
- Change the y-axis title to `count of days`

In [14]:
# Problem 7: Your Code here



## Problem 8

<img src="https://drive.google.com/uc?id=1vRxCxaEVHdOrzQjv-QTV0niGHRmSVkAK" alt="Problem 9" width="600"/>

- Create a graph that shows histograms of _each_ cause.
  - **Hint:** consider using the `column` property in `encode`
- By default, each graph will look too wide. Change the width of the graph to be `125` and the height to be `100` (using `.properties()`)
- Modify the x-axis title to `number of deaths`

In [15]:
# Problem 8: Your Code here



## Problem 9



<img src="https://drive.google.com/uc?id=1schhT-78zhsujTHe2nqpIIBT5_YJO2ne" alt="Problem 9" width="600"/>

Make a heat map that shows the number of deaths by day for each cause:
- Use the default color
- Change the format of the date so that it looks nicer
  - **Hint:** In the place where you put `date` in your `encode()` function, you can specify the format with `yearmonthdate(date)`
- Make it so that when you hover over a square, it shows the total count (_tip: use `tooltip` for this_). To see a short video of this in action, [click this link](https://drive.google.com/file/d/1l-osJ9deCO7pcGpDlwuB4_BO5mQ6tnGi/view?usp=drive_link).



In [16]:
# Problem 9: Your Code here



## Problem 10: Redesign

Use your creativity + design to improve/expand one of the previous problems.

**[This is graded using the following rubric](https://docs.google.com/document/d/1LEm11acAZC-MG5ylcJ5x7CpC2C6NcrTdU3yj3EgE_WA/edit?usp=sharing)**

In [17]:
# Problem 10: Your code here



### Your design justification

- Modify this text to include your design justification.
- You may use bullet points just like this.
- While subsequent assignments will have a high bar for design justifications, we've covered very little of that topic so far, so I don't expect this assignment to align as closely with course material (although you should use as much as you can!)