# SI649-20-Winter Lab 2 -> Altair I
## Overview 
We're going to re-create some of the visualizations we made in Tableau for the article [“The Dollar-And-Cents Case Against Hollywood’s Exclusion of Women”](https://fivethirtyeight.com/features/the-dollar-and-cents-case-against-hollywoods-exclusion-of-women/), this time using Altair. We'll be teaching you different pieces of Altair over the next few weeks, so we'll focus on just a few visualizations this time:

1.   Replicate 2 visualizations from the original article
2.   Implement 2 new visualizations according to our specifications

**For this lab, we have done all of the necessary data transformation for you. You do not need to modify any dataframe. You only need to write Altair code. It's fine if your visualization looks slightly different from the example (e.g., getting 1.1 instead of 1.0)**

### Lab Instructions (read the full version in the handout from the previous lab)

*   Save, rename, and submit the ipynb file (use your username in the name).
*   Run every cell (do Runtime -> Restart and run all to make sure you have a clean working version), print to PDF, and submit the PDF file.
*   For each visualization, we will ask you to write down a "Grammar of Graphics" plan first (basically a description of what you'll code).
*   If you end up stuck, show us your work by including links (URLs) that you have searched for. You'll get partial credit for showing your work in progress. 
*   There are many bonus point opportunities in this lab. 

We encourage you to go through the Altair tutorials before next week:
- [UW Course](https://github.com/uwdata/visualization-curriculum)
- [Altair tutorial](https://github.com/altair-viz/altair-tutorial)

### Resources
- [Altair Documentation](https://altair-viz.github.io/index.html)
- [Colab Overview](https://colab.research.google.com/notebooks/basic_features_overview.ipynb)
- [Markdown Cheatsheet](https://www.markdownguide.org/cheat-sheet/)
- [Pandas DataFrame Introduction](https://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html)
- [Vega-Lite documentation](https://vega.github.io/vega-lite/docs/)
- [Vega/Vega-Lite editor](https://vega.github.io/vega-editor/)


In [0]:
# imports we will use
import altair as alt
import pandas as pd
from collections import defaultdict
from altair import datum

In [0]:
# load data and perform basic data processing 
# get the CSV
datasetURL="https://raw.githubusercontent.com/LiciaHe/SI649/master/week2/movies_individual_task.csv" 
movieDF=pd.read_csv(datasetURL, encoding="latin-1")

# fix the result column, rename the values
movieDF['test_result'] = movieDF['clean_test'].map({
    "ok":"Passes Bechdel Test",
    "men":'Women only talk about men',
    "notalk":"Women don't talk to each other",
    "nowomen":"Fewer than two women",
    "dubious":"dubious"
})

# fix the location column for later use
locationDict = defaultdict(lambda: 'International')
locationDict["United States"]="U.S. and Canada"
locationDict["Canada"]="U.S. and Canada"
movieDF["country_binary"]=movieDF["country"].map(locationDict)

##calculate ROI for 2nd chart 
movieDF["roi_dom"]=movieDF["domgross_2013$"]/movieDF["budget_2013$"]
movieDF["int_only_gross"]=movieDF["intgross_2013$"]-movieDF["domgross_2013$"]
movieDF["roi_int"]=movieDF["int_only_gross"]/movieDF["budget_2013$"]

movieDF=movieDF.drop(columns=["Unnamed: 0","test","budget","domgross","intgross","code","period code","decade code","director","genre","director_gender","imdb"])
movieDF_since_1990=movieDF[movieDF.year>1989]

In [22]:
#take a look at the new dataset
movieDF.sample(3)
# movieDF_since_1990.sample(3)

Unnamed: 0,year,title,clean_test,binary,budget_2013$,domgross_2013$,intgross_2013$,rating,country,language,test_result,country_binary,roi_dom,int_only_gross,roi_int
947,2004,Ella Enchanted,ok,PASS,43161751,28256983.0,28256983.0,6.3,United States,English,Passes Bechdel Test,U.S. and Canada,0.654676,0.0,0.0
314,2010,Beginners,notalk,FAIL,3418342,6186017.0,12353318.0,7.2,United States,English,Women don't talk to each other,U.S. and Canada,1.809654,6167301.0,1.804179
31,2013,Grown Ups 2,men,FAIL,80000000,133668525.0,247023808.0,5.4,United States,English,Women only talk about men,U.S. and Canada,1.670857,113355283.0,1.416941
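
The preprocessing cell relies on two details that are easy to miss: `Series.map` uses a `defaultdict`'s default value for keys the dictionary does not contain (rather than producing `NaN`), and each ROI column is simply gross divided by budget. Here is a minimal sketch on a hypothetical three-row frame. The name `toyDF` and the third row are assumptions made up for illustration; the first two rows reuse numbers from the sample output above.

```python
from collections import defaultdict

import pandas as pd

# Two rows copied from the sample output above, plus one made-up row
# (the "France" entry is hypothetical; it only exercises the default branch).
toyDF = pd.DataFrame({
    "country": ["United States", "United States", "France"],
    "budget_2013$": [43161751, 3418342, 1000000],
    "domgross_2013$": [28256983.0, 6186017.0, 500000.0],
    "intgross_2013$": [28256983.0, 12353318.0, 2500000.0],
})

# The defaultdict supplies "International" for any country not listed explicitly.
locationDict = defaultdict(lambda: "International")
locationDict["United States"] = "U.S. and Canada"
locationDict["Canada"] = "U.S. and Canada"
toyDF["country_binary"] = toyDF["country"].map(locationDict)

# ROI = dollars earned for every dollar spent.
toyDF["roi_dom"] = toyDF["domgross_2013$"] / toyDF["budget_2013$"]
toyDF["int_only_gross"] = toyDF["intgross_2013$"] - toyDF["domgross_2013$"]
toyDF["roi_int"] = toyDF["int_only_gross"] / toyDF["budget_2013$"]

print(toyDF[["country_binary", "roi_dom", "roi_int"]])
```

The first two rows should reproduce the `roi_dom` and `roi_int` values shown in the sample output, up to rounding.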


## Visualization 1: Recreate this visualization 


![vis2](https://fivethirtyeight.com/wp-content/uploads/2014/04/hickey-bechdel-2.png?w=1150)

### Step 1: Write down your plan for the visualization (edit this cell)

*   Data Name: *movieDF_since_1990*
*   mark type: Bar
*   Encoding Specification:  
*   > x: median(budget_2013$) : Quantitative
*   > y: test_result : Nominal

Example encoding, if we had the nominal variable 'movietype' and we wanted to use color, it would be:

color: movietype:nominal


### Step 2: Create your chart. 
Please take a look at the checkpoints below. You can follow the checkpoints to work through the problem step by step. Don't forget to paste your FINAL answer into the cell immediately below this block (it will allow us to grade). You can search for the keyword "TODO" to locate cells that need your edits.


In [34]:
#TODO: Replicate visualization 1 

bars = alt.Chart(movieDF_since_1990,title="Median Budget For Films Since 1990: 2013 dollars").mark_bar().encode(
    x=alt.X('median(budget_2013$)', title=None),
    y=alt.Y("test_result", title=None,  sort=['Passes Bechdel Test', 'Women only talk about men', 'Women don\'t talk to each other', 'Fewer than two women','dubious'])
).transform_filter(
    (datum.test_result !="dubious") )
text = bars.mark_text(
    align='left',
    baseline='middle',
    dx=3  # Nudges text to right so it doesn't appear on top of the bar
).encode(
    text=alt.Text("median(budget_2013$):Q", format="$.3s")
)
(bars + text)

#### checkpoint 1: basic bar chart: you get full points if you 
 
*  Specify the correct mark 
*  Use the correct x and y encoding 
*  Plot the right data (hint: make sure you examine the data frame and use the correct columns)


Your chart should look like:

![](https://github.com/LiciaHe/SI649/blob/master/week2/imgs/v1/vis1_1.png?raw=true)

#### checkpoint 2: basic bar chart with sorted order: you get full points if you 
 
*  Completed checkpoint1 
*  Align the order of your y-axis values with the provided example.
*   >*i.e., from top to bottom, the order of the bars is "Passes Bechdel Test","Women only talk about men","Women don't talk to each other","Fewer than two women","dubious".*

Hint: [Sort](https://altair-viz.github.io/user_guide/generated/core/altair.Sort.html?highlight=sort)


Your chart should look like:

![](https://github.com/LiciaHe/SI649/blob/master/week2/imgs/v1/vis1_2.png?raw=true)

#### checkpoint 3: basic bar chart with title: you get full points if you 
 
*  Completed checkpoint2 
*  Remove labels on x-axis and y-axis
*  Add a chart title 

Your chart should look like:

![](https://github.com/LiciaHe/SI649/blob/master/week2/imgs/v1/vis1_3.png?raw=true)

#### checkpoint 4: BONUS: remove dubious. You will get full points if you 
 
* Complete checkpoint 3
* Remove the bar for "dubious" (using Altair, no Pandas)

Your chart will look like:

![](https://github.com/LiciaHe/SI649/blob/master/week2/imgs/v1/vis1_4.png?raw=true)

#### checkpoint 5: BONUS: add number labels.

You will get full points if you 
 
* Complete checkpoint 4
* Add numbers as labels on your bars 

Your chart will look like:

![](https://github.com/LiciaHe/SI649/blob/master/week2/imgs/v1/vis1_5.png?raw=true)

#### checkpoint 6: BONUS: format numbers.

You will get full points if you 
 
* Complete checkpoint 5
* Adjust number labels to display millions (e.g., 31.4592 M instead of 31459218). You might want to read about [format](https://altair-viz.github.io/user_guide/encoding.html?highlight=format%20type) and [D3's format specification](https://github.com/d3/d3-format#locale_format). 

Your chart will look like:

![](https://github.com/LiciaHe/SI649/blob/master/week2/imgs/v1/vis1_6.png?raw=true)

## Visualization 2: Replicate this visualization


![alt text](https://fivethirtyeight.com/wp-content/uploads/2014/04/hickey-bechdel-3.png?w=1150)





### Step 1: Write down your plan for the visualization (edit this cell)

Left chart:
*   Data Name: *movieDF_since_1990*
*   mark type: Bar
*   Encoding Specification:  
*   > x: median(roi_dom) : Quantitative
*   > y: test_result : Nominal


Right chart:
*   Data Name: *movieDF_since_1990*
*   mark type: Bar
*   Encoding Specification:  
*   > x: median(roi_int) : Quantitative
*   > y: test_result : Nominal

Compound Method (how to join these charts together?): Horizontal Concatenation which can be created using the hconcat function or the | operator.

Example encoding, if we had the nominal variable 'movietype' and we wanted to use color, it would be:

color: movietype:nominal









### Step 2: Create your chart. 
Please take a look at the checkpoints below. You can follow the checkpoints to work through the problem step by step. Don't forget to paste your FINAL answer into the cell immediately below this block (it will allow us to grade). You can search for the keyword "TODO" to locate cells that need your edits.


In [32]:
#TODO: Replicate chart 2
bars_us_canada= alt.Chart(movieDF_since_1990,title="US and Canada").mark_bar().encode(
   x=alt.X('median(roi_dom)', title=None),
   y=alt.Y("test_result",title=None,sort=['Passes Bechdel Test', 'Women only talk about men', 'Women don\'t talk to each other', 'Fewer than two women','dubious'])
)
bars_international= alt.Chart(movieDF_since_1990,title="International").mark_bar(color="orange").encode(
   x=alt.X('median(roi_int)', title=None),
   y=alt.Y("test_result", axis=None, sort=['Passes Bechdel Test', 'Women only talk about men', 'Women don\'t talk to each other', 'Fewer than two women','dubious'])
)

(bars_us_canada|bars_international).resolve_scale(y='shared').properties(
   title="Dollars Earned for Every Dollar Spent"
)
text = bars_us_canada.mark_text(
    align='left',
    baseline='middle',
    dx=3  # Nudges text to right so it doesn't appear on top of the bar
).encode(
    text=alt.Text('median(roi_dom):Q', format='.5f')
)

text_international = bars_international.mark_text(
    align='left',
    baseline='middle',
    dx=3  
).encode(
    text=alt.Text('median(roi_int):Q',  format='.5f')
)

(bars_us_canada+text|bars_international + text_international).resolve_scale(y='shared').properties(
   title="Dollars Earned for Every Dollar Spent"
)

### Visualization 2 Checkpoints


#### checkpoint 1: basic bar charts
 
*  Specify the correct mark 
*  Use the correct x and y encoding 
*  Plot the right data (hint: make sure you examine the data frame and use the correct columns)
*  You will have 2 charts, one for U.S.&Canada, one for International


Your chart will look like:

![](https://github.com/LiciaHe/SI649/blob/master/week2/imgs/v2/vis2_1_1.png?raw=true) 
 
 and 
 ![](https://github.com/LiciaHe/SI649/blob/master/week2/imgs/v2/vis2_1_2.png?raw=true) 

#### checkpoint 2: joining two charts
 
* completed checkpoint1
* joined two charts 


Your chart will look like:
![](https://github.com/LiciaHe/SI649/blob/master/week2/imgs/v2/vis2_2.png?raw=true)

#### checkpoint 3: resolve y scale and hide the second y-axis
 
* completed checkpoint2
* ensure that two charts are sharing the same y-axis 
* remove the second y-axis


Your chart will look like:

![](https://github.com/LiciaHe/SI649/blob/master/week2/imgs/v2/vis2_3.png?raw=true)

#### checkpoint 4: sort y-axis 
 
* completed checkpoint 3
* Sort y-axis so that the order of the bars is (from top to bottom): 
> "Passes Bechdel Test","Women only talk about men","Women don't talk to each other","Fewer than two women","dubious"




Your chart will look like:

![](https://github.com/LiciaHe/SI649/blob/master/week2/imgs/v2/vis2_4.png?raw=true)

#### checkpoint 5: Change color and titles  
 
* completed checkpoint 4
* color the bars of the two charts with different colors
* add a title to the compound chart 
* edit the axis labels (you can also remove the axis labels and add a title to each individual chart)




Your chart will look like:
![](https://github.com/LiciaHe/SI649/blob/master/week2/imgs/v2/vis2_5.png?raw=true)

#### checkpoint 6: BONUS: Add number layer  
 
* completed checkpoint 5
* add number annotations


Your chart will look like:

![](https://github.com/LiciaHe/SI649/blob/master/week2/imgs/v2/vis2_6.png?raw=true)

## Visualization 3: Replicate this visualization

![](https://github.com/LiciaHe/SI649/blob/master/week2/imgs/v3/vis3_4.png?raw=true)

### Step 1: Write down your plan for the visualization (edit this cell)

*   Data Name: *movieDF*
*   mark type: Line
*   Encoding Specification (1st chart):  
*   > x: year : Nominal
*   > y: average(budget_2013$) : Quantitative

*   Encoding Specification (2nd chart):  
*   > x: year : Nominal
*   > y: median(budget_2013$) : Quantitative

*   Encoding Specification (3rd chart):  
*   > x: year : Nominal
*   > y: max(budget_2013$) : Quantitative






### Step 2: Create your chart. 
Please take a look at the checkpoints below. You can follow the checkpoints to work through the problem step by step. Don't forget to paste your FINAL answer into the cell immediately below this block (it will allow us to grade). You can search for the keyword "TODO" to locate cells that need your edits.


In [6]:
#TODO: Replicate visualization 3
# All three charts encode year as nominal (as in the plan) so the shared x scale lines up.
line_avg = alt.Chart(movieDF).mark_line().encode(
    x=alt.X('year:N', axis=None),  # hide the duplicate x-axis
    y=alt.Y("average(budget_2013$)")
).properties(
    height=100,
    width=500
)
line_median = alt.Chart(movieDF).mark_line(color="grey").encode(
    x=alt.X('year:N', axis=None),  # hide the duplicate x-axis
    y=alt.Y("median(budget_2013$)")
).properties(
    height=100,
    width=500
)
# Only the bottom chart keeps its x-axis.
line_max = alt.Chart(movieDF).mark_line(color="pink").encode(
    x=alt.X('year:N'),
    y=alt.Y("max(budget_2013$)")
).properties(
    height=100,
    width=500
)
# Stack the charts vertically and share the x scale across them.
(line_avg & line_median & line_max).resolve_scale(x='shared')

### Visualization 3 Checkpoints

#### checkpoint 1: line chart for average, median, and max of budget 
 
You will get full points if you 
*  Specify the correct mark 
*  Use the correct x and y encoding 
*  Plot the right data 
*  Produce 3 line charts


Your chart will look like:
![](https://github.com/LiciaHe/SI649/blob/master/week2/imgs/v3/vis3_1_1.png?raw=true)
and 
![](https://github.com/LiciaHe/SI649/blob/master/week2/imgs/v3/vis3_1_2.png?raw=true)
and 
![](https://github.com/LiciaHe/SI649/blob/master/week2/imgs/v3/vis3_1_3.png?raw=true)

#### checkpoint 2: concat 3 line charts 
 
You will get full points if you 
*  Complete checkpoint 1
*  Concat 3 charts vertically


Your chart will look like:
![](https://github.com/LiciaHe/SI649/blob/master/week2/imgs/v3/vis3_2.png?raw=true)


#### checkpoint 3: adjust width,  height and color 
Each chart should be 500x100, plotted with different colors
 
You will get full points if you 
*  Complete checkpoint 2
*  Adjust chart width and height
*  Plot charts with different colors


Your chart will look like:
![](https://github.com/LiciaHe/SI649/blob/master/week2/imgs/v3/vis3_3.png?raw=true)


#### checkpoint 4: resolve axis and remove duplicated x-axis 

You will get full points if you 
*  Complete checkpoint 3
*  Ensure that 3 charts are sharing the same x-axis 
*  Remove duplicate axis ticks. 


Your chart will look like:

![](https://github.com/LiciaHe/SI649/blob/master/week2/imgs/v3/vis3_4.png?raw=true)


## Visualization 4: Replicate this visualization


![](https://github.com/LiciaHe/SI649/blob/master/week2/imgs/v4/vis4.png?raw=true)

### Step 1: Write down your plan for the visualization (edit this cell)


*   Data Name: *movieDF*
*   mark type: circle
*   Encoding Specification:  
*   > x: rating : Quantitative
*   > y: intgross_2013$ : Quantitative






### Step 2: Create your chart. 
This chart is relatively simple so there's no checkpoint.

In [51]:
#TODO: Replicate visualization 4 
plot = alt.Chart(movieDF).mark_circle().encode(
    x=alt.X('rating:Q'),
    y=alt.Y('intgross_2013$')
)
plot

*End of LAB2*

Please run all cells (Runtime->Run all), and 
1.  save to PDF (File->Print->Save PDF)
2.  save to ipynb (File -> Download .ipynb)

Rename both files with your uniqname (e.g., uniqname.pdf and uniqname.ipynb), then upload both files to Canvas.
