In [1]:
%matplotlib inline
import matplotlib
import seaborn as sns
matplotlib.rcParams['savefig.dpi'] = 144

In [2]:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# Explanatory Visualization with D3
<!-- requirement: projects/babyname -->
<!-- requirement: projects/d3-example-ridership -->
## Overview

D3 is well suited to building polished, interactive visualizations that highlight exactly what you want the viewer to take away. It is less a tool for exploring data than an **"explanatory"** visualization tool.

## Motivating Example
Let's look at a relatively simple standalone D3 graphic which demonstrates the main principles of D3.

[Click here for the example](projects/d3-example-ridership/d3-example.html). 


Inspect this example and its source code with your browser's developer tools.

## Key Concepts

The key concepts of D3 are:
* [Selections](https://bost.ocks.org/mike/selection/)
* [Method chaining](https://bost.ocks.org/mike/bar/#chaining)
* [Data join](https://bost.ocks.org/mike/join/)

With the plain JavaScript DOM API, you typically manipulate elements one at a time. In D3, by contrast, you manipulate **selections** (groups of elements) using **method chaining**:
```javascript
// select a DOM element, append an <svg>, and set its attributes by chaining
// (width, height, and margin are assumed to be defined earlier)
var svg = d3.select("body").append("svg")
            .attr("width", width + margin.left + margin.right)
            .attr("height", height + margin.top + margin.bottom);
```
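
To see why chaining works, here is a minimal sketch in plain JavaScript. It is not D3's implementation: `FakeSelection` is a hypothetical stand-in, and the width and height values are made up. The point is only that `attr` returns the same object while `append` returns the newly created child, which mirrors how D3 selections behave.

```javascript
// Hypothetical stand-in for a D3 selection; not D3's real implementation.
class FakeSelection {
  constructor(tag) {
    this.tag = tag;
    this.attrs = {};
    this.children = [];
  }

  // Setting an attribute returns the same object, so calls can be chained.
  attr(name, value) {
    this.attrs[name] = value;
    return this;
  }

  // Appending returns the NEW child, so later calls in the chain apply to it.
  append(tag) {
    const child = new FakeSelection(tag);
    this.children.push(child);
    return child;
  }
}

const body = new FakeSelection("body");
const svg = body.append("svg")
    .attr("width", 960)    // made-up size
    .attr("height", 500);

console.log(svg.tag, svg.attrs);
```

Because `append` hands back the child, the variable `svg` ends up referring to the `svg` element rather than to `body`, just as in the D3 snippet above.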

When adding the actual data, you tell D3 what you want, not how to do it, using a **data join.**

``` javascript
// data join: select all bar elements (none exist yet on the first render),
// bind the data, then append a <rect> for each datum in the enter() selection
svg.selectAll(".bar")
    .data(data)
    .enter().append("rect")
    .attr("class", "bar");
```
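
The idea behind the join can be modeled without a browser. The sketch below is a simplified, index-based model. `joinByIndex` is a hypothetical helper, not part of D3, and D3's real join works on DOM selections and can also match by key. It shows why every datum ends up in the "enter" group on the first render, when the `selectAll(...)` call matches no existing elements.

```javascript
// Hypothetical helper: a simplified, index-based model of a data join.
function joinByIndex(elements, data) {
  const shared = Math.min(elements.length, data.length);
  return {
    update: data.slice(0, shared),   // data paired with existing elements
    enter: data.slice(shared),       // data that has no element yet
    exit: elements.slice(shared),    // elements that have no data left
  };
}

// First render: no elements exist yet, so every datum "enters".
const first = joinByIndex([], [4, 8, 15]);
console.log(first.enter);               // [ 4, 8, 15 ]

// Later render with fewer data points: the surplus element lands in "exit".
const later = joinByIndex(["rect0", "rect1", "rect2"], [16, 23]);
console.log(later.update, later.exit);  // [ 16, 23 ] [ 'rect2' ]
```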

## Example: A Static Line Graph

In this section, we'll reproduce in D3 a plot that you would normally make in pandas.  First, run the following command from the command line.  It downloads baby-name data from the Social Security Administration going back to 1884:

``` bash
python projects/babyname/src/save_babynames_data.py
```

Once this has run (it takes 30 seconds or so), check that you have two new CSV files:

``` bash
ls -l projects/babyname/data/*.csv
```

First, we're going to do this in pandas.

In [None]:
import pandas as pd
# read in
birthnames = pd.read_csv("projects/babyname/data/birthnames.csv")
birthnames.head()

We're now going to plot a few pairs of data.

In [None]:
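import matplotlib.pyplot as plt

boys = birthnames[birthnames.sex == 'M']
girls = birthnames[birthnames.sex == 'F']

def plot_name_frac(sex, name):
    """Plot yearly births for one name on the current axes."""
    source = boys if sex == 'M' else girls
    series = source[source.name == name]
    try:
        # Pass the current axes so repeated calls overlay on one figure
        # instead of each opening a new one
        series.sort_values(by='year').plot(x='year', y='births', label=name,
                                           ax=plt.gca())
    except TypeError:
        # Raised when there is nothing to plot, e.g. no rows match the name
        pass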

plt.figure()
plot_name_frac('M', 'John')
plot_name_frac('M', 'Jonathan')
plt.xlabel("Year")
plt.ylabel("Births")
plt.legend(loc='lower right')

We can do the same thing in d3.

**Check out the final output:
[John versus Jonathan in d3](projects/babyname/name_pairs.html).**

Open the file in a text editor to see the full code.

There are four main parts to this code:

1.  The first part is the HTML tag which indicates where the chart will go.

    ``` html
    <div id="chart"><svg></svg></div>
    ```

2.  The remaining parts are JavaScript.  First, we need to load the data.

    ``` javascript
    d3.csv("data/birthnames_top_100.csv", function(error, data) {
        // do something with data
    })
    ```

    This is an asynchronous JavaScript call.  That means the call returns immediately and the script goes on to execute the next line.  Once the file `data/birthnames_top_100.csv` has been downloaded, the callback function (the anonymous function `function(error, data) { ... }`) is called with `data` representing the data that was loaded.  Hence, any code that needs the data must run inside this callback.  (This callback signature is the one used by D3 versions 3 and 4; from version 5 on, `d3.csv` returns a Promise instead.)

    A brief note: `data/birthnames_top_100.csv` is actually fetched via HTTP (point your browser to [birthnames_top_100.csv](projects/babyname/data/birthnames_top_100.csv) if you don't believe me).  This means that the file could be any CSV file on the internet, or even an API response generated dynamically by a server.  In our case, it happens to be a static file you generated at the beginning of this tutorial.  The server that powers this notebook translates the HTTP request into fetching the corresponding file from disk.

3.  The code that generates the graph is here:

    ```javascript
    function bindAndRender(myData, maxBirths) {
        nv.addGraph(function() {
            chart = nv.models.lineChart()
                        .x(function(d) { return d.year })
                        .y(function(d) { return d.births })
                        .margin({left: 100})
                        .useInteractiveGuideline(true)
                        .transitionDuration(350)
                        .showLegend(true)
                        .showYAxis(true)
                        .showXAxis(true)
                        .forceY([0, maxBirths]);

            chart.xAxis.axisLabel('Year');

            chart.yAxis.axisLabel('Births');

            d3.select('#chart svg')    // Select the <svg> element for the chart.
                .datum(myData)         // Bind myData to the <svg> element...
                .call(chart);          // ...and finally render the chart!
        });
    }
    ```
    
    This code actually uses a library called `nvd3`, which packages common chart types built on top of `d3`.  Basically, `chart` specifies a set of rules for how to draw the data, and the `d3.select(...).datum(...)` chain binds (associates) the data with the HTML element matched by the CSS selector `'#chart svg'`.  This code demonstrates a few nifty aspects of JavaScript:
    - **Chaining:** `nv.models.lineChart()` returns a `chart` object.  Calling the method `.x` modifies it and returns that same object, so `.y` can be called on the result, and so on down the chain.
    - **Callbacks:** notice that both `.x` and `.y` take a function.  Given a data point `d`, that function returns the value to plot along the x or y axis.  The object `chart` is itself a function: the selection returned by `d3.select` invokes it through its `.call` method, using the data bound by `.datum`.  Finally, the function `addGraph` takes a callback itself.
    - **CSS selectors:** Recall that `"#chart svg"` (the same selector syntax jQuery uses) selects any `svg` element nested inside the HTML element with `id="chart"`.
    - **Global versus local context:** notice that while the typical variable assignment pattern is
    
        ```javascript
        var chart = ...
        ```
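      here we omit the `var`.  Using `var` would make the variable local to the function, but we want it in the global context so that we can use it again outside of this function (see below).  Note that assigning to an undeclared variable only works in non-strict code; in strict mode it throws a `ReferenceError`.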
    
4.  Dynamic resizing.  The function `bindAndRender` is called when the page loads to bind (associate) the data with the chart.  Once this is done, all we have to do to redraw the chart is invoke `.call(chart)` again.  This is done via a callback attached to the window's `resize` event.  That is, the following lines ensure that every time the window is resized, we issue another draw command:

    ```javascript
    window.addEventListener('resize', function(event) {
        d3.select('#chart svg')
            .call(chart);
    });
    ```

    Notice that we can call `chart` here, which was left in the global context by the function `bindAndRender`.
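
The "chart is a function with configuration methods" pattern described above can be sketched in plain JavaScript. This is an illustration under stated assumptions, not nvd3's code: `lineChart` and the `selection` object are hypothetical stand-ins for `nv.models.lineChart()` and `d3.select(...)`, and the data values are made up.

```javascript
// Hypothetical mini chart factory; this is not nvd3's implementation.
function lineChart() {
  let getX = d => d.x;   // default accessors
  let getY = d => d.y;

  // The chart itself is a function that receives a selection-like object.
  function chart(selection) {
    selection.rendered = selection.data.map(d => [getX(d), getY(d)]);
  }

  // Each setter returns `chart`, which is what makes chaining possible.
  chart.x = function(fn) { getX = fn; return chart; };
  chart.y = function(fn) { getY = fn; return chart; };
  return chart;
}

// Hypothetical stand-in for a D3 selection: datum() stores the data,
// call() invokes the given function with the selection itself.
const selection = {
  data: null,
  rendered: null,
  datum(d) { this.data = d; return this; },
  call(fn) { fn(this); return this; },
};

const chart = lineChart()
    .x(d => d.year)
    .y(d => d.births);

// Made-up data points, bound once and then rendered.
selection.datum([{ year: 1990, births: 10 }, { year: 1991, births: 12 }])
    .call(chart);
console.log(selection.rendered);  // [ [ 1990, 10 ], [ 1991, 12 ] ]

// "Redraw" without rebinding the data, as the resize listener does.
selection.call(chart);
```

Since the data stays bound to the selection, the second `call(chart)` needs nothing but the chart function, which is why the resize listener only has to know about `chart`.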

**Exercise:** On the webpage linked above, open your browser's developer tools (`Cmd+Option+I` on macOS, `Ctrl+Shift+I` on Windows or Linux) and click the Console tab to enter the interactive JavaScript console.  Paste in these lines:
```javascript
d3.csv("data/birthnames_top_100.csv", function(error, data) {
    rawData = data;
})
```
Type `rawData` into the console.  Again, because we didn't use `var`, the variable `rawData` is placed in the global context.  Follow the data manipulation steps in the source code line by line (you can view the source code on your local machine).  The code uses a small but powerful library called [underscore.js](http://underscorejs.org/) which is imported as the symbol "`_`" (which is a valid variable name in JavaScript).
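
As a rough idea of what such data manipulation can look like, here is a sketch in plain JavaScript (no underscore.js) that groups CSV-style rows into the `[{key, values}]` layout that nvd3 line charts accept. The rows are made up, the column names `name`, `year`, and `births` are assumed from the pandas code earlier, and `toSeries` is a hypothetical helper; the actual source may shape the data differently.

```javascript
// Made-up rows shaped like d3.csv output: every value is a string.
const rows = [
  { name: "John", year: "1990", births: "300" },
  { name: "Jonathan", year: "1990", births: "200" },
  { name: "John", year: "1991", births: "290" },
];

// Hypothetical helper: group rows by name into one series per name.
function toSeries(rows) {
  const byName = new Map();
  for (const row of rows) {
    if (!byName.has(row.name)) byName.set(row.name, []);
    // Unary plus coerces the CSV strings to numbers.
    byName.get(row.name).push({ year: +row.year, births: +row.births });
  }
  return Array.from(byName, ([key, values]) => ({ key, values }));
}

const series = toSeries(rows);
console.log(series.map(s => s.key));  // [ 'John', 'Jonathan' ]
```

The numeric coercion matters because values parsed from CSV arrive as strings unless they are converted, and a chart would otherwise compare and scale them as text.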

**Question:** In the `d3.csv` snippet above, we save our data into `rawData`, a variable in our outer scope.  Why does this work in the console but would likely not have the intended side effect in a script?  Hint: this has to do with asynchronicity.

**Exercise:** Look at the plot of presidential first names below.  You can tell there's a spike in baby names around the time they are elected.  Reproduce this visualization in `d3`.  Highlight the portion of the data when they were president.

**Action Item:** Read through this introductory example by [Mike Bostock](http://bost.ocks.org/mike/bar/) and the online version of the [O'Reilly book](http://alignedleft.com/tutorials/d3/).

In [None]:
# There's a spike in the popularity of presidential first names
# when they are elected

plt.figure()
plot_name_frac('M', 'Woodrow')
plot_name_frac('M', 'Warren')
plot_name_frac('M', 'Calvin')
plot_name_frac('M', 'Herbert')
plot_name_frac('M', 'Franklin')
plot_name_frac('M', 'Dwight')
plot_name_frac('M', 'Lyndon')
plot_name_frac('M', 'Barack')
plt.yscale('log')
plt.legend(loc='lower left')

### Resources

There are many D3 tutorials online. Here are two relevant links:
- [Stack Overflow - D3 Documentation (beta)](http://stackoverflow.com/documentation/d3.js/topics)
- [D3 Tutorials Wiki](https://github.com/d3/d3/wiki/Tutorials)


The `nvd3` library handles the boilerplate only for certain static charts. Never fear, though: plenty of other libraries cover the boilerplate for a variety of charts while still offering flexibility, because they are built on top of D3.

- [C3.js](http://c3js.org/)
- [Rickshaw](http://code.shutterstock.com/rickshaw/tutorial/introduction.html) - another popular time series library.
- [JSNetworkX](http://felix-kling.de/JSNetworkX/index.html) - NetworkX for JavaScript!
- [DataMaps](http://datamaps.github.io/) - for maps
- [Pyxley](https://github.com/stitchfix/pyxley/tree/master/examples/metricsgraphics) - a wrapper with strong Python/pandas support.

And of course, there are plenty of charting libraries outside of the D3 ecosystem, too.

*Copyright &copy; 2016 The Data Incubator.  All rights reserved.*