Data Visualization examples for the workshop at CMU
bar-d3
bar-highcharts
bar-p5
data
docs
js
line-d3
line-highcharts
line-p5
scatterplot-d3
scatterplot-highcharts
scatterplot-p5
.gitignore
readme.md


Data Visualization Workshop

This is a repository of examples used in the Data Visualization workshop. In it, there are three versions of the same bar chart, three versions of the same line graph, and three versions of the same scatterplot. One of each is done using the Highcharts library, one is done using P5.js, and one is done using D3.

Bar Charts

The bar charts are basic examples, showing how to generate simple charts using each library.
They're based on the Highcharts intro tutorial, Mike Bostock's bar chart tutorial, and the p5 examples.

This is the most basic form of chart that we're going to show—it's just a quick example of how to draw something with each library.

The data is:

value
4
8
15
16
23
42

We'll plot these values along the horizontal axis, as horizontal bars, with inline labels (or the default style, for Highcharts).

Highcharts

The complete JavaScript code for the Highcharts object is:

$(function () { 

    // Set up a basic bar chart
    $('#container').highcharts({
        chart: {
            type: 'bar'  // <- Chart type
        },
        series: [{
            data: [4, 8, 15, 16, 23, 42]
        }]
    });
});

Highcharts uses jQuery to handle the DOM load event and other interactions, and is itself a jQuery plugin. To create a Highcharts chart, you use a jQuery selector to choose a DOM element, call the highcharts method on the resulting jQuery object, and pass in a configuration object. The minimum you need is a chart type and a data series.
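As a minimal sketch of that pattern, the configuration object can be built separately from the jQuery call. The helper name `buildBarConfig` is hypothetical (it is not part of Highcharts), and the guard around `$` is only there so the snippet does nothing when jQuery and the Highcharts plugin aren't loaded.

```javascript
// Hypothetical helper: builds the smallest configuration Highcharts needs,
// a chart type plus one data series.
function buildBarConfig(values) {
    return {
        chart: { type: 'bar' },
        series: [{ data: values }]
    };
}

var config = buildBarConfig([4, 8, 15, 16, 23, 42]);

// Only touch the DOM when jQuery and the Highcharts plugin are present.
if (typeof $ === 'function' && $.fn && typeof $.fn.highcharts === 'function') {
    $(function () {
        $('#container').highcharts(config);
    });
}
```

Keeping the configuration in a plain object makes it easy to swap the chart type or the data without touching the selector code.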

p5.js

p5.js by default creates an index.html and a sketch.js file. The index file has very little in it: just basic HTML setup and links to the JavaScript files.

In sketch.js, we have the traditional setup() and draw() functions from most creative coding environments. setup() will run once, on initialization, and draw() will run continuously, once for each frame of the visualization. We've also declared a global variable, data, to hold our data.

The p5 sketch iterates through a for loop, drawing a rectangle for each bar. It also uses the drawing state (the transformation matrix) to handle position: the push() and pop() calls that wrap the actual drawing code save and restore that state for each bar.
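The structure described above can be sketched roughly as follows. This is an illustration, not the sketch.js from the repository: the helper `barLayout`, the canvas size, and the colors are all assumptions. The p5 functions used (createCanvas, noLoop, background, push, translate, fill, rect, textAlign, text, pop) are standard p5.js calls.

```javascript
// The same values used in the bar chart examples.
var data = [4, 8, 15, 16, 23, 42];

// Hypothetical pure helper: one {y, width} entry per bar, scaled so the
// largest value spans maxWidth pixels.
function barLayout(values, maxWidth, barHeight, gap) {
    var maxValue = Math.max.apply(null, values);
    return values.map(function (v, i) {
        return { y: i * (barHeight + gap), width: (v / maxValue) * maxWidth };
    });
}

// p5 calls setup() once, on initialization.
function setup() {
    createCanvas(500, 200);
    noLoop(); // the data is static, so one frame is enough
}

// p5 calls draw() for every frame (only once here, because of noLoop()).
function draw() {
    background(255);
    var bars = barLayout(data, 400, 20, 5);
    for (var i = 0; i < bars.length; i++) {
        push();                               // save the drawing state
        translate(10, 10 + bars[i].y);        // move the origin to this bar's row
        fill(70, 130, 180);
        rect(0, 0, bars[i].width, 20);
        fill(255);
        textAlign(RIGHT, CENTER);
        text(data[i], bars[i].width - 5, 10); // inline label
        pop();                                // restore the saved state
    }
}
```

Because translate() is wrapped in push() and pop(), each bar is drawn at (0, 0) in its own coordinate system and the offset never accumulates between iterations.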

(Note that all our examples use a local version of p5. The released version has issues with CSV parsing, and will not work. This has been fixed on the master branch, and will function properly after the next release of p5.js.)

Line Charts

The line charts are a slightly more complicated example. The data is still 1-dimensional, but there are many more points. The data comes from Golan Levin's Secret Lives of Numbers project. We're only using the first 100 numbers, rather than 100K.

This is slightly more complicated, since we're now loading data from an external source. We're also adding axes, which is trivial in Highcharts, a built-in function in D3, and significantly more complicated in p5.
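To give a sense of the extra work in p5, here is a sketch of what has to be written by hand there: parsing the loaded text, mapping values to pixels, and choosing tick positions. All three function names are hypothetical, and the file layout (a single column with a header row, like the bar chart data) is an assumption rather than the actual format of the line chart data.

```javascript
// Assumption: a single-column CSV with a header row.
// Returns at most `limit` numeric values.
function parseColumn(csvText, limit) {
    return csvText
        .split(/\r?\n/)
        .slice(1)                                         // drop the header row
        .map(function (line) { return line.trim(); })
        .filter(function (line) { return line !== ''; })  // skip blank lines
        .slice(0, limit)
        .map(Number);
}

// Linear scale: maps a value in [d0, d1] to a pixel position in [r0, r1].
function linearScale(d0, d1, r0, r1) {
    return function (v) {
        return r0 + ((v - d0) / (d1 - d0)) * (r1 - r0);
    };
}

// Evenly spaced tick values for a hand-drawn axis.
function ticks(d0, d1, count) {
    var step = (d1 - d0) / count;
    var out = [];
    for (var i = 0; i <= count; i++) {
        out.push(d0 + i * step);
    }
    return out;
}

var values = parseColumn('value\n4\n8\n15\n16\n23\n42\n', 100);
var maxValue = Math.max.apply(null, values);

// The pixel range is inverted because canvas y grows downward.
var y = linearScale(0, maxValue, 180, 20);
var tickPositions = ticks(0, maxValue, 6).map(y);
```

Each entry in tickPositions is a y coordinate where a tick mark and label would be drawn. Highcharts and D3 take care of this step for us.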

Scatterplots

The data for the scatterplots comes from the Carnegie Museum of Art collection API, which is still in private beta. We're using it to graph the size of artwork (in square inches) against the year the museum acquired the artwork. This is useful because it shows changes in the collection choices that the museum has made over the years, and also because it's just a lot of interesting data that I have access to.

We're also going to call out large acquisition lots: collections of artwork obtained at the same time from the same person. We'll tag those with special colors.

This data set is a .CSV file with ~25,600 records and the following structure:

Sample Data
| title | group | width | height | area | creation | acquisition |
|---|---|---|---|---|---|---|
| What Steve Saw | 0 | 60.250 | 72.000 | 4338.0 | 2010 | 2012 |
| The Studio | 0 | 25.000 | 30.750 | 768.75 | 1951 | 1957 |
| [Untitled] | 0 | 5.512 | 1.772 | 9.77 | 1941 | 2014 |
| 68 Tiges Marteles | 0 | 23.375 | 31.500 | 736.31 | 1942 | 1976 |

The group column calls out large collections of items that were acquired at the same time. It's zero unless the group is larger than 100 items, at which point it is the number of items within the group. Typically, we'd compute this on the fly, but since the API is not yet publicly released, we're going to use this as a placeholder.
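For reference, computing the group value on the fly might look something like the sketch below. It assumes each record carries a `source` field identifying who the work came from. That field is hypothetical: it is not one of the columns in the sample data, and the function name `tagLargeGroups` is ours.

```javascript
// Assumption: each record has an `acquisition` year and a hypothetical
// `source` field naming the person the work was obtained from.
function tagLargeGroups(records, threshold) {
    // First pass: count the records in each (year, source) lot.
    var counts = {};
    records.forEach(function (r) {
        var key = r.acquisition + '|' + r.source;
        counts[key] = (counts[key] || 0) + 1;
    });
    // Second pass: group is zero unless the lot is larger than the threshold.
    return records.map(function (r) {
        var size = counts[r.acquisition + '|' + r.source];
        return Object.assign({}, r, { group: size > threshold ? size : 0 });
    });
}

// Tiny demonstration, using a threshold of 2 instead of 100.
var tagged = tagLargeGroups([
    { title: 'A', acquisition: 1957, source: 'donor-1' },
    { title: 'B', acquisition: 1957, source: 'donor-1' },
    { title: 'C', acquisition: 1957, source: 'donor-1' },
    { title: 'D', acquisition: 2012, source: 'donor-2' }
], 2);
```

In the demonstration, the three records from 1957 get a group value of 3 and the single 2012 record gets 0.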

Width and height are in inches, and area is in square inches (in^2). Creation and acquisition are years.
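CSV parsers generally hand back every field as a string, so each row needs converting before it can be plotted. A minimal sketch, using the column names from the sample data (the function names `toRecord` and `areaIsConsistent` are hypothetical):

```javascript
// Converts one parsed CSV row (all strings) into a typed record.
function toRecord(row) {
    return {
        title: row.title,
        group: parseInt(row.group, 10),
        width: parseFloat(row.width),               // inches
        height: parseFloat(row.height),             // inches
        area: parseFloat(row.area),                 // square inches
        creation: parseInt(row.creation, 10),       // year
        acquisition: parseInt(row.acquisition, 10)  // year
    };
}

// In the sample rows, area matches width * height to within rounding.
function areaIsConsistent(record) {
    return Math.abs(record.width * record.height - record.area) < 0.01;
}

var sample = toRecord({
    title: 'The Studio', group: '0', width: '25.000', height: '30.750',
    area: '768.75', creation: '1951', acquisition: '1957'
});
```

A check like areaIsConsistent is a cheap way to catch rows where a column has shifted or a unit is wrong before they end up on the chart.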

We'll be using the data to generate something that looks like this:

[Figure: example scatterplot]

What are we doing?

This data set is interesting for a couple of reasons. One is that it's big: with roughly 25,600 records, it gives us a chance to test the performance of each library.

Another is that we can't directly graph the data, because the difference in size between the smallest item and the largest item is immense.

| Wall Drawing #493 | miniature portrait of Louis XVI |
|---|---|
| 305,856 in^2 | 0.117 in^2 |

If we were to plot these with a linear scale for size, we'd get something like this:

[Figure: example scatterplot with a linear scale for size]

In order to display this information usefully, we'll need to use a log scale. This is a standard visualization tool, but it will add enough complexity to our project that we can begin to see the strengths and weaknesses of the various libraries.
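To make the difference concrete, here is a small sketch comparing the two scales by hand. The function names and the 400-pixel range are assumptions for illustration. The two extreme areas and the 768.75 in^2 value (The Studio) come from the data described earlier.

```javascript
// Linear scale: maps a value in [d0, d1] to a pixel position in [r0, r1].
function linearScale(d0, d1, r0, r1) {
    return function (v) {
        return r0 + ((v - d0) / (d1 - d0)) * (r1 - r0);
    };
}

// Log scale: same idea, but positions are spaced by order of magnitude.
// All values must be positive.
function logScale(d0, d1, r0, r1) {
    var lo = Math.log10(d0);
    var hi = Math.log10(d1);
    return function (v) {
        return r0 + ((Math.log10(v) - lo) / (hi - lo)) * (r1 - r0);
    };
}

// Smallest and largest areas in the collection, in square inches.
var smallest = 0.117;
var largest = 305856;

var linear = linearScale(smallest, largest, 0, 400);
var log = logScale(smallest, largest, 0, 400);

// A mid-sized painting: on the linear scale it is squashed against the
// bottom edge, while on the log scale it lands past the middle of the axis.
var linearPos = linear(768.75);
var logPos = log(768.75);
```

On the linear scale, almost every artwork is packed into the first few pixels because one enormous wall drawing sets the top of the axis. The log scale spreads the same values across the full height.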


Simple Local HTTP Server

Because we are loading external resources, it's usually easier to serve these files from a local web server. The easiest way I know of to serve and view these files is Python's SimpleHTTPServer. To use it, navigate to the directory that you'd like to serve, and type:

python -m SimpleHTTPServer

Then open http://localhost:8000 in any browser.

If you're using Python 3, where SimpleHTTPServer was merged into the http.server module, this won't work. Try python3 -m http.server instead.

Mac OS X has Python by default. If you're on Windows, you can download it here.

Helpful Links:

D3:

Highcharts

p5.js

General Data Visualization

People and Companies

  • Jer Thorp - Data artist. Worth knowing about.
  • Nicholas Felton - Data artist. Famous for his Annual Reports.
  • Stamen - Data visualization company, excellent blog, heavy focus on maps.
  • Fathom - Another interesting data visualization company.