The Datasaurus Dozen

Thomas Lin Pedersen edited this page Sep 4, 2018 · 4 revisions

submitted by Tom Westlake

The Datasaurus Dozen is a playful twist on Anscombe's Quartet: a group of twelve datasets with nigh-identical summary statistics that nevertheless prove distinctly dissimilar when plotted on a graph.

The animation below, utilising the datasauRus, ggplot2 and gganimate packages, highlights the danger of relying solely on summary statistics without considering the whole distribution.

The Code

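library(datasauRus)
library(ggplot2)
library(gganimate)

# Plot every dataset on the same axes and animate between them
ggplot(datasaurus_dozen, aes(x = x, y = y)) +
  geom_point() +
  theme_minimal() +
  # Treat each dataset as a state: relative transition length 3, pause length 1
  transition_states(dataset, transition_length = 3, state_length = 1) +
  ease_aes('cubic-in-out')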

[Animation: datasaurus]
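
The claim that the summary statistics are nigh-identical can be checked directly. The following is a minimal sketch rather than part of the original submission; it assumes the datasaurus_dozen data frame has the columns dataset, x and y (as the plotting code suggests) and uses only base R. The name summary_stats is introduced here for illustration.

```r
library(datasauRus)

# Split the data into one data frame per dataset
groups <- split(datasaurus_dozen, datasaurus_dozen$dataset)

# Compute the same five summaries for every dataset and stack the results
summary_stats <- do.call(rbind, lapply(groups, function(d) {
  data.frame(
    mean_x = mean(d$x),
    mean_y = mean(d$y),
    sd_x   = sd(d$x),
    sd_y   = sd(d$y),
    cor_xy = cor(d$x, d$y)
  )
}))

# One row per dataset; rounding makes the similarity easy to see
print(round(summary_stats, 2))
```

If the datasets behave as described, the rows of the printed table should be close to one another even though the plots differ sharply.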

Source

Reanimating the Datasaurus
