The Datasaurus Dozen
submitted by Tom Westlake
The Datasaurus Dozen is a playful twist on Anscombe's Quartet: a group of twelve datasets with nigh-identical summary statistics that nonetheless look distinctly different when plotted on a graph.
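The "nigh-identical summary statistics" claim can be checked directly. The following is a minimal sketch, assuming the datasauRus package is installed and that `datasaurus_dozen` is a long-format data frame with `dataset`, `x` and `y` columns (the same columns the animation code uses). The names `by_set` and `stats` are illustrative only. It uses base R rather than any particular data-manipulation package.

```r
library(datasauRus)

# Split the long-format data into one data frame per dataset
by_set <- split(datasaurus_dozen, datasaurus_dozen$dataset)

# Compute the usual summary statistics for each dataset, one row per dataset
stats <- do.call(rbind, lapply(names(by_set), function(name) {
  d <- by_set[[name]]
  data.frame(
    dataset = name,
    mean_x  = mean(d$x),
    mean_y  = mean(d$y),
    sd_x    = sd(d$x),
    sd_y    = sd(d$y),
    cor_xy  = cor(d$x, d$y)
  )
}))

# Print with a few significant digits so the near-equality is easy to see
print(stats, digits = 3)
```

Every row should show almost the same means, standard deviations and correlation, even though the scatter plots differ wildly.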
The animation below, utilising the ggplot2 and gganimate packages, highlights the dangers of relying solely on summary statistics without considering the whole distribution.
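library(datasauRus)
library(ggplot2)
library(gganimate)

# One scatter plot, animated across the datasets in the dozen
ggplot(datasaurus_dozen, aes(x = x, y = y)) +
  geom_point() +
  theme_minimal() +
  transition_states(dataset, 3, 1) +  # transition length 3, state length 1
  ease_aes('cubic-in-out')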