Dayder

Dayder is a web app for finding spurious correlations in thousands of data sets. It was made by Tristan Hume and Marc Mailhot, originally for TerribleHack III, where it worked with just 3000+ different causes of death. Since the hackathon we've kept improving it, expanding and optimizing it to work with over 390,000 data sets at a time.

Having been originally created for a "Stupid shit no-one needs and terrible ideas" hackathon, Dayder makes no pretense of being useful. It is mostly a tech demo for making a super fast web app despite dealing with large quantities of data. Dayder uses a custom binary format, custom JS Canvas and DOM rendering, heavily optimized server-side Rust code, caching, and tuned data layouts for excellent performance. It can filter through 390,000+ data sets as you type, with sub-50ms server response times and instantaneous rendering of thousands of graphs in the browser. It can also find correlations among all 390,000+ data sets in under 2 seconds.
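As a rough illustration of the correlation pass (the repo parallelizes it with Rayon), here is a minimal sketch that scores one series against every other with Pearson correlation in parallel. The `TimeSeries` type, function names, and scoring details are assumptions for illustration, not Dayder's actual code.

```rust
// Minimal sketch of parallel correlation finding with Rayon.
// `TimeSeries` and the scoring details are illustrative assumptions,
// not Dayder's actual types or algorithm.
use rayon::prelude::*;

struct TimeSeries {
    name: String,
    values: Vec<f64>, // assumed aligned to the same time points
}

/// Pearson correlation coefficient of two equal-length series.
fn pearson(a: &[f64], b: &[f64]) -> f64 {
    let n = a.len() as f64;
    let mean_a = a.iter().sum::<f64>() / n;
    let mean_b = b.iter().sum::<f64>() / n;
    let (mut cov, mut var_a, mut var_b) = (0.0, 0.0, 0.0);
    for (&x, &y) in a.iter().zip(b) {
        cov += (x - mean_a) * (y - mean_b);
        var_a += (x - mean_a).powi(2);
        var_b += (y - mean_b).powi(2);
    }
    cov / (var_a.sqrt() * var_b.sqrt())
}

/// Score every series against a query series in parallel and return
/// the `top` most strongly correlated ones as (index, r) pairs.
fn most_correlated(query: &TimeSeries, all: &[TimeSeries], top: usize) -> Vec<(usize, f64)> {
    let mut scored: Vec<(usize, f64)> = all
        .par_iter() // Rayon splits this work across worker threads
        .enumerate()
        .map(|(i, s)| (i, pearson(&query.values, &s.values)))
        .collect();
    // Sort by absolute correlation, strongest first (NaN-free input assumed).
    scored.sort_by(|a, b| b.1.abs().partial_cmp(&a.1.abs()).unwrap());
    scored.truncate(top);
    scored
}
```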

[Screenshot of Dayder]

Origins

Dayder was inspired by the Spurious Correlations book and website. We thought of making a modern, fast, reactive site for finding spurious correlations in lots of time series data sets. We started with the data set of various causes of death over time because it was the funniest, in a morbid sort of way.

After the hackathon we added Canadian GDP data from various sectors, and the full set of over 300,000 time series from FRED. This required a whole bunch more improvements to speed and usability.

How we built it

We designed a custom binary time series data format (btsf!) that allows bandwidth- and memory-efficient processing of large amounts of time series data.
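The exact layout is documented in format.md. As a rough illustration of the idea only, here is a hypothetical reader for a length-prefixed record of (time, value) pairs using the byteorder crate; the field order and widths shown are assumptions, not btsf's actual layout.

```rust
// Hypothetical length-prefixed binary record reader; NOT the real btsf
// layout, which is specified in format.md.
use std::io::{self, Read};
use byteorder::{LittleEndian, ReadBytesExt};

struct Record {
    name: String,
    points: Vec<(i32, f32)>, // assumed (year, value) pairs
}

fn read_record<R: Read>(r: &mut R) -> io::Result<Record> {
    // Length-prefixed UTF-8 name.
    let name_len = r.read_u32::<LittleEndian>()? as usize;
    let mut name_bytes = vec![0u8; name_len];
    r.read_exact(&mut name_bytes)?;
    let name = String::from_utf8_lossy(&name_bytes).into_owned();

    // Point count followed by fixed-width (year, value) pairs, so the
    // reader can preallocate and never over-read.
    let n_points = r.read_u32::<LittleEndian>()? as usize;
    let mut points = Vec::with_capacity(n_points);
    for _ in 0..n_points {
        let year = r.read_i32::<LittleEndian>()?;
        let value = r.read_f32::<LittleEndian>()?;
        points.push((year, value));
    }
    Ok(Record { name, points })
}
```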

This format is parsed using typed arrays in JS on the client and Rust on the server. All the DOM and canvas rendering is efficient custom code, so thousands of graphs can be rendered in milliseconds.

Later we optimized it further. Filtering now happens on the server side as you type, so the client never has to download 400MB of data. To keep that fast, the filtering code uses an efficient data layout, lazy processing, caching, and incremental sorting to maintain sub-50ms response times on every request.
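One part of the incremental approach can be sketched like this: when a new query extends a previous one, its matches are a subset of the cached previous matches, so only those need to be re-checked. The cache structure and substring matching below are illustrative assumptions, not the actual implementation.

```rust
// Sketch of as-you-type filtering that narrows a cached result set when
// the new query extends a previously seen one. Names and structure are
// hypothetical, not Dayder's actual code.
use std::collections::HashMap;

struct FilterCache {
    // Maps a previously seen query to the indices that matched it.
    cache: HashMap<String, Vec<usize>>,
}

impl FilterCache {
    fn new() -> Self {
        FilterCache { cache: HashMap::new() }
    }

    /// Return indices of `names` matching `query`, starting from the
    /// cached matches of the longest previously seen prefix of `query`.
    fn filter(&mut self, names: &[String], query: &str) -> Vec<usize> {
        let query = query.to_lowercase();

        // Extending a query can only shrink the match set, so a cached
        // prefix result is a safe starting candidate list.
        let candidates: Vec<usize> = (1..query.len())
            .rev()
            .filter_map(|i| query.get(..i))
            .find_map(|prefix| self.cache.get(prefix).cloned())
            .unwrap_or_else(|| (0..names.len()).collect());

        let matches: Vec<usize> = candidates
            .into_iter()
            .filter(|&i| names[i].to_lowercase().contains(query.as_str()))
            .collect();

        self.cache.insert(query, matches.clone());
        matches
    }
}
```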

License

This project is released under the MIT license; see the LICENSE file for details.
