Materials for a workshop on hash-based sketches
data
datasketching
solutions
00-moments.ipynb
01-bloom-filters.ipynb
02-count-min-sketch.ipynb
03-hyperloglog.ipynb
04-minhash.ipynb
05-minhash-rec.ipynb
99a-data-generator.ipynb
Pipfile
README.md
bit-vector.ipynb


Data sketching and other magic tricks

This repository contains Jupyter notebooks for an interactive tutorial on hash-based probabilistic data structures. Here's how to run it:

The easy way

Use Binder. (We don't recommend this if you'll be running the tutorial over conference Wi-Fi, but it requires almost no setup and runs on a computer that only has a browser.)

The flexible way

Install the prerequisites

  1. Make sure Python 3.7 is installed, and install it if necessary
    • If you have a favorite package manager, use that
    • If not, python.org has binaries for many platforms
  2. Make sure Git is installed, and install it if necessary
    • If you have a favorite package manager, use that
    • If not, git-scm.com has binaries for many platforms (you won't need a GUI)
  3. Install pipenv
    • On a Mac, the easiest way is probably brew install pipenv
    • On a Fedora Linux machine, the easiest way is probably dnf install pipenv
    • On Windows, if you already have Python installed, the easiest way is probably to use pip: pip install pipenv
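If you want to confirm the prerequisites before continuing, a short Python script can do it. This is a minimal sketch that is not part of the repository: the function names python_matches and missing_tools are hypothetical, it assumes the Pipfile pins Python 3.7 exactly (relax the comparison if it accepts newer versions), and it only inspects the interpreter you run it with and the tools visible on your PATH.

```python
import shutil
import sys

# Assumption: the Pipfile pins Python 3.7, as the README's "Python 3.7" suggests.
REQUIRED_PYTHON = (3, 7)
# Command-line tools the setup steps use: git to clone, pipenv to install and run.
REQUIRED_TOOLS = ("git", "pipenv")


def python_matches(version_info=None, required=REQUIRED_PYTHON):
    """Return True if the (major, minor) version equals `required`.

    Defaults to the interpreter running this script.
    """
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) == tuple(required)


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names in `tools` that shutil.which cannot find on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    wanted = ".".join(str(part) for part in REQUIRED_PYTHON)
    if python_matches():
        print(f"OK: running Python {found}")
    else:
        print(f"Note: running Python {found}, but the tutorial expects {wanted}")
    for tool in missing_tools():
        print(f"Missing: {tool} was not found on PATH")
```

Save it under any name and run it with the interpreter you plan to use. A version mismatch here is not necessarily fatal: pipenv may still find a matching 3.7 interpreter elsewhere on your system.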

Install the notebooks and dependencies

  1. Clone this repository: git clone https://github.com/willb/data-sketching-and-other-magic-tricks/
  2. Change to this repository's directory: cd data-sketching-and-other-magic-tricks
  3. Install the dependencies: pipenv install
  4. Run the notebooks: pipenv run jupyter notebook
