Grizzly demo workloads

This directory contains example workloads that use Grizzly and a Weld-ified NumPy. For convenience, this README assumes that WELD_HOME is set to the root weld/ directory:

$ export WELD_HOME=/path/to/weld/root/directory

Acquire Data for Demo Workloads

To get data for data_cleaning and other related workloads, run:

$ cd $WELD_HOME/examples/python/grizzly
$ mkdir -p data
$ wget https://raw.githubusercontent.com/jvns/pandas-cookbook/master/data/311-service-requests.csv
$ mv 311-service-requests.csv data/311-service-requests-raw.csv
$ scripts/prune-csv -i data/311-service-requests-raw.csv -l "Incident Zip"
$ scripts/replicate-csv -i data/311-service-requests-raw-pruned.csv -o data/311-service-requests.csv -r 30

To get data for get_population_stats and other related workloads, run:

$ cd $WELD_HOME/examples/python/grizzly
$ mkdir -p data
$ wget https://raw.githubusercontent.com/grammakov/USA-cities-and-states/master/us_cities_states_counties.csv
$ mv us_cities_states_counties.csv data/us_cities_states_counties_raw.csv
$ scripts/transform-population-csv -i data/us_cities_states_counties_raw.csv -o data/us_cities_states_counties.csv -r 30
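
Before running any workload, it can be useful to confirm that both data-preparation sequences above produced their final output files. The following is a minimal sketch, not part of the repository: the helper name `missing_data_files` is hypothetical, and the two paths are the output file names used in the commands above, taken relative to $WELD_HOME/examples/python/grizzly.

```python
from pathlib import Path

# Final output files named in the data-preparation commands above,
# relative to $WELD_HOME/examples/python/grizzly.
EXPECTED_DATA_FILES = [
    "data/311-service-requests.csv",
    "data/us_cities_states_counties.csv",
]


def missing_data_files(workload_dir):
    """Return the expected data files that do not exist under workload_dir.

    Hypothetical helper for illustration; it only checks for existence,
    not for file contents.
    """
    base = Path(workload_dir)
    return [name for name in EXPECTED_DATA_FILES if not (base / name).is_file()]
```

Calling `missing_data_files` with the path to the grizzly examples directory returns an empty list once both sets of commands have completed.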

Running the Demo Workloads

The demo workloads are in $WELD_HOME/examples/python/grizzly.

Each workload has a corresponding Grizzly version. For example, the native Pandas/NumPy data cleaning workload is data_cleaning.py, while the corresponding Grizzly workload is data_cleaning_grizzly.py.
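
The naming convention described above (`<name>.py` for the native version, `<name>_grizzly.py` for the Grizzly version) can be expressed as a small helper. This is only an illustrative sketch; the function name `grizzly_counterpart` is hypothetical and does not exist in the repository.

```python
from pathlib import Path


def grizzly_counterpart(script_name):
    """Map a native workload script name to its Grizzly version.

    Assumes the `<name>.py` -> `<name>_grizzly.py` convention described
    above. Names that already end in `_grizzly` are returned unchanged.
    """
    path = Path(script_name)
    if path.stem.endswith("_grizzly"):
        return str(path)
    return str(path.with_name(path.stem + "_grizzly" + path.suffix))
```

For instance, `grizzly_counterpart("data_cleaning.py")` gives `data_cleaning_grizzly.py`.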

For example, to compare the performance of the native Pandas data cleaning workload with the Weld-ified (Grizzly) version, run:

$ python data_cleaning.py                                         # Native
$ WELD_NUM_THREADS=<num_threads> python data_cleaning_grizzly.py  # Grizzly

By default, data_cleaning_grizzly.py runs with 1 thread.

Both scripts print timing information.
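
To repeat the comparison across several thread counts, the two commands above can be wrapped in a small driver. The sketch below is an assumption-laden illustration, not part of the repository: `time_command` and `compare` are hypothetical names, the thread counts `(1, 2, 4)` are arbitrary, and the measured wall-clock time includes interpreter startup and data loading, so it will differ from the timings the scripts print themselves. The only Weld-specific detail used is the WELD_NUM_THREADS environment variable shown above.

```python
import os
import subprocess
import sys
import time


def time_command(cmd, extra_env=None):
    """Run cmd to completion and return its wall-clock time in seconds.

    extra_env is merged into a copy of the current environment.
    Raises subprocess.CalledProcessError if the command fails.
    """
    env = dict(os.environ)
    env.update(extra_env or {})
    start = time.perf_counter()
    subprocess.run(cmd, env=env, check=True)
    return time.perf_counter() - start


def compare(native_script, grizzly_script, thread_counts=(1, 2, 4)):
    """Time the native script once, then the Grizzly script per thread count.

    thread_counts is an illustrative default, not a recommendation.
    """
    results = {"native": time_command([sys.executable, native_script])}
    for n in thread_counts:
        results[f"grizzly x{n}"] = time_command(
            [sys.executable, grizzly_script],
            {"WELD_NUM_THREADS": str(n)},
        )
    return results
```

Run from $WELD_HOME/examples/python/grizzly, `compare("data_cleaning.py", "data_cleaning_grizzly.py")` returns a dictionary of elapsed seconds keyed by configuration.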
