Yangjun Wang edited this page Dec 23, 2015 · 12 revisions

Welcome to the RealtimeStreamBenchmark wiki!

Workloads

WordCount and Grouping WordCount, each run on uniform data and on skewed data.
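To make "uniform" versus "skewed" input concrete, here is a minimal sketch of an input generator plus a plain WordCount. It assumes a Zipf-like skew (word at rank r drawn with weight 1 / r^s); the vocabulary names (`word0`, `word1`, ...), the exponent `s`, and all function names are hypothetical and not taken from the benchmark. The Grouping WordCount variant is not modeled here.

```python
import random
from collections import Counter


def make_vocabulary(size):
    """Hypothetical vocabulary: word0, word1, ..., word{size-1}."""
    return [f"word{i}" for i in range(size)]


def uniform_words(vocab, n, rng):
    """Uniform data: every word is equally likely."""
    return [rng.choice(vocab) for _ in range(n)]


def skewed_words(vocab, n, rng, s=1.0):
    """Skewed data (assumed Zipf-like): the word at rank r has weight 1 / r**s."""
    weights = [1.0 / (rank ** s) for rank in range(1, len(vocab) + 1)]
    return rng.choices(vocab, weights=weights, k=n)


def word_count(words):
    """Plain WordCount: total occurrences per word."""
    return Counter(words)


if __name__ == "__main__":
    rng = random.Random(42)
    vocab = make_vocabulary(1000)
    print("uniform top word:", word_count(uniform_words(vocab, 100_000, rng)).most_common(1))
    print("skewed top word:", word_count(skewed_words(vocab, 100_000, rng)).most_common(1))
```

With skewed input a handful of words dominate, so whichever task or partition owns those keys does most of the work; that is what the skewed-data measurements below are meant to expose.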

WordCount

Skewed data: measure the throughput of reading from each Kafka partition,
and the throughput of each node.
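A minimal sketch of why per-partition throughput differs under skew: messages are keyed by word, keys are mapped to partitions, and throughput is messages divided by elapsed time. The CRC32-mod-N partitioner is a stand-in chosen for determinism, not Kafka's default partitioner; the partition count, the 10-second interval, and all function names are hypothetical.

```python
import random
import zlib
from collections import Counter


def partition_for(key, num_partitions):
    """Stand-in key partitioner (CRC32 mod N); NOT Kafka's default partitioner."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def per_partition_counts(keys, num_partitions):
    """Number of messages that land in each partition."""
    counts = Counter(partition_for(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


def throughput(counts, elapsed_seconds):
    """Messages per second for each partition (or node) over the same interval."""
    return [c / elapsed_seconds for c in counts]


if __name__ == "__main__":
    rng = random.Random(7)
    vocab = [f"word{i}" for i in range(1000)]
    weights = [1.0 / r for r in range(1, len(vocab) + 1)]  # assumed Zipf-like skew
    keys = rng.choices(vocab, weights=weights, k=100_000)
    counts = per_partition_counts(keys, num_partitions=8)
    # Hypothetical timing: pretend reading this batch took 10 seconds.
    for p, tput in enumerate(throughput(counts, elapsed_seconds=10.0)):
        print(f"partition {p}: {tput:.0f} msg/s")
```

The same `throughput` calculation applies per node if the counts are taken from each worker instead of each partition.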

With Storm acking enabled, shortening the chain of bolts does not help much.

Advertisement click

Data generator: advertisements are generated in the main threads, and the corresponding clicks wait in sub-threads before being emitted.
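A minimal sketch of that threading pattern, assuming each ad is clicked with some probability after a random delay: the main thread emits ads, and each clicked ad gets a `threading.Timer` sub-thread that waits and then emits the click into a shared thread-safe queue. The click probability, the delay range, the event tuple layout, and the function names are all hypothetical.

```python
import queue
import random
import threading
import time


def _emit_click(ad_id, out):
    """Runs in a sub-thread once the timer fires: emit the click event."""
    out.put(("click", ad_id, time.monotonic()))


def generate(num_ads, click_probability, max_delay_s, out, rng):
    """Main thread emits ads; each clicked ad gets a timer sub-thread for its click."""
    timers = []
    for ad_id in range(num_ads):
        out.put(("adv", ad_id, time.monotonic()))
        if rng.random() < click_probability:
            delay = rng.uniform(0, max_delay_s)
            t = threading.Timer(delay, _emit_click, args=(ad_id, out))
            t.start()
            timers.append(t)
    for t in timers:  # wait until every pending click has been emitted
        t.join()


if __name__ == "__main__":
    events = queue.Queue()  # queue.Queue is safe to share between threads
    generate(num_ads=10, click_probability=0.5, max_delay_s=0.2,
             out=events, rng=random.Random(3))
    while not events.empty():
        print(events.get())
```

One thread per pending click keeps the sketch short; a generator producing high rates would more likely use a single scheduler thread with a delay queue.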

1.1 How to do RPC
1.2 Split one stream
1.3 Merge two streams
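The split and merge steps can be illustrated independently of any engine. This is a plain-Python sketch of the semantics only, not the Storm, Flink, or Spark API: one mixed stream is split by event kind, and the two streams are merged by joining each click to its ad on the ad id, assuming events arrive in timestamp order. The `(kind, ad_id, timestamp)` layout and function names are hypothetical; RPC is not covered.

```python
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, int, float]  # hypothetical layout: (kind, ad_id, timestamp)


def split_stream(events: Iterable[Event]) -> Tuple[List[Event], List[Event]]:
    """Split one mixed stream into an ad stream and a click stream by event kind."""
    ads: List[Event] = []
    clicks: List[Event] = []
    for e in events:
        (ads if e[0] == "adv" else clicks).append(e)
    return ads, clicks


def merge_streams(ads: List[Event], clicks: List[Event]) -> List[Tuple[int, float]]:
    """Join clicks to their ads on ad_id; return (ad_id, click latency) pairs."""
    ad_time: Dict[int, float] = {}
    joined: List[Tuple[int, float]] = []
    # Replay both streams in timestamp order (ads first on ties, since sort is stable).
    for kind, ad_id, ts in sorted(ads + clicks, key=lambda e: e[2]):
        if kind == "adv":
            ad_time[ad_id] = ts
        elif ad_id in ad_time:  # clicks whose ad was never seen are dropped
            joined.append((ad_id, ts - ad_time[ad_id]))
    return joined
```

For example, splitting `[("adv", 1, 0.0), ("click", 1, 1.5)]` and merging the halves yields `[(1, 1.5)]`. A real streaming join would also bound `ad_time` with a window so state does not grow forever.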

Datasets

http://cnets.indiana.edu/groups/nan/webtraffic/click-dataset/

Wikipedia clickstream data (around 2.5 GB uncompressed): http://datahub.io/dataset/wikipedia-clickstream