GitHub - vibhaa/hashpipe: Heavy hitter detection algorithm that is entirely in the dataplane

Project aimed at identifying the flows that contribute to majority of packets between two points on a given network. The final paper can be found here.

All the evaluations used data from the CAIDA Traces from https://data.caida.org/datasets/passive-2016/equinix-chicago/20160218-130000.UTC/ and datacenter traces from http://pages.cs.wisc.edu/~tbenson/IMC10_Data.html. These were converted to .csv files with the srcip, dstip, protocol, srcport and dstport using tshark. They were then split them into chunks of 10M packets each after which TopKIdentifierFlowId was run on them. TopKIdentifierFlowId also needs a file with the actual flow size, which can be obtained by running FindFlowSize. This groups the packets from the csv by 5-tuple giving you a 5-tuple flowid and the associated count for a flow.

The output from TopKIdentifierFlowId itself involves rows of data one corresponding to each experiment (a run of HashPipe on that particular trace with those settings) and prints out the tablesize (number of flowid, counter pairs), K value, D (number of stages), false negative rate, false positive rate, weighted versions of the same (by size fo the flows), number of duplicates in the table, reported fraction of heavy hitters, number of reported heavy hitters, average deviation from the actual size across flows and so on. The header identifies the precise metrics reported. The variable "FlowSizes" has the map between flowid and their actual sizes.

runTopKIdentificationTrials function runs the experiments for you and runs only once across all the K values we want (for HashPipe and all of the incremental steps before that). To run this, you want to use the SummaryStructureType.RollingMinSingleLookup. So, you would want to run TopKIdentifierFlowId with the arguments flowSize file (actual flowsize per flow), the tracefile, "runTrial" and "Single" as its arguments to run HashPipe as per the algorithm in the paper.

runTrialsPerK on the other hand, runs it once for every K and this is relevant for Sample and Hold and the Count-Min Sketch. So, you would want to run TopKIdentifierFlowId with the arguments flowSize file (actual flowsize per flow), the tracefile, "PerThreshold" and "CM" or "SampleAndHold" as its arguments to run CM sketch or Sample And Hold respectively.

Name		Name	Last commit message	Last commit date
Latest commit History 83 Commits
Analysis		Analysis
Caida_reader		Caida_reader
Paper Figures		Paper Figures
Paper/Draft		Paper/Draft
code		code
p4v1_1		p4v1_1
paper_final		paper_final
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Analysis

Analysis

Caida_reader

Caida_reader

Paper Figures

Paper Figures

Paper/Draft

Paper/Draft

code

code

p4v1_1

p4v1_1

paper_final

paper_final

.gitignore

.gitignore

LICENSE

LICENSE

README.md

README.md

Repository files navigation

About

Releases

Packages

Contributors 2

Languages

License

vibhaa/hashpipe

Folders and files

Latest commit

History

Repository files navigation

About

Resources

License

Stars

Watchers

Forks

Languages