A Deterministic Finite Automata GPU-based Engine
DFAGE

DFAGE is a prototype framework for running DFA-based matching on NVIDIA CUDA-enabled GPU cards.

  1. Disclaimer

This software has been developed mainly for research purposes. Although some validation has been performed, the authors do not guarantee that it fully respects automata and regular-expression semantics, nor the complete correctness of the results it produces.

  2. Requirements

To compile and run this framework you need:

  • CUDA >=1.3 enabled NVIDIA GPU card
  • CUDA Toolkit >=2.3
  • GNU C++ compiler >=4.1
  • libboost >= 1.40.0

  3. Overview

DFAGE is an optimized DFA engine running on GPUs, inspired by the implementations proposed by Becchi et al. (X. Yu and M. Becchi, "GPU acceleration of regular expression matching for large datasets: exploring the implementation space", Proceedings of the ACM International Conference on Computing Frontiers CF '13, p. 18, 2013) and Vasiliadis et al. (G. Vasiliadis, M. Polychronakis, S. Ioannidis, "Parallelization and characterization of pattern matching using GPUs", Proceedings of the IEEE International Symposium on Workload Characterization IISWC '11, pp. 216-225, 2011). The optimizations include:

  • Fast accepting-state recognition by encoding accepting states with negative IDs
  • Multi-byte input symbol fetches
  • DFA transition tables stored in GPU texture memory
  • Reporting the cycle and rule ID of each matched rule
  • Capability of simultaneously processing multiple DFAs and multiple packets
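The accepting-state optimization can be sketched as follows. This is a minimal CPU-side illustration of the encoding idea only; the names and the exact encoding are illustrative, not DFAGE's actual kernel code. Accepting next-states are stored with negated IDs, so a match is detected with a single sign test instead of a lookup in a separate accepting-state table.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Encoded state: >= 0 means an ordinary state; < 0 marks an accepting
// state (offset by 1 so that state 0 can also be marked accepting).
using EncodedState = int;

inline bool is_accepting(EncodedState s) { return s < 0; }
inline int  state_id(EncodedState s)     { return s < 0 ? -s - 1 : s; }
inline EncodedState mark_accepting(int id) { return -id - 1; }

// Walk a DFA over an input, recording (position, state) for each match.
// `table` is a flattened |states| x 256 transition table.
std::vector<std::pair<int, int>> run_dfa(const std::vector<EncodedState>& table,
                                         const unsigned char* input, int len) {
    std::vector<std::pair<int, int>> matches;
    int cur = 0;
    for (int i = 0; i < len; ++i) {
        EncodedState next = table[cur * 256 + input[i]];
        if (is_accepting(next))
            matches.push_back({i, state_id(next)});  // sign test, no extra lookup
        cur = state_id(next);
    }
    return matches;
}
```

On a GPU the same sign test avoids a second (divergent) memory access in the inner matching loop, which is why encoding the accepting set into the transition table itself pays off.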

DFAGE enables the execution of large and complex DFAs (e.g., automata or regular-expression rulesets) at very high throughput, and exploration of performance in a two-dimensional space: the number of DFAs and the number of packets. Different packets are mapped to different CUDA thread-blocks along the x-dimension of the CUDA grid. Within a thread-block, DFAGE assigns individual CUDA threads to process a particular DFA on the assigned packet. For large datasets where the number of DFAs exceeds the thread-block size, additional blocks along the y-dimension of the grid are used.
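The mapping above can be sketched as host-side arithmetic. This is a hypothetical illustration of the described layout; the variable names are not DFAGE's.

```cpp
#include <cassert>

// Packets map to thread-blocks along x; DFAs map to threads within a
// block, spilling onto extra blocks along y when the number of DFAs
// exceeds the block size.
struct Grid { int blocks_x, blocks_y, threads_per_block; };

Grid layout(int num_packets, int num_dfas, int threads_per_block) {
    Grid g;
    g.threads_per_block = threads_per_block;
    g.blocks_x = num_packets;  // one packet per block on the x-dimension
    // ceil(num_dfas / threads_per_block) blocks on the y-dimension
    g.blocks_y = (num_dfas + threads_per_block - 1) / threads_per_block;
    return g;
}

// Inside a kernel, each thread would recover its (packet, DFA) pair as:
//   packet = blockIdx.x;
//   dfa    = blockIdx.y * blockDim.x + threadIdx.x;  // guard: dfa < num_dfas
```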

DFAGE accepts four types of automata representations: (1) regular expressions in a plain-text file adhering to PCRE syntax; (2) NFA(s) represented in Becchi's text format; (3) DFA(s) represented in MNRL format (MNRL is an open-source, general-purpose, and extensible state-machine representation language; see https://github.com/kevinaangstadt/mnrl for more details); (4) DFA(s) represented in binary format (the DFA state-transition table and the mapping between accepting states and the corresponding rules are stored directly in binary files).

The framework is composed of three components: (i) the generator, (ii) MNRL, and (iii) the DFA engine.

(i) Generator: The generator's purpose is to translate a set of regular expressions (type 1) or NFA(s) in Becchi's text format (type 2) into binary files (type 4) executable by the DFA engine. The generator is an adapted version of the regular expression processor developed by M. Becchi; see M. Becchi's web page for more information: http://regex.wustl.edu/index.php/Main_Page

(ii) MNRL: The MNRL component (https://github.com/kevinaangstadt/mnrl) provides the C++ MNRL APIs, which are then used in the DFA engine to import the DFA in MNRL format.

(iii) DFA engine: The engine's purpose is to take transition graph(s) (in binary or MNRL format) and run them on an NVIDIA CUDA-enabled GPU card. The engine takes the input string as a binary (or text) file and runs the DFA(s) over a single stream or over multiple packets.

Note that, in multi-packet mode, the original input stream is evenly divided into multiple segments under the assumption that each segment is independent of the others. This assumption is suitable only for performance evaluation. The capability of handling inter-dependent segments will be included in a future version.
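For illustration, an even split of a `len`-byte stream into `p` segments could look like the following sketch. The remainder handling here (the first `len % p` segments get one extra byte) is an assumption for the example, not necessarily DFAGE's exact policy.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Split `len` bytes into `p` nearly equal (offset, length) segments.
// Segments are treated as independent, so a match spanning a boundary
// is lost -- which is why this mode suits performance evaluation only.
std::vector<std::pair<int, int>> split_packets(int len, int p) {
    std::vector<std::pair<int, int>> segs;
    int base = len / p, extra = len % p, off = 0;
    for (int i = 0; i < p; ++i) {
        int n = base + (i < extra ? 1 : 0);  // early segments absorb the remainder
        segs.push_back({off, n});
        off += n;
    }
    return segs;
}
```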

The transition graph(s) can be generated in two ways:

  • generated by the generator (included in this package);
  • generated by VASim, developed by Jack Wadden; see J. Wadden's web page for more information: https://github.com/jackwadden/VASim

The following figure shows the DFAGE architecture and the automata representations accepted by the framework.

![DFAGE architecture](arch.png)

3.1. Compiling the framework

From the root directory of the sources, simply launch:

$ make

The compiled binaries will be generated into the ./bin directory. You will obtain:

  • regex_memory and regex_memory_regen : the generator binaries
  • libmnrl.so : the MNRL library as a shared object
  • dfa_engine : the engine binary

3.2. Generating DFA binary representation from regular expressions

To use the generator to produce a DFA transition graph executable by the DFA engine, you must provide a file containing PCRE-compatible regular expressions, one per line.

You can use simple.regex as an example. To generate the DFA transition graph, launch:

$ cd bin

$ ./regex_memory -gendfa -f ./data/simple.regex -E simple.dumpdfa

Three output files are generated: simple.dumpdfa (the DFA in text format), simple.dumpdfabin (the DFA in binary format), and simple.dumpdfaaccstbin (the mapping between accepting states and the corresponding rules).

-NOTE- The binary files MUST be renamed in order to be correctly interpreted by the DFA engine, as follows:

    simple.dumpdfabin      --> 1_dfa.bin
    
    simple.dumpdfaaccstbin --> 1_accst.bin

    The number "1" denotes the first regex file. If there are multiple regex files, the binary files should be 1_dfa.bin, 1_accst.bin, 2_dfa.bin, 2_accst.bin, 3_dfa.bin, 3_accst.bin, etc. 

3.3. Generating DFA binary representation from NFA(s) represented in Becchi's text format

You can use simple.nfa as an example. To generate the DFA transition graph, launch:

$ cd bin

$ ./regex_memory_regen -gendfa -f ./data/simple.nfa -E simple

Three output files are generated: simple.dumpdfa (the DFA in text format), simple_dfa.bin (the DFA in binary format), and simple_accst.bin (the mapping between accepting states and the corresponding rules).

-NOTE- The binary files MUST be renamed in order to be correctly interpreted by the DFA engine, as follows:

    simple_dfa.bin   --> 1_dfa.bin

    simple_accst.bin --> 1_accst.bin

    The number "1" denotes the first regex file. If there are multiple regex files, the binary files should be 1_dfa.bin, 1_accst.bin, 2_dfa.bin, 2_accst.bin, 3_dfa.bin, 3_accst.bin, etc.

3.4. Generating DFAs from VASim

You can generate NFA text files, DFA binary files, or MNRL files using VASim with the ANML file as input. See https://github.com/jackwadden/VASim for more details.

$ cd VASim

$ ./vasim -n ../DFAGE/bin/data/simple.anml (VASim outputs automata in NFA text format readable by Becchi's tools)

$ ./vasim -Dn ../DFAGE/bin/data/simple.anml (VASim converts automata to DFA and outputs DFA binary plus DFA text files)

$ ./vasim -Dm ../DFAGE/bin/data/simple.anml (VASim converts automata to DFA and outputs DFA in MNRL format)

3.5. Running the DFA engine

If you want to run the DFA engine over an input file using the previously generated DFA transition graphs (e.g. simple), you can launch:

$ cd bin

$ ./dfa_engine -a ./data/simple -i ./data/simple.input -T 1 -g 1 -p 1 -N 3

$ ./dfa_engine -a ./data/simple -i ./data/simple.input -T 1 -g 1 -p 1 -N 3 -m 1

$ ./dfa_engine -a ./data/simpletwo -i ./data/simpletwo.input -T 1 -g 2 -p 1 -N 6 -m 1

$ ./dfa_engine -a ./data/simpletwo -i ./data/simpletwo.input -T 2 -g 2 -p 1 -N 6 -m 1 -O 1

where

    -a <file> :   automata name (must NOT contain the file extension)

    -i <file> :   input file (with file extension)

    -T <n>    :   number of threads per block (overridden if the block-size tuning feature is used)

    -g <n>    :   number of graphs (or DFAs) to be executed (default: 1)

    -p <n>    :   number of parallel packets to be examined (default: 1)

    -N <n>    :   total number of rules (subgraphs)

    -O <n>    :   0 - block size tuning not enabled; 1 - block size tuned (optional, default: 0 - not tuned)

    -m <n>    :   0 - automata in binary format; 1 - automata in MNRL format (optional, default: 0 - binary)

NOTE: The DFA transition graphs MUST be stored in folders named with the convention <automaton>_<number of subgraphs>, for example:

        ./data/simple_1/    -- the original graph is grouped into one subgraph; this folder contains 3 files: "1_dfa.mnrl", "1_dfa.bin", "1_accst.bin"

        ./data/simpletwo_2/ -- the original graph is grouped into two subgraphs; this folder contains 6 files: "1_dfa.mnrl", "1_dfa.bin", "1_accst.bin", "2_dfa.mnrl", "2_dfa.bin", "2_accst.bin"

As output, the engine returns the cycle and rule identifier of each rule (subgraph) that matched each packet.

You can run the engine with the -? or -h option to display help for all available options.

Author

Vinh Dang

University of Virginia - USA

vqd8a@virginia.edu

References

  • DFA algorithms:

X. Yu and M. Becchi, "GPU acceleration of regular expression matching for large datasets: exploring the implementation space", Proceedings of the ACM International Conference on Computing Frontiers CF '13, pp. 18, 2013

G. Vasiliadis, M. Polychronakis, S. Ioannidis, "Parallelization and characterization of pattern matching using GPUs", Proceedings of the IEEE International Symposium on Workload Characterization IISWC '11, pp. 216-225, 2011

  • M. Becchi regular expression processor:

http://regex.wustl.edu/index.php/Main_Page

  • Optimized DFA algorithm:

J. Wadden, V. Dang, N. Brunelle, T. Tracy II, D. Guo, E. Sadredini, K. Wang, C. Bo, G. Robins, M. Stan, and K. Skadron, "ANMLZoo: A Benchmark Suite for Exploring Bottlenecks in Automata Processing Engines and Architectures," 2016 IEEE International Symposium on Workload Characterization (IISWC), Providence, RI, 2016, pp. 1-12.

http://ieeexplore.ieee.org/document/7581271/

Acknowledgements

This work was partly funded by grants from the NSF (EF-1124931, CCF-1629450), Achievement Rewards for College Scientists (ARCS) Foundation, Micron Technologies, and C-FAR, one of six centers of STARnet, a Semiconductor Research Corporation program sponsored by MARCO and DARPA.