iNFAnt2 is a prototype framework for running NFA-based matching on NVIDIA CUDA-enabled GPU cards.
This software has been developed mainly for research purposes. It is intended as a prototype: even though some validation has been performed, the authors do not guarantee that it fully respects regular-expression semantics or that the results it produces are completely correct.
To compile and run this framework you need:
- An NVIDIA GPU card with CUDA compute capability >= 1.3
- CUDA Toolkit >= 2.3
- GNU C++ compiler >= 4.1
- libboost >= 1.40.0
- libpcap-dev >= 0.8
iNFAnt2 is an optimized version of iNFAnt, which was developed by N. Cascarano et al. (N. Cascarano, P. Rolando, F. Risso, and R. Sisto, "iNFAnt: NFA pattern matching on GPGPU devices," ACM SIGCOMM Computer Communication Review, vol. 40, no. 5, pp. 20–26, 2010.) Optimizations include:
- Fast accept-state recognition by encoding accept states with negative IDs
- Multi-byte input symbol fetches
- NFA transition tables stored in GPU texture memory
- Reporting the cycle and rule ID of each matched rule
- Capability of simultaneously processing multiple NFAs
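The accept-state optimization above can be illustrated with a toy CPU-side sketch. The data layout and names here are illustrative only, not the actual iNFAnt2 structures (the real engine keeps a symbol-major transition table in GPU texture memory): accepting states carry a negative ID, so recognizing a match during traversal becomes a simple sign test instead of a lookup in a separate accept-state set.

```python
# Toy sketch of the negative-ID accept-state encoding (conceptual only).
def run_nfa(transitions, data):
    """Run a toy NFA; transitions maps (state, byte) -> list of next states.
    A negative next-state ID marks an accepting state. Returns a list of
    (offset, rule_id) matches, where rule_id = -state for accepting states."""
    current = {0}                      # start state
    matches = []
    for offset, byte in enumerate(data):
        nxt = set()
        for state in current:
            for dst in transitions.get((abs(state), byte), []):
                if dst < 0:            # sign test: accepting state reached
                    matches.append((offset, -dst))
                nxt.add(dst)
        current = nxt or {0}           # toy fallback: restart from the start state
    return matches
```

For example, a toy NFA for the pattern "ab" can be written as `{(0, ord('a')): [1], (1, ord('b')): [-2]}`: reaching state -2 signals that rule 2 matched.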
iNFAnt2 is a framework that enables execution of very large and complex NFAs (e.g., regular expression rulesets) with a low memory footprint and at very high throughput. The regular expressions are provided to the framework as a plain text file following PCRE syntax. The framework can "compile" the regular expression set into a transition graph file, which is then used as input to the iNFAnt2 engine.
The framework is composed of two components: (i) the Generator and (ii) the Engine.
- Generator: The generator's purpose is to translate a set of regular expressions into a transition graph executable by the engine. The generator is an adapted version of the regular expression processor developed by M. Becchi.
Please refer to M. Becchi's web page for more information: http://regex.wustl.edu/index.php/Main_Page
- Engine: The engine's purpose is to take one or more transition graphs (in text format) and run them on an NVIDIA CUDA-enabled GPU card. The engine can take its input as a binary (or text) file and run the NFA(s) over a single stream or over multiple packets.
Note that in multi-packet mode the original input stream is evenly divided into multiple segments under the assumption that each segment is independent of the others. This assumption is suitable only for performance evaluation purposes; the capability of correctly handling dependent segments will be included in a future version.
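The even splitting described above can be sketched as follows. This is a simplification for illustration; the function name and the choice to give the remainder to the last segment are assumptions, not the engine's exact scheme:

```python
def split_stream(data, num_packets):
    """Evenly divide an input stream into num_packets segments, as the
    multi-packet mode does; here the last segment absorbs any remainder.
    Each segment is then scanned independently, so a match that spans a
    segment boundary is missed -- hence this mode suits performance
    evaluation rather than exact matching."""
    seg_len = len(data) // num_packets
    segments = [data[i * seg_len:(i + 1) * seg_len]
                for i in range(num_packets - 1)]
    segments.append(data[(num_packets - 1) * seg_len:])
    return segments
```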
The transition graph(s) can be generated in two ways:
- generated by the generator (included in this package);
- generated by VASim, developed by Jack Wadden. Please refer to J. Wadden's web page for more information: https://github.com/jackwadden/VASim
Compiling the framework
From the root directory of the sources, simply launch:
The compiled binaries will be generated into the bin/ directory. You will obtain:
- regex_memory : the generator binary
- nfa_engine : the engine binary
Generating Transition Graph from regular expressions
To use the generator to produce a transition graph runnable by the iNFAnt2 engine, you must provide a file containing PCRE-compatible regular expressions, one per line.
You can use big_boy.txt as an example. To generate the transition graph, launch:
$ cd bin
$ ./regex_memory -nfa -f ../data/big_boy.txt -E ../data/big_boy.nfa
The big_boy.nfa file will contain the transition graph.
-NOTE- the transition graph file name MUST end with .nfa extension in order to be correctly interpreted by the engine
Running the iNFAnt2 engine
If you want to run the engine over an input file using previously generated transition graphs (e.g., big_boy, Sample_NFA), you can launch:
$ cd bin
$ ./nfa_engine -a ../data/Sample_NFA/Sample_NFA -i ../data/random_stream_1MB.input -T 1024 -g 1 -p 1 -N 3072 -O 0
$ ./nfa_engine -a ../data/Sample_NFA/Sample_NFA -i ../data/random_stream_1MB.input -g 1 -p 1 -N 3072 -O 1
$ ./nfa_engine -a ../data/Sample_NFA/Sample_NFA -i ../data/random_stream_1MB.input -g 2 -p 1 -N 3072 -O 1
$ ./nfa_engine -a ../data/Sample_NFA/Sample_NFA -i ../data/random_stream_1MB.input -g 2 -p 4 -N 3072 -O 1
$ ./nfa_engine -a ../data/big_boy/big_boy -i ../data/random_stream_1MB.input -g 1 -p 1 -N 99 -O 1
-a : the transition graph prefix with full directory path (must NOT contain the file extension)
-i : binary input file to be processed (with file extension)
-T : number of threads per block (this will be overwritten if block size tuning feature is being used)
-g : number of graphs (or NFAs) which are combined from the original subgraphs (i.e. number of thread blocks in y-dimension of CUDA grid) to be executed (default: 1)
-p : number of parallel packets to be examined (number of thread blocks in x-dimension of the CUDA grid) (default: 1)
-N : total number of rules (subgraphs)
-O : 0 - block size tuning not enabled; 1 - block size tuned (optional, default: 0 - not tuning)
Note: The transition graphs must be stored in folders following this naming convention, e.g. for Sample_NFA:
/data/Sample_NFA/Sample_NFA_1/ -- the original graph grouped into one subgraph; this folder contains one file, "1.nfa"
/data/Sample_NFA/Sample_NFA_2/ -- the original graph grouped into two subgraphs; this folder contains two files, "1.nfa" and "2.nfa"
As output, the engine reports, for each packet, the cycle and the rule identifier of every rule (subgraph) that matched.
Note: The regular expressions are identified by their line number in the file containing the regular expression ruleset.
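Since rule identifiers are simply line numbers in the ruleset file, mapping a reported ID back to the original regular expression can be sketched as below (the helper name is illustrative, not part of the framework):

```python
def rule_for_id(ruleset_path, rule_id):
    """Return the regular expression that a reported rule ID refers to:
    rules are identified by their (1-based) line number in the ruleset file."""
    with open(ruleset_path) as f:
        for lineno, line in enumerate(f, start=1):
            if lineno == rule_id:
                return line.rstrip("\n")
    raise ValueError("rule id %d out of range" % rule_id)
```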
You can run the engine with the -? option to display help for all the available options.
University of Virginia - USA
- iNFAnt algorithm:
N. Cascarano, P. Rolando, F. Risso, and R. Sisto, "iNFAnt: NFA pattern matching on GPGPU devices," ACM SIGCOMM Computer Communication Review, vol. 40, no. 5, pp. 20–26, 2010.
- M. Becchi's regular expression processor: http://regex.wustl.edu/index.php/Main_Page
- iNFAnt2 algorithm:
J. Wadden, V. Dang, N. Brunelle, T. Tracy II, D. Guo, E. Sadredini, K. Wang, C. Bo, G. Robins, M. Stan, and K. Skadron, "ANMLZoo: A Benchmark Suite for Exploring Bottlenecks in Automata Processing Engines and Architectures," 2016 IEEE International Symposium on Workload Characterization (IISWC), Providence, RI, 2016, pp. 1-12.
This work was partly funded by grants from the NSF (EF-1124931, CCF-1629450), Achievement Rewards for College Scientists (ARCS) Foundation, Micron Technologies, and C-FAR, one of six centers of STARnet, a Semiconductor Research Corporation program sponsored by MARCO and DARPA.