- Concatenation :
ab
- Union :
a+b
- Kleene Star :
a*
- Parenthesis-based mixed operators :
(l*o)+(o*)
- Parallelised Code.
- No bugs discovered so far.
- A variety of test cases have been tried.
- If the input is a malformed regex string, the program may fail.
- While writing the parallel part of the code, there were issues with parallelising nested loops and threads. These were solved by using OpenMP's support for nested parallelism, which allows threads to spawn further threads.
- The code works properly but needs more refactoring, as there are many unused and redundant methods.
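The nested-threading fix mentioned above can be illustrated with a small sketch. The README does not name the OpenMP call that was used, so this assumes `omp_set_max_active_levels`; the function `count_matching_lines` and its inputs are hypothetical and not part of the project. The OpenMP calls are guarded so the code still compiles, and runs serially, when OpenMP is not enabled.

```cpp
#include <string>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: counts the lines that contain `needle`, where
// files[f] holds the lines of the f-th file.
long count_matching_lines(const std::vector<std::vector<std::string>>& files,
                          const std::string& needle) {
#ifdef _OPENMP
    // Assumption: allow two levels of active parallel regions, so the inner
    // loop may spawn its own team of threads inside the outer one.
    omp_set_max_active_levels(2);
#endif
    long total = 0;
    #pragma omp parallel for reduction(+ : total)
    for (long f = 0; f < static_cast<long>(files.size()); ++f) {
        long in_file = 0;
        #pragma omp parallel for reduction(+ : in_file)
        for (long i = 0; i < static_cast<long>(files[f].size()); ++i) {
            if (files[f][i].find(needle) != std::string::npos) ++in_file;
        }
        total += in_file;
    }
    return total;
}
```

With GCC or Clang, OpenMP is typically enabled with `-fopenmp`. Whether nesting actually speeds things up depends on the workload and the number of cores.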
- Preprocess the regular expression string into a readable regex string by inserting the missing operators, such as concatenation.
- Convert the preprocessed regex to postfix notation using the Shunting Yard algorithm.
- Evaluate the postfix string with an NFA class that builds the resulting NFAs using Thompson's construction.
- Convert the final NFA from the postfix evaluation to a DFA using subset construction.
- Create a node graph from the resulting DFA.
- Evaluate the text in the files by traversing the graph recursively, printing the matching values when a final state is reached.
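The first two steps above can be sketched as follows. This is a minimal illustration, not the project's code: the function names `add_concatenation` and `to_postfix` are hypothetical, and `.` is assumed as the explicit concatenation symbol (the README does not say which symbol the project uses), so a literal `.` in the input is not supported here.

```cpp
#include <cstddef>
#include <stack>
#include <string>

// Assumption: '.' stands for explicit concatenation.
constexpr char kConcat = '.';

// Step 1: insert kConcat between adjacent operands, e.g. "ab" -> "a.b".
std::string add_concatenation(const std::string& regex) {
    std::string out;
    for (std::size_t i = 0; i < regex.size(); ++i) {
        char c = regex[i];
        out += c;
        if (i + 1 == regex.size()) break;
        char next = regex[i + 1];
        bool left_ends_operand = (c != '(' && c != '+');
        bool right_starts_operand = (next != ')' && next != '+' && next != '*');
        if (left_ends_operand && right_starts_operand) out += kConcat;
    }
    return out;
}

// Star binds tightest, then concatenation, then union; 0 means "not an operator".
int precedence(char op) {
    switch (op) {
        case '*': return 3;
        case kConcat: return 2;
        case '+': return 1;
        default: return 0;
    }
}

// Step 2: Shunting Yard conversion from infix to postfix.
std::string to_postfix(const std::string& infix) {
    std::string out;
    std::stack<char> ops;
    for (char c : infix) {
        if (c == '(') {
            ops.push(c);
        } else if (c == ')') {
            while (!ops.empty() && ops.top() != '(') { out += ops.top(); ops.pop(); }
            if (!ops.empty()) ops.pop();  // discard the matching '('
        } else if (precedence(c) > 0) {
            // Pop operators that bind at least as tightly as the incoming one.
            while (!ops.empty() && precedence(ops.top()) >= precedence(c)) {
                out += ops.top();
                ops.pop();
            }
            ops.push(c);
        } else {
            out += c;  // literal symbol
        }
    }
    while (!ops.empty()) { out += ops.top(); ops.pop(); }
    return out;
}
```

For example, `add_concatenation("(l*o)+(o*)")` yields `(l*.o)+(o*)`, and passing that to `to_postfix` yields `l*o.o*+`.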
class NFA
: Contains methods to create an NFA and a DFA from any NFA.
struct transition
: Structure that represents a transition between the nodes of the automaton. It contains the starting and ending nodes and the symbol of the transition.
vector<transition>
: Vector containing the information needed to construct the node graph of the automaton.
vector< vector<trans> > dfa_node_graph
: The actual graph used for traversal.
struct matched_symbol
: Structure that stores the matched tokens and the positions of the matched tokens in the text being searched.
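To show how these structures might fit together, here is a minimal sketch under stated assumptions. The field names (`vertex_from`, `vertex_to`, `symbol`, `token`, `position`), the alias `node_graph`, and the helpers `build_node_graph`, `longest_match_end`, and `find_matches` are all hypothetical; the project's real definitions may differ.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical layout of the README's `transition` and `matched_symbol`.
struct transition {
    int vertex_from;
    int vertex_to;
    char symbol;
};

struct matched_symbol {
    std::string token;     // the matched text
    std::size_t position;  // index in the text where the match starts
};

using node_graph = std::vector<std::vector<transition>>;

// graph[s] holds every transition that leaves state s.
node_graph build_node_graph(const std::vector<transition>& transitions, int state_count) {
    node_graph graph(state_count);
    for (const transition& t : transitions) graph[t.vertex_from].push_back(t);
    return graph;
}

// Recursively follow the single edge labelled text[pos]; return the end index
// of the longest match that stops in a final state, or npos if there is none.
std::size_t longest_match_end(const node_graph& graph, const std::set<int>& finals,
                              const std::string& text, std::size_t pos, int state) {
    std::size_t best = finals.count(state) ? pos : std::string::npos;
    if (pos == text.size()) return best;
    for (const transition& t : graph[state]) {
        if (t.symbol == text[pos]) {
            std::size_t deeper = longest_match_end(graph, finals, text, pos + 1, t.vertex_to);
            if (deeper != std::string::npos) best = deeper;
            break;  // a DFA has at most one edge per symbol, so no backtracking
        }
    }
    return best;
}

// Try a match starting at every position of the text.
std::vector<matched_symbol> find_matches(const node_graph& graph, const std::set<int>& finals,
                                         int start_state, const std::string& text) {
    std::vector<matched_symbol> matches;
    for (std::size_t i = 0; i < text.size(); ++i) {
        std::size_t end = longest_match_end(graph, finals, text, i, start_state);
        if (end != std::string::npos && end > i) {
            matches.push_back({text.substr(i, end - i), i});
        }
    }
    return matches;
}
```

An adjacency list indexed by state keeps the lookup of outgoing edges cheap during traversal, which is presumably why the README distinguishes the flat transition vector from the node graph built from it.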
string changeRegexOperators()
: Regex preprocessing: adds the missing concatenation symbols so the regex can be converted to postfix.
string convertRegexToPostfix()
: Shunting Yard algorithm: converts the infix regex into postfix notation.
NFA postFixNFABuilder()
: Postfix evaluation with Thompson's construction: builds the final NFA from a postfix regular expression.
void convert_to_dfa()::NFA
: Subset construction: converts the NFA to a DFA. Uses `set<int> epsilon_closure ::NFA` and `set<int> move::NFA` to find the resulting DFA states.
vector<matched_symbol> traverse_dfa_graph()::NFA
: DFS with recursion: explores the DFA node graph recursively. The algorithm does not backtrack: because a DFA is deterministic, each input symbol selects at most one path, so there is nothing to backtrack to. String matching takes place here. A `vector<matched_symbol>` holds the tokens that were matched.
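The subset construction step, with its epsilon-closure and move helpers, can be sketched as below. This is an illustration under assumptions, not the project's implementation: `Edge`, `Dfa`, `kEpsilon`, and `subset_construction` are hypothetical names, and the README's `move` is called `move_states` here to avoid any clash with `std::move`.

```cpp
#include <map>
#include <queue>
#include <set>
#include <stack>
#include <utility>
#include <vector>

// Assumption: '\0' marks an epsilon transition.
const char kEpsilon = '\0';

struct Edge { int from; int to; char symbol; };

// All NFA states reachable from `states` using only epsilon transitions.
std::set<int> epsilon_closure(const std::vector<Edge>& nfa, const std::set<int>& states) {
    std::set<int> closure = states;
    std::stack<int> pending;
    for (int s : states) pending.push(s);
    while (!pending.empty()) {
        int s = pending.top();
        pending.pop();
        for (const Edge& e : nfa) {
            if (e.from == s && e.symbol == kEpsilon && closure.insert(e.to).second) {
                pending.push(e.to);
            }
        }
    }
    return closure;
}

// All NFA states reachable from `states` by consuming exactly one `symbol`.
std::set<int> move_states(const std::vector<Edge>& nfa, const std::set<int>& states, char symbol) {
    std::set<int> result;
    for (const Edge& e : nfa) {
        if (e.symbol == symbol && states.count(e.from)) result.insert(e.to);
    }
    return result;
}

struct Dfa {
    std::vector<std::set<int>> states;          // each DFA state is a set of NFA states
    std::map<std::pair<int, char>, int> delta;  // (DFA state, symbol) -> DFA state
};

Dfa subset_construction(const std::vector<Edge>& nfa, int nfa_start,
                        const std::set<char>& alphabet) {
    Dfa dfa;
    std::map<std::set<int>, int> index;  // NFA state set -> DFA state number
    std::queue<int> work;
    std::set<int> start = epsilon_closure(nfa, {nfa_start});
    index[start] = 0;
    dfa.states.push_back(start);
    work.push(0);
    while (!work.empty()) {
        int current = work.front();
        work.pop();
        for (char c : alphabet) {
            std::set<int> target = epsilon_closure(nfa, move_states(nfa, dfa.states[current], c));
            if (target.empty()) continue;  // the dead state is left implicit
            auto it = index.find(target);
            if (it == index.end()) {
                it = index.emplace(target, static_cast<int>(dfa.states.size())).first;
                dfa.states.push_back(target);
                work.push(it->second);
            }
            dfa.delta[{current, c}] = it->second;
        }
    }
    return dfa;
}
```

A DFA state is accepting when its set contains the NFA's accepting state. For an NFA that recognises `a*b`, this sketch produces three DFA states and four transitions.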
make
: Run `make` to create the binaries `search` and `psearch`.
search "the" input_file1.txt input_file2.txt input_file3.txt
: Either binary, `search` or `psearch`, can be used. The first argument after the binary name is the regular expression; the remaining arguments are the files to search. The supported operators are listed at the start of the README.
- C++ Best Practices
- Ways to Create String Arrays
- I/O with Files C++
- Array Vs Vectors In C++
- How Arrays Are Treated Within Functions
- Passing a Vector to Function in C++
- Using Iterators with Vectors
- Using Const Iterators When Iterating Through Constant Arrays
- Shunting Yard Algorithm Pseudo Code
- Using :: in C++
- Stacks in C++
- Queues in C++
- Printing Vectors in C++
- Reference in C++
- Pointers Vs References in C++
- KMP Search Algorithm
- Pointers vs. References in C++
- Building NFAs with Thompson Construction
- Building Regex Machine with NFAs
- Subset Construction For Converting NFA to DFA
- Regex Engine In Python For Thompson Construction
- Thompson's Construction With C++
- The Practice of Programming by Brian W. Kernighan
- Beautiful Code (O'Reilly)
- Subset Construction From NFA
- Nested Parallelism with OpenMP