Perl Incompatible Regular Expressions library


This is PIRE, the Perl Incompatible Regular Expressions library.

This library is aimed at checking a huge amount of text against
relatively many regular expressions. Roughly speaking, it can only
check whether a given text matches a certain regexp, but it does so
really fast (more than 400 MB/s on our hardware is common). Moreover,
multiple regexps can be combined together, making it possible to
check the text against approximately 10 regexps in a single pass (while
maintaining the same speed).
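
To illustrate how several patterns can be checked in one pass, here is a
minimal, self-contained sketch of a product automaton. It does not use Pire
and is not Pire's implementation: SubstringDfa, MakeDfa, Combine and
MatchBoth are hypothetical names introduced only for this example, and the
"patterns" are simple two-character substrings rather than real regexps.

```cpp
#include <cstddef>

// Hypothetical helper type for illustration; not part of Pire.
// State 0: nothing seen, 1: first char pending, 2: needle seen (sticky).
struct SubstringDfa {
	unsigned char next[3][256];	// next[state][byte]
};

// Builds a DFA for "text contains the two-character needle xy".
SubstringDfa MakeDfa(char x, char y)
{
	SubstringDfa d;
	for (int c = 0; c < 256; ++c) {
		d.next[0][c] = 0;
		d.next[1][c] = 0;
		d.next[2][c] = 2;
	}
	d.next[0][static_cast<unsigned char>(x)] = 1;
	d.next[1][static_cast<unsigned char>(x)] = 1;
	d.next[1][static_cast<unsigned char>(y)] = 2;
	return d;
}

// Product automaton: combined state = 3 * stateA + stateB.
struct CombinedDfa {
	unsigned char next[9][256];
};

CombinedDfa Combine(const SubstringDfa& a, const SubstringDfa& b)
{
	CombinedDfa g;
	for (int sa = 0; sa < 3; ++sa)
		for (int sb = 0; sb < 3; ++sb)
			for (int c = 0; c < 256; ++c)
				g.next[3 * sa + sb][c] =
					static_cast<unsigned char>(3 * a.next[sa][c] + b.next[sb][c]);
	return g;
}

// Scans the text once; bit 0 is set if A matched, bit 1 if B matched.
unsigned MatchBoth(const CombinedDfa& g, const char* text, std::size_t len)
{
	unsigned char state = 0;
	for (std::size_t i = 0; i < len; ++i)
		state = g.next[state][static_cast<unsigned char>(text[i])];
	unsigned mask = 0;
	if (state / 3 == 2) mask |= 1u;
	if (state % 3 == 2) mask |= 2u;
	return mask;
}
```

The scanning loop costs the same per character no matter how many patterns
were combined; only the table grows. In this sketch the number of states
multiplies with every added pattern, which is one plausible reason why
combining is practical only for a limited number of patterns.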

Since Pire examines each character only once, without any lookaheads
or rollbacks, spending about five machine instructions per character,
it can be used even in realtime tasks.
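
The single-pass idea can be sketched with a small table-driven automaton.
This is an illustration only, assuming a hand-built DFA for the fixed
substring "ab"; TinyDfa, BuildContainsAb and Accepts are hypothetical names
and do not reflect Pire's actual data structures or its instruction count.

```cpp
#include <cstddef>

// Hypothetical illustration type; not Pire's internal representation.
struct TinyDfa {
	unsigned char next[3][256];	// next[state][byte]
	bool accepting[3];
};

// Builds a DFA accepting any text that contains the substring "ab".
TinyDfa BuildContainsAb()
{
	TinyDfa dfa;
	for (int c = 0; c < 256; ++c) {
		dfa.next[0][c] = 0;	// nothing useful seen yet
		dfa.next[1][c] = 0;	// 'a' was pending, a mismatch resets
		dfa.next[2][c] = 2;	// "ab" already seen, stay accepted
	}
	dfa.next[0]['a'] = 1;
	dfa.next[1]['a'] = 1;	// another 'a' keeps one 'a' pending
	dfa.next[1]['b'] = 2;
	dfa.accepting[0] = false;
	dfa.accepting[1] = false;
	dfa.accepting[2] = true;
	return dfa;
}

// Each input byte is read exactly once: one table lookup, no backtracking.
bool Accepts(const TinyDfa& dfa, const char* text, std::size_t len)
{
	unsigned char state = 0;
	for (std::size_t i = 0; i < len; ++i)
		state = dfa.next[state][static_cast<unsigned char>(text[i])];
	return dfa.accepting[state];
}
```

Calling Accepts(BuildContainsAb(), text, len) walks the text strictly left to
right, so the running time depends only on the text length and not on the
pattern.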

On the other hand, Pire has very limited functionality (compared to
other regexp libraries). Pire has no Perl-style conditional regexps,
lookaheads, backtracking, or greedy/non-greedy matching; nor does it
have any capturing facilities.

Pire was developed at Yandex as a part of its web crawler.

More information is currently available in Russian only; it is
yet to be translated.

Please report bugs to or

Quick Start

#include <stdio.h>
#include <string.h>	// strlen
#include <iterator>	// std::back_inserter
#include <vector>
#include <pire/pire.h>

Pire::NonrelocScanner CompileRegexp(const char* pattern)
{
	// Transform the pattern from UTF-8 into UCS4
	std::vector<Pire::wchar32> ucs4;
	Pire::Encodings::Utf8().FromLocal(pattern, pattern + strlen(pattern), std::back_inserter(ucs4));

	return Pire::Lexer(ucs4.begin(), ucs4.end())
		.AddFeature(Pire::Features::CaseInsensitive())	// enable case insensitivity
		.SetEncoding(Pire::Encodings::Utf8())		// set input text encoding
		.Parse()					// create an FSM
		.Surround()					// PCRE_ANCHORED behavior
		.Compile<Pire::NonrelocScanner>();		// compile the FSM
}

bool Matches(const Pire::NonrelocScanner& scanner, const char* ptr, size_t len)
{
	return Pire::Runner(scanner)
		.Begin()	// '^'
		.Run(ptr, len)	// the text
		.End();		// '$'
		// implicitly cast to bool
}

int main()
{
	char re[] = "hello\\s+w.+d$";
	char str[] = "Hello world";

	Pire::NonrelocScanner sc = CompileRegexp(re);

	bool res = Matches(sc, str, strlen(str));

	printf("String \"%s\" %s \"%s\"\n", str, (res ? "matches" : "doesn't match"), re);
	return 0;
}