Find file History
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
..
Failed to load latest commit information.
.libs
AUTHORS
COPYING
ChangeLog
INSTALL
LICENCE
Makefile
Makefile.in
NEWS
NON-UNIX-USE
README
RunTest
RunTest.in
chartables.c
config.guess
config.h
config.in
config.log
config.status
config.sub
configure
configure.in
configure.lineno
dftables
dftables.c
dftables.o
get.c
install-sh
internal.h
libpcre.def
libpcre.pc
libpcre.pc.in
libpcreposix.def
libtool
ltmain.sh
maketables.c
makevp.bat
mkinstalldirs
pcre-config
pcre-config.in
pcre.c
pcre.def
pcre.h
pcre.in
pcre.lo
pcre.o
pcredemo.c
pcregrep.c
pcreposix.c
pcreposix.h
pcretest.c
perltest
printint.c
study.c
study.lo
study.o
ucp.c
ucp.h
ucpinternal.h
ucptable.c
ucptypetable.c

README

README file for PCRE (Perl-compatible regular expression library)
-----------------------------------------------------------------

The latest release of PCRE is always available from

  ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre-xxx.tar.gz

Please read the NEWS file if you are upgrading from a previous release.

PCRE has its own native API, but a set of "wrapper" functions that are based on
the POSIX API are also supplied in the library libpcreposix. Note that this
just provides a POSIX calling interface to PCRE: the regular expressions
themselves still follow Perl syntax and semantics. The header file
for the POSIX-style functions is called pcreposix.h. The official POSIX name is
regex.h, but I didn't want to risk possible problems with existing files of
that name by distributing it that way. To use it with an existing program that
uses the POSIX API, it will have to be renamed or pointed at by a link.

If you are using the POSIX interface to PCRE and there is already a POSIX regex
library installed on your system, you must take care when linking programs to
ensure that they link with PCRE's libpcreposix library. Otherwise they may pick
up the "real" POSIX functions of the same name.


Documentation for PCRE
----------------------

If you install PCRE in the normal way, you will end up with an installed set of
man pages whose names all start with "pcre". The one that is called "pcre"
lists all the others. In addition to these man pages, the PCRE documentation is
supplied in two other forms; however, as there is no standard place to install
them, they are left in the doc directory of the unpacked source distribution.
These forms are:

  1. Files called doc/pcre.txt, doc/pcregrep.txt, and doc/pcretest.txt. The
     first of these is a concatenation of the text forms of all the section 3
     man pages except those that summarize individual functions. The other two
     are the text forms of the section 1 man pages for the pcregrep and
     pcretest commands. Text forms are provided for ease of scanning with text
     editors or similar tools.

  2. A subdirectory called doc/html contains all the documentation in HTML
     form, hyperlinked in various ways, and rooted in a file called
     doc/index.html.


Contributions by users of PCRE
------------------------------

You can find contributions from PCRE users in the directory

  ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/Contrib

where there is also a README file giving brief descriptions of what they are.
Several of them provide support for compiling PCRE on various flavours of
Windows systems (I myself do not use Windows). Some are complete in themselves;
others are pointers to URLs containing relevant files.


Building PCRE on a Unix-like system
-----------------------------------

To build PCRE on a Unix-like system, first run the "configure" command from the
PCRE distribution directory, with your current directory set to the directory
where you want the files to be created. This command is a standard GNU
"autoconf" configuration script, for which generic instructions are supplied in
INSTALL.

Most commonly, people build PCRE within its own distribution directory, and in
this case, on many systems, just running "./configure" is sufficient, but the
usual methods of changing standard defaults are available. For example:

CFLAGS='-O2 -Wall' ./configure --prefix=/opt/local

specifies that the C compiler should be run with the flags '-O2 -Wall' instead
of the default, and that "make install" should install PCRE under /opt/local
instead of the default /usr/local.

If you want to build in a different directory, just run "configure" with that
directory as current. For example, suppose you have unpacked the PCRE source
into /source/pcre/pcre-xxx, but you want to build it in /build/pcre/pcre-xxx:

cd /build/pcre/pcre-xxx
/source/pcre/pcre-xxx/configure

There are some optional features that can be included or omitted from the PCRE
library. You can read more about them in the pcrebuild man page.

. If you want to make use of the support for UTF-8 character strings in PCRE,
  you must add --enable-utf8 to the "configure" command. Without it, the code
  for handling UTF-8 is not included in the library. (Even when included, it
  still has to be enabled by an option at run time.)

. If, in addition to support for UTF-8 character strings, you want to include
  support for the \P, \p, and \X sequences that recognize Unicode character
  properties, you must add --enable-unicode-properties to the "configure"
  command. This adds about 90K to the size of the library (in the form of a
  property table); only the basic two-letter properties such as Lu are
  supported.

. You can build PCRE to recognized CR or NL as the newline character, instead
  of whatever your compiler uses for "\n", by adding --newline-is-cr or
  --newline-is-nl to the "configure" command, respectively. Only do this if you
  really understand what you are doing. On traditional Unix-like systems, the
  newline character is NL.

. When called via the POSIX interface, PCRE uses malloc() to get additional
  storage for processing capturing parentheses if there are more than 10 of
  them. You can increase this threshold by setting, for example,

  --with-posix-malloc-threshold=20

  on the "configure" command.

. PCRE has a counter which can be set to limit the amount of resources it uses.
  If the limit is exceeded during a match, the match fails. The default is ten
  million. You can change the default by setting, for example,

  --with-match-limit=500000

  on the "configure" command. This is just the default; individual calls to
  pcre_exec() can supply their own value. There is discussion on the pcreapi
  man page.

. The default maximum compiled pattern size is around 64K. You can increase
  this by adding --with-link-size=3 to the "configure" command. You can
  increase it even more by setting --with-link-size=4, but this is unlikely
  ever to be necessary. If you build PCRE with an increased link size, test 2
  (and 5 if you are using UTF-8) will fail. Part of the output of these tests
  is a representation of the compiled pattern, and this changes with the link
  size.

. You can build PCRE so that its match() function does not call itself
  recursively. Instead, it uses blocks of data from the heap via special
  functions pcre_stack_malloc() and pcre_stack_free() to save data that would
  otherwise be saved on the stack. To build PCRE like this, use

  --disable-stack-for-recursion

  on the "configure" command. PCRE runs more slowly in this mode, but it may be
  necessary in environments with limited stack sizes.

The "configure" script builds seven files:

. pcre.h is build by copying pcre.in and making substitutions
. Makefile is built by copying Makefile.in and making substitutions.
. config.h is built by copying config.in and making substitutions.
. pcre-config is built by copying pcre-config.in and making substitutions.
. libpcre.pc is data for the pkg-config command, built from libpcre.pc.in
. libtool is a script that builds shared and/or static libraries
. RunTest is a script for running tests

Once "configure" has run, you can run "make". It builds two libraries called
libpcre and libpcreposix, a test program called pcretest, and the pcregrep
command. You can use "make install" to copy these, the public header files
pcre.h and pcreposix.h, and the man pages to appropriate live directories on
your system, in the normal way.


Retrieving configuration information on Unix-like systems
---------------------------------------------------------

Running "make install" also installs the command pcre-config, which can be used
to recall information about the PCRE configuration and installation. For
example:

  pcre-config --version

prints the version number, and

  pcre-config --libs

outputs information about where the library is installed. This command can be
included in makefiles for programs that use PCRE, saving the programmer from
having to remember too many details.

The pkg-config command is another system for saving and retrieving information
about installed libraries. Instead of separate commands for each library, a
single command is used. For example:

  pkg-config --cflags pcre

The data is held in *.pc files that are installed in a directory called
pkgconfig.


Shared libraries on Unix-like systems
-------------------------------------

The default distribution builds PCRE as two shared libraries and two static
libraries, as long as the operating system supports shared libraries. Shared
library support relies on the "libtool" script which is built as part of the
"configure" process.

The libtool script is used to compile and link both shared and static
libraries. They are placed in a subdirectory called .libs when they are newly
built. The programs pcretest and pcregrep are built to use these uninstalled
libraries (by means of wrapper scripts in the case of shared libraries). When
you use "make install" to install shared libraries, pcregrep and pcretest are
automatically re-built to use the newly installed shared libraries before being
installed themselves. However, the versions left in the source directory still
use the uninstalled libraries.

To build PCRE using static libraries only you must use --disable-shared when
configuring it. For example:

./configure --prefix=/usr/gnu --disable-shared

Then run "make" in the usual way. Similarly, you can use --disable-static to
build only shared libraries.


Cross-compiling on a Unix-like system
-------------------------------------

You can specify CC and CFLAGS in the normal way to the "configure" command, in
order to cross-compile PCRE for some other host. However, during the building
process, the dftables.c source file is compiled *and run* on the local host, in
order to generate the default character tables (the chartables.c file). It
therefore needs to be compiled with the local compiler, not the cross compiler.
You can do this by specifying CC_FOR_BUILD (and if necessary CFLAGS_FOR_BUILD)
when calling the "configure" command. If they are not specified, they default
to the values of CC and CFLAGS.


Building on non-Unix systems
----------------------------

For a non-Unix system, read the comments in the file NON-UNIX-USE, though if
the system supports the use of "configure" and "make" you may be able to build
PCRE in the same way as for Unix systems.

PCRE has been compiled on Windows systems and on Macintoshes, but I don't know
the details because I don't use those systems. It should be straightforward to
build PCRE on any system that has a Standard C compiler, because it uses only
Standard C functions.


Testing PCRE
------------

To test PCRE on a Unix system, run the RunTest script that is created by the
configuring process. (This can also be run by "make runtest", "make check", or
"make test".) For other systems, see the instructions in NON-UNIX-USE.

The script runs the pcretest test program (which is documented in its own man
page) on each of the testinput files (in the testdata directory) in turn,
and compares the output with the contents of the corresponding testoutput file.
A file called testtry is used to hold the main output from pcretest
(testsavedregex is also used as a working file). To run pcretest on just one of
the test files, give its number as an argument to RunTest, for example:

  RunTest 2

The first file can also be fed directly into the perltest script to check that
Perl gives the same results. The only difference you should see is in the first
few lines, where the Perl version is given instead of the PCRE version.

The second set of tests check pcre_fullinfo(), pcre_info(), pcre_study(),
pcre_copy_substring(), pcre_get_substring(), pcre_get_substring_list(), error
detection, and run-time flags that are specific to PCRE, as well as the POSIX
wrapper API. It also uses the debugging flag to check some of the internals of
pcre_compile().

If you build PCRE with a locale setting that is not the standard C locale, the
character tables may be different (see next paragraph). In some cases, this may
cause failures in the second set of tests. For example, in a locale where the
isprint() function yields TRUE for characters in the range 128-255, the use of
[:isascii:] inside a character class defines a different set of characters, and
this shows up in this test as a difference in the compiled code, which is being
listed for checking. Where the comparison test output contains [\x00-\x7f] the
test will contain [\x00-\xff], and similarly in some other cases. This is not a
bug in PCRE.

The third set of tests checks pcre_maketables(), the facility for building a
set of character tables for a specific locale and using them instead of the
default tables. The tests make use of the "fr_FR" (French) locale. Before
running the test, the script checks for the presence of this locale by running
the "locale" command. If that command fails, or if it doesn't include "fr_FR"
in the list of available locales, the third test cannot be run, and a comment
is output to say why. If running this test produces instances of the error

  ** Failed to set locale "fr_FR"

in the comparison output, it means that locale is not available on your system,
despite being listed by "locale". This does not mean that PCRE is broken.

The fourth test checks the UTF-8 support. It is not run automatically unless
PCRE is built with UTF-8 support. To do this you must set --enable-utf8 when
running "configure". This file can be also fed directly to the perltest script,
provided you are running Perl 5.8 or higher. (For Perl 5.6, a small patch,
commented in the script, can be be used.)

The fifth test checks error handling with UTF-8 encoding, and internal UTF-8
features of PCRE that are not relevant to Perl.

The sixth and final test checks the support for Unicode character properties.
It it not run automatically unless PCRE is built with Unicode property support.
To to this you must set --enable-unicode-properties when running "configure".


Character tables
----------------

PCRE uses four tables for manipulating and identifying characters whose values
are less than 256. The final argument of the pcre_compile() function is a
pointer to a block of memory containing the concatenated tables. A call to
pcre_maketables() can be used to generate a set of tables in the current
locale. If the final argument for pcre_compile() is passed as NULL, a set of
default tables that is built into the binary is used.

The source file called chartables.c contains the default set of tables. This is
not supplied in the distribution, but is built by the program dftables
(compiled from dftables.c), which uses the ANSI C character handling functions
such as isalnum(), isalpha(), isupper(), islower(), etc. to build the table
sources. This means that the default C locale which is set for your system will
control the contents of these default tables. You can change the default tables
by editing chartables.c and then re-building PCRE. If you do this, you should
probably also edit Makefile to ensure that the file doesn't ever get
re-generated.

The first two 256-byte tables provide lower casing and case flipping functions,
respectively. The next table consists of three 32-byte bit maps which identify
digits, "word" characters, and white space, respectively. These are used when
building 32-byte bit maps that represent character classes.

The final 256-byte table has bits indicating various character types, as
follows:

    1   white space character
    2   letter
    4   decimal digit
    8   hexadecimal digit
   16   alphanumeric or '_'
  128   regular expression metacharacter or binary zero

You should not alter the set of characters that contain the 128 bit, as that
will cause PCRE to malfunction.


Manifest
--------

The distribution should contain the following files:

(A) The actual source files of the PCRE library functions and their
    headers:

  dftables.c            auxiliary program for building chartables.c

  get.c                 )
  maketables.c          )
  study.c               ) source of the functions
  pcre.c                )   in the library
  pcreposix.c           )
  printint.c            )

  ucp.c                 )
  ucp.h                 ) source for the code that is used for
  ucpinternal.h         )   Unicode property handling
  ucptable.c            )
  ucptypetable.c        )

  pcre.in               "source" for the header for the external API; pcre.h
                          is built from this by "configure"
  pcreposix.h           header for the external POSIX wrapper API
  internal.h            header for internal use
  config.in             template for config.h, which is built by configure

(B) Auxiliary files:

  AUTHORS               information about the author of PCRE
  ChangeLog             log of changes to the code
  INSTALL               generic installation instructions
  LICENCE               conditions for the use of PCRE
  COPYING               the same, using GNU's standard name
  Makefile.in           template for Unix Makefile, which is built by configure
  NEWS                  important changes in this release
  NON-UNIX-USE          notes on building PCRE on non-Unix systems
  README                this file
  RunTest.in            template for a Unix shell script for running tests
  config.guess          ) files used by libtool,
  config.sub            )   used only when building a shared library
  configure             a configuring shell script (built by autoconf)
  configure.in          the autoconf input used to build configure
  doc/Tech.Notes        notes on the encoding
  doc/*.3               man page sources for the PCRE functions
  doc/*.1               man page sources for pcregrep and pcretest
  doc/html/*            HTML documentation
  doc/pcre.txt          plain text version of the man pages
  doc/pcretest.txt      plain text documentation of test program
  doc/perltest.txt      plain text documentation of Perl test program
  install-sh            a shell script for installing files
  libpcre.pc.in         "source" for libpcre.pc for pkg-config
  ltmain.sh             file used to build a libtool script
  mkinstalldirs         script for making install directories
  pcretest.c            comprehensive test program
  pcredemo.c            simple demonstration of coding calls to PCRE
  perltest              Perl test program
  pcregrep.c            source of a grep utility that uses PCRE
  pcre-config.in        source of script which retains PCRE information
  testdata/testinput1   test data, compatible with Perl
  testdata/testinput2   test data for error messages and non-Perl things
  testdata/testinput3   test data for locale-specific tests
  testdata/testinput4   test data for UTF-8 tests compatible with Perl
  testdata/testinput5   test data for other UTF-8 tests
  testdata/testinput6   test data for Unicode property support tests
  testdata/testoutput1  test results corresponding to testinput1
  testdata/testoutput2  test results corresponding to testinput2
  testdata/testoutput3  test results corresponding to testinput3
  testdata/testoutput4  test results corresponding to testinput4
  testdata/testoutput5  test results corresponding to testinput5
  testdata/testoutput6  test results corresponding to testinput6

(C) Auxiliary files for Win32 DLL

  dll.mk
  libpcre.def
  libpcreposix.def
  pcre.def

(D) Auxiliary file for VPASCAL

  makevp.bat

Philip Hazel <ph10@cam.ac.uk>
September 2004