New text.regex package #165

Closed
thekid opened this issue Oct 17, 2011 · 0 comments

Scope of Change

A new package text.regex will be added.

Rationale

Provide an object-oriented API for regular expressions.

Functionality

The entry point class is text.regex.Pattern, a wrapper around PHP's
preg_*() functions.

Testing whether a pattern matches

The most common use case is to test whether a given pattern matches a string.

<?php
  // Current
  if (preg_match('/([w]{3}\.)?example\.(com|net|org)/', $string)) {
    ...
  }

  // New
  if (Pattern::compile('([w]{3}\.)?example\.(com|net|org)')->matches($string)) {
    ...
  }
?>

The problem with the preg_match() approach is that it returns FALSE and
raises a warning if the pattern is malformed. Inside an if statement, FALSE
is indistinguishable from 0 (no match), which can lead to long debugging
sessions. The Pattern class throws an exception instead.
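
The behaviour described above can be sketched with a small wrapper. This is
only an illustration: SketchPattern is a hypothetical stand-in, not the
text.regex.Pattern implementation, and it assumes that validation happens
eagerly in compile() and that the pattern contains no unescaped "/".

```php
<?php
// Hypothetical stand-in for text.regex.Pattern (illustration only).
final class SketchPattern {
  private $regex;

  private function __construct($regex) {
    $this->regex= $regex;
  }

  // Wraps the pattern in delimiters and validates it once, up front.
  // Assumption: the pattern contains no unescaped "/".
  public static function compile($pattern) {
    $regex= '/'.$pattern.'/';

    // preg_match() returns FALSE (not 0) for a malformed pattern; the
    // warning is suppressed here and replaced by an exception.
    if (false === @preg_match($regex, '')) {
      throw new InvalidArgumentException('Malformed pattern "'.$pattern.'"');
    }
    return new self($regex);
  }

  // Returns TRUE only if the pattern matched.
  public function matches($input) {
    return 1 === preg_match($this->regex, $input);
  }
}

$pattern= SketchPattern::compile('([w]{3}\.)?example\.(com|net|org)');
$hit= $pattern->matches('www.example.com');    // a match
$miss= $pattern->matches('example.de');        // no match, but not an error

try {
  SketchPattern::compile('(unclosed');         // missing closing parenthesis
  $malformed= false;
} catch (InvalidArgumentException $e) {
  $malformed= true;                            // the error surfaces explicitly
}
```

With this shape, "no match" and "broken pattern" take different code paths
instead of both evaluating to a falsy value.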

Retrieving matched text

To extract the matched parts of a string:

<?php
  // Current
  preg_match('/(([w]{3})\.)?example\.(com|net|org)/', $string, $matches);
  Console::writeLine($matches);

  // New
  $match= Pattern::compile('(([w]{3})\.)?example\.(com|net|org)')->match($string);
  Console::writeLine($match->group(0));
?>

The result in both cases is a string array with the contents
[ "www.example.com", "www.", "www", "com" ].
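
As a worked example of the procedural variant, the following sketch fills in
concrete inputs (the input strings are assumptions; the snippets above leave
$string undefined) and shows how the groups line up with the parentheses in
the pattern.

```php
<?php
$regex= '/(([w]{3})\.)?example\.(com|net|org)/';

// Group 0 is the whole match, groups 1..3 follow the opening parentheses
// from left to right.
preg_match($regex, 'www.example.com', $matches);
// $matches: [ "www.example.com", "www.", "www", "com" ]

// Without the optional "www." prefix, groups 1 and 2 do not participate in
// the match; preg_match() reports them as empty strings by default.
preg_match($regex, 'example.org', $bare);
// $bare: [ "example.org", "", "", "org" ]
```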

Working with string objects

The text.regex.Pattern class accepts lang.types.String objects directly:

<?php
  $string= new String('xp-framework/RFC #0001');
  $num= Pattern::compile('RFC #([0-9]+)')->match($string)->group(1);  // "0001"
?>

Modifiers

Instead of being embedded in the pattern string, modifiers are passed to
the Pattern class' compile() method as a bitfield:

<?php
  // Current
  $ok= preg_match('/[a-z0-9_]+/i', $username);

  // New
  $ok= Pattern::compile('[a-z0-9_]+', Pattern::CASE_INSENSITIVE)->matches($username);
?>

The available modifiers are:

  Constant name    Modifier
  ================ ========
  CASE_INSENSITIVE i
  MULTILINE        m
  DOTALL           s
  EXTENDED         x
  ANCHORED         A
  DOLLAR_ENDONLY   D
  ANALYSIS         S
  UNGREEDY         U
  UTF8             u

This is more verbose but easier to read.
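
One way to picture the bitfield is as a set of flags that can be combined
with the bitwise OR operator and translated back into PCRE modifier letters.
The sketch below is an assumption about how such a mapping could look: the
class name SketchFlags, the numeric flag values, and the modifiers() helper
are hypothetical and not taken from text.regex.Pattern.

```php
<?php
// Hypothetical flag values; the real constants may use different numbers.
final class SketchFlags {
  const CASE_INSENSITIVE = 0x0001;
  const MULTILINE        = 0x0002;
  const DOTALL           = 0x0004;
  const EXTENDED         = 0x0008;
  const ANCHORED         = 0x0010;
  const DOLLAR_ENDONLY   = 0x0020;
  const ANALYSIS         = 0x0040;
  const UNGREEDY         = 0x0080;
  const UTF8             = 0x0100;

  // Flag => PCRE modifier letter, as listed in the table above
  private static $letters= [
    self::CASE_INSENSITIVE => 'i',
    self::MULTILINE        => 'm',
    self::DOTALL           => 's',
    self::EXTENDED         => 'x',
    self::ANCHORED         => 'A',
    self::DOLLAR_ENDONLY   => 'D',
    self::ANALYSIS         => 'S',
    self::UNGREEDY         => 'U',
    self::UTF8             => 'u'
  ];

  // Translates a bitfield into the modifier string preg_*() expects
  public static function modifiers($flags) {
    $modifiers= '';
    foreach (self::$letters as $bit => $letter) {
      if ($flags & $bit) $modifiers.= $letter;
    }
    return $modifiers;
  }
}

// Several flags are combined with the bitwise OR operator
$flags= SketchFlags::CASE_INSENSITIVE | SketchFlags::MULTILINE;
$suffix= SketchFlags::modifiers($flags);
$ok= preg_match('/^[a-z0-9_]+$/'.$suffix, "Admin_01\nguest");
```

Because each flag occupies its own bit, any combination fits in a single
integer argument.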

Security considerations

None.

Speed impact

Slightly slower than the procedural approach.

Dependencies

PCRE extension (enabled by default)

Related documents

thekid closed this as completed Oct 17, 2011