Regex in PHP

If you want to use regular expressions in your PHP program, the best way is to use the so-called preg functions. They wrap the Perl-Compatible Regular Expressions library, so they are sometimes called PCRE functions. There are other function sets such as ereg and mb_ereg, but the ereg functions were removed in PHP 7, so in this article we’ll focus on the preg functions only.

Does it match?

OK, let’s look at how the preg functions work. We will start with the preg_match function, which returns 1 if the pattern matches the input string and 0 if it does not (or false if an error occurs). The first parameter is the regex string and the second one is the input string:

preg_match('/java/', "regex in php");

returns 0, but

preg_match('/p/', "regex in php");

returns 1. Why not 2? Because this function always stops after the first match. If you need to continue, use the preg_match_all function.
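The difference in return values can be sketched as follows. This is a minimal illustration that reuses the sample strings from above; the variable names are only for the example.

```php
<?php
// preg_match() stops at the first match: 1 = found, 0 = not found, false = error.
$none  = preg_match('/java/', "regex in php");
$first = preg_match('/p/', "regex in php");

// preg_match_all() keeps scanning and returns the total number of matches.
$all = preg_match_all('/p/', "regex in php");

// Compare with === so that an error (false) is not mistaken for "no match" (0).
if ($first === 1) {
    echo "found, total matches: $all\n";   // prints "found, total matches: 2"
}
```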

Note that in PHP you need to write your regular expression in Perl syntax, surrounding it with delimiters. In our case the delimiter is ‘/’, but it can be any non-alphanumeric, non-backslash, non-whitespace character. Choosing a delimiter other than the default (/) is useful when the pattern itself contains slashes. You can also use Perl-style matching delimiters: (), {}, [], and <>. If the delimiter character has to be used in the expression itself, it needs to be escaped with a backslash.
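As a rough illustration of the delimiter rules, the sketch below matches the same path prefix with three different delimiters. The path and variable names are made up for the example. It also uses preg_quote(), which is not covered in this article: it escapes regex metacharacters in literal text, and its optional second argument names a delimiter to escape as well.

```php
<?php
// With '/' as the delimiter, every slash inside the pattern must be escaped.
$a = preg_match('/^\/usr\/local\//', '/usr/local/bin');

// Choosing '#' as the delimiter avoids the escaping.
$b = preg_match('#^/usr/local/#', '/usr/local/bin');

// Bracket-style delimiters come in matching pairs.
$c = preg_match('{^/usr/local/}', '/usr/local/bin');

// preg_quote() escapes metacharacters and, here, the '/' delimiter too.
$literal = preg_quote('a/b.c', '/');
$d = preg_match('/' . $literal . '/', 'path a/b.c here');

echo "$a $b $c $d\n";   // prints "1 1 1 1"
```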

If you need to retrieve all captured parenthesized subpatterns you may specify a third parameter:

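preg_match('/(\w+)\s(\w+)\s+(\w+)/', "regex in php", $matches);

fills the $matches array with the following values: “regex in php”, “regex”, “in”, “php”. Element 0 is the text matched by the whole pattern, and elements 1 to 3 are the three captured groups.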
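A short sketch of how the captured groups might be read back. The variable names are assumptions for the example, and the short array destructuring syntax needs PHP 7.1 or later.

```php
<?php
// Sample pattern and input taken from the text above.
$found = preg_match('/(\w+)\s(\w+)\s+(\w+)/', "regex in php", $matches);

if ($found === 1) {
    // Index 0 is the whole match; indexes 1..3 are the groups, left to right.
    [$whole, $first, $second, $third] = $matches;
    echo "$first | $second | $third\n";   // prints "regex | in | php"
}
```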

As we mentioned before, if you need to get all the matches (not just the subpatterns of the first one), you should use the preg_match_all function:

preg_match_all('/\w+/', "regex in php", $matches);

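fills $matches[0] with all three matches: “regex”, “in”, “php”. Unlike preg_match, the result is a nested array: $matches[0] lists the full matches, while $matches[1], $matches[2] and so on list what each group captured in every match.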
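The shape of the result array is easy to get wrong, so here is a small sketch of the default ordering next to the PREG_SET_ORDER flag, which groups results per match instead. The second pattern, with its single capturing group, is an assumption added for the example.

```php
<?php
// Default ordering (PREG_PATTERN_ORDER): $m[0] holds every full match.
preg_match_all('/\w+/', "regex in php", $m);

// With a capturing group, $p[1] holds what group 1 captured in each match.
$count = preg_match_all('/(\w)\w*/', "regex in php", $p);

// PREG_SET_ORDER returns one sub-array per match: [full match, group 1].
preg_match_all('/(\w)\w*/', "regex in php", $s, PREG_SET_ORDER);

echo implode(',', $m[0]), "\n";   // prints "regex,in,php"
echo implode(',', $p[1]), "\n";   // prints "r,i,p"
echo implode(',', $s[2]), "\n";   // prints "php,p"
```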

What does match?

If you need to select the array elements that match a pattern, use the preg_grep function. In the simple case it takes only two arguments: a pattern string and an array of input strings. For example:

preg_grep('/php/', array("regex in php", "regex in java", "php language"));

returns an array of two elements: “regex in php” and “php language”. The original keys are preserved, so these elements keep the keys 0 and 2.

If you want to return all the array elements that do NOT match the pattern, just add the PREG_GREP_INVERT flag as a third parameter:

preg_grep('/php/', array("regex in php", "regex in java", "php language"), PREG_GREP_INVERT);

returns an array with one element: “regex in java”.
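Because preg_grep keeps the keys of the input array, the result can have gaps in its numbering. The sketch below shows this and one possible way to renumber the result with array_values(); the variable names are only illustrative.

```php
<?php
$input = ["regex in php", "regex in java", "php language"];

// Keys of the input array survive the filtering.
$hits   = preg_grep('/php/', $input);                     // keys 0 and 2
$misses = preg_grep('/php/', $input, PREG_GREP_INVERT);   // key 1

// array_values() renumbers the result from 0 when a plain list is needed.
$reindexed = array_values($hits);

echo implode(',', array_keys($hits)), "\n";   // prints "0,2"
```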

Want to replace?

To replace all the matches you may use the preg_replace function:

preg_replace('/(\w+) (\d{1,2})th/i', '$2 $1', 'March 12th');

returns “12 March”. Here we captured the month and the day as numbered groups and swapped their order.
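Two further details of preg_replace, sketched under the assumption that they are useful here: the ${1} form of a group reference, which is needed when a digit follows the reference, and the optional fourth and fifth parameters (a replacement limit and a variable that receives the number of replacements). The input strings are invented for the example.

```php
<?php
// Swap month and day, as in the example above.
$swapped = preg_replace('/(\w+) (\d{1,2})th/i', '$2 $1', 'March 12th');

// '${1}0' means "group 1 followed by a literal 0"; '$10' would mean group 10.
$padded = preg_replace('/(\d+)/', '${1}0', 'x 5');

// Replace only the first digit and count how many replacements were made.
$limited = preg_replace('/\d/', '#', 'a1b2c3', 1, $count);

echo "$swapped | $padded | $limited | $count\n";   // prints "12 March | x 50 | a#b2c3 | 1"
```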

Want to split?

To split a string, use the preg_split function. Let’s split a list whose delimiter is any run of dots, commas, semicolons or whitespace characters:

preg_split('/[.,;\s]+/', 'jan,feb mar. apr, may;june');

returns an array of 6 elements: ‘jan’, ‘feb’, ‘mar’, ‘apr’, ‘may’, ‘june’.

The important thing here is that the input can mix different delimiters and can repeat a delimiter several times in a row.

Note: If you don't need the power of regex, you may choose faster PHP functions: explode() to split on a fixed delimiter string, or str_split() to cut a string into fixed-length chunks.
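One pitfall worth a sketch: when the input starts (or ends) with a delimiter, preg_split returns an empty string at that position. The PREG_SPLIT_NO_EMPTY flag drops such elements. The leading space in the input below is deliberate, and the variable names are only for the example.

```php
<?php
$list = ' jan,feb mar. apr, may;june';   // note the leading space

// The leading space is a delimiter, so the first element is an empty string.
$parts = preg_split('/[.,;\s]+/', $list);

// -1 means "no limit"; PREG_SPLIT_NO_EMPTY removes the empty pieces.
$clean = preg_split('/[.,;\s]+/', $list, -1, PREG_SPLIT_NO_EMPTY);

// For a single fixed delimiter, explode() is enough.
$simple = explode(',', 'jan,feb,mar');

echo count($parts), ' ', count($clean), ' ', count($simple), "\n";   // prints "7 6 3"
```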

Regex Pattern Modifiers

Sometimes you need to change how a regex behaves. To do so, just add a letter (a regex modifier) after the closing delimiter, for example:

i – makes search case-insensitive:

preg_match('/SQL query/i', "the sql query process");

returns 1

m – enables “multi-line mode” (in this mode the “start of line” (^) and “end of line” ($) anchors also match at the start and end of every line, not only at the start and end of the whole string):

preg_match_all('/^.+$/m', "line 1 \n line 2", $matches);

fills $matches[0] with “line 1 ” and “ line 2” (the spaces around \n are part of the lines). Here the /^.+$/ regex with the ‘m’ modifier anchors at line boundaries rather than string boundaries.

s – enables “dot matches all” (In this mode a “dot” matches all characters, including newlines (otherwise newlines are excluded)):

preg_match_all('/^.+$/s', "line 1 \n line 2", $matches);

fills $matches[0] with one element: “line 1 \n line 2”

x – enables “free-spacing mode” (In this mode, whitespace between regex tokens is ignored, and an unescaped # starts a comment):

preg_match('/reg ex # regex sample/x', 'regex', $matches);

fills $matches with “regex”.

U – inverts the greediness of quantifiers (they become lazy by default and greedy only when followed by “?”):

preg_match_all('/.+/U', 'navi', $matches);

fills $matches[0] with the elements “n”, “a”, “v” and “i”, since + is no longer greedy and matches as few characters as possible.
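Modifiers can be combined by writing several letters after the closing delimiter. The sketch below parses made-up "key = value" lines with m and x together, then shows that U has the same effect as writing a lazy quantifier by hand. The sample text and variable names are assumptions for the example.

```php
<?php
$text = "Name = Alice\nROLE = admin";

// m: ^ and $ work per line; x: whitespace and # comments are ignored.
$pattern = '/
    ^ (\w+)      # key at the start of a line
    \s* = \s*    # equals sign with optional spaces
    (.+) $       # value up to the end of the line
/mx';
preg_match_all($pattern, $text, $m, PREG_SET_ORDER);

// m and i together: "role" is found at the start of the second line.
$hasRole = preg_match('/^role/mi', $text);

// U makes .+ lazy, which is the same as writing .+? without U.
preg_match('/<.+>/', '<b>x</b>', $greedy);
preg_match('/<.+>/U', '<b>x</b>', $lazyU);
preg_match('/<.+?>/', '<b>x</b>', $lazyQ);

echo $m[1][1], '=', $m[1][2], "\n";        // prints "ROLE=admin"
echo $greedy[0], ' ', $lazyU[0], "\n";     // prints "<b>x</b> <b>"
```

Keep the comments inside an x-mode pattern free of the delimiter character, since an unescaped delimiter ends the pattern even inside a comment.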

Want more?

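If you want to test regexes online, try one of the online regex testers that support different engines.

A more comprehensive manual on the PCRE functions is available in PHP’s online documentation (see Other Resources below).

For the whole set of PCRE pattern modifiers, see the pattern modifiers page of the same manual.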

Other Resources

  1. PHP’s online documentation at http://www.php.net/pcre.
  2. http://www.phpbuilder.com/columns/dario19990616.php3

 

 
