Regular expressions provide a concise and flexible means to “match” (specify and capture) strings of text, such as particular characters, words, or patterns of characters. This cheat sheet collects the most commonly used regex constructs, each with an example, for quick reference.
Introduction
Regular expressions (regexes) are a formal language for describing string patterns used in complex searches. Many text editors, utilities, and programming languages use them to search and manipulate text based on patterns.
There are many dialects of regular expressions, each slightly different. When asking about or referring to a regex, therefore, name the specific programming language or tool (e.g., Perl, Ruby, Python, Java, JavaScript, vi, Emacs, sed, Lex, grep, etc.) you are using. It’s probably fair to say that Perl has the most robust regex engine in common use, so Perl-style regexes are the most widespread and the most searched for. Regex engines also offer extended features such as backreferences, substitutions, and POSIX character classes. A comparison table of online regex testers for different engines is available as well.
Modern web scraping relies heavily on regexes, so for developers’ convenience we have put together this regex cheat sheet with examples. We hope it becomes a handy tool for daily use.
Daily Regexes
Character classes | Metacharacters | Anchors |
Groups and ranges | Substitutions | Quantifiers |
Quantifiers modifier | Assertions | POSIX |
Pattern modifiers | Conditions & Comments |
Character classes | ||
---|---|---|
Character | Description | Example |
\c | Control character | \cx matches ‘Ctrl+x’ |
\s | White space | new\sgame matches new game |
\S | Not white space | \S+ matches cool in cool |
\d | Digit | \d matches 1 3 5 in 1st 3d 5th |
\D | Not digit | \D+ matches Harley road (with its leading space) in 2500 Harley road |
\w | Matches any word character (letter, digit, or underscore) | \w matches A 1 2 9 6 _ in A-12-96_ |
\W | Matches any non-word character | \W matches - - , in A-12-96, |
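The rows above can be tried out directly. The following is a minimal sketch that assumes Python's standard `re` module as the engine (the table itself is not tied to any one language); the variable names are only illustrative.

```python
import re

# Each findall call returns every non-overlapping match in the sample text.
digits = re.findall(r"\d", "1st 3d 5th")
non_digits = re.findall(r"\D+", "2500 Harley road")
word_chars = re.findall(r"\w", "A-12-96_")
non_word_chars = re.findall(r"\W", "A-12-96,")
has_space = re.search(r"new\sgame", "new game") is not None

print(digits)          # ['1', '3', '5']
print(non_digits)      # [' Harley road'] (the leading space is part of the match)
print(word_chars)      # ['A', '1', '2', '9', '6', '_']
print(non_word_chars)  # ['-', '-', ',']
print(has_space)       # True
```

Raw strings (`r"..."`) keep Python from interpreting the backslashes before the regex engine sees them.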
Metacharacters and Escape sign | ||
---|---|---|
Character | Description | Example |
.*+?^&$()[]{}\|/ | Characters that have a special meaning unless escaped with a backslash ‘\’ | \^, \$, \\ match ^, $, \ |
\ | Escape character (backslash). It gives certain letters a special meaning (see also ‘Character classes’) and suppresses the special meaning of metacharacters (see the row above). | \a matches Bell, hex 07; \r matches Carriage Return, hex 0D; \n matches New Line, hex 0A; \t matches Tab, hex 09; \v matches Vertical Tab, hex 0B; \f matches Form Feed, hex 0C; \ddd matches the character with ASCII octal code ddd; \xhh matches the character with ASCII hex code hh; \uxxxx matches the Unicode character given by exactly four hexadecimal digits xxxx |
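To illustrate escaping, here is a small sketch that again assumes Python's `re` module. The sample text and variable names are made up for the example; `re.escape` is the standard-library helper that adds the backslashes automatically.

```python
import re

price_text = "Total: $5.00 (approx.)"

# Unescaped, "$" is an end-of-string anchor and "." matches any character,
# so both are backslashed here to find the literal text.
literal_match = re.search(r"\$5\.00", price_text)

# re.escape() backslashes the characters that would otherwise be special.
escaped_pattern = re.escape("(approx.)")
escaped_match = re.search(escaped_pattern, price_text)

# Escape sequences such as \t and \x41 each stand for a single character.
tab_match = re.search(r"a\tb", "a\tb")
hex_match = re.search(r"\x41", "ABC")

print(literal_match.group())  # $5.00
print(escaped_pattern)        # \(approx\.\)
print(escaped_match.group())  # (approx.)
print(hex_match.group())      # A
```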
Anchors (match position) | ||
---|---|---|
Character | Description | Example |
^ | Start of string | ^a matches only first a in abcadaef. |
$ | End of string | .$ matches f in abcdef |
\A | Start of string. Never matches after line breaks. | \A. matches only a in abc dfg hij |
\Z \z | End of string. ‘\Z’ also matches before a line break at the very end; ‘\z’ matches only at the very end | .\Z matches f in abcdef |
\b | Word boundary | .\b matches c in abc; \b. matches a in abc |
\B | Not a word boundary | \B.*\B matches bcd defg hi in abcd defg hij |
\G | The match must occur at the point where the previous match ended | \G\(\d\) matches (1)(3) in (1)(3)[4](5) |
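The anchor rows can be checked with a short sketch, assuming Python's `re` module. Two caveats about that assumption: Python's `re` has no `\G`, and its `\Z` matches only at the very end of the string, so those two rows are not reproduced here.

```python
import re

first_a = [m.start() for m in re.finditer(r"^a", "abcadaef")]
last_char = re.search(r".$", "abcdef").group()

multiline_text = "abc\ndfg\nhij"
# With re.MULTILINE, ^ matches after every line break, but \A still
# matches only at the very start of the string.
caret_hits = re.findall(r"^.", multiline_text, flags=re.MULTILINE)
string_start_hits = re.findall(r"\A.", multiline_text, flags=re.MULTILINE)

# \b is a word boundary, \B is any position that is not one.
before_boundary = re.findall(r".\b", "abc")
after_boundary = re.findall(r"\b.", "abc")
inside_words = re.search(r"\B.*\B", "abcd defg hij").group()

print(first_a)            # [0]
print(last_char)          # f
print(caret_hits)         # ['a', 'd', 'h']
print(string_start_hits)  # ['a']
print(before_boundary)    # ['c']
print(after_boundary)     # ['a']
print(inside_words)       # bcd defg hi
```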
Groups and Ranges | ||
---|---|---|
Character | Description | Example |
. (dot) | Any character except new line (\n) | a.c matches abc, asc or adc |
| (pipe) | Logical ‘OR’. | abc(def|xyz) matches abcdef or abcxyz |
( ) | Capturing (active) group; its match is stored and numbered | (abc) matches abc in 34abcdef and stores it as group 1 |
(?: ) | Passive group (non-capturing) | \d(?:abc) matches 4abc in 34abcdef; abc is grouped but not captured, so no numbered group is created |
[ ] | Range | [a-zA-Z0-9] matches any letter or digit |
[^ ] | Negative (exclusive) Range | [^a-d4-6] matches any character except a, b, c or d, or 4,5 or 6 |
( )\n | Numbered group ( ), counted from 1; \n is a backreference to the text matched by the nth group | (abc)\1\s(ghi)\1\2 matches abcabc ghiabcghi, where abc and ghi are the 1st and 2nd capture groups respectively. |
(?<name> ) | Named group, which can be used as a backreference | (?<Day>\d{1,2}) matches 12 in on June 12 and stores it in the capture group named Day. |
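A brief sketch of the grouping rows, assuming Python's `re` module. Note one dialect difference under that assumption: Python writes a named group as `(?P<name>...)` rather than the `(?<name>...)` form shown in the table.

```python
import re

alternation = re.fullmatch(r"abc(def|xyz)", "abcxyz")
capturing = re.search(r"(abc)", "34abcdef")
non_capturing = re.search(r"\d(?:abc)", "34abcdef")
negated_range = re.findall(r"[^a-d4-6]", "abz45x7")
backrefs = re.search(r"(abc)\1\s(ghi)\1\2", "abcabc ghiabcghi")
# Python-specific spelling of the named group from the table.
named = re.search(r"(?P<Day>\d{1,2})", "on June 12")

print(alternation.group(1))    # xyz
print(capturing.group(1))      # abc
print(non_capturing.group())   # 4abc
print(non_capturing.groups())  # () because (?:...) does not capture
print(negated_range)           # ['z', 'x', '7']
print(backrefs.groups())       # ('abc', 'ghi')
print(named.group("Day"))      # 12
```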
Substitutions (Backreferences) | ||
---|---|---|
Character | Description | Example |
$n | Substitutes the substring matched by group number n | Applying ^(\w+)\s(\w+)$ to Mary Seay with $2, $1 as a replace pattern results in Seay, Mary |
${name} | Substitutes the substring matched by the named group name. | Applying (?<word1>\w+)\s(?<word2>\w+) to ABC DEF with ${word2} ${word1} as a replace pattern results in DEF ABC |
$` | Substitutes all the text of the input string before the matched string | Applying While to Kris While with $` as a replace pattern results in Kris Kris |
$' | Substitutes all the text of the input string after the matched string | Applying 3+ to 1122334455 with $' as a replace pattern results in 112244554455 |
$+ | Substitutes the last group that was captured | Applying \b(\w+)\s\1\b to the the dog with $+ as a replace pattern results in the dog |
$_ | Substitutes the entire input string | Applying regex to New regex Code with $_ as a replace pattern results in New New regex Code Code |
$& | Substitutes a copy of the whole match | Applying \d+\.\d+ to 1.4590 with ***$&*** as a replace pattern results in ***1.4590*** |
$$ | Substitutes a $ character | Applying (\d+)\s*(USD)? to 200 USD with $$$1 as a replace pattern results in $200 |
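The `$`-style replacement syntax above is the one used by engines such as .NET. As a hedged sketch of the same substitutions in a different dialect, the code below assumes Python's `re.sub`, where group references are written `\1` or `\g<name>`, the whole match is `\g<0>`, and a literal `$` needs no escaping. Python has no direct counterpart of $` and `$'`, so a replacement function is used for the "text before the match" case.

```python
import re

# $2, $1  ->  \2, \1
swapped = re.sub(r"^(\w+)\s(\w+)$", r"\2, \1", "Mary Seay")

# ${word2} ${word1}  ->  \g<word2> \g<word1>
named_swap = re.sub(
    r"(?P<word1>\w+)\s(?P<word2>\w+)", r"\g<word2> \g<word1>", "ABC DEF"
)

# $&  ->  \g<0> (the whole match)
whole_match = re.sub(r"\d+\.\d+", r"***\g<0>***", "1.4590")

# $$$1  ->  $\1 (a plain "$" is already literal in a Python replacement)
dollar = re.sub(r"(\d+)\s*(USD)?", r"$\1", "200 USD")

# Text before the match, computed by a replacement function.
before_text = re.sub(r"While", lambda m: m.string[: m.start()], "Kris While")

print(swapped)      # Seay, Mary
print(named_swap)   # DEF ABC
print(whole_match)  # ***1.4590***
print(dollar)       # $200
```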
Assertions | ||
---|---|---|
Character | Description | Example |
?= | Zero-width lookahead assertion | \w+(?=\.) matches sleep, ill and Oops in We sleep. The man is ill. Oops.! |
?! | Zero-width negative lookahead assertion | \b\w+(?!\.)\b matches We, The, man and is in We sleep. The man is ill. Oops.! |
?<= | Zero-width lookbehind assertion | (?<=DDR2)\s+\w+ matches 2MB in DDR2 2MB |
?<! | Zero-width negative lookbehind assertion | (?<!20)\d{2}\b matches 45, 76 in 1945 2012 1876 2002 |
Conditions and comments | ||
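The four assertions can be exercised with the sample strings from the table. This is a minimal sketch assuming Python's `re` module, which requires lookbehind patterns to have a fixed width (true of both lookbehinds used here). The negative lookbehind is written `(?<!20)`, the standard spelling.

```python
import re

text = "We sleep. The man is ill. Oops.!"

# Lookahead: words that are immediately followed by a period.
before_period = re.findall(r"\w+(?=\.)", text)
# Negative lookahead: whole words that are not followed by a period.
not_before_period = re.findall(r"\b\w+(?!\.)\b", text)
# Lookbehind: whitespace plus a word, only right after "DDR2".
after_ddr2 = re.search(r"(?<=DDR2)\s+\w+", "DDR2 2MB").group()
# Negative lookbehind: final two digits not preceded by "20".
not_after_20 = re.findall(r"(?<!20)\d{2}\b", "1945 2012 1876 2002")

print(before_period)      # ['sleep', 'ill', 'Oops']
print(not_before_period)  # ['We', 'The', 'man', 'is']
print(repr(after_ddr2))   # ' 2MB'
print(not_after_20)       # ['45', '76']
```

The assertions consume no characters, which is why the period and `DDR2` never appear in the matched text.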
---|---|---|
Character | Description | Example |
(?(expression)yes|no) | Matches yes if expression matches; otherwise, matches the optional no part. | (?(A)A\d{3}\b|\b\d{3}\b) matches A380, 747, 400 in A380 C103 747-400 |
(?(name)yes|no) | Matches yes if the name capture has a match; otherwise, matches the optional no. | (?<quot>")?(?(quot).+?"|\S+\s) matches rock.jpg and "new song.jpg" in rock.jpg "new song.jpg" |
(?# ) | Inline comment; the text inside the parentheses is ignored and ends at the first closing parenthesis | (?# a new regex) |
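As a sketch of the conditional and comment constructs, the code below assumes Python's `re` module. Under that assumption only the group-reference form `(?(name)yes|no)` is available (Python does not accept an arbitrary expression as the condition), and the named group is spelled `(?P<quot>...)`.

```python
import re

# If the optional opening quote was captured, read up to the closing quote;
# otherwise read one whitespace-terminated token.
file_pattern = re.compile(r'(?P<quot>")?(?(quot).+?"|\S+\s)')
names = [m.group() for m in file_pattern.finditer('rock.jpg "new song.jpg"')]

# An inline comment is skipped by the engine.
commented = re.search(r"\d+(?# one or more digits)", "ab12").group()

print(names)      # ['rock.jpg ', '"new song.jpg"']
print(commented)  # 12
```

The first match keeps its trailing space because the no-branch `\S+\s` consumes one whitespace character.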
POSIX | ||
---|---|---|
Common range | Description | Notes |
[:upper:] | Upper case letters | [A-Z] |
[:lower:] | Lower case letters | [a-z] |
[:alpha:] | All letters | [A-Za-z] |
[:alnum:] | Digits and letters | [0-9A-Za-z] |
[:xdigit:] | Hexadecimal digits | [0-9A-Fa-f] |
[:digit:] | Digits | [0-9] |
[:punct:] | Punctuation | ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~ |
[:blank:] | Space and tab characters only | [ \t] |
[:space:] | Whitespace characters | [ \t\n\v\f\r] |
[:graph:] | Anything excluding space, tab, control characters etc.; (printed characters) | [\x21-\x7E] |
[:print:] | Any printable character and space | [\x20-\x7E] |
[:word:] | Digits, letters and underscore | [0-9A-Za-z_] |
[:cntrl:] | Control characters | [\x00-\x1F\x7F] |
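Not every engine understands POSIX class names. Python's `re` module, for one, does not, so the explicit ranges from the Notes column have to be used instead. The sketch below assumes Python; the `POSIX_EQUIVALENTS` dictionary and the `posix_class` helper are hypothetical names introduced only for this illustration and cover the ASCII ranges listed above.

```python
import re

# Hypothetical lookup table: POSIX class name -> body of an equivalent
# ASCII bracket expression.
POSIX_EQUIVALENTS = {
    "upper": "A-Z",
    "lower": "a-z",
    "alpha": "A-Za-z",
    "alnum": "0-9A-Za-z",
    "digit": "0-9",
    "xdigit": "0-9A-Fa-f",
    "blank": r" \t",
    "space": r" \t\n\v\f\r",
    "cntrl": r"\x00-\x1F\x7F",
}


def posix_class(name: str) -> re.Pattern:
    """Compile a pattern matching one or more characters of the named class."""
    return re.compile(f"[{POSIX_EQUIVALENTS[name]}]+")


hex_runs = posix_class("xdigit").findall("x = 1A2b3C!")
upper_runs = posix_class("upper").findall("Hello World ABC")
blank_runs = posix_class("blank").findall("a \tb\nc")

print(hex_runs)    # ['1A2b3C']
print(upper_runs)  # ['H', 'W', 'ABC']
print(blank_runs)  # [' \t']
```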
Pattern modifiers | ||
---|---|---|
Character | Description | |
g | Global match: finds all occurrences of the pattern in the input string | |
m | Multiline mode: ^ and $ match at the start and end of each line | |
s | Single-line mode: dot in the pattern matches all characters, including newlines | |
x | Extended mode: whitespace in the pattern is ignored | |
U | Ungreedy pattern: quantifiers are lazy by default | |
i | Case-insensitive matching | |
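How modifiers are spelled depends on the engine. The following sketch assumes Python's `re` module, where they are passed as flags: `re.IGNORECASE` for i, `re.MULTILINE` for m, `re.DOTALL` for s and `re.VERBOSE` for x. Python has no g or U flag, so the example uses `re.findall` in place of a global match and a lazy quantifier (`+?`) in place of an ungreedy pattern.

```python
import re

text = "First line\nsecond LINE"

# g: re.search stops at the first match, re.findall returns all of them.
first_only = re.search(r"\w+", text).group()
# i: case-insensitive matching.
any_case = re.findall(r"line", text, flags=re.IGNORECASE)
# m: ^ also matches right after each line break.
line_starts = re.findall(r"^\w+", text, flags=re.MULTILINE)
# s: the dot also matches the newline.
without_dotall = re.search(r"line.second", text)
with_dotall = re.search(r"line.second", text, flags=re.DOTALL)
# x: whitespace and # comments inside the pattern are ignored.
date = re.compile(r"""
    (\d{4})  # year
    -
    (\d{2})  # month
""", flags=re.VERBOSE)
# U: no such flag here; a trailing ? makes a single quantifier lazy.
greedy = re.search(r"<.+>", "<a><b>").group()
lazy = re.search(r"<.+?>", "<a><b>").group()

print(first_only)                            # First
print(any_case)                              # ['line', 'LINE']
print(line_starts)                           # ['First', 'second']
print(without_dotall)                        # None
print(date.search("on 2012-06-01").groups()) # ('2012', '06')
print(greedy, lazy)                          # <a><b> <a>
```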
Regex Pattern modifiers examples are here.
All regex examples were tested with either the Expresso program or the MyRegexTester.com online tester. We hope you benefit from this reference list; your comments and suggestions are welcome.