For example, if you need to search an entire web site to remove some
outdated material and replace some HTML formatting tags, you can use
a regular expression to test each file to see if the material or the
HTML formatting tags you are looking for exists in that file. That
way, you can narrow down the affected files to only those that contain
the material that has to be removed or changed. You can then use a
regular expression to remove the outdated material, and finally, you
can use regular expressions to search for and replace the tags that
need replacing.
Another example of where a regular expression is useful occurs in
a language that is not known for its string-handling ability. Regular
expressions provide a significant improvement in string handling for
JScript. Regular expressions can also be more efficient to use in
VBScript, allowing you to perform multiple string manipulations in a
single expression.
Early Beginnings
Regular expressions trace their ancestry back to early research on
how the human nervous system works. Warren McCulloch and Walter Pitts,
a pair of neuro-physiologists, developed a mathematical way of describing
these neural networks.
In 1956, a mathematician named Stephen Kleene, building on the earlier
work of McCulloch and Pitts, published a paper entitled "Representation
of Events in Nerve Nets and Finite Automata" that introduced the concept
of regular expressions. Regular expressions were expressions used to
describe what he called "the algebra of regular sets," hence the term
"regular expression."
Subsequently, his work found its way into some early efforts with
computational search algorithms done by Ken Thompson, one of the principal
inventors of Unix. The first practical application of regular expressions
was in Thompson's qed editor. And the rest, as they say, is history.
Regular expressions have been an important part of text-based editors
and search tools ever since.
Regular Expressions
Unless you have worked with regular expressions before, the term
and the concept may be unfamiliar to you. However, they may not be
as unfamiliar as you think.
Think about how you search for files on your hard disk. You most
likely use the ? and * characters to help find the files you're looking
for. The ? character matches a single character in a file name, while
the * matches zero or more characters. A pattern such as 'data?.dat'
would find the following files:
data1.dat
data2.dat
datax.dat
dataN.dat
Using the * character instead of the ? character expands the number
of files found. 'data*.dat' matches all of the following:
data.dat
data1.dat
data2.dat
data12.dat
datax.dat
dataXYZ.dat
While this method of searching for files can certainly be useful,
it is also very limited. The limited ability of the ? and * wildcard
characters gives you an idea of what regular expressions can do, but
regular expressions are much more powerful and flexible.
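To make the comparison concrete, here is a minimal JavaScript sketch of how the two wildcards map onto regular expression syntax: ? becomes '.' (exactly one character) and * becomes '.*' (zero or more characters). The helper name wildcardToRegExp and the file list are hypothetical, and the sketch assumes that ? and * are the only wildcards.

```javascript
// Hypothetical helper (not part of JScript): turn a file-name wildcard
// pattern into an equivalent RegExp. Only ? and * are treated as wildcards.
function wildcardToRegExp(pattern) {
  const body = pattern
    .replace(/[.+^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters except ? and *
    .replace(/\?/g, ".")                  // ? -> exactly one character
    .replace(/\*/g, ".*");                // * -> zero or more characters
  return new RegExp("^" + body + "$");    // anchor: the whole name must match
}

const files = ["data.dat", "data1.dat", "data12.dat", "datax.dat", "dataXYZ.dat", "data1.txt"];

console.log(files.filter((f) => wildcardToRegExp("data?.dat").test(f)).join(", "));
// data1.dat, datax.dat
console.log(files.filter((f) => wildcardToRegExp("data*.dat").test(f)).join(", "));
// data.dat, data1.dat, data12.dat, datax.dat, dataXYZ.dat
```

The period in '.dat' is escaped first, so it only matches a literal period, just as it does in a file search.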
Regular Expression Syntax
A regular expression is a pattern of text that consists of ordinary
characters (for example, letters a through z) and special characters,
known as metacharacters. The pattern describes one or more
strings to match when searching a body of text. The regular expression
serves as a template for matching a character pattern to the string
being searched.
Here are some examples of regular expressions you might encounter:
| Pattern | Matches |
| /^[ \t]*$/ | A blank line (an empty line, or one containing only spaces and tabs). |
| /\d{2}-\d{5}/ | An ID number consisting of 2 digits, a hyphen, and another 5 digits. |
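The two example patterns can be exercised with RegExp.prototype.test in JavaScript (the ECMAScript language that JScript implements). This sketch writes the blank-line pattern as /^[ \t]*$/, which accepts an empty line or one made only of spaces and tabs. It also adds ^ and $ to the ID pattern so that the whole string is checked. Both choices are assumptions of the sketch.

```javascript
// Blank line: empty, or containing only spaces and tabs.
const blankLine = /^[ \t]*$/;
// ID number: 2 digits, a hyphen, and 5 digits. The ^ and $ anchors are an
// addition here so that the whole string must have that shape.
const idNumber = /^\d{2}-\d{5}$/;

console.log(blankLine.test(""));                // true
console.log(blankLine.test(" \t "));            // true
console.log(blankLine.test("  x"));             // false
console.log(idNumber.test("12-34567"));         // true
console.log(idNumber.test("123-45678"));        // false
console.log(/\d{2}-\d{5}/.test("123-45678"));   // true: unanchored, it finds "23-45678"
```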
The following table contains the complete list of metacharacters
and their behavior in the context of regular expressions:
| Character | Description |
| \ | Marks the next character as a special character, a literal, a backreference, or an octal escape. For example, 'n' matches the character "n", and '\n' matches a newline character. The sequence '\\' matches "\" and '\(' matches "(". |
| ^ | Matches the position at the beginning of the input string. If the RegExp object's Multiline property is set, ^ also matches the position following '\n' or '\r'. |
| $ | Matches the position at the end of the input string. If the RegExp object's Multiline property is set, $ also matches the position preceding '\n' or '\r'. |
| * | Matches the preceding character or subexpression zero or more times. For example, 'zo*' matches "z" and "zoo". * is equivalent to {0,}. |
| + | Matches the preceding character or subexpression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}. |
| ? | Matches the preceding character or subexpression zero or one time. For example, 'do(es)?' matches the "do" in "do" or "does". ? is equivalent to {0,1}. |
| {n} | n is a nonnegative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in "Bob", but matches the two o's in "food". |
| {n,} | n is a nonnegative integer. Matches at least n times. For example, 'o{2,}' does not match the 'o' in "Bob" but matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+', and 'o{0,}' is equivalent to 'o*'. |
| {n,m} | m and n are nonnegative integers, where n <= m. Matches at least n and at most m times. For example, 'o{1,3}' matches the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. Note that you cannot put a space between the comma and the numbers. |
| ? | When this character immediately follows any of the other quantifiers (*, +, ?, {n}, {n,}, {n,m}), the matching pattern is non-greedy. A non-greedy pattern matches as little of the searched string as possible, whereas the default greedy pattern matches as much of the searched string as possible. For example, in the string "oooo", 'o+?' matches a single "o", while 'o+' matches all the o's. |
| . | Matches any single character except "\n". To match any character including '\n', use a pattern such as '[\s\S]'. |
| (pattern) | Matches pattern and captures the match. The captured match can be retrieved from the resulting Matches collection, using the SubMatches collection in VBScript or the $1…$9 properties in JScript. To match the parenthesis characters themselves, use '\(' and '\)'. |
| (?:pattern) | Matches pattern but does not capture the match; that is, it is a non-capturing match that is not stored for possible later use. This is useful for combining parts of a pattern with the "or" character (|). For example, 'industr(?:y|ies)' is a more economical expression than 'industry|industries'. |
| (?=pattern) | Positive lookahead. Matches the search string at any point where a string matching pattern begins. This is a non-capturing match; that is, the match is not captured for possible later use. For example, 'Windows (?=95|98|NT|2000)' matches "Windows" in "Windows 2000" but not "Windows" in "Windows 3.1". Lookaheads do not consume characters: after a match occurs, the search for the next match begins immediately following the last match, not after the characters that comprised the lookahead. |
| (?!pattern) | Negative lookahead. Matches the search string at any point where a string not matching pattern begins. This is a non-capturing match; that is, the match is not captured for possible later use. For example, 'Windows (?!95|98|NT|2000)' matches "Windows" in "Windows 3.1" but does not match "Windows" in "Windows 2000". Lookaheads do not consume characters: after a match occurs, the search for the next match begins immediately following the last match, not after the characters that comprised the lookahead. |
| x|y | Matches either x or y. For example, 'z|food' matches "z" or "food", and '(z|f)ood' matches "zood" or "food". |
| [xyz] | A character set. Matches any one of the enclosed characters. For example, '[abc]' matches the 'a' in "plain". |
| [^xyz] | A negative character set. Matches any character not enclosed. For example, '[^abc]' matches the 'p' in "plain". |
| [a-z] | A range of characters. Matches any character in the specified range. For example, '[a-z]' matches any lowercase alphabetic character in the range 'a' through 'z'. |
| [^a-z] | A negative range of characters. Matches any character not in the specified range. For example, '[^a-z]' matches any character not in the range 'a' through 'z'. |
| \b | Matches a word boundary, that is, the position between a word character and a nonword character such as a space, or between a word character and the start or end of the string. For example, 'er\b' matches the 'er' in "never" but not the 'er' in "verb". |
| \B | Matches a nonword boundary. For example, 'er\B' matches the 'er' in "verb" but not the 'er' in "never". |
| \cx | Matches the control character indicated by x. For example, \cM matches a Control-M or carriage return character. The value of x must be in the range of A-Z or a-z. If not, c is assumed to be a literal 'c' character. |
| \d | Matches a digit character. Equivalent to [0-9]. |
| \D | Matches a nondigit character. Equivalent to [^0-9]. |
| \f | Matches a form-feed character. Equivalent to \x0c and \cL. |
| \n | Matches a newline character. Equivalent to \x0a and \cJ. |
| \r | Matches a carriage return character. Equivalent to \x0d and \cM. |
| \s | Matches any white space character including space, tab, form-feed, and so on. Equivalent to [ \f\n\r\t\v]. |
| \S | Matches any non-white space character. Equivalent to [^ \f\n\r\t\v]. |
| \t | Matches a tab character. Equivalent to \x09 and \cI. |
| \v | Matches a vertical tab character. Equivalent to \x0b and \cK. |
| \w | Matches any word character including underscore. Equivalent to '[A-Za-z0-9_]'. |
| \W | Matches any nonword character. Equivalent to '[^A-Za-z0-9_]'. |
| \xn | Matches n, where n is a hexadecimal escape value. Hexadecimal escape values must be exactly two digits long. For example, '\x41' matches "A", and '\x041' is equivalent to '\x04' followed by "1". Allows ASCII codes to be used in regular expressions. |
| \num | Matches num, where num is a positive integer. A reference back to captured matches. For example, '(.)\1' matches two consecutive identical characters. |
| \n | Identifies either an octal escape value or a backreference. If \n is preceded by at least n captured subexpressions, n is a backreference. Otherwise, n is an octal escape value if n is an octal digit (0-7). |
| \nm | Identifies either an octal escape value or a backreference. If \nm is preceded by at least nm captured subexpressions, nm is a backreference. If \nm is preceded by at least n captures, n is a backreference followed by literal m. If neither of the preceding conditions exists, \nm matches octal escape value nm when n and m are octal digits (0-7). |
| \nml | Matches octal escape value nml when n is an octal digit (0-3) and m and l are octal digits (0-7). |
| \un | Matches n, where n is a Unicode character expressed as four hexadecimal digits. For example, \u00A9 matches the copyright symbol (©). |
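Several rows of the table can be checked quickly in JavaScript, whose RegExp syntax is assumed here to agree with JScript for these features. The sketch below shows a backreference, greedy and non-greedy quantifiers, a non-capturing group, code-value escapes, and the difference between '.' and '[\s\S]'.

```javascript
// Backreference: (.) captures one character, \1 requires the same character again.
console.log(/(.)\1/.test("letter"));          // true ("tt")
console.log(/(.)\1/.test("word"));            // false
// Greedy versus non-greedy quantifiers.
console.log("oooo".match(/o+/)[0]);           // oooo
console.log("oooo".match(/o+?/)[0]);          // o
console.log("fooooood".match(/o{1,3}/)[0]);   // ooo
// Non-capturing group with alternation.
console.log(/^industr(?:y|ies)$/.test("industries")); // true
// Escapes for character code values.
console.log(/\x41/.test("A"));                // true
console.log(/\u00A9/.test("\u00A9 2000"));    // true (copyright symbol)
console.log(/\cM/.test("\r"));                // true (carriage return)
// . excludes \n; [\s\S] does not.
console.log(/a.b/.test("a\nb"));              // false
console.log(/a[\s\S]b/.test("a\nb"));         // true
```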
Alternation and Grouping
Alternation uses the '|' character to allow a choice between two or
more alternatives. You can use it to expand the chapter heading regular
expression to cover more than just chapter headings. However, it is not
as straightforward as you might think. When alternation is used, the
largest possible expression on either side of the '|' character is
matched. You might think that the following expression matches either
'Chapter' or 'Section' followed by one or two digits occurring at the
beginning and ending of a line:
/^Chapter|Section [1-9][0-9]{0,1}$/
Unfortunately, the regular expression shown above matches either
the word 'Chapter' at the beginning of a line, or 'Section' and whatever
numbers follow it at the end of the line. If the input string is
'Chapter 22', the expression shown above matches only the word 'Chapter'.
If the input string is 'Section 22', the expression matches 'Section
22'. That is not the intent here, so there must be a way to make
that regular expression more responsive to what you're trying to do,
and there is.
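The pitfall is easy to observe in JavaScript. The ungrouped pattern is really two independent alternatives, '^Chapter' and 'Section [1-9][0-9]{0,1}$'.

```javascript
// Without grouping, '|' splits the whole pattern into
// ^Chapter  OR  Section [1-9][0-9]{0,1}$
const ungrouped = /^Chapter|Section [1-9][0-9]{0,1}$/;

console.log("Chapter 22".match(ungrouped)[0]);    // Chapter
console.log("Section 22".match(ungrouped)[0]);    // Section 22
console.log(ungrouped.test("Chapter of a book")); // true: ^Chapter alone suffices
console.log(ungrouped.test("See Section 2"));     // true: "Section 2" ends the line
```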
You can use parentheses to limit the scope of the alternation, that
is, make sure that it applies only to the two words, 'Chapter' and
'Section'. However, parentheses are also used to create subexpressions
and possibly capture them for later use, something that is covered
in the section on backreferences. By taking the regular expression
shown above and adding parentheses in the appropriate places, you
can make the regular expression match either 'Chapter 1' or 'Section
3'.
The following regular expression uses parentheses to group 'Chapter'
and 'Section' so the expression works properly.
/^(Chapter|Section) [1-9][0-9]{0,1}$/
Although this expression works properly, the parentheses around
'Chapter|Section' also cause either of the two matching words to be
captured for future use. Since there is only one set of parentheses
in the expression shown above, there is only one captured submatch.
This submatch can be referred to using the SubMatches collection
in VBScript or the $1-$9 properties of the RegExp object
in JScript.
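In JavaScript, the same captured submatch is also available as element 1 of the array returned by RegExp.prototype.exec. This sketch uses that array rather than the legacy RegExp.$1 property.

```javascript
const grouped = /^(Chapter|Section) [1-9][0-9]{0,1}$/;

const result = grouped.exec("Section 22");
console.log(result[0]);                       // Section 22 (the whole match)
console.log(result[1]);                       // Section (captured submatch 1)
console.log(grouped.test("Chapter 22"));      // true
console.log(grouped.test("See Section 2"));   // false: must start the line
```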
In the above example, you merely want to use the parentheses to group
a choice between the words 'Chapter' and 'Section'. To prevent the
match from being saved for possible later use, place '?:' before the
regular expression pattern inside the parentheses. The following modification
provides the same capability without saving the submatch:
/^(?:Chapter|Section) [1-9][0-9]{0,1}$/
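A quick JavaScript check shows the difference from the capturing version. The match array holds only the overall match, with no element for the group.

```javascript
const nonCapturing = /^(?:Chapter|Section) [1-9][0-9]{0,1}$/;

const found = nonCapturing.exec("Chapter 7");
console.log(found[0]);        // Chapter 7
console.log(found.length);    // 1: the whole match only, no submatch
console.log(nonCapturing.test("Section 12")); // true
```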
In addition to the '?:' metacharacters, there are two other non-capturing
metacharacters used for something called lookahead matches.
A positive lookahead, specified using '?=', matches the search string
at any point where a matching regular expression pattern in parentheses
begins. A negative lookahead, specified using '?!', matches the search
string at any point where a string not matching the regular expression
pattern begins.
For example, suppose you have a document containing references to
Windows 3.1, Windows 95, Windows 98, and Windows NT. Suppose further
that you need to update the document by finding all the references
to Windows 95, Windows 98, and Windows NT and changing those references
to Windows 2000. You can use the following JScript regular expression,
which is an example of a positive lookahead, to match the word 'Windows'
in Windows 95, Windows 98, and Windows NT:
/Windows(?= 95| 98| NT)/
Once the match is found, the search for the next match begins immediately
following the matched text, not including the characters included
in the lookahead. For example, if the expression shown above matched
the 'Windows' in 'Windows 98', the search resumes after 'Windows', not
after '98'.
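The following JavaScript sketch records where each match starts and where the next search resumes (the regex's lastIndex). The references are written with a space, as in 'Windows 95', so the sketch puts that space inside the lookahead. The helper matchPositions is hypothetical.

```javascript
const doc = "Windows 3.1, Windows 95, Windows 98, and Windows NT";

// Hypothetical helper: collect [match index, lastIndex] for a global regex.
function matchPositions(re, text) {
  const out = [];
  let m;
  re.lastIndex = 0;
  while ((m = re.exec(text)) !== null) {
    out.push([m.index, re.lastIndex]);
  }
  return out;
}

// Positive lookahead: "Windows" only when followed by " 95", " 98", or " NT".
const positive = /Windows(?= 95| 98| NT)/g;
// Negative lookahead: "Windows" only when NOT followed by one of those.
const negative = /Windows(?! 95| 98| NT)/g;

// Each lastIndex is match index + 7: only "Windows" was consumed.
console.log(JSON.stringify(matchPositions(positive, doc))); // [[13,20],[25,32],[41,48]]
console.log(JSON.stringify(matchPositions(negative, doc))); // [[0,7]]
```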
Anchors
So far, the examples you've seen have been concerned only with finding
chapter headings wherever they occur. Any occurrence of the string
'Chapter' followed by a space, followed by a number, could be an actual
chapter heading, or it could also be a cross-reference to another
chapter. Since true chapter headings always appear at the beginning
of a line, you'll need to devise a way to find only the headings and
not find the cross-references.
Anchors provide that capability. Anchors allow you to fix a regular
expression to either the beginning or end of a line. They also allow
you to create regular expressions that occur either within a word
or at the beginning or end of a word. The following table contains
the list of regular expression anchors and their meanings:
| Character | Description |
| ^ | Matches the position at the beginning of the input string. If the RegExp object's Multiline property is set, ^ also matches the position following '\n' or '\r'. |
| $ | Matches the position at the end of the input string. If the RegExp object's Multiline property is set, $ also matches the position preceding '\n' or '\r'. |
| \b | Matches a word boundary, that is, the position between a word character and a nonword character such as a space, or between a word character and the start or end of the string. |
| \B | Matches a nonword boundary. |
You cannot use a quantifier with an anchor. Since you cannot have
more than one position immediately before or after a newline or word
boundary, expressions such as '^*' are not permitted.
To match text at the beginning of a line of text, use the '^' character
at the beginning of the regular expression. Do not confuse this use
of the '^' with the use within a bracket expression.
To match text at the end of a line of text, use the '$' character
at the end of the regular expression.
To use anchors when searching for chapter headings, the following
JScript regular expression matches a chapter heading with up to two
following digits that occurs at the beginning of a line:
/^Chapter [1-9][0-9]{0,1}/
Not only does a true chapter heading occur at the beginning of a
line, it is also the only text on the line, so it must also be at
the end of the line. The following expression ensures that the
match specified only matches chapters and not cross-references. It
does so by creating a regular expression that matches only at the
beginning and end of a line of text.
/^Chapter [1-9][0-9]{0,1}$/
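For a document with several lines, ^ and $ only apply per line when the Multiline behavior is on. In JavaScript that is the m flag. The three-line difference is shown below on a small hypothetical document containing two headings and a cross-reference.

```javascript
// A small hypothetical document: headings on their own lines, plus a cross-reference.
const text = [
  "Chapter 1",
  "Regular expressions are introduced here; see Chapter 12 for details.",
  "Chapter 12",
  "Chapter 3 continues on this line",
].join("\n");

// Unanchored: finds headings and the cross-reference alike.
console.log(text.match(/Chapter [1-9][0-9]{0,1}/g).length);   // 4
// Anchored, without the m flag: ^ and $ refer only to the whole string.
console.log(text.match(/^Chapter [1-9][0-9]{0,1}$/g));        // null
// Anchored, with the m flag (JavaScript's counterpart of Multiline).
console.log(JSON.stringify(text.match(/^Chapter [1-9][0-9]{0,1}$/gm))); // ["Chapter 1","Chapter 12"]
```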
Matching word boundaries is a little different but adds a very important
capability to regular expressions. A word boundary is the position
between a word character and a nonword character such as a space, or
between a word character and the start or end of the string. A nonword
boundary is any other position.
The following JScript expression matches the first three characters
of the word 'Chapter' because they appear following a word boundary:
/\bCha/
The position of the '\b' operator is critical. If it is positioned
at the beginning of a string to be matched, it looks for the match
at the beginning of the word; if it is positioned at the end of the
string, it looks for the match at the end of the word. For example,
the following expression matches 'ter' in the word 'Chapter' because
it appears before a word boundary:
/ter\b/
The following expression matches 'apt' as it occurs in 'Chapter',
but not as it occurs in 'aptitude':
/\Bapt/
The string 'apt' occurs on a nonword boundary in the word 'Chapter'
but on a word boundary in the word 'aptitude'. The position of the \B
nonword boundary operator matters as well: 'apt\B' would match the 'apt'
in both words, because in each case 'apt' is followed by another word
character.
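These word-boundary examples can be checked directly in JavaScript. The fifth line shows that moving \B to the other side of 'apt' changes which words match.

```javascript
console.log(/\bCha/.test("Chapter"));    // true: "Cha" begins the word
console.log(/ter\b/.test("Chapter"));    // true: "ter" ends the word
console.log(/\Bapt/.test("Chapter"));    // true: "apt" is inside the word
console.log(/\Bapt/.test("aptitude"));   // false: "apt" begins the word
console.log(/apt\B/.test("aptitude"));   // true: a word character follows "apt"
console.log(/er\b/.test("verb"));        // false: "er" is followed by "b"
```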
Character Matching
The period (.) matches any single printing or non-printing character
in a string, except a newline character (\n). The following JScript
regular expression matches 'aac', 'abc', 'acc', 'adc', and so on,
as well as 'a1c', 'a2c', 'a-c', and 'a#c':
/a.c/
If you are trying to match a string containing a file name where
a period (.) is part of the input string, you do so by preceding the
period in the regular expression with a backslash (\) character. To
illustrate, the following JScript regular expression matches 'filename.ext':
/filename\.ext/
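The difference between an unescaped and an escaped period is easy to see in JavaScript.

```javascript
const periodSamples = ["abc", "a1c", "a-c", "a#c", "ac", "a\nc"];
console.log(periodSamples.map((s) => /a.c/.test(s)).join(" "));
// true true true true false false
console.log(/filename\.ext/.test("filename.ext"));  // true
console.log(/filename\.ext/.test("filenameXext"));  // false
console.log(/filename.ext/.test("filenameXext"));   // true: an unescaped . matches X
```

The 'ac' sample fails because '.' must match exactly one character, and 'a\nc' fails because '.' does not match a newline.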
These expressions are still pretty limited. They only let you match any single character. Many times, it is useful to match specified
characters from a list. For example, if you have an input text that
contains chapter headings that are expressed numerically as Chapter
1, Chapter 2, and so on, you might want to find those chapter headings.
Bracket Expressions
You can create a list of matching characters by placing one or more
individual characters within square brackets ([ and ]). When characters
are enclosed in brackets, the list is called a bracket expression.
Within brackets, as anywhere else, ordinary characters represent themselves,
that is, they match an occurrence of themselves in the input text.
Most special characters lose their meaning when they occur inside
a bracket expression. Here are some exceptions:
- The ']' character ends a list if it is not the first item. To
match the ']' character in a list, place it first, immediately following
the opening '['. Escaping it as '\]' also works, and is the safer choice
in engines, such as JavaScript's, that treat '[]' as an empty set.
- The '\' character continues to be the escape character. To match
the '\' character, use '\\'.
Characters enclosed in a bracket expression match only a single character
for the position in the regular expression where the bracket expression
appears. The following JScript regular expression matches 'Chapter
1', 'Chapter 2', 'Chapter 3', 'Chapter 4', and 'Chapter 5':
/Chapter [12345]/
Notice that the word 'Chapter' and the space that follows are fixed
in position relative to the characters within brackets. The bracket
expression, then, specifies only the set of characters that
matches the single character position immediately following the word
'Chapter' and a space. That is the ninth character position.
If you want to express the matching characters using a range instead
of the characters themselves, you can separate the beginning and ending
characters in the range using the hyphen (-) character. The character
value of the individual characters determines their relative order
within a range. The following JScript regular expression contains
a range expression that is equivalent to the bracketed list shown
above.
/Chapter [1-5]/
When a range is specified in this manner, both the starting and ending
values are included in the range. It is important to note that the
starting value must precede the ending value in Unicode sort order.
If you want to include the hyphen character in your bracket expression,
you must do one of the following:
- Escape it with a backslash: [\-]
- Put the hyphen character at the beginning or the end of the bracketed
list. The following expressions match all lowercase letters and
the hyphen:
[-a-z]
[a-z-]
- Create a range where the beginning character value is lower than
the hyphen character and the ending character value is equal to
or greater than the hyphen. Both of the following regular expressions
satisfy this requirement:
[!--]
[!-~]
You can also find all the characters not in the list or range by
placing the caret (^) character at the beginning of the list. If the
caret character appears in any other position within the list, it
matches itself, that is, it has no special meaning. The following
JScript regular expression matches chapter headings whose number does
not begin with 1, 2, 3, 4, or 5:
/Chapter [^12345]/
In the example shown above, the expression matches any character
in the ninth position except 1, 2, 3, 4, or 5. So, for example, 'Chapter
7' is a match and so is 'Chapter 9'. Because the negated list accepts
every other character, 'Chapter 0' and 'Chapter A' also match, while
'Chapter 12' does not.
The same expression can be written more compactly as a range, using
the hyphen character (-):
/Chapter [^1-5]/
A typical use of a bracket expression is to specify matches of any
upper- or lowercase alphabetic characters or any digits. The following
JScript expression specifies such a match:
/[A-Za-z0-9]/
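The bracket expressions in this section can be compared side by side in JavaScript. The sample headings are hypothetical. They show that '[^1-5]' accepts any other single character, including '0' and letters, and that it rejects 'Chapter 10' because '1' is excluded.

```javascript
const headings = ["Chapter 1", "Chapter 5", "Chapter 7", "Chapter 0", "Chapter A", "Chapter 10"];
// Keep only the headings the pattern matches, in their original order.
const matching = (re) => headings.filter((h) => re.test(h)).join(", ");

console.log(matching(/Chapter [1-5]/));        // Chapter 1, Chapter 5, Chapter 10
console.log(matching(/Chapter [^1-5]/));       // Chapter 7, Chapter 0, Chapter A
console.log(/^[a-z-]+$/.test("well-known"));   // true: trailing hyphen is literal
console.log(/^[a-z\-]+$/.test("well-known"));  // true: escaped hyphen
console.log(/[A-Za-z0-9]/.test("!?#"));        // false: no letter or digit
```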
Nonprinting Characters
There are a number of useful non-printing characters that you need to
match occasionally. The following table shows the escape sequences
used to represent those non-printing characters:
| Character | Meaning |
| \cx | Matches the control character indicated by x. For example, \cM matches a Control-M or carriage return character. The value of x must be in the range of A-Z or a-z. If not, c is assumed to be a literal 'c' character. |
| \f | Matches a form-feed character. Equivalent to \x0c and \cL. |
| \n | Matches a newline character. Equivalent to \x0a and \cJ. |
| \r | Matches a carriage return character. Equivalent to \x0d and \cM. |
| \s | Matches any white space character including space, tab, form-feed, and so on. Equivalent to [ \f\n\r\t\v]. |
| \S | Matches any non-white space character. Equivalent to [^ \f\n\r\t\v]. |
| \t | Matches a tab character. Equivalent to \x09 and \cI. |
| \v | Matches a vertical tab character. Equivalent to \x0b and \cK. |
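A few of these equivalences, tested in JavaScript:

```javascript
console.log(/\cJ/.test("\n"));                // true: \cJ is the newline character
console.log(/\cM\n/.test("line\r\n"));        // true: carriage return + newline
console.log(/\x09/.test("a\tb"));             // true: \x09 is the tab character
console.log("a\tb c\nd".split(/\s/).length);  // 4: split on tab, space, and newline
console.log(/\S/.test(" \t\n\r"));            // false: only white space
```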
Order of Precedence
Once you have constructed a regular expression, it is evaluated much
like an arithmetic expression, that is, it is evaluated from left
to right and follows an order of precedence.
The following table illustrates, from highest to lowest, the order
of precedence of the various regular expression operators:
| Operator(s) | Description |
| \ | Escape |
| (), (?:), (?=), [] | Parentheses and Brackets |
| *, +, ?, {n}, {n,}, {n,m} | Quantifiers |
| ^, $, \anymetacharacter | Anchors and Sequences |
| '|' | Alternation |
Characters have higher precedence than the alternation operator,
which allows 'm|food' to match "m" or "food". To match "mood" or "food",
use parentheses to create a subexpression, which results in '(m|f)ood'.
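The effect of alternation's low precedence, shown in JavaScript:

```javascript
// Alternation has the lowest precedence: 'm|food' means "m" OR "food".
console.log("mood".match(/m|food/)[0]);    // m
console.log("food".match(/m|food/)[0]);    // food
// Parentheses limit the alternation to a single character position.
console.log("mood".match(/(m|f)ood/)[0]);  // mood
console.log(/^(m|f)ood$/.test("good"));    // false
```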
Ordinary Characters
Ordinary characters consist of all printable and non-printable characters
that are not explicitly designated as metacharacters. This includes
all uppercase and lowercase alphabetic characters, all digits, all
punctuation marks, and some symbols.
The simplest form of a regular expression is a single, ordinary character
that matches itself in a searched string. For example, the single-character
pattern 'A' matches the letter 'A' wherever it appears in the searched
string. Here are some examples of single-character regular expression
patterns:
/a/
/7/
/M/
You can combine a number of single characters to form a larger expression.
For example, the following JScript regular expression is nothing more
than an expression created by combining the single-character expressions
'a', '7', and 'M'.
/a7M/
Notice that there is no concatenation operator. All that is required
is that you just put one character after another.
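In JavaScript, concatenated ordinary characters must appear together, in order and with the same case, for a match.

```javascript
console.log("xa7My".search(/a7M/));  // 1: "a7M" starts at index 1
console.log(/a7M/.test("a7m"));      // false: 'M' does not match 'm'
console.log(/A/.test("BANANA"));     // true: a single character matches itself
```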
Quantifiers
Sometimes, you do not know how many characters there are to match.
In order to accommodate that kind of uncertainty, regular expressions
support the concept of quantifiers. These quantifiers let you specify
how many times a given component of your regular expression must occur
for a match to succeed.
The following table illustrates the various quantifiers and their
meanings:
| Character | Description |
| * | Matches the preceding character or subexpression zero or more times. For example, 'zo*' matches "z" and "zoo". * is equivalent to {0,}. |
| + | Matches the preceding character or subexpression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}. |
| ? | Matches the preceding character or subexpression zero or one time. For example, 'do(es)?' matches the "do" in "do" or "does". ? is equivalent to {0,1}. |
| {n} | n is a nonnegative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in "Bob", but matches the two o's in "food". |
| {n,} | n is a nonnegative integer. Matches at least n times. For example, 'o{2,}' does not match the 'o' in "Bob" but matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+', and 'o{0,}' is equivalent to 'o*'. |
| {n,m} | m and n are nonnegative integers, where n <= m. Matches at least n and at most m times. For example, 'o{1,3}' matches the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. Note that you cannot put a space between the comma and the numbers. |
With a large input document, chapter numbers could easily exceed
nine, so you need a way to handle two or three digit chapter numbers.
Quantifiers give you that capability. The following JScript regular
expression matches chapter headings with any number of digits:
/Chapter [1-9][0-9]*/
Notice that the quantifier appears after the range expression. Therefore,
it applies to the entire range expression which, in this case, specifies
only digits from 0 through 9, inclusive.
The '+' quantifier is not used here because there does not necessarily
need to be a digit in the second or subsequent position. The '?' character
also is not used because it limits the chapter numbers to only two
digits. You want to match at least one digit following 'Chapter' and
a space character.
If you know that your chapter numbers are limited to only 99 chapters,
you can use the following JScript expression to specify at least one,
but not more than 2 digits.
/Chapter [0-9]{1,2}/
The disadvantage to the expression shown above is that if there is
a chapter number greater than 99, it will still only match the first
two digits. Another disadvantage is that somebody could create a Chapter
0 and it would match. Better JScript expressions for matching at most
two digits, without a leading 0, are the following:
/Chapter [1-9][0-9]?/
-or-
/Chapter [1-9][0-9]{0,1}/
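The three quantifier variants can be compared on the same hypothetical inputs in JavaScript, where "-" stands for no match. The last column shows that even the two-digit forms still match only the first two digits of 'Chapter 100'.

```javascript
const chapterSamples = ["Chapter 0", "Chapter 7", "Chapter 42", "Chapter 100"];
// Show what each pattern matches in each sample ("-" means no match).
const show = (re) =>
  chapterSamples.map((s) => { const m = s.match(re); return m ? m[0] : "-"; }).join(" / ");

console.log(show(/Chapter [1-9][0-9]*/));  // - / Chapter 7 / Chapter 42 / Chapter 100
console.log(show(/Chapter [0-9]{1,2}/));   // Chapter 0 / Chapter 7 / Chapter 42 / Chapter 10
console.log(show(/Chapter [1-9][0-9]?/));  // - / Chapter 7 / Chapter 42 / Chapter 10
```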
The '*', '+', and '?' quantifiers are all greedy, that is, they match as much text as possible. Sometimes
that is not at all what you want to happen. Sometimes, you just want
a minimal match.
Say, for example, you are searching an HTML document for an occurrence
of a chapter title enclosed in an H1 tag. That text appears in your
document as:
<H1>Chapter 1 – Introduction to Regular Expressions</H1>
The following expression matches everything from the opening less-than
symbol (<) to the greater-than symbol (>) at the end of
the closing H1 tag.
/<.*>/
If all you really wanted to match was the opening H1 tag, the following
non-greedy expression matches only <H1>.
/<.*?>/
-or-
"<.*?>"
By placing the '?' after a '*', '+', or '?' quantifier, the expression
is transformed from a greedy to a non-greedy, or minimal, match.
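Applied to the H1 line from the example, the greedy and non-greedy forms behave as follows in JavaScript.

```javascript
const html = "<H1>Chapter 1 – Introduction to Regular Expressions</H1>";

console.log(html.match(/<.*>/)[0] === html);   // true: greedy, runs to the last >
console.log(html.match(/<.*?>/)[0]);           // <H1>
console.log(html.match(/<.*?>/g).join(" "));   // <H1> </H1>
```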
Special Characters
There are a number of metacharacters that require special treatment
when trying to match them. To match these special characters, you
must first escape those characters, that is, precede them with
a backslash character (\). The following table shows those special
characters and their meanings:
| Special Character | Comment |
| $ | Matches the position at the end of an input string. If the RegExp object's Multiline property is set, $ also matches the position preceding '\n' or '\r'. To match the $ character itself, use \$. |
| ( ) | Marks the beginning and end of a subexpression. Subexpressions can be captured for later use. To match these characters, use \( and \). |
| * | Matches the preceding character or subexpression zero or more times. To match the * character, use \*. |
| + | Matches the preceding character or subexpression one or more times. To match the + character, use \+. |
| . | Matches any single character except the newline character \n. To match the . character itself, use \. |
| [ ] | Marks the beginning and end of a bracket expression. To match these characters, use \[ and \]. |
| ? | Matches the preceding character or subexpression zero or one time, or indicates a non-greedy quantifier. To match the ? character, use \?. |
| \ | Marks the next character as either a special character, a literal, a backreference, or an octal escape. For example, 'n' matches the character 'n', and '\n' matches a newline character. The sequence '\\' matches "\" and '\(' matches "(". |
| / | Denotes the start or end of a literal regular expression. To match the '/' character, use '\/'. |
| ^ | Matches the position at the beginning of an input string, except when used in a bracket expression, where it negates the character set. To match the ^ character itself, use \^. |
| { } | Marks the beginning and end of a quantifier expression. To match these characters, use \{ and \}. |
| '|' | Indicates a choice between two items. To match the '|' character itself, use \|. |
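When the text to match comes from user input, escaping every special character by hand is error-prone. This JavaScript sketch backslash-escapes each character from the table so the text is matched literally. The helper name escapeRegExp is hypothetical, not a built-in.

```javascript
// Hypothetical helper: precede every special character from the table
// with a backslash so the whole string is matched literally.
function escapeRegExp(text) {
  return text.replace(/[$()*+.[\]?\\\/^{}|]/g, "\\$&");
}

const literal = new RegExp(escapeRegExp("(1+2)*3 = $9?"));
console.log(literal.test("(1+2)*3 = $9?"));  // true
console.log(literal.test("12223 = 9"));      // false: no special meanings remain
```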