
Tutorial 15 - Regular Expressions

Regular expressions are a shorthand notation for matching, extracting, sorting or formatting strings. Their most common use is to reduce the amount of work needed to validate data input. This tutorial explains how to use a regular expression, describes the pattern syntax, and gives several useful examples.

Using Regular Expressions

You should always test a regular expression before using it in your own scripts. One easy site to use is Dan's Tools. A cookbook of useful RegExp enabled functions is provided by O'Sullivan. To use a regular expression for validating an entry in JavaScript, first set up a variable that contains the expression.
Note: Forward slashes are used to quote a regular expression while ' and " are used to quote a string expression.

var re = /whatever/;

Then apply the regular expression test method on the string to be tested.

if (re.test(entryValue)) {return true;}
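The two steps above can be put together in a small validator. This is a minimal sketch only; the function name isAllDigits() and the digits-only pattern are illustrative assumptions, not part of the original tutorial.

```javascript
// Hypothetical helper: true when the entry consists only of digits.
function isAllDigits(entryValue) {
  var re = /^\d+$/;            // one or more digits and nothing else
  return re.test(entryValue);  // test() returns true or false
}

console.log(isAllDigits("12345")); // true
console.log(isAllDigits("12a45")); // false
```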

To use a regular expression to extract a matching string, first set up a regular expression variable as above. Next use the regular expression exec method on the string. A match is returned as an array (the whole match at index 0, followed by any stored groups); null indicates no match.

var ar = re.exec(var_string);
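As a short illustration of what exec returns, the sketch below pulls a year and month out of a sentence. The pattern and the sample string are made up for this example.

```javascript
var re = /(\d{4})-(\d{2})/;                // year and month, each in a stored group
var ar = re.exec("Updated 2016-02 by JR");

if (ar !== null) {
  console.log(ar[0]);    // "2016-02" (the whole match)
  console.log(ar[1]);    // "2016"    (first group)
  console.log(ar[2]);    // "02"      (second group)
  console.log(ar.index); // 8         (position of the match in the string)
}
```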

To use a regular expression for modifying a string in JavaScript, first set up a regular expression variable as above. Next use the string replace method. Note that you can use back references ($1, $2 etc.) in the replacement if required.

var x = y.replace(re,"$1");
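Back references are easiest to see with two groups. The following sketch swaps a "Last, First" name around; the pattern and sample data are illustrative assumptions.

```javascript
// Swap "Last, First" into "First Last" using two stored groups.
var re = /^(\w+),\s*(\w+)$/;
var y = "Smith, John";
var x = y.replace(re, "$2 $1");

console.log(x); // "John Smith"
```

If the pattern does not match, replace returns the string unchanged.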

Escape Sequences and Character Classes

Escape sequences are used to match formatting characters and to prevent certain characters from being interpreted as pattern syntax. Each escape sequence starts with a backslash. The available sequences are:

\r        carriage return
\t        horizontal tab
\v        vertical tab
\\        backslash
\xnn      ASCII char defined by hex code nn
\0nn      ASCII char defined by octal code nn
\unnnn    Unicode char defined by hex sequence nnnn
\cX       Control char defined by X
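A few of these sequences in use, as a quick sketch. The variable names and test strings are invented for illustration.

```javascript
var isA          = /\x41/.test("A");             // hex 41 is the letter A
var isTab        = /\t/.test("one\ttwo");        // a literal tab character
var hasBackslash = /\\/.test("C:\\temp");        // a single backslash
var eAcute       = /\u00e9/.test("caf\u00e9");   // Unicode 00E9 (e with acute accent)
var lineFeed     = /\cJ/.test("a\nb");           // Control-J is the line feed

console.log(isA, isTab, hasBackslash, eAcute, lineFeed); // true true true true true
```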


Special character class abbreviations are used to shorten the amount of typing and specifying required when creating a regular expression. For example \w includes all letters, numbers and the underscore character.

\d          Any digit 0-9
\D          Any non-digit
\s          Any whitespace character
\S          Any single non-whitespace character
\w          Any letter, number or underscore
\W          Any char except letter, number or underscore
.           Any character except newline
[abcde]     Any character in the enclosed set
[^abcde]    Any character not in the enclosed set
[a-e]       Any character in the enclosed range
x|y         Either x or y (i.e. logical OR)
()          Grouping that is stored (back referenced) for later use ($1, $2 etc.)
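The following sketch exercises several of the entries above. The sample strings and variable names are assumptions chosen for the example.

```javascript
var hasDigit = /\d/.test("room 101");                // true: contains a digit
var vowel    = /[aeiou]/.exec("rhythm and blues");   // first character from the set
var pet      = /(cat|dog)s/.exec("three dogs");      // alternation inside a stored group
var patched  = "a_b-c".replace(/\W/, "+");           // underscore is a word char, dash is not

console.log(hasDigit); // true
console.log(vowel[0]); // "a"
console.log(pet[0]);   // "dogs"
console.log(pet[1]);   // "dog"
console.log(patched);  // "a_b+c"
```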


Boundary Matches and Greedy Quantifiers

^         Beginning of string
$         End of string
\b        Word boundary
\B        Non-word boundary

Character Matches Previous Char
*         Zero or more times
+         One or more times
?         Zero or one time
{n}       Exactly n occurrences
{n,}      At least n occurrences
{n,m}     Between n and m occurrences
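"Greedy" means a quantifier takes as many characters as it can while still allowing the whole expression to match. The sketch below shows this along with the boundary markers; all patterns and strings are illustrative.

```javascript
var greedy  = /<.+>/.exec("<b>bold</b>")[0];      // .+ runs on to the last ">"
var inside  = /\bcat\b/.test("concatenate");      // "cat" is not a separate word here
var alone   = /\bcat\b/.test("the cat sat");      // "cat" stands between word boundaries
var phone   = /^\d{3}-\d{4}$/.test("555-1234");   // exact counts, anchored at both ends
var tooMany = /^a{2,3}$/.test("aaaa");            // four is outside the 2 to 3 range

console.log(greedy);  // "<b>bold</b>"
console.log(inside);  // false
console.log(alone);   // true
console.log(phone);   // true
console.log(tooMany); // false
```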


Regular Expression Modifiers

Regular expression modifiers (also called flags) change how the entire expression is applied. They are placed at the end of the expression, after the closing slash, as in /[abc]+/i

g         global search for all matches
i         case-insensitive searches
m         multiple line searches (^ and $ also match at line breaks)
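The effect of each modifier is easiest to see on the same input. This is a small sketch using the string match method; the sample text is made up.

```javascript
var text = "Cat\ncat\nDOG";

var first    = text.match(/cat/)[0];    // no modifiers: first case-sensitive match only
var allCats  = text.match(/cat/gi);     // g and i: every match, any case
var lines    = text.match(/^\w+$/gm);   // m: ^ and $ match at each line break
var noMulti  = text.match(/^\w+$/g);    // without m the whole string must be one word

console.log(first);   // "cat"
console.log(allCats); // [ "Cat", "cat" ]
console.log(lines);   // [ "Cat", "cat", "DOG" ]
console.log(noMulti); // null
```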

Example: Trim() Function

The following trim() function removes leading and trailing whitespace characters from a string. If a second parameter is used, it is used instead of the standard whitespace characters. ltrim() and rtrim() can be used independently!
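The original listing is not reproduced on this page, so what follows is a minimal sketch of functions that behave as described, not the author's code. The helper escapeSet() is a hypothetical addition that makes the optional second parameter safe to place inside a [...] set.

```javascript
// Hypothetical helper: escape characters that are special inside a [...] set.
function escapeSet(chars) {
  return chars.replace(/[\\\]\[^-]/g, "\\$&");
}

// Remove leading characters (whitespace unless chars is given).
function ltrim(str, chars) {
  var set = chars ? escapeSet(chars) : "\\s";
  return str.replace(new RegExp("^[" + set + "]+"), "");
}

// Remove trailing characters (whitespace unless chars is given).
function rtrim(str, chars) {
  var set = chars ? escapeSet(chars) : "\\s";
  return str.replace(new RegExp("[" + set + "]+$"), "");
}

// Remove both leading and trailing characters.
function trim(str, chars) {
  return ltrim(rtrim(str, chars), chars);
}

console.log("[" + trim("  hello  ") + "]");     // "[hello]"
console.log(trim("--a-b--", "-"));              // "a-b"
console.log("[" + ltrim("  hello  ") + "]");    // "[hello  ]"
```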

Example: URLs and Files

Validation of a URL or filename often checks for specific extensions. A regular expression that will catch all image filenames (and more!) is:


The above expression will match only image files that are Web standard. The expression is not foolproof as it permits subfolders with null names such as a//b.gif and specs like a:/b:/c.gif
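The original expression is not shown on this page. As an illustrative stand-in only, the sketch below checks for a few common Web image extensions; the extension list, the permitted path characters and the name isImageFile() are all assumptions. Like the expression described above, it accepts odd paths such as a//b.gif.

```javascript
// Hypothetical pattern: optional path characters, then a Web image extension.
var imageRe = /^[\w.:\/-]*\.(gif|jpe?g|png)$/i;

function isImageFile(name) {
  return imageRe.test(name);
}

console.log(isImageFile("images/logo.png")); // true
console.log(isImageFile("photo.JPG"));       // true
console.log(isImageFile("notes.txt"));       // false
console.log(isImageFile("a//b.gif"));        // true (not foolproof)
```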

Example: Canada Post Code

The Canada Post Code rules are:

  1. Letters and numbers alternate for exactly six characters (eg L0S1E0).
  2. D, F, I, O, Q and U are never used as they can cause optical reader issues.
  3. W and Z are not used as the first letter (region designator).

A 'first version' regular expression for Canadian postal codes is:


This expression makes sure that there are exactly three groups ({3}) of a letter [a-z] followed by a digit \d. The i suffix indicates case insensitivity (i.e. capitals are allowed). The ^ and $ guarantee that no other data is provided. However, this easy-to-understand expression does not allow for an optional space after the third character or for the restricted subsets on each letter. It also doesn't allow for leading or trailing whitespace. The solution is to write out the repetition explicitly, place a (\s)? after the third character to check for zero or one space, and reduce the letter matches to the specific permitted subsets.
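Neither expression is reproduced on this page, so the sketch below reconstructs both from the description above and the three rules. The first follows the description piece by piece; the second is one possible reading of "the solution" and should be treated as an assumption, as should the names simpleRe, strictRe and isPostCode().

```javascript
// First version: three letter-digit pairs, nothing else.
var simpleRe = /^([a-z]\d){3}$/i;

// Refined version (assumed): letters limited to the permitted subsets,
// an optional space after the third character, and optional
// leading and trailing whitespace.
//   first letter:  no D F I O Q U and no W Z
//   other letters: no D F I O Q U
var strictRe = /^\s*[ABCEGHJ-NPRSTVXY]\d[ABCEGHJ-NPRSTV-Z](\s)?\d[ABCEGHJ-NPRSTV-Z]\d\s*$/i;

function isPostCode(entryValue) {
  return strictRe.test(entryValue);
}

console.log(simpleRe.test("L0S1E0"));  // true
console.log(simpleRe.test("L0S 1E0")); // false (space not allowed)
console.log(simpleRe.test("D0D0D0"));  // true  (letter rules not checked)
console.log(isPostCode("L0S 1E0"));    // true
console.log(isPostCode(" l0s1e0 "));   // true
console.log(isPostCode("W0S1E0"));     // false (W as first letter)
console.log(isPostCode("L0D1E0"));     // false (D is never used)
```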


Example: E-mail Addresses

E-mail addresses are of the form xxx@yyy, where xxx is the specific mailbox (and can contain underscores and periods) and yyy is the domain (and can contain a series of suffixes).


This matches the great majority of valid entries. Regular expression literals start and end with forward slashes to differentiate them from ordinary string expressions. Most validation expressions anchor the match at the first character with ^ and at the last with $.

Next, match the mailbox name, which can include periods and dashes. \w+ states that one or more word characters (letters, numbers or underscores) must be at the start of the name. ([\.-]?\w+)* allows periods or dashes to be included in the mailbox name, with the trailing \w+ ensuring that those characters cannot finish the name. The @ is the mandatory separator.

The domain name can have several .xx or .xyz suffixes. Once again \w+ ensures that the domain starts with a word character, and ([\.-]?\w+)* allows for the dashes and periods. Finally (\.\w{2,3})+ ensures that there is at least one suffix of two or three characters preceded by a period.

Note: This is not a completely foolproof validation as it does not account for new domain names of 4 or more characters. Also not all two and three letter combinations are legitimate domains!
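The expression itself is not reproduced on this page. Joining the pieces named in the walkthrough, in the order given, produces the sketch below; the names emailRe and isEmail() are assumptions added for illustration.

```javascript
// Assembled from the pieces described above:
//   ^  \w+  ([\.-]?\w+)*  @  \w+  ([\.-]?\w+)*  (\.\w{2,3})+  $
var emailRe = /^\w+([\.-]?\w+)*@\w+([\.-]?\w+)*(\.\w{2,3})+$/;

function isEmail(entryValue) {
  return emailRe.test(entryValue);
}

console.log(isEmail("user@example.com"));              // true
console.log(isEmail("first.last@mail.example.co.uk")); // true
console.log(isEmail("user@example"));                  // false (no suffix)
console.log(isEmail(".user@example.com"));             // false (starts with a period)
console.log(isEmail("user@example.info"));             // false (four-letter suffix)
```

The last result shows the limitation mentioned in the note: widening {2,3} to {2,} would be one way to accept longer suffixes.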

JR's HomePage | Comments [jstutorf.htm:2016 02 18]