Regular Expressions with ColdFusion - a Howto Guide
By Pete Freitag
Regular Expressions are a powerful tool for developers and computer users alike. They were originally developed on Unix systems and are used in programs like Perl, sed, and grep. You may find slight variations between the programs that use Regular Expressions, but for the most part they are very similar.
Regular Expressions are a simple pattern matching language. They are typically used either to find a string or substring that matches a pattern, or to perform some sort of translation on the string using a pattern. Although Regular Expressions often look cryptic, as if someone fell on the keyboard, there are only 12 key elements to the language. Once you learn these key elements you can build almost any pattern you need.
ColdFusion's support of Regular Expressions lies within the functions REFind, REReplace, REReplaceNoCase, and REFindNoCase. The REFind functions return the position of the matching pattern in the string, and the REReplace functions allow you to replace sub-strings matching the pattern with another string.
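As a quick orientation, here is a minimal sketch of the four functions in cfscript. The sample string is a hypothetical value chosen for illustration; the comments show the result each call should produce.

```cfml
<cfscript>
// Hypothetical sample string, used only for illustration.
sample = "ColdFusion makes regex easy";

// REFind and REFindNoCase take the pattern first, then the string.
// They return the 1-based position of the first match, or 0 when nothing matches.
writeOutput(REFind("regex", sample) & "<br>");        // 18
writeOutput(REFind("REGEX", sample) & "<br>");        // 0 (REFind is case sensitive)
writeOutput(REFindNoCase("REGEX", sample) & "<br>");  // 18

// REReplace and REReplaceNoCase take the string first, then the pattern.
writeOutput(REReplace(sample, "easy", "fun") & "<br>");               // ColdFusion makes regex fun
writeOutput(REReplaceNoCase(sample, "COLDFUSION", "CFML") & "<br>");  // CFML makes regex easy
</cfscript>
```

Note the difference in argument order: the REFind functions take the pattern first, while the REReplace functions take the string first.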
Let's start with three symbols that we will call the quantifiers: *, ?, and +. Quantifiers specify how many times the preceding character may occur. The * quantifier means zero or more, so the preceding character may be absent, occur once, or repeat any number of times. The examples below use the REReplace function, whose basic syntax is:
REReplace(string, pattern, replacement)
Here are some examples using the * quantifier:
REReplace("Ahhhhh", "Ah*", "Matched") returns
Matched (the * matches all five h's)
REReplace("A", "Ah*", "Matched") returns
Matched (there were zero h's; the * matches zero or more)
The next quantifier is the ?, which matches zero or one of the preceding character.
REReplace("Ah", "Ah?", "Matched") returns
Matched
REReplace("A", "Ah?", "Matched") returns
Matched
REReplace("Ahhhhh", "Ah?", "Matched") returns
Matchedhhhh (the ? matches only the first h, so the other four remain)
The third quantifier, as you may be able to guess, is the +, which matches one or more of the preceding character.
REReplace("Ah", "Ah+", "Matched") returns
Matched
REReplace("A", "Ah+", "Matched") returns
A (must be at least one h in the string)
REReplace("Ahhhhh", "Ah+", "Matched") returns
Matched
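To see the three quantifiers side by side, here is a small sketch that uses REFind instead of REReplace. The test words are hypothetical; a result of 0 means the pattern did not match.

```cfml
<cfscript>
// * : zero or more of the preceding character
writeOutput(REFind("go*gle", "ggle") & "<br>");      // 1 (zero o's is allowed)
writeOutput(REFind("go*gle", "gooogle") & "<br>");   // 1

// ? : zero or one of the preceding character
writeOutput(REFind("colou?r", "color") & "<br>");    // 1
writeOutput(REFind("colou?r", "colour") & "<br>");   // 1
writeOutput(REFind("colou?r", "colouur") & "<br>");  // 0 (two u's is one too many)

// + : one or more of the preceding character
writeOutput(REFind("go+gle", "ggle") & "<br>");      // 0 (at least one o is required)
writeOutput(REFind("go+gle", "google") & "<br>");    // 1
</cfscript>
```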
Regular Expressions also have some other special characters besides the quantifiers; they are used to represent a set of possible characters. The first special character we will look at is the . (dot). It represents any single character, including white space and new lines (chr(10), chr(13), etc.). For example, with the pattern "do." you can match the words dot and dog, but you cannot match do. However, if you add the * quantifier to get the pattern "do.*", you can match dot, dog, do, and door.
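The dot examples can be checked with a short sketch like the following. The last two lines use REReplace with a hypothetical replacement "X" to show how much of the word each pattern consumes.

```cfml
<cfscript>
// "do." needs exactly one character after "do".
writeOutput(REFind("do.", "dot") & "<br>");   // 1
writeOutput(REFind("do.", "dog") & "<br>");   // 1
writeOutput(REFind("do.", "do") & "<br>");    // 0 (nothing left for the dot to match)

// "do.*" allows zero or more characters after "do".
writeOutput(REFind("do.*", "do") & "<br>");   // 1
writeOutput(REFind("do.*", "door") & "<br>"); // 1

// The replacement shows how much of the string each pattern matched.
writeOutput(REReplace("door", "do.", "X") & "<br>");   // Xr (only "doo" was matched)
writeOutput(REReplace("door", "do.*", "X") & "<br>");  // X  (the whole word was matched)
</cfscript>
```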
Two special characters anchor a pattern: ^ represents the beginning of the string (when it appears outside of square brackets), and $ represents the end of the string.
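A minimal sketch of the two anchors, using hypothetical test strings:

```cfml
<cfscript>
// ^ ties the match to the start of the string.
writeOutput(REFind("^dog", "dog house") & "<br>");  // 1
writeOutput(REFind("^dog", "hot dog") & "<br>");    // 0 (dog is not at the start)

// $ ties the match to the end of the string.
writeOutput(REFind("dog$", "hot dog") & "<br>");    // 5
writeOutput(REFind("dog$", "dog house") & "<br>");  // 0 (dog is not at the end)

// Using both requires the whole string to match the pattern.
writeOutput(REFind("^dog$", "dog") & "<br>");       // 1
writeOutput(REFind("^dog$", "dogs") & "<br>");      // 0
</cfscript>
```

Anchoring both ends is a common way to validate an entire value rather than find a pattern somewhere inside it.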
The next element of Regular Expressions we will look at is the square brackets [ ]. With square brackets you can specify a finite set of characters that may appear in the current character position. For example, the pattern "[dl]og" will match dog or log. The brackets stand for exactly one character, so the pattern never matches dlog as a whole; in the string dlog it finds only the log portion.
By adding a ^ (caret) as the first character in the brackets you negate the set. Let's add the ^ to the pattern above: "[^dl]og" now matches any character other than d or l followed by og, such as fog.
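Here is a short sketch of bracket sets and negated sets. The test words are hypothetical, and the anchored pattern "^[dl]og$" is added only to show a whole-string check.

```cfml
<cfscript>
// A bracket set matches exactly one character from the set.
writeOutput(REFind("[dl]og", "dog") & "<br>");     // 1
writeOutput(REFind("[dl]og", "log") & "<br>");     // 1
writeOutput(REFind("[dl]og", "fog") & "<br>");     // 0
writeOutput(REFind("[dl]og", "dlog") & "<br>");    // 2 (finds "log" inside "dlog")
writeOutput(REFind("^[dl]og$", "dlog") & "<br>");  // 0 (the whole string is not dog or log)

// A leading ^ inside the brackets negates the set.
writeOutput(REFind("[^dl]og", "fog") & "<br>");    // 1
writeOutput(REFind("[^dl]og", "dog") & "<br>");    // 0
writeOutput(REFind("[^dl]og", "og") & "<br>");     // 0 (some character is still required before og)
</cfscript>
```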
The brackets also allow you to specify a range of characters using the - (dash) character. Here's an example using the - character: "[0-9]?[0-9]/[0-9]?[0-9]/[0-9][0-9]". Can you figure out what it is matching? It matches dates such as 1/1/01 or 12/12/02. Using the ? quantifier, we made the first digit of the month and day optional.
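The date pattern can be tried out with a sketch like this one. The variable names and the anchored variant are assumptions added for illustration; the pattern itself is the one from the text.

```cfml
<cfscript>
// The pattern from the text, plus an anchored variant for whole-string checks.
datePattern = "[0-9]?[0-9]/[0-9]?[0-9]/[0-9][0-9]";
wholeDate   = "^" & datePattern & "$";

writeOutput(REFind(wholeDate, "1/1/01") & "<br>");      // 1
writeOutput(REFind(wholeDate, "12/12/02") & "<br>");    // 1
writeOutput(REFind(wholeDate, "123/1/01") & "<br>");    // 0 (month has three digits)
writeOutput(REFind(wholeDate, "1/1/2001") & "<br>");    // 0 (year has four digits)

// Without anchors the pattern is found inside the longer string.
writeOutput(REFind(datePattern, "1/1/2001") & "<br>");  // 1

// The pattern checks the shape only, not whether the date is real.
writeOutput(REFind(wholeDate, "99/99/99") & "<br>");    // 1
</cfscript>
```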
The | (pipe) special character is used as a logical OR. Characters in a pattern are implicitly joined in sequence, a kind of logical AND, unless you separate alternatives with the | character. If you wanted a regular expression to match car or bar you could use the pattern "car|bar". Note that "c|bar" would not do: the | applies to everything on either side of it, so that pattern matches either a lone c or bar.
You may have noticed with that last example that we could have also matched car or bar using brackets: "[cb]ar". The | character becomes more useful for matching alternative sequences of characters, which is done with our next special characters, the parentheses ( ). Parentheses are used to group characters together. Here's an example using parentheses and the | character: "(Mon|Tues)day", which matches either Monday or Tuesday.
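Because the | character separates everything to its left from everything to its right, grouping matters. The following sketch, with hypothetical test words, shows the difference between a grouped and an ungrouped alternation.

```cfml
<cfscript>
// Ungrouped: the alternatives are "c" and "bar".
writeOutput(REFind("c|bar", "cat") & "<br>");            // 1 (the lone c matches)

// Grouped: the alternatives are "c" and "b", each followed by "ar".
writeOutput(REFind("(c|b)ar", "cat") & "<br>");          // 0
writeOutput(REFind("(c|b)ar", "car") & "<br>");          // 1
writeOutput(REFind("(c|b)ar", "bar") & "<br>");          // 1

// The same idea with longer alternatives.
writeOutput(REFind("(Mon|Tues)day", "Monday") & "<br>");   // 1
writeOutput(REFind("(Mon|Tues)day", "Tuesday") & "<br>");  // 1
writeOutput(REFind("(Mon|Tues)day", "Sunday") & "<br>");   // 0
</cfscript>
```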
One final special character is the \ character, which is used for escaping any of our quantifiers or special characters. Let's look at a practical example: validating an email address. To build this pattern we will simply break down what an email address is. An email address starts with a username, ".+"; next comes the @ sign, giving ".+@"; then comes a domain name that contains at least one dot, which we need to escape. Our pattern is ".+@.+\..+". Here's a code snippet that you can use:
<cfif NOT ReFind(".+@.+\..+", form.emailAddress)>
    You entered an invalid email address.
    <cfabort>
</cfif>
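The same check could also be wrapped in a small function for reuse. This is only a sketch: isPlausibleEmail is a hypothetical helper name, not a built-in function, and the addresses are made-up test values.

```cfml
<cfscript>
// isPlausibleEmail is a hypothetical helper, not part of ColdFusion.
// It applies the pattern from the text and returns true when it matches.
function isPlausibleEmail(required string address) {
    return REFind(".+@.+\..+", arguments.address) GT 0;
}

writeOutput(yesNoFormat(isPlausibleEmail("user@example.com")) & "<br>");  // Yes
writeOutput(yesNoFormat(isPlausibleEmail("userexample.com")) & "<br>");   // No (no @ sign)
writeOutput(yesNoFormat(isPlausibleEmail("user@example")) & "<br>");      // No (no dot in the domain)
writeOutput(yesNoFormat(isPlausibleEmail("@example.com")) & "<br>");      // No (no username)

// The pattern is deliberately loose, so odd input can still pass.
writeOutput(yesNoFormat(isPlausibleEmail("a b@c.d")) & "<br>");           // Yes
</cfscript>
```

As the last line shows, the pattern only checks the general shape of an address, so it is best treated as a first-pass filter.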
The \ is also used for a very handy feature called the back reference. It allows you to use your groups (patterns in parentheses) again in your pattern or in your replacement. The back reference \1 represents the first group, \2 the second, etc. Here's an example that eliminates repeated words:
REReplace("Echo Echo", "(.+) +\1", "\1", "ALL") returns
Echo
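Back references are just as useful for rearranging text as for removing repeats. The sketch below uses a hypothetical date string to show groups being reordered in the replacement, and a doubled letter being found with \1 inside the pattern.

```cfml
<cfscript>
// Three groups capture year, month, and day; the replacement reorders them.
writeOutput(REReplace("2003-12-19", "([0-9]+)-([0-9]+)-([0-9]+)", "\2/\3/\1") & "<br>");  // 12/19/2003

// Inside the pattern, \1 must match the same text the first group matched.
writeOutput(REFind("(.)\1", "hello") & "<br>");  // 3 (the doubled l starts at position 3)
writeOutput(REFind("(.)\1", "world") & "<br>");  // 0 (no doubled character)
</cfscript>
```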
One final aspect of Regular Expressions is Character Classes. Character Classes are keywords that represent a predefined set of characters. One example is [[:alnum:]], which is the same as [a-zA-Z0-9]. These exist simply for convenience and are detailed in the ColdFusion documentation.
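A brief sketch of character classes in use follows. Besides [[:alnum:]] from the text, it assumes the [[:digit:]] and [[:space:]] classes; the test strings are hypothetical.

```cfml
<cfscript>
// [[:alnum:]] : letters and digits only.
writeOutput(REFind("^[[:alnum:]]+$", "abc123") & "<br>");   // 1
writeOutput(REFind("^[[:alnum:]]+$", "abc 123") & "<br>");  // 0 (the space is not alphanumeric)

// [[:digit:]] : strip every digit from a string.
writeOutput(REReplace("a1b2c3", "[[:digit:]]", "", "ALL") & "<br>");            // abc

// [[:space:]] : collapse runs of white space into a single space.
writeOutput(REReplace("hello   world", "[[:space:]]+", " ", "ALL") & "<br>");   // hello world
</cfscript>
```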
Also check out my Regex Cheat Sheet for more tips.
Regular Expressions with ColdFusion - a Howto Guide was first published on December 19, 2003.