Regular Expressions with ColdFusion - a Howto Guide

December 19, 2003
coldfusion

Regular Expressions are a powerful tool for both developers and computer users alike. Regular Expressions were originally developed on Unix systems and used in programs like Perl, sed, and grep. You may find slight variations between the programs that use Regular Expressions, but for the most part they are very similar.

Regular Expressions are a simple pattern matching language. They are typically used either to find a string or substring that matches a pattern, or to perform some sort of translation on the string using a pattern. Although Regular Expressions often look cryptic, as if someone fell on the keyboard, there are only 12 key elements to the language. Once you learn these key elements you can build almost any pattern you need.

ColdFusion's support of Regular Expressions lies within the functions REFind, REReplace, REReplaceNoCase, and REFindNoCase. The REFind functions return the position of the matching pattern in the string (or 0 when nothing matches), and the REReplace functions allow you to replace substrings matching the pattern with another string.
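
As a rough sketch of how the two families differ, the snippet below runs one pattern through REFind, REFindNoCase, and REReplace. The variable names (greeting, position, positionNoCase, replaced) are made up for illustration. Note the argument order: REFind takes the pattern first and the string second, while REReplace takes the string first.

```cfml
<!--- Hypothetical sample string, used only for illustration --->
<cfset greeting = "Hello World">

<!--- REFind(pattern, string): 1-based position of the first match, 0 if none --->
<cfset position = REFind("W", greeting)>

<!--- REFindNoCase ignores case, so a lowercase w still finds the W --->
<cfset positionNoCase = REFindNoCase("w", greeting)>

<!--- REReplace(string, pattern, replacement): replaces the first match by default --->
<cfset replaced = REReplace(greeting, "W", "w")>

<!--- Prints: 7 7 Hello world --->
<cfoutput>#position# #positionNoCase# #replaced#</cfoutput>
```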

Let's start by explaining three symbols, which we will call the quantifiers: *, ?, and +. Quantifiers specify how many times the preceding character may occur. The * quantifier means zero or more, so the preceding character can occur zero times, one time, or any number of times. Here are some examples using the * quantifier:

The basic syntax for the REReplace function is: REReplace(string, pattern, replacement)

REReplace("Ahhhhh", "Ah*", "Matched") returns Matched

REReplace("A", "Ah*", "Matched") returns Matched (there are zero h's, and * matches zero or more)

The next quantifier is ?, which matches zero or one of the preceding character.

REReplace("Ah", "Ah?", "Matched") returns Matched

REReplace("A", "Ah?", "Matched") returns Matched

REReplace("Ahhhhh", "Ah?", "Matched") returns Matchedhhhh (only the first h is part of the match)

The third quantifier, as you may be able to guess, is +, which matches one or more of the preceding character.

REReplace("Ah", "Ah+", "Matched") returns Matched

REReplace("A", "Ah+", "Matched") returns A (must be at least one h in the string)

REReplace("Ahhhhh", "Ah+", "Matched") returns Matched
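
To see how much of the string each quantifier actually consumes, one option is REFind's optional third and fourth arguments (start position and returnsubexpressions). With the fourth argument set to true, REFind returns a structure holding pos and len arrays instead of a plain number. This is only a sketch; the variable names starMatch, questionMatch, and plusMatch are made up for illustration.

```cfml
<!--- len[1] is the length of the whole match, pos[1] is where it starts --->
<cfset starMatch = REFind("Ah*", "Ahhhhh", 1, true)>
<cfset questionMatch = REFind("Ah?", "Ahhhhh", 1, true)>
<cfset plusMatch = REFind("Ah+", "Ahhhhh", 1, true)>

<cfoutput>
* matched #starMatch.len[1]# characters<br>
? matched #questionMatch.len[1]# characters<br>
+ matched #plusMatch.len[1]# characters
</cfoutput>
```

The * and + patterns each consume all six characters, while ? stops after two, which is why the ? example above leaves hhhh behind.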

Regular Expressions also have some other special characters besides the quantifiers; they are used to represent a set of possible characters. The first special character we will look at is the . (dot). It represents any single character, including white space and new lines. For example, the pattern "do." matches the words dot and dog, but it cannot match do, because the dot requires a third character. However, if you add the * quantifier, the pattern "do.*" matches dot, dog, do, and door.
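
A quick way to check the dot examples is to run them through REFind and REReplace. This is a minimal sketch, and the variable names are made up for illustration.

```cfml
<!--- "do." needs a third character, so the bare word do is not found --->
<cfset dotOnDo = REFind("do.", "do")>
<cfset dotOnDog = REFind("do.", "dog")>

<!--- Adding * makes the trailing characters optional --->
<cfset dotStarOnDo = REFind("do.*", "do")>

<!--- "do." consumes three characters of door, "do.*" consumes all of it --->
<cfset dotOnDoor = REReplace("door", "do.", "X")>
<cfset dotStarOnDoor = REReplace("door", "do.*", "X")>

<!--- Prints: 0 1 1 Xr X --->
<cfoutput>#dotOnDo# #dotOnDog# #dotStarOnDo# #dotOnDoor# #dotStarOnDoor#</cfoutput>
```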

Two special characters represent positions rather than characters: ^ marks the beginning of the string (when it appears outside of square brackets), and $ marks the end of the string.
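
The effect of the two anchors can be sketched with a few REFind calls. The sample words and variable names here are made up for illustration.

```cfml
<!--- ^ ties the match to the start of the string --->
<cfset startsWithDog = REFind("^dog", "dogma")>
<cfset notAtStart = REFind("^dog", "hotdog")>

<!--- $ ties the match to the end of the string --->
<cfset endsWithDog = REFind("dog$", "hotdog")>

<!--- Using both means the whole string must be exactly dog --->
<cfset exactlyDog = REFind("^dog$", "dog")>
<cfset notExactlyDog = REFind("^dog$", "hotdog")>

<!--- Prints: 1 0 4 1 0 --->
<cfoutput>#startsWithDog# #notAtStart# #endsWithDog# #exactlyDog# #notExactlyDog#</cfoutput>
```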

The next element of Regular Expressions we will look at is the square brackets [ ]. With square brackets you can specify a finite set of characters that could be in the current character position. For example, the pattern "[dl]og" will match dog or log, but it will not match dlog as a whole, because the brackets stand for exactly one character.

By adding a ^ (caret) as the first character in the brackets, you negate the set. Let's add the ^ to the pattern above: "[^dl]og" now matches og preceded by any one character that isn't a d or an l, such as fog.
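
The bracket examples can be checked with REFind. This sketch anchors the first pattern with ^ and $ so that it tests the whole string; the variable names are made up for illustration.

```cfml
<!--- Anchored, [dl]og accepts dog but rejects dlog --->
<cfset wholeDog = REFind("^[dl]og$", "dog")>
<cfset wholeDlog = REFind("^[dl]og$", "dlog")>

<!--- Without anchors, REFind still locates log inside dlog, starting at position 2 --->
<cfset insideDlog = REFind("[dl]og", "dlog")>

<!--- The negated set accepts fog but rejects dog --->
<cfset negatedFog = REFind("[^dl]og", "fog")>
<cfset negatedDog = REFind("[^dl]og", "dog")>

<!--- Prints: 1 0 2 1 0 --->
<cfoutput>#wholeDog# #wholeDlog# #insideDlog# #negatedFog# #negatedDog#</cfoutput>
```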

The brackets also allow you to specify a range of characters using the - (dash) character. Here's an example using the - character: "[0-9]?[0-9]/[0-9]?[0-9]/[0-9][0-9]". Can you figure out what it is matching? It is matching dates such as 1/1/01 or 12/12/02. Using the ? quantifier, we made the first digit of the month and day optional.
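
Here is one way the date pattern might be tried out. This sketch wraps it in ^ and $ (an addition, not part of the pattern above) so that the whole string has to be a date, and the variable names are made up for illustration.

```cfml
<!--- The date pattern from above, anchored at both ends --->
<cfset datePattern = "^[0-9]?[0-9]/[0-9]?[0-9]/[0-9][0-9]$">

<cfset shortDate = REFind(datePattern, "1/1/01")>
<cfset longDate = REFind(datePattern, "12/12/02")>

<!--- A four digit year does not fit the two digit slot --->
<cfset fourDigitYear = REFind(datePattern, "1/1/2001")>

<!--- The pattern checks the shape only, so an impossible date still matches --->
<cfset impossibleDate = REFind(datePattern, "99/99/99")>

<!--- Prints: 1 1 0 1 --->
<cfoutput>#shortDate# #longDate# #fourDigitYear# #impossibleDate#</cfoutput>
```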

The | (pipe) special character is used as a logical OR. Characters in a pattern are implicitly joined in sequence (a logical AND) unless you separate alternatives with the | character. The | applies to everything on either side of it, so if you wanted a regular expression to match car or bar you could use the pattern "car|bar"; the pattern "c|bar" would match c or bar instead.

You may have noticed with that last example that we could also have matched car or bar using brackets: "[cb]ar". It turns out that the | character has other uses: it can choose between whole sequences of characters inside a larger pattern. This is done with our next special character, the parentheses ( ). Parentheses are used to group characters together. Here's an example using parentheses and the | character: "(Mon|Tues)day", which matches either Monday or Tuesday.
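
Where the | sits relative to the parentheses matters. As a small sketch (variable names made up for illustration), compare a pattern with the | outside the groups to one with the | inside a single group:

```cfml
<!--- | outside the groups: the alternatives are Mon and Tuesday, so only Mon is replaced --->
<cfset pipeOutside = REReplace("Monday", "(Mon)|(Tues)day", "X")>

<!--- | inside one group: the alternatives are Mon and Tues, each followed by day --->
<cfset pipeInsideMonday = REReplace("Monday", "(Mon|Tues)day", "X")>
<cfset pipeInsideTuesday = REReplace("Tuesday", "(Mon|Tues)day", "X")>

<!--- Prints: Xday X X --->
<cfoutput>#pipeOutside# #pipeInsideMonday# #pipeInsideTuesday#</cfoutput>
```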

One final special character is the \ character, which is used for escaping any of our quantifiers or special characters. Let's look at a practical example: validating an email address. To build this pattern we will simply break down what an email address is. Email addresses start with a username ".+", next comes the @ sign ".+@", then a domain name that contains at least one dot (we will need to escape it), so ".+@.+\..+" is our pattern. In your code, run this pattern through REFind and, when no match is found, display a message such as: You entered an invalid email address.
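
A minimal sketch of such a check might look like the following. The email variable and its sample value are assumptions for illustration (in a real page the value would probably come from a form field), and the cfelse message is not part of the original example.

```cfml
<!--- Hypothetical input; in practice this would come from user input --->
<cfset email = "someone@example.com">

<!--- REFind returns 0 when the pattern is not found anywhere in the string --->
<cfif REFind(".+@.+\..+", email) EQ 0>
    You entered an invalid email address.
<cfelse>
    The address has the expected shape.
</cfif>
```

Keep in mind that this pattern is deliberately loose: .+ accepts any characters, spaces included, so it only confirms the general something@something.something shape.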

The \ is also used for a very handy feature called the back reference. It allows you to use your groups (patterns in parentheses) again in your pattern or in your replacement. The back reference \1 represents the first group, \2 the second, etc. Here's an example that eliminates repeated words:

REReplace("Echo Echo","(.+) +\1","\1","ALL") returns Echo
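
Back references work in the replacement string as well, which lets you rearrange the pieces of a match. The sketch below repeats the repeated-word example and then swaps two words; the sample name and the variable names are made up for illustration.

```cfml
<!--- \1 inside the pattern requires the same text to appear twice --->
<cfset deduped = REReplace("Echo Echo", "(.+) +\1", "\1", "ALL")>

<!--- \1 and \2 in the replacement reuse the two captured groups in a new order --->
<cfset swapped = REReplace("John Smith", "(.+) (.+)", "\2, \1")>

<!--- Prints: Echo | Smith, John --->
<cfoutput>#deduped# | #swapped#</cfoutput>
```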

One final aspect of Regular Expressions is Character Classes. Character Classes are keywords that represent a predefined set of characters. One example is [[:alnum:]], which is the same as [a-zA-Z0-9]. These exist simply for convenience and are detailed in the ColdFusion documentation.
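
As a short sketch, here are [[:alnum:]] and a second class, [[:digit:]] (digits only), in use. The sample strings and variable names are made up for illustration.

```cfml
<!--- Anchored with ^ and $, so every character must be a letter or a digit --->
<cfset allAlnum = REFind("^[[:alnum:]]+$", "abc123")>
<cfset withSpace = REFind("^[[:alnum:]]+$", "abc 123")>

<!--- Strip every digit by replacing all matches with an empty string --->
<cfset lettersOnly = REReplace("a1b2c3", "[[:digit:]]", "", "ALL")>

<!--- Prints: 1 0 abc --->
<cfoutput>#allAlnum# #withSpace# #lettersOnly#</cfoutput>
```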

Also check out my handy Regex Cheat Sheet




Comments

Great Article... I noticed the date was a couple of years ago. Do you know if ColdFusion 7 modified or improved any of this?
Very helpful, but the .+ doesn't give me the result I want: the expression [0-9] SSR.+/n removes everything up to 14 and not only up to 13! Any idea? LT/n 12 SSR DUE AIRCRAFT/VERSION CHANGE/n 13 RX RESTRICTED /n 14
Is it possible to use matches in coldfusion regular expressions?

For a function like

reReplaceNoCase(string, reg_expression, substring [, scope ])

Ideally it would be great to use matched cases in the substring....unfortunately it seems that the substring doesn't allow regular expressions. Is there a way to achieve this with CF?

Thanks....hopefully that makes sense.

Aaron
