Function to Remove Extended Characters
I wrote a ColdFusion function today that I thought I would share. It replaces extended (non-ASCII Unicode) characters with numeric HTML/XML character entities. For example, the character à becomes &#224;.
I wrote this for an RSS feed that had a few Unicode characters in it, while the majority of the feed was US-ASCII. Rather than changing the encoding, I opted to replace those few characters with an ASCII-safe XML representation.
Here's the function:
<cffunction name="EscapeExtendedChars" returntype="string">
    <cfargument name="str" type="string" required="true">
    <cfset var buf = CreateObject("java", "java.lang.StringBuffer")>
    <cfset var len = Len(arguments.str)>
    <cfset var char = "">
    <cfset var charcode = 0>
    <cfset var i = 0>
    <cfset buf.ensureCapacity(JavaCast("int", len+20))>
    <!--- Nothing to escape in an empty string --->
    <cfif NOT len>
        <cfreturn arguments.str>
    </cfif>
    <cfloop from="1" to="#len#" index="i">
        <!--- charAt is zero-based, the CFML loop index is one-based --->
        <cfset char = arguments.str.charAt(JavaCast("int", i-1))>
        <cfset charcode = JavaCast("int", char)>
        <!--- Keep printable ASCII plus tab, line feed and carriage return --->
        <cfif (charcode GT 31 AND charcode LT 127) OR charcode EQ 10 OR charcode EQ 13 OR charcode EQ 9>
            <cfset buf.append(JavaCast("string", char))>
        <cfelse>
            <!--- Everything else becomes a numeric entity; ## is an escaped pound sign --->
            <cfset buf.append(JavaCast("string", "&##"))>
            <cfset buf.append(JavaCast("string", charcode))>
            <cfset buf.append(JavaCast("string", ";"))>
        </cfif>
    </cfloop>
    <cfreturn buf.toString()>
</cffunction>
I'm making use of Java's StringBuffer class, and also the charAt method of java.lang.String. I think this code is a pretty fast solution, since it avoids concatenating strings by hand, and I would guess the charAt method may be a bit faster than using the built-in CFML Mid function.
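Since the function leans on java.lang classes, the same logic can be sketched in plain Java to make the buffer and charAt steps easier to follow. This is only an illustration, not code from the feed itself: the class name ExtendedCharEscaper and the method name escapeExtendedChars are made up for the example, and it uses StringBuilder (the unsynchronized sibling of StringBuffer) as a substitute.

```java
// Hypothetical Java counterpart of the CFML function above.
// Class and method names are illustrative assumptions.
class ExtendedCharEscaper {

    // Replaces every character outside printable ASCII (except tab, LF, CR)
    // with a decimal numeric character reference such as &#224;.
    static String escapeExtendedChars(String str) {
        if (str.isEmpty()) {
            return str; // nothing to escape
        }
        // Pre-size the buffer with a little headroom, as the CFML version does.
        StringBuilder buf = new StringBuilder(str.length() + 20);
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            int code = c; // UTF-16 code unit value
            boolean safe = (code > 31 && code < 127)
                    || code == 9 || code == 10 || code == 13;
            if (safe) {
                buf.append(c);
            } else {
                buf.append("&#").append(code).append(';');
            }
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        // \u00E0 is the character a-grave from the example in this post
        System.out.println(escapeExtendedChars("\u00E0 la carte")); // prints "&#224; la carte"
    }
}
```

One thing to keep in mind with either version: charAt works on UTF-16 code units, so a character outside the Basic Multilingual Plane would come out as two separate entities, one per surrogate.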
Function to Remove Extended Characters was first published on January 21, 2005.