Function to Remove Extended Characters
I wrote a ColdFusion function today that I thought I would share. It replaces extended (non-ASCII) characters with numeric HTML/XML character entities. For example, the character à becomes &#224;.
I wrote this for an RSS feed that had a few Unicode characters in it, while the majority of the feed was US-ASCII. Rather than changing the encoding, I opted to replace those few characters with an ASCII-safe XML representation.
Here's the function:
<cffunction name="EscapeExtendedChars" returntype="string">
  <cfargument name="str" type="string" required="true">
  <cfset var buf = CreateObject("java", "java.lang.StringBuffer")>
  <cfset var len = Len(arguments.str)>
  <cfset var char = "">
  <cfset var charcode = 0>
  <!--- keep the loop index local to the function --->
  <cfset var i = 0>
  <cfset buf.ensureCapacity(JavaCast("int", len+20))>
  <cfif NOT len>
    <cfreturn arguments.str>
  </cfif>
  <cfloop from="1" to="#len#" index="i">
    <!--- charAt is zero-based, CFML loops are one-based --->
    <cfset char = arguments.str.charAt(JavaCast("int", i-1))>
    <cfset charcode = JavaCast("int", char)>
    <!--- printable ASCII, tab, line feed and carriage return pass through --->
    <cfif (charcode GT 31 AND charcode LT 127) OR charcode EQ 10 OR charcode EQ 13 OR charcode EQ 9>
      <cfset buf.append(JavaCast("string", char))>
    <cfelse>
      <!--- everything else becomes a numeric entity; ## is an escaped pound sign --->
      <cfset buf.append(JavaCast("string", "&##"))>
      <cfset buf.append(JavaCast("string", charcode))>
      <cfset buf.append(JavaCast("string", ";"))>
    </cfif>
  </cfloop>
  <cfreturn buf.toString()>
</cffunction>
I'm making use of Java's StringBuffer class and the charAt method of java.lang.String. I think this code is a pretty fast solution, since it avoids repeated string concatenation, and I would guess that charAt may be a bit faster than the built-in CFML Mid function.
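Since the function leans on java.lang.StringBuffer and String.charAt, the same loop can be sketched directly in Java. This is only an illustration of the logic, not code from the feed itself: the class name EscapeSketch and the method name escapeExtendedChars are made up for this example, and the + 20 headroom simply mirrors the ensureCapacity call in the CFML version.

```java
// A minimal sketch, assuming the same rules as the CFML function above.
// EscapeSketch and escapeExtendedChars are hypothetical names for this example.
class EscapeSketch {

    static String escapeExtendedChars(String str) {
        // Reserve a little headroom, as the CFML version does with len + 20.
        StringBuffer buf = new StringBuffer(str.length() + 20);
        for (int i = 0; i < str.length(); i++) {
            char ch = str.charAt(i); // UTF-16 code unit at index i
            int code = ch;           // numeric value of that code unit
            boolean printableAscii = code > 31 && code < 127;
            boolean allowedWhitespace = code == 9 || code == 10 || code == 13;
            if (printableAscii || allowedWhitespace) {
                buf.append(ch);
            } else {
                // Numeric character reference, e.g. 224 becomes &#224;
                buf.append("&#").append(code).append(';');
            }
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        // \u00e9 is "é" and \u00e0 is "à"; escapes keep this source file ASCII.
        System.out.println(escapeExtendedChars("d\u00e9j\u00e0 vu")); // prints d&#233;j&#224; vu
    }
}
```

Plain ASCII input comes back unchanged, so running a mostly-ASCII feed through it only touches the few characters that need escaping.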
Function to Remove Extended Characters was first published on January 21, 2005.
Btw Nolan, those chars are most likely not Unicode but from a Windows code page, which is a sort of superset of ISO-8859-1.
Any suggestions on how to move forward and resolve this issue? Thanks.
Cheers, Pete (aka lad4bear)
Hope someone can show me how to use this function in my cfquery to replace those characters.
(excuse my poor English)