Function to Remove Extended Characters
I wrote a function today that I thought I would share. It replaces extended (non-ASCII) characters with numeric HTML/XML character references. For example, the character à becomes &#224;.
I wrote this for an RSS feed that had a few Unicode characters in it, while the majority of the feed was US-ASCII. Rather than changing the encoding, I opted to replace those few characters with an ASCII-safe XML representation.
Here's the function:
<cffunction name="EscapeExtendedChars" returntype="string">
  <cfargument name="str" type="string" required="true">
  <cfset var buf = CreateObject("java", "java.lang.StringBuffer")>
  <cfset var len = Len(arguments.str)>
  <cfset var char = "">
  <cfset var charcode = 0>
  <!--- var-scope the loop index so it does not leak out of the function --->
  <cfset var i = 0>
  <cfset buf.ensureCapacity(JavaCast("int", len+20))>
  <cfif NOT len>
    <cfreturn arguments.str>
  </cfif>
  <cfloop from="1" to="#len#" index="i">
    <!--- charAt is zero-based, while the CFML loop index starts at 1 --->
    <cfset char = arguments.str.charAt(JavaCast("int", i-1))>
    <cfset charcode = JavaCast("int", char)>
    <!--- keep printable ASCII plus tab (9), line feed (10), and carriage return (13) --->
    <cfif (charcode GT 31 AND charcode LT 127) OR charcode EQ 10
      OR charcode EQ 13 OR charcode EQ 9>
      <cfset buf.append(JavaCast("string", char))>
    <cfelse>
      <!--- everything else becomes a numeric reference; ## is an escaped pound sign --->
      <cfset buf.append(JavaCast("string", "&##"))>
      <cfset buf.append(JavaCast("string", charcode))>
      <cfset buf.append(JavaCast("string", ";"))>
    </cfif>
  </cfloop>
  <cfreturn buf.toString()>
</cffunction>
I'm making use of Java's StringBuffer class and the charAt method of java.lang.String. I think this code is a pretty fast solution, since it avoids repeated string concatenation, and I would guess that the charAt method is a bit faster than using Mid.
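A quick way to try the function is to call it on a string that contains one extended character. The snippet below is only a sketch: it assumes EscapeExtendedChars is already defined on the page (or included), and the variable names title and safeTitle are made up for illustration. The intent is that é (character code 233) comes out as the numeric reference &#233; while the plain ASCII characters pass through unchanged.

```cfml
<!--- Assumes the EscapeExtendedChars function from this post is defined or included above --->
<!--- title and safeTitle are hypothetical names used only for this example --->
<cfset title = "Caf" & Chr(233) & " au lait">
<cfset safeTitle = EscapeExtendedChars(title)>

<!--- Write the escaped value into an RSS-style element --->
<cfoutput><title>#safeTitle#</title></cfoutput>
```

Building the test string with Chr(233) keeps the template itself pure ASCII, so the result does not depend on the encoding the file was saved in.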
btw nolan, those chars are most likely not unicode but windows codepage, which is a sort of superset of iso-8859-1.
http://www.cflib.org/udf.cfm?ID=725
<cfsilent>
  <cfset stra = "">
  <cfset strb = CreateObject("java", "java.lang.StringBuffer")>
  <cfset a = gettickcount()>
  <cfloop from="1" to="10000" index="I">
    <cfset stra = stra & "a">
  </cfloop>
  <cfset b = gettickcount()>
  <cfset c = gettickcount()>
  <cfloop from="1" to="10000" index="I">
    <cfset strb.append(JavaCast("string", "a"))>
  </cfloop>
  <cfset d = gettickcount()>
</cfsilent>
<cfoutput>result: #b-a# - #d-c#<br></cfoutput>
Any suggestions on how to move forward and resolve this issue? Thanks.
Cheers, Pete (aka lad4bear)
I hope someone can show me how to use this function in my cfquery to replace those characters.
Thanks
(excuse my poor English)
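One possible reading of the question above is escaping a column while outputting query results. The sketch below takes that reading and is not a definitive answer: it assumes EscapeExtendedChars is defined on the page, and it builds an in-memory query with QueryNew as a stand-in for a real cfquery, so the names articles and title are hypothetical.

```cfml
<!--- Assumes the EscapeExtendedChars function from this post is defined or included above --->
<!--- Hypothetical stand-in for a cfquery result; "articles" and "title" are made-up names --->
<cfset articles = QueryNew("title", "varchar")>
<cfset QueryAddRow(articles)>
<cfset QuerySetCell(articles, "title", "Voil" & Chr(224))>

<!--- Escape the column value row by row while writing it out --->
<cfoutput query="articles">
  <title>#EscapeExtendedChars(articles.title)#</title>
</cfoutput>
```

With a real cfquery the output loop would look the same; only the query variable and column names would change.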
Pete Freitag is a software engineer, and web developer located in











