Function to Remove Extended Characters
I wrote a function today that I thought I would share. It replaces extended (non-ASCII) characters with numeric HTML/XML character entities. For example, the character à becomes &#224;.
I wrote this for an RSS feed that had a few Unicode characters in it, but the majority of the feed was US-ASCII. Rather than changing the encoding, I opted to replace those few characters with an ASCII-safe XML representation.
Here's the function:
<cffunction name="EscapeExtendedChars" returntype="string">
    <cfargument name="str" type="string" required="true">
    <cfset var buf = CreateObject("java", "java.lang.StringBuffer")>
    <cfset var len = Len(arguments.str)>
    <cfset var char = "">
    <cfset var charcode = 0>
    <!--- Keep the loop index local to the function --->
    <cfset var i = 0>
    <cfset buf.ensureCapacity(JavaCast("int", len+20))>
    <cfif NOT len>
        <cfreturn arguments.str>
    </cfif>
    <cfloop from="1" to="#len#" index="i">
        <!--- charAt is zero-based, so subtract 1 from the CFML index --->
        <cfset char = arguments.str.charAt(JavaCast("int", i-1))>
        <cfset charcode = JavaCast("int", char)>
        <!--- Keep printable ASCII plus tab, line feed, and carriage return as-is --->
        <cfif (charcode GT 31 AND charcode LT 127) OR charcode EQ 10 OR charcode EQ 13 OR charcode EQ 9>
            <cfset buf.append(JavaCast("string", char))>
        <cfelse>
            <!--- "&##" yields a literal "&#"; the pound sign must be doubled inside a CFML string --->
            <cfset buf.append(JavaCast("string", "&##"))>
            <cfset buf.append(JavaCast("string", charcode))>
            <cfset buf.append(JavaCast("string", ";"))>
        </cfif>
    </cfloop>
    <cfreturn buf.toString()>
</cffunction>
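As a quick illustration of how the function might be called when building a feed, here is a minimal sketch. The variable name feedTitle and the sample text are made up for the example; Chr(224) is used to produce the à so the snippet does not depend on the template's file encoding.

```cfml
<!--- Hypothetical feed title ending in a non-ASCII character (à = code 224) --->
<cfset feedTitle = "Voil" & Chr(224)>

<!--- The à should be emitted as a numeric character entity; the ASCII letters pass through --->
<cfoutput><title>#EscapeExtendedChars(feedTitle)#</title></cfoutput>
```

Note that characters such as &, < and > fall inside the printable ASCII range, so the function passes them through unchanged. Any XML escaping of those characters would need to happen separately.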
I'm making use of Java's StringBuffer class, and also the charAt method of java.lang.String. I think this code is a pretty fast solution, since it avoids concatenating strings by hand, and I would guess that the charAt method may be a bit faster than using Mid.
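For comparison, here is a rough sketch of what the "by hand" approach could look like using only built-in CFML functions: Mid to pull out each character, Asc to get its code, and the & operator to build the result. The name EscapeExtendedCharsMid is hypothetical, and I have not benchmarked it against the StringBuffer version, so treat the speed difference as a guess until measured.

```cfml
<cffunction name="EscapeExtendedCharsMid" returntype="string" output="false">
    <cfargument name="str" type="string" required="true">
    <cfset var result = "">
    <cfset var i = 0>
    <cfset var ch = "">
    <cfset var code = 0>
    <cfloop from="1" to="#Len(arguments.str)#" index="i">
        <!--- Mid is one-based, so the loop index is used directly --->
        <cfset ch = Mid(arguments.str, i, 1)>
        <cfset code = Asc(ch)>
        <cfif (code GT 31 AND code LT 127) OR code EQ 10 OR code EQ 13 OR code EQ 9>
            <!--- Each & concatenation creates a new string --->
            <cfset result = result & ch>
        <cfelse>
            <cfset result = result & "&##" & code & ";">
        </cfif>
    </cfloop>
    <cfreturn result>
</cffunction>
```

The logic is the same as above; the only difference is that every append creates a new string instead of writing into a single buffer, which is the cost the StringBuffer version is meant to avoid.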