RSS and XPath

By Pete Freitag

I came across a handy reference article on xml.com today that gives XPath queries for RSS and Atom feeds. Just last week I was attempting to parse an RSS 1.0 feed in CFMX using ColdFusion's XMLSearch function. However, I'm running into problems with the namespaces in RSS 1.0. Here's the code I'm using:

<cfhttp url="https://www.example.com/rss.xml" method="get" />
<cfset rss = XMLParse(cfhttp.filecontent)>

<!--- get an array of items --->
<cfset items = XMLSearch(rss, "/rdf:RDF/item")>
<cfdump var="#items#">

The result is that the items array is empty. I think this is a namespace issue, but I'm not really sure. Is this a bug? Anyone have an idea?

BTW, if you're looking to parse RSS 0.9x or 2.0 with XPath, check out this older blog post.


Comments

Roger Benningfield

Why not just use: XmlSearch(rss, "//item") That way you don't have to worry about what version of RSS you're parsing... you'll always get an array of items.

Pete Freitag

Hi Roger, I tried that as well previously, it also returned an empty array. Were you able to get that to work? -pete

Sean Corfield

This works: XMLSearch(rss,"/rdf:RDF/:item") as does this: XMLSearch(rss,"//:item"). Because of the namespaces, you have to explicitly specify that 'item' has an empty namespace prefix.
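To make the namespace point concrete, here is a minimal sketch that parses a tiny inline RSS 1.0 document and compares two queries. The sample feed is hypothetical and stripped down (a real feed also has a channel element and more fields per item), and the variable names are my own. In RSS 1.0 the item elements live in the default namespace http://purl.org/rss/1.0/, while an unprefixed name in XPath 1.0 only matches elements in no namespace at all.

```cfml
<!--- Hypothetical, stripped-down RSS 1.0 feed for illustration only.
      The pound sign in the rdf namespace URI is literal here because
      this text is not inside cfoutput. --->
<cfsavecontent variable="sampleFeed">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <item rdf:about="https://www.example.com/one">
    <title>First post</title>
    <link>https://www.example.com/one</link>
  </item>
  <item rdf:about="https://www.example.com/two">
    <title>Second post</title>
    <link>https://www.example.com/two</link>
  </item>
</rdf:RDF>
</cfsavecontent>

<cfset rss = XmlParse(Trim(sampleFeed))>

<!--- Unprefixed "item" asks for an item element in no namespace --->
<cfset plainItems = XmlSearch(rss, "/rdf:RDF/item")>

<!--- local-name() compares only the element's local name, whatever its namespace --->
<cfset localItems = XmlSearch(rss, "//*[local-name()='item']")>

<cfoutput>
  /rdf:RDF/item: #ArrayLen(plainItems)# match(es)<br>
  local-name() query: #ArrayLen(localItems)# match(es)
</cfoutput>
```

Matching on local-name() is one way to sidestep the prefix question entirely. It ignores both the namespace and any prefix, so the same query should also pick up items in RSS 0.9x and 2.0 feeds, where item has no namespace.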

Roger Benningfield

Pete: I took a sec to dig around in JournURL's aggregator code, and found what I'm actually using: XmlSearch(myRSS, "//*[name()='item']")

Justin

This seems to be the one that works best for most feeds we try to parse: XmlSearch(myRSS, "//*[name()='item']") I have a question about looping through the resulting array. What should the xpath be when looking for the description, title and item of the children? Thanks...
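On the looping question, one possible approach is sketched below. It assumes the children of each item (title, link, description) are unprefixed, as they are in RSS 1.0 and 2.0, and that each entry in the array returned by XmlSearch is an XML element whose children can be read by name. The variable names are illustrative, and the URL is the placeholder from the post.

```cfml
<cfhttp url="https://www.example.com/rss.xml" method="get" />
<cfset myRSS = XmlParse(cfhttp.filecontent)>

<!--- Namespace-agnostic query for every item element --->
<cfset items = XmlSearch(myRSS, "//*[local-name()='item']")>

<cfoutput>
<cfloop from="1" to="#ArrayLen(items)#" index="i">
  <cfset item = items[i]>

  <!--- Some children are optional, so check before reading --->
  <cfif StructKeyExists(item, "title")>
    <strong>#HTMLEditFormat(item.title.XmlText)#</strong><br>
  </cfif>
  <cfif StructKeyExists(item, "link")>
    #HTMLEditFormat(item.link.XmlText)#<br>
  </cfif>
  <cfif StructKeyExists(item, "description")>
    #HTMLEditFormat(item.description.XmlText)#<br>
  </cfif>
</cfloop>
</cfoutput>

<!--- Alternative: collect one child across all items with a single XPath --->
<cfset titles = XmlSearch(myRSS, "//*[local-name()='item']/*[local-name()='title']")>
```

Reading children off each node keeps the title, link, and description of one item together. The single-XPath form at the end is handier when only one field is needed.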