I came across a handy reference article on xml.com today that gives XPath queries for RSS and Atom feeds. Just last week I was attempting to parse an RSS 1.0 feed in CFMX using the ColdFusion XMLSearch function. I'm running into problems, however, due to the namespaces in RSS 1.0. Here's the code I'm using:
<cfhttp url="https://www.example.com/rss.xml" method="get" />
<cfset rss = XMLParse(cfhttp.filecontent)>
<!--- get an array of items --->
<cfset items = XMLSearch(rss, "/rdf:RDF/item")>
<cfdump var="#items#">
The result is that the items array is empty. I think this is a namespace issue, but I'm not really sure. Is this a bug? Anyone have an idea?
BTW, if you're looking to parse RSS 0.9x or 2.0 with XPath, check out this older blog post.
Comments
Why not just use: XmlSearch(rss, "//item") That way you don't have to worry about what version of RSS you're parsing... you'll always get an array of items.
Hi Roger, I tried that as well previously, it also returned an empty array. Were you able to get that to work? -pete
This works: XMLSearch(rss,"/rdf:RDF/:item") as does this: XMLSearch(rss,"//:item") Because of the namespaces, you have to explicitly specify that 'item' has an empty namespace prefix.
Ah! Good to know, thanks Sean!
Pete: I took a sec to dig around in JournURL's aggregator code, and found what I'm actually using: XmlSearch(myRSS, "//*[name()='item']")
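For anyone comparing the variants in this thread, here is a minimal sketch that runs them side by side. The inline feed built with cfxml is made up for illustration and stands in for the cfhttp result from the post. The local-name() variant was not suggested above; it is a standard XPath 1.0 function that ignores any prefix, so it may be worth trying if a feed writes its items with a prefix. In XPath 1.0 an unprefixed name such as item only matches elements in no namespace, which is why the plain query comes back empty for RSS 1.0, where item lives in the default RSS namespace.

```cfml
<!--- Hypothetical sample feed, standing in for XMLParse(cfhttp.filecontent) from the post --->
<cfxml variable="rss">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="https://www.example.com/">
    <title>Example feed</title>
    <link>https://www.example.com/</link>
    <description>Sample channel</description>
  </channel>
  <item rdf:about="https://www.example.com/a">
    <title>First item</title>
    <link>https://www.example.com/a</link>
    <description>Sample description</description>
  </item>
</rdf:RDF>
</cfxml>

<!--- Unprefixed name test: only matches elements in no namespace (reported empty in the post) --->
<cfset plain = XMLSearch(rss, "//item")>

<!--- name() compares the element name as written in the feed --->
<cfset byName = XMLSearch(rss, "//*[name()='item']")>

<!--- local-name() ignores the prefix, so a prefixed item element would match too --->
<cfset byLocalName = XMLSearch(rss, "//*[local-name()='item']")>

<cfoutput>
  plain: #ArrayLen(plain)#<br>
  name(): #ArrayLen(byName)#<br>
  local-name(): #ArrayLen(byLocalName)#
</cfoutput>
```

The predicate forms trade a little speed for not having to care which namespace or RSS version the feed uses.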
This seems to be the one that works best for most feeds we try to parse: XmlSearch(myRSS, "//*[name()='item']") I have a question about looping through the resulting array. What should the XPath be when looking for the description, title and item of the children? Thanks...
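One possible approach to the looping question, offered as a sketch rather than a tested answer: once XmlSearch has returned the array, a second XPath query may not be needed, because each array entry is an XML element whose children can be read by name. The sample feed and the variable names items, item and i are assumptions for illustration. The StructKeyExists checks are there because a feed item is not guaranteed to carry every child.

```cfml
<!--- Hypothetical sample feed, standing in for a parsed RSS 1.0 document --->
<cfxml variable="myRSS">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <item rdf:about="https://www.example.com/a">
    <title>First item</title>
    <link>https://www.example.com/a</link>
    <description>Sample description</description>
  </item>
  <item rdf:about="https://www.example.com/b">
    <title>Second item</title>
    <link>https://www.example.com/b</link>
  </item>
</rdf:RDF>
</cfxml>

<cfset items = XmlSearch(myRSS, "//*[name()='item']")>

<cfloop from="1" to="#ArrayLen(items)#" index="i">
  <cfset item = items[i]>
  <cfoutput>
    <!--- Read each child by name, skipping any the item does not have --->
    <cfif StructKeyExists(item, "title")>Title: #item.title.XmlText#<br></cfif>
    <cfif StructKeyExists(item, "link")>Link: #item.link.XmlText#<br></cfif>
    <cfif StructKeyExists(item, "description")>Description: #item.description.XmlText#<br></cfif>
  </cfoutput>
</cfloop>
```

Children that carry a prefix in the feed would need bracket notation with the full name as written, so this sketch only covers the unprefixed core elements.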