Saturday, October 12, 2013

Parsing XML with XPath and a User Agent

While working on an Android project, I needed to parse XML RSS feeds and use the links they contain further down in the application. I opted for XPath because it is simple and built in (XPathFactory).

It all worked fine except for one site. That site detects the device (via the User-Agent) on its RSS feed and reformats the links accordingly. This means that I got links to the mobile site instead of links to the main site. This is a strange and, in my view, unnecessary implementation, but that was the case and I had to deal with it.

The XPath parser receives the XML document through an InputSource, which doesn't give me any control over the parameters passed to the site:

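// Build an XPath evaluator from the built-in factory.
XPathFactory xPathFact = XPathFactory.newInstance();
XPath xPath = xPathFact.newXPath();

// rss_url is assumed to be the feed URL as a String; the parser opens the HTTP connection itself.
InputSource XPInputSrc = new InputSource(rss_url);

// Collect every <link> element that sits directly under an <item>.
NodeList links_nodes = (NodeList) xPath.evaluate("//item/link", XPInputSrc, XPathConstants.NODESET);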

Again, this worked great with all sites but one.

What I actually needed was to send a User-Agent header with the HTTP request, but I couldn't find a direct way to pass one through the InputSource.

At one point, I thought I might use something like Jsoup to fetch the XML document myself, then pass that to the InputSource via a StringReader!
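For reference, here is a rough sketch of that fetch-it-yourself route. It is only an illustration under a few assumptions: it uses the built-in HttpURLConnection in place of Jsoup, it hands the response stream straight to the InputSource (a StringReader works the same way once the text is already in memory, as main shows), and the class and method names are made up for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original project.
class ManualFetchSketch {

    // Runs the same "//item/link" query as above on any InputSource
    // and returns the trimmed text of each matching element.
    static List<String> extractLinks(InputSource source) throws XPathExpressionException {
        XPath xPath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xPath.evaluate("//item/link", source, XPathConstants.NODESET);
        List<String> links = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            links.add(nodes.item(i).getTextContent().trim());
        }
        return links;
    }

    // Opens the connection manually so the User-Agent header can be set
    // for this one request, then lets the parser read the response stream.
    static List<String> fetchLinks(String feedUrl, String userAgent)
            throws IOException, XPathExpressionException {
        HttpURLConnection conn = (HttpURLConnection) new URL(feedUrl).openConnection();
        conn.setRequestProperty("User-Agent", userAgent);
        try (InputStream in = conn.getInputStream()) {
            return extractLinks(new InputSource(in));
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws Exception {
        // Offline demonstration of the StringReader path with a made-up feed.
        String sample = "<rss><channel><item><link>http://example.com/a</link></item></channel></rss>";
        System.out.println(extractLinks(new InputSource(new StringReader(sample))));
    }
}
```

The extra plumbing (opening and closing the connection by hand) is the price of controlling the header per request.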

Good thing I dismissed this idea, and found that if I set the "http.agent" system property, it gets sent as the User-Agent in all HTTP requests! So, I did this:

System.setProperty("http.agent", "Mozilla/5.0");

This did the trick!
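Putting the pieces together, the whole flow might look something like the sketch below. The class name, method name, and command-line argument are hypothetical, and the sketch sets the property before the first HTTP connection is opened, on the assumption that some runtimes read "http.agent" only once. On a desktop JVM the header that actually goes out is the given value followed by the Java version, so the site sees something that starts with "Mozilla/5.0".

```java
import java.util.ArrayList;
import java.util.List;

import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical wrapper class, not part of the original project.
class FeedLinks {

    // Value taken from the post; any string the site treats as a desktop browser should do.
    static final String USER_AGENT = "Mozilla/5.0";

    static List<String> fetchLinks(String rssUrl) throws XPathExpressionException {
        // Set the property before the parser opens its connection,
        // so the request carries the desired User-Agent.
        System.setProperty("http.agent", USER_AGENT);

        XPath xPath = XPathFactory.newInstance().newXPath();

        // The parser resolves the URL and performs the HTTP request itself.
        InputSource source = new InputSource(rssUrl);
        NodeList nodes = (NodeList) xPath.evaluate("//item/link", source, XPathConstants.NODESET);

        List<String> links = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            links.add(nodes.item(i).getTextContent().trim());
        }
        return links;
    }

    public static void main(String[] args) throws Exception {
        // args[0] is expected to be the feed URL.
        for (String link : fetchLinks(args[0])) {
            System.out.println(link);
        }
    }
}
```

Keep in mind that a system property is global: every request made through the built-in URL connection in the same process picks up the same User-Agent, not just the RSS feed request.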