Supporting the Desperate RSS Hacker

(via MoreLikeThis)

Over at MoreLikeThis, there’s some discussion about screen-scraping using XSL and Tidy. I’ve already done this kind of thing, but entirely within Cocoon: the HTMLGenerator runs the input through JTidy, and I give it an XPath expression to “dig” into the page and get to the goodies I want. After that, it’s just a matter of using one XSL stylesheet to transform to an intermediate format, and then another to produce the output format I want.
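
For anyone without Cocoon handy, here is a rough sketch of the "dig in with XPath, then build an intermediate format" step in plain Python. It is not the Cocoon setup described above: it assumes the page has already been tidied into well-formed XML, it uses ElementTree's limited XPath subset, and it builds the intermediate document in code instead of with XSL. The sample page, the `jobs` table id, and the `listings`/`listing` element names are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical page, assumed to be ALREADY tidied into well-formed XML
# (in Cocoon, the HTMLGenerator/JTidy step would take care of that).
TIDIED_PAGE = """<html><body>
<div id="nav"><a href="/">Home</a></div>
<table id="jobs">
<tr><td><a href="/job/1">Java Developer</a></td><td>Full-time</td></tr>
<tr><td><a href="/job/2">XSL Wrangler</a></td><td>Contract</td></tr>
</table>
</body></html>"""


def extract_listings(xhtml):
    """Dig into the page and return a small intermediate <listings> tree."""
    root = ET.fromstring(xhtml)
    listings = ET.Element("listings")
    # The path expression skips the navigation and goes straight to the rows
    # of the (hypothetical) jobs table.
    for row in root.findall(".//table[@id='jobs']/tr"):
        link = row.find("td/a")
        cells = row.findall("td")
        if link is None or len(cells) < 2:
            continue  # skip header or spacer rows
        listing = ET.SubElement(listings, "listing")
        ET.SubElement(listing, "title").text = link.text
        ET.SubElement(listing, "url").text = link.get("href")
        ET.SubElement(listing, "type").text = cells[1].text
    return listings


intermediate = extract_listings(TIDIED_PAGE)
print(ET.tostring(intermediate, encoding="unicode"))
```

The point of the intermediate format is that each site only needs its own extraction step; everything after that can be shared.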

I’m using this not only in openWeather, but also in a little personal project of mine that aggregates job listings from sites like Techies.com, Dice.com, and Monster, and displays everything in a nice, quick-to-read format. Now, instead of checking four sites, I just hit my private Cocoon page and can scan through all the listings in about 10 seconds. Very efficient.

Edit: Oh yeah, I should turn those job listings into RSS feeds. I could be light-years ahead of any of the job sites! :)
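
As a sketch of what that last step might look like, the snippet below turns a list of already-scraped listings into a bare-bones RSS 2.0 document. It is only an illustration in Python, not an XSL stylesheet: the `listings_to_rss` function, the dictionary keys, and the example.com URLs are all hypothetical, and the sample data is invented.

```python
import xml.etree.ElementTree as ET


def listings_to_rss(listings, feed_title, feed_link):
    """Build a minimal RSS 2.0 feed (as a string) from listing dicts."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires title, link and description on the channel.
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = "Aggregated job listings"
    for job in listings:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = job["title"]
        ET.SubElement(item, "link").text = job["url"]
        ET.SubElement(item, "description").text = "Found on " + job["source"]
    return ET.tostring(rss, encoding="unicode")


# Invented sample data standing in for the aggregated listings.
sample = [
    {"title": "Java Developer", "url": "http://example.com/job/1", "source": "site A"},
    {"title": "XSL Wrangler", "url": "http://example.com/job/2", "source": "site B"},
]

feed = listings_to_rss(sample, "My job listings", "http://example.com/jobs")
print(feed)
```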
