Parser suggestion(s)

Avatar (SF) Matt Pe... 2 post(s)

I’m not sure exactly what the problem is, but coincidentally I’ve been looking into screen scraping for other purposes, and it looks like there are two pretty slick libraries for doing this.

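Both are HTML parsers that turn messy real-world HTML into well-formed XML (as SAX events or a DOM tree), so you can use XPath queries to find elements. The XPath queries could live in a config file, so when a scraped page changes its layout you would only have to update the queries, not the code.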

NekoHTML:
http://www.apache.org/~andyc/neko/doc/html/

TagSoup:
http://mercury.ccil.org/~cowan/XML/tagsoup/
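
To make the config-file idea a bit more concrete, here is a rough sketch in Java using only the JDK's built-in XML and XPath classes. It assumes the page has already been cleaned up into well-formed XML (that is the part NekoHTML or TagSoup would do; neither library is actually called here). The class name XPathScraper, the extract method, and the query names "title" and "firstLink" are all made up for illustration, and the queries would normally be loaded from a .properties file rather than an inline string. Depending on the parser and its settings, the real output may use different element-name case or an XML namespace, so the queries would probably need adjusting.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: runs a set of named XPath queries against a document.
class XPathScraper {

    // xml:     well-formed XML text (assumed to be the output of an HTML cleaner)
    // queries: name -> XPath expression, e.g. loaded from a .properties file
    // returns: name -> string value of the first match ("" when nothing matches)
    static Map<String, String> extract(String xml, Properties queries) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();

            // TreeMap keeps the results sorted by query name.
            Map<String, String> results = new TreeMap<>();
            for (String name : queries.stringPropertyNames()) {
                results.put(name, xpath.evaluate(queries.getProperty(name), doc));
            }
            return results;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse or query document", e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a config file; in practice use queries.load(new FileReader(...)).
        Properties queries = new Properties();
        queries.load(new StringReader(
                "title=/html/head/title\n" +
                "firstLink=//a[1]/@href\n"));

        // Stand-in for the cleaned-up page.
        String xml = "<html><head><title>Example</title></head>"
                + "<body><a href=\"http://example.com/\">link</a></body></html>";

        System.out.println(extract(xml, queries));
    }
}
```

If the site changes its markup, only the two query lines would need editing, which is the whole point of keeping them out of the code.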