Iudex is a general-purpose web crawler and feed processor in Ruby/Java. The iudex-html gem contains filters for HTML parsing, filtering, and extracting text and links.
David Kellum
gem "iudex-html", "~> 1.2.b.2"