Faster, lighter: jsoup version 1.2.2 released

2010-Jul-11

I am pleased to announce that jsoup version 1.2.2 is now available for download.

jsoup is a Java library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of DOM, CSS, and jQuery-like methods.
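To give a feel for that API, here is a minimal sketch that parses a page, looks up an element DOM-style, selects links CSS-style, and then modifies them jQuery-style. It assumes a jsoup jar is on the classpath. The class name, sample HTML and the linkHrefs helper are hypothetical, and the calls (Jsoup.parse, getElementById, select, attr, addClass) are written against a current jsoup release rather than checked against 1.2.2 specifically.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class JsoupBasics {
    // Hypothetical sample page; the unquoted id and unclosed <p> mimic messy real-world HTML.
    static final String HTML =
            "<html><head><title>Example</title></head><body>"
            + "<div id=content>"
            + "<a href='http://example.com/one'>One</a>"
            + "<a href=\"http://example.com/two\">Two</a>"
            + "<p>No link here"
            + "</div></body></html>";

    /** Hypothetical helper: returns the href of every link inside the #content div. */
    static List<String> linkHrefs(Document doc) {
        // DOM-style lookup by id.
        Element content = doc.getElementById("content");
        // CSS-style selection: anchors that carry an href attribute.
        Elements links = content.select("a[href]");
        List<String> hrefs = new ArrayList<>();
        for (Element link : links) {
            hrefs.add(link.attr("href"));
        }
        return hrefs;
    }

    public static void main(String[] args) {
        // Parse the HTML into a DOM tree.
        Document doc = Jsoup.parse(HTML);
        System.out.println(doc.title());    // Example
        System.out.println(linkHrefs(doc)); // [http://example.com/one, http://example.com/two]

        // jQuery-like manipulation: add a class to every matched link in one call.
        doc.select("a[href]").addClass("external");
        System.out.println(doc.select("a.external").size()); // 2
    }
}
```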

Performance optimisations

The main focus of this release has been on optimising jsoup to make it faster and lighter.

The core parser engine now runs 3.5 times faster than before, and the HTML generator is 2.5 times faster. Memory overhead is considerably lower, and there are fewer garbage collections.

Combined, these optimisations make jsoup blaze through parsing, data extraction, and HTML generation.

Regular expression selectors

I've also added two new selectors:

  • el:matches(regex) finds elements whose text matches the supplied regular expression
  • el[key~=regex] finds elements that have an attribute named key whose value matches the regular expression

These selectors provide a very powerful and flexible method of finding and extracting data from the DOM. For more information, see the Selector documentation.
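As a rough illustration of both selectors, the sketch below picks out paragraphs containing a four-digit number and images whose src ends in "png". It assumes jsoup is on the classpath; the sample markup, class name and helper methods are hypothetical. Character classes such as [0-9] are used instead of escapes such as \d to avoid doubling backslashes in Java string literals.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class RegexSelectors {
    // Hypothetical sample markup for illustration.
    static final String HTML =
            "<div>"
            + "<p>Order 1234 has shipped</p>"
            + "<p>Thanks for your order</p>"
            + "<img src='photo.jpg'>"
            + "<img src='diagram.png'>"
            + "</div>";

    /** el:matches(regex): paragraphs whose text contains four consecutive digits. */
    static Elements paragraphsWithOrderNumber(Document doc) {
        return doc.select("p:matches([0-9]{4})");
    }

    /** el[key~=regex]: images whose src attribute value ends in "png". */
    static Elements pngImages(Document doc) {
        return doc.select("img[src~=png$]");
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);
        System.out.println(paragraphsWithOrderNumber(doc).text()); // Order 1234 has shipped
        System.out.println(pngImages(doc).attr("src"));            // diagram.png
    }
}
```

Because :matches tests an element's combined text, including that of its descendants, the p qualifier matters here: a bare :matches([0-9]{4}) would also select enclosing elements such as the div. In the attribute selector, the $ anchor pins the match to the end of the src value.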

If you have any suggestions for the next release, I would love to hear them; please get in touch via the mailing list.