jsoup Java HTML Parser release 1.11.2

2017-Nov-19

jsoup 1.11.2 is primarily a bugfix release that improves stability and robustness across different servers and when handling soupy HTML.

jsoup is a Java library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of DOM, CSS, and jQuery-like methods.

Download jsoup now.

Improvements

  • Added a new pseudo selector :matchText, which allows text nodes to match as if they were elements. This enables finding text that is only marked by a br tag, for example.
  • Change: marked Connection.validateTLSCertificates() as deprecated.
  • Normalize invisible characters (like soft-hyphens) in Element.text().
  • Added Element.wholeText(), to easily get the un-normalized text value of an element and its children.
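
As a rough illustration of the two text-related additions above, the sketch below selects br-separated text runs with :matchText and compares Element.text() with Element.wholeText(). The class name, method names, and sample HTML are made up for this example. The selector documentation notes that :matchText modifies the DOM, so the sketch runs it on a freshly parsed document rather than one that is reused elsewhere.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of jsoup.
class TextSelectionSketch {

    // Collects the text of each text node inside <p> elements,
    // using :matchText so that text nodes can be selected like elements.
    static List<String> textRuns(String html) {
        // Parse a throwaway document, because :matchText rewrites the DOM.
        Document doc = Jsoup.parse(html);
        Elements runs = doc.select("p:matchText");
        List<String> out = new ArrayList<>();
        for (Element run : runs) {
            out.add(run.text());
        }
        return out;
    }

    // Returns { normalized text, un-normalized text } of the first <p>.
    static String[] normalizedAndWhole(String html) {
        Element p = Jsoup.parse(html).selectFirst("p");
        return new String[] { p.text(), p.wholeText() };
    }

    public static void main(String[] args) {
        // Text separated only by <br> tags.
        System.out.println(textRuns("<div><p>One<br>Two<br>Three</p></div>"));

        // text() collapses the newline to a space; wholeText() keeps it.
        String[] both = normalizedAndWhole("<p>Hello\nthere</p>");
        System.out.println("[" + both[0] + "]");
        System.out.println("[" + both[1] + "]");
    }
}
```

If the document is needed again afterwards, cloning it before running a :matchText query is one way to keep the original tree untouched.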

Bug Fixes

  • Bugfix: in a deep DOM stack, a StackOverflowError could occur when generating implied end tags.
  • Bugfix: when parsing attribute values that happened to cross a buffer boundary, a character was dropped.
  • Bugfix: fixed an issue that prevented using infinite timeouts in Jsoup.Connection.
  • Bugfix: whitespace-preserving tags were not honoured when nested more than two levels deep.
  • Bugfix: an unterminated comment token at the end of the HTML input would cause an out of bounds exception.
  • Bugfix: fixed an NPE in the Cleaner that would occur if an <a href> attribute value was missing.
  • Bugfix: when serializing the same document in multiple threads, on Android, with a character set that is not ASCII or UTF-8, an encoding exception could occur.
  • Bugfix: removing a form value from the DOM would not remove it from FormData.
  • Bugfix: in the W3CDom transformer, siblings were incorrectly inheriting namespaces defined on previous siblings.
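
Two of the fixes above are easy to exercise from client code. The following is a minimal sketch under a few assumptions: the class and method names are hypothetical, the URL is a placeholder, and the form markup is invented for the example. It relies on Connection.timeout(0) being treated as an infinite timeout, and on FormElement.formData() reflecting a control that has been removed from the DOM. The connection is only configured here, never executed.

```java
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.FormElement;

import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of jsoup.
class FixedBehaviourSketch {

    // Configures a connection with a timeout of zero, which jsoup
    // treats as "no timeout". No request is sent by this method.
    static Connection withoutTimeout(String url) {
        return Jsoup.connect(url).timeout(0);
    }

    // Removes the named input from the first form in the HTML and
    // returns the keys that formData() still reports afterwards.
    static List<String> keysAfterRemoval(String html, String inputName) {
        Document doc = Jsoup.parse(html);
        FormElement form = (FormElement) doc.selectFirst("form");
        Element field = form.selectFirst("input[name=" + inputName + "]");
        field.remove();

        List<String> keys = new ArrayList<>();
        for (Connection.KeyVal kv : form.formData()) {
            keys.add(kv.key());
        }
        return keys;
    }

    public static void main(String[] args) {
        // Placeholder URL; the request is configured but not executed.
        Connection conn = withoutTimeout("https://example.com/");
        System.out.println(conn.request().timeout());

        String html = "<form action=\"/submit\">"
                + "<input name=\"user\" value=\"alice\">"
                + "<input name=\"pass\" value=\"secret\">"
                + "</form>";
        System.out.println(keysAfterRemoval(html, "pass"));
    }
}
```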

Many thanks to everyone who contributed patches, suggestions, and bug reports. If you have any suggestions for the next release, I would love to hear them; please get in touch via the mailing list or contact me directly.