jsoup release 1.7.3

2013-Nov-10

jsoup 1.7.3 introduces improved form handling, more robust character-set detection, speed and memory optimisations in parsing and CSS selectors, and a set of bug fixes.

jsoup is a Java library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of DOM, CSS, and jQuery-like methods.
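
As a rough illustration of that API, the sketch below parses an HTML string and collects the absolute URLs of its links with a CSS selector. The class name ExtractLinks, the helper linkUrls, and the sample markup are made up for this example; only Jsoup.parse, select, and absUrl come from jsoup itself.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ExtractLinks {
    // Hypothetical helper: returns the absolute URL of every <a> that has an href.
    static List<String> linkUrls(String html, String baseUri) {
        // The base URI lets jsoup resolve relative links such as "/news".
        Document doc = Jsoup.parse(html, baseUri);
        List<String> urls = new ArrayList<String>();
        for (Element link : doc.select("a[href]")) {
            urls.add(link.absUrl("href"));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Sample markup invented for this sketch.
        String html = "<p><a href='/news'>News</a> <a>no href</a></p>";
        for (String url : linkUrls(html, "http://example.com/")) {
            System.out.println(url);
        }
    }
}
```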

Improvements

  • Added the FormElement element type, to facilitate simple form submissions. Find the forms in a document using Elements.forms(), then prepare one for submission with FormElement.submit().
  • Improved the reliability of HTTP character-set recognition from response headers, particularly when servers return out-of-spec responses.
  • Added Document.location() to retrieve the document's location URL. Handy if the request was redirected from the original URL.
  • Large decrease in the number of temporary objects created during parsing, leading to less GC load (particularly helpful on Android) and faster parsing.
  • Improved the time to match elements with common CSS selectors by about 27%.
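
The new form handling might be used along these lines. This is a minimal sketch, not a definitive recipe: the class name FormSketch, the helper prepareFirstForm, the sample markup, and the example.com URL are all assumptions made for illustration. It works offline, because FormElement.submit() only prepares a Connection; nothing is sent until execute() is called on it.

```java
import java.util.List;

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.FormElement;

public class FormSketch {
    // Hypothetical helper: finds the first form in the HTML and prepares its submission.
    static Connection prepareFirstForm(String html, String baseUri) {
        // A base URI is needed so a relative form action can be made absolute.
        Document doc = Jsoup.parse(html, baseUri);
        List<FormElement> forms = doc.select("form").forms();
        if (forms.isEmpty()) {
            throw new IllegalArgumentException("no form found in document");
        }
        // submit() builds the request from the form's action, method, and fields.
        return forms.get(0).submit();
    }

    public static void main(String[] args) {
        // Sample markup invented for this sketch.
        String html = "<form action='/login' method='post'>"
                + "<input type='text' name='user' value='alice'>"
                + "</form>";
        Connection con = prepareFirstForm(html, "http://example.com/");
        System.out.println(con.request().method() + " " + con.request().url());
        // Calling con.execute() here would actually send the request over the network.
    }
}
```

When the document is fetched over HTTP instead of parsed from a string, Document.location() on the result reports the URL the response finally came from.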

Bug Fixes

  • Fixed support for self-closing script tags.
  • Fixed a crash when reading an unterminated CDATA section.
  • Fixed an issue where elements added via the adoption agency algorithm did not preserve their attributes.
  • Fixed an issue where cloning a document with very deeply nested elements could cause a stack overflow.
  • Fixed an issue when connecting or redirecting to a URL that contains a space.

Many thanks to everyone who contributed patches, suggestions, and bug reports. If you have any suggestions for the next release, I would love to hear them; please get in touch via the mailing list or contact me directly.