jsoup release 1.7.3
2013-Nov-10
jsoup 1.7.3 introduces improved form handling, more robust character-set detection, speed and memory optimisations in parsing and CSS selectors, and a set of bug fixes.
jsoup is a Java library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of DOM, CSS, and jQuery-like methods.
Improvements
- Added the element type FormElement, to facilitate simple form submissions. Find forms in a doc using Elements.forms(), then prepare one for submission with FormElement.submit().
- Improved the reliability of HTTP character-set recognition from response headers, particularly when servers return out-of-spec responses.
- Added Document.location() to retrieve the document's location URL. Handy if the request was redirected from the original URL.
- Greatly reduced the number of temporary objects created during parsing, leading to less GC load (particularly helpful on Android) and faster parsing.
- Reduced the time to match elements with common CSS selectors by ~27%.
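
As a rough illustration of the FormElement and Document.location() additions listed above, here is a minimal sketch that runs without any network access. It assumes jsoup is on the classpath. The sample markup, the example.com base URL, and the class and method names (FormSketch, parseSample, prepareSubmission) are made up for illustration and are not part of jsoup.

```java
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.FormElement;

import java.util.List;

public class FormSketch {
    // Hypothetical markup and base URL, for illustration only.
    static final String HTML =
            "<form action='/search' method='post'>"
          + "<input type='text' name='q' value='jsoup'>"
          + "</form>";
    static final String BASE_URI = "http://example.com/page";

    /** Parses the sample page; the base URI lets jsoup resolve the relative form action. */
    static Document parseSample() {
        return Jsoup.parse(HTML, BASE_URI);
    }

    /** Finds the first form in the document and prepares (but does not send) its submission. */
    static Connection prepareSubmission(Document doc) {
        List<FormElement> forms = doc.select("form").forms();
        FormElement form = forms.get(0);
        return form.submit();
    }

    public static void main(String[] args) {
        Document doc = parseSample();

        // For a document parsed from a string, location() reports the base URI it was given.
        System.out.println("Location: " + doc.location());

        Connection con = prepareSubmission(doc);

        // Inspect the prepared request; nothing has been sent at this point.
        System.out.println("Method:   " + con.request().method());
        System.out.println("URL:      " + con.request().url());
        for (Connection.KeyVal kv : con.request().data()) {
            System.out.println("Data:     " + kv.key() + "=" + kv.value());
        }

        // To actually submit (requires network access):
        //   Document result = con.execute().parse();
        //   String finalUrl = result.location(); // URL after any redirects
    }
}
```

Because submit() returns a Connection rather than sending anything, the request can still be adjusted (extra data, cookies, a timeout) before it is executed.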
Bug Fixes
- Fixed support for self-closing script tags.
- Fixed a crash when reading an unterminated CDATA section.
- Fixed an issue where elements added via the adoption agency algorithm did not preserve their attributes.
- Fixed an issue where cloning a document with extremely deeply nested elements could cause a stack overflow.
- Fixed an issue when connecting or redirecting to a URL that contains a space.
Many thanks to everyone who contributed patches, suggestions, and bug reports. If you have any suggestions for the next release, I would love to hear them; please get in touch via the mailing list or contact me directly.