jsoup release 1.7.2
jsoup 1.7.2 introduces selectors for structural pseudo CSS classes, full support for international supplementary characters, and a raft of improvements and bug fixes.
jsoup is a Java library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of DOM, CSS, and jQuery-like methods.
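As a rough illustration of that API, the sketch below parses an HTML string and pulls out the first link with a CSS selector. The class name `JsoupBasics`, the helper `firstLinkHref`, and the sample markup are hypothetical names chosen for this example; only the `Jsoup`, `Document`, and `Element` calls come from the library.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupBasics {
    // Hypothetical helper: returns the href of the first link in the HTML, or null if none.
    static String firstLinkHref(String html) {
        Document doc = Jsoup.parse(html);               // parse real-world HTML into a DOM
        Element link = doc.select("a[href]").first();   // CSS selector; first() is null when nothing matches
        return link == null ? null : link.attr("href");
    }

    public static void main(String[] args) {
        // Sample markup made up for this sketch.
        String html = "<p>See <a href='http://jsoup.org/'>jsoup</a>.</p>";
        System.out.println(firstLinkHref(html));
    }
}
```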
- Added support for supplementary characters outside of the Basic Multilingual Plane.
- Added support for structural pseudo CSS selectors, including
- Added a maximum body response size to Jsoup.Connection, to prevent running out of memory when trying to read extremely large documents. The default is 1 MB.
- Refactored the HTML Cleaner to traverse rather than recurse child nodes, to avoid the risk of overflowing the stack.
- Added Element.insertChildren(int, java.util.Collection), to easily insert a list of child nodes at a specific index.
- Added Node.childNodesCopy(), to create an independent copy of a Node's children.
- When parsing in XML mode, preserve XML declarations (<?xml ... ?>).
- Added Parser.parseXmlFragment(), to allow easy parsing of XML fragments.
- Allowed Whitelist test methods to be extended.
- Added Document.OutputSettings.outline mode, to aid HTML debugging by printing out in outline mode, similar to browser HTML inspectors.
- When parsing, allow all tags to self-close. Tags that aren't expected to self-close will get an end tag.
- Fixed an issue when parsing <textarea>/RCData tags containing unescaped closing tags that would drop the trailing
- When cloning an Element, reset the classnames set so as not to hold a pointer to the source's.
- Corrected the javadoc for Element#child() to note that it can throw
- Limit how far up the stack the formatting adoption agency algorithm will travel, to prevent the chance of a run-away parse when the HTML stack is hopelessly deep.
- Changed Element.text() to build text by traversing child nodes rather than recursing. This avoids stack-overflow errors when the DOM is very deep and the VM stack-size is low.
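Several of the additions listed above can be tried together in a few lines. The following is a minimal sketch, not an official sample: the class name `Jsoup172Sketch`, its helper methods, the sample markup, the `http://example.com/` URL, and the 2 MB limit are all assumptions made for illustration. The connection is only configured, never executed, and the Connection type is written here as `org.jsoup.Connection`, as returned by `Jsoup.connect()`.

```java
import java.util.List;
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.parser.Parser;

public class Jsoup172Sketch {
    // Sample markup made up for this sketch: one populated list and one empty list.
    static final String HTML =
        "<ul id='a'><li>One</li><li>Two</li><li>Three</li></ul><ul id='b'></ul>";

    // Structural pseudo selectors: pick list items by their position among siblings.
    static String firstItemText(Document doc) {
        return doc.select("#a li:first-child").text();
    }

    static String secondItemText(Document doc) {
        return doc.select("#a li:nth-child(2)").text();
    }

    // childNodesCopy() + insertChildren(): copy the children of #a into #b at index 0,
    // leaving #a untouched. Returns the number of element children #b ends up with.
    static int copyItems(Document doc) {
        Element source = doc.getElementById("a");
        Element target = doc.getElementById("b");
        List<Node> copies = source.childNodesCopy();
        target.insertChildren(0, copies);
        return target.children().size();
    }

    // parseXmlFragment(): parse a bare XML fragment into a list of top-level nodes.
    static int countXmlNodes(String xml) {
        return Parser.parseXmlFragment(xml, "").size();
    }

    // maxBodySize(): raise the response size cap for one connection (illustrative 2 MB).
    // Nothing is fetched here; calling get() on the result would perform the request.
    static Connection configuredConnection(String url) {
        return Jsoup.connect(url).maxBodySize(2 * 1024 * 1024);
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);
        System.out.println(firstItemText(doc));   // One
        System.out.println(secondItemText(doc));  // Two
        System.out.println(copyItems(doc));       // 3
        System.out.println(countXmlNodes("<item id='1'>A</item><item id='2'>B</item>"));

        // Outline mode: print the body in outline form to help with debugging.
        doc.outputSettings().outline(true);
        System.out.println(doc.body().html());
    }
}
```

Copying with `childNodesCopy()` before calling `insertChildren()` keeps the source list intact; passing the original child nodes instead would move them out of their current parent.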
Many thanks to everyone who contributed patches, suggestions, and bug reports. If you have any suggestions for the next release, I would love to hear them; please get in touch via the mailing list or to me directly.