jsoup release 1.8.3
2015-Aug-02
jsoup 1.8.3 includes performance improvements when parsing large HTML files, automatically switches to the XML parser when fetching XML documents, and has some important bug fixes.
jsoup is a Java library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of DOM, CSS, and jQuery-like methods.
Improvements
- Performance improvement on parsing larger HTML pages: around 1.7x faster on Android KitKat, and around 1.3x faster on Android Lollipop. The gains come largely from re-ordering the HtmlTreeBuilder methods based on analysis of various websites, along with further memory reduction for nodes with no children and other tweaks.
- When fetching XML URLs, automatically switch to the XML parser instead of the HTML parser.
- Improved support for boolean attributes in HTML5.
- When serialising XML, ensure that '<' characters in attributes are escaped, per spec. Not required in HTML.
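The attribute-escaping rule in the last item can be sketched in plain Java. This is only an illustration of the rule as described above, not jsoup's actual implementation: the class AttributeEscaper, the method escapeAttributeValue, and the xmlSyntax flag are hypothetical names invented for this example.

```java
public class AttributeEscaper {

    /**
     * Hypothetical helper (not part of the jsoup API): escapes an attribute
     * value for output inside double quotes. '&' and '"' are always escaped;
     * '<' is escaped only when serialising XML, since HTML does not require it.
     */
    public static String escapeAttributeValue(String value, boolean xmlSyntax) {
        StringBuilder out = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&':
                    out.append("&amp;");
                    break;
                case '"':
                    out.append("&quot;");
                    break;
                case '<':
                    // Only the XML output mode needs this escape.
                    if (xmlSyntax) {
                        out.append("&lt;");
                    } else {
                        out.append(c);
                    }
                    break;
                default:
                    out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String value = "a < b";
        System.out.println(escapeAttributeValue(value, true));  // a &lt; b
        System.out.println(escapeAttributeValue(value, false)); // a < b
    }
}
```

In jsoup itself, the output syntax is selected through the document's output settings rather than a boolean flag like the one assumed here.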
Bug Fixes
- Fixed an issue in Element.elementSiblingIndex() (and related methods) where sibling elements with the same content would incorrectly have the same sibling index.
- Fixed an issue where unexpected elements in a badly nested table could be moved to the wrong location in the document.
- Fixed an issue where a table nested within a TH cell would parse to an incorrect tree.
- When serializing a document using the XHTML encoding entities, if the character set did not support &nbsp; chars (such as Shift_JIS), the character would be skipped. For visibility, jsoup will now always output &#xa0; (the hex code for a non-breaking space) when using XHTML encoding entities (as &nbsp; is not defined there), regardless of the output character set.
- Fixed an issue when resolving URLs, where if the absolute URL had no path, the relative URL was not normalized correctly.
- Fixed an issue where connections that were redirected to a relative URL did not have the same normalization rules as a URL read from Node.absUrl(String).
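To illustrate the sibling-index fix listed above, here is a minimal sketch of one plausible way this kind of bug can arise: looking up a node's position by content equality rather than by object identity. It does not use or reproduce jsoup's code; the Item class and both lookup methods are hypothetical stand-ins invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class SiblingIndexSketch {

    /** Hypothetical stand-in for an element whose equals() compares content. */
    public static final class Item {
        final String content;

        public Item(String content) {
            this.content = content;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Item && ((Item) o).content.equals(content);
        }

        @Override
        public int hashCode() {
            return content.hashCode();
        }
    }

    /** Equality-based lookup: returns the first sibling with equal content. */
    public static int indexByEquality(List<Item> siblings, Item target) {
        return siblings.indexOf(target);
    }

    /** Identity-based lookup: returns the position of this exact object. */
    public static int indexByIdentity(List<Item> siblings, Item target) {
        for (int i = 0; i < siblings.size(); i++) {
            if (siblings.get(i) == target) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Item first = new Item("<p>One</p>");
        Item second = new Item("<p>One</p>"); // same content, different node
        List<Item> siblings = new ArrayList<>();
        siblings.add(first);
        siblings.add(second);

        System.out.println(indexByEquality(siblings, second)); // 0 (wrong sibling)
        System.out.println(indexByIdentity(siblings, second)); // 1
    }
}
```

With an equality-based lookup, both items report index 0 because their content matches; the identity-based lookup distinguishes them.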
Many thanks to everyone who contributed patches, suggestions, and bug reports. If you have any suggestions for the next release, I would love to hear them; please get in touch via the mailing list or to me directly.