jsoup release 1.8.3
jsoup 1.8.3 includes performance improvements when parsing large HTML files, automatically switches to the XML parser when fetching XML documents, and has some important bug fixes.
jsoup is a Java library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of DOM, CSS, and jQuery-like methods.
- Performance improvement when parsing larger HTML pages: around 1.7x faster on Android KitKat, and around 1.3x faster on Android Lollipop. The improvements come largely from re-ordering the HtmlTreeBuilder methods based on analysis of various websites, and also from further memory reduction for nodes with no children, plus other tweaks.
- When fetching XML URLs, automatically switch to the XML parser instead of the HTML parser.
- Improved support for boolean attributes in HTML5.
- When serializing XML, ensure that '<' characters in attribute values are escaped, per spec. This is not required in HTML.
- Fixed an issue in Element.elementSiblingIndex() (and related methods) where sibling elements with the same content would incorrectly have the same sibling index.
- Fixed an issue where unexpected elements in a badly nested table could be moved to the wrong location in the document.
- Fixed an issue where a table nested within a TH cell would parse to an incorrect tree.
- When serializing a document using the XHTML encoding entities, if the output character set did not support &nbsp; characters (such as Shift_JIS), the character would be skipped. For visibility, jsoup will now always output &#xa0; (the hex code for a non-breaking space) when using XHTML encoding entities (as &nbsp; is not defined in that entity set), regardless of the output character set.
- Fixed an issue when resolving URLs where, if the absolute (base) URL had no path, the relative URL was not normalized correctly.
- Fixed an issue where connections that were redirected to a relative URL did not apply the same normalization rules as a URL read from Node.absUrl(String).
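Two of the fixes above can be illustrated with standard-library Java. This is a minimal sketch, not jsoup's code: FakeElement, indexByEquality, indexByIdentity and resolve are hypothetical names. For the sibling-index fix, it assumes the bug came from locating an element with an equals()-based search, so two distinct siblings with equal content map to the same index; matching on reference identity avoids that. For the URL fix, it assumes the JDK's java.net.URL behavior where "./" segments are only cleaned up when the base URL has a non-empty path, and works around it by giving a path-less base a leading "/".

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

class ReleaseNotesSketch {

    // Hypothetical stand-in for a DOM element whose equals() compares content,
    // so two distinct elements with the same tag and text are considered equal.
    static final class FakeElement {
        final String tag;
        final String text;

        FakeElement(String tag, String text) {
            this.tag = tag;
            this.text = text;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof FakeElement)) return false;
            FakeElement other = (FakeElement) o;
            return tag.equals(other.tag) && text.equals(other.text);
        }

        @Override
        public int hashCode() {
            return 31 * tag.hashCode() + text.hashCode();
        }
    }

    // Buggy pattern: List.indexOf() uses equals(), so it returns the index of
    // the first sibling with equal content, not of the element itself.
    static int indexByEquality(List<FakeElement> siblings, FakeElement search) {
        return siblings.indexOf(search);
    }

    // Fixed pattern: match on reference identity instead of equals().
    static int indexByIdentity(List<FakeElement> siblings, FakeElement search) {
        for (int i = 0; i < siblings.size(); i++) {
            if (siblings.get(i) == search) return i;
        }
        return -1;
    }

    // Resolves a relative URL against a base URL. If the base has no path
    // (e.g. "http://example.com"), it is rebuilt with a leading "/" first so
    // that java.net.URL removes "./" segments from the resolved path.
    static String resolve(String base, String relative) {
        try {
            URL baseUrl = new URL(base);
            if (!baseUrl.getPath().startsWith("/")) {
                baseUrl = new URL(baseUrl.getProtocol(), baseUrl.getHost(),
                        baseUrl.getPort(), "/" + baseUrl.getFile());
            }
            return new URL(baseUrl, relative).toExternalForm();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Bad URL: " + base + " / " + relative, e);
        }
    }

    public static void main(String[] args) {
        FakeElement first = new FakeElement("p", "same");
        FakeElement second = new FakeElement("p", "same");
        List<FakeElement> siblings = new ArrayList<>();
        siblings.add(first);
        siblings.add(second);

        System.out.println(indexByEquality(siblings, second)); // 0: matches `first`
        System.out.println(indexByIdentity(siblings, second)); // 1: the element itself
        System.out.println(resolve("http://example.com", "./one/two.html"));
    }
}
```

Running main prints 0, then 1, then http://example.com/one/two.html. Without the leading "/" added to the base, the "./" segment would be left in the resolved path.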
Many thanks to everyone who contributed patches, suggestions, and bug reports. If you have any suggestions for the next release, I would love to hear them; please get in touch via the mailing list or contact me directly.