jsoup release 1.8.3

2015-Aug-02

jsoup 1.8.3 includes performance improvements when parsing large HTML files, automatically switches to the XML parser when fetching XML documents, and has some important bug fixes.

jsoup is a Java library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of DOM, CSS, and jQuery-like methods.

Improvements

  • Performance improvement when parsing larger HTML pages: around 1.7x faster on Android KitKat, and around 1.3x faster on Android Lollipop. The gains come largely from re-ordering the HtmlTreeBuilder methods based on analysis of various websites, and also from further memory reduction for nodes with no children, plus other tweaks.
  • When fetching XML URLs, automatically switch to the XML parser instead of the HTML parser.
  • Improved support for boolean attributes in HTML5.
  • When serialising XML, ensure that '<' characters in attributes are escaped, per spec. Not required in HTML.
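
The attribute-escaping rule in the last item can be illustrated with a small standalone sketch. This is not jsoup's serializer: the class name AttributeEscapeSketch, the Syntax enum, and the escapeAttribute method are hypothetical names introduced only for illustration, and the sketch assumes attribute values are written inside double quotes.

```java
public class AttributeEscapeSketch {
    // Hypothetical output modes for this sketch; not part of the jsoup API.
    enum Syntax { HTML, XML }

    // Escapes an attribute value that will be written inside double quotes.
    // '&' and '"' are escaped in both modes; '<' is escaped only for XML.
    static String escapeAttribute(String value, Syntax syntax) {
        StringBuilder out = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&':
                    out.append("&amp;");
                    break;
                case '"':
                    out.append("&quot;");
                    break;
                case '<':
                    if (syntax == Syntax.XML) {
                        out.append("&lt;");
                    } else {
                        out.append(c); // left as-is in HTML mode
                    }
                    break;
                default:
                    out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String value = "a < b & c";
        System.out.println(escapeAttribute(value, Syntax.HTML)); // a < b &amp; c
        System.out.println(escapeAttribute(value, Syntax.XML));  // a &lt; b &amp; c
    }
}
```

The only difference between the two modes is the handling of '<', which mirrors the note above that the escape is required for XML output but not for HTML.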

Bug Fixes

  • Fixed an issue in Element.elementSiblingIndex() (and related methods) where sibling elements with the same content would incorrectly have the same sibling index.
  • Fixed an issue where unexpected elements in a badly nested table could be moved to the wrong location in the document.
  • Fixed an issue where a table nested within a TH cell would parse to an incorrect tree.
  • When serializing a document using the XHTML encoding entities, if the character set did not support &nbsp; chars (such as Shift_JIS), the character would be skipped. For visibility, jsoup will now always output &#xa0; (the hex code for a non-breaking space) when using XHTML encoding entities (as &nbsp; is not defined), regardless of the output character set.
  • Fixed an issue when resolving URLs, where if the absolute URL had no path, the relative URL was not normalized correctly.
  • Fixed an issue where connections that were redirected to a relative URL did not have the same normalization rules as a URL read from Node.absUrl(String).
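
The sibling-index fix in the first item is easier to picture with a standalone sketch. These notes do not state the cause, so the sketch assumes one plausible mechanism: looking up an element's position with a content-based equals() finds the first sibling with equal content, whereas an identity comparison finds the exact object. The Item class and both lookup methods are hypothetical stand-ins, not jsoup code.

```java
import java.util.ArrayList;
import java.util.List;

public class SiblingIndexSketch {
    // Hypothetical stand-in for an element whose equals() compares content.
    static final class Item {
        final String content;

        Item(String content) {
            this.content = content;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Item && ((Item) o).content.equals(content);
        }

        @Override
        public int hashCode() {
            return content.hashCode();
        }
    }

    // Equality-based lookup: returns the first sibling with equal content.
    static int indexByEquals(List<Item> siblings, Item target) {
        return siblings.indexOf(target);
    }

    // Identity-based lookup: returns the position of this exact object.
    static int indexByIdentity(List<Item> siblings, Item target) {
        for (int i = 0; i < siblings.size(); i++) {
            if (siblings.get(i) == target) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<Item> siblings = new ArrayList<>();
        Item first = new Item("<p>One</p>");
        Item second = new Item("<p>One</p>"); // same content, different object
        siblings.add(first);
        siblings.add(second);

        System.out.println(indexByEquals(siblings, second));   // 0
        System.out.println(indexByIdentity(siblings, second)); // 1
    }
}
```

With the equality-based lookup both items report index 0, which matches the symptom described above; the identity-based lookup gives each sibling its own index.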

Many thanks to everyone who contributed patches, suggestions, and bug reports. If you have any suggestions for the next release, I would love to hear them; please get in touch via the mailing list or contact me directly.