jsoup Java HTML Parser release 1.16.1

2023-Apr-29

jsoup 1.16.1 is out now. It includes a number of improvements and bug fixes, particularly to HTML pretty-printing.

jsoup is a Java library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of HTML5 DOM methods and CSS selectors.

Download jsoup now.

Improvements

  • Jsoup.connect(String url) now natively supports URLs with Unicode characters in the path or query string, without the caller having to escape them. #1914
  • Calling Node.remove() on a node with no parent is now a no-op, rather than a validation error. #1898
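
As a rough illustration of the two improvements above, here is a minimal sketch. It assumes jsoup 1.16.1 or later is on the classpath; the class name, the method names, and the example.com URL are placeholders made up for this example. The Unicode characters are written as Java escapes only to keep the source file ASCII-safe. The request is built but never sent, so nothing here depends on network access.

```java
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class ImprovementsSketch {

    // Builds (but does not send) a request whose path and query contain
    // unescaped Unicode. The URL is a hypothetical placeholder:
    // \u00e9 is an accented "e", and the query value is three CJK characters.
    static Connection unicodeRequest() {
        return Jsoup.connect("https://example.com/caf\u00e9?q=\u65e5\u672c\u8a9e");
    }

    // Calls remove() on an element that was never attached to a parent,
    // then reports whether it is still parentless afterwards.
    static boolean removeDetachedIsNoOp() {
        Element orphan = new Element("div");
        orphan.remove(); // 1.16.1 and later: no-op; earlier releases raised a validation error
        return orphan.parent() == null;
    }

    public static void main(String[] args) {
        Connection connection = unicodeRequest();
        // connection.get() would perform the fetch; it is left out so the sketch runs offline.
        System.out.println("Request prepared for: " + connection.request().url());
        System.out.println("Detached remove() left the node parentless: " + removeDetachedIsNoOp());
    }
}
```

In practice this means cleanup code can call remove() without first checking whether the node has a parent.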

Bug Fixes

  • Aligned the HTML Tree Builder processing steps for AfterBody and AfterAfterBody to the updated WHATWG standard, to not pop the stack to close <body> or <html> elements. This prevents an errant </html> closing the preceding structure. Also added appropriate error message outputs in this case. #1851
  • Corrected support for ruby elements (<ruby>, <rp>, <rt>, and <rtc>) to current spec. #1294
  • In Jsoup.connect(String url), if the input URL had components that were already percent-escaped, they were escaped a second time, which caused errors when the URL was fetched. #1902
  • When tracking input source positions, text in tables that was fostered had invalid positions. #1927
  • When pretty-printing, the first inline Element or Comment in a block would not be wrap-indented if it was preceded by a blank text node. #1906
  • When pretty-printing a <pre> containing block tags, those tags were incorrectly indented. #1891
  • When pretty-printing nested inlineable blocks (such as a <p> in a <td>), the inner element is now indented. #1926
  • When pretty-printing, <br> tags are now wrap-indented when in block tags (and not when in inline tags). #1911
  • The contents of a sufficiently large <textarea> with un-escaped HTML closing tags could be incorrectly parsed to an empty node. #1929
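
To see the tree builder change from the first fix in practice, one option is to parse input that has content after the closing </body> and </html> tags, with error tracking switched on. The sketch below is only an illustration under stated assumptions: the class name, method names, and sample HTML are invented for this example, and the exact wording of any reported parse errors is not shown because it may differ between versions.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.parser.ParseError;
import org.jsoup.parser.Parser;

public class StrayClosingTagSketch {

    // Made-up input: a second paragraph appears after </body> and </html>.
    static final String HTML = "<html><body><p>One</p></body></html><p>Two</p>";

    // Parses the sample input with the supplied parser and an empty base URI.
    static Document parse(Parser parser) {
        return Jsoup.parse(HTML, "", parser);
    }

    // Counts the <p> elements that end up in the parsed document.
    static int paragraphCount() {
        return parse(Parser.htmlParser()).select("p").size();
    }

    public static void main(String[] args) {
        // Track up to 10 parse errors so that stray markup is reported.
        Parser parser = Parser.htmlParser().setTrackErrors(10);
        Document doc = parse(parser);

        // Pretty-printing is on by default, so this prints the indented body content.
        System.out.println(doc.body().html());

        for (ParseError error : parser.getErrors()) {
            System.out.println("Parse error: " + error);
        }
    }
}
```

Both paragraphs are kept in the document, and the tracked errors give a way to spot where the input strayed from the expected structure.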


My sincere thanks to everyone who contributed patches, suggestions, and bug reports. If you have any suggestions for the next release, I would love to hear them; please get in touch with me directly.

You can also follow me (@jhy@tilde.zone) on Mastodon / Fediverse to receive occasional notes about jsoup releases.