jsoup Java HTML Parser release 1.16.1
2023-Apr-29
jsoup 1.16.1 is out now. It includes a number of improvements, particularly to HTML pretty-printing, along with several bug fixes.
jsoup is a Java library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of HTML5 DOM methods and CSS selectors.
Download jsoup now.
Improvements
- In Jsoup.connect(String url), natively support URLs with Unicode characters in the path or query string, without having to be escaped by the caller. #1914
- Calling Node.remove() on a node with no parent is now a no-op, vs a validation error. #1898
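
As a minimal sketch of the Node.remove() change, assuming jsoup 1.16.1 or later is on the classpath. The class and method names here (RemoveOrphanExample, removeOrphan) are made up for illustration and are not part of jsoup:

```java
import org.jsoup.nodes.Element;

public class RemoveOrphanExample {
    /** Calls remove() on a parentless element and reports whether it is still parentless. */
    public static boolean removeOrphan() {
        Element orphan = new Element("div"); // freshly created, so it has no parent
        orphan.remove(); // a no-op on 1.16.1+; earlier releases raised a validation error here
        return orphan.parent() == null;
    }

    public static void main(String[] args) {
        System.out.println(removeOrphan());
    }
}
```

In practice this means cleanup code can call remove() without first checking whether the node is still attached to a tree.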
Bug Fixes
- Aligned the HTML Tree Builder processing steps for AfterBody and AfterAfterBody to the updated WHATWG standard, to not pop the stack to close <body> or <html> elements. This prevents an errant </html> closing the preceding structure. Also added appropriate error message outputs in this case. #1851
- Corrected support for ruby elements (<ruby>, <rp>, <rt>, and <rtc>) to current spec. #1294
- When using Node.before(Node) or Node.after(Node), if the incoming node was a sibling of the context node, the incoming node may be inserted into the wrong relative location. #1898
- In Jsoup.connect(String url), if the input URL had components that were already % escaped, they would be escaped again, causing errors when fetched. #1902
- When tracking input source positions, text in tables that was fostered had invalid positions. #1927
- If the Document.OutputSettings class was initialized, and then Entities.escape(String) called, an NPE may be thrown due to a class loading circular dependency. #1910
- When pretty-printing, the first inline Element or Comment in a block would not be wrap-indented if it were preceded by a blank text node. #1906
- When pretty-printing a <pre> containing block tags, those tags were incorrectly indented. #1891
- When pretty-printing nested inlineable blocks (such as a <p> in a <td>), the inner element should be indented. #1926
- <br> tags should be wrap-indented when in block tags (and not when in inline tags). #1911
- The contents of a sufficiently large <textarea> with un-escaped HTML closing tags may be incorrectly parsed to an empty node. #1929
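
To illustrate the Node.before(Node) fix listed above, here is a small sketch that moves an element in front of one of its own siblings. It assumes jsoup 1.16.1 or later is on the classpath. The sample markup and the names SiblingMoveExample and moveFirstBeforeLast are hypothetical, chosen only for this example:

```java
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SiblingMoveExample {
    /** Moves the first <p> so that it sits directly before the last <p>, then returns the text order. */
    public static List<String> moveFirstBeforeLast() {
        Document doc = Jsoup.parse("<div><p id=a>A</p><p id=b>B</p><p id=c>C</p></div>");
        Element first = doc.getElementById("a");
        Element last = doc.getElementById("c");

        // The incoming node (first) is already a sibling of the context node (last).
        last.before(first);

        // Collect the text of each paragraph in document order.
        return doc.select("div > p").eachText();
    }

    public static void main(String[] args) {
        System.out.println(moveFirstBeforeLast()); // on 1.16.1+: [B, A, C]
    }
}
```

With the fix in place, the moved paragraph lands immediately before the context node, so callers do not need to detach a sibling by hand before re-inserting it.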
My sincere thanks to everyone who contributed patches, suggestions, and bug reports. If you have any suggestions for the next release, I would love to hear them; please get in touch with me directly.
You can also follow me (@jhy@tilde.zone) on Mastodon / Fediverse to receive occasional notes about jsoup releases.