jsoup Java HTML Parser release 1.21.2
2025-Aug-25
jsoup 1.21.2 is out now, adding support for custom SSLContext in HTTP/2 connections, and improving consistency in how user data is handled in attributes. It also brings performance gains in DOM manipulation and fragment parsing, and fixes several edge cases in stream parsing, traversal, cloning, and concurrent reads.
jsoup is a Java library for working with real-world HTML and XML. It provides a very convenient API for extracting and manipulating data, using the best of HTML5 DOM methods and CSS selectors.
Changes
- Deprecated internal (yet visible) methods Normalizer#normalize(String, boolean) and Attribute#shouldCollapseAttribute(Document.OutputSettings). These will be removed in a future version.
- Deprecated Connection#sslSocketFactory(SSLSocketFactory) in favor of the new Connection#sslContext(SSLContext). Using sslSocketFactory will force the use of the legacy HttpURLConnection implementation, which does not support HTTP/2. #2370
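For readers moving from sslSocketFactory to sslContext, the following is a minimal sketch of building an SSLContext with only JDK classes. The class name TlsContexts and the method fromTrustStore are hypothetical names chosen for illustration, not part of jsoup. The commented-out jsoup call at the end assumes that Connection#sslContext(SSLContext) chains like the other Connection setters.

```java
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

// Hypothetical helper for illustration; not part of jsoup.
class TlsContexts {
    /**
     * Builds a TLS context that trusts the certificates in the given key store.
     * Passing null falls back to the JVM's default trust store.
     */
    static SSLContext fromTrustStore(KeyStore trustStore) {
        try {
            TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
            tmf.init(trustStore);
            SSLContext context = SSLContext.getInstance("TLS");
            // No client key managers, and the default SecureRandom.
            context.init(null, tmf.getTrustManagers(), null);
            return context;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Could not build TLS context", e);
        }
    }

    public static void main(String[] args) {
        SSLContext context = fromTrustStore(null);
        System.out.println(context.getProtocol());
        // With jsoup 1.21.2 on the classpath, the context would presumably be supplied as:
        //   Jsoup.connect("https://example.com/").sslContext(context).get();
    }
}
```

Unlike a bare SSLSocketFactory, an SSLContext can be handed to either HTTP implementation, which is why this path keeps HTTP/2 available.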
Improvements
- When pretty-printing, if there are consecutive text nodes (via DOM manipulation), the non-significant whitespace between them will be collapsed. #2349.
- Updated Connection.Response#statusMessage() to return a simple loggable string message (e.g. "OK") when using the HttpClient implementation, which doesn't otherwise return any server-set status message. #2356
- Attributes#size() and Attributes#isEmpty() now exclude any internal attributes (such as user data) from their count. This aligns with the attributes' serialized output and iterator. #2369
- Added Connection#sslContext(SSLContext) to provide a custom SSL (TLS) context to requests, supporting both the HttpClient and the legacy HttpURLConnection implementations. #2370
- Performance optimizations for DOM manipulation methods, including when repeatedly removing an element's first child (element.child(0).remove()), and when using Parser#parseBodyFragment() to parse a large number of direct children. #2373.
Bug Fixes
- When parsing from an InputStream and a multibyte character happened to straddle a buffer boundary, the stream would not be completely read. #2353.
- In NodeTraversor, if a last child element was removed during the head() call, the parent would be visited twice. #2355.
- Cloning an Element that has an Attributes object would add an empty internal user-data attribute to that clone, which would cause unexpected results for Attributes#size() and Attributes#isEmpty(). #2356
- In a multithreaded application where multiple threads are calling Element#children() on the same element concurrently, a race condition could happen when the method was generating the internal child element cache (a filtered view of its child nodes). Since concurrent reads of DOM objects should be threadsafe without external synchronization, this method has been updated to execute atomically. #2366
- Malformed HTML could throw an IndexOutOfBoundsException during the adoption agency algorithm. #2377.
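To illustrate the class of problem behind the InputStream fix (#2353), here is a minimal sketch, using only JDK classes, of decoding a UTF-8 stream in small chunks while carrying an incomplete multibyte sequence over to the next read. This is not jsoup's internal reader; the class ChunkedUtf8Reader and its methods are hypothetical names chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; not jsoup's implementation.
class ChunkedUtf8Reader {
    /** Decodes a UTF-8 stream in fixed-size chunks, keeping partial sequences between reads. */
    static String readAll(InputStream in, int bufferSize) {
        if (bufferSize < 4) // a UTF-8 sequence can be up to 4 bytes long
            throw new IllegalArgumentException("bufferSize must be at least 4");
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPLACE)
            .onUnmappableCharacter(CodingErrorAction.REPLACE);
        ByteBuffer bytes = ByteBuffer.allocate(bufferSize);
        CharBuffer chars = CharBuffer.allocate(bufferSize);
        StringBuilder out = new StringBuilder();
        try {
            boolean eof = false;
            while (!eof) {
                int read = in.read(bytes.array(), bytes.position(), bytes.remaining());
                if (read == -1) eof = true;
                else bytes.position(bytes.position() + read);
                bytes.flip();
                CoderResult result;
                do {
                    result = decoder.decode(bytes, chars, eof);
                    chars.flip();
                    out.append(chars);
                    chars.clear();
                } while (result.isOverflow());
                // Keep any undecoded trailing bytes (a split character) for the next read.
                bytes.compact();
            }
            decoder.flush(chars);
            chars.flip();
            out.append(chars);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    /** Encodes text as UTF-8 and decodes it again through a buffer of the given size. */
    static String roundTrip(String text, int bufferSize) {
        byte[] data = text.getBytes(StandardCharsets.UTF_8);
        return readAll(new ByteArrayInputStream(data), bufferSize);
    }

    public static void main(String[] args) {
        // 1-, 2-, 3- and 4-byte characters, so some straddle a 4-byte buffer boundary.
        String text = "a\u00e9\u20ac\uD83D\uDE00z";
        System.out.println(text.equals(roundTrip(text, 4)));
    }
}
```

The key step is the compact() call: treating an underflow with leftover bytes as end of input, rather than carrying those bytes forward, is one way a reader can stop before the stream is fully consumed.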
My sincere thanks to everyone who contributed to this release! If you have any suggestions for the next release, I would love to hear them; please get in touch via jsoup discussions, or with me directly.
You can also follow me (@jhy@tilde.zone) on Mastodon / Fediverse to receive occasional notes about jsoup releases.
Download jsoup now.