jsoup Java HTML Parser release 1.21.2

2025-Aug-25

jsoup 1.21.2 is out now, adding support for custom SSLContext in HTTP/2 connections, and improving consistency in how user data is handled in attributes. It also brings performance gains in DOM manipulation and fragment parsing, and fixes several edge cases in stream parsing, traversal, cloning, and concurrent reads.

jsoup is a Java library for working with real-world HTML and XML. It provides a very convenient API for extracting and manipulating data, using the best of HTML5 DOM methods and CSS selectors.

Changes

Improvements

  • When pretty-printing, if there are consecutive text nodes (via DOM manipulation), the non-significant whitespace between them will be collapsed. #2349.
  • Updated Connection.Response#statusMessage() to return a simple loggable string message (e.g. "OK") when using the HttpClient implementation, which doesn't otherwise return any server-set status message. #2356
  • Attributes#size() and Attributes#isEmpty() now exclude any internal attributes (such as user data) from their count. This aligns with the attributes' serialized output and iterator. #2369
  • Added Connection#sslContext(SSLContext) to provide a custom SSL (TLS) context to requests, supporting both the HttpClient and the legacy HttUrlConnection implementations. #2370
  • Performance optimizations for DOM manipulation methods including when repeatedly removing an element's first child (element.child(0).remove(), and when using Parser#parseBodyFragement() to parse a large number of direct children. #2373.

Bug Fixes

  • When parsing from an InputStream and a multibyte character happened to straddle a buffer boundary, the stream would not be completely read. #2353.
  • In NodeTraversor, if a last child element was removed during the head() call, the parent would be visited twice. #2355.
  • Cloning an Element that has an Attributes object would add an empty internal user-data attribute to that clone, which would cause unexpected results for Attributes#size() and Attributes#isEmpty(). #2356
  • In a multithreaded application where multiple threads are calling Element#children() on the same element concurrently, a race condition could happen when the method was generating the internal child element cache (a filtered view of its child nodes). Since concurrent reads of DOM objects should be threadsafe without external synchronization, this method has been updated to execute atomically. #2366
  • Malformed HTML could throw an IndexOutOfBoundsException during the adoption agency. #2377.

My sincere thanks to everyone who contributed to this release! If you have any suggestions for the next release, I would love to hear them; please get in touch via jsoup discussions, or with me directly.

You can also follow me (@jhy@tilde.zone) on Mastodon / Fediverse to receive occasional notes about jsoup releases.

Download jsoup now.