jsoup Java HTML Parser release 1.15.3

2022-Aug-24

jsoup 1.15.3 is out now, and includes a security fix for potential XSS attacks, along with other improvements including more descriptive validation error messages.

jsoup is a Java library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of HTML5 DOM methods and CSS selectors.

Download jsoup now.

Security

Improvements

  • The Cleaner will preserve the source position of cleaned elements, if source tracking is enabled in the original parse.
  • The error messages output from Validate are more descriptive. Exceptions are now ValidationExceptions (extending IllegalArgumentException). Stack traces do not include the Validate class, to make it simpler to see where the exception originated. Common validation errors including malformed URLs and empty selector results have more explicit error messages.
  • Build Improvement: added implementation version and related fields to the jar manifest. #1809
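
To illustrate the Validate change described above, here is a minimal, self-contained sketch of how a validation exception can extend IllegalArgumentException and drop the validator's own frames from its stack trace. The names SketchValidate and SketchValidationException are hypothetical stand-ins, not jsoup classes, and this is not presented as jsoup's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a validation exception; not a jsoup class.
class SketchValidationException extends IllegalArgumentException {
    SketchValidationException(String msg) {
        super(msg);
    }

    @Override
    public synchronized Throwable fillInStackTrace() {
        // Capture the full trace first, then remove the validator's frames
        // so the top frame is the caller that passed the bad argument.
        super.fillInStackTrace();
        String validator = SketchValidate.class.getName();
        List<StackTraceElement> kept = new ArrayList<>();
        for (StackTraceElement frame : getStackTrace()) {
            if (!frame.getClassName().equals(validator)) {
                kept.add(frame);
            }
        }
        setStackTrace(kept.toArray(new StackTraceElement[0]));
        return this;
    }
}

// Hypothetical stand-in for a validation helper; not a jsoup class.
class SketchValidate {
    static void notEmpty(String value, String name) {
        if (value == null || value.isEmpty()) {
            throw new SketchValidationException(
                "The '" + name + "' parameter must not be empty.");
        }
    }
}
```

Because the exception extends IllegalArgumentException, callers that already catch IllegalArgumentException keep working unchanged.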

Bug Fixes

  • DataUtil would read incorrectly from InputStreams that returned fewer bytes than requested in a single read. This led to incorrect results when parsing chunked server responses, for example. #1807
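
The short-read problem behind the DataUtil fix is a general InputStream pitfall, so a small sketch may help. It uses only the Java standard library; TrickleInputStream and ShortReadDemo are hypothetical names made up for illustration, and the loop shows the usual remedy rather than jsoup's actual code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical stream that returns at most `chunk` bytes per read call,
// roughly as a chunked network response might.
class TrickleInputStream extends InputStream {
    private final InputStream in;
    private final int chunk;

    TrickleInputStream(byte[] data, int chunk) {
        this.in = new ByteArrayInputStream(data);
        this.chunk = chunk;
    }

    @Override
    public int read() throws IOException {
        return in.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        // Deliberately hand back fewer bytes than were requested.
        return in.read(b, off, Math.min(len, chunk));
    }
}

class ShortReadDemo {
    // Fragile pattern: assumes one read call fills the buffer.
    static byte[] readOnce(InputStream in, int max) throws IOException {
        byte[] buf = new byte[max];
        int n = in.read(buf, 0, max);
        return Arrays.copyOf(buf, Math.max(n, 0));
    }

    // Robust pattern: keep reading until the buffer is full or the stream ends.
    static byte[] readFully(InputStream in, int max) throws IOException {
        byte[] buf = new byte[max];
        int total = 0;
        while (total < max) {
            int n = in.read(buf, total, max - total);
            if (n == -1) {
                break; // end of stream
            }
            total += n;
        }
        return Arrays.copyOf(buf, total);
    }
}
```

With a stream that trickles 4 bytes at a time, readOnce returns only the first 4 bytes, while readFully returns the whole input.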

My sincere thanks to everyone who contributed patches, suggestions, and bug reports. If you have any suggestions for the next release, I would love to hear them; please get in touch with me directly.

You can also follow me (@jhy) on Twitter to receive occasional notes about jsoup releases.