jsoup Java HTML Parser release 1.17.1

2023-Nov-27

jsoup 1.17.1 is out now with support for request-level authentication, attribute name & value source ranges, stream() iterable support, the :is() selector, and a bunch of other improvements and bug fixes.

jsoup is a Java library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of HTML5 DOM methods and CSS selectors.

Deprecation note: over the last few releases, a number of mostly internal methods have been deprecated. Please review your use of these methods and migrate away from them now if applicable. They will be removed in a future release.

Download jsoup now.

Improvements

  • Request-Level Authentication: Added support for request-level authentication in Jsoup.connect(), enabling authentication to proxies and servers.
  • Stream Interface: Introduced the NodeIterator class for efficient node tree traversal using the Iterator interface. Added the Element#stream() and Node#nodeStream() methods, which return a Stream, for fluent, composable stream pipelines of node traversals.
  • XML OutputSettings: Changing the OutputSettings syntax to XML now automatically sets the xhtml EscapeMode as the default.
  • :is() Selector: Added the :is(selector list) pseudo-selector to find elements that match any of the selectors in the selector list. This improves readability for large ORed selectors.
  • JPMS Module Support: Repackaged the library with native JPMS module support.
  • Source Position Fidelity: Improved fidelity of source positions when tracking is enabled. Implicitly created or closed elements are now trackable via Range.isImplicit().
  • Attribute Source Positions: Enabled source positions for attribute names and values when source tracking is on. Attribute#sourceRange() provides the ranges.
  • Virtual Threads: Improved performance under Java 21+ Virtual Threads by replacing the internal ConstrainableInputStream with ControllableInputStream.
  • XML Mimetype Support: Extended XML mimetype support in Jsoup.connect() to include any XML mimetype.
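
As a rough illustration of four of the additions above (Element#stream(), the :is() selector, Attribute#sourceRange(), and request-level authentication), here is a minimal sketch. The class name, method names, sample markup, URL, and credentials are all hypothetical and exist only for this example; it assumes jsoup 1.17.1 or later is on the classpath. The authentication method only configures a connection and does not execute a request.

```java
import java.util.List;
import java.util.stream.Collectors;

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Attribute;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Range;
import org.jsoup.parser.Parser;

public class Jsoup1171Sketch {
    // Hypothetical sample markup, used only for this illustration.
    static final String HTML =
        "<div><h1>One</h1><p>Para</p><h2>Two</h2>"
        + "<a href=\"/a\">A</a><a href=\"/b\">B</a></div>";

    // Element#stream(): traverse the document as a Stream of elements
    // and collect the href of every element that has one.
    static List<String> hrefs() {
        Document doc = Jsoup.parse(HTML);
        return doc.stream()
            .filter(el -> el.hasAttr("href"))
            .map(el -> el.attr("href"))
            .collect(Collectors.toList());
    }

    // :is(selector list): match elements that satisfy any selector in the list.
    static List<String> headings() {
        return Jsoup.parse(HTML).select(":is(h1, h2)").eachText();
    }

    // Attribute#sourceRange(): ranges are only recorded when position
    // tracking is enabled on the parser before parsing.
    static boolean hrefRangesTracked() {
        Parser parser = Parser.htmlParser().setTrackPosition(true);
        Document doc = Jsoup.parse(HTML, "", parser);
        Element link = doc.selectFirst("a");
        for (Attribute attr : link.attributes()) {
            if (attr.getKey().equals("href")) {
                Range.AttributeRange range = attr.sourceRange();
                return range.nameRange().isTracked()
                    && range.valueRange().isTracked();
            }
        }
        return false;
    }

    // Request-level authentication: the callback supplies credentials when
    // a server or proxy asks for them. No request is sent here.
    static Connection authenticated(String url, String user, String password) {
        return Jsoup.connect(url)
            .auth(auth -> auth.credentials(user, password));
    }

    public static void main(String[] args) {
        System.out.println(hrefs());
        System.out.println(headings());
        System.out.println(hrefRangesTracked());
    }
}
```

In real use, the authentication callback would usually check auth.isServer() or auth.isProxy(), and the request URL, before returning credentials, so that they are not sent to an unexpected host after a redirect.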

Bug Fixes

  • XML Data Nodes: Fixed a bug where HTML elements parsed as data nodes were not correctly emitted as CDATA nodes when outputting with XML syntax.
  • Immediate Parent Selector: Corrected a bug where the Immediate Parent selector (>) could match elements above the root context element.
  • Sub-Query Parsing: Resolved a bug where combinators following the Or combinator (,) in a sub-query were incorrectly skipped.
  • Empty Doctype: Fixed a bug in W3CDom where the conversion would fail if the jsoup input document contained an empty doctype. The doctype is now discarded, and the conversion continues.
  • SVG Elements Cleaning: Fixed incorrect nesting when cleaning a document containing SVG elements or other foreign elements with preserved-case names.
  • Unknown Self-Closing Tags: The output style of unknown self-closing tags from the input is now preserved when cleaning a document.

Build Improvements

  • Local Test Proxy: Added a local test proxy implementation for proxy integration tests.
  • HTTPS Request Tests: Added tests for HTTPS request support using a local self-signed certificate. Includes proxy tests.

Changes