jsoup Java HTML Parser release 1.17.1
2023-Nov-27
jsoup 1.17.1 is out now with support for request-level authentication, attribute name & value source ranges, stream() and iterable support, the :is() selector, and a bunch of other improvements and bug fixes.
jsoup is a Java library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of HTML5 DOM methods and CSS selectors.
Deprecation note: over the last few releases, a number of mostly internal methods have been deprecated. Please review your use of any of these methods and migrate away from them now if applicable. These will be removed in a following release.
Download jsoup now.
Improvements
- Request-Level Authentication: Added support for request-level authentication in Jsoup.connect(), enabling authentication to proxies and servers.
- Elements DOM Mutators: In the Elements list, added direct support for Elements#set(int, Element), Elements#remove(int), Elements#remove(Object), Elements#clear(), Elements#removeAll(), Elements#retainAll(), Elements#removeIf(), and Elements#replaceAll(). These methods update the original DOM as well as the Elements list.
- Stream Interface: Introduced the NodeIterator class for efficient node tree traversal using the Iterator interface. Added Element#stream() and Node#nodeStream() methods for fluent, composable stream pipelines of node traversals.
- XML OutputSettings: When the OutputSettings syntax is changed to XML, the xhtml EscapeMode is now set automatically as the default.
- :is() Selector: Added the :is(selector list) pseudo-selector, which finds elements that match any of the selectors in the selector list. This improves readability for large ORed selectors.
- JPMS Module Support: Repackaged the library with native JPMS module support.
- Source Position Fidelity: Improved the fidelity of source positions when tracking is enabled. Implicitly created or closed elements are now trackable via Range.isImplicit().
- Attribute Source Positions: Attribute names and values now have source positions when source tracking is enabled; Attribute#sourceRange() provides the ranges.
- Virtual Threads: Improved performance under Java 21+ virtual threads by replacing the internal ConstrainableInputStream with ControllableInputStream.
- XML Mimetype Support: Extended XML mimetype support in Jsoup.connect() to include any XML mimetype.
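Several of the selection and traversal additions above can be exercised together in a short program. This is a minimal sketch that assumes jsoup 1.17.1 or later is on the classpath. The class ReleaseHighlights, its helper method names, and the sample HTML strings are hypothetical and are used only for illustration. The program touches :is(), Elements#removeIf(), Element#stream(), Node#nodeStream(), NodeIterator, and the XML escape-mode default.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Entities;
import org.jsoup.nodes.NodeIterator;
import org.jsoup.nodes.TextNode;
import org.jsoup.select.Elements;

// Hypothetical demo class; assumes jsoup 1.17.1+.
public class ReleaseHighlights {

    // :is(h1, h2, h3) matches elements that match any selector in the list,
    // equivalent here to the ORed selector "h1, h2, h3".
    public static List<String> headingTexts(String html) {
        Document doc = Jsoup.parse(html);
        return doc.select(":is(h1, h2, h3)").eachText();
    }

    // Elements#removeIf() removes matching elements from the DOM as well as
    // from the list, so re-querying the document shows the removal.
    public static int paragraphsAfterRemovingEmpty(String html) {
        Document doc = Jsoup.parse(html);
        Elements paras = doc.select("p");
        paras.removeIf(p -> !p.hasText());
        return doc.select("p").size();
    }

    // Element#stream() yields the element itself and its descendant elements, in document order.
    public static List<String> tagNames(String html) {
        Element root = Jsoup.parse(html).body().child(0);
        return root.stream().map(Element::tagName).collect(Collectors.toList());
    }

    // Node#nodeStream() yields every node (not only elements); keep just the text nodes.
    public static String joinedText(String html) {
        Element body = Jsoup.parse(html).body();
        return body.nodeStream()
                .filter(node -> node instanceof TextNode)
                .map(node -> ((TextNode) node).text())
                .collect(Collectors.joining("|"));
    }

    // NodeIterator walks the same tree through the Iterator interface, filtered to Element nodes.
    public static List<String> idsViaIterator(String html) {
        Document doc = Jsoup.parse(html);
        List<String> ids = new ArrayList<>();
        NodeIterator<Element> it = new NodeIterator<>(doc.body(), Element.class);
        while (it.hasNext()) {
            Element el = it.next();
            if (el.hasAttr("id")) ids.add(el.id());
        }
        return ids;
    }

    // Switching the output syntax to XML should also select the xhtml escape mode.
    public static Entities.EscapeMode escapeModeAfterXmlSyntax() {
        Document doc = Jsoup.parse("<p>x</p>");
        doc.outputSettings().syntax(Document.OutputSettings.Syntax.xml);
        return doc.outputSettings().escapeMode();
    }

    public static void main(String[] args) {
        System.out.println(headingTexts("<h1>Title</h1><p>Body</p><h3>Aside</h3>"));
        System.out.println(paragraphsAfterRemovingEmpty("<p>One</p><p></p><p>Two</p>"));
        System.out.println(tagNames("<div><p>x</p><span>y</span></div>"));
        System.out.println(joinedText("<div><p>x</p><span>y</span></div>"));
        System.out.println(idsViaIterator("<div id=a><p id=b>x</p><p>y</p></div>"));
        System.out.println(escapeModeAfterXmlSyntax());
    }
}
```

The removeIf() helper re-queries the document instead of inspecting the list. This confirms that the mutator changed the DOM itself, which is the behavior these list methods now guarantee.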
Bug Fixes
- XML Data Nodes: Fixed a bug where HTML elements parsed as data nodes were not correctly emitted as CDATA nodes when outputting with XML syntax.
- Immediate Parent Selector: Fixed a bug where the immediate parent selector > could match elements above the root context element.
- Sub-Query Parsing: Fixed a bug where combinators following the , (OR) combinator in a sub-query were incorrectly skipped.
- Empty Doctype: Fixed a bug in W3CDom where the conversion would fail if the jsoup input document contained an empty doctype. The doctype is now discarded, and the conversion continues.
- SVG Elements Cleaning: Fixed incorrect nesting when cleaning a document containing SVG elements or other foreign elements with preserved-case names.
- Unknown Self-Closing Tags: Preserved the output style of unknown self-closing tags from the input when cleaning a document.
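Two of these fixes can be checked directly. The sketch below assumes jsoup 1.17.1 or later. The class BugFixChecks, its method names, and the sample inputs are hypothetical. The first method sets XML output syntax and checks that the script content (a data node) is wrapped in a CDATA section. The second method converts a document with an empty doctype through W3CDom.convert().

```java
import org.jsoup.Jsoup;
import org.jsoup.helper.W3CDom;
import org.jsoup.nodes.Document;

// Hypothetical demo class; assumes jsoup 1.17.1+.
public class BugFixChecks {

    // With XML output syntax, <script> content (parsed as a data node in HTML)
    // should be emitted inside a CDATA section rather than as raw text.
    public static String scriptAsXml(String html) {
        Document doc = Jsoup.parse(html);
        doc.outputSettings().syntax(Document.OutputSettings.Syntax.xml);
        return doc.selectFirst("script").outerHtml();
    }

    // "<!DOCTYPE>" yields a doctype with an empty name; the conversion should
    // now drop it and continue instead of failing.
    public static org.w3c.dom.Document convertWithEmptyDoctype() {
        Document jsoupDoc = Jsoup.parse("<!DOCTYPE><p>Hello</p>");
        return W3CDom.convert(jsoupDoc);
    }

    public static void main(String[] args) {
        System.out.println(scriptAsXml("<script>if (a < b) go();</script>"));
        org.w3c.dom.Document w3c = convertWithEmptyDoctype();
        System.out.println(w3c.getDoctype() == null);
        System.out.println(w3c.getElementsByTagName("p").getLength());
    }
}
```

The script body contains a bare "<" on purpose. Without a CDATA wrapper, that character would make the XML output ill-formed, which is the problem the fix addresses.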
Build Improvements
- Local Test Proxy: Added a local test proxy implementation for proxy integration tests.
- HTTPS Request Tests: Added tests for HTTPS request support using a local self-signed certificate, including proxy tests.
Changes
- Response BodyStream: The InputStream returned by Connection.Response.bodyStream() is now a plain BufferedInputStream.