jsoup Java HTML Parser release 1.17.1
2023-Nov-27
jsoup 1.17.1 is out now with support for request-level authentication, attribute name & value source ranges, stream() and iterable support, the :is() selector, and a bunch of other improvements and bug fixes.
jsoup is a Java library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of HTML5 DOM methods and CSS selectors.
Deprecation note: over the last few releases, a number of mostly internal methods have been deprecated. Please review your use of any of these methods and migrate away from them now if applicable. These will be removed in a following release.
Download jsoup now.
Improvements
- Request-Level Authentication: Added support for request-level authentication in Jsoup.connect(), enabling authentication to proxies and servers.
- Elements DOM Mutators: Added direct support in the Elements list for Elements#set(int, Element), Elements#remove(int), Elements#remove(Object), Elements#clear(), Elements#removeAll(), Elements#retainAll(), Elements#removeIf(), and Elements#replaceAll(). These methods update the original DOM as well as the Elements list.
- Stream Interface: Introduced the NodeIterator class for efficient node tree traversal using the Iterator interface. Added the Element#stream() and Node#nodeStream() methods for fluent, composable stream pipelines over node traversals.
- XML OutputSettings: Changing the OutputSettings syntax to XML now automatically sets the xhtml EscapeMode as the default.
- :is() Selector: Added the :is(selector list) pseudo-selector, which finds elements that match any of the selectors in the list. This improves readability for large ORed selectors.
- JPMS Module Support: Repackaged the library with native JPMS module support.
- Source Position Fidelity: Improved the fidelity of source positions when tracking is enabled. Implicitly created or closed elements are now trackable via Range.isImplicit().
- Attribute Source Positions: When source tracking is on, source positions are now recorded for attribute names and values. Attribute#sourceRange() provides the ranges.
- Virtual Threads: Improved performance under Java 21+ Virtual Threads by replacing the internal ConstrainableInputStream with ControllableInputStream.
- XML Mimetype Support: Extended XML mimetype support in Jsoup.connect() to include any XML mimetype.
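For the request-level authentication item above, a minimal sketch of wiring credentials into a session via Connection#auth. The proxy host, target host, and all usernames and passwords are placeholders; the host check is one possible guard against sending credentials to an unexpected server after a redirect, not something the library requires.

```java
import org.jsoup.Connection;
import org.jsoup.Jsoup;

public class AuthExample {
    // Builds a session that answers both proxy and server authentication challenges.
    // Hosts and credentials are hypothetical placeholders.
    static Connection authenticatedSession() {
        return Jsoup.newSession()
            .proxy("proxy.example.com", 8080)
            .auth(ctx -> {
                if (ctx.isServer()) {
                    // Only hand server credentials to the host we expect
                    if (!"example.com".equals(ctx.url().getHost())) return null;
                    return ctx.credentials("username", "password");
                }
                // Otherwise the challenge came from the proxy
                return ctx.credentials("proxy-user", "proxy-password");
            });
    }

    public static void main(String[] args) {
        Connection session = authenticatedSession();
        // Nothing is sent until a request executes, for example:
        // Document doc = session.newRequest().url("https://example.com/private").get();
        System.out.println(session != null);
    }
}
```

The authenticator is only consulted when a server or proxy actually issues a challenge, so the same session can be reused for unauthenticated requests.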
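For the Elements DOM mutators above, a short sketch showing that list operations now carry through to the parsed document. The HTML snippet and replacement text are illustrative only.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ElementsMutatorsExample {
    static Document demo() {
        Document doc = Jsoup.parse("<p>One</p><p>Two</p><p>Three</p>");
        Elements ps = doc.select("p");

        ps.remove(0);                                 // drops <p>One</p> from the list and the DOM
        ps.removeIf(el -> el.text().equals("Three")); // drops <p>Three</p> the same way
        ps.set(0, new Element("div").text("Two, replaced")); // swaps <p>Two</p> for a <div> in place
        return doc;
    }

    public static void main(String[] args) {
        System.out.println(demo().body().html());
    }
}
```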
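For the stream interface above, a sketch of both entry points: Element#stream() walks this element and its descendant elements, while Node#nodeStream() also visits non-element nodes such as text. The sample document is arbitrary.

```java
import java.util.List;
import java.util.stream.Collectors;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.TextNode;

public class StreamExample {
    static final Document DOC = Jsoup.parse("<div><p>One</p><p>Two <b>bold</b></p></div>");

    // Element#stream(): elements only, in document order
    static List<String> paragraphTexts() {
        return DOC.body().stream()
            .filter(el -> el.tagName().equals("p"))
            .map(Element::text)
            .collect(Collectors.toList());
    }

    // Node#nodeStream(): every node, so text nodes can be counted directly
    static long textNodeCount() {
        return DOC.body().nodeStream()
            .filter(node -> node instanceof TextNode)
            .count();
    }

    public static void main(String[] args) {
        System.out.println(paragraphTexts());
        System.out.println(textNodeCount());
    }
}
```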
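For the XML OutputSettings item above, a sketch showing that switching the syntax to XML is now enough to get the xhtml escape mode, without a separate escapeMode(...) call. The input HTML is only an example.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Entities;

public class XmlSyntaxExample {
    static Document toXmlSyntax(String html) {
        Document doc = Jsoup.parse(html);
        // Setting the XML syntax also switches the escape mode to xhtml
        doc.outputSettings().syntax(Document.OutputSettings.Syntax.xml);
        return doc;
    }

    public static void main(String[] args) {
        Document doc = toXmlSyntax("<p>One&nbsp;Two<br></p>");
        boolean xhtml = doc.outputSettings().escapeMode() == Entities.EscapeMode.xhtml;
        System.out.println("xhtml escape mode: " + xhtml);
        System.out.println(doc.body().html());
    }
}
```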
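For the :is() selector above, a sketch of the pseudo-selector on its own and inside a larger query, where it replaces a repeated OR list. The headings are sample data.

```java
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class IsSelectorExample {
    static final Document DOC = Jsoup.parse(
        "<h1>Title</h1><p>Intro</p><h2>Part</h2><h3>Detail</h3><h4>Minor</h4>");

    // Matches any element that matches one of h1, h2, h3
    static List<String> headings() {
        return DOC.select(":is(h1, h2, h3)").eachText();
    }

    // Inside a combinator: equivalent to "body > h1, body > h2"
    static List<String> topLevelHeadings() {
        return DOC.select("body > :is(h1, h2)").eachText();
    }

    public static void main(String[] args) {
        System.out.println(headings());
        System.out.println(topLevelHeadings());
    }
}
```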
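For the JPMS module support above, a modular application can now require jsoup by its module name. A minimal module-info.java sketch; the application module name com.example.scraper is hypothetical.

```java
// module-info.java for a hypothetical application module
module com.example.scraper {
    requires org.jsoup;
}
```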
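For the two source-position items above, a sketch that enables tracking, checks the implicit end of an unclosed <p>, and reads per-attribute name and value ranges. It assumes Range.AttributeRange, Range#isImplicit(), and Attribute#sourceRange() as described in these notes; the input markup is arbitrary.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Attribute;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Range;
import org.jsoup.parser.Parser;

public class SourceRangeExample {
    // Position tracking must be switched on before parsing
    static Document parseTracked(String html) {
        return Jsoup.parse(html, "", Parser.htmlParser().setTrackPosition(true));
    }

    public static void main(String[] args) {
        Document doc = parseTracked("<p class=\"intro\">One<p>Two");
        Element first = doc.selectFirst("p");

        // The first <p> is closed by the second <p>, so its end range is flagged implicit
        System.out.println("end implicit: " + first.endSourceRange().isImplicit());

        // Each attribute carries separate ranges for its name and its value
        for (Attribute attr : first.attributes()) {
            Range.AttributeRange ranges = attr.sourceRange();
            System.out.println(attr.getKey()
                + " name@" + ranges.nameRange().start().pos()
                + " value@" + ranges.valueRange().start().pos());
        }
    }
}
```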
Bug Fixes
- XML Data Nodes: Fixed a bug where HTML elements parsed as data nodes were not correctly emitted as CDATA nodes when outputting with XML syntax.
- Immediate Parent Selector: Fixed a bug where the immediate parent selector > could match elements above the root context element.
- Sub-Query Parsing: Fixed a bug where combinators following the , (Or) combinator in a sub-query were incorrectly skipped.
- Empty Doctype: Fixed a bug in W3CDom where conversion would fail if the jsoup input document contained an empty doctype. The doctype is now discarded, and the conversion continues.
- SVG Elements Cleaning: Fixed incorrect nesting when cleaning a document containing SVG elements or other foreign elements with preserved-case names.
- Unknown Self-Closing Tags: Preserved the output style of unknown self-closing tags from the input when cleaning a document.
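For the empty doctype fix above, a sketch of converting a jsoup document whose input has a bare <!DOCTYPE> into a W3C DOM. With the fix, the conversion should succeed and simply omit the doctype; the sample markup is arbitrary.

```java
import org.jsoup.Jsoup;
import org.jsoup.helper.W3CDom;

public class W3CDomDoctypeExample {
    // Parse with jsoup, then convert to an org.w3c.dom.Document
    static org.w3c.dom.Document convert(String html) {
        org.jsoup.nodes.Document jsoupDoc = Jsoup.parse(html);
        return W3CDom.convert(jsoupDoc);
    }

    public static void main(String[] args) {
        org.w3c.dom.Document w3c = convert("<!DOCTYPE><p>Hello</p>");
        System.out.println(w3c.getElementsByTagName("p").getLength());
    }
}
```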
Build Improvements
- Local Test Proxy: Added a local test proxy implementation for proxy integration tests.
- HTTPS Request Tests: Added tests for HTTPS request support using a local self-signed certificate, including proxy tests.
Changes
- Response BodyStream: The InputStream returned by Connection.Response.bodyStream() is now a plain BufferedInputStream.