jsoup Java HTML Parser release 1.17.2
2023-Dec-29
jsoup 1.17.2 is out now, with improvements around attribute source position tracking, and a range of bug fixes.
jsoup is a Java library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of HTML5 DOM methods and CSS selectors.
Deprecation note: over the last few releases, a number of mostly internal methods have been deprecated. Please review your use of any of these methods and migrate away from them now if applicable. These will be removed in a following release.
Download jsoup now.
Improvements
- Attribute object accessors: Added Element.attribute(String) and Attributes.attribute(String) to more simply obtain an Attribute object. 2069
- Attribute source tracking: If source tracking is on and an Attribute's key is changed (via Attribute.setKey(String)), the source range is now still tracked in Attribute.sourceRange(). 2070
- Wildcard attribute selector: Added support for the [*] selector, which matches an element with any attribute. Also restored support for selecting by an empty attribute name prefix ([^]). 2079
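A minimal sketch of the two attribute improvements above, assuming jsoup 1.17.2 is on the classpath. It parses with position tracking enabled (Parser.setTrackPosition(true)), reads an attribute through Element.attribute(String), then renames it with Attribute.setKey(String) and reads its source range again. The class and helper names (AttributeAccessDemo, parseTracked, renameAndRead) are hypothetical, and the printed range text is not shown because its exact format is not described in these notes.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Attribute;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

public class AttributeAccessDemo {
    // Parse with source-position tracking on, so attribute ranges are recorded.
    static Document parseTracked(String html) {
        return Jsoup.parse(html, "", Parser.htmlParser().setTrackPosition(true));
    }

    // Rename an attribute in place and return the value now stored under the new key
    // (null if the element had no attribute named oldKey).
    static String renameAndRead(Element el, String oldKey, String newKey) {
        Attribute attr = el.attribute(oldKey); // null when the attribute is absent
        if (attr == null) return null;
        attr.setKey(newKey);
        Attribute renamed = el.attribute(newKey);
        return renamed == null ? null : renamed.getValue();
    }

    public static void main(String[] args) {
        Document doc = parseTracked("<p><a href='/home' title='Home'>Home</a></p>");
        Element link = doc.selectFirst("a");
        if (link == null) return;
        Attribute href = link.attribute("href");
        System.out.println(href.getKey() + " = " + href.getValue());
        System.out.println("before rename: " + href.sourceRange());
        System.out.println("value after rename: " + renameAndRead(link, "href", "data-href"));
        // Per the 1.17.2 fix, the source range should follow the attribute to its new key.
        System.out.println("after rename: " + link.attribute("data-href").sourceRange());
    }
}
```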
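The wildcard selector forms can be tried with a short sketch, again assuming jsoup 1.17.2; the class name, the HTML sample, and the count helper are illustrative only. Both [*] and [^] should match any element carrying at least one attribute, while a non-empty prefix such as [^data-] keeps the existing prefix behaviour.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class WildcardAttributeDemo {
    // Illustrative markup: one <p> with an id, one bare <p>, and a <span> with a data- attribute.
    static final String HTML =
        "<div><p id=a>one</p><p>two</p><span data-x=1>three</span></div>";

    // Number of elements in HTML matched by the given CSS query.
    static int count(String query) {
        Document doc = Jsoup.parse(HTML);
        return doc.select(query).size();
    }

    public static void main(String[] args) {
        System.out.println("[*]      -> " + count("[*]"));      // any element with at least one attribute
        System.out.println("p[*]     -> " + count("p[*]"));     // only <p> elements that carry an attribute
        System.out.println("[^]      -> " + count("[^]"));      // empty name prefix: also any attribute
        System.out.println("[^data-] -> " + count("[^data-]")); // attribute names starting with "data-"
    }
}
```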
Bug Fixes
- Mixed-case source position: When tracking the source position of attributes, if the source attribute name was mixed-case but the parser was normalizing attribute names to lower case, the source position for that attribute was not tracked correctly. 2067
- Source position NPE: When tracking the source position of a body fragment parse, a null pointer exception was thrown. 2068
- Multi-point emoji entity: A multi-point encoded emoji entity could be incorrectly decoded to the replacement character. 2074
- Selector sub-expressions: (Regression) In a selector like parent [attr=va], other, the comma (OR) combinator was binding to [attr=va] instead of to parent [attr=va], causing incorrect selections. The fix includes an EvaluatorDebug class that generates an s-expression (sexpr) representing the parsed query, allowing simpler and more thorough query parse tests. 2073
- XML CData output: When generating XML-syntax output from parsed HTML, script nodes containing (pseudo) CData sections would have an extraneous CData section added, causing script execution errors. Now the data content is emitted in an HTML/XML/XHTML polyglot format, if the data is not already within a CData section. 2078
- Thread safety: The :has evaluator held a non-thread-safe Iterator, so if an Evaluator object was shared across concurrent threads, a NoSuchElementException could be thrown and the selected results could be incorrect. Now the iterator is held in a ThreadLocal, giving each thread its own instance. 2088
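To illustrate the selector sub-expression fix, here is a sketch assuming jsoup 1.17.2; the HTML and class name are made up. The comma should OR the whole parent [attr=va] expression with the second selector, so the query returns the span inside div.parent and the unrelated b element, but not the span outside div.parent.

```java
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class SelectorOrDemo {
    // Illustrative markup: one [attr=va] span inside div.parent, one outside it, and a <b>.
    static final String HTML =
        "<div class=parent><span attr=va>in</span></div>"
        + "<span attr=va>out</span><b>other</b>";

    // Text of each element matched by the query, in document order.
    static List<String> selectText(String query) {
        Document doc = Jsoup.parse(HTML);
        return doc.select(query).eachText();
    }

    public static void main(String[] args) {
        // Parsed as (div.parent [attr=va]) OR (b): the outer span is excluded,
        // while the top-level <b> is included.
        System.out.println(selectText("div.parent [attr=va], b"));
    }
}
```

With the fix in place, this should print [in, other].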
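For the XML CData fix, the sketch below (assuming jsoup 1.17.2) parses a script that already wraps its content in a pseudo-CData section, switches the output syntax with Document.OutputSettings.Syntax.xml, and counts the CData openers in the serialized script. Because the data is already within a CData section, only one opener is expected. XmlScriptOutputDemo and its helpers are hypothetical names.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class XmlScriptOutputDemo {
    // A script whose content is already wrapped in a (pseudo) CData section.
    static final String WITH_CDATA =
        "<script>//<![CDATA[\nvar ok = 1 < 2;\n//]]></script>";

    // Parse as HTML, switch to XML output syntax, and serialize the script element(s).
    static String scriptAsXml(String html) {
        Document doc = Jsoup.parse(html);
        doc.outputSettings().syntax(Document.OutputSettings.Syntax.xml);
        return doc.select("script").outerHtml();
    }

    // Count occurrences of the CData opener in the serialized output.
    static int countCdataOpeners(String s) {
        int count = 0;
        for (int i = s.indexOf("<![CDATA["); i >= 0; i = s.indexOf("<![CDATA[", i + 1)) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String out = scriptAsXml(WITH_CDATA);
        System.out.println(out);
        System.out.println("CData openers: " + countCdataOpeners(out));
    }
}
```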
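The thread-safety fix follows a common Java pattern: a shared object that needs mutable iteration state keeps that state in a ThreadLocal, so each thread reuses its own instance. The sketch below is not jsoup's code. Cursor and ContainsMatcher are hypothetical stand-ins for a reusable tree iterator and the :has evaluator, written with plain JDK classes only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ThreadLocalIteratorDemo {
    // Hypothetical restartable cursor standing in for a reusable tree iterator.
    // It holds mutable position state, so one instance must not be shared across threads.
    static final class Cursor {
        private List<Integer> items = List.of();
        private int pos;

        void restart(List<Integer> newItems) { items = newItems; pos = 0; }
        boolean hasNext() { return pos < items.size(); }
        int next() { return items.get(pos++); }
    }

    // Hypothetical shared matcher (compare the :has evaluator): true if items contains target.
    static final class ContainsMatcher {
        private final int target;
        // Each thread lazily gets its own Cursor, so concurrent matches never share position state.
        private final ThreadLocal<Cursor> cursor = ThreadLocal.withInitial(Cursor::new);

        ContainsMatcher(int target) { this.target = target; }

        boolean matches(List<Integer> items) {
            Cursor c = cursor.get();
            c.restart(items);
            while (c.hasNext()) {
                if (c.next() == target) return true;
            }
            return false;
        }
    }

    // Calls one shared matcher from several threads and returns the total number of matches.
    static int runConcurrently(int threads, int callsPerThread) {
        ContainsMatcher matcher = new ContainsMatcher(7);
        List<Integer> withTarget = List.of(1, 3, 5, 7, 9);
        List<Integer> withoutTarget = List.of(2, 4, 6, 8);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                results.add(pool.submit(() -> {
                    int hits = 0;
                    for (int i = 0; i < callsPerThread; i++) {
                        // Alternate inputs: shared cursor state would show up as a wrong count.
                        if (matcher.matches(i % 2 == 0 ? withTarget : withoutTarget)) hits++;
                    }
                    return hits;
                }));
            }
            int total = 0;
            for (Future<Integer> f : results) total += f.get();
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Each thread matches on every even-numbered call: 4 threads x 5,000 = 20,000.
        System.out.println(runConcurrently(4, 10_000));
    }
}
```

The ThreadLocal keeps the allocation-saving benefit of reusing an iterator while removing the shared mutable state; the trade-off is one cached instance per thread that touches the matcher.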