jsoup Java HTML Parser release 1.15.4
2023-Feb-18
jsoup 1.15.4 is out now. It includes bug fixes and a bunch of improvements, particularly when pretty-printing HTML.
jsoup is a Java library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of HTML5 DOM methods and CSS selectors.
Download jsoup now.
Improvements
- Added the ability to escape CSS selectors (tags, IDs, classes) to match elements that don't follow regular CSS syntax. For example, to match by classname <p class="one.two">, use document.select("p.one\\.two"); #838
- When pretty-printing, wrap text that follows a <br> tag. #1858
- When pretty-printing, normalize newlines that follow self-closing tags in custom tags. #1852
- When pretty-printing, collapse non-significant whitespace between a block and an inline tag. #1802
- In Element.forEach() and Node.forEachNode(), use java.util.function.Consumer instead of the previous Android compatibility shim org.jsoup.helper.Consumer. Subsequently, the latter has been deprecated. #1870
- Added a new method Document.forms(), to conveniently retrieve a List<FormElement> containing the <form> elements in a document.
- Added a new method Document.expectForm(), to find the first matching FormElement, or blow up trying.
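The selector escaping described in the first improvement above can be illustrated with plain Java. This is a minimal sketch, assuming a hypothetical helper named escapeIdentifier that is not part of jsoup. It only shows how a class name such as one.two could be turned into the escaped form p.one\.two before being handed to document.select(). It is deliberately simplified and does not handle identifiers that start with a digit, or control characters.

```java
class CssEscapeSketch {

    // Hypothetical helper (not a jsoup API): puts a backslash in front of every
    // ASCII character that is not a letter, digit, hyphen, or underscore.
    // Non-ASCII characters are passed through unchanged.
    static String escapeIdentifier(String raw) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            boolean plain = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9') || c == '-' || c == '_' || c >= 0x80;
            if (!plain) {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Build a selector for <p class="one.two">.
        String selector = "p." + escapeIdentifier("one.two");
        System.out.println(selector); // prints p.one\.two
        // With jsoup on the classpath, this string could be passed to document.select(selector).
    }
}
```

Note that the escaped selector contains a single backslash. The doubled backslash in "p.one\\.two" is only Java string-literal syntax.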
Bug Fixes
- URLs containing characters such as [ and ] were not escaped correctly, and would throw a MalformedURLException when fetched. #1873
- Element.cssSelector() would create invalid selectors for elements where the tag name, ID, or classnames needed to be escaped (e.g. if a class name contained a : or .). #1742
- Element.text() should have a space between a block and an inline element. #1877
- Form data on a previous request was copied to a new request in newRequest(), resulting in an accumulation of form data when executing multi-step form submissions, or data sent to later requests incorrectly. Now, newRequest() only copies session related settings (cookies, proxy settings, user-agent, etc) but not the request data nor the body. #1778
- Fixed an issue in Safelist.removeAttributes() which could throw a ConcurrentModificationException when using the :all pseudo-attribute.
- Given extremely deeply nested HTML, a number of methods in Element could throw a StackOverflowError due to excessive recursion. Namely: #data(), #hasText(), #parents(), and #wrap(html). #1864
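The recursion problem in the last bug fix above can be illustrated without jsoup. The following is a minimal sketch, assuming a hypothetical Node class that holds only a parent pointer. It is not jsoup's implementation. It only shows why walking ancestors with a loop stays safe at depths where a method that uses one stack frame per level may not.

```java
import java.util.ArrayList;
import java.util.List;

class DeepNestingSketch {

    // Hypothetical stand-in for a DOM node: it only knows its parent.
    static final class Node {
        final Node parent;
        Node(Node parent) { this.parent = parent; }
    }

    // Recursive walk: uses one stack frame per ancestor, so a very deep tree
    // can exhaust the thread stack. Shown for contrast, not called in main.
    static void parentsRecursive(Node node, List<Node> out) {
        if (node.parent == null) {
            return;
        }
        out.add(node.parent);
        parentsRecursive(node.parent, out);
    }

    // Iterative walk: constant stack depth regardless of nesting.
    static List<Node> parentsIterative(Node node) {
        List<Node> out = new ArrayList<>();
        for (Node p = node.parent; p != null; p = p.parent) {
            out.add(p);
        }
        return out;
    }

    // Builds a single chain of `depth` nodes and returns the innermost one.
    static Node chain(int depth) {
        Node n = null;
        for (int i = 0; i < depth; i++) {
            n = new Node(n);
        }
        return n;
    }

    public static void main(String[] args) {
        Node leaf = chain(200_000);
        // The innermost node of a 200,000-node chain has 199,999 ancestors.
        System.out.println(parentsIterative(leaf).size()); // prints 199999
    }
}
```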
Changes
- Deprecated the unused Document.normalise() method. Normalization occurs during the HTML tree construction, and no longer as a distinct phase.
My sincere thanks to everyone who contributed patches, suggestions, and bug reports. If you have any suggestions for the next release, I would love to hear them; please get in touch with me directly.
You can also follow me (@jhy@tilde.zone) on Mastodon / Fediverse to receive occasional notes about jsoup releases.