jsoup Java HTML Parser release 1.15.4
jsoup 1.15.4 is out now. It includes a number of improvements, particularly to HTML pretty-printing, along with several bug fixes.
jsoup is a Java library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of HTML5 DOM methods and CSS selectors.
Download jsoup now.
- Added the ability to escape CSS selectors (tags, IDs, classes) to match elements that don't follow regular CSS syntax. For example, to match by class name <p class="one.two">, use
- When pretty-printing, wrap text that follows a
- When pretty-printing, normalize newlines that follow self-closing tags in custom tags. #1852
- When pretty-printing, collapse non-significant whitespace between a block and an inline tag. #1802
- Now uses java.util.function.Consumer instead of the previous Android compatibility shim org.jsoup.helper.Consumer. As a result, the latter has been deprecated. #1870
- Added a new method Document.forms(), to conveniently retrieve the <form> elements in a document.
- Added a new method Document.expectForm(), to find the first matching FormElement, or blow up trying.
- URLs containing certain characters were not escaped correctly, and would throw a MalformedURLException when fetched. #1873
- Element.cssSelector() would create invalid selectors for elements where the tag name, ID, or class names needed to be escaped (e.g. if a class name contained a
- Element.text() should have a space between a block and an inline element. #1877
- Form data on a previous request was copied to a new request in newRequest(), resulting in an accumulation of form data when executing multi-step form submissions, or in data being sent to later requests incorrectly. Now, newRequest() copies only session-related settings (cookies, proxy settings, user-agent, etc.) and not the request data or the body. #1778
- Fixed an issue in Safelist.removeAttributes() which could throw a ConcurrentModificationException when using the
- Given extremely deeply nested HTML, a number of methods in Element could throw a StackOverflowError due to excessive recursion. Namely:
- Deprecated the unused Document.normalise() method. Normalization now occurs during HTML tree construction, and no longer as a distinct phase.
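To illustrate the kind of problem behind the StackOverflowError fix listed above, here is a minimal sketch in plain Java. It is not jsoup's code: Node, containsTextRecursive, containsTextIterative, and deepChain are hypothetical names invented for this example. It only shows the general technique of replacing a recursive tree walk with an explicit stack, so that nesting depth is limited by heap memory rather than by the thread's call stack.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class DeepTreeDemo {
    /** Hypothetical stand-in for a DOM node; this is not jsoup's Element. */
    static final class Node {
        final String text;
        final List<Node> children = new ArrayList<>();
        Node(String text) { this.text = text; }
    }

    /** Recursive walk: one call-stack frame per nesting level, so very deep trees can overflow. */
    static boolean containsTextRecursive(Node node) {
        if (!node.text.isEmpty()) return true;
        for (Node child : node.children) {
            if (containsTextRecursive(child)) return true;
        }
        return false;
    }

    /** Iterative walk: pending nodes are kept in a heap-allocated deque instead of the call stack. */
    static boolean containsTextIterative(Node root) {
        Deque<Node> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            Node node = pending.pop();
            if (!node.text.isEmpty()) return true;
            // Push children in reverse so they are popped in document order.
            for (int i = node.children.size() - 1; i >= 0; i--) {
                pending.push(node.children.get(i));
            }
        }
        return false;
    }

    /** Builds a chain of `depth` empty nodes with a single leaf (holding leafText) at the bottom. */
    static Node deepChain(int depth, String leafText) {
        Node root = new Node("");
        Node current = root;
        for (int i = 1; i < depth; i++) {
            Node child = new Node("");
            current.children.add(child);
            current = child;
        }
        current.children.add(new Node(leafText));
        return root;
    }

    public static void main(String[] args) {
        // A depth like this is likely to overflow the recursive version on a default-sized stack.
        Node deep = deepChain(200_000, "leaf");
        System.out.println(containsTextIterative(deep));
    }
}
```

Both methods visit nodes in the same order and return the same answer on trees shallow enough for the recursive one to finish; only the iterative version is safe to call on arbitrarily deep input.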
My sincere thanks to everyone who contributed patches, suggestions, and bug reports. If you have any suggestions for the next release, I would love to hear them; please get in touch with me directly.
You can also follow me (@email@example.com) on Mastodon / Fediverse to receive occasional notes about jsoup releases.