jsoup Java HTML Parser release 1.11.3
jsoup 1.11.3 is out now, with a range of bug fixes and improvements for interoperability with hopeless HTML and substandard servers.
jsoup is a Java library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of DOM, CSS, and jQuery-like methods.
Download jsoup now.
- CDATA sections are now treated as whitespace preserving (regardless of the containing element), and are round-tripped into output HTML.
- Added support for
- When parsing <pre> tags, skip the first newline if present.
- Support nested quotes for attribute selection queries.
- Character references from Windows-1252 that are not valid Unicode are mapped to the appropriate Unicode replacement.
- Accept a custom SSL socket factory in Jsoup.Connection. Note that Connection.validateTLSCertificates() will be removed in the next release; Connection.sslSocketFactory(SSLSocketFactory sslSocketFactory) provides a path to implement a workaround if you need to keep using a similar approach.
- Bugfix: A "Mark has been invalidated" exception was thrown when parsing some URLs on Android <= 6.
- Bugfix: The
- Bugfix: boolean attributes with empty string values were not collapsing in HTML output.
- Bugfix: when using the XML Parser set to lowercase normalize tags, uppercase closing tags were not correctly handled.
- Bugfix: when parsing from a URL, an end tag could be read incorrectly if it started on a buffer boundary.
- Bugfix: when parsing from a URL, if the remote server failed to complete its write (i.e. it writes less than the Content Length header promised on a gzipped stream), the parse method would incorrectly throw an unchecked exception. It now throws the declared
- Bugfix: leaf nodes (such as text nodes) were throwing an unsupported operation exception on childNodes(), instead of just returning an empty list.
- Bugfix: documents with a leading UTF-8 BOM did not have that BOM consumed, so it acted as a zero width no-break space, which could impact the parse tree.
- Bugfix: when parsing an invalid XML declaration, the parse would fail.
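
For the custom SSL socket factory item above, here is a minimal sketch of one way to build a factory to hand to Connection.sslSocketFactory(...). It uses only the JDK's javax.net.ssl classes. The class name CustomSslFactory and the method fromTrustStore are hypothetical names chosen for illustration, and the jsoup call appears only in a comment (with a placeholder URL) so the snippet runs without jsoup on the classpath.

```java
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocketFactory;
import javax.net.ssl.TrustManagerFactory;

// Hypothetical helper, not part of jsoup.
class CustomSslFactory {

    /**
     * Builds an SSLSocketFactory whose trust decisions come from the given
     * KeyStore. Passing null falls back to the JVM's default trust store.
     */
    static SSLSocketFactory fromTrustStore(KeyStore trustStore) {
        try {
            TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
            tmf.init(trustStore);

            SSLContext context = SSLContext.getInstance("TLS");
            // No client key managers; default source of randomness.
            context.init(null, tmf.getTrustManagers(), null);
            return context.getSocketFactory();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Could not build SSL socket factory", e);
        }
    }

    public static void main(String[] args) {
        SSLSocketFactory factory = fromTrustStore(null);

        // Assumed usage with jsoup 1.11.3 on the classpath (placeholder URL):
        // Document doc = Jsoup.connect("https://example.com/")
        //         .sslSocketFactory(factory)
        //         .get();

        System.out.println("Cipher suites available: "
                + factory.getSupportedCipherSuites().length);
    }
}
```

Loading a KeyStore that contains only the certificates you expect (for example a self-signed certificate used by an internal server) keeps certificate validation switched on, which is usually preferable to disabling it outright.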
Many thanks to everyone who contributed patches, suggestions, and bug reports. If you have any suggestions for the next release, I would love to hear them; please get in touch via the mailing list or to me directly.