jsoup Java HTML Parser release 1.15.2
jsoup 1.15.2 is out now, and includes a new ability to track the original input source position through to parsed nodes, a number of bug fixes, other improvements, and performance enhancements.
jsoup is a Java library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of HTML5 DOM methods and CSS selectors.
Download jsoup now.
- Improvement: added the ability to track the position (line, column, index) in the original input source from where a given node was parsed. Accessible via
- Improvement: added
Node.lastChild(), as convenient accessors to those child nodes and elements.
- Improvement: added Element.expectFirst(), which is just like Element.selectFirst(), but instead of returning null if there is no match, it throws an IllegalArgumentException. This is useful if you want to simply abort processing when an expected match is not found, such as in test cases.
- Improvement: when pretty-printing HTML, doctypes are emitted on a newline if there is a preceding comment.
- Improvement: when pretty-printing, trim the leading and trailing spaces of textnodes in block tags when possible, so that they are indented correctly.
- Improvement: in Element.selectXpath(), disable namespace awareness. This makes it possible to always select elements by their simple local name, regardless of whether an xmlns attribute was set.
- Bugfix: when using the DataUtil.readToByteBuffer() method, such as in Connection.Response.body(), if the document had not already been parsed and had to be read fully, and a maximum buffer size was being applied, only the default internal buffer size was read.
- Bugfix: when serializing HTML, newlines in elements descending from a pre tag were incorrectly skipped. That caused what should have been preformatted output to instead be a run of text.
- Bugfix: when pretty-print serializing HTML, newlines separating phrasing content (e.g. a <span> tag within a <p> tag) would be incorrectly skipped, instead of normalized to a space. Additionally, improved space normalization between other end of line occurrences, and whitespace handling after a closing
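To illustrate the source position tracking item above, here is a minimal sketch. The accessor names are not spelled out in the list, so this assumes the jsoup API calls Parser.setTrackPosition(true) and Node.sourceRange(); the class name, helper names, and sample HTML are made up for the example.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Range;
import org.jsoup.parser.Parser;

public class SourcePositionSketch {
    /** Parses the HTML with position tracking enabled and returns the source range of the first match. */
    public static Range rangeOf(String html, String cssQuery) {
        // Assumption: tracking is opt-in and is switched on at the parser level.
        Parser parser = Parser.htmlParser().setTrackPosition(true);
        Document doc = Jsoup.parse(html, "", parser);
        return doc.expectFirst(cssQuery).sourceRange();
    }

    /** Line in the original input on which the first matching element starts. */
    public static int startLineOf(String html, String cssQuery) {
        return rangeOf(html, cssQuery).start().lineNumber();
    }

    public static void main(String[] args) {
        String html = "<div>\n<p>Hi</p>\n</div>";
        Range.Position start = rangeOf(html, "p").start();
        System.out.println("line " + start.lineNumber()
                + ", column " + start.columnNumber()
                + ", index " + start.pos());
    }
}
```

If the document is parsed without tracking enabled, the returned range should report isTracked() as false, so it may be worth checking that before relying on the positions.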
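The difference between Element.selectFirst() and the new Element.expectFirst() can be sketched roughly as follows. The class name, helper names, and sample HTML are hypothetical and only serve the illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class ExpectFirstSketch {
    /** Uses selectFirst(): the caller has to handle the null that signals "no match". */
    public static String textOrDefault(String html, String cssQuery, String fallback) {
        Element match = Jsoup.parse(html).selectFirst(cssQuery);
        return match != null ? match.text() : fallback;
    }

    /** Uses expectFirst(): no null check, a missing match throws IllegalArgumentException. */
    public static String requiredText(String html, String cssQuery) {
        return Jsoup.parse(html).expectFirst(cssQuery).text();
    }

    public static void main(String[] args) {
        String html = "<div><p class=intro>Hello</p></div>";
        System.out.println(requiredText(html, "p.intro"));
        System.out.println(textOrDefault(html, "p.missing", "(none)"));
        try {
            requiredText(html, "p.missing");
        } catch (IllegalArgumentException e) {
            System.out.println("No match: " + e.getMessage());
        }
    }
}
```

In a test case, the second style keeps the happy path free of null checks and fails at the point where the expected element is missing.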
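As a rough illustration of the Element.selectXpath() change, the sketch below selects elements by their local name from a document that declares an xmlns attribute. The class name, helper name, and sample markup are hypothetical, and the behavior shown assumes jsoup 1.15.2 or later.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class XpathLocalNameSketch {
    /** Counts p elements using a plain local-name XPath, with no namespace prefix. */
    public static int countParagraphs(String html) {
        Document doc = Jsoup.parse(html);
        Elements paragraphs = doc.selectXpath("//p");
        return paragraphs.size();
    }

    public static void main(String[] args) {
        // The xmlns attribute on the root element should not affect the match.
        String xhtml = "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
                + "<body><p>One</p><p>Two</p></body></html>";
        System.out.println("Paragraphs found: " + countParagraphs(xhtml));
    }
}
```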
My sincere thanks to everyone who contributed patches, suggestions, and bug reports. If you have any suggestions for the next release, I would love to hear them; please get in touch with me directly.