jsoup release 1.10.2
2017-Jan-02
jsoup 1.10.2 features faster startup times, additional DOM-tree navigators, improved HTTP compatibility, and a range of bug fixes.
jsoup is a Java library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of DOM, CSS, and jQuery-like methods.
Download jsoup now.
Improvements
- Improved startup time, particularly on Android, by reducing garbage generation and CPU execution time when loading the HTML entity files. About 1.72x faster in this area.
- Added Element.is(query) to check if an element matches this CSS query.
- Added new methods to Elements: next(query), nextAll(query), prev(query), prevAll(query) to select next and previous element siblings from a current selection, with optional selectors.
- Added Node.root() to get the topmost ancestor of a Node.
- Added the new selector :containsData(), to find elements that hold data, like script and style tags.
- Changed Jsoup.isValid(bodyHtml) to validate that the input contains only body HTML that is safe according to the whitelist, and does not include HTML errors. And in the Jsoup.Cleaner.isValid(Document) method, make sure the doc only includes body HTML.
- In Whitelists, validate that a removed protocol exists before removing said protocol.
- Allow the Jsoup.Connect thread to be interrupted when reading the input stream; this helps when reading from a long stream of data that never hits the read timeout.
- Jsoup.Connect now uses a desktop user agent by default. Many developers were getting caught by not specifying the user agent, and sending the default Java user agent. That causes many servers to return different content than what they would to a desktop browser, and what the developer was expecting.
- Increased the default connect/read timeout in Jsoup.Connect to 30 seconds.
- Jsoup.Connect now detects if a header value is actually in UTF-8 vs the HTTP spec of ISO-8859, and converts the header value appropriately. This improves compatibility with servers that are configured incorrectly.
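
The DOM additions and the Jsoup.Connect defaults above can be sketched roughly as follows. This is a minimal illustration, not code from the release: it assumes jsoup 1.10.2 or later is on the classpath, and the class name ReleaseSketch, the sample markup, the helper method names, the user agent string "ExampleBot/1.0", and the example.com URL are all made up for the example. Building the connection does not touch the network.

```java
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ReleaseSketch {
    // Hypothetical sample markup, used only for illustration.
    static final String HTML =
        "<ul><li id='a' class='x'>One</li><li id='b'>Two</li><li id='c' class='x'>Three</li></ul>"
        + "<script>var lib = 'jsoup';</script>";

    // Element.is(query): does the element with this id match the CSS query?
    static boolean matches(Document doc, String id, String query) {
        return doc.getElementById(id).is(query);
    }

    // Elements.nextAll(query): following siblings of the selection that match the filter.
    static Elements laterSiblings(Document doc, String from, String filter) {
        return doc.select(from).nextAll(filter);
    }

    // A connection with explicit settings instead of the library defaults.
    // Nothing is sent until get(), post(), or execute() is called.
    static Connection explicitConnection(String url) {
        return Jsoup.connect(url)
            .userAgent("ExampleBot/1.0") // hypothetical user agent string
            .timeout(10_000);            // connect/read timeout in milliseconds
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);
        Element first = doc.getElementById("a");

        System.out.println(matches(doc, "a", "li.x"));             // Element.is
        System.out.println(doc.select("#a").next().attr("id"));    // immediate next sibling
        System.out.println(laterSiblings(doc, "#a", ".x").size()); // filtered following siblings
        System.out.println(doc.select("#c").prevAll().size());     // all previous siblings
        System.out.println(first.root() == doc);                   // Node.root
        System.out.println(doc.select("script:containsData(jsoup)").size());
        System.out.println(explicitConnection("https://example.com/").request().timeout());
    }
}
```

Code that relied on the old "Java" user agent or the shorter timeout can set both explicitly, as explicitConnection does, rather than depending on whichever defaults the installed version ships with.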
Fixes
- Bugfix: in Jsoup.Connect, URLs containing non-URL-safe characters were not correctly encoded into a URL-safe form.
- Bugfix: a "SYSTEM" flag in doctype tags would be incorrectly removed.
- Bugfix: removing attributes from an Element with removeAttr() would cause a ConcurrentModificationException.
- Bugfix: the contents of Comment nodes were not returned by Element.data().
- Bugfix: if source checked out on Windows with git autocrlf=true, Entities.load would fail because of the \r char.
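
Two of the fixed calls can be exercised with a short sketch like the one below. It does not reproduce the original bugs, whose exact triggers are not described above; it only shows the affected methods in use, assuming jsoup 1.10.2 or later on the classpath. The class name FixesSketch, the helper names, and the sample markup are hypothetical. Copying the attribute keys before removing them is a defensive choice in this sketch, so the attribute collection is never modified while it is being iterated.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Attribute;
import org.jsoup.nodes.Element;

public class FixesSketch {
    // Removes every attribute from an element. The keys are copied first so
    // the attribute collection is not modified during iteration.
    static Element stripAttributes(Element el) {
        List<String> keys = new ArrayList<>();
        for (Attribute attr : el.attributes()) {
            keys.add(attr.getKey());
        }
        for (String key : keys) {
            el.removeAttr(key);
        }
        return el;
    }

    // Returns the combined data of the first element matching the query.
    static String dataOf(String html, String query) {
        return Jsoup.parse(html).select(query).first().data();
    }

    public static void main(String[] args) {
        // Hypothetical sample markup.
        Element link = Jsoup.parse("<a id='x' href='/y' title='z'>link</a>")
            .select("a").first();
        System.out.println(stripAttributes(link).attributes().size());

        // With the Comment fix, the comment's contents are part of data().
        System.out.println(dataOf("<div><!-- note --></div>", "div"));
    }
}
```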
Many thanks to everyone who contributed patches, suggestions, and bug reports. If you have any suggestions for the next release, I would love to hear them; please get in touch via the mailing list or contact me directly.