jsoup release 1.9.1
2016-Apr-16
jsoup 1.9.1 includes improved HTTP connection support, faster HTML parsing, and some important bug fixes.
jsoup is a Java library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of DOM, CSS, and jQuery-like methods.
Download jsoup now.
Improvements
- Added support for `HTTP` and `SOCKS` request proxies, specifiable per connection. See `Connection.proxy(String, int)`.
- Added support for sending plain HTTP request bodies in `POST` and `PUT` requests, with `Connection.requestBody(String)`.
- Added support in `Jsoup.connect()` for `HEAD`, `OPTIONS`, and `TRACE`.
- Added support for `HTTP 307 Temporary Redirect` (replays posts, if applicable).
- Performance improvements when parsing HTML, particularly on Android Dalvik.
- Added support for writing HTML into `Appendable` objects (like `OutputStreamWriter`), to enable stream serialization. See `Node.html(T)`.
- Added support for XML namespaces when converting jsoup documents to W3C documents.
- Added support for UTF-16 and UTF-32 character set detection from byte-order marks (`BOM`).
- Added support for tags with non-ASCII (Unicode) letters.
- Added `Connection.data(String)` to retrieve a data KeyVal by its key. Useful to update form data before submission.
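
The BOM-based character set detection mentioned in the list above can be illustrated with a short sketch. This is not jsoup's implementation: `BomSniffer` and `detect` are hypothetical names invented for this example, and the only facts relied on are the standard Unicode byte-order marks. It mainly shows why the 4-byte UTF-32 marks should be tested before the 2-byte UTF-16 ones.

```java
/** Hypothetical helper (not part of jsoup): maps a leading byte-order mark to a charset name. */
final class BomSniffer {
    private BomSniffer() {}

    /** Returns the charset name implied by the BOM, or null if no known BOM is present. */
    static String detect(byte[] data) {
        // Test the 4-byte UTF-32 marks first: the UTF-32LE mark (FF FE 00 00)
        // begins with the same two bytes as the UTF-16LE mark (FF FE).
        if (startsWith(data, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32BE";
        if (startsWith(data, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32LE";
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(data, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(data, 0xFF, 0xFE)) return "UTF-16LE";
        return null;
    }

    /** True if data begins with the given unsigned byte values. */
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            // Mask to compare as unsigned, since Java bytes are signed.
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // FF FE followed by '<' encoded as UTF-16LE (3C 00).
        byte[] sample = {(byte) 0xFF, (byte) 0xFE, 0x3C, 0x00};
        System.out.println(detect(sample));
    }
}
```

A caller might pass the first few bytes of a fetched document to `detect` and, when it returns `null`, fall back to whatever charset the HTTP headers or the document itself declare.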
Bug Fixes
- Fixed an issue in the Parent selector where it would not match against the root element it was applied to.
- Fixed an issue where `Elements.select(String)` would not return every matching element if they had the same content.
- Added not-null validators to `Element.appendText()` and `Element.prependText()`.
- Fixed an issue when moving nodes using `Element.insertChildren(int, Collection)` where the sibling index would be set incorrectly, leading to the original nodes being lost.
- Reverted `Node.equals()` and `Node.hashCode()` back to identity (object) comparisons, as deep content inspection had negative performance impacts and hashkey stability problems. Functionality replaced with `Node.hasSameValue()`.
- In `Connection`, if the same header key is seen multiple times, combine their values with a comma per the HTTP RFC, instead of keeping just one value. Also fixes an issue where header values could be out of order.
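
The header-combining behaviour described in the last fix can be sketched in plain Java. This is an illustration only, not jsoup's code: `HeaderCombiner` and `combine` are hypothetical names, and the input is assumed to have the multi-valued shape that `HttpURLConnection.getHeaderFields()` returns, including its `null` key for the status line.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of jsoup): folds repeated header values into one string. */
final class HeaderCombiner {
    private HeaderCombiner() {}

    /**
     * Joins every value seen for a header name with ", ", preserving the
     * order of the values instead of keeping only one of them.
     */
    static Map<String, String> combine(Map<String, List<String>> raw) {
        // LinkedHashMap keeps header names in the order they were encountered.
        Map<String, String> combined = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : raw.entrySet()) {
            if (entry.getKey() == null) {
                continue; // assumed to be the status line, which is not a header
            }
            combined.put(entry.getKey(), String.join(", ", entry.getValue()));
        }
        return combined;
    }
}
```

With an input where `Cache-Control` carries the values `no-cache` and `no-store`, `combine` would yield the single value `no-cache, no-store` for that key.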
Many thanks to everyone who contributed patches, suggestions, and bug reports. If you have any suggestions for the next release, I would love to hear them; please get in touch via the mailing list or contact me directly.