Self-contained, and improved web connections: jsoup version 1.3.1
2010-Aug-23

I am pleased to announce that jsoup version 1.3.1 is now available for download.
jsoup is a Java library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of DOM, CSS, and jQuery-like methods.
In previous releases, jsoup depended on Apache Commons-Lang for some methods, such as HTML entity parsing. These methods are now implemented natively, so jsoup has no external dependencies. This lowers the JVM memory overhead and makes it easier for developers to get started with jsoup.
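For readers curious what "HTML entity parsing" involves, here is a toy decoder in plain Java. It is only an illustration and not jsoup's implementation: the class name EntityDecoder and its six-entry table are hypothetical, and a real entity table is far larger. It handles named references plus decimal and hexadecimal numeric references, and it looks names up case-sensitively, since in HTML &Eacute; (É) and &eacute; (é) are different characters.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy HTML entity decoder (hypothetical, for illustration only). */
final class EntityDecoder {
    // A tiny named-entity table; a real HTML table is far larger.
    private static final Map<String, Integer> NAMED = Map.of(
            "amp", 0x26,     // &
            "lt", 0x3C,      // <
            "gt", 0x3E,      // >
            "quot", 0x22,    // "
            "eacute", 0xE9,  // lowercase e with acute
            "Eacute", 0xC9); // uppercase E with acute

    // Matches &#xHEX;, &#DECIMAL; and &name; references.
    private static final Pattern ENTITY = Pattern.compile(
            "&(?:#[xX]([0-9a-fA-F]{1,6})|#([0-9]{1,7})|([A-Za-z][A-Za-z0-9]*));");

    static String unescape(String s) {
        Matcher m = ENTITY.matcher(s);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            Integer cp;
            if (m.group(1) != null) {
                cp = Integer.parseInt(m.group(1), 16);   // hex reference
            } else if (m.group(2) != null) {
                cp = Integer.parseInt(m.group(2));       // decimal reference
            } else {
                cp = NAMED.get(m.group(3));              // exact, case-sensitive lookup
            }
            String replacement = (cp != null && Character.isValidCodePoint(cp))
                    ? new String(Character.toChars(cp))
                    : m.group();                         // leave unknown references untouched
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With this sketch, "&Eacute;t&eacute;" decodes to "Été", while "&EACUTE;" is left as-is because no entity has that exact name.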
Improved web connections
jsoup now sports a convenient Connection interface, available via Jsoup.connect(String url). This lets the developer easily build an HTTP request, specifying the user-agent, referrer, cookies, request headers, data parameters, and timeout. The response can be parsed directly into a Document, or its headers and cookies can be retrieved.
    Document doc = Jsoup.connect("http://example.com")
            .data("query", "Java")
            .userAgent("Mozilla")
            .cookie("auth", "token")
            .timeout(3000)
            .post();
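The builder above hides a fair amount of plumbing. As a rough, stdlib-only illustration (not jsoup's code; the class ManualPost and its methods are hypothetical), this is approximately how the same data parameters, cookie, user-agent and timeout could be set up by hand with java.net.HttpURLConnection. The sketch only configures the request; it does not open a network connection.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper showing manual request setup with HttpURLConnection. */
final class ManualPost {

    /** Encodes parameters as an application/x-www-form-urlencoded body. */
    static String formEncode(Map<String, String> data) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> e : data.entrySet()) {
            body.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    /** Joins cookies into a single Cookie request header value. */
    static String cookieHeader(Map<String, String> cookies) {
        StringJoiner header = new StringJoiner("; ");
        cookies.forEach((name, value) -> header.add(name + "=" + value));
        return header.toString();
    }

    /** Configures, but does not connect, a POST request. */
    static HttpURLConnection prepare(String url, Map<String, String> cookies,
                                     String userAgent, int timeoutMillis) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);                 // we will write a request body
        conn.setConnectTimeout(timeoutMillis);  // milliseconds
        conn.setReadTimeout(timeoutMillis);
        conn.setRequestProperty("User-Agent", userAgent);
        conn.setRequestProperty("Cookie", cookieHeader(cookies));
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        return conn;
    }
}
```

To actually send the request, a caller would write formEncode(data) as UTF-8 bytes to conn.getOutputStream(), then read conn.getInputStream() and parse the HTML. That whole sequence is what the single .post() call in the jsoup example wraps up.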
HTTP requests now support gzip compression by default.
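In HTTP terms, gzip support means the client sends an Accept-Encoding: gzip request header and decompresses the body when the response carries Content-Encoding: gzip. The sketch below shows only the decoding half, using java.util.zip; GzipBody is a hypothetical helper rather than jsoup's internals, and a locally compressed byte array stands in for a server response.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper illustrating transparent gzip decoding of a response body. */
final class GzipBody {

    /** Wraps the raw body in a gzip decoder only when the server says it is compressed. */
    static InputStream decode(InputStream raw, String contentEncoding) throws IOException {
        if (contentEncoding != null && contentEncoding.equalsIgnoreCase("gzip")) {
            return new GZIPInputStream(raw);
        }
        return raw;
    }

    /** Compresses a string, standing in for a server that sends gzip responses. */
    static byte[] gzip(String s) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(s.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    /** Reads a (possibly compressed) body into a UTF-8 string. */
    static String readBody(byte[] raw, String contentEncoding) throws IOException {
        try (InputStream in = decode(new ByteArrayInputStream(raw), contentEncoding)) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

Because decode() falls back to the raw stream, the same reading code works whether or not the server chose to compress.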
The Connection interface is new and may change slightly in upcoming releases. Please get in touch if you have suggestions or find bugs.
Other improvements and bug fixes
- Added the Element.ownText() method, to get only the direct text of an element, not including the text of its children.
- Added support for selectors :containsOwn(text) and :matchesOwn(regex), to supplement :contains(text) and :matches(regex).
- Added support for non-pretty-printed HTML output, to more closely mirror the input HTML. See
- Further speed optimisations for parsing and output generation.
- Fixed support for case-sensitive HTML escape entities. (issue #31)
- Fixed issue when parsing tags with keyless attributes. (issue #32)
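The difference between text() and ownText() is easiest to see on a small example. Per jsoup's documentation, given <p>Hello <b>there</b> now!</p>, p.text() returns "Hello there now!" while p.ownText() returns "Hello now!". The toy model below reproduces that behaviour; the class ToyElement is hypothetical and is not jsoup's Element. It also adds a containsOwn check in the spirit of the :containsOwn(text) selector, assumed here to match case-insensitively, as :contains(text) does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Toy element tree for illustration only; not jsoup's Element class. */
final class ToyElement {
    // Each child is either a String (a text node) or a nested ToyElement.
    private final List<Object> children = new ArrayList<>();

    ToyElement addText(String t) { children.add(t); return this; }
    ToyElement addChild(ToyElement e) { children.add(e); return this; }

    /** Text of this element and all descendants, like Element.text(). */
    String text() {
        StringBuilder sb = new StringBuilder();
        collect(this, sb, true);
        return normalize(sb);
    }

    /** Only this element's direct text nodes, like Element.ownText(). */
    String ownText() {
        StringBuilder sb = new StringBuilder();
        collect(this, sb, false);
        return normalize(sb);
    }

    /** Case-insensitive match against own text only, like :containsOwn(text). */
    boolean containsOwn(String needle) {
        return ownText().toLowerCase(Locale.ROOT).contains(needle.toLowerCase(Locale.ROOT));
    }

    private static void collect(ToyElement el, StringBuilder sb, boolean recurse) {
        for (Object c : el.children) {
            if (c instanceof String) {
                sb.append((String) c);                // direct text node
            } else if (recurse) {
                collect((ToyElement) c, sb, true);    // descend into child elements
            }
        }
    }

    /** Trims and collapses whitespace runs to a single space. */
    private static String normalize(CharSequence s) {
        return s.toString().trim().replaceAll("\\s+", " ");
    }
}
```

Building <p>Hello <b>there</b> now!</p> as new ToyElement().addText("Hello ").addChild(new ToyElement().addText("there")).addText(" now!") gives "Hello there now!" from text() and "Hello now!" from ownText(), so containsOwn("there") is false even though text() includes that word.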
Many thanks to everyone who has helped with this release of jsoup by contributing to the mailing lists, sending in bugs, and getting in touch with me.