Self-contained, with improved web connections: jsoup version 1.3.1

2010-Aug-23

I am pleased to announce that jsoup version 1.3.1 is now available for download.

jsoup is a Java library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of DOM, CSS, and jQuery-like methods.

Self-contained

In previous releases, jsoup depended on Apache Commons-Lang for some methods, such as HTML entity parsing. These methods are now implemented within jsoup itself, so jsoup has no external dependencies. This lowers the JVM memory overhead and makes it easier for developers to get started with jsoup.

Improved web connections

jsoup now sports a convenient Connection interface, available via Jsoup.connect(String url). This allows the developer to easily build an HTTP request, including specifying the user-agent, referrer, cookies, request headers, data parameters, and timeout. The response can be parsed directly into a Document, or the response headers and cookies may be retrieved.

E.g.:

Document doc = Jsoup.connect("http://example.com")
  .data("query", "Java")
  .userAgent("Mozilla")
  .cookie("auth", "token")
  .timeout(3000)
  .post();  
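When the response headers and cookies are needed as well as the parsed Document, the request can be executed without parsing straight away. The following is a minimal sketch, assuming the execute() method and the Connection.Response accessors (statusCode(), header(), cookies(), parse()) as found in current jsoup releases; the class name ConnectionSketch and its helper methods are hypothetical, and the exact method names may differ in 1.3.1.

```java
import java.io.IOException;
import java.util.Map;

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical helper class, for illustration only.
public class ConnectionSketch {

    // Configures the request but does not send it.
    static Connection build(String url) {
        return Jsoup.connect(url)
            .userAgent("Mozilla")
            .referrer("http://example.com/")
            .header("Accept-Language", "en")
            .cookie("auth", "token")
            .data("query", "Java")
            .timeout(3000) // milliseconds
            .method(Connection.Method.POST);
    }

    // Reads the status, a response header, and the cookies, then parses the body.
    static String summarise(Connection.Response res) throws IOException {
        Map<String, String> cookies = res.cookies();
        String contentType = res.header("Content-Type");
        Document doc = res.parse();
        return res.statusCode() + " " + contentType
            + " title=" + doc.title()
            + " cookies=" + cookies.keySet();
    }

    public static void main(String[] args) throws IOException {
        // Sends the request over the network; the other methods do not.
        Connection.Response res = build("http://example.com").execute();
        System.out.println(summarise(res));
    }
}
```

Separating build from execute keeps the request configuration reusable and lets the caller decide whether to parse the body at all.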

HTTP requests now support gzip compression by default.

The Connection interface is new and may change slightly in upcoming releases. Please get in touch if you have suggestions or find bugs.

Other improvements and bug fixes

  • Added the Element.ownText() method, to get only the direct text of an element, not including the text of its children.
  • Added support for selectors :containsOwn(text) and :matchesOwn(regex), to supplement Element.ownText().
  • Further speed optimisations for parsing and output generation.
  • Fixed support for case-sensitive HTML escape entities. (issue #31)
  • Fixed issue when parsing tags with keyless attributes. (issue #32)
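To illustrate the difference between an element's full text and its own text, and how the new :containsOwn and :matchesOwn selectors relate to the existing :contains selector, here is a small sketch. The HTML snippet, the class name OwnTextSketch, and its helper methods are made up for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical demo class, for illustration only.
public class OwnTextSketch {

    // The div has one direct text node ("Hello ") and one child element.
    static final String HTML =
        "<div id=\"outer\">Hello <p>jsoup world</p></div>";

    // Text of the div including all of its descendants.
    static String fullText() {
        Element div = Jsoup.parse(HTML).getElementById("outer");
        return div.text();
    }

    // Only the text that belongs directly to the div.
    static String ownText() {
        Element div = Jsoup.parse(HTML).getElementById("outer");
        return div.ownText();
    }

    // Number of elements matched by a CSS query against the sample HTML.
    static int countMatches(String cssQuery) {
        Document doc = Jsoup.parse(HTML);
        return doc.select(cssQuery).size();
    }

    public static void main(String[] args) {
        System.out.println(fullText()); // Hello jsoup world
        System.out.println(ownText());  // Hello

        // :contains looks at descendant text too, so the div matches.
        System.out.println(countMatches("div:contains(jsoup)"));    // 1
        // :containsOwn only looks at the div's direct text, so it does not.
        System.out.println(countMatches("div:containsOwn(jsoup)")); // 0
        // :matchesOwn applies a regex to the element's own text.
        System.out.println(countMatches("p:matchesOwn(world$)"));   // 1
    }
}
```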

Many thanks to everyone who has helped with this release of jsoup by contributing to the mailing lists, sending in bugs, and getting in touch with me.

If you have any suggestions for the next release, I would love to hear them; please get in touch via the mailing list or with me directly.