jsoup Java HTML Parser release 1.10.3

2017-Jun-11

jsoup 1.10.3 features better performance of CSS selectors, Jsoup.Connection improvements, and other bug fixes.

jsoup is a Java library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of DOM, CSS, and jQuery-like methods.

Download jsoup now.

Improvements

  • Added Elements.eachText() and Elements.eachAttr(), which return a list of each Element's text or attribute values, respectively. This makes it simpler to, for example, get a list of each URL on a page: List<String> urls = doc.select("a").eachAttr("abs:href");
  • Improved selector validation for :contains(...) with unbalanced quotes.
  • Improved the speed of index-based CSS selectors and other methods that use elementSiblingIndex, by a factor of 34.
  • Added Node.clearAttributes(), to simplify removing of all attributes of a Node / Element.
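
As a minimal sketch of the new methods listed above, the following assumes jsoup 1.10.3 or later is on the classpath. The class name, sample HTML, and base URL are made up for illustration only.

```java
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class EachAttrExample {
    // Hypothetical sample markup: one relative link and one absolute link.
    static final String HTML =
        "<p><a href='/one' class='nav'>One</a> <a href='http://example.com/two'>Two</a></p>";

    // Parse with a base URI so that "abs:href" can resolve relative links.
    static Document parse() {
        return Jsoup.parse(HTML, "http://example.com/");
    }

    // One absolute URL per matched link.
    static List<String> linkUrls(Document doc) {
        return doc.select("a").eachAttr("abs:href");
    }

    // One text value per matched link.
    static List<String> linkTexts(Document doc) {
        return doc.select("a").eachText();
    }

    // Removes every attribute from the first link and reports how many remain.
    static int attributeCountAfterClear(Document doc) {
        Element first = doc.select("a").first();
        first.clearAttributes();
        return first.attributes().size();
    }

    public static void main(String[] args) {
        Document doc = parse();
        System.out.println(linkUrls(doc));
        System.out.println(linkTexts(doc));
        System.out.println(attributeCountAfterClear(doc));
    }
}
```

Because the document is parsed with a base URI, the abs: prefix resolves the relative /one link against it.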

Fixes

  • Bugfix: if an attribute name started or ended with a control character, the parse would fail with a validation exception.
  • Bugfix: Element.hasClass() and the .classname selector would not find the class attribute case-insensitively.
  • Bugfix: In Jsoup.Connection, if a redirect contained a query string with %xx escapes, they would be double escaped before the redirect was followed, leading to fetching an incorrect location.
  • Bugfix: In Jsoup.Connection, if a request body was set and the connection was redirected, the body would incorrectly still be sent.
  • Bugfix: In DataUtil, when detecting the character set from metadata and two Content-Types are defined, the one that defines a character set is now used.
  • Bugfix: when parsing unknown tags in case-sensitive HTML mode, end tags would not close scope correctly.
  • In Jsoup.Connection, ensure there is no Content-Type set when being redirected to a GET.
  • Bugfix: in certain locales (Turkish specifically), lowercasing and case insensitivity could fail for specific items.
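
The Turkish locale issue in the last item can be reproduced with the standard library alone. The sketch below is not jsoup's internal code; it only illustrates why lowercasing with the default locale is fragile, using a hypothetical helper named lower and Locale.ROOT as one possible locale-independent choice.

```java
import java.util.Locale;

public class TurkishLowercase {
    // Hypothetical helper: lowercases the input using an explicit locale.
    static String lower(String input, Locale locale) {
        return input.toLowerCase(locale);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");
        String tag = "TITLE";

        // In Turkish, uppercase 'I' lowercases to dotless 'ı' (U+0131),
        // so the result no longer equals the ASCII string "title".
        System.out.println(lower(tag, turkish).equals("title"));     // prints false

        // A locale-independent lowercase gives the expected ASCII result.
        System.out.println(lower(tag, Locale.ROOT).equals("title")); // prints true
    }
}
```

Code that compares tag or attribute names after calling toLowerCase() without a locale argument will behave like the first case whenever the JVM default locale is Turkish.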

Many thanks to everyone who contributed patches, suggestions, and bug reports. If you have any suggestions for the next release, I would love to hear them; please get in touch via the mailing list or contact me directly.