jsoup release 1.9.2

2016-May-17

jsoup 1.9.2 is a quick release to fix an issue where the parser could get stuck in an infinite loop on tag names containing a mix of ASCII and non-ASCII characters. It also fixes a few other minor issues.

jsoup is a Java library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of DOM, CSS, and jQuery-like methods.

Download jsoup now.

Improvements

  • In XML documents, the charset is now detected from the XML prolog (e.g. <?xml encoding="UTF-8"?>).
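To illustrate the idea behind prolog-based charset detection, here is a minimal, standalone sketch using only the Java standard library. It is not jsoup's implementation or API: the class name XmlPrologCharset and the method detect are hypothetical, and the sketch assumes an ASCII-compatible encoding with no byte order mark. It decodes the first bytes as ISO-8859-1 (which maps every byte to one character, so the ASCII prolog survives) and reads the encoding pseudo-attribute.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of jsoup.
public class XmlPrologCharset {
    // Matches an XML declaration at the start of the input and captures the
    // value of its encoding pseudo-attribute (single- or double-quoted).
    private static final Pattern PROLOG_ENCODING = Pattern.compile(
        "^\\s*<\\?xml\\b[^>]*?\\bencoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    /** Returns the charset named in the XML prolog, if present and supported. */
    static Optional<Charset> detect(byte[] head) {
        // Assumption: an ASCII-compatible encoding and no BOM. ISO-8859-1 maps
        // each byte to one char, so the ASCII prolog text decodes intact.
        int len = Math.min(head.length, 1024);
        String text = new String(head, 0, len, StandardCharsets.ISO_8859_1);
        Matcher m = PROLOG_ENCODING.matcher(text);
        if (!m.find()) {
            return Optional.empty(); // no prolog, or no encoding attribute
        }
        String name = m.group(1);
        // The regex only admits legal charset names, so isSupported won't throw.
        return Charset.isSupported(name) ? Optional.of(Charset.forName(name)) : Optional.empty();
    }

    public static void main(String[] args) {
        byte[] doc = "<?xml encoding=\"UTF-8\"?><root/>".getBytes(StandardCharsets.US_ASCII);
        System.out.println(detect(doc)); // prints Optional[UTF-8]
    }
}
```

When no prolog encoding is found, a parser would fall back to some other source (such as a transport-level charset or a default); that fallback is left out of this sketch.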

Bug Fixes

  • Fixed an issue where tag names that contained non-ASCII characters but started with an ASCII character would cause the parser to get stuck in an infinite loop.
  • Fixed an issue where API-created XML documents would have an incorrect prolog.
  • Fixed an issue where you could not use an attribute selector to find values containing unbalanced braces or parentheses.
  • Fixed an issue where namespaced tags (like <fb:comment>) would cause Element.cssSelector() to fail.

Many thanks to everyone who contributed patches, suggestions, and bug reports. If you have any suggestions for the next release, I would love to hear them; please get in touch via the mailing list or contact me directly.