HTML5 parser out of beta: jsoup 1.6.1 released
2011-Jul-02
I am very happy to announce that jsoup 1.6.1 has been released and is now available for download.
jsoup is a Java library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of DOM, CSS, and jQuery-like methods.
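As a rough illustration of that API, here is a minimal sketch that parses a fragment and pulls out a link with a CSS selector. It assumes the jsoup jar is on the classpath; the class name `LinkExtractor`, the helper `firstLinkHref`, and the sample markup are made up for this example.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper class, for illustration only.
final class LinkExtractor {

    // Returns the href of the first <a href=...> in the HTML, or null if there is none.
    static String firstLinkHref(String html) {
        Document doc = Jsoup.parse(html);               // build a DOM from the string
        Element link = doc.select("a[href]").first();   // CSS selector; first() is null when nothing matches
        return link == null ? null : link.attr("href");
    }

    public static void main(String[] args) {
        String html = "<p>Visit <a href=\"http://example.com/\">Example</a> today.</p>";
        System.out.println(firstLinkHref(html));
    }
}
```

The same `select` call accepts other CSS selectors (for example `p` or `div.content a`), so the helper could be generalised by passing the selector in as a parameter.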
HTML5 parser out of beta
This release of jsoup includes a completely re-implemented parser, based on the WHATWG HTML5 specification. jsoup now parses HTML the same way modern browsers such as Chrome, Firefox, and Safari do. This helps users scrape data more readily, and improves HTML tidying.
This release is a stabilised version of the 1.6.0 beta release.
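To give a feel for what browser-style parsing means in practice, the sketch below feeds jsoup markup with unclosed paragraph tags. Under the HTML5 rules a new `<p>` start tag implicitly closes the open one, so two sibling paragraphs should result. It assumes jsoup is on the classpath; the class name `TagSoupDemo` and its helper methods are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical demo class, for illustration only.
final class TagSoupDemo {

    // Counts the <p> elements the parser produces for the given markup.
    static int paragraphCount(String html) {
        return Jsoup.parse(html).select("p").size();
    }

    // Returns the text of the first <p> element (assumes at least one exists).
    static String firstParagraphText(String html) {
        return Jsoup.parse(html).select("p").first().text();
    }

    public static void main(String[] args) {
        String soup = "<p>One<p>Two";   // neither paragraph is closed
        Document doc = Jsoup.parse(soup);
        System.out.println(paragraphCount(soup));
        System.out.println(doc.body().html());   // the tidied markup
    }
}
```

Printing `doc.body().html()` is a quick way to see the tidied output for a given input.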
Improvements and bug fixes since 1.6.0
- Fixed Java 1.5 (and Android 2.2) compatibility.
- Fixed an issue when parsing <script> tags in body where the tokeniser wouldn't switch to the InScript state, which meant that data wasn't parsed correctly.
- Fixed an issue with a missing quote when serialising
- Fixed issue where a single 0 character was lexed incorrectly as a null character.
- Fixed normalisation of carriage returns to newlines on input HTML.
- Disabled memory mapped files when loading files from disk, to improve compatibility in Windows environments.
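For readers unfamiliar with the carriage-return fix listed above, the HTML5 specification asks parsers to normalise newlines in the input stream: each CR LF pair becomes a single LF, and any remaining lone CR also becomes LF. The following is a plain-Java sketch of that rule only; it is not jsoup's internal code, and the class name `NewlineNormaliser` is made up for this example.

```java
// Hypothetical illustration of HTML5 newline normalisation; not jsoup's implementation.
final class NewlineNormaliser {

    // Converts CR LF pairs and lone CR characters to LF.
    static String normalise(String input) {
        // Replace CR LF first so each pair collapses to one newline,
        // then convert any carriage returns that are still left.
        return input.replace("\r\n", "\n").replace('\r', '\n');
    }

    public static void main(String[] args) {
        String raw = "line one\r\nline two\rline three\n";
        // Show the result with newlines made visible.
        System.out.println(normalise(raw).replace("\n", "\\n"));
    }
}
```

Handling the CR LF pair before the lone CR matters: doing it the other way round would turn each Windows line ending into two newlines.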