<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>jsoup</title>
  <link href="https://jsoup.org/feed.xml" rel="self"></link>
  <updated>2026-01-01T00:00:00Z</updated>
  <author>
    <name>Jonathan Hedley</name>
    <uri>https://jsoup.org/</uri>
  </author>
  <id>https://jsoup.org/</id>
  <icon>https://jsoup.org/favicon.ico</icon>
  <entry>
    <title>jsoup Java HTML Parser release 1.22.1</title>
    <link href="https://jsoup.org/news/release-1.22.1"></link>
    <id>https://jsoup.org/news/release-1.22.1</id>
    <updated>2026-01-01T00:00:00Z</updated>
    <summary>jsoup 1.22.1 adds re2j regex support, configurable parser depth, and numerous bug fixes and improvements.</summary>
    <content type="html">&lt;div class=&#34;entry-head&#34;&gt;&lt;div class=&#34;meta pubdate&#34;&gt;&lt;time datetime=&#34;2026-01-01&#34;&gt;Jan 1, 2026&lt;/time&gt;&lt;/div&gt;&lt;/div&gt;&#xA;&lt;p&gt;&lt;b&gt;jsoup 1.22.1&lt;/b&gt; is out now, adding support for the &lt;code&gt;re2j&lt;/code&gt; regular expression engine for regex-based CSS selectors, a configurable maximum parser depth, and numerous bug fixes and improvements.&lt;/p&gt;&#xA;&lt;p&gt;&lt;b&gt;jsoup&lt;/b&gt; is a Java library for working with real-world HTML and XML. It provides a very convenient API for extracting and manipulating data, using the best of HTML5 DOM methods and CSS selectors.&lt;/p&gt;&#xA;&lt;p&gt;&lt;a href=&#34;/download&#34;&gt;&lt;b&gt;Download&lt;/b&gt;&lt;/a&gt; jsoup now.&lt;/p&gt;&#xA;&lt;h2 id=&#34;improvements&#34;&gt;Improvements&lt;/h2&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;Added support for using the &lt;code&gt;re2j&lt;/code&gt; regular expression engine for regex-based CSS selectors (e.g. &lt;code&gt;[attr~=regex]&lt;/code&gt;, &lt;code&gt;:matches(regex)&lt;/code&gt;), which ensures linear-time performance for regex evaluation. This allows safer handling of arbitrary user-supplied query regexes. 
To enable, add the &lt;code&gt;com.google.re2j&lt;/code&gt; dependency to your classpath, e.g.:&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;pre class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;nt&#34;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;nt&#34;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;com.google.re2j&lt;span class=&#34;nt&#34;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;nt&#34;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;re2j&lt;span class=&#34;nt&#34;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;nt&#34;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.8&lt;span class=&#34;nt&#34;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;nt&#34;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;(If you already have that dependency in your classpath, but you want to keep using the Java regex engine, you can disable re2j via &lt;code&gt;System.setProperty(&amp;#34;jsoup.useRe2j&amp;#34;, &amp;#34;false&amp;#34;)&lt;/code&gt;.) You can confirm that the re2j engine has been enabled correctly by calling &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/helper/Regex#usingRe2j()&#34; title=&#34;Checks if re2j is available (on classpath) and enabled (via system property).&#34;&gt;Regex.usingRe2j()&lt;/a&gt;&lt;/code&gt;. 
&lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/pull/2407&#34;&gt;#2407&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;Added an instance method &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/parser/Parser#unescape(java.lang.String,boolean)&#34; title=&#34;Utility method to unescape HTML entities from a string, using this Parser&amp;#39;s configuration (for example, to&#xA;     collect errors while unescaping).&#34;&gt;Parser#unescape(String, boolean)&lt;/a&gt;&lt;/code&gt; that unescapes HTML entities using the parser’s configuration (e.g. to support error tracking), complementing the existing static utility &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/parser/Parser#unescapeEntities(java.lang.String,boolean)&#34; title=&#34;Utility method to unescape HTML entities from a string.&#34;&gt;Parser.unescapeEntities(String, boolean)&lt;/a&gt;&lt;/code&gt;. &lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/pull/2396&#34;&gt;#2396&lt;/a&gt;&lt;/small&gt;&lt;/li&gt;&#xA;&lt;li&gt;Added a configurable maximum parser depth (to limit the number of open elements on stack) to both HTML and XML parsers. The HTML parser now defaults to a depth of 512 to match browser behavior, and protect against unbounded stack growth, while the XML parser keeps unlimited depth by default, but can opt into a limit via &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/parser/Parser#setMaxDepth(int)&#34; title=&#34;Set the parser&amp;#39;s maximum stack depth (maximum number of open elements).&#34;&gt;Parser.setMaxDepth()&lt;/a&gt;&lt;/code&gt;. &lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/issues/2421&#34;&gt;#2421&lt;/a&gt;&lt;/small&gt;&lt;/li&gt;&#xA;&lt;li&gt;Build: added CI coverage for JDK 25 &lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/pull/2403&#34;&gt;#2403&lt;/a&gt;&lt;/small&gt;&lt;/li&gt;&#xA;&lt;li&gt;Build: added a CI fuzzer for contextual fragment parsing (in addition to existing full body HTML and XML fuzzers). 
&lt;a href=&#34;https://github.com/google/oss-fuzz/pull/14041&#34;&gt;oss-fuzz #14041&lt;/a&gt;&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h2 id=&#34;changes&#34;&gt;Changes&lt;/h2&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;Set a removal schedule of jsoup 1.24.1 for previously deprecated APIs.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h2 id=&#34;bug-fixes&#34;&gt;Bug Fixes&lt;/h2&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;Previously cached child &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/select/Elements&#34; title=&#34;A list of Elements, with methods that act on every element in the list.&#34;&gt;Elements&lt;/a&gt;&lt;/code&gt; of an &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/nodes/Element&#34; title=&#34;An HTML Element consists of a tag name, attributes, and child nodes (including text nodes and other elements).&#34;&gt;Element&lt;/a&gt;&lt;/code&gt; were not correctly invalidated in &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/nodes/Node#replaceWith(org.jsoup.nodes.Node)&#34; title=&#34;Replace this node in the DOM with the supplied node.&#34;&gt;Node#replaceWith(Node)&lt;/a&gt;&lt;/code&gt;, which could lead to incorrect results when subsequently calling &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/nodes/Element#children()&#34; title=&#34;Get this element&amp;#39;s child elements.&#34;&gt;Element#children()&lt;/a&gt;&lt;/code&gt;. &lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/issues/2391&#34;&gt;#2391&lt;/a&gt;&lt;/small&gt;&lt;/li&gt;&#xA;&lt;li&gt;Attribute selector values are now compared literally without trimming. Previously, jsoup trimmed whitespace from selector values and from element attribute values, which could cause mismatches with browser behavior (e.g. &lt;code&gt;[attr=&amp;#34; foo &amp;#34;]&lt;/code&gt;). Now matches align with the CSS specification and browser engines. 
&lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/issues/2380&#34;&gt;#2380&lt;/a&gt;&lt;/small&gt;&lt;/li&gt;&#xA;&lt;li&gt;When using the JDK HttpClient, any system default proxy (&lt;code&gt;ProxySelector.getDefault()&lt;/code&gt;) was ignored. Now, the system proxy is used if a per-request proxy is not set. &lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/issues/2388&#34;&gt;#2388&lt;/a&gt;&lt;/small&gt;, &lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/pull/2390&#34;&gt;#2390&lt;/a&gt;&lt;/small&gt;&lt;/li&gt;&#xA;&lt;li&gt;A &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/helper/ValidationException&#34; title=&#34;Validation exceptions, as thrown by the methods in Validate.&#34;&gt;ValidationException&lt;/a&gt;&lt;/code&gt; could be thrown in the adoption agency algorithm with particularly broken input. Now logged as a parse error. &lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/issues/2393&#34;&gt;#2393&lt;/a&gt;&lt;/small&gt;&lt;/li&gt;&#xA;&lt;li&gt;Null characters in the HTML body were not consistently removed; and in foreign content were not correctly replaced. &lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/issues/2395&#34;&gt;#2395&lt;/a&gt;&lt;/small&gt;&lt;/li&gt;&#xA;&lt;li&gt;An &lt;code&gt;IndexOutOfBoundsException&lt;/code&gt; could be thrown when parsing a body fragment with crafted input. Now logged as a parse error. &lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/issues/2397&#34;&gt;#2397&lt;/a&gt;&lt;/small&gt;, &lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/issues/2406&#34;&gt;#2406&lt;/a&gt;&lt;/small&gt;&lt;/li&gt;&#xA;&lt;li&gt;When using StructuralEvaluators (e.g., a &lt;code&gt;parent child&lt;/code&gt; selector) across many retained threads, their memoized results could also be retained, increasing memory use. These results are now cleared immediately after use, reducing overall memory consumption. 
&lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/issues/2411&#34;&gt;#2411&lt;/a&gt;&lt;/small&gt;&lt;/li&gt;&#xA;&lt;li&gt;Cloning a &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/parser/Parser&#34; title=&#34;Parses HTML or XML into a Document.&#34;&gt;Parser&lt;/a&gt;&lt;/code&gt; now preserves any custom &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/parser/TagSet&#34; title=&#34;A TagSet controls the Tag configuration for a Document&amp;#39;s parse, and its serialization.&#34;&gt;TagSet&lt;/a&gt;&lt;/code&gt; applied to the parser. &lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/issues/2422&#34;&gt;#2422&lt;/a&gt;&lt;/small&gt;, &lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/pull/2423&#34;&gt;#2423&lt;/a&gt;&lt;/small&gt;&lt;/li&gt;&#xA;&lt;li&gt;Custom tags marked as &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/parser/Tag#Void&#34; title=&#34;Tag option: the tag is a void tag (e.g., &amp;lt;img&amp;gt;), that can contain no children, and in HTML does not require closing.&#34;&gt;Tag.Void&lt;/a&gt;&lt;/code&gt; now parse and serialize like the built-in void elements: they no longer consume following content, and the XML serializer emits the expected self-closing form. &lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/issues/2425&#34;&gt;#2425&lt;/a&gt;&lt;/small&gt;&lt;/li&gt;&#xA;&lt;li&gt;The &lt;code&gt;&amp;lt;br&amp;gt;&lt;/code&gt; element is once again classified as an inline tag (&lt;code&gt;Tag.isBlock() == false&lt;/code&gt;), matching common developer expectations and its role as phrasing content in HTML, while pretty-printing and text extraction continue to treat it as a line break in the rendered output. 
&lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/issues/2387&#34;&gt;#2387&lt;/a&gt;&lt;/small&gt;, &lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/issues/2439&#34;&gt;#2439&lt;/a&gt;&lt;/small&gt;&lt;/li&gt;&#xA;&lt;li&gt;Fixed an intermittent truncation issue when fetching and parsing remote documents via &lt;code&gt;Jsoup.connect(url).get()&lt;/code&gt;. On responses without a charset header, the initial charset sniff could sometimes (depending on buffering / &lt;code&gt;available()&lt;/code&gt; behavior) be mistaken for end-of-stream and a partial parse reused, dropping trailing content. &lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/issues/2448&#34;&gt;#2448&lt;/a&gt;&lt;/small&gt;&lt;/li&gt;&#xA;&lt;li&gt;&lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/parser/TagSet&#34; title=&#34;A TagSet controls the Tag configuration for a Document&amp;#39;s parse, and its serialization.&#34;&gt;TagSet&lt;/a&gt;&lt;/code&gt; copies no longer mutate their template during lazy lookups, preventing cross-thread &lt;code&gt;ConcurrentModificationException&lt;/code&gt; when parsing with shared sessions. &lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/pull/2453&#34;&gt;#2453&lt;/a&gt;&lt;/small&gt;&lt;/li&gt;&#xA;&lt;li&gt;Fixed parsing of &lt;code&gt;&amp;lt;svg&amp;gt;&lt;/code&gt; &lt;code&gt;foreignObject&lt;/code&gt; content nested within a &lt;code&gt;&amp;lt;p&amp;gt;&lt;/code&gt;, which could incorrectly move the HTML subtree outside the SVG. &lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/issues/2452&#34;&gt;#2452&lt;/a&gt;&lt;/small&gt;&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h2 id=&#34;internal-changes&#34;&gt;Internal Changes&lt;/h2&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;Deprecated internal helper &lt;code&gt;org.jsoup.internal.Functions&lt;/code&gt; (for removal in v1.23.1). 
This was previously used to support older Android API levels without full &lt;code&gt;java.util.function&lt;/code&gt; coverage; jsoup now requires core library desugaring so this indirection is no longer necessary. &lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/pull/2412&#34;&gt;#2412&lt;/a&gt;&lt;/small&gt;&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;hr/&gt;&#xA;&lt;p&gt;My sincere thanks to everyone who contributed to this release!&#xA;If you have any suggestions for the next release, I would love to hear them; please get in touch via &lt;a href=&#34;https://github.com/jhy/jsoup/discussions&#34;&gt;jsoup discussions&lt;/a&gt;, or with me &lt;a href=&#34;https://jhedley.com/&#34;&gt;directly&lt;/a&gt;.&lt;/p&gt;&#xA;&lt;p&gt;You can also &lt;a rel=&#34;me&#34; href=&#34;https://tilde.zone/@jhy&#34;&gt;follow me&lt;/a&gt; (&lt;b&gt;&lt;a rel=&#34;me&#34; href=&#34;https://tilde.zone/@jhy&#34;&gt;@jhy@tilde.zone&lt;/a&gt;&lt;/b&gt;) on Mastodon / Fediverse to receive occasional notes about jsoup releases.&lt;/p&gt;&#xA;&#xA;      </content>
  </entry>
  <entry>
    <title>jsoup Java HTML Parser release 1.21.2</title>
    <link href="https://jsoup.org/news/release-1.21.2"></link>
    <id>https://jsoup.org/news/release-1.21.2</id>
    <updated>2025-08-25T00:00:00Z</updated>
    <summary>jsoup 1.21.2 adds custom SSLContext support, improves attribute handling, boosts DOM performance, and fixes edge case parsing bugs.</summary>
    <content type="html">&lt;div class=&#34;entry-head&#34;&gt;&lt;div class=&#34;meta pubdate&#34;&gt;&lt;time datetime=&#34;2025-08-25&#34;&gt;Aug 25, 2025&lt;/time&gt;&lt;/div&gt;&lt;/div&gt;&#xA;&lt;p&gt;&lt;b&gt;jsoup 1.21.2&lt;/b&gt; is out now, adding support for custom &lt;code&gt;SSLContext&lt;/code&gt; in HTTP/2 connections, and improving consistency in how user data is handled in attributes. It also brings performance gains in DOM manipulation and fragment parsing, and fixes several edge cases in stream parsing, traversal, cloning, and concurrent reads.&lt;/p&gt;&#xA;&lt;p&gt;&lt;b&gt;jsoup&lt;/b&gt; is a Java library for working with real-world HTML and XML. It provides a very convenient API for extracting and manipulating data, using the best of HTML5 DOM methods and CSS selectors.&lt;/p&gt;&#xA;&lt;h2 id=&#34;changes&#34;&gt;Changes&lt;/h2&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;Deprecated internal (yet visible) methods &lt;code&gt;Normalizer#normalize(String, bool)&lt;/code&gt; and &lt;code&gt;Attribute#shouldCollapseAttribute(Document.OutputSettings)&lt;/code&gt;. These will be removed in a future version.&lt;/li&gt;&#xA;&lt;li&gt;Deprecated &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/Connection#sslSocketFactory(javax.net.ssl.SSLSocketFactory)&#34; title=&#34;Deprecated.&#xA;use Connection.sslContext(SSLContext) instead; will be removed in jsoup 1.24.1.&#34;&gt;Connection#sslSocketFactory(SSLSocketFactory)&lt;/a&gt;&lt;/code&gt; in favor of the new &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/Connection#sslContext(javax.net.ssl.SSLContext)&#34; title=&#34;Set a custom SSL context for HTTPS connections.&#34;&gt;Connection#sslContext(SSLContext)&lt;/a&gt;&lt;/code&gt;. Using &lt;code&gt;sslSocketFactory&lt;/code&gt; will force the use of the legacy &lt;code&gt;HttpUrlConnection&lt;/code&gt; implementation, which does not support HTTP/2. 
&lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/pull/2370&#34;&gt;#2370&lt;/a&gt;&lt;/small&gt;&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h2 id=&#34;improvements&#34;&gt;Improvements&lt;/h2&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;When pretty-printing, if there are consecutive text nodes (via DOM manipulation), the non-significant whitespace between them will be collapsed. &lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/pull/2349&#34;&gt;#2349&lt;/a&gt;&lt;/small&gt;.&lt;/li&gt;&#xA;&lt;li&gt;Updated &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/Connection.Response#statusMessage()&#34; title=&#34;Get the status message of the response.&#34;&gt;Connection.Response#statusMessage()&lt;/a&gt;&lt;/code&gt; to return a simple loggable string message (e.g. “OK”) when using the &lt;code&gt;HttpClient&lt;/code&gt; implementation, which doesn’t otherwise return any server-set status message. &lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/issues/2346&#34;&gt;#2346&lt;/a&gt;&lt;/small&gt;&lt;/li&gt;&#xA;&lt;li&gt;&lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/nodes/Attributes#size()&#34; title=&#34;Get the number of attributes in this set, excluding any internal-only attributes (e.g. user data).&#34;&gt;Attributes#size()&lt;/a&gt;&lt;/code&gt; and &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/nodes/Attributes#isEmpty()&#34; title=&#34;Test if this Attributes list is empty.&#34;&gt;Attributes#isEmpty()&lt;/a&gt;&lt;/code&gt; now exclude any internal attributes (such as user data) from their count. This aligns with the attributes’ serialized output and iterator. 
&lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/pull/2369&#34;&gt;#2369&lt;/a&gt;&lt;/small&gt;&lt;/li&gt;&#xA;&lt;li&gt;Added &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/Connection#sslContext(javax.net.ssl.SSLContext)&#34; title=&#34;Set a custom SSL context for HTTPS connections.&#34;&gt;Connection#sslContext(SSLContext)&lt;/a&gt;&lt;/code&gt; to provide a custom SSL (TLS) context to requests, supporting both the &lt;code&gt;HttpClient&lt;/code&gt; and the legacy &lt;code&gt;HttpURLConnection&lt;/code&gt; implementations. &lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/pull/2370&#34;&gt;#2370&lt;/a&gt;&lt;/small&gt;&lt;/li&gt;&#xA;&lt;li&gt;Performance optimizations for DOM manipulation methods including when repeatedly removing an element’s first child (&lt;code&gt;element.child(0).remove()&lt;/code&gt;), and when using &lt;code&gt;Parser#parseBodyFragment()&lt;/code&gt; to parse a large number of direct children. &lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/pull/2373&#34;&gt;#2373&lt;/a&gt;&lt;/small&gt;.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h2 id=&#34;bug-fixes&#34;&gt;Bug Fixes&lt;/h2&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;When parsing from an InputStream and a multibyte character happened to straddle a buffer boundary, the stream would not be completely read. &lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/issues/2353&#34;&gt;#2353&lt;/a&gt;&lt;/small&gt;.&lt;/li&gt;&#xA;&lt;li&gt;In &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/select/NodeTraversor&#34; title=&#34;A depth-first node traversor.&#34;&gt;NodeTraversor&lt;/a&gt;&lt;/code&gt;, if a last child element was removed during the &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/nodes/Document#head()&#34; title=&#34;Get this document&amp;#39;s head element.&#34;&gt;head()&lt;/a&gt;&lt;/code&gt; call, the parent would be visited twice. 
&lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/issues/2355&#34;&gt;#2355&lt;/a&gt;&lt;/small&gt;.&lt;/li&gt;&#xA;&lt;li&gt;Cloning an Element that has an Attributes object would add an empty internal user-data attribute to that clone, which would cause unexpected results for &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/nodes/Attributes#size()&#34; title=&#34;Get the number of attributes in this set, excluding any internal-only attributes (e.g. user data).&#34;&gt;Attributes#size()&lt;/a&gt;&lt;/code&gt; and &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/nodes/Attributes#isEmpty()&#34; title=&#34;Test if this Attributes list is empty.&#34;&gt;Attributes#isEmpty()&lt;/a&gt;&lt;/code&gt;. &lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/issues/2356&#34;&gt;#2356&lt;/a&gt;&lt;/small&gt;&lt;/li&gt;&#xA;&lt;li&gt;In a multithreaded application where multiple threads are calling &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/nodes/Element#children()&#34; title=&#34;Get this element&amp;#39;s child elements.&#34;&gt;Element#children()&lt;/a&gt;&lt;/code&gt; on the same element concurrently, a race condition could happen when the method was generating the internal child element cache (a filtered view of its child nodes). Since concurrent reads of DOM objects should be threadsafe without external synchronization, this method has been updated to execute atomically. &lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/issues/2366&#34;&gt;#2366&lt;/a&gt;&lt;/small&gt;&lt;/li&gt;&#xA;&lt;li&gt;When parsing HTML with &lt;code&gt;svg:script&lt;/code&gt; elements in SVG elements, don’t enter the Text insertion mode, but continue to parse as foreign content. Otherwise, misnested HTML could then cause an &lt;code&gt;IndexOutOfBoundsException&lt;/code&gt;. 
&lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/issues/2374&#34;&gt;#2374&lt;/a&gt;&lt;/small&gt;&lt;/li&gt;&#xA;&lt;li&gt;Malformed HTML could throw an IndexOutOfBoundsException during the adoption agency algorithm. &lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/pull/2377&#34;&gt;#2377&lt;/a&gt;&lt;/small&gt;&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;hr/&gt;&#xA;&lt;p&gt;My sincere thanks to everyone who contributed to this release!&#xA;If you have any suggestions for the next release, I would love to hear them; please get in touch via &lt;a href=&#34;https://github.com/jhy/jsoup/discussions&#34;&gt;jsoup discussions&lt;/a&gt;, or with me &lt;a href=&#34;https://jhedley.com/&#34;&gt;directly&lt;/a&gt;.&lt;/p&gt;&#xA;&lt;p&gt;You can also &lt;a rel=&#34;me&#34; href=&#34;https://tilde.zone/@jhy&#34;&gt;follow me&lt;/a&gt; (&lt;b&gt;&lt;a rel=&#34;me&#34; href=&#34;https://tilde.zone/@jhy&#34;&gt;@jhy@tilde.zone&lt;/a&gt;&lt;/b&gt;) on Mastodon / Fediverse to receive occasional notes about jsoup releases.&lt;/p&gt;&#xA;&lt;p&gt;&lt;a href=&#34;/download&#34;&gt;&lt;i&gt;Download&lt;/i&gt;&lt;/a&gt; jsoup now.&lt;/p&gt;&#xA;&#xA;      </content>
  </entry>
  <entry>
    <title>jsoup Java HTML Parser release 1.21.1</title>
    <link href="https://jsoup.org/news/release-1.21.1"></link>
    <id>https://jsoup.org/news/release-1.21.1</id>
    <updated>2025-06-23T00:00:00Z</updated>
    <summary>jsoup 1.21.1 delivers powerful node selection capabilities, dynamic tag customization, HTTP/2 support, and enhanced security against XSS attacks.</summary>
    <content type="html">&lt;div class=&#34;entry-head&#34;&gt;&lt;div class=&#34;meta pubdate&#34;&gt;&lt;time datetime=&#34;2025-06-23&#34;&gt;Jun 23, 2025&lt;/time&gt;&lt;/div&gt;&lt;/div&gt;&#xA;&lt;p&gt;&lt;b&gt;jsoup 1.21.1&lt;/b&gt; is out now, featuring powerful new node selection capabilities that let you target specific DOM nodes like comments and text nodes using CSS selectors, dynamic tag customization through the new TagSet callback system, and improved defense against mutation XSS attacks with simplified attribute escaping. This release also brings HTTP/2 support by default, numerous API improvements for better developer experience, and fixes for several edge-case parsing issues.&lt;/p&gt;&#xA;&lt;p&gt;&lt;b&gt;jsoup&lt;/b&gt; is a Java library for working with real-world HTML and XML. It provides a very convenient API for extracting and manipulating data, using the best of HTML5 DOM methods and CSS selectors.&lt;/p&gt;&#xA;&lt;h2 id=&#34;changes&#34;&gt;Changes&lt;/h2&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;Removed previously deprecated methods. &lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/pull/2317&#34;&gt;#2317&lt;/a&gt;&lt;/small&gt;&lt;/li&gt;&#xA;&lt;li&gt;Deprecated the &lt;code&gt;:matchText&lt;/code&gt; pseduo-selector due to its side effects on the DOM; use the new &lt;code&gt;::textnode&lt;/code&gt; selector and the &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/nodes/Element#selectNodes(java.lang.String,java.lang.Class)&#34; title=&#34;Find nodes that match the supplied Selector CSS query, with this element as the starting context.&#34;&gt;Element#selectNodes(String css, Class&amp;lt;T&amp;gt; type)&lt;/a&gt;&lt;/code&gt; method instead. 
&lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/pull/2343&#34;&gt;#2343&lt;/a&gt;&lt;/small&gt;&lt;/li&gt;&#xA;&lt;li&gt;Deprecated &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/Connection.Response#bufferUp()&#34; title=&#34;Deprecated.&#xA;use Connection.Response.readFully() instead (for the checked exception). Will be removed in jsoup 1.24.1.&#34;&gt;Connection.Response#bufferUp()&lt;/a&gt;&lt;/code&gt; in lieu of &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/Connection.Response#readFully()&#34; title=&#34;Read the body of the response into a local buffer, so that Connection.Response.parse() may be called repeatedly on the same&#xA;         connection response.&#34;&gt;Connection.Response#readFully()&lt;/a&gt;&lt;/code&gt; which can throw a checked IOException.&lt;/li&gt;&#xA;&lt;li&gt;Deprecated internal methods &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/helper/Validate#ensureNotNull(java.lang.Object)&#34; title=&#34;Deprecated.&#xA;prefer to use Validate.expectNotNull(Object, String, Object...) 
instead; will be removed in jsoup 1.24.1&#34;&gt;Validate#ensureNotNull(Object)&lt;/a&gt;&lt;/code&gt; (replaced by typed &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/helper/Validate#expectNotNull(T)&#34; title=&#34;Verifies the input object is not null, and returns that object, maintaining its type.&#34;&gt;Validate#expectNotNull(T)&lt;/a&gt;&lt;/code&gt;); protected HTML appenders from Attribute and Node.&lt;/li&gt;&#xA;&lt;li&gt;If you happen to be using any of the deprecated methods, please take the opportunity now to migrate away from them, as they will be removed in a future release.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h2 id=&#34;improvements&#34;&gt;Improvements&lt;/h2&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;Enhanced the &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/select/Selector&#34; title=&#34;CSS element selector, that finds elements matching a query.&#34;&gt;Selector&lt;/a&gt;&lt;/code&gt; to support direct matching against nodes such as comments and text nodes. For example, you can now find an element that follows a specific comment: &lt;code&gt;::comment:contains(prices) + p&lt;/code&gt; will select &lt;code&gt;p&lt;/code&gt; elements immediately after a &lt;code&gt;&amp;lt;!-- prices: --&amp;gt;&lt;/code&gt; comment. Supported types include &lt;code&gt;::node&lt;/code&gt;, &lt;code&gt;::leafnode&lt;/code&gt;, &lt;code&gt;::comment&lt;/code&gt;, &lt;code&gt;::text&lt;/code&gt;, &lt;code&gt;::data&lt;/code&gt;, and &lt;code&gt;::cdata&lt;/code&gt;. Node contextual selectors like &lt;code&gt;::node:contains(text)&lt;/code&gt;, &lt;code&gt;:matches(regex)&lt;/code&gt;, and &lt;code&gt;:blank&lt;/code&gt; are also supported. 
Introduced &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/nodes/Element#selectNodes(java.lang.String)&#34; title=&#34;Find nodes that match the supplied Selector CSS query, with this element as the starting context.&#34;&gt;Element#selectNodes(String css)&lt;/a&gt;&lt;/code&gt; and &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/nodes/Element#selectNodes(java.lang.String,java.lang.Class)&#34; title=&#34;Find nodes that match the supplied Selector CSS query, with this element as the starting context.&#34;&gt;Element#selectNodes(String css, Class&amp;lt;T&amp;gt; nodeType)&lt;/a&gt;&lt;/code&gt; for direct node selection. &lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/pull/2324&#34;&gt;#2324&lt;/a&gt;&lt;/small&gt;&lt;/li&gt;&#xA;&lt;li&gt;Added &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/parser/TagSet#onNewTag(java.util.function.Consumer)&#34; title=&#34;Register a callback to customize each Tag as it&amp;#39;s added to this TagSet.&#34;&gt;TagSet#onNewTag(Consumer&amp;lt;Tag&amp;gt; customizer)&lt;/a&gt;&lt;/code&gt;: register a callback that’s invoked for each new or cloned Tag when it’s inserted into the set. Enables dynamic tweaks of tag options (for example, marking all custom tags as self-closing, or everything in a given namespace as preserving whitespace). 
&lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/issues/2330&#34;&gt;#2330&lt;/a&gt;&lt;/small&gt;&lt;/li&gt;&#xA;&lt;li&gt;Made &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/parser/TokenQueue&#34; title=&#34;A character reader with helpers focusing on parsing CSS selectors.&#34;&gt;TokenQueue&lt;/a&gt;&lt;/code&gt; and &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/parser/CharacterReader&#34; title=&#34;CharacterReader consumes tokens off a string.&#34;&gt;CharacterReader&lt;/a&gt;&lt;/code&gt; autocloseable, to ensure that they will release their buffers back to the buffer pool, for later reuse.&lt;/li&gt;&#xA;&lt;li&gt;Added &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/select/Selector#evaluatorOf(java.lang.String)&#34; title=&#34;Parse a CSS query into an Evaluator.&#34;&gt;Selector#evaluatorOf(String css)&lt;/a&gt;&lt;/code&gt;, as a clearer way to obtain an Evaluator from a CSS query. An alias of &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/select/QueryParser#parse(java.lang.String)&#34; title=&#34;Parse a CSS query into an Evaluator.&#34;&gt;QueryParser.parse(String css)&lt;/a&gt;&lt;/code&gt;.&lt;/li&gt;&#xA;&lt;li&gt;Custom tags (defined via the &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/parser/TagSet&#34; title=&#34;A TagSet controls the Tag configuration for a Document&amp;#39;s parse, and its serialization.&#34;&gt;TagSet&lt;/a&gt;&lt;/code&gt;) in a foreign namespace (e.g. SVG) can be configured to parse as data tags.&lt;/li&gt;&#xA;&lt;li&gt;Added &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/select/NodeVisitor#traverse(org.jsoup.nodes.Node)&#34; title=&#34;Run a depth-first traverse of the root and all of its descendants.&#34;&gt;NodeVisitor#traverse(Node)&lt;/a&gt;&lt;/code&gt; to simplify node traversal calls (vs. 
importing &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/select/NodeTraversor&#34; title=&#34;A depth-first node traversor.&#34;&gt;NodeTraversor&lt;/a&gt;&lt;/code&gt;).&lt;/li&gt;&#xA;&lt;li&gt;Updated the default user-agent string to improve compatibility. &lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/issues/2341&#34;&gt;#2341&lt;/a&gt;&lt;/small&gt;&lt;/li&gt;&#xA;&lt;li&gt;The HTML parser now allows the specific text-data type (Data, RcData) to be customized for known tags. (Previously, that was only supported on custom tags.) &lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/issues/2326&#34;&gt;#2326&lt;/a&gt;&lt;/small&gt;&lt;/li&gt;&#xA;&lt;li&gt;Added &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/Connection.Response#readFully()&#34; title=&#34;Read the body of the response into a local buffer, so that Connection.Response.parse() may be called repeatedly on the same&#xA;         connection response.&#34;&gt;Connection.Response#readFully()&lt;/a&gt;&lt;/code&gt; as a replacement for &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/Connection.Response#bufferUp()&#34; title=&#34;Deprecated.&#xA;use Connection.Response.readFully() instead (for the checked exception). Will be removed in jsoup 1.24.1.&#34;&gt;Connection.Response#bufferUp()&lt;/a&gt;&lt;/code&gt; with an explicit IOException. Similarly, added &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/Connection.Response#readBody()&#34; title=&#34;Read the response body, and returns it as a plain String.&#34;&gt;Connection.Response#readBody()&lt;/a&gt;&lt;/code&gt; over &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/Connection.Response#body()&#34; title=&#34;Get the body of the response as a plain String.&#34;&gt;Connection.Response#body()&lt;/a&gt;&lt;/code&gt;. Deprecated &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/Connection.Response#bufferUp()&#34; title=&#34;Deprecated.&#xA;use Connection.Response.readFully() instead (for the checked exception). 
Will be removed in jsoup 1.24.1.&#34;&gt;Connection.Response#bufferUp()&lt;/a&gt;&lt;/code&gt;. &lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/pull/2327&#34;&gt;#2327&lt;/a&gt;&lt;/small&gt;&lt;/li&gt;&#xA;&lt;li&gt;When serializing HTML, the &lt;code&gt;&amp;lt;&lt;/code&gt; and &lt;code&gt;&amp;gt;&lt;/code&gt; characters are now escaped in attributes. This helps prevent a class of mutation XSS attacks. &lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/pull/2337&#34;&gt;#2337&lt;/a&gt;&lt;/small&gt;&lt;/li&gt;&#xA;&lt;li&gt;Changed &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/Connection&#34; title=&#34;The Connection interface is a convenient HTTP client and session object to fetch content from the web, and parse them&#xA; into Documents.&#34;&gt;Connection&lt;/a&gt;&lt;/code&gt; to prefer using the JDK’s HttpClient over HttpURLConnection, if available, to enable HTTP/2 support by default. Users can disable this via &lt;code&gt;-Djsoup.useHttpClient=false&lt;/code&gt;. &lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/pull/2340&#34;&gt;#2340&lt;/a&gt;&lt;/small&gt;&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h2 id=&#34;bug-fixes&#34;&gt;Bug Fixes&lt;/h2&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;The contents of a &lt;code&gt;script&lt;/code&gt; in an &lt;code&gt;svg&lt;/code&gt; foreign context are now parsed as script data, not text. &lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/issues/2320&#34;&gt;#2320&lt;/a&gt;&lt;/small&gt;&lt;/li&gt;&#xA;&lt;li&gt;&lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/parser/Tag#isFormSubmittable()&#34; title=&#34;Get if this tag represents an element that should be submitted with a form.&#34;&gt;Tag#isFormSubmittable()&lt;/a&gt;&lt;/code&gt; was updating the Tag’s options. 
&lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/issues/2323&#34;&gt;#2323&lt;/a&gt;&lt;/small&gt;&lt;/li&gt;&#xA;&lt;li&gt;The HTML pretty-printer would incorrectly trim whitespace when text followed an inline element in a block element. &lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/issues/2325&#34;&gt;#2325&lt;/a&gt;&lt;/small&gt;&lt;/li&gt;&#xA;&lt;li&gt;Custom tags with hyphens or other non-letter characters in their names now work correctly as Data or RcData tags. Their closing tags are now tokenized properly. &lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/issues/2332&#34;&gt;#2332&lt;/a&gt;&lt;/small&gt;&lt;/li&gt;&#xA;&lt;li&gt;When cloning an Element, the clone would retain the source’s cached child Element list (if any), which could lead to incorrect results when modifying the clone’s child elements. &lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/issues/2334&#34;&gt;#2334&lt;/a&gt;&lt;/small&gt;&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;hr/&gt;&#xA;&lt;p&gt;My sincere thanks to everyone who contributed to this release!&#xA;If you have any suggestions for the next release, I would love to hear them; please get in touch via &lt;a href=&#34;https://github.com/jhy/jsoup/discussions&#34;&gt;jsoup discussions&lt;/a&gt;, or with me &lt;a href=&#34;https://jhedley.com/&#34;&gt;directly&lt;/a&gt;.&lt;/p&gt;&#xA;&lt;p&gt;You can also &lt;a rel=&#34;me&#34; href=&#34;https://tilde.zone/@jhy&#34;&gt;follow me&lt;/a&gt; (&lt;b&gt;&lt;a rel=&#34;me&#34; href=&#34;https://tilde.zone/@jhy&#34;&gt;@jhy@tilde.zone&lt;/a&gt;&lt;/b&gt;) on Mastodon / Fediverse to receive occasional notes about jsoup releases.&lt;/p&gt;&#xA;&lt;p&gt;&lt;a href=&#34;/download&#34;&gt;&lt;i&gt;Download&lt;/i&gt;&lt;/a&gt; jsoup now.&lt;/p&gt;&#xA;&#xA;      </content>
  </entry>
  <entry>
    <title>jsoup Java HTML Parser release 1.20.1</title>
    <link href="https://jsoup.org/news/release-1.20.1"></link>
    <id>https://jsoup.org/news/release-1.20.1</id>
    <updated>2025-04-29T00:00:00Z</updated>
    <summary>jsoup 1.20.1 brings tighter HTML parsing, improved XML support, new API methods, performance gains, and robust bug fixes.</summary>
    <content type="html">&lt;div class=&#34;entry-head&#34;&gt;&lt;div class=&#34;meta pubdate&#34;&gt;&lt;time datetime=&#34;2025-04-29&#34;&gt;Apr 29, 2025&lt;/time&gt;&lt;/div&gt;&lt;/div&gt;&#xA;&lt;p&gt;&lt;b&gt;jsoup 1.20.1&lt;/b&gt; is out now, featuring improved HTML parse rules to align with modern browsers, improved XML namespace handling, and a redesigned HTML pretty-printer for better consistency and customizability. This release also delivers performance optimizations, new API enhancements such as flexible tag definitions via &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/parser/TagSet&#34; title=&#34;A TagSet controls the Tag configuration for a Document&amp;#39;s parse, and its serialization.&#34;&gt;TagSet&lt;/a&gt;&lt;/code&gt;, concise CSS selectors, and  parser thread-safety improvements. Additionally, multiple bug fixes enhance XML serialization and W3C DOM interoperability.&lt;/p&gt;&#xA;&lt;p&gt;&lt;b&gt;jsoup&lt;/b&gt; is a Java library for working with real-world HTML and XML. It provides a very convenient API for extracting and manipulating data, using the best of HTML5 DOM methods and CSS selectors.&lt;/p&gt;&#xA;&lt;h2 id=&#34;changes&#34;&gt;Changes&lt;/h2&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;To better follow the HTML5 spec and current browsers, the HTML parser no longer allows self-closing tags (&lt;code&gt;&amp;lt;foo /&amp;gt;&lt;/code&gt;) to close HTML elements by default. Foreign content (SVG, MathML), and content parsed with the XML parser, still supports self-closing tags. 
If you need specific HTML tags to support self-closing, you can register a custom tag via&#xA;the &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/parser/TagSet&#34; title=&#34;A TagSet controls the Tag configuration for a Document&amp;#39;s parse, and its serialization.&#34;&gt;TagSet&lt;/a&gt;&lt;/code&gt; configured in &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/parser/Parser#tagSet()&#34; title=&#34;Get the current TagSet for this Parser, which will be either this parser&amp;#39;s default, or one that you have set.&#34;&gt;Parser.tagSet()&lt;/a&gt;&lt;/code&gt;, using &lt;code&gt;Tag#set(Tag.SelfClose)&lt;/code&gt;. Standard void tags (such as &lt;code&gt;&amp;lt;img&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;lt;br&amp;gt;&lt;/code&gt;, etc.) continue to behave as usual and are not affected by this&#xA;change. &lt;a href=&#34;https://github.com/jhy/jsoup/issues/2300&#34;&gt;#2300&lt;/a&gt;.&lt;/li&gt;&#xA;&lt;li&gt;The following internal components have been &lt;b&gt;deprecated&lt;/b&gt;. If you do happen to be using any of these, please take the opportunity now to migrate away from them, as they will be removed in jsoup 1.21.1.&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;code&gt;ChangeNotifyingArrayList&lt;/code&gt;, &lt;code&gt;Document.updateMetaCharsetElement()&lt;/code&gt;, &lt;code&gt;Document.updateMetaCharsetElement(boolean)&lt;/code&gt;, &lt;code&gt;HtmlTreeBuilder.isContentForTagData(String)&lt;/code&gt;, &lt;code&gt;Parser.isContentForTagData(String)&lt;/code&gt;, &lt;code&gt;Parser.setTreeBuilder(TreeBuilder)&lt;/code&gt;, &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/parser/Tag#formatAsBlock()&#34; title=&#34;Deprecated.&#xA;internal pretty-printing flag; use Tag.isInline() or Tag.isBlock() to check layout intent. 
Will be removed in jsoup 1.24.1.&#34;&gt;Tag.formatAsBlock()&lt;/a&gt;&lt;/code&gt;, &lt;code&gt;Tag.isFormListed()&lt;/code&gt;, &lt;code&gt;TokenQueue.addFirst(String)&lt;/code&gt;, &lt;code&gt;TokenQueue.chompTo(String)&lt;/code&gt;, &lt;code&gt;TokenQueue.chompToIgnoreCase(String)&lt;/code&gt;, &lt;code&gt;TokenQueue.consumeToIgnoreCase(String)&lt;/code&gt;, &lt;code&gt;TokenQueue.consumeWord()&lt;/code&gt;, &lt;code&gt;TokenQueue.matchesAny(String...)&lt;/code&gt;&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h2 id=&#34;functional-improvements&#34;&gt;Functional Improvements&lt;/h2&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;Rebuilt the HTML pretty-printer, to simplify and consolidate the implementation, improve consistency, support custom Tags, and provide a cleaner path for ongoing improvements. The specific HTML produced by the pretty-printer may be different from previous versions. &lt;a href=&#34;https://github.com/jhy/jsoup/issues/2286&#34;&gt;#2286&lt;/a&gt;.&lt;/li&gt;&#xA;&lt;li&gt;Added the ability to define custom tags, and to modify properties of known tags, via the &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/parser/TagSet&#34; title=&#34;A TagSet controls the Tag configuration for a Document&amp;#39;s parse, and its serialization.&#34;&gt;TagSet&lt;/a&gt;&lt;/code&gt; tag collection. Their properties can impact both the parse and how content is serialized (output as HTML or XML). &lt;a href=&#34;https://github.com/jhy/jsoup/issues/2285&#34;&gt;#2285&lt;/a&gt;.&lt;/li&gt;&#xA;&lt;li&gt;&lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/nodes/Element#cssSelector()&#34; title=&#34;Get a CSS selector that will uniquely select this element.&#34;&gt;Element.cssSelector()&lt;/a&gt;&lt;/code&gt; will prefer to return shorter selectors by using ancestor IDs when available and unique. E.g. 
&lt;code&gt;#id &amp;gt; div &amp;gt; p&lt;/code&gt; instead of  &lt;code&gt;html &amp;gt; body &amp;gt; div &amp;gt; div &amp;gt; p&lt;/code&gt; &lt;a href=&#34;https://github.com/jhy/jsoup/pull/2283&#34;&gt;#2283&lt;/a&gt;.&lt;/li&gt;&#xA;&lt;li&gt;Added &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/select/Elements#deselect(int)&#34; title=&#34;Remove the Element at the specified index in this list, but not from the DOM.&#34;&gt;Elements.deselect(int index)&lt;/a&gt;&lt;/code&gt;, &lt;code&gt;Elements.deselect(Object o)&lt;/code&gt;, and &lt;code&gt;Elements.deselectAll()&lt;/code&gt; methods to remove elements from the &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/select/Elements&#34; title=&#34;A list of Elements, with methods that act on every element in the list.&#34;&gt;Elements&lt;/a&gt;&lt;/code&gt; list without removing them from the underlying DOM. Also added &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/select/Elements#asList()&#34; title=&#34;Convenience method to get the Elements as a plain ArrayList.&#34;&gt;Elements.asList()&lt;/a&gt;&lt;/code&gt; method to get a modifiable list of elements without affecting the DOM. (Individual Elements remain linked to the DOM.) &lt;a href=&#34;https://github.com/jhy/jsoup/issues/2100&#34;&gt;#2100&lt;/a&gt;.&lt;/li&gt;&#xA;&lt;li&gt;Added support for sending a request body from an InputStream with &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/Connection#requestBodyStream(java.io.InputStream)&#34; title=&#34;Set the request body.&#34;&gt;Connection.requestBodyStream(InputStream stream)&lt;/a&gt;&lt;/code&gt;. &lt;a href=&#34;https://github.com/jhy/jsoup/issues/1122&#34;&gt;#1122&lt;/a&gt;.&lt;/li&gt;&#xA;&lt;li&gt;The XML parser now supports scoped xmlns: prefix namespace declarations, and applies the correct namespace to Tags and Attributes. 
Also, added &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/parser/Tag#prefix()&#34; title=&#34;Get this tag&amp;#39;s prefix, if it has one; else the empty string.&#34;&gt;Tag#prefix()&lt;/a&gt;&lt;/code&gt;, &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/parser/Tag#localName()&#34; title=&#34;Get this tag&amp;#39;s local name.&#34;&gt;Tag#localName()&lt;/a&gt;&lt;/code&gt;, &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/nodes/Attribute#prefix()&#34; title=&#34;Get this attribute&amp;#39;s key prefix, if it has one; else the empty string.&#34;&gt;Attribute#prefix()&lt;/a&gt;&lt;/code&gt;, &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/nodes/Attribute#localName()&#34; title=&#34;Get this attribute&amp;#39;s local name.&#34;&gt;Attribute#localName()&lt;/a&gt;&lt;/code&gt;, and &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/nodes/Attribute#namespace()&#34; title=&#34;Get this attribute&amp;#39;s namespace URI, if the attribute was prefixed with a defined namespace name.&#34;&gt;Attribute#namespace()&lt;/a&gt;&lt;/code&gt; to retrieve these. &lt;a href=&#34;https://github.com/jhy/jsoup/issues/2299&#34;&gt;#2299&lt;/a&gt;.&lt;/li&gt;&#xA;&lt;li&gt;CSS identifiers are now escaped and unescaped correctly, per the CSS spec. &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/nodes/Element#cssSelector()&#34; title=&#34;Get a CSS selector that will uniquely select this element.&#34;&gt;Element#cssSelector()&lt;/a&gt;&lt;/code&gt; will emit appropriately escaped selectors, and the QueryParser supports those. Added &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/select/Selector#escapeCssIdentifier(java.lang.String)&#34; title=&#34;Given a CSS identifier (such as a tag, ID, or class), escape any CSS special characters that would otherwise not be&#xA;     valid in a selector.&#34;&gt;Selector.escapeCssIdentifier()&lt;/a&gt;&lt;/code&gt; and &lt;code&gt;Selector.unescapeCssIdentifier()&lt;/code&gt;. 
&lt;a href=&#34;https://github.com/jhy/jsoup/pull/2297&#34;&gt;#2297&lt;/a&gt;, &lt;a href=&#34;https://github.com/jhy/jsoup/pull/2305&#34;&gt;#2305&lt;/a&gt;&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h2 id=&#34;structure-and-performance-improvements&#34;&gt;Structure and Performance Improvements&lt;/h2&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;Refactored the CSS &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/select/QueryParser&#34; title=&#34;Parses a CSS selector into an Evaluator tree.&#34;&gt;QueryParser&lt;/a&gt;&lt;/code&gt; into a clearer recursive descent parser. &lt;a href=&#34;https://github.com/jhy/jsoup/pull/2310&#34;&gt;#2310&lt;/a&gt;.&lt;/li&gt;&#xA;&lt;li&gt;CSS selectors with consecutive combinators (e.g. &lt;code&gt;div &amp;gt;&amp;gt; p&lt;/code&gt;) will throw an explicit parse exception. &lt;a href=&#34;https://github.com/jhy/jsoup/pull/2311&#34;&gt;#2311&lt;/a&gt;.&lt;/li&gt;&#xA;&lt;li&gt;Performance: reduced the shallow size of an Element from 40 to 32 bytes, and the NodeList from 32 to 24. &lt;a href=&#34;https://github.com/jhy/jsoup/pull/2307&#34;&gt;#2307&lt;/a&gt;.&lt;/li&gt;&#xA;&lt;li&gt;Performance: reduced GC load of new StringBuilders when tokenizing input HTML. &lt;a href=&#34;https://github.com/jhy/jsoup/pull/2304&#34;&gt;#2304&lt;/a&gt;.&lt;/li&gt;&#xA;&lt;li&gt;Made &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/parser/Parser&#34; title=&#34;Parses HTML or XML into a Document.&#34;&gt;Parser&lt;/a&gt;&lt;/code&gt; instances threadsafe, so that inadvertent use of the same instance across threads will not lead to errors. For actual concurrency, use &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/parser/Parser#newInstance()&#34; title=&#34;Creates a new Parser as a deep copy of this; including initializing a new TreeBuilder.&#34;&gt;Parser#newInstance()&lt;/a&gt;&lt;/code&gt; per thread. 
&lt;a href=&#34;https://github.com/jhy/jsoup/pull/2314&#34;&gt;#2314&lt;/a&gt;.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h2 id=&#34;bug-fixes&#34;&gt;Bug Fixes&lt;/h2&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;Element names containing characters invalid in XML are now normalized to valid XML names when serializing. &lt;a href=&#34;https://github.com/jhy/jsoup/issues/1496&#34;&gt;#1496&lt;/a&gt;.&lt;/li&gt;&#xA;&lt;li&gt;When serializing to XML, characters that are invalid in XML 1.0 should be removed (not encoded). &lt;a href=&#34;https://github.com/jhy/jsoup/issues/1743&#34;&gt;#1743&lt;/a&gt;.&lt;/li&gt;&#xA;&lt;li&gt;When converting a &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/nodes/Document&#34; title=&#34;A HTML Document.&#34;&gt;Document&lt;/a&gt;&lt;/code&gt; to the W3C DOM in &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/helper/W3CDom&#34; title=&#34;Helper class to transform a Document to a org.w3c.dom.Document,&#xA; for integration with toolsets that use the W3C DOM.&#34;&gt;W3CDom&lt;/a&gt;&lt;/code&gt;, elements with an attribute in an undeclared namespace now get a declaration of &lt;code&gt;xmlns:prefix=&amp;#34;undefined&amp;#34;&lt;/code&gt;. This allows subsequent serialization to XML via &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/helper/W3CDom#asString(org.w3c.dom.Document)&#34; title=&#34;Serialize a W3C document that was created by W3CDom.fromJsoup(org.jsoup.nodes.Element) to a String.&#34;&gt;W3CDom.asString()&lt;/a&gt;&lt;/code&gt; to succeed. &lt;a href=&#34;https://github.com/jhy/jsoup/issues/2087&#34;&gt;#2087&lt;/a&gt;.&lt;/li&gt;&#xA;&lt;li&gt;The &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/parser/StreamParser&#34; title=&#34;A StreamParser provides a progressive parse of its input.&#34;&gt;StreamParser&lt;/a&gt;&lt;/code&gt; could emit the final elements of a document twice, due to how &lt;code&gt;onNodeCompleted&lt;/code&gt; was fired when closing out the stack. 
&lt;a href=&#34;https://github.com/jhy/jsoup/issues/2295&#34;&gt;#2295&lt;/a&gt;.&lt;/li&gt;&#xA;&lt;li&gt;When parsing with the XML parser and error tracking enabled, the trailing &lt;code&gt;?&lt;/code&gt; in &lt;code&gt;&amp;lt;?xml version=&amp;#34;1.0&amp;#34;?&amp;gt;&lt;/code&gt; would incorrectly emit an error. &lt;a href=&#34;https://github.com/jhy/jsoup/issues/2298&#34;&gt;#2298&lt;/a&gt;.&lt;/li&gt;&#xA;&lt;li&gt;Calling &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/nodes/Element#cssSelector()&#34; title=&#34;Get a CSS selector that will uniquely select this element.&#34;&gt;Element#cssSelector()&lt;/a&gt;&lt;/code&gt; on an element with combining characters in the class or ID now produces the correct output. &lt;a href=&#34;https://github.com/jhy/jsoup/issues/1984&#34;&gt;#1984&lt;/a&gt;.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;hr/&gt;&#xA;&lt;p&gt;My sincere thanks to everyone who contributed to this release!&#xA;If you have any suggestions for the next release, I would love to hear them; please get in touch via &lt;a href=&#34;https://github.com/jhy/jsoup/discussions&#34;&gt;jsoup discussions&lt;/a&gt;, or with me &lt;a href=&#34;https://jhedley.com/&#34;&gt;directly&lt;/a&gt;.&lt;/p&gt;&#xA;&lt;p&gt;You can also &lt;a rel=&#34;me&#34; href=&#34;https://tilde.zone/@jhy&#34;&gt;follow me&lt;/a&gt; (&lt;b&gt;&lt;a rel=&#34;me&#34; href=&#34;https://tilde.zone/@jhy&#34;&gt;@jhy@tilde.zone&lt;/a&gt;&lt;/b&gt;) on Mastodon / Fediverse to receive occasional notes about jsoup releases.&lt;/p&gt;&#xA;&lt;p&gt;&lt;a href=&#34;/download&#34;&gt;&lt;i&gt;Download&lt;/i&gt;&lt;/a&gt; jsoup now.&lt;/p&gt;&#xA;&#xA;      </content>
  </entry>
  <entry>
    <title>jsoup Java HTML Parser release 1.19.1</title>
    <link href="https://jsoup.org/news/release-1.19.1"></link>
    <id>https://jsoup.org/news/release-1.19.1</id>
    <updated>2025-03-04T00:00:00Z</updated>
    <summary>jsoup 1.19.1 introduces HTTP/2 support, performance optimizations, and new APIs for cleaner, more efficient HTML parsing and manipulation.</summary>
    <content type="html">&lt;div class=&#34;entry-head&#34;&gt;&lt;div class=&#34;meta pubdate&#34;&gt;&lt;time datetime=&#34;2025-03-04&#34;&gt;Mar 4, 2025&lt;/time&gt;&lt;/div&gt;&lt;/div&gt;&#xA;&lt;p&gt;&lt;b&gt;jsoup 1.19.1&lt;/b&gt; is out now, with support for &lt;b&gt;http/2&lt;/b&gt; network requests, performance improvements, some new API methods, and a host of other improvements and bug fixes.&lt;/p&gt;&#xA;&lt;p&gt;&lt;b&gt;jsoup&lt;/b&gt; is a Java library for working with real-world HTML and XML. It provides a very convenient API for extracting and manipulating data, using the best of HTML5 DOM methods and CSS selectors.&lt;/p&gt;&#xA;&lt;h2 id=&#34;changes&#34;&gt;Changes&lt;/h2&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;Added support for &lt;b&gt;http/2&lt;/b&gt; requests in &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/Jsoup#connect(java.lang.String)&#34; title=&#34;Creates a new Connection (session), with the defined request URL.&#34;&gt;Jsoup.connect()&lt;/a&gt;&lt;/code&gt;, when running on Java 11+, via the Java HttpClient&#xA;implementation. &lt;a href=&#34;https://github.com/jhy/jsoup/pull/2257&#34;&gt;#2257&lt;/a&gt;.&#xA;&lt;ul&gt;&#xA;&lt;li&gt;In this version of jsoup, the default is to make requests via the HttpURLConnection implementation: use&#xA;&lt;b&gt;&lt;code&gt;System.setProperty(&amp;#34;jsoup.useHttpClient&amp;#34;, &amp;#34;true&amp;#34;);&lt;/code&gt;&lt;/b&gt; to enable making requests via the HttpClient (if available),&#xA;which will enable &lt;code&gt;http/2&lt;/code&gt; support. This will become the default in a later version of jsoup, so now is&#xA;a good time to validate it.&lt;/li&gt;&#xA;&lt;li&gt;If you are repackaging the jsoup jar in your deployment (i.e. 
creating a shaded- or a fat-jar), make sure to mark the resulting JAR as a Multi-Release&#xA;JAR.&lt;/li&gt;&#xA;&lt;li&gt;If the &lt;code&gt;HttpClient&lt;/code&gt; implementation is not available in your JRE, requests will continue to be made via&#xA;&lt;code&gt;HttpURLConnection&lt;/code&gt; (in &lt;code&gt;http/1.1&lt;/code&gt; mode).&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;/li&gt;&#xA;&lt;li&gt;Updated the minimum Android API Level validation from 10 to &lt;b&gt;21&lt;/b&gt;. As with previous jsoup versions, Android&#xA;developers need to enable core library desugaring. The minimum Java version remains Java 8.&#xA;&lt;a href=&#34;https://github.com/jhy/jsoup/pull/2173&#34;&gt;#2173&lt;/a&gt;&lt;/li&gt;&#xA;&lt;li&gt;Removed previously deprecated class: &lt;code&gt;org.jsoup.UncheckedIOException&lt;/code&gt; (replace with &lt;code&gt;java.io.UncheckedIOException&lt;/code&gt;);&#xA;changed the previously deprecated method &lt;code&gt;Element Element#forEach(Consumer)&lt;/code&gt; to&#xA;&lt;code&gt;void Element#forEach(Consumer)&lt;/code&gt;. &lt;a href=&#34;https://github.com/jhy/jsoup/pull/2246&#34;&gt;#2246&lt;/a&gt;&lt;/li&gt;&#xA;&lt;li&gt;Deprecated the methods &lt;code&gt;Document#updateMetaCharsetElement(boolean)&lt;/code&gt; and &lt;code&gt;Document#updateMetaCharsetElement()&lt;/code&gt;, as the&#xA;setting had no effect. When &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/nodes/Document#charset(java.nio.charset.Charset)&#34; title=&#34;Set the output character set of this Document.&#34;&gt;Document#charset(Charset)&lt;/a&gt;&lt;/code&gt; is called, the document’s meta charset or XML encoding&#xA;instruction is always set. 
&lt;a href=&#34;https://github.com/jhy/jsoup/pull/2247&#34;&gt;#2247&lt;/a&gt;&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h2 id=&#34;improvements&#34;&gt;Improvements&lt;/h2&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;When cleaning HTML with a &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/safety/Safelist&#34; title=&#34;Safe-lists define what HTML (elements and attributes) to allow through the cleaner.&#34;&gt;Safelist&lt;/a&gt;&lt;/code&gt; that preserves relative links, the &lt;code&gt;isValid()&lt;/code&gt; method will now consider these&#xA;links valid. Additionally, the enforced attribute &lt;code&gt;rel=nofollow&lt;/code&gt; will only be added to external links when configured&#xA;in the safelist. &lt;a href=&#34;https://github.com/jhy/jsoup/pull/2245&#34;&gt;#2245&lt;/a&gt;&lt;/li&gt;&#xA;&lt;li&gt;Added &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/nodes/Element#selectStream(java.lang.String)&#34; title=&#34;Selects elements from the given root that match the specified Selector CSS query, with this element as the&#xA;     starting context, and returns them as a lazy Stream.&#34;&gt;Element#selectStream(String query)&lt;/a&gt;&lt;/code&gt; and &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/nodes/Element#selectStream(org.jsoup.select.Evaluator)&#34; title=&#34;Find a Stream of elements that match the supplied Evaluator.&#34;&gt;Element#selectStream(Evaluator)&lt;/a&gt;&lt;/code&gt; methods, that return a &lt;code&gt;Stream&lt;/code&gt; of&#xA;matching elements. Elements are evaluated and returned as they are found, and the stream can be&#xA;terminated early. 
&lt;a href=&#34;https://github.com/jhy/jsoup/pull/2092&#34;&gt;#2092&lt;/a&gt;&lt;/li&gt;&#xA;&lt;li&gt;&lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/nodes/Element&#34; title=&#34;An HTML Element consists of a tag name, attributes, and child nodes (including text nodes and other elements).&#34;&gt;Element&lt;/a&gt;&lt;/code&gt; objects now implement &lt;code&gt;Iterable&lt;/code&gt;, enabling them to be used in enhanced for loops.&lt;/li&gt;&#xA;&lt;li&gt;Added support for fragment parsing from a &lt;code&gt;Reader&lt;/code&gt; via&#xA;&lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/parser/Parser#parseFragmentInput(java.io.Reader,org.jsoup.nodes.Element,java.lang.String)&#34; title=&#34;Parse a fragment of HTML into a list of nodes.&#34;&gt;Parser#parseFragmentInput(Reader, Element, String)&lt;/a&gt;&lt;/code&gt;. &lt;a href=&#34;https://github.com/jhy/jsoup/issues/1177&#34;&gt;#1177&lt;/a&gt;&lt;/li&gt;&#xA;&lt;li&gt;Reintroduced CLI executable examples, in &lt;code&gt;jsoup-examples.jar&lt;/code&gt;. &lt;a href=&#34;https://github.com/jhy/jsoup/issues/1702&#34;&gt;#1702&lt;/a&gt;&lt;/li&gt;&#xA;&lt;li&gt;Optimized performance of selectors like &lt;code&gt;#id .class&lt;/code&gt; (and other similar descendant queries) by around 4.6x, by better&#xA;balancing the Ancestor evaluator’s cost function in the query&#xA;planner. &lt;a href=&#34;https://github.com/jhy/jsoup/issues/2254&#34;&gt;#2254&lt;/a&gt;&lt;/li&gt;&#xA;&lt;li&gt;Removed the legacy parsing rules for &lt;code&gt;&amp;lt;isindex&amp;gt;&lt;/code&gt; tags, which would autovivify a &lt;code&gt;form&lt;/code&gt; element with labels. 
This is no&#xA;longer in the spec.&lt;/li&gt;&#xA;&lt;li&gt;Added &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/select/Elements#selectFirst(java.lang.String)&#34; title=&#34;Find the first Element that matches the Selector CSS query within this element list.&#34;&gt;Elements.selectFirst(String cssQuery)&lt;/a&gt;&lt;/code&gt; and &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/select/Elements#expectFirst(java.lang.String)&#34; title=&#34;Just like Elements.selectFirst(String), but if there is no match, throws an IllegalArgumentException.&#34;&gt;Elements.expectFirst(String cssQuery)&lt;/a&gt;&lt;/code&gt;, to select the first&#xA;matching element from an &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/select/Elements&#34; title=&#34;A list of Elements, with methods that act on every element in the list.&#34;&gt;Elements&lt;/a&gt;&lt;/code&gt; list. &lt;a href=&#34;https://github.com/jhy/jsoup/pull/2263/&#34;&gt;#2263&lt;/a&gt;&lt;/li&gt;&#xA;&lt;li&gt;When parsing with the XML parser, XML Declarations and Processing Instructions are now handled directly, rather than bouncing&#xA;through the HTML parser’s bogus comment handler. The serialization of non-doctype declarations no longer ends with a&#xA;spurious &lt;code&gt;!&lt;/code&gt;. &lt;a href=&#34;https://github.com/jhy/jsoup/pull/2275&#34;&gt;#2275&lt;/a&gt;&lt;/li&gt;&#xA;&lt;li&gt;When converting parsed HTML to XML or the W3C DOM, element names containing &lt;code&gt;&amp;lt;&lt;/code&gt; are normalized to &lt;code&gt;_&lt;/code&gt; to ensure valid&#xA;XML. For example, &lt;code&gt;&amp;lt;foo&amp;lt;bar&amp;gt;&lt;/code&gt; becomes &lt;code&gt;&amp;lt;foo_bar&amp;gt;&lt;/code&gt;, as XML does not allow &lt;code&gt;&amp;lt;&lt;/code&gt; in element names, but HTML5&#xA;does. &lt;a href=&#34;https://github.com/jhy/jsoup/pull/2276&#34;&gt;#2276&lt;/a&gt;&lt;/li&gt;&#xA;&lt;li&gt;Reimplemented the HTML5 Adoption Agency Algorithm to the current spec. This handles mis-nested formatting / structural elements. 
&lt;a href=&#34;https://github.com/jhy/jsoup/pull/2278&#34;&gt;#2278&lt;/a&gt;&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h2 id=&#34;bug-fixes&#34;&gt;Bug Fixes&lt;/h2&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;If an element has an &lt;code&gt;;&lt;/code&gt; in an attribute name, it could not be converted to a W3C DOM element, and so subsequent XPath&#xA;queries could miss that element. Now, the attribute name is more completely&#xA;normalized. &lt;a href=&#34;https://github.com/jhy/jsoup/issues/2244&#34;&gt;#2244&lt;/a&gt;&lt;/li&gt;&#xA;&lt;li&gt;For backwards compatibility, reverted the internal attribute key for doctype names to&#xA;“name”. &lt;a href=&#34;https://github.com/jhy/jsoup/issues/2241&#34;&gt;#2241&lt;/a&gt;&lt;/li&gt;&#xA;&lt;li&gt;In &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/Connection&#34; title=&#34;The Connection interface is a convenient HTTP client and session object to fetch content from the web, and parse them&#xA; into Documents.&#34;&gt;Connection&lt;/a&gt;&lt;/code&gt;, skip cookies that have no name, rather than throwing a validation&#xA;exception. &lt;a href=&#34;https://github.com/jhy/jsoup/issues/2242&#34;&gt;#2242&lt;/a&gt;&lt;/li&gt;&#xA;&lt;li&gt;When running on JDK 1.8, the error &lt;code&gt;java.lang.NoSuchMethodError: java.nio.ByteBuffer.flip()Ljava/nio/ByteBuffer;&lt;/code&gt;&#xA;could be thrown when calling &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/Connection.Response#body()&#34; title=&#34;Get the body of the response as a plain String.&#34;&gt;Response#body()&lt;/a&gt;&lt;/code&gt; after parsing from a URL and the buffer size was&#xA;exceeded. 
&lt;a href=&#34;https://github.com/jhy/jsoup/pull/2250&#34;&gt;#2250&lt;/a&gt;&lt;/li&gt;&#xA;&lt;li&gt;For backwards compatibility, allow &lt;code&gt;null&lt;/code&gt; InputStream inputs to &lt;code&gt;Jsoup.parse(InputStream stream, ...)&lt;/code&gt;, by returning&#xA;an empty &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/nodes/Document&#34; title=&#34;A HTML Document.&#34;&gt;Document&lt;/a&gt;&lt;/code&gt;. &lt;a href=&#34;https://github.com/jhy/jsoup/issues/2252&#34;&gt;#2252&lt;/a&gt;&lt;/li&gt;&#xA;&lt;li&gt;A &lt;code&gt;template&lt;/code&gt; tag containing an &lt;code&gt;li&lt;/code&gt; within an open &lt;code&gt;li&lt;/code&gt; would be parsed incorrectly, as it was not recognized as a&#xA;“special” tag (which have additional processing rules). Also, added the SVG and MathML namespace tags to the list of&#xA;special tags. &lt;a href=&#34;https://github.com/jhy/jsoup/issues/2258&#34;&gt;#2258&lt;/a&gt;&lt;/li&gt;&#xA;&lt;li&gt;A &lt;code&gt;template&lt;/code&gt; tag containing a &lt;code&gt;button&lt;/code&gt; within an open &lt;code&gt;button&lt;/code&gt; would be parsed incorrectly, as the “in button scope”&#xA;check was not aware of the &lt;code&gt;template&lt;/code&gt; element. Corrected other instances including MathML and SVG elements,&#xA;also. &lt;a href=&#34;https://github.com/jhy/jsoup/issues/2271&#34;&gt;#2271&lt;/a&gt;&lt;/li&gt;&#xA;&lt;li&gt;An &lt;code&gt;:nth-child&lt;/code&gt; selector with a negative digit-less step, such as &lt;code&gt;:nth-child(-n+2)&lt;/code&gt;, would be parsed incorrectly as a&#xA;positive step, and so would not match as expected. &lt;a href=&#34;https://github.com/jhy/jsoup/issues/1147&#34;&gt;#1147&lt;/a&gt;&lt;/li&gt;&#xA;&lt;li&gt;Calling &lt;code&gt;doc.charset(charset)&lt;/code&gt; on an empty XML document would throw an&#xA;&lt;code&gt;IndexOutOfBoundsException&lt;/code&gt;. 
&lt;a href=&#34;https://github.com/jhy/jsoup/issues/2266&#34;&gt;#2266&lt;/a&gt;&lt;/li&gt;&#xA;&lt;li&gt;Fixed a memory leak when reusing a nested &lt;code&gt;StructuralEvaluator&lt;/code&gt; (e.g., a selector ancestor chain like &lt;code&gt;A B C&lt;/code&gt;) by&#xA;ensuring cache reset calls cascade to inner members. &lt;a href=&#34;https://github.com/jhy/jsoup/issues/2277&#34;&gt;#2277&lt;/a&gt;&lt;/li&gt;&#xA;&lt;li&gt;Concurrent calls to &lt;code&gt;doc.clone().append(html)&lt;/code&gt; were not supported. When a document was cloned, its &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/parser/Parser&#34; title=&#34;Parses HTML or XML into a Document.&#34;&gt;Parser&lt;/a&gt;&lt;/code&gt; was not cloned but was a shallow copy of the original parser. &lt;a href=&#34;https://github.com/jhy/jsoup/issues/2281&#34;&gt;#2281&lt;/a&gt;&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;hr/&gt;&#xA;&lt;p&gt;My sincere thanks to everyone who contributed to this release!&#xA;If you have any suggestions for the next release, I would love to hear them; please get in touch via &lt;a href=&#34;https://github.com/jhy/jsoup/discussions&#34;&gt;jsoup discussions&lt;/a&gt;, or with me &lt;a href=&#34;https://jhedley.com/&#34;&gt;directly&lt;/a&gt;.&lt;/p&gt;&#xA;&lt;p&gt;You can also &lt;a rel=&#34;me&#34; href=&#34;https://tilde.zone/@jhy&#34;&gt;follow me&lt;/a&gt; (&lt;b&gt;&lt;a rel=&#34;me&#34; href=&#34;https://tilde.zone/@jhy&#34;&gt;@jhy@tilde.zone&lt;/a&gt;&lt;/b&gt;) on Mastodon / Fediverse to receive occasional notes about jsoup releases.&lt;/p&gt;&#xA;&lt;p&gt;&lt;a href=&#34;/download&#34;&gt;&lt;i&gt;Download&lt;/i&gt;&lt;/a&gt; jsoup now.&lt;/p&gt;&#xA;&#xA;      </content>
  </entry>
  <entry>
    <title>Parse large documents efficiently with StreamParser</title>
    <link href="https://jsoup.org/cookbook/input/streamparser-dom-sax"></link>
    <id>https://jsoup.org/cookbook/input/streamparser-dom-sax</id>
    <updated>2025-03-02T00:00:00Z</updated>
    <summary>StreamParser is a hybrid Java SAX + DOM parser in jsoup, allowing efficient, incremental parsing without high memory usage. Process large or streamed HTML and XML documents seamlessly.</summary>
    <content type="html">&lt;div class=&#34;entry-head&#34;&gt;&lt;div class=&#34;meta pubdate&#34;&gt;&lt;time datetime=&#34;2025-03-02&#34;&gt;Mar 2, 2025&lt;/time&gt;&lt;/div&gt;&lt;/div&gt;&#xA;&lt;h2 id=&#34;problem&#34;&gt;Problem&lt;/h2&gt;&#xA;&lt;p&gt;You need to parse an HTML or XML document that is too large to fit entirely into memory, or you want to process elements progressively as they are encountered. A typical use case is extracting specific elements from a large document, or handling streamed HTML from a network source efficiently.&lt;/p&gt;&#xA;&lt;p&gt;Traditional Java SAX parsers offer efficient streaming parsing for XML and HTML, but they lack an ergonomic way to traverse or manipulate elements like a DOM parser. Meanwhile, standard DOM parsers, such as &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/Jsoup#parse(java.io.File)&#34; title=&#34;Parse the contents of a file as HTML.&#34;&gt;Jsoup.parse()&lt;/a&gt;&lt;/code&gt;, require loading the entire document into memory, which may be inefficient for large files.&lt;/p&gt;&#xA;&lt;h2 id=&#34;solution&#34;&gt;Solution&lt;/h2&gt;&#xA;&lt;p&gt;Use the &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/parser/StreamParser&#34; title=&#34;A StreamParser provides a progressive parse of its input.&#34;&gt;StreamParser&lt;/a&gt;&lt;/code&gt;, which allows you to parse an HTML or XML document in an event-driven, hybrid DOM + SAX style. Elements are emitted as they are completed, enabling efficient memory use and incremental processing. This hybrid approach allows you to process elements as they arrive, including their children and ancestors, while still leveraging jsoup’s intuitive API.&lt;/p&gt;&#xA;&lt;p&gt;This makes StreamParser a viable alternative to traditional SAX parsers while providing a more ergonomic and familiar API. 
And jsoup’s robust handling of malformed HTML and XML ensures that real-world documents can be processed effectively.&lt;/p&gt;&#xA;&lt;pre class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;k&#34;&gt;try&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;StreamParser&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;streamer&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Jsoup&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;na&#34;&gt;connect&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;#34;https://example.com/large.html&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;na&#34;&gt;execute&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;na&#34;&gt;streamParser&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;())&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;    &#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span 
class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Element&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;el&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;while&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;((&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;el&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;streamer&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;na&#34;&gt;selectNext&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;#34;article&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;))&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;!=&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;null&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;        &lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;// Will include the children of &amp;lt;article&amp;gt;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;        &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;System&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;na&#34;&gt;out&lt;/span&gt;&lt;span 
class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;na&#34;&gt;println&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;#34;Processing article: &amp;#34;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;+&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;el&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;na&#34;&gt;text&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;());&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;        &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;el&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;na&#34;&gt;remove&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;();&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;// Keep memory usage low by discarding processed elements&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;h2 id=&#34;description&#34;&gt;Description&lt;/h2&gt;&#xA;&lt;p&gt;Unlike the default jsoup &lt;a href=&#34;/apidocs/org/jsoup/Jsoup#parse(java.lang.String)&#34;&gt;&lt;code&gt;parse&lt;/code&gt;&lt;/a&gt; method, which constructs a full DOM tree in memory, &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/parser/StreamParser&#34; title=&#34;A StreamParser provides a progressive parse of its 
input.&#34;&gt;StreamParser&lt;/a&gt;&lt;/code&gt; allows for progressive parsing:&lt;/p&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;Elements are fully formed and emitted as they are completed.&lt;/li&gt;&#xA;&lt;li&gt;The parser can run in an iterator-like fashion with &lt;a href=&#34;/apidocs/org/jsoup/parser/StreamParser#selectNext(java.lang.String)&#34;&gt;&lt;code&gt;selectNext(query)&lt;/code&gt;&lt;/a&gt; to fetch elements as needed.&lt;/li&gt;&#xA;&lt;li&gt;The DOM tree can be pruned during parsing to save memory.&lt;/li&gt;&#xA;&lt;li&gt;The &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/parser/StreamParser#document()&#34; title=&#34;Get the current Document as it is being parsed.&#34;&gt;document()&lt;/a&gt;&lt;/code&gt; method provides access to the partially built document.&lt;/li&gt;&#xA;&lt;li&gt;Parsing can be stopped early with &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/parser/StreamParser#stop()&#34; title=&#34;Flags that the parse should be stopped; the backing iterator will not return any more Elements.&#34;&gt;stop()&lt;/a&gt;&lt;/code&gt; if only a portion of the document is needed.&lt;/li&gt;&#xA;&lt;li&gt;The backing input (a URL connection, or a file) is read incrementally as the parse proceeds, reducing buffer bloat.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;p&gt;A StreamParser can be reused by giving it new input via &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/parser/StreamParser#parse(java.io.Reader,java.lang.String)&#34; title=&#34;Provide the input for a Document parse.&#34;&gt;parse(Reader, String)&lt;/a&gt;&lt;/code&gt;, but it is not thread-safe for concurrent inputs. 
Use a separate parser in each thread.&lt;/p&gt;&#xA;&lt;p&gt;If it is created via &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/Connection.Response#streamParser()&#34; title=&#34;Returns a StreamParser that will parse the Response progressively.&#34;&gt;Connection.Response#streamParser()&lt;/a&gt;&lt;/code&gt;, or from another I/O-backed Reader, the iterator and&#xA;stream consumers will throw a &lt;code&gt;java.io.UncheckedIOException&lt;/code&gt; if the underlying Reader fails during a read.&lt;/p&gt;&#xA;&lt;p&gt;The StreamParser wraps an underlying HTML or XML parser, so the same configuration options can be used as with the standard &lt;code&gt;Jsoup.parse&lt;/code&gt; method.&lt;/p&gt;&#xA;&lt;h2 id=&#34;examples&#34;&gt;Examples&lt;/h2&gt;&#xA;&lt;h3 id=&#34;process-a-file-in-chunks&#34;&gt;Process a file in chunks&lt;/h3&gt;&#xA;&lt;p&gt;Let’s say we have an XML file made up of many &lt;code&gt;&amp;lt;book&amp;gt;&lt;/code&gt; chunks, each with many &lt;code&gt;&amp;lt;chapter&amp;gt;&lt;/code&gt; elements, and loading it all into the DOM at once might run out of memory. Parse the file incrementally using &lt;a href=&#34;/apidocs/org/jsoup/helper/DataUtil#streamParser(java.nio.file.Path,java.nio.charset.Charset,java.lang.String,org.jsoup.parser.Parser)&#34;&gt;&lt;code&gt;DataUtil.streamParser(...)&lt;/code&gt;&lt;/a&gt;. 
Then process the file in chunks by iterating on &lt;code&gt;selectNext(cssquery)&lt;/code&gt;:&lt;/p&gt;&#xA;&lt;pre class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kd&#34;&gt;static&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;kt&#34;&gt;void&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;nf&#34;&gt;streamChunks&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Path&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;path&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;kd&#34;&gt;throws&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;IOException&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;&lt;/span&gt;&lt;span class=&#34;k&#34;&gt;try&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;StreamParser&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;streamer&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;DataUtil&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;na&#34;&gt;streamParser&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;path&lt;/span&gt;&lt;span 
class=&#34;p&#34;&gt;,&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;StandardCharsets&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;na&#34;&gt;UTF_8&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;#34;https://example.com&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Parser&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;na&#34;&gt;xmlParser&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()))&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Element&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;el&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;kd&#34;&gt;var&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;seenChunks&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;0&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;    
&lt;/span&gt;&lt;span class=&#34;k&#34;&gt;while&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;((&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;el&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;streamer&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;na&#34;&gt;selectNext&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;#34;book&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;))&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;!=&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;null&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;        &lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;// do something more useful! 
The element will have all its children elements&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;        &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Elements&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;chapters&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;el&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;na&#34;&gt;select&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;#34;chapter&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;);&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;        &lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;// remove this chunk once used to keep DOM light and not run out of memory&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;        &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;el&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;na&#34;&gt;remove&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;();&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;        &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;seenChunks&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;++&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span 
class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Document&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;doc&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;streamer&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;na&#34;&gt;document&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;();&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;// the completed doc, will just be a shell&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;log&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;#34;Title&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;doc&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;na&#34;&gt;expectFirst&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;#34;title&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;));&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;log&lt;/span&gt;&lt;span 
class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;#34;Seen chunks&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;seenChunks&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;);&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;h3 id=&#34;parse-just-the-metadata-of-a-website&#34;&gt;Parse just the metadata of a website&lt;/h3&gt;&#xA;&lt;p&gt;Assume we are building a link preview tool. All the data we need is in the head section of a page, and so there’s no need to fetch and parse the complete page. 
Make the request using &lt;a href=&#34;/apidocs/org/jsoup/Jsoup#connect(java.lang.String)&#34;&gt;&lt;code&gt;Jsoup.connect(url)&lt;/code&gt;&lt;/a&gt;, and stream-parse it via &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/Connection.Response#streamParser()&#34; title=&#34;Returns a StreamParser that will parse the Response progressively.&#34;&gt;Response.streamParser()&lt;/a&gt;&lt;/code&gt;.&lt;/p&gt;&#xA;&lt;p&gt;This example will fetch a given URL, parse only the &lt;code&gt;&amp;lt;head&amp;gt;&lt;/code&gt; contents and use those, and then cleanly close the request:&lt;/p&gt;&#xA;&lt;pre class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kd&#34;&gt;static&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;kt&#34;&gt;void&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;nf&#34;&gt;selectMeta&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;String&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;url&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;kd&#34;&gt;throws&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;IOException&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;&lt;/span&gt;&lt;span class=&#34;k&#34;&gt;try&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;StreamParser&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span 
class=&#34;n&#34;&gt;streamer&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Jsoup&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;na&#34;&gt;connect&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;url&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;).&lt;/span&gt;&lt;span class=&#34;na&#34;&gt;execute&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;().&lt;/span&gt;&lt;span class=&#34;na&#34;&gt;streamParser&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;())&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Element&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;head&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;streamer&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;na&#34;&gt;selectFirst&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;#34;head&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;);&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;if&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;head&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;==&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; 
&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;null&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;return&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;log&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;#34;Title&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;head&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;na&#34;&gt;select&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;#34;title&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;).&lt;/span&gt;&lt;span class=&#34;na&#34;&gt;text&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;());&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;log&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;#34;Description&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;head&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;na&#34;&gt;select&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;#34;meta[name=description]&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;).&lt;/span&gt;&lt;span 
class=&#34;na&#34;&gt;attr&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;#34;content&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;));&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;log&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;#34;Image&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;head&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;na&#34;&gt;select&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;#34;meta[name=twitter:image]&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;).&lt;/span&gt;&lt;span class=&#34;na&#34;&gt;attr&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;#34;content&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;));&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;h3 id=&#34;minify-the-loaded-dom-by-removing-empty-text-nodes&#34;&gt;Minify the loaded DOM by removing empty text nodes&lt;/h3&gt;&#xA;&lt;p&gt;This example shows a way to progressively parse an input and remove redundant empty text nodes during the parse, resulting in a (somewhat) minified 
DOM:&lt;/p&gt;&#xA;&lt;pre class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kd&#34;&gt;static&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;kt&#34;&gt;void&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;nf&#34;&gt;minifyDocument&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;String&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;html&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;#34;&amp;lt;table&amp;gt;&amp;lt;tr&amp;gt; &amp;lt;td&amp;gt;a&amp;lt;/td&amp;gt; &amp;lt;td&amp;gt;a&amp;lt;/td&amp;gt; &amp;lt;td&amp;gt;a&amp;lt;/td&amp;gt; &amp;lt;td&amp;gt;a&amp;lt;/td&amp;gt; &amp;lt;/tr&amp;gt;&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;StreamParser&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;streamer&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;new&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;StreamParser&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Parser&lt;/span&gt;&lt;span 
class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;na&#34;&gt;htmlParser&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()).&lt;/span&gt;&lt;span class=&#34;na&#34;&gt;parse&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;html&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;#34;&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;);&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;streamer&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;na&#34;&gt;stream&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;na&#34;&gt;filter&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Element&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;::&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;isBlock&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;na&#34;&gt;forEach&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;el&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;-&amp;gt;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; 
&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;        &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;List&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;TextNode&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;textNodes&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;el&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;na&#34;&gt;textNodes&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;();&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;        &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;for&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;TextNode&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;textNode&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;textNodes&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;            &lt;/span&gt;&lt;span class=&#34;k&#34;&gt;if&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span 
class=&#34;n&#34;&gt;textNode&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;na&#34;&gt;isBlank&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;())&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;                &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;textNode&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;na&#34;&gt;remove&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;();&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;        &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;});&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Document&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;minified&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;streamer&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;na&#34;&gt;document&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;();&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span 
class=&#34;w&#34;&gt;&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;System&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;na&#34;&gt;out&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;na&#34;&gt;println&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;minified&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;na&#34;&gt;body&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;());&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;h2 id=&#34;conclusion&#34;&gt;Conclusion&lt;/h2&gt;&#xA;&lt;p&gt;The StreamParser provides a practical solution for handling large or streamed XML and HTML documents efficiently, balancing the benefits of both SAX and DOM parsing. Whether you need to extract elements incrementally, reduce memory consumption, or selectively parse content, StreamParser offers a flexible alternative to traditional Java SAX parsers while maintaining the familiar API and robust parsing capabilities of jsoup.&lt;/p&gt;&#xA;&#xA;      </content>
  </entry>
  <entry>
    <title>jsoup 1.18.3 quick update</title>
    <link href="https://jsoup.org/news/release-1.18.3"></link>
    <id>https://jsoup.org/news/release-1.18.3</id>
    <updated>2024-12-02T00:00:00Z</updated>
    <summary>jsoup 1.18.3 is a quick release to fix an issue serializing XML, from 1.18.2.</summary>
    <content type="html">&lt;div class=&#34;entry-head&#34;&gt;&lt;div class=&#34;meta pubdate&#34;&gt;&lt;time datetime=&#34;2024-12-02&#34;&gt;Dec 2, 2024&lt;/time&gt;&lt;/div&gt;&lt;/div&gt;&#xA;&lt;p&gt;&lt;code&gt;1.18.3&lt;/code&gt; is a quick release to fix &lt;a href=&#34;https://github.com/jhy/jsoup/issues/2235&#34;&gt;#2235&lt;/a&gt; in &lt;code&gt;1.18.2&lt;/code&gt;.&lt;/p&gt;&#xA;&lt;p&gt;Please see also the full release notes for &lt;a href=&#34;/news/release-1.18.2&#34;&gt;jsoup 1.18.2&lt;/a&gt; if you are coming from an earlier release.&lt;/p&gt;&#xA;&lt;p&gt;&lt;a href=&#34;/download&#34;&gt;&lt;i&gt;Download&lt;/i&gt;&lt;/a&gt; jsoup now.&lt;/p&gt;&#xA;&lt;h2 id=&#34;bug-fixes&#34;&gt;Bug Fixes&lt;/h2&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;When serializing to XML, attribute names containing &lt;code&gt;-&lt;/code&gt;, &lt;code&gt;.&lt;/code&gt;, or digits were incorrectly marked as invalid and&#xA;removed. &lt;a href=&#34;https://github.com/jhy/jsoup/issues/2235&#34;&gt;2235&lt;/a&gt;&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;hr/&gt;&#xA;&lt;p&gt;If you have any suggestions for the next release, I would love to hear them; please get in touch via &lt;a href=&#34;https://github.com/jhy/jsoup/discussions&#34;&gt;jsoup discussions&lt;/a&gt;, or with me &lt;a href=&#34;https://jhedley.com/&#34;&gt;directly&lt;/a&gt;.&lt;/p&gt;&#xA;&lt;p&gt;You can also &lt;a rel=&#34;me&#34; href=&#34;https://tilde.zone/@jhy&#34;&gt;follow me&lt;/a&gt; (&lt;b&gt;&lt;a rel=&#34;me&#34; href=&#34;https://tilde.zone/@jhy&#34;&gt;@jhy@tilde.zone&lt;/a&gt;&lt;/b&gt;) on Mastodon / Fediverse to receive occasional notes about jsoup releases.&lt;/p&gt;&#xA;&#xA;      </content>
  </entry>
  <entry>
    <title>jsoup Java HTML Parser release 1.18.2</title>
    <link href="https://jsoup.org/news/release-1.18.2"></link>
    <id>https://jsoup.org/news/release-1.18.2</id>
    <updated>2024-11-27T00:00:00Z</updated>
    <summary>jsoup 1.18.2 is out now, with significant performance gains when parsing HTML inputs, plus a range of other improvements and fixes.</summary>
    <content type="html">&lt;div class=&#34;entry-head&#34;&gt;&lt;div class=&#34;meta pubdate&#34;&gt;&lt;time datetime=&#34;2024-11-27&#34;&gt;Nov 27, 2024&lt;/time&gt;&lt;/div&gt;&lt;/div&gt;&#xA;&lt;p&gt;&lt;i&gt;jsoup 1.18.2&lt;/i&gt; is out now, with significant performance gains when parsing HTML inputs, plus a range of other improvements and fixes.&lt;/p&gt;&#xA;&lt;p&gt;&lt;i&gt;jsoup&lt;/i&gt; is a Java library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of HTML5 DOM methods and CSS selectors.&lt;/p&gt;&#xA;&lt;p&gt;&lt;a href=&#34;/download&#34;&gt;&lt;i&gt;Download&lt;/i&gt;&lt;/a&gt; jsoup now.&lt;/p&gt;&#xA;&lt;h2 id=&#34;improvements&#34;&gt;Improvements&lt;/h2&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;Optimized the throughput and memory use throughout the input read and parse flows, with heap allocations and GC&#xA;down by between 6% and 89%, and throughput improved by up to 143% for small inputs. Most input sizes will see&#xA;throughput increases of ~20%. These performance improvements come through recycling the backing &lt;code&gt;byte[]&lt;/code&gt; and &lt;code&gt;char[]&lt;/code&gt;&#xA;arrays used to read and parse the input. &lt;a href=&#34;https://github.com/jhy/jsoup/pull/2186&#34;&gt;2186&lt;/a&gt;&lt;/li&gt;&#xA;&lt;li&gt;Speed optimized &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/nodes/Element#html()&#34; title=&#34;Retrieves the element&amp;#39;s inner HTML.&#34;&gt;html()&lt;/a&gt;&lt;/code&gt; and &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/nodes/Entities#escape(java.lang.String)&#34; title=&#34;HTML escape an input string, using the default settings (UTF-8, base entities).&#34;&gt;Entities.escape()&lt;/a&gt;&lt;/code&gt; when the input contains Unicode characters in a supplementary plane, by&#xA;around 49%. 
&lt;a href=&#34;https://github.com/jhy/jsoup/pull/2183&#34;&gt;2183&lt;/a&gt;&lt;/li&gt;&#xA;&lt;li&gt;The form associated elements returned by &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/nodes/FormElement#elements()&#34; title=&#34;Get the list of form control elements associated with this form.&#34;&gt;FormElement.elements()&lt;/a&gt;&lt;/code&gt; now reflect changes made to the DOM&#xA;subsequent to the original parse. &lt;a href=&#34;https://github.com/jhy/jsoup/issues/2140&#34;&gt;2140&lt;/a&gt;&lt;/li&gt;&#xA;&lt;li&gt;In the &lt;code&gt;TreeBuilder&lt;/code&gt;, the &lt;code&gt;onNodeInserted()&lt;/code&gt; and &lt;code&gt;onNodeClosed()&lt;/code&gt; events are now also fired for the outermost /&#xA;root &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/nodes/Document&#34; title=&#34;A HTML Document.&#34;&gt;Document&lt;/a&gt;&lt;/code&gt; node. This enables source position tracking on the Document node (which was previously unset). It&#xA;also enables the node traversor to see the outer Document node. &lt;a href=&#34;https://github.com/jhy/jsoup/pull/2182&#34;&gt;2182&lt;/a&gt;&lt;/li&gt;&#xA;&lt;li&gt;Selected Elements can now be position swapped inline using&#xA;&lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/select/Elements#set(int,org.jsoup.nodes.Element)&#34; title=&#34;Replace the Element at the specified index in this list, and in the DOM.&#34;&gt;Elements#set()&lt;/a&gt;&lt;/code&gt;. &lt;a href=&#34;https://github.com/jhy/jsoup/issues/2212&#34;&gt;2212&lt;/a&gt;&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h2 id=&#34;bug-fixes&#34;&gt;Bug Fixes&lt;/h2&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/nodes/Element#cssSelector()&#34; title=&#34;Get a CSS selector that will uniquely select this element.&#34;&gt;Element.cssSelector()&lt;/a&gt;&lt;/code&gt; would fail if the element’s class contained a &lt;code&gt;*&lt;/code&gt;&#xA;character. 
&lt;a href=&#34;https://github.com/jhy/jsoup/issues/2169&#34;&gt;2169&lt;/a&gt;&lt;/li&gt;&#xA;&lt;li&gt;When tracking source ranges, a text node following an invalid self-closing element may be left&#xA;untracked. &lt;a href=&#34;https://github.com/jhy/jsoup/issues/2175&#34;&gt;2175&lt;/a&gt;&lt;/li&gt;&#xA;&lt;li&gt;When a document has no doctype, or a doctype not named &lt;code&gt;html&lt;/code&gt;, it should be parsed in Quirks&#xA;Mode. &lt;a href=&#34;https://github.com/jhy/jsoup/issues/2197&#34;&gt;2197&lt;/a&gt;&lt;/li&gt;&#xA;&lt;li&gt;With a selector like &lt;code&gt;div:has(span + a)&lt;/code&gt;, the &lt;code&gt;has()&lt;/code&gt; component was not working correctly, as the inner combining&#xA;query caused the evaluator to match those against the outer’s siblings, not&#xA;children. &lt;a href=&#34;https://github.com/jhy/jsoup/issues/2187&#34;&gt;2187&lt;/a&gt;&lt;/li&gt;&#xA;&lt;li&gt;A selector query that included multiple &lt;code&gt;:has()&lt;/code&gt; components in a nested &lt;code&gt;:has()&lt;/code&gt; might incorrectly&#xA;execute. &lt;a href=&#34;https://github.com/jhy/jsoup/issues/2131&#34;&gt;2131&lt;/a&gt;&lt;/li&gt;&#xA;&lt;li&gt;When cookie names in a response are duplicated, the simple view of cookies available via&#xA;&lt;code&gt;Connection.Response#cookies()&lt;/code&gt; will provide the last one set. Generally it is better to use&#xA;the &lt;a href=&#34;https://jsoup.org/cookbook/web/request-session&#34;&gt;Jsoup.newSession&lt;/a&gt; method to maintain a cookie jar, as that&#xA;applies appropriate path selection on cookies when making requests. &lt;a href=&#34;https://github.com/jhy/jsoup/issues/1831&#34;&gt;1831&lt;/a&gt;&lt;/li&gt;&#xA;&lt;li&gt;When parsing named HTML entities, base entities should resolve if they are a prefix of the input token (and not in an&#xA;attribute). 
&lt;a href=&#34;https://github.com/jhy/jsoup/issues/2207&#34;&gt;2207&lt;/a&gt;&lt;/li&gt;&#xA;&lt;li&gt;Fixed incorrect tracking of source ranges for attributes merged from late-occurring elements that were implicitly&#xA;created (&lt;code&gt;html&lt;/code&gt; or &lt;code&gt;body&lt;/code&gt;). &lt;a href=&#34;https://github.com/jhy/jsoup/issues/2204&#34;&gt;2204&lt;/a&gt;&lt;/li&gt;&#xA;&lt;li&gt;Follow the current HTML specification in the tokenizer to allow &lt;code&gt;&amp;lt;&lt;/code&gt; as part of a tag name, instead of emitting it as a&#xA;character node. &lt;a href=&#34;https://github.com/jhy/jsoup/issues/2230&#34;&gt;2230&lt;/a&gt;&lt;/li&gt;&#xA;&lt;li&gt;Similarly, allow a &lt;code&gt;&amp;lt;&lt;/code&gt; as the start of an attribute name, vs creating a new element. The previous behavior was&#xA;intended to parse closer to what we anticipated the author’s intent to be, but that does not align to the spec or to&#xA;how browsers behave. &lt;a href=&#34;https://github.com/jhy/jsoup/issues/1483&#34;&gt;1483&lt;/a&gt;&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;hr/&gt;&#xA;&lt;p&gt;My sincere thanks to everyone who contributed patches, suggestions, and bug reports. If you have any suggestions for the next release, I would love to hear them; please get in touch via &lt;a href=&#34;https://github.com/jhy/jsoup/discussions&#34;&gt;jsoup discussions&lt;/a&gt;, or with me &lt;a href=&#34;https://jhedley.com/&#34;&gt;directly&lt;/a&gt;.&lt;/p&gt;&#xA;&lt;p&gt;You can also &lt;a rel=&#34;me&#34; href=&#34;https://tilde.zone/@jhy&#34;&gt;follow me&lt;/a&gt; (&lt;b&gt;&lt;a rel=&#34;me&#34; href=&#34;https://tilde.zone/@jhy&#34;&gt;@jhy@tilde.zone&lt;/a&gt;&lt;/b&gt;) on Mastodon / Fediverse to receive occasional notes about jsoup releases.&lt;/p&gt;&#xA;&#xA;      </content>
  </entry>
  <entry>
    <title>Maintaining a request session</title>
    <link href="https://jsoup.org/cookbook/web/request-session"></link>
    <id>https://jsoup.org/cookbook/web/request-session</id>
    <updated>2024-07-11T00:00:00Z</updated>
    <summary>A guide to maintaining web request sessions in jsoup.</summary>
    <content type="html">&lt;div class=&#34;entry-head&#34;&gt;&lt;div class=&#34;meta pubdate&#34;&gt;&lt;time datetime=&#34;2024-07-11&#34;&gt;Jul 11, 2024&lt;/time&gt;&lt;/div&gt;&lt;/div&gt;&#xA;&lt;h2 id=&#34;problem&#34;&gt;Problem&lt;/h2&gt;&#xA;&lt;p&gt;You want to perform multiple HTTP requests using the same configuration, and retain cookies across these requests.&lt;/p&gt;&#xA;&lt;h2 id=&#34;solution&#34;&gt;Solution&lt;/h2&gt;&#xA;&lt;p&gt;Use the &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/Jsoup#newSession()&#34; title=&#34;Creates a new Connection to use as a session.&#34;&gt;Jsoup.newSession()&lt;/a&gt;&lt;/code&gt; method to create a new session, represented by the &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/Connection&#34; title=&#34;The Connection interface is a convenient HTTP client and session object to fetch content from the web, and parse them&#xA; into Documents.&#34;&gt;Connection&lt;/a&gt;&lt;/code&gt; interface:&lt;/p&gt;&#xA;&lt;pre class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;// Create a new session with settings applied to all requests:&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Connection&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;session&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Jsoup&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;na&#34;&gt;newSession&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;()&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;  &#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span 
class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;na&#34;&gt;timeout&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;45&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;*&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;1000&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;  &#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;na&#34;&gt;maxBodySize&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;5&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;*&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;1024&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;*&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;1024&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;);&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;  &#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;  &#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;&lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;// Make the first request:  &lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;Document&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;req1&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span 
class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;session&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;na&#34;&gt;newRequest&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;#34;https://example.com/auth&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;  &#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;na&#34;&gt;data&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;#34;auth-code&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;#34;my-secret-token&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;  &#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;na&#34;&gt;post&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;();&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;  &#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;  &#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;&lt;/span&gt;&lt;span class=&#34;c1&#34;&gt;// Make a following request with the same settings, and cookies set from req1:&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;&lt;/span&gt;&lt;span 
class=&#34;n&#34;&gt;Document&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;req2&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;w&#34;&gt; &lt;/span&gt;&lt;span class=&#34;n&#34;&gt;session&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;na&#34;&gt;newRequest&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s&#34;&gt;&amp;#34;https://example.com/admin/&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;w&#34;&gt;    &lt;/span&gt;&lt;span class=&#34;p&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;na&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;();&lt;/span&gt;&lt;span class=&#34;w&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;h2 id=&#34;description&#34;&gt;Description&lt;/h2&gt;&#xA;&lt;p&gt;The session created by &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/Jsoup#newSession()&#34; title=&#34;Creates a new Connection to use as a session.&#34;&gt;newSession()&lt;/a&gt;&lt;/code&gt; supports making multiple requests with the same configuration. 
Any request-level settings applied on that session will be applied to each actual request.&lt;/p&gt;&#xA;&lt;p&gt;Cookies set by responses to those requests will be kept in a cookie jar for use in later requests.&lt;/p&gt;&#xA;&lt;p&gt;The &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/Connection#newRequest(java.lang.String)&#34; title=&#34;Creates a new request, using this Connection as the session-state and to initialize the connection settings (which&#xA;     may then be independently changed on the returned Connection.Request object).&#34;&gt;newRequest(String url)&lt;/a&gt;&lt;/code&gt; method returns a &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/Connection&#34; title=&#34;The Connection interface is a convenient HTTP client and session object to fetch content from the web, and parse them&#xA; into Documents.&#34;&gt;Connection&lt;/a&gt;&lt;/code&gt; object that is pre-configured with the session settings, but those settings can be overridden for that specific request.&lt;/p&gt;&#xA;&lt;p&gt;Sessions are thread-safe, meaning multiple threads can call &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/Connection#newRequest()&#34; title=&#34;Creates a new request, using this Connection as the session-state and to initialize the connection settings (which&#xA;     may then be independently changed on the returned Connection.Request object).&#34;&gt;newRequest()&lt;/a&gt;&lt;/code&gt; on the same session concurrently. Each request object should only be used by a single worker thread at once.&lt;/p&gt;&#xA;&lt;p&gt;The session’s cookie store is accessible via &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/Connection#cookieStore()&#34; title=&#34;Get the cookie store used by this Connection.&#34;&gt;Connection.cookieStore()&lt;/a&gt;&lt;/code&gt;. This is maintained in memory for the lifetime of the session. For longer sessions, you can save the cookie store to disk by serializing it.&lt;/p&gt;&#xA;&#xA;      </content>
  </entry>
  <entry>
    <title>jsoup Java HTML Parser release 1.18.1</title>
    <link href="https://jsoup.org/news/release-1.18.1"></link>
    <id>https://jsoup.org/news/release-1.18.1</id>
    <updated>2024-07-10T00:00:00Z</updated>
    <summary>jsoup 1.18.1 is out now, with a new streaming parser that provides a hybrid DOM + SAX event-driven parsing interface, request progress tracking, and many other improvements.</summary>
    <content type="html">&lt;div class=&#34;entry-head&#34;&gt;&lt;div class=&#34;meta pubdate&#34;&gt;&lt;time datetime=&#34;2024-07-10&#34;&gt;Jul 10, 2024&lt;/time&gt;&lt;/div&gt;&lt;/div&gt;&#xA;&lt;p&gt;&lt;b&gt;jsoup 1.18.1&lt;/b&gt; is out now, with a new streaming parser that provides a hybrid DOM + SAX event-driven parsing interface, request progress tracking, and many other improvements.&lt;/p&gt;&#xA;&lt;p&gt;&lt;b&gt;jsoup&lt;/b&gt; is a Java library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of HTML5 DOM methods and CSS selectors.&lt;/p&gt;&#xA;&lt;p&gt;&lt;a href=&#34;/download&#34;&gt;&lt;b&gt;Download&lt;/b&gt;&lt;/a&gt; jsoup now.&lt;/p&gt;&#xA;&lt;h2 id=&#34;improvements&#34;&gt;Improvements&lt;/h2&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;&lt;b&gt;Stream Parser&lt;/b&gt;: A &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/parser/StreamParser&#34; title=&#34;A StreamParser provides a progressive parse of its input.&#34;&gt;StreamParser&lt;/a&gt;&lt;/code&gt; provides a progressive parse of its input. For URL requests, available via &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/Connection.Response#streamParser()&#34; title=&#34;Returns a StreamParser that will parse the Response progressively.&#34;&gt;Connection.Response.streamParser()&lt;/a&gt;&lt;/code&gt;. As each &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/nodes/Element&#34; title=&#34;An HTML Element consists of a tag name, attributes, and child nodes (including text nodes and other elements).&#34;&gt;Element&lt;/a&gt;&lt;/code&gt; is completed, it is emitted via a &lt;code&gt;Stream&lt;/code&gt; or &lt;code&gt;Iterator&lt;/code&gt; interface. Elements returned will be complete with all their children, and an (empty) next sibling, if applicable. Elements (or their children) may be removed from the DOM during the parse, for e.g. 
to conserve memory, providing a mechanism to parse an input document that would otherwise be too large to fit into memory, yet still providing a DOM interface to the document and its elements. Additionally, the parser provides the &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/nodes/Element#selectFirst(java.lang.String)&#34; title=&#34;Find the first Element that matches the Selector CSS query, with this element as the starting context.&#34;&gt;selectFirst(String query)&lt;/a&gt;&lt;/code&gt; / &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/parser/StreamParser#selectNext(java.lang.String)&#34; title=&#34;Finds the next Element that matches the provided query.&#34;&gt;selectNext(String query)&lt;/a&gt;&lt;/code&gt; methods, which run the parser until a hit is found, at which point the parse is suspended. It can be resumed via another &lt;code&gt;select()&lt;/code&gt; call, or via the &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/nodes/Element#stream()&#34; title=&#34;Returns a Stream of this Element and all of its descendant Elements.&#34;&gt;stream()&lt;/a&gt;&lt;/code&gt; or &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/nodes/Element#iterator()&#34; title=&#34;Returns an Iterator that iterates this Element and each of its descendant Elements, in document order.&#34;&gt;iterator()&lt;/a&gt;&lt;/code&gt; methods. &lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/pull/2096&#34;&gt;#2096&lt;/a&gt; (with examples)&lt;/small&gt;&lt;/li&gt;&#xA;&lt;li&gt;&lt;b&gt;Download Progress&lt;/b&gt;: added a Response &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/Progress&#34;&gt;Progress&lt;/a&gt;&lt;/code&gt; event interface, which reports progress as URLs are downloaded (and parsed). Set via &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/Connection#onResponseProgress(org.jsoup.Progress)&#34; title=&#34;Set the response progress handler, which will be called periodically as the response body is downloaded.&#34;&gt;Connection.onResponseProgress()&lt;/a&gt;&lt;/code&gt;. 
Supported on both a session and a single connection level. &lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/pull/2164&#34;&gt;#2164&lt;/a&gt;&lt;/small&gt;, &lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/issues/656&#34;&gt;#656&lt;/a&gt;&lt;/small&gt;&lt;/li&gt;&#xA;&lt;li&gt;Added &lt;code&gt;Path&lt;/code&gt; accepting parse methods: &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/Jsoup#parse(java.nio.file.Path)&#34; title=&#34;Parse the contents of a file as HTML.&#34;&gt;Jsoup.parse(Path)&lt;/a&gt;&lt;/code&gt;, &lt;code&gt;Jsoup.parse(path, charsetName, baseUri, parser)&lt;/code&gt;, etc. &lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/pull/2055&#34;&gt;#2055&lt;/a&gt;&lt;/small&gt;&lt;/li&gt;&#xA;&lt;li&gt;Updated the &lt;code&gt;button&lt;/code&gt; tag configuration to include a space between multiple button elements in the &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/nodes/Element#text()&#34; title=&#34;Gets the normalized, combined text of this element and all its children.&#34;&gt;Element.text()&lt;/a&gt;&lt;/code&gt; method. &lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/issues/2105&#34;&gt;#2105&lt;/a&gt;&lt;/small&gt;&lt;/li&gt;&#xA;&lt;li&gt;Added support for the &lt;code&gt;ns|*&lt;/code&gt; all elements in namespace Selector. &lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/issues/1811&#34;&gt;#1811&lt;/a&gt;&lt;/small&gt;&lt;/li&gt;&#xA;&lt;li&gt;When normalising attribute names during serialization, invalid characters are now replaced with &lt;code&gt;_&lt;/code&gt;, vs being stripped. This should make the process clearer, and generally prevent an invalid attribute name being coerced unexpectedly. &lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/issues/2143&#34;&gt;#2143&lt;/a&gt;&lt;/small&gt;&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h2 id=&#34;changes&#34;&gt;Changes&lt;/h2&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;Removed previously deprecated internal classes and methods. 
&lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/pull/2094&#34;&gt;#2094&lt;/a&gt;&lt;/small&gt;&lt;/li&gt;&#xA;&lt;li&gt;Build change: the built jar’s OSGi manifest no longer imports itself. &lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/issues/2158&#34;&gt;#2158&lt;/a&gt;&lt;/small&gt;&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h2 id=&#34;bug-fixes&#34;&gt;Bug Fixes&lt;/h2&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;When tracking source positions, if the first node was a TextNode, its position was incorrectly set to &lt;code&gt;-1.&lt;/code&gt; &lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/issues/2106&#34;&gt;#2106&lt;/a&gt;&lt;/small&gt;&lt;/li&gt;&#xA;&lt;li&gt;When connecting (or redirecting) to URLs with characters such as &lt;code&gt;{&lt;/code&gt;, &lt;code&gt;}&lt;/code&gt; in the path, a Malformed URL exception would be thrown (if in development), or the URL might otherwise not be escaped correctly (if in production). The URL encoding process has been improved to handle these characters correctly. &lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/issues/2142&#34;&gt;#2142&lt;/a&gt;&lt;/small&gt;&lt;/li&gt;&#xA;&lt;li&gt;When using &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/helper/W3CDom&#34; title=&#34;Helper class to transform a Document to a org.w3c.dom.Document,&#xA; for integration with toolsets that use the W3C DOM.&#34;&gt;W3CDom&lt;/a&gt;&lt;/code&gt; with a custom output Document, a Null Pointer Exception would be thrown. &lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/pull/2114&#34;&gt;#2114&lt;/a&gt;&lt;/small&gt;&lt;/li&gt;&#xA;&lt;li&gt;The &lt;code&gt;:has()&lt;/code&gt; selector did not match correctly when using sibling combinators (like e.g.: &lt;code&gt;h1:has(+h2)&lt;/code&gt;). 
&lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/issues/2137&#34;&gt;#2137&lt;/a&gt;&lt;/small&gt;&lt;/li&gt;&#xA;&lt;li&gt;The &lt;code&gt;:empty&lt;/code&gt; selector incorrectly matched elements that started with a blank text node and were followed by non-empty nodes, due to an incorrect short-circuit. &lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/issues/2130&#34;&gt;#2130&lt;/a&gt;&lt;/small&gt;&lt;/li&gt;&#xA;&lt;li&gt;&lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/nodes/Element#cssSelector()&#34; title=&#34;Get a CSS selector that will uniquely select this element.&#34;&gt;Element.cssSelector()&lt;/a&gt;&lt;/code&gt; would fail with “Did not find balanced marker” when building a selector for elements that had a &lt;code&gt;(&lt;/code&gt; or &lt;code&gt;[&lt;/code&gt; in their class names. Selectors with those characters escaped also did not match as expected. &lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/issues/2146&#34;&gt;#2146&lt;/a&gt;&lt;/small&gt;&lt;/li&gt;&#xA;&lt;li&gt;Updated &lt;code&gt;Entities.escape(string)&lt;/code&gt; to make the escaped text suitable for both text nodes and attributes (previously it was suitable only for text nodes). This does not impact the output of &lt;code&gt;&lt;a href=&#34;/apidocs/org/jsoup/nodes/Element#html()&#34; title=&#34;Retrieves the element&amp;#39;s inner HTML.&#34;&gt;Element.html()&lt;/a&gt;&lt;/code&gt;, which correctly applies a minimal escape depending on whether the output is for text data or a quoted attribute. &lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/issues/1278&#34;&gt;#1278&lt;/a&gt;&lt;/small&gt;&lt;/li&gt;&#xA;&lt;li&gt;Fuzz: a Stack Overflow exception could occur when resolving a crafted &lt;code&gt;&amp;lt;base href&amp;gt;&lt;/code&gt; URL, in the normalizing regex. 
&lt;small&gt;&lt;a href=&#34;https://github.com/jhy/jsoup/issues/2165&#34;&gt;#2165&lt;/a&gt;&lt;/small&gt;&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;hr/&gt;&#xA;&lt;p&gt;My sincere thanks to everyone who contributed patches, suggestions, and bug reports. If you have any suggestions for the next release, I would love to hear them; please get in touch via &lt;a href=&#34;https://github.com/jhy/jsoup/discussions&#34;&gt;jsoup discussions&lt;/a&gt;, or with me &lt;a href=&#34;https://jhedley.com/&#34;&gt;directly&lt;/a&gt;.&lt;/p&gt;&#xA;&lt;p&gt;You can also &lt;a rel=&#34;me&#34; href=&#34;https://tilde.zone/@jhy&#34;&gt;follow me&lt;/a&gt; (&lt;b&gt;&lt;a rel=&#34;me&#34; href=&#34;https://tilde.zone/@jhy&#34;&gt;@jhy@tilde.zone&lt;/a&gt;&lt;/b&gt;) on Mastodon / Fediverse to receive occasional notes about jsoup releases.&lt;/p&gt;&#xA;&#xA;      </content>
  </entry>
</feed>