Class StreamParser
- All Implemented Interfaces:
- Closeable, AutoCloseable
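A StreamParser provides a progressive parse of its input. As each Element is completed, it is emitted via a Stream or Iterator interface. Elements (or their children) may be removed from the DOM during the parse, e.g. to conserve memory. This provides a mechanism to parse an input document that would otherwise be too large to fit into memory, while still providing a DOM interface to the document and its elements.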
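The following is a minimal sketch of that memory-saving pattern, assuming jsoup 1.18.1 or later is on the classpath; `StreamAndPrune`, `drainItems`, and the list markup are illustrative names, not part of the API. Each completed `<li>` is read and then detached with `remove()`, so the partially built Document does not retain it.

```java
import java.util.List;
import org.jsoup.parser.Parser;
import org.jsoup.parser.StreamParser;

final class StreamAndPrune {
    /**
     * Appends the text of each <li> to out, removing every <li> from the DOM once it has been read.
     * Returns the number of <li> elements still attached to the Document afterwards.
     */
    static int drainItems(String html, List<String> out) {
        try (StreamParser parser = new StreamParser(Parser.htmlParser()).parse(html, "")) {
            parser.stream()
                .filter(el -> el.tagName().equals("li"))
                .forEach(el -> {
                    out.add(el.text()); // the emitted element is complete, so its text is final
                    el.remove();        // detach it so the Document does not keep it in memory
                });
            return parser.document().select("li").size();
        }
    }
}
```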
Additionally, the parser provides selectFirst(String query) and selectNext(String query) methods, which run the parser until a match is found, at which point the parse is suspended. The parse can be resumed via another select call (e.g. selectNext(String query)), or via the stream() or iterator() methods.
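A sketch of a select-driven loop, under the same assumptions (jsoup 1.18.1+, hypothetical `SelectLoop` class and markup). `selectFirst` suspends the parse after the first match, and each later `selectNext` resumes it from that point:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;
import org.jsoup.parser.StreamParser;

final class SelectLoop {
    /** Returns the <title> text followed by the text of each <h2>, in input order. */
    static List<String> titleAndHeadings(String html) {
        List<String> out = new ArrayList<>();
        try (StreamParser parser = new StreamParser(Parser.htmlParser()).parse(html, "")) {
            // Runs the parse only until the <title> element is complete, then suspends.
            Element title = parser.selectFirst("title");
            out.add(title != null ? title.text() : "");
            // Each selectNext call resumes the parse from where the previous select stopped.
            Element h2;
            while ((h2 = parser.selectNext("h2")) != null) {
                out.add(h2.text());
            }
        } catch (IOException e) {
            // selectFirst/selectNext report Reader failures as a checked IOException.
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```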
Once the input has been fully read, the input Reader will be closed. If the whole document does not need to be read, call stop() and close() instead.
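If only the start of a document matters, the parse can be abandoned after the first hit. In this sketch (hypothetical `EarlyStop` class, jsoup 1.18.1+ assumed), `stop()` is called explicitly and try-with-resources supplies the `close()` call:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;
import org.jsoup.parser.StreamParser;

final class EarlyStop {
    /** Returns the text of the first <h1>, or null if there is none, without parsing the rest of the input. */
    static String firstHeading(Reader input) {
        // try-with-resources calls close(), which releases the parser and closes the Reader.
        try (StreamParser parser = new StreamParser(Parser.htmlParser()).parse(input, "")) {
            Element h1 = parser.selectFirst("h1");
            parser.stop(); // no further elements will be emitted for this input
            return h1 != null ? h1.text() : null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```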
The document() method returns the Document being parsed into, which will be only partially complete until the input is fully consumed.
A StreamParser can be reused via a new call to parse(Reader, String), but it is not thread-safe for concurrent inputs. Use a separate parser in each thread.
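One way to follow the one-parser-per-thread advice while still reusing instances is a `ThreadLocal`. That choice, and the `PerThreadParsers` class, are assumptions of this sketch (jsoup 1.18.1+) rather than anything the API prescribes:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;
import org.jsoup.parser.StreamParser;

final class PerThreadParsers {
    // Each thread lazily gets its own StreamParser; instances are never shared across threads.
    private static final ThreadLocal<StreamParser> PARSER =
        ThreadLocal.withInitial(() -> new StreamParser(Parser.htmlParser()));

    /** Returns the first <h1> text of the given HTML, reusing this thread's parser. */
    static String firstHeading(String html) {
        StreamParser parser = PARSER.get();
        try {
            parser.parse(html, ""); // starts a fresh parse on the reused instance
            Element h1 = parser.selectFirst("h1");
            return h1 != null ? h1.text() : "";
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            parser.stop().close(); // release this input; the instance stays reusable via parse()
        }
    }
}
```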
If the parser was created via Connection.Response.streamParser(), or reads from another I/O-backed Reader, the iterator and stream consumers will throw an UncheckedIOException if the underlying Reader errors during a read.
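To illustrate this failure mode without a network connection, this sketch substitutes a hypothetical `FailingReader` that throws on every read; with `Connection.Response.streamParser()` the same exception type would be expected from a real socket error. The `BufferedReader` wrapper is a defensive assumption, not a documented requirement, and jsoup 1.18.1+ is assumed:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import org.jsoup.parser.Parser;
import org.jsoup.parser.StreamParser;

final class StreamErrorExample {
    /** Hypothetical Reader that fails on every read, standing in for a dropped connection. */
    static final class FailingReader extends Reader {
        @Override
        public int read(char[] buf, int off, int len) throws IOException {
            throw new IOException("simulated read failure");
        }

        @Override
        public void close() {
            // nothing to release
        }
    }

    /** Returns true if consuming the stream surfaces the Reader's failure as an UncheckedIOException. */
    static boolean streamFailsWithUncheckedIOException() {
        try (StreamParser parser = new StreamParser(Parser.htmlParser())
                .parse(new BufferedReader(new FailingReader()), "")) {
            parser.stream().forEach(el -> { /* process each element */ });
            return false;
        } catch (UncheckedIOException e) {
            IOException cause = e.getCause(); // the original I/O failure from the Reader
            return cause != null;
        }
    }
}
```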
The StreamParser API is currently in beta and may change in subsequent releases. Feedback on the feature and how you're using it is very welcome via the jsoup discussions.
- Since:
- 1.18.1
Constructor Summary
- StreamParser(Parser parser): Construct a new StreamParser, using the supplied base Parser.
Method Summary
- void close(): Closes the input and releases resources including the underlying parser and reader.
- Document complete(): Runs the parser until the input is fully read, and returns the completed Document.
- List<Node> completeFragment(): When initialized as a fragment parse, runs the parser until the input is fully read, and returns the completed fragment child nodes.
- Document document(): Get the current Document as it is being parsed.
- Element expectFirst(String query): Just like selectFirst(String), but if there is no match, throws an IllegalArgumentException.
- Element expectNext(String query): Just like selectNext(String), but if there is no match, throws an IllegalArgumentException.
- Iterator<Element> iterator(): Returns an Iterator of Elements, with the input being parsed as each element is consumed.
- StreamParser parse(Reader input, String baseUri): Provide the input for a Document parse.
- StreamParser parse(String input, String baseUri): Provide the input for a Document parse.
- StreamParser parseFragment(Reader input, @Nullable Element context, String baseUri): Provide the input for a fragment parse.
- StreamParser parseFragment(String input, @Nullable Element context, String baseUri): Provide the input for a fragment parse.
- @Nullable Element selectFirst(String query): Finds the first Element that matches the provided query.
- @Nullable Element selectFirst(Evaluator eval): Finds the first Element that matches the provided evaluator.
- @Nullable Element selectNext(String query): Finds the next Element that matches the provided query.
- @Nullable Element selectNext(Evaluator eval): Finds the next Element that matches the provided evaluator.
- StreamParser stop(): Flags that the parse should be stopped; the backing iterator will not return any more Elements.
- Stream<Element> stream(): Creates a Stream of Elements, with the input being parsed as each element is consumed.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Constructor Details
StreamParser(Parser parser)
Construct a new StreamParser, using the supplied base Parser.
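The base Parser determines how the input is tree-built. A sketch assuming `Parser.xmlParser()` is passed in to stream an XML feed (jsoup 1.18.1+; the `XmlItems` class and the feed markup are hypothetical):

```java
import org.jsoup.parser.Parser;
import org.jsoup.parser.StreamParser;

final class XmlItems {
    /** Counts <item> elements in an XML document, streaming with the XML tree builder. */
    static long countItems(String xml) {
        // Passing the XML parser (rather than the HTML parser) selects XML tree-building rules.
        try (StreamParser parser = new StreamParser(Parser.xmlParser()).parse(xml, "")) {
            return parser.stream()
                .filter(el -> el.tagName().equals("item"))
                .count();
        }
    }
}
```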
Method Details
parse(Reader input, String baseUri)
Provide the input for a Document parse. The input is not read until a consuming operation is called.
- Parameters:
- input - the input to be read
- baseUri - the URL of this input, for absolute link resolution
- Returns:
- this parser, for chaining
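A sketch of a Reader-based parse, assuming jsoup 1.18.1+; `ResolveLinks` and the sample URL are hypothetical. Supplying `baseUri` lets `absUrl` resolve a relative link:

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;
import org.jsoup.parser.StreamParser;

final class ResolveLinks {
    /** Returns the absolute URL of the first link with an href, resolved against baseUri. */
    static String firstLinkAbsolute(String html, String baseUri) {
        // The Reader is not consumed until selectFirst() starts pulling elements.
        try (StreamParser parser = new StreamParser(Parser.htmlParser())
                .parse(new StringReader(html), baseUri)) {
            Element link = parser.selectFirst("a[href]");
            return link != null ? link.absUrl("href") : null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```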
parse(String input, String baseUri)
Provide the input for a Document parse. The input is not read until a consuming operation is called.
- Parameters:
- input - the input to be read
- baseUri - the URL of this input, for absolute link resolution
- Returns:
- this parser
parseFragment(Reader input, @Nullable Element context, String baseUri)
Provide the input for a fragment parse. The input is not read until a consuming operation is called.
- Parameters:
- input - the input to be read
- context - the optional fragment context element
- baseUri - the URL of this input, for absolute link resolution
- Returns:
- this parser
- See Also:
parseFragment(String input, @Nullable Element context, String baseUri)
Provide the input for a fragment parse. The input is not read until a consuming operation is called.
- Parameters:
- input - the input to be read
- context - the optional fragment context element
- baseUri - the URL of this input, for absolute link resolution
- Returns:
- this parser
- See Also:
stream()
Creates a Stream of Elements, with the input being parsed as each element is consumed. Each Element returned will be complete (that is, all of its children will be included, and if it has a next sibling, that (empty) sibling will exist at Element.nextElementSibling()). Elements are emitted in the order in which they are closed, so child elements are returned before their parents. The stream will start from the current position of the backing iterator and the parse.
When consuming the stream, if the Reader that the Parser is reading throws an I/O exception (for example a SocketTimeoutException), that will be emitted as an UncheckedIOException.
- Returns:
- a stream of Element objects
- Throws:
- UncheckedIOException - if the underlying Reader throws an exception during a read (in stream consuming methods)
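The emission order can be observed directly. This sketch (jsoup 1.18.1+ assumed; `EmitOrder` is hypothetical) collects the tag name of every element the stream yields:

```java
import java.util.List;
import java.util.stream.Collectors;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;
import org.jsoup.parser.StreamParser;

final class EmitOrder {
    /** Returns the tag names of all emitted elements, in emission order. */
    static List<String> emittedTags(String html) {
        try (StreamParser parser = new StreamParser(Parser.htmlParser()).parse(html, "")) {
            return parser.stream()
                .map(Element::tagName)
                .collect(Collectors.toList());
        }
    }
}
```

For `<div><p>One</p><p>Two</p></div>`, both `p` elements should be listed before `div`, and `div` before `body`; the list will also contain enclosing structural elements such as `body` and `html`.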
iterator()
Returns an Iterator of Elements, with the input being parsed as each element is consumed. Each Element returned will be complete (that is, all of its children will be included, and if it has a next sibling, that (empty) sibling will exist at Element.nextElementSibling()). Elements are emitted in the order in which they are closed, so child elements are returned before their parents. The iterator will start from the current position of the parse.
The iterator is backed by this StreamParser and the resources it holds.
- Returns:
- an iterator of Element objects
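A sketch of pulling elements through the iterator rather than a Stream (jsoup 1.18.1+ assumed; `IteratorWalk` and the table markup are hypothetical):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;
import org.jsoup.parser.StreamParser;

final class IteratorWalk {
    /** Returns the text of each <td>, pulling elements from the parser one at a time. */
    static List<String> cellTexts(String html) {
        List<String> cells = new ArrayList<>();
        try (StreamParser parser = new StreamParser(Parser.htmlParser()).parse(html, "")) {
            Iterator<Element> it = parser.iterator();
            while (it.hasNext()) { // hasNext() advances the parse as far as needed
                Element el = it.next();
                if (el.tagName().equals("td")) {
                    cells.add(el.text());
                }
            }
        }
        return cells;
    }
}
```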
stop()
Flags that the parse should be stopped; the backing iterator will not return any more Elements.
- Returns:
- this parser
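A small sketch of `stop()` (jsoup 1.18.1+ assumed; `StopIteration` is hypothetical): after one element is consumed and the parse is stopped, the shared iterator is expected to report no further elements.

```java
import java.util.Iterator;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;
import org.jsoup.parser.StreamParser;

final class StopIteration {
    /** Consumes one element, stops the parse, and reports whether the iterator still has elements. */
    static boolean hasNextAfterStop(String html) {
        try (StreamParser parser = new StreamParser(Parser.htmlParser()).parse(html, "")) {
            Iterator<Element> it = parser.iterator();
            it.next();           // take the first completed element
            parser.stop();       // flag the parse as stopped
            return it.hasNext(); // the backing iterator should now be exhausted
        }
    }
}
```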
close()
public void close()
Closes the input and releases resources including the underlying parser and reader. The parser will also be closed when the input is fully read.
The parser can be reused with another call to parse(Reader, String).
document()
Get the current Document as it is being parsed. It will be only partially complete until the input is fully read. Structural changes (e.g. insert, remove) may be made to the Document contents.
- Returns:
- the (partial) Document
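Because the Document is usable mid-parse, head metadata can be read before the body has been consumed. A sketch, assuming jsoup 1.18.1+ and a hypothetical `PartialDocument` helper:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;
import org.jsoup.parser.StreamParser;

final class PartialDocument {
    /** Parses only until the <title> is complete, then reads the title from the partial Document. */
    static String titleOf(String html) {
        try (StreamParser parser = new StreamParser(Parser.htmlParser()).parse(html, "")) {
            parser.selectFirst("title");      // the parse is suspended once <title> is complete
            Document doc = parser.document(); // later body content may not have been parsed yet
            return doc.title();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```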
complete()
Runs the parser until the input is fully read, and returns the completed Document.
- Returns:
- the completed Document
- Throws:
- IOException - if an I/O error occurs
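`complete()` can also follow a partial, select-driven parse. In this sketch (jsoup 1.18.1+ assumed; `FinishParse` is hypothetical), the parse first stops at one paragraph and is then run to the end:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;
import org.jsoup.parser.StreamParser;

final class FinishParse {
    /** Stops at the first <p>, then completes the parse and counts every <p> in the Document. */
    static int paragraphCount(String html) {
        try (StreamParser parser = new StreamParser(Parser.htmlParser()).parse(html, "")) {
            parser.selectFirst("p");          // partial parse: suspended after the first match
            Document doc = parser.complete(); // reads the remainder of the input
            return doc.select("p").size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```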
completeFragment()
When initialized as a fragment parse, runs the parser until the input is fully read, and returns the completed fragment child nodes.
- Returns:
- the completed child nodes
- Throws:
- IOException - if an I/O error occurs
- See Also:
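A sketch of a fragment parse with an explicit context element, assuming jsoup 1.18.1+; `FragmentParse` and the choice of a `div` context are illustrative:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.parser.Parser;
import org.jsoup.parser.StreamParser;

final class FragmentParse {
    /** Parses an HTML fragment as if it were the content of a <div>, returning its top-level nodes. */
    static List<Node> topLevelNodes(String fragment) {
        Element context = new Element("div"); // optional context; null is also accepted
        try (StreamParser parser = new StreamParser(Parser.htmlParser())
                .parseFragment(fragment, context, "")) {
            return parser.completeFragment();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```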
selectFirst(String query)
Finds the first Element that matches the provided query. If the parsed Document does not already have a match, the input will be parsed until the first match is found, or the input is completely read.
- Parameters:
- query - the Selector query
- Returns:
- the first matching Element, or null if there's no match
- Throws:
- IOException - if an I/O error occurs
expectFirst(String query)
Just like selectFirst(String), but if there is no match, throws an IllegalArgumentException. This is useful if you want to simply abort processing on a failed match.
- Parameters:
- query - the Selector query
- Returns:
- the first matching element
- Throws:
- IllegalArgumentException - if no match is found
- IOException - if an I/O error occurs
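A sketch of using `expectFirst` to abort on missing required content (jsoup 1.18.1+ assumed; `RequiredElement` is hypothetical and the fallback message is arbitrary):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import org.jsoup.parser.Parser;
import org.jsoup.parser.StreamParser;

final class RequiredElement {
    /** Returns the text of the required <main> element, or a fallback message if the input lacks one. */
    static String mainText(String html) {
        try (StreamParser parser = new StreamParser(Parser.htmlParser()).parse(html, "")) {
            return parser.expectFirst("main").text();
        } catch (IllegalArgumentException e) {
            return "missing <main>"; // expectFirst throws when no element matches
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```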
selectFirst(Evaluator eval)
Finds the first Element that matches the provided evaluator. If the parsed Document does not already have a match, the input will be parsed until the first match is found, or the input is completely read.
- Parameters:
- eval - the Selector evaluator
- Returns:
- the first matching Element, or null if there's no match
- Throws:
- IOException - if an I/O error occurs
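The `Evaluator` overload allows a query to be compiled once and reused across parses. This sketch assumes `QueryParser.parse(String)` from `org.jsoup.select` is used to build the Evaluator (jsoup 1.18.1+); `ReusableEvaluator` and the markup are hypothetical:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;
import org.jsoup.parser.StreamParser;
import org.jsoup.select.Evaluator;
import org.jsoup.select.QueryParser;

final class ReusableEvaluator {
    // Compile the CSS query once; the resulting Evaluator is reused for every parse.
    private static final Evaluator PRICE = QueryParser.parse("span.price");

    /** Returns the text of the first span.price element, or null if there is none. */
    static String firstPrice(String html) {
        try (StreamParser parser = new StreamParser(Parser.htmlParser()).parse(html, "")) {
            Element price = parser.selectFirst(PRICE);
            return price != null ? price.text() : null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```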
selectNext(String query)
Finds the next Element that matches the provided query. The input will be parsed until the next match is found, or the input is completely read.
- Parameters:
- query - the Selector query
- Returns:
- the next matching Element, or null if there's no match
- Throws:
- IOException - if an I/O error occurs
expectNext(String query)
Just like selectNext(String), but if there is no match, throws an IllegalArgumentException. This is useful if you want to simply abort processing on a failed match.
- Parameters:
- query - the Selector query
- Returns:
- the next matching element
- Throws:
- IllegalArgumentException - if no match is found
- IOException - if an I/O error occurs
selectNext(Evaluator eval)
Finds the next Element that matches the provided evaluator. The input will be parsed until the next match is found, or the input is completely read.
- Parameters:
- eval - the Selector evaluator
- Returns:
- the next matching Element, or null if there's no match
- Throws:
- IOException - if an I/O error occurs