Package org.jsoup.parser
Class Parser
java.lang.Object
org.jsoup.parser.Parser
public class Parser extends Object
Parses HTML or XML into a Document. Generally, it is simpler to use one of the parse methods in Jsoup.
Note that a Parser instance object is not threadsafe. To reuse a Parser configuration in a multi-threaded environment, use newInstance() to make copies.
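The thread-safety note above can be sketched as follows. This is a minimal illustration, assuming one shared Parser that only holds configuration and is copied with newInstance() for each task; the class name `ParserPerTask`, the method `parseTitles`, and the pool size are hypothetical and not part of jsoup.

```java
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParserPerTask {
    // Holds configuration only; this instance is never used to parse.
    private static final Parser TEMPLATE = Parser.htmlParser();

    // Hypothetical helper: parses each page on a worker thread with its own Parser copy.
    static List<String> parseTitles(List<String> pages) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String html : pages) {
                // Copy on the submitting thread so each task owns an independent parser.
                Parser parser = TEMPLATE.newInstance();
                Future<String> result = pool.submit(() -> {
                    Document doc = parser.parseInput(html, "");
                    return doc.title();
                });
                pending.add(result);
            }
            List<String> titles = new ArrayList<>();
            for (Future<String> result : pending) {
                titles.add(result.get()); // results are collected in submission order
            }
            return titles;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("parsing task failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(parseTitles(List.of("<title>One</title>", "<title>Two</title>")));
    }
}
```

The copy is made before the task is submitted, so no Parser instance is ever touched by more than one thread.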
-
Field Summary
Modifier and Type | Field | Description
static final String
static final String
static final String
static final String
-
Constructor Summary
Constructor | Description
Parser(org.jsoup.parser.TreeBuilder treeBuilder) | Create a new Parser, using the specified TreeBuilder.
-
Method Summary
Modifier and Type | Method | Description
String | defaultNamespace() |
ParseErrorList | getErrors() | Retrieve the parse errors, if any, from the last parse.
org.jsoup.parser.TreeBuilder | getTreeBuilder() | Get the TreeBuilder currently in use.
static Parser | htmlParser() | Create a new HTML parser.
boolean | isContentForTagData(String normalName) | (An internal method, visible for Element.
boolean | isTrackErrors() | Check if parse error tracking is enabled.
boolean | isTrackPosition() | Test if position tracking is enabled.
Parser | newInstance() | Creates a new Parser as a deep copy of this; including initializing a new TreeBuilder.
static Document | parse(String html, String baseUri) | Parse HTML into a Document.
static Document | parseBodyFragment(String bodyHtml, String baseUri) | Parse a fragment of HTML into the body of a Document.
static List<Node> | parseFragment(String fragmentHtml, Element context, String baseUri) | Parse a fragment of HTML into a list of nodes.
static List<Node> | parseFragment(String fragmentHtml, Element context, String baseUri, ParseErrorList errorList) | Parse a fragment of HTML into a list of nodes.
List<Node> | parseFragmentInput(String fragment, Element context, String baseUri) |
Document | parseInput(Reader inputHtml, String baseUri) |
Document | parseInput(String html, String baseUri) |
static List<Node> | parseXmlFragment(String fragmentXml, String baseUri) | Parse a fragment of XML into a list of nodes.
ParseSettings | settings() | Gets the current ParseSettings for this Parser.
Parser | settings(ParseSettings settings) | Update the ParseSettings of this Parser, to control the case sensitivity of tags and attributes.
Parser | setTrackErrors(int maxErrors) | Enable or disable parse error tracking for the next parse.
Parser | setTrackPosition(boolean trackPosition) | Enable or disable source position tracking.
Parser | setTreeBuilder(org.jsoup.parser.TreeBuilder treeBuilder) | Update the TreeBuilder used when parsing content.
static String | unescapeEntities(String string, boolean inAttribute) | Utility method to unescape HTML entities from a string.
static Parser | xmlParser() | Create a new XML parser.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Field Details
-
Constructor Details
-
Method Details
-
newInstance
public Parser newInstance()
Creates a new Parser as a deep copy of this, including initializing a new TreeBuilder. Allows independent (multi-threaded) use.
Returns:
- a copied parser
-
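parseInput
public Document parseInput(String html, String baseUri)
-
parseInput
public Document parseInput(Reader inputHtml, String baseUri)
-
parseFragmentInput
public List<Node> parseFragmentInput(String fragment, Element context, String baseUri)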
-
getTreeBuilder
public org.jsoup.parser.TreeBuilder getTreeBuilder()
Get the TreeBuilder currently in use.
Returns:
- current TreeBuilder.
-
setTreeBuilder
public Parser setTreeBuilder(org.jsoup.parser.TreeBuilder treeBuilder)
Update the TreeBuilder used when parsing content.
Parameters:
- treeBuilder - new TreeBuilder
Returns:
- this, for chaining
-
isTrackErrors
public boolean isTrackErrors()
Check if parse error tracking is enabled.
Returns:
- current track error state.
-
setTrackErrors
public Parser setTrackErrors(int maxErrors)
Enable or disable parse error tracking for the next parse.
Parameters:
- maxErrors - the maximum number of errors to track. Set to 0 to disable.
Returns:
- this, for chaining
-
getErrors
public ParseErrorList getErrors()
Retrieve the parse errors, if any, from the last parse.
Returns:
- list of parse errors, up to the size of the maximum errors tracked.
See Also:
- setTrackErrors(int)
-
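As a rough sketch of how setTrackErrors, isTrackErrors, and getErrors fit together: tracking is switched on before the parse, and the errors are read from the same Parser instance afterwards. The class name `TrackErrorsExample` and the helper `countErrors` are hypothetical; `ParseError.getPosition()` and `getErrorMessage()` are taken from jsoup's ParseError class rather than from this page.

```java
import org.jsoup.parser.ParseError;
import org.jsoup.parser.Parser;

public class TrackErrorsExample {
    // Hypothetical helper: parses the input and reports how many errors were recorded.
    static int countErrors(String html, int maxErrors) {
        // A maxErrors of 0 leaves error tracking disabled.
        Parser parser = Parser.htmlParser().setTrackErrors(maxErrors);
        parser.parseInput(html, "");

        // Errors belong to the last parse made with this Parser instance.
        for (ParseError error : parser.getErrors()) {
            System.out.println(error.getPosition() + ": " + error.getErrorMessage());
        }
        return parser.getErrors().size();
    }

    public static void main(String[] args) {
        // The stray </div> has no matching start tag.
        System.out.println(countErrors("<p>One</p></div>", 5));
    }
}
```

Because the list is capped at maxErrors, a small limit is usually enough to answer "was this input well formed?" without collecting every problem in a large document.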
isTrackPosition
public boolean isTrackPosition()
Test if position tracking is enabled. If it is, Nodes will have a Position to track where in the original input source they were created from. By default, tracking is not enabled.
Returns:
- current track position setting
-
setTrackPosition
public Parser setTrackPosition(boolean trackPosition)
Enable or disable source position tracking. If enabled, Nodes will have a Position to track where in the original input source they were created from.
Parameters:
- trackPosition - position tracking setting; true to enable
Returns:
- this Parser, for chaining
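A small sketch of position tracking, under the assumption that the tracked position is read back through `Node.sourceRange()` and the `org.jsoup.nodes.Range` class, as in recent jsoup releases; neither is named on this page, so treat them as assumptions to check against your jsoup version. The class and method names are hypothetical.

```java
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Range;
import org.jsoup.parser.Parser;

public class TrackPositionExample {
    // Hypothetical helper: returns the source range of the first <p> in the input.
    static Range rangeOfFirstParagraph(String html, boolean track) {
        Parser parser = Parser.htmlParser().setTrackPosition(track);
        Document doc = parser.parseInput(html, "");
        Element p = doc.selectFirst("p");
        return p.sourceRange();
    }

    public static void main(String[] args) {
        Range range = rangeOfFirstParagraph("<div>\n  <p>Hello</p>\n</div>", true);
        if (range.isTracked()) {
            System.out.println("line " + range.start().lineNumber()
                    + ", column " + range.start().columnNumber());
        }
    }
}
```

With tracking left at its default (disabled), the returned range reports that it is not tracked, so callers should check `isTracked()` before using the positions.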
-
settings
public Parser settings(ParseSettings settings)
Update the ParseSettings of this Parser, to control the case sensitivity of tags and attributes.
Parameters:
- settings - the new settings
Returns:
- this Parser
-
settings
public ParseSettings settings()
Gets the current ParseSettings for this Parser.
Returns:
- current ParseSettings
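To illustrate the effect of the settings on case sensitivity, the sketch below parses the same input twice. It assumes the predefined `ParseSettings.preserveCase` and `ParseSettings.htmlDefault` constants from jsoup's ParseSettings class (not listed on this page); the class and method names are hypothetical.

```java
import org.jsoup.nodes.Document;
import org.jsoup.parser.ParseSettings;
import org.jsoup.parser.Parser;

public class ParseSettingsExample {
    // Hypothetical helper: returns the tag name of the span as stored after parsing.
    static String spanTagName(ParseSettings settings) {
        Parser parser = Parser.htmlParser().settings(settings);
        Document doc = parser.parseInput("<div id=1><SPAN ID=2>text</SPAN></div>", "");
        // body > div > span
        return doc.body().child(0).child(0).tagName();
    }

    public static void main(String[] args) {
        System.out.println(spanTagName(ParseSettings.htmlDefault));  // tag names are lower-cased
        System.out.println(spanTagName(ParseSettings.preserveCase)); // tag names keep the input case
    }
}
```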
-
isContentForTagData
public boolean isContentForTagData(String normalName)
(An internal method, visible for Element. For HTML parse, signals that script and style text should be treated as Data Nodes).
-
defaultNamespace
public String defaultNamespace()
-
parse
public static Document parse(String html, String baseUri)
Parse HTML into a Document.
Parameters:
- html - HTML to parse
- baseUri - base URI of document (i.e. original fetch location), for resolving relative URLs.
Returns:
- parsed Document
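A short sketch of what the baseUri parameter does in practice: relative links in the parsed Document can be resolved against it. The URL, class name, and helper name are illustrative only; `absUrl` is the accessor on jsoup nodes for resolved attribute values.

```java
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

public class BaseUriExample {
    // Hypothetical helper: resolves the href of the first link against the base URI.
    static String resolveFirstLink(String html, String baseUri) {
        Document doc = Parser.parse(html, baseUri);
        Element link = doc.selectFirst("a");
        return link.absUrl("href");
    }

    public static void main(String[] args) {
        String html = "<a href='/docs/page.html'>Docs</a>";
        System.out.println(resolveFirstLink(html, "https://example.com/start/"));
        // prints https://example.com/docs/page.html
    }
}
```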
-
parseFragment
public static List<Node> parseFragment(String fragmentHtml, Element context, String baseUri)
Parse a fragment of HTML into a list of nodes. The context element, if supplied, supplies parsing context.
Parameters:
- fragmentHtml - the fragment of HTML to parse
- context - (optional) the element that this HTML fragment is being parsed for (i.e. for inner HTML). This provides stack context (for implicit element creation).
- baseUri - base URI of document (i.e. original fetch location), for resolving relative URLs.
Returns:
- list of nodes parsed from the input HTML. Note that the context element, if supplied, is not modified.
-
parseFragment
public static List<Node> parseFragment(String fragmentHtml, Element context, String baseUri, ParseErrorList errorList)
Parse a fragment of HTML into a list of nodes. The context element, if supplied, supplies parsing context.
Parameters:
- fragmentHtml - the fragment of HTML to parse
- context - (optional) the element that this HTML fragment is being parsed for (i.e. for inner HTML). This provides stack context (for implicit element creation).
- baseUri - base URI of document (i.e. original fetch location), for resolving relative URLs.
- errorList - list to add errors to
Returns:
- list of nodes parsed from the input HTML. Note that the context element, if supplied, is not modified.
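The role of the context element can be sketched with table rows, which only make sense inside a table section. This is an illustrative example, assuming a tbody element taken from a small parsed document as the context; the class and method names are hypothetical.

```java
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.parser.Parser;

import java.util.List;

public class FragmentContextExample {
    // Hypothetical helper: parses row markup as if it were the inner HTML of a <tbody>.
    static List<Node> parseRows(String rowsHtml) {
        Document doc = Parser.parse("<table><tbody></tbody></table>", "");
        Element tbody = doc.selectFirst("tbody");
        // The context supplies parser state only; tbody itself is left unchanged.
        return Parser.parseFragment(rowsHtml, tbody, "");
    }

    public static void main(String[] args) {
        for (Node node : parseRows("<tr><td>1</td></tr><tr><td>2</td></tr>")) {
            System.out.println(node.nodeName() + ": " + node.outerHtml());
        }
    }
}
```

The returned nodes are detached from the context, so the caller decides where (or whether) to attach them.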
-
parseXmlFragment
public static List<Node> parseXmlFragment(String fragmentXml, String baseUri)
Parse a fragment of XML into a list of nodes.
Parameters:
- fragmentXml - the fragment of XML to parse
- baseUri - base URI of document (i.e. original fetch location), for resolving relative URLs.
Returns:
- list of nodes parsed from the input XML.
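A minimal sketch of parsing an XML fragment with more than one top-level element; the element names and the helper `topLevelNames` are made up for illustration.

```java
import org.jsoup.nodes.Node;
import org.jsoup.parser.Parser;

import java.util.ArrayList;
import java.util.List;

public class XmlFragmentExample {
    // Hypothetical helper: lists the names of the top-level nodes in an XML fragment.
    static List<String> topLevelNames(String fragmentXml) {
        List<Node> nodes = Parser.parseXmlFragment(fragmentXml, "");
        List<String> names = new ArrayList<>();
        for (Node node : nodes) {
            names.add(node.nodeName());
        }
        return names;
    }

    public static void main(String[] args) {
        // No html/head/body wrapper is added, and tag case is kept as written.
        System.out.println(topLevelNames("<Item id=\"1\">A</Item><note>B</note>"));
    }
}
```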
-
parseBodyFragment
public static Document parseBodyFragment(String bodyHtml, String baseUri)
Parse a fragment of HTML into the body of a Document.
Parameters:
- bodyHtml - fragment of HTML
- baseUri - base URI of document (i.e. original fetch location), for resolving relative URLs.
Returns:
- Document, with empty head, and HTML parsed into body
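For comparison with the fragment methods that return node lists, here is a sketch of parseBodyFragment, which wraps the fragment in a full Document shell. The input string and the names `BodyFragmentExample` and `wrap` are illustrative.

```java
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

public class BodyFragmentExample {
    // Hypothetical helper: wraps a snippet of markup in a complete Document.
    static Document wrap(String bodyHtml) {
        return Parser.parseBodyFragment(bodyHtml, "https://example.com/");
    }

    public static void main(String[] args) {
        Document doc = wrap("<p>Hello <b>world</b></p>");
        System.out.println(doc.head().childNodeSize()); // the head is created but left empty
        System.out.println(doc.body().html());          // the fragment ends up inside body
    }
}
```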
-
unescapeEntities
public static String unescapeEntities(String string, boolean inAttribute)
Utility method to unescape HTML entities from a string.
Parameters:
- string - HTML escaped string
- inAttribute - if the string is to be unescaped in strict mode (as attributes are)
Returns:
- an unescaped string
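A brief sketch of unescapeEntities on ordinary text content. The helper name `decodeText` is hypothetical; the second call only shows how strict (attribute) mode is requested and makes no claim about its exact output.

```java
import org.jsoup.parser.Parser;

public class UnescapeExample {
    // Hypothetical helper: decodes entities the way text content is decoded.
    static String decodeText(String escaped) {
        return Parser.unescapeEntities(escaped, false);
    }

    public static void main(String[] args) {
        System.out.println(decodeText("&lt;b&gt; Tom &amp; Jerry")); // prints <b> Tom & Jerry

        // Strict mode, as used for attribute values such as URLs with query strings.
        System.out.println(Parser.unescapeEntities("?a=1&amp;b=2", true));
    }
}
```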
-
htmlParser
public static Parser htmlParser()
Create a new HTML parser. This parser treats input as HTML5, and enforces the creation of a normalised document, based on a knowledge of the semantics of the incoming tags.
Returns:
- a new HTML parser.
-
xmlParser
public static Parser xmlParser()
Create a new XML parser. This parser assumes no knowledge of the incoming tags and does not treat it as HTML, rather creates a simple tree directly from the input.
Returns:
- a new simple XML parser.
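The difference between the two factory methods can be sketched by feeding both parsers the same input. The HTML parser normalises the result into a document with a body, while the XML parser builds the tree directly from the tags it sees. The class name and the helper `hasBody` are hypothetical.

```java
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

public class HtmlVersusXmlExample {
    // Hypothetical helper: reports whether the parsed tree contains a <body> element.
    static boolean hasBody(Parser parser, String input) {
        Document doc = parser.parseInput(input, "");
        return !doc.select("body").isEmpty();
    }

    public static void main(String[] args) {
        String input = "<p>One<p>Two";
        System.out.println(hasBody(Parser.htmlParser(), input)); // normalised HTML document
        System.out.println(hasBody(Parser.xmlParser(), input));  // plain tree, no added structure
        System.out.println(Parser.xmlParser().parseInput(input, "").outerHtml());
    }
}
```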
-