Package org.jsoup.helper
Class W3CDom
java.lang.Object
org.jsoup.helper.W3CDom
public class W3CDom extends Object
Helper class to transform a jsoup Document to an org.w3c.dom.Document, for integration with toolsets that use the W3C DOM.
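A minimal usage sketch, assuming jsoup is on the classpath: it parses an HTML string, converts the result with the static convert method, and serializes it with asString and the OutputHtml() defaults. The class name ConvertSketch, its helper methods, and the sample markup are illustrative only and are not part of the library.

```java
import org.jsoup.Jsoup;
import org.jsoup.helper.W3CDom;

// Hypothetical helper class for illustration; not part of jsoup.
final class ConvertSketch {
    /** Parses HTML with jsoup and returns the equivalent W3C DOM document. */
    static org.w3c.dom.Document toW3c(String html) {
        org.jsoup.nodes.Document jsoupDoc = Jsoup.parse(html);
        return W3CDom.convert(jsoupDoc);
    }

    /** Serializes a W3C document using the canned HTML output settings. */
    static String toHtmlString(org.w3c.dom.Document w3cDoc) {
        return W3CDom.asString(w3cDoc, W3CDom.OutputHtml());
    }

    public static void main(String[] args) {
        org.w3c.dom.Document w3cDoc = toW3c("<p>Hello</p>");
        // Name of the root element of the converted document.
        System.out.println(w3cDoc.getDocumentElement().getNodeName());
        System.out.println(toHtmlString(w3cDoc));
    }
}
```

Both types are named Document, so fully qualifying at least one of them (as above) avoids an import clash.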
Nested Class Summary
protected static class W3CDom.W3CBuilder - Implements the conversion by walking the input.
Field Summary
protected DocumentBuilderFactory factory
static final String SourceProperty - For W3C Documents created by this class, this property is set on each node to link back to the original jsoup node.
static final String XPathFactoryProperty - To get support for XPath versions > 1, set this property to the class name of an alternate XPathFactory implementation.
Constructor Summary
-
Method Summary
String asString(Document doc) - Serialize a W3C document to a String.
static String asString(Document doc, Map<String,String> properties) - Serialize a W3C document to a String.
Node contextNode(Document wDoc) - For a Document created by fromJsoup(org.jsoup.nodes.Element), retrieves the W3C context node.
static Document convert(org.jsoup.nodes.Document in) - Converts a jsoup DOM to a W3C DOM.
void convert(org.jsoup.nodes.Document in, Document out) - Converts a jsoup document into the provided W3C Document.
void convert(org.jsoup.nodes.Element in, Document out) - Converts a jsoup element into the provided W3C Document.
Document fromJsoup(org.jsoup.nodes.Document in) - Convert a jsoup Document to a W3C Document.
Document fromJsoup(org.jsoup.nodes.Element in) - Convert a jsoup DOM to a W3C Document.
boolean namespaceAware() - Returns whether this W3C DOM is namespace aware.
W3CDom namespaceAware(boolean namespaceAware) - Update the namespace aware setting.
static HashMap<String,String> OutputHtml() - Canned default for HTML output.
static HashMap<String,String> OutputXml() - Canned default for XML output.
NodeList selectXpath(String xpath, Document doc) - Evaluate an XPath query against the supplied document, and return the results.
NodeList selectXpath(String xpath, Node contextNode) - Evaluate an XPath query against the supplied context node, and return the results.
<T extends org.jsoup.nodes.Node> List<T> sourceNodes(NodeList nodeList, Class<T> nodeType) - Retrieves the original jsoup DOM nodes from a NodeList created by this converter.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Field Details
SourceProperty
public static final String SourceProperty
For W3C Documents created by this class, this property is set on each node to link back to the original jsoup node.
XPathFactoryProperty
public static final String XPathFactoryProperty
To get support for XPath versions > 1, set this property to the class name of an alternate XPathFactory implementation (for example, net.sf.saxon.xpath.XPathFactoryImpl).
factory
protected DocumentBuilderFactory factory
Constructor Details
-
Method Details
namespaceAware
public boolean namespaceAware()
Returns whether this W3C DOM is namespace aware. By default, this will be true, but it is disabled for simplicity when using XPath selectors in Element.selectXpath(String).
Returns:
the current namespace aware setting.
namespaceAware
public W3CDom namespaceAware(boolean namespaceAware)
Update the namespace aware setting. This impacts the factory that is used to create W3C nodes from jsoup nodes.
For HTML documents, this controls whether the document will be in the default http://www.w3.org/1999/xhtml namespace if otherwise unset.
Parameters:
namespaceAware - the updated setting
Returns:
this W3CDom, for chaining.
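To illustrate why namespace awareness is switched off for simple XPath selectors, here is a sketch that uses only the JDK's own parser and XPath engine, not jsoup. It assumes a small XHTML string and counts matches for the same query with namespace awareness on and off. The class name NamespaceXPathSketch and the sample markup are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; it shows the general W3C DOM behaviour, not jsoup internals.
final class NamespaceXPathSketch {
    static final String XHTML =
        "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body><p>One</p><p>Two</p></body></html>";

    /** Parses XHTML with the given namespace setting and counts XPath matches. */
    static int count(String xpath, boolean namespaceAware) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(namespaceAware);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XHTML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(xpath, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Unprefixed names in XPath 1.0 only match elements in no namespace.
        System.out.println(count("//p", true));
        System.out.println(count("//p", false));
        // A namespace-agnostic query works in the namespace-aware case.
        System.out.println(count("//*[local-name()='p']", true));
    }
}
```

With namespace awareness on, the unprefixed query finds nothing because the elements live in the XHTML namespace. Turning it off lets plain element names match.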
convert
public static Document convert(org.jsoup.nodes.Document in)
Converts a jsoup DOM to a W3C DOM.
Parameters:
in - jsoup Document
Returns:
W3C Document
asString
public static String asString(Document doc, Map<String,String> properties)
Serialize a W3C document to a String. Provide properties to define output settings, including whether the output is HTML or XML. If you don't provide the properties (null), the output will be auto-detected based on the content of the document.
Parameters:
doc - Document
properties - (optional/nullable) the output properties to use. See Transformer.setOutputProperties(Properties) and OutputKeys
Returns:
Document as string
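The output properties are the standard JAXP keys defined in OutputKeys. As an illustration of that JDK mechanism only (a sketch, not jsoup's implementation), the following serializes a small W3C document with a Transformer configured through setOutputProperties. The class name SerializeSketch, its helper methods, and the sample document are hypothetical.

```java
import java.io.StringWriter;
import java.util.Properties;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class using only JDK APIs.
final class SerializeSketch {
    /** Builds a tiny W3C document: <root><p>hi</p></root>. */
    static Document sample() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("root");
            Element p = doc.createElement("p");
            p.setTextContent("hi");
            root.appendChild(p);
            doc.appendChild(root);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Output settings: XML method, no XML declaration. */
    static Properties xmlProps() {
        Properties props = new Properties();
        props.setProperty(OutputKeys.METHOD, "xml");
        props.setProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        return props;
    }

    /** Serializes the document with the given output properties. */
    static String serialize(Document doc, Properties props) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperties(props);
            StringWriter writer = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(writer));
            return writer.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize(sample(), xmlProps()));
    }
}
```

Switching OutputKeys.METHOD to "html" selects HTML serialization instead.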
OutputHtml
public static HashMap<String,String> OutputHtml()
Canned default for HTML output.
OutputXml
public static HashMap<String,String> OutputXml()
Canned default for XML output.
fromJsoup
public Document fromJsoup(org.jsoup.nodes.Document in)
Convert a jsoup Document to a W3C Document. The created nodes will link back to the original jsoup nodes in the user property SourceProperty (but after conversion, changes on one side will not flow to the other).
Parameters:
in - jsoup doc
Returns:
a W3C DOM Document representing the jsoup Document or Element contents.
fromJsoup
public Document fromJsoup(org.jsoup.nodes.Element in)
Convert a jsoup DOM to a W3C Document. The created nodes will link back to the original jsoup nodes in the user property SourceProperty (but after conversion, changes on one side will not flow to the other). The input Element is used as a context node, but the whole surrounding jsoup Document is converted. (If you just want a subtree converted, use convert(org.jsoup.nodes.Element, Document).)
Parameters:
in - jsoup element or doc
Returns:
a W3C DOM Document representing the jsoup Document or Element contents.
convert
public void convert(org.jsoup.nodes.Document in, Document out)
Converts a jsoup document into the provided W3C Document. If required, you can set options on the output document before converting.
Parameters:
in - jsoup doc
out - W3C doc
convert
public void convert(org.jsoup.nodes.Element in, Document out)
Converts a jsoup element into the provided W3C Document. If required, you can set options on the output document before converting.
Parameters:
in - jsoup element
out - W3C doc
selectXpath
public NodeList selectXpath(String xpath, Document doc)
Evaluate an XPath query against the supplied document, and return the results.
Parameters:
xpath - an XPath query
doc - the document to evaluate against
Returns:
the matching nodes
selectXpath
public NodeList selectXpath(String xpath, Node contextNode)
Evaluate an XPath query against the supplied context node, and return the results.
Parameters:
xpath - an XPath query
contextNode - the context node to evaluate against
Returns:
the matching nodes
sourceNodes
public <T extends org.jsoup.nodes.Node> List<T> sourceNodes(NodeList nodeList, Class<T> nodeType)
Retrieves the original jsoup DOM nodes from a NodeList created by this converter.
Type Parameters:
T - node type
Parameters:
nodeList - the W3C nodes to get the original jsoup nodes from
nodeType - the jsoup node type to retrieve (e.g. Element, DataNode, etc.)
Returns:
a list of the original nodes
contextNode
public Node contextNode(Document wDoc)
For a Document created by fromJsoup(org.jsoup.nodes.Element), retrieves the W3C context node.
Parameters:
wDoc - Document created by this class
Returns:
the W3C Node corresponding to the jsoup Element that was used as the creating context.
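The methods fromJsoup(org.jsoup.nodes.Element), contextNode, selectXpath and sourceNodes can be combined to run an XPath query relative to a jsoup element and get jsoup elements back. The following is a sketch under that assumption, with jsoup on the classpath. The class name ContextXPathSketch, the helper selectRelative, and the sample markup are hypothetical.

```java
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.helper.W3CDom;
import org.jsoup.nodes.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper class for illustration; not part of jsoup.
final class ContextXPathSketch {
    /** Evaluates an XPath query relative to a jsoup element and returns the original jsoup elements. */
    static List<Element> selectRelative(Element context, String xpath) {
        // Namespace awareness is turned off so unprefixed element names match.
        W3CDom w3c = new W3CDom().namespaceAware(false);
        // Converts the whole surrounding document, remembering the context element.
        org.w3c.dom.Document wDoc = w3c.fromJsoup(context);
        Node contextNode = w3c.contextNode(wDoc);
        NodeList matches = w3c.selectXpath(xpath, contextNode);
        // Maps the W3C nodes back to the jsoup nodes they were created from.
        return w3c.sourceNodes(matches, Element.class);
    }

    public static void main(String[] args) {
        org.jsoup.nodes.Document doc = Jsoup.parse(
            "<div id=a><p>One</p></div><div id=b><p>Two</p><p>Three</p></div>");
        for (Element el : selectRelative(doc.getElementById("b"), "p")) {
            System.out.println(el.text());
        }
    }
}
```

Because the returned elements are the original jsoup nodes, they can be modified or traversed with the usual jsoup API.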
asString
public String asString(Document doc)
Serialize a W3C document to a String. The output format will be XML or HTML depending on the content of the doc.
Parameters:
doc - Document
Returns:
Document as string