Package org.jsoup.helper

Class W3CDom

java.lang.Object
org.jsoup.helper.W3CDom

public class W3CDom extends Object
Helper class to transform a jsoup Document into an org.w3c.dom.Document, for integration with toolsets that use the W3C DOM.
  • Field Details

    • SourceProperty

      public static final String SourceProperty
      For W3C Documents created by this class, this property is set on each node to link back to the original jsoup node.
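      The link-back mechanism can be pictured with the standard W3C user data API. The following is a minimal JDK-only sketch, not jsoup's implementation: the key "example.source" and the class and method names are hypothetical stand-ins (the real key is the value of SourceProperty), and a plain Object stands in for the jsoup node.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class UserDataLinkSketch {
    // Hypothetical key for illustration; W3CDom uses the value of SourceProperty.
    static final String SOURCE_KEY = "example.source";

    /** Creates a W3C element and attaches the given source object to it as user data. */
    static Element createLinked(Document doc, String tagName, Object source) {
        Element el = doc.createElement(tagName);
        el.setUserData(SOURCE_KEY, source, null); // null: no UserDataHandler needed
        return el;
    }

    /** Creates an empty W3C document with the JDK's default builder. */
    static Document emptyDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = emptyDocument();
        Object source = "stand-in for a jsoup node";
        Element el = createLinked(doc, "p", source);
        // The same object reference comes back from the W3C node.
        System.out.println(el.getUserData(SOURCE_KEY) == source); // prints "true"
    }
}
```

      With documents produced by this class, the equivalent lookup would presumably be getUserData(W3CDom.SourceProperty) on a converted node.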
    • XPathFactoryProperty

      public static final String XPathFactoryProperty
      To get support for XPath versions later than 1.0, set this property to the class name of an alternate XPathFactory implementation (e.g. net.sf.saxon.xpath.XPathFactoryImpl).
    • factory

      protected DocumentBuilderFactory factory
  • Constructor Details

    • W3CDom

      public W3CDom()
  • Method Details

    • namespaceAware

      public boolean namespaceAware()
      Returns whether this W3C DOM is namespace aware. The default is true, but namespace awareness is disabled for simplicity when using XPath selectors in Element.selectXpath(String).
      Returns:
      the current namespace aware setting.
    • namespaceAware

      public W3CDom namespaceAware(boolean namespaceAware)
      Update the namespace aware setting. This impacts the factory that is used to create W3C nodes from jsoup nodes.
      Parameters:
      namespaceAware - the updated setting
      Returns:
      this W3CDom, for chaining.
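      To illustrate what namespace awareness changes on the W3C side, here is a small JDK-only sketch. It does not use jsoup: it parses a fixed XML string with a DocumentBuilderFactory configured both ways, and the class name and sample markup are assumptions made for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class NamespaceAwareSketch {
    // Sample markup with a default namespace declaration.
    static final String XML = "<root xmlns=\"http://example.com/ns\"><item/></root>";

    /** Parses the sample markup with the requested namespace awareness. */
    static Document parse(boolean namespaceAware) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(namespaceAware);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Namespace aware: the element carries its namespace URI.
        System.out.println(parse(true).getDocumentElement().getNamespaceURI());
        // Not namespace aware: the element has no namespace URI (null).
        System.out.println(parse(false).getDocumentElement().getNamespaceURI());
    }
}
```

      In a namespace-aware document, an unprefixed XPath name test such as //item does not match elements that sit in a default namespace, which is likely the simplification the setting above refers to.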
    • convert

      public static Document convert(Document in)
      Converts a jsoup DOM to a W3C DOM.
      Parameters:
      in - jsoup Document
      Returns:
      W3C Document
    • asString

      public static String asString(Document doc, @Nullable Map<String,String> properties)
      Serialize a W3C document to a String. Provide properties to define output settings, including whether the output is HTML or XML. If you pass null for the properties, the output format is auto-detected from the content of the document.
      Parameters:
      doc - Document
      properties - (optional/nullable) the output properties to use. See Transformer.setOutputProperties(Properties) and OutputKeys
      Returns:
      Document as string
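      As a rough picture of how output properties steer serialization, the sketch below uses only the JDK's Transformer API. It is an assumption-laden illustration, not this method's implementation: the class and helper names are hypothetical, and when properties is null it simply falls back to the transformer defaults rather than reproducing the auto-detection described above.

```java
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class SerializeSketch {
    /** Serializes a W3C document, applying the given output properties if present. */
    static String serialize(Document doc, Map<String, String> properties) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            if (properties != null) {
                Properties props = new Properties();
                props.putAll(properties);
                transformer.setOutputProperties(props);
            }
            StringWriter writer = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(writer));
            return writer.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Builds a tiny document: a root element with one child holding the text "hi". */
    static Document sampleDocument() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("root");
            Element a = doc.createElement("a");
            a.setTextContent("hi");
            root.appendChild(a);
            doc.appendChild(root);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put(OutputKeys.METHOD, "xml");               // "html" selects HTML output
        props.put(OutputKeys.OMIT_XML_DECLARATION, "yes"); // drop the <?xml ...?> header
        System.out.println(serialize(sampleDocument(), props));
    }
}
```

      The keys accepted in the map are the standard OutputKeys constants.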
    • OutputHtml

      public static HashMap<String,String> OutputHtml()
      Canned default for HTML output.
    • OutputXml

      public static HashMap<String,String> OutputXml()
      Canned default for XML output.
    • fromJsoup

      public Document fromJsoup(Document in)
      Convert a jsoup Document to a W3C Document. The created nodes will link back to the original jsoup nodes in the user property SourceProperty (but after conversion, changes on one side will not flow to the other).
      Parameters:
      in - jsoup doc
      Returns:
      a W3C DOM Document representing the jsoup Document or Element contents.
    • fromJsoup

      public Document fromJsoup(Element in)
      Convert a jsoup DOM to a W3C Document. The created nodes will link back to the original jsoup nodes in the user property SourceProperty (but after conversion, changes on one side will not flow to the other). The input Element is used as a context node, but the whole surrounding jsoup Document is converted. (If you just want a subtree converted, use convert(org.jsoup.nodes.Element, Document).)
      Parameters:
      in - jsoup element or doc
      Returns:
      a W3C DOM Document representing the jsoup Document or Element contents.
    • convert

      public void convert(Document in, Document out)
      Converts a jsoup document into the provided W3C Document. If required, you can set options on the output document before converting.
      Parameters:
      in - jsoup doc
      out - w3c doc
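      A possible way to prepare such an output document with the JDK is sketched below. Which options matter is not stated here, so the settings chosen (standalone flag, document URI) and the class name are purely illustrative assumptions; the prepared document would then be passed as the out argument.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;

final class OutputDocumentSketch {
    /** Creates an empty W3C document and sets a few options before it is populated. */
    static Document newOutputDocument() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document out = factory.newDocumentBuilder().newDocument();
            out.setXmlStandalone(true);                          // illustrative option
            out.setDocumentURI("https://example.com/page.html"); // illustrative option
            return out;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document out = newOutputDocument();
        // The document is still empty; conversion would add the nodes.
        System.out.println(out.getDocumentElement() == null); // prints "true"
        System.out.println(out.getXmlStandalone());           // prints "true"
    }
}
```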
    • convert

      public void convert(Element in, Document out)
      Converts a jsoup element into the provided W3C Document. If required, you can set options on the output document before converting.
      Parameters:
      in - jsoup element
      out - w3c doc
    • selectXpath

      public NodeList selectXpath(String xpath, Document doc)
      Evaluate an XPath query against the supplied document, and return the results.
      Parameters:
      xpath - an XPath query
      doc - the document to evaluate against
      Returns:
      the matching nodes
    • selectXpath

      public NodeList selectXpath(String xpath, Node contextNode)
      Evaluate an XPath query against the supplied context node, and return the results.
      Parameters:
      xpath - an XPath query
      contextNode - the context node to evaluate against
      Returns:
      the matching nodes
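      The difference between evaluating against a document and against a context node can be sketched with the JDK's XPath API alone. This is an illustration under assumptions, not the code of the two methods above: the class name, helper names, and sample markup are hypothetical, and the document is parsed from XML rather than converted from jsoup.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class XPathSketch {
    /** Evaluates an XPath query against a context node (a Document is also a Node). */
    static NodeList select(String xpath, Node contextNode) {
        try {
            return (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, contextNode, XPathConstants.NODESET);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate XPath: " + xpath, e);
        }
    }

    /** Parses an XML string into a W3C document (not namespace aware). */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<div><ul><li>a</li><li>b</li></ul><p>c</p></div>");
        // Absolute query against the whole document.
        System.out.println(select("//li", doc).getLength()); // prints "2"
        // Relative query against a context node: only children of <ul> are considered.
        Node ul = select("//ul", doc).item(0);
        System.out.println(select("li[2]", ul).item(0).getTextContent()); // prints "b"
    }
}
```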
    • sourceNodes

      public <T extends Node> List<T> sourceNodes(NodeList nodeList, Class<T> nodeType)
      Retrieves the original jsoup DOM nodes from a NodeList created by this converter.
      Type Parameters:
      T - node type
      Parameters:
      nodeList - the W3C nodes to get the original jsoup nodes from
      nodeType - the jsoup node type to retrieve (e.g. Element, DataNode)
      Returns:
      a list of the original nodes
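      One plausible shape for this kind of lookup is sketched below using only JDK types. It is a hedged illustration, not the method's source: the key "example.source" and all names are hypothetical, and String and Integer objects stand in for jsoup node types so that the type filter is visible.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class SourceNodesSketch {
    // Hypothetical key for illustration; W3CDom uses the value of SourceProperty.
    static final String SOURCE_KEY = "example.source";

    /** Collects the user data objects of the requested type from a W3C NodeList. */
    static <T> List<T> sourceObjects(NodeList nodeList, Class<T> type) {
        List<T> out = new ArrayList<>(nodeList.getLength());
        for (int i = 0; i < nodeList.getLength(); i++) {
            Object source = nodeList.item(i).getUserData(SOURCE_KEY);
            if (type.isInstance(source)) { // skips nulls and other types
                out.add(type.cast(source));
            }
        }
        return out;
    }

    /** Builds three sibling elements: one linked to a String, one to an Integer, one unlinked. */
    static NodeList sampleNodes() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("root");
            doc.appendChild(root);
            Element a = doc.createElement("a");
            a.setUserData(SOURCE_KEY, "text source", null);
            Element b = doc.createElement("b");
            b.setUserData(SOURCE_KEY, Integer.valueOf(42), null);
            Element c = doc.createElement("c"); // no source attached
            root.appendChild(a);
            root.appendChild(b);
            root.appendChild(c);
            return root.getChildNodes();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sourceObjects(sampleNodes(), String.class)); // prints "[text source]"
    }
}
```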
    • contextNode

      public Node contextNode(Document wDoc)
      For a Document created by fromJsoup(org.jsoup.nodes.Element), retrieves the W3C context node.
      Parameters:
      wDoc - Document created by this class
      Returns:
      the W3C Node corresponding to the jsoup Element that was used as the context when creating the document.
    • asString

      public String asString(Document doc)
      Serialize a W3C document to a String. The output format will be XML or HTML depending on the content of the doc.
      Parameters:
      doc - Document
      Returns:
      Document as string