Interface NodeVisitor

All Known Implementing Classes:
W3CDom.W3CBuilder
Functional Interface:
This is a functional interface and can therefore be used as the assignment target for a lambda expression or method reference.

@FunctionalInterface public interface NodeVisitor
Node visitor interface, used to walk the DOM and visit each node. Execute via traverse(Node) or Node.traverse(NodeVisitor). The traversal is depth-first.

This interface provides two methods, head(org.jsoup.nodes.Node, int) and tail(org.jsoup.nodes.Node, int). The head method is called when a node is first seen, and the tail method when all that node's children have been visited. As an example, head can be used to emit a start tag for a node, and tail to emit the end tag. The tail method defaults to a no-op, so this interface can be used as a FunctionalInterface, with head as its single abstract method.

Example:


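 // Requires Java 21 or later (pattern matching for switch).
 // print(...) stands in for whatever output method you use.
 doc.body().traverse((node, depth) -> {
     switch (node) {
         case Element el     -> print(el.tag() + ": " + el.ownText());
         case DataNode data  -> print("Data: " + data.getWholeData());
         default             -> print(node.nodeName() + " at depth " + depth);
     }
 });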
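
The head/tail pairing described above (a start tag emitted in head, the end tag in tail) can be sketched without any library at all. The following is a minimal illustration only: SimpleNode, SimpleVisitor, and HeadTailDemo are hypothetical stand-in types invented for this sketch, not jsoup classes, and the plain recursive walk is an assumption that does not reproduce how jsoup handles nodes modified during traversal.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a DOM node; NOT a jsoup class.
final class SimpleNode {
    final String name;
    final List<SimpleNode> children = new ArrayList<>();

    SimpleNode(String name, SimpleNode... kids) {
        this.name = name;
        for (SimpleNode kid : kids) children.add(kid);
    }
}

// Same shape as described above: head is abstract, tail defaults to a no-op.
@FunctionalInterface
interface SimpleVisitor {
    void head(SimpleNode node, int depth);

    default void tail(SimpleNode node, int depth) {}
}

final class HeadTailDemo {
    // Plain recursive depth-first walk: head before the children, tail after them.
    static void walk(SimpleNode node, int depth, SimpleVisitor visitor) {
        visitor.head(node, depth);
        for (SimpleNode child : node.children) walk(child, depth + 1, visitor);
        visitor.tail(node, depth);
    }

    // Emits a start tag in head and the matching end tag in tail,
    // indented by two spaces per level of depth.
    static String render(SimpleNode root) {
        StringBuilder out = new StringBuilder();
        walk(root, 0, new SimpleVisitor() {
            @Override public void head(SimpleNode node, int depth) {
                out.append("  ".repeat(depth)).append('<').append(node.name).append(">\n");
            }
            @Override public void tail(SimpleNode node, int depth) {
                out.append("  ".repeat(depth)).append("</").append(node.name).append(">\n");
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        SimpleNode tree = new SimpleNode("div",
            new SimpleNode("p", new SimpleNode("b")),
            new SimpleNode("i"));
        System.out.print(render(tree));
    }
}
```

Because tail fires only after every child has been handled, each end tag lands at the same indentation as its start tag.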
 
  • Method Summary

    Modifier and Type    Method                        Description
    void                 head(Node node, int depth)    Callback for when a node is first visited.
    default void         tail(Node node, int depth)    Callback for when a node is last visited, after all of its descendants have been visited.
    default void         traverse(Node root)           Run a depth-first traversal of the root and all of its descendants.
  • Method Details

    • head

      void head(Node node, int depth)
      Callback for when a node is first visited.

      The node may be modified (for example via Node.attr(String, String)), removed with Node.remove(), or replaced with Node.replaceWith(Node). If the node is an Element, you may cast it to Element to access Element-specific methods.

      Traversal uses a forward cursor. After head() completes:

      • If the current node is still attached, traversal continues into its current children and then its following siblings. Nodes inserted before the current node are not visited.
      • If the current node was detached and another node now occupies its former sibling position, the node now at that position is not passed to head() again. Traversal continues from there: its children are visited, then the node is passed to tail(Node, int), then later siblings are visited.
      • If the current node was detached and no node occupies its former sibling position, the current node is not passed to tail(), and traversal resumes at the node that originally followed it.

      Traversal never advances outside the original root subtree. If the traversal root is detached during head(), traversal stops at the original root boundary.

      Parameters:
      node - the node being visited.
      depth - the depth of the node, relative to the root node. For example, the root node has depth 0, and each of its child nodes has depth 1.
    • tail

      default void tail(Node node, int depth)
      Callback for when a node is last visited, after all of its descendants have been visited.

      This method defaults to a no-op.

      The node passed to tail() is the node at the current traversal position when the subtree completes. If head() replaced the original node, this may be the replacement node instead.

      Structural changes to the current node are not supported during tail().

      Parameters:
      node - the node being visited.
      depth - the depth of the node, relative to the root node. For example, the root node has depth 0, and each of its child nodes has depth 1.
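
Since tail runs only after all of a node's descendants have been visited, it is a natural place to finalize per-subtree results. The sketch below counts the nodes in each subtree by opening a counter in head and closing it in tail. TreeNode, TreeVisitor, and SubtreeSizes are hypothetical names made up for this illustration and are not part of jsoup; node names are assumed to be unique so they can serve as map keys.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a DOM node; NOT a jsoup class.
final class TreeNode {
    final String name;
    final List<TreeNode> children = new ArrayList<>();

    TreeNode(String name, TreeNode... kids) {
        this.name = name;
        for (TreeNode kid : kids) children.add(kid);
    }
}

// Hypothetical visitor with the head/tail pair described above.
interface TreeVisitor {
    void head(TreeNode node, int depth);
    void tail(TreeNode node, int depth);
}

final class SubtreeSizes {
    // Simple recursive depth-first walk (an assumption of this sketch).
    static void walk(TreeNode node, int depth, TreeVisitor visitor) {
        visitor.head(node, depth);
        for (TreeNode child : node.children) walk(child, depth + 1, visitor);
        visitor.tail(node, depth);
    }

    // Maps each node name to the size of its subtree (the node plus its descendants).
    // Entries appear in tail order, so children come before their parents.
    static Map<String, Integer> count(TreeNode root) {
        Map<String, Integer> sizes = new LinkedHashMap<>();
        Deque<int[]> open = new ArrayDeque<>(); // one running counter per open node
        walk(root, 0, new TreeVisitor() {
            @Override public void head(TreeNode node, int depth) {
                open.push(new int[] {1}); // count the node itself
            }
            @Override public void tail(TreeNode node, int depth) {
                int size = open.pop()[0]; // all descendants are done, so this is final
                sizes.put(node.name, size);
                if (!open.isEmpty()) open.peek()[0] += size; // fold into the parent
            }
        });
        return sizes;
    }

    public static void main(String[] args) {
        TreeNode root = new TreeNode("ul",
            new TreeNode("li1", new TreeNode("a")),
            new TreeNode("li2"));
        System.out.println(count(root)); // prints {a=1, li1=2, li2=1, ul=4}
    }
}
```

The sketch only reads the tree in tail, in keeping with the note above that structural changes to the current node are not supported there.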
    • traverse

      default void traverse(Node root)
      Run a depth-first traversal of the root and all of its descendants.
      Parameters:
      root - the node at which the traversal starts.
      Since:
      1.21.1
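
A default traverse method lets a visitor written as a lambda start its own walk. The following library-free sketch shows that design under stated assumptions: MiniNode, MiniVisitor, and TraverseDemo are hypothetical stand-ins rather than jsoup types, and the private walk helper is an invention of this example (private interface methods need Java 9 or later).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a DOM node; NOT a jsoup class.
final class MiniNode {
    final String name;
    final List<MiniNode> children = new ArrayList<>();

    MiniNode(String name, MiniNode... kids) {
        this.name = name;
        for (MiniNode kid : kids) children.add(kid);
    }
}

// Hypothetical visitor with the same three-method shape as documented above.
@FunctionalInterface
interface MiniVisitor {
    void head(MiniNode node, int depth);

    default void tail(MiniNode node, int depth) {}

    // Default entry point, so a lambda-backed visitor can begin the walk itself.
    default void traverse(MiniNode root) {
        walk(root, 0);
    }

    // Helper for this sketch only: recursive depth-first walk.
    private void walk(MiniNode node, int depth) {
        head(node, depth);
        for (MiniNode child : node.children) walk(child, depth + 1);
        tail(node, depth);
    }
}

final class TraverseDemo {
    // Collects "depth:name" entries in visit order, implementing only head.
    static List<String> outline(MiniNode root) {
        List<String> seen = new ArrayList<>();
        MiniVisitor visitor = (node, depth) -> seen.add(depth + ":" + node.name);
        visitor.traverse(root);
        return seen;
    }

    public static void main(String[] args) {
        MiniNode root = new MiniNode("body",
            new MiniNode("h1"),
            new MiniNode("p", new MiniNode("em")));
        System.out.println(outline(root)); // prints [0:body, 1:h1, 1:p, 2:em]
    }
}
```

The lambda has to be assigned to a variable of the visitor type first, because a default method cannot be called on a bare lambda expression.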