Package org.jsoup.nodes

Class Document

java.lang.Object
All Implemented Interfaces:
Cloneable, Iterable<Element>

public class Document extends Element
An HTML Document.
Author:
Jonathan Hedley, jonathan@hedley.net
  • Constructor Details

  • Method Details

    • createShell

      public static Document createShell(String baseUri)
      Create a valid, empty shell of an HTML document, suitable for adding more elements to.
      Parameters:
      baseUri - baseUri of document
      Returns:
      document with html, head, and body elements.
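      A minimal sketch of building on an empty shell. The base URI and the class name CreateShellExample are placeholders chosen for illustration, not part of the jsoup API.

```java
import org.jsoup.nodes.Document;

class CreateShellExample {
    // Builds an empty shell (html, head, body) and appends one paragraph to its body.
    static Document build() {
        // Placeholder base URI, assumed for this example only.
        Document doc = Document.createShell("https://example.com/");
        doc.body().appendElement("p").text("Hello");
        return doc;
    }

    public static void main(String[] args) {
        System.out.println(build().outerHtml());
    }
}
```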
    • location

      public String location()
      Get the URL this Document was parsed from. If the starting URL is a redirect, this will return the final URL from which the document was served.

      Will return an empty string if the location is unknown (e.g. if parsed from a String).

      Returns:
      location
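      As an illustration, the sketch below compares a document parsed from a bare String with one parsed with an explicit base URI. It assumes that a base URI passed to Jsoup.parse is reported as the location; the URL is a placeholder.

```java
import org.jsoup.Jsoup;

class LocationExample {
    // Parsed from a String with no base URI: the location is unknown.
    static String withoutBase() {
        return Jsoup.parse("<p>Hi</p>").location();
    }

    // Parsed with a base URI (a placeholder here), which is assumed to be reported as the location.
    static String withBase() {
        return Jsoup.parse("<p>Hi</p>", "https://example.com/page").location();
    }

    public static void main(String[] args) {
        System.out.println("[" + withoutBase() + "]");
        System.out.println("[" + withBase() + "]");
    }
}
```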
    • connection

      public Connection connection()
      Returns the Connection (Request/Response) object that was used to fetch this document, if any; otherwise, a new default Connection object. This can be used to continue a session, preserving settings and cookies, etc.
      Returns:
      the Connection (session) associated with this Document, or an empty one otherwise.
    • documentType

      public @Nullable DocumentType documentType()
      Returns this Document's doctype.
      Returns:
      document type, or null if not set
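      Because the return value may be null, callers typically guard the lookup. A small sketch, with a hypothetical helper named doctypeName and a "(none)" fallback string chosen for this example:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.DocumentType;

class DocumentTypeExample {
    // Returns the doctype name, or "(none)" when the document has no doctype.
    static String doctypeName(String html) {
        DocumentType type = Jsoup.parse(html).documentType();
        return type == null ? "(none)" : type.name();
    }

    public static void main(String[] args) {
        System.out.println(doctypeName("<!DOCTYPE html><html><body></body></html>"));
        System.out.println(doctypeName("<p>No doctype here</p>"));
    }
}
```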
    • head

      public Element head()
      Get this document's head element.

      As a side effect, if this Document does not already have an HTML structure, it will be created. If you do not want that, use selectFirst("head") instead.

      Returns:
      head element.
    • body

      public Element body()
      Get this document's <body> or <frameset> element.

      As a side effect, if this Document does not already have an HTML structure, it will be created with a <body> element. If you do not want that, use selectFirst("body") instead.

      Returns:
      body element for documents with a <body>, a new <body> element if the document had no contents, or the outermost <frameset> element for frameset documents.
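      The side effect described for head() and body() can be observed on a completely empty Document. This sketch assumes a jsoup version with the behaviour documented here; older releases may differ.

```java
import org.jsoup.nodes.Document;

class BodySideEffectExample {
    // Returns {bodyFoundBefore, bodyFoundAfter} around a call to body().
    static boolean[] probe() {
        Document doc = new Document("");                   // empty: no html, head, or body yet
        boolean before = doc.selectFirst("body") != null;  // read-only lookup, creates nothing
        doc.body();                                        // creates the missing structure
        boolean after = doc.selectFirst("body") != null;
        return new boolean[] { before, after };
    }

    public static void main(String[] args) {
        boolean[] result = probe();
        System.out.println("before=" + result[0] + ", after=" + result[1]);
    }
}
```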
    • forms

      public List<FormElement> forms()
      Get each of the <form> elements contained in this document.
      Returns:
      a List of FormElement objects, which will be empty if there are none.
      Since:
      1.15.4
    • expectForm

      public FormElement expectForm(String cssQuery)
      Selects the first FormElement in this document that matches the query. If none match, throws an IllegalArgumentException.
      Parameters:
      cssQuery - a Selector CSS query
      Returns:
      the first matching <form> element
      Throws:
      IllegalArgumentException - if no match is found
      Since:
      1.15.4
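      A short sketch of forms() and expectForm(String) together. The markup, the form ids, and the class name are invented for illustration; both methods require jsoup 1.15.4 or later, as noted above.

```java
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.FormElement;

class FormsExample {
    // Sample markup with two forms (hypothetical ids and actions).
    static final String HTML =
        "<form id='search' action='/search'><input name='q'></form>"
      + "<form id='login' action='/login'><input name='user'></form>";

    static int countForms() {
        List<FormElement> forms = Jsoup.parse(HTML).forms();
        return forms.size();
    }

    static String loginAction() {
        return Jsoup.parse(HTML).expectForm("#login").attr("action");
    }

    // expectForm throws rather than returning null when nothing matches.
    static boolean missingFormThrows() {
        try {
            Jsoup.parse(HTML).expectForm("#signup");
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(countForms() + " forms, login posts to " + loginAction());
    }
}
```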
    • title

      public String title()
      Get the string contents of the document's title element.
      Returns:
      Trimmed title, or empty string if none set.
    • title

      public void title(String title)
      Set the document's title element. Updates the existing element, or adds a title element to the head if not present.
      Parameters:
      title - string to set as title
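      The getter and setter can be exercised on a document that starts without a title. This is a sketch with made-up content; TitleExample is not part of jsoup.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class TitleExample {
    // A document without a <title> reports an empty string.
    static String titleBefore() {
        return Jsoup.parse("<p>No title yet</p>").title();
    }

    // Setting the title adds a <title> element to the head when none exists.
    static Document withTitle() {
        Document doc = Jsoup.parse("<p>No title yet</p>");
        doc.title("Quarterly Report");
        return doc;
    }

    public static void main(String[] args) {
        System.out.println("[" + titleBefore() + "]");
        System.out.println(withTitle().title());
    }
}
```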
    • createElement

      public Element createElement(String tagName)
      Create a new Element, with this document's base uri. Does not make the new element a child of this document.
      Parameters:
      tagName - element tag name (e.g. a)
      Returns:
      new element
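      To illustrate that the new element carries the base URI but is not attached, the sketch below creates a link, resolves its href, and only then appends it. The base URI and paths are placeholders.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class CreateElementExample {
    // Parses a small page; the base URI is a placeholder for illustration.
    static Document page() {
        return Jsoup.parse("<p>Intro</p>", "https://example.com/");
    }

    // Creates a detached <a> element that uses the document's base URI.
    static Element newLink(Document doc) {
        return doc.createElement("a").attr("href", "/about").text("About");
    }

    public static void main(String[] args) {
        Document doc = page();
        Element link = newLink(doc);
        System.out.println(link.parent() == null);  // not yet part of the tree
        System.out.println(link.absUrl("href"));    // resolved against the base URI
        doc.body().appendChild(link);               // attach it explicitly
        System.out.println(doc.select("a").size());
    }
}
```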
    • outerHtml

      public String outerHtml()
      Description copied from class: Node
      Get the outer HTML of this node. For example, on a p element, may return <p>Para</p>.
      Overrides:
      outerHtml in class Node
      Returns:
      outer HTML
    • text

      public Element text(String text)
      Set the text of the body of this document. Any existing nodes within the body will be cleared.
      Overrides:
      text in class Element
      Parameters:
      text - un-encoded text
      Returns:
      this document
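      A brief sketch of replacing the body content with plain text. The input markup is invented; the text is passed un-encoded and escaped on output.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class BodyTextExample {
    // Replaces everything inside <body> with a single piece of text.
    static Document replaceBody() {
        Document doc = Jsoup.parse("<div><p>Old content</p></div>");
        doc.text("1 < 2"); // un-encoded input; existing body nodes are cleared
        return doc;
    }

    public static void main(String[] args) {
        Document doc = replaceBody();
        System.out.println(doc.body().text());
        System.out.println(doc.body().html());
    }
}
```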
    • nodeName

      public String nodeName()
      Description copied from class: Node
      Get the node name of this node. Use for debugging purposes and not logic switching (for that, use instanceof).
      Overrides:
      nodeName in class Element
      Returns:
      node name
    • charset

      public void charset(Charset charset)
      Set the output character set of this Document. This method is equivalent to OutputSettings.charset(Charset), but additionally adds or updates the charset / encoding element within the Document.

      If there's no existing element with charset / encoding information yet, one will be created. Obsolete charset / encoding definitions are removed.

      Elements used:

      • HTML: <meta charset="CHARSET">
      • XML: <?xml version="1.0" encoding="CHARSET"?>
      Parameters:
      charset - Charset
    • charset

      public Charset charset()
      Get the output character set of this Document. This method is equivalent to Document.OutputSettings.charset().
      Returns:
      the current Charset
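      The charset setter and getter can be sketched as follows for an HTML document. ISO-8859-1 is an arbitrary choice for the example, and the meta lookup assumes the HTML form listed under "Elements used".

```java
import java.nio.charset.StandardCharsets;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class CharsetExample {
    // Switches the output charset; a <meta charset> element is added or updated to match.
    static Document toLatin1() {
        Document doc = Jsoup.parse("<p>Hello</p>");
        doc.charset(StandardCharsets.ISO_8859_1);
        return doc;
    }

    public static void main(String[] args) {
        Document doc = toLatin1();
        System.out.println(doc.charset());
        System.out.println(doc.selectFirst("meta[charset]").attr("charset"));
    }
}
```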
    • updateMetaCharsetElement

      @Deprecated public void updateMetaCharsetElement(boolean noop)
      Deprecated.
      this setting has no effect; the meta charset element is always updated when charset(Charset) is called. This method will be removed in jsoup 1.20.1.
    • updateMetaCharsetElement

      @Deprecated public boolean updateMetaCharsetElement()
      Deprecated.
      this setting has no effect; the meta charset element is always updated when charset(Charset) is called. This method will be removed in jsoup 1.20.1.
    • clone

      public Document clone()
      Description copied from class: Node
      Create a stand-alone, deep copy of this node, and all of its children. The cloned node will have no siblings.

      • If this node is a LeafNode, the clone will have no parent.
      • If this node is an Element, the clone will have a simple owning Document to retain the configured output settings and parser.

      The cloned node may be adopted into another Document or node structure using Element.appendChild(Node).

      Overrides:
      clone in class Element
      Returns:
      a stand-alone cloned node, including clones of any children
    • shallowClone

      public Document shallowClone()
      Description copied from class: Node
      Create a stand-alone, shallow copy of this node. None of its children (if any) will be cloned, and it will have no parent or sibling nodes.
      Overrides:
      shallowClone in class Element
      Returns:
      a single independent copy of this node
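      The difference between clone() and shallowClone() can be sketched by counting nodes after modifying a deep copy. The markup is made up for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class CloneExample {
    // Returns {paragraphs in original, paragraphs in deep copy, child nodes in shallow copy}.
    static int[] counts() {
        Document original = Jsoup.parse("<p>One</p>");

        Document deep = original.clone();           // independent copy of the whole tree
        deep.body().appendElement("p").text("Two"); // does not touch the original

        Document shallow = original.shallowClone(); // copies the document node only

        return new int[] {
            original.select("p").size(),
            deep.select("p").size(),
            shallow.childNodeSize()
        };
    }

    public static void main(String[] args) {
        int[] c = counts();
        System.out.println(c[0] + " / " + c[1] + " / " + c[2]);
    }
}
```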
    • outputSettings

      public Document.OutputSettings outputSettings()
      Get the document's current output settings.
      Returns:
      the document's current output settings.
    • outputSettings

      public Document outputSettings(Document.OutputSettings outputSettings)
      Set the document's output settings.
      Parameters:
      outputSettings - new output settings.
      Returns:
      this document, for chaining.
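      One common use of output settings is to turn off pretty-printing. The sketch below shows that under the assumption that compact output is wanted; other OutputSettings options are not covered here.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class OutputSettingsExample {
    // Serialises a document with pretty-printing disabled.
    static String compactHtml() {
        Document doc = Jsoup.parse("<p>Hi</p>");
        // The setter returns the document, so the call can be chained.
        return doc.outputSettings(new Document.OutputSettings().prettyPrint(false))
                  .outerHtml();
    }

    public static void main(String[] args) {
        System.out.println(compactHtml());
    }
}
```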
    • quirksMode

      public Document.QuirksMode quirksMode()
    • quirksMode

      public Document quirksMode(Document.QuirksMode quirksMode)
    • parser

      public Parser parser()
      Get the parser that was used to parse this document.
      Returns:
      the parser
    • parser

      public Document parser(Parser parser)
      Set the parser used to create this document. This parser is then used when further parsing within this document is required.
      Parameters:
      parser - the configured parser to use when further parsing is required for this document.
      Returns:
      this document, for chaining.
    • connection

      public Document connection(Connection connection)
      Set the Connection used to fetch this document. This Connection is used as a session object when further requests are made (e.g. when a form is submitted).
      Parameters:
      connection - to set
      Returns:
      this document, for chaining
      Since:
      1.14.1
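      A sketch of attaching a session to a document that was parsed from a String, without making any network request. The user agent, timeout, and URL are arbitrary example values, and Jsoup.newSession() is assumed to be available (jsoup 1.14.1 or later).

```java
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class ConnectionSessionExample {
    // Attaches a configured session and checks that the getter hands it back.
    static boolean sessionIsRetained() {
        Connection session = Jsoup.newSession()
            .userAgent("example-agent/1.0") // placeholder value
            .timeout(5_000);                // milliseconds
        Document doc = Jsoup.parse("<form action='/login'></form>", "https://example.com/");
        doc.connection(session);
        return doc.connection() == session;
    }

    // Without an attached session, connection() still returns a usable default object.
    static boolean defaultIsNonNull() {
        return Jsoup.parse("<p>Hi</p>").connection() != null;
    }

    public static void main(String[] args) {
        System.out.println(sessionIsRetained() + " " + defaultIsNonNull());
    }
}
```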