All Classes and Interfaces: jsoup HTML Parser Documentation

Class

Description

Attribute

A single key + value attribute.

Attributes

The attributes of an Element.

CDataNode

A Character Data node, to support CDATA sections.

CharacterReader

CharacterReader consumes tokens off a string.

Cleaner

The safelist-based HTML cleaner.

Collector

Collects a list of elements that match the supplied criteria.

CombiningEvaluator

Base combining (and, or) evaluator.

CombiningEvaluator.And

CombiningEvaluator.Or

Comment

A comment node.

Connection

The Connection interface is a convenient HTTP client and session object to fetch content from the web and parse it into Documents.

Connection.Base<T>

Common methods for Requests and Responses

Connection.KeyVal

A Key:Value tuple(+), used for form data.

Connection.Method

GET and POST HTTP methods.

Connection.Request

Represents an HTTP request.

Connection.Response

Represents an HTTP response.

ControllableInputStream

A jsoup internal class (do not use it, as it has no API contract) that enables controls on a buffered input stream, namely a maximum read size and the ability to Thread.interrupt() the read.

DataNode

A data node, for contents of style, script tags etc, where contents should not show in text().

DataUtil

Internal static utilities for handling data.

Document

An HTML Document.

Document.OutputSettings

A Document's output settings control the form of the text() and html() methods.

Document.OutputSettings.Syntax

The output serialization syntax.

Document.QuirksMode

DocumentType

A <!DOCTYPE> node.

Element

An HTML Element consists of a tag name, attributes, and child nodes (including text nodes and other elements).

Elements

A list of Elements, with methods that act on every element in the list.

Entities

HTML entities, and escape routines.

Entities.EscapeMode

Evaluator

An Evaluator tests if an element (or a node) meets the selector's requirements.

Evaluator.AllElements

Evaluator for any / all element matching

Evaluator.Attribute

Evaluator for attribute name matching

Evaluator.AttributeKeyPair

Abstract evaluator for attribute name/value matching

Evaluator.AttributeStarting

Evaluator for attribute name prefix matching

Evaluator.AttributeWithValue

Evaluator for attribute name/value matching

Evaluator.AttributeWithValueContaining

Evaluator for attribute name/value matching (value containing)

Evaluator.AttributeWithValueEnding

Evaluator for attribute name/value matching (value ending)

Evaluator.AttributeWithValueMatching

Evaluator for attribute name/value matching (value regex matching)

Evaluator.AttributeWithValueNot

Evaluator for attribute name != value matching

Evaluator.AttributeWithValueStarting

Evaluator for attribute name/value matching (value prefix)

Evaluator.Class

Evaluator for element class

Evaluator.ContainsData

Evaluator for matching Element (and its descendants) data

Evaluator.ContainsOwnText

Evaluator for matching Element's own text

Evaluator.ContainsText

Evaluator for matching Element (and its descendants) text

Evaluator.ContainsWholeOwnText

Evaluator for matching Element (but not its descendants) wholeText.

Evaluator.ContainsWholeText

Evaluator for matching Element (and its descendants) wholeText.

Evaluator.CssNthEvaluator

Evaluator.Id

Evaluator for element id

Evaluator.IndexEquals

Evaluator for matching by sibling index number (e = idx)

Evaluator.IndexEvaluator

Abstract evaluator for sibling index matching

Evaluator.IndexGreaterThan

Evaluator for matching by sibling index number (e > idx)

Evaluator.IndexLessThan

Evaluator for matching by sibling index number (e < idx)

Evaluator.IsEmpty

Evaluator.IsFirstChild

Evaluator for matching the first sibling (css :first-child)

Evaluator.IsFirstOfType

Evaluator.IsLastChild

Evaluator for matching the last sibling (css :last-child)

Evaluator.IsLastOfType

Evaluator.IsNthChild

css-compatible Evaluator for :eq (css :nth-child)

Evaluator.IsNthLastChild

css pseudo class :nth-last-child

Evaluator.IsNthLastOfType

Evaluator.IsNthOfType

css pseudo class :nth-of-type

Evaluator.IsOnlyChild

Evaluator.IsOnlyOfType

Evaluator.IsRoot

css3 pseudo-class :root

Evaluator.Matches

Evaluator for matching Element (and its descendants) text with regex

Evaluator.MatchesOwn

Evaluator for matching Element's own text with regex

Evaluator.MatchesWholeOwnText

Evaluator for matching Element's own whole text with regex.

Evaluator.MatchesWholeText

Evaluator for matching Element (and its descendants) whole text with regex.

Evaluator.MatchText

Deprecated.

This selector is deprecated and will be removed in a future version.

Evaluator.Tag

Evaluator for tag name

Evaluator.TagEndsWith

Evaluator for tag name that ends with suffix; used for *|el

Evaluator.TagStartsWith

Evaluator for tag name that starts with prefix; used for ns|*

FormElement

An HTML Form Element provides ready access to the form fields/controls that are associated with it.

Functions

An internal class containing functions for use with Map.computeIfAbsent(Object, Function).

HtmlToPlainText

HTML to plain-text.

HtmlTreeBuilder

HTML Tree Builder; creates a DOM from Tokens.

HttpConnection

Implementation of Connection.

HttpConnection.KeyVal

HttpConnection.Request

HttpConnection.Response

HttpStatusException

Signals that an HTTP request resulted in a non-OK HTTP response.

Jsoup

The core public access point to the jsoup functionality.

LeafNode

A node that does not hold any children.

ListLinks

Example program to list links from a URL.

Node

The base, abstract Node model.

NodeFilter

A controllable Node visitor interface.

NodeFilter.FilterResult

Traversal action.

NodeIterator<T>

Iterates through a Node and its tree of descendants, in document order, and returns nodes of the specified type.

Nodes<T>

A list of Node objects, with methods that act on every node in the list.

NodeTraversor

A depth-first node traversor.

NodeVisitor

Node visitor interface, used to walk the DOM, and visit each Node.

Normalizer

Util methods for normalizing strings.

ParseError

A Parse Error records an error in the input HTML that occurs in either the tokenisation or the tree building phase.

ParseErrorList

A container for ParseErrors.

Parser

Parses HTML or XML into a Document.

ParseSettings

Controls parser case settings, to optionally preserve tag and/or attribute name case.

Progress<ProgressContext>

PseudoTextElement

Deprecated.

Use Element.selectNodes(String, Class) instead, with selector of ::textnode and class TextNode.

QueryParser

Parses a CSS selector into an Evaluator tree.

QuietAppendable

A jsoup internal class to wrap an Appendable and throw IOExceptions as SerializationExceptions.

Range

A Range object tracks the character positions in the original input source where a Node starts or ends.

Range.AttributeRange

Range.Position

A Position object tracks the character position in the original input source where a Node starts or ends.

RequestAuthenticator

A RequestAuthenticator is used in Connection to authenticate if required to proxies and web servers.

RequestAuthenticator.Context

Provides details for the request, to determine the appropriate credentials to return.

Safelist

Safe-lists define what HTML (elements and attributes) to allow through the cleaner.

Selector

CSS element selector, that finds elements matching a query.

Selector.SelectorParseException

SerializationException

A SerializationException is raised whenever serialization of a DOM element fails.

SharedConstants

jsoup constants used between packages.

SimpleStreamReader

A simple decoding InputStreamReader that recycles internal buffers.

SoftPool<T>

A SoftPool is a ThreadLocal that holds a SoftReference to a pool of initializable objects.

StreamParser

A StreamParser provides a progressive parse of its input.

StringUtil

A minimal String utility class.

StringUtil.StringJoiner

A StringJoiner allows incremental / filtered joining of a set of stringable objects.

Tag

A Tag represents an Element's name and configured options, common throughout the Document.

TagSet

A TagSet controls the Tag configuration for a Document's parse, and its serialization.

TextNode

A text node.

TokenQueue

A character reader with helpers focusing on parsing CSS selectors.

UnsupportedMimeTypeException

Signals that an HTTP response returned a mime type that is not supported.

Validate

Validators to check that method arguments meet expectations.

ValidationException

Validation exceptions, as thrown by the methods in Validate.

W3CDom

Helper class to transform a Document to an org.w3c.dom.Document, for integration with toolsets that use the W3C DOM.

W3CDom.W3CBuilder

Implements the conversion by walking the input.

Wikipedia

A simple example, used on the jsoup website.

XmlDeclaration

An XML Declaration.

XmlTreeBuilder

Use the XmlTreeBuilder when you want to parse XML without any of the HTML DOM rules being applied to the document.
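
Example

The following is a minimal sketch, not part of the generated index, showing how a few of the classes listed above are commonly combined: Jsoup as the entry point, Document and Elements for selection, Safelist for cleaning, and Parser.xmlParser() to parse with the XmlTreeBuilder. The class name IndexSketch and its helper method names are hypothetical, chosen only for illustration, and the sketch assumes a jsoup version that ships Safelist is on the classpath.

```java
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;
import org.jsoup.safety.Safelist;
import org.jsoup.select.Elements;

// Hypothetical illustration class; not part of jsoup.
public class IndexSketch {

    // Parse HTML and collect the text of every <a> element that has an href attribute.
    static List<String> linkTexts(String html) {
        Document doc = Jsoup.parse(html);
        Elements links = doc.select("a[href]");
        return links.eachText();
    }

    // Clean untrusted body HTML against the basic safelist.
    static String cleanBasic(String untrustedHtml) {
        return Jsoup.clean(untrustedHtml, Safelist.basic());
    }

    // Parse with the XML parser, so no HTML document structure is imposed,
    // and return the tag name of the first top-level element.
    static String rootTagAsXml(String xml) {
        Document doc = Jsoup.parse(xml, "", Parser.xmlParser());
        return doc.child(0).tagName();
    }

    // Parse the same markup with the default HTML parser for comparison.
    static String rootTagAsHtml(String markup) {
        Document doc = Jsoup.parse(markup);
        return doc.child(0).tagName();
    }

    public static void main(String[] args) {
        System.out.println(linkTexts("<p><a href='/a'>One</a> <a href='/b'>Two</a></p>"));
        System.out.println(cleanBasic("<p>Hi<script>alert(1)</script></p>"));
        System.out.println(rootTagAsXml("<item><name>x</name></item>"));
        System.out.println(rootTagAsHtml("<item><name>x</name></item>"));
    }
}
```

The last two calls show the difference the XmlTreeBuilder description refers to: the HTML parser wraps the input in an html element, while the XML parser keeps the input's own root element.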