Package org.jsoup.select

Class Selector

java.lang.Object
org.jsoup.select.Selector

public class Selector extends Object
CSS-like element selector that finds elements matching a query.

Selector syntax

A selector is a chain of simple selectors, separated by combinators. Selectors are case insensitive (including against elements, attributes, and attribute values).

The universal selector * is implicit when no element selector is supplied (i.e. .header and *.header are equivalent).

Pattern Matches Example
* any element *
tag elements with the given tag name div
*|E elements of type E in any namespace (including non-namespaced) *|name finds <fb:name> and <name> elements
ns|E elements of type E in the namespace ns fb|name finds <fb:name> elements
#id elements with attribute ID of "id" div#wrap, #logo
.class elements with a class name of "class" div.left, .result
[attr] elements with an attribute named "attr" (with any value) a[href], [title]
[^attrPrefix] elements with an attribute name starting with "attrPrefix". Use to find elements with HTML5 datasets [^data-], div[^data-]
[attr=val] elements with an attribute named "attr", and value equal to "val" img[width=500], a[rel=nofollow]
[attr="val"] elements with an attribute named "attr", and value equal to "val" span[hello="Cleveland"][goodbye="Columbus"], a[rel="nofollow"]
[attr^=valPrefix] elements with an attribute named "attr", and value starting with "valPrefix" a[href^=http:]
[attr$=valSuffix] elements with an attribute named "attr", and value ending with "valSuffix" img[src$=.png]
[attr*=valContaining] elements with an attribute named "attr", and value containing "valContaining" a[href*=/search/]
[attr~=regex] elements with an attribute named "attr", and value matching the regular expression img[src~=(?i)\\.(png|jpe?g)]
[*] elements with any attribute p[*] finds p elements that have at least one attribute; p:not([*]) finds those with no attributes
The above may be combined in any order div.header[title]
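The attribute-value operators above can be sketched as plain string tests on a single attribute value. AttrValueMatch is a hypothetical helper and is not part of jsoup. It assumes two things: the literal operators compare case-insensitively, as stated at the top of this page, and [attr~=regex] searches the value for the pattern rather than requiring a whole-value match. The second assumption is consistent with the img[src~=(?i)\\.(png|jpe?g)] example, which only needs to match the end of the value. The sketch illustrates the semantics only, not how jsoup implements them.

```java
import java.util.Locale;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of the attribute-value operators ([attr=val], [attr^=val],
 * [attr$=val], [attr*=val], [attr~=regex]) applied to one attribute value.
 * Not jsoup's implementation.
 */
final class AttrValueMatch {
    private AttrValueMatch() {}

    static boolean matches(String op, String actual, String expected) {
        // Literal operators: case-insensitive comparison (assumption based on the docs).
        String a = actual.toLowerCase(Locale.ROOT);
        String e = expected.toLowerCase(Locale.ROOT);
        switch (op) {
            case "=":  return a.equals(e);
            case "^=": return a.startsWith(e);
            case "$=": return a.endsWith(e);
            case "*=": return a.contains(e);
            // Regex is used as written; prefix it with (?i) for case-insensitive matching.
            case "~=": return Pattern.compile(expected).matcher(actual).find();
            default:   throw new IllegalArgumentException("Unknown operator: " + op);
        }
    }
}
```

For example, matches("$=", "logo.PNG", ".png") is true, while matches("~=", "photo.JPEG", "\\.(png|jpe?g)") is false until (?i) is added to the pattern.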

Combinators

E F an F element descended from an E element div a, .logo h1
E > F an F direct child of E ol > li
E + F an F element immediately preceded by sibling E li + li, div.head + div
E ~ F an F element preceded by sibling E h1 ~ p
E, F, G all matching elements E, F, or G a[href], div, h3
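The difference between the two sibling combinators can be illustrated on the tag names of one parent's element children, in document order. SiblingCombinators is a hypothetical helper for illustration, not jsoup code. E + F requires E to be the immediately preceding sibling, while E ~ F accepts an E anywhere earlier among the siblings.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the E + F and E ~ F combinators, working on the tag
 * names of a single parent's element children. Not jsoup's implementation.
 */
final class SiblingCombinators {
    private SiblingCombinators() {}

    /** Indices of F elements whose immediately preceding sibling is an E (E + F). */
    static List<Integer> adjacent(List<String> siblings, String e, String f) {
        List<Integer> out = new ArrayList<>();
        for (int i = 1; i < siblings.size(); i++) {
            if (siblings.get(i).equals(f) && siblings.get(i - 1).equals(e)) out.add(i);
        }
        return out;
    }

    /** Indices of F elements with an E somewhere before them among the siblings (E ~ F). */
    static List<Integer> general(List<String> siblings, String e, String f) {
        List<Integer> out = new ArrayList<>();
        boolean seenE = false;
        for (int i = 0; i < siblings.size(); i++) {
            // Test F before recording E, so an element never counts as its own predecessor.
            if (seenE && siblings.get(i).equals(f)) out.add(i);
            if (siblings.get(i).equals(e)) seenE = true;
        }
        return out;
    }
}
```

With children h1, p, p, div, p, the query h1 + p selects only the first p (index 1), while h1 ~ p selects all three p elements (indices 1, 2 and 4).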

Pseudo selectors

:lt(n) elements whose sibling index is less than n td:lt(3) finds the first 3 cells of each row
:gt(n) elements whose sibling index is greater than n td:gt(1) finds cells after skipping the first two
:eq(n) elements whose sibling index is equal to n td:eq(0) finds the first cell of each row
:has(selector) elements that contain at least one element matching the selector div:has(p) finds divs that contain p elements.
div:has(> a) selects div elements that have at least one direct child a element.
section:has(h1, h2) finds section elements that contain an h1 or an h2 element
:is(selector list) elements that match any of the selectors in the selector list :is(h1, h2, h3, h4, h5, h6) finds any heading element.
:is(section, article) > :is(h1, h2) finds an h1 or h2 that is a direct child of a section or an article
:not(selector) elements that do not match the selector. See also Elements.not(String) div:not(.logo) finds all divs that do not have the "logo" class.

div:not(:has(div)) finds divs that do not contain divs.

:contains(text) elements that contain the specified text. The search is case insensitive. The text may appear in the found element, or any of its descendants. The text is whitespace normalized.

To find content that includes parentheses, escape those with a \.

p:contains(jsoup) finds p elements containing the text "jsoup".

p:contains(hello \(there\)) finds p elements containing the text "Hello (There)"

:containsOwn(text) elements that directly contain the specified text. The search is case insensitive. The text must appear in the found element, not any of its descendants. p:containsOwn(jsoup) finds p elements with own text "jsoup".
:containsData(data) elements that contain the specified data. The contents of script and style elements, and comment nodes (etc.) are considered data nodes, not text nodes. The search is case insensitive. The data may appear in the found element, or any of its descendants. script:containsData(jsoup) finds script elements containing the data "jsoup".
:containsWholeText(text) elements that contain the specified non-normalized text. The search is case sensitive, and will match exactly against spaces and newlines found in the original input. The text may appear in the found element, or any of its descendants.

To find content that includes parentheses, escape those with a \.

p:containsWholeText(jsoup\nThe Java HTML Parser) finds p elements containing the text "jsoup\nThe Java HTML Parser" (and not other variations of whitespace or casing, as :contains() would). Note that br elements are presented as a newline.

:containsWholeOwnText(text) elements that directly contain the specified non-normalized text. The search is case sensitive, and will match exactly against spaces and newlines found in the original input. The text may appear in the found element, but not in its descendants.

To find content that includes parentheses, escape those with a \.

p:containsWholeOwnText(jsoup\nThe Java HTML Parser) finds p elements directly containing the text "jsoup\nThe Java HTML Parser" (and not other variations of whitespace or casing, as :contains() would). Note that br elements are presented as a newline.

:matches(regex) elements containing whitespace normalized text that matches the specified regular expression. The text may appear in the found element, or any of its descendants. td:matches(\\d+) finds table cells containing digits. div:matches((?i)login) finds divs containing the text, case insensitively.
:matchesWholeText(regex) elements containing non-normalized whole text that matches the specified regular expression. The text may appear in the found element, or any of its descendants. td:matchesWholeText(\\s{2,}) finds table cells containing a run of at least two whitespace characters.
:matchesWholeOwnText(regex) elements whose own non-normalized whole text matches the specified regular expression. The text must appear in the found element, not any of its descendants. td:matchesWholeOwnText(\n\\d+) finds table cells directly containing digits following a newline.
The above may be combined in any order and with other selectors .light:contains(name):eq(0)
:matchText treats text nodes as elements, and so allows you to match against and select text nodes.

Note that using this selector will modify the DOM, so you may want to clone your document before using.

p:matchText:first-child with input <p>One<br />Two</p> will return one PseudoTextElement with text "One".
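The difference between the normalized selectors (:contains, :matches) and the whole-text ones (:containsWholeText, :matchesWholeText) can be sketched with plain strings. TextMatching is hypothetical. Its normalize method (trim, then collapse each whitespace run to a single space) is an assumed approximation of jsoup's whitespace normalization, not jsoup's code. The regex variants are assumed to search for the pattern anywhere in the text, as the td:matches(\\d+) example suggests.

```java
import java.util.Locale;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch contrasting normalized and whole (non-normalized) text
 * matching. normalize() only approximates jsoup's whitespace normalization.
 */
final class TextMatching {
    private TextMatching() {}

    /** Trim, then collapse every run of whitespace to one space. */
    static String normalize(String text) {
        return text.trim().replaceAll("\\s+", " ");
    }

    /** Like :contains(search): whitespace normalized, case insensitive. */
    static boolean contains(String text, String search) {
        return normalize(text).toLowerCase(Locale.ROOT)
                .contains(normalize(search).toLowerCase(Locale.ROOT));
    }

    /** Like :containsWholeText(search): exact, case sensitive, whitespace kept. */
    static boolean containsWholeText(String text, String search) {
        return text.contains(search);
    }

    /** Like :matches(regex): the pattern is searched for in the normalized text. */
    static boolean matches(String text, String regex) {
        return Pattern.compile(regex).matcher(normalize(text)).find();
    }

    /** Like :matchesWholeText(regex): the pattern is searched for in the raw text. */
    static boolean matchesWholeText(String text, String regex) {
        return Pattern.compile(regex).matcher(text).find();
    }
}
```

Because normalization collapses runs of whitespace, a pattern such as \\s{2,} can only succeed against whole text. That is why the :matchesWholeText example above uses it.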

Structural pseudo selectors

:root The element that is the root of the document. In HTML, this is the html element :root
:nth-child(an+b)

elements that have an+b-1 siblings before them in the document tree, for any positive integer or zero value of n, and that have a parent element. For values of a and b greater than zero, this effectively divides the element's children into groups of a elements (the last group taking the remainder), and selects the bth element of each group. For example, this allows a selector to address every other row in a table, or to alternate the color of paragraph text in a cycle of four. The a and b values must be integers (positive, negative, or zero). The index of the first child of an element is 1.

Additionally, :nth-child() supports odd and even as arguments. odd is the same as 2n+1, and even is the same as 2n.
tr:nth-child(2n+1) finds every odd row of a table. :nth-child(10n-1) finds the 9th, 19th, 29th, etc., element. li:nth-child(5) finds the 5th li
:nth-last-child(an+b) elements that have an+b-1 siblings after them in the document tree. Otherwise like :nth-child() tr:nth-last-child(-n+2) finds the last two rows of a table
:nth-of-type(an+b) pseudo-class notation represents an element that has an+b-1 siblings with the same expanded element name before it in the document tree, for any zero or positive integer value of n, and has a parent element img:nth-of-type(2n+1)
:nth-last-of-type(an+b) pseudo-class notation represents an element that has an+b-1 siblings with the same expanded element name after it in the document tree, for any zero or positive integer value of n, and has a parent element img:nth-last-of-type(2n+1)
:first-child elements that are the first child of some other element. div > p:first-child
:last-child elements that are the last child of some other element. ol > li:last-child
:first-of-type elements that are the first sibling of their type in the list of children of their parent element dl dt:first-of-type
:last-of-type elements that are the last sibling of their type in the list of children of their parent element tr > td:last-of-type
:only-child elements that have a parent element and whose parent element has no other element children
:only-of-type an element that has a parent element and whose parent element has no other element children with the same expanded element name
:empty elements that have no children at all
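The an+b rule shared by :nth-child() and :nth-last-child() can be checked arithmetically. A 1-based position p matches when p = an + b for some integer n >= 0; odd and even correspond to (a, b) = (2, 1) and (2, 0). NthIndex is a hypothetical helper written to illustrate that rule, not jsoup's implementation. positionFromEnd assumes :nth-last-child() simply counts positions from the last sibling. These positions are 1-based, whereas :lt(n), :gt(n) and :eq(n) use a 0-based sibling index (td:eq(0) is the first cell).

```java
/**
 * Hypothetical sketch of the an+b test used by :nth-child() and related
 * selectors. Positions are 1-based. Not jsoup's implementation.
 */
final class NthIndex {
    private NthIndex() {}

    /** True if some integer n >= 0 satisfies a*n + b == position. */
    static boolean matches(int a, int b, int position) {
        // With a == 0 only position b matches, e.g. li:nth-child(5).
        if (a == 0) return position == b;
        int diff = position - b;
        // n = diff / a must be a whole number and must not be negative.
        return diff % a == 0 && diff / a >= 0;
    }

    /** 1-based position counted from the last sibling, as :nth-last-child() uses. */
    static int positionFromEnd(int position, int siblingCount) {
        return siblingCount - position + 1;
    }
}
```

For a table with five rows, tr:nth-last-child(-n+2) corresponds to matches(-1, 2, positionFromEnd(p, 5)), which holds only for rows 4 and 5.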

A word on using regular expressions in these selectors: depending on the content of the regex, you will need to quote the pattern using Pattern.quote(regex) for it to parse correctly through both the selector parser and the regex parser. E.g. String query = "div:matches(" + Pattern.quote(regex) + ")";
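When the text passed to :matches() should be taken literally, Pattern.quote (from java.util.regex) wraps it in \Q...\E so that regex metacharacters such as + lose their special meaning. The following is a minimal sketch of building such a query string. QuotedQuery and matchesQuery are hypothetical names, and the resulting string would then be passed to a select call.

```java
import java.util.regex.Pattern;

/** Hypothetical helper that builds a :matches() query for a literal search string. */
final class QuotedQuery {
    private QuotedQuery() {}

    static String matchesQuery(String tag, String literal) {
        // Pattern.quote returns "\Q" + literal + "\E" (when literal has no "\E"),
        // so the regex engine treats every character of literal as plain text.
        return tag + ":matches(" + Pattern.quote(literal) + ")";
    }
}
```

For example, matchesQuery("div", "1+1=2") yields div:matches(\Q1+1=2\E). Unquoted, 1+1=2 would mean "one or more 1s, then 1=2" and would not match the text "1+1=2".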

Escaping special characters: to match a tag, ID, or other selector that does not follow the regular CSS syntax, the query must be escaped with the \ character. For example, to match by ID <p id="i.d">, use document.select("#i\\.d").
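Building an escaped ID query by hand is error-prone, so a small helper can add the backslashes. escapeIdentifier is hypothetical and deliberately simplified. It escapes every character other than letters, digits, '-' and '_', which covers the i.d case above, but it is not a full implementation of CSS identifier escaping. The Java literal "#i\\.d" contains a single backslash at runtime.

```java
/**
 * Hypothetical, simplified helper: backslash-escape every character that is not
 * a letter, digit, '-' or '_', so the value can follow '#' in a query.
 * Not jsoup's escaping routine and not a complete CSS escaper.
 */
final class CssEscape {
    private CssEscape() {}

    static String escapeIdentifier(String id) {
        StringBuilder sb = new StringBuilder(id.length());
        for (int i = 0; i < id.length(); i++) {
            char c = id.charAt(i);
            if (!(Character.isLetterOrDigit(c) || c == '-' || c == '_')) {
                sb.append('\\'); // CSS escape character before the special char
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```

For example, "#" + CssEscape.escapeIdentifier("i.d") produces the same query string as the "#i\\.d" literal above.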

See Also:
  • Nested Class Summary

    Nested Classes
    Modifier and Type    Class                              Description
    static class         Selector.SelectorParseException
  • Method Summary

    Modifier and Type         Method                                           Description
    static Elements           select(String query, Iterable<Element> roots)    Find elements matching selector.
    static Elements           select(String query, Element root)               Find elements matching selector.
    static Elements           select(Evaluator evaluator, Element root)        Find elements matching selector.
    static @Nullable Element  selectFirst(String cssQuery, Element root)       Find the first element that matches the query.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Method Details

    • select

      public static Elements select(String query, Element root)
      Find elements matching selector.
      Parameters:
      query - CSS selector
      root - root element to descend into
      Returns:
      matching elements, empty if none
      Throws:
      Selector.SelectorParseException - (unchecked) on an invalid CSS query.
    • select

      public static Elements select(Evaluator evaluator, Element root)
      Find elements matching selector.
      Parameters:
      evaluator - evaluator to match elements against (for example, a parsed CSS query)
      root - root element to descend into
      Returns:
      matching elements, empty if none
    • select

      public static Elements select(String query, Iterable<Element> roots)
      Find elements matching selector.
      Parameters:
      query - CSS selector
      roots - root elements to descend into
      Returns:
      matching elements, empty if none
    • selectFirst

      public static @Nullable Element selectFirst(String cssQuery, Element root)
      Find the first element that matches the query.
      Parameters:
      cssQuery - CSS selector
      root - root element to descend into
      Returns:
      the matching element, or null if none.