Package org.jsoup.parser

Class TokenQueue

java.lang.Object
org.jsoup.parser.TokenQueue

public class TokenQueue extends Object
A character reader with helpers focused on parsing CSS selectors. Used internally by jsoup. The API is subject to change.
  • Constructor Summary

    Constructors
    Constructor
    Description
    TokenQueue(String data)
    Create a new TokenQueue.
  • Method Summary

    Modifier and Type
    Method
    Description
    void
    addFirst(String seq)
    Deprecated.
    will be removed in 1.21.1.
    void
    advance()
    Drops the next character off the queue.
    String
    chompBalanced(char open, char close)
    Pulls a balanced string off the queue.
    String
    chompTo(String seq)
    Deprecated.
    will be removed in 1.21.1.
    String
    chompToIgnoreCase(String seq)
    Deprecated.
    will be removed in 1.21.1.
    char
    consume()
    Consume one character off queue.
    void
    consume(String seq)
    Consumes the supplied sequence off the queue, case-insensitively.
    String
    consumeCssIdentifier()
    Consume a CSS identifier (ID or class) off the queue.
    String
    consumeElementSelector()
    Consume a CSS element selector: a tag name, with | instead of : as the namespace separator (or *| for a wildcard namespace), so that it does not conflict with :pseudo selectors.
    String
    consumeTo(String seq)
    Pulls a string off the queue, up to but exclusive of the match sequence, or to the queue running out.
    String
    consumeToAny(String... seq)
    Consumes to the first sequence provided, or to the end of the queue.
    String
    consumeToIgnoreCase(String seq)
    Deprecated.
    boolean
    consumeWhitespace()
    Pulls the next run of whitespace characters off the queue.
    String
    consumeWord()
    Deprecated.
    will be removed in 1.21.1.
    static String
    escapeCssIdentifier(String in)
    Given a CSS identifier (such as a tag, ID, or class), escape any CSS special characters that would otherwise not be valid in a selector.
    boolean
    isEmpty()
    Is the queue empty?
    boolean
    matchChomp(char c)
    If the queue matches the supplied (case-sensitive) character, consume it off the queue.
    boolean
    matchChomp(String seq)
    If the queue case-insensitively matches the supplied string, consume it off the queue.
    boolean
    matches(char c)
    Tests if the next character on the queue matches the character, case-sensitively.
    boolean
    matches(String seq)
    Tests if the next characters on the queue match the sequence, case-insensitively.
    boolean
    matchesAny(char... seq)
    Tests if the next character matches any of the supplied characters, case-sensitively.
    boolean
    matchesAny(String... seq)
    Deprecated.
    will be removed in 1.21.1.
    boolean
    matchesWhitespace()
    Tests if queue starts with a whitespace character.
    boolean
    matchesWord()
    Test if the queue matches a tag word character (letter or digit).
    String
    remainder()
    Consume and return whatever is left on the queue.
    String
    toString()
     
    static String
    unescape(String in)
    Unescape a \ escaped string.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
  • Constructor Details

    • TokenQueue

      public TokenQueue(String data)
      Create a new TokenQueue.
      Parameters:
      data - string of data to back queue.
  • Method Details

    • isEmpty

      public boolean isEmpty()
      Is the queue empty?
      Returns:
      true if no data left in queue.
    • consume

      public char consume()
      Consume one character off queue.
      Returns:
      first character on queue.
    • advance

      public void advance()
      Drops the next character off the queue.
    • addFirst

      @Deprecated public void addFirst(String seq)
      Deprecated.
      will be removed in 1.21.1.
      Internal method, no longer supported.
    • matches

      public boolean matches(String seq)
      Tests if the next characters on the queue match the sequence, case-insensitively.
      Parameters:
      seq - String to check queue for.
      Returns:
      true if the next characters match.
    • matches

      public boolean matches(char c)
      Tests if the next character on the queue matches the character, case-sensitively.
    • matchesAny

      @Deprecated public boolean matchesAny(String... seq)
      Deprecated.
      will be removed in 1.21.1.
    • matchesAny

      public boolean matchesAny(char... seq)
      Tests if the next character matches any of the supplied characters, case-sensitively.
      Parameters:
      seq - list of chars to case-sensitively check for
      Returns:
      true if any matched, false if none did
    • matchChomp

      public boolean matchChomp(String seq)
      If the queue case-insensitively matches the supplied string, consume it off the queue.
      Parameters:
      seq - String to search for, and if found, remove from queue.
      Returns:
      true if found and removed, false if not found.
    • matchChomp

      public boolean matchChomp(char c)
      If the queue matches the supplied (case-sensitive) character, consume it off the queue.
    • matchesWhitespace

      public boolean matchesWhitespace()
      Tests if queue starts with a whitespace character.
      Returns:
      true if the queue starts with a whitespace character
    • matchesWord

      public boolean matchesWord()
      Test if the queue matches a tag word character (letter or digit).
      Returns:
      true if the queue matches a word character
    • consume

      public void consume(String seq)
      Consumes the supplied sequence off the queue, case-insensitively. If the queue does not start with the supplied sequence, an IllegalStateException is thrown, so check with matches(String) first.
      Parameters:
      seq - sequence to remove from head of queue.
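
      The guard-then-consume pattern described above can be illustrated with a small stand-in class. This is a minimal sketch in plain Java, not jsoup's implementation: the class name GuardedConsumeSketch and its internals are hypothetical, and only the method names mirror the ones documented here.

```java
/** Hypothetical stand-in for a token queue; not jsoup's implementation. */
public class GuardedConsumeSketch {
    private final String data;
    private int pos = 0; // index of the next unread character

    public GuardedConsumeSketch(String data) {
        this.data = data;
    }

    /** Case-insensitive test of whether the unread input starts with seq. */
    public boolean matches(String seq) {
        return data.regionMatches(true, pos, seq, 0, seq.length());
    }

    /** Removes seq from the head of the queue, or throws if it is not there. */
    public void consume(String seq) {
        if (!matches(seq)) {
            throw new IllegalStateException("Queue did not match expected sequence");
        }
        pos += seq.length();
    }

    /** Returns everything still unread and empties the queue. */
    public String remainder() {
        String rest = data.substring(pos);
        pos = data.length();
        return rest;
    }

    public static void main(String[] args) {
        GuardedConsumeSketch q = new GuardedConsumeSketch("DIV > p");
        if (q.matches("div")) { // guard first, so consume cannot throw
            q.consume("div");
        }
        System.out.println("[" + q.remainder() + "]"); // prints "[ > p]"
    }
}
```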
    • consumeTo

      public String consumeTo(String seq)
      Pulls a string off the queue, up to but exclusive of the match sequence, or to the queue running out.
      Parameters:
      seq - String to end on (and not include in return, but leave on queue). Case-sensitive.
      Returns:
      The matched data consumed from queue.
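
      To make the "up to but exclusive of" behavior concrete, here is a minimal plain-Java sketch. It assumes the queue can be modeled as a plain String and returns the consumed part and what would stay on the queue as a two-element array; the class ConsumeToSketch and that return shape are hypothetical, not part of jsoup.

```java
/** Hypothetical illustration of consume-to semantics; not jsoup's implementation. */
public class ConsumeToSketch {

    /**
     * Splits queue at the first case-sensitive occurrence of seq.
     * Returns {consumed, leftOnQueue}; the terminator stays on the queue.
     * If seq never occurs, the whole queue is consumed.
     */
    public static String[] consumeTo(String queue, String seq) {
        int offset = queue.indexOf(seq);
        if (offset == -1) {
            return new String[] { queue, "" };
        }
        return new String[] { queue.substring(0, offset), queue.substring(offset) };
    }

    public static void main(String[] args) {
        String[] found = consumeTo("one=two", "=");
        System.out.println(found[0] + " | " + found[1]); // prints "one | =two"

        String[] missing = consumeTo("one", "=");
        System.out.println(missing[0] + " | " + missing[1]); // prints "one | "
    }
}
```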
    • consumeToIgnoreCase

      @Deprecated public String consumeToIgnoreCase(String seq)
      Deprecated.
    • consumeToAny

      public String consumeToAny(String... seq)
      Consumes to the first sequence provided, or to the end of the queue. Leaves the terminator on the queue.
      Parameters:
      seq - any number of terminators to consume to. Case-insensitive.
      Returns:
      consumed string
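
      A rough sketch of this multi-terminator scan follows. It assumes the queue is modeled as a plain String and that scanning stops at the earliest position where any terminator matches case-insensitively; the class ConsumeToAnySketch and its {consumed, leftOnQueue} return shape are hypothetical and are not jsoup's code.

```java
/** Hypothetical illustration of consume-to-any semantics; not jsoup's implementation. */
public class ConsumeToAnySketch {

    /** True if queue, read from pos, starts with any terminator (ignoring case). */
    private static boolean startsWithAny(String queue, int pos, String... terminators) {
        for (String t : terminators) {
            if (queue.regionMatches(true, pos, t, 0, t.length())) {
                return true;
            }
        }
        return false;
    }

    /** Returns {consumed, leftOnQueue}; the matched terminator is left on the queue. */
    public static String[] consumeToAny(String queue, String... terminators) {
        int pos = 0;
        while (pos < queue.length() && !startsWithAny(queue, pos, terminators)) {
            pos++;
        }
        return new String[] { queue.substring(0, pos), queue.substring(pos) };
    }

    public static void main(String[] args) {
        String[] r = consumeToAny("h1.title > p", ".", " ");
        System.out.println(r[0] + " | " + r[1]); // prints "h1 | .title > p"
    }
}
```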
    • chompTo

      @Deprecated public String chompTo(String seq)
      Deprecated.
      will be removed in 1.21.1
      Pulls a string off the queue (like consumeTo), and then pulls off the matched string (but does not return it).

      If the queue runs out of characters before finding the seq, will return as much as it can (and queue will go isEmpty() == true).

      Parameters:
      seq - String to match up to, and not include in return, and to pull off queue. Case-sensitive.
      Returns:
      Data matched from queue.
    • chompToIgnoreCase

      @Deprecated public String chompToIgnoreCase(String seq)
      Deprecated.
      will be removed in 1.21.1.
    • chompBalanced

      public String chompBalanced(char open, char close)
      Pulls a balanced string off the queue. E.g. if the queue is "(one (two) three) four", chompBalanced('(', ')') will return "one (two) three" and leave " four" on the queue. Unbalanced openers and closers can be quoted (with ' or ") or escaped (with \). Those escapes are left in the returned string, which suits regexes (where the escape must be preserved) but not contains-text strings; use unescape(String) for those.
      Parameters:
      open - opener
      close - closer
      Returns:
      data matched from the queue
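
      The balancing rules above (nesting depth, quoting, backslash escapes) can be sketched as a single pass over the input. This is a simplified plain-Java illustration, not jsoup's implementation: the class BalancedSketch and its {balanced, leftOnQueue} return shape are hypothetical, and it assumes open and close are distinct, non-quote characters.

```java
/** Hypothetical illustration of balanced chomping; not jsoup's implementation. */
public class BalancedSketch {

    /** Returns {balancedContent, leftOnQueue}; throws if no balanced closer is found. */
    public static String[] chompBalanced(String queue, char open, char close) {
        int depth = 0;
        int start = -1;          // index just after the first opener
        char quote = 0;          // active quote character, or 0 when outside quotes
        boolean escaped = false; // true when the previous char was an escaping backslash

        for (int i = 0; i < queue.length(); i++) {
            char c = queue.charAt(i);
            if (escaped) {
                escaped = false;             // an escaped char is never a delimiter
            } else if (c == '\\') {
                escaped = true;
            } else if (quote != 0) {
                if (c == quote) quote = 0;   // closing quote
            } else if (c == '\'' || c == '"') {
                quote = c;                   // opening quote
            } else if (c == open) {
                depth++;
                if (start == -1) start = i + 1;
            } else if (c == close && depth > 0) {
                depth--;
                if (depth == 0) {
                    // Escapes and quotes are kept verbatim in the returned content.
                    return new String[] { queue.substring(start, i), queue.substring(i + 1) };
                }
            }
        }
        throw new IllegalArgumentException("Did not find balanced marker in '" + queue + "'");
    }

    public static void main(String[] args) {
        String[] r = chompBalanced("(one (two) three) four", '(', ')');
        System.out.println("[" + r[0] + "] [" + r[1] + "]"); // prints "[one (two) three] [ four]"
    }
}
```

      Tracking a single depth counter keeps the scan linear in the input length; the quote and escape flags only decide whether a character is allowed to change that counter.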
    • unescape

      public static String unescape(String in)
      Unescape a \ escaped string.
      Parameters:
      in - backslash escaped string
      Returns:
      unescaped string
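
      As an illustration of backslash unescaping, the sketch below drops each escaping backslash and keeps the character it protects, so a doubled backslash becomes a single one. It is a minimal plain-Java approximation under that assumption; the class UnescapeSketch is hypothetical and edge cases (such as a trailing lone backslash, which is dropped here) may differ from jsoup.

```java
/** Hypothetical illustration of backslash unescaping; not jsoup's implementation. */
public class UnescapeSketch {

    /** Removes escaping backslashes, keeping the characters they escape. */
    public static String unescape(String in) {
        StringBuilder out = new StringBuilder(in.length());
        boolean escaped = false; // true when the previous char was an escaping backslash
        for (char c : in.toCharArray()) {
            if (!escaped && c == '\\') {
                escaped = true;  // swallow the backslash, keep whatever follows
                continue;
            }
            out.append(c);
            escaped = false;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("one \\(two\\) three")); // prints "one (two) three"
        System.out.println(unescape("a\\\\b"));              // prints "a\b"
    }
}
```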
    • escapeCssIdentifier

      public static String escapeCssIdentifier(String in)
      Given a CSS identifier (such as a tag, ID, or class), escape any CSS special characters that would otherwise not be valid in a selector.
    • consumeWhitespace

      public boolean consumeWhitespace()
      Pulls the next run of whitespace characters off the queue.
      Returns:
      true if any whitespace was consumed, false otherwise
    • consumeWord

      @Deprecated public String consumeWord()
      Deprecated.
      will be removed in 1.21.1.
      Retrieves the next run of word type (letter or digit) off the queue.
      Returns:
      String of word characters from queue, or empty string if none.
    • consumeElementSelector

      public String consumeElementSelector()
      Consume a CSS element selector: a tag name, with | instead of : as the namespace separator (or *| for a wildcard namespace), so that it does not conflict with :pseudo selectors.
      Returns:
      tag name
    • consumeCssIdentifier

      public String consumeCssIdentifier()
      Consume a CSS identifier (ID or class) off the queue.

      Note: For backwards compatibility this method supports improperly formatted CSS identifiers, e.g. 1 instead of \31.

      Returns:
      The unescaped identifier.
      Throws:
      IllegalArgumentException - if an invalid escape sequence was found. Afterward, the state of the TokenQueue is undefined.
    • remainder

      public String remainder()
      Consume and return whatever is left on the queue.
      Returns:
      remainder of queue.
    • toString

      public String toString()
      Overrides:
      toString in class Object