Package org.jsoup.safety

Class Cleaner

java.lang.Object
org.jsoup.safety.Cleaner

public class Cleaner extends Object
The safelist-based HTML cleaner. Use it to ensure that end-user-provided HTML contains only the elements and attributes that you expect; no junk, and no cross-site scripting attacks!

The HTML cleaner parses the input as HTML and then filters it through a safelist, so the output can contain only HTML that the safelist allows.

The input HTML is assumed to be a body fragment; the clean methods only pull from the source's body, and the canned safelists only allow tags that belong in the body.

In most cases, use the clean methods in Jsoup rather than interacting directly with a Cleaner object.
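
As a minimal sketch of that convenience route, the snippet below calls Jsoup.clean(String, Safelist) on a body fragment. The class name, the sanitize helper, and the sample markup are illustrative assumptions, and Safelist.basic() is only one of the canned safelists that could be used here.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Safelist;

public class JsoupCleanExample {
    // Hypothetical helper: sanitizes an untrusted body fragment with the basic safelist.
    static String sanitize(String untrustedBodyHtml) {
        return Jsoup.clean(untrustedBodyHtml, Safelist.basic());
    }

    public static void main(String[] args) {
        String dirty = "<p>Hello <script>alert(1)</script><b>world</b></p>";
        // script is not in the basic safelist, so it is dropped;
        // p and b are allowed and kept.
        System.out.println(sanitize(dirty));
    }
}
```

Working with a Cleaner instance directly is mainly useful when a Document has already been parsed, or when the same safelist is applied to many inputs.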

  • Constructor Summary

    Constructors
    Constructor
    Description
    Cleaner(Safelist safelist)
    Creates a new cleaner that sanitizes documents using the supplied safelist.
  • Method Summary

    Modifier and Type
    Method
    Description
    Document
    clean(Document dirtyDocument)
    Creates a new, clean document, from the original dirty document, containing only elements allowed by the safelist.
    boolean
    isValid(Document dirtyDocument)
    Determines if the input document's body is valid, against the safelist.
    boolean
    isValidBodyHtml(String bodyHtml)
    Determines if the input document's body HTML is valid, against the safelist.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • Cleaner

      public Cleaner(Safelist safelist)
      Creates a new cleaner that sanitizes documents using the supplied safelist.
      Parameters:
      safelist - safelist to clean with
  • Method Details

    • clean

      public Document clean(Document dirtyDocument)
      Creates a new, clean document, from the original dirty document, containing only elements allowed by the safelist. The original document is not modified. Only elements from the dirty document's body are used. The OutputSettings of the original document are cloned into the clean document.
      Parameters:
      dirtyDocument - Untrusted base document to clean.
      Returns:
      cleaned document.
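
A short sketch of clean(Document) in use follows. The class name, the cleanBodyHtml helper, and the sample markup are assumptions made for illustration; the choice of Safelist.basic() is arbitrary.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.safety.Cleaner;
import org.jsoup.safety.Safelist;

public class CleanerCleanExample {
    // Hypothetical helper: parses untrusted HTML, cleans it, and returns the clean body markup.
    static String cleanBodyHtml(String untrustedHtml) {
        Document dirty = Jsoup.parse(untrustedHtml);
        Cleaner cleaner = new Cleaner(Safelist.basic());
        Document clean = cleaner.clean(dirty); // returns a new document
        return clean.body().html();
    }

    public static void main(String[] args) {
        String html = "<p onclick=\"x()\">Hi</p><script>alert(1)</script>";
        Document dirty = Jsoup.parse(html);
        Document clean = new Cleaner(Safelist.basic()).clean(dirty);

        // The clean copy keeps only what the safelist allows.
        System.out.println(clean.body().html());
        // The original document is left as parsed, so the script element is still there.
        System.out.println(dirty.select("script").size());
    }
}
```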
    • isValid

      public boolean isValid(Document dirtyDocument)
      Determines if the input document's body is valid, against the safelist. It is considered valid if all the tags and attributes in the input HTML are allowed by the safelist, and there is no content in the head.

      This method is intended to be used in a user interface as a validator for user input. Note that regardless of the output of this method, the input document must always be normalized using a method such as clean(Document), and the result of that method used to store or serialize the document before later reuse such as presentation to end users. This ensures that enforced attributes are set correctly, and that any differences between how a given browser and how jsoup parses the input HTML are normalized.

      Example:

      
           Document inputDoc = Jsoup.parse(inputHtml);
           Cleaner cleaner = new Cleaner(Safelist.relaxed());
           boolean isValid = cleaner.isValid(inputDoc);
           Document normalizedDoc = cleaner.clean(inputDoc);
           

      Parameters:
      dirtyDocument - document to test
      Returns:
      true if no tags or attributes need to be removed; false if they do
    • isValidBodyHtml

      public boolean isValidBodyHtml(String bodyHtml)
      Determines if the input document's body HTML is valid, against the safelist. It is considered valid if all the tags and attributes in the input HTML are allowed by the safelist.

      This method is intended to be used in a user interface as a validator for user input. Note that regardless of the output of this method, the input document must always be normalized using a method such as clean(Document), and the result of that method used to store or serialize the document before later reuse such as presentation to end users. This ensures that enforced attributes are set correctly, and that any differences between how a given browser and how jsoup parses the input HTML are normalized.

      Example:

      
           Document inputDoc = Jsoup.parse(inputHtml);
           Cleaner cleaner = new Cleaner(Safelist.relaxed());
           boolean isValid = cleaner.isValidBodyHtml(inputHtml);
           Document normalizedDoc = cleaner.clean(inputDoc);
           

      Parameters:
      bodyHtml - HTML fragment to test
      Returns:
      true if no tags or attributes need to be removed; false if they do
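
The point about enforced attributes can be illustrated with a small sketch. Safelist.basic() enforces rel="nofollow" on a elements, so a link without that attribute should still pass validation (nothing has to be removed), yet the cleaned output differs from the input. The class name, helper methods, and sample link are assumptions made for this example.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.safety.Cleaner;
import org.jsoup.safety.Safelist;

public class ValidateThenCleanExample {
    // Hypothetical helper: validates a body fragment against the basic safelist.
    static boolean isValid(String bodyHtml) {
        return new Cleaner(Safelist.basic()).isValidBodyHtml(bodyHtml);
    }

    // Hypothetical helper: returns the normalized (cleaned) body markup.
    static String normalize(String bodyHtml) {
        Document dirty = Jsoup.parseBodyFragment(bodyHtml);
        return new Cleaner(Safelist.basic()).clean(dirty).body().html();
    }

    public static void main(String[] args) {
        String link = "<a href=\"http://example.com/\">Example</a>";
        // Validation only reports whether anything would need to be removed.
        System.out.println(isValid(link));
        // Cleaning additionally applies enforced attributes such as rel="nofollow".
        System.out.println(normalize(link));
    }
}
```

This is why the validation result alone is not enough: the value to store or serve is the output of clean, not the original input.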