org.jsoup.safety
Class Whitelist

java.lang.Object
  extended by org.jsoup.safety.Whitelist

public class Whitelist
extends Object

Whitelists define what HTML (elements and attributes) to allow through the cleaner. Everything else is removed.

Start with one of the defaults: none(), simpleText(), basic(), basicWithImages(), or relaxed().

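If you need to allow more through (please be careful!), tweak a base whitelist with addTags(String...), addAttributes(String, String...), addEnforcedAttribute(String, String, String), or addProtocols(String, String, String...).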

The cleaner and these whitelists assume that you want to clean a body fragment of HTML (to add user-supplied HTML into a templated page), and not a full HTML document. If you do need to clean a full document, either wrap the document HTML around the cleaned body HTML, or create a whitelist that allows html and head elements as appropriate.

If you are going to extend a whitelist, please be very careful. Make sure you understand what attributes may lead to XSS attack vectors. URL attributes are particularly vulnerable and require careful validation. See http://ha.ckers.org/xss.html for some XSS attack examples.

Author:
Jonathan Hedley

Constructor Summary
Whitelist()
          Create a new, empty whitelist.
 
Method Summary
 Whitelist addAttributes(String tag, String... keys)
          Add a list of allowed attributes to a tag.
 Whitelist addEnforcedAttribute(String tag, String key, String value)
          Add an enforced attribute to a tag.
 Whitelist addProtocols(String tag, String key, String... protocols)
          Add allowed URL protocols for an element's URL attribute.
 Whitelist addTags(String... tags)
          Add a list of allowed elements to a whitelist.
static Whitelist basic()
          This whitelist allows a fuller range of text elements: a, b, blockquote, br, cite, code, dd, dl, dt, em, i, li, ol, p, pre, q, small, strike, strong, sub, sup, u, ul, and appropriate attributes.
static Whitelist basicWithImages()
          This whitelist allows the same text tags as basic(), and also allows img tags, with appropriate attributes, with src pointing to http or https.
protected  boolean isSafeAttribute(String tagName, Element el, Attribute attr)
          Test if the supplied attribute is allowed by this whitelist for this tag.
protected  boolean isSafeTag(String tag)
          Test if the supplied tag is allowed by this whitelist.
static Whitelist none()
          This whitelist allows only text nodes: all HTML will be stripped.
 Whitelist preserveRelativeLinks(boolean preserve)
          Configure this Whitelist to preserve relative links in an element's URL attribute, or convert them to absolute links.
static Whitelist relaxed()
          This whitelist allows a full range of text and structural body HTML: a, b, blockquote, br, caption, cite, code, col, colgroup, dd, dl, dt, em, h1, h2, h3, h4, h5, h6, i, img, li, ol, p, pre, q, small, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, u, ul.

Links do not have an enforced rel=nofollow attribute, but you can add that if desired.
static Whitelist simpleText()
          This whitelist allows only simple text formatting: b, em, i, strong, u.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Whitelist

public Whitelist()
Create a new, empty whitelist. Generally it will be better to start with a default prepared whitelist instead.

See Also:
basic(), basicWithImages(), simpleText(), relaxed()
Method Detail

none

public static Whitelist none()
This whitelist allows only text nodes: all HTML will be stripped.

Returns:
whitelist

simpleText

public static Whitelist simpleText()
This whitelist allows only simple text formatting: b, em, i, strong, u. All other HTML (tags and attributes) will be removed.

Returns:
whitelist

basic

public static Whitelist basic()
This whitelist allows a fuller range of text elements: a, b, blockquote, br, cite, code, dd, dl, dt, em, i, li, ol, p, pre, q, small, strike, strong, sub, sup, u, ul, and appropriate attributes.

Links (a elements) can point to http, https, ftp, or mailto URLs, and have an enforced rel=nofollow attribute.

Does not allow images.

Returns:
whitelist

basicWithImages

public static Whitelist basicWithImages()
This whitelist allows the same text tags as basic(), and also allows img tags, with appropriate attributes, with src pointing to http or https.

Returns:
whitelist

relaxed

public static Whitelist relaxed()
This whitelist allows a full range of text and structural body HTML: a, b, blockquote, br, caption, cite, code, col, colgroup, dd, dl, dt, em, h1, h2, h3, h4, h5, h6, i, img, li, ol, p, pre, q, small, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, u, ul.

Links do not have an enforced rel=nofollow attribute, but you can add that if desired.

Returns:
whitelist

addTags

public Whitelist addTags(String... tags)
Add a list of allowed elements to a whitelist. (If a tag is not allowed, it will be removed from the HTML.)

Parameters:
tags - tag names to allow
Returns:
this (for chaining)

addAttributes

public Whitelist addAttributes(String tag,
                               String... keys)
Add a list of allowed attributes to a tag. (If an attribute is not allowed on an element, it will be removed.)

E.g.: addAttributes("a", "href", "class") allows href and class attributes on a tags.

To make an attribute valid for all tags, use the pseudo tag :all, e.g. addAttributes(":all", "class").

Parameters:
tag - The tag the attributes are for. The tag will be added to the allowed tag list if necessary.
keys - List of valid attributes for the tag
Returns:
this (for chaining)
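
The lookup rule described above (an attribute is accepted if it is listed for the tag itself or under the :all pseudo tag) can be sketched as follows. This is a stand-alone model using only java.util, not jsoup's implementation; the class name AttributeAllowListSketch and its methods allow and isAllowed are hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-alone model (not jsoup code) of a per-tag attribute
// allow-list with an ":all" pseudo tag.
final class AttributeAllowListSketch {
    private static final String ALL_TAGS = ":all";
    private final Map<String, Set<String>> allowed = new HashMap<>();

    // Same shape as addAttributes(tag, keys...): records the keys for the tag
    // and returns this so calls can be chained.
    AttributeAllowListSketch allow(String tag, String... keys) {
        allowed.computeIfAbsent(tag, t -> new HashSet<>()).addAll(Arrays.asList(keys));
        return this;
    }

    // An attribute passes if it is listed for the tag itself or under ":all".
    boolean isAllowed(String tag, String key) {
        return contains(tag, key) || contains(ALL_TAGS, key);
    }

    private boolean contains(String tag, String key) {
        Set<String> keys = allowed.get(tag);
        return keys != null && keys.contains(key);
    }

    public static void main(String[] args) {
        AttributeAllowListSketch sketch = new AttributeAllowListSketch()
                .allow("a", "href", "class")
                .allow(":all", "title");
        System.out.println(sketch.isAllowed("a", "href"));    // listed for a
        System.out.println(sketch.isAllowed("p", "title"));   // listed under :all
        System.out.println(sketch.isAllowed("p", "onclick")); // not listed anywhere
    }
}
```

In this model, anything not explicitly listed is rejected, which matches the "everything else is removed" behaviour the class description states.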

addEnforcedAttribute

public Whitelist addEnforcedAttribute(String tag,
                                      String key,
                                      String value)
Add an enforced attribute to a tag. An enforced attribute will always be added to the element. If the element already has the attribute set, it will be overridden.

E.g.: addEnforcedAttribute("a", "rel", "nofollow") will make all a tags output as <a href="..." rel="nofollow">

Parameters:
tag - The tag the enforced attribute is for. The tag will be added to the allowed tag list if necessary.
key - The attribute key
value - The enforced attribute value
Returns:
this (for chaining)
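
The "always added, and overrides an existing value" rule can be illustrated with a small stand-alone model. It uses plain maps rather than jsoup's element and attribute types, and the class EnforcedAttributeSketch and method applyEnforced are hypothetical names, so treat it as a sketch of the rule rather than the library's code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-alone model (not jsoup code) of enforced attributes.
final class EnforcedAttributeSketch {

    // Returns a copy of the element's attributes with the enforced ones applied
    // on top: missing keys are added, existing keys are overridden.
    static Map<String, String> applyEnforced(Map<String, String> attributes,
                                             Map<String, String> enforced) {
        Map<String, String> result = new LinkedHashMap<>(attributes);
        result.putAll(enforced);
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("href", "http://example.com/");
        attrs.put("rel", "author"); // user-supplied value that will be overridden

        Map<String, String> enforced = new LinkedHashMap<>();
        enforced.put("rel", "nofollow");

        // rel ends up as "nofollow"; href is left untouched.
        System.out.println(applyEnforced(attrs, enforced));
    }
}
```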

preserveRelativeLinks

public Whitelist preserveRelativeLinks(boolean preserve)
Configure this Whitelist to preserve relative links in an element's URL attribute, or convert them to absolute links. By default, this is false: URLs will be made absolute (i.e. they will start with an allowed protocol, such as http://).

Note that when handling relative links, the input document must have an appropriate base URI set when parsing, so that the link's protocol can be confirmed. Regardless of the setting of the preserve relative links option, the link must be resolvable against the base URI to an allowed protocol; otherwise the attribute will be removed.

Parameters:
preserve - true to preserve relative links, false (default) to convert them to absolute links
Returns:
this Whitelist, for chaining.
See Also:
addProtocols(java.lang.String, java.lang.String, java.lang.String...)
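
The interaction between the base URI, the allowed protocols, and the preserve flag can be sketched with java.net.URI alone. This is an illustrative model of the rule described above, not jsoup's implementation: the class RelativeLinkSketch, the method cleanUrl, and the use of an empty Optional to stand for "attribute removed" are all assumptions made for the example.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

// Hypothetical stand-alone model (not jsoup code) of relative link handling.
final class RelativeLinkSketch {

    // Resolves value against baseUri and checks the resulting protocol.
    // Returns the attribute value to keep, or empty if the attribute should be removed.
    static Optional<String> cleanUrl(String baseUri, String value,
                                     Set<String> protocols, boolean preserveRelative) {
        URI resolved;
        try {
            resolved = URI.create(baseUri).resolve(value);
        } catch (IllegalArgumentException e) {
            return Optional.empty(); // unparseable URL: treat as unsafe
        }
        String scheme = resolved.getScheme();
        if (scheme == null || !protocols.contains(scheme.toLowerCase(Locale.ROOT))) {
            return Optional.empty(); // does not resolve to an allowed protocol
        }
        // The protocol check always uses the resolved URL; the flag only decides
        // which form of the link is kept.
        return Optional.of(preserveRelative ? value : resolved.toString());
    }

    public static void main(String[] args) {
        Set<String> protocols = new HashSet<>(Arrays.asList("http", "https"));
        String base = "http://example.com/docs/";

        // Kept as the absolute URL http://example.com/docs/page.html
        System.out.println(cleanUrl(base, "page.html", protocols, false));
        // Kept in its relative form, page.html
        System.out.println(cleanUrl(base, "page.html", protocols, true));
        // Removed in either mode: javascript is not an allowed protocol
        System.out.println(cleanUrl(base, "javascript:alert(1)", protocols, true));
    }
}
```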

addProtocols

public Whitelist addProtocols(String tag,
                              String key,
                              String... protocols)
Add allowed URL protocols for an element's URL attribute. This restricts the possible values of the attribute to URLs that use one of the defined protocols.

E.g.: addProtocols("a", "href", "ftp", "http", "https")

Parameters:
tag - Tag the URL protocol is for
key - Attribute key
protocols - List of valid protocols
Returns:
this, for chaining

isSafeTag

protected boolean isSafeTag(String tag)
Test if the supplied tag is allowed by this whitelist.

Parameters:
tag - the tag name to test
Returns:
true if allowed

isSafeAttribute

protected boolean isSafeAttribute(String tagName,
                                  Element el,
                                  Attribute attr)
Test if the supplied attribute is allowed by this whitelist for this tag.

Parameters:
tagName - tag to consider allowing the attribute in
el - element under test, to confirm protocol
attr - attribute under test
Returns:
true if allowed


Copyright © 2009-2013 Jonathan Hedley. All Rights Reserved.