Working with URLs
You have an HTML document that contains relative URLs, which you need to resolve to absolute URLs.
- Make sure you specify a base URI when parsing the document (which is implicit when loading from a URL), and
- Use the abs: attribute prefix to resolve an absolute URL from an attribute:
    Document doc = Jsoup.connect("http://jsoup.org").get();
    Element link = doc.select("a").first();
    String relHref = link.attr("href"); // == "/"
    String absHref = link.attr("abs:href"); // "http://jsoup.org/"
In HTML elements, URLs are often written relative to the document's location, for example <a href="/download">...</a>. When you use the Node.attr(String key) method to get an href attribute, it is returned exactly as specified in the source HTML.
If you want to get an absolute URL, use the attribute key prefix abs:, which causes the attribute value to be resolved against the document's base URI (its original location), for example link.attr("abs:href").
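To make "resolved against the base URI" concrete, here is a minimal sketch that uses only the standard java.net.URI class. It illustrates URL resolution in general and is not a description of how jsoup implements it internally. The class name ResolveSketch, the helper resolve, and the base path http://jsoup.org/cookbook/ are assumptions chosen for illustration.

```java
import java.net.URI;

class ResolveSketch {
    // Resolves a possibly relative reference against a base location.
    // Hypothetical helper for illustration; not part of jsoup.
    static String resolve(String base, String ref) {
        return URI.create(base).resolve(ref).toString();
    }

    public static void main(String[] args) {
        // Assumed base location, used only for this example.
        String base = "http://jsoup.org/cookbook/";

        // A root-relative reference replaces the whole path of the base.
        System.out.println(resolve(base, "/download"));   // http://jsoup.org/download

        // A path-relative reference is appended to the base's directory.
        System.out.println(resolve(base, "intro.html"));  // http://jsoup.org/cookbook/intro.html

        // An already absolute reference is returned unchanged.
        System.out.println(resolve(base, "http://example.com/page")); // http://example.com/page
    }
}
```

Without a base location there is nothing to resolve a relative reference against, so the base has to be known before a relative href can be made absolute.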
For this use case, it is important to specify the base URI when parsing the document.
If you don't want to use the abs: prefix, the method Node.absUrl(String key) does the same thing, but takes the natural attribute key, for example absUrl("href").
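The sketch below pulls these points together for a document parsed from a String, where the base URI must be passed explicitly. It assumes jsoup is on the classpath. The class name AbsUrlSketch, the helper firstAbsHref, and the sample HTML are made up for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class AbsUrlSketch {
    // Parses the HTML with the given base URI and returns the absolute
    // href of the first link. Hypothetical helper; assumes the HTML
    // contains at least one <a> element.
    static String firstAbsHref(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        Element link = doc.select("a").first();
        return link.absUrl("href");
    }

    public static void main(String[] args) {
        String html = "<a href=\"/download\">Download</a>"; // sample input

        // Parsing from a String: the base URI is supplied as the second argument.
        Document doc = Jsoup.parse(html, "http://jsoup.org/");
        Element link = doc.select("a").first();

        System.out.println(link.attr("href"));     // /download (as written in the source)
        System.out.println(link.attr("abs:href")); // http://jsoup.org/download
        System.out.println(link.absUrl("href"));   // http://jsoup.org/download

        // Without a base URI the relative href cannot be resolved,
        // and absUrl returns an empty string.
        Document noBase = Jsoup.parse(html);
        System.out.println(noBase.select("a").first().absUrl("href").isEmpty()); // true
    }
}
```

Checking for an empty result from absUrl is a simple way to notice that a document was parsed without a base URI.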