Working with relative and absolute URLs

Problem

You have an HTML document that contains relative URLs, which you need to resolve to absolute URLs.

Solution

Make sure you specify a base URI when parsing the document (which is implicit when loading from a URL), and
use the abs: attribute prefix to resolve an absolute URL from an attribute:

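import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

Document doc = Jsoup.connect("http://jsoup.org").get();

Element link = doc.select("a").first();
String relHref = link.attr("href"); // "/" (as written in the source HTML)
String absHref = link.attr("abs:href"); // "http://jsoup.org/"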

Description

In HTML elements, URLs are often written relative to the document's location: <a href="/download">...</a>. When you use the Node.attr(String key) method to get an href attribute, the value is returned exactly as it is specified in the source HTML.

If you want to get an absolute URL, there is an attribute key prefix abs: that will cause the attribute value to be resolved against the document's base URI (original location): attr("abs:href")

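For this use case, it is important to specify the base URI when parsing the document. The base URI is set implicitly when you load from a URL; when you parse from a String, pass it explicitly, for example with Jsoup.parse(String html, String baseUri). Without a base URI, a relative URL cannot be made absolute, and the abs: lookup returns an empty string.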
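As a minimal sketch of why the base URI matters, the example below parses the same HTML string twice, once with a base URI and once with an empty one. The class name BaseUriExample, the helper firstAbsHref, and the example.com URLs are illustrative assumptions, not part of jsoup.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class BaseUriExample {
    // Hypothetical helper: parses the HTML against the given base URI and
    // returns the first link's href resolved through the abs: prefix.
    static String firstAbsHref(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        Element link = doc.select("a").first();
        return link.attr("abs:href");
    }

    public static void main(String[] args) {
        String html = "<a href=\"/download\">Download</a>";

        // With a base URI, the relative href is resolved against it.
        System.out.println(firstAbsHref(html, "http://example.com/docs/"));
        // prints "http://example.com/download"

        // With an empty base URI, no absolute URL can be made.
        System.out.println(firstAbsHref(html, "").isEmpty());
        // prints "true"
    }
}
```

An empty result from an abs: lookup is therefore usually a sign that the document was parsed without a base URI.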

If you don't want to use the abs: prefix, there is also a method Node.absUrl(String key) which does the same thing, but takes the plain attribute key: absUrl("href")
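The two forms can be compared side by side. This is a small sketch only: the class name AbsUrlExample, its helper methods, and the example.com base URI are assumptions made for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class AbsUrlExample {
    // Hypothetical helper: returns the first link in the document.
    static Element firstLink(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        return doc.select("a").first();
    }

    // Resolves the href with the abs: attribute key prefix.
    static String viaPrefix(Element link) {
        return link.attr("abs:href");
    }

    // Resolves the href with absUrl, using the plain attribute key.
    static String viaMethod(Element link) {
        return link.absUrl("href");
    }

    public static void main(String[] args) {
        Element link = firstLink("<a href=\"guide.html\">Guide</a>",
                "http://example.com/docs/");

        System.out.println(viaMethod(link));
        // prints "http://example.com/docs/guide.html"
        System.out.println(viaPrefix(link).equals(viaMethod(link)));
        // prints "true"
    }
}
```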
