Load a Document from a URL

Problem

You need to fetch and parse an HTML document from the web and find data within it (screen scraping).

Solution

Use the Jsoup.connect(String url) method:

Document doc = Jsoup.connect("http://example.com/").get();
String title = doc.title();

Description

The connect(String url) method creates a new Connection, and get() fetches and parses the HTML document at that URL. If an error occurs while fetching the URL, it throws an IOException, which you should handle appropriately.
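One way to handle the IOException is sketched below. The class name FetchTitle and the helper fetchTitle are hypothetical, and returning null on failure is only one possible policy. The sketch assumes jsoup is on the classpath, and it catches HttpStatusException (a jsoup subclass of IOException) first so that HTTP error responses can be reported separately from lower-level network failures.

```java
import java.io.IOException;

import org.jsoup.HttpStatusException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class FetchTitle {

    // Hypothetical helper: returns the page title, or null if the fetch fails.
    static String fetchTitle(String url) {
        try {
            Document doc = Jsoup.connect(url).get();
            return doc.title();
        } catch (HttpStatusException e) {
            // The server responded, but with an error status code.
            System.err.println("HTTP " + e.getStatusCode() + " for " + e.getUrl());
            return null;
        } catch (IOException e) {
            // Any other fetch failure, e.g. unknown host or timeout.
            System.err.println("Could not fetch " + url + ": " + e.getMessage());
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(fetchTitle("http://example.com/"));
    }
}
```

Whether to log, retry, or rethrow depends on the application; the point is only that both catch blocks are reachable because the more specific exception is listed first.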

The Connection interface is designed for method chaining to build specific requests:

Document doc = Jsoup.connect("http://example.com")
  .data("query", "Java")
  .userAgent("Mozilla")
  .cookie("auth", "token")
  .timeout(3000)
  .post();
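As a rough illustration of how the chain builds up a request, the sketch below configures the same Connection but stops before get() or post(), then reads the settings back through Connection.request(). No network call is made. The class name RequestBuilderDemo and the helper build are hypothetical, and jsoup is assumed to be on the classpath.

```java
import org.jsoup.Connection;
import org.jsoup.Jsoup;

public class RequestBuilderDemo {

    // Hypothetical helper: configures a request but does not send it.
    static Connection build() {
        return Jsoup.connect("http://example.com")
            .data("query", "Java")
            .userAgent("Mozilla")
            .cookie("auth", "token")
            .timeout(3000); // milliseconds
    }

    public static void main(String[] args) {
        // Inspect what the chained calls have stored on the pending request.
        Connection.Request req = build().request();
        System.out.println("timeout ms: " + req.timeout());
        System.out.println("user agent: " + req.header("User-Agent"));
        System.out.println("cookie:     " + req.cookie("auth"));
        for (Connection.KeyVal kv : req.data()) {
            System.out.println("data:       " + kv.key() + "=" + kv.value());
        }
    }
}
```

Each chained method returns the same Connection, so the request is only executed when get() or post() is finally called.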

This method only supports web URLs (http and https protocols); to load from a file, use the parse(File in, String charsetName) method instead.
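A minimal sketch of the file-based alternative follows. The class name LoadFromFile, the helpers writeSample and titleOf, and the sample HTML are all made up for illustration; the sketch writes a temporary file so that it can run without any existing input, and it assumes the file is encoded as UTF-8.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class LoadFromFile {

    // Hypothetical helper: writes a small sample page to a temporary file.
    static File writeSample() throws IOException {
        File tmp = File.createTempFile("sample", ".html");
        tmp.deleteOnExit();
        String html = "<html><head><title>Sample</title></head>"
                    + "<body><p>Hello</p></body></html>";
        Files.write(tmp.toPath(), html.getBytes(StandardCharsets.UTF_8));
        return tmp;
    }

    // Parses a local file with the parse(File in, String charsetName) method.
    static String titleOf(File in) throws IOException {
        Document doc = Jsoup.parse(in, "UTF-8");
        return doc.title();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(titleOf(writeSample()));
    }
}
```

As with get(), parsing a file can throw an IOException (for example if the file does not exist), so callers need to handle or declare it.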
