Load a Document from a URL

Feb 1, 2010

Problem

You need to fetch and parse an HTML document from the web, and find data within it (screen scraping).

Solution

Use the Jsoup.connect(String url) method:

Document doc = Jsoup.connect("https://example.com/").get();
String title = doc.title();

Description

The connect(String url) method creates a new Connection, and get() fetches the page and parses the response as an HTML document. If an error occurs whilst fetching the URL, get() throws an IOException, which you should handle appropriately.
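One way to handle that IOException is to keep each fetch in a single place that decides what a failure means. The sketch below assumes a failed page should be reported and skipped rather than abort the whole run. IoSupplier and Fetch.fetchOrEmpty are hypothetical names invented for illustration, not part of jsoup.

```java
import java.io.IOException;
import java.util.Optional;

// Hypothetical functional interface: like Supplier, but allowed to throw IOException,
// so a lambda such as () -> Jsoup.connect(url).get() can be passed in.
@FunctionalInterface
interface IoSupplier<T> {
    T get() throws IOException;
}

final class Fetch {
    private Fetch() {}

    // Hypothetical helper: runs the fetch and turns an IOException into an empty
    // Optional, reporting the cause on stderr instead of propagating the exception.
    static <T> Optional<T> fetchOrEmpty(String label, IoSupplier<T> fetch) {
        try {
            return Optional.ofNullable(fetch.get());
        } catch (IOException e) {
            System.err.println("Could not fetch " + label + ": " + e.getMessage());
            return Optional.empty();
        }
    }
}
```

In real code the supplier would be a lambda such as () -> Jsoup.connect(url).get(), giving an Optional<Document>. If you need to tell HTTP error statuses apart from network failures, catch org.jsoup.HttpStatusException (an IOException subclass with getStatusCode()) first; jsoup throws it for error responses unless ignoreHttpErrors(true) is set on the connection.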

The Connection interface is designed for method chaining to build specific requests:

Document doc = Jsoup.connect("https://example.com")
  .data("query", "Java")
  .userAgent("Mozilla")
  .cookie("auth", "token")
  .timeout(3000)
  .post();
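By default, the data() pairs are sent as form parameters: with post() they form an application/x-www-form-urlencoded request body, and with get() they are appended to the URL as a query string. The sketch below only illustrates that encoding using the JDK's URLEncoder (Java 10 or later). FormEncoding.encode is a hypothetical name, and this is not jsoup's own implementation; for example, it ignores file uploads and custom request bodies.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class FormEncoding {
    private FormEncoding() {}

    // Hypothetical helper: joins pairs as key=value&key=value, percent-encoding
    // each key and value as UTF-8, the way HTML forms are submitted.
    static String encode(Map<String, String> data) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> e : data.entrySet()) {
            body.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, like successive .data() calls.
        Map<String, String> data = new LinkedHashMap<>();
        data.put("query", "Java");
        data.put("lang", "en & fr");
        System.out.println(encode(data)); // query=Java&lang=en+%26+fr
    }
}
```

Note how spaces become + and a literal & is escaped as %26, so it cannot be mistaken for a pair separator.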

This method only supports web URLs (http and https protocols); if you need to load from a file, use the Jsoup.parse(File in, String charsetName) method instead.
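If your input may be either a web address or a local path, you can check the scheme before choosing between Jsoup.connect and Jsoup.parse(File, String). The helper below is a hypothetical sketch built on java.net.URI; Sources.isWebUrl is an invented name, and treating unparseable input as "not a web URL" is an assumption.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

final class Sources {
    private Sources() {}

    // Hypothetical helper: true only for absolute http or https URLs with a host,
    // i.e. the inputs Jsoup.connect() can fetch. Scheme matching ignores case.
    static boolean isWebUrl(String input) {
        try {
            URI uri = new URI(input);
            String scheme = uri.getScheme();
            if (scheme == null || uri.getHost() == null) {
                return false; // relative path, or no host to connect to
            }
            String s = scheme.toLowerCase(Locale.ROOT);
            return s.equals("http") || s.equals("https");
        } catch (URISyntaxException e) {
            return false; // assumption: malformed input is treated as a local path
        }
    }
}
```

A caller might then write Sources.isWebUrl(path) ? Jsoup.connect(path).get() : Jsoup.parse(new File(path), "UTF-8"), where "UTF-8" is an assumed charset for the local file.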

jsoup HTML parser © 2009 - 2026 Jonathan Hedley