
Load a Document from a File

Jan 22, 2010

Problem

You have a file on disk containing HTML that you’d like to load and parse, and then perhaps manipulate or extract data from.

Solution

Use the static Jsoup.parse(File in, String charsetName, String baseUri) method:

File input = new File("/tmp/input.html");
Document doc = Jsoup.parse(input, "UTF-8", "https://example.com/");

Description

The parse(File in, String charsetName, String baseUri) method loads and parses an HTML file. If an error occurs whilst loading the file, it throws an IOException, which you should handle appropriately.
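A minimal, self-contained sketch of that error handling. It writes a small sample file to a temporary location so it can run anywhere. The class name LoadFromFile, the helper load, and the choice to return an Optional instead of rethrowing are illustrative assumptions, not part of jsoup.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class LoadFromFile {

    // Hypothetical helper: parse the file as UTF-8, or return an empty Optional
    // if jsoup throws an IOException (missing file, unreadable file, etc.).
    static Optional<Document> load(File input) {
        try {
            return Optional.of(Jsoup.parse(input, "UTF-8", "https://example.com/"));
        } catch (IOException e) {
            System.err.println("Could not load " + input + ": " + e.getMessage());
            return Optional.empty();
        }
    }

    public static void main(String[] args) throws IOException {
        // Create a sample file so the example does not depend on /tmp/input.html existing.
        Path sample = Files.createTempFile("jsoup-example", ".html");
        String html = "<html><head><title>Hello</title></head><body><p>Hi</p></body></html>";
        Files.write(sample, html.getBytes(StandardCharsets.UTF_8));

        load(sample.toFile()).ifPresent(doc -> System.out.println(doc.title())); // prints "Hello"

        // A path that does not exist takes the IOException branch.
        System.out.println(load(new File("/no/such/file.html")).isPresent()); // prints "false"

        Files.delete(sample);
    }
}
```

Whether to swallow the exception like this or let it propagate depends on the caller; a batch job over many files might prefer to log and skip, while a single-file tool might simply declare throws IOException.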

The baseUri parameter is used by the parser to resolve relative URLs in the document before a <base href> element is found. If resolving relative URLs is not a concern for you, you can pass an empty string instead.
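To illustrate what baseUri changes, the sketch below (hypothetical class BaseUriDemo and sample link /docs/page.html) parses the same file twice and reads the link with Element.absUrl. The raw href attribute is the same in both cases; only the resolved absolute URL depends on the base URI. With an empty string there is nothing to resolve a root-relative link like this one against, so absUrl returns an empty string.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class BaseUriDemo {

    // Parse the file with the given base URI and return the first link's absolute URL
    // (or null if the document has no link with an href).
    static String firstAbsoluteLink(File input, String baseUri) throws IOException {
        Document doc = Jsoup.parse(input, "UTF-8", baseUri);
        Element link = doc.selectFirst("a[href]");
        return link == null ? null : link.absUrl("href");
    }

    public static void main(String[] args) throws IOException {
        Path sample = Files.createTempFile("jsoup-base", ".html");
        Files.write(sample, "<p><a href=\"/docs/page.html\">Docs</a></p>".getBytes(StandardCharsets.UTF_8));
        File input = sample.toFile();

        // Relative href resolved against the supplied base URI.
        System.out.println(firstAbsoluteLink(input, "https://example.com/"));
        // prints "https://example.com/docs/page.html"

        // Empty base URI: nothing to resolve against, so absUrl returns "".
        System.out.println("[" + firstAbsoluteLink(input, "") + "]"); // prints "[]"

        Files.delete(sample);
    }
}
```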

There is a sister method, parse(File in, String charsetName), which uses the file’s location as the baseUri. This is useful if you are working on a site stored on the local filesystem whose relative links also point to files on the filesystem.
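A sketch of the two-argument form, assuming a small hypothetical site (index.html linking to about.html) written to a temporary directory. The text says the file’s location becomes the baseUri; the sketch only prints Document.location() rather than assuming its exact form (plain path or file: URI), and follows the relative link by resolving it against the index file’s directory with java.nio.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class FileLocationBaseUri {

    // Two-argument form: no explicit baseUri; jsoup derives it from the file's location.
    static Document loadLocal(File input) throws IOException {
        return Jsoup.parse(input, "UTF-8");
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical local site: index.html links to about.html in the same directory.
        Path site = Files.createTempDirectory("jsoup-site");
        Path index = site.resolve("index.html");
        Files.write(index, "<a href=\"about.html\">About</a>".getBytes(StandardCharsets.UTF_8));
        Files.write(site.resolve("about.html"), "<title>About</title>".getBytes(StandardCharsets.UTF_8));

        Document doc = loadLocal(index.toFile());
        // The base URI jsoup recorded for this document.
        System.out.println(doc.location());

        // The raw, still-relative link target.
        String href = doc.selectFirst("a").attr("href");
        System.out.println(href); // prints "about.html"

        // Follow the link on the filesystem, relative to the index file's directory.
        File about = index.getParent().resolve(href).toFile();
        System.out.println(loadLocal(about).title()); // prints "About"
    }
}
```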

jsoup HTML parser © 2009 - 2026 Jonathan Hedley