Load a Document from a File
You have a file on disk that contains HTML, which you'd like to load and parse, and then perhaps manipulate or extract data from.
Use the static Jsoup.parse(File in, String charsetName, String baseUri) method:
File input = new File("/tmp/input.html");
Document doc = Jsoup.parse(input, "UTF-8", "http://example.com/");
The parse(File in, String charsetName, String baseUri) method loads and parses an HTML file. If an error occurs whilst loading the file, it will throw an IOException, which you should handle appropriately.
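One way to handle the IOException is sketched below. The class name LoadFromFile and the helper loadOrNull are hypothetical names chosen for illustration, and returning null on failure is only one possible policy; rethrowing or wrapping the exception may suit your application better.

```java
import java.io.File;
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class LoadFromFile {
    // Hypothetical helper: returns the parsed document, or null if the
    // file could not be read (for example, because it does not exist).
    static Document loadOrNull(File input) {
        try {
            return Jsoup.parse(input, "UTF-8", "http://example.com/");
        } catch (IOException e) {
            // Report the failure and let the caller decide what to do next.
            System.err.println("Could not load " + input + ": " + e.getMessage());
            return null;
        }
    }

    public static void main(String[] args) {
        Document doc = loadOrNull(new File("/tmp/input.html"));
        if (doc != null) {
            System.out.println(doc.title());
        }
    }
}
```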
The baseUri parameter is used by the parser to resolve relative URLs in the document before a <base href> element is found. If that's not a concern for you, you can pass an empty string instead.
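To see the effect of baseUri, the following sketch reads the first link in a file and asks for its absolute URL. The class name BaseUriDemo and the helper firstLinkAbsUrl are hypothetical, and the example assumes the file contains a relative link such as <a href="/about.html"> and no <base href> element.

```java
import java.io.File;
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class BaseUriDemo {
    // Hypothetical helper: parses the file with the given base URI and
    // returns the absolute URL of the first link, or "" if there is none.
    static String firstLinkAbsUrl(File input, String baseUri) throws IOException {
        Document doc = Jsoup.parse(input, "UTF-8", baseUri);
        Element link = doc.select("a[href]").first();
        // absUrl resolves the attribute value against the base URI.
        return link == null ? "" : link.absUrl("href");
    }

    public static void main(String[] args) throws IOException {
        File input = new File("/tmp/input.html");
        System.out.println(firstLinkAbsUrl(input, "http://example.com/"));
        System.out.println(firstLinkAbsUrl(input, ""));
    }
}
```

With the base URI http://example.com/, a link to /about.html resolves to http://example.com/about.html. With an empty base URI, absUrl has nothing to resolve against and returns an empty string, although the raw value is still available through attr("href").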
There is a sister method, parse(File in, String charsetName), which uses the file's location as the baseUri. This is useful if you are working on a filesystem-local site and the relative links it points to are also on the filesystem.
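A minimal sketch of the two-argument form follows. The class name SisterMethodDemo and the helper locationOf are hypothetical. The sketch assumes that the file's location is recorded as its absolute path, which matches the jsoup sources I am aware of but is worth checking against the version you use.

```java
import java.io.File;
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class SisterMethodDemo {
    // Hypothetical helper: parses the file without an explicit base URI
    // and returns the location jsoup recorded for the document.
    static String locationOf(File input) throws IOException {
        Document doc = Jsoup.parse(input, "UTF-8");
        return doc.location();
    }

    public static void main(String[] args) throws IOException {
        File input = new File("/tmp/input.html");
        // Print the location to check what the parser will use as the base.
        System.out.println(locationOf(input));
    }
}
```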