Load a Document from a URL
Problem
You need to fetch and parse an HTML document from the web and find data within it (screen scraping).
Solution
Use the static Jsoup.parse(URL url, int timeoutMillis) method:
URL url = new URL("http://example.com/");
Document doc = Jsoup.parse(url, 3*1000);
String title = doc.title();
Description
The parse(URL url, int timeoutMillis) method fetches and parses an HTML file. If an error occurs whilst fetching the URL, it throws an IOException, which you should handle appropriately.
The timeout parameter specifies how long to wait, in milliseconds, for a connection and for content; if the timeout is exceeded, an IOException is thrown.
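Because both a failed fetch and an exceeded timeout surface as an IOException, one way to handle them is to wrap the call in a small helper that turns a failure into an empty result. The following is a minimal sketch rather than part of jsoup: the SafeFetch class, the IoCall interface, and the tryFetch method are hypothetical names, and the fetch itself is replaced by stand-in lambdas so the example runs without a network connection.

```java
import java.io.IOException;
import java.util.Optional;

public class SafeFetch {

    /** Hypothetical interface: any operation that may throw an IOException. */
    @FunctionalInterface
    public interface IoCall<T> {
        T call() throws IOException;
    }

    /**
     * Runs the call and returns its result, or Optional.empty() if it
     * throws an IOException (for example a fetch failure or a timeout).
     */
    public static <T> Optional<T> tryFetch(IoCall<T> call) {
        try {
            return Optional.ofNullable(call.call());
        } catch (IOException e) {
            // Report the failure; a real program might log or rethrow instead.
            System.err.println("Fetch failed: " + e.getMessage());
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Stand-in calls (assumptions, not real fetches).
        Optional<String> ok = tryFetch(() -> "<title>Example</title>");
        Optional<String> failed = tryFetch(() -> {
            throw new IOException("Read timed out");
        });
        System.out.println(ok.isPresent());     // true
        System.out.println(failed.isPresent()); // false
    }
}
```

With jsoup on the classpath, the lambda passed to tryFetch would be the call from the Solution, for example () -> Jsoup.parse(url, 3 * 1000), giving an Optional<Document> that is empty when the fetch fails or times out.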
This method only supports web URLs (http and https protocols); if you need to load from a file, use the parse(File in, String charsetName) method instead.
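If the location comes from user input or configuration, it can help to check its scheme before deciding which parse method to call. The sketch below uses only the standard java.net.URI class; the UrlKind class and isWebUrl method are hypothetical helpers for illustration, not part of jsoup.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UrlKind {

    /** Returns true if the location has an http or https scheme. */
    public static boolean isWebUrl(String location) {
        try {
            String scheme = new URI(location).getScheme();
            return "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
        } catch (URISyntaxException e) {
            // Not a parseable URI, so it cannot be a web URL.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isWebUrl("http://example.com/"));    // true
        System.out.println(isWebUrl("https://example.com/"));   // true
        System.out.println(isWebUrl("file:///tmp/input.html")); // false
        System.out.println(isWebUrl("input.html"));             // false
    }
}
```

A caller could use parse(URL url, int timeoutMillis) when isWebUrl returns true and fall back to parse(File in, String charsetName) otherwise.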