Load a Document from a URL
Problem
You need to fetch and parse an HTML document from the web, and find data within it (screen scraping).
Solution
Use the Jsoup.connect(String url) method:
Document doc = Jsoup.connect("http://example.com/").get();
String title = doc.title();
Description
The connect(String url) method creates a new Connection, and get() fetches and parses an HTML file. If an error occurs whilst fetching the URL, get() throws an IOException, which you should handle appropriately.
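As a minimal sketch of what handling the IOException might look like, the example below wraps the fetch in a helper that returns an empty Optional on failure. The class name FetchTitle, the helper fetchTitle, and the choice to log to standard error are assumptions made for illustration, not part of jsoup. HttpStatusException is jsoup's IOException subclass for error status codes such as 404, so it is caught first.

```java
import java.io.IOException;
import java.util.Optional;

import org.jsoup.HttpStatusException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class FetchTitle {

    // Hypothetical helper: returns the page title, or an empty Optional
    // if the document could not be fetched.
    static Optional<String> fetchTitle(String url) {
        try {
            Document doc = Jsoup.connect(url)
                    .timeout(3000) // milliseconds
                    .get();
            return Optional.of(doc.title());
        } catch (HttpStatusException e) {
            // The server responded, but with an error status (e.g. 404 or 500).
            System.err.println("HTTP " + e.getStatusCode() + " for " + e.getUrl());
            return Optional.empty();
        } catch (IOException e) {
            // Anything else: unknown host, refused connection, timeout, and so on.
            System.err.println("Could not fetch " + url + ": " + e);
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        String title = fetchTitle("http://example.com/").orElse("(no title available)");
        System.out.println(title);
    }
}
```

Returning an Optional is only one possible design; depending on the application, rethrowing the exception or retrying may be more appropriate.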
The Connection interface is designed for method chaining to build specific requests:
Document doc = Jsoup.connect("http://example.com")
    .data("query", "Java")
    .userAgent("Mozilla")
    .cookie("auth", "token")
    .timeout(3000)
    .post();
This method only supports web URLs (http and https protocols); if you need to load from a file, use the parse(File in, String charsetName) method instead.
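For comparison, loading from a file with parse(File in, String charsetName) could look like the sketch below. The class name ParseLocalFile, the helper titleOf, and the temporary sample file are assumptions made so that the example runs on its own; in practice you would pass the path of your own HTML file and its actual character set.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class ParseLocalFile {

    // Hypothetical helper: parses a local HTML file as UTF-8 and returns its title.
    static String titleOf(File in) throws IOException {
        Document doc = Jsoup.parse(in, "UTF-8");
        return doc.title();
    }

    public static void main(String[] args) throws IOException {
        // Write a small sample document to a temporary file (illustration only).
        File tmp = File.createTempFile("sample", ".html");
        tmp.deleteOnExit();
        String html = "<html><head><title>Local page</title></head>"
                + "<body><p>Hello</p></body></html>";
        Files.write(tmp.toPath(), html.getBytes(StandardCharsets.UTF_8));

        System.out.println(titleOf(tmp));
    }
}
```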