
Use XPath selectors to find elements and nodes

Nov 29, 2023

Problem

You want to find or manipulate elements using XPath selectors.

Solution

Use the Element.selectXpath(String xpath) and Element.selectXpath(String, Class<T>) methods:

  Document doc = Jsoup.connect("https://jsoup.org/").get();
  
  Elements elements = doc.selectXpath("//div[@class='col1']/p");
      // Each P element in div.col1
  
  List<TextNode> textNodes = doc.selectXpath("//a/text()", TextNode.class);
      // Each TextNode in every A element

Description

jsoup supports XPath selectors through the Element.selectXpath(String xpath) method. XPath 1.0 expressions are supported by default. To use a later XPath version, you can provide an alternate XPathFactory implementation.
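As a minimal sketch of the basic call, the example below runs the XPath expression from the Solution against a small inline document rather than a live URL, so it works without a network connection. The class name, the method name, and the sample HTML are assumptions made up for illustration.

```java
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class XpathSelectExample {
    // Hypothetical sample markup, used in place of a fetched page.
    static final String HTML =
        "<div class='col1'><p>One</p><p>Two</p></div>"
      + "<div class='col2'><p>Three</p></div>";

    /** Returns the text of each P element that is a direct child of div.col1. */
    static List<String> col1Paragraphs() {
        Document doc = Jsoup.parse(HTML);
        // Same XPath 1.0 expression as in the Solution above.
        Elements paragraphs = doc.selectXpath("//div[@class='col1']/p");
        return paragraphs.eachText();
    }

    public static void main(String[] args) {
        System.out.println(col1Paragraphs()); // prints [One, Two]
    }
}
```

The result is a regular Elements list, so it can be used in the same way as the result of a CSS select() call.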

The Element.selectXpath(String, Class<T>) method selects nodes of a specific type, such as TextNode, DataNode, or LeafNode.
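The typed variant can be sketched as follows, assuming a small inline document: the XPath text() step selects text nodes, and passing TextNode.class returns them as a typed list. The class name, the method name, and the sample HTML are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.TextNode;

public class XpathTextNodeExample {
    // Hypothetical sample markup with two links.
    static final String HTML =
        "<p><a href='/one'>First</a> and <a href='/two'>Second</a></p>";

    /** Returns the text of every TextNode that is a child of an A element. */
    static List<String> linkTexts() {
        Document doc = Jsoup.parse(HTML);
        // The second argument restricts the result to nodes of the given type.
        List<TextNode> textNodes = doc.selectXpath("//a/text()", TextNode.class);
        List<String> texts = new ArrayList<>();
        for (TextNode node : textNodes) {
            texts.add(node.text());
        }
        return texts;
    }

    public static void main(String[] args) {
        System.out.println(linkTexts()); // prints [First, Second]
    }
}
```

The text " and " between the links is not returned, because that text node is a child of the P element, not of an A element.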

You can experiment with different XPath selectors on Try jsoup.

This XPath cheatsheet has helpful comparisons between CSS and XPath selectors.
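To illustrate one such comparison, the sketch below runs a CSS selector and a similar-looking XPath expression against the same hypothetical markup. The two are not strictly equivalent: the CSS class selector .col1 matches any element whose class list contains col1, while the XPath predicate @class='col1' requires the whole attribute value to equal col1. The class name, method names, and sample HTML are assumptions for illustration.

```java
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class CssVsXpathExample {
    // Hypothetical sample markup: the second div carries an extra class.
    static final String HTML =
        "<div class='col1'><p>One</p></div>"
      + "<div class='col1 wide'><p>Two</p></div>";

    /** CSS: P children of any div whose class list includes col1. */
    static List<String> viaCss() {
        Document doc = Jsoup.parse(HTML);
        return doc.select("div.col1 > p").eachText();
    }

    /** XPath: P children of divs whose class attribute is exactly "col1". */
    static List<String> viaXpath() {
        Document doc = Jsoup.parse(HTML);
        return doc.selectXpath("//div[@class='col1']/p").eachText();
    }

    public static void main(String[] args) {
        System.out.println(viaCss());   // prints [One, Two]
        System.out.println(viaXpath()); // prints [One]
    }
}
```

In XPath 1.0, a closer match for the CSS class selector is a predicate such as contains(concat(' ', normalize-space(@class), ' '), ' col1 ').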

jsoup HTML parser © 2009 - 2026 Jonathan Hedley