
Extract attributes, text, and HTML from elements

Feb 9, 2010

Problem

After parsing a document and finding some elements, you’ll want to get at the data inside those elements.

Solution

  • To get the value of an attribute, use the Node.attr(String key) method.
  • For the text of an element (combined with the text of its children), use Element.text().
  • For HTML, use Element.html() for the inner HTML, or Node.outerHtml() to include the element itself.

For example:

String html = "<p>An <a href='https://example.com/'><b>example</b></a> link.</p>";
Document doc = Jsoup.parse(html);
Element link = doc.select("a").first();

String text = doc.body().text(); // "An example link."
String linkHref = link.attr("href"); // "https://example.com/"
String linkText = link.text(); // "example"

String linkOuterH = link.outerHtml();
    // "<a href="https://example.com/"><b>example</b></a>"
String linkInnerH = link.html(); // "<b>example</b>"
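
One detail the example above does not exercise is an attribute that is absent. Node.attr(String key) returns an empty string in that case rather than null, so Node.hasAttr(String key) is the way to tell "missing" from "empty". The following is a minimal sketch; the class name MissingAttributeExample and the helper attrOrDefault are hypothetical names introduced only for illustration, and the helper assumes the input contains at least one <a> element.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class MissingAttributeExample {
    // Hypothetical helper: returns the attribute value of the first <a>,
    // or the fallback when that attribute is not present.
    // Assumes the HTML contains at least one <a> element.
    static String attrOrDefault(String html, String key, String fallback) {
        Element link = Jsoup.parse(html).select("a").first();
        // attr() yields "" for an absent attribute, so check with hasAttr() first.
        return link.hasAttr(key) ? link.attr(key) : fallback;
    }

    public static void main(String[] args) {
        String html = "<p>An <a href='https://example.com/'><b>example</b></a> link.</p>";
        System.out.println(attrOrDefault(html, "href", "(none)"));  // https://example.com/
        System.out.println(attrOrDefault(html, "title", "(none)")); // (none)
    }
}
```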

Description

The methods above are the core element data accessors. Others include:

  • Element.id()
  • Element.tagName()
  • Element.className() and Element.hasClass(String className)
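
As a rough illustration of the accessors listed above, the sketch below reads the id, tag name, and class attribute of one element. The class name ElementAccessors, the helpers describe and firstDivHasClass, and the sample markup are assumptions made for this example; both helpers assume the input contains at least one <div>.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class ElementAccessors {
    // Hypothetical helper: summarises the first <div> as "tag#id [classes]".
    static String describe(String html) {
        Element div = Jsoup.parse(html).select("div").first();
        return div.tagName() + "#" + div.id() + " [" + div.className() + "]";
    }

    // Hypothetical helper: reports whether the first <div> carries the given class.
    static boolean firstDivHasClass(String html, String className) {
        Element div = Jsoup.parse(html).select("div").first();
        return div.hasClass(className);
    }

    public static void main(String[] args) {
        String html = "<div id='main' class='note wide'>Hello</div>";
        System.out.println(describe(html));                    // div#main [note wide]
        System.out.println(firstDivHasClass(html, "note"));    // true
        System.out.println(firstDivHasClass(html, "missing")); // false
    }
}
```

Note that className() returns the whole class attribute as one string, while hasClass(String className) tests for a single class name within it.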

All of these accessor methods have corresponding setter methods to change the data.
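
A short sketch of a few of those setters, assuming the same sample markup as the example in the Solution section. It uses Node.attr(String key, String value), Element.html(String html), and Element.addClass(String className); the class name ElementSetters and the helper editLink are hypothetical, and the helper assumes the input contains at least one <a> element.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ElementSetters {
    // Hypothetical helper: parses the HTML, modifies the first <a>, and returns it.
    static Element editLink(String html) {
        Document doc = Jsoup.parse(html);
        Element link = doc.select("a").first();    // assumes an <a> is present
        link.attr("href", "https://example.org/"); // setter counterpart of attr(key)
        link.html("<i>changed</i>");               // replaces the inner HTML
        link.addClass("external");                 // adds a name to the class attribute
        return link;
    }

    public static void main(String[] args) {
        Element link = editLink(
            "<p>An <a href='https://example.com/'><b>example</b></a> link.</p>");
        System.out.println(link.attr("href"));         // https://example.org/
        System.out.println(link.text());               // changed
        System.out.println(link.html());               // <i>changed</i>
        System.out.println(link.hasClass("external")); // true
    }
}
```

The setters modify the element in place, so reading the values back through the accessors afterwards reflects the changes.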

See also

  • The reference documentation for Element and the collection Elements class
  • Working with URLs
  • Finding elements with the CSS selector syntax

jsoup HTML parser © 2009 - 2026 Jonathan Hedley