🌐 Web Scraping · Pro Plan

SCRAPE_BY_XPATH

Scrape content using an XPath expression (uses a headless browser for JS rendering).

Formula Signature
=SCRAPE_BY_XPATH(url, xpath)

Returns: string or 2D array (multiple matches returned as separate rows)

Overview

SCRAPE_BY_XPATH extracts content from webpages using XPath expressions, a powerful query language for navigating XML and HTML document structures. XPath provides capabilities beyond CSS selectors, including the ability to traverse up the document tree, select elements by their text content, use conditional logic, and perform calculations within the query itself. Every SCRAPE_BY_XPATH call uses a headless browser for full JavaScript rendering, making it the go-to choice for scraping modern dynamic websites.
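To get a feel for how an XPath such as "//h1" walks a document, here is a small stand-alone sketch using Python's built-in xml.etree.ElementTree, which supports a limited XPath subset (".//h1" is its spelling of "//h1"). The sample markup is invented for illustration:

```python
import xml.etree.ElementTree as ET

# Invented sample page standing in for a scraped document.
html = """
<html>
  <body>
    <h1>Example Domain</h1>
    <div class="content"><p>Hello</p></div>
  </body>
</html>
"""

root = ET.fromstring(html)
# ".//h1" is ElementTree's spelling of the XPath "//h1":
# match h1 elements anywhere beneath the current node.
headings = [el.text for el in root.findall(".//h1")]
print(headings)  # ['Example Domain']
```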

Parameters

Parameter | Type | Required | Description
url | string | Yes | The full URL of the webpage to scrape (must include https:// or http://).
xpath | string | Yes | XPath expression targeting the element(s) to extract. Examples: "//h1" (all h1 elements), "//div[@class='content']" (divs with class "content"), "//a/@href" (all link URLs), "//table//tr/td[2]" (second column of all table rows).

Examples

Example 1: Extract the main heading from a page

Selects the first h1 element on the page and returns its text content. The double slash "//" means "find anywhere in the document."

fx
=SCRAPE_BY_XPATH("https://example.com", "//h1")

Output

Example Domain

Example 2: Get all links containing specific text

Uses the contains() function to find anchor elements whose text includes "Read More" and extracts their href attributes. Demonstrates XPath text-based selection that CSS cannot replicate.

fx
=SCRAPE_BY_XPATH("https://blog.example.com", "//a[contains(text(), 'Read More')]/@href")

Output

/blog/post-1
/blog/post-2
/blog/post-3
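ElementTree's XPath subset has no contains(), but the same selection logic can be emulated in plain Python, which also shows what this formula is doing under the hood (sample markup invented for illustration):

```python
import xml.etree.ElementTree as ET

# Invented sample markup.
html = """
<html><body>
  <a href="/blog/post-1">Read More about X</a>
  <a href="/about">About</a>
  <a href="/blog/post-2">Read More about Y</a>
</body></html>
"""

root = ET.fromstring(html)
# Filter anchors by text content in Python (no contains() available),
# then read the href attribute of each match.
hrefs = [a.get("href") for a in root.findall(".//a")
         if a.text and "Read More" in a.text]
print(hrefs)  # ['/blog/post-1', '/blog/post-2']
```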

Example 3: Extract the second column from a data table

Navigates to a specific table by its ID, then selects the second cell (td[2]) from every row. Perfect for extracting a single column of tabular data.

fx
=SCRAPE_BY_XPATH("https://data.example.com/stats", "//table[@id='stats']//tr/td[2]")

Output

1,250
3,870
945
12,300

Example 4: Scrape product names from elements with specific data attributes

Targets only in-stock products by filtering on a data attribute, then extracts the h3 element within each matching container.

fx
=SCRAPE_BY_XPATH("https://shop.example.com", "//div[@data-available='true']//h3")

Output

Bluetooth Speaker
Wireless Mouse
Phone Stand

Example 5: Get the last paragraph on a page

Uses the XPath last() function to select only the final paragraph element. Demonstrates positional selection capabilities unique to XPath.

fx
=SCRAPE_BY_XPATH("https://example.com/about", "(//p)[last()]")

Output

Contact us for more information at info@example.com.
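The parentheses in "(//p)[last()]" matter: they gather every paragraph in the document first, then take the final one. A quick sketch with Python's stdlib ElementTree (whose XPath subset lacks this grouping) shows the equivalent operation on invented markup:

```python
import xml.etree.ElementTree as ET

# Invented sample markup with paragraphs at different depths.
html = """
<html><body>
  <div><p>First paragraph.</p></div>
  <p>Middle paragraph.</p>
  <div><p>Contact us for more information.</p></div>
</body></html>
"""

root = ET.fromstring(html)
# findall returns matches in document order, so indexing the list
# with [-1] mirrors "(//p)[last()]": the globally last paragraph.
last_p = root.findall(".//p")[-1].text
print(last_p)  # Contact us for more information.
```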

Use Cases

Finance

Financial Data Extraction

Extract stock prices, financial metrics, and market data from financial portals and investor relations pages. XPath enables precise targeting of specific table cells and dynamically loaded financial widgets.

Public Sector

Government Data Collection

Scrape public records, regulatory filings, and statistical reports from government websites. XPath handles the complex table structures and nested document formats commonly used on government portals.

Web Development

Content Migration Audits

Audit legacy website content before migration by extracting text, images, metadata, and internal links using XPath. Build comprehensive content inventories that map old URLs to their extracted assets.

Product Management

Competitor Feature Comparison

Scrape competitor pricing pages and feature lists to build comparison matrices. Use XPath to extract feature names, availability indicators, and pricing tier details into organized spreadsheets.

Public Relations

News and Media Monitoring

Monitor news sites and press release databases for brand mentions, extracting article titles, publication dates, and author names. XPath text-based selection helps filter for articles mentioning specific keywords.

Pro Tips

TIP

Use the browser console shortcut $x("//your/xpath") to quickly test XPath expressions before using them in your spreadsheet. This is faster than the document.evaluate() method.

TIP

When class names contain spaces (multiple classes), use contains(@class, "target-class") instead of @class="target-class" to match elements that have the target class among several classes.
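The difference between an exact @class match and a contains()-style match can be verified with a small stdlib Python sketch (the markup is invented; the .split() token check used here is slightly stricter than XPath's substring-based contains()):

```python
import xml.etree.ElementTree as ET

# Invented markup: the first paragraph carries several classes.
html = ('<div>'
        '<p class="note target-class big">hit</p>'
        '<p class="other">miss</p>'
        '</div>')
root = ET.fromstring(html)

# An exact @class comparison misses multi-class elements entirely:
exact = root.findall('.//p[@class="target-class"]')  # no matches

# contains(@class, ...) semantics, emulated in Python. Splitting on
# whitespace matches whole class tokens only.
loose = [p for p in root.findall(".//p")
         if "target-class" in (p.get("class") or "").split()]
print(len(exact), [p.text for p in loose])  # 0 ['hit']
```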

TIP

Use the pipe operator | to combine multiple XPath expressions in a single query: "//h1 | //h2 | //h3" selects all top-level headings. Results are returned in document order.
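ElementTree has no pipe operator, but iterating the tree in document order reproduces the same combined, document-ordered result (illustrative sketch on invented markup):

```python
import xml.etree.ElementTree as ET

# Invented markup with mixed heading levels.
html = """
<body>
  <h1>Title</h1>
  <p>Intro</p>
  <h2>Section</h2>
  <h3>Subsection</h3>
</body>
"""
root = ET.fromstring(html)
# iter() walks the tree in document order, so filtering by tag mirrors
# the combined, document-ordered result of "//h1 | //h2 | //h3".
headings = [el.text for el in root.iter() if el.tag in ("h1", "h2", "h3")]
print(headings)  # ['Title', 'Section', 'Subsection']
```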

TIP

For tables, use "//table[1]//tr/td[1]" to get the first column and "//table[1]//tr/td[2]" to get the second column. Place these in adjacent spreadsheet columns to reconstruct the table structure.
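The two-column technique above can be sketched with stdlib Python: pull each column with a positional predicate, then zip the columns back into rows (markup invented for illustration):

```python
import xml.etree.ElementTree as ET

# Invented two-column table.
html = """
<table>
  <tr><td>Alice</td><td>42</td></tr>
  <tr><td>Bob</td><td>17</td></tr>
</table>
"""
root = ET.fromstring(html)
# Positional predicates pull one column at a time (XPath is 1-indexed)...
col1 = [td.text for td in root.findall(".//tr/td[1]")]
col2 = [td.text for td in root.findall(".//tr/td[2]")]
# ...and zipping the columns reconstructs the rows.
rows = list(zip(col1, col2))
print(rows)  # [('Alice', '42'), ('Bob', '17')]
```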

XPath expressions allow you to target elements with precision that CSS selectors cannot match. You can select elements based on their text content (//a[contains(text(), "Buy Now")]), navigate to parent or sibling elements (//span[@class="price"]/parent::div), combine multiple conditions (//div[@class="product" and @data-available="true"]), and even use functions like position(), last(), and string-length() within your queries.

Because SCRAPE_BY_XPATH always renders JavaScript, it reliably handles single-page applications, dynamically loaded content, infinite scroll sections, and client-side rendered frameworks like React, Vue, and Angular. This makes it somewhat slower than SCRAPE_BY_CSS_PATH running in its standard (non-JS) mode, but it ensures you always get the fully rendered page content.

The function returns a single string when one element matches, or a 2D array with one value per row when multiple elements match. This output format integrates naturally with Google Sheets, allowing you to use the results with FILTER, SORT, UNIQUE, and other array functions for further data processing.
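A minimal sketch of the documented return-shape rule (this mirrors the description above, not the add-on's actual implementation; the helper name is hypothetical):

```python
def to_sheet_value(matches):
    """Mimic the documented return shape. Hypothetical helper based on
    the description above, not the add-on's actual source."""
    if len(matches) == 1:
        return matches[0]          # one match -> a single string
    return [[m] for m in matches]  # many matches -> one value per row

single = to_sheet_value(["Example Domain"])
multi = to_sheet_value(["/blog/post-1", "/blog/post-2"])
print(single, multi)  # Example Domain [['/blog/post-1'], ['/blog/post-2']]
```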

SCRAPE_BY_XPATH is particularly valuable for data analysts, SEO professionals, and researchers who need precise control over which elements they extract and are comfortable with XPath syntax.

Common Errors

No matches found

Cause: The XPath expression does not match any elements on the rendered page. This can happen if the expression has a syntax error, the target element is inside an iframe or Shadow DOM, or the class names are dynamically generated.

Fix: Test the XPath in your browser console using $x("your-xpath-here"). Check for typos in element names and attribute values. If the element is inside an iframe, try scraping the iframe URL directly.

Error: URL and XPath are required

Cause: One or both required parameters are empty or missing.

Fix: Ensure both the URL (with protocol) and XPath expression are provided as non-empty strings. Check that cell references point to cells with values.

Error: XPath evaluation failed

Cause: The XPath expression contains a syntax error that prevents it from being evaluated. Common issues include unmatched quotes, invalid function names, or malformed predicates.

Fix: Review your XPath expression for syntax errors. Ensure quotes are properly matched (use single quotes inside double quotes or vice versa). Verify function names are spelled correctly. Test the expression in the browser console first.

Frequently Asked Questions

How does XPath differ from CSS selectors?

CSS selectors and XPath both target HTML elements, but they differ in capabilities. CSS selectors are simpler and work well for selecting elements by class, ID, or tag name (e.g., ".price", "#header", "h1"). XPath is more powerful and can: traverse up the document tree (select a parent based on a child), filter by text content (//a[contains(text(), "Buy")]), use logical conditions (and/or), select by position (//li[3]), and use functions like string-length() and normalize-space(). Use CSS selectors for simple extraction and XPath when you need advanced querying capabilities.

Does SCRAPE_BY_XPATH always render JavaScript?

Yes. Unlike SCRAPE_BY_CSS_PATH, which offers JavaScript rendering as an optional parameter, SCRAPE_BY_XPATH always uses a headless browser that fully executes JavaScript before evaluating the XPath expression. This means it works reliably on all types of websites including single-page applications, but it is slower than SCRAPE_BY_CSS_PATH in standard (non-JS) mode. If speed is a priority and the target page does not require JavaScript rendering, consider using SCRAPE_BY_CSS_PATH instead.

How do I extract attribute values instead of text?

Append /@attributeName to your XPath expression. For example, to get all image sources: "//img/@src". To get href attributes from links: "//a/@href". To get the value of a custom data attribute: "//div/@data-product-id". You can also combine attribute extraction with filters: "//a[@class='external']/@href" gets href values only from links with the class "external".

Can I select elements by their text content?

Yes, this is one of XPath's most powerful features. Use text() to match text content: "//a[text()='Click Here']" matches links with the exact text "Click Here". Use contains() for partial matches: "//p[contains(text(), 'price')]" matches paragraphs containing the word "price". Use starts-with() for prefix matching: "//div[starts-with(@class, 'product-')]" matches divs whose class starts with "product-". These text-based selectors are not available with CSS selectors.

Why does my XPath return no matches even though the element exists on the page?

Common causes include: (1) The element is inside an iframe, which is a separate document that the XPath cannot reach. (2) The element is inside a Shadow DOM component, which creates an encapsulated DOM tree. (3) The XPath syntax has an error, such as incorrect quoting or namespace issues. (4) The page uses dynamic class names that change on each load (common with CSS-in-JS libraries). Test your XPath in the browser console using document.evaluate() or the $x() shortcut: $x("//your/xpath/here") to verify it matches the expected elements.

Do I need to handle XML namespaces?

Most modern HTML pages do not require namespace handling, and the scraper processes them as standard HTML. However, if you encounter namespace issues (typically with XML or XHTML strict documents), try using the local-name() function in your XPath: "//*[local-name()='div']" instead of "//div". This ignores namespace prefixes and matches elements by their local tag name only.

What XPath functions can I use in my expressions?

XPath provides many built-in functions: position() returns element index (//li[position()<=3] gets first 3 list items); last() selects the last element ((//p)[last()]); count() counts elements (//ul[count(li)>5] selects lists with more than 5 items); normalize-space() trims whitespace; translate() converts characters; concat() joins strings; and not() negates conditions (//div[not(@class="hidden")] selects visible divs). These functions can be combined for complex queries.

Related Functions

SCRAPE_BY_CSS_PATH — scrape content using CSS selectors, with JavaScript rendering as an optional parameter.

Start using SCRAPE_BY_XPATH today

Install Unlimited Sheets to get SCRAPE_BY_XPATH and 41 other powerful functions in Google Sheets.