
Google Sheets Web Scraper: Complete Guide

Learn how to scrape data from websites using Google Sheets' built-in functions like IMPORTXML and IMPORTHTML, no code required.


Nacho Mascort

Author

Google Sheets can pull live data from the web with no code, no server, and no scraping infrastructure. Five built-in import functions handle everything from HTML tables and RSS feeds to raw CSV files and other spreadsheets. They're fast to set up, collaborative by nature, and free. But they also have real limits that will bite you if you don't know them going in.

This guide covers every built-in scraping function with working examples, the advanced techniques that most tutorials skip, a clear-eyed comparison with Python, and when to reach for something more powerful.

What Google Sheets Can Actually Scrape

Before writing a single formula, understand what Google's import functions can and can't access:

  • Can scrape: publicly accessible, static HTML pages (tables, lists, structured data), RSS feeds, and CSV/TSV files
  • Cannot scrape: JavaScript-rendered content (React, Angular, Vue apps), pages behind logins, paywalled content, or sites that block Google's user-agent (Amazon, LinkedIn, Instagram)

The core limitation is that Google Sheets fetches raw HTML; it does not execute JavaScript. If a page loads its content after the initial HTML response, Sheets sees nothing.
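A quick way to internalize this: look at what actually arrives in the initial response for a JavaScript-rendered page. The snippet below uses a hypothetical single-page-app shell (not a real site) to show that the data IMPORTXML would need simply isn't in the markup:

```python
# What Sheets "sees" for a JavaScript-rendered page: only the initial HTML.
# Hypothetical payload for a typical single-page app shell.
raw_html = (
    '<html><body>'
    '<div id="root"></div>'          # empty mount point; JS fills it later
    '<script src="/app.js"></script>'
    '</body></html>'
)

# The product data the script would render is simply not in the markup,
# so there is nothing for an XPath query to match:
print("price" in raw_html)
```

The browser runs `/app.js` and fills `#root` with content; Sheets never does, so every selector comes back empty.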

The 5 Built-in Scraping Functions

Google Sheets ships with five import functions. Each targets a different data format. Here's how they work and when to use each one.

IMPORTXML: The Most Powerful Option

IMPORTXML fetches an HTML or XML page and lets you target any element using XPath queries. It's the most flexible of the five functions.

=IMPORTXML(url, xpath_query)

Pull the page title from a site:

=IMPORTXML("https://example.com", "//title")

Extract all links from a page:

=IMPORTXML("https://example.com", "//a/@href")

Scrape product prices from an e-commerce page:

=IMPORTXML("https://example.com/products", "//span[@class='price']")

The key to IMPORTXML is writing the right XPath. Use your browser's DevTools: right-click any element, select Inspect, find the element in the HTML panel, right-click it, and choose Copy → Copy XPath. Paste that directly into your formula.

Note: Google Sheets doesn't support every XPath function. starts-with(), for example, doesn't work. Stick to common selectors: //element, [@attribute='value'], //element/text().

IMPORTHTML: For Tables and Lists

IMPORTHTML pulls entire tables or unordered/ordered lists from a page. It's perfect for Wikipedia data, sports statistics, financial tables, and any site that presents data in an HTML <table>.

=IMPORTHTML(url, query, index)
  • query: either "table" or "list"
  • index: which table/list on the page (starts at 1)

Pull the first table from a Wikipedia page:

=IMPORTHTML("https://en.wikipedia.org/wiki/List_of_countries_by_GDP", "table", 1)

Finding the right index requires trial and error. Open DevTools in your browser and search for <table to count how many tables appear on the page before the one you want.
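If you'd rather not count tables by hand, a short local script can do it. This is a sketch using Python's standard html.parser; it counts on an inline snippet here, but in practice you'd feed it the page source saved from your browser:

```python
from html.parser import HTMLParser

class TableCounter(HTMLParser):
    """Counts <table> open tags so you can pick the right IMPORTHTML index."""
    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.count += 1

# Inline stand-in for a saved page; read the real file in practice.
html = "<html><body><table></table><div><table></table></div></body></html>"
counter = TableCounter()
counter.feed(html)
print(counter.count)  # the last table here would be index 2 in IMPORTHTML
```

Tables are counted in document order, which matches how IMPORTHTML assigns its 1-based index.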

To pull only one column from an imported table, wrap it in INDEX:

=INDEX(IMPORTHTML("https://example.com/data", "table", 1), , 2)

That returns only column 2 of the table, preventing the data from spilling into neighboring cells.

IMPORTDATA: For CSV and TSV Files

IMPORTDATA fetches a CSV or TSV file directly into your sheet. It's the simplest function in the set.

=IMPORTDATA(url)

Import a public dataset:

=IMPORTDATA("https://data.example.com/population.csv")

This works for any direct link to a .csv or .tsv file. Many government data portals, GitHub repos, and open data sources expose files this way. If the URL doesn't end with a CSV extension but returns CSV content, IMPORTDATA will still parse it correctly.
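Under the hood, IMPORTDATA does little more than standard CSV parsing: split rows on newlines, fields on commas, and spill the result into cells. The equivalent in Python's stdlib, using a made-up two-row dataset in place of a fetched file:

```python
import csv
import io

# Hypothetical CSV payload, standing in for the body IMPORTDATA would fetch.
payload = "country,population\nSpain,47000000\nPortugal,10000000\n"

# csv.reader handles quoting and embedded commas correctly,
# which is why naive str.split(",") is not enough for real data.
rows = list(csv.reader(io.StringIO(payload)))
print(rows)
```

Each inner list corresponds to one spreadsheet row, with the header landing in the formula's own row.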

IMPORTFEED: For RSS and Atom Feeds

IMPORTFEED parses RSS and Atom feeds. Use it to monitor blogs, news sites, podcasts, or any content publisher that exposes a feed.

=IMPORTFEED(url, [query], [headers], [num_items])

Pull the 10 most recent posts from a blog:

=IMPORTFEED("https://example.com/feed", "items", TRUE, 10)

The query parameter accepts: "feed" (metadata), "items" (entries), "items title", "items url", "items summary", or "items created". Use these to pull just the fields you need.

IMPORTRANGE: Pulling from Other Sheets

IMPORTRANGE is technically not a web scraper; it pulls data from another Google Sheets document. But it's essential for building multi-sheet workflows where data flows automatically between files.

=IMPORTRANGE(spreadsheet_url, range_string)

Example:

=IMPORTRANGE("https://docs.google.com/spreadsheets/d/1ABC.../edit", "Sheet1!A1:D100")

The first time you use this in a new sheet, Google will prompt you to grant access. After that, it syncs automatically whenever the source sheet updates.

Advanced Techniques Most Guides Skip

Dynamic URLs with Cell References

Every import function accepts a cell reference instead of a hardcoded URL. This lets you build scraping systems where changing one cell updates all your formulas at once.

=IMPORTXML(A1, "//title")

Put a URL in A1. Change it to a different URL and the formula instantly re-fetches. Extend this to build bulk scrapers: list URLs in column A, write formulas in column B that reference each row. You get a table of scraped data across dozens of pages.

=IMPORTHTML(A2, "table", B2)

Here, both the URL and the table index are dynamic, stored in cells A2 and B2.

Combine INDEX with IMPORTHTML

When IMPORTHTML imports a full table, it can spill into cells you're using for other data. Use INDEX to extract exactly the rows and columns you need:

=INDEX(IMPORTHTML(A1, "table", 1), 2, 3)

This returns only the value at row 2, column 3 of the imported table. Combine with ARRAYFORMULA to extract full columns while filtering rows.

Wrap Formulas in IFERROR

Import functions fail when a site is down, blocks Google, or returns unexpected data. Wrapping in IFERROR prevents error values from cascading through dependent cells:

=IFERROR(IMPORTXML(A1, "//h1/text()"), "Unavailable")

For bulk scrapers across many URLs, this is essential: one bad URL won't break the entire sheet.

Finding XPath with DevTools

Don't guess XPath selectors. Use the browser:

  1. Open the page you want to scrape in Chrome
  2. Right-click the element you need and select Inspect
  3. In the Elements panel, right-click the highlighted HTML element
  4. Choose Copy → Copy XPath
  5. Paste into your IMPORTXML formula

The copied XPath is often too specific (includes exact positions like [3]). Simplify it by removing position predicates and testing with a more general selector. For example, /html/body/div[3]/div[1]/h1 can usually be simplified to //h1.
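Before pasting a simplified selector into IMPORTXML, you can sanity-check it locally. Python's standard xml.etree supports a limited XPath subset roughly comparable to what Sheets accepts (this sketch runs on a toy inline snippet, which must be well-formed XML, rather than a live page):

```python
import xml.etree.ElementTree as ET

# Toy markup standing in for a page you want to scrape.
doc = ET.fromstring(
    "<body><div><h1>Main heading</h1></div>"
    "<span class='price'>19.99</span></body>"
)

# //h1 in IMPORTXML roughly corresponds to .//h1 here.
print(doc.find(".//h1").text)

# Attribute predicates like [@class='price'] work the same way in both.
print(doc.find(".//span[@class='price']").text)
```

If the simplified selector matches here, it's a good candidate for the formula; if it matches too many elements, add an attribute predicate rather than a position index.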

Try it with Unlimited Sheets

Need to scrape JavaScript-rendered pages or sites that block Google? The AI_SCRAPE function in Unlimited Sheets handles dynamic content, JavaScript rendering, and complex selectors, all from a Google Sheets formula, no code required.

Install Free Add-on →

Where Google Sheets Falls Short

The built-in functions work well for simple, static pages. They stop working completely in four situations:

JavaScript-Rendered Content

If a page loads its data after the initial HTML response using JavaScript (React, Angular, Vue, any modern SPA), Google Sheets can't see it. The function returns empty results or errors. This includes the product listings on most e-commerce sites, dynamically generated prices, and any content loaded via AJAX.

There's no workaround using built-in functions. You need either Google Apps Script with a custom fetch, or a third-party add-on that handles JavaScript rendering.

Sites That Block Google's User-Agent

Google Sheets sends requests with a recognizable Google user-agent. Major platforms actively block it: Amazon, eBay, LinkedIn, Instagram, Twitter, and most large e-commerce sites. You'll get a #N/A error with the message "could not fetch URL."

There's no way to change the user-agent or route traffic through a proxy in built-in Sheets functions. If a site blocks Google's requests, you're stuck.

No Pagination Support

Built-in functions pull from a single URL. If the data you need spans multiple pages (page 1, page 2, page 3...), you have to manually add formulas for each page URL. There's no loop or iterator built into the functions. For anything over a few pages, this becomes unwieldy fast.
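You can at least take the tedium out of writing per-page formulas by generating the URL list up front, then referencing one URL per row. A sketch, assuming a hypothetical ?page=N URL scheme:

```python
# Hypothetical paginated listing; build the URL for each page up front,
# then put one URL per row in column A of the sheet.
base = "https://example.com/products?page={page}"
urls = [base.format(page=n) for n in range(1, 6)]

for url in urls:
    print(url)

# Column B would then hold =IMPORTHTML(A2, "table", 1), dragged down
# one row per URL. Still no loop in Sheets, but no hand-typed URLs either.
```

The same URL column can be generated inside Sheets itself with a formula that concatenates the base URL with a row number.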

Scale Limits

Google Sheets caps out at 10 million cells per spreadsheet. For casual data collection this is fine. For production scraping pipelines pulling tens of thousands of records, or for anything that needs to run on a schedule with retries and error logging, Sheets isn't the right tool.

Google Sheets vs Python for Web Scraping

The right tool depends entirely on what you need to scrape and what you'll do with the data.

Factor by factor, here's how Google Sheets compares with Python (BeautifulSoup / Scrapy / Playwright):

  • Setup time: minutes in Sheets (just write a formula); hours in Python (environment setup, libraries, code)
  • Technical skill required: none for Sheets; intermediate Python knowledge for Python
  • JavaScript rendering: not supported in Sheets; full support in Python via Selenium, Playwright, or Puppeteer
  • Login-protected pages: not supported in Sheets; Python can automate login flows and sessions
  • Proxy / header control: none in Sheets; full control in Python (rotate proxies, set any header)
  • Data scale: Sheets is limited to 10M cells; Python has no practical limit
  • Collaboration: instant in Sheets (share a link); Python requires separate storage and dashboard setup
  • Scheduled runs: Sheets auto-refreshes on its own (not configurable); Python gives full control via cron, cloud functions, etc.
  • Best for: Sheets suits one-off tasks, monitoring, and team-shared data; Python suits production pipelines, dynamic sites, and large scale

The most effective pattern for teams with some technical capacity: use Python to scrape (via gspread and the Google Sheets API), and write the results back into a Google Sheet for the team to consume. Python handles the hard scraping work; Sheets handles sharing, visualization, and collaboration. Each tool does what it does best.
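As a sketch of the scraping half of that pattern, here's a stdlib-only parser that pulls cell text out of fetched HTML; the write-back step would use a Google Sheets API client such as gspread, which is omitted here to keep the example self-contained (the HTML below is a toy stand-in for a real page):

```python
from html.parser import HTMLParser

class TableExtractor(HTMLParser):
    """Collects cell text from <table> markup into a list of rows."""
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._cell = [], [], None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._cell = ""          # start buffering cell text

    def handle_data(self, data):
        if self._cell is not None:
            self._cell += data

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._row.append(self._cell.strip())
            self._cell = None
        elif tag == "tr" and self._row:
            self.rows.append(self._row)

# Toy HTML standing in for a page fetched by the Python side of the pipeline.
page = ("<table><tr><th>name</th><th>price</th></tr>"
        "<tr><td>Widget</td><td>9.99</td></tr></table>")
parser = TableExtractor()
parser.feed(page)
print(parser.rows)

# In a real pipeline, parser.rows would be written to a shared sheet
# through the Google Sheets API (for example via the gspread library).
```

The rows-of-lists shape maps directly onto a spreadsheet range, which is what makes this hand-off between Python and Sheets so clean.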

Common Errors and How to Fix Them

These are the errors you'll hit most often and what actually fixes them:

#N/A: "Could not fetch URL"

Cause: The URL is wrong, the site is down, or the site blocks Google's user-agent.
Fix: Verify the URL manually in a browser. If it loads fine in a browser but not in Sheets, the site is blocking Google. No workaround exists with built-in functions.

#REF!: Not Enough Space

Cause: The imported data would overwrite cells that already contain content.
Fix: Move your formula to a cell with enough empty space below and to the right for all returned data. If you only need part of the data, use INDEX to limit what's imported.

#VALUE!: Wrong Data Type

Cause: The imported content doesn't match what the formula expects, for example text where a number is needed.
Fix: Use VALUE() to convert text to numbers, or TEXT() for the reverse. Alternatively, adjust your XPath to target the right element.

"Result Too Large"

Cause: The page has more data than Sheets can handle in a single cell range.
Fix: Narrow your XPath or table index to target a smaller portion of the page. Use INDEX to extract specific rows/columns rather than the full dataset.

"Array Result Was Not Expanded"

Cause: The formula would output multiple values but is blocked by adjacent cell content.
Fix: Clear the cells around your formula, or use INDEX to reduce the output to a single value.

Going Beyond Built-in Functions

When the built-in functions aren't enough, you have three options:

Google Apps Script

Apps Script is JavaScript that runs inside Google Workspace. You can write custom functions that make HTTP requests, parse responses, and write data directly to sheet cells. It supports loops, conditionals, error handling, and scheduled triggers: everything the built-in functions lack.

The downside: you still can't render JavaScript, and Google's servers still send recognizable headers. For truly dynamic or protected pages, Apps Script alone isn't enough.

Google Workspace Add-ons

Third-party add-ons extend what Sheets can do. ImportFromWeb, for example, handles JavaScript-rendered pages and supports automatic data refresh. These add-ons run their own infrastructure to fetch and render pages before returning data to your sheet.

Unlimited Sheets: AI_SCRAPE

Unlimited Sheets adds the AI_SCRAPE function directly to Google Sheets. It handles JavaScript-rendered content, complex selectors, and structured data extraction using AI to understand the page rather than just parse HTML. You describe what you want in plain language, and the function returns it.

=AI_SCRAPE("https://example.com/product", "product price and availability")

Pair it with UNLIMITED_AI to process scraped data further (classify it, extract specific fields, summarize it) without leaving Sheets.

Try it with Unlimited Sheets

Use AI_SCRAPE to pull data from JavaScript-heavy pages, sites that block Google, and complex structured content, directly in your spreadsheet with no code required. Free to install.

Install Free Add-on →

Frequently Asked Questions

Which Google Sheets function is best for web scraping?

It depends on what you're scraping. Use IMPORTXML for targeted data extraction from any HTML element using XPath. Use IMPORTHTML for full tables and lists. Use IMPORTDATA for CSV or TSV files. Use IMPORTFEED for RSS and Atom feeds. IMPORTXML is the most flexible and handles the widest range of use cases.

Can Google Sheets scrape JavaScript-rendered pages?

No. The built-in import functions fetch raw HTML and do not execute JavaScript. If a page's content is loaded after the initial HTML response, which is common on React, Angular, and Vue sites, Google Sheets will return empty results or errors. For JavaScript-rendered pages, use a third-party add-on like Unlimited Sheets' AI_SCRAPE, or scrape with Python using Playwright or Selenium.

Why does IMPORTXML fail on Amazon, LinkedIn, or Instagram?

These sites actively block requests that come from Google's servers. Google Sheets sends requests with a recognizable Google user-agent, and major platforms have flagged it and return errors. There's no workaround using built-in functions: the block is server-side and cannot be bypassed by changing the formula.

How often does Google Sheets refresh scraped data?

Google Sheets automatically re-fetches import function data periodically, but the exact frequency is not published and is not configurable. In practice, data typically refreshes every 1–2 hours. You can force a refresh by editing the formula (adding a space and removing it) or by using a helper cell that changes on a schedule. For real-time or precisely scheduled refreshes, Google Apps Script with a time-based trigger gives you full control.

How do I write XPath for IMPORTXML?

The fastest method: open the page in Chrome, right-click the element you want, select Inspect, right-click the highlighted HTML in DevTools, and choose Copy → Copy XPath. Paste the result into your formula. Simplify the XPath by removing overly specific position selectors. Common patterns: //h1/text() for headings, //a/@href for links, //span[@class='price']/text() for elements by class name.

Can Google Sheets scrape data across multiple pages?

Not automatically. Built-in functions target a single URL. To scrape paginated data, you need to create separate formulas for each page URL, either manually or by building a list of paginated URLs in a column and referencing them with cell-based formulas. For true pagination scraping, Google Apps Script (which supports loops) or Python is the better tool.

Should I use Google Sheets or Python for web scraping?

Google Sheets is the right choice for quick, one-off tasks (monitoring a competitor's pricing table, pulling Wikipedia data, tracking RSS feeds) where the data is static and publicly accessible and the audience is a team that uses spreadsheets. Python is better for production pipelines, JavaScript-rendered pages, login-protected content, large datasets, and anything requiring proxy rotation or precise scheduling. Many teams use both: Python to scrape, Google Sheets to display and share the results.

What does the "could not fetch URL" error mean?

The #N/A "could not fetch URL" error means Google Sheets couldn't retrieve the page. Check the URL first: paste it directly into a browser to confirm it loads. If it loads in a browser but not in Sheets, the site is blocking Google's requests. Other causes: the site is temporarily down, the URL returns a non-200 status code, or the page requires authentication to view.

Try it in your spreadsheet

Install Unlimited Sheets and start scraping web data with a single formula. Free to start, no credit card required.

Install Free Add-on
#web scraping  #google sheets  #IMPORTXML  #IMPORTHTML  #data extraction  #no-code  #spreadsheets

Nacho Mascort

Founder at Unlimited Sheets

Spreadsheet power user. Building tools that bring enterprise-grade SEO and data capabilities to Google Sheets.

