GET_URLS_FROM_SITEMAP
Extract all URLs from an XML sitemap, handles sitemap indexes recursively
=GET_URLS_FROM_SITEMAP(sitemapUrl)Returns: string[][]
Overview
GET_URLS_FROM_SITEMAP fetches and parses an XML sitemap, extracting every URL listed within it and returning them as a vertical list in your Google Sheet. This function is a cornerstone of any technical SEO workflow, enabling you to quickly inventory all the pages a website has submitted to search engines for indexing.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
sitemapUrl | string | Yes | The full URL to an XML sitemap or sitemap index file |
Examples
Extract URLs from a standard sitemap
Fetches the sitemap and returns all URLs as a vertical list starting from the cell where the formula is entered.
=GET_URLS_FROM_SITEMAP("https://example.com/sitemap.xml")Output
| https://example.com/ |
| https://example.com/about |
| https://example.com/contact |
| https://example.com/blog |
Parse a sitemap index with multiple sitemaps
Automatically detects the sitemap index, follows each child sitemap, and returns all URLs from every sitemap combined.
=GET_URLS_FROM_SITEMAP("https://example.com/sitemap_index.xml")Output
| https://example.com/ |
| https://example.com/products/widget-a |
| https://example.com/products/widget-b |
| https://example.com/blog/post-1 |
Extract from a specific sub-sitemap
Targets a specific sitemap file (e.g., blog posts only) to get a focused list of URLs for a particular content type.
=GET_URLS_FROM_SITEMAP("https://example.com/post-sitemap.xml")Output
| https://example.com/blog/seo-guide |
| https://example.com/blog/keyword-research |
| https://example.com/blog/link-building |
Use Cases
Complete Site Inventory
Pull every indexed URL from a client site before starting an audit. Cross-reference with Google Search Console data to find pages that are indexed but not in the sitemap, or in the sitemap but not indexed.
Content Audit and Pruning
Extract all article URLs from a news sitemap to evaluate content performance, identify thin or outdated pages for pruning, and prioritize content refresh efforts.
Competitor Content Analysis
Analyze a competitor sitemap to understand their site structure, content volume, and URL patterns. Discover content gaps and opportunities by comparing their pages against yours.
Migration Pre-Check
Before migrating a site, extract all URLs from the current sitemap to create a redirect mapping spreadsheet, ensuring every existing page has a corresponding destination on the new site.
E-commerce Catalog Monitoring
Regularly extract product URLs from the sitemap to monitor catalog size, detect accidentally removed products, and ensure all new products are properly included in the sitemap.
International SEO Coverage Check
Parse sitemaps for each language version of a website to verify that all pages have been translated and included, identifying missing pages across different locales.
Pro Tips
After extracting URLs, use GET_STATUS_CODE on each one to create a comprehensive site health report that identifies broken pages, redirect chains, and server errors.
Combine with SCRAPE to extract the title tag and meta description from each URL, building a complete on-page SEO audit spreadsheet.
Use COUNTIF formulas on the extracted URLs to analyze your site structure. Count URLs by directory (e.g., /blog/, /products/) to understand content distribution.
Compare sitemap extractions over time to track how your site is growing and detect any pages that were accidentally removed from the sitemap.
Check your competitors sitemaps to discover their content strategy and find topic opportunities you might be missing.
One of the most powerful features of this function is its ability to handle sitemap index files recursively. Many large websites split their sitemaps into multiple files referenced by a single sitemap index. When you provide a sitemap index URL, the function automatically detects it, follows each referenced sitemap, and extracts URLs from all of them, returning a complete list without any manual effort.
This function pairs exceptionally well with other Unlimited Sheets functions. Once you have extracted the full list of URLs, you can use GET_STATUS_CODE to audit each page for errors, SCRAPE to extract on-page SEO elements like titles and meta descriptions, or GET_SEARCH_VOLUME_FROM_GOOGLE to research keywords related to each page. This makes it possible to build comprehensive SEO audits entirely within Google Sheets.
The function supports standard XML sitemaps conforming to the sitemaps.org protocol, including sitemap indexes, regular sitemaps, and news sitemaps. It handles compressed (gzip) sitemaps when the server provides proper content encoding. For websites with tens of thousands of URLs, the results are returned progressively as the sitemaps are parsed.
Common Errors
#ERROR!Cause: The URL does not point to a valid XML sitemap, or the server blocked the request.
Fix: Verify the URL leads to a valid XML sitemap by opening it in your browser. Ensure the URL ends in .xml and the server responds with XML content.
#VALUE!Cause: The provided URL is not properly formatted or is missing the protocol.
Fix: Make sure the URL includes "https://" or "http://" and is a complete, valid URL pointing to the sitemap file.
Frequently Asked Questions
Yes, the function automatically detects sitemap index files and recursively fetches all child sitemaps, returning a combined list of every URL found across all sitemaps.
The function can handle sitemaps of any size. However, Google Sheets has a cell limit, so extremely large sitemaps (100,000+ URLs) may need to be processed in segments or exported to a separate sheet.
Yes, you can extract URLs from any publicly accessible sitemap. Most websites make their sitemaps publicly available at /sitemap.xml or reference them in their robots.txt file.
If no sitemap exists at the specified URL, the function will return an error. Try checking the site robots.txt file (usually at /robots.txt) for the sitemap location, or look for common paths like /sitemap.xml, /sitemap_index.xml, or /sitemap/
The function focuses on extracting the URL list for maximum simplicity and usability. The URLs returned are the <loc> values from the sitemap XML.
Most websites place their sitemap at /sitemap.xml. You can also check the robots.txt file (e.g., https://example.com/robots.txt) which often contains a Sitemap: directive pointing to the sitemap location.
Yes, the function can handle gzip-compressed sitemaps (.xml.gz) as long as the server provides proper content encoding headers.
Related Functions
Start using GET_URLS_FROM_SITEMAP today
Install Unlimited Sheets to get GET_URLS_FROM_SITEMAP and 41 other powerful functions in Google Sheets.