pillar|ยท12 min read

Regex in Google Sheets: Complete Guide

Master REGEXMATCH, REGEXEXTRACT, and REGEXREPLACE in Google Sheets with real formula examples, a syntax cheatsheet, and common mistakes to avoid.

NM

Nacho Mascort

Author

Regular expressions unlock a level of text manipulation in Google Sheets that standard string functions simply can't match. Once you understand how to use REGEXMATCH, REGEXEXTRACT, and REGEXREPLACE, you'll find yourself solving data-cleaning problems in a single formula that used to take multiple steps โ€” or weren't feasible at all.

This guide covers everything from syntax basics to ARRAYFORMULA integration, with real formulas you can copy directly into your spreadsheet.

The Three Regex Functions in Google Sheets

Google Sheets provides three built-in regex functions, each with a distinct purpose:

Function Returns Use When
REGEXMATCH(text, pattern) TRUE / FALSE You need to check whether a pattern exists
REGEXEXTRACT(text, pattern) Matched string You need to pull out the matching text
REGEXREPLACE(text, pattern, replacement) Modified string You need to clean or transform text

All three use RE2 syntax โ€” Google's regular expression engine. RE2 is fast and safe, but it intentionally omits lookahead and lookbehind assertions. Keep that in mind if you're coming from Python or JavaScript regex.

REGEXMATCH: Check Whether a Pattern Exists

REGEXMATCH(text, regular_expression) returns TRUE if the pattern matches any part of the text, FALSE if it doesn't.

Syntax and basic example

=REGEXMATCH(A2, "\d+")

This returns TRUE if cell A2 contains at least one digit. The pattern \d+ means "one or more digit characters."

Validate email addresses

=REGEXMATCH(A2, "^[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}$")

This checks whether A2 looks like a valid email. The ^ anchors the match to the start of the string, and $ anchors it to the end โ€” so the entire cell must conform to the pattern, not just a substring.

Match multiple values with OR

=REGEXMATCH(A2, "example\.com|test\.com|demo\.com")

The pipe character | acts as OR in regex. This returns TRUE if A2 contains any of those three domains. Note the escaped dot \. โ€” an unescaped dot matches any character, which is rarely what you want when matching literal periods.

Case-insensitive matching

REGEXMATCH is case-sensitive by default. To match regardless of case, prefix your pattern with (?i):

=REGEXMATCH(A2, "(?i)urgent")

This matches "urgent", "URGENT", "Urgent", and any other capitalization.

Use REGEXMATCH inside FILTER

=FILTER(A2:A100, REGEXMATCH(A2:A100, "(?i)manager"))

Combines FILTER with REGEXMATCH to return only rows where column A contains the word "manager" (case-insensitive). This is one of the most practical patterns in everyday spreadsheet work.

REGEXEXTRACT: Pull Specific Text Out of a String

REGEXEXTRACT(text, regular_expression) returns the first substring that matches the pattern. If nothing matches, it returns a #N/A error.

Extract domain from email address

=REGEXEXTRACT(A2, "@(.+)$")

The parentheses create a capture group. When you use a capture group, REGEXEXTRACT returns what's inside the group rather than the full match. Here, (.+)$ captures everything after the @ sign to end of string โ€” giving you the domain name.

Extract numbers from text

=VALUE(REGEXEXTRACT(A2, "\d+"))

Extracts the first sequence of digits from a string. Wrapping in VALUE() converts the result from text to a number so you can do arithmetic with it. For example, "Order #4521 shipped" becomes 4521.

Extract multiple parts with capture groups

=REGEXEXTRACT(A2, "(\d{4})-(\d{2})-(\d{2})")

When you have multiple capture groups, REGEXEXTRACT spills the results into adjacent cells (it acts like an array formula automatically). Use this to split ISO date strings like "2024-03-15" into separate year, month, and day columns.

Extract URL path segment

=REGEXEXTRACT(A2, "https?://[^/]+(/[^?#]*)")

Captures the path portion of a URL, stopping before any query string or fragment. Useful for analyzing which pages users are visiting when you have raw URL data.

Try it with Unlimited Sheets

Need to extract data from web pages directly into your spreadsheet? AI_SCRAPE from Unlimited Sheets lets you scrape structured content with a single formula โ€” no regex required for complex sites.

Install Free Add-on →

REGEXREPLACE: Clean and Transform Text

REGEXREPLACE(text, regular_expression, replacement) finds all occurrences of the pattern and replaces them with the replacement string. Unlike SUBSTITUTE, which only handles exact string matches, REGEXREPLACE handles patterns.

Remove extra whitespace

=REGEXREPLACE(A2, "\s+", " ")

Replaces any run of whitespace characters (spaces, tabs, newlines) with a single space. More powerful than TRIM() for multi-space issues inside a string.

Strip non-numeric characters

=VALUE(REGEXREPLACE(A2, "[^0-9]", ""))

The [^0-9] pattern matches any character that is NOT a digit. Replacing with an empty string removes everything except numbers. Great for cleaning phone number fields that contain parentheses, dashes, and spaces.

Reorder first/last name

=REGEXREPLACE(A2, "^(\w+)\s(\w+)$", "$2, $1")

Captures first name as group 1 and last name as group 2, then outputs "Last, First". The $1 and $2 in the replacement string refer back to the capture groups.

Mask email addresses

=REGEXREPLACE(A2, "(?i)([a-z0-9._%+\-]+)@", "***@")

Replaces the local part of an email address with asterisks. Useful for anonymizing data before sharing or exporting.

Replace multiple patterns (nested)

=REGEXREPLACE(REGEXREPLACE(A2, "colour", "color"), "favour", "favor")

Nest REGEXREPLACE calls to handle multiple replacements. For large-scale standardization, this approach is cleaner than multiple helper columns.

Essential Regex Syntax Reference

Google Sheets uses RE2 syntax. Here are the patterns you'll use most often:

Pattern Meaning Example
. Any single character c.t matches "cat", "cut", "cot"
\d Any digit (0โ€“9) \d{3} matches exactly 3 digits
\w Word character (letter, digit, underscore) \w+ matches a full word
\s Whitespace \s+ matches one or more spaces
^ Start of string ^http only matches strings starting with "http"
$ End of string \.pdf$ matches strings ending in ".pdf"
[abc] Any character in the set [aeiou] matches any vowel
[^abc] Any character NOT in the set [^0-9] matches non-digits
+ One or more \d+ matches one or more digits
* Zero or more \d* matches zero or more digits
? Zero or one (optional) colou?r matches "color" and "colour"
{n,m} Between n and m repetitions \d{2,4} matches 2 to 4 digits
(abc) Capture group (\w+)@ captures text before @
a|b OR cat|dog matches "cat" or "dog"
(?i) Case-insensitive flag (?i)urgent matches any capitalization

RE2 limitations to know: RE2 doesn't support lookahead ((?=...)), lookbehind ((?<=...)), or backreferences in patterns (only in replacement strings). If you're used to PCRE regex, these are the main things you'll miss.

Combining Regex with ARRAYFORMULA

By default, regex functions operate on a single cell. Wrap them in ARRAYFORMULA to process an entire column at once โ€” no need to drag formulas down.

Apply REGEXMATCH to a whole column

=ARRAYFORMULA(REGEXMATCH(A2:A100, "\d+"))

Returns a column of TRUE/FALSE values for every cell in A2:A100. This is significantly faster than copying the formula down, especially on large datasets.

Extract from a column with ARRAYFORMULA

=ARRAYFORMULA(IF(A2:A100="", "", REGEXEXTRACT(A2:A100, "@(.+)$")))

The IF(A2:A100="", "", ...) wrapper prevents #N/A errors on empty cells โ€” a common gotcha when using REGEXEXTRACT in array context.

Count matches across a range

=SUMPRODUCT(--REGEXMATCH(A2:A100, "(?i)urgent"))

The double-negative -- converts TRUE/FALSE to 1/0, then SUMPRODUCT sums them. This counts how many cells contain "urgent" (case-insensitive) without needing ARRAYFORMULA.

Filter rows based on regex match

=FILTER(A2:C100, REGEXMATCH(B2:B100, "(?i)^(manager|director|vp)"))

Returns all rows from A:C where column B starts with "manager", "director", or "vp". The ^ anchor ensures you're matching job titles at the start of the cell, not substrings elsewhere.

Try it with Unlimited Sheets

Need to apply AI to classify or extract data from messy text columns? UNLIMITED_AI lets you write natural language instructions instead of regex patterns โ€” ideal when patterns are unpredictable.

Install Free Add-on →

Common Regex Mistakes in Google Sheets

Most regex errors in Google Sheets fall into a handful of predictable categories.

Unescaped dots in patterns

A bare . in regex matches any character โ€” not a literal period. This is the most frequent source of unexpected matches.

โŒ =REGEXMATCH(A2, "example.com")   โ€” matches "exampleXcom", "example1com", etc.
โœ… =REGEXMATCH(A2, "example\.com")  โ€” matches only "example.com"

Always escape dots, brackets, parentheses, and other special characters with a backslash when you mean them literally.

Forgetting case sensitivity

REGEXMATCH and REGEXEXTRACT are case-sensitive by default. A pattern like error won't match "Error" or "ERROR". Add (?i) at the start of your pattern when you need case-insensitive matching.

Missing anchors causing partial matches

โŒ =REGEXMATCH(A2, "\d{5}")     โ€” TRUE for "zip: 90210 extended"
โœ… =REGEXMATCH(A2, "^\d{5}$")   โ€” TRUE only if cell is exactly 5 digits

Without anchors, REGEXMATCH returns TRUE if the pattern matches anywhere in the string. Use ^ and $ when you need to validate the entire cell value.

Using regex on numeric cells

Regex functions expect text input. If a cell contains a number (not text), you may get unexpected errors. Convert with TEXT(A2, "0") or concatenate with an empty string A2&"":

=REGEXMATCH(TEXT(A2, "0"), "^\d{4}$")

Not handling #N/A from REGEXEXTRACT

REGEXEXTRACT returns #N/A when the pattern doesn't match โ€” unlike REGEXMATCH which returns FALSE. Wrap with IFERROR to handle no-match cases gracefully:

=IFERROR(REGEXEXTRACT(A2, "@(.+)$"), "no match")

Greedy vs. lazy matching

By default, quantifiers like + and * are greedy โ€” they match as much as possible. This can cause over-matching when extracting content between delimiters.

โŒ =REGEXEXTRACT(A2, "<(.+)>")   โ€” matches everything from first < to LAST >
โœ… =REGEXEXTRACT(A2, "<([^>]+)>") โ€” matches from < to the NEXT >

Use negated character classes like [^>] instead of . to limit how far the match extends.

Real-World Use Cases

Standardize phone numbers

=REGEXREPLACE(A2, "[^0-9]", "")

Strips all non-numeric characters from a phone field. "(555) 867-5309" becomes "5558675309". Combine with a subsequent formatting formula if needed.

Extract UTM parameters from URLs

=REGEXEXTRACT(A2, "[?&]utm_source=([^&#]+)")

Captures the utm_source value from a full URL. Adapt the parameter name to extract utm_medium, utm_campaign, etc. from your analytics export.

Classify emails by domain

=IF(REGEXMATCH(A2, "@gmail\.com|@yahoo\.com|@hotmail\.com"), "Consumer", "Business")

Segments your contact list by email provider type โ€” no VLOOKUP required.

Extract city from an address string

=REGEXEXTRACT(A2, ",\s*([^,]+),\s*[A-Z]{2}\s+\d")

Extracts the city from an address formatted as "123 Main St, Springfield, IL 62701". Adjust the pattern based on your actual data format.

Parse version numbers

=REGEXEXTRACT(A2, "v?(\d+\.\d+\.\d+)")

Extracts semantic version strings like "3.2.1" from text such as "App v3.2.1 released". Useful for maintaining software inventory sheets.

Combining Regex with IF, COUNTIF, and IFS

Regex functions pair naturally with Google Sheets conditional logic. This combination lets you build validation systems and dynamic classifications without helper columns.

Classify data with IF and REGEXMATCH

=IF(REGEXMATCH(A2, "(?i)^(mr|mrs|ms|dr)\.?\s"), "Has Title", "No Title")

Checks whether a name field begins with a common title prefix. The \s at the end ensures there's whitespace after the title โ€” preventing partial matches on words like "Mrs" inside a longer name.

Tier segmentation with IFS and REGEXMATCH

Route contacts into segments based on email domain:

=IFS(
  REGEXMATCH(B2, "@(gmail|yahoo|hotmail|outlook)\.com$"), "B2C",
  REGEXMATCH(B2, "\.edu$"), "Education",
  REGEXMATCH(B2, "\.gov$"), "Government",
  TRUE, "B2B"
)

The final TRUE clause acts as a catch-all. This scales cleanly as you add more segments โ€” just insert additional REGEXMATCH conditions before the default.

Counting pattern matches with SUMPRODUCT

COUNTIF only handles wildcards (* and ?), not regex. Use SUMPRODUCT with REGEXMATCH instead:

=SUMPRODUCT(REGEXMATCH(A2:A100, "(?i)@gmail\.com$") * 1)

Counts cells ending in "@gmail.com" (case-insensitive). The * 1 coerces TRUE/FALSE to 1/0.

Using Regex for Data Validation

Google Sheets data validation rules support a "Custom formula" option, which means you can enforce regex patterns directly on input cells to catch errors at the source.

Setting up regex validation

  1. Select the cell range you want to validate (e.g., B2:B100)
  2. Go to Data → Data validation → Add rule
  3. Set "Criteria" to Custom formula
  4. Enter your REGEXMATCH formula, referencing the first cell in your range
=REGEXMATCH(B2, "^[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}$")

This blocks invalid email addresses at entry time. Users see an error message when they try to enter something that doesn't match the pattern โ€” no cleanup required after the fact.

Validate phone number format

=REGEXMATCH(TEXT(C2,"0"), "^\d{10}$")

Requires exactly 10 digits. The TEXT(C2,"0") ensures numeric cells are converted to strings before the regex check โ€” without it, numeric phone number entries would silently fail validation.

Validate URL format

=REGEXMATCH(D2, "^https?://[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,}(/.*)?$")

Checks that a URL starts with http:// or https://, contains a valid domain, and optionally has a path. Simple but effective for catching obvious copy-paste errors in URL columns.

Smarter Text Splitting with Regex

The built-in SPLIT function works on fixed delimiters. Combining it with REGEXREPLACE gives you control when data uses inconsistent separators.

Handle variable whitespace

=SPLIT(REGEXREPLACE(A2, "\s+", " "), " ")

First collapses all runs of whitespace to a single space, then splits on that space. Without the REGEXREPLACE step, SPLIT would create empty cells wherever double-spaces appear in the source data.

Split on multiple different separators

=SPLIT(REGEXREPLACE(A2, "[,;|]\s*", "|||"), "|||")

Normalizes commas, semicolons, and pipes (with optional surrounding spaces) to a consistent delimiter, then splits on it. Far cleaner than nested SPLIT calls.

Frequently Asked Questions

Does Google Sheets support full PCRE regex?

No. Google Sheets uses RE2, not PCRE. RE2 omits lookahead, lookbehind, and atomic grouping. Most everyday patterns work fine, but if you rely on (?=...) or (?<=...) you'll need to restructure your approach using capture groups instead.

When should I use REGEXMATCH vs REGEXEXTRACT?

Use REGEXMATCH when you only need to know whether a pattern exists (yes/no). Use REGEXEXTRACT when you need the actual matched text. REGEXMATCH is also safer as a FILTER condition because it never returns an error โ€” it always returns TRUE or FALSE.

Can REGEXEXTRACT return multiple matches?

Not directly โ€” it returns the first match only. To extract multiple matches from a single cell, you need capture groups for distinct parts (e.g., year/month/day from a date string), or combine REGEXREPLACE with array techniques to iterate through matches.

How do I make regex case-insensitive?

Prefix your pattern with (?i). For example, =REGEXMATCH(A2, "(?i)error") matches "error", "Error", "ERROR", and every other capitalization.

Why does my backslash escape not work?

In Google Sheets formula strings, the backslash itself needs to be entered as-is โ€” no double-escaping required (unlike some programming languages). Just type \d directly in your pattern string. If you're copying patterns from JavaScript where you'd write \\d, remove the extra backslash.

Do regex formulas slow down large spreadsheets?

They can, especially with complex patterns applied to thousands of rows. Prefer ARRAYFORMULA over dragged-down formulas (fewer recalculation triggers), cache results as plain text where the source data doesn't change, and avoid regex inside deeply nested formulas that recalculate on every edit.

How do I extract all numbers from a cell, not just the first?

REGEXEXTRACT only returns the first match. A common workaround is =REGEXREPLACE(A2, "[^0-9]", "") which removes all non-digits, concatenating any numbers together. If you need them separated, consider splitting the text first or using a helper column approach.

How do I match literal parentheses?

Escape them with a backslash: \( and \). For example, to match a phone number format "(555) 123-4567" use the pattern \(\d{3}\)\s\d{3}-\d{4}.

Try it in your spreadsheet

Install Unlimited Sheets and start tracking keyword rankings with a single formula. Free to start, no credit card required.

Install Free Add-on
#google sheets#regex#REGEXMATCH#REGEXEXTRACT#REGEXREPLACE#formulas#data cleaning
Nacho Mascort

Nacho Mascort

Founder at Unlimited Sheets

Spreadsheet power user. Building tools that bring enterprise-grade SEO and data capabilities to Google Sheets.

Frequently Asked Questions

Common questions about this topic

Does Google Sheets support full PCRE regex?
No. Google Sheets uses RE2, not PCRE. RE2 omits lookahead, lookbehind, and atomic grouping. Most everyday patterns work fine, but if you rely on (?=...) or (?<=...) you'll need to restructure your approach using capture groups instead.
When should I use REGEXMATCH vs REGEXEXTRACT?
Use REGEXMATCH when you only need to know whether a pattern exists (yes/no). Use REGEXEXTRACT when you need the actual matched text. REGEXMATCH is also safer as a FILTER condition because it never returns an error โ€” it always returns TRUE or FALSE.
Can REGEXEXTRACT return multiple matches?
Not directly โ€” it returns the first match only. To extract multiple matches from a single cell, you need capture groups for distinct parts (e.g., year/month/day from a date string), or combine REGEXREPLACE with array techniques to iterate through matches.
How do I make regex case-insensitive in Google Sheets?
Prefix your pattern with (?i). For example, =REGEXMATCH(A2, "(?i)error") matches "error", "Error", "ERROR", and every other capitalization.
Why does my backslash escape not work?
In Google Sheets formula strings, the backslash itself needs to be entered as-is โ€” no double-escaping required. Just type \d directly in your pattern string. If you're copying patterns from JavaScript where you'd write \\d, remove the extra backslash.
Do regex formulas slow down large spreadsheets?
They can, especially with complex patterns applied to thousands of rows. Prefer ARRAYFORMULA over dragged-down formulas, cache results as plain text where source data doesn't change, and avoid regex inside deeply nested formulas that recalculate on every edit.
How do I extract all numbers from a cell, not just the first?
REGEXEXTRACT only returns the first match. A common workaround is =REGEXREPLACE(A2, "[^0-9]", "") which removes all non-digits, concatenating any numbers together. If you need them separated, consider splitting the text first or using a helper column approach.
How do I match literal parentheses in Google Sheets regex?
Escape them with a backslash: \( and \). For example, to match a phone number format "(555) 123-4567" use the pattern \(\d{3}\)\s\d{3}-\d{4}.

Stop overpaying for SEO tools

Get keyword rankings, search volumes, SERP data, and AI โ€” all inside Google Sheets. One add-on, one formula.