At first glance, regular expressions appear difficult and complex, but learning the basics is a lot simpler than you may think. Knowing the basics will enhance your audits, analyses, and tasks so you can provide better data-driven recommendations.
Learning the ins and outs of web development is valuable for any digital marketer because the concepts and environments overlap. But expanding on those skills isn’t the easiest or quickest thing to do. One skill that web developers and programmers use, and that digital marketers and SEOs in particular can benefit from, is regular expressions.
Regular expressions will take your analysis to the next level because they are usable in tools like Google Analytics and Screaming Frog. They will also be usable in Google Search Console once Google rolls out the regular expression filter for its performance reports.
Today we’ll cover the basics of regex to get you up to speed and even walk through a few use case examples that will take your analyses to the next level.
What Is Regex (Regular Expressions)?
A regular expression is a string of characters that is used for matching or managing text.
Escaping Special Characters
The backslash in regex escapes special characters. Escaping is important because metacharacters carry a special meaning inside a pattern, so if you are searching for one of them as literal text, you’ll need to escape it. Metacharacters that need to be escaped include: . ^ $ * + ? ( ) [ ] { } | \
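As a small sketch of why escaping matters, the snippet below uses Python's re module purely for illustration (the tools covered later run their own regex engines, which may differ in details). The file name and query string are made-up examples.

```python
import re

# Unescaped, the period is a wildcard, so it also matches "indexXhtml".
unescaped = re.compile(r"index.html")
# Escaped, the period only matches a literal ".".
escaped = re.compile(r"index\.html")

loose_hit = unescaped.search("indexXhtml") is not None
strict_hit = escaped.search("indexXhtml") is not None
literal_hit = escaped.search("index.html") is not None

# re.escape() adds the backslashes for you when the text comes from elsewhere.
auto_escaped = re.escape("price?id=1")
auto_hit = re.fullmatch(auto_escaped, "price?id=1") is not None

print(loose_hit, strict_hit, literal_hit, auto_hit)
```

If you paste a URL or query into a pattern and get more matches than expected, an unescaped period or question mark is a likely culprit.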
Matching Any Character
The period is a wildcard character for matching any character except for the newline character.
Matching Digits (0-9)
\d matches one digit from 0 to 9.
\D matches one character that is NOT a digit.
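A quick way to see the digit and non-digit classes side by side is sketched below, assuming Python's re module as the test bed and a made-up sample string.

```python
import re

text = "Q3 2021"

# \d picks out each single digit.
digits = re.findall(r"\d", text)
# \D picks out every character that is not a digit (the space counts too).
non_digits = re.findall(r"\D", text)

print(digits)
print(non_digits)
```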
Matching Word Characters (a-z, A-Z, 0-9, _)
\w matches one word character: a letter, a digit, or an underscore.
Matching Non-Word Characters
\W matches one character that is NOT a word character (not a letter, digit, or underscore).
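The word and non-word classes can be tried out the same way. This is a sketch in Python's re module with a hypothetical slug; the re.ASCII flag is used so that the word class is limited to a-z, A-Z, 0-9, and the underscore, as described above.

```python
import re

slug = "seo_tips-2021!"

# re.ASCII limits \w to a-z, A-Z, 0-9 and _; without it, Python also
# counts accented and other Unicode letters as word characters.
word_chars = re.findall(r"\w", slug, flags=re.ASCII)
non_word_chars = re.findall(r"\W", slug, flags=re.ASCII)

print("".join(word_chars))
print(non_word_chars)
```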
Matching Whitespace (newline, space, tab)
\s matches one whitespace character: space, tab, newline, carriage return, or vertical tab.
\S matches one character that is NOT a whitespace character.
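To illustrate the whitespace classes, here is a minimal sketch in Python's re module. The search query is invented for the example.

```python
import re

query = "red shoes\tsale\n"

# \s finds the space, the tab, and the trailing newline.
whitespace = re.findall(r"\s", query)
# \S finds everything else, one character at a time.
visible = re.findall(r"\S", query)
# Replacing each whitespace character is a common cleanup step.
hyphenated = re.sub(r"\s", "-", query.strip())

print(whitespace)
print(hyphenated)
```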
\b matches a word boundary: a position that has a word character on one side and a non-word character, or the start or end of the string, on the other.
\B matches all positions where a word boundary does NOT match.
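Word boundaries are easier to grasp with a concrete case. The sketch below, written with Python's re module and a made-up word list, counts how often "cat" appears as a whole word, at the start of a word, and strictly inside a word.

```python
import re

text = "cat category concat educate"

# Boundaries on both sides: only the standalone word qualifies.
whole_word = re.findall(r"\bcat\b", text)
# Boundary on the left only: "cat" and "category" qualify.
starts_word = re.findall(r"\bcat", text)
# No boundary on either side: only the "cat" inside "educate" qualifies.
inside_word = re.findall(r"\Bcat\B", text)

print(len(whole_word), len(starts_word), len(inside_word))
```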
Start of a String
The caret (^) specifies that the following pattern must begin at the first character of the string.
End of a String
The dollar sign ($) specifies that the preceding pattern must end at the last character of the string.
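The two anchors can be sketched with a pair of small helpers. The function names and URL paths below are hypothetical, and Python's re module is used only for illustration.

```python
import re

def starts_with_blog(path):
    # ^ pins the pattern to the start of the string.
    return re.search(r"^/blog", path) is not None

def ends_with_pdf(path):
    # $ pins the pattern to the end of the string; the period is escaped.
    return re.search(r"\.pdf$", path) is not None

print(starts_with_blog("/blog/post"), starts_with_blog("/news/blog"))
print(ends_with_pdf("/files/guide.pdf"), ends_with_pdf("/files/guide.pdf?x=1"))
```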
Character Sets and Logic
Matching Characters within Brackets
Square brackets ([ ]) match one of the characters within.
Matching Characters NOT within Brackets
A caret at the start of the brackets ([^ ]) matches one character that is NOT within them.
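Here is a short sketch of both bracket forms, again using Python's re module and an invented sample string.

```python
import re

text = "gray grey groy"

# [ae] accepts either an "a" or an "e" in that position.
either_spelling = re.findall(r"gr[ae]y", text)
# [^ae] accepts any single character except "a" or "e".
neither_spelling = re.findall(r"gr[^ae]y", text)

print(either_spelling)
print(neither_spelling)
```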
The vertical bar or pipe symbol (|) is a logical character for “OR”.
Parentheses ( ) create a group.
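The pipe and the group are usually combined, so that the "OR" applies only to the part inside the parentheses. A minimal sketch in Python's re module, with hypothetical paths:

```python
import re

# The group limits the OR to "blog" or "news"; the slashes stay outside it.
section = re.compile(r"^/(blog|news)/")

match = section.search("/news/regex-launch")
captured = match.group(1) if match else None
other = section.search("/shop/regex-mug")

print(captured)
print(other)
```

The group also captures whichever alternative matched, which is what extraction tools typically hand back to you.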
Zero or More (0+)
The asterisk (*) looks for the specified character/group zero or more times.
One or More (1+)
The plus symbol (+) looks for the specified character/group one or more times.
Zero or One (0 | 1)
The question mark (?) looks for the specified character/group zero or one time.
The curly brackets ({n}) look for the specified character/group exactly the number of times specified.
Range of Numbers (Min, Max)
A comma within the curly brackets ({min,max}) looks for the specified character/group a number of times within the range specified.
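All five quantifiers can be compared in one sketch. The helper name full is hypothetical, and Python's re.fullmatch is used so that each test string has to match from start to finish.

```python
import re

def full(pattern, text):
    # True only when the whole text matches the pattern.
    return re.fullmatch(pattern, text) is not None

star_zero = full(r"go*gle", "ggle")       # * accepts zero o's
plus_zero = full(r"go+gle", "ggle")       # + needs at least one o
plus_many = full(r"go+gle", "google")
optional_u = full(r"colou?r", "color")    # ? makes the u optional
with_u = full(r"colou?r", "colour")
exact_four = full(r"\d{4}", "2021")       # exactly four digits
too_short = full(r"\d{4}", "21")
in_range = full(r"\d{2,4}", "123")        # between two and four digits
too_long = full(r"\d{2,4}", "12345")

print(star_zero, plus_zero, plus_many, optional_u, with_u)
print(exact_four, too_short, in_range, too_long)
```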
Printable Regex Cheat Sheet
To help you quickly remember each of the basic regular expressions above, we’ve created a free printable regex cheat sheet:
SEO Use Cases for Regular Expressions
Using regex can be daunting, and knowing where to start can seem muddled when you’re starting off. To help you find ways to use regex in your day-to-day work, we’ve outlined some of the most popular and practical use cases.
Screaming Frog is a powerful SEO tool that allows you to crawl websites. Data like URLs, site structure, file types, scripts, and everything else that makes up a site can be collected and then analyzed. Sometimes there is specific data you want extracted or certain pages you want to pay attention to during your crawl. This is where custom extraction and the include/exclude features come in.
The custom extraction function in Screaming Frog is primarily used for data extraction. You have the option to select regex as a method for scraping data.
An example of using this feature with regex is to extract the dates of your articles. Sometimes your date isn’t shown on the page and can’t be extracted using a CSS class. In a CMS like WordPress, the date of publishing is usually set with the DateTime object, so we can reference that when extracting dates.
In Screaming Frog, navigate to “Configuration” > “Custom” > “Extraction”, then add a new extractor method and set it to “Regex”. From there, you would enter the following regular expression:
What this regular expression is looking for is the DateTime object and the date in ISO format within it. Notice how the group (wrapped in parentheses) starts by looking for four digits followed by a hyphen, then another two digits, another hyphen, and the last two digits. Screaming Frog pulls the group as the custom extraction, leaving you with a clean date that you can reformat when you export and open it in spreadsheet software.
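To make the description concrete, here is a sketch of a pattern with that shape, tested in Python's re module. Both the pattern and the HTML snippet are assumptions for illustration: they follow the four-two-two digit structure described above, but your theme's markup, and the exact expression you enter in Screaming Frog, may differ.

```python
import re

# Hypothetical markup; real themes vary in how they print the date.
html = '<time class="entry-date" datetime="2021-04-15T09:30:00+00:00">April 15, 2021</time>'

# Assumed pattern: the parentheses capture four digits, a hyphen,
# two digits, another hyphen, and the last two digits.
pattern = re.compile(r'datetime="(\d{4}-\d{2}-\d{2})')

match = pattern.search(html)
published = match.group(1) if match else None

print(published)
```

Only the text inside the group is returned, which is why the time and timezone that follow the date are left out.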
Depending on the site you are crawling, using Screaming Frog can take a long time or may need more computer processing resources than you currently have. Fortunately, it has the include/exclude feature to limit and focus your crawl. You can choose which subdomains or subdirectories to include or exclude from your crawl.
An example of using regular expressions in this feature would be to selectively crawl the blog and category pages of a site. You can set different URL-matching regular expressions on different lines, but we’ll use logic in place of that today.
For this site, let’s say we are looking to pay attention to just the /category/ pages and /learn/ pages so we can audit our knowledge center. The regular expression would look like:
If you were to set it up on different lines (without the logic) it would look like this:
The period following the URL matches any character, and the asterisk repeats that match zero or more times. The pipe or vertical line in between “learn” and “category” looks for any subfolder that matches one OR the other.
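An include rule of this kind can be checked against a handful of URLs before you start a crawl. The sketch below uses Python's re module with a placeholder domain (www.example.com) and an assumed pattern built from the pieces described above; it treats a URL as included only when the whole URL matches, which is an assumption about how the tool applies the rule.

```python
import re

# Assumed include pattern: either subfolder, then any characters after it.
include = re.compile(r"https://www\.example\.com/(learn|category)/.*")

# Hypothetical URLs discovered during a crawl.
urls = [
    "https://www.example.com/learn/regex-basics",
    "https://www.example.com/category/seo/",
    "https://www.example.com/about/",
]

crawled = [url for url in urls if include.fullmatch(url)]

print(crawled)
```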
Google Analytics is a powerful tool for showing you various metrics and valuable information, and using regular expressions with it is a must.
You can use regular expressions to set table filters to view the data you want to focus on.
An example of using table filters is looking at the source of traffic. Maybe you want to include or exclude certain sources when looking at how your traffic was acquired.
The regex above looks for any sources in Google Analytics that are from Google OR Bing.
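A filter like this can be sketched with a single pipe. The pattern google|bing and the list of sources below are assumptions for illustration, run through Python's re module rather than Google Analytics itself.

```python
import re

# Assumed filter: keep any source containing "google" OR "bing".
source_filter = re.compile(r"google|bing")

# Hypothetical traffic sources.
sources = ["google", "bing", "duckduckgo", "newsletter"]

included = [s for s in sources if source_filter.search(s)]
excluded = [s for s in sources if not source_filter.search(s)]

print(included)
print(excluded)
```

The same pattern serves both cases: an include filter keeps the first list, and an exclude filter keeps the second.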
Custom segments are a great way to look at different sets of your data. Regular expressions are usable within the conditions you set and are a quicker way to define them. A common regular expression here, similar to the table filters above, uses the pipe or vertical line.
An example of using regex for a custom segment would be to match specific countries to see how many users, sessions, or any other GA metric came from those specified countries.
The regular expression above is matching any sessions that came from Australia, Canada, or the United States.
We can leverage regular expressions when setting up goals in Google Analytics. You would specify this under “Goal details” by selecting “Regular expression”.
For this example, we’ll set up a goal to track the total number of website actions. Let’s say we have a site that has options for users to download, donate, and purchase items. The URLs for this would be as follows:
To set the goal to track each of these, we would enter the following regular expression under “Goal details”:
The regex above looks to match any visit to a URL containing “thank-you-”, followed by “download” OR “purchase” OR “donate”, with “.html” at the end.
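A pattern with that structure can be sanity-checked before you save the goal. In the sketch below, the pattern and the page paths are assumptions that follow the description above, and Python's re module stands in for the Google Analytics matcher.

```python
import re

# Assumed goal pattern: "thank-you-", one of three actions, then ".html".
goal = re.compile(r"/thank-you-(download|purchase|donate)\.html")

# Hypothetical page paths from a day of traffic.
page_views = [
    "/thank-you-download.html",
    "/pricing.html",
    "/thank-you-donate.html",
    "/thank-you-subscribe.html",
    "/thank-you-purchase.html",
]

completions = [path for path in page_views if goal.search(path)]

print(len(completions))
```

Escaping the period before "html" keeps the pattern from matching unintended paths where some other character sits in that position.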
Google Search Console
It would be great if we could use regular expressions in Google Search Console since we could better strip branded queries from our analysis. Luckily, regular expressions in Google Search Console are coming soon! We’ll update this with examples when they roll out.
Technical SEO Tasks
Regular expressions are also a great way to accomplish a number of technical SEO tasks. Some tasks that you can accomplish with regex are:
- Optimize Long URLs
- Correct Redirects
- Resolve Canonical URL Issues
You would need to have existing knowledge of the Apache mod_rewrite module and .htaccess files. For the sake of this guide, we’ll skip over those details as they are rather complex. Stay tuned for another post on this!
To help you further your knowledge on regular expressions, we’ve put together the following regex resources.
Learn more about regular expressions with this online resource focused on regex.
Regular Expressions 101
Test and debug your regular expressions with this tool.
Regular expressions are a surefire way to take your SEO skills to the next level. Even the basics are worthwhile to remember as they’ll be useful in your day to day.
- Regex is used for matching and managing text.
- There are characters, anchors, character sets, logic, and quantifiers in regex.
- Tools like Screaming Frog, Google Analytics, and, one day, Google Search Console can be used with regular expressions.