URL Regexator

URL Input

Regex Output

Create Regex

Standard regex for multiple different domains

Optimized regex for URLs with the same domain (we highly recommend reading the section "How does it work" in the FAQ before using this option)

Regex Customization

Ignore protocols (http:// + https://)

Ignore trailing slashes (/)

Use double slash escaping (\\.)

Regex end matching changes

Regex start matching changes

FAQ

URL Regexator is a tool for creating a regex from multiple URLs. It helps you create custom segments in Google Analytics and analyze website data. Using Regexator is much faster than building custom spreadsheets in Excel or Google Sheets with many tricky functions.

Paste a dataset of URLs or domains into the first text area. These rows are used to create your final regex. The optimal amount is approximately 1,000 rows. Then, based on your data, choose one of the two options for creating your regex.

Regex from dataset with different domains

The Regex (different domains) button creates a basic regular expression that is longer and less efficient. On the other hand, it is much more foolproof and reliable. Let's look at an example with four URLs: four different subdomains, four different paths, one domain name, and 109 characters of input.

  • www.link-brain.com/page
  • test.link-brain.com/contact
  • alpha.link-brain.com/resources
  • omega.link-brain.com/tools

The final regex is 137 characters long. Some parts are repeated over and over, wasting a lot of space. Specifically, it's the domain. This type of expression is best suited to datasets with multiple different domains.
^www\.link\-brain\.com\/page.*|^test\.link\-brain\.com\/contact.*|^alpha\.link\-brain\.com\/resources.*|^omega\.link\-brain\.com\/tools.*
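The construction above can be sketched in Python. This is a minimal sketch under assumptions, not URL Regexator's actual code. The helper names `escape_url` and `standard_regex` are hypothetical. The escaping rule (a backslash before `.`, `-`, `/` and other regex metacharacters) is inferred from the output shown above.

```python
import re

# Characters escaped with a backslash. Inferred from the example output
# (\. \- \/); the tool's real escaping set is an assumption.
SPECIAL_CHARS = set(r".^$*+?()[]{}|\/-")


def escape_url(text: str) -> str:
    """Prefix every special character with a backslash (hypothetical helper)."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in text)


def standard_regex(urls: list[str], ending: str = ".*") -> str:
    """Build one ^-anchored alternative per URL and join them with '|'."""
    rows = [u.strip() for u in urls if u.strip()]  # trim whitespace, skip empty rows
    return "|".join("^" + escape_url(row) + ending for row in rows)


urls = [
    "www.link-brain.com/page",
    "test.link-brain.com/contact",
    "alpha.link-brain.com/resources",
    "omega.link-brain.com/tools",
]
pattern = standard_regex(urls)
print(pattern)       # the 137-character regex shown above
print(len(pattern))  # 137
print(bool(re.search(pattern, "test.link-brain.com/contact/form")))  # True
```

The domain text appears once per alternative. That repetition is where the extra length comes from.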

Regex from dataset with the same domain

The second option, triggered by the Regex (same domain) button, is useful for a dataset of URLs from a single domain. It is more efficient and saves a lot of characters. Unfortunately, it is trickier. You should be pretty damn sure you know precisely what you are doing and know your data inside out.

First, you need to specify your domain name in the text box below the button. This is necessary because there is no way to detect the domain with 100% certainty. There are several ways to specify the domain, and we'll get to that later. For now, let's input the same dataset of URLs as before, type link-brain.com into the domain detection field, and check the result:
^(www\.|test\.|alpha\.|omega\.)link\-brain\.com(\/page|\/contact|\/resources|\/tools).*

At only 87 characters, it is more compact, but there is still some room for improvement. In the domain text box, you can specify more or less detail: type a dot before the domain and/or a trailing slash after it. This can further reduce the length of the expression, but it also changes the regex matching logic. Let's check a few more quick examples.

Now type the domain with a leading dot (.link-brain.com). This saves six characters, for a total regex length of 81 characters:
^(www|test|alpha|omega)\.link\-brain\.com(\/page|\/contact|\/resources|\/tools).*

Next, try a leading dot and a trailing slash (.link-brain.com/). This saves another six characters, for a total regex length of 75 characters:
^(www|test|alpha|omega)\.link\-brain\.com\/(page|contact|resources|tools).*
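The three results above can be reproduced with a short Python sketch. This is a reconstruction under assumptions, not the tool's own code. It splits each URL around the first occurrence of the typed domain text. The part before the domain goes into the first alternation group, and the part after it goes into the second. `same_domain_regex`, `group` and `escape_url` are hypothetical names. The sketch drops empty parts, so if only some URLs had a subdomain, the URLs without one would not be matched. That is one reason this mode needs care with mixed data.

```python
# Escaping rule inferred from the examples (\. \- \/); an assumption.
SPECIAL_CHARS = set(r".^$*+?()[]{}|\/-")


def escape_url(text: str) -> str:
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in text)


def group(parts: list[str]) -> str:
    """Alternation group of unique, non-empty parts; '' if nothing is left."""
    unique = list(dict.fromkeys(p for p in parts if p))  # dedupe, keep order
    if not unique:
        return ""
    return "(" + "|".join(escape_url(p) for p in unique) + ")"


def same_domain_regex(urls: list[str], domain: str, ending: str = ".*") -> str:
    prefixes, suffixes = [], []
    for url in (u.strip() for u in urls):
        before, found, after = url.partition(domain)  # split at first occurrence
        if not found:  # in this sketch, URLs without the domain text are skipped
            continue
        prefixes.append(before)
        suffixes.append(after)
    return "^" + group(prefixes) + escape_url(domain) + group(suffixes) + ending


urls = [
    "www.link-brain.com/page",
    "test.link-brain.com/contact",
    "alpha.link-brain.com/resources",
    "omega.link-brain.com/tools",
]
for domain in ["link-brain.com", ".link-brain.com", ".link-brain.com/"]:
    pattern = same_domain_regex(urls, domain)
    print(len(pattern), pattern)  # lengths 87, 81 and 75
```

Moving the dot and the slash into the domain text removes them from every alternative. That is why each step saves one character per URL plus the corresponding escape backslash.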

Empty rows, empty groups, whitespaces

All empty rows are ignored automatically, and so are empty groups: instead of a regex containing an empty group, you get a clean expression. Leading and trailing whitespace is also trimmed automatically.
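Here is a minimal Python sketch of this clean-up, with several assumptions. `clean_rows` and `group` are hypothetical helpers. Omitting the whole group when no non-empty alternatives remain is one reading of "empty groups are ignored", not confirmed tool behavior.

```python
# Escaping rule inferred from the examples (\. \- \/); an assumption.
SPECIAL_CHARS = set(r".^$*+?()[]{}|\/-")


def escape_url(text: str) -> str:
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in text)


def clean_rows(raw_input: str) -> list[str]:
    """Split the text area content into rows, trim whitespace, drop empty rows."""
    return [line.strip() for line in raw_input.splitlines() if line.strip()]


def group(parts: list[str]) -> str:
    """Return an alternation group, or '' instead of an empty group '()'."""
    unique = list(dict.fromkeys(p for p in parts if p))
    return "(" + "|".join(escape_url(p) for p in unique) + ")" if unique else ""


raw = "\n  link-brain.com/page  \n\n link-brain.com/contact\n   \n"
rows = clean_rows(raw)
print(rows)  # ['link-brain.com/page', 'link-brain.com/contact']

# No subdomains in this dataset, so the prefix group is omitted entirely.
prefixes = [row.partition("link-brain.com")[0] for row in rows]
suffixes = [row.partition("link-brain.com")[2] for row in rows]
print("^" + group(prefixes) + r"link\-brain\.com" + group(suffixes) + ".*")
# ^link\-brain\.com(\/page|\/contact).*
```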

Final tip

How you use this tool depends on your data and how well you know it. Some understanding of regular expressions is necessary. URL Regexator is simple automation for more advanced users, so you should know at least a little about what you are doing and how it all works.

You can customize your final regex with two checkboxes. One ignores protocols (http:// and https://) at the beginning of your URLs, and the other ignores trailing slashes at the end.

Generally, protocols are neither useful nor necessary for matching. Whether to ignore trailing slashes depends on the specific situation. It's up to you.

By default, both options are inactive. If you want to use them, you must check them before clicking on any Regex creation button.
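The two options can be pictured as a pre-processing step, sketched here in Python. This assumes, without confirmation from the tool itself, that "ignore" means stripping these parts from each input row before the regex is built. `preprocess` and its parameters are hypothetical names.

```python
import re


def preprocess(url: str, ignore_protocols: bool = False,
               ignore_trailing_slash: bool = False) -> str:
    """Hypothetical pre-processing of one input row before regex creation."""
    url = url.strip()
    if ignore_protocols:
        url = re.sub(r"^https?://", "", url)  # drop a leading http:// or https://
    if ignore_trailing_slash and url.endswith("/"):
        url = url[:-1]  # drop a single trailing slash
    return url


# Both options inactive (the default): the row is kept as typed.
print(preprocess("https://www.link-brain.com/page/"))
# https://www.link-brain.com/page/

# Both options checked before creating the regex.
print(preprocess("https://www.link-brain.com/page/", True, True))
# www.link-brain.com/page
```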

Wildcard matching at the end (.*)

You can customize the end of your regex. By default, the ending characters are .*, which makes the regex open-ended. After the part you specified, any sequence of characters (including none at all) will be matched. Let's check a few examples.

The regex ^link\-brain\.com\/page.* matches all of the following URLs:

  • link-brain.com/page
  • link-brain.com/page/
  • link-brain.com/page=123
  • link-brain.com/pagespeed
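You can verify this with Python's standard `re` module. This is only a sketch: `re.search` stands in for the matching engine of Google Analytics or any other tool, whose regex flavor may differ in details.

```python
import re

pattern = r"^link\-brain\.com\/page.*"
urls = [
    "link-brain.com/page",
    "link-brain.com/page/",
    "link-brain.com/page=123",
    "link-brain.com/pagespeed",
]

# ^ pins the match to the start; .* then accepts any remaining characters,
# including none, so every URL above matches.
matched = [url for url in urls if re.search(pattern, url)]
print(matched == urls)  # True
```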

Exact matching at the end ($)

You can change .* to $. The dollar sign is an anchor used at the end of the regex. It marks the end of the string, so only URLs with an exact match are selected.

Let's change the example regex to ^link\-brain\.com\/page$. Of the URLs above, only one is matched:

  • link-brain.com/page (matched)
  • link-brain.com/page/ (not matched)
  • link-brain.com/page=123 (not matched)
  • link-brain.com/pagespeed (not matched)
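Here is the same check in Python with the $ anchor. As before, `re.search` is a stand-in for your analytics tool's matcher.

```python
import re

pattern = r"^link\-brain\.com\/page$"  # $ = the string must end right here
urls = [
    "link-brain.com/page",
    "link-brain.com/page/",
    "link-brain.com/page=123",
    "link-brain.com/pagespeed",
]

matched = [url for url in urls if re.search(pattern, url)]
print(matched)  # ['link-brain.com/page']
```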

Exact matching at the start (^)

The next character to explain is also an anchor. The caret ^ marks the beginning of the string, so strings that start differently won't be matched. Let's use the open-ended regex ^link\-brain\.com\/page.* again and slightly modify some of the URLs:

  • link-brain.com/page (matched)
  • www.link-brain.com/page/ (not matched)
  • link-brain.com/page=123 (matched)
  • brain.com/pagespeed (not matched)
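In Python, the caret filters the modified list like this. `re.search` scans the whole string, so only the ^ anchor enforces the start. As before, this is a stand-in for your analytics tool's matcher.

```python
import re

pattern = r"^link\-brain\.com\/page.*"
urls = [
    "link-brain.com/page",
    "www.link-brain.com/page/",
    "link-brain.com/page=123",
    "brain.com/pagespeed",
]

# Without the ^ anchor, re.search would also accept "www.link-brain.com/page/".
matched = [url for url in urls if re.search(pattern, url)]
print(matched)  # ['link-brain.com/page', 'link-brain.com/page=123']
```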

Wildcard matching at the start (.*)

You can also replace the caret with .* to perform a wildcard match at the start. Here is a quick example with the regex .*link\-brain\.com\/page.* and the same URLs:

  • link-brain.com/page (matched)
  • www.link-brain.com/page/ (matched)
  • link-brain.com/page=123 (matched)
  • brain.com/pagespeed (not matched)
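Here is the wildcard start in Python. Because `re.search` already looks for a match anywhere in the string, a leading .* behaves here as if there were no start anchor at all. Other tools' matchers may behave differently.

```python
import re

pattern = r".*link\-brain\.com\/page.*"
urls = [
    "link-brain.com/page",
    "www.link-brain.com/page/",
    "link-brain.com/page=123",
    "brain.com/pagespeed",  # lacks "link-", so it can never match
]

matched = [url for url in urls if re.search(pattern, url)]
print(matched)
# ['link-brain.com/page', 'www.link-brain.com/page/', 'link-brain.com/page=123']

# With re.search, dropping the leading .* gives the same result.
unanchored = r"link\-brain\.com\/page.*"
print(matched == [url for url in urls if re.search(unanchored, url)])  # True
```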

In theory, there are no limits, but it's a good idea to use this tool for approximately 1,000 URLs. With 10,000 URLs, it will be noticeably slower, and above that, a single task can take 60 seconds or more.

If you need to process more data, try Excel, Google Sheets, or another spreadsheet application. URL Regexator is suitable for up to 1,000-10,000 URLs. Also note that Google Analytics has its own limits. A regex built from 1,000 URLs is quite a lot, and there is a good chance GA won't handle it well.

This tool does not gather personal data, inputs or outputs. Usage is fully anonymous, free of charge and free of ads.

info@entrop.ee