Getting Started with RegexBuddy

This section provides a brief overview of what you can do with RegexBuddy. It won’t really try to explain anything. RegexBuddy is quite straightforward to use, so you can jump right in. For more details, click on the links in the text below.

By default, RegexBuddy shows the regular expression and regex history at the top. The bottom area shows eight tabs: Create, Convert, Test, Debug, Use, Library, GREP and Forum. If you have a large monitor, you can arrange the tabs side by side in two groups as shown below. To do so, click the View button in the toolbar. It’s the third button from the right in the topmost toolbar. Then select Side by Side Layout in the View menu. If you have two monitors, the Dual Monitor Side by Side layout gives you the largest possible view. You can also rearrange the tabs manually by dragging and dropping them with the mouse. Panels can be tabbed, docked or floating. Toolbars can also be rearranged and made to float.

The screen shot below shows RegexBuddy in side by side layout in its full glory. Other screen shots in this help file will be smaller with most of the panels and toolbars hidden, to keep the file size and download time reasonable. Read on below the screen shot to learn how to create your first regular expression with RegexBuddy.

RegexBuddy in side by side layout

If at First You Don’t Succeed: Cheat

Everything is easier if you cheat, so we’ll start with that. While RegexBuddy is designed to help you create and test regular expressions, and learn everything about them, it also comes with a handy library of regular expressions that you’ll find useful in many situations. To access it, simply click on the Library panel. Click on a regex that interests you, push the Use button, and pick Use Regex and Test Subject.

RegexBuddy explains how this regular expression works on the Create panel. Each node in the tree corresponds to one elementary piece of the regular expression, called a token. If you click on a node, the corresponding token is selected in the regular expression. Click the Explain Token button to open RegexBuddy’s regular expressions tutorial at the page that explains the node you selected in the tree.

Reading through the whole tutorial from the first page to the last page can be quite overwhelming. Learning as you go by selecting regular expressions from the library and reading relevant parts through Explain Token is more pleasant.

Building Your First Regular Expression

The Create panel and the Test panel in RegexBuddy are two powerful tools to help you create regular expressions that match exactly what you want. If you have the screen space, it’s a good idea to keep both visible in the side by side view.

The Create panel explains your regular expression in plain English. Yet it maintains a one-to-one relationship with the actual regular expression syntax. This way it helps you learn the actual syntax, rather than being a crutch you’ll forever depend upon. As you become more comfortable with regular expressions, you’ll type more and more of your regular expressions directly rather than going via the Create panel and its Insert Token menu. But even as an expert, you’ll still use the Create panel to help you analyze long regular expressions. Its tree structure is often easier to grasp than a long-winded linear regular expression.

The Test panel shows you what your regular expression actually does. It’s a sandbox where you can test your regular expression, before mauling actual data.

Enough talk! Let’s create our first regular expression to match a date in American mm/dd/yy format, with years from 00 to 99, and optional leading zeros for the day and month. Now read that sentence again. You may not realize it yet, but you’ll soon learn through bitter experience that that rather long sentence is the most important step in crafting a regular expression that does exactly what you want. That is: knowing exactly what you want. If you don’t know whether leading zeros should be optional or not, or if the year should have 2 or 4 digits, there’s no hope for you. Don’t launch RegexBuddy until you know what the job is.

Once you know the job, codify it by preparing test data. You can open a file, download a web page, or just type your samples directly into the Test panel. For this example, we’ll enter multiple test subjects line by line, so start by choosing the “Line by line” option in the drop-down list on the Test toolbar. Then copy and paste the following lines:

Valid:
1/1/01
01/01/78
12/31/99
Invalid:
0/0/0
0/0/00
1/1/1
12/32/52
19/19/19
1/9/1999
1212/12/12
On 6/2/07 I wrote this
On 6/24/13 I edited this

The invalid examples are particularly important. It’s often much harder to “see” which undesired matches a regular expression will produce than it is to see that it matches everything you want.

Now, let’s start crafting our regular expression. Begin by clicking the Clear History button in the History panel. It looks like a File|New icon. This makes sure we start with a clean slate. To get the same results as explained below, select “C# (.NET 2.0–4.5)” in the list of applications. If you select another application, you may get slightly different results.

The easiest way to create a regular expression is to have your sample matches ready on the Test panel, and simply proceed from left to right. Regular expressions work with text, character by character, so we have to translate what we want, our date format, into a pattern of characters.

First up is the month, which consists of a digit 0 or 1, followed by a digit 0 through 9. The first digit is optional if the number is less than 10. Let’s try this. Click the Insert Token button on the Create panel, and select Character Class. In the box “literal characters”, type “01” (zero one, without the quotes), and click OK. We just created our very first regular expression: [01]. The Test panel immediately highlights all digits 0 and 1. Now, this first token has to be optional. So we click Insert Token again, and select Quantifier (repetition). Set the minimum to zero and the maximum to one. This makes the character class token optional. Choose the “greedy” option, and click OK. RegexBuddy puts a question mark after the character class: [01]?. A question mark in a regular expression indeed makes the preceding token optional. To finish our month number, we insert another character class. Select Insert Token, Character Class and click the Clear button. Under “range of characters”, type 0 in the left box, and 9 in the right. Click OK. The regex so far is [01]?[0-9]. The Test panel highlights a whole bunch of numbers. (Can you spot our first mistake? More about that later.)
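To see what the Quantifier dialog produced, here is a small sketch using Python’s re module. It is a different flavor from the C# one selected above, but it treats these tokens the same way. The sketch checks that [01]? is shorthand for [01]{0,1}, and shows what the “greedy” choice means: a lazy ?? would skip the optional digit whenever it can.

```python
import re

# [01]? is shorthand for [01]{0,1}: zero or one of the characters 0 or 1.
for s in ["0", "1", "9", "00", "01", "12", "19", "29", "100"]:
    with_question_mark = re.fullmatch(r"[01]?[0-9]", s) is not None
    with_braces = re.fullmatch(r"[01]{0,1}[0-9]", s) is not None
    assert with_question_mark == with_braces, s

# Greedy ?: try to match the optional [01] first, so on "19" the
# class takes the "1" and [0-9] takes the "9".
greedy = re.match(r"[01]?[0-9]", "19").group()

# Lazy ??: try to skip the optional [01] first, so [0-9] takes the
# "1" and the match ends right there.
lazy = re.match(r"[01]??[0-9]", "19").group()

print(greedy, lazy)  # prints: 19 1
```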

The date separator is easy: a literal slash. Click Insert Token, Literal Text, type a forward slash, and click OK. RegexBuddy appends a forward slash to your regex. Very clever. The highlighting on the Test panel changes dramatically. We’ve already progressed to the point where years are no longer matched as lonely digits.

Next up is the day. It consists of two digits: an optional digit between 0 and 3, and a required digit between 0 and 9. We already know how to match a range of digits and how to make one optional, so just type [0-3]?[0-9] at the end of your regex. Adding the date separator while we’re at it, we get: [01]?[0-9]/[0-3]?[0-9]/. Things are starting to shape up.

The year consists of two digits ranging from 0 to 9 each. You could just type in [0-9][0-9]. Or, you could type in [0-9] and get some practice inserting a quantifier that repeats the token twice. The result is then: [01]?[0-9]/[0-3]?[0-9]/[0-9]{2}.

All done! Our regex matches what we want. Copy it into the source code, compile, ship to customer, and wait for the bug reports to roll in.
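Before shipping anything, it pays to run the regex over every sample, not just the valid ones. This Python sketch roughly imitates the Test panel’s “Line by line” mode by testing each subject separately with re.search, which, like the Test panel, reports a match anywhere in the line. The names VALID, INVALID and highlighted are illustrative only.

```python
import re

# The samples from the Test panel, one subject per line.
VALID = ["1/1/01", "01/01/78", "12/31/99"]
INVALID = ["0/0/0", "0/0/00", "1/1/1", "12/32/52", "19/19/19",
           "1/9/1999", "1212/12/12",
           "On 6/2/07 I wrote this", "On 6/24/13 I edited this"]

# The first attempt; [0-9][0-9] would behave exactly like [0-9]{2}.
first_try = re.compile(r"[01]?[0-9]/[0-3]?[0-9]/[0-9]{2}")

def highlighted(pattern, lines):
    """Return the lines in which the pattern matches anywhere."""
    return [line for line in lines if pattern.search(line)]

print(highlighted(first_try, VALID))    # all three valid dates
print(highlighted(first_try, INVALID))  # every invalid line except 0/0/0 and 1/1/1
```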

The regular expression indeed matches the dates we want. But it also matches a bunch of stuff we don’t want! How important is this? This is yet another thing that must be specified in the requirements. If you’re parsing a computer-generated database export that you know will only contain valid dates, you could just use [0-9]{2}/[0-9]{2}/[0-9]{2} to grab all dates. No need to make your regex complicated to filter out 99/99/99, because the database can’t store that no-date anyway. But if you’re going to process user-provided data, you’d better case your regex in molded stainless steel with a shiny embossed logo.
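For the computer-generated case, a Python sketch of the simpler regex makes the trade-off concrete. It grabs well-formed dates and rejects dates without leading zeros, but it happily accepts the 99/99/99 no-date. The export string is made-up sample data.

```python
import re

# Only safe when the data is known to contain valid, zero-padded dates.
export_date = re.compile(r"[0-9]{2}/[0-9]{2}/[0-9]{2}")

export = "01/05/21,12/31/99,07/04/76"  # made-up sample export line
dates = export_date.findall(export)
print(dates)                                    # ['01/05/21', '12/31/99', '07/04/76']
print(bool(export_date.fullmatch("99/99/99")))  # True: not a date, but accepted
print(bool(export_date.fullmatch("1/1/01")))    # False: no leading zeros
```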

The first problem is that our month and day parts allow too many numbers, like 0 and 19 for the month, and 0 and 32 for the day. Let’s begin with the month. While the first digit is indeed an optional 0 or 1, and the second digit is always between 0 and 9, there’s another restriction we didn’t put into the regex: if the first digit is 1, then the second digit must be between 0 and 2. And if the first digit is 0 or missing, then the second digit can’t be zero. So we essentially have two alternatives for the second digit, depending on what the first digit is. Let’s do this.

First, delete the tokens for matching the month from the regex, leaving /[0-3]?[0-9]/[0-9]{2}. Put the cursor at the start of the regex. For the first alternative, we have an optional zero and a digit between 1 and 9. We already know how to do this, so just type: 0?[1-9]. Now we need to tell RegexBuddy we want to add an alternative to what we just typed. This we do by selecting the Alternation item in the Insert Token menu. RegexBuddy will insert a vertical bar, also known as the pipe symbol. Now we type in the second alternative: 1[0-2] matches 10, 11 and 12. Our regex is now 0?[1-9]|1[0-2]/[0-3]?[0-9]/[0-9]{2}.

Unfortunately, that didn’t quite go as planned. The Test panel now highlights individual digits all over the place. The Create panel tells us why: the vertical bar alternates everything to its left with everything to its right. You can see that by clicking on “match this alternative” in the regex tree, and then on “or match this alternative” below. The first alternative is correct, but the second one should stop at the /.
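The same precedence problem is easy to reproduce in Python’s re module, whose alternation behaves like the .NET flavor’s here. Without a group, the vertical bar splits the entire regex in two. Because alternatives are tried in order at each position, the single-digit branch wins every time it can match:

```python
import re

# No group: | splits the whole regex into two alternatives,
#   0?[1-9]   or   1[0-2]/[0-3]?[0-9]/[0-9]{2}
broken = re.compile(r"0?[1-9]|1[0-2]/[0-3]?[0-9]/[0-9]{2}")

# At each position the first alternative is tried first. It matches any
# single nonzero digit, so it wins even where the second would match "12/31/99".
pieces = broken.findall("12/31/99")
print(pieces)  # ['1', '2', '3', '1', '9', '9']
```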

To do this, we need to group the two alternatives for the month together. In the regular expression, select 0?[1-9]|1[0-2]. Do this like you would select text in any text editor. Then click Insert Token, and select Numbered Capturing Group. We could have used a non-capturing group since we’re not interested in capturing anything. However, non-capturing groups use a more complicated syntax than numbered capturing groups. You can try them if you want though. It won’t make any difference in this example.
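To confirm that the choice of group type makes no difference to what is matched, this Python sketch compares a numbered capturing group with a non-capturing one, which Python, like .NET, writes as (?:...). Both find the same overall match; only the capturing version remembers the month as group 1.

```python
import re

capturing = re.compile(r"(0?[1-9]|1[0-2])/[0-3]?[0-9]/[0-9]{2}")
non_capturing = re.compile(r"(?:0?[1-9]|1[0-2])/[0-3]?[0-9]/[0-9]{2}")

m1 = capturing.search("12/31/99")
m2 = non_capturing.search("12/31/99")
print(m1.group(0), m2.group(0))  # 12/31/99 12/31/99 (same overall match)
print(m1.groups(), m2.groups())  # ('12',) () (only the first one captures)
```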

Now let’s look at the Create panel again: the capturing group’s node now sits on the same level in the tree as the two tokens that match the / literally. The two nodes for the alternatives for the month sit nice and cozy below the group node. If you click on them again, you’ll see that each one selects exactly one of the two alternatives we typed in three paragraphs ago.

We can use the exact same technique for the day of the month. If the first digit is zero or missing, we match 0?[1-9]. If the first digit is a 1 or 2, we match [12][0-9]. If the first digit is a 3, we match 3[01]. Put together in a group with alternation, we match the day with: (0?[1-9]|[12][0-9]|3[01]).
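Because each group is supposed to accept exactly the valid numbers, with or without a leading zero, it can be checked exhaustively. This Python sketch tries every one- and two-digit string against the month and day groups with re.fullmatch; the constant names MONTH and DAY are illustrative.

```python
import re

MONTH = r"(0?[1-9]|1[0-2])"
DAY = r"(0?[1-9]|[12][0-9]|3[01])"

def accepts(part, text):
    """True if the regex part matches the whole text."""
    return re.fullmatch(part, text) is not None

# Every one-digit string "0".."9" and two-digit string "00".."99".
candidates = [str(n) for n in range(10)] + [f"{n:02d}" for n in range(100)]
for text in candidates:
    assert accepts(MONTH, text) == (1 <= int(text) <= 12), text
    assert accepts(DAY, text) == (1 <= int(text) <= 31), text
print("month group accepts exactly 1-12, day group exactly 1-31")
```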

Our overall regex is now (0?[1-9]|1[0-2])/(0?[1-9]|[12][0-9]|3[01])/[0-9]{2}. Looking at it like this, you can see why even regex gurus find RegexBuddy’s Create panel helpful. Even though you already know all the syntax used, the regex tree helps to analyze what’s going on.
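Running the combined regex over all the samples in Python, with re.search so a match may start anywhere as on the Test panel, shows what’s left to fix: every invalid line that still matches does so only because a valid-looking date hides inside a longer string. The helper name matched_text is illustrative.

```python
import re

date = re.compile(r"(0?[1-9]|1[0-2])/(0?[1-9]|[12][0-9]|3[01])/[0-9]{2}")

subjects = ["1/1/01", "01/01/78", "12/31/99",
            "0/0/0", "0/0/00", "1/1/1", "12/32/52", "19/19/19",
            "1/9/1999", "1212/12/12",
            "On 6/2/07 I wrote this", "On 6/24/13 I edited this"]

def matched_text(subject):
    """Return the text the regex matches in subject, or None."""
    m = date.search(subject)
    return m.group(0) if m else None

found = {s: matched_text(s) for s in subjects}
for s, hit in found.items():
    print(f"{s!r:28} -> {hit!r}")
# For example, '19/19/19' -> '9/19/19' and '1212/12/12' -> '12/12/12'
```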

We’re almost there. Everything highlighted on the Test panel is now a valid mm/dd/yy date. However, the regex is being sneaky and matching text that looks like a date from the middle of longer strings. There are two ways to go about this. When validating user input, you’ll want to check that the input is nothing but a date. For that, we can use start-of-string and end-of-string anchors. Place the cursor at the start of the regex and click Insert Token, Anchors, Beginning of The String. Move the cursor to the end of the regex, and click Insert Token, Anchors, End of The String. Our final regex is \A(0?[1-9]|1[0-2])/(0?[1-9]|[12][0-9]|3[01])/[0-9]{2}\z.
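In Python’s re module, \A is spelled the same, but the anchor for the very end of the string is written \Z rather than .NET’s \z. (In .NET, \Z would also allow a final line break, while \z does not.) A minimal validator sketch, where is_valid_date is a hypothetical helper name:

```python
import re

# \A: start of string. \Z: very end of string in Python (.NET spells this \z).
validator = re.compile(
    r"\A(0?[1-9]|1[0-2])/(0?[1-9]|[12][0-9]|3[01])/[0-9]{2}\Z")

def is_valid_date(text):
    """True only if the whole text is an mm/dd/yy date."""
    return validator.search(text) is not None

for s in ["1/1/01", "01/01/78", "12/31/99", "19/19/19",
          "1/9/1999", "1212/12/12", "On 6/2/07 I wrote this", "12/31/99\n"]:
    print(repr(s), is_valid_date(s))
```

In Python, calling re.fullmatch with the unanchored regex would do the same job as the explicit anchors.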

If you want to extract the date from “On 6/2/07 I wrote this” (I did!), you can’t use the \A and \z anchors. In that case, use Insert Token, Anchors, Word Boundary instead of Beginning or End of The String, giving \b(0?[1-9]|1[0-2])/(0?[1-9]|[12][0-9]|3[01])/[0-9]{2}\b. A word boundary makes sure the match does not start or end in the middle of a word or number.
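A Python sketch of the word-boundary version, run on the sample lines where the unanchored regex found a date inside a longer string. Only the two dates that stand on their own in a sentence survive. The helper name extract is illustrative.

```python
import re

# \b at both ends: the date may sit inside a sentence, but must not
# be glued to other letters or digits.
in_text = re.compile(
    r"\b(0?[1-9]|1[0-2])/(0?[1-9]|[12][0-9]|3[01])/[0-9]{2}\b")

def extract(subject):
    """Return the first free-standing date in subject, or None."""
    m = in_text.search(subject)
    return m.group(0) if m else None

for s in ["On 6/2/07 I wrote this", "On 6/24/13 I edited this",
          "19/19/19", "1/9/1999", "1212/12/12"]:
    print(repr(s), "->", extract(s))
```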

So how about that shiny embossed logo? Easy! Click Insert Token, Comment and type “shiny embossed logo”. Done!
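Assuming the Comment token inserts an inline comment group, which is written (?#...) in both Python and .NET, the comment changes nothing about what the regex matches. This Python sketch appends one to the validating regex and checks that behavior is unchanged:

```python
import re

plain = r"\A(0?[1-9]|1[0-2])/(0?[1-9]|[12][0-9]|3[01])/[0-9]{2}\Z"
# (?#...) is an inline comment group; the regex engine skips it entirely.
with_logo = plain + r"(?#shiny embossed logo)"

for s in ["12/31/99", "12/32/99", "1/9/1999"]:
    assert (re.search(plain, s) is None) == (re.search(with_logo, s) is None), s
print(re.search(with_logo, "12/31/99").group(0))  # 12/31/99
```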

How to Figure This out on Your Own

Of course you’re asking me how you’re supposed to know what to pick from the Insert Token menu. Well, RegexBuddy is designed to teach you about regular expressions while making it much easier to work with them regardless of your experience. So it doesn’t try to hide how regular expressions work. On the contrary: just like the regex tree on the Create panel has a one-to-one relationship with the actual regular expression syntax, so does the Insert Token menu.

At first, you’ll find this confusing, but you’ll soon get the hang of it. The next step after working through this “getting started” tutorial is to read the regular expressions quick start and the help topics for the Insert Token menu. They’re only a few pages, and they’ll give you a great overview of exactly what you can do with regular expressions. Then it’ll be much easier to experiment with the Insert Token menu.

Remember:

  1. Figure out exactly what you want to match, and what you don’t want to match. Write it down.
  2. On the Test panel, put in examples of everything you want to match, and everything you don’t want to match.
  3. Create more stupid variations of everything you don’t want to match on the Test panel.
  4. Craft your regular expression, from left to right, bit by bit.
  5. Fix up your regex until the Test panel says you’re good to go.