Insert a Token to Recurse into the Regex or a Capturing Group

The Insert Token button on the Create panel makes it easy to insert tokens that recurse into the whole regular expression or into a capturing group. Only a few regex engines, such as Perl, PCRE, and Ruby, support this.

Recursion into the Whole Regular Expression

With (?R) or \g<0> you can make your regular expression recurse into itself. The Recursion item in the Insert Token menu automatically selects the correct syntax for your application.

You’ll need to make sure that your regular expression does not recurse infinitely. The recursion token must not be the first token in the regex: the regex must match at least one character before the next recursion, so that it actually advances through the string. The regex also needs at least one alternative that does not recurse, or the recursion itself must be optional, so that the recursion can stop at some point.
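
Both requirements can be seen in a hand-written recursive matcher. Python’s built-in re module does not support (?R), so the sketch below mirrors a small illustrative regex, a(?R)?b, which matches one or more a’s followed by the same number of b’s, with an ordinary recursive function. The regex and the function names a_recurse_b and recurse_first are hypothetical examples chosen for this sketch, not anything RegexBuddy generates. The second function mirrors (?R)?ab, where the recursion comes first and never advances.

```python
from typing import Optional


def a_recurse_b(text: str, pos: int = 0) -> Optional[int]:
    """Hypothetical mirror of the regex a(?R)?b, anchored at pos.

    Returns the index just past the match, or None if there is no match.
    """
    # 'a': consume one character BEFORE recursing, so every nested call
    # starts further along the string.
    if pos >= len(text) or text[pos] != "a":
        return None
    pos += 1
    # (?R)? : try the recursion first (greedy). Because it is optional,
    # the nesting can stop once no further 'a' follows.
    inner = a_recurse_b(text, pos)
    if inner is not None and inner < len(text) and text[inner] == "b":
        return inner + 1
    # Recursion skipped (it failed, or no 'b' followed it): the 'b' must
    # then follow this 'a' directly.
    if pos < len(text) and text[pos] == "b":
        return pos + 1
    return None


def recurse_first(text: str, pos: int = 0) -> Optional[int]:
    """Hypothetical mirror of (?R)?ab: the recursion is the first token.

    Each call re-enters at the same position without consuming anything,
    so the recursion never terminates.
    """
    return recurse_first(text, pos)


print(a_recurse_b("aaabbb"))  # 6
print(a_recurse_b("aab"))     # None
try:
    recurse_first("ab")
except RecursionError:
    print("recurse_first never advanced and hit Python's recursion limit")
```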

Recursion is mostly used to match balanced constructs. The regex \([^()]*+(?:(?R)[^()]*+)*+\) matches a pair of parentheses with all parentheses between them correctly nested, regardless of how many nested pairs there are or how deeply they are nested. This regex satisfies both requirements for valid recursion. The recursion token is preceded by \(, which makes sure that at least one character (an opening parenthesis) is matched before the next recursion is attempted. The recursion is also optional because it is inside a group that is made optional with the quantifier *+.
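
Python’s built-in re module cannot run this regex, so as a sketch, here is an ordinary recursive function that follows it token by token. The name match_parens is hypothetical. The function only tries a match at the given position, whereas a regex engine would also try every later starting position.

```python
from typing import Optional


def match_parens(text: str, pos: int = 0) -> Optional[int]:
    r"""Hypothetical mirror of \([^()]*+(?:(?R)[^()]*+)*+\), anchored at pos.

    Returns the index just past the match, or None if there is no match.
    """

    def skip_non_parens(p: int) -> int:
        # [^()]*+ : skip every character that is not a parenthesis. The loop
        # never gives characters back, like a possessive quantifier.
        while p < len(text) and text[p] not in "()":
            p += 1
        return p

    # \( : the opening parenthesis is consumed before any recursion.
    if pos >= len(text) or text[pos] != "(":
        return None
    pos = skip_non_parens(pos + 1)
    # (?:(?R)[^()]*+)*+ : zero or more nested pairs, each followed by more
    # non-parenthesis characters. Zero repetitions is allowed, so recursion stops.
    while True:
        end = match_parens(text, pos)  # (?R)
        if end is None:
            break
        pos = skip_non_parens(end)
    # \) : the closing parenthesis.
    if pos < len(text) and text[pos] == ")":
        return pos + 1
    return None


print(match_parens("(a(b)c(d(e)))"))  # 13
print(match_parens("(a(b c)"))        # None: the outer pair is never closed
```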

Insert recursion

Subroutine Calls

If your regular expression has one or more numbered or named capturing groups, you can make a group recurse into itself. (?1) or \g<1> recurses into a numbered group, while (?&name) or \g<name> recurses into a named group. The regex \A(\([^()]*+(?:(?1)[^()]*+)*+\))\z matches a pair of properly nested parentheses in the same way as the example in the previous section, but adds anchors so that the regex matches the whole string or nothing at all. The anchors must be excluded from the recursion: with (?R), each nested call would also have to match \A, which fails anywhere except at the start of the string. So we wrap everything except the anchors in a capturing group and recurse into that group only.
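
When building test strings for the anchored regex, it can help to know exactly which strings it accepts: the whole string must be a single balanced pair, so it starts with an opening parenthesis and the nesting depth returns to zero only at the very last character. The sketch below checks that condition with a depth counter instead of recursion. The function name is hypothetical, and this is an equivalent check for cross-checking results, not how a regex engine evaluates (?1).

```python
def is_single_balanced_pair(text: str) -> bool:
    r"""Hypothetical depth-counter equivalent of \A(\([^()]*+(?:(?1)[^()]*+)*+\))\z."""
    depth = 0
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False  # a ')' without a matching '('
        elif depth == 0:
            return False  # text outside the outer pair, so \A or \z fails
        if depth == 0 and i != len(text) - 1:
            return False  # the outer pair closed before the end of the string
    return len(text) > 0 and depth == 0


print(is_single_balanced_pair("(a(b)c)"))  # True
print(is_single_balanced_pair("(a)(b)"))   # False: two pairs, not one
```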

You can use the same syntax to insert a subroutine call to a named or numbered capturing group. (\d++)\+(?1)=(?1) is equivalent to (\d++)\+(?:\d++)=(?:\d++) and matches something like 1+2=3. This illustrates the key difference between a subroutine call and a backreference. A backreference matches the exact same text that was most recently matched by the group. A subroutine call reuses the part of the regex inside the group. Subroutine calls can significantly increase the readability and reduce the complexity of regular expressions that need to match the same construct (but not the exact same text) in more than one place. If we extend these two regexes to match sums of floating point numbers in scientific notation, they become ([0-9]*+\.?+[0-9]++([eE][-+]?+[0-9]++)?+)\+(?1)=(?1) and ([0-9]*+\.?+[0-9]++([eE][-+]?+[0-9]++)?+)\+(?:[0-9]*+\.?+[0-9]++([eE][-+]?+[0-9]++)?+)=(?:[0-9]*+\.?+[0-9]++([eE][-+]?+[0-9]++)?+).
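
The difference between a subroutine call and a backreference can be tried out in Python, with one assumption: Python’s built-in re module has no (?1) syntax, so this sketch expands the subroutine call by hand, pasting the group’s pattern text in again inside a non-capturing group. The variable names are illustrative only.

```python
import re

# Hand-expanded subroutine call: the same pattern text is reused, as in
# (\d++)\+(?:\d++)=(?:\d++). \d+ is used instead of \d++ because possessive
# quantifiers need Python 3.11+; for these inputs both give the same results.
NUMBER = r"\d+"

subroutine_style = re.compile(rf"({NUMBER})\+(?:{NUMBER})=(?:{NUMBER})")
# Backreference: \1 must repeat the exact text that group 1 matched.
backreference_style = re.compile(r"(\d+)\+\1=\1")

for text in ["1+2=3", "12+12=12"]:
    print(text,
          bool(subroutine_style.fullmatch(text)),
          bool(backreference_style.fullmatch(text)))
# 1+2=3 True False
# 12+12=12 True True
```

Keeping the number pattern in one variable gives a similar readability benefit to a subroutine call: the construct is defined once and reused.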

To get the correct syntax for your application, select Subroutine Call in the Insert Token menu. In the window that appears, click inside the capturing group that you want the subroutine call to refer to. RegexBuddy automatically inserts a named subroutine call when you select a named group, and a numbered subroutine call when you select a numbered group.

Insert a subroutine call

Different Behavior of Recursion in Different Applications

Recursion is a relatively new addition to the regular expression syntax. Even the first three popular regex engines to support it—Perl, PCRE, and Ruby—can’t agree on the finer details of how recursion should behave. They’ve copied each other’s syntax for the most part (leading to multiple syntax options for the same thing), but not their behavior. The developers of these regex engines likely didn’t test enough corner cases when copying each other’s features, or didn’t think that these corner cases were common enough to worry about.

Fortunately for you, RegexBuddy does worry about these differences. The Insert Subroutine Call window shows how the selected application behaves. In Detailed mode, the Create panel explains the exact behavior. The Test panel always correctly emulates each application’s behavior.

The differences don’t affect whether any of the examples on this page match. They use only possessive quantifiers, which never backtrack anyway. The regex that is recursed as a whole doesn’t have any capturing groups, and the only capturing group inside a subroutine call, the exponent group in the scientific notation example, is not referenced by any backreference.
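
What "never backtrack" means can be seen in Python with the common atomic-group emulation (?=(...))\1: the lookahead captures the greedy match, and the backreference then consumes exactly that text with no way to give any of it back. This sketch uses that emulation because it works in all Python 3 versions; Python 3.11 and later also accept \d*+ directly.

```python
import re

greedy = re.compile(r"\d*\d")
# Atomic emulation of \d*+\d: the lookahead grabs all digits, and \1 then
# consumes exactly those digits without ever giving one back.
possessive_like = re.compile(r"(?=(\d*))\1\d")

print(bool(greedy.fullmatch("123")))           # True: \d* backtracks one digit
print(bool(possessive_like.fullmatch("123")))  # False: nothing left for \d
```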