Insert a Regex Token to Match One Character from Predefined POSIX Classes

The Insert Token button on the Create panel makes it easy to insert the following regular expression tokens to match one character out of many possible characters. See the Insert Token help topic for more details on how to build up a regular expression via this menu.

The POSIX standard defines a number of POSIX character classes, such as “alpha” for letters and “digit” for digits. On a fully POSIX-compliant system, these classes will include letters and digits from other languages and scripts, rather than just a to z and 0 to 9.
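As a rough illustration of what these classes cover when only ASCII is considered, the sketch below models a few of them as Python sets. Python's own regex module has no POSIX class syntax, so the “POSIX_ASCII” table and the “in_class” helper are hypothetical names made up for this example; they are not part of RegexBuddy or of any regex library.

```python
import string

# Hypothetical ASCII-only model of some POSIX classes (an assumption for
# illustration; a POSIX-compliant system may include non-ASCII characters).
POSIX_ASCII = {
    "alpha": set(string.ascii_letters),
    "digit": set(string.digits),
    "upper": set(string.ascii_uppercase),
    "lower": set(string.ascii_lowercase),
    "xdigit": set(string.hexdigits),
    "punct": set(string.punctuation),
    "blank": {" ", "\t"},
    "space": set(string.whitespace),
}
POSIX_ASCII["alnum"] = POSIX_ASCII["alpha"] | POSIX_ASCII["digit"]


def in_class(ch, name):
    """Return True if the single character ch belongs to the named class."""
    return ch in POSIX_ASCII[name]
```

With this ASCII-only table, in_class("a", "alpha") is true while in_class("é", "alpha") is false. A system that extends the classes to other scripts would accept both.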

When you select POSIX Class in the Insert Token menu, a dialog box showing all the POSIX classes appears.

Insert a POSIX class

At the top of the dialog box, you can select whether you want to match a character that fits one of the POSIX classes you’ll select, or one that does not fit any of the selected classes.
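The choice between matching and not matching the selected classes corresponds to a plain or a negated bracket expression in standard POSIX syntax. The sketch below assembles such an expression from a list of class names. The helper name “build_posix_token” is hypothetical, and the exact token RegexBuddy inserts may differ depending on the active regex flavor.

```python
def build_posix_token(class_names, negate=False):
    """Build a POSIX bracket expression from class names (hypothetical helper)."""
    if not class_names:
        raise ValueError("select at least one POSIX class")
    # Each class is written as [:name:] inside the outer brackets.
    body = "".join(f"[:{name}:]" for name in class_names)
    # A leading caret negates the whole bracket expression.
    return f"[{'^' if negate else ''}{body}]"


print(build_posix_token(["alpha", "digit"]))
print(build_posix_token(["alpha", "digit"], negate=True))
```

The first call produces an expression that matches one character that is a letter or a digit. The second produces one that matches a character that is neither.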

The classes are arranged in a sort of tree. If a checkbox is indented under another one, that means that the indented one is a strict subset of the one above it. For example, all characters in the “blank” class are also part of the “space” class. When you tick a checkbox, RegexBuddy will automatically tick the wholly contained classes as well. For example, ticking “space” will also tick “blank”. When you untick a checkbox, RegexBuddy will automatically untick all classes that contain any of the unticked class’s characters. For example, unticking “xdigit” will also untick “alpha”, because “xdigit” includes a-f which are also part of “alpha”.
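The tick and untick rules can be sketched with set operations. This is a minimal model that assumes ASCII-only class definitions. The “CLASSES” table and the “tick” and “untick” helpers are hypothetical names for illustration and do not describe how RegexBuddy is implemented.

```python
import string

# Hypothetical ASCII-only class definitions, assumed for illustration.
CLASSES = {
    "alnum": set(string.ascii_letters + string.digits),
    "alpha": set(string.ascii_letters),
    "digit": set(string.digits),
    "xdigit": set(string.hexdigits),
    "space": set(string.whitespace),
    "blank": {" ", "\t"},
}


def tick(ticked, name):
    """Tick a class plus every class wholly contained in it."""
    contained = {n for n, chars in CLASSES.items() if chars <= CLASSES[name]}
    return ticked | contained


def untick(ticked, name):
    """Untick a class plus every ticked class sharing any of its characters."""
    return {n for n in ticked if not (CLASSES[n] & CLASSES[name])}
```

In this model, ticking “space” from an empty selection yields both “space” and “blank”. Unticking “xdigit” from a selection of “alpha”, “xdigit” and “space” leaves only “space”, because “alpha” shares the letters a-f with “xdigit”.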

An exception to the tree structure is the “ascii” class. When a regex flavor only supports ASCII, the “ascii” class includes every possible character. However, most flavors support characters beyond ASCII. With those flavors, most POSIX classes will include non-ASCII characters. The “ascii” class, however, always matches only the 128 ASCII characters. So RegexBuddy’s POSIX class dialog treats the “ascii” class separately.

Below the tree you can select whether you want the POSIX class to match only ASCII characters, or all relevant characters from the active code page, or any relevant character supported by Unicode. Most applications give you only one choice.

If your application matches only ASCII characters with POSIX classes, then the grid at the bottom shows the 128 ASCII characters. It highlights the characters included in the selected POSIX classes. Clicking or moving the mouse over the grid has no effect.
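A highlighted grid of the 128 ASCII characters can be approximated in a few lines. In the sketch below, the 16-column layout, the “*” highlight marker and the “ascii_grid” helper are all assumptions made for this example. The actual dialog may arrange and draw the grid differently.

```python
import string


def ascii_grid(selected_chars, columns=16):
    """Return rows of cells for code points 0-127; highlighted cells start with '*'."""
    rows = []
    for start in range(0, 128, columns):
        row = []
        for code in range(start, min(start + columns, 128)):
            ch = chr(code)
            # Show control characters as a dot so each cell stays one glyph wide.
            label = ch if ch.isprintable() else "."
            row.append(("*" if ch in selected_chars else " ") + label)
        rows.append(row)
    return rows


# Highlight the characters of an ASCII-only "digit" class.
grid = ascii_grid(set(string.digits))
for row in grid:
    print(" ".join(row))
```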

If your application matches non-ASCII characters with (some) POSIX classes, then the grid at the bottom shows the characters matched by the POSIX class or classes you have ticked. If you move the mouse over the grid, you can see the hexadecimal and decimal representations of each character’s code point in the Unicode standard. The grid will be empty if you didn’t tick any POSIX classes. The grid won’t highlight anything, and clicking it still won’t do anything.
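The hexadecimal and decimal representations of a code point are easy to reproduce. The sketch below uses a hypothetical helper named “describe_code_point”. The output format is an assumption and is not necessarily how RegexBuddy displays the values.

```python
def describe_code_point(ch):
    """Return the hexadecimal and decimal forms of a character's Unicode code point."""
    code = ord(ch)
    # Pad the hexadecimal form to at least four digits, as in U+0041.
    return f"U+{code:04X} (decimal {code})"


print(describe_code_point("A"))
print(describe_code_point("é"))
```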