Insert a Unicode Block

A Unicode block is one of the many Unicode properties that you can insert via the Insert Token button on the Create panel.


The Unicode standard divides the Unicode character map into blocks, which are contiguous ranges of code points. Characters with similar purposes are grouped together in the same block, but the arrangement is not strict. Some characters are placed in what seems to be the wrong block, mostly for historical reasons such as compatibility with legacy character encodings. Though some blocks have the same names as scripts, they don’t necessarily include the same characters. If you want to match characters based on their meaning to human readers, use Unicode scripts. If you want to match characters based on their Unicode code points, use Unicode blocks.
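To illustrate that a block is purely a range of code points, here is a minimal Python sketch. Python's standard re module has no block property, so the sketch emulates the Greek and Coptic block with an explicit character class, assuming the block covers U+0370 through U+03FF. The helper name in_greek_and_coptic is made up for this example and is not something RegexBuddy generates.

```python
import re

# Assumption: the Greek and Coptic block spans U+0370..U+03FF.
# A character class over that range stands in for a block token here.
GREEK_AND_COPTIC = re.compile(r"[\u0370-\u03FF]")


def in_greek_and_coptic(ch: str) -> bool:
    """Return True if the single character ch falls in the block's range."""
    return GREEK_AND_COPTIC.fullmatch(ch) is not None


alpha = "\u03B1"        # GREEK SMALL LETTER ALPHA, inside the block
shei = "\u03E2"         # COPTIC CAPITAL LETTER SHEI, Coptic script but inside the block
alpha_psili = "\u1F00"  # GREEK SMALL LETTER ALPHA WITH PSILI, in the Greek Extended block

print(in_greek_and_coptic(alpha))        # True
print(in_greek_and_coptic(shei))         # True
print(in_greek_and_coptic(alpha_psili))  # False
```

The Coptic letter matches even though it is not Greek, and the Greek letter with psili does not match even though it is Greek. That is the difference between matching by block and matching by script.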

The names of Unicode blocks can be quite long. For many of the longer block names, Unicode provides a shorter alias. If your application supports those aliases, you can tick “short block names” to use them instead.

Related blocks such as Latin_Extended-A through Latin_Extended-F can be scattered across the character map, because newer versions of Unicode had to use the gaps between existing blocks to add new blocks for new characters. RegexBuddy presents the list of Unicode blocks in the order of the code points that they cover.

The window shows a preview of the characters in the block that you select. Every Unicode block covers a single range of code points. The number of code points is always a multiple of 16. The grid indicates these code point ranges. The leftmost column indicates the code point without the final hexadecimal digit. The top row indicates that final hexadecimal digit.
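The grid layout described above can be sketched in a few lines of Python. This only illustrates the arithmetic. The function names are hypothetical, the zero-padding of the row label is an assumption about how the preview is displayed, and the range used in the usage lines assumes the Greek and Coptic block covers U+0370 through U+03FF.

```python
def grid_position(code_point: int) -> tuple[str, str]:
    """Split a code point into its row label and column label.

    The row label is the code point without its final hexadecimal digit.
    The column label is that final hexadecimal digit.
    """
    row = format(code_point >> 4, "03X")  # padding width is an assumption
    col = format(code_point & 0xF, "X")
    return row, col


def block_rows(first: int, last: int) -> int:
    """Return the number of 16-column rows needed for a block's range."""
    size = last - first + 1
    # Blocks start on a multiple of 16 and span a multiple of 16 code points.
    if first % 16 != 0 or size % 16 != 0:
        raise ValueError("not a valid block range")
    return size // 16


print(grid_position(0x03B1))       # ('03B', '1')
print(block_rows(0x0370, 0x03FF))  # 9
```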

The only exception to these rules is the block named No_Block. This is the value used for all the unassigned code points in the gaps between the Unicode blocks. RegexBuddy shows it first in the list if your application supports it. Note that many other blocks also contain unassigned code points. Later versions of Unicode can fill in those code points to add more characters to the block. If you select a block that includes unassigned code points, your regular expression can match those code points.
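As a rough illustration of the last point, the Python sketch below checks a code point that lies inside a block's range but has no character assigned to it. It assumes the Greek and Coptic block covers U+0370 through U+03FF and that U+03A2 is unassigned in the Unicode data shipped with your Python version. The character class again stands in for a block token, and the function names are invented for this example.

```python
import re
import unicodedata

# Assumption: the Greek and Coptic block spans U+0370..U+03FF.
GREEK_AND_COPTIC_RANGE = re.compile(r"[\u0370-\u03FF]")


def block_matches(ch: str) -> bool:
    """Return True if ch lies in the block's code point range."""
    return GREEK_AND_COPTIC_RANGE.fullmatch(ch) is not None


def is_assigned(ch: str) -> bool:
    """Return True unless the general category is Cn (unassigned)."""
    return unicodedata.category(ch) != "Cn"


gap = "\u03A2"  # assumed unassigned; sits between capital rho and capital sigma

print(block_matches(gap))  # True: the range covers the code point
print(is_assigned(gap))    # False with Unicode data in which U+03A2 is unassigned
```

If matching unassigned code points is a problem for your data, one option is to filter matches with a check like is_assigned after the regular expression has run.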