Insert a Unicode Word Break Value

A Unicode word break value is one of the many Unicode properties that you can insert via the Insert Token button on the Create panel.


Every Unicode code point has exactly one value for the Word_Break property, such as ALetter, Numeric, or MidLetter. This property is defined in Unicode Standard Annex #29 (UAX 29), titled “Unicode Text Segmentation”, which uses it to determine the boundaries between words. Such a boundary is called a word break. The property alone does not determine where the breaks are. Instead, the rules in UAX 29 look at the values that the characters before and after a position in the text (and sometimes the characters just beyond those) have for this property to decide whether there is a word break at that position.
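To illustrate how the rules consult the property values around a position, here is a minimal Python sketch. It is a simplification, not a full UAX 29 implementation: `word_break_value` is a hypothetical ASCII-only stand-in for the real Word_Break data, and only rules WB1, WB2, WB5 to WB12 (minus the Hebrew-specific WB7a to WB7c), and WB999 are implemented. Rules for CR/LF, Extend/Format characters, whitespace runs, Katakana, emoji, and regional indicators are left out. The function names are illustrative only.

```python
from typing import List, Optional

def word_break_value(ch: str) -> str:
    """Hypothetical ASCII-only approximation of a character's Word_Break value."""
    if ch.isascii() and ch.isalpha():
        return "ALetter"
    if ch.isascii() and ch.isdigit():
        return "Numeric"
    if ch in ".'":
        return "MidNumLetQ"  # UAX 29 macro for MidNumLet plus Single_Quote
    if ch == ":":
        return "MidLetter"
    if ch in ",;":
        return "MidNum"
    return "Other"

def is_word_break(values: List[str], i: int) -> bool:
    """Decide whether there is a word break between values[i-1] and values[i]."""
    before, after = values[i - 1], values[i]
    before2: Optional[str] = values[i - 2] if i >= 2 else None
    after2: Optional[str] = values[i + 1] if i + 1 < len(values) else None
    mid_letter = ("MidLetter", "MidNumLetQ")
    mid_num = ("MidNum", "MidNumLetQ")
    if before == "ALetter" and after == "ALetter":
        return False  # WB5: letter x letter
    if before == "ALetter" and after in mid_letter and after2 == "ALetter":
        return False  # WB6: letter x (mid) letter
    if before2 == "ALetter" and before in mid_letter and after == "ALetter":
        return False  # WB7: letter (mid) x letter
    if before == "Numeric" and after == "Numeric":
        return False  # WB8: digit x digit
    if before == "ALetter" and after == "Numeric":
        return False  # WB9: letter x digit
    if before == "Numeric" and after == "ALetter":
        return False  # WB10: digit x letter
    if before2 == "Numeric" and before in mid_num and after == "Numeric":
        return False  # WB11: digit (mid) x digit
    if before == "Numeric" and after in mid_num and after2 == "Numeric":
        return False  # WB12: digit x (mid) digit
    return True  # WB999: otherwise, break everywhere

def word_break_positions(text: str) -> List[int]:
    """Offsets of all word breaks; WB1/WB2 add start and end unless text is empty."""
    if not text:
        return []
    values = [word_break_value(ch) for ch in text]
    inner = [i for i in range(1, len(text)) if is_word_break(values, i)]
    return [0] + inner + [len(text)]

def segments(text: str) -> List[str]:
    """Split text at the computed word breaks."""
    pos = word_break_positions(text)
    return [text[a:b] for a, b in zip(pos, pos[1:])]
```

For example, `segments("can't stop")` yields `["can't", " ", "stop"]`: the apostrophe does not cause a break because WB6 and WB7 look one character further to see a letter on both sides. Likewise, `segments("3.14 apples")` keeps `3.14` together through WB11 and WB12.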

You are unlikely to need to match this property with a regular expression. You could use it to implement the rules of UAX 29 with regular expressions. But most regex flavors that support this property also support \b{wb} to match an actual word break according to UAX 29, or they offer the mode modifier (?w) to make \b follow UAX 29 rather than the traditional definition of a word boundary. You can insert either of those with the Word Boundary (UAX 29) item in the Anchor submenu of the Insert Token menu.