Insert a Unicode Sentence Break Value

A Unicode sentence break value is one of the many Unicode properties that you can insert via the Insert Token button on the Create panel.

Every Unicode code point has exactly one value for the Sentence_Break property. This property is defined in Unicode Standard Annex #29 (UAX 29), titled “Unicode Text Segmentation”, and is used to determine the boundaries between sentences. Such a boundary is called a sentence break. The property alone does not determine where the breaks are. Rather, the rules in UAX 29 look at the Sentence_Break values of the characters before and after a position in the text to decide whether there is a sentence break at that position.
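To illustrate how a rule can combine the property values on both sides of a position, here is a minimal Python sketch. It is not the UAX 29 algorithm: the function names are hypothetical, the value lookup covers only a handful of characters instead of reading the Unicode data files, and only a simplified subset of the rules is applied.

```python
def sentence_break_value(ch):
    """Rough stand-in for the Sentence_Break value of one character.

    Assumption: a tiny hand-written table. A real implementation would
    look the value up in the Unicode Character Database.
    """
    if ch == ".":
        return "ATerm"
    if ch in "!?":
        return "STerm"
    if ch == " ":
        return "Sp"
    if ch in "\"')]":
        return "Close"
    if ch.isdigit():
        return "Numeric"
    if ch.isupper():
        return "Upper"
    if ch.islower():
        return "Lower"
    return "Other"


def is_sentence_break(text, pos):
    """Simplified check for a break between text[pos - 1] and text[pos]."""
    if not 0 < pos < len(text):
        return False  # the start and end of the text are not handled here
    values = [sentence_break_value(ch) for ch in text]

    # Look backwards for: terminator, then Close*, then Sp*.
    i = pos - 1
    spaces = 0
    while i >= 0 and values[i] == "Sp":
        i -= 1
        spaces += 1
    closes = 0
    while i >= 0 and values[i] == "Close":
        i -= 1
        closes += 1
    if i < 0 or values[i] not in ("ATerm", "STerm"):
        return False  # no sentence terminator before this position

    term, following = values[i], values[pos]
    if following == "Sp":
        return False  # trailing spaces stay with the sentence
    if following == "Close" and spaces == 0:
        return False  # closing punctuation stays with the sentence
    if following in ("ATerm", "STerm"):
        return False  # runs of terminators stay together
    if term == "ATerm":
        if following == "Lower":
            return False  # "e.g. this" continues the sentence
        if spaces == 0 and closes == 0:
            if following == "Numeric":
                return False  # "3.14"
            if following == "Upper" and i > 0 and values[i - 1] in ("Upper", "Lower"):
                return False  # "U.S.A"
    return True


def sentence_breaks(text):
    """Return every interior position where the simplified rules find a break."""
    return [pos for pos in range(1, len(text)) if is_sentence_break(text, pos)]


print(sentence_breaks("Hi there. How are you? Fine."))  # [10, 23]
```

Note that the full stop in “e.g. this” has the value ATerm but produces no break, because the character after the space is Lower. The value of a single character is never enough on its own.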

You are unlikely to need to match this property directly with a regular expression. You could use it to implement the rules in UAX 29 yourself using regular expressions. But most regex flavors that support this property also support \b{sb}, which matches an actual sentence break according to UAX 29. You can insert that with the Sentence Boundary item in the Anchor submenu of the Insert Token menu.