Insert a Unicode Numeric Type or Value

A Unicode numeric type and numeric value are two of the many Unicode properties that you can insert via the Insert Token button on the Create panel.


Every Unicode code point has exactly one value for the Numeric_Type property and exactly one value for the Numeric_Value property. Because the two properties are closely related, RegexBuddy provides a single menu item for inserting either of them. The item you select in the list determines both which of the two properties your regex needs to match and which value of that property it needs to match.

There are four possible values for the Numeric_Type property: None, Decimal, Digit, and Numeric. If you tick “short numeric type names”, RegexBuddy uses Unicode’s short aliases: De for Decimal, Di for Digit, and Nu for Numeric. None has no shorter alias. The property itself is the same in all Unicode versions, but code points have been shuffled between the different numeric types, particularly prior to Unicode 4.1.0. Each new Unicode version also adds code points to these types as Unicode continues to add writing systems that have their own characters representing numbers.
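Python’s `unicodedata` module does not expose Numeric_Type directly, but the decimal, digit, and numeric fields it reads from the Unicode Character Database are enough to approximate it. The sketch below assumes that a character with a decimal value has type Decimal, one with only a digit value has type Digit, and one with only a numeric value has type Numeric. The names `numeric_type` and `SHORT_NAMES` are hypothetical, and the results follow the Unicode version bundled with your Python, which may differ from the version your target application supports.

```python
import unicodedata

# Short aliases as listed in Unicode's PropertyValueAliases.txt; None has no shorter form.
SHORT_NAMES = {"Decimal": "De", "Digit": "Di", "Numeric": "Nu", "None": "None"}


def numeric_type(ch: str, short: bool = False) -> str:
    """Approximate the Numeric_Type of ch (hypothetical helper)."""
    if unicodedata.decimal(ch, None) is not None:
        name = "Decimal"   # e.g. ASCII and Arabic-Indic digits
    elif unicodedata.digit(ch, None) is not None:
        name = "Digit"     # e.g. superscript digits
    elif unicodedata.numeric(ch, None) is not None:
        name = "Numeric"   # e.g. Roman numerals, vulgar fractions
    else:
        name = "None"
    return SHORT_NAMES[name] if short else name


for ch in ["7", "\u0663", "\u00b2", "\u2166", "\u00bd", "A"]:
    print(f"U+{ord(ch):04X}", numeric_type(ch), numeric_type(ch, short=True))
```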

The set of code points with Numeric_Type=None is always identical to the set of code points with Numeric_Value=NaN. But while all applications that support the numeric properties support \p{Numeric_Type=None}, some of those applications do not support \p{Numeric_Value=NaN}. RegexBuddy therefore omits Numeric_Value=NaN from its list: it is redundant and not always supported.
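The redundancy can be checked against Python’s own Unicode tables. This sketch treats “no decimal, digit, or numeric value” as Numeric_Type=None and “no numeric value” as Numeric_Value=NaN, then scans every code point for disagreements. The helper names are hypothetical, and the scan covers only the Unicode version bundled with your Python.

```python
import sys
import unicodedata


def numeric_type_is_none(ch: str) -> bool:
    """True if ch has no decimal, digit, or numeric value (Numeric_Type=None)."""
    return (
        unicodedata.decimal(ch, None) is None
        and unicodedata.digit(ch, None) is None
        and unicodedata.numeric(ch, None) is None
    )


def numeric_value_is_nan(ch: str) -> bool:
    """True if ch has no numeric value at all (Numeric_Value=NaN)."""
    return unicodedata.numeric(ch, None) is None


# Code points where the two properties disagree; the claim above says there are none.
mismatches = [
    cp
    for cp in range(sys.maxunicode + 1)
    if numeric_type_is_none(chr(cp)) != numeric_value_is_nan(chr(cp))
]
print(unicodedata.unidata_version, len(mismatches))
```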

All other values for Numeric_Value are numbers. The list shows all values that are assigned to at least one code point in the version of Unicode that your target application supports. Various Unicode versions have expanded this list over the years.
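To see how such a list can be derived, this sketch collects every distinct Numeric_Value in the Unicode tables bundled with Python. It is an approximation: `unicodedata.numeric()` returns floats, so equivalent fractions such as 5/6 and 10/12 collapse into a single entry, and the version will usually not match your target application’s.

```python
import sys
import unicodedata

# Gather every distinct numeric value assigned to at least one code point.
distinct = set()
for cp in range(sys.maxunicode + 1):
    value = unicodedata.numeric(chr(cp), None)
    if value is not None:
        distinct.add(value)

values = sorted(distinct)
print(f"Unicode {unicodedata.unidata_version}: {len(values)} distinct values")
print("smallest:", values[:3])
print("largest:", values[-3:])
```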

RegexBuddy shows values larger than 1 million in exponential notation, such as 1e9 for 1 billion. It also uses that notation when generating the regex token if the target application supports it.
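A display rule like this can be sketched by moving trailing zeros into an exponent. The function `format_numeric_value` is hypothetical and only illustrates the notation; it is not RegexBuddy’s actual formatting code, and how it renders values that are not round powers of ten is an assumption.

```python
def format_numeric_value(value: float) -> str:
    """Hypothetical display rule: exponential notation for whole numbers above 1 million."""
    value = float(value)
    if not value.is_integer():
        return str(value)  # non-whole values are shown as fractions elsewhere
    whole = int(value)
    if whole <= 1_000_000:
        return str(whole)
    # Move trailing zeros into the exponent: 1000000000 -> 1e9.
    mantissa, exponent = whole, 0
    while mantissa % 10 == 0:
        mantissa //= 10
        exponent += 1
    return f"{mantissa}e{exponent}" if exponent else str(whole)


for v in (5, 1_000_000, 10_000_000, 1e9, 1e12):
    print(format_numeric_value(v))
```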

RegexBuddy shows values that are not whole numbers as fractions. RegexBuddy uses the fraction or its equivalent decimal number when generating the regex token, depending on what the target application supports.
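The choice between a fraction and its decimal equivalent can be sketched as a small token builder. `numeric_value_token` and its `supports_fractions` flag are hypothetical; the four-decimal rounding mirrors the `0.8333` example used further on and is an assumption about what target applications accept.

```python
def numeric_value_token(numerator: int, denominator: int, supports_fractions: bool) -> str:
    """Build a \\p{Numeric_Value=...} token for numerator/denominator (hypothetical)."""
    if denominator == 1:
        text = str(numerator)
    elif supports_fractions:
        # Keep the fraction exactly as written, without reducing 10/12 to 5/6.
        text = f"{numerator}/{denominator}"
    else:
        # Four decimal places with trailing zeros trimmed: 5/6 -> 0.8333, 1/2 -> 0.5.
        text = f"{numerator / denominator:.4f}".rstrip("0").rstrip(".")
    return "\\p{Numeric_Value=" + text + "}"


print(numeric_value_token(5, 6, supports_fractions=True))
print(numeric_value_token(5, 6, supports_fractions=False))
```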

Unicode 8.0.0 expanded its support for the Meroitic script with characters that represent the fractions 1/12 through 11/12. No other Unicode characters represent twelfths. Mathematically, the fractions 2/12, 3/12, 4/12, 6/12, 8/12, 9/12, and 10/12 are equivalent, respectively, to the fractions 1/6, 1/4, 1/3, 1/2, 2/3, 3/4, and 5/6. These are represented by other characters in Unicode, including the vulgar fractions that are popular in the USA. This probably won’t affect you. The Meroitic script went extinct some 1,600 years ago. It was used in the Kingdom of Kush, in an area in present-day Sudan. But since Unicode supports it, RegexBuddy must support it properly too.
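The equivalences can be verified with Python’s `fractions` module, which reduces every fraction to lowest terms. This sketch keeps only the twelfths whose reduced form has a smaller denominator; the name `equivalents` is just for illustration.

```python
from fractions import Fraction

# Map each reducible twelfth to its lowest-terms equivalent.
equivalents = {
    f"{n}/12": Fraction(n, 12)
    for n in range(1, 12)
    if Fraction(n, 12).denominator != 12
}

for twelfths, reduced in equivalents.items():
    print(twelfths, "=", reduced)
# 2/12 = 1/6
# 3/12 = 1/4
# 4/12 = 1/3
# 6/12 = 1/2
# 8/12 = 2/3
# 9/12 = 3/4
# 10/12 = 5/6
```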

A regex token that specifies the decimal value of one of these fractions matches the characters representing both equivalent fractions. If the application supports Unicode 8.0.0 and astral characters, then \p{Numeric_Value=0.8333} matches 3 characters: the vulgar fraction ⅚, the cuneiform numeric sign five sixths 𒑜, and the Meroitic cursive fraction ten twelfths 𐧿. Select the option “treat equivalent fractions as having the same numeric value” if this is what you want.
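This behavior can be reproduced with Python’s Unicode tables, since `unicodedata.numeric()` returns the same float for 5/6 and 10/12. The sketch assumes that a decimal token such as `0.8333` matches every code point whose value rounds to that decimal at four places; real engines may compare differently. It requires a Python whose Unicode data is 8.0.0 or later.

```python
import sys
import unicodedata

TARGET = 0.8333  # the decimal written in \p{Numeric_Value=0.8333}

# Assumption: compare each code point's value rounded to four decimals.
matches = []
for cp in range(sys.maxunicode + 1):
    value = unicodedata.numeric(chr(cp), None)
    if value is not None and round(value, 4) == TARGET:
        matches.append(cp)

for cp in matches:
    print(f"U+{cp:04X} {chr(cp)} {unicodedata.name(chr(cp), '?')}")
```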

A regex token that specifies a fraction, if the application supports such tokens at all, may or may not distinguish between equivalent fractions. In some applications, \p{Numeric_Value=5/6} matches only the vulgar and cuneiform fractions, while \p{Numeric_Value=10/12} matches only the Meroitic fraction. For those applications, RegexBuddy enables the option “treat equivalent fractions as having distinct decimal numbers”. In other applications, both \p{Numeric_Value=5/6} and \p{Numeric_Value=10/12} match all 3 characters. For those applications, RegexBuddy disables the option “treat equivalent fractions as having distinct decimal numbers”.
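The two behaviors differ only in whether the fraction is compared as written or after reduction. This sketch hardcodes the three characters with their Numeric_Value written as a fraction (assumed here to be 5/6, 5/6, and 10/12, as the text above implies) and implements both comparison modes. `RAW_VALUES` and `lookup` are hypothetical names, not part of any regex engine.

```python
from fractions import Fraction

# Numeric_Value as written for each character (hardcoded for illustration).
RAW_VALUES = {
    "\u215A": "5/6",        # vulgar fraction five sixths
    "\U0001245C": "5/6",    # cuneiform numeric sign five sixths
    "\U000109FF": "10/12",  # Meroitic cursive fraction ten twelfths
}


def lookup(query: str, distinct: bool) -> list:
    """Characters matched by \\p{Numeric_Value=<query>} under either behavior."""
    if distinct:
        # Equivalent fractions stay distinct: compare the fractions as written.
        return [ch for ch, raw in RAW_VALUES.items() if raw == query]
    # Equivalent fractions are the same value: compare after reducing.
    target = Fraction(query)
    return [ch for ch, raw in RAW_VALUES.items() if Fraction(raw) == target]


print(len(lookup("5/6", distinct=True)), len(lookup("10/12", distinct=True)))
print(len(lookup("5/6", distinct=False)), len(lookup("10/12", distinct=False)))
```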