Module props

This module defines all available properties.

A property is either an empty marker type that implements BinaryProperty, or an enumeration [1] that implements EnumeratedProperty.

Types implementing BinaryProperty are queried through a CodePointSetData, while types implementing EnumeratedProperty are queried through a CodePointMapData.

In addition, some types implementing EnumeratedProperty also implement ParseableEnumeratedProperty or NamedEnumeratedProperty. For these properties, a PropertyParser, PropertyNamesLong, or PropertyNamesShort can be constructed.
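The split between marker types and value types can be pictured with a small standard-library-only model. This is a sketch under stated assumptions, not the crate's implementation: every name prefixed with `Toy` is hypothetical, and the lookups call `char` methods from `std` instead of reading Unicode data.

```rust
use std::marker::PhantomData;

// Hypothetical stand-ins for illustration only; NOT the crate's definitions.

/// Toy counterpart of a binary property: a yes/no answer per code point.
trait ToyBinaryProperty {
    fn holds(c: char) -> bool;
}

/// Empty marker type: it carries no data and only selects a property.
struct ToyAsciiHexDigit;

impl ToyBinaryProperty for ToyAsciiHexDigit {
    fn holds(c: char) -> bool {
        c.is_ascii_hexdigit()
    }
}

/// Toy counterpart of an enumerated property: the type is its own value.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ToyCase {
    Upper,
    Lower,
    Other,
}

trait ToyEnumeratedProperty: Sized {
    fn value_of(c: char) -> Self;
}

impl ToyEnumeratedProperty for ToyCase {
    fn value_of(c: char) -> Self {
        if c.is_uppercase() {
            ToyCase::Upper
        } else if c.is_lowercase() {
            ToyCase::Lower
        } else {
            ToyCase::Other
        }
    }
}

/// Set-style query object, selected by a marker type.
struct ToySetData<P: ToyBinaryProperty> {
    _marker: PhantomData<P>,
}

impl<P: ToyBinaryProperty> ToySetData<P> {
    fn new() -> Self {
        ToySetData { _marker: PhantomData }
    }

    fn contains(&self, c: char) -> bool {
        P::holds(c)
    }
}

/// Map-style query object, returning a value of the property type.
struct ToyMapData<T: ToyEnumeratedProperty> {
    _marker: PhantomData<T>,
}

impl<T: ToyEnumeratedProperty> ToyMapData<T> {
    fn new() -> Self {
        ToyMapData { _marker: PhantomData }
    }

    fn get(&self, c: char) -> T {
        T::value_of(c)
    }
}

fn main() {
    let hex = ToySetData::<ToyAsciiHexDigit>::new();
    assert!(hex.contains('f'));
    assert!(!hex.contains('g'));

    let case = ToyMapData::<ToyCase>::new();
    assert_eq!(case.get('a'), ToyCase::Lower);
    assert_eq!(case.get('7'), ToyCase::Other);
}
```

In this model the property type is chosen at compile time through a type parameter, so a set query returns a `bool` and a map query returns a value of the property type itself.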


[1] Either Rust enums, or Rust structs with associated constants (open enums).
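The "open enum" shape mentioned in the footnote can be sketched as follows. `ToyWidth` and its constants are hypothetical and chosen only for illustration; they are not types from this module.

```rust
// Hypothetical open enum: a struct wrapping an integer, with named values
// exposed as associated constants. Not a type from this module.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ToyWidth(u8);

impl ToyWidth {
    const NEUTRAL: ToyWidth = ToyWidth(0);
    const WIDE: ToyWidth = ToyWidth(1);
}

fn describe(w: ToyWidth) -> &'static str {
    match w {
        ToyWidth::NEUTRAL => "neutral",
        ToyWidth::WIDE => "wide",
        // Any other u8 is still a valid ToyWidth, so a catch-all arm is required.
        _ => "unknown",
    }
}

fn main() {
    assert_eq!(describe(ToyWidth::WIDE), "wide");
    // A value with no named constant is representable without any type change.
    assert_eq!(describe(ToyWidth(200)), "unknown");
}
```

The design trade-off is that a value without a named constant remains representable, at the cost of the compiler no longer checking `match` exhaustiveness over the named values.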

Structs

Alnum
Characters with the Alphabetic or Decimal_Number property.
Alphabetic
Alphabetic characters.
AsciiHexDigit
ASCII characters commonly used for the representation of hexadecimal numbers.
BasicEmoji
Characters and character sequences intended for general-purpose, independent, direct input.
BidiClass
Enumerated property Bidi_Class.
BidiControl
Format control characters which have specific functions in the Unicode Bidirectional Algorithm.
BidiMirrored
Characters that are mirrored in bidirectional text.
BidiMirroringGlyph
This is a bitpacked combination of the Bidi_Mirroring_Glyph, Bidi_Mirrored, and Bidi_Paired_Bracket_Type properties.
Blank
Horizontal whitespace characters.
CanonicalCombiningClass
Property Canonical_Combining_Class. See UAX #15: https://www.unicode.org/reports/tr15/.
CaseIgnorable
Characters which are ignored for casing purposes.
CaseSensitive
Characters that are either the source of a case mapping or in the target of a case mapping.
Cased
Uppercase, lowercase, and titlecase characters.
ChangesWhenCasefolded
Characters whose normalized forms are not stable under case folding.
ChangesWhenCasemapped
Characters which may change when they undergo case mapping.
ChangesWhenLowercased
Characters whose normalized forms are not stable under a toLowercase mapping.
ChangesWhenNfkcCasefolded
Characters which are not identical to their NFKC_Casefold mapping.
ChangesWhenTitlecased
Characters whose normalized forms are not stable under a toTitlecase mapping.
ChangesWhenUppercased
Characters whose normalized forms are not stable under a toUppercase mapping.
Dash
Punctuation characters explicitly called out as dashes in the Unicode Standard, plus their compatibility equivalents.
DefaultIgnorableCodePoint
For programmatic determination of default ignorable code points.
Deprecated
Deprecated characters.
Diacritic
Characters that linguistically modify the meaning of another character to which they apply.
EastAsianWidth
Enumerated property East_Asian_Width.
Emoji
Characters that are emoji.
EmojiComponent
Characters used in emoji sequences that normally do not appear on emoji keyboards as separate choices, such as base characters for emoji keycaps.
EmojiModifier
Characters that are emoji modifiers.
EmojiModifierBase
Characters that can serve as a base for emoji modifiers.
EmojiPresentation
Characters that have emoji presentation by default.
ExtendedPictographic
Pictographic symbols, as well as reserved ranges in blocks largely associated with emoji characters.
Extender
Characters whose principal function is to extend the value of a preceding alphabetic character or to extend the shape of adjacent characters.
FullCompositionExclusion
Characters that are excluded from composition.
GeneralCategoryGroup
Groupings of multiple General_Category property values.
GeneralCategoryOutOfBoundsError
Error value for impl TryFrom<u8> for GeneralCategory.
Graph
Visible characters (printable characters other than whitespace).
GraphemeBase
Property used together with the definition of Standard Korean Syllable Block to define “Grapheme base”.
GraphemeClusterBreak
Enumerated property Grapheme_Cluster_Break.
GraphemeExtend
Property used to define “Grapheme extender”.
GraphemeLink
Deprecated property.
HangulSyllableType
Enumerated property Hangul_Syllable_Type.
HexDigit
Characters commonly used for the representation of hexadecimal numbers, plus their compatibility equivalents.
Hyphen
Deprecated property.
IdContinue
Characters that can come after the first character in an identifier.
IdStart
Characters that can begin an identifier.
Ideographic
Characters considered to be CJKV (Chinese, Japanese, Korean, and Vietnamese) ideographs, or related siniform ideographs.
IdsBinaryOperator
Binary operators used in Ideographic Description Sequences; each takes two operands.
IdsTrinaryOperator
Ternary operators used in Ideographic Description Sequences; each takes three operands.
IndicSyllabicCategory
Property Indic_Syllabic_Category. See UAX #44: https://www.unicode.org/reports/tr44/#Indic_Syllabic_Category.
JoinControl
Format control characters which have specific functions for control of cursive joining and ligation.
JoiningType
Enumerated property Joining_Type.
LineBreak
Enumerated property Line_Break.
LogicalOrderException
A small number of spacing vowel letters occurring in certain Southeast Asian scripts such as Thai and Lao.
Lowercase
Lowercase characters.
Math
Characters used in mathematical notation.
NfcInert
Characters that are inert under NFC, i.e., they do not interact with adjacent characters.
NfdInert
Characters that are inert under NFD, i.e., they do not interact with adjacent characters.
NfkcInert
Characters that are inert under NFKC, i.e., they do not interact with adjacent characters.
NfkdInert
Characters that are inert under NFKD, i.e., they do not interact with adjacent characters.
NoncharacterCodePoint
Code points permanently reserved for internal use.
PatternSyntax
Characters used as syntax in patterns (such as regular expressions).
PatternWhiteSpace
Characters used as whitespace in patterns (such as regular expressions).
PrependedConcatenationMark
A small class of visible format controls, which precede and then span a sequence of other characters, usually digits.
Print
Printable characters (visible characters and whitespace).
QuotationMark
Punctuation characters that function as quotation marks.
Radical
Characters used in the definition of Ideographic Description Sequences.
RegionalIndicator
Regional indicator characters, U+1F1E6..U+1F1FF.
Script
Enumerated property Script.
SegmentStarter
Characters that are starters in terms of Unicode normalization and combining character sequences.
SentenceBreak
Enumerated property Sentence_Break.
SentenceTerminal
Punctuation characters that generally mark the end of sentences.
SoftDotted
Characters with a “soft dot”, like i or j.
TerminalPunctuation
Punctuation characters that generally mark the end of textual units.
UnifiedIdeograph
A property which specifies the exact set of Unified CJK Ideographs in the standard.
Uppercase
Uppercase characters.
VariationSelector
Characters that are Variation Selectors.
VerticalOrientation
Property Vertical_Orientation.
WhiteSpace
Spaces, separator characters and other control characters which should be treated by programming languages as “white space” for the purpose of parsing elements.
WordBreak
Enumerated property Word_Break.
Xdigit
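Hexadecimal digits.
XidContinue
Characters that can come after the first character in an identifier; derived from ID_Continue with adjustments for closure under NFKC normalization.
XidStart
Characters that can begin an identifier; derived from ID_Start with adjustments for closure under NFKC normalization.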

Enums

BidiPairedBracketType
Enum representing the values of Bidi_Paired_Bracket_Type.
GeneralCategory
Enumerated property General_Category.
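The struct list above includes an error value for converting a `u8` into GeneralCategory. The general pattern of a closed enum with a fallible `TryFrom<u8>` conversion can be sketched like this. `ToyCategory` and `ToyOutOfBoundsError` are hypothetical stand-ins with only three variants, and no claim is made that the crate's conversion is written this way.

```rust
use std::convert::TryFrom;

// Hypothetical closed enum with explicit u8 discriminants; not the crate's type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
enum ToyCategory {
    Unassigned = 0,
    UppercaseLetter = 1,
    LowercaseLetter = 2,
}

/// Hypothetical error returned when a byte matches no variant.
#[derive(Debug, PartialEq, Eq)]
struct ToyOutOfBoundsError;

impl TryFrom<u8> for ToyCategory {
    type Error = ToyOutOfBoundsError;

    fn try_from(value: u8) -> Result<Self, Self::Error> {
        match value {
            0 => Ok(ToyCategory::Unassigned),
            1 => Ok(ToyCategory::UppercaseLetter),
            2 => Ok(ToyCategory::LowercaseLetter),
            // A closed enum cannot hold other values, so they are rejected.
            _ => Err(ToyOutOfBoundsError),
        }
    }
}

fn main() {
    assert_eq!(ToyCategory::try_from(1u8), Ok(ToyCategory::UppercaseLetter));
    assert_eq!(ToyCategory::try_from(99u8), Err(ToyOutOfBoundsError));
    // The reverse direction is infallible thanks to #[repr(u8)].
    assert_eq!(ToyCategory::LowercaseLetter as u8, 2);
}
```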

Traits

BinaryProperty
A binary Unicode character property.
EmojiSet
An Emoji set as defined by Unicode Technical Standard #51.
EnumeratedProperty
A Unicode character property that assigns a value to each code point.
NamedEnumeratedProperty
A property whose value names can be represented as strings.
ParseableEnumeratedProperty
A property whose value names can be parsed from strings.
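The last two traits describe opposite directions: from a value to its name, and from a name to a value. A minimal standard-library-only sketch of both directions is shown below, using a single lookup table. `ToyCategory`, `NAMES`, and the three functions are hypothetical; only the alias strings (`Lu`, `Uppercase_Letter`, `Ll`, `Lowercase_Letter`) are real General_Category value names from the Unicode Character Database.

```rust
// Hypothetical value type and table for illustration; not the crate's API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ToyCategory {
    UppercaseLetter,
    LowercaseLetter,
}

// (value, short name, long name)
const NAMES: [(ToyCategory, &'static str, &'static str); 2] = [
    (ToyCategory::UppercaseLetter, "Lu", "Uppercase_Letter"),
    (ToyCategory::LowercaseLetter, "Ll", "Lowercase_Letter"),
];

/// Name to value: accepts an exact short or long name (case-sensitive).
fn parse_strict(name: &str) -> Option<ToyCategory> {
    NAMES
        .iter()
        .find(|entry| entry.1 == name || entry.2 == name)
        .map(|entry| entry.0)
}

/// Value to short name.
fn short_name(value: ToyCategory) -> Option<&'static str> {
    NAMES.iter().find(|entry| entry.0 == value).map(|entry| entry.1)
}

/// Value to long name.
fn long_name(value: ToyCategory) -> Option<&'static str> {
    NAMES.iter().find(|entry| entry.0 == value).map(|entry| entry.2)
}

fn main() {
    assert_eq!(parse_strict("Lu"), Some(ToyCategory::UppercaseLetter));
    assert_eq!(parse_strict("Lowercase_Letter"), Some(ToyCategory::LowercaseLetter));
    assert_eq!(parse_strict("lu"), None); // strict matching is case-sensitive
    assert_eq!(short_name(ToyCategory::LowercaseLetter), Some("Ll"));
    assert_eq!(long_name(ToyCategory::UppercaseLetter), Some("Uppercase_Letter"));
}
```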