This module defines all available properties.
Properties are either empty marker types that implement BinaryProperty, or enumerations (Rust enums, or Rust structs with associated constants, known as open enums) that implement EnumeratedProperty.
Binary properties are queried through a CodePointSetData, while enumerated properties are queried through a CodePointMapData.
In addition, some enumerated properties also implement ParseableEnumeratedProperty or NamedEnumeratedProperty. For these properties, PropertyParser, PropertyNamesLong, and PropertyNamesShort can be constructed.
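The split between marker types and open enums can be modeled in a few lines of standard-library Rust. The sketch below is only an illustration of the pattern: every name in it (ToyBinaryProperty, ToyEnumeratedProperty, ToyAsciiHexDigit, ToyKind, has_property, value_of) is hypothetical, the classification is a toy ASCII-only rule rather than Unicode data, and it does not use or reproduce the real CodePointSetData or CodePointMapData types.

```rust
/// Hypothetical stand-in for a binary-property trait (not this module's definition).
trait ToyBinaryProperty {
    /// Tests whether `c` has the property.
    fn contains(c: char) -> bool;
}

/// Empty marker type: it stores nothing and only names a property at the type level.
struct ToyAsciiHexDigit;

impl ToyBinaryProperty for ToyAsciiHexDigit {
    fn contains(c: char) -> bool {
        c.is_ascii_hexdigit()
    }
}

/// Hypothetical stand-in for an enumerated-property trait.
trait ToyEnumeratedProperty: Copy + PartialEq {
    /// Returns the property value assigned to `c`.
    fn for_char(c: char) -> Self;
}

/// Open enum: a struct around an integer, with associated constants for known values.
/// Values without a named constant (for example `ToyKind(200)`) are still representable.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
struct ToyKind(u8);

impl ToyKind {
    const OTHER: ToyKind = ToyKind(0);
    const LETTER: ToyKind = ToyKind(1);
    const DIGIT: ToyKind = ToyKind(2);
}

impl ToyEnumeratedProperty for ToyKind {
    // Toy ASCII-only classification, not real Unicode data.
    fn for_char(c: char) -> Self {
        if c.is_ascii_alphabetic() {
            ToyKind::LETTER
        } else if c.is_ascii_digit() {
            ToyKind::DIGIT
        } else {
            ToyKind::OTHER
        }
    }
}

/// Set-style query: the property is selected purely by the type parameter.
fn has_property<P: ToyBinaryProperty>(c: char) -> bool {
    P::contains(c)
}

/// Map-style query: returns a value of the property type itself.
fn value_of<P: ToyEnumeratedProperty>(c: char) -> P {
    P::for_char(c)
}
```

With these definitions, a call such as has_property::<ToyAsciiHexDigit>('f') answers a yes/no question, while value_of::<ToyKind>('7') returns a value, which mirrors the set-versus-map distinction described above.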
Structs
- Alnum
- Characters with the Alphabetic or Decimal_Number property.
- Alphabetic
- Alphabetic characters.
- AsciiHexDigit
- ASCII characters commonly used for the representation of hexadecimal numbers.
- BasicEmoji
- Characters and character sequences intended for general-purpose, independent, direct input.
- BidiClass
- Enumerated property Bidi_Class.
- BidiControl
- Format control characters which have specific functions in the Unicode Bidirectional Algorithm.
- BidiMirrored
- Characters that are mirrored in bidirectional text.
- BidiMirroringGlyph
- This is a bitpacked combination of the Bidi_Mirroring_Glyph, Bidi_Mirrored, and Bidi_Paired_Bracket_Type properties.
- Blank
- Horizontal whitespace characters.
- CanonicalCombiningClass
- Property Canonical_Combining_Class. See UAX #15: https://www.unicode.org/reports/tr15/.
- CaseIgnorable
- Characters which are ignored for casing purposes.
- CaseSensitive
- Characters that are either the source of a case mapping or in the target of a case mapping.
- Cased
- Uppercase, lowercase, and titlecase characters.
- ChangesWhenCasefolded
- Characters whose normalized forms are not stable under case folding.
- ChangesWhenCasemapped
- Characters which may change when they undergo case mapping.
- ChangesWhenLowercased
- Characters whose normalized forms are not stable under a toLowercase mapping.
- ChangesWhenNfkcCasefolded
- Characters which are not identical to their NFKC_Casefold mapping.
- ChangesWhenTitlecased
- Characters whose normalized forms are not stable under a toTitlecase mapping.
- ChangesWhenUppercased
- Characters whose normalized forms are not stable under a toUppercase mapping.
- Dash
- Punctuation characters explicitly called out as dashes in the Unicode Standard, plus their compatibility equivalents.
- DefaultIgnorableCodePoint
- For programmatic determination of default ignorable code points.
- Deprecated
- Deprecated characters.
- Diacritic
- Characters that linguistically modify the meaning of another character to which they apply.
- EastAsianWidth
- Enumerated property East_Asian_Width.
- Emoji
- Characters that are emoji.
- EmojiComponent
- Characters used in emoji sequences that normally do not appear on emoji keyboards as separate choices, such as base characters for emoji keycaps.
- EmojiModifier
- Characters that are emoji modifiers.
- EmojiModifierBase
- Characters that can serve as a base for emoji modifiers.
- EmojiPresentation
- Characters that have emoji presentation by default.
- ExtendedPictographic
- Pictographic symbols, as well as reserved ranges in blocks largely associated with emoji characters.
- Extender
- Characters whose principal function is to extend the value of a preceding alphabetic character or to extend the shape of adjacent characters.
- FullCompositionExclusion
- Characters that are excluded from composition.
- GeneralCategoryGroup
- Groupings of multiple General_Category property values.
- GeneralCategoryOutOfBoundsError
- Error value for impl TryFrom<u8> for GeneralCategory.
- Graph
- Visible characters.
- GraphemeBase
- Property used together with the definition of Standard Korean Syllable Block to define “Grapheme base”.
- GraphemeClusterBreak
- Enumerated property Grapheme_Cluster_Break.
- GraphemeExtend
- Property used to define “Grapheme extender”.
- GraphemeLink
- Deprecated property.
- HangulSyllableType
- Enumerated property Hangul_Syllable_Type.
- HexDigit
- Characters commonly used for the representation of hexadecimal numbers, plus their compatibility equivalents.
- Hyphen
- Deprecated property.
- IdContinue
- Characters that can come after the first character in an identifier.
- IdStart
- Characters that can begin an identifier.
- Ideographic
- Characters considered to be CJKV (Chinese, Japanese, Korean, and Vietnamese) ideographs, or related siniform ideographs
- IdsBinaryOperator
- Characters used in Ideographic Description Sequences.
- IdsTrinaryOperator
- Characters used in Ideographic Description Sequences.
- IndicSyllabicCategory
- Property Indic_Syllabic_Category. See UAX #44: https://www.unicode.org/reports/tr44/#Indic_Syllabic_Category.
- JoinControl
- Format control characters which have specific functions for control of cursive joining and ligation.
- JoiningType
- Enumerated property Joining_Type.
- LineBreak
- Enumerated property Line_Break.
- LogicalOrderException
- A small number of spacing vowel letters occurring in certain Southeast Asian scripts such as Thai and Lao.
- Lowercase
- Lowercase characters.
- Math
- Characters used in mathematical notation.
- NfcInert
- Characters that are inert under NFC, i.e., they do not interact with adjacent characters.
- NfdInert
- Characters that are inert under NFD, i.e., they do not interact with adjacent characters.
- NfkcInert
- Characters that are inert under NFKC, i.e., they do not interact with adjacent characters.
- NfkdInert
- Characters that are inert under NFKD, i.e., they do not interact with adjacent characters.
- NoncharacterCodePoint
- Code points permanently reserved for internal use.
- PatternSyntax
- Characters used as syntax in patterns (such as regular expressions).
- PatternWhiteSpace
- Characters used as whitespace in patterns (such as regular expressions).
- PrependedConcatenationMark
- A small class of visible format controls, which precede and then span a sequence of other characters, usually digits.
- Print
- Printable characters (visible characters and whitespace).
- QuotationMark
- Punctuation characters that function as quotation marks.
- Radical
- Characters used in the definition of Ideographic Description Sequences.
- RegionalIndicator
- Regional indicator characters, U+1F1E6..U+1F1FF.
- Script
- Enumerated property Script.
- SegmentStarter
- Characters that are starters in terms of Unicode normalization and combining character sequences.
- SentenceBreak
- Enumerated property Sentence_Break.
- SentenceTerminal
- Punctuation characters that generally mark the end of sentences.
- SoftDotted
- Characters with a “soft dot”, like i or j.
- TerminalPunctuation
- Punctuation characters that generally mark the end of textual units.
- UnifiedIdeograph
- A property which specifies the exact set of Unified CJK Ideographs in the standard.
- Uppercase
- Uppercase characters.
- VariationSelector
- Characters that are Variation Selectors.
- VerticalOrientation
- Property Vertical_Orientation.
- WhiteSpace
- Spaces, separator characters and other control characters which should be treated by programming languages as “white space” for the purpose of parsing elements.
- WordBreak
- Enumerated property Word_Break.
- Xdigit
- Hexadecimal digits
- XidContinue
- Characters that can come after the first character in an identifier.
- XidStart
- Characters that can begin an identifier.
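Two entries in the list above are narrow enough to check with the standard library alone, which can be handy for sanity-testing results from the real property data. The sketch below is an assumption-laden illustration: the function names are hypothetical, the regional indicator range is copied from the RegionalIndicator entry, and the hex-digit check relies on char::is_ascii_hexdigit, which covers 0-9, A-F, and a-f only.

```rust
/// Range taken from the RegionalIndicator entry: U+1F1E6..U+1F1FF (inclusive).
fn is_regional_indicator(c: char) -> bool {
    ('\u{1F1E6}'..='\u{1F1FF}').contains(&c)
}

/// ASCII hexadecimal digits only: 0-9, A-F, a-f.
/// Compatibility equivalents such as fullwidth letters are deliberately excluded,
/// which is the difference between AsciiHexDigit and HexDigit in the listing.
fn is_ascii_hex_digit(c: char) -> bool {
    c.is_ascii_hexdigit()
}
```

For instance, U+FF21 (FULLWIDTH LATIN CAPITAL LETTER A) is rejected by is_ascii_hex_digit, while it would count as a compatibility equivalent under the broader HexDigit description.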
Enums
- BidiPairedBracketType
- The enum represents Bidi_Paired_Bracket_Type.
- GeneralCategory
- Enumerated property General_Category.
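The listing pairs the closed GeneralCategory enum with a dedicated out-of-bounds error for its TryFrom<u8> conversion. The following is a minimal sketch of that pattern using a toy three-variant enum; ToyCategory and ToyCategoryOutOfBoundsError are hypothetical names, and the discriminant values are invented for illustration rather than taken from GeneralCategory.

```rust
use std::convert::TryFrom;

/// Toy closed enum with explicit discriminants (values are illustrative only).
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
#[repr(u8)]
enum ToyCategory {
    Unassigned = 0,
    Letter = 1,
    Number = 2,
}

/// Hypothetical error returned when a byte does not name any variant.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
struct ToyCategoryOutOfBoundsError;

impl TryFrom<u8> for ToyCategory {
    type Error = ToyCategoryOutOfBoundsError;

    fn try_from(val: u8) -> Result<Self, Self::Error> {
        // A closed enum cannot hold unknown values, so conversion must be fallible.
        match val {
            0 => Ok(ToyCategory::Unassigned),
            1 => Ok(ToyCategory::Letter),
            2 => Ok(ToyCategory::Number),
            _ => Err(ToyCategoryOutOfBoundsError),
        }
    }
}
```

This is the trade-off against an open enum: a Rust enum supports exhaustive matching, but any integer coming from outside has to be validated before it becomes a value of the type.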
Traits
- BinaryProperty
- A binary Unicode character property.
- EmojiSet
- An Emoji set as defined by Unicode Technical Standard #51.
- EnumeratedProperty
- A Unicode character property that assigns a value to each code point.
- NamedEnumeratedProperty
- A property whose value names can be represented as strings.
- ParseableEnumeratedProperty
- A property whose value names can be parsed from strings.
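The last two traits describe opposite directions of the same mapping: value to name, and name to value. A small table-driven model makes the relationship concrete. Everything here is hypothetical (ToyScript, TOY_NAMES, long_name, short_name, parse_strict), the numeric values are invented, and the real PropertyParser, PropertyNamesLong, and PropertyNamesShort types are not used; only the name pairs Latin/Latn and Greek/Grek are standard script names.

```rust
/// Toy open enum standing in for a named, parseable property.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
struct ToyScript(u16);

impl ToyScript {
    // Numeric values are invented for this sketch.
    const LATIN: ToyScript = ToyScript(0);
    const GREEK: ToyScript = ToyScript(1);
}

/// Rows of (value, long name, short name).
static TOY_NAMES: [(ToyScript, &str, &str); 2] = [
    (ToyScript::LATIN, "Latin", "Latn"),
    (ToyScript::GREEK, "Greek", "Grek"),
];

/// Value to long name, the direction a "named" property supports.
fn long_name(value: ToyScript) -> Option<&'static str> {
    TOY_NAMES
        .iter()
        .find(|(v, _, _)| *v == value)
        .map(|(_, long, _)| *long)
}

/// Value to short name.
fn short_name(value: ToyScript) -> Option<&'static str> {
    TOY_NAMES
        .iter()
        .find(|(v, _, _)| *v == value)
        .map(|(_, _, short)| *short)
}

/// Name to value, the direction a "parseable" property supports.
/// Matching is exact and case-sensitive in this sketch.
fn parse_strict(name: &str) -> Option<ToyScript> {
    TOY_NAMES
        .iter()
        .find(|(_, long, short)| *long == name || *short == name)
        .map(|(v, _, _)| *v)
}
```

Both directions return Option because an open enum can hold values with no known name, and an arbitrary string may not name any value.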