//! A wide string library for converting to and from wide string variants.
//!
//! This library provides multiple types of wide strings, each corresponding to a string type in
//! the Rust standard library. [`Utf16String`] and [`Utf32String`] are analogous to the standard
//! [`String`] type, providing a similar interface, and are always encoded as valid UTF-16 and
//! UTF-32, respectively. They are the only types in this library that can losslessly and
//! infallibly convert to and from [`String`], and they are the easiest types to work with. They
//! are not designed for working with FFI, but they do support efficient conversions from the FFI
//! types.
//!
//! [`U16String`] and [`U32String`], on the other hand, are similar to (but not the same as)
//! [`OsString`], and are designed around working with FFI. Unlike the UTF variants, these strings
//! do not have a defined encoding, and can work with any wide character strings, regardless of
//! the encoding. They can be converted to and from [`OsString`] (but may require an encoding
//! conversion depending on the platform), although [`OsString`] uses an OS-specific encoding, so
//! take special care.
//!
//! [`U16String`] and [`U32String`] also allow access and mutation that rely on the user to enforce
//! any constraints on the data. Some methods do assume a UTF encoding, but do so in a way that
//! handles malformed encoding data. For FFI, use [`U16String`] or [`U32String`] when you simply
//! need to pass through string data, or when you're not dealing with nul-terminated data.
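//!
//! As a rough illustration of strict versus lossy handling of ill-formed UTF-16, the following
//! sketch uses only the standard library, not this crate's API. `decode_strict` and
//! `decode_lossy` are hypothetical helpers wrapping `String::from_utf16` and
//! `String::from_utf16_lossy`: an unpaired surrogate makes the strict conversion fail, while the
//! lossy one substitutes `U+FFFD REPLACEMENT CHARACTER`.
//!
//! ```rust
//! /// Hypothetical helper: decodes UTF-16, returning `None` on any unpaired surrogate.
//! fn decode_strict(units: &[u16]) -> Option<String> {
//!     String::from_utf16(units).ok()
//! }
//!
//! /// Hypothetical helper: decodes UTF-16, replacing unpaired surrogates with U+FFFD.
//! fn decode_lossy(units: &[u16]) -> String {
//!     String::from_utf16_lossy(units)
//! }
//!
//! fn main() {
//!     // "mus", an unpaired low surrogate, then "ic".
//!     let units: [u16; 6] = [0x006d, 0x0075, 0x0073, 0xDD1E, 0x0069, 0x0063];
//!     assert_eq!(decode_strict(&units), None);
//!     assert_eq!(decode_lossy(&units), "mus\u{FFFD}ic");
//! }
//! ```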
//!
//! Finally, [`U16CString`] and [`U32CString`] are wide versions of the standard [`CString`] type.
//! Like [`U16String`] and [`U32String`], they do not have a defined encoding, but are designed to
//! work with FFI, particularly C-style nul-terminated wide string data. These C-style strings are
//! always terminated by a nul value, and are guaranteed to contain no interior nul values (unless
//! unchecked methods are used). Again, these types may contain ill-formed encoding data, and
//! methods handle it appropriately. Use [`U16CString`] or [`U32CString`] whenever you must
//! properly handle nul values when dealing with wide string C FFI.
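//!
//! The two invariants above (a trailing nul and no interior nul) can be sketched with the
//! standard library alone. `to_wide_c` is a hypothetical helper, not part of this crate; it
//! encodes a [`str`] as UTF-16 and rejects input that already contains a nul.
//!
//! ```rust
//! /// Hypothetical helper: encodes `s` as UTF-16 and appends a nul terminator, returning `None`
//! /// if `s` contains an interior nul.
//! fn to_wide_c(s: &str) -> Option<Vec<u16>> {
//!     let mut buf: Vec<u16> = s.encode_utf16().collect();
//!     if buf.contains(&0) {
//!         return None; // an interior nul would silently truncate the string for C callers
//!     }
//!     buf.push(0); // C-style terminator
//!     Some(buf)
//! }
//!
//! fn main() {
//!     assert_eq!(to_wide_c("hi"), Some(vec![0x0068, 0x0069, 0]));
//!     assert_eq!(to_wide_c("a\0b"), None);
//! }
//! ```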
//!
//! Like the standard Rust string types, each wide string type has its corresponding wide string
//! slice type, as shown in the following table:
//!
//! | String Type     | Slice Type   |
//! |-----------------|--------------|
//! | [`Utf16String`] | [`Utf16Str`] |
//! | [`Utf32String`] | [`Utf32Str`] |
//! | [`U16String`]   | [`U16Str`]   |
//! | [`U32String`]   | [`U32Str`]   |
//! | [`U16CString`]  | [`U16CStr`]  |
//! | [`U32CString`]  | [`U32CStr`]  |
//!
//! All the string types in this library can be converted between string types of the same bit
//! width, as well as to and from appropriate standard Rust types, but such conversions may be
//! lossy and/or require knowledge of the underlying encoding. The UTF strings can additionally be
//! converted between the two sizes of string, re-encoding the strings.
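//!
//! Re-encoding between the two widths amounts to decoding into [`char`]s and encoding them again.
//! The following standard-library-only sketch shows the idea; `utf16_to_utf32` and
//! `utf32_to_utf16` are hypothetical helpers, not this crate's conversion methods.
//!
//! ```rust
//! use std::char::DecodeUtf16Error;
//!
//! /// Hypothetical helper: re-encodes UTF-16 code units as UTF-32 values, failing on the first
//! /// unpaired surrogate.
//! fn utf16_to_utf32(units: &[u16]) -> Result<Vec<u32>, DecodeUtf16Error> {
//!     char::decode_utf16(units.iter().copied())
//!         .map(|r| r.map(u32::from))
//!         .collect()
//! }
//!
//! /// Hypothetical helper: re-encodes UTF-32 values as UTF-16 code units, returning `None` if
//! /// any value is not a Unicode scalar value.
//! fn utf32_to_utf16(values: &[u32]) -> Option<Vec<u16>> {
//!     let mut out = Vec::with_capacity(values.len());
//!     for &v in values {
//!         let c = char::from_u32(v)?;
//!         let mut buf = [0u16; 2];
//!         // One or two code units, depending on whether a surrogate pair is needed.
//!         out.extend_from_slice(c.encode_utf16(&mut buf));
//!     }
//!     Some(out)
//! }
//!
//! fn main() {
//!     // U+1D11E (𝄞) is one UTF-32 value but a surrogate pair in UTF-16.
//!     let utf16: [u16; 3] = [0xD834, 0xDD1E, 0x006d];
//!     let utf32 = utf16_to_utf32(&utf16).unwrap();
//!     assert_eq!(utf32, vec![0x1D11E, 0x6d]);
//!     assert_eq!(utf32_to_utf16(&utf32), Some(utf16.to_vec()));
//! }
//! ```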
//!
//! # Wide string literals
//!
//! Macros are provided for each wide string slice type that convert standard Rust [`str`] literals
//! into UTF-16 or UTF-32 encoded versions of the slice type at *compile time*.
//!
//! ```
//! use widestring::u16str;
//! let hello = u16str!("Hello, world!"); // `hello` will be a &U16Str value
//! ```
//!
//! These can be used anywhere a `const` function can be used, and provide a convenient method of
//! specifying wide string literals instead of coding values by hand. The resulting string slices
//! are always validly encoded UTF, and the [`u16cstr!`] and [`u32cstr!`] macros are automatically
//! nul-terminated.
//!
//! # Cargo features
//!
//! This crate supports `no_std` when default cargo features are disabled. The `std` and `alloc`
//! cargo features (enabled by default) enable the owned string types ([`U16String`],
//! [`U32String`], [`U16CString`], [`U32CString`], [`Utf16String`], and [`Utf32String`]) and their
//! modules. Other types, such as the string slices, do not require allocation and can be used in a
//! `no_std` environment, even without the
//! [`alloc`](https://doc.rust-lang.org/stable/alloc/index.html) crate.
//!
//! # Remarks on UTF-16 and UTF-32
//!
//! UTF-16 encoding is a variable-length encoding. The 16-bit code units can specify Unicode code
//! points either as single units or in _surrogate pairs_. Because every value might be part of a
//! surrogate pair, many regular string operations on UTF-16 data, including indexing, writing, or
//! even iterating, require considering either one or two values at a time. This library provides
//! safe methods for these operations when the data is known to be UTF-16, such as with
//! [`Utf16String`]. In those cases, keep in mind that the number of elements (`len()`) of the
//! wide string is _not_ equivalent to the number of Unicode code points in the string, but is
//! instead the number of code unit values.
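//!
//! For example, a character outside the Basic Multilingual Plane occupies two UTF-16 code units.
//! The sketch below uses only the standard library (`str::encode_utf16` and
//! `char::decode_utf16`); `count_code_points` is a hypothetical helper that counts one item per
//! decoded code point or unpaired surrogate.
//!
//! ```rust
//! /// Hypothetical helper: counts decoded items, where a surrogate pair counts once and an
//! /// unpaired surrogate also counts once.
//! fn count_code_points(units: &[u16]) -> usize {
//!     char::decode_utf16(units.iter().copied()).count()
//! }
//!
//! fn main() {
//!     let units: Vec<u16> = "𝄞music".encode_utf16().collect();
//!     assert_eq!(units.len(), 7); // code units: U+1D11E needs a surrogate pair
//!     assert_eq!(count_code_points(&units), 6); // code points
//! }
//! ```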
//!
//! For [`U16String`] and [`U16CString`], which do not define an encoding, these same operations
//! (indexing, mutating, iterating) do _not_ take into account UTF-16 encoding and may result in
//! sequences that are ill-formed UTF-16. Some methods make an exception to this and treat the
//! strings as possibly ill-formed UTF-16; their documentation specifies how they handle the
//! invalid data.
//!
//! UTF-32 simply encodes each Unicode scalar value as-is in a 32-bit code unit. However, Unicode
//! code points only use 21 bits (the maximum is `U+10FFFF`), and UTF-16 surrogate code points
//! (`U+D800` to `U+DFFF`) are invalid in UTF-32, so not every 32-bit value is valid UTF-32. Since
//! UTF-32 is a fixed-width encoding, it is much easier to deal with, but methods equivalent to
//! those of the 16-bit strings are provided for compatibility.
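//!
//! These validity rules match the standard library's `char::from_u32`, which accepts exactly the
//! Unicode scalar values. `is_valid_utf32` below is a hypothetical standard-library-only helper,
//! not part of this crate's API.
//!
//! ```rust
//! /// Hypothetical helper: returns whether every value is a Unicode scalar value, i.e. at most
//! /// `U+10FFFF` and not a surrogate code point.
//! fn is_valid_utf32(values: &[u32]) -> bool {
//!     values.iter().all(|&v| char::from_u32(v).is_some())
//! }
//!
//! fn main() {
//!     assert!(is_valid_utf32(&[0x1D11E, 0x6d]));
//!     assert!(!is_valid_utf32(&[0xDD1E])); // surrogate code point
//!     assert!(!is_valid_utf32(&[0x11_0000])); // beyond U+10FFFF
//! }
//! ```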
//!
//! All the 32-bit wide strings provide efficient methods to convert to and from sequences of
//! [`char`] data, as the representation of UTF-32 strings is functionally equivalent to sequences
//! of [`char`]s. Keep in mind, however, that only [`Utf32String`] guarantees this equivalence,
//! since the other strings may contain invalid values.
//!
//! # FFI with C/C++ `wchar_t`
//!
//! C/C++'s `wchar_t` (and C++'s corresponding `std::wstring`) varies in size depending on compiler
//! and platform. Typically, `wchar_t` is 16 bits on Windows and 32 bits on most Unix-based
//! platforms. For convenience when using `wchar_t`-based FFIs, type aliases for the corresponding
//! string types are provided: [`WideString`] aliases [`U16String`] on Windows or [`U32String`]
//! elsewhere, [`WideCString`] aliases [`U16CString`] or [`U32CString`], and [`WideUtfString`]
//! aliases [`Utf16String`] or [`Utf32String`]. [`WideStr`], [`WideCStr`], and [`WideUtfStr`] are
//! provided for the string slice types. The [`WideChar`] alias is also provided, aliasing [`u16`]
//! or [`u32`] depending on platform.
//!
//! When not interacting with an FFI that uses `wchar_t`, it is recommended to use the string types
//! directly rather than via the wide aliases.
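//!
//! The platform-dependent width can be sketched with the same `cfg` split this crate uses for
//! [`WideChar`]. `WideCharSketch` and `to_wchar_buf` are hypothetical names for illustration
//! only; the buffer they build is nul-terminated, as a `wchar_t`-based C API would typically
//! expect.
//!
//! ```rust
//! // Mirrors the split used by `WideChar`: 16 bits on Windows, 32 bits elsewhere.
//! #[cfg(windows)]
//! type WideCharSketch = u16;
//! #[cfg(not(windows))]
//! type WideCharSketch = u32;
//!
//! /// Hypothetical helper: encodes `s` into a nul-terminated buffer of platform-sized wide
//! /// characters (UTF-16 on Windows, UTF-32 elsewhere).
//! #[cfg(windows)]
//! fn to_wchar_buf(s: &str) -> Vec<WideCharSketch> {
//!     s.encode_utf16().chain(std::iter::once(0)).collect()
//! }
//!
//! #[cfg(not(windows))]
//! fn to_wchar_buf(s: &str) -> Vec<WideCharSketch> {
//!     s.chars().map(u32::from).chain(std::iter::once(0)).collect()
//! }
//!
//! fn main() {
//!     assert_eq!(to_wchar_buf("hi"), vec![0x68, 0x69, 0]);
//! }
//! ```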
//!
//! # Nul values
//!
//! This crate uses the legacy ASCII term "nul" to refer to Unicode code point `U+0000 NULL` and its
//! associated code unit representation as zero-valued code units. This is to disambiguate this
//! zero value from null pointer values. C-style strings end in a nul value, while regular Rust
//! strings allow interior nul values and are not terminated with nul.
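//!
//! The practical difference shows up when measuring length. The following standard-library-only
//! sketch mimics C's `wcslen` on a 16-bit buffer; `wide_strlen` is a hypothetical helper, not part
//! of this crate.
//!
//! ```rust
//! /// Hypothetical helper: returns the number of code units before the first nul, or `None` if
//! /// the buffer is not nul-terminated.
//! fn wide_strlen(buf: &[u16]) -> Option<usize> {
//!     buf.iter().position(|&c| c == 0)
//! }
//!
//! fn main() {
//!     // A C-style reader stops at the first nul value...
//!     assert_eq!(wide_strlen(&[0x68, 0x69, 0, 0x78]), Some(2));
//!     // ...while a Rust slice may hold interior nuls and needs no terminator.
//!     let rust_style: [u16; 3] = [0x68, 0, 0x69];
//!     assert_eq!(rust_style.len(), 3);
//! }
//! ```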
//!
//! # Examples
//!
//! The following example uses [`U16String`] to get Windows error messages. Since `FormatMessageW`
//! returns the string length, and the message is not passed on to other FFI functions, there is
//! no need to worry about nul values.
//!
//! ```rust
//! # #[cfg(any(not(windows), not(feature = "alloc")))]
//! # fn main() {}
//! # extern crate windows_sys;
//! # extern crate widestring;
//! # #[cfg(all(windows, feature = "alloc"))]
//! # fn main() {
//! use windows_sys::{Win32::{System::Diagnostics::Debug::{FormatMessageW,
//!     FORMAT_MESSAGE_FROM_SYSTEM, FORMAT_MESSAGE_ALLOCATE_BUFFER,
//!     FORMAT_MESSAGE_IGNORE_INSERTS}, Foundation::{LocalFree, HLOCAL}}, core::PWSTR};
//! use std::ptr;
//! use widestring::U16String;
//!
//! let error_code: u32 = 0;
//! let s: U16String;
//! unsafe {
//!     // First, get a string buffer from some Windows API such as FormatMessageW...
//!     let mut buffer: PWSTR = ptr::null_mut();
//!     let strlen = FormatMessageW(FORMAT_MESSAGE_FROM_SYSTEM |
//!                                 FORMAT_MESSAGE_ALLOCATE_BUFFER |
//!                                 FORMAT_MESSAGE_IGNORE_INSERTS,
//!                                 ptr::null(),
//!                                 error_code, // error code from GetLastError()
//!                                 0,
//!                                 (&mut buffer as *mut PWSTR) as PWSTR,
//!                                 0,
//!                                 ptr::null_mut());
//!
//!     // Get the buffer as a wide string
//!     s = U16String::from_ptr(buffer, strlen as usize);
//!     // Since U16String creates an owned copy, it's safe to free the original buffer now.
//!     // If you didn't want an owned copy, you could use &U16Str.
//!     LocalFree(buffer as HLOCAL);
//! }
//! // Convert to a regular Rust String and use it to your heart's desire!
//! let message = s.to_string_lossy();
//! # assert_eq!(message, "The operation completed successfully.\r\n");
//! # }
//! ```
//!
//! The following example is functionally the same, only using [`U16CString`] instead, which reads
//! the message up to its nul terminator rather than relying on the returned length.
//!
//! ```rust
//! # #[cfg(any(not(windows), not(feature = "alloc")))]
//! # fn main() {}
//! # extern crate windows_sys;
//! # extern crate widestring;
//! # #[cfg(all(windows, feature = "alloc"))]
//! # fn main() {
//! use windows_sys::{Win32::{System::Diagnostics::Debug::{FormatMessageW,
//!     FORMAT_MESSAGE_FROM_SYSTEM, FORMAT_MESSAGE_ALLOCATE_BUFFER,
//!     FORMAT_MESSAGE_IGNORE_INSERTS}, Foundation::{LocalFree, HLOCAL}}, core::PWSTR};
//! use std::ptr;
//! use widestring::U16CString;
//!
//! let error_code: u32 = 0;
//! let s: U16CString;
//! unsafe {
//!     // First, get a string buffer from some Windows API such as FormatMessageW...
//!     let mut buffer: PWSTR = ptr::null_mut();
//!     FormatMessageW(FORMAT_MESSAGE_FROM_SYSTEM |
//!                    FORMAT_MESSAGE_ALLOCATE_BUFFER |
//!                    FORMAT_MESSAGE_IGNORE_INSERTS,
//!                    ptr::null(),
//!                    error_code, // error code from GetLastError()
//!                    0,
//!                    (&mut buffer as *mut PWSTR) as PWSTR,
//!                    0,
//!                    ptr::null_mut());
//!
//!     // Get the buffer as a wide string
//!     s = U16CString::from_ptr_str(buffer);
//!     // Since U16CString creates an owned copy, it's safe to free the original buffer now.
//!     // If you didn't want an owned copy, you could use &U16CStr.
//!     LocalFree(buffer as HLOCAL);
//! }
//! // Convert to a regular Rust String and use it to your heart's desire!
//! let message = s.to_string_lossy();
//! # assert_eq!(message, "The operation completed successfully.\r\n");
//! # }
//! ```
//!
//! [`OsString`]: std::ffi::OsString
//! [`OsStr`]: std::ffi::OsStr
//! [`CString`]: std::ffi::CString
//! [`CStr`]: std::ffi::CStr

#![warn(
    missing_docs,
    missing_debug_implementations,
    trivial_casts,
    trivial_numeric_casts,
    future_incompatible
)]
#![allow(renamed_and_removed_lints, stable_features)] // Until min version gets bumped
#![cfg_attr(not(feature = "std"), no_std)]
#![doc(html_root_url = "https://docs.rs/widestring/1.2.0")]
#![doc(test(attr(deny(warnings), allow(unused))))]
#![cfg_attr(docsrs, feature(doc_cfg))]
#![cfg_attr(
    feature = "debugger_visualizer",
    feature(debugger_visualizer),
    debugger_visualizer(natvis_file = "../debug_metadata/widestring.natvis")
)]

#[cfg(feature = "alloc")]
extern crate alloc;

use crate::error::{DecodeUtf16Error, DecodeUtf32Error};
#[cfg(feature = "alloc")]
#[allow(unused_imports)]
use alloc::vec::Vec;
use core::fmt::Write;

pub mod error;
pub mod iter;
mod macros;
#[cfg(feature = "std")]
#[cfg_attr(docsrs, doc(cfg(feature = "alloc")))]
mod platform;
pub mod ucstr;
#[cfg(feature = "alloc")]
#[cfg_attr(docsrs, doc(cfg(feature = "alloc")))]
pub mod ucstring;
pub mod ustr;
#[cfg(feature = "alloc")]
#[cfg_attr(docsrs, doc(cfg(feature = "alloc")))]
pub mod ustring;
pub mod utfstr;
#[cfg(feature = "alloc")]
#[cfg_attr(docsrs, doc(cfg(feature = "alloc")))]
pub mod utfstring;

#[doc(hidden)]
pub use macros::internals;
pub use ucstr::{U16CStr, U32CStr, WideCStr};
#[cfg(feature = "alloc")]
#[cfg_attr(docsrs, doc(cfg(feature = "alloc")))]
pub use ucstring::{U16CString, U32CString, WideCString};
pub use ustr::{U16Str, U32Str, WideStr};
#[cfg(feature = "alloc")]
#[cfg_attr(docsrs, doc(cfg(feature = "alloc")))]
pub use ustring::{U16String, U32String, WideString};
pub use utfstr::{Utf16Str, Utf32Str, WideUtfStr};
#[cfg(feature = "alloc")]
#[cfg_attr(docsrs, doc(cfg(feature = "alloc")))]
pub use utfstring::{Utf16String, Utf32String, WideUtfString};

#[cfg(not(windows))]
/// Alias for [`u16`] or [`u32`] depending on platform. Intended to match the typical C `wchar_t`
/// size on the platform.
pub type WideChar = u32;

#[cfg(windows)]
/// Alias for [`u16`] or [`u32`] depending on platform. Intended to match the typical C `wchar_t`
/// size on the platform.
pub type WideChar = u16;

/// Creates an iterator over the UTF-16 encoded code points in `iter`, returning unpaired surrogates
/// as `Err`s.
///
/// # Examples
///
/// Basic usage:
///
/// ```
/// use widestring::decode_utf16;
///
/// // 𝄞mus<invalid>ic<invalid>
/// let v = [
///     0xD834, 0xDD1E, 0x006d, 0x0075, 0x0073, 0xDD1E, 0x0069, 0x0063, 0xD834,
/// ];
///
/// assert_eq!(
///     decode_utf16(v.iter().copied())
///         .map(|r| r.map_err(|e| e.unpaired_surrogate()))
///         .collect::<Vec<_>>(),
///     vec![
///         Ok('𝄞'),
///         Ok('m'), Ok('u'), Ok('s'),
///         Err(0xDD1E),
///         Ok('i'), Ok('c'),
///         Err(0xD834)
///     ]
/// );
/// ```
///
/// A lossy decoder can be obtained by replacing `Err` results with the replacement character
/// (this is what [`decode_utf16_lossy`] does):
///
/// ```
/// use widestring::decode_utf16;
///
/// // 𝄞mus<invalid>ic<invalid>
/// let v = [
///     0xD834, 0xDD1E, 0x006d, 0x0075, 0x0073, 0xDD1E, 0x0069, 0x0063, 0xD834,
/// ];
///
/// assert_eq!(
///     decode_utf16(v.iter().copied())
///         .map(|r| r.unwrap_or(char::REPLACEMENT_CHARACTER))
///         .collect::<String>(),
///     "𝄞mus�ic�"
/// );
/// ```
#[must_use]
pub fn decode_utf16<I: IntoIterator<Item = u16>>(iter: I) -> iter::DecodeUtf16<I::IntoIter> {
    iter::DecodeUtf16::new(iter.into_iter())
}

/// Creates a lossy decoder iterator over the possibly ill-formed UTF-16 encoded code points in
/// `iter`.
///
/// This is equivalent to [`char::decode_utf16`][core::char::decode_utf16] except that any unpaired
/// UTF-16 surrogate values are replaced by
/// [`U+FFFD REPLACEMENT_CHARACTER`][core::char::REPLACEMENT_CHARACTER] (�) instead of returning
/// errors.
///
/// # Examples
///
/// ```
/// use widestring::decode_utf16_lossy;
///
/// // 𝄞mus<invalid>ic<invalid>
/// let v = [
///     0xD834, 0xDD1E, 0x006d, 0x0075, 0x0073, 0xDD1E, 0x0069, 0x0063, 0xD834,
/// ];
///
/// assert_eq!(
///     decode_utf16_lossy(v.iter().copied()).collect::<String>(),
///     "𝄞mus�ic�"
/// );
/// ```
#[inline]
#[must_use]
pub fn decode_utf16_lossy<I: IntoIterator<Item = u16>>(
    iter: I,
) -> iter::DecodeUtf16Lossy<I::IntoIter> {
    iter::DecodeUtf16Lossy {
        iter: decode_utf16(iter),
    }
}

/// Creates a decoder iterator over UTF-32 encoded code points in `iter`, returning invalid values
/// as `Err`s.
///
/// # Examples
///
/// ```
/// use widestring::decode_utf32;
///
/// // 𝄞mus<invalid>ic<invalid>
/// let v = [
///     0x1D11E, 0x6d, 0x75, 0x73, 0xDD1E, 0x69, 0x63, 0x23FD5A,
/// ];
///
/// assert_eq!(
///     decode_utf32(v.iter().copied())
///         .map(|r| r.map_err(|e| e.invalid_code_point()))
///         .collect::<Vec<_>>(),
///     vec![
///         Ok('𝄞'),
///         Ok('m'), Ok('u'), Ok('s'),
///         Err(0xDD1E),
///         Ok('i'), Ok('c'),
///         Err(0x23FD5A)
///     ]
/// );
/// ```
#[inline]
#[must_use]
pub fn decode_utf32<I: IntoIterator<Item = u32>>(iter: I) -> iter::DecodeUtf32<I::IntoIter> {
    iter::DecodeUtf32 {
        iter: iter.into_iter(),
    }
}

/// Creates a lossy decoder iterator over the possibly ill-formed UTF-32 encoded code points in
/// `iter`.
///
/// This is equivalent to [`decode_utf32`] except that any invalid UTF-32 values are replaced by
/// [`U+FFFD REPLACEMENT_CHARACTER`][core::char::REPLACEMENT_CHARACTER] (�) instead of returning
/// errors.
///
/// # Examples
///
/// ```
/// use widestring::decode_utf32_lossy;
///
/// // 𝄞mus<invalid>ic<invalid>
/// let v = [
///     0x1D11E, 0x6d, 0x75, 0x73, 0xDD1E, 0x69, 0x63, 0x23FD5A,
/// ];
///
/// assert_eq!(
///     decode_utf32_lossy(v.iter().copied()).collect::<String>(),
///     "𝄞mus�ic�"
/// );
/// ```
#[inline]
#[must_use]
pub fn decode_utf32_lossy<I: IntoIterator<Item = u32>>(
    iter: I,
) -> iter::DecodeUtf32Lossy<I::IntoIter> {
    iter::DecodeUtf32Lossy {
        iter: decode_utf32(iter),
    }
}

/// Creates an iterator that encodes an iterator over [`char`]s into UTF-8 bytes.
///
/// # Examples
///
/// ```
/// use widestring::encode_utf8;
///
/// let music = "𝄞music";
///
/// let encoded: Vec<u8> = encode_utf8(music.chars()).collect();
///
/// assert_eq!(encoded, music.as_bytes());
/// ```
#[must_use]
pub fn encode_utf8<I: IntoIterator<Item = char>>(iter: I) -> iter::EncodeUtf8<I::IntoIter> {
    iter::EncodeUtf8::new(iter.into_iter())
}

/// Creates an iterator that encodes an iterator over [`char`]s into UTF-16 [`u16`] code units.
///
/// # Examples
///
/// ```
/// use widestring::encode_utf16;
///
/// let encoded: Vec<u16> = encode_utf16("𝄞music".chars()).collect();
///
/// let v = [
///     0xD834, 0xDD1E, 0x006d, 0x0075, 0x0073, 0x0069, 0x0063,
/// ];
///
/// assert_eq!(encoded, v);
/// ```
#[must_use]
pub fn encode_utf16<I: IntoIterator<Item = char>>(iter: I) -> iter::EncodeUtf16<I::IntoIter> {
    iter::EncodeUtf16::new(iter.into_iter())
}

/// Creates an iterator that encodes an iterator over [`char`]s into UTF-32 [`u32`] values.
///
/// This iterator is a simple type cast from [`char`] to [`u32`], as any sequence of [`char`]s is
/// valid UTF-32.
///
/// # Examples
///
/// ```
/// use widestring::encode_utf32;
///
/// let encoded: Vec<u32> = encode_utf32("𝄞music".chars()).collect();
///
/// let v = [
///     0x1D11E, 0x006d, 0x0075, 0x0073, 0x0069, 0x0063,
/// ];
///
/// assert_eq!(encoded, v);
/// ```
#[must_use]
pub fn encode_utf32<I: IntoIterator<Item = char>>(iter: I) -> iter::EncodeUtf32<I::IntoIter> {
    iter::EncodeUtf32::new(iter.into_iter())
}

/// Debug implementation for any U16 string slice.
///
/// Properly encoded input data will output valid strings with escape sequences; however, invalid
/// encoding will purposefully output any unpaired surrogate as \<XXXX>, which is not a valid
/// escape sequence. This is intentional, as debug output is not meant to be parsed but read by
/// humans.
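///
/// The output format can be reproduced with the standard library alone. `debug_u16` below is a
/// hypothetical sketch, not this crate's implementation: it escapes valid characters with
/// `char::escape_debug` and writes unpaired surrogates as uppercase hex inside `\<...>`.
///
/// ```rust
/// use std::fmt::Write;
///
/// /// Hypothetical helper mirroring the debug format described above.
/// fn debug_u16(units: &[u16]) -> String {
///     let mut out = String::from("\"");
///     for res in char::decode_utf16(units.iter().copied()) {
///         match res {
///             Ok(ch) => out.extend(ch.escape_debug()),
///             // Deliberately not a real escape sequence: it is meant for human readers.
///             Err(e) => write!(out, "\\<{:X}>", e.unpaired_surrogate()).unwrap(),
///         }
///     }
///     out.push('"');
///     out
/// }
///
/// fn main() {
///     // 'm', a newline, and an unpaired low surrogate.
///     assert_eq!(debug_u16(&[0x6d, 0x0a, 0xDD1E]), "\"m\\n\\<DD1E>\"");
/// }
/// ```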
fn debug_fmt_u16(s: &[u16], fmt: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
    debug_fmt_utf16_iter(decode_utf16(s.iter().copied()), fmt)
}

/// Debug implementation for any UTF-16 decoding iterator.
///
/// Properly encoded input data will output valid strings with escape sequences; however, invalid
/// encoding will purposefully output any unpaired surrogate as \<XXXX>, which is not a valid
/// escape sequence. This is intentional, as debug output is not meant to be parsed but read by
/// humans.
fn debug_fmt_utf16_iter(
    iter: impl Iterator<Item = Result<char, DecodeUtf16Error>>,
    fmt: &mut core::fmt::Formatter<'_>,
) -> core::fmt::Result {
    fmt.write_char('"')?;
    for res in iter {
        match res {
            Ok(ch) => {
                for c in ch.escape_debug() {
                    fmt.write_char(c)?;
                }
            }
            Err(e) => {
                write!(fmt, "\\<{:X}>", e.unpaired_surrogate())?;
            }
        }
    }
    fmt.write_char('"')
}

/// Debug implementation for any U32 string slice.
///
/// Properly encoded input data will output valid strings with escape sequences; however, invalid
/// encoding will purposefully output any invalid code point as \<XXXX>, which is not a valid
/// escape sequence. This is intentional, as debug output is not meant to be parsed but read by
/// humans.
fn debug_fmt_u32(s: &[u32], fmt: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
    debug_fmt_utf32_iter(decode_utf32(s.iter().copied()), fmt)
}

/// Debug implementation for any UTF-32 decoding iterator.
///
/// Properly encoded input data will output valid strings with escape sequences; however, invalid
/// encoding will purposefully output any invalid code point as \<XXXX>, which is not a valid
/// escape sequence. This is intentional, as debug output is not meant to be parsed but read by
/// humans.
fn debug_fmt_utf32_iter(
    iter: impl Iterator<Item = Result<char, DecodeUtf32Error>>,
    fmt: &mut core::fmt::Formatter<'_>,
) -> core::fmt::Result {
    fmt.write_char('"')?;
    for res in iter {
        match res {
            Ok(ch) => {
                for c in ch.escape_debug() {
                    fmt.write_char(c)?;
                }
            }
            Err(e) => {
                write!(fmt, "\\<{:X}>", e.invalid_code_point())?;
            }
        }
    }
    fmt.write_char('"')
}

/// Debug implementation for any `char` iterator.
fn debug_fmt_char_iter(
    iter: impl Iterator<Item = char>,
    fmt: &mut core::fmt::Formatter<'_>,
) -> core::fmt::Result {
    fmt.write_char('"')?;
    iter.flat_map(|c| c.escape_debug())
        .try_for_each(|c| fmt.write_char(c))?;
    fmt.write_char('"')
}

/// Returns whether the code unit is a UTF-16 surrogate value.
#[inline(always)]
#[allow(dead_code)]
const fn is_utf16_surrogate(u: u16) -> bool {
    u >= 0xD800 && u <= 0xDFFF
}

/// Returns whether the code unit is a UTF-16 high surrogate value.
#[inline(always)]
#[allow(dead_code)]
const fn is_utf16_high_surrogate(u: u16) -> bool {
    u >= 0xD800 && u <= 0xDBFF
}

/// Returns whether the code unit is a UTF-16 low surrogate value.
#[inline(always)]
const fn is_utf16_low_surrogate(u: u16) -> bool {
    u >= 0xDC00 && u <= 0xDFFF
}

/// Converts a UTF-16 surrogate pair to a `char` without validating the surrogates.
///
/// # Safety
///
/// `high` must be a UTF-16 high surrogate (`0xD800..=0xDBFF`) and `low` must be a UTF-16 low
/// surrogate (`0xDC00..=0xDFFF`). Otherwise the arithmetic may overflow or produce a value that is
/// not a valid `char`, which is undefined behavior.
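///
/// As a worked example, the pair `0xD834 0xDD1E` decodes to `U+1D11E`: `(0x34 << 10) | 0x11E`
/// is `0xD11E`, and adding `0x10000` gives `0x1D11E`. The hypothetical, standard-library-only
/// `combine_surrogates` below repeats the same arithmetic on plain integers.
///
/// ```rust
/// /// Hypothetical helper: same arithmetic as `decode_utf16_surrogate_pair`, returning the raw
/// /// code point instead of a `char`. Assumes `high` and `low` are valid surrogates.
/// fn combine_surrogates(high: u16, low: u16) -> u32 {
///     // Each surrogate carries 10 bits; together they encode an offset from U+10000.
///     ((((high - 0xD800) as u32) << 10) | (low - 0xDC00) as u32) + 0x1_0000
/// }
///
/// fn main() {
///     assert_eq!(combine_surrogates(0xD834, 0xDD1E), 0x1D11E);
///     assert_eq!(char::from_u32(0x1D11E), Some('𝄞'));
/// }
/// ```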
#[inline(always)]
unsafe fn decode_utf16_surrogate_pair(high: u16, low: u16) -> char {
    let c: u32 = ((((high - 0xD800) as u32) << 10) | (low - 0xDC00) as u32) + 0x1_0000;
    // SAFETY: the caller guarantees that `high` and `low` are valid high and low surrogates, so
    // `c` is in `0x1_0000..=0x10_FFFF`, which is always a valid `char`.
    core::char::from_u32_unchecked(c)
}

/// Validates whether a slice of 16-bit values is valid UTF-16, returning an error if it is not.
#[inline(always)]
fn validate_utf16(s: &[u16]) -> Result<(), crate::error::Utf16Error> {
    for (index, result) in crate::decode_utf16(s.iter().copied()).enumerate() {
        if let Err(e) = result {
            return Err(crate::error::Utf16Error::empty(index, e));
        }
    }
    Ok(())
}

/// Validates whether a vector of 16-bit values is valid UTF-16, returning an error if it is not.
#[inline(always)]
#[cfg(feature = "alloc")]
fn validate_utf16_vec(v: Vec<u16>) -> Result<Vec<u16>, crate::error::Utf16Error> {
    for (index, result) in crate::decode_utf16(v.iter().copied()).enumerate() {
        if let Err(e) = result {
            return Err(crate::error::Utf16Error::new(v, index, e));
        }
    }
    Ok(v)
}

/// Validates whether a slice of 32-bit values is valid UTF-32, returning an error if it is not.
#[inline(always)]
fn validate_utf32(s: &[u32]) -> Result<(), crate::error::Utf32Error> {
    for (index, result) in crate::decode_utf32(s.iter().copied()).enumerate() {
        if let Err(e) = result {
            return Err(crate::error::Utf32Error::empty(index, e));
        }
    }
    Ok(())
}

/// Validates whether a vector of 32-bit values is valid UTF-32, returning an error if it is not.
#[inline(always)]
#[cfg(feature = "alloc")]
fn validate_utf32_vec(v: Vec<u32>) -> Result<Vec<u32>, crate::error::Utf32Error> {
    for (index, result) in crate::decode_utf32(v.iter().copied()).enumerate() {
        if let Err(e) = result {
            return Err(crate::error::Utf32Error::new(v, index, e));
        }
    }
    Ok(v)
}

/// Copy of the unstable `core::slice::range` to soundly handle ranges.
/// TODO: Replace with `core::slice::range` when it is stabilized.
#[track_caller]
#[allow(dead_code, clippy::redundant_closure)]
fn range<R>(range: R, bounds: core::ops::RangeTo<usize>) -> core::ops::Range<usize>
where
    R: core::ops::RangeBounds<usize>,
{
    #[inline(never)]
    #[cold]
    #[track_caller]
    fn slice_end_index_len_fail(index: usize, len: usize) -> ! {
        panic!(
            "range end index {} out of range for slice of length {}",
            index, len
        );
    }

    #[inline(never)]
    #[cold]
    #[track_caller]
    fn slice_index_order_fail(index: usize, end: usize) -> ! {
        panic!("slice index starts at {} but ends at {}", index, end);
    }

    #[inline(never)]
    #[cold]
    #[track_caller]
    fn slice_start_index_overflow_fail() -> ! {
        panic!("attempted to index slice from after maximum usize");
    }

    #[inline(never)]
    #[cold]
    #[track_caller]
    fn slice_end_index_overflow_fail() -> ! {
        panic!("attempted to index slice up to maximum usize");
    }

    use core::ops::Bound::*;

    let len = bounds.end;

    let start = range.start_bound();
    let start = match start {
        Included(&start) => start,
        Excluded(start) => start
            .checked_add(1)
            .unwrap_or_else(|| slice_start_index_overflow_fail()),
        Unbounded => 0,
    };

    let end = range.end_bound();
    let end = match end {
        Included(end) => end
            .checked_add(1)
            .unwrap_or_else(|| slice_end_index_overflow_fail()),
        Excluded(&end) => end,
        Unbounded => len,
    };

    if start > end {
        slice_index_order_fail(start, end);
    }
    if end > len {
        slice_end_index_len_fail(end, len);
    }

    core::ops::Range { start, end }
}

/// Similar to `core::slice::range`, but returns [`None`] instead of panicking.
fn range_check<R>(range: R, bounds: core::ops::RangeTo<usize>) -> Option<core::ops::Range<usize>>
where
    R: core::ops::RangeBounds<usize>,
{
    use core::ops::Bound::*;

    let len = bounds.end;

    let start = range.start_bound();
    let start = match start {
        Included(&start) => start,
        Excluded(start) => start.checked_add(1)?,
        Unbounded => 0,
    };

    let end = range.end_bound();
    let end = match end {
        Included(end) => end.checked_add(1)?,
        Excluded(&end) => end,
        Unbounded => len,
    };

    if start > end || end > len {
        return None;
    }
    Some(core::ops::Range { start, end })
}