safe_arch/
lib.rs

#![no_std]
#![warn(missing_docs)]
#![allow(unused_imports)]
#![allow(clippy::too_many_arguments)]
#![allow(clippy::transmute_ptr_to_ptr)]
#![cfg_attr(docsrs, feature(doc_cfg))]

//! A crate that safely exposes arch intrinsics via `#[cfg()]`.
//!
//! `safe_arch` lets you safely use CPU intrinsics. Those things in the
//! [`core::arch`](core::arch) modules. It works purely via `#[cfg()]` and
//! compile time CPU feature declaration. If you want to check for a feature at
//! runtime and then call an intrinsic or use a fallback path based on that then
//! this crate is sadly not for you.
//!
//! SIMD register types are "newtype'd" so that better trait impls can be given
//! to them, but the inner value is a `pub` field so feel free to just grab it
//! out if you need to. Trait impls of the newtypes include: `Default` (zeroed),
//! `From`/`Into` of appropriate data types, and appropriate operator
//! overloading.
//!
//! * Most intrinsics (like addition and multiplication) are totally safe to use
//!   as long as the CPU feature is available. In this case, what you get is 1:1
//!   with the actual intrinsic.
//! * Some intrinsics take a pointer of an assumed minimum alignment and
//!   validity span. For these, the `safe_arch` function takes a reference of an
//!   appropriate type to uphold safety.
//!   * Try the [bytemuck](https://docs.rs/bytemuck) crate (and turn on the
//!     `bytemuck` feature of this crate) if you want help safely casting
//!     between reference types.
//! * Some intrinsics are not safe unless you're _very_ careful about how you
//!   use them, such as the streaming operations requiring you to use them in
//!   combination with an appropriate memory fence. Those operations aren't
//!   exposed here.
//! * Some intrinsics mess with the processor state, such as changing the
//!   floating point flags, saving and loading special register state, and so
//!   on. LLVM doesn't really support you messing with that within a high level
//!   language, so those operations aren't exposed here. Use assembly or
//!   something if you want to do that.
//!
//! ## Naming Conventions
//! The `safe_arch` crate does not simply use the "official" names for each
//! intrinsic, because the official names are generally poor. Instead, the
//! operations have been given better names that hopefully make things easier
//! to understand when you're reading the code.
//!
//! For a full explanation of the naming used, see the [Naming
//! Conventions](crate::naming_conventions) page.
//!
//! ## Current Support
//! * `x86` / `x86_64` (Intel, AMD, etc)
//!   * 128-bit: `sse`, `sse2`, `sse3`, `ssse3`, `sse4.1`, `sse4.2`
//!   * 256-bit: `avx`, `avx2`
//!   * Other: `adx`, `aes`, `bmi1`, `bmi2`, `fma`, `lzcnt`, `pclmulqdq`,
//!     `popcnt`, `rdrand`, `rdseed`
//!
//! ## Compile Time CPU Target Features
//!
//! At the time of me writing this, Rust enables the `sse` and `sse2` CPU
//! features by default for all `i686` (x86) and `x86_64` builds. Those CPU
//! features are built into the design of `x86_64`, and you'd need a _super_ old
//! `x86` CPU for it to not support at least `sse` and `sse2`, so they're a safe
//! bet for the language to enable all the time. In fact, because the standard
//! library is compiled with them enabled, simply trying to _disable_ those
//! features would actually cause ABI issues and fill your program with UB
//! ([link][rustc_docs]).
//!
//! If you want additional CPU features available at compile time you'll have to
//! enable them with an additional arg to `rustc`. For a feature named `name`
//! you pass `-C target-feature=+name`, such as `-C target-feature=+sse3` for
//! `sse3`.
//!
//! You can alternately enable _all_ target features of the current CPU with `-C
//! target-cpu=native`. This is primarily of use if you're building a program
//! you'll only run on your own system.
//!
//! It's sometimes hard to know if your target platform will support a given
//! feature set, but the [Steam Hardware Survey][steam-survey] is generally
//! taken as a guide to what you can expect people to have available. If you
//! click "Other Settings" it'll expand into a list of CPU target features and
//! how common they are. These days, it seems that `sse3` can be safely assumed,
//! and `ssse3`, `sse4.1`, and `sse4.2` are pretty safe bets as well. The stuff
//! above 128-bit isn't as common yet; give it another few years.
//!
//! **Please note that executing a program on a CPU that doesn't support the
//! target features it was compiled for is Undefined Behavior.**
//!
//! Currently, Rust doesn't actually support an easy way for you to check that a
//! feature enabled at compile time is _actually_ available at runtime. There is
//! the "[feature_detected][feature_detected]" family of macros, but if you
//! enable a feature they will evaluate to a constant `true` instead of actually
//! deferring the check for the feature to runtime. This means that, if you
//! _did_ want a check at the start of your program, to confirm that all the
//! assumed features are present and error out when the assumptions don't hold,
//! you can't use that macro. You gotta use CPUID and check manually. rip.
//! Hopefully we can make that process easier in a future version of this crate.
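//!
//! As a rough sketch of what that manual check could look like (this is not
//! an API of this crate, and the helper names `cpuid_leaf1` and
//! `has_sse3_at_runtime` are hypothetical), the example below reads CPUID
//! leaf 1 through `core::arch` and tests a feature bit at runtime. It assumes
//! the CPU supports the `cpuid` instruction at all, which holds on `x86_64`
//! and on any `x86` CPU recent enough to run an `i686` build.
//!
//! ```rust
//! /// Returns `(ecx, edx)` from CPUID leaf 1 (hypothetical helper).
//! #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
//! fn cpuid_leaf1() -> Option<(u32, u32)> {
//!   #[cfg(target_arch = "x86")]
//!   use core::arch::x86::__cpuid;
//!   #[cfg(target_arch = "x86_64")]
//!   use core::arch::x86_64::__cpuid;
//!   // The `unsafe` block is kept for toolchains where `__cpuid` is an
//!   // `unsafe fn`; the `allow` silences the lint where it is not.
//!   #[allow(unused_unsafe)]
//!   let regs = unsafe { __cpuid(1) };
//!   Some((regs.ecx, regs.edx))
//! }
//!
//! /// On other architectures there is no CPUID to query.
//! #[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
//! fn cpuid_leaf1() -> Option<(u32, u32)> {
//!   None
//! }
//!
//! /// `sse3` is reported in bit 0 of ECX for CPUID leaf 1.
//! fn has_sse3_at_runtime() -> bool {
//!   match cpuid_leaf1() {
//!     Some((ecx, _edx)) => (ecx & (1 << 0)) != 0,
//!     None => false,
//!   }
//! }
//! ```
//!
//! A program could call something like `has_sse3_at_runtime()` first thing in
//! `main` and exit with an error message when it returns `false`. Other
//! features live in other bits of the same leaf, for example EDX bit 26 for
//! `sse2`.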
//!
//! [steam-survey]:
//! https://store.steampowered.com/hwsurvey/Steam-Hardware-Software-Survey-Welcome-to-Steam
//! [feature_detected]:
//! https://doc.rust-lang.org/std/index.html?search=feature_detected
//! [rustc_docs]: https://doc.rust-lang.org/rustc/targets/known-issues.html
//!
//! ### A Note On Working With Cfg
//!
//! There are two main ways to use `cfg`:
//! * Via an attribute placed on an item, block, or expression:
//!   * `#[cfg(debug_assertions)] println!("hello");`
//! * Via a macro used within an expression position:
//!   * `if cfg!(debug_assertions) { println!("hello"); }`
//!
//! The difference might seem small but it's actually very important:
//! * The attribute form will include code or not _before_ deciding if all the
//!   items named and so forth really exist or not. This means that code that is
//!   configured via attribute can safely name things that don't always exist as
//!   long as the things they name do exist whenever that code is configured
//!   into the build.
//! * The macro form will include the configured code _no matter what_, and then
//!   the macro resolves to a constant `true` or `false` and the compiler uses
//!   dead code elimination to cut out the path not taken.
//!
//! This crate uses `cfg` via the attribute, so the functions it exposes don't
//! exist at all when the appropriate CPU target features aren't enabled.
//! Accordingly, if you plan to call this crate or not depending on what
//! features are enabled in the build you'll also need to control your use of
//! this crate via cfg attribute, not cfg macro.
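//!
//! A minimal sketch of that difference, using a hypothetical function
//! `only_with_sse3` as a stand-in for anything that exists only when `sse3`
//! is enabled (none of these names are part of this crate):
//!
//! ```rust
//! /// Exists only in builds with `sse3` enabled (hypothetical stand-in).
//! #[cfg(target_feature = "sse3")]
//! fn only_with_sse3() -> &'static str {
//!   "sse3 path"
//! }
//!
//! /// Attribute form: this version is dropped before name resolution, so it
//! /// may name `only_with_sse3` even though that isn't always defined.
//! #[cfg(target_feature = "sse3")]
//! fn pick_path() -> &'static str {
//!   only_with_sse3()
//! }
//!
//! /// Fallback version used when `sse3` is not enabled.
//! #[cfg(not(target_feature = "sse3"))]
//! fn pick_path() -> &'static str {
//!   "fallback path"
//! }
//!
//! // The macro form would NOT compile without `sse3`, because both branches
//! // are always checked and `only_with_sse3` would fail to resolve:
//! //
//! //   if cfg!(target_feature = "sse3") { only_with_sse3() } else { "..." }
//! ```
//!
//! The same pattern applies to calls into this crate: put the call inside an
//! item or block that carries the matching `#[cfg(target_feature = "...")]`.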

use core::{
  convert::AsRef,
  fmt::{Binary, Debug, Display, LowerExp, LowerHex, Octal, UpperExp, UpperHex},
  ops::{Add, AddAssign, BitAnd, BitAndAssign, BitOr, BitOrAssign, BitXor, BitXorAssign, Div, DivAssign, Mul, MulAssign, Neg, Not, Sub, SubAssign},
};

pub mod naming_conventions;

/// Turns a round operator token to the correct constant value.
#[macro_export]
#[cfg_attr(docsrs, doc(cfg(target_feature = "avx")))]
// Note(Lokathor): keep this at the crate root.
macro_rules! round_op {
  (Nearest) => {{
    #[cfg(target_arch = "x86")]
    use ::core::arch::x86::{_MM_FROUND_NO_EXC, _MM_FROUND_TO_NEAREST_INT};
    #[cfg(target_arch = "x86_64")]
    use ::core::arch::x86_64::{_MM_FROUND_NO_EXC, _MM_FROUND_TO_NEAREST_INT};
    _MM_FROUND_NO_EXC | _MM_FROUND_TO_NEAREST_INT
  }};
  (NegInf) => {{
    #[cfg(target_arch = "x86")]
    use ::core::arch::x86::{_MM_FROUND_NO_EXC, _MM_FROUND_TO_NEG_INF};
    #[cfg(target_arch = "x86_64")]
    use ::core::arch::x86_64::{_MM_FROUND_NO_EXC, _MM_FROUND_TO_NEG_INF};
    _MM_FROUND_NO_EXC | _MM_FROUND_TO_NEG_INF
  }};
  (PosInf) => {{
    #[cfg(target_arch = "x86")]
    use ::core::arch::x86::{_MM_FROUND_NO_EXC, _MM_FROUND_TO_POS_INF};
    #[cfg(target_arch = "x86_64")]
    use ::core::arch::x86_64::{_MM_FROUND_NO_EXC, _MM_FROUND_TO_POS_INF};
    _MM_FROUND_NO_EXC | _MM_FROUND_TO_POS_INF
  }};
  (Zero) => {{
    // Only the two constants are needed here, matching the other arms.
    #[cfg(target_arch = "x86")]
    use ::core::arch::x86::{_MM_FROUND_NO_EXC, _MM_FROUND_TO_ZERO};
    #[cfg(target_arch = "x86_64")]
    use ::core::arch::x86_64::{_MM_FROUND_NO_EXC, _MM_FROUND_TO_ZERO};
    _MM_FROUND_NO_EXC | _MM_FROUND_TO_ZERO
  }};
}

/// Declares a private mod and then a glob `use` with the visibility specified.
macro_rules! submodule {
  ($v:vis $name:ident) => {
    mod $name;
    $v use $name::*;
  };
  ($v:vis $name:ident { $($content:tt)* }) => {
    mod $name { $($content)* }
    $v use $name::*;
  };
}

// Note(Lokathor): Stupid as it sounds, we need to put the imports here at the
// crate root because the arch-specific macros that we define in our inner
// modules are actually "scoped" to also be at the crate root. We want the
// rustdoc generation of the macros to "see" these imports so that the docs link
// over to the `core::arch` module correctly.
// https://github.com/rust-lang/rust/issues/72243

#[cfg(target_arch = "x86")]
use core::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use core::arch::x86_64::*;

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
submodule!(pub x86_x64 {
  //! Types and functions for safe `x86` / `x86_64` intrinsic usage.
  //!
  //! `x86_64` is essentially a superset of `x86`, so we just lump it all into
  //! one module. Anything not available on `x86` simply won't be in the build
  //! on that arch.
  use super::*;

  submodule!(pub m128_);
  submodule!(pub m128d_);
  submodule!(pub m128i_);

  submodule!(pub m256_);
  submodule!(pub m256d_);
  submodule!(pub m256i_);

  // Note(Lokathor): We only include these sub-modules with the actual functions
  // if the feature is enabled. We *also* have a cfg attribute on the inside of
  // the modules as a "double-verification" of sorts. Technically either way on
  // its own would also be fine.

  // These CPU features follow a fairly clear and strict progression that's easy
  // to remember. Most of them offer a fair pile of new functions.
  #[cfg(target_feature = "sse")]
  submodule!(pub sse);
  #[cfg(target_feature = "sse2")]
  submodule!(pub sse2);
  #[cfg(target_feature = "sse3")]
  submodule!(pub sse3);
  #[cfg(target_feature = "ssse3")]
  submodule!(pub ssse3);
  #[cfg(target_feature = "sse4.1")]
  submodule!(pub sse4_1);
  #[cfg(target_feature = "sse4.2")]
  submodule!(pub sse4_2);
  #[cfg(target_feature = "avx")]
  submodule!(pub avx);
  #[cfg(target_feature = "avx2")]
  submodule!(pub avx2);

  // These features aren't as easy to remember the progression of and they each
  // only add a small handful of functions.
  #[cfg(target_feature = "adx")]
  submodule!(pub adx);
  #[cfg(target_feature = "aes")]
  submodule!(pub aes);
  #[cfg(target_feature = "bmi1")]
  submodule!(pub bmi1);
  #[cfg(target_feature = "bmi2")]
  submodule!(pub bmi2);
  #[cfg(target_feature = "fma")]
  submodule!(pub fma);
  #[cfg(target_feature = "lzcnt")]
  submodule!(pub lzcnt);
  #[cfg(target_feature = "pclmulqdq")]
  submodule!(pub pclmulqdq);
  #[cfg(target_feature = "popcnt")]
  submodule!(pub popcnt);
  #[cfg(target_feature = "rdrand")]
  submodule!(pub rdrand);
  #[cfg(target_feature = "rdseed")]
  submodule!(pub rdseed);

  /// Reads the CPU's timestamp counter value.
  ///
  /// This is a monotonically increasing time-stamp that goes up every clock
  /// cycle of the CPU. However, since modern CPUs are variable clock rate
  /// depending on demand this can't actually be used for telling the time. It
  /// also does _not_ fully serialize all operations, so previous instructions
  /// might still be in progress when this reads the timestamp.
  ///
  /// * **Intrinsic:** `_rdtsc`
  /// * **Assembly:** `rdtsc`
  pub fn read_timestamp_counter() -> u64 {
    // Note(Lokathor): This was changed from i64 to u64 at some point, but
    // everyone ever was already casting this value to `u64` so crater didn't
    // even consider it a problem. We will follow suit.
    #[allow(clippy::unnecessary_cast)]
    unsafe { _rdtsc() as u64 }
  }

  /// Reads the CPU's timestamp counter value and stores the processor signature.
  ///
  /// This works similarly to [`read_timestamp_counter`] with two main
  /// differences:
  /// * It also stores the `IA32_TSC_AUX MSR` value to the reference given.
  /// * It waits on all previous instructions to finish before reading the
  ///   timestamp (though it doesn't prevent other instructions from starting).
  ///
  /// As with `read_timestamp_counter`, you can't actually use this to tell the
  /// time.
  ///
  /// * **Intrinsic:** `__rdtscp`
  /// * **Assembly:** `rdtscp`
  pub fn read_timestamp_counter_p(aux: &mut u32) -> u64 {
    unsafe { __rdtscp(aux) }
  }

  /// Swap the bytes of the given 32-bit value.
  ///
  /// ```
  /// # use safe_arch::*;
  /// assert_eq!(byte_swap_i32(0x0A123456), 0x5634120A);
  /// ```
  /// * **Intrinsic:** `_bswap`
  /// * **Assembly:** `bswap r32`
  pub fn byte_swap_i32(i: i32) -> i32 {
    unsafe { _bswap(i) }
  }

  /// Swap the bytes of the given 64-bit value.
  ///
  /// ```
  /// # use safe_arch::*;
  /// assert_eq!(byte_swap_i64(0x0A123456_789ABC01), 0x01BC9A78_5634120A);
  /// ```
  /// * **Intrinsic:** `_bswap64`
  /// * **Assembly:** `bswap r64`
  #[cfg(target_arch = "x86_64")]
  pub fn byte_swap_i64(i: i64) -> i64 {
    unsafe { _bswap64(i) }
  }
});