A crate that safely exposes arch intrinsics via #[cfg()].
safe_arch lets you safely use CPU intrinsics, the functions found in the
core::arch modules. It works purely via #[cfg()] and
compile time CPU feature declaration. If you want to check for a feature at
runtime and then call an intrinsic or use a fallback path based on that then
this crate is sadly not for you.
SIMD register types are “newtype’d” so that better trait impls can be given
to them, but the inner value is a pub field so feel free to just grab it
out if you need to. Trait impls of the newtypes include: Default (zeroed),
From/Into of appropriate data types, and appropriate operator
overloading.
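As a rough illustration of the newtype idea, here is a minimal sketch built directly on core::arch. The module name `newtype_sketch` and the type `M128Sketch` are hypothetical names invented for this example, not the crate's own source; the sketch only shows how a pub inner field, a zeroed Default, array conversions, and an operator overload can sit on top of the raw `__m128` type.

```rust
// Hypothetical sketch of a SIMD newtype; not the crate's actual source.
#[cfg(target_arch = "x86_64")]
#[allow(unused_unsafe)] // some of these intrinsics are safe fns on newer toolchains
pub mod newtype_sketch {
    use core::arch::x86_64::{__m128, _mm_add_ps, _mm_loadu_ps, _mm_setzero_ps, _mm_storeu_ps};
    use core::ops::Add;

    /// Wraps the raw register type; the inner value stays a `pub` field.
    #[derive(Clone, Copy)]
    #[repr(transparent)]
    pub struct M128Sketch(pub __m128);

    impl Default for M128Sketch {
        fn default() -> Self {
            // SAFETY: `sse` is part of the x86_64 baseline.
            Self(unsafe { _mm_setzero_ps() })
        }
    }

    impl From<[f32; 4]> for M128Sketch {
        fn from(arr: [f32; 4]) -> Self {
            // SAFETY: `arr` holds four readable f32 values; the load is unaligned.
            Self(unsafe { _mm_loadu_ps(arr.as_ptr()) })
        }
    }

    impl From<M128Sketch> for [f32; 4] {
        fn from(m: M128Sketch) -> Self {
            let mut out = [0.0_f32; 4];
            // SAFETY: `out` holds four writable f32 values; the store is unaligned.
            unsafe { _mm_storeu_ps(out.as_mut_ptr(), m.0) };
            out
        }
    }

    impl Add for M128Sketch {
        type Output = Self;
        fn add(self, rhs: Self) -> Self {
            // SAFETY: `sse` is part of the x86_64 baseline.
            Self(unsafe { _mm_add_ps(self.0, rhs.0) })
        }
    }
}
```

With a wrapper like this, `M128Sketch::from([1.0, 2.0, 3.0, 4.0]) + M128Sketch::default()` reads like ordinary arithmetic, and `.0` still hands back the raw register value when it is needed.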
- Most intrinsics (like addition and multiplication) are totally safe to use as long as the CPU feature is available. In this case, what you get is 1:1 with the actual intrinsic.
- Some intrinsics take a pointer of an assumed minimum alignment and validity span. For these, the safe_arch function takes a reference of an appropriate type to uphold safety.
  - Try the bytemuck crate (and turn on the bytemuck feature of this crate) if you want help safely casting between reference types.
- Some intrinsics are not safe unless you’re very careful about how you use them, such as the streaming operations requiring you to use them in combination with an appropriate memory fence. Those operations aren’t exposed here.
- Some intrinsics mess with the processor state, such as changing the floating point flags, saving and loading special register state, and so on. LLVM doesn’t really support you messing with that within a high level language, so those operations aren’t exposed here. Use assembly or something if you want to do that.
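To make the "reference instead of pointer" point concrete, here is a small sketch written against the raw core::arch intrinsics. The function name `add_four_f32` is made up for this example and is not part of safe_arch. It assumes the unaligned load and store intrinsics, so a plain `&[f32; 4]` is enough to prove that 16 valid bytes sit behind each pointer. The second definition is a scalar stand-in so the sketch still builds on targets without sse.

```rust
/// Adds two groups of four `f32` values (hypothetical example function).
/// This version exists only when `sse` is enabled at compile time.
#[cfg(all(target_arch = "x86_64", target_feature = "sse"))]
pub fn add_four_f32(a: &[f32; 4], b: &[f32; 4]) -> [f32; 4] {
    use core::arch::x86_64::{_mm_add_ps, _mm_loadu_ps, _mm_storeu_ps};
    let mut out = [0.0_f32; 4];
    // SAFETY: `sse` is enabled for this build (see the cfg above), and the
    // array references guarantee four valid f32 values behind each pointer.
    // The unaligned load/store forms carry no extra alignment requirement.
    unsafe {
        let sum = _mm_add_ps(_mm_loadu_ps(a.as_ptr()), _mm_loadu_ps(b.as_ptr()));
        _mm_storeu_ps(out.as_mut_ptr(), sum);
    }
    out
}

/// Scalar stand-in used when the cfg above does not hold.
#[cfg(not(all(target_arch = "x86_64", target_feature = "sse")))]
pub fn add_four_f32(a: &[f32; 4], b: &[f32; 4]) -> [f32; 4] {
    [a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]]
}
```

The caller never writes `unsafe`: the reference type carries the validity guarantee, and the cfg attribute carries the CPU feature guarantee.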
§Naming Conventions
The safe_arch crate does not simply use the “official” names for each
intrinsic, because the official names are generally poor. Instead, the
operations have been given better names that hopefully make things easier
to understand when you’re reading the code.
For a full explanation of the naming used, see the Naming Conventions page.
§Current Support
- x86 / x86_64 (Intel, AMD, etc)
  - 128-bit: sse, sse2, sse3, ssse3, sse4.1, sse4.2
  - 256-bit: avx, avx2
  - Other: adx, aes, bmi1, bmi2, fma, lzcnt, pclmulqdq, popcnt, rdrand, rdseed
§Compile Time CPU Target Features
At the time of me writing this, Rust enables the sse and sse2 CPU
features by default for all i686 (x86) and x86_64 builds. Those CPU
features are built into the design of x86_64, and you’d need a super old
x86 CPU for it to not support at least sse and sse2, so they’re a safe
bet for the language to enable all the time. In fact, because the standard
library is compiled with them enabled, simply trying to disable those
features would actually cause ABI issues and fill your program with UB
(link).
If you want additional CPU features available at compile time you’ll have to
enable them with an additional arg to rustc. For a feature named name
you pass -C target-feature=+name, such as -C target-feature=+sse3 for
sse3.
You can alternately enable all target features of the current CPU with -C target-cpu=native. This is primarily of use if you’re building a program
you’ll only run on your own system.
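One way to see what a given set of rustc flags actually enabled is to ask cfg! inside the program. The helper below is a hypothetical sketch (the name `compile_time_features` is invented here); it reports only a handful of the x86 features listed earlier, and its result is fixed at compile time rather than probed from the running CPU.

```rust
/// Lists which of a few x86 target features were enabled when this code was
/// compiled. Hypothetical helper; the answer never changes at runtime.
pub fn compile_time_features() -> Vec<&'static str> {
    let mut enabled = Vec::new();
    // `cfg!` needs a literal feature name, so each feature gets its own check.
    if cfg!(target_feature = "sse2") {
        enabled.push("sse2");
    }
    if cfg!(target_feature = "sse3") {
        enabled.push("sse3");
    }
    if cfg!(target_feature = "ssse3") {
        enabled.push("ssse3");
    }
    if cfg!(target_feature = "sse4.1") {
        enabled.push("sse4.1");
    }
    if cfg!(target_feature = "sse4.2") {
        enabled.push("sse4.2");
    }
    if cfg!(target_feature = "avx") {
        enabled.push("avx");
    }
    if cfg!(target_feature = "avx2") {
        enabled.push("avx2");
    }
    enabled
}
```

Building once with default flags and once with `-C target-feature=+sse3` should make `sse3` appear in the second build's list only.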
It’s sometimes hard to know if your target platform will support a given
feature set, but the Steam Hardware Survey is generally
taken as a guide to what you can expect people to have available. If you
click “Other Settings” it’ll expand into a list of CPU target features and
how common they are. These days, it seems that sse3 can be safely assumed,
and ssse3, sse4.1, and sse4.2 are pretty safe bets as well. The stuff
above 128-bit isn’t as common yet; give it another few years.
Please note that executing a program on a CPU that doesn’t support the target features it was compiled for is Undefined Behavior.
Currently, Rust doesn’t actually support an easy way for you to check that a
feature enabled at compile time is actually available at runtime. There is
the “feature_detected” family of macros, but if you
enable a feature they will evaluate to a constant true instead of actually
deferring the check for the feature to runtime. This means that, if you
did want a check at the start of your program, to confirm that all the
assumed features are present and error out when the assumptions don’t hold,
you can’t use that macro. You gotta use CPUID and check manually. rip.
Hopefully we can make that process easier in a future version of this crate.
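For the curious, a manual check might look roughly like the sketch below. The names `missing_features` and `check_cpu` are hypothetical and not part of this crate. The bit positions are the CPUID leaf 1 ECX bits for sse3 (0), ssse3 (9), sse4.1 (19), and sse4.2 (20). The sketch deliberately stops at the 128-bit features, since avx and above also need the operating system's cooperation to be checked.

```rust
/// (feature name, bit index in ECX of CPUID leaf 1)
const LEAF1_ECX_FEATURES: [(&str, u32); 4] =
    [("sse3", 0), ("ssse3", 9), ("sse4.1", 19), ("sse4.2", 20)];

/// Pure helper: returns the entries of `wanted` whose bit is clear in `ecx`.
/// Names that are not in the table above are ignored.
pub fn missing_features(ecx: u32, wanted: &[&'static str]) -> Vec<&'static str> {
    LEAF1_ECX_FEATURES
        .iter()
        .filter(|&&(name, bit)| wanted.contains(&name) && (ecx >> bit) & 1 == 0)
        .map(|&(name, _)| name)
        .collect()
}

/// Queries the running CPU and reports any wanted feature it lacks.
#[cfg(target_arch = "x86_64")]
pub fn check_cpu(wanted: &[&'static str]) -> Result<(), Vec<&'static str>> {
    // SAFETY: the CPUID instruction is available on every x86_64 CPU.
    #[allow(unused_unsafe)] // `__cpuid` is a safe fn on newer toolchains
    let leaf1 = unsafe { core::arch::x86_64::__cpuid(1) };
    let missing = missing_features(leaf1.ecx, wanted);
    if missing.is_empty() {
        Ok(())
    } else {
        Err(missing)
    }
}
```

A program built with `+sse4.2` could call `check_cpu(&["sse3", "ssse3", "sse4.1", "sse4.2"])` first thing in `main` and exit with a readable message on `Err`.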
§A Note On Working With Cfg
There are two main ways to use cfg:
- Via an attribute placed on an item, block, or expression:
#[cfg(debug_assertions)] println!("hello");
- Via a macro used within an expression position:
if cfg!(debug_assertions) { println!("hello"); }
The difference might seem small but it’s actually very important:
- The attribute form decides whether to include the code before checking that the items it names really exist. This means that code that is configured via attribute can safely name things that don’t always exist as long as the things they name do exist whenever that code is configured into the build.
- The macro form will include the configured code no matter what, and then the macro resolves to a constant true or false and the compiler uses dead code elimination to cut out the path not taken.
This crate uses cfg via the attribute, so the functions it exposes don’t
exist at all when the appropriate CPU target features aren’t enabled.
Accordingly, if you plan to call this crate or not depending on what
features are enabled in the build you’ll also need to control your use of
this crate via cfg attribute, not cfg macro.
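A small sketch of the difference, using only the standard cfg machinery. All function names here are hypothetical; `wide_path_name` stands in for any function that exists only when avx2 is enabled at compile time.

```rust
// Exists only in builds that enable `avx2` (stand-in for a feature-gated fn).
#[cfg(target_feature = "avx2")]
fn wide_path_name() -> &'static str {
    "avx2"
}

// Attribute form: the unused definition is removed before names are
// resolved, so this one may mention `wide_path_name` freely.
#[cfg(target_feature = "avx2")]
pub fn chosen_path() -> &'static str {
    wide_path_name()
}

#[cfg(not(target_feature = "avx2"))]
pub fn chosen_path() -> &'static str {
    "fallback"
}

// Macro form: both branches are always compiled, so neither branch may name
// an item that can be configured out of the build.
pub fn chosen_path_via_macro() -> &'static str {
    if cfg!(target_feature = "avx2") {
        // Calling `wide_path_name()` here would fail to compile in builds
        // without avx2, because the function does not exist there.
        "avx2"
    } else {
        "fallback"
    }
}
```

Both functions return the same answer in any given build; only the attribute form could have called into a feature-gated function to get it.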
Modules§
- naming_conventions - An explanation of the crate’s naming conventions.
Macros§
- round_op - Turns a round operator token to the correct constant value.
Structs§
- m128 - The data for a 128-bit SSE register of four f32 lanes.
- m256 - The data for a 256-bit AVX register of eight f32 lanes.
- m128d - The data for a 128-bit SSE register of two f64 values.
- m128i - The data for a 128-bit SSE register of integer data.
- m256d - The data for a 256-bit AVX register of four f64 values.
- m256i - The data for a 256-bit AVX register of integer data.
Functions§
- add_i8_m128i - Lanewise a + b with lanes as i8.
- add_i16_m128i - Lanewise a + b with lanes as i16.
- add_i32_m128i - Lanewise a + b with lanes as i32.
- add_i64_m128i - Lanewise a + b with lanes as i64.
- add_m128 - Lanewise a + b.
- add_m128_s - Low lane a + b, other lanes unchanged.
- add_m128d - Lanewise a + b.
- add_m128d_s - Lowest lane a + b, high lane unchanged.
- add_saturating_i8_m128i - Lanewise saturating a + b with lanes as i8.
- add_saturating_i16_m128i - Lanewise saturating a + b with lanes as i16.
- add_saturating_u8_m128i - Lanewise saturating a + b with lanes as u8.
- add_saturating_u16_m128i - Lanewise saturating a + b with lanes as u16.
- average_u8_m128i - Lanewise average of the u8 values.
- average_u16_m128i - Lanewise average of the u16 values.
- bitand_m128 - Bitwise a & b.
- bitand_m128d - Bitwise a & b.
- bitand_m128i - Bitwise a & b.
- bitandnot_m128 - Bitwise (!a) & b.
- bitandnot_m128d - Bitwise (!a) & b.
- bitandnot_m128i - Bitwise (!a) & b.
- bitor_m128 - Bitwise a | b.
- bitor_m128d - Bitwise a | b.
- bitor_m128i - Bitwise a | b.
- bitxor_m128 - Bitwise a ^ b.
- bitxor_m128d - Bitwise a ^ b.
- bitxor_m128i - Bitwise a ^ b.
- byte_shl_imm_u128_m128i - Shifts all bits in the entire register left by a number of bytes.
- byte_shr_imm_u128_m128i - Shifts all bits in the entire register right by a number of bytes.
- byte_swap_i32 - Swap the bytes of the given 32-bit value.
- byte_swap_i64 - Swap the bytes of the given 64-bit value.
- cast_to_m128_from_m128d - Bit-preserving cast to m128 from m128d.
- cast_to_m128_from_m128i - Bit-preserving cast to m128 from m128i.
- cast_to_m128d_from_m128 - Bit-preserving cast to m128d from m128.
- cast_to_m128d_from_m128i - Bit-preserving cast to m128d from m128i.
- cast_to_m128i_from_m128 - Bit-preserving cast to m128i from m128.
- cast_to_m128i_from_m128d - Bit-preserving cast to m128i from m128d.
- cmp_eq_i32_m128_s - Low lane equality.
- cmp_eq_i32_m128d_s - Low lane f64 equal to.
- cmp_eq_mask_i8_m128i - Lanewise a == b with lanes as i8.
- cmp_eq_mask_i16_m128i - Lanewise a == b with lanes as i16.
- cmp_eq_mask_i32_m128i - Lanewise a == b with lanes as i32.
- cmp_eq_mask_m128 - Lanewise a == b.
- cmp_eq_mask_m128_s - Low lane a == b, other lanes unchanged.
- cmp_eq_mask_m128d - Lanewise a == b, mask output.
- cmp_eq_mask_m128d_s - Low lane a == b, other lanes unchanged.
- cmp_ge_i32_m128_s - Low lane greater than or equal to.
- cmp_ge_i32_m128d_s - Low lane f64 greater than or equal to.
- cmp_ge_mask_m128 - Lanewise a >= b.
- cmp_ge_mask_m128_s - Low lane a >= b, other lanes unchanged.
- cmp_ge_mask_m128d - Lanewise a >= b.
- cmp_ge_mask_m128d_s - Low lane a >= b, other lanes unchanged.
- cmp_gt_i32_m128_s - Low lane greater than.
- cmp_gt_i32_m128d_s - Low lane f64 greater than.
- cmp_gt_mask_i8_m128i - Lanewise a > b with lanes as i8.
- cmp_gt_mask_i16_m128i - Lanewise a > b with lanes as i16.
- cmp_gt_mask_i32_m128i - Lanewise a > b with lanes as i32.
- cmp_gt_mask_m128 - Lanewise a > b.
- cmp_gt_mask_m128_s - Low lane a > b, other lanes unchanged.
- cmp_gt_mask_m128d - Lanewise a > b.
- cmp_gt_mask_m128d_s - Low lane a > b, other lanes unchanged.
- cmp_le_i32_m128_s - Low lane less than or equal to.
- cmp_le_i32_m128d_s - Low lane f64 less than or equal to.
- cmp_le_mask_m128 - Lanewise a <= b.
- cmp_le_mask_m128_s - Low lane a <= b, other lanes unchanged.
- cmp_le_mask_m128d - Lanewise a <= b.
- cmp_le_mask_m128d_s - Low lane a <= b, other lanes unchanged.
- cmp_lt_i32_m128_s - Low lane less than.
- cmp_lt_i32_m128d_s - Low lane f64 less than.
- cmp_lt_mask_i8_m128i - Lanewise a < b with lanes as i8.
- cmp_lt_mask_i16_m128i - Lanewise a < b with lanes as i16.
- cmp_lt_mask_i32_m128i - Lanewise a < b with lanes as i32.
- cmp_lt_mask_m128 - Lanewise a < b.
- cmp_lt_mask_m128_s - Low lane a < b, other lanes unchanged.
- cmp_lt_mask_m128d - Lanewise a < b.
- cmp_lt_mask_m128d_s - Low lane a < b, other lane unchanged.
- cmp_neq_i32_m128_s - Low lane not equal to.
- cmp_neq_i32_m128d_s - Low lane f64 not equal to.
- cmp_neq_mask_m128 - Lanewise a != b.
- cmp_neq_mask_m128_s - Low lane a != b, other lanes unchanged.
- cmp_neq_mask_m128d - Lanewise a != b.
- cmp_neq_mask_m128d_s - Low lane a != b, other lane unchanged.
- cmp_nge_mask_m128 - Lanewise !(a >= b).
- cmp_nge_mask_m128_s - Low lane !(a >= b), other lanes unchanged.
- cmp_nge_mask_m128d - Lanewise !(a >= b).
- cmp_nge_mask_m128d_s - Low lane !(a >= b), other lane unchanged.
- cmp_ngt_mask_m128 - Lanewise !(a > b).
- cmp_ngt_mask_m128_s - Low lane !(a > b), other lanes unchanged.
- cmp_ngt_mask_m128d - Lanewise !(a > b).
- cmp_ngt_mask_m128d_s - Low lane !(a > b), other lane unchanged.
- cmp_nle_mask_m128 - Lanewise !(a <= b).
- cmp_nle_mask_m128_s - Low lane !(a <= b), other lanes unchanged.
- cmp_nle_mask_m128d - Lanewise !(a <= b).
- cmp_nle_mask_m128d_s - Low lane !(a <= b), other lane unchanged.
- cmp_nlt_mask_m128 - Lanewise !(a < b).
- cmp_nlt_mask_m128_s - Low lane !(a < b), other lanes unchanged.
- cmp_nlt_mask_m128d - Lanewise !(a < b).
- cmp_nlt_mask_m128d_s - Low lane !(a < b), other lane unchanged.
- cmp_ordered_mask_m128 - Lanewise (!a.is_nan()) & (!b.is_nan()).
- cmp_ordered_mask_m128_s - Low lane (!a.is_nan()) & (!b.is_nan()), other lanes unchanged.
- cmp_ordered_mask_m128d - Lanewise (!a.is_nan()) & (!b.is_nan()).
- cmp_ordered_mask_m128d_s - Low lane (!a.is_nan()) & (!b.is_nan()), other lane unchanged.
- cmp_unord_mask_m128 - Lanewise a.is_nan() | b.is_nan().
- cmp_unord_mask_m128_s - Low lane a.is_nan() | b.is_nan(), other lanes unchanged.
- cmp_unord_mask_m128d - Lanewise a.is_nan() | b.is_nan().
- cmp_unord_mask_m128d_s - Low lane a.is_nan() | b.is_nan(), other lane unchanged.
- convert_i32_replace_m128_s - Convert i32 to f32 and replace the low lane of the input.
- convert_i32_replace_m128d_s - Convert i32 to f64 and replace the low lane of the input.
- convert_i64_replace_m128_s - Convert i64 to f32 and replace the low lane of the input.
- convert_i64_replace_m128d_s - Convert i64 to f64 and replace the low lane of the input.
- convert_m128_s_replace_m128d_s - Converts the lower f32 to f64 and replaces the low lane of the input.
- convert_m128d_s_replace_m128_s - Converts the low f64 to f32 and replaces the low lane of the input.
- convert_to_i32_m128i_from_m128 - Rounds the f32 lanes to i32 lanes.
- convert_to_i32_m128i_from_m128d - Rounds the two f64 lanes to the low two i32 lanes.
- convert_to_m128_from_i32_m128i - Rounds the four i32 lanes to four f32 lanes.
- convert_to_m128_from_m128d - Rounds the two f64 lanes to the low two f32 lanes.
- convert_to_m128d_from_lower2_i32_m128i - Rounds the lower two i32 lanes to two f64 lanes.
- convert_to_m128d_from_lower2_m128 - Rounds the lower two f32 lanes to two f64 lanes.
- copy_i64_m128i_s - Copy the low i64 lane to a new register, upper bits 0.
- copy_replace_low_f64_m128d - Copies the a value and replaces the low lane with the low b value.
- div_m128 - Lanewise a / b.
- div_m128_s - Low lane a / b, other lanes unchanged.
- div_m128d - Lanewise a / b.
- div_m128d_s - Lowest lane a / b, high lane unchanged.
- extract_i16_as_i32_m128i - Gets an i16 value out of an m128i, returns as i32.
- get_f32_from_m128_s - Gets the low lane as an individual f32 value.
- get_f64_from_m128d_s - Gets the lower lane as an f64 value.
- get_i32_from_m128_s - Converts the low lane to i32 and extracts as an individual value.
- get_i32_from_m128d_s - Converts the lower lane to an i32 value.
- get_i32_from_m128i_s - Converts the lower lane to an i32 value.
- get_i64_from_m128_s - Converts the low lane to i64 and extracts as an individual value.
- get_i64_from_m128d_s - Converts the lower lane to an i64 value.
- get_i64_from_m128i_s - Converts the lower lane to an i64 value.
- insert_i16_from_i32_m128i - Inserts the low 16 bits of an i32 value into an m128i.
- load_f32_m128_s - Loads the f32 reference into the low lane of the register.
- load_f32_splat_m128 - Loads the f32 reference into all lanes of a register.
- load_f64_m128d_s - Loads the reference into the low lane of the register.
- load_i64_m128i_s - Loads the low i64 into a register.
- load_m128 - Loads the reference into a register.
- load_m128d - Loads the reference into a register.
- load_m128i - Loads the reference into a register.
- load_replace_high_m128d - Loads the reference into a register, replacing the high lane.
- load_replace_low_m128d - Loads the reference into a register, replacing the low lane.
- load_reverse_m128 - Loads the reference into a register with reversed order.
- load_reverse_m128d - Loads the reference into a register with reversed order.
- load_unaligned_m128 - Loads the reference into a register.
- load_unaligned_m128d - Loads the reference into a register.
- load_unaligned_m128i - Loads the reference into a register.
- max_i16_m128i - Lanewise max(a, b) with lanes as i16.
- max_m128 - Lanewise max(a, b).
- max_m128_s - Low lane max(a, b), other lanes unchanged.
- max_m128d - Lanewise max(a, b).
- max_m128d_s - Low lane max(a, b), other lanes unchanged.
- max_u8_m128i - Lanewise max(a, b) with lanes as u8.
- min_i16_m128i - Lanewise min(a, b) with lanes as i16.
- min_m128 - Lanewise min(a, b).
- min_m128_s - Low lane min(a, b), other lanes unchanged.
- min_m128d - Lanewise min(a, b).
- min_m128d_s - Low lane min(a, b), other lanes unchanged.
- min_u8_m128i - Lanewise min(a, b) with lanes as u8.
- move_high_low_m128 - Move the high lanes of b to the low lanes of a, other lanes unchanged.
- move_low_high_m128 - Move the low lanes of b to the high lanes of a, other lanes unchanged.
- move_m128_s - Move the low lane of b to a, other lanes unchanged.
- move_mask_i8_m128i - Gathers the i8 sign bit of each lane.
- move_mask_m128 - Gathers the sign bit of each lane.
- move_mask_m128d - Gathers the sign bit of each lane.
- mul_i16_horizontal_add_m128i - Multiply i16 lanes producing i32 values, horizontal add pairs of i32 values to produce the final output.
- mul_i16_keep_high_m128i - Lanewise a * b with lanes as i16, keep the high bits of the i32 intermediates.
- mul_i16_keep_low_m128i - Lanewise a * b with lanes as i16, keep the low bits of the i32 intermediates.
- mul_m128 - Lanewise a * b.
- mul_m128_s - Low lane a * b, other lanes unchanged.
- mul_m128d - Lanewise a * b.
- mul_m128d_s - Lowest lane a * b, high lane unchanged.
- mul_u16_keep_high_m128i - Lanewise a * b with lanes as u16, keep the high bits of the u32 intermediates.
- mul_widen_u32_odd_m128i - Multiplies the odd u32 lanes and gives the widened (u64) results.
- pack_i16_to_i8_m128i - Saturating convert i16 to i8, and pack the values.
- pack_i16_to_u8_m128i - Saturating convert i16 to u8, and pack the values.
- pack_i32_to_i16_m128i - Saturating convert i32 to i16, and pack the values.
- prefetch_et0 - Fetches the cache line containing addr into all levels of the cache hierarchy, anticipating write.
- prefetch_et1 - Fetches into L2 and higher, anticipating write.
- prefetch_nta - Fetch data using the non-temporal access (NTA) hint. It may be a place closer than main memory but outside of the cache hierarchy. This is used to reduce access latency without polluting the cache.
- prefetch_t0 - Fetches the cache line containing addr into all levels of the cache hierarchy.
- prefetch_t1 - Fetches into L2 and higher.
- prefetch_t2 - Fetches into L3 and higher or an implementation-specific choice (e.g., L2 if there is no L3).
- read_timestamp_counter - Reads the CPU’s timestamp counter value.
- read_timestamp_counter_p - Reads the CPU’s timestamp counter value and stores the processor signature.
- reciprocal_m128 - Lanewise 1.0 / a approximation.
- reciprocal_m128_s - Low lane 1.0 / a approximation, other lanes unchanged.
- reciprocal_sqrt_m128 - Lanewise 1.0 / sqrt(a) approximation.
- reciprocal_sqrt_m128_s - Low lane 1.0 / sqrt(a) approximation, other lanes unchanged.
- set_i8_m128i - Sets the args into an m128i, first arg is the high lane.
- set_i16_m128i - Sets the args into an m128i, first arg is the high lane.
- set_i32_m128i - Sets the args into an m128i, first arg is the high lane.
- set_i32_m128i_s - Set an i32 as the low 32-bit lane of an m128i, other lanes blank.
- set_i64_m128i - Sets the args into an m128i, first arg is the high lane.
- set_i64_m128i_s - Set an i64 as the low 64-bit lane of an m128i, other lanes blank.
- set_m128 - Sets the args into an m128, first arg is the high lane.
- set_m128_s - Sets the args into an m128, first arg is the high lane.
- set_m128d - Sets the args into an m128d, first arg is the high lane.
- set_m128d_s - Sets the args into the low lane of a m128d.
- set_reversed_i8_m128i - Sets the args into an m128i, first arg is the low lane.
- set_reversed_i16_m128i - Sets the args into an m128i, first arg is the low lane.
- set_reversed_i32_m128i - Sets the args into an m128i, first arg is the low lane.
- set_reversed_m128 - Sets the args into an m128, first arg is the low lane.
- set_reversed_m128d - Sets the args into an m128d, first arg is the low lane.
- set_splat_i8_m128i - Splats the i8 to all lanes of the m128i.
- set_splat_i16_m128i - Splats the i16 to all lanes of the m128i.
- set_splat_i32_m128i - Splats the i32 to all lanes of the m128i.
- set_splat_i64_m128i - Splats the i64 to both lanes of the m128i.
- set_splat_m128 - Splats the value to all lanes.
- set_splat_m128d - Splats the args into both lanes of the m128d.
- shl_all_u16_m128i - Shift all u16 lanes to the left by the count in the lower u64 lane.
- shl_all_u32_m128i - Shift all u32 lanes to the left by the count in the lower u64 lane.
- shl_all_u64_m128i - Shift all u64 lanes to the left by the count in the lower u64 lane.
- shl_imm_u16_m128i - Shifts all u16 lanes left by an immediate.
- shl_imm_u32_m128i - Shifts all u32 lanes left by an immediate.
- shl_imm_u64_m128i - Shifts both u64 lanes left by an immediate.
- shr_all_i16_m128i - Shift each i16 lane to the right by the count in the lower i64 lane.
- shr_all_i32_m128i - Shift each i32 lane to the right by the count in the lower i64 lane.
- shr_all_u16_m128i - Shift each u16 lane to the right by the count in the lower u64 lane.
- shr_all_u32_m128i - Shift each u32 lane to the right by the count in the lower u64 lane.
- shr_all_u64_m128i - Shift each u64 lane to the right by the count in the lower u64 lane.
- shr_imm_i16_m128i - Shifts all i16 lanes right by an immediate.
- shr_imm_i32_m128i - Shifts all i32 lanes right by an immediate.
- shr_imm_u16_m128i - Shifts all u16 lanes right by an immediate.
- shr_imm_u32_m128i - Shifts all u32 lanes right by an immediate.
- shr_imm_u64_m128i - Shifts both u64 lanes right by an immediate.
- shuffle_abi_f32_all_m128 - Shuffle the f32 lanes from $a and $b together using an immediate control value.
- shuffle_abi_f64_all_m128d - Shuffle the f64 lanes from $a and $b together using an immediate control value.
- shuffle_ai_f32_all_m128i - Shuffle the i32 lanes in $a using an immediate control value.
- shuffle_ai_i16_h64all_m128i - Shuffle the high i16 lanes in $a using an immediate control value.
- shuffle_ai_i16_l64all_m128i - Shuffle the low i16 lanes in $a using an immediate control value.
- sqrt_m128 - Lanewise sqrt(a).
- sqrt_m128_s - Low lane sqrt(a), other lanes unchanged.
- sqrt_m128d - Lanewise sqrt(a).
- sqrt_m128d_s - Low lane sqrt(b), upper lane is unchanged from a.
- store_high_m128d_s - Stores the high lane value to the reference given.
- store_i64_m128i_s - Stores the value to the reference given.
- store_m128 - Stores the value to the reference given.
- store_m128_s - Stores the low lane value to the reference given.
- store_m128d - Stores the value to the reference given.
- store_m128d_s - Stores the low lane value to the reference given.
- store_m128i - Stores the value to the reference given.
- store_reverse_m128 - Stores the value to the reference given in reverse order.
- store_reversed_m128d - Stores the value to the reference given.
- store_splat_m128 - Stores the low lane value to all lanes of the reference given.
- store_splat_m128d - Stores the low lane value to all lanes of the reference given.
- store_unaligned_m128 - Stores the value to the reference given.
- store_unaligned_m128d - Stores the value to the reference given.
- store_unaligned_m128i - Stores the value to the reference given.
- sub_i8_m128i - Lanewise a - b with lanes as i8.
- sub_i16_m128i - Lanewise a - b with lanes as i16.
- sub_i32_m128i - Lanewise a - b with lanes as i32.
- sub_i64_m128i - Lanewise a - b with lanes as i64.
- sub_m128 - Lanewise a - b.
- sub_m128_s - Low lane a - b, other lanes unchanged.
- sub_m128d - Lanewise a - b.
- sub_m128d_s - Lowest lane a - b, high lane unchanged.
- sub_saturating_i8_m128i - Lanewise saturating a - b with lanes as i8.
- sub_saturating_i16_m128i - Lanewise saturating a - b with lanes as i16.
- sub_saturating_u8_m128i - Lanewise saturating a - b with lanes as u8.
- sub_saturating_u16_m128i - Lanewise saturating a - b with lanes as u16.
- sum_of_u8_abs_diff_m128i - Compute “sum of u8 absolute differences”.
- transpose_four_m128 - Transpose four m128 as if they were a 4x4 matrix.
- truncate_m128_to_m128i - Truncate the f32 lanes to i32 lanes.
- truncate_m128d_to_m128i - Truncate the f64 lanes to the lower i32 lanes (upper i32 lanes 0).
- truncate_to_i32_m128d_s - Truncate the lower lane into an i32.
- truncate_to_i64_m128d_s - Truncate the lower lane into an i64.
- unpack_high_i8_m128i - Unpack and interleave high i8 lanes of a and b.
- unpack_high_i16_m128i - Unpack and interleave high i16 lanes of a and b.
- unpack_high_i32_m128i - Unpack and interleave high i32 lanes of a and b.
- unpack_high_i64_m128i - Unpack and interleave high i64 lanes of a and b.
- unpack_high_m128 - Unpack and interleave high lanes of a and b.
- unpack_high_m128d - Unpack and interleave high lanes of a and b.
- unpack_low_i8_m128i - Unpack and interleave low i8 lanes of a and b.
- unpack_low_i16_m128i - Unpack and interleave low i16 lanes of a and b.
- unpack_low_i32_m128i - Unpack and interleave low i32 lanes of a and b.
- unpack_low_i64_m128i - Unpack and interleave low i64 lanes of a and b.
- unpack_low_m128 - Unpack and interleave low lanes of a and b.
- unpack_low_m128d - Unpack and interleave low lanes of a and b.
- zeroed_m128 - All lanes zero.
- zeroed_m128d - Both lanes zero.
- zeroed_m128i - All lanes zero.
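Several of the one-line descriptions above use shorthand such as "lanewise saturating", "(!a) & b", and "gathers the sign bit". The scalar models below are hypothetical helpers written for this page, not crate functions. They spell out what those phrases are assumed to mean for a 128-bit register viewed as sixteen i8 lanes (or as one u128 for the bitwise case), with lane 0 as the lowest lane.

```rust
/// Scalar model of a lanewise saturating add over sixteen `i8` lanes:
/// each lane clamps to the i8 range instead of wrapping.
pub fn model_add_saturating_i8(a: [i8; 16], b: [i8; 16]) -> [i8; 16] {
    let mut out = [0_i8; 16];
    for i in 0..16 {
        out[i] = a[i].saturating_add(b[i]);
    }
    out
}

/// Scalar model of `bitandnot`: the *first* argument is the one inverted.
pub fn model_bitandnot(a: u128, b: u128) -> u128 {
    !a & b
}

/// Scalar model of gathering each `i8` lane's sign bit into an integer,
/// with lane `i` contributing bit `i` of the result.
pub fn model_move_mask_i8(a: [i8; 16]) -> i32 {
    let mut mask = 0_i32;
    for (i, lane) in a.iter().enumerate() {
        if *lane < 0 {
            mask |= 1 << i;
        }
    }
    mask
}
```

Models like these are handy as a reference when writing tests for SIMD code, since they run on any target.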