//! An explanation of the crate's naming conventions.
//!
//! This crate attempts to follow the general naming scheme of `verb_type` when
//! the operation is "simple", and `verb_description_words_type` when the
//! operation (op) needs to be more specific than normal. Like this:
//! * `add_m128`
//! * `add_saturating_i8_m128i`
//!
//! ## Types
//! Currently, only `x86` and `x86_64` types are supported. Among those types:
//! * `m128` and `m256` are always considered to hold `f32` lanes.
//! * `m128d` and `m256d` are always considered to hold `f64` lanes.
//! * `m128i` and `m256i` hold integer data, but each op specifies what lane
//!   width of integers the operation uses.
//! * If the type has `_s` on the end then it's a "scalar" operation that
//!   affects just the lowest lane. The other lanes are generally copied forward
//!   from one of the inputs, though the details there vary from op to op.
//! * The SIMD types are often referred to as "registers" because each SIMD
//!   typed value represents exactly one CPU register when you're doing work.
//!
//! ## Operations
//! There are many operations that can be performed. When possible, `safe_arch`
//! tries to follow normal Rust naming (eg: adding is still `add` and left
//! shifting is still `shl`), but if an operation doesn't normally exist at all
//! in Rust then we basically have to make something up.
//!
//! Many operations have more than one variant, such as `add` and also
//! `add_saturating`. In this case, `safe_arch` puts the "core operation" first
//! and then any "modifiers" go after, which isn't how you might normally say it
//! in English, but it makes the list of functions sort better.
//!
//! As a general note on SIMD terminology: When an operation uses the same
//! indexed lane in two _different_ registers to determine the output, that is a
//! "vertical" operation. When an operation uses more than one lane in the
//! _same_ register to determine the output, that is a "horizontal" operation.
//! * Vertical: `out[0] = a[0] + b[0]`, `out[1] = a[1] + b[1]`
//! * Horizontal: `out[0] = a[0] + a[1]`, `out[1] = b[0] + b[1]`
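//!
//! As a rough illustration of the two terms, here is a minimal sketch that
//! models a two-lane register as a plain `[f64; 2]` array. The helper names
//! `vertical_add` and `horizontal_add` are hypothetical and exist only for
//! this example; they are not functions provided by this crate.
//!
//! ```rust
//! /// Vertical: lane `i` of the output uses lane `i` of both inputs.
//! fn vertical_add(a: [f64; 2], b: [f64; 2]) -> [f64; 2] {
//!   [a[0] + b[0], a[1] + b[1]]
//! }
//!
//! /// Horizontal: each output lane sums the lanes of a single input.
//! fn horizontal_add(a: [f64; 2], b: [f64; 2]) -> [f64; 2] {
//!   [a[0] + a[1], b[0] + b[1]]
//! }
//!
//! fn main() {
//!   let a = [1.0, 2.0];
//!   let b = [10.0, 20.0];
//!   assert_eq!(vertical_add(a, b), [11.0, 22.0]);
//!   assert_eq!(horizontal_add(a, b), [3.0, 30.0]);
//! }
//! ```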
//!
//! ## Operation Glossary
//! Here follows the list of all the main operations and their explanations.
//!
//! * `abs`: Absolute value (wrapping).
//! * `add`: Addition. This is "wrapping" by default, though some other types of
//!   addition are available. Remember that wrapping signed addition is the same
//!   as wrapping unsigned addition.
//! * `average`: Averages the two inputs.
//! * `bitand`: Bitwise And, `a & b`, like [the trait](core::ops::BitAnd).
//! * `bitandnot`: Bitwise `(!a) & b`. This seems a little funny at first but
//!   it's useful for clearing bits. The output will be based on the `b` side's
//!   bit pattern, but with all active bits in `a` cleared:
//!   * `bitandnot(0b0010, 0b1011) == 0b1001`
//! * `bitor`: Bitwise Or, `a | b`, like [the trait](core::ops::BitOr).
//! * `bitxor`: Bitwise eXclusive Or, `a ^ b`, like [the
//!   trait](core::ops::BitXor).
//! * `blend`: Merge the data lanes of two SIMD values by taking either the `b`
//!   value or `a` value for each lane. Depending on the instruction, the blend
//!   mask can be either an immediate or a runtime value.
//! * `cast`: Convert between data types while preserving the exact bit
//!   patterns, like how [`transmute`](core::mem::transmute) works.
//! * `ceil`: "Ceiling", rounds towards positive infinity.
//! * `cmp`: Numeric comparisons of various kinds. This generally gives "mask"
//!   output where the output value is of the same data type as the inputs, but
//!   with all the bits in a "true" lane as 1 and all the bits in a "false" lane
//!   as 0. Remember that with floating point values all 1s bits is a NaN, and
//!   with signed integers all 1s bits is -1.
//!   * An "Ordered comparison" checks if _neither_ floating point value is NaN.
//!   * An "Unordered comparison" checks if _either_ floating point value is
//!     NaN.
//! * `convert`: This does some sort of numeric type change. The details can
//!   vary wildly. Generally, if the number of lanes goes down then the lowest
//!   lanes will be kept. If the number of lanes goes up then the new high lanes
//!   will be zero.
//! * `div`: Division.
//! * `dot_product`: This works like the matrix math operation. The lanes are
//!   multiplied and then the results are summed up into a single value.
//! * `duplicate`: Copy the even or odd indexed lanes to the other set of lanes.
//!   Eg, `[1, 2, 3, 4]` becomes `[1, 1, 3, 3]` or `[2, 2, 4, 4]`.
//! * `extract`: Get a value from the lane of a SIMD type into a scalar type.
//! * `floor`: Rounds towards negative infinity.
//! * `fused`: All the fused operations are a multiply as well as some sort of
//!   adding or subtracting. The details depend on which fused operation you
//!   select. The benefits of a fused operation over a non-fused one are that
//!   it can compute slightly faster than doing the mul and add separately, and
//!   the result can be more accurate because it is only rounded once.
//! * `insert`: The opposite of `extract`, this puts a new value into a
//!   particular lane of a SIMD type.
//! * `load`: Reads an address and makes a SIMD register value. The details can
//!   vary because there's more than one type of `load`, but generally this is a
//!   `&T -> U` style operation.
//! * `max`: Picks the larger value from each of the two inputs.
//! * `min`: Picks the smaller value from each of the two inputs.
//! * `mul`: Multiplication. For floating point this is just "normal"
//!   multiplication, but for integer types you tend to have some options. An
//!   integer multiplication of X bits will produce a 2X bit output, so
//!   generally you'll get to pick if you want to keep the high half of that,
//!   the low half of that (a normal "wrapping" mul), or "widen" the outputs to
//!   be all the bits at the expense of only multiplying half of the lanes.
//! * `pack`: Take the integers in the `a` and `b` inputs, reduce them to fit
//!   within the half-sized integer type (eg: `i16` to `i8`), and pack them all
//!   together into the output.
//! * `population`: The "population" operations refer to the bits within an
//!   integer. Either counting them or adjusting them in various ways.
//! * `rdrand`: Use the hardware RNG to make a random value of the given length.
//! * `rdseed`: Use the hardware RNG to make a random seed of the given length.
//!   This is less commonly available, but theoretically an improvement over
//!   `rdrand` in that if you have to combine more than one usage of this
//!   operation to make your full seed size then the guess difficulty rises at a
//!   multiplicative rate instead of just an additive rate. For example, two
//!   `u64` outputs concatenated to a single `u128` have a guess difficulty of
//!   2^64 * 2^64 = 2^128 with `rdseed` but only 2^64 + 2^64 = 2^65 with
//!   `rdrand`.
//! * `read_timestamp_counter`: Lets you read the CPU's cycle counter, which
//!   doesn't strictly mean anything in particular since the CPU's clock rate
//!   isn't even stable over time, but you might find it interesting as an
//!   approximation during benchmarks, or something like that.
//! * `reciprocal`: Turns `x` into `1/x`. Can also be combined with a `sqrt`
//!   operation.
//! * `round`: Convert floating point values to whole numbers, according to one
//!   of several available methods.
//! * `set`: Places a list of scalar values into the lanes of a SIMD value.
//!   Conceptually similar to how building an array works in Rust.
//! * `splat`: Not generally an operation of its own, but a modifier to other
//!   operations such as `load` and `set`. This will copy a given value across a
//!   SIMD type as many times as it can be copied. For example, a 32-bit value
//!   splatted into a 128-bit register will be copied four times.
//! * `shl`: Bit shift left. New bits shifted in are always 0. Because the shift
//!   is the same for both signed and unsigned values, this crate simply marks
//!   left shift as always being an unsigned operation.
//!   * You can shift by an immediate value ("imm"), all lanes by the same value
//!     ("all"), or each lane by its own value ("each").
//! * `shr`: Bit shift right. This comes in two forms: "Arithmetic" shifts shift
//!   in the starting sign bit (which preserves the sign of the value), and
//!   "Logical" shifts shift in 0 regardless of the starting sign bit (so the
//!   result ends up being positive). With normal Rust types, signed integers
//!   use arithmetic shifts and unsigned integers use logical shifts, so these
//!   functions are marked as being for signed or unsigned integers
//!   appropriately.
//!   * As with `shl`, you can shift by an immediate value ("imm"), all lanes by
//!     the same value ("all"), or each lane by its own value ("each").
//! * `sign_apply`: Multiplies one set of values by the signum (1, 0, or -1) of
//!   another set of values.
//! * `sqrt`: Square Root.
//! * `store`: Writes a SIMD value to a memory location.
//! * `string_search`: A rather specialized instruction that lets you do byte
//!   based searching within a register. This lets you do some very high speed
//!   searching through ASCII strings when the stars align.
//! * `sub`: Subtract.
//! * `shuffle`: This lets you re-order the data lanes. Sometimes x86/x64 calls
//!   this "shuffle", and sometimes it's called "permute", and there's no
//!   particular reasoning behind the different names, so we just call them all
//!   shuffle.
//!   * `shuffle_{args}_{lane-type}_{lane-sources}_{simd-type}`.
//!   * "args" is the input arguments: `a` (one arg) or `ab` (two args), then
//!     either `v` (runtime-varying) or `i` (immediate). All the immediate
//!     shuffles are macros, of course.
//!   * "lane type" is `f32`, `f64`, `i8`, etc. If there's a `z` after the type
//!     then you'll also be able to zero an output position instead of making it
//!     come from a particular source lane.
//!   * "lane sources" is generally either "all" which means that all lanes can
//!     go to all other lanes, or "half" which means that each half of the lanes
//!     is isolated from the other half, and you can't cross data between the
//!     two halves, only within a half (this is how most of the 256-bit x86/x64
//!     shuffles work).
//! * `unpack`: Takes a SIMD value and gets out some of the lanes while widening
//!   them, such as converting `i16` to `i32`.
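//!
//! To make a few of the glossary entries more concrete, here is a minimal
//! sketch of what happens within a single lane, written with ordinary scalar
//! Rust. The helper names (`bitandnot`, `cmp_gt_mask`, `blend_with_mask`,
//! `mul_i16_parts`) are hypothetical stand-ins for this example only, not the
//! crate's actual functions, and the blend direction (mask bits select `b`) is
//! an assumption made for illustration.
//!
//! ```rust
//! /// `bitandnot`: keep the bits of `b`, clearing every bit that is set in `a`.
//! fn bitandnot(a: u8, b: u8) -> u8 {
//!   !a & b
//! }
//!
//! /// `cmp` style mask for one `i32` lane: all 1s if true, all 0s if false.
//! fn cmp_gt_mask(a: i32, b: i32) -> i32 {
//!   if a > b { -1 } else { 0 }
//! }
//!
//! /// Mask-driven `blend` for one lane: `b` where mask bits are set, else `a`.
//! fn blend_with_mask(a: i32, b: i32, mask: i32) -> i32 {
//!   (b & mask) | (a & !mask)
//! }
//!
//! /// `mul` of two `i16` lanes: (widened product, low half, high half).
//! fn mul_i16_parts(a: i16, b: i16) -> (i32, i16, i16) {
//!   let wide = i32::from(a) * i32::from(b);
//!   (wide, wide as i16, (wide >> 16) as i16)
//! }
//!
//! fn main() {
//!   // Same values as the `bitandnot` glossary entry.
//!   assert_eq!(bitandnot(0b0010, 0b1011), 0b1001);
//!
//!   // A "true" lane is all 1s, which reads as -1 for signed integers.
//!   assert_eq!(cmp_gt_mask(5, 3), -1);
//!   assert_eq!(cmp_gt_mask(3, 5), 0);
//!
//!   // A comparison mask fed into a blend picks the larger lane.
//!   assert_eq!(blend_with_mask(3, 5, cmp_gt_mask(5, 3)), 5);
//!
//!   // `add` wraps by default; `add_saturating` clamps at the type's limit.
//!   assert_eq!(120_i8.wrapping_add(10), -126);
//!   assert_eq!(120_i8.saturating_add(10), 127);
//!
//!   // `shr`: arithmetic shifts keep the sign, logical shifts bring in 0s.
//!   assert_eq!(-8_i32 >> 1, -4);
//!   assert_eq!((-8_i32 as u32) >> 1, 0x7FFF_FFFC);
//!
//!   // `mul`: the low half matches a normal wrapping multiplication.
//!   assert_eq!(mul_i16_parts(300, 200), (60_000, -5536, 0));
//!   assert_eq!(300_i16.wrapping_mul(200), -5536);
//! }
//! ```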