safe_arch/naming_conventions.rs
//! An explanation of the crate's naming conventions.
//!
//! This crate attempts to follow the general naming scheme of `verb_type` when
//! the operation is "simple", and `verb_description_words_type` when the
//! operation (op) needs to be more specific than normal. Like this:
//! * `add_m128`
//! * `add_saturating_i8_m128i`
//!
//! ## Types
//! Currently, only `x86` and `x86_64` types are supported. Among those types:
//! * `m128` and `m256` are always considered to hold `f32` lanes.
//! * `m128d` and `m256d` are always considered to hold `f64` lanes.
//! * `m128i` and `m256i` hold integer data, but each op specifies what lane
//!   width of integers the operation uses.
//! * If the type has `_s` on the end then it's a "scalar" operation that
//!   affects just the lowest lane. The other lanes are generally copied forward
//!   from one of the inputs, though the details there vary from op to op.
//! * The SIMD types are often referred to as "registers" because each SIMD
//!   typed value represents exactly one CPU register when you're doing work.
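//!
//! As a rough illustration of why each integer op has to name its lane width,
//! the sketch below adds the same 16 bytes once as sixteen 8-bit lanes and once
//! as four 32-bit lanes. It is a plain scalar model on byte arrays, not
//! `safe_arch` code, and the helper names `add_i8_lanes` and `add_i32_lanes`
//! are hypothetical:
//!
//! ```rust
//! /// Scalar model: wrapping add where every byte is its own 8-bit lane.
//! /// (Wrapping addition gives the same bits for `i8` and `u8`.)
//! fn add_i8_lanes(a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
//!   let mut out = [0u8; 16];
//!   for i in 0..16 {
//!     out[i] = a[i].wrapping_add(b[i]);
//!   }
//!   out
//! }
//!
//! /// Scalar model: wrapping add where each group of 4 bytes is one
//! /// little-endian `i32` lane.
//! fn add_i32_lanes(a: [u8; 16], b: [u8; 16]) -> [u8; 16] {
//!   let mut out = [0u8; 16];
//!   for lane in 0..4 {
//!     let i = lane * 4;
//!     let x = i32::from_le_bytes([a[i], a[i + 1], a[i + 2], a[i + 3]]);
//!     let y = i32::from_le_bytes([b[i], b[i + 1], b[i + 2], b[i + 3]]);
//!     out[i..i + 4].copy_from_slice(&x.wrapping_add(y).to_le_bytes());
//!   }
//!   out
//! }
//! ```
//!
//! With `a` set to sixteen `0xFF` bytes and `b` set to a single `1` in byte 0,
//! the 8-bit version changes only byte 0 (to `0`), while the 32-bit version
//! carries through the lane and zeroes bytes 0 through 3.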
//!
//! ## Operations
//! There are many operations that can be performed. When possible, `safe_arch`
//! tries to follow normal Rust naming (eg: adding is still `add` and left
//! shifting is still `shl`), but if an operation doesn't normally exist at all
//! in Rust then we basically have to make something up.
//!
//! Many operations have more than one variant, such as `add` and also
//! `add_saturating`. In this case, `safe_arch` puts the "core operation" first
//! and then any "modifiers" go after, which isn't how you might normally say it
//! in English, but it makes the list of functions sort better.
//!
//! As a general note on SIMD terminology: When an operation uses the same
//! indexed lane in two _different_ registers to determine the output, that is a
//! "vertical" operation. When an operation uses more than one lane in the
//! _same_ register to determine the output, that is a "horizontal" operation.
//! * Vertical: `out[0] = a[0] + b[0]`, `out[1] = a[1] + b[1]`
//! * Horizontal: `out[0] = a[0] + a[1]`, `out[1] = b[0] + b[1]`
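//!
//! A minimal scalar sketch of the two shapes, using two-lane `f64` arrays in
//! place of registers. The function names here are hypothetical and are not
//! part of `safe_arch`:
//!
//! ```rust
//! /// Vertical: lane `i` of the output depends only on lane `i` of each input.
//! fn vertical_add(a: [f64; 2], b: [f64; 2]) -> [f64; 2] {
//!   [a[0] + b[0], a[1] + b[1]]
//! }
//!
//! /// Horizontal: each output lane combines the lanes within a single input.
//! fn horizontal_add(a: [f64; 2], b: [f64; 2]) -> [f64; 2] {
//!   [a[0] + a[1], b[0] + b[1]]
//! }
//! ```
//!
//! Given `a = [1.0, 2.0]` and `b = [10.0, 20.0]`, the vertical add produces
//! `[11.0, 22.0]` and the horizontal add produces `[3.0, 30.0]`.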
//!
//! ## Operation Glossary
//! Here follows the list of all the main operations and their explanations.
//!
//! * `abs`: Absolute value (wrapping).
//! * `add`: Addition. This is "wrapping" by default, though some other types of
//!   addition are available. Remember that wrapping signed addition is the same
//!   as wrapping unsigned addition.
//! * `average`: Averages the two inputs.
//! * `bitand`: Bitwise And, `a & b`, like [the trait](core::ops::BitAnd).
//! * `bitandnot`: Bitwise `(!a) & b`. This seems a little funny at first but
//!   it's useful for clearing bits. The output will be based on the `b` side's
//!   bit pattern, but with all active bits in `a` cleared:
//!   * `bitandnot(0b0010, 0b1011) == 0b1001`
//! * `bitor`: Bitwise Or, `a | b`, like [the trait](core::ops::BitOr).
//! * `bitxor`: Bitwise eXclusive Or, `a ^ b`, like [the
//!   trait](core::ops::BitXor).
//! * `blend`: Merge the data lanes of two SIMD values by taking either the `b`
//!   value or `a` value for each lane. Depending on the instruction, the blend
//!   mask can be either an immediate or a runtime value.
//! * `cast`: Convert between data types while preserving the exact bit
//!   patterns, like how [`transmute`](core::mem::transmute) works.
//! * `ceil`: "Ceiling", rounds towards positive infinity.
//! * `cmp`: Numeric comparisons of various kinds. This generally gives "mask"
//!   output where the output value is of the same data type as the inputs, but
//!   with all the bits in a "true" lane as 1 and all the bits in a "false" lane
//!   as 0. Remember that with floating point values all 1s bits is a NaN, and
//!   with signed integers all 1s bits is -1.
//!   * An "Ordered comparison" checks if _neither_ floating point value is NaN.
//!   * An "Unordered comparison" checks if _either_ floating point value is
//!     NaN.
//! * `convert`: This does some sort of numeric type change. The details can
//!   vary wildly. Generally, if the number of lanes goes down then the lowest
//!   lanes will be kept. If the number of lanes goes up then the new high lanes
//!   will be zero.
//! * `div`: Division.
//! * `dot_product`: This works like the matrix math operation. The lanes are
//!   multiplied and then the results are summed up into a single value.
//! * `duplicate`: Copy the even or odd indexed lanes to the other set of lanes.
//!   Eg, `[1, 2, 3, 4]` becomes `[1, 1, 3, 3]` or `[2, 2, 4, 4]`.
//! * `extract`: Get a value from the lane of a SIMD type into a scalar type.
//! * `floor`: Rounds towards negative infinity.
//! * `fused`: All the fused operations are a multiply as well as some sort of
//!   adding or subtracting. The details depend on which fused operation you
//!   select. The benefits of a fused operation over a non-fused operation are
//!   that it can compute slightly faster than doing the mul and add
//!   separately, and the result can be more accurate because the intermediate
//!   product isn't rounded.
//! * `insert`: The opposite of `extract`, this puts a new value into a
//!   particular lane of a SIMD type.
//! * `load`: Reads an address and makes a SIMD register value. The details can
//!   vary because there's more than one type of `load`, but generally this is a
//!   `&T -> U` style operation.
//! * `max`: Picks the larger value from each of the two inputs.
//! * `min`: Picks the smaller value from each of the two inputs.
//! * `mul`: Multiplication. For floating point this is just "normal"
//!   multiplication, but for integer types you tend to have some options. An
//!   integer multiplication of X bits will produce a 2X bit output, so
//!   generally you'll get to pick if you want to keep the high half of that,
//!   the low half of that (a normal "wrapping" mul), or "widen" the outputs to
//!   be all the bits at the expense of only multiplying half of the lanes.
//! * `pack`: Take the integers in the `a` and `b` inputs, reduce them to fit
//!   within the half-sized integer type (eg: `i16` to `i8`), and pack them all
//!   together into the output.
//! * `population`: The "population" operations refer to the bits within an
//!   integer. Either counting them or adjusting them in various ways.
//! * `rdrand`: Use the hardware RNG to make a random value of the given length.
//! * `rdseed`: Use the hardware RNG to make a random seed of the given length.
//!   This is less commonly available, but theoretically an improvement over
//!   `rdrand` in that if you have to combine more than one usage of this
//!   operation to make your full seed size then the guess difficulty rises at a
//!   multiplicative rate instead of just an additive rate. For example, two
//!   `u64` outputs concatenated to a single `u128` have a guess difficulty of
//!   2^64 * 2^64 = 2^128 with `rdseed` but only 2^64 + 2^64 = 2^65 with
//!   `rdrand`.
//! * `read_timestamp_counter`: Lets you read the CPU's cycle counter, which
//!   doesn't strictly mean anything in particular since the CPU's clock rate
//!   isn't even stable over time, but you might find it interesting as an
//!   approximation during benchmarks, or something like that.
//! * `reciprocal`: Turns `x` into `1/x`. Can also be combined with a `sqrt`
//!   operation.
//! * `round`: Convert floating point values to whole numbers, according to one
//!   of several available methods.
//! * `set`: Places a list of scalar values into the lanes of a SIMD type.
//!   Conceptually similar to how building an array works in Rust.
//! * `splat`: Not generally an operation of its own, but a modifier to other
//!   operations such as `load` and `set`. This will copy a given value across a
//!   SIMD type as many times as it can be copied. For example, a 32-bit value
//!   splatted into a 128-bit register will be copied four times.
//! * `shl`: Bit shift left. New bits shifted in are always 0. Because the shift
//!   is the same for both signed and unsigned values, this crate simply marks
//!   left shift as always being an unsigned operation.
//!   * You can shift by an immediate value ("imm"), all lanes by the same value
//!     ("all"), or each lane by its own value ("each").
//! * `shr`: Bit shift right. This comes in two forms: "Arithmetic" shifts shift
//!   in the starting sign bit (which preserves the sign of the value), and
//!   "Logical" shifts shift in 0 regardless of the starting sign bit (so the
//!   result ends up being positive). With normal Rust types, signed integers
//!   use arithmetic shifts and unsigned integers use logical shifts, so these
//!   functions are marked as being for signed or unsigned integers
//!   appropriately.
//!   * As with `shl`, you can shift by an immediate value ("imm"), all lanes by
//!     the same value ("all"), or each lane by its own value ("each").
//! * `sign_apply`: Multiplies one set of values by the signum (1, 0, or -1) of
//!   another set of values.
//! * `sqrt`: Square Root.
//! * `store`: Writes a SIMD value to a memory location.
//! * `string_search`: A rather specialized instruction that lets you do byte
//!   based searching within a register. This lets you do some very high speed
//!   searching through ASCII strings when the stars align.
//! * `sub`: Subtraction.
//! * `shuffle`: This lets you re-order the data lanes. Sometimes x86/x64 calls
//!   this "shuffle", and sometimes it's called "permute", and there's no
//!   particular reasoning behind the different names, so we just call them all
//!   shuffle.
//!   * `shuffle_{args}_{lane-type}_{lane-sources}_{simd-type}`.
//!   * "args" is the input arguments: `a` (one arg) or `ab` (two args), then
//!     either `v` (runtime-varying) or `i` (immediate). All the immediate
//!     shuffles are macros, of course.
//!   * "lane type" is `f32`, `f64`, `i8`, etc. If there's a `z` after the type
//!     then you'll also be able to zero an output position instead of making it
//!     come from a particular source lane.
//!   * "lane sources" is generally either "all" which means that all lanes can
//!     go to all other lanes, or "half" which means that each half of the lanes
//!     is isolated from the other half, and you can't cross data between the
//!     two halves, only within a half (this is how most of the 256-bit x86/x64
//!     shuffles work).
//! * `unpack`: Takes a SIMD value and gets out some of the lanes while widening
//!   them, such as converting `i16` to `i32`.
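//!
//! Several of the glossary entries above can be modeled on a single lane with
//! ordinary Rust integers. The sketch below is only a scalar illustration of
//! the semantics described in this list. The function names are hypothetical
//! rather than actual `safe_arch` functions, and hardware details such as
//! oversized shift counts are not modeled:
//!
//! ```rust
//! /// `add`: the default wrapping form, for one `i8` lane.
//! fn add_wrapping_lane(a: i8, b: i8) -> i8 {
//!   a.wrapping_add(b)
//! }
//!
//! /// `add_saturating`: clamps at the ends of the `i8` range instead.
//! fn add_saturating_lane(a: i8, b: i8) -> i8 {
//!   a.saturating_add(b)
//! }
//!
//! /// `bitandnot`: keeps `b`, but clears every bit that is set in `a`.
//! fn bitandnot_lane(a: u8, b: u8) -> u8 {
//!   !a & b
//! }
//!
//! /// `cmp`: a "true" lane is all 1 bits (-1 as a signed integer) and a
//! /// "false" lane is all 0 bits.
//! fn cmp_eq_mask_lane(a: i32, b: i32) -> i32 {
//!   if a == b { -1 } else { 0 }
//! }
//!
//! /// `shr` (arithmetic): shifts in copies of the sign bit.
//! /// This model assumes `count < 32`.
//! fn shr_arithmetic_lane(a: i32, count: u32) -> i32 {
//!   a >> count
//! }
//!
//! /// `shr` (logical): shifts in 0 bits regardless of the sign bit.
//! /// This model assumes `count < 32`.
//! fn shr_logical_lane(a: i32, count: u32) -> i32 {
//!   ((a as u32) >> count) as i32
//! }
//! ```
//!
//! For example, `bitandnot_lane(0b0010, 0b1011)` gives `0b1001`, adding `1` to
//! `127_i8` gives `-128` when wrapping but `127` when saturating, and shifting
//! `-8_i32` right by 1 gives `-4` arithmetically but `0x7FFF_FFFC` logically.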