pub enum AluWidth {
    Scalar,
    Vec128,
    Vec256,
    Vec512,
    Amx,
    Widest,
}
ALU/SIMD execution width for WorkType::AluHot.
Selects the widest data-path the worker is asked to exercise
per multiply chain. Today every variant executes the same
scalar four-stream multiply chain; the selector only records
the requested width, so a downstream classifier can
distinguish runs that requested SIMD from runs that requested
scalar even though the dispatch is uniform. Once SIMD
intrinsics land per dispatch arm, wider variants are expected
to drive more functional-unit pressure and, for AVX-512 / AMX,
to draw the package into a frequency-throttled mode that the
kernel scheduler must observe.
The serde wire form is snake_case ("scalar", "vec128",
"vec256", "vec512", "amx", "widest").
§Current behaviour
All widths run the same four-stream scalar multiply path;
the width selector is preserved on the wire (the
WorkType::AluHot / WorkPhase::AluHot config carries
width) so a downstream classifier can distinguish runs
that requested SIMD from runs that requested scalar even
though the dispatch is uniform.
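As a rough illustration of the wire names listed earlier, the sketch below maps each variant to its snake_case string and back. It uses a stand-in copy of the enum and hand-written helpers (`wire_name`, `from_wire_name`) that are hypothetical; the real type derives its serde implementations and does not necessarily expose these functions.

```rust
/// Stand-in mirror of `AluWidth` for illustration only; the real type
/// gets its wire form from serde derives, not from these helpers.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AluWidth {
    Scalar,
    Vec128,
    Vec256,
    Vec512,
    Amx,
    Widest,
}

impl AluWidth {
    /// Hypothetical helper: the snake_case name used on the wire.
    pub fn wire_name(self) -> &'static str {
        match self {
            AluWidth::Scalar => "scalar",
            AluWidth::Vec128 => "vec128",
            AluWidth::Vec256 => "vec256",
            AluWidth::Vec512 => "vec512",
            AluWidth::Amx => "amx",
            AluWidth::Widest => "widest",
        }
    }

    /// Hypothetical inverse: parse a wire name, rejecting anything that is
    /// not an exact snake_case match.
    pub fn from_wire_name(name: &str) -> Option<Self> {
        Some(match name {
            "scalar" => AluWidth::Scalar,
            "vec128" => AluWidth::Vec128,
            "vec256" => AluWidth::Vec256,
            "vec512" => AluWidth::Vec512,
            "amx" => AluWidth::Amx,
            "widest" => AluWidth::Widest,
            _ => return None,
        })
    }
}
```

A classifier reading capture data could compare the recorded name against "scalar" to separate SIMD-requesting runs from scalar ones, even while every run executes the same path.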
§Default semantics
Scalar is the type-level Rust default: the
#[derive(Default)] fallback that serde uses when an
AluWidth field is missing on the wire, which keeps older
capture data backward-compatible. Widest is the
workload-level default, which the
super::defaults::ALU_HOT_WIDTH constant resolves at runtime
via resolve_alu_width: tests that take
WorkType::from_name("AluHot") get the host’s widest
available data-path, not the type-level scalar fallback.
The asymmetry is deliberate: the type-level Default favours
“always available everywhere”; the workload-level default
favours “stress the host as hard as it can run.”
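The two defaults can be pictured with a small stand-in. Everything here is an assumption made for illustration: the enum is a local copy, `AluHotConfig` and `workload_default` are hypothetical names, and `ALU_HOT_WIDTH` is modelled as a plain constant set to `Widest`, which may not match how the real constant is defined.

```rust
/// Stand-in copy of the enum; `#[default]` marks the type-level default.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum AluWidth {
    #[default]
    Scalar,
    Vec128,
    Vec256,
    Vec512,
    Amx,
    Widest,
}

/// Assumed shape of the workload-level default constant.
pub const ALU_HOT_WIDTH: AluWidth = AluWidth::Widest;

/// Hypothetical config carrying the width selector.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct AluHotConfig {
    pub width: AluWidth,
}

/// Hypothetical constructor for the workload-level default path,
/// i.e. what a by-name lookup of the workload would hand back.
pub fn workload_default() -> AluHotConfig {
    AluHotConfig { width: ALU_HOT_WIDTH }
}
```

With this shape, a config whose width field was absent falls back to `Scalar`, while a workload built through the by-name path starts from `Widest` and is resolved later at worker entry.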
§Resolution rules
Widest is a runtime-resolved sentinel: at worker entry the
dispatch arm probes the host CPU via
std::is_x86_feature_detected! (x86_64) and picks the
widest available variant in the order
Amx > Vec512 > Vec256 > Vec128 > Scalar. On aarch64 only
Scalar and Vec128 (NEON) are available; Vec256 /
Vec512 / Amx are absent, and Widest resolves to NEON
when present, falling back to Scalar. A configured value
that the host cannot run is downgraded to the widest
available variant below it, with a one-shot tracing::warn!,
so the test still produces useful telemetry rather than
hard-failing. A silent downgrade without the warn would
mask the host capability gap.
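The resolution order above can be sketched as follows. This is not the framework's implementation: the signature given to `resolve_alu_width` here, the `HostCaps` struct and `probe_host` are assumptions, `eprintln!` stands in for `tracing::warn!`, and AMX is always reported as unavailable in the probe.

```rust
use std::sync::Once;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AluWidth { Scalar, Vec128, Vec256, Vec512, Amx, Widest }

/// Hypothetical snapshot of what the host can run.
#[derive(Debug, Clone, Copy, Default)]
pub struct HostCaps { pub vec128: bool, pub vec256: bool, pub vec512: bool, pub amx: bool }

/// Probe the host with the stable std feature-detection macros.
#[cfg(target_arch = "x86_64")]
pub fn probe_host() -> HostCaps {
    HostCaps {
        vec128: std::is_x86_feature_detected!("sse2"),
        vec256: std::is_x86_feature_detected!("avx2"),
        vec512: std::is_x86_feature_detected!("avx512f"),
        amx: false, // assumption: AMX is treated as not runnable
    }
}

#[cfg(target_arch = "aarch64")]
pub fn probe_host() -> HostCaps {
    HostCaps { vec128: std::arch::is_aarch64_feature_detected!("neon"), ..HostCaps::default() }
}

#[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
pub fn probe_host() -> HostCaps { HostCaps::default() }

fn supported(width: AluWidth, caps: HostCaps) -> bool {
    match width {
        AluWidth::Scalar => true,
        AluWidth::Vec128 => caps.vec128,
        AluWidth::Vec256 => caps.vec256,
        AluWidth::Vec512 => caps.vec512,
        AluWidth::Amx => caps.amx,
        AluWidth::Widest => false, // sentinel, never a concrete result
    }
}

/// Resolve a requested width to one the host can run (sketch only).
pub fn resolve_alu_width(requested: AluWidth, caps: HostCaps) -> AluWidth {
    // Widest-to-narrowest preference order from the doc.
    const ORDER: [AluWidth; 5] = [
        AluWidth::Amx, AluWidth::Vec512, AluWidth::Vec256, AluWidth::Vec128, AluWidth::Scalar,
    ];
    // `Widest` is not in ORDER, so it starts the search from the top.
    let start = ORDER.iter().position(|&w| w == requested).unwrap_or(0);
    let resolved = ORDER[start..]
        .iter()
        .copied()
        .find(|&w| supported(w, caps))
        .unwrap_or(AluWidth::Scalar);
    if requested != AluWidth::Widest && resolved != requested {
        // One-shot warning; stand-in for `tracing::warn!`.
        static WARN_ONCE: Once = Once::new();
        WARN_ONCE.call_once(|| {
            eprintln!("AluWidth {:?} unavailable on this host, using {:?}", requested, resolved)
        });
    }
    resolved
}
```

Passing the capabilities in as a value, rather than probing inside the resolver, keeps the ordering logic testable on any machine; a caller on real hardware would pass `probe_host()`.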
§Frequency throttle on x86_64
On Intel client / server SKUs the AVX-512 license raises the
per-core voltage and lowers the all-core turbo for the
package; running Vec512 workers under one
scheduler while other workers run under another biases the
comparison because the throttle is package-wide, not
per-task. Tests that A/B-compare schedulers under
Vec512 or Amx need the
runs serialized on the same package — the framework does
not currently coordinate this serialization across worker
groups.
Variants§
Scalar
64-bit scalar integer multiply chain. Drives the integer pipeline only; no SIMD or AVX licensing involved. Available on every supported architecture.
Vec128
128-bit vector integer multiply chain (SSE2 on x86_64, NEON on aarch64). The widest baseline both architectures support; a reasonable default when the test cares about “vectorized ALU” without architecture-specific tuning.
Vec256
256-bit vector integer multiply chain (AVX2 on x86_64).
Not available on aarch64 — falls back to Vec128
(NEON) at worker entry with a one-shot warn.
Vec512
512-bit vector integer multiply chain (AVX-512F on
x86_64). Triggers the package-wide frequency throttle
described above. Not available on aarch64 — falls back
to Vec128 (NEON) at worker entry.
Amx
AMX tile multiply chain (x86_64 server SKUs with AMX-INT8
or AMX-BF16). The widest data-path on x86_64. The kernel
gates AMX state with XFD: the first AMX instruction raises
a #NM trap that
arch/x86/kernel/traps.c::handle_xfd_event handles,
calling arch/x86/kernel/fpu/xstate.c::__xfd_enable_feature
to allocate the dynamic XSAVE area lazily. This adds a
one-time per-task latency spike on first use.
AMX additionally requires
arch_prctl(ARCH_REQ_XCOMP_PERM, XFEATURE_XTILE_DATA) once per
process before the first AMX instruction; the framework
does NOT issue this call, so AMX is not currently runnable
end-to-end on this framework.
resolve_alu_width therefore downgrades AluWidth::Amx
to the host’s widest stable-detectable variant.
Not available on aarch64; falls back to Vec128.
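For orientation, the permission request named above is issued on Linux through the arch_prctl(2) system call. The sketch below shows roughly what such a call could look like using only the standard library and inline assembly. It is not part of the framework: `request_amx_permission` is a hypothetical name, and the numeric constants are taken from the Linux x86 UAPI as an assumption to verify against the target kernel headers.

```rust
/// `ARCH_REQ_XCOMP_PERM` request code (assumed from the Linux x86 UAPI).
pub const ARCH_REQ_XCOMP_PERM: i64 = 0x1023;
/// XSAVE feature number of the AMX tile-data component (assumed).
pub const XFEATURE_XTILE_DATA: i64 = 18;
/// `arch_prctl` syscall number on x86_64 Linux (assumed).
pub const SYS_ARCH_PRCTL: i64 = 158;
/// errno used by the non-x86_64 fallback below.
pub const ENOSYS: i64 = 38;

/// Hypothetical helper: ask the kernel for permission to use AMX tile
/// data in this process. Returns `Err(errno)` when the request fails,
/// for example on a CPU or kernel without AMX support.
#[cfg(all(target_os = "linux", target_arch = "x86_64"))]
pub fn request_amx_permission() -> Result<(), i64> {
    let ret: i64;
    // SAFETY: raw `syscall` with integer arguments only; the instruction
    // clobbers rcx and r11, which are declared as outputs here.
    unsafe {
        std::arch::asm!(
            "syscall",
            inlateout("rax") SYS_ARCH_PRCTL => ret,
            in("rdi") ARCH_REQ_XCOMP_PERM,
            in("rsi") XFEATURE_XTILE_DATA,
            lateout("rcx") _,
            lateout("r11") _,
            options(nostack),
        );
    }
    // The kernel returns 0 on success and a negated errno on failure.
    if ret == 0 { Ok(()) } else { Err(-ret) }
}

/// Fallback for every other target: AMX does not exist there.
#[cfg(not(all(target_os = "linux", target_arch = "x86_64")))]
pub fn request_amx_permission() -> Result<(), i64> {
    Err(ENOSYS)
}
```

A worker that wanted to run the Amx arm would need a successful call like this before its first tile instruction; on failure, downgrading as described above remains the only option.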
Widest
Resolve to the widest variant the host supports at worker entry. See the type-level doc for the resolution order. Useful as a default when the test author wants “as much ALU pressure as the host can sustain” without hardcoding an architecture or feature level.
Trait Implementations§
impl<'de> Deserialize<'de> for AluWidth
fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error>
where
    __D: Deserializer<'de>,
impl Copy for AluWidth
impl Eq for AluWidth
impl StructuralPartialEq for AluWidth
Auto Trait Implementations§
impl Freeze for AluWidth
impl RefUnwindSafe for AluWidth
impl Send for AluWidth
impl Sync for AluWidth
impl Unpin for AluWidth
impl UnwindSafe for AluWidth
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T
where
    T: Clone,
impl<Q, K> Equivalent<K> for Q
fn equivalent(&self, key: &K) -> bool
Compare self to key and return true if they are equal.
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise.