Enum AluWidth

pub enum AluWidth {
    Scalar,
    Vec128,
    Vec256,
    Vec512,
    Amx,
    Widest,
}

ALU/SIMD execution width for WorkType::AluHot.

Selects the widest data-path the worker exercises per multiply chain. Today every variant executes the same scalar four-stream multiply chain; the width selector is still carried on the wire so that a downstream classifier can distinguish runs that requested SIMD from runs that requested scalar, even though the dispatch is uniform. Once SIMD intrinsics land per dispatch arm, wider variants are expected to put more pressure on the functional units and, for AVX-512 / AMX, to draw the package into a frequency-throttled mode that the kernel scheduler must observe. The serde wire form is snake_case ("scalar", "vec128", "vec256", "vec512", "amx", "widest").
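As a rough illustration of the wire form, the sketch below maps each variant to its snake_case name by hand. The AluWidth here is a local mirror of the enum, and wire_name / from_wire are hypothetical helpers standing in for whatever the serde derive generates; they are not part of the crate.

```rust
/// Local mirror of the documented enum (illustration only).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum AluWidth {
    Scalar,
    Vec128,
    Vec256,
    Vec512,
    Amx,
    Widest,
}

impl AluWidth {
    /// Every variant, in declaration order.
    const ALL: [AluWidth; 6] = [
        AluWidth::Scalar,
        AluWidth::Vec128,
        AluWidth::Vec256,
        AluWidth::Vec512,
        AluWidth::Amx,
        AluWidth::Widest,
    ];

    /// Hypothetical helper: the snake_case wire name listed in the docs.
    fn wire_name(self) -> &'static str {
        match self {
            AluWidth::Scalar => "scalar",
            AluWidth::Vec128 => "vec128",
            AluWidth::Vec256 => "vec256",
            AluWidth::Vec512 => "vec512",
            AluWidth::Amx => "amx",
            AluWidth::Widest => "widest",
        }
    }

    /// Hypothetical helper: inverse lookup. Unknown or differently cased
    /// names are rejected rather than guessed.
    fn from_wire(name: &str) -> Option<AluWidth> {
        Self::ALL.iter().copied().find(|w| w.wire_name() == name)
    }
}
```

Round-tripping every variant through wire_name and from_wire returns the original value, which is the property a classifier reading captured configs would rely on.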

§Current behaviour

All widths run the same four-stream scalar multiply path; the width selector is preserved on the wire (the WorkType::AluHot / WorkPhase::AluHot config carries width) so a downstream classifier can distinguish runs that requested SIMD from runs that requested scalar even though the dispatch is uniform.
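To make "four-stream scalar multiply" concrete, here is a minimal sketch of four independent 64-bit multiply chains that the CPU can overlap. The function name, the seeds and the multiplier are assumptions made for illustration; the worker's actual constants and loop shape are not documented here.

```rust
use std::hint::black_box;

/// Hypothetical odd multiplier; the real constant is not documented.
const MULTIPLIER: u64 = 0x9E37_79B9_7F4A_7C15;

/// Runs four independent wrapping-multiply dependency chains for `iters`
/// steps. No chain reads another chain's state, so an out-of-order core
/// can keep several multiplies in flight at once.
fn four_stream_mul_chain(iters: u64) -> u64 {
    // Odd seeds stay odd under multiplication by an odd constant, so no
    // stream ever collapses to zero.
    let (mut a, mut b, mut c, mut d) = (1u64, 3u64, 5u64, 7u64);
    for _ in 0..iters {
        a = a.wrapping_mul(MULTIPLIER);
        b = b.wrapping_mul(MULTIPLIER);
        c = c.wrapping_mul(MULTIPLIER);
        d = d.wrapping_mul(MULTIPLIER);
    }
    // black_box keeps the optimizer from discarding the loop as dead code.
    black_box(a ^ b ^ c ^ d)
}
```

Under the current behaviour, a run configured with any width would execute a loop of this general shape; only the recorded width differs.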

§Default semantics

Scalar is the type-level Rust default: the #[derive(Default)] fallback that serde uses when an AluWidth field is missing on the wire, which keeps backward compatibility with older capture data. Widest is the workload-level default: it is the value of the super::defaults::ALU_HOT_WIDTH constant, which resolve_alu_width resolves at runtime, so tests that take WorkType::from_name("AluHot") get the host’s widest available data-path, not the type-level scalar fallback. The asymmetry is deliberate: the type-level Default favours “always available everywhere”, while the workload-level default favours “stress the host as hard as it can run”.
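The two defaults can be sketched side by side. This is an illustration under assumptions: the enum is a local mirror, ALU_HOT_WIDTH is a local stand-in for super::defaults::ALU_HOT_WIDTH, and field_or_type_default is a hypothetical helper that imitates a missing field on the wire.

```rust
/// Local mirror of the documented enum; `Scalar` is the derived default.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
enum AluWidth {
    #[default]
    Scalar,
    Vec128,
    Vec256,
    Vec512,
    Amx,
    Widest,
}

/// Stand-in for `super::defaults::ALU_HOT_WIDTH` (assumed to hold `Widest`).
const ALU_HOT_WIDTH: AluWidth = AluWidth::Widest;

/// Hypothetical helper: what a decoder does when the field is absent.
/// `None` models an older capture that never wrote a width.
fn field_or_type_default(wire: Option<AluWidth>) -> AluWidth {
    wire.unwrap_or_default()
}

/// Hypothetical helper: the width a by-name AluHot workload starts from,
/// before runtime resolution picks a concrete variant.
fn workload_default() -> AluWidth {
    ALU_HOT_WIDTH
}
```

An old capture therefore decodes to Scalar, while a freshly constructed workload starts from Widest and is resolved against the host later.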

§Resolution rules

Widest is a runtime-resolved sentinel. At worker entry the dispatch arm probes the host CPU (via std::is_x86_feature_detected! on x86_64) and picks the widest available variant in the order Amx > Vec512 > Vec256 > Vec128 > Scalar; note that Amx is currently always downgraded (see the Amx variant). On aarch64 only Scalar and Vec128 (NEON) exist; Vec256, Vec512 and Amx are absent, so Widest resolves to Vec128 when NEON is present and to Scalar otherwise. A configured value that the host cannot run is downgraded to the next-widest available variant with a one-shot tracing::warn!, so the test still produces useful telemetry rather than hard-failing; a silent downgrade without the warning would mask the host capability gap.
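The resolution order and downgrade rule can be sketched as follows. This is not the crate's resolve_alu_width: HostCaps, probe_host, resolve and warn_downgrade_once are hypothetical names, eprintln! stands in for tracing::warn!, and the AMX probe is left as false because the framework cannot run AMX yet.

```rust
use std::sync::Once;

/// Local mirror of the documented enum (illustration only).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum AluWidth { Scalar, Vec128, Vec256, Vec512, Amx, Widest }

/// Hypothetical snapshot of what the host can run.
#[derive(Clone, Copy, Debug, Default)]
struct HostCaps { vec128: bool, vec256: bool, vec512: bool, amx: bool }

impl HostCaps {
    fn supports(self, w: AluWidth) -> bool {
        match w {
            AluWidth::Scalar => true,
            AluWidth::Vec128 => self.vec128,
            AluWidth::Vec256 => self.vec256,
            AluWidth::Vec512 => self.vec512,
            AluWidth::Amx => self.amx,
            AluWidth::Widest => false, // sentinel, never run directly
        }
    }
}

/// Widest-first order from the resolution rules.
const ORDER: [AluWidth; 5] = [
    AluWidth::Amx, AluWidth::Vec512, AluWidth::Vec256,
    AluWidth::Vec128, AluWidth::Scalar,
];

/// Returns the variant to run and whether a concrete request was downgraded.
fn resolve(requested: AluWidth, host: HostCaps) -> (AluWidth, bool) {
    // `Widest` is not in ORDER, so it starts the search at the top.
    let start = ORDER.iter().position(|&w| w == requested).unwrap_or(0);
    let resolved = ORDER[start..]
        .iter()
        .copied()
        .find(|&w| host.supports(w))
        .unwrap_or(AluWidth::Scalar);
    (resolved, requested != AluWidth::Widest && resolved != requested)
}

static DOWNGRADE_WARNED: Once = Once::new();

/// Emits the downgrade message at most once per process.
fn warn_downgrade_once(requested: AluWidth, resolved: AluWidth) {
    DOWNGRADE_WARNED.call_once(|| {
        eprintln!("AluWidth {requested:?} unavailable; running {resolved:?}");
    });
}

/// Probes the running CPU with the std feature-detection macros.
fn probe_host() -> HostCaps {
    #[cfg(target_arch = "x86_64")]
    let caps = HostCaps {
        vec128: std::is_x86_feature_detected!("sse2"),
        vec256: std::is_x86_feature_detected!("avx2"),
        vec512: std::is_x86_feature_detected!("avx512f"),
        amx: false, // assumption: treated as unavailable, as the docs describe
    };
    #[cfg(target_arch = "aarch64")]
    let caps = HostCaps {
        vec128: std::arch::is_aarch64_feature_detected!("neon"),
        ..HostCaps::default()
    };
    #[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
    let caps = HostCaps::default();
    caps
}
```

A caller would typically run resolve(requested, probe_host()) once at worker entry and call warn_downgrade_once only when the second tuple element is true, so Widest never triggers a warning.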

§Frequency throttle on x86_64

On Intel client and server SKUs, the AVX-512 license raises the per-core voltage and lowers the all-core turbo for the whole package. Because the throttle is package-wide rather than per-task, running Vec512 workers under one scheduler while other workers run under another biases the comparison. Tests that A/B-compare schedulers under Vec512 or Amx therefore need their runs serialized on the same package; the framework does not currently coordinate this serialization across worker groups.
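Since the framework does not coordinate this, a test author has to serialize the runs by hand. One possible shape, assuming both runs live in the same process on a single-package host, is a process-wide lock around each run; PACKAGE_LOCK and run_serialized are hypothetical names, not framework API.

```rust
use std::sync::Mutex;

/// Hypothetical process-wide lock standing in for "one throttled run per
/// package at a time". It cannot coordinate separate processes or tell
/// packages apart.
static PACKAGE_LOCK: Mutex<()> = Mutex::new(());

/// Runs `run` while holding the lock, so two wide-vector runs never overlap.
/// Not re-entrant: calling it from inside `run` would deadlock.
fn run_serialized<T>(run: impl FnOnce() -> T) -> T {
    // A poisoned lock only means an earlier run panicked; the guard is
    // still usable for mutual exclusion.
    let _guard = PACKAGE_LOCK.lock().unwrap_or_else(|poisoned| poisoned.into_inner());
    run()
}
```

An A/B comparison would wrap the scheduler-A run and the scheduler-B run in separate run_serialized calls, so each sees the package in the same license state rather than one inheriting the other's throttle.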

Variants§

Scalar

64-bit scalar integer multiply chain. Drives the integer pipeline only; no SIMD or AVX licensing involved. Available on every supported architecture.

Vec128

128-bit vector integer multiply chain (SSE2 on x86_64, NEON on aarch64). The widest baseline both architectures support; a reasonable default when the test cares about “vectorized ALU” without architecture-specific tuning.

Vec256

256-bit vector integer multiply chain (AVX2 on x86_64). Not available on aarch64 — falls back to Vec128 (NEON) at worker entry with a one-shot warn.

Vec512

512-bit vector integer multiply chain (AVX-512F on x86_64). Triggers the package-wide frequency throttle described above. Not available on aarch64 — falls back to Vec128 (NEON) at worker entry.

Amx

AMX tile multiply chain (x86_64 server SKUs with AMX-INT8 or AMX-BF16). The widest data-path on x86_64. The kernel gates AMX state with XFD and allocates the dynamic XSAVE area lazily: the first AMX instruction raises a #NM trap, which arch/x86/kernel/traps.c::handle_xfd_event handles by calling arch/x86/kernel/fpu/xstate.c::__xfd_enable_feature to allocate the area. This adds a one-time per-task latency spike on first use.

AMX additionally requires arch_prctl(ARCH_REQ_XCOMP_PERM, XFEATURE_XTILE_DATA) once per process before the first AMX instruction. The framework does NOT issue this call, so AMX is not currently runnable end-to-end; resolve_alu_width therefore downgrades AluWidth::Amx to the host’s widest stable-detectable variant.
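For reference, the permission request the framework does not yet issue could look roughly like the sketch below. It is an assumption-laden illustration, not framework code: request_amx_permission is a hypothetical name, the constants are copied from the Linux x86 uapi headers, and the raw syscall instruction is used only to keep the sketch free of dependencies (a real crate would more likely go through libc::syscall).

```rust
use std::io;

/// `arch_prctl` option and xfeature number from the Linux x86 uapi headers.
#[allow(dead_code)]
const ARCH_REQ_XCOMP_PERM: u64 = 0x1023;
#[allow(dead_code)]
const XFEATURE_XTILE_DATA: u64 = 18;

/// Hypothetical helper: asks the kernel for permission to use AMX tile data.
/// Fails on hosts or kernels without AMX support.
#[cfg(all(target_os = "linux", target_arch = "x86_64"))]
fn request_amx_permission() -> io::Result<()> {
    const SYS_ARCH_PRCTL: i64 = 158; // x86_64 syscall number
    let ret: i64;
    // SAFETY: this arch_prctl option takes two integer arguments and does
    // not touch user memory; `syscall` clobbers rcx and r11, declared below.
    unsafe {
        std::arch::asm!(
            "syscall",
            inlateout("rax") SYS_ARCH_PRCTL => ret,
            in("rdi") ARCH_REQ_XCOMP_PERM,
            in("rsi") XFEATURE_XTILE_DATA,
            lateout("rcx") _,
            lateout("r11") _,
            options(nostack),
        );
    }
    if ret == 0 {
        Ok(())
    } else {
        // The raw syscall returns the negated errno in rax.
        Err(io::Error::from_raw_os_error(-ret as i32))
    }
}

/// Fallback for targets where the request does not exist.
#[cfg(not(all(target_os = "linux", target_arch = "x86_64")))]
fn request_amx_permission() -> io::Result<()> {
    Err(io::Error::new(io::ErrorKind::Unsupported, "AMX requires Linux on x86_64"))
}
```

The call would have to succeed once per process before any worker executes an AMX instruction; on failure the natural response is the same downgrade path the resolver already takes.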

Not available on aarch64 — falls back to Vec128.

Widest

Resolve to the widest variant the host supports at worker entry. See the type-level doc for the resolution order. Useful as a default when the test author wants “as much ALU pressure as the host can sustain” without hardcoding an architecture or feature level.

Trait Implementations§

impl Clone for AluWidth

fn clone(&self) -> AluWidth

fn clone_from(&mut self, source: &Self)

impl Debug for AluWidth

fn fmt(&self, f: &mut Formatter<'_>) -> Result

impl Default for AluWidth

fn default() -> AluWidth

impl<'de> Deserialize<'de> for AluWidth

fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error>
where __D: Deserializer<'de>,

impl Hash for AluWidth

fn hash<__H: Hasher>(&self, state: &mut __H)

fn hash_slice<H>(data: &[Self], state: &mut H)
where H: Hasher, Self: Sized,

impl PartialEq for AluWidth

fn eq(&self, other: &AluWidth) -> bool

fn ne(&self, other: &Rhs) -> bool

impl Serialize for AluWidth

fn serialize<__S>(&self, __serializer: __S) -> Result<__S::Ok, __S::Error>
where __S: Serializer,

impl Copy for AluWidth

impl Eq for AluWidth

impl StructuralPartialEq for AluWidth

Auto Trait Implementations§

impl Freeze for AluWidth

impl RefUnwindSafe for AluWidth

impl Send for AluWidth

impl Sync for AluWidth

impl Unpin for AluWidth

impl UnwindSafe for AluWidth

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

impl<Q, K> Equivalent<K> for Q
where Q: Eq + ?Sized, K: Borrow<Q> + ?Sized,

fn equivalent(&self, key: &K) -> bool

impl<T> From<T> for T

fn from(t: T) -> T

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

fn in_current_span(self) -> Instrumented<Self>

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

impl<T> Pointable for T

const ALIGN: usize

type Init = T

unsafe fn init(init: <T as Pointable>::Init) -> usize

unsafe fn deref<'a>(ptr: usize) -> &'a T

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

unsafe fn drop(ptr: usize)

impl<T> PolicyExt for T
where T: ?Sized,

fn and<P, B, E>(self, other: P) -> And<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

fn or<P, B, E>(self, other: P) -> Or<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

impl<T> Same for T

type Output = T

impl<T> ToOwned for T
where T: Clone,

impl<T, U> TryFrom<U> for T
where U: Into<T>,

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

impl<T> DeserializeOwned for T
where T: for<'de> Deserialize<'de>,

impl<T> MaybeSend for T
where T: Send,