CgroupStats

Struct CgroupStats 

pub struct CgroupStats {
    pub cgroup_name: String,
    pub num_workers: usize,
    pub cpus_used: BTreeSet<usize>,
    pub num_cpus: usize,
    pub avg_off_cpu_pct: Option<f64>,
    pub min_off_cpu_pct: Option<f64>,
    pub max_off_cpu_pct: Option<f64>,
    pub spread: Option<f64>,
    pub max_gap_ms: u64,
    pub max_gap_cpu: usize,
    pub total_migrations: u64,
    pub migration_ratio: f64,
    pub p99_wake_latency_us: f64,
    pub median_wake_latency_us: f64,
    pub wake_latency_cv: f64,
    pub wake_measured: bool,
    pub median_timer_latency_us: f64,
    pub p99_timer_latency_us: f64,
    pub p999_timer_latency_us: f64,
    pub worst_timer_latency_us: f64,
    pub timer_measured: bool,
    pub total_iterations: u64,
    pub total_cpu_time_ns: u64,
    pub mean_run_delay_us: f64,
    pub worst_run_delay_us: f64,
    pub run_delay_measured: bool,
    pub page_locality: f64,
    pub cross_node_migration_ratio: f64,
    pub taobench_whole: Option<TaobenchStats>,
    pub ext_metrics: BTreeMap<String, f64>,
}

Per-cgroup statistics from worker telemetry.

§Percentile convention

p99_wake_latency_us and median_wake_latency_us are computed with the NEAREST-RANK (Type 1) percentile definition: the value at zero-based index ceil(n * p) - 1 in sorted order, with no interpolation between samples. This matches the percentile convention used throughout schbench and the BPF latency histograms the project cross-references, so a ktstr p99 reading aligns with a schbench lat99 without adjustment. Nearest-rank is also numerically stable for small n (wake reservoirs cap at MAX_WAKE_SAMPLES = 100_000 per worker; see workload.rs): it always returns an observed sample, whereas interpolation between the two nearest ranks would be implementation-defined at sample-set boundaries.
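The nearest-rank rule can be sketched in a few lines. This is a minimal illustration, not the crate's own reduction: the function name nearest_rank and its Option return are assumptions made for the example.

```rust
/// Nearest-rank (Type 1) percentile: the sample at zero-based index
/// ceil(n * p) - 1 of the sorted input. `p` is a fraction in (0.0, 1.0].
/// Hypothetical helper for illustration; not the crate's own function.
fn nearest_rank(samples: &[f64], p: f64) -> Option<f64> {
    // A percentile over zero samples (or an out-of-range p) is undefined.
    if samples.is_empty() || !(p > 0.0 && p <= 1.0) {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let n = sorted.len();
    // One-based rank; clamp guards against float rounding at the edges.
    let rank = ((n as f64) * p).ceil() as usize;
    let idx = rank.clamp(1, n) - 1;
    // Always an observed sample, never an interpolated value.
    Some(sorted[idx])
}
```

On the four samples 10, 20, 30, 40 the median under this rule is 20 (index ceil(4 * 0.5) - 1 = 1), where an interpolating definition would report 25.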

§CV pooling scope

wake_latency_cv is POOLED across every sample from every worker in the cgroup, not a per-worker CV averaged back. That collapses per-worker dispersion into the cgroup-wide signal: two workers with uniformly low jitter but different means produce a high pooled CV (mean-shift between workers inflates stddev), while per-worker CV would show neither worker as bad. This is intentional for the fairness threshold (max_wake_latency_cv): a scheduler that gives worker A 10µs wakes and worker B 1ms wakes is failing fairness even if each worker on its own is tight. Tests comparing single-worker behavior should scope their assertions to per-worker data rather than this aggregate.
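A small sketch makes the pooled-versus-per-worker difference concrete. The helpers cv and pooled_cv are hypothetical, and the use of the population standard deviation is an assumption; the source does not say which estimator the crate uses.

```rust
/// Coefficient of variation (population stddev / mean) of one sample set.
/// Hypothetical helper; the stddev estimator is an assumption.
fn cv(samples: &[f64]) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    if mean <= 0.0 {
        return None;
    }
    let var = samples.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n;
    Some(var.sqrt() / mean)
}

/// Pooled CV: concatenate every worker's samples, then take ONE CV.
fn pooled_cv(workers: &[Vec<f64>]) -> Option<f64> {
    let all: Vec<f64> = workers.iter().flatten().copied().collect();
    cv(&all)
}
```

With one worker at a constant 10µs and another at a constant 1000µs, each per-worker CV is 0.0 while the pooled CV is about 0.98, which is the mean-shift effect described above.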

§Derived ratios

Two metrics are DERIVED rather than measured and live as &self methods, NOT as serde-serialized fields: Self::wake_latency_tail_ratio (= p99/median) and Self::iterations_per_worker (= total_iterations/num_workers). Pre-1.0 cleanup eliminated the prior stored-field shadow and derive_ratios stamper. Consumers always recompute on read, so a hand-constructed fixture or a deserialized sidecar from an older build cannot silently carry a stale ratio.

The run-level worst-cgroup tail ratio (crate::stats::MetricKind::WakeLatencyTailRatio, an ext_metrics entry) and the iterations efficiencies (worst_iterations_per_worker / worst_iterations_per_cpu_sec) are all re-pooled POST-merge by populate_run_distribution_metrics: the tail ratio as the max over Self::wake_latency_tail_ratio across per-cgroup Self entries, the efficiencies lowest-wins from Self::iterations_per_worker / Self::iterations_per_cpu_sec.

Fields§

§cgroup_name: String

Cgroup name (the workload-handle label this telemetry belongs to), or empty for unlabeled call sites (collect_all, bare assert_cgroup). Set post-hoc by collect_handles (in crate::scenario) where the name is in scope; cgroup_stats itself has only the reports and leaves it empty. Lets a PASSING-run consumer say which cgroup’s work landed on which CPUs.

§num_workers: usize

Number of workers in this cgroup.

§cpus_used: BTreeSet<usize>

Distinct CPUs the workers in this cgroup actually ran on (union of each crate::workload::WorkerReport::cpus_used). num_cpus is its length, kept for the existing rollups; this set surfaces WHICH CPUs (not just how many) on every run, pass or fail.

§num_cpus: usize

Distinct CPUs used across all workers in this cgroup (cpus_used.len()).

§avg_off_cpu_pct: Option<f64>

Mean off-CPU percentage across workers (off_cpu_ns / wall_time_ns * 100). None when no worker reported a positive wall_time_ns (off-CPU% is undefined without wall time) — distinct from Some(0.0), a measured “never off CPU”. The Option keeps a not-measured cgroup from reading as a perfectly-on-CPU one in the telemetry consumers (ScenarioStats.cgroups).

§min_off_cpu_pct: Option<f64>

Minimum off-CPU percentage across workers. None under the same no-measurable-wall-time condition as avg_off_cpu_pct.

§max_off_cpu_pct: Option<f64>

Maximum off-CPU percentage across workers. None under the same no-measurable-wall-time condition as avg_off_cpu_pct.

§spread: Option<f64>

max_off_cpu_pct - min_off_cpu_pct. Measures scheduling fairness within the cgroup. None when off-CPU% was not measured (no worker with positive wall time) — a not-measured cgroup is inconclusive for fairness, NOT “spread 0 = perfectly fair”. Some(0.0) means a real measured zero spread.
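The None versus Some(0.0) discipline shared by the four off-CPU fields can be sketched as follows. OffCpuSummary and off_cpu_summary are hypothetical names, and skipping individual workers with zero wall time is an assumption of this sketch rather than documented behavior.

```rust
/// Off-CPU summary over workers. Hypothetical stand-in for the crate's
/// reduction; the type and function names are illustrative only.
#[derive(Debug, PartialEq)]
struct OffCpuSummary {
    avg: f64,
    min: f64,
    max: f64,
    spread: f64,
}

/// Each tuple is (off_cpu_ns, wall_time_ns) for one worker.
fn off_cpu_summary(workers: &[(u64, u64)]) -> Option<OffCpuSummary> {
    // Assumption: a worker without positive wall time contributes nothing.
    let pcts: Vec<f64> = workers
        .iter()
        .filter(|(_, wall)| *wall > 0)
        .map(|(off, wall)| *off as f64 / *wall as f64 * 100.0)
        .collect();
    if pcts.is_empty() {
        return None; // not measured, which is NOT the same as 0% off CPU
    }
    let min = pcts.iter().copied().fold(f64::INFINITY, f64::min);
    let max = pcts.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    let avg = pcts.iter().sum::<f64>() / pcts.len() as f64;
    Some(OffCpuSummary { avg, min, max, spread: max - min })
}
```

A cohort whose only worker has zero wall time yields None, while a worker that was measured and never left the CPU yields Some with every field at 0.0.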

§max_gap_ms: u64

Longest scheduling gap across all workers (ms).

§max_gap_cpu: usize

CPU where the longest scheduling gap occurred.

§total_migrations: u64

Sum of CPU migration counts across all workers.

§migration_ratio: f64

Migrations per iteration (total_migrations / total_iterations).

§p99_wake_latency_us: f64

99th percentile wake latency across all workers (microseconds).

§median_wake_latency_us: f64

Median wake latency across all workers (microseconds).

§wake_latency_cv: f64

Coefficient of variation (stddev / mean) of wake latencies.

Computed over the POOLED latency samples from every worker in the cgroup, not as a mean of per-worker CVs. Per-worker dispersion is therefore masked: a cgroup with one tight worker and one wildly variable worker can report a moderate pooled CV that looks healthier than the variable worker's own CV. Use WorkerReport::wake_latencies_ns directly if per-worker CV is needed.

§wake_measured: bool

Whether any worker in this cgroup recorded a wake-latency sample. false makes the wake reductions above (p99_wake_latency_us, median_wake_latency_us, wake_latency_cv) a not-measured sentinel 0.0 rather than a measured zero — a percentile over zero samples is undefined, not “instant wakes”. The run-level distributional re-pool (populate_run_distribution_metrics) reads this to EXCLUDE a no-wake-sample cgroup from the cross-run mean instead of folding its 0.0 in (which, for the LowerBetter wake metrics, would falsely drag the mean toward “perfect”). Same not-measured-vs-measured-zero discipline the off-CPU% Option fields above carry.
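The effect of the flag on a cross-cgroup mean can be sketched with a reduced stand-in type. WakeCell and mean_measured_p99 are hypothetical; only the two field names come from this struct, and the real re-pool in populate_run_distribution_metrics may differ in detail.

```rust
/// Minimal stand-in carrying only the two fields this sketch reads.
struct WakeCell {
    p99_wake_latency_us: f64,
    wake_measured: bool,
}

/// Mean p99 over the cells that actually measured wakes.
/// Returns None when no cell measured anything (absence, not 0.0).
fn mean_measured_p99(cells: &[WakeCell]) -> Option<f64> {
    let measured: Vec<f64> = cells
        .iter()
        .filter(|c| c.wake_measured)
        .map(|c| c.p99_wake_latency_us)
        .collect();
    if measured.is_empty() {
        None
    } else {
        Some(measured.iter().sum::<f64>() / measured.len() as f64)
    }
}
```

With one measured cell at 200µs and one not-measured cell carrying the 0.0 sentinel, the gated mean stays at 200µs; folding the sentinel in would have halved it to 100µs.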

§median_timer_latency_us: f64

Median timer-latency across all workers (microseconds) — the crate::workload::WorkType::TimerLatency cyclictest probe’s per-cgroup pooled reduction over crate::workload::WorkerReport::timer_latencies_ns. 0.0 when no worker recorded timer samples.

§p99_timer_latency_us: f64

99th-percentile timer-latency across all workers (microseconds). See Self::median_timer_latency_us.

§p999_timer_latency_us: f64

99.9th-percentile (deep-tail) timer-latency across all workers (microseconds). See Self::median_timer_latency_us.

§worst_timer_latency_us: f64

Worst (maximum) timer-latency across all workers (microseconds). See Self::median_timer_latency_us.

§timer_measured: bool

Whether any worker in this cgroup recorded a timer-latency sample. false makes the timer reductions above a not-measured sentinel 0.0 (no crate::workload::WorkType::TimerLatency worker ran), distinct from a measured zero. Read by the run-level re-pool to EXCLUDE a no-timer-sample cgroup from the cross-run mean. Mirrors Self::wake_measured for the timer carrier.

§total_iterations: u64

Sum of iteration counts across all workers.

§total_cpu_time_ns: u64

Sum of per-worker on-CPU time (nanoseconds), from each worker’s schedstat run time: crate::workload::WorkerReport::schedstat_cpu_time_ns, i.e. task->se.sum_exec_runtime, the FIRST /proc/<pid>/schedstat field (sched_info supplies only the run_delay/pcount fields 2/3, not the on-CPU time) and the summable per-thread proxy for the cgroup’s cpu.stat usage_usec. Denominator for Self::iterations_per_cpu_sec, the overcommit-invariant per-cell rate. 0 when no worker reported on-CPU time (the accessor then returns None).
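For orientation, /proc/<pid>/schedstat holds three whitespace-separated counters: on-CPU time in nanoseconds, run-queue wait time in nanoseconds, and the number of timeslices run. The sketch below parses that line; SchedStat, parse_schedstat, and read_schedstat are hypothetical names and are not how the crate necessarily collects the value.

```rust
/// The three whitespace-separated fields of /proc/<pid>/schedstat.
/// Hypothetical type for illustration.
#[derive(Debug, PartialEq)]
struct SchedStat {
    cpu_time_ns: u64,  // field 1: time spent on the CPU
    run_delay_ns: u64, // field 2: time spent waiting on a runqueue
    pcount: u64,       // field 3: number of timeslices run
}

/// Parse one schedstat line; None if a field is missing or not a number.
fn parse_schedstat(line: &str) -> Option<SchedStat> {
    let mut fields = line.split_whitespace().map(|f| f.parse::<u64>().ok());
    Some(SchedStat {
        cpu_time_ns: fields.next()??,
        run_delay_ns: fields.next()??,
        pcount: fields.next()??,
    })
}

/// Read and parse the file for one task (Linux only).
fn read_schedstat(pid: u32) -> Option<SchedStat> {
    let text = std::fs::read_to_string(format!("/proc/{pid}/schedstat")).ok()?;
    parse_schedstat(&text)
}
```

Summing cpu_time_ns over a cgroup's workers gives a per-thread total of the kind this field describes.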

§mean_run_delay_us: f64

Mean schedstat run delay across workers (microseconds).

§worst_run_delay_us: f64

Worst schedstat run delay across workers (microseconds).

§run_delay_measured: bool

Whether this cgroup had any worker to measure run-delay from (!run_delays.is_empty(), i.e. num_workers > 0) — false only for a worker-less cgroup, keeping a degenerate empty cohort from folding a sentinel 0.0 into the cross-run run-delay mean. Unlike wake/timer (per-sample streams a running worker may never emit), run-delay is one sched_info.run_delay value per worker, always present once a worker exists: a worker that never queued reads a real measured 0.0, not a no-measurement sentinel. sched_info.run_delay accumulates whenever CONFIG_SCHED_INFO is built in (compile-time — forced on in ktstr, selected by both CONFIG_SCHEDSTATS and CONFIG_TASK_DELAY_ACCT), with no gate on the runtime kernel.sched_schedstats key (that key gates only the schedstat_* rq/se aggregates, never run_delay), so run-delay is genuinely measured on every ktstr run and worker-presence is the correct measured predicate. Mirrors Self::wake_measured for the run-delay carrier.

§page_locality: f64

Fraction of pages on the expected NUMA node(s) (0.0-1.0). Derived from /proc/self/numa_maps and the worker’s MemPolicy.
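As a rough illustration of the idea, numa_maps reports per-node page counts as N<node>=<pages> tokens on each mapping line. The sketch below sums those tokens; the function name, its signature, and the Option return for "no pages seen" are assumptions, and the crate's handling of MemPolicy is not modeled here.

```rust
/// Fraction of pages resident on the expected NUMA nodes, computed from
/// the text of a numa_maps file. Hypothetical sketch, not the crate's code.
fn page_locality(numa_maps: &str, expected_nodes: &[u32]) -> Option<f64> {
    let (mut on_expected, mut total) = (0u64, 0u64);
    for token in numa_maps.split_whitespace() {
        // Per-node page counts look like "N0=12"; skip every other token.
        let Some(rest) = token.strip_prefix('N') else { continue };
        let Some((node, pages)) = rest.split_once('=') else { continue };
        let (Ok(node), Ok(pages)) = (node.parse::<u32>(), pages.parse::<u64>()) else {
            continue;
        };
        total += pages;
        if expected_nodes.contains(&node) {
            on_expected += pages;
        }
    }
    if total == 0 {
        None // no resident pages reported, so the fraction is undefined
    } else {
        Some(on_expected as f64 / total as f64)
    }
}
```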

§cross_node_migration_ratio: f64

Cross-node page migration ratio from /proc/vmstat numa_pages_migrated delta divided by total allocated pages.

§taobench_whole: Option<TaobenchStats>

Whole-run taobench engine COUNTER aggregate pooled across this cgroup’s crate::workload::WorkType::Taobench workers (Σ ops, MAX wall window; the window is shared by concurrent workers, per crate::workload::WorkerReport::taobench_whole). None for every non-taobench cgroup.

A RAW carrier, like Self::total_iterations / Self::total_cpu_time_ns, not a reduced ratio: the run-level cross-cgroup pool crate::assert::populate_run_pooled_taobench folds it into the total_taobench_* Counter components and the derived taobench_*_per_sec / taobench_hit_fraction / taobench_command_hit_rate Rates in Self::ext_metrics (whole-run keys visible to --noise-adjust spread, unlike the per-phase taobench_*_qps which are MetricKind::PerPhase).

Whole-run, NOT summable from the per-phase PhaseCgroupStats::taobench carriers: per-phase elapsed_ns is MAX-merged across concurrent threads, so summing phase windows is the wrong qps denominator. The engine’s authoritative whole-run aggregate is therefore shipped from the worker.

Holds COUNTERS only (TaobenchStats): the serve-latency histogram is per-phase data on PhaseCgroupStats::taobench, and the whole-run serve distribution (taobench_serve_*_us_whole) is the union of those per-phase histograms.

pub (every CgroupStats field is pub and the struct is preluded, so a test author can read the counters). #[claim(skip)]: a raw aggregate carrier, not a test-author claim surface. Assertions run against the host-derived run-level taobench_* Rate / serve-latency metrics, mirroring crate::workload::WorkerReport::taobench_whole.

§ext_metrics: BTreeMap<String, f64>

Extensible metrics for the generic comparison pipeline.

Implementations§

impl CgroupStats

pub fn wake_latency_tail_ratio(&self) -> f64

Wake-latency tail amplification: p99_wake_latency_us / median_wake_latency_us. Returns 0.0 when median_wake_latency_us <= 0.0 so the result never propagates NaN / Infinity into downstream finite_or_zero filters. Method-only access (no stored shadow) — recomputed every call from the raw fields.

Unitless; ≥1.0 whenever the median is positive, by definition of order statistics (p99 cannot undershoot the median on the same sample set), so the 0.0 guard value is the only reading below 1.0. Values far above 1.0 signal a long tail: the scheduler wakes most workers promptly but occasionally stalls some, a regression axis that neither median_* nor p99_* exposes in isolation.
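The guard described above can be sketched on a reduced stand-in type. WakeStats is hypothetical and carries only the two fields the method reads; the body follows the documented behavior but is not copied from the crate.

```rust
/// Stand-in with only the fields the ratio reads (hypothetical type).
struct WakeStats {
    p99_wake_latency_us: f64,
    median_wake_latency_us: f64,
}

impl WakeStats {
    /// p99 / median, with 0.0 returned for a non-positive median so the
    /// division never produces NaN or Infinity from a zero denominator.
    fn wake_latency_tail_ratio(&self) -> f64 {
        if self.median_wake_latency_us <= 0.0 {
            return 0.0;
        }
        self.p99_wake_latency_us / self.median_wake_latency_us
    }
}
```

A cgroup with a 50µs median and a 500µs p99 reads 10.0; a cgroup with no wake samples (both fields at the 0.0 sentinel) reads 0.0 rather than NaN.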

pub fn measured_for(&self, source: SampleSource) -> bool

Whether this cgroup measured the given distribution source. Gates the run-level carrier-less re-pool in populate_run_distribution_metrics so a cgroup that recorded no samples for source contributes ABSENCE (leaving the fold None when no cgroup measured it), not a sentinel 0.0. See Self::wake_measured / Self::timer_measured / Self::run_delay_measured.

pub fn iterations_per_worker(&self) -> Option<f64>

Throughput per parallel degree: total_iterations / num_workers. None when num_workers == 0 (no worker reported, so per-worker throughput is undefined — distinct from a measured zero); Some(0.0) when workers ran but completed zero iterations (a real throughput collapse). The None / Some(0.0) split is load-bearing: the run-level worst-cgroup re-pool in populate_run_distribution_metrics (the MetricKind::WorstLowest arm) must treat a measured zero as the worst reading (it wins the “lowest” bucket) while skipping a no-data cgroup — collapsing both to 0.0 would hide a starved cgroup behind the no-data sentinel. Method-only access (no stored shadow) — recomputed every call from the raw fields.

Only meaningful across runs of the SAME variant (equal scenario duration): cross-variant comparison is misleading because this metric is NOT rate-normalized. A longer-running scenario racks up more iterations per worker even if the scheduler is identical. perf-delta-style comparisons hold scenario, topology, and work_type constant before reading this method.
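Why the None / Some(0.0) split matters for a lowest-wins fold can be shown with a reduced stand-in. Throughput and worst_iterations_per_worker are hypothetical names; the fold only illustrates the behavior described for the WorstLowest arm and is not the crate's implementation.

```rust
/// Stand-in with only the fields the accessor reads (hypothetical type).
struct Throughput {
    total_iterations: u64,
    num_workers: usize,
}

impl Throughput {
    /// None for a worker-less cgroup; Some(0.0) for a measured collapse.
    fn iterations_per_worker(&self) -> Option<f64> {
        if self.num_workers == 0 {
            return None;
        }
        Some(self.total_iterations as f64 / self.num_workers as f64)
    }
}

/// Lowest-wins fold: skips no-data cells, keeps measured zeros.
fn worst_iterations_per_worker(cells: &[Throughput]) -> Option<f64> {
    cells
        .iter()
        .filter_map(|c| c.iterations_per_worker())
        .reduce(f64::min)
}
```

A starved cgroup (workers present, zero iterations) wins the "lowest" bucket with 0.0, while a worker-less cgroup is skipped and cannot mask or imitate it.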

pub fn iterations_per_cpu_sec(&self) -> Option<f64>

Worker iterations per CPU-second of on-CPU time consumed by this cgroup’s workers — total_iterations / (total_cpu_time_ns / 1e9).

Unlike Self::iterations_per_worker (raw work, which scales with the host-CPU budget delivered to the guest) and a wall-time rate (which also drops under host oversubscription), this is OVERCOMMIT-INVARIANT: under cpu_budget < vcpus a cell completes proportionally fewer iterations AND consumes proportionally less on-CPU time, so the ratio cancels the lost host-CPU-time factor. Use it to compare per-cgroup throughput across cpu_budget settings.

None when num_workers == 0 (no worker — undefined, distinct from a measured zero) or total_cpu_time_ns == 0 (no on-CPU time captured; returns inconclusive rather than Inf). For a pure busy-spin workload this rate is ~constant by construction, so it measures CPU-time EFFICIENCY; for the cross-cell ALLOCATION balance use ScenarioStats::cgroup_balance_ratio over iterations_per_worker.
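The cancellation argument can be checked with a small sketch. CpuRate is a hypothetical stand-in carrying only the three fields the accessor reads, and the numbers in the usage note are made up for illustration.

```rust
/// Stand-in with only the fields the accessor reads (hypothetical type).
struct CpuRate {
    total_iterations: u64,
    total_cpu_time_ns: u64,
    num_workers: usize,
}

impl CpuRate {
    /// total_iterations / (total_cpu_time_ns / 1e9).
    /// None when there is no worker or no captured on-CPU time.
    fn iterations_per_cpu_sec(&self) -> Option<f64> {
        if self.num_workers == 0 || self.total_cpu_time_ns == 0 {
            return None;
        }
        let cpu_secs = self.total_cpu_time_ns as f64 / 1e9;
        Some(self.total_iterations as f64 / cpu_secs)
    }
}
```

If a cell completes 1,000,000 iterations in 2 CPU-seconds at full budget and 500,000 iterations in 1 CPU-second at half budget, both read 500,000 iterations per CPU-second, whereas iterations_per_worker would halve.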

Trait Implementations§

impl CgroupStatsClaim for CgroupStats

fn claim_cgroup_name<'a>(&'a self, verdict: &'a mut Verdict) -> ClaimBuilder<'a, String>

fn claim_num_workers<'a>(&'a self, verdict: &'a mut Verdict) -> ClaimBuilder<'a, usize>

fn claim_cpus_used<'a>(&'a self, verdict: &'a mut Verdict) -> SetClaim<'a, usize>

fn claim_num_cpus<'a>(&'a self, verdict: &'a mut Verdict) -> ClaimBuilder<'a, usize>

fn claim_avg_off_cpu_pct<'a>(&'a self, verdict: &'a mut Verdict) -> ClaimBuilder<'a, Option<f64>>

fn claim_min_off_cpu_pct<'a>(&'a self, verdict: &'a mut Verdict) -> ClaimBuilder<'a, Option<f64>>

fn claim_max_off_cpu_pct<'a>(&'a self, verdict: &'a mut Verdict) -> ClaimBuilder<'a, Option<f64>>

fn claim_spread<'a>(&'a self, verdict: &'a mut Verdict) -> ClaimBuilder<'a, Option<f64>>

fn claim_max_gap_ms<'a>(&'a self, verdict: &'a mut Verdict) -> ClaimBuilder<'a, u64>

fn claim_max_gap_cpu<'a>(&'a self, verdict: &'a mut Verdict) -> ClaimBuilder<'a, usize>

fn claim_total_migrations<'a>(&'a self, verdict: &'a mut Verdict) -> ClaimBuilder<'a, u64>

fn claim_migration_ratio<'a>(&'a self, verdict: &'a mut Verdict) -> ClaimBuilder<'a, f64>

fn claim_p99_wake_latency_us<'a>(&'a self, verdict: &'a mut Verdict) -> ClaimBuilder<'a, f64>

fn claim_median_wake_latency_us<'a>(&'a self, verdict: &'a mut Verdict) -> ClaimBuilder<'a, f64>

fn claim_wake_latency_cv<'a>(&'a self, verdict: &'a mut Verdict) -> ClaimBuilder<'a, f64>

fn claim_wake_measured<'a>(&'a self, verdict: &'a mut Verdict) -> ClaimBuilder<'a, bool>

fn claim_median_timer_latency_us<'a>(&'a self, verdict: &'a mut Verdict) -> ClaimBuilder<'a, f64>

fn claim_p99_timer_latency_us<'a>(&'a self, verdict: &'a mut Verdict) -> ClaimBuilder<'a, f64>

fn claim_p999_timer_latency_us<'a>(&'a self, verdict: &'a mut Verdict) -> ClaimBuilder<'a, f64>

fn claim_worst_timer_latency_us<'a>(&'a self, verdict: &'a mut Verdict) -> ClaimBuilder<'a, f64>

fn claim_timer_measured<'a>(&'a self, verdict: &'a mut Verdict) -> ClaimBuilder<'a, bool>

fn claim_total_iterations<'a>(&'a self, verdict: &'a mut Verdict) -> ClaimBuilder<'a, u64>

fn claim_total_cpu_time_ns<'a>(&'a self, verdict: &'a mut Verdict) -> ClaimBuilder<'a, u64>

fn claim_mean_run_delay_us<'a>(&'a self, verdict: &'a mut Verdict) -> ClaimBuilder<'a, f64>

fn claim_worst_run_delay_us<'a>(&'a self, verdict: &'a mut Verdict) -> ClaimBuilder<'a, f64>

fn claim_run_delay_measured<'a>(&'a self, verdict: &'a mut Verdict) -> ClaimBuilder<'a, bool>

fn claim_page_locality<'a>(&'a self, verdict: &'a mut Verdict) -> ClaimBuilder<'a, f64>

fn claim_cross_node_migration_ratio<'a>(&'a self, verdict: &'a mut Verdict) -> ClaimBuilder<'a, f64>

impl Clone for CgroupStats

fn clone(&self) -> CgroupStats

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for CgroupStats

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Default for CgroupStats

fn default() -> CgroupStats

Returns the “default value” for a type.

impl<'de> Deserialize<'de> for CgroupStats

fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error>
where __D: Deserializer<'de>,

Deserialize this value from the given Serde deserializer.

impl PartialEq for CgroupStats

fn eq(&self, other: &CgroupStats) -> bool

Tests for self and other values to be equal, and is used by ==.

fn ne(&self, other: &Rhs) -> bool

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.

impl Serialize for CgroupStats

fn serialize<__S>(&self, __serializer: __S) -> Result<__S::Ok, __S::Error>
where __S: Serializer,

Serialize this value into the given Serde serializer.

impl StructuralPartialEq for CgroupStats
