pub struct CgroupStats {
pub cgroup_name: String,
pub num_workers: usize,
pub cpus_used: BTreeSet<usize>,
pub num_cpus: usize,
pub avg_off_cpu_pct: Option<f64>,
pub min_off_cpu_pct: Option<f64>,
pub max_off_cpu_pct: Option<f64>,
pub spread: Option<f64>,
pub max_gap_ms: u64,
pub max_gap_cpu: usize,
pub total_migrations: u64,
pub migration_ratio: f64,
pub p99_wake_latency_us: f64,
pub median_wake_latency_us: f64,
pub wake_latency_cv: f64,
pub wake_measured: bool,
pub median_timer_latency_us: f64,
pub p99_timer_latency_us: f64,
pub p999_timer_latency_us: f64,
pub worst_timer_latency_us: f64,
pub timer_measured: bool,
pub total_iterations: u64,
pub total_cpu_time_ns: u64,
pub mean_run_delay_us: f64,
pub worst_run_delay_us: f64,
pub run_delay_measured: bool,
pub page_locality: f64,
pub cross_node_migration_ratio: f64,
pub taobench_whole: Option<TaobenchStats>,
pub ext_metrics: BTreeMap<String, f64>,
}
Per-cgroup statistics from worker telemetry.
§Percentile convention
p99_wake_latency_us and median_wake_latency_us are computed
with the NEAREST-RANK (Type 1) percentile definition:
the value at index ceil(n * p) - 1 in sorted order, with no
interpolation between samples. This matches the percentile convention used
throughout schbench and the BPF latency histograms the project
cross-references, so a ktstr p99 reading aligns with a
schbench lat99 without adjustment. For small n (wake
reservoirs cap at MAX_WAKE_SAMPLES = 100_000 per worker;
see workload.rs) nearest-rank is also numerically stable:
interpolation between the two nearest ranks would be
implementation-defined at sample-set boundaries.
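The nearest-rank rule above can be sketched in a few lines. This is an illustrative helper only: the name nearest_rank, its signature, and the empty-slice handling are assumptions, not the crate's actual implementation.

```rust
/// Nearest-rank (Type 1) percentile: the value at index ceil(n * p) - 1
/// of the sorted samples, with no interpolation.
/// Hypothetical helper for illustration; returns None for an empty slice.
fn nearest_rank(samples: &[f64], p: f64) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let n = sorted.len();
    // ceil(n * p) is the 1-based rank; clamp it to [1, n] so that
    // p = 0.0 and p = 1.0 both stay in bounds.
    let rank = ((n as f64) * p).ceil() as usize;
    let idx = rank.clamp(1, n) - 1;
    Some(sorted[idx])
}
```

With five samples, p = 0.5 gives rank ceil(2.5) = 3, so the third-smallest sample is returned as-is rather than a value interpolated between neighbours.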
§CV pooling scope
wake_latency_cv is POOLED across every sample from every
worker in the cgroup, not a per-worker CV averaged back. That
collapses per-worker dispersion into the cgroup-wide signal:
two workers with uniformly low jitter but different means
produce a high pooled CV (mean-shift between workers inflates
stddev), while per-worker CV would show neither worker as
bad. This is intentional for the fairness threshold
(max_wake_latency_cv): a scheduler that gives worker A
10µs wakes and worker B 1ms wakes is failing fairness even if
each worker on its own is tight. Tests comparing single-worker
behavior should scope their assertions to per-worker data
rather than this aggregate.
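A small sketch of the pooling effect described above, assuming a population standard deviation (whether the crate uses the population or sample form is not stated here). The helper names cv and pooled_cv are hypothetical.

```rust
/// Coefficient of variation (population stddev / mean).
/// Returns None for an empty slice or a zero mean.
fn cv(samples: &[f64]) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    if mean == 0.0 {
        return None;
    }
    let var = samples.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n;
    Some(var.sqrt() / mean)
}

/// Pooled CV: concatenate every worker's samples first, then reduce once.
fn pooled_cv(workers: &[Vec<f64>]) -> Option<f64> {
    let all: Vec<f64> = workers.iter().flatten().copied().collect();
    cv(&all)
}
```

For one worker with wakes near 10µs and another near 1000µs, each per-worker CV stays below 0.1, while the pooled CV is close to 1.0 because the mean shift between the workers dominates the pooled stddev.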
§Derived ratios
Two metrics are DERIVED rather than measured and live as
&self methods, NOT as serde-serialized fields:
Self::wake_latency_tail_ratio (= p99/median) and
Self::iterations_per_worker (= total_iterations/num_workers).
Pre-1.0 cleanup eliminated the prior stored-field shadow and
derive_ratios stamper. Consumers always recompute on read,
so a hand-constructed fixture or a deserialized sidecar from an
older build cannot silently carry a stale ratio. The run-level
worst-cgroup tail ratio (crate::stats::MetricKind::WakeLatencyTailRatio,
an ext_metrics entry) and the iterations efficiencies
(worst_iterations_per_worker / worst_iterations_per_cpu_sec) are all
re-pooled POST-merge by populate_run_distribution_metrics — the tail
ratio as the max over Self::wake_latency_tail_ratio across per-cgroup
Self entries, the efficiencies lowest-wins from
Self::iterations_per_worker / Self::iterations_per_cpu_sec.
Fields§
cgroup_name: String
Cgroup name (the workload-handle label this telemetry belongs to),
or empty for unlabeled call sites (collect_all, bare
assert_cgroup). Set post-hoc by collect_handles (in
crate::scenario) where the name is in scope; cgroup_stats
itself has only the reports and leaves it empty. Lets a PASSING-run
consumer say which cgroup’s work landed on which CPUs.
num_workers: usize
Number of workers in this cgroup.
cpus_used: BTreeSet<usize>
Distinct CPUs the workers in this cgroup actually ran on (union of
each crate::workload::WorkerReport::cpus_used). num_cpus is
its length, kept for the existing rollups; this set surfaces WHICH
CPUs (not just how many) on every run, pass or fail.
num_cpus: usize
Distinct CPUs used across all workers in this cgroup
(cpus_used.len()).
avg_off_cpu_pct: Option<f64>
Mean off-CPU percentage across workers (off_cpu_ns /
wall_time_ns * 100). None when no worker reported a
positive wall_time_ns (off-CPU% is undefined without wall
time), which is distinct from Some(0.0), a measured “never off
CPU”. The Option keeps a not-measured cgroup from reading
as a perfectly-on-CPU one in the telemetry consumers
(ScenarioStats.cgroups).
min_off_cpu_pct: Option<f64>
Minimum off-CPU percentage across workers. None under the
same no-measurable-wall-time condition as avg_off_cpu_pct.
max_off_cpu_pct: Option<f64>
Maximum off-CPU percentage across workers. None under the
same no-measurable-wall-time condition as avg_off_cpu_pct.
spread: Option<f64>
max_off_cpu_pct - min_off_cpu_pct. Measures scheduling
fairness within the cgroup. None when off-CPU% was not
measured (no worker with positive wall time): a not-measured
cgroup is inconclusive for fairness, NOT “spread 0 = perfectly
fair”. Some(0.0) means a real measured zero spread.
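The None versus Some(0.0) distinction for the four off-CPU fields can be sketched as below. The OffCpu struct, the off_cpu_summary name, and the (off_cpu_ns, wall_time_ns) tuple input are assumptions made for illustration; the sketch also assumes that workers without positive wall time are simply left out of the reduction.

```rust
/// Hypothetical summary mirroring avg/min/max off-CPU % and spread.
#[derive(Debug, PartialEq)]
struct OffCpu {
    avg: f64,
    min: f64,
    max: f64,
    spread: f64,
}

/// Reduces (off_cpu_ns, wall_time_ns) pairs, one per worker.
/// Returns None when no worker has a positive wall time, so that
/// "not measured" never reads as "0% off CPU".
fn off_cpu_summary(workers: &[(u64, u64)]) -> Option<OffCpu> {
    let pcts: Vec<f64> = workers
        .iter()
        .filter(|w| w.1 > 0)
        .map(|w| w.0 as f64 / w.1 as f64 * 100.0)
        .collect();
    if pcts.is_empty() {
        return None;
    }
    let avg = pcts.iter().sum::<f64>() / pcts.len() as f64;
    let min = pcts.iter().copied().fold(f64::INFINITY, f64::min);
    let max = pcts.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    Some(OffCpu { avg, min, max, spread: max - min })
}
```

A cgroup whose workers all report zero wall time yields None, while workers that were measured and never left the CPU yield a real spread of 0.0.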
max_gap_ms: u64
Longest scheduling gap across all workers (ms).
max_gap_cpu: usize
CPU where the longest scheduling gap occurred.
total_migrations: u64
Sum of CPU migration counts across all workers.
migration_ratio: f64
Migrations per iteration (total_migrations / total_iterations).
p99_wake_latency_us: f64
99th percentile wake latency across all workers (microseconds).
median_wake_latency_us: f64
Median wake latency across all workers (microseconds).
wake_latency_cv: f64
Coefficient of variation (stddev / mean) of wake latencies.
Computed over the POOLED latency samples from every worker in
the cgroup, not as a mean of per-worker CVs. Per-worker
dispersion therefore cannot be recovered from this value: a cgroup with one tight
worker and one wildly variable worker can report a moderate
pooled CV that looks healthier than the variable worker's own CV,
while a mean shift between individually tight workers inflates it
(see CV pooling scope above). Use
WorkerReport::wake_latencies_ns directly if per-worker
CV is needed.
wake_measured: bool
Whether any worker in this cgroup recorded a wake-latency sample.
When false, the wake reductions above (p99_wake_latency_us,
median_wake_latency_us, wake_latency_cv) hold a not-measured sentinel
0.0 rather than a measured zero: a percentile over zero samples is
undefined, not “instant wakes”. The run-level distributional re-pool
(populate_run_distribution_metrics) reads this to EXCLUDE a
no-wake-sample cgroup from the cross-run mean instead of folding its
0.0 in (which, for the LowerBetter wake metrics, would falsely drag
the mean toward “perfect”). Same not-measured-vs-measured-zero
discipline the off-CPU% Option fields above carry.
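To illustrate why the flag matters, here is a minimal sketch of a mean that skips not-measured cgroups. The CgroupWake struct and mean_measured_p99 function are hypothetical stand-ins, not the logic of populate_run_distribution_metrics itself.

```rust
/// Hypothetical two-field view of a cgroup's wake statistics.
struct CgroupWake {
    p99_wake_latency_us: f64,
    wake_measured: bool,
}

/// Mean p99 over cgroups that actually recorded wake samples.
/// Returns None when no cgroup measured anything, instead of 0.0.
fn mean_measured_p99(cgroups: &[CgroupWake]) -> Option<f64> {
    let measured: Vec<f64> = cgroups
        .iter()
        .filter(|c| c.wake_measured)
        .map(|c| c.p99_wake_latency_us)
        .collect();
    if measured.is_empty() {
        None
    } else {
        Some(measured.iter().sum::<f64>() / measured.len() as f64)
    }
}
```

With one cgroup at a measured 200µs and one carrying the sentinel 0.0, the gated mean stays at 200µs; an ungated mean would report 100µs and make the run look twice as good as it was.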
median_timer_latency_us: f64
Median timer latency across all workers (microseconds): the
crate::workload::WorkType::TimerLatency cyclictest probe’s per-cgroup
pooled reduction over
crate::workload::WorkerReport::timer_latencies_ns. 0.0 when no
worker recorded timer samples.
p99_timer_latency_us: f64
99th-percentile timer latency across all workers (microseconds). See
Self::median_timer_latency_us.
p999_timer_latency_us: f64
99.9th-percentile (deep-tail) timer latency across all workers
(microseconds). See Self::median_timer_latency_us.
worst_timer_latency_us: f64
Worst (maximum) timer latency across all workers (microseconds). See
Self::median_timer_latency_us.
timer_measured: bool
Whether any worker in this cgroup recorded a timer-latency sample.
When false, the timer reductions above hold a not-measured sentinel 0.0
(no crate::workload::WorkType::TimerLatency worker ran), distinct
from a measured zero. Read by the run-level re-pool to EXCLUDE a
no-timer-sample cgroup from the cross-run mean. Mirrors
Self::wake_measured for the timer carrier.
total_iterations: u64
Sum of iteration counts across all workers.
total_cpu_time_ns: u64
Sum of per-worker on-CPU time (nanoseconds), from each worker’s
schedstat run time (crate::workload::WorkerReport::schedstat_cpu_time_ns).
That value is task->se.sum_exec_runtime, the FIRST /proc/<pid>/schedstat
field; sched_info supplies only the run_delay/pcount fields 2/3, not the
on-CPU time. It is the summable per-thread proxy for the cgroup’s
cpu.stat usage_usec.
Denominator for Self::iterations_per_cpu_sec, the
overcommit-invariant per-cell rate. 0 when no worker reported on-CPU
time (the accessor then returns None).
mean_run_delay_us: f64
Mean schedstat run delay across workers (microseconds).
worst_run_delay_us: f64
Worst schedstat run delay across workers (microseconds).
run_delay_measured: bool
Whether this cgroup had any worker to measure run-delay from
(!run_delays.is_empty(), i.e. num_workers > 0). It is false only for a
worker-less cgroup, keeping a degenerate empty cohort from folding a
sentinel 0.0 into the cross-run run-delay mean. Unlike wake/timer
(per-sample streams a running worker may never emit), run-delay is one
sched_info.run_delay value per worker, always present once a worker
exists: a worker that never queued reads a real measured 0.0, not a
no-measurement sentinel. sched_info.run_delay accumulates whenever
CONFIG_SCHED_INFO is built in (a compile-time option, forced on in ktstr and
selected by both CONFIG_SCHEDSTATS and CONFIG_TASK_DELAY_ACCT),
with no gate on the runtime kernel.sched_schedstats key (that key
gates only the schedstat_* rq/se aggregates, never run_delay), so
run-delay is genuinely measured on every ktstr run and worker-presence
is the correct measured predicate. Mirrors Self::wake_measured for
the run-delay carrier.
page_locality: f64
Fraction of pages on the expected NUMA node(s) (0.0-1.0).
Derived from /proc/self/numa_maps and the worker’s
MemPolicy.
cross_node_migration_ratio: f64
Cross-node page migration ratio from /proc/vmstat
numa_pages_migrated delta divided by total allocated pages.
taobench_whole: Option<TaobenchStats>
Whole-run taobench engine COUNTER aggregate pooled across this cgroup’s
crate::workload::WorkType::Taobench workers (Σ ops, MAX wall window;
the window is shared by concurrent workers, per
crate::workload::WorkerReport::taobench_whole). None for every
non-taobench cgroup.
A RAW carrier, like Self::total_iterations /
Self::total_cpu_time_ns, not a reduced ratio: the run-level
cross-cgroup pool crate::assert::populate_run_pooled_taobench folds it
into the total_taobench_* Counter components and the derived
taobench_*_per_sec / taobench_hit_fraction / taobench_command_hit_rate
Rates in Self::ext_metrics (whole-run keys visible to --noise-adjust
spread, unlike the per-phase taobench_*_qps which are
MetricKind::PerPhase).
Whole-run, NOT summable from the per-phase
PhaseCgroupStats::taobench carriers (per-phase elapsed_ns is
MAX-merged across concurrent threads, so summing phase windows is the
wrong qps denominator), so the engine’s authoritative whole-run aggregate
is shipped from the worker. Holds COUNTERS only
(TaobenchStats): the
serve-latency histogram is per-phase data on PhaseCgroupStats::taobench,
and the whole-run serve distribution (taobench_serve_*_us_whole) is the
union of those per-phase histograms.
pub (every CgroupStats field is
pub and the struct is preluded, so a test author can read the counters).
#[claim(skip)]: a raw aggregate carrier, not a test-author claim
surface. Assertions run against the host-derived run-level taobench_*
Rate / serve-latency metrics, mirroring
crate::workload::WorkerReport::taobench_whole.
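The "Σ ops, MAX wall window" pooling can be illustrated with a reduced stand-in type. The Counters struct, its two fields, and the pool / ops_per_sec helpers are hypothetical; the real TaobenchStats layout is not shown in this page.

```rust
/// Hypothetical stand-in for a counter carrier: an op count plus the
/// wall window (nanoseconds) it was collected over.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Counters {
    ops: u64,
    elapsed_ns: u64,
}

/// Pools concurrent workers: ops are summed, but the wall window is
/// MAX-merged because the workers ran over the same interval.
fn pool(workers: &[Counters]) -> Option<Counters> {
    if workers.is_empty() {
        return None;
    }
    Some(Counters {
        ops: workers.iter().map(|w| w.ops).sum(),
        elapsed_ns: workers.iter().map(|w| w.elapsed_ns).max().unwrap_or(0),
    })
}

/// Rate derived from a pooled carrier; None for an empty window.
fn ops_per_sec(c: Counters) -> Option<f64> {
    if c.elapsed_ns == 0 {
        None
    } else {
        Some(c.ops as f64 / (c.elapsed_ns as f64 / 1e9))
    }
}
```

Two workers that each complete 1000 ops in the same one-second window pool to 2000 ops per second. Summing the windows instead would halve the reported rate, which is the wrong-denominator problem the field documentation describes.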
ext_metrics: BTreeMap<String, f64>
Extensible metrics for the generic comparison pipeline.
Implementations§
impl CgroupStats
pub fn wake_latency_tail_ratio(&self) -> f64
Wake-latency tail amplification:
p99_wake_latency_us / median_wake_latency_us. Returns 0.0
when median_wake_latency_us <= 0.0 so the result never
propagates NaN / Infinity into downstream
finite_or_zero filters. Method-only access (no stored
shadow): recomputed on every call from the raw fields.
Unitless; ≥1.0 whenever the median is positive, by definition of
order statistics (p99 cannot
undershoot the median on the same sample set). Values far
above 1.0 signal a long tail: the scheduler wakes most
workers promptly but occasionally stalls some, a regression
axis that neither median_* nor p99_* exposes in
isolation.
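The guard described above amounts to a single branch. The free function below is a sketch that takes the two fields as arguments; the name tail_ratio is an assumption, and the real method reads the values from self.

```rust
/// Tail amplification p99 / median, with the zero-median guard.
/// Hypothetical free-function form of the method described above.
fn tail_ratio(p99_us: f64, median_us: f64) -> f64 {
    if median_us <= 0.0 {
        // No usable median: return 0.0 instead of NaN or Infinity.
        0.0
    } else {
        p99_us / median_us
    }
}
```

A cgroup with a 50µs median and a 500µs p99 reads as a tail ratio of 10.0, while a cgroup that recorded no wake samples (both fields 0.0) reads as 0.0.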
pub fn measured_for(&self, source: SampleSource) -> bool
Whether this cgroup measured the given distribution source. Gates the
run-level carrier-less re-pool in populate_run_distribution_metrics
so a cgroup that recorded no samples for source contributes ABSENCE
(leaving the fold None when no cgroup measured it), not a sentinel
0.0. See Self::wake_measured / Self::timer_measured /
Self::run_delay_measured.
pub fn iterations_per_worker(&self) -> Option<f64>
Throughput per parallel degree:
total_iterations / num_workers. None when
num_workers == 0 (no worker reported, so per-worker
throughput is undefined, distinct from a measured zero);
Some(0.0) when workers ran but completed zero iterations
(a real throughput collapse). The None / Some(0.0) split
is load-bearing: the run-level worst-cgroup re-pool in
populate_run_distribution_metrics (the
MetricKind::WorstLowest arm) must treat a measured zero as
the worst reading (it wins the “lowest” bucket) while skipping
a no-data cgroup. Collapsing both to 0.0 would hide a
starved cgroup behind the no-data sentinel. Method-only
access (no stored shadow): recomputed on every call from the
raw fields.
Only meaningful across runs of the SAME variant (equal
scenario duration): cross-variant comparison is misleading
because this metric is NOT rate-normalized. A longer-running
scenario racks up more iterations per worker even if
the scheduler is identical. perf-delta-style
comparisons hold scenario, topology, and work_type constant
before reading this method.
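The None / Some(0.0) split and the lowest-wins fold can be sketched as follows. Both functions are illustrative assumptions: iterations_per_worker is written as a free function over the two raw fields, and worst_lowest is a hypothetical reduction, not the actual MetricKind::WorstLowest code.

```rust
/// Per-worker throughput; None when there is no worker at all.
fn iterations_per_worker(total_iterations: u64, num_workers: usize) -> Option<f64> {
    if num_workers == 0 {
        None
    } else {
        Some(total_iterations as f64 / num_workers as f64)
    }
}

/// Lowest-wins fold: skips no-data (None) readings but lets a
/// measured zero win the "lowest" bucket.
fn worst_lowest(readings: &[Option<f64>]) -> Option<f64> {
    readings
        .iter()
        .flatten()
        .copied()
        .fold(None, |acc: Option<f64>, x: f64| match acc {
            None => Some(x),
            Some(a) => Some(a.min(x)),
        })
}
```

Folding [Some(100.0), None, Some(0.0)] yields Some(0.0), so the starved cgroup is reported; if the None had been flattened to 0.0 beforehand, a healthy run with one worker-less cgroup would be indistinguishable from a starved one.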
pub fn iterations_per_cpu_sec(&self) -> Option<f64>
Worker iterations per CPU-second of on-CPU time consumed by this
cgroup’s workers: total_iterations / (total_cpu_time_ns / 1e9).
Unlike Self::iterations_per_worker (raw work, which scales with
the host-CPU budget delivered to the guest) and a wall-time rate
(which also drops under host oversubscription), this is
OVERCOMMIT-INVARIANT: under cpu_budget < vcpus a cell completes
proportionally fewer iterations AND consumes proportionally less
on-CPU time, so the ratio cancels the lost host-CPU-time factor. Use
it to compare per-cgroup throughput across cpu_budget settings.
None when num_workers == 0 (no worker, so the rate is undefined, distinct from a
measured zero) or total_cpu_time_ns == 0 (no on-CPU time captured;
the method reports inconclusive rather than Inf). For a pure busy-spin
workload this rate is ~constant by construction, so it measures
CPU-time EFFICIENCY; for the cross-cell ALLOCATION balance use
ScenarioStats::cgroup_balance_ratio over iterations_per_worker.
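A worked sketch of the overcommit-invariance argument, using a hypothetical free-function form of the accessor (the real method reads its inputs from self):

```rust
/// Iterations per CPU-second of on-CPU time.
/// None when there is no worker or no on-CPU time was captured.
fn iterations_per_cpu_sec(
    total_iterations: u64,
    total_cpu_time_ns: u64,
    num_workers: usize,
) -> Option<f64> {
    if num_workers == 0 || total_cpu_time_ns == 0 {
        return None;
    }
    Some(total_iterations as f64 / (total_cpu_time_ns as f64 / 1e9))
}
```

Suppose a cell completes 1,000,000 iterations in 2 s of on-CPU time at full budget, and 500,000 iterations in 1 s of on-CPU time when the host delivers half the budget. Both cases reduce to 500,000 iterations per CPU-second, whereas the raw per-worker iteration count would have halved.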
Trait Implementations§
impl CgroupStatsClaim for CgroupStats
fn claim_cgroup_name<'a>( &'a self, verdict: &'a mut Verdict, ) -> ClaimBuilder<'a, String>
fn claim_num_workers<'a>( &'a self, verdict: &'a mut Verdict, ) -> ClaimBuilder<'a, usize>
fn claim_cpus_used<'a>( &'a self, verdict: &'a mut Verdict, ) -> SetClaim<'a, usize>
fn claim_num_cpus<'a>( &'a self, verdict: &'a mut Verdict, ) -> ClaimBuilder<'a, usize>
fn claim_avg_off_cpu_pct<'a>( &'a self, verdict: &'a mut Verdict, ) -> ClaimBuilder<'a, Option<f64>>
fn claim_min_off_cpu_pct<'a>( &'a self, verdict: &'a mut Verdict, ) -> ClaimBuilder<'a, Option<f64>>
fn claim_max_off_cpu_pct<'a>( &'a self, verdict: &'a mut Verdict, ) -> ClaimBuilder<'a, Option<f64>>
fn claim_spread<'a>( &'a self, verdict: &'a mut Verdict, ) -> ClaimBuilder<'a, Option<f64>>
fn claim_max_gap_ms<'a>( &'a self, verdict: &'a mut Verdict, ) -> ClaimBuilder<'a, u64>
fn claim_max_gap_cpu<'a>( &'a self, verdict: &'a mut Verdict, ) -> ClaimBuilder<'a, usize>
fn claim_total_migrations<'a>( &'a self, verdict: &'a mut Verdict, ) -> ClaimBuilder<'a, u64>
fn claim_migration_ratio<'a>( &'a self, verdict: &'a mut Verdict, ) -> ClaimBuilder<'a, f64>
fn claim_p99_wake_latency_us<'a>( &'a self, verdict: &'a mut Verdict, ) -> ClaimBuilder<'a, f64>
fn claim_median_wake_latency_us<'a>( &'a self, verdict: &'a mut Verdict, ) -> ClaimBuilder<'a, f64>
fn claim_wake_latency_cv<'a>( &'a self, verdict: &'a mut Verdict, ) -> ClaimBuilder<'a, f64>
fn claim_wake_measured<'a>( &'a self, verdict: &'a mut Verdict, ) -> ClaimBuilder<'a, bool>
fn claim_median_timer_latency_us<'a>( &'a self, verdict: &'a mut Verdict, ) -> ClaimBuilder<'a, f64>
fn claim_p99_timer_latency_us<'a>( &'a self, verdict: &'a mut Verdict, ) -> ClaimBuilder<'a, f64>
fn claim_p999_timer_latency_us<'a>( &'a self, verdict: &'a mut Verdict, ) -> ClaimBuilder<'a, f64>
fn claim_worst_timer_latency_us<'a>( &'a self, verdict: &'a mut Verdict, ) -> ClaimBuilder<'a, f64>
fn claim_timer_measured<'a>( &'a self, verdict: &'a mut Verdict, ) -> ClaimBuilder<'a, bool>
fn claim_total_iterations<'a>( &'a self, verdict: &'a mut Verdict, ) -> ClaimBuilder<'a, u64>
fn claim_total_cpu_time_ns<'a>( &'a self, verdict: &'a mut Verdict, ) -> ClaimBuilder<'a, u64>
fn claim_mean_run_delay_us<'a>( &'a self, verdict: &'a mut Verdict, ) -> ClaimBuilder<'a, f64>
fn claim_worst_run_delay_us<'a>( &'a self, verdict: &'a mut Verdict, ) -> ClaimBuilder<'a, f64>
fn claim_run_delay_measured<'a>( &'a self, verdict: &'a mut Verdict, ) -> ClaimBuilder<'a, bool>
fn claim_page_locality<'a>( &'a self, verdict: &'a mut Verdict, ) -> ClaimBuilder<'a, f64>
fn claim_cross_node_migration_ratio<'a>( &'a self, verdict: &'a mut Verdict, ) -> ClaimBuilder<'a, f64>
impl Clone for CgroupStats
fn clone(&self) -> CgroupStats
fn clone_from(&mut self, source: &Self)
impl Debug for CgroupStats
impl Default for CgroupStats
fn default() -> CgroupStats
impl<'de> Deserialize<'de> for CgroupStats
fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error>
where
    __D: Deserializer<'de>,
impl PartialEq for CgroupStats
impl Serialize for CgroupStats
impl StructuralPartialEq for CgroupStats
Auto Trait Implementations§
impl Freeze for CgroupStats
impl RefUnwindSafe for CgroupStats
impl Send for CgroupStats
impl Sync for CgroupStats
impl Unpin for CgroupStats
impl UnwindSafe for CgroupStats
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise.