Struct CgroupStats

pub struct CgroupStats {
    pub cgroup_name: String,
    pub num_workers: usize,
    pub cpus_used: BTreeSet<usize>,
    pub num_cpus: usize,
    pub avg_off_cpu_pct: Option<f64>,
    pub min_off_cpu_pct: Option<f64>,
    pub max_off_cpu_pct: Option<f64>,
    pub spread: Option<f64>,
    pub max_gap_ms: u64,
    pub max_gap_cpu: usize,
    pub total_migrations: u64,
    pub migration_ratio: f64,
    pub p99_wake_latency_us: f64,
    pub median_wake_latency_us: f64,
    pub wake_latency_cv: f64,
    pub wake_measured: bool,
    pub median_timer_latency_us: f64,
    pub p99_timer_latency_us: f64,
    pub p999_timer_latency_us: f64,
    pub worst_timer_latency_us: f64,
    pub timer_measured: bool,
    pub total_iterations: u64,
    pub total_cpu_time_ns: u64,
    pub mean_run_delay_us: f64,
    pub worst_run_delay_us: f64,
    pub run_delay_measured: bool,
    pub page_locality: f64,
    pub cross_node_migration_ratio: f64,
    pub taobench_whole: Option<TaobenchStats>,
    pub ext_metrics: BTreeMap<String, f64>,
}

Per-cgroup statistics from worker telemetry.

§Percentile convention

p99_wake_latency_us and median_wake_latency_us are computed with the NEAREST-RANK (Type 1) percentile definition: the value at index ceil(n * p) - 1 in sorted order, with no interpolation between samples. This matches the percentile convention used throughout schbench and the BPF latency histograms the project cross-references, so a ktstr p99 reading aligns with a schbench lat99 without adjustment. Nearest-rank is also numerically stable for small n (wake reservoirs cap at MAX_WAKE_SAMPLES = 100_000 per worker; see workload.rs), where interpolating between the two nearest ranks would be implementation-defined at sample-set boundaries.
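The nearest-rank rule above can be sketched in a few lines. This is a minimal illustration, not the crate's implementation: `nearest_rank` is a hypothetical helper name, and the clamping for p = 0.0 and the `None` result for empty input are assumptions of the sketch.

```rust
/// Nearest-rank (Type 1) percentile: the value at index ceil(n * p) - 1
/// of the sorted samples, with no interpolation between neighbours.
/// `nearest_rank` is a hypothetical name used only for this sketch.
fn nearest_rank(samples: &[u64], p: f64) -> Option<u64> {
    // A percentile over zero samples is undefined; a NaN or out-of-range
    // `p` is rejected by the same range check.
    if samples.is_empty() || !(0.0..=1.0).contains(&p) {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    let n = sorted.len();
    // rank = ceil(n * p); the index is rank - 1, clamped so that p = 0.0
    // maps to the first sample instead of underflowing.
    let rank = (n as f64 * p).ceil() as usize;
    let idx = rank.saturating_sub(1).min(n - 1);
    Some(sorted[idx])
}
```

With five samples, p = 0.5 gives ceil(2.5) - 1 = index 2, so the result is always one of the observed values rather than a blend of two.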

§CV pooling scope

wake_latency_cv is POOLED across every sample from every worker in the cgroup, not a per-worker CV averaged back. That collapses per-worker dispersion into the cgroup-wide signal: two workers with uniformly low jitter but different means produce a high pooled CV (mean-shift between workers inflates stddev), while per-worker CV would show neither worker as bad. This is intentional for the fairness threshold (max_wake_latency_cv): a scheduler that gives worker A 10µs wakes and worker B 1ms wakes is failing fairness even if each worker on its own is tight. Tests comparing single-worker behavior should scope their assertions to per-worker data rather than this aggregate.
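A small numeric sketch of the pooling effect described above. The helper names `cv` and `pooled_cv` are hypothetical, and the use of the population standard deviation (dividing by n) is an assumption; the crate may use the sample form.

```rust
/// Coefficient of variation (stddev / mean) using the population
/// standard deviation. Returns None for empty or zero-mean input.
fn cv(samples: &[f64]) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    if mean == 0.0 {
        return None;
    }
    let var = samples.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n;
    Some(var.sqrt() / mean)
}

/// Pool every sample from every worker, then take a single CV over the
/// combined set (as opposed to averaging per-worker CVs).
fn pooled_cv(workers: &[Vec<f64>]) -> Option<f64> {
    let all: Vec<f64> = workers.iter().flatten().copied().collect();
    cv(&all)
}
```

Two workers that each see perfectly steady wakes, one at 10 µs and one at 1000 µs, have a per-worker CV of 0.0 each, yet the pooled CV is 495 / 505 ≈ 0.98 because the mean shift between workers shows up as dispersion.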

§Derived ratios

Two metrics are DERIVED rather than measured and live as &self methods, NOT as serde-serialized fields: Self::wake_latency_tail_ratio (= p99/median) and Self::iterations_per_worker (= total_iterations/num_workers). Pre-1.0 cleanup eliminated the prior stored-field shadow and the derive_ratios stamper. Consumers always recompute on read, so a hand-constructed fixture or a deserialized sidecar from an older build cannot silently carry a stale ratio.

The run-level worst-cgroup tail ratio (crate::stats::MetricKind::WakeLatencyTailRatio, an ext_metrics entry) and the iterations efficiencies (worst_iterations_per_worker / worst_iterations_per_cpu_sec) are all re-pooled POST-merge by populate_run_distribution_metrics: the tail ratio as the max over Self::wake_latency_tail_ratio across the per-cgroup Self entries, the efficiencies lowest-wins from Self::iterations_per_worker / Self::iterations_per_cpu_sec.
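The recompute-on-read pattern and the run-level reduction can be sketched as follows. `Cg` is a hypothetical stand-in that carries only the fields these ratios need, the two free functions are illustrative names, and returning 0.0 for a zero median is an assumption of this sketch rather than documented behaviour.

```rust
/// Hypothetical stand-in for the struct; only the inputs to the ratios.
struct Cg {
    p99_wake_latency_us: f64,
    median_wake_latency_us: f64,
    total_iterations: u64,
    num_workers: usize,
}

impl Cg {
    /// p99 / median, recomputed on every call so no stale value is stored.
    /// ASSUMPTION: a zero median yields 0.0 in this sketch.
    fn wake_latency_tail_ratio(&self) -> f64 {
        if self.median_wake_latency_us > 0.0 {
            self.p99_wake_latency_us / self.median_wake_latency_us
        } else {
            0.0
        }
    }

    /// total_iterations / num_workers, or None for a worker-less cgroup.
    fn iterations_per_worker(&self) -> Option<f64> {
        if self.num_workers == 0 {
            return None;
        }
        Some(self.total_iterations as f64 / self.num_workers as f64)
    }
}

/// Run-level worst tail ratio: the maximum across cgroups.
fn worst_tail_ratio(cgs: &[Cg]) -> Option<f64> {
    cgs.iter().map(Cg::wake_latency_tail_ratio).reduce(f64::max)
}

/// Run-level worst efficiency: lowest wins, skipping undefined entries.
fn worst_iterations_per_worker(cgs: &[Cg]) -> Option<f64> {
    cgs.iter().filter_map(Cg::iterations_per_worker).reduce(f64::min)
}
```

Because the ratios are methods over the raw fields, a fixture only needs to set p99, median, iteration count, and worker count; there is no second copy of the ratio to keep in sync.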

Fields§

§cgroup_name: String

Cgroup name (the workload-handle label this telemetry belongs to), or empty for unlabeled call sites (collect_all, bare assert_cgroup). Set post-hoc by collect_handles (in crate::scenario) where the name is in scope; cgroup_stats itself has only the reports and leaves it empty. Lets a PASSING-run consumer say which cgroup’s work landed on which CPUs.

§num_workers: usize

Number of workers in this cgroup.

§cpus_used: BTreeSet<usize>

Distinct CPUs the workers in this cgroup actually ran on (union of each crate::workload::WorkerReport::cpus_used). num_cpus is its length, kept for the existing rollups; this set surfaces WHICH CPUs (not just how many) on every run, pass or fail.

§num_cpus: usize

Distinct CPUs used across all workers in this cgroup (cpus_used.len()).

§avg_off_cpu_pct: Option<f64>

Mean off-CPU percentage across workers (off_cpu_ns / wall_time_ns * 100). None when no worker reported a positive wall_time_ns (off-CPU% is undefined without wall time) — distinct from Some(0.0), a measured “never off CPU”. The Option keeps a not-measured cgroup from reading as a perfectly-on-CPU one in the telemetry consumers (ScenarioStats.cgroups).

§min_off_cpu_pct: Option<f64>

Minimum off-CPU percentage across workers. None under the same no-measurable-wall-time condition as avg_off_cpu_pct.

§max_off_cpu_pct: Option<f64>

Maximum off-CPU percentage across workers. None under the same no-measurable-wall-time condition as avg_off_cpu_pct.

§spread: Option<f64>

max_off_cpu_pct - min_off_cpu_pct. Measures scheduling fairness within the cgroup. None when off-CPU% was not measured (no worker with positive wall time) — a not-measured cgroup is inconclusive for fairness, NOT “spread 0 = perfectly fair”. Some(0.0) means a real measured zero spread.
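The four off-CPU fields share one "measured or None" rule, which might look like the sketch below. `Worker` and `off_cpu_summary` are hypothetical names, and skipping zero-wall-time workers when at least one other worker has wall time is an assumption; the source only states the all-unmeasured case.

```rust
/// Hypothetical per-worker inputs; only the two counters used here.
struct Worker {
    off_cpu_ns: u64,
    wall_time_ns: u64,
}

/// Returns (avg, min, max, spread) of off-CPU% across workers, or None
/// when no worker reported a positive wall time.
fn off_cpu_summary(workers: &[Worker]) -> Option<(f64, f64, f64, f64)> {
    // off-CPU% is undefined without wall time, so those workers are skipped.
    let pcts: Vec<f64> = workers
        .iter()
        .filter(|w| w.wall_time_ns > 0)
        .map(|w| w.off_cpu_ns as f64 / w.wall_time_ns as f64 * 100.0)
        .collect();
    if pcts.is_empty() {
        return None; // not measured, which is distinct from a measured 0.0
    }
    let avg = pcts.iter().sum::<f64>() / pcts.len() as f64;
    let min = pcts.iter().copied().fold(f64::INFINITY, f64::min);
    let max = pcts.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    Some((avg, min, max, max - min))
}
```

A consumer can then tell "inconclusive" (`None`) apart from "perfectly fair" (`Some` with a spread of 0.0) with an ordinary `match`.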

§max_gap_ms: u64

Longest scheduling gap across all workers (ms).

§max_gap_cpu: usize

CPU where the longest scheduling gap occurred.

§total_migrations: u64

Sum of CPU migration counts across all workers.

§migration_ratio: f64

Migrations per iteration (total_migrations / total_iterations).

§p99_wake_latency_us: f64

99th percentile wake latency across all workers (microseconds).

§median_wake_latency_us: f64

Median wake latency across all workers (microseconds).

§wake_latency_cv: f64

Coefficient of variation (stddev / mean) of wake latencies.

Computed over the POOLED latency samples from every worker in the cgroup, not as a mean of per-worker CVs. Per-worker dispersion is therefore masked: a cgroup with one tight worker and one wildly variable worker can report a moderate pooled CV that looks healthier than the variable worker does on its own. Use WorkerReport::wake_latencies_ns directly if per-worker CV is needed.

§wake_measured: bool

Whether any worker in this cgroup recorded a wake-latency sample. false makes the wake reductions above (p99_wake_latency_us, median_wake_latency_us, wake_latency_cv) a not-measured sentinel 0.0 rather than a measured zero — a percentile over zero samples is undefined, not “instant wakes”. The run-level distributional re-pool (populate_run_distribution_metrics) reads this to EXCLUDE a no-wake-sample cgroup from the cross-run mean instead of folding its 0.0 in (which, for the LowerBetter wake metrics, would falsely drag the mean toward “perfect”). Same not-measured-vs-measured-zero discipline the off-CPU% Option fields above carry.
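To illustrate why the flag matters, here is a simplified mean that skips not-measured cgroups. It is only a sketch of the exclusion idea, not the logic of populate_run_distribution_metrics; `CgroupWake` and `mean_p99_over_measured` are hypothetical names.

```rust
/// Hypothetical view of one cgroup: the measured flag plus one reduction.
struct CgroupWake {
    wake_measured: bool,
    p99_wake_latency_us: f64,
}

/// Mean p99 over cgroups that actually recorded wake samples.
/// A cgroup with wake_measured == false carries a sentinel 0.0 that must
/// not be folded in, or a lower-is-better mean would look too good.
fn mean_p99_over_measured(cgs: &[CgroupWake]) -> Option<f64> {
    let measured: Vec<f64> = cgs
        .iter()
        .filter(|c| c.wake_measured)
        .map(|c| c.p99_wake_latency_us)
        .collect();
    if measured.is_empty() {
        return None; // nothing measured anywhere: no mean to report
    }
    Some(measured.iter().sum::<f64>() / measured.len() as f64)
}
```

With p99 values of 100 µs and 300 µs plus one unmeasured cgroup, the filtered mean is 200 µs; naively including the sentinel would report about 133 µs.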

§median_timer_latency_us: f64

Median timer-latency across all workers (microseconds) — the crate::workload::WorkType::TimerLatency cyclictest probe’s per-cgroup pooled reduction over crate::workload::WorkerReport::timer_latencies_ns. 0.0 when no worker recorded timer samples.

§p99_timer_latency_us: f64

99th-percentile timer-latency across all workers (microseconds). See Self::median_timer_latency_us.

§p999_timer_latency_us: f64

99.9th-percentile (deep-tail) timer-latency across all workers (microseconds). See Self::median_timer_latency_us.

§worst_timer_latency_us: f64

Worst (maximum) timer-latency across all workers (microseconds). See Self::median_timer_latency_us.

§timer_measured: bool

Whether any worker in this cgroup recorded a timer-latency sample. false makes the timer reductions above a not-measured sentinel 0.0 (no crate::workload::WorkType::TimerLatency worker ran), distinct from a measured zero. Read by the run-level re-pool to EXCLUDE a no-timer-sample cgroup from the cross-run mean. Mirrors Self::wake_measured for the timer carrier.

§total_iterations: u64

Sum of iteration counts across all workers.

§total_cpu_time_ns: u64

Sum of per-worker on-CPU time (nanoseconds), taken from each worker’s schedstat run time (crate::workload::WorkerReport::schedstat_cpu_time_ns). That value is task->se.sum_exec_runtime, the FIRST field of /proc/<pid>/schedstat; sched_info supplies only fields 2 and 3 (run_delay and pcount), not the on-CPU time. It is the summable per-thread proxy for the cgroup’s cpu.stat usage_usec. Denominator for Self::iterations_per_cpu_sec, the overcommit-invariant per-cell rate. 0 when no worker reported on-CPU time (the accessor then returns None).
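For orientation, a sketch of reading the three /proc/<pid>/schedstat fields and turning summed on-CPU time into a rate. `parse_schedstat` is a hypothetical helper, and the free function `iterations_per_cpu_sec` is only an assumed shape for the accessor of the same name (iterations divided by CPU seconds, `None` on a zero denominator).

```rust
/// Parse one /proc/<pid>/schedstat line into
/// (on-CPU time ns, run delay ns, timeslice count).
/// Returns None if a field is missing or not an unsigned integer.
fn parse_schedstat(line: &str) -> Option<(u64, u64, u64)> {
    let mut fields = line.split_whitespace().map(|f| f.parse::<u64>().ok());
    // First `?` handles a missing field, second a failed parse.
    Some((fields.next()??, fields.next()??, fields.next()??))
}

/// ASSUMED formula: iterations per second of on-CPU time.
/// None when no on-CPU time was reported, so callers never divide by zero.
fn iterations_per_cpu_sec(total_iterations: u64, total_cpu_time_ns: u64) -> Option<f64> {
    if total_cpu_time_ns == 0 {
        return None;
    }
    Some(total_iterations as f64 / (total_cpu_time_ns as f64 / 1e9))
}
```

Dividing by on-CPU time rather than wall time is what keeps the rate comparable when more workers than CPUs are runnable: time spent waiting on a runqueue does not enter the denominator.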

§mean_run_delay_us: f64

Mean schedstat run delay across workers (microseconds).

§worst_run_delay_us: f64

Worst schedstat run delay across workers (microseconds).

§run_delay_measured: bool

Whether this cgroup had any worker to measure run-delay from (!run_delays.is_empty(), i.e. num_workers > 0) — false only for a worker-less cgroup, keeping a degenerate empty cohort from folding a sentinel 0.0 into the cross-run run-delay mean. Unlike wake/timer (per-sample streams a running worker may never emit), run-delay is one sched_info.run_delay value per worker, always present once a worker exists: a worker that never queued reads a real measured 0.0, not a no-measurement sentinel. sched_info.run_delay accumulates whenever CONFIG_SCHED_INFO is built in (compile-time — forced on in ktstr, selected by both CONFIG_SCHEDSTATS and CONFIG_TASK_DELAY_ACCT), with no gate on the runtime kernel.sched_schedstats key (that key gates only the schedstat_* rq/se aggregates, never run_delay), so run-delay is genuinely measured on every ktstr run and worker-presence is the correct measured predicate. Mirrors Self::wake_measured for the run-delay carrier.

§page_locality: f64

Fraction of pages on the expected NUMA node(s) (0.0-1.0). Derived from /proc/self/numa_maps and the worker’s MemPolicy.

§cross_node_migration_ratio: f64

Cross-node page migration ratio from /proc/vmstat numa_pages_migrated delta divided by total allocated pages.

§taobench_whole: Option<TaobenchStats>

Whole-run taobench engine COUNTER aggregate pooled across this cgroup’s crate::workload::WorkType::Taobench workers (Σ ops, MAX wall window; the window is shared by concurrent workers, per crate::workload::WorkerReport::taobench_whole). None for every non-taobench cgroup.

A RAW carrier, like Self::total_iterations / Self::total_cpu_time_ns, not a reduced ratio: the run-level cross-cgroup pool crate::assert::populate_run_pooled_taobench folds it into the total_taobench_* Counter components and the derived taobench_*_per_sec / taobench_hit_fraction / taobench_command_hit_rate Rates in Self::ext_metrics (whole-run keys visible to --noise-adjust spread, unlike the per-phase taobench_*_qps, which are MetricKind::PerPhase).

Whole-run, NOT summable from the per-phase PhaseCgroupStats::taobench carriers: per-phase elapsed_ns is MAX-merged across concurrent threads, so summing phase windows is the wrong qps denominator. The engine’s authoritative whole-run aggregate is therefore shipped from the worker. Holds COUNTERS only (TaobenchStats); the serve-latency histogram is per-phase data on PhaseCgroupStats::taobench, and the whole-run serve distribution (taobench_serve_*_us_whole) is the union of those per-phase histograms.

pub: every CgroupStats field is pub and the struct is preluded, so a test author can read the counters. #[claim(skip)]: a raw aggregate carrier, not a test-author claim surface. Assertions run against the host-derived run-level taobench_* Rate / serve-latency metrics, mirroring crate::workload::WorkerReport::taobench_whole.
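The "Σ ops, MAX wall window" pooling rule can be illustrated with a deliberately reduced model. `WholeRun`, its two fields, `pool`, and `ops_per_sec` are all hypothetical; the real TaobenchStats layout is not shown on this page.

```rust
/// Hypothetical whole-run counters for one worker. The field names are
/// assumptions made for this sketch only.
#[derive(Clone, Copy, Debug, PartialEq)]
struct WholeRun {
    ops: u64,
    elapsed_ns: u64,
}

/// Pool concurrent workers: sum the op counters, keep the longest wall
/// window (the workers ran side by side, so the window is shared).
/// Returns None when there are no workers to pool.
fn pool(workers: &[WholeRun]) -> Option<WholeRun> {
    Some(WholeRun {
        ops: workers.iter().map(|w| w.ops).sum(),
        elapsed_ns: workers.iter().map(|w| w.elapsed_ns).max()?,
    })
}

/// Derived rate over the pooled window; None for a zero-length window.
fn ops_per_sec(w: WholeRun) -> Option<f64> {
    if w.elapsed_ns == 0 {
        return None;
    }
    Some(w.ops as f64 / (w.elapsed_ns as f64 / 1e9))
}
```

Two workers that each ran for 2 s and completed 1000 and 3000 ops pool to 4000 ops over 2 s, or 2000 ops/s. Summing the windows instead would use 4 s and halve the reported rate, which is the wrong-denominator problem the field documentation warns about.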

§ext_metrics: BTreeMap<String, f64>

Extensible metrics for the generic comparison pipeline.

Implementations§

impl CgroupStats

pub fn wake_latency_tail_ratio(&self) -> f64

pub fn measured_for(&self, source: SampleSource) -> bool

pub fn iterations_per_worker(&self) -> Option<f64>

pub fn iterations_per_cpu_sec(&self) -> Option<f64>

Trait Implementations§

impl CgroupStatsClaim for CgroupStats

fn claim_cgroup_name<'a>( &'a self, verdict: &'a mut Verdict, ) -> ClaimBuilder<'a, String>

fn claim_num_workers<'a>( &'a self, verdict: &'a mut Verdict, ) -> ClaimBuilder<'a, usize>

fn claim_cpus_used<'a>( &'a self, verdict: &'a mut Verdict, ) -> SetClaim<'a, usize>

fn claim_num_cpus<'a>( &'a self, verdict: &'a mut Verdict, ) -> ClaimBuilder<'a, usize>

fn claim_avg_off_cpu_pct<'a>( &'a self, verdict: &'a mut Verdict, ) -> ClaimBuilder<'a, Option<f64>>

fn claim_min_off_cpu_pct<'a>( &'a self, verdict: &'a mut Verdict, ) -> ClaimBuilder<'a, Option<f64>>

fn claim_max_off_cpu_pct<'a>( &'a self, verdict: &'a mut Verdict, ) -> ClaimBuilder<'a, Option<f64>>

fn claim_spread<'a>( &'a self, verdict: &'a mut Verdict, ) -> ClaimBuilder<'a, Option<f64>>

fn claim_max_gap_ms<'a>( &'a self, verdict: &'a mut Verdict, ) -> ClaimBuilder<'a, u64>

fn claim_max_gap_cpu<'a>( &'a self, verdict: &'a mut Verdict, ) -> ClaimBuilder<'a, usize>

fn claim_total_migrations<'a>( &'a self, verdict: &'a mut Verdict, ) -> ClaimBuilder<'a, u64>

fn claim_migration_ratio<'a>( &'a self, verdict: &'a mut Verdict, ) -> ClaimBuilder<'a, f64>

fn claim_p99_wake_latency_us<'a>( &'a self, verdict: &'a mut Verdict, ) -> ClaimBuilder<'a, f64>

fn claim_median_wake_latency_us<'a>( &'a self, verdict: &'a mut Verdict, ) -> ClaimBuilder<'a, f64>

fn claim_wake_latency_cv<'a>( &'a self, verdict: &'a mut Verdict, ) -> ClaimBuilder<'a, f64>

fn claim_wake_measured<'a>( &'a self, verdict: &'a mut Verdict, ) -> ClaimBuilder<'a, bool>

fn claim_median_timer_latency_us<'a>( &'a self, verdict: &'a mut Verdict, ) -> ClaimBuilder<'a, f64>

fn claim_p99_timer_latency_us<'a>( &'a self, verdict: &'a mut Verdict, ) -> ClaimBuilder<'a, f64>

fn claim_p999_timer_latency_us<'a>( &'a self, verdict: &'a mut Verdict, ) -> ClaimBuilder<'a, f64>

fn claim_worst_timer_latency_us<'a>( &'a self, verdict: &'a mut Verdict, ) -> ClaimBuilder<'a, f64>

fn claim_timer_measured<'a>( &'a self, verdict: &'a mut Verdict, ) -> ClaimBuilder<'a, bool>

fn claim_total_iterations<'a>( &'a self, verdict: &'a mut Verdict, ) -> ClaimBuilder<'a, u64>

fn claim_total_cpu_time_ns<'a>( &'a self, verdict: &'a mut Verdict, ) -> ClaimBuilder<'a, u64>

fn claim_mean_run_delay_us<'a>( &'a self, verdict: &'a mut Verdict, ) -> ClaimBuilder<'a, f64>

fn claim_worst_run_delay_us<'a>( &'a self, verdict: &'a mut Verdict, ) -> ClaimBuilder<'a, f64>

fn claim_run_delay_measured<'a>( &'a self, verdict: &'a mut Verdict, ) -> ClaimBuilder<'a, bool>

fn claim_page_locality<'a>( &'a self, verdict: &'a mut Verdict, ) -> ClaimBuilder<'a, f64>

fn claim_cross_node_migration_ratio<'a>( &'a self, verdict: &'a mut Verdict, ) -> ClaimBuilder<'a, f64>

impl Clone for CgroupStats

fn clone(&self) -> CgroupStats

fn clone_from(&mut self, source: &Self)

impl Debug for CgroupStats

fn fmt(&self, f: &mut Formatter<'_>) -> Result

impl Default for CgroupStats

fn default() -> CgroupStats

impl<'de> Deserialize<'de> for CgroupStats

fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error> where __D: Deserializer<'de>,

impl PartialEq for CgroupStats

fn eq(&self, other: &CgroupStats) -> bool

fn ne(&self, other: &Rhs) -> bool

impl Serialize for CgroupStats

fn serialize<__S>(&self, __serializer: __S) -> Result<__S::Ok, __S::Error> where __S: Serializer,

impl StructuralPartialEq for CgroupStats

Auto Trait Implementations§

impl Freeze for CgroupStats

impl RefUnwindSafe for CgroupStats

impl Send for CgroupStats

impl Sync for CgroupStats

impl Unpin for CgroupStats

impl UnwindSafe for CgroupStats

Blanket Implementations§

impl<T> Any for T where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T where T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> CloneToUninit for T where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

impl<T> From<T> for T

fn from(t: T) -> T

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

fn in_current_span(self) -> Instrumented<Self>

impl<T, U> Into<U> for T where U: From<T>,

fn into(self) -> U

impl<T> IntoEither for T

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

impl<T> PolicyExt for T
where T: ?Sized,

fn and<P, B, E>(self, other: P) -> And<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

fn or<P, B, E>(self, other: P) -> Or<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

impl<T> ToOwned for T
where T: Clone,

impl<T, U> TryFrom<U> for T
where U: Into<T>,

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

impl<T> DeserializeOwned for T
where T: for<'de> Deserialize<'de>,

impl<T> MaybeSend for T
where T: Send,
