pub fn populate_run_distribution_metrics(stats: &mut ScenarioStats)
Populate run-level DERIVED distributional metrics into
stats.ext_metrics: every registered MetricKind::Distribution,
MetricKind::WorstLowest, MetricKind::WakeLatencyTailRatio, and
MetricKind::WorstCrossNodeRatio. This is the SOLE
within-run producer of those metrics’ values — they carry no per-phase
sample slice and no cross-cgroup merge fold, and their registry accessors
are |_| None, so MetricDef::read reads the value
written here from ext_metrics.
DISTRIBUTION (the 5 wake / run-delay aggregates): pools the RAW sample
vectors held in stats.phases[].per_cgroup across EVERY phase and EVERY
cgroup into one combined set, then recomputes the percentile / CV / mean
/ extreme over it — the statistic of the union, NOT a max or mean of
per-cgroup reductions (the percentile of a union is not the max of
per-source percentiles). The ns→µs scale is applied ONCE here (the
carriers store raw ns, per PhaseCgroupStats::run_delays_ns). The wake
pool is population-WEIGHTED: each phase carrier’s samples carry weight
wake_sample_total / wake_latencies_ns.len(), so a phase whose reservoir
hit the cap contributes by true population, not capped length (the
cross-PHASE de-skew) — reduced via the weighted percentile / moments.
The run-delay pool is unweighted (per-worker, never reservoir-capped, so
length IS population). Below the wake cap every weight is 1.0, so the
weighted P99 / median / mean / worst are byte-identical to the unweighted
concat; the weighted CV matches only within ~1e-9 (it sums the mean in f64
where the unweighted path sums in u64 — a weighted variance cannot keep the
u64 sum).
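As a rough illustration of the population-weighted wake pool, the sketch below gives each sample the weight wake_sample_total / wake_latencies_ns.len() and takes a weighted percentile over the union, converting ns to µs once at the end. WakeCarrier, pool_weighted, and weighted_percentile_us are hypothetical names, and the cumulative-weight percentile convention is an assumption rather than the one this crate necessarily uses.

```rust
/// Hypothetical stand-in for one phase carrier's wake samples.
struct WakeCarrier {
    /// Reservoir of raw samples in ns (may be capped).
    wake_latencies_ns: Vec<u64>,
    /// True number of wake events observed in the phase.
    wake_sample_total: u64,
}

/// Pool every carrier into (value_ns, weight) pairs.
/// Each sample stands for `wake_sample_total / len` real events.
fn pool_weighted(carriers: &[WakeCarrier]) -> Vec<(u64, f64)> {
    let mut pool = Vec::new();
    for c in carriers {
        let n = c.wake_latencies_ns.len();
        if n == 0 {
            continue; // an empty carrier contributes nothing to the pool
        }
        let w = c.wake_sample_total as f64 / n as f64;
        pool.extend(c.wake_latencies_ns.iter().map(|&v| (v, w)));
    }
    pool
}

/// Weighted percentile of the union, in µs.
/// Assumed convention: the smallest value whose cumulative weight
/// reaches `q * total_weight`.
fn weighted_percentile_us(pool: &mut [(u64, f64)], q: f64) -> Option<f64> {
    if pool.is_empty() {
        return None;
    }
    pool.sort_by_key(|&(v, _)| v);
    let total: f64 = pool.iter().map(|&(_, w)| w).sum();
    let target = q * total;
    let mut cum = 0.0;
    for &(v, w) in pool.iter() {
        cum += w;
        if cum >= target {
            return Some(v as f64 / 1000.0); // ns -> µs, applied once
        }
    }
    pool.last().map(|&(v, _)| v as f64 / 1000.0)
}
```

With an uncapped carrier holding [1000, 2000, 3000, 4000] (total 4) and a capped carrier holding [10000, 20000] (total 8), the weighted median under this convention is 10.0 µs, whereas treating all six samples as equally weighted gives 3.0 µs. That gap is the cross-phase skew the weighting is meant to remove.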
CARRIER-LESS FOLD (graceful degradation): a cgroup whose raw samples are
NOT in the pool is NOT dropped. This covers a backdrop epoch that fell on
BASELINE or the inter-step gap (no paired host bucket, so no carrier) and a
cgroup whose carrier was stripped/empty (strip_phase_cgroup_samples). Its
surviving per-cgroup CgroupStats reduction folds worst-wins into the pooled
value (max, since every Distribution metric is LowerBetter, registry-gated).
The CgroupStats reductions are never stripped — stats.cgroups[] is the
already-reduced cgroup_stats(reports) output, a SEPARATE reduction path
from the per-phase carriers — so a carrier-less cgroup always has a source.
When EVERY carrier is empty (a fully-stripped run) the pool is empty and the
result degenerates to the max over every cgroup’s reduction — the pre-Item-7
cross-cgroup max. NOTE the value CLASS of a folded cgroup differs from a
pooled one for the P99 / Median / Mean / CV reductions: a pooled cgroup
contributes to the percentile of the union; a carrier-less cgroup
contributes its per-cgroup reduction worst-wins (a worst-cgroup proxy, not
pooled). For the SampleReduction::Worst reduction the two COINCIDE
(max-of-union == max-of-per-cgroup-maxes), so the carrier-less fold is exact
there, not a proxy. A second asymmetry is specific to CV and comes from the
population weighting: the POOLED CV divides its variance and mean sums by
Σ per-sample weights (the reconstructed population), while a carrier-less
cgroup’s folded CV is cgroup_stats’s UNWEIGHTED CV (n = all_latencies.len()).
The two coincide below the cap (all weights 1.0) and diverge above it. The mix
is sound: a carrier-less cgroup has no per-phase weight data to
population-weight (its carrier is absent by definition), and both feed the
same LowerBetter worst-wins max. Backdrop step-phase carriers now join
the pool directly (per-epoch expansion in collect_handles); only the
carrier-less cases above fold worst-wins.
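The worst-wins fold described above can be sketched as a max over the pooled value and each carrier-less cgroup's own reduction. fold_worst_wins is a hypothetical helper, and modelling both inputs as Option<f64> is an assumption about how "no value" is represented.

```rust
/// Fold carrier-less per-cgroup reductions into the pooled value.
/// Worst-wins is `max` here because the metric is LowerBetter.
/// `pooled` is None when every carrier was empty (fully-stripped run).
fn fold_worst_wins(pooled: Option<f64>, carrierless: &[Option<f64>]) -> Option<f64> {
    carrierless
        .iter()
        .flatten() // skip cgroups that have no reduction at all
        .copied()
        .fold(pooled, |acc, v| match acc {
            Some(a) => Some(a.max(v)),
            None => Some(v),
        })
}
```

With pooled = Some(5.0) and carrier-less reductions [Some(7.0), None, Some(3.0)] the result is Some(7.0). With pooled = None the result is simply the max over the per-cgroup reductions, which is the fully-stripped degenerate case.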
WORSTLOWEST (the 2 iteration efficiencies): the lowest (worst) cgroup’s
efficiency, computed per-cgroup from the stats.cgroups[] COUNTERS via
CgroupStats::iterations_per_worker / CgroupStats::iterations_per_cpu_sec
and the None-aware lowest-wins fold (a measured Some(0.0) — starvation
— wins; a no-data None is skipped; an all-None cohort writes no key,
preserving absence as a missing ext entry rather than a 0.0). The
counters survive stripping, so WorstLowest needs no fallback branch.
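A minimal sketch of the None-aware lowest-wins fold follows. lowest_wins and write_worst_lowest are hypothetical helpers, the metric key passed in is illustrative only, and using a HashMap<String, f64> for ext_metrics is an assumption about its type.

```rust
use std::collections::HashMap;

/// Lowest (worst) efficiency across cgroups.
/// A measured Some(0.0) wins; a no-data None is skipped;
/// an all-None cohort yields None.
fn lowest_wins(values: &[Option<f64>]) -> Option<f64> {
    values
        .iter()
        .flatten()
        .copied()
        .fold(None, |acc: Option<f64>, v| Some(acc.map_or(v, |a| a.min(v))))
}

/// Write the key only when at least one cgroup had a measurement,
/// so absence stays a missing entry rather than a 0.0.
fn write_worst_lowest(ext: &mut HashMap<String, f64>, key: &str, values: &[Option<f64>]) {
    if let Some(v) = lowest_wins(values) {
        ext.insert(key.to_string(), v);
    }
}
```

Keeping the accumulator as an Option is what separates "starved" (Some(0.0), written) from "not measured" (None, no key written).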
Runs post-merge at the eval layer beside
populate_run_pooled_iterations_per_cpu_sec, AFTER the per-cgroup
carriers are folded into stats.phases and BEFORE the sidecar write, so
stats.phases[].per_cgroup is fully merged and stats.cgroups is the
final per-cgroup roll-up.