Compare a Scheduler vs EEVDF
A standard regression guard for a sched_ext scheduler: does it match (or beat) the kernel default (EEVDF) on the same workload — not just for throughput, but for latency and CPU overhead too? Run the workload under the scheduler in one phase, detach the scheduler mid-run so the kernel default takes over for a second phase, then compare the two phases metric by metric.
The workload must persist across the detach — a Backdrop population,
not per-step workers — so its cumulative counters span both phases. That
shared, continuous measurement is what makes a per-phase delta meaningful
(per-step workers reset each phase and read ~0).
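The arithmetic behind a per-phase delta can be sketched as follows. This is a minimal illustration, assuming the cumulative counter is sampled once at each phase boundary; `per_phase_deltas` and the sample layout are hypothetical and not ktstr's internal representation.

```rust
/// Cumulative counter values sampled at each phase boundary, in order:
/// [start of phase 0, end of phase 0 (= start of phase 1), end of phase 1, ...].
/// Hypothetical layout, for illustration only.
fn per_phase_deltas(boundary_samples: &[u64]) -> Vec<u64> {
    boundary_samples
        .windows(2)
        // saturating_sub clamps to 0 if a counter ever went backwards
        // (for example, because the worker behind it was restarted).
        .map(|w| w[1].saturating_sub(w[0]))
        .collect()
}
```

With boundary samples `[0, 1200, 2100]`, the two phases contributed 1200 and 900 respectively. The subtraction only works because one continuous counter covers both phases.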
Two readers cover the comparison, both on the &VmResult a post_vm
callback receives (the host-side hook that runs after the VM exits):
- VmResult::throughput_ratio(a, b) compares iterations/sec between two phases, computed from the stimulus timeline. The timeline carries per-step boundaries independent of the periodic-capture pipeline, so throughput works even for --cell-parent-cgroup schedulers.
- VmResult::phase_metric(phase, name) reads any other per-phase metric by its registry name (see Checking): CPU overhead (system_time_ns, user_time_ns) and scheduling quality (avg_imbalance_ratio, avg_dsq_depth).

Wake-latency and run-delay distributions are run-level: they are pooled across cgroups into one whole-run value, so they cannot be split into the scheduler phase vs the EEVDF phase. To compare them, run the scheduler and EEVDF as two separate tests and read each run’s run-level metric. (The built-in Schbench workload is the exception: it measures its own wakeup latency internally and emits wakeup_p99_latency_us per phase.) Everything else flows through the one per-phase bucket pipeline, so a new metric becomes comparable here the moment it lands in that pipeline.
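To make the throughput reader's semantics concrete, here is a minimal sketch of a ratio of per-phase rates. It assumes each phase reduces to an iteration count and a wall-clock duration; `PhaseSample`, `iterations_per_sec`, and `ratio_of_rates` are hypothetical names and do not reflect how ktstr computes the value.

```rust
/// Hypothetical per-phase summary: iterations completed and phase length.
#[derive(Clone, Copy)]
struct PhaseSample {
    iterations: u64,
    duration_secs: f64,
}

/// Iterations per second, or None when the phase has no measurable duration.
fn iterations_per_sec(p: PhaseSample) -> Option<f64> {
    if p.duration_secs > 0.0 {
        Some(p.iterations as f64 / p.duration_secs)
    } else {
        None
    }
}

/// Rate of phase `a` divided by rate of phase `b`.
/// None when either rate is undefined or `b` made no progress.
fn ratio_of_rates(a: PhaseSample, b: PhaseSample) -> Option<f64> {
    let rate_a = iterations_per_sec(a)?;
    let rate_b = iterations_per_sec(b)?;
    if rate_b > 0.0 {
        Some(rate_a / rate_b)
    } else {
        None
    }
}
```

Under this sketch, 9000 iterations in 5 s against 10000 iterations in 5 s gives a ratio of 0.9. Returning an Option keeps "a phase did not run" distinct from "the scheduler was slower".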
use anyhow::{ensure, Result};
use ktstr::assert::{AssertResult, Phase};
use ktstr::ktstr_test;
use ktstr::prelude::{Backdrop, VmResult};
use ktstr::scenario::Ctx;
use ktstr::scenario::ops::{execute_scenario, CgroupDef, HoldSpec, Op, Step};
use ktstr::test_support::{Scheduler, SchedulerSpec};
// Built directly rather than via declare_scheduler! so this comparison
// harness stays out of the verifier sweep (manual consts are not
// registered for sweeping). Use declare_scheduler! for the scheduler
// definition you ship.
const MY_SCHED: Scheduler =
    Scheduler::named("my_sched").binary(SchedulerSpec::Discover("scx_my_sched"));

// Runs on the host after the VM exits; the &VmResult carries the stimulus
// timeline and the per-phase metric buckets the comparison reads.
fn compare_vs_eevdf(result: &VmResult) -> Result<()> {
    let sched = Phase::step(0); // first Step ran under the scheduler under test
    let eevdf = Phase::step(1); // second Step ran under EEVDF, after the detach

    // Throughput: > 1.0 means the scheduler out-throughputs EEVDF; < 1.0
    // is a regression.
    let throughput = result
        .throughput_ratio(sched, eevdf)
        .ok_or_else(|| anyhow::anyhow!("no per-phase throughput; did both phases run?"))?;
    ensure!(
        throughput >= 0.8,
        "my_sched throughput is {throughput:.2}x EEVDF (below the 0.8x floor)"
    );

    // Scheduling quality: any per-phase metric compares the same way via
    // phase_metric. Skip the gate when a phase has no reading (None)
    // rather than failing. (Wake-latency / run-delay distributions are
    // run-level and not readable here; see the reader list above.)
    if let (Some(s), Some(e)) = (
        result.phase_metric(sched, "avg_imbalance_ratio"),
        result.phase_metric(eevdf, "avg_imbalance_ratio"),
    ) {
        ensure!(s <= e * 1.5, "my_sched imbalance {s:.2} is >1.5x EEVDF {e:.2}");
    }

    // CPU overhead: per-phase kernel (system) CPU time.
    if let (Some(s), Some(e)) = (
        result.phase_metric(sched, "system_time_ns"),
        result.phase_metric(eevdf, "system_time_ns"),
    ) {
        ensure!(s <= e * 2.0, "my_sched system time {s:.0}ns is >2x EEVDF {e:.0}ns");
    }

    Ok(())
}

#[ktstr_test(
    scheduler = MY_SCHED,
    duration_s = 10,
    watchdog_timeout_s = 10,
    post_vm = compare_vs_eevdf,
)]
fn scheduler_vs_eevdf(ctx: &Ctx) -> Result<AssertResult> {
    // Persistent Backdrop population: runs across both phases so its
    // cumulative counters span the detach.
    let backdrop = Backdrop::new().push_cgroup(CgroupDef::named("cg").workers(4));

    let steps = vec![
        // Phase A: workload under the scheduler under test.
        Step::new(vec![], HoldSpec::frac(0.5)),
        // Phase B: detach -> the kernel default (EEVDF) takes over.
        Step::new(vec![Op::detach_scheduler()], HoldSpec::frac(0.5)),
    ];

    execute_scenario(ctx, backdrop, steps)
}

The 0.8x / 1.5x / 2.0x bounds above are illustrative, not
recommendations. Calibrate yours: run the test a few times with
generous bounds, note the observed ratios (each ensure! message
prints them; a run’s failure output leads with whichever message
tripped), and set each floor just outside the observed noise band.
A gate inside the noise band fails honest runs; one far outside it
never fails at all.
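One way to turn a handful of observed ratios into bounds is sketched below. It assumes a simple rule (worst observed value plus a relative safety margin); the helper names and the margin are hypothetical choices for illustration, not something ktstr provides or prescribes.

```rust
/// Floor for a "higher is better" ratio (e.g. throughput): the lowest
/// observed value, lowered by a relative margin. None for an empty sample.
fn suggest_floor(observed: &[f64], margin: f64) -> Option<f64> {
    let min = observed.iter().copied().fold(f64::INFINITY, f64::min);
    if min.is_finite() {
        Some(min * (1.0 - margin))
    } else {
        None
    }
}

/// Ceiling for a "lower is better" ratio (e.g. imbalance or system time):
/// the highest observed value, raised by a relative margin.
fn suggest_ceiling(observed: &[f64], margin: f64) -> Option<f64> {
    let max = observed.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    if max.is_finite() {
        Some(max * (1.0 + margin))
    } else {
        None
    }
}
```

For example, throughput ratios of 0.93, 0.97, 0.91, and 0.95 with a 5% margin suggest a floor of about 0.86. The margin is the knob that places the gate just outside the noise band rather than inside it.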
Notes:
- Op::detach_scheduler() cleanly hands the workload to the kernel default. Each step emits its own boundary, so no trailing closer step is needed, and the intentional detach is not promoted to a scheduler-died failure.
- Phases are keyed by Phase: Phase::step(0) is the first scenario Step, Phase::step(1) the second. Phase::BASELINE is the pre-Step settle window. Use Phase rather than the raw stimulus step_index.
- phase_metric returns None when a phase has no reading for a metric, so gate inside if let (Some(..), Some(..)) rather than unwrapping; a metric that did not populate skips its gate instead of failing the run.
- For cross-cell balance rather than a phase-vs-phase comparison, read result.stats.cgroup_balance_ratio() in the test body (the test body’s AssertResult carries stats).
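The skip-on-None gating pattern from the notes can be factored into a small helper when several metrics share the same shape. The sketch below uses only the standard library; `gate_at_most` is a hypothetical name, and it returns a plain `Result<(), String>` rather than the anyhow types used in the example above.

```rust
/// Gate a "lower is better" metric: pass when `sched <= baseline * factor`.
/// A missing reading on either side skips the gate (returns Ok) instead of
/// failing the run.
fn gate_at_most(
    name: &str,
    sched: Option<f64>,
    baseline: Option<f64>,
    factor: f64,
) -> Result<(), String> {
    if let (Some(s), Some(b)) = (sched, baseline) {
        if s > b * factor {
            return Err(format!("{name}: {s:.2} is >{factor}x baseline {b:.2}"));
        }
    }
    Ok(())
}
```

A caller would pass the two per-phase readings for one metric name along with the bound it calibrated for that metric. The trade-off is that a metric which silently stops populating never fails its gate.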
This test gates scheduler-vs-EEVDF within one run. To gate your
scheduler against its own past self across commits, use
cargo ktstr perf-delta — the two nets catch
different regressions, and CI wants both.