
ktstr

Test Linux schedulers like code. Every test boots a real kernel in a KVM micro-VM with the topology it declares — and ktstr watches what your scheduler does from the host, without touching the guest.

Scheduler bugs hide in topology: the fairness regression that only shows up on an odd LLC count, the starvation that needs SMT siblings, the crash that wants a NUMA crossing. Testing a sched_ext scheduler against those shapes has meant scrounging hardware and hand-running repro scripts. ktstr turns it into cargo test: declare the topology on the test, and the VM actually has it.

Quick taste

use ktstr::prelude::*;

declare_scheduler!(MY_SCHED, {
    name = "my_sched",
    binary = "scx_mine",
});

#[ktstr_test(scheduler = MY_SCHED, llcs = 1, cores = 2, threads = 1)]
fn steady_under_my_sched(ctx: &Ctx) -> Result<AssertResult> {
    scenarios::steady(ctx)
}

Run it against any kernel — a released version, a local source tree, or a git URL:

cargo ktstr test --kernel 7.0
cargo ktstr: fetching latest 7.0.x kernel version
cargo ktstr: latest 7.0.x kernel: 7.0.14
cargo ktstr: resolved kernel "7.0"
...
 Nextest run ID 98581174-… with nextest profile: default
    Starting 1 test across 121 binaries (12531 tests skipped)
        PASS [  34.459s] (1/1) ktstr::failure_dump_e2e ktstr/failure_dump_renders_bss_fields
...
     Summary [  34.498s] 1 test run: 1 passed, 12531 skipped

Without a scheduler attribute, tests run under the kernel’s default scheduler (EEVDF) — useful for baselines and A/B comparisons.

When it breaks, you see why

A crash log tells you where the scheduler died. ktstr also tells you what the state looked like on the way there: on a crash it boots a second VM, attaches BPF probes along the crash path, and reruns the scenario. Each probed function prints its decoded struct fields, and → marks fields that changed between entry and exit:

cargo ktstr test — auto-repro output after a scheduler crash
=== AUTO-PROBE: scx_exit fired ===

  ktstr_enqueue                                                   main.bpf.c:21
    task_struct *p
      pid         97
      cpus_ptr    0xf(0-3)
      dsq_id      SCX_DSQ_INVALID
      ...
      scx_flags   QUEUED|ENABLED
  do_enqueue_task                                               kernel/sched/ext.c
    rq *rq
      cpu         1
    task_struct *p
      pid         97
      cpus_ptr    0xf(0-3)
      dsq_id      SCX_DSQ_INVALID          →  SCX_DSQ_LOCAL
      ...
      scx_flags   QUEUED|DEQD_FOR_SLEEP    →  QUEUED

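The → column in the auto-repro output above comes from comparing each field's value at function entry with its value at exit. The sketch below shows that comparison under stated assumptions: `FieldSnapshot` and `render_field` are hypothetical names, values are already decoded to strings, and the column widths are chosen only to resemble the output above. It is not ktstr's renderer.

```rust
/// One decoded field captured at function entry and at exit.
/// Hypothetical type for illustration; not part of ktstr's API.
struct FieldSnapshot<'a> {
    name: &'a str,
    entry: &'a str,
    exit: &'a str,
}

/// Format one field line: the entry value, plus "→  exit" only when it changed.
fn render_field(f: &FieldSnapshot<'_>) -> String {
    if f.entry == f.exit {
        // Unchanged field: name column, then the single value.
        format!("{:<12}{}", f.name, f.entry)
    } else {
        // Changed field: pad the entry value so the arrows line up.
        format!("{:<12}{:<25}→  {}", f.name, f.entry, f.exit)
    }
}

fn main() {
    let fields = [
        FieldSnapshot { name: "pid", entry: "97", exit: "97" },
        FieldSnapshot { name: "dsq_id", entry: "SCX_DSQ_INVALID", exit: "SCX_DSQ_LOCAL" },
        FieldSnapshot { name: "scx_flags", entry: "QUEUED|DEQD_FOR_SLEEP", exit: "QUEUED" },
    ];
    for f in &fields {
        println!("  {}", render_field(f));
    }
}
```

Printing the arrow only for changed fields keeps the unchanged majority of a large struct visually quiet, so the handful of transitions along the crash path stand out.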
Auto-repro is on by default and needs a kernel with the sched_ext_exit tracepoint — see Auto-Repro. For the anatomy of ordinary failures (stats, timeline, monitor verdict), see Reading Failure Output.
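The `cpus_ptr` lines in the auto-repro output print a hex CPU mask followed by the CPUs it covers, as in `0xf(0-3)`. A minimal sketch of that decoding, assuming a mask of at most 64 CPUs; `format_cpumask` is a hypothetical helper, not ktstr's implementation:

```rust
/// Render a CPU bitmask as hex followed by a compact CPU list,
/// collapsing runs of set bits into "start-end" ranges.
/// Assumes at most 64 CPUs (one u64 word).
fn format_cpumask(mask: u64) -> String {
    let mut ranges: Vec<String> = Vec::new();
    let mut cpu = 0u32;
    while cpu < 64 {
        if mask & (1u64 << cpu) == 0 {
            cpu += 1;
            continue;
        }
        // Start of a run of set bits; extend it as far as it goes.
        let start = cpu;
        while cpu < 64 && mask & (1u64 << cpu) != 0 {
            cpu += 1;
        }
        let end = cpu - 1;
        if start == end {
            ranges.push(format!("{start}"));
        } else {
            ranges.push(format!("{start}-{end}"));
        }
    }
    format!("{mask:#x}({})", ranges.join(","))
}

fn main() {
    println!("{}", format_cpumask(0xf)); // 0xf(0-3)
    println!("{}", format_cpumask(0b1011)); // 0xb(0-1,3)
}
```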

Real kernels under KVM

Each test gets a fresh micro-VM booting the exact kernel you target. Real cgroups, real BPF, no shared state.

How it works →

Topology as code

NUMA nodes, LLCs, cores, SMT — declared on the test attribute, realized in the guest down to the ACPI tables.

Topology →
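To make "topology as code" concrete, here is a sketch of how nested counts expand into logical CPUs. It assumes each count is per parent level (LLCs per NUMA node, cores per LLC, threads per core) and that CPUs are numbered depth-first so SMT siblings get adjacent ids. Both are assumptions for illustration; ktstr's actual attribute semantics and CPU numbering are documented on the Topology page, and `Topology`/`CpuPlace` here are hypothetical types. Under this reading, the quick-taste shape `llcs = 1, cores = 2, threads = 1` on one node yields 2 CPUs.

```rust
/// Hypothetical nested topology: each count is per parent level.
#[derive(Clone, Copy)]
struct Topology {
    numa_nodes: u32,
    llcs_per_node: u32,
    cores_per_llc: u32,
    threads_per_core: u32,
}

/// Where one logical CPU sits in the hierarchy.
#[derive(Debug, PartialEq)]
struct CpuPlace {
    cpu: u32,
    node: u32,
    llc: u32,    // global LLC index
    core: u32,   // global core index
    thread: u32, // SMT sibling index within its core
}

impl Topology {
    fn cpu_count(&self) -> u32 {
        self.numa_nodes * self.llcs_per_node * self.cores_per_llc * self.threads_per_core
    }

    /// Number CPUs depth-first, so SMT siblings get adjacent ids.
    fn places(&self) -> Vec<CpuPlace> {
        let mut out = Vec::new();
        let mut cpu = 0;
        for node in 0..self.numa_nodes {
            for l in 0..self.llcs_per_node {
                let llc = node * self.llcs_per_node + l;
                for c in 0..self.cores_per_llc {
                    let core = llc * self.cores_per_llc + c;
                    for thread in 0..self.threads_per_core {
                        out.push(CpuPlace { cpu, node, llc, core, thread });
                        cpu += 1;
                    }
                }
            }
        }
        out
    }
}

fn main() {
    let t = Topology { numa_nodes: 2, llcs_per_node: 2, cores_per_llc: 2, threads_per_core: 2 };
    println!("{} CPUs", t.cpu_count());
    for p in t.places() {
        println!("{p:?}");
    }
}
```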

Gauntlet

One test declaration fans out across a matrix of topology presets — odd LLC counts, SMT, NUMA crossings — with budget-aware selection for CI.

Gauntlet →
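The fan-out and budget idea can be sketched in a few lines. This is not ktstr's selection policy, which this page does not describe: the sketch swaps in a plain greedy fill. It builds the cartesian product of a few topology dimensions, assigns each preset an assumed cost of one unit per vCPU, and takes presets cheapest-first until the budget runs out. When every preset is worth the same, cheapest-first maximizes how many presets fit. `Preset`, `matrix`, `select_within_budget`, and the fixed two cores per LLC are all hypothetical.

```rust
/// Cores per LLC, held fixed in this sketch (assumption).
const CORES_PER_LLC: u32 = 2;

/// One point in a hypothetical topology matrix.
#[derive(Debug, Clone, PartialEq)]
struct Preset {
    numa_nodes: u32,
    llcs_per_node: u32,
    threads_per_core: u32, // 1 = no SMT, 2 = SMT
}

impl Preset {
    /// Assumed cost model: one unit per vCPU the VM would boot with.
    fn cost(&self) -> u32 {
        self.numa_nodes * self.llcs_per_node * CORES_PER_LLC * self.threads_per_core
    }
}

/// Cartesian product of the dimension values: every combination becomes a preset.
fn matrix(nodes: &[u32], llcs: &[u32], smt: &[u32]) -> Vec<Preset> {
    let mut out = Vec::new();
    for &numa_nodes in nodes {
        for &llcs_per_node in llcs {
            for &threads_per_core in smt {
                out.push(Preset { numa_nodes, llcs_per_node, threads_per_core });
            }
        }
    }
    out
}

/// Greedy fill: cheapest presets first, stop when the next one would exceed the budget.
fn select_within_budget(mut presets: Vec<Preset>, budget: u32) -> Vec<Preset> {
    presets.sort_by_key(Preset::cost);
    let mut spent = 0;
    let mut chosen = Vec::new();
    for p in presets {
        let c = p.cost();
        if spent + c > budget {
            break; // sorted ascending, so nothing later fits either
        }
        spent += c;
        chosen.push(p);
    }
    chosen
}

fn main() {
    // Odd LLC count (3), SMT on/off, one or two NUMA nodes.
    let all = matrix(&[1, 2], &[1, 3], &[1, 2]);
    let picked = select_within_budget(all.clone(), 20);
    println!("{} of {} presets fit the budget", picked.len(), all.len());
    for p in &picked {
        println!("{p:?} cost={}", p.cost());
    }
}
```

A real CI budget would more likely weigh coverage (for example, preferring at least one SMT and one NUMA preset) over raw count; the greedy fill only shows where such a policy plugs in.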

Auto-repro

Crashes rerun themselves in a probe VM that captures function arguments and struct state along the crash path.

Auto-Repro →

Design

Fidelity without overhead. Every test boots a real Linux kernel in a KVM VM with real cgroups and real BPF programs — no mocking, no containers, no state carried between tests. The VMM is purpose-built for this job; see VMM.

Direct access over tooling layers. The host-side monitor reads guest memory through BTF-resolved struct offsets — runqueues, DSQ depths, schedstat counters — loading nothing into the guest, so observation does not perturb the scheduler under test. See Monitor.
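Reading a struct field through a resolved offset boils down to bounds-checked arithmetic on a snapshot of guest memory. The sketch below assumes the offsets have already been resolved (from BTF, in ktstr's case) and that the guest is little-endian; `StructLayout`, `read_u32`, and the offset value are made up for illustration and do not reflect ktstr's monitor code or any real kernel's `struct rq` layout.

```rust
use std::collections::HashMap;

/// Field offsets for one struct, as a host-side reader might resolve them.
/// Names and numbers used below are illustrative only.
struct StructLayout {
    fields: HashMap<&'static str, usize>, // field name -> byte offset
}

/// Read a little-endian u32 field of the struct located at `base`
/// inside a snapshot of guest memory. Returns None on an unknown
/// field or an out-of-bounds read instead of panicking.
fn read_u32(mem: &[u8], base: usize, layout: &StructLayout, field: &str) -> Option<u32> {
    let off = base.checked_add(*layout.fields.get(field)?)?;
    let bytes = mem.get(off..off.checked_add(4)?)?;
    Some(u32::from_le_bytes(bytes.try_into().ok()?))
}

fn main() {
    let mut fields = HashMap::new();
    fields.insert("nr_running", 8); // made-up offset, not a real kernel layout
    let layout = StructLayout { fields };

    // Fake guest memory with a "runqueue" at byte 16 whose nr_running is 3.
    let mut mem = vec![0u8; 64];
    let rq_base = 16;
    mem[rq_base + 8..rq_base + 12].copy_from_slice(&3u32.to_le_bytes());

    println!("{:?}", read_u32(&mem, rq_base, &layout, "nr_running"));
}
```

Because the host only reads bytes it already has mapped, nothing runs inside the guest, which is the property the paragraph above relies on for not perturbing the scheduler.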

What it tests

  • Fair scheduling — workers get CPU time without starvation or excessive scheduling gaps.
  • Cpuset isolation — workers stay on assigned CPUs.
  • Dynamic operations — cgroups created, destroyed, and resized mid-run.
  • Affinity — the scheduler respects thread affinity constraints.
  • Stress — many cgroups, many workers, rapid topology changes.
  • Stall detection — the scheduler doesn’t drop tasks.
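Checks like fair scheduling and starvation come down to simple arithmetic over per-worker measurements. The sketch below is hypothetical: `WorkerStats`, the thresholds, and the error messages are assumptions for illustration, not ktstr's assertion logic. It flags a worker that never ran, one whose CPU time falls below a fraction of the best-served worker, and one that waited too long between consecutive runs.

```rust
/// Hypothetical per-worker measurements collected during a run.
struct WorkerStats {
    cpu_time_us: u64,
    run_ts_us: Vec<u64>, // ascending times at which the worker got a CPU
}

/// Longest gap between consecutive run timestamps (0 if fewer than two).
fn max_gap_us(run_ts_us: &[u64]) -> u64 {
    run_ts_us
        .windows(2)
        .map(|w| w[1].saturating_sub(w[0])) // saturating: tolerate unsorted input
        .max()
        .unwrap_or(0)
}

fn check_fair(workers: &[WorkerStats], min_ratio: f64, max_gap_limit_us: u64) -> Result<(), String> {
    let most = workers.iter().map(|w| w.cpu_time_us).max().ok_or("no workers")?;
    for (i, w) in workers.iter().enumerate() {
        // Starvation check first, which also rules out dividing by zero below.
        if w.cpu_time_us == 0 {
            return Err(format!("worker {i} starved"));
        }
        let ratio = w.cpu_time_us as f64 / most as f64;
        if ratio < min_ratio {
            return Err(format!("worker {i} got {:.0}% of the top worker's CPU time", ratio * 100.0));
        }
        let gap = max_gap_us(&w.run_ts_us);
        if gap > max_gap_limit_us {
            return Err(format!("worker {i} waited {gap} us between runs"));
        }
    }
    Ok(())
}

fn main() {
    let workers = vec![
        WorkerStats { cpu_time_us: 1000, run_ts_us: vec![0, 100, 200] },
        WorkerStats { cpu_time_us: 900, run_ts_us: vec![50, 150, 250] },
    ];
    println!("{:?}", check_fair(&workers, 0.8, 150));
}
```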

Note

ktstr is pre-release. 0.x APIs change between releases, so pin the exact version — Getting Started shows how.

Next steps