Investigate a Crash
When the scheduler under test dies mid-run, the test fails with a structured crash report: the scx exit reason, a kernel-side backtrace, per-CPU scheduler state, and — by default — the results of a second VM that ktstr boots automatically to replay the crash with probes attached. This recipe walks a real crash from report to regression test.
First step: rerun with full diagnostics
RUST_BACKTRACE=1 cargo ktstr test --kernel 7.0 -- -E 'test(my_test)'
RUST_BACKTRACE=1 (or full) does two things: it appends the
--- diagnostics --- section (init stage, VM exit code, kernel
console tail) to every failure — not only scheduler deaths — and it
boots the guest with a verbose console (the same switch
KTSTR_VERBOSE=1 flips).
The crash report
A failure prints as a header line plus sections, each present only
when relevant: --- stats --- (per-cgroup worker results),
--- diagnostics ---, --- timeline --- (phases with monitor
samples), --- scheduler log --- (scheduler stdout+stderr,
including the kernel’s DEBUG DUMP when the scheduler died),
--- monitor --- (host-side observations and verdict),
--- sched_ext dump --- (the same dump as traced by the guest
kernel), and --- auto-repro ---. See
Reading Failure Output for the full
anatomy; this recipe focuses on the crash workflow.
A real crash, top to bottom
The crash below is real: ktstr’s fixture scheduler was told to call
scx_bpf_error() via a host-written .bss trigger (the same
mechanism step 7 of Test a New Scheduler
uses). Trimmed to the load-bearing sections:
BUG SUMMARY: scx_bpf_error (src/bpf/main.bpf.c:424: ktstr: host-triggered crash)
ktstr_test 'bpf_crash_auto_repro_e2e' [sched=scx-ktstr] [topo=1n1l4c1t] failed: scheduler process died unexpectedly during workload (2.2s into test)
...
--- scheduler log ---
...
DEBUG DUMP
================================================================================
swapper/3[0] triggered exit kind 1025:
  scx_bpf_error (src/bpf/main.bpf.c:424: ktstr: host-triggered crash)
Backtrace:
  scx_exit+0x50/0x70
  scx_bpf_error_bstr+0x78/0x90
  bpf_prog_1fed99378f3a8055_ktstr_dispatch+0x4d/0x1cb
  bpf__sched_ext_ops_dispatch+0x4b/0xa7
  do_pick_task_scx+0x379/0x770
  __schedule+0x5ca/0xfc0
  schedule+0x44/0x1b0
  worker_thread+0xa2/0x2d0
  kthread+0xf3/0x130
  ret_from_fork+0x19b/0x260
  ret_from_fork_asm+0x1a/0x30
ktstr scheduler state: stall=0 crash=1 degrade_rt=0
rodata: degrade=0 slow=0 scattershot=0 verify_loop=0 fail_verify=0
ktstr_alloc_count=87 degrade_cnt=0 slow_cnt=0
...
Error: EXIT: scx_bpf_error (src/bpf/main.bpf.c:424: ktstr: host-triggered crash)
Three lines identify the cause:
- The BUG SUMMARY / exit line carries the message and C source line your scheduler passed to scx_bpf_error().
- The backtrace names the BPF program that raised it (…ktstr_dispatch) and shows it fired from the kernel’s pick path (do_pick_task_scx inside __schedule).
- The ktstr scheduler state block is the scheduler’s own dump callback output: whatever your scheduler prints in its .dump() op appears here. In this case crash=1 confirms the host-written trigger was read.
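When triaging many reports, it can help to pull the exit line apart mechanically. The sketch below is a standalone, std-only illustration and not part of ktstr: ExitInfo and parse_exit are hypothetical names, and the parser assumes the exit text always has the shape reason (file:line: message) seen in the report above.

```rust
/// Parsed form of an exit line such as
/// `scx_bpf_error (src/bpf/main.bpf.c:424: ktstr: host-triggered crash)`.
/// Hypothetical type for illustration; ktstr does not export it.
#[derive(Debug, PartialEq)]
struct ExitInfo<'a> {
    reason: &'a str,
    file: &'a str,
    line: u32,
    message: &'a str,
}

/// Split `reason (file:line: message)` into its parts.
/// Returns None when the text does not follow that assumed shape.
fn parse_exit(text: &str) -> Option<ExitInfo<'_>> {
    let text = text.trim();
    // Everything before the first " (" is the exit reason.
    let (reason, rest) = text.split_once(" (")?;
    let inner = rest.strip_suffix(')')?;
    // The first ": " separates `file:line` from the message;
    // the message itself may contain further ": " sequences.
    let (location, message) = inner.split_once(": ")?;
    // The last ':' in the location separates the file from the line number.
    let (file, line) = location.rsplit_once(':')?;
    Some(ExitInfo {
        reason,
        file,
        line: line.parse().ok()?,
        message,
    })
}
```

Feeding it the text after EXIT: from the report yields the reason scx_bpf_error, the file src/bpf/main.bpf.c, the line 424, and the message ktstr: host-triggered crash.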
Auto-repro
auto_repro defaults to true in #[ktstr_test]. When the
scheduler crashes, ktstr automatically:
- Captures the crash stack trace from the scenario output.
- Boots a second VM with kprobes (kernel functions) and fentry probes (BPF callbacks) on each function in the crash chain, plus a tp_btf/sched_ext_exit tracepoint trigger.
- Reruns the scenario to capture function arguments at each crash point.
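The kernel-function versus BPF-callback split in the second step can be pictured with a small classifier over backtrace frames. This is a sketch under stated assumptions, not ktstr's implementation: ProbeKind, symbol, and classify are hypothetical names, and the only rule used is the kernel's convention that JITed BPF programs appear in backtraces as bpf_prog_<tag>_<name>. How ktstr actually selects and filters targets is not shown here.

```rust
/// Which probe type a frame would be a candidate for, per the split above.
/// Hypothetical type for illustration only.
#[derive(Debug, PartialEq)]
enum ProbeKind {
    /// Ordinary kernel function: candidate for a kprobe.
    Kprobe,
    /// JITed BPF program: candidate for an fentry probe.
    Fentry,
}

/// Strip the `+0x4d/0x1cb` offset suffix from a backtrace frame.
fn symbol(frame: &str) -> &str {
    let frame = frame.trim();
    frame.split_once('+').map_or(frame, |(sym, _)| sym)
}

/// Classify one frame by symbol prefix.
/// Assumption: a `bpf_prog_` prefix marks a BPF program; everything
/// else is treated as a kernel function.
fn classify(frame: &str) -> (&str, ProbeKind) {
    let sym = symbol(frame);
    let kind = if sym.starts_with("bpf_prog_") {
        ProbeKind::Fentry
    } else {
        ProbeKind::Kprobe
    };
    (sym, kind)
}
```

Under these rules, the frame do_pick_task_scx+0x379/0x770 classifies as a kprobe candidate and bpf_prog_1fed99378f3a8055_ktstr_dispatch+0x4d/0x1cb as an fentry candidate. Whether a candidate can actually be attached is a separate question, which the probe pipeline accounting answers.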
The --- auto-repro --- section starts with the probe pipeline’s
own accounting, so you can always see how much of the chain
attached. From the run above:
--- auto-repro ---
--- probe pipeline ---
extracted: 10 functions from crash backtrace
traceable: 7 passed, 3 dropped: bpf_prog_1fed99378f3a8055_ktstr_dispatch, bpf__sched_ext_ops_dispatch, ret_from_fork_asm
bpf_discover: 0 programs found
after_expand: 7 total probe targets
kprobes: 0 attached
trigger: attach failed (skeleton load (retry): No such process (os error 3); original error before retry: No such process (os error 3))
probe_data: 0 keys, 0 unmatched IPs
events: 0 captured, 0 after stitch
repro VM duration: 16.9s
Read the trigger: line honestly: the probe trigger needs the
tp_btf/sched_ext_exit tracepoint, which not every kernel carries —
on this 7.0.14 guest the attach failed, so no per-function argument
events were captured. The section still delivers value: after the
pipeline accounting it appends the repro VM’s diagnostics (in this
run, its sched_ext dump and dmesg tail — a scheduler log and the
freeze-coordinator failure dump appear when the repro run produces
them), and here those reproduced the same crash. On kernels with
the tracepoint, the pipeline instead lists each probed function
with the argument values captured on the way to the crash. Cost:
one extra VM boot plus a scenario replay (16.9 s here). Auto-repro
is skipped when expect_err = true — an expected failure is not
worth a repro VM — and can be turned off with auto_repro = false.
See Auto-Repro for how the two-VM
cycle works and its kernel requirements.
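If you post-process failure output in CI, the pipeline accounting is regular enough to scan for the fields that say whether argument capture worked. The following is a minimal std-only sketch, not a ktstr API: PipelineSummary, leading_number, and summarize are hypothetical names, the key names are taken from the output above, and the only trigger state it recognizes is the attach failed prefix, since that is the only one shown in this recipe.

```rust
/// The fields this sketch reads from the probe pipeline block.
/// Hypothetical type for illustration only.
#[derive(Debug, Default, PartialEq)]
struct PipelineSummary {
    kprobes_attached: u32,
    events_captured: u32,
    trigger_failed: bool,
}

/// Parse the first whitespace-separated token of `s` as a number.
fn leading_number(s: &str) -> Option<u32> {
    s.split_whitespace().next()?.parse().ok()
}

/// Scan `key: value` lines and keep the three fields of interest.
/// Lines without a ':' (such as section headers) are skipped, and
/// unknown keys are ignored.
fn summarize(section: &str) -> PipelineSummary {
    let mut out = PipelineSummary::default();
    for line in section.lines() {
        let Some((key, value)) = line.trim().split_once(':') else {
            continue;
        };
        let value = value.trim();
        match key {
            "kprobes" => out.kprobes_attached = leading_number(value).unwrap_or(0),
            "events" => out.events_captured = leading_number(value).unwrap_or(0),
            "trigger" => out.trigger_failed = value.starts_with("attach failed"),
            _ => {}
        }
    }
    out
}
```

For the run above this reports zero kprobes attached, zero events captured, and a failed trigger, which is the cue to read the appended repro diagnostics rather than look for per-function arguments.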
Pin the bug as a regression test
Once the crash is understood, pin its signature so the same bug
fails the next CI run instead of silently regressing. Two
#[ktstr_test] attributes attach matchers to the captured
scx_bpf_error text (the combined scheduler log and
--- sched_ext dump --- corpus):
- expect_scx_bpf_error_contains = "literal": substring match. Use for the common case of pinning an exact error fragment without escaping regex metacharacters.
- expect_scx_bpf_error_matches = "regex": full regex match via the regex crate. Use for anchored patterns, character classes, and wildcards.
Both require expect_err = true, and both compose with AND
semantics — set both to require both to match. The test body must
actually set up the trigger; here the host-side BpfMapWrite from
the crash above does it:
use ktstr::prelude::*;
// Host writes 1 into the scheduler's `crash` .bss global after
// load; the scheduler's dispatch path reads it and calls
// scx_bpf_error(...) — the crash this test pins.
static BPF_CRASH: BpfMapWrite = BpfMapWrite::new(".bss", "crash", 1);
#[ktstr_test(
    scheduler = MY_SCHED,
    bpf_map_write = BPF_CRASH,
    expect_err = true,
    expect_scx_bpf_error_contains = "host-triggered crash",
    expect_scx_bpf_error_matches = r"src/bpf/main\.bpf\.c:\d+",
)]
fn crash_regression(ctx: &Ctx) -> Result<AssertResult> {
    ktstr::scenario::basic::custom_crash_light(ctx)
}
The test fails if either matcher misses. A passing run means the scheduler still hits the pinned bug; a failure means the error text drifted (update the matcher) or the bug was fixed (delete the regression test).
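The pass and fail cases just described can be modeled in a few lines. This is a std-only sketch of the stated semantics, not ktstr's code: Outcome, has_source_location, and outcome are hypothetical names, and has_source_location is a hand-written stand-in for the regex src/bpf/main\.bpf\.c:\d+ (it accepts ASCII digits only) so that the example needs no external crate.

```rust
/// Possible results of a pinned crash test, per the rules above.
/// Hypothetical type for illustration only.
#[derive(Debug, PartialEq)]
enum Outcome {
    /// Scheduler errored and both matchers hit: the pinned bug still reproduces.
    Pass,
    /// Scheduler errored but at least one matcher missed: the text drifted.
    MatcherMissed,
    /// No scheduler error was captured: expect_err = true is unmet.
    NoError,
}

/// Stand-in for the regex `src/bpf/main\.bpf\.c:\d+`: true when the
/// path is followed by at least one ASCII digit anywhere in the corpus.
fn has_source_location(corpus: &str) -> bool {
    const PREFIX: &str = "src/bpf/main.bpf.c:";
    corpus.match_indices(PREFIX).any(|(at, _)| {
        corpus[at + PREFIX.len()..].starts_with(|c: char| c.is_ascii_digit())
    })
}

/// Combine the substring matcher and the pattern matcher with AND semantics.
/// `corpus` is None when the scheduler did not error at all.
fn outcome(corpus: Option<&str>, needle: &str) -> Outcome {
    match corpus {
        None => Outcome::NoError,
        Some(text) if text.contains(needle) && has_source_location(text) => Outcome::Pass,
        Some(_) => Outcome::MatcherMissed,
    }
}
```

With the exit line from the crash above and the needle host-triggered crash the result is Pass. Changing the message in the scheduler source gives MatcherMissed, and a run with no captured error gives NoError; both of the latter correspond to a failing regression test.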
Regex anchors use string-boundary semantics against the whole
captured corpus: ^/$ match its start and end, and . does not
cross \n. Opt in to line-level anchoring with (?m) and to
dotall with (?s). Pattern-validity edge cases (empty patterns,
trivially-matching patterns, builder vs attribute validation) are
covered in the
#[ktstr_test] reference.