BPF Verifier Sweep

A scheduler that loads on your machine can still be rejected, or attach and then wedge, on a topology you never booted. verified_insns, the number of instructions the verifier processed for a program, varies with topology whenever topology-derived config like nr_cpus is baked into .rodata: the verifier sees different known constants, walks different branches, and can reach a different verdict. The verifier sweep boots every declared scheduler in a KVM VM across a range of topologies and checks three things against the real kernel: the BPF programs verify, the scheduler attaches as the active sched_ext scheduler, and it dispatches an injected workload.

The verifier that runs is the real verifier in the real target kernel — no host-side BPF loading, no version skew. And there is no subprocess to bpftool or veristat: the host reads per-program verified_insns directly from guest memory via bpf_prog_aux introspection, and applies cycle collapse to verifier logs instead of truncating them.

Quick start

# Every declared scheduler, kernel discovered via KTSTR_KERNEL / cache
cargo ktstr verifier

# Pin the kernel
cargo ktstr verifier --kernel ../linux

# Sweep across kernels (each cell runs against its own)
cargo ktstr verifier --kernel 6.14.2 --kernel 7.0

# One scheduler across topologies
cargo ktstr verifier --scheduler scx-ktstr

# Raw verifier log, no cycle collapse
cargo ktstr verifier --raw

See cargo-ktstr verifier for the flag list.

A healthy sweep

Four small cells, one scheduler, one kernel — each cell boots its own VM, loads the scheduler, and confirms attach + dispatch:

cargo ktstr verifier --kernel 7.0 --scheduler ktstr_sched --test kaslr_axis_e2e tiny-1llc tiny-2llc odd-3llc smt-2llc
cargo ktstr: resolved kernel "7.0"
cargo ktstr verifier: dispatching to nextest (verifier/ cells only) on 1 resolved kernel(s) forwarding to nextest: --test kaslr_axis_e2e tiny-1llc tiny-2llc odd-3llc smt-2llc
...
    Starting 4 tests across 1 binary (55 tests skipped)
        PASS [  12.406s] (1/4) ktstr::kaslr_axis_e2e verifier/ktstr_sched/kernel_7_0/odd-3llc
        PASS [  12.432s] (2/4) ktstr::kaslr_axis_e2e verifier/ktstr_sched/kernel_7_0/smt-2llc
        PASS [  12.656s] (3/4) ktstr::kaslr_axis_e2e verifier/ktstr_sched/kernel_7_0/tiny-1llc
        PASS [  12.929s] (4/4) ktstr::kaslr_axis_e2e verifier/ktstr_sched/kernel_7_0/tiny-2llc
────────────
     Summary [  12.929s] 4 tests run: 4 passed, 55 skipped

verifier verified_insns (per scheduler; rows: kernel, cols: BPF program, cell: range across topologies):

ktstr_sched:
 kernel      ktstr_dispatch  ktstr_dump  ktstr_dump_cpu  ktstr_dump_task  ktstr_enqueue  ktstr_exit  ktstr_exit_task  ktstr_init  ktstr_init_task  ktstr_select_cp  ktstr_yield
 kernel_7_0  102             81          13              70               74             25          419              2296        29077            39               8

verifier summary: 4 ✅  0 ❌  0 🇽
 topology   ktstr_sched
 odd-3llc   ✅
 smt-2llc   ✅
 tiny-1llc  ✅
 tiny-2llc  ✅

A cell in the verified_insns table shows a single number when the count is flat across topologies, lo..hi when it varies, and - when that program reported no stats on that kernel. In the grid, ✅ means the scheduler verified, attached, and dispatched on every kernel that ran the cell; ❌ means it failed on every kernel; 🇽 means mixed results across kernels (the 🇽 glyph renders inconsistently in some terminal fonts — the failing-combinations list below the grid is the authoritative record). This 4-cell sweep ran its VMs in parallel and finished in about 13 seconds of test time.
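The rendering rules above are small enough to sketch. This is an illustration only, not ktstr's code: the function names and the use of None for "no stats" are assumptions made for the example.

```python
def format_cell(counts):
    """Render one verified_insns table cell from per-topology counts.

    `counts` has one entry per topology; None marks a topology where the
    program reported no stats (a hypothetical representation).
    """
    seen = [c for c in counts if c is not None]
    if not seen:
        return "-"                      # no stats on this kernel
    lo, hi = min(seen), max(seen)
    return str(lo) if lo == hi else f"{lo}..{hi}"


def grid_glyph(results):
    """Map the per-kernel pass/fail booleans of one grid cell to a glyph."""
    if not results:
        raise ValueError("a grid cell needs at least one kernel result")
    if all(results):
        return "✅"                      # passed on every kernel
    if not any(results):
        return "❌"                      # failed on every kernel
    return "🇽"                          # mixed results across kernels
```

For the sweep above, format_cell([102, 102, 102, 102]) yields "102", matching the ktstr_dispatch column. A program whose count moved between topologies would render as a lo..hi range instead.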

What a cell checks

  1. Verify — inside the VM the scheduler loads its BPF programs; the target kernel’s verifier runs against them. The host reads per-program verified_insns from bpf_prog_aux via guest memory introspection. On load failure, libbpf’s verifier log is forwarded to the host.
  2. Attach (positive confirmation) — the guest confirms the scheduler process survived load and /sys/kernel/sched_ext/state reached enabled. The kernel sets enabled only after ops.init, per-task init, and switching eligible tasks to the sched_ext class, so this proves the scheduler is scheduling, not merely that its BPF loaded. Attach is confirmed only when the guest reaches its post-attach dispatch phase — a guest that vanishes early (e.g. a panic before any frame is emitted) fails rather than passing by default.
  3. Dispatch probe — the verifier VM has no #[ktstr_test] body, so it injects a SpinWait workload sized to the guest’s online CPUs, running as SCHED_EXT. A cell passes only when a worker makes forward progress after attach: a scheduler that attaches but never dispatches a runnable task is a distinct, worse failure the attach gate alone cannot catch.
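The attach gate in step 2 rests on the state file the kernel exposes. A minimal sketch of that check, assuming a plain poll with a deadline, looks like the following. The function names, timeout, and interval are hypothetical, and ktstr's guest-side implementation may differ.

```python
import time
from pathlib import Path

STATE_PATH = "/sys/kernel/sched_ext/state"


def sched_ext_enabled(state_path=STATE_PATH):
    """True only when the state file exists and reads exactly 'enabled'."""
    try:
        return Path(state_path).read_text().strip() == "enabled"
    except OSError:
        return False                    # missing file counts as not attached


def wait_for_enabled(state_path=STATE_PATH, timeout=10.0, interval=0.1):
    """Poll until the state reaches 'enabled' or the deadline passes."""
    deadline = time.monotonic() + timeout
    while True:
        if sched_ext_enabled(state_path):
            return True
        if time.monotonic() >= deadline:
            return False                # fail closed: no positive confirmation
        time.sleep(interval)
```

Failing closed on a missing or unreadable file mirrors the rule above: anything short of a positive confirmation is a failure, never a default pass.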

Every cell boots with performance mode disabled (no_perf_mode) — verified_insns is perf-mode-independent, so cells share LLC reservations instead of serializing on them.

A real rejection

The fixture scheduler ships rejection knobs (see Fixture knobs) precisely so this path stays exercised. Here --verify-loop plants an unrolled loop ending in a store through a null pointer: the verifier walks the loop, then rejects the store. Note the collapse markers: of the eight iterations, only the first and last are printed, and the six in between are replaced by an omission count:

cargo ktstr verifier --kernel 7.0 --scheduler ktstr_broken --test verifier_pipeline tiny-1llc
=== ktstr_broken | kernel kernel_7_0 | topology tiny-1llc ===

verifier
  scheduler: NOT ATTACHED — scheduler process exited during BPF load/startup

verifier --- verifier stats ---
  processed=186  states=7/7

verifier --- scheduler log ---
Global function ktstr_dispatch() doesn't return scalar. Only those are supported.
0: R1=ctx() R10=fp0
; if (crash) @ main.bpf.c:423
0: (18) r1 = 0xff5d3bb3000f60dc       ; R1=map_value(map=bpf_bpf.bss,ks=4,vs=280,off=220)
...
; volatile u32 acc = 0; @ main.bpf.c:450
37: (63) *(u32 *)(r10 -8) = r1        ; R1=0 R10=fp0 fp-8=mmmm0
--- 8x of the following 25 lines ---
; u64 t = bpf_ktime_get_ns(); @ main.bpf.c:453
38: (85) call bpf_ktime_get_ns#5      ; R0=scalar()
; acc += (u32)t; @ main.bpf.c:454
39: (61) r1 = *(u32 *)(r10 -8)        ; R1=0 R10=fp0 fp-8=mmmm0
...
--- 6 identical iterations omitted ---
; u64 t = bpf_ktime_get_ns(); @ main.bpf.c:453
171: (85) call bpf_ktime_get_ns#5     ; R0=scalar()
...
--- end repeat ---
190: (b7) r1 = 0                      ; R1=0
; *p = (int)acc; @ main.bpf.c:464
...
192: (63) *(u32 *)(r1 +0) = r2
R1 invalid mem access 'scalar'
processed 186 insns (limit 1000000) max_states_per_insn 0 total_states 7 peak_states 7 mark_read 0
...
verifier summary: 0 ✅  1 ❌  0 🇽
 topology   ktstr_broken
 tiny-1llc  ❌

failing combinations (scheduler / kernel / topology):
  ktstr_broken / kernel_7_0 / tiny-1llc
error: cargo nextest run exited with 100

The interleaved ; source line @ file:line comments name the C statement each instruction group came from — the offending store is *p = (int)acc; at main.bpf.c:464.
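Both the source annotations and the trailing "processed ... insns" line are regular enough to pull apart mechanically. The sketch below is an illustrative log reader, not part of ktstr. It assumes the line shapes seen in the capture above, and the summarize name is made up for the example.

```python
import re

# "; <C statement> @ <file>:<line>"
SOURCE_RE = re.compile(r"^; (?P<stmt>.*) @ (?P<file>[^\s:]+):(?P<line>\d+)$")
# "processed N insns (limit M) key value key value ..."
STATS_RE = re.compile(r"^processed (\d+) insns \(limit (\d+)\)(.*)$")


def summarize(log_lines):
    """Return (last_source, stats) for one verifier log.

    last_source is the (statement, file, line) of the final source
    annotation; in a rejection log that is the statement the failing
    instruction was compiled from.
    """
    last_source, stats = None, None
    for line in log_lines:
        m = SOURCE_RE.match(line)
        if m:
            last_source = (m["stmt"], m["file"], int(m["line"]))
            continue
        m = STATS_RE.match(line)
        if m:
            stats = {"processed": int(m.group(1)), "limit": int(m.group(2))}
            rest = m.group(3).split()   # remaining "key value" pairs
            stats.update({k: int(v) for k, v in zip(rest[0::2], rest[1::2])})
    return last_source, stats
```

Parsing the trailing counters as generic key/value pairs keeps the sketch from hard-coding the exact set of fields the kernel prints.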

Cycle collapse

The kernel verifier unrolls loops, re-verifying each instruction with updated register state. A bounded 8-instruction loop verified 100 times produces 800 near-identical lines that differ only in register-state annotations; naive truncation loses the context you came for. Cycle collapse keeps the structure: first iteration (what the loop does), an omission count, last iteration (final state).

The algorithm normalizes lines by stripping register-state annotations (source comments are preserved as anchors), finds the most frequent normalized line to establish the cycle period (minimum period 5 lines, minimum 3 repetitions), verifies consecutive blocks match, and collapses — iterating up to 5 passes for nested loops. --raw skips all of this and prints the full log.
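A simplified, single-pass sketch of that procedure follows. It is not ktstr's implementation, and it rests on several assumptions. Normalization here also drops the leading instruction index, because the unrolled capture above shows the same body at different indices. The period is taken from the first gap between occurrences of the most frequent line. Nested loops, which the real algorithm handles with repeated passes, are out of scope.

```python
import re
from collections import Counter

MIN_PERIOD = 5   # minimum cycle length in lines
MIN_REPS = 3     # minimum repetitions before collapsing


def normalize(line):
    """Strip register-state annotations; keep source comments as anchors."""
    if line.startswith(";"):
        return line
    body = re.split(r"\s+; ", line, maxsplit=1)[0].rstrip()
    return re.sub(r"^\d+: ", "", body)  # assumption: ignore the insn index


def find_cycle(norm):
    """Return (start, period, reps) for a repeated block, or None."""
    if not norm:
        return None
    anchor, freq = Counter(norm).most_common(1)[0]
    if freq < MIN_REPS:
        return None
    positions = [i for i, text in enumerate(norm) if text == anchor]
    period = positions[1] - positions[0]
    if period < MIN_PERIOD:
        return None
    # The anchor may sit mid-block, so try each start up to its first hit.
    for start in range(positions[0] + 1):
        block = norm[start:start + period]
        reps = 1
        while norm[start + reps * period:start + (reps + 1) * period] == block:
            reps += 1
        if reps >= MIN_REPS:
            return start, period, reps
    return None


def collapse(lines):
    """Keep the first and last iteration of one cycle; count the rest."""
    cycle = find_cycle([normalize(line) for line in lines])
    if cycle is None:
        return list(lines)
    start, period, reps = cycle
    end = start + reps * period
    return (
        list(lines[:start])
        + [f"--- {reps}x of the following {period} lines ---"]
        + list(lines[start:start + period])
        + [f"--- {reps - 2} identical iterations omitted ---"]
        + list(lines[end - period:end])
        + ["--- end repeat ---"]
        + list(lines[end:])
    )
```

One known weakness of this sketch: if the most frequent line occurs more than once per iteration, the first gap underestimates the period and no collapse happens. The block-match check only guarantees that a collapse, when it occurs, is over truly repeating content.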

Matrix dimensions and filters

The sweep matrix is (declared scheduler × kernel × topology preset). Schedulers come from the declare_scheduler! registry (--scheduler NAME narrows to one; EEVDF and kernel-builtin declarations are skipped — no userspace binary to verify). Kernels come from the operator’s --kernel set; with no flag, one auto-discovered kernel is used. The topology axis is the set of gauntlet presets each scheduler’s constraints accept.

Each scheduler’s kernels = [...] declaration filters the operator-supplied kernel set:

  • kernels = [] (or omitted) — accepts every kernel-list entry.
  • Version specs ("6.14.2") — match entries whose label equals the version (raw or sanitized form).
  • Range specs ("6.14..6.16", "6.14..=6.16") — match entries whose version falls in the inclusive range.
  • Path / cache-key / git specs — match by sanitized-label equality.
# Scheduler declares kernels = ["6.14..6.16"]
# Operator passes 6.14.2, 6.15.0, 6.17.0 — the third is filtered out.
# Cells emitted per accepted preset:
#   verifier/<sched>/kernel_6_14_2/<preset>
#   verifier/<sched>/kernel_6_15_0/<preset>
cargo ktstr verifier --kernel 6.14.2 --kernel 6.15.0 --kernel 6.17.0

A cell whose kernel label matches nothing in the resolved set errors with a diagnostic naming the present labels — no silent fallback to an unrelated kernel.
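The version and range rules can be sketched as a small predicate. This is an illustration under stated assumptions, not ktstr's matcher. It handles dotted version strings only (path, cache-key, and git specs are left out). It compares each range bound at the bound's own precision, so 6.16.3 counts as inside "6.14..6.16"; the source does not say how patch levels at the upper bound are treated. The names accepts and matches are hypothetical.

```python
def parse_version(text):
    """'6.14.2' -> (6, 14, 2)."""
    return tuple(int(part) for part in text.split("."))


def matches(spec, entry):
    """True when one kernels = [...] spec accepts one operator entry."""
    if ".." in spec:
        lo, hi = spec.split("..", 1)
        if hi.startswith("="):
            hi = hi[1:]                 # "..=" and ".." are both inclusive
        lo_v, hi_v = parse_version(lo), parse_version(hi)
        v = parse_version(entry)
        # Assumption: truncate the entry to each bound's precision.
        return v[:len(lo_v)] >= lo_v and v[:len(hi_v)] <= hi_v
    return entry == spec                # plain version spec: label equality


def accepts(specs, entry):
    """An empty or omitted kernels list accepts every entry."""
    return not specs or any(matches(spec, entry) for spec in specs)
```

With the declaration from the example above, filtering ["6.14.2", "6.15.0", "6.17.0"] through accepts(["6.14..6.16"], ...) keeps the first two entries and drops 6.17.0.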

Runtime: total cost is one VM boot per cell — schedulers × kernels × accepted presets. Cells run in parallel under nextest; the 4-cell example above cost ~13 s.
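To estimate a sweep before launching it, the cell list can be enumerated up front. The sketch below assumes kernel labels are sanitized by replacing dots with underscores under a kernel_ prefix, which matches the labels in the output above but is not confirmed as the full rule. The function names are hypothetical.

```python
from itertools import product


def sanitize_kernel(label):
    """Assumed sanitization: '6.14.2' -> 'kernel_6_14_2'."""
    return "kernel_" + label.replace(".", "_")


def cells(presets_by_scheduler, kernels):
    """Yield one cell name per (scheduler, kernel, accepted preset)."""
    for sched, presets in presets_by_scheduler.items():
        for kernel, preset in product(kernels, presets):
            yield f"verifier/{sched}/{sanitize_kernel(kernel)}/{preset}"


if __name__ == "__main__":
    accepted = {"ktstr_sched": ["tiny-1llc", "tiny-2llc", "odd-3llc", "smt-2llc"]}
    names = list(cells(accepted, ["7.0"]))
    print(len(names), "VM boots")
    for name in names:
        print(name)
```

Each name corresponds to one VM boot, so the length of the list is the cost of the sweep in boots. Adding a second kernel to the same example doubles it to eight.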

Fixture knobs

The scx-ktstr fixture scheduler ships two flags that make the rejection path testable on demand:

  • --fail-verify: sets a .rodata variable before scx_ops_load! that enables a store through a null pointer in ktstr_dispatch, the invalid access the verifier rejects.
  • --verify-loop: same rejection, preceded by an unrolled 8-iteration loop so the log exercises cycle collapse. It is deliberately not a while(1): the verifier’s infinite-loop analysis could keep scx_ops_load! from returning within the host’s scheduler-attach poll.

Pass them via sched_args on a scratch declare_scheduler! — that is exactly how the rejection capture above was produced.