Runs and Regression Gates
Every test run writes machine-readable results — one JSON sidecar
per test, grouped into a run directory keyed by (kernel, project
commit). That makes “did my change regress anything?” a one-command
question: cargo ktstr perf-delta pairs two commits’ sidecars and
fails the build when metrics regress past their gates.
The workflow
- Run tests. Each invocation writes sidecars into target/ktstr/{kernel}-{project_commit}/:
    cargo ktstr test --kernel 6.14
- List runs:
    $ cargo ktstr stats list
    RUN                   TESTS  DATE                  ARCH
    7.0.14-73730e0-dirty  1      2026-07-04T23:28:34Z  x86_64
    7.1.0-73730e0-dirty   0      -                     -
    7.1.1-73730e0-dirty   0      -                     -
    7.0.0-73730e0-dirty   1      2026-07-04T22:16:24Z  x86_64
    7.1.0-73730e0         6      2026-07-04T21:41:43Z  x86_64
  Rows sort by directory mtime, most recent first. DATE is the run’s first sidecar timestamp; ARCH comes from the first sidecar with host context (- when none has one).
- Gate a change against a baseline commit:
    cargo ktstr perf-delta --noise-adjust 5 --kernel 6.14                   # HEAD vs merge-base(HEAD, main)
    cargo ktstr perf-delta --base abc1234                                   # vs an explicit commit, cached sidecars
    cargo ktstr perf-delta --noise-adjust 5 --kernel 6.14 -E cgroup_steady  # narrow the perf set
  The canonical WIP-vs-baseline pattern: run perf-delta --base abc1234 from a -dirty working tree against the clean commit you edited from.
- Print analysis of the most recent run (gauntlet outliers, BPF verifier stats, callback profile, KVM stats):
    cargo ktstr stats
- Inspect a run’s archived host context (the fingerprint the host-delta comparison uses: CPU identity, THP policy, sched sysctls):
    cargo ktstr stats show-host --run 6.14-abc1234
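The two ordering rules for stats list (newest directory first, ARCH from the first sidecar that has host context) can be pictured with a small Rust sketch. It is illustrative only, not ktstr's code: the RunDir type, its fields, and both helpers are hypothetical, and the directory walk that would populate them is omitted.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Hypothetical view of one run directory, as a listing tool might gather it.
struct RunDir {
    name: String,
    mtime: SystemTime,
    /// Host arch recorded by each sidecar, in sidecar order;
    /// None for a sidecar that carries no host context.
    sidecar_arches: Vec<Option<String>>,
}

/// Most recently modified directory first.
fn sort_runs(runs: &mut [RunDir]) {
    runs.sort_by(|a, b| b.mtime.cmp(&a.mtime));
}

/// ARCH column: the first sidecar that has host context, else "-".
fn arch_column(run: &RunDir) -> String {
    run.sidecar_arches
        .iter()
        .flatten() // skip sidecars without host context
        .next()
        .cloned()
        .unwrap_or_else(|| "-".to_string())
}

fn main() {
    let at = |secs| UNIX_EPOCH + Duration::from_secs(secs);
    let mut runs = vec![
        RunDir { name: "7.1.0-73730e0".into(), mtime: at(100), sidecar_arches: vec![Some("x86_64".into())] },
        RunDir { name: "7.1.1-73730e0-dirty".into(), mtime: at(300), sidecar_arches: vec![] },
        RunDir { name: "7.0.0-73730e0-dirty".into(), mtime: at(200), sidecar_arches: vec![None, Some("x86_64".into())] },
    ];
    sort_runs(&mut runs);
    for r in &runs {
        println!("{:<22}{}", r.name, arch_column(r));
    }
}
```

An empty run directory has no sidecars at all, so it falls through to "-", the same way the 0-test rows in the listing show "-" for DATE and ARCH.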
perf-delta
perf-delta compares performance_mode test metrics between HEAD
and a baseline commit, per scenario, using the metric registry’s
polarity and thresholds (enumerate it with
cargo ktstr stats list-metrics — see
Assertable Metrics). Output is
one row per compared metric with the baseline and HEAD values,
colored red for regressions and green for improvements. The command
exits non-zero when the failure gate trips: either the number of
regressed metrics reaches --fail-threshold (5 by default, so a lone
noisy regression does not turn CI red), or any metric named in
--must-fail regresses. If the baseline produced no performance_mode
sidecars at all, it prints a notice and exits 0: an empty perf set
means “nothing to compare”, not a failure.
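A minimal sketch of the exit decision described above, assuming each metric has already been classified as regressed or not by polarity and thresholds. The MetricResult type, should_fail, and the metric names in main are hypothetical; ktstr's own types are not shown here.

```rust
use std::collections::HashSet;

/// Default for --fail-threshold, per the text above.
const DEFAULT_FAIL_THRESHOLD: usize = 5;

/// Hypothetical per-metric verdict after polarity and gates are applied.
struct MetricResult<'a> {
    name: &'a str,
    regressed: bool,
}

/// True when the command should exit non-zero.
fn should_fail(results: &[MetricResult], fail_threshold: usize, must_fail: &HashSet<&str>) -> bool {
    let mut count = 0;
    for r in results.iter().filter(|r| r.regressed) {
        if must_fail.contains(r.name) {
            // A --must-fail metric regressed: fail regardless of the count.
            return true;
        }
        count += 1;
    }
    count >= fail_threshold
}

fn main() {
    let results = [
        MetricResult { name: "throughput", regressed: true },
        MetricResult { name: "p99_latency", regressed: false },
    ];
    let none = HashSet::new();
    let must: HashSet<&str> = ["throughput"].into_iter().collect();
    println!("default gate: {}", should_fail(&results, DEFAULT_FAIL_THRESHOLD, &none));
    println!("--must-fail throughput: {}", should_fail(&results, DEFAULT_FAIL_THRESHOLD, &must));
}
```

With the default threshold, an empty result set never trips the gate, which is consistent with the exit-0 behaviour for an empty perf set.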
Baseline resolution (highest precedence first):
- --base <commit>: compare HEAD directly against this commit-ish, no merge-base.
- --base-ref <ref>: compare against merge-base(HEAD, <ref>).
- $GITHUB_BASE_REF (set on pull_request events): compare against merge-base(HEAD, origin/<ref>).
- Otherwise, merge-base(HEAD, main) (override the branch with --default-branch).
The resolved baseline is shortened to the 7-hex-digit form that sidecars record, and the command bails if it resolves to HEAD (nothing to compare).
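The precedence list reads naturally as a chain of early returns. This Rust sketch is hypothetical (resolve_baseline, Baseline, short_hex and ensure_distinct are illustrative names, not ktstr API). It stops short of actually running git merge-base, and treating an empty $GITHUB_BASE_REF as unset is an assumption.

```rust
/// Which commit-ish HEAD is compared against.
#[derive(Debug, PartialEq)]
enum Baseline {
    /// --base: compare directly, no merge-base.
    Direct(String),
    /// Compare against merge-base(HEAD, <ref>).
    MergeBase(String),
}

/// Precedence: --base, --base-ref, $GITHUB_BASE_REF (as origin/<ref>), default branch.
fn resolve_baseline(
    base: Option<&str>,
    base_ref: Option<&str>,
    github_base_ref: Option<&str>,
    default_branch: &str,
) -> Baseline {
    if let Some(commit) = base {
        return Baseline::Direct(commit.to_string());
    }
    if let Some(r) = base_ref {
        return Baseline::MergeBase(r.to_string());
    }
    // Assumption: an empty $GITHUB_BASE_REF is treated as unset.
    if let Some(r) = github_base_ref.filter(|r| !r.is_empty()) {
        return Baseline::MergeBase(format!("origin/{r}"));
    }
    Baseline::MergeBase(default_branch.to_string())
}

/// Shorten a resolved SHA to the 7-hex-digit form sidecars record.
fn short_hex(sha: &str) -> &str {
    &sha[..sha.len().min(7)]
}

/// Refuse to compare a commit with itself (illustrative message).
fn ensure_distinct(base_sha: &str, head_sha: &str) -> Result<(), String> {
    if base_sha == head_sha {
        Err(format!("baseline {} is HEAD; nothing to compare", short_hex(base_sha)))
    } else {
        Ok(())
    }
}

fn main() {
    let gh = std::env::var("GITHUB_BASE_REF").ok();
    println!("{:?}", resolve_baseline(None, None, gh.as_deref(), "main"));
}
```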
Two ways to get the baseline’s numbers:
- Cached (default): both sides’ sidecars must already be in
the pool (a prior run, or a CI artifact you downloaded).
perf-delta only resolves the pair and compares, applying --threshold PCT (a uniform gate) or --policy PATH (per-metric JSON) over the registry defaults.
- --noise-adjust N (requires --kernel, N ≥ 2): produces both sides fresh. It checks the baseline and HEAD out into scratch checkouts, runs each side’s performance_mode tests N times, and gates on the observed spread. A regression counts only when the two sides are separated (a two-sided Welch t-test at α = 0.05, or fully disjoint min–max bands) and material (past the registry’s absolute + relative dual gate). This is the mode to trust on a noisy machine; budget N × the per-side wall time for it.
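The separated-and-material rule for --noise-adjust can be sketched in plain Rust. This is an illustration under stated assumptions, not ktstr's implementation: the caller supplies the two-sided α = 0.05 critical value t_crit for the Welch–Satterthwaite degrees of freedom (no t-distribution is computed here), the dual gate is read as "the mean shift must clear both an absolute floor and a relative percentage", and all function names are hypothetical.

```rust
fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

/// Sample variance with the n - 1 denominator (needs n >= 2, hence N >= 2).
fn variance(xs: &[f64]) -> f64 {
    let m = mean(xs);
    xs.iter().map(|x| (x - m).powi(2)).sum::<f64>() / (xs.len() as f64 - 1.0)
}

/// Welch's t statistic and Welch-Satterthwaite degrees of freedom.
fn welch(a: &[f64], b: &[f64]) -> (f64, f64) {
    let va = variance(a) / a.len() as f64;
    let vb = variance(b) / b.len() as f64;
    let t = (mean(a) - mean(b)) / (va + vb).sqrt();
    let df = (va + vb).powi(2)
        / (va.powi(2) / (a.len() as f64 - 1.0) + vb.powi(2) / (b.len() as f64 - 1.0));
    (t, df)
}

/// True when the min-max bands do not overlap at all.
fn disjoint(a: &[f64], b: &[f64]) -> bool {
    let max = |xs: &[f64]| xs.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let min = |xs: &[f64]| xs.iter().cloned().fold(f64::INFINITY, f64::min);
    max(a) < min(b) || max(b) < min(a)
}

/// Separated: disjoint bands, or |t| beyond the supplied critical value.
fn separated(a: &[f64], b: &[f64], t_crit: f64) -> bool {
    if disjoint(a, b) {
        return true;
    }
    let (t, _df) = welch(a, b);
    t.is_finite() && t.abs() > t_crit // NaN when both sides are identical constants
}

/// Assumed dual gate: clear both an absolute floor and a relative percentage.
fn material(base: f64, head: f64, abs_gate: f64, rel_pct: f64) -> bool {
    let delta = (head - base).abs();
    delta >= abs_gate && delta >= base.abs() * rel_pct / 100.0
}

fn is_regression(base: &[f64], head: &[f64], higher_is_better: bool,
                 t_crit: f64, abs_gate: f64, rel_pct: f64) -> bool {
    let (mb, mh) = (mean(base), mean(head));
    let worse = if higher_is_better { mh < mb } else { mh > mb };
    worse && separated(base, head, t_crit) && material(mb, mh, abs_gate, rel_pct)
}

fn main() {
    let base = [100.0, 101.0, 99.0, 100.5, 99.5];
    let head = [90.0, 91.0, 89.0, 90.5, 89.5];
    let (t, df) = welch(&base, &head);
    println!("t = {t:.2}, df = {df:.1}");
    // 2.306: two-sided 0.05 critical value of Student's t for 8 df.
    println!("regression: {}", is_regression(&base, &head, true, 2.306, 1.0, 5.0));
}
```

With N = 5 runs per side and equal spread, the Welch–Satterthwaite df comes out at exactly 8 here, which is why 2.306 is the critical value passed in. Requiring both separation and materiality means a large but noisy shift and a tiny but consistent shift are each ignored.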
perf-delta compares on the commit axis. A cross-config question —
scheduler A vs scheduler B at the same commit — is answered in-test
(see Compare a Scheduler vs EEVDF);
the worked A/B walkthrough with real gates is
A/B Compare Branches, and the CI
perf-gate job lives in CI.
Run directories
target/
└── ktstr/
├── 6.14-abc1234/ # kernel 6.14, project commit abc1234 (clean)
│ ├── test_a.ktstr.json
│ └── test_b.ktstr.json
└── 7.0-def5678-dirty/ # kernel 7.0, commit def5678 + uncommitted changes
├── test_a.ktstr.json
└── test_b.ktstr.json
The key is {kernel}-{project_commit}: the resolved kernel version,
plus the project tree’s HEAD short hex, suffixed -dirty when the
worktree differs from HEAD.
The commit is discovered from the test process’s working directory — for a scheduler crate using ktstr as a dev-dependency, that is the scheduler crate’s commit, not ktstr’s. Run from whichever clone you want the run keyed on.
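A sketch of how such a key could be composed, assuming git is on PATH. run_key and discover_commit are hypothetical helpers, not ktstr API, and treating "differs from HEAD" as tracked changes only (git diff --quiet HEAD, which ignores untracked files) is an assumption; ktstr may define dirtiness differently. As described above, the commit is looked up from the current working directory, and outside a repository the slot becomes the literal unknown.

```rust
use std::process::{Command, Output};

/// Build the run-directory key {kernel}-{project_commit}.
fn run_key(kernel: &str, commit: Option<&str>, dirty: bool) -> String {
    match commit {
        Some(c) if dirty => format!("{kernel}-{c}-dirty"),
        Some(c) => format!("{kernel}-{c}"),
        // Not a git repository: the commit slot is the literal "unknown".
        None => format!("{kernel}-unknown"),
    }
}

fn git(args: &[&str]) -> Option<Output> {
    Command::new("git").args(args).output().ok()
}

/// HEAD short hex plus a dirty flag for the current working directory;
/// None when git is missing or this is not a repository.
fn discover_commit() -> Option<(String, bool)> {
    let head = git(&["rev-parse", "--short=7", "HEAD"])?;
    if !head.status.success() {
        return None;
    }
    let hex = String::from_utf8_lossy(&head.stdout).trim().to_string();
    // Assumption: `git diff --quiet HEAD` exits 1 when tracked files differ.
    let dirty = git(&["diff", "--quiet", "HEAD"]).map_or(false, |o| o.status.code() == Some(1));
    Some((hex, dirty))
}

fn main() {
    // "6.14" stands in for the resolved kernel version.
    let key = match discover_commit() {
        Some((hex, dirty)) => run_key("6.14", Some(hex.as_str()), dirty),
        None => run_key("6.14", None, false),
    };
    println!("target/ktstr/{key}/");
}
```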
Warning
A run directory is a last-writer-wins snapshot, not an archive. Re-running the suite at the same kernel and project commit pre-clears the prior sidecars at the new run’s first write. To preserve a run, move the directory out of the runs root (mv target/ktstr/6.14-abc1234 ~/ktstr-archives/...), since a sibling inside target/ktstr/ would still be walked by stats list, or commit your changes so the next run lands under a new key.
Pre-clear is shallow: only *.ktstr.json files at the top level are
removed. Subdirectories created by external orchestrators (per-job
gauntlet layouts) are left alone but still read by stats, so clean
those yourself when reusing them.
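Shallow pre-clear amounts to a single non-recursive directory scan. A minimal sketch, assuming a plain file-name suffix match on .ktstr.json; pre_clear and the gauntlet-job-1 directory are hypothetical names, not ktstr's code.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Remove only top-level *.ktstr.json files in run_dir. Subdirectories and
/// any other files are left alone. Returns how many sidecars were removed.
fn pre_clear(run_dir: &Path) -> io::Result<usize> {
    let mut removed = 0;
    for entry in fs::read_dir(run_dir)? {
        let entry = entry?;
        let is_sidecar = entry.file_type()?.is_file()
            && entry.file_name().to_string_lossy().ends_with(".ktstr.json");
        if is_sidecar {
            fs::remove_file(entry.path())?;
            removed += 1;
        }
    }
    Ok(removed)
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir().join(format!("ktstr-preclear-demo-{}", std::process::id()));
    fs::create_dir_all(dir.join("gauntlet-job-1"))?;
    fs::write(dir.join("test_a.ktstr.json"), "{}")?;
    fs::write(dir.join("gauntlet-job-1").join("test_b.ktstr.json"), "{}")?;
    println!("removed {} top-level sidecar(s)", pre_clear(&dir)?);
    // The nested sidecar survives, and would still be read by stats.
    assert!(dir.join("gauntlet-job-1").join("test_b.ktstr.json").exists());
    fs::remove_dir_all(&dir)
}
```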
Inspecting sidecars
Each sidecar records the test name, topology, scheduler, work type, verdict, per-cgroup stats, monitor summary, verifier and KVM stats, kernel version, host context, and timestamps. Discovery tooling:
- cargo ktstr stats list-values: the distinct values per filterable dimension (kernel, commit, scheduler, topology, work type, …) across the pool; the upstream answer to “what have I got?” before narrowing a perf-delta.
- cargo ktstr stats list-metrics: the regression metric registry (names, polarity, default gates, units).
- cargo ktstr stats explain-sidecar --run ID: why optional fields are absent, per sidecar, with a fix when one exists:
    walked 1 sidecar file(s), parsed 1 valid
    test: throughput_gate  topology: 1n1l2c1t  scheduler: my_sched
    ...
    populated optional fields (8): resolve_source, project_commit, monitor, kvm_stats, kernel_version, host, cleanup_duration_ms, run_source
    none fields (3):
      scheduler_commit [expected]
        - no SchedulerSpec variant currently exposes a reliable commit source — reserved on the schema for future enrichment (e.g. --version probe or ELF-note read on the resolved scheduler binary)
      payload [expected]
        - test declared no binary payload (scheduler-only test or pure-scenario test that never invokes ctx.payload(...))
      kernel_commit [actionable]
        - KTSTR_KERNEL is unset or empty
        ...
        fix: set KTSTR_KERNEL to a local kernel source tree that is a git repository (e.g. a git clone of the kernel)
  expected means None is the steady state; actionable means a different environment would populate the field. --json emits an aggregate object for dashboards.
stats list-values, show-host, and explain-sidecar all take
--dir DIR to point at an archived sidecar tree copied off a CI
host.
Environment notes
- Local filesystem required. The runs root must live on ext4 / xfs / btrfs / tmpfs — the advisory lock that serializes concurrent sidecar writes rejects NFS and other remote filesystems.
- Non-git runs collide. When the test process is not in a git
repository, the commit slot is the literal
unknown, and every such run shares {kernel}-unknown (with pre-clear between them). Set KTSTR_SIDECAR_DIR or put the tree under git to disambiguate. The sidecar’s own project_commit field stays null for these runs; the dirname sentinel and the JSON field intentionally diverge.
- KTSTR_SIDECAR_DIR overrides the sidecar directory itself (used as-is, no key suffix) for writes and for bare cargo ktstr stats reads. Pre-clear is skipped under the override: you chose the directory, so you own its contents. The stats list / list-values / show-host subcommands do not consult it; use --dir.
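The write-side rule amounts to a two-way choice between the override and the keyed default. This is a hypothetical helper, not ktstr's code, and treating an empty KTSTR_SIDECAR_DIR as unset is an assumption:

```rust
use std::path::PathBuf;

/// Where this run's sidecars go, and whether pre-clear applies.
fn sidecar_dir(override_dir: Option<&str>, runs_root: &str, key: &str) -> (PathBuf, bool) {
    match override_dir.filter(|d| !d.is_empty()) {
        // KTSTR_SIDECAR_DIR: used as-is, no key suffix, no pre-clear.
        Some(dir) => (PathBuf::from(dir), false),
        // Default: <runs root>/<kernel>-<commit>, pre-cleared on first write.
        None => (PathBuf::from(runs_root).join(key), true),
    }
}

fn main() {
    let env = std::env::var("KTSTR_SIDECAR_DIR").ok();
    let (dir, pre_clear) = sidecar_dir(env.as_deref(), "target/ktstr", "6.14-abc1234");
    println!("{} (pre-clear: {pre_clear})", dir.display());
}
```

Because the override path carries no key suffix, two different commits written under the same KTSTR_SIDECAR_DIR land side by side rather than in separate run directories.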
Failure artifacts
Failed tests additionally write a failure-dump JSON next to their sidecar — see Reading Failure Output for the path scheme, the placeholder-vs-full distinction, and the investigation workflow.