Module workload


Worker process management and telemetry.

Workers are fork()ed processes by default (CloneMode::Fork, the #[default]) so each can be placed in its own cgroup; CloneMode::Thread instead uses std::thread::spawn, so those workers share the parent’s tgid, address space, and signal-handler table. Key types:

See the WorkSpec Types and Worker Processes chapters of the guide.
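The thread-mode claim above (workers created with std::thread::spawn share the parent’s tgid) can be observed with the standard library alone. The following is a minimal sketch, not the crate’s spawn path; the helper name pid_seen_by_spawned_thread is hypothetical.

```rust
use std::thread;

/// Hypothetical helper: returns the process ID as observed from inside
/// a freshly spawned thread. On Linux, std::process::id reports the
/// tgid, which every thread of the process shares.
fn pid_seen_by_spawned_thread() -> u32 {
    thread::spawn(std::process::id)
        .join()
        .expect("worker thread panicked")
}

fn main() {
    let parent = std::process::id();
    let child = pid_seen_by_spawned_thread();
    // A thread-mode worker is not a separate process, so both IDs match.
    println!("parent pid = {parent}, pid seen by thread = {child}");
}
```

A fork()ed worker would instead report its own process ID, which is what allows it to be moved into a cgroup independently of the parent.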

§Module layout

§Naming conventions

§“Intent” vs “Resolved” naming

Types named with an Intent suffix carry test-author intent (the input to the workload pipeline). Types named with a Resolved prefix carry runtime-resolved configuration (the output of intent + topology + cgroup state). AffinityIntent resolves to ResolvedAffinity at spawn time via resolve_affinity_for_cgroup.

CloneMode is not a runtime-resolved value: the test author writes CloneMode::Fork / CloneMode::Thread directly (no resolution layer). The Mode suffix denotes a single kernel-facing dispatch decision rather than a two-stage intent/resolved pipeline.

SchedClass and SchedPolicy follow the same coarse-intent / concrete-runtime split using legacy kernel terminology rather than the Intent/Resolved naming — see SchedClass for the per-class mapping.
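The two-stage shape described in this section (author intent in, runtime-resolved configuration out) can be sketched with stand-in types. Everything below is hypothetical: IntentSketch, ResolvedSketch, their variants, and resolve_sketch are illustrative names only and do not reproduce the real AffinityIntent, ResolvedAffinity, or resolve_affinity_for_cgroup definitions.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for an "Intent" type: what the test author asks for.
#[derive(Debug, Clone, PartialEq, Eq)]
enum IntentSketch {
    /// No preference; keep whatever the cgroup allows.
    Inherit,
    /// Ask for a specific set of CPUs.
    Pinned(BTreeSet<usize>),
}

/// Hypothetical stand-in for a "Resolved" type: what the worker actually gets.
#[derive(Debug, Clone, PartialEq, Eq)]
enum ResolvedSketch {
    Unrestricted,
    Cpus(BTreeSet<usize>),
}

/// Resolve intent against runtime state (here, only the cgroup's CPU set).
fn resolve_sketch(intent: &IntentSketch, cgroup_cpus: &BTreeSet<usize>) -> ResolvedSketch {
    match intent {
        IntentSketch::Inherit => ResolvedSketch::Unrestricted,
        // Keep only the requested CPUs that the cgroup really exposes.
        IntentSketch::Pinned(wanted) => {
            ResolvedSketch::Cpus(wanted.intersection(cgroup_cpus).copied().collect())
        }
    }
}

fn main() {
    let cgroup: BTreeSet<usize> = [0, 1, 2, 3].into_iter().collect();
    let intent = IntentSketch::Pinned([1, 2, 9].into_iter().collect());
    println!("{:?}", resolve_sketch(&intent, &cgroup));
}
```

The point of the split is that the intent value stays valid across topologies, while the resolved value is only meaningful for the machine and cgroup state it was computed against.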

§“Churn” vs “Sweep” suffixes on WorkType variants

Variants whose names end in Churn cycle their target setting at high frequency to exercise the kernel’s per-task state machines under rapid transitions. WorkType::AffinityChurn samples a random CPU from the effective cpuset on every iteration (rand::rng().random_range); WorkType::PageFaultChurn touches a fresh random subset of pages each cycle (xorshift64). Most Churn variants pick each value randomly and independently of the previous one; WorkType::PolicyChurn is the exception — despite the Churn name it cycles through the supported scheduling policies in a fixed, ordered sequence keyed on the iteration counter (iterations % policies.len()).
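The difference between a random Churn pick and the ordered PolicyChurn cycle can be sketched as follows. This is an illustration under stated assumptions, not the crate’s code: the xorshift64 step uses the common 13/7/17 shift triple (the crate’s exact constants are not given here), the helper names churn_pick and ordered_pick are hypothetical, and xorshift64 stands in for rand::rng().random_range so the example needs no external crate.

```rust
/// One step of an xorshift64 generator (assumed 13/7/17 variant).
/// The state must be non-zero.
fn xorshift64(state: &mut u64) -> u64 {
    let mut x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    x
}

/// Random pick, independent of the previous value: the style most
/// Churn variants use.
fn churn_pick(state: &mut u64, choices: &[u32]) -> u32 {
    let idx = (xorshift64(state) % choices.len() as u64) as usize;
    choices[idx]
}

/// Ordered pick keyed on the iteration counter: the PolicyChurn style
/// (iterations % policies.len()).
fn ordered_pick(iterations: u64, choices: &[u32]) -> u32 {
    choices[(iterations % choices.len() as u64) as usize]
}

fn main() {
    let cpus = [0u32, 1, 2, 3];
    let mut state = 0x9E37_79B9_7F4A_7C15u64;
    for i in 0..8u64 {
        let random = churn_pick(&mut state, &cpus);
        let ordered = ordered_pick(i, &cpus);
        println!("iter {i}: random={random} ordered={ordered}");
    }
}
```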

Variants whose names end in Sweep rotate their target setting through an ordered list or range — the next value is a deterministic function of the iteration counter, not a random pick. WorkType::NiceSweep cycles nice values from effective_min..=19 modulo the range size; WorkType::NumaWorkingSetSweep rotates the working-set binding through target_nodes in declaration order. The intent is to walk a phase space evenly so every value gets comparable observation time, rather than producing the unbiased-random transitions Churn produces.
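A Sweep selection is a pure function of the iteration counter. The sketch below shows one plausible reading of the two examples above; the function names nice_for_iteration and node_for_iteration are hypothetical, and the exact arithmetic inside the crate may differ.

```rust
/// Nice value for a given iteration, walking effective_min..=19 in order
/// and wrapping modulo the range size. Assumes effective_min <= 19.
fn nice_for_iteration(iterations: u64, effective_min: i32) -> i32 {
    assert!(effective_min <= 19, "effective_min must not exceed 19");
    let range_size = (19 - effective_min + 1) as u64; // inclusive range
    effective_min + (iterations % range_size) as i32
}

/// NUMA node for a given iteration, rotating through target_nodes in
/// declaration order. Assumes target_nodes is non-empty.
fn node_for_iteration(iterations: u64, target_nodes: &[u32]) -> u32 {
    target_nodes[(iterations % target_nodes.len() as u64) as usize]
}

fn main() {
    for i in 0..5u64 {
        println!(
            "iter {i}: nice={} node={}",
            nice_for_iteration(i, 17),
            node_for_iteration(i, &[0, 2])
        );
    }
}
```

Because the next value depends only on the counter, every value in the range receives the same number of visits over any whole number of passes.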

Choose Churn when the workload’s value is its transition-frequency entropy; choose Sweep when the workload must visit every phase deterministically.

Modules§

defaults
Named defaults for the parametric WorkType variants, used by WorkType::from_name. Extracting the magic numbers here provides a named home for the default values so tests and docs (e.g. doc/guide/src/architecture/workers.md) can cite them by constant name instead of each tracking a scattered integer literal. Every value carries a single-line comment naming the knob and its unit; the const names mirror the {variant_snake}_{field} convention so renames show up as compile errors in both sites.

Structs§

CustomCfg
Plain-old-data config payload for WorkType::Custom. Every field is Copy (integers / bools / a fixed byte array) so the whole struct survives fork() byte-faithfully in the child address space — no heap pointer, no shared mapping required. Read post-fork by the worker dispatch and handed to the closure via WorkerCtx::cfg.
CustomFn
Function-pointer wrapper for the WorkType::Custom variant’s run field.
Migration
A single CPU migration event observed by a worker.
MpolFlags
Optional mode flags for set_mempolicy(2).
PhaseSlice
Per-phase telemetry for one backdrop worker over one scenario step’s HOLD window [StepStart(k), StepEnd(k)). A backdrop worker spans every phase, so it accumulates a fresh slice between each parent-driven phase_epoch transition and finalizes it when the epoch moves (drain-on-change). Carries the per-phase subset of WorkerReport’s whole-run telemetry that has a host-side per-cgroup carrier (PhaseCgroupStats), so the host can pool slices across workers into per-epoch PhaseBuckets. Counter fields are per-phase deltas (end_snap − start_snap, re-baselined at each boundary); cpus_used and numa_pages are per-phase observations (gauges). Excludes iteration_costs_ns — that reservoir has no per-cgroup carrier at any level (it feeds only the run-level benchmark gate).
PipeTransferReport
Pipe-mode (-p) throughput summary used by the ktstr-schbench-validate driver to mirror schbench’s avg worker transfer line (schbench.c:1979-1982): the per-worker transfer rate as ops/sec plus the pretty-scaled bytes/sec, with the transfer size clamped and bytes/sec scaled exactly as schbench does. Not in the prelude (validation-tool surface, like StandaloneReport). Built by pipe_transfer_report.
SchbenchConfig
User-facing, declarative config for the Schbench workload. Construct via SchbenchConfig::default (schbench’s own defaults) plus the chainable setters, e.g. SchbenchConfig::default().message_threads(2).worker_threads(4). Derives Clone/Debug/PartialEq/Eq/Hash/serde. The builder shape follows WorkloadConfig, but Eq+Hash (which WorkloadConfig and WorkSpec omit because of their transitive f64) are available here since every field is integer/bool, per the ktstr f64-free leaf-config convention.
StandaloneReport
Whole-run result of a standalone (no-VM) schbench engine run, projected for the side-by-side comparison against the reference schbench. The percentile arrays index in SCHBENCH_PERCENTILES order (20.0, 50.0, 90.0, 99.0, 99.9), in microseconds. The sample counts are carried so a zero-sample run is visible rather than silently reported as an all-zero distribution.
TaobenchConfig
User-facing config for the Taobench workload: a bounded, evicting key-value cache with a fast hit path and a slow miss path, driven to a steady-state hit ratio.
TaobenchStandaloneReport
Report of the host-side standalone runner that backs the ktstr-taobench-validate validation driver (the analog of schbench’s run_standalone). Summarizes a run_standalone run: the headline taobench metrics in the shape the reference taobench server reports (fast_qps / hit_rate / slow_qps) plus the derived total_qps (= fast + slow) and hit_ratio (= fast / total). Under open-loop arrival (arrival_rate > 0) it also carries the coordinated-omission serve-latency percentiles; these are None in closed loop (no intended-arrival schedule, so no serve latency is measured).
TaobenchStats
Taobench engine counters for one accounting window: a single phase epoch (the per-phase crate::workload::PhaseSlice::taobench carrier) or a whole worker run (the crate::workload::WorkerReport::taobench_whole / crate::assert::CgroupStats::taobench_whole aggregate). Integer-only so the enclosing PhaseSlice keeps Eq. get_cmds / get_misses are request-time; fast_ops / slow_ops are response-time (see the module docs). Self::merge pools two windows (Σ ops, MAX wall) and Self::total_ops is the throughput numerator. The host derives the run-level taobench_* Rate metrics from the pooled aggregate; test authors normally assert those metrics rather than reading this struct directly.
WorkSpec
WorkerCtx
Execution context handed to a WorkType::Custom worker function.
WorkerReport
Telemetry collected from a worker process after it stops.
WorkloadConfig
Configuration for spawning a group of worker processes.
WorkloadHandle
Handle to spawned worker tasks. Workers block until start() is called.
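The pooling rule stated in the TaobenchStats entry above (Σ ops, MAX wall) can be sketched with a stand-in struct. WindowStats, its wall_ns field, the merge signature, and the assumption that total_ops is fast_ops + slow_ops are all hypothetical; only the sum-the-counters, max-the-wall-time rule comes from the description.

```rust
/// Hypothetical stand-in for an integer-only counter window.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
struct WindowStats {
    fast_ops: u64,
    slow_ops: u64,
    /// Wall-clock length of the window in nanoseconds (assumed field).
    wall_ns: u64,
}

impl WindowStats {
    /// Pool two windows: counters add up, wall time takes the maximum,
    /// since workers in the same window run concurrently.
    fn merge(self, other: Self) -> Self {
        Self {
            fast_ops: self.fast_ops + other.fast_ops,
            slow_ops: self.slow_ops + other.slow_ops,
            wall_ns: self.wall_ns.max(other.wall_ns),
        }
    }

    /// Throughput numerator (assumed here to be fast + slow operations).
    fn total_ops(&self) -> u64 {
        self.fast_ops + self.slow_ops
    }
}

fn main() {
    let a = WindowStats { fast_ops: 900, slow_ops: 100, wall_ns: 1_000_000_000 };
    let b = WindowStats { fast_ops: 450, slow_ops: 50, wall_ns: 800_000_000 };
    let pooled = a.merge(b);
    let ops_per_sec = pooled.total_ops() as f64 / (pooled.wall_ns as f64 / 1e9);
    println!("{pooled:?} -> {ops_per_sec} ops/sec");
}
```

Taking the maximum wall time rather than the sum keeps the derived rate an aggregate throughput instead of a per-worker average.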

Enums§

AffinityIntent
Scenario-level affinity intent for a group of workers.
AluWidth
ALU/SIMD execution width for WorkType::AluHot.
CloneMode
How WorkloadHandle::spawn creates worker tasks.
FutexLockMode
Whether WorkType::PriorityInversion uses a PI-aware mutex or a plain futex.
MemPolicy
NUMA memory placement policy for worker processes.
ReapMode
How a WorkType::CgroupAttachStorm worker reaps the transient children it forks each iteration.
ResolvedAffinity
Resolved CPU affinity for a worker process.
SchedClass
Coarse Linux scheduling class identifier.
SchedPolicy
Linux scheduling policy for a worker process.
WakeMechanism
Wake mechanism between stages of a WorkType::WakeChain.
WorkPhase
A single phase in a WorkType::Sequence compound work pattern.
WorkType
What each worker process does during a scenario.
WorkTypeValidationError
Spawn-time validation failures for WorkType preconditions.
WorkerExitInfo
Reason a sentinel WorkerReport was synthesized — attached to the report’s exit_info field so operators can triage a missing or undecodable postcard payload without cross-referencing parent-side logs.

Constants§

SCHBENCH_PERCENTILES
The five latency percentiles reported by StandaloneReport and the per-phase metric path, in column order: 20.0, 50.0, 90.0, 99.0, 99.9. Matches schbench’s percentile rows (schbench.c show_latencies). Callers label the StandaloneReport percentile arrays by zipping with this slice rather than hard-coding an index-to-percentile mapping.
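Labelling a percentile array by zipping, as recommended above, might look like the following sketch. The constant is redeclared locally as a stand-in with the values listed above, and the label_percentiles helper and the u64 microsecond element type are assumptions for illustration.

```rust
/// Local stand-in for the crate constant, using the values listed above.
const SCHBENCH_PERCENTILES: &[f64] = &[20.0, 50.0, 90.0, 99.0, 99.9];

/// Hypothetical helper: pair each percentile with its value (microseconds)
/// instead of hard-coding which index means which percentile.
fn label_percentiles(values_us: &[u64]) -> Vec<String> {
    SCHBENCH_PERCENTILES
        .iter()
        .zip(values_us.iter())
        .map(|(p, v)| format!("p{p}: {v} us"))
        .collect()
}

fn main() {
    let wakeup_latency_us = [5u64, 9, 21, 37, 40];
    for line in label_percentiles(&wakeup_latency_us) {
        println!("{line}");
    }
}
```

If the percentile list ever changes, code written this way picks up the new labels without any index bookkeeping.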

Traits§

WorkerReportClaim
Pointwise-claim accessors generated by #[derive(Claim)] on WorkerReport. One claim_<field> method per public field, taking &mut Verdict as the accumulator; container fields (BTreeSet/Vec) route through SetClaim/SeqClaim. Method dispatch keys on the stats struct’s type, so identical field names across distinct stats structs do not collide. For prelude-exported stats types the trait is preluded, so use ktstr::prelude::* brings the accessors into scope; otherwise import the trait from the stats type’s module.

Functions§

pipe_transfer_report
Derives the pipe-mode (-p) avg worker transfer line that the ktstr-schbench-validate driver uses to mirror schbench (schbench.c:1979-1982). Not in the prelude (validation-tool surface, like StandaloneReport). Inputs are a run’s aggregate achieved_rps (completed cycles/sec over the true elapsed window), the requested pipe_transfer_bytes, and the resolved nr_workers. The figure is PER WORKER: schbench divides by loop_runtime = Σ each worker’s runtime (schbench.c:1697 sums worker->runtime; :1942-1943/:1979 divide by it), and Σ worker runtimes ≈ nr_workers * elapsed, so the per-worker rate is the aggregate achieved_rps / nr_workers; the label is literally “avg WORKER transfer”. (Dividing the aggregate by wall-clock alone would over-report by nr_workers×.) The transfer size is CLAMPED to PIPE_TRANSFER_BUFFER first: the engine moves only the clamped size per cycle (run applies the same .min()), matching schbench’s parse-time clamp (schbench.c:291-294), so the throughput reflects the bytes ACTUALLY moved. Scaling is schbench’s pretty_size (schbench.c:1606). nr_workers is floored at 1 (no division by zero).
run_standalone
Run the schbench engine standalone — host-side, no VM, no phases — for run_secs seconds and project the whole-run result into a StandaloneReport for the side-by-side validation against the reference schbench.
set_thread_affinity
Set per-thread CPU affinity via sched_setaffinity(2).
taobench_run_standalone
Host-side standalone run of the taobench engine for run_secs, returning a summary report. It is the analog of schbench’s run_standalone and backs the ktstr-taobench-validate driver for the side-by-side comparison against the reference taobench. NOT used in-VM (the scenario engine drives run there).
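The arithmetic described for pipe_transfer_report above (floor nr_workers at 1, clamp the transfer size, divide the aggregate rate by the worker count, then pretty-scale bytes/sec) can be sketched as below. This is not the crate’s implementation: the 1 MiB value for PIPE_TRANSFER_BUFFER, the 1024-step unit table in pretty_size, and the per_worker_transfer name and return shape are assumptions made for illustration.

```rust
/// Assumed clamp size; the real PIPE_TRANSFER_BUFFER value may differ.
const PIPE_TRANSFER_BUFFER: u64 = 1024 * 1024;

/// Assumed shape of pretty_size: divide by 1024 until the value fits a unit.
fn pretty_size(mut n: f64) -> (f64, &'static str) {
    const UNITS: [&str; 7] = ["B", "KB", "MB", "GB", "TB", "PB", "EB"];
    let mut i = 0;
    while n >= 1024.0 && i + 1 < UNITS.len() {
        n /= 1024.0;
        i += 1;
    }
    (n, UNITS[i])
}

/// Returns (per-worker ops/sec, scaled bytes/sec, unit).
fn per_worker_transfer(
    achieved_rps: f64,
    pipe_transfer_bytes: u64,
    nr_workers: u32,
) -> (f64, f64, &'static str) {
    let workers = nr_workers.max(1) as f64; // floor at 1: no division by zero
    let bytes = pipe_transfer_bytes.min(PIPE_TRANSFER_BUFFER) as f64; // bytes actually moved
    let ops_per_sec = achieved_rps / workers; // aggregate rate -> per-worker rate
    let (scaled, unit) = pretty_size(ops_per_sec * bytes);
    (ops_per_sec, scaled, unit)
}

fn main() {
    let (ops, scaled, unit) = per_worker_transfer(4000.0, 4096, 4);
    println!("avg worker transfer: {ops:.2} ops/sec {scaled:.2}{unit}/s");
}
```

With 4 workers completing 4000 cycles/sec in aggregate at 4096 bytes per cycle, each worker moves 1000 cycles/sec, which this sketch scales to 3.90625 MB/s.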