Module workload

Worker process management and telemetry.

Workers are fork()ed processes by default (CloneMode::Fork, the #[default]) so each can be placed in its own cgroup; CloneMode::Thread instead uses std::thread::spawn, so those workers share the parent’s tgid, address space, and signal-handler table. Key types:

WorkType – what each worker does
WorkloadConfig – spawn configuration (count, affinity, work type, policy)
WorkloadHandle – RAII handle to spawned workers
WorkerReport – per-worker telemetry collected after stop
AffinityIntent – per-worker affinity intent (Inherit, LlcAligned, Exact, etc.)
ResolvedAffinity – resolved CPU affinity for workers
WorkSpec – workload definition for a single group of workers within a cgroup
WorkPhase – a single phase in a WorkType::Sequence compound work pattern
SchedPolicy – Linux scheduling policy for a worker process
MemPolicy – NUMA memory placement policy for worker processes

See the WorkSpec Types and Worker Processes chapters of the guide.
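As a rough illustration of the Fork/Thread distinction described above, the sketch below mirrors only the two variant names and the #[default] placement from these docs. The enum is a simplified stand-in, not the crate's definition, and run_thread_workers is a hypothetical helper that shows thread workers mutating state in the parent's address space.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Simplified stand-in for the crate's `CloneMode`. Only the variant names
/// and the `#[default]` on `Fork` are taken from the module docs.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub enum CloneMode {
    #[default]
    Fork,
    Thread,
}

/// Hypothetical helper: spawn `n` thread workers that each bump one counter.
/// Because threads share the parent's address space, the parent observes
/// every increment directly, with no IPC.
pub fn run_thread_workers(n: usize) -> u64 {
    let counter = Arc::new(AtomicU64::new(0));
    let handles: Vec<_> = (0..n)
        .map(|_| {
            let c = Arc::clone(&counter);
            thread::spawn(move || {
                c.fetch_add(1, Ordering::Relaxed);
            })
        })
        .collect();
    for h in handles {
        h.join().expect("worker thread panicked");
    }
    counter.load(Ordering::Relaxed)
}
```

A fork()ed worker would instead receive a copy of the parent's memory, which is why fork-mode telemetry has to travel back to the parent explicitly.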

§Module layout

affinity — AffinityIntent / ResolvedAffinity + the resolver and sched_setaffinity wrapper.
config — declarative test-author input (WorkloadConfig, WorkSpec, SchedPolicy, MemPolicy, MpolFlags, CloneMode, FutexLockMode, WakeMechanism, AluWidth) and the humantime_serde_helper shared by every Duration field.
types — WorkType / WorkPhase / WorkTypeValidationError and the WorkType naming surface (from_name, suggest, ALL_NAMES).
spawn — runtime spawn pipeline: WorkloadHandle, SpawnGuard, Migration, WorkerReport, WorkerExitInfo, build_nodemask, apply_mempolicy_with_flags, apply_nice. Tests are co-located in spawn/tests_*.rs siblings with shared fixtures in spawn/testing.rs.
worker — worker_main and the per-WorkType bodies. worker/io.rs holds the IO-backing RAII wrappers and worker/sched.rs holds the scheduler/clock/metric helpers (incl. set_sched_policy).

§Naming conventions

§“Intent” vs “Resolved” naming

Types named with an Intent suffix carry test-author intent (the input to the workload pipeline). Types named with a Resolved prefix carry runtime-resolved configuration (the output of intent + topology + cgroup state). AffinityIntent resolves to ResolvedAffinity at spawn time via resolve_affinity_for_cgroup.
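The two-stage shape can be sketched as follows. This is a minimal illustration, not the crate's API: the enum payloads, the ResolvedAffinity variants, and the resolve function are all hypothetical simplifications (the real resolver is resolve_affinity_for_cgroup and also consults topology). Only the type names and the Inherit / Exact variant names come from the docs.

```rust
use std::collections::BTreeSet;

/// Simplified stand-in for the test-author input. The `Exact` payload
/// type is an assumption made for this sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum AffinityIntent {
    Inherit,
    Exact(BTreeSet<usize>),
}

/// Simplified stand-in for the runtime output. Both variants are
/// hypothetical names chosen for illustration.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ResolvedAffinity {
    /// Leave the worker's inherited mask untouched.
    None,
    /// Pin the worker to exactly these CPUs.
    Fixed(BTreeSet<usize>),
}

/// Hypothetical resolver: combines the intent with the cgroup's effective
/// cpuset, so the result only names CPUs the worker may actually run on.
pub fn resolve(intent: &AffinityIntent, cpuset: &BTreeSet<usize>) -> ResolvedAffinity {
    match intent {
        AffinityIntent::Inherit => ResolvedAffinity::None,
        AffinityIntent::Exact(cpus) => {
            ResolvedAffinity::Fixed(cpus.intersection(cpuset).copied().collect())
        }
    }
}
```

The point of the split is that the intent value can be written once in a test and stay valid across machines, while the resolved value is specific to the topology and cgroup state seen at spawn time.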

CloneMode has no Intent/Resolved pair: the test author writes CloneMode::Fork / CloneMode::Thread directly, and that value is used at runtime as written (no resolution layer). The Mode suffix denotes a single kernel-facing dispatch decision rather than a two-stage intent/resolved pipeline.

SchedClass and SchedPolicy follow the same coarse-intent / concrete-runtime split using legacy kernel terminology rather than the Intent/Resolved naming — see SchedClass for the per-class mapping.

§“Churn” vs “Sweep” suffixes on `WorkType` variants

Variants whose names end in Churn cycle their target setting at high frequency to exercise the kernel’s per-task state machines under rapid transitions. WorkType::AffinityChurn samples a random CPU from the effective cpuset on every iteration (rand::rng().random_range); WorkType::PageFaultChurn touches a fresh random subset of pages each cycle (xorshift64). Most Churn variants pick each value randomly and independently of the previous one; WorkType::PolicyChurn is the exception — despite the Churn name it cycles through the supported scheduling policies in a fixed, ordered sequence keyed on the iteration counter (iterations % policies.len()).
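To make the two selection styles concrete, here is a small sketch. The xorshift64 shift constants (13, 7, 17) are the common textbook triple and are an assumption; the docs only name the generator. random_page and policy_index are hypothetical helper names; the modulo expression in policy_index follows the iterations % policies.len() rule stated above.

```rust
/// One step of a xorshift64 generator. The (13, 7, 17) shift triple is an
/// assumption; the state must be non-zero or the output stays at zero.
pub fn xorshift64(state: &mut u64) -> u64 {
    let mut x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    x
}

/// Hypothetical Churn-style pick: each page index is drawn independently
/// of the previous one. `nr_pages` must be non-zero.
pub fn random_page(state: &mut u64, nr_pages: u64) -> u64 {
    xorshift64(state) % nr_pages
}

/// PolicyChurn-style pick: a fixed, ordered walk keyed on the iteration
/// counter. `nr_policies` must be non-zero.
pub fn policy_index(iterations: u64, nr_policies: usize) -> usize {
    (iterations % nr_policies as u64) as usize
}
```

With three policies, iterations 0, 1, 2, 3 map to indices 0, 1, 2, 0, so the sequence is fully reproducible, whereas random_page depends on the generator state.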

Variants whose names end in Sweep rotate their target setting through an ordered list or range — the next value is a deterministic function of the iteration counter, not a random pick. WorkType::NiceSweep cycles nice values from effective_min..=19 modulo the range size; WorkType::NumaWorkingSetSweep rotates the working-set binding through target_nodes in declaration order. The intent is to walk a phase space evenly so every value gets comparable observation time, rather than producing the unbiased-random transitions Churn produces.
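The deterministic rotation can be sketched as below. Both function names are hypothetical, the u32 node-ID type is an assumption, and the clamp to the kernel's nice range is added here only to keep the sketch total; the arithmetic follows the effective_min..=19 and declaration-order rules stated above.

```rust
/// Hypothetical NiceSweep step: walk `effective_min..=19` in order,
/// wrapping modulo the range size.
pub fn nice_for_iteration(iterations: u64, effective_min: i32) -> i32 {
    // Clamp to the valid nice range so the range size is always >= 1.
    let min = effective_min.clamp(-20, 19);
    let range = (19 - min + 1) as u64;
    min + (iterations % range) as i32
}

/// Hypothetical NumaWorkingSetSweep step: rotate through `target_nodes`
/// in declaration order. Returns `None` for an empty node list.
pub fn node_for_iteration(iterations: u64, target_nodes: &[u32]) -> Option<u32> {
    if target_nodes.is_empty() {
        return None;
    }
    let idx = (iterations % target_nodes.len() as u64) as usize;
    Some(target_nodes[idx])
}
```

Because the value depends only on the iteration counter, every value in the range is visited once per full cycle, which is what gives each phase comparable observation time.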

Choose Churn when the workload’s value is its transition-frequency entropy; choose Sweep when the workload must visit every phase deterministically.

Modules§

defaults: Named defaults for the parametric WorkType variants, used by WorkType::from_name. Extracting the magic numbers here provides a named home for the default values so tests and docs (e.g. doc/guide/src/architecture/workers.md) can cite them by constant name instead of each tracking a scattered integer literal. Every value carries a single-line comment naming the knob and its unit; the const names mirror the {variant_snake}_{field} convention so renames show up as compile errors in both sites.

Structs§

CustomCfg: Plain-old-data config payload for WorkType::Custom. Every field is Copy (integers / bools / a fixed byte array) so the whole struct survives fork() byte-faithfully in the child address space — no heap pointer, no shared mapping required. Read post-fork by the worker dispatch and handed to the closure via WorkerCtx::cfg.
CustomFn: Function-pointer wrapper for the WorkType::Custom variant’s run field.
Migration: A single CPU migration event observed by a worker.
MpolFlags: Optional mode flags for set_mempolicy(2).
PhaseSlice: Per-phase telemetry for one backdrop worker over one scenario step’s HOLD window [StepStart(k), StepEnd(k)). A backdrop worker spans every phase, so it accumulates a fresh slice between each parent-driven phase_epoch transition and finalizes it when the epoch moves (drain-on-change). Carries the per-phase subset of WorkerReport’s whole-run telemetry that has a host-side per-cgroup carrier (PhaseCgroupStats), so the host can pool slices across workers into per-epoch PhaseBuckets. Counter fields are per-phase deltas (end_snap − start_snap, re-baselined at each boundary); cpus_used and numa_pages are per-phase observations (gauges). Excludes iteration_costs_ns — that reservoir has no per-cgroup carrier at any level (it feeds only the run-level benchmark gate).
PipeTransferReport: schbench’s pipe-mode (-p) throughput summary, i.e. the avg worker transfer line (schbench.c:1979-1982): the per-worker transfer rate as ops/sec plus the pretty-scaled bytes/sec. Used by the ktstr-schbench-validate driver to mirror that line; the transfer size is clamped and bytes/sec is scaled exactly as schbench does. Not in the prelude (validation-tool surface, like StandaloneReport). Built by pipe_transfer_report.
SchbenchConfig: Declarative, user-facing config for the Schbench workload. Construct via SchbenchConfig::default (schbench’s own defaults) plus the chainable setters, e.g. SchbenchConfig::default().message_threads(2).worker_threads(4). Derives Clone/Debug/PartialEq/Eq/Hash/serde; the builder shape follows WorkloadConfig, but Eq+Hash (which WorkloadConfig and WorkSpec omit because of their transitive f64) are available here since every field is integer/bool, following the ktstr f64-free leaf-config convention.
StandaloneReport: Whole-run result of a standalone (no-VM) schbench engine run, projected for the side-by-side comparison against the reference schbench. The percentile arrays index in SCHBENCH_PERCENTILES order (20.0, 50.0, 90.0, 99.0, 99.9), in microseconds. The sample counts are carried so a zero-sample run is visible rather than silently reported as an all-zero distribution.
TaobenchConfig: User-facing config for the Taobench workload: a bounded, evicting key-value cache with a fast hit path and a slow miss path, driven to a steady-state hit ratio.
TaobenchStandaloneReport: Summary of a taobench run_standalone run (listed here as taobench_run_standalone), backing the ktstr-taobench-validate validation driver. Carries the headline taobench metrics in the shape the reference taobench server reports (fast_qps / hit_rate / slow_qps) plus the derived total_qps (= fast + slow) and hit_ratio (= fast / total). Under open-loop arrival (arrival_rate > 0) it also carries the coordinated-omission serve-latency percentiles; these are None in closed loop (no intended-arrival schedule, so no serve latency is measured).
TaobenchStats: Taobench engine counters for one accounting window: a single phase epoch (the per-phase crate::workload::PhaseSlice::taobench carrier) or a whole worker run (the crate::workload::WorkerReport::taobench_whole / crate::assert::CgroupStats::taobench_whole aggregate). Integer-only so the enclosing PhaseSlice keeps Eq. get_cmds / get_misses are request-time; fast_ops / slow_ops are response-time (see the module docs). Self::merge pools two windows (Σ ops, MAX wall) and Self::total_ops is the throughput numerator. The host derives the run-level taobench_* Rate metrics from the pooled aggregate; test authors normally assert those metrics rather than reading this struct directly.
WorkSpec: Workload definition for a single group of workers within a cgroup.
WorkerCtx: Execution context handed to a WorkType::Custom worker function.
WorkerReport: Telemetry collected from a worker process after it stops.
WorkloadConfig: Configuration for spawning a group of worker processes.
WorkloadHandle: Handle to spawned worker tasks. Workers block until start() is called.
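The pooling and derived-ratio arithmetic described for TaobenchStats and TaobenchStandaloneReport can be sketched as follows. WindowStats is a hypothetical, cut-down struct: fast_ops and slow_ops are field names from the docs, while wall_ns and the hit_ratio helper are assumptions made for illustration.

```rust
/// Hypothetical, integer-only counters for one accounting window.
/// `wall_ns` is an assumed name for the window's wall-clock duration.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct WindowStats {
    pub fast_ops: u64,
    pub slow_ops: u64,
    pub wall_ns: u64,
}

impl WindowStats {
    /// Pool two windows: sum the op counters, keep the longer wall time
    /// (the "Σ ops, MAX wall" rule from the docs).
    pub fn merge(self, other: Self) -> Self {
        Self {
            fast_ops: self.fast_ops.saturating_add(other.fast_ops),
            slow_ops: self.slow_ops.saturating_add(other.slow_ops),
            wall_ns: self.wall_ns.max(other.wall_ns),
        }
    }

    /// Throughput numerator: fast + slow.
    pub fn total_ops(&self) -> u64 {
        self.fast_ops.saturating_add(self.slow_ops)
    }

    /// Derived hit ratio (= fast / total); `None` when nothing was served.
    pub fn hit_ratio(&self) -> Option<f64> {
        match self.total_ops() {
            0 => None,
            total => Some(self.fast_ops as f64 / total as f64),
        }
    }
}
```

Taking the maximum wall time rather than the sum reflects that pooled workers ran concurrently over the same window, so summing would understate the aggregate rate.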

Enums§

AffinityIntent: Scenario-level affinity intent for a group of workers.
AluWidth: ALU/SIMD execution width for WorkType::AluHot.
CloneMode: How WorkloadHandle::spawn creates worker tasks.
FutexLockMode: Whether WorkType::PriorityInversion uses a PI-aware mutex or a plain futex.
MemPolicy: NUMA memory placement policy for worker processes.
ReapMode: How a WorkType::CgroupAttachStorm worker reaps the transient children it forks each iteration.
ResolvedAffinity: Resolved CPU affinity for a worker process.
SchedClass: Coarse Linux scheduling class identifier.
SchedPolicy: Linux scheduling policy for a worker process.
WakeMechanism: Wake mechanism between stages of a WorkType::WakeChain.
WorkPhase: A single phase in a WorkType::Sequence compound work pattern.
WorkType: What each worker process does during a scenario.
WorkTypeValidationError: Spawn-time validation failures for WorkType preconditions.
WorkerExitInfo: Reason a sentinel WorkerReport was synthesized — attached to the report’s exit_info field so operators can triage a missing or undecodable postcard payload without cross-referencing parent-side logs.

Constants§

SCHBENCH_PERCENTILES: The five latency percentiles reported by StandaloneReport and the per-phase metric path, in column order: 20.0, 50.0, 90.0, 99.0, 99.9. Matches schbench’s percentile rows (schbench.c show_latencies). Callers label the StandaloneReport percentile arrays by zipping with this slice rather than hard-coding an index-to-percentile mapping.
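The zip-based labelling recommended above might look like this. The constant's values and order come from the docs; its declared type, the u64 microsecond element type, and the label_percentiles helper are assumptions for the sketch.

```rust
/// Percentile columns in report order (values from the docs; the array
/// type used here is an assumption).
pub const SCHBENCH_PERCENTILES: [f64; 5] = [20.0, 50.0, 90.0, 99.0, 99.9];

/// Hypothetical helper: pair each latency value (microseconds) with the
/// percentile it belongs to, instead of hard-coding index meanings.
pub fn label_percentiles(values_us: &[u64; 5]) -> Vec<(f64, u64)> {
    SCHBENCH_PERCENTILES
        .iter()
        .copied()
        .zip(values_us.iter().copied())
        .collect()
}
```

If the percentile set ever changes, callers written this way pick up the new labels automatically rather than silently mislabelling a column.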

Traits§

WorkerReportClaim: Pointwise-claim accessors generated by #[derive(Claim)] on WorkerReport. One claim_<field> method per public field, taking &mut Verdict as the accumulator; container fields (BTreeSet/Vec) route through SetClaim/SeqClaim. Method dispatch keys on the stats struct’s type, so identical field names across distinct stats structs do not collide. For prelude-exported stats types the trait is preluded, so use ktstr::prelude::* brings the accessors into scope; otherwise import the trait from the stats type’s module.

Functions§

pipe_transfer_report: Derive schbench’s pipe-mode (-p) avg worker transfer line from a run’s aggregate achieved_rps (completed cycles/sec over the true elapsed window), the requested pipe_transfer_bytes, and the resolved nr_workers. Used by the ktstr-schbench-validate driver; not in the prelude (validation-tool surface, like StandaloneReport). The figure is PER WORKER: schbench divides by loop_runtime = Σ each worker’s runtime (schbench.c:1697 sums worker->runtime; :1942-1943/:1979 divide by it), and Σ worker runtimes ≈ nr_workers * elapsed, so the per-worker rate is the aggregate achieved_rps / nr_workers; the label is literally “avg WORKER transfer”. (Dividing the aggregate by wall-clock alone would over-report by nr_workers×.) The transfer size is CLAMPED to PIPE_TRANSFER_BUFFER first, so the throughput reflects the bytes ACTUALLY moved: the engine moves only the clamped size per cycle (run applies the same .min()), matching schbench’s parse-time clamp (schbench.c:291-294). Scaling is schbench’s pretty_size (schbench.c:1606). nr_workers is floored at 1 (no division by zero).
run_standalone: Run the schbench engine standalone — host-side, no VM, no phases — for run_secs seconds and project the whole-run result into a StandaloneReport for the side-by-side validation against the reference schbench.
set_thread_affinity: Set per-thread CPU affinity via sched_setaffinity(2).
taobench_run_standalone: Host-side standalone run of the taobench engine for run_secs, returning a summary report. It is the analog of schbench’s run_standalone and backs the ktstr-taobench-validate driver for the side-by-side comparison against the reference taobench. NOT used in-VM (the scenario engine drives run there).
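The per-worker arithmetic described for pipe_transfer_report can be sketched as below. This is not the crate's function: per_worker_transfer is a hypothetical name, the 1 MiB value for PIPE_TRANSFER_BUFFER is an assumption, and the pretty_size scaling step is deliberately left out.

```rust
/// Assumed clamp size for this sketch; the real constant's value is not
/// stated in these docs.
pub const PIPE_TRANSFER_BUFFER: u64 = 1024 * 1024;

/// Hypothetical helper returning (ops/sec per worker, bytes/sec per worker).
pub fn per_worker_transfer(
    achieved_rps: f64,
    pipe_transfer_bytes: u64,
    nr_workers: usize,
) -> (f64, f64) {
    // Floor at one worker so the division below is always defined.
    let workers = nr_workers.max(1) as f64;
    // Aggregate rate divided by worker count gives the per-worker rate.
    let ops_per_sec = achieved_rps / workers;
    // Only the clamped size is actually moved per cycle.
    let bytes_per_cycle = pipe_transfer_bytes.min(PIPE_TRANSFER_BUFFER) as f64;
    (ops_per_sec, ops_per_sec * bytes_per_cycle)
}
```

For example, 1000 aggregate cycles/sec across 4 workers with a 4096-byte transfer works out to 250 ops/sec and 1,024,000 bytes/sec per worker.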
