Worker process management and telemetry.
Workers are fork()ed processes by default (CloneMode::Fork,
the #[default]) so each can be placed in its own cgroup;
CloneMode::Thread instead uses std::thread::spawn, so those
workers share the parent’s tgid, address space, and signal-handler
table. Key types:
- WorkType – what each worker does
- WorkloadConfig – spawn configuration (count, affinity, work type, policy)
- WorkloadHandle – RAII handle to spawned workers
- WorkerReport – per-worker telemetry collected after stop
- AffinityIntent – per-worker affinity intent (Inherit, LlcAligned, Exact, etc.)
- ResolvedAffinity – resolved CPU affinity for workers
- WorkSpec – workload definition for a single group of workers within a cgroup
- WorkPhase – a single phase in a WorkType::Sequence compound work pattern
- SchedPolicy – Linux scheduling policy for a worker process
- MemPolicy – NUMA memory placement policy for worker processes
See the WorkSpec Types and Worker Processes chapters of the guide.
§Module layout
- affinity – AffinityIntent / ResolvedAffinity + the resolver and sched_setaffinity wrapper.
- config – declarative test-author input (WorkloadConfig, WorkSpec, SchedPolicy, MemPolicy, MpolFlags, CloneMode, FutexLockMode, WakeMechanism, AluWidth) and the humantime_serde_helper shared by every Duration field.
- types – WorkType / WorkPhase / WorkTypeValidationError and the WorkType naming surface (from_name, suggest, ALL_NAMES).
- spawn – runtime spawn pipeline: WorkloadHandle, SpawnGuard, Migration, WorkerReport, WorkerExitInfo, build_nodemask, apply_mempolicy_with_flags, apply_nice. Tests are co-located in spawn/tests_*.rs siblings with shared fixtures in spawn/testing.rs.
- worker – worker_main and the per-WorkType bodies. worker/io.rs holds the IO-backing RAII wrappers and worker/sched.rs holds the scheduler/clock/metric helpers (incl. set_sched_policy).
§Naming conventions
§“Intent” vs “Resolved” naming
Types named with an Intent suffix carry test-author intent
(the input to the workload pipeline). Types named with a
Resolved prefix carry runtime-resolved configuration (the
output of intent + topology + cgroup state). AffinityIntent
resolves to ResolvedAffinity at spawn time via
resolve_affinity_for_cgroup.
CloneMode carries neither affix: the test author writes
CloneMode::Fork / CloneMode::Thread directly and that value is
used as-is at runtime (no resolution layer). The Mode suffix
denotes a single kernel-facing dispatch decision rather than a
two-stage intent/resolved pipeline.
SchedClass and SchedPolicy follow the same coarse-intent /
concrete-runtime split using legacy kernel terminology rather
than the Intent/Resolved naming — see SchedClass for
the per-class mapping.
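The two-stage shape can be sketched as follows. This is a minimal illustration only: the `Intent` and `Resolved` enums, their variants' payloads, and the `resolve` function are simplified stand-ins invented here, not the real AffinityIntent, ResolvedAffinity, or resolve_affinity_for_cgroup, whose signatures are not shown in this page.

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified stand-in for test-author input.
#[derive(Debug, Clone)]
enum Intent {
    /// Keep whatever the cgroup's cpuset allows.
    Inherit,
    /// Ask for exactly these CPUs (filtered by the cpuset at resolve time).
    Exact(BTreeSet<usize>),
}

/// Hypothetical, simplified stand-in for the runtime-resolved output.
#[derive(Debug, PartialEq)]
enum Resolved {
    /// No explicit mask is applied.
    Unrestricted,
    /// Concrete CPU set to apply.
    Cpus(BTreeSet<usize>),
}

/// Resolve intent against runtime state (here: only the cgroup cpuset).
fn resolve(intent: &Intent, cpuset: &BTreeSet<usize>) -> Resolved {
    match intent {
        Intent::Inherit => Resolved::Unrestricted,
        Intent::Exact(want) => Resolved::Cpus(want.intersection(cpuset).copied().collect()),
    }
}

fn main() {
    let cpuset: BTreeSet<usize> = (0..4).collect();
    let exact = Intent::Exact(vec![2, 3, 9].into_iter().collect());
    println!("{:?}", resolve(&exact, &cpuset));
    println!("{:?}", resolve(&Intent::Inherit, &cpuset));
}
```

The point of the split is that the input type can be written without knowing the machine, while the output type only ever holds values that are valid for the topology and cgroup it was resolved against.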
§“Churn” vs “Sweep” suffixes on WorkType variants
Variants whose names end in Churn cycle their target setting at
high frequency to exercise the kernel’s per-task state machines
under rapid transitions. WorkType::AffinityChurn samples a
random CPU from the effective cpuset on every iteration
(rand::rng().random_range); WorkType::PageFaultChurn touches
a fresh random subset of pages each cycle (xorshift64). Most Churn
variants pick each value randomly and independently of the
previous one; WorkType::PolicyChurn is the exception — despite
the Churn name it cycles through the supported scheduling
policies in a fixed, ordered sequence keyed on the iteration
counter (iterations % policies.len()).
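A rough sketch of the two selection styles named above, assuming the common xorshift64 shift triple (13, 7, 17); the shifts the workers actually use are not stated here. The function names and the policy list are illustrative, not the crate's.

```rust
/// One step of a xorshift64 generator (shift triple 13, 7, 17 is an assumption).
/// The state must be non-zero.
fn xorshift64(state: &mut u64) -> u64 {
    let mut x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    x
}

/// Random pick: each index is drawn independently of the previous one.
fn churn_pick(state: &mut u64, len: usize) -> usize {
    (xorshift64(state) % len as u64) as usize
}

/// PolicyChurn-style pick: fixed order keyed on the iteration counter.
fn ordered_pick(iterations: u64, len: usize) -> usize {
    (iterations % len as u64) as usize
}

fn main() {
    // Illustrative list only; the supported policies depend on the kernel.
    let policies = ["SCHED_OTHER", "SCHED_BATCH", "SCHED_IDLE"];
    let mut state = 0x9E37_79B9_7F4A_7C15_u64;
    for iterations in 0..6u64 {
        let random = policies[churn_pick(&mut state, policies.len())];
        let ordered = policies[ordered_pick(iterations, policies.len())];
        println!("iter {}: random={} ordered={}", iterations, random, ordered);
    }
}
```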
Variants whose names end in Sweep rotate their target setting
through an ordered list or range — the next value is a
deterministic function of the iteration counter, not a random
pick. WorkType::NiceSweep cycles nice values from
effective_min..=19 modulo the range size;
WorkType::NumaWorkingSetSweep rotates the working-set
binding through target_nodes in declaration order. The
intent is to walk a phase space evenly so every value gets
comparable observation time, rather than producing the
unbiased-random transitions Churn produces.
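The NiceSweep rule can be written out as a one-line function. This is a sketch under stated assumptions: `sweep_nice` is a hypothetical name, and `effective_min` is taken as a parameter because how it is derived is not described here.

```rust
/// Nice value for a given iteration: walks effective_min..=19 in order and
/// wraps back to effective_min after 19.
fn sweep_nice(iteration: u64, effective_min: i32) -> i32 {
    assert!((-20..=19).contains(&effective_min));
    let range_size = (19 - effective_min + 1) as u64;
    effective_min + (iteration % range_size) as i32
}

fn main() {
    // With effective_min = 0 the sweep visits 0, 1, ..., 19, 0, 1, ...
    for iteration in [0u64, 1, 19, 20, 21] {
        println!("iteration {} -> nice {}", iteration, sweep_nice(iteration, 0));
    }
}
```

Because the value depends only on the iteration counter, every nice level in the range is visited once per `range_size` iterations.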
Choose Churn when the workload’s value is its
transition-frequency entropy; choose Sweep when the workload
must visit every phase deterministically.
Modules§
- defaults – Named defaults for the parametric WorkType variants, used by WorkType::from_name. Extracting the magic numbers here provides a named home for the default values so tests and docs (e.g. doc/guide/src/architecture/workers.md) can cite them by constant name instead of each tracking a scattered integer literal. Every value carries a single-line comment naming the knob and its unit; the const names mirror the {variant_snake}_{field} convention so renames show up as compile errors in both sites.
Structs§
- CustomCfg – Plain-old-data config payload for WorkType::Custom. Every field is Copy (integers / bools / a fixed byte array) so the whole struct survives fork() byte-faithfully in the child address space: no heap pointer, no shared mapping required. Read post-fork by the worker dispatch and handed to the closure via WorkerCtx::cfg.
- CustomFn – Function-pointer wrapper for the WorkType::Custom variant's run field.
- Migration – A single CPU migration event observed by a worker.
- MpolFlags – Optional mode flags for set_mempolicy(2).
- PhaseSlice – Per-phase telemetry for one backdrop worker over one scenario step's HOLD window [StepStart(k), StepEnd(k)). A backdrop worker spans every phase, so it accumulates a fresh slice between each parent-driven phase_epoch transition and finalizes it when the epoch moves (drain-on-change). Carries the per-phase subset of WorkerReport's whole-run telemetry that has a host-side per-cgroup carrier (PhaseCgroupStats), so the host can pool slices across workers into per-epoch PhaseBuckets. Counter fields are per-phase deltas (end_snap − start_snap, re-baselined at each boundary); cpus_used and numa_pages are per-phase observations (gauges). Excludes iteration_costs_ns: that reservoir has no per-cgroup carrier at any level (it feeds only the run-level benchmark gate).
- PipeTransferReport – Pipe-mode (-p) throughput reporting used by the ktstr-schbench-validate driver to mirror schbench's avg worker transfer line; clamps the transfer size + scales bytes/sec exactly like schbench (schbench.c:1979-1982). Not in the prelude (validation-tool surface, like StandaloneReport). schbench's pipe-mode (-p) throughput summary, the avg worker transfer line (schbench.c:1979-1982): the per-worker transfer rate as ops/sec plus the pretty-scaled bytes/sec. Built by pipe_transfer_report.
- SchbenchConfig – User-facing, declarative config for the Schbench workload. Construct via SchbenchConfig::default (schbench's own defaults) plus the chainable setters, e.g. SchbenchConfig::default().message_threads(2).worker_threads(4). Derives Clone/Debug/PartialEq/Eq/Hash/serde; the builder shape follows WorkloadConfig, but Eq + Hash (which WorkloadConfig and WorkSpec omit because of their transitive f64) are available here since every field is integer/bool – the ktstr f64-free leaf-config convention.
- StandaloneReport – Whole-run result of a standalone (no-VM) schbench engine run, projected for the side-by-side comparison against the reference schbench. The percentile arrays index in SCHBENCH_PERCENTILES order (20.0, 50.0, 90.0, 99.0, 99.9), in microseconds. The sample counts are carried so a zero-sample run is visible rather than silently reported as an all-zero distribution.
- TaobenchConfig – User-facing config for the Taobench workload: a bounded, evicting key-value cache with a fast hit path and a slow miss path, driven to a steady-state hit ratio.
- TaobenchStandaloneReport – Host-side standalone runner + its report, backing the ktstr-taobench-validate validation driver (the analog of schbench's run_standalone). Summary of a run_standalone run: the headline taobench metrics in the shape the reference taobench server reports (fast_qps / hit_rate / slow_qps) plus the derived total_qps (= fast + slow) and hit_ratio (= fast / total). Under open-loop arrival (arrival_rate > 0) it also carries the coordinated-omission serve-latency percentiles; these are None in closed loop (no intended-arrival schedule, so no serve latency is measured).
- TaobenchStats – Whole-run / per-phase taobench engine counters, the carried aggregate on crate::workload::WorkerReport::taobench_whole and crate::assert::CgroupStats::taobench_whole. The host derives the run-level taobench_* Rate metrics from it; test authors normally assert those metrics rather than reading this struct directly. Taobench engine counters for one accounting window: a single phase epoch (the per-phase crate::workload::PhaseSlice::taobench carrier) or a whole worker run (the crate::workload::WorkerReport::taobench_whole / crate::assert::CgroupStats::taobench_whole aggregate). Integer-only so the enclosing PhaseSlice keeps Eq. get_cmds / get_misses are request-time; fast_ops / slow_ops are response-time (see the module docs). Self::merge pools two windows (Σ ops, MAX wall) and Self::total_ops is the throughput numerator; the host derives the run-level taobench_* Rate metrics from the pooled aggregate.
- WorkSpec
- WorkerCtx – Execution context handed to a WorkType::Custom worker function.
- WorkerReport – Telemetry collected from a worker process after it stops.
- WorkloadConfig – Configuration for spawning a group of worker processes.
- WorkloadHandle – Handle to spawned worker tasks. Workers block until start() is called.
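The TaobenchStats pooling rule (Σ ops, MAX wall) and the derived hit ratio (= fast / total) can be sketched on a cut-down struct. Everything below is illustrative: `WindowStats`, the `wall_ns` field name, and the method bodies are assumptions rather than the crate's definitions, and `total_ops` is assumed to be fast + slow to match the documented total_qps.

```rust
/// Hypothetical, integer-only stand-in for one accounting window.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
struct WindowStats {
    fast_ops: u64,
    slow_ops: u64,
    /// Wall time covered by the window (field name is an assumption).
    wall_ns: u64,
}

impl WindowStats {
    /// Pool two windows: sum the op counters, keep the longer wall time.
    fn merge(self, other: Self) -> Self {
        Self {
            fast_ops: self.fast_ops + other.fast_ops,
            slow_ops: self.slow_ops + other.slow_ops,
            wall_ns: self.wall_ns.max(other.wall_ns),
        }
    }

    /// Throughput numerator (assumed here to be fast + slow).
    fn total_ops(&self) -> u64 {
        self.fast_ops + self.slow_ops
    }

    /// fast / total, or None when the window saw no operations.
    fn hit_ratio(&self) -> Option<f64> {
        let total = self.total_ops();
        if total == 0 {
            None
        } else {
            Some(self.fast_ops as f64 / total as f64)
        }
    }
}

fn main() {
    let a = WindowStats { fast_ops: 90, slow_ops: 10, wall_ns: 1_000 };
    let b = WindowStats { fast_ops: 60, slow_ops: 40, wall_ns: 2_000 };
    let pooled = a.merge(b);
    println!("{:?} total={} hit_ratio={:?}", pooled, pooled.total_ops(), pooled.hit_ratio());
}
```

Taking the maximum wall time rather than the sum fits windows that ran concurrently, which is presumably why workers pooled into one cgroup aggregate are merged this way.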
Enums§
- AffinityIntent – Scenario-level affinity intent for a group of workers.
- AluWidth – ALU/SIMD execution width for WorkType::AluHot.
- CloneMode – How WorkloadHandle::spawn creates worker tasks.
- FutexLockMode – Whether WorkType::PriorityInversion uses a PI-aware mutex or a plain futex.
- MemPolicy – NUMA memory placement policy for worker processes.
- ReapMode – How a WorkType::CgroupAttachStorm worker reaps the transient children it forks each iteration.
- ResolvedAffinity – Resolved CPU affinity for a worker process.
- SchedClass – Coarse Linux scheduling class identifier.
- SchedPolicy – Linux scheduling policy for a worker process.
- WakeMechanism – Wake mechanism between stages of a WorkType::WakeChain.
- WorkPhase – A single phase in a WorkType::Sequence compound work pattern.
- WorkType – What each worker process does during a scenario.
- WorkTypeValidationError – Spawn-time validation failures for WorkType preconditions.
- WorkerExitInfo – Reason a sentinel WorkerReport was synthesized; attached to the report's exit_info field so operators can triage a missing or undecodable postcard payload without cross-referencing parent-side logs.
Constants§
- SCHBENCH_PERCENTILES – The five latency percentiles reported by StandaloneReport and the per-phase metric path, in column order: 20.0, 50.0, 90.0, 99.0, 99.9. Matches schbench's percentile rows (schbench.c show_latencies). Callers label the StandaloneReport percentile arrays by zipping with this slice rather than hard-coding an index-to-percentile mapping.
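A short sketch of the zip-based labelling described above. The constant is redeclared locally so the example is self-contained; its `[f64; 5]` type, the `u64` microsecond values, and the `label_percentiles` helper are assumptions made for illustration.

```rust
/// Local mirror of the documented values; the element type is an assumption.
const SCHBENCH_PERCENTILES: [f64; 5] = [20.0, 50.0, 90.0, 99.0, 99.9];

/// Pair each percentile with its value instead of hard-coding indices.
/// `values_us` stands in for one of the report's percentile arrays.
fn label_percentiles(values_us: &[u64; 5]) -> Vec<String> {
    SCHBENCH_PERCENTILES
        .iter()
        .zip(values_us.iter())
        .map(|(p, v)| format!("p{}: {} us", p, v))
        .collect()
}

fn main() {
    let wakeup_us = [5, 12, 40, 150, 300]; // made-up sample values
    for line in label_percentiles(&wakeup_us) {
        println!("{}", line);
    }
}
```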
Traits§
- WorkerReportClaim – Pointwise-claim accessors generated by #[derive(Claim)] on WorkerReport. One claim_<field> method per public field, taking &mut Verdict as the accumulator; container fields (BTreeSet / Vec) route through SetClaim / SeqClaim. Method dispatch keys on the stats struct's type, so identical field names across distinct stats structs do not collide. For prelude-exported stats types the trait is preluded, so use ktstr::prelude::* brings the accessors into scope; otherwise import the trait from the stats type's module.
Functions§
- pipe_transfer_report – Pipe-mode (-p) throughput reporting used by the ktstr-schbench-validate driver to mirror schbench's avg worker transfer line; clamps the transfer size + scales bytes/sec exactly like schbench (schbench.c:1979-1982). Not in the prelude (validation-tool surface, like StandaloneReport). Derive the pipe-mode avg worker transfer line from a run's aggregate achieved_rps (completed cycles/sec over the true elapsed window), the requested pipe_transfer_bytes, and the resolved nr_workers. The figure is PER WORKER: schbench divides by loop_runtime = Σ each worker's runtime (schbench.c:1697 sums worker->runtime; :1942-1943 / :1979 divide by it), and Σ worker runtimes ≈ nr_workers * elapsed, so the per-worker rate is the aggregate achieved_rps / nr_workers; the label is literally "avg WORKER transfer". (Dividing the aggregate by wall-clock alone would over-report by nr_workers×.) The transfer size is CLAMPED to PIPE_TRANSFER_BUFFER first (the engine moves only the clamped size per cycle; run applies the same .min()), matching schbench's parse-time clamp (schbench.c:291-294), so the throughput reflects the bytes ACTUALLY moved. Scaling is schbench's pretty_size (schbench.c:1606). nr_workers is floored at 1 (no division by zero).
- run_standalone – Run the schbench engine standalone (host-side, no VM, no phases) for run_secs seconds and project the whole-run result into a StandaloneReport for the side-by-side validation against the reference schbench.
- set_thread_affinity – Set per-thread CPU affinity via sched_setaffinity(2).
- taobench_run_standalone – Host-side standalone run of the taobench engine for run_secs, returning a summary report. It is the analog of schbench's run_standalone and backs the ktstr-taobench-validate driver for the side-by-side comparison against the reference taobench. NOT used in-VM (the scenario engine drives run there).
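The per-worker arithmetic that pipe_transfer_report is described as performing can be sketched as below. This is not the crate's implementation: `per_worker_transfer` is a hypothetical helper, the `PIPE_TRANSFER_BUFFER` value is a placeholder (the real value is not given here), and the pretty_size unit scaling is left out.

```rust
/// Placeholder cap; the real PIPE_TRANSFER_BUFFER value is not stated here.
const PIPE_TRANSFER_BUFFER: usize = 1024 * 1024;

/// Returns (per-worker ops/sec, per-worker bytes/sec), unscaled.
fn per_worker_transfer(
    achieved_rps: f64,
    pipe_transfer_bytes: usize,
    nr_workers: usize,
) -> (f64, f64) {
    // Floor the worker count at 1 so the division is always defined.
    let workers = nr_workers.max(1) as f64;
    // Clamp first: only the clamped size is actually moved per cycle.
    let bytes = pipe_transfer_bytes.min(PIPE_TRANSFER_BUFFER) as f64;
    // Aggregate rate divided by worker count gives the per-worker rate.
    let ops_per_sec = achieved_rps / workers;
    (ops_per_sec, ops_per_sec * bytes)
}

fn main() {
    let (ops, bytes) = per_worker_transfer(1000.0, 4096, 4);
    println!("avg worker transfer: {} ops/sec {} bytes/sec", ops, bytes);
}
```

Skipping the division by `nr_workers` would report the aggregate rate under a per-worker label, which is the nr_workers× over-report the description warns about.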