
Work Types

WorkType decides what each worker process does — and each variant targets a specific kernel scheduling path, so a test pins down the code path a regression lives in. ForkExit hammers wake_up_new_task and the exit path (do_group_exit / wait_task_zombie); AffinityChurn drives affine_move_task and migration_cpu_stop. If you know which path your scheduler change touches, there is usually a work type aimed at it.

Choosing a work type

Scheduler behavior to test | Work type
--- | ---
Basic load balancing / fairness | SpinWait (default)
Wake placement / sleep-wake cycles | YieldHeavy, FutexPingPong
CPU borrowing / idle balance | Bursty, IdleChurn
Cross-CPU wake latency | PipeIo, CachePipe, WakeChain
Cache-aware scheduling | CachePressure, CacheYield
Fan-out wake storms | FutexFanOut, FanOutCompute
Broadcast wakeups (thundering herd) | ThunderingHerd
epoll exclusive-wake paths | EpollStorm
Timer (hrtimer) wake-to-run latency | TimerLatency
IRQ/softirq wake paths | IrqWake, NetTraffic
Wakeup + request latency (schbench parity) | Schbench
KV object-cache request mix (taobench parity) | Taobench
Task creation/destruction pressure | ForkExit
Priority reweighting / nice dynamics | NiceSweep
Affinity churn / forced migration | AffinityChurn, CrossAffinityChurn, NumaMigrationChurn
Scheduling-class transitions | PolicyChurn
Cgroup migration paths | CgroupChurn, CgroupAttachStorm
Page fault / TLB pressure | PageFaultChurn
NUMA locality under migration | NumaWorkingSetSweep
Lock contention / convoy effect | MutexContention
Priority inversion | PriorityInversion
RT starving or preempting CFS | RtStarvation, PreemptStorm
Signal delivery pressure | SignalStorm
Producer/consumer imbalance | ProducerConsumerImbalance
Block-I/O D-state cycles | IoSyncWrite, IoRandRead, IoConvoy
Mixed / phased real-world patterns | Mixed, Sequence
High-IPC compute, SMT interference | AluHot, SmtSiblingSpin, IpcVariance
Arbitrary user-defined workload | Custom

Variants by intent

The WorkType enum in ktstr::workload is the source of truth — run cargo doc for full per-variant semantics, parameters, and kernel-path citations. The shape of each family:

CPU primitives. SpinWait — tight spin loop, pure CPU. YieldHeavy — sched_yield every iteration, exercising wake/sleep paths. Mixed — spin burst then yield. AluHot { width } — parallel multiply chains at high IPC, optionally SIMD. SmtSiblingSpin — paired PAUSE-spin on two SMT siblings. IpcVariance { hot_iters, cold_iters, period_iters } — alternating high-IPC and cache-miss phases.
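To make the Mixed shape concrete, here is a minimal std-only sketch of a spin-then-yield cycle. It is not ktstr's implementation; `mixed_cycle`, `run_mixed`, and the `spin_iters` parameter are hypothetical names. On Linux, `std::thread::yield_now` issues `sched_yield`, the same syscall YieldHeavy and Mixed lean on.

```rust
use std::hint::black_box;
use std::thread;

/// Hypothetical sketch of one Mixed-style iteration (not ktstr's code):
/// a CPU-bound spin burst followed by a voluntary yield.
/// Returns the accumulator so the spin loop cannot be optimized away.
pub fn mixed_cycle(spin_iters: u64, seed: u64) -> u64 {
    let mut acc = seed;
    for i in 0..spin_iters {
        // Cheap integer work; black_box stops the compiler folding the loop.
        acc = black_box(acc.wrapping_mul(6364136223846793005).wrapping_add(i));
    }
    // On Linux this is sched_yield(2): the task gives up the CPU voluntarily.
    thread::yield_now();
    acc
}

/// Run `cycles` spin+yield iterations and return how many completed.
pub fn run_mixed(cycles: u32, spin_iters: u64) -> u32 {
    let mut acc = 0u64;
    let mut done = 0;
    for _ in 0..cycles {
        acc = mixed_cycle(spin_iters, acc);
        done += 1;
    }
    black_box(acc);
    done
}
```

Dropping the spin loop (`spin_iters = 0`) leaves a pure yield loop, which is roughly the YieldHeavy end of the same spectrum.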

Block I/O (against /dev/vda; per-worker tempfile fallback when absent). IoSyncWrite — striped O_SYNC pwrites + fdatasync, fsync-heavy D-state cycles. IoRandRead — 4 KiB O_DIRECT preads at random offsets, high-IOPS short D-states. IoConvoy — interleaved sequential writes and random reads with periodic fdatasync.

Burst-and-sleep. Bursty { burst_duration, sleep_duration } — CPU burst then sleep, freeing CPUs for borrowing. IdleChurn — burst then nanosleep, exercising hrtimer + idle-class paths.

Cache pressure. CachePressure { size_kib, stride } — strided read-modify-write sized to pressure L1. CacheYield — the same plus sched_yield, testing re-placement with a cache-hot working set.
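A strided read-modify-write pass, the access pattern CachePressure describes, can be sketched in a few lines of std Rust. The helper names are hypothetical and this is an illustration, not ktstr's loop; a 64-byte stride is assumed because it matches the cache-line size of most x86-64 and many Arm cores, so each step lands on a new line.

```rust
/// Hypothetical helper (not ktstr's code): a working set of `size_kib` KiB.
pub fn make_working_set(size_kib: usize) -> Vec<u8> {
    vec![0u8; size_kib * 1024]
}

/// Hypothetical sketch of one CachePressure-style pass: read each byte at
/// `stride` intervals, modify it, and write it back.
/// Returns how many bytes were touched.
pub fn strided_rmw_pass(buf: &mut [u8], stride: usize) -> usize {
    assert!(stride > 0, "stride must be non-zero");
    let mut touched = 0;
    for i in (0..buf.len()).step_by(stride) {
        // One read-modify-write per stride step.
        buf[i] = buf[i].wrapping_add(1);
        touched += 1;
    }
    touched
}
```

Sizing the buffer just above a cache level's capacity is what turns this loop into eviction pressure; CacheYield adds a yield between passes so the scheduler has to re-place a task whose working set is still hot.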

Wake placement and cross-CPU paths. PipeIo — CPU burst then 1-byte pipe exchange with a partner. FutexPingPong { spin_iters } — paired futex wait/wake (non-WF_SYNC path). CachePipe — cache-hot working set + pipe wake. FutexFanOut { fan_out, spin_iters } — one messenger wakes N receivers, which measure wake-to-run latency. FanOutCompute — fan-out plus matrix-multiply think time per receiver. WakeChain { depth, wake, work_per_hop } — a ring of waker-wakee hops via pipe (WF_SYNC) or futex. AsymmetricWaker — paired workers in mismatched scheduling classes sharing a futex. EpollStorm { producers, consumers, events_per_burst } — eventfd producers + epoll_wait consumers (exclusive autoremove wake). ThunderingHerd { waiters, batches, inter_batch_ms } — N waiters on one futex word, broadcast-woken.
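The core of PipeIo is a 1-byte exchange in which each side blocks until its partner's write wakes it. The sketch below is an analogue, not ktstr's code: to stay within std it uses two threads and a Unix socketpair (`UnixStream::pair`) instead of forked processes and a pipe, and `ping_pong` is a hypothetical name.

```rust
use std::io::{Read, Write};
use std::os::unix::net::UnixStream;
use std::thread;

/// Hypothetical PipeIo-style analogue: two partners bounce one byte back
/// and forth, so every hop is a blocking read that the other side's write
/// must wake. Returns how many rounds the partner completed.
pub fn ping_pong(rounds: u32) -> std::io::Result<u32> {
    let (mut a, mut b) = UnixStream::pair()?;
    let partner = thread::spawn(move || -> std::io::Result<u32> {
        let mut byte = [0u8; 1];
        let mut seen = 0;
        for _ in 0..rounds {
            b.read_exact(&mut byte)?; // sleep until the ping arrives
            b.write_all(&byte)?; // wake the other side with the pong
            seen += 1;
        }
        Ok(seen)
    });
    let mut byte = [0u8; 1];
    for i in 0..rounds {
        a.write_all(&[i as u8])?; // ping: wakes the partner
        a.read_exact(&mut byte)?; // block until the pong wakes us
        assert_eq!(byte[0], i as u8);
    }
    partner.join().expect("partner thread panicked")
}
```

Every round costs two cross-task wakeups, so where the scheduler places the wakee (same CPU, same LLC, or farther away) dominates the round-trip time.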

Timer and IRQ wakes (the AF_PACKET variants need #[ktstr_test(network = ...)]). TimerLatency { interval_us } — cyclictest-style absolute-deadline hrtimer wake. NetTraffic — AF_PACKET self-traffic driving virtio-net RX hardirq + NAPI softirq. IrqWake — paired sender/receiver; the receiver blocked in recvfrom is woken from NET_RX softirq context.
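The "absolute-deadline" part of TimerLatency matters because it keeps oversleeping on one iteration from shifting every later deadline. cyclictest gets this from `clock_nanosleep` with `TIMER_ABSTIME`; std has no absolute sleep, so this hypothetical sketch (not ktstr's code) approximates it by deriving each deadline from a fixed start and sleeping only for the remainder.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical cyclictest-style loop. Deadlines are start + k * interval,
/// so a late wake on one iteration does not push later deadlines back.
/// Returns the wake latency (actual wake time minus deadline) per iteration.
pub fn timer_latencies(interval_us: u64, iterations: u32) -> Vec<Duration> {
    let interval = Duration::from_micros(interval_us);
    let start = Instant::now();
    let mut latencies = Vec::with_capacity(iterations as usize);
    for k in 1..=iterations {
        let deadline = start + interval * k;
        // Sleep only for whatever remains until the absolute deadline.
        let remaining = deadline.saturating_duration_since(Instant::now());
        thread::sleep(remaining);
        // Record how late we woke; saturating keeps this non-negative.
        latencies.push(Instant::now().saturating_duration_since(deadline));
    }
    latencies
}
```

Unlike a true `TIMER_ABSTIME` sleep, this version can still drift by the time between reading the clock and entering the sleep, which is why the real variant arms the hrtimer on the absolute deadline directly.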

Lifecycle and class churn. ForkExit — rapid fork + _exit + waitpid cycles. NiceSweep — nice level cycled -20..19 (reweight_task; negative values skipped without CAP_SYS_NICE). AffinityChurn { spin_iters } — self-directed sched_setaffinity to random CPUs. CrossAffinityChurn — workers rewrite their cgroup siblings’ affinity; needs a dedicated cgroup. PolicyChurn — SCHED_OTHER → BATCH → IDLE (→ FIFO/RR with CAP_SYS_NICE) via __sched_setscheduler. NumaMigrationChurn { period_ms } — affinity rotated across NUMA nodes. CgroupChurn { groups, cycle_ms } — membership cycled between sibling cgroups. CgroupAttachStorm — transient children migrated into a sibling cgroup mid-exit (attach-path leader race).
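The NiceSweep range rule is easy to get wrong, so here is a minimal sketch of the sequence a worker would cycle through. `nice_sweep` is a hypothetical name, not ktstr's API; it encodes only what the text states: the full -20..19 range with CAP_SYS_NICE, and 0..19 without it, since lowering nice below 0 needs the capability.

```rust
/// Hypothetical sketch (not ktstr's code) of the nice values a
/// NiceSweep-style worker cycles through. Each change is a reweight.
pub fn nice_sweep(has_cap_sys_nice: bool) -> Vec<i32> {
    // Negative nice values need CAP_SYS_NICE, so skip them without it.
    let lowest = if has_cap_sys_nice { -20 } else { 0 };
    (lowest..=19).collect()
}
```

A worker could then loop with `nice_sweep(cap).iter().cycle()` and apply each value in turn.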

Memory pressure / NUMA. PageFaultChurn { region_kib, touches_per_cycle, spin_iters } — mmap, fault random 4 KiB pages through do_anonymous_page, MADV_DONTNEED, repeat. NumaWorkingSetSweep — the working set rotated across NUMA nodes via mbind.

Lock contention. MutexContention { contenders, hold_iters, work_iters } — N-way futex mutex contention (convoy effect, lock-holder preemption). PriorityInversion — three priority tiers contending for one lock, PI or plain futex mode.
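The MutexContention pattern (work outside the lock, then a critical section of fixed length) can be modeled with `std::sync::Mutex`, which on Linux is built on futexes. This is an illustrative sketch under that assumption, not ktstr's worker; `contend` is a hypothetical name, and the example values mirror the `mutex_contention(4, 256, 1024)` call shown later.

```rust
use std::hint::black_box;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical sketch of N-way lock contention: each contender does
/// `work_iters` of work outside the lock, then spins `hold_iters` inside it.
/// Returns the total number of critical sections entered.
pub fn contend(contenders: usize, rounds: u32, hold_iters: u64, work_iters: u64) -> u64 {
    let counter = Arc::new(Mutex::new(0u64));
    let handles: Vec<_> = (0..contenders)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..rounds {
                    // Work outside the lock.
                    for i in 0..work_iters {
                        black_box(i);
                    }
                    let mut guard = counter.lock().unwrap();
                    // Spin while holding the lock: if the holder is preempted
                    // here, every waiter stalls (lock-holder preemption).
                    for i in 0..hold_iters {
                        black_box(i);
                    }
                    *guard += 1;
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let total = *counter.lock().unwrap();
    total
}
```

Usage: `contend(4, 100, 256, 1024)` runs four contenders for 100 rounds each and returns 400. Raising `hold_iters` relative to `work_iters` makes convoys form more readily.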

Signal / preemption pressure. SignalStorm — paired workers fire tkill(partner, SIGUSR2) between bursts. PreemptStorm — one SCHED_FIFO worker preempts CFS spinners at ~kHz. RtStarvation — SCHED_FIFO workers monopolize CPUs while CFS workers starve.

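Compound. Sequence { first, rest } — ordered WorkPhases (Spin / Sleep / Yield / Io / AluHot, each with a Duration) looped for the run:

```rust
use std::time::Duration;
use ktstr::prelude::*; // WorkType, WorkPhase

// Spin 100 ms, sleep 50 ms, yield 20 ms, then repeat for the whole run.
let work = WorkType::Sequence {
    first: WorkPhase::Spin(Duration::from_millis(100)),
    rest: vec![
        WorkPhase::Sleep(Duration::from_millis(50)),
        WorkPhase::Yield(Duration::from_millis(20)),
    ],
};
```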
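Assuming each phase runs for its full Duration back to back and the list then restarts (the text says only that the phases are "looped for the run"), the phase active at any moment follows from the elapsed time modulo the cycle length. `PhaseKind` and `active_phase` are hypothetical stand-ins, not ktstr's WorkPhase.

```rust
use std::time::Duration;

/// Hypothetical stand-in for a few WorkPhase kinds (not ktstr's type).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum PhaseKind {
    Spin,
    Sleep,
    Yield,
}

/// Given the phases in order (first, then rest) and the time since the run
/// started, return which phase is active, assuming the list loops.
pub fn active_phase(phases: &[(PhaseKind, Duration)], elapsed: Duration) -> PhaseKind {
    let cycle: Duration = phases.iter().map(|&(_, d)| d).sum();
    assert!(!cycle.is_zero(), "a sequence needs a non-zero total duration");
    // Position inside the current pass through the sequence.
    let mut offset = Duration::from_nanos((elapsed.as_nanos() % cycle.as_nanos()) as u64);
    for &(kind, d) in phases {
        if offset < d {
            return kind;
        }
        offset -= d;
    }
    unreachable!("offset is always less than the cycle length")
}
```

With the 100/50/20 ms sequence above the cycle is 170 ms, so 160 ms into the run falls in the Yield phase and 270 ms (100 ms into the second pass) falls in Sleep.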

User-supplied. Custom — your own work function, with a fork-safe config payload. The how-to lives in Custom Scenarios.

use ktstr::prelude::*; brings in WorkType, WorkSpec, WorkPhase, SchedPolicy, and the parameter enums (SchbenchConfig, TaobenchConfig, FutexLockMode, WakeMechanism, SchedClass, ReapMode, AluWidth). Note the prelude also exports an unrelated Phase from the assertion layer — WorkType::Sequence uses WorkPhase.

Constructors and defaults

Every parameterized variant has a snake-case constructor — WorkType::bursty(burst, sleep), WorkType::mutex_contention(4, 256, 1024), WorkType::wake_chain(depth, wake, work_per_hop), and so on — with parameter validation where zero values are meaningless. Duration-typed parameters (Bursty, IdleChurn, WakeChain) take std::time::Duration, not raw integers.

WorkType::from_name("FutexPingPong") resolves a PascalCase name to a default-parameterized instance; the per-variant defaults are the constants in the ktstr::workload::defaults module. Sequence and Custom require explicit construction and return None from name lookup. WorkType::ALL_NAMES lists every name; WorkType::name() maps back.

Grouped work types

PipeIo, FutexPingPong, and CachePipe pair workers and require even num_workers. FutexFanOut and FanOutCompute require num_workers divisible by fan_out + 1 (one messenger + N receivers per group). MutexContention requires divisibility by contenders. WorkType::worker_group_size() returns the group size for these variants, None for ungrouped types.
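The grouping rules reduce to a divisibility check on the worker count. The sketch below restates them as a hypothetical `Grouping` enum and checker; it mirrors the rules above rather than reproducing `WorkType::worker_group_size`.

```rust
/// Hypothetical mirror of the grouping rules (not ktstr's code).
#[derive(Clone, Copy, Debug)]
pub enum Grouping {
    Pair,                             // PipeIo, FutexPingPong, CachePipe
    FanOut { fan_out: usize },        // FutexFanOut, FanOutCompute
    Contenders { contenders: usize }, // MutexContention
    Ungrouped,                        // e.g. SpinWait
}

/// Workers per group, or None for ungrouped work.
pub fn group_size(g: Grouping) -> Option<usize> {
    match g {
        Grouping::Pair => Some(2),
        Grouping::FanOut { fan_out } => Some(fan_out + 1), // 1 messenger + N receivers
        Grouping::Contenders { contenders } => Some(contenders),
        Grouping::Ungrouped => None,
    }
}

/// Reject worker counts that do not split evenly into groups.
pub fn check_num_workers(g: Grouping, num_workers: usize) -> Result<(), String> {
    match group_size(g) {
        // The n == 0 test runs first, so the modulo never divides by zero.
        Some(n) if n == 0 || num_workers % n != 0 => Err(format!(
            "num_workers={num_workers} is not a multiple of group size {n}"
        )),
        _ => Ok(()),
    }
}
```

For example, FutexFanOut with `fan_out = 3` needs groups of 4, so 8 workers form two groups while 6 workers are rejected.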

Schbench

Schbench re-expresses schbench’s default mode natively: message threads batch-wake worker threads (wakeup latency), each worker think-sleeps then does matrix work under a per-CPU lock (request latency). SchbenchConfig’s fields map schbench’s -m/-t/-F/-n/-s/-L/-R/-A/-p flags — the rustdoc has the full CLI-parity table, including which knobs ktstr’s topology sets for you. Use a single ktstr worker (workers(1)): the message/worker parallelism is this variant’s internal thread topology, not ktstr worker processes.

Worker teardown and process groups

Every worker calls setpgid(0, 0) after fork, and teardown SIGKILLs the worker’s whole process group — on graceful stop, on escalation, and again at handle drop. Anything a worker spawns that inherits its pgid (a helper binary, a subshell) dies with it. A child that must outlive the worker needs its own process group (setpgid(child_pid, 0)) or an explicit wait before the worker returns.
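From Rust, the std equivalent of `setpgid(child_pid, 0)` for a spawned helper is `CommandExt::process_group(0)`, which puts the child in a new process group whose ID is its own PID. The sketch below assumes Linux (it reads `/proc/<pid>/stat` to confirm the group), and both helper names are hypothetical.

```rust
use std::io;
use std::os::unix::process::CommandExt;
use std::process::{Child, Command};

/// Hypothetical helper: spawn `program` in its own process group so a
/// SIGKILL aimed at the worker's process group does not reach it.
pub fn spawn_in_own_group(program: &str, args: &[&str]) -> io::Result<Child> {
    Command::new(program).args(args).process_group(0).spawn()
}

/// Hypothetical helper: read the process-group ID of `pid` from
/// /proc/<pid>/stat (Linux only).
pub fn pgid_of(pid: u32) -> io::Result<u32> {
    let stat = std::fs::read_to_string(format!("/proc/{pid}/stat"))?;
    // comm may contain spaces, so parse the fields after the last ')'.
    let rest = &stat[stat.rfind(')').map(|i| i + 1).unwrap_or(0)..];
    let fields: Vec<&str> = rest.split_whitespace().collect();
    // Fields after comm are: state, ppid, pgrp, ...
    fields
        .get(2)
        .and_then(|s| s.parse().ok())
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "malformed stat"))
}
```

A helper spawned this way survives the worker's teardown, so the worker (or the test) becomes responsible for killing and reaping it explicitly.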

Clone mode and pcomm

Workers fork by default (CloneMode::Fork: one process per worker); CloneMode::Thread runs them as threads sharing the parent’s thread group. Setting pcomm on a WorkSpec or CgroupDef routes workers through a fork-then-thread path: one forked leader whose comm is the pcomm value hosts the matching workers as its threads — the per-process-leader shape schedulers expect from real applications. See the WorkloadConfig and WorkSpec rustdoc for the mechanics.

WorkloadConfig

WorkloadConfig is the low-level spawn spec CgroupDef builds internally; use it directly only when calling WorkloadHandle::spawn from a custom scenario. Its default is one SpinWait worker with inherited affinity and policy. The composed field carries secondary WorkSpec groups spawned alongside the primary; reports identify them by group_idx. Topology-aware AffinityIntent variants (SingleCpu, LlcAligned, CrossCgroup, SmtSiblingPair) need scenario context and are rejected at the direct-spawn gate. See Workers and Workloads.

Scheduling policies

```rust
use std::time::Duration;

pub enum SchedPolicy {
    Normal,          // SCHED_NORMAL (SCHED_OTHER)
    Batch,           // SCHED_BATCH
    Idle,            // SCHED_IDLE
    Fifo(u32),       // priority 1-99
    RoundRobin(u32), // priority 1-99
    Deadline { runtime: Duration, deadline: Duration, period: Duration },
    Ext,             // SCHED_EXT — route through the loaded BPF scheduler
}
```

Fifo, RoundRobin, and Deadline require CAP_SYS_NICE. A Deadline that violates runtime <= deadline <= period fails with a diagnostic before the syscall is made. Ext is SCHED_EXT: it routes the worker through the loaded sched_ext scheduler even under an SCX_OPS_SWITCH_PARTIAL scheduler that leaves other tasks in fair, and it requires CONFIG_SCHED_CLASS_EXT in the guest kernel.
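The two documented pre-syscall rules, RT priorities in 1-99 and runtime <= deadline <= period for Deadline, can be sketched as plain checks. These functions are hypothetical and cover only the rules stated here; the kernel applies further limits of its own when the policy is actually set.

```rust
use std::time::Duration;

/// Hypothetical check (not ktstr's code): Fifo/RoundRobin priority range.
pub fn validate_rt_priority(prio: u32) -> Result<(), String> {
    if (1..=99).contains(&prio) {
        Ok(())
    } else {
        Err(format!("RT priority {prio} outside 1-99"))
    }
}

/// Hypothetical check (not ktstr's code): runtime <= deadline <= period.
/// Equal values are allowed.
pub fn validate_deadline(runtime: Duration, deadline: Duration, period: Duration) -> Result<(), String> {
    if runtime > deadline {
        return Err(format!("runtime {runtime:?} exceeds deadline {deadline:?}"));
    }
    if deadline > period {
        return Err(format!("deadline {deadline:?} exceeds period {period:?}"));
    }
    Ok(())
}
```

For example, 2 ms of runtime within a 5 ms deadline and a 10 ms period passes, while a 6 ms runtime against a 5 ms deadline is rejected before any syscall.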