Topology

Schedulers make placement decisions across LLC and NUMA boundaries — where to wake a task, when a migration is worth the cache cost. Each ktstr test declares the topology those decisions should be tested against, and the VM it runs in actually has it: the declared NUMA nodes, cache domains, and SMT siblings are what the guest kernel sees.

The notation

Topologies render as {n}n{l}l{c}c{t}t — NUMA nodes, LLCs, cores per LLC, threads per core. One quirk to internalize:

Note

The l count is the total LLC count across the VM, not per-node. 2n4l4c2t is 2 NUMA nodes, 4 LLCs total (2 per node), 4 cores per LLC, and 2 threads per core, for 4 × 4 × 2 = 32 vCPUs.
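To make the arithmetic concrete, here is a minimal standalone sketch that parses the notation and computes the vCPU count. The Shape type and its methods are hypothetical names introduced for illustration and are not part of ktstr's API; the sketch also assumes LLCs divide evenly across NUMA nodes.

```rust
/// Hypothetical illustration type, not ktstr's own: a parsed "{n}n{l}l{c}c{t}t" string.
#[derive(Debug, PartialEq)]
struct Shape {
    numa_nodes: u32,
    llcs: u32,    // total across the VM, not per node
    cores: u32,   // per LLC
    threads: u32, // per core
}

impl Shape {
    /// Parses e.g. "2n4l4c2t". Returns None on malformed input or a zero field.
    fn parse(s: &str) -> Option<Shape> {
        let mut fields = [0u32; 4];
        let mut rest = s;
        for (slot, suffix) in fields.iter_mut().zip(['n', 'l', 'c', 't']) {
            let end = rest.find(suffix)?;
            *slot = rest[..end].parse().ok()?;
            rest = &rest[end + 1..];
        }
        if !rest.is_empty() || fields.contains(&0) {
            return None;
        }
        let [numa_nodes, llcs, cores, threads] = fields;
        Some(Shape { numa_nodes, llcs, cores, threads })
    }

    /// The node count does not enter the product because `llcs` is already a total.
    fn vcpus(&self) -> u32 {
        self.llcs * self.cores * self.threads
    }

    /// Assumption: LLCs are spread evenly across nodes.
    fn llcs_per_node(&self) -> u32 {
        self.llcs / self.numa_nodes
    }
}

fn main() {
    for s in ["2n4l4c2t", "1n2l4c2t"] {
        let shape = Shape::parse(s).expect("well-formed topology string");
        println!(
            "{}: {} vCPUs, {} LLC(s) per node",
            s,
            shape.vcpus(),
            shape.llcs_per_node()
        );
    }
}
```

The point of the sketch is that multiplying by the node count would double-count: 2n4l4c2t is 32 vCPUs, not 64.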

Containment is strict — threads in a core, cores in an LLC, LLCs in a NUMA node — and guest CPUs are numbered sequentially through it. 1n2l4c2t (16 vCPUs) lays out as:

node 0
├─ LLC 0                      ├─ LLC 1
│  ├─ core 0: cpu 0, 1        │  ├─ core 4: cpu 8,  9
│  ├─ core 1: cpu 2, 3        │  ├─ core 5: cpu 10, 11
│  ├─ core 2: cpu 4, 5        │  ├─ core 6: cpu 12, 13
│  └─ core 3: cpu 6, 7        │  └─ core 7: cpu 14, 15
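
The sequential numbering in the diagram can be inverted with integer division. The following is a sketch under stated assumptions, not ktstr's implementation: locate and Location are hypothetical names, core indices are global (as drawn above), and LLCs are assumed to be split evenly and contiguously across nodes.

```rust
/// Hypothetical illustration type: where a guest CPU sits in the hierarchy.
#[derive(Debug, PartialEq)]
struct Location {
    node: u32,
    llc: u32,
    core: u32, // global core index, matching the diagram
    thread: u32,
}

/// Maps a sequential guest CPU number to its position.
/// `llcs` is the VM-wide total; `cores` is per LLC; `threads` is per core.
fn locate(cpu: u32, numa_nodes: u32, llcs: u32, cores: u32, threads: u32) -> Option<Location> {
    // Reject shapes this sketch cannot handle and CPUs outside the topology.
    if numa_nodes == 0 || llcs % numa_nodes != 0 {
        return None;
    }
    if cpu >= llcs * cores * threads {
        return None;
    }
    let core = cpu / threads; // threads of a core are adjacent
    let llc = core / cores; // cores of an LLC are adjacent
    let llcs_per_node = llcs / numa_nodes; // assumption: even split
    Some(Location {
        node: llc / llcs_per_node,
        llc,
        core,
        thread: cpu % threads,
    })
}

fn main() {
    // 1n2l4c2t: cpu 10 is the first thread of core 5, which sits in LLC 1.
    println!("{:?}", locate(10, 1, 2, 4, 2));
    // 2n4l4c2t: cpu 16 is the first CPU of LLC 2, the first LLC of node 1.
    println!("{:?}", locate(16, 2, 4, 4, 2));
}
```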

Most tests use one NUMA node; multi-NUMA topologies matter when the scheduler weighs memory locality. The gauntlet sweeps a test across a whole preset matrix of these shapes.

What a test declares — and what it gets

The #[ktstr_test] attributes numa_nodes, llcs, cores, and threads declare the shape (see the macro reference for defaults and inheritance). The run output echoes the topology the guest booted with, both in the [topo=...] tag of failure headers and in the timeline header:

ktstr_test 'throughput_gate' [sched=scx-ktstr] [topo=1n1l2c1t] failed:
...
topology: 1n1l2c1t (2 cpus)  scheduler: my_sched  scenario: throughput_gate  duration: 15.0s

To see a host’s physical layout in the same vocabulary, run ktstr topo:

CPUs:       64
LLCs:       4
NUMA nodes: 1
  LLC 0 (node 0): [0, 1, 2, 3, 4, 5, 6, 7, 32, 33, 34, 35, 36, 37, 38, 39]
  LLC 1 (node 0): [8, 9, 10, 11, 12, 13, 14, 15, 40, 41, 42, 43, 44, 45, 46, 47]
  LLC 2 (node 0): [16, 17, 18, 19, 20, 21, 22, 23, 48, 49, 50, 51, 52, 53, 54, 55]
  LLC 3 (node 0): [24, 25, 26, 27, 28, 29, 30, 31, 56, 57, 58, 59, 60, 61, 62, 63]

(Host CPU numbering differs from the guest’s sequential scheme — here SMT siblings sit 32 apart — which is exactly why tests declare a topology instead of inheriting the host’s.)

Cpusets from topology

Scenarios don’t hard-code CPU lists; a CpusetSpec resolves against the test’s topology at runtime. On 1n2l4c2t, CpusetSpec::Llc(0) resolves to CPUs 0-7, so the cgroup’s cpuset.cpus is written as 0-7. Llc and Numa cover their full domain, while the fractional and partition variants (Range, Disjoint, Overlap) slice the usable-CPU pool.
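
For readers unfamiliar with the cpuset.cpus syntax, the sketch below collapses a CPU list into the comma-separated range form the kernel accepts. It is a standalone illustration; cpuset_string is a hypothetical helper, not a ktstr function, and ktstr's own formatting code may differ.

```rust
/// Hypothetical helper: renders CPU ids in the kernel's list format,
/// e.g. [0, 1, 2, 5] becomes "0-2,5". Input order and duplicates do not matter.
fn cpuset_string(cpus: &[u32]) -> String {
    let mut sorted = cpus.to_vec();
    sorted.sort_unstable();
    sorted.dedup();

    let mut parts = Vec::new();
    let mut i = 0;
    while i < sorted.len() {
        let start = sorted[i];
        let mut end = start;
        // Extend the run while the next id is exactly one higher.
        while i + 1 < sorted.len() && end.checked_add(1) == Some(sorted[i + 1]) {
            i += 1;
            end = sorted[i];
        }
        parts.push(if start == end {
            start.to_string()
        } else {
            format!("{}-{}", start, end)
        });
        i += 1;
    }
    parts.join(",")
}

fn main() {
    // Guest LLC 0 on 1n2l4c2t: contiguous, so a single range.
    let guest_llc0: Vec<u32> = (0..8).collect();
    println!("{}", cpuset_string(&guest_llc0));

    // Host LLC 0 from the ktstr topo listing: SMT siblings sit 32 apart,
    // so the same LLC needs two ranges.
    let host_llc0: Vec<u32> = (0..8).chain(32..40).collect();
    println!("{}", cpuset_string(&host_llc0));
}
```

The contrast between the two calls shows why the guest's sequential numbering keeps cpuset strings simple: an LLC is always one contiguous range.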

Querying topology from a scenario

Ctx.topo is a TestTopology. The queries scenario authors actually use:

  • total_cpus(), num_llcs(), num_numa_nodes() — sizes, e.g. for skip guards (if ctx.topo.num_llcs() < 2 { return Ok(AssertResult::skip(...)) }).
  • usable_cpus() / usable_cpuset() — CPUs available for workload placement. On topologies with more than 2 CPUs the last CPU is reserved for the root cgroup (on 8 CPUs: usable = 0-6). Built-in scenarios and fractional CpusetSpecs use this pool automatically.
  • llc_aligned_cpuset(idx) / numa_aligned_cpuset(node) — the CPU set of one LLC or one node’s LLCs.
  • numa_nodes_for_cpuset(cpus) — which nodes a CPU set touches; this derives the expected-node set for NUMA checks.
  • numa_distance(from, to) — kernel conventions: 10 local, higher is farther, 255 unreachable/unknown. VM topologies without explicit distances report 10 local / 20 remote.
  • node_meminfo(node) / is_memory_only(node) — per-node memory and CXL-style memory-only node detection.
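
Two of the rules in the list above are simple enough to model directly. This is a sketch of the documented behavior only, with hypothetical function names that are not ktstr's; it assumes that topologies with 2 or fewer CPUs keep every CPU usable, which the text implies but does not state.

```rust
use std::ops::Range;

/// Models the usable-CPU rule: with more than 2 CPUs, the last CPU is
/// held back for the root cgroup. Assumption: smaller topologies reserve nothing.
fn usable_cpus(total_cpus: u32) -> Range<u32> {
    if total_cpus > 2 {
        0..total_cpus - 1
    } else {
        0..total_cpus
    }
}

/// Models the fallback distances for VM topologies without explicit
/// distances: 10 for a node to itself, 20 for any other node.
fn default_numa_distance(from: u32, to: u32) -> u8 {
    if from == to {
        10
    } else {
        20
    }
}

fn main() {
    // On 8 CPUs the pool is 0..7, i.e. CPUs 0-6.
    println!("usable on 8 CPUs: {:?}", usable_cpus(8));
    println!("usable on 2 CPUs: {:?}", usable_cpus(2));
    println!(
        "distance 0->0: {}, 0->1: {}",
        default_numa_distance(0, 0),
        default_numa_distance(0, 1)
    );
}
```

Sizing a workload against the full CPU count instead of the usable pool would oversubscribe by one on any topology larger than 2 CPUs, which is why the built-in scenarios resolve against the pool.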

Ctx::cpuset_cpus(&spec) returns the CPU count a spec resolves to — useful for sizing worker counts by hand. Its denominator is the topology-level cpuset, not any cgroup’s currently-effective one; for cgroup-aware sizing prefer CgroupDef::workers_pct, which resolves against the cgroup’s own cpuset at apply time.

The full method catalog (construction, LlcInfo, CPU-list parsing) is in the TestTopology rustdoc.

  • Gauntlet — preset topology matrices and the constraints that filter them.
  • MemPolicy — NUMA memory placement to pair with multi-node topologies.
  • Resource Budget — how the host’s topology is carved up when tests run concurrently.