Module cpu_util


CPU-affinity utilities shared across the crate.

Two helpers for reading and parsing per-task CPU affinity:

  • parse_cpu_list decodes the kernel cpulist string format ("0-3,5,7-9") emitted by /proc/<pid>/status:Cpus_allowed_list and /sys/devices/system/cpu/online.
  • read_affinity calls sched_getaffinity(2) with a dynamically-sized buffer so CONFIG_NR_CPUS > 1024 hosts are handled correctly (libc’s fixed cpu_set_t is only 1024 bits).

Both produce a sorted, deduplicated Vec<u32> of CPU ids and return None for garbled or over-cap input. They are used by both the per-thread profiler (ctprof) and the VM topology planner (vmm::host_topology). The function shape is generic enough that either subsystem could have owned it; the implementations live here so that neither has to depend on the other for a CPU-list helper.
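To make the contract concrete, here is a minimal sketch of a cpulist parser with the same observable behavior (sorted, deduplicated ids; None on empty, malformed, or over-cap input). It is not this module's implementation: the name parse_cpu_list_sketch and the cap value 65_536 are assumptions chosen for illustration, since the real MAX_CPU_RANGE_EXPANSION value is not stated on this page.

```rust
use std::collections::BTreeSet;

/// Hypothetical per-range cap; stands in for MAX_CPU_RANGE_EXPANSION,
/// whose real value is not given in this documentation.
const MAX_RANGE_EXPANSION_SKETCH: u32 = 65_536;

/// Parse a kernel cpulist such as "0-3,5,7-9" into sorted, deduplicated ids.
/// Returns None on empty input, any malformed token, or an over-cap range.
fn parse_cpu_list_sketch(s: &str) -> Option<Vec<u32>> {
    let s = s.trim();
    if s.is_empty() {
        return None;
    }
    // BTreeSet keeps ids ordered and drops duplicates from overlapping ranges.
    let mut ids = BTreeSet::new();
    for token in s.split(',') {
        let token = token.trim();
        let (lo, hi) = match token.split_once('-') {
            Some((a, b)) => (a.trim().parse::<u32>().ok()?, b.trim().parse::<u32>().ok()?),
            None => {
                let v = token.parse::<u32>().ok()?;
                (v, v)
            }
        };
        // Reject reversed ranges and ranges wider than the cap before expanding,
        // so a hostile "0-4294967295" never triggers a large allocation.
        if lo > hi || hi - lo >= MAX_RANGE_EXPANSION_SKETCH {
            return None;
        }
        ids.extend(lo..=hi);
    }
    Some(ids.into_iter().collect())
}
```

In this sketch the whole input is rejected as soon as one token fails, so a caller never sees a partial id list.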

§Why this is NOT crate::topology::parse_cpu_list

crate::topology carries its own parse_cpu_list (returns Result<Vec<usize>>) and parse_cpu_list_lenient (returns Vec<usize>, never fails). The split is deliberate, not a duplicate to consolidate:

  • Threat model. This module’s parser ingests /proc/<tid>/status data captured from arbitrary tasks on the host. Without the MAX_CPU_RANGE_EXPANSION cap, a hostile or corrupt Cpus_allowed_list: value like 0-4294967295 would expand to 2^32 u32 ids, a 16 GiB allocation. The topology parser ingests operator-supplied VM config, so it has no untrusted-input concerns and needs no cap.
  • Return shape. Option<Vec<u32>> here vs Result<Vec<usize>> / Vec<usize> in topology. The capture path needs to distinguish “no data” (None) from “data but garbled” (also None for now, with an explicit comment); the topology path needs anyhow::Error for upstream ? propagation and Vec<usize> to interop with sysfs APIs that speak usize.
  • Dedup semantics. This module dedups duplicates produced by overlapping ranges (0-2,1 yields [0,1,2]); the topology parser preserves duplicates so callers detecting operator config errors (e.g. accidentally listing the same CPU twice) can surface them.
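The 16 GiB figure and the dedup behavior from the list above can be checked with a few lines of arithmetic. The helpers below are hypothetical illustrations, not functions from either module; in particular, the ordering of a duplicate-preserving list is an assumption.

```rust
/// Bytes needed to store every id in the inclusive range lo..=hi as u32 values.
/// Returns None for a reversed range.
fn expanded_bytes(lo: u32, hi: u32) -> Option<u64> {
    let count = u64::from(hi).checked_sub(u64::from(lo))? + 1;
    Some(count * std::mem::size_of::<u32>() as u64)
}

/// Sort and deduplicate an already-expanded id list, as this module's
/// documented output shape requires.
fn sort_dedup(mut ids: Vec<u32>) -> Vec<u32> {
    ids.sort_unstable();
    ids.dedup();
    ids
}
```

For 0-4294967295 the range holds 2^32 ids at 4 bytes each, which is 2^34 bytes, or 16 GiB. A duplicate-preserving expansion of 0-2,1 would carry four entries, while sort_dedup reduces it to three.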

Unifying the two behind a generic helper would require either collapsing one set of invariants into the other or carrying both behaviors through a config struct — neither produces a cleaner end result than the current cohabitation.

Constants§

AFFINITY_INITIAL_BITS
Initial number of CPU bits the affinity buffer starts at. 8192 is the x86_64 CONFIG_NR_CPUS ceiling (NR_CPUS_RANGE_END with CPUMASK_OFFSTACK; also the MAXSMP default), so no x86_64 host exceeds it and the overwhelming majority resolve on the first syscall.
AFFINITY_MAX_BITS
Maximum number of CPU bits read_affinity is willing to allocate for. 262144 bits = 32 KiB of mask data, well above the largest in-production CONFIG_NR_CPUS this project targets. Capping bounds the worst-case allocation and bounds the retry loop’s iteration count (log2(AFFINITY_MAX_BITS / AFFINITY_INITIAL_BITS) = 5 doublings).
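The sizes and the retry bound quoted for the two constants follow directly from their values. The sketch below reproduces the arithmetic; the constant values come from the descriptions above, while mask_bytes and doublings are hypothetical helper names.

```rust
const AFFINITY_INITIAL_BITS: usize = 8192;
const AFFINITY_MAX_BITS: usize = 262_144;

/// Size in bytes of a CPU mask holding `bits` bits.
fn mask_bytes(bits: usize) -> usize {
    bits / 8
}

/// Number of times `initial` must be doubled to reach `max`.
fn doublings(initial: usize, max: usize) -> u32 {
    let mut bits = initial;
    let mut n = 0;
    while bits < max {
        bits *= 2;
        n += 1;
    }
    n
}
```

With these values the first buffer is 1 KiB, the largest is 32 KiB, and the ratio of 32 between them gives 5 doublings.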

Functions§

parse_cpu_list
Parse a cpulist string of the form "0-3,5,7-9" into a sorted deduped vec of CPU ids. None on empty input or any malformed token (partial results are rejected so the caller can distinguish “no data” from “data but garbled”).
read_affinity
Read the effective CPU affinity for a task via the sched_getaffinity(2) syscall. The kernel gates sched_getaffinity on security_task_getscheduler(p) only — under the default DAC config this is unrestricted (any task may read any other task’s affinity); an active LSM (SELinux/Yama) may return EPERM. Returns sorted CPU ids. None on syscall failure (EPERM, ESRCH) or when the kernel’s mask exceeds AFFINITY_MAX_BITS (hosts beyond 262144 CPUs).
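The grow-and-retry shape described for read_affinity can be sketched without touching the real syscall. In the sketch below, the closure parameter probe stands in for sched_getaffinity(2), and the Probe enum, read_affinity_sketch, and decode_mask are hypothetical names. It assumes the kernel reports an undersized buffer as a distinct error (EINVAL) and that the mask is laid out as 64-bit words with CPU 0 in the lowest bit of the first word.

```rust
const AFFINITY_INITIAL_BITS: usize = 8192;
const AFFINITY_MAX_BITS: usize = 262_144;

/// Outcome of one attempt to fill the mask buffer (stand-in for the syscall).
enum Probe {
    Filled,   // the mask was written into the buffer
    TooSmall, // buffer shorter than the kernel's mask (assumed EINVAL)
    Failed,   // any other error, e.g. EPERM or ESRCH
}

/// Start at AFFINITY_INITIAL_BITS and double on TooSmall until the probe
/// succeeds, fails outright, or the size would exceed AFFINITY_MAX_BITS.
fn read_affinity_sketch<F>(mut probe: F) -> Option<Vec<u32>>
where
    F: FnMut(&mut [u64]) -> Probe,
{
    let mut bits = AFFINITY_INITIAL_BITS;
    while bits <= AFFINITY_MAX_BITS {
        let mut words = vec![0u64; bits / 64];
        match probe(&mut words) {
            Probe::Filled => return Some(decode_mask(&words)),
            Probe::TooSmall => bits *= 2,
            Probe::Failed => return None,
        }
    }
    None
}

/// Turn set bits into CPU ids; iterating words and bits in ascending order
/// yields an already sorted, duplicate-free list.
fn decode_mask(words: &[u64]) -> Vec<u32> {
    let mut ids = Vec::new();
    for (i, &word) in words.iter().enumerate() {
        for bit in 0..64 {
            if word & (1u64 << bit) != 0 {
                ids.push((i * 64 + bit) as u32);
            }
        }
    }
    ids
}
```

Passing the probe as a closure keeps the loop testable: a probe that always reports TooSmall is called six times (the initial size plus five doublings) before the function gives up with None.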