Function read_affinity

pub fn read_affinity(tid: i32) -> Option<Vec<u32>>

Read the effective CPU affinity of a task via the sched_getaffinity(2) syscall. The kernel gates sched_getaffinity only on the security_task_getscheduler(p) LSM hook. There is no DAC or ptrace check, so when no LSM enforces that hook, any task may read any other task’s affinity. An active LSM that implements the hook (e.g. SELinux or Smack) may deny the call; such denials typically surface as EACCES rather than EPERM. Returns the allowed CPU ids in ascending order. Returns None on syscall failure (e.g. EPERM, EACCES, ESRCH) or when the kernel’s mask exceeds AFFINITY_MAX_BITS (hosts with more than 262144 possible CPUs).
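The last step, turning the kernel’s bitmask into the sorted id list, can be sketched as follows. This is an illustrative sketch, not the function’s actual body: `mask_to_cpu_ids` is a hypothetical name, and the layout it assumes (CPU n at bit n % BITS of word n / BITS, where BITS is the width of unsigned long) is the kernel cpumask layout that cpu_set_t also uses.

```rust
use std::os::raw::c_ulong;

/// Hypothetical helper: decode a kernel-format CPU mask into ascending CPU ids.
/// CPU n is bit (n % BITS) of word (n / BITS), BITS = width of unsigned long.
fn mask_to_cpu_ids(words: &[c_ulong]) -> Vec<u32> {
    let bits = c_ulong::BITS; // 64 on LP64 Linux targets, 32 on 32-bit ones
    let mut cpus = Vec::new();
    for (i, &word) in words.iter().enumerate() {
        let mut w = word;
        while w != 0 {
            // Lowest set bit first, so ids come out in ascending order.
            cpus.push(i as u32 * bits + w.trailing_zeros());
            w &= w - 1; // clear the lowest set bit
        }
    }
    cpus
}
```

Walking words in index order and bits from least to most significant yields ascending ids, so no separate sort is needed.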

§Dynamic buffer sizing

The kernel’s SYSCALL_DEFINE3(sched_getaffinity) (kernel/sched/syscalls.c) rejects with EINVAL any caller buffer whose length in bits (len * BITS_PER_BYTE) is smaller than nr_cpu_ids. nr_cpu_ids is set at boot from the possible-CPU mask and is bounded above by CONFIG_NR_CPUS. On x86_64 the CONFIG_NR_CPUS maximum is 8192 (NR_CPUS_RANGE_END with CPUMASK_OFFSTACK; without it the maximum is 512); other architectures may allow more (large NUMA or partitioned hardware). libc’s fixed libc::cpu_set_t is only 1024 bits wide, so calling sched_getaffinity with size_of::<cpu_set_t>() fails with EINVAL on any host whose nr_cpu_ids exceeds 1024, even when the caller has legitimate access.
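To make the size arithmetic concrete, here is a userspace model of the two length checks at the top of SYSCALL_DEFINE3(sched_getaffinity). It is a sketch for illustration only: `kernel_len_check` and `CPU_SET_T_BYTES` are hypothetical names, and the real checks live in kernel C code. The second check (byte length a whole number of unsigned longs) is modeled as well because the kernel applies it too.

```rust
use std::mem::size_of;
use std::os::raw::c_ulong;

/// Linux errno value for EINVAL.
const EINVAL: i32 = 22;

/// glibc's fixed cpu_set_t: CPU_SETSIZE = 1024 bits, i.e. 128 bytes.
const CPU_SET_T_BYTES: usize = 1024 / 8;

/// Hypothetical model of the kernel's two buffer-length checks.
fn kernel_len_check(len_bytes: usize, nr_cpu_ids: usize) -> Result<(), i32> {
    // Check 1: the buffer must have room for one bit per possible CPU.
    if len_bytes * 8 < nr_cpu_ids {
        return Err(EINVAL);
    }
    // Check 2: the length must be a whole number of unsigned longs.
    if len_bytes % size_of::<c_ulong>() != 0 {
        return Err(EINVAL);
    }
    Ok(())
}
```

With nr_cpu_ids = 8192, the 128-byte cpu_set_t fails the first check, while a 1024-byte buffer passes both.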

This helper avoids the cap by allocating a dynamically sized Vec<c_ulong>, an array of the kernel’s unsigned long, which is the format the syscall copies out. Because the elements are c_ulong, the buffer is naturally aligned and its byte length is always a multiple of sizeof(unsigned long), which satisfies the kernel’s second length check. On EINVAL the buffer doubles and the call is retried, up to AFFINITY_MAX_BITS = 262144 bits (32 KiB of mask data). That ceiling covers every real-world CONFIG_NR_CPUS setting and bounds the worst-case allocation.
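A minimal sketch of the doubling retry, assuming the C library’s `sched_getaffinity` wrapper (glibc or musl) is linked, as it is for ordinary Rust binaries on Linux. The errno-driven loop is kept separate from the FFI call so it can be exercised against a simulated kernel. `get_mask_with_retry` and `raw_affinity` are hypothetical names, and starting at 1024 bits (the cpu_set_t size) is an assumption, since the text above does not state the initial buffer size. The `unsafe extern` block syntax needs Rust 1.82 or newer.

```rust
use std::io;
use std::mem::size_of;
use std::os::raw::{c_int, c_ulong};

/// Linux errno value for EINVAL.
const EINVAL: i32 = 22;
/// Ceiling on the mask size, in bits (32 KiB of mask data).
const AFFINITY_MAX_BITS: usize = 262_144;

unsafe extern "C" {
    // C-library wrapper: returns 0 on success, -1 with errno set on failure.
    fn sched_getaffinity(pid: c_int, cpusetsize: usize, mask: *mut c_ulong) -> c_int;
}

/// Retry `query` with a doubling buffer while it reports EINVAL.
/// `query` fills the zeroed buffer and returns Ok(()), or returns an errno.
fn get_mask_with_retry<F>(mut query: F) -> Option<Vec<c_ulong>>
where
    F: FnMut(&mut [c_ulong]) -> Result<(), i32>,
{
    let word_bits = c_ulong::BITS as usize;
    // Assumed starting size: 1024 bits, the same as cpu_set_t.
    let mut words = 1024 / word_bits;
    loop {
        let mut buf: Vec<c_ulong> = vec![0; words];
        match query(buf.as_mut_slice()) {
            Ok(()) => return Some(buf),
            // Buffer too small and still under the ceiling: double and retry.
            Err(EINVAL) if words * word_bits < AFFINITY_MAX_BITS => words *= 2,
            // EINVAL at the ceiling, or EPERM / EACCES / ESRCH / anything else.
            Err(_) => return None,
        }
    }
}

/// Raw affinity mask for `tid` (0 means the calling thread).
fn raw_affinity(tid: i32) -> Option<Vec<c_ulong>> {
    get_mask_with_retry(|buf| {
        let bytes = buf.len() * size_of::<c_ulong>();
        // SAFETY: `buf` is valid for writes of `bytes` bytes.
        let rc = unsafe { sched_getaffinity(tid, bytes, buf.as_mut_ptr()) };
        if rc == 0 {
            Ok(())
        } else {
            Err(io::Error::last_os_error().raw_os_error().unwrap_or(0))
        }
    })
}
```

Declaring the mask parameter as `*mut c_ulong` instead of `*mut cpu_set_t` is ABI-compatible (both are plain pointers) and lets the buffer be any whole number of words.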

§Error-class handling

  • EINVAL → buffer too small. Double and retry until the ceiling is reached, then surface None.
  • EPERM / EACCES / ESRCH → genuine access-control or process-identity failures. Return None so the caller falls back to the procfs path, the Cpus_allowed_list: field of /proc/<tid>/status. That file is governed by procfs DAC (open and directory-traversal permission), not by the syscall’s security_task_getscheduler LSM hook, so it can succeed where an active LSM denied the syscall.
  • Any other error → return None. The procfs fallback will produce the correct value or its own None.
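The fallback half of this split could look like the following sketch. It parses the Cpus_allowed_list: line (a comma-separated cpulist such as `0-3,8,10-11`) and consults it only when the syscall path returns None. `parse_cpu_list`, `procfs_affinity`, and `affinity_with_fallback` are hypothetical names; the syscall path is passed in as a closure so the sketch does not depend on the real read_affinity.

```rust
use std::fs;

/// Parse a kernel cpulist such as "0-3,8,10-11" into ascending CPU ids.
/// Returns None on malformed input.
fn parse_cpu_list(s: &str) -> Option<Vec<u32>> {
    let mut cpus = Vec::new();
    for part in s.trim().split(',').filter(|p| !p.is_empty()) {
        match part.split_once('-') {
            // "lo-hi" is an inclusive range.
            Some((lo, hi)) => {
                let (lo, hi): (u32, u32) = (lo.parse().ok()?, hi.parse().ok()?);
                if lo > hi {
                    return None;
                }
                cpus.extend(lo..=hi);
            }
            // A bare number is a single CPU.
            None => cpus.push(part.parse().ok()?),
        }
    }
    cpus.sort_unstable();
    cpus.dedup();
    Some(cpus)
}

/// Procfs path: the Cpus_allowed_list: field of /proc/<tid>/status.
fn procfs_affinity(tid: i32) -> Option<Vec<u32>> {
    let status = fs::read_to_string(format!("/proc/{}/status", tid)).ok()?;
    let list = status
        .lines()
        .find_map(|line| line.strip_prefix("Cpus_allowed_list:"))?;
    parse_cpu_list(list)
}

/// Try the syscall path first; fall back to procfs whenever it yields None.
fn affinity_with_fallback<S>(tid: i32, syscall_path: S) -> Option<Vec<u32>>
where
    S: FnOnce(i32) -> Option<Vec<u32>>,
{
    syscall_path(tid).or_else(|| procfs_affinity(tid))
}
```

Because `or_else` is lazy, /proc is read only when the syscall path returns None.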

The previous implementation made no such split: it collapsed every error to None, so EINVAL on a host with more than 1024 CPUs was indistinguishable from EPERM. Every caller had to rely on the procfs fallback for correctness, which made the syscall path effectively useless on exactly the hosts where affinity data matters most (NUMA machines with 1000 or more CPUs).