Enum KernelTarget

#[non_exhaustive]
pub enum KernelTarget {
    Symbol(Cow<'static, str>),
    Direct(u64),
    Kva(u64),
    PerCpuField {
        symbol: Cow<'static, str>,
        field: Cow<'static, str>,
        cpu: u32,
    },
    TaskField {
        pid: u32,
        expected_start_time_ns: u64,
        field: Cow<'static, str>,
    },
}

Host-side write/read target for the kernel-memory ops (Op::WriteKernelHot / Op::WriteKernelCold / Op::ReadKernelHot / Op::ReadKernelCold).

Each variant names a kernel address by the resolution path the host coordinator takes when the op fires; the GuestKernel write helpers consume the resolved KVA. The variant chosen here selects the translation path: KASLR-aware kernel-image base for Self::Symbol, PAGE_OFFSET subtraction for Self::Direct, page-table walk for Self::Kva, per-CPU dereference for Self::PerCpuField, or task-list walk for Self::TaskField.

§Semantic risk — writing to load-bearing scheduler state

ktstr does not gate or filter target addresses. The framework trusts the test author to know what they are pointing at. That trust extends to a class of fields where a raw write silently breaks downstream kernel invariants the test author did not intend to perturb. By design, mitigation is documentation-only: the framework will neither refuse a write nor emit a runtime warning, and the test author owns the choice. The cases to know about:

Per-runqueue counters maintained by the scheduler classes. Raw writes skip the side-effects the kernel encodes in the maintainer functions, leaving cross-class accounting in an inconsistent state.

struct rq.nr_running — the per-CPU runqueue task count. add_nr_running / sub_nr_running (kernel/sched/sched.h) also (a) fire the sched_update_nr_running_tp tracepoint and (b) call sched_update_tick_dependency(rq) (the NOHZ_FULL per-CPU tick gating logic); add_nr_running additionally sets the root-domain overloaded bit (rq->rd->overloaded) on the prev_nr < 2 && new_nr >= 2 transition. A bare 8-byte store skips all of those; the counter and the root-domain overload signal diverge, the NOHZ_FULL CPU may stop or start receiving ticks against the test author’s intent, and downstream load-balance decisions read a count that no longer matches reality.
struct cfs_rq.h_nr_runnable / h_nr_queued / h_nr_idle (kernel/sched/sched.h struct cfs_rq) — hierarchical CFS task counts maintained by account_entity_enqueue / dequeue with cascade up the task group tree. Raw write skips parent-cfs_rq propagation and breaks group scheduling accounting.
struct rt_rq.rt_nr_running (kernel/sched/sched.h struct rt_rq) — RT class runqueue task count; updated by inc_rt_tasks / dec_rt_tasks which also maintain the per-rt_rq overloaded bit and the highest_prio.curr/next priority-pushable tracking.
struct dl_rq.dl_nr_running / running_bw / this_bw (kernel/sched/sched.h struct dl_rq) — DEADLINE class counters and bandwidth tracking; add_running_bw / sub_running_bw (in kernel/sched/deadline.c) implement the admission-control accounting that SUGOV’s cpu_bw_dl() consumes for frequency selection. A raw write to any of these breaks admission control + DVFS.
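
To make the skipped side effects concrete, the sketch below models the rq.nr_running case in plain Rust. It is a simplified illustration and not kernel code: ModelRq, its fields, and both methods are hypothetical stand-ins, and only the overload transition described above is modelled (the tracepoint and tick-dependency updates are left out).

```rust
/// Hypothetical, simplified stand-in for the two pieces of state discussed
/// above: the runqueue task count and the root-domain overload signal.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct ModelRq {
    pub nr_running: u32,
    pub rd_overloaded: bool,
}

impl ModelRq {
    /// Models the maintainer path: bump the count and set the overload
    /// signal on the prev_nr < 2 && new_nr >= 2 transition.
    pub fn add_nr_running(&mut self, count: u32) {
        let prev_nr = self.nr_running;
        self.nr_running = prev_nr + count;
        if prev_nr < 2 && self.nr_running >= 2 {
            self.rd_overloaded = true;
        }
    }

    /// Models a raw store: only the counter changes, so the overload
    /// signal keeps whatever value it had before.
    pub fn raw_write_nr_running(&mut self, value: u32) {
        self.nr_running = value;
    }
}
```

Calling add_nr_running(3) on a fresh ModelRq leaves rd_overloaded set, while raw_write_nr_running(3) produces the same count with the signal still clear, which is the divergence the list above warns about.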

PELT (Per-Entity Load Tracking) averages. These are exponential moving averages whose internal _sum accumulators are advanced against cfs_rq_clock_pelt(cfs_rq) (see kernel/sched/fair.c update_load_avg, which calls into kernel/sched/pelt.c __update_load_avg_se / __update_load_avg_cfs_rq). Writing only the visible _avg value desynchronises it from the _sum it was computed from; the next update_load_avg decays both and corrupts the next several passes.

struct sched_avg fields on task_struct.se.avg and cfs_rq.avg: load_avg, runnable_avg, util_avg, util_est, plus load_sum / runnable_sum / util_sum / last_update_time / period_contrib (see include/linux/sched.h struct sched_avg).
cfs_rq.removed.{load_avg,util_avg,runnable_avg} — pending-decay buffer for departing entities; flushed at the next update_load_avg.
rq.cpu_capacity — set by update_cpu_capacity (kernel/sched/fair.c, called from the load-balance path update_group_capacity) from per-CPU RT capacity scaling; initialized at boot in kernel/sched/core.c sched_init. Raw writes are overwritten on the next load-balance tick that triggers a capacity recomputation.

Cgroup / task-group accounting. Raw writes to task-group fields bypass the cascade that the kernel performs over every group entity.

task_group.shares — cgroup CPU shares, normally set via sched_group_set_shares (kernel/sched/fair.c) which cascades into update_load_set + walks every task in the group. Raw write skips the cascade and produces inconsistent per-entity load weights.
task_group.cfs_bandwidth.{quota, period, runtime} — CFS bandwidth control. tg_set_cfs_bandwidth (kernel/sched/core.c) is the cgroup-fs writer; the per-cfs_rq runtime distribution is performed by __refill_cfs_bandwidth_runtime (kernel/sched/fair.c) gated by the cfs_bandwidth_used() static-key (kernel/sched/fair.c) registered via start_cfs_bandwidth (kernel/sched/fair.c). Raw writes skip all of those.

The right way to influence these fields is to drive the kernel into the desired state through real activity. For fake load, use Op::Spawn to run a synthetic WorkloadConfig, placed with either SpawnPlacement::RunnerCgroup (inherits the spawner’s cgroup, typically cgid=1 inside guest VMs) or SpawnPlacement::Cgroup (runs inside a named cgroup). For sched_avg, apply real preemption pressure.

§Fields that ARE safe to write raw (with caveats)

jiffies_64 (include/linux/jiffies.h) — the global timekeeping tick counter. Safe to advance FORWARD only; backward jumps trigger soft-lockup watchdog warnings and can stall time_after_eq waiters whose expiry now appears to be in the past in a way the timer wheel cannot reconcile.
Per-CPU rq.clock (struct rq.clock, kernel/sched/sched.h) — the scheduler’s per-CPU wall-time clock. Not generically safe: update_rq_clock (kernel/sched/core.c) overwrites it at every scheduling tick + every enqueue/dequeue from sched_clock_cpu(cpu), so a raw write lasts at most until the next tick (~1 ms with HZ=1000). The rq_clock_skip_update() helper sets RQCF_REQ_SKIP in rq->clock_update_flags, which suppresses one update_rq_clock call, but its semantics are tightly coupled to the RQCF_ACT_SKIP / RQCF_REQ_SKIP state machine in __schedule — a self-contained “freeze rq.clock at value X across step Y” pattern is the framework’s responsibility (planned), not a one-shot raw-write primitive. Bumping rq.clock_task directly is also NOT safe — that field is computed by update_rq_clock_task from rq->clock minus IRQ and steal-time deltas (prev_irq_time and prev_steal_time_rq) and a raw write desynchronises it from the inputs.
Per-CPU rq.scx.clock (sched_ext per-CPU clock) — safe ONLY when paired with setting SCX_RQ_CLK_VALID in rq.scx.flags. The flag gates scx_bpf_now() reads; writing the clock without the flag leaves scx_bpf_now() returning stale data, and clearing the flag without resetting the clock makes downstream BPF readers fall back to the host TSC unexpectedly. Atomic bit-set without read-back is provided by KernelValue::OrU32 — the RMW variant whose width matches struct scx_rq.flags (u32 at kernel/sched/sched.h:803). Note there is no OrU64 sibling: a 64-bit RMW at this field address would corrupt the adjacent u32 nr_immed field at kernel/sched/sched.h:804. Width is the variant tag, so wrong-width writes are a compile-time error rather than a silent field-overflow bug at runtime. Pair OrU32(SCX_RQ_CLK_VALID) with the prior U64(clock_val) write in a single Op::WriteKernelCold batch so both land under one freeze rendezvous and the kernel’s documented write-clock-BEFORE-OR-flag ordering (per kernel/sched/sched.h:1848-1854 scx_rq_clock_update) holds.
scx-ktstr private bss / per-CPU scratch — the fixture scheduler exposes a dedicated write surface for test use; raw writes there don’t propagate into core sched code by construction.
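
The clock-then-flag ordering described for rq.scx.clock can be sketched with a small in-memory model. Everything here is hypothetical: ModelScxRq, ModelWrite, apply_batch, and the MODEL_CLK_VALID bit value are illustration-only names, not ktstr or kernel definitions. The real flag value comes from the kernel headers, and the real batch goes through Op::WriteKernelCold.

```rust
/// Hypothetical bit value used only by this model. The real
/// SCX_RQ_CLK_VALID constant is defined by the kernel.
pub const MODEL_CLK_VALID: u32 = 1 << 0;

/// Hypothetical stand-in for the two fields involved.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct ModelScxRq {
    pub clock: u64,
    pub flags: u32,
}

/// Hypothetical write kinds: a plain 64-bit store to the clock and a
/// 32-bit read-modify-write OR on the flags word.
#[derive(Debug, Clone, Copy)]
pub enum ModelWrite {
    Clock(u64),
    OrFlags(u32),
}

/// Applies the writes strictly in slice order, so placing the clock
/// store first guarantees the flag is only set once the clock is valid.
pub fn apply_batch(rq: &mut ModelScxRq, batch: &[ModelWrite]) {
    for write in batch {
        match *write {
            ModelWrite::Clock(value) => rq.clock = value,
            ModelWrite::OrFlags(mask) => rq.flags |= mask,
        }
    }
}
```

Because the flag update is an OR on a 32-bit word, bits that were already set in flags survive the batch, and nothing outside that word is touched.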

§`#[non_exhaustive]`

KernelTarget is #[non_exhaustive]; see [crate::non_exhaustive] for the cross-crate pattern-match rule. Prefer the per-variant constructors (Self::symbol, Self::direct, Self::kva, Self::per_cpu_field, Self::task_field) over naming variant literals.

Variants (Non-exhaustive)§

This enum is marked as non-exhaustive

Non-exhaustive enums could have additional variants added in future. Therefore, when matching against variants of non-exhaustive enums, an extra wildcard arm must be added to account for any future variants.
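
A minimal sketch of the wildcard-arm rule follows. The enum is re-declared locally, mirroring the declaration at the top of this page, purely so the block compiles on its own; describe_path is a hypothetical helper and not part of the crate.

```rust
use std::borrow::Cow;

/// Local mirror of the enum declaration shown on this page, repeated
/// only so the sketch is self-contained.
#[non_exhaustive]
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum KernelTarget {
    Symbol(Cow<'static, str>),
    Direct(u64),
    Kva(u64),
    PerCpuField {
        symbol: Cow<'static, str>,
        field: Cow<'static, str>,
        cpu: u32,
    },
    TaskField {
        pid: u32,
        expected_start_time_ns: u64,
        field: Cow<'static, str>,
    },
}

/// Hypothetical helper: names the translation path for a target.
pub fn describe_path(target: &KernelTarget) -> &'static str {
    match target {
        KernelTarget::Symbol(_) => "kernel-image base",
        KernelTarget::Direct(_) => "direct map",
        KernelTarget::Kva(_) => "page-table walk",
        // In a downstream crate this arm is mandatory. Here it covers
        // PerCpuField, TaskField, and any variant added later.
        _ => "other",
    }
}
```

Inside the defining crate the compiler does not require the wildcard arm; the requirement only applies to matches written in other crates.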

Symbol(Cow<'static, str>)

Kernel text/data/bss symbol. The host resolves name → KVA → PA via the runtime kernel image base + KASLR phys_base, exactly as crate::monitor::guest::GuestKernel::write_symbol_u64 already does for the existing write-symbol helper.

Direct(u64)

Direct-mapped kernel virtual address — translated via kva - PAGE_OFFSET. Use this when the caller has already resolved a SLAB / per-CPU / physmem KVA and just wants the host to write at that address.
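
The kva - PAGE_OFFSET translation can be sketched as a checked subtraction. The constant below is an assumption: it is the x86_64 direct-map base for 4-level paging with KASLR disabled, and the real base depends on the guest kernel configuration. direct_map_to_pa is a hypothetical name, not a ktstr function.

```rust
/// Assumed direct-map base: the x86_64 4-level-paging default when the
/// guest runs without KASLR. Other configurations use a different base.
pub const ASSUMED_PAGE_OFFSET: u64 = 0xffff_8880_0000_0000;

/// Translates a direct-mapped KVA to a physical address as
/// kva - page_offset. Returns None when the address lies below the
/// direct-map base, since such an address cannot be direct-mapped.
pub fn direct_map_to_pa(kva: u64, page_offset: u64) -> Option<u64> {
    kva.checked_sub(page_offset)
}
```

For example, direct_map_to_pa(0xffff_8880_0010_0000, ASSUMED_PAGE_OFFSET) yields Some(0x10_0000), while a user-space-range address such as 0x1000 yields None.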

Kva(u64)

Vmalloc’d / vmap’d kernel virtual address — translated via page-table walk through the guest’s CR3. Use this for BPF maps, vmalloc’d memory, and any other address that does NOT live in the direct map.
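
As an illustration of what a page-table walk keys on, the sketch below splits a virtual address into the table indices used at each level. It assumes x86_64 4-level paging with 4 KiB pages; WalkIndices and split_va are hypothetical names, and a real walk additionally reads each table entry from guest memory starting at CR3 and handles huge-page mappings.

```rust
/// Indices selected by a virtual address at each level of an x86_64
/// 4-level page-table walk, plus the byte offset inside the 4 KiB page.
#[derive(Debug, PartialEq, Eq)]
pub struct WalkIndices {
    pub pml4: usize,
    pub pdpt: usize,
    pub pd: usize,
    pub pt: usize,
    pub page_offset: u64,
}

/// Splits a virtual address into its per-level indices. Each level
/// consumes 9 bits (512 entries per table); the low 12 bits are the
/// offset within the final page.
pub fn split_va(va: u64) -> WalkIndices {
    WalkIndices {
        pml4: ((va >> 39) & 0x1ff) as usize,
        pdpt: ((va >> 30) & 0x1ff) as usize,
        pd: ((va >> 21) & 0x1ff) as usize,
        pt: ((va >> 12) & 0x1ff) as usize,
        page_offset: va & 0xfff,
    }
}
```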

PerCpuField

Per-CPU field of a kernel struct, resolved at op dispatch time. The variant carries the symbolic intent only (symbol, field, cpu); the dispatcher looks up symbol in the vmlinux symbol table, adds __per_cpu_offset[cpu], and adds the BTF-resolved byte offset of field within symbol’s struct type to yield the per-CPU field’s runtime KVA.
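
The three-term address computation described above can be sketched as follows. per_cpu_field_kva is a hypothetical function: the dispatcher’s real lookup reads the symbol table, the guest’s __per_cpu_offset array, and BTF, whereas this sketch takes all three values as plain parameters.

```rust
/// Computes symbol KVA + __per_cpu_offset[cpu] + field offset.
/// Returns None when `cpu` is outside the supplied offset table.
/// Wrapping addition is used because kernel pointer arithmetic is
/// modular in 64 bits.
pub fn per_cpu_field_kva(
    symbol_kva: u64,
    per_cpu_offsets: &[u64],
    cpu: u32,
    field_offset: u64,
) -> Option<u64> {
    let cpu_offset = *per_cpu_offsets.get(cpu as usize)?;
    Some(symbol_kva.wrapping_add(cpu_offset).wrapping_add(field_offset))
}
```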

symbol must be in the v1 supported set: runqueues → struct rq, kernel_cpustat → struct kernel_cpustat, kstat → struct kernel_stat, tick_cpu_sched → struct tick_sched. Unknown symbols fail with a typed error. The wire variant doesn’t carry the struct type, so the dispatcher maps symbol to struct via a hardcoded table; to support a new symbol, extend both that table and KernelSymbols::from_elf. KASLR-on round-trip coverage is an outstanding follow-up; ktstr defaults to nokaslr, so the kaslr_offset slide is 0 on the standard test path.
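
A hardcoded symbol-to-struct table of this kind might look like the sketch below. The four mappings are the ones listed above; struct_type_for and UnsupportedSymbol are hypothetical names, and the crate’s actual table and error type may be shaped differently.

```rust
/// Hypothetical typed error for a symbol outside the supported set.
#[derive(Debug, PartialEq, Eq)]
pub struct UnsupportedSymbol(pub String);

/// Maps a per-CPU symbol name to the name of its struct type, using
/// the v1 supported set listed above.
pub fn struct_type_for(symbol: &str) -> Result<&'static str, UnsupportedSymbol> {
    match symbol {
        "runqueues" => Ok("rq"),
        "kernel_cpustat" => Ok("kernel_cpustat"),
        "kstat" => Ok("kernel_stat"),
        "tick_cpu_sched" => Ok("tick_sched"),
        other => Err(UnsupportedSymbol(other.to_owned())),
    }
}
```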

Lazy resolution keeps the construction surface pure-data (the test author needs no GuestKernel/BTF/symbol-table handle to construct the variant); resolution failures surface as op-execution errors at the same layer as missing-symbol failures in other snapshot ops.

Fields

§symbol: Cow<'static, str>

Kernel symbol naming the per-CPU template (e.g. "runqueues").

§field: Cow<'static, str>

Field name within the symbol’s struct (e.g. "clock" for struct rq.clock).

§cpu: u32

CPU index whose per-CPU instance to address.

TaskField

Per-task field of struct task_struct — SCX-managed tasks only (the dispatcher’s L6 sched_class gate rejects non-SCX tasks). Resolved at dispatch by walking init_task.tasks plus each leader’s signal->thread_head to locate the task with matching pid AND matching expected_start_time_ns (anti-PID-reuse identity), then adding the BTF-resolved nested-path byte offset of field within task_struct. See crate::vmm::wire::KernelOpTarget::TaskField for the 7-layer validation chain the dispatcher applies.

expected_start_time_ns is task->start_time captured at WorkSpec spawn time. To obtain it, get the PID list from crate::workload::WorkloadHandle::worker_pids, read field 22 (starttime) of /proc/<pid>/stat, and convert from clock ticks to ns via * 1_000_000_000 / sysconf(_SC_CLK_TCK).
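
The /proc parsing and unit conversion can be sketched as below. Both function names are hypothetical. The ticks-per-second value is passed in as a parameter rather than queried, on the assumption that the caller obtains it from sysconf(_SC_CLK_TCK) (commonly 100); the result is only as fine-grained as one clock tick.

```rust
/// Extracts field 22 (starttime, in clock ticks) from the contents of
/// /proc/<pid>/stat. Field 2 (comm) may itself contain spaces and
/// parentheses, so parsing resumes after the last ')'.
pub fn parse_starttime_ticks(stat_line: &str) -> Option<u64> {
    let after_comm = &stat_line[stat_line.rfind(')')? + 1..];
    // after_comm begins at field 3 (state), so field 22 is token index 19.
    after_comm.split_whitespace().nth(19)?.parse().ok()
}

/// Converts clock ticks to nanoseconds as
/// ticks * 1_000_000_000 / clk_tck, using a 128-bit intermediate so
/// the multiplication cannot overflow.
pub fn ticks_to_ns(ticks: u64, clk_tck: u64) -> Option<u64> {
    if clk_tck == 0 {
        return None;
    }
    let ns = u128::from(ticks) * 1_000_000_000 / u128::from(clk_tck);
    u64::try_from(ns).ok()
}
```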

Fields

§pid: u32

Guest-side pid_t of the target task. Both leaders and non-leader threads are addressable.

§expected_start_time_ns: u64

task->start_time (ns) recorded at spawn time. The dispatcher’s L2 check rejects writes when the observed task->start_time differs (PID-reuse identity guard).

§field: Cow<'static, str>

Dot-separated nested-member path within task_struct. SCX-only fields recommended (e.g. "scx.dsq_vtime", "start_boottime"). "se.vruntime" writes are silently discarded by EEVDF’s place_entity on enqueue (kernel/sched/fair.c:5381-5514 since 6.6) AND rejected by the SCX-only class gate; do not use.
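
The pid plus expected_start_time_ns pairing used as the PID-reuse guard can be modelled with a simple lookup. ModelTask and find_task are hypothetical stand-ins: the real dispatcher walks the guest’s task list and applies further validation layers that this sketch does not cover.

```rust
/// Hypothetical stand-in for the two identity fields read from a task.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ModelTask {
    pub pid: u32,
    pub start_time_ns: u64,
}

/// Returns the task only when BOTH the pid and the start time match.
/// A recycled pid belongs to a task with a different start time, so
/// the lookup fails instead of addressing the wrong task.
pub fn find_task(
    tasks: &[ModelTask],
    pid: u32,
    expected_start_time_ns: u64,
) -> Option<&ModelTask> {
    tasks
        .iter()
        .find(|t| t.pid == pid && t.start_time_ns == expected_start_time_ns)
}
```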

Implementations§

impl KernelTarget

pub fn symbol(name: impl Into<Cow<'static, str>>) -> Self

pub const fn direct(kva: u64) -> Self

pub const fn kva(kva: u64) -> Self

pub fn per_cpu_field( symbol: impl Into<Cow<'static, str>>, field: impl Into<Cow<'static, str>>, cpu: u32, ) -> Self

pub fn task_field( pid: u32, expected_start_time_ns: u64, field: impl Into<Cow<'static, str>>, ) -> Self

§Examples
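
A minimal sketch of how the constructors relate to the variants, assuming the signatures listed above. The enum and the constructor bodies are re-declared locally so the block compiles standalone; the bodies are an assumption, and the crate’s actual implementations may differ.

```rust
use std::borrow::Cow;

/// Local mirror of the enum declaration shown on this page.
#[non_exhaustive]
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum KernelTarget {
    Symbol(Cow<'static, str>),
    Direct(u64),
    Kva(u64),
    PerCpuField {
        symbol: Cow<'static, str>,
        field: Cow<'static, str>,
        cpu: u32,
    },
    TaskField {
        pid: u32,
        expected_start_time_ns: u64,
        field: Cow<'static, str>,
    },
}

impl KernelTarget {
    /// Accepts either a &'static str (borrowed) or a String (owned).
    pub fn symbol(name: impl Into<Cow<'static, str>>) -> Self {
        Self::Symbol(name.into())
    }

    pub const fn direct(kva: u64) -> Self {
        Self::Direct(kva)
    }

    pub const fn kva(kva: u64) -> Self {
        Self::Kva(kva)
    }

    pub fn per_cpu_field(
        symbol: impl Into<Cow<'static, str>>,
        field: impl Into<Cow<'static, str>>,
        cpu: u32,
    ) -> Self {
        Self::PerCpuField { symbol: symbol.into(), field: field.into(), cpu }
    }

    pub fn task_field(
        pid: u32,
        expected_start_time_ns: u64,
        field: impl Into<Cow<'static, str>>,
    ) -> Self {
        Self::TaskField { pid, expected_start_time_ns, field: field.into() }
    }
}
```

With these bodies, KernelTarget::symbol("jiffies_64") and KernelTarget::symbol(String::from("jiffies_64")) compare equal, because Cow compares by content rather than by ownership.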

Trait Implementations§

impl Clone for KernelTarget

fn clone(&self) -> KernelTarget

fn clone_from(&mut self, source: &Self)

impl Debug for KernelTarget

fn fmt(&self, f: &mut Formatter<'_>) -> Result

impl PartialEq for KernelTarget

fn eq(&self, other: &KernelTarget) -> bool

fn ne(&self, other: &Rhs) -> bool

impl Eq for KernelTarget

impl StructuralPartialEq for KernelTarget

Auto Trait Implementations§

impl Freeze for KernelTarget

impl RefUnwindSafe for KernelTarget

impl Send for KernelTarget

impl Sync for KernelTarget

impl Unpin for KernelTarget

impl UnwindSafe for KernelTarget

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

impl<Q, K> Equivalent<K> for Q
where Q: Eq + ?Sized, K: Borrow<Q> + ?Sized,

fn equivalent(&self, key: &K) -> bool

impl<Q, K> Equivalent<K> for Q
where Q: Eq + ?Sized, K: Borrow<Q> + ?Sized,

fn equivalent(&self, key: &K) -> bool

impl<Q, K> Equivalent<K> for Q
where Q: Eq + ?Sized, K: Borrow<Q> + ?Sized,

fn equivalent(&self, key: &K) -> bool

impl<Q, K> Equivalent<K> for Q
where Q: Eq + ?Sized, K: Borrow<Q> + ?Sized,

fn equivalent(&self, key: &K) -> bool

impl<T> From<T> for T

fn from(t: T) -> T

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

fn in_current_span(self) -> Instrumented<Self>

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

impl<T> Pointable for T

const ALIGN: usize

type Init = T

unsafe fn init(init: <T as Pointable>::Init) -> usize

unsafe fn deref<'a>(ptr: usize) -> &'a T

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

unsafe fn drop(ptr: usize)

impl<T> PolicyExt for T
where T: ?Sized,

fn and<P, B, E>(self, other: P) -> And<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

fn or<P, B, E>(self, other: P) -> Or<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

impl<T> Same for T

type Output = T

impl<T> ToOwned for T
where T: Clone,

type Owned = T

fn to_owned(&self) -> T

impl<T, U> TryFrom<U> for T
where U: Into<T>,

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

impl<T> MaybeSend for T
where T: Send,
