Enum KernelTarget

#[non_exhaustive]
pub enum KernelTarget {
    Symbol(Cow<'static, str>),
    Direct(u64),
    Kva(u64),
    PerCpuField {
        symbol: Cow<'static, str>,
        field: Cow<'static, str>,
        cpu: u32,
    },
    TaskField {
        pid: u32,
        expected_start_time_ns: u64,
        field: Cow<'static, str>,
    },
}

Host-side write/read target for the kernel-memory ops (Op::WriteKernelHot / Op::WriteKernelCold / Op::ReadKernelHot / Op::ReadKernelCold).

Each variant names a kernel address by the resolution path the host coordinator takes when the op fires; the GuestKernel write helpers consume the resolved KVA. The variant selects the translation path: KASLR-aware kernel-image base for Self::Symbol, PAGE_OFFSET for Self::Direct, page-table walk for Self::Kva, per-CPU dereference for Self::PerCpuField, or task-list walk plus BTF field offset for Self::TaskField.

§Semantic risk — writing to load-bearing scheduler state

ktstr does not gate or filter target addresses. The framework trusts the test author to know what they are pointing at. That trust covers a class of fields where a raw write silently breaks downstream kernel invariants the test author did not intend to perturb. By design, mitigation is documentation-only: the framework will neither refuse a write nor emit a runtime warning; the test author owns the choice. The cases to know about:

Per-runqueue counters maintained by the scheduler classes. Raw writes skip the side-effects the kernel encodes in the maintainer functions, leaving cross-class accounting in an inconsistent state.

  • struct rq.nr_running — the per-CPU runqueue task count. add_nr_running / sub_nr_running (kernel/sched/sched.h) also (a) fire the sched_update_nr_running_tp tracepoint and (b) call sched_update_tick_dependency(rq) (the NOHZ_FULL per-CPU tick gating logic); add_nr_running additionally sets the root-domain overloaded bit (rq->rd->overloaded) on the prev_nr < 2 && new_nr >= 2 transition. A bare 8-byte store skips all of those; the counter and the root-domain overload signal diverge, the NOHZ_FULL CPU may stop or start receiving ticks against the test author’s intent, and downstream load-balance decisions read a count that no longer matches reality.
  • struct cfs_rq.h_nr_runnable / h_nr_queued / h_nr_idle (kernel/sched/sched.h struct cfs_rq) — hierarchical CFS task counts maintained by account_entity_enqueue / dequeue with cascade up the task group tree. Raw write skips parent-cfs_rq propagation and breaks group scheduling accounting.
  • struct rt_rq.rt_nr_running (kernel/sched/sched.h struct rt_rq) — RT class runqueue task count; updated by inc_rt_tasks / dec_rt_tasks which also maintain the per-rt_rq overloaded bit and the highest_prio.curr/next priority-pushable tracking.
  • struct dl_rq.dl_nr_running / running_bw / this_bw (kernel/sched/sched.h struct dl_rq) — DEADLINE class counters and bandwidth tracking; add_running_bw / sub_running_bw (in kernel/sched/deadline.c) implement the admission-control accounting that SUGOV’s cpu_bw_dl() consumes for frequency selection. A raw write to any of these breaks admission control + DVFS.
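
The divergence described for rq.nr_running can be pictured with a toy model. The sketch below is not kernel code and not part of ktstr: ToyRq and its methods are hypothetical stand-ins that model only the counter and the root-domain overload bit, leaving out the tracepoint and the tick-dependency update.

```rust
/// Toy stand-in for the two pieces of state discussed above.
/// Field names echo the kernel's, but this is an illustration only.
#[derive(Debug, Default, PartialEq)]
struct ToyRq {
    nr_running: u32,
    /// Stands in for rq->rd->overloaded.
    rd_overloaded: bool,
}

impl ToyRq {
    /// Maintainer path: bump the count and raise the overload bit on the
    /// prev_nr < 2 && new_nr >= 2 transition.
    fn add_nr_running(&mut self, count: u32) {
        let prev_nr = self.nr_running;
        self.nr_running = prev_nr + count;
        if prev_nr < 2 && self.nr_running >= 2 {
            self.rd_overloaded = true;
        }
    }

    /// What a raw store does: only the counter changes.
    fn raw_store_nr_running(&mut self, value: u32) {
        self.nr_running = value;
    }
}
```

Two add_nr_running(1) calls and one raw_store_nr_running(2) end with the same count, but only the maintained runqueue has the overload bit set.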

PELT (Per-Entity Load Tracking) averages. These are exponential moving averages whose internal _sum accumulators are advanced against cfs_rq_clock_pelt(cfs_rq) (see kernel/sched/fair.c update_load_avg, which calls into kernel/sched/pelt.c __update_load_avg_se / __update_load_avg_cfs_rq). Writing only the visible _avg value desynchronises it from the _sum it was computed from; the next update_load_avg decays both and corrupts the next several passes.

  • struct sched_avg fields on task_struct.se.avg and cfs_rq.avg: load_avg, runnable_avg, util_avg, util_est, plus load_sum / runnable_sum / util_sum / last_update_time / period_contrib (see include/linux/sched.h struct sched_avg).
  • cfs_rq.removed.{load_avg,util_avg,runnable_avg} — pending-decay buffer for departing entities; flushed at the next update_load_avg.
  • rq.cpu_capacity — set by update_cpu_capacity (kernel/sched/fair.c, called from the load-balance path update_group_capacity) from per-CPU RT capacity scaling; initialized at boot in kernel/sched/core.c sched_init. Raw writes are overwritten on the next load-balance tick that triggers a capacity recomputation.
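
To see why writing only the visible average does not stick, consider a simplified floating-point model of a decaying accumulator. This is a sketch under stated assumptions: ToyAvg is hypothetical, it uses the PELT half-life of 32 periods (y^32 = 0.5), and it does not reproduce the kernel's fixed-point arithmetic, period_contrib handling, or divider.

```rust
/// Per-period decay factor y, chosen so that y^32 = 0.5.
fn decay_factor() -> f64 {
    0.5_f64.powf(1.0 / 32.0)
}

/// Toy accumulator/average pair. Not the kernel's struct sched_avg.
struct ToyAvg {
    sum: f64,
    avg: f64,
}

impl ToyAvg {
    /// One update period: decay the accumulator, add this period's
    /// contribution, then re-derive the visible average from the sum.
    fn update(&mut self, contrib: f64) {
        let y = decay_factor();
        self.sum = self.sum * y + contrib;
        // Normalise by the geometric-series limit 1 / (1 - y) so that a
        // constant contribution of 1.0 converges to an average of 1.0.
        self.avg = self.sum * (1.0 - y);
    }
}
```

In this model, setting avg by hand while leaving sum untouched is undone by the next update call, because the average is recomputed from the accumulator it no longer matches.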

Cgroup / task-group accounting. Updating the task-group hierarchy bypasses the cascade that the kernel performs over every group entity.

  • task_group.shares — cgroup CPU shares, normally set via sched_group_set_shares (kernel/sched/fair.c) which cascades into update_load_set + walks every task in the group. Raw write skips the cascade and produces inconsistent per-entity load weights.
  • task_group.cfs_bandwidth.{quota, period, runtime} — CFS bandwidth control. tg_set_cfs_bandwidth (kernel/sched/core.c) is the cgroup-fs writer; the per-cfs_rq runtime distribution is performed by __refill_cfs_bandwidth_runtime (kernel/sched/fair.c) gated by the cfs_bandwidth_used() static-key (kernel/sched/fair.c) registered via start_cfs_bandwidth (kernel/sched/fair.c). Raw writes skip all of those.

The right shape for influencing these fields is to drive the kernel into the desired state through real activity: use Op::Spawn with SpawnPlacement::RunnerCgroup (inherits the spawner’s cgroup, typically cgid=1 inside guest VMs) or Op::Spawn with SpawnPlacement::Cgroup (runs inside a named cgroup) to run a synthetic WorkloadConfig when fake load is needed, and apply real preemption pressure to move sched_avg.

§Fields that ARE safe to write raw (with caveats)

  • jiffies_64 (include/linux/jiffies.h) — the global timekeeping tick counter. Safe to advance FORWARD only; backward jumps trigger soft-lockup watchdog warnings and can stall time_after_eq waiters whose expiry now appears to be in the past in a way the timer wheel cannot reconcile.
  • Per-CPU rq.clock (struct rq.clock, kernel/sched/sched.h) — the scheduler’s per-CPU wall-time clock. Not generically safe: update_rq_clock (kernel/sched/core.c) overwrites it at every scheduling tick + every enqueue/dequeue from sched_clock_cpu(cpu), so a raw write lasts at most until the next tick (~1 ms with HZ=1000). The rq_clock_skip_update() helper sets RQCF_REQ_SKIP in rq->clock_update_flags, which suppresses one update_rq_clock call, but its semantics are tightly coupled to the RQCF_ACT_SKIP / RQCF_REQ_SKIP state machine in __schedule — a self-contained “freeze rq.clock at value X across step Y” pattern is the framework’s responsibility (planned), not a one-shot raw-write primitive. Bumping rq.clock_task directly is also NOT safe — that field is computed by update_rq_clock_task from rq->clock minus IRQ and steal-time deltas (prev_irq_time and prev_steal_time_rq) and a raw write desynchronises it from the inputs.
  • Per-CPU rq.scx.clock (sched_ext per-CPU clock) — safe ONLY when paired with setting SCX_RQ_CLK_VALID in rq.scx.flags. The flag gates scx_bpf_now() reads; writing the clock without the flag leaves scx_bpf_now() returning stale data, and clearing the flag without resetting the clock makes downstream BPF readers fall back to the host TSC unexpectedly. Atomic bit-set without read-back is provided by KernelValue::OrU32 — the RMW variant whose width matches struct scx_rq.flags (u32 at kernel/sched/sched.h:803). Note there is no OrU64 sibling: a 64-bit RMW at this field address would corrupt the adjacent u32 nr_immed field at kernel/sched/sched.h:804. Width is the variant tag, so wrong-width writes are a compile-time error rather than a silent field-overflow bug at runtime. Pair OrU32(SCX_RQ_CLK_VALID) with the prior U64(clock_val) write in a single Op::WriteKernelCold batch so both land under one freeze rendezvous and the kernel’s documented write-clock-BEFORE-OR-flag ordering (per kernel/sched/sched.h:1848-1854 scx_rq_clock_update) holds.
  • scx-ktstr private bss / per-CPU scratch — the fixture scheduler exposes a dedicated write surface for test use; raw writes there don’t propagate into core sched code by construction.
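
The forward-only rule for jiffies_64 follows from how expiry comparisons are written. The sketch below mirrors the signed-difference idiom behind the kernel's time_after_eq family for a 64-bit counter; the Rust function is an illustration, not a ktstr API.

```rust
/// Wrap-safe "a is at or after b": true when the signed difference
/// between the two counter values is non-negative.
fn time_after_eq64(a: u64, b: u64) -> bool {
    (a.wrapping_sub(b) as i64) >= 0
}
```

With an expiry of 1000, a counter value of 1500 reports the deadline as reached, but rewinding the counter to 900 makes the same deadline look pending again, which is the kind of inconsistency a backward write introduces.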

§#[non_exhaustive]

KernelTarget is #[non_exhaustive]; see [crate::non_exhaustive] for the cross-crate pattern-match rule. Prefer the per-variant constructors (Self::symbol, Self::direct, Self::kva, Self::per_cpu_field, Self::task_field) over naming variant literals.
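
A minimal sketch of the pattern-match rule, using a local mirror of three variants rather than the real type. ToyTarget and describe are hypothetical; inside the defining crate the compiler does not enforce the wildcard arm, so the arm here only shows the shape downstream crates have to use.

```rust
use std::borrow::Cow;

/// Local mirror of a few variants, for illustration only.
#[non_exhaustive]
#[derive(Debug, Clone, PartialEq, Eq)]
enum ToyTarget {
    Symbol(Cow<'static, str>),
    Direct(u64),
    Kva(u64),
}

impl ToyTarget {
    /// Constructor in the style of KernelTarget::symbol.
    fn symbol(name: impl Into<Cow<'static, str>>) -> Self {
        Self::Symbol(name.into())
    }

    /// Constructor in the style of KernelTarget::direct.
    const fn direct(kva: u64) -> Self {
        Self::Direct(kva)
    }
}

fn describe(target: &ToyTarget) -> &'static str {
    match target {
        ToyTarget::Symbol(_) => "symbol",
        ToyTarget::Direct(_) => "direct map",
        // Downstream crates must keep a wildcard arm: variants may be
        // added in a later release without a breaking change.
        _ => "other",
    }
}
```

Going through constructors keeps call sites stable if a variant's payload changes shape, and the wildcard arm keeps matches compiling when new variants appear.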

Variants (Non-exhaustive)§

This enum is marked as non-exhaustive
Non-exhaustive enums could have additional variants added in future. Therefore, when matching against variants of non-exhaustive enums, an extra wildcard arm must be added to account for any future variants.

Symbol(Cow<'static, str>)

Kernel text/data/bss symbol. The host resolves name → KVA → PA via the runtime kernel image base + KASLR phys_base, exactly as crate::monitor::guest::GuestKernel::write_symbol_u64 already does for the existing write-symbol helper.
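
As a rough picture of the KVA to PA step for kernel-image addresses, the sketch below applies the x86_64 rule: subtract the base of the kernel-image mapping, then add phys_base. It assumes an x86_64 guest; the function name is hypothetical and this is not the GuestKernel implementation.

```rust
/// Base of the x86_64 kernel-image mapping (__START_KERNEL_map).
const START_KERNEL_MAP: u64 = 0xffff_ffff_8000_0000;

/// Translate a kernel-image KVA to a physical address.
/// Returns None when the address lies below the image mapping or the
/// arithmetic would overflow.
fn kernel_image_kva_to_pa(kva: u64, phys_base: u64) -> Option<u64> {
    kva.checked_sub(START_KERNEL_MAP)?.checked_add(phys_base)
}
```

With phys_base equal to 0, the KVA 0xffff_ffff_8100_0000 maps to physical 0x0100_0000; a non-zero phys_base shifts the result by the same amount.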

Direct(u64)

Direct-mapped kernel virtual address — translated via kva - PAGE_OFFSET. Use this when the caller has already resolved a SLAB / per-CPU / physmem KVA and just wants the host to write at that address.
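
The kva - PAGE_OFFSET translation can be sketched as a checked subtraction. The constant below is the x86_64 direct-map base for 4-level paging without KASLR randomisation; treating it as the guest's PAGE_OFFSET is an assumption about the guest configuration, and the function name is hypothetical.

```rust
/// x86_64 direct-map base with 4-level paging and no KASLR
/// randomisation. Other configurations use a different base.
const PAGE_OFFSET_4LEVEL_NOKASLR: u64 = 0xffff_8880_0000_0000;

/// Translate a direct-mapped KVA to a physical address.
/// Returns None when the address lies below the direct-map base.
fn direct_map_kva_to_pa(kva: u64, page_offset: u64) -> Option<u64> {
    kva.checked_sub(page_offset)
}
```

Rejecting addresses below the base catches the common mistake of passing a kernel-image or user-space address as a direct-map target.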

Kva(u64)

Vmalloc’d / vmap’d kernel virtual address — translated via page-table walk through the guest’s CR3. Use this for BPF maps, vmalloc’d memory, and any other address that does NOT live in the direct map.
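
A page-table walk starts by splitting the virtual address into per-level table indices. The sketch below shows that split for x86_64 4-level paging with 4 KiB pages; huge-page mappings and 5-level paging are out of scope, and WalkIndices and split_kva_4level are hypothetical names rather than ktstr APIs.

```rust
/// Table indices and in-page offset for one 4-level walk.
#[derive(Debug, PartialEq, Eq)]
struct WalkIndices {
    pgd: usize,
    pud: usize,
    pmd: usize,
    pte: usize,
    page_offset: usize,
}

/// Split a virtual address into 9-bit table indices (bits 47:39, 38:30,
/// 29:21, 20:12) and the 12-bit offset within the 4 KiB page.
fn split_kva_4level(kva: u64) -> WalkIndices {
    WalkIndices {
        pgd: ((kva >> 39) & 0x1ff) as usize,
        pud: ((kva >> 30) & 0x1ff) as usize,
        pmd: ((kva >> 21) & 0x1ff) as usize,
        pte: ((kva >> 12) & 0x1ff) as usize,
        page_offset: (kva & 0xfff) as usize,
    }
}
```

Each index selects one 8-byte entry in the table found at the previous level, starting from the table that CR3 points at.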

PerCpuField

Per-CPU field of a kernel struct, resolved at op dispatch time. The variant carries the symbolic intent only (symbol, field, cpu); the dispatcher looks up symbol in the vmlinux symbol table, adds __per_cpu_offset[cpu], and adds the BTF-resolved byte offset of field within symbol’s struct type to yield the per-CPU field’s runtime KVA.

symbol must be in the v1 supported set: runqueues (struct rq), kernel_cpustat (struct kernel_cpustat), kstat (struct kernel_stat), tick_cpu_sched (struct tick_sched). Unknown symbols fail with a typed error (the wire variant doesn’t carry the struct type, so the dispatcher maps via a hardcoded table; extend it AND KernelSymbols::from_elf to add a symbol). KASLR-on round-trip coverage is an outstanding follow-up; ktstr defaults to nokaslr so the kaslr_offset slide is 0 on the standard test path.

Lazy resolution keeps the construction surface pure-data (the test author needs no GuestKernel/BTF/symbol-table handle to construct the variant); resolution failures surface as op-execution errors at the same layer as missing-symbol failures in other snapshot ops.
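
The resolution steps can be sketched as a table lookup followed by two additions. Everything below is illustrative: struct_type_for and per_cpu_field_kva are hypothetical names, the table only mirrors the v1 set listed above, and the real dispatcher obtains offsets from the guest and from BTF.

```rust
/// Hypothetical mirror of the hardcoded symbol -> struct-type table.
fn struct_type_for(symbol: &str) -> Option<&'static str> {
    match symbol {
        "runqueues" => Some("rq"),
        "kernel_cpustat" => Some("kernel_cpustat"),
        "kstat" => Some("kernel_stat"),
        "tick_cpu_sched" => Some("tick_sched"),
        _ => None,
    }
}

/// symbol_kva + __per_cpu_offset[cpu] + field offset.
/// Returns None when `cpu` has no entry in the offset table.
fn per_cpu_field_kva(
    symbol_kva: u64,
    per_cpu_offsets: &[u64],
    cpu: u32,
    field_offset: u64,
) -> Option<u64> {
    let cpu_offset = *per_cpu_offsets.get(usize::try_from(cpu).ok()?)?;
    // Wrapping adds: depending on kernel version the per-CPU offset may
    // be a delta that relies on modular arithmetic.
    Some(symbol_kva.wrapping_add(cpu_offset).wrapping_add(field_offset))
}
```

Keeping the lookup and the arithmetic separate matches the lazy-resolution design: constructing the variant needs neither, and both can fail independently at dispatch time.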

Fields

§symbol: Cow<'static, str>

Kernel symbol naming the per-CPU template (e.g. "runqueues").

§field: Cow<'static, str>

Field name within the symbol’s struct (e.g. "clock" for struct rq.clock).

§cpu: u32

CPU index whose per-CPU instance to address.

TaskField

Per-task field of struct task_struct — SCX-managed tasks only (the dispatcher’s L6 sched_class gate rejects non-SCX tasks). Resolved at dispatch by walking init_task.tasks plus each leader’s signal->thread_head to locate the task with matching pid AND matching expected_start_time_ns (anti-PID-reuse identity), then adding the BTF-resolved nested-path byte offset of field within task_struct. See crate::vmm::wire::KernelOpTarget::TaskField for the 7-layer validation chain the dispatcher applies.

expected_start_time_ns is task->start_time captured at WorkSpec spawn time. To obtain it, take the PID list from crate::workload::WorkloadHandle::worker_pids, read field 22 of /proc/<pid>/stat, and convert from clock ticks to ns via * 1_000_000_000 / sysconf(_SC_CLK_TCK).
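
The conversion can be sketched as a small parser over one /proc/<pid>/stat line. The helper below is hypothetical (it is not the helper the crate plans to ship); it takes the clock-tick rate as a parameter so that it needs no libc binding, and it skips past the last ')' because the comm field may itself contain spaces and parentheses.

```rust
/// Extract field 22 (starttime, in clock ticks) from a /proc/<pid>/stat
/// line and convert it to nanoseconds.
/// `clk_tck` is the value of sysconf(_SC_CLK_TCK), commonly 100.
fn start_time_ns_from_stat(stat_line: &str, clk_tck: u64) -> Option<u64> {
    if clk_tck == 0 {
        return None;
    }
    // Everything after the last ')' starts at field 3 (state), so
    // field 22 is the 20th whitespace-separated token, index 19.
    let (_, rest) = stat_line.rsplit_once(')')?;
    let ticks: u64 = rest.split_whitespace().nth(19)?.parse().ok()?;
    // Widen before multiplying so long uptimes cannot overflow.
    let ns = u128::from(ticks) * 1_000_000_000 / u128::from(clk_tck);
    u64::try_from(ns).ok()
}
```

A caller would read the file with std::fs::read_to_string(format!("/proc/{pid}/stat")) and pass the contents together with the tick rate.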

Fields

§pid: u32

Guest-side pid_t of the target task. Both leaders and non-leader threads are addressable.

§expected_start_time_ns: u64

task->start_time (ns) recorded at spawn time. The dispatcher’s L2 check rejects writes when the observed task->start_time differs (PID-reuse identity guard).

§field: Cow<'static, str>

Dot-separated nested-member path within task_struct. SCX-only fields recommended (e.g. "scx.dsq_vtime", "start_boottime"). "se.vruntime" writes are silently discarded by EEVDF’s place_entity on enqueue (kernel/sched/fair.c:5381-5514 since 6.6) AND rejected by the SCX-only class gate; do not use.
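
Resolving a dot-separated path amounts to summing one member offset per path component. The sketch below walks a hand-written layout table; the offsets are made up for illustration and Member, toy_layout and resolve_nested_offset are hypothetical names, whereas the real dispatcher derives types and offsets from BTF.

```rust
/// One struct member in a toy layout table.
struct Member {
    parent: &'static str,
    name: &'static str,
    offset: u64,
    /// Struct type of the member when it is itself a struct.
    ty: Option<&'static str>,
}

/// Made-up offsets; real values depend on the kernel build.
fn toy_layout() -> Vec<Member> {
    vec![
        Member { parent: "task_struct", name: "scx", offset: 0x300, ty: Some("sched_ext_entity") },
        Member { parent: "sched_ext_entity", name: "dsq_vtime", offset: 0x60, ty: None },
        Member { parent: "task_struct", name: "start_boottime", offset: 0x5a8, ty: None },
    ]
}

/// Sum member offsets along `path`, starting from struct `root`.
/// Returns None for unknown members or for a path that descends into
/// a member that is not a struct.
fn resolve_nested_offset(layout: &[Member], root: &str, path: &str) -> Option<u64> {
    let mut current: Option<&str> = Some(root);
    let mut total = 0u64;
    for name in path.split('.') {
        let parent = current?;
        let member = layout.iter().find(|m| m.parent == parent && m.name == name)?;
        total = total.checked_add(member.offset)?;
        current = member.ty;
    }
    Some(total)
}
```

With this table, "scx.dsq_vtime" resolves to 0x300 + 0x60, while a misspelled or over-long path yields None rather than a wrong address.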

Implementations§

impl KernelTarget

pub fn symbol(name: impl Into<Cow<'static, str>>) -> Self

Kernel text/data/bss symbol target. Resolves at op-dispatch time via the runtime kernel image base + KASLR phys_base.

Heads up. See the # Semantic risk section on the enclosing KernelTarget type doc before pointing this at a scheduler-bookkeeping symbol.

pub const fn direct(kva: u64) -> Self

Direct-mapped KVA target. Translates via kva - PAGE_OFFSET. For per-CPU bases the caller must add __per_cpu_offset[cpu] to the base symbol KVA before constructing the variant; use Self::per_cpu_field instead for the framework-resolved per-CPU shape.

Heads up. See the # Semantic risk section on the enclosing KernelTarget type doc before pointing this at a scheduler-bookkeeping address.

pub const fn kva(kva: u64) -> Self

Vmalloc’d / vmap’d KVA target. Translates via page-table walk through the guest’s CR3.

Heads up. See the # Semantic risk section on the enclosing KernelTarget type doc before pointing this at a scheduler-bookkeeping address.

pub fn per_cpu_field(
    symbol: impl Into<Cow<'static, str>>,
    field: impl Into<Cow<'static, str>>,
    cpu: u32,
) -> Self

Per-CPU field of a kernel struct. Resolves at op-dispatch time via symbol_kva + __per_cpu_offset[cpu] + BTF byte offset of field.

Heads up. See the # Semantic risk section on the enclosing KernelTarget type doc before pointing this at a per-CPU scheduler-bookkeeping field.

pub fn task_field(
    pid: u32,
    expected_start_time_ns: u64,
    field: impl Into<Cow<'static, str>>,
) -> Self

Per-task struct task_struct field target — SCX-managed tasks only. Resolves at dispatch via init_task.tasks + per-leader signal->thread_head walks to find the task with matching pid AND matching expected_start_time_ns (anti-PID-reuse), then BTF nested-path offset of field.

expected_start_time_ns is task->start_time (set once by kernel/fork.c::copy_process via ktime_get_ns()). Get worker PIDs via crate::workload::WorkloadHandle::worker_pids, then read /proc/<pid>/stat field 22 at spawn time and convert from clock ticks to ns: field_22_ticks * 1_000_000_000 / sysconf(_SC_CLK_TCK).

field is a dot-separated nested-member path. The dispatcher applies a 7-layer validation chain (pid match, start_time identity, lifetime, on_rq=0, scx queued-empty, ext sched_class, start_boottime != 0) before the write/read lands; see crate::vmm::wire::KernelOpTarget::TaskField for the full contract.

SCX-only. The dispatcher rejects non-SCX tasks via the class+policy gates. Recommended fields: "scx.dsq_vtime" (DSQ priority key, preserved across dequeue/enqueue), "start_boottime" (task fork timestamp).

Do NOT write "se.vruntime". EEVDF’s place_entity (kernel/sched/fair.c:5381-5514, since 6.6) overwrites se->vruntime on every enqueue, and the validation gate only admits sleeping tasks, so a direct vruntime write is silently discarded at the task’s next enqueue. CFS-class tasks are rejected before reaching the write regardless, but the field-level warning is the actionable guidance for “why won’t my vruntime write stick” debugging.

Heads up. The dispatcher’s L4 (on_rq == 0) + L5 (scx.dsq == NULL AND scx.runnable_node empty) gates reject writes on queued/running tasks per CFS rb-tree + SCX DSQ ordering safety. Test authors must use blocking workload patterns (e.g. crate::workload::WorkType::FutexPingPong, WorkType::WaitOnFutex, WorkType::Sleep) so workers are sleeping when the cold-path Op fires.
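
The gate order can be pictured as a chain of checks where the first failure decides the rejection. The sketch below is a toy model, not the dispatcher: ToyTask, Rejection and validate are hypothetical names, and modelling the lifetime layer as a single alive flag is an assumption made for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Rejection {
    PidMismatch,
    StartTimeMismatch,
    NotAlive,
    OnRunqueue,
    ScxQueued,
    NotScxClass,
    ZeroStartBoottime,
}

/// Snapshot of the task properties the seven layers look at.
#[derive(Debug, Clone)]
struct ToyTask {
    pid: u32,
    start_time_ns: u64,
    alive: bool,
    on_rq: bool,
    scx_queued: bool,
    is_scx: bool,
    start_boottime_ns: u64,
}

/// A sleeping SCX task that passes every layer.
fn sleeping_scx_task() -> ToyTask {
    ToyTask {
        pid: 4242,
        start_time_ns: 1_000,
        alive: true,
        on_rq: false,
        scx_queued: false,
        is_scx: true,
        start_boottime_ns: 1_000,
    }
}

/// Run the layers in order and report the first one that fails.
fn validate(task: &ToyTask, pid: u32, expected_start_time_ns: u64) -> Result<(), Rejection> {
    let checks = [
        (task.pid == pid, Rejection::PidMismatch),
        (task.start_time_ns == expected_start_time_ns, Rejection::StartTimeMismatch),
        (task.alive, Rejection::NotAlive),
        (!task.on_rq, Rejection::OnRunqueue),
        (!task.scx_queued, Rejection::ScxQueued),
        (task.is_scx, Rejection::NotScxClass),
        (task.start_boottime_ns != 0, Rejection::ZeroStartBoottime),
    ];
    for (passed, rejection) in checks {
        if !passed {
            return Err(rejection);
        }
    }
    Ok(())
}
```

A task that is both on a runqueue and outside the SCX class is reported as OnRunqueue in this model, because that layer comes first; this is why a blocking workload pattern is a precondition rather than an optimisation.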

§Examples
// Escape-hatch primitive: seed a specific worker's
// scx.dsq_vtime to ~30 days. WorkSpec.uptime (separate API)
// wraps this; use the escape hatch when the scenario knows
// the exact PID + start_time tuple.
use ktstr::prelude::*;
use std::time::Duration;

let workers = handle.worker_pids();         // Vec<libc::pid_t>
let worker_pid = workers[0] as u32;
// Read `/proc/<pid>/stat` field 22, convert from clock ticks to
// nanoseconds via `* 1_000_000_000 / sysconf(_SC_CLK_TCK)`.
// (Helper expected to land alongside WorkSpec.uptime.)
let start_time_ns: u64 = read_start_time_ns(worker_pid)?;

let seed_vtime_ns = (30 * 86_400_u64) * 1_000_000_000; // 30 days
let writes = vec![(
    KernelTarget::task_field(worker_pid, start_time_ns, "scx.dsq_vtime"),
    KernelValue::u64(seed_vtime_ns),
)];
// Worker MUST be in a blocking pattern (FutexPingPong, etc.)
// at op-fire time; the dispatcher's 7-layer validation
// rejects writes against runnable/queued tasks.

Trait Implementations§

impl Clone for KernelTarget

fn clone(&self) -> KernelTarget

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for KernelTarget

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl PartialEq for KernelTarget

fn eq(&self, other: &KernelTarget) -> bool

Tests for self and other values to be equal, and is used by ==.

fn ne(&self, other: &Rhs) -> bool

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.

impl Eq for KernelTarget

impl StructuralPartialEq for KernelTarget
