#[non_exhaustive]
pub enum KernelTarget {
    Symbol(Cow<'static, str>),
    Direct(u64),
    Kva(u64),
    PerCpuField {
        symbol: Cow<'static, str>,
        field: Cow<'static, str>,
        cpu: u32,
    },
    TaskField {
        pid: u32,
        expected_start_time_ns: u64,
        field: Cow<'static, str>,
    },
}
Host-side write/read target for the kernel-memory ops
(Op::WriteKernelHot / Op::WriteKernelCold /
Op::ReadKernelHot / Op::ReadKernelCold).
Each variant names a kernel address by the resolution path the
host coordinator takes when the op fires; the actual
GuestKernel write helpers consume the resolved KVA. The variant
chosen here selects the translation path: KASLR-aware kernel-image
base for Self::Symbol, PAGE_OFFSET for Self::Direct,
page-table walk for Self::Kva, per-CPU dereference for
Self::PerCpuField, or task-list walk for Self::TaskField.
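The variant-to-path mapping described above can be sketched as a plain match. This is a minimal illustration, not the ktstr dispatcher: Target is a local stand-in for KernelTarget, and the Path enum and translation_path function are hypothetical names introduced only for this example.

```rust
use std::borrow::Cow;

/// Local stand-in mirroring the documented enum (not the ktstr type itself).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Target {
    Symbol(Cow<'static, str>),
    Direct(u64),
    Kva(u64),
    PerCpuField {
        symbol: Cow<'static, str>,
        field: Cow<'static, str>,
        cpu: u32,
    },
    TaskField {
        pid: u32,
        expected_start_time_ns: u64,
        field: Cow<'static, str>,
    },
}

/// Hypothetical labels for the resolution paths named in the text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Path {
    KernelImageBase,
    PageOffset,
    PageTableWalk,
    PerCpuDeref,
    TaskListWalk,
}

/// Maps each variant to the path the text associates with it.
pub fn translation_path(target: &Target) -> Path {
    match target {
        Target::Symbol(_) => Path::KernelImageBase,
        Target::Direct(_) => Path::PageOffset,
        Target::Kva(_) => Path::PageTableWalk,
        Target::PerCpuField { .. } => Path::PerCpuDeref,
        Target::TaskField { .. } => Path::TaskListWalk,
    }
}
```

Because the real enum is #[non_exhaustive], a match like this written in a downstream crate would also need a wildcard arm.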
§Semantic risk — writing to load-bearing scheduler state
ktstr does not gate or filter target addresses. The framework trusts the test author to know what they are pointing at. That trust includes a class of fields where a raw write silently breaks downstream kernel invariants the test author did not intend to perturb. By design, mitigation is documentation-only: the framework will neither refuse a write nor emit a runtime warning; the test author owns the choice. The cases to know about:
Per-runqueue counters maintained by the scheduler classes. Raw writes skip the side-effects the kernel encodes in the maintainer functions, leaving cross-class accounting in an inconsistent state.
- struct rq.nr_running: the per-CPU runqueue task count. add_nr_running / sub_nr_running (kernel/sched/sched.h) also (a) fire the sched_update_nr_running_tp tracepoint and (b) call sched_update_tick_dependency(rq) (the NOHZ_FULL per-CPU tick gating logic); add_nr_running additionally sets the root-domain overloaded bit (rq->rd->overloaded) on the prev_nr < 2 && new_nr >= 2 transition. A bare store skips all of those; the counter and the root-domain overload signal diverge, the NOHZ_FULL CPU may stop or start receiving ticks against the test author’s intent, and downstream load-balance decisions read a count that no longer matches reality.
- struct cfs_rq.h_nr_runnable / h_nr_queued / h_nr_idle (kernel/sched/sched.h struct cfs_rq): hierarchical CFS task counts maintained by account_entity_enqueue/dequeue with cascade up the task group tree. Raw write skips parent-cfs_rq propagation and breaks group scheduling accounting.
- struct rt_rq.rt_nr_running (kernel/sched/sched.h struct rt_rq): RT class runqueue task count; updated by inc_rt_tasks / dec_rt_tasks which also maintain the per-rt_rq overloaded bit and the highest_prio.curr/next priority-pushable tracking.
- struct dl_rq.dl_nr_running / running_bw / this_bw (kernel/sched/sched.h struct dl_rq): DEADLINE class counters and bandwidth tracking; add_running_bw / sub_running_bw (in kernel/sched/deadline.c) implement the admission-control accounting that SUGOV’s cpu_bw_dl() consumes for frequency selection. A raw write to any of these breaks admission control + DVFS.
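The divergence between the counter and the overload signal can be seen in a toy model. The sketch below is not kernel code: ToyRq and its methods are hypothetical, and it models only the prev_nr < 2 && new_nr >= 2 rule quoted above, leaving out the tracepoint and tick-dependency side effects.

```rust
/// Toy model of the runqueue count and the root-domain overload bit.
#[derive(Debug, Default)]
pub struct ToyRq {
    pub nr_running: u32,
    pub rd_overloaded: bool,
}

impl ToyRq {
    /// Maintainer-style update: bump the count and set the overload bit
    /// when the count crosses from below 2 to 2 or more.
    pub fn add_nr_running(&mut self, count: u32) {
        let prev_nr = self.nr_running;
        self.nr_running = prev_nr + count;
        if prev_nr < 2 && self.nr_running >= 2 {
            self.rd_overloaded = true;
        }
    }

    /// Raw store: only the counter changes, no side effects run.
    pub fn raw_write_nr_running(&mut self, value: u32) {
        self.nr_running = value;
    }
}
```

Two add_nr_running(1) calls leave the model with nr_running == 2 and the overload bit set, while raw_write_nr_running(2) reaches the same count with the bit still clear.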
PELT (Per-Entity Load Tracking) averages. These are
exponential moving averages whose internal _sum accumulators
are advanced against cfs_rq_clock_pelt(cfs_rq) (see
kernel/sched/fair.c update_load_avg, which calls into
kernel/sched/pelt.c __update_load_avg_se /
__update_load_avg_cfs_rq). Writing only the visible
_avg value desynchronises it from the _sum it was
computed from; the next update_load_avg decays both and
corrupts the next several passes.
- struct sched_avg fields on task_struct.se.avg and cfs_rq.avg: load_avg, runnable_avg, util_avg, util_est, plus load_sum / runnable_sum / util_sum / last_update_time / period_contrib (see include/linux/sched.h struct sched_avg).
- cfs_rq.removed.{load_avg,util_avg,runnable_avg}: pending-decay buffer for departing entities; flushed at the next update_load_avg.
- rq.cpu_capacity: set by update_cpu_capacity (kernel/sched/fair.c, called from the load-balance path update_group_capacity) from per-CPU RT capacity scaling; initialized at boot in kernel/sched/core.c sched_init. Raw writes are overwritten on the next load-balance tick that triggers a capacity recomputation.
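The coupling between a visible average and its hidden accumulator can be illustrated with a toy exponential average. This is a simplified floating-point model, not the kernel's PELT implementation: ToyAvg, decay_factor and max_sum are hypothetical names, and the only property borrowed from PELT is that 32 periods halve the accumulated sum. The kernel's real update path is more involved than this.

```rust
/// Toy average with a hidden accumulator (stand-ins for util_sum / util_avg).
#[derive(Debug, Clone, Copy)]
pub struct ToyAvg {
    pub sum: f64,
    pub avg: f64,
}

/// Per-period decay factor chosen so that 32 periods halve the sum.
pub fn decay_factor() -> f64 {
    0.5f64.powf(1.0 / 32.0)
}

/// Limit of the series 1024 * (1 + y + y^2 + ...) for y = decay_factor().
pub fn max_sum() -> f64 {
    1024.0 / (1.0 - decay_factor())
}

impl ToyAvg {
    pub fn new() -> Self {
        ToyAvg { sum: 0.0, avg: 0.0 }
    }

    /// One period: decay the accumulator, add this period's contribution,
    /// then recompute the visible average from the accumulator.
    pub fn update(&mut self, running: bool) {
        self.sum *= decay_factor();
        if running {
            self.sum += 1024.0;
        }
        self.avg = 1024.0 * self.sum / max_sum();
    }
}
```

In this model a store to avg alone does not survive the next update call, because avg is always rederived from sum. That is the sense in which the two values must be changed together or not at all.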
Cgroup / task-group accounting. Updating the task-group hierarchy bypasses the cascade that the kernel performs over every group entity.
- task_group.shares: cgroup CPU shares, normally set via sched_group_set_shares (kernel/sched/fair.c) which cascades into update_load_set + walks every task in the group. Raw write skips the cascade and produces inconsistent per-entity load weights.
- task_group.cfs_bandwidth.{quota, period, runtime}: CFS bandwidth control. tg_set_cfs_bandwidth (kernel/sched/core.c) is the cgroup-fs writer; the per-cfs_rq runtime distribution is performed by __refill_cfs_bandwidth_runtime (kernel/sched/fair.c) gated by the cfs_bandwidth_used() static-key (kernel/sched/fair.c) registered via start_cfs_bandwidth (kernel/sched/fair.c). Raw writes skip all of those.
The right way to influence these fields is to drive the
kernel into the desired state through real activity:
Op::Spawn of a synthetic WorkloadConfig, placed with either
SpawnPlacement::RunnerCgroup (inherits the spawner’s cgroup, typically
cgid=1 inside guest VMs) or
SpawnPlacement::Cgroup (runs inside a named cgroup), for fake load,
and real preemption pressure for sched_avg.
§Fields that ARE safe to write raw (with caveats)
- jiffies_64 (include/linux/jiffies.h): the global timekeeping tick counter. Safe to advance FORWARD only; backward jumps trigger soft-lockup watchdog warnings and can stall time_after_eq waiters whose expiry now appears to be in the past in a way the timer wheel cannot reconcile.
- Per-CPU rq.clock (struct rq.clock, kernel/sched/sched.h): the scheduler’s per-CPU wall-time clock. Not generically safe: update_rq_clock (kernel/sched/core.c) overwrites it at every scheduling tick + every enqueue/dequeue from sched_clock_cpu(cpu), so a raw write lasts at most until the next tick (~1 ms with HZ=1000). The rq_clock_skip_update() helper sets RQCF_REQ_SKIP in rq->clock_update_flags, which suppresses one update_rq_clock call, but its semantics are tightly coupled to the RQCF_ACT_SKIP / RQCF_REQ_SKIP state machine in __schedule, so a self-contained “freeze rq.clock at value X across step Y” pattern is the framework’s responsibility (planned), not a one-shot raw-write primitive. Bumping rq.clock_task directly is also NOT safe: that field is computed by update_rq_clock_task from rq->clock minus IRQ and steal-time deltas (prev_irq_time and prev_steal_time_rq) and a raw write desynchronises it from the inputs.
- Per-CPU rq.scx.clock (sched_ext per-CPU clock): safe ONLY when paired with setting SCX_RQ_CLK_VALID in rq.scx.flags. The flag gates scx_bpf_now() reads; writing the clock without the flag leaves scx_bpf_now() returning stale data, and clearing the flag without resetting the clock makes downstream BPF readers fall back to the host TSC unexpectedly. Atomic bit-set without read-back is provided by KernelValue::OrU32, the RMW variant whose width matches struct scx_rq.flags (u32 at kernel/sched/sched.h:803). Note there is no OrU64 sibling: a 64-bit RMW at this field address would corrupt the adjacent u32 nr_immed field at kernel/sched/sched.h:804. Width is the variant tag, so wrong-width writes are a compile-time error rather than a silent field-overflow bug at runtime. Pair OrU32(SCX_RQ_CLK_VALID) with the prior U64(clock_val) write in a single Op::WriteKernelCold batch so both land under one freeze rendezvous and the kernel’s documented write-clock-BEFORE-OR-flag ordering (per kernel/sched/sched.h:1848-1854 scx_rq_clock_update) holds.
- scx-ktstr private bss / per-CPU scratch: the fixture scheduler exposes a dedicated write surface for test use; raw writes there don’t propagate into core sched code by construction.
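For the forward-only rule on jiffies_64, the kernel's wrap-safe comparison idiom is a signed difference. The sketch below mirrors that idiom for 64-bit counters; checked_advance is a hypothetical test-side guard, not something ktstr is stated to provide.

```rust
/// Wrap-safe "a is at or after b" using the signed-difference idiom.
pub fn time_after_eq64(a: u64, b: u64) -> bool {
    (a.wrapping_sub(b) as i64) >= 0
}

/// Hypothetical guard: accept a proposed counter value only if it does not
/// move the counter backward relative to the current value.
pub fn checked_advance(current: u64, proposed: u64) -> Option<u64> {
    if time_after_eq64(proposed, current) {
        Some(proposed)
    } else {
        None
    }
}
```

A test could read the current value first and pass the candidate through such a guard before building the write, so a backward jump is caught on the host side rather than surfacing as a watchdog warning in the guest.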
§#[non_exhaustive]
KernelTarget is #[non_exhaustive] — see
[crate::non_exhaustive] for the cross-crate pattern-match rule.
Prefer the per-variant constructors (Self::symbol,
Self::direct, Self::kva, Self::per_cpu_field,
Self::task_field) over naming variant literals.
Variants (Non-exhaustive)§
This enum is marked as non-exhaustive
Symbol(Cow<'static, str>)
Kernel text/data/bss symbol. The host resolves
name → KVA → PA via the runtime kernel image base + KASLR
phys_base, exactly as
crate::monitor::guest::GuestKernel::write_symbol_u64
already does for the existing write-symbol helper.
Direct(u64)
Direct-mapped kernel virtual address — translated via
kva - PAGE_OFFSET. Use this when the caller has already
resolved a SLAB / per-CPU / physmem KVA and just wants the
host to write at that address.
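The kva - PAGE_OFFSET translation is a single subtraction. The sketch below assumes x86_64 with 4-level paging and KASLR disabled, where the direct map starts at 0xffff888000000000; the constant name and the direct_kva_to_pa helper are hypothetical and not part of ktstr.

```rust
/// Assumed direct-map base for x86_64, 4-level paging, KASLR disabled.
pub const PAGE_OFFSET_X86_64_4L: u64 = 0xffff_8880_0000_0000;

/// Translates a direct-mapped KVA to a physical address.
/// Returns None when the address lies below the direct-map base.
pub fn direct_kva_to_pa(kva: u64, page_offset: u64) -> Option<u64> {
    kva.checked_sub(page_offset)
}
```

The check only rejects addresses below the base. It does not verify that the result falls inside guest physical memory, so an address from another region above the direct map would still produce a (meaningless) value.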
Kva(u64)
Vmalloc’d / vmap’d kernel virtual address — translated via
page-table walk through the guest’s CR3. Use this for BPF
maps, vmalloc’d memory, and any other address that does NOT
live in the direct map.
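A page-table walk starts by splitting the virtual address into per-level table indices. The sketch below shows that split for x86_64 4-level paging with 4 KiB pages; it is an assumption about the paging mode, does not handle huge pages, and the WalkIndices and split_va_4level names are hypothetical rather than ktstr's walker.

```rust
/// Table indices and byte offset for one 4-level, 4 KiB-page translation.
#[derive(Debug, PartialEq, Eq)]
pub struct WalkIndices {
    pub pgd: usize,
    pub pud: usize,
    pub pmd: usize,
    pub pte: usize,
    pub page_offset: usize,
}

/// Splits a virtual address into 9-bit indices per level plus a 12-bit offset.
pub fn split_va_4level(va: u64) -> WalkIndices {
    WalkIndices {
        pgd: ((va >> 39) & 0x1ff) as usize,
        pud: ((va >> 30) & 0x1ff) as usize,
        pmd: ((va >> 21) & 0x1ff) as usize,
        pte: ((va >> 12) & 0x1ff) as usize,
        page_offset: (va & 0xfff) as usize,
    }
}
```

Each index selects an 8-byte entry in the table whose physical address came from the previous level, starting from the table that CR3 points to.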
PerCpuField
Per-CPU field of a kernel struct, resolved at op dispatch
time. The variant carries the symbolic intent only (symbol,
field, cpu); the dispatcher looks up symbol in the
vmlinux symbol table, adds __per_cpu_offset[cpu], and adds
the BTF-resolved byte offset of field within symbol’s
struct type to yield the per-CPU field’s runtime KVA.
symbol must be in the v1 supported set: runqueues →
struct rq, kernel_cpustat → struct kernel_cpustat,
kstat → struct kernel_stat, tick_cpu_sched →
struct tick_sched. Unknown symbols fail with a typed error
(the wire variant doesn’t carry struct type, so the
dispatcher maps via a hardcoded table — extend it AND
KernelSymbols::from_elf to add). KASLR-on round-trip
coverage is an outstanding follow-up; ktstr defaults to
nokaslr so the kaslr_offset slide is 0 on the standard
test path.
Lazy resolution keeps the construction surface pure-data
(the test author needs no GuestKernel/BTF/symbol-table
handle to construct the variant); resolution failures
surface as op-execution errors at the same layer as
missing-symbol failures in other snapshot ops.
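The address arithmetic and the symbol table described above can be sketched in a few lines. This is an illustration under stated assumptions, not the dispatcher: the function names are hypothetical, the per-CPU offsets and the BTF field offset are passed in as plain numbers, and the table simply restates the v1 set listed in the text.

```rust
/// v1 symbol -> struct type mapping as listed in the text.
pub fn per_cpu_struct_for(symbol: &str) -> Option<&'static str> {
    match symbol {
        "runqueues" => Some("rq"),
        "kernel_cpustat" => Some("kernel_cpustat"),
        "kstat" => Some("kernel_stat"),
        "tick_cpu_sched" => Some("tick_sched"),
        _ => None,
    }
}

/// symbol_kva + __per_cpu_offset[cpu] + field offset.
/// Returns None when `cpu` is outside the offset table.
pub fn per_cpu_field_kva(
    symbol_kva: u64,
    per_cpu_offsets: &[u64],
    cpu: u32,
    field_offset: u64,
) -> Option<u64> {
    let cpu_base = *per_cpu_offsets.get(cpu as usize)?;
    // Wrapping adds: the sum is modular 64-bit address arithmetic.
    Some(symbol_kva.wrapping_add(cpu_base).wrapping_add(field_offset))
}
```

Returning None for an unknown symbol or an out-of-range CPU corresponds to the typed op-execution error the text describes, though the real error type is not shown here.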
Fields
TaskField
Per-task field of struct task_struct — SCX-managed tasks
only (the dispatcher’s L6 sched_class gate rejects non-SCX
tasks). Resolved at dispatch by walking init_task.tasks
plus each leader’s signal->thread_head to locate the task
with matching pid AND matching expected_start_time_ns
(anti-PID-reuse identity), then adding the BTF-resolved
nested-path byte offset of field within task_struct.
See crate::vmm::wire::KernelOpTarget::TaskField for the
7-layer validation chain the dispatcher applies.
expected_start_time_ns is task->start_time captured at
WorkSpec spawn time. Get the PID list from
crate::workload::WorkloadHandle::worker_pids, then read
field 22 of /proc/<pid>/stat and convert it from clock ticks
to ns via
ticks * 1_000_000_000 / sysconf(_SC_CLK_TCK).
Fields
expected_start_time_ns: u64
task->start_time (ns) recorded at spawn time. The
dispatcher’s L2 check rejects writes when the observed
task->start_time differs (PID-reuse identity guard).
field: Cow<'static, str>
Dot-separated nested-member path within task_struct.
SCX-only fields recommended (e.g. "scx.dsq_vtime",
"start_boottime"). "se.vruntime" writes are
silently discarded by EEVDF’s place_entity on enqueue
(kernel/sched/fair.c:5381-5514 since 6.6) AND rejected
by the SCX-only class gate; do not use.
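The /proc/<pid>/stat conversion described for expected_start_time_ns can be sketched as two small helpers. Both function names are hypothetical, and the ticks-per-second value is passed in as a parameter (the text obtains it from sysconf(_SC_CLK_TCK)) so the sketch needs no external crates.

```rust
/// Extracts field 22 (starttime, in clock ticks) from a /proc/<pid>/stat line.
pub fn starttime_ticks(stat_line: &str) -> Option<u64> {
    // Field 2 (comm) may contain spaces and parentheses, so skip to just
    // past the last ')' before splitting on whitespace.
    let after_comm = &stat_line[stat_line.rfind(')')? + 1..];
    // The first token after comm is field 3, so field 22 is token index 19.
    after_comm.split_whitespace().nth(19)?.parse().ok()
}

/// Converts clock ticks to nanoseconds: ticks * 1_000_000_000 / clk_tck.
pub fn ticks_to_ns(ticks: u64, clk_tck: u64) -> Option<u64> {
    if clk_tck == 0 {
        return None;
    }
    // Widen to u128 so the multiplication cannot overflow.
    let ns = (ticks as u128) * 1_000_000_000u128 / (clk_tck as u128);
    u64::try_from(ns).ok()
}
```

The result has tick granularity: with 100 ticks per second, every value is a multiple of 10 ms expressed in nanoseconds.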
Implementations§
impl KernelTarget
pub fn symbol(name: impl Into<Cow<'static, str>>) -> Self
Kernel text/data/bss symbol target. Resolves at op-dispatch
time via the runtime kernel image base + KASLR phys_base.
Heads up. See the # Semantic risk section on the
enclosing KernelTarget type doc before pointing this
at a scheduler-bookkeeping symbol.
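The impl Into<Cow<'static, str>> parameter lets callers pass either a string literal or an owned String. The sketch below shows the effect with a local stand-in type; SymbolTarget is hypothetical and only mirrors the constructor's signature.

```rust
use std::borrow::Cow;

/// Local stand-in with the same constructor signature as the documented one.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SymbolTarget(pub Cow<'static, str>);

impl SymbolTarget {
    /// A &'static str becomes Cow::Borrowed (no allocation);
    /// a String becomes Cow::Owned.
    pub fn symbol(name: impl Into<Cow<'static, str>>) -> Self {
        SymbolTarget(name.into())
    }
}
```

Literal names stay allocation-free, while names computed at run time (for example with format!) can be passed as a String without leaking them to obtain a 'static lifetime. Equality compares the string contents, so the borrowed and owned forms of the same name compare equal.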
pub const fn direct(kva: u64) -> Self
Direct-mapped KVA target. Translates via kva - PAGE_OFFSET.
For per-CPU bases the caller must add
__per_cpu_offset[cpu] to the base symbol KVA before
constructing the variant; use Self::per_cpu_field
instead for the framework-resolved per-CPU shape.
Heads up. See the # Semantic risk section on the
enclosing KernelTarget type doc before pointing this
at a scheduler-bookkeeping address.
pub const fn kva(kva: u64) -> Self
Vmalloc’d / vmap’d KVA target. Translates via page-table
walk through the guest’s CR3.
Heads up. See the # Semantic risk section on the
enclosing KernelTarget type doc before pointing this
at a scheduler-bookkeeping address.
pub fn per_cpu_field(
    symbol: impl Into<Cow<'static, str>>,
    field: impl Into<Cow<'static, str>>,
    cpu: u32,
) -> Self
Per-CPU field of a kernel struct. Resolves at op-dispatch
time via symbol_kva + __per_cpu_offset[cpu] + BTF byte offset of field.
Heads up. See the # Semantic risk section on the
enclosing KernelTarget type doc before pointing this
at a per-CPU scheduler-bookkeeping field.
pub fn task_field(
    pid: u32,
    expected_start_time_ns: u64,
    field: impl Into<Cow<'static, str>>,
) -> Self
Per-task struct task_struct field target — SCX-managed
tasks only. Resolves at dispatch via init_task.tasks +
per-leader signal->thread_head walks to find the task
with matching pid AND matching expected_start_time_ns
(anti-PID-reuse), then BTF nested-path offset of field.
expected_start_time_ns is task->start_time (set once by
kernel/fork.c::copy_process via ktime_get_ns()).
Get worker PIDs via
crate::workload::WorkloadHandle::worker_pids, then
read /proc/<pid>/stat field 22 at spawn time and convert
from clock ticks to ns: field_22_ticks * 1_000_000_000 / sysconf(_SC_CLK_TCK).
field is dot-separated nested-member path. The dispatcher
applies a 7-layer validation chain (pid match, start_time
identity, lifetime, on_rq=0, scx queued-empty, ext
sched_class, start_boottime != 0) before
the write/read lands — see
crate::vmm::wire::KernelOpTarget::TaskField for the full
contract.
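The ordered gate list above can be sketched as a chain that returns the first failing layer. Everything here is hypothetical: TaskSnapshot, Reject and validate are illustration-only names, the fields stand in for state the real dispatcher reads from guest memory, and the "lifetime" layer is modelled as a simple alive flag because the text does not spell out what it checks.

```rust
/// Hypothetical snapshot of the task state the gates inspect.
#[derive(Debug, Clone)]
pub struct TaskSnapshot {
    pub pid: u32,
    pub start_time_ns: u64,
    pub alive: bool,
    pub on_rq: u32,
    pub scx_queued: bool,
    pub ext_sched_class: bool,
    pub start_boottime_ns: u64,
}

/// One rejection reason per layer, in the order the text lists them.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Reject {
    PidMismatch,
    StartTimeMismatch,
    NotAlive,
    OnRq,
    ScxQueued,
    NotExtClass,
    ZeroStartBoottime,
}

/// Runs the seven checks in order and reports the first one that fails.
pub fn validate(
    task: &TaskSnapshot,
    pid: u32,
    expected_start_time_ns: u64,
) -> Result<(), Reject> {
    let checks = [
        (task.pid == pid, Reject::PidMismatch),
        (task.start_time_ns == expected_start_time_ns, Reject::StartTimeMismatch),
        (task.alive, Reject::NotAlive),
        (task.on_rq == 0, Reject::OnRq),
        (!task.scx_queued, Reject::ScxQueued),
        (task.ext_sched_class, Reject::NotExtClass),
        (task.start_boottime_ns != 0, Reject::ZeroStartBoottime),
    ];
    for (ok, reject) in checks.iter().copied() {
        if !ok {
            return Err(reject);
        }
    }
    Ok(())
}
```

Ordering matters for diagnostics: a task that fails both the identity check and the on_rq check is reported as an identity mismatch, since that layer comes first.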
SCX-only. The dispatcher rejects non-SCX tasks via the
class+policy gates. Recommended fields: "scx.dsq_vtime"
(DSQ priority key, preserved across dequeue/enqueue),
"start_boottime" (task fork timestamp).
Do NOT write "se.vruntime". EEVDF’s place_entity
(kernel/sched/fair.c:5381-5514, since 6.6) overwrites
se->vruntime on every enqueue, and the validation gates only
admit sleeping tasks, so a direct vruntime write is
silently discarded when the task next wakes.
CFS-class tasks are rejected before reaching the write
regardless, but this field-level warning is the actionable
guidance when debugging “why won’t my vruntime write stick”.
Heads up. The dispatcher’s L4 (on_rq == 0) + L5
(scx.dsq == NULL AND scx.runnable_node empty) gates
reject writes on queued/running tasks per CFS rb-tree + SCX
DSQ ordering safety. Test authors must use blocking workload
patterns (e.g. crate::workload::WorkType::FutexPingPong,
WorkType::WaitOnFutex, WorkType::Sleep) so workers are
sleeping when the cold-path Op fires.
§Examples
// Escape-hatch primitive: seed a specific worker's
// scx.dsq_vtime to ~30 days. WorkSpec.uptime (separate API)
// wraps this; use the escape hatch when the scenario knows
// the exact PID + start_time tuple.
use ktstr::prelude::*;
use std::time::Duration;
let workers = handle.worker_pids(); // Vec<libc::pid_t>
let worker_pid = workers[0] as u32;
// Read `/proc/<pid>/stat` field 22, convert from clock ticks to
// nanoseconds via `* 1_000_000_000 / sysconf(_SC_CLK_TCK)`.
// (Helper expected to land alongside WorkSpec.uptime.)
let start_time_ns: u64 = read_start_time_ns(worker_pid)?;
let seed_vtime_ns = (30 * 86_400_u64) * 1_000_000_000; // 30 days
let writes = vec![(
KernelTarget::task_field(worker_pid, start_time_ns, "scx.dsq_vtime"),
KernelValue::u64(seed_vtime_ns),
)];
// Worker MUST be in a blocking pattern (FutexPingPong, etc.)
// at op-fire time; the dispatcher's 7-layer validation
// rejects writes against runnable/queued tasks.
Trait Implementations§
impl Clone for KernelTarget
fn clone(&self) -> KernelTarget
fn clone_from(&mut self, source: &Self)
impl Debug for KernelTarget
impl PartialEq for KernelTarget
impl Eq for KernelTarget
impl StructuralPartialEq for KernelTarget
Auto Trait Implementations§
impl Freeze for KernelTarget
impl RefUnwindSafe for KernelTarget
impl Send for KernelTarget
impl Sync for KernelTarget
impl Unpin for KernelTarget
impl UnwindSafe for KernelTarget
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T
where
    T: Clone,
impl<Q, K> Equivalent<K> for Q
fn equivalent(&self, key: &K) -> bool
Compare self to key and return true if they are equal.
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise.