pub struct SnapshotBridge { /* private fields */ }Expand description
Host-side capture pipeline that the freeze coordinator routes
Op::CaptureSnapshot and
Op::WatchSnapshot
requests through.
Construct via SnapshotBridge::new (with an explicit capture
callback) and optionally SnapshotBridge::with_watch_register
to attach watch support. Install for the current thread via
SnapshotBridge::set_thread_local — see BridgeGuard for
the RAII teardown contract.
Implementations§
Source§impl SnapshotBridge
impl SnapshotBridge
Sourcepub fn new(capture: CaptureCallback) -> Self
pub fn new(capture: CaptureCallback) -> Self
Build a bridge from a capture callback. The callback may
freeze the VM, build the report, or return None when
capture is unavailable. No watch-register callback —
Op::WatchSnapshot returns “not supported” when the host
did not wire one. Use Self::with_watch_register to
install one.
Sourcepub fn with_accessor_state(
self,
publish_seqno: Arc<AtomicU64>,
worker_state: Arc<AtomicU8>,
dispatcher_wake_evt: Arc<EventFd>,
) -> Self
pub fn with_accessor_state( self, publish_seqno: Arc<AtomicU64>, worker_state: Arc<AtomicU8>, dispatcher_wake_evt: Arc<EventFd>, ) -> Self
Install the accessor-init worker’s publish-seqno, worker-state,
and dispatcher-wake EventFd so
Self::wait_for_accessor_publish_advance /
Self::wait_for_worker_state_not_trying can block scheduler-
swap ops until the new scheduler’s BPF maps land — with kernel-
scheduling-tick wake latency rather than the 50 ms sleep tail
an atomic-only loop would carry. Called by the freeze-coordinator
when it sets up the worker (vmm-accessor-init thread); test
bridges that don’t drive a real worker can omit this call and
the wait becomes a no-op that returns Ok(0) immediately.
Sourcepub fn accessor_publish_seqno(&self) -> u64
pub fn accessor_publish_seqno(&self) -> u64
Snapshot of the accessor-init worker’s current publish seqno.
Returns 0 for bridges built without an accessor (test fixtures
— the dispatch wait sees no advance to gate on and the op
returns immediately). Read with Acquire so the seqno orders
against the worker’s Release bump.
Sourcepub fn wait_for_worker_state_not_trying(
&self,
deadline: Instant,
op_label: &str,
) -> Result<()>
pub fn wait_for_worker_state_not_trying( &self, deadline: Instant, op_label: &str, ) -> Result<()>
Block until the accessor-init worker exits the TRYING state
(transitions to SUCCEEDED on its first publish, or to
FAILED_PERMANENTLY on a terminal worker exit). Used by
Op::AttachScheduler to serialize against the worker’s
concurrent boot-publish: capturing the seqno baseline AFTER
the boot publish completes is the only way to ensure the
next observed seqno advance belongs to the just-attached
scheduler rather than to a co-resident boot scheduler that
finished its 60 s init in parallel. Event-driven via
accessor_dispatcher_wake_evt — wake latency is one
kernel scheduling tick, not a 50 ms poll. No-op (Ok(())) for
bridges without an accessor.
Sourcepub fn wait_for_accessor_publish_advance(
&self,
seqno_before: u64,
deadline: Instant,
op_label: &str,
) -> Result<u64>
pub fn wait_for_accessor_publish_advance( &self, seqno_before: u64, deadline: Instant, op_label: &str, ) -> Result<u64>
Block until the accessor-init worker’s publish seqno advances
past seqno_before, or until deadline elapses. The dispatch
for Op::ReplaceScheduler / Op::AttachScheduler calls this
after spawning the new scheduler so the op only returns success
once the new scheduler’s BPF accessor pair has been re-published
and a subsequent Snapshot::active() reflects the swap.
On timeout the surfaced error reads the worker-state sentinel
to distinguish “still trying” (transient — operator can bump
the deadline) from “worker exited” (terminal — retry will
hang the same way). Bridges without an accessor (test fixtures
that omit Self::with_accessor_state) succeed immediately
with Ok(0) so unit-test scenarios don’t synthesize a worker.
Sourcepub fn with_kernel_op(self, callback: KernelOpCallback) -> Self
pub fn with_kernel_op(self, callback: KernelOpCallback) -> Self
Install a kernel-op callback so
Op::WriteKernel{Hot,Cold} / Op::ReadKernel{Hot,Cold} ops
can dispatch host-side guest-memory writes/reads. Without
one installed, the in-process executor returns “no
SnapshotBridge installed” and the ops fall through to the
virtio-console wire path. Test fixtures use this seam to
record requests and assert on them without touching real
guest memory.
Sourcepub fn dispatch_kernel_op(
&self,
request: &KernelOpRequestPayload,
) -> Option<KernelOpReplyPayload>
pub fn dispatch_kernel_op( &self, request: &KernelOpRequestPayload, ) -> Option<KernelOpReplyPayload>
Dispatch a kernel-op request through the installed callback.
Returns None when no callback is installed (the executor
then falls through to the wire path); returns
Some(KernelOpReplyPayload) otherwise and records the
reply in the per-tag drain log.
Sourcepub fn drain_kernel_ops(&self) -> Vec<(String, KernelOpReplyPayload)>
pub fn drain_kernel_ops(&self) -> Vec<(String, KernelOpReplyPayload)>
Drain the per-tag kernel-op reply log. Returns the accumulated
(tag, reply) pairs in insertion order; leaves the bridge’s
own copy empty so subsequent calls see only newer entries.
Sourcepub fn record_cgroup_procs(&self, tag: String, cgroup: String, pids: Vec<pid_t>)
pub fn record_cgroup_procs(&self, tag: String, cgroup: String, pids: Vec<pid_t>)
Record a cgroup.procs snapshot captured by
Op::CaptureCgroupProcs.
Appends to the per-bridge log in insertion order. Multiple
captures of the same cgroup (distinguished by tag)
surface as separate entries — the bridge does NOT collapse
or overwrite on duplicate cgroup names, which lets a scenario
capture pre/post snapshots of the same cgroup under different
tags. Multiple captures with the same (tag, cgroup) also
append rather than overwrite; tag uniqueness is a caller
convention, not a framework-enforced contract.
Sourcepub fn drain_cgroup_procs(&self) -> Vec<CgroupProcsSnapshot>
pub fn drain_cgroup_procs(&self) -> Vec<CgroupProcsSnapshot>
Drain the per-bridge cgroup.procs snapshot log. Returns the
accumulated CgroupProcsSnapshots in insertion order; leaves
the bridge’s own copy empty so subsequent calls observe only
newer entries. Mirrors Self::drain_kernel_ops’ contract.
Sourcepub fn cgroup_procs_by_tag(&self, tag: &str) -> Option<CgroupProcsSnapshot>
pub fn cgroup_procs_by_tag(&self, tag: &str) -> Option<CgroupProcsSnapshot>
Look up the first cgroup.procs snapshot recorded under
tag without consuming the drain log. Returns None when no
snapshot was recorded under that tag (the typical case is a
typo, a missing capture op, or capture-before-drain ordering
mistake). For the common single-capture-per-tag pattern this
collapses the drain_cgroup_procs().iter().find(|s| s.tag == tag).expect("missing") 4-combinator dance into a one-liner.
Clone cost. Each CgroupProcsSnapshot carries tag,
cgroup, and pids: Vec<libc::pid_t> — the pids vec is
bounded by the number of tasks in the cgroup at capture
time (kilobytes at most for realistic test scenarios).
Mirrors Self::kernel_op_value’s non-draining-lookup
shape for sibling-API consistency.
Duplicate tag. Returns the FIRST snapshot recorded under
tag. Multiple captures with the same (tag, cgroup) are
stored separately (see Self::record_cgroup_procs); callers
who care about the multiplicity should use
Self::drain_cgroup_procs and filter the Vec manually.
Sourcepub fn record_kernel_op_reply(&self, tag: String, reply: KernelOpReplyPayload)
pub fn record_kernel_op_reply(&self, tag: String, reply: KernelOpReplyPayload)
Record a kernel-op reply produced by the host-side wire-path
dispatcher into the same per-tag drain log that
Self::dispatch_kernel_op populates.
The in-process bridge path stores its replies inside
dispatch_kernel_op (which both invokes the callback AND
pushes the reply into the log). The wire-path used by ops
running inside a guest VM produces its reply on the host
freeze-coordinator and frames it back over virtio-console
port 1 — there is no callback to drive the push, so the
host coordinator calls this record-only method directly
after framing each reply. Without this hook, post_vm
callbacks observing crate::vmm::VmResult::snapshot_bridge
would see an empty drain log for any guest-side
Op::WriteKernel* / Op::ReadKernel* invocation, defeating
the asserts-from-replies pattern the gated cold-path e2e
scaffolding pins.
Sourcepub fn kernel_op_value(&self, tag: &str) -> Option<KernelOpValue>
pub fn kernel_op_value(&self, tag: &str) -> Option<KernelOpValue>
Look up the first kernel-op reply value recorded under tag
in the kernel-op drain log without consuming the log.
The bulk shape returned by Self::drain_kernel_ops is
Vec<(tag, reply)> with each reply carrying a
Vec<crate::vmm::wire::KernelOpValue>. For the common
single-tag single-value read-back lookup, this helper
collapses the 4-layer unwrap (find by tag → check success →
index into read_values → match the variant) into a single
call. Returns None when no reply was recorded under tag,
when the reply reported success = false, or when the
reply’s read_values is empty (e.g. a write-op reply under
the same tag). Otherwise returns Some(value) with the
first KernelOpValue of the first matching reply.
The log is NOT drained — the caller can still inspect via
Self::drain_kernel_ops to observe the full per-tag
history.
Clone cost. For U32 / U64 the clone is 4 / 8 bytes.
For crate::vmm::wire::KernelOpValue::Bytes the clone
can be up to crate::vmm::wire::KERNEL_OP_REPLY_MAX
(1 MiB). Hot paths that repeatedly inspect the same tag
should prefer Self::drain_kernel_ops + index into the
returned Vec to avoid the per-call clone.
Sourcepub fn with_watch_register(self, register: WatchRegisterCallback) -> Self
pub fn with_watch_register(self, register: WatchRegisterCallback) -> Self
Install a watch-register callback so Op::WatchSnapshot
ops can attach hardware-watchpoint snapshots. The callback
is responsible for symbol resolution, watchpoint slot allocation, and
KVM_SET_GUEST_DEBUG arming.
Sourcepub fn register_watch(&self, symbol: &str) -> Result<(), String>
pub fn register_watch(&self, symbol: &str) -> Result<(), String>
Register a hardware-watchpoint snapshot for symbol.
Enforces the per-scenario MAX_WATCH_SNAPSHOTS cap before
invoking the host’s watch-register callback. Returns
Err(reason) when:
- The cap has been reached (slot 0 reserved + 3 user slots allocated).
- No watch-register callback was installed via
Self::with_watch_register. - The host’s callback rejected the request (symbol unresolved, alignment violation, ioctl failure).
Sourcepub fn watch_count(&self) -> usize
pub fn watch_count(&self) -> usize
Number of watchpoint snapshots currently registered.
Sourcepub fn capture(&self, name: &str) -> bool
pub fn capture(&self, name: &str) -> bool
Drive the capture closure and store the result under name.
Returns true when a report was captured and stored;
false when the closure returned None.
Sourcepub fn capture_with_step(&self, name: &str, step_index: u16) -> bool
pub fn capture_with_step(&self, name: &str, step_index: u16) -> bool
Step-aware variant of Self::capture: drives the capture
closure and stores the result under name, stamping it with
step_index so the drained entry’s
super::error::DrainedSnapshotEntry::step_index surfaces
the scenario phase the capture belongs to.
step_index is encoded per the 1-indexed phase convention:
0 is the BASELINE settle window, 1..=N align with
scenario Step ordinals. The host-side on-demand-capture
dispatch reads
crate::scenario::Ctx::current_step just before the
freeze rendezvous and passes the loaded value through this
entry point so the downstream phase aggregator can bucket
the sample directly.
Returns true when a report was captured and stored;
false when the closure returned None.
Sourcepub fn store(&self, name: &str, report: FailureDumpReport)
pub fn store(&self, name: &str, report: FailureDumpReport)
Store a pre-built FailureDumpReport under name,
bypassing the capture callback. Used by the host-side freeze
coordinator after it runs freeze_and_dispatch(FreezeMode::Capture { gate_on_exit_kind: false }) and
wants to publish the resulting report on the bridge for the
test author to drain post-VM-exit.
Storage is capped at MAX_STORED_SNAPSHOTS entries to bound
host memory under runaway capture cadence (e.g. a Loop step
firing Op::CaptureSnapshot with a unique tag every iteration).
When the cap is reached, the oldest stored entry is evicted
with a tracing::warn! naming the dropped tag. An overwrite
of an existing tag also warns and replaces the prior report
in place without disturbing FIFO ordering of other entries.
Sourcepub fn store_with_stats(
&self,
name: &str,
report: FailureDumpReport,
stats: Option<Result<Value, MissingStatsReason>>,
elapsed_ms: Option<u64>,
)
pub fn store_with_stats( &self, name: &str, report: FailureDumpReport, stats: Option<Result<Value, MissingStatsReason>>, elapsed_ms: Option<u64>, )
Bundle a FailureDumpReport with the scx_stats JSON and
elapsed-millisecond timestamp captured at the same periodic
boundary. Used by the freeze coordinator’s periodic-fire path
so Sample can pair the
frozen BPF state with the running-scheduler stats observed
just before the freeze rendezvous.
Stats / elapsed are stored in parallel HashMaps keyed by the
same tag as the report. FIFO eviction sweeps all three in
lock-step; an overwrite refreshes order and replaces every
parallel value (or clears it when the new write passes
None) so a stale stats / elapsed entry can never accompany
a freshly stored report.
Sourcepub fn store_with_stats_and_step(
&self,
name: &str,
report: FailureDumpReport,
stats: Option<Result<Value, MissingStatsReason>>,
elapsed_ms: Option<u64>,
boundary_offset_ms: Option<u64>,
step_index: u16,
)
pub fn store_with_stats_and_step( &self, name: &str, report: FailureDumpReport, stats: Option<Result<Value, MissingStatsReason>>, elapsed_ms: Option<u64>, boundary_offset_ms: Option<u64>, step_index: u16, )
Step-aware variant of Self::store_with_stats: bundles the
scenario phase index alongside the report / stats / elapsed
tuple so the drained entry’s
super::error::DrainedSnapshotEntry::step_index carries
the phase the capture belongs to. The freeze coordinator’s
periodic-fire path reads
crate::scenario::Ctx::current_step just before the
rendezvous and routes the value through this method so each
periodic sample is bucketable per phase without a second
lookup.
step_index is encoded per the 1-indexed phase convention
— 0 is the BASELINE settle window, 1..=N align with
scenario Step ordinals. All other arguments match
Self::store_with_stats verbatim.
Sourcepub fn periodic_real_count(&self) -> u32
pub fn periodic_real_count(&self) -> u32
Count of stored periodic-tagged reports that carry REAL BPF
state (not placeholders synthesized by rendezvous timeouts /
gate suppression). Distinct from
crate::vmm::VmResult::periodic_fired, which counts every
periodic boundary the freeze coordinator attempted —
including the ones that landed only a placeholder when the
vCPU rendezvous timed out.
This is the “useful data produced” floor: a scheduler that
attached but produced nothing but placeholders surfaces as
periodic_real_count() == 0 here even though
periodic_fired may be target. Tests that want a
stricter smoke-floor than periodic_fired >= 1 (which
passes on placeholder-only fills) read this query.
Tag-format pin. The coordinator emits periodic tags via
format!("periodic_{:03}", idx) at
src/vmm/freeze_coord/state.rs — always
"periodic_" + exactly 3 ASCII digits. The match here
enforces that exact shape (NOT a loose prefix) so a user
Op::CaptureSnapshot { name: "periodic_kaslr" } cannot
collide with the periodic dispatch namespace and pollute
the count.
What “real” measures. A placeholder report may still
carry vcpu_regs from a degraded capture (see
src/vmm/freeze_coord/mod.rs periodic-degraded path —
degraded.vcpu_regs is preserved into the stored
placeholder). The floor here treats those as “not real”
because the contract is “the test produced BPF-state data”
— vcpu_regs alone don’t satisfy that. Tests that want the
looser “any capture-attempt landed” floor read
crate::vmm::VmResult::periodic_fired instead.
Sourcepub fn has(&self, name: &str) -> bool
pub fn has(&self, name: &str) -> bool
True when a stored report already exists for name. Lets the
freeze coordinator’s final-drain placeholder path skip storing
a degraded “coord exited before capture” report on top of a
real capture that the in-loop dispatch landed earlier — without
this gate, a vCPU thread that re-armed hit=true after the
in-loop service successfully published the report would have
its tag’s stored capture overwritten by the placeholder at
teardown, presenting tests with a hollow snapshot in place of
the real one.
Sourcepub fn drain(&self) -> HashMap<String, FailureDumpReport>
pub fn drain(&self) -> HashMap<String, FailureDumpReport>
Take ownership of the captured snapshots, leaving the bridge
empty. Drops any periodic-capture stats / elapsed / boundary-
offset metadata stored alongside reports — callers that need
the stats JSON or per-sample timestamp must use
Self::drain_ordered_with_stats instead.
Sourcepub fn drain_ordered(&self) -> Vec<(String, FailureDumpReport)>
pub fn drain_ordered(&self) -> Vec<(String, FailureDumpReport)>
Take ownership of the captured snapshots in insertion order,
leaving the bridge empty. The returned Vec walks
SnapshotStore::order (the FIFO key list maintained by
Self::store) so periodic captures — whose ordering IS the
signal — are returned periodic_000 first, periodic_NNN
last. Self::drain returns a HashMap and loses ordering;
use this method when ordering matters.
An overwrite of an existing tag (the if let Some(existing) = store.reports.insert(...) branch in Self::store) moves
the tag to the back of the FIFO — drain_ordered therefore
returns the LATEST capture under each tag exactly once, in
the order of its most-recent insertion.
FIFO eviction at MAX_STORED_SNAPSHOTS drops the oldest
tags from order AND reports together, so a hot run that
fired more than the cap returns the most recent
MAX_STORED_SNAPSHOTS captures in insertion order; older
captures are gone and Self::store already logged the
eviction.
Sourcepub fn drain_ordered_with_stats(&self) -> Vec<DrainedSnapshotEntry>
pub fn drain_ordered_with_stats(&self) -> Vec<DrainedSnapshotEntry>
Take ownership of the captured snapshots in insertion order
along with the parallel scx_stats JSON and per-sample
elapsed-ms timestamps (None per slot when the tag was
captured outside the periodic-capture path or when the stats
request failed). Empties the bridge — every parallel map is
drained in lock-step so a follow-up call returns an empty
vec.
The returned tuple shape (tag, report, stats, elapsed_ms)
is the input to
SampleSeries::from_drained:
the bridge owns the raw drainable shape, the higher-level
SampleSeries view consumes it. Insertion order is the
signal — periodic captures land
periodic_000/periodic_001/… in monotonic wall-clock
order, and the temporal-assertion patterns walk the vec
expecting that ordering.
Sourcepub fn drain_events(&self) -> Vec<SnapshotBridgeEvent>
pub fn drain_events(&self) -> Vec<SnapshotBridgeEvent>
Take ownership of all queued SnapshotBridgeEvents in
insertion order. Empties the internal log; a follow-up call
returns an empty vec. Test authors call this after
Self::drain_ordered / Self::drain_ordered_with_stats
to inspect bridge-side conditions that fired during the
scenario (eviction, overwrite, missing capture, invariant
violation).
Independent of the report drain — events accumulate even on
scenarios that never call drain_ordered*, and reports
remain reachable even on scenarios that never call
drain_events. Tests that want to fail on a bridge event
compose the streams: drain events, inspect, fail with
AssertResult::fail(AssertDetail::new(Other, ...)) if any
variant is unexpected.
When the events log hit MAX_STORED_EVENTS and FIFO-evicted
older entries since the previous drain, a synthetic
SnapshotBridgeEvent::EventLogTruncated is appended at the
tail of the returned vec carrying the dropped count — the
operator never silently loses events. The internal dropped
counter resets to 0 after every drain.
Sourcepub fn event_count(&self) -> usize
pub fn event_count(&self) -> usize
Non-draining count of queued SnapshotBridgeEvents. Useful
for “no bridge events fired” assertions without consuming
the log — assert_eq!(bridge.event_count(), 0). Does NOT
include the synthetic
SnapshotBridgeEvent::EventLogTruncated marker that
Self::drain_events would append; that marker is
drain-time-only and events_dropped > 0 is observable via
the next drain rather than via this counter.
Sourcepub fn set_thread_local(self) -> BridgeGuard
pub fn set_thread_local(self) -> BridgeGuard
Install this bridge as the active bridge for the calling
thread. The bridge stays installed for the lifetime of the
returned BridgeGuard; on drop the prior bridge (or
None) is restored.
Thread-local because execute_steps
runs on the calling thread and Op::CaptureSnapshot only makes
sense in that exact thread’s call stack — installing a
bridge process-wide would race against parallel test
threads.
§Cloning for drain
set_thread_local consumes self. The bridge is Clone
(internal state lives behind Arc<Mutex<_>>), so callers
that need to inspect / drain the bridge after the scenario
runs MUST clone before installing:
let bridge = SnapshotBridge::new(capture_cb);
let drain_handle = bridge.clone(); // share the Arc
let _guard = bridge.set_thread_local(); // consume the original
// ... execute_scenario runs and records on the thread-local ...
let snaps = drain_handle.drain_cgroup_procs();Both handles share the same underlying snapshot store (reports, kernel_ops, cgroup_procs); ops recorded against the thread-local are observable via the cloned handle. Forgetting to clone is the single most common bridge-flow footgun for tests that consume capture results.
Trait Implementations§
Source§impl Clone for SnapshotBridge
impl Clone for SnapshotBridge
Source§fn clone(&self) -> SnapshotBridge
fn clone(&self) -> SnapshotBridge
1.0.0 · Source§fn clone_from(&mut self, source: &Self)
fn clone_from(&mut self, source: &Self)
source. Read moreAuto Trait Implementations§
impl Freeze for SnapshotBridge
impl !RefUnwindSafe for SnapshotBridge
impl Send for SnapshotBridge
impl Sync for SnapshotBridge
impl Unpin for SnapshotBridge
impl !UnwindSafe for SnapshotBridge
Blanket Implementations§
Source§impl<T> BorrowMut<T> for Twhere
T: ?Sized,
impl<T> BorrowMut<T> for Twhere
T: ?Sized,
Source§fn borrow_mut(&mut self) -> &mut T
fn borrow_mut(&mut self) -> &mut T
Source§impl<T> CloneToUninit for Twhere
T: Clone,
impl<T> CloneToUninit for Twhere
T: Clone,
§impl<T> Instrument for T
impl<T> Instrument for T
§fn instrument(self, span: Span) -> Instrumented<Self>
fn instrument(self, span: Span) -> Instrumented<Self>
§fn in_current_span(self) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
Source§impl<T> IntoEither for T
impl<T> IntoEither for T
Source§fn into_either(self, into_left: bool) -> Either<Self, Self>
fn into_either(self, into_left: bool) -> Either<Self, Self>
self into a Left variant of Either<Self, Self>
if into_left is true.
Converts self into a Right variant of Either<Self, Self>
otherwise. Read moreSource§fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
self into a Left variant of Either<Self, Self>
if into_left(&self) returns true.
Converts self into a Right variant of Either<Self, Self>
otherwise. Read more