Module metric_types

Type-safe wrappers for per-thread metric values.

Each registered metric in crate::ctprof_compare::CTPROF_METRICS has a kernel-source-grounded semantic category — counter, cumulative-time, peak high-water (ns and bytes), instantaneous gauge, byte count, ordinal scalar, categorical, or cpuset. The aggregation pipeline reduces values per category: counters sum, peaks take max, gauges take max, ordinals carry a [min, max] range, categoricals carry the mode (most-frequent value), and cpusets carry an affinity summary.

§Temporal window

Every counter / cumulative-time / peak / byte-count newtype defined here represents a value that the kernel accumulates across the THREAD LIFETIME — from thread birth to the moment of the procfs read. All of these fields share the same window because they live in the same task_struct and tick along with the same task. That shared window is what makes ratios across fields well-defined (e.g. cpu_efficiency = run_time_ns / (run_time_ns + wait_time_ns) is a meaningful fraction because both numerator and denominator measure the same task’s same lifetime).
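The ratio above can be sketched as follows. This is a minimal illustration, assuming a plain u64-backed stand-in for MonotonicNs; the function name cpu_efficiency and its Option return are hypothetical and not taken from the crate.

```rust
/// Hypothetical stand-in for the module's cumulative-time newtype.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct MonotonicNs(pub u64);

/// Fraction of (run + wait) time that the thread spent running.
/// Both inputs integrate over the same thread lifetime, so the ratio is a
/// well-defined fraction. Returns None when both accumulators are zero.
pub fn cpu_efficiency(run_time_ns: MonotonicNs, wait_time_ns: MonotonicNs) -> Option<f64> {
    // Widen to u128 so the sum of two u64 accumulators cannot overflow.
    let total = run_time_ns.0 as u128 + wait_time_ns.0 as u128;
    if total == 0 {
        return None;
    }
    Some(run_time_ns.0 as f64 / total as f64)
}
```

A thread with 750 ns of run time and 250 ns of wait time yields Some(0.75); a thread with no accumulated time yields None rather than a division by zero.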

Cross-file read skew during one capture pass (the capture pipeline reads /proc/<tid>/stat, then /sched, then /io, etc., with a few hundred microseconds of drift between them) is negligible against cumulative-from-birth totals that grow over hours or days of thread runtime: the small in-flight delta during the read is rounding noise relative to the lifetime accumulator. This holds only once the accumulator has had time to integrate. Threads captured very early in their lifetime carry larger relative read-skew error, but their absolute contribution to any group aggregate is correspondingly small (a thread alive for 500 µs cannot meaningfully drag a group total even if its individual reads are skewed by 100 µs).

crate::ctprof_compare runs in two modes that both preserve the shared-window property: SHOW renders one snapshot’s lifetime totals; COMPARE subtracts two snapshots captured at different wall-clock instants to scope the values to the (capture-A, capture-B) interval. In both modes every field carries the same temporal window, so cross-field ratios and per-thread totals stay well-defined.
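The COMPARE subtraction can be sketched as a per-field delta. This is a minimal illustration under stated assumptions: the method name delta_since is hypothetical, and returning None when the later snapshot is smaller is one possible policy, not necessarily the crate's.

```rust
/// Hypothetical stand-in for the module's cumulative-time newtype.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct MonotonicNs(pub u64);

impl MonotonicNs {
    /// Value accumulated between an earlier capture and this one.
    /// Returns None if the later snapshot is smaller than the earlier one,
    /// which a monotonic counter should never produce for the same thread.
    pub fn delta_since(self, earlier: MonotonicNs) -> Option<MonotonicNs> {
        self.0.checked_sub(earlier.0).map(MonotonicNs)
    }
}
```

Because every lifetime-accumulated field is subtracted over the same (capture-A, capture-B) interval, ratios computed from the deltas keep the shared-window property that the lifetime totals have.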

Two newtypes break this convention deliberately: GaugeNs (a current-instantaneous reading like the scheduler’s current slice) and GaugeCount (a current count like signal_struct->nr_threads) — the per-newtype docs call out the gauge family separately.

§Type-system enforcement

Encoding the category into the type system surfaces category-mismatched aggregation as a compile error. The crate::ctprof_compare::AggRule dispatch routes each variant through the typed newtype’s reduction trait — Sum* through Summable::sum_across, Max* through Maxable::max_across, Range* through Rangeable::range_across, and Mode* through Modeable::mode_across — so a registry entry that pairs a peak field with a sum reduction (e.g. t.wait_max (PeakNs) bound to a Sum* rule whose accessor returns a Summable value) fails to compile rather than producing a meaningless 1×1s ⊕ 1000×1ms aggregate. This module defines the newtypes and traits the dispatch consumes.
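The compile-time rejection described above can be sketched with sealed marker traits. This is an illustrative reconstruction, not the crate's source: the ThreadState fields shown, the apply_sum_rule helper, and the saturating addition are all assumptions made for the example.

```rust
mod sealed {
    // Supertraits living in a private module: code outside this module tree
    // cannot name them, so it cannot add new Summable / Maxable impls.
    pub trait SummableSealed {}
    pub trait MaxableSealed {}
}

#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct MonotonicNs(pub u64);
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct PeakNs(pub u64);

pub trait Summable: sealed::SummableSealed + Sized {
    fn sum_across<I: IntoIterator<Item = Self>>(items: I) -> Self;
}
pub trait Maxable: sealed::MaxableSealed + Sized {
    fn max_across<I: IntoIterator<Item = Self>>(items: I) -> Option<Self>;
}

impl sealed::SummableSealed for MonotonicNs {}
impl Summable for MonotonicNs {
    fn sum_across<I: IntoIterator<Item = Self>>(items: I) -> Self {
        // Saturating add is an assumption; the real overflow policy is unknown.
        MonotonicNs(items.into_iter().map(|v| v.0).fold(0u64, u64::saturating_add))
    }
}

impl sealed::MaxableSealed for PeakNs {}
impl Maxable for PeakNs {
    fn max_across<I: IntoIterator<Item = Self>>(items: I) -> Option<Self> {
        items.into_iter().max()
    }
}

/// Hypothetical two-field slice of a per-thread record.
pub struct ThreadState {
    pub run_time_ns: MonotonicNs,
    pub wait_max: PeakNs,
}

/// A Sum-style rule: the accessor must return a Summable value.
pub fn apply_sum_rule<T, M, F>(threads: &[T], accessor: F) -> M
where
    M: Summable,
    F: Fn(&T) -> M,
{
    M::sum_across(threads.iter().map(accessor))
}
```

Calling apply_sum_rule(&threads, |t| t.run_time_ns) compiles, while apply_sum_rule(&threads, |t| t.wait_max) is rejected because PeakNs does not implement Summable.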

§The newtypes

  • MonotonicCount — pure counter (only ever goes up across a thread’s lifetime). Examples: nr_wakeups, nr_migrations, voluntary_csw.
  • DeadCounter — same wire shape as MonotonicCount but tagged for kernel counters whose update path is permanently dead (the field exists in task_struct but no kernel writer touches it on any current code path — nr_wakeups_idle, nr_migrations_cold, nr_wakeups_passive all match this shape today). Captured for parity with /proc/<tid>/sched line numbers but does NOT implement any reduction trait (Summable / Maxable / Rangeable / Modeable) — the value is structurally zero, so every reduction is trivially zero and rendering it through any of the live reductions implies “we measured a thing” when in fact we measured a kernel-side dead pointer. The registry-level accommodation (a no-op aggregation arm or registry removal) is the migration batch’s problem; this newtype’s job is to make the dead-counter status visible at the field declaration so the migration can’t accidentally pair it with a Summable-bound AggRule variant.
  • MonotonicNs — cumulative-time counter, ns. Examples: run_time_ns, wait_sum, voluntary_sleep_ns, block_sum, iowait_sum, core_forceidle_sum.
  • PeakNs — lifetime high-water mark, ns. The kernel updates these via if (delta > stat->max) stat->max = delta inside update_stats_* wrappers (kernel/sched/stats.c) and inline schedstat updates in kernel/sched/fair.c (e.g. slice_max in set_next_entity, exec_max in update_se). Summing peaks is a category error — 1 thread × 1s peak carries different meaning than 1000 threads × 1ms peak. Examples: wait_max, sleep_max, block_max, exec_max, slice_max.
  • PeakBytes — lifetime high-water mark, bytes (per-process hiwater_rss / hiwater_vm from struct taskstats via the genetlink path). Same Maxable-only contract as PeakNs but Bytes-typed, so it renders on the IEC byte ladder (B → KiB → MiB → GiB → TiB) instead of the ns ladder.
  • GaugeNs — instantaneous gauge sampled at capture time, ns. fair_slice_ns is the canonical example. Summing gauges is a category error — N nearly-identical instantaneous samples sum to N×gauge with no physical meaning.
  • GaugeCount — gauge-family unitless count (u64) that can go up AND down at runtime. Carries the same Maxable-only contract as GaugeNs but renders as a plain count rather than a nanosecond ladder. nr_threads (the process-wide thread count from signal_struct->nr_threads) is the canonical example — threads spawn and exit so the value is not monotonic, and the registry reduces it by Max across a group rather than Sum. Distinct from GaugeNs because “thread count” and “current slice in nanoseconds” do not share a unit; routing nr_threads through GaugeNs would render it on the ns auto-scale ladder, which is a unit lie.
  • ClockTicks — USER_HZ-scaled time. Examples: utime_clock_ticks, stime_clock_ticks. Auto-scale ladder is ticks → Kticks → Mticks (decimal SI), distinct from ns (also decimal SI, different unit) and bytes (IEC binary).
  • Bytes — byte counts. Examples: allocated_bytes, read_bytes, wchar. Auto-scale ladder is IEC binary (B → KiB → MiB → GiB → TiB).
  • OrdinalI32 / OrdinalU32 / OrdinalU64 — bounded scalar, range-aggregated (no sum). OrdinalI32 examples: nice ([-20, 19]), priority (CFS=[0, 39], RT=[-100, -2], DL=-101), processor (last CPU the task ran on; signed for symmetry with nice: the kernel’s task_cpu() returns unsigned int (include/linux/sched.h), but ktstr stores i32 to share the OrdinalI32 wrapper with the genuinely-signed nice and priority fields). OrdinalU32 is for u32-backed ordinal fields like rt_priority (real-time priority, 0..99 in practice for SCHED_FIFO / SCHED_RR; the kernel declares unsigned int task_struct::rt_priority in include/linux/sched.h, so a u32 matches the kernel field width exactly). OrdinalU64 is reserved for future ordinal metrics whose kernel-side type genuinely exceeds u32::MAX; no field uses it today.
  • CategoricalString — string-valued, mode-aggregated. policy is the only example. The state char and ext_enabled bool fields stay unwrapped on crate::ctprof::ThreadState; the crate::ctprof_compare::AggRule::ModeChar and crate::ctprof_compare::AggRule::ModeBool accessors coerce them through String via to_string() at the call site. If a second bool field appears, promote both to a dedicated CategoricalBool wrapper rather than continuing the ad-hoc coercion.
  • CpuSet — Vec<u32> of CPU IDs, affinity-aggregated. cpu_affinity is the only example.
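The IEC byte ladder used by Bytes and PeakBytes can be sketched as a small formatter. This is a minimal illustration: the function name format_iec, the one-decimal precision, and the exact output strings are assumptions, since the crate's renderer is not shown here.

```rust
/// Render a byte count on the IEC binary ladder (B, KiB, MiB, GiB, TiB).
/// Each rung is a factor of 1024, unlike the decimal ladders used for
/// nanoseconds and clock ticks.
pub fn format_iec(bytes: u64) -> String {
    const UNITS: [&str; 5] = ["B", "KiB", "MiB", "GiB", "TiB"];
    let mut value = bytes as f64;
    let mut unit = 0;
    // Climb the ladder until the value fits below 1024 or the top rung is reached.
    while value >= 1024.0 && unit < UNITS.len() - 1 {
        value /= 1024.0;
        unit += 1;
    }
    if unit == 0 {
        // Plain bytes are printed without a fractional part.
        format!("{} B", bytes)
    } else {
        format!("{:.1} {}", value, UNITS[unit])
    }
}
```

Keeping the ladder tied to the newtype is what prevents a byte high-water mark from being rendered with a nanosecond suffix, or a thread count from being scaled at all.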

§The marker traits

  • Summable — sum across a group. Implemented by the four counter newtypes (MonotonicCount, MonotonicNs, ClockTicks, Bytes). NOT implemented by PeakNs / PeakBytes / GaugeNs / GaugeCount / DeadCounter / OrdinalI32 / OrdinalU32 / OrdinalU64 / CategoricalString / CpuSet. The trait is sealed via sealed::SummableSealed so a downstream crate cannot add impl Summable for PeakNs to bypass the category invariant.

  • Maxable — reduce by max. Implemented by PeakNs (max-of-peak is “worst peak any contributor saw across its lifetime”), PeakBytes (same Maxable-only contract as PeakNs, in bytes), GaugeNs (max-of-gauge is “longest current slice in the bucket”), and GaugeCount (max-of-count is “biggest current count any contributor carried”). NOT implemented by Summable cumulative counters (MonotonicCount / MonotonicNs / ClockTicks / Bytes): max-across-snapshots on a lifetime accumulator reduces to “the last snapshot’s value”, which is mostly noise relative to the lifetime-integrated quantity it reports. NOT implemented by ordinals (those carry a [min, max] range, not a single max), nor by CategoricalString (string max has no useful semantic), nor by CpuSet (the affinity reduction is a custom summary, not a bare max). Sealed via sealed::MaxableSealed.

    max_across returns Option<Self>: None for an empty iterator (so callers can distinguish “no contributors” from “all contributors had zero”), Some(largest) otherwise. The parallel Summable::try_sum_across returns Option<Self> with the same empty-iterator semantics. The try_ prefix (rather than checked_) avoids colliding with the stdlib’s overflow-detection naming convention — this is an empty-iterator check, not an arithmetic check.

  • Modeable — reduce by mode (most-frequent value). Implemented by CategoricalString only. Sealed via sealed::ModeableSealed.

  • Rangeable — reduce by [min, max]. Implemented by OrdinalI32, OrdinalU32, and OrdinalU64. Sealed via sealed::RangeableSealed. range_across returns Option<Range<Self>> — the Range newtype enforces min ≤ max at construction so a downstream consumer cannot observe a swapped pair.
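The Range contract described for Rangeable can be sketched as follows. This is an illustrative reconstruction under stated assumptions: here the constructor reorders swapped endpoints, whereas the crate's constructor may instead reject them, and range_across is written as a free function rather than a trait method.

```rust
/// Hypothetical stand-in for the module's i32 ordinal newtype.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct OrdinalI32(pub i32);

/// Inclusive [min, max] interval. Fields are private so the only way to
/// build one is through `new`, which keeps min <= max.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Range<T> {
    min: T,
    max: T,
}

impl<T: Ord + Copy> Range<T> {
    /// Orders the endpoints so that min <= max always holds (assumed policy).
    pub fn new(a: T, b: T) -> Self {
        if a <= b {
            Range { min: a, max: b }
        } else {
            Range { min: b, max: a }
        }
    }
    pub fn min(&self) -> T {
        self.min
    }
    pub fn max(&self) -> T {
        self.max
    }
}

/// Reduce a group to its observed [min, max]; None for an empty group.
pub fn range_across<T, I>(items: I) -> Option<Range<T>>
where
    T: Ord + Copy,
    I: IntoIterator<Item = T>,
{
    let mut iter = items.into_iter();
    let first = iter.next()?;
    // Track the running minimum and maximum in a single pass.
    let (lo, hi) = iter.fold((first, first), |(lo, hi), v| (lo.min(v), hi.max(v)));
    Some(Range::new(lo, hi))
}
```

For nice values 5, -20, and 19 the reduction yields the interval [-20, 19], and a consumer reading min() and max() never sees them swapped.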

Reductions are exposed as trait methods on Summable / Maxable / Rangeable / Modeable. Callers must import the relevant trait (or use ktstr::metric_types::*;) to call T::sum_across(...) / T::max_across(...) / T::range_across(...) / T::mode_across(...). The traits double as compile-time markers — a generic site that wants “any summable type” can take T: Summable and statically reject PeakNs.
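The empty-iterator contract of max_across and try_sum_across can be sketched with free functions. This is a minimal illustration, not the crate's trait methods: the newtypes are plain u64 stand-ins, and saturating addition is an assumed overflow policy.

```rust
/// Hypothetical stand-ins for two of the module's newtypes.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct GaugeCount(pub u64);
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct MonotonicCount(pub u64);

/// None for an empty group, Some(largest) otherwise.
pub fn max_across<I: IntoIterator<Item = GaugeCount>>(items: I) -> Option<GaugeCount> {
    items.into_iter().max()
}

/// None for an empty group, Some(total) otherwise. The None case signals
/// "no contributors", not arithmetic overflow.
pub fn try_sum_across<I: IntoIterator<Item = MonotonicCount>>(
    items: I,
) -> Option<MonotonicCount> {
    let mut iter = items.into_iter();
    // Bail out early when there is nothing to sum.
    let first = iter.next()?;
    let total = iter.fold(first.0, |acc, v| acc.saturating_add(v.0));
    Some(MonotonicCount(total))
}
```

A group whose contributors all report zero reduces to Some(0), which stays distinguishable from the None produced by a group with no contributors.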

§Wire-format compatibility

Every wrapper carries #[serde(transparent)] so the JSON representation matches the unwrapped primitive. The crate::ctprof::ThreadState migration to these newtypes preserves wire format — existing snapshot files (.ctprof.zst) deserialize unchanged.

§What this module is NOT

  • It is NOT a unit-of-measure system. There is no MonotonicNs * MonotonicNs = MonotonicNs² — these wrappers carry semantic category, not algebraic dimensionality.
  • It is NOT a runtime-typed value enum (that lives next to the crate::ctprof_compare::AggRule dispatch). This module only defines the building-block newtypes.

Structs§

Bytes
Byte count, IEC-binary auto-scaled (B → KiB → MiB → GiB → TiB). Accumulated by the kernel (or jemalloc, for the per-thread TSD allocator counters) from thread birth.
CategoricalString
Categorical string-valued field. Group reduction takes the mode (most-frequent value); ties break alphabetically per the existing aggregate(AggRule::Mode, ...) rule.
ClockTicks
USER_HZ-scaled tick counter, accumulated by the kernel from thread birth. The kernel exposes user-mode and kernel-mode CPU time, plus delayacct blkio delay, in ticks of the userspace-visible USER_HZ frequency. Auto-scale ladder is ticks → Kticks → Mticks (decimal SI), kept distinct from ns and bytes so the rendered cell carries the correct unit suffix.
CpuSet
CPU affinity set. Group reduction produces a crate::ctprof_compare::AffinitySummary carrying the num_cpus range plus a uniform-cpuset flag.
DeadCounter
Kernel counter whose update path is permanently dead. The field exists in task_struct (and is exposed via /proc/<tid>/sched) but no kernel writer touches it on any current code path.
GaugeCount
Gauge-family unitless count (u64). Distinct from MonotonicCount: a MonotonicCount only ever goes UP over a thread’s lifetime (integrated from birth), while a GaugeCount is sampled at capture time and can go up AND down at runtime as the underlying state changes. Distinct from GaugeNs: same Maxable-only contract, but renders as a unitless count rather than a nanosecond ladder.
GaugeNs
Instantaneous gauge sampled at capture time, nanoseconds. Distinct from PeakNs: a gauge is a snapshot of the CURRENT value of a kernel field, not a lifetime maximum. fair_slice_ns reads the per-thread slice line from /proc/<tid>/sched, which carries the scheduler’s current timeslice for the task — a point-in-time reading, not a thread-lifetime accumulator. Cross-field ratios with MonotonicNs / MonotonicCount / etc. produce a quantity with mixed temporal interpretation (numerator integrates from thread birth, denominator samples the present), so callers should treat such ratios as a rough hint rather than a well-defined fraction.
MonotonicCount
Pure monotonic counter — only ever goes up over a thread’s lifetime, accumulated by the kernel from thread birth to the moment of the procfs read. Sum across a group; delta across snapshots scopes the value to the inter-capture interval.
MonotonicNs
Cumulative-time counter, nanoseconds, accumulated by the kernel from thread birth. Same temporal-window shape as MonotonicCount but tagged for the ns auto-scale ladder (ns → µs → ms → s).
OrdinalI32
Bounded ordinal scalar (i32). Range-aggregated across a group: the cell carries the observed [min, max] interval, not a sum. Sum is meaningless for ordinals — adding two nice values doesn’t produce a third nice value.
OrdinalU32
Bounded ordinal scalar (u32). Same range-aggregation contract as OrdinalI32 but for unsigned 32-bit fields.
OrdinalU64
Bounded ordinal scalar (u64). Same range-aggregation contract as OrdinalI32 but for unsigned 64-bit fields. No registry metric uses this width today; reserved for future ordinal metrics whose kernel-side type genuinely exceeds u32::MAX.
PeakBytes
Lifetime high-water mark, bytes. Same Maxable-only contract as PeakNs but Bytes-typed so the renderer routes through the IEC binary auto-scale ladder (B → KiB → MiB → GiB → TiB) instead of the ns ladder.
PeakNs
Lifetime high-water mark, nanoseconds. The kernel updates these as a max-against-prior in update_stats_* / update_se / set_next_entity paths (kernel/sched/stats.c, kernel/sched/fair.c); the value at any procfs read is the largest single window the thread has accumulated since its birth. Group reduction takes max across contributors so the rendered cell surfaces the worst single window any thread experienced over its lifetime.
Range
Inclusive [min, max] interval over a Rangeable type.

Traits§

Maxable
Marker for newtypes that can be reduced by max across a group.
Modeable
Marker for newtypes reduced by mode (most-frequent value). Implemented by CategoricalString.
Rangeable
Marker for newtypes reduced by [min, max] range. Implemented by OrdinalI32, OrdinalU32, and OrdinalU64.
Summable
Marker for newtypes that can be summed across a group.