Module ctprof_compare


Group, aggregate, and render the comparison between two CtprofSnapshots.

Design summary: the per-thread profiler emits one snapshot per run. The comparison groups the threads within each snapshot by a single axis (pcomm, cgroup, comm, or comm-exact) or by all pattern-aware axes at once (GroupBy::All; see GroupBy), aggregates every metric according to the rule on its CtprofMetricDef, matches groups across the two snapshots, and emits one row per (group, metric) pair. Groups present on only one side surface as unmatched entries rather than as fabricated zero-valued rows: a row is missing because the process did not exist, not because it did zero work.
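As a rough illustration of the match-then-split behavior in the design summary, the sketch below groups threads by a single string key, sums one counter per group, and keeps one-sided groups in separate lists instead of pairing them with a zero. Everything in it (Thread, Row, group_by_key, match_groups, the run_ns field) is hypothetical and heavily simplified: the real pipeline aggregates many metrics per their AggRule and emits one row per (group, metric) pair, whereas this sketch tracks a single metric.

```rust
use std::collections::BTreeMap;

/// Hypothetical thread record: a group key (e.g. a pcomm) and one counter.
struct Thread {
    key: String,
    run_ns: u64,
}

/// A group present in both snapshots, with its aggregate from each side.
#[derive(Debug, PartialEq)]
struct Row {
    group: String,
    baseline: u64,
    candidate: u64,
}

/// Group threads by key and sum the counter (a stand-in for a sum-style AggRule).
fn group_by_key(threads: &[Thread]) -> BTreeMap<String, u64> {
    let mut groups = BTreeMap::new();
    for t in threads {
        *groups.entry(t.key.clone()).or_insert(0) += t.run_ns;
    }
    groups
}

/// Pair groups across the two snapshots. One-sided groups are returned as
/// separate baseline-only and candidate-only lists, never as zero-valued rows.
fn match_groups(base: &[Thread], cand: &[Thread]) -> (Vec<Row>, Vec<String>, Vec<String>) {
    let (b, c) = (group_by_key(base), group_by_key(cand));
    let mut rows = Vec::new();
    let mut only_base = Vec::new();
    for (key, &bv) in &b {
        match c.get(key) {
            Some(&cv) => rows.push(Row { group: key.clone(), baseline: bv, candidate: cv }),
            None => only_base.push(key.clone()),
        }
    }
    let only_cand = c.keys().filter(|k| !b.contains_key(*k)).cloned().collect();
    (rows, only_base, only_cand)
}
```

A group that exists only in the candidate (a newly started process, say) ends up in the third list, so the reader never sees a misleading "0 → N" row for it.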

No judgment labels. The comparison prints raw numbers and percent deltas; interpretation (regression vs. improvement) is scheduler-specific and left to the user. This deliberately diverges from the gauntlet stats comparison in crate::stats, which DOES classify each metric (Finding::kind, CompareReport::{regressions, improvements, unchanged}); ctprof_compare emits no verdict.

Structs

AffinitySummary: CPU-affinity aggregation result.
CompareOptions: Options controlling compare.
CtprofCompareArgs: Arguments for the ktstr ctprof compare subcommand.
CtprofDiff: Full comparison result.
CtprofMetricDef: One metric exposed by the comparison pipeline.
DerivedMetricDef: Definition of a derived metric: a function that consumes the already-aggregated input metrics for a group and produces a single scalar (with its own unit and operator-facing description).
DerivedRow: One row in the derived-metrics table: (matched group, derivation) with the computed scalar from both sides.
DiffRow: One row in the comparison table: (group, metric) pair with aggregated values from both sides.
DisplayOptions: Aggregate display options for the renderer. Plumbed through write_diff as a single struct so that a future option lands in one place instead of growing every signature. The show-side entry point (write_show in src/bin/ktstr.rs) keeps a flatter signature for historical reasons but mirrors the same field semantics: --wrap, --sections, and --metrics reach show through wrap / sections / metrics parameters that share the same helpers (new_wrapped_table, Section::cli_name).
FudgedPair: A pair of cgroup groups fudged together by thread-population overlap. Fudging joins a baseline cgroup to a candidate cgroup when their per-cgroup thread-type sets share enough population (Jaccard similarity ≥ 0.90), so a renamed but otherwise identical scope under a shifted path is rejoined for diffing instead of surfacing as two separate orphans.
GroupByOrDefault: Newtype wrapper around GroupBy that defaults to GroupBy::Pcomm. Separate type so CompareOptions::default() does not need to spell out every field.
SortKey: One key in a multi-key --sort-by spec. Names a metric from CTPROF_METRICS or CTPROF_DERIVED_METRICS and the sort direction for that key. Direction defaults to descending (largest delta first), so the most common operator request, “show me the biggest changes first”, is the unmarked form.
ThreadGroup: Aggregated metrics for every thread matched by one group key.
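The FudgedPair threshold can be sketched with plain set Jaccard similarity, |A ∩ B| / |A ∪ B|, over the thread-type names of two unmatched cgroups. This is an assumption-laden sketch: jaccard, should_fudge, and FUDGE_THRESHOLD are hypothetical names, treating two empty sets as dissimilar is a choice made here, and the real implementation may weight the overlap by thread population rather than comparing bare sets.

```rust
use std::collections::HashSet;

/// Threshold quoted in the FudgedPair description.
const FUDGE_THRESHOLD: f64 = 0.90;

/// Jaccard similarity |A ∩ B| / |A ∪ B| of two sets of thread-type names.
/// Two empty sets are treated as dissimilar (0.0), an assumption of this sketch.
fn jaccard(a: &HashSet<&str>, b: &HashSet<&str>) -> f64 {
    let union = a.union(b).count();
    if union == 0 {
        return 0.0;
    }
    a.intersection(b).count() as f64 / union as f64
}

/// Whether an unmatched baseline cgroup and an unmatched candidate cgroup
/// would be joined into one pair under the >= 0.90 rule.
fn should_fudge(base: &HashSet<&str>, cand: &HashSet<&str>) -> bool {
    jaccard(base, cand) >= FUDGE_THRESHOLD
}
```

With ten thread types on one side and nine of them on the other, the similarity is exactly 9/10 and the pair is fudged; dropping one more type (8/10) leaves both cgroups as orphans.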

Enums

AggRule: Aggregation rule for a single metric.
Aggregated: Aggregated metric value for a single super::ThreadGroup.
Column: One column slot in the rendered diff/show table. The renderer iterates the resolved Column vec to build both the header row and each data row, dispatching cell construction per variant. Order in the slice is the rendered order; the renderer never re-sorts.
DerivedValue: Output value of a derived metric.
DisplayFormat: Per-row display layout for write_diff.
GroupBy: Grouping key for the ctprof compare.
ScaleLadder: Closed enumeration of auto-scale ladders driving format dispatch.
Section: One sub-table emitted by write_diff / write_show. --sections filters which sub-tables render: a section not named in the filter is suppressed before its own emission gate (zero-suppression, group-by-cgroup gating, etc.) runs, so a section that would otherwise emit because its data is present stays silent when omitted from the filter.

Statics

CTPROF_DERIVED_METRICS: Registry of derived metrics. Each entry consumes one or more already-aggregated input metrics from CTPROF_METRICS and produces a single scalar with its own unit. See the per-entry doc strings for the formula and kernel-source rationale.
CTPROF_METRICS: Registry of per-thread metrics. Order here is the default display order for rows that have no numeric delta to sort by (ties fall back to registry order). Names are the ASCII short forms used in the capture code, and the long-form display uses the same names, so there is no translation layer.
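One way to get the registry-order fallback that CTPROF_METRICS describes is to put rows in registry order first and then apply a stable sort on delta magnitude, so rows with equal keys keep registry order. The miniature registry and its metric names below are hypothetical (the real registry holds CtprofMetricDef values), and treating a missing delta as zero, so those rows sink to the bottom, is an assumption of this sketch.

```rust
/// Hypothetical miniature registry; only the names matter here.
static METRICS: &[&str] = &["run_time_ns", "wait_time_ns", "nr_switches"];

/// Order (metric, delta) rows by |delta| descending, falling back to registry order.
fn order_rows(mut rows: Vec<(&'static str, Option<i64>)>) -> Vec<&'static str> {
    // Step 1: registry order, so the stable sort below has an order to preserve.
    rows.sort_by_key(|(name, _)| {
        METRICS.iter().position(|m| m == name).unwrap_or(usize::MAX)
    });
    // Step 2: `sort_by` is stable, so equal magnitudes keep registry order.
    rows.sort_by(|a, b| {
        let ka = a.1.map(|d| d.unsigned_abs()).unwrap_or(0);
        let kb = b.1.map(|d| d.unsigned_abs()).unwrap_or(0);
        kb.cmp(&ka)
    });
    rows.into_iter().map(|(name, _)| name).collect()
}
```

Relying on sort stability keeps the tie-break implicit: no secondary key has to be spelled out, and adding a metric to the registry automatically places it in the fallback order.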

Functions

aggregate
build_cgroup_key_map: Build the post-flatten-path → final-tightened-key map for GroupBy::Cgroup under auto-normalization. Walks the union of paths from both snapshots’ threads and cgroup_stats so that Layer 3 (tighten) sees every contributor to a given Layer-2 skeleton group. Returns the map keyed by post-flatten path; consumers (build_groups, flatten_cgroup_stats) look up the final key for any path they see.
build_groups
cgroup_cell: Render a (baseline, candidate, delta) cell for the cgroup-enrichment secondary table emitted under super::GroupBy::Cgroup. The ladder parameter routes each scalar through auto_scale (private to this module) so a 7.5 GiB memory_current row reads 7.500GiB → 8.250GiB (+768.000MiB) instead of 8053063680 → 8858370048 (+805306368). Each cell scales independently — baseline, candidate, and delta may pick different prefixes when their magnitudes cross thresholds.
cgroup_limits_cell: Render a baseline → candidate cell for cpu.max (quota, period) pairs. When both pairs are equal, renders once via format_cpu_max; otherwise renders as <a> → <b>. Mirrors cgroup_optional_limit_cell’s equality-collapse policy.
cgroup_optional_limit_cell: Render a baseline → candidate cell for an Option<u64> LIMIT (e.g. memory.max, memory.high, pids.max). None reads as max (no limit) per format_optional_limit; a step from concrete to max between snapshots renders as <value> → max.
collect_smaps_rollup: Walk a snapshot’s threads and pull the non-empty smaps_rollup maps off the leader threads (tid == tgid); per the leader-dedup contract, non-leader threads carry an empty map.
collect_smaps_rollup_hierarchical
color_derived_cells: Wrap a string-cell row in comfy_table::Cell values with a blue foreground so derived-metric rows render visually distinct from the per-thread primary table when stdout is a TTY. Operators scanning a long compare or show output can locate the ## Derived metrics rows at a glance instead of relying on the section header alone.
color_diff_cell: Color a diff-table cell based on its column type, the sign of the row’s raw delta (which selects the color), and the row’s delta_pct (whose value is compared against the bold threshold). Delta/% cells: yellow for positive (increase), magenta for negative (decrease). Uptime: green/yellow/red gradient. Other columns: default.
colored_header: Build a colored header row — cyan foreground so headers are visually distinct from data rows.
colored_header_with_sort
compare: Compare two snapshots and produce a CtprofDiff. Sequences the comparison phases in data-flow order: build the per-side thread groups, emit matched + one-sided rows, fudge one-sided cgroups together (producing the fudged_key_pairs consumed by the uptime and enrichment phases), sort the one-sided lists, fill uptime%, order the rows, and attach enrichment. The fudged-pair (bkey, ckey) threading is the only cross-phase data dependency: built by apply_cgroup_fudge, read by fill_uptime_pct and attach_enrichment.
compile_flatten_patterns
flatten_cgroup_path: Collapse the dynamic segments of a cgroup path according to every pattern in patterns. A pattern is a glob matched with glob’s default MatchOptions (require_literal_separator = false), so * is NOT segment-bounded: it matches across / just like **. The literal portions are preserved and the wildcard portions are replaced with the wildcard token itself. Example: the pattern /kubepods/*/workload applied to /kubepods/pod-abc/workload produces /kubepods/*/workload, so two runs with different pod IDs collapse onto the same key.
flatten_cgroup_stats
format_cpu_max: Render a cpu.max pair as <quota>/<period>, where quota is either max (no cap) or the auto-scaled µs value. Period is always present (default 100_000 µs per default_bw_period_us() at kernel/sched/sched.h:441). The <quota>/<period> separator is THIS crate’s display convention: the kernel itself emits raw, space-separated integers in cat cpu.max with no auto-scaling. We auto-scale via format_scaled_u64 for human-friendly output, which also widens the visual delimiter from the kernel’s space to a slash.
format_derived_delta_cell: Format the signed delta cell for a derived row. Mirrors format_derived_value_cell but always carries an explicit +/- sign so the operator can read directionality at a glance. Ratios render with three decimals (+0.100 is +10pp); other ladders route through auto_scale and pick up the scaled unit suffix.
format_derived_value_cell: Format a derived-metric value cell for the ## Derived metrics table. Ratio rows (is_ratio: true, ScaleLadder::None) render with three decimals (0.873); ns / B / ticks ladders route through the same auto-scale ladder as the main table. Negative values (e.g. a negative live_heap_estimate) carry their explicit minus sign through the format.
format_optional_limit: Render an Option<u64> cgroup limit as either max (no limit / kernel emitted the literal max token) or the auto-scaled value. Used for memory.max, memory.high, pids.max. Mirrors the kernel’s own display: cat memory.max prints max when no cap is set, a u64 byte count otherwise.
format_psi_avg_cell: Render a baseline→candidate→delta cell for a PSI average field. baseline and candidate are centi-percent (0..=10000, covering 0.00..=100.00 %); the cell renders each as N.NN% and computes a signed delta (+|-D.DD%). Mirrors cgroup_cell’s structure but does NOT route through the auto-scale ladder: a pressure percentage is dimensionless and capped at 100, so there is nothing to scale.
format_psi_avg_centi_percent: Convert a centi-percent value (0..=10000) to its display form N.NN%. The centi-percent representation is 1:1 with the kernel’s LOAD_INT.LOAD_FRAC 2-decimal-digit emission at kernel/sched/psi.c:1284 — preserve that precision on display.
format_scaled_u64: Auto-scale a u64 value on the given ladder and render it as a cell. Helper for format_value_cell, whose Sum and Max arms share exactly this logic. Also used by the ctprof show renderer for the cgroup-stats secondary table, where each scalar stands alone (no baseline/candidate pair to fold into a delta cell).
format_value_cell: Format a per-row baseline / candidate cell for super::write_diff. Numeric aggregates (Aggregated::Sum / Aggregated::Max) run through auto_scale so large values render in a readable magnitude (1.235ms instead of 1234567ns). When the scaled unit equals the ladder’s base unit (no step-up was triggered), the original integer value is rendered verbatim, which keeps small numbers free of a spurious .000 suffix. Non-numeric aggregates (OrdinalRange, Mode, Affinity) fall through unchanged to the Aggregated std::fmt::Display impl because no scaling applies; the ladder is ScaleLadder::None for these and the suffix is empty.
limit_sections: Truncate each ## <heading> section to at most limit lines. Sections are delimited by lines that start with ## (two hashes followed by a space). Content before the first section header passes through untruncated (typically the file-path header row).
metric_display_name: Borrow the metric’s bare name from the registry. The &'static str lifetime piggybacks on CtprofMetricDef::name’s static-string storage — callers may borrow the static name without allocation; render sites that need owned Strings allocate at the table-cell boundary (see super::render at the metric_display_name(metric_def).to_string() call site and super::runner::write_metric_list).
metric_tags: Render a metric’s bracketed gating tags as a single space-separated string. Returns the empty string when sched_class is None, is_dead is false, AND config_gates is empty.
parse_columns: Parse a CLI --columns spec into a typed Column vec. Format: comma-separated names matching Column::cli_name. Whitespace around each name is trimmed. Empty input parses to an empty Vec, and the caller then falls back to the format default.
parse_metrics: Parse a CLI --metrics spec into a typed Vec<&'static str> of registry names. Format: comma-separated names that must each match a name field from either CTPROF_METRICS or CTPROF_DERIVED_METRICS. Whitespace around each name is trimmed. Empty input parses to an empty Vec, which the caller treats as “every metric renders” via DisplayOptions::is_metric_enabled, mirroring parse_sections’s empty-input semantics.
parse_sections: Parse a CLI --sections spec into a typed Section vec. Format: comma-separated names matching Section::cli_name. Whitespace around each name is trimmed. Empty input parses to an empty Vec, which the caller treats as “every section renders” via DisplayOptions::is_section_enabled.
parse_sort_by: Parse a --sort-by CLI value into a list of SortKeys. Spec format: metric1[:dir1],metric2[:dir2],... where each metric is a name from CTPROF_METRICS or CTPROF_DERIVED_METRICS and dir is asc or desc, matched case-insensitively (:DESC, :Asc, and :asc all work). Direction defaults to desc (largest delta first), matching the operator default of “show me the largest changes”.
pattern_display_label: Compute the operator-facing display label for a pattern-aware group, given the union of baseline+candidate member comms. For buckets with ≥ 2 distinct member names, runs grex over the sorted union to emit a regex that exactly matches the constituent thread names. For singleton or all-identical buckets, returns the join key unchanged so the rendered label equals what would have shown under literal grouping.
pattern_key: Compute the token-normalized skeleton for a name string.
print_diff: Render CtprofDiff as a table on stdout. Thin wrapper over write_diff so the non-test caller keeps the ergonomics of a one-line call; tests drive write_diff into a String buffer.
print_metric_list: Print the metric-list discovery output to stdout. Thin wrapper over write_metric_list so the CLI keeps the one-line call ergonomics; tests drive the writer into a String buffer.
run_compare: Entry point for the compare CLI. Parses --sort-by first, then loads both snapshots, computes the diff, prints the table, and returns 0 on success. Exits non-zero only on I/O or parse errors; a non-empty diff is data, not a failure.
run_metric_list: Entry point for the ctprof metric-list subcommand. Always returns Ok(0) — discovery output is informational and never fails.
warn_cgroup_only_sections_under_non_cgroup: Emit a stderr warning when an explicit --sections filter names a cgroup-only section while --group-by is not GroupBy::Cgroup. Without the warning, the section would silently render zero rows (its outer-gate suppresses it), leaving the operator wondering whether their snapshot lacked the data or their flag was misconfigured.
write_diff: Render CtprofDiff into w. The formatter layer lives here so tests can inspect exactly what print_diff would emit without capturing stdout. Write errors propagate as std::fmt::Error; callers writing into an infallible sink (String) can unwrap or ignore them.
write_metric_list: Render the metric-list discovery output: a tag legend (sched_class / config_gates / [dead]) followed by a per-metric table whose rows show name | tags | description. Tag legend is keyed off the closed-set vocabulary the registry pin test guards (registry_tag_vocabulary_is_closed), so adding a new allowed class or gate fails the test until both the legend and the closed-set table are updated together.
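The auto-scale behavior described for cgroup_cell and format_value_cell can be approximated as follows. Ladder, BYTES, NANOS, auto_scale, and delta_cell are hypothetical stand-ins (the real ladders are the ScaleLadder enum, and the module's auto_scale is private); the sketch steps by 1024 for bytes and 1000 for nanoseconds, prints the bare integer when no step-up happens, and scales baseline, candidate, and delta independently.

```rust
/// Hypothetical unit ladder: each step divides by `step` and moves to the next suffix.
struct Ladder {
    step: f64,
    units: &'static [&'static str],
}

const BYTES: Ladder = Ladder { step: 1024.0, units: &["B", "KiB", "MiB", "GiB", "TiB"] };
const NANOS: Ladder = Ladder { step: 1000.0, units: &["ns", "µs", "ms", "s"] };

/// Auto-scale `v` along `ladder`. With no step-up the integer is printed
/// verbatim (no `.000` suffix); otherwise three decimals are kept.
fn auto_scale(v: u64, ladder: &Ladder) -> String {
    let mut x = v as f64;
    let mut i = 0;
    while x >= ladder.step && i + 1 < ladder.units.len() {
        x /= ladder.step;
        i += 1;
    }
    if i == 0 {
        format!("{v}{}", ladder.units[0])
    } else {
        format!("{x:.3}{}", ladder.units[i])
    }
}

/// `baseline → candidate (±delta)`, each scalar picking its own prefix.
fn delta_cell(baseline: u64, candidate: u64, ladder: &Ladder) -> String {
    let sign = if candidate >= baseline { '+' } else { '-' };
    let delta = candidate.abs_diff(baseline);
    format!(
        "{} → {} ({sign}{})",
        auto_scale(baseline, ladder),
        auto_scale(candidate, ladder),
        auto_scale(delta, ladder)
    )
}
```

With these definitions, delta_cell(8053063680, 8858370048, &BYTES) yields 7.500GiB → 8.250GiB (+768.000MiB) and auto_scale(1234567, &NANOS) yields 1.235ms, matching the examples in the cgroup_cell and format_value_cell descriptions.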
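The max handling and equality collapse described for format_optional_limit, cgroup_optional_limit_cell, and format_cpu_max can be sketched with std only. limit_text, limit_cell, and cpu_max_display are hypothetical names, and auto-scaling of concrete values is left out for brevity; the only kernel behavior relied on is that a cgroup v2 cpu.max file reads as "<quota> <period>", with the literal max when there is no quota.

```rust
/// None means "no limit", which the kernel prints as the literal `max`.
/// Concrete values are printed raw here (no auto-scaling in this sketch).
fn limit_text(v: Option<u64>) -> String {
    v.map_or_else(|| "max".to_string(), |n| n.to_string())
}

/// Equal sides collapse to a single value; otherwise render `<a> → <b>`.
fn limit_cell(baseline: Option<u64>, candidate: Option<u64>) -> String {
    if baseline == candidate {
        limit_text(baseline)
    } else {
        format!("{} → {}", limit_text(baseline), limit_text(candidate))
    }
}

/// Parse a raw cgroup v2 `cpu.max` line ("<quota> <period>", quota may be
/// `max`) and re-render it with the `<quota>/<period>` display separator.
fn cpu_max_display(line: &str) -> Option<String> {
    let mut parts = line.split_whitespace();
    let quota = match parts.next()? {
        "max" => None,
        q => Some(q.parse::<u64>().ok()?),
    };
    let period: u64 = parts.next()?.parse().ok()?;
    Some(format!("{}/{}", limit_text(quota), period))
}
```

Representing "no limit" as None rather than u64::MAX keeps the comparison honest: a step from a concrete cap to no cap renders as <value> → max instead of as a huge numeric jump.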
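For flatten_cgroup_path, the behavior that matters is that * crosses / when require_literal_separator is false. The std-only sketch below hand-rolls a matcher that supports only * (no ?, character classes, or escapes) instead of depending on the glob crate, and it assumes that a pattern must match the whole path, with the first matching pattern winning; glob_match and flatten_path are hypothetical names. Under those assumptions a full match keeps the literals and turns each wildcard span back into *, so the flattened key is simply the pattern text.

```rust
/// Minimal wildcard matcher: `*` matches any run of bytes, `/` included.
fn glob_match(pat: &[u8], text: &[u8]) -> bool {
    let (mut p, mut t) = (0usize, 0usize);
    // Index of the last `*` seen, and the text offset it is currently tried at.
    let (mut star, mut mark) = (None::<usize>, 0usize);
    while t < text.len() {
        if p < pat.len() && pat[p] == b'*' {
            star = Some(p);
            mark = t;
            p += 1;
        } else if p < pat.len() && pat[p] == text[t] {
            p += 1;
            t += 1;
        } else if let Some(s) = star {
            // Backtrack: let the last `*` swallow one more byte.
            p = s + 1;
            mark += 1;
            t = mark;
        } else {
            return false;
        }
    }
    // Trailing `*`s can match the empty string.
    while p < pat.len() && pat[p] == b'*' {
        p += 1;
    }
    p == pat.len()
}

/// Replace `path` with the first pattern that matches all of it; leave it
/// unchanged when no pattern matches.
fn flatten_path(path: &str, patterns: &[&str]) -> String {
    patterns
        .iter()
        .find(|p| glob_match(p.as_bytes(), path.as_bytes()))
        .map(|p| p.to_string())
        .unwrap_or_else(|| path.to_string())
}
```

Because * is not segment-bounded, /kubepods/*/workload also absorbs a deeper path such as /kubepods/burstable/pod-abc/workload; a pattern author who wants one-segment matching has to write a narrower pattern.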
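The centi-percent rendering used by format_psi_avg_centi_percent and format_psi_avg_cell can be done with integer division and remainder, which preserves both decimal digits exactly with no floating-point round trip. centi_percent and centi_percent_delta are hypothetical names, and printing +0.00% for an unchanged value is a choice of this sketch rather than something the module's descriptions specify.

```rust
/// Render a centi-percent value (0..=10000) as N.NN% using integer math.
fn centi_percent(v: u64) -> String {
    format!("{}.{:02}%", v / 100, v % 100)
}

/// Signed delta between two centi-percent values, always carrying + or -.
/// An unchanged value renders as +0.00% in this sketch.
fn centi_percent_delta(baseline: u64, candidate: u64) -> String {
    let sign = if candidate >= baseline { '+' } else { '-' };
    let d = candidate.abs_diff(baseline);
    format!("{sign}{}.{:02}%", d / 100, d % 100)
}
```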
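A minimal sketch of the limit_sections truncation rule follows. It assumes, which the description does not settle, that each ## header line is always kept and that limit counts only the lines below it; truncate_sections is a hypothetical reimplementation, not the module's code.

```rust
/// Keep at most `limit` body lines under each "## " header. Lines before the
/// first header pass through untouched.
fn truncate_sections(text: &str, limit: usize) -> String {
    let mut out = String::new();
    let mut in_section = false;
    let mut kept = 0usize;
    for line in text.lines() {
        if line.starts_with("## ") {
            // A new section starts: always keep its header and reset the count.
            in_section = true;
            kept = 0;
            out.push_str(line);
            out.push('\n');
        } else if !in_section || kept < limit {
            if in_section {
                kept += 1;
            }
            out.push_str(line);
            out.push('\n');
        }
    }
    out
}
```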
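The parse pattern shared by parse_columns, parse_metrics, and parse_sections (comma split, trim, empty input meaning "no filter") and the empty-means-everything check behind DisplayOptions::is_section_enabled can be sketched as below. The Section enum here is a hypothetical stand-in with invented variants and CLI names, and parse_section_list and section_enabled are hypothetical helpers; only the parsing and filtering logic follows the descriptions.

```rust
/// Hypothetical stand-in for the real Section enum (variants and names invented).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Section {
    Primary,
    Derived,
    CgroupStats,
}

impl Section {
    const ALL: [Section; 3] = [Section::Primary, Section::Derived, Section::CgroupStats];

    fn cli_name(self) -> &'static str {
        match self {
            Section::Primary => "primary",
            Section::Derived => "derived",
            Section::CgroupStats => "cgroup-stats",
        }
    }
}

/// Comma-separated, whitespace-trimmed names; empty input yields an empty Vec.
fn parse_section_list(spec: &str) -> Result<Vec<Section>, String> {
    if spec.trim().is_empty() {
        return Ok(Vec::new());
    }
    spec.split(',')
        .map(str::trim)
        .map(|name| {
            Section::ALL
                .iter()
                .copied()
                .find(|s| s.cli_name() == name)
                .ok_or_else(|| format!("unknown section '{name}'"))
        })
        .collect()
}

/// An empty filter means "render every section".
fn section_enabled(filter: &[Section], s: Section) -> bool {
    filter.is_empty() || filter.contains(&s)
}
```

Encoding "no filter" as an empty Vec, rather than as a full list, means a newly added section renders by default without anyone touching the parser.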
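The --sort-by grammar described for parse_sort_by can be parsed with split, trim, and split_once. In this sketch, SortSpec, parse_sort_spec, and the known slice (standing in for the CTPROF_METRICS and CTPROF_DERIVED_METRICS name lists) are hypothetical, the metric names used with it are invented, and silently skipping empty comma segments is an assumption.

```rust
/// Hypothetical mirror of SortKey: a metric name plus a direction.
#[derive(Debug, PartialEq)]
struct SortSpec {
    metric: String,
    descending: bool,
}

/// Parse `metric1[:dir1],metric2[:dir2],...`; direction defaults to desc.
fn parse_sort_spec(spec: &str, known: &[&str]) -> Result<Vec<SortSpec>, String> {
    let mut keys = Vec::new();
    for raw in spec.split(',').map(str::trim).filter(|s| !s.is_empty()) {
        let (name, dir) = match raw.split_once(':') {
            Some((n, d)) => (n.trim(), Some(d.trim())),
            None => (raw, None),
        };
        if !known.contains(&name) {
            return Err(format!("unknown metric '{name}'"));
        }
        // Case-insensitive direction: ASC, Asc, and asc all parse the same way.
        let descending = match dir.map(str::to_ascii_lowercase).as_deref() {
            None | Some("desc") => true,
            Some("asc") => false,
            Some(other) => return Err(format!("bad direction '{other}' for '{name}'")),
        };
        keys.push(SortSpec { metric: name.to_string(), descending });
    }
    Ok(keys)
}
```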