pub const EMBEDDED_KCONFIG: &str = "# -- Architecture defensives (x86_64) --\n# CONFIG_X86_FRED=y kernels skip idt_syscall_init (arch/x86/kernel/cpu/common.c:2303-2304)\n# and never write MSR_LSTAR. ktstr\'s virt-KASLR derivation reads MSR_LSTAR to\n# recover the runtime KVA of entry_SYSCALL_64 \u{2014} FRED would silently break that\n# path and corrupt every per-CPU read on KASLR-enabled boots. Locked off here\n# as a belt-and-suspenders measure since FRED is gaining adoption upstream and\n# an inherited defconfig flip would silently invalidate ktstr\'s KASLR fix.\n# CONFIG_X86_FRED is not set\n\n# CONFIG_RANDOMIZE_BASE=y forces virt-KASLR on for every ktstr boot. The\n# `kaslr_offset_nonzero_post_boot` e2e test (tests/kaslr_axis_e2e.rs) depends\n# on a non-zero KASLR offset to exercise the freeze_coord \u{2192} compute_rq_pas\n# wire-in; an inherited defconfig that flipped this off would silently\n# degrade the test into a tautology (kaslr=0 makes the buggy and fixed\n# paths agree on the same PA). Pinning here keeps the assertion meaningful.\nCONFIG_RANDOMIZE_BASE=y\n\n# CONFIG_RANDOMIZE_MEMORY=y randomizes `page_offset_base`, `vmalloc_base`,\n# and `vmemmap_base` at boot \u{2014} independent of the kernel-image text/data\n# slide which CONFIG_RANDOMIZE_BASE=y above controls. The host resolves the\n# runtime `page_offset_base` from the `page_offset_base` symbol\n# (`monitor::symbols`, via `symbols::page_offset_base_kva` and\n# `kva_to_pa` / `resolve_page_offset`), falling back to `DEFAULT_PAGE_OFFSET`\n# when the symbol is absent (aarch64) or unreadable. The guest also reads\n# the runtime KVA from `/proc/kallsyms`\n# (`guest_comms::read_kernel_page_offset_base_from_kallsyms`, called from\n# `src/vmm/rust_init/init.rs`) and ships it in `KernAddrs.page_offset_base`\n# (`src/vmm/wire.rs`), but the host KERN_ADDRS handler\n# (`src/vmm/freeze_coord/dispatch.rs`) currently consumes only `phys_base`\n# and `kernel_text_runtime_kva` \u{2014} that wire slot is unused. 
Operators opt\n# out with `kargs = [\"nokaslr\"]` on the scheduler decl or\n# `#[ktstr_test(kaslr = false, ...)]` per test. The\n# `kaslr_page_offset_derivation_nonzero` e2e test guards the runtime value:\n# `page_offset_base` must lie in `[default_page_offset, VADDR_END)` and be\n# PUD-aligned (1 GiB) relative to `default_page_offset` \u{2014} the L4 base\n# `0xffff_8880_0000_0000` or the L5 base `0xff11_0000_0000_0000` depending\n# on paging mode (per arch/x86/mm/kaslr.c::kernel_randomize_memory). The\n# ~1/30000 slot-0 outcome (`page_offset_base == default_page_offset`) only\n# warns; it does not fail. Pinning =y here so an inherited defconfig flip\n# can\'t silently degrade the test back into a tautology where the buggy\n# and fixed direct-map paths agree on the compile-time default.\nCONFIG_RANDOMIZE_MEMORY=y\n\n# CONFIG_GCC_PLUGIN_RANDSTRUCT=n keeps `struct task_struct` field order\n# deterministic across builds. The TaskField cold-path Op handler\n# (`src/vmm/freeze_coord/kernel_op_dispatch.rs`) resolves field byte\n# offsets via the live vmlinux\'s BTF at dispatch time, so RANDSTRUCT\n# layouts ARE correctly handled when BTF is present. This pin is\n# defense-in-depth: a kernel built without BTF coverage for `task_struct`\n# (extremely stripped vmlinux) combined with RANDSTRUCT would silently\n# read garbage at the wrong byte offsets \u{2014} the TaskField dispatcher\'s\n# 8-layer validation would catch obvious corruption (pid mismatch,\n# start_boottime=0) but a fortuitous match could land an undetected\n# write. Disabling RANDSTRUCT removes the dependence on BTF coverage\n# for the task_struct subset that TaskField writes.\n# CONFIG_GCC_PLUGIN_RANDSTRUCT is not set\n\n# -- sched_ext + BPF (required) --\n# sched_ext scheduling class. 
Depends on BPF_SYSCALL + BPF_JIT + DEBUG_INFO_BTF.\nCONFIG_BPF=y\nCONFIG_BPF_SYSCALL=y\nCONFIG_BPF_JIT=y\nCONFIG_BPF_JIT_ALWAYS_ON=y\nCONFIG_SCHED_CLASS_EXT=y\n\n# -- Debug info (required) --\n# BTF for monitor struct resolution, BPF CO-RE, probe output.\n# DWARF for source locations in probe output.\n# DEBUG_INFO_REDUCED breaks BTF generation \u{2014} must stay off.\nCONFIG_DEBUG_INFO=y\nCONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y\nCONFIG_DEBUG_INFO_BTF=y\n# Module BTF embeds BTF into .ko files. ktstr runs a monolithic\n# kernel with no loadable modules \u{2014} the scheduler is a BPF program,\n# not a .ko. Disable to avoid resolve_btfids/pahole crashes during\n# modfinal on some arm64 toolchains.\n# CONFIG_DEBUG_INFO_BTF_MODULES is not set\n# CONFIG_DEBUG_INFO_REDUCED is not set\n\n# -- Tracing/probes (required) --\n# Kprobes for ktstr probe pipeline. ftrace for dynamic attachment\n# and scx scheduler tracing (e.g. LAVD futex tracing).\n# PERF_EVENTS: explicit dep of BPF_EVENTS (provided by defconfig\'s\n# PROFILING=y, but stated here so the tracing section is self-contained).\n# FTRACE: top-level gate for all tracing infrastructure. arm64 defconfig\n# disables it \u{2014} without FTRACE=y, KPROBE_EVENTS, BPF_EVENTS, and\n# FUNCTION_TRACER are silently dropped (inside `if FTRACE` in Kconfig).\nCONFIG_FTRACE=y\nCONFIG_KPROBES=y\nCONFIG_KPROBE_EVENTS=y\nCONFIG_PERF_EVENTS=y\n# HW_PERF_EVENTS: hardware PMU sampling. On arm64 this is def_bool y\n# guarded on ARM_PMU (drivers/perf/Kconfig); declaring it explicitly\n# makes the dependency visible. Required for sched_ext schedulers\n# (scx_layered, scx_cosmos) that read perf counters via BPF kfuncs.\n# x86 builds ignore this symbol (x86 PMU support is unconditional via\n# CONFIG_PERF_EVENTS above); the symbol is defined only in\n# arch/arm64/Kconfig.\nCONFIG_HW_PERF_EVENTS=y\n# ARM_PMUV3: arm64 PMUv3 driver (drivers/perf/arm_pmuv3.c). 
bool with\n# `default ARM64` and `depends on HW_PERF_EVENTS && ((ARM && CPU_V7) ||\n# ARM64)` (drivers/perf/Kconfig). On arm64 it would be picked up by\n# defconfig anyway; declaring it explicitly pins the PMU driver in the\n# kconfig fragment so future merge_config runs cannot silently drop it\n# if the dependency chain changes. x86 builds ignore this symbol \u{2014} it\n# is gated on ARM/ARM64 in drivers/perf/Kconfig.\nCONFIG_ARM_PMUV3=y\n# BPF_EVENTS: default y when deps met (KPROBE_EVENTS + PERF_EVENTS).\n# Explicit for clarity.\nCONFIG_BPF_EVENTS=y\nCONFIG_DEBUG_FS=y\nCONFIG_FUNCTION_TRACER=y\nCONFIG_DYNAMIC_FTRACE=y\n# Preempt-disabled duration capture (#64).\n# tp_btf/preempt_disable + preempt_enable handlers in probe.bpf.c\n# attach to the trace_preempt_off / trace_preempt_on tracepoints\n# defined in include/trace/events/preemptirq.h. These tracepoints\n# are emitted from kernel/trace/trace_preemptirq.c only when\n# CONFIG_TRACE_PREEMPT_TOGGLE is set; the option in turn depends\n# on PREEMPT_TRACER (Kconfig under kernel/trace/Kconfig).\n# Without these, the BPF program loads but the two tp_btf attaches\n# silently fail and per-CPU max_ns stays at 0 \u{2014} graceful degradation\n# matching other optional tp_btf attaches in the probe.\nCONFIG_PREEMPT_TRACER=y\nCONFIG_TRACE_PREEMPT_TOGGLE=y\n\n# -- Boot/console (required) --\n# Init script mounts /proc (reads /proc/cmdline for SHM_BASE),\n# /sys (sysfs for cgroups, tracing), and reads /dev/mem for\n# SHM dump polling.\n#\n# Serial: x86 VMM uses ISA 16550 (COM1/COM2), arm64 uses ns16550a via MMIO.\n# Both drivers are included so the same fragment works on either arch.\nCONFIG_SERIAL_8250=y\nCONFIG_SERIAL_8250_CONSOLE=y\n# FDT-based probing for 8250 ports \u{2014} required on arm64 where the\n# VMM\'s ns16550a UARTs are described in the device tree. 
Without\n# this, the kernel has no driver to match the ns16550a FDT nodes\n# and /dev/ttyS* devices are never created.\nCONFIG_SERIAL_OF_PLATFORM=y\nCONFIG_SERIAL_AMBA_PL011=y\nCONFIG_SERIAL_AMBA_PL011_CONSOLE=y\nCONFIG_BLK_DEV_INITRD=y\nCONFIG_RD_LZ4=y\nCONFIG_TTY=y\nCONFIG_UNIX98_PTYS=y\nCONFIG_PROC_FS=y\nCONFIG_SYSFS=y\nCONFIG_DEVMEM=y\n# CONFIG_STRICT_DEVMEM is not set\nCONFIG_DEVTMPFS=y\nCONFIG_TMPFS=y\nCONFIG_SHMEM=y\nCONFIG_BINFMT_SCRIPT=y\n# x86 only \u{2014} arm64 uses earlycon (auto-selected by PL011/8250 drivers).\nCONFIG_EARLY_PRINTK=y\n\n# -- Virtio device support --\n# VIRTIO + VIRTIO_MMIO are the transport layer the VMM exposes; both\n# are required for any virtio device (console, block, or net) to\n# probe. VIRTIO_MMIO_CMDLINE_DEVICES lets the kernel pick up the\n# `virtio_mmio.device=...` kernel cmdline entries the VMM appends so\n# devices land in deterministic probe order. The console device\n# powers /dev/hvc0 for shell mode; the block device is used by tests\n# that need a real disk (DiskConfig); the net device is used by\n# tests that need a NIC and is also a future-proofing knob (#20).\nCONFIG_VIRTIO=y\nCONFIG_VIRTIO_MMIO=y\nCONFIG_VIRTIO_MMIO_CMDLINE_DEVICES=y\nCONFIG_VIRTIO_CONSOLE=y\nCONFIG_VIRTIO_BLK=y\nCONFIG_VIRTIO_NET=y\n\n# -- PCI bus (virtio-PCI transport) --\n# The virtio-PCI transport exposes NICs (and future PCI devices) via a\n# host bridge at 00:00.0 plus an ECAM/CAM config space. PCI must be\n# compiled in for the guest to enumerate ANY of it; the per-VM\n# `pci=off` cmdline (appended when a test does not request PCI) gates\n# scanning at runtime, so a PCI-capable kernel is harmless for non-PCI\n# tests. x86_64_defconfig sets CONFIG_PCI=y, but pin it here so an\n# inherited-defconfig flip or an --extra-kconfig that strips it cannot\n# silently disable the transport \u{2014} the failure mode would be a guest\n# with no /sys/bus/pci and every PCI device invisible. 
PCI_MMCONFIG\n# backs ECAM (extended config space, reg >= 256) which the VMM\'s MCFG\n# table and _SB.PCI0 DSDT window describe; it is `default y` on x86_64\n# when PCI + ACPI are set (arch/x86/Kconfig:2918) and pinned here for\n# the same defense-in-depth reason (base config still works CAM-only if\n# it is dropped). Both are x86 symbols; arm64 PCI uses\n# PCI_HOST_GENERIC, a later increment when arm64 virtio-PCI lands.\n#\n# ACPI is the linchpin of the x86 PCI transport: the guest parses the MCFG\n# (ECAM base), the _SB.PCI0 DSDT _PRT (PCI INTx -> GSI routing), and the\n# FADT PM register blocks only when ACPI is enabled. x86_64_defconfig sets\n# CONFIG_ACPI=y; pin it here for the same defense-in-depth reason \u{2014} if an\n# inherited-defconfig flip or --extra-kconfig drops it, PCI_MMCONFIG\n# (default y only on PCI && ACPI) silently drops too (ECAM gone) and PCI\n# INTx falls back to legacy MP-table routing with no NIC entry\n# (pci_dev->irq==0 -> vp_find_vqs_intx request_irq fails -> virtnet_probe\n# -EINVAL) \u{2014} the exact failure the FADT PM blocks + _PRT exist to fix, but\n# now silent. arm64 discovers devices via DT, so a compiled-in ACPI is\n# harmless there (its defconfig also sets =y); this pin is x86\'s foundation.\nCONFIG_ACPI=y\nCONFIG_PCI=y\nCONFIG_PCI_MMCONFIG=y\n# VIRTIO_PCI is the guest DRIVER that binds virtio devices on the PCI\n# transport: the virtio-net NIC enumerates as a PCI function, and only\n# this driver attaches to it (CONFIG_PCI is the bus; CONFIG_VIRTIO_NET\n# is the transport-agnostic net driver). Without it the guest sees the\n# PCI device but nothing binds, so the NIC\'s eth* interface never\n# appears. Pinned so an inherited-defconfig flip or an --extra-kconfig\n# strip surfaces as a build error rather than a silent in-guest bind\n# failure.\nCONFIG_VIRTIO_PCI=y\n\n# -- Networking --\n# PACKET: AF_PACKET sockets (net/packet/Kconfig:6 \u{2014} \"communicate directly\n# with network devices ... e.g. 
tcpdump\"), the raw-frame device-level TX/RX\n# path rather than the kernel-internal IP layer. PACKET has no `depends on`\n# of its own (it sits under `if NET`); NET stays on because both arch\n# defconfigs set CONFIG_NET=y. Both defconfigs also already set\n# CONFIG_PACKET=y, so this line is a defense-in-depth pin (like the\n# RANDOMIZE_* pins above), not a first-time enable: it keeps an\n# inherited-defconfig flip from silently dropping the symbol the wide-SMP\n# net-IRQ e2e (tests/wide_smp_net_irq_e2e.rs) relies on (it opens an\n# AF_PACKET raw socket).\nCONFIG_PACKET=y\n\n# -- Filesystems --\n# Btrfs: optional, for DiskConfig::Btrfs when the template-VM\n# lifecycle lands. Requires crypto deps (CRC32, BLAKE2B, SHA256,\n# ZLIB); if deps are not met, make olddefconfig silently drops it.\n# Not in VALIDATE_CONFIG_CRITICAL \u{2014} v0 uses Filesystem::Raw.\nCONFIG_BTRFS_FS=y\n\n# -- Topology (required) --\n# Multi-socket/core/thread VMs with NUMA.\n# On x86_64, X86_64_ACPI_NUMA is def_bool y when NUMA + ACPI + PCI\n# and selects ACPI_NUMA automatically.\nCONFIG_SMP=y\n# CONFIG_X86_X2APIC=y: x86 guests with >254 vCPUs need x2APIC (32-bit\n# APIC IDs). Legacy xAPIC has only 255 8-bit APIC IDs and the IOAPIC\n# consumes one, capping plain-xAPIC SMP at 254. Without this,\n# MAX_LOCAL_APIC stays 256 (arch/x86/include/asm/apicdef.h) and\n# acpi_parse_x2apic ignores every MADT x2APIC (type-9) entry\n# (arch/x86/kernel/acpi/boot.c \u{2014} its body is under #ifdef\n# CONFIG_X86_X2APIC), so no CPU with APIC ID >255 can be registered or\n# onlined. x86-only; arm64 ignores the symbol (GICv3 handles wide SMP).\nCONFIG_X86_X2APIC=y\n# CONFIG_NR_CPUS bounds the kernel\'s compile-time max CPU count. 
The\n# x86_64 default is 64 \u{2014} far below the wide-SMP topologies ktstr exercises.\n# x86_64 caps the Kconfig range at 512 without CONFIG_MAXSMP\n# (arch/x86/Kconfig: NR_CPUS_RANGE_END=512 unless MAXSMP selects\n# CPUMASK_OFFSTACK); an out-of-range value is silently dropped back to the\n# default 64. 512 is therefore the max settable without MAXSMP, which we\n# avoid because it force-selects DEBUG_KERNEL (can perturb scheduling\n# timing \u{2014} unwanted in a scheduler test tool). 512 covers ktstr\'s wide-SMP\n# guests (the 256-vCPU split-irqchip e2e; the documented 512 ceiling).\nCONFIG_NR_CPUS=512\nCONFIG_NUMA=y\n# HMAT: parses ACPI HMAT for memory tiering (CXL nodes).\n# NUMA_BALANCING: auto-migrates pages to local node based on access.\nCONFIG_ACPI_HMAT=y\nCONFIG_NUMA_BALANCING=y\n\n# -- Cgroups (required) --\n# Scenario cpuset partitioning and CPU controller.\nCONFIG_CGROUPS=y\nCONFIG_CPUSETS=y\nCONFIG_CGROUP_SCHED=y\n\n# -- KVM guest (required) --\n# Steal time accounting: guest scheduler sees actual CPU time\n# (not wall clock including host preemption), producing\n# bare-metal-like scheduling decisions.\n# x86: kvmclock via HYPERVISOR_GUEST + PARAVIRT.\n# arm64: SMCCC PV Time via PARAVIRT (auto-selected by\n# PARAVIRT_TIME_ACCOUNTING). HYPERVISOR_GUEST is\n# x86-only and silently ignored on arm64.\nCONFIG_HYPERVISOR_GUEST=y\nCONFIG_PARAVIRT=y\nCONFIG_PARAVIRT_TIME_ACCOUNTING=y\n\n# -- Scheduler features (safe, no behavioral change) --\n# SCHEDSTATS: runtime-disabled by default (static key NOP).\n# Opt in via sysctl kernel.sched_schedstats=1.\nCONFIG_SCHEDSTATS=y\n# SCHED_MC: multi-core scheduling awareness. Affects task\n# placement across cores within a package.\nCONFIG_SCHED_MC=y\n# TASK_DELAY_ACCT: gates /proc/<tid>/stat field 42\n# (delayacct_blkio_ticks). Without this, the field is hard-coded 0\n# in delayacct_blkio_ticks() (include/linux/delayacct.h). Also\n# `select`s SCHED_INFO. 
Requires runtime opt-in via the `delayacct`\n# kernel cmdline parameter or sysctl kernel.task_delayacct=1\n# (delayacct_on starts at 0 in kernel/delayacct.c).\n# Depends on TASKSTATS, which depends on NET + MULTIUSER. TASKSTATS\n# defaults to n; declare it explicitly so merge_config does not\n# silently drop TASK_DELAY_ACCT.\nCONFIG_TASKSTATS=y\nCONFIG_TASK_DELAY_ACCT=y\n# Symbols for monitor kallsyms resolution.\nCONFIG_KALLSYMS_ALL=y\n# Embedded .config for CONFIG_HZ detection by ktstr monitor.\n# IKCONFIG_PROC exposes /proc/config.gz for runtime inspection.\nCONFIG_IKCONFIG=y\nCONFIG_IKCONFIG_PROC=y\n\n# -- Accounting --\n# TASK_IO_ACCOUNTING: gates /proc/<tid>/io\n# (rchar/wchar/syscr/syscw/read_bytes/write_bytes/cancelled_write_bytes,\n# emitted by do_io_accounting() in fs/proc/base.c). The /proc/<tid>/io\n# file itself is registered under #ifdef CONFIG_TASK_IO_ACCOUNTING.\n# Depends on TASK_XACCT (which depends on TASKSTATS, declared above).\nCONFIG_TASK_XACCT=y\nCONFIG_TASK_IO_ACCOUNTING=y\n# PSI: standard pressure-stall infrastructure. Every major distro\n# ships PSI=y. Per-CPU state machines with deferred averaging \u{2014}\n# minimal scheduling impact. Required for host-level and per-cgroup\n# pressure capture. Active by default when compiled in\n# (PSI_DEFAULT_DISABLED defaults to n); registers\n# /proc/pressure/{io,memory,cpu,irq} via psi_proc_init() and\n# exposes per-cgroup *.pressure files.\nCONFIG_PSI=y\n\n# IRQ_TIME_ACCOUNTING: fine-grained IRQ time accounting. The kernel\n# timestamps each hardirq/softirq transition (irqtime_account_irq,\n# kernel/sched/cputime.c:57) and accounts that time separately instead of\n# charging it to the interrupted task; update_rq_clock_task subtracts the\n# resulting irq_delta (kernel/sched/core.c:817) so rq->clock_task EXCLUDES\n# IRQ time, and update_irq_load_avg feeds rq->avg_irq (core.c:840) \u{2014} IRQ\n# time becomes a signal distinct from task runtime. 
Deps (init/Kconfig:614):\n# HAVE_IRQ_TIME_ACCOUNTING \u{2014} selected by arch/x86/Kconfig:252 and\n# arch/arm64/Kconfig:212 \u{2014} and !VIRT_CPU_ACCOUNTING_NATIVE, which can never\n# be set on x86_64/arm64 (its prereq HAVE_VIRT_CPU_ACCOUNTING is selected\n# only by s390/powerpc). x86_64_defconfig lacks the symbol so this enables\n# it there; arm64 defconfig already sets it, so on arm64 this is a\n# defense-in-depth pin. =y survives make defconfig + merge + olddefconfig.\nCONFIG_IRQ_TIME_ACCOUNTING=y\n\n# -- Disabled (would alter scheduler behavior) --\n# LOCKDEP: 10-100x lock overhead. Non-representative timing.\n# Use for catching locking bugs, not for scheduler testing.\n# CONFIG_PROVE_LOCKING is not set\n# CONFIG_DEBUG_LOCKDEP is not set\n# DEBUG_INFO_REDUCED: breaks BTF generation.\n# CONFIG_DEBUG_INFO_SPLIT is not set\n\n# -- Opt-in: changes scheduler behavior intentionally --\n# Uncomment to enable broader test coverage at the cost of\n# different scheduling characteristics.\n#\n# Full preemption (kernel preemptible at most points):\n# CONFIG_PREEMPT=y\n# CONFIG_PREEMPT_DYNAMIC=y\n#\n# Core scheduling (co-schedule trusted tasks on SMT siblings):\n# CONFIG_SCHED_CORE=y\n\n# -- Disable unnecessary subsystems (faster build) --\n# CONFIG_SOUND is not set\n# CONFIG_DRM is not set\n# CONFIG_USB_SUPPORT is not set\n# CONFIG_WIRELESS is not set\n# CONFIG_BLUETOOTH is not set\n# CONFIG_INPUT is not set\n# CONFIG_FB is not set\n# CONFIG_VGA_CONSOLE is not set\n\n# strip: SHT_NOBITS for code sections (8826d5d)\n";
Contents of ktstr.kconfig (the kernel-config fragment that
enables sched_ext, BPF, kprobes, cgroups, and the other options
ktstr requires) baked into the binary at build time via
include_str!. The kernel build pipeline uses it to run
olddefconfig against a kernel source tree, and derives the
cache key suffix from it so that a kconfig change produces a
fresh cache entry.
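
The two uses described above, reading the fragment's directives and deriving a cache key suffix from its contents, can be sketched as follows. This is a minimal illustration under stated assumptions, not ktstr's actual implementation: `SAMPLE_FRAGMENT`, `Directive`, `parse_fragment`, and `cache_key_suffix` are hypothetical names, and the FNV-1a hash is a stand-in chosen because it is deterministic and needs no dependencies. The hash ktstr really uses is not stated in this documentation.

```rust
/// Hypothetical stand-in for the embedded fragment. The real constant is
/// filled in by `include_str!` at compile time.
const SAMPLE_FRAGMENT: &str = "\
# -- sched_ext + BPF (required) --
CONFIG_BPF=y
CONFIG_SCHED_CLASS_EXT=y
# CONFIG_X86_FRED is not set
CONFIG_NR_CPUS=512
";

/// One directive parsed from a kconfig fragment line (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
enum Directive<'a> {
    /// `CONFIG_FOO=value`
    Set { symbol: &'a str, value: &'a str },
    /// `# CONFIG_FOO is not set`
    Unset { symbol: &'a str },
}

/// Collect the directives in a fragment, skipping blank lines and
/// ordinary comments.
fn parse_fragment(fragment: &str) -> Vec<Directive<'_>> {
    fragment
        .lines()
        .filter_map(|line| {
            let line = line.trim();
            if let Some(rest) = line.strip_prefix("# ") {
                // Only the exact `# CONFIG_FOO is not set` form counts;
                // any other comment is ignored.
                let symbol = rest.strip_suffix(" is not set")?;
                if symbol.starts_with("CONFIG_") && !symbol.contains(' ') {
                    return Some(Directive::Unset { symbol });
                }
                return None;
            }
            if !line.starts_with("CONFIG_") {
                return None;
            }
            let (symbol, value) = line.split_once('=')?;
            Some(Directive::Set { symbol, value })
        })
        .collect()
}

/// Derive a 16-hex-digit suffix from the fragment bytes with 64-bit FNV-1a.
/// Assumption: any stable content hash would serve; this is not ktstr's.
fn cache_key_suffix(fragment: &str) -> String {
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;
    let mut hash = OFFSET_BASIS;
    for byte in fragment.bytes() {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(PRIME);
    }
    format!("{hash:016x}")
}
```

Because the suffix is computed over the raw bytes, editing a comment changes the key just as editing a `CONFIG_` line does. That errs toward rebuilding, which is the safe direction for a build cache.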