Module flock

Advisory flock(2) primitives shared across every ktstr lock file.

ktstr uses advisory flock(2) in four places:

  • LLC reservation locks at {lock_dir}/ktstr-llc-{N}.lock and per-CPU locks at {lock_dir}/ktstr-cpu-{C}.lock where lock_dir is resolved by crate::cache::resolve_lock_dir (KTSTR_LOCK_DIR env var, fallback /tmp). See crate::vmm::host_topology::acquire_resource_locks and friends.
  • Per-cache-entry coordination locks at {cache_root}/.locks/{cache_key}.lock (see crate::cache::CacheDir::acquire_shared_lock and friends).
  • Per-source-tree build locks at {cache_root}/.locks/source-{path_hash}.lock (see crate::cli::kernel_build::build::acquire_source_tree_lock) — serialize concurrent make invocations against the same kernel source checkout.
  • Observational enumeration from ktstr locks --json — a read-only scan that does NOT acquire flocks; reads /proc/locks through read_holders to attribute holders without contending with active acquirers.
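
As a rough illustration of the lockfile naming above, the paths could be assembled as follows. This is a sketch only: the helper names (`lock_dir_from`, `llc_lock_path`, `cpu_lock_path`, `cache_entry_lock_path`) are hypothetical, and treating an empty `KTSTR_LOCK_DIR` as unset is an assumption, not documented behavior of `crate::cache::resolve_lock_dir`.

```rust
use std::path::{Path, PathBuf};

/// Pick the lock directory: the override if present, else `/tmp`.
/// `env_value` stands in for the KTSTR_LOCK_DIR lookup so the function stays pure.
/// Assumption: an empty value is treated the same as an unset variable.
fn lock_dir_from(env_value: Option<&str>) -> PathBuf {
    match env_value {
        Some(dir) if !dir.is_empty() => PathBuf::from(dir),
        _ => PathBuf::from("/tmp"),
    }
}

/// `{lock_dir}/ktstr-llc-{N}.lock`
fn llc_lock_path(lock_dir: &Path, llc: usize) -> PathBuf {
    lock_dir.join(format!("ktstr-llc-{llc}.lock"))
}

/// `{lock_dir}/ktstr-cpu-{C}.lock`
fn cpu_lock_path(lock_dir: &Path, cpu: usize) -> PathBuf {
    lock_dir.join(format!("ktstr-cpu-{cpu}.lock"))
}

/// `{cache_root}/.locks/{cache_key}.lock`
fn cache_entry_lock_path(cache_root: &Path, cache_key: &str) -> PathBuf {
    cache_root.join(".locks").join(format!("{cache_key}.lock"))
}
```

A caller would typically pass `std::env::var("KTSTR_LOCK_DIR").ok().as_deref()` as `env_value`.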

All four share:

  • Non-blocking LOCK_NB attempt (the cache-entry path wraps this in a poll loop for timed-wait semantics).
  • O_CLOEXEC on every open so the kernel’s “release flock when the last fd referring to the OFD closes” invariant matches what OwnedFd::drop does — a leaked fd across exec(2) would keep the lock alive in the child and fool the next acquirer’s /proc/locks scan into naming the wrong pid.
  • /proc/locks parsing keyed on the mount-point-derived {major:02x}:{minor:02x}:{inode} triple, resolved via /proc/self/mountinfo (not stat().st_dev — see below).
  • HolderInfo with pid + truncated /proc/{pid}/cmdline for actionable error messages.
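
The `/proc/locks` matching described above can be sketched as a pure function over the file's text. The function name `holders_for_needle` is hypothetical and the real `proc_locks` scanner may differ; the sketch assumes the usual line layout (`id: TYPE ADVISORY|MANDATORY READ|WRITE pid maj:min:inode start end`) and skips blocked-waiter lines, which carry a `->` marker after the id.

```rust
/// Return the PIDs of processes holding a FLOCK on the file identified by
/// `needle` (formatted as `{major:02x}:{minor:02x}:{inode}`).
fn holders_for_needle(proc_locks: &str, needle: &str) -> Vec<u32> {
    let mut pids = Vec::new();
    for line in proc_locks.lines() {
        let mut fields = line.split_whitespace();
        let _id = fields.next(); // e.g. "1:"
        let kind = match fields.next() {
            Some("->") => continue, // a waiter blocked on the lock, not a holder
            Some(kind) => kind,
            None => continue, // blank or truncated line
        };
        if kind != "FLOCK" {
            continue; // POSIX and OFD locks are out of scope here
        }
        let _advisory = fields.next(); // ADVISORY / MANDATORY
        let _access = fields.next(); // READ / WRITE
        let pid = fields.next().and_then(|p| p.parse::<u32>().ok());
        let dev_inode = fields.next();
        if let (Some(pid), Some(dev_inode)) = (pid, dev_inode) {
            // Exact string comparison against the needle; dedupe repeated PIDs.
            if dev_inode == needle && !pids.contains(&pid) {
                pids.push(pid);
            }
        }
    }
    pids
}
```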

§Module layout

Each submodule owns a single, cohesive subsystem:

  • fs_filter — refuses to operate on filesystems where flock(2) is unreliable (NFS, CIFS/SMB, CephFS, AFS, FUSE).
  • primitives — the kernel-syscall wrappers (try_flock / block_flock / materialize) that open a lockfile and request a flock operation.
  • mountinfo — /proc/self/mountinfo parser and the {major:02x}:{minor:02x}:{inode} needle derivation that proc_locks keys off.
  • proc_locks — /proc/locks scanner that enumerates the PIDs holding a given lockfile’s flock.
  • holder — converts a PID into a HolderInfo (reads /proc/{pid}/cmdline) and renders a &[HolderInfo] into a multi-line operator-facing string.
  • acquire — high-level poll-with-timeout helper that wraps primitives::try_flock in a deadline loop and decorates timeout errors with the holder list from proc_locks and holder.
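
The deadline loop that `acquire` wraps around `primitives::try_flock` might look roughly like the sketch below. The name `poll_until`, its signature, and the fixed poll interval are assumptions for illustration; the `attempt` closure stands in for a non-blocking lock attempt that returns `Ok(None)` while a peer holds the lock.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Retry `attempt` until it yields a value or `timeout` elapses.
/// Returns `Ok(None)` on timeout so the caller can attach holder diagnostics.
fn poll_until<T, E>(
    timeout: Duration,
    interval: Duration,
    mut attempt: impl FnMut() -> Result<Option<T>, E>,
) -> Result<Option<T>, E> {
    let deadline = Instant::now() + timeout;
    loop {
        // Hard errors (e.g. open failures) propagate immediately via `?`.
        if let Some(guard) = attempt()? {
            return Ok(Some(guard));
        }
        let now = Instant::now();
        if now >= deadline {
            return Ok(None);
        }
        // Never sleep past the deadline.
        thread::sleep(interval.min(deadline - now));
    }
}
```

On an `Ok(None)` result, a caller in the style described above would look up the current holders and fold them into the timeout error.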

§Why mountinfo, not stat().st_dev

/proc/locks emits i_sb->s_dev for each held flock — the filesystem’s superblock device id. For most filesystems that matches stat().st_dev, but on btrfs, overlayfs, and bind-mounts the kernel installs a custom getattr implementation that returns an anonymous device id (anon_dev) distinct from s_dev. That divergence means the stat-derived needle would never match the /proc/locks line — a naive read_holders would silently return empty on every btrfs-backed /tmp, every overlay-rootfs container, and every bind-mounted /tmp, which is a silent correctness failure for --cpu-cap contention diagnostics and the ktstr locks observational command.

Needle production (see mountinfo::needle_from_path):

mountinfo::needle_from_path resolves path to the mount point covering it via /proc/self/mountinfo (longest-prefix match on the mount_point field), then reads the {major:minor} field of that mount entry and combines it with stat().st_ino to form the full triple. The mountinfo {major:minor} is the kernel’s i_sb->s_dev verbatim, so the resulting needle matches /proc/locks by construction. The needle feeds proc_locks::read_holders_for_needle, which scans /proc/locks exactly once and byte-compares.
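
A minimal sketch of that derivation, operating on already-read mountinfo text and a caller-supplied inode so it stays testable, is shown below. The name `needle_from_mountinfo` is hypothetical. The sketch assumes the standard mountinfo layout (third field is a decimal `major:minor`, fifth field is the mount point), does not decode octal escapes such as `\040` in mount points, and lets a later entry win when two mounts share a mount point.

```rust
use std::path::Path;

/// Build the `{major:02x}:{minor:02x}:{inode}` needle for `path`, using the
/// mountinfo entry whose mount point is the longest prefix of `path`.
fn needle_from_mountinfo(mountinfo: &str, path: &Path, inode: u64) -> Option<String> {
    // (mount-point depth, major, minor) of the best match so far.
    let mut best: Option<(usize, u32, u32)> = None;
    for line in mountinfo.lines() {
        let fields: Vec<&str> = line.split_whitespace().collect();
        if fields.len() < 5 {
            continue; // malformed line
        }
        let mount_point = Path::new(fields[4]);
        // Component-wise prefix test: "/tmp" covers "/tmp/x" but not "/tmpfoo/x".
        if !path.starts_with(mount_point) {
            continue;
        }
        // mountinfo prints major:minor in decimal; /proc/locks prints hex.
        let (major, minor) = match fields[2].split_once(':') {
            Some((maj, min)) => match (maj.parse::<u32>(), min.parse::<u32>()) {
                (Ok(maj), Ok(min)) => (maj, min),
                _ => continue,
            },
            None => continue,
        };
        let depth = mount_point.components().count();
        // `>=` lets a later mount stacked on the same mount point take precedence.
        if best.map_or(true, |(d, _, _)| depth >= d) {
            best = Some((depth, major, minor));
        }
    }
    best.map(|(_, major, minor)| format!("{major:02x}:{minor:02x}:{inode}"))
}
```

In real use the inode would come from `std::os::unix::fs::MetadataExt::ino` on the lockfile's metadata.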

§Remote-filesystem rejection

try_flock refuses to operate on NFS / CIFS / SMB2 / CEPH / AFS / FUSE (see fs_filter::reject_remote_fs). flock(2) on those filesystems is either advisory-only under some server configurations (NFSv3 without NLM coordination) or silently returns success without serializing peers (FUSE when the userspace server doesn’t implement the flock op). ktstr’s resource-budget contract is not robust to that silent degradation, so the safe call is to reject at lockfile-open time with an actionable message.
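
One plausible way to implement such a filter is to compare the `f_type` value reported by statfs(2) against the known superblock magic numbers. The sketch below assumes that approach; the function names are hypothetical and fs_filter::reject_remote_fs may detect these filesystems differently. The constants are the values from the kernel's linux/magic.h.

```rust
// Superblock magic numbers from linux/magic.h.
const NFS_SUPER_MAGIC: u32 = 0x6969;
const SMB_SUPER_MAGIC: u32 = 0x517B;
const CIFS_SUPER_MAGIC: u32 = 0xFF53_4D42;
const SMB2_SUPER_MAGIC: u32 = 0xFE53_4D42;
const CEPH_SUPER_MAGIC: u32 = 0x00C3_6400;
const AFS_SUPER_MAGIC: u32 = 0x5346_414F;
const FUSE_SUPER_MAGIC: u32 = 0x6573_5546;

/// Name of the filesystem family if flock(2) cannot be trusted on it.
fn unreliable_fs_name(f_type: u32) -> Option<&'static str> {
    match f_type {
        NFS_SUPER_MAGIC => Some("NFS"),
        SMB_SUPER_MAGIC | CIFS_SUPER_MAGIC => Some("CIFS/SMB"),
        SMB2_SUPER_MAGIC => Some("SMB2"),
        CEPH_SUPER_MAGIC => Some("CephFS"),
        AFS_SUPER_MAGIC => Some("AFS"),
        FUSE_SUPER_MAGIC => Some("FUSE"),
        _ => None,
    }
}

/// Reject a lockfile location up front, with a message the operator can act on.
/// The wording of the message is illustrative only.
fn reject_if_unreliable(f_type: u32, lock_path: &str) -> Result<(), String> {
    match unreliable_fs_name(f_type) {
        Some(name) => Err(format!(
            "refusing to flock {lock_path}: {name} does not reliably serialize flock(2); \
             move the lock directory to a local filesystem"
        )),
        None => Ok(()),
    }
}
```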

Structs§

HolderInfo
Identity of a process holding an advisory flock. Used by error messages in both LLC-coordination and cache-entry paths, plus the ktstr locks observational subcommand.
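
To make the "pid plus truncated cmdline" idea concrete, here is a sketch that turns the raw bytes of /proc/{pid}/cmdline (NUL-separated arguments) into a display string. The struct name `HolderSketch`, its field names, the 100-character cap, and the `...` suffix are all assumptions; the actual HolderInfo layout and truncation rule are not shown on this page.

```rust
/// Stand-in for HolderInfo; field names are hypothetical.
#[derive(Debug, Clone, PartialEq, Eq)]
struct HolderSketch {
    pid: u32,
    cmdline: String,
}

/// Hypothetical cap on the rendered command line, in characters.
const MAX_CMDLINE_CHARS: usize = 100;

/// Convert the NUL-separated bytes of /proc/{pid}/cmdline into a holder record.
fn holder_from_cmdline(pid: u32, raw: &[u8]) -> HolderSketch {
    // Arguments are not guaranteed to be UTF-8; replace invalid bytes.
    let text = String::from_utf8_lossy(raw);
    let joined = text
        .split('\0')
        .filter(|arg| !arg.is_empty())
        .collect::<Vec<_>>()
        .join(" ");
    // Truncate on a character boundary so multi-byte text is never split.
    let cmdline = if joined.chars().count() > MAX_CMDLINE_CHARS {
        let mut short: String = joined.chars().take(MAX_CMDLINE_CHARS).collect();
        short.push_str("...");
        short
    } else {
        joined
    };
    HolderSketch { pid, cmdline }
}
```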

Enums§

FlockMode
Requested sharing mode for try_flock. Translated to the corresponding non-blocking rustix::fs::FlockOperation internally; callers never see the libc-specific constants.

Functions§

block_flock
Blocking variant of try_flock. Opens the lockfile (creating it if absent), then issues a blocking flock(2) that parks the caller in the kernel until the lock is available. Use after try_flock returns None to wait for a live peer to finish.
format_holder_list
Format a HolderInfo slice for inclusion in user-facing error strings. An empty slice yields the NO_HOLDERS_RECORDED sentinel so the diagnostic is unambiguous: a stale lockfile whose holder has exited presents as empty, and the error should say so rather than print a misleading blank. A non-empty slice renders one pid={pid} cmd={cmdline} line per holder, newline-separated and indented two spaces, so a multi-holder error stays readable when embedded in a wrapping anyhow chain. The prior comma-joined form ran every holder into a single wide line that terminals wrapped arbitrarily mid-cmdline.
try_flock
Open a lock file and attempt flock with LOCK_NB.
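
The rendering rule described for format_holder_list can be sketched as below. The `Holder` struct and the `NO_HOLDERS` text are placeholders: the real sentinel string behind NO_HOLDERS_RECORDED and the real HolderInfo fields are not shown on this page.

```rust
/// Stand-in for HolderInfo; field names are hypothetical.
struct Holder {
    pid: u32,
    cmdline: String,
}

/// Placeholder text; the actual NO_HOLDERS_RECORDED value may differ.
const NO_HOLDERS: &str = "(no holders recorded)";

/// One `pid=... cmd=...` line per holder, indented two spaces and
/// newline-separated; an explicit sentinel when the slice is empty.
fn format_holders(holders: &[Holder]) -> String {
    if holders.is_empty() {
        return NO_HOLDERS.to_string();
    }
    holders
        .iter()
        .map(|h| format!("  pid={} cmd={}", h.pid, h.cmdline))
        .collect::<Vec<_>>()
        .join("\n")
}
```

Keeping each holder on its own indented line means the list can be appended after a newline in an error message without further escaping.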