Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

hung_task: enable runtime reset of hung_task_detect_count

Currently, the hung_task_detect_count sysctl provides a cumulative count
of hung tasks since boot. In long-running, high-availability
environments, this counter may lose its utility if it cannot be reset once
an incident has been resolved. Furthermore, the previous implementation
relied upon implicit ordering, which could not strictly guarantee that
diagnostic metadata published by one CPU was visible to the panic logic on
another.

This patch introduces the capability to reset the detection count by
writing "0" to the hung_task_detect_count sysctl. The proc_handler logic
has been updated to validate this input and atomically reset the counter.
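
As a rough illustration of the interface described above, a userspace tool
might read and reset the counter as sketched below. The path is assumed from
the usual location of kernel.* sysctls under /proc/sys, and the helper names
read_detect_count() and reset_detect_count() are hypothetical; they are not
part of this patch. Writing requires root privileges.

```c
#include <stdio.h>

/* Assumed location of the sysctl; not stated in the commit message. */
#define HUNG_TASK_DETECT_COUNT "/proc/sys/kernel/hung_task_detect_count"

/* Read the counter at @path into @val. Returns 0 on success, -1 on error. */
int read_detect_count(const char *path, unsigned long *val)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = fscanf(f, "%lu", val) == 1;
	fclose(f);
	return ok ? 0 : -1;
}

/* Write "0" to @path. Returns 0 on success, -1 on error. */
int reset_detect_count(const char *path)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fputs("0\n", f) >= 0;
	/* stdio buffers the write, so a rejected write surfaces at fclose(). */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

Calling reset_detect_count(HUNG_TASK_DETECT_COUNT) once an incident has been
resolved should leave a subsequent read returning 0 until the next hung task
is detected. With this patch, any non-zero write is rejected with -EINVAL.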

The synchronisation of sysctl_hung_task_detect_count relies upon a
transactional model to ensure the integrity of the detection counter
against concurrent resets from userspace. The application of
atomic_long_read_acquire() and atomic_long_cmpxchg_release() is correct
and provides the following guarantees:

1. Prevention of Load-Store Reordering via Acquire Semantics
By utilising atomic_long_read_acquire() to snapshot the counter
before initiating the task traversal, we establish a strict
memory barrier. This prevents the compiler or hardware from
reordering the initial load to a point later in the scan. Without
this "acquire" barrier, a delayed load could potentially read a
"0" value resulting from a userspace reset that occurred
mid-scan. This would lead to the subsequent cmpxchg succeeding
erroneously, thereby overwriting the user's reset with stale
increment data.

2. Atomicity of the "Commit" Phase via Release Semantics
The atomic_long_cmpxchg_release() serves as the transaction's commit
point. The "release" barrier ensures that all diagnostic
recordings and task-state observations made during the scan are
globally visible before the counter is incremented.

3. Race Condition Resolution
This pairing effectively detects any
"out-of-band" reset of the counter. If
sysctl_hung_task_detect_count is modified via the procfs
interface during the scan, the final cmpxchg will detect the
discrepancy between the current value and the "acquire" snapshot.
Consequently, the update will fail, ensuring that a reset command
from the administrator is prioritised over a scan that may have
been invalidated by that very reset.
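
The three points above can be modelled in userspace with C11 atomics. This is
only a sketch under stated assumptions: detect_count stands in for
sysctl_hung_task_detect_count, the function names scan_begin(), scan_commit()
and admin_reset() are hypothetical, and the C11 memory orders are used as
approximate analogues of the kernel's acquire and release variants.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-in for sysctl_hung_task_detect_count. */
static atomic_long detect_count;

/* Start of a scan: snapshot the counter with acquire ordering. */
long scan_begin(void)
{
	return atomic_load_explicit(&detect_count, memory_order_acquire);
}

/*
 * End of a scan: commit with release ordering, but only if the counter
 * still holds the snapshot value. Returns false when the commit is dropped.
 */
bool scan_commit(long snapshot, long found)
{
	long expected = snapshot;

	return atomic_compare_exchange_strong_explicit(&detect_count,
			&expected, snapshot + found,
			memory_order_release, memory_order_relaxed);
}

/* Administrator reset, modelling a write of 0 to the sysctl. */
void admin_reset(void)
{
	atomic_store_explicit(&detect_count, 0, memory_order_relaxed);
}
```

If admin_reset() runs between scan_begin() and scan_commit() while the counter
is non-zero, the compare-and-exchange sees a value other than the snapshot and
leaves the counter at 0, so the reset wins over that round's increment.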

Link: https://lkml.kernel.org/r/20260303203031.4097316-3-atomlin@atomlin.com
Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Joel Granados <joel.granados@kernel.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Lance Yang <lance.yang@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Authored by Aaron Tomlin and committed by Andrew Morton
49085e1b 00b5cdeb

+53 -8
+2 -1
Documentation/admin-guide/sysctl/kernel.rst
···
 ======================

 Indicates the total number of tasks that have been detected as hung since
-the system boot.
+the system boot or since the counter was reset. The counter is zeroed when
+a value of 0 is written.

 This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.

+51 -7
kernel/hung_task.c
···
 	int need_warning = sysctl_hung_task_warnings;
 	unsigned long si_mask = hung_task_si_mask;

-	total_count = atomic_long_read(&sysctl_hung_task_detect_count);
+	/*
+	 * The counter might get reset. Remember the initial value.
+	 * Acquire prevents reordering task checks before this point.
+	 */
+	total_count = atomic_long_read_acquire(&sysctl_hung_task_detect_count);
 	/*
 	 * If the system crashed already then all bets are off,
 	 * do not report extra hung tasks:
···
 		return;

 	/*
-	 * This counter tracks the total number of tasks detected as hung
-	 * since boot.
+	 * Do not count this round when the global counter has been reset
+	 * during this check. Release ensures we see all hang details
+	 * recorded during the scan.
 	 */
-	atomic_long_cmpxchg_relaxed(&sysctl_hung_task_detect_count,
+	atomic_long_cmpxchg_release(&sysctl_hung_task_detect_count,
 				    total_count, total_count +
 				    this_round_count);

···
 }

 #ifdef CONFIG_SYSCTL
+
+/**
+ * proc_dohung_task_detect_count - proc handler for hung_task_detect_count
+ * @table: Pointer to the struct ctl_table definition for this proc entry
+ * @dir: Flag indicating the operation
+ * @buffer: User space buffer for data transfer
+ * @lenp: Pointer to the length of the data being transferred
+ * @ppos: Pointer to the current file offset
+ *
+ * This handler is used for reading the current hung task detection count
+ * and for resetting it to zero when a write operation is performed using a
+ * zero value only.
+ * Return: 0 on success, or a negative error code on failure.
+ */
+static int proc_dohung_task_detect_count(const struct ctl_table *table, int dir,
+					 void *buffer, size_t *lenp, loff_t *ppos)
+{
+	unsigned long detect_count;
+	struct ctl_table proxy_table;
+	int err;
+
+	proxy_table = *table;
+	proxy_table.data = &detect_count;
+
+	if (SYSCTL_KERN_TO_USER(dir))
+		detect_count = atomic_long_read(&sysctl_hung_task_detect_count);
+
+	err = proc_doulongvec_minmax(&proxy_table, dir, buffer, lenp, ppos);
+	if (err < 0)
+		return err;
+
+	if (SYSCTL_USER_TO_KERN(dir)) {
+		if (detect_count)
+			return -EINVAL;
+		atomic_long_set(&sysctl_hung_task_detect_count, 0);
+	}
+
+	return 0;
+}
+
 /*
  * Process updating of timeout sysctl
  */
···
 	},
 	{
 		.procname	= "hung_task_detect_count",
-		.data		= &sysctl_hung_task_detect_count,
 		.maxlen		= sizeof(unsigned long),
-		.mode		= 0444,
-		.proc_handler	= proc_doulongvec_minmax,
+		.mode		= 0644,
+		.proc_handler	= proc_dohung_task_detect_count,
 	},
 	{
 		.procname	= "hung_task_sys_info",
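
The write rule enforced by proc_dohung_task_detect_count() (only a value of 0
is accepted, anything else yields -EINVAL) can be modelled in userspace as
sketched below. The function model_write() and the use of strtoul() are
assumptions made for illustration; the kernel handler parses input through
proc_doulongvec_minmax(), whose parsing details differ.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Userspace stand-in for sysctl_hung_task_detect_count. */
static atomic_long detect_count;

/*
 * Hypothetical model of the write path: parse an unsigned long from @buf
 * and reset the counter only when the parsed value is 0.
 * Returns 0 on success or -EINVAL otherwise.
 */
int model_write(const char *buf)
{
	char *end;
	unsigned long val;

	errno = 0;
	val = strtoul(buf, &end, 10);
	if (errno || end == buf)
		return -EINVAL;	/* no digits, or value out of range */
	if (val)
		return -EINVAL;	/* only a reset to zero is permitted */

	atomic_store_explicit(&detect_count, 0, memory_order_relaxed);
	return 0;
}
```

Rejecting non-zero values keeps the file a pure reset switch: userspace can
clear the count but cannot set it to an arbitrary number.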