Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

hung_task: increment the global counter immediately

A recent change made it possible to reset the global counter of hung tasks
through the sysctl interface. A potential race with the regular check was
resolved by updating the global counter only once, at the end of the check.

However, the hung task check can take a significant amount of time,
particularly when task information is being dumped to slow serial
consoles. Some users monitor this global counter to trigger immediate
migration of critical containers. Delaying the increment until the full
check completes postpones these high-priority rescue operations.

Update the global counter as soon as a hung task is detected. Since the
value is read asynchronously, a relaxed atomic operation is sufficient.

Link: https://lkml.kernel.org/r/20260303203031.4097316-4-atomlin@atomlin.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
Reported-by: Lance Yang <lance.yang@linux.dev>
Closes: https://lore.kernel.org/r/f239e00f-4282-408d-b172-0f9885f4b01b@linux.dev
Reviewed-by: Aaron Tomlin <atomlin@atomlin.com>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Joel Granados <joel.granados@kernel.org>
Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Authored by Petr Mladek and committed by Andrew Morton
5eaef7f8 49085e1b

+8 -15
kernel/hung_task.c
···
 	int max_count = sysctl_hung_task_check_count;
 	unsigned long last_break = jiffies;
 	struct task_struct *g, *t;
-	unsigned long total_count, this_round_count;
+	unsigned long this_round_count;
 	int need_warning = sysctl_hung_task_warnings;
 	unsigned long si_mask = hung_task_si_mask;
 
-	/*
-	 * The counter might get reset. Remember the initial value.
-	 * Acquire prevents reordering task checks before this point.
-	 */
-	total_count = atomic_long_read_acquire(&sysctl_hung_task_detect_count);
 	/*
 	 * If the system crashed already then all bets are off,
 	 * do not report extra hung tasks:
···
 		}
 
 		if (task_is_hung(t, timeout)) {
+			/*
+			 * Increment the global counter so that userspace could
+			 * start migrating tasks ASAP. But count the current
+			 * round separately because userspace could reset
+			 * the global counter at any time.
+			 */
+			atomic_long_inc(&sysctl_hung_task_detect_count);
 			this_round_count++;
 			hung_task_info(t, timeout, this_round_count);
 		}
···
 
 	if (!this_round_count)
 		return;
-
-	/*
-	 * Do not count this round when the global counter has been reset
-	 * during this check. Release ensures we see all hang details
-	 * recorded during the scan.
-	 */
-	atomic_long_cmpxchg_release(&sysctl_hung_task_detect_count,
-				    total_count, total_count +
-				    this_round_count);
 
 	if (need_warning || hung_task_call_panic) {
 		si_mask |= SYS_INFO_LOCKS;
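
The behaviour introduced by this change can be illustrated with a small userspace model. This is only a sketch, not kernel code: it assumes C11 <stdatomic.h> as a stand-in for the kernel's atomic_long_t, and the names model_check_round(), model_reset(), model_read() and the reset_before parameter are hypothetical, invented for the illustration. It shows why the per-round count is kept in a local variable: a reset that arrives in the middle of a scan clears the global counter, but must not disturb the count used for reporting in the current round.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for sysctl_hung_task_detect_count (assumption). */
static atomic_long detect_count;

/* Models a reset of the global counter through the sysctl interface. */
static void model_reset(void)
{
	atomic_store_explicit(&detect_count, 0, memory_order_relaxed);
}

/* Models an asynchronous reader, e.g. a monitoring agent. */
static long model_read(void)
{
	return atomic_load_explicit(&detect_count, memory_order_relaxed);
}

/*
 * Models one scan over n tasks, where hung[i] says whether task i is hung.
 * The global counter is bumped with a relaxed increment as soon as a hung
 * task is seen. A reset is simulated just before task index reset_before
 * is examined; pass n for "no reset during this scan".
 * Returns the number of hung tasks found in this round.
 */
static unsigned long model_check_round(const bool *hung, size_t n,
				       size_t reset_before)
{
	unsigned long this_round_count = 0;

	for (size_t i = 0; i < n; i++) {
		if (i == reset_before)
			model_reset();

		if (hung[i]) {
			/* Visible to readers immediately, no ordering implied. */
			atomic_fetch_add_explicit(&detect_count, 1,
						  memory_order_relaxed);
			/* Local count is unaffected by a concurrent reset. */
			this_round_count++;
		}
	}

	return this_round_count;
}
```

With hung = { true, false, true } and a reset simulated before index 1, the round reports two hung tasks while the global counter ends at one, because the first increment was wiped by the reset. Without a reset, both values agree.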