Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

hung_task: refactor detection logic and atomicise detection count

Patch series "hung_task: Provide runtime reset interface for hung task
detector", v9.

This series introduces the ability to reset
/proc/sys/kernel/hung_task_detect_count.

Writing "0" to this file atomically resets the counter of detected hung
tasks. This lets system administrators clear the cumulative diagnostic
history once an incident has been resolved, simplifying subsequent
monitoring without requiring a system restart.
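As a rough user-space illustration of the interface described above, the sketch below reads the counter and then resets it by writing "0". It assumes a kernel with the full series applied (this patch alone does not add the write path) and enough privilege to write the sysctl. The helper names read_hung_count() and reset_hung_count() are hypothetical, and the path is passed as a parameter so the helpers can also be exercised against an ordinary file.

```c
#include <errno.h>
#include <stdio.h>

/* Path named in the cover letter; writing it needs the full series and root. */
#define HUNG_TASK_DETECT_COUNT_PATH "/proc/sys/kernel/hung_task_detect_count"

/* Hypothetical helper: parse the cumulative count; 0 or a negative errno. */
static int read_hung_count(const char *path, unsigned long *out)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -errno;
	ok = fscanf(f, "%lu", out) == 1;
	fclose(f);
	return ok ? 0 : -EINVAL;
}

/* Hypothetical helper: reset the counter by writing "0"; 0 or a negative errno. */
static int reset_hung_count(const char *path)
{
	FILE *f = fopen(path, "w");
	int err = 0;

	if (!f)
		return -errno;
	if (fputs("0\n", f) == EOF)
		err = -EIO;
	/* stdio buffers the write, so a value the kernel rejects surfaces here. */
	if (fclose(f) == EOF && !err)
		err = -errno;
	return err;
}
```

On such a kernel, calling reset_hung_count(HUNG_TASK_DETECT_COUNT_PATH) and then read_hung_count() would normally report 0, unless a new hung task is detected in between. The fclose() result is checked because that is where the buffered write actually reaches the kernel.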

This patch (of 3):

The check_hung_task() function currently conflates two distinct
responsibilities: determining whether a task is hung, and handling the
subsequent reporting (printing warnings, triggering panics, or emitting
tracepoints).

This patch refactors the logic by introducing hung_task_info(), a function
dedicated solely to reporting. The actual detection check,
task_is_hung(), is hoisted into the primary loop within
check_hung_uninterruptible_tasks(). This cleanly separates the detection
mechanism from the reporting policy.
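The split described above can be modelled in a few lines of user-space C. This is a simplified sketch, not kernel code: struct model_task and the model_* functions are hypothetical stand-ins, and "hung" is reduced to "uninterruptible and not switched for longer than the timeout", whereas the kernel's task_is_hung() examines the real task state. What it shows is the shape of the refactor: a detection predicate, a reporting function that receives the running this_round_count, and a scan loop that owns the count.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct task_struct: only what the model needs. */
struct model_task {
	const char *comm;
	bool uninterruptible;       /* stands in for TASK_UNINTERRUPTIBLE */
	unsigned long last_switch;  /* time of the task's last context switch */
};

/* Detection (mechanism): in this model, a predicate with no side effects. */
static bool model_task_is_hung(const struct model_task *t, unsigned long now,
			       unsigned long timeout)
{
	return t->uninterruptible && now - t->last_switch > timeout;
}

/* Reporting (policy): only called for tasks already known to be hung. */
static void model_hung_task_info(const struct model_task *t,
				 unsigned long timeout,
				 unsigned long this_round_count,
				 unsigned long panic_threshold,
				 bool *call_panic)
{
	/* Same comparison shape as "this_round_count >= sysctl_hung_task_panic". */
	if (panic_threshold && this_round_count >= panic_threshold)
		*call_panic = true;
	printf("task %s blocked for more than %lu ticks (hung #%lu this round)\n",
	       t->comm, timeout, this_round_count);
}

/* The scan loop owns the count and returns the hung tasks found this round. */
static unsigned long model_check_tasks(const struct model_task *tasks,
				       size_t n, unsigned long now,
				       unsigned long timeout,
				       unsigned long panic_threshold,
				       bool *call_panic)
{
	unsigned long this_round_count = 0;

	for (size_t i = 0; i < n; i++) {
		if (model_task_is_hung(&tasks[i], now, timeout)) {
			this_round_count++;
			model_hung_task_info(&tasks[i], timeout, this_round_count,
					     panic_threshold, call_panic);
		}
	}
	return this_round_count;
}
```

With a panic threshold of 2, for instance, the second hung task found in a round sets the panic flag, which is the per-round behaviour the diff expresses through this_round_count.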

Furthermore, to facilitate future support for concurrent hung task
detection, the global sysctl_hung_task_detect_count variable is converted
from unsigned long to atomic_long_t. Consequently, the counting logic is
updated to accumulate the number of hung tasks locally (this_round_count)
during the iteration. The global counter is then updated atomically via
atomic_long_cmpxchg_relaxed() once the loop concludes, rather than
incrementally during the scan.
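The counter handling can likewise be sketched with C11 <stdatomic.h> as a user-space analogue of the kernel's atomic_long_t helpers; the model_* names are hypothetical. The scan takes a snapshot before it starts and publishes snapshot + this_round_count with a single relaxed compare-and-exchange afterwards. One consequence worth noting: the store only happens if the counter still equals the snapshot, and the diff discards the return value of atomic_long_cmpxchg_relaxed(), so a round that races with another writer (such as a reset) is not added to the total rather than overwriting that writer's value.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* User-space stand-in for the kernel's atomic_long_t counter (zero-initialised). */
static atomic_long model_detect_count;

/* Snapshot taken before the scan, analogous to atomic_long_read(). */
static long model_snapshot(void)
{
	return atomic_load_explicit(&model_detect_count, memory_order_relaxed);
}

/*
 * Publish this round's total once, after the scan. Like cmpxchg, the store
 * only happens if the counter still equals the snapshot. C11 returns a bool
 * here, whereas the kernel's atomic_long_cmpxchg_relaxed() returns the old
 * value.
 */
static bool model_publish(long total_count, long this_round_count)
{
	long expected = total_count;

	return atomic_compare_exchange_strong_explicit(&model_detect_count,
			&expected, total_count + this_round_count,
			memory_order_relaxed, memory_order_relaxed);
}

/* Hypothetical concurrent writer, e.g. a reset triggered from the sysctl. */
static void model_reset(void)
{
	atomic_store_explicit(&model_detect_count, 0, memory_order_relaxed);
}
```

In a quiet round, model_publish() succeeds and the counter grows by this_round_count; if model_reset() lands between the snapshot and the publish, the compare fails and the counter stays at 0.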

These changes are strictly preparatory and introduce no functional change
to the system's runtime behaviour.

Link: https://lkml.kernel.org/r/20260303203031.4097316-1-atomlin@atomlin.com
Link: https://lkml.kernel.org/r/20260303203031.4097316-2-atomlin@atomlin.com
Signed-off-by: Aaron Tomlin <atomlin@atomlin.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Joel Granados <joel.granados@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Authored by Aaron Tomlin and committed by Andrew Morton
00b5cdeb c8f42847

+33 -25
kernel/hung_task.c
@@ -36,7 +36,7 @@
 /*
  * Total number of tasks detected as hung since boot:
  */
-static unsigned long __read_mostly sysctl_hung_task_detect_count;
+static atomic_long_t sysctl_hung_task_detect_count = ATOMIC_LONG_INIT(0);
 
 /*
  * Limit number of tasks checked in a batch.
@@ -223,31 +223,29 @@
 }
 #endif
 
-static void check_hung_task(struct task_struct *t, unsigned long timeout,
-		unsigned long prev_detect_count)
+/**
+ * hung_task_info - Print diagnostic details for a hung task
+ * @t: Pointer to the detected hung task.
+ * @timeout: Timeout threshold for detecting hung tasks
+ * @this_round_count: Count of hung tasks detected in the current iteration
+ *
+ * Print structured information about the specified hung task, if warnings
+ * are enabled or if the panic batch threshold is exceeded.
+ */
+static void hung_task_info(struct task_struct *t, unsigned long timeout,
+			   unsigned long this_round_count)
 {
-	unsigned long total_hung_task;
-
-	if (!task_is_hung(t, timeout))
-		return;
-
-	/*
-	 * This counter tracks the total number of tasks detected as hung
-	 * since boot.
-	 */
-	sysctl_hung_task_detect_count++;
-
-	total_hung_task = sysctl_hung_task_detect_count - prev_detect_count;
 	trace_sched_process_hang(t);
 
-	if (sysctl_hung_task_panic && total_hung_task >= sysctl_hung_task_panic) {
+	if (sysctl_hung_task_panic && this_round_count >= sysctl_hung_task_panic) {
 		console_verbose();
 		hung_task_call_panic = true;
 	}
 
 	/*
-	 * Ok, the task did not get scheduled for more than 2 minutes,
-	 * complain:
+	 * The given task did not get scheduled for more than
+	 * CONFIG_DEFAULT_HUNG_TASK_TIMEOUT. Therefore, complain
+	 * accordingly
 	 */
 	if (sysctl_hung_task_warnings || hung_task_call_panic) {
 		if (sysctl_hung_task_warnings > 0)
@@ -295,18 +297,18 @@
 
 /*
  * Check whether a TASK_UNINTERRUPTIBLE does not get woken up for
- * a really long time (120 seconds). If that happens, print out
- * a warning.
+ * a really long time. If that happens, print out a warning.
  */
 static void check_hung_uninterruptible_tasks(unsigned long timeout)
 {
 	int max_count = sysctl_hung_task_check_count;
 	unsigned long last_break = jiffies;
 	struct task_struct *g, *t;
-	unsigned long prev_detect_count = sysctl_hung_task_detect_count;
+	unsigned long total_count, this_round_count;
 	int need_warning = sysctl_hung_task_warnings;
 	unsigned long si_mask = hung_task_si_mask;
 
+	total_count = atomic_long_read(&sysctl_hung_task_detect_count);
 	/*
 	 * If the system crashed already then all bets are off,
 	 * do not report extra hung tasks:
@@ -314,10 +316,9 @@
 	if (test_taint(TAINT_DIE) || did_panic)
 		return;
 
-
+	this_round_count = 0;
 	rcu_read_lock();
 	for_each_process_thread(g, t) {
-
 		if (!max_count--)
 			goto unlock;
 		if (time_after(jiffies, last_break + HUNG_TASK_LOCK_BREAK)) {
@@ -325,13 +328,24 @@
 			last_break = jiffies;
 		}
 
-		check_hung_task(t, timeout, prev_detect_count);
+		if (task_is_hung(t, timeout)) {
+			this_round_count++;
+			hung_task_info(t, timeout, this_round_count);
+		}
 	}
  unlock:
 	rcu_read_unlock();
 
-	if (!(sysctl_hung_task_detect_count - prev_detect_count))
+	if (!this_round_count)
 		return;
+
+	/*
+	 * This counter tracks the total number of tasks detected as hung
+	 * since boot.
+	 */
+	atomic_long_cmpxchg_relaxed(&sysctl_hung_task_detect_count,
+				    total_count, total_count +
+				    this_round_count);
 
 	if (need_warning || hung_task_call_panic) {
 		si_mask |= SYS_INFO_LOCKS;