Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

workqueue: Show all busy workers in stall diagnostics

show_cpu_pool_hog() only prints workers whose task is currently running
on the CPU (task_is_running()). This misses workers that are busy
processing a work item but are sleeping or blocked — for example, a
worker that clears PF_WQ_WORKER and enters wait_event_idle(). Such a
worker still occupies a pool slot and prevents progress, yet produces
an empty backtrace section in the watchdog output.

This happens on real arm64 systems, which lack NMI: there,
toggle_allocation_gate() IPIs every single CPU in the machine, causing
workqueue stalls that show empty backtraces because
toggle_allocation_gate() is sleeping in wait_event_idle().

Remove the task_is_running() filter so every in-flight worker in the
pool's busy_hash is dumped. The busy_hash is protected by pool->lock,
which is already held.
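The effect of dropping the filter can be illustrated with a small userspace sketch (plain C, not kernel code; the worker struct, state names, and counting helpers below are illustrative stand-ins, not the actual kernel types):

```c
#include <stddef.h>

/* Illustrative stand-ins for kernel task state, not real kernel types. */
enum task_state { TASK_RUNNING, TASK_SLEEPING };

struct worker { enum task_state state; };

static int task_is_running(const struct worker *w)
{
	return w->state == TASK_RUNNING;
}

/* Old behavior: only workers currently running on a CPU are dumped, so a
 * pool stalled by a sleeping worker yields an empty backtrace section. */
static size_t count_dumped_old(const struct worker *busy, size_t n)
{
	size_t dumped = 0;

	for (size_t i = 0; i < n; i++)
		if (task_is_running(&busy[i]))
			dumped++;
	return dumped;
}

/* New behavior: every in-flight worker in the pool's busy set is dumped,
 * including ones sleeping in wait_event_idle(). */
static size_t count_dumped_new(const struct worker *busy, size_t n)
{
	(void)busy;
	return n;
}
```

With one running and one sleeping busy worker, the old filter reports only the running one, while the new behavior reports both.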

Signed-off-by: Breno Leitao <leitao@debian.org>
Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>

Authored by Breno Leitao, committed by Tejun Heo
8823eaef e8e14ac7

+13 -15
kernel/workqueue.c
@@ -7583,9 +7583,9 @@
 
 /*
  * Show workers that might prevent the processing of pending work items.
- * The only candidates are CPU-bound workers in the running state.
- * Pending work items should be handled by another idle worker
- * in all other situations.
+ * A busy worker that is not running on the CPU (e.g. sleeping in
+ * wait_event_idle() with PF_WQ_WORKER cleared) can stall the pool just as
+ * effectively as a CPU-bound one, so dump every in-flight worker.
  */
 static void show_cpu_pool_hog(struct worker_pool *pool)
 {
@@ -7596,19 +7596,17 @@
 	raw_spin_lock_irqsave(&pool->lock, irq_flags);
 
 	hash_for_each(pool->busy_hash, bkt, worker, hentry) {
-		if (task_is_running(worker->task)) {
-			/*
-			 * Defer printing to avoid deadlocks in console
-			 * drivers that queue work while holding locks
-			 * also taken in their write paths.
-			 */
-			printk_deferred_enter();
+		/*
+		 * Defer printing to avoid deadlocks in console
+		 * drivers that queue work while holding locks
+		 * also taken in their write paths.
+		 */
+		printk_deferred_enter();
 
-			pr_info("pool %d:\n", pool->id);
-			sched_show_task(worker->task);
+		pr_info("pool %d:\n", pool->id);
+		sched_show_task(worker->task);
 
-			printk_deferred_exit();
-		}
+		printk_deferred_exit();
 	}
 
 	raw_spin_unlock_irqrestore(&pool->lock, irq_flags);
@@ -7617,7 +7619,7 @@
 	struct worker_pool *pool;
 	int pi;
 
-	pr_info("Showing backtraces of running workers in stalled CPU-bound worker pools:\n");
+	pr_info("Showing backtraces of busy workers in stalled CPU-bound worker pools:\n");
 
 	rcu_read_lock();