Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

hung_task: show the blocker task if the task is hung on semaphore

Inspired by mutex blocker tracking[1], this patch makes a trade-off to
balance the overhead and utility of the hung task detector.

Unlike mutexes, semaphores lack explicit ownership tracking, making it
challenging to identify the root cause of hangs. To address this, we
introduce a last_holder field to the semaphore structure, which is updated
when a task successfully calls down() and cleared during up().

The assumption is that if a task is blocked on a semaphore, the holders
must not have released it. While this does not guarantee that the last
holder is one of the current blockers, it likely provides a practical hint
for diagnosing semaphore-related stalls.
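
The bookkeeping described above can be modeled in user space. What follows is a minimal sketch under stated assumptions, not the kernel code: `struct model_sem`, `model_down()`, `model_up()` and the integer task IDs are hypothetical stand-ins for `struct semaphore`, `down()`, `up()` and `current`, and all locking and sleeping is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; a nonzero task ID stands in for `current`. */
struct model_sem {
	unsigned int count;        /* free slots */
	unsigned long last_holder; /* 0 means "no hint available" */
};

/* Non-blocking acquire: on success, record the caller as the last holder. */
static bool model_down(struct model_sem *sem, unsigned long task)
{
	assert(task != 0);         /* 0 is reserved for "no hint" */
	if (sem->count == 0)
		return false;      /* the real down() would sleep here */
	sem->count--;
	sem->last_holder = task;
	return true;
}

/* Release: clear the hint only if the caller is the recorded holder. */
static void model_up(struct model_sem *sem, unsigned long task)
{
	if (sem->last_holder == task)
		sem->last_holder = 0;
	sem->count++;
}
```

With an initial count of 2, the field only ever names the most recent acquirer, so an earlier holder that still owns a slot is invisible to the detector. That is why the report says "likely last held by" rather than "owned by".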

With this change, the hung task detector can now show the blocker task's
info, as in the output below:

[Tue Apr 8 12:19:07 2025] INFO: task cat:945 blocked for more than 120 seconds.
[Tue Apr 8 12:19:07 2025] Tainted: G E 6.14.0-rc6+ #1
[Tue Apr 8 12:19:07 2025] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Tue Apr 8 12:19:07 2025] task:cat state:D stack:0 pid:945 tgid:945 ppid:828 task_flags:0x400000 flags:0x00000000
[Tue Apr 8 12:19:07 2025] Call Trace:
[Tue Apr 8 12:19:07 2025] <TASK>
[Tue Apr 8 12:19:07 2025] __schedule+0x491/0xbd0
[Tue Apr 8 12:19:07 2025] schedule+0x27/0xf0
[Tue Apr 8 12:19:07 2025] schedule_timeout+0xe3/0xf0
[Tue Apr 8 12:19:07 2025] ? __folio_mod_stat+0x2a/0x80
[Tue Apr 8 12:19:07 2025] ? set_ptes.constprop.0+0x27/0x90
[Tue Apr 8 12:19:07 2025] __down_common+0x155/0x280
[Tue Apr 8 12:19:07 2025] down+0x53/0x70
[Tue Apr 8 12:19:07 2025] read_dummy_semaphore+0x23/0x60
[Tue Apr 8 12:19:07 2025] full_proxy_read+0x5f/0xa0
[Tue Apr 8 12:19:07 2025] vfs_read+0xbc/0x350
[Tue Apr 8 12:19:07 2025] ? __count_memcg_events+0xa5/0x140
[Tue Apr 8 12:19:07 2025] ? count_memcg_events.constprop.0+0x1a/0x30
[Tue Apr 8 12:19:07 2025] ? handle_mm_fault+0x180/0x260
[Tue Apr 8 12:19:07 2025] ksys_read+0x66/0xe0
[Tue Apr 8 12:19:07 2025] do_syscall_64+0x51/0x120
[Tue Apr 8 12:19:07 2025] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[Tue Apr 8 12:19:07 2025] RIP: 0033:0x7f419478f46e
[Tue Apr 8 12:19:07 2025] RSP: 002b:00007fff1c4d2668 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[Tue Apr 8 12:19:07 2025] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f419478f46e
[Tue Apr 8 12:19:07 2025] RDX: 0000000000020000 RSI: 00007f4194683000 RDI: 0000000000000003
[Tue Apr 8 12:19:07 2025] RBP: 00007f4194683000 R08: 00007f4194682010 R09: 0000000000000000
[Tue Apr 8 12:19:07 2025] R10: fffffffffffffbc5 R11: 0000000000000246 R12: 0000000000000000
[Tue Apr 8 12:19:07 2025] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000
[Tue Apr 8 12:19:07 2025] </TASK>
[Tue Apr 8 12:19:07 2025] INFO: task cat:945 blocked on a semaphore likely last held by task cat:938
[Tue Apr 8 12:19:07 2025] task:cat state:S stack:0 pid:938 tgid:938 ppid:584 task_flags:0x400000 flags:0x00000000
[Tue Apr 8 12:19:07 2025] Call Trace:
[Tue Apr 8 12:19:07 2025] <TASK>
[Tue Apr 8 12:19:07 2025] __schedule+0x491/0xbd0
[Tue Apr 8 12:19:07 2025] ? _raw_spin_unlock_irqrestore+0xe/0x40
[Tue Apr 8 12:19:07 2025] schedule+0x27/0xf0
[Tue Apr 8 12:19:07 2025] schedule_timeout+0x77/0xf0
[Tue Apr 8 12:19:07 2025] ? __pfx_process_timeout+0x10/0x10
[Tue Apr 8 12:19:07 2025] msleep_interruptible+0x49/0x60
[Tue Apr 8 12:19:07 2025] read_dummy_semaphore+0x2d/0x60
[Tue Apr 8 12:19:07 2025] full_proxy_read+0x5f/0xa0
[Tue Apr 8 12:19:07 2025] vfs_read+0xbc/0x350
[Tue Apr 8 12:19:07 2025] ? __count_memcg_events+0xa5/0x140
[Tue Apr 8 12:19:07 2025] ? count_memcg_events.constprop.0+0x1a/0x30
[Tue Apr 8 12:19:07 2025] ? handle_mm_fault+0x180/0x260
[Tue Apr 8 12:19:07 2025] ksys_read+0x66/0xe0
[Tue Apr 8 12:19:07 2025] do_syscall_64+0x51/0x120
[Tue Apr 8 12:19:07 2025] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[Tue Apr 8 12:19:07 2025] RIP: 0033:0x7f7c584a646e
[Tue Apr 8 12:19:07 2025] RSP: 002b:00007ffdba8ce158 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[Tue Apr 8 12:19:07 2025] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f7c584a646e
[Tue Apr 8 12:19:07 2025] RDX: 0000000000020000 RSI: 00007f7c5839a000 RDI: 0000000000000003
[Tue Apr 8 12:19:07 2025] RBP: 00007f7c5839a000 R08: 00007f7c58399010 R09: 0000000000000000
[Tue Apr 8 12:19:07 2025] R10: fffffffffffffbc5 R11: 0000000000000246 R12: 0000000000000000
[Tue Apr 8 12:19:07 2025] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000
[Tue Apr 8 12:19:07 2025] </TASK>

[1] https://lore.kernel.org/all/174046694331.2194069.15472952050240807469.stgit@mhiramat.tok.corp.google.com

Link: https://lkml.kernel.org/r/20250414145945.84916-3-ioworker0@gmail.com
Signed-off-by: Mingzhe Yang <mingzhe.yang@ly.com>
Signed-off-by: Lance Yang <ioworker0@gmail.com>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Anna Schumaker <anna.schumaker@oracle.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joel Granados <joel.granados@kernel.org>
Cc: John Stultz <jstultz@google.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tomasz Figa <tfiga@chromium.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yongliang Gao <leonylgao@tencent.com>
Cc: Zi Li <amaindex@outlook.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Authored by Lance Yang and committed by Andrew Morton
194a9b9e e711faaa

+106 -18
+14 -1
include/linux/semaphore.h
···
 	raw_spinlock_t lock;
 	unsigned int count;
 	struct list_head wait_list;
+
+#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
+	unsigned long last_holder;
+#endif
 };
+
+#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
+#define __LAST_HOLDER_SEMAPHORE_INITIALIZER \
+	, .last_holder = 0UL
+#else
+#define __LAST_HOLDER_SEMAPHORE_INITIALIZER
+#endif

 #define __SEMAPHORE_INITIALIZER(name, n) \
 { \
 	.lock = __RAW_SPIN_LOCK_UNLOCKED((name).lock), \
 	.count = n, \
-	.wait_list = LIST_HEAD_INIT((name).wait_list), \
+	.wait_list = LIST_HEAD_INIT((name).wait_list) \
+	__LAST_HOLDER_SEMAPHORE_INITIALIZER \
 }

 /*
···
 extern int __must_check down_trylock(struct semaphore *sem);
 extern int __must_check down_timeout(struct semaphore *sem, long jiffies);
 extern void up(struct semaphore *sem);
+extern unsigned long sem_last_holder(struct semaphore *sem);

 #endif /* __LINUX_SEMAPHORE_H */
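
The initializer change in this header relies on a small preprocessor idiom: the optional member's initializer carries its own leading comma, so the main macro needs no #ifdef of its own. A standalone sketch of that idiom follows; `HAVE_LAST_HOLDER`, `struct demo_sem` and the `DEMO_*` macros are hypothetical names chosen for illustration, with `HAVE_LAST_HOLDER` standing in for the Kconfig option.

```c
#define HAVE_LAST_HOLDER 1 /* hypothetical switch, stands in for the Kconfig option */

struct demo_sem {
	unsigned int count;
#if HAVE_LAST_HOLDER
	unsigned long last_holder;
#endif
};

#if HAVE_LAST_HOLDER
/* The fragment brings its own leading comma ... */
#define DEMO_LAST_HOLDER_INIT , .last_holder = 0UL
#else
/* ... and expands to nothing when the member does not exist. */
#define DEMO_LAST_HOLDER_INIT
#endif

/* The member just before the fragment must not end with a comma. */
#define DEMO_SEM_INIT(n) { .count = (n) DEMO_LAST_HOLDER_INIT }

static struct demo_sem demo = DEMO_SEM_INIT(3);
```

This is why the diff drops the trailing comma after the `wait_list` initializer: with the option disabled, the macro expands exactly as before.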
+41 -11
kernel/hung_task.c
···
 static void debug_show_blocker(struct task_struct *task)
 {
 	struct task_struct *g, *t;
-	unsigned long owner, blocker;
+	unsigned long owner, blocker, blocker_type;

 	RCU_LOCKDEP_WARN(!rcu_read_lock_held(), "No rcu lock held");

 	blocker = READ_ONCE(task->blocker);
-	if (!blocker ||
-	    hung_task_get_blocker_type(blocker) != BLOCKER_TYPE_MUTEX)
+	if (!blocker)
 		return;

-	owner = mutex_get_owner(
-		(struct mutex *)hung_task_blocker_to_lock(blocker));
+	blocker_type = hung_task_get_blocker_type(blocker);
+
+	switch (blocker_type) {
+	case BLOCKER_TYPE_MUTEX:
+		owner = mutex_get_owner(
+			(struct mutex *)hung_task_blocker_to_lock(blocker));
+		break;
+	case BLOCKER_TYPE_SEM:
+		owner = sem_last_holder(
+			(struct semaphore *)hung_task_blocker_to_lock(blocker));
+		break;
+	default:
+		WARN_ON_ONCE(1);
+		return;
+	}
+

 	if (unlikely(!owner)) {
-		pr_err("INFO: task %s:%d is blocked on a mutex, but the owner is not found.\n",
-			task->comm, task->pid);
+		switch (blocker_type) {
+		case BLOCKER_TYPE_MUTEX:
+			pr_err("INFO: task %s:%d is blocked on a mutex, but the owner is not found.\n",
+			       task->comm, task->pid);
+			break;
+		case BLOCKER_TYPE_SEM:
+			pr_err("INFO: task %s:%d is blocked on a semaphore, but the last holder is not found.\n",
+			       task->comm, task->pid);
+			break;
+		}
 		return;
 	}

 	/* Ensure the owner information is correct. */
 	for_each_process_thread(g, t) {
-		if ((unsigned long)t == owner) {
+		if ((unsigned long)t != owner)
+			continue;
+
+		switch (blocker_type) {
+		case BLOCKER_TYPE_MUTEX:
 			pr_err("INFO: task %s:%d is blocked on a mutex likely owned by task %s:%d.\n",
-				task->comm, task->pid, t->comm, t->pid);
-			sched_show_task(t);
-			return;
+			       task->comm, task->pid, t->comm, t->pid);
+			break;
+		case BLOCKER_TYPE_SEM:
+			pr_err("INFO: task %s:%d blocked on a semaphore likely last held by task %s:%d\n",
+			       task->comm, task->pid, t->comm, t->pid);
+			break;
 		}
+		sched_show_task(t);
+		return;
 	}
 }
 #else
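
debug_show_blocker() recovers both the lock type and the lock address from the single task->blocker word. The helpers it calls are defined in a header that this diff does not show. The sketch below therefore assumes the common technique of packing a small type tag into the low bits of an aligned pointer; all `demo_*` names and the constant values are illustrative assumptions, not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values only; the kernel's constants may differ. */
#define DEMO_TYPE_MUTEX ((uintptr_t)0x0)
#define DEMO_TYPE_SEM   ((uintptr_t)0x1)
#define DEMO_TYPE_MASK  ((uintptr_t)0x3)

/* Pack a type tag into the low two bits of a lock address. */
static uintptr_t demo_encode_blocker(const void *lock, uintptr_t type)
{
	uintptr_t addr = (uintptr_t)lock;

	/* Needs at least 4-byte alignment so the low two bits are free. */
	assert((addr & DEMO_TYPE_MASK) == 0);
	assert(type <= DEMO_TYPE_MASK);
	return addr | type;
}

/* Extract the type tag from an encoded blocker word. */
static uintptr_t demo_blocker_type(uintptr_t blocker)
{
	return blocker & DEMO_TYPE_MASK;
}

/* Strip the tag to recover the original lock address. */
static void *demo_blocker_to_lock(uintptr_t blocker)
{
	return (void *)(blocker & ~DEMO_TYPE_MASK);
}
```

Storing the tag inside the pointer lets a reader fetch type and address in one READ_ONCE()-style load, which fits a detector that inspects another task's state without taking its locks.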
+51 -6
kernel/locking/semaphore.c
···
 #include <linux/spinlock.h>
 #include <linux/ftrace.h>
 #include <trace/events/lock.h>
+#include <linux/hung_task.h>

 static noinline void __down(struct semaphore *sem);
 static noinline int __down_interruptible(struct semaphore *sem);
 static noinline int __down_killable(struct semaphore *sem);
 static noinline int __down_timeout(struct semaphore *sem, long timeout);
 static noinline void __up(struct semaphore *sem, struct wake_q_head *wake_q);
+
+#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
+static inline void hung_task_sem_set_holder(struct semaphore *sem)
+{
+	WRITE_ONCE((sem)->last_holder, (unsigned long)current);
+}
+
+static inline void hung_task_sem_clear_if_holder(struct semaphore *sem)
+{
+	if (READ_ONCE((sem)->last_holder) == (unsigned long)current)
+		WRITE_ONCE((sem)->last_holder, 0UL);
+}
+
+unsigned long sem_last_holder(struct semaphore *sem)
+{
+	return READ_ONCE(sem->last_holder);
+}
+#else
+static inline void hung_task_sem_set_holder(struct semaphore *sem)
+{
+}
+static inline void hung_task_sem_clear_if_holder(struct semaphore *sem)
+{
+}
+unsigned long sem_last_holder(struct semaphore *sem)
+{
+	return 0UL;
+}
+#endif
+
+static inline void __sem_acquire(struct semaphore *sem)
+{
+	sem->count--;
+	hung_task_sem_set_holder(sem);
+}

 /**
  * down - acquire the semaphore
···
 	might_sleep();
 	raw_spin_lock_irqsave(&sem->lock, flags);
 	if (likely(sem->count > 0))
-		sem->count--;
+		__sem_acquire(sem);
 	else
 		__down(sem);
 	raw_spin_unlock_irqrestore(&sem->lock, flags);
···
 	might_sleep();
 	raw_spin_lock_irqsave(&sem->lock, flags);
 	if (likely(sem->count > 0))
-		sem->count--;
+		__sem_acquire(sem);
 	else
 		result = __down_interruptible(sem);
 	raw_spin_unlock_irqrestore(&sem->lock, flags);
···
 	might_sleep();
 	raw_spin_lock_irqsave(&sem->lock, flags);
 	if (likely(sem->count > 0))
-		sem->count--;
+		__sem_acquire(sem);
 	else
 		result = __down_killable(sem);
 	raw_spin_unlock_irqrestore(&sem->lock, flags);
···
 	raw_spin_lock_irqsave(&sem->lock, flags);
 	count = sem->count - 1;
 	if (likely(count >= 0))
-		sem->count = count;
+		__sem_acquire(sem);
 	raw_spin_unlock_irqrestore(&sem->lock, flags);

 	return (count < 0);
···
 	might_sleep();
 	raw_spin_lock_irqsave(&sem->lock, flags);
 	if (likely(sem->count > 0))
-		sem->count--;
+		__sem_acquire(sem);
 	else
 		result = __down_timeout(sem, timeout);
 	raw_spin_unlock_irqrestore(&sem->lock, flags);
···
 	DEFINE_WAKE_Q(wake_q);

 	raw_spin_lock_irqsave(&sem->lock, flags);
+
+	hung_task_sem_clear_if_holder(sem);
+
 	if (likely(list_empty(&sem->wait_list)))
 		sem->count++;
 	else
···
 		raw_spin_unlock_irq(&sem->lock);
 		timeout = schedule_timeout(timeout);
 		raw_spin_lock_irq(&sem->lock);
-		if (waiter.up)
+		if (waiter.up) {
+			hung_task_sem_set_holder(sem);
 			return 0;
+		}
 	}

 timed_out:
···
 {
 	int ret;

+	hung_task_set_blocker(sem, BLOCKER_TYPE_SEM);
+
 	trace_contention_begin(sem, 0);
 	ret = ___down_common(sem, state, timeout);
 	trace_contention_end(sem, ret);
+
+	hung_task_clear_blocker();

 	return ret;
 }