Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

perf/core: Fix slow perf_event_task_exit() with LBR callstacks

I got a report that a task is stuck in perf_event_exit_task() waiting
for global_ctx_data_rwsem. On large systems with lots of threads,
perf_event_open() can be slow because it takes the lock for writing and
iterates over all threads in the system to allocate the context data.

Meanwhile it blocks the task exit path, which is especially problematic
under memory pressure.

  perf_event_open
    perf_event_alloc
      attach_perf_ctx_data
        attach_global_ctx_data
          percpu_down_write (global_ctx_data_rwsem)
            for_each_process_thread
              alloc_task_ctx_data
                                               do_exit
                                                 perf_event_exit_task
                                                   percpu_down_read (global_ctx_data_rwsem)

It should not hold the global_ctx_data_rwsem on the exit path. Let's
skip allocation for exiting tasks and free the data carefully.

Reported-by: Rosalie Fang <rosaliefang@google.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260112165157.1919624-1-namhyung@kernel.org

Authored by Namhyung Kim and committed by Peter Zijlstra
4960626f eebe6446

+18 -2
kernel/events/core.c
···
 		return -ENOMEM;
 
 	for (;;) {
-		if (try_cmpxchg((struct perf_ctx_data **)&task->perf_ctx_data, &old, cd)) {
+		if (try_cmpxchg(&task->perf_ctx_data, &old, cd)) {
 			if (old)
 				perf_free_ctx_data_rcu(old);
+			/*
+			 * Above try_cmpxchg() pairs with try_cmpxchg() from
+			 * detach_task_ctx_data() such that
+			 * if we race with perf_event_exit_task(), we must
+			 * observe PF_EXITING.
+			 */
+			if (task->flags & PF_EXITING) {
+				/* detach_task_ctx_data() may free it already */
+				if (try_cmpxchg(&task->perf_ctx_data, &cd, NULL))
+					perf_free_ctx_data_rcu(cd);
+			}
 			return 0;
 		}
 
···
 	/* Allocate everything */
 	scoped_guard (rcu) {
 		for_each_process_thread(g, p) {
+			if (p->flags & PF_EXITING)
+				continue;
 			cd = rcu_dereference(p->perf_ctx_data);
 			if (cd && !cd->global) {
 				cd->global = 1;
···
 
 	/*
 	 * Detach the perf_ctx_data for the system-wide event.
+	 *
+	 * Done without holding global_ctx_data_rwsem; typically
+	 * attach_global_ctx_data() will skip over this task, but otherwise
+	 * attach_task_ctx_data() will observe PF_EXITING.
 	 */
-	guard(percpu_read)(&global_ctx_data_rwsem);
 	detach_task_ctx_data(task);
 }
 
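
The lock-free handshake this patch relies on can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel code: it uses C11 atomics in place of try_cmpxchg(), plain free() in place of perf_free_ctx_data_rcu(), and omits the refcount and RCU handling entirely. All names prefixed with model_ or MODEL_ are hypothetical. The point it tries to show is that the attach side publishes the pointer and then checks the exiting flag, while the exit side sets the flag and then clears the pointer, so whichever order they run in, the data is freed exactly once.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

#define MODEL_PF_EXITING 0x4u	/* hypothetical stand-in for PF_EXITING */

struct model_ctx_data { int global; };

struct model_task {
	atomic_uint flags;
	_Atomic(struct model_ctx_data *) ctx_data;
};

static int model_freed;		/* counts frees so the demo can check them */

static void model_free(struct model_ctx_data *cd)
{
	free(cd);
	model_freed++;
}

/* Simplified model of the attach side (no refcount, no RCU). */
static int model_attach(struct model_task *t)
{
	struct model_ctx_data *cd = calloc(1, sizeof(*cd));
	struct model_ctx_data *old = NULL;

	if (!cd)
		return -1;

	for (;;) {
		/* On failure, old is updated to the current value and we retry. */
		if (atomic_compare_exchange_strong(&t->ctx_data, &old, cd)) {
			if (old)
				model_free(old);
			/* Published; now check whether the task is exiting. */
			if (atomic_load(&t->flags) & MODEL_PF_EXITING) {
				struct model_ctx_data *expected = cd;

				/* The exit side may already have taken it. */
				if (atomic_compare_exchange_strong(&t->ctx_data,
								   &expected, NULL))
					model_free(cd);
			}
			return 0;
		}
	}
}

/* Simplified model of the exit side: set the flag, then detach without a lock. */
static void model_exit(struct model_task *t)
{
	struct model_ctx_data *cd;

	atomic_fetch_or(&t->flags, MODEL_PF_EXITING);
	cd = atomic_load(&t->ctx_data);
	if (cd && atomic_compare_exchange_strong(&t->ctx_data, &cd, NULL))
		model_free(cd);
}

/* Runs both sides in the given order; returns how many frees happened. */
static int model_scenario(bool exit_first)
{
	struct model_task t;

	atomic_init(&t.flags, 0);
	atomic_init(&t.ctx_data, NULL);
	model_freed = 0;

	if (exit_first) {
		model_exit(&t);
		model_attach(&t);
	} else {
		model_attach(&t);
		model_exit(&t);
	}
	assert(atomic_load(&t.ctx_data) == NULL);
	return model_freed;
}
```

Calling model_scenario(false) and model_scenario(true) exercises the two sequential orderings. A real race interleaves the steps, and there the argument rests on the sequentially consistent ordering of the flag update and the compare-and-exchange, which this single-threaded demo does not exercise.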