Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

[PATCH] notifiers: fix blocking_notifier_call_chain() scalability

While lock-profiling the -rt kernel I noticed weird contention during
mmap-intense workloads, and the tracer showed the following gem, in one
of our MM hotpaths:

threaded-2771 1.... 65us : sys_munmap (sysenter_do_call)
threaded-2771 1.... 66us : profile_munmap (sys_munmap)
threaded-2771 1.... 66us : blocking_notifier_call_chain (profile_munmap)
threaded-2771 1.... 66us : rt_down_read (blocking_notifier_call_chain)

Ouch! A global rw-semaphore taken in one of the most
performance-sensitive codepaths of the kernel. And I don't even have
oprofile enabled! All distro kernels have CONFIG_PROFILING enabled, so
this scalability problem affects the majority of Linux users.

The fix is to enhance blocking_notifier_call_chain() to only take the
lock if there appears to be work on the call-chain.

With this patch applied I get a nicely saturated system, and much higher
munmap performance, on SMP systems.

And as a bonus this also fixes a similar scalability bottleneck in the
thread-exit codepath: profile_task_exit() ...

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Authored by Ingo Molnar and committed by Linus Torvalds
1b5180b6 b53d0b91

+11 -4
kernel/sys.c
@@ -323,11 +323,18 @@
 int blocking_notifier_call_chain(struct blocking_notifier_head *nh,
 		unsigned long val, void *v)
 {
-	int ret;
+	int ret = NOTIFY_DONE;
 
-	down_read(&nh->rwsem);
-	ret = notifier_call_chain(&nh->head, val, v);
-	up_read(&nh->rwsem);
+	/*
+	 * We check the head outside the lock, but if this access is
+	 * racy then it does not matter what the result of the test
+	 * is, we re-check the list after having taken the lock anyway:
+	 */
+	if (rcu_dereference(nh->head)) {
+		down_read(&nh->rwsem);
+		ret = notifier_call_chain(&nh->head, val, v);
+		up_read(&nh->rwsem);
+	}
 	return ret;
 }
 