Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

mm owner: fix race between swapoff and exit

There's a race between mm->owner assignment and swapoff, more easily
seen when task slab poisoning is turned on. The condition occurs when
try_to_unuse() runs in parallel with an exiting task. A similar race
can occur with callers of get_task_mm(), such as /proc/<pid>/<mmstats>
or ptrace or page migration.

CPU0                                    CPU1
try_to_unuse
  looks at mm = task0->mm
  increments mm->mm_users
                                        task 0 exits
                                        mm->owner needs to be updated, but no
                                        new owner is found (mm_users > 1, but
                                        no other task has task->mm = task0->mm)
                                        mm_update_next_owner() leaves
mmput(mm) decrements mm->mm_users
                                        task0 freed
dereferencing mm->owner fails

The fix is to notify the subsystem via the mm_owner_changed() callback
when no new owner is found, passing NULL as the new task.

Jiri Slaby:
mm->owner was set to NULL before calling cgroup_mm_owner_callbacks(), but it
must be set afterwards, so that NULL is not passed as the old owner, which
would cause an oops.

Daisuke Nishimura:
mm_update_next_owner() may set mm->owner to NULL, so mem_cgroup_from_task()
and its callers must account for that case to avoid an oops.

Hugh Dickins:
A lockdep warning and a hang showed up below exec_mmap() when testing these
patches. exit_mm() does up_read() on mmap_sem before calling
mm_update_next_owner(), so exec_mmap() now needs to do the same. And with
that repositioning, there is no longer any point in mm_need_new_owner()
allowing for a NULL mm.

Reported-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Authored by Balbir Singh, committed by Linus Torvalds
31a78f23 bf5cb664
4 files changed, +31 -5

fs/exec.c (+1 -1)
@@ -752,11 +752,11 @@
 	tsk->active_mm = mm;
 	activate_mm(active_mm, mm);
 	task_unlock(tsk);
-	mm_update_next_owner(old_mm);
 	arch_pick_mmap_layout(mm);
 	if (old_mm) {
 		up_read(&old_mm->mmap_sem);
 		BUG_ON(active_mm != old_mm);
+		mm_update_next_owner(old_mm);
 		mmput(old_mm);
 		return 0;
 	}
kernel/cgroup.c (+3 -2)

@@ -2738,14 +2738,15 @@
  */
 void cgroup_mm_owner_callbacks(struct task_struct *old, struct task_struct *new)
 {
-	struct cgroup *oldcgrp, *newcgrp;
+	struct cgroup *oldcgrp, *newcgrp = NULL;
 
 	if (need_mm_owner_callback) {
 		int i;
 		for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
 			struct cgroup_subsys *ss = subsys[i];
 			oldcgrp = task_cgroup(old, ss->subsys_id);
-			newcgrp = task_cgroup(new, ss->subsys_id);
+			if (new)
+				newcgrp = task_cgroup(new, ss->subsys_id);
 			if (oldcgrp == newcgrp)
 				continue;
 			if (ss->mm_owner_changed)
kernel/exit.c (+10 -2)

@@ -583,8 +583,6 @@
 	 * If there are other users of the mm and the owner (us) is exiting
 	 * we need to find a new owner to take on the responsibility.
 	 */
-	if (!mm)
-		return 0;
 	if (atomic_read(&mm->mm_users) <= 1)
 		return 0;
 	if (mm->owner != p)
@@ -625,6 +627,16 @@
 	} while_each_thread(g, c);
 
 	read_unlock(&tasklist_lock);
+	/*
+	 * We found no owner yet mm_users > 1: this implies that we are
+	 * most likely racing with swapoff (try_to_unuse()) or /proc or
+	 * ptrace or page migration (get_task_mm()). Mark owner as NULL,
+	 * so that subsystems can understand the callback and take action.
+	 */
+	down_write(&mm->mmap_sem);
+	cgroup_mm_owner_callbacks(mm->owner, NULL);
+	mm->owner = NULL;
+	up_write(&mm->mmap_sem);
 	return;
 
 assign_new_owner:
mm/memcontrol.c (+17)

@@ -250,6 +250,14 @@
 
 struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p)
 {
+	/*
+	 * mm_update_next_owner() may clear mm->owner to NULL
+	 * if it races with swapoff, page migration, etc.
+	 * So this can be called with p == NULL.
+	 */
+	if (unlikely(!p))
+		return NULL;
+
 	return container_of(task_subsys_state(p, mem_cgroup_subsys_id),
 				struct mem_cgroup, css);
 }
@@ -557,6 +549,11 @@
 	if (likely(!memcg)) {
 		rcu_read_lock();
 		mem = mem_cgroup_from_task(rcu_dereference(mm->owner));
+		if (unlikely(!mem)) {
+			rcu_read_unlock();
+			kmem_cache_free(page_cgroup_cache, pc);
+			return 0;
+		}
 		/*
 		 * For every charge from the cgroup, increment reference count
 		 */
@@ -814,6 +801,10 @@
 
 	rcu_read_lock();
 	mem = mem_cgroup_from_task(rcu_dereference(mm->owner));
+	if (unlikely(!mem)) {
+		rcu_read_unlock();
+		return 0;
+	}
 	css_get(&mem->css);
 	rcu_read_unlock();
 