Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

hold task->mempolicy while numa_maps scans.

/proc/<pid>/numa_maps scans vmas and shows their mempolicy under
mmap_sem. It sometimes accesses task->mempolicy, which can
be freed without mmap_sem held, so numa_maps can show
garbage while scanning.

This patch takes a reference on task->mempolicy when numa_maps is read,
before get_vma_policy() is called. As a result, the object that
task->mempolicy pointed to will not be freed until numa_maps reaches its end.
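
The idea of pinning the policy object can be sketched in userspace C. This is only an illustration under stated assumptions, not the kernel code: `struct policy`, `struct task`, `policy_get()`, `policy_put()`, `hold_task_policy()`, `task_exit()` and the `policies_freed` counter are hypothetical stand-ins, and a pthread mutex plays the role of task_lock().

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct mempolicy. */
struct policy {
	atomic_int refcnt;
	int mode;
};

/* Hypothetical stand-in for task_struct; lock plays the role of task_lock(). */
struct task {
	pthread_mutex_t lock;
	struct policy *policy;
};

static int policies_freed;	/* instrumentation for this sketch only */

static void policy_get(struct policy *p)
{
	if (p)
		atomic_fetch_add(&p->refcnt, 1);
}

static void policy_put(struct policy *p)
{
	/* Free the object when the last reference is dropped. */
	if (p && atomic_fetch_sub(&p->refcnt, 1) == 1) {
		policies_freed++;
		free(p);
	}
}

/* Read the pointer once under the lock, pin it, and remember it. */
static struct policy *hold_task_policy(struct task *t)
{
	struct policy *p;

	pthread_mutex_lock(&t->lock);
	p = t->policy;
	policy_get(p);
	pthread_mutex_unlock(&t->lock);
	return p;
}

/* Exit path: clear the pointer under the lock, then drop the task's ref. */
static void task_exit(struct task *t)
{
	struct policy *p;

	pthread_mutex_lock(&t->lock);
	p = t->policy;
	t->policy = NULL;
	pthread_mutex_unlock(&t->lock);
	policy_put(p);
}

/* The pinned object stays valid even though the task exits in between. */
static int demo_hold_across_exit(void)
{
	struct task t = { .lock = PTHREAD_MUTEX_INITIALIZER, .policy = NULL };
	struct policy *held;
	int mode;

	t.policy = malloc(sizeof(*t.policy));
	if (!t.policy)
		return -1;
	atomic_init(&t.policy->refcnt, 1);
	t.policy->mode = 2;

	held = hold_task_policy(&t);	/* two references now */
	task_exit(&t);			/* task's reference dropped, object kept */
	assert(policies_freed == 0);
	mode = held->mode;		/* still safe to read */
	policy_put(held);		/* last reference, object freed */
	return mode;
}
```

The reader keeps using the remembered pointer rather than re-reading the task's field, which is why the exit path clearing that field cannot leave the reader with a dangling pointer.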

V2->V3
- updated comments to be more verbose.
- removed task_lock() in numa_maps code.
V1->V2
- access task->mempolicy only once and remember it, because kernel/exit.c
can overwrite it.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Authored by KAMEZAWA Hiroyuki and committed by Linus Torvalds
9e781440 3b641bf4

+51 -3
+4
fs/proc/internal.h
···
 #include <linux/sched.h>
 #include <linux/proc_fs.h>
 struct ctl_table_header;
+struct mempolicy;
 
 extern struct proc_dir_entry proc_root;
 #ifdef CONFIG_PROC_SYSCTL
···
 	struct task_struct *task;
 #ifdef CONFIG_MMU
 	struct vm_area_struct *tail_vma;
+#endif
+#ifdef CONFIG_NUMA
+	struct mempolicy *task_mempolicy;
 #endif
 };
 
+47 -3
fs/proc/task_mmu.c
···
 	seq_printf(m, "%*c", len, ' ');
 }
 
+#ifdef CONFIG_NUMA
+/*
+ * These functions are for numa_maps but called in generic **maps seq_file
+ * ->start(), ->stop() ops.
+ *
+ * numa_maps scans all vmas under mmap_sem and checks their mempolicy.
+ * Each mempolicy object is controlled by reference counting. The problem here
+ * is how to avoid accessing a dead mempolicy object.
+ *
+ * Because we're holding mmap_sem while reading seq_file, it's safe to access
+ * each vma's mempolicy: no vma object will ever drop its ref to a mempolicy.
+ *
+ * A task's mempolicy (task->mempolicy) has different behavior. task->mempolicy
+ * is set and replaced under mmap_sem but unrefed and cleared under task_lock().
+ * So, without task_lock(), we cannot trust get_vma_policy() because we cannot
+ * guarantee the task never exits under us. But taking task_lock() around
+ * get_vma_policy() causes a lock order problem.
+ *
+ * To access task->mempolicy without a lock, we hold a reference count of the
+ * object pointed to by task->mempolicy and remember it. This guarantees
+ * that task->mempolicy points to a live object or NULL in numa_maps accesses.
+ */
+static void hold_task_mempolicy(struct proc_maps_private *priv)
+{
+	struct task_struct *task = priv->task;
+
+	task_lock(task);
+	priv->task_mempolicy = task->mempolicy;
+	mpol_get(priv->task_mempolicy);
+	task_unlock(task);
+}
+static void release_task_mempolicy(struct proc_maps_private *priv)
+{
+	mpol_put(priv->task_mempolicy);
+}
+#else
+static void hold_task_mempolicy(struct proc_maps_private *priv)
+{
+}
+static void release_task_mempolicy(struct proc_maps_private *priv)
+{
+}
+#endif
+
 static void vma_stop(struct proc_maps_private *priv, struct vm_area_struct *vma)
 {
 	if (vma && vma != priv->tail_vma) {
 		struct mm_struct *mm = vma->vm_mm;
+		release_task_mempolicy(priv);
 		up_read(&mm->mmap_sem);
 		mmput(mm);
 	}
···
 
 	tail_vma = get_gate_vma(priv->task->mm);
 	priv->tail_vma = tail_vma;
-
+	hold_task_mempolicy(priv);
 	/* Start with last addr hint */
 	vma = find_vma(mm, last_addr);
 	if (last_addr && vma) {
···
 	if (vma)
 		return vma;
 
+	release_task_mempolicy(priv);
 	/* End of vmas has been reached */
 	m->version = (tail_vma != NULL)? 0: -1UL;
 	up_read(&mm->mmap_sem);
···
 	walk.private = md;
 	walk.mm = mm;
 
-	task_lock(task);
 	pol = get_vma_policy(task, vma, vma->vm_start);
 	mpol_to_str(buffer, sizeof(buffer), pol, 0);
 	mpol_cond_put(pol);
-	task_unlock(task);
 
 	seq_printf(m, "%08lx %s", vma->vm_start, buffer);
 
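
The patch releases the reference in two places: in the stop path, and in the start path when the end of the vmas has been reached and the lock is dropped early. A much simplified userspace model of that pairing is sketched below. All names here (`struct cursor`, `cursor_start()`, `cursor_stop()`, `demo_walk()`) are hypothetical, the flags merely stand in for mmap_sem and the pinned mempolicy, and the end-of-list case returns NULL rather than a tail vma.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical iterator state; the two flags model resources, not real locks. */
struct cursor {
	const int *items;
	size_t count;
	int ref_held;	/* stands in for the pinned task->mempolicy */
	int lock_held;	/* stands in for mmap_sem held for read */
};

static void hold_ref(struct cursor *c)
{
	assert(!c->ref_held);	/* never pin twice */
	c->ref_held = 1;
}

static void release_ref(struct cursor *c)
{
	assert(c->ref_held);	/* never release without a prior hold */
	c->ref_held = 0;
}

/*
 * Start: take the lock and pin the reference. If there is nothing left to
 * show, undo both here, because stop will not do it for a NULL item.
 */
static const int *cursor_start(struct cursor *c, size_t pos)
{
	c->lock_held = 1;
	hold_ref(c);
	if (pos < c->count)
		return &c->items[pos];

	release_ref(c);
	c->lock_held = 0;
	return NULL;
}

/* Stop: release only if start handed out a real item. */
static void cursor_stop(struct cursor *c, const int *item)
{
	if (item) {
		release_ref(c);
		c->lock_held = 0;
	}
}

/* Returns the item seen (-1 at end), or -2 if anything was left held. */
static int demo_walk(size_t start_pos)
{
	static const int items[] = { 10, 20, 30 };
	struct cursor c = { items, 3, 0, 0 };
	const int *it = cursor_start(&c, start_pos);
	int seen = it ? *it : -1;

	cursor_stop(&c, it);
	return (!c.ref_held && !c.lock_held) ? seen : -2;
}
```

In this model every call to `cursor_start()` is matched by exactly one release, whichever path is taken, which is the property the added `release_task_mempolicy()` calls are meant to preserve.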