Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux
1
fork

Configure Feed

Select the types of activity you want to include in your feed.

mm/vma: rename is_vma_write_only(), separate out shared refcount put

The is_vma_writer_only() function is misnamed - this isn't determining if
there is only a write lock, as it checks for the presence of the
VM_REFCNT_EXCLUDE_READERS_FLAG.

Really, it is checking to see whether readers are excluded, with a
possibility of a false positive in the case of a detachment (there we
expect the vma->vm_refcnt to eventually be set to
VM_REFCNT_EXCLUDE_READERS_FLAG, whereas for an attached VMA we expect it
to eventually be set to VM_REFCNT_EXCLUDE_READERS_FLAG + 1).

Rename the function accordingly.

Relatedly, we use a __refcount_dec_and_test() primitive directly in
vma_refcount_put(), using the old value to determine what the reference
count ought to be after the operation is complete (ignoring racing
reference count adjustments).

Wrap this into a __vma_refcount_put_return() function, which we can then
utilise in vma_mark_detached() and thus keep the refcount primitive usage
abstracted.

This function, as the name implies, returns the value after the reference
count has been updated.

This reduces duplication in the two invocations of this function.

Also adjust comments, removing duplicative comments covered elsewhere and
adding more to aid understanding.

No functional change intended.

Link: https://lkml.kernel.org/r/32053580bff460eb1092ef780b526cefeb748bad.1769198904.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Waiman Long <longman@redhat.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

authored by

Lorenzo Stoakes and committed by
Andrew Morton
180355d4 ef4c0cea

+65 -22
+55 -15
include/linux/mmap_lock.h
··· 122 122 vma->vm_lock_seq = UINT_MAX; 123 123 } 124 124 125 - static inline bool is_vma_writer_only(int refcnt) 125 + /* 126 + * This function determines whether the input VMA reference count describes a 127 + * VMA which has excluded all VMA read locks. 128 + * 129 + * In the case of a detached VMA, we may incorrectly indicate that readers are 130 + * excluded when one remains, because in that scenario we target a refcount of 131 + * VM_REFCNT_EXCLUDE_READERS_FLAG, rather than the attached target of 132 + * VM_REFCNT_EXCLUDE_READERS_FLAG + 1. 133 + * 134 + * However, the race window for that is very small so it is unlikely. 135 + * 136 + * Returns: true if readers are excluded, false otherwise. 137 + */ 138 + static inline bool __vma_are_readers_excluded(int refcnt) 126 139 { 127 140 /* 128 - * With a writer and no readers, refcnt is VM_REFCNT_EXCLUDE_READERS_FLAG 129 - * if the vma is detached and (VM_REFCNT_EXCLUDE_READERS_FLAG + 1) if it is 130 - * attached. Waiting on a detached vma happens only in 131 - * vma_mark_detached() and is a rare case, therefore most of the time 132 - * there will be no unnecessary wakeup. 133 - * 134 141 * See the comment describing the vm_area_struct->vm_refcnt field for 135 142 * details of possible refcnt values. 136 143 */ ··· 145 138 refcnt <= VM_REFCNT_EXCLUDE_READERS_FLAG + 1; 146 139 } 147 140 148 - static inline void vma_refcount_put(struct vm_area_struct *vma) 141 + /* 142 + * Actually decrement the VMA reference count. 143 + * 144 + * The function returns the reference count as it was immediately after the 145 + * decrement took place. If it returns zero, the VMA is now detached. 146 + */ 147 + static inline __must_check unsigned int 148 + __vma_refcount_put_return(struct vm_area_struct *vma) 149 149 { 150 - /* Use a copy of vm_mm in case vma is freed after we drop vm_refcnt */ 151 - struct mm_struct *mm = vma->vm_mm; 152 150 int oldcnt; 153 151 154 - rwsem_release(&vma->vmlock_dep_map, _RET_IP_); 155 - if (!__refcount_dec_and_test(&vma->vm_refcnt, &oldcnt)) { 152 + if (__refcount_dec_and_test(&vma->vm_refcnt, &oldcnt)) 153 + return 0; 156 154 157 - if (is_vma_writer_only(oldcnt - 1)) 158 - rcuwait_wake_up(&mm->vma_writer_wait); 159 - } 155 + return oldcnt - 1; 156 + } 157 + 158 + /** 159 + * vma_refcount_put() - Drop reference count in VMA vm_refcnt field due to a 160 + * read-lock being dropped. 161 + * @vma: The VMA whose reference count we wish to decrement. 162 + * 163 + * If we were the last reader, wake up threads waiting to obtain an exclusive 164 + * lock. 165 + */ 166 + static inline void vma_refcount_put(struct vm_area_struct *vma) 167 + { 168 + /* Use a copy of vm_mm in case vma is freed after we drop vm_refcnt. */ 169 + struct mm_struct *mm = vma->vm_mm; 170 + int newcnt; 171 + 172 + rwsem_release(&vma->vmlock_dep_map, _RET_IP_); 173 + 174 + newcnt = __vma_refcount_put_return(vma); 175 + /* 176 + * __vma_enter_locked() may be sleeping waiting for readers to drop 177 + * their reference count, so wake it up if we were the last reader 178 + * blocking it from being acquired. 179 + * 180 + * We may be raced by other readers temporarily incrementing the 181 + * reference count, though the race window is very small, this might 182 + * cause spurious wakeups. 183 + */ 184 + if (newcnt && __vma_are_readers_excluded(newcnt)) 185 + rcuwait_wake_up(&mm->vma_writer_wait); 160 186 } 161 187 162 188 /*
+10 -7
mm/mmap_lock.c
··· 134 134 vma_assert_attached(vma); 135 135 136 136 /* 137 - * We are the only writer, so no need to use vma_refcount_put(). 138 - * The condition below is unlikely because the vma has been already 139 - * write-locked and readers can increment vm_refcnt only temporarily 140 - * before they check vm_lock_seq, realize the vma is locked and drop 141 - * back the vm_refcnt. That is a narrow window for observing a raised 142 - * vm_refcnt. 137 + * This condition - that the VMA is still attached (refcnt > 0) - is 138 + * unlikely, because the vma has been already write-locked and readers 139 + * can increment vm_refcnt only temporarily before they check 140 + * vm_lock_seq, realize the vma is locked and drop back the 141 + * vm_refcnt. That is a narrow window for observing a raised vm_refcnt. 143 142 * 144 143 * See the comment describing the vm_area_struct->vm_refcnt field for 145 144 * details of possible refcnt values. 146 145 */ 147 - if (unlikely(!refcount_dec_and_test(&vma->vm_refcnt))) { 146 + if (unlikely(__vma_refcount_put_return(vma))) { 148 147 /* Wait until vma is detached with no readers. */ 149 148 if (__vma_enter_locked(vma, true, TASK_UNINTERRUPTIBLE)) { 150 149 bool detached; 151 150 151 + /* 152 + * Once this is complete, no readers can increment the 153 + * reference count, and the VMA is marked detached. 154 + */ 152 155 __vma_exit_locked(vma, &detached); 153 156 WARN_ON_ONCE(!detached); 154 157 }