Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

mm/vma: add and use vma_assert_stabilised()

Sometimes we wish to assert that a VMA is stable, that is - the VMA cannot
be changed underneath us. This will be the case if EITHER the VMA lock or
the mmap lock is held.

In order to do so, we introduce a new assert vma_assert_stabilised() -
this will make a lockdep assert if lockdep is enabled AND the VMA is
read-locked.

Currently lockdep tracking for VMA write locks is not implemented, so it
suffices to check in this case that we have either an mmap read or write
semaphore held.

Note that because the VMA lock uses the non-standard vmlock_dep_map naming
convention, we cannot use lockdep_assert_is_write_held() so have to open
code this ourselves via lockdep-asserting that
lock_is_held_type(&vma->vmlock_dep_map, 0).

We have to be careful here - for instance when merging a VMA, we use the
mmap write lock to stabilise the examination of adjacent VMAs which might
be simultaneously VMA read-locked whilst being faulted in.

If we were to assert VMA read lock using lockdep we would encounter an
incorrect lockdep assert.

Also, we have to be careful about asserting mmap locks are held - if we
try to address the above issue by first checking whether mmap lock is held
and if so asserting it via lockdep, we may find that we were raced by
another thread acquiring an mmap read lock simultaneously that either we
don't own (and thus can be released any time - so we are not stable) or
was indeed released since we last checked.
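
The distinction between "somebody holds the lock" and "we hold the lock" can be sketched in a small userspace model. This is not kernel code: `struct model_lock`, `model_is_locked()` and `model_is_held_by()` are hypothetical names, standing in for an `rwsem_is_locked()`-style check and a lockdep-style ownership check respectively.

```c
#include <stdbool.h>

/* Hypothetical userspace model: records which thread ids hold a lock. */
struct model_lock {
	int holders[4];		/* ids of threads currently holding the lock */
	int nr_holders;		/* number of valid entries in holders[] */
};

/* Analogue of rwsem_is_locked(): true if ANY thread holds the lock. */
static bool model_is_locked(const struct model_lock *lock)
{
	return lock->nr_holders > 0;
}

/* Analogue of a lockdep ownership check: true only if 'tid' holds it. */
static bool model_is_held_by(const struct model_lock *lock, int tid)
{
	for (int i = 0; i < lock->nr_holders; i++)
		if (lock->holders[i] == tid)
			return true;
	return false;
}
```

If only thread 2 holds the lock, `model_is_locked()` is true for everybody, yet thread 1 gains no stability from it, because thread 2 may release the lock at any time.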

So to deal with these complexities we end up with either a precise (if
lockdep is enabled) or imprecise (if not) approach. In the precise case
we use lockdep to check that the mmap lock is held, and thus that we own it.

If we do own it, then the check is complete, otherwise we must check for
the VMA read lock being held (VMA write lock implies mmap write lock so
the mmap lock suffices for this).

If lockdep is not enabled we simply check if the mmap lock is held and
risk a false negative (i.e. not asserting when we should do).
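
The decision described above can be modelled in plain C as follows. This is a sketch under stated assumptions, not the kernel implementation: `struct model_state` and `model_stabilised_check_passes()` are hypothetical, and the `vma_lock_held` field stands in for whatever `vma_assert_locked()` verifies.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the locking state seen by the caller. */
struct model_state {
	bool lockdep_enabled;		/* models CONFIG_LOCKDEP */
	bool mmap_lock_held_by_us;	/* we own an mmap read or write lock */
	bool mmap_lock_held_by_other;	/* another thread owns an mmap lock */
	bool vma_lock_held;		/* models the vma_assert_locked() condition */
};

/* Returns true if the modelled assert passes, false if it would fire. */
static bool model_stabilised_check_passes(const struct model_state *s)
{
	if (s->lockdep_enabled) {
		/* Precise: only a lock we own counts. */
		if (s->mmap_lock_held_by_us)
			return true;
	} else {
		/* Imprecise: any holder counts, even one that is not us. */
		if (s->mmap_lock_held_by_us || s->mmap_lock_held_by_other)
			return true;
	}

	/* Not stabilised by the mmap lock, so fall back to the VMA lock. */
	return s->vma_lock_held;
}
```

With only another thread holding the mmap lock and no VMA lock held, the model fires when `lockdep_enabled` is true and passes when it is false, which is the false negative described above.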

There are a couple of places in the kernel where we already do this
stabilisation check - the anon_vma_name() helper in mm/madvise.c and
vma_flag_set_atomic() in include/linux/mm.h, which we update to use
vma_assert_stabilised().

This change abstracts these into vma_assert_stabilised(), uses lockdep if
possible, and avoids a duplicate check of whether the mmap lock is held.

This is also self-documenting and lays the foundations for further VMA
stability checks in the code.

The only functional change here is adding the lockdep check.
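
As a rough sanity check of that claim, the old open-coded check and the new one can be compared exhaustively in a userspace model. All names here (`old_check_passes()`, `new_check_passes()`, `count_differences()`) are hypothetical and only mirror the shape of the logic in this patch.

```c
#include <stdbool.h>

/* Models the old open-coded check: any mmap lock holder, else the VMA lock. */
static bool old_check_passes(bool mmap_locked_by_anyone, bool vma_locked)
{
	if (!mmap_locked_by_anyone)
		return vma_locked;
	return true;
}

/* Models the new check: ownership with lockdep, any holder without it. */
static bool new_check_passes(bool lockdep, bool mmap_owned_by_us,
			     bool mmap_locked_by_anyone, bool vma_locked)
{
	if (lockdep ? mmap_owned_by_us : mmap_locked_by_anyone)
		return true;
	return vma_locked;
}

/* Counts the lock states in which the old and new checks disagree. */
static int count_differences(bool lockdep)
{
	int differences = 0;

	for (int bits = 0; bits < 8; bits++) {
		bool owned = bits & 1;	/* we own an mmap lock */
		bool other = bits & 2;	/* another thread owns an mmap lock */
		bool vma = bits & 4;	/* the VMA lock condition holds */
		bool anyone = owned || other;

		if (old_check_passes(anyone, vma) !=
		    new_check_passes(lockdep, owned, anyone, vma))
			differences++;
	}
	return differences;
}
```

In this model the two checks agree in every state when lockdep is off. With lockdep on they differ in exactly one state: another thread holds the mmap lock while we hold neither the mmap lock nor the VMA lock.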

Link: https://lkml.kernel.org/r/6c9e64bb2b56ddb6f806fde9237f8a00cb3a776b.1769198904.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Waiman Long <longman@redhat.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Authored by Lorenzo Stoakes and committed by Andrew Morton
17fd82c3 256c1193

+54 -7
+1 -4
include/linux/mm.h
···
 {
 	unsigned long *bitmap = ACCESS_PRIVATE(&vma->flags, __vma_flags);
 
-	/* mmap read lock/VMA read lock must be held. */
-	if (!rwsem_is_locked(&vma->vm_mm->mmap_lock))
-		vma_assert_locked(vma);
-
+	vma_assert_stabilised(vma);
 	if (__vma_flag_atomic_valid(vma, bit))
 		set_bit((__force int)bit, bitmap);
 }
+52
include/linux/mmap_lock.h
···
 	vma_assert_write_locked(vma);
 }
 
+/**
+ * vma_assert_stabilised() - assert that this VMA cannot be changed from
+ * underneath us either by having a VMA or mmap lock held.
+ * @vma: The VMA whose stability we wish to assess.
+ *
+ * If lockdep is enabled we can precisely ensure stability via either an mmap
+ * lock owned by us or a specific VMA lock.
+ *
+ * With lockdep disabled we may sometimes race with other threads acquiring the
+ * mmap read lock simultaneous with our VMA read lock.
+ */
+static inline void vma_assert_stabilised(struct vm_area_struct *vma)
+{
+	/*
+	 * If another thread owns an mmap lock, it may go away at any time, and
+	 * thus is no guarantee of stability.
+	 *
+	 * If lockdep is enabled we can accurately determine if an mmap lock is
+	 * held and owned by us. Otherwise we must approximate.
+	 *
+	 * It doesn't necessarily mean we are not stabilised however, as we may
+	 * hold a VMA read lock (not a write lock as this would require an owned
+	 * mmap lock).
+	 *
+	 * If (assuming lockdep is not enabled) we were to assert a VMA read
+	 * lock first we may also run into issues, as other threads can hold VMA
+	 * read locks simlutaneous to us.
+	 *
+	 * Therefore if lockdep is not enabled we risk a false negative (i.e. no
+	 * assert fired). If accurate checking is required, enable lockdep.
+	 */
+	if (IS_ENABLED(CONFIG_LOCKDEP)) {
+		if (lockdep_is_held(&vma->vm_mm->mmap_lock))
+			return;
+	} else {
+		if (rwsem_is_locked(&vma->vm_mm->mmap_lock))
+			return;
+	}
+
+	/*
+	 * We're not stabilised by the mmap lock, so assert that we're
+	 * stabilised by a VMA lock.
+	 */
+	vma_assert_locked(vma);
+}
+
 static inline bool vma_is_attached(struct vm_area_struct *vma)
 {
 	return refcount_read(&vma->vm_refcnt);
···
 
 static inline void vma_assert_locked(struct vm_area_struct *vma)
 {
+	mmap_assert_locked(vma->vm_mm);
+}
+
+static inline void vma_assert_stabilised(struct vm_area_struct *vma)
+{
+	/* If no VMA locks, then either mmap lock suffices to stabilise. */
 	mmap_assert_locked(vma->vm_mm);
 }
 
+1 -3
mm/madvise.c
···
 
 struct anon_vma_name *anon_vma_name(struct vm_area_struct *vma)
 {
-	if (!rwsem_is_locked(&vma->vm_mm->mmap_lock))
-		vma_assert_locked(vma);
-
+	vma_assert_stabilised(vma);
 	return vma->anon_name;
 }
 