
mm: Fix a hmm_range_fault() livelock / starvation problem

If hmm_range_fault() fails a folio_trylock() in do_swap_page() while
trying to acquire the lock of a device-private folio for migration
to RAM, the function will spin until it succeeds in grabbing the lock.
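
For illustration, the pre-fix pattern looks roughly like this (a
paraphrased sketch of the do_swap_page() device-private path, not the
exact upstream code):

	/* Paraphrased sketch; error handling and details elided. */
	if (folio_trylock(folio)) {
		/* ... migrate the device-private folio to system RAM ... */
	} else {
		/* Contended: back out of the fault without waiting. */
		pte_unmap_unlock(vmf->pte, vmf->ptl);
	}
	/*
	 * hmm_range_fault() sees an unresolved fault and retries
	 * immediately, so the task keeps spinning on this CPU until
	 * the trylock finally succeeds.
	 */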

However, if the process holding the lock depends on the completion
of a work item that is scheduled on the same CPU as the spinning
hmm_range_fault(), that work item may be starved and we end up in
a livelock / starvation situation that never resolves.

This can happen, for example, if the process holding the
device-private folio lock is stuck in
migrate_device_unmap()->lru_add_drain_all(),
since lru_add_drain_all() requires a short work item
to run on all online CPUs before it can complete.
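
For context, lru_add_drain_all() uses the classic per-CPU flush
pattern. A simplified sketch (condensed from mm/swap.c; the real
function has additional generation tracking and locking):

	for_each_online_cpu(cpu) {
		struct work_struct *work = &per_cpu(lru_add_drain_work, cpu);

		if (cpu_needs_drain(cpu)) {
			INIT_WORK(work, lru_add_drain_per_cpu);
			queue_work_on(cpu, mm_percpu_wq, work);
			__cpumask_set_cpu(cpu, &has_work);
		}
	}

	/* Blocks until every queued work item has run on its CPU. */
	for_each_cpu(cpu, &has_work)
		flush_work(&per_cpu(lru_add_drain_work, cpu));

If one of those CPUs is monopolized by the spinning hmm_range_fault()
caller, the corresponding flush_work() never returns.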

The prerequisites for this to happen are:
a) Both zone device and system memory folios are considered in
migrate_device_unmap(), so that there is a reason to call
lru_add_drain_all() for a system memory folio while a
folio lock is held on a zone device folio.
b) The zone device folio has an initial mapcount > 1 which causes
at least one migration PTE entry insertion to be deferred to
try_to_migrate(), which can happen after the call to
lru_add_drain_all().
c) No preemption, or voluntary preemption only.

This all seems pretty unlikely to happen, but it is indeed hit by
the "xe_exec_system_allocator" IGT test.

Resolve this by waiting for the folio to be unlocked when the
folio_trylock() in do_swap_page() fails.

Rename migration_entry_wait_on_locked() to
softleaf_entry_wait_on_locked() and update its documentation to
reflect the new use-case.
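
Condensed, the renamed function waits as sketched below (paraphrased
from mm/filemap.c; the real code also handles PSI/thrashing accounting
and the hashed-waitqueue wake protocol, and the entry-to-folio helper
name is assumed here):

	struct wait_page_queue wait_page;
	struct folio *folio = pfn_swap_entry_folio(entry);
	wait_queue_head_t *q = folio_waitqueue(folio);

	init_wait(&wait_page.wait);
	wait_page.folio = folio;
	wait_page.bit_nr = PG_locked;

	/* Queue on the folio waitqueue *before* dropping the ptl so the
	 * wakeup from folio_unlock() cannot be missed. */
	spin_lock_irq(&q->lock);
	folio_set_waiters(folio);
	if (!folio_trylock(folio))
		__add_wait_queue_entry_tail(q, &wait_page.wait);
	spin_unlock_irq(&q->lock);

	/*
	 * No folio reference is held across the sleep: whoever frees the
	 * folio must first take the ptl to remove @entry, so the folio
	 * stays valid until the ptl is dropped.
	 */
	spin_unlock(ptl);

	set_current_state(TASK_UNINTERRUPTIBLE);
	io_schedule();		/* woken by folio_unlock() */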

Future code improvements might consider moving
the lru_add_drain_all() call in migrate_device_unmap() so that it is
called *after* all pages have had their migration entries inserted.
That would also eliminate prerequisite b) above.
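
Purely as an illustration of that idea (not a tested design; folio
iteration and state handling are abridged):

	/* Today (simplified), inside migrate_device_unmap()'s per-folio loop: */
	if (folio_test_lru(folio) && !drained) {
		lru_add_drain_all();	/* runs while earlier folios are locked */
		drained = true;
	}
	try_to_migrate(folio, 0);	/* PTE insertion deferred if mapcount > 1 */

	/* Proposed ordering: insert every migration entry first, then
	 * drain once with no device-private folio lock blocking progress. */
	for (i = 0; i < npages; i++)
		try_to_migrate(page_folio(pages[i]), 0);
	lru_add_drain_all();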

v2:
- Instead of a cond_resched() in hmm_range_fault(),
eliminate the problem by waiting for the folio to be unlocked
in do_swap_page() (Alistair Popple, Andrew Morton)
v3:
- Add a stub migration_entry_wait_on_locked() for the
!CONFIG_MIGRATION case. (Kernel Test Robot)
v4:
- Rename migration_entry_wait_on_locked() to
softleaf_entry_wait_on_locked() and update docs (Alistair Popple)
v5:
- Add a WARN_ON_ONCE() for the !CONFIG_MIGRATION
version of softleaf_entry_wait_on_locked().
- Modify wording around function names in the commit message
(Andrew Morton)

Suggested-by: Alistair Popple <apopple@nvidia.com>
Fixes: 1afaeb8293c9 ("mm/migrate: Trylock device page in do_swap_page")
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: linux-mm@kvack.org
Cc: <dri-devel@lists.freedesktop.org>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: <stable@vger.kernel.org> # v6.15+
Reviewed-by: John Hubbard <jhubbard@nvidia.com> #v3
Reviewed-by: Alistair Popple <apopple@nvidia.com>
Link: https://patch.msgid.link/20260210115653.92413-1-thomas.hellstrom@linux.intel.com
(cherry picked from commit a69d1ab971a624c6f112cea61536569d579c3215)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

Authored by Thomas Hellström and committed by Rodrigo Vivi
b570f37a 99f9b534

Total: +26 -12
include/linux/migrate.h (+9 -1)

···
 
 int migrate_huge_page_move_mapping(struct address_space *mapping,
 		struct folio *dst, struct folio *src);
-void migration_entry_wait_on_locked(softleaf_t entry, spinlock_t *ptl)
+void softleaf_entry_wait_on_locked(softleaf_t entry, spinlock_t *ptl)
 		__releases(ptl);
 void folio_migrate_flags(struct folio *newfolio, struct folio *folio);
 int folio_migrate_mapping(struct address_space *mapping,
···
 static inline int set_movable_ops(const struct movable_operations *ops, enum pagetype type)
 {
 	return -ENOSYS;
+}
+
+static inline void softleaf_entry_wait_on_locked(softleaf_t entry, spinlock_t *ptl)
+	__releases(ptl)
+{
+	WARN_ON_ONCE(1);
+
+	spin_unlock(ptl);
 }
 
 #endif /* CONFIG_MIGRATION */
mm/filemap.c (+10 -5)

···
 
 #ifdef CONFIG_MIGRATION
 /**
- * migration_entry_wait_on_locked - Wait for a migration entry to be removed
- * @entry: migration swap entry.
+ * softleaf_entry_wait_on_locked - Wait for a migration entry or
+ * device_private entry to be removed.
+ * @entry: migration or device_private swap entry.
  * @ptl: already locked ptl. This function will drop the lock.
  *
- * Wait for a migration entry referencing the given page to be removed. This is
+ * Wait for a migration entry referencing the given page, or device_private
+ * entry referencing a device_private page to be unlocked. This is
  * equivalent to folio_put_wait_locked(folio, TASK_UNINTERRUPTIBLE) except
  * this can be called without taking a reference on the page. Instead this
- * should be called while holding the ptl for the migration entry referencing
+ * should be called while holding the ptl for @entry referencing
  * the page.
  *
  * Returns after unlocking the ptl.
···
  * This follows the same logic as folio_wait_bit_common() so see the comments
  * there.
  */
-void migration_entry_wait_on_locked(softleaf_t entry, spinlock_t *ptl)
+void softleaf_entry_wait_on_locked(softleaf_t entry, spinlock_t *ptl)
 	__releases(ptl)
 {
 	struct wait_page_queue wait_page;
···
 	 * If a migration entry exists for the page the migration path must hold
 	 * a valid reference to the page, and it must take the ptl to remove the
 	 * migration entry. So the page is valid until the ptl is dropped.
+	 * Similarly any path attempting to drop the last reference to a
+	 * device-private page needs to grab the ptl to remove the device-private
+	 * entry.
 	 */
 	spin_unlock(ptl);
mm/memory.c (+2 -1)

···
 			unlock_page(vmf->page);
 			put_page(vmf->page);
 		} else {
-			pte_unmap_unlock(vmf->pte, vmf->ptl);
+			pte_unmap(vmf->pte);
+			softleaf_entry_wait_on_locked(entry, vmf->ptl);
 		}
 	} else if (softleaf_is_hwpoison(entry)) {
 		ret = VM_FAULT_HWPOISON;
mm/migrate.c (+4 -4)

···
 	if (!softleaf_is_migration(entry))
 		goto out;
 
-	migration_entry_wait_on_locked(entry, ptl);
+	softleaf_entry_wait_on_locked(entry, ptl);
 	return;
 out:
 	spin_unlock(ptl);
···
 	 * If migration entry existed, safe to release vma lock
 	 * here because the pgtable page won't be freed without the
 	 * pgtable lock released. See comment right above pgtable
-	 * lock release in migration_entry_wait_on_locked().
+	 * lock release in softleaf_entry_wait_on_locked().
 	 */
 	hugetlb_vma_unlock_read(vma);
-	migration_entry_wait_on_locked(entry, ptl);
+	softleaf_entry_wait_on_locked(entry, ptl);
 	return;
 }
···
 	ptl = pmd_lock(mm, pmd);
 	if (!pmd_is_migration_entry(*pmd))
 		goto unlock;
-	migration_entry_wait_on_locked(softleaf_from_pmd(*pmd), ptl);
+	softleaf_entry_wait_on_locked(softleaf_from_pmd(*pmd), ptl);
 	return;
 unlock:
 	spin_unlock(ptl);
mm/migrate_device.c (+1 -1)

···
 	}
 
 	if (softleaf_is_migration(entry)) {
-		migration_entry_wait_on_locked(entry, ptl);
+		softleaf_entry_wait_on_locked(entry, ptl);
 		spin_unlock(ptl);
 		return -EAGAIN;
 	}