mempolicy: optimize queue_folios_pte_range by PTE batching

Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git


After the queue_folio_required() check, the loop body only cares about the
folio, i.e., the individual PTEs mapping that folio are redundant. Therefore,
optimize this loop by skipping over a batch of PTEs that map the same folio.
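As a rough illustration of the batching idea, here is a minimal userspace sketch under a simplified PTE model. `struct toy_pte`, `toy_pte_batch()`, `walk()` and `example` are hypothetical names, not kernel APIs; `toy_pte_batch()` only loosely mirrors `folio_pte_batch()`, including the effect of `FPB_IGNORE_DIRTY` (a differing dirty bit does not end a batch). Both walks reach the same folios, but the batched one takes one iteration per run of PTEs instead of one per PTE.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified PTE model; this is not the kernel's pte_t. */
struct toy_pte {
	int folio;      /* id of the mapped folio, or -1 for an empty PTE */
	long pfn;       /* page frame number of the mapped page */
	bool dirty;     /* ignored when batching, like FPB_IGNORE_DIRTY */
	unsigned prot;  /* stand-in for PTE bits that must match */
};

/*
 * Count PTEs from ptes[0] that map consecutive pages of the same folio
 * with matching prot bits, never more than max_nr. Loosely modeled on
 * folio_pte_batch(): a differing dirty bit does not end the batch.
 */
static int toy_pte_batch(const struct toy_pte *ptes, int max_nr)
{
	int nr = 1;

	while (nr < max_nr &&
	       ptes[nr].folio == ptes[0].folio &&
	       ptes[nr].pfn == ptes[0].pfn + nr &&
	       ptes[nr].prot == ptes[0].prot)
		nr++;
	return nr;
}

/* Walk n PTEs, optionally batching; return the number of loop iterations. */
static int walk(const struct toy_pte *ptes, int n, bool batched)
{
	int iters = 0;
	int nr;

	for (int i = 0; i != n; i += nr) {
		int max_nr = n - i;  /* PTEs left before the end of the range */

		assert(max_nr >= 1);
		nr = 1;
		iters++;
		if (ptes[i].folio < 0)
			continue;        /* empty PTE: advance by one */
		if (batched && max_nr != 1)
			nr = toy_pte_batch(&ptes[i], max_nr);
		/* per-folio work (e.g. queueing for migration) would go here */
	}
	return iters;
}

/* Folio 0: 4 pages (mixed dirty bits), a hole, folio 1: 2 pages, folio 2: 1 page. */
static const struct toy_pte example[8] = {
	{ 0, 100, false, 0 }, { 0, 101, true, 0 },
	{ 0, 102, false, 0 }, { 0, 103, true, 0 },
	{ -1, 0, false, 0 },
	{ 1, 200, false, 0 }, { 1, 201, false, 0 },
	{ 2, 300, false, 0 },
};
```

On this sample layout, `walk(example, 8, false)` takes 8 iterations while `walk(example, 8, true)` takes 4, because the four PTEs of folio 0 form one batch despite their differing dirty bits.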

With a test program that migrates the pages of the calling process,
including a mapped 4GB VMA with PTE-mapped large folios of order 9, and
migrates them once back and forth between node-0 and node-1, the average
execution time drops from 7.5 to 4 seconds: an approximately 47% reduction
in run time (about a 1.9x speedup).
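The figures in the test description can be checked with some quick arithmetic. This sketch assumes 4 KiB base pages and reads "4GB" as 4 GiB; the commit states neither, and the helper names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB base pages (PAGE_SHIFT == 12); not stated in the commit. */
#define TOY_PAGE_SHIFT 12
/* Order-9 folios, as in the test: 2^9 = 512 base pages each. */
#define TOY_FOLIO_ORDER 9

/* Number of PTEs needed to map `bytes` with base pages. */
static uint64_t ptes_for(uint64_t bytes)
{
	return bytes >> TOY_PAGE_SHIFT;
}

/* Number of order-9 folios covering `bytes` (assumes an exact multiple). */
static uint64_t folios_for(uint64_t bytes)
{
	return ptes_for(bytes) >> TOY_FOLIO_ORDER;
}

/* Percentage by which execution time dropped. */
static double time_reduction_pct(double before_s, double after_s)
{
	assert(before_s > 0.0);
	return (before_s - after_s) / before_s * 100.0;
}

/* Ratio of old to new execution time. */
static double speedup(double before_s, double after_s)
{
	assert(after_s > 0.0);
	return before_s / after_s;
}
```

Under these assumptions the VMA is mapped by 1,048,576 PTEs covering 2,048 order-9 folios, so when every batch spans a whole folio the walk needs as few as 2,048 iterations instead of 1,048,576. The time figures give a 46.7% reduction, i.e. a 1.875x speedup; the measured time covers the whole migration, not only this loop.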

Link: https://lkml.kernel.org/r/20250416053048.96479-1-dev.jain@arm.com
Signed-off-by: Dev Jain <dev.jain@arm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Yang Shi <yang@os.amperecomputing.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Authored by Dev Jain and committed by Andrew Morton 1 year ago (4a34c584, 75404e07).

1 changed file, +10 -2

mm/mempolicy.c

```diff
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -566,6 +566,7 @@
 static int queue_folios_pte_range(pmd_t *pmd, unsigned long addr,
 			unsigned long end, struct mm_walk *walk)
 {
+	const fpb_t fpb_flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
 	struct vm_area_struct *vma = walk->vma;
 	struct folio *folio;
 	struct queue_pages *qp = walk->private;
@@ -573,6 +574,7 @@
 	pte_t *pte, *mapped_pte;
 	pte_t ptent;
 	spinlock_t *ptl;
+	int max_nr, nr;
 
 	ptl = pmd_trans_huge_lock(pmd, vma);
 	if (ptl) {
@@ -586,7 +588,9 @@
 		walk->action = ACTION_AGAIN;
 		return 0;
 	}
-	for (; addr != end; pte++, addr += PAGE_SIZE) {
+	for (; addr != end; pte += nr, addr += nr * PAGE_SIZE) {
+		max_nr = (end - addr) >> PAGE_SHIFT;
+		nr = 1;
 		ptent = ptep_get(pte);
 		if (pte_none(ptent))
 			continue;
@@ -598,6 +602,10 @@
 		folio = vm_normal_folio(vma, addr, ptent);
 		if (!folio || folio_is_zone_device(folio))
 			continue;
+		if (folio_test_large(folio) && max_nr != 1)
+			nr = folio_pte_batch(folio, addr, pte, ptent,
+					     max_nr, fpb_flags,
+					     NULL, NULL, NULL);
 		/*
 		 * vm_normal_folio() filters out zero pages, but there might
 		 * still be reserved folios to skip, perhaps in a VDSO.
@@ -630,7 +638,7 @@
 		if (!(flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) ||
 		    !vma_migratable(vma) ||
 		    !migrate_folio_add(folio, qp->pagelist, flags)) {
-			qp->nr_failed++;
+			qp->nr_failed += nr;
 			if (strictly_unmovable(flags))
 				break;
 		}
```
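The new loop stride depends on `max_nr`, which caps each batch at the number of PTEs left before `end`. Because the loop condition is `addr != end`, an uncapped batch could step past `end` and the loop would not stop there. The sketch below models only that address arithmetic, assuming 4 KiB pages and a single large folio at a known range; `toy_batch()` and `walk_range()` are hypothetical stand-ins, with `toy_batch()` mimicking how `folio_pte_batch()` stops at both the folio's end and `max_nr`.

```c
#include <assert.h>

/* Assumption: 4 KiB base pages; the kernel uses the architecture's PAGE_SHIFT. */
#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_SIZE (1UL << TOY_PAGE_SHIFT)

/*
 * Hypothetical stand-in for folio_pte_batch(): one large folio is mapped
 * at [folio_start, folio_end); every other address maps a small folio.
 * The batch stops at the folio's end and never exceeds max_nr.
 */
static int toy_batch(unsigned long addr, unsigned long folio_start,
		     unsigned long folio_end, int max_nr)
{
	unsigned long left;

	if (addr < folio_start || addr >= folio_end)
		return 1;
	left = (folio_end - addr) >> TOY_PAGE_SHIFT;
	return left < (unsigned long)max_nr ? (int)left : max_nr;
}

/* Step through [addr, end) like the patched loop; return the iteration count. */
static int walk_range(unsigned long addr, unsigned long end,
		      unsigned long folio_start, unsigned long folio_end)
{
	int iters = 0;
	int max_nr, nr;

	for (; addr != end; addr += nr * TOY_PAGE_SIZE) {
		max_nr = (end - addr) >> TOY_PAGE_SHIFT;  /* PTEs left in range */
		assert(max_nr >= 1);
		nr = 1;
		iters++;
		if (max_nr != 1)
			nr = toy_batch(addr, folio_start, folio_end, max_nr);
	}
	return iters;
}
```

For a 16-page range whose large folio starts at page 4 but extends to page 36, `walk_range()` takes 5 iterations and lands exactly on `end`, because the batch starting at page 4 is clipped to the 12 PTEs remaining.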
