Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

khugepaged: optimize collapse_pte_mapped_thp() by PTE batching

Use PTE batching to process, in one step, each run of PTEs that maps the
same large folio. An improvement is expected from batching the mapcount
manipulation on the folio; on arm64, which supports contiguous (contpte)
mappings, the number of TLB flushes is also reduced.

Note that the check "if (folio_page(folio, i) != page)" does not need to
change: if the i'th page of the folio equals the first page of our batch,
then pages i + 1, ..., i + nr_batch_ptes - 1 of the folio equal the
corresponding pages of the batch, because the batch maps consecutive
pages.

Link: https://lkml.kernel.org/r/20250724052301.23844-4-dev.jain@arm.com
Signed-off-by: Dev Jain <dev.jain@arm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mariano Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Authored by Dev Jain and committed by Andrew Morton
22d02290 4ea3594a

+21 -12
mm/khugepaged.c
@@ -1503,15 +1503,17 @@
 int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
 			    bool install_pmd)
 {
+	int nr_mapped_ptes = 0, result = SCAN_FAIL;
+	unsigned int nr_batch_ptes;
 	struct mmu_notifier_range range;
 	bool notified = false;
 	unsigned long haddr = addr & HPAGE_PMD_MASK;
+	unsigned long end = haddr + HPAGE_PMD_SIZE;
 	struct vm_area_struct *vma = vma_lookup(mm, haddr);
 	struct folio *folio;
 	pte_t *start_pte, *pte;
 	pmd_t *pmd, pgt_pmd;
 	spinlock_t *pml = NULL, *ptl;
-	int nr_ptes = 0, result = SCAN_FAIL;
 	int i;
 
 	mmap_assert_locked(mm);
@@ -1627,10 +1625,14 @@
 		goto abort;
 
 	/* step 2: clear page table and adjust rmap */
-	for (i = 0, addr = haddr, pte = start_pte;
-	     i < HPAGE_PMD_NR; i++, addr += PAGE_SIZE, pte++) {
+	for (i = 0, addr = haddr, pte = start_pte; i < HPAGE_PMD_NR;
+	     i += nr_batch_ptes, addr += nr_batch_ptes * PAGE_SIZE,
+	     pte += nr_batch_ptes) {
+		unsigned int max_nr_batch_ptes = (end - addr) >> PAGE_SHIFT;
 		struct page *page;
 		pte_t ptent = ptep_get(pte);
+
+		nr_batch_ptes = 1;
 
 		if (pte_none(ptent))
 			continue;
@@ -1649,26 +1643,29 @@
 			goto abort;
 		}
 		page = vm_normal_page(vma, addr, ptent);
+
 		if (folio_page(folio, i) != page)
 			goto abort;
+
+		nr_batch_ptes = folio_pte_batch(folio, pte, ptent, max_nr_batch_ptes);
 
 		/*
 		 * Must clear entry, or a racing truncate may re-remove it.
 		 * TLB flush can be left until pmdp_collapse_flush() does it.
 		 * PTE dirty? Shmem page is already dirty; file is read-only.
 		 */
-		ptep_clear(mm, addr, pte);
-		folio_remove_rmap_pte(folio, page, vma);
-		nr_ptes++;
+		clear_ptes(mm, addr, pte, nr_batch_ptes);
+		folio_remove_rmap_ptes(folio, page, nr_batch_ptes, vma);
+		nr_mapped_ptes += nr_batch_ptes;
 	}
 
 	if (!pml)
 		spin_unlock(ptl);
 
 	/* step 3: set proper refcount and mm_counters. */
-	if (nr_ptes) {
-		folio_ref_sub(folio, nr_ptes);
-		add_mm_counter(mm, mm_counter_file(folio), -nr_ptes);
+	if (nr_mapped_ptes) {
+		folio_ref_sub(folio, nr_mapped_ptes);
+		add_mm_counter(mm, mm_counter_file(folio), -nr_mapped_ptes);
 	}
 
 	/* step 4: remove empty page table */
@@ -1704,10 +1695,10 @@
 			     : SCAN_SUCCEED;
 	goto drop_folio;
 abort:
-	if (nr_ptes) {
+	if (nr_mapped_ptes) {
 		flush_tlb_mm(mm);
-		folio_ref_sub(folio, nr_ptes);
-		add_mm_counter(mm, mm_counter_file(folio), -nr_ptes);
+		folio_ref_sub(folio, nr_mapped_ptes);
+		add_mm_counter(mm, mm_counter_file(folio), -nr_mapped_ptes);
 	}
 unlock:
 	if (start_pte)