Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

khugepaged: optimize __collapse_huge_page_copy_succeeded() by PTE batching

Use PTE batching to batch process PTEs mapping the same large folio. An
improvement is expected due to batching refcount-mapcount manipulation on
the folios, and for arm64 which supports contig mappings, the number of
TLB flushes is also reduced.

Link: https://lkml.kernel.org/r/20250724052301.23844-3-dev.jain@arm.com
Signed-off-by: Dev Jain <dev.jain@arm.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mariano Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Authored by Dev Jain and committed by Andrew Morton
4ea3594a 3dfde978

+18 -7
mm/khugepaged.c
···
 						spinlock_t *ptl,
 						struct list_head *compound_pagelist)
 {
+	unsigned long end = address + HPAGE_PMD_SIZE;
 	struct folio *src, *tmp;
-	pte_t *_pte;
 	pte_t pteval;
+	pte_t *_pte;
+	unsigned int nr_ptes;
 
-	for (_pte = pte; _pte < pte + HPAGE_PMD_NR;
-	     _pte++, address += PAGE_SIZE) {
+	for (_pte = pte; _pte < pte + HPAGE_PMD_NR; _pte += nr_ptes,
+	     address += nr_ptes * PAGE_SIZE) {
+		nr_ptes = 1;
 		pteval = ptep_get(_pte);
 		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
 			add_mm_counter(vma->vm_mm, MM_ANONPAGES, 1);
···
 			struct page *src_page = pte_page(pteval);
 
 			src = page_folio(src_page);
-			if (!folio_test_large(src))
+
+			if (folio_test_large(src)) {
+				unsigned int max_nr_ptes = (end - address) >> PAGE_SHIFT;
+
+				nr_ptes = folio_pte_batch(src, _pte, pteval, max_nr_ptes);
+			} else {
 				release_pte_folio(src);
+			}
+
 			/*
 			 * ptl mostly unnecessary, but preempt has to
 			 * be disabled to update the per-cpu stats
 			 * inside folio_remove_rmap_pte().
 			 */
 			spin_lock(ptl);
-			ptep_clear(vma->vm_mm, address, _pte);
-			folio_remove_rmap_pte(src, src_page, vma);
+			clear_ptes(vma->vm_mm, address, _pte, nr_ptes);
+			folio_remove_rmap_ptes(src, src_page, nr_ptes, vma);
 			spin_unlock(ptl);
-			free_folio_and_swap_cache(src);
+			free_swap_cache(src);
+			folio_put_refs(src, nr_ptes);
 		}
 	}
 
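The batching idea in this change can be illustrated with a small userspace sketch. This is not kernel code: `struct toy_pte`, `toy_pte_batch()` and `toy_count_batches()` are hypothetical names invented for illustration, and the batch rule used here (same folio, consecutive pfns) is a simplification of what `folio_pte_batch()` checks. The sketch only shows how advancing the loop by the batch size turns one bookkeeping step per PTE into one step per run of PTEs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a PTE: the folio it maps and its page frame number. */
struct toy_pte {
	int folio_id;		/* 0 means "no mapping" (like pte_none) */
	unsigned long pfn;
};

/*
 * Toy analogue of folio_pte_batch(): count how many consecutive entries,
 * starting at ptes[0], map consecutive pfns of the same folio.
 * Never returns more than max_nr.
 */
static unsigned int toy_pte_batch(const struct toy_pte *ptes, unsigned int max_nr)
{
	unsigned int nr = 1;

	assert(max_nr >= 1);
	while (nr < max_nr &&
	       ptes[nr].folio_id == ptes[0].folio_id &&
	       ptes[nr].pfn == ptes[0].pfn + nr)
		nr++;
	return nr;
}

/*
 * Walk nr_total entries the way the patched loop does (step by the batch
 * size) and return how many per-batch bookkeeping steps were needed.
 */
static unsigned int toy_count_batches(const struct toy_pte *ptes, unsigned int nr_total)
{
	unsigned int i, nr, ops = 0;

	for (i = 0; i < nr_total; i += nr) {
		nr = 1;				/* default: handle a single entry */
		if (ptes[i].folio_id != 0)	/* mapped: try to extend the batch */
			nr = toy_pte_batch(&ptes[i], nr_total - i);
		ops++;				/* one clear/rmap/refcount step per batch */
	}
	return ops;
}
```

With eight entries made of a four-page folio, one empty slot, a single-page folio and a two-page folio, the walk takes four steps instead of eight. The `nr_total - i` bound plays the role of `max_nr_ptes` in the patch, keeping a batch from running past the end of the range.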