Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

mm: add a batched helper to clear the young flag for large folios

Currently, MGLRU calls ptep_test_and_clear_young_notify() to check and
clear the young flag of each PTE sequentially, which is inefficient when
reclaiming large folios.

Moreover, on Arm64, which supports contiguous PTEs, the Arm64-specific
ptep_test_and_clear_young() already implements an optimization that clears
the young flags for the PTEs within a contiguous range. However, this is
not sufficient. Similar to the Arm64-specific clear_flush_young_ptes(), we
can extend this to perform batched operations on the entire large folio,
which might exceed the contiguous range (CONT_PTE_SIZE).

Thus, introduce a new batched helper, test_and_clear_young_ptes(), and its
wrapper test_and_clear_young_ptes_notify(), both consistent with the
existing functions, to check and clear the young flags of a large folio in
one batch. This can improve performance during large folio reclamation
when MGLRU is enabled. Architectures that implement a more efficient batch
operation will override the helper in the following patches.
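
The difference between the two calling patterns can be sketched in userspace. This is only an illustration under stated assumptions: mock_pte_t, MOCK_PTE_YOUNG, helper_calls, scan_per_pte() and scan_batched() are hypothetical names that do not exist in the kernel, and the model ignores locking, the MMU notifier, and the real PTE layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real pte_t and young bit are arch-specific. */
typedef uint64_t mock_pte_t;
#define MOCK_PTE_YOUNG ((mock_pte_t)1)

static unsigned int helper_calls;	/* number of helper invocations */

/* Userspace model of the batched helper; nr must be at least 1. */
static int mock_test_and_clear_young_ptes(mock_pte_t *ptep, unsigned int nr)
{
	int young = 0;

	helper_calls++;
	for (;;) {
		young |= (*ptep & MOCK_PTE_YOUNG) != 0;
		*ptep &= ~MOCK_PTE_YOUNG;	/* mark the entry old */
		if (--nr == 0)
			break;
		ptep++;
	}
	return young;
}

/* Old caller shape: one helper invocation per PTE. */
static int scan_per_pte(mock_pte_t *ptes, unsigned int nr)
{
	int young = 0;
	unsigned int i;

	for (i = 0; i < nr; i++)
		young |= mock_test_and_clear_young_ptes(&ptes[i], 1);
	return young;
}

/* New caller shape: one helper invocation for the whole PTE range. */
static int scan_batched(mock_pte_t *ptes, unsigned int nr)
{
	return mock_test_and_clear_young_ptes(ptes, nr);
}

/* Self-check: same answer either way, but 8 invocations versus 1. */
static void scan_selfcheck(void)
{
	mock_pte_t a[8] = { 0 }, b[8] = { 0 };

	a[3] = b[3] = MOCK_PTE_YOUNG;
	helper_calls = 0;
	assert(scan_per_pte(a, 8) == 1 && helper_calls == 8);
	helper_calls = 0;
	assert(scan_batched(b, 8) == 1 && helper_calls == 1);
}
```

Calling scan_selfcheck() from a test program exercises both paths. The saving modeled here is only the per-call overhead; an architecture override could reduce the per-entry work as well.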

Link: https://lkml.kernel.org/r/23ec671bfcc06cd24ee0fbff8e329402742274a0.1772778858.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand (Arm) <david@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Xu <weixugc@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yuanchu Xie <yuanchu@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Authored by Baolin Wang and committed by Andrew Morton
6d7237dd 83ec1286

2 files changed, 48 insertions(+), 5 deletions(-)
include/linux/pgtable.h (+37)
@@ -1103,6 +1103,43 @@
 }
 #endif
 
+#ifndef test_and_clear_young_ptes
+/**
+ * test_and_clear_young_ptes - Mark PTEs that map consecutive pages of the same
+ *			       folio as old
+ * @vma: The virtual memory area the pages are mapped into.
+ * @addr: Address the first page is mapped at.
+ * @ptep: Page table pointer for the first entry.
+ * @nr: Number of entries to clear access bit.
+ *
+ * May be overridden by the architecture; otherwise, implemented as a simple
+ * loop over ptep_test_and_clear_young().
+ *
+ * Note that PTE bits in the PTE range besides the PFN can differ. For example,
+ * some PTEs might be write-protected.
+ *
+ * Context: The caller holds the page table lock. The PTEs map consecutive
+ * pages that belong to the same folio. The PTEs are all in the same PMD.
+ *
+ * Returns: whether any PTE was young.
+ */
+static inline int test_and_clear_young_ptes(struct vm_area_struct *vma,
+		unsigned long addr, pte_t *ptep, unsigned int nr)
+{
+	int young = 0;
+
+	for (;;) {
+		young |= ptep_test_and_clear_young(vma, addr, ptep);
+		if (--nr == 0)
+			break;
+		ptep++;
+		addr += PAGE_SIZE;
+	}
+
+	return young;
+}
+#endif
+
 /*
  * On some architectures hardware does not set page access bit when accessing
  * memory page, it is responsibility of software setting this bit. It brings
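
The #ifndef guard around the generic helper is what lets an architecture substitute its own implementation. The following userspace sketch shows that pattern, assuming the common kernel convention of defining a macro with the helper's own name after the architecture's definition. The mock_ names and the arch_calls counter are hypothetical, and the "arch" body here is a plain loop rather than any real optimized routine.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins; not the real pte_t or young bit. */
typedef uint64_t mock_pte_t;
#define MOCK_PTE_YOUNG ((mock_pte_t)1)

static unsigned int arch_calls;	/* counts entries into the override */

/* "Arch header": supplies its own batched helper ... */
static inline int mock_test_and_clear_young_ptes(mock_pte_t *ptep,
						 unsigned int nr)
{
	int young = 0;
	unsigned int i;

	arch_calls++;
	for (i = 0; i < nr; i++) {
		young |= (ptep[i] & MOCK_PTE_YOUNG) != 0;
		ptep[i] &= ~MOCK_PTE_YOUNG;
	}
	return young;
}
/* ... and announces it with a macro of the same name. */
#define mock_test_and_clear_young_ptes mock_test_and_clear_young_ptes

/* "Generic header": the fallback is compiled only without an override. */
#ifndef mock_test_and_clear_young_ptes
static inline int mock_test_and_clear_young_ptes(mock_pte_t *ptep,
						 unsigned int nr)
{
	int young = 0;

	for (;;) {
		young |= (*ptep & MOCK_PTE_YOUNG) != 0;
		*ptep &= ~MOCK_PTE_YOUNG;
		if (--nr == 0)
			break;
		ptep++;
	}
	return young;
}
#endif

/* Self-check: the call resolves to the "arch" version. */
static void override_selfcheck(void)
{
	mock_pte_t ptes[3] = { MOCK_PTE_YOUNG, 0, MOCK_PTE_YOUNG };

	arch_calls = 0;
	assert(mock_test_and_clear_young_ptes(ptes, 3) == 1);
	assert(arch_calls == 1);
}
```

Removing the "arch header" part makes the same call fall back to the generic loop without any change at the call site.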
mm/internal.h (+11 -5)
@@ -1819,13 +1819,13 @@
 	return young;
 }
 
-static inline int ptep_test_and_clear_young_notify(struct vm_area_struct *vma,
-		unsigned long addr, pte_t *ptep)
+static inline int test_and_clear_young_ptes_notify(struct vm_area_struct *vma,
+		unsigned long addr, pte_t *ptep, unsigned int nr)
 {
 	int young;
 
-	young = ptep_test_and_clear_young(vma, addr, ptep);
-	young |= mmu_notifier_clear_young(vma->vm_mm, addr, addr + PAGE_SIZE);
+	young = test_and_clear_young_ptes(vma, addr, ptep, nr);
+	young |= mmu_notifier_clear_young(vma->vm_mm, addr, addr + nr * PAGE_SIZE);
 	return young;
 }
 
@@ -1843,9 +1843,15 @@
 
 #define clear_flush_young_ptes_notify clear_flush_young_ptes
 #define pmdp_clear_flush_young_notify pmdp_clear_flush_young
-#define ptep_test_and_clear_young_notify ptep_test_and_clear_young
+#define test_and_clear_young_ptes_notify test_and_clear_young_ptes
 #define pmdp_test_and_clear_young_notify pmdp_test_and_clear_young
 
 #endif /* CONFIG_MMU_NOTIFIER */
+
+static inline int ptep_test_and_clear_young_notify(struct vm_area_struct *vma,
+		unsigned long addr, pte_t *ptep)
+{
+	return test_and_clear_young_ptes_notify(vma, addr, ptep, 1);
+}
 
 #endif /* __MM_INTERNAL_H */
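
The notify wrapper now issues a single notifier call over [addr, addr + nr * PAGE_SIZE), and the old single-PTE name becomes the nr == 1 case. A userspace sketch of that range arithmetic follows. It assumes 4 KiB pages; mock_notifier_clear_young() merely records the range it was given and is a hypothetical stand-in, not the behaviour of mmu_notifier_clear_young().

```c
#include <assert.h>
#include <stdint.h>

#define MOCK_PAGE_SIZE 4096UL		/* assumption: 4 KiB pages */
typedef uint64_t mock_pte_t;		/* hypothetical stand-in for pte_t */
#define MOCK_PTE_YOUNG ((mock_pte_t)1)

static unsigned long last_start, last_end;	/* last notified range */

/* Stand-in notifier: records the range and reports "not young". */
static int mock_notifier_clear_young(unsigned long start, unsigned long end)
{
	last_start = start;
	last_end = end;
	return 0;
}

/* Model of the batched PTE helper; nr must be at least 1. */
static int mock_test_and_clear_young_ptes(mock_pte_t *ptep, unsigned int nr)
{
	int young = 0;
	unsigned int i;

	for (i = 0; i < nr; i++) {
		young |= (ptep[i] & MOCK_PTE_YOUNG) != 0;
		ptep[i] &= ~MOCK_PTE_YOUNG;
	}
	return young;
}

/* Batched wrapper: PTE result OR-ed with one notifier call for the range. */
static int mock_test_and_clear_young_ptes_notify(unsigned long addr,
						 mock_pte_t *ptep,
						 unsigned int nr)
{
	int young;

	young = mock_test_and_clear_young_ptes(ptep, nr);
	young |= mock_notifier_clear_young(addr, addr + nr * MOCK_PAGE_SIZE);
	return young;
}

/* Single-entry wrapper: the batched form with nr == 1. */
static int mock_ptep_test_and_clear_young_notify(unsigned long addr,
						 mock_pte_t *ptep)
{
	return mock_test_and_clear_young_ptes_notify(addr, ptep, 1);
}

/* Self-check: 16 pages at 0x10000 cover [0x10000, 0x20000). */
static void notify_selfcheck(void)
{
	mock_pte_t ptes[16] = { 0 };

	ptes[5] = MOCK_PTE_YOUNG;
	assert(mock_test_and_clear_young_ptes_notify(0x10000UL, ptes, 16) == 1);
	assert(last_start == 0x10000UL && last_end == 0x20000UL);
}
```

Keeping the single-entry function as a thin wrapper means existing callers compile unchanged while new callers can pass a larger nr.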