Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

mm: avoid splitting pmd for lazyfree pmd-mapped THP in try_to_unmap

The try_to_unmap_one() function currently handles PMD-mapped THPs
inefficiently. It first splits the PMD into PTEs, copies the dirty state
from the PMD to the PTEs, iterates over the PTEs to locate the dirty
state, and then marks the THP as swap-backed. This process involves
unnecessary PMD splitting and redundant iteration. Instead, this
functionality can be efficiently managed in
__discard_anon_folio_pmd_locked(), avoiding the extra steps and improving
performance.
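
To give a feel for the iteration that the split path implies, the arithmetic can be sketched as below. The geometry is an assumption (4 KiB base pages and 2 MiB PMD mappings, as on a default x86-64 configuration), and the helper names are hypothetical; neither comes from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed geometry (x86-64 defaults); the commit message does not state it. */
#define BASE_PAGE_SIZE (4UL * 1024)
#define PMD_THP_SIZE   (2UL * 1024 * 1024)

/* Number of PTEs a single PMD mapping becomes once it is split. */
static size_t ptes_per_pmd(void)
{
	return PMD_THP_SIZE / BASE_PAGE_SIZE;
}

/* PMD-mapped THPs needed to cover a PMD-aligned region of the given size. */
static size_t thps_in_region(size_t bytes)
{
	assert(bytes % PMD_THP_SIZE == 0);
	return bytes / PMD_THP_SIZE;
}

/* PTE entries visited if every THP in the region is split before the walk. */
static size_t ptes_walked_if_split(size_t bytes)
{
	return thps_in_region(bytes) * ptes_per_pmd();
}
```

Under these assumptions, a fully THP-backed 128 MB region holds 64 PMD mappings, and splitting all of them yields 32768 PTEs to iterate, versus 64 PMD-level checks when no split is needed.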

The following microbenchmark redirties folios after invoking MADV_FREE,
then measures the time taken to reclaim the redirtied folios (in practice,
the time to mark those folios swap-backed again).

#include <stdio.h>
#include <sys/mman.h>
#include <string.h>
#include <time.h>

#define SIZE 128*1024*1024 // 128 MB

int main(int argc, char *argv[])
{
	while(1) {
		volatile int *p = mmap(0, SIZE, PROT_READ | PROT_WRITE,
				MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		memset((void *)p, 1, SIZE);
		madvise((void *)p, SIZE, MADV_FREE);
		/* redirty after MADV_FREE */
		memset((void *)p, 1, SIZE);

		clock_t start_time = clock();
		madvise((void *)p, SIZE, MADV_PAGEOUT);
		clock_t end_time = clock();

		double elapsed_time = (double)(end_time - start_time) / CLOCKS_PER_SEC;
		printf("Time taken by reclamation: %f seconds\n", elapsed_time);

		munmap((void *)p, SIZE);
	}
	return 0;
}
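
The benchmark times the call with clock(), which reports processor time consumed by the process rather than elapsed wall-clock time. If wall-clock latency is also of interest, a helper along the following lines could be used. This is only a sketch: timespec_diff_sec() and time_call_sec() are hypothetical names and are not part of the commit or its benchmark.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Seconds elapsed between two CLOCK_MONOTONIC samples. */
static double timespec_diff_sec(const struct timespec *start,
				const struct timespec *end)
{
	return (double)(end->tv_sec - start->tv_sec) +
	       (double)(end->tv_nsec - start->tv_nsec) / 1e9;
}

/* Wall-clock time, in seconds, spent inside fn(arg). */
static double time_call_sec(void (*fn)(void *), void *arg)
{
	struct timespec start, end;

	assert(fn != NULL);
	clock_gettime(CLOCK_MONOTONIC, &start);
	fn(arg);
	clock_gettime(CLOCK_MONOTONIC, &end);
	return timespec_diff_sec(&start, &end);
}
```

A caller would wrap the madvise(..., MADV_PAGEOUT) call in a small function and pass it to time_call_sec(). The numbers reported in this message were obtained with clock(), so results from this helper are not directly comparable to them.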

Test results are as follows.
w/o patch:
~ # ./a.out
Time taken by reclamation: 0.007300 seconds
Time taken by reclamation: 0.007226 seconds
Time taken by reclamation: 0.007295 seconds
Time taken by reclamation: 0.007731 seconds
Time taken by reclamation: 0.007134 seconds
Time taken by reclamation: 0.007285 seconds
Time taken by reclamation: 0.007720 seconds
Time taken by reclamation: 0.007128 seconds
Time taken by reclamation: 0.007710 seconds
Time taken by reclamation: 0.007712 seconds
Time taken by reclamation: 0.007236 seconds
Time taken by reclamation: 0.007690 seconds
Time taken by reclamation: 0.007174 seconds
Time taken by reclamation: 0.007670 seconds
Time taken by reclamation: 0.007169 seconds
Time taken by reclamation: 0.007305 seconds
Time taken by reclamation: 0.007432 seconds
Time taken by reclamation: 0.007158 seconds
Time taken by reclamation: 0.007133 seconds

w/ patch:
~ # ./a.out
Time taken by reclamation: 0.002124 seconds
Time taken by reclamation: 0.002116 seconds
Time taken by reclamation: 0.002150 seconds
Time taken by reclamation: 0.002261 seconds
Time taken by reclamation: 0.002137 seconds
Time taken by reclamation: 0.002173 seconds
Time taken by reclamation: 0.002063 seconds
Time taken by reclamation: 0.002088 seconds
Time taken by reclamation: 0.002169 seconds
Time taken by reclamation: 0.002124 seconds
Time taken by reclamation: 0.002111 seconds
Time taken by reclamation: 0.002224 seconds
Time taken by reclamation: 0.002297 seconds
Time taken by reclamation: 0.002260 seconds
Time taken by reclamation: 0.002246 seconds
Time taken by reclamation: 0.002272 seconds
Time taken by reclamation: 0.002277 seconds
Time taken by reclamation: 0.002462 seconds
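
The two runs above can be summarised with a short calculation. The sketch below copies the reported samples verbatim and computes their means and the ratio between them; the array and function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Samples copied from the "w/o patch" run, in seconds. */
static const double without_patch[] = {
	0.007300, 0.007226, 0.007295, 0.007731, 0.007134, 0.007285, 0.007720,
	0.007128, 0.007710, 0.007712, 0.007236, 0.007690, 0.007174, 0.007670,
	0.007169, 0.007305, 0.007432, 0.007158, 0.007133,
};

/* Samples copied from the "w/ patch" run, in seconds. */
static const double with_patch[] = {
	0.002124, 0.002116, 0.002150, 0.002261, 0.002137, 0.002173, 0.002063,
	0.002088, 0.002169, 0.002124, 0.002111, 0.002224, 0.002297, 0.002260,
	0.002246, 0.002272, 0.002277, 0.002462,
};

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Arithmetic mean of n samples. */
static double mean(const double *v, size_t n)
{
	double sum = 0.0;

	assert(n > 0);
	for (size_t i = 0; i < n; i++)
		sum += v[i];
	return sum / (double)n;
}

/* Ratio of the unpatched mean to the patched mean. */
static double speedup(void)
{
	return mean(without_patch, ARRAY_LEN(without_patch)) /
	       mean(with_patch, ARRAY_LEN(with_patch));
}
```

With these samples the means come to roughly 7.4 ms without the patch and 2.2 ms with it, a ratio of about 3.4.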


In the runs above, reclaim time drops from about 7.4 ms to about 2.2 ms
(roughly 3.4x), because try_to_unmap_one() can now skip redirtied THPs
without splitting the PMD.

Link: https://lkml.kernel.org/r/20250214093015.51024-5-21cnbao@gmail.com
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Suggested-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Suggested-by: Lance Yang <ioworker0@gmail.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Lance Yang <ioworker0@gmail.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Gavin Shan <gshan@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mauricio Faria de Oliveira <mfo@canonical.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shaoqin Huang <shahuang@redhat.com>
Cc: Tangquan Zheng <zhengtangquan@oppo.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Authored by Barry Song, committed by Andrew Morton
2f9b43d6 354dffd2

+27 -10
+17 -7
mm/huge_memory.c
···
 	int ref_count, map_count;
 	pmd_t orig_pmd = *pmdp;
 
-	if (folio_test_dirty(folio) || pmd_dirty(orig_pmd))
+	if (pmd_dirty(orig_pmd))
+		folio_set_dirty(folio);
+	if (folio_test_dirty(folio) && !(vma->vm_flags & VM_DROPPABLE)) {
+		folio_set_swapbacked(folio);
 		return false;
+	}
 
 	orig_pmd = pmdp_huge_clear_flush(vma, addr, pmdp);
 
···
 	 *
 	 * The only folio refs must be one from isolation plus the rmap(s).
 	 */
-	if (folio_test_dirty(folio) || pmd_dirty(orig_pmd) ||
-	    ref_count != map_count + 1) {
+	if (pmd_dirty(orig_pmd))
+		folio_set_dirty(folio);
+	if (folio_test_dirty(folio) && !(vma->vm_flags & VM_DROPPABLE)) {
+		folio_set_swapbacked(folio);
+		set_pmd_at(mm, addr, pmdp, orig_pmd);
+		return false;
+	}
+
+	if (ref_count != map_count + 1) {
 		set_pmd_at(mm, addr, pmdp, orig_pmd);
 		return false;
 	}
···
 {
 	VM_WARN_ON_FOLIO(!folio_test_pmd_mappable(folio), folio);
 	VM_WARN_ON_FOLIO(!folio_test_locked(folio), folio);
+	VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
+	VM_WARN_ON_FOLIO(folio_test_swapbacked(folio), folio);
 	VM_WARN_ON_ONCE(!IS_ALIGNED(addr, HPAGE_PMD_SIZE));
 
-	if (folio_test_anon(folio) && !folio_test_swapbacked(folio))
-		return __discard_anon_folio_pmd_locked(vma, addr, pmdp, folio);
-
-	return false;
+	return __discard_anon_folio_pmd_locked(vma, addr, pmdp, folio);
 }
 
 static void remap_page(struct folio *folio, unsigned long nr, int flags)
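
The decision that the mm/huge_memory.c hunks introduce can be modelled in userspace as follows. This is a simplified sketch, not kernel code: struct model_state, enum model_result and model_discard() are hypothetical stand-ins, and the model collapses the two checks in the real function (before and after the PMD is cleared) into one, ignoring locking, flushing and memory barriers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the folio, PMD and VMA state the hunks read. */
struct model_state {
	bool pmd_dirty;        /* models pmd_dirty(orig_pmd) */
	bool folio_dirty;      /* models folio_test_dirty(folio) */
	bool folio_swapbacked; /* models folio_test_swapbacked(folio) */
	bool vma_droppable;    /* models vma->vm_flags & VM_DROPPABLE */
	int ref_count;
	int map_count;
};

enum model_result {
	MODEL_DISCARDED,      /* PMD unmapped, folio can be freed */
	MODEL_KEPT_REDIRTIED, /* folio redirtied, marked swap-backed again */
	MODEL_KEPT_EXTRA_REF, /* unexpected reference, folio retained */
};

static enum model_result model_discard(struct model_state *s)
{
	/* Mirrors the new VM_WARN_ON_FOLIO: callers pass lazyfree folios only. */
	assert(!s->folio_swapbacked);

	/* Propagate the dirty bit from the PMD to the folio. */
	if (s->pmd_dirty)
		s->folio_dirty = true;

	/* Redirtied after MADV_FREE: keep it, unless the VMA is droppable. */
	if (s->folio_dirty && !s->vma_droppable) {
		s->folio_swapbacked = true;
		return MODEL_KEPT_REDIRTIED;
	}

	/* Only the isolation reference plus the rmap(s) may remain. */
	if (s->ref_count != s->map_count + 1)
		return MODEL_KEPT_EXTRA_REF;

	return MODEL_DISCARDED;
}
```

In this model a redirtied folio is handled entirely at PMD level: it is flagged swap-backed and retained without any PTE-level work, which corresponds to the case the microbenchmark exercises.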
+10 -3
mm/rmap.c
···
 		}
 
 		if (!pvmw.pte) {
-			if (unmap_huge_pmd_locked(vma, pvmw.address, pvmw.pmd,
-						  folio))
-				goto walk_done;
+			if (folio_test_anon(folio) && !folio_test_swapbacked(folio)) {
+				if (unmap_huge_pmd_locked(vma, pvmw.address, pvmw.pmd, folio))
+					goto walk_done;
+				/*
+				 * unmap_huge_pmd_locked has either already marked
+				 * the folio as swap-backed or decided to retain it
+				 * due to GUP or speculative references.
+				 */
+				goto walk_abort;
+			}
 
 			if (flags & TTU_SPLIT_HUGE_PMD) {
 				/*
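
The control flow that the mm/rmap.c hunk gives to the PMD-mapped case can be sketched as a small pure function. The names below (enum walk_outcome, model_pmd_step()) are hypothetical and only mirror the labels in the hunk; discard_ok stands for the return value of unmap_huge_pmd_locked().

```c
#include <assert.h>
#include <stdbool.h>

/* Outcomes named after the labels used in try_to_unmap_one(). */
enum walk_outcome {
	WALK_DONE,     /* goto walk_done: the PMD was discarded */
	WALK_ABORT,    /* goto walk_abort: folio retained, no PMD split */
	WALK_CONTINUE, /* fall through to the TTU_SPLIT_HUGE_PMD handling */
};

/* Models the new branch taken when the walk finds a PMD mapping. */
static enum walk_outcome model_pmd_step(bool anon, bool swapbacked,
					bool discard_ok)
{
	/* Lazyfree folios (anonymous, not swap-backed) never reach the split. */
	if (anon && !swapbacked)
		return discard_ok ? WALK_DONE : WALK_ABORT;

	return WALK_CONTINUE;
}

/* Usage example: the three paths through the branch. */
static void example_outcomes(void)
{
	assert(model_pmd_step(true, false, true) == WALK_DONE);
	assert(model_pmd_step(true, false, false) == WALK_ABORT);
	assert(model_pmd_step(true, true, false) == WALK_CONTINUE);
}
```

The point the sketch highlights is that a failed discard of a lazyfree THP now aborts the walk instead of falling through to the PMD split, which was the source of the extra work described in the commit message.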