Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

mm/rmap: fix a mlock race condition in folio_referenced_one()

The mlock_vma_folio() function requires the page table lock to be held in
order to safely mlock the folio. However, folio_referenced_one() mlocks
large folios outside of the page_vma_mapped_walk() loop, where the page
table lock has already been dropped.

Rework the mlock logic to use the same code path inside the loop for both
large and small folios.

Use PVMW_PGTABLE_CROSSED to detect when the folio is mapped across a page
table boundary.

[akpm@linux-foundation.org: s/CROSSSED/CROSSED/]
Link: https://lkml.kernel.org/r/20250923110711.690639-3-kirill@shutemov.name
Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Authored by Kiryl Shutsemau and committed by Andrew Morton
a2880202 2db57983

+21 -38
mm/rmap.c
···
 {
 	struct folio_referenced_arg *pra = arg;
 	DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, 0);
-	int referenced = 0;
-	unsigned long start = address, ptes = 0;
+	int ptes = 0, referenced = 0;
 
 	while (page_vma_mapped_walk(&pvmw)) {
 		address = pvmw.address;
 
 		if (vma->vm_flags & VM_LOCKED) {
-			if (!folio_test_large(folio) || !pvmw.pte) {
-				/* Restore the mlock which got missed */
-				mlock_vma_folio(folio, vma);
-				page_vma_mapped_walk_done(&pvmw);
-				pra->vm_flags |= VM_LOCKED;
-				return false; /* To break the loop */
-			}
-			/*
-			 * For large folio fully mapped to VMA, will
-			 * be handled after the pvmw loop.
-			 *
-			 * For large folio cross VMA boundaries, it's
-			 * expected to be picked by page reclaim. But
-			 * should skip reference of pages which are in
-			 * the range of VM_LOCKED vma. As page reclaim
-			 * should just count the reference of pages out
-			 * the range of VM_LOCKED vma.
-			 */
 			ptes++;
 			pra->mapcount--;
-			continue;
+
+			/* Only mlock fully mapped pages */
+			if (pvmw.pte && ptes != pvmw.nr_pages)
+				continue;
+
+			/*
+			 * All PTEs must be protected by page table lock in
+			 * order to mlock the page.
+			 *
+			 * If page table boundary has been cross, current ptl
+			 * only protect part of ptes.
+			 */
+			if (pvmw.flags & PVMW_PGTABLE_CROSSED)
+				continue;
+
+			/* Restore the mlock which got missed */
+			mlock_vma_folio(folio, vma);
+			page_vma_mapped_walk_done(&pvmw);
+			pra->vm_flags |= VM_LOCKED;
+			return false; /* To break the loop */
 		}
 
 		/*
···
 		}
 
 		pra->mapcount--;
-	}
-
-	if ((vma->vm_flags & VM_LOCKED) &&
-	    folio_test_large(folio) &&
-	    folio_within_vma(folio, vma)) {
-		unsigned long s_align, e_align;
-
-		s_align = ALIGN_DOWN(start, PMD_SIZE);
-		e_align = ALIGN_DOWN(start + folio_size(folio) - 1, PMD_SIZE);
-
-		/* folio doesn't cross page table boundary and fully mapped */
-		if ((s_align == e_align) && (ptes == folio_nr_pages(folio))) {
-			/* Restore the mlock which got missed */
-			mlock_vma_folio(folio, vma);
-			pra->vm_flags |= VM_LOCKED;
-			return false; /* To break the loop */
-		}
 	}
 
 	if (referenced)
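
The decision the reworked VM_LOCKED branch makes on each iteration of the walk can be modelled in user space. The following is a minimal sketch, not kernel code: `struct walk_step` and `mlock_step()` are hypothetical names introduced only for illustration, and each step's `pgtable_crossed` field stands in for testing `PVMW_PGTABLE_CROSSED` in `pvmw.flags`. It assumes one array entry per successful `page_vma_mapped_walk()` iteration.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-iteration walk state; not the kernel's struct. */
struct walk_step {
	bool pte_mapped;      /* true for a PTE mapping, false for a PMD mapping */
	bool pgtable_crossed; /* models PVMW_PGTABLE_CROSSED being set in pvmw.flags */
};

/*
 * Model of the VM_LOCKED branch inside the walk loop.
 * Returns the index of the step at which the folio would be mlocked
 * (while the page table lock is still held), or -1 if the walk ends
 * without mlocking.
 */
static int mlock_step(const struct walk_step *steps, int nsteps, int nr_pages)
{
	int ptes = 0;

	for (int i = 0; i < nsteps; i++) {
		ptes++;

		/* Only mlock once every page of the folio has been seen mapped. */
		if (steps[i].pte_mapped && ptes != nr_pages)
			continue;

		/* The current lock covers only part of the PTEs: skip. */
		if (steps[i].pgtable_crossed)
			continue;

		return i;
	}

	return -1;
}
```

In this model a four-page folio mapped by four PTEs under a single page table is mlocked on the fourth step, a PMD-mapped folio is mlocked on its only step, and a folio whose PTEs span two page tables is never mlocked by this path.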