Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

mm/rmap: fix potential out-of-bounds page table access during batched unmap

As pointed out by David[1], the batched unmap logic in
try_to_unmap_one() may read past the end of a PTE table when a large
folio's PTE mappings are not fully contained within a single page
table.

While this scenario might be rare, an issue triggerable from userspace
must be fixed regardless of its likelihood. This patch fixes the
out-of-bounds access by refactoring the logic into a new helper,
folio_unmap_pte_batch().

The new helper correctly calculates the safe batch size by capping the
scan at both the VMA and PMD boundaries, i.e., at the end of the VMA or
the end of the current page table, whichever comes first. To simplify
the code, it also supports partial batching (i.e., any number of pages
from 1 up to the calculated safe maximum), as there is no strong reason
to special-case fully mapped folios.

Link: https://lkml.kernel.org/r/20250701143100.6970-1-lance.yang@linux.dev
Link: https://lkml.kernel.org/r/20250630011305.23754-1-lance.yang@linux.dev
Link: https://lkml.kernel.org/r/20250627062319.84936-1-lance.yang@linux.dev
Link: https://lore.kernel.org/linux-mm/a694398c-9f03-4737-81b9-7e49c857fcbe@redhat.com [1]
Fixes: 354dffd29575 ("mm: support batched unmap for lazyfree large folios during reclamation")
Signed-off-by: Lance Yang <lance.yang@linux.dev>
Suggested-by: David Hildenbrand <david@redhat.com>
Reported-by: David Hildenbrand <david@redhat.com>
Closes: https://lore.kernel.org/linux-mm/a694398c-9f03-4737-81b9-7e49c857fcbe@redhat.com
Suggested-by: Barry Song <baohua@kernel.org>
Acked-by: Barry Song <baohua@kernel.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: "Huang, Ying" <huang.ying.caritas@gmail.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mingzhe Yang <mingzhe.yang@ly.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Tangquan Zheng <zhengtangquan@oppo.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Authored by Lance Yang and committed by Andrew Morton
ddd05742 c39b8745

+28 -18
mm/rmap.c
```diff
···
 #endif
 }
 
-/* We support batch unmapping of PTEs for lazyfree large folios */
-static inline bool can_batch_unmap_folio_ptes(unsigned long addr,
-			struct folio *folio, pte_t *ptep)
+static inline unsigned int folio_unmap_pte_batch(struct folio *folio,
+			struct page_vma_mapped_walk *pvmw,
+			enum ttu_flags flags, pte_t pte)
 {
 	const fpb_t fpb_flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
-	int max_nr = folio_nr_pages(folio);
-	pte_t pte = ptep_get(ptep);
+	unsigned long end_addr, addr = pvmw->address;
+	struct vm_area_struct *vma = pvmw->vma;
+	unsigned int max_nr;
 
+	if (flags & TTU_HWPOISON)
+		return 1;
+	if (!folio_test_large(folio))
+		return 1;
+
+	/* We may only batch within a single VMA and a single page table. */
+	end_addr = pmd_addr_end(addr, vma->vm_end);
+	max_nr = (end_addr - addr) >> PAGE_SHIFT;
+
+	/* We only support lazyfree batching for now ... */
 	if (!folio_test_anon(folio) || folio_test_swapbacked(folio))
-		return false;
+		return 1;
 	if (pte_unused(pte))
-		return false;
-	if (pte_pfn(pte) != folio_pfn(folio))
-		return false;
+		return 1;
 
-	return folio_pte_batch(folio, addr, ptep, pte, max_nr, fpb_flags, NULL,
-			       NULL, NULL) == max_nr;
+	return folio_pte_batch(folio, addr, pvmw->pte, pte, max_nr, fpb_flags,
+			       NULL, NULL, NULL);
 }
 
 /*
···
 			if (pte_dirty(pteval))
 				folio_mark_dirty(folio);
 		} else if (likely(pte_present(pteval))) {
-			if (folio_test_large(folio) && !(flags & TTU_HWPOISON) &&
-			    can_batch_unmap_folio_ptes(address, folio, pvmw.pte))
-				nr_pages = folio_nr_pages(folio);
+			nr_pages = folio_unmap_pte_batch(folio, &pvmw, flags, pteval);
 			end_addr = address + nr_pages * PAGE_SIZE;
 			flush_cache_range(vma, address, end_addr);
 
···
 			hugetlb_remove_rmap(folio);
 		} else {
 			folio_remove_rmap_ptes(folio, subpage, nr_pages, vma);
-			folio_ref_sub(folio, nr_pages - 1);
 		}
 		if (vma->vm_flags & VM_LOCKED)
 			mlock_drain_local();
-		folio_put(folio);
-		/* We have already batched the entire folio */
-		if (nr_pages > 1)
+		folio_put_refs(folio, nr_pages);
+
+		/*
+		 * If we are sure that we batched the entire folio and cleared
+		 * all PTEs, we can just optimize and stop right here.
+		 */
+		if (nr_pages == folio_nr_pages(folio))
 			goto walk_done;
 		continue;
 walk_abort:
```