Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

mm/huge_memory: don't mark refcounted folios special in vmf_insert_folio_pmd()

Marking PMDs that map "normal" refcounted folios as special is against
our rules documented for vm_normal_page(): normal (refcounted) folios
shall never have the page table mapping marked as special.
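
As a rough illustration of why the rule matters, the following userspace sketch models a lookup that treats a special mapping as having no folio to manage. The names toy_folio, toy_pmd and toy_normal_folio_pmd() are hypothetical stand-ins, not kernel definitions, and the real vm_normal_page() family handles more cases than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted folio. */
struct toy_folio {
	int refcount;
};

/* Hypothetical stand-in for a PMD mapping. */
struct toy_pmd {
	struct toy_folio *folio;	/* backing folio, if any */
	bool special;			/* models the pmd_special() bit */
};

/* Models the lookup rule: a special mapping yields no folio. */
static struct toy_folio *toy_normal_folio_pmd(const struct toy_pmd *pmd)
{
	if (pmd->special)
		return NULL;
	return pmd->folio;
}
```

In this model, a mapping that wrongly sets special on a refcounted folio makes the lookup return NULL, so callers never see the folio even though it is refcounted and mapped.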

Fortunately, there are not that many pmd_special() checks that can be
misled, and most vm_normal_page_pmd()/vm_normal_folio_pmd() users that
would get this wrong right now are rather harmless: e.g., none so far
bases the decision whether to grab a folio reference on that result.

Well, and GUP-fast will fall back to GUP-slow. All in all, there seem to
be no big implications so far.

Getting this right will get more important as we use
folio_normal_page_pmd() in more places.

Fix it by teaching insert_pfn_pmd() to properly handle folios and pfns --
moving refcount/mapcount/etc handling in there, renaming it to
insert_pmd(), and distinguishing between both cases using a new simple
"struct folio_or_pfn" structure.

Use folio_mk_pmd() to create a pmd for a folio cleanly.
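
The tagged-union dispatch described above can be sketched in userspace roughly as follows. This is a simplified model under stated assumptions: model_folio, model_folio_or_pfn, model_insert_pmd() and the MODEL_* constants are hypothetical names, and the "PMD" is just an integer with one made-up flag bit. It only shows the shape of the idea: the folio path takes a reference and a mapping count and leaves the entry non-special, while the raw PFN path marks it special.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-in; this is NOT the kernel's struct folio. */
struct model_folio {
	unsigned long pfn;	/* first page frame of the folio */
	int refcount;
	int mapcount;
};

/* Mirrors the shape of the commit's struct folio_or_pfn. */
struct model_folio_or_pfn {
	union {
		struct model_folio *folio;
		unsigned long pfn;
	};
	bool is_folio;
};

#define MODEL_PMD_SPECIAL	(1UL << 0)	/* hypothetical flag bit */
#define MODEL_PMD_PFN_SHIFT	12		/* hypothetical encoding */

/* Build a model PMD value; only raw PFNs get the special bit. */
static unsigned long model_insert_pmd(struct model_folio_or_pfn fop)
{
	if (fop.is_folio) {
		assert(fop.folio);
		/* Refcount and mapcount handling live with the insert. */
		fop.folio->refcount++;
		fop.folio->mapcount++;
		return fop.folio->pfn << MODEL_PMD_PFN_SHIFT;
	}
	return (fop.pfn << MODEL_PMD_PFN_SHIFT) | MODEL_PMD_SPECIAL;
}
```

A caller would fill in either .folio with .is_folio = true or just .pfn, much as the two vmf_insert_*_pmd() callers do in the patch; the flag keeps the union unambiguous without a second function.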

Link: https://lkml.kernel.org/r/20250613092702.1943533-3-david@redhat.com
Fixes: 6c88f72691f8 ("mm/huge_memory: add vmf_insert_folio_pmd()")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Tested-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mariano Pache <npache@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

authored by David Hildenbrand and committed by Andrew Morton
c4297465 09fefdca

+40 -19
mm/huge_memory.c
···
 	return __do_huge_pmd_anonymous_page(vmf);
 }
 
-static int insert_pfn_pmd(struct vm_area_struct *vma, unsigned long addr,
-		pmd_t *pmd, pfn_t pfn, pgprot_t prot, bool write,
-		pgtable_t pgtable)
+struct folio_or_pfn {
+	union {
+		struct folio *folio;
+		pfn_t pfn;
+	};
+	bool is_folio;
+};
+
+static int insert_pmd(struct vm_area_struct *vma, unsigned long addr,
+		pmd_t *pmd, struct folio_or_pfn fop, pgprot_t prot,
+		bool write, pgtable_t pgtable)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	pmd_t entry;
···
 	lockdep_assert_held(pmd_lockptr(mm, pmd));
 
 	if (!pmd_none(*pmd)) {
+		const unsigned long pfn = fop.is_folio ? folio_pfn(fop.folio) :
+					  pfn_t_to_pfn(fop.pfn);
+
 		if (write) {
-			if (pmd_pfn(*pmd) != pfn_t_to_pfn(pfn)) {
+			if (pmd_pfn(*pmd) != pfn) {
 				WARN_ON_ONCE(!is_huge_zero_pmd(*pmd));
 				return -EEXIST;
 			}
···
 		return -EEXIST;
 	}
 
-	entry = pmd_mkhuge(pfn_t_pmd(pfn, prot));
-	if (pfn_t_devmap(pfn))
-		entry = pmd_mkdevmap(entry);
-	else
-		entry = pmd_mkspecial(entry);
+	if (fop.is_folio) {
+		entry = folio_mk_pmd(fop.folio, vma->vm_page_prot);
+
+		folio_get(fop.folio);
+		folio_add_file_rmap_pmd(fop.folio, &fop.folio->page, vma);
+		add_mm_counter(mm, mm_counter_file(fop.folio), HPAGE_PMD_NR);
+	} else {
+		entry = pmd_mkhuge(pfn_t_pmd(fop.pfn, prot));
+
+		if (pfn_t_devmap(fop.pfn))
+			entry = pmd_mkdevmap(entry);
+		else
+			entry = pmd_mkspecial(entry);
+	}
 	if (write) {
 		entry = pmd_mkyoung(pmd_mkdirty(entry));
 		entry = maybe_pmd_mkwrite(entry, vma);
···
 	unsigned long addr = vmf->address & PMD_MASK;
 	struct vm_area_struct *vma = vmf->vma;
 	pgprot_t pgprot = vma->vm_page_prot;
+	struct folio_or_pfn fop = {
+		.pfn = pfn,
+	};
 	pgtable_t pgtable = NULL;
 	spinlock_t *ptl;
 	int error;
···
 	pfnmap_setup_cachemode_pfn(pfn_t_to_pfn(pfn), &pgprot);
 
 	ptl = pmd_lock(vma->vm_mm, vmf->pmd);
-	error = insert_pfn_pmd(vma, addr, vmf->pmd, pfn, pgprot, write,
-			pgtable);
+	error = insert_pmd(vma, addr, vmf->pmd, fop, pgprot, write,
+			pgtable);
 	spin_unlock(ptl);
 	if (error && pgtable)
 		pte_free(vma->vm_mm, pgtable);
···
 	struct vm_area_struct *vma = vmf->vma;
 	unsigned long addr = vmf->address & PMD_MASK;
 	struct mm_struct *mm = vma->vm_mm;
+	struct folio_or_pfn fop = {
+		.folio = folio,
+		.is_folio = true,
+	};
 	spinlock_t *ptl;
 	pgtable_t pgtable = NULL;
 	int error;
···
 	}
 
 	ptl = pmd_lock(mm, vmf->pmd);
-	if (pmd_none(*vmf->pmd)) {
-		folio_get(folio);
-		folio_add_file_rmap_pmd(folio, &folio->page, vma);
-		add_mm_counter(mm, mm_counter_file(folio), HPAGE_PMD_NR);
-	}
-	error = insert_pfn_pmd(vma, addr, vmf->pmd,
-			pfn_to_pfn_t(folio_pfn(folio)), vma->vm_page_prot,
-			write, pgtable);
+	error = insert_pmd(vma, addr, vmf->pmd, fop, vma->vm_page_prot,
+			write, pgtable);
 	spin_unlock(ptl);
 	if (error && pgtable)
 		pte_free(mm, pgtable);