Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

mm/hmm: populate PFNs from PMD swap entry

Once support for THP migration of zone device pages is enabled, device
private swap entries will be found during the walk not only for PTEs but
also for PMDs.

Therefore, it is necessary to extend to PMDs the special handling which is
already in place for PTEs when device private pages are owned by the
caller: instead of faulting or skipping the range, the correct behavior is
to use the swap entry to populate HMM PFNs.
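
The decision taken for a non-present PMD can be modeled in user space as
below. This is only a sketch of the control flow visible in the mm/hmm.c
change of this commit, not kernel code: every sketch_* name, the struct
layout and the enum are hypothetical and exist only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcomes; the kernel code returns 0, a fault result or -EFAULT. */
enum sketch_action {
	SKETCH_POPULATE,	/* use the swap entry to fill HMM PFNs */
	SKETCH_FAULT,		/* device private, other owner: fault it in */
	SKETCH_EFAULT,		/* other non-present entry, fault required */
	SKETCH_FILL_ERROR,	/* no fault required: mark range as error */
};

/* Hypothetical stand-in for what is decoded from a PMD swap entry. */
struct sketch_pmd_entry {
	bool device_private;	/* models is_device_private_entry() */
	const void *owner;	/* models the pgmap owner of the folio */
};

static enum sketch_action
sketch_absent_pmd_action(const struct sketch_pmd_entry *entry,
			 const void *caller_owner, bool fault_required)
{
	/* Pages owned by the caller are never faulted or skipped. */
	if (entry->device_private && entry->owner == caller_owner)
		return SKETCH_POPULATE;

	if (fault_required)
		return entry->device_private ? SKETCH_FAULT : SKETCH_EFAULT;

	return SKETCH_FILL_ERROR;
}
```

Only the first branch is new behavior for PMDs; the last two outcomes match
what the walk already did for any non-present PMD.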

This change is a prerequisite to make use of device-private THP in drivers
using drivers/gpu/drm/drm_pagemap, such as xe.

Even though subsequent PFNs can be inferred when handling large order
PFNs, the PFN list is still fully populated because this is currently
expected by HMM users. If this changes in the future, that is, once all
HMM users support a sparsely populated PFN list, the for() loop can be
made to skip the remaining PFNs for the current order. A quick test shows
that with this optimization the loop takes about 10 ns, roughly 20 times
faster than without it.
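
To make the trade-off concrete, here is a user-space sketch of the two
fill strategies. It assumes a made-up flag layout (the SKETCH_* values are
not the kernel's HMM_PFN_* encoding) and hypothetical helper names; it
makes no claim about the timings quoted above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit layout for this sketch only. */
#define SKETCH_PFN_VALID	(1ULL << 63)
#define SKETCH_PFN_WRITE	(1ULL << 62)
#define SKETCH_ORDER_SHIFT	56
#define SKETCH_INOUT_FLAGS	(1ULL << 55)	/* caller-owned request bit */

static uint64_t sketch_flags(unsigned int order, int writable)
{
	uint64_t flags = SKETCH_PFN_VALID |
			 ((uint64_t)order << SKETCH_ORDER_SHIFT);

	if (writable)
		flags |= SKETCH_PFN_WRITE;
	return flags;
}

/* Dense fill: every slot gets its own PFN, as the patch does today. */
static size_t sketch_fill_dense(uint64_t *pfns, size_t npages,
				uint64_t first_pfn, unsigned int order,
				int writable)
{
	uint64_t flags = sketch_flags(order, writable);
	size_t i;

	for (i = 0; i < npages; i++) {
		pfns[i] &= SKETCH_INOUT_FLAGS;	/* keep caller's request bits */
		pfns[i] |= (first_pfn + i) | flags;
	}
	return i;	/* number of slots written */
}

/* Sparse fill: write only the first slot of each 2^order run. */
static size_t sketch_fill_sparse(uint64_t *pfns, size_t npages,
				 uint64_t first_pfn, unsigned int order,
				 int writable)
{
	uint64_t flags = sketch_flags(order, writable);
	size_t step = (size_t)1 << order;
	size_t written = 0;
	size_t i;

	for (i = 0; i < npages; i += step) {
		pfns[i] &= SKETCH_INOUT_FLAGS;
		pfns[i] |= (first_pfn + i) | flags;
		written++;
	}
	return written;
}
```

For a PMD-sized range of 512 pages at order 9, the dense variant writes 512
slots while the sparse variant writes one, which is why the sparse form is
only safe once every consumer derives the remaining PFNs from the order.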

Link: https://lkml.kernel.org/r/20250908091052.612303-1-francois.dugast@intel.com
Signed-off-by: Francois Dugast <francois.dugast@intel.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Leon Romanovsky <leonro@nvidia.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Balbir Singh <balbirs@nvidia.com>
Cc: David Airlie <airlied@gmail.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Mika Penttilä <mpenttil@redhat.com>
Cc: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Authored by Francois Dugast and committed by Andrew Morton
10b9feee 7cad96ae

+65 -5
mm/hmm.c
···
 	return hmm_vma_fault(addr, end, required_fault, walk);
 }
 
+#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
+static int hmm_vma_handle_absent_pmd(struct mm_walk *walk, unsigned long start,
+				     unsigned long end, unsigned long *hmm_pfns,
+				     pmd_t pmd)
+{
+	struct hmm_vma_walk *hmm_vma_walk = walk->private;
+	struct hmm_range *range = hmm_vma_walk->range;
+	unsigned long npages = (end - start) >> PAGE_SHIFT;
+	unsigned long addr = start;
+	swp_entry_t entry = pmd_to_swp_entry(pmd);
+	unsigned int required_fault;
+
+	if (is_device_private_entry(entry) &&
+	    pfn_swap_entry_folio(entry)->pgmap->owner ==
+	    range->dev_private_owner) {
+		unsigned long cpu_flags = HMM_PFN_VALID |
+			hmm_pfn_flags_order(PMD_SHIFT - PAGE_SHIFT);
+		unsigned long pfn = swp_offset_pfn(entry);
+		unsigned long i;
+
+		if (is_writable_device_private_entry(entry))
+			cpu_flags |= HMM_PFN_WRITE;
+
+		/*
+		 * Fully populate the PFN list though subsequent PFNs could be
+		 * inferred, because drivers which are not yet aware of large
+		 * folios probably do not support sparsely populated PFN lists.
+		 */
+		for (i = 0; addr < end; addr += PAGE_SIZE, i++, pfn++) {
+			hmm_pfns[i] &= HMM_PFN_INOUT_FLAGS;
+			hmm_pfns[i] |= pfn | cpu_flags;
+		}
+
+		return 0;
+	}
+
+	required_fault = hmm_range_need_fault(hmm_vma_walk, hmm_pfns,
+					      npages, 0);
+	if (required_fault) {
+		if (is_device_private_entry(entry))
+			return hmm_vma_fault(addr, end, required_fault, walk);
+		else
+			return -EFAULT;
+	}
+
+	return hmm_pfns_fill(start, end, range, HMM_PFN_ERROR);
+}
+#else
+static int hmm_vma_handle_absent_pmd(struct mm_walk *walk, unsigned long start,
+				     unsigned long end, unsigned long *hmm_pfns,
+				     pmd_t pmd)
+{
+	struct hmm_vma_walk *hmm_vma_walk = walk->private;
+	struct hmm_range *range = hmm_vma_walk->range;
+	unsigned long npages = (end - start) >> PAGE_SHIFT;
+
+	if (hmm_range_need_fault(hmm_vma_walk, hmm_pfns, npages, 0))
+		return -EFAULT;
+	return hmm_pfns_fill(start, end, range, HMM_PFN_ERROR);
+}
+#endif /* CONFIG_ARCH_ENABLE_THP_MIGRATION */
+
 static int hmm_vma_walk_pmd(pmd_t *pmdp,
 			    unsigned long start,
 			    unsigned long end,
···
 		return hmm_pfns_fill(start, end, range, 0);
 	}
 
-	if (!pmd_present(pmd)) {
-		if (hmm_range_need_fault(hmm_vma_walk, hmm_pfns, npages, 0))
-			return -EFAULT;
-		return hmm_pfns_fill(start, end, range, HMM_PFN_ERROR);
-	}
+	if (!pmd_present(pmd))
+		return hmm_vma_handle_absent_pmd(walk, start, end, hmm_pfns,
+						 pmd);
 
 	if (pmd_trans_huge(pmd)) {
 		/*