Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

mm/memory/fault: add THP fault handling for zone device private pages

Implement CPU fault handling for zone device THP entries through
do_huge_pmd_device_private(), enabling transparent migration of
device-private large pages back to system memory on CPU access.

When the CPU accesses a zone device THP entry, the fault handler calls the
device driver's migrate_to_ram() callback to migrate the entire large page
back to system memory.
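The commit message describes a callback dispatch. The core fault path looks up the pagemap that owns the device page and calls that driver's migrate_to_ram() hook, which moves the whole large page in one call. The following is a minimal userspace model of that indirection, not kernel code. Every model_* type and function is hypothetical, standing in for struct vm_fault, struct page, struct dev_pagemap and its ops table. The count of 512 base pages assumes a 2 MiB PMD-sized THP built from 4 KiB base pages.

```c
#include <assert.h>

/* Hypothetical userspace stand-ins; none of these are kernel types.
 * model_fault ~ struct vm_fault, model_page ~ struct page,
 * model_pagemap ~ struct dev_pagemap, model_pagemap_ops ~ its ops table. */
struct model_fault;

struct model_pagemap_ops {
	/* Driver hook: bring the faulting device-private page back to RAM. */
	int (*migrate_to_ram)(struct model_fault *fault);
};

struct model_pagemap {
	const struct model_pagemap_ops *ops;
};

struct model_page {
	struct model_pagemap *pgmap;	/* owning device's pagemap */
	unsigned int nr_pages;		/* base pages in the large page */
	unsigned int migrated;		/* base pages moved back so far */
};

struct model_fault {
	struct model_page *page;	/* page that triggered the fault */
};

/* Example driver callback. A real driver would copy data and remap;
 * this stub only records that every base page moved in a single call. */
static int model_driver_migrate_to_ram(struct model_fault *fault)
{
	fault->page->migrated = fault->page->nr_pages;
	return 0;	/* 0 stands in for "fault handled" */
}

static const struct model_pagemap_ops model_driver_ops = {
	.migrate_to_ram = model_driver_migrate_to_ram,
};

/* Core side: mirrors page_pgmap(page)->ops->migrate_to_ram(vmf). */
static int model_handle_device_private_fault(struct model_fault *fault)
{
	return fault->page->pgmap->ops->migrate_to_ram(fault);
}
```

In this model the core only knows the ops table; how the copy is done stays entirely inside the driver callback.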

Link: https://lkml.kernel.org/r/20251001065707.920170-9-balbirs@nvidia.com
Signed-off-by: Balbir Singh <balbirs@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Rakie Kim <rakie.kim@sk.com>
Cc: Byungchul Park <byungchul@sk.com>
Cc: Gregory Price <gourry@gourry.net>
Cc: Ying Huang <ying.huang@linux.alibaba.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Mika Penttilä <mpenttil@redhat.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Francois Dugast <francois.dugast@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Authored by Balbir Singh and committed by Andrew Morton.
49640991 a30b48bf

3 files changed, +48 -2
include/linux/huge_mm.h (+7)
···
 
 vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf);
 
+vm_fault_t do_huge_pmd_device_private(struct vm_fault *vmf);
+
 extern struct folio *huge_zero_folio;
 extern unsigned long huge_zero_pfn;
 
···
 }
 
 static inline vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
+{
+	return 0;
+}
+
+static inline vm_fault_t do_huge_pmd_device_private(struct vm_fault *vmf)
 {
 	return 0;
 }
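The huge_mm.h hunks follow the header's usual pattern: a real prototype where THP support is built in, and a static inline stub returning 0 in the other branch (presumably the !CONFIG_TRANSPARENT_HUGEPAGE section), so that the new call in mm/memory.c compiles and links either way. A minimal sketch of that pattern, assuming a hypothetical MODEL_THP macro in place of CONFIG_TRANSPARENT_HUGEPAGE and a plain unsigned int in place of vm_fault_t:

```c
#include <assert.h>

/* Hypothetical stand-ins: model_vm_fault_t ~ vm_fault_t,
 * MODEL_THP ~ CONFIG_TRANSPARENT_HUGEPAGE. */
typedef unsigned int model_vm_fault_t;

struct model_vm_fault {
	unsigned long address;
};

#ifdef MODEL_THP
/* THP built in: a real prototype; the definition would normally live in a
 * separate .c file (mm/huge_memory.c in the commit). */
model_vm_fault_t model_do_huge_pmd_device_private(struct model_vm_fault *vmf);

model_vm_fault_t model_do_huge_pmd_device_private(struct model_vm_fault *vmf)
{
	(void)vmf;
	return 1;	/* arbitrary nonzero marker for this sketch */
}
#else
/* THP compiled out: a no-op stub so callers still compile and link. */
static inline model_vm_fault_t
model_do_huge_pmd_device_private(struct model_vm_fault *vmf)
{
	(void)vmf;
	return 0;
}
#endif
```

Built without -DMODEL_THP, the call resolves to the inline stub and returns 0; with it, the out-of-line version is used instead.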
mm/huge_memory.c (+38)
···
 
 }
 
+vm_fault_t do_huge_pmd_device_private(struct vm_fault *vmf)
+{
+	struct vm_area_struct *vma = vmf->vma;
+	vm_fault_t ret = 0;
+	spinlock_t *ptl;
+	swp_entry_t swp_entry;
+	struct page *page;
+	struct folio *folio;
+
+	if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
+		vma_end_read(vma);
+		return VM_FAULT_RETRY;
+	}
+
+	ptl = pmd_lock(vma->vm_mm, vmf->pmd);
+	if (unlikely(!pmd_same(*vmf->pmd, vmf->orig_pmd))) {
+		spin_unlock(ptl);
+		return 0;
+	}
+
+	swp_entry = pmd_to_swp_entry(vmf->orig_pmd);
+	page = pfn_swap_entry_to_page(swp_entry);
+	folio = page_folio(page);
+	vmf->page = page;
+	vmf->pte = NULL;
+	if (folio_trylock(folio)) {
+		folio_get(folio);
+		spin_unlock(ptl);
+		ret = page_pgmap(page)->ops->migrate_to_ram(vmf);
+		folio_unlock(folio);
+		folio_put(folio);
+	} else {
+		spin_unlock(ptl);
+	}
+
+	return ret;
+}
+
 /*
  * always: directly stall for all thp allocations
  * defer: wake kswapd and fail if not immediately available
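The order of operations in do_huge_pmd_device_private() can be modelled single-threaded, with a bool standing in for each lock so that contention and a changed PMD can be simulated. Everything prefixed model_ is hypothetical. The model skips vma_end_read(), the swap-entry-to-page lookup and the vmf->page/vmf->pte setup, and tracks only the lock, recheck, pin, callback, unlock, unpin sequence. Taking the folio reference while the PMD lock is still held is presumably what keeps the folio alive once that lock is dropped before the driver callback runs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded model: every model_* name is invented, and a
 * bool stands in for each lock so contention can be simulated. */
enum model_ret { MODEL_OK = 0, MODEL_RETRY = 1 };

struct model_folio {
	bool locked;		/* folio lock analogue */
	int refcount;		/* folio reference count analogue */
	int migrations;		/* times the driver hook ran */
};

struct model_pmd {
	bool ptl_held;		/* PMD page-table lock analogue */
	unsigned long val;	/* current PMD contents */
	struct model_folio *folio;
};

struct model_fault {
	bool vma_lock_only;	/* FAULT_FLAG_VMA_LOCK analogue */
	unsigned long orig_pmd;	/* PMD value sampled before locking */
	struct model_pmd *pmd;
};

static bool model_folio_trylock(struct model_folio *folio)
{
	if (folio->locked)
		return false;
	folio->locked = true;
	return true;
}

/* Stand-in for the driver's migrate_to_ram() hook. */
static enum model_ret model_migrate_to_ram(struct model_folio *folio)
{
	folio->migrations++;
	return MODEL_OK;
}

static enum model_ret model_device_private_fault(struct model_fault *vmf)
{
	enum model_ret ret = MODEL_OK;
	struct model_folio *folio;

	/* Only the per-VMA lock is held: ask the caller to retry
	 * (the kernel also drops that lock with vma_end_read()). */
	if (vmf->vma_lock_only)
		return MODEL_RETRY;

	vmf->pmd->ptl_held = true;			/* pmd_lock() */
	if (vmf->pmd->val != vmf->orig_pmd) {		/* !pmd_same() */
		vmf->pmd->ptl_held = false;
		return MODEL_OK;			/* entry changed under us */
	}

	folio = vmf->pmd->folio;
	if (model_folio_trylock(folio)) {
		folio->refcount++;			/* folio_get() under ptl */
		vmf->pmd->ptl_held = false;		/* drop ptl before the hook */
		ret = model_migrate_to_ram(folio);
		folio->locked = false;			/* folio_unlock() */
		folio->refcount--;			/* folio_put() */
	} else {
		vmf->pmd->ptl_held = false;		/* contended: do nothing */
	}
	return ret;
}
```

In this model the three early exits (VMA-lock retry, changed PMD, contended folio lock) all return without calling the driver hook; in the kernel the faulting access would then presumably be retried.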
mm/memory.c (+3 -2)
···
 		vmf.orig_pmd = pmdp_get_lockless(vmf.pmd);
 
 		if (unlikely(is_swap_pmd(vmf.orig_pmd))) {
-			VM_BUG_ON(thp_migration_supported() &&
-				  !is_pmd_migration_entry(vmf.orig_pmd));
+			if (is_pmd_device_private_entry(vmf.orig_pmd))
+				return do_huge_pmd_device_private(&vmf);
+
 			if (is_pmd_migration_entry(vmf.orig_pmd))
 				pmd_migration_entry_wait(mm, vmf.pmd);
 			return 0;
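The memory.c hunk changes how a swap-format PMD is dispatched. Device-private entries now go to do_huge_pmd_device_private(), migration entries still wait, and anything else returns 0. The removed VM_BUG_ON() asserted that, when THP migration is supported, every swap PMD is a migration entry, an assumption that device-private PMD entries would break. A minimal sketch of the resulting decision table, with hypothetical model_* enums standing in for is_swap_pmd() and the is_pmd_*_entry() predicates:

```c
#include <assert.h>

/* Hypothetical model of the PMD dispatch after this commit; the enums
 * replace is_swap_pmd() and the is_pmd_*_entry() checks. */
enum model_pmd_kind {
	MODEL_PMD_NOT_SWAP,		/* !is_swap_pmd(): handled later */
	MODEL_PMD_DEVICE_PRIVATE,	/* is_pmd_device_private_entry() */
	MODEL_PMD_MIGRATION,		/* is_pmd_migration_entry() */
	MODEL_PMD_OTHER_SWAP,		/* any other swap-format PMD */
};

enum model_action {
	MODEL_FALL_THROUGH,		/* continue with the rest of the function */
	MODEL_CALL_DEVICE_PRIVATE,	/* return do_huge_pmd_device_private() */
	MODEL_WAIT_THEN_RETURN_0,	/* pmd_migration_entry_wait(); return 0 */
	MODEL_RETURN_0,			/* return 0 */
};

static enum model_action model_swap_pmd_action(enum model_pmd_kind kind)
{
	switch (kind) {
	case MODEL_PMD_NOT_SWAP:
		return MODEL_FALL_THROUGH;
	case MODEL_PMD_DEVICE_PRIVATE:
		/* Checked first; the removed VM_BUG_ON() assumed this case
		 * could not occur. */
		return MODEL_CALL_DEVICE_PRIVATE;
	case MODEL_PMD_MIGRATION:
		return MODEL_WAIT_THEN_RETURN_0;
	default:
		return MODEL_RETURN_0;
	}
}
```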