Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux
1
fork

Configure Feed

Select the types of activity you want to include in your feed.

mm: prevent poison consumption when splitting THP

When performing memory error injection on a THP (Transparent Huge Page)
mapped to userspace on an x86 server, the kernel panics with the following
trace. The expected behavior is to terminate the affected process instead
of panicking the kernel, as the x86 Machine Check code can recover from an
in-userspace #MC.

mce: [Hardware Error]: CPU 0: Machine Check Exception: f Bank 3: bd80000000070134
mce: [Hardware Error]: RIP 10:<ffffffff8372f8bc> {memchr_inv+0x4c/0xf0}
mce: [Hardware Error]: TSC afff7bbff88a ADDR 1d301b000 MISC 80 PPIN 1e741e77539027db
mce: [Hardware Error]: PROCESSOR 0:d06d0 TIME 1758093249 SOCKET 0 APIC 0 microcode 80000320
mce: [Hardware Error]: Run the above through 'mcelog --ascii'
mce: [Hardware Error]: Machine check: Data load in unrecoverable area of kernel
Kernel panic - not syncing: Fatal local machine check

The root cause of this panic is that handling a memory failure triggered
by an in-userspace #MC necessitates splitting the THP. The splitting
process employs a mechanism, implemented in
try_to_map_unused_to_zeropage(), which reads the pages in the THP to
identify zero-filled pages. However, reading the pages in the THP results
in a second in-kernel #MC, occurring before the initial memory_failure()
completes, ultimately leading to a kernel panic. See the kernel panic
call trace on the two #MCs.

First Machine Check occurs // [1]
memory_failure() // [2]
try_to_split_thp_page()
split_huge_page()
split_huge_page_to_list_to_order()
__folio_split() // [3]
remap_page()
remove_migration_ptes()
remove_migration_pte()
try_to_map_unused_to_zeropage() // [4]
memchr_inv() // [5]
Second Machine Check occurs // [6]
Kernel panic

[1] Triggered by accessing a hardware-poisoned THP in userspace, which is
typically recoverable by terminating the affected process.

[2] Call folio_set_has_hwpoisoned() before try_to_split_thp_page().

[3] Pass the RMP_USE_SHARED_ZEROPAGE remap flag to remap_page().

[4] Try to map the unused THP to zeropage.

[5] Re-access pages in the hw-poisoned THP in the kernel.

[6] Triggered in-kernel, leading to a panic kernel.

In Step[2], memory_failure() sets the poisoned flag on the page in the THP
by TestSetPageHWPoison() before calling try_to_split_thp_page().

As suggested by David Hildenbrand, fix this panic by not accessing to the
poisoned page in the THP during zeropage identification, while continuing
to scan unaffected pages in the THP for possible zeropage mapping. This
prevents a second in-kernel #MC that would cause kernel panic in Step[4].

Thanks to Andrew Zaborowski for his initial work on fixing this issue.

Link: https://lkml.kernel.org/r/20251015064926.1887643-1-qiuxu.zhuo@intel.com
Link: https://lkml.kernel.org/r/20251011075520.320862-1-qiuxu.zhuo@intel.com
Fixes: b1f202060afe ("mm: remap unused subpages to shared zeropage when splitting isolated thp")
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Reported-by: Farrah Chen <farrah.chen@intel.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Tested-by: Farrah Chen <farrah.chen@intel.com>
Tested-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Acked-by: Lance Yang <lance.yang@linux.dev>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Acked-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jiaqi Yan <jiaqiyan@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Mariano Pache <npache@redhat.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

authored by

Qiuxu Zhuo and committed by
Andrew Morton
841a8bfc 78a63493

+5 -1
+3
mm/huge_memory.c
··· 4109 4109 if (khugepaged_max_ptes_none == HPAGE_PMD_NR - 1) 4110 4110 return false; 4111 4111 4112 + if (folio_contain_hwpoisoned_page(folio)) 4113 + return false; 4114 + 4112 4115 for (i = 0; i < folio_nr_pages(folio); i++) { 4113 4116 if (pages_identical(folio_page(folio, i), ZERO_PAGE(0))) { 4114 4117 if (++num_zero_pages > khugepaged_max_ptes_none)
+2 -1
mm/migrate.c
··· 301 301 struct page *page = folio_page(folio, idx); 302 302 pte_t newpte; 303 303 304 - if (PageCompound(page)) 304 + if (PageCompound(page) || PageHWPoison(page)) 305 305 return false; 306 + 306 307 VM_BUG_ON_PAGE(!PageAnon(page), page); 307 308 VM_BUG_ON_PAGE(!PageLocked(page), page); 308 309 VM_BUG_ON_PAGE(pte_present(old_pte), page);