Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

kho: warn and fail on metadata or preserved memory in scratch area

Patch series "KHO: kfence + KHO memory corruption fix", v3.

This series fixes a memory corruption bug in KHO that occurs when KFENCE
is enabled.

The root cause is that KHO metadata, allocated via kzalloc(), can be
randomly serviced by kfence_alloc(). When a kernel boots via KHO, the
early memblock allocator is restricted to a "scratch area". This forces
the KFENCE pool to be allocated within this scratch area, creating a
conflict. If KHO metadata is subsequently placed in this pool, it gets
corrupted during the next kexec operation.
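The failure mode can be modeled in a few lines of userspace C. This is a toy sketch, not kernel code. A byte array stands in for physical memory, and SCRATCH_START/SCRATCH_END bound a hypothetical scratch area. TOY_POOL_START is an assumed location for a KFENCE-like pool that ended up inside that area. KFENCE, memblock and kexec themselves are not modeled; the next kernel's use of scratch is reduced to a memset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Toy model (all names hypothetical): a byte array stands in for physical
 * memory and offsets into it stand in for physical addresses.
 */
#define TOY_MEM_SIZE   0x4000u
#define SCRATCH_START  0x1000u /* area the next kernel is loaded into and allocates from */
#define SCRATCH_END    0x3000u
#define TOY_POOL_START 0x2000u /* assumed KFENCE-like pool carved out of scratch */
#define PATTERN        0xA5u

static uint8_t toy_mem[TOY_MEM_SIZE];

/* "Allocate" a metadata record at off and fill it with a known pattern. */
static void write_metadata(size_t off, size_t len)
{
	assert(off + len <= TOY_MEM_SIZE);
	memset(&toy_mem[off], PATTERN, len);
}

/* True if every byte of the record still holds the pattern. */
static bool metadata_intact(size_t off, size_t len)
{
	assert(off + len <= TOY_MEM_SIZE);
	for (size_t i = 0; i < len; i++)
		if (toy_mem[off + i] != PATTERN)
			return false;
	return true;
}

/* Model the next kernel overwriting the whole scratch area during kexec/early boot. */
static void next_kernel_reuses_scratch(void)
{
	memset(&toy_mem[SCRATCH_START], 0, SCRATCH_END - SCRATCH_START);
}
```

In this sketch, two 64-byte records are written: one at offset 0x3800 and one at TOY_POOL_START. After next_kernel_reuses_scratch() runs, the first record is intact and the second is zeroed. That is the kind of damage metadata suffers when the slab request was serviced from a pool that lives in scratch.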

Google is using KHO and has had obscure crashes due to this memory
corruption, with stacks all over the place. I would prefer this fix to be
properly backported to stable so we can also automatically consume it once
we switch to the upstream KHO.

Patch 1/3 introduces a debug-only feature (CONFIG_KEXEC_HANDOVER_DEBUG)
that adds checks to detect and fail any operation that attempts to place
KHO metadata or preserved memory within the scratch area. This serves as
a validation and diagnostic tool to confirm the problem without affecting
production builds.

Patch 2/3 increases the bitmap size to PAGE_SIZE so that the buddy
allocator can be used.

Patch 3/3 provides the fix by modifying KHO to allocate its metadata
directly from the buddy allocator instead of slab. This bypasses the
KFENCE interception entirely.


This patch (of 3):

It is invalid for KHO metadata or preserved memory regions to be located
within the KHO scratch area, as this area is overwritten when the next
kernel is loaded, and used early in boot by the next kernel. This can
lead to memory corruption.

Add checks to kho_preserve_* and KHO's internal metadata allocators
(xa_load_or_alloc, new_chunk) to verify that the physical address of the
memory does not overlap with any defined scratch region. If an overlap is
detected, the operation will fail and a WARN_ON is triggered. To avoid
performance overhead in production kernels, these checks are enabled only
when CONFIG_KEXEC_HANDOVER_DEBUG is selected.
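The debug-only gating can be sketched in userspace as follows. The sketch makes several assumptions:

- TOY_KHO_DEBUG is a hypothetical stand-in for CONFIG_KEXEC_HANDOVER_DEBUG.
- There is a single scratch region at [1 MiB, 2 MiB).
- Pages are 4 KiB.

toy_preserve_folio() only mirrors the shape of the check added to kho_preserve_folio(). It converts pfn and order into a byte range, then warns and returns -EINVAL on overlap. It does not preserve anything.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Userspace sketch. TOY_KHO_DEBUG is a hypothetical stand-in for
 * CONFIG_KEXEC_HANDOVER_DEBUG; compile with -DTOY_KHO_DEBUG to enable the
 * check. 4 KiB pages are assumed.
 */
#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_SIZE  (1ull << TOY_PAGE_SHIFT)

/* One hypothetical scratch region: [1 MiB, 2 MiB). */
#define TOY_SCRATCH_START 0x100000ull
#define TOY_SCRATCH_END   0x200000ull

#ifdef TOY_KHO_DEBUG
#define TOY_DEBUG_ENABLED 1
static bool toy_scratch_overlap(uint64_t phys, uint64_t size)
{
	return phys < TOY_SCRATCH_END && phys + size > TOY_SCRATCH_START;
}
#else
#define TOY_DEBUG_ENABLED 0
/* Non-debug builds: a constant-false stub the compiler can fold away. */
static inline bool toy_scratch_overlap(uint64_t phys, uint64_t size)
{
	(void)phys;
	(void)size;
	return false;
}
#endif

/* Same arithmetic as the kho_preserve_folio() check: pfn/order -> byte range. */
static int toy_preserve_folio(uint64_t pfn, unsigned int order)
{
	uint64_t phys = pfn << TOY_PAGE_SHIFT;
	uint64_t size = TOY_PAGE_SIZE << order;

	if (toy_scratch_overlap(phys, size)) {
		fprintf(stderr, "WARN: %#llx+%#llx overlaps scratch\n",
			(unsigned long long)phys, (unsigned long long)size);
		return -EINVAL;
	}
	return 0; /* the real code would go on to record the range */
}
```

Built with -DTOY_KHO_DEBUG, an order-9 folio at pfn 0x100 (bytes 0x100000 to 0x2fffff) is rejected with -EINVAL. Without that flag, the stub returns false and the same call succeeds, which is the trade-off the commit accepts for production builds. A folio starting exactly at 0x200000 is accepted in both builds because the ranges are half-open.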

[rppt@kernel.org: fix KEXEC_HANDOVER_DEBUG Kconfig dependency]
Link: https://lkml.kernel.org/r/aQHUyyFtiNZhx8jo@kernel.org
[pasha.tatashin@soleen.com: build fix]
Link: https://lkml.kernel.org/r/CA+CK2bBnorfsTymKtv4rKvqGBHs=y=MjEMMRg_tE-RME6n-zUw@mail.gmail.com
Link: https://lkml.kernel.org/r/20251021000852.2924827-1-pasha.tatashin@soleen.com
Link: https://lkml.kernel.org/r/20251021000852.2924827-2-pasha.tatashin@soleen.com
Fixes: fc33e4b44b27 ("kexec: enable KHO support for memory preservation")
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Mike Rapoport <rppt@kernel.org>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Pratyush Yadav <pratyush@kernel.org>
Cc: Alexander Graf <graf@amazon.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Matlack <dmatlack@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Samiullah Khawaja <skhawaja@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Authored by Pasha Tatashin and committed by Andrew Morton.
e38f65d3 77008e1b

5 files changed, 93 insertions(+), 19 deletions(-)

kernel/Kconfig.kexec (+9)
···
 	  to keep data or state alive across the kexec. For this to work,
 	  both source and target kernels need to have this option enabled.

+config KEXEC_HANDOVER_DEBUG
+	bool "Enable Kexec Handover debug checks"
+	depends on KEXEC_HANDOVER
+	help
+	  This option enables extra sanity checks for the Kexec Handover
+	  subsystem. Since, KHO performance is crucial in live update
+	  scenarios and the extra code might be adding overhead it is
+	  only optionally enabled.
+
 config CRASH_DUMP
 	bool "kernel crash dumps"
 	default ARCH_DEFAULT_CRASH_DUMP

kernel/Makefile (+1)
···
 obj-$(CONFIG_KEXEC_FILE) += kexec_file.o
 obj-$(CONFIG_KEXEC_ELF) += kexec_elf.o
 obj-$(CONFIG_KEXEC_HANDOVER) += kexec_handover.o
+obj-$(CONFIG_KEXEC_HANDOVER_DEBUG) += kexec_handover_debug.o
 obj-$(CONFIG_BACKTRACE_SELF_TEST) += backtracetest.o
 obj-$(CONFIG_COMPAT) += compat.o
 obj-$(CONFIG_CGROUPS) += cgroup/

kernel/kexec_handover.c (+38 -19)
···

 #define pr_fmt(fmt) "KHO: " fmt

+#include <linux/cleanup.h>
 #include <linux/cma.h>
 #include <linux/count_zeros.h>
 #include <linux/debugfs.h>
···

 #include <asm/early_ioremap.h>

+#include "kexec_handover_internal.h"
 /*
  * KHO is tightly coupled with mm init and needs access to some of mm
  * internal APIs.
···

 static void *xa_load_or_alloc(struct xarray *xa, unsigned long index, size_t sz)
 {
-	void *elm, *res;
+	void *res = xa_load(xa, index);

-	elm = xa_load(xa, index);
-	if (elm)
-		return elm;
+	if (res)
+		return res;

-	elm = kzalloc(sz, GFP_KERNEL);
+	void *elm __free(kfree) = kzalloc(sz, GFP_KERNEL);
+
 	if (!elm)
 		return ERR_PTR(-ENOMEM);

+	if (WARN_ON(kho_scratch_overlap(virt_to_phys(elm), sz)))
+		return ERR_PTR(-EINVAL);
+
 	res = xa_cmpxchg(xa, index, NULL, elm, GFP_KERNEL);
 	if (xa_is_err(res))
-		res = ERR_PTR(xa_err(res));
-
-	if (res) {
-		kfree(elm);
+		return ERR_PTR(xa_err(res));
+	else if (res)
 		return res;
-	}

-	return elm;
+	return no_free_ptr(elm);
 }

 static void __kho_unpreserve(struct kho_mem_track *track, unsigned long pfn,
···
 static struct khoser_mem_chunk *new_chunk(struct khoser_mem_chunk *cur_chunk,
 					  unsigned long order)
 {
-	struct khoser_mem_chunk *chunk;
+	struct khoser_mem_chunk *chunk __free(kfree) = NULL;

 	chunk = kzalloc(PAGE_SIZE, GFP_KERNEL);
 	if (!chunk)
-		return NULL;
+		return ERR_PTR(-ENOMEM);
+
+	if (WARN_ON(kho_scratch_overlap(virt_to_phys(chunk), PAGE_SIZE)))
+		return ERR_PTR(-EINVAL);
+
 	chunk->hdr.order = order;
 	if (cur_chunk)
 		KHOSER_STORE_PTR(cur_chunk->hdr.next, chunk);
-	return chunk;
+	return no_free_ptr(chunk);
 }

 static void kho_mem_ser_free(struct khoser_mem_chunk *first_chunk)
···
 	struct khoser_mem_chunk *chunk = NULL;
 	struct kho_mem_phys *physxa;
 	unsigned long order;
+	int err = -ENOMEM;

 	xa_for_each(&ser->track.orders, order, physxa) {
 		struct kho_mem_phys_bits *bits;
 		unsigned long phys;

 		chunk = new_chunk(chunk, order);
-		if (!chunk)
+		if (IS_ERR(chunk)) {
+			err = PTR_ERR(chunk);
 			goto err_free;
+		}

 		if (!first_chunk)
 			first_chunk = chunk;
···

 			if (chunk->hdr.num_elms == ARRAY_SIZE(chunk->bitmaps)) {
 				chunk = new_chunk(chunk, order);
-				if (!chunk)
+				if (IS_ERR(chunk)) {
+					err = PTR_ERR(chunk);
 					goto err_free;
+				}
 			}

 			elm = &chunk->bitmaps[chunk->hdr.num_elms];
···

 err_free:
 	kho_mem_ser_free(first_chunk);
-	return -ENOMEM;
+	return err;
 }

 static void __init deserialize_bitmap(unsigned int order,
···
  * area for early allocations that happen before page allocator is
  * initialized.
  */
-static struct kho_scratch *kho_scratch;
-static unsigned int kho_scratch_cnt;
+struct kho_scratch *kho_scratch;
+unsigned int kho_scratch_cnt;

 /*
  * The scratch areas are scaled by default as percent of memory allocated from
···
 	const unsigned int order = folio_order(folio);
 	struct kho_mem_track *track = &kho_out.ser.track;

+	if (WARN_ON(kho_scratch_overlap(pfn << PAGE_SHIFT, PAGE_SIZE << order)))
+		return -EINVAL;
+
 	return __kho_preserve_order(track, pfn, order);
 }
 EXPORT_SYMBOL_GPL(kho_preserve_folio);
···
 	unsigned long pfn = start_pfn;
 	unsigned long failed_pfn = 0;
 	int err = 0;
+
+	if (WARN_ON(kho_scratch_overlap(start_pfn << PAGE_SHIFT,
+					nr_pages << PAGE_SHIFT))) {
+		return -EINVAL;
+	}

 	while (pfn < end_pfn) {
 		const unsigned int order =
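The reworked xa_load_or_alloc() and new_chunk() combine two kernel idioms. The first is scope-based freeing with __free(kfree) and no_free_ptr() from <linux/cleanup.h>. The second is errno-carrying pointers with ERR_PTR(), IS_ERR() and PTR_ERR().

The sketch below approximates both in userspace C. It assumes GCC or Clang, because it relies on the cleanup attribute and on statement expressions. AUTO_FREE, TAKE_PTR, the toy_err_ptr() family, struct toy_chunk and the toy_force_overlap switch are all hypothetical names. The switch stands in for the WARN_ON(kho_scratch_overlap(...)) check.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Scope-based freeing: the cleanup function receives a pointer to the variable. */
static inline void toy_freep(void *p)
{
	free(*(void **)p);
}
#define AUTO_FREE __attribute__((cleanup(toy_freep)))

/* Hand ownership out of a scope: yield the pointer and NULL the local copy. */
#define TAKE_PTR(p) ({ __typeof__(p) toy_tmp_ = (p); (p) = NULL; toy_tmp_; })

/* Errno-in-pointer encoding modeled on include/linux/err.h (MAX_ERRNO is 4095). */
#define TOY_MAX_ERRNO 4095
static inline void *toy_err_ptr(long error) { return (void *)(intptr_t)error; }
static inline long toy_ptr_err(const void *ptr) { return (long)(intptr_t)ptr; }
static inline bool toy_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-TOY_MAX_ERRNO;
}

struct toy_chunk {
	unsigned long order;
	struct toy_chunk *next;
};

/* Hypothetical switch standing in for the scratch-overlap check. */
static bool toy_force_overlap;

/* Same control flow as the patched new_chunk(). */
static struct toy_chunk *toy_new_chunk(struct toy_chunk *cur, unsigned long order)
{
	struct toy_chunk *chunk AUTO_FREE = calloc(1, sizeof(*chunk));

	if (!chunk)
		return toy_err_ptr(-ENOMEM);

	if (toy_force_overlap)
		return toy_err_ptr(-EINVAL); /* chunk is freed automatically here */

	chunk->order = order;
	if (cur)
		cur->next = chunk;
	return TAKE_PTR(chunk); /* ownership moves to the caller; nothing is freed */
}

static void toy_free_list(struct toy_chunk *c)
{
	while (c) {
		struct toy_chunk *next = c->next;

		free(c);
		c = next;
	}
}

/* Caller pattern from the serialization loop: propagate the encoded errno. */
static int toy_build_list(unsigned long n, struct toy_chunk **head)
{
	struct toy_chunk *chunk = NULL;

	*head = NULL;
	for (unsigned long i = 0; i < n; i++) {
		chunk = toy_new_chunk(chunk, i);
		if (toy_is_err(chunk)) {
			int err = (int)toy_ptr_err(chunk);

			toy_free_list(*head);
			*head = NULL;
			return err;
		}
		if (!*head)
			*head = chunk;
	}
	return 0;
}
```

Every early return after the allocation frees the chunk without an explicit kfree(), including the new -EINVAL path. The success path hands ownership to the caller by clearing the local variable first. Because failures are now encoded in the pointer, the serialization loop can return the real error instead of a hard-coded -ENOMEM.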

kernel/kexec_handover_debug.c (+25)
···
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * kexec_handover_debug.c - kexec handover optional debug functionality
+ * Copyright (C) 2025 Google LLC, Pasha Tatashin <pasha.tatashin@soleen.com>
+ */
+
+#define pr_fmt(fmt) "KHO: " fmt
+
+#include "kexec_handover_internal.h"
+
+bool kho_scratch_overlap(phys_addr_t phys, size_t size)
+{
+	phys_addr_t scratch_start, scratch_end;
+	unsigned int i;
+
+	for (i = 0; i < kho_scratch_cnt; i++) {
+		scratch_start = kho_scratch[i].addr;
+		scratch_end = kho_scratch[i].addr + kho_scratch[i].size;
+
+		if (phys < scratch_end && (phys + size) > scratch_start)
+			return true;
+	}
+
+	return false;
+}
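kho_scratch_overlap() treats both the queried range and each scratch region as half-open intervals. A range that merely touches a region boundary therefore does not count as overlapping. The userspace copy below keeps the same loop and predicate. The two-region toy_scratch[] table is an assumed layout, and struct toy_scratch only mirrors the addr and size fields that the patch reads from struct kho_scratch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t toy_phys_addr_t;

/* Mirrors only the fields the patch reads from struct kho_scratch. */
struct toy_scratch {
	toy_phys_addr_t addr;
	toy_phys_addr_t size;
};

/* Hypothetical layout: two scratch regions. */
static const struct toy_scratch toy_scratch[] = {
	{ .addr = 0x1000, .size = 0x1000 }, /* [0x1000, 0x2000) */
	{ .addr = 0x8000, .size = 0x4000 }, /* [0x8000, 0xc000) */
};
static const unsigned int toy_scratch_cnt =
	sizeof(toy_scratch) / sizeof(toy_scratch[0]);

/*
 * Same predicate as kho_scratch_overlap(): [phys, phys + size) intersects
 * [start, end) if and only if phys < end and phys + size > start.
 */
static bool toy_scratch_overlap(toy_phys_addr_t phys, size_t size)
{
	for (unsigned int i = 0; i < toy_scratch_cnt; i++) {
		toy_phys_addr_t start = toy_scratch[i].addr;
		toy_phys_addr_t end = toy_scratch[i].addr + toy_scratch[i].size;

		if (phys < end && phys + size > start)
			return true;
	}
	return false;
}
```

With this layout, the following ranges are not reported:

- [0x0, 0x1000) and [0x2000, 0x3000), which only touch the first region.
- [0x3000, 0x8000), which lies in the gap between the regions.

The following ranges are reported:

- [0xfff, 0x1001), which straddles the start of the first region.
- [0xbfff, 0xcfff), which covers the last byte of the second region.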

kernel/kexec_handover_internal.h (+20)
···
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef LINUX_KEXEC_HANDOVER_INTERNAL_H
+#define LINUX_KEXEC_HANDOVER_INTERNAL_H
+
+#include <linux/kexec_handover.h>
+#include <linux/types.h>
+
+extern struct kho_scratch *kho_scratch;
+extern unsigned int kho_scratch_cnt;
+
+#ifdef CONFIG_KEXEC_HANDOVER_DEBUG
+bool kho_scratch_overlap(phys_addr_t phys, size_t size);
+#else
+static inline bool kho_scratch_overlap(phys_addr_t phys, size_t size)
+{
+	return false;
+}
+#endif /* CONFIG_KEXEC_HANDOVER_DEBUG */
+
+#endif /* LINUX_KEXEC_HANDOVER_INTERNAL_H */