Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

mm: numa_memblks: Identify the accurate NUMA ID of CFMW

In some physical memory layout designs, the address space of a CFMW (CXL
Fixed Memory Window) lies between multiple segments of system memory
belonging to the same NUMA node. numa_cleanup_meminfo() merges these
segments into a single, larger numa_memblk. As a result, when the NUMA
node of the CFMW is looked up, it may be incorrectly resolved to the node
of the merged system memory.
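The merge step can be modeled roughly as follows. This is a simplified userspace sketch, not the kernel's numa_cleanup_meminfo(): struct memblk and try_merge() are hypothetical names. It assumes, as the patch's use of numa_reserved_meminfo suggests, that the CFMW range is tracked in the reserved list rather than in the array being merged, so it is never checked as an obstacle.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct numa_memblk. */
struct memblk {
	uint64_t start;	/* inclusive */
	uint64_t end;	/* exclusive */
	int nid;
};

/*
 * Try to merge two blocks of the same node into one span. The merge is
 * refused only if a block of a *different* node in mi[] overlaps the
 * combined span. Ranges that are not in mi[] (assumed here: a CFMW kept
 * in a separate reserved list) are never consulted, so they cannot stop it.
 * Returns 1 and fills *out on success, 0 otherwise.
 */
static int try_merge(const struct memblk *a, const struct memblk *b,
		     const struct memblk *mi, int nr, struct memblk *out)
{
	uint64_t start, end;
	int k;

	/* Sanity check: both inputs must be non-empty ranges. */
	assert(a->start < a->end && b->start < b->end);

	if (a->nid != b->nid)
		return 0;

	start = a->start < b->start ? a->start : b->start;
	end = a->end > b->end ? a->end : b->end;

	for (k = 0; k < nr; k++) {
		if (mi[k].nid == a->nid)
			continue;
		if (start < mi[k].end && end > mi[k].start)
			return 0;	/* another node's memory lies in the span */
	}

	out->start = start;
	out->end = end;
	out->nid = a->nid;
	return 1;
}
```

With the two node0 segments and the node1 segment of the example layout, try_merge() succeeds and yields a single node0 block ending at 0x60000000. If a node2 block covering the CFMW were present in the same array, the overlap check would refuse the merge.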

When a CXL RAM region is created from userspace, the capacity of the
newly created region is not added to the CFMW-dedicated NUMA node.
Instead, it is accumulated into an existing NUMA node (e.g., node0,
which contains system RAM). This makes it impossible to distinguish
between the two types of memory, which may affect memory-tiering
applications.

Example memory layout:

Physical address space:
0x00000000 - 0x1FFFFFFF System RAM (node0)
0x20000000 - 0x2FFFFFFF CXL CFMW (node2)
0x40000000 - 0x5FFFFFFF System RAM (node0)
0x60000000 - 0x7FFFFFFF System RAM (node1)

After numa_cleanup_meminfo, the two node0 segments are merged into one:
0x00000000 - 0x5FFFFFFF System RAM (node0) // CFMW is inside the range
0x60000000 - 0x7FFFFFFF System RAM (node1)

So the CFMW (0x20000000-0x2FFFFFFF) will be incorrectly assigned to node0.
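The misattribution can be reproduced with a small userspace model of the lookup. lookup_nid() is a hypothetical function modeled on meminfo_to_nid() (the first block containing the address wins), and merged_meminfo is an illustrative copy of the merged layout above; end addresses are exclusive, so 0x5FFFFFFF becomes 0x60000000.

```c
#include <assert.h>
#include <stdint.h>

#define NUMA_NO_NODE (-1)

/* Hypothetical, simplified stand-in for struct numa_memblk. */
struct memblk {
	uint64_t start;	/* inclusive */
	uint64_t end;	/* exclusive */
	int nid;
};

/* Simplified model of meminfo_to_nid(): the first block containing addr wins. */
static int lookup_nid(const struct memblk *mi, int nr, uint64_t addr)
{
	int i;

	for (i = 0; i < nr; i++)
		if (mi[i].start <= addr && addr < mi[i].end)
			return mi[i].nid;
	return NUMA_NO_NODE;
}

/* numa_meminfo of the example layout after the two node0 segments merge. */
static const struct memblk merged_meminfo[] = {
	{ 0x00000000, 0x60000000, 0 },	/* node0, now spans the CFMW */
	{ 0x60000000, 0x80000000, 1 },	/* node1 */
};
```

Looking up any CFMW address, for example lookup_nid(merged_meminfo, 2, 0x20000000), returns 0 rather than 2, because the merged node0 block covers it.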

To identify the correct NUMA node in this scenario, phys_to_target_node()
now also looks the address up in numa_reserved_meminfo. If the address
is described by both numa_meminfo and numa_reserved_meminfo, the node ID
from numa_reserved_meminfo takes precedence.
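The lookup rule before and after this change can be compared on the example layout with a minimal userspace sketch. meminfo, reserved_meminfo, lookup_nid(), target_node_old() and target_node_new() are hypothetical names; the two rules mirror the change to phys_to_target_node() in mm/numa_memblks.c, and the CFMW is assumed to appear in the reserved list as a node2 range.

```c
#include <assert.h>
#include <stdint.h>

#define NUMA_NO_NODE (-1)
#define NR(a) ((int)(sizeof(a) / sizeof((a)[0])))

/* Hypothetical, simplified stand-in for struct numa_memblk. */
struct memblk {
	uint64_t start;	/* inclusive */
	uint64_t end;	/* exclusive */
	int nid;
};

/* Simplified model of meminfo_to_nid(): the first block containing addr wins. */
static int lookup_nid(const struct memblk *mi, int nr, uint64_t addr)
{
	int i;

	for (i = 0; i < nr; i++)
		if (mi[i].start <= addr && addr < mi[i].end)
			return mi[i].nid;
	return NUMA_NO_NODE;
}

/* Example layout: merged online ranges plus the CFMW as a reserved range. */
static const struct memblk meminfo[] = {
	{ 0x00000000, 0x60000000, 0 },
	{ 0x60000000, 0x80000000, 1 },
};
static const struct memblk reserved_meminfo[] = {
	{ 0x20000000, 0x30000000, 2 },	/* CFMW (assumed node2) */
};

/* Old rule: any online match wins; reserved ranges are only a fallback. */
static int target_node_old(uint64_t addr)
{
	int nid = lookup_nid(meminfo, NR(meminfo), addr);

	if (nid != NUMA_NO_NODE)
		return nid;
	return lookup_nid(reserved_meminfo, NR(reserved_meminfo), addr);
}

/* New rule: a reserved-range match takes precedence over an online match. */
static int target_node_new(uint64_t addr)
{
	int nid = lookup_nid(meminfo, NR(meminfo), addr);
	int reserved_nid = lookup_nid(reserved_meminfo, NR(reserved_meminfo), addr);

	if (nid != NUMA_NO_NODE && reserved_nid == NUMA_NO_NODE)
		return nid;
	return reserved_nid;
}
```

For 0x20000000 the old rule returns node0 and the new rule returns node2. Addresses outside the CFMW resolve exactly as before, because reserved_nid is NUMA_NO_NODE for them.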

While this issue has only been observed in a QEMU configuration and no
end users are known to be affected, it is likely that some firmware
implementation places a CXL Fixed Memory Window in a hole in the system
memory map. CXL hotplug depends on mapping free window capacity, and it
appears to be only a coincidence that this problem has not been hit yet.

Fixes: 779dd20cfb56 ("cxl/region: Add region creation support")
Signed-off-by: Cui Chao <cuichao1753@phytium.com.cn>
Cc: stable@vger.kernel.org
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Gregory Price <gourry@gourry.net>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://patch.msgid.link/20260213060347.2389818-2-cuichao1753@phytium.com.cn
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>

Authored by Cui Chao and committed by Mike Rapoport (Microsoft)
f043a93f 05f7e89a

+5 -4
mm/numa_memblks.c
@@ -570,15 +570,16 @@
 int phys_to_target_node(u64 start)
 {
 	int nid = meminfo_to_nid(&numa_meminfo, start);
+	int reserved_nid = meminfo_to_nid(&numa_reserved_meminfo, start);
 
 	/*
-	 * Prefer online nodes, but if reserved memory might be
-	 * hot-added continue the search with reserved ranges.
+	 * Prefer online nodes unless the address is also described
+	 * by reserved ranges, in which case use the reserved nid.
 	 */
-	if (nid != NUMA_NO_NODE)
+	if (nid != NUMA_NO_NODE && reserved_nid == NUMA_NO_NODE)
 		return nid;
 
-	return meminfo_to_nid(&numa_reserved_meminfo, start);
+	return reserved_nid;
 }
 EXPORT_SYMBOL_GPL(phys_to_target_node);
 