Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

cxl, acpi/hmat: Update CXL access coordinates directly instead of through HMAT

In the current implementation, the CXL memory hotplug notifier is called
before the HMAT memory hotplug notifier. The CXL driver calculates the
access coordinates (bandwidth and latency values) for the CXL end-to-end
path (i.e. CPU to endpoint). When the CXL region is onlined, the CXL
memory hotplug notifier writes the access coordinates to the HMAT target
structs. Then the HMAT memory hotplug notifier is called and it creates
the access coordinates for the node sysfs attributes.

During testing on an Intel platform, it was found that although the
newly calculated coordinates were pushed to sysfs, the sysfs attributes for
the access coordinates showed up with the wrong initiator. The system has
4 nodes (0, 1, 2, 3) where nodes 0 and 1 are CPU nodes and nodes 2 and 3 are
CXL nodes. The expectation is that node 2 would show up as a target to node
0:
/sys/devices/system/node/node2/access0/initiators/node0

However it was observed that node 2 showed up as a target under node 1:
/sys/devices/system/node/node2/access0/initiators/node1

The original intent of the 'ext_updated' flag in the HMAT handling code was
to stop the HMAT memory hotplug callback from clobbering the access
coordinates after CXL has injected its calculated coordinates and replaced
the generic target access coordinates provided by the HMAT table in the HMAT
target structs. However, the flag is hacky at best and blocks the updates
from other CXL regions that are onlined in the same node later on. Remove
the 'ext_updated' flag usage and just update the access coordinates for the
nodes directly without touching HMAT target data.

The hotplug memory callback ordering is changed. Rather than lowering the
CXL priority, which would leave CXL sharing the same level as
SLAB_CALLBACK_PRI, the HMAT priority is raised above CXL so that there is
room between the levels. The change results in the CXL callback being
executed after the HMAT callback.

With the change, the CXL hotplug memory notifier runs after the HMAT
callback. The HMAT callback will create the node sysfs attributes for
access coordinates. The CXL callback will write the access coordinates to
the now created node sysfs attributes directly and will not pollute the
HMAT target values.
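
The ordering described above can be sketched in userspace C. This is only an
illustration, assuming the usual notifier chain rule that a higher priority
value runs first; 'struct demo_notifier', 'struct demo_chain' and the demo_*
helpers are hypothetical names and are not part of the kernel. Only the two
priority values are taken from the include/linux/memory.h hunk in this commit.

```c
#include <assert.h>
#include <stddef.h>

/* Priority values as they appear in include/linux/memory.h after this change. */
#define CXL_CALLBACK_PRI	5
#define HMAT_CALLBACK_PRI	6

#define DEMO_MAX_NOTIFIERS	8

/* Hypothetical stand-in for a notifier block: just a name and a priority. */
struct demo_notifier {
	const char *name;
	int priority;
};

/* Hypothetical stand-in for a notifier chain, kept sorted by priority. */
struct demo_chain {
	struct demo_notifier entries[DEMO_MAX_NOTIFIERS];
	size_t count;
};

/*
 * Insert a callback so that higher priorities sit nearer the head of the
 * chain. Entries with equal priority keep their registration order.
 */
static void demo_chain_register(struct demo_chain *chain, const char *name,
				int priority)
{
	size_t pos = 0;

	assert(chain->count < DEMO_MAX_NOTIFIERS);

	/* Skip every entry that must run before the new one. */
	while (pos < chain->count && chain->entries[pos].priority >= priority)
		pos++;

	/* Shift the tail down by one slot to make room. */
	for (size_t i = chain->count; i > pos; i--)
		chain->entries[i] = chain->entries[i - 1];

	chain->entries[pos].name = name;
	chain->entries[pos].priority = priority;
	chain->count++;
}

/* Name of the callback that would run at position idx, or NULL if none. */
static const char *demo_chain_nth(const struct demo_chain *chain, size_t idx)
{
	return idx < chain->count ? chain->entries[idx].name : NULL;
}
```

Registering "cxl" with CXL_CALLBACK_PRI and "hmat" with HMAT_CALLBACK_PRI, in
either order, leaves "hmat" at position 0 and "cxl" at position 1, which is
the HMAT-then-CXL sequence this change relies on.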

A nodemask is introduced to keep track of whether a node has been updated
and to prevent further updates.
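
The "seen" check can be sketched in userspace C as follows. This is a
simplified model under stated assumptions: 'demo_nodemask_t' and the demo_*
helpers are hypothetical stand-ins for nodemask_t and node_test_and_set(),
the mask is limited to 64 nodes, and the bit update here is not atomic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MAX_NODES	64

/* Hypothetical stand-in for nodemask_t: one bit per NUMA node. */
typedef struct {
	uint64_t bits;
} demo_nodemask_t;

/*
 * Set the bit for nid and report whether it was already set.
 * Unlike a real test-and-set primitive, this sketch is not atomic.
 */
static bool demo_node_test_and_set(int nid, demo_nodemask_t *mask)
{
	uint64_t bit;
	bool was_set;

	assert(nid >= 0 && nid < DEMO_MAX_NODES);

	bit = UINT64_C(1) << nid;
	was_set = (mask->bits & bit) != 0;
	mask->bits |= bit;

	return was_set;
}

/*
 * Returns true only the first time a node is offered, mirroring the
 * "no action needed if node bit already set" early return.
 */
static bool demo_should_update_node(int nid, demo_nodemask_t *seen)
{
	return !demo_node_test_and_set(nid, seen);
}
```

With an empty mask, the first call for node 2 returns true and any later call
for node 2 returns false, while node 3 is still treated as new.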

Fixes: 067353a46d8c ("cxl/region: Add memory hotplug notifier for cxl region")
Cc: stable@vger.kernel.org
Tested-by: Marc Herbert <marc.herbert@linux.intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://patch.msgid.link/20250829222907.1290912-4-dave.jiang@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>

+13 -21
-6
drivers/acpi/numa/hmat.c
···
 	struct node_cache_attrs cache_attrs;
 	u8 gen_port_device_handle[ACPI_SRAT_DEVICE_HANDLE_SIZE];
 	bool registered;
-	bool ext_updated;	/* externally updated */
 };
 
 struct memory_initiator {
···
 				  coord->read_bandwidth, access);
 	hmat_update_target_access(target, ACPI_HMAT_WRITE_BANDWIDTH,
 				  coord->write_bandwidth, access);
-	target->ext_updated = true;
 
 	return 0;
 }
···
 	struct memory_locality *loc = NULL;
 	u32 best = 0;
 	int i;
-
-	/* Don't update if an external agent has changed the data. */
-	if (target->ext_updated)
-		return;
 
 	/* Don't update for generic port if there's no device handle */
 	if ((access == NODE_ACCESS_CLASS_GENPORT_SINK_LOCAL ||
-5
drivers/cxl/core/cdat.c
···
 {
 	return hmat_update_target_coordinates(nid, &cxlr->coord[access], access);
 }
-
-bool cxl_need_node_perf_attrs_update(int nid)
-{
-	return !acpi_node_backed_by_real_pxm(nid);
-}
-1
drivers/cxl/core/core.h
···
 int cxl_pci_get_bandwidth(struct pci_dev *pdev, struct access_coordinate *c);
 int cxl_update_hmat_access_coordinates(int nid, struct cxl_region *cxlr,
 				       enum access_coordinate_class access);
-bool cxl_need_node_perf_attrs_update(int nid);
 int cxl_port_get_switch_dport_bandwidth(struct cxl_port *port,
 					struct access_coordinate *c);
 
+12 -8
drivers/cxl/core/region.c
···
  * 3. Decoder targets
  */
 
+/*
+ * nodemask that sets per node when the access_coordinates for the node has
+ * been updated by the CXL memory hotplug notifier.
+ */
+static nodemask_t nodemask_region_seen = NODE_MASK_NONE;
+
 static struct cxl_region *to_cxl_region(struct device *dev);
 
 #define __ACCESS_ATTR_RO(_level, _name) { \
···
 
 	for (int i = 0; i < ACCESS_COORDINATE_MAX; i++) {
 		if (cxlr->coord[i].read_bandwidth) {
-			rc = 0;
-			if (cxl_need_node_perf_attrs_update(nid))
-				node_set_perf_attrs(nid, &cxlr->coord[i], i);
-			else
-				rc = cxl_update_hmat_access_coordinates(nid, cxlr, i);
-
-			if (rc == 0)
-				cset++;
+			node_update_perf_attrs(nid, &cxlr->coord[i], i);
+			cset++;
 		}
 	}
 
···
 	 */
 	region_nid = phys_to_target_node(cxlr->params.res->start);
 	if (nid != region_nid)
+		return NOTIFY_DONE;
+
+	/* No action needed if node bit already set */
+	if (node_test_and_set(nid, nodemask_region_seen))
 		return NOTIFY_DONE;
 
 	if (!cxl_region_update_coordinates(cxlr, nid))
+1 -1
include/linux/memory.h
···
  */
 #define DEFAULT_CALLBACK_PRI		0
 #define SLAB_CALLBACK_PRI		1
-#define HMAT_CALLBACK_PRI		2
 #define CXL_CALLBACK_PRI		5
+#define HMAT_CALLBACK_PRI		6
 #define MM_COMPUTE_BATCH_PRI		10
 #define CPUSET_CALLBACK_PRI		10
 #define MEMTIER_HOTPLUG_PRI		100