Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'cxl-for-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl

Pull Compute Express Link (CXL) updates from Dave Jiang:

- Remove always true condition in cxl features code

- Add verification of CHBS length for CXL 2.0

- Ignore interleave granularity when interleave ways is 1

- Add an update addressing the missing MODULE_DESCRIPTION for cxl_test

- A series of cleanups/refactor to prep for AMD Zen5 translate code

- Clean %pa debug printk in core/hdm.c

- Documentation updates:
- Update to CXL Maturity Map
- Fixes to source linking in CXL documentation
- CXL documentation fixes, spelling corrections
- A large collection of CXL documentation for the entire CXL
subsystem, including documentation on CXL related platform and
firmware notes

- Remove redundant code of cxlctl_get_supported_features()

- Series to support CXL RAS Features
- Including "Patrol Scrub Control", "Error Check Scrub",
"Performance Maintenance" and "Memory Sparing". The series
connects CXL to EDAC.

* tag 'cxl-for-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (53 commits)
cxl/edac: Add CXL memory device soft PPR control feature
cxl/edac: Add CXL memory device memory sparing control feature
cxl/edac: Support for finding memory operation attributes from the current boot
cxl/edac: Add support for PERFORM_MAINTENANCE command
cxl/edac: Add CXL memory device ECS control feature
cxl/edac: Add CXL memory device patrol scrub control feature
cxl: Update prototype of function get_support_feature_info()
EDAC: Update documentation for the CXL memory patrol scrub control feature
cxl/features: Remove the inline specifier from to_cxlfs()
cxl/feature: Remove redundant code of get supported features
docs: ABI: Fix "firwmare" to "firmware"
cxl/Documentation: Fix typo in sysfs write_bandwidth attribute path
cxl: doc/linux/access-coordinates Update access coordinates calculation methods
cxl: docs/platform/acpi/srat Add generic target documentation
cxl: docs/platform/cdat reference documentation
Documentation: Update the CXL Maturity Map
cxl: Sync up the driver-api/cxl documentation
cxl: docs - add self-referencing cross-links
cxl: docs/allocation/hugepages
cxl: docs/allocation/reclaim
...

+6770 -267
+2 -2
Documentation/ABI/testing/sysfs-bus-cxl
@@ -242,7 +242,7 @@
 		decoding a Host Physical Address range. Note that this number
 		may be elevated without any regionX objects active or even
 		enumerated, as this may be due to decoders established by
-		platform firwmare or a previous kernel (kexec).
+		platform firmware or a previous kernel (kexec).
 
 
 What:		/sys/bus/cxl/devices/decoderX.Y
@@ -572,7 +572,7 @@
 
 
 What:		/sys/bus/cxl/devices/regionZ/accessY/read_bandwidth
-		/sys/bus/cxl/devices/regionZ/accessY/write_banwidth
+		/sys/bus/cxl/devices/regionZ/accessY/write_bandwidth
 Date:		Jan, 2024
 KernelVersion:	v6.9
 Contact:	linux-cxl@vger.kernel.org
-91
Documentation/driver-api/cxl/access-coordinates.rst
@@ -1,91 +0,0 @@
-.. SPDX-License-Identifier: GPL-2.0
-.. include:: <isonum.txt>
-
-==================================
-CXL Access Coordinates Computation
-==================================
-
-Shared Upstream Link Calculation
-================================
-For certain CXL region construction with endpoints behind CXL switches (SW) or
-Root Ports (RP), there is the possibility of the total bandwidth for all
-the endpoints behind a switch being more than the switch upstream link.
-A similar situation can occur within the host, upstream of the root ports.
-The CXL driver performs an additional pass after all the targets have
-arrived for a region in order to recalculate the bandwidths with possible
-upstream link being a limiting factor in mind.
-
-The algorithm assumes the configuration is a symmetric topology as that
-maximizes performance. When asymmetric topology is detected, the calculation
-is aborted. An asymmetric topology is detected during topology walk where the
-number of RPs detected as a grandparent is not equal to the number of devices
-iterated in the same iteration loop. The assumption is made that subtle
-asymmetry in properties does not happen and all paths to EPs are equal.
-
-There can be multiple switches under an RP. There can be multiple RPs under
-a CXL Host Bridge (HB). There can be multiple HBs under a CXL Fixed Memory
-Window Structure (CFMWS).
-
-An example hierarchy:
-
->                CFMWS 0
->                  |
->         _________|_________
->        |                   |
->    ACPI0017-0          ACPI0017-1
-> GP0/HB0/ACPI0016-0   GP1/HB1/ACPI0016-1
->    |          |        |           |
->   RP0        RP1      RP2         RP3
->    |          |        |           |
->  SW 0       SW 1     SW 2        SW 3
->  |   |      |   |    |   |       |   |
-> EP0 EP1    EP2 EP3  EP4  EP5    EP6 EP7
-
-Computation for the example hierarchy:
-
-Min (GP0 to CPU BW,
-     Min(SW 0 Upstream Link to RP0 BW,
-         Min(SW0SSLBIS for SW0DSP0 (EP0), EP0 DSLBIS, EP0 Upstream Link) +
-         Min(SW0SSLBIS for SW0DSP1 (EP1), EP1 DSLBIS, EP1 Upstream link)) +
-     Min(SW 1 Upstream Link to RP1 BW,
-         Min(SW1SSLBIS for SW1DSP0 (EP2), EP2 DSLBIS, EP2 Upstream Link) +
-         Min(SW1SSLBIS for SW1DSP1 (EP3), EP3 DSLBIS, EP3 Upstream link))) +
-Min (GP1 to CPU BW,
-     Min(SW 2 Upstream Link to RP2 BW,
-         Min(SW2SSLBIS for SW2DSP0 (EP4), EP4 DSLBIS, EP4 Upstream Link) +
-         Min(SW2SSLBIS for SW2DSP1 (EP5), EP5 DSLBIS, EP5 Upstream link)) +
-     Min(SW 3 Upstream Link to RP3 BW,
-         Min(SW3SSLBIS for SW3DSP0 (EP6), EP6 DSLBIS, EP6 Upstream Link) +
-         Min(SW3SSLBIS for SW3DSP1 (EP7), EP7 DSLBIS, EP7 Upstream link))))
-
-The calculation starts at cxl_region_shared_upstream_perf_update(). A xarray
-is created to collect all the endpoint bandwidths via the
-cxl_endpoint_gather_bandwidth() function. The min() of bandwidth from the
-endpoint CDAT and the upstream link bandwidth is calculated. If the endpoint
-has a CXL switch as a parent, then min() of calculated bandwidth and the
-bandwidth from the SSLBIS for the switch downstream port that is associated
-with the endpoint is calculated. The final bandwidth is stored in a
-'struct cxl_perf_ctx' in the xarray indexed by a device pointer. If the
-endpoint is direct attached to a root port (RP), the device pointer would be an
-RP device. If the endpoint is behind a switch, the device pointer would be the
-upstream device of the parent switch.
-
-At the next stage, the code walks through one or more switches if they exist
-in the topology. For endpoints directly attached to RPs, this step is skipped.
-If there is another switch upstream, the code takes the min() of the current
-gathered bandwidth and the upstream link bandwidth. If there's a switch
-upstream, then the SSLBIS of the upstream switch.
-
-Once the topology walk reaches the RP, whether it's direct attached endpoints
-or walking through the switch(es), cxl_rp_gather_bandwidth() is called. At
-this point all the bandwidths are aggregated per each host bridge, which is
-also the index for the resulting xarray.
-
-The next step is to take the min() of the per host bridge bandwidth and the
-bandwidth from the Generic Port (GP). The bandwidths for the GP is retrieved
-via ACPI tables SRAT/HMAT. The min bandwidth are aggregated under the same
-ACPI0017 device to form a new xarray.
-
-Finally, the cxl_region_update_bandwidth() is called and the aggregated
-bandwidth from all the members of the last xarray is updated for the
-access coordinates residing in the cxl region (cxlr) context.
+60
Documentation/driver-api/cxl/allocation/dax.rst
@@ -0,0 +1,60 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===========
+DAX Devices
+===========
+CXL capacity exposed as a DAX device can be accessed directly via mmap.
+Users may wish to use this interface mechanism to write their own userland
+CXL allocator, or to manage shared or persistent memory regions across multiple
+hosts.
+
+If the capacity is shared across hosts or persistent, appropriate flushing
+mechanisms must be employed unless the region supports Snoop Back-Invalidate.
+
+Note that mappings must be aligned (size and base) to the dax device's base
+alignment, which is typically 2MB - but may be configured larger.
+
+::
+
+    #include <stdio.h>
+    #include <stdlib.h>
+    #include <stdint.h>
+    #include <sys/mman.h>
+    #include <fcntl.h>
+    #include <unistd.h>
+
+    #define DEVICE_PATH "/dev/dax0.0" // Replace DAX device path
+    #define DEVICE_SIZE (4ULL * 1024 * 1024 * 1024) // 4GB
+
+    int main(void) {
+        int fd;
+        void* mapped_addr;
+
+        /* Open the DAX device */
+        fd = open(DEVICE_PATH, O_RDWR);
+        if (fd < 0) {
+            perror("open");
+            return -1;
+        }
+
+        /* Map the device into memory */
+        mapped_addr = mmap(NULL, DEVICE_SIZE, PROT_READ | PROT_WRITE,
+                           MAP_SHARED, fd, 0);
+        if (mapped_addr == MAP_FAILED) {
+            perror("mmap");
+            close(fd);
+            return -1;
+        }
+
+        printf("Mapped address: %p\n", mapped_addr);
+
+        /* You can now access the device through the mapped address */
+        uint64_t* ptr = (uint64_t*)mapped_addr;
+        *ptr = 0x1234567890abcdef; // Write a value to the device
+        printf("Value at address %p: 0x%016llx\n", (void*)ptr, (unsigned long long)*ptr);
+
+        /* Clean up */
+        munmap(mapped_addr, DEVICE_SIZE);
+        close(fd);
+        return 0;
+    }
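The alignment note in dax.rst can be made concrete with a small sketch. The helpers below are hypothetical (they are not part of this commit or of any kernel or libc API) and assume the alignment is a power of two, such as the typical 2MB value the text mentions. They show how a caller might round a requested mapping size to the device alignment before calling mmap.

```c
#include <assert.h>
#include <stdint.h>

/* Typical DAX alignment named in the text; the real value is device-specific. */
#define DAX_ALIGN_2M (2ULL * 1024 * 1024)

/* Return nonzero if align is a nonzero power of two. */
static int is_pow2(uint64_t align)
{
	return align != 0 && (align & (align - 1)) == 0;
}

/* Hypothetical helper: round len up to the next multiple of align. */
static uint64_t dax_align_up(uint64_t len, uint64_t align)
{
	assert(is_pow2(align));
	return (len + align - 1) & ~(align - 1);
}

/* Hypothetical helper: round len down to a multiple of align. */
static uint64_t dax_align_down(uint64_t len, uint64_t align)
{
	assert(is_pow2(align));
	return len & ~(align - 1);
}

/* Hypothetical helper: nonzero if value (a size or an offset) is aligned. */
static int dax_is_aligned(uint64_t value, uint64_t align)
{
	assert(is_pow2(align));
	return (value & (align - 1)) == 0;
}
```

A caller would typically check `dax_is_aligned()` on both the length and the offset passed to mmap, and round with `dax_align_down()` when the request must not exceed the device size. The round-up helper does not guard against overflow for lengths near the top of the 64-bit range.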
+32
Documentation/driver-api/cxl/allocation/hugepages.rst
@@ -0,0 +1,32 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==========
+Huge Pages
+==========
+
+Contiguous Memory Allocator
+===========================
+CXL Memory onlined as SystemRAM during early boot is eligible for use by CMA,
+as the NUMA node hosting that capacity will be `Online` at the time CMA
+carves out contiguous capacity.
+
+CXL Memory deferred to the CXL Driver for configuration cannot have its
+capacity allocated by CMA - as the NUMA node hosting the capacity is `Offline`
+at :code:`__init` time - when CMA carves out contiguous capacity.
+
+HugeTLB
+=======
+Different huge page sizes allow different memory configurations.
+
+2MB Huge Pages
+--------------
+All CXL capacity regardless of configuration time or memory zone is eligible
+for use as 2MB huge pages.
+
+1GB Huge Pages
+--------------
+CXL capacity onlined in :code:`ZONE_NORMAL` is eligible for 1GB Gigantic Page
+allocation.
+
+CXL capacity onlined in :code:`ZONE_MOVABLE` is not eligible for 1GB Gigantic
+Page allocation.
+85
Documentation/driver-api/cxl/allocation/page-allocator.rst
@@ -0,0 +1,85 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==================
+The Page Allocator
+==================
+
+The kernel page allocator services all general page allocation requests, such
+as :code:`kmalloc`. CXL configuration steps affect the behavior of the page
+allocator based on the selected `Memory Zone` and `NUMA node` the capacity is
+placed in.
+
+This section mostly focuses on how these configurations affect the page
+allocator (as of Linux v6.15) rather than the overall page allocator behavior.
+
+NUMA nodes and mempolicy
+========================
+Unless a task explicitly registers a mempolicy, the default memory policy
+of the Linux kernel is to allocate memory from the `local NUMA node` first,
+and fall back to other nodes only if the local node is pressured.
+
+Generally, we expect to see local DRAM and CXL memory on separate NUMA nodes,
+with the CXL memory being non-local. Technically, however, it is possible
+for a compute node to have no local DRAM, and for CXL memory to be the
+`local` capacity for that compute node.
+
+
+Memory Zones
+============
+CXL capacity may be onlined in :code:`ZONE_NORMAL` or :code:`ZONE_MOVABLE`.
+
+As of v6.15, the page allocator attempts to allocate from the highest
+available and compatible ZONE for an allocation from the local node first.
+
+An example of a `zone incompatibility` is attempting to service an allocation
+marked :code:`GFP_KERNEL` from :code:`ZONE_MOVABLE`. Kernel allocations are
+typically not migratable, and as a result can only be serviced from
+:code:`ZONE_NORMAL` or lower.
+
+To simplify this, the page allocator will prefer :code:`ZONE_MOVABLE` over
+:code:`ZONE_NORMAL` by default, but if :code:`ZONE_MOVABLE` is depleted, it
+will fall back to allocating from :code:`ZONE_NORMAL`.
+
+
+Zone and Node Quirks
+====================
+Let's consider a configuration where the local DRAM capacity is largely onlined
+into :code:`ZONE_NORMAL`, with no :code:`ZONE_MOVABLE` capacity present. The
+CXL capacity has the opposite configuration - all onlined in
+:code:`ZONE_MOVABLE`.
+
+Under the default allocation policy, the page allocator will completely skip
+:code:`ZONE_MOVABLE` as a valid allocation target. This is because, as of
+Linux v6.15, the page allocator does (approximately) the following: ::
+
+  for (each zone in local_node):
+
+    for (each node in fallback_order):
+
+      attempt_allocation(gfp_flags);
+
+Because the local node does not have :code:`ZONE_MOVABLE`, the CXL node is
+functionally unreachable for direct allocation. As a result, the only way
+for CXL capacity to be used is via `demotion` in the reclaim path.
+
+This configuration also means that if the DRAM node has :code:`ZONE_MOVABLE`
+capacity - when that capacity is depleted, the page allocator will actually
+prefer CXL :code:`ZONE_MOVABLE` pages over DRAM :code:`ZONE_NORMAL` pages.
+
+We may wish to invert this priority in future Linux versions.
+
+If `demotion` and `swap` are disabled, Linux will begin to cause OOM crashes
+when the DRAM nodes are depleted. See the reclaim section for more details.
+
+
+CGroups and CPUSets
+===================
+Finally, assuming CXL memory is reachable via the page allocator (i.e. onlined
+in :code:`ZONE_NORMAL`), the :code:`cpusets.mems_allowed` may be used by
+containers to limit the accessibility of certain NUMA nodes for tasks in that
+container. Users may wish to utilize this in multi-tenant systems where some
+tasks prefer not to use slower memory.
+
+In the reclaim section we'll discuss some limitations of this interface to
+prevent demotions of shared data to CXL memory (if demotions are enabled).
+
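The "Zone and Node Quirks" section of page-allocator.rst describes an iteration order that is easier to follow as a toy model. The sketch below is not kernel code: every name in it (`toy_system`, `toy_alloc_page`, the two-node layout) is hypothetical, and it only mirrors the approximate pseudocode quoted in the document, where the outer loop walks the zones of the local node and the inner loop walks nodes in fallback order.

```c
#include <assert.h>

enum toy_zone { TOY_ZONE_NORMAL, TOY_ZONE_MOVABLE, TOY_NR_ZONES };

#define TOY_NR_NODES 2	/* assumption: node 0 is local DRAM, node 1 is CXL */

struct toy_system {
	/* present[n][z]: nonzero if node n has zone z at all */
	int present[TOY_NR_NODES][TOY_NR_ZONES];
	/* free_pages[n][z]: free pages left in zone z of node n */
	long free_pages[TOY_NR_NODES][TOY_NR_ZONES];
};

/*
 * Allocate one page on behalf of a task running on local_node.
 * movable is nonzero when the allocation may live in ZONE_MOVABLE
 * (a GFP_KERNEL-like request would pass 0).
 * Returns the node that serviced the request, or -1 on failure.
 */
static int toy_alloc_page(struct toy_system *sys, int local_node, int movable)
{
	int zone, i;

	assert(local_node >= 0 && local_node < TOY_NR_NODES);

	/* Outer loop: highest compatible zone first, local node's zones only. */
	for (zone = movable ? TOY_ZONE_MOVABLE : TOY_ZONE_NORMAL; zone >= 0; zone--) {
		if (!sys->present[local_node][zone])
			continue;	/* zone missing locally: never tried elsewhere */

		/* Inner loop: local node first, then the remaining nodes. */
		for (i = 0; i < TOY_NR_NODES; i++) {
			int node = (local_node + i) % TOY_NR_NODES;

			if (sys->free_pages[node][zone] > 0) {
				sys->free_pages[node][zone]--;
				return node;
			}
		}
	}
	return -1;
}
```

With DRAM entirely in ZONE_NORMAL and CXL entirely in ZONE_MOVABLE, the model never reaches the CXL node and fails once DRAM is empty, which matches the "functionally unreachable" behavior in the text. If the DRAM node has a depleted ZONE_MOVABLE, the model picks CXL ZONE_MOVABLE before DRAM ZONE_NORMAL, matching the second quirk.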
+51
Documentation/driver-api/cxl/allocation/reclaim.rst
@@ -0,0 +1,51 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=======
+Reclaim
+=======
+Another way CXL memory can be utilized *indirectly* is via the reclaim system
+in :code:`mm/vmscan.c`. Reclaim is engaged when memory capacity on the system
+becomes pressured based on global and cgroup-local `watermark` settings.
+
+In this section we won't discuss the `watermark` configurations, just how CXL
+memory can be consumed by various pieces of the reclaim system.
+
+Demotion
+========
+By default, the reclaim system will prefer swap (or zswap) when reclaiming
+memory. Enabling :code:`kernel/mm/numa/demotion_enabled` will cause vmscan
+to opportunistically prefer distant NUMA nodes to swap or zswap, if capacity
+is available.
+
+Demotion engages the :code:`mm/memory_tier.c` component to determine the
+next demotion node. The next demotion node is based on the :code:`HMAT`
+or :code:`CDAT` performance data.
+
+cpusets.mems_allowed quirk
+--------------------------
+In Linux v6.15 and below, demotion does not respect :code:`cpusets.mems_allowed`
+when migrating pages. As a result, if demotion is enabled, vmscan cannot
+guarantee isolation of a container's memory from nodes not set in mems_allowed.
+
+In Linux v6.XX and up, demotion does attempt to respect
+:code:`cpusets.mems_allowed`; however, certain classes of shared memory
+originally instantiated by another cgroup (such as common libraries - e.g.
+libc) may still be demoted. As a result, the mems_allowed interface still
+cannot provide perfect isolation from the remote nodes.
+
+ZSwap and Node Preference
+=========================
+In Linux v6.15 and below, ZSwap allocates memory from the local node of the
+processor for the new pages being compressed. Since pages being compressed
+are typically cold, the result is that a cold page is promoted - only to
+be later demoted as it ages off the LRU.
+
+In Linux v6.XX, ZSwap tries to prefer the node of the page being compressed
+as the allocation target for the compression page. This helps prevent
+thrashing.
+
+Demotion with ZSwap
+===================
+When enabling both Demotion and ZSwap, you create a situation where ZSwap
+will prefer the slowest form of CXL memory by default until that tier of
+memory is exhausted.
+165
Documentation/driver-api/cxl/devices/device-types.rst
@@ -0,0 +1,165 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=====================
+Devices and Protocols
+=====================
+
+The type of CXL device (Memory, Accelerator, etc.) dictates many configuration steps. This section
+covers some basic background on device types and on-device resources used by the platform and OS
+which impact configuration.
+
+Protocols
+=========
+
+There are three core protocols to CXL. For the purpose of this documentation,
+we will only discuss very high level definitions as the specific hardware
+details are largely abstracted away from Linux. See the CXL specification
+for more details.
+
+CXL.io
+------
+The basic interaction protocol, similar to PCIe configuration mechanisms.
+Typically used for initialization, configuration, and I/O access for anything
+other than memory (CXL.mem) or cache (CXL.cache) operations.
+
+The Linux CXL driver exposes access to .io functionality via the various sysfs
+interfaces and /dev/cxl/ devices (which expose direct access to device
+mailboxes).
+
+CXL.cache
+---------
+The mechanism by which a device may coherently access and cache host memory.
+
+Largely transparent to Linux once configured.
+
+CXL.mem
+---------
+The mechanism by which the CPU may coherently access and cache device memory.
+
+Largely transparent to Linux once configured.
+
+
+Device Types
+============
+
+Type-1
+------
+
+A Type-1 CXL device:
+
+* Supports cxl.io and cxl.cache protocols
+* Implements a fully coherent cache
+* Allows Device-to-Host coherence and Host-to-Device snoops.
+* Does NOT have host-managed device memory (HDM)
+
+A typical example of a type-1 device is a Smart NIC - which may want to
+directly operate on host-memory (DMA) to store incoming packets. These
+devices largely rely on CPU-attached memory.
+
+Type-2
+------
+
+A Type-2 CXL Device:
+
+* Supports cxl.io, cxl.cache, and cxl.mem protocols
+* Optionally implements coherent cache and Host-Managed Device Memory
+* Is typically an accelerator device w/ high bandwidth memory.
+
+The primary difference between a type-1 and type-2 device is the presence
+of host-managed device memory, which allows the device to operate on a
+local memory bank - while the CPU still has coherent DMA to the same memory.
+
+This allows things like GPUs to expose their memory via DAX devices or file
+descriptors, allowing drivers and programs direct access to device memory
+rather than using block-transfer semantics.
+
+Type-3
+------
+
+A Type-3 CXL Device:
+
+* Supports cxl.io and cxl.mem
+* Implements Host-Managed Device Memory
+* May provide either Volatile or Persistent memory capacity (or both).
+
+A basic example of a type-3 device is a simple memory expander, whose
+local memory capacity is exposed to the CPU for access directly via
+basic coherent DMA.
+
+Switch
+------
+
+A CXL switch is a device capable of routing any CXL (and by extension, PCIe)
+protocol between upstream, downstream, or peer devices. Many devices, such
+as Multi-Logical Devices, imply the presence of switching in some manner.
+
+Logical Devices and Heads
+-------------------------
+
+A CXL device may present one or more "Logical Devices" to one or more hosts
+(via physical "Heads").
+
+A Single-Logical Device (SLD) is a device which presents a single device to
+one or more heads.
+
+A Multi-Logical Device (MLD) is a device which may present multiple devices
+to one or more heads.
+
+A Single-Headed Device exposes only a single physical connection.
+
+A Multi-Headed Device exposes multiple physical connections.
+
+MHSLD
+~~~~~
+A Multi-Headed Single-Logical Device (MHSLD) exposes a single logical
+device to multiple heads which may be connected to one or more discrete
+hosts. An example of this would be a simple memory-pool which may be
+statically configured (prior to boot) to expose portions of its memory
+to Linux via :doc:`CEDT <../platform/acpi/cedt>`.
+
+MHMLD
+~~~~~
+A Multi-Headed Multi-Logical Device (MHMLD) exposes multiple logical
+devices to multiple heads which may be connected to one or more discrete
+hosts. An example of this would be a Dynamic Capacity Device which
+may be configured at runtime to expose portions of its memory to Linux.
+
+Example Devices
+===============
+
+Memory Expander
+---------------
+The simplest form of Type-3 device is a memory expander. A memory expander
+exposes Host-Managed Device Memory (HDM) to Linux. This memory may be
+Volatile or Non-Volatile (Persistent).
+
+Memory Expanders will typically be considered a form of Single-Headed,
+Single-Logical Device - as its form factor will typically be an add-in-card
+(AIC) or some other similar form-factor.
+
+The Linux CXL driver provides support for static or dynamic configuration of
+basic memory expanders. The platform may program decoders prior to OS init
+(e.g. auto-decoders), or the user may program the fabric if the platform
+defers these operations to the OS.
+
+Multiple Memory Expanders may be added to an external chassis and exposed to
+a host via a head attached to a CXL switch. This is a "memory pool", and
+would be considered an MHSLD or MHMLD depending on the management capabilities
+provided by the switch platform.
+
+As of v6.14, Linux does not provide a formalized interface to manage non-DCD
+MHSLD or MHMLD devices.
+
+Dynamic Capacity Device (DCD)
+-----------------------------
+
+A Dynamic Capacity Device is a Type-3 device which provides dynamic management
+of memory capacity. The basic premise of a DCD is to provide an allocator-like
+interface for physical memory capacity to a "Fabric Manager" (an external,
+privileged host with privileges to change configurations for other hosts).
+
+A DCD manages "Memory Extents", which may be volatile or persistent. Extents
+may also be exclusive to a single host or shared across multiple hosts.
+
+As of v6.14, Linux does not provide a formalized interface to manage DCD
+devices; however, there is active work on LKML targeting a future release.
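The Type-1/Type-2/Type-3 bullet lists in device-types.rst amount to a mapping from supported protocols to device type. A minimal sketch of that mapping follows; the flag and enum names are hypothetical illustrations and do not correspond to definitions in the CXL driver.

```c
#include <assert.h>

/* Hypothetical protocol flags, one bit per CXL protocol. */
#define TOY_CXL_PROTO_IO	(1u << 0)
#define TOY_CXL_PROTO_CACHE	(1u << 1)
#define TOY_CXL_PROTO_MEM	(1u << 2)
#define TOY_CXL_PROTO_ALL \
	(TOY_CXL_PROTO_IO | TOY_CXL_PROTO_CACHE | TOY_CXL_PROTO_MEM)

enum toy_cxl_type {
	TOY_CXL_UNKNOWN = 0,
	TOY_CXL_TYPE1 = 1,	/* cxl.io + cxl.cache */
	TOY_CXL_TYPE2 = 2,	/* cxl.io + cxl.cache + cxl.mem */
	TOY_CXL_TYPE3 = 3,	/* cxl.io + cxl.mem */
};

/* Classify a device by the set of protocols it supports. */
static enum toy_cxl_type toy_cxl_classify(unsigned int protos)
{
	assert((protos & ~TOY_CXL_PROTO_ALL) == 0);

	/* Every device type in the text supports cxl.io. */
	if (!(protos & TOY_CXL_PROTO_IO))
		return TOY_CXL_UNKNOWN;

	switch (protos & (TOY_CXL_PROTO_CACHE | TOY_CXL_PROTO_MEM)) {
	case TOY_CXL_PROTO_CACHE:
		return TOY_CXL_TYPE1;
	case TOY_CXL_PROTO_CACHE | TOY_CXL_PROTO_MEM:
		return TOY_CXL_TYPE2;
	case TOY_CXL_PROTO_MEM:
		return TOY_CXL_TYPE3;
	default:
		return TOY_CXL_UNKNOWN;	/* cxl.io only: not one of the three types */
	}
}
```

The classification says nothing about logical devices or heads; an MHSLD and a simple memory expander would both classify as Type-3 here.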
+42 -4
Documentation/driver-api/cxl/index.rst
@@ -4,12 +4,50 @@
 Compute Express Link
 ====================
 
+CXL device configuration has a complex handoff between platform (Hardware,
+BIOS, EFI), OS (early boot, core kernel, driver), and user policy decisions
+that have impacts on each other. The docs here break up configuration steps.
+
 .. toctree::
-   :maxdepth: 1
+   :maxdepth: 2
+   :caption: Overview
 
-   memory-devices
-   access-coordinates
-
+   theory-of-operation
    maturity-map
+
+.. toctree::
+   :maxdepth: 2
+   :caption: Device Reference
+
+   devices/device-types
+
+.. toctree::
+   :maxdepth: 2
+   :caption: Platform Configuration
+
+   platform/bios-and-efi
+   platform/acpi
+   platform/cdat
+   platform/example-configs
+
+.. toctree::
+   :maxdepth: 2
+   :caption: Linux Kernel Configuration
+
+   linux/overview
+   linux/early-boot
+   linux/cxl-driver
+   linux/dax-driver
+   linux/memory-hotplug
+   linux/access-coordinates
+
+.. toctree::
+   :maxdepth: 2
+   :caption: Memory Allocation
+
+   allocation/dax
+   allocation/page-allocator
+   allocation/reclaim
+   allocation/hugepages.rst
 
 .. only:: subproject and html
+178
Documentation/driver-api/cxl/linux/access-coordinates.rst
··· 1 + .. SPDX-License-Identifier: GPL-2.0 2 + .. include:: <isonum.txt> 3 + 4 + ================================== 5 + CXL Access Coordinates Computation 6 + ================================== 7 + 8 + Latency and Bandwidth Calculation 9 + ================================= 10 + A memory region performance coordinates (latency and bandwidth) are typically 11 + provided via ACPI tables :doc:`SRAT <../platform/acpi/srat>` and 12 + :doc:`HMAT <../platform/acpi/hmat>`. However, the platform firmware (BIOS) is 13 + not able to annotate those for CXL devices that are hot-plugged since they do 14 + not exist during platform firmware initialization. The CXL driver can compute 15 + the performance coordinates by retrieving data from several components. 16 + 17 + The :doc:`SRAT <../platform/acpi/srat>` provides a Generic Port Affinity 18 + subtable that ties a proximity domain to a device handle, which in this case 19 + would be the CXL hostbridge. Using this association, the performance 20 + coordinates for the Generic Port can be retrieved from the 21 + :doc:`HMAT <../platform/acpi/hmat>` subtable. This piece represents the 22 + performance coordinates between a CPU and a Generic Port (CXL hostbridge). 23 + 24 + The :doc:`CDAT <../platform/cdat>` provides the performance coordinates for 25 + the CXL device itself. That is the bandwidth and latency to access that device's 26 + memory region. The DSMAS subtable provides a DSMADHandle that is tied to a 27 + Device Physical Address (DPA) range. The DSLBIS subtable provides the 28 + performance coordinates that's tied to a DSMADhandle and this ties the two 29 + table entries together to provide the performance coordinates for each DPA 30 + region. For example, if a device exports a DRAM region and a PMEM region, 31 + then there would be different performance characteristsics for each of those 32 + regions. 
33 + 34 + If there's a CXL switch in the topology, then the performance coordinates for the 35 + switch is provided by SSLBIS subtable. This provides the bandwidth and latency 36 + for traversing the switch between the switch upstream port and the switch 37 + downstream port that points to the endpoint device. 38 + 39 + Simple topology example:: 40 + 41 + GP0/HB0/ACPI0016-0 42 + RP0 43 + | 44 + | L0 45 + | 46 + SW 0 / USP0 47 + SW 0 / DSP0 48 + | 49 + | L1 50 + | 51 + EP0 52 + 53 + In this example, there is a CXL switch between an endpoint and a root port. 54 + Latency in this example is calculated as such: 55 + L(EP0) - Latency from EP0 CDAT DSMAS+DSLBIS 56 + L(L1) - Link latency between EP0 and SW0DSP0 57 + L(SW0) - Latency for the switch from SW0 CDAT SSLBIS. 58 + L(L0) - Link latency between SW0 and RP0 59 + L(RP0) - Latency from root port to CPU via SRAT and HMAT (Generic Port). 60 + Total read and write latencies are the sum of all these parts. 61 + 62 + Bandwidth in this example is calculated as such: 63 + B(EP0) - Bandwidth from EP0 CDAT DSMAS+DSLBIS 64 + B(L1) - Link bandwidth between EP0 and SW0DSP0 65 + B(SW0) - Bandwidth for the switch from SW0 CDAT SSLBIS. 66 + B(L0) - Link bandwidth between SW0 and RP0 67 + B(RP0) - Bandwidth from root port to CPU via SRAT and HMAT (Generic Port). 68 + The total read and write bandwidth is the min() of all these parts. 69 + 70 + To calculate the link bandwidth: 71 + LinkOperatingFrequency (GT/s) is the current negotiated link speed. 72 + DataRatePerLink (MB/s) = LinkOperatingFrequency / 8 73 + Bandwidth (MB/s) = PCIeCurrentLinkWidth * DataRatePerLink 74 + Where PCIeCurrentLinkWidth is the number of lanes in the link. 75 + 76 + To calculate the link latency: 77 + LinkLatency (picoseconds) = FlitSize / LinkBandwidth (MB/s) 78 + 79 + See `CXL Memory Device SW Guide r1.0 <https://www.intel.com/content/www/us/en/content-details/643805/cxl-memory-device-software-guide.html>`_, 80 + section 2.11.3 and 2.11.4 for details. 
81 + 82 + In the end, the access coordinates for a constructed memory region is calculated from one 83 + or more memory partitions from each of the CXL device(s). 84 + 85 + Shared Upstream Link Calculation 86 + ================================ 87 + For certain CXL region construction with endpoints behind CXL switches (SW) or 88 + Root Ports (RP), there is the possibility of the total bandwidth for all 89 + the endpoints behind a switch being more than the switch upstream link. 90 + A similar situation can occur within the host, upstream of the root ports. 91 + The CXL driver performs an additional pass after all the targets have 92 + arrived for a region in order to recalculate the bandwidths with possible 93 + upstream link being a limiting factor in mind. 94 + 95 + The algorithm assumes the configuration is a symmetric topology as that 96 + maximizes performance. When asymmetric topology is detected, the calculation 97 + is aborted. An asymmetric topology is detected during topology walk where the 98 + number of RPs detected as a grandparent is not equal to the number of devices 99 + iterated in the same iteration loop. The assumption is made that subtle 100 + asymmetry in properties does not happen and all paths to EPs are equal. 101 + 102 + There can be multiple switches under an RP. There can be multiple RPs under 103 + a CXL Host Bridge (HB). There can be multiple HBs under a CXL Fixed Memory 104 + Window Structure (CFMWS) in the :doc:`CEDT <../platform/acpi/cedt>`. 
105 + 106 + An example hierarchy:: 107 + 108 + CFMWS 0 109 + | 110 + _________|_________ 111 + | | 112 + ACPI0017-0 ACPI0017-1 113 + GP0/HB0/ACPI0016-0 GP1/HB1/ACPI0016-1 114 + | | | | 115 + RP0 RP1 RP2 RP3 116 + | | | | 117 + SW 0 SW 1 SW 2 SW 3 118 + | | | | | | | | 119 + EP0 EP1 EP2 EP3 EP4 EP5 EP6 EP7 120 + 121 + Computation for the example hierarchy: 122 + 123 + Min (GP0 to CPU BW, 124 + Min(SW 0 Upstream Link to RP0 BW, 125 + Min(SW0SSLBIS for SW0DSP0 (EP0), EP0 DSLBIS, EP0 Upstream Link) + 126 + Min(SW0SSLBIS for SW0DSP1 (EP1), EP1 DSLBIS, EP1 Upstream link)) + 127 + Min(SW 1 Upstream Link to RP1 BW, 128 + Min(SW1SSLBIS for SW1DSP0 (EP2), EP2 DSLBIS, EP2 Upstream Link) + 129 + Min(SW1SSLBIS for SW1DSP1 (EP3), EP3 DSLBIS, EP3 Upstream link))) + 130 + Min (GP1 to CPU BW, 131 + Min(SW 2 Upstream Link to RP2 BW, 132 + Min(SW2SSLBIS for SW2DSP0 (EP4), EP4 DSLBIS, EP4 Upstream Link) + 133 + Min(SW2SSLBIS for SW2DSP1 (EP5), EP5 DSLBIS, EP5 Upstream link)) + 134 + Min(SW 3 Upstream Link to RP3 BW, 135 + Min(SW3SSLBIS for SW3DSP0 (EP6), EP6 DSLBIS, EP6 Upstream Link) + 136 + Min(SW3SSLBIS for SW3DSP1 (EP7), EP7 DSLBIS, EP7 Upstream link)))) 137 + 138 + The calculation starts at cxl_region_shared_upstream_perf_update(). A xarray 139 + is created to collect all the endpoint bandwidths via the 140 + cxl_endpoint_gather_bandwidth() function. The min() of bandwidth from the 141 + endpoint CDAT and the upstream link bandwidth is calculated. If the endpoint 142 + has a CXL switch as a parent, then min() of calculated bandwidth and the 143 + bandwidth from the SSLBIS for the switch downstream port that is associated 144 + with the endpoint is calculated. The final bandwidth is stored in a 145 + 'struct cxl_perf_ctx' in the xarray indexed by a device pointer. If the 146 + endpoint is direct attached to a root port (RP), the device pointer would be an 147 + RP device. 
If the endpoint is behind a switch, the device pointer would be the
upstream device of the parent switch.

At the next stage, the code walks through one or more switches if they exist
in the topology. For endpoints directly attached to RPs, this step is skipped.
If there is another switch upstream, the code takes the min() of the current
gathered bandwidth and the upstream link bandwidth. The SSLBIS of that
upstream switch is also included in the min().

Once the topology walk reaches the RP, whether from directly attached endpoints
or after walking through the switch(es), cxl_rp_gather_bandwidth() is called. At
this point all the bandwidths are aggregated per host bridge, which is
also the index for the resulting xarray.

The next step is to take the min() of the per host bridge bandwidth and the
bandwidth from the Generic Port (GP). The bandwidths for the GP are retrieved
via ACPI tables (:doc:`SRAT <../platform/acpi/srat>` and
:doc:`HMAT <../platform/acpi/hmat>`). The minimum bandwidths are aggregated
under the same ACPI0017 device to form a new xarray.

Finally, cxl_region_update_bandwidth() is called and the aggregated
bandwidth from all the members of the last xarray is updated for the
access coordinates residing in the cxl region (cxlr) context.

QTG ID
======
Each CFMWS in the :doc:`CEDT <../platform/acpi/cedt>` has a QTG ID field. This
field provides the ID of the QoS Throttling Group (QTG) associated with the
CFMWS window.
Once the access coordinates are calculated, an ACPI Device Specific Method can
be issued to the ACPI0016 device to retrieve the QTG ID based on the access
coordinates provided. The QTG ID for the device can be used as guidance to match
the device to the CFMWS and set up the best Linux root decoder for the device
performance.
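
As a rough illustration of the shared upstream link calculation described
earlier, the nested min()/sum structure of the example hierarchy can be
sketched in a few lines of Python. This is not the driver's code: the function
names and all bandwidth numbers are hypothetical, and the driver works on
xarrays of 'struct cxl_perf_ctx' rather than nested tuples.

```python
# Sketch only: mirrors the Min(...) + Min(...) formula for the example
# hierarchy. Names and bandwidth values (GB/s) are made up for illustration.

def endpoint_bw(sslbis_bw, dslbis_bw, ep_link_bw):
    """Bandwidth one endpoint can deliver through its switch downstream port."""
    return min(sslbis_bw, dslbis_bw, ep_link_bw)

def switch_bw(sw_upstream_link_bw, endpoints):
    """Sum of the endpoint bandwidths, capped by the switch upstream link."""
    return min(sw_upstream_link_bw, sum(endpoint_bw(*ep) for ep in endpoints))

def host_bridge_bw(gp_bw, switches):
    """Sum of the switch bandwidths under one host bridge, capped by the GP."""
    return min(gp_bw, sum(switch_bw(link, eps) for link, eps in switches))

def region_bw(host_bridges):
    """Aggregate of all host bridges that share the same ACPI0017 root."""
    return sum(host_bridge_bw(gp, sws) for gp, sws in host_bridges)

ep = (32, 24, 32)      # (SSLBIS, DSLBIS, EP upstream link): limited to 24
sw = (32, [ep, ep])    # two endpoints give 48, capped to 32 by the upstream link
hb = (48, [sw, sw])    # two switches give 64, capped to 48 by the Generic Port
EXAMPLE_REGION_BW = region_bw([hb, hb])
```

With these made-up numbers every level is limited by its upstream link, which
is the situation the additional pass is meant to catch.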
+630
Documentation/driver-api/cxl/linux/cxl-driver.rst
.. SPDX-License-Identifier: GPL-2.0

====================
CXL Driver Operation
====================

The devices described in this section are present in ::

  /sys/bus/cxl/devices/
  /dev/cxl/

The :code:`cxl-cli` library, maintained as part of the NDCTL project, may
be used to script interactions with these devices.

Drivers
=======
The CXL driver is split into a number of drivers.

* cxl_core  - fundamental init interface and core object creation
* cxl_port  - initializes root and provides port enumeration interface.
* cxl_acpi  - initializes root decoders and interacts with ACPI data.
* cxl_p/mem - initializes memory devices
* cxl_pci   - uses cxl_port to enumerate the actual fabric hierarchy.

Driver Devices
==============
Here is an example from a single-socket system with 4 host bridges. Two host
bridges have a single memory device attached, and the devices are interleaved
into a single memory region. The memory region has been converted to dax. ::

  # ls /sys/bus/cxl/devices/
    dax_region0  decoder3.0  decoder6.0  mem0   port3
    decoder0.0   decoder4.0  decoder6.1  mem1   port4
    decoder1.0   decoder5.0  endpoint5   port1  region0
    decoder2.0   decoder5.1  endpoint6   port2  root0


.. kernel-render:: DOT
   :alt: Digraph of CXL fabric describing host-bridge interleaving
   :caption: Digraph of CXL fabric with a host-bridge interleave memory region

   digraph foo {
     "root0" -> "port1";
     "root0" -> "port3";
     "root0" -> "decoder0.0";
     "port1" -> "endpoint5";
     "port3" -> "endpoint6";
     "port1" -> "decoder1.0";
     "port3" -> "decoder3.0";
     "endpoint5" -> "decoder5.0";
     "endpoint6" -> "decoder6.0";
     "decoder0.0" -> "region0";
     "decoder0.0" -> "decoder1.0";
     "decoder0.0" -> "decoder3.0";
     "decoder1.0" -> "decoder5.0";
     "decoder3.0" -> "decoder6.0";
     "decoder5.0" -> "region0";
     "decoder6.0" -> "region0";
     "region0" -> "dax_region0";
     "dax_region0" -> "dax0.0";
   }

This section explores the devices present in this configuration. More
configurations are covered in depth in the example configurations below.

Base Devices
------------
Most devices in a CXL fabric are a `port` of some kind (because each
device mostly routes requests from one device to the next, rather than
providing a direct service).

Root
~~~~
The `CXL Root` is a logical object created by the `cxl_acpi` driver during
:code:`cxl_acpi_probe` - if the :code:`ACPI0017` `Compute Express Link
Root Object` Device Class is found.

The Root contains links to:

* `Host Bridge Ports` defined by CHBS in the :doc:`CEDT<../platform/acpi/cedt>`

* `Downstream Ports` typically connected to `Host Bridge Ports`.

* `Root Decoders` defined by CFMWS in the :doc:`CEDT<../platform/acpi/cedt>`

::

  # ls /sys/bus/cxl/devices/root0
    decoder0.0          dport0  dport5    port2  subsystem
    decoders_committed  dport1  modalias  port3  uevent
    devtype             dport4  port1     port4  uport

  # cat /sys/bus/cxl/devices/root0/devtype
    cxl_port

  # cat port1/devtype
    cxl_port

  # cat decoder0.0/devtype
    cxl_decoder_root

The root is the first `logical port` in the CXL fabric, as presented by the
Linux CXL driver. The `CXL root` is a special type of `switch port`, in that it
only has downstream port connections.

Port
~~~~
A `port` object is better described as a `switch port`. It may represent a
host bridge to the root or an actual switch port on a switch. A `switch port`
contains one or more decoders used to route memory requests to downstream ports,
which may be connected to another `switch port` or an `endpoint port`.

::

  # ls /sys/bus/cxl/devices/port1
    decoder1.0          dport0    driver     parent_dport  uport
    decoders_committed  dport113  endpoint5  subsystem
    devtype             dport2    modalias   uevent

  # cat devtype
    cxl_port

  # cat decoder1.0/devtype
    cxl_decoder_switch

  # cat endpoint5/devtype
    cxl_port

CXL `Host Bridges` in the fabric are probed during :code:`cxl_acpi_probe` at
the time the `CXL Root` is probed. This allows for the immediate logical
connection between the root and host bridge.

* The root has a downstream port connection to a host bridge

* The host bridge has an upstream port connection to the root.

* The host bridge has one or more downstream port connections to switch
  or endpoint ports.

A `Host Bridge` is a special type of CXL `switch port`. It is explicitly
defined in the ACPI specification via the `ACPI0016` ID.
`Host Bridge` ports
will be probed at `acpi_probe` time, while similar ports on an actual switch
will be probed later. Otherwise, switch and host bridge ports look very
similar - they both contain switch decoders which route accesses between
upstream and downstream ports.

Endpoint
~~~~~~~~
An `endpoint` is a terminal port in the fabric. This is a `logical device`,
and may be one of many `logical devices` presented by a memory device. It
is still considered a type of `port` in the fabric.

An `endpoint` contains `endpoint decoders` and the device's Coherent Device
Attribute Table (which describes the device's capabilities). ::

  # ls /sys/bus/cxl/devices/endpoint5
    CDAT        decoders_committed  modalias      uevent
    decoder5.0  devtype             parent_dport  uport
    decoder5.1  driver              subsystem

  # cat /sys/bus/cxl/devices/endpoint5/devtype
    cxl_port

  # cat /sys/bus/cxl/devices/endpoint5/decoder5.0/devtype
    cxl_decoder_endpoint


Memory Device (memdev)
~~~~~~~~~~~~~~~~~~~~~~
A `memdev` is probed and added by the `cxl_pci` driver in :code:`cxl_pci_probe`
and is managed by the `cxl_mem` driver. It primarily provides the `IOCTL`
interface to a memory device, via :code:`/dev/cxl/memN`, and exposes various
device configuration data. ::

  # ls /sys/bus/cxl/devices/mem0
    dev       firmware_version    payload_max  security   uevent
    driver    label_storage_size  pmem         serial
    firmware  numa_node           ram          subsystem

A Memory Device is a discrete base object that is not a port. While the
physical device it belongs to may also host an `endpoint`, the relationship
between an `endpoint` and a `memdev` is not captured in sysfs.
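
The :code:`devtype` values shown in the listings above are a convenient way to
tell ports and decoders apart from a script. The following is a minimal sketch,
assuming only the sysfs layout shown in this section; the function names are
hypothetical and are not part of :code:`cxl-cli`.

```python
import os

def cxl_devtypes(base="/sys/bus/cxl/devices"):
    """Map each device name under `base` to its devtype string, or None."""
    result = {}
    for name in sorted(os.listdir(base)):
        devtype_path = os.path.join(base, name, "devtype")
        try:
            with open(devtype_path) as f:
                result[name] = f.read().strip()
        except OSError:
            # The mem0 listing above shows no devtype attribute, so a
            # missing file is treated as "unknown" rather than an error.
            result[name] = None
    return result

def ports(devtypes):
    """Names whose devtype is cxl_port: the root, switch ports and endpoints."""
    return [name for name, devtype in devtypes.items() if devtype == "cxl_port"]
```

On the example system, :code:`ports(cxl_devtypes())` would be expected to
include :code:`root0`, :code:`port1` and :code:`endpoint5`, since all three
report :code:`cxl_port` above.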

Port Relationships
~~~~~~~~~~~~~~~~~~
In our example described above, there are four host bridges attached to the
root, and two of the host bridges have one endpoint attached.

.. kernel-render:: DOT
   :alt: Digraph of CXL fabric describing host-bridge interleaving
   :caption: Digraph of CXL fabric with a host-bridge interleave memory region

   digraph foo {
     "root0" -> "port1";
     "root0" -> "port2";
     "root0" -> "port3";
     "root0" -> "port4";
     "port1" -> "endpoint5";
     "port3" -> "endpoint6";
   }

Decoders
--------
A `Decoder` is short for a CXL Host-Managed Device Memory (HDM) Decoder. It is
a device that routes accesses through the CXL fabric to an endpoint, and at
the endpoint translates `Host Physical` to `Device Physical` addresses.

The CXL 3.1 specification heavily implies that only endpoint decoders should
engage in translation of `Host Physical Address` to `Device Physical Address`.
::

  8.2.4.20 CXL HDM Decoder Capability Structure

  IMPLEMENTATION NOTE
  CXL Host Bridge and Upstream Switch Port Decode Flow

  IMPLEMENTATION NOTE
  Device Decode Logic

These notes imply that there are two logical groups of decoders.

* Routing Decoder - a decoder which routes accesses but does not translate
  addresses from HPA to DPA.

* Translating Decoder - a decoder which translates accesses from HPA to DPA
  for an endpoint to service.

The CXL drivers distinguish 3 decoder types: root, switch, and endpoint. Only
endpoint decoders are Translating Decoders; all others are Routing Decoders.

.. note:: PLATFORM VENDORS BE AWARE

   Linux makes a strong assumption that endpoint decoders are the only decoder
   in the fabric that actively translates HPA to DPA. Linux assumes routing
   decoders pass the HPA unchanged to the next decoder in the fabric.

   It is therefore assumed that any given decoder in the fabric will have an
   address range that is a subset of its upstream port decoder. Any deviation
   from this scheme is undefined per the specification. Linux prioritizes
   spec-defined / architectural behavior.

Decoders may have one or more `Downstream Targets` if configured to interleave
memory accesses. This will be presented in sysfs via the :code:`target_list`
parameter.

Root Decoder
~~~~~~~~~~~~
A `Root Decoder` is a logical construct of the physical address and interleave
configurations present in the CFMWS field of the :doc:`CEDT
<../platform/acpi/cedt>`.
Linux presents this information as a decoder present in the `CXL Root`. We
consider this a `Root Decoder`, though technically it exists on the boundary
of the CXL specification and platform-specific CXL root implementations.

Linux considers these logical decoders a type of `Routing Decoder`, and they
are the first decoders in the CXL fabric to receive a memory access from the
platform's memory controllers.

`Root Decoders` are created during :code:`cxl_acpi_probe`. One root decoder
is created per CFMWS entry in the :doc:`CEDT <../platform/acpi/cedt>`.

The :code:`target_list` parameter is filled by the CFMWS target fields. Targets
of a root decoder are `Host Bridges`, which means interleave done at the root
decoder level is an `Inter-Host-Bridge Interleave`.

Only root decoders are capable of `Inter-Host-Bridge Interleave`.

Such interleaves must be configured by the platform and described in the ACPI
CEDT CFMWS, as the target CXL host bridge UIDs in the CFMWS must match the CXL
host bridge UIDs in the CHBS field of the :doc:`CEDT
<../platform/acpi/cedt>` and the UID field of CXL Host Bridges defined in
the :doc:`DSDT <../platform/acpi/dsdt>`.

Interleave settings in a root decoder describe how to interleave accesses among
the *immediate downstream targets*, not the entire interleave set.

The memory range described in the root decoder is used to

1) Create a memory region (:code:`region0` in this example), and

2) Associate the region with an IO Memory Resource (:code:`kernel/resource.c`)

::

  # ls /sys/bus/cxl/devices/decoder0.0/
    cap_pmem           devtype                 region0
    cap_ram            interleave_granularity  size
    cap_type2          interleave_ways         start
    cap_type3          locked                  subsystem
    create_ram_region  modalias                target_list
    delete_region      qos_class               uevent

  # cat /sys/bus/cxl/devices/decoder0.0/region0/resource
    0xc050000000

The IO Memory Resource is created during early boot when the CFMWS region is
identified in the EFI Memory Map or E820 table (on x86).

Root decoders are defined as a separate devtype, but are also a type
of `Switch Decoder` due to having downstream targets. ::

  # cat /sys/bus/cxl/devices/decoder0.0/devtype
    cxl_decoder_root

Switch Decoder
~~~~~~~~~~~~~~
Any non-root, routing decoder is considered a `Switch Decoder`, and will
present with the type :code:`cxl_decoder_switch`. Both `Host Bridge` and `CXL
Switch` (device) decoders are of type :code:`cxl_decoder_switch`.
::

  # ls /sys/bus/cxl/devices/decoder1.0/
    devtype                 locked    size       target_list
    interleave_granularity  modalias  start      target_type
    interleave_ways         region    subsystem  uevent

  # cat /sys/bus/cxl/devices/decoder1.0/devtype
    cxl_decoder_switch

  # cat /sys/bus/cxl/devices/decoder1.0/region
    region0

A `Switch Decoder` has associations between a region defined by a root
decoder and downstream target ports. Interleaving done within a switch decoder
is a multi-downstream-port interleave (or `Intra-Host-Bridge Interleave` for
host bridges).

Interleave settings in a switch decoder describe how to interleave accesses
among the *immediate downstream targets*, not the entire interleave set.

Switch decoders are created during :code:`cxl_switch_port_probe` in the
:code:`cxl_port` driver, and are created based on a PCI device's DVSEC
registers.

Switch decoder programming is validated during probe if the platform programs
them during boot (See `Auto Decoders` below), or on commit if programmed at
runtime (See `Runtime Programming` below).


Endpoint Decoder
~~~~~~~~~~~~~~~~
Any decoder attached to a *terminal* point in the CXL fabric (`An Endpoint`) is
considered an `Endpoint Decoder`. Endpoint decoders are of type
:code:`cxl_decoder_endpoint`. ::

  # ls /sys/bus/cxl/devices/decoder5.0
    devtype                 locked    start
    dpa_resource            modalias  subsystem
    dpa_size                mode      target_type
    interleave_granularity  region    uevent
    interleave_ways         size

  # cat /sys/bus/cxl/devices/decoder5.0/devtype
    cxl_decoder_endpoint

  # cat /sys/bus/cxl/devices/decoder5.0/region
    region0

An `Endpoint Decoder` has an association with a region defined by a root
decoder and describes the device-local resource associated with this region.
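
The decoder listings above all expose :code:`start` and :code:`size`. The note
under `Decoders` states the assumption that a decoder's address range is a
subset of its upstream decoder's range; a minimal sketch of checking that from
the sysfs values might look as follows. The helper names are hypothetical, and
the sketch assumes the attributes are printed as :code:`0x`-prefixed hex, as
in the examples in this document.

```python
import os

def read_range(decoder_dir):
    """Return (start, size) of a decoder from its sysfs `start` and `size` files."""
    def read_int(name):
        with open(os.path.join(decoder_dir, name)) as f:
            return int(f.read().strip(), 0)   # base 0 accepts the 0x prefix
    return read_int("start"), read_int("size")

def range_within(child, parent):
    """True if the child's (start, size) range lies inside the parent's range."""
    c_start, c_size = child
    p_start, p_size = parent
    return p_start <= c_start and c_start + c_size <= p_start + p_size
```

For the example system this would be called with the :code:`decoder5.0` and
:code:`decoder1.0` directories, then again with :code:`decoder1.0` and
:code:`decoder0.0`, walking up the hierarchy.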

Unlike root and switch decoders, endpoint decoders translate `Host Physical` to
`Device Physical` address ranges. The interleave settings on an endpoint
therefore describe the entire *interleave set*.

`Device Physical Address` regions must be committed in-order. For example, the
DPA region starting at 0x80000000 cannot be committed before the DPA region
starting at 0x0.

As of Linux v6.15, Linux does not support *imbalanced* interleave setups; all
endpoints in an interleave set are expected to have the same interleave
settings (granularity and ways must be the same).

Endpoint decoders are created during :code:`cxl_endpoint_port_probe` in the
:code:`cxl_port` driver, and are created based on a PCI device's DVSEC registers.

Decoder Relationships
~~~~~~~~~~~~~~~~~~~~~
In our example described above, there is one root decoder which routes memory
accesses over two host bridges. Each host bridge has a decoder which routes
accesses to its single endpoint target. Each endpoint has a decoder which
translates HPA to DPA and services the memory request.

The driver validates relationships between ports by decoder programming, so
we can think of decoders being related in a similarly hierarchical fashion to
ports.

.. kernel-render:: DOT
   :alt: Digraph of hierarchical relationship between root, switch, and endpoint decoders.
   :caption: Digraph of CXL root, switch, and endpoint decoders.

   digraph foo {
     "root0" -> "decoder0.0";
     "decoder0.0" -> "decoder1.0";
     "decoder0.0" -> "decoder3.0";
     "decoder1.0" -> "decoder5.0";
     "decoder3.0" -> "decoder6.0";
   }

Regions
-------

Memory Region
~~~~~~~~~~~~~
A `Memory Region` is a logical construct that connects a set of CXL ports in
the fabric to an IO Memory Resource.
It is ultimately used to expose the memory
on these devices to the DAX subsystem via a `DAX Region`.

An example RAM region: ::

  # ls /sys/bus/cxl/devices/region0/
    access0      devtype                 modalias  subsystem  uuid
    access1      driver                  mode      target0
    commit       interleave_granularity  resource  target1
    dax_region0  interleave_ways         size      uevent

A memory region can be constructed during endpoint probe, if decoders were
programmed by BIOS/EFI (see `Auto Decoders`), or by creating a region manually
via a `Root Decoder`'s :code:`create_ram_region` or :code:`create_pmem_region`
interfaces.

The interleave settings in a `Memory Region` describe the configuration of the
`Interleave Set` - and are what can be expected to be seen in the endpoint
interleave settings.

.. kernel-render:: DOT
   :alt: Digraph of CXL memory region relationships between root and endpoint decoders.
   :caption: Regions are created based on root decoder configurations. Endpoint decoders
             must be programmed with the same interleave settings as the region.

   digraph foo {
     "root0" -> "decoder0.0";
     "decoder0.0" -> "region0";
     "region0" -> "decoder5.0";
     "region0" -> "decoder6.0";
   }

DAX Region
~~~~~~~~~~
A `DAX Region` is used to convert a CXL `Memory Region` to a DAX device. A
DAX device may then be accessed directly via a file descriptor interface, or
converted to System RAM via the DAX kmem driver. See the DAX driver section
for more details. ::

  # ls /sys/bus/cxl/devices/dax_region0/
    dax0.0      devtype  modalias   uevent
    dax_region  driver   subsystem

Mailbox Interfaces
------------------
A mailbox command interface for each device is exposed in ::

  /dev/cxl/mem0
  /dev/cxl/mem1

These mailboxes may receive any specification-defined command.
Raw commands
(custom commands) can only be sent to these interfaces if the build config
:code:`CXL_MEM_RAW_COMMANDS` is set. This is considered a debug and/or
development interface, not an officially supported mechanism for creation
of vendor-specific commands (see the `fwctl` subsystem for that).

Decoder Programming
===================

Runtime Programming
-------------------
During probe, the only decoders *required* to be programmed are `Root Decoders`.
In reality, `Root Decoders` are a logical construct to describe the memory
region and interleave configuration at the host bridge level - as described
in the ACPI CEDT CFMWS.

All other `Switch` and `Endpoint` decoders may be programmed by the user
at runtime - if the platform supports such configurations.

This interaction is what creates a `Software Defined Memory` environment.

See the :code:`cxl-cli` documentation for more information about how to
configure CXL decoders at runtime.

Auto Decoders
-------------
Auto Decoders are decoders programmed by BIOS/EFI at boot time, and are
almost always locked (cannot be changed). This is done by a platform
which may have a static configuration - or certain quirks which may prevent
dynamic runtime changes to the decoders (such as requiring additional
controller programming within the CPU complex outside the scope of CXL).

Auto Decoders are probed automatically as long as the devices and memory
regions they are associated with probe without issue. When probing Auto
Decoders, the driver's primary responsibility is to ensure the fabric is
sane - as if validating runtime programmed regions and decoders.

If Linux cannot validate auto-decoder configuration, the memory will not
be surfaced as a DAX device - and therefore not be exposed to the page
allocator - effectively stranding it.

Interleave
----------

The Linux CXL driver supports `Cross-Link First` interleave. This dictates
how interleave is programmed at each decoder step, as the driver validates
the relationships between a decoder and its parent.

For example, in a `Cross-Link First` interleave setup with 16 endpoints
attached to 4 host bridges, Linux expects the following ways/granularity
across the root, host bridge, and endpoints respectively.

.. flat-table:: 4x4 cross-link first interleave settings

  * - decoder
    - ways
    - granularity

  * - root
    - 4
    - 256

  * - host bridge
    - 4
    - 1024

  * - endpoint
    - 16
    - 256

At the root, a given access will be routed to the
:code:`((HPA / 256) % 4)th` target host bridge. Within a host bridge, the
access will be routed to the :code:`((HPA / 1024) % 4)th` target endpoint.
Each endpoint translates based on the entire 16 device interleave set.

Unbalanced interleave sets are not supported - decoders at a similar point
in the hierarchy (e.g. all host bridge decoders) must have the same ways and
granularity configuration.

At Root
~~~~~~~
Root decoder interleave is defined by the CFMWS field of the :doc:`CEDT
<../platform/acpi/cedt>`. The CEDT may actually define multiple CFMWS
configurations to describe the same physical capacity, with the intent to allow
users to decide at runtime whether to online memory as interleaved or
non-interleaved.
::

             Subtable Type : 01 [CXL Fixed Memory Window Structure]
       Window base address : 0000000100000000
               Window size : 0000000100000000
  Interleave Members (2^n) : 00
     Interleave Arithmetic : 00
              First Target : 00000007

             Subtable Type : 01 [CXL Fixed Memory Window Structure]
       Window base address : 0000000200000000
               Window size : 0000000100000000
  Interleave Members (2^n) : 00
     Interleave Arithmetic : 00
              First Target : 00000006

             Subtable Type : 01 [CXL Fixed Memory Window Structure]
       Window base address : 0000000300000000
               Window size : 0000000200000000
  Interleave Members (2^n) : 01
     Interleave Arithmetic : 00
              First Target : 00000007
               Next Target : 00000006

In this example, the CFMWS defines two discrete non-interleaved 4GB regions,
one for each host bridge, and one interleaved 8GB region that targets both. This
would result in 3 root decoders presenting in the root. ::

  # ls /sys/bus/cxl/devices/root0/decoder*
    decoder0.0  decoder0.1  decoder0.2

  # cat /sys/bus/cxl/devices/decoder0.0/target_list start size
    7
    0x100000000
    0x100000000

  # cat /sys/bus/cxl/devices/decoder0.1/target_list start size
    6
    0x200000000
    0x100000000

  # cat /sys/bus/cxl/devices/decoder0.2/target_list start size
    7,6
    0x300000000
    0x200000000

These decoders are not runtime programmable. They are used to generate a
`Memory Region` to bring this memory online with runtime programmed settings
at the `Switch` and `Endpoint` decoders.
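
The 4x4 `Cross-Link First` table earlier in this section can be checked with a
little arithmetic. The sketch below is illustrative only and is not driver
code: the function names are hypothetical, HPAs are taken relative to the
start of the region, and the region granularity is assumed to equal the root
decoder granularity, as it does in the table. The granularity rule it applies
(:code:`parent_interleave_granularity * parent_interleave_ways`) is the one
this document gives for host bridge and switch decoders.

```python
ROOT_WAYS, ROOT_GRAN = 4, 256    # root decoder: 4 host bridges
HB_WAYS, HB_GRAN = 4, 1024       # host bridge decoder: 4 endpoints each

def route(hpa):
    """Return (host bridge index, endpoint index within that host bridge)."""
    hb = (hpa // ROOT_GRAN) % ROOT_WAYS
    ep = (hpa // HB_GRAN) % HB_WAYS
    return hb, ep

def cross_link_first(root_ways, root_gran, child_ways_per_level):
    """Derive (level, ways, granularity) for each decoder level in the path."""
    settings = [("root", root_ways, root_gran)]
    ways, gran, total = root_ways, root_gran, root_ways
    for i, child_ways in enumerate(child_ways_per_level):
        gran = gran * ways       # parent granularity * parent ways
        ways = child_ways
        total *= child_ways
        settings.append((f"level{i + 1}", ways, gran))
    # Endpoints are programmed with the whole interleave set (assumption:
    # region granularity == root granularity, as in the table).
    settings.append(("endpoint", total, root_gran))
    return settings

# 16 consecutive 256-byte chunks should land on 16 distinct endpoints.
TARGETS = {route(chunk * ROOT_GRAN) for chunk in range(16)}
```

Walking 16 consecutive chunks touches every (host bridge, endpoint) pair once
before wrapping, which is why each endpoint decoder must be programmed with
the full 16-way interleave set.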

At Host Bridge or Switch
~~~~~~~~~~~~~~~~~~~~~~~~
`Host Bridge` and `Switch` decoders are programmable via the following fields:

- :code:`start` - the HPA region associated with the memory region
- :code:`size` - the size of the region
- :code:`target_list` - the list of downstream ports
- :code:`interleave_ways` - the number of downstream ports to interleave across
- :code:`interleave_granularity` - the granularity to interleave at.

Linux expects the :code:`interleave_granularity` of switch decoders to be
derived from their upstream port connections. In `Cross-Link First` interleave
configurations, the :code:`interleave_granularity` of a decoder is equal to
:code:`parent_interleave_granularity * parent_interleave_ways`.

At Endpoint
~~~~~~~~~~~
`Endpoint Decoders` are programmed similarly to Host Bridge and Switch decoders,
with the exception that the ways and granularity are defined by the interleave
set (e.g. the interleave settings defined by the associated `Memory Region`).

- :code:`start` - the HPA region associated with the memory region
- :code:`size` - the size of the region
- :code:`interleave_ways` - the number of endpoints in the interleave set
- :code:`interleave_granularity` - the granularity to interleave at.

These settings are used by endpoint decoders to *Translate* memory requests
from HPA to DPA. This is why they must be aware of the entire interleave set.

Linux does not support unbalanced interleave configurations. As a result, all
endpoints in an interleave set must have the same ways and granularity.

Example Configurations
======================
.. toctree::
   :maxdepth: 1

   example-configurations/single-device.rst
   example-configurations/hb-interleave.rst
   example-configurations/intra-hb-interleave.rst
   example-configurations/multi-interleave.rst
+43
Documentation/driver-api/cxl/linux/dax-driver.rst
.. SPDX-License-Identifier: GPL-2.0

====================
DAX Driver Operation
====================
The `Direct Access Device` driver was originally designed to provide a
memory-like access mechanism to memory-like block-devices. It was
extended to support CXL Memory Devices, which provide user-configured
memory devices.

The CXL subsystem depends on the DAX subsystem to either:

- Generate a file-like interface to userland via :code:`/dev/daxN.Y`, or
- Engage the memory-hotplug interface to add CXL memory to the page allocator.

The DAX subsystem exposes this ability through the `cxl_dax_region` driver.
A `dax_region` provides the translation between a CXL `memory_region` and
a `DAX Device`.

DAX Device
==========
A `DAX Device` is a file-like interface exposed in :code:`/dev/daxN.Y`. A
memory region exposed via a dax device can be accessed by userland software
via the :code:`mmap()` system-call. The result is direct mappings to the
CXL capacity in the task's page tables.

Users wishing to manually handle allocation of CXL memory should use this
interface.

kmem conversion
===============
The :code:`dax_kmem` driver converts a `DAX Device` into a series of `hotplug
memory blocks` managed by :code:`mm/memory_hotplug.c`. This capacity
will be exposed to the kernel page allocator in the user-selected memory
zone.

The :code:`memmap_on_memory` setting (both global and DAX device local)
dictates where the kernel allocates the :code:`struct folio` descriptors
for this memory. If :code:`memmap_on_memory` is set, memory
hotplug will set aside a portion of the memory block capacity to allocate
folios.
If unset, the memory is allocated via a normal :code:`GFP_KERNEL`
allocation - and as a result will most likely land on the local NUMA node of the
CPU executing the hotplug operation.
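
For the `DAX Device` path described earlier, userland access comes down to an
:code:`mmap()` of the device node. The following is a minimal sketch using
Python's standard :code:`mmap` module; the function name is hypothetical, and
the requirement that the length be a multiple of the device alignment is an
assumption to verify against the device being used.

```python
import mmap
import os

def touch_dax(path, length, offset=0):
    """Map `length` bytes of `path` shared read/write and round-trip one byte."""
    fd = os.open(path, os.O_RDWR)
    try:
        with mmap.mmap(fd, length, flags=mmap.MAP_SHARED,
                       prot=mmap.PROT_READ | mmap.PROT_WRITE,
                       offset=offset) as mapping:
            mapping[0] = 0xA5    # a plain store through the mapping
            return mapping[0]
    finally:
        os.close(fd)
```

On a system with a DAX device this might be called as
:code:`touch_dax("/dev/dax0.0", 2 << 20)`, where both the device name and the
2 MiB length are assumptions about the local configuration.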
+137
Documentation/driver-api/cxl/linux/early-boot.rst
.. SPDX-License-Identifier: GPL-2.0

=======================
Linux Init (Early Boot)
=======================

Linux configuration is split into two major steps: Early-Boot and everything else.

During early boot, Linux sets up immutable resources (such as NUMA nodes), while
later operations include things like driver probe and memory hotplug. Linux may
read EFI and ACPI information throughout this process to configure logical
representations of the devices.

During the Linux Early Boot stage (functions in the kernel that have the __init
decorator), the system takes the resources created by EFI/BIOS
(:doc:`ACPI tables <../platform/acpi>`) and turns them into resources that the
kernel can consume.


BIOS, Build and Boot Options
============================

There are 4 pre-boot options that need to be considered during kernel build
which dictate how memory will be managed by Linux during early boot.

* EFI_MEMORY_SP

  * BIOS/EFI Option that dictates whether memory is SystemRAM or
    Specific Purpose. Specific Purpose memory will be deferred to
    drivers to manage - and not immediately exposed as system RAM.

* CONFIG_EFI_SOFT_RESERVE

  * Linux Build config option that dictates whether the kernel supports
    Specific Purpose memory.

* CONFIG_MHP_DEFAULT_ONLINE_TYPE

  * Linux Build config that dictates whether and how Specific Purpose memory
    converted to a dax device should be managed (left as DAX or onlined as
    SystemRAM in ZONE_NORMAL or ZONE_MOVABLE).

* nosoftreserve

  * Linux kernel boot option that dictates whether Soft Reserve should be
    supported. Similar to CONFIG_EFI_SOFT_RESERVE.
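
The way the first, second and fourth options combine (as described under
`Memory Map Creation`) can be summarized as a small decision function. This is
only a sketch of the documented behavior, not kernel code; the function name
and the returned strings are hypothetical.

```python
def early_boot_disposition(efi_memory_sp, config_efi_soft_reserve, nosoftreserve):
    """Return how a CXL memory range is treated while the memory map is built.

    efi_memory_sp           -- platform marked the range Specific Purpose
    config_efi_soft_reserve -- kernel built with CONFIG_EFI_SOFT_RESERVE=y
    nosoftreserve           -- soft reserve disabled on the kernel command line
    """
    if efi_memory_sp and config_efi_soft_reserve and not nosoftreserve:
        # Set aside for a driver to manage later.
        return "soft-reserved"
    # Any one of the three conditions failing falls back to SystemRAM.
    return "system-ram"
```

Only the soft-reserved outcome leaves the later choice (DAX device or hotplug
into a zone) to the driver and to :code:`CONFIG_MHP_DEFAULT_ONLINE_TYPE`.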

Memory Map Creation
===================

While the kernel parses the EFI memory map, if :code:`Specific Purpose` memory
is supported and detected, it will set this region aside as
:code:`SOFT_RESERVED`.

If :code:`EFI_MEMORY_SP=0`, :code:`CONFIG_EFI_SOFT_RESERVE=n`, or
:code:`nosoftreserve=y` - Linux will default a CXL device memory region to
SystemRAM. This will expose the memory to the kernel page allocator in
:code:`ZONE_NORMAL`, making it available for use for most allocations (including
:code:`struct page` and page tables).

If `Specific Purpose` is set and supported, :code:`CONFIG_MHP_DEFAULT_ONLINE_TYPE_*`
dictates whether the memory is onlined by default (:code:`_OFFLINE` or
:code:`_ONLINE_*`), and if online which zone to online this memory to by default
(:code:`_NORMAL` or :code:`_MOVABLE`).

If placed in :code:`ZONE_MOVABLE`, the memory will not be available for most
kernel allocations (such as :code:`struct page` or page tables). This may
significantly impact performance depending on the memory capacity of the system.


NUMA Node Reservation
=====================

Linux refers to the proximity domains (:code:`PXM`) defined in the :doc:`SRAT
<../platform/acpi/srat>` to create NUMA nodes in :code:`acpi_numa_init`.
Typically, there is a 1:1 relation between :code:`PXM` and NUMA node IDs.

The SRAT is the only ACPI defined way of defining Proximity Domains. Linux
chooses to, at most, map those 1:1 with NUMA nodes.
:doc:`CEDT <../platform/acpi/cedt>` adds a description of SPA ranges which
Linux may map to one or more NUMA nodes.

If there are CXL ranges in the CFMWS but not in SRAT, then a fake :code:`PXM`
is created (as of v6.15). In the future, Linux may reject CFMWS not described
by SRAT due to the ambiguity of proximity domain association.

It is important to note that NUMA node creation cannot be done at runtime. All
possible NUMA nodes are identified at :code:`__init` time, more specifically
during :code:`mm_init`. The CEDT and SRAT must contain sufficient :code:`PXM`
data for Linux to identify NUMA nodes and their associated memory regions.

The relevant code exists in: :code:`linux/drivers/acpi/numa/srat.c`.

See :doc:`Example Platform Configurations <../platform/example-configs>`
for more info.

Memory Tiers Creation
=====================
Memory tiers are a collection of NUMA nodes grouped by performance characteristics.
During :code:`__init`, Linux initializes the system with a default memory tier that
contains all nodes marked :code:`N_MEMORY`.

:code:`memory_tier_init` is called at boot for all nodes with memory online by
default. :code:`memory_tier_late_init` is called during late-init for nodes set up
during driver configuration.

Nodes are only marked :code:`N_MEMORY` if they have *online* memory.

Tier membership can be inspected in ::

  /sys/devices/virtual/memory_tiering/memory_tierN/nodelist
  0-1

If nodes with clearly different performance are grouped together, check the
:doc:`HMAT <../platform/acpi/hmat>` and CDAT information for the CXL nodes. All
nodes default to the DRAM tier, unless HMAT/CDAT information is reported to the
memory_tier component via :code:`access_coordinates`.

For more, see :doc:`CXL access coordinates documentation
<../linux/access-coordinates>`.

Contiguous Memory Allocation
============================
The contiguous memory allocator (CMA) enables reservation of contiguous memory
regions on NUMA nodes during early boot. However, CMA cannot reserve memory
on NUMA nodes that are not online during early boot.
::

    void __init hugetlb_cma_reserve(int order)
    {
            /* Abbreviated: nid is a node with a requested reservation. */
            if (!node_online(nid))
                    /* do not allow reservations */
    }

This means if users intend to defer management of CXL memory to the driver, CMA
cannot be used to guarantee huge page allocations. If enabling CXL memory as
SystemRAM in :code:`ZONE_NORMAL` during early boot, CMA reservations per-node can be
made with the :code:`cma_pernuma` or :code:`numa_cma` kernel command line
parameters.
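As a planning aid, the constraint above can be sketched as a helper that builds
a :code:`numa_cma` argument and drops any node that will not be online during
early boot. The helper name :code:`numa_cma_param` is hypothetical, and the
:code:`<node>:<size>` syntax is assumed from the kernel-parameters
documentation; verify it against the target kernel before relying on it.

```python
def numa_cma_param(requests, online_at_boot):
    """Build a numa_cma= string, skipping nodes that are offline at boot.

    requests:       {node_id: size_string}, e.g. {0: "1G"}
    online_at_boot: set of node ids expected to be online during early boot
    Returns (parameter_string_or_None, sorted list of skipped node ids).
    """
    kept = {n: s for n, s in requests.items() if n in online_at_boot}
    skipped = sorted(n for n in requests if n not in online_at_boot)
    if not kept:
        return None, skipped
    # Assumed syntax: comma-separated <node>:<size> pairs.
    body = ",".join(f"{n}:{s}" for n, s in sorted(kept.items()))
    return f"numa_cma={body}", skipped
```

A node that only appears after the CXL driver onlines its memory shows up in
the skipped list, mirroring the :code:`node_online()` check in the excerpt.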
+314
Documentation/driver-api/cxl/linux/example-configurations/hb-interleave.rst
··· 1 + .. SPDX-License-Identifier: GPL-2.0 2 + 3 + ============================ 4 + Inter-Host-Bridge Interleave 5 + ============================ 6 + This cxl-cli configuration dump shows the following host configuration: 7 + 8 + * A single socket system with one CXL root 9 + * CXL Root has Four (4) CXL Host Bridges 10 + * Two CXL Host Bridges have a single CXL Memory Expander Attached 11 + * The CXL root is configured to interleave across the two host bridges. 12 + 13 + This output is generated by :code:`cxl list -v` and describes the relationships 14 + between objects exposed in :code:`/sys/bus/cxl/devices/`. 15 + 16 + :: 17 + 18 + [ 19 + { 20 + "bus":"root0", 21 + "provider":"ACPI.CXL", 22 + "nr_dports":4, 23 + "dports":[ 24 + { 25 + "dport":"pci0000:00", 26 + "alias":"ACPI0016:01", 27 + "id":0 28 + }, 29 + { 30 + "dport":"pci0000:a8", 31 + "alias":"ACPI0016:02", 32 + "id":4 33 + }, 34 + { 35 + "dport":"pci0000:2a", 36 + "alias":"ACPI0016:03", 37 + "id":1 38 + }, 39 + { 40 + "dport":"pci0000:d2", 41 + "alias":"ACPI0016:00", 42 + "id":5 43 + } 44 + ], 45 + 46 + This chunk shows the CXL "bus" (root0) has 4 downstream ports attached to CXL 47 + Host Bridges. The `Root` can be considered the singular upstream port attached 48 + to the platform's memory controller - which routes memory requests to it. 49 + 50 + The `ports:root0` section lays out how each of these downstream ports are 51 + configured. If a port is not configured (id's 0 and 1), they are omitted. 52 + 53 + :: 54 + 55 + "ports:root0":[ 56 + { 57 + "port":"port1", 58 + "host":"pci0000:d2", 59 + "depth":1, 60 + "nr_dports":3, 61 + "dports":[ 62 + { 63 + "dport":"0000:d2:01.1", 64 + "alias":"device:02", 65 + "id":0 66 + }, 67 + { 68 + "dport":"0000:d2:01.3", 69 + "alias":"device:05", 70 + "id":2 71 + }, 72 + { 73 + "dport":"0000:d2:07.1", 74 + "alias":"device:0d", 75 + "id":113 76 + } 77 + ], 78 + 79 + This chunk shows the available downstream ports associated with the CXL Host 80 + Bridge :code:`port1`. 
In this case, :code:`port1` has 3 available downstream
ports: :code:`dport0`, :code:`dport2`, and :code:`dport113`.

::

    "endpoints:port1":[
      {
        "endpoint":"endpoint5",
        "host":"mem0",
        "parent_dport":"0000:d2:01.1",
        "depth":2,
        "memdev":{
          "memdev":"mem0",
          "ram_size":137438953472,
          "serial":0,
          "numa_node":0,
          "host":"0000:d3:00.0"
        },
        "decoders:endpoint5":[
          {
            "decoder":"decoder5.0",
            "resource":825975898112,
            "size":274877906944,
            "interleave_ways":2,
            "interleave_granularity":256,
            "region":"region0",
            "dpa_resource":0,
            "dpa_size":137438953472,
            "mode":"ram"
          }
        ]
      }
    ],

This chunk shows the endpoints attached to the host bridge :code:`port1`.

:code:`endpoint5` contains a single configured decoder :code:`decoder5.0`
which has the same interleave configuration as :code:`region0` (shown later).

Next we have the decoders belonging to the host bridge:

::

      "decoders:port1":[
        {
          "decoder":"decoder1.0",
          "resource":825975898112,
          "size":274877906944,
          "interleave_ways":1,
          "region":"region0",
          "nr_targets":1,
          "targets":[
            {
              "target":"0000:d2:01.1",
              "alias":"device:02",
              "position":0,
              "id":0
            }
          ]
        }
      ]
    },

Host Bridge :code:`port1` has a single decoder (:code:`decoder1.0`), whose only
target is :code:`dport0` - which is attached to :code:`endpoint5`.

The following chunk shows a similar configuration for Host Bridge :code:`port3`,
the second host bridge with a memory device attached.
148 + 149 + :: 150 + 151 + { 152 + "port":"port3", 153 + "host":"pci0000:a8", 154 + "depth":1, 155 + "nr_dports":1, 156 + "dports":[ 157 + { 158 + "dport":"0000:a8:01.1", 159 + "alias":"device:c3", 160 + "id":0 161 + } 162 + ], 163 + "endpoints:port3":[ 164 + { 165 + "endpoint":"endpoint6", 166 + "host":"mem1", 167 + "parent_dport":"0000:a8:01.1", 168 + "depth":2, 169 + "memdev":{ 170 + "memdev":"mem1", 171 + "ram_size":137438953472, 172 + "serial":0, 173 + "numa_node":0, 174 + "host":"0000:a9:00.0" 175 + }, 176 + "decoders:endpoint6":[ 177 + { 178 + "decoder":"decoder6.0", 179 + "resource":825975898112, 180 + "size":274877906944, 181 + "interleave_ways":2, 182 + "interleave_granularity":256, 183 + "region":"region0", 184 + "dpa_resource":0, 185 + "dpa_size":137438953472, 186 + "mode":"ram" 187 + } 188 + ] 189 + } 190 + ], 191 + "decoders:port3":[ 192 + { 193 + "decoder":"decoder3.0", 194 + "resource":825975898112, 195 + "size":274877906944, 196 + "interleave_ways":1, 197 + "region":"region0", 198 + "nr_targets":1, 199 + "targets":[ 200 + { 201 + "target":"0000:a8:01.1", 202 + "alias":"device:c3", 203 + "position":0, 204 + "id":0 205 + } 206 + ] 207 + } 208 + ] 209 + }, 210 + 211 + 212 + The next chunk shows the two CXL host bridges without attached endpoints. 213 + 214 + :: 215 + 216 + { 217 + "port":"port2", 218 + "host":"pci0000:00", 219 + "depth":1, 220 + "nr_dports":2, 221 + "dports":[ 222 + { 223 + "dport":"0000:00:01.3", 224 + "alias":"device:55", 225 + "id":2 226 + }, 227 + { 228 + "dport":"0000:00:07.1", 229 + "alias":"device:5d", 230 + "id":113 231 + } 232 + ] 233 + }, 234 + { 235 + "port":"port4", 236 + "host":"pci0000:2a", 237 + "depth":1, 238 + "nr_dports":1, 239 + "dports":[ 240 + { 241 + "dport":"0000:2a:01.1", 242 + "alias":"device:d0", 243 + "id":0 244 + } 245 + ] 246 + } 247 + ], 248 + 249 + Next we have the `Root Decoders` belonging to :code:`root0`. 
This root decoder
applies the interleave across the downstream ports :code:`port1` and
:code:`port3` - with a granularity of 256 bytes.

This information is generated by the CXL driver reading the ACPI CEDT CFMWS.

::

    "decoders:root0":[
      {
        "decoder":"decoder0.0",
        "resource":825975898112,
        "size":274877906944,
        "interleave_ways":2,
        "interleave_granularity":256,
        "max_available_extent":0,
        "volatile_capable":true,
        "nr_targets":2,
        "targets":[
          {
            "target":"pci0000:a8",
            "alias":"ACPI0016:02",
            "position":1,
            "id":4
          },
          {
            "target":"pci0000:d2",
            "alias":"ACPI0016:00",
            "position":0,
            "id":5
          }
        ],

Finally we have the `Memory Region` associated with the `Root Decoder`
:code:`decoder0.0`. This region describes the overall interleave configuration
of the interleave set.

::

            "regions:decoder0.0":[
              {
                "region":"region0",
                "resource":825975898112,
                "size":274877906944,
                "type":"ram",
                "interleave_ways":2,
                "interleave_granularity":256,
                "decode_state":"commit",
                "mappings":[
                  {
                    "position":1,
                    "memdev":"mem1",
                    "decoder":"decoder6.0"
                  },
                  {
                    "position":0,
                    "memdev":"mem0",
                    "decoder":"decoder5.0"
                  }
                ]
              }
            ]
          }
        ]
      }
    ]
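To make the 2-way, 256-byte interleave of :code:`region0` concrete, the
following minimal sketch computes which interleave position services a given
host physical address. It assumes plain modulo interleave arithmetic (no
XOR-based decode); the function name :code:`interleave_position` is
hypothetical and is not part of cxl-cli or the kernel.

```python
REGION0_BASE = 825975898112   # "resource" of region0 in the dump above


def interleave_position(hpa, base, ways, granularity):
    """Return the interleave position that services host address `hpa`.

    Assumes simple modulo interleaving: consecutive granules of
    `granularity` bytes rotate across `ways` positions.
    """
    offset = hpa - base
    return (offset // granularity) % ways
```

With the values from the dump, offsets 0 to 255 resolve to position 0
(:code:`mem0` via :code:`decoder5.0`), offsets 256 to 511 resolve to position 1
(:code:`mem1` via :code:`decoder6.0`), and offset 512 wraps back to position 0.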
+291
Documentation/driver-api/cxl/linux/example-configurations/intra-hb-interleave.rst
··· 1 + .. SPDX-License-Identifier: GPL-2.0 2 + 3 + ============================ 4 + Intra-Host-Bridge Interleave 5 + ============================ 6 + This cxl-cli configuration dump shows the following host configuration: 7 + 8 + * A single socket system with one CXL root 9 + * CXL Root has Four (4) CXL Host Bridges 10 + * One (1) CXL Host Bridges has two CXL Memory Expanders Attached 11 + * The Host bridge decoder is programmed to interleave across the expanders. 12 + 13 + This output is generated by :code:`cxl list -v` and describes the relationships 14 + between objects exposed in :code:`/sys/bus/cxl/devices/`. 15 + 16 + :: 17 + 18 + [ 19 + { 20 + "bus":"root0", 21 + "provider":"ACPI.CXL", 22 + "nr_dports":4, 23 + "dports":[ 24 + { 25 + "dport":"pci0000:00", 26 + "alias":"ACPI0016:01", 27 + "id":0 28 + }, 29 + { 30 + "dport":"pci0000:a8", 31 + "alias":"ACPI0016:02", 32 + "id":4 33 + }, 34 + { 35 + "dport":"pci0000:2a", 36 + "alias":"ACPI0016:03", 37 + "id":1 38 + }, 39 + { 40 + "dport":"pci0000:d2", 41 + "alias":"ACPI0016:00", 42 + "id":5 43 + } 44 + ], 45 + 46 + This chunk shows the CXL "bus" (root0) has 4 downstream ports attached to CXL 47 + Host Bridges. The `Root` can be considered the singular upstream port attached 48 + to the platform's memory controller - which routes memory requests to it. 49 + 50 + The `ports:root0` section lays out how each of these downstream ports are 51 + configured. If a port is not configured (id's 0 and 1), they are omitted. 

::

    "ports:root0":[
      {
        "port":"port1",
        "host":"pci0000:d2",
        "depth":1,
        "nr_dports":3,
        "dports":[
          {
            "dport":"0000:d2:01.1",
            "alias":"device:02",
            "id":0
          },
          {
            "dport":"0000:d2:01.3",
            "alias":"device:05",
            "id":2
          },
          {
            "dport":"0000:d2:07.1",
            "alias":"device:0d",
            "id":113
          }
        ],

This chunk shows the available downstream ports associated with the CXL Host
Bridge :code:`port1`. In this case, :code:`port1` has 3 available downstream
ports: :code:`dport0`, :code:`dport2`, and :code:`dport113`.

::

    "endpoints:port1":[
      {
        "endpoint":"endpoint5",
        "host":"mem0",
        "parent_dport":"0000:d2:01.1",
        "depth":2,
        "memdev":{
          "memdev":"mem0",
          "ram_size":137438953472,
          "serial":0,
          "numa_node":0,
          "host":"0000:d3:00.0"
        },
        "decoders:endpoint5":[
          {
            "decoder":"decoder5.0",
            "resource":825975898112,
            "size":274877906944,
            "interleave_ways":2,
            "interleave_granularity":256,
            "region":"region0",
            "dpa_resource":0,
            "dpa_size":137438953472,
            "mode":"ram"
          }
        ]
      },
      {
        "endpoint":"endpoint6",
        "host":"mem1",
        "parent_dport":"0000:d2:01.3",
        "depth":2,
        "memdev":{
          "memdev":"mem1",
          "ram_size":137438953472,
          "serial":0,
          "numa_node":0,
          "host":"0000:a9:00.0"
        },
        "decoders:endpoint6":[
          {
            "decoder":"decoder6.0",
            "resource":825975898112,
            "size":274877906944,
            "interleave_ways":2,
            "interleave_granularity":256,
            "region":"region0",
            "dpa_resource":0,
            "dpa_size":137438953472,
            "mode":"ram"
          }
        ]
      }
    ],

This chunk shows the endpoints attached to the host bridge :code:`port1`.

Each endpoint contains a single configured decoder (:code:`decoder5.0` for
:code:`endpoint5` and :code:`decoder6.0` for :code:`endpoint6`) which has the
same interleave configuration as the memory region it belongs to (shown later).

Next we have the decoders belonging to the host bridge:

::

      "decoders:port1":[
        {
          "decoder":"decoder1.0",
          "resource":825975898112,
          "size":274877906944,
          "interleave_ways":2,
          "interleave_granularity":256,
          "region":"region0",
          "nr_targets":2,
          "targets":[
            {
              "target":"0000:d2:01.1",
              "alias":"device:02",
              "position":0,
              "id":0
            },
            {
              "target":"0000:d2:01.3",
              "alias":"device:05",
              "position":1,
              "id":0
            }
          ]
        }
      ]
    },

Host Bridge :code:`port1` has a single decoder (:code:`decoder1.0`) with two
targets: :code:`dport0` and :code:`dport2` - which are attached to
:code:`endpoint5` and :code:`endpoint6` respectively.

The host bridge decoder interleaves these devices at a 256 byte granularity.

The next chunk shows the three CXL host bridges without attached endpoints.

::

      {
        "port":"port2",
        "host":"pci0000:00",
        "depth":1,
        "nr_dports":2,
        "dports":[
          {
            "dport":"0000:00:01.3",
            "alias":"device:55",
            "id":2
          },
          {
            "dport":"0000:00:07.1",
            "alias":"device:5d",
            "id":113
          }
        ]
      },
      {
        "port":"port3",
        "host":"pci0000:a8",
        "depth":1,
        "nr_dports":1,
        "dports":[
          {
            "dport":"0000:a8:01.1",
            "alias":"device:c3",
            "id":0
          }
        ]
      },
      {
        "port":"port4",
        "host":"pci0000:2a",
        "depth":1,
        "nr_dports":1,
        "dports":[
          {
            "dport":"0000:2a:01.1",
            "alias":"device:d0",
            "id":0
          }
        ]
      }
    ],

Next we have the `Root Decoders` belonging to :code:`root0`. This root decoder
is a pass-through decoder because :code:`interleave_ways` is set to :code:`1`;
in this configuration the interleave is applied by the host bridge decoder
:code:`decoder1.0`.

This information is generated by the CXL driver reading the ACPI CEDT CFMWS.

::

    "decoders:root0":[
      {
        "decoder":"decoder0.0",
        "resource":825975898112,
        "size":274877906944,
        "interleave_ways":1,
        "max_available_extent":0,
        "volatile_capable":true,
        "nr_targets":2,
        "targets":[
          {
            "target":"pci0000:a8",
            "alias":"ACPI0016:02",
            "position":1,
            "id":4
          }
        ],

Finally we have the `Memory Region` associated with the `Root Decoder`
:code:`decoder0.0`. This region describes the overall interleave configuration
of the interleave set.
262 + 263 + :: 264 + 265 + "regions:decoder0.0":[ 266 + { 267 + "region":"region0", 268 + "resource":825975898112, 269 + "size":274877906944, 270 + "type":"ram", 271 + "interleave_ways":2, 272 + "interleave_granularity":256, 273 + "decode_state":"commit", 274 + "mappings":[ 275 + { 276 + "position":1, 277 + "memdev":"mem1", 278 + "decoder":"decoder6.0" 279 + }, 280 + { 281 + "position":0, 282 + "memdev":"mem0", 283 + "decoder":"decoder5.0" 284 + } 285 + ] 286 + } 287 + ] 288 + } 289 + ] 290 + } 291 + ]
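A dump like this can be sanity-checked by comparing the region against its
endpoint decoders. The sketch below is a hypothetical helper (the name
:code:`check_region` is not part of cxl-cli); it only uses fields that appear
in the output above and assumes every endpoint contributes its whole
:code:`dpa_size` to the region.

```python
def check_region(region, endpoint_decoders):
    """Return a list of human-readable problems (empty if consistent).

    region:            dict with "size" and "interleave_ways"
    endpoint_decoders: list of dicts with "decoder", "dpa_size" and
                       "interleave_ways"
    """
    problems = []
    ways = region["interleave_ways"]
    if len(endpoint_decoders) != ways:
        problems.append("endpoint decoder count != interleave_ways")
    # Each endpoint contributes dpa_size bytes of device capacity.
    total = sum(d["dpa_size"] for d in endpoint_decoders)
    if total != region["size"]:
        problems.append("sum of dpa_size != region size")
    for d in endpoint_decoders:
        if d["interleave_ways"] != ways:
            problems.append(f'{d["decoder"]}: interleave_ways mismatch')
    return problems
```

For :code:`region0` above, two endpoint decoders of 137438953472 bytes each add
up to the 274877906944-byte region, so the check returns an empty list.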
+401
Documentation/driver-api/cxl/linux/example-configurations/multi-interleave.rst
··· 1 + .. SPDX-License-Identifier: GPL-2.0 2 + 3 + ====================== 4 + Multi-Level Interleave 5 + ====================== 6 + This cxl-cli configuration dump shows the following host configuration: 7 + 8 + * A single socket system with one CXL root 9 + * CXL Root has Four (4) CXL Host Bridges 10 + * Two CXL Host Bridges have a two CXL Memory Expanders Attached each. 11 + * The CXL root is configured to interleave across the two host bridges. 12 + * Each host bridge with expanders interleaves across two endpoints. 13 + 14 + This output is generated by :code:`cxl list -v` and describes the relationships 15 + between objects exposed in :code:`/sys/bus/cxl/devices/`. 16 + 17 + :: 18 + 19 + [ 20 + { 21 + "bus":"root0", 22 + "provider":"ACPI.CXL", 23 + "nr_dports":4, 24 + "dports":[ 25 + { 26 + "dport":"pci0000:00", 27 + "alias":"ACPI0016:01", 28 + "id":0 29 + }, 30 + { 31 + "dport":"pci0000:a8", 32 + "alias":"ACPI0016:02", 33 + "id":4 34 + }, 35 + { 36 + "dport":"pci0000:2a", 37 + "alias":"ACPI0016:03", 38 + "id":1 39 + }, 40 + { 41 + "dport":"pci0000:d2", 42 + "alias":"ACPI0016:00", 43 + "id":5 44 + } 45 + ], 46 + 47 + This chunk shows the CXL "bus" (root0) has 4 downstream ports attached to CXL 48 + Host Bridges. The `Root` can be considered the singular upstream port attached 49 + to the platform's memory controller - which routes memory requests to it. 50 + 51 + The `ports:root0` section lays out how each of these downstream ports are 52 + configured. If a port is not configured (id's 0 and 1), they are omitted. 
53 + 54 + :: 55 + 56 + "ports:root0":[ 57 + { 58 + "port":"port1", 59 + "host":"pci0000:d2", 60 + "depth":1, 61 + "nr_dports":3, 62 + "dports":[ 63 + { 64 + "dport":"0000:d2:01.1", 65 + "alias":"device:02", 66 + "id":0 67 + }, 68 + { 69 + "dport":"0000:d2:01.3", 70 + "alias":"device:05", 71 + "id":2 72 + }, 73 + { 74 + "dport":"0000:d2:07.1", 75 + "alias":"device:0d", 76 + "id":113 77 + } 78 + ], 79 + 80 + This chunk shows the available downstream ports associated with the CXL Host 81 + Bridge :code:`port1`. In this case, :code:`port1` has 3 available downstream 82 + ports: :code:`dport0`, :code:`dport2`, and :code:`dport113`. 83 + 84 + :: 85 + 86 + "endpoints:port1":[ 87 + { 88 + "endpoint":"endpoint5", 89 + "host":"mem0", 90 + "parent_dport":"0000:d2:01.1", 91 + "depth":2, 92 + "memdev":{ 93 + "memdev":"mem0", 94 + "ram_size":137438953472, 95 + "serial":0, 96 + "numa_node":0, 97 + "host":"0000:d3:00.0" 98 + }, 99 + "decoders:endpoint5":[ 100 + { 101 + "decoder":"decoder5.0", 102 + "resource":825975898112, 103 + "size":549755813888, 104 + "interleave_ways":4, 105 + "interleave_granularity":256, 106 + "region":"region0", 107 + "dpa_resource":0, 108 + "dpa_size":137438953472, 109 + "mode":"ram" 110 + } 111 + ] 112 + }, 113 + { 114 + "endpoint":"endpoint6", 115 + "host":"mem1", 116 + "parent_dport":"0000:d2:01.3", 117 + "depth":2, 118 + "memdev":{ 119 + "memdev":"mem1", 120 + "ram_size":137438953472, 121 + "serial":0, 122 + "numa_node":0, 123 + "host":"0000:d3:00.0" 124 + }, 125 + "decoders:endpoint6":[ 126 + { 127 + "decoder":"decoder6.0", 128 + "resource":825975898112, 129 + "size":549755813888, 130 + "interleave_ways":4, 131 + "interleave_granularity":256, 132 + "region":"region0", 133 + "dpa_resource":0, 134 + "dpa_size":137438953472, 135 + "mode":"ram" 136 + } 137 + ] 138 + } 139 + ], 140 + 141 + This chunk shows the endpoints attached to the host bridge :code:`port1`. 

:code:`endpoint5` contains a single configured decoder :code:`decoder5.0`
which has the same interleave configuration as :code:`region0` (shown later).

:code:`endpoint6` contains a single configured decoder :code:`decoder6.0`
which has the same interleave configuration as :code:`region0` (shown later).

Next we have the decoders belonging to the host bridge:

::

      "decoders:port1":[
        {
          "decoder":"decoder1.0",
          "resource":825975898112,
          "size":549755813888,
          "interleave_ways":2,
          "interleave_granularity":512,
          "region":"region0",
          "nr_targets":2,
          "targets":[
            {
              "target":"0000:d2:01.1",
              "alias":"device:02",
              "position":0,
              "id":0
            },
            {
              "target":"0000:d2:01.3",
              "alias":"device:05",
              "position":2,
              "id":0
            }
          ]
        }
      ]
    },

Host Bridge :code:`port1` has a single decoder (:code:`decoder1.0`), whose
targets are :code:`dport0` and :code:`dport2` - which are attached to
:code:`endpoint5` and :code:`endpoint6` respectively.

The following chunk shows a similar configuration for Host Bridge :code:`port3`,
the second host bridge with memory devices attached.
186 + 187 + :: 188 + 189 + { 190 + "port":"port3", 191 + "host":"pci0000:a8", 192 + "depth":1, 193 + "nr_dports":1, 194 + "dports":[ 195 + { 196 + "dport":"0000:a8:01.1", 197 + "alias":"device:c3", 198 + "id":0 199 + }, 200 + { 201 + "dport":"0000:a8:01.3", 202 + "alias":"device:c5", 203 + "id":0 204 + } 205 + ], 206 + "endpoints:port3":[ 207 + { 208 + "endpoint":"endpoint7", 209 + "host":"mem2", 210 + "parent_dport":"0000:a8:01.1", 211 + "depth":2, 212 + "memdev":{ 213 + "memdev":"mem2", 214 + "ram_size":137438953472, 215 + "serial":0, 216 + "numa_node":0, 217 + "host":"0000:a9:00.0" 218 + }, 219 + "decoders:endpoint7":[ 220 + { 221 + "decoder":"decoder7.0", 222 + "resource":825975898112, 223 + "size":549755813888, 224 + "interleave_ways":4, 225 + "interleave_granularity":256, 226 + "region":"region0", 227 + "dpa_resource":0, 228 + "dpa_size":137438953472, 229 + "mode":"ram" 230 + } 231 + ] 232 + }, 233 + { 234 + "endpoint":"endpoint8", 235 + "host":"mem3", 236 + "parent_dport":"0000:a8:01.3", 237 + "depth":2, 238 + "memdev":{ 239 + "memdev":"mem3", 240 + "ram_size":137438953472, 241 + "serial":0, 242 + "numa_node":0, 243 + "host":"0000:a9:00.0" 244 + }, 245 + "decoders:endpoint8":[ 246 + { 247 + "decoder":"decoder8.0", 248 + "resource":825975898112, 249 + "size":549755813888, 250 + "interleave_ways":4, 251 + "interleave_granularity":256, 252 + "region":"region0", 253 + "dpa_resource":0, 254 + "dpa_size":137438953472, 255 + "mode":"ram" 256 + } 257 + ] 258 + } 259 + ], 260 + "decoders:port3":[ 261 + { 262 + "decoder":"decoder3.0", 263 + "resource":825975898112, 264 + "size":549755813888, 265 + "interleave_ways":2, 266 + "interleave_granularity":512, 267 + "region":"region0", 268 + "nr_targets":1, 269 + "targets":[ 270 + { 271 + "target":"0000:a8:01.1", 272 + "alias":"device:c3", 273 + "position":1, 274 + "id":0 275 + }, 276 + { 277 + "target":"0000:a8:01.3", 278 + "alias":"device:c5", 279 + "position":3, 280 + "id":0 281 + } 282 + ] 283 + } 284 + ] 285 + }, 286 + 


The next chunk shows the two CXL host bridges without attached endpoints.

::

      {
        "port":"port2",
        "host":"pci0000:00",
        "depth":1,
        "nr_dports":2,
        "dports":[
          {
            "dport":"0000:00:01.3",
            "alias":"device:55",
            "id":2
          },
          {
            "dport":"0000:00:07.1",
            "alias":"device:5d",
            "id":113
          }
        ]
      },
      {
        "port":"port4",
        "host":"pci0000:2a",
        "depth":1,
        "nr_dports":1,
        "dports":[
          {
            "dport":"0000:2a:01.1",
            "alias":"device:d0",
            "id":0
          }
        ]
      }
    ],

Next we have the `Root Decoders` belonging to :code:`root0`. This root decoder
applies the interleave across the downstream ports :code:`port1` and
:code:`port3` - with a granularity of 256 bytes.

This information is generated by the CXL driver reading the ACPI CEDT CFMWS.

::

    "decoders:root0":[
      {
        "decoder":"decoder0.0",
        "resource":825975898112,
        "size":549755813888,
        "interleave_ways":2,
        "interleave_granularity":256,
        "max_available_extent":0,
        "volatile_capable":true,
        "nr_targets":2,
        "targets":[
          {
            "target":"pci0000:a8",
            "alias":"ACPI0016:02",
            "position":1,
            "id":4
          },
          {
            "target":"pci0000:d2",
            "alias":"ACPI0016:00",
            "position":0,
            "id":5
          }
        ],

Finally we have the `Memory Region` associated with the `Root Decoder`
:code:`decoder0.0`. This region describes the overall interleave configuration
of the interleave set. So we see there are a total of :code:`4` interleave
targets across 4 endpoint decoders.

::

            "regions:decoder0.0":[
              {
                "region":"region0",
                "resource":825975898112,
                "size":549755813888,
                "type":"ram",
                "interleave_ways":4,
                "interleave_granularity":256,
                "decode_state":"commit",
                "mappings":[
                  {
                    "position":3,
                    "memdev":"mem3",
                    "decoder":"decoder8.0"
                  },
                  {
                    "position":2,
                    "memdev":"mem1",
                    "decoder":"decoder6.0"
                  },
                  {
                    "position":1,
                    "memdev":"mem2",
                    "decoder":"decoder7.0"
                  },
                  {
                    "position":0,
                    "memdev":"mem0",
                    "decoder":"decoder5.0"
                  }
                ]
              }
            ]
          }
        ]
      }
    ]
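The way the two decoder levels compose into the 4-way region can be sketched
with a little arithmetic. This is an illustrative model that assumes plain
modulo interleaving at both levels (no XOR decode); the function name
:code:`region_position` is hypothetical, and the default arguments are the
values shown in the dump (root decoder: 2 ways at 256 bytes, host bridge
decoders: 2 ways at 512 bytes).

```python
def region_position(offset, root_ways=2, root_gran=256,
                    hb_ways=2, hb_gran=512):
    """Map a byte offset into region0 to its region interleave position."""
    root_idx = (offset // root_gran) % root_ways   # selects the host bridge
    hb_idx = (offset // hb_gran) % hb_ways         # selects the dport below it
    # Host bridge 0 owns positions 0 and 2, host bridge 1 owns 1 and 3,
    # matching the "position" values of the host bridge decoder targets.
    return root_idx + root_ways * hb_idx
```

Under these assumptions the result is identical to a flat 4-way, 256-byte
interleave, :code:`(offset // 256) % 4`, which is what the region and the
endpoint decoders report: offsets 0, 256, 512 and 768 land on :code:`mem0`,
:code:`mem2`, :code:`mem1` and :code:`mem3` respectively.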
+246
Documentation/driver-api/cxl/linux/example-configurations/single-device.rst
··· 1 + .. SPDX-License-Identifier: GPL-2.0 2 + 3 + ============= 4 + Single Device 5 + ============= 6 + This cxl-cli configuration dump shows the following host configuration: 7 + 8 + * A single socket system with one CXL root 9 + * CXL Root has Four (4) CXL Host Bridges 10 + * One CXL Host Bridges has a single CXL Memory Expander Attached 11 + * No interleave is present. 12 + 13 + This output is generated by :code:`cxl list -v` and describes the relationships 14 + between objects exposed in :code:`/sys/bus/cxl/devices/`. 15 + 16 + :: 17 + 18 + [ 19 + { 20 + "bus":"root0", 21 + "provider":"ACPI.CXL", 22 + "nr_dports":4, 23 + "dports":[ 24 + { 25 + "dport":"pci0000:00", 26 + "alias":"ACPI0016:01", 27 + "id":0 28 + }, 29 + { 30 + "dport":"pci0000:a8", 31 + "alias":"ACPI0016:02", 32 + "id":4 33 + }, 34 + { 35 + "dport":"pci0000:2a", 36 + "alias":"ACPI0016:03", 37 + "id":1 38 + }, 39 + { 40 + "dport":"pci0000:d2", 41 + "alias":"ACPI0016:00", 42 + "id":5 43 + } 44 + ], 45 + 46 + This chunk shows the CXL "bus" (root0) has 4 downstream ports attached to CXL 47 + Host Bridges. The `Root` can be considered the singular upstream port attached 48 + to the platform's memory controller - which routes memory requests to it. 49 + 50 + The `ports:root0` section lays out how each of these downstream ports are 51 + configured. If a port is not configured (id's 0, 1, and 4), they are omitted. 52 + 53 + :: 54 + 55 + "ports:root0":[ 56 + { 57 + "port":"port1", 58 + "host":"pci0000:d2", 59 + "depth":1, 60 + "nr_dports":3, 61 + "dports":[ 62 + { 63 + "dport":"0000:d2:01.1", 64 + "alias":"device:02", 65 + "id":0 66 + }, 67 + { 68 + "dport":"0000:d2:01.3", 69 + "alias":"device:05", 70 + "id":2 71 + }, 72 + { 73 + "dport":"0000:d2:07.1", 74 + "alias":"device:0d", 75 + "id":113 76 + } 77 + ], 78 + 79 + This chunk shows the available downstream ports associated with the CXL Host 80 + Bridge :code:`port1`. 
In this case, :code:`port1` has 3 available downstream
ports: :code:`dport0`, :code:`dport2`, and :code:`dport113`.

::

    "endpoints:port1":[
      {
        "endpoint":"endpoint5",
        "host":"mem0",
        "parent_dport":"0000:d2:01.1",
        "depth":2,
        "memdev":{
          "memdev":"mem0",
          "ram_size":137438953472,
          "serial":0,
          "numa_node":0,
          "host":"0000:d3:00.0"
        },
        "decoders:endpoint5":[
          {
            "decoder":"decoder5.0",
            "resource":825975898112,
            "size":137438953472,
            "interleave_ways":1,
            "region":"region0",
            "dpa_resource":0,
            "dpa_size":137438953472,
            "mode":"ram"
          }
        ]
      }
    ],

This chunk shows the endpoints attached to the host bridge :code:`port1`.

:code:`endpoint5` contains a single configured decoder :code:`decoder5.0`
which has the same interleave configuration as :code:`region0` (shown later).

Next we have the decoders belonging to the host bridge:

::

      "decoders:port1":[
        {
          "decoder":"decoder1.0",
          "resource":825975898112,
          "size":137438953472,
          "interleave_ways":1,
          "region":"region0",
          "nr_targets":1,
          "targets":[
            {
              "target":"0000:d2:01.1",
              "alias":"device:02",
              "position":0,
              "id":0
            }
          ]
        }
      ]
    },

Host Bridge :code:`port1` has a single decoder (:code:`decoder1.0`), whose only
target is :code:`dport0` - which is attached to :code:`endpoint5`.

The next chunk shows the three CXL host bridges without attached endpoints.

::

      {
        "port":"port2",
        "host":"pci0000:00",
        "depth":1,
        "nr_dports":2,
        "dports":[
          {
            "dport":"0000:00:01.3",
            "alias":"device:55",
            "id":2
          },
          {
            "dport":"0000:00:07.1",
            "alias":"device:5d",
            "id":113
          }
        ]
      },
      {
        "port":"port3",
        "host":"pci0000:a8",
        "depth":1,
        "nr_dports":1,
        "dports":[
          {
            "dport":"0000:a8:01.1",
            "alias":"device:c3",
            "id":0
          }
        ]
      },
      {
        "port":"port4",
        "host":"pci0000:2a",
        "depth":1,
        "nr_dports":1,
        "dports":[
          {
            "dport":"0000:2a:01.1",
            "alias":"device:d0",
            "id":0
          }
        ]
      }
    ],

Next we have the `Root Decoders` belonging to :code:`root0`. This root decoder
is a pass-through decoder because :code:`interleave_ways` is set to :code:`1`.

This information is generated by the CXL driver reading the ACPI CEDT CFMWS.

::

    "decoders:root0":[
      {
        "decoder":"decoder0.0",
        "resource":825975898112,
        "size":137438953472,
        "interleave_ways":1,
        "max_available_extent":0,
        "volatile_capable":true,
        "nr_targets":1,
        "targets":[
          {
            "target":"pci0000:d2",
            "alias":"ACPI0016:00",
            "position":0,
            "id":5
          }
        ],

Finally we have the `Memory Region` associated with the `Root Decoder`
:code:`decoder0.0`. This region describes the discrete region associated
with the lone device.

::

            "regions:decoder0.0":[
              {
                "region":"region0",
                "resource":825975898112,
                "size":137438953472,
                "type":"ram",
                "interleave_ways":1,
                "decode_state":"commit",
                "mappings":[
                  {
                    "position":0,
                    "memdev":"mem0",
                    "decoder":"decoder5.0"
                  }
                ]
              }
            ]
          }
        ]
      }
    ]
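Because the output is JSON, a region object can be summarized with a few lines
of scripting. The sketch below is a hypothetical helper (the name
:code:`summarize_region` is not part of cxl-cli) that works on a single region
object copied from the dump above rather than on the full nested output.

```python
import json


def summarize_region(region):
    """Return the name, size in GiB, and position-ordered memdevs of a region."""
    ordered = sorted(region["mappings"], key=lambda m: m["position"])
    return {
        "region": region["region"],
        "size_gib": region["size"] / 2**30,
        "memdevs": [m["memdev"] for m in ordered],
    }


# Region object taken from the single-device dump above.
REGION0 = json.loads("""
{
  "region":"region0",
  "resource":825975898112,
  "size":137438953472,
  "type":"ram",
  "interleave_ways":1,
  "decode_state":"commit",
  "mappings":[
    {"position":0, "memdev":"mem0", "decoder":"decoder5.0"}
  ]
}
""")
```

For this configuration the summary reports a 128 GiB region backed only by
:code:`mem0`, which matches the lone endpoint decoder's :code:`dpa_size`.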
+78
Documentation/driver-api/cxl/linux/memory-hotplug.rst
.. SPDX-License-Identifier: GPL-2.0

==============
Memory Hotplug
==============
The final phase of surfacing CXL memory to the kernel page allocator is for
the `DAX` driver to surface a `Driver Managed` memory region via the
memory-hotplug component.

There are four major configurations to consider:

1) Default Online Behavior (on/off and zone)
2) Hotplug Memory Block size
3) Memory Map Resource location
4) Driver-Managed Memory Designation

Default Online Behavior
=======================
The default-online behavior of hotplug memory is dictated by the following,
in order of precedence:

- :code:`CONFIG_MHP_DEFAULT_ONLINE_TYPE` Build Configuration
- :code:`memhp_default_state` Boot parameter
- :code:`/sys/devices/system/memory/auto_online_blocks` value

These dictate whether hotplugged memory blocks arrive in one of three states:

1) Offline
2) Online in :code:`ZONE_NORMAL`
3) Online in :code:`ZONE_MOVABLE`

:code:`ZONE_NORMAL` implies this capacity may be used for almost any allocation,
while :code:`ZONE_MOVABLE` implies this capacity should only be used for
migratable allocations.

:code:`ZONE_MOVABLE` attempts to retain the hotplug-ability of a memory block
so that the entire region may be hot-unplugged at a later time. Any capacity
onlined into :code:`ZONE_NORMAL` should be considered permanently attached to
the page allocator.

Hotplug Memory Block Size
=========================
By default, on most architectures, the Hotplug Memory Block Size is either
128MB or 256MB. On x86, the block size increases up to 2GB as total memory
capacity exceeds 64GB. As of v6.15, Linux does not take into account the
size and alignment of the ACPI CEDT CFMWS regions (see Early Boot docs) when
deciding the Hotplug Memory Block Size.
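Whether a given CFMWS range lines up with the block size can be checked with
simple arithmetic. The sketch below uses hypothetical helper names and assumes
the block size is read from
:code:`/sys/devices/system/memory/block_size_bytes`, which reports a
hexadecimal value. It only reports alignment and block count; what the kernel
does with a misaligned range is not modelled here.

```python
def parse_block_size(text):
    """Parse the contents of block_size_bytes (a hexadecimal string)."""
    return int(text.strip(), 16)


def blocks_for_range(start, size, block_size):
    """Return (aligned, block_count) for a physical address range.

    aligned is True when both start and size are multiples of block_size.
    block_count counts every memory block the range touches.
    """
    aligned = start % block_size == 0 and size % block_size == 0
    first = start // block_size
    last = (start + size - 1) // block_size
    return aligned, last - first + 1
```

Taking the 128GB range at 825975898112 (0xC050000000) used in the example
configurations elsewhere in this documentation, the range is aligned to a
128MB block size and spans 1024 blocks, but it is not aligned to a 2GB block
size and touches 65 blocks.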
48 + 49 + Memory Map 50 + ========== 51 + The location of :code:`struct folio` allocations to represent the hotplugged 52 + memory capacity are dictated by the following system settings: 53 + 54 + - :code:`/sys_module/memory_hotplug/parameters/memmap_on_memory` 55 + - :code:`/sys/bus/dax/devices/daxN.Y/memmap_on_memory` 56 + 57 + If both of these parameters are set to true, :code:`struct folio` for this 58 + capacity will be carved out of the memory block being onlined. This has 59 + performance implications if the memory is particularly high-latency and 60 + its :code:`struct folio` becomes hotly contended. 61 + 62 + If either parameter is set to false, :code:`struct folio` for this capacity 63 + will be allocated from the local node of the processor running the hotplug 64 + procedure. This capacity will be allocated from :code:`ZONE_NORMAL` on 65 + that node, as it is a :code:`GFP_KERNEL` allocation. 66 + 67 + Systems with extremely large amounts of :code:`ZONE_MOVABLE` memory (e.g. 68 + CXL memory pools) must ensure that there is sufficient local 69 + :code:`ZONE_NORMAL` capacity to host the memory map for the hotplugged capacity. 70 + 71 + Driver Managed Memory 72 + ===================== 73 + The DAX driver surfaces this memory to memory-hotplug as "Driver Managed". This 74 + is not a configurable setting, but it's important to note that driver managed 75 + memory is explicitly excluded from use during kexec. This is required to ensure 76 + any reset or out-of-band operations that the CXL device may be subject to during 77 + a functional system-reboot (such as a reset-on-probe) will not cause portions of 78 + the kexec kernel to be overwritten.
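Editor's sketch (not part of the patch): the block-size and memory-map sections of memory-hotplug.rst can be turned into a rough sizing estimate. The helper below is hypothetical and assumes 4 KiB base pages and about 64 bytes of memory-map metadata per page, which is typical for 64-bit builds but not guaranteed by the text above.

```python
GIB = 1 << 30
PAGE_SIZE = 4096        # assumption: 4 KiB base pages
MEMMAP_PER_PAGE = 64    # assumption: ~64 bytes of memory-map metadata per page


def hotplug_footprint(region_bytes, block_bytes):
    """Return (memory block count, memory-map bytes) for a hotplugged region.

    Hypothetical helper: the region must be a whole number of memory blocks,
    mirroring the uniform size/alignment requirement described in the docs.
    """
    if region_bytes % block_bytes:
        raise ValueError("region is not a multiple of the memory block size")
    blocks = region_bytes // block_bytes
    memmap_bytes = (region_bytes // PAGE_SIZE) * MEMMAP_PER_PAGE
    return blocks, memmap_bytes


# A 512 GiB CXL pool onlined with 2 GiB memory blocks.
blocks, memmap = hotplug_footprint(512 * GIB, 2 * GIB)
print(blocks, memmap // GIB)
```

Under these assumptions the memory map is about 1/64 of the hotplugged capacity. With :code:`memmap_on_memory` disabled, that amount has to fit in local :code:`ZONE_NORMAL`, which is the sizing concern the document raises for large pools.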
+103
Documentation/driver-api/cxl/linux/overview.rst
···
+ .. SPDX-License-Identifier: GPL-2.0
+
+ ========
+ Overview
+ ========
+
+ This section presents the configuration process of a CXL Type-3 memory device,
+ and how it is ultimately exposed to users as either a :code:`DAX` device or
+ normal memory pages via the kernel's page allocator.
+
+ Portions marked with a bullet are points at which certain kernel objects
+ are generated.
+
+ 1) Early Boot
+
+   a) BIOS, Build, and Boot Parameters
+
+     i) EFI_MEMORY_SP
+     ii) CONFIG_EFI_SOFT_RESERVE
+     iii) CONFIG_MHP_DEFAULT_ONLINE_TYPE
+     iv) nosoftreserve
+
+   b) Memory Map Creation
+
+     i) EFI Memory Map / E820 Consulted for Soft-Reserved
+
+       * CXL Memory is set aside to be handled by the CXL driver
+
+       * Soft-Reserved IO Resource created for CFMWS entry
+
+   c) NUMA Node Creation
+
+     * Nodes created from ACPI CEDT CFMWS and SRAT Proximity domains (PXM)
+
+   d) Memory Tier Creation
+
+     * A default memory_tier is created with all nodes.
+
+   e) Contiguous Memory Allocation
+
+     * Any requested CMA is allocated from Online nodes
+
+   f) Init Finishes, Drivers start probing
+
+ 2) ACPI and PCI Drivers
+
+   a) Detects PCI device is CXL, marking it for probe by CXL driver
+
+ 3) CXL Driver Operation
+
+   a) Base device creation
+
+     * root, port, and memdev devices created
+     * CEDT CFMWS IO Resource creation
+
+   b) Decoder creation
+
+     * root, switch, and endpoint decoders created
+
+   c) Logical device creation
+
+     * memory_region and endpoint devices created
+
+   d) Devices are associated with each other
+
+     * If auto-decoder (BIOS-programmed decoders), driver validates
+       configurations, builds associations, and locks configs at probe time.
+
+     * If user-configured, validation and associations are built at
+       decoder-commit time.
+
+   e) Regions surfaced as DAX region
+
+     * dax_region created
+
+     * DAX device created via DAX driver
+
+ 4) DAX Driver Operation
+
+   a) DAX driver surfaces DAX region as one of two dax device modes
+
+     * kmem - dax device is converted to hotplug memory blocks
+
+       * DAX kmem IO Resource creation
+
+     * hmem - dax device is left as daxdev to be accessed as a file.
+
+       * If hmem, journey ends here.
+
+   b) DAX kmem surfaces memory region to Memory Hotplug to add to page
+      allocator as "driver managed memory"
+
+ 5) Memory Hotplug
+
+   a) mhp component surfaces a dax device memory region as multiple memory
+      blocks to the page allocator
+
+     * blocks appear in :code:`/sys/bus/memory/devices` and are linked to a NUMA node
+
+   b) blocks are onlined into the requested zone (NORMAL or MOVABLE)
+
+     * Memory is marked "Driver Managed" to prevent kexec from using it as a
+       region for kernel updates
+3 -3
Documentation/driver-api/cxl/maturity-map.rst
··· 51 51 52 52 * [2] CXL Window Enumeration 53 53 54 - * [0] :ref:`Extended-linear memory-side cache <extended-linear>` 54 + * [2] :ref:`Extended-linear memory-side cache <extended-linear>` 55 55 * [0] Low Memory-hole 56 - * [0] Hetero-interleave 56 + * [X] Hetero-interleave 57 57 58 58 * [2] Switch Enumeration 59 59 ··· 173 173 User Flow Support 174 174 ----------------- 175 175 176 - * [0] HPA->DPA Address translation (need xormaps export solution) 176 + * [0] Inject & clear poison by HPA 177 177 178 178 Details 179 179 =======
+22 -5
Documentation/driver-api/cxl/memory-devices.rst Documentation/driver-api/cxl/theory-of-operation.rst
··· 1 1 .. SPDX-License-Identifier: GPL-2.0 2 2 .. include:: <isonum.txt> 3 3 4 - =================================== 5 - Compute Express Link Memory Devices 6 - =================================== 4 + =============================================== 5 + Compute Express Link Driver Theory of Operation 6 + =============================================== 7 7 8 8 A Compute Express Link Memory Device is a CXL component that implements the 9 9 CXL.mem protocol. It contains some amount of volatile memory, persistent memory, ··· 14 14 range across multiple devices underneath a host-bridge or interleaved 15 15 across host-bridges. 16 16 17 - CXL Bus: Theory of Operation 18 - ============================ 17 + The CXL Bus 18 + =========== 19 19 Similar to how a RAID driver takes disk objects and assembles them into a new 20 20 logical device, the CXL subsystem is tasked to take PCIe and ACPI objects and 21 21 assemble them into a CXL.mem decode topology. The need for runtime configuration ··· 347 347 .. kernel-doc:: drivers/cxl/cxl.h 348 348 :internal: 349 349 350 + .. kernel-doc:: drivers/cxl/acpi.c 351 + :identifiers: add_cxl_resources 352 + 350 353 .. kernel-doc:: drivers/cxl/core/hdm.c 351 354 :doc: cxl core hdm 352 355 ··· 374 371 .. kernel-doc:: drivers/cxl/core/pmem.c 375 372 :doc: cxl pmem 376 373 374 + .. kernel-doc:: drivers/cxl/core/pmem.c 375 + :identifiers: 376 + 377 377 .. kernel-doc:: drivers/cxl/core/regs.c 378 378 :doc: cxl registers 379 379 380 + .. kernel-doc:: drivers/cxl/core/regs.c 381 + :identifiers: 382 + 380 383 .. kernel-doc:: drivers/cxl/core/mbox.c 381 384 :doc: cxl mbox 385 + 386 + .. kernel-doc:: drivers/cxl/core/mbox.c 387 + :identifiers: 388 + 389 + .. kernel-doc:: drivers/cxl/core/features.c 390 + :doc: cxl features 391 + 392 + See :c:func:`devm_cxl_setup_features` for API details. 382 393 383 394 CXL Regions 384 395 -----------
+76
Documentation/driver-api/cxl/platform/acpi.rst
···
+ .. SPDX-License-Identifier: GPL-2.0
+
+ ===========
+ ACPI Tables
+ ===========
+
+ ACPI is the "Advanced Configuration and Power Interface", which is a standard
+ that defines how platforms and OS manage power and configure computer hardware.
+ For the purpose of this theory of operation, when referring to "ACPI" we will
+ usually refer to "ACPI Tables" - which are the way a platform (BIOS/EFI)
+ communicates static configuration information to the operating system.
+
+ The following ACPI tables contain *static* configuration and performance data
+ about CXL devices.
+
+ .. toctree::
+    :maxdepth: 1
+
+    acpi/cedt.rst
+    acpi/srat.rst
+    acpi/hmat.rst
+    acpi/slit.rst
+    acpi/dsdt.rst
+
+ The SRAT table may also contain generic port/initiator content that is intended
+ to describe the generic port, but not information about the rest of the path to
+ the endpoint.
+
+ Linux uses these tables to configure kernel resources for statically configured
+ (by BIOS/EFI) CXL devices, such as:
+
+ - NUMA nodes
+ - Memory Tiers
+ - NUMA Abstract Distances
+ - SystemRAM Memory Regions
+ - Weighted Interleave Node Weights
+
+ ACPI Debugging
+ ==============
+
+ The :code:`acpidump -b` command dumps the ACPI tables into binary format.
+
+ The :code:`iasl -d` command disassembles the files into human readable format.
+
+ Example :code:`acpidump -b && iasl -d cedt.dat` ::
+
+     [000h 0000 4] Signature : "CEDT" [CXL Early Discovery Table]
+
+ Common Issues
+ -------------
+ Most failures described here result in a failure of the driver to surface
+ memory as a DAX device and/or kmem.
+
+ * CEDT CFMWS targets list UIDs do not match CEDT CHBS UIDs.
+ * CEDT CFMWS targets list UIDs do not match DSDT CXL Host Bridge UIDs.
+ * CEDT CFMWS Restriction Bits are not correct.
+ * CEDT CFMWS Memory regions are poorly aligned.
+ * CEDT CFMWS Memory regions span a platform memory hole.
+ * CEDT CHBS UIDs do not match DSDT CXL Host Bridge UIDs.
+ * CEDT CHBS Specification version is incorrect.
+ * SRAT is missing regions described in CEDT CFMWS.
+
+   * Result: failure to create a NUMA node for the region, or
+     region is placed in wrong node.
+
+ * HMAT is missing data for regions described in CEDT CFMWS.
+
+   * Result: NUMA node being placed in the wrong memory tier.
+
+ * SLIT has bad data.
+
+   * Result: Lots of performance mechanisms in the kernel will be very unhappy.
+
+ All of these issues will appear to users as if the driver is failing to
+ support CXL - when in reality they are all the failure of a platform to
+ configure the ACPI tables correctly.
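Editor's sketch (not part of the patch): the UID mismatches listed under "Common Issues" can be checked mechanically once the tables have been disassembled. The function below is hypothetical; it assumes the UIDs were already extracted by hand from :code:`iasl -d` output into plain integer lists, and it does not parse ACPI tables itself.

```python
def check_host_bridge_uids(cfmws_targets, chbs_uids, dsdt_uids):
    """Return a list of problems; an empty list means the UIDs are consistent.

    cfmws_targets: host bridge UIDs from a CEDT CFMWS target list
    chbs_uids:     "Associated host bridge" values from the CEDT CHBS entries
    dsdt_uids:     _UID values of the ACPI0016 devices in the DSDT
    """
    problems = []
    chbs, dsdt = set(chbs_uids), set(dsdt_uids)
    for uid in cfmws_targets:
        if uid not in chbs:
            problems.append(f"CFMWS target {uid:#x} has no CHBS entry")
        if uid not in dsdt:
            problems.append(f"CFMWS target {uid:#x} has no DSDT host bridge")
    for uid in sorted(chbs - dsdt):
        problems.append(f"CHBS UID {uid:#x} has no DSDT host bridge")
    return problems


# UIDs 7 and 6 are taken from the CEDT examples; the DSDT list is made up
# to show a mismatch.
for problem in check_host_bridge_uids([7, 6], [7, 6], [7, 5]):
    print(problem)
```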
+62
Documentation/driver-api/cxl/platform/acpi/cedt.rst
···
+ .. SPDX-License-Identifier: GPL-2.0
+
+ ================================
+ CEDT - CXL Early Discovery Table
+ ================================
+
+ The CXL Early Discovery Table is generated by BIOS to describe the CXL memory
+ regions configured at boot by the BIOS.
+
+ CHBS
+ ====
+ The CXL Host Bridge Structure describes CXL host bridges. Other than describing
+ device register information, it reports the specific host bridge UID for this
+ host bridge. These host bridge IDs will be referenced in other tables.
+
+ Example ::
+
+     Subtable Type : 00 [CXL Host Bridge Structure]
+     Reserved : 00
+     Length : 0020
+     Associated host bridge : 00000007 <- Host bridge _UID
+     Specification version : 00000001
+     Reserved : 00000000
+     Register base : 0000010370400000
+     Register length : 0000000000010000
+
+ CFMWS
+ =====
+ The CXL Fixed Memory Window structure describes a memory region associated
+ with one or more CXL host bridges (as described by the CHBS). It additionally
+ describes any inter-host-bridge interleave configuration that may have been
+ programmed by BIOS.
+
+ Example ::
+
+     Subtable Type : 01 [CXL Fixed Memory Window Structure]
+     Reserved : 00
+     Length : 002C
+     Reserved : 00000000
+     Window base address : 000000C050000000 <- Memory Region
+     Window size : 0000003CA0000000
+     Interleave Members (2^n) : 01 <- Interleave configuration
+     Interleave Arithmetic : 00
+     Reserved : 0000
+     Granularity : 00000000
+     Restrictions : 0006
+     QtgId : 0001
+     First Target : 00000007 <- Host Bridge _UID
+     Next Target : 00000006 <- Host Bridge _UID
+
+ The restriction field dictates what this SPA range may be used for (memory type,
+ volatile vs persistent, etc). One or more bits may be set. ::
+
+     Bit[0]: CXL Type 2 Memory
+     Bit[1]: CXL Type 3 Memory
+     Bit[2]: Volatile Memory
+     Bit[3]: Persistent Memory
+     Bit[4]: Fixed Config (HPA cannot be re-used)
+
+ INTRA-host-bridge interleave (multiple devices on one host bridge) is NOT
+ reported in this structure, and is solely defined via CXL device decoder
+ programming (host bridge and endpoint decoders).
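Editor's sketch (not part of the patch): the CFMWS restriction bits and the "Interleave Members (2^n)" field from the example above can be decoded as follows. The class and function names are hypothetical. The interleave helper only handles the power-of-two encoding implied by the "(2^n)" label in the dump; any other encodings are out of scope here.

```python
from enum import IntFlag


class CfmwsRestriction(IntFlag):
    """Restriction bits as listed in the CEDT document above."""
    TYPE2 = 1 << 0         # CXL Type 2 Memory
    TYPE3 = 1 << 1         # CXL Type 3 Memory
    VOLATILE = 1 << 2      # Volatile Memory
    PERSISTENT = 1 << 3    # Persistent Memory
    FIXED_CONFIG = 1 << 4  # Fixed Config (HPA cannot be re-used)


def decode_restrictions(value):
    """Return the names of the restriction bits set in value."""
    return [flag.name for flag in CfmwsRestriction if flag & value]


def interleave_ways(encoded):
    """Decode the power-of-two "Interleave Members (2^n)" field."""
    return 1 << encoded


# Values from the CFMWS example: Restrictions 0006, Interleave Members 01.
print(decode_restrictions(0x0006), interleave_ways(0x01))
```

For the example entry this reports a Type 3, volatile window interleaved across two host bridges, which matches the two targets (UIDs 7 and 6) in the dump.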
+28
Documentation/driver-api/cxl/platform/acpi/dsdt.rst
···
+ .. SPDX-License-Identifier: GPL-2.0
+
+ ==============================================
+ DSDT - Differentiated System Description Table
+ ==============================================
+
+ This table describes what peripherals a machine has.
+
+ This table's UIDs for CXL devices, specifically host bridges, must be
+ consistent with the contents of the CEDT, otherwise the CXL driver will
+ fail to probe correctly.
+
+ Example Compute Express Link Host Bridge ::
+
+     Scope (_SB)
+     {
+         Device (S0D0)
+         {
+             Name (_HID, "ACPI0016" /* Compute Express Link Host Bridge */) // _HID: Hardware ID
+             Name (_CID, Package (0x02) // _CID: Compatible ID
+             {
+                 EisaId ("PNP0A08") /* PCI Express Bus */,
+                 EisaId ("PNP0A03") /* PCI Bus */
+             })
+             ...
+             Name (_UID, 0x05) // _UID: Unique ID
+             ...
+     }
+32
Documentation/driver-api/cxl/platform/acpi/hmat.rst
···
+ .. SPDX-License-Identifier: GPL-2.0
+
+ ===========================================
+ HMAT - Heterogeneous Memory Attribute Table
+ ===========================================
+
+ The Heterogeneous Memory Attribute Table contains information such as cache
+ attributes and bandwidth and latency details for memory proximity domains.
+ For the purpose of this document, we will only discuss the SLLBI entry.
+
+ SLLBI
+ =====
+ The System Locality Latency and Bandwidth Information records latency and
+ bandwidth information for proximity domains.
+
+ This table is used by Linux to configure interleave weights and memory tiers.
+
+ Example (Heavily truncated for brevity) ::
+
+     Structure Type : 0001 [SLLBI]
+     Data Type : 00 <- Latency
+     Target Proximity Domain List : 00000000
+     Target Proximity Domain List : 00000001
+     Entry : 0080 <- DRAM LTC
+     Entry : 0100 <- CXL LTC
+
+     Structure Type : 0001 [SLLBI]
+     Data Type : 03 <- Bandwidth
+     Target Proximity Domain List : 00000000
+     Target Proximity Domain List : 00000001
+     Entry : 1200 <- DRAM BW
+     Entry : 0200 <- CXL BW
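Editor's sketch (not part of the patch): the document says the SLLBI bandwidth entries feed interleave weights. One simple way to picture that is to reduce the per-node bandwidths to their smallest integer ratio. This is an illustration only, under the assumption that weights are proportional to bandwidth; it is not claimed to be the kernel's actual algorithm, and the function name is hypothetical.

```python
from functools import reduce
from math import gcd


def bandwidth_weights(bandwidths):
    """Map {node: bandwidth} to {node: weight} in the smallest integer ratio.

    Illustrative only: assumes weights proportional to bandwidth.
    """
    if not bandwidths or any(bw <= 0 for bw in bandwidths.values()):
        raise ValueError("need at least one node and positive bandwidths")
    divisor = reduce(gcd, bandwidths.values())
    return {node: bw // divisor for node, bw in bandwidths.items()}


# Bandwidth entries from the SLLBI example: node 0 is DRAM, node 1 is CXL.
print(bandwidth_weights({0: 0x1200, 1: 0x0200}))
```

With the example values the DRAM node ends up with nine times the weight of the CXL node.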
+21
Documentation/driver-api/cxl/platform/acpi/slit.rst
···
+ .. SPDX-License-Identifier: GPL-2.0
+
+ ========================================
+ SLIT - System Locality Information Table
+ ========================================
+
+ The system locality information table provides "abstract distances" between
+ accessor and memory nodes. Nodes without initiators (CPUs) are an infinite (FF)
+ distance away from all other nodes.
+
+ The abstract distance described in this table does not describe any real
+ latency or bandwidth information.
+
+ Example ::
+
+     Signature : "SLIT" [System Locality Information Table]
+     Localities : 0000000000000004
+     Locality 0 : 10 20 20 30
+     Locality 1 : 20 10 30 20
+     Locality 2 : FF FF 0A FF
+     Locality 3 : FF FF FF 0A
+71
Documentation/driver-api/cxl/platform/acpi/srat.rst
···
+ .. SPDX-License-Identifier: GPL-2.0
+
+ =====================================
+ SRAT - Static Resource Affinity Table
+ =====================================
+
+ The System/Static Resource Affinity Table describes resource (CPU, Memory)
+ affinity to "Proximity Domains". This table is technically optional, but for
+ performance information (see "HMAT") to be enumerated by Linux it must be
+ present.
+
+ There is a careful dance between the CEDT and SRAT tables and how NUMA nodes are
+ created. If things don't look quite the way you expect - check the SRAT Memory
+ Affinity entries and CEDT CFMWS to determine what your platform actually
+ supports in terms of flexible topologies.
+
+ The SRAT may statically assign portions of a CFMWS SPA range to specific
+ proximity domains. See Linux NUMA creation for more information about how
+ this presents in the NUMA topology.
+
+ Proximity Domain
+ ================
+ A proximity domain is ROUGHLY equivalent to "NUMA Node" - though a 1-to-1
+ mapping is not guaranteed. There are scenarios where "Proximity Domain 4" may
+ map to "NUMA Node 3", for example. (See "NUMA Node Creation")
+
+ Memory Affinity
+ ===============
+ Generally speaking, if a host does any amount of CXL fabric (decoder)
+ programming in BIOS - an SRAT entry for that memory needs to be present.
+
+ Example ::
+
+     Subtable Type : 01 [Memory Affinity]
+     Length : 28
+     Proximity Domain : 00000001 <- NUMA Node 1
+     Reserved1 : 0000
+     Base Address : 000000C050000000 <- Physical Memory Region
+     Address Length : 0000003CA0000000
+     Reserved2 : 00000000
+     Flags (decoded below) : 0000000B
+         Enabled : 1
+         Hot Pluggable : 1
+         Non-Volatile : 0
+
+
+ Generic Port Affinity
+ =====================
+ The Generic Port Affinity subtable provides an association between a proximity
+ domain and a device handle representing a Generic Port such as a CXL host
+ bridge. With the association, latency and bandwidth numbers can be retrieved
+ from the SRAT for the path between CPU(s) (initiator) and the Generic Port.
+ This is used to construct performance coordinates for hotplugged CXL DEVICES,
+ which cannot be enumerated at boot by platform firmware.
+
+ Example ::
+
+     Subtable Type : 06 [Generic Port Affinity]
+     Length : 20 <- 32d, length of table
+     Reserved : 00
+     Device Handle Type : 00 <- 0 - ACPI, 1 - PCI
+     Proximity Domain : 00000001
+     Device Handle : ACPI0016:01
+     Flags : 00000001 <- Bit 0 (Enabled)
+     Reserved : 00000000
+
+ The Proximity Domain is matched up to the :doc:`HMAT <hmat>` SLLBI Target
+ Proximity Domain List for the related latency or bandwidth numbers. Those
+ performance numbers are tied to a CXL host bridge via the Device Handle.
+ The driver uses the association to retrieve the Generic Port performance
+ numbers for the whole CXL path access coordinates calculation.
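Editor's sketch (not part of the patch): the "Memory Affinity" section says BIOS-programmed CXL memory needs a matching SRAT entry. A quick way to spot missing coverage is to compare a CFMWS window with the SRAT Memory Affinity ranges. The helper below is hypothetical and works on (base, size) tuples copied by hand from the disassembled tables.

```python
def uncovered_ranges(window, affinity_entries):
    """Return the (base, size) pieces of window not covered by any SRAT entry.

    window:           (base, size) of a CEDT CFMWS
    affinity_entries: iterable of (base, size) SRAT Memory Affinity ranges
    """
    base, size = window
    cursor, end = base, base + size
    gaps = []
    for entry_base, entry_size in sorted(affinity_entries):
        entry_end = entry_base + entry_size
        if entry_end <= cursor or entry_base >= end:
            continue  # entry does not overlap the remaining window
        if entry_base > cursor:
            gaps.append((cursor, entry_base - cursor))
        cursor = max(cursor, entry_end)
    if cursor < end:
        gaps.append((cursor, end - cursor))
    return gaps


# Window and SRAT range taken from the CEDT and SRAT examples.
cfmws = (0xC050000000, 0x3CA0000000)
print(uncovered_ranges(cfmws, [(0xC050000000, 0x3CA0000000)]))
```

An empty result means every byte of the window has a proximity domain assigned. Any returned gap is a candidate for the "SRAT is missing regions described in CEDT CFMWS" failure mode.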
+262
Documentation/driver-api/cxl/platform/bios-and-efi.rst
···
+ .. SPDX-License-Identifier: GPL-2.0
+
+ ======================
+ BIOS/EFI Configuration
+ ======================
+
+ BIOS and EFI are largely responsible for configuring static information about
+ devices (or potential future devices) such that Linux can build the appropriate
+ logical representations of these devices.
+
+ At a high level, this is what occurs during this phase of configuration.
+
+ * The bootloader starts the BIOS/EFI.
+
+ * BIOS/EFI do early device probe to determine static configuration
+
+ * BIOS/EFI creates ACPI Tables that describe static config for the OS
+
+ * BIOS/EFI create the system memory map (EFI Memory Map, E820, etc)
+
+ * BIOS/EFI calls :code:`start_kernel` and begins the Linux Early Boot process.
+
+ Much of what this section is concerned with is ACPI Table production and
+ static memory map configuration. More detail on these tables can be found
+ at :doc:`ACPI Tables <acpi>`.
+
+ .. note::
+    Platform Vendors should read carefully, as this section has recommendations
+    on physical memory region size and alignment, memory holes, HDM interleave,
+    and what Linux expects of HDM decoders trying to work with these features.
+
+ UEFI Settings
+ =============
+ If your platform supports it, the :code:`uefisettings` command can be used to
+ read/write EFI settings. Changes will be reflected on the next reboot. Kexec
+ is not a sufficient reboot.
+
+ One notable configuration here is the EFI_MEMORY_SP (Specific Purpose) bit.
+ When this is enabled, this bit tells Linux to defer management of a memory
+ region to a driver (in this case, the CXL driver). Otherwise, the memory is
+ treated as "normal memory", and is exposed to the page allocator during
+ :code:`__init`.
+
+ uefisettings examples
+ ---------------------
+
+ :code:`uefisettings identify` ::
+
+     uefisettings identify
+
+     bios_vendor: xxx
+     bios_version: xxx
+     bios_release: xxx
+     bios_date: xxx
+     product_name: xxx
+     product_family: xxx
+     product_version: xxx
+
+ On some AMD platforms, the :code:`EFI_MEMORY_SP` bit is set via the :code:`CXL
+ Memory Attribute` field. This may be called something else on your platform.
+
+ :code:`uefisettings get "CXL Memory Attribute"` ::
+
+     selector: xxx
+     ...
+     question: Question {
+         name: "CXL Memory Attribute",
+         answer: "Enabled",
+         ...
+     }
+
+ Physical Memory Map
+ ===================
+
+ Physical Address Region Alignment
+ ---------------------------------
+
+ As of Linux v6.14, the hotplug memory system requires memory regions to be
+ uniform in size and alignment. While the CXL specification allows for memory
+ regions as small as 256MB, the supported memory block size and alignment for
+ hotplugged memory is architecture-defined.
+
+ A Linux memory block may be as small as 128MB and increase in powers of two.
+
+ * On ARM, the default block size and alignment is either 128MB or 256MB.
+
+ * On x86, the default block size is 256MB, and increases to 2GB as the
+   capacity of the system increases up to 64GB.
+
+ For best support across versions, platform vendors should place CXL memory at
+ a 2GB aligned base address, and regions should be 2GB aligned. This also helps
+ prevent the creation of thousands of memory devices (one per block).
+
+ Memory Holes
+ ------------
+
+ Holes in the memory map are tricky. Consider a 4GB device located at base
+ address 0x100000000, but with the following memory map ::
+
+     ---------------------
+     |    0x100000000    |
+     |        CXL        |
+     |    0x1BFFFFFFF    |
+     ---------------------
+     |    0x1C0000000    |
+     |    MEMORY HOLE    |
+     |    0x1FFFFFFFF    |
+     ---------------------
+     |    0x200000000    |
+     |     CXL CONT.     |
+     |    0x23FFFFFFF    |
+     ---------------------
+
+ There are two issues to consider:
+
+ * decoder programming, and
+ * memory block alignment.
+
+ If your architecture requires 2GB uniform size and aligned memory blocks, the
+ only capacity Linux is capable of mapping (as of v6.14) would be the capacity
+ from `0x100000000-0x180000000`. The remaining capacity will be stranded, as
+ it is not of 2GB-aligned length.
+
+ Assuming your architecture and memory configuration allows 1GB memory blocks,
+ this memory map is supported and this should be presented as multiple CFMWS
+ in the CEDT that describe each side of the memory hole separately - along with
+ matching decoders.
+
+ Multiple decoders can (and should) be used to manage such a memory hole (see
+ below), but each chunk of a memory hole should be aligned to a reasonable block
+ size (larger alignment is always better). If you intend to have memory holes
+ in the memory map, expect to use one decoder per contiguous chunk of host
+ physical memory.
+
+ As of v6.14, Linux does not provide support for memory hotplug of multiple
+ physical memory regions separated by a memory hole described by a single
+ HDM decoder.
+
+
+ Decoder Programming
+ ===================
+ If BIOS/EFI intends to program the decoders to be statically configured,
+ there are a few things to consider to avoid major pitfalls that will
+ prevent Linux compatibility. Some of these recommendations are not
+ required "per the specification", but Linux makes no guarantees of support
+ otherwise.
+
+
+ Translation Point
+ -----------------
+ Per the specification, the only decoders which **TRANSLATE** Host Physical
+ Address (HPA) to Device Physical Address (DPA) are the **Endpoint Decoders**.
+ All other decoders in the fabric are intended to route accesses without
+ translating the addresses.
+
+ This is heavily implied by the specification, see: ::
+
+     CXL Specification 3.1
+     8.2.4.20: CXL HDM Decoder Capability Structure
+     - Implementation Note: CXL Host Bridge and Upstream Switch Port Decoder Flow
+     - Implementation Note: Device Decoder Logic
+
+ Given this, Linux makes a strong assumption that decoders between CPU and
+ endpoint will all be programmed with address ranges that are subsets of
+ their parent decoder.
+
+ Due to some ambiguity in how Architecture, ACPI, PCI, and CXL specifications
+ "hand off" responsibility between domains, some early adopting platforms
+ attempted to do translation at the originating memory controller or host
+ bridge. This configuration requires a platform specific extension to the
+ driver and is not officially endorsed - despite being supported.
+
+ It is *highly recommended* **NOT** to do this; otherwise, you are on your own
+ to implement driver support for your platform.
+
+ Interleave and Configuration Flexibility
+ ----------------------------------------
+ If providing cross-host-bridge interleave, a CFMWS entry in the :doc:`CEDT
+ <acpi/cedt>` must be presented with target host-bridges for the interleaved
+ device sets (there may be multiple behind each host bridge).
+
+ If providing intra-host-bridge interleaving, only 1 CFMWS entry in the CEDT is
+ required for that host bridge - if it covers the entire capacity of the devices
+ behind the host bridge.
+
+ If intending to provide users flexibility in programming decoders beyond the
+ root, you may want to provide multiple CFMWS entries in the CEDT intended for
+ different purposes. For example, you may want to consider adding:
+
+ 1) A CFMWS entry to cover all interleavable host bridges.
+ 2) A CFMWS entry to cover all devices on a single host bridge.
+ 3) A CFMWS entry to cover each device.
+
+ A platform may choose to add all of these, or change the mode based on a BIOS
+ setting. For each CFMWS entry, Linux expects descriptions of the described
+ memory regions in the :doc:`SRAT <acpi/srat>` to determine the number of
+ NUMA nodes it should reserve during early boot / init.
+
+ As of v6.14, Linux will create a NUMA node for each CEDT CFMWS entry, even if
+ a matching SRAT entry does not exist; however, this is not guaranteed in the
+ future and such a configuration should be avoided.
+
+ Memory Holes
+ ------------
+ If your platform includes memory holes interspersed between your CXL memory, it
+ is recommended to utilize multiple decoders to cover these regions of memory,
+ rather than try to program the decoders to accept the entire range and expect
+ Linux to manage the overlap.
+
+ For example, consider the Memory Hole described above ::
+
+     ---------------------
+     |    0x100000000    |
+     |        CXL        |
+     |    0x1BFFFFFFF    |
+     ---------------------
+     |    0x1C0000000    |
+     |    MEMORY HOLE    |
+     |    0x1FFFFFFFF    |
+     ---------------------
+     |    0x200000000    |
+     |     CXL CONT.     |
+     |    0x23FFFFFFF    |
+     ---------------------
+
+ Assuming this is provided by a single device attached directly to a host bridge,
+ Linux would expect the following decoder programming ::
+
+     -----------------------       -----------------------
+     | root-decoder-0      |       | root-decoder-1      |
+     |   base: 0x100000000 |       |   base: 0x200000000 |
+     |   size:  0xC0000000 |       |   size:  0x40000000 |
+     -----------------------       -----------------------
+                |                             |
+     -----------------------       -----------------------
+     | HB-decoder-0        |       | HB-decoder-1        |
+     |   base: 0x100000000 |       |   base: 0x200000000 |
+     |   size:  0xC0000000 |       |   size:  0x40000000 |
+     -----------------------       -----------------------
+                |                             |
+     -----------------------       -----------------------
+     | ep-decoder-0        |       | ep-decoder-1        |
+     |   base: 0x100000000 |       |   base: 0x200000000 |
+     |   size:  0xC0000000 |       |   size:  0x40000000 |
+     -----------------------       -----------------------
+
+ With a CEDT configuration with two CFMWS describing the above root decoders.
+
+ Linux makes no guarantee of support for strange memory hole situations.
+
+ Multi-Media Devices
+ -------------------
+ The CFMWS field of the CEDT has special restriction bits which describe whether
+ the described memory region allows volatile or persistent memory (or both). If
+ the platform intends to support either:
+
+ 1) A device with multiple medias, or
+ 2) Using a persistent memory device as normal memory
+
+ A platform may wish to create multiple CEDT CFMWS entries to describe the same
+ memory, with the intent of allowing the end user flexibility in how that memory
+ is configured. Linux does not presently have strong requirements in this area.
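Editor's sketch (not part of the patch): the stranded-capacity arithmetic in the "Memory Holes" discussion can be reproduced by rounding each contiguous chunk inward to the memory block size. The function name is hypothetical, and the block sizes are simply the 2GB and 1GB cases the document discusses.

```python
def usable_capacity(base, size, block_size):
    """Return (start, length) of the block-aligned part of [base, base + size).

    A length of 0 means the whole chunk is stranded for that block size.
    """
    start = -(-base // block_size) * block_size        # round base up
    end = ((base + size) // block_size) * block_size   # round end down
    return (start, max(end - start, 0))


GIB = 1 << 30
chunks = [(0x100000000, 0xC0000000),   # CXL below the hole (3 GiB)
          (0x200000000, 0x40000000)]   # CXL above the hole (1 GiB)

for block in (2 * GIB, 1 * GIB):
    for base, size in chunks:
        start, length = usable_capacity(base, size, block)
        print(f"block={block // GIB}GiB chunk={base:#x} -> {start:#x}+{length:#x}")
```

With 2GB blocks only the range 0x100000000-0x180000000 survives, which matches the document. With 1GB blocks both chunks are fully usable, provided each side of the hole is described by its own CFMWS and decoders.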
+118
Documentation/driver-api/cxl/platform/cdat.rst
···
+.. SPDX-License-Identifier: GPL-2.0
+
+======================================
+Coherent Device Attribute Table (CDAT)
+======================================
+
+The CDAT provides functional and performance attributes of devices such
+as CXL accelerators, switches, or endpoints. The table formatting is
+similar to ACPI tables. CDAT data may be parsed by BIOS at boot or may
+be enumerated at runtime (after device hotplug, for example).
+
+Terminology:
+DPA - Device Physical Address, used by the CXL device to denote the address
+it supports for that device.
+
+DSMADHandle - A device unique handle that is associated with a DPA range
+defined by the DSMAS table.
+
+
+===============================================
+Device Scoped Memory Affinity Structure (DSMAS)
+===============================================
+
+The DSMAS contains information such as DSMADHandle, the DPA Base, and DPA
+Length.
+
+This table is used by Linux in conjunction with the Device Scoped Latency and
+Bandwidth Information Structure (DSLBIS) to determine the performance
+attributes of the CXL device itself.
+
+Example ::
+
+   Structure Type : 00 [DSMAS]
+   Reserved : 00
+   Length : 0018 <- 24d, size of structure
+   DSMADHandle : 01
+   Flags : 00
+   Reserved : 0000
+   DPA Base : 0000000040000000 <- 1GiB base
+   DPA Length : 0000000080000000 <- 2GiB size
+
+
+==================================================================
+Device Scoped Latency and Bandwidth Information Structure (DSLBIS)
+==================================================================
+
+This table is used by Linux in conjunction with DSMAS to determine the
+performance attributes of a CXL device. The DSLBIS contains latency
+and bandwidth information based on DSMADHandle matching.
+
+Example ::
+
+   Structure Type : 01 [DSLBIS]
+   Reserved : 00
+   Length : 18 <- 24d, size of structure
+   Handle : 0001 <- DSMAS handle
+   Flags : 00 <- Matches flag field for HMAT SLLBIS
+   Data Type : 00 <- Latency
+   Entry Base Unit : 0000000000001000 <- Entry Base Unit field in HMAT SLLBIS
+   Entry : 010000000000 <- First byte used here, CXL LTC
+   Reserved : 0000
+
+   Structure Type : 01 [DSLBIS]
+   Reserved : 00
+   Length : 18 <- 24d, size of structure
+   Handle : 0001 <- DSMAS handle
+   Flags : 00 <- Matches flag field for HMAT SLLBIS
+   Data Type : 03 <- Bandwidth
+   Entry Base Unit : 0000000000001000 <- Entry Base Unit field in HMAT SLLBIS
+   Entry : 020000000000 <- First byte used here, CXL BW
+   Reserved : 0000
+
+
+==================================================================
+Switch Scoped Latency and Bandwidth Information Structure (SSLBIS)
+==================================================================
+
+The SSLBIS contains information about the latency and bandwidth of a switch.
+
+The table is used by Linux to compute the performance coordinates of a CXL path
+from the device to the root port where a switch is part of the path.
+
+Example ::
+
+   Structure Type : 05 [SSLBIS]
+   Reserved : 00
+   Length : 20 <- 32d, length of record, including SSLB entries
+   Data Type : 00 <- Latency
+   Reserved : 000000
+   Entry Base Unit : 00000000000000001000 <- Matches Entry Base Unit in HMAT SLLBIS
+
+   <- SSLB Entry 0
+   Port X ID : 0100 <- First port, 0100h represents an upstream port
+   Port Y ID : 0000 <- Second port, downstream port 0
+   Latency : 0100 <- Port latency
+   Reserved : 0000
+   <- SSLB Entry 1
+   Port X ID : 0100
+   Port Y ID : 0001
+   Latency : 0100
+   Reserved : 0000
+
+
+   Structure Type : 05 [SSLBIS]
+   Reserved : 00
+   Length : 18 <- 24d, length of record, including SSLB entry
+   Data Type : 03 <- Bandwidth
+   Reserved : 000000
+   Entry Base Unit : 00000000000000001000 <- Matches Entry Base Unit in HMAT SLLBIS
+
+   <- SSLB Entry 0
+   Port X ID : 0100 <- First port, 0100h represents an upstream port
+   Port Y ID : FFFF <- Second port, FFFFh indicates any port
+   Bandwidth : 1200 <- Port bandwidth
+   Reserved : 0000
+
+The CXL driver uses a combination of CDAT, HMAT, SRAT, and other data to
+generate "whole path performance" data for a CXL device.
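The DSMADHandle matching described in the CDAT document can be illustrated with a small, self-contained sketch. It pairs a DSMAS range with the DSLBIS records that carry the same handle and scales the matching entry by its Entry Base Unit. The struct layouts and the `dsmas_perf()` helper are hypothetical simplifications for illustration, not the kernel's CDAT definitions, and the sketch assumes the entry has already been decoded to a 16-bit integer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified views of CDAT records (not the kernel structs). */
struct dsmas {
	uint8_t handle;		/* DSMADHandle */
	uint64_t dpa_base;
	uint64_t dpa_len;
};

struct dslbis {
	uint8_t handle;		/* DSMAS handle this record refers to */
	uint8_t data_type;	/* 0x00 latency, 0x03 bandwidth in the examples */
	uint64_t base_unit;	/* Entry Base Unit */
	uint16_t entry;		/* first entry, already decoded */
};

enum { DATA_TYPE_LATENCY = 0x00, DATA_TYPE_BANDWIDTH = 0x03 };

/*
 * Return entry * base_unit for the first DSLBIS record whose handle and
 * data type match the given DSMAS range, or 0 when nothing matches.
 */
static uint64_t dsmas_perf(const struct dsmas *range,
			   const struct dslbis *recs, size_t nr_recs,
			   uint8_t data_type)
{
	for (size_t i = 0; i < nr_recs; i++) {
		if (recs[i].handle == range->handle &&
		    recs[i].data_type == data_type)
			return (uint64_t)recs[i].entry * recs[i].base_unit;
	}
	return 0;
}
```

With the example values from the tables (handle 01, base unit 1000h, entries 1 and 2), the latency lookup yields 4096 and the bandwidth lookup yields 8192, each in whatever unit the base unit is defined in.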
+13
Documentation/driver-api/cxl/platform/example-configs.rst
···
+.. SPDX-License-Identifier: GPL-2.0
+
+Example Platform Configurations
+###############################
+
+.. toctree::
+   :maxdepth: 1
+   :caption: Contents
+
+   example-configurations/one-dev-per-hb.rst
+   example-configurations/multi-dev-per-hb.rst
+   example-configurations/hb-interleave.rst
+   example-configurations/flexible.rst
+296
Documentation/driver-api/cxl/platform/example-configurations/flexible.rst
···
+.. SPDX-License-Identifier: GPL-2.0
+
+=====================
+Flexible Presentation
+=====================
+This system has a single socket with two CXL host bridges. Each host bridge
+has two CXL memory expanders with 4GB of memory each (16GB total).
+
+On this system, the platform designer wanted to provide the user flexibility
+to configure the memory devices in various interleave or NUMA node
+configurations. So they provided every combination.
+
+Things to note:
+
+* Cross-Bridge interleave is described in one CFMWS that covers all capacity.
+* One CFMWS is also described per-host bridge.
+* One CFMWS is also described per-device.
+* This SRAT describes one node for each of the above CFMWS.
+* The HMAT describes performance for each node in the SRAT.
+
+:doc:`CEDT <../acpi/cedt>`::
+
+   Subtable Type : 00 [CXL Host Bridge Structure]
+   Reserved : 00
+   Length : 0020
+   Associated host bridge : 00000007
+   Specification version : 00000001
+   Reserved : 00000000
+   Register base : 0000010370400000
+   Register length : 0000000000010000
+
+   Subtable Type : 00 [CXL Host Bridge Structure]
+   Reserved : 00
+   Length : 0020
+   Associated host bridge : 00000006
+   Specification version : 00000001
+   Reserved : 00000000
+   Register base : 0000010380800000
+   Register length : 0000000000010000
+
+   Subtable Type : 01 [CXL Fixed Memory Window Structure]
+   Reserved : 00
+   Length : 002C
+   Reserved : 00000000
+   Window base address : 0000001000000000
+   Window size : 0000000400000000
+   Interleave Members (2^n) : 01
+   Interleave Arithmetic : 00
+   Reserved : 0000
+   Granularity : 00000000
+   Restrictions : 0006
+   QtgId : 0001
+   First Target : 00000007
+   Second Target : 00000006
+
+   Subtable Type : 01 [CXL Fixed Memory Window Structure]
+   Reserved : 00
+   Length : 002C
+   Reserved : 00000000
+   Window base address : 0000002000000000
+   Window size : 0000000200000000
+   Interleave Members (2^n) : 00
+   Interleave Arithmetic : 00
+   Reserved : 0000
+   Granularity : 00000000
+   Restrictions : 0006
+   QtgId : 0001
+   First Target : 00000007
+
+   Subtable Type : 01 [CXL Fixed Memory Window Structure]
+   Reserved : 00
+   Length : 002C
+   Reserved : 00000000
+   Window base address : 0000002200000000
+   Window size : 0000000200000000
+   Interleave Members (2^n) : 00
+   Interleave Arithmetic : 00
+   Reserved : 0000
+   Granularity : 00000000
+   Restrictions : 0006
+   QtgId : 0001
+   First Target : 00000006
+
+   Subtable Type : 01 [CXL Fixed Memory Window Structure]
+   Reserved : 00
+   Length : 002C
+   Reserved : 00000000
+   Window base address : 0000003000000000
+   Window size : 0000000100000000
+   Interleave Members (2^n) : 00
+   Interleave Arithmetic : 00
+   Reserved : 0000
+   Granularity : 00000000
+   Restrictions : 0006
+   QtgId : 0001
+   First Target : 00000007
+
+   Subtable Type : 01 [CXL Fixed Memory Window Structure]
+   Reserved : 00
+   Length : 002C
+   Reserved : 00000000
+   Window base address : 0000003100000000
+   Window size : 0000000100000000
+   Interleave Members (2^n) : 00
+   Interleave Arithmetic : 00
+   Reserved : 0000
+   Granularity : 00000000
+   Restrictions : 0006
+   QtgId : 0001
+   First Target : 00000007
+
+   Subtable Type : 01 [CXL Fixed Memory Window Structure]
+   Reserved : 00
+   Length : 002C
+   Reserved : 00000000
+   Window base address : 0000003200000000
+   Window size : 0000000100000000
+   Interleave Members (2^n) : 00
+   Interleave Arithmetic : 00
+   Reserved : 0000
+   Granularity : 00000000
+   Restrictions : 0006
+   QtgId : 0001
+   First Target : 00000006
+
+   Subtable Type : 01 [CXL Fixed Memory Window Structure]
+   Reserved : 00
+   Length : 002C
+   Reserved : 00000000
+   Window base address : 0000003300000000
+   Window size : 0000000100000000
+   Interleave Members (2^n) : 00
+   Interleave Arithmetic : 00
+   Reserved : 0000
+   Granularity : 00000000
+   Restrictions : 0006
+   QtgId : 0001
+   First Target : 00000006
+
+:doc:`SRAT <../acpi/srat>`::
+
+   Subtable Type : 01 [Memory Affinity]
+   Length : 28
+   Proximity Domain : 00000001
+   Reserved1 : 0000
+   Base Address : 0000001000000000
+   Address Length : 0000000400000000
+   Reserved2 : 00000000
+   Flags (decoded below) : 0000000B
+   Enabled : 1
+   Hot Pluggable : 1
+   Non-Volatile : 0
+
+   Subtable Type : 01 [Memory Affinity]
+   Length : 28
+   Proximity Domain : 00000002
+   Reserved1 : 0000
+   Base Address : 0000002000000000
+   Address Length : 0000000200000000
+   Reserved2 : 00000000
+   Flags (decoded below) : 0000000B
+   Enabled : 1
+   Hot Pluggable : 1
+   Non-Volatile : 0
+
+   Subtable Type : 01 [Memory Affinity]
+   Length : 28
+   Proximity Domain : 00000003
+   Reserved1 : 0000
+   Base Address : 0000002200000000
+   Address Length : 0000000200000000
+   Reserved2 : 00000000
+   Flags (decoded below) : 0000000B
+   Enabled : 1
+   Hot Pluggable : 1
+   Non-Volatile : 0
+
+   Subtable Type : 01 [Memory Affinity]
+   Length : 28
+   Proximity Domain : 00000004
+   Reserved1 : 0000
+   Base Address : 0000003000000000
+   Address Length : 0000000100000000
+   Reserved2 : 00000000
+   Flags (decoded below) : 0000000B
+   Enabled : 1
+   Hot Pluggable : 1
+   Non-Volatile : 0
+
+   Subtable Type : 01 [Memory Affinity]
+   Length : 28
+   Proximity Domain : 00000005
+   Reserved1 : 0000
+   Base Address : 0000003100000000
+   Address Length : 0000000100000000
+   Reserved2 : 00000000
+   Flags (decoded below) : 0000000B
+   Enabled : 1
+   Hot Pluggable : 1
+   Non-Volatile : 0
+
+   Subtable Type : 01 [Memory Affinity]
+   Length : 28
+   Proximity Domain : 00000006
+   Reserved1 : 0000
+   Base Address : 0000003200000000
+   Address Length : 0000000100000000
+   Reserved2 : 00000000
+   Flags (decoded below) : 0000000B
+   Enabled : 1
+   Hot Pluggable : 1
+   Non-Volatile : 0
+
+   Subtable Type : 01 [Memory Affinity]
+   Length : 28
+   Proximity Domain : 00000007
+   Reserved1 : 0000
+   Base Address : 0000003300000000
+   Address Length : 0000000100000000
+   Reserved2 : 00000000
+   Flags (decoded below) : 0000000B
+   Enabled : 1
+   Hot Pluggable : 1
+   Non-Volatile : 0
+
+:doc:`HMAT <../acpi/hmat>`::
+
+   Structure Type : 0001 [SLLBI]
+   Data Type : 00 [Latency]
+   Target Proximity Domain List : 00000000
+   Target Proximity Domain List : 00000001
+   Target Proximity Domain List : 00000002
+   Target Proximity Domain List : 00000003
+   Target Proximity Domain List : 00000004
+   Target Proximity Domain List : 00000005
+   Target Proximity Domain List : 00000006
+   Target Proximity Domain List : 00000007
+   Entry : 0080
+   Entry : 0100
+   Entry : 0100
+   Entry : 0100
+   Entry : 0100
+   Entry : 0100
+   Entry : 0100
+   Entry : 0100
+
+   Structure Type : 0001 [SLLBI]
+   Data Type : 03 [Bandwidth]
+   Target Proximity Domain List : 00000000
+   Target Proximity Domain List : 00000001
+   Target Proximity Domain List : 00000002
+   Target Proximity Domain List : 00000003
+   Target Proximity Domain List : 00000004
+   Target Proximity Domain List : 00000005
+   Target Proximity Domain List : 00000006
+   Target Proximity Domain List : 00000007
+   Entry : 1200
+   Entry : 0400
+   Entry : 0200
+   Entry : 0200
+   Entry : 0100
+   Entry : 0100
+   Entry : 0100
+   Entry : 0100
+
+:doc:`SLIT <../acpi/slit>`::
+
+   Signature : "SLIT" [System Locality Information Table]
+   Localities : 0000000000000003
+   Locality 0 : 10 20 20 20 20 20 20 20
+   Locality 1 : FF 0A FF FF FF FF FF FF
+   Locality 2 : FF FF 0A FF FF FF FF FF
+   Locality 3 : FF FF FF 0A FF FF FF FF
+   Locality 4 : FF FF FF FF 0A FF FF FF
+   Locality 5 : FF FF FF FF FF 0A FF FF
+   Locality 6 : FF FF FF FF FF FF 0A FF
+   Locality 7 : FF FF FF FF FF FF FF 0A
+
+:doc:`DSDT <../acpi/dsdt>`::
+
+   Scope (_SB)
+   {
+       Device (S0D0)
+       {
+           Name (_HID, "ACPI0016" /* Compute Express Link Host Bridge */) // _HID: Hardware ID
+           ...
+           Name (_UID, 0x07) // _UID: Unique ID
+       }
+       ...
+       Device (S0D5)
+       {
+           Name (_HID, "ACPI0016" /* Compute Express Link Host Bridge */) // _HID: Hardware ID
+           ...
+           Name (_UID, 0x06) // _UID: Unique ID
+       }
+   }
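Because the flexible configuration publishes seven CFMWS windows for the same devices, a quick sanity check is that the windows occupy disjoint host physical address ranges. The sketch below is an illustration only: `struct window` and both helpers are hypothetical names, and the base/size pairs would be copied by hand from the CEDT dump above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical (base, size) pair taken from a CFMWS entry. */
struct window {
	uint64_t base;
	uint64_t size;
};

/* Two half-open ranges [base, base + size) overlap if each starts before the other ends. */
static bool windows_overlap(const struct window *a, const struct window *b)
{
	return a->base < b->base + b->size && b->base < a->base + a->size;
}

/* Count overlapping pairs; 0 means every window is disjoint from the rest. */
static size_t count_overlaps(const struct window *w, size_t nr)
{
	size_t hits = 0;

	for (size_t i = 0; i < nr; i++)
		for (size_t j = i + 1; j < nr; j++)
			if (windows_overlap(&w[i], &w[j]))
				hits++;
	return hits;
}
```

Adjacent windows such as 0x2000000000 and 0x2200000000 (each 0x200000000 long) touch but do not overlap, so they are not counted.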
+107
Documentation/driver-api/cxl/platform/example-configurations/hb-interleave.rst
···
+.. SPDX-License-Identifier: GPL-2.0
+
+============================
+Cross-Host-Bridge Interleave
+============================
+This system has a single socket with two CXL host bridges. Each host bridge
+has a single CXL memory expander with 4GB of memory.
+
+Things to note:
+
+* Cross-Bridge interleave is described.
+* The expanders are described by a single CFMWS.
+* This SRAT describes one node for both host bridges.
+* The HMAT describes a single node's performance.
+
+:doc:`CEDT <../acpi/cedt>`::
+
+   Subtable Type : 00 [CXL Host Bridge Structure]
+   Reserved : 00
+   Length : 0020
+   Associated host bridge : 00000007
+   Specification version : 00000001
+   Reserved : 00000000
+   Register base : 0000010370400000
+   Register length : 0000000000010000
+
+   Subtable Type : 00 [CXL Host Bridge Structure]
+   Reserved : 00
+   Length : 0020
+   Associated host bridge : 00000006
+   Specification version : 00000001
+   Reserved : 00000000
+   Register base : 0000010380800000
+   Register length : 0000000000010000
+
+   Subtable Type : 01 [CXL Fixed Memory Window Structure]
+   Reserved : 00
+   Length : 002C
+   Reserved : 00000000
+   Window base address : 0000001000000000
+   Window size : 0000000200000000
+   Interleave Members (2^n) : 01
+   Interleave Arithmetic : 00
+   Reserved : 0000
+   Granularity : 00000000
+   Restrictions : 0006
+   QtgId : 0001
+   First Target : 00000007
+   Second Target : 00000006
+
+:doc:`SRAT <../acpi/srat>`::
+
+   Subtable Type : 01 [Memory Affinity]
+   Length : 28
+   Proximity Domain : 00000001
+   Reserved1 : 0000
+   Base Address : 0000001000000000
+   Address Length : 0000000200000000
+   Reserved2 : 00000000
+   Flags (decoded below) : 0000000B
+   Enabled : 1
+   Hot Pluggable : 1
+   Non-Volatile : 0
+
+:doc:`HMAT <../acpi/hmat>`::
+
+   Structure Type : 0001 [SLLBI]
+   Data Type : 00 [Latency]
+   Target Proximity Domain List : 00000000
+   Target Proximity Domain List : 00000001
+   Target Proximity Domain List : 00000002
+   Entry : 0080
+   Entry : 0100
+
+   Structure Type : 0001 [SLLBI]
+   Data Type : 03 [Bandwidth]
+   Target Proximity Domain List : 00000000
+   Target Proximity Domain List : 00000001
+   Target Proximity Domain List : 00000002
+   Entry : 1200
+   Entry : 0400
+
+:doc:`SLIT <../acpi/slit>`::
+
+   Signature : "SLIT" [System Locality Information Table]
+   Localities : 0000000000000003
+   Locality 0 : 10 20
+   Locality 1 : FF 0A
+
+:doc:`DSDT <../acpi/dsdt>`::
+
+   Scope (_SB)
+   {
+       Device (S0D0)
+       {
+           Name (_HID, "ACPI0016" /* Compute Express Link Host Bridge */) // _HID: Hardware ID
+           ...
+           Name (_UID, 0x07) // _UID: Unique ID
+       }
+       ...
+       Device (S0D5)
+       {
+           Name (_HID, "ACPI0016" /* Compute Express Link Host Bridge */) // _HID: Hardware ID
+           ...
+           Name (_UID, 0x06) // _UID: Unique ID
+       }
+   }
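The CFMWS in this example encodes a two-way interleave as `Interleave Members (2^n) : 01` with `Granularity : 00000000`. The sketch below decodes those two fields. It is an illustration under stated assumptions: the helper names are hypothetical, and the encodings (values 0 to 4 mean 2^n ways, values 8 to 10 mean 3, 6, and 12 ways, granularity is 256 bytes shifted left by the encoded value up to 6) reflect my reading of the CXL specification rather than anything stated in this commit.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Decode an encoded interleave-ways value. Returns 0 for encodings this
 * sketch does not recognize. Assumed encoding: 0..4 -> 1, 2, 4, 8, 16 ways;
 * 8..10 -> 3, 6, 12 ways.
 */
static unsigned int decode_ways(uint8_t encoded)
{
	if (encoded <= 4)
		return 1u << encoded;
	if (encoded >= 8 && encoded <= 10)
		return 3u << (encoded - 8);
	return 0;
}

/*
 * Decode an encoded interleave granularity in bytes. Assumed encoding:
 * 0 -> 256 bytes, doubling per step, with 6 (16KiB) as the largest value.
 */
static unsigned int decode_granularity(uint32_t encoded)
{
	if (encoded > 6)
		return 0;
	return 256u << encoded;
}
```

For the window above, `decode_ways(1)` gives 2 targets, so the 0x200000000-byte window splits into 0x100000000 bytes (4GiB) per host bridge, interleaved at `decode_granularity(0)`, which is 256 bytes.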
+90
Documentation/driver-api/cxl/platform/example-configurations/multi-dev-per-hb.rst
···
+.. SPDX-License-Identifier: GPL-2.0
+
+================================
+Multiple Devices per Host Bridge
+================================
+
+In this example system we will have a single socket and one CXL host bridge.
+There are two CXL memory expanders with 4GB attached to the host bridge.
+
+Things to note:
+
+* Intra-Bridge interleave is not described here.
+* The expanders are described by a single CEDT/CFMWS.
+* This CEDT/SRAT describes one node for both devices.
+* There is only one proximity domain in the HMAT for both devices.
+
+:doc:`CEDT <../acpi/cedt>`::
+
+   Subtable Type : 00 [CXL Host Bridge Structure]
+   Reserved : 00
+   Length : 0020
+   Associated host bridge : 00000007
+   Specification version : 00000001
+   Reserved : 00000000
+   Register base : 0000010370400000
+   Register length : 0000000000010000
+
+   Subtable Type : 01 [CXL Fixed Memory Window Structure]
+   Reserved : 00
+   Length : 002C
+   Reserved : 00000000
+   Window base address : 0000001000000000
+   Window size : 0000000200000000
+   Interleave Members (2^n) : 00
+   Interleave Arithmetic : 00
+   Reserved : 0000
+   Granularity : 00000000
+   Restrictions : 0006
+   QtgId : 0001
+   First Target : 00000007
+
+:doc:`SRAT <../acpi/srat>`::
+
+   Subtable Type : 01 [Memory Affinity]
+   Length : 28
+   Proximity Domain : 00000001
+   Reserved1 : 0000
+   Base Address : 0000001000000000
+   Address Length : 0000000200000000
+   Reserved2 : 00000000
+   Flags (decoded below) : 0000000B
+   Enabled : 1
+   Hot Pluggable : 1
+   Non-Volatile : 0
+
+:doc:`HMAT <../acpi/hmat>`::
+
+   Structure Type : 0001 [SLLBI]
+   Data Type : 00 [Latency]
+   Target Proximity Domain List : 00000000
+   Target Proximity Domain List : 00000001
+   Entry : 0080
+   Entry : 0100
+
+   Structure Type : 0001 [SLLBI]
+   Data Type : 03 [Bandwidth]
+   Target Proximity Domain List : 00000000
+   Target Proximity Domain List : 00000001
+   Entry : 1200
+   Entry : 0200
+
+:doc:`SLIT <../acpi/slit>`::
+
+   Signature : "SLIT" [System Locality Information Table]
+   Localities : 0000000000000003
+   Locality 0 : 10 20
+   Locality 1 : FF 0A
+
+:doc:`DSDT <../acpi/dsdt>`::
+
+   Scope (_SB)
+   {
+       Device (S0D0)
+       {
+           Name (_HID, "ACPI0016" /* Compute Express Link Host Bridge */) // _HID: Hardware ID
+           ...
+           Name (_UID, 0x07) // _UID: Unique ID
+       }
+       ...
+   }
+136
Documentation/driver-api/cxl/platform/example-configurations/one-dev-per-hb.rst
···
+.. SPDX-License-Identifier: GPL-2.0
+
+==========================
+One Device per Host Bridge
+==========================
+
+This system has a single socket with two CXL host bridges. Each host bridge
+has a single CXL memory expander with 4GB of memory.
+
+Things to note:
+
+* Cross-Bridge interleave is not being used.
+* The expanders are in two separate but adjacent memory regions.
+* This CEDT/SRAT describes one node per device.
+* The expanders have the same performance and will be in the same memory tier.
+
+:doc:`CEDT <../acpi/cedt>`::
+
+   Subtable Type : 00 [CXL Host Bridge Structure]
+   Reserved : 00
+   Length : 0020
+   Associated host bridge : 00000007
+   Specification version : 00000001
+   Reserved : 00000000
+   Register base : 0000010370400000
+   Register length : 0000000000010000
+
+   Subtable Type : 00 [CXL Host Bridge Structure]
+   Reserved : 00
+   Length : 0020
+   Associated host bridge : 00000006
+   Specification version : 00000001
+   Reserved : 00000000
+   Register base : 0000010380800000
+   Register length : 0000000000010000
+
+   Subtable Type : 01 [CXL Fixed Memory Window Structure]
+   Reserved : 00
+   Length : 002C
+   Reserved : 00000000
+   Window base address : 0000001000000000
+   Window size : 0000000100000000
+   Interleave Members (2^n) : 00
+   Interleave Arithmetic : 00
+   Reserved : 0000
+   Granularity : 00000000
+   Restrictions : 0006
+   QtgId : 0001
+   First Target : 00000007
+
+   Subtable Type : 01 [CXL Fixed Memory Window Structure]
+   Reserved : 00
+   Length : 002C
+   Reserved : 00000000
+   Window base address : 0000001100000000
+   Window size : 0000000100000000
+   Interleave Members (2^n) : 00
+   Interleave Arithmetic : 00
+   Reserved : 0000
+   Granularity : 00000000
+   Restrictions : 0006
+   QtgId : 0001
+   First Target : 00000006
+
+:doc:`SRAT <../acpi/srat>`::
+
+   Subtable Type : 01 [Memory Affinity]
+   Length : 28
+   Proximity Domain : 00000001
+   Reserved1 : 0000
+   Base Address : 0000001000000000
+   Address Length : 0000000100000000
+   Reserved2 : 00000000
+   Flags (decoded below) : 0000000B
+   Enabled : 1
+   Hot Pluggable : 1
+   Non-Volatile : 0
+
+   Subtable Type : 01 [Memory Affinity]
+   Length : 28
+   Proximity Domain : 00000002
+   Reserved1 : 0000
+   Base Address : 0000001100000000
+   Address Length : 0000000100000000
+   Reserved2 : 00000000
+   Flags (decoded below) : 0000000B
+   Enabled : 1
+   Hot Pluggable : 1
+   Non-Volatile : 0
+
+:doc:`HMAT <../acpi/hmat>`::
+
+   Structure Type : 0001 [SLLBI]
+   Data Type : 00 [Latency]
+   Target Proximity Domain List : 00000000
+   Target Proximity Domain List : 00000001
+   Target Proximity Domain List : 00000002
+   Entry : 0080
+   Entry : 0100
+   Entry : 0100
+
+   Structure Type : 0001 [SLLBI]
+   Data Type : 03 [Bandwidth]
+   Target Proximity Domain List : 00000000
+   Target Proximity Domain List : 00000001
+   Target Proximity Domain List : 00000002
+   Entry : 1200
+   Entry : 0200
+   Entry : 0200
+
+:doc:`SLIT <../acpi/slit>`::
+
+   Signature : "SLIT" [System Locality Information Table]
+   Localities : 0000000000000003
+   Locality 0 : 10 20 20
+   Locality 1 : FF 0A FF
+   Locality 2 : FF FF 0A
+
+:doc:`DSDT <../acpi/dsdt>`::
+
+   Scope (_SB)
+   {
+       Device (S0D0)
+       {
+           Name (_HID, "ACPI0016" /* Compute Express Link Host Bridge */) // _HID: Hardware ID
+           ...
+           Name (_UID, 0x07) // _UID: Unique ID
+       }
+       ...
+       Device (S0D5)
+       {
+           Name (_HID, "ACPI0016" /* Compute Express Link Host Bridge */) // _HID: Hardware ID
+           ...
+           Name (_UID, 0x06) // _UID: Unique ID
+       }
+   }
+31
Documentation/edac/memory_repair.rst
···
 
 Sysfs files are documented in
 `Documentation/ABI/testing/sysfs-edac-memory-repair`.
+
+Examples
+--------
+
+The memory repair usage takes the form shown in these examples:
+
+1. CXL memory sparing
+
+Memory sparing is defined as a repair function that replaces a portion of
+memory with a portion of functional memory at that same DPA. The subclasses
+of this operation (cacheline, row, bank, and rank sparing) vary in terms of
+the scope of the sparing being performed.
+
+Memory sparing maintenance operations may be supported by CXL devices that
+implement the CXL.mem protocol. A sparing maintenance operation requests the
+CXL device to perform a repair operation on its media. For example, a CXL
+device with DRAM components that support memory sparing features may
+implement sparing maintenance operations.
+
+2. CXL memory Soft Post Package Repair (sPPR)
+
+Post Package Repair (PPR) maintenance operations may be supported by CXL
+devices that implement the CXL.mem protocol. A PPR maintenance operation
+requests the CXL device to perform a repair operation on its media.
+For example, a CXL device with DRAM components that support PPR features
+may implement PPR maintenance operations. Soft PPR (sPPR) is a temporary
+row repair. Soft PPR may be faster, but the repair is lost with a power
+cycle.
+
+Sysfs files for memory repair are documented in
+`Documentation/ABI/testing/sysfs-edac-memory-repair`
+76
Documentation/edac/scrub.rst
···
 `Documentation/ABI/testing/sysfs-edac-scrub`
 
 `Documentation/ABI/testing/sysfs-edac-ecs`
+
+Examples
+--------
+
+The usage takes the form shown in these examples:
+
+1. CXL memory Patrol Scrub
+
+The following use cases have been identified for increasing the scrub rate.
+
+- Scrubbing is needed at device granularity because a device is showing
+  unexpectedly high errors.
+
+- Scrubbing may apply to memory that isn't online at all yet. Likely this
+  is a system wide default setting on boot.
+
+- Scrubbing at a higher rate because the monitor software has determined that
+  more reliability is necessary for a particular data set. This is called
+  Differentiated Reliability.
+
+1.1. Device based scrubbing
+
+CXL memory is exposed to the memory management subsystem and ultimately
+userspace via CXL devices. Device-based scrubbing is used for the first use
+case described in "Section 1 CXL Memory Patrol Scrub".
+
+When combining control via the device interfaces and region interfaces,
+see "Section 1.2 Region based scrubbing".
+
+Sysfs files for scrubbing are documented in
+`Documentation/ABI/testing/sysfs-edac-scrub`
+
+1.2. Region based scrubbing
+
+CXL memory is exposed to the memory management subsystem and ultimately
+userspace via CXL regions. CXL regions represent mapped memory capacity in
+system physical address space. These can incorporate one or more parts of
+multiple CXL memory devices with traffic interleaved across them. The user may
+want to control the scrub rate via this more abstract region instead of having
+to figure out the constituent devices and program them separately. The scrub
+rate for each device covers the whole device. Thus, if multiple regions use
+parts of that device, requests for scrubbing of other regions may result in a
+higher scrub rate than requested for this specific region.
+
+Region-based scrubbing is used for the third use case described in
+"Section 1 CXL Memory Patrol Scrub".
+
+Userspace must follow the rules below when setting scrub rates for any
+mixture of requirements.
+
+1. Take each region in turn, from lowest desired scrub rate to highest, and
+   set its scrub rate. Later regions may override the scrub rate on individual
+   devices (and hence potentially whole regions).
+
+2. Take each device for which enhanced scrubbing is required (higher rate) and
+   set those scrub rates. This will override the scrub rates of individual devices,
+   setting them to the maximum rate required for any of the regions they help back,
+   unless a specific rate is already defined.
+
+Sysfs files for scrubbing are documented in
+`Documentation/ABI/testing/sysfs-edac-scrub`
+
+2. CXL memory Error Check Scrub (ECS)
+
+The Error Check Scrub (ECS) feature enables a memory device to perform error
+checking and correction (ECC) and count single-bit errors. The associated
+memory controller sets the ECS mode with a trigger sent to the memory
+device. CXL ECS control allows the host, and thus userspace, to change the
+attributes for error count mode and the threshold number of errors per segment
+(indicating how many segments have at least that number of errors) for
+reporting errors, and to reset the ECS counter. Thus the responsibility for
+initiating Error Check Scrub on a memory device may lie with the memory
+controller or platform when unexpectedly high error rates are detected.
+
+Sysfs files for scrubbing are documented in
+`Documentation/ABI/testing/sysfs-edac-ecs`
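The ordering rules for region-based scrubbing can be sketched as a small simulation. Everything here is a hypothetical model for illustration (the `struct region` layout, the rate scale where a larger number means more frequent scrubbing, and both helpers); it does not use the sysfs interface or any kernel API. It only shows why programming regions from lowest to highest rate leaves each shared device at the highest rate any of its regions asked for.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_DEVS 8

/* Hypothetical model: a region is a desired rate plus the devices backing it. */
struct region {
	unsigned int rate;	/* desired scrub rate, larger = more frequent */
	size_t nr_devs;
	int devs[MAX_DEVS];	/* indices into the per-device rate array */
};

/* qsort() comparator: ascending by desired rate. */
static int by_rate(const void *a, const void *b)
{
	const struct region *ra = a, *rb = b;

	return (ra->rate > rb->rate) - (ra->rate < rb->rate);
}

/*
 * Rule 1: program regions from lowest to highest desired rate. Setting a
 * region sets every backing device, so the later (higher) rate wins on
 * devices shared between regions.
 */
static void apply_region_rates(struct region *regions, size_t nr_regions,
			       unsigned int *dev_rate)
{
	qsort(regions, nr_regions, sizeof(*regions), by_rate);
	for (size_t r = 0; r < nr_regions; r++)
		for (size_t d = 0; d < regions[r].nr_devs; d++)
			dev_rate[regions[r].devs[d]] = regions[r].rate;
}

/* Rule 2: afterwards, override individual devices that need enhanced scrubbing. */
static void apply_device_rate(unsigned int *dev_rate, int dev, unsigned int rate)
{
	dev_rate[dev] = rate;
}
```

In this model, a region asking for rate 2 that shares a device with a region asking for rate 4 ends up with that device scrubbed at rate 4, which is the "higher scrub rate than requested" effect the text describes.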
+71
drivers/cxl/Kconfig
···
 
 	  If unsure say 'n'
 
+config CXL_EDAC_MEM_FEATURES
+	bool "CXL: EDAC Memory Features"
+	depends on EXPERT
+	depends on CXL_MEM
+	depends on CXL_FEATURES
+	depends on EDAC >= CXL_BUS
+	help
+	  The CXL EDAC memory feature is optional and allows host to
+	  control the EDAC memory features configurations of CXL memory
+	  expander devices.
+
+	  Say 'y' if you have an expert need to change default settings
+	  of a memory RAS feature established by the platform/device.
+	  Otherwise say 'n'.
+
+config CXL_EDAC_SCRUB
+	bool "Enable CXL Patrol Scrub Control (Patrol Read)"
+	depends on CXL_EDAC_MEM_FEATURES
+	depends on EDAC_SCRUB
+	help
+	  The CXL EDAC scrub control is optional and allows host to
+	  control the scrub feature configurations of CXL memory expander
+	  devices.
+
+	  When enabled 'cxl_mem' and 'cxl_region' EDAC devices are
+	  published with memory scrub control attributes as described by
+	  Documentation/ABI/testing/sysfs-edac-scrub.
+
+	  Say 'y' if you have an expert need to change default settings
+	  of a memory scrub feature established by the platform/device
+	  (e.g. scrub rates for the patrol scrub feature).
+	  Otherwise say 'n'.
+
+config CXL_EDAC_ECS
+	bool "Enable CXL Error Check Scrub (Repair)"
+	depends on CXL_EDAC_MEM_FEATURES
+	depends on EDAC_ECS
+	help
+	  The CXL EDAC ECS control is optional and allows host to
+	  control the ECS feature configurations of CXL memory expander
+	  devices.
+
+	  When enabled 'cxl_mem' EDAC devices are published with memory
+	  ECS control attributes as described by
+	  Documentation/ABI/testing/sysfs-edac-ecs.
+
+	  Say 'y' if you have an expert need to change default settings
+	  of a memory ECS feature established by the platform/device.
+	  Otherwise say 'n'.
+
+config CXL_EDAC_MEM_REPAIR
+	bool "Enable CXL Memory Repair"
+	depends on CXL_EDAC_MEM_FEATURES
+	depends on EDAC_MEM_REPAIR
+	help
+	  The CXL EDAC memory repair control is optional and allows host
+	  to control the memory repair features (e.g. sparing, PPR)
+	  configurations of CXL memory expander devices.
+
+	  When enabled, the memory repair feature requires an additional
+	  memory of approximately 43KB to store CXL DRAM and CXL general
+	  media event records.
+
+	  When enabled 'cxl_mem' EDAC devices are published with memory
+	  repair control attributes as described by
+	  Documentation/ABI/testing/sysfs-edac-memory-repair.
+
+	  Say 'y' if you have an expert need to change default settings
+	  of a memory repair feature established by the platform/device.
+	  Otherwise say 'n'.
+
 config CXL_PORT
 	default CXL_BUS
 	tristate
+17 -7
drivers/cxl/acpi.c
···
 #include "cxlpci.h"
 #include "cxl.h"
 
-#define CXL_RCRB_SIZE	SZ_8K
-
 struct cxl_cxims_data {
 	int nr_maps;
 	u64 xormaps[] __counted_by(nr_maps);
···
 	rc = cxl_decoder_add(cxld, target_map);
 	if (rc)
 		return rc;
-	return cxl_root_decoder_autoremove(dev, no_free_ptr(cxlrd));
+
+	rc = cxl_root_decoder_autoremove(dev, no_free_ptr(cxlrd));
+	if (rc)
+		return rc;
+
+	dev_dbg(root_port->dev.parent, "%s added to %s\n",
+		dev_name(&cxld->dev), dev_name(&root_port->dev));
+
+	return 0;
 }
 
 static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
···
 	chbs = (struct acpi_cedt_chbs *) header;
 
 	if (chbs->cxl_version == ACPI_CEDT_CHBS_VERSION_CXL11 &&
-	    chbs->length != CXL_RCRB_SIZE)
+	    chbs->length != ACPI_CEDT_CHBS_LENGTH_CXL11)
+		return 0;
+
+	if (chbs->cxl_version == ACPI_CEDT_CHBS_VERSION_CXL20 &&
+	    chbs->length != ACPI_CEDT_CHBS_LENGTH_CXL20)
 		return 0;
 
 	if (!chbs->base)
···
  * expanding its boundaries to ensure that any conflicting resources become
  * children. If a window is expanded it may then conflict with another window
  * entry and require the window to be truncated or trimmed. Consider this
- * situation:
+ * situation::
  *
- * |-- "CXL Window 0" --||----- "CXL Window 1" -----|
- * |--------------- "System RAM" -------------|
+ *	|-- "CXL Window 0" --||----- "CXL Window 1" -----|
+ *	|--------------- "System RAM" -------------|
  *
  * ...where platform firmware has established a System RAM resource across 2
  * windows, but has left some portion of window 1 for dynamic CXL region
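The CHBS hunk above adds a length check for CXL 2.0 host bridges next to the existing CXL 1.1 check. A minimal standalone sketch of that validation follows. The macro values are assumptions, not taken from this diff: 0x2000 matches the removed `CXL_RCRB_SIZE` of `SZ_8K`, and 0x10000 matches the `Register length` shown for `Specification version : 00000001` in the example CEDT dumps. The `chbs_length_ok()` helper is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, mirrored locally for illustration (see lead-in). */
#define CHBS_VERSION_CXL11	0
#define CHBS_VERSION_CXL20	1
#define CHBS_LENGTH_CXL11	0x2000	/* 8K register block */
#define CHBS_LENGTH_CXL20	0x10000	/* 64K component registers */

/*
 * Return true when the register length is the one expected for the given
 * CHBS version. Versions this sketch does not know are not length-checked,
 * mirroring how the hunk only tests the two known versions.
 */
static bool chbs_length_ok(uint32_t cxl_version, uint64_t length)
{
	if (cxl_version == CHBS_VERSION_CXL11)
		return length == CHBS_LENGTH_CXL11;
	if (cxl_version == CHBS_VERSION_CXL20)
		return length == CHBS_LENGTH_CXL20;
	return true;
}
```

Note that in the driver a mismatch makes the parser return 0, so the entry is skipped rather than treated as a fatal error.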
+1
drivers/cxl/core/Makefile
···
 cxl_core-$(CONFIG_CXL_REGION) += region.o
 cxl_core-$(CONFIG_CXL_MCE) += mce.o
 cxl_core-$(CONFIG_CXL_FEATURES) += features.o
+cxl_core-$(CONFIG_CXL_EDAC_MEM_FEATURES) += edac.o
+1 -1
drivers/cxl/core/cdat.c
···
 	 */
 	if (entry == 0xffff || !entry)
 		return 0;
-	else if (base > (UINT_MAX / (entry)))
+	if (base > (UINT_MAX / (entry)))
 		return 0;
 
 	/*
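Dropping the `else` in this hunk does not change behavior, because the preceding branch already returns. The guard pattern itself can be shown in isolation. In the sketch below, `scale_entry()` is a hypothetical stand-in, and the final multiplication is an assumption about what follows the checks, since the rest of the function is not visible in this diff.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/*
 * Scale a 16-bit table entry by its base unit. Returns 0 when the entry is
 * reserved (0 or 0xffff) or when the product would not fit in unsigned int.
 */
static unsigned int scale_entry(uint16_t entry, uint64_t base)
{
	if (entry == 0xffff || !entry)
		return 0;
	/* Division-based check: base * entry <= UINT_MAX iff base <= UINT_MAX / entry. */
	if (base > (UINT_MAX / entry))
		return 0;

	return (unsigned int)(entry * base);
}
```

Checking with a division before multiplying avoids computing a product that could overflow the result type.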
+3 -1
drivers/cxl/core/core.h
···
 struct dentry *cxl_debugfs_create_dir(const char *dir);
 int cxl_dpa_set_part(struct cxl_endpoint_decoder *cxled,
 		     enum cxl_partition_mode mode);
-int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size);
+int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, u64 size);
 int cxl_dpa_free(struct cxl_endpoint_decoder *cxled);
 resource_size_t cxl_dpa_size(struct cxl_endpoint_decoder *cxled);
 resource_size_t cxl_dpa_resource_start(struct cxl_endpoint_decoder *cxled);
···
 		       int nid, resource_size_t *size);
 
 #ifdef CONFIG_CXL_FEATURES
+struct cxl_feat_entry *
+cxl_feature_info(struct cxl_features_state *cxlfs, const uuid_t *uuid);
 size_t cxl_get_feature(struct cxl_mailbox *cxl_mbox, const uuid_t *feat_uuid,
 		       enum cxl_get_feat_selection selection,
 		       void *feat_out, size_t feat_out_size, u16 offset,
+2102
drivers/cxl/core/edac.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-only 2 + /* 3 + * CXL EDAC memory feature driver. 4 + * 5 + * Copyright (c) 2024-2025 HiSilicon Limited. 6 + * 7 + * - Supports functions to configure EDAC features of the 8 + * CXL memory devices. 9 + * - Registers with the EDAC device subsystem driver to expose 10 + * the features sysfs attributes to the user for configuring 11 + * CXL memory RAS feature. 12 + */ 13 + 14 + #include <linux/cleanup.h> 15 + #include <linux/edac.h> 16 + #include <linux/limits.h> 17 + #include <linux/unaligned.h> 18 + #include <linux/xarray.h> 19 + #include <cxl/features.h> 20 + #include <cxl.h> 21 + #include <cxlmem.h> 22 + #include "core.h" 23 + #include "trace.h" 24 + 25 + #define CXL_NR_EDAC_DEV_FEATURES 7 26 + 27 + #define CXL_SCRUB_NO_REGION -1 28 + 29 + struct cxl_patrol_scrub_context { 30 + u8 instance; 31 + u16 get_feat_size; 32 + u16 set_feat_size; 33 + u8 get_version; 34 + u8 set_version; 35 + u16 effects; 36 + struct cxl_memdev *cxlmd; 37 + struct cxl_region *cxlr; 38 + }; 39 + 40 + /* 41 + * See CXL spec rev 3.2 @8.2.10.9.11.1 Table 8-222 Device Patrol Scrub Control 42 + * Feature Readable Attributes. 43 + */ 44 + struct cxl_scrub_rd_attrbs { 45 + u8 scrub_cycle_cap; 46 + __le16 scrub_cycle_hours; 47 + u8 scrub_flags; 48 + } __packed; 49 + 50 + /* 51 + * See CXL spec rev 3.2 @8.2.10.9.11.1 Table 8-223 Device Patrol Scrub Control 52 + * Feature Writable Attributes. 
+ */
+struct cxl_scrub_wr_attrbs {
+	u8 scrub_cycle_hours;
+	u8 scrub_flags;
+} __packed;
+
+#define CXL_SCRUB_CONTROL_CHANGEABLE BIT(0)
+#define CXL_SCRUB_CONTROL_REALTIME BIT(1)
+#define CXL_SCRUB_CONTROL_CYCLE_MASK GENMASK(7, 0)
+#define CXL_SCRUB_CONTROL_MIN_CYCLE_MASK GENMASK(15, 8)
+#define CXL_SCRUB_CONTROL_ENABLE BIT(0)
+
+#define CXL_GET_SCRUB_CYCLE_CHANGEABLE(cap) \
+	FIELD_GET(CXL_SCRUB_CONTROL_CHANGEABLE, cap)
+#define CXL_GET_SCRUB_CYCLE(cycle) \
+	FIELD_GET(CXL_SCRUB_CONTROL_CYCLE_MASK, cycle)
+#define CXL_GET_SCRUB_MIN_CYCLE(cycle) \
+	FIELD_GET(CXL_SCRUB_CONTROL_MIN_CYCLE_MASK, cycle)
+#define CXL_GET_SCRUB_EN_STS(flags) FIELD_GET(CXL_SCRUB_CONTROL_ENABLE, flags)
+
+#define CXL_SET_SCRUB_CYCLE(cycle) \
+	FIELD_PREP(CXL_SCRUB_CONTROL_CYCLE_MASK, cycle)
+#define CXL_SET_SCRUB_EN(en) FIELD_PREP(CXL_SCRUB_CONTROL_ENABLE, en)
+
+static int cxl_mem_scrub_get_attrbs(struct cxl_mailbox *cxl_mbox, u8 *cap,
+				    u16 *cycle, u8 *flags, u8 *min_cycle)
+{
+	size_t rd_data_size = sizeof(struct cxl_scrub_rd_attrbs);
+	size_t data_size;
+	struct cxl_scrub_rd_attrbs *rd_attrbs __free(kfree) =
+		kzalloc(rd_data_size, GFP_KERNEL);
+	if (!rd_attrbs)
+		return -ENOMEM;
+
+	data_size = cxl_get_feature(cxl_mbox, &CXL_FEAT_PATROL_SCRUB_UUID,
+				    CXL_GET_FEAT_SEL_CURRENT_VALUE, rd_attrbs,
+				    rd_data_size, 0, NULL);
+	if (!data_size)
+		return -EIO;
+
+	*cap = rd_attrbs->scrub_cycle_cap;
+	*cycle = le16_to_cpu(rd_attrbs->scrub_cycle_hours);
+	*flags = rd_attrbs->scrub_flags;
+	if (min_cycle)
+		*min_cycle = CXL_GET_SCRUB_MIN_CYCLE(*cycle);
+
+	return 0;
+}
+
+static int cxl_scrub_get_attrbs(struct cxl_patrol_scrub_context *cxl_ps_ctx,
+				u8 *cap, u16 *cycle, u8 *flags, u8 *min_cycle)
+{
+	struct cxl_mailbox *cxl_mbox;
+	u8 min_scrub_cycle = U8_MAX;
+	struct cxl_region_params *p;
+	struct cxl_memdev *cxlmd;
+	struct cxl_region *cxlr;
+	int i, ret;
+
+	if (!cxl_ps_ctx->cxlr) {
+		cxl_mbox = &cxl_ps_ctx->cxlmd->cxlds->cxl_mbox;
+		return cxl_mem_scrub_get_attrbs(cxl_mbox, cap, cycle,
+						flags, min_cycle);
+	}
+
+	struct rw_semaphore *region_lock __free(rwsem_read_release) =
+		rwsem_read_intr_acquire(&cxl_region_rwsem);
+	if (!region_lock)
+		return -EINTR;
+
+	cxlr = cxl_ps_ctx->cxlr;
+	p = &cxlr->params;
+
+	for (i = 0; i < p->nr_targets; i++) {
+		struct cxl_endpoint_decoder *cxled = p->targets[i];
+
+		cxlmd = cxled_to_memdev(cxled);
+		cxl_mbox = &cxlmd->cxlds->cxl_mbox;
+		ret = cxl_mem_scrub_get_attrbs(cxl_mbox, cap, cycle, flags,
+					       min_cycle);
+		if (ret)
+			return ret;
+
+		if (min_cycle)
+			min_scrub_cycle = min(*min_cycle, min_scrub_cycle);
+	}
+
+	if (min_cycle)
+		*min_cycle = min_scrub_cycle;
+
+	return 0;
+}
+
+static int cxl_scrub_set_attrbs_region(struct device *dev,
+				       struct cxl_patrol_scrub_context *cxl_ps_ctx,
+				       u8 cycle, u8 flags)
+{
+	struct cxl_scrub_wr_attrbs wr_attrbs;
+	struct cxl_mailbox *cxl_mbox;
+	struct cxl_region_params *p;
+	struct cxl_memdev *cxlmd;
+	struct cxl_region *cxlr;
+	int ret, i;
+
+	struct rw_semaphore *region_lock __free(rwsem_read_release) =
+		rwsem_read_intr_acquire(&cxl_region_rwsem);
+	if (!region_lock)
+		return -EINTR;
+
+	cxlr = cxl_ps_ctx->cxlr;
+	p = &cxlr->params;
+	wr_attrbs.scrub_cycle_hours = cycle;
+	wr_attrbs.scrub_flags = flags;
+
+	for (i = 0; i < p->nr_targets; i++) {
+		struct cxl_endpoint_decoder *cxled = p->targets[i];
+
+		cxlmd = cxled_to_memdev(cxled);
+		cxl_mbox = &cxlmd->cxlds->cxl_mbox;
+		ret = cxl_set_feature(cxl_mbox, &CXL_FEAT_PATROL_SCRUB_UUID,
+				      cxl_ps_ctx->set_version, &wr_attrbs,
+				      sizeof(wr_attrbs),
+				      CXL_SET_FEAT_FLAG_DATA_SAVED_ACROSS_RESET,
+				      0, NULL);
+		if (ret)
+			return ret;
+
+		if (cycle != cxlmd->scrub_cycle) {
+			if (cxlmd->scrub_region_id != CXL_SCRUB_NO_REGION)
+				dev_info(dev,
+					 "Device scrub rate(%d hours) set by region%d rate overwritten by region%d scrub rate(%d hours)\n",
+					 cxlmd->scrub_cycle,
+					 cxlmd->scrub_region_id, cxlr->id,
+					 cycle);
+
+			cxlmd->scrub_cycle = cycle;
+			cxlmd->scrub_region_id = cxlr->id;
+		}
+	}
+
+	return 0;
+}
+
+static int cxl_scrub_set_attrbs_device(struct device *dev,
+				       struct cxl_patrol_scrub_context *cxl_ps_ctx,
+				       u8 cycle, u8 flags)
+{
+	struct cxl_scrub_wr_attrbs wr_attrbs;
+	struct cxl_mailbox *cxl_mbox;
+	struct cxl_memdev *cxlmd;
+	int ret;
+
+	wr_attrbs.scrub_cycle_hours = cycle;
+	wr_attrbs.scrub_flags = flags;
+
+	cxlmd = cxl_ps_ctx->cxlmd;
+	cxl_mbox = &cxlmd->cxlds->cxl_mbox;
+	ret = cxl_set_feature(cxl_mbox, &CXL_FEAT_PATROL_SCRUB_UUID,
+			      cxl_ps_ctx->set_version, &wr_attrbs,
+			      sizeof(wr_attrbs),
+			      CXL_SET_FEAT_FLAG_DATA_SAVED_ACROSS_RESET, 0,
+			      NULL);
+	if (ret)
+		return ret;
+
+	if (cycle != cxlmd->scrub_cycle) {
+		if (cxlmd->scrub_region_id != CXL_SCRUB_NO_REGION)
+			dev_info(dev,
+				 "Device scrub rate(%d hours) set by region%d rate overwritten with device local scrub rate(%d hours)\n",
+				 cxlmd->scrub_cycle, cxlmd->scrub_region_id,
+				 cycle);
+
+		cxlmd->scrub_cycle = cycle;
+		cxlmd->scrub_region_id = CXL_SCRUB_NO_REGION;
+	}
+
+	return 0;
+}
+
+static int cxl_scrub_set_attrbs(struct device *dev,
+				struct cxl_patrol_scrub_context *cxl_ps_ctx,
+				u8 cycle, u8 flags)
+{
+	if (cxl_ps_ctx->cxlr)
+		return cxl_scrub_set_attrbs_region(dev, cxl_ps_ctx, cycle, flags);
+
+	return cxl_scrub_set_attrbs_device(dev, cxl_ps_ctx, cycle, flags);
+}
+
+static int cxl_patrol_scrub_get_enabled_bg(struct device *dev, void *drv_data,
+					   bool *enabled)
+{
+	struct cxl_patrol_scrub_context *ctx = drv_data;
+	u8 cap, flags;
+	u16 cycle;
+	int ret;
+
+	ret = cxl_scrub_get_attrbs(ctx, &cap, &cycle, &flags, NULL);
+	if (ret)
+		return ret;
+
+	*enabled = CXL_GET_SCRUB_EN_STS(flags);
+
+	return 0;
+}
+
+static int cxl_patrol_scrub_set_enabled_bg(struct device *dev, void *drv_data,
+					   bool enable)
+{
+	struct cxl_patrol_scrub_context *ctx = drv_data;
+	u8 cap, flags, wr_cycle;
+	u16 rd_cycle;
+	int ret;
+
+	if (!capable(CAP_SYS_RAWIO))
+		return -EPERM;
+
+	ret = cxl_scrub_get_attrbs(ctx, &cap, &rd_cycle, &flags, NULL);
+	if (ret)
+		return ret;
+
+	wr_cycle = CXL_GET_SCRUB_CYCLE(rd_cycle);
+	flags = CXL_SET_SCRUB_EN(enable);
+
+	return cxl_scrub_set_attrbs(dev, ctx, wr_cycle, flags);
+}
+
+static int cxl_patrol_scrub_get_min_scrub_cycle(struct device *dev,
+						void *drv_data, u32 *min)
+{
+	struct cxl_patrol_scrub_context *ctx = drv_data;
+	u8 cap, flags, min_cycle;
+	u16 cycle;
+	int ret;
+
+	ret = cxl_scrub_get_attrbs(ctx, &cap, &cycle, &flags, &min_cycle);
+	if (ret)
+		return ret;
+
+	*min = min_cycle * 3600;
+
+	return 0;
+}
+
+static int cxl_patrol_scrub_get_max_scrub_cycle(struct device *dev,
+						void *drv_data, u32 *max)
+{
+	*max = U8_MAX * 3600; /* Max set by register size */
+
+	return 0;
+}
+
+static int cxl_patrol_scrub_get_scrub_cycle(struct device *dev, void *drv_data,
+					    u32 *scrub_cycle_secs)
+{
+	struct cxl_patrol_scrub_context *ctx = drv_data;
+	u8 cap, flags;
+	u16 cycle;
+	int ret;
+
+	ret = cxl_scrub_get_attrbs(ctx, &cap, &cycle, &flags, NULL);
+	if (ret)
+		return ret;
+
+	*scrub_cycle_secs = CXL_GET_SCRUB_CYCLE(cycle) * 3600;
+
+	return 0;
+}
+
+static int cxl_patrol_scrub_set_scrub_cycle(struct device *dev, void *drv_data,
+					    u32 scrub_cycle_secs)
+{
+	struct cxl_patrol_scrub_context *ctx = drv_data;
+	u8 scrub_cycle_hours = scrub_cycle_secs / 3600;
+	u8 cap, wr_cycle, flags, min_cycle;
+	u16 rd_cycle;
+	int ret;
+
+	if (!capable(CAP_SYS_RAWIO))
+		return -EPERM;
+
+	ret = cxl_scrub_get_attrbs(ctx, &cap, &rd_cycle, &flags, &min_cycle);
+	if (ret)
+		return ret;
+
+	if (!CXL_GET_SCRUB_CYCLE_CHANGEABLE(cap))
+		return -EOPNOTSUPP;
+
+	if (scrub_cycle_hours < min_cycle) {
+		dev_dbg(dev, "Invalid CXL patrol scrub cycle(%d) to set\n",
+			scrub_cycle_hours);
+		dev_dbg(dev,
+			"Minimum supported CXL patrol scrub cycle in hour %d\n",
+			min_cycle);
+		return -EINVAL;
+	}
+	wr_cycle = CXL_SET_SCRUB_CYCLE(scrub_cycle_hours);
+
+	return cxl_scrub_set_attrbs(dev, ctx, wr_cycle, flags);
+}
+
+static const struct edac_scrub_ops cxl_ps_scrub_ops = {
+	.get_enabled_bg = cxl_patrol_scrub_get_enabled_bg,
+	.set_enabled_bg = cxl_patrol_scrub_set_enabled_bg,
+	.get_min_cycle = cxl_patrol_scrub_get_min_scrub_cycle,
+	.get_max_cycle = cxl_patrol_scrub_get_max_scrub_cycle,
+	.get_cycle_duration = cxl_patrol_scrub_get_scrub_cycle,
+	.set_cycle_duration = cxl_patrol_scrub_set_scrub_cycle,
+};
+
+static int cxl_memdev_scrub_init(struct cxl_memdev *cxlmd,
+				 struct edac_dev_feature *ras_feature,
+				 u8 scrub_inst)
+{
+	struct cxl_patrol_scrub_context *cxl_ps_ctx;
+	struct cxl_feat_entry *feat_entry;
+	u8 cap, flags;
+	u16 cycle;
+	int rc;
+
+	feat_entry = cxl_feature_info(to_cxlfs(cxlmd->cxlds),
+				      &CXL_FEAT_PATROL_SCRUB_UUID);
+	if (IS_ERR(feat_entry))
+		return -EOPNOTSUPP;
+
+	if (!(le32_to_cpu(feat_entry->flags) & CXL_FEATURE_F_CHANGEABLE))
+		return -EOPNOTSUPP;
+
+	cxl_ps_ctx = devm_kzalloc(&cxlmd->dev, sizeof(*cxl_ps_ctx), GFP_KERNEL);
+	if (!cxl_ps_ctx)
+		return -ENOMEM;
+
+	*cxl_ps_ctx = (struct cxl_patrol_scrub_context){
+		.get_feat_size = le16_to_cpu(feat_entry->get_feat_size),
+		.set_feat_size = le16_to_cpu(feat_entry->set_feat_size),
+		.get_version = feat_entry->get_feat_ver,
+		.set_version = feat_entry->set_feat_ver,
+		.effects = le16_to_cpu(feat_entry->effects),
+		.instance = scrub_inst,
+		.cxlmd = cxlmd,
+	};
+
+	rc = cxl_mem_scrub_get_attrbs(&cxlmd->cxlds->cxl_mbox, &cap, &cycle,
+				      &flags, NULL);
+	if (rc)
+		return rc;
+
+	cxlmd->scrub_cycle = CXL_GET_SCRUB_CYCLE(cycle);
+	cxlmd->scrub_region_id = CXL_SCRUB_NO_REGION;
+
+	ras_feature->ft_type = RAS_FEAT_SCRUB;
+	ras_feature->instance = cxl_ps_ctx->instance;
+	ras_feature->scrub_ops = &cxl_ps_scrub_ops;
+	ras_feature->ctx = cxl_ps_ctx;
+
+	return 0;
+}
+
+static int cxl_region_scrub_init(struct cxl_region *cxlr,
+				 struct edac_dev_feature *ras_feature,
+				 u8 scrub_inst)
+{
+	struct cxl_patrol_scrub_context *cxl_ps_ctx;
+	struct cxl_region_params *p = &cxlr->params;
+	struct cxl_feat_entry *feat_entry = NULL;
+	struct cxl_memdev *cxlmd;
+	u8 cap, flags;
+	u16 cycle;
+	int i, rc;
+
+	/*
+	 * The cxl_region_rwsem must be held if the code below is used in a context
+	 * other than when the region is in the probe state, as shown here.
+	 */
+	for (i = 0; i < p->nr_targets; i++) {
+		struct cxl_endpoint_decoder *cxled = p->targets[i];
+
+		cxlmd = cxled_to_memdev(cxled);
+		feat_entry = cxl_feature_info(to_cxlfs(cxlmd->cxlds),
+					      &CXL_FEAT_PATROL_SCRUB_UUID);
+		if (IS_ERR(feat_entry))
+			return -EOPNOTSUPP;
+
+		if (!(le32_to_cpu(feat_entry->flags) &
+		      CXL_FEATURE_F_CHANGEABLE))
+			return -EOPNOTSUPP;
+
+		rc = cxl_mem_scrub_get_attrbs(&cxlmd->cxlds->cxl_mbox, &cap,
+					      &cycle, &flags, NULL);
+		if (rc)
+			return rc;
+
+		cxlmd->scrub_cycle = CXL_GET_SCRUB_CYCLE(cycle);
+		cxlmd->scrub_region_id = CXL_SCRUB_NO_REGION;
+	}
+
+	cxl_ps_ctx = devm_kzalloc(&cxlr->dev, sizeof(*cxl_ps_ctx), GFP_KERNEL);
+	if (!cxl_ps_ctx)
+		return -ENOMEM;
+
+	*cxl_ps_ctx = (struct cxl_patrol_scrub_context){
+		.get_feat_size = le16_to_cpu(feat_entry->get_feat_size),
+		.set_feat_size = le16_to_cpu(feat_entry->set_feat_size),
+		.get_version = feat_entry->get_feat_ver,
+		.set_version = feat_entry->set_feat_ver,
+		.effects = le16_to_cpu(feat_entry->effects),
+		.instance = scrub_inst,
+		.cxlr = cxlr,
+	};
+
+	ras_feature->ft_type = RAS_FEAT_SCRUB;
+	ras_feature->instance = cxl_ps_ctx->instance;
+	ras_feature->scrub_ops = &cxl_ps_scrub_ops;
+	ras_feature->ctx = cxl_ps_ctx;
+
+	return 0;
+}
+
+struct cxl_ecs_context {
+	u16 num_media_frus;
+	u16 get_feat_size;
+	u16 set_feat_size;
+	u8 get_version;
+	u8 set_version;
+	u16 effects;
+	struct cxl_memdev *cxlmd;
+};
+
+/*
+ * See CXL spec rev 3.2 @8.2.10.9.11.2 Table 8-225 DDR5 ECS Control Feature
+ * Readable Attributes.
+ */
+struct cxl_ecs_fru_rd_attrbs {
+	u8 ecs_cap;
+	__le16 ecs_config;
+	u8 ecs_flags;
+} __packed;
+
+struct cxl_ecs_rd_attrbs {
+	u8 ecs_log_cap;
+	struct cxl_ecs_fru_rd_attrbs fru_attrbs[];
+} __packed;
+
+/*
+ * See CXL spec rev 3.2 @8.2.10.9.11.2 Table 8-226 DDR5 ECS Control Feature
+ * Writable Attributes.
+ */
+struct cxl_ecs_fru_wr_attrbs {
+	__le16 ecs_config;
+} __packed;
+
+struct cxl_ecs_wr_attrbs {
+	u8 ecs_log_cap;
+	struct cxl_ecs_fru_wr_attrbs fru_attrbs[];
+} __packed;
+
+#define CXL_ECS_LOG_ENTRY_TYPE_MASK GENMASK(1, 0)
+#define CXL_ECS_REALTIME_REPORT_CAP_MASK BIT(0)
+#define CXL_ECS_THRESHOLD_COUNT_MASK GENMASK(2, 0)
+#define CXL_ECS_COUNT_MODE_MASK BIT(3)
+#define CXL_ECS_RESET_COUNTER_MASK BIT(4)
+#define CXL_ECS_RESET_COUNTER 1
+
+enum {
+	ECS_THRESHOLD_256 = 256,
+	ECS_THRESHOLD_1024 = 1024,
+	ECS_THRESHOLD_4096 = 4096,
+};
+
+enum {
+	ECS_THRESHOLD_IDX_256 = 3,
+	ECS_THRESHOLD_IDX_1024 = 4,
+	ECS_THRESHOLD_IDX_4096 = 5,
+};
+
+static const u16 ecs_supp_threshold[] = {
+	[ECS_THRESHOLD_IDX_256] = 256,
+	[ECS_THRESHOLD_IDX_1024] = 1024,
+	[ECS_THRESHOLD_IDX_4096] = 4096,
+};
+
+enum {
+	ECS_LOG_ENTRY_TYPE_DRAM = 0x0,
+	ECS_LOG_ENTRY_TYPE_MEM_MEDIA_FRU = 0x1,
+};
+
+enum cxl_ecs_count_mode {
+	ECS_MODE_COUNTS_ROWS = 0,
+	ECS_MODE_COUNTS_CODEWORDS = 1,
+};
+
+static int cxl_mem_ecs_get_attrbs(struct device *dev,
+				  struct cxl_ecs_context *cxl_ecs_ctx,
+				  int fru_id, u8 *log_cap, u16 *config)
+{
+	struct cxl_memdev *cxlmd = cxl_ecs_ctx->cxlmd;
+	struct cxl_mailbox *cxl_mbox = &cxlmd->cxlds->cxl_mbox;
+	struct cxl_ecs_fru_rd_attrbs *fru_rd_attrbs;
+	size_t rd_data_size;
+	size_t data_size;
+
+	rd_data_size = cxl_ecs_ctx->get_feat_size;
+
+	struct cxl_ecs_rd_attrbs *rd_attrbs __free(kvfree) =
+		kvzalloc(rd_data_size, GFP_KERNEL);
+	if (!rd_attrbs)
+		return -ENOMEM;
+
+	data_size = cxl_get_feature(cxl_mbox, &CXL_FEAT_ECS_UUID,
+				    CXL_GET_FEAT_SEL_CURRENT_VALUE, rd_attrbs,
+				    rd_data_size, 0, NULL);
+	if (!data_size)
+		return -EIO;
+
+	fru_rd_attrbs = rd_attrbs->fru_attrbs;
+	*log_cap = rd_attrbs->ecs_log_cap;
+	*config = le16_to_cpu(fru_rd_attrbs[fru_id].ecs_config);
+
+	return 0;
+}
+
+static int cxl_mem_ecs_set_attrbs(struct device *dev,
+				  struct cxl_ecs_context *cxl_ecs_ctx,
+				  int fru_id, u8 log_cap, u16 config)
+{
+	struct cxl_memdev *cxlmd = cxl_ecs_ctx->cxlmd;
+	struct cxl_mailbox *cxl_mbox = &cxlmd->cxlds->cxl_mbox;
+	struct cxl_ecs_fru_rd_attrbs *fru_rd_attrbs;
+	struct cxl_ecs_fru_wr_attrbs *fru_wr_attrbs;
+	size_t rd_data_size, wr_data_size;
+	u16 num_media_frus, count;
+	size_t data_size;
+
+	num_media_frus = cxl_ecs_ctx->num_media_frus;
+	rd_data_size = cxl_ecs_ctx->get_feat_size;
+	wr_data_size = cxl_ecs_ctx->set_feat_size;
+	struct cxl_ecs_rd_attrbs *rd_attrbs __free(kvfree) =
+		kvzalloc(rd_data_size, GFP_KERNEL);
+	if (!rd_attrbs)
+		return -ENOMEM;
+
+	data_size = cxl_get_feature(cxl_mbox, &CXL_FEAT_ECS_UUID,
+				    CXL_GET_FEAT_SEL_CURRENT_VALUE, rd_attrbs,
+				    rd_data_size, 0, NULL);
+	if (!data_size)
+		return -EIO;
+
+	struct cxl_ecs_wr_attrbs *wr_attrbs __free(kvfree) =
+		kvzalloc(wr_data_size, GFP_KERNEL);
+	if (!wr_attrbs)
+		return -ENOMEM;
+
+	/*
+	 * Fill writable attributes from the current attributes read
+	 * for all the media FRUs.
+	 */
+	fru_rd_attrbs = rd_attrbs->fru_attrbs;
+	fru_wr_attrbs = wr_attrbs->fru_attrbs;
+	wr_attrbs->ecs_log_cap = log_cap;
+	for (count = 0; count < num_media_frus; count++)
+		fru_wr_attrbs[count].ecs_config =
+			fru_rd_attrbs[count].ecs_config;
+
+	fru_wr_attrbs[fru_id].ecs_config = cpu_to_le16(config);
+
+	return cxl_set_feature(cxl_mbox, &CXL_FEAT_ECS_UUID,
+			       cxl_ecs_ctx->set_version, wr_attrbs,
+			       wr_data_size,
+			       CXL_SET_FEAT_FLAG_DATA_SAVED_ACROSS_RESET,
+			       0, NULL);
+}
+
+static u8 cxl_get_ecs_log_entry_type(u8 log_cap, u16 config)
+{
+	return FIELD_GET(CXL_ECS_LOG_ENTRY_TYPE_MASK, log_cap);
+}
+
+static u16 cxl_get_ecs_threshold(u8 log_cap, u16 config)
+{
+	u8 index = FIELD_GET(CXL_ECS_THRESHOLD_COUNT_MASK, config);
+
+	return ecs_supp_threshold[index];
+}
+
+static u8 cxl_get_ecs_count_mode(u8 log_cap, u16 config)
+{
+	return FIELD_GET(CXL_ECS_COUNT_MODE_MASK, config);
+}
+
+#define CXL_ECS_GET_ATTR(attrb) \
+	static int cxl_ecs_get_##attrb(struct device *dev, void *drv_data, \
+				       int fru_id, u32 *val) \
+	{ \
+		struct cxl_ecs_context *ctx = drv_data; \
+		u8 log_cap; \
+		u16 config; \
+		int ret; \
+		\
+		ret = cxl_mem_ecs_get_attrbs(dev, ctx, fru_id, &log_cap, \
+					     &config); \
+		if (ret) \
+			return ret; \
+		\
+		*val = cxl_get_ecs_##attrb(log_cap, config); \
+		\
+		return 0; \
+	}
+
+CXL_ECS_GET_ATTR(log_entry_type)
+CXL_ECS_GET_ATTR(count_mode)
+CXL_ECS_GET_ATTR(threshold)
+
+static int cxl_set_ecs_log_entry_type(struct device *dev, u8 *log_cap,
+				      u16 *config, u32 val)
+{
+	if (val != ECS_LOG_ENTRY_TYPE_DRAM &&
+	    val != ECS_LOG_ENTRY_TYPE_MEM_MEDIA_FRU)
+		return -EINVAL;
+
+	*log_cap = FIELD_PREP(CXL_ECS_LOG_ENTRY_TYPE_MASK, val);
+
+	return 0;
+}
+
+static int cxl_set_ecs_threshold(struct device *dev, u8 *log_cap, u16 *config,
+				 u32 val)
+{
+	*config &= ~CXL_ECS_THRESHOLD_COUNT_MASK;
+
+	switch (val) {
+	case ECS_THRESHOLD_256:
+		*config |= FIELD_PREP(CXL_ECS_THRESHOLD_COUNT_MASK,
+				      ECS_THRESHOLD_IDX_256);
+		break;
+	case ECS_THRESHOLD_1024:
+		*config |= FIELD_PREP(CXL_ECS_THRESHOLD_COUNT_MASK,
+				      ECS_THRESHOLD_IDX_1024);
+		break;
+	case ECS_THRESHOLD_4096:
+		*config |= FIELD_PREP(CXL_ECS_THRESHOLD_COUNT_MASK,
+				      ECS_THRESHOLD_IDX_4096);
+		break;
+	default:
+		dev_dbg(dev, "Invalid CXL ECS threshold count(%d) to set\n",
+			val);
+		dev_dbg(dev, "Supported ECS threshold counts: %u, %u, %u\n",
+			ECS_THRESHOLD_256, ECS_THRESHOLD_1024,
+			ECS_THRESHOLD_4096);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int cxl_set_ecs_count_mode(struct device *dev, u8 *log_cap, u16 *config,
+				  u32 val)
+{
+	if (val != ECS_MODE_COUNTS_ROWS && val != ECS_MODE_COUNTS_CODEWORDS) {
+		dev_dbg(dev, "Invalid CXL ECS scrub mode(%d) to set\n", val);
+		dev_dbg(dev,
+			"Supported ECS Modes: 0: ECS counts rows with errors,"
+			" 1: ECS counts codewords with errors\n");
+		return -EINVAL;
+	}
+
+	*config &= ~CXL_ECS_COUNT_MODE_MASK;
+	*config |= FIELD_PREP(CXL_ECS_COUNT_MODE_MASK, val);
+
+	return 0;
+}
+
+static int cxl_set_ecs_reset_counter(struct device *dev, u8 *log_cap,
+				     u16 *config, u32 val)
+{
+	if (val != CXL_ECS_RESET_COUNTER)
+		return -EINVAL;
+
+	*config &= ~CXL_ECS_RESET_COUNTER_MASK;
+	*config |= FIELD_PREP(CXL_ECS_RESET_COUNTER_MASK, val);
+
+	return 0;
+}
+
+#define CXL_ECS_SET_ATTR(attrb) \
+	static int cxl_ecs_set_##attrb(struct device *dev, void *drv_data, \
+				       int fru_id, u32 val) \
+	{ \
+		struct cxl_ecs_context *ctx = drv_data; \
+		u8 log_cap; \
+		u16 config; \
+		int ret; \
+		\
+		if (!capable(CAP_SYS_RAWIO)) \
+			return -EPERM; \
+		\
+		ret = cxl_mem_ecs_get_attrbs(dev, ctx, fru_id, &log_cap, \
+					     &config); \
+		if (ret) \
+			return ret; \
+		\
+		ret = cxl_set_ecs_##attrb(dev, &log_cap, &config, val); \
+		if (ret) \
+			return ret; \
+		\
+		return cxl_mem_ecs_set_attrbs(dev, ctx, fru_id, log_cap, \
+					      config); \
+	}
+CXL_ECS_SET_ATTR(log_entry_type)
+CXL_ECS_SET_ATTR(count_mode)
+CXL_ECS_SET_ATTR(reset_counter)
+CXL_ECS_SET_ATTR(threshold)
+
+static const struct edac_ecs_ops cxl_ecs_ops = {
+	.get_log_entry_type = cxl_ecs_get_log_entry_type,
+	.set_log_entry_type = cxl_ecs_set_log_entry_type,
+	.get_mode = cxl_ecs_get_count_mode,
+	.set_mode = cxl_ecs_set_count_mode,
+	.reset = cxl_ecs_set_reset_counter,
+	.get_threshold = cxl_ecs_get_threshold,
+	.set_threshold = cxl_ecs_set_threshold,
+};
+
+static int cxl_memdev_ecs_init(struct cxl_memdev *cxlmd,
+			       struct edac_dev_feature *ras_feature)
+{
+	struct cxl_ecs_context *cxl_ecs_ctx;
+	struct cxl_feat_entry *feat_entry;
+	int num_media_frus;
+
+	feat_entry =
+		cxl_feature_info(to_cxlfs(cxlmd->cxlds), &CXL_FEAT_ECS_UUID);
+	if (IS_ERR(feat_entry))
+		return -EOPNOTSUPP;
+
+	if (!(le32_to_cpu(feat_entry->flags) & CXL_FEATURE_F_CHANGEABLE))
+		return -EOPNOTSUPP;
+
+	num_media_frus = (le16_to_cpu(feat_entry->get_feat_size) -
+			  sizeof(struct cxl_ecs_rd_attrbs)) /
+			 sizeof(struct cxl_ecs_fru_rd_attrbs);
+	if (!num_media_frus)
+		return -EOPNOTSUPP;
+
+	cxl_ecs_ctx =
+		devm_kzalloc(&cxlmd->dev, sizeof(*cxl_ecs_ctx), GFP_KERNEL);
+	if (!cxl_ecs_ctx)
+		return -ENOMEM;
+
+	*cxl_ecs_ctx = (struct cxl_ecs_context){
+		.get_feat_size = le16_to_cpu(feat_entry->get_feat_size),
+		.set_feat_size = le16_to_cpu(feat_entry->set_feat_size),
+		.get_version = feat_entry->get_feat_ver,
+		.set_version = feat_entry->set_feat_ver,
+		.effects = le16_to_cpu(feat_entry->effects),
+		.num_media_frus = num_media_frus,
+		.cxlmd = cxlmd,
+	};
+
+	ras_feature->ft_type = RAS_FEAT_ECS;
+	ras_feature->ecs_ops = &cxl_ecs_ops;
+	ras_feature->ctx = cxl_ecs_ctx;
+	ras_feature->ecs_info.num_media_frus = num_media_frus;
+
+	return 0;
+}
+
+/*
+ * Perform Maintenance CXL 3.2 Spec 8.2.10.7.1
+ */
+
+/*
+ * Perform Maintenance input payload
+ * CXL rev 3.2 section 8.2.10.7.1 Table 8-117
+ */
+struct cxl_mbox_maintenance_hdr {
+	u8 op_class;
+	u8 op_subclass;
+} __packed;
+
+static int cxl_perform_maintenance(struct cxl_mailbox *cxl_mbox, u8 class,
+				   u8 subclass, void *data_in,
+				   size_t data_in_size)
+{
+	struct cxl_memdev_maintenance_pi {
+		struct cxl_mbox_maintenance_hdr hdr;
+		u8 data[];
+	} __packed;
+	struct cxl_mbox_cmd mbox_cmd;
+	size_t hdr_size;
+
+	struct cxl_memdev_maintenance_pi *pi __free(kvfree) =
+		kvzalloc(cxl_mbox->payload_size, GFP_KERNEL);
+	if (!pi)
+		return -ENOMEM;
+
+	pi->hdr.op_class = class;
+	pi->hdr.op_subclass = subclass;
+	hdr_size = sizeof(pi->hdr);
+	/*
+	 * Check minimum mbox payload size is available for
+	 * the maintenance data transfer.
+	 */
+	if (hdr_size + data_in_size > cxl_mbox->payload_size)
+		return -ENOMEM;
+
+	memcpy(pi->data, data_in, data_in_size);
+	mbox_cmd = (struct cxl_mbox_cmd){
+		.opcode = CXL_MBOX_OP_DO_MAINTENANCE,
+		.size_in = hdr_size + data_in_size,
+		.payload_in = pi,
+	};
+
+	return cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
+}
+
+/*
+ * Support for finding a memory operation attributes
+ * are from the current boot or not.
+ */
+
+struct cxl_mem_err_rec {
+	struct xarray rec_gen_media;
+	struct xarray rec_dram;
+};
+
+enum cxl_mem_repair_type {
+	CXL_PPR,
+	CXL_CACHELINE_SPARING,
+	CXL_ROW_SPARING,
+	CXL_BANK_SPARING,
+	CXL_RANK_SPARING,
+	CXL_REPAIR_MAX,
+};
+
+/**
+ * struct cxl_mem_repair_attrbs - CXL memory repair attributes
+ * @dpa: DPA of memory to repair
+ * @nibble_mask: nibble mask, identifies one or more nibbles on the memory bus
+ * @row: row of memory to repair
+ * @column: column of memory to repair
+ * @channel: channel of memory to repair
+ * @sub_channel: sub channel of memory to repair
+ * @rank: rank of memory to repair
+ * @bank_group: bank group of memory to repair
+ * @bank: bank of memory to repair
+ * @repair_type: repair type. For eg. PPR, memory sparing etc.
+ */
+struct cxl_mem_repair_attrbs {
+	u64 dpa;
+	u32 nibble_mask;
+	u32 row;
+	u16 column;
+	u8 channel;
+	u8 sub_channel;
+	u8 rank;
+	u8 bank_group;
+	u8 bank;
+	enum cxl_mem_repair_type repair_type;
+};
+
+static struct cxl_event_gen_media *
+cxl_find_rec_gen_media(struct cxl_memdev *cxlmd,
+		       struct cxl_mem_repair_attrbs *attrbs)
+{
+	struct cxl_mem_err_rec *array_rec = cxlmd->err_rec_array;
+	struct cxl_event_gen_media *rec;
+
+	if (!array_rec)
+		return NULL;
+
+	rec = xa_load(&array_rec->rec_gen_media, attrbs->dpa);
+	if (!rec)
+		return NULL;
+
+	if (attrbs->repair_type == CXL_PPR)
+		return rec;
+
+	return NULL;
+}
+
+static struct cxl_event_dram *
+cxl_find_rec_dram(struct cxl_memdev *cxlmd,
+		  struct cxl_mem_repair_attrbs *attrbs)
+{
+	struct cxl_mem_err_rec *array_rec = cxlmd->err_rec_array;
+	struct cxl_event_dram *rec;
+	u16 validity_flags;
+
+	if (!array_rec)
+		return NULL;
+
+	rec = xa_load(&array_rec->rec_dram, attrbs->dpa);
+	if (!rec)
+		return NULL;
+
+	validity_flags = get_unaligned_le16(rec->media_hdr.validity_flags);
+	if (!(validity_flags & CXL_DER_VALID_CHANNEL) ||
+	    !(validity_flags & CXL_DER_VALID_RANK))
+		return NULL;
+
+	switch (attrbs->repair_type) {
+	case CXL_PPR:
+		if (!(validity_flags & CXL_DER_VALID_NIBBLE) ||
+		    get_unaligned_le24(rec->nibble_mask) == attrbs->nibble_mask)
+			return rec;
+		break;
+	case CXL_CACHELINE_SPARING:
+		if (!(validity_flags & CXL_DER_VALID_BANK_GROUP) ||
+		    !(validity_flags & CXL_DER_VALID_BANK) ||
+		    !(validity_flags & CXL_DER_VALID_ROW) ||
+		    !(validity_flags & CXL_DER_VALID_COLUMN))
+			return NULL;
+
+		if (rec->media_hdr.channel == attrbs->channel &&
+		    rec->media_hdr.rank == attrbs->rank &&
+		    rec->bank_group == attrbs->bank_group &&
+		    rec->bank == attrbs->bank &&
+		    get_unaligned_le24(rec->row) == attrbs->row &&
+		    get_unaligned_le16(rec->column) == attrbs->column &&
+		    (!(validity_flags & CXL_DER_VALID_NIBBLE) ||
+		     get_unaligned_le24(rec->nibble_mask) ==
+			     attrbs->nibble_mask) &&
+		    (!(validity_flags & CXL_DER_VALID_SUB_CHANNEL) ||
+		     rec->sub_channel == attrbs->sub_channel))
+			return rec;
+		break;
+	case CXL_ROW_SPARING:
+		if (!(validity_flags & CXL_DER_VALID_BANK_GROUP) ||
+		    !(validity_flags & CXL_DER_VALID_BANK) ||
+		    !(validity_flags & CXL_DER_VALID_ROW))
+			return NULL;
+
+		if (rec->media_hdr.channel == attrbs->channel &&
+		    rec->media_hdr.rank == attrbs->rank &&
+		    rec->bank_group == attrbs->bank_group &&
+		    rec->bank == attrbs->bank &&
+		    get_unaligned_le24(rec->row) == attrbs->row &&
+		    (!(validity_flags & CXL_DER_VALID_NIBBLE) ||
+		     get_unaligned_le24(rec->nibble_mask) ==
+			     attrbs->nibble_mask))
+			return rec;
+		break;
+	case CXL_BANK_SPARING:
+		if (!(validity_flags & CXL_DER_VALID_BANK_GROUP) ||
+		    !(validity_flags & CXL_DER_VALID_BANK))
+			return NULL;
+
+		if (rec->media_hdr.channel == attrbs->channel &&
+		    rec->media_hdr.rank == attrbs->rank &&
+		    rec->bank_group == attrbs->bank_group &&
+		    rec->bank == attrbs->bank &&
+		    (!(validity_flags & CXL_DER_VALID_NIBBLE) ||
+		     get_unaligned_le24(rec->nibble_mask) ==
+			     attrbs->nibble_mask))
+			return rec;
+		break;
+	case CXL_RANK_SPARING:
+		if (rec->media_hdr.channel == attrbs->channel &&
+		    rec->media_hdr.rank == attrbs->rank &&
+		    (!(validity_flags & CXL_DER_VALID_NIBBLE) ||
+		     get_unaligned_le24(rec->nibble_mask) ==
+			     attrbs->nibble_mask))
+			return rec;
+		break;
+	default:
+		return NULL;
+	}
+
+	return NULL;
+}
+
+#define CXL_MAX_STORAGE_DAYS 10
+#define CXL_MAX_STORAGE_TIME_SECS (CXL_MAX_STORAGE_DAYS * 24 * 60 * 60)
+
+static void cxl_del_expired_gmedia_recs(struct xarray *rec_xarray,
+					struct cxl_event_gen_media *cur_rec)
+{
+	u64 cur_ts = le64_to_cpu(cur_rec->media_hdr.hdr.timestamp);
+	struct cxl_event_gen_media *rec;
+	unsigned long index;
+	u64 delta_ts_secs;
+
+	xa_for_each(rec_xarray, index, rec) {
+		delta_ts_secs = (cur_ts -
+			le64_to_cpu(rec->media_hdr.hdr.timestamp)) / 1000000000ULL;
+		if (delta_ts_secs >= CXL_MAX_STORAGE_TIME_SECS) {
+			xa_erase(rec_xarray, index);
+			kfree(rec);
+		}
+	}
+}
+
+static void cxl_del_expired_dram_recs(struct xarray *rec_xarray,
+				      struct cxl_event_dram *cur_rec)
+{
+	u64 cur_ts = le64_to_cpu(cur_rec->media_hdr.hdr.timestamp);
+	struct cxl_event_dram *rec;
+	unsigned long index;
+	u64 delta_secs;
+
+	xa_for_each(rec_xarray, index, rec) {
+		delta_secs = (cur_ts -
+			le64_to_cpu(rec->media_hdr.hdr.timestamp)) / 1000000000ULL;
+		if (delta_secs >= CXL_MAX_STORAGE_TIME_SECS) {
+			xa_erase(rec_xarray, index);
+			kfree(rec);
+		}
+	}
+}
+
+#define CXL_MAX_REC_STORAGE_COUNT 200
+
+static void cxl_del_overflow_old_recs(struct xarray *rec_xarray)
+{
+	void *err_rec;
+	unsigned long index, count = 0;
+
+	xa_for_each(rec_xarray, index, err_rec)
+		count++;
+
+	if (count <= CXL_MAX_REC_STORAGE_COUNT)
+		return;
+
+	count -= CXL_MAX_REC_STORAGE_COUNT;
+	xa_for_each(rec_xarray, index, err_rec) {
+		xa_erase(rec_xarray, index);
+		kfree(err_rec);
+		count--;
+		if (!count)
+			break;
+	}
+}
+
+int cxl_store_rec_gen_media(struct cxl_memdev *cxlmd, union cxl_event *evt)
+{
+	struct cxl_mem_err_rec *array_rec = cxlmd->err_rec_array;
+	struct cxl_event_gen_media *rec;
+	void *old_rec;
+
+	if (!IS_ENABLED(CONFIG_CXL_EDAC_MEM_REPAIR) || !array_rec)
+		return 0;
+
+	rec = kmemdup(&evt->gen_media, sizeof(*rec), GFP_KERNEL);
+	if (!rec)
+		return -ENOMEM;
+
+	old_rec = xa_store(&array_rec->rec_gen_media,
+			   le64_to_cpu(rec->media_hdr.phys_addr), rec,
+			   GFP_KERNEL);
+	if (xa_is_err(old_rec))
+		return xa_err(old_rec);
+
+	kfree(old_rec);
+
+	cxl_del_expired_gmedia_recs(&array_rec->rec_gen_media, rec);
+	cxl_del_overflow_old_recs(&array_rec->rec_gen_media);
+
+	return 0;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_store_rec_gen_media, "CXL");
+
+int cxl_store_rec_dram(struct cxl_memdev *cxlmd, union cxl_event *evt)
+{
+	struct cxl_mem_err_rec *array_rec = cxlmd->err_rec_array;
+	struct cxl_event_dram *rec;
+	void *old_rec;
+
+	if (!IS_ENABLED(CONFIG_CXL_EDAC_MEM_REPAIR) || !array_rec)
+		return 0;
+
+	rec = kmemdup(&evt->dram, sizeof(*rec), GFP_KERNEL);
+	if (!rec)
+		return -ENOMEM;
+
+	old_rec = xa_store(&array_rec->rec_dram,
+			   le64_to_cpu(rec->media_hdr.phys_addr), rec,
+			   GFP_KERNEL);
+	if (xa_is_err(old_rec))
+		return xa_err(old_rec);
+
+	kfree(old_rec);
+
+	cxl_del_expired_dram_recs(&array_rec->rec_dram, rec);
+	cxl_del_overflow_old_recs(&array_rec->rec_dram);
+
+	return 0;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_store_rec_dram, "CXL");
+
+static bool cxl_is_memdev_memory_online(const struct cxl_memdev *cxlmd)
+{
+	struct cxl_port *port = cxlmd->endpoint;
+
+	if (port && cxl_num_decoders_committed(port))
+		return true;
+
+	return false;
+}
+
+/*
+ * CXL memory sparing control
+ */
+enum cxl_mem_sparing_granularity {
+	CXL_MEM_SPARING_CACHELINE,
+	CXL_MEM_SPARING_ROW,
+	CXL_MEM_SPARING_BANK,
+	CXL_MEM_SPARING_RANK,
+	CXL_MEM_SPARING_MAX
+};
+
+struct cxl_mem_sparing_context {
+	struct cxl_memdev *cxlmd;
+	uuid_t repair_uuid;
+	u16 get_feat_size;
+	u16 set_feat_size;
+	u16 effects;
+	u8 instance;
+	u8 get_version;
+	u8 set_version;
+	u8 op_class;
+	u8 op_subclass;
+	bool cap_safe_when_in_use;
+	bool cap_hard_sparing;
+	bool cap_soft_sparing;
+	u8 channel;
+	u8 rank;
+	u8 bank_group;
+	u32 nibble_mask;
+	u64 dpa;
+	u32 row;
+	u16 column;
+	u8 bank;
+	u8 sub_channel;
+	enum edac_mem_repair_type repair_type;
+	bool persist_mode;
+};
+
+#define CXL_SPARING_RD_CAP_SAFE_IN_USE_MASK BIT(0)
+#define CXL_SPARING_RD_CAP_HARD_SPARING_MASK BIT(1)
+#define CXL_SPARING_RD_CAP_SOFT_SPARING_MASK BIT(2)
+
+#define CXL_SPARING_WR_DEVICE_INITIATED_MASK BIT(0)
+
+#define CXL_SPARING_QUERY_RESOURCE_FLAG BIT(0)
+#define CXL_SET_HARD_SPARING_FLAG BIT(1)
+#define CXL_SPARING_SUB_CHNL_VALID_FLAG BIT(2)
+#define CXL_SPARING_NIB_MASK_VALID_FLAG BIT(3)
+
1201 + #define CXL_GET_SPARING_SAFE_IN_USE(flags) \ 1202 + (FIELD_GET(CXL_SPARING_RD_CAP_SAFE_IN_USE_MASK, \ 1203 + flags) ^ 1) 1204 + #define CXL_GET_CAP_HARD_SPARING(flags) \ 1205 + FIELD_GET(CXL_SPARING_RD_CAP_HARD_SPARING_MASK, \ 1206 + flags) 1207 + #define CXL_GET_CAP_SOFT_SPARING(flags) \ 1208 + FIELD_GET(CXL_SPARING_RD_CAP_SOFT_SPARING_MASK, \ 1209 + flags) 1210 + 1211 + #define CXL_SET_SPARING_QUERY_RESOURCE(val) \ 1212 + FIELD_PREP(CXL_SPARING_QUERY_RESOURCE_FLAG, val) 1213 + #define CXL_SET_HARD_SPARING(val) \ 1214 + FIELD_PREP(CXL_SET_HARD_SPARING_FLAG, val) 1215 + #define CXL_SET_SPARING_SUB_CHNL_VALID(val) \ 1216 + FIELD_PREP(CXL_SPARING_SUB_CHNL_VALID_FLAG, val) 1217 + #define CXL_SET_SPARING_NIB_MASK_VALID(val) \ 1218 + FIELD_PREP(CXL_SPARING_NIB_MASK_VALID_FLAG, val) 1219 + 1220 + /* 1221 + * See CXL spec rev 3.2 @8.2.10.7.2.3 Table 8-134 Memory Sparing Feature 1222 + * Readable Attributes. 1223 + */ 1224 + struct cxl_memdev_repair_rd_attrbs_hdr { 1225 + u8 max_op_latency; 1226 + __le16 op_cap; 1227 + __le16 op_mode; 1228 + u8 op_class; 1229 + u8 op_subclass; 1230 + u8 rsvd[9]; 1231 + } __packed; 1232 + 1233 + struct cxl_memdev_sparing_rd_attrbs { 1234 + struct cxl_memdev_repair_rd_attrbs_hdr hdr; 1235 + u8 rsvd; 1236 + __le16 restriction_flags; 1237 + } __packed; 1238 + 1239 + /* 1240 + * See CXL spec rev 3.2 @8.2.10.7.1.4 Table 8-120 Memory Sparing Input Payload. 
1241 + */ 1242 + struct cxl_memdev_sparing_in_payload { 1243 + u8 flags; 1244 + u8 channel; 1245 + u8 rank; 1246 + u8 nibble_mask[3]; 1247 + u8 bank_group; 1248 + u8 bank; 1249 + u8 row[3]; 1250 + __le16 column; 1251 + u8 sub_channel; 1252 + } __packed; 1253 + 1254 + static int 1255 + cxl_mem_sparing_get_attrbs(struct cxl_mem_sparing_context *cxl_sparing_ctx) 1256 + { 1257 + size_t rd_data_size = sizeof(struct cxl_memdev_sparing_rd_attrbs); 1258 + struct cxl_memdev *cxlmd = cxl_sparing_ctx->cxlmd; 1259 + struct cxl_mailbox *cxl_mbox = &cxlmd->cxlds->cxl_mbox; 1260 + u16 restriction_flags; 1261 + size_t data_size; 1262 + u16 return_code; 1263 + struct cxl_memdev_sparing_rd_attrbs *rd_attrbs __free(kfree) = 1264 + kzalloc(rd_data_size, GFP_KERNEL); 1265 + if (!rd_attrbs) 1266 + return -ENOMEM; 1267 + 1268 + data_size = cxl_get_feature(cxl_mbox, &cxl_sparing_ctx->repair_uuid, 1269 + CXL_GET_FEAT_SEL_CURRENT_VALUE, rd_attrbs, 1270 + rd_data_size, 0, &return_code); 1271 + if (!data_size) 1272 + return -EIO; 1273 + 1274 + cxl_sparing_ctx->op_class = rd_attrbs->hdr.op_class; 1275 + cxl_sparing_ctx->op_subclass = rd_attrbs->hdr.op_subclass; 1276 + restriction_flags = le16_to_cpu(rd_attrbs->restriction_flags); 1277 + cxl_sparing_ctx->cap_safe_when_in_use = 1278 + CXL_GET_SPARING_SAFE_IN_USE(restriction_flags); 1279 + cxl_sparing_ctx->cap_hard_sparing = 1280 + CXL_GET_CAP_HARD_SPARING(restriction_flags); 1281 + cxl_sparing_ctx->cap_soft_sparing = 1282 + CXL_GET_CAP_SOFT_SPARING(restriction_flags); 1283 + 1284 + return 0; 1285 + } 1286 + 1287 + static struct cxl_event_dram * 1288 + cxl_mem_get_rec_dram(struct cxl_memdev *cxlmd, 1289 + struct cxl_mem_sparing_context *ctx) 1290 + { 1291 + struct cxl_mem_repair_attrbs attrbs = { 0 }; 1292 + 1293 + attrbs.dpa = ctx->dpa; 1294 + attrbs.channel = ctx->channel; 1295 + attrbs.rank = ctx->rank; 1296 + attrbs.nibble_mask = ctx->nibble_mask; 1297 + switch (ctx->repair_type) { 1298 + case EDAC_REPAIR_CACHELINE_SPARING: 1299 + 
attrbs.repair_type = CXL_CACHELINE_SPARING; 1300 + attrbs.bank_group = ctx->bank_group; 1301 + attrbs.bank = ctx->bank; 1302 + attrbs.row = ctx->row; 1303 + attrbs.column = ctx->column; 1304 + attrbs.sub_channel = ctx->sub_channel; 1305 + break; 1306 + case EDAC_REPAIR_ROW_SPARING: 1307 + attrbs.repair_type = CXL_ROW_SPARING; 1308 + attrbs.bank_group = ctx->bank_group; 1309 + attrbs.bank = ctx->bank; 1310 + attrbs.row = ctx->row; 1311 + break; 1312 + case EDAC_REPAIR_BANK_SPARING: 1313 + attrbs.repair_type = CXL_BANK_SPARING; 1314 + attrbs.bank_group = ctx->bank_group; 1315 + attrbs.bank = ctx->bank; 1316 + break; 1317 + case EDAC_REPAIR_RANK_SPARING: 1318 + attrbs.repair_type = CXL_BANK_SPARING; 1319 + break; 1320 + default: 1321 + return NULL; 1322 + } 1323 + 1324 + return cxl_find_rec_dram(cxlmd, &attrbs); 1325 + } 1326 + 1327 + static int 1328 + cxl_mem_perform_sparing(struct device *dev, 1329 + struct cxl_mem_sparing_context *cxl_sparing_ctx) 1330 + { 1331 + struct cxl_memdev *cxlmd = cxl_sparing_ctx->cxlmd; 1332 + struct cxl_memdev_sparing_in_payload sparing_pi; 1333 + struct cxl_event_dram *rec = NULL; 1334 + u16 validity_flags = 0; 1335 + 1336 + struct rw_semaphore *region_lock __free(rwsem_read_release) = 1337 + rwsem_read_intr_acquire(&cxl_region_rwsem); 1338 + if (!region_lock) 1339 + return -EINTR; 1340 + 1341 + struct rw_semaphore *dpa_lock __free(rwsem_read_release) = 1342 + rwsem_read_intr_acquire(&cxl_dpa_rwsem); 1343 + if (!dpa_lock) 1344 + return -EINTR; 1345 + 1346 + if (!cxl_sparing_ctx->cap_safe_when_in_use) { 1347 + /* Memory to repair must be offline */ 1348 + if (cxl_is_memdev_memory_online(cxlmd)) 1349 + return -EBUSY; 1350 + } else { 1351 + if (cxl_is_memdev_memory_online(cxlmd)) { 1352 + rec = cxl_mem_get_rec_dram(cxlmd, cxl_sparing_ctx); 1353 + if (!rec) 1354 + return -EINVAL; 1355 + 1356 + if (!get_unaligned_le16(rec->media_hdr.validity_flags)) 1357 + return -EINVAL; 1358 + } 1359 + } 1360 + 1361 + memset(&sparing_pi, 0, 
sizeof(sparing_pi)); 1362 + sparing_pi.flags = CXL_SET_SPARING_QUERY_RESOURCE(0); 1363 + if (cxl_sparing_ctx->persist_mode) 1364 + sparing_pi.flags |= CXL_SET_HARD_SPARING(1); 1365 + 1366 + if (rec) 1367 + validity_flags = get_unaligned_le16(rec->media_hdr.validity_flags); 1368 + 1369 + switch (cxl_sparing_ctx->repair_type) { 1370 + case EDAC_REPAIR_CACHELINE_SPARING: 1371 + sparing_pi.column = cpu_to_le16(cxl_sparing_ctx->column); 1372 + if (!rec || (validity_flags & CXL_DER_VALID_SUB_CHANNEL)) { 1373 + sparing_pi.flags |= CXL_SET_SPARING_SUB_CHNL_VALID(1); 1374 + sparing_pi.sub_channel = cxl_sparing_ctx->sub_channel; 1375 + } 1376 + fallthrough; 1377 + case EDAC_REPAIR_ROW_SPARING: 1378 + put_unaligned_le24(cxl_sparing_ctx->row, sparing_pi.row); 1379 + fallthrough; 1380 + case EDAC_REPAIR_BANK_SPARING: 1381 + sparing_pi.bank_group = cxl_sparing_ctx->bank_group; 1382 + sparing_pi.bank = cxl_sparing_ctx->bank; 1383 + fallthrough; 1384 + case EDAC_REPAIR_RANK_SPARING: 1385 + sparing_pi.rank = cxl_sparing_ctx->rank; 1386 + fallthrough; 1387 + default: 1388 + sparing_pi.channel = cxl_sparing_ctx->channel; 1389 + if ((rec && (validity_flags & CXL_DER_VALID_NIBBLE)) || 1390 + (!rec && (!cxl_sparing_ctx->nibble_mask || 1391 + (cxl_sparing_ctx->nibble_mask & 0xFFFFFF)))) { 1392 + sparing_pi.flags |= CXL_SET_SPARING_NIB_MASK_VALID(1); 1393 + put_unaligned_le24(cxl_sparing_ctx->nibble_mask, 1394 + sparing_pi.nibble_mask); 1395 + } 1396 + break; 1397 + } 1398 + 1399 + return cxl_perform_maintenance(&cxlmd->cxlds->cxl_mbox, 1400 + cxl_sparing_ctx->op_class, 1401 + cxl_sparing_ctx->op_subclass, 1402 + &sparing_pi, sizeof(sparing_pi)); 1403 + } 1404 + 1405 + static int cxl_mem_sparing_get_repair_type(struct device *dev, void *drv_data, 1406 + const char **repair_type) 1407 + { 1408 + struct cxl_mem_sparing_context *ctx = drv_data; 1409 + 1410 + switch (ctx->repair_type) { 1411 + case EDAC_REPAIR_CACHELINE_SPARING: 1412 + case EDAC_REPAIR_ROW_SPARING: 1413 + case 
EDAC_REPAIR_BANK_SPARING: 1414 + case EDAC_REPAIR_RANK_SPARING: 1415 + *repair_type = edac_repair_type[ctx->repair_type]; 1416 + break; 1417 + default: 1418 + return -EINVAL; 1419 + } 1420 + 1421 + return 0; 1422 + } 1423 + 1424 + #define CXL_SPARING_GET_ATTR(attrb, data_type) \ 1425 + static int cxl_mem_sparing_get_##attrb( \ 1426 + struct device *dev, void *drv_data, data_type *val) \ 1427 + { \ 1428 + struct cxl_mem_sparing_context *ctx = drv_data; \ 1429 + \ 1430 + *val = ctx->attrb; \ 1431 + \ 1432 + return 0; \ 1433 + } 1434 + CXL_SPARING_GET_ATTR(persist_mode, bool) 1435 + CXL_SPARING_GET_ATTR(dpa, u64) 1436 + CXL_SPARING_GET_ATTR(nibble_mask, u32) 1437 + CXL_SPARING_GET_ATTR(bank_group, u32) 1438 + CXL_SPARING_GET_ATTR(bank, u32) 1439 + CXL_SPARING_GET_ATTR(rank, u32) 1440 + CXL_SPARING_GET_ATTR(row, u32) 1441 + CXL_SPARING_GET_ATTR(column, u32) 1442 + CXL_SPARING_GET_ATTR(channel, u32) 1443 + CXL_SPARING_GET_ATTR(sub_channel, u32) 1444 + 1445 + #define CXL_SPARING_SET_ATTR(attrb, data_type) \ 1446 + static int cxl_mem_sparing_set_##attrb(struct device *dev, \ 1447 + void *drv_data, data_type val) \ 1448 + { \ 1449 + struct cxl_mem_sparing_context *ctx = drv_data; \ 1450 + \ 1451 + ctx->attrb = val; \ 1452 + \ 1453 + return 0; \ 1454 + } 1455 + CXL_SPARING_SET_ATTR(nibble_mask, u32) 1456 + CXL_SPARING_SET_ATTR(bank_group, u32) 1457 + CXL_SPARING_SET_ATTR(bank, u32) 1458 + CXL_SPARING_SET_ATTR(rank, u32) 1459 + CXL_SPARING_SET_ATTR(row, u32) 1460 + CXL_SPARING_SET_ATTR(column, u32) 1461 + CXL_SPARING_SET_ATTR(channel, u32) 1462 + CXL_SPARING_SET_ATTR(sub_channel, u32) 1463 + 1464 + static int cxl_mem_sparing_set_persist_mode(struct device *dev, void *drv_data, 1465 + bool persist_mode) 1466 + { 1467 + struct cxl_mem_sparing_context *ctx = drv_data; 1468 + 1469 + if ((persist_mode && ctx->cap_hard_sparing) || 1470 + (!persist_mode && ctx->cap_soft_sparing)) 1471 + ctx->persist_mode = persist_mode; 1472 + else 1473 + return -EOPNOTSUPP; 1474 + 1475 + return 0; 
1476 + } 1477 + 1478 + static int cxl_get_mem_sparing_safe_when_in_use(struct device *dev, 1479 + void *drv_data, bool *safe) 1480 + { 1481 + struct cxl_mem_sparing_context *ctx = drv_data; 1482 + 1483 + *safe = ctx->cap_safe_when_in_use; 1484 + 1485 + return 0; 1486 + } 1487 + 1488 + static int cxl_mem_sparing_get_min_dpa(struct device *dev, void *drv_data, 1489 + u64 *min_dpa) 1490 + { 1491 + struct cxl_mem_sparing_context *ctx = drv_data; 1492 + struct cxl_memdev *cxlmd = ctx->cxlmd; 1493 + struct cxl_dev_state *cxlds = cxlmd->cxlds; 1494 + 1495 + *min_dpa = cxlds->dpa_res.start; 1496 + 1497 + return 0; 1498 + } 1499 + 1500 + static int cxl_mem_sparing_get_max_dpa(struct device *dev, void *drv_data, 1501 + u64 *max_dpa) 1502 + { 1503 + struct cxl_mem_sparing_context *ctx = drv_data; 1504 + struct cxl_memdev *cxlmd = ctx->cxlmd; 1505 + struct cxl_dev_state *cxlds = cxlmd->cxlds; 1506 + 1507 + *max_dpa = cxlds->dpa_res.end; 1508 + 1509 + return 0; 1510 + } 1511 + 1512 + static int cxl_mem_sparing_set_dpa(struct device *dev, void *drv_data, u64 dpa) 1513 + { 1514 + struct cxl_mem_sparing_context *ctx = drv_data; 1515 + struct cxl_memdev *cxlmd = ctx->cxlmd; 1516 + struct cxl_dev_state *cxlds = cxlmd->cxlds; 1517 + 1518 + if (dpa < cxlds->dpa_res.start || dpa > cxlds->dpa_res.end) 1519 + return -EINVAL; 1520 + 1521 + ctx->dpa = dpa; 1522 + 1523 + return 0; 1524 + } 1525 + 1526 + static int cxl_do_mem_sparing(struct device *dev, void *drv_data, u32 val) 1527 + { 1528 + struct cxl_mem_sparing_context *ctx = drv_data; 1529 + 1530 + if (val != EDAC_DO_MEM_REPAIR) 1531 + return -EINVAL; 1532 + 1533 + return cxl_mem_perform_sparing(dev, ctx); 1534 + } 1535 + 1536 + #define RANK_OPS \ 1537 + .get_repair_type = cxl_mem_sparing_get_repair_type, \ 1538 + .get_persist_mode = cxl_mem_sparing_get_persist_mode, \ 1539 + .set_persist_mode = cxl_mem_sparing_set_persist_mode, \ 1540 + .get_repair_safe_when_in_use = cxl_get_mem_sparing_safe_when_in_use, \ 1541 + .get_min_dpa = 
cxl_mem_sparing_get_min_dpa, \ 1542 + .get_max_dpa = cxl_mem_sparing_get_max_dpa, \ 1543 + .get_dpa = cxl_mem_sparing_get_dpa, \ 1544 + .set_dpa = cxl_mem_sparing_set_dpa, \ 1545 + .get_nibble_mask = cxl_mem_sparing_get_nibble_mask, \ 1546 + .set_nibble_mask = cxl_mem_sparing_set_nibble_mask, \ 1547 + .get_rank = cxl_mem_sparing_get_rank, \ 1548 + .set_rank = cxl_mem_sparing_set_rank, \ 1549 + .get_channel = cxl_mem_sparing_get_channel, \ 1550 + .set_channel = cxl_mem_sparing_set_channel, \ 1551 + .do_repair = cxl_do_mem_sparing 1552 + 1553 + #define BANK_OPS \ 1554 + RANK_OPS, .get_bank_group = cxl_mem_sparing_get_bank_group, \ 1555 + .set_bank_group = cxl_mem_sparing_set_bank_group, \ 1556 + .get_bank = cxl_mem_sparing_get_bank, \ 1557 + .set_bank = cxl_mem_sparing_set_bank 1558 + 1559 + #define ROW_OPS \ 1560 + BANK_OPS, .get_row = cxl_mem_sparing_get_row, \ 1561 + .set_row = cxl_mem_sparing_set_row 1562 + 1563 + #define CACHELINE_OPS \ 1564 + ROW_OPS, .get_column = cxl_mem_sparing_get_column, \ 1565 + .set_column = cxl_mem_sparing_set_column, \ 1566 + .get_sub_channel = cxl_mem_sparing_get_sub_channel, \ 1567 + .set_sub_channel = cxl_mem_sparing_set_sub_channel 1568 + 1569 + static const struct edac_mem_repair_ops cxl_rank_sparing_ops = { 1570 + RANK_OPS, 1571 + }; 1572 + 1573 + static const struct edac_mem_repair_ops cxl_bank_sparing_ops = { 1574 + BANK_OPS, 1575 + }; 1576 + 1577 + static const struct edac_mem_repair_ops cxl_row_sparing_ops = { 1578 + ROW_OPS, 1579 + }; 1580 + 1581 + static const struct edac_mem_repair_ops cxl_cacheline_sparing_ops = { 1582 + CACHELINE_OPS, 1583 + }; 1584 + 1585 + struct cxl_mem_sparing_desc { 1586 + const uuid_t repair_uuid; 1587 + enum edac_mem_repair_type repair_type; 1588 + const struct edac_mem_repair_ops *repair_ops; 1589 + }; 1590 + 1591 + static const struct cxl_mem_sparing_desc mem_sparing_desc[] = { 1592 + { 1593 + .repair_uuid = CXL_FEAT_CACHELINE_SPARING_UUID, 1594 + .repair_type = EDAC_REPAIR_CACHELINE_SPARING, 
1595 + .repair_ops = &cxl_cacheline_sparing_ops, 1596 + }, 1597 + { 1598 + .repair_uuid = CXL_FEAT_ROW_SPARING_UUID, 1599 + .repair_type = EDAC_REPAIR_ROW_SPARING, 1600 + .repair_ops = &cxl_row_sparing_ops, 1601 + }, 1602 + { 1603 + .repair_uuid = CXL_FEAT_BANK_SPARING_UUID, 1604 + .repair_type = EDAC_REPAIR_BANK_SPARING, 1605 + .repair_ops = &cxl_bank_sparing_ops, 1606 + }, 1607 + { 1608 + .repair_uuid = CXL_FEAT_RANK_SPARING_UUID, 1609 + .repair_type = EDAC_REPAIR_RANK_SPARING, 1610 + .repair_ops = &cxl_rank_sparing_ops, 1611 + }, 1612 + }; 1613 + 1614 + static int cxl_memdev_sparing_init(struct cxl_memdev *cxlmd, 1615 + struct edac_dev_feature *ras_feature, 1616 + const struct cxl_mem_sparing_desc *desc, 1617 + u8 repair_inst) 1618 + { 1619 + struct cxl_mem_sparing_context *cxl_sparing_ctx; 1620 + struct cxl_feat_entry *feat_entry; 1621 + int ret; 1622 + 1623 + feat_entry = cxl_feature_info(to_cxlfs(cxlmd->cxlds), 1624 + &desc->repair_uuid); 1625 + if (IS_ERR(feat_entry)) 1626 + return -EOPNOTSUPP; 1627 + 1628 + if (!(le32_to_cpu(feat_entry->flags) & CXL_FEATURE_F_CHANGEABLE)) 1629 + return -EOPNOTSUPP; 1630 + 1631 + cxl_sparing_ctx = devm_kzalloc(&cxlmd->dev, sizeof(*cxl_sparing_ctx), 1632 + GFP_KERNEL); 1633 + if (!cxl_sparing_ctx) 1634 + return -ENOMEM; 1635 + 1636 + *cxl_sparing_ctx = (struct cxl_mem_sparing_context){ 1637 + .get_feat_size = le16_to_cpu(feat_entry->get_feat_size), 1638 + .set_feat_size = le16_to_cpu(feat_entry->set_feat_size), 1639 + .get_version = feat_entry->get_feat_ver, 1640 + .set_version = feat_entry->set_feat_ver, 1641 + .effects = le16_to_cpu(feat_entry->effects), 1642 + .cxlmd = cxlmd, 1643 + .repair_type = desc->repair_type, 1644 + .instance = repair_inst++, 1645 + }; 1646 + uuid_copy(&cxl_sparing_ctx->repair_uuid, &desc->repair_uuid); 1647 + 1648 + ret = cxl_mem_sparing_get_attrbs(cxl_sparing_ctx); 1649 + if (ret) 1650 + return ret; 1651 + 1652 + if ((cxl_sparing_ctx->cap_soft_sparing && 1653 + cxl_sparing_ctx->cap_hard_sparing) 
|| 1654 + cxl_sparing_ctx->cap_soft_sparing) 1655 + cxl_sparing_ctx->persist_mode = 0; 1656 + else if (cxl_sparing_ctx->cap_hard_sparing) 1657 + cxl_sparing_ctx->persist_mode = 1; 1658 + else 1659 + return -EOPNOTSUPP; 1660 + 1661 + ras_feature->ft_type = RAS_FEAT_MEM_REPAIR; 1662 + ras_feature->instance = cxl_sparing_ctx->instance; 1663 + ras_feature->mem_repair_ops = desc->repair_ops; 1664 + ras_feature->ctx = cxl_sparing_ctx; 1665 + 1666 + return 0; 1667 + } 1668 + 1669 + /* 1670 + * CXL memory soft PPR & hard PPR control 1671 + */ 1672 + struct cxl_ppr_context { 1673 + uuid_t repair_uuid; 1674 + u8 instance; 1675 + u16 get_feat_size; 1676 + u16 set_feat_size; 1677 + u8 get_version; 1678 + u8 set_version; 1679 + u16 effects; 1680 + u8 op_class; 1681 + u8 op_subclass; 1682 + bool cap_dpa; 1683 + bool cap_nib_mask; 1684 + bool media_accessible; 1685 + bool data_retained; 1686 + struct cxl_memdev *cxlmd; 1687 + enum edac_mem_repair_type repair_type; 1688 + bool persist_mode; 1689 + u64 dpa; 1690 + u32 nibble_mask; 1691 + }; 1692 + 1693 + /* 1694 + * See CXL rev 3.2 @8.2.10.7.2.1 Table 8-128 sPPR Feature Readable Attributes 1695 + * 1696 + * See CXL rev 3.2 @8.2.10.7.2.2 Table 8-131 hPPR Feature Readable Attributes 1697 + */ 1698 + 1699 + #define CXL_PPR_OP_CAP_DEVICE_INITIATED BIT(0) 1700 + #define CXL_PPR_OP_MODE_DEV_INITIATED BIT(0) 1701 + 1702 + #define CXL_PPR_FLAG_DPA_SUPPORT_MASK BIT(0) 1703 + #define CXL_PPR_FLAG_NIB_SUPPORT_MASK BIT(1) 1704 + #define CXL_PPR_FLAG_MEM_SPARING_EV_REC_SUPPORT_MASK BIT(2) 1705 + #define CXL_PPR_FLAG_DEV_INITED_PPR_AT_BOOT_CAP_MASK BIT(3) 1706 + 1707 + #define CXL_PPR_RESTRICTION_FLAG_MEDIA_ACCESSIBLE_MASK BIT(0) 1708 + #define CXL_PPR_RESTRICTION_FLAG_DATA_RETAINED_MASK BIT(2) 1709 + 1710 + #define CXL_PPR_SPARING_EV_REC_EN_MASK BIT(0) 1711 + #define CXL_PPR_DEV_INITED_PPR_AT_BOOT_EN_MASK BIT(1) 1712 + 1713 + #define CXL_PPR_GET_CAP_DPA(flags) \ 1714 + FIELD_GET(CXL_PPR_FLAG_DPA_SUPPORT_MASK, flags) 1715 + #define 
CXL_PPR_GET_CAP_NIB_MASK(flags) \ 1716 + FIELD_GET(CXL_PPR_FLAG_NIB_SUPPORT_MASK, flags) 1717 + #define CXL_PPR_GET_MEDIA_ACCESSIBLE(restriction_flags) \ 1718 + (FIELD_GET(CXL_PPR_RESTRICTION_FLAG_MEDIA_ACCESSIBLE_MASK, \ 1719 + restriction_flags) ^ 1) 1720 + #define CXL_PPR_GET_DATA_RETAINED(restriction_flags) \ 1721 + (FIELD_GET(CXL_PPR_RESTRICTION_FLAG_DATA_RETAINED_MASK, \ 1722 + restriction_flags) ^ 1) 1723 + 1724 + struct cxl_memdev_ppr_rd_attrbs { 1725 + struct cxl_memdev_repair_rd_attrbs_hdr hdr; 1726 + u8 ppr_flags; 1727 + __le16 restriction_flags; 1728 + u8 ppr_op_mode; 1729 + } __packed; 1730 + 1731 + /* 1732 + * See CXL rev 3.2 @8.2.10.7.1.2 Table 8-118 sPPR Maintenance Input Payload 1733 + * 1734 + * See CXL rev 3.2 @8.2.10.7.1.3 Table 8-119 hPPR Maintenance Input Payload 1735 + */ 1736 + struct cxl_memdev_ppr_maintenance_attrbs { 1737 + u8 flags; 1738 + __le64 dpa; 1739 + u8 nibble_mask[3]; 1740 + } __packed; 1741 + 1742 + static int cxl_mem_ppr_get_attrbs(struct cxl_ppr_context *cxl_ppr_ctx) 1743 + { 1744 + size_t rd_data_size = sizeof(struct cxl_memdev_ppr_rd_attrbs); 1745 + struct cxl_memdev *cxlmd = cxl_ppr_ctx->cxlmd; 1746 + struct cxl_mailbox *cxl_mbox = &cxlmd->cxlds->cxl_mbox; 1747 + u16 restriction_flags; 1748 + size_t data_size; 1749 + u16 return_code; 1750 + 1751 + struct cxl_memdev_ppr_rd_attrbs *rd_attrbs __free(kfree) = 1752 + kmalloc(rd_data_size, GFP_KERNEL); 1753 + if (!rd_attrbs) 1754 + return -ENOMEM; 1755 + 1756 + data_size = cxl_get_feature(cxl_mbox, &cxl_ppr_ctx->repair_uuid, 1757 + CXL_GET_FEAT_SEL_CURRENT_VALUE, rd_attrbs, 1758 + rd_data_size, 0, &return_code); 1759 + if (!data_size) 1760 + return -EIO; 1761 + 1762 + cxl_ppr_ctx->op_class = rd_attrbs->hdr.op_class; 1763 + cxl_ppr_ctx->op_subclass = rd_attrbs->hdr.op_subclass; 1764 + cxl_ppr_ctx->cap_dpa = CXL_PPR_GET_CAP_DPA(rd_attrbs->ppr_flags); 1765 + cxl_ppr_ctx->cap_nib_mask = 1766 + CXL_PPR_GET_CAP_NIB_MASK(rd_attrbs->ppr_flags); 1767 + 1768 + restriction_flags = 
le16_to_cpu(rd_attrbs->restriction_flags); 1769 + cxl_ppr_ctx->media_accessible = 1770 + CXL_PPR_GET_MEDIA_ACCESSIBLE(restriction_flags); 1771 + cxl_ppr_ctx->data_retained = 1772 + CXL_PPR_GET_DATA_RETAINED(restriction_flags); 1773 + 1774 + return 0; 1775 + } 1776 + 1777 + static int cxl_mem_perform_ppr(struct cxl_ppr_context *cxl_ppr_ctx) 1778 + { 1779 + struct cxl_memdev_ppr_maintenance_attrbs maintenance_attrbs; 1780 + struct cxl_memdev *cxlmd = cxl_ppr_ctx->cxlmd; 1781 + struct cxl_mem_repair_attrbs attrbs = { 0 }; 1782 + 1783 + struct rw_semaphore *region_lock __free(rwsem_read_release) = 1784 + rwsem_read_intr_acquire(&cxl_region_rwsem); 1785 + if (!region_lock) 1786 + return -EINTR; 1787 + 1788 + struct rw_semaphore *dpa_lock __free(rwsem_read_release) = 1789 + rwsem_read_intr_acquire(&cxl_dpa_rwsem); 1790 + if (!dpa_lock) 1791 + return -EINTR; 1792 + 1793 + if (!cxl_ppr_ctx->media_accessible || !cxl_ppr_ctx->data_retained) { 1794 + /* Memory to repair must be offline */ 1795 + if (cxl_is_memdev_memory_online(cxlmd)) 1796 + return -EBUSY; 1797 + } else { 1798 + if (cxl_is_memdev_memory_online(cxlmd)) { 1799 + /* Check memory to repair is from the current boot */ 1800 + attrbs.repair_type = CXL_PPR; 1801 + attrbs.dpa = cxl_ppr_ctx->dpa; 1802 + attrbs.nibble_mask = cxl_ppr_ctx->nibble_mask; 1803 + if (!cxl_find_rec_dram(cxlmd, &attrbs) && 1804 + !cxl_find_rec_gen_media(cxlmd, &attrbs)) 1805 + return -EINVAL; 1806 + } 1807 + } 1808 + 1809 + memset(&maintenance_attrbs, 0, sizeof(maintenance_attrbs)); 1810 + maintenance_attrbs.flags = 0; 1811 + maintenance_attrbs.dpa = cpu_to_le64(cxl_ppr_ctx->dpa); 1812 + put_unaligned_le24(cxl_ppr_ctx->nibble_mask, 1813 + maintenance_attrbs.nibble_mask); 1814 + 1815 + return cxl_perform_maintenance(&cxlmd->cxlds->cxl_mbox, 1816 + cxl_ppr_ctx->op_class, 1817 + cxl_ppr_ctx->op_subclass, 1818 + &maintenance_attrbs, 1819 + sizeof(maintenance_attrbs)); 1820 + } 1821 + 1822 + static int cxl_ppr_get_repair_type(struct device *dev, 
void *drv_data, 1823 + const char **repair_type) 1824 + { 1825 + *repair_type = edac_repair_type[EDAC_REPAIR_PPR]; 1826 + 1827 + return 0; 1828 + } 1829 + 1830 + static int cxl_ppr_get_persist_mode(struct device *dev, void *drv_data, 1831 + bool *persist_mode) 1832 + { 1833 + struct cxl_ppr_context *cxl_ppr_ctx = drv_data; 1834 + 1835 + *persist_mode = cxl_ppr_ctx->persist_mode; 1836 + 1837 + return 0; 1838 + } 1839 + 1840 + static int cxl_get_ppr_safe_when_in_use(struct device *dev, void *drv_data, 1841 + bool *safe) 1842 + { 1843 + struct cxl_ppr_context *cxl_ppr_ctx = drv_data; 1844 + 1845 + *safe = cxl_ppr_ctx->media_accessible & cxl_ppr_ctx->data_retained; 1846 + 1847 + return 0; 1848 + } 1849 + 1850 + static int cxl_ppr_get_min_dpa(struct device *dev, void *drv_data, u64 *min_dpa) 1851 + { 1852 + struct cxl_ppr_context *cxl_ppr_ctx = drv_data; 1853 + struct cxl_memdev *cxlmd = cxl_ppr_ctx->cxlmd; 1854 + struct cxl_dev_state *cxlds = cxlmd->cxlds; 1855 + 1856 + *min_dpa = cxlds->dpa_res.start; 1857 + 1858 + return 0; 1859 + } 1860 + 1861 + static int cxl_ppr_get_max_dpa(struct device *dev, void *drv_data, u64 *max_dpa) 1862 + { 1863 + struct cxl_ppr_context *cxl_ppr_ctx = drv_data; 1864 + struct cxl_memdev *cxlmd = cxl_ppr_ctx->cxlmd; 1865 + struct cxl_dev_state *cxlds = cxlmd->cxlds; 1866 + 1867 + *max_dpa = cxlds->dpa_res.end; 1868 + 1869 + return 0; 1870 + } 1871 + 1872 + static int cxl_ppr_get_dpa(struct device *dev, void *drv_data, u64 *dpa) 1873 + { 1874 + struct cxl_ppr_context *cxl_ppr_ctx = drv_data; 1875 + 1876 + *dpa = cxl_ppr_ctx->dpa; 1877 + 1878 + return 0; 1879 + } 1880 + 1881 + static int cxl_ppr_set_dpa(struct device *dev, void *drv_data, u64 dpa) 1882 + { 1883 + struct cxl_ppr_context *cxl_ppr_ctx = drv_data; 1884 + struct cxl_memdev *cxlmd = cxl_ppr_ctx->cxlmd; 1885 + struct cxl_dev_state *cxlds = cxlmd->cxlds; 1886 + 1887 + if (dpa < cxlds->dpa_res.start || dpa > cxlds->dpa_res.end) 1888 + return -EINVAL; 1889 + 1890 + cxl_ppr_ctx->dpa = 
dpa; 1891 + 1892 + return 0; 1893 + } 1894 + 1895 + static int cxl_ppr_get_nibble_mask(struct device *dev, void *drv_data, 1896 + u32 *nibble_mask) 1897 + { 1898 + struct cxl_ppr_context *cxl_ppr_ctx = drv_data; 1899 + 1900 + *nibble_mask = cxl_ppr_ctx->nibble_mask; 1901 + 1902 + return 0; 1903 + } 1904 + 1905 + static int cxl_ppr_set_nibble_mask(struct device *dev, void *drv_data, 1906 + u32 nibble_mask) 1907 + { 1908 + struct cxl_ppr_context *cxl_ppr_ctx = drv_data; 1909 + 1910 + cxl_ppr_ctx->nibble_mask = nibble_mask; 1911 + 1912 + return 0; 1913 + } 1914 + 1915 + static int cxl_do_ppr(struct device *dev, void *drv_data, u32 val) 1916 + { 1917 + struct cxl_ppr_context *cxl_ppr_ctx = drv_data; 1918 + 1919 + if (!cxl_ppr_ctx->dpa || val != EDAC_DO_MEM_REPAIR) 1920 + return -EINVAL; 1921 + 1922 + return cxl_mem_perform_ppr(cxl_ppr_ctx); 1923 + } 1924 + 1925 + static const struct edac_mem_repair_ops cxl_sppr_ops = { 1926 + .get_repair_type = cxl_ppr_get_repair_type, 1927 + .get_persist_mode = cxl_ppr_get_persist_mode, 1928 + .get_repair_safe_when_in_use = cxl_get_ppr_safe_when_in_use, 1929 + .get_min_dpa = cxl_ppr_get_min_dpa, 1930 + .get_max_dpa = cxl_ppr_get_max_dpa, 1931 + .get_dpa = cxl_ppr_get_dpa, 1932 + .set_dpa = cxl_ppr_set_dpa, 1933 + .get_nibble_mask = cxl_ppr_get_nibble_mask, 1934 + .set_nibble_mask = cxl_ppr_set_nibble_mask, 1935 + .do_repair = cxl_do_ppr, 1936 + }; 1937 + 1938 + static int cxl_memdev_soft_ppr_init(struct cxl_memdev *cxlmd, 1939 + struct edac_dev_feature *ras_feature, 1940 + u8 repair_inst) 1941 + { 1942 + struct cxl_ppr_context *cxl_sppr_ctx; 1943 + struct cxl_feat_entry *feat_entry; 1944 + int ret; 1945 + 1946 + feat_entry = cxl_feature_info(to_cxlfs(cxlmd->cxlds), 1947 + &CXL_FEAT_SPPR_UUID); 1948 + if (IS_ERR(feat_entry)) 1949 + return -EOPNOTSUPP; 1950 + 1951 + if (!(le32_to_cpu(feat_entry->flags) & CXL_FEATURE_F_CHANGEABLE)) 1952 + return -EOPNOTSUPP; 1953 + 1954 + cxl_sppr_ctx = 1955 + devm_kzalloc(&cxlmd->dev, 
sizeof(*cxl_sppr_ctx), GFP_KERNEL); 1956 + if (!cxl_sppr_ctx) 1957 + return -ENOMEM; 1958 + 1959 + *cxl_sppr_ctx = (struct cxl_ppr_context){ 1960 + .get_feat_size = le16_to_cpu(feat_entry->get_feat_size), 1961 + .set_feat_size = le16_to_cpu(feat_entry->set_feat_size), 1962 + .get_version = feat_entry->get_feat_ver, 1963 + .set_version = feat_entry->set_feat_ver, 1964 + .effects = le16_to_cpu(feat_entry->effects), 1965 + .cxlmd = cxlmd, 1966 + .repair_type = EDAC_REPAIR_PPR, 1967 + .persist_mode = 0, 1968 + .instance = repair_inst, 1969 + }; 1970 + uuid_copy(&cxl_sppr_ctx->repair_uuid, &CXL_FEAT_SPPR_UUID); 1971 + 1972 + ret = cxl_mem_ppr_get_attrbs(cxl_sppr_ctx); 1973 + if (ret) 1974 + return ret; 1975 + 1976 + ras_feature->ft_type = RAS_FEAT_MEM_REPAIR; 1977 + ras_feature->instance = cxl_sppr_ctx->instance; 1978 + ras_feature->mem_repair_ops = &cxl_sppr_ops; 1979 + ras_feature->ctx = cxl_sppr_ctx; 1980 + 1981 + return 0; 1982 + } 1983 + 1984 + int devm_cxl_memdev_edac_register(struct cxl_memdev *cxlmd) 1985 + { 1986 + struct edac_dev_feature ras_features[CXL_NR_EDAC_DEV_FEATURES]; 1987 + int num_ras_features = 0; 1988 + u8 repair_inst = 0; 1989 + int rc; 1990 + 1991 + if (IS_ENABLED(CONFIG_CXL_EDAC_SCRUB)) { 1992 + rc = cxl_memdev_scrub_init(cxlmd, &ras_features[num_ras_features], 0); 1993 + if (rc < 0 && rc != -EOPNOTSUPP) 1994 + return rc; 1995 + 1996 + if (rc != -EOPNOTSUPP) 1997 + num_ras_features++; 1998 + } 1999 + 2000 + if (IS_ENABLED(CONFIG_CXL_EDAC_ECS)) { 2001 + rc = cxl_memdev_ecs_init(cxlmd, &ras_features[num_ras_features]); 2002 + if (rc < 0 && rc != -EOPNOTSUPP) 2003 + return rc; 2004 + 2005 + if (rc != -EOPNOTSUPP) 2006 + num_ras_features++; 2007 + } 2008 + 2009 + if (IS_ENABLED(CONFIG_CXL_EDAC_MEM_REPAIR)) { 2010 + for (int i = 0; i < CXL_MEM_SPARING_MAX; i++) { 2011 + rc = cxl_memdev_sparing_init(cxlmd, 2012 + &ras_features[num_ras_features], 2013 + &mem_sparing_desc[i], repair_inst); 2014 + if (rc == -EOPNOTSUPP) 2015 + continue; 2016 + if (rc < 
0) 2017 + return rc; 2018 + 2019 + repair_inst++; 2020 + num_ras_features++; 2021 + } 2022 + 2023 + rc = cxl_memdev_soft_ppr_init(cxlmd, &ras_features[num_ras_features], 2024 + repair_inst); 2025 + if (rc < 0 && rc != -EOPNOTSUPP) 2026 + return rc; 2027 + 2028 + if (rc != -EOPNOTSUPP) { 2029 + repair_inst++; 2030 + num_ras_features++; 2031 + } 2032 + 2033 + if (repair_inst) { 2034 + struct cxl_mem_err_rec *array_rec = 2035 + devm_kzalloc(&cxlmd->dev, sizeof(*array_rec), 2036 + GFP_KERNEL); 2037 + if (!array_rec) 2038 + return -ENOMEM; 2039 + 2040 + xa_init(&array_rec->rec_gen_media); 2041 + xa_init(&array_rec->rec_dram); 2042 + cxlmd->err_rec_array = array_rec; 2043 + } 2044 + } 2045 + 2046 + if (!num_ras_features) 2047 + return -EINVAL; 2048 + 2049 + char *cxl_dev_name __free(kfree) = 2050 + kasprintf(GFP_KERNEL, "cxl_%s", dev_name(&cxlmd->dev)); 2051 + if (!cxl_dev_name) 2052 + return -ENOMEM; 2053 + 2054 + return edac_dev_register(&cxlmd->dev, cxl_dev_name, NULL, 2055 + num_ras_features, ras_features); 2056 + } 2057 + EXPORT_SYMBOL_NS_GPL(devm_cxl_memdev_edac_register, "CXL"); 2058 + 2059 + int devm_cxl_region_edac_register(struct cxl_region *cxlr) 2060 + { 2061 + struct edac_dev_feature ras_features[CXL_NR_EDAC_DEV_FEATURES]; 2062 + int num_ras_features = 0; 2063 + int rc; 2064 + 2065 + if (!IS_ENABLED(CONFIG_CXL_EDAC_SCRUB)) 2066 + return 0; 2067 + 2068 + rc = cxl_region_scrub_init(cxlr, &ras_features[num_ras_features], 0); 2069 + if (rc < 0) 2070 + return rc; 2071 + 2072 + num_ras_features++; 2073 + 2074 + char *cxl_dev_name __free(kfree) = 2075 + kasprintf(GFP_KERNEL, "cxl_%s", dev_name(&cxlr->dev)); 2076 + if (!cxl_dev_name) 2077 + return -ENOMEM; 2078 + 2079 + return edac_dev_register(&cxlr->dev, cxl_dev_name, NULL, 2080 + num_ras_features, ras_features); 2081 + } 2082 + EXPORT_SYMBOL_NS_GPL(devm_cxl_region_edac_register, "CXL"); 2083 + 2084 + void devm_cxl_memdev_edac_release(struct cxl_memdev *cxlmd) 2085 + { 2086 + struct cxl_mem_err_rec *array_rec = 
cxlmd->err_rec_array; 2087 + struct cxl_event_gen_media *rec_gen_media; 2088 + struct cxl_event_dram *rec_dram; 2089 + unsigned long index; 2090 + 2091 + if (!IS_ENABLED(CONFIG_CXL_EDAC_MEM_REPAIR) || !array_rec) 2092 + return; 2093 + 2094 + xa_for_each(&array_rec->rec_dram, index, rec_dram) 2095 + kfree(rec_dram); 2096 + xa_destroy(&array_rec->rec_dram); 2097 + 2098 + xa_for_each(&array_rec->rec_gen_media, index, rec_gen_media) 2099 + kfree(rec_gen_media); 2100 + xa_destroy(&array_rec->rec_gen_media); 2101 + } 2102 + EXPORT_SYMBOL_NS_GPL(devm_cxl_memdev_edac_release, "CXL");
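Note on the sparing path in `cxl_mem_perform_sparing()` above: the maintenance payload is filled by falling through from the finest granularity (cacheline) to the coarsest (rank), so each case only adds the fields its level introduces. The block below is a minimal userspace sketch of that cascade, not the kernel code. The `sparing_request` and `sparing_payload` types, the `build_sparing_payload()` helper and the `FLAG_*` names are hypothetical stand-ins; the nibble-mask rule is simplified to "valid when non-zero", and the event-record validity checks are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel types and flag macros in the diff. */
enum repair_type { REPAIR_CACHELINE, REPAIR_ROW, REPAIR_BANK, REPAIR_RANK };

#define FLAG_HARD_SPARING	(1u << 1)
#define FLAG_SUB_CHNL_VALID	(1u << 2)
#define FLAG_NIB_MASK_VALID	(1u << 3)

struct sparing_request {
	enum repair_type type;
	int hard;			/* non-zero: persistent (hard) sparing */
	uint8_t channel, rank, bank_group, bank, sub_channel;
	uint32_t row;			/* low 24 bits are used */
	uint16_t column;
	uint32_t nibble_mask;		/* low 24 bits are used */
};

/* Byte-only layout, so no packing attribute is needed (14 bytes). */
struct sparing_payload {
	uint8_t flags;
	uint8_t channel;
	uint8_t rank;
	uint8_t nibble_mask[3];
	uint8_t bank_group;
	uint8_t bank;
	uint8_t row[3];
	uint8_t column[2];
	uint8_t sub_channel;
};

/* Store the low 24 bits of v in little-endian byte order. */
static void put_le24(uint32_t v, uint8_t *p)
{
	p[0] = v & 0xff;
	p[1] = (v >> 8) & 0xff;
	p[2] = (v >> 16) & 0xff;
}

static void build_sparing_payload(const struct sparing_request *req,
				  struct sparing_payload *out)
{
	assert(req && out);
	memset(out, 0, sizeof(*out));
	if (req->hard)
		out->flags |= FLAG_HARD_SPARING;

	switch (req->type) {
	case REPAIR_CACHELINE:
		out->column[0] = req->column & 0xff;
		out->column[1] = req->column >> 8;
		out->flags |= FLAG_SUB_CHNL_VALID;
		out->sub_channel = req->sub_channel;
		/* fall through */
	case REPAIR_ROW:
		put_le24(req->row, out->row);
		/* fall through */
	case REPAIR_BANK:
		out->bank_group = req->bank_group;
		out->bank = req->bank;
		/* fall through */
	case REPAIR_RANK:
		out->rank = req->rank;
		break;
	}

	/* The channel applies at every granularity. */
	out->channel = req->channel;

	/* Simplification for this sketch: a non-zero mask counts as valid. */
	if (req->nibble_mask & 0xFFFFFF) {
		out->flags |= FLAG_NIB_MASK_VALID;
		put_le24(req->nibble_mask, out->nibble_mask);
	}
}
```

A row-sparing request therefore leaves `column` and `sub_channel` zeroed while still carrying bank, bank group, rank and channel. The same nesting shows up in the `RANK_OPS` / `BANK_OPS` / `ROW_OPS` / `CACHELINE_OPS` macros, where each level extends the one below it.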
+19 -24
drivers/cxl/core/features.c
···
 #include "core.h"
 #include "cxlmem.h"
 
+/**
+ * DOC: cxl features
+ *
+ * CXL Features:
+ * A CXL device that includes a mailbox supports commands that allows
+ * listing, getting, and setting of optionally defined features such
+ * as memory sparing or post package sparing. Vendors may define custom
+ * features for the device.
+ */
+
 /* All the features below are exclusive to the kernel */
 static const uuid_t cxl_exclusive_feats[] = {
 	CXL_FEAT_PATROL_SCRUB_UUID,
···
 	return is_cxl_feature_exclusive_by_uuid(&entry->uuid);
 }
 
-inline struct cxl_features_state *to_cxlfs(struct cxl_dev_state *cxlds)
+struct cxl_features_state *to_cxlfs(struct cxl_dev_state *cxlds)
 {
 	return cxlds->cxlfs;
 }
···
 {
 }
 
-static struct cxl_feat_entry *
-get_support_feature_info(struct cxl_features_state *cxlfs,
-			 const struct fwctl_rpc_cxl *rpc_in)
+struct cxl_feat_entry *
+cxl_feature_info(struct cxl_features_state *cxlfs,
+		 const uuid_t *uuid)
 {
 	struct cxl_feat_entry *feat;
-	const uuid_t *uuid;
-
-	if (rpc_in->op_size < sizeof(uuid))
-		return ERR_PTR(-EINVAL);
-
-	uuid = &rpc_in->set_feat_in.uuid;
 
 	for (int i = 0; i < cxlfs->entries->num_features; i++) {
 		feat = &cxlfs->entries->ent[i];
···
 
 	rpc_out->size = struct_size(feat_out, ents, requested);
 	feat_out = &rpc_out->get_sup_feats_out;
-	if (requested == 0) {
-		feat_out->num_entries = cpu_to_le16(requested);
-		feat_out->supported_feats =
-			cpu_to_le16(cxlfs->entries->num_features);
-		rpc_out->retval = CXL_MBOX_CMD_RC_SUCCESS;
-		*out_len = out_size;
-		return no_free_ptr(rpc_out);
-	}
 
 	for (i = start, pos = &feat_out->ents[0];
 	     i < cxlfs->entries->num_features; i++, pos++) {
···
 	struct cxl_feat_entry *feat;
 	u32 flags;
 
-	feat = get_support_feature_info(cxlfs, rpc_in);
+	if (rpc_in->op_size < sizeof(uuid_t))
+		return ERR_PTR(-EINVAL);
+
+	feat = cxl_feature_info(cxlfs, &rpc_in->set_feat_in.uuid);
 	if (IS_ERR(feat))
 		return false;
 
···
 	switch (opcode) {
 	case CXL_MBOX_OP_GET_SUPPORTED_FEATURES:
 	case CXL_MBOX_OP_GET_FEATURE:
-		if (cxl_mbox->feat_cap < CXL_FEATURES_RO)
-			return false;
-		if (scope >= FWCTL_RPC_CONFIGURATION)
-			return true;
-		return false;
+		return cxl_mbox->feat_cap >= CXL_FEATURES_RO;
 	case CXL_MBOX_OP_SET_FEATURE:
 		if (cxl_mbox->feat_cap < CXL_FEATURES_RW)
 			return false;
+6 -5
drivers/cxl/core/hdm.c
··· 34 34 if (rc) 35 35 return rc; 36 36 37 - dev_dbg(&cxld->dev, "Added to port %s\n", dev_name(&port->dev)); 37 + dev_dbg(port->uport_dev, "%s added to %s\n", 38 + dev_name(&cxld->dev), dev_name(&port->dev)); 38 39 39 40 return 0; 40 41 } ··· 604 603 return 0; 605 604 } 606 605 607 - static int __cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size) 606 + static int __cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, u64 size) 608 607 { 609 608 struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); 610 609 struct cxl_dev_state *cxlds = cxlmd->cxlds; ··· 667 666 skip = res->start - skip_start; 668 667 669 668 if (size > avail) { 670 - dev_dbg(dev, "%pa exceeds available %s capacity: %pa\n", &size, 671 - res->name, &avail); 669 + dev_dbg(dev, "%llu exceeds available %s capacity: %llu\n", size, 670 + res->name, (u64)avail); 672 671 return -ENOSPC; 673 672 } 674 673 675 674 return __cxl_dpa_reserve(cxled, start, size, skip); 676 675 } 677 676 678 - int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size) 677 + int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, u64 size) 679 678 { 680 679 struct cxl_port *port = cxled_to_port(cxled); 681 680 int rc;
+9 -2
drivers/cxl/core/mbox.c
··· 922 922 hpa_alias = hpa - cache_size; 923 923 } 924 924 925 - if (event_type == CXL_CPER_EVENT_GEN_MEDIA) 925 + if (event_type == CXL_CPER_EVENT_GEN_MEDIA) { 926 + if (cxl_store_rec_gen_media((struct cxl_memdev *)cxlmd, evt)) 927 + dev_dbg(&cxlmd->dev, "CXL store rec_gen_media failed\n"); 928 + 926 929 trace_cxl_general_media(cxlmd, type, cxlr, hpa, 927 930 hpa_alias, &evt->gen_media); 928 - else if (event_type == CXL_CPER_EVENT_DRAM) 931 + } else if (event_type == CXL_CPER_EVENT_DRAM) { 932 + if (cxl_store_rec_dram((struct cxl_memdev *)cxlmd, evt)) 933 + dev_dbg(&cxlmd->dev, "CXL store rec_dram failed\n"); 934 + 929 935 trace_cxl_dram(cxlmd, type, cxlr, hpa, hpa_alias, 930 936 &evt->dram); 937 + } 931 938 } 932 939 } 933 940 EXPORT_SYMBOL_NS_GPL(cxl_event_trace_record, "CXL");
+3 -2
drivers/cxl/core/memdev.c
··· 27 27 struct cxl_memdev *cxlmd = to_cxl_memdev(dev); 28 28 29 29 ida_free(&cxl_memdev_ida, cxlmd->id); 30 + devm_cxl_memdev_edac_release(cxlmd); 30 31 kfree(cxlmd); 31 32 } 32 33 ··· 154 153 return sysfs_emit(buf, "frozen\n"); 155 154 if (state & CXL_PMEM_SEC_STATE_LOCKED) 156 155 return sysfs_emit(buf, "locked\n"); 157 - else 158 - return sysfs_emit(buf, "unlocked\n"); 156 + 157 + return sysfs_emit(buf, "unlocked\n"); 159 158 } 160 159 static struct device_attribute dev_attr_security_state = 161 160 __ATTR(state, 0444, security_state_show, NULL);
+32 -18
drivers/cxl/core/pci.c
··· 415 415 */ 416 416 if (global_ctrl & CXL_HDM_DECODER_ENABLE || (!hdm && info->mem_enabled)) 417 417 return devm_cxl_enable_mem(&port->dev, cxlds); 418 - else if (!hdm) 418 + 419 + /* 420 + * If the HDM Decoder Capability does not exist and DVSEC was 421 + * not setup, the DVSEC based emulation cannot be used. 422 + */ 423 + if (!hdm) 419 424 return -ENODEV; 425 + 426 + /* The HDM Decoder Capability exists but is globally disabled. */ 427 + 428 + /* 429 + * If the DVSEC CXL Range registers are not enabled, just 430 + * enable and use the HDM Decoder Capability registers. 431 + */ 432 + if (!info->mem_enabled) { 433 + rc = devm_cxl_enable_hdm(&port->dev, cxlhdm); 434 + if (rc) 435 + return rc; 436 + 437 + return devm_cxl_enable_mem(&port->dev, cxlds); 438 + } 439 + 440 + /* 441 + * Per CXL 2.0 Section 8.1.3.8.3 and 8.1.3.8.4 DVSEC CXL Range 1 Base 442 + * [High,Low] when HDM operation is enabled the range register values 443 + * are ignored by the device, but the spec also recommends matching the 444 + * DVSEC Range 1,2 to HDM Decoder Range 0,1. So, non-zero info->ranges 445 + * are expected even though Linux does not require or maintain that 446 + * match. Check if at least one DVSEC range is enabled and allowed by 447 + * the platform. That is, the DVSEC range must be covered by a locked 448 + * platform window (CFMWS). Fail otherwise as the endpoint's decoders 449 + * cannot be used. 
450 + */ 420 451 421 452 root = to_cxl_port(port->dev.parent); 422 453 while (!is_cxl_root(root) && is_cxl_port(root->dev.parent)) ··· 455 424 if (!is_cxl_root(root)) { 456 425 dev_err(dev, "Failed to acquire root port for HDM enable\n"); 457 426 return -ENODEV; 458 - } 459 - 460 - if (!info->mem_enabled) { 461 - rc = devm_cxl_enable_hdm(&port->dev, cxlhdm); 462 - if (rc) 463 - return rc; 464 - 465 - return devm_cxl_enable_mem(&port->dev, cxlds); 466 427 } 467 428 468 429 for (i = 0, allowed = 0; i < info->ranges; i++) { ··· 476 453 return -ENXIO; 477 454 } 478 455 479 - /* 480 - * Per CXL 2.0 Section 8.1.3.8.3 and 8.1.3.8.4 DVSEC CXL Range 1 Base 481 - * [High,Low] when HDM operation is enabled the range register values 482 - * are ignored by the device, but the spec also recommends matching the 483 - * DVSEC Range 1,2 to HDM Decoder Range 0,1. So, non-zero info->ranges 484 - * are expected even though Linux does not require or maintain that 485 - * match. If at least one DVSEC range is enabled and allowed, skip HDM 486 - * Decoder Capability Enable. 487 - */ 488 456 return 0; 489 457 } 490 458 EXPORT_SYMBOL_NS_GPL(cxl_hdm_decode_init, "CXL");
+8 -15
drivers/cxl/core/port.c
··· 602 602 } 603 603 EXPORT_SYMBOL_NS_GPL(to_cxl_port, "CXL"); 604 604 605 + struct cxl_port *parent_port_of(struct cxl_port *port) 606 + { 607 + if (!port || !port->parent_dport) 608 + return NULL; 609 + return port->parent_dport->port; 610 + } 611 + 605 612 static void unregister_port(void *_port) 606 613 { 607 614 struct cxl_port *port = _port; 608 - struct cxl_port *parent; 615 + struct cxl_port *parent = parent_port_of(port); 609 616 struct device *lock_dev; 610 - 611 - if (is_cxl_root(port)) 612 - parent = NULL; 613 - else 614 - parent = to_cxl_port(port->dev.parent); 615 617 616 618 /* 617 619 * CXL root port's and the first level of ports are unregistered ··· 1036 1034 return to_cxl_root(iter); 1037 1035 } 1038 1036 EXPORT_SYMBOL_NS_GPL(find_cxl_root, "CXL"); 1039 - 1040 - void put_cxl_root(struct cxl_root *cxl_root) 1041 - { 1042 - if (!cxl_root) 1043 - return; 1044 - 1045 - put_device(&cxl_root->port.dev); 1046 - } 1047 - EXPORT_SYMBOL_NS_GPL(put_cxl_root, "CXL"); 1048 1037 1049 1038 static struct cxl_dport *find_dport(struct cxl_port *port, int id) 1050 1039 {
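The `parent_port_of()` helper added above can be sketched outside the kernel as follows. This is a minimal illustration, not the driver's code: `struct port` and `struct dport` are simplified stand-ins for `struct cxl_port` and `struct cxl_dport`, and `depth_of()` is a hypothetical caller added only to show the walk toward the root.

```c
#include <stddef.h>

/*
 * Simplified stand-ins for the kernel structures; only the fields
 * needed to walk the topology are modelled here.
 */
struct port;

struct dport {
	struct port *port;		/* port that owns this downstream port */
};

struct port {
	struct dport *parent_dport;	/* NULL for the root port */
	int id;
};

/* Same shape as the helper in the diff: NULL-safe, and NULL at the root. */
static struct port *parent_port_of(struct port *port)
{
	if (!port || !port->parent_dport)
		return NULL;
	return port->parent_dport->port;
}

/* Hypothetical caller: count the hops from @port up to the root. */
static int depth_of(struct port *port)
{
	int depth = 0;

	for (struct port *iter = parent_port_of(port); iter;
	     iter = parent_port_of(iter))
		depth++;
	return depth;
}
```

Because the helper tolerates a NULL argument and returns NULL at the root, callers can use it directly as a loop step without a separate root check.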
+121 -68
drivers/cxl/core/region.c
··· 231 231 &cxlr->dev, 232 232 "Bypassing cpu_cache_invalidate_memregion() for testing!\n"); 233 233 return 0; 234 - } else { 235 - dev_WARN(&cxlr->dev, 236 - "Failed to synchronize CPU cache state\n"); 237 - return -ENXIO; 238 234 } 235 + dev_WARN(&cxlr->dev, 236 + "Failed to synchronize CPU cache state\n"); 237 + return -ENXIO; 239 238 } 240 239 241 240 cpu_cache_invalidate_memregion(IORES_DESC_CXL); ··· 864 865 return 0; 865 866 } 866 867 868 + /** 869 + * cxl_port_pick_region_decoder() - assign or lookup a decoder for a region 870 + * @port: a port in the ancestry of the endpoint implied by @cxled 871 + * @cxled: endpoint decoder to be, or currently, mapped by @port 872 + * @cxlr: region to establish, or validate, decode @port 873 + * 874 + * In the region creation path cxl_port_pick_region_decoder() is an 875 + * allocator to find a free port. In the region assembly path, it is 876 + * recalling the decoder that platform firmware picked for validation 877 + * purposes. 878 + * 879 + * The result is recorded in a 'struct cxl_region_ref' in @port. 
880 + */ 867 881 static struct cxl_decoder * 868 - cxl_region_find_decoder(struct cxl_port *port, 869 - struct cxl_endpoint_decoder *cxled, 870 - struct cxl_region *cxlr) 882 + cxl_port_pick_region_decoder(struct cxl_port *port, 883 + struct cxl_endpoint_decoder *cxled, 884 + struct cxl_region *cxlr) 871 885 { 872 886 struct device *dev; 873 887 ··· 928 916 929 917 static struct cxl_region_ref * 930 918 alloc_region_ref(struct cxl_port *port, struct cxl_region *cxlr, 931 - struct cxl_endpoint_decoder *cxled) 919 + struct cxl_endpoint_decoder *cxled, 920 + struct cxl_decoder *cxld) 932 921 { 933 922 struct cxl_region_params *p = &cxlr->params; 934 923 struct cxl_region_ref *cxl_rr, *iter; ··· 943 930 continue; 944 931 945 932 if (test_bit(CXL_REGION_F_AUTO, &cxlr->flags)) { 946 - struct cxl_decoder *cxld; 947 - 948 - cxld = cxl_region_find_decoder(port, cxled, cxlr); 949 933 if (auto_order_ok(port, iter->region, cxld)) 950 934 continue; 951 935 } ··· 1024 1014 return 0; 1025 1015 } 1026 1016 1027 - static int cxl_rr_alloc_decoder(struct cxl_port *port, struct cxl_region *cxlr, 1028 - struct cxl_endpoint_decoder *cxled, 1029 - struct cxl_region_ref *cxl_rr) 1017 + static int cxl_rr_assign_decoder(struct cxl_port *port, struct cxl_region *cxlr, 1018 + struct cxl_endpoint_decoder *cxled, 1019 + struct cxl_region_ref *cxl_rr, 1020 + struct cxl_decoder *cxld) 1030 1021 { 1031 - struct cxl_decoder *cxld; 1032 - 1033 - cxld = cxl_region_find_decoder(port, cxled, cxlr); 1034 - if (!cxld) { 1035 - dev_dbg(&cxlr->dev, "%s: no decoder available\n", 1036 - dev_name(&port->dev)); 1037 - return -EBUSY; 1038 - } 1039 - 1040 1022 if (cxld->region) { 1041 1023 dev_dbg(&cxlr->dev, "%s: %s already attached to %s\n", 1042 1024 dev_name(&port->dev), dev_name(&cxld->dev), ··· 1119 1117 nr_targets_inc = true; 1120 1118 } 1121 1119 } else { 1122 - cxl_rr = alloc_region_ref(port, cxlr, cxled); 1120 + struct cxl_decoder *cxld; 1121 + 1122 + cxld = cxl_port_pick_region_decoder(port, cxled, 
cxlr); 1123 + if (!cxld) { 1124 + dev_dbg(&cxlr->dev, "%s: no decoder available\n", 1125 + dev_name(&port->dev)); 1126 + return -EBUSY; 1127 + } 1128 + 1129 + cxl_rr = alloc_region_ref(port, cxlr, cxled, cxld); 1123 1130 if (IS_ERR(cxl_rr)) { 1124 1131 dev_dbg(&cxlr->dev, 1125 1132 "%s: failed to allocate region reference\n", ··· 1137 1126 } 1138 1127 nr_targets_inc = true; 1139 1128 1140 - rc = cxl_rr_alloc_decoder(port, cxlr, cxled, cxl_rr); 1129 + rc = cxl_rr_assign_decoder(port, cxlr, cxled, cxl_rr, cxld); 1141 1130 if (rc) 1142 1131 goto out_erase; 1143 1132 } ··· 1457 1446 1458 1447 if (test_bit(CXL_REGION_F_AUTO, &cxlr->flags)) { 1459 1448 if (cxld->interleave_ways != iw || 1460 - cxld->interleave_granularity != ig || 1449 + (iw > 1 && cxld->interleave_granularity != ig) || 1461 1450 !region_res_match_cxl_range(p, &cxld->hpa_range) || 1462 1451 ((cxld->flags & CXL_DECODER_F_ENABLE) == 0)) { 1463 1452 dev_err(&cxlr->dev, ··· 1759 1748 return cxled_a->pos - cxled_b->pos; 1760 1749 } 1761 1750 1762 - static struct cxl_port *next_port(struct cxl_port *port) 1763 - { 1764 - if (!port->parent_dport) 1765 - return NULL; 1766 - return port->parent_dport->port; 1767 - } 1768 - 1769 1751 static int match_switch_decoder_by_range(struct device *dev, 1770 1752 const void *data) 1771 1753 { ··· 1785 1781 struct device *dev; 1786 1782 int rc = -ENXIO; 1787 1783 1788 - parent = next_port(port); 1784 + parent = parent_port_of(port); 1789 1785 if (!parent) 1790 1786 return rc; 1791 1787 ··· 1808 1804 } 1809 1805 } 1810 1806 put_device(dev); 1807 + 1808 + if (rc) 1809 + dev_err(port->uport_dev, 1810 + "failed to find %s:%s in target list of %s\n", 1811 + dev_name(&port->dev), 1812 + dev_name(port->parent_dport->dport_dev), 1813 + dev_name(&cxlsd->cxld.dev)); 1811 1814 1812 1815 return rc; 1813 1816 } ··· 1872 1861 */ 1873 1862 1874 1863 /* Iterate from endpoint to root_port refining the position */ 1875 - for (iter = port; iter; iter = next_port(iter)) { 1864 + for (iter = 
port; iter; iter = parent_port_of(iter)) { 1876 1865 if (is_cxl_root(iter)) 1877 1866 break; 1878 1867 ··· 1951 1940 if (p->state > CXL_CONFIG_INTERLEAVE_ACTIVE) { 1952 1941 dev_dbg(&cxlr->dev, "region already active\n"); 1953 1942 return -EBUSY; 1954 - } else if (p->state < CXL_CONFIG_INTERLEAVE_ACTIVE) { 1943 + } 1944 + 1945 + if (p->state < CXL_CONFIG_INTERLEAVE_ACTIVE) { 1955 1946 dev_dbg(&cxlr->dev, "interleave config missing\n"); 1956 1947 return -ENXIO; 1957 1948 } ··· 2173 2160 rc = cxl_region_attach(cxlr, cxled, pos); 2174 2161 up_read(&cxl_dpa_rwsem); 2175 2162 up_write(&cxl_region_rwsem); 2163 + 2164 + if (rc) 2165 + dev_warn(cxled->cxld.dev.parent, 2166 + "failed to attach %s to %s: %d\n", 2167 + dev_name(&cxled->cxld.dev), dev_name(&cxlr->dev), rc); 2168 + 2176 2169 return rc; 2177 2170 } 2178 2171 ··· 3215 3196 return rc; 3216 3197 } 3217 3198 3218 - static int match_root_decoder_by_range(struct device *dev, 3219 - const void *data) 3199 + static int match_decoder_by_range(struct device *dev, const void *data) 3220 3200 { 3221 3201 const struct range *r1, *r2 = data; 3222 - struct cxl_root_decoder *cxlrd; 3202 + struct cxl_decoder *cxld; 3223 3203 3224 - if (!is_root_decoder(dev)) 3204 + if (!is_switch_decoder(dev)) 3225 3205 return 0; 3226 3206 3227 - cxlrd = to_cxl_root_decoder(dev); 3228 - r1 = &cxlrd->cxlsd.cxld.hpa_range; 3207 + cxld = to_cxl_decoder(dev); 3208 + r1 = &cxld->hpa_range; 3229 3209 return range_contains(r1, r2); 3210 + } 3211 + 3212 + static struct cxl_decoder * 3213 + cxl_port_find_switch_decoder(struct cxl_port *port, struct range *hpa) 3214 + { 3215 + struct device *cxld_dev = device_find_child(&port->dev, hpa, 3216 + match_decoder_by_range); 3217 + 3218 + return cxld_dev ? 
to_cxl_decoder(cxld_dev) : NULL; 3219 + } 3220 + 3221 + static struct cxl_root_decoder * 3222 + cxl_find_root_decoder(struct cxl_endpoint_decoder *cxled) 3223 + { 3224 + struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); 3225 + struct cxl_port *port = cxled_to_port(cxled); 3226 + struct cxl_root *cxl_root __free(put_cxl_root) = find_cxl_root(port); 3227 + struct cxl_decoder *root, *cxld = &cxled->cxld; 3228 + struct range *hpa = &cxld->hpa_range; 3229 + 3230 + root = cxl_port_find_switch_decoder(&cxl_root->port, hpa); 3231 + if (!root) { 3232 + dev_err(cxlmd->dev.parent, 3233 + "%s:%s no CXL window for range %#llx:%#llx\n", 3234 + dev_name(&cxlmd->dev), dev_name(&cxld->dev), 3235 + cxld->hpa_range.start, cxld->hpa_range.end); 3236 + return NULL; 3237 + } 3238 + 3239 + return to_cxl_root_decoder(&root->dev); 3230 3240 } 3231 3241 3232 3242 static int match_region_by_range(struct device *dev, const void *data) ··· 3424 3376 return cxlr; 3425 3377 } 3426 3378 3427 - int cxl_add_to_region(struct cxl_port *root, struct cxl_endpoint_decoder *cxled) 3379 + static struct cxl_region * 3380 + cxl_find_region_by_range(struct cxl_root_decoder *cxlrd, struct range *hpa) 3428 3381 { 3429 - struct cxl_memdev *cxlmd = cxled_to_memdev(cxled); 3382 + struct device *region_dev; 3383 + 3384 + region_dev = device_find_child(&cxlrd->cxlsd.cxld.dev, hpa, 3385 + match_region_by_range); 3386 + if (!region_dev) 3387 + return NULL; 3388 + 3389 + return to_cxl_region(region_dev); 3390 + } 3391 + 3392 + int cxl_add_to_region(struct cxl_endpoint_decoder *cxled) 3393 + { 3430 3394 struct range *hpa = &cxled->cxld.hpa_range; 3431 - struct cxl_decoder *cxld = &cxled->cxld; 3432 - struct device *cxlrd_dev, *region_dev; 3433 - struct cxl_root_decoder *cxlrd; 3434 3395 struct cxl_region_params *p; 3435 - struct cxl_region *cxlr; 3436 3396 bool attach = false; 3437 3397 int rc; 3438 3398 3439 - cxlrd_dev = device_find_child(&root->dev, &cxld->hpa_range, 3440 - match_root_decoder_by_range); 3441 - if 
(!cxlrd_dev) { 3442 - dev_err(cxlmd->dev.parent, 3443 - "%s:%s no CXL window for range %#llx:%#llx\n", 3444 - dev_name(&cxlmd->dev), dev_name(&cxld->dev), 3445 - cxld->hpa_range.start, cxld->hpa_range.end); 3399 + struct cxl_root_decoder *cxlrd __free(put_cxl_root_decoder) = 3400 + cxl_find_root_decoder(cxled); 3401 + if (!cxlrd) 3446 3402 return -ENXIO; 3447 - } 3448 - 3449 - cxlrd = to_cxl_root_decoder(cxlrd_dev); 3450 3403 3451 3404 /* 3452 3405 * Ensure that if multiple threads race to construct_region() for @hpa 3453 3406 * one does the construction and the others add to that. 3454 3407 */ 3455 3408 mutex_lock(&cxlrd->range_lock); 3456 - region_dev = device_find_child(&cxlrd->cxlsd.cxld.dev, hpa, 3457 - match_region_by_range); 3458 - if (!region_dev) { 3409 + struct cxl_region *cxlr __free(put_cxl_region) = 3410 + cxl_find_region_by_range(cxlrd, hpa); 3411 + if (!cxlr) 3459 3412 cxlr = construct_region(cxlrd, cxled); 3460 - region_dev = &cxlr->dev; 3461 - } else 3462 - cxlr = to_cxl_region(region_dev); 3463 3413 mutex_unlock(&cxlrd->range_lock); 3464 3414 3465 3415 rc = PTR_ERR_OR_ZERO(cxlr); 3466 3416 if (rc) 3467 - goto out; 3417 + return rc; 3468 3418 3469 3419 attach_target(cxlr, cxled, -1, TASK_UNINTERRUPTIBLE); 3470 3420 ··· 3482 3436 p->res); 3483 3437 } 3484 3438 3485 - put_device(region_dev); 3486 - out: 3487 - put_device(cxlrd_dev); 3488 3439 return rc; 3489 3440 } 3490 3441 EXPORT_SYMBOL_NS_GPL(cxl_add_to_region, "CXL"); ··· 3580 3537 3581 3538 switch (cxlr->mode) { 3582 3539 case CXL_PARTMODE_PMEM: 3540 + rc = devm_cxl_region_edac_register(cxlr); 3541 + if (rc) 3542 + dev_dbg(&cxlr->dev, "CXL EDAC registration for region_id=%d failed\n", 3543 + cxlr->id); 3544 + 3583 3545 return devm_cxl_add_pmem_region(cxlr); 3584 3546 case CXL_PARTMODE_RAM: 3547 + rc = devm_cxl_region_edac_register(cxlr); 3548 + if (rc) 3549 + dev_dbg(&cxlr->dev, "CXL EDAC registration for region_id=%d failed\n", 3550 + cxlr->id); 3551 + 3585 3552 /* 3586 3553 * The region can 
not be managed by CXL if any portion of 3587 3554 * it is already online as 'System RAM'
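The relaxed auto-region check in the hunk above (granularity is compared only when `iw > 1`) can be sketched in isolation. The names `struct decoder_cfg` and `interleave_matches()` are hypothetical and exist only for this illustration; the real check also validates the HPA range and the enable flag.

```c
#include <stdbool.h>

/* Hypothetical, reduced view of a decoder's interleave settings. */
struct decoder_cfg {
	int interleave_ways;
	int interleave_granularity;
};

/*
 * Return true when the decoder agrees with the expected ways (iw) and
 * granularity (ig). With a single way there is no interleaving, so the
 * granularity value carries no meaning and is not compared.
 */
static bool interleave_matches(const struct decoder_cfg *d, int iw, int ig)
{
	if (d->interleave_ways != iw)
		return false;
	if (iw > 1 && d->interleave_granularity != ig)
		return false;
	return true;
}
```

Under this sketch a 1-way decoder programmed with any granularity is accepted, while a 2-way decoder must still match the expected granularity exactly.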
+17 -6
drivers/cxl/cxl.h
··· 724 724 int cxl_num_decoders_committed(struct cxl_port *port); 725 725 bool is_cxl_port(const struct device *dev); 726 726 struct cxl_port *to_cxl_port(const struct device *dev); 727 + struct cxl_port *parent_port_of(struct cxl_port *port); 727 728 void cxl_port_commit_reap(struct cxl_decoder *cxld); 728 729 struct pci_bus; 729 730 int devm_cxl_register_pci_bus(struct device *host, struct device *uport_dev, ··· 737 736 struct cxl_root *devm_cxl_add_root(struct device *host, 738 737 const struct cxl_root_ops *ops); 739 738 struct cxl_root *find_cxl_root(struct cxl_port *port); 740 - void put_cxl_root(struct cxl_root *cxl_root); 741 - DEFINE_FREE(put_cxl_root, struct cxl_root *, if (_T) put_cxl_root(_T)) 742 739 740 + DEFINE_FREE(put_cxl_root, struct cxl_root *, if (_T) put_device(&_T->port.dev)) 743 741 DEFINE_FREE(put_cxl_port, struct cxl_port *, if (!IS_ERR_OR_NULL(_T)) put_device(&_T->dev)) 742 + DEFINE_FREE(put_cxl_root_decoder, struct cxl_root_decoder *, if (!IS_ERR_OR_NULL(_T)) put_device(&_T->cxlsd.cxld.dev)) 743 + DEFINE_FREE(put_cxl_region, struct cxl_region *, if (!IS_ERR_OR_NULL(_T)) put_device(&_T->dev)) 744 + 744 745 int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd); 745 746 void cxl_bus_rescan(void); 746 747 void cxl_bus_drain(void); ··· 859 856 #ifdef CONFIG_CXL_REGION 860 857 bool is_cxl_pmem_region(struct device *dev); 861 858 struct cxl_pmem_region *to_cxl_pmem_region(struct device *dev); 862 - int cxl_add_to_region(struct cxl_port *root, 863 - struct cxl_endpoint_decoder *cxled); 859 + int cxl_add_to_region(struct cxl_endpoint_decoder *cxled); 864 860 struct cxl_dax_region *to_cxl_dax_region(struct device *dev); 865 861 u64 cxl_port_get_spa_cache_alias(struct cxl_port *endpoint, u64 spa); 866 862 #else ··· 871 869 { 872 870 return NULL; 873 871 } 874 - static inline int cxl_add_to_region(struct cxl_port *root, 875 - struct cxl_endpoint_decoder *cxled) 872 + static inline int cxl_add_to_region(struct cxl_endpoint_decoder *cxled) 876 873 { 
877 874 return 0; 878 875 } ··· 912 911 #endif 913 912 914 913 u16 cxl_gpf_get_dvsec(struct device *dev); 914 + 915 + static inline struct rw_semaphore *rwsem_read_intr_acquire(struct rw_semaphore *rwsem) 916 + { 917 + if (down_read_interruptible(rwsem)) 918 + return NULL; 919 + 920 + return rwsem; 921 + } 922 + 923 + DEFINE_FREE(rwsem_read_release, struct rw_semaphore *, if (_T) up_read(_T)) 915 924 916 925 #endif /* __CXL_H__ */
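The `DEFINE_FREE()` / `__free()` declarations above rely on the compiler's scope-based cleanup support. A rough userspace sketch of the same idea, assuming GCC or Clang and using the `cleanup` variable attribute directly, might look like this. `struct obj`, `obj_get()`, `obj_put()`, and the `auto_put_obj` macro are hypothetical names standing in for a `struct device` reference and its `put_device()` release.

```c
#include <stddef.h>

/* Hypothetical reference-counted object standing in for a struct device. */
struct obj {
	int refcount;
};

static struct obj *obj_get(struct obj *o)
{
	if (o)
		o->refcount++;
	return o;
}

static void obj_put(struct obj *o)
{
	if (o)
		o->refcount--;
}

/* Cleanup handlers receive a pointer to the annotated variable. */
static void obj_put_cleanup(struct obj **op)
{
	obj_put(*op);
}

/* Requires the GCC/Clang "cleanup" variable attribute. */
#define auto_put_obj __attribute__((cleanup(obj_put_cleanup)))

static int use_obj(struct obj *shared, int fail_early)
{
	struct obj *o auto_put_obj = obj_get(shared);

	if (!o)
		return -1;
	if (fail_early)
		return -2;	/* reference is dropped on this path too */
	return o->refcount;	/* value is read before the cleanup runs */
}
```

The benefit mirrored from the diff is that early returns no longer need `goto out` labels paired with explicit `put_device()` calls.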
+30
drivers/cxl/cxlmem.h
··· 45 45 * @endpoint: connection to the CXL port topology for this memory device 46 46 * @id: id number of this memdev instance. 47 47 * @depth: endpoint port depth 48 + * @scrub_cycle: current scrub cycle set for this device 49 + * @scrub_region_id: id number of a backed region (if any) for which current scrub cycle set 50 + * @err_rec_array: List of xarrarys to store the memdev error records to 51 + * check attributes for a memory repair operation are from 52 + * current boot. 48 53 */ 49 54 struct cxl_memdev { 50 55 struct device dev; ··· 61 56 struct cxl_port *endpoint; 62 57 int id; 63 58 int depth; 59 + u8 scrub_cycle; 60 + int scrub_region_id; 61 + void *err_rec_array; 64 62 }; 65 63 66 64 static inline struct cxl_memdev *to_cxl_memdev(struct device *dev) ··· 535 527 CXL_MBOX_OP_GET_SUPPORTED_FEATURES = 0x0500, 536 528 CXL_MBOX_OP_GET_FEATURE = 0x0501, 537 529 CXL_MBOX_OP_SET_FEATURE = 0x0502, 530 + CXL_MBOX_OP_DO_MAINTENANCE = 0x0600, 538 531 CXL_MBOX_OP_IDENTIFY = 0x4000, 539 532 CXL_MBOX_OP_GET_PARTITION_INFO = 0x4100, 540 533 CXL_MBOX_OP_SET_PARTITION_INFO = 0x4101, ··· 861 852 int cxl_trigger_poison_list(struct cxl_memdev *cxlmd); 862 853 int cxl_inject_poison(struct cxl_memdev *cxlmd, u64 dpa); 863 854 int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa); 855 + 856 + #ifdef CONFIG_CXL_EDAC_MEM_FEATURES 857 + int devm_cxl_memdev_edac_register(struct cxl_memdev *cxlmd); 858 + int devm_cxl_region_edac_register(struct cxl_region *cxlr); 859 + int cxl_store_rec_gen_media(struct cxl_memdev *cxlmd, union cxl_event *evt); 860 + int cxl_store_rec_dram(struct cxl_memdev *cxlmd, union cxl_event *evt); 861 + void devm_cxl_memdev_edac_release(struct cxl_memdev *cxlmd); 862 + #else 863 + static inline int devm_cxl_memdev_edac_register(struct cxl_memdev *cxlmd) 864 + { return 0; } 865 + static inline int devm_cxl_region_edac_register(struct cxl_region *cxlr) 866 + { return 0; } 867 + static inline int cxl_store_rec_gen_media(struct cxl_memdev *cxlmd, 868 + union 
cxl_event *evt) 869 + { return 0; } 870 + static inline int cxl_store_rec_dram(struct cxl_memdev *cxlmd, 871 + union cxl_event *evt) 872 + { return 0; } 873 + static inline void devm_cxl_memdev_edac_release(struct cxl_memdev *cxlmd) 874 + { return; } 875 + #endif 864 876 865 877 #ifdef CONFIG_CXL_SUSPEND 866 878 void cxl_mem_active_inc(void);
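The `#ifdef CONFIG_CXL_EDAC_MEM_FEATURES` block above follows the usual kernel pattern of providing no-op inline stubs when a feature is compiled out, so callers need no conditionals of their own. A minimal sketch of that pattern, with `FEATURE_EDAC`, `struct memdev`, and `memdev_probe()` as hypothetical stand-ins:

```c
/* Hypothetical, reduced memory device. */
struct memdev {
	int id;
};

/*
 * FEATURE_EDAC is a hypothetical compile-time switch standing in for
 * CONFIG_CXL_EDAC_MEM_FEATURES. When it is not defined, the stub below
 * is used and always reports success.
 */
#ifdef FEATURE_EDAC
int memdev_edac_register(struct memdev *md);
#else
static inline int memdev_edac_register(struct memdev *md)
{
	(void)md;
	return 0;
}
#endif

/*
 * Caller written once for both configurations. As in the mem.c hunk of
 * this diff, a registration failure is treated as non-fatal here.
 */
static int memdev_probe(struct memdev *md)
{
	int rc = memdev_edac_register(md);

	if (rc) {
		/* the driver only logs the failure; probing continues */
	}
	return 0;
}
```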
+4
drivers/cxl/mem.c
··· 180 180 return rc; 181 181 } 182 182 183 + rc = devm_cxl_memdev_edac_register(cxlmd); 184 + if (rc) 185 + dev_dbg(dev, "CXL memdev EDAC registration failed rc=%d\n", rc); 186 + 183 187 /* 184 188 * The kernel may be operating out of CXL memory on this device, 185 189 * there is no spec defined way to determine whether this device
+3 -12
drivers/cxl/port.c
··· 30 30 schedule_cxl_memdev_detach(cxlmd); 31 31 } 32 32 33 - static int discover_region(struct device *dev, void *root) 33 + static int discover_region(struct device *dev, void *unused) 34 34 { 35 35 struct cxl_endpoint_decoder *cxled; 36 36 int rc; ··· 49 49 * Region enumeration is opportunistic, if this add-event fails, 50 50 * continue to the next endpoint decoder. 51 51 */ 52 - rc = cxl_add_to_region(root, cxled); 52 + rc = cxl_add_to_region(cxled); 53 53 if (rc) 54 54 dev_dbg(dev, "failed to add to region: %#llx-%#llx\n", 55 55 cxled->cxld.hpa_range.start, cxled->cxld.hpa_range.end); ··· 95 95 struct cxl_memdev *cxlmd = to_cxl_memdev(port->uport_dev); 96 96 struct cxl_dev_state *cxlds = cxlmd->cxlds; 97 97 struct cxl_hdm *cxlhdm; 98 - struct cxl_port *root; 99 98 int rc; 100 99 101 100 rc = cxl_dvsec_rr_decode(cxlds, &info); ··· 126 127 return rc; 127 128 128 129 /* 129 - * This can't fail in practice as CXL root exit unregisters all 130 - * descendant ports and that in turn synchronizes with cxl_port_probe() 131 - */ 132 - struct cxl_root *cxl_root __free(put_cxl_root) = find_cxl_root(port); 133 - 134 - root = &cxl_root->port; 135 - 136 - /* 137 130 * Now that all endpoint decoders are successfully enumerated, try to 138 131 * assemble regions from committed decoders 139 132 */ 140 - device_for_each_child(&port->dev, root, discover_region); 133 + device_for_each_child(&port->dev, NULL, discover_region); 141 134 142 135 return 0; 143 136 }
+9
drivers/edac/mem_repair.c
··· 45 45 struct attribute_group group; 46 46 }; 47 47 48 + const char * const edac_repair_type[] = { 49 + [EDAC_REPAIR_PPR] = "ppr", 50 + [EDAC_REPAIR_CACHELINE_SPARING] = "cacheline-sparing", 51 + [EDAC_REPAIR_ROW_SPARING] = "row-sparing", 52 + [EDAC_REPAIR_BANK_SPARING] = "bank-sparing", 53 + [EDAC_REPAIR_RANK_SPARING] = "rank-sparing", 54 + }; 55 + EXPORT_SYMBOL_GPL(edac_repair_type); 56 + 48 57 #define TO_MR_DEV_ATTR(_dev_attr) \ 49 58 container_of(_dev_attr, struct edac_mem_repair_dev_attr, dev_attr) 50 59
+1 -1
include/cxl/features.h
··· 64 64 struct cxl_mailbox; 65 65 struct cxl_memdev; 66 66 #ifdef CONFIG_CXL_FEATURES 67 - inline struct cxl_features_state *to_cxlfs(struct cxl_dev_state *cxlds); 67 + struct cxl_features_state *to_cxlfs(struct cxl_dev_state *cxlds); 68 68 int devm_cxl_setup_features(struct cxl_dev_state *cxlds); 69 69 int devm_cxl_setup_fwctl(struct device *host, struct cxl_memdev *cxlmd); 70 70 #else
+7
include/linux/edac.h
··· 745 745 #endif /* CONFIG_EDAC_ECS */ 746 746 747 747 enum edac_mem_repair_type { 748 + EDAC_REPAIR_PPR, 749 + EDAC_REPAIR_CACHELINE_SPARING, 750 + EDAC_REPAIR_ROW_SPARING, 751 + EDAC_REPAIR_BANK_SPARING, 752 + EDAC_REPAIR_RANK_SPARING, 748 753 EDAC_REPAIR_MAX 749 754 }; 755 + 756 + extern const char * const edac_repair_type[]; 750 757 751 758 enum edac_mem_repair_cmd { 752 759 EDAC_DO_MEM_REPAIR = 1,
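The `edac_repair_type[]` table and the enum entries added above use designated initializers so each string stays tied to its enum value. A self-contained sketch of the same technique follows; the `REPAIR_*` names, `repair_type_name[]`, and the bounds-checked `repair_type_to_str()` accessor are hypothetical additions for illustration and are not part of the EDAC API.

```c
#include <stddef.h>

/* Hypothetical mirror of the enum in the diff; the last entry is the count. */
enum repair_type {
	REPAIR_PPR,
	REPAIR_CACHELINE_SPARING,
	REPAIR_ROW_SPARING,
	REPAIR_BANK_SPARING,
	REPAIR_RANK_SPARING,
	REPAIR_MAX
};

/* Designated initializers keep each name next to its enum value. */
static const char * const repair_type_name[] = {
	[REPAIR_PPR]               = "ppr",
	[REPAIR_CACHELINE_SPARING] = "cacheline-sparing",
	[REPAIR_ROW_SPARING]       = "row-sparing",
	[REPAIR_BANK_SPARING]      = "bank-sparing",
	[REPAIR_RANK_SPARING]      = "rank-sparing",
};

/* Catch a new trailing enum value that was not given a name. */
_Static_assert(sizeof(repair_type_name) / sizeof(repair_type_name[0]) ==
	       REPAIR_MAX, "repair_type_name[] is out of sync with the enum");

/* Bounds-checked lookup; returns NULL for values outside the enum. */
static const char *repair_type_to_str(int type)
{
	if (type < 0 || type >= REPAIR_MAX)
		return NULL;
	return repair_type_name[type];
}
```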
+1
tools/testing/cxl/Kbuild
··· 67 67 cxl_core-$(CONFIG_CXL_REGION) += $(CXL_CORE_SRC)/region.o 68 68 cxl_core-$(CONFIG_CXL_MCE) += $(CXL_CORE_SRC)/mce.o 69 69 cxl_core-$(CONFIG_CXL_FEATURES) += $(CXL_CORE_SRC)/features.o 70 + cxl_core-$(CONFIG_CXL_EDAC_MEM_FEATURES) += $(CXL_CORE_SRC)/edac.o 70 71 cxl_core-y += config_check.o 71 72 cxl_core-y += cxl_core_test.o 72 73 cxl_core-y += cxl_core_exports.o
+1
tools/testing/cxl/test/cxl.c
··· 1527 1527 module_init(cxl_test_init); 1528 1528 module_exit(cxl_test_exit); 1529 1529 MODULE_LICENSE("GPL v2"); 1530 + MODULE_DESCRIPTION("cxl_test: setup module"); 1530 1531 MODULE_IMPORT_NS("ACPI"); 1531 1532 MODULE_IMPORT_NS("CXL");
+1
tools/testing/cxl/test/mem.c
··· 1909 1909 1910 1910 module_platform_driver(cxl_mock_mem_driver); 1911 1911 MODULE_LICENSE("GPL v2"); 1912 + MODULE_DESCRIPTION("cxl_test: mem device mock module"); 1912 1913 MODULE_IMPORT_NS("CXL");
+1
tools/testing/cxl/test/mock.c
··· 312 312 EXPORT_SYMBOL_NS_GPL(__wrap_cxl_dport_init_ras_reporting, "CXL"); 313 313 314 314 MODULE_LICENSE("GPL v2"); 315 + MODULE_DESCRIPTION("cxl_test: emulation module"); 315 316 MODULE_IMPORT_NS("ACPI"); 316 317 MODULE_IMPORT_NS("CXL");