Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'edac_updates_for_v6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras

Pull EDAC updates from Borislav Petkov:

- Add support for new AMD family 0x1a models to amd64_edac

- Add an EDAC driver for the AMD VersalNET memory controller which
reports hw errors from different IP blocks in the fabric using an
IPC-type transport

- Drop the silly static number of memory controllers in the Intel EDAC
drivers (skx, i10nm) in favor of a flexible array so that the former
doesn't need to be increased with every new generation which adds
more memory controllers; along with a proper refactoring

- Add support for two Alder Lake-S SoCs to ie31200_edac

- Add an EDAC driver for ARM Cortex A72 cores, and specifically for
reporting L1 and L2 cache errors

- Last but not least, the usual fixes, cleanups and improvements all
over the subsystem
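The skx/i10nm change above follows a common kernel pattern: replace a fixed-size `imc[NUM]` array inside the device structure with a trailing flexible array member that is sized once the real controller count is known at probe time. A minimal user-space sketch of that pattern (the names here are illustrative stand-ins, not the kernel's actual `skx_dev` layout):

```c
#include <stdlib.h>

/* Illustrative mirror of the refactor: the per-controller array is a
 * flexible array member, so a new CPU generation with more memory
 * controllers needs no macro bump, only a different runtime count. */
struct imc_like {
	int mc;            /* logical controller index */
	int num_channels;
};

struct skx_dev_like {
	int num_imc;               /* discovered at probe time */
	struct imc_like imc[];     /* flexible array member, sized below */
};

static struct skx_dev_like *alloc_dev(int num_imc)
{
	struct skx_dev_like *d;

	/* sizeof(*d) covers the header; the trailing array is added on
	 * (the kernel would use struct_size() for overflow safety). */
	d = calloc(1, sizeof(*d) + (size_t)num_imc * sizeof(d->imc[0]));
	if (!d)
		return NULL;

	d->num_imc = num_imc;
	for (int i = 0; i < num_imc; i++)
		d->imc[i].mc = i;

	return d;
}
```

In the actual series, the preconfigured count in the `res_config` data is compared against the runtime count and the list is reallocated when they differ, which is what the "Reallocate skx_dev list if preconfigured cnt != runtime cnt" commit implements.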

* tag 'edac_updates_for_v6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: (23 commits)
EDAC/versalnet: Return the correct error in mc_probe()
EDAC/mc_sysfs: Increase legacy channel support to 16
EDAC/amd64: Add support for AMD family 1Ah-based newer models
EDAC: Add a driver for the AMD Versal NET DDR controller
dt-bindings: memory-controllers: Add support for Versal NET EDAC
RAS: Export log_non_standard_event() to drivers
cdx: Export Symbols for MCDI RPC and Initialization
cdx: Split mcdi.h and reorganize headers
EDAC/skx_common: Use topology_physical_package_id() instead of open coding
EDAC: Fix wrong executable file modes for C source files
EDAC/altera: Use dev_fwnode()
EDAC/skx_common: Remove unused *NUM*_IMC macros
EDAC/i10nm: Reallocate skx_dev list if preconfigured cnt != runtime cnt
EDAC/skx_common: Remove redundant upper bound check for res->imc
EDAC/skx_common: Make skx_dev->imc[] a flexible array
EDAC/skx_common: Swap memory controller index mapping
EDAC/skx_common: Move mc_mapping to be a field inside struct skx_imc
EDAC/{skx_common,skx}: Use configuration data, not global macros
EDAC/i10nm: Skip DIMM enumeration on a disabled memory controller
EDAC/ie31200: Add two more Intel Alder Lake-S SoCs for EDAC support
...

+1553 -111
+17
Documentation/devicetree/bindings/arm/cpus.yaml
··· 353 353 $ref: /schemas/types.yaml#/definitions/phandle 354 354 description: Link to Mediatek Cache Coherent Interconnect 355 355 356 + edac-enabled: 357 + $ref: /schemas/types.yaml#/definitions/flag 358 + description: 359 + A72 CPUs support Error Detection And Correction (EDAC) on their L1 and 360 + L2 caches. This flag marks this function as usable. 361 + 356 362 qcom,saw: 357 363 $ref: /schemas/types.yaml#/definitions/phandle 358 364 description: ··· 405 399 allOf: 406 400 - $ref: /schemas/cpu.yaml# 407 401 - $ref: /schemas/opp/opp-v1.yaml# 402 + - if: 403 + not: 404 + properties: 405 + compatible: 406 + contains: 407 + const: arm,cortex-a72 408 + then: 409 + # Allow edac-enabled only for Cortex A72 410 + properties: 411 + edac-enabled: false 412 + 408 413 - if: 409 414 # If the enable-method property contains one of those values 410 415 properties:
+41
Documentation/devicetree/bindings/memory-controllers/xlnx,versal-net-ddrmc5.yaml
··· 1 + # SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause) 2 + %YAML 1.2 3 + --- 4 + $id: http://devicetree.org/schemas/memory-controllers/xlnx,versal-net-ddrmc5.yaml# 5 + $schema: http://devicetree.org/meta-schemas/core.yaml# 6 + 7 + title: Xilinx Versal NET Memory Controller 8 + 9 + maintainers: 10 + - Shubhrajyoti Datta <shubhrajyoti.datta@amd.com> 11 + 12 + description: 13 + The integrated DDR Memory Controllers (DDRMCs) support both DDR5 and LPDDR5 14 + compact and extended memory interfaces. Versal NET DDR memory controller 15 + has an optional ECC support which correct single bit ECC errors and detect 16 + double bit ECC errors. It also has support for reporting other errors like 17 + MMCM (Mixed-Mode Clock Manager) errors and General software errors. 18 + 19 + properties: 20 + compatible: 21 + const: xlnx,versal-net-ddrmc5 22 + 23 + amd,rproc: 24 + $ref: /schemas/types.yaml#/definitions/phandle 25 + description: 26 + phandle to the remoteproc_r5 rproc node using which APU interacts 27 + with remote processor. APU primarily communicates with the RPU for 28 + accessing the DDRMC address space and getting error notification. 29 + 30 + required: 31 + - compatible 32 + - amd,rproc 33 + 34 + additionalProperties: false 35 + 36 + examples: 37 + - | 38 + memory-controller { 39 + compatible = "xlnx,versal-net-ddrmc5"; 40 + amd,rproc = <&remoteproc_r5>; 41 + };
+14 -3
MAINTAINERS
··· 8745 8745 EDAC-CORE 8746 8746 M: Borislav Petkov <bp@alien8.de> 8747 8747 M: Tony Luck <tony.luck@intel.com> 8748 - R: James Morse <james.morse@arm.com> 8749 - R: Mauro Carvalho Chehab <mchehab@kernel.org> 8750 - R: Robert Richter <rric@kernel.org> 8751 8748 L: linux-edac@vger.kernel.org 8752 8749 S: Supported 8753 8750 T: git git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git edac-for-next 8754 8751 F: Documentation/driver-api/edac.rst 8755 8752 F: drivers/edac/ 8756 8753 F: include/linux/edac.h 8754 + 8755 + EDAC-A72 8756 + M: Vijay Balakrishna <vijayb@linux.microsoft.com> 8757 + M: Tyler Hicks <code@tyhicks.com> 8758 + L: linux-edac@vger.kernel.org 8759 + S: Supported 8760 + F: drivers/edac/a72_edac.c 8757 8761 8758 8762 EDAC-DMC520 8759 8763 M: Lei Wang <lewan@microsoft.com> ··· 27678 27674 S: Maintained 27679 27675 F: Documentation/devicetree/bindings/memory-controllers/xlnx,versal-ddrmc-edac.yaml 27680 27676 F: drivers/edac/versal_edac.c 27677 + 27678 + XILINX VERSALNET EDAC DRIVER 27679 + M: Shubhrajyoti Datta <shubhrajyoti.datta@amd.com> 27680 + S: Maintained 27681 + F: Documentation/devicetree/bindings/memory-controllers/xlnx,versal-net-ddrmc5.yaml 27682 + F: drivers/edac/versalnet_edac.c 27683 + F: include/linux/cdx/edac_cdx_pcol.h 27681 27684 27682 27685 XILINX WATCHDOG DRIVER 27683 27686 M: Srinivas Neeli <srinivas.neeli@amd.com>
drivers/cdx/controller/bitfield.h → include/linux/cdx/bitfield.h
+1 -1
drivers/cdx/controller/cdx_controller.c
··· 14 14 #include "cdx_controller.h" 15 15 #include "../cdx.h" 16 16 #include "mcdi_functions.h" 17 - #include "mcdi.h" 17 + #include "mcdid.h" 18 18 19 19 static unsigned int cdx_mcdi_rpc_timeout(struct cdx_mcdi *cdx, unsigned int cmd) 20 20 {
+1 -1
drivers/cdx/controller/cdx_rpmsg.c
··· 15 15 #include "../cdx.h" 16 16 #include "cdx_controller.h" 17 17 #include "mcdi_functions.h" 18 - #include "mcdi.h" 18 + #include "mcdid.h" 19 19 20 20 static struct rpmsg_device_id cdx_rpmsg_id_table[] = { 21 21 { .name = "mcdi_ipc" },
+41 -2
drivers/cdx/controller/mcdi.c
··· 23 23 #include <linux/log2.h> 24 24 #include <linux/net_tstamp.h> 25 25 #include <linux/wait.h> 26 + #include <linux/cdx/bitfield.h> 26 27 27 - #include "bitfield.h" 28 - #include "mcdi.h" 28 + #include <linux/cdx/mcdi.h> 29 + #include "mcdid.h" 29 30 30 31 static void cdx_mcdi_cancel_cmd(struct cdx_mcdi *cdx, struct cdx_mcdi_cmd *cmd); 31 32 static void cdx_mcdi_wait_for_cleanup(struct cdx_mcdi *cdx); ··· 100 99 return cdx->mcdi_ops->mcdi_rpc_timeout(cdx, cmd); 101 100 } 102 101 102 + /** 103 + * cdx_mcdi_init - Initialize MCDI (Management Controller Driver Interface) state 104 + * @cdx: Handle to the CDX MCDI structure 105 + * 106 + * This function allocates and initializes internal MCDI structures and resources 107 + * for the CDX device, including the workqueue, locking primitives, and command 108 + * tracking mechanisms. It sets the initial operating mode and prepares the device 109 + * for MCDI operations. 110 + * 111 + * Return: 112 + * * 0 - on success 113 + * * -ENOMEM - if memory allocation or workqueue creation fails 114 + */ 103 115 int cdx_mcdi_init(struct cdx_mcdi *cdx) 104 116 { 105 117 struct cdx_mcdi_iface *mcdi; ··· 142 128 fail: 143 129 return rc; 144 130 } 131 + EXPORT_SYMBOL_GPL(cdx_mcdi_init); 145 132 133 + /** 134 + * cdx_mcdi_finish - Cleanup MCDI (Management Controller Driver Interface) state 135 + * @cdx: Handle to the CDX MCDI structure 136 + * 137 + * This function is responsible for cleaning up the MCDI (Management Controller Driver Interface) 138 + * resources associated with a cdx_mcdi structure. Also destroys the mcdi workqueue. 
139 + * 140 + */ 146 141 void cdx_mcdi_finish(struct cdx_mcdi *cdx) 147 142 { 148 143 struct cdx_mcdi_iface *mcdi; ··· 166 143 kfree(cdx->mcdi); 167 144 cdx->mcdi = NULL; 168 145 } 146 + EXPORT_SYMBOL_GPL(cdx_mcdi_finish); 169 147 170 148 static bool cdx_mcdi_flushed(struct cdx_mcdi_iface *mcdi, bool ignore_cleanups) 171 149 { ··· 577 553 cdx_mcdi_cmd_start_or_queue(mcdi, cmd); 578 554 } 579 555 556 + /** 557 + * cdx_mcdi_process_cmd - Process an incoming MCDI response 558 + * @cdx: Handle to the CDX MCDI structure 559 + * @outbuf: Pointer to the response buffer received from the management controller 560 + * @len: Length of the response buffer in bytes 561 + * 562 + * This function handles a response from the management controller. It locates the 563 + * corresponding command using the sequence number embedded in the header, 564 + * completes the command if it is still pending, and initiates any necessary cleanup. 565 + * 566 + * The function assumes that the response buffer is well-formed and at least one 567 + * dword in size. 568 + */ 580 569 void cdx_mcdi_process_cmd(struct cdx_mcdi *cdx, struct cdx_dword *outbuf, int len) 581 570 { 582 571 struct cdx_mcdi_iface *mcdi; ··· 627 590 628 591 cdx_mcdi_process_cleanup_list(mcdi->cdx, &cleanup_list); 629 592 } 593 + EXPORT_SYMBOL_GPL(cdx_mcdi_process_cmd); 630 594 631 595 static void cdx_mcdi_cmd_work(struct work_struct *context) 632 596 { ··· 795 757 return cdx_mcdi_rpc_sync(cdx, cmd, inbuf, inlen, outbuf, outlen, 796 758 outlen_actual, false); 797 759 } 760 + EXPORT_SYMBOL_GPL(cdx_mcdi_rpc); 798 761 799 762 /** 800 763 * cdx_mcdi_rpc_async - Schedule an MCDI command to run asynchronously
+2 -45
drivers/cdx/controller/mcdi.h → include/linux/cdx/mcdi.h
··· 11 11 #include <linux/kref.h> 12 12 #include <linux/rpmsg.h> 13 13 14 - #include "bitfield.h" 15 - #include "mc_cdx_pcol.h" 16 - 17 - #ifdef DEBUG 18 - #define CDX_WARN_ON_ONCE_PARANOID(x) WARN_ON_ONCE(x) 19 - #define CDX_WARN_ON_PARANOID(x) WARN_ON(x) 20 - #else 21 - #define CDX_WARN_ON_ONCE_PARANOID(x) do {} while (0) 22 - #define CDX_WARN_ON_PARANOID(x) do {} while (0) 23 - #endif 14 + #include "linux/cdx/bitfield.h" 24 15 25 16 /** 26 17 * enum cdx_mcdi_mode - MCDI transaction mode ··· 26 35 #define MCDI_RPC_TIMEOUT (10 * HZ) 27 36 #define MCDI_RPC_LONG_TIMEOU (60 * HZ) 28 37 #define MCDI_RPC_POST_RST_TIME (10 * HZ) 29 - 30 - #define MCDI_BUF_LEN (8 + MCDI_CTL_SDU_LEN_MAX) 31 38 32 39 /** 33 40 * enum cdx_mcdi_cmd_state - State for an individual MCDI command ··· 169 180 u32 fn_flags; 170 181 }; 171 182 172 - static inline struct cdx_mcdi_iface *cdx_mcdi_if(struct cdx_mcdi *cdx) 173 - { 174 - return cdx->mcdi ? &cdx->mcdi->iface : NULL; 175 - } 176 - 177 - int cdx_mcdi_init(struct cdx_mcdi *cdx); 178 183 void cdx_mcdi_finish(struct cdx_mcdi *cdx); 179 - 184 + int cdx_mcdi_init(struct cdx_mcdi *cdx); 180 185 void cdx_mcdi_process_cmd(struct cdx_mcdi *cdx, struct cdx_dword *outbuf, int len); 181 186 int cdx_mcdi_rpc(struct cdx_mcdi *cdx, unsigned int cmd, 182 187 const struct cdx_dword *inbuf, size_t inlen, 183 188 struct cdx_dword *outbuf, size_t outlen, size_t *outlen_actual); 184 - int cdx_mcdi_rpc_async(struct cdx_mcdi *cdx, unsigned int cmd, 185 - const struct cdx_dword *inbuf, size_t inlen, 186 - cdx_mcdi_async_completer *complete, 187 - unsigned long cookie); 188 - int cdx_mcdi_wait_for_quiescence(struct cdx_mcdi *cdx, 189 - unsigned int timeout_jiffies); 190 189 191 190 /* 192 191 * We expect that 16- and 32-bit fields in MCDI requests and responses ··· 192 215 #define _MCDI_DWORD(_buf, _field) \ 193 216 ((_buf) + (_MCDI_CHECK_ALIGN(MC_CMD_ ## _field ## _OFST, 4) >> 2)) 194 217 195 - #define MCDI_BYTE(_buf, _field) \ 196 - 
((void)BUILD_BUG_ON_ZERO(MC_CMD_ ## _field ## _LEN != 1), \ 197 - *MCDI_PTR(_buf, _field)) 198 - #define MCDI_WORD(_buf, _field) \ 199 - ((void)BUILD_BUG_ON_ZERO(MC_CMD_ ## _field ## _LEN != 2), \ 200 - le16_to_cpu(*(__force const __le16 *)MCDI_PTR(_buf, _field))) 201 218 #define MCDI_SET_DWORD(_buf, _field, _value) \ 202 219 CDX_POPULATE_DWORD_1(*_MCDI_DWORD(_buf, _field), CDX_DWORD, _value) 203 220 #define MCDI_DWORD(_buf, _field) \ 204 221 CDX_DWORD_FIELD(*_MCDI_DWORD(_buf, _field), CDX_DWORD) 205 - #define MCDI_POPULATE_DWORD_1(_buf, _field, _name1, _value1) \ 206 - CDX_POPULATE_DWORD_1(*_MCDI_DWORD(_buf, _field), \ 207 - MC_CMD_ ## _name1, _value1) 208 - #define MCDI_SET_QWORD(_buf, _field, _value) \ 209 - do { \ 210 - CDX_POPULATE_DWORD_1(_MCDI_DWORD(_buf, _field)[0], \ 211 - CDX_DWORD, (u32)(_value)); \ 212 - CDX_POPULATE_DWORD_1(_MCDI_DWORD(_buf, _field)[1], \ 213 - CDX_DWORD, (u64)(_value) >> 32); \ 214 - } while (0) 215 - #define MCDI_QWORD(_buf, _field) \ 216 - (CDX_DWORD_FIELD(_MCDI_DWORD(_buf, _field)[0], CDX_DWORD) | \ 217 - (u64)CDX_DWORD_FIELD(_MCDI_DWORD(_buf, _field)[1], CDX_DWORD) << 32) 218 - 219 222 #endif /* CDX_MCDI_H */
-1
drivers/cdx/controller/mcdi_functions.c
··· 5 5 6 6 #include <linux/module.h> 7 7 8 - #include "mcdi.h" 9 8 #include "mcdi_functions.h" 10 9 11 10 int cdx_mcdi_get_num_buses(struct cdx_mcdi *cdx)
+2 -1
drivers/cdx/controller/mcdi_functions.h
··· 8 8 #ifndef CDX_MCDI_FUNCTIONS_H 9 9 #define CDX_MCDI_FUNCTIONS_H 10 10 11 - #include "mcdi.h" 11 + #include <linux/cdx/mcdi.h> 12 + #include "mcdid.h" 12 13 #include "../cdx.h" 13 14 14 15 /**
+63
drivers/cdx/controller/mcdid.h
··· 1 + /* SPDX-License-Identifier: GPL-2.0 2 + * 3 + * Copyright 2008-2013 Solarflare Communications Inc. 4 + * Copyright (C) 2022-2025, Advanced Micro Devices, Inc. 5 + */ 6 + 7 + #ifndef CDX_MCDID_H 8 + #define CDX_MCDID_H 9 + 10 + #include <linux/mutex.h> 11 + #include <linux/kref.h> 12 + #include <linux/rpmsg.h> 13 + 14 + #include "mc_cdx_pcol.h" 15 + 16 + #ifdef DEBUG 17 + #define CDX_WARN_ON_ONCE_PARANOID(x) WARN_ON_ONCE(x) 18 + #define CDX_WARN_ON_PARANOID(x) WARN_ON(x) 19 + #else 20 + #define CDX_WARN_ON_ONCE_PARANOID(x) do {} while (0) 21 + #define CDX_WARN_ON_PARANOID(x) do {} while (0) 22 + #endif 23 + 24 + #define MCDI_BUF_LEN (8 + MCDI_CTL_SDU_LEN_MAX) 25 + 26 + static inline struct cdx_mcdi_iface *cdx_mcdi_if(struct cdx_mcdi *cdx) 27 + { 28 + return cdx->mcdi ? &cdx->mcdi->iface : NULL; 29 + } 30 + 31 + int cdx_mcdi_rpc_async(struct cdx_mcdi *cdx, unsigned int cmd, 32 + const struct cdx_dword *inbuf, size_t inlen, 33 + cdx_mcdi_async_completer *complete, 34 + unsigned long cookie); 35 + int cdx_mcdi_wait_for_quiescence(struct cdx_mcdi *cdx, 36 + unsigned int timeout_jiffies); 37 + 38 + /* 39 + * We expect that 16- and 32-bit fields in MCDI requests and responses 40 + * are appropriately aligned, but 64-bit fields are only 41 + * 32-bit-aligned. 
42 + */ 43 + #define MCDI_BYTE(_buf, _field) \ 44 + ((void)BUILD_BUG_ON_ZERO(MC_CMD_ ## _field ## _LEN != 1), \ 45 + *MCDI_PTR(_buf, _field)) 46 + #define MCDI_WORD(_buf, _field) \ 47 + ((void)BUILD_BUG_ON_ZERO(MC_CMD_ ## _field ## _LEN != 2), \ 48 + le16_to_cpu(*(__force const __le16 *)MCDI_PTR(_buf, _field))) 49 + #define MCDI_POPULATE_DWORD_1(_buf, _field, _name1, _value1) \ 50 + CDX_POPULATE_DWORD_1(*_MCDI_DWORD(_buf, _field), \ 51 + MC_CMD_ ## _name1, _value1) 52 + #define MCDI_SET_QWORD(_buf, _field, _value) \ 53 + do { \ 54 + CDX_POPULATE_DWORD_1(_MCDI_DWORD(_buf, _field)[0], \ 55 + CDX_DWORD, (u32)(_value)); \ 56 + CDX_POPULATE_DWORD_1(_MCDI_DWORD(_buf, _field)[1], \ 57 + CDX_DWORD, (u64)(_value) >> 32); \ 58 + } while (0) 59 + #define MCDI_QWORD(_buf, _field) \ 60 + (CDX_DWORD_FIELD(_MCDI_DWORD(_buf, _field)[0], CDX_DWORD) | \ 61 + (u64)CDX_DWORD_FIELD(_MCDI_DWORD(_buf, _field)[1], CDX_DWORD) << 32) 62 + 63 + #endif /* CDX_MCDID_H */
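The comment in the header above notes that 64-bit MCDI fields are only 32-bit aligned, which is why `MCDI_SET_QWORD`/`MCDI_QWORD` store and reassemble a qword as two consecutive dwords rather than accessing it as a single 64-bit load or store. A simplified stand-alone model of that split (plain helpers instead of the kernel's buffer macros):

```c
#include <stdint.h>

/* Model of MCDI_SET_QWORD/MCDI_QWORD: a 64-bit value kept as two
 * consecutive 32-bit dwords, low word first, so only 32-bit-aligned
 * accesses are ever made. Simplified stand-ins, not the kernel API. */
static void set_qword(uint32_t *buf, uint64_t value)
{
	buf[0] = (uint32_t)value;         /* low 32 bits */
	buf[1] = (uint32_t)(value >> 32); /* high 32 bits */
}

static uint64_t get_qword(const uint32_t *buf)
{
	return (uint64_t)buf[0] | ((uint64_t)buf[1] << 32);
}
```

The kernel macros additionally handle endianness and field-offset lookup via `CDX_POPULATE_DWORD_1`/`CDX_DWORD_FIELD`; the low/high split shown here is the part the alignment comment is about.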
+16
drivers/edac/Kconfig
··· 576 576 errors (CE) only. Loongson-3A5000/3C5000/3D5000/3A6000/3C6000 577 577 are compatible. 578 578 579 + config EDAC_CORTEX_A72 580 + tristate "ARM Cortex A72" 581 + depends on ARM64 582 + help 583 + Support for L1/L2 cache error detection for ARM Cortex A72 processor. 584 + The detected and reported errors are from reading CPU/L2 memory error 585 + syndrome registers. 586 + 587 + config EDAC_VERSALNET 588 + tristate "AMD VersalNET DDR Controller" 589 + depends on CDX_CONTROLLER && ARCH_ZYNQMP 590 + help 591 + Support for single bit error correction, double bit error detection 592 + and other system errors from various IP subsystems like RPU, NOCs, 593 + HNICX, PL on the AMD Versal NET DDR memory controller. 594 + 579 595 endif # EDAC
+2
drivers/edac/Makefile
··· 88 88 obj-$(CONFIG_EDAC_ZYNQMP) += zynqmp_edac.o 89 89 obj-$(CONFIG_EDAC_VERSAL) += versal_edac.o 90 90 obj-$(CONFIG_EDAC_LOONGSON) += loongson_edac.o 91 + obj-$(CONFIG_EDAC_VERSALNET) += versalnet_edac.o 92 + obj-$(CONFIG_EDAC_CORTEX_A72) += a72_edac.o
+225
drivers/edac/a72_edac.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* 3 + * Cortex A72 EDAC L1 and L2 cache error detection 4 + * 5 + * Copyright (c) 2020 Pengutronix, Sascha Hauer <s.hauer@pengutronix.de> 6 + * Copyright (c) 2025 Microsoft Corporation, <vijayb@linux.microsoft.com> 7 + * 8 + * Based on Code from: 9 + * Copyright (c) 2018, NXP Semiconductor 10 + * Author: York Sun <york.sun@nxp.com> 11 + */ 12 + 13 + #include <linux/module.h> 14 + #include <linux/of.h> 15 + #include <linux/bitfield.h> 16 + #include <asm/smp_plat.h> 17 + 18 + #include "edac_module.h" 19 + 20 + #define DRVNAME "a72-edac" 21 + 22 + #define SYS_CPUMERRSR_EL1 sys_reg(3, 1, 15, 2, 2) 23 + #define SYS_L2MERRSR_EL1 sys_reg(3, 1, 15, 2, 3) 24 + 25 + #define CPUMERRSR_EL1_RAMID GENMASK(30, 24) 26 + #define L2MERRSR_EL1_CPUID_WAY GENMASK(21, 18) 27 + 28 + #define CPUMERRSR_EL1_VALID BIT(31) 29 + #define CPUMERRSR_EL1_FATAL BIT(63) 30 + #define L2MERRSR_EL1_VALID BIT(31) 31 + #define L2MERRSR_EL1_FATAL BIT(63) 32 + 33 + #define L1_I_TAG_RAM 0x00 34 + #define L1_I_DATA_RAM 0x01 35 + #define L1_D_TAG_RAM 0x08 36 + #define L1_D_DATA_RAM 0x09 37 + #define TLB_RAM 0x18 38 + 39 + #define MESSAGE_SIZE 64 40 + 41 + struct mem_err_synd_reg { 42 + u64 cpu_mesr; 43 + u64 l2_mesr; 44 + }; 45 + 46 + static struct cpumask compat_mask; 47 + 48 + static void report_errors(struct edac_device_ctl_info *edac_ctl, int cpu, 49 + struct mem_err_synd_reg *mesr) 50 + { 51 + u64 cpu_mesr = mesr->cpu_mesr; 52 + u64 l2_mesr = mesr->l2_mesr; 53 + char msg[MESSAGE_SIZE]; 54 + 55 + if (cpu_mesr & CPUMERRSR_EL1_VALID) { 56 + const char *str; 57 + bool fatal = cpu_mesr & CPUMERRSR_EL1_FATAL; 58 + 59 + switch (FIELD_GET(CPUMERRSR_EL1_RAMID, cpu_mesr)) { 60 + case L1_I_TAG_RAM: 61 + str = "L1-I Tag RAM"; 62 + break; 63 + case L1_I_DATA_RAM: 64 + str = "L1-I Data RAM"; 65 + break; 66 + case L1_D_TAG_RAM: 67 + str = "L1-D Tag RAM"; 68 + break; 69 + case L1_D_DATA_RAM: 70 + str = "L1-D Data RAM"; 71 + break; 72 + case TLB_RAM: 73 + str = "TLB RAM"; 
74 + break; 75 + default: 76 + str = "Unspecified"; 77 + break; 78 + } 79 + 80 + snprintf(msg, MESSAGE_SIZE, "%s %s error(s) on CPU %d", 81 + str, fatal ? "fatal" : "correctable", cpu); 82 + 83 + if (fatal) 84 + edac_device_handle_ue(edac_ctl, cpu, 0, msg); 85 + else 86 + edac_device_handle_ce(edac_ctl, cpu, 0, msg); 87 + } 88 + 89 + if (l2_mesr & L2MERRSR_EL1_VALID) { 90 + bool fatal = l2_mesr & L2MERRSR_EL1_FATAL; 91 + 92 + snprintf(msg, MESSAGE_SIZE, "L2 %s error(s) on CPU %d CPUID/WAY 0x%lx", 93 + fatal ? "fatal" : "correctable", cpu, 94 + FIELD_GET(L2MERRSR_EL1_CPUID_WAY, l2_mesr)); 95 + if (fatal) 96 + edac_device_handle_ue(edac_ctl, cpu, 1, msg); 97 + else 98 + edac_device_handle_ce(edac_ctl, cpu, 1, msg); 99 + } 100 + } 101 + 102 + static void read_errors(void *data) 103 + { 104 + struct mem_err_synd_reg *mesr = data; 105 + 106 + mesr->cpu_mesr = read_sysreg_s(SYS_CPUMERRSR_EL1); 107 + if (mesr->cpu_mesr & CPUMERRSR_EL1_VALID) { 108 + write_sysreg_s(0, SYS_CPUMERRSR_EL1); 109 + isb(); 110 + } 111 + mesr->l2_mesr = read_sysreg_s(SYS_L2MERRSR_EL1); 112 + if (mesr->l2_mesr & L2MERRSR_EL1_VALID) { 113 + write_sysreg_s(0, SYS_L2MERRSR_EL1); 114 + isb(); 115 + } 116 + } 117 + 118 + static void a72_edac_check(struct edac_device_ctl_info *edac_ctl) 119 + { 120 + struct mem_err_synd_reg mesr; 121 + int cpu; 122 + 123 + cpus_read_lock(); 124 + for_each_cpu_and(cpu, cpu_online_mask, &compat_mask) { 125 + smp_call_function_single(cpu, read_errors, &mesr, true); 126 + report_errors(edac_ctl, cpu, &mesr); 127 + } 128 + cpus_read_unlock(); 129 + } 130 + 131 + static int a72_edac_probe(struct platform_device *pdev) 132 + { 133 + struct edac_device_ctl_info *edac_ctl; 134 + struct device *dev = &pdev->dev; 135 + int rc; 136 + 137 + edac_ctl = edac_device_alloc_ctl_info(0, "cpu", 138 + num_possible_cpus(), "L", 2, 1, 139 + edac_device_alloc_index()); 140 + if (!edac_ctl) 141 + return -ENOMEM; 142 + 143 + edac_ctl->edac_check = a72_edac_check; 144 + edac_ctl->dev = dev; 145 + 
edac_ctl->mod_name = dev_name(dev); 146 + edac_ctl->dev_name = dev_name(dev); 147 + edac_ctl->ctl_name = DRVNAME; 148 + dev_set_drvdata(dev, edac_ctl); 149 + 150 + rc = edac_device_add_device(edac_ctl); 151 + if (rc) 152 + goto out_dev; 153 + 154 + return 0; 155 + 156 + out_dev: 157 + edac_device_free_ctl_info(edac_ctl); 158 + 159 + return rc; 160 + } 161 + 162 + static void a72_edac_remove(struct platform_device *pdev) 163 + { 164 + struct edac_device_ctl_info *edac_ctl = dev_get_drvdata(&pdev->dev); 165 + 166 + edac_device_del_device(edac_ctl->dev); 167 + edac_device_free_ctl_info(edac_ctl); 168 + } 169 + 170 + static const struct of_device_id cortex_arm64_edac_of_match[] = { 171 + { .compatible = "arm,cortex-a72" }, 172 + {} 173 + }; 174 + MODULE_DEVICE_TABLE(of, cortex_arm64_edac_of_match); 175 + 176 + static struct platform_driver a72_edac_driver = { 177 + .probe = a72_edac_probe, 178 + .remove = a72_edac_remove, 179 + .driver = { 180 + .name = DRVNAME, 181 + }, 182 + }; 183 + 184 + static struct platform_device *a72_pdev; 185 + 186 + static int __init a72_edac_driver_init(void) 187 + { 188 + int cpu; 189 + 190 + for_each_possible_cpu(cpu) { 191 + struct device_node *np __free(device_node) = of_cpu_device_node_get(cpu); 192 + if (np) { 193 + if (of_match_node(cortex_arm64_edac_of_match, np) && 194 + of_property_read_bool(np, "edac-enabled")) { 195 + cpumask_set_cpu(cpu, &compat_mask); 196 + } 197 + } else { 198 + pr_warn("failed to find device node for CPU %d\n", cpu); 199 + } 200 + } 201 + 202 + if (cpumask_empty(&compat_mask)) 203 + return 0; 204 + 205 + a72_pdev = platform_device_register_simple(DRVNAME, -1, NULL, 0); 206 + if (IS_ERR(a72_pdev)) { 207 + pr_err("failed to register A72 EDAC device\n"); 208 + return PTR_ERR(a72_pdev); 209 + } 210 + 211 + return platform_driver_register(&a72_edac_driver); 212 + } 213 + 214 + static void __exit a72_edac_driver_exit(void) 215 + { 216 + platform_device_unregister(a72_pdev); 217 + 
platform_driver_unregister(&a72_edac_driver); 218 + } 219 + 220 + module_init(a72_edac_driver_init); 221 + module_exit(a72_edac_driver_exit); 222 + 223 + MODULE_LICENSE("GPL"); 224 + MODULE_AUTHOR("Sascha Hauer <s.hauer@pengutronix.de>"); 225 + MODULE_DESCRIPTION("Cortex A72 L1 and L2 cache EDAC driver");
+2 -2
drivers/edac/altera_edac.c
··· 2130 2130 edac->irq_chip.name = pdev->dev.of_node->name; 2131 2131 edac->irq_chip.irq_mask = a10_eccmgr_irq_mask; 2132 2132 edac->irq_chip.irq_unmask = a10_eccmgr_irq_unmask; 2133 - edac->domain = irq_domain_create_linear(of_fwnode_handle(pdev->dev.of_node), 2134 - 64, &a10_eccmgr_ic_ops, edac); 2133 + edac->domain = irq_domain_create_linear(dev_fwnode(&pdev->dev), 64, &a10_eccmgr_ic_ops, 2134 + edac); 2135 2135 if (!edac->domain) { 2136 2136 dev_err(&pdev->dev, "Error adding IRQ domain\n"); 2137 2137 return -ENOMEM;
+20
drivers/edac/amd64_edac.c
··· 3923 3923 pvt->ctl_name = "F1Ah_M40h"; 3924 3924 pvt->flags.zn_regs_v2 = 1; 3925 3925 break; 3926 + case 0x50 ... 0x57: 3927 + pvt->ctl_name = "F1Ah_M50h"; 3928 + pvt->max_mcs = 16; 3929 + pvt->flags.zn_regs_v2 = 1; 3930 + break; 3931 + case 0x90 ... 0x9f: 3932 + pvt->ctl_name = "F1Ah_M90h"; 3933 + pvt->max_mcs = 8; 3934 + pvt->flags.zn_regs_v2 = 1; 3935 + break; 3936 + case 0xa0 ... 0xaf: 3937 + pvt->ctl_name = "F1Ah_MA0h"; 3938 + pvt->max_mcs = 8; 3939 + pvt->flags.zn_regs_v2 = 1; 3940 + break; 3941 + case 0xc0 ... 0xc7: 3942 + pvt->ctl_name = "F1Ah_MC0h"; 3943 + pvt->max_mcs = 16; 3944 + pvt->flags.zn_regs_v2 = 1; 3945 + break; 3926 3946 } 3927 3947 break; 3928 3948
+1 -1
drivers/edac/amd64_edac.h
··· 96 96 /* Hardware limit on ChipSelect rows per MC and processors per system */ 97 97 #define NUM_CHIPSELECTS 8 98 98 #define DRAM_RANGES 8 99 - #define NUM_CONTROLLERS 12 99 + #define NUM_CONTROLLERS 16 100 100 101 101 #define ON true 102 102 #define OFF false
drivers/edac/ecs.c
+24
drivers/edac/edac_mc_sysfs.c
··· 305 305 channel_dimm_label_show, channel_dimm_label_store, 10); 306 306 DEVICE_CHANNEL(ch11_dimm_label, S_IRUGO | S_IWUSR, 307 307 channel_dimm_label_show, channel_dimm_label_store, 11); 308 + DEVICE_CHANNEL(ch12_dimm_label, S_IRUGO | S_IWUSR, 309 + channel_dimm_label_show, channel_dimm_label_store, 12); 310 + DEVICE_CHANNEL(ch13_dimm_label, S_IRUGO | S_IWUSR, 311 + channel_dimm_label_show, channel_dimm_label_store, 13); 312 + DEVICE_CHANNEL(ch14_dimm_label, S_IRUGO | S_IWUSR, 313 + channel_dimm_label_show, channel_dimm_label_store, 14); 314 + DEVICE_CHANNEL(ch15_dimm_label, S_IRUGO | S_IWUSR, 315 + channel_dimm_label_show, channel_dimm_label_store, 15); 308 316 309 317 /* Total possible dynamic DIMM Label attribute file table */ 310 318 static struct attribute *dynamic_csrow_dimm_attr[] = { ··· 328 320 &dev_attr_legacy_ch9_dimm_label.attr.attr, 329 321 &dev_attr_legacy_ch10_dimm_label.attr.attr, 330 322 &dev_attr_legacy_ch11_dimm_label.attr.attr, 323 + &dev_attr_legacy_ch12_dimm_label.attr.attr, 324 + &dev_attr_legacy_ch13_dimm_label.attr.attr, 325 + &dev_attr_legacy_ch14_dimm_label.attr.attr, 326 + &dev_attr_legacy_ch15_dimm_label.attr.attr, 331 327 NULL 332 328 }; 333 329 ··· 360 348 channel_ce_count_show, NULL, 10); 361 349 DEVICE_CHANNEL(ch11_ce_count, S_IRUGO, 362 350 channel_ce_count_show, NULL, 11); 351 + DEVICE_CHANNEL(ch12_ce_count, S_IRUGO, 352 + channel_ce_count_show, NULL, 12); 353 + DEVICE_CHANNEL(ch13_ce_count, S_IRUGO, 354 + channel_ce_count_show, NULL, 13); 355 + DEVICE_CHANNEL(ch14_ce_count, S_IRUGO, 356 + channel_ce_count_show, NULL, 14); 357 + DEVICE_CHANNEL(ch15_ce_count, S_IRUGO, 358 + channel_ce_count_show, NULL, 15); 363 359 364 360 /* Total possible dynamic ce_count attribute file table */ 365 361 static struct attribute *dynamic_csrow_ce_count_attr[] = { ··· 383 363 &dev_attr_legacy_ch9_ce_count.attr.attr, 384 364 &dev_attr_legacy_ch10_ce_count.attr.attr, 385 365 &dev_attr_legacy_ch11_ce_count.attr.attr, 366 + 
&dev_attr_legacy_ch12_ce_count.attr.attr, 367 + &dev_attr_legacy_ch13_ce_count.attr.attr, 368 + &dev_attr_legacy_ch14_ce_count.attr.attr, 369 + &dev_attr_legacy_ch15_ce_count.attr.attr, 386 370 NULL 387 371 }; 388 372
+21 -6
drivers/edac/i10nm_base.c
··· 468 468 return -ENODEV; 469 469 } 470 470 471 - if (imc_num > I10NM_NUM_DDR_IMC) { 472 - i10nm_printk(KERN_ERR, "Need to make I10NM_NUM_DDR_IMC >= %d\n", imc_num); 473 - return -EINVAL; 474 - } 475 - 476 471 if (cfg->ddr_imc_num != imc_num) { 477 472 /* 478 - * Store the number of present DDR memory controllers. 473 + * Update the configuration data to reflect the number of 474 + * present DDR memory controllers. 479 475 */ 480 476 cfg->ddr_imc_num = imc_num; 481 477 edac_dbg(2, "Set DDR MC number: %d", imc_num); 478 + 479 + /* Release and reallocate skx_dev list with the updated number. */ 480 + skx_remove(); 481 + if (skx_get_all_bus_mappings(cfg, &i10nm_edac_list) <= 0) 482 + return -ENODEV; 482 483 } 483 484 484 485 return 0; ··· 1058 1057 return !!GET_BITFIELD(mcmtr, 2, 2); 1059 1058 } 1060 1059 1060 + static bool i10nm_channel_disabled(struct skx_imc *imc, int chan) 1061 + { 1062 + u32 mcmtr = I10NM_GET_MCMTR(imc, chan); 1063 + 1064 + edac_dbg(1, "mc%d ch%d mcmtr reg %x\n", imc->mc, chan, mcmtr); 1065 + 1066 + return (mcmtr == ~0 || GET_BITFIELD(mcmtr, 18, 18)); 1067 + } 1068 + 1061 1069 static int i10nm_get_dimm_config(struct mem_ctl_info *mci, 1062 1070 struct res_config *cfg) 1063 1071 { ··· 1079 1069 for (i = 0; i < imc->num_channels; i++) { 1080 1070 if (!imc->mbase) 1081 1071 continue; 1072 + 1073 + if (i10nm_channel_disabled(imc, i)) { 1074 + edac_dbg(1, "mc%d ch%d is disabled.\n", imc->mc, i); 1075 + continue; 1076 + } 1082 1077 1083 1078 ndimms = 0; 1084 1079
+4
drivers/edac/ie31200_edac.c
··· 99 99 100 100 /* Alder Lake-S */ 101 101 #define PCI_DEVICE_ID_INTEL_IE31200_ADL_S_1 0x4660 102 + #define PCI_DEVICE_ID_INTEL_IE31200_ADL_S_2 0x4668 /* 8P+4E, e.g. i7-12700K */ 103 + #define PCI_DEVICE_ID_INTEL_IE31200_ADL_S_3 0x4648 /* 6P+4E, e.g. i5-12600K */ 102 104 103 105 /* Bartlett Lake-S */ 104 106 #define PCI_DEVICE_ID_INTEL_IE31200_BTL_S_1 0x4639 ··· 763 761 { PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_IE31200_RPL_S_6), (kernel_ulong_t)&rpl_s_cfg}, 764 762 { PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_IE31200_RPL_HX_1), (kernel_ulong_t)&rpl_s_cfg}, 765 763 { PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_IE31200_ADL_S_1), (kernel_ulong_t)&rpl_s_cfg}, 764 + { PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_IE31200_ADL_S_2), (kernel_ulong_t)&rpl_s_cfg}, 765 + { PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_IE31200_ADL_S_3), (kernel_ulong_t)&rpl_s_cfg}, 766 766 { PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_IE31200_BTL_S_1), (kernel_ulong_t)&rpl_s_cfg}, 767 767 { PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_IE31200_BTL_S_2), (kernel_ulong_t)&rpl_s_cfg}, 768 768 { PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_IE31200_BTL_S_3), (kernel_ulong_t)&rpl_s_cfg},
drivers/edac/mem_repair.c
drivers/edac/scrub.c
+20 -13
drivers/edac/skx_base.c
··· 33 33 #define MASK26 0x3FFFFFF /* Mask for 2^26 */ 34 34 #define MASK29 0x1FFFFFFF /* Mask for 2^29 */ 35 35 36 + static struct res_config skx_cfg = { 37 + .type = SKX, 38 + .decs_did = 0x2016, 39 + .busno_cfg_offset = 0xcc, 40 + .ddr_imc_num = 2, 41 + .ddr_chan_num = 3, 42 + .ddr_dimm_num = 2, 43 + }; 44 + 36 45 static struct skx_dev *get_skx_dev(struct pci_bus *bus, u8 idx) 37 46 { 38 47 struct skx_dev *d; ··· 61 52 62 53 struct munit { 63 54 u16 did; 64 - u16 devfn[SKX_NUM_IMC]; 55 + u16 devfn[2]; 65 56 u8 busidx; 66 57 u8 per_socket; 67 58 enum munittype mtype; ··· 98 89 if (!pdev) 99 90 break; 100 91 ndev++; 101 - if (m->per_socket == SKX_NUM_IMC) { 102 - for (i = 0; i < SKX_NUM_IMC; i++) 92 + if (m->per_socket == skx_cfg.ddr_imc_num) { 93 + for (i = 0; i < skx_cfg.ddr_imc_num; i++) 103 94 if (m->devfn[i] == pdev->devfn) 104 95 break; 105 - if (i == SKX_NUM_IMC) 96 + if (i == skx_cfg.ddr_imc_num) 106 97 goto fail; 107 98 } 108 99 d = get_skx_dev(pdev->bus, m->busidx); ··· 166 157 return -ENODEV; 167 158 } 168 159 169 - static struct res_config skx_cfg = { 170 - .type = SKX, 171 - .decs_did = 0x2016, 172 - .busno_cfg_offset = 0xcc, 173 - }; 174 - 175 160 static const struct x86_cpu_id skx_cpuids[] = { 176 161 X86_MATCH_VFM(INTEL_SKYLAKE_X, &skx_cfg), 177 162 { } ··· 189 186 /* Only the mcmtr on the first channel is effective */ 190 187 pci_read_config_dword(imc->chan[0].cdev, 0x87c, &mcmtr); 191 188 192 - for (i = 0; i < SKX_NUM_CHANNELS; i++) { 189 + for (i = 0; i < cfg->ddr_chan_num; i++) { 193 190 ndimms = 0; 194 191 pci_read_config_dword(imc->chan[i].cdev, 0x8C, &amap); 195 192 pci_read_config_dword(imc->chan[i].cdev, 0x400, &mcddrtcfg); 196 - for (j = 0; j < SKX_NUM_DIMMS; j++) { 193 + for (j = 0; j < cfg->ddr_dimm_num; j++) { 197 194 dimm = edac_get_dimm(mci, i, j, 0); 198 195 pci_read_config_dword(imc->chan[i].cdev, 199 196 0x80 + 4 * j, &mtr); ··· 623 620 return -ENODEV; 624 621 625 622 cfg = (struct res_config *)id->driver_data; 623 + 
skx_set_res_cfg(cfg); 626 624 627 625 rc = skx_get_hi_lo(0x2034, off, &skx_tolm, &skx_tohm); 628 626 if (rc) ··· 656 652 goto fail; 657 653 658 654 edac_dbg(2, "src_id = %d\n", src_id); 659 - for (i = 0; i < SKX_NUM_IMC; i++) { 655 + for (i = 0; i < cfg->ddr_imc_num; i++) { 660 656 d->imc[i].mc = mc++; 661 657 d->imc[i].lmc = i; 662 658 d->imc[i].src_id = src_id; 659 + d->imc[i].num_channels = cfg->ddr_chan_num; 660 + d->imc[i].num_dimms = cfg->ddr_dimm_num; 661 + 663 662 rc = skx_register_mci(&d->imc[i], d->imc[i].chan[0].cdev, 664 663 "Skylake Socket", EDAC_MOD_STR, 665 664 skx_get_dimm_config, cfg);
+35 -19
drivers/edac/skx_common.c
··· 14 14 * Copyright (c) 2018, Intel Corporation. 15 15 */ 16 16 17 + #include <linux/topology.h> 17 18 #include <linux/acpi.h> 18 19 #include <linux/dmi.h> 19 20 #include <linux/adxl.h> 21 + #include <linux/overflow.h> 20 22 #include <acpi/nfit.h> 21 23 #include <asm/mce.h> 22 24 #include <asm/uv/uv.h> ··· 132 130 * the logical indices of the memory controllers enumerated by the 133 131 * EDAC driver. 134 132 */ 135 - for (int i = 0; i < NUM_IMC; i++) 136 - d->mc_mapping[i] = i; 133 + for (int i = 0; i < d->num_imc; i++) 134 + d->imc[i].mc_mapping = i; 137 135 } 138 136 139 137 void skx_set_mc_mapping(struct skx_dev *d, u8 pmc, u8 lmc) ··· 141 139 edac_dbg(0, "Set the mapping of mc phy idx to logical idx: %02d -> %02d\n", 142 140 pmc, lmc); 143 141 144 - d->mc_mapping[pmc] = lmc; 142 + d->imc[lmc].mc_mapping = pmc; 145 143 } 146 144 EXPORT_SYMBOL_GPL(skx_set_mc_mapping); 147 145 148 - static u8 skx_get_mc_mapping(struct skx_dev *d, u8 pmc) 146 + static int skx_get_mc_mapping(struct skx_dev *d, u8 pmc) 149 147 { 150 - edac_dbg(0, "Get the mapping of mc phy idx to logical idx: %02d -> %02d\n", 151 - pmc, d->mc_mapping[pmc]); 148 + for (int lmc = 0; lmc < d->num_imc; lmc++) { 149 + if (d->imc[lmc].mc_mapping == pmc) { 150 + edac_dbg(0, "Get the mapping of mc phy idx to logical idx: %02d -> %02d\n", 151 + pmc, lmc); 152 152 153 - return d->mc_mapping[pmc]; 153 + return lmc; 154 + } 155 + } 156 + 157 + return -1; 154 158 } 155 159 156 160 static bool skx_adxl_decode(struct decoded_addr *res, enum error_source err_src) 157 161 { 162 + int i, lmc, len = 0; 158 163 struct skx_dev *d; 159 - int i, len = 0; 160 164 161 165 if (res->addr >= skx_tohm || (res->addr >= skx_tolm && 162 166 res->addr < BIT_ULL(32))) { ··· 208 200 res->cs = (int)adxl_values[component_indices[INDEX_CS]]; 209 201 } 210 202 211 - if (res->imc > NUM_IMC - 1 || res->imc < 0) { 203 + if (res->imc < 0) { 212 204 skx_printk(KERN_ERR, "Bad imc %d\n", res->imc); 213 205 return false; 214 206 } ··· 226 218 
return false; 227 219 } 228 220 229 - res->imc = skx_get_mc_mapping(d, res->imc); 221 + lmc = skx_get_mc_mapping(d, res->imc); 222 + if (lmc < 0) { 223 + skx_printk(KERN_ERR, "No lmc for imc %d\n", res->imc); 224 + return false; 225 + } 226 + 227 + res->imc = lmc; 230 228 231 229 for (i = 0; i < adxl_component_count; i++) { 232 230 if (adxl_values[i] == ~0x0ull) ··· 279 265 struct cpuinfo_x86 *c = &cpu_data(cpu); 280 266 281 267 if (c->initialized && cpu_to_node(cpu) == node) { 282 - *id = c->topo.pkg_id; 268 + *id = topology_physical_package_id(cpu); 283 269 return 0; 284 270 } 285 271 } ··· 334 320 */ 335 321 int skx_get_all_bus_mappings(struct res_config *cfg, struct list_head **list) 336 322 { 323 + int ndev = 0, imc_num = cfg->ddr_imc_num + cfg->hbm_imc_num; 337 324 struct pci_dev *pdev, *prev; 338 325 struct skx_dev *d; 339 326 u32 reg; 340 - int ndev = 0; 341 327 342 328 prev = NULL; 343 329 for (;;) { ··· 345 331 if (!pdev) 346 332 break; 347 333 ndev++; 348 - d = kzalloc(sizeof(*d), GFP_KERNEL); 334 + d = kzalloc(struct_size(d, imc, imc_num), GFP_KERNEL); 349 335 if (!d) { 350 336 pci_dev_put(pdev); 351 337 return -ENOMEM; ··· 368 354 d->seg = GET_BITFIELD(reg, 16, 23); 369 355 } 370 356 371 - edac_dbg(2, "busses: 0x%x, 0x%x, 0x%x, 0x%x\n", 372 - d->bus[0], d->bus[1], d->bus[2], d->bus[3]); 357 + d->num_imc = imc_num; 358 + 359 + edac_dbg(2, "busses: 0x%x, 0x%x, 0x%x, 0x%x, imcs %d\n", 360 + d->bus[0], d->bus[1], d->bus[2], d->bus[3], imc_num); 373 361 list_add_tail(&d->list, &dev_edac_list); 374 362 prev = pdev; 375 363 ··· 557 541 558 542 /* Allocate a new MC control structure */ 559 543 layers[0].type = EDAC_MC_LAYER_CHANNEL; 560 - layers[0].size = NUM_CHANNELS; 544 + layers[0].size = imc->num_channels; 561 545 layers[0].is_virt_csrow = false; 562 546 layers[1].type = EDAC_MC_LAYER_SLOT; 563 - layers[1].size = NUM_DIMMS; 547 + layers[1].size = imc->num_dimms; 564 548 layers[1].is_virt_csrow = true; 565 549 mci = edac_mc_alloc(imc->mc, 
ARRAY_SIZE(layers), layers, 566 550 sizeof(struct skx_pvt)); ··· 800 784 801 785 list_for_each_entry_safe(d, tmp, &dev_edac_list, list) { 802 786 list_del(&d->list); 803 - for (i = 0; i < NUM_IMC; i++) { 787 + for (i = 0; i < d->num_imc; i++) { 804 788 if (d->imc[i].mci) 805 789 skx_unregister_mci(&d->imc[i]); 806 790 ··· 810 794 if (d->imc[i].mbase) 811 795 iounmap(d->imc[i].mbase); 812 796 813 - for (j = 0; j < NUM_CHANNELS; j++) { 797 + for (j = 0; j < d->imc[i].num_channels; j++) { 814 798 if (d->imc[i].chan[j].cdev) 815 799 pci_dev_put(d->imc[i].chan[j].cdev); 816 800 }
+12 -16
drivers/edac/skx_common.h
··· 29 29 #define GET_BITFIELD(v, lo, hi) \ 30 30 (((v) & GENMASK_ULL((hi), (lo))) >> (lo)) 31 31 32 - #define SKX_NUM_IMC 2 /* Memory controllers per socket */ 33 32 #define SKX_NUM_CHANNELS 3 /* Channels per memory controller */ 34 33 #define SKX_NUM_DIMMS 2 /* Max DIMMS per channel */ 35 34 36 - #define I10NM_NUM_DDR_IMC 12 37 35 #define I10NM_NUM_DDR_CHANNELS 2 38 36 #define I10NM_NUM_DDR_DIMMS 2 39 37 40 - #define I10NM_NUM_HBM_IMC 16 41 38 #define I10NM_NUM_HBM_CHANNELS 2 42 39 #define I10NM_NUM_HBM_DIMMS 1 43 40 44 - #define I10NM_NUM_IMC (I10NM_NUM_DDR_IMC + I10NM_NUM_HBM_IMC) 45 41 #define I10NM_NUM_CHANNELS MAX(I10NM_NUM_DDR_CHANNELS, I10NM_NUM_HBM_CHANNELS) 46 42 #define I10NM_NUM_DIMMS MAX(I10NM_NUM_DDR_DIMMS, I10NM_NUM_HBM_DIMMS) 47 43 48 - #define NUM_IMC MAX(SKX_NUM_IMC, I10NM_NUM_IMC) 49 44 #define NUM_CHANNELS MAX(SKX_NUM_CHANNELS, I10NM_NUM_CHANNELS) 50 45 #define NUM_DIMMS MAX(SKX_NUM_DIMMS, I10NM_NUM_DIMMS) 51 46 ··· 129 134 struct pci_dev *uracu; /* for i10nm CPU */ 130 135 struct pci_dev *pcu_cr3; /* for HBM memory detection */ 131 136 u32 mcroute; 132 - /* 133 - * Some server BIOS may hide certain memory controllers, and the 134 - * EDAC driver skips those hidden memory controllers. However, the 135 - * ADXL still decodes memory error address using physical memory 136 - * controller indices. The mapping table is used to convert the 137 - * physical indices (reported by ADXL) to the logical indices 138 - * (used the EDAC driver) of present memory controllers during the 139 - * error handling process. 140 - */ 141 - u8 mc_mapping[NUM_IMC]; 137 + int num_imc; 142 138 struct skx_imc { 143 139 struct mem_ctl_info *mci; 144 140 struct pci_dev *mdev; /* for i10nm CPU */ ··· 141 155 u8 mc; /* system wide mc# */ 142 156 u8 lmc; /* socket relative mc# */ 143 157 u8 src_id; 158 + /* 159 + * Some server BIOS may hide certain memory controllers, and the 160 + * EDAC driver skips those hidden memory controllers. 
However, the 161 + * ADXL still decodes memory error addresses using physical memory 162 + * controller indices. The mapping table is used to convert the 163 + * physical indices (reported by ADXL) to the logical indices 164 + * (used by the EDAC driver) of present memory controllers during the 165 + * error handling process. 166 + */ 167 + u8 mc_mapping; 144 168 struct skx_channel { 145 169 struct pci_dev *cdev; 146 170 struct pci_dev *edev; ··· 167 171 u8 colbits; 168 172 } dimms[NUM_DIMMS]; 169 173 } chan[NUM_CHANNELS]; 170 - } imc[NUM_IMC]; 174 + } imc[]; 171 175 }; 172 176 173 177 struct skx_pvt {
+960
drivers/edac/versalnet_edac.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* 3 + * AMD Versal NET memory controller driver 4 + * Copyright (C) 2025 Advanced Micro Devices, Inc. 5 + */ 6 + 7 + #include <linux/cdx/edac_cdx_pcol.h> 8 + #include <linux/edac.h> 9 + #include <linux/module.h> 10 + #include <linux/of_device.h> 11 + #include <linux/ras.h> 12 + #include <linux/remoteproc.h> 13 + #include <linux/rpmsg.h> 14 + #include <linux/sizes.h> 15 + #include <ras/ras_event.h> 16 + 17 + #include "edac_module.h" 18 + 19 + /* Granularity of reported error in bytes */ 20 + #define MC5_ERR_GRAIN 1 21 + #define MC_GET_DDR_CONFIG_IN_LEN 4 22 + 23 + #define MC5_IRQ_CE_MASK GENMASK(18, 15) 24 + #define MC5_IRQ_UE_MASK GENMASK(14, 11) 25 + 26 + #define MC5_RANK_1_MASK GENMASK(11, 6) 27 + #define MASK_24 GENMASK(29, 24) 28 + #define MASK_0 GENMASK(5, 0) 29 + 30 + #define MC5_LRANK_1_MASK GENMASK(11, 6) 31 + #define MC5_LRANK_2_MASK GENMASK(17, 12) 32 + #define MC5_BANK1_MASK GENMASK(11, 6) 33 + #define MC5_GRP_0_MASK GENMASK(17, 12) 34 + #define MC5_GRP_1_MASK GENMASK(23, 18) 35 + 36 + #define MC5_REGHI_ROW 7 37 + #define MC5_EACHBIT 1 38 + #define MC5_ERR_TYPE_CE 0 39 + #define MC5_ERR_TYPE_UE 1 40 + #define MC5_HIGH_MEM_EN BIT(20) 41 + #define MC5_MEM_MASK GENMASK(19, 0) 42 + #define MC5_X16_BASE 256 43 + #define MC5_X16_ECC 32 44 + #define MC5_X16_SIZE (MC5_X16_BASE + MC5_X16_ECC) 45 + #define MC5_X32_SIZE 576 46 + #define MC5_HIMEM_BASE (256 * SZ_1M) 47 + #define MC5_ILC_HIMEM_EN BIT(28) 48 + #define MC5_ILC_MEM GENMASK(27, 0) 49 + #define MC5_INTERLEAVE_SEL GENMASK(3, 0) 50 + #define MC5_BUS_WIDTH_MASK GENMASK(19, 18) 51 + #define MC5_NUM_CHANS_MASK BIT(17) 52 + #define MC5_RANK_MASK GENMASK(15, 14) 53 + 54 + #define ERROR_LEVEL 2 55 + #define ERROR_ID 3 56 + #define TOTAL_ERR_LENGTH 5 57 + #define MSG_ERR_OFFSET 8 58 + #define MSG_ERR_LENGTH 9 59 + #define ERROR_DATA 10 60 + #define MCDI_RESPONSE 0xFF 61 + 62 + #define REG_MAX 152 63 + #define ADEC_MAX 152 64 + #define NUM_CONTROLLERS 8 65 + #define 
REGS_PER_CONTROLLER 19 66 + #define ADEC_NUM 19 67 + #define BUFFER_SZ 80 68 + 69 + #define XDDR5_BUS_WIDTH_64 0 70 + #define XDDR5_BUS_WIDTH_32 1 71 + #define XDDR5_BUS_WIDTH_16 2 72 + 73 + /** 74 + * struct ecc_error_info - ECC error log information. 75 + * @burstpos: Burst position. 76 + * @lrank: Logical Rank number. 77 + * @rank: Rank number. 78 + * @group: Group number. 79 + * @bank: Bank number. 80 + * @col: Column number. 81 + * @row: Row number. 82 + * @rowhi: Row number higher bits. 83 + * @i: Combined ECC error vector containing encoded values of burst position, 84 + * rank, bank, column, and row information. 85 + */ 86 + union ecc_error_info { 87 + struct { 88 + u32 burstpos:3; 89 + u32 lrank:4; 90 + u32 rank:2; 91 + u32 group:3; 92 + u32 bank:2; 93 + u32 col:11; 94 + u32 row:7; 95 + u32 rowhi; 96 + }; 97 + u64 i; 98 + } __packed; 99 + 100 + /* Row and column bit positions in the address decoder (ADEC) registers. */ 101 + union row_col_mapping { 102 + struct { 103 + u32 row0:6; 104 + u32 row1:6; 105 + u32 row2:6; 106 + u32 row3:6; 107 + u32 row4:6; 108 + u32 reserved:2; 109 + }; 110 + struct { 111 + u32 col1:6; 112 + u32 col2:6; 113 + u32 col3:6; 114 + u32 col4:6; 115 + u32 col5:6; 116 + u32 reservedcol:2; 117 + }; 118 + u32 i; 119 + } __packed; 120 + 121 + /** 122 + * struct ecc_status - ECC status information to report. 123 + * @ceinfo: Correctable errors. 124 + * @ueinfo: Uncorrected errors. 125 + * @channel: Channel number. 126 + * @error_type: Error type. 127 + */ 128 + struct ecc_status { 129 + union ecc_error_info ceinfo[2]; 130 + union ecc_error_info ueinfo[2]; 131 + u8 channel; 132 + u8 error_type; 133 + }; 134 + 135 + /** 136 + * struct mc_priv - DDR memory controller private instance data. 137 + * @message: Buffer for framing the event specific info. 138 + * @stat: ECC status information. 139 + * @error_id: The error id. 140 + * @error_level: The error level. 141 + * @dwidth: Width of data bus excluding ECC bits. 
142 + * @part_len: Accumulated length of a message received in parts. 143 + * @regs: The registers sent on the rpmsg. 144 + * @adec: Address decode registers. 145 + * @mci: Memory controller interface. 146 + * @ept: rpmsg endpoint. 147 + * @mcdi: The mcdi handle. 148 + */ 149 + struct mc_priv { 150 + char message[256]; 151 + struct ecc_status stat; 152 + u32 error_id; 153 + u32 error_level; 154 + u32 dwidth; 155 + u32 part_len; 156 + u32 regs[REG_MAX]; 157 + u32 adec[ADEC_MAX]; 158 + struct mem_ctl_info *mci[NUM_CONTROLLERS]; 159 + struct rpmsg_endpoint *ept; 160 + struct cdx_mcdi *mcdi; 161 + }; 162 + 163 + /* 164 + * Address decoder (ADEC) registers to match the order in which the register 165 + * information is received from the firmware. 166 + */ 167 + enum adec_info { 168 + CONF = 0, 169 + ADEC0, 170 + ADEC1, 171 + ADEC2, 172 + ADEC3, 173 + ADEC4, 174 + ADEC5, 175 + ADEC6, 176 + ADEC7, 177 + ADEC8, 178 + ADEC9, 179 + ADEC10, 180 + ADEC11, 181 + ADEC12, 182 + ADEC13, 183 + ADEC14, 184 + ADEC15, 185 + ADEC16, 186 + ADECILC, 187 + }; 188 + 189 + enum reg_info { 190 + ISR = 0, 191 + IMR, 192 + ECCR0_ERR_STATUS, 193 + ECCR0_ADDR_LO, 194 + ECCR0_ADDR_HI, 195 + ECCR0_DATA_LO, 196 + ECCR0_DATA_HI, 197 + ECCR0_PAR, 198 + ECCR1_ERR_STATUS, 199 + ECCR1_ADDR_LO, 200 + ECCR1_ADDR_HI, 201 + ECCR1_DATA_LO, 202 + ECCR1_DATA_HI, 203 + ECCR1_PAR, 204 + XMPU_ERR, 205 + XMPU_ERR_ADDR_L0, 206 + XMPU_ERR_ADDR_HI, 207 + XMPU_ERR_AXI_ID, 208 + ADEC_CHK_ERR_LOG, 209 + }; 210 + 211 + static bool get_ddr_info(u32 *error_data, struct mc_priv *priv) 212 + { 213 + u32 reglo, reghi, parity, eccr0_val, eccr1_val, isr; 214 + struct ecc_status *p; 215 + 216 + isr = error_data[ISR]; 217 + 218 + if (!(isr & (MC5_IRQ_UE_MASK | MC5_IRQ_CE_MASK))) 219 + return false; 220 + 221 + eccr0_val = error_data[ECCR0_ERR_STATUS]; 222 + eccr1_val = error_data[ECCR1_ERR_STATUS]; 223 + 224 + if (!eccr0_val && !eccr1_val) 225 + return false; 226 + 227 + p = &priv->stat; 228 + 229 + if (!eccr0_val) 230 + p->channel = 1; 231 +
else 232 + p->channel = 0; 233 + 234 + reglo = error_data[ECCR0_ADDR_LO]; 235 + reghi = error_data[ECCR0_ADDR_HI]; 236 + if (isr & MC5_IRQ_CE_MASK) 237 + p->ceinfo[0].i = reglo | (u64)reghi << 32; 238 + else if (isr & MC5_IRQ_UE_MASK) 239 + p->ueinfo[0].i = reglo | (u64)reghi << 32; 240 + 241 + parity = error_data[ECCR0_PAR]; 242 + edac_dbg(2, "ERR DATA: 0x%08X%08X PARITY: 0x%08X\n", 243 + reghi, reglo, parity); 244 + 245 + reglo = error_data[ECCR1_ADDR_LO]; 246 + reghi = error_data[ECCR1_ADDR_HI]; 247 + if (isr & MC5_IRQ_CE_MASK) 248 + p->ceinfo[1].i = reglo | (u64)reghi << 32; 249 + else if (isr & MC5_IRQ_UE_MASK) 250 + p->ueinfo[1].i = reglo | (u64)reghi << 32; 251 + 252 + parity = error_data[ECCR1_PAR]; 253 + edac_dbg(2, "ERR DATA: 0x%08X%08X PARITY: 0x%08X\n", 254 + reghi, reglo, parity); 255 + 256 + return true; 257 + } 258 + 259 + /** 260 + * convert_to_physical - Convert @error_data to a physical address. 261 + * @priv: DDR memory controller private instance data. 262 + * @pinf: ECC error info structure. 263 + * @controller: Controller number of the MC5 264 + * @error_data: the DDRMC5 ADEC address decoder register data 265 + * 266 + * Return: physical address of the DDR memory. 
267 + */ 268 + static unsigned long convert_to_physical(struct mc_priv *priv, 269 + union ecc_error_info pinf, 270 + int controller, int *error_data) 271 + { 272 + u32 row, blk, rsh_req_addr, interleave, ilc_base_ctrl_add, ilc_himem_en, reg, offset; 273 + u64 high_mem_base, high_mem_offset, low_mem_offset, ilcmem_base; 274 + unsigned long err_addr = 0, addr; 275 + union row_col_mapping cols; 276 + union row_col_mapping rows; 277 + u32 col_bit_0; 278 + 279 + row = pinf.rowhi << MC5_REGHI_ROW | pinf.row; 280 + offset = controller * ADEC_NUM; 281 + 282 + reg = error_data[ADEC6]; 283 + rows.i = reg; 284 + err_addr |= (row & BIT(0)) << rows.row0; 285 + row >>= MC5_EACHBIT; 286 + err_addr |= (row & BIT(0)) << rows.row1; 287 + row >>= MC5_EACHBIT; 288 + err_addr |= (row & BIT(0)) << rows.row2; 289 + row >>= MC5_EACHBIT; 290 + err_addr |= (row & BIT(0)) << rows.row3; 291 + row >>= MC5_EACHBIT; 292 + err_addr |= (row & BIT(0)) << rows.row4; 293 + row >>= MC5_EACHBIT; 294 + 295 + reg = error_data[ADEC7]; 296 + rows.i = reg; 297 + err_addr |= (row & BIT(0)) << rows.row0; 298 + row >>= MC5_EACHBIT; 299 + err_addr |= (row & BIT(0)) << rows.row1; 300 + row >>= MC5_EACHBIT; 301 + err_addr |= (row & BIT(0)) << rows.row2; 302 + row >>= MC5_EACHBIT; 303 + err_addr |= (row & BIT(0)) << rows.row3; 304 + row >>= MC5_EACHBIT; 305 + err_addr |= (row & BIT(0)) << rows.row4; 306 + row >>= MC5_EACHBIT; 307 + 308 + reg = error_data[ADEC8]; 309 + rows.i = reg; 310 + err_addr |= (row & BIT(0)) << rows.row0; 311 + row >>= MC5_EACHBIT; 312 + err_addr |= (row & BIT(0)) << rows.row1; 313 + row >>= MC5_EACHBIT; 314 + err_addr |= (row & BIT(0)) << rows.row2; 315 + row >>= MC5_EACHBIT; 316 + err_addr |= (row & BIT(0)) << rows.row3; 317 + row >>= MC5_EACHBIT; 318 + err_addr |= (row & BIT(0)) << rows.row4; 319 + 320 + reg = error_data[ADEC9]; 321 + rows.i = reg; 322 + 323 + err_addr |= (row & BIT(0)) << rows.row0; 324 + row >>= MC5_EACHBIT; 325 + err_addr |= (row & BIT(0)) << rows.row1; 326 + row >>= 
MC5_EACHBIT; 327 + err_addr |= (row & BIT(0)) << rows.row2; 328 + row >>= MC5_EACHBIT; 329 + 330 + col_bit_0 = FIELD_GET(MASK_24, error_data[ADEC9]); 331 + pinf.col >>= 1; 332 + err_addr |= (pinf.col & 1) << col_bit_0; 333 + 334 + cols.i = error_data[ADEC10]; 335 + err_addr |= (pinf.col & 1) << cols.col1; 336 + pinf.col >>= 1; 337 + err_addr |= (pinf.col & 1) << cols.col2; 338 + pinf.col >>= 1; 339 + err_addr |= (pinf.col & 1) << cols.col3; 340 + pinf.col >>= 1; 341 + err_addr |= (pinf.col & 1) << cols.col4; 342 + pinf.col >>= 1; 343 + err_addr |= (pinf.col & 1) << cols.col5; 344 + pinf.col >>= 1; 345 + 346 + cols.i = error_data[ADEC11]; 347 + err_addr |= (pinf.col & 1) << cols.col1; 348 + pinf.col >>= 1; 349 + err_addr |= (pinf.col & 1) << cols.col2; 350 + pinf.col >>= 1; 351 + err_addr |= (pinf.col & 1) << cols.col3; 352 + pinf.col >>= 1; 353 + err_addr |= (pinf.col & 1) << cols.col4; 354 + pinf.col >>= 1; 355 + err_addr |= (pinf.col & 1) << cols.col5; 356 + pinf.col >>= 1; 357 + 358 + reg = error_data[ADEC12]; 359 + err_addr |= (pinf.bank & BIT(0)) << (reg & MASK_0); 360 + pinf.bank >>= MC5_EACHBIT; 361 + err_addr |= (pinf.bank & BIT(0)) << FIELD_GET(MC5_BANK1_MASK, reg); 362 + pinf.bank >>= MC5_EACHBIT; 363 + 364 + err_addr |= (pinf.bank & BIT(0)) << FIELD_GET(MC5_GRP_0_MASK, reg); 365 + pinf.group >>= MC5_EACHBIT; 366 + err_addr |= (pinf.bank & BIT(0)) << FIELD_GET(MC5_GRP_1_MASK, reg); 367 + pinf.group >>= MC5_EACHBIT; 368 + err_addr |= (pinf.bank & BIT(0)) << FIELD_GET(MASK_24, reg); 369 + pinf.group >>= MC5_EACHBIT; 370 + 371 + reg = error_data[ADEC4]; 372 + err_addr |= (pinf.rank & BIT(0)) << (reg & MASK_0); 373 + pinf.rank >>= MC5_EACHBIT; 374 + err_addr |= (pinf.rank & BIT(0)) << FIELD_GET(MC5_RANK_1_MASK, reg); 375 + pinf.rank >>= MC5_EACHBIT; 376 + 377 + reg = error_data[ADEC5]; 378 + err_addr |= (pinf.lrank & BIT(0)) << (reg & MASK_0); 379 + pinf.lrank >>= MC5_EACHBIT; 380 + err_addr |= (pinf.lrank & BIT(0)) << FIELD_GET(MC5_LRANK_1_MASK, reg); 381 + 
pinf.lrank >>= MC5_EACHBIT; 382 + err_addr |= (pinf.lrank & BIT(0)) << FIELD_GET(MC5_LRANK_2_MASK, reg); 383 + pinf.lrank >>= MC5_EACHBIT; 384 + err_addr |= (pinf.lrank & BIT(0)) << FIELD_GET(MASK_24, reg); 385 + pinf.lrank >>= MC5_EACHBIT; 386 + 387 + high_mem_base = (priv->adec[ADEC2 + offset] & MC5_MEM_MASK) * MC5_HIMEM_BASE; 388 + interleave = priv->adec[ADEC13 + offset] & MC5_INTERLEAVE_SEL; 389 + 390 + high_mem_offset = priv->adec[ADEC3 + offset] & MC5_MEM_MASK; 391 + low_mem_offset = priv->adec[ADEC1 + offset] & MC5_MEM_MASK; 392 + reg = priv->adec[ADEC14 + offset]; 393 + ilc_himem_en = !!(reg & MC5_ILC_HIMEM_EN); 394 + ilcmem_base = (reg & MC5_ILC_MEM) * SZ_1M; 395 + if (ilc_himem_en) 396 + ilc_base_ctrl_add = ilcmem_base - high_mem_offset; 397 + else 398 + ilc_base_ctrl_add = ilcmem_base - low_mem_offset; 399 + 400 + if (priv->dwidth == DEV_X16) { 401 + blk = err_addr / MC5_X16_SIZE; 402 + rsh_req_addr = (blk << 8) + ilc_base_ctrl_add; 403 + err_addr = rsh_req_addr * interleave * 2; 404 + } else { 405 + blk = err_addr / MC5_X32_SIZE; 406 + rsh_req_addr = (blk << 9) + ilc_base_ctrl_add; 407 + err_addr = rsh_req_addr * interleave * 2; 408 + } 409 + 410 + if ((priv->adec[ADEC2 + offset] & MC5_HIGH_MEM_EN) && err_addr >= high_mem_base) 411 + addr = err_addr - high_mem_offset; 412 + else 413 + addr = err_addr - low_mem_offset; 414 + 415 + return addr; 416 + } 417 + 418 + /** 419 + * handle_error - Handle errors. 420 + * @priv: DDR memory controller private instance data. 421 + * @stat: ECC status structure. 422 + * @ctl_num: Controller number of the MC5 423 + * @error_data: the MC5 ADEC address decoder register data 424 + * 425 + * Handles ECC correctable and uncorrectable errors. 
426 + */ 427 + static void handle_error(struct mc_priv *priv, struct ecc_status *stat, 428 + int ctl_num, int *error_data) 429 + { 430 + union ecc_error_info pinf; 431 + struct mem_ctl_info *mci; 432 + unsigned long pa; 433 + phys_addr_t pfn; 434 + int err; 435 + 436 + if (WARN_ON_ONCE(ctl_num >= NUM_CONTROLLERS)) 437 + return; 438 + 439 + mci = priv->mci[ctl_num]; 440 + 441 + if (stat->error_type == MC5_ERR_TYPE_CE) { 442 + pinf = stat->ceinfo[stat->channel]; 443 + snprintf(priv->message, sizeof(priv->message), 444 + "Error type:%s Controller %d Addr at %lx\n", 445 + "CE", ctl_num, convert_to_physical(priv, pinf, ctl_num, error_data)); 446 + 447 + edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci, 448 + 1, 0, 0, 0, 0, 0, -1, 449 + priv->message, ""); 450 + } 451 + 452 + if (stat->error_type == MC5_ERR_TYPE_UE) { 453 + pinf = stat->ueinfo[stat->channel]; 454 + snprintf(priv->message, sizeof(priv->message), 455 + "Error type:%s controller %d Addr at %lx\n", 456 + "UE", ctl_num, convert_to_physical(priv, pinf, ctl_num, error_data)); 457 + 458 + edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED, mci, 459 + 1, 0, 0, 0, 0, 0, -1, 460 + priv->message, ""); 461 + pa = convert_to_physical(priv, pinf, ctl_num, error_data); 462 + pfn = PHYS_PFN(pa); 463 + 464 + if (IS_ENABLED(CONFIG_MEMORY_FAILURE)) { 465 + err = memory_failure(pfn, MF_ACTION_REQUIRED); 466 + if (err) 467 + edac_dbg(2, "memory_failure() error: %d", err); 468 + else 469 + edac_dbg(2, "Poison page at PA 0x%lx\n", pa); 470 + } 471 + } 472 + } 473 + 474 + static void mc_init(struct mem_ctl_info *mci, struct device *dev) 475 + { 476 + struct mc_priv *priv = mci->pvt_info; 477 + struct csrow_info *csi; 478 + struct dimm_info *dimm; 479 + u32 row; 480 + int ch; 481 + 482 + /* Initialize controller capabilities and configuration */ 483 + mci->mtype_cap = MEM_FLAG_DDR5; 484 + mci->edac_ctl_cap = EDAC_FLAG_NONE | EDAC_FLAG_SECDED; 485 + mci->scrub_cap = SCRUB_HW_SRC; 486 + mci->scrub_mode = SCRUB_NONE; 487 + 488 +
mci->edac_cap = EDAC_FLAG_SECDED; 489 + mci->ctl_name = "VersalNET DDR5"; 490 + mci->dev_name = dev_name(dev); 491 + mci->mod_name = "versalnet_edac"; 492 + 493 + edac_op_state = EDAC_OPSTATE_INT; 494 + 495 + for (row = 0; row < mci->nr_csrows; row++) { 496 + csi = mci->csrows[row]; 497 + for (ch = 0; ch < csi->nr_channels; ch++) { 498 + dimm = csi->channels[ch]->dimm; 499 + dimm->edac_mode = EDAC_SECDED; 500 + dimm->mtype = MEM_DDR5; 501 + dimm->grain = MC5_ERR_GRAIN; 502 + dimm->dtype = priv->dwidth; 503 + } 504 + } 505 + } 506 + 507 + #define to_mci(k) container_of(k, struct mem_ctl_info, dev) 508 + 509 + static unsigned int mcdi_rpc_timeout(struct cdx_mcdi *cdx, unsigned int cmd) 510 + { 511 + return MCDI_RPC_TIMEOUT; 512 + } 513 + 514 + static void mcdi_request(struct cdx_mcdi *cdx, 515 + const struct cdx_dword *hdr, size_t hdr_len, 516 + const struct cdx_dword *sdu, size_t sdu_len) 517 + { 518 + void *send_buf; 519 + int ret; 520 + 521 + send_buf = kzalloc(hdr_len + sdu_len, GFP_KERNEL); 522 + if (!send_buf) 523 + return; 524 + 525 + memcpy(send_buf, hdr, hdr_len); 526 + memcpy(send_buf + hdr_len, sdu, sdu_len); 527 + 528 + ret = rpmsg_send(cdx->ept, send_buf, hdr_len + sdu_len); 529 + if (ret) 530 + dev_err(&cdx->rpdev->dev, "Failed to send rpmsg data: %d\n", ret); 531 + 532 + kfree(send_buf); 533 + } 534 + 535 + static const struct cdx_mcdi_ops mcdi_ops = { 536 + .mcdi_rpc_timeout = mcdi_rpc_timeout, 537 + .mcdi_request = mcdi_request, 538 + }; 539 + 540 + static void get_ddr_config(u32 index, u32 *buffer, struct cdx_mcdi *amd_mcdi) 541 + { 542 + size_t outlen; 543 + int ret; 544 + 545 + MCDI_DECLARE_BUF(inbuf, MC_GET_DDR_CONFIG_IN_LEN); 546 + MCDI_DECLARE_BUF(outbuf, BUFFER_SZ); 547 + 548 + MCDI_SET_DWORD(inbuf, EDAC_GET_DDR_CONFIG_IN_CONTROLLER_INDEX, index); 549 + 550 + ret = cdx_mcdi_rpc(amd_mcdi, MC_CMD_EDAC_GET_DDR_CONFIG, inbuf, sizeof(inbuf), 551 + outbuf, sizeof(outbuf), &outlen); 552 + if (!ret) 553 + memcpy(buffer, MCDI_PTR(outbuf, 
GET_DDR_CONFIG), 554 + (ADEC_NUM * 4)); 555 + } 556 + 557 + static int setup_mcdi(struct mc_priv *mc_priv) 558 + { 559 + struct cdx_mcdi *amd_mcdi; 560 + int ret, i; 561 + 562 + amd_mcdi = kzalloc(sizeof(*amd_mcdi), GFP_KERNEL); 563 + if (!amd_mcdi) 564 + return -ENOMEM; 565 + 566 + amd_mcdi->mcdi_ops = &mcdi_ops; 567 + ret = cdx_mcdi_init(amd_mcdi); 568 + if (ret) { 569 + kfree(amd_mcdi); 570 + return ret; 571 + } 572 + 573 + amd_mcdi->ept = mc_priv->ept; 574 + mc_priv->mcdi = amd_mcdi; 575 + 576 + for (i = 0; i < NUM_CONTROLLERS; i++) 577 + get_ddr_config(i, &mc_priv->adec[ADEC_NUM * i], amd_mcdi); 578 + 579 + return 0; 580 + } 581 + 582 + static const guid_t amd_versalnet_guid = GUID_INIT(0x82678888, 0xa556, 0x44f2, 583 + 0xb8, 0xb4, 0x45, 0x56, 0x2e, 584 + 0x8c, 0x5b, 0xec); 585 + 586 + static int rpmsg_cb(struct rpmsg_device *rpdev, void *data, 587 + int len, void *priv, u32 src) 588 + { 589 + struct mc_priv *mc_priv = dev_get_drvdata(&rpdev->dev); 590 + const guid_t *sec_type = &guid_null; 591 + u32 length, offset, error_id; 592 + u32 *result = (u32 *)data; 593 + struct ecc_status *p; 594 + int i, j, k, sec_sev; 595 + const char *err_str; 596 + u32 *adec_data; 597 + 598 + if (*(u8 *)data == MCDI_RESPONSE) { 599 + cdx_mcdi_process_cmd(mc_priv->mcdi, (struct cdx_dword *)data, len); 600 + return 0; 601 + } 602 + 603 + sec_sev = result[ERROR_LEVEL]; 604 + error_id = result[ERROR_ID]; 605 + length = result[MSG_ERR_LENGTH]; 606 + offset = result[MSG_ERR_OFFSET]; 607 + 608 + if (result[TOTAL_ERR_LENGTH] > length) { 609 + if (!mc_priv->part_len) 610 + mc_priv->part_len = length; 611 + else 612 + mc_priv->part_len += length; 613 + /* 614 + * The data can come in 2 stretches. 
Construct the regs from 2 615 + * messages the offset indicates the offset from which the data is to 616 + * be taken 617 + */ 618 + for (i = 0 ; i < length; i++) { 619 + k = offset + i; 620 + j = ERROR_DATA + i; 621 + mc_priv->regs[k] = result[j]; 622 + } 623 + if (mc_priv->part_len < result[TOTAL_ERR_LENGTH]) 624 + return 0; 625 + mc_priv->part_len = 0; 626 + } 627 + 628 + mc_priv->error_id = error_id; 629 + mc_priv->error_level = result[ERROR_LEVEL]; 630 + 631 + switch (error_id) { 632 + case 5: err_str = "General Software Non-Correctable error"; break; 633 + case 6: err_str = "CFU error"; break; 634 + case 7: err_str = "CFRAME error"; break; 635 + case 10: err_str = "DDRMC Microblaze Correctable ECC error"; break; 636 + case 11: err_str = "DDRMC Microblaze Non-Correctable ECC error"; break; 637 + case 15: err_str = "MMCM error"; break; 638 + case 16: err_str = "HNICX Correctable error"; break; 639 + case 17: err_str = "HNICX Non-Correctable error"; break; 640 + 641 + case 18: 642 + p = &mc_priv->stat; 643 + memset(p, 0, sizeof(struct ecc_status)); 644 + p->error_type = MC5_ERR_TYPE_CE; 645 + for (i = 0 ; i < NUM_CONTROLLERS; i++) { 646 + if (get_ddr_info(&mc_priv->regs[i * REGS_PER_CONTROLLER], mc_priv)) { 647 + adec_data = mc_priv->adec + ADEC_NUM * i; 648 + handle_error(mc_priv, &mc_priv->stat, i, adec_data); 649 + } 650 + } 651 + return 0; 652 + case 19: 653 + p = &mc_priv->stat; 654 + memset(p, 0, sizeof(struct ecc_status)); 655 + p->error_type = MC5_ERR_TYPE_UE; 656 + for (i = 0 ; i < NUM_CONTROLLERS; i++) { 657 + if (get_ddr_info(&mc_priv->regs[i * REGS_PER_CONTROLLER], mc_priv)) { 658 + adec_data = mc_priv->adec + ADEC_NUM * i; 659 + handle_error(mc_priv, &mc_priv->stat, i, adec_data); 660 + } 661 + } 662 + return 0; 663 + 664 + case 21: err_str = "GT Non-Correctable error"; break; 665 + case 22: err_str = "PL Sysmon Correctable error"; break; 666 + case 23: err_str = "PL Sysmon Non-Correctable error"; break; 667 + case 111: err_str = "LPX unexpected dfx 
activation error"; break; 668 + case 114: err_str = "INT_LPD Non-Correctable error"; break; 669 + case 116: err_str = "INT_OCM Non-Correctable error"; break; 670 + case 117: err_str = "INT_FPD Correctable error"; break; 671 + case 118: err_str = "INT_FPD Non-Correctable error"; break; 672 + case 120: err_str = "INT_IOU Non-Correctable error"; break; 673 + case 123: err_str = "err_int_irq from APU GIC Distributor"; break; 674 + case 124: err_str = "fault_int_irq from APU GIC Distribute"; break; 675 + case 132 ... 139: err_str = "FPX SPLITTER error"; break; 676 + case 140: err_str = "APU Cluster 0 error"; break; 677 + case 141: err_str = "APU Cluster 1 error"; break; 678 + case 142: err_str = "APU Cluster 2 error"; break; 679 + case 143: err_str = "APU Cluster 3 error"; break; 680 + case 145: err_str = "WWDT1 LPX error"; break; 681 + case 147: err_str = "IPI error"; break; 682 + case 152 ... 153: err_str = "AFIFS error"; break; 683 + case 154 ... 155: err_str = "LPX glitch error"; break; 684 + case 185 ... 186: err_str = "FPX AFIFS error"; break; 685 + case 195 ... 199: err_str = "AFIFM error"; break; 686 + case 108: err_str = "PSM Correctable error"; break; 687 + case 59: err_str = "PMC correctable error"; break; 688 + case 60: err_str = "PMC Un correctable error"; break; 689 + case 43 ... 47: err_str = "PMC Sysmon error"; break; 690 + case 163 ... 
	case 184: err_str = "RPU error"; break;
	case 148: err_str = "OCM0 correctable error"; break;
	case 149: err_str = "OCM1 correctable error"; break;
	case 150: err_str = "OCM0 Un-correctable error"; break;
	case 151: err_str = "OCM1 Un-correctable error"; break;
	case 189: err_str = "PSX_CMN_3 PD block consolidated error"; break;
	case 191: err_str = "FPD_INT_WRAP PD block consolidated error"; break;
	case 232: err_str = "CRAM Un-Correctable error"; break;
	default: err_str = "VERSAL_EDAC_ERR_ID: %d"; break;
	}

	snprintf(mc_priv->message,
		 sizeof(mc_priv->message),
		 "[VERSAL_EDAC_ERR_ID: %d] Error type: %s", error_id, err_str);

	/* Convert to bytes */
	length = result[TOTAL_ERR_LENGTH] * 4;
	log_non_standard_event(sec_type, &amd_versalnet_guid, mc_priv->message,
			       sec_sev, (void *)&result[ERROR_DATA], length);

	return 0;
}

static struct rpmsg_device_id amd_rpmsg_id_table[] = {
	{ .name = "error_ipc" },
	{ },
};
MODULE_DEVICE_TABLE(rpmsg, amd_rpmsg_id_table);

static int rpmsg_probe(struct rpmsg_device *rpdev)
{
	struct rpmsg_channel_info chinfo;
	struct mc_priv *pg;

	pg = (struct mc_priv *)amd_rpmsg_id_table[0].driver_data;
	chinfo.src = RPMSG_ADDR_ANY;
	chinfo.dst = rpdev->dst;
	strscpy(chinfo.name, amd_rpmsg_id_table[0].name,
		strlen(amd_rpmsg_id_table[0].name));

	pg->ept = rpmsg_create_ept(rpdev, rpmsg_cb, NULL, chinfo);
	if (!pg->ept)
		return dev_err_probe(&rpdev->dev, -ENXIO, "Failed to create ept for channel %s\n",
				     chinfo.name);

	dev_set_drvdata(&rpdev->dev, pg);

	return 0;
}

static void rpmsg_remove(struct rpmsg_device *rpdev)
{
	struct mc_priv *mc_priv = dev_get_drvdata(&rpdev->dev);

	rpmsg_destroy_ept(mc_priv->ept);
	dev_set_drvdata(&rpdev->dev, NULL);
}

static struct rpmsg_driver amd_rpmsg_driver = {
	.drv.name = KBUILD_MODNAME,
	.probe = rpmsg_probe,
	.remove = rpmsg_remove,
	.callback = rpmsg_cb,
	.id_table = amd_rpmsg_id_table,
};

static void versal_edac_release(struct device *dev)
{
	kfree(dev);
}

static int init_versalnet(struct mc_priv *priv, struct platform_device *pdev)
{
	u32 num_chans, rank, dwidth, config;
	struct edac_mc_layer layers[2];
	struct mem_ctl_info *mci;
	struct device *dev;
	enum dev_type dt;
	char *name;
	int rc, i;

	for (i = 0; i < NUM_CONTROLLERS; i++) {
		config = priv->adec[CONF + i * ADEC_NUM];
		num_chans = FIELD_GET(MC5_NUM_CHANS_MASK, config);
		rank = 1 << FIELD_GET(MC5_RANK_MASK, config);
		dwidth = FIELD_GET(MC5_BUS_WIDTH_MASK, config);

		switch (dwidth) {
		case XDDR5_BUS_WIDTH_16:
			dt = DEV_X16;
			break;
		case XDDR5_BUS_WIDTH_32:
			dt = DEV_X32;
			break;
		case XDDR5_BUS_WIDTH_64:
			dt = DEV_X64;
			break;
		default:
			dt = DEV_UNKNOWN;
		}

		if (dt == DEV_UNKNOWN)
			continue;

		/* Find the first enabled device and register that one. */
		layers[0].type = EDAC_MC_LAYER_CHIP_SELECT;
		layers[0].size = rank;
		layers[0].is_virt_csrow = true;
		layers[1].type = EDAC_MC_LAYER_CHANNEL;
		layers[1].size = num_chans;
		layers[1].is_virt_csrow = false;

		rc = -ENOMEM;
		mci = edac_mc_alloc(i, ARRAY_SIZE(layers), layers,
				    sizeof(struct mc_priv));
		if (!mci) {
			edac_printk(KERN_ERR, EDAC_MC, "Failed memory allocation for MC%d\n", i);
			goto err_alloc;
		}

		priv->mci[i] = mci;
		priv->dwidth = dt;

		dev = kzalloc(sizeof(*dev), GFP_KERNEL);
		dev->release = versal_edac_release;
		name = kmalloc(32, GFP_KERNEL);
		sprintf(name, "versal-net-ddrmc5-edac-%d", i);
		dev->init_name = name;
		rc = device_register(dev);
		if (rc)
			goto err_alloc;

		mci->pdev = dev;

		platform_set_drvdata(pdev, priv);

		mc_init(mci, dev);
		rc = edac_mc_add_mc(mci);
		if (rc) {
			edac_printk(KERN_ERR, EDAC_MC, "Failed to register MC%d with EDAC core\n", i);
			goto err_alloc;
		}
	}
	return 0;

err_alloc:
	while (i--) {
		mci = priv->mci[i];
		if (!mci)
			continue;

		if (mci->pdev) {
			device_unregister(mci->pdev);
			edac_mc_del_mc(mci->pdev);
		}

		edac_mc_free(mci);
	}

	return rc;
}

static void remove_versalnet(struct mc_priv *priv)
{
	struct mem_ctl_info *mci;
	int i;

	for (i = 0; i < NUM_CONTROLLERS; i++) {
		device_unregister(priv->mci[i]->pdev);
		mci = edac_mc_del_mc(priv->mci[i]->pdev);
		if (!mci)
			return;

		edac_mc_free(mci);
	}
}

static int mc_probe(struct platform_device *pdev)
{
	struct device_node *r5_core_node;
	struct mc_priv *priv;
	struct rproc *rp;
	int rc;

	r5_core_node = of_parse_phandle(pdev->dev.of_node, "amd,rproc", 0);
	if (!r5_core_node) {
		dev_err(&pdev->dev, "amd,rproc: invalid phandle\n");
		return -EINVAL;
	}

	rp = rproc_get_by_phandle(r5_core_node->phandle);
	if (!rp)
		return -EPROBE_DEFER;

	rc = rproc_boot(rp);
	if (rc) {
		dev_err(&pdev->dev, "Failed to attach to remote processor\n");
		goto err_rproc_boot;
	}

	priv = devm_kzalloc(&pdev->dev, sizeof(*priv), GFP_KERNEL);
	if (!priv) {
		rc = -ENOMEM;
		goto err_alloc;
	}

	amd_rpmsg_id_table[0].driver_data = (kernel_ulong_t)priv;

	rc = register_rpmsg_driver(&amd_rpmsg_driver);
	if (rc) {
		edac_printk(KERN_ERR, EDAC_MC, "Failed to register RPMsg driver: %d\n", rc);
		goto err_alloc;
	}

	rc = setup_mcdi(priv);
	if (rc)
		goto err_unreg;

	priv->mcdi->r5_rproc = rp;

	rc = init_versalnet(priv, pdev);
	if (rc)
		goto err_init;

	return 0;

err_init:
	cdx_mcdi_finish(priv->mcdi);

err_unreg:
	unregister_rpmsg_driver(&amd_rpmsg_driver);

err_alloc:
	rproc_shutdown(rp);

err_rproc_boot:
	rproc_put(rp);

	return rc;
}

static void mc_remove(struct platform_device *pdev)
{
	struct mc_priv *priv = platform_get_drvdata(pdev);

	unregister_rpmsg_driver(&amd_rpmsg_driver);
	remove_versalnet(priv);
	rproc_shutdown(priv->mcdi->r5_rproc);
	cdx_mcdi_finish(priv->mcdi);
}

static const struct of_device_id amd_edac_match[] = {
	{ .compatible = "xlnx,versal-net-ddrmc5", },
	{}
};
MODULE_DEVICE_TABLE(of, amd_edac_match);

static struct platform_driver amd_ddr_edac_mc_driver = {
	.driver = {
		.name = "versal-net-edac",
		.of_match_table = amd_edac_match,
	},
	.probe = mc_probe,
	.remove = mc_remove,
};

module_platform_driver(amd_ddr_edac_mc_driver);

MODULE_AUTHOR("AMD Inc");
MODULE_DESCRIPTION("Versal NET EDAC driver");
MODULE_LICENSE("GPL");
drivers/ras/ras.c (+1)
···
 {
 	trace_non_standard_event(sec_type, fru_id, fru_text, sev, err, len);
 }
+EXPORT_SYMBOL_GPL(log_non_standard_event);

 void log_arm_hw_error(struct cper_sec_proc_arm *err)
 {
include/linux/cdx/edac_cdx_pcol.h (+28)
/* SPDX-License-Identifier: GPL-2.0
 *
 * Driver for AMD network controllers and boards
 *
 * Copyright (C) 2021, Xilinx, Inc.
 * Copyright (C) 2022-2023, Advanced Micro Devices, Inc.
 */

#ifndef MC_CDX_PCOL_H
#define MC_CDX_PCOL_H
#include <linux/cdx/mcdi.h>

#define MC_CMD_EDAC_GET_DDR_CONFIG_OUT_WORD_LENGTH_LEN 4
/* Number of registers for the DDR controller */
#define MC_CMD_GET_DDR_CONFIG_OFST 4
#define MC_CMD_GET_DDR_CONFIG_LEN 4

/***********************************/
/* MC_CMD_EDAC_GET_DDR_CONFIG
 * Provides detailed configuration for the DDR controller of the given index.
 */
#define MC_CMD_EDAC_GET_DDR_CONFIG 0x3

/* MC_CMD_EDAC_GET_DDR_CONFIG_IN msgrequest */
#define MC_CMD_EDAC_GET_DDR_CONFIG_IN_CONTROLLER_INDEX_OFST 0
#define MC_CMD_EDAC_GET_DDR_CONFIG_IN_CONTROLLER_INDEX_LEN 4

#endif /* MC_CDX_PCOL_H */