Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

powerpc/powernv: Show checkstop reason for NPU2 HMIs

If the kernel is notified of an HMI caused by the NPU2, it's currently
not being recognized and it logs the default message:

Unknown Malfunction Alert of type 3

The NPU on Power 9 has 3 Fault Isolation Registers, so that's a lot of
possible causes, but we should at least log that it's an NPU problem
and report which FIR and which bit were raised if opal gave us the
information.

Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

Authored by Frederic Barrat and committed by Michael Ellerman
89d87bcb 1549c42d

2 files changed, +41

arch/powerpc/include/asm/opal-api.h (+1)
···
 	CHECKSTOP_TYPE_UNKNOWN	= 0,
 	CHECKSTOP_TYPE_CORE	= 1,
 	CHECKSTOP_TYPE_NX	= 2,
+	CHECKSTOP_TYPE_NPU	= 3
 };

 enum OpalHMI_CoreXstopReason {

arch/powerpc/platforms/powernv/opal-hmi.c (+40)
···
 			xstop_reason[i].description);
 }

+static void print_npu_checkstop_reason(const char *level,
+					struct OpalHMIEvent *hmi_evt)
+{
+	uint8_t reason, reason_count, i;
+
+	/*
+	 * We may not have a checkstop reason on some combination of
+	 * hardware and/or skiboot version
+	 */
+	if (!hmi_evt->u.xstop_error.xstop_reason) {
+		printk("%s NPU checkstop on chip %x\n", level,
+			be32_to_cpu(hmi_evt->u.xstop_error.u.chip_id));
+		return;
+	}
+
+	/*
+	 * NPU2 has 3 FIRs. Reason encoded on a byte as:
+	 *   2 bits for the FIR number
+	 *   6 bits for the bit number
+	 * It may be possible to find several reasons.
+	 *
+	 * We don't display a specific message per FIR bit as there
+	 * are too many and most are meaningless without the workbook
+	 * and/or hw team help anyway.
+	 */
+	reason_count = sizeof(hmi_evt->u.xstop_error.xstop_reason) /
+		sizeof(reason);
+	for (i = 0; i < reason_count; i++) {
+		reason = (hmi_evt->u.xstop_error.xstop_reason >> (8 * i)) & 0xFF;
+		if (reason)
+			printk("%s NPU checkstop on chip %x: FIR%d bit %d is set\n",
+				level,
+				be32_to_cpu(hmi_evt->u.xstop_error.u.chip_id),
+				reason >> 6, reason & 0x3F);
+	}
+}
+
 static void print_checkstop_reason(const char *level,
 					struct OpalHMIEvent *hmi_evt)
 {
···
 		break;
 	case CHECKSTOP_TYPE_NX:
 		print_nx_checkstop_reason(level, hmi_evt);
+		break;
+	case CHECKSTOP_TYPE_NPU:
+		print_npu_checkstop_reason(level, hmi_evt);
 		break;
 	default:
 		printk("%s Unknown Malfunction Alert of type %d\n",
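
The reason encoding described in the patch comment (2 bits for the FIR number, 6 bits for the bit number, one reason per byte) can be tried outside the kernel with a small userspace sketch. This is an illustration only, not the kernel code: `struct npu_reason`, `decode_npu_reasons()` and `print_npu_reasons()` are hypothetical names, and the sketch assumes the 32-bit reason word has already been converted to host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper type (not a kernel structure): one decoded reason. */
struct npu_reason {
	uint8_t fir;	/* FIR number: top 2 bits of the reason byte */
	uint8_t bit;	/* bit number within that FIR: low 6 bits */
};

/*
 * Decode the reason bytes packed in a 32-bit word, lowest byte first.
 * Assumption: xstop_reason is already in host byte order.
 * Zero bytes are skipped, as in the patch. Returns the number of
 * entries written to out (at most 4).
 */
static size_t decode_npu_reasons(uint32_t xstop_reason,
				 struct npu_reason out[4])
{
	size_t i, n = 0;

	assert(out != NULL);
	for (i = 0; i < sizeof(xstop_reason); i++) {
		uint8_t reason = (xstop_reason >> (8 * i)) & 0xFF;

		if (!reason)
			continue;
		out[n].fir = reason >> 6;
		out[n].bit = reason & 0x3F;
		n++;
	}
	return n;
}

/* Print one line per decoded reason, in the format used by the patch. */
static void print_npu_reasons(uint32_t chip_id, uint32_t xstop_reason)
{
	struct npu_reason r[4];
	size_t i, n = decode_npu_reasons(xstop_reason, r);

	for (i = 0; i < n; i++)
		printf("NPU checkstop on chip %x: FIR%d bit %d is set\n",
		       (unsigned int)chip_id, r[i].fir, r[i].bit);
}
```

For example, the reason byte 0x85 is 0b10000101, so it decodes to FIR2 bit 5, and 0x41 (0b01000001) decodes to FIR1 bit 1. Calling `print_npu_reasons(0x10, 0x00004185)` would therefore print one line for each of those two reasons.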