Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

MIPS: DEC: Rate-limit memory errors for ECC systems

Prevent the system from becoming unusable due to a flood of memory error
messages on DECstation and DECsystem models using ECC, that is, KN02,
KN03 and KN05 systems. It seems common for gradual oxidation of memory
module contacts to cause memory errors to develop over time. ECC
corrects them, so the affected system can continue operating normally
until the contacts have been cleaned, but the unlimited messages make
the system spend all its time producing them, which prevents it from
being used.

Rate-limiting removes the load from the system and enables its normal
operation, e.g.:

Bus error interrupt: CPU memory read ECC error at 0x139cfb04
ECC syndrome 0x54 -- corrected single bit error at data bit D3
Bus error interrupt: CPU partial memory write ECC error at 0x138c1f5c
ECC syndrome 0x54 -- corrected single bit error at data bit D3
Bus error interrupt: CPU partial memory write ECC error at 0x138c1f6c
ECC syndrome 0x54 -- corrected single bit error at data bit D3
Bus error interrupt: CPU memory read ECC error at 0x139cff64
ECC syndrome 0x54 -- corrected single bit error at data bit D3
Bus error interrupt: CPU memory read ECC error at 0x136af00c
ECC syndrome 0x54 -- corrected single bit error at data bit D3
Bus error interrupt: CPU memory read ECC error at 0x136af044
ECC syndrome 0x54 -- corrected single bit error at data bit D3
Bus error interrupt: CPU memory read ECC error at 0x136af0cc
ECC syndrome 0x54 -- corrected single bit error at data bit D3
Bus error interrupt: CPU memory read ECC error at 0x136af0cc
ECC syndrome 0x54 -- corrected single bit error at data bit D3
Bus error interrupt: CPU memory read ECC error at 0x136af0e4
ECC syndrome 0x54 -- corrected single bit error at data bit D3
Bus error interrupt: CPU memory read ECC error at 0x136af104
ECC syndrome 0x54 -- corrected single bit error at data bit D3
dec_ecc_be_backend: 34455 callbacks suppressed

Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>

Authored by Maciej W. Rozycki and committed by Thomas Bogendoerfer
56236b7f a163a96d

+11 -5
arch/mips/dec/ecc-berr.c
···
  * 5000/240 (KN03), 5000/260 (KN05) and DECsystem 5900 (KN03),
  * 5900/260 (KN05) systems.
  *
- * Copyright (c) 2003, 2005 Maciej W. Rozycki
+ * Copyright (c) 2003, 2005, 2026 Maciej W. Rozycki
  */
 
 #include <linux/init.h>
 #include <linux/interrupt.h>
 #include <linux/kernel.h>
+#include <linux/ratelimit.h>
 #include <linux/sched.h>
 #include <linux/types.h>
 
···
 	static const char overstr[] = "overrun";
 	static const char eccstr[] = "ECC error";
 
+	static DEFINE_RATELIMIT_STATE(rs,
+				      DEFAULT_RATELIMIT_INTERVAL,
+				      DEFAULT_RATELIMIT_BURST);
+
 	const char *kind, *agent, *cycle, *event;
 	const char *status = "", *xbit = "", *fmt = "";
 	unsigned long address;
···
 
 	if (!(erraddr & KN0X_EAR_VALID)) {
 		/* No idea what happened. */
-		printk(KERN_ALERT "Unidentified bus error %s\n", kind);
+		pr_alert_ratelimited("Unidentified bus error %s\n", kind);
 		return action;
 	}
 
···
 		}
 	}
 
-	if (action != MIPS_BE_FIXUP)
+	if (action != MIPS_BE_FIXUP && __ratelimit(&rs)) {
 		printk(KERN_ALERT "Bus error %s: %s %s %s at %#010lx\n",
 			kind, agent, cycle, event, address);
 
-	if (action != MIPS_BE_FIXUP && erraddr & KN0X_EAR_ECCERR)
-		printk(fmt, " ECC syndrome ", syn, status, xbit, i);
+		if (erraddr & KN0X_EAR_ECCERR)
+			printk(fmt, " ECC syndrome ", syn, status, xbit, i);
+	}
 
 	return action;
 }
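
The interval/burst gating that the patch relies on can be modelled in userspace roughly as below. This is a simplified sketch for illustration, not the kernel's implementation: `struct ratelimit_model` and `ratelimit_model_allow()` are hypothetical names, time is passed in as an abstract tick count rather than read from jiffies, and locking is omitted. The kernel defaults used by the patch are a 5-second interval (`DEFAULT_RATELIMIT_INTERVAL`) and a burst of 10 (`DEFAULT_RATELIMIT_BURST`), which matches the ten message pairs followed by a "callbacks suppressed" line in the sample log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace model of an interval/burst rate limiter. */
struct ratelimit_model {
	long interval;	/* window length, in abstract ticks */
	int burst;	/* events allowed per window */
	long begin;	/* tick at which the current window started */
	int printed;	/* events allowed so far in this window */
	long missed;	/* events suppressed so far in this window */
	bool started;	/* whether the first window has been opened */
};

/* Return true if an event at tick `now` may be reported. */
bool ratelimit_model_allow(struct ratelimit_model *rs, long now)
{
	assert(rs->interval > 0 && rs->burst > 0);

	if (!rs->started) {
		rs->begin = now;
		rs->started = true;
	}

	/* Window expired: report what was dropped and open a new window. */
	if (now - rs->begin >= rs->interval) {
		if (rs->missed)
			printf("model: %ld callbacks suppressed\n", rs->missed);
		rs->begin = now;
		rs->printed = 0;
		rs->missed = 0;
	}

	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}

	rs->missed++;
	return false;
}
```

In the patch a single `__ratelimit(&rs)` decision covers both the "Bus error" line and the "ECC syndrome" line that follows it, so each allowed event is reported as a complete pair. A caller of the model above would do the same by calling `ratelimit_model_allow()` once per event and printing both lines only when it returns true.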