doc: watchdog: document buddy detector · tjh.dev/kernel@cb8615f

+101 -48

1 changed file

Documentation/admin-guide/lockup-watchdogs.rst

···
 to cause the system to reboot automatically after a specified amount
 of time.

+Configuration
+=============
+
+A kernel knob is provided that allows administrators to configure
+this period. The "watchdog_thresh" parameter (default 10 seconds)
+controls the threshold. The right value for a particular environment
+is a trade-off between fast response to lockups and detection overhead.
+
 Implementation
 ==============

-The soft and hard lockup detectors are built on top of the hrtimer and
-perf subsystems, respectively. A direct consequence of this is that,
-in principle, they should work in any architecture where these
-subsystems are present.
+The soft lockup detector is built on top of the hrtimer subsystem.
+The hard lockup detector is built on top of the perf subsystem
+(on architectures that support it) or uses an SMP "buddy" system.

-A periodic hrtimer runs to generate interrupts and kick the watchdog
-job. An NMI perf event is generated every "watchdog_thresh"
-(compile-time initialized to 10 and configurable through sysctl of the
-same name) seconds to check for hardlockups. If any CPU in the system
-does not receive any hrtimer interrupt during that time the
-'hardlockup detector' (the handler for the NMI perf event) will
-generate a kernel warning or call panic, depending on the
-configuration.
+Softlockup Detector
+-------------------

 The watchdog job runs in a stop scheduling thread that updates a
 timestamp every time it is scheduled. If that timestamp is not updated
···
 will call panic if it was instructed to do so or resume execution of
 other kernel code.

-The period of the hrtimer is 2*watchdog_thresh/5, which means it has
-two or three chances to generate an interrupt before the hardlockup
-detector kicks in.
+Frequency and Heartbeats
+------------------------

-As explained above, a kernel knob is provided that allows
-administrators to configure the period of the hrtimer and the perf
-event. The right value for a particular environment is a trade-off
-between fast response to lockups and detection overhead.
+The hrtimer used by the softlockup detector serves a dual purpose:
+it detects softlockups, and it also generates the interrupts
+(heartbeats) that the hardlockup detectors use to verify CPU liveness.

-Detection Overhead
-------------------
+The period of this hrtimer is 2*watchdog_thresh/5. This means the
+hrtimer has two or three chances to generate an interrupt before the
+NMI hardlockup detector kicks in.

-The hardlockup detector checks for lockups using a periodic NMI perf
-event. This means the time to detect a lockup can vary depending on
-when the lockup occurs relative to the NMI check window.
+Hardlockup Detector (NMI/Perf)
+------------------------------

-**Best Case:**
-In the best case scenario, the lockup occurs just before the first
-heartbeat is due. The detector will notice the missing hrtimer
-interrupt almost immediately during the next check.
+On architectures that support NMI (Non-Maskable Interrupt) perf events,
+a periodic NMI is generated every "watchdog_thresh" seconds.

-::
+If any CPU in the system does not receive any hrtimer interrupt
+(heartbeat) during the "watchdog_thresh" window, the 'hardlockup
+detector' (the handler for the NMI perf event) will generate a kernel
+warning or call panic.

-  Time 100.0: cpu 1 heartbeat
-  Time 100.1: hardlockup_check, cpu1 stores its state
-  Time 103.9: Hard Lockup on cpu1
-  Time 104.0: cpu 1 heartbeat never comes
-  Time 110.1: hardlockup_check, cpu1 checks the state again, should be the same, declares lockup
+**Detection Overhead (NMI):**

-  Time to detection: ~6 seconds
+The time to detect a lockup can vary depending on when the lockup
+occurs relative to the NMI check window. Examples below assume a watchdog_thresh of 10.

-**Worst Case:**
-In the worst case scenario, the lockup occurs shortly after a valid
-interrupt (heartbeat) which itself happened just after the NMI check.
-The next NMI check sees that the interrupt count has changed (due to
-that one heartbeat), assumes the CPU is healthy, and resets the
-baseline. The lockup is only detected at the subsequent check.
+* **Best Case:** The lockup occurs just before the first heartbeat is
+  due. The detector will notice the missing hrtimer interrupt almost
+  immediately during the next check.

-::
+  ::

-  Time 100.0: hardlockup_check, cpu1 stores its state
-  Time 100.1: cpu 1 heartbeat
-  Time 100.2: Hard Lockup on cpu1
-  Time 110.0: hardlockup_check, cpu1 stores its state (misses lockup as state changed)
-  Time 120.0: hardlockup_check, cpu1 checks the state again, should be the same, declares lockup
+    Time 100.0: cpu 1 heartbeat
+    Time 100.1: hardlockup_check, cpu1 stores its state
+    Time 103.9: Hard Lockup on cpu1
+    Time 104.0: cpu 1 heartbeat never comes
+    Time 110.1: hardlockup_check, cpu1 checks the state again, should be the same, declares lockup

-  Time to detection: ~20 seconds
+    Time to detection: ~6 seconds
+
+* **Worst Case:** The lockup occurs shortly after a valid interrupt
+  (heartbeat) which itself happened just after the NMI check. The next
+  NMI check sees that the interrupt count has changed (due to that one
+  heartbeat), assumes the CPU is healthy, and resets the baseline. The
+  lockup is only detected at the subsequent check.
+
+  ::
+
+    Time 100.0: hardlockup_check, cpu1 stores its state
+    Time 100.1: cpu 1 heartbeat
+    Time 100.2: Hard Lockup on cpu1
+    Time 110.0: hardlockup_check, cpu1 stores its state (misses lockup as state changed)
+    Time 120.0: hardlockup_check, cpu1 checks the state again, should be the same, declares lockup
+
+    Time to detection: ~20 seconds
+
+Hardlockup Detector (Buddy)
+---------------------------
+
+On architectures or configurations where NMI perf events are not
+available (or disabled), the kernel may use the "buddy" hardlockup
+detector. This mechanism requires SMP (Symmetric Multi-Processing).
+
+In this mode, each CPU is assigned a "buddy" CPU to monitor. The
+monitoring CPU runs its own hrtimer (the same one used for softlockup
+detection) and checks if the buddy CPU's hrtimer interrupt count has
+increased.
+
+To ensure timeliness and avoid false positives, the buddy system performs
+checks at every hrtimer interval (2*watchdog_thresh/5, which is 4 seconds
+by default). It uses a missed-interrupt threshold of 3. If the buddy's
+interrupt count has not changed for 3 consecutive checks, it is assumed
+that the buddy CPU is hardlocked (interrupts disabled). The monitoring
+CPU will then trigger the hardlockup response (warning or panic).
+
+**Detection Overhead (Buddy):**
+
+With a default check interval of 4 seconds (watchdog_thresh = 10):
+
+* **Best case:** Lockup occurs just before a check.
+  Detected in ~8s (0s till 1st check + 4s till 2nd + 4s till 3rd).
+* **Worst case:** Lockup occurs just after a check.
+  Detected in ~12s (4s till 1st check + 4s till 2nd + 4s till 3rd).
+
+**Limitations of the Buddy Detector:**
+
+1. **All-CPU Lockup:** If all CPUs lock up simultaneously, the buddy
+   detector cannot detect the condition because the monitoring CPUs
+   are also frozen.
+2. **Stack Traces:** Unlike the NMI detector, the buddy detector
+   cannot directly interrupt the locked CPU to grab a stack trace.
+   It relies on architecture-specific mechanisms (like NMI backtrace
+   support) to try and retrieve the status of the locked CPU. If
+   such support is missing, the log may only show that a lockup
+   occurred without providing the locked CPU's stack.
+
+Watchdog Core Exclusion
+=======================

 By default, the watchdog runs on all online cores. However, on a
 kernel configured with NO_HZ_FULL, by default the watchdog runs only
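
The detection-time figures quoted in the new text follow from simple arithmetic on watchdog_thresh. The Python sketch below reproduces them for illustration only. It is not kernel code, and all function names are hypothetical. The NMI best-case formula (watchdog_thresh minus one hrtimer period) is an assumption generalized from the 100.0 / 103.9 / 110.1 timeline in the patch; the patch itself only gives the numbers for watchdog_thresh = 10.

```python
# Illustrative arithmetic only; names are hypothetical, not kernel symbols.

def hrtimer_period(watchdog_thresh):
    """Heartbeat period in seconds: 2*watchdog_thresh/5, as stated in the doc."""
    return 2 * watchdog_thresh / 5


def nmi_detection_bounds(watchdog_thresh):
    """Approximate (best, worst) detection time for the NMI/perf detector.

    Best case (assumed generalization of the doc's example): the CPU locks
    up just before its next heartbeat is due, shortly after an NMI check,
    so the following check (one watchdog_thresh later) sees no change.
    Worst case: one heartbeat lands right after a check, so the next check
    resets the baseline and only the check after that declares the lockup.
    """
    best = watchdog_thresh - hrtimer_period(watchdog_thresh)
    worst = 2 * watchdog_thresh
    return best, worst


def buddy_detection_bounds(watchdog_thresh, miss_threshold=3):
    """Approximate (best, worst) detection time for the buddy detector.

    The doc counts miss_threshold consecutive checks, one per hrtimer
    period. Best case: the first check happens right after the lockup
    (0s + period + period). Worst case: a full period passes first.
    """
    period = hrtimer_period(watchdog_thresh)
    best = (miss_threshold - 1) * period
    worst = miss_threshold * period
    return best, worst


if __name__ == "__main__":
    thresh = 10  # default watchdog_thresh in seconds
    print("hrtimer period:", hrtimer_period(thresh))
    print("NMI (best, worst):", nmi_detection_bounds(thresh))
    print("buddy (best, worst):", buddy_detection_bounds(thresh))
```

With the default threshold of 10 seconds this yields a 4 second heartbeat period, roughly 6 to 20 seconds for the NMI detector, and roughly 8 to 12 seconds for the buddy detector, matching the figures in the patch.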

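
The "watchdog_thresh" knob described in the Configuration section is a sysctl of the same name, exposed on Linux as /proc/sys/kernel/watchdog_thresh. As a small, hedged sketch (the helper name is hypothetical, and the file may be absent on kernels built without the lockup detectors), the current value and the derived heartbeat period can be inspected like this:

```python
from pathlib import Path

# procfs location of the kernel.watchdog_thresh sysctl.
WATCHDOG_THRESH_PATH = Path("/proc/sys/kernel/watchdog_thresh")


def read_watchdog_thresh(path=WATCHDOG_THRESH_PATH):
    """Return the threshold in seconds, or None if it cannot be read.

    Hypothetical helper for illustration; it only parses an integer
    from the given file.
    """
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    thresh = read_watchdog_thresh()
    if thresh is None:
        print("watchdog_thresh is not readable on this system")
    else:
        # Heartbeat (hrtimer) period is 2*watchdog_thresh/5 per the doc.
        print(f"watchdog_thresh = {thresh} s, hrtimer period = {2 * thresh / 5} s")
```

Changing the value requires root, for example by writing a new number to the same file or using the sysctl tool; a lower threshold shortens detection time at the cost of more frequent checks.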