Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

doc: Update stallwarn.rst

This commit updates stallwarn.rst to reflect RCU additions and changes
over the past few years.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

+25 -18
Documentation/RCU/stallwarn.rst
@@ -25,10 +25,10 @@
 
 -	A CPU looping with bottom halves disabled.
 
--	For !CONFIG_PREEMPTION kernels, a CPU looping anywhere in the kernel
-	without invoking schedule(). If the looping in the kernel is
-	really expected and desirable behavior, you might need to add
-	some calls to cond_resched().
+-	For !CONFIG_PREEMPTION kernels, a CPU looping anywhere in the
+	kernel without potentially invoking schedule(). If the looping
+	in the kernel is really expected and desirable behavior, you
+	might need to add some calls to cond_resched().
 
 -	Booting Linux using a console connection that is too slow to
 	keep up with the boot-time console-message rate. For example,
@@ -108,16 +108,17 @@
 
 -	A bug in the RCU implementation.
 
--	A hardware failure. This is quite unlikely, but has occurred
-	at least once in real life. A CPU failed in a running system,
-	becoming unresponsive, but not causing an immediate crash.
-	This resulted in a series of RCU CPU stall warnings, eventually
-	leading the realization that the CPU had failed.
+-	A hardware failure. This is quite unlikely, but is not at all
+	uncommon in large datacenter. In one memorable case some decades
+	back, a CPU failed in a running system, becoming unresponsive,
+	but not causing an immediate crash. This resulted in a series
+	of RCU CPU stall warnings, eventually leading the realization
+	that the CPU had failed.
 
-The RCU, RCU-sched, and RCU-tasks implementations have CPU stall warning.
-Note that SRCU does *not* have CPU stall warnings. Please note that
-RCU only detects CPU stalls when there is a grace period in progress.
-No grace period, no CPU stall warnings.
+The RCU, RCU-sched, RCU-tasks, and RCU-tasks-trace implementations have
+CPU stall warning. Note that SRCU does *not* have CPU stall warnings.
+Please note that RCU only detects CPU stalls when there is a grace period
+in progress. No grace period, no CPU stall warnings.
 
 To diagnose the cause of the stall, inspect the stack traces.
 The offending function will usually be near the top of the stack.
@@ -206,15 +205,20 @@
 rcupdate.rcu_task_stall_timeout
 -------------------------------
 
-	This boot/sysfs parameter controls the RCU-tasks stall warning
-	interval. A value of zero or less suppresses RCU-tasks stall
-	warnings. A positive value sets the stall-warning interval
-	in seconds. An RCU-tasks stall warning starts with the line:
+	This boot/sysfs parameter controls the RCU-tasks and
+	RCU-tasks-trace stall warning intervals. A value of zero or less
+	suppresses RCU-tasks stall warnings. A positive value sets the
+	stall-warning interval in seconds. An RCU-tasks stall warning
+	starts with the line:
 
 		INFO: rcu_tasks detected stalls on tasks:
 
 	And continues with the output of sched_show_task() for each
 	task stalling the current RCU-tasks grace period.
+
+	An RCU-tasks-trace stall warning starts (and continues) similarly:
+
+		INFO: rcu_tasks_trace detected stalls on tasks
 
 
 Interpreting RCU's CPU Stall-Detector "Splats"
@@ -254,7 +248,8 @@
 is in dyntick-idle mode and an odd-numbered value otherwise. The hex
 number between the two "/"s is the value of the nesting, which will be
 a small non-negative number if in the idle loop (as shown above) and a
-very large positive number otherwise.
+very large positive number otherwise. The number following the final
+"/" is the NMI nesting, which will be a small non-negative number.
 
 The "softirq=" portion of the message tracks the number of RCU softirq
 handlers that the stalled CPU has executed. The number before the "/"
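The first hunk suggests adding cond_resched() calls to long-running kernel loops on !CONFIG_PREEMPTION kernels. The sketch below only illustrates where such a call typically goes; it is compiled as userspace C so it can run outside the kernel. Here cond_resched() is a hypothetical counting stub and sum_items() is an illustrative loop, neither taken from the kernel. In a real kernel, cond_resched() may invoke schedule() when a reschedule is pending, and on such kernels a context switch is a quiescent state that lets the current RCU grace period make progress.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical userspace stub standing in for the kernel's cond_resched().
 * The real function may call schedule() if a reschedule is pending; this
 * stub only counts how often the loop offers to reschedule.
 */
static unsigned long resched_points;

static void cond_resched(void)
{
	resched_points++;
}

/*
 * Illustrative long-running loop. On a !CONFIG_PREEMPTION kernel, a loop
 * that never reaches schedule() can trigger RCU CPU stall warnings, so a
 * cond_resched() call is placed once per iteration.
 */
static unsigned long sum_items(const unsigned long *items, size_t n)
{
	unsigned long sum = 0;

	assert(items != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		sum += items[i];
		cond_resched();	/* potential rescheduling point */
	}
	return sum;
}
```

Calling it once per outer iteration keeps its cost small relative to the work done, while still giving the scheduler regular opportunities to run, provided each individual iteration is itself short.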
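The third hunk says rcupdate.rcu_task_stall_timeout now governs both the RCU-tasks and RCU-tasks-trace stall-warning intervals: zero or less suppresses the warnings, and a positive value gives the interval in seconds. As a minimal sketch of those semantics, the hypothetical helpers below (not kernel interfaces) scan a kernel command-line string, such as the text of /proc/cmdline, for the boot-time form of the parameter and interpret its value. They assume that when the parameter is repeated, the last occurrence takes effect.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: scan a kernel command line for
 * "rcupdate.rcu_task_stall_timeout=N". Returns 1 and stores N in *out if a
 * well-formed value is found, else returns 0.
 * Assumption: if the parameter is repeated, the last occurrence wins.
 */
static int find_task_stall_timeout(const char *cmdline, long *out)
{
	static const char key[] = "rcupdate.rcu_task_stall_timeout=";
	const size_t keylen = sizeof(key) - 1;
	const char *p = cmdline;
	int found = 0;

	assert(cmdline != NULL && out != NULL);
	while ((p = strstr(p, key)) != NULL) {
		const char *val = p + keylen;

		/* Match only at the start of a whitespace-separated word,
		 * and require the value to begin right after the '='. */
		if ((p == cmdline || isspace((unsigned char)p[-1])) &&
		    *val != '\0' && !isspace((unsigned char)*val)) {
			char *end;
			long v = strtol(val, &end, 10);

			/* Accept only if the rest of the word is a number. */
			if (end != val &&
			    (*end == '\0' || isspace((unsigned char)*end))) {
				*out = v;
				found = 1;
			}
		}
		p = val;
	}
	return found;
}

/*
 * Per the documentation: zero or less suppresses the stall warnings, and a
 * positive value is the warning interval in seconds. Returns 0 for
 * "suppressed", otherwise the interval.
 */
static long task_stall_interval_seconds(long value)
{
	return value > 0 ? value : 0;
}
```

For example, a command line containing rcupdate.rcu_task_stall_timeout=0 yields an interval of 0 (warnings suppressed). The runtime sysfs form of the parameter is not covered by this sketch.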
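The same hunk quotes the header lines that begin the two kinds of RCU-tasks stall warnings. When triaging console logs, a small filter can tell them apart. The sketch below matches only the header text quoted in the documentation; the enum and function names are hypothetical. Real log lines may carry timestamps or other prefixes, which substring matching tolerates.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical classification of RCU-tasks-family stall-warning headers. */
enum rcu_tasks_stall_kind {
	STALL_NONE,
	STALL_RCU_TASKS,
	STALL_RCU_TASKS_TRACE,
};

/*
 * Classify one log line by the header text quoted in stallwarn.rst.
 * "rcu_tasks detected" is not a substring of "rcu_tasks_trace detected",
 * so a header of one kind never matches the other kind's pattern.
 */
static enum rcu_tasks_stall_kind classify_stall_line(const char *line)
{
	assert(line != NULL);
	if (strstr(line, "INFO: rcu_tasks_trace detected stalls on tasks"))
		return STALL_RCU_TASKS_TRACE;
	if (strstr(line, "INFO: rcu_tasks detected stalls on tasks"))
		return STALL_RCU_TASKS;
	return STALL_NONE;
}
```

Because both kinds of warning continue with per-task output, a log scanner would typically use this only to find where each warning starts.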
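The last hunk extends the description of the "/"-separated fields in the "idle=" portion of a CPU stall-warning splat with the NMI nesting after the final "/". The parser below is a hedged sketch of reading those three fields. It assumes the first two fields are hexadecimal (the text calls the middle one hex; the format of the first is inferred), accepts either decimal or 0x-prefixed hex for the last, and is exercised on a hypothetical example line. The parity check assumes, from the documented "odd-numbered value otherwise", that the number before the first "/" is even when the CPU is in dyntick-idle mode.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Fields of an "idle=A/B/C" token, named after the documentation. */
struct idle_field {
	unsigned long dynticks_low;	/* number before the first "/" */
	unsigned long nesting;		/* hex number between the two "/"s */
	unsigned long nmi_nesting;	/* number after the final "/" */
};

/*
 * Hypothetical parser: find "idle=" in a stall-warning line and split it
 * into its three fields. Assumes A and B are hex and C is decimal or
 * 0x-prefixed hex. Returns 1 on success, 0 if the token is malformed.
 */
static int parse_idle_field(const char *line, struct idle_field *f)
{
	const char *p;
	char *end;

	assert(line != NULL && f != NULL);
	p = strstr(line, "idle=");
	if (!p)
		return 0;
	p += strlen("idle=");

	f->dynticks_low = strtoul(p, &end, 16);
	if (end == p || *end != '/')
		return 0;
	p = end + 1;

	f->nesting = strtoul(p, &end, 16);
	if (end == p || *end != '/')
		return 0;
	p = end + 1;

	f->nmi_nesting = strtoul(p, &end, 0);	/* base 0: "2" or "0x2" */
	return end != p;
}

/* Even value: CPU in dyntick-idle mode; odd value: not in that mode. */
static int cpu_was_dyntick_idle(const struct idle_field *f)
{
	return (f->dynticks_low & 1) == 0;
}
```

For a hypothetical line containing idle=06c/0/0, the counter 0x6c is even, so the CPU would be reported as in dyntick-idle mode with zero nesting and zero NMI nesting.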