Documentation/oops-tracing.txt: convert to ReST markup

+140 -119

1 changed file


Documentation

+140 -119

Documentation/oops-tracing.txt

···
-NOTE: ksymoops is useless on 2.6. Please use the Oops in its original format
-(from dmesg, etc). Ignore any references in this or other docs to "decoding
-the Oops" or "running it through ksymoops". If you post an Oops from 2.6 that
-has been run through ksymoops, people will just tell you to repost it.
+OOPS tracing
+============
+
+.. note::
+
+``ksymoops`` is useless on 2.6 or upper. Please use the Oops in its original
+format (from ``dmesg``, etc). Ignore any references in this or other docs to
+"decoding the Oops" or "running it through ksymoops".
+If you post an Oops from 2.6+ that has been run through ``ksymoops``,
+people will just tell you to repost it.
 
 Quick Summary
 -------------
···
 what you were doing. If it occurs repeatably try and describe how to recreate
 it. That's worth even more than the oops.
 
-If you are totally stumped as to whom to send the report, send it to
+If you are totally stumped as to whom to send the report, send it to
 linux-kernel@vger.kernel.org. Thanks for your help in making Linux as
 stable as humanly possible.
 
···
 ----------------------
 
 Normally the Oops text is read from the kernel buffers by klogd and
-handed to syslogd which writes it to a syslog file, typically
-/var/log/messages (depends on /etc/syslog.conf). Sometimes klogd dies,
-in which case you can run dmesg > file to read the data from the kernel
-buffers and save it. Or you can cat /proc/kmsg > file, however you
-have to break in to stop the transfer, kmsg is a "never ending file".
+handed to ``syslogd`` which writes it to a syslog file, typically
+``/var/log/messages`` (depends on ``/etc/syslog.conf``). Sometimes ``klogd``
+dies, in which case you can run ``dmesg > file`` to read the data from the
+kernel buffers and save it. Or you can ``cat /proc/kmsg > file``, however you
+have to break in to stop the transfer, ``kmsg`` is a "never ending file".
 If the machine has crashed so badly that you cannot enter commands or
-the disk is not available then you have three options :-
+the disk is not available then you have three options :
 
 (1) Hand copy the text from the screen and type it in after the machine
 has restarted. Messy but it is the only option if you have not
 planned for a crash. Alternatively, you can take a picture of
 the screen with a digital camera - not nice, but better than
 nothing. If the messages scroll off the top of the console, you
-may find that booting with a higher resolution (eg, vga=791)
-will allow you to read more of the text. (Caveat: This needs vesafb,
+may find that booting with a higher resolution (eg, ``vga=791``)
+will allow you to read more of the text. (Caveat: This needs ``vesafb``,
 so won't help for 'early' oopses)
 
-(2) Boot with a serial console (see Documentation/serial-console.txt),
+(2) Boot with a serial console (see
+:ref:`Documentation/serial-console.txt <serial_console>`),
 run a null modem to a second machine and capture the output there
 using your favourite communication program. Minicom works well.
 
···
 Full Information
 ----------------
 
-NOTE: the message from Linus below applies to 2.4 kernel. I have preserved it
-for historical reasons, and because some of the information in it still
-applies. Especially, please ignore any references to ksymoops.
+.. note::
 
-From: Linus Torvalds <torvalds@osdl.org>
+the message from Linus below applies to 2.4 kernel. I have preserved it
+for historical reasons, and because some of the information in it still
+applies. Especially, please ignore any references to ksymoops.
 
-How to track down an Oops.. [originally a mail to linux-kernel]
+::
 
-The main trick is having 5 years of experience with those pesky oops
-messages ;-)
+From: Linus Torvalds <torvalds@osdl.org>
 
-Actually, there are things you can do that make this easier. I have two
-separate approaches:
+How to track down an Oops.. [originally a mail to linux-kernel]
+
+The main trick is having 5 years of experience with those pesky oops
+messages ;-)
+
+Actually, there are things you can do that make this easier. I have two
+separate approaches::
 
 gdb /usr/src/linux/vmlinux
 gdb> disassemble <offending_function>
 
-That's the easy way to find the problem, at least if the bug-report is
-well made (like this one was - run through ksymoops to get the
-information of which function and the offset in the function that it
+That's the easy way to find the problem, at least if the bug-report is
+well made (like this one was - run through ``ksymoops`` to get the
+information of which function and the offset in the function that it
 happened in).
 
-Oh, it helps if the report happens on a kernel that is compiled with the
+Oh, it helps if the report happens on a kernel that is compiled with the
 same compiler and similar setups.
 
-The other thing to do is disassemble the "Code:" part of the bug report:
+The other thing to do is disassemble the "Code:" part of the bug report:
 ksymoops will do this too with the correct tools, but if you don't have
-the tools you can just do a silly program:
+the tools you can just do a silly program::
 
 char str[] = "\xXX\xXX\xXX...";
 main(){}
 
-and compile it with gcc -g and then do "disassemble str" (where the "XX"
-stuff are the values reported by the Oops - you can just cut-and-paste
-and do a replace of spaces to "\x" - that's what I do, as I'm too lazy
+and compile it with ``gcc -g`` and then do ``disassemble str`` (where the ``XX``
+stuff are the values reported by the Oops - you can just cut-and-paste
+and do a replace of spaces to ``\x`` - that's what I do, as I'm too lazy
 to write a program to automate this all).
 
-Alternatively, you can use the shell script in scripts/decodecode.
-Its usage is: decodecode < oops.txt
+Alternatively, you can use the shell script in ``scripts/decodecode``.
+Its usage is::
+
+decodecode < oops.txt
 
 The hex bytes that follow "Code:" may (in some architectures) have a series
 of bytes that precede the current instruction pointer as well as bytes at and
 following the current instruction pointer. In some cases, one instruction
-byte or word is surrounded by <> or (), as in "<86>" or "(f00d)". These
-<> or () markings indicate the current instruction pointer. Example from
-i386, split into multiple lines for readability:
+byte or word is surrounded by ``<>`` or ``()``, as in ``<86>`` or ``(f00d)``.
+These ``<>`` or ``()`` markings indicate the current instruction pointer.
 
-Code: f9 0f 8d f9 00 00 00 8d 42 0c e8 dd 26 11 c7 a1 60 ea 2b f9 8b 50 08 a1
-64 ea 2b f9 8d 34 82 8b 1e 85 db 74 6d 8b 15 60 ea 2b f9 <8b> 43 04 39 42 54
-7e 04 40 89 42 54 8b 43 04 3b 05 00 f6 52 c0
+Example from i386, split into multiple lines for readability::
 
-Finally, if you want to see where the code comes from, you can do
+Code: f9 0f 8d f9 00 00 00 8d 42 0c e8 dd 26 11 c7 a1 60 ea 2b f9 8b 50 08 a1
+64 ea 2b f9 8d 34 82 8b 1e 85 db 74 6d 8b 15 60 ea 2b f9 <8b> 43 04 39 42 54
+7e 04 40 89 42 54 8b 43 04 3b 05 00 f6 52 c0
+
+Finally, if you want to see where the code comes from, you can do::
 
 cd /usr/src/linux
 make fs/buffer.s # or whatever file the bug happened in
 
-and then you get a better idea of what happens than with the gdb
+and then you get a better idea of what happens than with the gdb
 disassembly.
 
-Now, the trick is just then to combine all the data you have: the C
-sources (and general knowledge of what it _should_ do), the assembly
-listing and the code disassembly (and additionally the register dump you
-also get from the "oops" message - that can be useful to see _what_ the
-corrupted pointers were, and when you have the assembler listing you can
-also match the other registers to whatever C expressions they were used
+Now, the trick is just then to combine all the data you have: the C
+sources (and general knowledge of what it **should** do), the assembly
+listing and the code disassembly (and additionally the register dump you
+also get from the "oops" message - that can be useful to see **what** the
+corrupted pointers were, and when you have the assembler listing you can
+also match the other registers to whatever C expressions they were used
 for).
 
-Essentially, you just look at what doesn't match (in this case it was the
-"Code" disassembly that didn't match with what the compiler generated).
-Then you need to find out _why_ they don't match. Often it's simple - you
-see that the code uses a NULL pointer and then you look at the code and
-wonder how the NULL pointer got there, and if it's a valid thing to do
+Essentially, you just look at what doesn't match (in this case it was the
+"Code" disassembly that didn't match with what the compiler generated).
+Then you need to find out **why** they don't match. Often it's simple - you
+see that the code uses a NULL pointer and then you look at the code and
+wonder how the NULL pointer got there, and if it's a valid thing to do
 you just check against it..
 
-Now, if somebody gets the idea that this is time-consuming and requires
-some small amount of concentration, you're right. Which is why I will
-mostly just ignore any panic reports that don't have the symbol table
-info etc looked up: it simply gets too hard to look it up (I have some
-programs to search for specific patterns in the kernel code segment, and
-sometimes I have been able to look up those kinds of panics too, but
-that really requires pretty good knowledge of the kernel just to be able
+Now, if somebody gets the idea that this is time-consuming and requires
+some small amount of concentration, you're right. Which is why I will
+mostly just ignore any panic reports that don't have the symbol table
+info etc looked up: it simply gets too hard to look it up (I have some
+programs to search for specific patterns in the kernel code segment, and
+sometimes I have been able to look up those kinds of panics too, but
+that really requires pretty good knowledge of the kernel just to be able
 to pick out the right sequences etc..)
 
-_Sometimes_ it happens that I just see the disassembled code sequence
-from the panic, and I know immediately where it's coming from. That's when
+**Sometimes** it happens that I just see the disassembled code sequence
+from the panic, and I know immediately where it's coming from. That's when
 I get worried that I've been doing this for too long ;-)
 
 Linus
 
 
 ---------------------------------------------------------------------------
-Notes on Oops tracing with klogd:
+
+Notes on Oops tracing with ``klogd``
+------------------------------------
 
 In order to help Linus and the other kernel developers there has been
-substantial support incorporated into klogd for processing protection
+substantial support incorporated into ``klogd`` for processing protection
 faults. In order to have full support for address resolution at least
-version 1.3-pl3 of the sysklogd package should be used.
+version 1.3-pl3 of the ``sysklogd`` package should be used.
 
-When a protection fault occurs the klogd daemon automatically
+When a protection fault occurs the ``klogd`` daemon automatically
 translates important addresses in the kernel log messages to their
 symbolic equivalents. This translated kernel message is then
-forwarded through whatever reporting mechanism klogd is using. The
+forwarded through whatever reporting mechanism ``klogd`` is using. The
 protection fault message can be simply cut out of the message files
 and forwarded to the kernel developers.
 
-Two types of address resolution are performed by klogd. The first is
+Two types of address resolution are performed by ``klogd``. The first is
 static translation and the second is dynamic translation. Static
 translation uses the System.map file in much the same manner that
-ksymoops does. In order to do static translation the klogd daemon
+ksymoops does. In order to do static translation the ``klogd`` daemon
 must be able to find a system map file at daemon initialization time.
-See the klogd man page for information on how klogd searches for map
+See the klogd man page for information on how ``klogd`` searches for map
 files.
 
 Dynamic address translation is important when kernel loadable modules
···
 export symbol information from the module.
 
 Since the kernel module environment can be dynamic there must be a
-mechanism for notifying the klogd daemon when a change in module
+mechanism for notifying the ``klogd`` daemon when a change in module
 environment occurs. There are command line options available which
 allow klogd to signal the currently executing daemon that symbol
-information should be refreshed. See the klogd manual page for more
+information should be refreshed. See the ``klogd`` manual page for more
 information.
 
 A patch is included with the sysklogd distribution which modifies the
-modules-2.0.0 package to automatically signal klogd whenever a module
+``modules-2.0.0`` package to automatically signal klogd whenever a module
 is loaded or unloaded. Applying this patch provides essentially
 seamless support for debugging protection faults which occur with
 kernel loadable modules.
 
 The following is an example of a protection fault in a loadable module
-processed by klogd:
----------------------------------------------------------------------------
-Aug 29 09:51:01 blizard kernel: Unable to handle kernel paging request at virtual address f15e97cc
-Aug 29 09:51:01 blizard kernel: current->tss.cr3 = 0062d000, %cr3 = 0062d000
-Aug 29 09:51:01 blizard kernel: *pde = 00000000
-Aug 29 09:51:01 blizard kernel: Oops: 0002
-Aug 29 09:51:01 blizard kernel: CPU: 0
-Aug 29 09:51:01 blizard kernel: EIP: 0010:[oops:_oops+16/3868]
-Aug 29 09:51:01 blizard kernel: EFLAGS: 00010212
-Aug 29 09:51:01 blizard kernel: eax: 315e97cc ebx: 003a6f80 ecx: 001be77b edx: 00237c0c
-Aug 29 09:51:01 blizard kernel: esi: 00000000 edi: bffffdb3 ebp: 00589f90 esp: 00589f8c
-Aug 29 09:51:01 blizard kernel: ds: 0018 es: 0018 fs: 002b gs: 002b ss: 0018
-Aug 29 09:51:01 blizard kernel: Process oops_test (pid: 3374, process nr: 21, stackpage=00589000)
-Aug 29 09:51:01 blizard kernel: Stack: 315e97cc 00589f98 0100b0b4 bffffed4 0012e38e 00240c64 003a6f80 00000001
-Aug 29 09:51:01 blizard kernel: 00000000 00237810 bfffff00 0010a7fa 00000003 00000001 00000000 bfffff00
-Aug 29 09:51:01 blizard kernel: bffffdb3 bffffed4 ffffffda 0000002b 0007002b 0000002b 0000002b 00000036
-Aug 29 09:51:01 blizard kernel: Call Trace: [oops:_oops_ioctl+48/80] [_sys_ioctl+254/272] [_system_call+82/128]
-Aug 29 09:51:01 blizard kernel: Code: c7 00 05 00 00 00 eb 08 90 90 90 90 90 90 90 90 89 ec 5d c3
----------------------------------------------------------------------------
+processed by ``klogd``::
 
-Dr. G.W. Wettstein Oncology Research Div. Computing Facility
-Roger Maris Cancer Center INTERNET: greg@wind.rmcc.com
-820 4th St. N.
-Fargo, ND 58122
-Phone: 701-234-7556
-
+Aug 29 09:51:01 blizard kernel: Unable to handle kernel paging request at virtual address f15e97cc
+Aug 29 09:51:01 blizard kernel: current->tss.cr3 = 0062d000, %cr3 = 0062d000
+Aug 29 09:51:01 blizard kernel: *pde = 00000000
+Aug 29 09:51:01 blizard kernel: Oops: 0002
+Aug 29 09:51:01 blizard kernel: CPU: 0
+Aug 29 09:51:01 blizard kernel: EIP: 0010:[oops:_oops+16/3868]
+Aug 29 09:51:01 blizard kernel: EFLAGS: 00010212
+Aug 29 09:51:01 blizard kernel: eax: 315e97cc ebx: 003a6f80 ecx: 001be77b edx: 00237c0c
+Aug 29 09:51:01 blizard kernel: esi: 00000000 edi: bffffdb3 ebp: 00589f90 esp: 00589f8c
+Aug 29 09:51:01 blizard kernel: ds: 0018 es: 0018 fs: 002b gs: 002b ss: 0018
+Aug 29 09:51:01 blizard kernel: Process oops_test (pid: 3374, process nr: 21, stackpage=00589000)
+Aug 29 09:51:01 blizard kernel: Stack: 315e97cc 00589f98 0100b0b4 bffffed4 0012e38e 00240c64 003a6f80 00000001
+Aug 29 09:51:01 blizard kernel: 00000000 00237810 bfffff00 0010a7fa 00000003 00000001 00000000 bfffff00
+Aug 29 09:51:01 blizard kernel: bffffdb3 bffffed4 ffffffda 0000002b 0007002b 0000002b 0000002b 00000036
+Aug 29 09:51:01 blizard kernel: Call Trace: [oops:_oops_ioctl+48/80] [_sys_ioctl+254/272] [_system_call+82/128]
+Aug 29 09:51:01 blizard kernel: Code: c7 00 05 00 00 00 eb 08 90 90 90 90 90 90 90 90 89 ec 5d c3
 
 ---------------------------------------------------------------------------
-Tainted kernels:
 
-Some oops reports contain the string 'Tainted: ' after the program
+::
+
+Dr. G.W. Wettstein Oncology Research Div. Computing Facility
+Roger Maris Cancer Center INTERNET: greg@wind.rmcc.com
+820 4th St. N.
+Fargo, ND 58122
+Phone: 701-234-7556
+
+
+---------------------------------------------------------------------------
+
+Tainted kernels
+---------------
+
+Some oops reports contain the string **'Tainted: '** after the program
 counter. This indicates that the kernel has been tainted by some
 mechanism. The string is followed by a series of position-sensitive
 characters, each representing a particular tainted value.
 
-1: 'G' if all modules loaded have a GPL or compatible license, 'P' if
+1) 'G' if all modules loaded have a GPL or compatible license, 'P' if
 any proprietary module has been loaded. Modules without a
 MODULE_LICENSE or with a MODULE_LICENSE that is not recognised by
 insmod as GPL compatible are assumed to be proprietary.
 
-2: 'F' if any module was force loaded by "insmod -f", ' ' if all
+2) ``F`` if any module was force loaded by ``insmod -f``, ``' '`` if all
 modules were loaded normally.
 
-3: 'S' if the oops occurred on an SMP kernel running on hardware that
+3) ``S`` if the oops occurred on an SMP kernel running on hardware that
 hasn't been certified as safe to run multiprocessor.
 Currently this occurs only on various Athlons that are not
 SMP capable.
 
-4: 'R' if a module was force unloaded by "rmmod -f", ' ' if all
+4) ``R`` if a module was force unloaded by ``rmmod -f``, ``' '`` if all
 modules were unloaded normally.
 
-5: 'M' if any processor has reported a Machine Check Exception,
-' ' if no Machine Check Exceptions have occurred.
+5) ``M`` if any processor has reported a Machine Check Exception,
+``' '`` if no Machine Check Exceptions have occurred.
 
-6: 'B' if a page-release function has found a bad page reference or
+6) ``B`` if a page-release function has found a bad page reference or
 some unexpected page flags.
 
-7: 'U' if a user or user application specifically requested that the
-Tainted flag be set, ' ' otherwise.
+7) ``U`` if a user or user application specifically requested that the
+Tainted flag be set, ``' '`` otherwise.
 
-8: 'D' if the kernel has died recently, i.e. there was an OOPS or BUG.
+8) ``D`` if the kernel has died recently, i.e. there was an OOPS or BUG.
 
-9: 'A' if the ACPI table has been overridden.
+9) ``A`` if the ACPI table has been overridden.
 
-10: 'W' if a warning has previously been issued by the kernel.
+10) ``W`` if a warning has previously been issued by the kernel.
 (Though some warnings may set more specific taint flags.)
 
-11: 'C' if a staging driver has been loaded.
+11) ``C`` if a staging driver has been loaded.
 
-12: 'I' if the kernel is working around a severe bug in the platform
+12) ``I`` if the kernel is working around a severe bug in the platform
 firmware (BIOS or similar).
 
-13: 'O' if an externally-built ("out-of-tree") module has been loaded.
+13) ``O`` if an externally-built ("out-of-tree") module has been loaded.
 
-14: 'E' if an unsigned module has been loaded in a kernel supporting
+14) ``E`` if an unsigned module has been loaded in a kernel supporting
 module signature.
 
-15: 'L' if a soft lockup has previously occurred on the system.
+15) ``L`` if a soft lockup has previously occurred on the system.
 
-16: 'K' if the kernel has been live patched.
+16) ``K`` if the kernel has been live patched.
 
-The primary reason for the 'Tainted: ' string is to tell kernel
+The primary reason for the **'Tainted: '** string is to tell kernel
 debuggers if this is a clean kernel or if anything unusual has
 occurred. Tainting is permanent: even if an offending module is
 unloaded, the tainted value remains to indicate that the kernel is not
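
Reviewer's sketch (not part of the patch): the position-sensitive taint characters listed in the hunk above can be decoded mechanically. This is a minimal illustration that assumes exactly the sixteen positions and letters given in the patched text. The names `TAINT_FLAGS` and `decode_taint` are hypothetical, nothing here exists in the kernel tree, and any character other than the listed letter (including `G` in position 1 and `' '` elsewhere) is treated as "flag not set".

```python
# Position -> (flag letter, meaning), in the order the patched text lists them.
# Assumption: a position holds either its letter or a "not set" character.
TAINT_FLAGS = [
    ("P", "a proprietary module has been loaded"),
    ("F", "a module was force loaded (insmod -f)"),
    ("S", "SMP kernel on hardware not certified as SMP-safe"),
    ("R", "a module was force unloaded (rmmod -f)"),
    ("M", "a Machine Check Exception was reported"),
    ("B", "bad page reference or unexpected page flags"),
    ("U", "taint requested by a user or application"),
    ("D", "the kernel died recently (OOPS or BUG)"),
    ("A", "the ACPI table has been overridden"),
    ("W", "a warning was previously issued"),
    ("C", "a staging driver has been loaded"),
    ("I", "working around a severe firmware bug"),
    ("O", "an out-of-tree module has been loaded"),
    ("E", "an unsigned module has been loaded"),
    ("L", "a soft lockup previously occurred"),
    ("K", "the kernel has been live patched"),
]


def decode_taint(flags):
    """Return the meanings of the flags set in the text after 'Tainted: '."""
    meanings = []
    # zip() stops at the shorter input, so short or overlong strings are safe.
    for char, (letter, meaning) in zip(flags, TAINT_FLAGS):
        if char == letter:
            meanings.append(meaning)
    return meanings


if __name__ == "__main__":
    # 'P' in position 1 and 'R' in position 4; positions 2 and 3 are blank.
    for line in decode_taint("P  R"):
        print(line)
```

Keeping the table in document order means a position's index is its only key, which mirrors how the text describes the string; a kernel that prints the flags differently would need a different table.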
