docs: sysctl: convert to ReST · tjh.dev/kernel@53b9537

+1 -1

Documentation/admin-guide/kernel-parameters.txt

···
3144 3144 numa_zonelist_order= [KNL, BOOT] Select zonelist order for NUMA.
3145 3145 'node', 'default' can be specified
3146 3146 This can be set from sysctl after boot.
3147 - See Documentation/sysctl/vm.txt for details.
3147 + See Documentation/sysctl/vm.rst for details.
3148 3148
3149 3149 ohci1394_dma=early [HW] enable debugging via the ohci1394 driver.
3150 3150 See Documentation/debugging-via-ohci1394.txt for more

+1 -1

Documentation/admin-guide/mm/index.rst

···
11 11 Linux memory management is a complex system with many configurable
12 12 settings. Most of these settings are available via ``/proc``
13 13 filesystem and can be quired and adjusted using ``sysctl``. These APIs
14 - are described in Documentation/sysctl/vm.txt and in `man 5 proc`_.
14 + are described in Documentation/sysctl/vm.rst and in `man 5 proc`_.
15 15
16 16 .. _man 5 proc: http://man7.org/linux/man-pages/man5/proc.5.html
17 17

+1 -1

Documentation/admin-guide/mm/ksm.rst

···
59 59
60 60 If a region of memory must be split into at least one new MADV_MERGEABLE
61 61 or MADV_UNMERGEABLE region, the madvise may return ENOMEM if the process
62 - will exceed ``vm.max_map_count`` (see Documentation/sysctl/vm.txt).
62 + will exceed ``vm.max_map_count`` (see Documentation/sysctl/vm.rst).
63 63
64 64 Like other madvise calls, they are intended for use on mapped areas of
65 65 the user address space: they will report ENOMEM if the specified range

+1 -1

Documentation/core-api/printk-formats.rst

···
119 119
120 120 For printing kernel pointers which should be hidden from unprivileged
121 121 users. The behaviour of %pK depends on the kptr_restrict sysctl - see
122 - Documentation/sysctl/kernel.txt for more details.
122 + Documentation/sysctl/kernel.rst for more details.
123 123
124 124 Unmodified Addresses
125 125 --------------------

+1 -1

Documentation/networking/ip-sysctl.txt

···
2287 2287
2288 2288
2289 2289 /proc/sys/net/core/*
2290 - Please see: Documentation/sysctl/net.txt for descriptions of these entries.
2290 + Please see: Documentation/sysctl/net.rst for descriptions of these entries.
2291 2291
2292 2292
2293 2293 /proc/sys/net/unix/*

+30 -6

Documentation/sysctl/README → Documentation/sysctl/index.rst

···
1 - Documentation for /proc/sys/ kernel version 2.2.10
2 - (c) 1998, 1999, Rik van Riel <riel@nl.linux.org>
1 + :orphan:
2 +
3 + ===========================
4 + Documentation for /proc/sys
5 + ===========================
6 +
7 + Copyright (c) 1998, 1999, Rik van Riel <riel@nl.linux.org>
8 +
9 + ------------------------------------------------------------------------------
3 10
4 11 'Why', I hear you ask, 'would anyone even _want_ documentation
5 12 for them sysctl files? If anybody really needs it, it's all in
···
19 12 Furthermore, the programmers who built sysctl have built it to
20 13 be actually used, not just for the fun of programming it :-)
21 14
22 - ==============================================================
15 + ------------------------------------------------------------------------------
23 16
24 17 Legal blurb:
25 18
26 19 As usual, there are two main things to consider:
20 +
27 21 1. you get what you pay for
28 22 2. it's free
29 23
···
43 35
44 36 Rik van Riel.
45 37
46 - ==============================================================
38 + --------------------------------------------------------------
47 39
48 - Introduction:
40 + Introduction
41 + ============
49 42
50 43 Sysctl is a means of configuring certain aspects of the kernel
51 44 at run-time, and the /proc/sys/ directory is there so that you
52 45 don't even need special tools to do it!
53 46 In fact, there are only four things needed to use these config
54 47 facilities:
48 +
55 49 - a running Linux system
56 50 - root access
57 51 - common sense (this is especially hard to come by these days)
···
64 54 one part of the kernel, so you can do configuration on a piece
65 55 by piece basis, or just some 'thematic frobbing'.
66 56
67 - The subdirs are about:
57 + This documentation is about:
58 +
59 + =============== ===============================================================
68 60 abi/ execution domains & personalities
69 61 debug/ <empty>
70 62 dev/ device specific information (eg dev/cdrom/info)
···
82 70 vm/ memory management tuning
83 71 buffer and cache management
84 72 user/ Per user per user namespace limits
73 + =============== ===============================================================
85 74
86 75 These are the subdirs I have on my system. There might be more
87 76 or other subdirs in another setup. If you see another dir, I'd
88 77 really like to hear about it :-)
78 +
79 + .. toctree::
80 + :maxdepth: 1
81 +
82 + abi
83 + fs
84 + kernel
85 + net
86 + sunrpc
87 + user
88 + vm

+67

Documentation/sysctl/abi.rst

···
1 + ================================
2 + Documentation for /proc/sys/abi/
3 + ================================
4 +
5 + kernel version 2.6.0.test2
6 +
7 + Copyright (c) 2003, Fabian Frederick <ffrederick@users.sourceforge.net>
8 +
9 + For general info: index.rst.
10 +
11 + ------------------------------------------------------------------------------
12 +
13 + This path is binary emulation relevant aka personality types aka abi.
14 + When a process is executed, it's linked to an exec_domain whose
15 + personality is defined using values available from /proc/sys/abi.
16 + You can find further details about abi in include/linux/personality.h.
17 +
18 + Here are the files featuring in 2.6 kernel:
19 +
20 + - defhandler_coff
21 + - defhandler_elf
22 + - defhandler_lcall7
23 + - defhandler_libcso
24 + - fake_utsname
25 + - trace
26 +
27 + defhandler_coff
28 + ---------------
29 +
30 + defined value:
31 + PER_SCOSVR3::
32 +
33 + 0x0003 | STICKY_TIMEOUTS | WHOLE_SECONDS | SHORT_INODE
34 +
35 + defhandler_elf
36 + --------------
37 +
38 + defined value:
39 + PER_LINUX::
40 +
41 + 0
42 +
43 + defhandler_lcall7
44 + -----------------
45 +
46 + defined value :
47 + PER_SVR4::
48 +
49 + 0x0001 | STICKY_TIMEOUTS | MMAP_PAGE_ZERO,
50 +
51 + defhandler_libsco
52 + -----------------
53 +
54 + defined value:
55 + PER_SVR4::
56 +
57 + 0x0001 | STICKY_TIMEOUTS | MMAP_PAGE_ZERO,
58 +
59 + fake_utsname
60 + ------------
61 +
62 + Unused
63 +
64 + trace
65 + -----
66 +
67 + Unused
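The "defined value" expressions in abi.rst above are bitwise ORs of a base personality number and flag bits. As a rough illustration only, the sketch below evaluates them numerically. The flag constants are an assumption: they are not part of this commit and are quoted from include/uapi/linux/personality.h as commonly shipped, so check them against your own tree. The `describe` helper is hypothetical.

```python
# Flag bits ASSUMED from include/uapi/linux/personality.h;
# they do not appear anywhere in this commit.
MMAP_PAGE_ZERO = 0x0100000
SHORT_INODE = 0x1000000
WHOLE_SECONDS = 0x2000000
STICKY_TIMEOUTS = 0x4000000

# The three "defined value" expressions quoted in abi.rst.
PERSONALITIES = {
    "PER_LINUX": 0,
    "PER_SVR4": 0x0001 | STICKY_TIMEOUTS | MMAP_PAGE_ZERO,
    "PER_SCOSVR3": 0x0003 | STICKY_TIMEOUTS | WHOLE_SECONDS | SHORT_INODE,
}


def describe(name):
    """Return 'NAME = 0x........' for one of the personalities above (hypothetical helper)."""
    return f"{name} = {PERSONALITIES[name]:#010x}"


if __name__ == "__main__":
    for name in PERSONALITIES:
        print(describe(name))
```

The low byte carries the personality number and the high bits carry the behaviour flags, which is why two handlers can share `PER_SVR4` while differing from `PER_SCOSVR3` only in base number and flags.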

-54

Documentation/sysctl/abi.txt

···
1 - Documentation for /proc/sys/abi/* kernel version 2.6.0.test2
2 - (c) 2003, Fabian Frederick <ffrederick@users.sourceforge.net>
3 -
4 - For general info : README.
5 -
6 - ==============================================================
7 -
8 - This path is binary emulation relevant aka personality types aka abi.
9 - When a process is executed, it's linked to an exec_domain whose
10 - personality is defined using values available from /proc/sys/abi.
11 - You can find further details about abi in include/linux/personality.h.
12 -
13 - Here are the files featuring in 2.6 kernel :
14 -
15 - - defhandler_coff
16 - - defhandler_elf
17 - - defhandler_lcall7
18 - - defhandler_libcso
19 - - fake_utsname
20 - - trace
21 -
22 - ===========================================================
23 - defhandler_coff:
24 - defined value :
25 - PER_SCOSVR3
26 - 0x0003 | STICKY_TIMEOUTS | WHOLE_SECONDS | SHORT_INODE
27 -
28 - ===========================================================
29 - defhandler_elf:
30 - defined value :
31 - PER_LINUX
32 - 0
33 -
34 - ===========================================================
35 - defhandler_lcall7:
36 - defined value :
37 - PER_SVR4
38 - 0x0001 | STICKY_TIMEOUTS | MMAP_PAGE_ZERO,
39 -
40 - ===========================================================
41 - defhandler_libsco:
42 - defined value:
43 - PER_SVR4
44 - 0x0001 | STICKY_TIMEOUTS | MMAP_PAGE_ZERO,
45 -
46 - ===========================================================
47 - fake_utsname:
48 - Unused
49 -
50 - ===========================================================
51 - trace:
52 - Unused
53 -
54 - ===========================================================

+76 -66

Documentation/sysctl/fs.txt → Documentation/sysctl/fs.rst

···
1 - Documentation for /proc/sys/fs/* kernel version 2.2.10
2 - (c) 1998, 1999, Rik van Riel <riel@nl.linux.org>
3 - (c) 2009, Shen Feng<shen@cn.fujitsu.com>
1 + ===============================
2 + Documentation for /proc/sys/fs/
3 + ===============================
4 4
5 - For general info and legal blurb, please look in README.
5 + kernel version 2.2.10
6 6
7 - ==============================================================
7 + Copyright (c) 1998, 1999, Rik van Riel <riel@nl.linux.org>
8 +
9 + Copyright (c) 2009, Shen Feng<shen@cn.fujitsu.com>
10 +
11 + For general info and legal blurb, please look in intro.rst.
12 +
13 + ------------------------------------------------------------------------------
8 14
9 15 This file contains documentation for the sysctl files in
10 16 /proc/sys/fs/ and is valid for Linux kernel version 2.2.
···
22 16 before actually making adjustments.
23 17
24 18 1. /proc/sys/fs
25 - ----------------------------------------------------------
19 + ===============
26 20
27 21 Currently, these files are in /proc/sys/fs:
22 +
28 23 - aio-max-nr
29 24 - aio-nr
30 25 - dentry-state
···
49 42 - super-max
50 43 - super-nr
51 44
52 - ==============================================================
53 45
54 - aio-nr & aio-max-nr:
46 + aio-nr & aio-max-nr
47 + -------------------
55 48
56 49 aio-nr is the running total of the number of events specified on the
57 50 io_setup system call for all currently active aio contexts. If aio-nr
···
59 52 raising aio-max-nr does not result in the pre-allocation or re-sizing
60 53 of any kernel data structures.
61 54
62 - ==============================================================
63 55
64 - dentry-state:
56 + dentry-state
57 + ------------
65 58
66 - From linux/include/linux/dcache.h:
67 - --------------------------------------------------------------
68 - struct dentry_stat_t dentry_stat {
59 + From linux/include/linux/dcache.h::
60 +
61 + struct dentry_stat_t dentry_stat {
69 62 int nr_dentry;
70 63 int nr_unused;
71 64 int age_limit; /* age in seconds */
72 65 int want_pages; /* pages requested by system */
73 66 int nr_negative; /* # of unused negative dentries */
74 67 int dummy; /* Reserved for future use */
75 - };
76 - --------------------------------------------------------------
68 + };
77 69
78 70 Dentries are dynamically allocated and deallocated.
79 71
···
90 84 they help speeding up rejection of non-existing files provided
91 85 by the users.
92 86
93 - ==============================================================
94 87
95 - dquot-max & dquot-nr:
88 + dquot-max & dquot-nr
89 + --------------------
96 90
97 91 The file dquot-max shows the maximum number of cached disk
98 92 quota entries.
···
104 98 you have some awesome number of simultaneous system users,
105 99 you might want to raise the limit.
106 100
107 - ==============================================================
108 101
109 - file-max & file-nr:
102 + file-max & file-nr
103 + ------------------
110 104
111 105 The value in file-max denotes the maximum number of file-
112 106 handles that the Linux kernel will allocate. When you get lots
···
125 119 Attempts to allocate more file descriptors than file-max are
126 120 reported with printk, look for "VFS: file-max limit <number>
127 121 reached".
128 - ==============================================================
129 122
130 - nr_open:
123 +
124 + nr_open
125 + -------
131 126
132 127 This denotes the maximum number of file-handles a process can
133 128 allocate. Default value is 1024*1024 (1048576) which should be
134 129 enough for most machines. Actual limit depends on RLIMIT_NOFILE
135 130 resource limit.
136 131
137 - ==============================================================
138 132
139 - inode-max, inode-nr & inode-state:
133 + inode-max, inode-nr & inode-state
134 + ---------------------------------
140 135
141 136 As with file handles, the kernel allocates the inode structures
142 137 dynamically, but can't free them yet.
···
164 157 system needs to prune the inode list instead of allocating
165 158 more.
166 159
167 - ==============================================================
168 160
169 - overflowgid & overflowuid:
161 + overflowgid & overflowuid
162 + -------------------------
170 163
171 164 Some filesystems only support 16-bit UIDs and GIDs, although in Linux
172 165 UIDs and GIDs are 32 bits. When one of these filesystems is mounted
···
176 169 These sysctls allow you to change the value of the fixed UID and GID.
177 170 The default is 65534.
178 171
179 - ==============================================================
180 172
181 - pipe-user-pages-hard:
173 + pipe-user-pages-hard
174 + --------------------
182 175
183 176 Maximum total number of pages a non-privileged user may allocate for pipes.
184 177 Once this limit is reached, no new pipes may be allocated until usage goes
185 178 below the limit again. When set to 0, no limit is applied, which is the default
186 179 setting.
187 180
188 - ==============================================================
189 181
190 - pipe-user-pages-soft:
182 + pipe-user-pages-soft
183 + --------------------
191 184
192 185 Maximum total number of pages a non-privileged user may allocate for pipes
193 186 before the pipe size gets limited to a single page. Once this limit is reached,
···
197 190 allocate up to 1024 pipes at their default size. When set to 0, no limit is
198 191 applied.
199 192
200 - ==============================================================
201 193
202 - protected_fifos:
194 + protected_fifos
195 + ---------------
203 196
204 197 The intent of this protection is to avoid unintentional writes to
205 198 an attacker-controlled FIFO, where a program expected to create a regular
···
215 208
216 209 This protection is based on the restrictions in Openwall.
217 210
218 - ==============================================================
219 211
220 - protected_hardlinks:
212 + protected_hardlinks
213 + --------------------
221 214
222 215 A long-standing class of security issues is the hardlink-based
223 216 time-of-check-time-of-use race, most commonly seen in world-writable
···
235 228
236 229 This protection is based on the restrictions in Openwall and grsecurity.
237 230
238 - ==============================================================
239 231
240 - protected_regular:
232 + protected_regular
233 + -----------------
241 234
242 235 This protection is similar to protected_fifos, but it
243 236 avoids writes to an attacker-controlled regular file, where a program
···
251 244
252 245 When set to "2" it also applies to group writable sticky directories.
253 246
254 - ==============================================================
255 247
256 - protected_symlinks:
248 + protected_symlinks
249 + ------------------
257 250
258 251 A long-standing class of security issues is the symlink-based
259 252 time-of-check-time-of-use race, most commonly seen in world-writable
···
271 264
272 265 This protection is based on the restrictions in Openwall and grsecurity.
273 266
274 - ==============================================================
275 267
276 268 suid_dumpable:
269 + --------------
277 270
278 271 This value can be used to query and set the core dump mode for setuid
279 272 or otherwise protected/tainted binaries. The modes are
280 273
281 - 0 - (default) - traditional behaviour. Any process which has changed
282 - privilege levels or is execute only will not be dumped.
283 - 1 - (debug) - all processes dump core when possible. The core dump is
284 - owned by the current user and no security is applied. This is
285 - intended for system debugging situations only. Ptrace is unchecked.
286 - This is insecure as it allows regular users to examine the memory
287 - contents of privileged processes.
288 - 2 - (suidsafe) - any binary which normally would not be dumped is dumped
289 - anyway, but only if the "core_pattern" kernel sysctl is set to
290 - either a pipe handler or a fully qualified path. (For more details
291 - on this limitation, see CVE-2006-2451.) This mode is appropriate
292 - when administrators are attempting to debug problems in a normal
293 - environment, and either have a core dump pipe handler that knows
294 - to treat privileged core dumps with care, or specific directory
295 - defined for catching core dumps. If a core dump happens without
296 - a pipe handler or fully qualifid path, a message will be emitted
297 - to syslog warning about the lack of a correct setting.
274 + = ========== ===============================================================
275 + 0 (default) traditional behaviour. Any process which has changed
276 + privilege levels or is execute only will not be dumped.
277 + 1 (debug) all processes dump core when possible. The core dump is
278 + owned by the current user and no security is applied. This is
279 + intended for system debugging situations only.
280 + Ptrace is unchecked.
281 + This is insecure as it allows regular users to examine the
282 + memory contents of privileged processes.
283 + 2 (suidsafe) any binary which normally would not be dumped is dumped
284 + anyway, but only if the "core_pattern" kernel sysctl is set to
285 + either a pipe handler or a fully qualified path. (For more
286 + details on this limitation, see CVE-2006-2451.) This mode is
287 + appropriate when administrators are attempting to debug
288 + problems in a normal environment, and either have a core dump
289 + pipe handler that knows to treat privileged core dumps with
290 + care, or specific directory defined for catching core dumps.
291 + If a core dump happens without a pipe handler or fully
292 + qualified path, a message will be emitted to syslog warning
293 + about the lack of a correct setting.
294 + = ========== ===============================================================
298 295
299 - ==============================================================
300 296
301 - super-max & super-nr:
297 + super-max & super-nr
298 + --------------------
302 299
303 300 These numbers control the maximum number of superblocks, and
304 301 thus the maximum number of mounted filesystems the kernel
···
310 299 mount more filesystems than the current value in super-max
311 300 allows you to.
312 301
313 - ==============================================================
314 302
315 - aio-nr & aio-max-nr:
303 + aio-nr & aio-max-nr
304 + -------------------
316 305
317 306 aio-nr shows the current system-wide number of asynchronous io
318 307 requests. aio-max-nr allows you to change the maximum value
319 308 aio-nr can grow to.
320 309
321 - ==============================================================
322 310
323 - mount-max:
311 + mount-max
312 + ---------
324 313
325 314 This denotes the maximum number of mounts that may exist
326 315 in a mount namespace.
327 316
328 - ==============================================================
329 317
330 318
331 319 2. /proc/sys/fs/binfmt_misc
332 - ----------------------------------------------------------
320 + ===========================
333 321
334 322 Documentation for the files in /proc/sys/fs/binfmt_misc is
335 323 in Documentation/admin-guide/binfmt-misc.rst.
336 324
337 325
338 326 3. /proc/sys/fs/mqueue - POSIX message queues filesystem
339 - ----------------------------------------------------------
327 + ========================================================
328 +
340 329
341 330 The "mqueue" filesystem provides the necessary kernel features to enable the
342 331 creation of a user space library that implements the POSIX message queues
···
367 356 exceed msgsize_max, the default value is initialized msgsize_max.
368 357
369 358 4. /proc/sys/fs/epoll - Configuration options for the epoll interface
370 - --------------------------------------------------------
359 + =====================================================================
371 360
372 361 This directory contains configuration options for the epoll(7) interface.
373 362
···
382 371 on a 64bit one.
383 372 The current default value for max_user_watches is the 1/32 of the available
384 373 low memory, divided for the "watch" cost in bytes.
385 -
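The dentry-state section in the fs.rst hunk above quotes a six-field `struct dentry_stat_t`. As a minimal sketch, assuming the file exposes those six integers in struct order separated by whitespace, it could be read from userspace as below. The names `DentryState`, `parse_dentry_state` and `read_dentry_state` are hypothetical helpers, not kernel or library APIs.

```python
from collections import namedtuple

# Field names copied from the struct quoted in fs.rst; the mapping of
# file columns to these names is an assumption of this sketch.
DentryState = namedtuple(
    "DentryState",
    "nr_dentry nr_unused age_limit want_pages nr_negative dummy",
)


def parse_dentry_state(text):
    """Parse the whitespace-separated contents of dentry-state."""
    fields = [int(tok) for tok in text.split()]
    if len(fields) != len(DentryState._fields):
        raise ValueError(
            f"expected {len(DentryState._fields)} fields, got {len(fields)}"
        )
    return DentryState(*fields)


def read_dentry_state(path="/proc/sys/fs/dentry-state"):
    """Read and parse dentry-state from a running Linux system."""
    with open(path) as fh:
        return parse_dentry_state(fh.read())


if __name__ == "__main__":
    # Made-up sample values, used so the sketch also runs off Linux.
    print(parse_dentry_state("81876\t60588\t45\t0\t1200\t0\n"))
```

Keeping the parser separate from the file read makes it possible to exercise the field mapping on a captured string without touching /proc.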

+212 -164

Documentation/sysctl/kernel.txt → Documentation/sysctl/kernel.rst

···
1 - Documentation for /proc/sys/kernel/* kernel version 2.2.10
2 - (c) 1998, 1999, Rik van Riel <riel@nl.linux.org>
3 - (c) 2009, Shen Feng<shen@cn.fujitsu.com>
1 + ===================================
2 + Documentation for /proc/sys/kernel/
3 + ===================================
4 4
5 - For general info and legal blurb, please look in README.
5 + kernel version 2.2.10
6 6
7 - ==============================================================
7 + Copyright (c) 1998, 1999, Rik van Riel <riel@nl.linux.org>
8 +
9 + Copyright (c) 2009, Shen Feng<shen@cn.fujitsu.com>
10 +
11 + For general info and legal blurb, please look in index.rst.
12 +
13 + ------------------------------------------------------------------------------
8 14
9 15 This file contains documentation for the sysctl files in
10 16 /proc/sys/kernel/ and is valid for Linux kernel version 2.2.
···
107 101 - watchdog_thresh
108 102 - version
109 103
110 - ==============================================================
111 104
112 105 acct:
106 + =====
113 107
114 108 highwater lowwater frequency
115 109
···
124 118 if we got >=4%; consider information about amount of free space
125 119 valid for 30 seconds.
126 120
127 - ==============================================================
128 121
129 122 acpi_video_flags:
123 + =================
130 124
131 125 flags
132 126
133 127 See Doc*/kernel/power/video.txt, it allows mode of video boot to be
134 128 set during run time.
135 129
136 - ==============================================================
137 130
138 131 auto_msgmni:
132 + ============
139 133
140 134 This variable has no effect and may be removed in future kernel
141 135 releases. Reading it always returns 0.
···
145 139 Echoing "0" turned it off. auto_msgmni default value was 1.
146 140
147 141
148 - ==============================================================
149 -
150 142 bootloader_type:
143 + ================
151 144
152 145 x86 bootloader identification
153 146
···
161 156 See the type_of_loader and ext_loader_type fields in
162 157 Documentation/x86/boot.rst for additional information.
163 158
164 - ==============================================================
165 159
166 160 bootloader_version:
161 + ===================
167 162
168 163 x86 bootloader version
169 164
···
173 168 See the type_of_loader and ext_loader_ver fields in
174 169 Documentation/x86/boot.rst for additional information.
175 170
176 - ==============================================================
177 171
178 - cap_last_cap
172 + cap_last_cap:
173 + =============
179 174
180 175 Highest valid capability of the running kernel. Exports
181 176 CAP_LAST_CAP from the kernel.
182 177
183 - ==============================================================
184 178
185 179 core_pattern:
180 + =============
186 181
187 182 core_pattern is used to specify a core dumpfile pattern name.
188 - . max length 127 characters; default value is "core"
189 - . core_pattern is used as a pattern template for the output filename;
183 +
184 + * max length 127 characters; default value is "core"
185 + * core_pattern is used as a pattern template for the output filename;
190 186 certain string patterns (beginning with '%') are substituted with
191 187 their actual values.
192 - . backward compatibility with core_uses_pid:
188 + * backward compatibility with core_uses_pid:
189 +
193 190 If core_pattern does not include "%p" (default does not)
194 191 and core_uses_pid is set, then .PID will be appended to
195 192 the filename.
196 - . corename format specifiers:
193 +
194 + * corename format specifiers::
195 +
197 196 %<NUL> '%' is dropped
198 197 %% output one '%'
199 198 %p pid
···
214 205 %e executable filename (may be shortened)
215 206 %E executable path
216 207 %<OTHER> both are dropped
217 - . If the first character of the pattern is a '|', the kernel will treat
208 +
209 + * If the first character of the pattern is a '|', the kernel will treat
218 210 the rest of the pattern as a command to run. The core dump will be
219 211 written to the standard input of that program instead of to a file.
220 212
221 - ==============================================================
222 213
223 214 core_pipe_limit:
215 + ================
224 216
225 217 This sysctl is only applicable when core_pattern is configured to pipe
226 218 core files to a user space helper (when the first character of
···
242 232 process is not guaranteed access to /proc/<crashing pid>/). This
243 233 value defaults to 0.
244 234
245 - ==============================================================
246 235
247 236 core_uses_pid:
237 + ==============
248 238
249 239 The default coredump filename is "core". By setting
250 240 core_uses_pid to 1, the coredump filename becomes core.PID.
···
252 242 and core_uses_pid is set, then .PID will be appended to
253 243 the filename.
254 244
255 - ==============================================================
256 245
257 246 ctrl-alt-del:
247 + =============
258 248
259 249 When the value in this file is 0, ctrl-alt-del is trapped and
260 250 sent to the init(1) program to handle a graceful restart.
···
262 252 Nerve Pinch (tm) will be an immediate reboot, without even
263 253 syncing its dirty buffers.
264 254
265 - Note: when a program (like dosemu) has the keyboard in 'raw'
266 - mode, the ctrl-alt-del is intercepted by the program before it
267 - ever reaches the kernel tty layer, and it's up to the program
268 - to decide what to do with it.
255 + Note:
256 + when a program (like dosemu) has the keyboard in 'raw'
257 + mode, the ctrl-alt-del is intercepted by the program before it
258 + ever reaches the kernel tty layer, and it's up to the program
259 + to decide what to do with it.
269 260
270 - ==============================================================
271 261
272 262 dmesg_restrict:
263 + ===============
273 264
274 265 This toggle indicates whether unprivileged users are prevented
275 266 from using dmesg(8) to view messages from the kernel's log buffer.
···
281 270 The kernel config option CONFIG_SECURITY_DMESG_RESTRICT sets the
282 271 default value of dmesg_restrict.
283 272
284 - ==============================================================
285 273
286 274 domainname & hostname:
275 + ======================
287 276
288 277 These files can be used to set the NIS/YP domainname and the
289 278 hostname of your box in exactly the same way as the commands
290 - domainname and hostname, i.e.:
291 - # echo "darkstar" > /proc/sys/kernel/hostname
292 - # echo "mydomain" > /proc/sys/kernel/domainname
293 - has the same effect as
294 - # hostname "darkstar"
295 - # domainname "mydomain"
279 + domainname and hostname, i.e.::
280 +
281 + # echo "darkstar" > /proc/sys/kernel/hostname
282 + # echo "mydomain" > /proc/sys/kernel/domainname
283 +
284 + has the same effect as::
285 +
286 + # hostname "darkstar"
287 + # domainname "mydomain"
296 288
297 289 Note, however, that the classic darkstar.frop.org has the
298 290 hostname "darkstar" and DNS (Internet Domain Name Server)
···
304 290 domain names are in general different. For a detailed discussion
305 291 see the hostname(1) man page.
306 292
307 - ==============================================================
293 +
308 294 hardlockup_all_cpu_backtrace:
295 + =============================
309 296
310 297 This value controls the hard lockup detector behavior when a hard
311 298 lockup condition is detected as to whether or not to gather further
···
316 301 0: do nothing. This is the default behavior.
317 302
318 303 1: on detection capture more debug information.
319 - ==============================================================
304 +
320 305
321 306 hardlockup_panic:
307 + =================
322 308
323 309 This parameter can be used to control whether the kernel panics
324 310 when a hard lockup is detected.
···
330 314 See Documentation/lockup-watchdogs.txt for more information. This can
331 315 also be set using the nmi_watchdog kernel parameter.
332 316
333 - ==============================================================
334 317
335 318 hotplug:
319 + ========
336 320
337 321 Path for the hotplug policy agent.
338 322 Default value is "/sbin/hotplug".
339 323
340 - ==============================================================
341 324
342 325 hung_task_panic:
326 + ================
343 327
344 328 Controls the kernel's behavior when a hung task is detected.
345 329 This file shows up if CONFIG_DETECT_HUNG_TASK is enabled.
···
348 332
349 333 1: panic immediately.
350 334
351 - ==============================================================
352 335
353 336 hung_task_check_count:
337 + ======================
354 338
355 339 The upper bound on the number of tasks that are checked.
356 340 This file shows up if CONFIG_DETECT_HUNG_TASK is enabled.
357 341
358 - ==============================================================
359 342
360 343 hung_task_timeout_secs:
344 + =======================
361 345
362 346 When a task in D state did not get scheduled
363 347 for more than this value report a warning.
364 348 This file shows up if CONFIG_DETECT_HUNG_TASK is enabled.
365 349
366 350 0: means infinite timeout - no checking done.
351 +
367 352 Possible values to set are in range {0..LONG_MAX/HZ}.
368 353
369 - ==============================================================
370 354
371 355 hung_task_check_interval_secs:
356 + ==============================
372 357
373 358 Hung task check interval. If hung task checking is enabled
374 359 (see hung_task_timeout_secs), the check is done every
···
379 362 0 (default): means use hung_task_timeout_secs as checking interval.
380 363 Possible values to set are in range {0..LONG_MAX/HZ}.
381 364
382 - ==============================================================
383 365
384 366 hung_task_warnings:
367 + ===================
385 368
386 369 The maximum number of warnings to report. During a check interval
387 370 if a hung task is detected, this value is decreased by 1.
···
390 373
391 374 -1: report an infinite number of warnings.
392 375
393 - ==============================================================
394 376
395 377 hyperv_record_panic_msg:
378 + ========================
396 379
397 380 Controls whether the panic kmsg data should be reported to Hyper-V.
398 381
···
400 383
401 384 1: report the panic kmsg data. This is the default behavior.
402 385
403 - ==============================================================
404 386
405 387 kexec_load_disabled:
388 + ====================
406 389
407 390 A toggle indicating if the kexec_load syscall has been disabled. This
408 391 value defaults to 0 (false: kexec_load enabled), but can be set to 1
···
412 395 later use) an image without it being altered. Generally used together
413 396 with the "modules_disabled" sysctl.
414 397
415 - ==============================================================
416 398
417 399 kptr_restrict:
400 + ==============
418 401
419 402 This toggle indicates whether restrictions are placed on
420 403 exposing kernel addresses via /proc and other interfaces.
···
437 420 When kptr_restrict is set to (2), kernel pointers printed using
438 421 %pK will be replaced with 0's regardless of privileges.
439 422
440 - ==============================================================
441 423
442 424 l2cr: (PPC only)
425 + ================
443 426
444 427 This flag controls the L2 cache of G3 processor boards. If
445 428 0, the cache is disabled. Enabled if nonzero.
446 429
447 - ==============================================================
448 430
449 431 modules_disabled:
432 + =================
450 433
451 434 A toggle value indicating if modules are allowed to be loaded
452 435 in an otherwise modular kernel. This toggle defaults to off
···
454 437 neither loaded nor unloaded, and the toggle cannot be set back
455 438 to false. Generally used with the "kexec_load_disabled" toggle.
456 439
457 - ==============================================================
458 440
459 441 msg_next_id, sem_next_id, and shm_next_id:
442 + ==========================================
460 443
461 444 These three toggles allows to specify desired id for next allocated IPC
462 445 object: message, semaphore or shared memory respectively.
···
465 448 Possible values to set are in range {0..INT_MAX}.
466 449
467 450 Notes:
468 - 1) kernel doesn't guarantee, that new object will have desired id. So,
469 - it's up to userspace, how to handle an object with "wrong" id.
470 - 2) Toggle with non-default value will be set back to -1 by kernel after
471 - successful IPC object allocation. If an IPC object allocation syscall
472 - fails, it is undefined if the value remains unmodified or is reset to -1.
451 + 1) kernel doesn't guarantee, that new object will have desired id. So,
452 + it's up to userspace, how to handle an object with "wrong" id.
453 + 2) Toggle with non-default value will be set back to -1 by kernel after
454 + successful IPC object allocation. If an IPC object allocation syscall
455 + fails, it is undefined if the value remains unmodified or is reset to -1.
473 456
474 - ==============================================================
475 457
476 458 nmi_watchdog:
459 + =============
477 460
478 461 This parameter can be used to control the NMI watchdog
479 462 (i.e. the hard lockup detector) on x86 systems.
480 463
481 - 0 - disable the hard lockup detector
482 - 1 - enable the hard lockup detector
464 + 0 - disable the hard lockup detector
465 +
466 + 1 - enable the hard lockup detector
483 467
484 468 The hard lockup detector monitors each CPU for its ability to respond to
485 469 timer interrupts. The mechanism utilizes CPU performance counter registers
···
488 470 while a CPU is busy. Hence, the alternative name 'NMI watchdog'.
489 471
490 472 The NMI watchdog is disabled by default if the kernel is running as a guest
491 - in a KVM virtual machine. This default can be overridden by adding
473 + in a KVM virtual machine. This default can be overridden by adding::
492 474
493 475 nmi_watchdog=1
494 476
495 477 to the guest kernel command line (see Documentation/admin-guide/kernel-parameters.rst).
496 478
497 - ==============================================================
498 479
499 - numa_balancing
480 + numa_balancing:
481 + ===============
500 482
501 483 Enables/disables automatic page fault based NUMA memory
502 484 balancing. Memory is moved automatically to nodes
···
518 500 numa_balancing_scan_delay_ms, numa_balancing_scan_period_max_ms,
519 501 numa_balancing_scan_size_mb, and numa_balancing_settle_count sysctls.
520 502
521 - ==============================================================
503 + numa_balancing_scan_period_min_ms, numa_balancing_scan_delay_ms, numa_balancing_scan_period_max_ms, numa_balancing_scan_size_mb
504 + ===============================================================================================================================
522 505
523 - numa_balancing_scan_period_min_ms, numa_balancing_scan_delay_ms,
524 - numa_balancing_scan_period_max_ms, numa_balancing_scan_size_mb
525 506
526 507 Automatic NUMA balancing scans tasks address space and unmaps pages to
527 508 detect if pages are properly placed or if the data should be migrated to a
···
556 539 numa_balancing_scan_size_mb is how many megabytes worth of pages are
557 540 scanned for a given scan.
558 541
559 - ==============================================================
560 542
561 543 osrelease, ostype & version:
544 + ============================
562 545
563 - # cat osrelease
564 - 2.1.88
565 - # cat ostype
566 - Linux
567 - # cat version
568 - #5 Wed Feb 25 21:49:24 MET 1998
546 + ::
547 +
548 + # cat osrelease
549 + 2.1.88
550 + # cat ostype
551 + Linux
552 + # cat version
553 + #5 Wed Feb 25 21:49:24 MET 1998
569 554
570 555 The files osrelease and ostype should be clear enough. Version
571 556 needs a little more clarification however. The '#5' means that
···
575 556 date behind it indicates the time the kernel was built.
576 557 The only way to tune these values is to rebuild the kernel :-)
577 558
578 - ==============================================================
579 559
580 560 overflowgid & overflowuid:
561 + ==========================
581 562
582 563 if your architecture did not always support 32-bit UIDs (i.e. arm,
583 564 i386, m68k, sh, and sparc32), a fixed UID and GID will be returned to
···
587 568 These sysctls allow you to change the value of the fixed UID and GID.
588 569 The default is 65534.
589 570 590 - ============================================================== 591 571 592 572 panic: 573 + ====== 593 574 594 575 The value in this file represents the number of seconds the kernel 595 576 waits before rebooting on a panic. When you use the software watchdog, 596 577 the recommended setting is 60. 597 578 598 - ============================================================== 599 579 600 580 panic_on_io_nmi: 581 + ================ 601 582 602 583 Controls the kernel's behavior when a CPU receives an NMI caused by 603 584 an IO error. ··· 610 591 servers issue this sort of NMI when the dump button is pushed, 611 592 and you can use this option to take a crash dump. 612 593 613 - ============================================================== 614 594 615 595 panic_on_oops: 596 + ============== 616 597 617 598 Controls the kernel's behaviour when an oops or BUG is encountered. 618 599 619 600 0: try to continue operation 620 601 621 - 1: panic immediately. If the `panic' sysctl is also non-zero then the 602 + 1: panic immediately. If the `panic` sysctl is also non-zero then the 622 603 machine will be rebooted. 623 604 624 - ============================================================== 625 605 626 606 panic_on_stackoverflow: 607 + ======================= 627 608 628 609 Controls the kernel's behavior when detecting the overflows of 629 610 kernel, IRQ and exception stacks except a user stack. ··· 633 614 634 615 1: panic immediately. 635 616 636 - ============================================================== 637 617 638 618 panic_on_unrecovered_nmi: 619 + ========================= 639 620 640 621 The default Linux behaviour on an NMI of either memory or unknown is 641 622 to continue operation. For many environments such as scientific ··· 646 627 such as power management so the default is off. That sysctl works like 647 628 the existing panic controls already in that directory. 
648 629 649 - ============================================================== 650 630 651 631 panic_on_warn: 632 + ============== 652 633 653 634 Calls panic() in the WARN() path when set to 1. This is useful to avoid 654 635 a kernel rebuild when attempting to kdump at the location of a WARN(). ··· 657 638 658 639 1: call panic() after printing out WARN() location. 659 640 660 - ============================================================== 661 641 662 642 panic_print: 643 + ============ 663 644 664 645 Bitmask for printing system info when panic happens. User can chose 665 646 combination of the following bits: 666 647 667 - bit 0: print all tasks info 668 - bit 1: print system memory info 669 - bit 2: print timer info 670 - bit 3: print locks info if CONFIG_LOCKDEP is on 671 - bit 4: print ftrace buffer 648 + ===== ======================================== 649 + bit 0 print all tasks info 650 + bit 1 print system memory info 651 + bit 2 print timer info 652 + bit 3 print locks info if CONFIG_LOCKDEP is on 653 + bit 4 print ftrace buffer 654 + ===== ======================================== 672 655 673 - So for example to print tasks and memory info on panic, user can: 656 + So for example to print tasks and memory info on panic, user can:: 657 + 674 658 echo 3 > /proc/sys/kernel/panic_print 675 659 676 - ============================================================== 677 660 678 661 panic_on_rcu_stall: 662 + =================== 679 663 680 664 When set to 1, calls panic() after RCU stall detection messages. This 681 665 is useful to define the root cause of RCU stalls using a vmcore. ··· 687 665 688 666 1: panic() after printing RCU stall messages. 689 667 690 - ============================================================== 691 668 692 669 perf_cpu_time_max_percent: 670 + ========================== 693 671 694 672 Hints to the kernel how much CPU time it should be allowed to 695 673 use to handle perf sampling events. 
If the perf subsystem ··· 702 680 stacked up next to each other so much that nothing else is 703 681 allowed to execute. 704 682 705 - 0: disable the mechanism. Do not monitor or correct perf's 683 + 0: 684 + disable the mechanism. Do not monitor or correct perf's 706 685 sampling rate no matter how CPU time it takes. 707 686 708 - 1-100: attempt to throttle perf's sample rate to this 687 + 1-100: 688 + attempt to throttle perf's sample rate to this 709 689 percentage of CPU. Note: the kernel calculates an 710 690 "expected" length of each sample event. 100 here means 711 691 100% of that expected length. Even if this is set to ··· 715 691 length is exceeded. Set to 0 if you truly do not care 716 692 how much CPU is consumed. 717 693 718 - ============================================================== 719 694 720 695 perf_event_paranoid: 696 + ==================== 721 697 722 698 Controls use of the performance events system by unprivileged 723 699 users (without CAP_SYS_ADMIN). The default value is 2. 
724 700
725 - -1: Allow use of (almost) all events by all users
726 - Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
727 - >=0: Disallow ftrace function tracepoint by users without CAP_SYS_ADMIN
728 - Disallow raw tracepoint access by users without CAP_SYS_ADMIN
729 - >=1: Disallow CPU event access by users without CAP_SYS_ADMIN
730 - >=2: Disallow kernel profiling by users without CAP_SYS_ADMIN
701 + === ==================================================================
702 + -1  Allow use of (almost) all events by all users
731 703
732 - ==============================================================
704 +     Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
705 +
706 + >=0 Disallow ftrace function tracepoint by users without CAP_SYS_ADMIN
707 +
708 +     Disallow raw tracepoint access by users without CAP_SYS_ADMIN
709 +
710 + >=1 Disallow CPU event access by users without CAP_SYS_ADMIN
711 +
712 + >=2 Disallow kernel profiling by users without CAP_SYS_ADMIN
713 + === ==================================================================
714 +
733 715
734 716 perf_event_max_stack:
717 + =====================
735 718
736 719 Controls maximum number of stack frames to copy for (attr.sample_type &
737 720 PERF_SAMPLE_CALLCHAIN) configured events, for instance, when using
···
749 718
750 719 The default value is 127.
751 720
752 - ==============================================================
753 721
754 722 perf_event_mlock_kb:
723 + ====================
755 724
756 725 Control size of per-cpu ring buffer not counted agains mlock limit.
757 726
758 727 The default value is 512 + 1 page
759 728
760 - ==============================================================
761 729
762 730 perf_event_max_contexts_per_stack:
731 + ==================================
763 732
764 733 Controls maximum number of stack frame context entries for
765 734 (attr.sample_type & PERF_SAMPLE_CALLCHAIN) configured events, for
···
770 739
771 740 The default value is 8.
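The perf_event_paranoid levels in this hunk are cumulative: each ``>=N`` row applies to every setting at or above N. The sketch below shows one way to read the table programmatically. The names ``RESTRICTIONS`` and ``disallowed_without_cap_sys_admin`` are made up for illustration, and the mlock note attached to the -1 row is not modelled.

```python
# Illustrative only: thresholds and wording copied from the
# perf_event_paranoid table; the names here are hypothetical.
RESTRICTIONS = [
    (0, "ftrace function tracepoint"),
    (0, "raw tracepoint access"),
    (1, "CPU event access"),
    (2, "kernel profiling"),
]


def disallowed_without_cap_sys_admin(level):
    """List what users without CAP_SYS_ADMIN cannot do at this level."""
    # A ">=N" row applies whenever the configured level is at least N.
    return [what for threshold, what in RESTRICTIONS if level >= threshold]
```

With the documented default of 2, all four restrictions apply; with -1, none of them do.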
772 741 773 - ============================================================== 774 742 775 743 pid_max: 744 + ======== 776 745 777 746 PID allocation wrap value. When the kernel's next PID value 778 747 reaches this value, it wraps back to a minimum PID value. 779 748 PIDs of value pid_max or larger are not allocated. 780 749 781 - ============================================================== 782 750 783 751 ns_last_pid: 752 + ============ 784 753 785 754 The last pid allocated in the current (the one task using this sysctl 786 755 lives in) pid namespace. When selecting a pid for a next task on fork 787 756 kernel tries to allocate a number starting from this one. 788 757 789 - ============================================================== 790 758 791 759 powersave-nap: (PPC only) 760 + ========================= 792 761 793 762 If set, Linux-PPC will use the 'nap' mode of powersaving, 794 763 otherwise the 'doze' mode will be used. ··· 796 765 ============================================================== 797 766 798 767 printk: 768 + ======= 799 769 800 770 The four values in printk denote: console_loglevel, 801 771 default_message_loglevel, minimum_console_loglevel and ··· 806 774 logging error messages. See 'man 2 syslog' for more info on 807 775 the different loglevels. 
808 776 809 - - console_loglevel: messages with a higher priority than 810 - this will be printed to the console 811 - - default_message_loglevel: messages without an explicit priority 812 - will be printed with this priority 813 - - minimum_console_loglevel: minimum (highest) value to which 814 - console_loglevel can be set 815 - - default_console_loglevel: default value for console_loglevel 777 + - console_loglevel: 778 + messages with a higher priority than 779 + this will be printed to the console 780 + - default_message_loglevel: 781 + messages without an explicit priority 782 + will be printed with this priority 783 + - minimum_console_loglevel: 784 + minimum (highest) value to which 785 + console_loglevel can be set 786 + - default_console_loglevel: 787 + default value for console_loglevel 816 788 817 - ============================================================== 818 789 819 790 printk_delay: 791 + ============= 820 792 821 793 Delay each printk message in printk_delay milliseconds 822 794 823 795 Value from 0 - 10000 is allowed. 824 796 825 - ============================================================== 826 797 827 798 printk_ratelimit: 799 + ================= 828 800 829 801 Some warning messages are rate limited. printk_ratelimit specifies 830 802 the minimum length of time between these messages (in jiffies), by ··· 836 800 837 801 A value of 0 will disable rate limiting. 838 802 839 - ============================================================== 840 803 841 804 printk_ratelimit_burst: 805 + ======================= 842 806 843 807 While long term we enforce one message per printk_ratelimit 844 808 seconds, we do allow a burst of messages to pass through. 845 809 printk_ratelimit_burst specifies the number of messages we can 846 810 send before ratelimiting kicks in. 
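The four printk values listed earlier in this hunk can be read as one whitespace-separated record. The following is a minimal sketch of parsing them, assuming the file holds exactly four integers in the documented order. ``PrintkLevels``, ``parse_printk`` and ``shown_on_console`` are hypothetical names, and the comparison assumes the syslog convention that a lower number means a higher priority.

```python
from collections import namedtuple

# Field order follows the documentation text: console_loglevel,
# default_message_loglevel, minimum_console_loglevel,
# default_console_loglevel. The type name is illustrative.
PrintkLevels = namedtuple(
    "PrintkLevels",
    [
        "console_loglevel",
        "default_message_loglevel",
        "minimum_console_loglevel",
        "default_console_loglevel",
    ],
)


def parse_printk(text):
    """Parse the contents of /proc/sys/kernel/printk into named fields."""
    fields = text.split()
    if len(fields) != 4:
        raise ValueError("expected four values, got %d" % len(fields))
    return PrintkLevels(*(int(field) for field in fields))


def shown_on_console(levels, message_level):
    """True if a message at message_level would reach the console.

    Assumes lower numbers mean higher priority, so "higher priority than
    console_loglevel" becomes a numeric less-than comparison.
    """
    return message_level < levels.console_loglevel
```

On a Linux system the input would come from reading ``/proc/sys/kernel/printk``; the parser itself only needs the string.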
847 811 848 - ============================================================== 849 812 850 813 printk_devkmsg: 814 + =============== 851 815 852 816 Control the logging to /dev/kmsg from userspace: 853 817 854 - ratelimit: default, ratelimited 818 + ratelimit: 819 + default, ratelimited 820 + 855 821 on: unlimited logging to /dev/kmsg from userspace 822 + 856 823 off: logging to /dev/kmsg disabled 857 824 858 825 The kernel command line parameter printk.devkmsg= overrides this and is 859 826 a one-time setting until next reboot: once set, it cannot be changed by 860 827 this sysctl interface anymore. 861 828 862 - ============================================================== 863 829 864 830 randomize_va_space: 831 + =================== 865 832 866 833 This option can be used to select the type of process address 867 834 space randomization that is used in the system, for architectures 868 835 that support this feature. 869 836 870 - 0 - Turn the process address space randomization off. This is the 837 + == =========================================================================== 838 + 0 Turn the process address space randomization off. This is the 871 839 default for architectures that do not support this feature anyways, 872 840 and kernels that are booted with the "norandmaps" parameter. 873 841 874 - 1 - Make the addresses of mmap base, stack and VDSO page randomized. 842 + 1 Make the addresses of mmap base, stack and VDSO page randomized. 875 843 This, among other things, implies that shared libraries will be 876 844 loaded to random addresses. Also for PIE-linked binaries, the 877 845 location of code start is randomized. This is the default if the 878 846 CONFIG_COMPAT_BRK option is enabled. 879 847 880 - 2 - Additionally enable heap randomization. This is the default if 848 + 2 Additionally enable heap randomization. This is the default if 881 849 CONFIG_COMPAT_BRK is disabled. 
882 850 883 851 There are a few legacy applications out there (such as some ancient ··· 894 854 Systems with ancient and/or broken binaries should be configured 895 855 with CONFIG_COMPAT_BRK enabled, which excludes the heap from process 896 856 address space randomization. 857 + == =========================================================================== 897 858 898 - ============================================================== 899 859 900 860 reboot-cmd: (Sparc only) 861 + ======================== 901 862 902 863 ??? This seems to be a way to give an argument to the Sparc 903 864 ROM/Flash boot loader. Maybe to tell it what to do after 904 865 rebooting. ??? 905 866 906 - ============================================================== 907 867 908 868 rtsig-max & rtsig-nr: 869 + ===================== 909 870 910 871 The file rtsig-max can be used to tune the maximum number 911 872 of POSIX realtime (queued) signals that can be outstanding ··· 914 873 915 874 rtsig-nr shows the number of RT signals currently queued. 916 875 917 - ============================================================== 918 876 919 877 sched_energy_aware: 878 + =================== 920 879 921 880 Enables/disables Energy Aware Scheduling (EAS). EAS starts 922 881 automatically on platforms where it can run (that is, ··· 925 884 requirements for EAS but you do not want to use it, change 926 885 this value to 0. 927 886 928 - ============================================================== 929 887 930 888 sched_schedstats: 889 + ================= 931 890 932 891 Enables/disables scheduler statistics. Enabling this feature 933 892 incurs a small amount of overhead in the scheduler but is 934 893 useful for debugging and performance tuning. 935 894 936 - ============================================================== 937 895 938 896 sg-big-buff: 897 + ============ 939 898 940 899 This file shows the size of the generic SCSI (sg) buffer. 
941 900 You can't tune it just yet, but you could change it on ··· 946 905 you can come up with one, you probably know what you 947 906 are doing anyway :) 948 907 949 - ============================================================== 950 908 951 909 shmall: 910 + ======= 952 911 953 912 This parameter sets the total amount of shared memory pages that 954 913 can be used system wide. Hence, SHMALL should always be at least ··· 957 916 If you are not sure what the default PAGE_SIZE is on your Linux 958 917 system, you can run the following command: 959 918 960 - # getconf PAGE_SIZE 919 + # getconf PAGE_SIZE 961 920 962 - ============================================================== 963 921 964 922 shmmax: 923 + ======= 965 924 966 925 This value can be used to query and set the run time limit 967 926 on the maximum shared memory segment size that can be created. 968 927 Shared memory segments up to 1Gb are now supported in the 969 928 kernel. This value defaults to SHMMAX. 970 929 971 - ============================================================== 972 930 973 931 shm_rmid_forced: 932 + ================ 974 933 975 934 Linux lets you set resource limits, including how much memory one 976 935 process can consume, via setrlimit(2). Unfortunately, shared memory ··· 989 948 Note that if you change this from 0 to 1, already created segments 990 949 without users and with a dead originative process will be destroyed. 991 950 992 - ============================================================== 993 951 994 952 sysctl_writes_strict: 953 + ===================== 995 954 996 955 Control how file position affects the behavior of updating sysctl values 997 956 via the /proc/sys interface: 998 957 999 - -1 - Legacy per-write sysctl value handling, with no printk warnings. 958 + == ====================================================================== 959 + -1 Legacy per-write sysctl value handling, with no printk warnings. 
1000 960 Each write syscall must fully contain the sysctl value to be 1001 961 written, and multiple writes on the same sysctl file descriptor 1002 962 will rewrite the sysctl value, regardless of file position. 1003 - 0 - Same behavior as above, but warn about processes that perform writes 963 + 0 Same behavior as above, but warn about processes that perform writes 1004 964 to a sysctl file descriptor when the file position is not 0. 1005 - 1 - (default) Respect file position when writing sysctl strings. Multiple 965 + 1 (default) Respect file position when writing sysctl strings. Multiple 1006 966 writes will append to the sysctl value buffer. Anything past the max 1007 967 length of the sysctl value buffer will be ignored. Writes to numeric 1008 968 sysctl entries must always be at file position 0 and the value must 1009 969 be fully contained in the buffer sent in the write syscall. 970 + == ====================================================================== 1010 971 1011 - ============================================================== 1012 972 1013 973 softlockup_all_cpu_backtrace: 974 + ============================= 1014 975 1015 976 This value controls the soft lockup detector thread's behavior 1016 977 when a soft lockup condition is detected as to whether or not ··· 1026 983 1027 984 1: on detection capture more debug information. 1028 985 1029 - ============================================================== 1030 986 1031 - soft_watchdog 987 + soft_watchdog: 988 + ============== 1032 989 1033 990 This parameter can be used to control the soft lockup detector. 1034 991 1035 992 0 - disable the soft lockup detector 993 + 1036 994 1 - enable the soft lockup detector 1037 995 1038 996 The soft lockup detector monitors CPUs for threads that are hogging the CPUs ··· 1043 999 the watchdog timer function, otherwise the NMI watchdog - if enabled - can 1044 1000 detect a hard lockup condition. 
1045 1001
1046 - ==============================================================
1047 1002
1048 - stack_erasing
1003 + stack_erasing:
1004 + ==============
1049 1005
1050 1006 This parameter can be used to control kernel stack erasing at the end
1051 1007 of syscalls for kernels built with CONFIG_GCC_PLUGIN_STACKLEAK.
···
1059 1015
1060 1016 1: kernel stack erasing is enabled (default), it is performed before
1061 1017 returning to the userspace at the end of syscalls.
1062 - ==============================================================
1018 +
1063 1019
1064 1020 tainted
1021 + =======
1065 1022
1066 1023 Non-zero if the kernel has been tainted. Numeric values, which can be
1067 1024 ORed together. The letters are seen in "Tainted" line of Oops reports.
1068 1025
1069 -      1 (P): proprietary module was loaded
1070 -      2 (F): module was force loaded
1071 -      4 (S): SMP kernel oops on an officially SMP incapable processor
1072 -      8 (R): module was force unloaded
1073 -     16 (M): processor reported a Machine Check Exception (MCE)
1074 -     32 (B): bad page referenced or some unexpected page flags
1075 -     64 (U): taint requested by userspace application
1076 -    128 (D): kernel died recently, i.e. there was an OOPS or BUG
1077 -    256 (A): an ACPI table was overridden by user
1078 -    512 (W): kernel issued warning
1079 -   1024 (C): staging driver was loaded
1080 -   2048 (I): workaround for bug in platform firmware applied
1081 -   4096 (O): externally-built ("out-of-tree") module was loaded
1082 -   8192 (E): unsigned module was loaded
1083 -  16384 (L): soft lockup occurred
1084 -  32768 (K): kernel has been live patched
1085 -  65536 (X): Auxiliary taint, defined and used by for distros
1086 - 131072 (T): The kernel was built with the struct randomization plugin
1026 + ====== ===== ==============================================================
1027 +      1 `(P)` proprietary module was loaded
1028 +      2 `(F)` module was force loaded
1029 +      4 `(S)` SMP kernel oops on an officially SMP incapable processor
1030 +      8 `(R)` module was force unloaded
1031 +     16 `(M)` processor reported a Machine Check Exception (MCE)
1032 +     32 `(B)` bad page referenced or some unexpected page flags
1033 +     64 `(U)` taint requested by userspace application
1034 +    128 `(D)` kernel died recently, i.e. there was an OOPS or BUG
1035 +    256 `(A)` an ACPI table was overridden by user
1036 +    512 `(W)` kernel issued warning
1037 +   1024 `(C)` staging driver was loaded
1038 +   2048 `(I)` workaround for bug in platform firmware applied
1039 +   4096 `(O)` externally-built ("out-of-tree") module was loaded
1040 +   8192 `(E)` unsigned module was loaded
1041 +  16384 `(L)` soft lockup occurred
1042 +  32768 `(K)` kernel has been live patched
1043 +  65536 `(X)` Auxiliary taint, defined and used by for distros
1044 + 131072 `(T)` The kernel was built with the struct randomization plugin
1045 + ====== ===== ==============================================================
1087 1046
1088 1047 See Documentation/admin-guide/tainted-kernels.rst for more information.
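Because the tainted value is a set of ORed bits, it can be decoded back into the letters from the table. The sketch below does that. ``TAINT_FLAGS`` copies the bit values and letters from the table with lightly paraphrased descriptions, and the function names are illustrative rather than part of any kernel tool.

```python
# Bit values and letters as listed in the tainted table; descriptions are
# paraphrased. The names in this sketch are hypothetical.
TAINT_FLAGS = [
    (1, "P", "proprietary module was loaded"),
    (2, "F", "module was force loaded"),
    (4, "S", "SMP kernel oops on an officially SMP incapable processor"),
    (8, "R", "module was force unloaded"),
    (16, "M", "processor reported a Machine Check Exception (MCE)"),
    (32, "B", "bad page referenced or some unexpected page flags"),
    (64, "U", "taint requested by userspace application"),
    (128, "D", "kernel died recently (OOPS or BUG)"),
    (256, "A", "an ACPI table was overridden by user"),
    (512, "W", "kernel issued warning"),
    (1024, "C", "staging driver was loaded"),
    (2048, "I", "workaround for bug in platform firmware applied"),
    (4096, "O", "externally-built (out-of-tree) module was loaded"),
    (8192, "E", "unsigned module was loaded"),
    (16384, "L", "soft lockup occurred"),
    (32768, "K", "kernel has been live patched"),
    (65536, "X", "auxiliary taint, defined and used by distros"),
    (131072, "T", "kernel built with the struct randomization plugin"),
]


def decode_taint(value):
    """Return (letter, description) for every taint bit set in value."""
    return [(letter, desc) for bit, letter, desc in TAINT_FLAGS if value & bit]


def read_taint(path="/proc/sys/kernel/tainted"):
    """Read the current taint value; only meaningful on a Linux system."""
    with open(path) as handle:
        return int(handle.read().strip())
```

For example, a value of 4097 (4096 + 1) decodes to the ``P`` and ``O`` flags, and 0 decodes to an empty list.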
1089 1048 1090 - ============================================================== 1091 1049 1092 - threads-max 1050 + threads-max: 1051 + ============ 1093 1052 1094 1053 This value controls the maximum number of threads that can be created 1095 1054 using fork(). ··· 1102 1055 a part (1/8th) of the available RAM pages. 1103 1056 1104 1057 The minimum value that can be written to threads-max is 20. 1058 + 1105 1059 The maximum value that can be written to threads-max is given by the 1106 1060 constant FUTEX_TID_MASK (0x3fffffff). 1061 + 1107 1062 If a value outside of this range is written to threads-max an error 1108 1063 EINVAL occurs. 1109 1064 ··· 1113 1064 thread structures would occupy too much (more than 1/8th) of the 1114 1065 available RAM pages threads-max is reduced accordingly. 1115 1066 1116 - ============================================================== 1117 1067 1118 1068 unknown_nmi_panic: 1069 + ================== 1119 1070 1120 1071 The value in this file affects behavior of handling NMI. When the 1121 1072 value is non-zero, unknown NMI is trapped and then panic occurs. At ··· 1124 1075 NMI switch that most IA32 servers have fires unknown NMI up, for 1125 1076 example. If a system hangs up, try pressing the NMI switch. 1126 1077 1127 - ============================================================== 1128 1078 1129 1079 watchdog: 1080 + ========= 1130 1081 1131 1082 This parameter can be used to disable or enable the soft lockup detector 1132 1083 _and_ the NMI watchdog (i.e. the hard lockup detector) at the same time. 1133 1084 1134 1085 0 - disable both lockup detectors 1086 + 1135 1087 1 - enable both lockup detectors 1136 1088 1137 1089 The soft lockup detector and the NMI watchdog can also be disabled or 1138 1090 enabled individually, using the soft_watchdog and nmi_watchdog parameters. 
1139 - If the watchdog parameter is read, for example by executing 1091 + If the watchdog parameter is read, for example by executing:: 1140 1092 1141 1093 cat /proc/sys/kernel/watchdog 1142 1094 1143 1095 the output of this command (0 or 1) shows the logical OR of soft_watchdog 1144 1096 and nmi_watchdog. 1145 1097 1146 - ============================================================== 1147 1098 1148 1099 watchdog_cpumask: 1100 + ================= 1149 1101 1150 1102 This value can be used to control on which cpus the watchdog may run. 1151 1103 The default cpumask is all possible cores, but if NO_HZ_FULL is ··· 1161 1111 1162 1112 The argument value is the standard cpulist format for cpumasks, 1163 1113 so for example to enable the watchdog on cores 0, 2, 3, and 4 you 1164 - might say: 1114 + might say:: 1165 1115 1166 1116 echo 0,2-4 > /proc/sys/kernel/watchdog_cpumask 1167 1117 1168 - ============================================================== 1169 1118 1170 1119 watchdog_thresh: 1120 + ================ 1171 1121 1172 1122 This value can be used to control the frequency of hrtimer and NMI 1173 1123 events and the soft and hard lockup thresholds. The default threshold ··· 1175 1125 1176 1126 The softlockup threshold is (2 * watchdog_thresh). Setting this 1177 1127 tunable to zero will disable lockup detection altogether. 1178 - 1179 - ==============================================================
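The watchdog_cpumask example above writes ``0,2-4`` to select cores 0, 2, 3 and 4. The sketch below expands such a string into explicit CPU numbers. It handles only single CPUs and simple ``first-last`` ranges, which is all the example needs; ``parse_cpulist`` is a hypothetical helper, not a kernel interface.

```python
def parse_cpulist(text):
    """Expand a cpulist string such as "0,2-4" into sorted CPU numbers.

    Only comma-separated single CPUs and simple first-last ranges are
    handled; any richer syntax is outside the scope of this sketch.
    """
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate an empty string or a stray comma
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)
```

Passing the string from the documentation example, ``parse_cpulist("0,2-4")``, yields ``[0, 2, 3, 4]``.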

+91 -52

Documentation/sysctl/net.txt Documentation/sysctl/net.rst

··· 1 - Documentation for /proc/sys/net/* 2 - (c) 1999 Terrehon Bowden <terrehon@pacbell.net> 3 - Bodo Bauer <bb@ricochet.net> 4 - (c) 2000 Jorge Nerin <comandante@zaralinux.com> 5 - (c) 2009 Shen Feng <shen@cn.fujitsu.com> 1 + ================================ 2 + Documentation for /proc/sys/net/ 3 + ================================ 6 4 7 - For general info and legal blurb, please look in README. 5 + Copyright 8 6 9 - ============================================================== 7 + Copyright (c) 1999 8 + 9 + - Terrehon Bowden <terrehon@pacbell.net> 10 + - Bodo Bauer <bb@ricochet.net> 11 + 12 + Copyright (c) 2000 13 + 14 + - Jorge Nerin <comandante@zaralinux.com> 15 + 16 + Copyright (c) 2009 17 + 18 + - Shen Feng <shen@cn.fujitsu.com> 19 + 20 + For general info and legal blurb, please look in index.rst. 21 + 22 + ------------------------------------------------------------------------------ 10 23 11 24 This file contains the documentation for the sysctl files in 12 25 /proc/sys/net ··· 30 17 31 18 32 19 Table : Subdirectories in /proc/sys/net 33 - .............................................................................. 34 - Directory Content Directory Content 35 - core General parameter appletalk Appletalk protocol 36 - unix Unix domain sockets netrom NET/ROM 37 - 802 E802 protocol ax25 AX25 38 - ethernet Ethernet protocol rose X.25 PLP layer 39 - ipv4 IP version 4 x25 X.25 protocol 40 - ipx IPX token-ring IBM token ring 41 - bridge Bridging decnet DEC net 42 - ipv6 IP version 6 tipc TIPC 43 - .............................................................................. 
+
+========= =================== = ========== ==================
+Directory Content               Directory  Content
+========= =================== = ========== ==================
+core      General parameter     appletalk  Appletalk protocol
+unix      Unix domain sockets   netrom     NET/ROM
+802       E802 protocol         ax25       AX25
+ethernet  Ethernet protocol     rose       X.25 PLP layer
+ipv4      IP version 4          x25        X.25 protocol
+ipx       IPX                   token-ring IBM token ring
+bridge    Bridging              decnet     DEC net
+ipv6      IP version 6          tipc       TIPC
+========= =================== = ========== ==================

 1. /proc/sys/net/core - Network core options
--------------------------------------------------------
+============================================

 bpf_jit_enable
 --------------
···
 through bpf(2) and passing a verifier in the kernel, a JIT will then
 translate these BPF proglets into native CPU instructions. There are
 two flavors of JITs, the newer eBPF JIT currently supported on:
+
   - x86_64
   - x86_32
   - arm64
···
   - riscv

 And the older cBPF JIT supported on the following archs:
+
   - mips
   - ppc
   - sparc
···
 tcpdump filters, seccomp rules, etc, but not mentioned eBPF
 programs loaded through bpf(2).

-Values :
-    0 - disable the JIT (default value)
-    1 - enable the JIT
-    2 - enable the JIT and ask the compiler to emit traces on kernel log.
+Values:
+
+    - 0 - disable the JIT (default value)
+    - 1 - enable the JIT
+    - 2 - enable the JIT and ask the compiler to emit traces on kernel log.

 bpf_jit_harden
 --------------
···
 This enables hardening for the BPF JIT compiler. Supported are eBPF
 JIT backends. Enabling hardening trades off performance, but can
 mitigate JIT spraying.
-Values :
-    0 - disable JIT hardening (default value)
-    1 - enable JIT hardening for unprivileged users only
-    2 - enable JIT hardening for all users
+
+Values:
+
+    - 0 - disable JIT hardening (default value)
+    - 1 - enable JIT hardening for unprivileged users only
+    - 2 - enable JIT hardening for all users

 bpf_jit_kallsyms
 ----------------
···
 in /proc/kallsyms. This enables export of these addresses, which can
 be used for debugging/tracing. If bpf_jit_harden is enabled, this
 feature is disabled.
+
 Values :
-    0 - disable JIT kallsyms export (default value)
-    1 - enable JIT kallsyms export for privileged users only
+
+    - 0 - disable JIT kallsyms export (default value)
+    - 1 - enable JIT kallsyms export for privileged users only

 bpf_jit_limit
 -------------
···
 in bytes.

 dev_weight
---------------
+----------

 The maximum number of packets that kernel can handle on a NAPI interrupt,
 it's a Per-CPU variable. For drivers that support LRO or GRO_HW, a hardware
···
 Default: 64

 dev_weight_rx_bias
---------------
+------------------

 RPS (e.g. RFS, aRFS) processing is competing with the registered NAPI poll function
 of the driver for the per softirq cycle netdev_budget. This parameter influences
···
 dev_weight adaptable for asymmetric CPU needs on RX/TX side of the network stack.
 (see dev_weight_tx_bias) It is effective on a per CPU basis. Determination is based
 on dev_weight and is calculated multiplicative (dev_weight * dev_weight_rx_bias).
+
 Default: 1

 dev_weight_tx_bias
---------------
+------------------

 Scales the maximum number of packets that can be processed during a TX softirq cycle.
 Effective on a per CPU basis. Allows scaling of current dev_weight for asymmetric
 net stack processing needs. Be careful to avoid making TX softirq processing a CPU hog.
+
 Calculation is based on dev_weight (dev_weight * dev_weight_tx_bias).
+
 Default: 1

 default_qdisc
---------------
+-------------

 The default queuing discipline to use for network devices. This allows
 overriding the default of pfifo_fast with an alternative. Since the default
···
 interfaces still use mq as root qdisc, which in turn uses this default for its
 leaves. Virtual devices (like e.g. lo or veth) ignore this setting and instead
 default to noqueue.
+
 Default: pfifo_fast

 busy_read
-----------------
+---------
+
 Low latency busy poll timeout for socket reads. (needs CONFIG_NET_RX_BUSY_POLL)
 Approximate time in us to busy loop waiting for packets on the device queue.
 This sets the default value of the SO_BUSY_POLL socket option.
 Can be set or overridden per socket by setting socket option SO_BUSY_POLL,
 which is the preferred method of enabling. If you need to enable the feature
 globally via sysctl, a value of 50 is recommended.
+
 Will increase power usage.
+
 Default: 0 (off)

 busy_poll
···
 Note that only sockets with SO_BUSY_POLL set will be busy polled,
 so you want to either selectively set SO_BUSY_POLL on those sockets or set
 sysctl.net.busy_read globally.
+
 Will increase power usage.
+
 Default: 0 (off)

 rmem_default
···
 Allow processes to receive tx timestamps looped together with the original
 packet contents. If disabled, transmit timestamp requests from unprivileged
 processes are dropped unless socket option SOF_TIMESTAMPING_OPT_TSONLY is set.
+
 Default: 1 (on)


···
 Some user space might need to gather its content even if drivers do not
 provide ethtool -x support yet.

-myhost:~# cat /proc/sys/net/core/netdev_rss_key
-84:50:f4:00:a8:15:d1:a7:e9:7f:1d:60:35:c7:47:25:42:97:74:ca:56:bb:b6:a1:d8: ... (52 bytes total)
+::
+
+    myhost:~# cat /proc/sys/net/core/netdev_rss_key
+    84:50:f4:00:a8:15:d1:a7:e9:7f:1d:60:35:c7:47:25:42:97:74:ca:56:bb:b6:a1:d8: ... (52 bytes total)

 File contains nul bytes if no driver ever called netdev_rss_key_fill() function.
-Note:
-/proc/sys/net/core/netdev_rss_key contains 52 bytes of key,
-but most drivers only use 40 bytes of it.

-myhost:~# ethtool -x eth0
-RX flow hash indirection table for eth0 with 8 RX ring(s):
-    0:    0     1     2     3     4     5     6     7
-RSS hash key:
-84:50:f4:00:a8:15:d1:a7:e9:7f:1d:60:35:c7:47:25:42:97:74:ca:56:bb:b6:a1:d8:43:e3:c9:0c:fd:17:55:c2:3a:4d:69:ed:f1:42:89
+Note:
+  /proc/sys/net/core/netdev_rss_key contains 52 bytes of key,
+  but most drivers only use 40 bytes of it.
+
+::
+
+    myhost:~# ethtool -x eth0
+    RX flow hash indirection table for eth0 with 8 RX ring(s):
+        0:    0     1     2     3     4     5     6     7
+    RSS hash key:
+    84:50:f4:00:a8:15:d1:a7:e9:7f:1d:60:35:c7:47:25:42:97:74:ca:56:bb:b6:a1:d8:43:e3:c9:0c:fd:17:55:c2:3a:4d:69:ed:f1:42:89

 netdev_tstamp_prequeue
 ----------------------
···
 Default : 0 (for compatibility reasons)

 devconf_inherit_init_net
-----------------------------
+------------------------

 Controls if a new network namespace should inherit all current
 settings under /proc/sys/net/{ipv4,ipv6}/conf/{all,default}/. By
···
 Default : 0 (for compatibility reasons)

 2. /proc/sys/net/unix - Parameters for Unix domain sockets
--------------------------------------------------------
+----------------------------------------------------------

 There is only one file in this directory.
 unix_dgram_qlen limits the max number of datagrams queued in Unix domain
···


 3. /proc/sys/net/ipv4 - IPV4 settings
--------------------------------------------------------
+-------------------------------------
 Please see: Documentation/networking/ip-sysctl.txt and ipvs-sysctl.txt for
 descriptions of these entries.


 4. Appletalk
--------------------------------------------------------
+------------

 The /proc/sys/net/appletalk directory holds the Appletalk configuration data
 when Appletalk is loaded. The configurable parameters are:
···


 5. IPX
--------------------------------------------------------
+------

 The IPX protocol has no tunable values in proc/sys/net.

···
 address of the router (or Connected) for internal networks.

 6. TIPC
--------------------------------------------------------
+-------

 tipc_rmem
-----------
+---------

 The TIPC protocol now has a tunable for the receive memory, similar to the
 tcp_rmem - i.e. a vector of 3 INTEGERs: (min, default, max)
+
+::

     # cat /proc/sys/net/tipc/tipc_rmem
     4252725 34021800 68043600
···
 preserved in order to be consistent with things like tcp_rmem.

 named_timeout
---------------
+-------------

 TIPC name table updates are distributed asynchronously in a cluster, without
 any form of transaction handling. This means that different race scenarios are

+9 -4

Documentation/sysctl/sunrpc.txt → Documentation/sysctl/sunrpc.rst

···
-Documentation for /proc/sys/sunrpc/* kernel version 2.2.10
-    (c) 1998, 1999, Rik van Riel <riel@nl.linux.org>
+===================================
+Documentation for /proc/sys/sunrpc/
+===================================

-For general info and legal blurb, please look in README.
+kernel version 2.2.10

-==============================================================
+Copyright (c) 1998, 1999, Rik van Riel <riel@nl.linux.org>
+
+For general info and legal blurb, please look in index.rst.
+
+------------------------------------------------------------------------------

 This file contains the documentation for the sysctl files in
 /proc/sys/sunrpc and is valid for Linux kernel version 2.2.

+22 -10

Documentation/sysctl/user.txt → Documentation/sysctl/user.rst

···
-Documentation for /proc/sys/user/* kernel version 4.9.0
-    (c) 2016 Eric Biederman <ebiederm@xmission.com>
+=================================
+Documentation for /proc/sys/user/
+=================================

-==============================================================
+kernel version 4.9.0
+
+Copyright (c) 2016 Eric Biederman <ebiederm@xmission.com>
+
+------------------------------------------------------------------------------

 This file contains the documentation for the sysctl files in
 /proc/sys/user.
···

 Currently, these files are in /proc/sys/user:

-- max_cgroup_namespaces
+max_cgroup_namespaces
+=====================

   The maximum number of cgroup namespaces that any user in the current
   user namespace may create.

-- max_ipc_namespaces
+max_ipc_namespaces
+==================

   The maximum number of ipc namespaces that any user in the current
   user namespace may create.

-- max_mnt_namespaces
+max_mnt_namespaces
+==================

   The maximum number of mount namespaces that any user in the current
   user namespace may create.

-- max_net_namespaces
+max_net_namespaces
+==================

   The maximum number of network namespaces that any user in the
   current user namespace may create.

-- max_pid_namespaces
+max_pid_namespaces
+==================

   The maximum number of pid namespaces that any user in the current
   user namespace may create.

-- max_user_namespaces
+max_user_namespaces
+===================

   The maximum number of user namespaces that any user in the current
   user namespace may create.

-- max_uts_namespaces
+max_uts_namespaces
+==================

   The maximum number of user namespaces that any user in the current
   user namespace may create.
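As an illustration of the files listed in the user.rst hunk, the per-user namespace limits can be read back with a short script. This is only a sketch: the helper name `read_namespace_limits` is hypothetical, and it assumes each file holds a single integer, which is how sysctl integer files are normally exposed.

```python
from pathlib import Path

# File names are the ones documented in user.rst; everything else in this
# sketch (helper name, return shape) is an assumption made for illustration.
NAMESPACE_LIMITS = (
    "max_cgroup_namespaces",
    "max_ipc_namespaces",
    "max_mnt_namespaces",
    "max_net_namespaces",
    "max_pid_namespaces",
    "max_user_namespaces",
    "max_uts_namespaces",
)


def read_namespace_limits(base="/proc/sys/user"):
    """Return {file name: integer limit} for each limit file found under base."""
    limits = {}
    for name in NAMESPACE_LIMITS:
        path = Path(base) / name
        # Older kernels may lack some of these files, so skip missing ones.
        if path.is_file():
            limits[name] = int(path.read_text().strip())
    return limits


if __name__ == "__main__":
    for name, value in sorted(read_namespace_limits().items()):
        print(f"{name} = {value}")
```

Taking the directory as a parameter keeps the helper usable against a copy of the files, which is convenient on machines without a /proc/sys/user directory.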

+138 -120

Documentation/sysctl/vm.txt → Documentation/sysctl/vm.rst

··· 1 - Documentation for /proc/sys/vm/* kernel version 2.6.29 2 - (c) 1998, 1999, Rik van Riel <riel@nl.linux.org> 3 - (c) 2008 Peter W. Morreale <pmorreale@novell.com> 1 + =============================== 2 + Documentation for /proc/sys/vm/ 3 + =============================== 4 4 5 - For general info and legal blurb, please look in README. 5 + kernel version 2.6.29 6 6 7 - ============================================================== 7 + Copyright (c) 1998, 1999, Rik van Riel <riel@nl.linux.org> 8 + 9 + Copyright (c) 2008 Peter W. Morreale <pmorreale@novell.com> 10 + 11 + For general info and legal blurb, please look in index.rst. 12 + 13 + ------------------------------------------------------------------------------ 8 14 9 15 This file contains the documentation for the sysctl files in 10 16 /proc/sys/vm and is valid for Linux kernel version 2.6.29. ··· 74 68 - watermark_scale_factor 75 69 - zone_reclaim_mode 76 70 77 - ============================================================== 78 71 79 72 admin_reserve_kbytes 73 + ==================== 80 74 81 75 The amount of free memory in the system that should be reserved for users 82 76 with the capability cap_sys_admin. ··· 103 97 104 98 Changing this takes effect whenever an application requests memory. 105 99 106 - ============================================================== 107 100 108 101 block_dump 102 + ========== 109 103 110 104 block_dump enables block I/O debugging when set to a nonzero value. More 111 105 information on block I/O debugging is in Documentation/laptops/laptop-mode.rst. 112 106 113 - ============================================================== 114 107 115 108 compact_memory 109 + ============== 116 110 117 111 Available only when CONFIG_COMPACTION is set. When 1 is written to the file, 118 112 all zones are compacted such that free memory is available in contiguous 119 113 blocks where possible. 
This can be important for example in the allocation of 120 114 huge pages although processes will also directly compact memory as required. 121 115 122 - ============================================================== 123 116 124 117 compact_unevictable_allowed 118 + =========================== 125 119 126 120 Available only when CONFIG_COMPACTION is set. When set to 1, compaction is 127 121 allowed to examine the unevictable lru (mlocked pages) for pages to compact. ··· 129 123 acceptable trade for large contiguous free memory. Set to 0 to prevent 130 124 compaction from moving pages that are unevictable. Default value is 1. 131 125 132 - ============================================================== 133 126 134 127 dirty_background_bytes 128 + ====================== 135 129 136 130 Contains the amount of dirty memory at which the background kernel 137 131 flusher threads will start writeback. 138 132 139 - Note: dirty_background_bytes is the counterpart of dirty_background_ratio. Only 140 - one of them may be specified at a time. When one sysctl is written it is 141 - immediately taken into account to evaluate the dirty memory limits and the 142 - other appears as 0 when read. 133 + Note: 134 + dirty_background_bytes is the counterpart of dirty_background_ratio. Only 135 + one of them may be specified at a time. When one sysctl is written it is 136 + immediately taken into account to evaluate the dirty memory limits and the 137 + other appears as 0 when read. 143 138 144 - ============================================================== 145 139 146 140 dirty_background_ratio 141 + ====================== 147 142 148 143 Contains, as a percentage of total available memory that contains free pages 149 144 and reclaimable pages, the number of pages at which the background kernel ··· 152 145 153 146 The total available memory is not equal to total system memory. 
154 147 155 - ============================================================== 156 148 157 149 dirty_bytes 150 + =========== 158 151 159 152 Contains the amount of dirty memory at which a process generating disk writes 160 153 will itself start writeback. ··· 168 161 value lower than this limit will be ignored and the old configuration will be 169 162 retained. 170 163 171 - ============================================================== 172 164 173 165 dirty_expire_centisecs 166 + ====================== 174 167 175 168 This tunable is used to define when dirty data is old enough to be eligible 176 169 for writeout by the kernel flusher threads. It is expressed in 100'ths 177 170 of a second. Data which has been dirty in-memory for longer than this 178 171 interval will be written out next time a flusher thread wakes up. 179 172 180 - ============================================================== 181 173 182 174 dirty_ratio 175 + =========== 183 176 184 177 Contains, as a percentage of total available memory that contains free pages 185 178 and reclaimable pages, the number of pages at which a process which is ··· 187 180 188 181 The total available memory is not equal to total system memory. 189 182 190 - ============================================================== 191 183 192 184 dirtytime_expire_seconds 185 + ======================== 193 186 194 187 When a lazytime inode is constantly having its pages dirtied, the inode with 195 188 an updated timestamp will never get chance to be written out. And, if the ··· 199 192 inode is old enough to be eligible for writeback by the kernel flusher threads. 200 193 And, it is also used as the interval to wakeup dirtytime_writeback thread. 
201 194 202 - ============================================================== 203 195 204 196 dirty_writeback_centisecs 197 + ========================= 205 198 206 - The kernel flusher threads will periodically wake up and write `old' data 199 + The kernel flusher threads will periodically wake up and write `old` data 207 200 out to disk. This tunable expresses the interval between those wakeups, in 208 201 100'ths of a second. 209 202 210 203 Setting this to zero disables periodic writeback altogether. 211 204 212 - ============================================================== 213 205 214 206 drop_caches 207 + =========== 215 208 216 209 Writing to this will cause the kernel to drop clean caches, as well as 217 210 reclaimable slab objects like dentries and inodes. Once dropped, their 218 211 memory becomes free. 219 212 220 - To free pagecache: 213 + To free pagecache:: 214 + 221 215 echo 1 > /proc/sys/vm/drop_caches 222 - To free reclaimable slab objects (includes dentries and inodes): 216 + 217 + To free reclaimable slab objects (includes dentries and inodes):: 218 + 223 219 echo 2 > /proc/sys/vm/drop_caches 224 - To free slab objects and pagecache: 220 + 221 + To free slab objects and pagecache:: 222 + 225 223 echo 3 > /proc/sys/vm/drop_caches 226 224 227 225 This is a non-destructive operation and will not free any dirty objects. 228 226 To increase the number of objects freed by this operation, the user may run 229 - `sync' prior to writing to /proc/sys/vm/drop_caches. This will minimize the 227 + `sync` prior to writing to /proc/sys/vm/drop_caches. This will minimize the 230 228 number of dirty objects on the system and create more candidates to be 231 229 dropped. 232 230 ··· 245 233 use outside of a testing or debugging environment is not recommended. 246 234 247 235 You may see informational messages in your kernel log when this file is 248 - used: 236 + used:: 249 237 250 238 cat (1234): drop_caches: 3 251 239 252 240 These are informational only. 
They do not mean that anything is wrong 253 241 with your system. To disable them, echo 4 (bit 2) into drop_caches. 254 242 255 - ============================================================== 256 243 257 244 extfrag_threshold 245 + ================= 258 246 259 247 This parameter affects whether the kernel will compact memory or direct 260 248 reclaim to satisfy a high-order allocation. The extfrag/extfrag_index file in ··· 266 254 The kernel will not compact memory in a zone if the 267 255 fragmentation index is <= extfrag_threshold. The default value is 500. 268 256 269 - ============================================================== 270 257 271 258 highmem_is_dirtyable 259 + ==================== 272 260 273 261 Available only for systems with CONFIG_HIGHMEM enabled (32b systems). 274 262 ··· 286 274 only use the low memory and they can fill it up with dirty data without 287 275 any throttling. 288 276 289 - ============================================================== 290 277 291 278 hugetlb_shm_group 279 + ================= 292 280 293 281 hugetlb_shm_group contains group id that is allowed to create SysV 294 282 shared memory segment using hugetlb page. 295 283 296 - ============================================================== 297 284 298 285 laptop_mode 286 + =========== 299 287 300 288 laptop_mode is a knob that controls "laptop mode". All the things that are 301 289 controlled by this knob are discussed in Documentation/laptops/laptop-mode.rst. 302 290 303 - ============================================================== 304 291 305 292 legacy_va_layout 293 + ================ 306 294 307 295 If non-zero, this sysctl disables the new 32-bit mmap layout - the kernel 308 296 will use the legacy (2.4) layout for all processes. 
309 297 310 - ============================================================== 311 298 312 299 lowmem_reserve_ratio 300 + ==================== 313 301 314 302 For some specialised workloads on highmem machines it is dangerous for 315 303 the kernel to allow process memory to be allocated from the "lowmem" ··· 320 308 can be fatal. 321 309 322 310 So the Linux page allocator has a mechanism which prevents allocations 323 - which _could_ use highmem from using too much lowmem. This means that 311 + which *could* use highmem from using too much lowmem. This means that 324 312 a certain amount of lowmem is defended from the possibility of being 325 313 captured into pinned user memory. 326 314 ··· 328 316 mechanism will also defend that region from allocations which could use 329 317 highmem or lowmem). 330 318 331 - The `lowmem_reserve_ratio' tunable determines how aggressive the kernel is 319 + The `lowmem_reserve_ratio` tunable determines how aggressive the kernel is 332 320 in defending these lower zones. 333 321 334 322 If you have a machine which uses highmem or ISA DMA and your 335 323 applications are using mlock(), or if you are running with no swap then 336 324 you probably should change the lowmem_reserve_ratio setting. 337 325 338 - The lowmem_reserve_ratio is an array. You can see them by reading this file. 339 - - 340 - % cat /proc/sys/vm/lowmem_reserve_ratio 341 - 256 256 32 342 - - 326 + The lowmem_reserve_ratio is an array. You can see them by reading this file:: 327 + 328 + % cat /proc/sys/vm/lowmem_reserve_ratio 329 + 256 256 32 343 330 344 331 But, these values are not used directly. The kernel calculates # of protection 345 332 pages for each zones from them. These are shown as array of protection pages 346 333 in /proc/zoneinfo like followings. (This is an example of x86-64 box). 347 - Each zone has an array of protection pages like this. 
334 + Each zone has an array of protection pages like this:: 348 335 349 - - 350 - Node 0, zone DMA 351 - pages free 1355 352 - min 3 353 - low 3 354 - high 4 336 + Node 0, zone DMA 337 + pages free 1355 338 + min 3 339 + low 3 340 + high 4 355 341 : 356 342 : 357 - numa_other 0 358 - protection: (0, 2004, 2004, 2004) 343 + numa_other 0 344 + protection: (0, 2004, 2004, 2004) 359 345 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 360 - pagesets 361 - cpu: 0 pcp: 0 362 - : 363 - - 346 + pagesets 347 + cpu: 0 pcp: 0 348 + : 349 + 364 350 These protections are added to score to judge whether this zone should be used 365 351 for page allocation or should be reclaimed. 366 352 ··· 369 359 normal page requirement. If requirement is DMA zone(index=0), protection[0] 370 360 (=0) is used. 371 361 372 - zone[i]'s protection[j] is calculated by following expression. 362 + zone[i]'s protection[j] is calculated by following expression:: 373 363 374 - (i < j): 375 - zone[i]->protection[j] 376 - = (total sums of managed_pages from zone[i+1] to zone[j] on the node) 377 - / lowmem_reserve_ratio[i]; 378 - (i = j): 379 - (should not be protected. = 0; 380 - (i > j): 381 - (not necessary, but looks 0) 364 + (i < j): 365 + zone[i]->protection[j] 366 + = (total sums of managed_pages from zone[i+1] to zone[j] on the node) 367 + / lowmem_reserve_ratio[i]; 368 + (i = j): 369 + (should not be protected. = 0; 370 + (i > j): 371 + (not necessary, but looks 0) 382 372 383 373 The default values of lowmem_reserve_ratio[i] are 374 + 375 + === ==================================== 384 376 256 (if zone[i] means DMA or DMA32 zone) 385 - 32 (others). 377 + 32 (others) 378 + === ==================================== 379 + 386 380 As above expression, they are reciprocal number of ratio. 387 381 256 means 1/256. # of protection pages becomes about "0.39%" of total managed 388 382 pages of higher zones on the node. ··· 395 381 The minimum value is 1 (1/1 -> 100%). 
The value less than 1 completely 396 382 disables protection of the pages. 397 383 398 - ============================================================== 399 384 400 385 max_map_count: 386 + ============== 401 387 402 388 This file contains the maximum number of memory map areas a process 403 389 may have. Memory map areas are used as a side-effect of calling ··· 410 396 411 397 The default value is 65536. 412 398 413 - ============================================================= 414 399 415 400 memory_failure_early_kill: 401 + ========================== 416 402 417 403 Control how to kill processes when uncorrected memory error (typically 418 404 a 2bit error in a memory module) is detected in the background by hardware ··· 438 424 439 425 Applications can override this setting individually with the PR_MCE_KILL prctl 440 426 441 - ============================================================== 442 427 443 428 memory_failure_recovery 429 + ======================= 444 430 445 431 Enable memory failure recovery (when supported by the platform) 446 432 ··· 448 434 449 435 0: Always panic on a memory failure. 450 436 451 - ============================================================== 452 437 453 - min_free_kbytes: 438 + min_free_kbytes 439 + =============== 454 440 455 441 This is used to force the Linux VM to keep a minimum number 456 442 of kilobytes free. The VM uses this number to compute a ··· 464 450 465 451 Setting this too high will OOM your machine instantly. 466 452 467 - ============================================================= 468 453 469 - min_slab_ratio: 454 + min_slab_ratio 455 + ============== 470 456 471 457 This is available only on NUMA kernels. 472 458 ··· 482 468 The process of reclaiming slab memory is currently not node specific 483 469 and may not be fast. 
484 470 485 - ============================================================= 486 471 487 - min_unmapped_ratio: 472 + min_unmapped_ratio 473 + ================== 488 474 489 475 This is available only on NUMA kernels. 490 476 ··· 499 485 500 486 The default is 1 percent. 501 487 502 - ============================================================== 503 488 504 489 mmap_min_addr 490 + ============= 505 491 506 492 This file indicates the amount of address space which a user process will 507 493 be restricted from mmapping. Since kernel null dereference bugs could ··· 512 498 vast majority of applications to work correctly and provide defense in depth 513 499 against future potential kernel bugs. 514 500 515 - ============================================================== 516 501 517 - mmap_rnd_bits: 502 + mmap_rnd_bits 503 + ============= 518 504 519 505 This value can be used to select the number of bits to use to 520 506 determine the random offset to the base address of vma regions ··· 525 511 This value can be changed after boot using the 526 512 /proc/sys/vm/mmap_rnd_bits tunable 527 513 528 - ============================================================== 529 514 530 - mmap_rnd_compat_bits: 515 + mmap_rnd_compat_bits 516 + ==================== 531 517 532 518 This value can be used to select the number of bits to use to 533 519 determine the random offset to the base address of vma regions ··· 539 525 This value can be changed after boot using the 540 526 /proc/sys/vm/mmap_rnd_compat_bits tunable 541 527 542 - ============================================================== 543 528 544 529 nr_hugepages 530 + ============ 545 531 546 532 Change the minimum size of the hugepage pool. 
547 533 548 534 See Documentation/admin-guide/mm/hugetlbpage.rst 549 535 550 - ============================================================== 551 536 552 537 nr_hugepages_mempolicy 538 + ====================== 553 539 554 540 Change the size of the hugepage pool at run-time on a specific 555 541 set of NUMA nodes. 556 542 557 543 See Documentation/admin-guide/mm/hugetlbpage.rst 558 544 559 - ============================================================== 560 545 561 546 nr_overcommit_hugepages 547 + ======================= 562 548 563 549 Change the maximum size of the hugepage pool. The maximum is 564 550 nr_hugepages + nr_overcommit_hugepages. 565 551 566 552 See Documentation/admin-guide/mm/hugetlbpage.rst 567 553 568 - ============================================================== 569 554 570 555 nr_trim_pages 556 + ============= 571 557 572 558 This is available only on NOMMU kernels. 573 559 ··· 582 568 583 569 See Documentation/nommu-mmap.txt for more information. 584 570 585 - ============================================================== 586 571 587 572 numa_zonelist_order 573 + =================== 588 574 589 575 This sysctl is only for NUMA and it is deprecated. Anything but 590 576 Node order will fail! 591 577 592 578 'where the memory is allocated from' is controlled by zonelists. 579 + 593 580 (This documentation ignores ZONE_HIGHMEM/ZONE_DMA32 for simple explanation. 594 - you may be able to read ZONE_DMA as ZONE_DMA32...) 581 + you may be able to read ZONE_DMA as ZONE_DMA32...) 595 582 596 583 In non-NUMA case, a zonelist for GFP_KERNEL is ordered as following. 597 584 ZONE_NORMAL -> ZONE_DMA ··· 600 585 get memory from ZONE_DMA only when ZONE_NORMAL is not available. 601 586 602 587 In NUMA case, you can think of following 2 types of order. 
603 - Assume 2 node NUMA and below is zonelist of Node(0)'s GFP_KERNEL 588 + Assume 2 node NUMA and below is zonelist of Node(0)'s GFP_KERNEL:: 604 589 605 - (A) Node(0) ZONE_NORMAL -> Node(0) ZONE_DMA -> Node(1) ZONE_NORMAL 606 - (B) Node(0) ZONE_NORMAL -> Node(1) ZONE_NORMAL -> Node(0) ZONE_DMA. 590 + (A) Node(0) ZONE_NORMAL -> Node(0) ZONE_DMA -> Node(1) ZONE_NORMAL 591 + (B) Node(0) ZONE_NORMAL -> Node(1) ZONE_NORMAL -> Node(0) ZONE_DMA. 607 592 608 593 Type(A) offers the best locality for processes on Node(0), but ZONE_DMA 609 594 will be used before ZONE_NORMAL exhaustion. This increases possibility of ··· 631 616 Default order is recommended unless this is causing problems for your 632 617 system/application. 633 618 634 - ============================================================== 635 619 636 620 oom_dump_tasks 621 + ============== 637 622 638 623 Enables a system-wide task dump (excluding kernel threads) to be produced 639 624 when the kernel performs an OOM-killing and includes such information as ··· 653 638 654 639 The default value is 1 (enabled). 655 640 656 - ============================================================== 657 641 658 642 oom_kill_allocating_task 643 + ======================== 659 644 660 645 This enables or disables killing the OOM-triggering task in 661 646 out-of-memory situations. ··· 674 659 675 660 The default value is 0. 676 661 677 - ============================================================== 678 662 679 - overcommit_kbytes: 663 + overcommit_kbytes 664 + ================= 680 665 681 666 When overcommit_memory is set to 2, the committed address space is not 682 667 permitted to exceed swap plus this amount of physical RAM. See below. ··· 685 670 of them may be specified at a time. Setting one disables the other (which 686 671 then appears as 0 when read). 
-==============================================================

-overcommit_memory:
+overcommit_memory
+=================

 This value contains a flag that enables memory overcommitment.

···
 See Documentation/vm/overcommit-accounting.rst and
 mm/util.c::__vm_enough_memory() for more information.

-==============================================================

-overcommit_ratio:
+overcommit_ratio
+================

 When overcommit_memory is set to 2, the committed address
 space is not permitted to exceed swap plus this percentage
 of physical RAM. See above.

-==============================================================

 page-cluster
+============

 page-cluster controls the number of pages up to which consecutive pages
 are read in from swap in a single attempt. This is the swap counterpart
···
 extra faults and I/O delays for following faults if they would have been part of
 that consecutive pages readahead would have brought in.

-=============================================================

 panic_on_oom
+============

 This enables or disables panic on out-of-memory feature.

···
 system panics.

 The default value is 0.
+
 1 and 2 are for failover of clustering. Please select either
 according to your policy of failover.
+
 panic_on_oom=2+kdump gives you very strong tool to investigate
 why oom happens. You can get snapshot.

-=============================================================

 percpu_pagelist_fraction
+========================

 This is the fraction of pages at most (high mark pcp->high) in each zone that
 are allocated for each per cpu page list. The min value for this is 8. It
···
 the high water marks for each per cpu page list. If the user writes '0' to this
 sysctl, it will revert to this default behavior.

-==============================================================

 stat_interval
+=============

 The time interval between which vm statistics are updated. The default
 is 1 second.

-==============================================================

 stat_refresh
+============

 Any read or write (by root only) flushes all the per-cpu vm statistics
 into their global totals, for more accurate reports when testing
···
 (At time of writing, a few stats are known sometimes to be found negative,
 with no ill effects: errors and warnings on these stats are suppressed.)

-==============================================================

 numa_stat
+=========

 This interface allows runtime configuration of numa statistics.

 When page allocation performance becomes a bottleneck and you can tolerate
 some possible tool breakage and decreased numa counter precision, you can
-do:
+do::
+
 	echo 0 > /proc/sys/vm/numa_stat

 When page allocation performance is not a bottleneck and you want all
-tooling to work, you can do:
+tooling to work, you can do::
+
 	echo 1 > /proc/sys/vm/numa_stat

-==============================================================

 swappiness
+==========

 This control is used to define how aggressive the kernel will swap
 memory pages. Higher values will increase aggressiveness, lower values
···

 The default value is 60.

-==============================================================

 unprivileged_userfaultfd
+========================

 This flag controls whether unprivileged users can use the userfaultfd
 system calls. Set this to 1 to allow unprivileged users to use the
···

 The default value is 1.

-==============================================================

-- user_reserve_kbytes
+user_reserve_kbytes
+===================

 When overcommit_memory is set to 2, "never overcommit" mode, reserve
 min(3% of current process size, user_reserve_kbytes) of free memory.
···

 Changing this takes effect whenever an application requests memory.

-==============================================================

 vfs_cache_pressure
-------------------
+==================

 This percentage value controls the tendency of the kernel to reclaim
 the memory which is used for caching of directory and inode objects.
···
 directory and inode objects. With vfs_cache_pressure=1000, it will look for
 ten times more freeable objects than there are.

-=============================================================

-watermark_boost_factor:
+watermark_boost_factor
+======================

 This factor controls the level of reclaim when memory is being fragmented.
 It defines the percentage of the high watermark of a zone that will be
···
 smaller than a pageblock then a pageblocks worth of pages will be reclaimed
 (e.g. 2MB on 64-bit x86). A boost factor of 0 will disable the feature.

-=============================================================

-watermark_scale_factor:
+watermark_scale_factor
+======================

 This factor controls the aggressiveness of kswapd. It defines the
 amount of memory left in a node/system before kswapd is woken up and
···
 too small for the allocation bursts occurring in the system. This knob
 can then be used to tune kswapd aggressiveness accordingly.

-==============================================================

-zone_reclaim_mode:
+zone_reclaim_mode
+=================

 Zone_reclaim_mode allows someone to set more or less aggressive approaches to
 reclaim memory when a zone runs out of memory. If it is set to zero then no
 zone reclaim occurs. Allocations will be satisfied from other zones / nodes
 in the system.

-This is value ORed together of
+This is value OR'ed together of

-1	= Zone reclaim on
-2	= Zone reclaim writes dirty pages out
-4	= Zone reclaim swaps pages
+=	===================================
+1	Zone reclaim on
+2	Zone reclaim writes dirty pages out
+4	Zone reclaim swaps pages
+=	===================================

 zone_reclaim_mode is disabled by default. For file servers or workloads
 that benefit from having their data cached, zone_reclaim_mode should be
···
 Allowing regular swap effectively restricts allocations to the local
 node unless explicitly overridden by memory policies or cpuset
 configurations.
-
-============ End of Document =================================
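
The zone_reclaim_mode table in the hunk above describes a bitmask, so a value such as 5 combines the entries for 1 and 4. A minimal Python sketch of decoding such a value follows. The names `ZONE_RECLAIM_FLAGS` and `decode_zone_reclaim_mode` are hypothetical, chosen for illustration only; they are not kernel or library identifiers, and the bit meanings are copied from the table in vm.rst.

```python
# Hypothetical helper for illustration; not part of the kernel tree.
# Bit meanings are taken from the zone_reclaim_mode table in vm.rst.
ZONE_RECLAIM_FLAGS = {
    1: "Zone reclaim on",
    2: "Zone reclaim writes dirty pages out",
    4: "Zone reclaim swaps pages",
}


def decode_zone_reclaim_mode(value):
    """Return the descriptions of the bits set in a zone_reclaim_mode value."""
    known_bits = sum(ZONE_RECLAIM_FLAGS)  # 1 | 2 | 4
    if value < 0 or value & ~known_bits:
        raise ValueError(f"unknown bits in zone_reclaim_mode value {value}")
    return [text for bit, text in ZONE_RECLAIM_FLAGS.items() if value & bit]


if __name__ == "__main__":
    # Assumption: on a live system the value would be read from
    # /proc/sys/vm/zone_reclaim_mode; it is hard-coded here.
    for description in decode_zone_reclaim_mode(5):
        print(description)
```

A value of 0 decodes to an empty list, which matches the text: no zone reclaim occurs.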

+1 -1

Documentation/vm/unevictable-lru.rst

···

 The unevictable LRU can be scanned for compactable regions and the default
 behavior is to do so. /proc/sys/vm/compact_unevictable_allowed controls
-this behavior (see Documentation/sysctl/vm.txt). Once scanning of the
+this behavior (see Documentation/sysctl/vm.rst). Once scanning of the
 unevictable LRU is enabled, the work of compaction is mostly handled by
 the page migration code and the same work flow as described in MIGRATING
 MLOCKED PAGES will apply.
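
The hunk above names /proc/sys/vm/compact_unevictable_allowed as the control for this behavior. A minimal Python sketch of reading such a knob follows. The function name `read_vm_sysctl` and its `root` parameter are hypothetical, introduced for illustration, and the sketch assumes the file holds a single integer, which does not hold for every entry under /proc/sys/vm.

```python
from pathlib import Path


def read_vm_sysctl(name, root="/proc/sys/vm"):
    """Read one single-integer vm sysctl, e.g. "compact_unevictable_allowed".

    `root` is a parameter only so the function can be pointed at a
    test directory; on a live system the default path applies.
    """
    return int((Path(root) / name).read_text().strip())


if __name__ == "__main__":
    knob = "compact_unevictable_allowed"
    try:
        print(knob, "=", read_vm_sysctl(knob))
    except OSError as exc:
        # The file is absent on non-Linux systems or some kernel configs.
        print("could not read", knob, "-", exc)
```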

+1 -1

kernel/panic.c

···
 /**
  * print_tainted - return a string to represent the kernel taint state.
  *
- * For individual taint flag meanings, see Documentation/sysctl/kernel.txt
+ * For individual taint flag meanings, see Documentation/sysctl/kernel.rst
  *
  * The string is overwritten by the next call to print_tainted(),
  * but is always NULL terminated.

+1 -1

mm/swap.c

···
 /*
  * This file contains the default values for the operation of the
  * Linux VM subsystem. Fine-tuning documentation can be found in
- * Documentation/sysctl/vm.txt.
+ * Documentation/sysctl/vm.rst.
  * Started 18.12.91
  * Swap aging added 23.2.95, Stephen Tweedie.
  * Buffermem limits added 12.3.98, Rik van Riel.
