Linux kernel mirror (for testing): git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge branch 'akpm' (patches from Andrew)

Merge yet more updates from Andrew Morton:
"This is the post-linux-next queue. Material which was based on or
dependent upon material which was in -next.

69 patches.

Subsystems affected by this patch series: mm (migration and zsmalloc),
sysctl, proc, and lib"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (69 commits)
mm: hide the FRONTSWAP Kconfig symbol
frontswap: remove support for multiple ops
mm: mark swap_lock and swap_active_head static
frontswap: simplify frontswap_register_ops
frontswap: remove frontswap_test
mm: simplify try_to_unuse
frontswap: remove the frontswap exports
frontswap: simplify frontswap_init
frontswap: remove frontswap_curr_pages
frontswap: remove frontswap_shrink
frontswap: remove frontswap_tmem_exclusive_gets
frontswap: remove frontswap_writethrough
mm: remove cleancache
lib/stackdepot: always do filter_irq_stacks() in stack_depot_save()
lib/stackdepot: allow optional init and stack_table allocation by kvmalloc()
proc: remove PDE_DATA() completely
fs: proc: store PDE()->data into inode->i_private
zsmalloc: replace get_cpu_var with local_lock
zsmalloc: replace per zpage lock with pool->migrate_lock
locking/rwlocks: introduce write_lock_nested
...

+1765 -2819
-296
Documentation/vm/cleancache.rst
(file deleted; its former contents, de-garbled, were:)

.. _cleancache:

==========
Cleancache
==========

Motivation
==========

Cleancache is a new optional feature provided by the VFS layer that
potentially dramatically increases page cache effectiveness for
many workloads in many environments at a negligible cost.

Cleancache can be thought of as a page-granularity victim cache for clean
pages that the kernel's pageframe replacement algorithm (PFRA) would like
to keep around, but can't since there isn't enough memory. So when the
PFRA "evicts" a page, it first attempts to use cleancache code to
put the data contained in that page into "transcendent memory", memory
that is not directly accessible or addressable by the kernel and is
of unknown and possibly time-varying size.

Later, when a cleancache-enabled filesystem wishes to access a page
in a file on disk, it first checks cleancache to see if it already
contains it; if it does, the page of data is copied into the kernel
and a disk access is avoided.

Transcendent memory "drivers" for cleancache are currently implemented
in Xen (using hypervisor memory) and zcache (using in-kernel compressed
memory) and other implementations are in development.

:ref:`FAQs <faq>` are included below.

Implementation Overview
=======================

A cleancache "backend" that provides transcendent memory registers itself
to the kernel's cleancache "frontend" by calling cleancache_register_ops,
passing a pointer to a cleancache_ops structure with funcs set appropriately.
The functions provided must conform to certain semantics as follows:

Most important, cleancache is "ephemeral". Pages which are copied into
cleancache have an indefinite lifetime which is completely unknowable
by the kernel and so may or may not still be in cleancache at any later time.
Thus, as its name implies, cleancache is not suitable for dirty pages.
Cleancache has complete discretion over what pages to preserve and what
pages to discard and when.

Mounting a cleancache-enabled filesystem should call "init_fs" to obtain a
pool id which, if positive, must be saved in the filesystem's superblock;
a negative return value indicates failure. A "put_page" will copy a
(presumably about-to-be-evicted) page into cleancache and associate it with
the pool id, a file key, and a page index into the file. (The combination
of a pool id, a file key, and an index is sometimes called a "handle".)
A "get_page" will copy the page, if found, from cleancache into kernel memory.
An "invalidate_page" will ensure the page no longer is present in cleancache;
an "invalidate_inode" will invalidate all pages associated with the specified
file; and, when a filesystem is unmounted, an "invalidate_fs" will invalidate
all pages in all files specified by the given pool id and also surrender
the pool id.

An "init_shared_fs", like init_fs, obtains a pool id but tells cleancache
to treat the pool as shared using a 128-bit UUID as a key. On systems
that may run multiple kernels (such as hard partitioned or virtualized
systems) that may share a clustered filesystem, and where cleancache
may be shared among those kernels, calls to init_shared_fs that specify the
same UUID will receive the same pool id, thus allowing the pages to
be shared. Note that any security requirements must be imposed outside
of the kernel (e.g. by "tools" that control cleancache). Or a
cleancache implementation can simply disable shared_init by always
returning a negative value.

If a get_page is successful on a non-shared pool, the page is invalidated
(thus making cleancache an "exclusive" cache). On a shared pool, the page
is NOT invalidated on a successful get_page so that it remains accessible to
other sharers. The kernel is responsible for ensuring coherency between
cleancache (shared or not), the page cache, and the filesystem, using
cleancache invalidate operations as required.

Note that cleancache must enforce put-put-get coherency and get-get
coherency. For the former, if two puts are made to the same handle but
with different data, say AAA by the first put and BBB by the second, a
subsequent get can never return the stale data (AAA). For get-get coherency,
if a get for a given handle fails, subsequent gets for that handle will
never succeed unless preceded by a successful put with that handle.

Last, cleancache provides no SMP serialization guarantees; if two
different Linux threads are simultaneously putting and invalidating a page
with the same handle, the results are indeterminate. Callers must
lock the page to ensure serial behavior.

Cleancache Performance Metrics
==============================

If properly configured, monitoring of cleancache is done via debugfs in
the `/sys/kernel/debug/cleancache` directory. The effectiveness of cleancache
can be measured (across all filesystems) with:

``succ_gets``
	number of gets that were successful

``failed_gets``
	number of gets that failed

``puts``
	number of puts attempted (all "succeed")

``invalidates``
	number of invalidates attempted

A backend implementation may provide additional metrics.

.. _faq:

FAQ
===

* Where's the value? (Andrew Morton)

Cleancache provides a significant performance benefit to many workloads
in many environments with negligible overhead by improving the
effectiveness of the pagecache. Clean pagecache pages are
saved in transcendent memory (RAM that is otherwise not directly
addressable to the kernel); fetching those pages later avoids "refaults"
and thus disk reads.

Cleancache (and its sister code "frontswap") provide interfaces for
this transcendent memory (aka "tmem"), which conceptually lies between
fast kernel-directly-addressable RAM and slower DMA/asynchronous devices.
Disallowing direct kernel or userland reads/writes to tmem
is ideal when data is transformed to a different form and size (such
as with compression) or secretly moved (as might be useful for write-
balancing for some RAM-like devices). Evicted page-cache pages (and
swap pages) are a great use for this kind of slower-than-RAM-but-much-
faster-than-disk transcendent memory, and the cleancache (and frontswap)
"page-object-oriented" specification provides a nice way to read and
write -- and indirectly "name" -- the pages.

In the virtual case, the whole point of virtualization is to statistically
multiplex physical resources across the varying demands of multiple
virtual machines. This is really hard to do with RAM and efforts to
do it well with no kernel change have essentially failed (except in some
well-publicized special-case workloads). Cleancache -- and frontswap --
with a fairly small impact on the kernel, provide a huge amount
of flexibility for more dynamic, flexible RAM multiplexing.
Specifically, the Xen Transcendent Memory backend allows otherwise
"fallow" hypervisor-owned RAM to not only be "time-shared" between multiple
virtual machines, but the pages can be compressed and deduplicated to
optimize RAM utilization. And when guest OS's are induced to surrender
underutilized RAM (e.g. with "self-ballooning"), page cache pages
are the first to go, and cleancache allows those pages to be
saved and reclaimed if overall host system memory conditions allow.

And the identical interface used for cleancache can be used in
physical systems as well. The zcache driver acts as a memory-hungry
device that stores pages of data in a compressed state. And
the proposed "RAMster" driver shares RAM across multiple physical
systems.

* Why does cleancache have its sticky fingers so deep inside the
  filesystems and VFS? (Andrew Morton and Christoph Hellwig)

The core hooks for cleancache in VFS are in most cases a single line
and the minimum set are placed precisely where needed to maintain
coherency (via cleancache_invalidate operations) between cleancache,
the page cache, and disk. All hooks compile into nothingness if
cleancache is config'ed off and turn into a function-pointer-
compare-to-NULL if config'ed on but no backend claims the ops
functions, or to a compare-struct-element-to-negative if a
backend claims the ops functions but a filesystem doesn't enable
cleancache.

Some filesystems are built entirely on top of VFS and the hooks
in VFS are sufficient, so don't require an "init_fs" hook; the
initial implementation of cleancache didn't provide this hook.
But for some filesystems (such as btrfs), the VFS hooks are
incomplete and one or more hooks in fs-specific code are required.
And for some other filesystems, such as tmpfs, cleancache may
be counterproductive. So it seemed prudent to require a filesystem
to "opt in" to use cleancache, which requires adding a hook in
each filesystem. Not all filesystems are supported by cleancache
only because they haven't been tested. The existing set should
be sufficient to validate the concept, the opt-in approach means
that untested filesystems are not affected, and the hooks in the
existing filesystems should make it very easy to add more
filesystems in the future.

The total impact of the hooks to existing fs and mm files is only
about 40 lines added (not counting comments and blank lines).

* Why not make cleancache asynchronous and batched so it can more
  easily interface with real devices with DMA instead of copying each
  individual page? (Minchan Kim)

The one-page-at-a-time copy semantics simplifies the implementation
on both the frontend and backend and also allows the backend to
do fancy things on-the-fly like page compression and
page deduplication. And since the data is "gone" (copied into/out
of the pageframe) before the cleancache get/put call returns,
a great deal of race conditions and potential coherency issues
are avoided. While the interface seems odd for a "real device"
or for real kernel-addressable RAM, it makes perfect sense for
transcendent memory.

* Why is non-shared cleancache "exclusive"? And where is the
  page "invalidated" after a "get"? (Minchan Kim)

The main reason is to free up space in transcendent memory and
to avoid unnecessary cleancache_invalidate calls. If you want inclusive,
the page can be "put" immediately following the "get". If
put-after-get for inclusive becomes common, the interface could
be easily extended to add a "get_no_invalidate" call.

The invalidate is done by the cleancache backend implementation.

* What's the performance impact?

Performance analysis has been presented at OLS'09 and LCA'10.
Briefly, performance gains can be significant on most workloads,
especially when memory pressure is high (e.g. when RAM is
overcommitted in a virtual workload); and because the hooks are
invoked primarily in place of or in addition to a disk read/write,
overhead is negligible even in worst case workloads. Basically
cleancache replaces I/O with memory-copy-CPU-overhead; on older
single-core systems with slow memory-copy speeds, cleancache
has little value, but in newer multicore machines, especially
consolidated/virtualized machines, it has great value.

* How do I add cleancache support for filesystem X? (Boaz Harrash)

Filesystems that are well-behaved and conform to certain
restrictions can utilize cleancache simply by making a call to
cleancache_init_fs at mount time. Unusual, misbehaving, or
poorly layered filesystems must either add additional hooks
and/or undergo extensive additional testing... or should just
not enable the optional cleancache.

Some points for a filesystem to consider:

- The FS should be block-device-based (e.g. a ram-based FS such
  as tmpfs should not enable cleancache)
- To ensure coherency/correctness, the FS must ensure that all
  file removal or truncation operations either go through VFS or
  add hooks to do the equivalent cleancache "invalidate" operations
- To ensure coherency/correctness, either inode numbers must
  be unique across the lifetime of the on-disk file OR the
  FS must provide an "encode_fh" function.
- The FS must call the VFS superblock alloc and deactivate routines
  or add hooks to do the equivalent cleancache calls done there.
- To maximize performance, all pages fetched from the FS should
  go through the do_mpage_readpage routine or the FS should add
  hooks to do the equivalent (cf. btrfs)
- Currently, the FS blocksize must be the same as PAGESIZE. This
  is not an architectural restriction, but no backends currently
  support anything different.
- A clustered FS should invoke the "shared_init_fs" cleancache
  hook to get best performance for some backends.

* Why not use the KVA of the inode as the key? (Christoph Hellwig)

If cleancache would use the inode virtual address instead of
inode/filehandle, the pool id could be eliminated. But, this
won't work because cleancache retains pagecache data pages
persistently even when the inode has been pruned from the
inode unused list, and only invalidates the data page if the file
gets removed/truncated. So if cleancache used the inode kva,
there would be potential coherency issues if/when the inode
kva is reused for a different file. Alternately, if cleancache
invalidated the pages when the inode kva was freed, much of the value
of cleancache would be lost because the cache of pages in cleancache
is potentially much larger than the kernel pagecache and is most
useful if the pages survive inode cache removal.

* Why is a global variable required?

The cleancache_enabled flag is checked in all of the frequently-used
cleancache hooks. The alternative is a function call to check a static
variable. Since cleancache is enabled dynamically at runtime, systems
that don't enable cleancache would suffer thousands (possibly
tens-of-thousands) of unnecessary function calls per second. So the
global variable allows cleancache to be enabled by default at compile
time, but have insignificant performance impact when cleancache remains
disabled at runtime.

* Does cleancache work with KVM?

The memory model of KVM is sufficiently different that a cleancache
backend may have less value for KVM. This remains to be tested,
especially in an overcommitted system.

* Does cleancache work in userspace? It sounds useful for
  memory hungry caches like web browsers. (Jamie Lokier)

No plans yet, though we agree it sounds useful, at least for
apps that bypass the page cache (e.g. O_DIRECT).

Last updated: Dan Magenheimer, April 13 2011
+2 -29
Documentation/vm/frontswap.rst
 In some environments, dramatic performance savings may be obtained because
 swapped pages are saved in RAM (or a RAM-like device) instead of a swap disk.

-(Note, frontswap -- and :ref:`cleancache` (merged at 3.0) -- are the "frontends"
-and the only necessary changes to the core kernel for transcendent memory;
-all other supporting code -- the "backends" -- is implemented as drivers.
-See the LWN.net article `Transcendent memory in a nutshell`_
-for a detailed overview of frontswap and related kernel parts)
-
 .. _Transcendent memory in a nutshell: https://lwn.net/Articles/454795/

 Frontswap is so named because it can be thought of as the opposite of
···
 a disk write and, if the data is later read back, a disk read are avoided.
 If a store returns failure, transcendent memory has rejected the data, and the
 page can be written to swap as usual.
-
-If a backend chooses, frontswap can be configured as a "writethrough
-cache" by calling frontswap_writethrough(). In this mode, the reduction
-in swap device writes is lost (and also a non-trivial performance advantage)
-in order to allow the backend to arbitrarily "reclaim" space used to
-store frontswap pages to more completely manage its memory usage.

 Note that if a page is stored and the page already exists in transcendent memory
 (a "duplicate" store), either the store succeeds and the data is overwritten,
···
 and size (such as with compression) or secretly moved (as might be
 useful for write-balancing for some RAM-like devices). Swap pages (and
 evicted page-cache pages) are a great use for this kind of slower-than-RAM-
-but-much-faster-than-disk "pseudo-RAM device" and the frontswap (and
-cleancache) interface to transcendent memory provides a nice way to read
-and write -- and indirectly "name" -- the pages.
+but-much-faster-than-disk "pseudo-RAM device".

-Frontswap -- and cleancache -- with a fairly small impact on the kernel,
+Frontswap, with a fairly small impact on the kernel,
 provides a huge amount of flexibility for more dynamic, flexible RAM
 utilization in various system configurations:
···
 the old data and ensure that it is no longer accessible. Since the
 swap subsystem then writes the new data to the read swap device,
 this is the correct course of action to ensure coherency.
-
-* What is frontswap_shrink for?
-
-When the (non-frontswap) swap subsystem swaps out a page to a real
-swap device, that page is only taking up low-value pre-allocated disk
-space. But if frontswap has placed a page in transcendent memory, that
-page may be taking up valuable real estate. The frontswap_shrink
-routine allows code outside of the swap subsystem to force pages out
-of the memory managed by frontswap and back into kernel-addressable memory.
-For example, in RAMster, a "suction driver" thread will attempt
-to "repatriate" pages sent to a remote machine back to the local machine;
-this is driven using the frontswap_shrink mechanism when memory pressure
-subsides.

 * Why does the frontswap patch create the new include file swapfile.h?
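Frontswap survives this series in simplified form: a single backend registers one ops structure (the multiple-ops chaining is removed by the patches above). A hedged sketch of a no-op backend; the hook names follow the zswap backend of this era, but the exact signatures are assumptions:

    #include <linux/frontswap.h>

    static void ex_init(unsigned type) { }
    static int ex_store(unsigned type, pgoff_t offset, struct page *page)
    {
        return -1;      /* reject: page goes to the real swap device */
    }
    static int ex_load(unsigned type, pgoff_t offset, struct page *page)
    {
        return -1;      /* miss */
    }
    static void ex_invalidate_page(unsigned type, pgoff_t offset) { }
    static void ex_invalidate_area(unsigned type) { }

    static struct frontswap_ops ex_frontswap_ops = {
        .init            = ex_init,
        .store           = ex_store,
        .load            = ex_load,
        .invalidate_page = ex_invalidate_page,
        .invalidate_area = ex_invalidate_area,
    };

    /* backend init: frontswap_register_ops(&ex_frontswap_ops); */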
-1
Documentation/vm/index.rst
    active_mm
    arch_pgtable_helpers
    balance
-   cleancache
    damon/index
    free_page_reporting
    frontswap
-7
MAINTAINERS
 F:	include/linux/cfi.h
 F:	kernel/cfi.c

-CLEANCACHE API
-M:	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-L:	linux-kernel@vger.kernel.org
-S:	Maintained
-F:	include/linux/cleancache.h
-F:	mm/cleancache.c
-
 CLK API
 M:	Russell King <linux@armlinux.org.uk>
 L:	linux-clk@vger.kernel.org
+2 -2
arch/alpha/kernel/srm_env.c
 static int srm_env_proc_open(struct inode *inode, struct file *file)
 {
-    return single_open(file, srm_env_proc_show, PDE_DATA(inode));
+    return single_open(file, srm_env_proc_show, pde_data(inode));
 }

 static ssize_t srm_env_proc_write(struct file *file, const char __user *buffer,
                                   size_t count, loff_t *pos)
 {
     int res;
-    unsigned long id = (unsigned long)PDE_DATA(file_inode(file));
+    unsigned long id = (unsigned long)pde_data(file_inode(file));
     char *buf = (char *) __get_free_page(GFP_USER);
     unsigned long ret1, ret2;
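The PDE_DATA() to pde_data() rename seen here repeats, mechanically, across the remaining driver files in this series. For readers unfamiliar with the pattern, a small self-contained illustration (names here are hypothetical, not taken from the commit): a /proc entry is created with a data cookie, and the show routine recovers it through the inode:

    /* Hypothetical /proc entry: the cookie passed at creation time is
     * recovered with pde_data(), the renamed form of PDE_DATA(). */
    #include <linux/proc_fs.h>
    #include <linux/seq_file.h>

    static int example_show(struct seq_file *m, void *v)
    {
        unsigned long id = (unsigned long)pde_data(file_inode(m->file));

        seq_printf(m, "id=%lu\n", id);
        return 0;
    }

    /* at init time:
     * proc_create_single_data("example", 0444, NULL, example_show,
     *                         (void *)42UL);
     */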
-1
arch/arm/configs/bcm2835_defconfig
 CONFIG_PREEMPT_VOLUNTARY=y
 CONFIG_AEABI=y
 CONFIG_KSM=y
-CONFIG_CLEANCACHE=y
 CONFIG_CMA=y
 CONFIG_SECCOMP=y
 CONFIG_KEXEC=y
-1
arch/arm/configs/qcom_defconfig
 CONFIG_SMP=y
 CONFIG_PREEMPT=y
 CONFIG_HIGHMEM=y
-CONFIG_CLEANCACHE=y
 CONFIG_ARM_APPENDED_DTB=y
 CONFIG_ARM_ATAG_DTB_COMPAT=y
 CONFIG_CPU_IDLE=y
+1 -1
arch/arm/kernel/atags_proc.c
 static ssize_t atags_read(struct file *file, char __user *buf,
                           size_t count, loff_t *ppos)
 {
-    struct buffer *b = PDE_DATA(file_inode(file));
+    struct buffer *b = pde_data(file_inode(file));
     return simple_read_from_buffer(buf, count, ppos, b->data, b->size);
 }
+1 -1
arch/arm/mm/alignment.c
 __setup("noalign", noalign_setup);

 /*
- * This needs to be done after sysctl_init, otherwise sys/ will be
+ * This needs to be done after sysctl_init_bases(), otherwise sys/ will be
  * overwritten. Actually, this shouldn't be in sys/ at all since
  * it isn't a sysctl, and it doesn't contain sysctl information.
  * We now locate it in /proc/cpu/alignment instead.
+5 -5
arch/ia64/kernel/salinfo.c
 static ssize_t
 salinfo_event_read(struct file *file, char __user *buffer, size_t count, loff_t *ppos)
 {
-    struct salinfo_data *data = PDE_DATA(file_inode(file));
+    struct salinfo_data *data = pde_data(file_inode(file));
     char cmd[32];
     size_t size;
     int i, n, cpu = -1;
···
 static int
 salinfo_log_open(struct inode *inode, struct file *file)
 {
-    struct salinfo_data *data = PDE_DATA(inode);
+    struct salinfo_data *data = pde_data(inode);

     if (!capable(CAP_SYS_ADMIN))
         return -EPERM;
···
 static int
 salinfo_log_release(struct inode *inode, struct file *file)
 {
-    struct salinfo_data *data = PDE_DATA(inode);
+    struct salinfo_data *data = pde_data(inode);

     if (data->state == STATE_NO_DATA) {
         vfree(data->log_buffer);
···
 static ssize_t
 salinfo_log_read(struct file *file, char __user *buffer, size_t count, loff_t *ppos)
 {
-    struct salinfo_data *data = PDE_DATA(file_inode(file));
+    struct salinfo_data *data = pde_data(file_inode(file));
     u8 *buf;
     u64 bufsize;
···
 static ssize_t
 salinfo_log_write(struct file *file, const char __user *buffer, size_t count, loff_t *ppos)
 {
-    struct salinfo_data *data = PDE_DATA(file_inode(file));
+    struct salinfo_data *data = pde_data(file_inode(file));
     char cmd[32];
     size_t size;
     u32 offset;
-1
arch/m68k/configs/amiga_defconfig
 CONFIG_BINFMT_AOUT=m
 CONFIG_BINFMT_MISC=m
 # CONFIG_COMPACTION is not set
-CONFIG_CLEANCACHE=y
 CONFIG_ZPOOL=m
 CONFIG_NET=y
 CONFIG_PACKET=y
-1
arch/m68k/configs/apollo_defconfig
 CONFIG_BINFMT_AOUT=m
 CONFIG_BINFMT_MISC=m
 # CONFIG_COMPACTION is not set
-CONFIG_CLEANCACHE=y
 CONFIG_ZPOOL=m
 CONFIG_NET=y
 CONFIG_PACKET=y
-1
arch/m68k/configs/atari_defconfig
 CONFIG_BINFMT_AOUT=m
 CONFIG_BINFMT_MISC=m
 # CONFIG_COMPACTION is not set
-CONFIG_CLEANCACHE=y
 CONFIG_ZPOOL=m
 CONFIG_NET=y
 CONFIG_PACKET=y
-1
arch/m68k/configs/bvme6000_defconfig
 CONFIG_BINFMT_AOUT=m
 CONFIG_BINFMT_MISC=m
 # CONFIG_COMPACTION is not set
-CONFIG_CLEANCACHE=y
 CONFIG_ZPOOL=m
 CONFIG_NET=y
 CONFIG_PACKET=y
-1
arch/m68k/configs/hp300_defconfig
 CONFIG_BINFMT_AOUT=m
 CONFIG_BINFMT_MISC=m
 # CONFIG_COMPACTION is not set
-CONFIG_CLEANCACHE=y
 CONFIG_ZPOOL=m
 CONFIG_NET=y
 CONFIG_PACKET=y
-1
arch/m68k/configs/mac_defconfig
 CONFIG_BINFMT_AOUT=m
 CONFIG_BINFMT_MISC=m
 # CONFIG_COMPACTION is not set
-CONFIG_CLEANCACHE=y
 CONFIG_ZPOOL=m
 CONFIG_NET=y
 CONFIG_PACKET=y
-1
arch/m68k/configs/multi_defconfig
 CONFIG_BINFMT_AOUT=m
 CONFIG_BINFMT_MISC=m
 # CONFIG_COMPACTION is not set
-CONFIG_CLEANCACHE=y
 CONFIG_ZPOOL=m
 CONFIG_NET=y
 CONFIG_PACKET=y
-1
arch/m68k/configs/mvme147_defconfig
 CONFIG_BINFMT_AOUT=m
 CONFIG_BINFMT_MISC=m
 # CONFIG_COMPACTION is not set
-CONFIG_CLEANCACHE=y
 CONFIG_ZPOOL=m
 CONFIG_NET=y
 CONFIG_PACKET=y
-1
arch/m68k/configs/mvme16x_defconfig
 CONFIG_BINFMT_AOUT=m
 CONFIG_BINFMT_MISC=m
 # CONFIG_COMPACTION is not set
-CONFIG_CLEANCACHE=y
 CONFIG_ZPOOL=m
 CONFIG_NET=y
 CONFIG_PACKET=y
-1
arch/m68k/configs/q40_defconfig
 CONFIG_BINFMT_AOUT=m
 CONFIG_BINFMT_MISC=m
 # CONFIG_COMPACTION is not set
-CONFIG_CLEANCACHE=y
 CONFIG_ZPOOL=m
 CONFIG_NET=y
 CONFIG_PACKET=y
-1
arch/m68k/configs/sun3_defconfig
 CONFIG_BINFMT_AOUT=m
 CONFIG_BINFMT_MISC=m
 # CONFIG_COMPACTION is not set
-CONFIG_CLEANCACHE=y
 CONFIG_ZPOOL=m
 CONFIG_NET=y
 CONFIG_PACKET=y
-1
arch/m68k/configs/sun3x_defconfig
 CONFIG_BINFMT_AOUT=m
 CONFIG_BINFMT_MISC=m
 # CONFIG_COMPACTION is not set
-CONFIG_CLEANCACHE=y
 CONFIG_ZPOOL=m
 CONFIG_NET=y
 CONFIG_PACKET=y
+2 -2
arch/powerpc/kernel/proc_powerpc.c
                               loff_t *ppos)
 {
     return simple_read_from_buffer(buf, nbytes, ppos,
-                PDE_DATA(file_inode(file)), PAGE_SIZE);
+                pde_data(file_inode(file)), PAGE_SIZE);
 }

 static int page_map_mmap( struct file *file, struct vm_area_struct *vma )
···
         return -EINVAL;

     remap_pfn_range(vma, vma->vm_start,
-            __pa(PDE_DATA(file_inode(file))) >> PAGE_SHIFT,
+            __pa(pde_data(file_inode(file))) >> PAGE_SHIFT,
             PAGE_SIZE, vma->vm_page_prot);
     return 0;
 }
-1
arch/s390/configs/debug_defconfig
 CONFIG_MEMORY_HOTREMOVE=y
 CONFIG_KSM=y
 CONFIG_TRANSPARENT_HUGEPAGE=y
-CONFIG_CLEANCACHE=y
 CONFIG_FRONTSWAP=y
 CONFIG_CMA_DEBUG=y
 CONFIG_CMA_DEBUGFS=y
-1
arch/s390/configs/defconfig
 CONFIG_MEMORY_HOTREMOVE=y
 CONFIG_KSM=y
 CONFIG_TRANSPARENT_HUGEPAGE=y
-CONFIG_CLEANCACHE=y
 CONFIG_FRONTSWAP=y
 CONFIG_CMA_SYSFS=y
 CONFIG_CMA_AREAS=7
+2 -2
arch/sh/mm/alignment.c
 static ssize_t alignment_proc_write(struct file *file,
         const char __user *buffer, size_t count, loff_t *pos)
 {
-    int *data = PDE_DATA(file_inode(file));
+    int *data = pde_data(file_inode(file));
     char mode;

     if (count > 0) {
···
 };

 /*
- * This needs to be done after sysctl_init, otherwise sys/ will be
+ * This needs to be done after sysctl_init_bases(), otherwise sys/ will be
  * overwritten. Actually, this shouldn't be in sys/ at all since
  * it isn't a sysctl, and it doesn't contain sysctl information.
  * We now locate it in /proc/cpu/alignment instead.
+2 -2
arch/xtensa/platforms/iss/simdisk.c
 static ssize_t proc_read_simdisk(struct file *file, char __user *buf,
             size_t size, loff_t *ppos)
 {
-    struct simdisk *dev = PDE_DATA(file_inode(file));
+    struct simdisk *dev = pde_data(file_inode(file));
     const char *s = dev->filename;
     if (s) {
         ssize_t n = simple_read_from_buffer(buf, size, ppos,
···
             size_t count, loff_t *ppos)
 {
     char *tmp = memdup_user_nul(buf, count);
-    struct simdisk *dev = PDE_DATA(file_inode(file));
+    struct simdisk *dev = pde_data(file_inode(file));
     int err;

     if (IS_ERR(tmp))
-5
block/bdev.c
 #include <linux/pseudo_fs.h>
 #include <linux/uio.h>
 #include <linux/namei.h>
-#include <linux/cleancache.h>
 #include <linux/part_stat.h>
 #include <linux/uaccess.h>
 #include "../fs/internal.h"
···
         lru_add_drain_all();    /* make sure all lru add caches are flushed */
         invalidate_mapping_pages(mapping, 0, -1);
     }
-    /* 99% of the time, we don't need to flush the cleancache on the bdev.
-     * But, for the strange corners, lets be cautious
-     */
-    cleancache_invalidate_inode(mapping);
 }
 EXPORT_SYMBOL(invalidate_bdev);
+1 -1
drivers/acpi/proc.c
 acpi_system_wakeup_device_open_fs(struct inode *inode, struct file *file)
 {
     return single_open(file, acpi_system_wakeup_device_seq_show,
-                       PDE_DATA(inode));
+                       pde_data(inode));
 }

 static const struct proc_ops acpi_system_wakeup_device_proc_ops = {
+6 -1
drivers/base/firmware_loader/fallback.c
 int register_sysfs_loader(void)
 {
-    return class_register(&firmware_class);
+    int ret = class_register(&firmware_class);
+
+    if (ret != 0)
+        return ret;
+    return register_firmware_config_sysctl();
 }

 void unregister_sysfs_loader(void)
 {
+    unregister_firmware_config_sysctl();
     class_unregister(&firmware_class);
 }
+11
drivers/base/firmware_loader/fallback.h
 int register_sysfs_loader(void);
 void unregister_sysfs_loader(void);
+#ifdef CONFIG_SYSCTL
+extern int register_firmware_config_sysctl(void);
+extern void unregister_firmware_config_sysctl(void);
+#else
+static inline int register_firmware_config_sysctl(void)
+{
+    return 0;
+}
+static inline void unregister_firmware_config_sysctl(void) { }
+#endif /* CONFIG_SYSCTL */
+
 #else /* CONFIG_FW_LOADER_USER_HELPER */
 static inline int firmware_fallback_sysfs(struct firmware *fw, const char *name,
                                           struct device *device,
+23 -2
drivers/base/firmware_loader/fallback_table.c
 #include <linux/kconfig.h>
 #include <linux/list.h>
 #include <linux/slab.h>
+#include <linux/export.h>
 #include <linux/security.h>
 #include <linux/highmem.h>
 #include <linux/umh.h>
···
 EXPORT_SYMBOL_NS_GPL(fw_fallback_config, FIRMWARE_LOADER_PRIVATE);

 #ifdef CONFIG_SYSCTL
-struct ctl_table firmware_config_table[] = {
+static struct ctl_table firmware_config_table[] = {
     {
         .procname = "force_sysfs_fallback",
         .data     = &fw_fallback_config.force_sysfs_fallback,
···
     },
     { }
 };
-#endif
+
+static struct ctl_table_header *firmware_config_sysct_table_header;
+int register_firmware_config_sysctl(void)
+{
+    firmware_config_sysct_table_header =
+        register_sysctl("kernel/firmware_config",
+                        firmware_config_table);
+    if (!firmware_config_sysct_table_header)
+        return -ENOMEM;
+    return 0;
+}
+EXPORT_SYMBOL_NS_GPL(register_firmware_config_sysctl, FIRMWARE_LOADER_PRIVATE);
+
+void unregister_firmware_config_sysctl(void)
+{
+    unregister_sysctl_table(firmware_config_sysct_table_header);
+    firmware_config_sysct_table_header = NULL;
+}
+EXPORT_SYMBOL_NS_GPL(unregister_firmware_config_sysctl, FIRMWARE_LOADER_PRIVATE);
+
+#endif /* CONFIG_SYSCTL */
+1 -22
drivers/cdrom/cdrom.c
     },
     { }
 };
-
-static struct ctl_table cdrom_cdrom_table[] = {
-    {
-        .procname = "cdrom",
-        .maxlen   = 0,
-        .mode     = 0555,
-        .child    = cdrom_table,
-    },
-    { }
-};
-
-/* Make sure that /proc/sys/dev is there */
-static struct ctl_table cdrom_root_table[] = {
-    {
-        .procname = "dev",
-        .maxlen   = 0,
-        .mode     = 0555,
-        .child    = cdrom_cdrom_table,
-    },
-    { }
-};
 static struct ctl_table_header *cdrom_sysctl_header;

 static void cdrom_sysctl_register(void)
···
     if (!atomic_add_unless(&initialized, 1, 1))
         return;

-    cdrom_sysctl_header = register_sysctl_table(cdrom_root_table);
+    cdrom_sysctl_header = register_sysctl("dev/cdrom", cdrom_table);

     /* set the defaults */
     cdrom_sysctl_settings.autoclose = autoclose;
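This cdrom change, like the hpet, mac_hid, i915 and sg changes elsewhere in the series, drops hand-built parent tables (chained through .child) in favor of register_sysctl(), which takes the directory path directly. A minimal sketch of the new-style registration, with illustrative names only:

    #include <linux/sysctl.h>

    static int example_val;

    static struct ctl_table example_table[] = {
        {
            .procname     = "example",
            .data         = &example_val,
            .maxlen       = sizeof(int),
            .mode         = 0644,
            .proc_handler = proc_dointvec,
        },
        { }
    };

    static struct ctl_table_header *example_hdr;

    /* init: creates /proc/sys/dev/example/example */
    /* example_hdr = register_sysctl("dev/example", example_table); */
    /* exit: unregister_sysctl_table(example_hdr); */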
+1 -21
drivers/char/hpet.c
     {}
 };

-static struct ctl_table hpet_root[] = {
-    {
-        .procname = "hpet",
-        .maxlen   = 0,
-        .mode     = 0555,
-        .child    = hpet_table,
-    },
-    {}
-};
-
-static struct ctl_table dev_root[] = {
-    {
-        .procname = "dev",
-        .maxlen   = 0,
-        .mode     = 0555,
-        .child    = hpet_root,
-    },
-    {}
-};
-
 static struct ctl_table_header *sysctl_header;

 /*
···
     if (result < 0)
         return -ENODEV;

-    sysctl_header = register_sysctl_table(dev_root);
+    sysctl_header = register_sysctl("dev/hpet", hpet_table);

     result = acpi_bus_register_driver(&hpet_acpi_driver);
     if (result < 0) {
+12 -2
drivers/char/random.c
 }

 static int sysctl_poolsize = POOL_BITS;
-extern struct ctl_table random_table[];
-struct ctl_table random_table[] = {
+static struct ctl_table random_table[] = {
     {
         .procname = "poolsize",
         .data     = &sysctl_poolsize,
···
 #endif
     { }
 };
+
+/*
+ * rand_initialize() is called before sysctl_init(),
+ * so we cannot call register_sysctl_init() in rand_initialize()
+ */
+static int __init random_sysctls_init(void)
+{
+    register_sysctl_init("kernel/random", random_table);
+    return 0;
+}
+device_initcall(random_sysctls_init);
 #endif /* CONFIG_SYSCTL */

 struct batched_entropy {
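random.c uses the companion helper for tables that are never unregistered: register_sysctl_init(), called from an initcall that runs after the sysctl core has set up its base directories. A stripped-down sketch of that pattern (table contents elided, names illustrative; treat the details as assumptions):

    static struct ctl_table my_table[] = {
        /* entries elided */
        { }
    };

    static int __init my_sysctls_init(void)
    {
        /* void return; boot-time tables stay registered forever */
        register_sysctl_init("kernel/my_feature", my_table);
        return 0;
    }
    device_initcall(my_sysctls_init);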
+1
drivers/gpu/drm/drm_dp_mst_topology.c
     mutex_init(&mgr->probe_lock);
 #if IS_ENABLED(CONFIG_DRM_DEBUG_DP_MST_TOPOLOGY_REFS)
     mutex_init(&mgr->topology_ref_history_lock);
+    stack_depot_init();
 #endif
     INIT_LIST_HEAD(&mgr->tx_msg_downq);
     INIT_LIST_HEAD(&mgr->destroy_port_list);
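This hunk, and the drm_mm, drm_modeset_lock and i915 hunks that follow, exist because stack_depot_save() no longer allocates its hash table unconditionally: users that want depot storage (when CONFIG_STACKDEPOT_ALWAYS_INIT is not set) now call stack_depot_init() once from their own init paths. A hedged sketch of the save/fetch pattern these call sites feed into:

    #include <linux/stackdepot.h>
    #include <linux/stacktrace.h>

    static depot_stack_handle_t save_current_stack(void)
    {
        unsigned long entries[16];
        unsigned int nr;

        nr = stack_trace_save(entries, ARRAY_SIZE(entries), 0);
        /* per this series, filter_irq_stacks() now runs inside here */
        return stack_depot_save(entries, nr, GFP_KERNEL);
    }

    static void print_saved_stack(depot_stack_handle_t handle)
    {
        unsigned long *entries;
        unsigned int nr = stack_depot_fetch(handle, &entries);

        stack_trace_print(entries, nr, 0);
    }

    /* at feature init: stack_depot_init(); */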
+4
drivers/gpu/drm/drm_mm.c
     add_hole(&mm->head_node);

     mm->scan_active = 0;
+
+#ifdef CONFIG_DRM_DEBUG_MM
+    stack_depot_init();
+#endif
 }
 EXPORT_SYMBOL(drm_mm_init);
+9
drivers/gpu/drm/drm_modeset_lock.c
     kfree(buf);
 }
+
+static void __drm_stack_depot_init(void)
+{
+    stack_depot_init();
+}
 #else /* CONFIG_DRM_DEBUG_MODESET_LOCK */
 static depot_stack_handle_t __drm_stack_depot_save(void)
 {
     return 0;
 }
 static void __drm_stack_depot_print(depot_stack_handle_t stack_depot)
+{
+}
+static void __drm_stack_depot_init(void)
 {
 }
 #endif /* CONFIG_DRM_DEBUG_MODESET_LOCK */
···
 {
     ww_mutex_init(&lock->mutex, &crtc_ww_class);
     INIT_LIST_HEAD(&lock->head);
+    __drm_stack_depot_init();
 }
 EXPORT_SYMBOL(drm_modeset_lock_init);
+1 -21
drivers/gpu/drm/i915/i915_perf.c
     {}
 };

-static struct ctl_table i915_root[] = {
-    {
-        .procname = "i915",
-        .maxlen   = 0,
-        .mode     = 0555,
-        .child    = oa_table,
-    },
-    {}
-};
-
-static struct ctl_table dev_root[] = {
-    {
-        .procname = "dev",
-        .maxlen   = 0,
-        .mode     = 0555,
-        .child    = i915_root,
-    },
-    {}
-};
-
 static void oa_init_supported_formats(struct i915_perf *perf)
 {
     struct drm_i915_private *i915 = perf->i915;
···
 int i915_perf_sysctl_register(void)
 {
-    sysctl_header = register_sysctl_table(dev_root);
+    sysctl_header = register_sysctl("dev/i915", oa_table);
     return 0;
 }
+3
drivers/gpu/drm/i915/intel_runtime_pm.c
 static void init_intel_runtime_pm_wakeref(struct intel_runtime_pm *rpm)
 {
     spin_lock_init(&rpm->debug.lock);
+
+    if (rpm->available)
+        stack_depot_init();
 }

 static noinline depot_stack_handle_t
+2 -2
drivers/hwmon/dell-smm-hwmon.c
 static long i8k_ioctl(struct file *fp, unsigned int cmd, unsigned long arg)
 {
-    struct dell_smm_data *data = PDE_DATA(file_inode(fp));
+    struct dell_smm_data *data = pde_data(file_inode(fp));
     int __user *argp = (int __user *)arg;
     int speed, err;
     int val = 0;
···
 static int i8k_open_fs(struct inode *inode, struct file *file)
 {
-    return single_open(file, i8k_proc_show, PDE_DATA(inode));
+    return single_open(file, i8k_proc_show, pde_data(inode));
 }

 static const struct proc_ops i8k_proc_ops = {
+1 -23
drivers/macintosh/mac_hid.c
     { }
 };

-/* dir in /proc/sys/dev */
-static struct ctl_table mac_hid_dir[] = {
-    {
-        .procname = "mac_hid",
-        .maxlen   = 0,
-        .mode     = 0555,
-        .child    = mac_hid_files,
-    },
-    { }
-};
-
-/* /proc/sys/dev itself, in case that is not there yet */
-static struct ctl_table mac_hid_root_dir[] = {
-    {
-        .procname = "dev",
-        .maxlen   = 0,
-        .mode     = 0555,
-        .child    = mac_hid_dir,
-    },
-    { }
-};
-
 static struct ctl_table_header *mac_hid_sysctl_header;

 static int __init mac_hid_init(void)
 {
-    mac_hid_sysctl_header = register_sysctl_table(mac_hid_root_dir);
+    mac_hid_sysctl_header = register_sysctl("dev/mac_hid", mac_hid_files);
     if (!mac_hid_sysctl_header)
         return -ENOMEM;
+4 -4
drivers/net/bonding/bond_procfs.c
 static void *bond_info_seq_start(struct seq_file *seq, loff_t *pos)
     __acquires(RCU)
 {
-    struct bonding *bond = PDE_DATA(file_inode(seq->file));
+    struct bonding *bond = pde_data(file_inode(seq->file));
     struct list_head *iter;
     struct slave *slave;
     loff_t off = 0;
···
 static void *bond_info_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 {
-    struct bonding *bond = PDE_DATA(file_inode(seq->file));
+    struct bonding *bond = pde_data(file_inode(seq->file));
     struct list_head *iter;
     struct slave *slave;
     bool found = false;
···
 static void bond_info_show_master(struct seq_file *seq)
 {
-    struct bonding *bond = PDE_DATA(file_inode(seq->file));
+    struct bonding *bond = pde_data(file_inode(seq->file));
     const struct bond_opt_value *optval;
     struct slave *curr, *primary;
     int i;
···
 static void bond_info_show_slave(struct seq_file *seq,
                                  const struct slave *slave)
 {
-    struct bonding *bond = PDE_DATA(file_inode(seq->file));
+    struct bonding *bond = pde_data(file_inode(seq->file));

     seq_printf(seq, "\nSlave Interface: %s\n", slave->dev->name);
     seq_printf(seq, "MII Status: %s\n", bond_slave_link_status(slave->link));
+11 -11
drivers/net/wireless/cisco/airo.c
 static int proc_status_open(struct inode *inode, struct file *file)
 {
     struct proc_data *data;
-    struct net_device *dev = PDE_DATA(inode);
+    struct net_device *dev = pde_data(inode);
     struct airo_info *apriv = dev->ml_priv;
     CapabilityRid cap_rid;
     StatusRid status_rid;
···
                           u16 rid)
 {
     struct proc_data *data;
-    struct net_device *dev = PDE_DATA(inode);
+    struct net_device *dev = pde_data(inode);
     struct airo_info *apriv = dev->ml_priv;
     StatsRid stats;
     int i, j;
···
 static void proc_config_on_close(struct inode *inode, struct file *file)
 {
     struct proc_data *data = file->private_data;
-    struct net_device *dev = PDE_DATA(inode);
+    struct net_device *dev = pde_data(inode);
     struct airo_info *ai = dev->ml_priv;
     char *line;
···
 static int proc_config_open(struct inode *inode, struct file *file)
 {
     struct proc_data *data;
-    struct net_device *dev = PDE_DATA(inode);
+    struct net_device *dev = pde_data(inode);
     struct airo_info *ai = dev->ml_priv;
     int i;
     __le16 mode;
···
 static void proc_SSID_on_close(struct inode *inode, struct file *file)
 {
     struct proc_data *data = file->private_data;
-    struct net_device *dev = PDE_DATA(inode);
+    struct net_device *dev = pde_data(inode);
     struct airo_info *ai = dev->ml_priv;
     SsidRid SSID_rid;
     int i;
···
 static void proc_APList_on_close(struct inode *inode, struct file *file)
 {
     struct proc_data *data = file->private_data;
-    struct net_device *dev = PDE_DATA(inode);
+    struct net_device *dev = pde_data(inode);
     struct airo_info *ai = dev->ml_priv;
     APListRid *APList_rid = &ai->APList;
     int i;
···
 static void proc_wepkey_on_close(struct inode *inode, struct file *file)
 {
     struct proc_data *data;
-    struct net_device *dev = PDE_DATA(inode);
+    struct net_device *dev = pde_data(inode);
     struct airo_info *ai = dev->ml_priv;
     int i, rc;
     char key[16];
···
 static int proc_wepkey_open(struct inode *inode, struct file *file)
 {
     struct proc_data *data;
-    struct net_device *dev = PDE_DATA(inode);
+    struct net_device *dev = pde_data(inode);
     struct airo_info *ai = dev->ml_priv;
     char *ptr;
     WepKeyRid wkr;
···
 static int proc_SSID_open(struct inode *inode, struct file *file)
 {
     struct proc_data *data;
-    struct net_device *dev = PDE_DATA(inode);
+    struct net_device *dev = pde_data(inode);
     struct airo_info *ai = dev->ml_priv;
     int i;
     char *ptr;
···
 static int proc_APList_open(struct inode *inode, struct file *file)
 {
     struct proc_data *data;
-    struct net_device *dev = PDE_DATA(inode);
+    struct net_device *dev = pde_data(inode);
     struct airo_info *ai = dev->ml_priv;
     int i;
     char *ptr;
···
 static int proc_BSSList_open(struct inode *inode, struct file *file)
 {
     struct proc_data *data;
-    struct net_device *dev = PDE_DATA(inode);
+    struct net_device *dev = pde_data(inode);
     struct airo_info *ai = dev->ml_priv;
     char *ptr;
     BSSListRid BSSList_rid;
+8 -8
drivers/net/wireless/intersil/hostap/hostap_ap.c
 #if !defined(PRISM2_NO_PROCFS_DEBUG) && defined(CONFIG_PROC_FS)
 static int ap_debug_proc_show(struct seq_file *m, void *v)
 {
-    struct ap_data *ap = PDE_DATA(file_inode(m->file));
+    struct ap_data *ap = pde_data(file_inode(m->file));

     seq_printf(m, "BridgedUnicastFrames=%u\n", ap->bridged_unicast);
     seq_printf(m, "BridgedMulticastFrames=%u\n", ap->bridged_multicast);
···
 static int ap_control_proc_show(struct seq_file *m, void *v)
 {
-    struct ap_data *ap = PDE_DATA(file_inode(m->file));
+    struct ap_data *ap = pde_data(file_inode(m->file));
     char *policy_txt;
     struct mac_entry *entry;
···
 static void *ap_control_proc_start(struct seq_file *m, loff_t *_pos)
 {
-    struct ap_data *ap = PDE_DATA(file_inode(m->file));
+    struct ap_data *ap = pde_data(file_inode(m->file));
     spin_lock_bh(&ap->mac_restrictions.lock);
     return seq_list_start_head(&ap->mac_restrictions.mac_list, *_pos);
 }

 static void *ap_control_proc_next(struct seq_file *m, void *v, loff_t *_pos)
 {
-    struct ap_data *ap = PDE_DATA(file_inode(m->file));
+    struct ap_data *ap = pde_data(file_inode(m->file));
     return seq_list_next(v, &ap->mac_restrictions.mac_list, _pos);
 }

 static void ap_control_proc_stop(struct seq_file *m, void *v)
 {
-    struct ap_data *ap = PDE_DATA(file_inode(m->file));
+    struct ap_data *ap = pde_data(file_inode(m->file));
     spin_unlock_bh(&ap->mac_restrictions.lock);
 }
···
 static void *prism2_ap_proc_start(struct seq_file *m, loff_t *_pos)
 {
-    struct ap_data *ap = PDE_DATA(file_inode(m->file));
+    struct ap_data *ap = pde_data(file_inode(m->file));
     spin_lock_bh(&ap->sta_table_lock);
     return seq_list_start_head(&ap->sta_list, *_pos);
 }

 static void *prism2_ap_proc_next(struct seq_file *m, void *v, loff_t *_pos)
 {
-    struct ap_data *ap = PDE_DATA(file_inode(m->file));
+    struct ap_data *ap = pde_data(file_inode(m->file));
     return seq_list_next(v, &ap->sta_list, _pos);
 }

 static void prism2_ap_proc_stop(struct seq_file *m, void *v)
 {
-    struct ap_data *ap = PDE_DATA(file_inode(m->file));
+    struct ap_data *ap = pde_data(file_inode(m->file));
     spin_unlock_bh(&ap->sta_table_lock);
 }
+1 -1
drivers/net/wireless/intersil/hostap/hostap_download.c
                    sizeof(struct prism2_download_aux_dump));
     if (ret == 0) {
         struct seq_file *m = file->private_data;
-        m->private = PDE_DATA(inode);
+        m->private = pde_data(inode);
     }
     return ret;
 }
+12 -12
drivers/net/wireless/intersil/hostap/hostap_proc.c
 static void *prism2_wds_proc_start(struct seq_file *m, loff_t *_pos)
 {
-    local_info_t *local = PDE_DATA(file_inode(m->file));
+    local_info_t *local = pde_data(file_inode(m->file));
     read_lock_bh(&local->iface_lock);
     return seq_list_start(&local->hostap_interfaces, *_pos);
 }

 static void *prism2_wds_proc_next(struct seq_file *m, void *v, loff_t *_pos)
 {
-    local_info_t *local = PDE_DATA(file_inode(m->file));
+    local_info_t *local = pde_data(file_inode(m->file));
     return seq_list_next(v, &local->hostap_interfaces, _pos);
 }

 static void prism2_wds_proc_stop(struct seq_file *m, void *v)
 {
-    local_info_t *local = PDE_DATA(file_inode(m->file));
+    local_info_t *local = pde_data(file_inode(m->file));
     read_unlock_bh(&local->iface_lock);
 }
···
 static int prism2_bss_list_proc_show(struct seq_file *m, void *v)
 {
-    local_info_t *local = PDE_DATA(file_inode(m->file));
+    local_info_t *local = pde_data(file_inode(m->file));
     struct list_head *ptr = v;
     struct hostap_bss_info *bss;
···
 static void *prism2_bss_list_proc_start(struct seq_file *m, loff_t *_pos)
     __acquires(&local->lock)
 {
-    local_info_t *local = PDE_DATA(file_inode(m->file));
+    local_info_t *local = pde_data(file_inode(m->file));
     spin_lock_bh(&local->lock);
     return seq_list_start_head(&local->bss_list, *_pos);
 }

 static void *prism2_bss_list_proc_next(struct seq_file *m, void *v, loff_t *_pos)
 {
-    local_info_t *local = PDE_DATA(file_inode(m->file));
+    local_info_t *local = pde_data(file_inode(m->file));
     return seq_list_next(v, &local->bss_list, _pos);
 }

 static void prism2_bss_list_proc_stop(struct seq_file *m, void *v)
     __releases(&local->lock)
 {
-    local_info_t *local = PDE_DATA(file_inode(m->file));
+    local_info_t *local = pde_data(file_inode(m->file));
     spin_unlock_bh(&local->lock);
 }
···
 static ssize_t prism2_pda_proc_read(struct file *file, char __user *buf,
                                     size_t count, loff_t *_pos)
 {
-    local_info_t *local = PDE_DATA(file_inode(file));
+    local_info_t *local = pde_data(file_inode(file));
     size_t off;

     if (local->pda == NULL || *_pos >= PRISM2_PDA_SIZE)
···
 #ifndef PRISM2_NO_STATION_MODES
 static int prism2_scan_results_proc_show(struct seq_file *m, void *v)
 {
-    local_info_t *local = PDE_DATA(file_inode(m->file));
+    local_info_t *local = pde_data(file_inode(m->file));
     unsigned long entry;
     int i, len;
     struct hfa384x_hostscan_result *scanres;
···
 static void *prism2_scan_results_proc_start(struct seq_file *m, loff_t *_pos)
 {
-    local_info_t *local = PDE_DATA(file_inode(m->file));
+    local_info_t *local = pde_data(file_inode(m->file));
     spin_lock_bh(&local->lock);

     /* We have a header (pos 0) + N results to show (pos 1...N) */
···
 static void *prism2_scan_results_proc_next(struct seq_file *m, void *v, loff_t *_pos)
 {
-    local_info_t *local = PDE_DATA(file_inode(m->file));
+    local_info_t *local = pde_data(file_inode(m->file));

     ++*_pos;
     if (*_pos > local->last_scan_results_count)
···
 static void prism2_scan_results_proc_stop(struct seq_file *m, void *v)
 {
-    local_info_t *local = PDE_DATA(file_inode(m->file));
+    local_info_t *local = pde_data(file_inode(m->file));
     spin_unlock_bh(&local->lock);
 }
+1 -1
drivers/net/wireless/ray_cs.c
         nr = nr * 10 + c;
         p++;
     } while (--len);
-    *(int *)PDE_DATA(file_inode(file)) = nr;
+    *(int *)pde_data(file_inode(file)) = nr;
     return count;
 }
+18 -18
drivers/nubus/proc.c
 static struct nubus_proc_pde_data *
 nubus_proc_alloc_pde_data(unsigned char *ptr, unsigned int size)
 {
-    struct nubus_proc_pde_data *pde_data;
+    struct nubus_proc_pde_data *pded;

-    pde_data = kmalloc(sizeof(*pde_data), GFP_KERNEL);
-    if (!pde_data)
+    pded = kmalloc(sizeof(*pded), GFP_KERNEL);
+    if (!pded)
         return NULL;

-    pde_data->res_ptr = ptr;
-    pde_data->res_size = size;
-    return pde_data;
+    pded->res_ptr = ptr;
+    pded->res_size = size;
+    return pded;
 }

 static int nubus_proc_rsrc_show(struct seq_file *m, void *v)
 {
     struct inode *inode = m->private;
-    struct nubus_proc_pde_data *pde_data;
+    struct nubus_proc_pde_data *pded;

-    pde_data = PDE_DATA(inode);
-    if (!pde_data)
+    pded = pde_data(inode);
+    if (!pded)
         return 0;

-    if (pde_data->res_size > m->size)
+    if (pded->res_size > m->size)
         return -EFBIG;

-    if (pde_data->res_size) {
+    if (pded->res_size) {
         int lanes = (int)proc_get_parent_data(inode);
         struct nubus_dirent ent;
···
             return 0;

         ent.mask = lanes;
-        ent.base = pde_data->res_ptr;
+        ent.base = pded->res_ptr;
         ent.data = 0;
-        nubus_seq_write_rsrc_mem(m, &ent, pde_data->res_size);
+        nubus_seq_write_rsrc_mem(m, &ent, pded->res_size);
     } else {
-        unsigned int data = (unsigned int)pde_data->res_ptr;
+        unsigned int data = (unsigned int)pded->res_ptr;

         seq_putc(m, data >> 16);
         seq_putc(m, data >> 8);
···
                                unsigned int size)
 {
     char name[9];
-    struct nubus_proc_pde_data *pde_data;
+    struct nubus_proc_pde_data *pded;

     if (!procdir)
         return;

     snprintf(name, sizeof(name), "%x", ent->type);
     if (size)
-        pde_data = nubus_proc_alloc_pde_data(nubus_dirptr(ent), size);
+        pded = nubus_proc_alloc_pde_data(nubus_dirptr(ent), size);
     else
-        pde_data = NULL;
+        pded = NULL;
     proc_create_single_data(name, S_IFREG | 0444, procdir,
-                nubus_proc_rsrc_show, pde_data);
+                nubus_proc_rsrc_show, pded);
 }

 void nubus_proc_add_rsrc(struct proc_dir_entry *procdir,
+2 -2
drivers/parisc/led.c
 static int led_proc_open(struct inode *inode, struct file *file)
 {
-    return single_open(file, led_proc_show, PDE_DATA(inode));
+    return single_open(file, led_proc_show, pde_data(inode));
 }


 static ssize_t led_proc_write(struct file *file, const char __user *buf,
     size_t count, loff_t *pos)
 {
-    void *data = PDE_DATA(file_inode(file));
+    void *data = pde_data(file_inode(file));
     char *cur, lbuf[32];
     int d;
+5 -5
drivers/pci/proc.c
 static loff_t proc_bus_pci_lseek(struct file *file, loff_t off, int whence)
 {
-    struct pci_dev *dev = PDE_DATA(file_inode(file));
+    struct pci_dev *dev = pde_data(file_inode(file));
     return fixed_size_llseek(file, off, whence, dev->cfg_size);
 }

 static ssize_t proc_bus_pci_read(struct file *file, char __user *buf,
                                  size_t nbytes, loff_t *ppos)
 {
-    struct pci_dev *dev = PDE_DATA(file_inode(file));
+    struct pci_dev *dev = pde_data(file_inode(file));
     unsigned int pos = *ppos;
     unsigned int cnt, size;
···
                                   size_t nbytes, loff_t *ppos)
 {
     struct inode *ino = file_inode(file);
-    struct pci_dev *dev = PDE_DATA(ino);
+    struct pci_dev *dev = pde_data(ino);
     int pos = *ppos;
     int size = dev->cfg_size;
     int cnt, ret;
···
 static long proc_bus_pci_ioctl(struct file *file, unsigned int cmd,
                                unsigned long arg)
 {
-    struct pci_dev *dev = PDE_DATA(file_inode(file));
+    struct pci_dev *dev = pde_data(file_inode(file));
 #ifdef HAVE_PCI_MMAP
     struct pci_filp_private *fpriv = file->private_data;
 #endif /* HAVE_PCI_MMAP */
···
 #ifdef HAVE_PCI_MMAP
 static int proc_bus_pci_mmap(struct file *file, struct vm_area_struct *vma)
 {
-    struct pci_dev *dev = PDE_DATA(file_inode(file));
+    struct pci_dev *dev = pde_data(file_inode(file));
     struct pci_filp_private *fpriv = file->private_data;
     int i, ret, write_combine = 0, res_bit = IORESOURCE_MEM;
+2 -2
drivers/platform/x86/thinkpad_acpi.c
 static int dispatch_proc_open(struct inode *inode, struct file *file)
 {
-    return single_open(file, dispatch_proc_show, PDE_DATA(inode));
+    return single_open(file, dispatch_proc_show, pde_data(inode));
 }

 static ssize_t dispatch_proc_write(struct file *file,
             const char __user *userbuf,
             size_t count, loff_t *pos)
 {
-    struct ibm_struct *ibm = PDE_DATA(file_inode(file));
+    struct ibm_struct *ibm = pde_data(file_inode(file));
     char *kernbuf;
     int ret;
+8 -8
drivers/platform/x86/toshiba_acpi.c
 static int lcd_proc_open(struct inode *inode, struct file *file)
 {
-    return single_open(file, lcd_proc_show, PDE_DATA(inode));
+    return single_open(file, lcd_proc_show, pde_data(inode));
 }

 static int set_lcd_brightness(struct toshiba_acpi_dev *dev, int value)
···
 static ssize_t lcd_proc_write(struct file *file, const char __user *buf,
                               size_t count, loff_t *pos)
 {
-    struct toshiba_acpi_dev *dev = PDE_DATA(file_inode(file));
+    struct toshiba_acpi_dev *dev = pde_data(file_inode(file));
     char cmd[42];
     size_t len;
     int levels;
···
 static int video_proc_open(struct inode *inode, struct file *file)
 {
-    return single_open(file, video_proc_show, PDE_DATA(inode));
+    return single_open(file, video_proc_show, pde_data(inode));
 }

 static ssize_t video_proc_write(struct file *file, const char __user *buf,
                                 size_t count, loff_t *pos)
 {
-    struct toshiba_acpi_dev *dev = PDE_DATA(file_inode(file));
+    struct toshiba_acpi_dev *dev = pde_data(file_inode(file));
     char *buffer;
     char *cmd;
     int lcd_out = -1, crt_out = -1, tv_out = -1;
···
 static int fan_proc_open(struct inode *inode, struct file *file)
 {
-    return single_open(file, fan_proc_show, PDE_DATA(inode));
+    return single_open(file, fan_proc_show, pde_data(inode));
 }

 static ssize_t fan_proc_write(struct file *file, const char __user *buf,
                               size_t count, loff_t *pos)
 {
-    struct toshiba_acpi_dev *dev = PDE_DATA(file_inode(file));
+    struct toshiba_acpi_dev *dev = pde_data(file_inode(file));
     char cmd[42];
     size_t len;
     int value;
···
 static int keys_proc_open(struct inode *inode, struct file *file)
 {
-    return single_open(file, keys_proc_show, PDE_DATA(inode));
+    return single_open(file, keys_proc_show, pde_data(inode));
 }

 static ssize_t keys_proc_write(struct file *file, const char __user *buf,
                                size_t count, loff_t *pos)
 {
-    struct toshiba_acpi_dev *dev = PDE_DATA(file_inode(file));
+    struct toshiba_acpi_dev *dev = pde_data(file_inode(file));
     char cmd[42];
     size_t len;
     int value;
+1 -1
drivers/pnp/isapnp/proc.c
··· 22 22 static ssize_t isapnp_proc_bus_read(struct file *file, char __user * buf, 23 23 size_t nbytes, loff_t * ppos) 24 24 { 25 - struct pnp_dev *dev = PDE_DATA(file_inode(file)); 25 + struct pnp_dev *dev = pde_data(file_inode(file)); 26 26 int pos = *ppos; 27 27 int cnt, size = 256; 28 28
+2 -2
drivers/pnp/pnpbios/proc.c
··· 173 173 174 174 static int pnpbios_proc_open(struct inode *inode, struct file *file) 175 175 { 176 - return single_open(file, pnpbios_proc_show, PDE_DATA(inode)); 176 + return single_open(file, pnpbios_proc_show, pde_data(inode)); 177 177 } 178 178 179 179 static ssize_t pnpbios_proc_write(struct file *file, const char __user *buf, 180 180 size_t count, loff_t *pos) 181 181 { 182 - void *data = PDE_DATA(file_inode(file)); 182 + void *data = pde_data(file_inode(file)); 183 183 struct pnp_bios_node *node; 184 184 int boot = (long)data >> 8; 185 185 u8 nodenum = (long)data;
+2 -2
drivers/scsi/scsi_proc.c
··· 49 49 static ssize_t proc_scsi_host_write(struct file *file, const char __user *buf, 50 50 size_t count, loff_t *ppos) 51 51 { 52 - struct Scsi_Host *shost = PDE_DATA(file_inode(file)); 52 + struct Scsi_Host *shost = pde_data(file_inode(file)); 53 53 ssize_t ret = -ENOMEM; 54 54 char *page; 55 55 ··· 79 79 80 80 static int proc_scsi_host_open(struct inode *inode, struct file *file) 81 81 { 82 - return single_open_size(file, proc_scsi_show, PDE_DATA(inode), 82 + return single_open_size(file, proc_scsi_show, pde_data(inode), 83 83 4 * PAGE_SIZE); 84 84 } 85 85
+34 -1
drivers/scsi/sg.c
··· 77 77 
78 78 #define SG_DEFAULT_TIMEOUT mult_frac(SG_DEFAULT_TIMEOUT_USER, HZ, USER_HZ) 
79 79 
80 - int sg_big_buff = SG_DEF_RESERVED_SIZE; 
80 + static int sg_big_buff = SG_DEF_RESERVED_SIZE; 
81 81 /* N.B. This variable is readable and writeable via 
82 82 /proc/scsi/sg/def_reserved_size . Each time sg_open() is called a buffer 
83 83 of this size (or less if there is not enough memory) will be reserved 
··· 1634 1634 MODULE_PARM_DESC(def_reserved_size, "size of buffer reserved for each fd"); 
1635 1635 MODULE_PARM_DESC(allow_dio, "allow direct I/O (default: 0 (disallow))"); 
1636 1636 
1637 + #ifdef CONFIG_SYSCTL 
1638 + #include <linux/sysctl.h> 
1639 + 
1640 + static struct ctl_table sg_sysctls[] = { 
1641 + { 
1642 + .procname = "sg-big-buff", 
1643 + .data = &sg_big_buff, 
1644 + .maxlen = sizeof(int), 
1645 + .mode = 0444, 
1646 + .proc_handler = proc_dointvec, 
1647 + }, 
1648 + {} 
1649 + }; 
1650 + 
1651 + static struct ctl_table_header *hdr; 
1652 + static void register_sg_sysctls(void) 
1653 + { 
1654 + if (!hdr) 
1655 + hdr = register_sysctl("kernel", sg_sysctls); 
1656 + } 
1657 + 
1658 + static void unregister_sg_sysctls(void) 
1659 + { 
1660 + if (hdr) 
1661 + unregister_sysctl_table(hdr); 
1662 + } 
1663 + #else 
1664 + #define register_sg_sysctls() do { } while (0) 
1665 + #define unregister_sg_sysctls() do { } while (0) 
1666 + #endif /* CONFIG_SYSCTL */ 
1667 + 
1637 1668 static int __init 
1638 1669 init_sg(void) 
1639 1670 { 
··· 1666 + register_sg_sysctls(); 
1697 1667 return 0; 
1698 1668 } 
1699 1669 class_destroy(sg_sysfs_class); 
1700 1670 err_out: 
1701 1671 unregister_chrdev_region(MKDEV(SCSI_GENERIC_MAJOR, 0), SG_MAX_DEVS); 
1702 1672 return rc; 
··· 1706 1674 static void __exit 
1707 1675 exit_sg(void) 
1708 1676 { 
1677 + unregister_sg_sysctls(); 
1709 1678 #ifdef CONFIG_SCSI_PROC_FS 
1710 1679 remove_proc_subtree("scsi/sg", NULL); 
1711 1680 #endif /* CONFIG_SCSI_PROC_FS */ 
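(Note the register_sg_sysctls() call is placed on the success path of init_sg(); registering it only after class_destroy() in the failure path would mean the sysctl never appears on a working system.) The sg hunk is also the template this series applies everywhere a sysctl leaves kernel/sysctl.c: the owning code declares its own sentinel-terminated ctl_table, registers it under the existing base directory, and, for modules, unregisters on exit. A stripped-down sketch with hypothetical names:

#include <linux/module.h>
#include <linux/sysctl.h>

static int example_val;		/* hypothetical tunable */

static struct ctl_table example_sysctls[] = {
	{
		.procname	= "example-val",	/* /proc/sys/kernel/example-val */
		.data		= &example_val,
		.maxlen		= sizeof(int),
		.mode		= 0644,
		.proc_handler	= proc_dointvec,
	},
	{}	/* empty entry terminates the table */
};

static struct ctl_table_header *example_hdr;

static int __init example_init(void)
{
	/* register_sysctl() returns NULL on failure */
	example_hdr = register_sysctl("kernel", example_sysctls);
	return example_hdr ? 0 : -ENOMEM;
}

static void __exit example_exit(void)
{
	unregister_sysctl_table(example_hdr);
}

module_init(example_init);
module_exit(example_exit);
MODULE_LICENSE("GPL");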
+2 -2
drivers/usb/gadget/function/rndis.c
··· 1117 1117 static ssize_t rndis_proc_write(struct file *file, const char __user *buffer, 1118 1118 size_t count, loff_t *ppos) 1119 1119 { 1120 - rndis_params *p = PDE_DATA(file_inode(file)); 1120 + rndis_params *p = pde_data(file_inode(file)); 1121 1121 u32 speed = 0; 1122 1122 int i, fl_speed = 0; 1123 1123 ··· 1161 1161 1162 1162 static int rndis_proc_open(struct inode *inode, struct file *file) 1163 1163 { 1164 - return single_open(file, rndis_proc_show, PDE_DATA(inode)); 1164 + return single_open(file, rndis_proc_show, pde_data(inode)); 1165 1165 } 1166 1166 1167 1167 static const struct proc_ops rndis_proc_ops = {
+1 -1
drivers/zorro/proc.c
··· 30 30 static ssize_t 31 31 proc_bus_zorro_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos) 32 32 { 33 - struct zorro_dev *z = PDE_DATA(file_inode(file)); 33 + struct zorro_dev *z = pde_data(file_inode(file)); 34 34 struct ConfigDev cd; 35 35 loff_t pos = *ppos; 36 36
+2
fs/Makefile
··· 6 6 # Rewritten to use lists instead of if-statements. 7 7 # 8 8 9 + obj-$(CONFIG_SYSCTL) += sysctls.o 10 + 9 11 obj-y := open.o read_write.o file_table.o super.o \ 10 12 char_dev.o stat.o exec.o pipe.o namei.o fcntl.o \ 11 13 ioctl.o readdir.o select.o dcache.o inode.o \
+3 -3
fs/afs/proc.c
··· 227 227 static void *afs_proc_cell_volumes_start(struct seq_file *m, loff_t *_pos) 228 228 __acquires(cell->proc_lock) 229 229 { 230 - struct afs_cell *cell = PDE_DATA(file_inode(m->file)); 230 + struct afs_cell *cell = pde_data(file_inode(m->file)); 231 231 232 232 rcu_read_lock(); 233 233 return seq_hlist_start_head_rcu(&cell->proc_volumes, *_pos); ··· 236 236 static void *afs_proc_cell_volumes_next(struct seq_file *m, void *v, 237 237 loff_t *_pos) 238 238 { 239 - struct afs_cell *cell = PDE_DATA(file_inode(m->file)); 239 + struct afs_cell *cell = pde_data(file_inode(m->file)); 240 240 241 241 return seq_hlist_next_rcu(v, &cell->proc_volumes, _pos); 242 242 } ··· 322 322 { 323 323 struct afs_vl_seq_net_private *priv = m->private; 324 324 struct afs_vlserver_list *vllist; 325 - struct afs_cell *cell = PDE_DATA(file_inode(m->file)); 325 + struct afs_cell *cell = pde_data(file_inode(m->file)); 326 326 loff_t pos = *_pos; 327 327 328 328 rcu_read_lock();
+29 -2
fs/aio.c
··· 220 220 221 221 /*------ sysctl variables----*/ 222 222 static DEFINE_SPINLOCK(aio_nr_lock); 223 - unsigned long aio_nr; /* current system wide number of aio requests */ 224 - unsigned long aio_max_nr = 0x10000; /* system wide maximum number of aio requests */ 223 + static unsigned long aio_nr; /* current system wide number of aio requests */ 224 + static unsigned long aio_max_nr = 0x10000; /* system wide maximum number of aio requests */ 225 225 /*----end sysctl variables---*/ 226 + #ifdef CONFIG_SYSCTL 227 + static struct ctl_table aio_sysctls[] = { 228 + { 229 + .procname = "aio-nr", 230 + .data = &aio_nr, 231 + .maxlen = sizeof(aio_nr), 232 + .mode = 0444, 233 + .proc_handler = proc_doulongvec_minmax, 234 + }, 235 + { 236 + .procname = "aio-max-nr", 237 + .data = &aio_max_nr, 238 + .maxlen = sizeof(aio_max_nr), 239 + .mode = 0644, 240 + .proc_handler = proc_doulongvec_minmax, 241 + }, 242 + {} 243 + }; 244 + 245 + static void __init aio_sysctl_init(void) 246 + { 247 + register_sysctl_init("fs", aio_sysctls); 248 + } 249 + #else 250 + #define aio_sysctl_init() do { } while (0) 251 + #endif 226 252 227 253 static struct kmem_cache *kiocb_cachep; 228 254 static struct kmem_cache *kioctx_cachep; ··· 301 275 302 276 kiocb_cachep = KMEM_CACHE(aio_kiocb, SLAB_HWCACHE_ALIGN|SLAB_PANIC); 303 277 kioctx_cachep = KMEM_CACHE(kioctx,SLAB_HWCACHE_ALIGN|SLAB_PANIC); 278 + aio_sysctl_init(); 304 279 return 0; 305 280 } 306 281 __initcall(aio_setup);
+5 -1
fs/binfmt_misc.c
··· 822 822 int err = register_filesystem(&bm_fs_type); 
823 823 if (!err) 
824 824 insert_binfmt(&misc_format); 
825 - return err; 
825 + if (!register_sysctl_mount_point("fs/binfmt_misc")) { 
826 + pr_warn("Failed to create fs/binfmt_misc sysctl mount point\n"); 
827 + return -ENOMEM; 
828 + } 
829 + return err; 
826 830 } 
827 831 
828 832 static void __exit exit_misc_binfmt(void) 
-10
fs/btrfs/extent_io.c
··· 12 12 #include <linux/writeback.h> 13 13 #include <linux/pagevec.h> 14 14 #include <linux/prefetch.h> 15 - #include <linux/cleancache.h> 16 15 #include <linux/fsverity.h> 17 16 #include "misc.h" 18 17 #include "extent_io.h" ··· 3575 3576 btrfs_page_set_error(fs_info, page, start, PAGE_SIZE); 3576 3577 unlock_page(page); 3577 3578 goto out; 3578 - } 3579 - 3580 - if (!PageUptodate(page)) { 3581 - if (cleancache_get_page(page) == 0) { 3582 - BUG_ON(blocksize != PAGE_SIZE); 3583 - unlock_extent(tree, start, end); 3584 - unlock_page(page); 3585 - goto out; 3586 - } 3587 3579 } 3588 3580 3589 3581 if (page->index == last_byte >> PAGE_SHIFT) {
-2
fs/btrfs/super.c
··· 23 23 #include <linux/miscdevice.h> 24 24 #include <linux/magic.h> 25 25 #include <linux/slab.h> 26 - #include <linux/cleancache.h> 27 26 #include <linux/ratelimit.h> 28 27 #include <linux/crc32c.h> 29 28 #include <linux/btrfs.h> ··· 1373 1374 goto fail_close; 1374 1375 } 1375 1376 1376 - cleancache_init_fs(sb); 1377 1377 sb->s_flags |= SB_ACTIVE; 1378 1378 return 0; 1379 1379
+61 -5
fs/coredump.c
··· 41 41 #include <linux/fs.h> 42 42 #include <linux/path.h> 43 43 #include <linux/timekeeping.h> 44 + #include <linux/sysctl.h> 44 45 45 46 #include <linux/uaccess.h> 46 47 #include <asm/mmu_context.h> ··· 53 52 54 53 #include <trace/events/sched.h> 55 54 56 - int core_uses_pid; 57 - unsigned int core_pipe_limit; 58 - char core_pattern[CORENAME_MAX_SIZE] = "core"; 55 + static int core_uses_pid; 56 + static unsigned int core_pipe_limit; 57 + static char core_pattern[CORENAME_MAX_SIZE] = "core"; 59 58 static int core_name_size = CORENAME_MAX_SIZE; 60 59 61 60 struct core_name { 62 61 char *corename; 63 62 int used, size; 64 63 }; 65 - 66 - /* The maximal length of core_pattern is also specified in sysctl.c */ 67 64 68 65 static int expand_corename(struct core_name *cn, int size) 69 66 { ··· 891 892 return 1; 892 893 } 893 894 EXPORT_SYMBOL(dump_align); 895 + 896 + #ifdef CONFIG_SYSCTL 897 + 898 + void validate_coredump_safety(void) 899 + { 900 + if (suid_dumpable == SUID_DUMP_ROOT && 901 + core_pattern[0] != '/' && core_pattern[0] != '|') { 902 + pr_warn( 903 + "Unsafe core_pattern used with fs.suid_dumpable=2.\n" 904 + "Pipe handler or fully qualified core dump path required.\n" 905 + "Set kernel.core_pattern before fs.suid_dumpable.\n" 906 + ); 907 + } 908 + } 909 + 910 + static int proc_dostring_coredump(struct ctl_table *table, int write, 911 + void *buffer, size_t *lenp, loff_t *ppos) 912 + { 913 + int error = proc_dostring(table, write, buffer, lenp, ppos); 914 + 915 + if (!error) 916 + validate_coredump_safety(); 917 + return error; 918 + } 919 + 920 + static struct ctl_table coredump_sysctls[] = { 921 + { 922 + .procname = "core_uses_pid", 923 + .data = &core_uses_pid, 924 + .maxlen = sizeof(int), 925 + .mode = 0644, 926 + .proc_handler = proc_dointvec, 927 + }, 928 + { 929 + .procname = "core_pattern", 930 + .data = core_pattern, 931 + .maxlen = CORENAME_MAX_SIZE, 932 + .mode = 0644, 933 + .proc_handler = proc_dostring_coredump, 934 + }, 935 + { 936 + .procname = "core_pipe_limit", 937 + .data = &core_pipe_limit, 938 + .maxlen = sizeof(unsigned int), 939 + .mode = 0644, 940 + .proc_handler = proc_dointvec, 941 + }, 942 + { } 943 + }; 944 + 945 + static int __init init_fs_coredump_sysctls(void) 946 + { 947 + register_sysctl_init("kernel", coredump_sysctls); 948 + return 0; 949 + } 950 + fs_initcall(init_fs_coredump_sysctls); 951 + #endif /* CONFIG_SYSCTL */ 894 952 895 953 /* 896 954 * The purpose of always_dump_vma() is to make sure that special kernel mappings
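proc_dostring_coredump() above is the standard shape for post-write validation of a sysctl: delegate to the stock handler, then run the consistency check once it has succeeded (the coredump variant deliberately runs the check on reads as well). The same wrapper, reduced to its skeleton with hypothetical names and restricted to writes:

#include <linux/sysctl.h>

/* Hypothetical check run after a successful write. */
static void example_validate(void)
{
	/* e.g. pr_warn() about a combination of settings that cannot work */
}

static int proc_dointvec_example(struct ctl_table *table, int write,
				 void *buffer, size_t *lenp, loff_t *ppos)
{
	int error = proc_dointvec(table, write, buffer, lenp, ppos);

	if (!error && write)
		example_validate();
	return error;
}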
+31 -6
fs/dcache.c
··· 115 115 return in_lookup_hashtable + hash_32(hash, IN_LOOKUP_SHIFT); 116 116 } 117 117 118 - 119 - /* Statistics gathering. */ 120 - struct dentry_stat_t dentry_stat = { 121 - .age_limit = 45, 118 + struct dentry_stat_t { 119 + long nr_dentry; 120 + long nr_unused; 121 + long age_limit; /* age in seconds */ 122 + long want_pages; /* pages requested by system */ 123 + long nr_negative; /* # of unused negative dentries */ 124 + long dummy; /* Reserved for future use */ 122 125 }; 123 126 124 127 static DEFINE_PER_CPU(long, nr_dentry); ··· 129 126 static DEFINE_PER_CPU(long, nr_dentry_negative); 130 127 131 128 #if defined(CONFIG_SYSCTL) && defined(CONFIG_PROC_FS) 129 + /* Statistics gathering. */ 130 + static struct dentry_stat_t dentry_stat = { 131 + .age_limit = 45, 132 + }; 132 133 133 134 /* 134 135 * Here we resort to our own counters instead of using generic per-cpu counters ··· 174 167 return sum < 0 ? 0 : sum; 175 168 } 176 169 177 - int proc_nr_dentry(struct ctl_table *table, int write, void *buffer, 178 - size_t *lenp, loff_t *ppos) 170 + static int proc_nr_dentry(struct ctl_table *table, int write, void *buffer, 171 + size_t *lenp, loff_t *ppos) 179 172 { 180 173 dentry_stat.nr_dentry = get_nr_dentry(); 181 174 dentry_stat.nr_unused = get_nr_dentry_unused(); 182 175 dentry_stat.nr_negative = get_nr_dentry_negative(); 183 176 return proc_doulongvec_minmax(table, write, buffer, lenp, ppos); 184 177 } 178 + 179 + static struct ctl_table fs_dcache_sysctls[] = { 180 + { 181 + .procname = "dentry-state", 182 + .data = &dentry_stat, 183 + .maxlen = 6*sizeof(long), 184 + .mode = 0444, 185 + .proc_handler = proc_nr_dentry, 186 + }, 187 + { } 188 + }; 189 + 190 + static int __init init_fs_dcache_sysctls(void) 191 + { 192 + register_sysctl_init("fs", fs_dcache_sysctls); 193 + return 0; 194 + } 195 + fs_initcall(init_fs_dcache_sysctls); 185 196 #endif 186 197 187 198 /*
+9 -1
fs/eventpoll.c
··· 307 307 static long long_zero; 308 308 static long long_max = LONG_MAX; 309 309 310 - struct ctl_table epoll_table[] = { 310 + static struct ctl_table epoll_table[] = { 311 311 { 312 312 .procname = "max_user_watches", 313 313 .data = &max_user_watches, ··· 319 319 }, 320 320 { } 321 321 }; 322 + 323 + static void __init epoll_sysctls_init(void) 324 + { 325 + register_sysctl("fs/epoll", epoll_table); 326 + } 327 + #else 328 + #define epoll_sysctls_init() do { } while (0) 322 329 #endif /* CONFIG_SYSCTL */ 323 330 324 331 static const struct file_operations eventpoll_fops; ··· 2385 2378 /* Allocates slab cache used to allocate "struct eppoll_entry" */ 2386 2379 pwq_cache = kmem_cache_create("eventpoll_pwq", 2387 2380 sizeof(struct eppoll_entry), 0, SLAB_PANIC|SLAB_ACCOUNT, NULL); 2381 + epoll_sysctls_init(); 2388 2382 2389 2383 ephead_cache = kmem_cache_create("ep_head", 2390 2384 sizeof(struct epitems_head), 0, SLAB_PANIC|SLAB_ACCOUNT, NULL);
+35
fs/exec.c
··· 65 65 #include <linux/vmalloc.h> 66 66 #include <linux/io_uring.h> 67 67 #include <linux/syscall_user_dispatch.h> 68 + #include <linux/coredump.h> 68 69 69 70 #include <linux/uaccess.h> 70 71 #include <asm/mmu_context.h> ··· 2100 2099 argv, envp, flags); 2101 2100 } 2102 2101 #endif 2102 + 2103 + #ifdef CONFIG_SYSCTL 2104 + 2105 + static int proc_dointvec_minmax_coredump(struct ctl_table *table, int write, 2106 + void *buffer, size_t *lenp, loff_t *ppos) 2107 + { 2108 + int error = proc_dointvec_minmax(table, write, buffer, lenp, ppos); 2109 + 2110 + if (!error) 2111 + validate_coredump_safety(); 2112 + return error; 2113 + } 2114 + 2115 + static struct ctl_table fs_exec_sysctls[] = { 2116 + { 2117 + .procname = "suid_dumpable", 2118 + .data = &suid_dumpable, 2119 + .maxlen = sizeof(int), 2120 + .mode = 0644, 2121 + .proc_handler = proc_dointvec_minmax_coredump, 2122 + .extra1 = SYSCTL_ZERO, 2123 + .extra2 = SYSCTL_TWO, 2124 + }, 2125 + { } 2126 + }; 2127 + 2128 + static int __init init_fs_exec_sysctls(void) 2129 + { 2130 + register_sysctl_init("fs", fs_exec_sysctls); 2131 + return 0; 2132 + } 2133 + 2134 + fs_initcall(init_fs_exec_sysctls); 2135 + #endif /* CONFIG_SYSCTL */
+7 -7
fs/ext4/mballoc.c
··· 2834 2834 2835 2835 static void *ext4_mb_seq_groups_start(struct seq_file *seq, loff_t *pos) 2836 2836 { 2837 - struct super_block *sb = PDE_DATA(file_inode(seq->file)); 2837 + struct super_block *sb = pde_data(file_inode(seq->file)); 2838 2838 ext4_group_t group; 2839 2839 2840 2840 if (*pos < 0 || *pos >= ext4_get_groups_count(sb)) ··· 2845 2845 2846 2846 static void *ext4_mb_seq_groups_next(struct seq_file *seq, void *v, loff_t *pos) 2847 2847 { 2848 - struct super_block *sb = PDE_DATA(file_inode(seq->file)); 2848 + struct super_block *sb = pde_data(file_inode(seq->file)); 2849 2849 ext4_group_t group; 2850 2850 2851 2851 ++*pos; ··· 2857 2857 2858 2858 static int ext4_mb_seq_groups_show(struct seq_file *seq, void *v) 2859 2859 { 2860 - struct super_block *sb = PDE_DATA(file_inode(seq->file)); 2860 + struct super_block *sb = pde_data(file_inode(seq->file)); 2861 2861 ext4_group_t group = (ext4_group_t) ((unsigned long) v); 2862 2862 int i; 2863 2863 int err, buddy_loaded = 0; ··· 2985 2985 static void *ext4_mb_seq_structs_summary_start(struct seq_file *seq, loff_t *pos) 2986 2986 __acquires(&EXT4_SB(sb)->s_mb_rb_lock) 2987 2987 { 2988 - struct super_block *sb = PDE_DATA(file_inode(seq->file)); 2988 + struct super_block *sb = pde_data(file_inode(seq->file)); 2989 2989 unsigned long position; 2990 2990 2991 2991 read_lock(&EXT4_SB(sb)->s_mb_rb_lock); ··· 2998 2998 2999 2999 static void *ext4_mb_seq_structs_summary_next(struct seq_file *seq, void *v, loff_t *pos) 3000 3000 { 3001 - struct super_block *sb = PDE_DATA(file_inode(seq->file)); 3001 + struct super_block *sb = pde_data(file_inode(seq->file)); 3002 3002 unsigned long position; 3003 3003 3004 3004 ++*pos; ··· 3010 3010 3011 3011 static int ext4_mb_seq_structs_summary_show(struct seq_file *seq, void *v) 3012 3012 { 3013 - struct super_block *sb = PDE_DATA(file_inode(seq->file)); 3013 + struct super_block *sb = pde_data(file_inode(seq->file)); 3014 3014 struct ext4_sb_info *sbi = EXT4_SB(sb); 3015 3015 unsigned long position = ((unsigned long) v); 3016 3016 struct ext4_group_info *grp; ··· 3058 3058 static void ext4_mb_seq_structs_summary_stop(struct seq_file *seq, void *v) 3059 3059 __releases(&EXT4_SB(sb)->s_mb_rb_lock) 3060 3060 { 3061 - struct super_block *sb = PDE_DATA(file_inode(seq->file)); 3061 + struct super_block *sb = pde_data(file_inode(seq->file)); 3062 3062 3063 3063 read_unlock(&EXT4_SB(sb)->s_mb_rb_lock); 3064 3064 }
-6
fs/ext4/readpage.c
··· 43 43 #include <linux/writeback.h> 44 44 #include <linux/backing-dev.h> 45 45 #include <linux/pagevec.h> 46 - #include <linux/cleancache.h> 47 46 48 47 #include "ext4.h" 49 48 ··· 348 349 } 349 350 } else if (fully_mapped) { 350 351 SetPageMappedToDisk(page); 351 - } 352 - if (fully_mapped && blocks_per_page == 1 && 353 - !PageUptodate(page) && cleancache_get_page(page) == 0) { 354 - SetPageUptodate(page); 355 - goto confused; 356 352 } 357 353 358 354 /*
-3
fs/ext4/super.c
··· 39 39 #include <linux/log2.h> 40 40 #include <linux/crc16.h> 41 41 #include <linux/dax.h> 42 - #include <linux/cleancache.h> 43 42 #include <linux/uaccess.h> 44 43 #include <linux/iversion.h> 45 44 #include <linux/unicode.h> ··· 3148 3149 EXT4_BLOCKS_PER_GROUP(sb), 3149 3150 EXT4_INODES_PER_GROUP(sb), 3150 3151 sbi->s_mount_opt, sbi->s_mount_opt2); 3151 - 3152 - cleancache_init_fs(sb); 3153 3152 return err; 3154 3153 } 3155 3154
-13
fs/f2fs/data.c
··· 18 18 #include <linux/swap.h> 19 19 #include <linux/prefetch.h> 20 20 #include <linux/uio.h> 21 - #include <linux/cleancache.h> 22 21 #include <linux/sched/signal.h> 23 22 #include <linux/fiemap.h> 24 23 #include <linux/iomap.h> ··· 2034 2035 block_nr = map->m_pblk + block_in_file - map->m_lblk; 2035 2036 SetPageMappedToDisk(page); 2036 2037 2037 - if (!PageUptodate(page) && (!PageSwapCache(page) && 2038 - !cleancache_get_page(page))) { 2039 - SetPageUptodate(page); 2040 - goto confused; 2041 - } 2042 - 2043 2038 if (!f2fs_is_valid_blkaddr(F2FS_I_SB(inode), block_nr, 2044 2039 DATA_GENERIC_ENHANCE_READ)) { 2045 2040 ret = -EFSCORRUPTED; ··· 2089 2096 ClearPageError(page); 2090 2097 *last_block_in_bio = block_nr; 2091 2098 goto out; 2092 - confused: 2093 - if (bio) { 2094 - __submit_bio(F2FS_I_SB(inode), bio, DATA); 2095 - bio = NULL; 2096 - } 2097 - unlock_page(page); 2098 2099 out: 2099 2100 *bio_ret = bio; 2100 2101 return ret;
+39 -8
fs/file_table.c
··· 33 33 #include "internal.h" 34 34 35 35 /* sysctl tunables... */ 36 - struct files_stat_struct files_stat = { 36 + static struct files_stat_struct files_stat = { 37 37 .max_files = NR_FILE 38 38 }; 39 39 ··· 75 75 } 76 76 EXPORT_SYMBOL_GPL(get_max_files); 77 77 78 + #if defined(CONFIG_SYSCTL) && defined(CONFIG_PROC_FS) 79 + 78 80 /* 79 81 * Handle nr_files sysctl 80 82 */ 81 - #if defined(CONFIG_SYSCTL) && defined(CONFIG_PROC_FS) 82 - int proc_nr_files(struct ctl_table *table, int write, 83 - void *buffer, size_t *lenp, loff_t *ppos) 83 + static int proc_nr_files(struct ctl_table *table, int write, void *buffer, 84 + size_t *lenp, loff_t *ppos) 84 85 { 85 86 files_stat.nr_files = get_nr_files(); 86 87 return proc_doulongvec_minmax(table, write, buffer, lenp, ppos); 87 88 } 88 - #else 89 - int proc_nr_files(struct ctl_table *table, int write, 90 - void *buffer, size_t *lenp, loff_t *ppos) 89 + 90 + static struct ctl_table fs_stat_sysctls[] = { 91 + { 92 + .procname = "file-nr", 93 + .data = &files_stat, 94 + .maxlen = sizeof(files_stat), 95 + .mode = 0444, 96 + .proc_handler = proc_nr_files, 97 + }, 98 + { 99 + .procname = "file-max", 100 + .data = &files_stat.max_files, 101 + .maxlen = sizeof(files_stat.max_files), 102 + .mode = 0644, 103 + .proc_handler = proc_doulongvec_minmax, 104 + .extra1 = SYSCTL_LONG_ZERO, 105 + .extra2 = SYSCTL_LONG_MAX, 106 + }, 107 + { 108 + .procname = "nr_open", 109 + .data = &sysctl_nr_open, 110 + .maxlen = sizeof(unsigned int), 111 + .mode = 0644, 112 + .proc_handler = proc_dointvec_minmax, 113 + .extra1 = &sysctl_nr_open_min, 114 + .extra2 = &sysctl_nr_open_max, 115 + }, 116 + { } 117 + }; 118 + 119 + static int __init init_fs_stat_sysctls(void) 91 120 { 92 - return -ENOSYS; 121 + register_sysctl_init("fs", fs_stat_sysctls); 122 + return 0; 93 123 } 124 + fs_initcall(init_fs_stat_sysctls); 94 125 #endif 95 126 96 127 static struct file *__alloc_file(int flags, const struct cred *cred)
+32 -7
fs/inode.c
··· 67 67 }; 68 68 EXPORT_SYMBOL(empty_aops); 69 69 70 - /* 71 - * Statistics gathering.. 72 - */ 73 - struct inodes_stat_t inodes_stat; 74 - 75 70 static DEFINE_PER_CPU(unsigned long, nr_inodes); 76 71 static DEFINE_PER_CPU(unsigned long, nr_unused); 77 72 ··· 101 106 * Handle nr_inode sysctl 102 107 */ 103 108 #ifdef CONFIG_SYSCTL 104 - int proc_nr_inodes(struct ctl_table *table, int write, 105 - void *buffer, size_t *lenp, loff_t *ppos) 109 + /* 110 + * Statistics gathering.. 111 + */ 112 + static struct inodes_stat_t inodes_stat; 113 + 114 + static int proc_nr_inodes(struct ctl_table *table, int write, void *buffer, 115 + size_t *lenp, loff_t *ppos) 106 116 { 107 117 inodes_stat.nr_inodes = get_nr_inodes(); 108 118 inodes_stat.nr_unused = get_nr_inodes_unused(); 109 119 return proc_doulongvec_minmax(table, write, buffer, lenp, ppos); 110 120 } 121 + 122 + static struct ctl_table inodes_sysctls[] = { 123 + { 124 + .procname = "inode-nr", 125 + .data = &inodes_stat, 126 + .maxlen = 2*sizeof(long), 127 + .mode = 0444, 128 + .proc_handler = proc_nr_inodes, 129 + }, 130 + { 131 + .procname = "inode-state", 132 + .data = &inodes_stat, 133 + .maxlen = 7*sizeof(long), 134 + .mode = 0444, 135 + .proc_handler = proc_nr_inodes, 136 + }, 137 + { } 138 + }; 139 + 140 + static int __init init_fs_inode_sysctls(void) 141 + { 142 + register_sysctl_init("fs", inodes_sysctls); 143 + return 0; 144 + } 145 + early_initcall(init_fs_inode_sysctls); 111 146 #endif 112 147 113 148 static int no_open(struct inode *inode, struct file *file)
+1 -1
fs/jbd2/journal.c
··· 1212 1212 1213 1213 static int jbd2_seq_info_open(struct inode *inode, struct file *file) 1214 1214 { 1215 - journal_t *journal = PDE_DATA(inode); 1215 + journal_t *journal = pde_data(inode); 1216 1216 struct jbd2_stats_proc_session *s; 1217 1217 int rc, size; 1218 1218
+32 -2
fs/locks.c
··· 62 62 #include <linux/pid_namespace.h> 63 63 #include <linux/hashtable.h> 64 64 #include <linux/percpu.h> 65 + #include <linux/sysctl.h> 65 66 66 67 #define CREATE_TRACE_POINTS 67 68 #include <trace/events/filelock.h> ··· 89 88 return fl->fl_type; 90 89 } 91 90 92 - int leases_enable = 1; 93 - int lease_break_time = 45; 91 + static int leases_enable = 1; 92 + static int lease_break_time = 45; 93 + 94 + #ifdef CONFIG_SYSCTL 95 + static struct ctl_table locks_sysctls[] = { 96 + { 97 + .procname = "leases-enable", 98 + .data = &leases_enable, 99 + .maxlen = sizeof(int), 100 + .mode = 0644, 101 + .proc_handler = proc_dointvec, 102 + }, 103 + #ifdef CONFIG_MMU 104 + { 105 + .procname = "lease-break-time", 106 + .data = &lease_break_time, 107 + .maxlen = sizeof(int), 108 + .mode = 0644, 109 + .proc_handler = proc_dointvec, 110 + }, 111 + #endif /* CONFIG_MMU */ 112 + {} 113 + }; 114 + 115 + static int __init init_fs_locks_sysctls(void) 116 + { 117 + register_sysctl_init("fs", locks_sysctls); 118 + return 0; 119 + } 120 + early_initcall(init_fs_locks_sysctls); 121 + #endif /* CONFIG_SYSCTL */ 94 122 95 123 /* 96 124 * The global file_lock_list is only used for displaying /proc/locks, so we
-7
fs/mpage.c
··· 29 29 #include <linux/writeback.h> 30 30 #include <linux/backing-dev.h> 31 31 #include <linux/pagevec.h> 32 - #include <linux/cleancache.h> 33 32 #include "internal.h" 34 33 35 34 /* ··· 281 282 } 282 283 } else if (fully_mapped) { 283 284 SetPageMappedToDisk(page); 284 - } 285 - 286 - if (fully_mapped && blocks_per_page == 1 && !PageUptodate(page) && 287 - cleancache_get_page(page) == 0) { 288 - SetPageUptodate(page); 289 - goto confused; 290 285 } 291 286 292 287 /*
+54 -4
fs/namei.c
··· 1020 1020 path_put(&last->link); 1021 1021 } 1022 1022 1023 - int sysctl_protected_symlinks __read_mostly = 0; 1024 - int sysctl_protected_hardlinks __read_mostly = 0; 1025 - int sysctl_protected_fifos __read_mostly; 1026 - int sysctl_protected_regular __read_mostly; 1023 + static int sysctl_protected_symlinks __read_mostly; 1024 + static int sysctl_protected_hardlinks __read_mostly; 1025 + static int sysctl_protected_fifos __read_mostly; 1026 + static int sysctl_protected_regular __read_mostly; 1027 + 1028 + #ifdef CONFIG_SYSCTL 1029 + static struct ctl_table namei_sysctls[] = { 1030 + { 1031 + .procname = "protected_symlinks", 1032 + .data = &sysctl_protected_symlinks, 1033 + .maxlen = sizeof(int), 1034 + .mode = 0600, 1035 + .proc_handler = proc_dointvec_minmax, 1036 + .extra1 = SYSCTL_ZERO, 1037 + .extra2 = SYSCTL_ONE, 1038 + }, 1039 + { 1040 + .procname = "protected_hardlinks", 1041 + .data = &sysctl_protected_hardlinks, 1042 + .maxlen = sizeof(int), 1043 + .mode = 0600, 1044 + .proc_handler = proc_dointvec_minmax, 1045 + .extra1 = SYSCTL_ZERO, 1046 + .extra2 = SYSCTL_ONE, 1047 + }, 1048 + { 1049 + .procname = "protected_fifos", 1050 + .data = &sysctl_protected_fifos, 1051 + .maxlen = sizeof(int), 1052 + .mode = 0600, 1053 + .proc_handler = proc_dointvec_minmax, 1054 + .extra1 = SYSCTL_ZERO, 1055 + .extra2 = SYSCTL_TWO, 1056 + }, 1057 + { 1058 + .procname = "protected_regular", 1059 + .data = &sysctl_protected_regular, 1060 + .maxlen = sizeof(int), 1061 + .mode = 0600, 1062 + .proc_handler = proc_dointvec_minmax, 1063 + .extra1 = SYSCTL_ZERO, 1064 + .extra2 = SYSCTL_TWO, 1065 + }, 1066 + { } 1067 + }; 1068 + 1069 + static int __init init_fs_namei_sysctls(void) 1070 + { 1071 + register_sysctl_init("fs", namei_sysctls); 1072 + return 0; 1073 + } 1074 + fs_initcall(init_fs_namei_sysctls); 1075 + 1076 + #endif /* CONFIG_SYSCTL */ 1027 1077 1028 1078 /** 1029 1079 * may_follow_link - Check symlink following for unsafe situations
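Every entry in namei_sysctls leans on proc_dointvec_minmax(), which rejects writes outside the [*extra1, *extra2] range with -EINVAL. SYSCTL_ZERO, SYSCTL_ONE and SYSCTL_TWO are pointers into the shared sysctl_vals[] array (widened later in this series), so tables no longer need one-off static bound variables. Reduced to a single hypothetical knob accepting 0..2:

static int example_mode;	/* hypothetical, valid values 0..2 */

static struct ctl_table example_table[] = {
	{
		.procname	= "example-mode",
		.data		= &example_mode,
		.maxlen		= sizeof(int),
		.mode		= 0600,
		.proc_handler	= proc_dointvec_minmax,
		.extra1		= SYSCTL_ZERO,	/* writes below 0 fail with -EINVAL */
		.extra2		= SYSCTL_TWO,	/* writes above 2 fail with -EINVAL */
	},
	{ }
};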
+23 -1
fs/namespace.c
··· 37 37 #include "internal.h" 38 38 39 39 /* Maximum number of mounts in a mount namespace */ 40 - unsigned int sysctl_mount_max __read_mostly = 100000; 40 + static unsigned int sysctl_mount_max __read_mostly = 100000; 41 41 42 42 static unsigned int m_hash_mask __read_mostly; 43 43 static unsigned int m_hash_shift __read_mostly; ··· 4620 4620 .install = mntns_install, 4621 4621 .owner = mntns_owner, 4622 4622 }; 4623 + 4624 + #ifdef CONFIG_SYSCTL 4625 + static struct ctl_table fs_namespace_sysctls[] = { 4626 + { 4627 + .procname = "mount-max", 4628 + .data = &sysctl_mount_max, 4629 + .maxlen = sizeof(unsigned int), 4630 + .mode = 0644, 4631 + .proc_handler = proc_dointvec_minmax, 4632 + .extra1 = SYSCTL_ONE, 4633 + }, 4634 + { } 4635 + }; 4636 + 4637 + static int __init init_fs_namespace_sysctls(void) 4638 + { 4639 + register_sysctl_init("fs", fs_namespace_sysctls); 4640 + return 0; 4641 + } 4642 + fs_initcall(init_fs_namespace_sysctls); 4643 + 4644 + #endif /* CONFIG_SYSCTL */
+20 -1
fs/notify/dnotify/dnotify.c
··· 19 19 #include <linux/fdtable.h> 20 20 #include <linux/fsnotify_backend.h> 21 21 22 - int dir_notify_enable __read_mostly = 1; 22 + static int dir_notify_enable __read_mostly = 1; 23 + #ifdef CONFIG_SYSCTL 24 + static struct ctl_table dnotify_sysctls[] = { 25 + { 26 + .procname = "dir-notify-enable", 27 + .data = &dir_notify_enable, 28 + .maxlen = sizeof(int), 29 + .mode = 0644, 30 + .proc_handler = proc_dointvec, 31 + }, 32 + {} 33 + }; 34 + static void __init dnotify_sysctl_init(void) 35 + { 36 + register_sysctl_init("fs", dnotify_sysctls); 37 + } 38 + #else 39 + #define dnotify_sysctl_init() do { } while (0) 40 + #endif 23 41 24 42 static struct kmem_cache *dnotify_struct_cache __read_mostly; 25 43 static struct kmem_cache *dnotify_mark_cache __read_mostly; ··· 404 386 dnotify_group = fsnotify_alloc_group(&dnotify_fsnotify_ops); 405 387 if (IS_ERR(dnotify_group)) 406 388 panic("unable to allocate fsnotify group for dnotify\n"); 389 + dnotify_sysctl_init(); 407 390 return 0; 408 391 } 409 392
+9 -1
fs/notify/fanotify/fanotify_user.c
··· 59 59 static long ft_zero = 0; 60 60 static long ft_int_max = INT_MAX; 61 61 62 - struct ctl_table fanotify_table[] = { 62 + static struct ctl_table fanotify_table[] = { 63 63 { 64 64 .procname = "max_user_groups", 65 65 .data = &init_user_ns.ucount_max[UCOUNT_FANOTIFY_GROUPS], ··· 88 88 }, 89 89 { } 90 90 }; 91 + 92 + static void __init fanotify_sysctls_init(void) 93 + { 94 + register_sysctl("fs/fanotify", fanotify_table); 95 + } 96 + #else 97 + #define fanotify_sysctls_init() do { } while (0) 91 98 #endif /* CONFIG_SYSCTL */ 92 99 93 100 /* ··· 1750 1743 init_user_ns.ucount_max[UCOUNT_FANOTIFY_GROUPS] = 1751 1744 FANOTIFY_DEFAULT_MAX_GROUPS; 1752 1745 init_user_ns.ucount_max[UCOUNT_FANOTIFY_MARKS] = max_marks; 1746 + fanotify_sysctls_init(); 1753 1747 1754 1748 return 0; 1755 1749 }
+10 -1
fs/notify/inotify/inotify_user.c
··· 58 58 static long it_zero = 0; 59 59 static long it_int_max = INT_MAX; 60 60 61 - struct ctl_table inotify_table[] = { 61 + static struct ctl_table inotify_table[] = { 62 62 { 63 63 .procname = "max_user_instances", 64 64 .data = &init_user_ns.ucount_max[UCOUNT_INOTIFY_INSTANCES], ··· 87 87 }, 88 88 { } 89 89 }; 90 + 91 + static void __init inotify_sysctls_init(void) 92 + { 93 + register_sysctl("fs/inotify", inotify_table); 94 + } 95 + 96 + #else 97 + #define inotify_sysctls_init() do { } while (0) 90 98 #endif /* CONFIG_SYSCTL */ 91 99 92 100 static inline __u32 inotify_arg_to_mask(struct inode *inode, u32 arg) ··· 857 849 inotify_max_queued_events = 16384; 858 850 init_user_ns.ucount_max[UCOUNT_INOTIFY_INSTANCES] = 128; 859 851 init_user_ns.ucount_max[UCOUNT_INOTIFY_WATCHES] = watches_max; 852 + inotify_sysctls_init(); 860 853 861 854 return 0; 862 855 }
-1
fs/ntfs3/ntfs_fs.h
··· 11 11 12 12 #include <linux/blkdev.h> 13 13 #include <linux/buffer_head.h> 14 - #include <linux/cleancache.h> 15 14 #include <linux/fs.h> 16 15 #include <linux/highmem.h> 17 16 #include <linux/kernel.h>
+1 -24
fs/ocfs2/stackglue.c
··· 672 672 { } 673 673 }; 674 674 675 - static struct ctl_table ocfs2_kern_table[] = { 676 - { 677 - .procname = "ocfs2", 678 - .data = NULL, 679 - .maxlen = 0, 680 - .mode = 0555, 681 - .child = ocfs2_mod_table 682 - }, 683 - { } 684 - }; 685 - 686 - static struct ctl_table ocfs2_root_table[] = { 687 - { 688 - .procname = "fs", 689 - .data = NULL, 690 - .maxlen = 0, 691 - .mode = 0555, 692 - .child = ocfs2_kern_table 693 - }, 694 - { } 695 - }; 696 - 697 675 static struct ctl_table_header *ocfs2_table_header; 698 - 699 676 700 677 /* 701 678 * Initialization ··· 682 705 { 683 706 strcpy(cluster_stack_name, OCFS2_STACK_PLUGIN_O2CB); 684 707 685 - ocfs2_table_header = register_sysctl_table(ocfs2_root_table); 708 + ocfs2_table_header = register_sysctl("fs/ocfs2", ocfs2_mod_table); 686 709 if (!ocfs2_table_header) { 687 710 printk(KERN_ERR 688 711 "ocfs2 stack glue: unable to register sysctl\n");
-2
fs/ocfs2/super.c
··· 25 25 #include <linux/mount.h> 26 26 #include <linux/seq_file.h> 27 27 #include <linux/quotaops.h> 28 - #include <linux/cleancache.h> 29 28 #include <linux/signal.h> 30 29 31 30 #define CREATE_TRACE_POINTS ··· 2282 2283 mlog_errno(status); 2283 2284 goto bail; 2284 2285 } 2285 - cleancache_init_shared_fs(sb); 2286 2286 2287 2287 osb->ocfs2_wq = alloc_ordered_workqueue("ocfs2_wq", WQ_MEM_RECLAIM); 2288 2288 if (!osb->ocfs2_wq) {
+61 -3
fs/pipe.c
··· 25 25 #include <linux/fcntl.h> 26 26 #include <linux/memcontrol.h> 27 27 #include <linux/watch_queue.h> 28 + #include <linux/sysctl.h> 28 29 29 30 #include <linux/uaccess.h> 30 31 #include <asm/ioctls.h> ··· 51 50 * The max size that a non-root user is allowed to grow the pipe. Can 52 51 * be set by root in /proc/sys/fs/pipe-max-size 53 52 */ 54 - unsigned int pipe_max_size = 1048576; 53 + static unsigned int pipe_max_size = 1048576; 55 54 56 55 /* Maximum allocatable pages per user. Hard limit is unset by default, soft 57 56 * matches default values. 58 57 */ 59 - unsigned long pipe_user_pages_hard; 60 - unsigned long pipe_user_pages_soft = PIPE_DEF_BUFFERS * INR_OPEN_CUR; 58 + static unsigned long pipe_user_pages_hard; 59 + static unsigned long pipe_user_pages_soft = PIPE_DEF_BUFFERS * INR_OPEN_CUR; 61 60 62 61 /* 63 62 * We use head and tail indices that aren't masked off, except at the point of ··· 1429 1428 .kill_sb = kill_anon_super, 1430 1429 }; 1431 1430 1431 + #ifdef CONFIG_SYSCTL 1432 + static int do_proc_dopipe_max_size_conv(unsigned long *lvalp, 1433 + unsigned int *valp, 1434 + int write, void *data) 1435 + { 1436 + if (write) { 1437 + unsigned int val; 1438 + 1439 + val = round_pipe_size(*lvalp); 1440 + if (val == 0) 1441 + return -EINVAL; 1442 + 1443 + *valp = val; 1444 + } else { 1445 + unsigned int val = *valp; 1446 + *lvalp = (unsigned long) val; 1447 + } 1448 + 1449 + return 0; 1450 + } 1451 + 1452 + static int proc_dopipe_max_size(struct ctl_table *table, int write, 1453 + void *buffer, size_t *lenp, loff_t *ppos) 1454 + { 1455 + return do_proc_douintvec(table, write, buffer, lenp, ppos, 1456 + do_proc_dopipe_max_size_conv, NULL); 1457 + } 1458 + 1459 + static struct ctl_table fs_pipe_sysctls[] = { 1460 + { 1461 + .procname = "pipe-max-size", 1462 + .data = &pipe_max_size, 1463 + .maxlen = sizeof(pipe_max_size), 1464 + .mode = 0644, 1465 + .proc_handler = proc_dopipe_max_size, 1466 + }, 1467 + { 1468 + .procname = "pipe-user-pages-hard", 1469 + .data = &pipe_user_pages_hard, 1470 + .maxlen = sizeof(pipe_user_pages_hard), 1471 + .mode = 0644, 1472 + .proc_handler = proc_doulongvec_minmax, 1473 + }, 1474 + { 1475 + .procname = "pipe-user-pages-soft", 1476 + .data = &pipe_user_pages_soft, 1477 + .maxlen = sizeof(pipe_user_pages_soft), 1478 + .mode = 0644, 1479 + .proc_handler = proc_doulongvec_minmax, 1480 + }, 1481 + { } 1482 + }; 1483 + #endif 1484 + 1432 1485 static int __init init_pipe_fs(void) 1433 1486 { 1434 1487 int err = register_filesystem(&pipe_fs_type); ··· 1494 1439 unregister_filesystem(&pipe_fs_type); 1495 1440 } 1496 1441 } 1442 + #ifdef CONFIG_SYSCTL 1443 + register_sysctl_init("fs", fs_pipe_sysctls); 1444 + #endif 1497 1445 return err; 1498 1446 } 1499 1447
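The interesting part of the pipe conversion is the conv callback handed to do_proc_douintvec(): on a write it validates and narrows the parsed unsigned long before anything is stored, and on a read it widens the stored unsigned int back out. A hypothetical conv that accepts only non-zero multiples of PAGE_SIZE would look like:

static int example_page_multiple_conv(unsigned long *lvalp,
				      unsigned int *valp, int write, void *data)
{
	if (write) {
		if (*lvalp == 0 || *lvalp > UINT_MAX || *lvalp % PAGE_SIZE)
			return -EINVAL;	/* rejected before the value is stored */
		*valp = *lvalp;
	} else {
		*lvalp = *valp;
	}
	return 0;
}

static int proc_example_handler(struct ctl_table *table, int write,
				void *buffer, size_t *lenp, loff_t *ppos)
{
	return do_proc_douintvec(table, write, buffer, lenp, ppos,
				 example_page_multiple_conv, NULL);
}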
-6
fs/proc/generic.c
··· 791 791 } 792 792 EXPORT_SYMBOL(proc_remove); 793 793 794 - void *PDE_DATA(const struct inode *inode) 795 - { 796 - return __PDE_DATA(inode); 797 - } 798 - EXPORT_SYMBOL(PDE_DATA); 799 - 800 794 /* 801 795 * Pull a user buffer into memory and pass it to the file's write handler if 802 796 * one is supplied. The ->write() method is permitted to modify the
+1
fs/proc/inode.c
··· 650 650 return NULL; 651 651 } 652 652 653 + inode->i_private = de->data; 653 654 inode->i_ino = de->low_ino; 654 655 inode->i_mtime = inode->i_atime = inode->i_ctime = current_time(inode); 655 656 PROC_I(inode)->pde = de;
-5
fs/proc/internal.h
··· 115 115 return PROC_I(inode)->pde; 116 116 } 117 117 118 - static inline void *__PDE_DATA(const struct inode *inode) 119 - { 120 - return PDE(inode)->data; 121 - } 122 - 123 118 static inline struct pid *proc_pid(const struct inode *inode) 124 119 { 125 120 return PROC_I(inode)->pid;
+4 -4
fs/proc/proc_net.c
··· 138 138 * @parent: The parent directory in which to create. 139 139 * @ops: The seq_file ops with which to read the file. 140 140 * @write: The write method with which to 'modify' the file. 141 - * @data: Data for retrieval by PDE_DATA(). 141 + * @data: Data for retrieval by pde_data(). 142 142 * 143 143 * Create a network namespaced proc file in the @parent directory with the 144 144 * specified @name and @mode that allows reading of a file that displays a ··· 153 153 * modified by the @write function. @write should return 0 on success. 154 154 * 155 155 * The @data value is accessible from the @show and @write functions by calling 156 - * PDE_DATA() on the file inode. The network namespace must be accessed by 156 + * pde_data() on the file inode. The network namespace must be accessed by 157 157 * calling seq_file_net() on the seq_file struct. 158 158 */ 159 159 struct proc_dir_entry *proc_create_net_data_write(const char *name, umode_t mode, ··· 230 230 * @parent: The parent directory in which to create. 231 231 * @show: The seqfile show method with which to read the file. 232 232 * @write: The write method with which to 'modify' the file. 233 - * @data: Data for retrieval by PDE_DATA(). 233 + * @data: Data for retrieval by pde_data(). 234 234 * 235 235 * Create a network-namespaced proc file in the @parent directory with the 236 236 * specified @name and @mode that allows reading of a file that displays a ··· 245 245 * modified by the @write function. @write should return 0 on success. 246 246 * 247 247 * The @data value is accessible from the @show and @write functions by calling 248 - * PDE_DATA() on the file inode. The network namespace must be accessed by 248 + * pde_data() on the file inode. The network namespace must be accessed by 249 249 * calling seq_file_single_net() on the seq_file struct. 250 250 */ 251 251 struct proc_dir_entry *proc_create_net_single_write(const char *name, umode_t mode,
+61 -2
fs/proc/proc_sysctl.c
··· 16 16 #include <linux/module.h> 
17 17 #include <linux/bpf-cgroup.h> 
18 18 #include <linux/mount.h> 
19 + #include <linux/kmemleak.h> 
19 20 #include "internal.h" 
20 21 
21 22 static const struct dentry_operations proc_sys_dentry_operations; 
··· 26 25 static const struct inode_operations proc_sys_dir_operations; 
27 26 
28 27 /* shared constants to be used in various sysctls */ 
29 - const int sysctl_vals[] = { 0, 1, INT_MAX }; 
28 + const int sysctl_vals[] = { -1, 0, 1, 2, 4, 100, 200, 1000, 3000, INT_MAX, 65535 }; 
30 29 EXPORT_SYMBOL(sysctl_vals); 
31 30 
32 + const unsigned long sysctl_long_vals[] = { 0, 1, LONG_MAX }; 
33 + EXPORT_SYMBOL_GPL(sysctl_long_vals); 
31 33 
32 34 /* Support for permanently empty directories */ 
33 35 
34 36 struct ctl_table sysctl_mount_point[] = { 
35 37 { } 
36 38 }; 
39 + 
40 + /** 
41 + * register_sysctl_mount_point() - registers a sysctl mount point 
42 + * @path: path for the mount point 
43 + * 
44 + * Used to create a permanently empty directory to serve as mount point. 
45 + * There are some subtle but important permission checks this allows in the 
46 + * case of unprivileged mounts. 
47 + */ 
48 + struct ctl_table_header *register_sysctl_mount_point(const char *path) 
49 + { 
50 + return register_sysctl(path, sysctl_mount_point); 
51 + } 
52 + EXPORT_SYMBOL(register_sysctl_mount_point); 
37 53 
38 54 static bool is_empty_dir(struct ctl_table_header *head) 
39 55 { 
··· 1401 1383 } 
1402 1384 EXPORT_SYMBOL(register_sysctl); 
1403 1385 
1386 + /** 
1387 + * __register_sysctl_init() - register sysctl table to path 
1388 + * @path: path name for sysctl base 
1389 + * @table: This is the sysctl table that needs to be registered to the path 
1390 + * @table_name: The name of the sysctl table, used only for log printing when 
1391 + * registration fails 
1392 + * 
1393 + * The sysctl interface is used by userspace to query or modify at runtime 
1394 + * a predefined value set on a variable. These variables however have default 
1395 + * values pre-set. Code which depends on these variables will always work even 
1396 + * if register_sysctl() fails. If register_sysctl() fails you'd just lose the 
1397 + * ability to query or modify the sysctls dynamically at run time. Chances of 
1398 + * register_sysctl() failing on init are extremely low, and so for both reasons 
1399 + * this function does not return any error as it is used by initialization code. 
1400 + * 
1401 + * Context: Can only be called after your respective sysctl base path has been 
1402 + * registered. So for instance, most base directories are registered early on 
1403 + * init before init levels are processed through proc_sys_init() and 
1404 + * sysctl_init_bases(). 
1405 + */ 1406 + void __init __register_sysctl_init(const char *path, struct ctl_table *table, 1407 + const char *table_name) 1408 + { 1409 + struct ctl_table_header *hdr = register_sysctl(path, table); 1410 + 1411 + if (unlikely(!hdr)) { 1412 + pr_err("failed when register_sysctl %s to %s\n", table_name, path); 1413 + return; 1414 + } 1415 + kmemleak_not_leak(hdr); 1416 + } 1417 + 1404 1418 static char *append_path(const char *path, char *pos, const char *name) 1405 1419 { 1406 1420 int namelen; ··· 1646 1596 } 1647 1597 EXPORT_SYMBOL(register_sysctl_table); 1648 1598 1599 + int __register_sysctl_base(struct ctl_table *base_table) 1600 + { 1601 + struct ctl_table_header *hdr; 1602 + 1603 + hdr = register_sysctl_table(base_table); 1604 + kmemleak_not_leak(hdr); 1605 + return 0; 1606 + } 1607 + 1649 1608 static void put_links(struct ctl_table_header *header) 1650 1609 { 1651 1610 struct ctl_table_set *root_set = &sysctl_table_root.default_set; ··· 1768 1709 proc_sys_root->proc_dir_ops = &proc_sys_dir_file_operations; 1769 1710 proc_sys_root->nlink = 0; 1770 1711 1771 - return sysctl_init(); 1712 + return sysctl_init_bases(); 1772 1713 } 1773 1714 1774 1715 struct sysctl_alias {
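__register_sysctl_init() is normally reached through the register_sysctl_init() macro added alongside it in sysctl.h, which stringifies the table identifier for the error message; that is why every caller in this series passes a bare table name. Nothing is ever unregistered on this path, which is also why the header is handed to kmemleak_not_leak(). A boot-time user then amounts to the following sketch (hypothetical table):

#include <linux/init.h>
#include <linux/sysctl.h>

static int example_knob;	/* hypothetical */

static struct ctl_table example_fs_sysctls[] = {
	{
		.procname	= "example-knob",
		.data		= &example_knob,
		.maxlen		= sizeof(int),
		.mode		= 0644,
		.proc_handler	= proc_dointvec,
	},
	{ }
};

static int __init init_example_sysctls(void)
{
	/* expands to __register_sysctl_init("fs", ..., "example_fs_sysctls") */
	register_sysctl_init("fs", example_fs_sysctls);
	return 0;
}
fs_initcall(init_example_sysctls);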
-3
fs/super.c
··· 31 31 #include <linux/mutex.h> 32 32 #include <linux/backing-dev.h> 33 33 #include <linux/rculist_bl.h> 34 - #include <linux/cleancache.h> 35 34 #include <linux/fscrypt.h> 36 35 #include <linux/fsnotify.h> 37 36 #include <linux/lockdep.h> ··· 259 260 s->s_time_gran = 1000000000; 260 261 s->s_time_min = TIME64_MIN; 261 262 s->s_time_max = TIME64_MAX; 262 - s->cleancache_poolid = CLEANCACHE_NO_POOL; 263 263 264 264 s->s_shrink.seeks = DEFAULT_SEEKS; 265 265 s->s_shrink.scan_objects = super_cache_scan; ··· 328 330 { 329 331 struct file_system_type *fs = s->s_type; 330 332 if (atomic_dec_and_test(&s->s_active)) { 331 - cleancache_invalidate_fs(s); 332 333 unregister_shrinker(&s->s_shrink); 333 334 fs->kill_sb(s); 334 335
+39
fs/sysctls.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* 3 + * /proc/sys/fs shared sysctls 4 + * 5 + * These sysctls are shared between different filesystems. 6 + */ 7 + #include <linux/init.h> 8 + #include <linux/sysctl.h> 9 + 10 + static struct ctl_table fs_shared_sysctls[] = { 11 + { 12 + .procname = "overflowuid", 13 + .data = &fs_overflowuid, 14 + .maxlen = sizeof(int), 15 + .mode = 0644, 16 + .proc_handler = proc_dointvec_minmax, 17 + .extra1 = SYSCTL_ZERO, 18 + .extra2 = SYSCTL_MAXOLDUID, 19 + }, 20 + { 21 + .procname = "overflowgid", 22 + .data = &fs_overflowgid, 23 + .maxlen = sizeof(int), 24 + .mode = 0644, 25 + .proc_handler = proc_dointvec_minmax, 26 + .extra1 = SYSCTL_ZERO, 27 + .extra2 = SYSCTL_MAXOLDUID, 28 + }, 29 + { } 30 + }; 31 + 32 + DECLARE_SYSCTL_BASE(fs, fs_shared_sysctls); 33 + 34 + static int __init init_fs_sysctls(void) 35 + { 36 + return register_sysctl_base(fs); 37 + } 38 + 39 + early_initcall(init_fs_sysctls);
-4
include/linux/aio.h
··· 20 20 kiocb_cancel_fn *cancel) { } 21 21 #endif /* CONFIG_AIO */ 22 22 23 - /* for sysctl: */ 24 - extern unsigned long aio_nr; 25 - extern unsigned long aio_max_nr; 26 - 27 23 #endif /* __LINUX__AIO_H */
-124
include/linux/cleancache.h
··· 1 - /* SPDX-License-Identifier: GPL-2.0 */ 2 - #ifndef _LINUX_CLEANCACHE_H 3 - #define _LINUX_CLEANCACHE_H 4 - 5 - #include <linux/fs.h> 6 - #include <linux/exportfs.h> 7 - #include <linux/mm.h> 8 - 9 - #define CLEANCACHE_NO_POOL -1 10 - #define CLEANCACHE_NO_BACKEND -2 11 - #define CLEANCACHE_NO_BACKEND_SHARED -3 12 - 13 - #define CLEANCACHE_KEY_MAX 6 14 - 15 - /* 16 - * cleancache requires every file with a page in cleancache to have a 17 - * unique key unless/until the file is removed/truncated. For some 18 - * filesystems, the inode number is unique, but for "modern" filesystems 19 - * an exportable filehandle is required (see exportfs.h) 20 - */ 21 - struct cleancache_filekey { 22 - union { 23 - ino_t ino; 24 - __u32 fh[CLEANCACHE_KEY_MAX]; 25 - u32 key[CLEANCACHE_KEY_MAX]; 26 - } u; 27 - }; 28 - 29 - struct cleancache_ops { 30 - int (*init_fs)(size_t); 31 - int (*init_shared_fs)(uuid_t *uuid, size_t); 32 - int (*get_page)(int, struct cleancache_filekey, 33 - pgoff_t, struct page *); 34 - void (*put_page)(int, struct cleancache_filekey, 35 - pgoff_t, struct page *); 36 - void (*invalidate_page)(int, struct cleancache_filekey, pgoff_t); 37 - void (*invalidate_inode)(int, struct cleancache_filekey); 38 - void (*invalidate_fs)(int); 39 - }; 40 - 41 - extern int cleancache_register_ops(const struct cleancache_ops *ops); 42 - extern void __cleancache_init_fs(struct super_block *); 43 - extern void __cleancache_init_shared_fs(struct super_block *); 44 - extern int __cleancache_get_page(struct page *); 45 - extern void __cleancache_put_page(struct page *); 46 - extern void __cleancache_invalidate_page(struct address_space *, struct page *); 47 - extern void __cleancache_invalidate_inode(struct address_space *); 48 - extern void __cleancache_invalidate_fs(struct super_block *); 49 - 50 - #ifdef CONFIG_CLEANCACHE 51 - #define cleancache_enabled (1) 52 - static inline bool cleancache_fs_enabled_mapping(struct address_space *mapping) 53 - { 54 - return mapping->host->i_sb->cleancache_poolid >= 0; 55 - } 56 - static inline bool cleancache_fs_enabled(struct page *page) 57 - { 58 - return cleancache_fs_enabled_mapping(page->mapping); 59 - } 60 - #else 61 - #define cleancache_enabled (0) 62 - #define cleancache_fs_enabled(_page) (0) 63 - #define cleancache_fs_enabled_mapping(_page) (0) 64 - #endif 65 - 66 - /* 67 - * The shim layer provided by these inline functions allows the compiler 68 - * to reduce all cleancache hooks to nothingness if CONFIG_CLEANCACHE 69 - * is disabled, to a single global variable check if CONFIG_CLEANCACHE 70 - * is enabled but no cleancache "backend" has dynamically enabled it, 71 - * and, for the most frequent cleancache ops, to a single global variable 72 - * check plus a superblock element comparison if CONFIG_CLEANCACHE is enabled 73 - * and a cleancache backend has dynamically enabled cleancache, but the 74 - * filesystem referenced by that cleancache op has not enabled cleancache. 75 - * As a result, CONFIG_CLEANCACHE can be enabled by default with essentially 76 - * no measurable performance impact. 
77 - */ 78 - 79 - static inline void cleancache_init_fs(struct super_block *sb) 80 - { 81 - if (cleancache_enabled) 82 - __cleancache_init_fs(sb); 83 - } 84 - 85 - static inline void cleancache_init_shared_fs(struct super_block *sb) 86 - { 87 - if (cleancache_enabled) 88 - __cleancache_init_shared_fs(sb); 89 - } 90 - 91 - static inline int cleancache_get_page(struct page *page) 92 - { 93 - if (cleancache_enabled && cleancache_fs_enabled(page)) 94 - return __cleancache_get_page(page); 95 - return -1; 96 - } 97 - 98 - static inline void cleancache_put_page(struct page *page) 99 - { 100 - if (cleancache_enabled && cleancache_fs_enabled(page)) 101 - __cleancache_put_page(page); 102 - } 103 - 104 - static inline void cleancache_invalidate_page(struct address_space *mapping, 105 - struct page *page) 106 - { 107 - /* careful... page->mapping is NULL sometimes when this is called */ 108 - if (cleancache_enabled && cleancache_fs_enabled_mapping(mapping)) 109 - __cleancache_invalidate_page(mapping, page); 110 - } 111 - 112 - static inline void cleancache_invalidate_inode(struct address_space *mapping) 113 - { 114 - if (cleancache_enabled && cleancache_fs_enabled_mapping(mapping)) 115 - __cleancache_invalidate_inode(mapping); 116 - } 117 - 118 - static inline void cleancache_invalidate_fs(struct super_block *sb) 119 - { 120 - if (cleancache_enabled) 121 - __cleancache_invalidate_fs(sb); 122 - } 123 - 124 - #endif /* _LINUX_CLEANCACHE_H */
+6 -4
include/linux/coredump.h
··· 14 14 unsigned long dump_size; 15 15 }; 16 16 17 - extern int core_uses_pid; 18 - extern char core_pattern[]; 19 - extern unsigned int core_pipe_limit; 20 - 21 17 /* 22 18 * These are the only things you should do on a core-file: use only these 23 19 * functions to write out all the necessary info. ··· 31 35 extern void do_coredump(const kernel_siginfo_t *siginfo); 32 36 #else 33 37 static inline void do_coredump(const kernel_siginfo_t *siginfo) {} 38 + #endif 39 + 40 + #if defined(CONFIG_COREDUMP) && defined(CONFIG_SYSCTL) 41 + extern void validate_coredump_safety(void); 42 + #else 43 + static inline void validate_coredump_safety(void) {} 34 44 #endif 35 45 36 46 #endif /* _LINUX_COREDUMP_H */
-10
include/linux/dcache.h
··· 61 61 extern const struct qstr slash_name; 62 62 extern const struct qstr dotdot_name; 63 63 64 - struct dentry_stat_t { 65 - long nr_dentry; 66 - long nr_unused; 67 - long age_limit; /* age in seconds */ 68 - long want_pages; /* pages requested by system */ 69 - long nr_negative; /* # of unused negative dentries */ 70 - long dummy; /* Reserved for future use */ 71 - }; 72 - extern struct dentry_stat_t dentry_stat; 73 - 74 64 /* 75 65 * Try to keep struct dentry aligned on 64 byte cachelines (this will 76 66 * give reasonable cacheline footprint with larger lines without the
-1
include/linux/dnotify.h
··· 29 29 FS_CREATE | FS_RENAME |\ 30 30 FS_MOVED_FROM | FS_MOVED_TO) 31 31 32 - extern int dir_notify_enable; 33 32 extern void dnotify_flush(struct file *, fl_owner_t); 34 33 extern int fcntl_dirnotify(int, struct file *, unsigned long); 35 34
-2
include/linux/fanotify.h
··· 5 5 #include <linux/sysctl.h> 6 6 #include <uapi/linux/fanotify.h> 7 7 8 - extern struct ctl_table fanotify_table[]; /* for sysctl */ 9 - 10 8 #define FAN_GROUP_FLAG(group, flag) \ 11 9 ((group)->fanotify_data.flags & (flag)) 12 10
+2 -33
include/linux/frontswap.h
··· 7 7 #include <linux/bitops.h> 8 8 #include <linux/jump_label.h> 9 9 10 - /* 11 - * Return code to denote that requested number of 12 - * frontswap pages are unused(moved to page cache). 13 - * Used in shmem_unuse and try_to_unuse. 14 - */ 15 - #define FRONTSWAP_PAGES_UNUSED 2 16 - 17 10 struct frontswap_ops { 18 11 void (*init)(unsigned); /* this swap type was just swapon'ed */ 19 12 int (*store)(unsigned, pgoff_t, struct page *); /* store a page */ 20 13 int (*load)(unsigned, pgoff_t, struct page *); /* load a page */ 21 14 void (*invalidate_page)(unsigned, pgoff_t); /* page no longer needed */ 22 15 void (*invalidate_area)(unsigned); /* swap type just swapoff'ed */ 23 - struct frontswap_ops *next; /* private pointer to next ops */ 24 16 }; 25 17 26 - extern void frontswap_register_ops(struct frontswap_ops *ops); 27 - extern void frontswap_shrink(unsigned long); 28 - extern unsigned long frontswap_curr_pages(void); 29 - extern void frontswap_writethrough(bool); 30 - #define FRONTSWAP_HAS_EXCLUSIVE_GETS 31 - extern void frontswap_tmem_exclusive_gets(bool); 18 + int frontswap_register_ops(const struct frontswap_ops *ops); 32 19 33 - extern bool __frontswap_test(struct swap_info_struct *, pgoff_t); 34 - extern void __frontswap_init(unsigned type, unsigned long *map); 20 + extern void frontswap_init(unsigned type, unsigned long *map); 35 21 extern int __frontswap_store(struct page *page); 36 22 extern int __frontswap_load(struct page *page); 37 23 extern void __frontswap_invalidate_page(unsigned, pgoff_t); ··· 29 43 static inline bool frontswap_enabled(void) 30 44 { 31 45 return static_branch_unlikely(&frontswap_enabled_key); 32 - } 33 - 34 - static inline bool frontswap_test(struct swap_info_struct *sis, pgoff_t offset) 35 - { 36 - return __frontswap_test(sis, offset); 37 46 } 38 47 39 48 static inline void frontswap_map_set(struct swap_info_struct *p, ··· 45 64 /* all inline routines become no-ops and all externs are ignored */ 46 65 47 66 static inline bool frontswap_enabled(void) 48 - { 49 - return false; 50 - } 51 - 52 - static inline bool frontswap_test(struct swap_info_struct *sis, pgoff_t offset) 53 67 { 54 68 return false; 55 69 } ··· 86 110 { 87 111 if (frontswap_enabled()) 88 112 __frontswap_invalidate_area(type); 89 - } 90 - 91 - static inline void frontswap_init(unsigned type, unsigned long *map) 92 - { 93 - #ifdef CONFIG_FRONTSWAP 94 - __frontswap_init(type, map); 95 - #endif 96 113 } 97 114 98 115 #endif /* _LINUX_FRONTSWAP_H */
-18
include/linux/fs.h
··· 79 79 extern void __init files_init(void); 80 80 extern void __init files_maxfiles_init(void); 81 81 82 - extern struct files_stat_struct files_stat; 83 82 extern unsigned long get_max_files(void); 84 83 extern unsigned int sysctl_nr_open; 85 - extern struct inodes_stat_t inodes_stat; 86 - extern int leases_enable, lease_break_time; 87 - extern int sysctl_protected_symlinks; 88 - extern int sysctl_protected_hardlinks; 89 - extern int sysctl_protected_fifos; 90 - extern int sysctl_protected_regular; 91 84 92 85 typedef __kernel_rwf_t rwf_t; 93 86 ··· 1534 1541 const char *s_subtype; 1535 1542 1536 1543 const struct dentry_operations *s_d_op; /* default d_op for dentries */ 1537 - 1538 - /* 1539 - * Saved pool identifier for cleancache (-1 means none) 1540 - */ 1541 - int cleancache_poolid; 1542 1544 1543 1545 struct shrinker s_shrink; /* per-sb shrinker handle */ 1544 1546 ··· 3521 3533 size_t len, loff_t *ppos); 3522 3534 3523 3535 struct ctl_table; 3524 - int proc_nr_files(struct ctl_table *table, int write, 3525 - void *buffer, size_t *lenp, loff_t *ppos); 3526 - int proc_nr_dentry(struct ctl_table *table, int write, 3527 - void *buffer, size_t *lenp, loff_t *ppos); 3528 - int proc_nr_inodes(struct ctl_table *table, int write, 3529 - void *buffer, size_t *lenp, loff_t *ppos); 3530 3536 int __init list_bdev_fs_names(char *buf, size_t size); 3531 3537 3532 3538 #define __FMODE_EXEC ((__force int) FMODE_EXEC)
-3
include/linux/inotify.h
··· 7 7 #ifndef _LINUX_INOTIFY_H 8 8 #define _LINUX_INOTIFY_H 9 9 10 - #include <linux/sysctl.h> 11 10 #include <uapi/linux/inotify.h> 12 - 13 - extern struct ctl_table inotify_table[]; /* for sysctl */ 14 11 15 12 #define ALL_INOTIFY_BITS (IN_ACCESS | IN_MODIFY | IN_ATTRIB | IN_CLOSE_WRITE | \ 16 13 IN_CLOSE_NOWRITE | IN_OPEN | IN_MOVED_FROM | \
-6
include/linux/kprobes.h
··· 348 348 349 349 DEFINE_INSN_CACHE_OPS(optinsn); 350 350 351 - #ifdef CONFIG_SYSCTL 352 - extern int sysctl_kprobes_optimization; 353 - extern int proc_kprobes_optimization_handler(struct ctl_table *table, 354 - int write, void *buffer, 355 - size_t *length, loff_t *ppos); 356 - #endif /* CONFIG_SYSCTL */ 357 351 extern void wait_for_kprobe_optimizer(void); 358 352 #else /* !CONFIG_OPTPROBES */ 359 353 static inline void wait_for_kprobe_optimizer(void) { }
+2
include/linux/migrate.h
··· 40 40 struct page *newpage, struct page *page); 41 41 extern int migrate_page_move_mapping(struct address_space *mapping, 42 42 struct page *newpage, struct page *page, int extra_count); 43 + void migration_entry_wait_on_locked(swp_entry_t entry, pte_t *ptep, 44 + spinlock_t *ptl); 43 45 void folio_migrate_flags(struct folio *newfolio, struct folio *folio); 44 46 void folio_migrate_copy(struct folio *newfolio, struct folio *folio); 45 47 int folio_migrate_mapping(struct address_space *mapping,
-3
include/linux/mount.h
··· 113 113 extern void mark_mounts_for_expiry(struct list_head *mounts); 114 114 115 115 extern dev_t name_to_dev_t(const char *name); 116 - 117 - extern unsigned int sysctl_mount_max; 118 - 119 116 extern bool path_is_mountpoint(const struct path *path); 120 117 121 118 extern void kern_unmount_array(struct vfsmount *mnt[], unsigned int num);
-4
include/linux/pipe_fs_i.h
··· 238 238 void pipe_unlock(struct pipe_inode_info *); 239 239 void pipe_double_lock(struct pipe_inode_info *, struct pipe_inode_info *); 240 240 241 - extern unsigned int pipe_max_size; 242 - extern unsigned long pipe_user_pages_hard; 243 - extern unsigned long pipe_user_pages_soft; 244 - 245 241 /* Wait for a pipe to be readable/writable while dropping the pipe lock */ 246 242 void pipe_wait_readable(struct pipe_inode_info *); 247 243 void pipe_wait_writable(struct pipe_inode_info *);
-2
include/linux/poll.h
··· 8 8 #include <linux/wait.h> 9 9 #include <linux/string.h> 10 10 #include <linux/fs.h> 11 - #include <linux/sysctl.h> 12 11 #include <linux/uaccess.h> 13 12 #include <uapi/linux/poll.h> 14 13 #include <uapi/linux/eventpoll.h> 15 14 16 - extern struct ctl_table epoll_table[]; /* for sysctl */ 17 15 /* ~832 bytes of stack space used max in sys_select/sys_poll before allocating 18 16 additional memory. */ 19 17 #ifdef __clang__
-4
include/linux/printk.h
··· 183 183 extern int printk_delay_msec; 184 184 extern int dmesg_restrict; 185 185 186 - extern int 187 - devkmsg_sysctl_set_loglvl(struct ctl_table *table, int write, void *buf, 188 - size_t *lenp, loff_t *ppos); 189 - 190 186 extern void wake_up_klogd(void); 191 187 192 188 char *log_buf_addr_get(void);
+11 -2
include/linux/proc_fs.h
··· 110 110 struct proc_dir_entry *proc_create(const char *name, umode_t mode, struct proc_dir_entry *parent, const struct proc_ops *proc_ops); 111 111 extern void proc_set_size(struct proc_dir_entry *, loff_t); 112 112 extern void proc_set_user(struct proc_dir_entry *, kuid_t, kgid_t); 113 - extern void *PDE_DATA(const struct inode *); 113 + 114 + /* 115 + * Obtain the private data passed by user through proc_create_data() or 116 + * related. 117 + */ 118 + static inline void *pde_data(const struct inode *inode) 119 + { 120 + return inode->i_private; 121 + } 122 + 114 123 extern void *proc_get_parent_data(const struct inode *); 115 124 extern void proc_remove(struct proc_dir_entry *); 116 125 extern void remove_proc_entry(const char *, struct proc_dir_entry *); ··· 200 191 201 192 static inline void proc_set_size(struct proc_dir_entry *de, loff_t size) {} 202 193 static inline void proc_set_user(struct proc_dir_entry *de, kuid_t uid, kgid_t gid) {} 203 - static inline void *PDE_DATA(const struct inode *inode) {BUG(); return NULL;} 194 + static inline void *pde_data(const struct inode *inode) {BUG(); return NULL;} 204 195 static inline void *proc_get_parent_data(const struct inode *inode) { BUG(); return NULL; } 205 196 206 197 static inline void proc_remove(struct proc_dir_entry *de) {}
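pde_data() is the lower-case replacement for PDE_DATA(), now a trivial inline over inode->i_private. The usual round trip, sketched with hypothetical demo_* names: hand per-entry data to proc_create_data() and read it back in the open handler:

    #include <linux/proc_fs.h>
    #include <linux/seq_file.h>

    struct demo_dev {                       /* hypothetical per-entry data */
            const char *name;
    };

    static int demo_show(struct seq_file *m, void *v)
    {
            struct demo_dev *dev = m->private;      /* set by single_open() */

            seq_printf(m, "%s\n", dev->name);
            return 0;
    }

    static int demo_open(struct inode *inode, struct file *file)
    {
            /* pde_data() returns what proc_create_data() stored. */
            return single_open(file, demo_show, pde_data(inode));
    }

    static const struct proc_ops demo_proc_ops = {
            .proc_open      = demo_open,
            .proc_read      = seq_read,
            .proc_lseek     = seq_lseek,
            .proc_release   = single_release,
    };

    /* at init time, with a hypothetical dev pointer:
     *     proc_create_data("demo", 0444, NULL, &demo_proc_ops, dev);
     */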
+2
include/linux/ref_tracker.h
··· 4 4 #include <linux/refcount.h> 5 5 #include <linux/types.h> 6 6 #include <linux/spinlock.h> 7 + #include <linux/stackdepot.h> 7 8 8 9 struct ref_tracker; 9 10 ··· 27 26 spin_lock_init(&dir->lock); 28 27 dir->quarantine_avail = quarantine_count; 29 28 refcount_set(&dir->untracked, 1); 29 + stack_depot_init(); 30 30 } 31 31 32 32 void ref_tracker_dir_exit(struct ref_tracker_dir *dir);
+6
include/linux/rwlock.h
··· 55 55 #define write_lock(lock) _raw_write_lock(lock) 56 56 #define read_lock(lock) _raw_read_lock(lock) 57 57 58 + #ifdef CONFIG_DEBUG_LOCK_ALLOC 59 + #define write_lock_nested(lock, subclass) _raw_write_lock_nested(lock, subclass) 60 + #else 61 + #define write_lock_nested(lock, subclass) _raw_write_lock(lock) 62 + #endif 63 + 58 64 #if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) 59 65 60 66 #define read_lock_irqsave(lock, flags) \
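write_lock_nested() exists so lockdep accepts write-locking two rwlocks of the same lock class at once, as the zsmalloc rework in this series needs. A sketch under that assumption (struct and function names are hypothetical; callers must still impose a stable lock order, e.g. by address, to avoid ABBA deadlocks):

    #include <linux/lockdep.h>
    #include <linux/spinlock.h>     /* rwlock_t, write_lock*() */

    struct demo_bucket {
            rwlock_t lock;          /* rwlock_init()ed at creation */
            /* ... */
    };

    static void demo_migrate(struct demo_bucket *src, struct demo_bucket *dst)
    {
            write_lock(&src->lock);
            /* Same lock class as src->lock: annotate the nesting level. */
            write_lock_nested(&dst->lock, SINGLE_DEPTH_NESTING);

            /* ... move entries from src to dst ... */

            write_unlock(&dst->lock);
            write_unlock(&src->lock);
    }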
+8
include/linux/rwlock_api_smp.h
··· 17 17 18 18 void __lockfunc _raw_read_lock(rwlock_t *lock) __acquires(lock); 19 19 void __lockfunc _raw_write_lock(rwlock_t *lock) __acquires(lock); 20 + void __lockfunc _raw_write_lock_nested(rwlock_t *lock, int subclass) __acquires(lock); 20 21 void __lockfunc _raw_read_lock_bh(rwlock_t *lock) __acquires(lock); 21 22 void __lockfunc _raw_write_lock_bh(rwlock_t *lock) __acquires(lock); 22 23 void __lockfunc _raw_read_lock_irq(rwlock_t *lock) __acquires(lock); ··· 207 206 { 208 207 preempt_disable(); 209 208 rwlock_acquire(&lock->dep_map, 0, 0, _RET_IP_); 209 + LOCK_CONTENDED(lock, do_raw_write_trylock, do_raw_write_lock); 210 + } 211 + 212 + static inline void __raw_write_lock_nested(rwlock_t *lock, int subclass) 213 + { 214 + preempt_disable(); 215 + rwlock_acquire(&lock->dep_map, subclass, 0, _RET_IP_); 210 216 LOCK_CONTENDED(lock, do_raw_write_trylock, do_raw_write_lock); 211 217 } 212 218
+10
include/linux/rwlock_rt.h
··· 28 28 extern int rt_read_trylock(rwlock_t *rwlock); 29 29 extern void rt_read_unlock(rwlock_t *rwlock); 30 30 extern void rt_write_lock(rwlock_t *rwlock); 31 + extern void rt_write_lock_nested(rwlock_t *rwlock, int subclass); 31 32 extern int rt_write_trylock(rwlock_t *rwlock); 32 33 extern void rt_write_unlock(rwlock_t *rwlock); 33 34 ··· 83 82 { 84 83 rt_write_lock(rwlock); 85 84 } 85 + 86 + #ifdef CONFIG_DEBUG_LOCK_ALLOC 87 + static __always_inline void write_lock_nested(rwlock_t *rwlock, int subclass) 88 + { 89 + rt_write_lock_nested(rwlock, subclass); 90 + } 91 + #else 92 + #define write_lock_nested(lock, subclass) rt_write_lock(((void)(subclass), (lock))) 93 + #endif 86 94 87 95 static __always_inline void write_lock_bh(rwlock_t *rwlock) 88 96 {
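In the !CONFIG_DEBUG_LOCK_ALLOC fallback above, and again in kernel/locking/spinlock.c below, the subclass argument is swallowed with a comma expression. The idiom as a standalone sketch, with a hypothetical helper name:

    /* ((void)(subclass), (lock)) evaluates "subclass" (keeping the macro
     * argument "used", so side effects still happen and no unused-value
     * warnings fire), discards it, and yields "lock" as the result. */
    static inline rwlock_t *demo_pick_lock(rwlock_t *lock, int subclass)
    {
            return ((void)(subclass), (lock));
    }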
+1 -13
include/linux/sched/sysctl.h
··· 7 7 struct ctl_table; 8 8 9 9 #ifdef CONFIG_DETECT_HUNG_TASK 10 - 11 - #ifdef CONFIG_SMP 12 - extern unsigned int sysctl_hung_task_all_cpu_backtrace; 13 - #else 14 - #define sysctl_hung_task_all_cpu_backtrace 0 15 - #endif /* CONFIG_SMP */ 16 - 17 - extern int sysctl_hung_task_check_count; 18 - extern unsigned int sysctl_hung_task_panic; 10 + /* used for hung_task and block/ */ 19 11 extern unsigned long sysctl_hung_task_timeout_secs; 20 - extern unsigned long sysctl_hung_task_check_interval_secs; 21 - extern int sysctl_hung_task_warnings; 22 - int proc_dohung_task_timeout_secs(struct ctl_table *table, int write, 23 - void *buffer, size_t *lenp, loff_t *ppos); 24 12 #else 25 13 /* Avoid need for ifdefs elsewhere in the code */ 26 14 enum { sysctl_hung_task_timeout_secs = 0 };
+1 -1
include/linux/seq_file.h
··· 209 209 #define DEFINE_PROC_SHOW_ATTRIBUTE(__name) \ 210 210 static int __name ## _open(struct inode *inode, struct file *file) \ 211 211 { \ 212 - return single_open(file, __name ## _show, PDE_DATA(inode)); \ 212 + return single_open(file, __name ## _show, pde_data(inode)); \ 213 213 } \ 214 214 \ 215 215 static const struct proc_ops __name ## _proc_ops = { \
+1 -2
include/linux/shmem_fs.h
··· 83 83 extern struct page *shmem_read_mapping_page_gfp(struct address_space *mapping, 84 84 pgoff_t index, gfp_t gfp_mask); 85 85 extern void shmem_truncate_range(struct inode *inode, loff_t start, loff_t end); 86 - extern int shmem_unuse(unsigned int type, bool frontswap, 87 - unsigned long *fs_pages_to_unuse); 86 + int shmem_unuse(unsigned int type); 88 87 89 88 extern bool shmem_is_huge(struct vm_area_struct *vma, 90 89 struct inode *inode, pgoff_t index);
+1
include/linux/spinlock_api_up.h
··· 59 59 #define _raw_spin_lock_nested(lock, subclass) __LOCK(lock) 60 60 #define _raw_read_lock(lock) __LOCK(lock) 61 61 #define _raw_write_lock(lock) __LOCK(lock) 62 + #define _raw_write_lock_nested(lock, subclass) __LOCK(lock) 62 63 #define _raw_spin_lock_bh(lock) __LOCK_BH(lock) 63 64 #define _raw_read_lock_bh(lock) __LOCK_BH(lock) 64 65 #define _raw_write_lock_bh(lock) __LOCK_BH(lock)
+16 -9
include/linux/stackdepot.h
··· 19 19 unsigned int nr_entries, 20 20 gfp_t gfp_flags, bool can_alloc); 21 21 22 + /* 23 + * Every user of stack depot has to call this during its own init when it's 24 + * decided that it will be calling stack_depot_save() later. 25 + * 26 + * The alternative is to select STACKDEPOT_ALWAYS_INIT to have stack depot 27 + * enabled as part of mm_init(), for subsystems where it's known at compile time 28 + * that stack depot will be used. 29 + */ 30 + int stack_depot_init(void); 31 + 32 + #ifdef CONFIG_STACKDEPOT_ALWAYS_INIT 33 + static inline int stack_depot_early_init(void) { return stack_depot_init(); } 34 + #else 35 + static inline int stack_depot_early_init(void) { return 0; } 36 + #endif 37 + 22 38 depot_stack_handle_t stack_depot_save(unsigned long *entries, 23 39 unsigned int nr_entries, gfp_t gfp_flags); 24 40 ··· 45 29 int spaces); 46 30 47 31 void stack_depot_print(depot_stack_handle_t stack); 48 - 49 - #ifdef CONFIG_STACKDEPOT 50 - int stack_depot_init(void); 51 - #else 52 - static inline int stack_depot_init(void) 53 - { 54 - return 0; 55 - } 56 - #endif /* CONFIG_STACKDEPOT */ 57 32 58 33 #endif
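With the lazy-allocation change to lib/stackdepot.c at the end of this series, stack_depot_init() becomes idempotent and callable after boot, so opt-in users simply call it from their own init, as ref_tracker_dir_init() does above. A sketch of such an opt-in user; everything named demo_* is hypothetical:

    #include <linux/kernel.h>
    #include <linux/stackdepot.h>
    #include <linux/stacktrace.h>

    static bool demo_tracking;

    static int __init demo_tracking_init(void)
    {
            /* Idempotent; allocates the hash table on first call. */
            if (stack_depot_init())
                    return -ENOMEM;
            demo_tracking = true;
            return 0;
    }

    static depot_stack_handle_t demo_record_stack(void)
    {
            unsigned long entries[16];
            unsigned int nr;

            if (!demo_tracking)
                    return 0;
            nr = stack_trace_save(entries, ARRAY_SIZE(entries), 1);
            /* filter_irq_stacks() now runs inside stack_depot_save(). */
            return stack_depot_save(entries, nr, GFP_ATOMIC);
    }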
-5
include/linux/stackleak.h
··· 23 23 # endif 24 24 } 25 25 26 - #ifdef CONFIG_STACKLEAK_RUNTIME_DISABLE 27 - int stack_erasing_sysctl(struct ctl_table *table, int write, 28 - void *buffer, size_t *lenp, loff_t *ppos); 29 - #endif 30 - 31 26 #else /* !CONFIG_GCC_PLUGIN_STACKLEAK */ 32 27 static inline void stackleak_task_init(struct task_struct *t) { } 33 28 #endif
-3
include/linux/swapfile.h
··· 6 6 * these were static in swapfile.c but frontswap.c needs them and we don't 7 7 * want to expose them to the dozens of source files that include swap.h 8 8 */ 9 - extern spinlock_t swap_lock; 10 - extern struct plist_head swap_active_head; 11 9 extern struct swap_info_struct *swap_info[]; 12 - extern int try_to_unuse(unsigned int, bool, unsigned long); 13 10 extern unsigned long generic_max_swapfile_size(void); 14 11 extern unsigned long max_swapfile_size(void); 15 12
+60 -7
include/linux/sysctl.h
··· 38 38 struct ctl_dir; 39 39 40 40 /* Keep the same order as in fs/proc/proc_sysctl.c */ 41 - #define SYSCTL_ZERO ((void *)&sysctl_vals[0]) 42 - #define SYSCTL_ONE ((void *)&sysctl_vals[1]) 43 - #define SYSCTL_INT_MAX ((void *)&sysctl_vals[2]) 41 + #define SYSCTL_NEG_ONE ((void *)&sysctl_vals[0]) 42 + #define SYSCTL_ZERO ((void *)&sysctl_vals[1]) 43 + #define SYSCTL_ONE ((void *)&sysctl_vals[2]) 44 + #define SYSCTL_TWO ((void *)&sysctl_vals[3]) 45 + #define SYSCTL_FOUR ((void *)&sysctl_vals[4]) 46 + #define SYSCTL_ONE_HUNDRED ((void *)&sysctl_vals[5]) 47 + #define SYSCTL_TWO_HUNDRED ((void *)&sysctl_vals[6]) 48 + #define SYSCTL_ONE_THOUSAND ((void *)&sysctl_vals[7]) 49 + #define SYSCTL_THREE_THOUSAND ((void *)&sysctl_vals[8]) 50 + #define SYSCTL_INT_MAX ((void *)&sysctl_vals[9]) 51 + 52 + /* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */ 53 + #define SYSCTL_MAXOLDUID ((void *)&sysctl_vals[10]) 44 54 45 55 extern const int sysctl_vals[]; 56 + 57 + #define SYSCTL_LONG_ZERO ((void *)&sysctl_long_vals[0]) 58 + #define SYSCTL_LONG_ONE ((void *)&sysctl_long_vals[1]) 59 + #define SYSCTL_LONG_MAX ((void *)&sysctl_long_vals[2]) 60 + 61 + extern const unsigned long sysctl_long_vals[]; 46 62 47 63 typedef int proc_handler(struct ctl_table *ctl, int write, void *buffer, 48 64 size_t *lenp, loff_t *ppos); ··· 194 178 195 179 #ifdef CONFIG_SYSCTL 196 180 181 + #define DECLARE_SYSCTL_BASE(_name, _table) \ 182 + static struct ctl_table _name##_base_table[] = { \ 183 + { \ 184 + .procname = #_name, \ 185 + .mode = 0555, \ 186 + .child = _table, \ 187 + }, \ 188 + { }, \ 189 + } 190 + 191 + extern int __register_sysctl_base(struct ctl_table *base_table); 192 + 193 + #define register_sysctl_base(_name) __register_sysctl_base(_name##_base_table) 194 + 197 195 void proc_sys_poll_notify(struct ctl_table_poll *poll); 198 196 199 197 extern void setup_sysctl_set(struct ctl_table_set *p, ··· 228 198 229 199 void unregister_sysctl_table(struct ctl_table_header * table); 230 200 231 - extern int sysctl_init(void); 201 + extern int sysctl_init_bases(void); 202 + extern void __register_sysctl_init(const char *path, struct ctl_table *table, 203 + const char *table_name); 204 + #define register_sysctl_init(path, table) __register_sysctl_init(path, table, #table) 205 + extern struct ctl_table_header *register_sysctl_mount_point(const char *path); 206 + 232 207 void do_sysctl_args(void); 208 + int do_proc_douintvec(struct ctl_table *table, int write, 209 + void *buffer, size_t *lenp, loff_t *ppos, 210 + int (*conv)(unsigned long *lvalp, 211 + unsigned int *valp, 212 + int write, void *data), 213 + void *data); 233 214 234 215 extern int pwrsw_enabled; 235 216 extern int unaligned_enabled; ··· 248 207 extern int no_unaligned_warning; 249 208 250 209 extern struct ctl_table sysctl_mount_point[]; 251 - extern struct ctl_table random_table[]; 252 - extern struct ctl_table firmware_config_table[]; 253 - extern struct ctl_table epoll_table[]; 254 210 255 211 #else /* CONFIG_SYSCTL */ 212 + 213 + #define DECLARE_SYSCTL_BASE(_name, _table) 214 + 215 + static inline int __register_sysctl_base(struct ctl_table *base_table) 216 + { 217 + return 0; 218 + } 219 + 220 + #define register_sysctl_base(table) __register_sysctl_base(table) 221 + 256 222 static inline struct ctl_table_header *register_sysctl_table(struct ctl_table * table) 223 + { 224 + return NULL; 225 + } 226 + 227 + static inline struct sysctl_header *register_sysctl_mount_point(const char *path) 257 228 { 258 229 return NULL; 259 230 }
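The new SYSCTL_* pointers replace the per-file "static int two = 2;"-style variables that every table used to carry, and DECLARE_SYSCTL_BASE()/register_sysctl_base() replace the open-coded one-entry directory tables (see the kernel/sysctl.c hunk below). A sketch with a hypothetical top-level "demo" directory:

    #include <linux/sysctl.h>

    static int demo_mode;

    static struct ctl_table demo_table[] = {
            {
                    .procname       = "mode",
                    .data           = &demo_mode,
                    .maxlen         = sizeof(int),
                    .mode           = 0644,
                    .proc_handler   = proc_dointvec_minmax,
                    .extra1         = SYSCTL_ZERO,  /* shared constants, */
                    .extra2         = SYSCTL_TWO,   /* not per-file ints */
            },
            { }
    };

    DECLARE_SYSCTL_BASE(demo, demo_table);          /* emits demo_base_table */

    static int __init demo_sysctl_init(void)
    {
            return register_sysctl_base(demo);      /* /proc/sys/demo/mode */
    }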
-4
include/scsi/sg.h
··· 29 29 * For utility and test programs see: http://sg.danny.cz/sg/sg3_utils.html 30 30 */ 31 31 32 - #ifdef __KERNEL__ 33 - extern int sg_big_buff; /* for sysctl */ 34 - #endif 35 - 36 32 37 33 typedef struct sg_iovec /* same structure as used by readv() Linux system */ 38 34 { /* call. It defines one scatter-gather element. */
+6 -3
init/main.c
··· 834 834 init_mem_debugging_and_hardening(); 835 835 kfence_alloc_pool(); 836 836 report_meminit(); 837 - stack_depot_init(); 837 + stack_depot_early_init(); 838 838 mem_init(); 839 839 mem_init_print_info(); 840 - /* page_owner must be initialized after buddy is ready */ 841 - page_ext_init_flatmem_late(); 842 840 kmem_cache_init(); 841 + /* 842 + * page_owner must be initialized after buddy is ready, and also after 843 + * slab is ready so that stack_depot_init() works properly 844 + */ 845 + page_ext_init_flatmem_late(); 843 846 kmemleak_init(); 844 847 pgtable_init(); 845 848 debug_objects_mem_init();
+1 -1
ipc/util.c
··· 894 894 if (!iter) 895 895 return -ENOMEM; 896 896 897 - iter->iface = PDE_DATA(inode); 897 + iter->iface = pde_data(inode); 898 898 iter->ns = get_ipc_ns(current->nsproxy->ipc_ns); 899 899 iter->pid_ns = get_pid_ns(task_active_pid_ns(current)); 900 900
+78 -3
kernel/hung_task.c
··· 63 63 * Should we dump all CPUs backtraces in a hung task event? 64 64 * Defaults to 0, can be changed via sysctl. 65 65 */ 66 - unsigned int __read_mostly sysctl_hung_task_all_cpu_backtrace; 66 + static unsigned int __read_mostly sysctl_hung_task_all_cpu_backtrace; 67 + #else 68 + #define sysctl_hung_task_all_cpu_backtrace 0 67 69 #endif /* CONFIG_SMP */ 68 70 69 71 /* ··· 224 222 MAX_SCHEDULE_TIMEOUT; 225 223 } 226 224 225 + #ifdef CONFIG_SYSCTL 227 226 /* 228 227 * Process updating of timeout sysctl 229 228 */ 230 - int proc_dohung_task_timeout_secs(struct ctl_table *table, int write, 231 - void *buffer, size_t *lenp, loff_t *ppos) 229 + static int proc_dohung_task_timeout_secs(struct ctl_table *table, int write, 230 + void __user *buffer, 231 + size_t *lenp, loff_t *ppos) 232 232 { 233 233 int ret; 234 234 ··· 244 240 out: 245 241 return ret; 246 242 } 243 + 244 + /* 245 + * This is needed for proc_doulongvec_minmax of sysctl_hung_task_timeout_secs 246 + * and hung_task_check_interval_secs 247 + */ 248 + static const unsigned long hung_task_timeout_max = (LONG_MAX / HZ); 249 + static struct ctl_table hung_task_sysctls[] = { 250 + #ifdef CONFIG_SMP 251 + { 252 + .procname = "hung_task_all_cpu_backtrace", 253 + .data = &sysctl_hung_task_all_cpu_backtrace, 254 + .maxlen = sizeof(int), 255 + .mode = 0644, 256 + .proc_handler = proc_dointvec_minmax, 257 + .extra1 = SYSCTL_ZERO, 258 + .extra2 = SYSCTL_ONE, 259 + }, 260 + #endif /* CONFIG_SMP */ 261 + { 262 + .procname = "hung_task_panic", 263 + .data = &sysctl_hung_task_panic, 264 + .maxlen = sizeof(int), 265 + .mode = 0644, 266 + .proc_handler = proc_dointvec_minmax, 267 + .extra1 = SYSCTL_ZERO, 268 + .extra2 = SYSCTL_ONE, 269 + }, 270 + { 271 + .procname = "hung_task_check_count", 272 + .data = &sysctl_hung_task_check_count, 273 + .maxlen = sizeof(int), 274 + .mode = 0644, 275 + .proc_handler = proc_dointvec_minmax, 276 + .extra1 = SYSCTL_ZERO, 277 + }, 278 + { 279 + .procname = "hung_task_timeout_secs", 280 + .data = &sysctl_hung_task_timeout_secs, 281 + .maxlen = sizeof(unsigned long), 282 + .mode = 0644, 283 + .proc_handler = proc_dohung_task_timeout_secs, 284 + .extra2 = (void *)&hung_task_timeout_max, 285 + }, 286 + { 287 + .procname = "hung_task_check_interval_secs", 288 + .data = &sysctl_hung_task_check_interval_secs, 289 + .maxlen = sizeof(unsigned long), 290 + .mode = 0644, 291 + .proc_handler = proc_dohung_task_timeout_secs, 292 + .extra2 = (void *)&hung_task_timeout_max, 293 + }, 294 + { 295 + .procname = "hung_task_warnings", 296 + .data = &sysctl_hung_task_warnings, 297 + .maxlen = sizeof(int), 298 + .mode = 0644, 299 + .proc_handler = proc_dointvec_minmax, 300 + .extra1 = SYSCTL_NEG_ONE, 301 + }, 302 + {} 303 + }; 304 + 305 + static void __init hung_task_sysctl_init(void) 306 + { 307 + register_sysctl_init("kernel", hung_task_sysctls); 308 + } 309 + #else 310 + #define hung_task_sysctl_init() do { } while (0) 311 + #endif /* CONFIG_SYSCTL */ 312 + 247 313 248 314 static atomic_t reset_hung_task = ATOMIC_INIT(0); 249 315 ··· 384 310 pm_notifier(hungtask_pm_notify, 0); 385 311 386 312 watchdog_task = kthread_run(watchdog, NULL, "khungtaskd"); 313 + hung_task_sysctl_init(); 387 314 388 315 return 0; 389 316 }
+4 -4
kernel/irq/proc.c
··· 137 137 static ssize_t write_irq_affinity(int type, struct file *file, 138 138 const char __user *buffer, size_t count, loff_t *pos) 139 139 { 140 - unsigned int irq = (int)(long)PDE_DATA(file_inode(file)); 140 + unsigned int irq = (int)(long)pde_data(file_inode(file)); 141 141 cpumask_var_t new_value; 142 142 int err; 143 143 ··· 190 190 191 191 static int irq_affinity_proc_open(struct inode *inode, struct file *file) 192 192 { 193 - return single_open(file, irq_affinity_proc_show, PDE_DATA(inode)); 193 + return single_open(file, irq_affinity_proc_show, pde_data(inode)); 194 194 } 195 195 196 196 static int irq_affinity_list_proc_open(struct inode *inode, struct file *file) 197 197 { 198 - return single_open(file, irq_affinity_list_proc_show, PDE_DATA(inode)); 198 + return single_open(file, irq_affinity_list_proc_show, pde_data(inode)); 199 199 } 200 200 201 201 static const struct proc_ops irq_affinity_proc_ops = { ··· 265 265 266 266 static int default_affinity_open(struct inode *inode, struct file *file) 267 267 { 268 - return single_open(file, default_affinity_show, PDE_DATA(inode)); 268 + return single_open(file, default_affinity_show, pde_data(inode)); 269 269 } 270 270 271 271 static const struct proc_ops default_affinity_proc_ops = {
+26 -4
kernel/kprobes.c
··· 48 48 #define KPROBE_HASH_BITS 6 49 49 #define KPROBE_TABLE_SIZE (1 << KPROBE_HASH_BITS) 50 50 51 + #if !defined(CONFIG_OPTPROBES) || !defined(CONFIG_SYSCTL) 52 + #define kprobe_sysctls_init() do { } while (0) 53 + #endif 51 54 52 55 static int kprobes_initialized; 53 56 /* kprobe_table can be accessed by ··· 941 938 } 942 939 943 940 static DEFINE_MUTEX(kprobe_sysctl_mutex); 944 - int sysctl_kprobes_optimization; 945 - int proc_kprobes_optimization_handler(struct ctl_table *table, int write, 946 - void *buffer, size_t *length, 947 - loff_t *ppos) 941 + static int sysctl_kprobes_optimization; 942 + static int proc_kprobes_optimization_handler(struct ctl_table *table, 943 + int write, void *buffer, 944 + size_t *length, loff_t *ppos) 948 945 { 949 946 int ret; 950 947 ··· 959 956 mutex_unlock(&kprobe_sysctl_mutex); 960 957 961 958 return ret; 959 + } 960 + 961 + static struct ctl_table kprobe_sysctls[] = { 962 + { 963 + .procname = "kprobes-optimization", 964 + .data = &sysctl_kprobes_optimization, 965 + .maxlen = sizeof(int), 966 + .mode = 0644, 967 + .proc_handler = proc_kprobes_optimization_handler, 968 + .extra1 = SYSCTL_ZERO, 969 + .extra2 = SYSCTL_ONE, 970 + }, 971 + {} 972 + }; 973 + 974 + static void __init kprobe_sysctls_init(void) 975 + { 976 + register_sysctl_init("debug", kprobe_sysctls); 962 977 } 963 978 #endif /* CONFIG_SYSCTL */ 964 979 ··· 2605 2584 err = register_module_notifier(&kprobe_module_nb); 2606 2585 2607 2586 kprobes_initialized = (err == 0); 2587 + kprobe_sysctls_init(); 2608 2588 return err; 2609 2589 } 2610 2590 early_initcall(init_kprobes);
+10
kernel/locking/spinlock.c
··· 300 300 __raw_write_lock(lock); 301 301 } 302 302 EXPORT_SYMBOL(_raw_write_lock); 303 + 304 + #ifndef CONFIG_DEBUG_LOCK_ALLOC 305 + #define __raw_write_lock_nested(lock, subclass) __raw_write_lock(((void)(subclass), (lock))) 306 + #endif 307 + 308 + void __lockfunc _raw_write_lock_nested(rwlock_t *lock, int subclass) 309 + { 310 + __raw_write_lock_nested(lock, subclass); 311 + } 312 + EXPORT_SYMBOL(_raw_write_lock_nested); 303 313 #endif 304 314 305 315 #ifndef CONFIG_INLINE_WRITE_LOCK_IRQSAVE
+12
kernel/locking/spinlock_rt.c
··· 239 239 } 240 240 EXPORT_SYMBOL(rt_write_lock); 241 241 242 + #ifdef CONFIG_DEBUG_LOCK_ALLOC 243 + void __sched rt_write_lock_nested(rwlock_t *rwlock, int subclass) 244 + { 245 + rtlock_might_resched(); 246 + rwlock_acquire(&rwlock->dep_map, subclass, 0, _RET_IP_); 247 + rwbase_write_lock(&rwlock->rwbase, TASK_RTLOCK_WAIT); 248 + rcu_read_lock(); 249 + migrate_disable(); 250 + } 251 + EXPORT_SYMBOL(rt_write_lock_nested); 252 + #endif 253 + 242 254 void __sched rt_read_unlock(rwlock_t *rwlock) 243 255 { 244 256 rwlock_release(&rwlock->dep_map, _RET_IP_);
+4 -1
kernel/printk/Makefile
··· 2 2 obj-y = printk.o 3 3 obj-$(CONFIG_PRINTK) += printk_safe.o 4 4 obj-$(CONFIG_A11Y_BRAILLE_CONSOLE) += braille.o 5 - obj-$(CONFIG_PRINTK) += printk_ringbuffer.o 6 5 obj-$(CONFIG_PRINTK_INDEX) += index.o 6 + 7 + obj-$(CONFIG_PRINTK) += printk_support.o 8 + printk_support-y := printk_ringbuffer.o 9 + printk_support-$(CONFIG_SYSCTL) += sysctl.o
+8
kernel/printk/internal.h
··· 4 4 */ 5 5 #include <linux/percpu.h> 6 6 7 + #if defined(CONFIG_PRINTK) && defined(CONFIG_SYSCTL) 8 + void __init printk_sysctl_init(void); 9 + int devkmsg_sysctl_set_loglvl(struct ctl_table *table, int write, 10 + void *buffer, size_t *lenp, loff_t *ppos); 11 + #else 12 + #define printk_sysctl_init() do { } while (0) 13 + #endif 14 + 7 15 #ifdef CONFIG_PRINTK 8 16 9 17 /* Flags for a single printk record. */
+3 -1
kernel/printk/printk.c
··· 171 171 __setup("printk.devkmsg=", control_devkmsg); 172 172 173 173 char devkmsg_log_str[DEVKMSG_STR_MAX_SIZE] = "ratelimit"; 174 - 174 + #if defined(CONFIG_PRINTK) && defined(CONFIG_SYSCTL) 175 175 int devkmsg_sysctl_set_loglvl(struct ctl_table *table, int write, 176 176 void *buffer, size_t *lenp, loff_t *ppos) 177 177 { ··· 210 210 211 211 return 0; 212 212 } 213 + #endif /* CONFIG_PRINTK && CONFIG_SYSCTL */ 213 214 214 215 /* Number of registered extended console drivers. */ 215 216 static int nr_ext_console_drivers; ··· 3212 3211 ret = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, "printk:online", 3213 3212 console_cpu_notify, NULL); 3214 3213 WARN_ON(ret < 0); 3214 + printk_sysctl_init(); 3215 3215 return 0; 3216 3216 } 3217 3217 late_initcall(printk_late_init);
+85
kernel/printk/sysctl.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-only 2 + /* 3 + * sysctl.c: General linux system control interface 4 + */ 5 + 6 + #include <linux/sysctl.h> 7 + #include <linux/printk.h> 8 + #include <linux/capability.h> 9 + #include <linux/ratelimit.h> 10 + #include "internal.h" 11 + 12 + static const int ten_thousand = 10000; 13 + 14 + static int proc_dointvec_minmax_sysadmin(struct ctl_table *table, int write, 15 + void __user *buffer, size_t *lenp, loff_t *ppos) 16 + { 17 + if (write && !capable(CAP_SYS_ADMIN)) 18 + return -EPERM; 19 + 20 + return proc_dointvec_minmax(table, write, buffer, lenp, ppos); 21 + } 22 + 23 + static struct ctl_table printk_sysctls[] = { 24 + { 25 + .procname = "printk", 26 + .data = &console_loglevel, 27 + .maxlen = 4*sizeof(int), 28 + .mode = 0644, 29 + .proc_handler = proc_dointvec, 30 + }, 31 + { 32 + .procname = "printk_ratelimit", 33 + .data = &printk_ratelimit_state.interval, 34 + .maxlen = sizeof(int), 35 + .mode = 0644, 36 + .proc_handler = proc_dointvec_jiffies, 37 + }, 38 + { 39 + .procname = "printk_ratelimit_burst", 40 + .data = &printk_ratelimit_state.burst, 41 + .maxlen = sizeof(int), 42 + .mode = 0644, 43 + .proc_handler = proc_dointvec, 44 + }, 45 + { 46 + .procname = "printk_delay", 47 + .data = &printk_delay_msec, 48 + .maxlen = sizeof(int), 49 + .mode = 0644, 50 + .proc_handler = proc_dointvec_minmax, 51 + .extra1 = SYSCTL_ZERO, 52 + .extra2 = (void *)&ten_thousand, 53 + }, 54 + { 55 + .procname = "printk_devkmsg", 56 + .data = devkmsg_log_str, 57 + .maxlen = DEVKMSG_STR_MAX_SIZE, 58 + .mode = 0644, 59 + .proc_handler = devkmsg_sysctl_set_loglvl, 60 + }, 61 + { 62 + .procname = "dmesg_restrict", 63 + .data = &dmesg_restrict, 64 + .maxlen = sizeof(int), 65 + .mode = 0644, 66 + .proc_handler = proc_dointvec_minmax_sysadmin, 67 + .extra1 = SYSCTL_ZERO, 68 + .extra2 = SYSCTL_ONE, 69 + }, 70 + { 71 + .procname = "kptr_restrict", 72 + .data = &kptr_restrict, 73 + .maxlen = sizeof(int), 74 + .mode = 0644, 75 + .proc_handler = proc_dointvec_minmax_sysadmin, 76 + .extra1 = SYSCTL_ZERO, 77 + .extra2 = SYSCTL_TWO, 78 + }, 79 + {} 80 + }; 81 + 82 + void __init printk_sysctl_init(void) 83 + { 84 + register_sysctl_init("kernel", printk_sysctls); 85 + }
+2 -2
kernel/resource.c
··· 99 99 static void *r_start(struct seq_file *m, loff_t *pos) 100 100 __acquires(resource_lock) 101 101 { 102 - struct resource *p = PDE_DATA(file_inode(m->file)); 102 + struct resource *p = pde_data(file_inode(m->file)); 103 103 loff_t l = 0; 104 104 read_lock(&resource_lock); 105 105 for (p = p->child; p && l < *pos; p = r_next(m, p, &l)) ··· 115 115 116 116 static int r_show(struct seq_file *m, void *v) 117 117 { 118 - struct resource *root = PDE_DATA(file_inode(m->file)); 118 + struct resource *root = pde_data(file_inode(m->file)); 119 119 struct resource *r = v, *p; 120 120 unsigned long long start, end; 121 121 int width = root->end < 0x10000 ? 4 : 8;
+24 -2
kernel/stackleak.c
··· 16 16 #ifdef CONFIG_STACKLEAK_RUNTIME_DISABLE 17 17 #include <linux/jump_label.h> 18 18 #include <linux/sysctl.h> 19 + #include <linux/init.h> 19 20 20 21 static DEFINE_STATIC_KEY_FALSE(stack_erasing_bypass); 21 22 22 - int stack_erasing_sysctl(struct ctl_table *table, int write, 23 - void *buffer, size_t *lenp, loff_t *ppos) 23 + #ifdef CONFIG_SYSCTL 24 + static int stack_erasing_sysctl(struct ctl_table *table, int write, 25 + void __user *buffer, size_t *lenp, loff_t *ppos) 24 26 { 25 27 int ret = 0; 26 28 int state = !static_branch_unlikely(&stack_erasing_bypass); ··· 44 42 state ? "enabled" : "disabled"); 45 43 return ret; 46 44 } 45 + static struct ctl_table stackleak_sysctls[] = { 46 + { 47 + .procname = "stack_erasing", 48 + .data = NULL, 49 + .maxlen = sizeof(int), 50 + .mode = 0600, 51 + .proc_handler = stack_erasing_sysctl, 52 + .extra1 = SYSCTL_ZERO, 53 + .extra2 = SYSCTL_ONE, 54 + }, 55 + {} 56 + }; 57 + 58 + static int __init stackleak_sysctls_init(void) 59 + { 60 + register_sysctl_init("kernel", stackleak_sysctls); 61 + return 0; 62 + } 63 + late_initcall(stackleak_sysctls_init); 64 + #endif /* CONFIG_SYSCTL */ 47 65 48 66 #define skip_erasing() static_branch_unlikely(&stack_erasing_bypass) 49 67 #else
+48 -674
kernel/sysctl.c
··· 20 20 */ 21 21 22 22 #include <linux/module.h> 23 - #include <linux/aio.h> 24 23 #include <linux/mm.h> 25 24 #include <linux/swap.h> 26 25 #include <linux/slab.h> ··· 49 50 #include <linux/times.h> 50 51 #include <linux/limits.h> 51 52 #include <linux/dcache.h> 52 - #include <linux/dnotify.h> 53 53 #include <linux/syscalls.h> 54 54 #include <linux/vmstat.h> 55 55 #include <linux/nfs_fs.h> ··· 56 58 #include <linux/reboot.h> 57 59 #include <linux/ftrace.h> 58 60 #include <linux/perf_event.h> 59 - #include <linux/kprobes.h> 60 - #include <linux/pipe_fs_i.h> 61 61 #include <linux/oom.h> 62 62 #include <linux/kmod.h> 63 63 #include <linux/capability.h> 64 64 #include <linux/binfmts.h> 65 65 #include <linux/sched/sysctl.h> 66 - #include <linux/sched/coredump.h> 67 66 #include <linux/kexec.h> 68 67 #include <linux/bpf.h> 69 68 #include <linux/mount.h> 70 69 #include <linux/userfaultfd_k.h> 71 - #include <linux/coredump.h> 72 70 #include <linux/latencytop.h> 73 71 #include <linux/pid.h> 74 72 #include <linux/delayacct.h> ··· 91 97 #if defined(CONFIG_PROVE_LOCKING) || defined(CONFIG_LOCK_STAT) 92 98 #include <linux/lockdep.h> 93 99 #endif 94 - #ifdef CONFIG_CHR_DEV_SG 95 - #include <scsi/sg.h> 96 - #endif 97 - #ifdef CONFIG_STACKLEAK_RUNTIME_DISABLE 98 - #include <linux/stackleak.h> 99 - #endif 100 - #ifdef CONFIG_LOCKUP_DETECTOR 101 - #include <linux/nmi.h> 102 - #endif 103 100 104 101 #if defined(CONFIG_SYSCTL) 105 102 106 103 /* Constants used for minimum and maximum */ 107 - #ifdef CONFIG_LOCKUP_DETECTOR 108 - static int sixty = 60; 109 - #endif 110 104 111 - static int __maybe_unused neg_one = -1; 112 - static int __maybe_unused two = 2; 113 - static int __maybe_unused four = 4; 114 - static unsigned long zero_ul; 115 - static unsigned long one_ul = 1; 116 - static unsigned long long_max = LONG_MAX; 117 - static int one_hundred = 100; 118 - static int two_hundred = 200; 119 - static int one_thousand = 1000; 120 - static int three_thousand = 3000; 121 - #ifdef CONFIG_PRINTK 122 - static int ten_thousand = 10000; 123 - #endif 124 105 #ifdef CONFIG_PERF_EVENTS 125 - static int six_hundred_forty_kb = 640 * 1024; 106 + static const int six_hundred_forty_kb = 640 * 1024; 126 107 #endif 127 108 128 109 /* this is needed for the proc_doulongvec_minmax of vm_dirty_bytes */ 129 - static unsigned long dirty_bytes_min = 2 * PAGE_SIZE; 110 + static const unsigned long dirty_bytes_min = 2 * PAGE_SIZE; 130 111 131 - /* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */ 132 - static int maxolduid = 65535; 133 - static int minolduid; 134 - 135 - static int ngroups_max = NGROUPS_MAX; 112 + static const int ngroups_max = NGROUPS_MAX; 136 113 static const int cap_last_cap = CAP_LAST_CAP; 137 - 138 - /* 139 - * This is needed for proc_doulongvec_minmax of sysctl_hung_task_timeout_secs 140 - * and hung_task_check_interval_secs 141 - */ 142 - #ifdef CONFIG_DETECT_HUNG_TASK 143 - static unsigned long hung_task_timeout_max = (LONG_MAX/HZ); 144 - #endif 145 - 146 - #ifdef CONFIG_INOTIFY_USER 147 - #include <linux/inotify.h> 148 - #endif 149 - #ifdef CONFIG_FANOTIFY 150 - #include <linux/fanotify.h> 151 - #endif 152 114 153 115 #ifdef CONFIG_PROC_SYSCTL 154 116 ··· 142 192 #endif 143 193 144 194 #ifdef CONFIG_COMPACTION 145 - static int min_extfrag_threshold; 146 - static int max_extfrag_threshold = 1000; 195 + /* min_extfrag_threshold is SYSCTL_ZERO */; 196 + static const int max_extfrag_threshold = 1000; 147 197 #endif 148 198 149 199 #endif /* CONFIG_SYSCTL */ ··· 754 804 return 
do_proc_douintvec_r(i, buffer, lenp, ppos, conv, data); 755 805 } 756 806 757 - static int do_proc_douintvec(struct ctl_table *table, int write, 758 - void *buffer, size_t *lenp, loff_t *ppos, 759 - int (*conv)(unsigned long *lvalp, 760 - unsigned int *valp, 761 - int write, void *data), 762 - void *data) 807 + int do_proc_douintvec(struct ctl_table *table, int write, 808 + void *buffer, size_t *lenp, loff_t *ppos, 809 + int (*conv)(unsigned long *lvalp, 810 + unsigned int *valp, 811 + int write, void *data), 812 + void *data) 763 813 { 764 814 return __do_proc_douintvec(table->data, table, write, 765 815 buffer, lenp, ppos, conv, data); ··· 887 937 888 938 return err; 889 939 } 890 - 891 - #ifdef CONFIG_PRINTK 892 - static int proc_dointvec_minmax_sysadmin(struct ctl_table *table, int write, 893 - void *buffer, size_t *lenp, loff_t *ppos) 894 - { 895 - if (write && !capable(CAP_SYS_ADMIN)) 896 - return -EPERM; 897 - 898 - return proc_dointvec_minmax(table, write, buffer, lenp, ppos); 899 - } 900 - #endif 901 940 902 941 /** 903 942 * struct do_proc_dointvec_minmax_conv_param - proc_dointvec_minmax() range checking structure ··· 1083 1144 } 1084 1145 EXPORT_SYMBOL_GPL(proc_dou8vec_minmax); 1085 1146 1086 - static int do_proc_dopipe_max_size_conv(unsigned long *lvalp, 1087 - unsigned int *valp, 1088 - int write, void *data) 1089 - { 1090 - if (write) { 1091 - unsigned int val; 1092 - 1093 - val = round_pipe_size(*lvalp); 1094 - if (val == 0) 1095 - return -EINVAL; 1096 - 1097 - *valp = val; 1098 - } else { 1099 - unsigned int val = *valp; 1100 - *lvalp = (unsigned long) val; 1101 - } 1102 - 1103 - return 0; 1104 - } 1105 - 1106 - static int proc_dopipe_max_size(struct ctl_table *table, int write, 1107 - void *buffer, size_t *lenp, loff_t *ppos) 1108 - { 1109 - return do_proc_douintvec(table, write, buffer, lenp, ppos, 1110 - do_proc_dopipe_max_size_conv, NULL); 1111 - } 1112 - 1113 - static void validate_coredump_safety(void) 1114 - { 1115 - #ifdef CONFIG_COREDUMP 1116 - if (suid_dumpable == SUID_DUMP_ROOT && 1117 - core_pattern[0] != '/' && core_pattern[0] != '|') { 1118 - printk(KERN_WARNING 1119 - "Unsafe core_pattern used with fs.suid_dumpable=2.\n" 1120 - "Pipe handler or fully qualified core dump path required.\n" 1121 - "Set kernel.core_pattern before fs.suid_dumpable.\n" 1122 - ); 1123 - } 1124 - #endif 1125 - } 1126 - 1127 - static int proc_dointvec_minmax_coredump(struct ctl_table *table, int write, 1128 - void *buffer, size_t *lenp, loff_t *ppos) 1129 - { 1130 - int error = proc_dointvec_minmax(table, write, buffer, lenp, ppos); 1131 - if (!error) 1132 - validate_coredump_safety(); 1133 - return error; 1134 - } 1135 - 1136 - #ifdef CONFIG_COREDUMP 1137 - static int proc_dostring_coredump(struct ctl_table *table, int write, 1138 - void *buffer, size_t *lenp, loff_t *ppos) 1139 - { 1140 - int error = proc_dostring(table, write, buffer, lenp, ppos); 1141 - if (!error) 1142 - validate_coredump_safety(); 1143 - return error; 1144 - } 1145 - #endif 1146 - 1147 1147 #ifdef CONFIG_MAGIC_SYSRQ 1148 1148 static int sysrq_sysctl_handler(struct ctl_table *table, int write, 1149 1149 void *buffer, size_t *lenp, loff_t *ppos) ··· 1145 1267 err = proc_get_long(&p, &left, &val, &neg, 1146 1268 proc_wspace_sep, 1147 1269 sizeof(proc_wspace_sep), NULL); 1148 - if (err) 1270 + if (err || neg) { 1271 + err = -EINVAL; 1149 1272 break; 1150 - if (neg) 1151 - continue; 1273 + } 1274 + 1152 1275 val = convmul * val / convdiv; 1153 1276 if ((min && val < *min) || (max && val > *max)) { 1154 1277 err = 
-EINVAL; ··· 1807 1928 .mode = 0644, 1808 1929 .proc_handler = proc_dointvec, 1809 1930 }, 1810 - #ifdef CONFIG_COREDUMP 1811 - { 1812 - .procname = "core_uses_pid", 1813 - .data = &core_uses_pid, 1814 - .maxlen = sizeof(int), 1815 - .mode = 0644, 1816 - .proc_handler = proc_dointvec, 1817 - }, 1818 - { 1819 - .procname = "core_pattern", 1820 - .data = core_pattern, 1821 - .maxlen = CORENAME_MAX_SIZE, 1822 - .mode = 0644, 1823 - .proc_handler = proc_dostring_coredump, 1824 - }, 1825 - { 1826 - .procname = "core_pipe_limit", 1827 - .data = &core_pipe_limit, 1828 - .maxlen = sizeof(unsigned int), 1829 - .mode = 0644, 1830 - .proc_handler = proc_dointvec, 1831 - }, 1832 - #endif 1833 1931 #ifdef CONFIG_PROC_SYSCTL 1834 1932 { 1835 1933 .procname = "tainted", ··· 1820 1964 .maxlen = sizeof(int), 1821 1965 .mode = 0644, 1822 1966 .proc_handler = proc_dointvec_minmax, 1823 - .extra1 = &neg_one, 1967 + .extra1 = SYSCTL_NEG_ONE, 1824 1968 .extra2 = SYSCTL_ONE, 1825 1969 }, 1826 1970 #endif ··· 1987 2131 .proc_handler = proc_dostring, 1988 2132 }, 1989 2133 #endif 1990 - #ifdef CONFIG_CHR_DEV_SG 1991 - { 1992 - .procname = "sg-big-buff", 1993 - .data = &sg_big_buff, 1994 - .maxlen = sizeof (int), 1995 - .mode = 0444, 1996 - .proc_handler = proc_dointvec, 1997 - }, 1998 - #endif 1999 2134 #ifdef CONFIG_BSD_PROCESS_ACCT 2000 2135 { 2001 2136 .procname = "acct", ··· 2022 2175 .proc_handler = sysctl_max_threads, 2023 2176 }, 2024 2177 { 2025 - .procname = "random", 2026 - .mode = 0555, 2027 - .child = random_table, 2028 - }, 2029 - { 2030 2178 .procname = "usermodehelper", 2031 2179 .mode = 0555, 2032 2180 .child = usermodehelper_table, 2033 2181 }, 2034 - #ifdef CONFIG_FW_LOADER_USER_HELPER 2035 - { 2036 - .procname = "firmware_config", 2037 - .mode = 0555, 2038 - .child = firmware_config_table, 2039 - }, 2040 - #endif 2041 2182 { 2042 2183 .procname = "overflowuid", 2043 2184 .data = &overflowuid, 2044 2185 .maxlen = sizeof(int), 2045 2186 .mode = 0644, 2046 2187 .proc_handler = proc_dointvec_minmax, 2047 - .extra1 = &minolduid, 2048 - .extra2 = &maxolduid, 2188 + .extra1 = SYSCTL_ZERO, 2189 + .extra2 = SYSCTL_MAXOLDUID, 2049 2190 }, 2050 2191 { 2051 2192 .procname = "overflowgid", ··· 2041 2206 .maxlen = sizeof(int), 2042 2207 .mode = 0644, 2043 2208 .proc_handler = proc_dointvec_minmax, 2044 - .extra1 = &minolduid, 2045 - .extra2 = &maxolduid, 2209 + .extra1 = SYSCTL_ZERO, 2210 + .extra2 = SYSCTL_MAXOLDUID, 2046 2211 }, 2047 2212 #ifdef CONFIG_S390 2048 2213 { ··· 2087 2252 .mode = 0644, 2088 2253 .proc_handler = proc_doulongvec_minmax, 2089 2254 }, 2090 - #if defined CONFIG_PRINTK 2091 - { 2092 - .procname = "printk", 2093 - .data = &console_loglevel, 2094 - .maxlen = 4*sizeof(int), 2095 - .mode = 0644, 2096 - .proc_handler = proc_dointvec, 2097 - }, 2098 - { 2099 - .procname = "printk_ratelimit", 2100 - .data = &printk_ratelimit_state.interval, 2101 - .maxlen = sizeof(int), 2102 - .mode = 0644, 2103 - .proc_handler = proc_dointvec_jiffies, 2104 - }, 2105 - { 2106 - .procname = "printk_ratelimit_burst", 2107 - .data = &printk_ratelimit_state.burst, 2108 - .maxlen = sizeof(int), 2109 - .mode = 0644, 2110 - .proc_handler = proc_dointvec, 2111 - }, 2112 - { 2113 - .procname = "printk_delay", 2114 - .data = &printk_delay_msec, 2115 - .maxlen = sizeof(int), 2116 - .mode = 0644, 2117 - .proc_handler = proc_dointvec_minmax, 2118 - .extra1 = SYSCTL_ZERO, 2119 - .extra2 = &ten_thousand, 2120 - }, 2121 - { 2122 - .procname = "printk_devkmsg", 2123 - .data = devkmsg_log_str, 2124 - .maxlen = 
DEVKMSG_STR_MAX_SIZE, 2125 - .mode = 0644, 2126 - .proc_handler = devkmsg_sysctl_set_loglvl, 2127 - }, 2128 - { 2129 - .procname = "dmesg_restrict", 2130 - .data = &dmesg_restrict, 2131 - .maxlen = sizeof(int), 2132 - .mode = 0644, 2133 - .proc_handler = proc_dointvec_minmax_sysadmin, 2134 - .extra1 = SYSCTL_ZERO, 2135 - .extra2 = SYSCTL_ONE, 2136 - }, 2137 - { 2138 - .procname = "kptr_restrict", 2139 - .data = &kptr_restrict, 2140 - .maxlen = sizeof(int), 2141 - .mode = 0644, 2142 - .proc_handler = proc_dointvec_minmax_sysadmin, 2143 - .extra1 = SYSCTL_ZERO, 2144 - .extra2 = &two, 2145 - }, 2146 - #endif 2147 2255 { 2148 2256 .procname = "ngroups_max", 2149 - .data = &ngroups_max, 2257 + .data = (void *)&ngroups_max, 2150 2258 .maxlen = sizeof (int), 2151 2259 .mode = 0444, 2152 2260 .proc_handler = proc_dointvec, ··· 2101 2323 .mode = 0444, 2102 2324 .proc_handler = proc_dointvec, 2103 2325 }, 2104 - #if defined(CONFIG_LOCKUP_DETECTOR) 2105 - { 2106 - .procname = "watchdog", 2107 - .data = &watchdog_user_enabled, 2108 - .maxlen = sizeof(int), 2109 - .mode = 0644, 2110 - .proc_handler = proc_watchdog, 2111 - .extra1 = SYSCTL_ZERO, 2112 - .extra2 = SYSCTL_ONE, 2113 - }, 2114 - { 2115 - .procname = "watchdog_thresh", 2116 - .data = &watchdog_thresh, 2117 - .maxlen = sizeof(int), 2118 - .mode = 0644, 2119 - .proc_handler = proc_watchdog_thresh, 2120 - .extra1 = SYSCTL_ZERO, 2121 - .extra2 = &sixty, 2122 - }, 2123 - { 2124 - .procname = "nmi_watchdog", 2125 - .data = &nmi_watchdog_user_enabled, 2126 - .maxlen = sizeof(int), 2127 - .mode = NMI_WATCHDOG_SYSCTL_PERM, 2128 - .proc_handler = proc_nmi_watchdog, 2129 - .extra1 = SYSCTL_ZERO, 2130 - .extra2 = SYSCTL_ONE, 2131 - }, 2132 - { 2133 - .procname = "watchdog_cpumask", 2134 - .data = &watchdog_cpumask_bits, 2135 - .maxlen = NR_CPUS, 2136 - .mode = 0644, 2137 - .proc_handler = proc_watchdog_cpumask, 2138 - }, 2139 - #ifdef CONFIG_SOFTLOCKUP_DETECTOR 2140 - { 2141 - .procname = "soft_watchdog", 2142 - .data = &soft_watchdog_user_enabled, 2143 - .maxlen = sizeof(int), 2144 - .mode = 0644, 2145 - .proc_handler = proc_soft_watchdog, 2146 - .extra1 = SYSCTL_ZERO, 2147 - .extra2 = SYSCTL_ONE, 2148 - }, 2149 - { 2150 - .procname = "softlockup_panic", 2151 - .data = &softlockup_panic, 2152 - .maxlen = sizeof(int), 2153 - .mode = 0644, 2154 - .proc_handler = proc_dointvec_minmax, 2155 - .extra1 = SYSCTL_ZERO, 2156 - .extra2 = SYSCTL_ONE, 2157 - }, 2158 - #ifdef CONFIG_SMP 2159 - { 2160 - .procname = "softlockup_all_cpu_backtrace", 2161 - .data = &sysctl_softlockup_all_cpu_backtrace, 2162 - .maxlen = sizeof(int), 2163 - .mode = 0644, 2164 - .proc_handler = proc_dointvec_minmax, 2165 - .extra1 = SYSCTL_ZERO, 2166 - .extra2 = SYSCTL_ONE, 2167 - }, 2168 - #endif /* CONFIG_SMP */ 2169 - #endif 2170 - #ifdef CONFIG_HARDLOCKUP_DETECTOR 2171 - { 2172 - .procname = "hardlockup_panic", 2173 - .data = &hardlockup_panic, 2174 - .maxlen = sizeof(int), 2175 - .mode = 0644, 2176 - .proc_handler = proc_dointvec_minmax, 2177 - .extra1 = SYSCTL_ZERO, 2178 - .extra2 = SYSCTL_ONE, 2179 - }, 2180 - #ifdef CONFIG_SMP 2181 - { 2182 - .procname = "hardlockup_all_cpu_backtrace", 2183 - .data = &sysctl_hardlockup_all_cpu_backtrace, 2184 - .maxlen = sizeof(int), 2185 - .mode = 0644, 2186 - .proc_handler = proc_dointvec_minmax, 2187 - .extra1 = SYSCTL_ZERO, 2188 - .extra2 = SYSCTL_ONE, 2189 - }, 2190 - #endif /* CONFIG_SMP */ 2191 - #endif 2192 - #endif 2193 - 2194 2326 #if defined(CONFIG_X86_LOCAL_APIC) && defined(CONFIG_X86) 2195 2327 { 2196 2328 .procname = 
"unknown_nmi_panic", ··· 2203 2515 .proc_handler = proc_dointvec, 2204 2516 }, 2205 2517 #endif 2206 - #ifdef CONFIG_DETECT_HUNG_TASK 2207 - #ifdef CONFIG_SMP 2208 - { 2209 - .procname = "hung_task_all_cpu_backtrace", 2210 - .data = &sysctl_hung_task_all_cpu_backtrace, 2211 - .maxlen = sizeof(int), 2212 - .mode = 0644, 2213 - .proc_handler = proc_dointvec_minmax, 2214 - .extra1 = SYSCTL_ZERO, 2215 - .extra2 = SYSCTL_ONE, 2216 - }, 2217 - #endif /* CONFIG_SMP */ 2218 - { 2219 - .procname = "hung_task_panic", 2220 - .data = &sysctl_hung_task_panic, 2221 - .maxlen = sizeof(int), 2222 - .mode = 0644, 2223 - .proc_handler = proc_dointvec_minmax, 2224 - .extra1 = SYSCTL_ZERO, 2225 - .extra2 = SYSCTL_ONE, 2226 - }, 2227 - { 2228 - .procname = "hung_task_check_count", 2229 - .data = &sysctl_hung_task_check_count, 2230 - .maxlen = sizeof(int), 2231 - .mode = 0644, 2232 - .proc_handler = proc_dointvec_minmax, 2233 - .extra1 = SYSCTL_ZERO, 2234 - }, 2235 - { 2236 - .procname = "hung_task_timeout_secs", 2237 - .data = &sysctl_hung_task_timeout_secs, 2238 - .maxlen = sizeof(unsigned long), 2239 - .mode = 0644, 2240 - .proc_handler = proc_dohung_task_timeout_secs, 2241 - .extra2 = &hung_task_timeout_max, 2242 - }, 2243 - { 2244 - .procname = "hung_task_check_interval_secs", 2245 - .data = &sysctl_hung_task_check_interval_secs, 2246 - .maxlen = sizeof(unsigned long), 2247 - .mode = 0644, 2248 - .proc_handler = proc_dohung_task_timeout_secs, 2249 - .extra2 = &hung_task_timeout_max, 2250 - }, 2251 - { 2252 - .procname = "hung_task_warnings", 2253 - .data = &sysctl_hung_task_warnings, 2254 - .maxlen = sizeof(int), 2255 - .mode = 0644, 2256 - .proc_handler = proc_dointvec_minmax, 2257 - .extra1 = &neg_one, 2258 - }, 2259 - #endif 2260 2518 #ifdef CONFIG_RT_MUTEXES 2261 2519 { 2262 2520 .procname = "max_lock_depth", ··· 2262 2628 .mode = 0644, 2263 2629 .proc_handler = perf_cpu_time_max_percent_handler, 2264 2630 .extra1 = SYSCTL_ZERO, 2265 - .extra2 = &one_hundred, 2631 + .extra2 = SYSCTL_ONE_HUNDRED, 2266 2632 }, 2267 2633 { 2268 2634 .procname = "perf_event_max_stack", ··· 2271 2637 .mode = 0644, 2272 2638 .proc_handler = perf_event_max_stack_handler, 2273 2639 .extra1 = SYSCTL_ZERO, 2274 - .extra2 = &six_hundred_forty_kb, 2640 + .extra2 = (void *)&six_hundred_forty_kb, 2275 2641 }, 2276 2642 { 2277 2643 .procname = "perf_event_max_contexts_per_stack", ··· 2280 2646 .mode = 0644, 2281 2647 .proc_handler = perf_event_max_stack_handler, 2282 2648 .extra1 = SYSCTL_ZERO, 2283 - .extra2 = &one_thousand, 2649 + .extra2 = SYSCTL_ONE_THOUSAND, 2284 2650 }, 2285 2651 #endif 2286 2652 { ··· 2311 2677 .mode = 0644, 2312 2678 .proc_handler = bpf_unpriv_handler, 2313 2679 .extra1 = SYSCTL_ZERO, 2314 - .extra2 = &two, 2680 + .extra2 = SYSCTL_TWO, 2315 2681 }, 2316 2682 { 2317 2683 .procname = "bpf_stats_enabled", ··· 2343 2709 .extra2 = SYSCTL_INT_MAX, 2344 2710 }, 2345 2711 #endif 2346 - #ifdef CONFIG_STACKLEAK_RUNTIME_DISABLE 2347 - { 2348 - .procname = "stack_erasing", 2349 - .data = NULL, 2350 - .maxlen = sizeof(int), 2351 - .mode = 0600, 2352 - .proc_handler = stack_erasing_sysctl, 2353 - .extra1 = SYSCTL_ZERO, 2354 - .extra2 = SYSCTL_ONE, 2355 - }, 2356 - #endif 2357 2712 { } 2358 2713 }; 2359 2714 ··· 2354 2731 .mode = 0644, 2355 2732 .proc_handler = overcommit_policy_handler, 2356 2733 .extra1 = SYSCTL_ZERO, 2357 - .extra2 = &two, 2734 + .extra2 = SYSCTL_TWO, 2358 2735 }, 2359 2736 { 2360 2737 .procname = "panic_on_oom", ··· 2363 2740 .mode = 0644, 2364 2741 .proc_handler = proc_dointvec_minmax, 2365 2742 
.extra1 = SYSCTL_ZERO, 2366 - .extra2 = &two, 2743 + .extra2 = SYSCTL_TWO, 2367 2744 }, 2368 2745 { 2369 2746 .procname = "oom_kill_allocating_task", ··· 2408 2785 .mode = 0644, 2409 2786 .proc_handler = dirty_background_ratio_handler, 2410 2787 .extra1 = SYSCTL_ZERO, 2411 - .extra2 = &one_hundred, 2788 + .extra2 = SYSCTL_ONE_HUNDRED, 2412 2789 }, 2413 2790 { 2414 2791 .procname = "dirty_background_bytes", ··· 2416 2793 .maxlen = sizeof(dirty_background_bytes), 2417 2794 .mode = 0644, 2418 2795 .proc_handler = dirty_background_bytes_handler, 2419 - .extra1 = &one_ul, 2796 + .extra1 = SYSCTL_LONG_ONE, 2420 2797 }, 2421 2798 { 2422 2799 .procname = "dirty_ratio", ··· 2425 2802 .mode = 0644, 2426 2803 .proc_handler = dirty_ratio_handler, 2427 2804 .extra1 = SYSCTL_ZERO, 2428 - .extra2 = &one_hundred, 2805 + .extra2 = SYSCTL_ONE_HUNDRED, 2429 2806 }, 2430 2807 { 2431 2808 .procname = "dirty_bytes", ··· 2433 2810 .maxlen = sizeof(vm_dirty_bytes), 2434 2811 .mode = 0644, 2435 2812 .proc_handler = dirty_bytes_handler, 2436 - .extra1 = &dirty_bytes_min, 2813 + .extra1 = (void *)&dirty_bytes_min, 2437 2814 }, 2438 2815 { 2439 2816 .procname = "dirty_writeback_centisecs", ··· 2465 2842 .mode = 0644, 2466 2843 .proc_handler = proc_dointvec_minmax, 2467 2844 .extra1 = SYSCTL_ZERO, 2468 - .extra2 = &two_hundred, 2845 + .extra2 = SYSCTL_TWO_HUNDRED, 2469 2846 }, 2470 2847 #ifdef CONFIG_HUGETLB_PAGE 2471 2848 { ··· 2522 2899 .mode = 0200, 2523 2900 .proc_handler = drop_caches_sysctl_handler, 2524 2901 .extra1 = SYSCTL_ONE, 2525 - .extra2 = &four, 2902 + .extra2 = SYSCTL_FOUR, 2526 2903 }, 2527 2904 #ifdef CONFIG_COMPACTION 2528 2905 { ··· 2539 2916 .mode = 0644, 2540 2917 .proc_handler = compaction_proactiveness_sysctl_handler, 2541 2918 .extra1 = SYSCTL_ZERO, 2542 - .extra2 = &one_hundred, 2919 + .extra2 = SYSCTL_ONE_HUNDRED, 2543 2920 }, 2544 2921 { 2545 2922 .procname = "extfrag_threshold", ··· 2547 2924 .maxlen = sizeof(int), 2548 2925 .mode = 0644, 2549 2926 .proc_handler = proc_dointvec_minmax, 2550 - .extra1 = &min_extfrag_threshold, 2551 - .extra2 = &max_extfrag_threshold, 2927 + .extra1 = SYSCTL_ZERO, 2928 + .extra2 = (void *)&max_extfrag_threshold, 2552 2929 }, 2553 2930 { 2554 2931 .procname = "compact_unevictable_allowed", ··· 2584 2961 .mode = 0644, 2585 2962 .proc_handler = watermark_scale_factor_sysctl_handler, 2586 2963 .extra1 = SYSCTL_ONE, 2587 - .extra2 = &three_thousand, 2964 + .extra2 = SYSCTL_THREE_THOUSAND, 2588 2965 }, 2589 2966 { 2590 2967 .procname = "percpu_pagelist_high_fraction", ··· 2663 3040 .mode = 0644, 2664 3041 .proc_handler = sysctl_min_unmapped_ratio_sysctl_handler, 2665 3042 .extra1 = SYSCTL_ZERO, 2666 - .extra2 = &one_hundred, 3043 + .extra2 = SYSCTL_ONE_HUNDRED, 2667 3044 }, 2668 3045 { 2669 3046 .procname = "min_slab_ratio", ··· 2672 3049 .mode = 0644, 2673 3050 .proc_handler = sysctl_min_slab_ratio_sysctl_handler, 2674 3051 .extra1 = SYSCTL_ZERO, 2675 - .extra2 = &one_hundred, 3052 + .extra2 = SYSCTL_ONE_HUNDRED, 2676 3053 }, 2677 3054 #endif 2678 3055 #ifdef CONFIG_SMP ··· 2806 3183 { } 2807 3184 }; 2808 3185 2809 - static struct ctl_table fs_table[] = { 2810 - { 2811 - .procname = "inode-nr", 2812 - .data = &inodes_stat, 2813 - .maxlen = 2*sizeof(long), 2814 - .mode = 0444, 2815 - .proc_handler = proc_nr_inodes, 2816 - }, 2817 - { 2818 - .procname = "inode-state", 2819 - .data = &inodes_stat, 2820 - .maxlen = 7*sizeof(long), 2821 - .mode = 0444, 2822 - .proc_handler = proc_nr_inodes, 2823 - }, 2824 - { 2825 - .procname = "file-nr", 2826 - .data = &files_stat, 
2827 - .maxlen = sizeof(files_stat), 2828 - .mode = 0444, 2829 - .proc_handler = proc_nr_files, 2830 - }, 2831 - { 2832 - .procname = "file-max", 2833 - .data = &files_stat.max_files, 2834 - .maxlen = sizeof(files_stat.max_files), 2835 - .mode = 0644, 2836 - .proc_handler = proc_doulongvec_minmax, 2837 - .extra1 = &zero_ul, 2838 - .extra2 = &long_max, 2839 - }, 2840 - { 2841 - .procname = "nr_open", 2842 - .data = &sysctl_nr_open, 2843 - .maxlen = sizeof(unsigned int), 2844 - .mode = 0644, 2845 - .proc_handler = proc_dointvec_minmax, 2846 - .extra1 = &sysctl_nr_open_min, 2847 - .extra2 = &sysctl_nr_open_max, 2848 - }, 2849 - { 2850 - .procname = "dentry-state", 2851 - .data = &dentry_stat, 2852 - .maxlen = 6*sizeof(long), 2853 - .mode = 0444, 2854 - .proc_handler = proc_nr_dentry, 2855 - }, 2856 - { 2857 - .procname = "overflowuid", 2858 - .data = &fs_overflowuid, 2859 - .maxlen = sizeof(int), 2860 - .mode = 0644, 2861 - .proc_handler = proc_dointvec_minmax, 2862 - .extra1 = &minolduid, 2863 - .extra2 = &maxolduid, 2864 - }, 2865 - { 2866 - .procname = "overflowgid", 2867 - .data = &fs_overflowgid, 2868 - .maxlen = sizeof(int), 2869 - .mode = 0644, 2870 - .proc_handler = proc_dointvec_minmax, 2871 - .extra1 = &minolduid, 2872 - .extra2 = &maxolduid, 2873 - }, 2874 - #ifdef CONFIG_FILE_LOCKING 2875 - { 2876 - .procname = "leases-enable", 2877 - .data = &leases_enable, 2878 - .maxlen = sizeof(int), 2879 - .mode = 0644, 2880 - .proc_handler = proc_dointvec, 2881 - }, 2882 - #endif 2883 - #ifdef CONFIG_DNOTIFY 2884 - { 2885 - .procname = "dir-notify-enable", 2886 - .data = &dir_notify_enable, 2887 - .maxlen = sizeof(int), 2888 - .mode = 0644, 2889 - .proc_handler = proc_dointvec, 2890 - }, 2891 - #endif 2892 - #ifdef CONFIG_MMU 2893 - #ifdef CONFIG_FILE_LOCKING 2894 - { 2895 - .procname = "lease-break-time", 2896 - .data = &lease_break_time, 2897 - .maxlen = sizeof(int), 2898 - .mode = 0644, 2899 - .proc_handler = proc_dointvec, 2900 - }, 2901 - #endif 2902 - #ifdef CONFIG_AIO 2903 - { 2904 - .procname = "aio-nr", 2905 - .data = &aio_nr, 2906 - .maxlen = sizeof(aio_nr), 2907 - .mode = 0444, 2908 - .proc_handler = proc_doulongvec_minmax, 2909 - }, 2910 - { 2911 - .procname = "aio-max-nr", 2912 - .data = &aio_max_nr, 2913 - .maxlen = sizeof(aio_max_nr), 2914 - .mode = 0644, 2915 - .proc_handler = proc_doulongvec_minmax, 2916 - }, 2917 - #endif /* CONFIG_AIO */ 2918 - #ifdef CONFIG_INOTIFY_USER 2919 - { 2920 - .procname = "inotify", 2921 - .mode = 0555, 2922 - .child = inotify_table, 2923 - }, 2924 - #endif 2925 - #ifdef CONFIG_FANOTIFY 2926 - { 2927 - .procname = "fanotify", 2928 - .mode = 0555, 2929 - .child = fanotify_table, 2930 - }, 2931 - #endif 2932 - #ifdef CONFIG_EPOLL 2933 - { 2934 - .procname = "epoll", 2935 - .mode = 0555, 2936 - .child = epoll_table, 2937 - }, 2938 - #endif 2939 - #endif 2940 - { 2941 - .procname = "protected_symlinks", 2942 - .data = &sysctl_protected_symlinks, 2943 - .maxlen = sizeof(int), 2944 - .mode = 0600, 2945 - .proc_handler = proc_dointvec_minmax, 2946 - .extra1 = SYSCTL_ZERO, 2947 - .extra2 = SYSCTL_ONE, 2948 - }, 2949 - { 2950 - .procname = "protected_hardlinks", 2951 - .data = &sysctl_protected_hardlinks, 2952 - .maxlen = sizeof(int), 2953 - .mode = 0600, 2954 - .proc_handler = proc_dointvec_minmax, 2955 - .extra1 = SYSCTL_ZERO, 2956 - .extra2 = SYSCTL_ONE, 2957 - }, 2958 - { 2959 - .procname = "protected_fifos", 2960 - .data = &sysctl_protected_fifos, 2961 - .maxlen = sizeof(int), 2962 - .mode = 0600, 2963 - .proc_handler = proc_dointvec_minmax, 2964 - 
.extra1 = SYSCTL_ZERO, 2965 - .extra2 = &two, 2966 - }, 2967 - { 2968 - .procname = "protected_regular", 2969 - .data = &sysctl_protected_regular, 2970 - .maxlen = sizeof(int), 2971 - .mode = 0600, 2972 - .proc_handler = proc_dointvec_minmax, 2973 - .extra1 = SYSCTL_ZERO, 2974 - .extra2 = &two, 2975 - }, 2976 - { 2977 - .procname = "suid_dumpable", 2978 - .data = &suid_dumpable, 2979 - .maxlen = sizeof(int), 2980 - .mode = 0644, 2981 - .proc_handler = proc_dointvec_minmax_coredump, 2982 - .extra1 = SYSCTL_ZERO, 2983 - .extra2 = &two, 2984 - }, 2985 - #if defined(CONFIG_BINFMT_MISC) || defined(CONFIG_BINFMT_MISC_MODULE) 2986 - { 2987 - .procname = "binfmt_misc", 2988 - .mode = 0555, 2989 - .child = sysctl_mount_point, 2990 - }, 2991 - #endif 2992 - { 2993 - .procname = "pipe-max-size", 2994 - .data = &pipe_max_size, 2995 - .maxlen = sizeof(pipe_max_size), 2996 - .mode = 0644, 2997 - .proc_handler = proc_dopipe_max_size, 2998 - }, 2999 - { 3000 - .procname = "pipe-user-pages-hard", 3001 - .data = &pipe_user_pages_hard, 3002 - .maxlen = sizeof(pipe_user_pages_hard), 3003 - .mode = 0644, 3004 - .proc_handler = proc_doulongvec_minmax, 3005 - }, 3006 - { 3007 - .procname = "pipe-user-pages-soft", 3008 - .data = &pipe_user_pages_soft, 3009 - .maxlen = sizeof(pipe_user_pages_soft), 3010 - .mode = 0644, 3011 - .proc_handler = proc_doulongvec_minmax, 3012 - }, 3013 - { 3014 - .procname = "mount-max", 3015 - .data = &sysctl_mount_max, 3016 - .maxlen = sizeof(unsigned int), 3017 - .mode = 0644, 3018 - .proc_handler = proc_dointvec_minmax, 3019 - .extra1 = SYSCTL_ONE, 3020 - }, 3021 - { } 3022 - }; 3023 - 3024 3186 static struct ctl_table debug_table[] = { 3025 3187 #ifdef CONFIG_SYSCTL_EXCEPTION_TRACE 3026 3188 { ··· 2816 3408 .proc_handler = proc_dointvec 2817 3409 }, 2818 3410 #endif 2819 - #if defined(CONFIG_OPTPROBES) 2820 - { 2821 - .procname = "kprobes-optimization", 2822 - .data = &sysctl_kprobes_optimization, 2823 - .maxlen = sizeof(int), 2824 - .mode = 0644, 2825 - .proc_handler = proc_kprobes_optimization_handler, 2826 - .extra1 = SYSCTL_ZERO, 2827 - .extra2 = SYSCTL_ONE, 2828 - }, 2829 - #endif 2830 3411 { } 2831 3412 }; 2832 3413 ··· 2823 3426 { } 2824 3427 }; 2825 3428 2826 - static struct ctl_table sysctl_base_table[] = { 2827 - { 2828 - .procname = "kernel", 2829 - .mode = 0555, 2830 - .child = kern_table, 2831 - }, 2832 - { 2833 - .procname = "vm", 2834 - .mode = 0555, 2835 - .child = vm_table, 2836 - }, 2837 - { 2838 - .procname = "fs", 2839 - .mode = 0555, 2840 - .child = fs_table, 2841 - }, 2842 - { 2843 - .procname = "debug", 2844 - .mode = 0555, 2845 - .child = debug_table, 2846 - }, 2847 - { 2848 - .procname = "dev", 2849 - .mode = 0555, 2850 - .child = dev_table, 2851 - }, 2852 - { } 2853 - }; 3429 + DECLARE_SYSCTL_BASE(kernel, kern_table); 3430 + DECLARE_SYSCTL_BASE(vm, vm_table); 3431 + DECLARE_SYSCTL_BASE(debug, debug_table); 3432 + DECLARE_SYSCTL_BASE(dev, dev_table); 2854 3433 2855 - int __init sysctl_init(void) 3434 + int __init sysctl_init_bases(void) 2856 3435 { 2857 - struct ctl_table_header *hdr; 3436 + register_sysctl_base(kernel); 3437 + register_sysctl_base(vm); 3438 + register_sysctl_base(debug); 3439 + register_sysctl_base(dev); 2858 3440 2859 - hdr = register_sysctl_table(sysctl_base_table); 2860 - kmemleak_not_leak(hdr); 2861 3441 return 0; 2862 3442 } 2863 3443 #endif /* CONFIG_SYSCTL */
+101
kernel/watchdog.c
··· 740 740 mutex_unlock(&watchdog_mutex); 741 741 return err; 742 742 } 743 + 744 + static const int sixty = 60; 745 + 746 + static struct ctl_table watchdog_sysctls[] = { 747 + { 748 + .procname = "watchdog", 749 + .data = &watchdog_user_enabled, 750 + .maxlen = sizeof(int), 751 + .mode = 0644, 752 + .proc_handler = proc_watchdog, 753 + .extra1 = SYSCTL_ZERO, 754 + .extra2 = SYSCTL_ONE, 755 + }, 756 + { 757 + .procname = "watchdog_thresh", 758 + .data = &watchdog_thresh, 759 + .maxlen = sizeof(int), 760 + .mode = 0644, 761 + .proc_handler = proc_watchdog_thresh, 762 + .extra1 = SYSCTL_ZERO, 763 + .extra2 = (void *)&sixty, 764 + }, 765 + { 766 + .procname = "nmi_watchdog", 767 + .data = &nmi_watchdog_user_enabled, 768 + .maxlen = sizeof(int), 769 + .mode = NMI_WATCHDOG_SYSCTL_PERM, 770 + .proc_handler = proc_nmi_watchdog, 771 + .extra1 = SYSCTL_ZERO, 772 + .extra2 = SYSCTL_ONE, 773 + }, 774 + { 775 + .procname = "watchdog_cpumask", 776 + .data = &watchdog_cpumask_bits, 777 + .maxlen = NR_CPUS, 778 + .mode = 0644, 779 + .proc_handler = proc_watchdog_cpumask, 780 + }, 781 + #ifdef CONFIG_SOFTLOCKUP_DETECTOR 782 + { 783 + .procname = "soft_watchdog", 784 + .data = &soft_watchdog_user_enabled, 785 + .maxlen = sizeof(int), 786 + .mode = 0644, 787 + .proc_handler = proc_soft_watchdog, 788 + .extra1 = SYSCTL_ZERO, 789 + .extra2 = SYSCTL_ONE, 790 + }, 791 + { 792 + .procname = "softlockup_panic", 793 + .data = &softlockup_panic, 794 + .maxlen = sizeof(int), 795 + .mode = 0644, 796 + .proc_handler = proc_dointvec_minmax, 797 + .extra1 = SYSCTL_ZERO, 798 + .extra2 = SYSCTL_ONE, 799 + }, 800 + #ifdef CONFIG_SMP 801 + { 802 + .procname = "softlockup_all_cpu_backtrace", 803 + .data = &sysctl_softlockup_all_cpu_backtrace, 804 + .maxlen = sizeof(int), 805 + .mode = 0644, 806 + .proc_handler = proc_dointvec_minmax, 807 + .extra1 = SYSCTL_ZERO, 808 + .extra2 = SYSCTL_ONE, 809 + }, 810 + #endif /* CONFIG_SMP */ 811 + #endif 812 + #ifdef CONFIG_HARDLOCKUP_DETECTOR 813 + { 814 + .procname = "hardlockup_panic", 815 + .data = &hardlockup_panic, 816 + .maxlen = sizeof(int), 817 + .mode = 0644, 818 + .proc_handler = proc_dointvec_minmax, 819 + .extra1 = SYSCTL_ZERO, 820 + .extra2 = SYSCTL_ONE, 821 + }, 822 + #ifdef CONFIG_SMP 823 + { 824 + .procname = "hardlockup_all_cpu_backtrace", 825 + .data = &sysctl_hardlockup_all_cpu_backtrace, 826 + .maxlen = sizeof(int), 827 + .mode = 0644, 828 + .proc_handler = proc_dointvec_minmax, 829 + .extra1 = SYSCTL_ZERO, 830 + .extra2 = SYSCTL_ONE, 831 + }, 832 + #endif /* CONFIG_SMP */ 833 + #endif 834 + {} 835 + }; 836 + 837 + static void __init watchdog_sysctl_init(void) 838 + { 839 + register_sysctl_init("kernel", watchdog_sysctls); 840 + } 841 + #else 842 + #define watchdog_sysctl_init() do { } while (0) 743 843 #endif /* CONFIG_SYSCTL */ 744 844 745 845 void __init lockup_detector_init(void) ··· 853 753 if (!watchdog_nmi_probe()) 854 754 nmi_watchdog_available = true; 855 755 lockup_detector_setup(); 756 + watchdog_sysctl_init(); 856 757 }
+4
lib/Kconfig
··· 673 673 bool 674 674 select STACKTRACE 675 675 676 + config STACKDEPOT_ALWAYS_INIT 677 + bool 678 + select STACKDEPOT 679 + 676 680 config STACK_HASH_ORDER 677 681 int "stack depot hash size (12 => 4KB, 20 => 1024KB)" 678 682 range 12 20
+1 -1
lib/Kconfig.kasan
··· 38 38 CC_HAS_WORKING_NOSANITIZE_ADDRESS) || \ 39 39 HAVE_ARCH_KASAN_HW_TAGS 40 40 depends on (SLUB && SYSFS) || (SLAB && !DEBUG_SLAB) 41 - select STACKDEPOT 41 + select STACKDEPOT_ALWAYS_INIT 42 42 help 43 43 Enables KASAN (KernelAddressSANitizer) - runtime memory debugger, 44 44 designed to find out-of-bounds accesses and use-after-free bugs.
+41 -5
lib/stackdepot.c
··· 23 23 #include <linux/jhash.h> 24 24 #include <linux/kernel.h> 25 25 #include <linux/mm.h> 26 + #include <linux/mutex.h> 26 27 #include <linux/percpu.h> 27 28 #include <linux/printk.h> 28 29 #include <linux/slab.h> ··· 162 161 } 163 162 early_param("stack_depot_disable", is_stack_depot_disabled); 164 163 165 - int __init stack_depot_init(void) 164 + /* 165 + * __ref because of memblock_alloc(), which will not be actually called after 166 + * the __init code is gone, because at that point slab_is_available() is true 167 + */ 168 + __ref int stack_depot_init(void) 166 169 { 167 - if (!stack_depot_disable) { 170 + static DEFINE_MUTEX(stack_depot_init_mutex); 171 + 172 + mutex_lock(&stack_depot_init_mutex); 173 + if (!stack_depot_disable && !stack_table) { 168 174 size_t size = (STACK_HASH_SIZE * sizeof(struct stack_record *)); 169 175 int i; 170 176 171 - stack_table = memblock_alloc(size, size); 172 - for (i = 0; i < STACK_HASH_SIZE; i++) 173 - stack_table[i] = NULL; 177 + if (slab_is_available()) { 178 + pr_info("Stack Depot allocating hash table with kvmalloc\n"); 179 + stack_table = kvmalloc(size, GFP_KERNEL); 180 + } else { 181 + pr_info("Stack Depot allocating hash table with memblock_alloc\n"); 182 + stack_table = memblock_alloc(size, SMP_CACHE_BYTES); 183 + } 184 + if (stack_table) { 185 + for (i = 0; i < STACK_HASH_SIZE; i++) 186 + stack_table[i] = NULL; 187 + } else { 188 + pr_err("Stack Depot hash table allocation failed, disabling\n"); 189 + stack_depot_disable = true; 190 + mutex_unlock(&stack_depot_init_mutex); 191 + return -ENOMEM; 192 + } 174 193 } 194 + mutex_unlock(&stack_depot_init_mutex); 175 195 return 0; 176 196 } 197 + EXPORT_SYMBOL_GPL(stack_depot_init); 177 198 178 199 /* Calculate hash for a stack */ 179 200 static inline u32 hash_stack(unsigned long *entries, unsigned int size) ··· 328 305 * (allocates using GFP flags of @alloc_flags). If @can_alloc is %false, avoids 329 306 * any allocations and will fail if no space is left to store the stack trace. 330 307 * 308 + * If the stack trace in @entries is from an interrupt, only the portion up to 309 + * interrupt entry is saved. 310 + * 331 311 * Context: Any context, but setting @can_alloc to %false is required if 332 312 * alloc_pages() cannot be used from the current context. Currently 333 313 * this is the case from contexts where neither %GFP_ATOMIC nor ··· 348 322 void *prealloc = NULL; 349 323 unsigned long flags; 350 324 u32 hash; 325 + 326 + /* 327 + * If this stack trace is from an interrupt, including anything before 328 + * interrupt entry usually leads to unbounded stackdepot growth. 329 + * 330 + * Because use of filter_irq_stacks() is a requirement to ensure 331 + * stackdepot can efficiently deduplicate interrupt stacks, always 332 + * filter_irq_stacks() to simplify all callers' use of stackdepot. 333 + */ 334 + nr_entries = filter_irq_stacks(entries, nr_entries); 351 335 352 336 if (unlikely(nr_entries == 0) || stack_depot_disable) 353 337 goto fast_exit;
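Two caller-visible changes land in stackdepot here. First, stack_depot_init() is no longer __init-only: it is serialized by a mutex, falls back to kvmalloc() once slab_is_available(), and can therefore be called late (hence the __ref annotation and the new export). Second, stack_depot_save() now applies filter_irq_stacks() itself, so callers no longer filter interrupt frames by hand, as the mm/kasan/common.c hunk below shows. A sketch of the resulting caller pattern, with an illustrative entry count:

	#include <linux/stackdepot.h>
	#include <linux/stacktrace.h>

	static depot_stack_handle_t save_current_stack(gfp_t gfp)
	{
		unsigned long entries[16];	/* illustrative depth */
		unsigned int nr;

		nr = stack_trace_save(entries, ARRAY_SIZE(entries), 0);
		/* filter_irq_stacks() now happens inside stack_depot_save() */
		return stack_depot_save(entries, nr, gfp);
	}

Users that cannot select STACKDEPOT_ALWAYS_INIT call stack_depot_init() once from their own init path, as mm/page_owner.c does further down.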
+1 -21
lib/test_sysctl.c
··· 128 128 { } 129 129 }; 130 130 131 - static struct ctl_table test_sysctl_table[] = { 132 - { 133 - .procname = "test_sysctl", 134 - .maxlen = 0, 135 - .mode = 0555, 136 - .child = test_table, 137 - }, 138 - { } 139 - }; 140 - 141 - static struct ctl_table test_sysctl_root_table[] = { 142 - { 143 - .procname = "debug", 144 - .maxlen = 0, 145 - .mode = 0555, 146 - .child = test_sysctl_table, 147 - }, 148 - { } 149 - }; 150 - 151 131 static struct ctl_table_header *test_sysctl_header; 152 132 153 133 static int __init test_sysctl_init(void) ··· 135 155 test_data.bitmap_0001 = kzalloc(SYSCTL_TEST_BITMAP_SIZE/8, GFP_KERNEL); 136 156 if (!test_data.bitmap_0001) 137 157 return -ENOMEM; 138 - test_sysctl_header = register_sysctl_table(test_sysctl_root_table); 158 + test_sysctl_header = register_sysctl("debug/test_sysctl", test_table); 139 159 if (!test_sysctl_header) { 140 160 kfree(test_data.bitmap_0001); 141 161 return -ENOMEM;
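For code that must also unregister, such as this test module, register_sysctl() with a slash-separated path replaces the two nested "child" tables with a single call. A sketch of the full lifecycle (all names here are hypothetical):

	#include <linux/sysctl.h>

	static int example_val;

	static struct ctl_table example_table[] = {
		{
			.procname	= "value",
			.data		= &example_val,
			.maxlen		= sizeof(int),
			.mode		= 0644,
			.proc_handler	= proc_dointvec,
		},
		{ }
	};

	static struct ctl_table_header *example_header;

	static int __init example_init(void)
	{
		/* creates /proc/sys/debug/example/value in one call */
		example_header = register_sysctl("debug/example", example_table);
		if (!example_header)
			return -ENOMEM;
		return 0;
	}

	static void __exit example_exit(void)
	{
		unregister_sysctl_table(example_header);
	}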
+3 -37
mm/Kconfig
··· 444 444 config HAVE_SETUP_PER_CPU_AREA 445 445 bool 446 446 447 - config CLEANCACHE 448 - bool "Enable cleancache driver to cache clean pages if tmem is present" 449 - help 450 - Cleancache can be thought of as a page-granularity victim cache 451 - for clean pages that the kernel's pageframe replacement algorithm 452 - (PFRA) would like to keep around, but can't since there isn't enough 453 - memory. So when the PFRA "evicts" a page, it first attempts to use 454 - cleancache code to put the data contained in that page into 455 - "transcendent memory", memory that is not directly accessible or 456 - addressable by the kernel and is of unknown and possibly 457 - time-varying size. And when a cleancache-enabled 458 - filesystem wishes to access a page in a file on disk, it first 459 - checks cleancache to see if it already contains it; if it does, 460 - the page is copied into the kernel and a disk access is avoided. 461 - When a transcendent memory driver is available (such as zcache or 462 - Xen transcendent memory), a significant I/O reduction 463 - may be achieved. When none is available, all cleancache calls 464 - are reduced to a single pointer-compare-against-NULL resulting 465 - in a negligible performance hit. 466 - 467 - If unsure, say Y to enable cleancache 468 - 469 447 config FRONTSWAP 470 - bool "Enable frontswap to cache swap pages if tmem is present" 471 - depends on SWAP 472 - help 473 - Frontswap is so named because it can be thought of as the opposite 474 - of a "backing" store for a swap device. The data is stored into 475 - "transcendent memory", memory that is not directly accessible or 476 - addressable by the kernel and is of unknown and possibly 477 - time-varying size. When space in transcendent memory is available, 478 - a significant swap I/O reduction may be achieved. When none is 479 - available, all frontswap calls are reduced to a single pointer- 480 - compare-against-NULL resulting in a negligible performance hit 481 - and swap data is stored as normal on the matching swap device. 482 - 483 - If unsure, say Y to enable frontswap. 448 + bool 484 449 485 450 config CMA 486 451 bool "Contiguous Memory Allocator" ··· 510 545 511 546 config ZSWAP 512 547 bool "Compressed cache for swap pages (EXPERIMENTAL)" 513 - depends on FRONTSWAP && CRYPTO=y 548 + depends on SWAP && CRYPTO=y 549 + select FRONTSWAP 514 550 select ZPOOL 515 551 help 516 552 A lightweight compressed cache for swap pages. It takes
-1
mm/Makefile
··· 104 104 obj-$(CONFIG_DEBUG_RODATA_TEST) += rodata_test.o 105 105 obj-$(CONFIG_DEBUG_VM_PGTABLE) += debug_vm_pgtable.o 106 106 obj-$(CONFIG_PAGE_OWNER) += page_owner.o 107 - obj-$(CONFIG_CLEANCACHE) += cleancache.o 108 107 obj-$(CONFIG_MEMORY_ISOLATION) += page_isolation.o 109 108 obj-$(CONFIG_ZPOOL) += zpool.o 110 109 obj-$(CONFIG_ZBUD) += zbud.o
-315
mm/cleancache.c
··· 1 - // SPDX-License-Identifier: GPL-2.0-only 2 - /* 3 - * Cleancache frontend 4 - * 5 - * This code provides the generic "frontend" layer to call a matching 6 - * "backend" driver implementation of cleancache. See 7 - * Documentation/vm/cleancache.rst for more information. 8 - * 9 - * Copyright (C) 2009-2010 Oracle Corp. All rights reserved. 10 - * Author: Dan Magenheimer 11 - */ 12 - 13 - #include <linux/module.h> 14 - #include <linux/fs.h> 15 - #include <linux/exportfs.h> 16 - #include <linux/mm.h> 17 - #include <linux/debugfs.h> 18 - #include <linux/cleancache.h> 19 - 20 - /* 21 - * cleancache_ops is set by cleancache_register_ops to contain the pointers 22 - * to the cleancache "backend" implementation functions. 23 - */ 24 - static const struct cleancache_ops *cleancache_ops __read_mostly; 25 - 26 - /* 27 - * Counters available via /sys/kernel/debug/cleancache (if debugfs is 28 - * properly configured. These are for information only so are not protected 29 - * against increment races. 30 - */ 31 - static u64 cleancache_succ_gets; 32 - static u64 cleancache_failed_gets; 33 - static u64 cleancache_puts; 34 - static u64 cleancache_invalidates; 35 - 36 - static void cleancache_register_ops_sb(struct super_block *sb, void *unused) 37 - { 38 - switch (sb->cleancache_poolid) { 39 - case CLEANCACHE_NO_BACKEND: 40 - __cleancache_init_fs(sb); 41 - break; 42 - case CLEANCACHE_NO_BACKEND_SHARED: 43 - __cleancache_init_shared_fs(sb); 44 - break; 45 - } 46 - } 47 - 48 - /* 49 - * Register operations for cleancache. Returns 0 on success. 50 - */ 51 - int cleancache_register_ops(const struct cleancache_ops *ops) 52 - { 53 - if (cmpxchg(&cleancache_ops, NULL, ops)) 54 - return -EBUSY; 55 - 56 - /* 57 - * A cleancache backend can be built as a module and hence loaded after 58 - * a cleancache enabled filesystem has called cleancache_init_fs. To 59 - * handle such a scenario, here we call ->init_fs or ->init_shared_fs 60 - * for each active super block. To differentiate between local and 61 - * shared filesystems, we temporarily initialize sb->cleancache_poolid 62 - * to CLEANCACHE_NO_BACKEND or CLEANCACHE_NO_BACKEND_SHARED 63 - * respectively in case there is no backend registered at the time 64 - * cleancache_init_fs or cleancache_init_shared_fs is called. 65 - * 66 - * Since filesystems can be mounted concurrently with cleancache 67 - * backend registration, we have to be careful to guarantee that all 68 - * cleancache enabled filesystems that has been mounted by the time 69 - * cleancache_register_ops is called has got and all mounted later will 70 - * get cleancache_poolid. This is assured by the following statements 71 - * tied together: 72 - * 73 - * a) iterate_supers skips only those super blocks that has started 74 - * ->kill_sb 75 - * 76 - * b) if iterate_supers encounters a super block that has not finished 77 - * ->mount yet, it waits until it is finished 78 - * 79 - * c) cleancache_init_fs is called from ->mount and 80 - * cleancache_invalidate_fs is called from ->kill_sb 81 - * 82 - * d) we call iterate_supers after cleancache_ops has been set 83 - * 84 - * From a) it follows that if iterate_supers skips a super block, then 85 - * either the super block is already dead, in which case we do not need 86 - * to bother initializing cleancache for it, or it was mounted after we 87 - * initiated iterate_supers. In the latter case, it must have seen 88 - * cleancache_ops set according to d) and initialized cleancache from 89 - * ->mount by itself according to c). 
This proves that we call 90 - * ->init_fs at least once for each active super block. 91 - * 92 - * From b) and c) it follows that if iterate_supers encounters a super 93 - * block that has already started ->init_fs, it will wait until ->mount 94 - * and hence ->init_fs has finished, then check cleancache_poolid, see 95 - * that it has already been set and therefore do nothing. This proves 96 - * that we call ->init_fs no more than once for each super block. 97 - * 98 - * Combined together, the last two paragraphs prove the function 99 - * correctness. 100 - * 101 - * Note that various cleancache callbacks may proceed before this 102 - * function is called or even concurrently with it, but since 103 - * CLEANCACHE_NO_BACKEND is negative, they will all result in a noop 104 - * until the corresponding ->init_fs has been actually called and 105 - * cleancache_ops has been set. 106 - */ 107 - iterate_supers(cleancache_register_ops_sb, NULL); 108 - return 0; 109 - } 110 - EXPORT_SYMBOL(cleancache_register_ops); 111 - 112 - /* Called by a cleancache-enabled filesystem at time of mount */ 113 - void __cleancache_init_fs(struct super_block *sb) 114 - { 115 - int pool_id = CLEANCACHE_NO_BACKEND; 116 - 117 - if (cleancache_ops) { 118 - pool_id = cleancache_ops->init_fs(PAGE_SIZE); 119 - if (pool_id < 0) 120 - pool_id = CLEANCACHE_NO_POOL; 121 - } 122 - sb->cleancache_poolid = pool_id; 123 - } 124 - EXPORT_SYMBOL(__cleancache_init_fs); 125 - 126 - /* Called by a cleancache-enabled clustered filesystem at time of mount */ 127 - void __cleancache_init_shared_fs(struct super_block *sb) 128 - { 129 - int pool_id = CLEANCACHE_NO_BACKEND_SHARED; 130 - 131 - if (cleancache_ops) { 132 - pool_id = cleancache_ops->init_shared_fs(&sb->s_uuid, PAGE_SIZE); 133 - if (pool_id < 0) 134 - pool_id = CLEANCACHE_NO_POOL; 135 - } 136 - sb->cleancache_poolid = pool_id; 137 - } 138 - EXPORT_SYMBOL(__cleancache_init_shared_fs); 139 - 140 - /* 141 - * If the filesystem uses exportable filehandles, use the filehandle as 142 - * the key, else use the inode number. 143 - */ 144 - static int cleancache_get_key(struct inode *inode, 145 - struct cleancache_filekey *key) 146 - { 147 - int (*fhfn)(struct inode *, __u32 *fh, int *, struct inode *); 148 - int len = 0, maxlen = CLEANCACHE_KEY_MAX; 149 - struct super_block *sb = inode->i_sb; 150 - 151 - key->u.ino = inode->i_ino; 152 - if (sb->s_export_op != NULL) { 153 - fhfn = sb->s_export_op->encode_fh; 154 - if (fhfn) { 155 - len = (*fhfn)(inode, &key->u.fh[0], &maxlen, NULL); 156 - if (len <= FILEID_ROOT || len == FILEID_INVALID) 157 - return -1; 158 - if (maxlen > CLEANCACHE_KEY_MAX) 159 - return -1; 160 - } 161 - } 162 - return 0; 163 - } 164 - 165 - /* 166 - * "Get" data from cleancache associated with the poolid/inode/index 167 - * that were specified when the data was put to cleanache and, if 168 - * successful, use it to fill the specified page with data and return 0. 169 - * The pageframe is unchanged and returns -1 if the get fails. 170 - * Page must be locked by caller. 171 - * 172 - * The function has two checks before any action is taken - whether 173 - * a backend is registered and whether the sb->cleancache_poolid 174 - * is correct. 
175 - */ 176 - int __cleancache_get_page(struct page *page) 177 - { 178 - int ret = -1; 179 - int pool_id; 180 - struct cleancache_filekey key = { .u.key = { 0 } }; 181 - 182 - if (!cleancache_ops) { 183 - cleancache_failed_gets++; 184 - goto out; 185 - } 186 - 187 - VM_BUG_ON_PAGE(!PageLocked(page), page); 188 - pool_id = page->mapping->host->i_sb->cleancache_poolid; 189 - if (pool_id < 0) 190 - goto out; 191 - 192 - if (cleancache_get_key(page->mapping->host, &key) < 0) 193 - goto out; 194 - 195 - ret = cleancache_ops->get_page(pool_id, key, page->index, page); 196 - if (ret == 0) 197 - cleancache_succ_gets++; 198 - else 199 - cleancache_failed_gets++; 200 - out: 201 - return ret; 202 - } 203 - EXPORT_SYMBOL(__cleancache_get_page); 204 - 205 - /* 206 - * "Put" data from a page to cleancache and associate it with the 207 - * (previously-obtained per-filesystem) poolid and the page's, 208 - * inode and page index. Page must be locked. Note that a put_page 209 - * always "succeeds", though a subsequent get_page may succeed or fail. 210 - * 211 - * The function has two checks before any action is taken - whether 212 - * a backend is registered and whether the sb->cleancache_poolid 213 - * is correct. 214 - */ 215 - void __cleancache_put_page(struct page *page) 216 - { 217 - int pool_id; 218 - struct cleancache_filekey key = { .u.key = { 0 } }; 219 - 220 - if (!cleancache_ops) { 221 - cleancache_puts++; 222 - return; 223 - } 224 - 225 - VM_BUG_ON_PAGE(!PageLocked(page), page); 226 - pool_id = page->mapping->host->i_sb->cleancache_poolid; 227 - if (pool_id >= 0 && 228 - cleancache_get_key(page->mapping->host, &key) >= 0) { 229 - cleancache_ops->put_page(pool_id, key, page->index, page); 230 - cleancache_puts++; 231 - } 232 - } 233 - EXPORT_SYMBOL(__cleancache_put_page); 234 - 235 - /* 236 - * Invalidate any data from cleancache associated with the poolid and the 237 - * page's inode and page index so that a subsequent "get" will fail. 238 - * 239 - * The function has two checks before any action is taken - whether 240 - * a backend is registered and whether the sb->cleancache_poolid 241 - * is correct. 242 - */ 243 - void __cleancache_invalidate_page(struct address_space *mapping, 244 - struct page *page) 245 - { 246 - /* careful... page->mapping is NULL sometimes when this is called */ 247 - int pool_id = mapping->host->i_sb->cleancache_poolid; 248 - struct cleancache_filekey key = { .u.key = { 0 } }; 249 - 250 - if (!cleancache_ops) 251 - return; 252 - 253 - if (pool_id >= 0) { 254 - VM_BUG_ON_PAGE(!PageLocked(page), page); 255 - if (cleancache_get_key(mapping->host, &key) >= 0) { 256 - cleancache_ops->invalidate_page(pool_id, 257 - key, page->index); 258 - cleancache_invalidates++; 259 - } 260 - } 261 - } 262 - EXPORT_SYMBOL(__cleancache_invalidate_page); 263 - 264 - /* 265 - * Invalidate all data from cleancache associated with the poolid and the 266 - * mappings's inode so that all subsequent gets to this poolid/inode 267 - * will fail. 268 - * 269 - * The function has two checks before any action is taken - whether 270 - * a backend is registered and whether the sb->cleancache_poolid 271 - * is correct. 
272 - */ 273 - void __cleancache_invalidate_inode(struct address_space *mapping) 274 - { 275 - int pool_id = mapping->host->i_sb->cleancache_poolid; 276 - struct cleancache_filekey key = { .u.key = { 0 } }; 277 - 278 - if (!cleancache_ops) 279 - return; 280 - 281 - if (pool_id >= 0 && cleancache_get_key(mapping->host, &key) >= 0) 282 - cleancache_ops->invalidate_inode(pool_id, key); 283 - } 284 - EXPORT_SYMBOL(__cleancache_invalidate_inode); 285 - 286 - /* 287 - * Called by any cleancache-enabled filesystem at time of unmount; 288 - * note that pool_id is surrendered and may be returned by a subsequent 289 - * cleancache_init_fs or cleancache_init_shared_fs. 290 - */ 291 - void __cleancache_invalidate_fs(struct super_block *sb) 292 - { 293 - int pool_id; 294 - 295 - pool_id = sb->cleancache_poolid; 296 - sb->cleancache_poolid = CLEANCACHE_NO_POOL; 297 - 298 - if (cleancache_ops && pool_id >= 0) 299 - cleancache_ops->invalidate_fs(pool_id); 300 - } 301 - EXPORT_SYMBOL(__cleancache_invalidate_fs); 302 - 303 - static int __init init_cleancache(void) 304 - { 305 - #ifdef CONFIG_DEBUG_FS 306 - struct dentry *root = debugfs_create_dir("cleancache", NULL); 307 - 308 - debugfs_create_u64("succ_gets", 0444, root, &cleancache_succ_gets); 309 - debugfs_create_u64("failed_gets", 0444, root, &cleancache_failed_gets); 310 - debugfs_create_u64("puts", 0444, root, &cleancache_puts); 311 - debugfs_create_u64("invalidates", 0444, root, &cleancache_invalidates); 312 - #endif 313 - return 0; 314 - } 315 - module_init(init_cleancache)
+91 -11
mm/filemap.c
··· 21 21 #include <linux/gfp.h> 22 22 #include <linux/mm.h> 23 23 #include <linux/swap.h> 24 + #include <linux/swapops.h> 24 25 #include <linux/mman.h> 25 26 #include <linux/pagemap.h> 26 27 #include <linux/file.h> ··· 35 34 #include <linux/cpuset.h> 36 35 #include <linux/hugetlb.h> 37 36 #include <linux/memcontrol.h> 38 - #include <linux/cleancache.h> 39 37 #include <linux/shmem_fs.h> 40 38 #include <linux/rmap.h> 41 39 #include <linux/delayacct.h> 42 40 #include <linux/psi.h> 43 41 #include <linux/ramfs.h> 44 42 #include <linux/page_idle.h> 43 + #include <linux/migrate.h> 45 44 #include <asm/pgalloc.h> 46 45 #include <asm/tlbflush.h> 47 46 #include "internal.h" ··· 149 148 struct folio *folio) 150 149 { 151 150 long nr; 152 - 153 - /* 154 - * if we're uptodate, flush out into the cleancache, otherwise 155 - * invalidate any existing cleancache entries. We can't leave 156 - * stale data around in the cleancache once our page is gone 157 - */ 158 - if (folio_test_uptodate(folio) && folio_test_mappedtodisk(folio)) 159 - cleancache_put_page(&folio->page); 160 - else 161 - cleancache_invalidate_page(mapping, &folio->page); 162 151 163 152 VM_BUG_ON_FOLIO(folio_mapped(folio), folio); 164 153 if (!IS_ENABLED(CONFIG_DEBUG_VM) && unlikely(folio_mapped(folio))) { ··· 1376 1385 1377 1386 return wait->flags & WQ_FLAG_WOKEN ? 0 : -EINTR; 1378 1387 } 1388 + 1389 + #ifdef CONFIG_MIGRATION 1390 + /** 1391 + * migration_entry_wait_on_locked - Wait for a migration entry to be removed 1392 + * @entry: migration swap entry. 1393 + * @ptep: mapped pte pointer. Will return with the ptep unmapped. Only required 1394 + * for pte entries, pass NULL for pmd entries. 1395 + * @ptl: already locked ptl. This function will drop the lock. 1396 + * 1397 + * Wait for a migration entry referencing the given page to be removed. This is 1398 + * equivalent to put_and_wait_on_page_locked(page, TASK_UNINTERRUPTIBLE) except 1399 + * this can be called without taking a reference on the page. Instead this 1400 + * should be called while holding the ptl for the migration entry referencing 1401 + * the page. 1402 + * 1403 + * Returns after unmapping and unlocking the pte/ptl with pte_unmap_unlock(). 1404 + * 1405 + * This follows the same logic as folio_wait_bit_common() so see the comments 1406 + * there. 
1407 + */ 1408 + void migration_entry_wait_on_locked(swp_entry_t entry, pte_t *ptep, 1409 + spinlock_t *ptl) 1410 + { 1411 + struct wait_page_queue wait_page; 1412 + wait_queue_entry_t *wait = &wait_page.wait; 1413 + bool thrashing = false; 1414 + bool delayacct = false; 1415 + unsigned long pflags; 1416 + wait_queue_head_t *q; 1417 + struct folio *folio = page_folio(pfn_swap_entry_to_page(entry)); 1418 + 1419 + q = folio_waitqueue(folio); 1420 + if (!folio_test_uptodate(folio) && folio_test_workingset(folio)) { 1421 + if (!folio_test_swapbacked(folio)) { 1422 + delayacct_thrashing_start(); 1423 + delayacct = true; 1424 + } 1425 + psi_memstall_enter(&pflags); 1426 + thrashing = true; 1427 + } 1428 + 1429 + init_wait(wait); 1430 + wait->func = wake_page_function; 1431 + wait_page.folio = folio; 1432 + wait_page.bit_nr = PG_locked; 1433 + wait->flags = 0; 1434 + 1435 + spin_lock_irq(&q->lock); 1436 + folio_set_waiters(folio); 1437 + if (!folio_trylock_flag(folio, PG_locked, wait)) 1438 + __add_wait_queue_entry_tail(q, wait); 1439 + spin_unlock_irq(&q->lock); 1440 + 1441 + /* 1442 + * If a migration entry exists for the page the migration path must hold 1443 + * a valid reference to the page, and it must take the ptl to remove the 1444 + * migration entry. So the page is valid until the ptl is dropped. 1445 + */ 1446 + if (ptep) 1447 + pte_unmap_unlock(ptep, ptl); 1448 + else 1449 + spin_unlock(ptl); 1450 + 1451 + for (;;) { 1452 + unsigned int flags; 1453 + 1454 + set_current_state(TASK_UNINTERRUPTIBLE); 1455 + 1456 + /* Loop until we've been woken or interrupted */ 1457 + flags = smp_load_acquire(&wait->flags); 1458 + if (!(flags & WQ_FLAG_WOKEN)) { 1459 + if (signal_pending_state(TASK_UNINTERRUPTIBLE, current)) 1460 + break; 1461 + 1462 + io_schedule(); 1463 + continue; 1464 + } 1465 + break; 1466 + } 1467 + 1468 + finish_wait(q, wait); 1469 + 1470 + if (thrashing) { 1471 + if (delayacct) 1472 + delayacct_thrashing_end(); 1473 + psi_memstall_leave(&pflags); 1474 + } 1475 + } 1476 + #endif 1379 1477 1380 1478 void folio_wait_bit(struct folio *folio, int bit_nr) 1381 1479 {
+16 -243
mm/frontswap.c
··· 27 27 * may be registered, but implementations can never deregister. This 28 28 * is a simple singly-linked list of all registered implementations. 29 29 */ 30 - static struct frontswap_ops *frontswap_ops __read_mostly; 31 - 32 - #define for_each_frontswap_ops(ops) \ 33 - for ((ops) = frontswap_ops; (ops); (ops) = (ops)->next) 34 - 35 - /* 36 - * If enabled, frontswap_store will return failure even on success. As 37 - * a result, the swap subsystem will always write the page to swap, in 38 - * effect converting frontswap into a writethrough cache. In this mode, 39 - * there is no direct reduction in swap writes, but a frontswap backend 40 - * can unilaterally "reclaim" any pages in use with no data loss, thus 41 - * providing increases control over maximum memory usage due to frontswap. 42 - */ 43 - static bool frontswap_writethrough_enabled __read_mostly; 44 - 45 - /* 46 - * If enabled, the underlying tmem implementation is capable of doing 47 - * exclusive gets, so frontswap_load, on a successful tmem_get must 48 - * mark the page as no longer in frontswap AND mark it dirty. 49 - */ 50 - static bool frontswap_tmem_exclusive_gets_enabled __read_mostly; 30 + static const struct frontswap_ops *frontswap_ops __read_mostly; 51 31 52 32 #ifdef CONFIG_DEBUG_FS 53 33 /* ··· 94 114 /* 95 115 * Register operations for frontswap 96 116 */ 97 - void frontswap_register_ops(struct frontswap_ops *ops) 117 + int frontswap_register_ops(const struct frontswap_ops *ops) 98 118 { 99 - DECLARE_BITMAP(a, MAX_SWAPFILES); 100 - DECLARE_BITMAP(b, MAX_SWAPFILES); 101 - struct swap_info_struct *si; 102 - unsigned int i; 119 + if (frontswap_ops) 120 + return -EINVAL; 103 121 104 - bitmap_zero(a, MAX_SWAPFILES); 105 - bitmap_zero(b, MAX_SWAPFILES); 106 - 107 - spin_lock(&swap_lock); 108 - plist_for_each_entry(si, &swap_active_head, list) { 109 - if (!WARN_ON(!si->frontswap_map)) 110 - __set_bit(si->type, a); 111 - } 112 - spin_unlock(&swap_lock); 113 - 114 - /* the new ops needs to know the currently active swap devices */ 115 - for_each_set_bit(i, a, MAX_SWAPFILES) 116 - ops->init(i); 117 - 118 - /* 119 - * Setting frontswap_ops must happen after the ops->init() calls 120 - * above; cmpxchg implies smp_mb() which will ensure the init is 121 - * complete at this point. 122 - */ 123 - do { 124 - ops->next = frontswap_ops; 125 - } while (cmpxchg(&frontswap_ops, ops->next, ops) != ops->next); 126 - 122 + frontswap_ops = ops; 127 123 static_branch_inc(&frontswap_enabled_key); 128 - 129 - spin_lock(&swap_lock); 130 - plist_for_each_entry(si, &swap_active_head, list) { 131 - if (si->frontswap_map) 132 - __set_bit(si->type, b); 133 - } 134 - spin_unlock(&swap_lock); 135 - 136 - /* 137 - * On the very unlikely chance that a swap device was added or 138 - * removed between setting the "a" list bits and the ops init 139 - * calls, we re-check and do init or invalidate for any changed 140 - * bits. 141 - */ 142 - if (unlikely(!bitmap_equal(a, b, MAX_SWAPFILES))) { 143 - for (i = 0; i < MAX_SWAPFILES; i++) { 144 - if (!test_bit(i, a) && test_bit(i, b)) 145 - ops->init(i); 146 - else if (test_bit(i, a) && !test_bit(i, b)) 147 - ops->invalidate_area(i); 148 - } 149 - } 124 + return 0; 150 125 } 151 - EXPORT_SYMBOL(frontswap_register_ops); 152 - 153 - /* 154 - * Enable/disable frontswap writethrough (see above). 
155 - */ 156 - void frontswap_writethrough(bool enable) 157 - { 158 - frontswap_writethrough_enabled = enable; 159 - } 160 - EXPORT_SYMBOL(frontswap_writethrough); 161 - 162 - /* 163 - * Enable/disable frontswap exclusive gets (see above). 164 - */ 165 - void frontswap_tmem_exclusive_gets(bool enable) 166 - { 167 - frontswap_tmem_exclusive_gets_enabled = enable; 168 - } 169 - EXPORT_SYMBOL(frontswap_tmem_exclusive_gets); 170 126 171 127 /* 172 128 * Called when a swap device is swapon'd. 173 129 */ 174 - void __frontswap_init(unsigned type, unsigned long *map) 130 + void frontswap_init(unsigned type, unsigned long *map) 175 131 { 176 132 struct swap_info_struct *sis = swap_info[type]; 177 - struct frontswap_ops *ops; 178 133 179 134 VM_BUG_ON(sis == NULL); 180 135 ··· 125 210 * p->frontswap set to something valid to work properly. 126 211 */ 127 212 frontswap_map_set(sis, map); 128 - 129 - for_each_frontswap_ops(ops) 130 - ops->init(type); 213 + frontswap_ops->init(type); 131 214 } 132 - EXPORT_SYMBOL(__frontswap_init); 133 215 134 - bool __frontswap_test(struct swap_info_struct *sis, 216 + static bool __frontswap_test(struct swap_info_struct *sis, 135 217 pgoff_t offset) 136 218 { 137 219 if (sis->frontswap_map) 138 220 return test_bit(offset, sis->frontswap_map); 139 221 return false; 140 222 } 141 - EXPORT_SYMBOL(__frontswap_test); 142 223 143 224 static inline void __frontswap_set(struct swap_info_struct *sis, 144 225 pgoff_t offset) ··· 164 253 int type = swp_type(entry); 165 254 struct swap_info_struct *sis = swap_info[type]; 166 255 pgoff_t offset = swp_offset(entry); 167 - struct frontswap_ops *ops; 168 256 169 257 VM_BUG_ON(!frontswap_ops); 170 258 VM_BUG_ON(!PageLocked(page)); ··· 177 267 */ 178 268 if (__frontswap_test(sis, offset)) { 179 269 __frontswap_clear(sis, offset); 180 - for_each_frontswap_ops(ops) 181 - ops->invalidate_page(type, offset); 270 + frontswap_ops->invalidate_page(type, offset); 182 271 } 183 272 184 - /* Try to store in each implementation, until one succeeds. */ 185 - for_each_frontswap_ops(ops) { 186 - ret = ops->store(type, offset, page); 187 - if (!ret) /* successful store */ 188 - break; 189 - } 273 + ret = frontswap_ops->store(type, offset, page); 190 274 if (ret == 0) { 191 275 __frontswap_set(sis, offset); 192 276 inc_frontswap_succ_stores(); 193 277 } else { 194 278 inc_frontswap_failed_stores(); 195 279 } 196 - if (frontswap_writethrough_enabled) 197 - /* report failure so swap also writes to swap device */ 198 - ret = -1; 280 + 199 281 return ret; 200 282 } 201 - EXPORT_SYMBOL(__frontswap_store); 202 283 203 284 /* 204 285 * "Get" data from frontswap associated with swaptype and offset that were ··· 203 302 int type = swp_type(entry); 204 303 struct swap_info_struct *sis = swap_info[type]; 205 304 pgoff_t offset = swp_offset(entry); 206 - struct frontswap_ops *ops; 207 305 208 306 VM_BUG_ON(!frontswap_ops); 209 307 VM_BUG_ON(!PageLocked(page)); ··· 212 312 return -1; 213 313 214 314 /* Try loading from each implementation, until one succeeds. 
*/ 215 - for_each_frontswap_ops(ops) { 216 - ret = ops->load(type, offset, page); 217 - if (!ret) /* successful load */ 218 - break; 219 - } 220 - if (ret == 0) { 315 + ret = frontswap_ops->load(type, offset, page); 316 + if (ret == 0) 221 317 inc_frontswap_loads(); 222 - if (frontswap_tmem_exclusive_gets_enabled) { 223 - SetPageDirty(page); 224 - __frontswap_clear(sis, offset); 225 - } 226 - } 227 318 return ret; 228 319 } 229 - EXPORT_SYMBOL(__frontswap_load); 230 320 231 321 /* 232 322 * Invalidate any data from frontswap associated with the specified swaptype ··· 225 335 void __frontswap_invalidate_page(unsigned type, pgoff_t offset) 226 336 { 227 337 struct swap_info_struct *sis = swap_info[type]; 228 - struct frontswap_ops *ops; 229 338 230 339 VM_BUG_ON(!frontswap_ops); 231 340 VM_BUG_ON(sis == NULL); ··· 232 343 if (!__frontswap_test(sis, offset)) 233 344 return; 234 345 235 - for_each_frontswap_ops(ops) 236 - ops->invalidate_page(type, offset); 346 + frontswap_ops->invalidate_page(type, offset); 237 347 __frontswap_clear(sis, offset); 238 348 inc_frontswap_invalidates(); 239 349 } 240 - EXPORT_SYMBOL(__frontswap_invalidate_page); 241 350 242 351 /* 243 352 * Invalidate all data from frontswap associated with all offsets for the ··· 244 357 void __frontswap_invalidate_area(unsigned type) 245 358 { 246 359 struct swap_info_struct *sis = swap_info[type]; 247 - struct frontswap_ops *ops; 248 360 249 361 VM_BUG_ON(!frontswap_ops); 250 362 VM_BUG_ON(sis == NULL); ··· 251 365 if (sis->frontswap_map == NULL) 252 366 return; 253 367 254 - for_each_frontswap_ops(ops) 255 - ops->invalidate_area(type); 368 + frontswap_ops->invalidate_area(type); 256 369 atomic_set(&sis->frontswap_pages, 0); 257 370 bitmap_zero(sis->frontswap_map, sis->max); 258 371 } 259 - EXPORT_SYMBOL(__frontswap_invalidate_area); 260 - 261 - static unsigned long __frontswap_curr_pages(void) 262 - { 263 - unsigned long totalpages = 0; 264 - struct swap_info_struct *si = NULL; 265 - 266 - assert_spin_locked(&swap_lock); 267 - plist_for_each_entry(si, &swap_active_head, list) 268 - totalpages += atomic_read(&si->frontswap_pages); 269 - return totalpages; 270 - } 271 - 272 - static int __frontswap_unuse_pages(unsigned long total, unsigned long *unused, 273 - int *swapid) 274 - { 275 - int ret = -EINVAL; 276 - struct swap_info_struct *si = NULL; 277 - int si_frontswap_pages; 278 - unsigned long total_pages_to_unuse = total; 279 - unsigned long pages = 0, pages_to_unuse = 0; 280 - 281 - assert_spin_locked(&swap_lock); 282 - plist_for_each_entry(si, &swap_active_head, list) { 283 - si_frontswap_pages = atomic_read(&si->frontswap_pages); 284 - if (total_pages_to_unuse < si_frontswap_pages) { 285 - pages = pages_to_unuse = total_pages_to_unuse; 286 - } else { 287 - pages = si_frontswap_pages; 288 - pages_to_unuse = 0; /* unuse all */ 289 - } 290 - /* ensure there is enough RAM to fetch pages from frontswap */ 291 - if (security_vm_enough_memory_mm(current->mm, pages)) { 292 - ret = -ENOMEM; 293 - continue; 294 - } 295 - vm_unacct_memory(pages); 296 - *unused = pages_to_unuse; 297 - *swapid = si->type; 298 - ret = 0; 299 - break; 300 - } 301 - 302 - return ret; 303 - } 304 - 305 - /* 306 - * Used to check if it's necessary and feasible to unuse pages. 307 - * Return 1 when nothing to do, 0 when need to shrink pages, 308 - * error code when there is an error. 
309 - */ 310 - static int __frontswap_shrink(unsigned long target_pages, 311 - unsigned long *pages_to_unuse, 312 - int *type) 313 - { 314 - unsigned long total_pages = 0, total_pages_to_unuse; 315 - 316 - assert_spin_locked(&swap_lock); 317 - 318 - total_pages = __frontswap_curr_pages(); 319 - if (total_pages <= target_pages) { 320 - /* Nothing to do */ 321 - *pages_to_unuse = 0; 322 - return 1; 323 - } 324 - total_pages_to_unuse = total_pages - target_pages; 325 - return __frontswap_unuse_pages(total_pages_to_unuse, pages_to_unuse, type); 326 - } 327 - 328 - /* 329 - * Frontswap, like a true swap device, may unnecessarily retain pages 330 - * under certain circumstances; "shrink" frontswap is essentially a 331 - * "partial swapoff" and works by calling try_to_unuse to attempt to 332 - * unuse enough frontswap pages to attempt to -- subject to memory 333 - * constraints -- reduce the number of pages in frontswap to the 334 - * number given in the parameter target_pages. 335 - */ 336 - void frontswap_shrink(unsigned long target_pages) 337 - { 338 - unsigned long pages_to_unuse = 0; 339 - int type, ret; 340 - 341 - /* 342 - * we don't want to hold swap_lock while doing a very 343 - * lengthy try_to_unuse, but swap_list may change 344 - * so restart scan from swap_active_head each time 345 - */ 346 - spin_lock(&swap_lock); 347 - ret = __frontswap_shrink(target_pages, &pages_to_unuse, &type); 348 - spin_unlock(&swap_lock); 349 - if (ret == 0) 350 - try_to_unuse(type, true, pages_to_unuse); 351 - return; 352 - } 353 - EXPORT_SYMBOL(frontswap_shrink); 354 - 355 - /* 356 - * Count and return the number of frontswap pages across all 357 - * swap devices. This is exported so that backend drivers can 358 - * determine current usage without reading debugfs. 359 - */ 360 - unsigned long frontswap_curr_pages(void) 361 - { 362 - unsigned long totalpages = 0; 363 - 364 - spin_lock(&swap_lock); 365 - totalpages = __frontswap_curr_pages(); 366 - spin_unlock(&swap_lock); 367 - 368 - return totalpages; 369 - } 370 - EXPORT_SYMBOL(frontswap_curr_pages); 371 372 372 373 static int __init init_frontswap(void) 373 374 {
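With the singly-linked ops list gone, frontswap is a single-backend API: frontswap_register_ops() just stores the pointer and returns -EINVAL if a backend is already present, and after this series zswap is the only in-tree user. A minimal backend sketch under those assumptions (callback bodies are placeholders; a store() that returns nonzero simply lets the page be written to the real swap device):

	#include <linux/frontswap.h>

	static void ex_init(unsigned type) { }
	static int ex_store(unsigned type, pgoff_t offset, struct page *page)
	{
		return -1;	/* reject: fall back to the swap device */
	}
	static int ex_load(unsigned type, pgoff_t offset, struct page *page)
	{
		return -1;	/* miss */
	}
	static void ex_invalidate_page(unsigned type, pgoff_t offset) { }
	static void ex_invalidate_area(unsigned type) { }

	static const struct frontswap_ops ex_ops = {
		.init			= ex_init,
		.store			= ex_store,
		.load			= ex_load,
		.invalidate_page	= ex_invalidate_page,
		.invalidate_area	= ex_invalidate_area,
	};

	static int __init ex_register(void)
	{
		/* -EINVAL if another backend registered first */
		return frontswap_register_ops(&ex_ops);
	}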
-1
mm/kasan/common.c
··· 36 36 unsigned int nr_entries; 37 37 38 38 nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 0); 39 - nr_entries = filter_irq_stacks(entries, nr_entries); 40 39 return __stack_depot_save(entries, nr_entries, flags, can_alloc); 41 40 } 42 41
+4 -34
mm/migrate.c
··· 291 291 { 292 292 pte_t pte; 293 293 swp_entry_t entry; 294 - struct folio *folio; 295 294 296 295 spin_lock(ptl); 297 296 pte = *ptep; ··· 301 302 if (!is_migration_entry(entry)) 302 303 goto out; 303 304 304 - folio = page_folio(pfn_swap_entry_to_page(entry)); 305 - 306 - /* 307 - * Once page cache replacement of page migration started, page_count 308 - * is zero; but we must not call folio_put_wait_locked() without 309 - * a ref. Use folio_try_get(), and just fault again if it fails. 310 - */ 311 - if (!folio_try_get(folio)) 312 - goto out; 313 - pte_unmap_unlock(ptep, ptl); 314 - folio_put_wait_locked(folio, TASK_UNINTERRUPTIBLE); 305 + migration_entry_wait_on_locked(entry, ptep, ptl); 315 306 return; 316 307 out: 317 308 pte_unmap_unlock(ptep, ptl); ··· 326 337 void pmd_migration_entry_wait(struct mm_struct *mm, pmd_t *pmd) 327 338 { 328 339 spinlock_t *ptl; 329 - struct folio *folio; 330 340 331 341 ptl = pmd_lock(mm, pmd); 332 342 if (!is_pmd_migration_entry(*pmd)) 333 343 goto unlock; 334 - folio = page_folio(pfn_swap_entry_to_page(pmd_to_swp_entry(*pmd))); 335 - if (!folio_try_get(folio)) 336 - goto unlock; 337 - spin_unlock(ptl); 338 - folio_put_wait_locked(folio, TASK_UNINTERRUPTIBLE); 344 + migration_entry_wait_on_locked(pmd_to_swp_entry(*pmd), NULL, ptl); 339 345 return; 340 346 unlock: 341 347 spin_unlock(ptl); ··· 2415 2431 return false; 2416 2432 2417 2433 /* Page from ZONE_DEVICE have one extra reference */ 2418 - if (is_zone_device_page(page)) { 2419 - /* 2420 - * Private page can never be pin as they have no valid pte and 2421 - * GUP will fail for those. Yet if there is a pending migration 2422 - * a thread might try to wait on the pte migration entry and 2423 - * will bump the page reference count. Sadly there is no way to 2424 - * differentiate a regular pin from migration wait. Hence to 2425 - * avoid 2 racing thread trying to migrate back to CPU to enter 2426 - * infinite loop (one stopping migration because the other is 2427 - * waiting on pte migration entry). We always return true here. 2428 - * 2429 - * FIXME proper solution is to rework migration_entry_wait() so 2430 - * it does not need to take a reference on page. 2431 - */ 2432 - return is_device_private_page(page); 2433 - } 2434 + if (is_zone_device_page(page)) 2435 + extra++; 2434 2436 2435 2437 /* For file back page */ 2436 2438 if (page_mapping(page))
+2
mm/page_owner.c
··· 80 80 if (!page_owner_enabled) 81 81 return; 82 82 83 + stack_depot_init(); 84 + 83 85 register_dummy_stack(); 84 86 register_failure_stack(); 85 87 register_early_stack();
+6 -27
mm/shmem.c
··· 36 36 #include <linux/uio.h> 37 37 #include <linux/khugepaged.h> 38 38 #include <linux/hugetlb.h> 39 - #include <linux/frontswap.h> 40 39 #include <linux/fs_parser.h> 41 40 #include <linux/swapfile.h> 42 41 ··· 1151 1152 static int shmem_find_swap_entries(struct address_space *mapping, 1152 1153 pgoff_t start, unsigned int nr_entries, 1153 1154 struct page **entries, pgoff_t *indices, 1154 - unsigned int type, bool frontswap) 1155 + unsigned int type) 1155 1156 { 1156 1157 XA_STATE(xas, &mapping->i_pages, start); 1157 1158 struct page *page; ··· 1171 1172 1172 1173 entry = radix_to_swp_entry(page); 1173 1174 if (swp_type(entry) != type) 1174 - continue; 1175 - if (frontswap && 1176 - !frontswap_test(swap_info[type], swp_offset(entry))) 1177 1175 continue; 1178 1176 1179 1177 indices[ret] = xas.xa_index; ··· 1224 1228 /* 1225 1229 * If swap found in inode, free it and move page from swapcache to filecache. 1226 1230 */ 1227 - static int shmem_unuse_inode(struct inode *inode, unsigned int type, 1228 - bool frontswap, unsigned long *fs_pages_to_unuse) 1231 + static int shmem_unuse_inode(struct inode *inode, unsigned int type) 1229 1232 { 1230 1233 struct address_space *mapping = inode->i_mapping; 1231 1234 pgoff_t start = 0; 1232 1235 struct pagevec pvec; 1233 1236 pgoff_t indices[PAGEVEC_SIZE]; 1234 - bool frontswap_partial = (frontswap && *fs_pages_to_unuse > 0); 1235 1237 int ret = 0; 1236 1238 1237 1239 pagevec_init(&pvec); 1238 1240 do { 1239 1241 unsigned int nr_entries = PAGEVEC_SIZE; 1240 1242 1241 - if (frontswap_partial && *fs_pages_to_unuse < PAGEVEC_SIZE) 1242 - nr_entries = *fs_pages_to_unuse; 1243 - 1244 1243 pvec.nr = shmem_find_swap_entries(mapping, start, nr_entries, 1245 - pvec.pages, indices, 1246 - type, frontswap); 1244 + pvec.pages, indices, type); 1247 1245 if (pvec.nr == 0) { 1248 1246 ret = 0; 1249 1247 break; ··· 1246 1256 ret = shmem_unuse_swap_entries(inode, pvec, indices); 1247 1257 if (ret < 0) 1248 1258 break; 1249 - 1250 - if (frontswap_partial) { 1251 - *fs_pages_to_unuse -= ret; 1252 - if (*fs_pages_to_unuse == 0) { 1253 - ret = FRONTSWAP_PAGES_UNUSED; 1254 - break; 1255 - } 1256 - } 1257 1259 1258 1260 start = indices[pvec.nr - 1]; 1259 1261 } while (true); ··· 1258 1276 * device 'type' back into memory, so the swap device can be 1259 1277 * unused. 1260 1278 */ 1261 - int shmem_unuse(unsigned int type, bool frontswap, 1262 - unsigned long *fs_pages_to_unuse) 1279 + int shmem_unuse(unsigned int type) 1263 1280 { 1264 1281 struct shmem_inode_info *info, *next; 1265 1282 int error = 0; ··· 1281 1300 atomic_inc(&info->stop_eviction); 1282 1301 mutex_unlock(&shmem_swaplist_mutex); 1283 1302 1284 - error = shmem_unuse_inode(&info->vfs_inode, type, frontswap, 1285 - fs_pages_to_unuse); 1303 + error = shmem_unuse_inode(&info->vfs_inode, type); 1286 1304 cond_resched(); 1287 1305 1288 1306 mutex_lock(&shmem_swaplist_mutex); ··· 3995 4015 return 0; 3996 4016 } 3997 4017 3998 - int shmem_unuse(unsigned int type, bool frontswap, 3999 - unsigned long *fs_pages_to_unuse) 4018 + int shmem_unuse(unsigned int type) 4000 4019 { 4001 4020 return 0; 4002 4021 }
+27 -63
mm/swapfile.c
··· 49 49 unsigned char); 50 50 static void free_swap_count_continuations(struct swap_info_struct *); 51 51 52 - DEFINE_SPINLOCK(swap_lock); 52 + static DEFINE_SPINLOCK(swap_lock); 53 53 static unsigned int nr_swapfiles; 54 54 atomic_long_t nr_swap_pages; 55 55 /* ··· 71 71 * all active swap_info_structs 72 72 * protected with swap_lock, and ordered by priority. 73 73 */ 74 - PLIST_HEAD(swap_active_head); 74 + static PLIST_HEAD(swap_active_head); 75 75 76 76 /* 77 77 * all available (active, not full) swap_info_structs ··· 1923 1923 1924 1924 static int unuse_pte_range(struct vm_area_struct *vma, pmd_t *pmd, 1925 1925 unsigned long addr, unsigned long end, 1926 - unsigned int type, bool frontswap, 1927 - unsigned long *fs_pages_to_unuse) 1926 + unsigned int type) 1928 1927 { 1929 1928 struct page *page; 1930 1929 swp_entry_t entry; ··· 1944 1945 continue; 1945 1946 1946 1947 offset = swp_offset(entry); 1947 - if (frontswap && !frontswap_test(si, offset)) 1948 - continue; 1949 - 1950 1948 pte_unmap(pte); 1951 1949 swap_map = &si->swap_map[offset]; 1952 1950 page = lookup_swap_cache(entry, vma, addr); ··· 1975 1979 try_to_free_swap(page); 1976 1980 unlock_page(page); 1977 1981 put_page(page); 1978 - 1979 - if (*fs_pages_to_unuse && !--(*fs_pages_to_unuse)) { 1980 - ret = FRONTSWAP_PAGES_UNUSED; 1981 - goto out; 1982 - } 1983 1982 try_next: 1984 1983 pte = pte_offset_map(pmd, addr); 1985 1984 } while (pte++, addr += PAGE_SIZE, addr != end); ··· 1987 1996 1988 1997 static inline int unuse_pmd_range(struct vm_area_struct *vma, pud_t *pud, 1989 1998 unsigned long addr, unsigned long end, 1990 - unsigned int type, bool frontswap, 1991 - unsigned long *fs_pages_to_unuse) 1999 + unsigned int type) 1992 2000 { 1993 2001 pmd_t *pmd; 1994 2002 unsigned long next; ··· 1999 2009 next = pmd_addr_end(addr, end); 2000 2010 if (pmd_none_or_trans_huge_or_clear_bad(pmd)) 2001 2011 continue; 2002 - ret = unuse_pte_range(vma, pmd, addr, next, type, 2003 - frontswap, fs_pages_to_unuse); 2012 + ret = unuse_pte_range(vma, pmd, addr, next, type); 2004 2013 if (ret) 2005 2014 return ret; 2006 2015 } while (pmd++, addr = next, addr != end); ··· 2008 2019 2009 2020 static inline int unuse_pud_range(struct vm_area_struct *vma, p4d_t *p4d, 2010 2021 unsigned long addr, unsigned long end, 2011 - unsigned int type, bool frontswap, 2012 - unsigned long *fs_pages_to_unuse) 2022 + unsigned int type) 2013 2023 { 2014 2024 pud_t *pud; 2015 2025 unsigned long next; ··· 2019 2031 next = pud_addr_end(addr, end); 2020 2032 if (pud_none_or_clear_bad(pud)) 2021 2033 continue; 2022 - ret = unuse_pmd_range(vma, pud, addr, next, type, 2023 - frontswap, fs_pages_to_unuse); 2034 + ret = unuse_pmd_range(vma, pud, addr, next, type); 2024 2035 if (ret) 2025 2036 return ret; 2026 2037 } while (pud++, addr = next, addr != end); ··· 2028 2041 2029 2042 static inline int unuse_p4d_range(struct vm_area_struct *vma, pgd_t *pgd, 2030 2043 unsigned long addr, unsigned long end, 2031 - unsigned int type, bool frontswap, 2032 - unsigned long *fs_pages_to_unuse) 2044 + unsigned int type) 2033 2045 { 2034 2046 p4d_t *p4d; 2035 2047 unsigned long next; ··· 2039 2053 next = p4d_addr_end(addr, end); 2040 2054 if (p4d_none_or_clear_bad(p4d)) 2041 2055 continue; 2042 - ret = unuse_pud_range(vma, p4d, addr, next, type, 2043 - frontswap, fs_pages_to_unuse); 2056 + ret = unuse_pud_range(vma, p4d, addr, next, type); 2044 2057 if (ret) 2045 2058 return ret; 2046 2059 } while (p4d++, addr = next, addr != end); 2047 2060 return 0; 2048 2061 } 2049 2062 2050 - 
static int unuse_vma(struct vm_area_struct *vma, unsigned int type, 2051 - bool frontswap, unsigned long *fs_pages_to_unuse) 2063 + static int unuse_vma(struct vm_area_struct *vma, unsigned int type) 2052 2064 { 2053 2065 pgd_t *pgd; 2054 2066 unsigned long addr, end, next; ··· 2060 2076 next = pgd_addr_end(addr, end); 2061 2077 if (pgd_none_or_clear_bad(pgd)) 2062 2078 continue; 2063 - ret = unuse_p4d_range(vma, pgd, addr, next, type, 2064 - frontswap, fs_pages_to_unuse); 2079 + ret = unuse_p4d_range(vma, pgd, addr, next, type); 2065 2080 if (ret) 2066 2081 return ret; 2067 2082 } while (pgd++, addr = next, addr != end); 2068 2083 return 0; 2069 2084 } 2070 2085 2071 - static int unuse_mm(struct mm_struct *mm, unsigned int type, 2072 - bool frontswap, unsigned long *fs_pages_to_unuse) 2086 + static int unuse_mm(struct mm_struct *mm, unsigned int type) 2073 2087 { 2074 2088 struct vm_area_struct *vma; 2075 2089 int ret = 0; ··· 2075 2093 mmap_read_lock(mm); 2076 2094 for (vma = mm->mmap; vma; vma = vma->vm_next) { 2077 2095 if (vma->anon_vma) { 2078 - ret = unuse_vma(vma, type, frontswap, 2079 - fs_pages_to_unuse); 2096 + ret = unuse_vma(vma, type); 2080 2097 if (ret) 2081 2098 break; 2082 2099 } ··· 2091 2110 * if there are no inuse entries after prev till end of the map. 2092 2111 */ 2093 2112 static unsigned int find_next_to_unuse(struct swap_info_struct *si, 2094 - unsigned int prev, bool frontswap) 2113 + unsigned int prev) 2095 2114 { 2096 2115 unsigned int i; 2097 2116 unsigned char count; ··· 2105 2124 for (i = prev + 1; i < si->max; i++) { 2106 2125 count = READ_ONCE(si->swap_map[i]); 2107 2126 if (count && swap_count(count) != SWAP_MAP_BAD) 2108 - if (!frontswap || frontswap_test(si, i)) 2109 - break; 2127 + break; 2110 2128 if ((i % LATENCY_LIMIT) == 0) 2111 2129 cond_resched(); 2112 2130 } ··· 2116 2136 return i; 2117 2137 } 2118 2138 2119 - /* 2120 - * If the boolean frontswap is true, only unuse pages_to_unuse pages; 2121 - * pages_to_unuse==0 means all pages; ignored if frontswap is false 2122 - */ 2123 - int try_to_unuse(unsigned int type, bool frontswap, 2124 - unsigned long pages_to_unuse) 2139 + static int try_to_unuse(unsigned int type) 2125 2140 { 2126 2141 struct mm_struct *prev_mm; 2127 2142 struct mm_struct *mm; ··· 2130 2155 if (!READ_ONCE(si->inuse_pages)) 2131 2156 return 0; 2132 2157 2133 - if (!frontswap) 2134 - pages_to_unuse = 0; 2135 - 2136 2158 retry: 2137 - retval = shmem_unuse(type, frontswap, &pages_to_unuse); 2159 + retval = shmem_unuse(type); 2138 2160 if (retval) 2139 - goto out; 2161 + return retval; 2140 2162 2141 2163 prev_mm = &init_mm; 2142 2164 mmget(prev_mm); ··· 2150 2178 spin_unlock(&mmlist_lock); 2151 2179 mmput(prev_mm); 2152 2180 prev_mm = mm; 2153 - retval = unuse_mm(mm, type, frontswap, &pages_to_unuse); 2154 - 2181 + retval = unuse_mm(mm, type); 2155 2182 if (retval) { 2156 2183 mmput(prev_mm); 2157 - goto out; 2184 + return retval; 2158 2185 } 2159 2186 2160 2187 /* ··· 2170 2199 i = 0; 2171 2200 while (READ_ONCE(si->inuse_pages) && 2172 2201 !signal_pending(current) && 2173 - (i = find_next_to_unuse(si, i, frontswap)) != 0) { 2202 + (i = find_next_to_unuse(si, i)) != 0) { 2174 2203 2175 2204 entry = swp_entry(type, i); 2176 2205 page = find_get_page(swap_address_space(entry), i); ··· 2188 2217 try_to_free_swap(page); 2189 2218 unlock_page(page); 2190 2219 put_page(page); 2191 - 2192 - /* 2193 - * For frontswap, we just need to unuse pages_to_unuse, if 2194 - * it was specified. 
Need not check frontswap again here as 2195 - * we already zeroed out pages_to_unuse if not frontswap. 2196 - */ 2197 - if (pages_to_unuse && --pages_to_unuse == 0) 2198 - goto out; 2199 2220 } 2200 2221 2201 2222 /* ··· 2205 2242 if (READ_ONCE(si->inuse_pages)) { 2206 2243 if (!signal_pending(current)) 2207 2244 goto retry; 2208 - retval = -EINTR; 2245 + return -EINTR; 2209 2246 } 2210 - out: 2211 - return (retval == FRONTSWAP_PAGES_UNUSED) ? 0 : retval; 2247 + 2248 + return 0; 2212 2249 } 2213 2250 2214 2251 /* ··· 2426 2463 struct swap_cluster_info *cluster_info, 2427 2464 unsigned long *frontswap_map) 2428 2465 { 2429 - frontswap_init(p->type, frontswap_map); 2466 + if (IS_ENABLED(CONFIG_FRONTSWAP)) 2467 + frontswap_init(p->type, frontswap_map); 2430 2468 spin_lock(&swap_lock); 2431 2469 spin_lock(&p->lock); 2432 2470 setup_swap_info(p, prio, swap_map, cluster_info); ··· 2540 2576 disable_swap_slots_cache_lock(); 2541 2577 2542 2578 set_current_oom_origin(); 2543 - err = try_to_unuse(p->type, false, 0); /* force unuse all pages */ 2579 + err = try_to_unuse(p->type); 2544 2580 clear_current_oom_origin(); 2545 2581 2546 2582 if (err) {
+2 -13
mm/truncate.c
··· 22 22 #include <linux/buffer_head.h> /* grr. try_to_release_page, 23 23 do_invalidatepage */ 24 24 #include <linux/shmem_fs.h> 25 - #include <linux/cleancache.h> 26 25 #include <linux/rmap.h> 27 26 #include "internal.h" 28 27 ··· 263 264 */ 264 265 folio_zero_range(folio, offset, length); 265 266 266 - cleancache_invalidate_page(folio->mapping, &folio->page); 267 267 if (folio_has_private(folio)) 268 268 do_invalidatepage(&folio->page, offset, length); 269 269 if (!folio_test_large(folio)) ··· 349 351 bool same_folio; 350 352 351 353 if (mapping_empty(mapping)) 352 - goto out; 354 + return; 353 355 354 356 /* 355 357 * 'start' and 'end' always covers the range of pages to be fully ··· 440 442 folio_batch_release(&fbatch); 441 443 index++; 442 444 } 443 - 444 - out: 445 - cleancache_invalidate_inode(mapping); 446 445 } 447 446 EXPORT_SYMBOL(truncate_inode_pages_range); 448 447 ··· 493 498 xa_unlock_irq(&mapping->i_pages); 494 499 } 495 500 496 - /* 497 - * Cleancache needs notification even if there are no pages or shadow 498 - * entries. 499 - */ 500 501 truncate_inode_pages(mapping, 0); 501 502 } 502 503 EXPORT_SYMBOL(truncate_inode_pages_final); ··· 652 661 int did_range_unmap = 0; 653 662 654 663 if (mapping_empty(mapping)) 655 - goto out; 664 + return 0; 656 665 657 666 folio_batch_init(&fbatch); 658 667 index = start; ··· 716 725 if (dax_mapping(mapping)) { 717 726 unmap_mapping_pages(mapping, start, end - start + 1, false); 718 727 } 719 - out: 720 - cleancache_invalidate_inode(mapping); 721 728 return ret; 722 729 } 723 730 EXPORT_SYMBOL_GPL(invalidate_inode_pages2_range);
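The mm/zsmalloc.c hunk below replaces get_cpu_var()/put_cpu_var() around the per-CPU mapping area with a local_lock embedded in the per-CPU struct: on a non-RT kernel this still just disables preemption, but it documents exactly what is protected and becomes a real per-CPU lock under PREEMPT_RT. A minimal sketch of the pattern, with illustrative names:

	#include <linux/local_lock.h>
	#include <linux/percpu.h>

	struct scratch {
		local_lock_t lock;
		char buf[64];
	};

	static DEFINE_PER_CPU(struct scratch, scratch_area) = {
		.lock = INIT_LOCAL_LOCK(lock),
	};

	static void use_scratch(void)
	{
		struct scratch *s;

		local_lock(&scratch_area.lock);	/* preempt off on !RT */
		s = this_cpu_ptr(&scratch_area);
		/* ... use s->buf; the task cannot migrate CPUs here ... */
		local_unlock(&scratch_area.lock);
	}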
+188 -341
mm/zsmalloc.c
··· 30 30 31 31 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt 32 32 33 + /* 34 + * lock ordering: 35 + * page_lock 36 + * pool->migrate_lock 37 + * class->lock 38 + * zspage->lock 39 + */ 40 + 33 41 #include <linux/module.h> 34 42 #include <linux/kernel.h> 35 43 #include <linux/sched.h> ··· 65 57 #include <linux/wait.h> 66 58 #include <linux/pagemap.h> 67 59 #include <linux/fs.h> 60 + #include <linux/local_lock.h> 68 61 69 62 #define ZSPAGE_MAGIC 0x58 70 63 ··· 110 101 #define _PFN_BITS (MAX_POSSIBLE_PHYSMEM_BITS - PAGE_SHIFT) 111 102 112 103 /* 113 - * Memory for allocating for handle keeps object position by 114 - * encoding <page, obj_idx> and the encoded value has a room 115 - * in least bit(ie, look at obj_to_location). 116 - * We use the bit to synchronize between object access by 117 - * user and migration. 118 - */ 119 - #define HANDLE_PIN_BIT 0 120 - 121 - /* 122 104 * Head in allocated object should have OBJ_ALLOCATED_TAG 123 105 * to identify the object was allocated or not. 124 106 * It's okay to add the status bit in the least bit because ··· 121 121 #define OBJ_INDEX_BITS (BITS_PER_LONG - _PFN_BITS - OBJ_TAG_BITS) 122 122 #define OBJ_INDEX_MASK ((_AC(1, UL) << OBJ_INDEX_BITS) - 1) 123 123 124 + #define HUGE_BITS 1 124 125 #define FULLNESS_BITS 2 125 126 #define CLASS_BITS 8 126 127 #define ISOLATED_BITS 3 ··· 159 158 NR_ZS_FULLNESS, 160 159 }; 161 160 162 - enum zs_stat_type { 161 + enum class_stat_type { 163 162 CLASS_EMPTY, 164 163 CLASS_ALMOST_EMPTY, 165 164 CLASS_ALMOST_FULL, ··· 214 213 struct zs_size_stat stats; 215 214 }; 216 215 217 - /* huge object: pages_per_zspage == 1 && maxobj_per_zspage == 1 */ 218 - static void SetPageHugeObject(struct page *page) 219 - { 220 - SetPageOwnerPriv1(page); 221 - } 222 - 223 - static void ClearPageHugeObject(struct page *page) 224 - { 225 - ClearPageOwnerPriv1(page); 226 - } 227 - 228 - static int PageHugeObject(struct page *page) 229 - { 230 - return PageOwnerPriv1(page); 231 - } 232 - 233 216 /* 234 217 * Placed within free objects to form a singly linked list. 235 218 * For every zspage, zspage->freeobj gives head of this list. 
··· 254 269 #ifdef CONFIG_COMPACTION 255 270 struct inode *inode; 256 271 struct work_struct free_work; 257 - /* A wait queue for when migration races with async_free_zspage() */ 258 - struct wait_queue_head migration_wait; 259 - atomic_long_t isolated_pages; 260 - bool destroying; 261 272 #endif 273 + /* protect page/zspage migration */ 274 + rwlock_t migrate_lock; 262 275 }; 263 276 264 277 struct zspage { 265 278 struct { 279 + unsigned int huge:HUGE_BITS; 266 280 unsigned int fullness:FULLNESS_BITS; 267 281 unsigned int class:CLASS_BITS + 1; 268 282 unsigned int isolated:ISOLATED_BITS; ··· 277 293 }; 278 294 279 295 struct mapping_area { 296 + local_lock_t lock; 280 297 char *vm_buf; /* copy buffer for objects that span pages */ 281 298 char *vm_addr; /* address of kmap_atomic()'ed pages */ 282 299 enum zs_mapmode vm_mm; /* mapping mode */ 283 300 }; 301 + 302 + /* huge object: pages_per_zspage == 1 && maxobj_per_zspage == 1 */ 303 + static void SetZsHugePage(struct zspage *zspage) 304 + { 305 + zspage->huge = 1; 306 + } 307 + 308 + static bool ZsHugePage(struct zspage *zspage) 309 + { 310 + return zspage->huge; 311 + } 284 312 285 313 #ifdef CONFIG_COMPACTION 286 314 static int zs_register_migration(struct zs_pool *pool); ··· 300 304 static void migrate_lock_init(struct zspage *zspage); 301 305 static void migrate_read_lock(struct zspage *zspage); 302 306 static void migrate_read_unlock(struct zspage *zspage); 307 + static void migrate_write_lock(struct zspage *zspage); 308 + static void migrate_write_lock_nested(struct zspage *zspage); 309 + static void migrate_write_unlock(struct zspage *zspage); 303 310 static void kick_deferred_free(struct zs_pool *pool); 304 311 static void init_deferred_free(struct zs_pool *pool); 305 312 static void SetZsPageMovable(struct zs_pool *pool, struct zspage *zspage); ··· 314 315 static void migrate_lock_init(struct zspage *zspage) {} 315 316 static void migrate_read_lock(struct zspage *zspage) {} 316 317 static void migrate_read_unlock(struct zspage *zspage) {} 318 + static void migrate_write_lock(struct zspage *zspage) {} 319 + static void migrate_write_lock_nested(struct zspage *zspage) {} 320 + static void migrate_write_unlock(struct zspage *zspage) {} 317 321 static void kick_deferred_free(struct zs_pool *pool) {} 318 322 static void init_deferred_free(struct zs_pool *pool) {} 319 323 static void SetZsPageMovable(struct zs_pool *pool, struct zspage *zspage) {} ··· 368 366 kmem_cache_free(pool->zspage_cachep, zspage); 369 367 } 370 368 369 + /* class->lock(which owns the handle) synchronizes races */ 371 370 static void record_obj(unsigned long handle, unsigned long obj) 372 371 { 373 - /* 374 - * lsb of @obj represents handle lock while other bits 375 - * represent object value the handle is pointing so 376 - * updating shouldn't do store tearing. 
377 - */ 378 - WRITE_ONCE(*(unsigned long *)handle, obj); 372 + *(unsigned long *)handle = obj; 379 373 } 380 374 381 375 /* zpool driver */ ··· 453 455 #endif /* CONFIG_ZPOOL */ 454 456 455 457 /* per-cpu VM mapping areas for zspage accesses that cross page boundaries */ 456 - static DEFINE_PER_CPU(struct mapping_area, zs_map_area); 457 - 458 - static bool is_zspage_isolated(struct zspage *zspage) 459 - { 460 - return zspage->isolated; 461 - } 458 + static DEFINE_PER_CPU(struct mapping_area, zs_map_area) = { 459 + .lock = INIT_LOCAL_LOCK(lock), 460 + }; 462 461 463 462 static __maybe_unused int is_first_page(struct page *page) 464 463 { ··· 512 517 *class_idx = zspage->class; 513 518 } 514 519 520 + static struct size_class *zspage_class(struct zs_pool *pool, 521 + struct zspage *zspage) 522 + { 523 + return pool->size_class[zspage->class]; 524 + } 525 + 515 526 static void set_zspage_mapping(struct zspage *zspage, 516 527 unsigned int class_idx, 517 528 enum fullness_group fullness) ··· 544 543 return min_t(int, ZS_SIZE_CLASSES - 1, idx); 545 544 } 546 545 547 - /* type can be of enum type zs_stat_type or fullness_group */ 548 - static inline void zs_stat_inc(struct size_class *class, 546 + /* type can be of enum type class_stat_type or fullness_group */ 547 + static inline void class_stat_inc(struct size_class *class, 549 548 int type, unsigned long cnt) 550 549 { 551 550 class->stats.objs[type] += cnt; 552 551 } 553 552 554 - /* type can be of enum type zs_stat_type or fullness_group */ 555 - static inline void zs_stat_dec(struct size_class *class, 553 + /* type can be of enum type class_stat_type or fullness_group */ 554 + static inline void class_stat_dec(struct size_class *class, 556 555 int type, unsigned long cnt) 557 556 { 558 557 class->stats.objs[type] -= cnt; 559 558 } 560 559 561 - /* type can be of enum type zs_stat_type or fullness_group */ 560 + /* type can be of enum type class_stat_type or fullness_group */ 562 561 static inline unsigned long zs_stat_get(struct size_class *class, 563 562 int type) 564 563 { ··· 720 719 { 721 720 struct zspage *head; 722 721 723 - zs_stat_inc(class, fullness, 1); 722 + class_stat_inc(class, fullness, 1); 724 723 head = list_first_entry_or_null(&class->fullness_list[fullness], 725 724 struct zspage, list); 726 725 /* ··· 742 741 enum fullness_group fullness) 743 742 { 744 743 VM_BUG_ON(list_empty(&class->fullness_list[fullness])); 745 - VM_BUG_ON(is_zspage_isolated(zspage)); 746 744 747 745 list_del_init(&zspage->list); 748 - zs_stat_dec(class, fullness, 1); 746 + class_stat_dec(class, fullness, 1); 749 747 } 750 748 751 749 /* ··· 767 767 if (newfg == currfg) 768 768 goto out; 769 769 770 - if (!is_zspage_isolated(zspage)) { 771 - remove_zspage(class, zspage, currfg); 772 - insert_zspage(class, zspage, newfg); 773 - } 774 - 770 + remove_zspage(class, zspage, currfg); 771 + insert_zspage(class, zspage, newfg); 775 772 set_zspage_mapping(zspage, class_idx, newfg); 776 - 777 773 out: 778 774 return newfg; 779 775 } ··· 820 824 821 825 static struct page *get_next_page(struct page *page) 822 826 { 823 - if (unlikely(PageHugeObject(page))) 827 + struct zspage *zspage = get_zspage(page); 828 + 829 + if (unlikely(ZsHugePage(zspage))) 824 830 return NULL; 825 831 826 832 return (struct page *)page->index; ··· 840 842 obj >>= OBJ_TAG_BITS; 841 843 *page = pfn_to_page(obj >> OBJ_INDEX_BITS); 842 844 *obj_idx = (obj & OBJ_INDEX_MASK); 845 + } 846 + 847 + static void obj_to_page(unsigned long obj, struct page **page) 848 + { 849 + obj >>= 
OBJ_TAG_BITS; 850 + *page = pfn_to_page(obj >> OBJ_INDEX_BITS); 843 851 } 844 852 845 853 /** ··· 869 865 return *(unsigned long *)handle; 870 866 } 871 867 872 - static unsigned long obj_to_head(struct page *page, void *obj) 868 + static bool obj_allocated(struct page *page, void *obj, unsigned long *phandle) 873 869 { 874 - if (unlikely(PageHugeObject(page))) { 870 + unsigned long handle; 871 + struct zspage *zspage = get_zspage(page); 872 + 873 + if (unlikely(ZsHugePage(zspage))) { 875 874 VM_BUG_ON_PAGE(!is_first_page(page), page); 876 - return page->index; 875 + handle = page->index; 877 876 } else 878 - return *(unsigned long *)obj; 879 - } 877 + handle = *(unsigned long *)obj; 880 878 881 - static inline int testpin_tag(unsigned long handle) 882 - { 883 - return bit_spin_is_locked(HANDLE_PIN_BIT, (unsigned long *)handle); 884 - } 879 + if (!(handle & OBJ_ALLOCATED_TAG)) 880 + return false; 885 881 886 - static inline int trypin_tag(unsigned long handle) 887 - { 888 - return bit_spin_trylock(HANDLE_PIN_BIT, (unsigned long *)handle); 889 - } 890 - 891 - static void pin_tag(unsigned long handle) __acquires(bitlock) 892 - { 893 - bit_spin_lock(HANDLE_PIN_BIT, (unsigned long *)handle); 894 - } 895 - 896 - static void unpin_tag(unsigned long handle) __releases(bitlock) 897 - { 898 - bit_spin_unlock(HANDLE_PIN_BIT, (unsigned long *)handle); 882 + *phandle = handle & ~OBJ_ALLOCATED_TAG; 883 + return true; 899 884 } 900 885 901 886 static void reset_page(struct page *page) ··· 893 900 ClearPagePrivate(page); 894 901 set_page_private(page, 0); 895 902 page_mapcount_reset(page); 896 - ClearPageHugeObject(page); 897 903 page->index = 0; 898 904 } 899 905 ··· 944 952 945 953 cache_free_zspage(pool, zspage); 946 954 947 - zs_stat_dec(class, OBJ_ALLOCATED, class->objs_per_zspage); 955 + class_stat_dec(class, OBJ_ALLOCATED, class->objs_per_zspage); 948 956 atomic_long_sub(class->pages_per_zspage, 949 957 &pool->pages_allocated); 950 958 } ··· 955 963 VM_BUG_ON(get_zspage_inuse(zspage)); 956 964 VM_BUG_ON(list_empty(&zspage->list)); 957 965 966 + /* 967 + * Since zs_free couldn't be sleepable, this function cannot call 968 + * lock_page. The page locks trylock_zspage got will be released 969 + * by __free_zspage. 970 + */ 958 971 if (!trylock_zspage(zspage)) { 959 972 kick_deferred_free(pool); 960 973 return; ··· 1039 1042 SetPagePrivate(page); 1040 1043 if (unlikely(class->objs_per_zspage == 1 && 1041 1044 class->pages_per_zspage == 1)) 1042 - SetPageHugeObject(page); 1045 + SetZsHugePage(zspage); 1043 1046 } else { 1044 1047 prev_page->index = (unsigned long)page; 1045 1048 } ··· 1243 1246 unsigned long obj, off; 1244 1247 unsigned int obj_idx; 1245 1248 1246 - unsigned int class_idx; 1247 - enum fullness_group fg; 1248 1249 struct size_class *class; 1249 1250 struct mapping_area *area; 1250 1251 struct page *pages[2]; ··· 1255 1260 */ 1256 1261 BUG_ON(in_interrupt()); 1257 1262 1258 - /* From now on, migration cannot move the object */ 1259 - pin_tag(handle); 1260 - 1263 + /* It guarantees it can get zspage from handle safely */ 1264 + read_lock(&pool->migrate_lock); 1261 1265 obj = handle_to_obj(handle); 1262 1266 obj_to_location(obj, &page, &obj_idx); 1263 1267 zspage = get_zspage(page); 1264 1268 1265 - /* migration cannot move any subpage in this zspage */ 1269 + /* 1270 + * migration cannot move any zpages in this zspage. 
Here, class->lock 1271 + * is too heavy since callers would take some time until they calls 1272 + * zs_unmap_object API so delegate the locking from class to zspage 1273 + * which is smaller granularity. 1274 + */ 1266 1275 migrate_read_lock(zspage); 1276 + read_unlock(&pool->migrate_lock); 1267 1277 1268 - get_zspage_mapping(zspage, &class_idx, &fg); 1269 - class = pool->size_class[class_idx]; 1278 + class = zspage_class(pool, zspage); 1270 1279 off = (class->size * obj_idx) & ~PAGE_MASK; 1271 1280 1272 - area = &get_cpu_var(zs_map_area); 1281 + local_lock(&zs_map_area.lock); 1282 + area = this_cpu_ptr(&zs_map_area); 1273 1283 area->vm_mm = mm; 1274 1284 if (off + class->size <= PAGE_SIZE) { 1275 1285 /* this object is contained entirely within a page */ ··· 1290 1290 1291 1291 ret = __zs_map_object(area, pages, off, class->size); 1292 1292 out: 1293 - if (likely(!PageHugeObject(page))) 1293 + if (likely(!ZsHugePage(zspage))) 1294 1294 ret += ZS_HANDLE_SIZE; 1295 1295 1296 1296 return ret; ··· 1304 1304 unsigned long obj, off; 1305 1305 unsigned int obj_idx; 1306 1306 1307 - unsigned int class_idx; 1308 - enum fullness_group fg; 1309 1307 struct size_class *class; 1310 1308 struct mapping_area *area; 1311 1309 1312 1310 obj = handle_to_obj(handle); 1313 1311 obj_to_location(obj, &page, &obj_idx); 1314 1312 zspage = get_zspage(page); 1315 - get_zspage_mapping(zspage, &class_idx, &fg); 1316 - class = pool->size_class[class_idx]; 1313 + class = zspage_class(pool, zspage); 1317 1314 off = (class->size * obj_idx) & ~PAGE_MASK; 1318 1315 1319 1316 area = this_cpu_ptr(&zs_map_area); ··· 1325 1328 1326 1329 __zs_unmap_object(area, pages, off, class->size); 1327 1330 } 1328 - put_cpu_var(zs_map_area); 1331 + local_unlock(&zs_map_area.lock); 1329 1332 1330 1333 migrate_read_unlock(zspage); 1331 - unpin_tag(handle); 1332 1334 } 1333 1335 EXPORT_SYMBOL_GPL(zs_unmap_object); 1334 1336 ··· 1350 1354 } 1351 1355 EXPORT_SYMBOL_GPL(zs_huge_class_size); 1352 1356 1353 - static unsigned long obj_malloc(struct size_class *class, 1357 + static unsigned long obj_malloc(struct zs_pool *pool, 1354 1358 struct zspage *zspage, unsigned long handle) 1355 1359 { 1356 1360 int i, nr_page, offset; 1357 1361 unsigned long obj; 1358 1362 struct link_free *link; 1363 + struct size_class *class; 1359 1364 1360 1365 struct page *m_page; 1361 1366 unsigned long m_offset; 1362 1367 void *vaddr; 1363 1368 1369 + class = pool->size_class[zspage->class]; 1364 1370 handle |= OBJ_ALLOCATED_TAG; 1365 1371 obj = get_freeobj(zspage); 1366 1372 ··· 1377 1379 vaddr = kmap_atomic(m_page); 1378 1380 link = (struct link_free *)vaddr + m_offset / sizeof(*link); 1379 1381 set_freeobj(zspage, link->next >> OBJ_TAG_BITS); 1380 - if (likely(!PageHugeObject(m_page))) 1382 + if (likely(!ZsHugePage(zspage))) 1381 1383 /* record handle in the header of allocated chunk */ 1382 1384 link->handle = handle; 1383 1385 else ··· 1386 1388 1387 1389 kunmap_atomic(vaddr); 1388 1390 mod_zspage_inuse(zspage, 1); 1389 - zs_stat_inc(class, OBJ_USED, 1); 1390 1391 1391 1392 obj = location_to_obj(m_page, obj); 1392 1393 ··· 1421 1424 size += ZS_HANDLE_SIZE; 1422 1425 class = pool->size_class[get_size_class_index(size)]; 1423 1426 1427 + /* class->lock effectively protects the zpage migration */ 1424 1428 spin_lock(&class->lock); 1425 1429 zspage = find_get_zspage(class); 1426 1430 if (likely(zspage)) { 1427 - obj = obj_malloc(class, zspage, handle); 1431 + obj = obj_malloc(pool, zspage, handle); 1428 1432 /* Now move the zspage to another fullness group, if 
required */ 1429 1433 fix_fullness_group(class, zspage); 1430 1434 record_obj(handle, obj); 1435 + class_stat_inc(class, OBJ_USED, 1); 1431 1436 spin_unlock(&class->lock); 1432 1437 1433 1438 return handle; ··· 1444 1445 } 1445 1446 1446 1447 spin_lock(&class->lock); 1447 - obj = obj_malloc(class, zspage, handle); 1448 + obj = obj_malloc(pool, zspage, handle); 1448 1449 newfg = get_fullness_group(class, zspage); 1449 1450 insert_zspage(class, zspage, newfg); 1450 1451 set_zspage_mapping(zspage, class->index, newfg); 1451 1452 record_obj(handle, obj); 1452 1453 atomic_long_add(class->pages_per_zspage, 1453 1454 &pool->pages_allocated); 1454 - zs_stat_inc(class, OBJ_ALLOCATED, class->objs_per_zspage); 1455 + class_stat_inc(class, OBJ_ALLOCATED, class->objs_per_zspage); 1456 + class_stat_inc(class, OBJ_USED, 1); 1455 1457 1456 1458 /* We completely set up zspage so mark them as movable */ 1457 1459 SetZsPageMovable(pool, zspage); ··· 1462 1462 } 1463 1463 EXPORT_SYMBOL_GPL(zs_malloc); 1464 1464 1465 - static void obj_free(struct size_class *class, unsigned long obj) 1465 + static void obj_free(int class_size, unsigned long obj) 1466 1466 { 1467 1467 struct link_free *link; 1468 1468 struct zspage *zspage; ··· 1472 1472 void *vaddr; 1473 1473 1474 1474 obj_to_location(obj, &f_page, &f_objidx); 1475 - f_offset = (class->size * f_objidx) & ~PAGE_MASK; 1475 + f_offset = (class_size * f_objidx) & ~PAGE_MASK; 1476 1476 zspage = get_zspage(f_page); 1477 1477 1478 1478 vaddr = kmap_atomic(f_page); 1479 1479 1480 1480 /* Insert this object in containing zspage's freelist */ 1481 1481 link = (struct link_free *)(vaddr + f_offset); 1482 - link->next = get_freeobj(zspage) << OBJ_TAG_BITS; 1482 + if (likely(!ZsHugePage(zspage))) 1483 + link->next = get_freeobj(zspage) << OBJ_TAG_BITS; 1484 + else 1485 + f_page->index = 0; 1483 1486 kunmap_atomic(vaddr); 1484 1487 set_freeobj(zspage, f_objidx); 1485 1488 mod_zspage_inuse(zspage, -1); 1486 - zs_stat_dec(class, OBJ_USED, 1); 1487 1489 } 1488 1490 1489 1491 void zs_free(struct zs_pool *pool, unsigned long handle) ··· 1493 1491 struct zspage *zspage; 1494 1492 struct page *f_page; 1495 1493 unsigned long obj; 1496 - unsigned int f_objidx; 1497 - int class_idx; 1498 1494 struct size_class *class; 1499 1495 enum fullness_group fullness; 1500 - bool isolated; 1501 1496 1502 1497 if (unlikely(!handle)) 1503 1498 return; 1504 1499 1505 - pin_tag(handle); 1500 + /* 1501 + * The pool->migrate_lock protects the race with zpage's migration 1502 + * so it's safe to get the page from handle. 
1503 + */ 1504 + read_lock(&pool->migrate_lock); 1506 1505 obj = handle_to_obj(handle); 1507 - obj_to_location(obj, &f_page, &f_objidx); 1506 + obj_to_page(obj, &f_page); 1508 1507 zspage = get_zspage(f_page); 1509 - 1510 - migrate_read_lock(zspage); 1511 - 1512 - get_zspage_mapping(zspage, &class_idx, &fullness); 1513 - class = pool->size_class[class_idx]; 1514 - 1508 + class = zspage_class(pool, zspage); 1515 1509 spin_lock(&class->lock); 1516 - obj_free(class, obj); 1510 + read_unlock(&pool->migrate_lock); 1511 + 1512 + obj_free(class->size, obj); 1513 + class_stat_dec(class, OBJ_USED, 1); 1517 1514 fullness = fix_fullness_group(class, zspage); 1518 - if (fullness != ZS_EMPTY) { 1519 - migrate_read_unlock(zspage); 1515 + if (fullness != ZS_EMPTY) 1520 1516 goto out; 1521 - } 1522 1517 1523 - isolated = is_zspage_isolated(zspage); 1524 - migrate_read_unlock(zspage); 1525 - /* If zspage is isolated, zs_page_putback will free the zspage */ 1526 - if (likely(!isolated)) 1527 - free_zspage(pool, class, zspage); 1518 + free_zspage(pool, class, zspage); 1528 1519 out: 1529 - 1530 1520 spin_unlock(&class->lock); 1531 - unpin_tag(handle); 1532 1521 cache_free_handle(pool, handle); 1533 1522 } 1534 1523 EXPORT_SYMBOL_GPL(zs_free); ··· 1594 1601 static unsigned long find_alloced_obj(struct size_class *class, 1595 1602 struct page *page, int *obj_idx) 1596 1603 { 1597 - unsigned long head; 1598 1604 int offset = 0; 1599 1605 int index = *obj_idx; 1600 1606 unsigned long handle = 0; ··· 1603 1611 offset += class->size * index; 1604 1612 1605 1613 while (offset < PAGE_SIZE) { 1606 - head = obj_to_head(page, addr + offset); 1607 - if (head & OBJ_ALLOCATED_TAG) { 1608 - handle = head & ~OBJ_ALLOCATED_TAG; 1609 - if (trypin_tag(handle)) 1610 - break; 1611 - handle = 0; 1612 - } 1614 + if (obj_allocated(page, addr + offset, &handle)) 1615 + break; 1613 1616 1614 1617 offset += class->size; 1615 1618 index++; ··· 1650 1663 1651 1664 /* Stop if there is no more space */ 1652 1665 if (zspage_full(class, get_zspage(d_page))) { 1653 - unpin_tag(handle); 1654 1666 ret = -ENOMEM; 1655 1667 break; 1656 1668 } 1657 1669 1658 1670 used_obj = handle_to_obj(handle); 1659 - free_obj = obj_malloc(class, get_zspage(d_page), handle); 1671 + free_obj = obj_malloc(pool, get_zspage(d_page), handle); 1660 1672 zs_object_copy(class, free_obj, used_obj); 1661 1673 obj_idx++; 1662 - /* 1663 - * record_obj updates handle's value to free_obj and it will 1664 - * invalidate lock bit(ie, HANDLE_PIN_BIT) of handle, which 1665 - * breaks synchronization using pin_tag(e,g, zs_free) so 1666 - * let's keep the lock bit. 
1667 - */ 1668 - free_obj |= BIT(HANDLE_PIN_BIT); 1669 1674 record_obj(handle, free_obj); 1670 - unpin_tag(handle); 1671 - obj_free(class, used_obj); 1675 + obj_free(class->size, used_obj); 1672 1676 } 1673 1677 1674 1678 /* Remember last position in this iteration */ ··· 1684 1706 zspage = list_first_entry_or_null(&class->fullness_list[fg[i]], 1685 1707 struct zspage, list); 1686 1708 if (zspage) { 1687 - VM_BUG_ON(is_zspage_isolated(zspage)); 1688 1709 remove_zspage(class, zspage, fg[i]); 1689 1710 return zspage; 1690 1711 } ··· 1703 1726 struct zspage *zspage) 1704 1727 { 1705 1728 enum fullness_group fullness; 1706 - 1707 - VM_BUG_ON(is_zspage_isolated(zspage)); 1708 1729 1709 1730 fullness = get_fullness_group(class, zspage); 1710 1731 insert_zspage(class, zspage, fullness); ··· 1772 1797 write_lock(&zspage->lock); 1773 1798 } 1774 1799 1800 + static void migrate_write_lock_nested(struct zspage *zspage) 1801 + { 1802 + write_lock_nested(&zspage->lock, SINGLE_DEPTH_NESTING); 1803 + } 1804 + 1775 1805 static void migrate_write_unlock(struct zspage *zspage) 1776 1806 { 1777 1807 write_unlock(&zspage->lock); ··· 1790 1810 1791 1811 static void dec_zspage_isolation(struct zspage *zspage) 1792 1812 { 1813 + VM_BUG_ON(zspage->isolated == 0); 1793 1814 zspage->isolated--; 1794 - } 1795 - 1796 - static void putback_zspage_deferred(struct zs_pool *pool, 1797 - struct size_class *class, 1798 - struct zspage *zspage) 1799 - { 1800 - enum fullness_group fg; 1801 - 1802 - fg = putback_zspage(class, zspage); 1803 - if (fg == ZS_EMPTY) 1804 - schedule_work(&pool->free_work); 1805 - 1806 - } 1807 - 1808 - static inline void zs_pool_dec_isolated(struct zs_pool *pool) 1809 - { 1810 - VM_BUG_ON(atomic_long_read(&pool->isolated_pages) <= 0); 1811 - atomic_long_dec(&pool->isolated_pages); 1812 - /* 1813 - * Checking pool->destroying must happen after atomic_long_dec() 1814 - * for pool->isolated_pages above. Paired with the smp_mb() in 1815 - * zs_unregister_migration(). 1816 - */ 1817 - smp_mb__after_atomic(); 1818 - if (atomic_long_read(&pool->isolated_pages) == 0 && pool->destroying) 1819 - wake_up_all(&pool->migration_wait); 1820 1815 } 1821 1816 1822 1817 static void replace_sub_page(struct size_class *class, struct zspage *zspage, ··· 1812 1857 1813 1858 create_page_chain(class, zspage, pages); 1814 1859 set_first_obj_offset(newpage, get_first_obj_offset(oldpage)); 1815 - if (unlikely(PageHugeObject(oldpage))) 1860 + if (unlikely(ZsHugePage(zspage))) 1816 1861 newpage->index = oldpage->index; 1817 1862 __SetPageMovable(newpage, page_mapping(oldpage)); 1818 1863 } 1819 1864 1820 1865 static bool zs_page_isolate(struct page *page, isolate_mode_t mode) 1821 1866 { 1822 - struct zs_pool *pool; 1823 - struct size_class *class; 1824 - int class_idx; 1825 - enum fullness_group fullness; 1826 1867 struct zspage *zspage; 1827 - struct address_space *mapping; 1828 1868 1829 1869 /* 1830 1870 * Page is locked so zspage couldn't be destroyed. For detail, look at ··· 1829 1879 VM_BUG_ON_PAGE(PageIsolated(page), page); 1830 1880 1831 1881 zspage = get_zspage(page); 1832 - 1833 - /* 1834 - * Without class lock, fullness could be stale while class_idx is okay 1835 - * because class_idx is constant unless page is freed so we should get 1836 - * fullness again under class lock. 
1837 - */ 1838 - get_zspage_mapping(zspage, &class_idx, &fullness); 1839 - mapping = page_mapping(page); 1840 - pool = mapping->private_data; 1841 - class = pool->size_class[class_idx]; 1842 - 1843 - spin_lock(&class->lock); 1844 - if (get_zspage_inuse(zspage) == 0) { 1845 - spin_unlock(&class->lock); 1846 - return false; 1847 - } 1848 - 1849 - /* zspage is isolated for object migration */ 1850 - if (list_empty(&zspage->list) && !is_zspage_isolated(zspage)) { 1851 - spin_unlock(&class->lock); 1852 - return false; 1853 - } 1854 - 1855 - /* 1856 - * If this is first time isolation for the zspage, isolate zspage from 1857 - * size_class to prevent further object allocation from the zspage. 1858 - */ 1859 - if (!list_empty(&zspage->list) && !is_zspage_isolated(zspage)) { 1860 - get_zspage_mapping(zspage, &class_idx, &fullness); 1861 - atomic_long_inc(&pool->isolated_pages); 1862 - remove_zspage(class, zspage, fullness); 1863 - } 1864 - 1882 + migrate_write_lock(zspage); 1865 1883 inc_zspage_isolation(zspage); 1866 - spin_unlock(&class->lock); 1884 + migrate_write_unlock(zspage); 1867 1885 1868 1886 return true; 1869 1887 } ··· 1841 1923 { 1842 1924 struct zs_pool *pool; 1843 1925 struct size_class *class; 1844 - int class_idx; 1845 - enum fullness_group fullness; 1846 1926 struct zspage *zspage; 1847 1927 struct page *dummy; 1848 1928 void *s_addr, *d_addr, *addr; 1849 - int offset, pos; 1850 - unsigned long handle, head; 1929 + int offset; 1930 + unsigned long handle; 1851 1931 unsigned long old_obj, new_obj; 1852 1932 unsigned int obj_idx; 1853 - int ret = -EAGAIN; 1854 1933 1855 1934 /* 1856 1935 * We cannot support the _NO_COPY case here, because copy needs to ··· 1860 1945 VM_BUG_ON_PAGE(!PageMovable(page), page); 1861 1946 VM_BUG_ON_PAGE(!PageIsolated(page), page); 1862 1947 1863 - zspage = get_zspage(page); 1864 - 1865 - /* Concurrent compactor cannot migrate any subpage in zspage */ 1866 - migrate_write_lock(zspage); 1867 - get_zspage_mapping(zspage, &class_idx, &fullness); 1868 1948 pool = mapping->private_data; 1869 - class = pool->size_class[class_idx]; 1870 - offset = get_first_obj_offset(page); 1871 1949 1950 + /* 1951 + * The pool migrate_lock protects the race between zpage migration 1952 + * and zs_free. 1953 + */ 1954 + write_lock(&pool->migrate_lock); 1955 + zspage = get_zspage(page); 1956 + class = zspage_class(pool, zspage); 1957 + 1958 + /* 1959 + * the class lock protects zpage alloc/free in the zspage. 1960 + */ 1872 1961 spin_lock(&class->lock); 1873 - if (!get_zspage_inuse(zspage)) { 1874 - /* 1875 - * Set "offset" to end of the page so that every loops 1876 - * skips unnecessary object scanning. 1877 - */ 1878 - offset = PAGE_SIZE; 1879 - } 1962 + /* the migrate_write_lock protects zpage access via zs_map_object */ 1963 + migrate_write_lock(zspage); 1880 1964 1881 - pos = offset; 1965 + offset = get_first_obj_offset(page); 1882 1966 s_addr = kmap_atomic(page); 1883 - while (pos < PAGE_SIZE) { 1884 - head = obj_to_head(page, s_addr + pos); 1885 - if (head & OBJ_ALLOCATED_TAG) { 1886 - handle = head & ~OBJ_ALLOCATED_TAG; 1887 - if (!trypin_tag(handle)) 1888 - goto unpin_objects; 1889 - } 1890 - pos += class->size; 1891 - } 1892 1967 1893 1968 /* 1894 1969 * Here, any user cannot access all objects in the zspage so let's move. 
··· 1887 1982 memcpy(d_addr, s_addr, PAGE_SIZE); 1888 1983 kunmap_atomic(d_addr); 1889 1984 1890 - for (addr = s_addr + offset; addr < s_addr + pos; 1985 + for (addr = s_addr + offset; addr < s_addr + PAGE_SIZE; 1891 1986 addr += class->size) { 1892 - head = obj_to_head(page, addr); 1893 - if (head & OBJ_ALLOCATED_TAG) { 1894 - handle = head & ~OBJ_ALLOCATED_TAG; 1895 - BUG_ON(!testpin_tag(handle)); 1987 + if (obj_allocated(page, addr, &handle)) { 1896 1988 1897 1989 old_obj = handle_to_obj(handle); 1898 1990 obj_to_location(old_obj, &dummy, &obj_idx); 1899 1991 new_obj = (unsigned long)location_to_obj(newpage, 1900 1992 obj_idx); 1901 - new_obj |= BIT(HANDLE_PIN_BIT); 1902 1993 record_obj(handle, new_obj); 1903 1994 } 1904 1995 } 1996 + kunmap_atomic(s_addr); 1905 1997 1906 1998 replace_sub_page(class, zspage, newpage, page); 1907 - get_page(newpage); 1908 - 1909 - dec_zspage_isolation(zspage); 1910 - 1911 1999 /* 1912 - * Page migration is done so let's putback isolated zspage to 1913 - * the list if @page is final isolated subpage in the zspage. 2000 + * Since we complete the data copy and set up new zspage structure, 2001 + * it's okay to release migration_lock. 1914 2002 */ 1915 - if (!is_zspage_isolated(zspage)) { 1916 - /* 1917 - * We cannot race with zs_destroy_pool() here because we wait 1918 - * for isolation to hit zero before we start destroying. 1919 - * Also, we ensure that everyone can see pool->destroying before 1920 - * we start waiting. 1921 - */ 1922 - putback_zspage_deferred(pool, class, zspage); 1923 - zs_pool_dec_isolated(pool); 1924 - } 2003 + write_unlock(&pool->migrate_lock); 2004 + spin_unlock(&class->lock); 2005 + dec_zspage_isolation(zspage); 2006 + migrate_write_unlock(zspage); 1925 2007 2008 + get_page(newpage); 1926 2009 if (page_zone(newpage) != page_zone(page)) { 1927 2010 dec_zone_page_state(page, NR_ZSPAGES); 1928 2011 inc_zone_page_state(newpage, NR_ZSPAGES); ··· 1918 2025 1919 2026 reset_page(page); 1920 2027 put_page(page); 1921 - page = newpage; 1922 2028 1923 - ret = MIGRATEPAGE_SUCCESS; 1924 - unpin_objects: 1925 - for (addr = s_addr + offset; addr < s_addr + pos; 1926 - addr += class->size) { 1927 - head = obj_to_head(page, addr); 1928 - if (head & OBJ_ALLOCATED_TAG) { 1929 - handle = head & ~OBJ_ALLOCATED_TAG; 1930 - BUG_ON(!testpin_tag(handle)); 1931 - unpin_tag(handle); 1932 - } 1933 - } 1934 - kunmap_atomic(s_addr); 1935 - spin_unlock(&class->lock); 1936 - migrate_write_unlock(zspage); 1937 - 1938 - return ret; 2029 + return MIGRATEPAGE_SUCCESS; 1939 2030 } 1940 2031 1941 2032 static void zs_page_putback(struct page *page) 1942 2033 { 1943 - struct zs_pool *pool; 1944 - struct size_class *class; 1945 - int class_idx; 1946 - enum fullness_group fg; 1947 - struct address_space *mapping; 1948 2034 struct zspage *zspage; 1949 2035 1950 2036 VM_BUG_ON_PAGE(!PageMovable(page), page); 1951 2037 VM_BUG_ON_PAGE(!PageIsolated(page), page); 1952 2038 1953 2039 zspage = get_zspage(page); 1954 - get_zspage_mapping(zspage, &class_idx, &fg); 1955 - mapping = page_mapping(page); 1956 - pool = mapping->private_data; 1957 - class = pool->size_class[class_idx]; 1958 - 1959 - spin_lock(&class->lock); 2040 + migrate_write_lock(zspage); 1960 2041 dec_zspage_isolation(zspage); 1961 - if (!is_zspage_isolated(zspage)) { 1962 - /* 1963 - * Due to page_lock, we cannot free zspage immediately 1964 - * so let's defer. 
1965 - */ 1966 - putback_zspage_deferred(pool, class, zspage); 1967 - zs_pool_dec_isolated(pool); 1968 - } 1969 - spin_unlock(&class->lock); 2042 + migrate_write_unlock(zspage); 1970 2043 } 1971 2044 1972 2045 static const struct address_space_operations zsmalloc_aops = { ··· 1954 2095 return 0; 1955 2096 } 1956 2097 1957 - static bool pool_isolated_are_drained(struct zs_pool *pool) 1958 - { 1959 - return atomic_long_read(&pool->isolated_pages) == 0; 1960 - } 1961 - 1962 - /* Function for resolving migration */ 1963 - static void wait_for_isolated_drain(struct zs_pool *pool) 1964 - { 1965 - 1966 - /* 1967 - * We're in the process of destroying the pool, so there are no 1968 - * active allocations. zs_page_isolate() fails for completely free 1969 - * zspages, so we need only wait for the zs_pool's isolated 1970 - * count to hit zero. 1971 - */ 1972 - wait_event(pool->migration_wait, 1973 - pool_isolated_are_drained(pool)); 1974 - } 1975 - 1976 2098 static void zs_unregister_migration(struct zs_pool *pool) 1977 2099 { 1978 - pool->destroying = true; 1979 - /* 1980 - * We need a memory barrier here to ensure global visibility of 1981 - * pool->destroying. Thus pool->isolated pages will either be 0 in which 1982 - * case we don't care, or it will be > 0 and pool->destroying will 1983 - * ensure that we wake up once isolation hits 0. 1984 - */ 1985 - smp_mb(); 1986 - wait_for_isolated_drain(pool); /* This can block */ 1987 2100 flush_work(&pool->free_work); 1988 2101 iput(pool->inode); 1989 2102 } ··· 1984 2153 list_splice_init(&class->fullness_list[ZS_EMPTY], &free_pages); 1985 2154 spin_unlock(&class->lock); 1986 2155 } 1987 - 1988 2156 1989 2157 list_for_each_entry_safe(zspage, tmp, &free_pages, list) { 1990 2158 list_del(&zspage->list); ··· 2048 2218 struct zspage *dst_zspage = NULL; 2049 2219 unsigned long pages_freed = 0; 2050 2220 2221 + /* protect the race between zpage migration and zs_free */ 2222 + write_lock(&pool->migrate_lock); 2223 + /* protect zpage allocation/free */ 2051 2224 spin_lock(&class->lock); 2052 2225 while ((src_zspage = isolate_zspage(class, true))) { 2226 + /* protect someone accessing the zspage(i.e., zs_map_object) */ 2227 + migrate_write_lock(src_zspage); 2053 2228 2054 2229 if (!zs_can_compact(class)) 2055 2230 break; ··· 2063 2228 cc.s_page = get_first_page(src_zspage); 2064 2229 2065 2230 while ((dst_zspage = isolate_zspage(class, false))) { 2231 + migrate_write_lock_nested(dst_zspage); 2232 + 2066 2233 cc.d_page = get_first_page(dst_zspage); 2067 2234 /* 2068 2235 * If there is no more space in dst_page, resched ··· 2074 2237 break; 2075 2238 2076 2239 putback_zspage(class, dst_zspage); 2240 + migrate_write_unlock(dst_zspage); 2241 + dst_zspage = NULL; 2242 + if (rwlock_is_contended(&pool->migrate_lock)) 2243 + break; 2077 2244 } 2078 2245 2079 2246 /* Stop if we couldn't find slot */ ··· 2085 2244 break; 2086 2245 2087 2246 putback_zspage(class, dst_zspage); 2247 + migrate_write_unlock(dst_zspage); 2248 + 2088 2249 if (putback_zspage(class, src_zspage) == ZS_EMPTY) { 2250 + migrate_write_unlock(src_zspage); 2089 2251 free_zspage(pool, class, src_zspage); 2090 2252 pages_freed += class->pages_per_zspage; 2091 - } 2253 + } else 2254 + migrate_write_unlock(src_zspage); 2092 2255 spin_unlock(&class->lock); 2256 + write_unlock(&pool->migrate_lock); 2093 2257 cond_resched(); 2258 + write_lock(&pool->migrate_lock); 2094 2259 spin_lock(&class->lock); 2095 2260 } 2096 2261 2097 - if (src_zspage) 2262 + if (src_zspage) { 2098 2263 putback_zspage(class, src_zspage); 
2264 + migrate_write_unlock(src_zspage); 2265 + } 2099 2266 2100 2267 spin_unlock(&class->lock); 2268 + write_unlock(&pool->migrate_lock); 2101 2269 2102 2270 return pages_freed; 2103 2271 } ··· 2212 2362 return NULL; 2213 2363 2214 2364 init_deferred_free(pool); 2365 + rwlock_init(&pool->migrate_lock); 2215 2366 2216 2367 pool->name = kstrdup(name, GFP_KERNEL); 2217 2368 if (!pool->name) 2218 2369 goto err; 2219 - 2220 - #ifdef CONFIG_COMPACTION 2221 - init_waitqueue_head(&pool->migration_wait); 2222 - #endif 2223 2370 2224 2371 if (create_cache(pool)) 2225 2372 goto err;
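The zsmalloc hunk above replaces the per-handle pin bit (HANDLE_PIN_BIT, pin_tag/unpin_tag) with a three-level hierarchy: pool->migrate_lock (an rwlock) serializes zpage migration against zs_free(), class->lock protects per-class allocation state, and the per-zspage rwlock excludes concurrent zs_map_object() users. Below is a minimal sketch of the ordering the reworked zs_compact() follows, assuming the structures and helpers from the diff; the function itself is illustrative, not part of the patch:

/* Illustrative only: the lock ordering of the reworked zs_compact(). */
static void compact_locking_sketch(struct zs_pool *pool,
				   struct size_class *class,
				   struct zspage *src, struct zspage *dst)
{
	write_lock(&pool->migrate_lock);  /* vs. zpage migration, zs_free() */
	spin_lock(&class->lock);          /* vs. zpage alloc/free in the class */
	migrate_write_lock(src);          /* vs. readers in zs_map_object() */
	/*
	 * The destination zspage lock belongs to the same lock class, so
	 * the series introduces write_lock_nested(); the helper wraps
	 * write_lock_nested(&zspage->lock, SINGLE_DEPTH_NESTING).
	 */
	migrate_write_lock_nested(dst);

	/* ... object copy between src and dst would run here ... */

	migrate_write_unlock(dst);
	migrate_write_unlock(src);
	spin_unlock(&class->lock);
	write_unlock(&pool->migrate_lock);
}

The fast paths take only the outer lock for reading: zs_map_object() holds read_lock(&pool->migrate_lock) just long enough to resolve the handle into a zspage, then keeps the per-zspage read lock for the duration of the mapping, so no handle bit needs to stay pinned. The per-CPU zs_map_area buffer likewise moves from get_cpu_var() to an embedded local_lock_t, the usual preemptible-RT-friendly replacement for implicitly disabling preemption.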
+6 -2
mm/zswap.c
··· 1378 1378 zswap_trees[type] = tree; 1379 1379 } 1380 1380 1381 - static struct frontswap_ops zswap_frontswap_ops = { 1381 + static const struct frontswap_ops zswap_frontswap_ops = { 1382 1382 .store = zswap_frontswap_store, 1383 1383 .load = zswap_frontswap_load, 1384 1384 .invalidate_page = zswap_frontswap_invalidate_page, ··· 1475 1475 if (!shrink_wq) 1476 1476 goto fallback_fail; 1477 1477 1478 - frontswap_register_ops(&zswap_frontswap_ops); 1478 + ret = frontswap_register_ops(&zswap_frontswap_ops); 1479 + if (ret) 1480 + goto destroy_wq; 1479 1481 if (zswap_debugfs_init()) 1480 1482 pr_warn("debugfs initialization failed\n"); 1481 1483 return 0; 1482 1484 1485 + destroy_wq: 1486 + destroy_workqueue(shrink_wq); 1483 1487 fallback_fail: 1484 1488 if (pool) 1485 1489 zswap_pool_destroy(pool);
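With multiple-ops support removed, frontswap_register_ops() takes a const ops table and returns an error that its sole caller must handle, which is why zswap grows the destroy_wq unwind path above. A sketch of the resulting registration pattern, with hypothetical my_* callbacks (only the fields visible in this hunk are shown; the rest of the ops table is elided):

static const struct frontswap_ops my_frontswap_ops = {
	.store		 = my_store,
	.load		 = my_load,
	.invalidate_page = my_invalidate_page,
	/* ... remaining callbacks elided ... */
};

static int __init my_backend_init(void)
{
	int ret = frontswap_register_ops(&my_frontswap_ops);

	if (ret)	/* registration can now fail; unwind like zswap does */
		return ret;
	return 0;
}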
+2 -2
net/atm/proc.c
··· 108 108 static inline void *vcc_walk(struct seq_file *seq, loff_t l) 109 109 { 110 110 struct vcc_state *state = seq->private; 111 - int family = (uintptr_t)(PDE_DATA(file_inode(seq->file))); 111 + int family = (uintptr_t)(pde_data(file_inode(seq->file))); 112 112 113 113 return __vcc_walk(&state->sk, family, &state->bucket, l) ? 114 114 state : NULL; ··· 324 324 page = get_zeroed_page(GFP_KERNEL); 325 325 if (!page) 326 326 return -ENOMEM; 327 - dev = PDE_DATA(file_inode(file)); 327 + dev = pde_data(file_inode(file)); 328 328 if (!dev->ops->proc_read) 329 329 length = -EINVAL; 330 330 else {
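Everything from here down is the same mechanical conversion: the PDE_DATA() macro is gone and callers read the /proc entry's private data through the lower-case pde_data() inline instead. A sketch of the idiom, assuming a hypothetical my_dev payload registered via proc_create_single_data():

#include <linux/proc_fs.h>
#include <linux/seq_file.h>

struct my_dev { const char *name; };
static struct my_dev my_dev = { .name = "demo" };

static int my_show(struct seq_file *m, void *v)
{
	/* retrieve the pointer stashed in the proc_dir_entry */
	struct my_dev *dev = pde_data(file_inode(m->file));

	seq_printf(m, "%s\n", dev->name);
	return 0;
}

static int __init my_proc_init(void)
{
	/* the last argument becomes the value pde_data() later returns */
	if (!proc_create_single_data("my_stats", 0444, NULL, my_show, &my_dev))
		return -ENOMEM;
	return 0;
}

The seq_file iterators, single_open() wrappers, and proc_ops handlers in the hunks below all reduce to this same one-line substitution.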
+4 -4
net/bluetooth/af_bluetooth.c
··· 611 611 static void *bt_seq_start(struct seq_file *seq, loff_t *pos) 612 612 __acquires(seq->private->l->lock) 613 613 { 614 - struct bt_sock_list *l = PDE_DATA(file_inode(seq->file)); 614 + struct bt_sock_list *l = pde_data(file_inode(seq->file)); 615 615 616 616 read_lock(&l->lock); 617 617 return seq_hlist_start_head(&l->head, *pos); ··· 619 619 620 620 static void *bt_seq_next(struct seq_file *seq, void *v, loff_t *pos) 621 621 { 622 - struct bt_sock_list *l = PDE_DATA(file_inode(seq->file)); 622 + struct bt_sock_list *l = pde_data(file_inode(seq->file)); 623 623 624 624 return seq_hlist_next(v, &l->head, pos); 625 625 } ··· 627 627 static void bt_seq_stop(struct seq_file *seq, void *v) 628 628 __releases(seq->private->l->lock) 629 629 { 630 - struct bt_sock_list *l = PDE_DATA(file_inode(seq->file)); 630 + struct bt_sock_list *l = pde_data(file_inode(seq->file)); 631 631 632 632 read_unlock(&l->lock); 633 633 } 634 634 635 635 static int bt_seq_show(struct seq_file *seq, void *v) 636 636 { 637 - struct bt_sock_list *l = PDE_DATA(file_inode(seq->file)); 637 + struct bt_sock_list *l = pde_data(file_inode(seq->file)); 638 638 639 639 if (v == SEQ_START_TOKEN) { 640 640 seq_puts(seq, "sk RefCnt Rmem Wmem User Inode Parent");
+1 -1
net/can/bcm.c
··· 193 193 { 194 194 char ifname[IFNAMSIZ]; 195 195 struct net *net = m->private; 196 - struct sock *sk = (struct sock *)PDE_DATA(m->file->f_inode); 196 + struct sock *sk = (struct sock *)pde_data(m->file->f_inode); 197 197 struct bcm_sock *bo = bcm_sk(sk); 198 198 struct bcm_op *op; 199 199
+1 -1
net/can/proc.c
··· 305 305 static int can_rcvlist_proc_show(struct seq_file *m, void *v) 306 306 { 307 307 /* double cast to prevent GCC warning */ 308 - int idx = (int)(long)PDE_DATA(m->file->f_inode); 308 + int idx = (int)(long)pde_data(m->file->f_inode); 309 309 struct net_device *dev; 310 310 struct can_dev_rcv_lists *dev_rcv_lists; 311 311 struct net *net = m->private;
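A few of these call sites (net/can/proc.c above, net/netfilter/x_tables.c below) store a small integer rather than a pointer in the entry's data field, hence the "double cast to prevent GCC warning": casting through long first keeps the pointer-to-int conversion warning quiet where pointer and int differ in size. Sketched with a hypothetical index entry:

static int my_idx_show(struct seq_file *m, void *v)
{
	/* widen to long first, then truncate to int explicitly */
	int idx = (int)(long)pde_data(file_inode(m->file));

	seq_printf(m, "bucket %d\n", idx);
	return 0;
}

static int __init my_idx_proc_init(void)
{
	int idx = 3;	/* hypothetical payload */

	/* the integer rides inside the data pointer, cast up symmetrically */
	if (!proc_create_single_data("bucket3", 0444, NULL, my_idx_show,
				     (void *)(long)idx))
		return -ENOMEM;
	return 0;
}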
+3 -3
net/core/neighbour.c
··· 3364 3364 3365 3365 static void *neigh_stat_seq_start(struct seq_file *seq, loff_t *pos) 3366 3366 { 3367 - struct neigh_table *tbl = PDE_DATA(file_inode(seq->file)); 3367 + struct neigh_table *tbl = pde_data(file_inode(seq->file)); 3368 3368 int cpu; 3369 3369 3370 3370 if (*pos == 0) ··· 3381 3381 3382 3382 static void *neigh_stat_seq_next(struct seq_file *seq, void *v, loff_t *pos) 3383 3383 { 3384 - struct neigh_table *tbl = PDE_DATA(file_inode(seq->file)); 3384 + struct neigh_table *tbl = pde_data(file_inode(seq->file)); 3385 3385 int cpu; 3386 3386 3387 3387 for (cpu = *pos; cpu < nr_cpu_ids; ++cpu) { ··· 3401 3401 3402 3402 static int neigh_stat_seq_show(struct seq_file *seq, void *v) 3403 3403 { 3404 - struct neigh_table *tbl = PDE_DATA(file_inode(seq->file)); 3404 + struct neigh_table *tbl = pde_data(file_inode(seq->file)); 3405 3405 struct neigh_statistics *st = v; 3406 3406 3407 3407 if (v == SEQ_START_TOKEN) {
+3 -3
net/core/pktgen.c
··· 546 546 547 547 static int pgctrl_open(struct inode *inode, struct file *file) 548 548 { 549 - return single_open(file, pgctrl_show, PDE_DATA(inode)); 549 + return single_open(file, pgctrl_show, pde_data(inode)); 550 550 } 551 551 552 552 static const struct proc_ops pktgen_proc_ops = { ··· 1811 1811 1812 1812 static int pktgen_if_open(struct inode *inode, struct file *file) 1813 1813 { 1814 - return single_open(file, pktgen_if_show, PDE_DATA(inode)); 1814 + return single_open(file, pktgen_if_show, pde_data(inode)); 1815 1815 } 1816 1816 1817 1817 static const struct proc_ops pktgen_if_proc_ops = { ··· 1948 1948 1949 1949 static int pktgen_thread_open(struct inode *inode, struct file *file) 1950 1950 { 1951 - return single_open(file, pktgen_thread_show, PDE_DATA(inode)); 1951 + return single_open(file, pktgen_thread_show, pde_data(inode)); 1952 1952 } 1953 1953 1954 1954 static const struct proc_ops pktgen_thread_proc_ops = {
+3 -3
net/ipv4/netfilter/ipt_CLUSTERIP.c
··· 776 776 777 777 if (!ret) { 778 778 struct seq_file *sf = file->private_data; 779 - struct clusterip_config *c = PDE_DATA(inode); 779 + struct clusterip_config *c = pde_data(inode); 780 780 781 781 sf->private = c; 782 782 ··· 788 788 789 789 static int clusterip_proc_release(struct inode *inode, struct file *file) 790 790 { 791 - struct clusterip_config *c = PDE_DATA(inode); 791 + struct clusterip_config *c = pde_data(inode); 792 792 int ret; 793 793 794 794 ret = seq_release(inode, file); ··· 802 802 static ssize_t clusterip_proc_write(struct file *file, const char __user *input, 803 803 size_t size, loff_t *ofs) 804 804 { 805 - struct clusterip_config *c = PDE_DATA(file_inode(file)); 805 + struct clusterip_config *c = pde_data(file_inode(file)); 806 806 #define PROC_WRITELEN 10 807 807 char buffer[PROC_WRITELEN+1]; 808 808 unsigned long nodenum;
+4 -4
net/ipv4/raw.c
··· 971 971 static struct sock *raw_get_first(struct seq_file *seq) 972 972 { 973 973 struct sock *sk; 974 - struct raw_hashinfo *h = PDE_DATA(file_inode(seq->file)); 974 + struct raw_hashinfo *h = pde_data(file_inode(seq->file)); 975 975 struct raw_iter_state *state = raw_seq_private(seq); 976 976 977 977 for (state->bucket = 0; state->bucket < RAW_HTABLE_SIZE; ··· 987 987 988 988 static struct sock *raw_get_next(struct seq_file *seq, struct sock *sk) 989 989 { 990 - struct raw_hashinfo *h = PDE_DATA(file_inode(seq->file)); 990 + struct raw_hashinfo *h = pde_data(file_inode(seq->file)); 991 991 struct raw_iter_state *state = raw_seq_private(seq); 992 992 993 993 do { ··· 1016 1016 void *raw_seq_start(struct seq_file *seq, loff_t *pos) 1017 1017 __acquires(&h->lock) 1018 1018 { 1019 - struct raw_hashinfo *h = PDE_DATA(file_inode(seq->file)); 1019 + struct raw_hashinfo *h = pde_data(file_inode(seq->file)); 1020 1020 1021 1021 read_lock(&h->lock); 1022 1022 return *pos ? raw_get_idx(seq, *pos - 1) : SEQ_START_TOKEN; ··· 1039 1039 void raw_seq_stop(struct seq_file *seq, void *v) 1040 1040 __releases(&h->lock) 1041 1041 { 1042 - struct raw_hashinfo *h = PDE_DATA(file_inode(seq->file)); 1042 + struct raw_hashinfo *h = pde_data(file_inode(seq->file)); 1043 1043 1044 1044 read_unlock(&h->lock); 1045 1045 }
+1 -1
net/ipv4/tcp_ipv4.c
··· 3002 3002 #endif 3003 3003 3004 3004 /* Iterated from proc fs */ 3005 - afinfo = PDE_DATA(file_inode(seq->file)); 3005 + afinfo = pde_data(file_inode(seq->file)); 3006 3006 return afinfo->family; 3007 3007 } 3008 3008
+3 -3
net/ipv4/udp.c
··· 2960 2960 if (state->bpf_seq_afinfo) 2961 2961 afinfo = state->bpf_seq_afinfo; 2962 2962 else 2963 - afinfo = PDE_DATA(file_inode(seq->file)); 2963 + afinfo = pde_data(file_inode(seq->file)); 2964 2964 2965 2965 for (state->bucket = start; state->bucket <= afinfo->udp_table->mask; 2966 2966 ++state->bucket) { ··· 2993 2993 if (state->bpf_seq_afinfo) 2994 2994 afinfo = state->bpf_seq_afinfo; 2995 2995 else 2996 - afinfo = PDE_DATA(file_inode(seq->file)); 2996 + afinfo = pde_data(file_inode(seq->file)); 2997 2997 2998 2998 do { 2999 2999 sk = sk_next(sk); ··· 3050 3050 if (state->bpf_seq_afinfo) 3051 3051 afinfo = state->bpf_seq_afinfo; 3052 3052 else 3053 - afinfo = PDE_DATA(file_inode(seq->file)); 3053 + afinfo = pde_data(file_inode(seq->file)); 3054 3054 3055 3055 if (state->bucket <= afinfo->udp_table->mask) 3056 3056 spin_unlock_bh(&afinfo->udp_table->hash[state->bucket].lock);
+5 -5
net/netfilter/x_tables.c
··· 1517 1517 #ifdef CONFIG_PROC_FS 1518 1518 static void *xt_table_seq_start(struct seq_file *seq, loff_t *pos) 1519 1519 { 1520 - u8 af = (unsigned long)PDE_DATA(file_inode(seq->file)); 1520 + u8 af = (unsigned long)pde_data(file_inode(seq->file)); 1521 1521 struct net *net = seq_file_net(seq); 1522 1522 struct xt_pernet *xt_net; 1523 1523 ··· 1529 1529 1530 1530 static void *xt_table_seq_next(struct seq_file *seq, void *v, loff_t *pos) 1531 1531 { 1532 - u8 af = (unsigned long)PDE_DATA(file_inode(seq->file)); 1532 + u8 af = (unsigned long)pde_data(file_inode(seq->file)); 1533 1533 struct net *net = seq_file_net(seq); 1534 1534 struct xt_pernet *xt_net; 1535 1535 ··· 1540 1540 1541 1541 static void xt_table_seq_stop(struct seq_file *seq, void *v) 1542 1542 { 1543 - u_int8_t af = (unsigned long)PDE_DATA(file_inode(seq->file)); 1543 + u_int8_t af = (unsigned long)pde_data(file_inode(seq->file)); 1544 1544 1545 1545 mutex_unlock(&xt[af].mutex); 1546 1546 } ··· 1584 1584 [MTTG_TRAV_NFP_UNSPEC] = MTTG_TRAV_NFP_SPEC, 1585 1585 [MTTG_TRAV_NFP_SPEC] = MTTG_TRAV_DONE, 1586 1586 }; 1587 - uint8_t nfproto = (unsigned long)PDE_DATA(file_inode(seq->file)); 1587 + uint8_t nfproto = (unsigned long)pde_data(file_inode(seq->file)); 1588 1588 struct nf_mttg_trav *trav = seq->private; 1589 1589 1590 1590 if (ppos != NULL) ··· 1633 1633 1634 1634 static void xt_mttg_seq_stop(struct seq_file *seq, void *v) 1635 1635 { 1636 - uint8_t nfproto = (unsigned long)PDE_DATA(file_inode(seq->file)); 1636 + uint8_t nfproto = (unsigned long)pde_data(file_inode(seq->file)); 1637 1637 struct nf_mttg_trav *trav = seq->private; 1638 1638 1639 1639 switch (trav->class) {
+9 -9
net/netfilter/xt_hashlimit.c
··· 1052 1052 static void *dl_seq_start(struct seq_file *s, loff_t *pos) 1053 1053 __acquires(htable->lock) 1054 1054 { 1055 - struct xt_hashlimit_htable *htable = PDE_DATA(file_inode(s->file)); 1055 + struct xt_hashlimit_htable *htable = pde_data(file_inode(s->file)); 1056 1056 unsigned int *bucket; 1057 1057 1058 1058 spin_lock_bh(&htable->lock); ··· 1069 1069 1070 1070 static void *dl_seq_next(struct seq_file *s, void *v, loff_t *pos) 1071 1071 { 1072 - struct xt_hashlimit_htable *htable = PDE_DATA(file_inode(s->file)); 1072 + struct xt_hashlimit_htable *htable = pde_data(file_inode(s->file)); 1073 1073 unsigned int *bucket = v; 1074 1074 1075 1075 *pos = ++(*bucket); ··· 1083 1083 static void dl_seq_stop(struct seq_file *s, void *v) 1084 1084 __releases(htable->lock) 1085 1085 { 1086 - struct xt_hashlimit_htable *htable = PDE_DATA(file_inode(s->file)); 1086 + struct xt_hashlimit_htable *htable = pde_data(file_inode(s->file)); 1087 1087 unsigned int *bucket = v; 1088 1088 1089 1089 if (!IS_ERR(bucket)) ··· 1125 1125 static int dl_seq_real_show_v2(struct dsthash_ent *ent, u_int8_t family, 1126 1126 struct seq_file *s) 1127 1127 { 1128 - struct xt_hashlimit_htable *ht = PDE_DATA(file_inode(s->file)); 1128 + struct xt_hashlimit_htable *ht = pde_data(file_inode(s->file)); 1129 1129 1130 1130 spin_lock(&ent->lock); 1131 1131 /* recalculate to show accurate numbers */ ··· 1140 1140 static int dl_seq_real_show_v1(struct dsthash_ent *ent, u_int8_t family, 1141 1141 struct seq_file *s) 1142 1142 { 1143 - struct xt_hashlimit_htable *ht = PDE_DATA(file_inode(s->file)); 1143 + struct xt_hashlimit_htable *ht = pde_data(file_inode(s->file)); 1144 1144 1145 1145 spin_lock(&ent->lock); 1146 1146 /* recalculate to show accurate numbers */ ··· 1155 1155 static int dl_seq_real_show(struct dsthash_ent *ent, u_int8_t family, 1156 1156 struct seq_file *s) 1157 1157 { 1158 - struct xt_hashlimit_htable *ht = PDE_DATA(file_inode(s->file)); 1158 + struct xt_hashlimit_htable *ht = pde_data(file_inode(s->file)); 1159 1159 1160 1160 spin_lock(&ent->lock); 1161 1161 /* recalculate to show accurate numbers */ ··· 1169 1169 1170 1170 static int dl_seq_show_v2(struct seq_file *s, void *v) 1171 1171 { 1172 - struct xt_hashlimit_htable *htable = PDE_DATA(file_inode(s->file)); 1172 + struct xt_hashlimit_htable *htable = pde_data(file_inode(s->file)); 1173 1173 unsigned int *bucket = (unsigned int *)v; 1174 1174 struct dsthash_ent *ent; 1175 1175 ··· 1183 1183 1184 1184 static int dl_seq_show_v1(struct seq_file *s, void *v) 1185 1185 { 1186 - struct xt_hashlimit_htable *htable = PDE_DATA(file_inode(s->file)); 1186 + struct xt_hashlimit_htable *htable = pde_data(file_inode(s->file)); 1187 1187 unsigned int *bucket = v; 1188 1188 struct dsthash_ent *ent; 1189 1189 ··· 1197 1197 1198 1198 static int dl_seq_show(struct seq_file *s, void *v) 1199 1199 { 1200 - struct xt_hashlimit_htable *htable = PDE_DATA(file_inode(s->file)); 1200 + struct xt_hashlimit_htable *htable = pde_data(file_inode(s->file)); 1201 1201 unsigned int *bucket = v; 1202 1202 struct dsthash_ent *ent; 1203 1203
+2 -2
net/netfilter/xt_recent.c
··· 551 551 if (st == NULL) 552 552 return -ENOMEM; 553 553 554 - st->table = PDE_DATA(inode); 554 + st->table = pde_data(inode); 555 555 return 0; 556 556 } 557 557 ··· 559 559 recent_mt_proc_write(struct file *file, const char __user *input, 560 560 size_t size, loff_t *loff) 561 561 { 562 - struct recent_table *t = PDE_DATA(file_inode(file)); 562 + struct recent_table *t = pde_data(file_inode(file)); 563 563 struct recent_entry *e; 564 564 char buf[sizeof("+b335:1d35:1e55:dead:c0de:1715:5afe:c0de")]; 565 565 const char *c = buf;
+2 -2
net/sunrpc/auth_gss/svcauth_gss.c
··· 1433 1433 static ssize_t write_gssp(struct file *file, const char __user *buf, 1434 1434 size_t count, loff_t *ppos) 1435 1435 { 1436 - struct net *net = PDE_DATA(file_inode(file)); 1436 + struct net *net = pde_data(file_inode(file)); 1437 1437 char tbuf[20]; 1438 1438 unsigned long i; 1439 1439 int res; ··· 1461 1461 static ssize_t read_gssp(struct file *file, char __user *buf, 1462 1462 size_t count, loff_t *ppos) 1463 1463 { 1464 - struct net *net = PDE_DATA(file_inode(file)); 1464 + struct net *net = pde_data(file_inode(file)); 1465 1465 struct sunrpc_net *sn = net_generic(net, sunrpc_net_id); 1466 1466 unsigned long p = *ppos; 1467 1467 char tbuf[10];
+12 -12
net/sunrpc/cache.c
··· 1536 1536 static ssize_t cache_read_procfs(struct file *filp, char __user *buf, 1537 1537 size_t count, loff_t *ppos) 1538 1538 { 1539 - struct cache_detail *cd = PDE_DATA(file_inode(filp)); 1539 + struct cache_detail *cd = pde_data(file_inode(filp)); 1540 1540 1541 1541 return cache_read(filp, buf, count, ppos, cd); 1542 1542 } ··· 1544 1544 static ssize_t cache_write_procfs(struct file *filp, const char __user *buf, 1545 1545 size_t count, loff_t *ppos) 1546 1546 { 1547 - struct cache_detail *cd = PDE_DATA(file_inode(filp)); 1547 + struct cache_detail *cd = pde_data(file_inode(filp)); 1548 1548 1549 1549 return cache_write(filp, buf, count, ppos, cd); 1550 1550 } 1551 1551 1552 1552 static __poll_t cache_poll_procfs(struct file *filp, poll_table *wait) 1553 1553 { 1554 - struct cache_detail *cd = PDE_DATA(file_inode(filp)); 1554 + struct cache_detail *cd = pde_data(file_inode(filp)); 1555 1555 1556 1556 return cache_poll(filp, wait, cd); 1557 1557 } ··· 1560 1560 unsigned int cmd, unsigned long arg) 1561 1561 { 1562 1562 struct inode *inode = file_inode(filp); 1563 - struct cache_detail *cd = PDE_DATA(inode); 1563 + struct cache_detail *cd = pde_data(inode); 1564 1564 1565 1565 return cache_ioctl(inode, filp, cmd, arg, cd); 1566 1566 } 1567 1567 1568 1568 static int cache_open_procfs(struct inode *inode, struct file *filp) 1569 1569 { 1570 - struct cache_detail *cd = PDE_DATA(inode); 1570 + struct cache_detail *cd = pde_data(inode); 1571 1571 1572 1572 return cache_open(inode, filp, cd); 1573 1573 } 1574 1574 1575 1575 static int cache_release_procfs(struct inode *inode, struct file *filp) 1576 1576 { 1577 - struct cache_detail *cd = PDE_DATA(inode); 1577 + struct cache_detail *cd = pde_data(inode); 1578 1578 1579 1579 return cache_release(inode, filp, cd); 1580 1580 } ··· 1591 1591 1592 1592 static int content_open_procfs(struct inode *inode, struct file *filp) 1593 1593 { 1594 - struct cache_detail *cd = PDE_DATA(inode); 1594 + struct cache_detail *cd = pde_data(inode); 1595 1595 1596 1596 return content_open(inode, filp, cd); 1597 1597 } 1598 1598 1599 1599 static int content_release_procfs(struct inode *inode, struct file *filp) 1600 1600 { 1601 - struct cache_detail *cd = PDE_DATA(inode); 1601 + struct cache_detail *cd = pde_data(inode); 1602 1602 1603 1603 return content_release(inode, filp, cd); 1604 1604 } ··· 1612 1612 1613 1613 static int open_flush_procfs(struct inode *inode, struct file *filp) 1614 1614 { 1615 - struct cache_detail *cd = PDE_DATA(inode); 1615 + struct cache_detail *cd = pde_data(inode); 1616 1616 1617 1617 return open_flush(inode, filp, cd); 1618 1618 } 1619 1619 1620 1620 static int release_flush_procfs(struct inode *inode, struct file *filp) 1621 1621 { 1622 - struct cache_detail *cd = PDE_DATA(inode); 1622 + struct cache_detail *cd = pde_data(inode); 1623 1623 1624 1624 return release_flush(inode, filp, cd); 1625 1625 } ··· 1627 1627 static ssize_t read_flush_procfs(struct file *filp, char __user *buf, 1628 1628 size_t count, loff_t *ppos) 1629 1629 { 1630 - struct cache_detail *cd = PDE_DATA(file_inode(filp)); 1630 + struct cache_detail *cd = pde_data(file_inode(filp)); 1631 1631 1632 1632 return read_flush(filp, buf, count, ppos, cd); 1633 1633 } ··· 1636 1636 const char __user *buf, 1637 1637 size_t count, loff_t *ppos) 1638 1638 { 1639 - struct cache_detail *cd = PDE_DATA(file_inode(filp)); 1639 + struct cache_detail *cd = pde_data(file_inode(filp)); 1640 1640 1641 1641 return write_flush(filp, buf, count, ppos, cd); 1642 1642 }
+1 -1
net/sunrpc/stats.c
··· 66 66 67 67 static int rpc_proc_open(struct inode *inode, struct file *file) 68 68 { 69 - return single_open(file, rpc_proc_show, PDE_DATA(inode)); 69 + return single_open(file, rpc_proc_show, pde_data(inode)); 70 70 } 71 71 72 72 static const struct proc_ops rpc_proc_ops = {
+2 -2
sound/core/info.c
··· 234 234 235 235 static int snd_info_entry_open(struct inode *inode, struct file *file) 236 236 { 237 - struct snd_info_entry *entry = PDE_DATA(inode); 237 + struct snd_info_entry *entry = pde_data(inode); 238 238 struct snd_info_private_data *data; 239 239 int mode, err; 240 240 ··· 365 365 366 366 static int snd_info_text_entry_open(struct inode *inode, struct file *file) 367 367 { 368 - struct snd_info_entry *entry = PDE_DATA(inode); 368 + struct snd_info_entry *entry = pde_data(inode); 369 369 struct snd_info_private_data *data; 370 370 int err; 371 371