Merge branch 'akpm' (patches from Andrew)

+35 -12

Documentation/cgroup-v1/memory.txt

···
 system. It might be too late to consult with vmstat or any other
 statistics, so it's advisable to take an immediate action.

-The events are propagated upward until the event is handled, i.e. the
-events are not pass-through. Here is what this means: for example you have
-three cgroups: A->B->C. Now you set up an event listener on cgroups A, B
-and C, and suppose group C experiences some pressure. In this situation,
-only group C will receive the notification, i.e. groups A and B will not
-receive it. This is done to avoid excessive "broadcasting" of messages,
-which disturbs the system and which is especially bad if we are low on
-memory or thrashing. So, organize the cgroups wisely, or propagate the
-events manually (or, ask us to implement the pass-through events,
-explaining why would you need them.)
+By default, events are propagated upward until the event is handled, i.e. the
+events are not pass-through. For example, you have three cgroups: A->B->C. Now
+you set up an event listener on cgroups A, B and C, and suppose group C
+experiences some pressure. In this situation, only group C will receive the
+notification, i.e. groups A and B will not receive it. This is done to avoid
+excessive "broadcasting" of messages, which disturbs the system and which is
+especially bad if we are low on memory or thrashing. Group B will receive
+notification only if there are no event listeners for group C.
+
+There are three optional modes that specify different propagation behavior:
+
+ - "default": this is the default behavior specified above. This mode is the
+   same as omitting the optional mode parameter, preserved for backwards
+   compatibility.
+
+ - "hierarchy": events always propagate up to the root, similar to the default
+   behavior, except that propagation continues regardless of whether there are
+   event listeners at each level, with the "hierarchy" mode. In the above
+   example, groups A, B, and C will receive notification of memory pressure.
+
+ - "local": events are pass-through, i.e. they only receive notifications when
+   memory pressure is experienced in the memcg for which the notification is
+   registered. In the above example, group C will receive notification if
+   registered for "local" notification and the group experiences memory
+   pressure. However, group B will never receive notification, regardless if
+   there is an event listener for group C or not, if group B is registered for
+   local notification.
+
+The level and event notification mode ("hierarchy" or "local", if necessary) are
+specified by a comma-delimited string, e.g. "low,hierarchy" specifies
+hierarchical, pass-through, notification for all ancestor memcgs. Notification
+that is the default, non pass-through behavior, does not specify a mode.
+"medium,local" specifies pass-through notification for the medium level.

 The file memory.pressure_level is only used to setup an eventfd. To
 register a notification, an application must:

 - create an eventfd using eventfd(2);
 - open memory.pressure_level;
-- write string like "<event_fd> <fd of memory.pressure_level> <level>"
+- write string as "<event_fd> <fd of memory.pressure_level> <level[,mode]>"
   to cgroup.event_control.

 Application will be notified through eventfd when memory pressure is at
···
    # cd /sys/fs/cgroup/memory/
    # mkdir foo
    # cd foo
-   # cgroup_event_listener memory.pressure_level low &
+   # cgroup_event_listener memory.pressure_level low,hierarchy &
    # echo 8000000 > memory.limit_in_bytes
    # echo 8000000 > memory.memsw.limit_in_bytes
    # echo $$ > tasks
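
Reviewer note: the registration string described in the memory.txt hunk can be sketched in C as below. This is a minimal illustration, not code from the patch. `format_pressure_registration` is a hypothetical helper name, and the file descriptors are assumed to come from eventfd(2) and from opening memory.pressure_level, as the documentation describes.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the kernel or of this patch): build the
 * "<event_fd> <fd of memory.pressure_level> <level[,mode]>" string that an
 * application writes to cgroup.event_control.
 *
 * Pass mode == NULL (or "") to omit the mode, which keeps the default
 * propagation behavior. Returns the string length, or -1 if buf is too
 * small.
 */
int format_pressure_registration(char *buf, size_t size, int event_fd,
				 int pressure_fd, const char *level,
				 const char *mode)
{
	int n;

	if (mode && mode[0] != '\0')
		n = snprintf(buf, size, "%d %d %s,%s",
			     event_fd, pressure_fd, level, mode);
	else
		n = snprintf(buf, size, "%d %d %s",
			     event_fd, pressure_fd, level);

	if (n < 0 || (size_t)n >= size)
		return -1;
	return n;
}
```

With event_fd 3 and pressure_fd 4, level "low" and mode "hierarchy" give the string "3 4 low,hierarchy". The caller would then write that string to cgroup.event_control and block on a read of the eventfd.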

+9 -3

Documentation/memory-hotplug.txt

···
 % echo online > /sys/devices/system/memory/memoryXXX/state

 This onlining will not change the ZONE type of the target memory block,
-If the memory block is in ZONE_NORMAL, you can change it to ZONE_MOVABLE:
+If the memory block doesn't belong to any zone an appropriate kernel zone
+(usually ZONE_NORMAL) will be used unless movable_node kernel command line
+option is specified when ZONE_MOVABLE will be used.
+
+You can explicitly request to associate it with ZONE_MOVABLE by:

 % echo online_movable > /sys/devices/system/memory/memoryXXX/state
 (NOTE: current limit: this memory block must be adjacent to ZONE_MOVABLE)

-And if the memory block is in ZONE_MOVABLE, you can change it to ZONE_NORMAL:
+Or you can explicitly request a kernel zone (usually ZONE_NORMAL) by:

 % echo online_kernel > /sys/devices/system/memory/memoryXXX/state
 (NOTE: current limit: this memory block must be adjacent to ZONE_NORMAL)

+An explicit zone onlining can fail (e.g. when the range is already within
+an existing and incompatible zone).
+
 After this, memory block XXX's state will be 'online' and the amount of
 available memory will be increased.

-Currently, newly added memory is added as ZONE_NORMAL (for powerpc, ZONE_DMA).
 This may be changed in future.



+20

Documentation/sysctl/vm.txt

···

 ==============================================================

+highmem_is_dirtyable
+
+Available only for systems with CONFIG_HIGHMEM enabled (32b systems).
+
+This parameter controls whether the high memory is considered for dirty
+writers throttling. This is not the case by default which means that
+only the amount of memory directly visible/usable by the kernel can
+be dirtied. As a result, on systems with a large amount of memory and
+lowmem basically depleted writers might be throttled too early and
+streaming writes can get very slow.
+
+Changing the value to non zero would allow more memory to be dirtied
+and thus allow writers to write more data which can be flushed to the
+storage more effectively. Note this also comes with a risk of premature
+OOM killer because some writers (e.g. direct block device writes) can
+only use the low memory and they can fill it up with dirty data without
+any throttling.
+
+==============================================================
+
 hugepages_treat_as_movable

 This parameter controls whether we can allocate hugepages from ZONE_MOVABLE

+11

MAINTAINERS

···
 S:	Obsolete
 F:	drivers/net/wireless/intersil/prism54/

+PROC SYSCTL
+M:	"Luis R. Rodriguez" <mcgrof@kernel.org>
+M:	Kees Cook <keescook@chromium.org>
+L:	linux-kernel@vger.kernel.org
+L:	linux-fsdevel@vger.kernel.org
+S:	Maintained
+F:	fs/proc/proc_sysctl.c
+F:	include/linux/sysctl.h
+F:	kernel/sysctl.c
+F:	tools/testing/selftests/sysctl/
+
 PS3 NETWORK SUPPORT
 M:	Geoff Levand <geoff@infradead.org>
 L:	netdev@vger.kernel.org

+1

arch/arm/boot/compressed/decompress.c

···
 /* Not needed, but used in some headers pulled in by decompressors */
 extern char * strstr(const char * s1, const char *s2);
 extern size_t strlen(const char *s);
+extern int memcmp(const void *cs, const void *ct, size_t count);

 #ifdef CONFIG_KERNEL_GZIP
 #include "../../../../lib/decompress_inflate.c"

+2 -6

arch/arm/include/asm/elf.h

···
 #define CORE_DUMP_USE_REGSET
 #define ELF_EXEC_PAGESIZE	4096

-/* This is the location that an ET_DYN program is loaded if exec'ed.  Typical
-   use of this is to invoke "./ld.so someprog" to test out a new version of
-   the loader.  We need to make sure that it is out of the way of the program
-   that it will "exec", and that there is sufficient room for the brk.  */
-
-#define ELF_ET_DYN_BASE	(TASK_SIZE / 3 * 2)
+/* This is the base location for PIE (ET_DYN with INTERP) loads. */
+#define ELF_ET_DYN_BASE		0x400000UL

 /* When the program starts, a1 contains a pointer to a function to be
    registered with atexit, as per the SVR4 ABI.  A value of 0 means we

+1 -2

arch/arm/kernel/atags_parse.c

···
  */

 #include <linux/init.h>
+#include <linux/initrd.h>
 #include <linux/kernel.h>
 #include <linux/fs.h>
 #include <linux/root_dev.h>
···
 #ifdef CONFIG_BLK_DEV_RAM
 static int __init parse_tag_ramdisk(const struct tag *tag)
 {
-	extern int rd_size, rd_image_start, rd_prompt, rd_doload;
-
 	rd_image_start = tag->u.ramdisk.start;
 	rd_doload = (tag->u.ramdisk.flags & 1) == 0;
 	rd_prompt = (tag->u.ramdisk.flags & 2) == 0;

+6 -6

arch/arm64/include/asm/elf.h

···
 #define ELF_EXEC_PAGESIZE	PAGE_SIZE

 /*
- * This is the location that an ET_DYN program is loaded if exec'ed.  Typical
- * use of this is to invoke "./ld.so someprog" to test out a new version of
- * the loader.  We need to make sure that it is out of the way of the program
- * that it will "exec", and that there is sufficient room for the brk.
+ * This is the base location for PIE (ET_DYN with INTERP) loads. On
+ * 64-bit, this is raised to 4GB to leave the entire 32-bit address
+ * space open for things that want to use the area for 32-bit pointers.
  */
-#define ELF_ET_DYN_BASE	(2 * TASK_SIZE_64 / 3)
+#define ELF_ET_DYN_BASE		0x100000000UL

 #ifndef __ASSEMBLY__

···

 #ifdef CONFIG_COMPAT

-#define COMPAT_ELF_ET_DYN_BASE		(2 * TASK_SIZE_32 / 3)
+/* PIE load location for compat arm. Must match ARM ELF_ET_DYN_BASE. */
+#define COMPAT_ELF_ET_DYN_BASE		0x000400000UL

 /* AArch32 registers. */
 #define COMPAT_ELF_NGREG		18

+1 -7

arch/arm64/mm/kasan_init.c

···
 		if (start >= end)
 			break;

-		/*
-		 * end + 1 here is intentional. We check several shadow bytes in
-		 * advance to slightly speed up fastpath. In some rare cases
-		 * we could cross boundary of mapped shadow, so we just map
-		 * some more here.
-		 */
 		vmemmap_populate((unsigned long)kasan_mem_to_shadow(start),
-				(unsigned long)kasan_mem_to_shadow(end) + 1,
+				(unsigned long)kasan_mem_to_shadow(end),
 				pfn_to_nid(virt_to_pfn(start)));
 	}


+2

arch/frv/include/asm/Kbuild

···

 generic-y += clkdev.h
+generic-y += device.h
 generic-y += exec.h
 generic-y += extable.h
+generic-y += fb.h
 generic-y += irq_work.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h

+1

arch/frv/include/asm/cmpxchg.h

···
  *   - if (*ptr != test) then orig = *ptr;
  */
 extern uint64_t __cmpxchg_64(uint64_t test, uint64_t new, volatile uint64_t *v);
+#define cmpxchg64(p, o, n)	__cmpxchg_64((o), (n), (p))

 #ifndef CONFIG_FRV_OUTOFLINE_ATOMIC_OPS


-7

arch/frv/include/asm/device.h

···
-/*
- * Arch specific extensions to struct device
- *
- * This file is released under the GPLv2
- */
-#include <asm-generic/device.h>
-

-12

arch/frv/include/asm/fb.h

···
-#ifndef _ASM_FB_H_
-#define _ASM_FB_H_
-#include <linux/fb.h>
-
-#define fb_pgprotect(...) do {} while (0)
-
-static inline int fb_is_primary_device(struct fb_info *info)
-{
-	return 0;
-}
-
-#endif /* _ASM_FB_H_ */

+2 -1

arch/mips/kernel/module.c

···

 	spin_lock_irqsave(&dbe_lock, flags);
 	list_for_each_entry(dbe, &dbe_list, dbe_list) {
-		e = search_extable(dbe->dbe_start, dbe->dbe_end - 1, addr);
+		e = search_extable(dbe->dbe_start,
+				   dbe->dbe_end - dbe->dbe_start, addr);
 		if (e)
 			break;
 	}

+2 -1

arch/mips/kernel/traps.c

···
 {
 	const struct exception_table_entry *e;

-	e = search_extable(__start___dbe_table, __stop___dbe_table - 1, addr);
+	e = search_extable(__start___dbe_table,
+			   __stop___dbe_table - __start___dbe_table, addr);
 	if (!e)
 		e = search_module_dbetables(addr);
 	return e;

+7 -6

arch/powerpc/include/asm/elf.h

···
 #define CORE_DUMP_USE_REGSET
 #define ELF_EXEC_PAGESIZE	PAGE_SIZE

-/* This is the location that an ET_DYN program is loaded if exec'ed.  Typical
-   use of this is to invoke "./ld.so someprog" to test out a new version of
-   the loader.  We need to make sure that it is out of the way of the program
-   that it will "exec", and that there is sufficient room for the brk.  */
-
-#define ELF_ET_DYN_BASE	0x20000000
+/*
+ * This is the base location for PIE (ET_DYN with INTERP) loads. On
+ * 64-bit, this is raised to 4GB to leave the entire 32-bit address
+ * space open for things that want to use the area for 32-bit pointers.
+ */
+#define ELF_ET_DYN_BASE		(is_32bit_task() ? 0x000400000UL : \
+						   0x100000000UL)

 #define ELF_CORE_EFLAGS (is_elf2_task() ? 2 : 0)


+7 -8

arch/s390/include/asm/elf.h

···
 #define CORE_DUMP_USE_REGSET
 #define ELF_EXEC_PAGESIZE	4096

-/* This is the location that an ET_DYN program is loaded if exec'ed.  Typical
-   use of this is to invoke "./ld.so someprog" to test out a new version of
-   the loader.  We need to make sure that it is out of the way of the program
-   that it will "exec", and that there is sufficient room for the brk.  64-bit
-   tasks are aligned to 4GB. */
-#define ELF_ET_DYN_BASE (is_compat_task() ? \
-				(STACK_TOP / 3 * 2) : \
-				(STACK_TOP / 3 * 2) & ~((1UL << 32) - 1))
+/*
+ * This is the base location for PIE (ET_DYN with INTERP) loads. On
+ * 64-bit, this is raised to 4GB to leave the entire 32-bit address
+ * space open for things that want to use the area for 32-bit pointers.
+ */
+#define ELF_ET_DYN_BASE		(is_compat_task() ? 0x000400000UL : \
+						    0x100000000UL)

 /* This yields a mask that user programs can use to figure out what
    instruction set this CPU supports. */

+18 -16

arch/sh/mm/extable_64.c

···
  * License.  See the file "COPYING" in the main directory of this archive
  * for more details.
  */
+#include <linux/bsearch.h>
 #include <linux/rwsem.h>
 #include <linux/extable.h>
 #include <linux/uaccess.h>
···
 	return NULL;
 }

+static int cmp_ex_search(const void *key, const void *elt)
+{
+	const struct exception_table_entry *_elt = elt;
+	unsigned long _key = *(unsigned long *)key;
+
+	/* avoid overflow */
+	if (_key > _elt->insn)
+		return 1;
+	if (_key < _elt->insn)
+		return -1;
+	return 0;
+}
+
 /* Simple binary search */
 const struct exception_table_entry *
-search_extable(const struct exception_table_entry *first,
-		 const struct exception_table_entry *last,
+search_extable(const struct exception_table_entry *base,
+		 const size_t num,
 		 unsigned long value)
 {
 	const struct exception_table_entry *mid;
···
 	if (mid)
 		return mid;

-	while (first <= last) {
-		long diff;
-
-		mid = (last - first) / 2 + first;
-		diff = mid->insn - value;
-		if (diff == 0)
-			return mid;
-		else if (diff < 0)
-			first = mid+1;
-		else
-			last = mid-1;
-	}
-
-	return NULL;
+	return bsearch(&value, base, num,
+		       sizeof(struct exception_table_entry), cmp_ex_search);
 }

 int fixup_exception(struct pt_regs *regs)
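
Reviewer note: the move from a (first, last) pointer pair to (base, num) plus a comparison callback can be tried in user space with the C library's bsearch(). The sketch below assumes a simplified `struct ex_entry` in place of the kernel's struct exception_table_entry, and `search_table` is a hypothetical name. Only the comparator mirrors the one added in the hunk.

```c
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-in for the kernel's struct exception_table_entry. */
struct ex_entry {
	unsigned long insn;
	unsigned long fixup;
};

/*
 * Compare with explicit < and > tests. Returning (key - insn) cast to a
 * signed type could overflow and flip the sign for addresses that are far
 * apart, which is the problem the "avoid overflow" comment refers to.
 */
static int cmp_ex_search(const void *key, const void *elt)
{
	const struct ex_entry *e = elt;
	unsigned long k = *(const unsigned long *)key;

	if (k > e->insn)
		return 1;
	if (k < e->insn)
		return -1;
	return 0;
}

/* Look up value in a table of num entries sorted by insn. */
const struct ex_entry *search_table(const struct ex_entry *base, size_t num,
				    unsigned long value)
{
	return bsearch(&value, base, num, sizeof(*base), cmp_ex_search);
}
```

Passing an element count rather than a pointer to the last element also makes the empty-table case well defined (num == 0), which the old `first <= last` loop could only express with a `last` pointer one entry before `first`.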

+14 -14

arch/sparc/mm/extable.c

···

 /* Caller knows they are in a range if ret->fixup == 0 */
 const struct exception_table_entry *
-search_extable(const struct exception_table_entry *start,
-	       const struct exception_table_entry *last,
+search_extable(const struct exception_table_entry *base,
+	       const size_t num,
 	       unsigned long value)
 {
-	const struct exception_table_entry *walk;
+	int i;

 	/* Single insn entries are encoded as:
 	 *	word 1:	insn address
···
 	 */

 	/* 1. Try to find an exact match. */
-	for (walk = start; walk <= last; walk++) {
-		if (walk->fixup == 0) {
+	for (i = 0; i < num; i++) {
+		if (base[i].fixup == 0) {
 			/* A range entry, skip both parts. */
-			walk++;
+			i++;
 			continue;
 		}

 		/* A deleted entry; see trim_init_extable */
-		if (walk->fixup == -1)
+		if (base[i].fixup == -1)
 			continue;

-		if (walk->insn == value)
-			return walk;
+		if (base[i].insn == value)
+			return &base[i];
 	}

 	/* 2. Try to find a range match. */
-	for (walk = start; walk <= (last - 1); walk++) {
-		if (walk->fixup)
+	for (i = 0; i < (num - 1); i++) {
+		if (base[i].fixup)
 			continue;

-		if (walk[0].insn <= value && walk[1].insn > value)
-			return walk;
+		if (base[i].insn <= value && base[i + 1].insn > value)
+			return &base[i];

-		walk++;
+		i++;
 	}

 	return NULL;

+7 -6

arch/x86/include/asm/elf.h

···
 #define CORE_DUMP_USE_REGSET
 #define ELF_EXEC_PAGESIZE	4096

-/* This is the location that an ET_DYN program is loaded if exec'ed.  Typical
-   use of this is to invoke "./ld.so someprog" to test out a new version of
-   the loader.  We need to make sure that it is out of the way of the program
-   that it will "exec", and that there is sufficient room for the brk.  */
-
-#define ELF_ET_DYN_BASE		(TASK_SIZE / 3 * 2)
+/*
+ * This is the base location for PIE (ET_DYN with INTERP) loads. On
+ * 64-bit, this is raised to 4GB to leave the entire 32-bit address
+ * space open for things that want to use the area for 32-bit pointers.
+ */
+#define ELF_ET_DYN_BASE		(mmap_is_ia32() ? 0x000400000UL : \
+						  0x100000000UL)

 /* This yields a mask that user programs can use to figure out what
    instruction set this CPU supports.  This could be done in user space,

+1 -6

arch/x86/mm/kasan_init_64.c

···
 	start = (unsigned long)kasan_mem_to_shadow(pfn_to_kaddr(range->start));
 	end = (unsigned long)kasan_mem_to_shadow(pfn_to_kaddr(range->end));

-	/*
-	 * end + 1 here is intentional. We check several shadow bytes in advance
-	 * to slightly speed up fastpath. In some rare cases we could cross
-	 * boundary of mapped shadow, so we just map some more here.
-	 */
-	return vmemmap_populate(start, end + 1, NUMA_NO_NODE);
+	return vmemmap_populate(start, end, NUMA_NO_NODE);
 }

 static void __init clear_pgds(unsigned long start,

+2 -7

drivers/base/node.c

···
  *
  * Initialize and register the node device.
  */
-static int register_node(struct node *node, int num, struct node *parent)
+static int register_node(struct node *node, int num)
 {
 	int error;

···

 int __register_one_node(int nid)
 {
-	int p_node = parent_node(nid);
-	struct node *parent = NULL;
 	int error;
 	int cpu;
-
-	if (p_node != nid)
-		parent = node_devices[p_node];

 	node_devices[nid] = kzalloc(sizeof(struct node), GFP_KERNEL);
 	if (!node_devices[nid])
 		return -ENOMEM;

-	error = register_node(node_devices[nid], nid, parent);
+	error = register_node(node_devices[nid], nid);

 	/* link cpu under this node */
 	for_each_present_cpu(cpu) {

+1

drivers/block/brd.c

···
  */

 #include <linux/init.h>
+#include <linux/initrd.h>
 #include <linux/module.h>
 #include <linux/moduleparam.h>
 #include <linux/major.h>

+4 -6

drivers/block/zram/zcomp.c

···

 bool zcomp_available_algorithm(const char *comp)
 {
-	int i = 0;
+	int i;

-	while (backends[i]) {
-		if (sysfs_streq(comp, backends[i]))
-			return true;
-		i++;
-	}
+	i = __sysfs_match_string(backends, -1, comp);
+	if (i >= 0)
+		return true;

 	/*
 	 * Crypto does not ignore a trailing new line symbol,
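
Reviewer note: for readers unfamiliar with the helper, the user-space sketch below shows the behavior the removed loop had and that the replacement call is expected to keep: scan a NULL-terminated array and compare while treating one trailing newline as the end of the string. `streq_ignore_newline` and `match_string_index` are hypothetical names, and returning -1 on a miss is an assumption of this sketch (the kernel helper returns a negative errno value).

```c
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if a and b are equal, treating a single trailing newline on
 * either string as the end of that string. This follows the documented
 * behavior of the kernel's sysfs_streq().
 */
static int streq_ignore_newline(const char *a, const char *b)
{
	while (*a && *a == *b) {
		a++;
		b++;
	}

	if (*a == *b)
		return 1;
	if (!*a && *b == '\n' && !b[1])
		return 1;
	if (*a == '\n' && !a[1] && !*b)
		return 1;
	return 0;
}

/*
 * Return the index of s in the NULL-terminated array, or -1 if no entry
 * matches.
 */
int match_string_index(const char *const *array, const char *s)
{
	for (int i = 0; array[i] != NULL; i++) {
		if (streq_ignore_newline(s, array[i]))
			return i;
	}
	return -1;
}
```

The newline handling matters here because values written through sysfs with `echo` normally arrive with a trailing "\n".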

+1 -1

drivers/block/zram/zram_drv.c

···
 	NULL,
 };

-static struct attribute_group zram_disk_attr_group = {
+static const struct attribute_group zram_disk_attr_group = {
 	.attrs = zram_disk_attrs,
 };


+60 -19

fs/binfmt_elf.c

···
 	unsigned long p = bprm->p;
 	int argc = bprm->argc;
 	int envc = bprm->envc;
-	elf_addr_t __user *argv;
-	elf_addr_t __user *envp;
 	elf_addr_t __user *sp;
 	elf_addr_t __user *u_platform;
 	elf_addr_t __user *u_base_platform;
···
 	/* Now, let's put argc (and argv, envp if appropriate) on the stack */
 	if (__put_user(argc, sp++))
 		return -EFAULT;
-	argv = sp;
-	envp = argv + argc + 1;

-	/* Populate argv and envp */
+	/* Populate list of argv pointers back to argv strings. */
 	p = current->mm->arg_end = current->mm->arg_start;
 	while (argc-- > 0) {
 		size_t len;
-		if (__put_user((elf_addr_t)p, argv++))
+		if (__put_user((elf_addr_t)p, sp++))
 			return -EFAULT;
 		len = strnlen_user((void __user *)p, MAX_ARG_STRLEN);
 		if (!len || len > MAX_ARG_STRLEN)
 			return -EINVAL;
 		p += len;
 	}
-	if (__put_user(0, argv))
+	if (__put_user(0, sp++))
 		return -EFAULT;
-	current->mm->arg_end = current->mm->env_start = p;
+	current->mm->arg_end = p;
+
+	/* Populate list of envp pointers back to envp strings. */
+	current->mm->env_end = current->mm->env_start = p;
 	while (envc-- > 0) {
 		size_t len;
-		if (__put_user((elf_addr_t)p, envp++))
+		if (__put_user((elf_addr_t)p, sp++))
 			return -EFAULT;
 		len = strnlen_user((void __user *)p, MAX_ARG_STRLEN);
 		if (!len || len > MAX_ARG_STRLEN)
 			return -EINVAL;
 		p += len;
 	}
-	if (__put_user(0, envp))
+	if (__put_user(0, sp++))
 		return -EFAULT;
 	current->mm->env_end = p;

 	/* Put the elf_info on the stack in the right place.  */
-	sp = (elf_addr_t __user *)envp + 1;
 	if (copy_to_user(sp, elf_info, ei_index * sizeof(elf_addr_t)))
 		return -EFAULT;
 	return 0;
···
 		elf_flags = MAP_PRIVATE | MAP_DENYWRITE | MAP_EXECUTABLE;

 		vaddr = elf_ppnt->p_vaddr;
+		/*
+		 * If we are loading ET_EXEC or we have already performed
+		 * the ET_DYN load_addr calculations, proceed normally.
+		 */
 		if (loc->elf_ex.e_type == ET_EXEC || load_addr_set) {
 			elf_flags |= MAP_FIXED;
 		} else if (loc->elf_ex.e_type == ET_DYN) {
-			/* Try and get dynamic programs out of the way of the
-			 * default mmap base, as well as whatever program they
-			 * might try to exec.  This is because the brk will
-			 * follow the loader, and is not movable.  */
-			load_bias = ELF_ET_DYN_BASE - vaddr;
-			if (current->flags & PF_RANDOMIZE)
-				load_bias += arch_mmap_rnd();
-			load_bias = ELF_PAGESTART(load_bias);
+			/*
+			 * This logic is run once for the first LOAD Program
+			 * Header for ET_DYN binaries to calculate the
+			 * randomization (load_bias) for all the LOAD
+			 * Program Headers, and to calculate the entire
+			 * size of the ELF mapping (total_size). (Note that
+			 * load_addr_set is set to true later once the
+			 * initial mapping is performed.)
+			 *
+			 * There are effectively two types of ET_DYN
+			 * binaries: programs (i.e. PIE: ET_DYN with INTERP)
+			 * and loaders (ET_DYN without INTERP, since they
+			 * _are_ the ELF interpreter). The loaders must
+			 * be loaded away from programs since the program
+			 * may otherwise collide with the loader (especially
+			 * for ET_EXEC which does not have a randomized
+			 * position). For example to handle invocations of
+			 * "./ld.so someprog" to test out a new version of
+			 * the loader, the subsequent program that the
+			 * loader loads must avoid the loader itself, so
+			 * they cannot share the same load range. Sufficient
+			 * room for the brk must be allocated with the
+			 * loader as well, since brk must be available with
+			 * the loader.
+			 *
+			 * Therefore, programs are loaded offset from
+			 * ELF_ET_DYN_BASE and loaders are loaded into the
+			 * independently randomized mmap region (0 load_bias
+			 * without MAP_FIXED).
+			 */
+			if (elf_interpreter) {
+				load_bias = ELF_ET_DYN_BASE;
+				if (current->flags & PF_RANDOMIZE)
+					load_bias += arch_mmap_rnd();
+				elf_flags |= MAP_FIXED;
+			} else
+				load_bias = 0;
+
+			/*
+			 * Since load_bias is used for all subsequent loading
+			 * calculations, we must lower it by the first vaddr
+			 * so that the remaining calculations based on the
+			 * ELF vaddrs will be correctly offset. The result
+			 * is then page aligned.
+			 */
+			load_bias = ELF_PAGESTART(load_bias - vaddr);
+
 			total_size = total_mapping_size(elf_phdata,
 							loc->elf_ex.e_phnum);
 			if (!total_size) {
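
Reviewer note: the load_bias arithmetic in the last binfmt_elf.c hunk can be checked with a small user-space model. The sketch below is an assumption-laden illustration, not kernel code: it hard-codes a 4 KiB page and the 64-bit base of 4GB from the elf.h hunks, and `sketch_load_bias` is a hypothetical name. `rnd` stands in for the value arch_mmap_rnd() would return when PF_RANDOMIZE is set (pass 0 otherwise).

```c
#include <stdint.h>

/* Assumed values for this sketch: 4 KiB pages, 64-bit PIE base of 4GB. */
#define SKETCH_PAGE_SIZE	4096ULL
#define SKETCH_PAGESTART(v)	((v) & ~(SKETCH_PAGE_SIZE - 1))
#define SKETCH_ET_DYN_BASE	0x100000000ULL

/*
 * Model of the ET_DYN branch: programs with an interpreter (PIE) start
 * from the fixed base plus randomization, loaders without an interpreter
 * start from 0. In both cases the bias is lowered by the first LOAD
 * segment's vaddr and then rounded down to a page boundary.
 */
uint64_t sketch_load_bias(int has_interp, uint64_t rnd, uint64_t first_vaddr)
{
	uint64_t bias = 0;

	if (has_interp)
		bias = SKETCH_ET_DYN_BASE + rnd;

	/* Unsigned arithmetic: wraps modulo 2^64 when first_vaddr > bias. */
	return SKETCH_PAGESTART(bias - first_vaddr);
}
```

For a PIE whose first LOAD segment has vaddr 0 and a random offset of 0x1234000, the bias is 0x101234000. For a loader the bias is 0 minus the first vaddr, and the mapping address is then chosen by mmap because MAP_FIXED is not set.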

+15 -28

fs/buffer.c

···
 }

 /*
- * The LRU management algorithm is dopey-but-simple.  Sorry.
+ * Install a buffer_head into this cpu's LRU.  If not already in the LRU, it is
+ * inserted at the front, and the buffer_head at the back if any is evicted.
+ * Or, if already in the LRU it is moved to the front.
  */
 static void bh_lru_install(struct buffer_head *bh)
 {
-	struct buffer_head *evictee = NULL;
+	struct buffer_head *evictee = bh;
+	struct bh_lru *b;
+	int i;

 	check_irqs_on();
 	bh_lru_lock();
-	if (__this_cpu_read(bh_lrus.bhs[0]) != bh) {
-		struct buffer_head *bhs[BH_LRU_SIZE];
-		int in;
-		int out = 0;

-		get_bh(bh);
-		bhs[out++] = bh;
-		for (in = 0; in < BH_LRU_SIZE; in++) {
-			struct buffer_head *bh2 =
-				__this_cpu_read(bh_lrus.bhs[in]);
-
-			if (bh2 == bh) {
-				__brelse(bh2);
-			} else {
-				if (out >= BH_LRU_SIZE) {
-					BUG_ON(evictee != NULL);
-					evictee = bh2;
-				} else {
-					bhs[out++] = bh2;
-				}
-			}
+	b = this_cpu_ptr(&bh_lrus);
+	for (i = 0; i < BH_LRU_SIZE; i++) {
+		swap(evictee, b->bhs[i]);
+		if (evictee == bh) {
+			bh_lru_unlock();
+			return;
 		}
-		while (out < BH_LRU_SIZE)
-			bhs[out++] = NULL;
-		memcpy(this_cpu_ptr(&bh_lrus.bhs), bhs, sizeof(bhs));
 	}
-	bh_lru_unlock();

-	if (evictee)
-		__brelse(evictee);
+	get_bh(bh);
+	bh_lru_unlock();
+	brelse(evictee);
 }

 /*
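
Reviewer note: the new bh_lru_install() replaces the copy-into-temporary-array approach with a single pass that carries one element down the array by repeated swaps. The user-space sketch below models only that ordering logic, under stated assumptions: `struct lru`, `LRU_SIZE` and `lru_install` are hypothetical names, items are opaque non-NULL pointers, and the reference counting (get_bh/brelse) and per-cpu locking of the real function are left out.

```c
#include <stddef.h>

#define LRU_SIZE 4

struct lru {
	const void *slot[LRU_SIZE];	/* slot[0] is most recently used */
};

/*
 * Put item at the front of the LRU. Each step stores the carried element
 * into slot i and picks up what was there. If the element picked up is
 * item itself, item was already cached and has now been moved to the
 * front, so nothing is evicted. Otherwise the element that falls off the
 * end is returned (NULL if the LRU still had room).
 */
const void *lru_install(struct lru *l, const void *item)
{
	const void *carry = item;

	for (int i = 0; i < LRU_SIZE; i++) {
		const void *tmp = l->slot[i];

		l->slot[i] = carry;
		carry = tmp;
		if (carry == item)
			return NULL;
	}
	return carry;
}
```

Starting from an LRU holding e, d, c, b (front to back), installing c yields c, e, d, b with no eviction, while installing a new element f would yield f, e, d, c and return b.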

+3 -2

fs/dcache.c

···
 		LIST_HEAD(dispose);

 		freed = list_lru_walk(&sb->s_dentry_lru,
-			dentry_lru_isolate_shrink, &dispose, UINT_MAX);
+			dentry_lru_isolate_shrink, &dispose, 1024);

 		this_cpu_sub(nr_dentry_unused, freed);
 		shrink_dentry_list(&dispose);
-	} while (freed > 0);
+		cond_resched();
+	} while (list_lru_count(&sb->s_dentry_lru) > 0);
 }
 EXPORT_SYMBOL(shrink_dcache_sb);


+10

fs/eventpoll.c

···
 			 * to TASK_INTERRUPTIBLE before doing the checks.
 			 */
 			set_current_state(TASK_INTERRUPTIBLE);
+			/*
+			 * Always short-circuit for fatal signals to allow
+			 * threads to make a timely exit without the chance of
+			 * finding more events available and fetching
+			 * repeatedly.
+			 */
+			if (fatal_signal_pending(current)) {
+				res = -EINTR;
+				break;
+			}
 			if (ep_events_available(ep) || timed_out)
 				break;
 			if (signal_pending(current)) {

+11

fs/hugetlbfs/inode.c

···
 	return MIGRATEPAGE_SUCCESS;
 }

+static int hugetlbfs_error_remove_page(struct address_space *mapping,
+				struct page *page)
+{
+	struct inode *inode = mapping->host;
+
+	remove_huge_page(page);
+	hugetlb_fix_reserve_counts(inode);
+	return 0;
+}
+
 static int hugetlbfs_statfs(struct dentry *dentry, struct kstatfs *buf)
 {
 	struct hugetlbfs_sb_info *sbinfo = HUGETLBFS_SB(dentry->d_sb);
···
 	.write_end	= hugetlbfs_write_end,
 	.set_page_dirty	= hugetlbfs_set_page_dirty,
 	.migratepage	= hugetlbfs_migrate_page,
+	.error_remove_page	= hugetlbfs_error_remove_page,
 };



+7 -25

fs/proc/generic.c

···
 }

 static DEFINE_IDA(proc_inum_ida);
-static DEFINE_SPINLOCK(proc_inum_lock); /* protects the above */

 #define PROC_DYNAMIC_FIRST 0xF0000000U

···
  */
 int proc_alloc_inum(unsigned int *inum)
 {
-	unsigned int i;
-	int error;
+	int i;

-retry:
-	if (!ida_pre_get(&proc_inum_ida, GFP_KERNEL))
-		return -ENOMEM;
+	i = ida_simple_get(&proc_inum_ida, 0, UINT_MAX - PROC_DYNAMIC_FIRST + 1,
+			   GFP_KERNEL);
+	if (i < 0)
+		return i;

-	spin_lock_irq(&proc_inum_lock);
-	error = ida_get_new(&proc_inum_ida, &i);
-	spin_unlock_irq(&proc_inum_lock);
-	if (error == -EAGAIN)
-		goto retry;
-	else if (error)
-		return error;
-
-	if (i > UINT_MAX - PROC_DYNAMIC_FIRST) {
-		spin_lock_irq(&proc_inum_lock);
-		ida_remove(&proc_inum_ida, i);
-		spin_unlock_irq(&proc_inum_lock);
-		return -ENOSPC;
-	}
-	*inum = PROC_DYNAMIC_FIRST + i;
+	*inum = PROC_DYNAMIC_FIRST + (unsigned int)i;
 	return 0;
 }

 void proc_free_inum(unsigned int inum)
 {
-	unsigned long flags;
-	spin_lock_irqsave(&proc_inum_lock, flags);
-	ida_remove(&proc_inum_ida, inum - PROC_DYNAMIC_FIRST);
-	spin_unlock_irqrestore(&proc_inum_lock, flags);
+	ida_simple_remove(&proc_inum_ida, inum - PROC_DYNAMIC_FIRST);
 }

 /*

-1

fs/proc/task_mmu.c

···
 		pgoff = ((loff_t)vma->vm_pgoff) << PAGE_SHIFT;
 	}

-	/* We don't show the stack guard page in /proc/maps */
 	start = vma->vm_start;
 	end = vma->vm_end;


+1

include/asm-generic/bug.h

···

 /* used internally by panic.c */
 struct warn_args;
+struct pt_regs;

 void __warn(const char *file, int line, void *caller, unsigned taint,
 	    struct pt_regs *regs, struct warn_args *args);

+1 -14

include/linux/backing-dev.h

···
 	return percpu_counter_read_positive(&wb->stat[item]);
 }

-static inline s64 __wb_stat_sum(struct bdi_writeback *wb,
-				enum wb_stat_item item)
-{
-	return percpu_counter_sum_positive(&wb->stat[item]);
-}
-
 static inline s64 wb_stat_sum(struct bdi_writeback *wb, enum wb_stat_item item)
 {
-	s64 sum;
-	unsigned long flags;
-
-	local_irq_save(flags);
-	sum = __wb_stat_sum(wb, item);
-	local_irq_restore(flags);
-
-	return sum;
+	return percpu_counter_sum_positive(&wb->stat[item]);
 }

 extern void wb_writeout_inc(struct bdi_writeback *wb);

+27 -6

include/linux/bitmap.h

··· 112 112 extern int __bitmap_subset(const unsigned long *bitmap1, 113 113 const unsigned long *bitmap2, unsigned int nbits); 114 114 extern int __bitmap_weight(const unsigned long *bitmap, unsigned int nbits); 115 - 116 - extern void bitmap_set(unsigned long *map, unsigned int start, int len); 117 - extern void bitmap_clear(unsigned long *map, unsigned int start, int len); 115 + extern void __bitmap_set(unsigned long *map, unsigned int start, int len); 116 + extern void __bitmap_clear(unsigned long *map, unsigned int start, int len); 118 117 119 118 extern unsigned long bitmap_find_next_zero_area_off(unsigned long *map, 120 119 unsigned long size, ··· 266 267 { 267 268 if (small_const_nbits(nbits)) 268 269 return !((*src1 ^ *src2) & BITMAP_LAST_WORD_MASK(nbits)); 269 - #ifdef CONFIG_S390 270 - if (__builtin_constant_p(nbits) && (nbits % BITS_PER_LONG) == 0) 270 + if (__builtin_constant_p(nbits & 7) && IS_ALIGNED(nbits, 8)) 271 271 return !memcmp(src1, src2, nbits / 8); 272 - #endif 273 272 return __bitmap_equal(src1, src2, nbits); 274 273 } 275 274 ··· 310 313 if (small_const_nbits(nbits)) 311 314 return hweight_long(*src & BITMAP_LAST_WORD_MASK(nbits)); 312 315 return __bitmap_weight(src, nbits); 316 + } 317 + 318 + static __always_inline void bitmap_set(unsigned long *map, unsigned int start, 319 + unsigned int nbits) 320 + { 321 + if (__builtin_constant_p(nbits) && nbits == 1) 322 + __set_bit(start, map); 323 + else if (__builtin_constant_p(start & 7) && IS_ALIGNED(start, 8) && 324 + __builtin_constant_p(nbits & 7) && IS_ALIGNED(nbits, 8)) 325 + memset((char *)map + start / 8, 0xff, nbits / 8); 326 + else 327 + __bitmap_set(map, start, nbits); 328 + } 329 + 330 + static __always_inline void bitmap_clear(unsigned long *map, unsigned int start, 331 + unsigned int nbits) 332 + { 333 + if (__builtin_constant_p(nbits) && nbits == 1) 334 + __clear_bit(start, map); 335 + else if (__builtin_constant_p(start & 7) && IS_ALIGNED(start, 8) && 336 + 
__builtin_constant_p(nbits & 7) && IS_ALIGNED(nbits, 8)) 337 + memset((char *)map + start / 8, 0, nbits / 8); 338 + else 339 + __bitmap_clear(map, start, nbits); 313 340 } 314 341 315 342 static inline void bitmap_shift_right(unsigned long *dst, const unsigned long *src,
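Note on the byte-aligned fast path above: when both `start` and `nbits` are multiples of 8, the range covers whole bytes, so a `memset()` can replace the word-at-a-time loop. The following is a minimal userspace sketch of that idea, not the kernel code. It assumes a bitmap stored as a byte array (so the bit/byte mapping does not depend on endianness, unlike `unsigned long` words), and the names `bits_set`, `bits_set_slow` and `bits_set_paths_agree` are hypothetical. The kernel gates the fast path on `__builtin_constant_p()` so the choice is made at compile time; the sketch checks at run time.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reference path: set bits one at a time (stands in for __bitmap_set()). */
static void bits_set_slow(uint8_t *map, unsigned int start, unsigned int nbits)
{
	for (unsigned int i = start; i < start + nbits; i++)
		map[i / 8] |= (uint8_t)(1u << (i % 8));
}

/*
 * Fast path: if start and nbits are both multiples of 8, the range
 * covers whole bytes and memset() does the job.
 */
static void bits_set(uint8_t *map, unsigned int start, unsigned int nbits)
{
	assert(map != NULL);
	if (start % 8 == 0 && nbits % 8 == 0)
		memset(map + start / 8, 0xff, nbits / 8);
	else
		bits_set_slow(map, start, nbits);
}

/* Returns 1 if both paths agree for every byte-aligned range of a 128-bit map. */
static int bits_set_paths_agree(void)
{
	uint8_t a[16], b[16];

	for (unsigned int start = 0; start < 128; start += 8) {
		for (unsigned int nbits = 0; nbits <= 128 - start; nbits += 8) {
			memset(a, 0x5a, sizeof(a));
			memset(b, 0x5a, sizeof(b));
			bits_set(a, start, nbits);
			bits_set_slow(b, start, nbits);
			if (memcmp(a, b, sizeof(a)) != 0)
				return 0;
		}
	}
	return 1;
}
```

The self-check mirrors the approach of `test_mem_optimisations()` in lib/test_bitmap.c from this series: run the optimised and the generic path on identical inputs and compare the results.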

+1 -71

include/linux/bug.h

··· 3 3 4 4 #include <asm/bug.h> 5 5 #include <linux/compiler.h> 6 + #include <linux/build_bug.h> 6 7 7 8 enum bug_trap_type { 8 9 BUG_TRAP_TYPE_NONE = 0, ··· 14 13 struct pt_regs; 15 14 16 15 #ifdef __CHECKER__ 17 - #define __BUILD_BUG_ON_NOT_POWER_OF_2(n) (0) 18 - #define BUILD_BUG_ON_NOT_POWER_OF_2(n) (0) 19 - #define BUILD_BUG_ON_ZERO(e) (0) 20 - #define BUILD_BUG_ON_NULL(e) ((void*)0) 21 - #define BUILD_BUG_ON_INVALID(e) (0) 22 - #define BUILD_BUG_ON_MSG(cond, msg) (0) 23 - #define BUILD_BUG_ON(condition) (0) 24 - #define BUILD_BUG() (0) 25 16 #define MAYBE_BUILD_BUG_ON(cond) (0) 26 17 #else /* __CHECKER__ */ 27 - 28 - /* Force a compilation error if a constant expression is not a power of 2 */ 29 - #define __BUILD_BUG_ON_NOT_POWER_OF_2(n) \ 30 - BUILD_BUG_ON(((n) & ((n) - 1)) != 0) 31 - #define BUILD_BUG_ON_NOT_POWER_OF_2(n) \ 32 - BUILD_BUG_ON((n) == 0 || (((n) & ((n) - 1)) != 0)) 33 - 34 - /* Force a compilation error if condition is true, but also produce a 35 - result (of value 0 and type size_t), so the expression can be used 36 - e.g. in a structure initializer (or where-ever else comma expressions 37 - aren't permitted). */ 38 - #define BUILD_BUG_ON_ZERO(e) (sizeof(struct { int:-!!(e); })) 39 - #define BUILD_BUG_ON_NULL(e) ((void *)sizeof(struct { int:-!!(e); })) 40 - 41 - /* 42 - * BUILD_BUG_ON_INVALID() permits the compiler to check the validity of the 43 - * expression but avoids the generation of any code, even if that expression 44 - * has side-effects. 45 - */ 46 - #define BUILD_BUG_ON_INVALID(e) ((void)(sizeof((__force long)(e)))) 47 - 48 - /** 49 - * BUILD_BUG_ON_MSG - break compile if a condition is true & emit supplied 50 - * error message. 51 - * @condition: the condition which the compiler should know is false. 52 - * 53 - * See BUILD_BUG_ON for description. 54 - */ 55 - #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg) 56 - 57 - /** 58 - * BUILD_BUG_ON - break compile if a condition is true. 
59 - * @condition: the condition which the compiler should know is false. 60 - * 61 - * If you have some code which relies on certain constants being equal, or 62 - * some other compile-time-evaluated condition, you should use BUILD_BUG_ON to 63 - * detect if someone changes it. 64 - * 65 - * The implementation uses gcc's reluctance to create a negative array, but gcc 66 - * (as of 4.4) only emits that error for obvious cases (e.g. not arguments to 67 - * inline functions). Luckily, in 4.3 they added the "error" function 68 - * attribute just for this type of case. Thus, we use a negative sized array 69 - * (should always create an error on gcc versions older than 4.4) and then call 70 - * an undefined function with the error attribute (should always create an 71 - * error on gcc 4.3 and later). If for some reason, neither creates a 72 - * compile-time error, we'll still have a link-time error, which is harder to 73 - * track down. 74 - */ 75 - #ifndef __OPTIMIZE__ 76 - #define BUILD_BUG_ON(condition) ((void)sizeof(char[1 - 2*!!(condition)])) 77 - #else 78 - #define BUILD_BUG_ON(condition) \ 79 - BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition) 80 - #endif 81 - 82 - /** 83 - * BUILD_BUG - break compile if used. 84 - * 85 - * If you have some code that you expect the compiler to eliminate at 86 - * build time, you should use BUILD_BUG to detect if it is 87 - * unexpectedly used. 88 - */ 89 - #define BUILD_BUG() BUILD_BUG_ON_MSG(1, "BUILD_BUG failed") 90 18 91 19 #define MAYBE_BUILD_BUG_ON(cond) \ 92 20 do { \

+84

include/linux/build_bug.h

··· 1 + #ifndef _LINUX_BUILD_BUG_H 2 + #define _LINUX_BUILD_BUG_H 3 + 4 + #include <linux/compiler.h> 5 + 6 + #ifdef __CHECKER__ 7 + #define __BUILD_BUG_ON_NOT_POWER_OF_2(n) (0) 8 + #define BUILD_BUG_ON_NOT_POWER_OF_2(n) (0) 9 + #define BUILD_BUG_ON_ZERO(e) (0) 10 + #define BUILD_BUG_ON_NULL(e) ((void *)0) 11 + #define BUILD_BUG_ON_INVALID(e) (0) 12 + #define BUILD_BUG_ON_MSG(cond, msg) (0) 13 + #define BUILD_BUG_ON(condition) (0) 14 + #define BUILD_BUG() (0) 15 + #else /* __CHECKER__ */ 16 + 17 + /* Force a compilation error if a constant expression is not a power of 2 */ 18 + #define __BUILD_BUG_ON_NOT_POWER_OF_2(n) \ 19 + BUILD_BUG_ON(((n) & ((n) - 1)) != 0) 20 + #define BUILD_BUG_ON_NOT_POWER_OF_2(n) \ 21 + BUILD_BUG_ON((n) == 0 || (((n) & ((n) - 1)) != 0)) 22 + 23 + /* 24 + * Force a compilation error if condition is true, but also produce a 25 + * result (of value 0 and type size_t), so the expression can be used 26 + * e.g. in a structure initializer (or where-ever else comma expressions 27 + * aren't permitted). 28 + */ 29 + #define BUILD_BUG_ON_ZERO(e) (sizeof(struct { int:(-!!(e)); })) 30 + #define BUILD_BUG_ON_NULL(e) ((void *)sizeof(struct { int:(-!!(e)); })) 31 + 32 + /* 33 + * BUILD_BUG_ON_INVALID() permits the compiler to check the validity of the 34 + * expression but avoids the generation of any code, even if that expression 35 + * has side-effects. 36 + */ 37 + #define BUILD_BUG_ON_INVALID(e) ((void)(sizeof((__force long)(e)))) 38 + 39 + /** 40 + * BUILD_BUG_ON_MSG - break compile if a condition is true & emit supplied 41 + * error message. 42 + * @condition: the condition which the compiler should know is false. 43 + * 44 + * See BUILD_BUG_ON for description. 45 + */ 46 + #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg) 47 + 48 + /** 49 + * BUILD_BUG_ON - break compile if a condition is true. 50 + * @condition: the condition which the compiler should know is false. 
51 + * 52 + * If you have some code which relies on certain constants being equal, or 53 + * some other compile-time-evaluated condition, you should use BUILD_BUG_ON to 54 + * detect if someone changes it. 55 + * 56 + * The implementation uses gcc's reluctance to create a negative array, but gcc 57 + * (as of 4.4) only emits that error for obvious cases (e.g. not arguments to 58 + * inline functions). Luckily, in 4.3 they added the "error" function 59 + * attribute just for this type of case. Thus, we use a negative sized array 60 + * (should always create an error on gcc versions older than 4.4) and then call 61 + * an undefined function with the error attribute (should always create an 62 + * error on gcc 4.3 and later). If for some reason, neither creates a 63 + * compile-time error, we'll still have a link-time error, which is harder to 64 + * track down. 65 + */ 66 + #ifndef __OPTIMIZE__ 67 + #define BUILD_BUG_ON(condition) ((void)sizeof(char[1 - 2*!!(condition)])) 68 + #else 69 + #define BUILD_BUG_ON(condition) \ 70 + BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition) 71 + #endif 72 + 73 + /** 74 + * BUILD_BUG - break compile if used. 75 + * 76 + * If you have some code that you expect the compiler to eliminate at 77 + * build time, you should use BUILD_BUG to detect if it is 78 + * unexpectedly used. 79 + */ 80 + #define BUILD_BUG() BUILD_BUG_ON_MSG(1, "BUILD_BUG failed") 81 + 82 + #endif /* __CHECKER__ */ 83 + 84 + #endif /* _LINUX_BUILD_BUG_H */
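Note on the negative-array trick described in the BUILD_BUG_ON comment above: a minimal userspace sketch, assuming a constant condition. The `SKETCH_` macro names, `RING_SIZE` and `ring_index()` are hypothetical and only illustrate the non-`__OPTIMIZE__` variant; the `compiletime_assert()` path is not reproduced here.

```c
#include <assert.h>

/* A true condition yields char[-1], which the compiler rejects. */
#define SKETCH_BUILD_BUG_ON(cond) ((void)sizeof(char[1 - 2 * !!(cond)]))

/* Same power-of-2 test as BUILD_BUG_ON_NOT_POWER_OF_2() in the hunk above. */
#define SKETCH_BUILD_BUG_ON_NOT_POWER_OF_2(n) \
	SKETCH_BUILD_BUG_ON((n) == 0 || (((n) & ((n) - 1)) != 0))

#define RING_SIZE 64	/* hypothetical constant; change to 48 to see the build fail */

static unsigned int ring_index(unsigned int pos)
{
	/* Compile time: a non-power-of-2 RING_SIZE never gets past the compiler. */
	SKETCH_BUILD_BUG_ON_NOT_POWER_OF_2(RING_SIZE);

	/* Run-time counterpart, for contrast: fires only if this line executes. */
	assert((RING_SIZE & (RING_SIZE - 1)) == 0);

	/* Masking equals modulo only because RING_SIZE is a power of 2. */
	return pos & (RING_SIZE - 1);
}
```

Because `sizeof` does not evaluate its operand, the compile-time check generates no code.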

-5

include/linux/dax.h

@@ -154,11 +154,6 @@
 #endif
 int dax_pfn_mkwrite(struct vm_fault *vmf);
 
-static inline bool vma_is_dax(struct vm_area_struct *vma)
-{
-	return vma->vm_file && IS_DAX(vma->vm_file->f_mapping->host);
-}
-
 static inline bool dax_mapping(struct address_space *mapping)
 {
 	return mapping->host && IS_DAX(mapping->host);

+3 -2

include/linux/extable.h

@@ -2,13 +2,14 @@
 #define _LINUX_EXTABLE_H
 
 #include <linux/stddef.h>	/* for NULL */
+#include <linux/types.h>
 
 struct module;
 struct exception_table_entry;
 
 const struct exception_table_entry *
-search_extable(const struct exception_table_entry *first,
-	       const struct exception_table_entry *last,
+search_extable(const struct exception_table_entry *base,
+	       const size_t num,
 	       unsigned long value);
 void sort_extable(struct exception_table_entry *start,
 		  struct exception_table_entry *finish);

+6

include/linux/fs.h

··· 18 18 #include <linux/bug.h> 19 19 #include <linux/mutex.h> 20 20 #include <linux/rwsem.h> 21 + #include <linux/mm_types.h> 21 22 #include <linux/capability.h> 22 23 #include <linux/semaphore.h> 23 24 #include <linux/fcntl.h> ··· 3126 3125 static inline bool io_is_direct(struct file *filp) 3127 3126 { 3128 3127 return (filp->f_flags & O_DIRECT) || IS_DAX(filp->f_mapping->host); 3128 + } 3129 + 3130 + static inline bool vma_is_dax(struct vm_area_struct *vma) 3131 + { 3132 + return vma->vm_file && IS_DAX(vma->vm_file->f_mapping->host); 3129 3133 } 3130 3134 3131 3135 static inline int iocb_flags(struct file *file)

+34 -11

include/linux/huge_mm.h

··· 1 1 #ifndef _LINUX_HUGE_MM_H 2 2 #define _LINUX_HUGE_MM_H 3 3 4 + #include <linux/sched/coredump.h> 5 + 6 + #include <linux/fs.h> /* only for vma_is_dax() */ 7 + 4 8 extern int do_huge_pmd_anonymous_page(struct vm_fault *vmf); 5 9 extern int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm, 6 10 pmd_t *dst_pmd, pmd_t *src_pmd, unsigned long addr, ··· 89 85 90 86 extern bool is_vma_temporary_stack(struct vm_area_struct *vma); 91 87 92 - #define transparent_hugepage_enabled(__vma) \ 93 - ((transparent_hugepage_flags & \ 94 - (1<<TRANSPARENT_HUGEPAGE_FLAG) || \ 95 - (transparent_hugepage_flags & \ 96 - (1<<TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG) && \ 97 - ((__vma)->vm_flags & VM_HUGEPAGE))) && \ 98 - !((__vma)->vm_flags & VM_NOHUGEPAGE) && \ 99 - !is_vma_temporary_stack(__vma)) 88 + extern unsigned long transparent_hugepage_flags; 89 + 90 + static inline bool transparent_hugepage_enabled(struct vm_area_struct *vma) 91 + { 92 + if (vma->vm_flags & VM_NOHUGEPAGE) 93 + return false; 94 + 95 + if (is_vma_temporary_stack(vma)) 96 + return false; 97 + 98 + if (test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags)) 99 + return false; 100 + 101 + if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_FLAG)) 102 + return true; 103 + 104 + if (vma_is_dax(vma)) 105 + return true; 106 + 107 + if (transparent_hugepage_flags & 108 + (1 << TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG)) 109 + return !!(vma->vm_flags & VM_HUGEPAGE); 110 + 111 + return false; 112 + } 113 + 100 114 #define transparent_hugepage_use_zero_page() \ 101 115 (transparent_hugepage_flags & \ 102 116 (1<<TRANSPARENT_HUGEPAGE_USE_ZERO_PAGE_FLAG)) ··· 125 103 #else /* CONFIG_DEBUG_VM */ 126 104 #define transparent_hugepage_debug_cow() 0 127 105 #endif /* CONFIG_DEBUG_VM */ 128 - 129 - extern unsigned long transparent_hugepage_flags; 130 106 131 107 extern unsigned long thp_get_unmapped_area(struct file *filp, 132 108 unsigned long addr, unsigned long len, unsigned long pgoff, ··· 244 224 245 225 #define 
hpage_nr_pages(x) 1 246 226 247 - #define transparent_hugepage_enabled(__vma) 0 227 + static inline bool transparent_hugepage_enabled(struct vm_area_struct *vma) 228 + { 229 + return false; 230 + } 248 231 249 232 static inline void prep_transhuge_page(struct page *page) {} 250 233

+30 -9

include/linux/hugetlb.h

··· 116 116 vm_flags_t vm_flags); 117 117 long hugetlb_unreserve_pages(struct inode *inode, long start, long end, 118 118 long freed); 119 - int dequeue_hwpoisoned_huge_page(struct page *page); 120 119 bool isolate_huge_page(struct page *page, struct list_head *list); 121 120 void putback_active_hugepage(struct page *page); 122 121 void free_huge_page(struct page *page); ··· 191 192 #define hugetlb_mcopy_atomic_pte(dst_mm, dst_pte, dst_vma, dst_addr, \ 192 193 src_addr, pagep) ({ BUG(); 0; }) 193 194 #define huge_pte_offset(mm, address, sz) 0 194 - static inline int dequeue_hwpoisoned_huge_page(struct page *page) 195 - { 196 - return 0; 197 - } 198 195 199 196 static inline bool isolate_huge_page(struct page *page, struct list_head *list) 200 197 { ··· 349 354 struct page *alloc_huge_page_node(struct hstate *h, int nid); 350 355 struct page *alloc_huge_page_noerr(struct vm_area_struct *vma, 351 356 unsigned long addr, int avoid_reserve); 357 + struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid, 358 + nodemask_t *nmask); 352 359 int huge_add_to_page_cache(struct page *page, struct address_space *mapping, 353 360 pgoff_t idx); 354 361 ··· 469 472 return __basepage_index(page); 470 473 } 471 474 475 + extern int dissolve_free_huge_page(struct page *page); 472 476 extern int dissolve_free_huge_pages(unsigned long start_pfn, 473 477 unsigned long end_pfn); 474 478 static inline bool hugepage_migration_supported(struct hstate *h) ··· 526 528 struct hstate {}; 527 529 #define alloc_huge_page(v, a, r) NULL 528 530 #define alloc_huge_page_node(h, nid) NULL 531 + #define alloc_huge_page_nodemask(h, preferred_nid, nmask) NULL 529 532 #define alloc_huge_page_noerr(v, a, r) NULL 530 533 #define alloc_bootmem_huge_page(h) NULL 531 534 #define hstate_file(f) NULL ··· 549 550 { 550 551 return 1; 551 552 } 552 - #define hstate_index_to_shift(index) 0 553 - #define hstate_index(h) 0 553 + 554 + static inline unsigned hstate_index_to_shift(unsigned index) 555 + 
{ 556 + return 0; 557 + } 558 + 559 + static inline int hstate_index(struct hstate *h) 560 + { 561 + return 0; 562 + } 554 563 555 564 static inline pgoff_t basepage_index(struct page *page) 556 565 { 557 566 return page->index; 558 567 } 559 - #define dissolve_free_huge_pages(s, e) 0 560 - #define hugepage_migration_supported(h) false 568 + 569 + static inline int dissolve_free_huge_page(struct page *page) 570 + { 571 + return 0; 572 + } 573 + 574 + static inline int dissolve_free_huge_pages(unsigned long start_pfn, 575 + unsigned long end_pfn) 576 + { 577 + return 0; 578 + } 579 + 580 + static inline bool hugepage_migration_supported(struct hstate *h) 581 + { 582 + return false; 583 + } 561 584 562 585 static inline spinlock_t *huge_pte_lockptr(struct hstate *h, 563 586 struct mm_struct *mm, pte_t *pte)

+3

include/linux/initrd.h

@@ -10,6 +10,9 @@
 /* starting block # of image */
 extern int rd_image_start;
 
+/* size of a single RAM disk */
+extern unsigned long rd_size;
+
 /* 1 if it is not an error if initrd_start < memory_start */
 extern int initrd_below_start_ok;
 

+2 -1

include/linux/khugepaged.h

@@ -48,7 +48,8 @@
 	if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags))
 		if ((khugepaged_always() ||
 		     (khugepaged_req_madv() && (vm_flags & VM_HUGEPAGE))) &&
-		    !(vm_flags & VM_NOHUGEPAGE))
+		    !(vm_flags & VM_NOHUGEPAGE) &&
+		    !test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
 			if (__khugepaged_enter(vma->vm_mm))
 				return -ENOMEM;
 	return 0;

+1

include/linux/list_lru.h

@@ -44,6 +44,7 @@
 	/* for cgroup aware lrus points to per cgroup lists, otherwise NULL */
 	struct list_lru_memcg *memcg_lrus;
 #endif
+	long nr_items;
 } ____cacheline_aligned_in_smp;
 
 struct list_lru {

+16

include/linux/migrate.h

··· 4 4 #include <linux/mm.h> 5 5 #include <linux/mempolicy.h> 6 6 #include <linux/migrate_mode.h> 7 + #include <linux/hugetlb.h> 7 8 8 9 typedef struct page *new_page_t(struct page *page, unsigned long private, 9 10 int **reason); ··· 30 29 31 30 /* In mm/debug.c; also keep sync with include/trace/events/migrate.h */ 32 31 extern char *migrate_reason_names[MR_TYPES]; 32 + 33 + static inline struct page *new_page_nodemask(struct page *page, 34 + int preferred_nid, nodemask_t *nodemask) 35 + { 36 + gfp_t gfp_mask = GFP_USER | __GFP_MOVABLE; 37 + 38 + if (PageHuge(page)) 39 + return alloc_huge_page_nodemask(page_hstate(compound_head(page)), 40 + preferred_nid, nodemask); 41 + 42 + if (PageHighMem(page) || (zone_idx(page_zone(page)) == ZONE_MOVABLE)) 43 + gfp_mask |= __GFP_HIGHMEM; 44 + 45 + return __alloc_pages_nodemask(gfp_mask, 0, preferred_nid, nodemask); 46 + } 33 47 34 48 #ifdef CONFIG_MIGRATION 35 49

+3 -5

include/linux/mmzone.h

··· 603 603 #endif 604 604 605 605 /* 606 - * The pg_data_t structure is used in machines with CONFIG_DISCONTIGMEM 607 - * (mostly NUMA machines?) to denote a higher-level memory zone than the 608 - * zone denotes. 609 - * 610 606 * On NUMA machines, each NUMA node would have a pg_data_t to describe 611 - * it's memory layout. 607 + * it's memory layout. On UMA machines there is a single pglist_data which 608 + * describes the whole memory. 612 609 * 613 610 * Memory statistics and page replacement data structures are maintained on a 614 611 * per-zone basis. ··· 1055 1058 !defined(CONFIG_HAVE_MEMBLOCK_NODE_MAP) 1056 1059 static inline unsigned long early_pfn_to_nid(unsigned long pfn) 1057 1060 { 1061 + BUILD_BUG_ON(IS_ENABLED(CONFIG_NUMA)); 1058 1062 return 0; 1059 1063 } 1060 1064 #endif

+1

include/linux/page_ref.h

@@ -174,6 +174,7 @@
 	VM_BUG_ON_PAGE(page_count(page) != 0, page);
 	VM_BUG_ON(count == 0);
 
+	smp_mb();
 	atomic_set(&page->_refcount, count);
 	if (page_ref_tracepoint_active(__tracepoint_page_ref_unfreeze))
 		__page_ref_unfreeze(page, count);

+4 -1

include/linux/sched/coredump.h

··· 68 68 #define MMF_OOM_SKIP 21 /* mm is of no interest for the OOM killer */ 69 69 #define MMF_UNSTABLE 22 /* mm is unstable for copy_from_user */ 70 70 #define MMF_HUGE_ZERO_PAGE 23 /* mm has ever used the global huge zero page */ 71 + #define MMF_DISABLE_THP 24 /* disable THP for all VMAs */ 72 + #define MMF_DISABLE_THP_MASK (1 << MMF_DISABLE_THP) 71 73 72 - #define MMF_INIT_MASK (MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK) 74 + #define MMF_INIT_MASK (MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\ 75 + MMF_DISABLE_THP_MASK) 73 76 74 77 #endif /* _LINUX_SCHED_COREDUMP_H */

+4 -2

include/linux/swap.h

··· 277 277 extern void lru_add_drain(void); 278 278 extern void lru_add_drain_cpu(int cpu); 279 279 extern void lru_add_drain_all(void); 280 + extern void lru_add_drain_all_cpuslocked(void); 280 281 extern void rotate_reclaimable_page(struct page *page); 281 282 extern void deactivate_file_page(struct page *page); 282 283 extern void mark_page_lazyfree(struct page *page); ··· 332 331 #include <linux/blk_types.h> /* for bio_end_io_t */ 333 332 334 333 /* linux/mm/page_io.c */ 335 - extern int swap_readpage(struct page *); 334 + extern int swap_readpage(struct page *page, bool do_poll); 336 335 extern int swap_writepage(struct page *page, struct writeback_control *wbc); 337 336 extern void end_swap_bio_write(struct bio *bio); 338 337 extern int __swap_writepage(struct page *page, struct writeback_control *wbc, ··· 363 362 extern void free_pages_and_swap_cache(struct page **, int); 364 363 extern struct page *lookup_swap_cache(swp_entry_t); 365 364 extern struct page *read_swap_cache_async(swp_entry_t, gfp_t, 366 - struct vm_area_struct *vma, unsigned long addr); 365 + struct vm_area_struct *vma, unsigned long addr, 366 + bool do_poll); 367 367 extern struct page *__read_swap_cache_async(swp_entry_t, gfp_t, 368 368 struct vm_area_struct *vma, unsigned long addr, 369 369 bool *new_page_allocated);

-9

include/linux/swapops.h

··· 196 196 atomic_long_dec(&num_poisoned_pages); 197 197 } 198 198 199 - static inline void num_poisoned_pages_add(long num) 200 - { 201 - atomic_long_add(num, &num_poisoned_pages); 202 - } 203 - 204 - static inline void num_poisoned_pages_sub(long num) 205 - { 206 - atomic_long_sub(num, &num_poisoned_pages); 207 - } 208 199 #else 209 200 210 201 static inline swp_entry_t make_hwpoison_entry(struct page *page)

+1 -1

include/trace/events/mmflags.h

··· 257 257 258 258 COMPACTION_STATUS 259 259 COMPACTION_PRIORITY 260 - COMPACTION_FEEDBACK 260 + /* COMPACTION_FEEDBACK are defines not enums. Not needed here. */ 261 261 ZONE_TYPE 262 262 LRU_NAMES 263 263

+80

include/trace/events/oom.h

··· 70 70 __entry->wmark_check) 71 71 ); 72 72 73 + TRACE_EVENT(mark_victim, 74 + TP_PROTO(int pid), 75 + 76 + TP_ARGS(pid), 77 + 78 + TP_STRUCT__entry( 79 + __field(int, pid) 80 + ), 81 + 82 + TP_fast_assign( 83 + __entry->pid = pid; 84 + ), 85 + 86 + TP_printk("pid=%d", __entry->pid) 87 + ); 88 + 89 + TRACE_EVENT(wake_reaper, 90 + TP_PROTO(int pid), 91 + 92 + TP_ARGS(pid), 93 + 94 + TP_STRUCT__entry( 95 + __field(int, pid) 96 + ), 97 + 98 + TP_fast_assign( 99 + __entry->pid = pid; 100 + ), 101 + 102 + TP_printk("pid=%d", __entry->pid) 103 + ); 104 + 105 + TRACE_EVENT(start_task_reaping, 106 + TP_PROTO(int pid), 107 + 108 + TP_ARGS(pid), 109 + 110 + TP_STRUCT__entry( 111 + __field(int, pid) 112 + ), 113 + 114 + TP_fast_assign( 115 + __entry->pid = pid; 116 + ), 117 + 118 + TP_printk("pid=%d", __entry->pid) 119 + ); 120 + 121 + TRACE_EVENT(finish_task_reaping, 122 + TP_PROTO(int pid), 123 + 124 + TP_ARGS(pid), 125 + 126 + TP_STRUCT__entry( 127 + __field(int, pid) 128 + ), 129 + 130 + TP_fast_assign( 131 + __entry->pid = pid; 132 + ), 133 + 134 + TP_printk("pid=%d", __entry->pid) 135 + ); 136 + 137 + TRACE_EVENT(skip_task_reaping, 138 + TP_PROTO(int pid), 139 + 140 + TP_ARGS(pid), 141 + 142 + TP_STRUCT__entry( 143 + __field(int, pid) 144 + ), 145 + 146 + TP_fast_assign( 147 + __entry->pid = pid; 148 + ), 149 + 150 + TP_printk("pid=%d", __entry->pid) 151 + ); 152 + 73 153 #ifdef CONFIG_COMPACTION 74 154 TRACE_EVENT(compact_retry, 75 155

+4

kernel/exit.c

@@ -1639,6 +1639,10 @@
 			__WNOTHREAD|__WCLONE|__WALL))
 		return -EINVAL;
 
+	/* -INT_MIN is not defined */
+	if (upid == INT_MIN)
+		return -ESRCH;
+
 	if (upid == -1)
 		type = PIDTYPE_MAX;
 	else if (upid < 0) {

+2 -1

kernel/extable.c

@@ -55,7 +55,8 @@
 {
 	const struct exception_table_entry *e;
 
-	e = search_extable(__start___ex_table, __stop___ex_table-1, addr);
+	e = search_extable(__start___ex_table,
+			   __stop___ex_table - __start___ex_table, addr);
 	if (!e)
 		e = search_module_extables(addr);
 	return e;

+11 -24

kernel/groups.c

··· 5 5 #include <linux/export.h> 6 6 #include <linux/slab.h> 7 7 #include <linux/security.h> 8 + #include <linux/sort.h> 8 9 #include <linux/syscalls.h> 9 10 #include <linux/user_namespace.h> 10 11 #include <linux/vmalloc.h> ··· 77 76 return 0; 78 77 } 79 78 80 - /* a simple Shell sort */ 79 + static int gid_cmp(const void *_a, const void *_b) 80 + { 81 + kgid_t a = *(kgid_t *)_a; 82 + kgid_t b = *(kgid_t *)_b; 83 + 84 + return gid_gt(a, b) - gid_lt(a, b); 85 + } 86 + 81 87 static void groups_sort(struct group_info *group_info) 82 88 { 83 - int base, max, stride; 84 - int gidsetsize = group_info->ngroups; 85 - 86 - for (stride = 1; stride < gidsetsize; stride = 3 * stride + 1) 87 - ; /* nothing */ 88 - stride /= 3; 89 - 90 - while (stride) { 91 - max = gidsetsize - stride; 92 - for (base = 0; base < max; base++) { 93 - int left = base; 94 - int right = left + stride; 95 - kgid_t tmp = group_info->gid[right]; 96 - 97 - while (left >= 0 && gid_gt(group_info->gid[left], tmp)) { 98 - group_info->gid[right] = group_info->gid[left]; 99 - right = left; 100 - left -= stride; 101 - } 102 - group_info->gid[right] = tmp; 103 - } 104 - stride /= 3; 105 - } 89 + sort(group_info->gid, group_info->ngroups, sizeof(*group_info->gid), 90 + gid_cmp, NULL); 106 91 } 107 92 108 93 /* a simple bsearch */
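Note on `gid_cmp()` above: the `gid_gt(a, b) - gid_lt(a, b)` form yields -1, 0 or 1 without subtracting the values themselves, so it cannot overflow or wrap for large unsigned ids. A minimal userspace sketch of the same comparator shape follows, using the C library `qsort()` in place of the kernel's `sort()`. The type `gid_value` is a hypothetical stand-in for `kgid_t`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef unsigned int gid_value;	/* hypothetical stand-in for kgid_t */

/* Three-way compare built from two boolean tests; never subtracts the ids. */
static int gid_value_cmp(const void *_a, const void *_b)
{
	gid_value a = *(const gid_value *)_a;
	gid_value b = *(const gid_value *)_b;

	return (a > b) - (a < b);
}

static void gid_values_sort(gid_value *gids, size_t n)
{
	if (n == 0)
		return;
	assert(gids != NULL);
	qsort(gids, n, sizeof(*gids), gid_value_cmp);
}
```

A comparator written as `return a - b;` would misorder ids whose difference does not fit in an `int`, for example 4000000000 and 7.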

+2 -8

kernel/kallsyms.c

··· 28 28 29 29 #include <asm/sections.h> 30 30 31 - #ifdef CONFIG_KALLSYMS_ALL 32 - #define all_var 1 33 - #else 34 - #define all_var 0 35 - #endif 36 - 37 31 /* 38 32 * These will be re-linked against their real values 39 33 * during the second link stage. ··· 76 82 77 83 static int is_ksym_addr(unsigned long addr) 78 84 { 79 - if (all_var) 85 + if (IS_ENABLED(CONFIG_KALLSYMS_ALL)) 80 86 return is_kernel(addr); 81 87 82 88 return is_kernel_text(addr) || is_kernel_inittext(addr); ··· 274 280 if (!symbol_end) { 275 281 if (is_kernel_inittext(addr)) 276 282 symbol_end = (unsigned long)_einittext; 277 - else if (all_var) 283 + else if (IS_ENABLED(CONFIG_KALLSYMS_ALL)) 278 284 symbol_end = (unsigned long)_end; 279 285 else 280 286 symbol_end = (unsigned long)_etext;

+1 -1

kernel/ksysfs.c

@@ -234,7 +234,7 @@
 	NULL
 };
 
-static struct attribute_group kernel_attr_group = {
+static const struct attribute_group kernel_attr_group = {
 	.attrs = kernel_attrs,
 };
 

+1 -1

kernel/module.c

@@ -4196,7 +4196,7 @@
 		goto out;
 
 	e = search_extable(mod->extable,
-			   mod->extable + mod->num_exentries - 1,
+			   mod->num_exentries,
 			   addr);
 out:
 	preempt_enable();

+4

kernel/signal.c

@@ -1402,6 +1402,10 @@
 		return ret;
 	}
 
+	/* -INT_MIN is undefined. Exclude this case to avoid a UBSAN warning */
+	if (pid == INT_MIN)
+		return -ESRCH;
+
 	read_lock(&tasklist_lock);
 	if (pid != -1) {
 		ret = __kill_pgrp_info(sig, info,

+3 -3

kernel/sys.c

··· 2360 2360 case PR_GET_THP_DISABLE: 2361 2361 if (arg2 || arg3 || arg4 || arg5) 2362 2362 return -EINVAL; 2363 - error = !!(me->mm->def_flags & VM_NOHUGEPAGE); 2363 + error = !!test_bit(MMF_DISABLE_THP, &me->mm->flags); 2364 2364 break; 2365 2365 case PR_SET_THP_DISABLE: 2366 2366 if (arg3 || arg4 || arg5) ··· 2368 2368 if (down_write_killable(&me->mm->mmap_sem)) 2369 2369 return -EINTR; 2370 2370 if (arg2) 2371 - me->mm->def_flags |= VM_NOHUGEPAGE; 2371 + set_bit(MMF_DISABLE_THP, &me->mm->flags); 2372 2372 else 2373 - me->mm->def_flags &= ~VM_NOHUGEPAGE; 2373 + clear_bit(MMF_DISABLE_THP, &me->mm->flags); 2374 2374 up_write(&me->mm->mmap_sem); 2375 2375 break; 2376 2376 case PR_MPX_ENABLE_MANAGEMENT:

+1 -1

lib/Kconfig.debug

@@ -1594,7 +1594,7 @@
 
 config INTERVAL_TREE_TEST
 	tristate "Interval tree test"
-	depends on m && DEBUG_KERNEL
+	depends on DEBUG_KERNEL
 	select INTERVAL_TREE
 	help
 	  A benchmark measuring the performance of the interval tree library

+4 -4

lib/bitmap.c

··· 251 251 } 252 252 EXPORT_SYMBOL(__bitmap_weight); 253 253 254 - void bitmap_set(unsigned long *map, unsigned int start, int len) 254 + void __bitmap_set(unsigned long *map, unsigned int start, int len) 255 255 { 256 256 unsigned long *p = map + BIT_WORD(start); 257 257 const unsigned int size = start + len; ··· 270 270 *p |= mask_to_set; 271 271 } 272 272 } 273 - EXPORT_SYMBOL(bitmap_set); 273 + EXPORT_SYMBOL(__bitmap_set); 274 274 275 - void bitmap_clear(unsigned long *map, unsigned int start, int len) 275 + void __bitmap_clear(unsigned long *map, unsigned int start, int len) 276 276 { 277 277 unsigned long *p = map + BIT_WORD(start); 278 278 const unsigned int size = start + len; ··· 291 291 *p &= ~mask_to_clear; 292 292 } 293 293 } 294 - EXPORT_SYMBOL(bitmap_clear); 294 + EXPORT_SYMBOL(__bitmap_clear); 295 295 296 296 /** 297 297 * bitmap_find_next_zero_area_off - find a contiguous aligned zero area

+12 -10

lib/bsearch.c

@@ -33,19 +33,21 @@
 void *bsearch(const void *key, const void *base, size_t num, size_t size,
 	      int (*cmp)(const void *key, const void *elt))
 {
-	size_t start = 0, end = num;
+	const char *pivot;
 	int result;
 
-	while (start < end) {
-		size_t mid = start + (end - start) / 2;
+	while (num > 0) {
+		pivot = base + (num >> 1) * size;
+		result = cmp(key, pivot);
 
-		result = cmp(key, base + mid * size);
-		if (result < 0)
-			end = mid;
-		else if (result > 0)
-			start = mid + 1;
-		else
-			return (void *)base + mid * size;
+		if (result == 0)
+			return (void *)pivot;
+
+		if (result > 0) {
+			base = pivot + size;
+			num--;
+		}
+		num >>= 1;
 	}
 
 	return NULL;
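Note on the rewritten loop above: instead of tracking `start` and `end` indices, it keeps a moving `base` and a shrinking element count `num`. When the key is greater than the pivot, the pivot itself is dropped (`num--`) before halving, so the remaining count always matches the elements still in play. Below is a userspace transcription for experimentation. The names `sketch_bsearch` and `int_key_cmp` are hypothetical, and a `const char *` cursor replaces the arithmetic on `void *` (a GNU extension that the kernel relies on).

```c
#include <assert.h>
#include <stddef.h>

static void *sketch_bsearch(const void *key, const void *base, size_t num,
			    size_t size,
			    int (*cmp)(const void *key, const void *elt))
{
	const char *lo = base;	/* first element still under consideration */

	assert(cmp != NULL);

	while (num > 0) {
		const char *pivot = lo + (num >> 1) * size;
		int result = cmp(key, pivot);

		if (result == 0)
			return (void *)pivot;

		if (result > 0) {
			/* Continue right of the pivot; the pivot is excluded. */
			lo = pivot + size;
			num--;
		}
		/* Either half holds num / 2 elements at this point. */
		num >>= 1;
	}

	return NULL;
}

/* Example comparator for int keys and int elements. */
static int int_key_cmp(const void *key, const void *elt)
{
	int k = *(const int *)key;
	int e = *(const int *)elt;

	return (k > e) - (k < e);
}
```

For five elements the pivot is index 2. A larger key leaves indices 3 and 4 (`num` goes 5, 4, 2) and a smaller key leaves indices 0 and 1 (`num` goes 5, 2).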

+21 -20

lib/extable.c

··· 9 9 * 2 of the License, or (at your option) any later version. 10 10 */ 11 11 12 + #include <linux/bsearch.h> 12 13 #include <linux/module.h> 13 14 #include <linux/init.h> 14 15 #include <linux/sort.h> ··· 52 51 * This is used both for the kernel exception table and for 53 52 * the exception tables of modules that get loaded. 54 53 */ 55 - static int cmp_ex(const void *a, const void *b) 54 + static int cmp_ex_sort(const void *a, const void *b) 56 55 { 57 56 const struct exception_table_entry *x = a, *y = b; 58 57 ··· 68 67 struct exception_table_entry *finish) 69 68 { 70 69 sort(start, finish - start, sizeof(struct exception_table_entry), 71 - cmp_ex, swap_ex); 70 + cmp_ex_sort, swap_ex); 72 71 } 73 72 74 73 #ifdef CONFIG_MODULES ··· 94 93 #endif /* !ARCH_HAS_SORT_EXTABLE */ 95 94 96 95 #ifndef ARCH_HAS_SEARCH_EXTABLE 96 + 97 + static int cmp_ex_search(const void *key, const void *elt) 98 + { 99 + const struct exception_table_entry *_elt = elt; 100 + unsigned long _key = *(unsigned long *)key; 101 + 102 + /* avoid overflow */ 103 + if (_key > ex_to_insn(_elt)) 104 + return 1; 105 + if (_key < ex_to_insn(_elt)) 106 + return -1; 107 + return 0; 108 + } 109 + 97 110 /* 98 111 * Search one exception table for an entry corresponding to the 99 112 * given instruction address, and return the address of the entry, ··· 116 101 * already sorted. 
117 102 */ 118 103 const struct exception_table_entry * 119 - search_extable(const struct exception_table_entry *first, 120 - const struct exception_table_entry *last, 104 + search_extable(const struct exception_table_entry *base, 105 + const size_t num, 121 106 unsigned long value) 122 107 { 123 - while (first <= last) { 124 - const struct exception_table_entry *mid; 125 - 126 - mid = ((last - first) >> 1) + first; 127 - /* 128 - * careful, the distance between value and insn 129 - * can be larger than MAX_LONG: 130 - */ 131 - if (ex_to_insn(mid) < value) 132 - first = mid + 1; 133 - else if (ex_to_insn(mid) > value) 134 - last = mid - 1; 135 - else 136 - return mid; 137 - } 138 - return NULL; 108 + return bsearch(&value, base, num, 109 + sizeof(struct exception_table_entry), cmp_ex_search); 139 110 } 140 111 #endif
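Note on `cmp_ex_search()` above: `bsearch()` calls the comparator with the key first and a table element second, so the two arguments may have different types. Here the key is a bare `unsigned long` address and the element is a table entry. The sketch below shows that pattern with the C library `bsearch()`. The struct `sketch_entry` is a hypothetical simplification that stores the instruction address directly, whereas the kernel resolves it through `ex_to_insn()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical simplified table entry with an absolute instruction address. */
struct sketch_entry {
	unsigned long insn;
	unsigned long fixup;
};

/* key points at an unsigned long, elt at a table entry. */
static int sketch_cmp_search(const void *key, const void *elt)
{
	unsigned long k = *(const unsigned long *)key;
	const struct sketch_entry *e = elt;

	/* Compare explicitly; a subtraction could wrap for distant addresses. */
	if (k > e->insn)
		return 1;
	if (k < e->insn)
		return -1;
	return 0;
}

/* The table must already be sorted by insn, as the original comment requires. */
static const struct sketch_entry *
sketch_search(const struct sketch_entry *base, size_t num, unsigned long value)
{
	if (num == 0)
		return NULL;
	assert(base != NULL);
	return bsearch(&value, base, num, sizeof(*base), sketch_cmp_search);
}
```

Passing a base pointer and an element count, rather than first and last pointers, is what lets the generic `bsearch()` be reused and makes an empty table (`num == 0`) easy to express.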

+62 -31

lib/interval_tree_test.c

··· 1 1 #include <linux/module.h> 2 + #include <linux/moduleparam.h> 2 3 #include <linux/interval_tree.h> 3 4 #include <linux/random.h> 5 + #include <linux/slab.h> 4 6 #include <asm/timex.h> 5 7 6 - #define NODES 100 7 - #define PERF_LOOPS 100000 8 - #define SEARCHES 100 9 - #define SEARCH_LOOPS 10000 8 + #define __param(type, name, init, msg) \ 9 + static type name = init; \ 10 + module_param(name, type, 0444); \ 11 + MODULE_PARM_DESC(name, msg); 12 + 13 + __param(int, nnodes, 100, "Number of nodes in the interval tree"); 14 + __param(int, perf_loops, 100000, "Number of iterations modifying the tree"); 15 + 16 + __param(int, nsearches, 100, "Number of searches to the interval tree"); 17 + __param(int, search_loops, 10000, "Number of iterations searching the tree"); 18 + __param(bool, search_all, false, "Searches will iterate all nodes in the tree"); 19 + 20 + __param(uint, max_endpoint, ~0, "Largest value for the interval's endpoint"); 10 21 11 22 static struct rb_root root = RB_ROOT; 12 - static struct interval_tree_node nodes[NODES]; 13 - static u32 queries[SEARCHES]; 23 + static struct interval_tree_node *nodes = NULL; 24 + static u32 *queries = NULL; 14 25 15 26 static struct rnd_state rnd; 16 27 17 28 static inline unsigned long 18 - search(unsigned long query, struct rb_root *root) 29 + search(struct rb_root *root, unsigned long start, unsigned long last) 19 30 { 20 31 struct interval_tree_node *node; 21 32 unsigned long results = 0; 22 33 23 - for (node = interval_tree_iter_first(root, query, query); node; 24 - node = interval_tree_iter_next(node, query, query)) 34 + for (node = interval_tree_iter_first(root, start, last); node; 35 + node = interval_tree_iter_next(node, start, last)) 25 36 results++; 26 37 return results; 27 38 } ··· 40 29 static void init(void) 41 30 { 42 31 int i; 43 - for (i = 0; i < NODES; i++) { 44 - u32 a = prandom_u32_state(&rnd); 45 - u32 b = prandom_u32_state(&rnd); 46 - if (a <= b) { 47 - nodes[i].start = a; 48 - nodes[i].last = 
b; 49 - } else { 50 - nodes[i].start = b; 51 - nodes[i].last = a; 52 - } 32 + 33 + for (i = 0; i < nnodes; i++) { 34 + u32 b = (prandom_u32_state(&rnd) >> 4) % max_endpoint; 35 + u32 a = (prandom_u32_state(&rnd) >> 4) % b; 36 + 37 + nodes[i].start = a; 38 + nodes[i].last = b; 53 39 } 54 - for (i = 0; i < SEARCHES; i++) 55 - queries[i] = prandom_u32_state(&rnd); 40 + 41 + /* 42 + * Limit the search scope to what the user defined. 43 + * Otherwise we are merely measuring empty walks, 44 + * which is pointless. 45 + */ 46 + for (i = 0; i < nsearches; i++) 47 + queries[i] = (prandom_u32_state(&rnd) >> 4) % max_endpoint; 56 48 } 57 49 58 50 static int interval_tree_test_init(void) ··· 64 50 unsigned long results; 65 51 cycles_t time1, time2, time; 66 52 53 + nodes = kmalloc(nnodes * sizeof(struct interval_tree_node), GFP_KERNEL); 54 + if (!nodes) 55 + return -ENOMEM; 56 + 57 + queries = kmalloc(nsearches * sizeof(int), GFP_KERNEL); 58 + if (!queries) { 59 + kfree(nodes); 60 + return -ENOMEM; 61 + } 62 + 67 63 printk(KERN_ALERT "interval tree insert/remove"); 68 64 69 65 prandom_seed_state(&rnd, 3141592653589793238ULL); ··· 81 57 82 58 time1 = get_cycles(); 83 59 84 - for (i = 0; i < PERF_LOOPS; i++) { 85 - for (j = 0; j < NODES; j++) 60 + for (i = 0; i < perf_loops; i++) { 61 + for (j = 0; j < nnodes; j++) 86 62 interval_tree_insert(nodes + j, &root); 87 - for (j = 0; j < NODES; j++) 63 + for (j = 0; j < nnodes; j++) 88 64 interval_tree_remove(nodes + j, &root); 89 65 } 90 66 91 67 time2 = get_cycles(); 92 68 time = time2 - time1; 93 69 94 - time = div_u64(time, PERF_LOOPS); 70 + time = div_u64(time, perf_loops); 95 71 printk(" -> %llu cycles\n", (unsigned long long)time); 96 72 97 73 printk(KERN_ALERT "interval tree search"); 98 74 99 - for (j = 0; j < NODES; j++) 75 + for (j = 0; j < nnodes; j++) 100 76 interval_tree_insert(nodes + j, &root); 101 77 102 78 time1 = get_cycles(); 103 79 104 80 results = 0; 105 - for (i = 0; i < SEARCH_LOOPS; i++) 106 - for (j = 0; j < 
SEARCHES; j++) 107 - results += search(queries[j], &root); 81 + for (i = 0; i < search_loops; i++) 82 + for (j = 0; j < nsearches; j++) { 83 + unsigned long start = search_all ? 0 : queries[j]; 84 + unsigned long last = search_all ? max_endpoint : queries[j]; 85 + 86 + results += search(&root, start, last); 87 + } 108 88 109 89 time2 = get_cycles(); 110 90 time = time2 - time1; 111 91 112 - time = div_u64(time, SEARCH_LOOPS); 113 - results = div_u64(results, SEARCH_LOOPS); 92 + time = div_u64(time, search_loops); 93 + results = div_u64(results, search_loops); 114 94 printk(" -> %llu cycles (%lu results)\n", 115 95 (unsigned long long)time, results); 96 + 97 + kfree(queries); 98 + kfree(nodes); 116 99 117 100 return -EAGAIN; /* Fail will directly unload the module */ 118 101 }

+7 -5

lib/kstrtox.c

···
51 51
52 52 	res = 0;
53 53 	rv = 0;
54 - 	while (*s) {
54 + 	while (1) {
55 + 		unsigned int c = *s;
56 + 		unsigned int lc = c | 0x20; /* don't tolower() this line */
55 57 		unsigned int val;
56 58
57 - 		if ('0' <= *s && *s <= '9')
58 - 			val = *s - '0';
59 - 		else if ('a' <= _tolower(*s) && _tolower(*s) <= 'f')
60 - 			val = _tolower(*s) - 'a' + 10;
59 + 		if ('0' <= c && c <= '9')
60 + 			val = c - '0';
61 + 		else if ('a' <= lc && lc <= 'f')
62 + 			val = lc - 'a' + 10;
61 63 		else
62 64 			break;
63 65
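
The digit-decoding step in this hunk can be tried outside the kernel. The following is a minimal userspace sketch, not the kernel's implementation: hex_digit_value() and parse_hex_prefix() are hypothetical names, and overflow checking is deliberately left out. It shows why the loop can become while (1): a NUL byte OR-ed with 0x20 is a space, which is neither a decimal digit nor in 'a'..'f', so the terminator falls through to the break.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the value of one digit character, or -1 if it is neither a
 * decimal nor a hex digit. Hypothetical helper for illustration only.
 */
static int hex_digit_value(unsigned char ch)
{
	unsigned int c = ch;
	unsigned int lc = c | 0x20;	/* sets bit 5: 'A'..'F' -> 'a'..'f' */

	if ('0' <= c && c <= '9')
		return (int)(c - '0');
	if ('a' <= lc && lc <= 'f')
		return (int)(lc - 'a' + 10);
	return -1;
}

/*
 * Decode the leading hex digits of s into *res and return how many
 * digits were consumed. The loop has no explicit NUL test: the
 * terminator is rejected by hex_digit_value() like any other non-digit.
 * Overflow is not checked in this sketch.
 */
static unsigned int parse_hex_prefix(const char *s, unsigned long *res)
{
	unsigned int ndigits = 0;

	assert(s != NULL && res != NULL);
	*res = 0;
	while (1) {
		int val = hex_digit_value((unsigned char)*s);

		if (val < 0)
			break;
		*res = *res * 16 + (unsigned long)val;
		ndigits++;
		s++;
	}
	return ndigits;
}
```

Only 'A'..'F' and 'a'..'f' map into the range 'a'..'f' after the OR, so no other character is accepted by the second test.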

+3 -4

lib/rhashtable.c

···
211 211 	int i;
212 212
213 213 	size = sizeof(*tbl) + nbuckets * sizeof(tbl->buckets[0]);
214 - 	if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER) ||
215 - 	    gfp != GFP_KERNEL)
214 + 	if (gfp != GFP_KERNEL)
216 215 		tbl = kzalloc(size, gfp | __GFP_NOWARN | __GFP_NORETRY);
217 - 	if (tbl == NULL && gfp == GFP_KERNEL)
218 - 		tbl = vzalloc(size);
216 + 	else
217 + 		tbl = kvzalloc(size, gfp);
219 218
220 219 	size = nbuckets;
221 220

+29

lib/test_bitmap.c

··· 333 333 } 334 334 } 335 335 336 + static void noinline __init test_mem_optimisations(void) 337 + { 338 + DECLARE_BITMAP(bmap1, 1024); 339 + DECLARE_BITMAP(bmap2, 1024); 340 + unsigned int start, nbits; 341 + 342 + for (start = 0; start < 1024; start += 8) { 343 + memset(bmap1, 0x5a, sizeof(bmap1)); 344 + memset(bmap2, 0x5a, sizeof(bmap2)); 345 + for (nbits = 0; nbits < 1024 - start; nbits += 8) { 346 + bitmap_set(bmap1, start, nbits); 347 + __bitmap_set(bmap2, start, nbits); 348 + if (!bitmap_equal(bmap1, bmap2, 1024)) 349 + printk("set not equal %d %d\n", start, nbits); 350 + if (!__bitmap_equal(bmap1, bmap2, 1024)) 351 + printk("set not __equal %d %d\n", start, nbits); 352 + 353 + bitmap_clear(bmap1, start, nbits); 354 + __bitmap_clear(bmap2, start, nbits); 355 + if (!bitmap_equal(bmap1, bmap2, 1024)) 356 + printk("clear not equal %d %d\n", start, nbits); 357 + if (!__bitmap_equal(bmap1, bmap2, 1024)) 358 + printk("clear not __equal %d %d\n", start, 359 + nbits); 360 + } 361 + } 362 + } 363 + 336 364 static int __init test_bitmap_init(void) 337 365 { 338 366 test_zero_fill_copy(); 339 367 test_bitmap_u32_array_conversions(); 368 + test_mem_optimisations(); 340 369 341 370 if (failed_tests == 0) 342 371 pr_info("all %u tests passed\n", total_tests);

-1

mm/Kconfig

···
161 161 	bool "Allow for memory hot-add"
162 162 	depends on SPARSEMEM || X86_64_ACPI_NUMA
163 163 	depends on ARCH_ENABLE_MEMORY_HOTPLUG
164 - 	depends on COMPILE_TEST || !KASAN
165 164
166 165 config MEMORY_HOTPLUG_SPARSE
167 166 	def_bool y

+1 -1

mm/balloon_compaction.c

···
24 24 {
25 25 	unsigned long flags;
26 26 	struct page *page = alloc_page(balloon_mapping_gfp_mask() |
27 - 				       __GFP_NOMEMALLOC | __GFP_NORETRY);
27 + 				       __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_ZERO);
28 28 	if (!page)
29 29 		return NULL;
30 30

+9 -11

mm/cma.c

···
59 59 }
60 60
61 61 static unsigned long cma_bitmap_aligned_mask(const struct cma *cma,
62 - 					     int align_order)
62 + 					     unsigned int align_order)
63 63 {
64 64 	if (align_order <= cma->order_per_bit)
65 65 		return 0;
···
67 67 }
68 68
69 69 /*
70 -  * Find a PFN aligned to the specified order and return an offset represented in
71 -  * order_per_bits.
70 +  * Find the offset of the base PFN from the specified align_order.
71 +  * The value returned is represented in order_per_bits.
72 72  */
73 73 static unsigned long cma_bitmap_aligned_offset(const struct cma *cma,
74 - 					       int align_order)
74 + 					       unsigned int align_order)
75 75 {
76 - 	if (align_order <= cma->order_per_bit)
77 - 		return 0;
78 -
79 - 	return (ALIGN(cma->base_pfn, (1UL << align_order))
80 - 		- cma->base_pfn) >> cma->order_per_bit;
76 + 	return (cma->base_pfn & ((1UL << align_order) - 1))
77 + 		>> cma->order_per_bit;
81 78 }
82 79
83 80 static unsigned long cma_bitmap_pages_to_bits(const struct cma *cma,
···
124 127 			 * to be in the same zone.
125 128 			 */
126 129 			if (page_zone(pfn_to_page(pfn)) != zone)
127 - 				goto err;
130 + 				goto not_in_zone;
128 131 		}
129 132 		init_cma_reserved_pageblock(pfn_to_page(base_pfn));
130 133 	} while (--i);
···
138 141
139 142 	return 0;
140 143
141 - err:
144 + not_in_zone:
145 + 	pr_err("CMA area %s could not be activated\n", cma->name);
142 146 	kfree(cma->bitmap);
143 147 	cma->count = 0;
144 148 	return -EINVAL;
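
The rewritten cma_bitmap_aligned_offset() reduces to a mask and a shift: keep the low align_order bits of the base PFN, then express the result in units of 2^order_per_bit pages. A minimal userspace sketch of that arithmetic follows. aligned_offset() is a hypothetical stand-in that takes the two struct cma fields as plain arguments, and the assertion on the shift width is an addition of this sketch, not part of the patch.

```c
#include <assert.h>
#include <limits.h>

/*
 * Offset of base_pfn from the previous 2^align_order boundary,
 * expressed in bitmap bits (one bit per 2^order_per_bit pages).
 * Mirrors the expression in the hunk; the name and the flat argument
 * list are assumptions made for illustration.
 */
static unsigned long aligned_offset(unsigned long base_pfn,
				    unsigned int align_order,
				    unsigned int order_per_bit)
{
	/* Shifting by the full width of the type would be undefined. */
	assert(align_order < sizeof(unsigned long) * CHAR_BIT);
	assert(order_per_bit < sizeof(unsigned long) * CHAR_BIT);

	return (base_pfn & ((1UL << align_order) - 1)) >> order_per_bit;
}
```

For example, a base PFN of 0x1003 with align_order 4 and order_per_bit 0 has its low four bits equal to 3, so the offset is 3 bits; an already aligned base of 0x1000 gives 0.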

+5 -3

mm/filemap.c

···
239 239 	/* Leave page->index set: truncation lookup relies upon it */
240 240
241 241 	/* hugetlb pages do not participate in page cache accounting. */
242 - 	if (!PageHuge(page))
243 - 		__mod_node_page_state(page_pgdat(page), NR_FILE_PAGES, -nr);
242 + 	if (PageHuge(page))
243 + 		return;
244 +
245 + 	__mod_node_page_state(page_pgdat(page), NR_FILE_PAGES, -nr);
244 246 	if (PageSwapBacked(page)) {
245 247 		__mod_node_page_state(page_pgdat(page), NR_SHMEM, -nr);
246 248 		if (PageTransHuge(page))
247 249 			__dec_node_page_state(page, NR_SHMEM_THPS);
248 250 	} else {
249 - 		VM_BUG_ON_PAGE(PageTransHuge(page) && !PageHuge(page), page);
251 + 		VM_BUG_ON_PAGE(PageTransHuge(page), page);
250 252 	}
251 253
252 254 	/*

+111 -181

mm/hugetlb.c

··· 20 20 #include <linux/slab.h> 21 21 #include <linux/sched/signal.h> 22 22 #include <linux/rmap.h> 23 + #include <linux/string_helpers.h> 23 24 #include <linux/swap.h> 24 25 #include <linux/swapops.h> 25 - #include <linux/page-isolation.h> 26 26 #include <linux/jhash.h> 27 27 28 28 #include <asm/page.h> ··· 872 872 struct page *page; 873 873 874 874 list_for_each_entry(page, &h->hugepage_freelists[nid], lru) 875 - if (!is_migrate_isolate_page(page)) 875 + if (!PageHWPoison(page)) 876 876 break; 877 877 /* 878 878 * if 'non-isolated free hugepage' not found on the list, ··· 887 887 return page; 888 888 } 889 889 890 - static struct page *dequeue_huge_page_node(struct hstate *h, int nid) 890 + static struct page *dequeue_huge_page_nodemask(struct hstate *h, gfp_t gfp_mask, int nid, 891 + nodemask_t *nmask) 891 892 { 892 - struct page *page; 893 - int node; 893 + unsigned int cpuset_mems_cookie; 894 + struct zonelist *zonelist; 895 + struct zone *zone; 896 + struct zoneref *z; 897 + int node = -1; 894 898 895 - if (nid != NUMA_NO_NODE) 896 - return dequeue_huge_page_node_exact(h, nid); 899 + zonelist = node_zonelist(nid, gfp_mask); 897 900 898 - for_each_online_node(node) { 901 + retry_cpuset: 902 + cpuset_mems_cookie = read_mems_allowed_begin(); 903 + for_each_zone_zonelist_nodemask(zone, z, zonelist, gfp_zone(gfp_mask), nmask) { 904 + struct page *page; 905 + 906 + if (!cpuset_zone_allowed(zone, gfp_mask)) 907 + continue; 908 + /* 909 + * no need to ask again on the same node. 
Pool is node rather than 910 + * zone aware 911 + */ 912 + if (zone_to_nid(zone) == node) 913 + continue; 914 + node = zone_to_nid(zone); 915 + 899 916 page = dequeue_huge_page_node_exact(h, node); 900 917 if (page) 901 918 return page; 902 919 } 920 + if (unlikely(read_mems_allowed_retry(cpuset_mems_cookie))) 921 + goto retry_cpuset; 922 + 903 923 return NULL; 904 924 } 905 925 ··· 937 917 unsigned long address, int avoid_reserve, 938 918 long chg) 939 919 { 940 - struct page *page = NULL; 920 + struct page *page; 941 921 struct mempolicy *mpol; 942 - nodemask_t *nodemask; 943 922 gfp_t gfp_mask; 923 + nodemask_t *nodemask; 944 924 int nid; 945 - struct zonelist *zonelist; 946 - struct zone *zone; 947 - struct zoneref *z; 948 - unsigned int cpuset_mems_cookie; 949 925 950 926 /* 951 927 * A child process with MAP_PRIVATE mappings created by their parent ··· 956 940 if (avoid_reserve && h->free_huge_pages - h->resv_huge_pages == 0) 957 941 goto err; 958 942 959 - retry_cpuset: 960 - cpuset_mems_cookie = read_mems_allowed_begin(); 961 943 gfp_mask = htlb_alloc_mask(h); 962 944 nid = huge_node(vma, address, gfp_mask, &mpol, &nodemask); 963 - zonelist = node_zonelist(nid, gfp_mask); 964 - 965 - for_each_zone_zonelist_nodemask(zone, z, zonelist, 966 - MAX_NR_ZONES - 1, nodemask) { 967 - if (cpuset_zone_allowed(zone, gfp_mask)) { 968 - page = dequeue_huge_page_node(h, zone_to_nid(zone)); 969 - if (page) { 970 - if (avoid_reserve) 971 - break; 972 - if (!vma_has_reserves(vma, chg)) 973 - break; 974 - 975 - SetPagePrivate(page); 976 - h->resv_huge_pages--; 977 - break; 978 - } 979 - } 945 + page = dequeue_huge_page_nodemask(h, gfp_mask, nid, nodemask); 946 + if (page && !avoid_reserve && vma_has_reserves(vma, chg)) { 947 + SetPagePrivate(page); 948 + h->resv_huge_pages--; 980 949 } 981 950 982 951 mpol_cond_put(mpol); 983 - if (unlikely(!page && read_mems_allowed_retry(cpuset_mems_cookie))) 984 - goto retry_cpuset; 985 952 return page; 986 953 987 954 err: ··· 1459 1460 * 
number of free hugepages would be reduced below the number of reserved 1460 1461 * hugepages. 1461 1462 */ 1462 - static int dissolve_free_huge_page(struct page *page) 1463 + int dissolve_free_huge_page(struct page *page) 1463 1464 { 1464 1465 int rc = 0; 1465 1466 ··· 1471 1472 if (h->free_huge_pages - h->resv_huge_pages == 0) { 1472 1473 rc = -EBUSY; 1473 1474 goto out; 1475 + } 1476 + /* 1477 + * Move PageHWPoison flag from head page to the raw error page, 1478 + * which makes any subpages rather than the error page reusable. 1479 + */ 1480 + if (PageHWPoison(head) && page != head) { 1481 + SetPageHWPoison(page); 1482 + ClearPageHWPoison(head); 1474 1483 } 1475 1484 list_del(&head->lru); 1476 1485 h->free_huge_pages--; ··· 1520 1513 return rc; 1521 1514 } 1522 1515 1523 - /* 1524 - * There are 3 ways this can get called: 1525 - * 1. With vma+addr: we use the VMA's memory policy 1526 - * 2. With !vma, but nid=NUMA_NO_NODE: We try to allocate a huge 1527 - * page from any node, and let the buddy allocator itself figure 1528 - * it out. 1529 - * 3. With !vma, but nid!=NUMA_NO_NODE. We allocate a huge page 1530 - * strictly from 'nid' 1531 - */ 1532 1516 static struct page *__hugetlb_alloc_buddy_huge_page(struct hstate *h, 1533 - struct vm_area_struct *vma, unsigned long addr, int nid) 1517 + gfp_t gfp_mask, int nid, nodemask_t *nmask) 1534 1518 { 1535 1519 int order = huge_page_order(h); 1536 - gfp_t gfp = htlb_alloc_mask(h)|__GFP_COMP|__GFP_REPEAT|__GFP_NOWARN; 1537 - unsigned int cpuset_mems_cookie; 1538 1520 1539 - /* 1540 - * We need a VMA to get a memory policy. If we do not 1541 - * have one, we use the 'nid' argument. 1542 - * 1543 - * The mempolicy stuff below has some non-inlined bits 1544 - * and calls ->vm_ops. That makes it hard to optimize at 1545 - * compile-time, even when NUMA is off and it does 1546 - * nothing. This helps the compiler optimize it out. 
1547 - */ 1548 - if (!IS_ENABLED(CONFIG_NUMA) || !vma) { 1549 - /* 1550 - * If a specific node is requested, make sure to 1551 - * get memory from there, but only when a node 1552 - * is explicitly specified. 1553 - */ 1554 - if (nid != NUMA_NO_NODE) 1555 - gfp |= __GFP_THISNODE; 1556 - /* 1557 - * Make sure to call something that can handle 1558 - * nid=NUMA_NO_NODE 1559 - */ 1560 - return alloc_pages_node(nid, gfp, order); 1561 - } 1562 - 1563 - /* 1564 - * OK, so we have a VMA. Fetch the mempolicy and try to 1565 - * allocate a huge page with it. We will only reach this 1566 - * when CONFIG_NUMA=y. 1567 - */ 1568 - do { 1569 - struct page *page; 1570 - struct mempolicy *mpol; 1571 - int nid; 1572 - nodemask_t *nodemask; 1573 - 1574 - cpuset_mems_cookie = read_mems_allowed_begin(); 1575 - nid = huge_node(vma, addr, gfp, &mpol, &nodemask); 1576 - mpol_cond_put(mpol); 1577 - page = __alloc_pages_nodemask(gfp, order, nid, nodemask); 1578 - if (page) 1579 - return page; 1580 - } while (read_mems_allowed_retry(cpuset_mems_cookie)); 1581 - 1582 - return NULL; 1521 + gfp_mask |= __GFP_COMP|__GFP_REPEAT|__GFP_NOWARN; 1522 + if (nid == NUMA_NO_NODE) 1523 + nid = numa_mem_id(); 1524 + return __alloc_pages_nodemask(gfp_mask, order, nid, nmask); 1583 1525 } 1584 1526 1585 - /* 1586 - * There are two ways to allocate a huge page: 1587 - * 1. When you have a VMA and an address (like a fault) 1588 - * 2. When you have no VMA (like when setting /proc/.../nr_hugepages) 1589 - * 1590 - * 'vma' and 'addr' are only for (1). 'nid' is always NUMA_NO_NODE in 1591 - * this case which signifies that the allocation should be done with 1592 - * respect for the VMA's memory policy. 1593 - * 1594 - * For (2), we ignore 'vma' and 'addr' and use 'nid' exclusively. This 1595 - * implies that memory policies will not be taken in to account. 
1596 - */ 1597 - static struct page *__alloc_buddy_huge_page(struct hstate *h, 1598 - struct vm_area_struct *vma, unsigned long addr, int nid) 1527 + static struct page *__alloc_buddy_huge_page(struct hstate *h, gfp_t gfp_mask, 1528 + int nid, nodemask_t *nmask) 1599 1529 { 1600 1530 struct page *page; 1601 1531 unsigned int r_nid; ··· 1540 1596 if (hstate_is_gigantic(h)) 1541 1597 return NULL; 1542 1598 1543 - /* 1544 - * Make sure that anyone specifying 'nid' is not also specifying a VMA. 1545 - * This makes sure the caller is picking _one_ of the modes with which 1546 - * we can call this function, not both. 1547 - */ 1548 - if (vma || (addr != -1)) { 1549 - VM_WARN_ON_ONCE(addr == -1); 1550 - VM_WARN_ON_ONCE(nid != NUMA_NO_NODE); 1551 - } 1552 1599 /* 1553 1600 * Assume we will successfully allocate the surplus page to 1554 1601 * prevent racing processes from causing the surplus to exceed ··· 1573 1638 } 1574 1639 spin_unlock(&hugetlb_lock); 1575 1640 1576 - page = __hugetlb_alloc_buddy_huge_page(h, vma, addr, nid); 1641 + page = __hugetlb_alloc_buddy_huge_page(h, gfp_mask, nid, nmask); 1577 1642 1578 1643 spin_lock(&hugetlb_lock); 1579 1644 if (page) { ··· 1598 1663 } 1599 1664 1600 1665 /* 1601 - * Allocate a huge page from 'nid'. Note, 'nid' may be 1602 - * NUMA_NO_NODE, which means that it may be allocated 1603 - * anywhere. 1604 - */ 1605 - static 1606 - struct page *__alloc_buddy_huge_page_no_mpol(struct hstate *h, int nid) 1607 - { 1608 - unsigned long addr = -1; 1609 - 1610 - return __alloc_buddy_huge_page(h, NULL, addr, nid); 1611 - } 1612 - 1613 - /* 1614 1666 * Use the VMA's mpolicy to allocate a huge page from the buddy. 
1615 1667 */ 1616 1668 static 1617 1669 struct page *__alloc_buddy_huge_page_with_mpol(struct hstate *h, 1618 1670 struct vm_area_struct *vma, unsigned long addr) 1619 1671 { 1620 - return __alloc_buddy_huge_page(h, vma, addr, NUMA_NO_NODE); 1672 + struct page *page; 1673 + struct mempolicy *mpol; 1674 + gfp_t gfp_mask = htlb_alloc_mask(h); 1675 + int nid; 1676 + nodemask_t *nodemask; 1677 + 1678 + nid = huge_node(vma, addr, gfp_mask, &mpol, &nodemask); 1679 + page = __alloc_buddy_huge_page(h, gfp_mask, nid, nodemask); 1680 + mpol_cond_put(mpol); 1681 + 1682 + return page; 1621 1683 } 1622 1684 1623 1685 /* ··· 1624 1692 */ 1625 1693 struct page *alloc_huge_page_node(struct hstate *h, int nid) 1626 1694 { 1695 + gfp_t gfp_mask = htlb_alloc_mask(h); 1627 1696 struct page *page = NULL; 1697 + 1698 + if (nid != NUMA_NO_NODE) 1699 + gfp_mask |= __GFP_THISNODE; 1628 1700 1629 1701 spin_lock(&hugetlb_lock); 1630 1702 if (h->free_huge_pages - h->resv_huge_pages > 0) 1631 - page = dequeue_huge_page_node(h, nid); 1703 + page = dequeue_huge_page_nodemask(h, gfp_mask, nid, NULL); 1632 1704 spin_unlock(&hugetlb_lock); 1633 1705 1634 1706 if (!page) 1635 - page = __alloc_buddy_huge_page_no_mpol(h, nid); 1707 + page = __alloc_buddy_huge_page(h, gfp_mask, nid, NULL); 1636 1708 1637 1709 return page; 1710 + } 1711 + 1712 + 1713 + struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid, 1714 + nodemask_t *nmask) 1715 + { 1716 + gfp_t gfp_mask = htlb_alloc_mask(h); 1717 + 1718 + spin_lock(&hugetlb_lock); 1719 + if (h->free_huge_pages - h->resv_huge_pages > 0) { 1720 + struct page *page; 1721 + 1722 + page = dequeue_huge_page_nodemask(h, gfp_mask, preferred_nid, nmask); 1723 + if (page) { 1724 + spin_unlock(&hugetlb_lock); 1725 + return page; 1726 + } 1727 + } 1728 + spin_unlock(&hugetlb_lock); 1729 + 1730 + /* No reservations, try to overcommit */ 1731 + 1732 + return __alloc_buddy_huge_page(h, gfp_mask, preferred_nid, nmask); 1638 1733 } 1639 1734 1640 1735 /* ··· 
1689 1730 retry: 1690 1731 spin_unlock(&hugetlb_lock); 1691 1732 for (i = 0; i < needed; i++) { 1692 - page = __alloc_buddy_huge_page_no_mpol(h, NUMA_NO_NODE); 1733 + page = __alloc_buddy_huge_page(h, htlb_alloc_mask(h), 1734 + NUMA_NO_NODE, NULL); 1693 1735 if (!page) { 1694 1736 alloc_ok = false; 1695 1737 break; 1696 1738 } 1697 1739 list_add(&page->lru, &surplus_list); 1740 + cond_resched(); 1698 1741 } 1699 1742 allocated += i; 1700 1743 ··· 2165 2204 } else if (!alloc_fresh_huge_page(h, 2166 2205 &node_states[N_MEMORY])) 2167 2206 break; 2207 + cond_resched(); 2168 2208 } 2169 - h->max_huge_pages = i; 2209 + if (i < h->max_huge_pages) { 2210 + char buf[32]; 2211 + 2212 + string_get_size(huge_page_size(h), 1, STRING_UNITS_2, buf, 32); 2213 + pr_warn("HugeTLB: allocating %lu of page size %s failed. Only allocated %lu hugepages.\n", 2214 + h->max_huge_pages, buf, i); 2215 + h->max_huge_pages = i; 2216 + } 2170 2217 } 2171 2218 2172 2219 static void __init hugetlb_init_hstates(void) ··· 2192 2223 VM_BUG_ON(minimum_order == UINT_MAX); 2193 2224 } 2194 2225 2195 - static char * __init memfmt(char *buf, unsigned long n) 2196 - { 2197 - if (n >= (1UL << 30)) 2198 - sprintf(buf, "%lu GB", n >> 30); 2199 - else if (n >= (1UL << 20)) 2200 - sprintf(buf, "%lu MB", n >> 20); 2201 - else 2202 - sprintf(buf, "%lu KB", n >> 10); 2203 - return buf; 2204 - } 2205 - 2206 2226 static void __init report_hugepages(void) 2207 2227 { 2208 2228 struct hstate *h; 2209 2229 2210 2230 for_each_hstate(h) { 2211 2231 char buf[32]; 2232 + 2233 + string_get_size(huge_page_size(h), 1, STRING_UNITS_2, buf, 32); 2212 2234 pr_info("HugeTLB registered %s page size, pre-allocated %ld pages\n", 2213 - memfmt(buf, huge_page_size(h)), 2214 - h->free_huge_pages); 2235 + buf, h->free_huge_pages); 2215 2236 } 2216 2237 } 2217 2238 ··· 2760 2801 return 0; 2761 2802 2762 2803 if (!size_to_hstate(default_hstate_size)) { 2804 + if (default_hstate_size != 0) { 2805 + pr_err("HugeTLB: unsupported 
default_hugepagesz %lu. Reverting to %lu\n", 2806 + default_hstate_size, HPAGE_SIZE); 2807 + } 2808 + 2763 2809 default_hstate_size = HPAGE_SIZE; 2764 2810 if (!size_to_hstate(default_hstate_size)) 2765 2811 hugetlb_add_hstate(HUGETLB_PAGE_ORDER); ··· 4702 4738 4703 4739 return pte_page(*(pte_t *)pgd) + ((address & ~PGDIR_MASK) >> PAGE_SHIFT); 4704 4740 } 4705 - 4706 - #ifdef CONFIG_MEMORY_FAILURE 4707 - 4708 - /* 4709 - * This function is called from memory failure code. 4710 - */ 4711 - int dequeue_hwpoisoned_huge_page(struct page *hpage) 4712 - { 4713 - struct hstate *h = page_hstate(hpage); 4714 - int nid = page_to_nid(hpage); 4715 - int ret = -EBUSY; 4716 - 4717 - spin_lock(&hugetlb_lock); 4718 - /* 4719 - * Just checking !page_huge_active is not enough, because that could be 4720 - * an isolated/hwpoisoned hugepage (which have >0 refcount). 4721 - */ 4722 - if (!page_huge_active(hpage) && !page_count(hpage)) { 4723 - /* 4724 - * Hwpoisoned hugepage isn't linked to activelist or freelist, 4725 - * but dangling hpage->lru can trigger list-debug warnings 4726 - * (this happens when we call unpoison_memory() on it), 4727 - * so let it point to itself with list_del_init(). 4728 - */ 4729 - list_del_init(&hpage->lru); 4730 - set_page_refcounted(hpage); 4731 - h->free_huge_pages--; 4732 - h->free_huge_pages_node[nid]--; 4733 - ret = 0; 4734 - } 4735 - spin_unlock(&hugetlb_lock); 4736 - return ret; 4737 - } 4738 - #endif 4739 4741 4740 4742 bool isolate_huge_page(struct page *page, struct list_head *list) 4741 4743 {

+58 -94

mm/kasan/kasan.c

··· 134 134 return false; 135 135 } 136 136 137 - static __always_inline bool memory_is_poisoned_2(unsigned long addr) 137 + static __always_inline bool memory_is_poisoned_2_4_8(unsigned long addr, 138 + unsigned long size) 138 139 { 139 - u16 *shadow_addr = (u16 *)kasan_mem_to_shadow((void *)addr); 140 + u8 *shadow_addr = (u8 *)kasan_mem_to_shadow((void *)addr); 140 141 141 - if (unlikely(*shadow_addr)) { 142 - if (memory_is_poisoned_1(addr + 1)) 143 - return true; 142 + /* 143 + * Access crosses 8(shadow size)-byte boundary. Such access maps 144 + * into 2 shadow bytes, so we need to check them both. 145 + */ 146 + if (unlikely(((addr + size - 1) & KASAN_SHADOW_MASK) < size - 1)) 147 + return *shadow_addr || memory_is_poisoned_1(addr + size - 1); 144 148 145 - /* 146 - * If single shadow byte covers 2-byte access, we don't 147 - * need to do anything more. Otherwise, test the first 148 - * shadow byte. 149 - */ 150 - if (likely(((addr + 1) & KASAN_SHADOW_MASK) != 0)) 151 - return false; 152 - 153 - return unlikely(*(u8 *)shadow_addr); 154 - } 155 - 156 - return false; 157 - } 158 - 159 - static __always_inline bool memory_is_poisoned_4(unsigned long addr) 160 - { 161 - u16 *shadow_addr = (u16 *)kasan_mem_to_shadow((void *)addr); 162 - 163 - if (unlikely(*shadow_addr)) { 164 - if (memory_is_poisoned_1(addr + 3)) 165 - return true; 166 - 167 - /* 168 - * If single shadow byte covers 4-byte access, we don't 169 - * need to do anything more. Otherwise, test the first 170 - * shadow byte. 
171 - */ 172 - if (likely(((addr + 3) & KASAN_SHADOW_MASK) >= 3)) 173 - return false; 174 - 175 - return unlikely(*(u8 *)shadow_addr); 176 - } 177 - 178 - return false; 179 - } 180 - 181 - static __always_inline bool memory_is_poisoned_8(unsigned long addr) 182 - { 183 - u16 *shadow_addr = (u16 *)kasan_mem_to_shadow((void *)addr); 184 - 185 - if (unlikely(*shadow_addr)) { 186 - if (memory_is_poisoned_1(addr + 7)) 187 - return true; 188 - 189 - /* 190 - * If single shadow byte covers 8-byte access, we don't 191 - * need to do anything more. Otherwise, test the first 192 - * shadow byte. 193 - */ 194 - if (likely(IS_ALIGNED(addr, KASAN_SHADOW_SCALE_SIZE))) 195 - return false; 196 - 197 - return unlikely(*(u8 *)shadow_addr); 198 - } 199 - 200 - return false; 149 + return memory_is_poisoned_1(addr + size - 1); 201 150 } 202 151 203 152 static __always_inline bool memory_is_poisoned_16(unsigned long addr) 204 153 { 205 - u32 *shadow_addr = (u32 *)kasan_mem_to_shadow((void *)addr); 154 + u16 *shadow_addr = (u16 *)kasan_mem_to_shadow((void *)addr); 206 155 207 - if (unlikely(*shadow_addr)) { 208 - u16 shadow_first_bytes = *(u16 *)shadow_addr; 156 + /* Unaligned 16-bytes access maps into 3 shadow bytes. */ 157 + if (unlikely(!IS_ALIGNED(addr, KASAN_SHADOW_SCALE_SIZE))) 158 + return *shadow_addr || memory_is_poisoned_1(addr + 15); 209 159 210 - if (unlikely(shadow_first_bytes)) 211 - return true; 212 - 213 - /* 214 - * If two shadow bytes covers 16-byte access, we don't 215 - * need to do anything more. Otherwise, test the last 216 - * shadow byte. 
217 - */ 218 - if (likely(IS_ALIGNED(addr, KASAN_SHADOW_SCALE_SIZE))) 219 - return false; 220 - 221 - return memory_is_poisoned_1(addr + 15); 222 - } 223 - 224 - return false; 160 + return *shadow_addr; 225 161 } 226 162 227 - static __always_inline unsigned long bytes_is_zero(const u8 *start, 163 + static __always_inline unsigned long bytes_is_nonzero(const u8 *start, 228 164 size_t size) 229 165 { 230 166 while (size) { ··· 173 237 return 0; 174 238 } 175 239 176 - static __always_inline unsigned long memory_is_zero(const void *start, 240 + static __always_inline unsigned long memory_is_nonzero(const void *start, 177 241 const void *end) 178 242 { 179 243 unsigned int words; ··· 181 245 unsigned int prefix = (unsigned long)start % 8; 182 246 183 247 if (end - start <= 16) 184 - return bytes_is_zero(start, end - start); 248 + return bytes_is_nonzero(start, end - start); 185 249 186 250 if (prefix) { 187 251 prefix = 8 - prefix; 188 - ret = bytes_is_zero(start, prefix); 252 + ret = bytes_is_nonzero(start, prefix); 189 253 if (unlikely(ret)) 190 254 return ret; 191 255 start += prefix; ··· 194 258 words = (end - start) / 8; 195 259 while (words) { 196 260 if (unlikely(*(u64 *)start)) 197 - return bytes_is_zero(start, 8); 261 + return bytes_is_nonzero(start, 8); 198 262 start += 8; 199 263 words--; 200 264 } 201 265 202 - return bytes_is_zero(start, (end - start) % 8); 266 + return bytes_is_nonzero(start, (end - start) % 8); 203 267 } 204 268 205 269 static __always_inline bool memory_is_poisoned_n(unsigned long addr, ··· 207 271 { 208 272 unsigned long ret; 209 273 210 - ret = memory_is_zero(kasan_mem_to_shadow((void *)addr), 274 + ret = memory_is_nonzero(kasan_mem_to_shadow((void *)addr), 211 275 kasan_mem_to_shadow((void *)addr + size - 1) + 1); 212 276 213 277 if (unlikely(ret)) { ··· 228 292 case 1: 229 293 return memory_is_poisoned_1(addr); 230 294 case 2: 231 - return memory_is_poisoned_2(addr); 232 295 case 4: 233 - return memory_is_poisoned_4(addr); 234 296 
case 8: 235 - return memory_is_poisoned_8(addr); 297 + return memory_is_poisoned_2_4_8(addr, size); 236 298 case 16: 237 299 return memory_is_poisoned_16(addr); 238 300 default: ··· 737 803 EXPORT_SYMBOL(__asan_unpoison_stack_memory); 738 804 739 805 #ifdef CONFIG_MEMORY_HOTPLUG 740 - static int kasan_mem_notifier(struct notifier_block *nb, 806 + static int __meminit kasan_mem_notifier(struct notifier_block *nb, 741 807 unsigned long action, void *data) 742 808 { 743 - return (action == MEM_GOING_ONLINE) ? NOTIFY_BAD : NOTIFY_OK; 809 + struct memory_notify *mem_data = data; 810 + unsigned long nr_shadow_pages, start_kaddr, shadow_start; 811 + unsigned long shadow_end, shadow_size; 812 + 813 + nr_shadow_pages = mem_data->nr_pages >> KASAN_SHADOW_SCALE_SHIFT; 814 + start_kaddr = (unsigned long)pfn_to_kaddr(mem_data->start_pfn); 815 + shadow_start = (unsigned long)kasan_mem_to_shadow((void *)start_kaddr); 816 + shadow_size = nr_shadow_pages << PAGE_SHIFT; 817 + shadow_end = shadow_start + shadow_size; 818 + 819 + if (WARN_ON(mem_data->nr_pages % KASAN_SHADOW_SCALE_SIZE) || 820 + WARN_ON(start_kaddr % (KASAN_SHADOW_SCALE_SIZE << PAGE_SHIFT))) 821 + return NOTIFY_BAD; 822 + 823 + switch (action) { 824 + case MEM_GOING_ONLINE: { 825 + void *ret; 826 + 827 + ret = __vmalloc_node_range(shadow_size, PAGE_SIZE, shadow_start, 828 + shadow_end, GFP_KERNEL, 829 + PAGE_KERNEL, VM_NO_GUARD, 830 + pfn_to_nid(mem_data->start_pfn), 831 + __builtin_return_address(0)); 832 + if (!ret) 833 + return NOTIFY_BAD; 834 + 835 + kmemleak_ignore(ret); 836 + return NOTIFY_OK; 837 + } 838 + case MEM_OFFLINE: 839 + vfree((void *)shadow_start); 840 + } 841 + 842 + return NOTIFY_OK; 744 843 } 745 844 746 845 static int __init kasan_memhotplug_init(void) 747 846 { 748 - pr_info("WARNING: KASAN doesn't support memory hot-add\n"); 749 - pr_info("Memory hot-add will be disabled\n"); 750 - 751 847 hotplug_memory_notifier(kasan_mem_notifier, 0); 752 848 753 849 return 0;
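
The single helper memory_is_poisoned_2_4_8() relies on one predicate to decide whether an access touches two shadow bytes. A minimal userspace sketch of just that predicate is shown here, assuming the 8-byte shadow granule the comment in the hunk refers to. SHADOW_SCALE_SIZE, SHADOW_MASK, crosses_granule() and crosses_granule_ref() are names made up for this sketch; the reference version simply compares the granule index of the first and last byte.

```c
#include <assert.h>
#include <stdbool.h>

#define SHADOW_SCALE_SIZE 8UL	/* assumed: one shadow byte covers 8 bytes */
#define SHADOW_MASK (SHADOW_SCALE_SIZE - 1)

/*
 * Predicate from the hunk: the offset of the last accessed byte inside
 * its granule is smaller than size - 1 exactly when the first byte lies
 * in the previous granule.
 */
static bool crosses_granule(unsigned long addr, unsigned long size)
{
	assert(size >= 1 && size <= SHADOW_SCALE_SIZE);
	return ((addr + size - 1) & SHADOW_MASK) < size - 1;
}

/* Straightforward reference: do first and last byte share a granule? */
static bool crosses_granule_ref(unsigned long addr, unsigned long size)
{
	return (addr / SHADOW_SCALE_SIZE) !=
	       ((addr + size - 1) / SHADOW_SCALE_SIZE);
}
```

A 4-byte access at address 6 covers bytes 6..9, so the last byte sits at offset 1 in its granule, 1 < 3 holds, and both shadow bytes need checking. The same access at address 4 ends at offset 7 and stays within one granule.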

+12

mm/kasan/kasan_init.c

···
118 118
119 119 	do {
120 120 		next = p4d_addr_end(addr, end);
121 + 		if (IS_ALIGNED(addr, P4D_SIZE) && end - addr >= P4D_SIZE) {
122 + 			pud_t *pud;
123 + 			pmd_t *pmd;
124 +
125 + 			p4d_populate(&init_mm, p4d, lm_alias(kasan_zero_pud));
126 + 			pud = pud_offset(p4d, addr);
127 + 			pud_populate(&init_mm, pud, lm_alias(kasan_zero_pmd));
128 + 			pmd = pmd_offset(pud, addr);
129 + 			pmd_populate_kernel(&init_mm, pmd,
130 + 						lm_alias(kasan_zero_pte));
131 + 			continue;
132 + 		}
121 133
122 134 		if (p4d_none(*p4d)) {
123 135 			p4d_populate(&init_mm, p4d,

+1 -1

mm/kasan/report.c

···
107 107 	return bug_type;
108 108 }
109 109
110 - const char *get_wild_bug_type(struct kasan_access_info *info)
110 + static const char *get_wild_bug_type(struct kasan_access_info *info)
111 111 {
112 112 	const char *bug_type = "unknown-crash";
113 113

+2 -1

mm/khugepaged.c

···
816 816 static bool hugepage_vma_check(struct vm_area_struct *vma)
817 817 {
818 818 	if ((!(vma->vm_flags & VM_HUGEPAGE) && !khugepaged_always()) ||
819 - 	    (vma->vm_flags & VM_NOHUGEPAGE))
819 + 	    (vma->vm_flags & VM_NOHUGEPAGE) ||
820 + 	    test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
820 821 		return false;
821 822 	if (shmem_file(vma->vm_file)) {
822 823 		if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGE_PAGECACHE))

+6 -8

mm/list_lru.c

···
117 117 		l = list_lru_from_kmem(nlru, item);
118 118 		list_add_tail(item, &l->list);
119 119 		l->nr_items++;
120 + 		nlru->nr_items++;
120 121 		spin_unlock(&nlru->lock);
121 122 		return true;
122 123 	}
···
137 136 		l = list_lru_from_kmem(nlru, item);
138 137 		list_del_init(item);
139 138 		l->nr_items--;
139 + 		nlru->nr_items--;
140 140 		spin_unlock(&nlru->lock);
141 141 		return true;
142 142 	}
···
185 183
186 184 unsigned long list_lru_count_node(struct list_lru *lru, int nid)
187 185 {
188 - 	long count = 0;
189 - 	int memcg_idx;
186 + 	struct list_lru_node *nlru;
190 187
191 - 	count += __list_lru_count_one(lru, nid, -1);
192 - 	if (list_lru_memcg_aware(lru)) {
193 - 		for_each_memcg_cache_index(memcg_idx)
194 - 			count += __list_lru_count_one(lru, nid, memcg_idx);
195 - 	}
196 - 	return count;
188 + 	nlru = &lru->node[nid];
189 + 	return nlru->nr_items;
197 190 }
198 191 EXPORT_SYMBOL_GPL(list_lru_count_node);
199 192
···
223 226 			assert_spin_locked(&nlru->lock);
224 227 		case LRU_REMOVED:
225 228 			isolated++;
229 + 			nlru->nr_items--;
226 230 			/*
227 231 			 * If the lru lock has been dropped, our list
228 232 			 * traversal is now invalid and so we have to
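
These hunks keep a per-node nr_items total in step with the per-list counters, so list_lru_count_node() can return it directly instead of summing every list. The pattern can be sketched in userspace as follows. struct node_lru, NR_LISTS and the function names are hypothetical, locking is omitted, and the summing variant is included only as a cross-check for the running total.

```c
#include <assert.h>

#define NR_LISTS 4	/* hypothetical number of sub-lists per node */

struct node_lru {
	long list_items[NR_LISTS];	/* per-list counts, like l->nr_items */
	long nr_items;			/* running per-node total */
};

/* Every add updates both the per-list count and the node total. */
static void node_lru_add(struct node_lru *n, int idx)
{
	assert(idx >= 0 && idx < NR_LISTS);
	n->list_items[idx]++;
	n->nr_items++;
}

/* Every removal must decrement both, or the total drifts. */
static void node_lru_del(struct node_lru *n, int idx)
{
	assert(idx >= 0 && idx < NR_LISTS);
	n->list_items[idx]--;
	n->nr_items--;
}

/* Constant-time count: read the running total. */
static long node_lru_count(const struct node_lru *n)
{
	return n->nr_items;
}

/* Summing variant, linear in the number of lists. */
static long node_lru_count_slow(const struct node_lru *n)
{
	long sum = 0;

	for (int i = 0; i < NR_LISTS; i++)
		sum += n->list_items[i];
	return sum;
}
```

The cost of the shortcut is that every path that removes an item has to update the total as well, which is why the last hunk adds a decrement in the LRU_REMOVED case of the walk.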

+24 -22

mm/madvise.c

··· 205 205 continue; 206 206 207 207 page = read_swap_cache_async(entry, GFP_HIGHUSER_MOVABLE, 208 - vma, index); 208 + vma, index, false); 209 209 if (page) 210 210 put_page(page); 211 211 } ··· 246 246 } 247 247 swap = radix_to_swp_entry(page); 248 248 page = read_swap_cache_async(swap, GFP_HIGHUSER_MOVABLE, 249 - NULL, 0); 249 + NULL, 0, false); 250 250 if (page) 251 251 put_page(page); 252 252 } ··· 451 451 struct mm_struct *mm = vma->vm_mm; 452 452 struct mmu_gather tlb; 453 453 454 - if (vma->vm_flags & (VM_LOCKED|VM_HUGETLB|VM_PFNMAP)) 455 - return -EINVAL; 456 - 457 454 /* MADV_FREE works for only anon vma at the moment */ 458 455 if (!vma_is_anonymous(vma)) 459 456 return -EINVAL; ··· 474 477 return 0; 475 478 } 476 479 477 - static long madvise_free(struct vm_area_struct *vma, 478 - struct vm_area_struct **prev, 479 - unsigned long start, unsigned long end) 480 - { 481 - *prev = vma; 482 - return madvise_free_single_vma(vma, start, end); 483 - } 484 - 485 480 /* 486 481 * Application no longer needs these pages. If the pages are dirty, 487 482 * it's OK to just throw them away. The app will be more careful about ··· 493 504 * An interface that causes the system to free clean pages and flush 494 505 * dirty pages is already available as msync(MS_INVALIDATE). 495 506 */ 496 - static long madvise_dontneed(struct vm_area_struct *vma, 497 - struct vm_area_struct **prev, 498 - unsigned long start, unsigned long end) 507 + static long madvise_dontneed_single_vma(struct vm_area_struct *vma, 508 + unsigned long start, unsigned long end) 509 + { 510 + zap_page_range(vma, start, end - start); 511 + return 0; 512 + } 513 + 514 + static long madvise_dontneed_free(struct vm_area_struct *vma, 515 + struct vm_area_struct **prev, 516 + unsigned long start, unsigned long end, 517 + int behavior) 499 518 { 500 519 *prev = vma; 501 520 if (!can_madv_dontneed_vma(vma)) ··· 523 526 * is also < vma->vm_end. 
If start < 524 527 * vma->vm_start it means an hole materialized 525 528 * in the user address space within the 526 - * virtual range passed to MADV_DONTNEED. 529 + * virtual range passed to MADV_DONTNEED 530 + * or MADV_FREE. 527 531 */ 528 532 return -ENOMEM; 529 533 } ··· 535 537 * Don't fail if end > vma->vm_end. If the old 536 538 * vma was splitted while the mmap_sem was 537 539 * released the effect of the concurrent 538 - * operation may not cause MADV_DONTNEED to 540 + * operation may not cause madvise() to 539 541 * have an undefined result. There may be an 540 542 * adjacent next vma that we'll walk 541 543 * next. userfaultfd_remove() will generate an ··· 547 549 } 548 550 VM_WARN_ON(start >= end); 549 551 } 550 - zap_page_range(vma, start, end - start); 551 - return 0; 552 + 553 + if (behavior == MADV_DONTNEED) 554 + return madvise_dontneed_single_vma(vma, start, end); 555 + else if (behavior == MADV_FREE) 556 + return madvise_free_single_vma(vma, start, end); 557 + else 558 + return -EINVAL; 552 559 } 553 560 554 561 /* ··· 659 656 case MADV_WILLNEED: 660 657 return madvise_willneed(vma, prev, start, end); 661 658 case MADV_FREE: 662 - return madvise_free(vma, prev, start, end); 663 659 case MADV_DONTNEED: 664 - return madvise_dontneed(vma, prev, start, end); 660 + return madvise_dontneed_free(vma, prev, start, end, behavior); 665 661 default: 666 662 return madvise_behavior(vma, prev, start, end, behavior); 667 663 }

+33 -19

mm/memcontrol.c

···
631 631 val = __this_cpu_read(memcg->stat->nr_page_events);
632 632 next = __this_cpu_read(memcg->stat->targets[target]);
633 633 /* from time_after() in jiffies.h */
634 - if ((long)next - (long)val < 0) {
634 + if ((long)(next - val) < 0) {
635 635 switch (target) {
636 636 case MEM_CGROUP_TARGET_THRESH:
637 637 next = val + THRESHOLDS_EVENTS_TARGET;
···
5317 5317
5318 5318 /**
5319 5319 * mem_cgroup_low - check if memory consumption is below the normal range
5320 - * @root: the highest ancestor to consider
5320 + * @root: the top ancestor of the sub-tree being checked
5321 5321 * @memcg: the memory cgroup to check
5322 5322 *
5323 5323 * Returns %true if memory consumption of @memcg, and that of all
5324 - * configurable ancestors up to @root, is below the normal range.
5324 + * ancestors up to (but not including) @root, is below the normal range.
5325 + *
5326 + * @root is exclusive; it is never low when looked at directly and isn't
5327 + * checked when traversing the hierarchy.
5328 + *
5329 + * Excluding @root enables using memory.low to prioritize memory usage
5330 + * between cgroups within a subtree of the hierarchy that is limited by
5331 + * memory.high or memory.max.
5332 + *
5333 + * For example, given cgroup A with children B and C:
5334 + *
5335 + * A
5336 + * / \
5337 + * B C
5338 + *
5339 + * and
5340 + *
5341 + * 1. A/memory.current > A/memory.high
5342 + * 2. A/B/memory.current < A/B/memory.low
5343 + * 3. A/C/memory.current >= A/C/memory.low
5344 + *
5345 + * As 'A' is high, i.e. triggers reclaim from 'A', and 'B' is low, we
5346 + * should reclaim from 'C' until 'A' is no longer high or until we can
5347 + * no longer reclaim from 'C'. If 'A', i.e. @root, isn't excluded by
5348 + * mem_cgroup_low when reclaming from 'A', then 'B' won't be considered
5349 + * low and we will reclaim indiscriminately from both 'B' and 'C'.
5325 5350 */
5326 5351 bool mem_cgroup_low(struct mem_cgroup *root, struct mem_cgroup *memcg)
5327 5352 {
5328 5353 if (mem_cgroup_disabled())
5329 5354 return false;
5330 5355
5331 - /*
5332 - * The toplevel group doesn't have a configurable range, so
5333 - * it's never low when looked at directly, and it is not
5334 - * considered an ancestor when assessing the hierarchy.
5335 - */
5336 -
5337 - if (memcg == root_mem_cgroup)
5356 + if (!root)
5357 + root = root_mem_cgroup;
5358 + if (memcg == root)
5338 5359 return false;
5339 5360
5340 - if (page_counter_read(&memcg->memory) >= memcg->low)
5341 - return false;
5342 -
5343 - while (memcg != root) {
5344 - memcg = parent_mem_cgroup(memcg);
5345 -
5346 - if (memcg == root_mem_cgroup)
5347 - break;
5348 -
5361 + for (; memcg != root; memcg = parent_mem_cgroup(memcg)) {
5349 5362 if (page_counter_read(&memcg->memory) >= memcg->low)
5350 5363 return false;
5351 5364 }
5365 +
5352 5366 return true;
5353 5367 }
5354 5368
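
The exclusive-@root walk described in the new mem_cgroup_low() comment can be tried out in userspace. What follows is only a minimal sketch under stated assumptions: `struct sk_group`, its fields and `sk_group_is_low()` are hypothetical stand-ins for `struct mem_cgroup`, `page_counter_read()` and `parent_mem_cgroup()`, and the NULL-@root fallback to `root_mem_cgroup` is not modeled (the sketch requires a non-NULL root).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mem_cgroup: only what the walk needs. */
struct sk_group {
	unsigned long usage;     /* plays the role of memory.current */
	unsigned long low;       /* plays the role of memory.low */
	struct sk_group *parent; /* NULL at the top of the tree */
};

/*
 * Returns true if @g and every ancestor strictly below @root are under
 * their low setting. @root itself is never inspected and is never low.
 */
static bool sk_group_is_low(const struct sk_group *root,
			    const struct sk_group *g)
{
	assert(root != NULL && g != NULL);

	if (g == root)
		return false;

	/* The NULL check only guards against @g not being below @root. */
	for (; g != NULL && g != root; g = g->parent) {
		if (g->usage >= g->low)
			return false;
	}
	return true;
}
```

With A as @root, B (usage below low) reports low and C (usage at or above low) does not, so reclaim triggered by A would go to C. If a group above A is passed as @root instead, A gets checked too and B stops being low, which is the indiscriminate-reclaim case the comment warns about.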

+153 -179

mm/memory-failure.c

···
49 49 #include <linux/swap.h>
50 50 #include <linux/backing-dev.h>
51 51 #include <linux/migrate.h>
52 - #include <linux/page-isolation.h>
53 52 #include <linux/suspend.h>
54 53 #include <linux/slab.h>
55 54 #include <linux/swapops.h>
···
554 555 return -EIO;
555 556 }
556 557
558 + static int truncate_error_page(struct page *p, unsigned long pfn,
559 + struct address_space *mapping)
560 + {
561 + int ret = MF_FAILED;
562 +
563 + if (mapping->a_ops->error_remove_page) {
564 + int err = mapping->a_ops->error_remove_page(mapping, p);
565 +
566 + if (err != 0) {
567 + pr_info("Memory failure: %#lx: Failed to punch page: %d\n",
568 + pfn, err);
569 + } else if (page_has_private(p) &&
570 + !try_to_release_page(p, GFP_NOIO)) {
571 + pr_info("Memory failure: %#lx: failed to release buffers\n",
572 + pfn);
573 + } else {
574 + ret = MF_RECOVERED;
575 + }
576 + } else {
577 + /*
578 + * If the file system doesn't support it just invalidate
579 + * This fails on dirty or anything with private pages
580 + */
581 + if (invalidate_inode_page(p))
582 + ret = MF_RECOVERED;
583 + else
584 + pr_info("Memory failure: %#lx: Failed to invalidate\n",
585 + pfn);
586 + }
587 +
588 + return ret;
589 + }
590 +
557 591 /*
558 592 * Error hit kernel page.
559 593 * Do nothing, try to be lucky and not touch this instead. For a few cases we
···
611 579 */
612 580 static int me_pagecache_clean(struct page *p, unsigned long pfn)
613 581 {
614 - int err;
615 - int ret = MF_FAILED;
616 582 struct address_space *mapping;
617 583
618 584 delete_from_lru_cache(p);
···
642 612 *
643 613 * Open: to take i_mutex or not for this? Right now we don't.
644 614 */
645 - if (mapping->a_ops->error_remove_page) {
646 - err = mapping->a_ops->error_remove_page(mapping, p);
647 - if (err != 0) {
648 - pr_info("Memory failure: %#lx: Failed to punch page: %d\n",
649 - pfn, err);
650 - } else if (page_has_private(p) &&
651 - !try_to_release_page(p, GFP_NOIO)) {
652 - pr_info("Memory failure: %#lx: failed to release buffers\n",
653 - pfn);
654 - } else {
655 - ret = MF_RECOVERED;
656 - }
657 - } else {
658 - /*
659 - * If the file system doesn't support it just invalidate
660 - * This fails on dirty or anything with private pages
661 - */
662 - if (invalidate_inode_page(p))
663 - ret = MF_RECOVERED;
664 - else
665 - pr_info("Memory failure: %#lx: Failed to invalidate\n",
666 - pfn);
667 - }
668 - return ret;
615 + return truncate_error_page(p, pfn, mapping);
669 616 }
670 617
671 618 /*
···
748 741 {
749 742 int res = 0;
750 743 struct page *hpage = compound_head(p);
744 + struct address_space *mapping;
751 745
752 746 if (!PageHuge(hpage))
753 747 return MF_DELAYED;
754 748
755 - /*
756 - * We can safely recover from error on free or reserved (i.e.
757 - * not in-use) hugepage by dequeuing it from freelist.
758 - * To check whether a hugepage is in-use or not, we can't use
759 - * page->lru because it can be used in other hugepage operations,
760 - * such as __unmap_hugepage_range() and gather_surplus_pages().
761 - * So instead we use page_mapping() and PageAnon().
762 - */
763 - if (!(page_mapping(hpage) || PageAnon(hpage))) {
764 - res = dequeue_hwpoisoned_huge_page(hpage);
765 - if (!res)
766 - return MF_RECOVERED;
749 + mapping = page_mapping(hpage);
750 + if (mapping) {
751 + res = truncate_error_page(hpage, pfn, mapping);
752 + } else {
753 + unlock_page(hpage);
754 + /*
755 + * migration entry prevents later access on error anonymous
756 + * hugepage, so we can free and dissolve it into buddy to
757 + * save healthy subpages.
758 + */
759 + if (PageAnon(hpage))
760 + put_page(hpage);
761 + dissolve_free_huge_page(p);
762 + res = MF_RECOVERED;
763 + lock_page(hpage);
767 764 }
768 - return MF_DELAYED;
765 +
766 + return res;
769 767 }
770 768
771 769 /*
···
869 857 count = page_count(p) - 1;
870 858 if (ps->action == me_swapcache_dirty && result == MF_DELAYED)
871 859 count--;
872 - if (count != 0) {
860 + if (count > 0) {
873 861 pr_err("Memory failure: %#lx: %s still referenced by %d users\n",
874 862 pfn, action_page_types[ps->type], count);
875 863 result = MF_FAILED;
···
1022 1010 return unmap_success;
1023 1011 }
1024 1012
1025 - static void set_page_hwpoison_huge_page(struct page *hpage)
1013 + static int identify_page_state(unsigned long pfn, struct page *p,
1014 + unsigned long page_flags)
1026 1015 {
1027 - int i;
1028 - int nr_pages = 1 << compound_order(hpage);
1029 - for (i = 0; i < nr_pages; i++)
1030 - SetPageHWPoison(hpage + i);
1016 + struct page_state *ps;
1017 +
1018 + /*
1019 + * The first check uses the current page flags which may not have any
1020 + * relevant information. The second check with the saved page flags is
1021 + * carried out only if the first check can't determine the page status.
1022 + */
1023 + for (ps = error_states;; ps++)
1024 + if ((p->flags & ps->mask) == ps->res)
1025 + break;
1026 +
1027 + page_flags |= (p->flags & (1UL << PG_dirty));
1028 +
1029 + if (!ps->mask)
1030 + for (ps = error_states;; ps++)
1031 + if ((page_flags & ps->mask) == ps->res)
1032 + break;
1033 + return page_action(ps, p, pfn);
1031 1034 }
1032 1035
1033 - static void clear_page_hwpoison_huge_page(struct page *hpage)
1036 + static int memory_failure_hugetlb(unsigned long pfn, int trapno, int flags)
1034 1037 {
1035 - int i;
1036 - int nr_pages = 1 << compound_order(hpage);
1037 - for (i = 0; i < nr_pages; i++)
1038 - ClearPageHWPoison(hpage + i);
1038 + struct page *p = pfn_to_page(pfn);
1039 + struct page *head = compound_head(p);
1040 + int res;
1041 + unsigned long page_flags;
1042 +
1043 + if (TestSetPageHWPoison(head)) {
1044 + pr_err("Memory failure: %#lx: already hardware poisoned\n",
1045 + pfn);
1046 + return 0;
1047 + }
1048 +
1049 + num_poisoned_pages_inc();
1050 +
1051 + if (!(flags & MF_COUNT_INCREASED) && !get_hwpoison_page(p)) {
1052 + /*
1053 + * Check "filter hit" and "race with other subpage."
1054 + */
1055 + lock_page(head);
1056 + if (PageHWPoison(head)) {
1057 + if ((hwpoison_filter(p) && TestClearPageHWPoison(p))
1058 + || (p != head && TestSetPageHWPoison(head))) {
1059 + num_poisoned_pages_dec();
1060 + unlock_page(head);
1061 + return 0;
1062 + }
1063 + }
1064 + unlock_page(head);
1065 + dissolve_free_huge_page(p);
1066 + action_result(pfn, MF_MSG_FREE_HUGE, MF_DELAYED);
1067 + return 0;
1068 + }
1069 +
1070 + lock_page(head);
1071 + page_flags = head->flags;
1072 +
1073 + if (!PageHWPoison(head)) {
1074 + pr_err("Memory failure: %#lx: just unpoisoned\n", pfn);
1075 + num_poisoned_pages_dec();
1076 + unlock_page(head);
1077 + put_hwpoison_page(head);
1078 + return 0;
1079 + }
1080 +
1081 + if (!hwpoison_user_mappings(p, pfn, trapno, flags, &head)) {
1082 + action_result(pfn, MF_MSG_UNMAP_FAILED, MF_IGNORED);
1083 + res = -EBUSY;
1084 + goto out;
1085 + }
1086 +
1087 + res = identify_page_state(pfn, p, page_flags);
1088 + out:
1089 + unlock_page(head);
1090 + return res;
1039 1091 }
1040 1092
1041 1093 /**
···
1122 1046 */
1123 1047 int memory_failure(unsigned long pfn, int trapno, int flags)
1124 1048 {
1125 - struct page_state *ps;
1126 1049 struct page *p;
1127 1050 struct page *hpage;
1128 1051 struct page *orig_head;
1129 1052 int res;
1130 - unsigned int nr_pages;
1131 1053 unsigned long page_flags;
1132 1054
1133 1055 if (!sysctl_memory_failure_recovery)
···
1138 1064 }
1139 1065
1140 1066 p = pfn_to_page(pfn);
1141 - orig_head = hpage = compound_head(p);
1067 + if (PageHuge(p))
1068 + return memory_failure_hugetlb(pfn, trapno, flags);
1142 1069 if (TestSetPageHWPoison(p)) {
1143 1070 pr_err("Memory failure: %#lx: already hardware poisoned\n",
1144 1071 pfn);
1145 1072 return 0;
1146 1073 }
1147 1074
1148 - /*
1149 - * Currently errors on hugetlbfs pages are measured in hugepage units,
1150 - * so nr_pages should be 1 << compound_order. OTOH when errors are on
1151 - * transparent hugepages, they are supposed to be split and error
1152 - * measurement is done in normal page units. So nr_pages should be one
1153 - * in this case.
1154 - */
1155 - if (PageHuge(p))
1156 - nr_pages = 1 << compound_order(hpage);
1157 - else /* normal page or thp */
1158 - nr_pages = 1;
1159 - num_poisoned_pages_add(nr_pages);
1075 + orig_head = hpage = compound_head(p);
1076 + num_poisoned_pages_inc();
1160 1077
1161 1078 /*
1162 1079 * We need/can do nothing about count=0 pages.
1163 1080 * 1) it's a free page, and therefore in safe hand:
1164 1081 * prep_new_page() will be the gate keeper.
1165 - * 2) it's a free hugepage, which is also safe:
1166 - * an affected hugepage will be dequeued from hugepage freelist,
1167 - * so there's no concern about reusing it ever after.
1168 - * 3) it's part of a non-compound high order page.
1082 + * 2) it's part of a non-compound high order page.
1169 1083 * Implies some kernel user: cannot stop them from
1170 1084 * R/W the page; let's pray that the page has been
1171 1085 * used and will be freed some time later.
···
1164 1102 if (is_free_buddy_page(p)) {
1165 1103 action_result(pfn, MF_MSG_BUDDY, MF_DELAYED);
1166 1104 return 0;
1167 - } else if (PageHuge(hpage)) {
1168 - /*
1169 - * Check "filter hit" and "race with other subpage."
1170 - */
1171 - lock_page(hpage);
1172 - if (PageHWPoison(hpage)) {
1173 - if ((hwpoison_filter(p) && TestClearPageHWPoison(p))
1174 - || (p != hpage && TestSetPageHWPoison(hpage))) {
1175 - num_poisoned_pages_sub(nr_pages);
1176 - unlock_page(hpage);
1177 - return 0;
1178 - }
1179 - }
1180 - set_page_hwpoison_huge_page(hpage);
1181 - res = dequeue_hwpoisoned_huge_page(hpage);
1182 - action_result(pfn, MF_MSG_FREE_HUGE,
1183 - res ? MF_IGNORED : MF_DELAYED);
1184 - unlock_page(hpage);
1185 - return res;
1186 1105 } else {
1187 1106 action_result(pfn, MF_MSG_KERNEL_HIGH_ORDER, MF_IGNORED);
1188 1107 return -EBUSY;
1189 1108 }
1190 1109 }
1191 1110
1192 - if (!PageHuge(p) && PageTransHuge(hpage)) {
1111 + if (PageTransHuge(hpage)) {
1193 1112 lock_page(p);
1194 1113 if (!PageAnon(p) || unlikely(split_huge_page(p))) {
1195 1114 unlock_page(p);
···
1181 1138 pr_err("Memory failure: %#lx: thp split failed\n",
1182 1139 pfn);
1183 1140 if (TestClearPageHWPoison(p))
1184 - num_poisoned_pages_sub(nr_pages);
1141 + num_poisoned_pages_dec();
1185 1142 put_hwpoison_page(p);
1186 1143 return -EBUSY;
1187 1144 }
···
1208 1165 return 0;
1209 1166 }
1210 1167
1211 - lock_page(hpage);
1168 + lock_page(p);
1212 1169
1213 1170 /*
1214 1171 * The page could have changed compound pages during the locking.
···
1237 1194 */
1238 1195 if (!PageHWPoison(p)) {
1239 1196 pr_err("Memory failure: %#lx: just unpoisoned\n", pfn);
1240 - num_poisoned_pages_sub(nr_pages);
1241 - unlock_page(hpage);
1242 - put_hwpoison_page(hpage);
1197 + num_poisoned_pages_dec();
1198 + unlock_page(p);
1199 + put_hwpoison_page(p);
1243 1200 return 0;
1244 1201 }
1245 1202 if (hwpoison_filter(p)) {
1246 1203 if (TestClearPageHWPoison(p))
1247 - num_poisoned_pages_sub(nr_pages);
1248 - unlock_page(hpage);
1249 - put_hwpoison_page(hpage);
1204 + num_poisoned_pages_dec();
1205 + unlock_page(p);
1206 + put_hwpoison_page(p);
1250 1207 return 0;
1251 1208 }
1252 1209
1253 - if (!PageHuge(p) && !PageTransTail(p) && !PageLRU(p))
1210 + if (!PageTransTail(p) && !PageLRU(p))
1254 1211 goto identify_page_state;
1255 -
1256 - /*
1257 - * For error on the tail page, we should set PG_hwpoison
1258 - * on the head page to show that the hugepage is hwpoisoned
1259 - */
1260 - if (PageHuge(p) && PageTail(p) && TestSetPageHWPoison(hpage)) {
1261 - action_result(pfn, MF_MSG_POISONED_HUGE, MF_IGNORED);
1262 - unlock_page(hpage);
1263 - put_hwpoison_page(hpage);
1264 - return 0;
1265 - }
1266 - /*
1267 - * Set PG_hwpoison on all pages in an error hugepage,
1268 - * because containment is done in hugepage unit for now.
1269 - * Since we have done TestSetPageHWPoison() for the head page with
1270 - * page lock held, we can safely set PG_hwpoison bits on tail pages.
1271 - */
1272 - if (PageHuge(p))
1273 - set_page_hwpoison_huge_page(hpage);
1274 1212
1275 1213 /*
1276 1214 * It's very difficult to mess with pages currently under IO
···
1282 1258 }
1283 1259
1284 1260 identify_page_state:
1285 - res = -EBUSY;
1286 - /*
1287 - * The first check uses the current page flags which may not have any
1288 - * relevant information. The second check with the saved page flagss is
1289 - * carried out only if the first check can't determine the page status.
1290 - */
1291 - for (ps = error_states;; ps++)
1292 - if ((p->flags & ps->mask) == ps->res)
1293 - break;
1294 -
1295 - page_flags |= (p->flags & (1UL << PG_dirty));
1296 -
1297 - if (!ps->mask)
1298 - for (ps = error_states;; ps++)
1299 - if ((page_flags & ps->mask) == ps->res)
1300 - break;
1301 - res = page_action(ps, p, pfn);
1261 + res = identify_page_state(pfn, p, page_flags);
1302 1262 out:
1303 - unlock_page(hpage);
1263 + unlock_page(p);
1304 1264 return res;
1305 1265 }
1306 1266 EXPORT_SYMBOL_GPL(memory_failure);
···
1406 1398 struct page *page;
1407 1399 struct page *p;
1408 1400 int freeit = 0;
1409 - unsigned int nr_pages;
1410 1401 static DEFINE_RATELIMIT_STATE(unpoison_rs, DEFAULT_RATELIMIT_INTERVAL,
1411 1402 DEFAULT_RATELIMIT_BURST);
1412 1403
···
1450 1443 return 0;
1451 1444 }
1452 1445
1453 - nr_pages = 1 << compound_order(page);
1454 -
1455 1446 if (!get_hwpoison_page(p)) {
1456 - /*
1457 - * Since HWPoisoned hugepage should have non-zero refcount,
1458 - * race between memory failure and unpoison seems to happen.
1459 - * In such case unpoison fails and memory failure runs
1460 - * to the end.
1461 - */
1462 - if (PageHuge(page)) {
1463 - unpoison_pr_info("Unpoison: Memory failure is now running on free hugepage %#lx\n",
1464 - pfn, &unpoison_rs);
1465 - return 0;
1466 - }
1467 1447 if (TestClearPageHWPoison(p))
1468 1448 num_poisoned_pages_dec();
1469 1449 unpoison_pr_info("Unpoison: Software-unpoisoned free page %#lx\n",
···
1468 1474 if (TestClearPageHWPoison(page)) {
1469 1475 unpoison_pr_info("Unpoison: Software-unpoisoned page %#lx\n",
1470 1476 pfn, &unpoison_rs);
1471 - num_poisoned_pages_sub(nr_pages);
1477 + num_poisoned_pages_dec();
1472 1478 freeit = 1;
1473 - if (PageHuge(page))
1474 - clear_page_hwpoison_huge_page(page);
1475 1479 }
1476 1480 unlock_page(page);
1477 1481
···
1484 1492 static struct page *new_page(struct page *p, unsigned long private, int **x)
1485 1493 {
1486 1494 int nid = page_to_nid(p);
1487 - if (PageHuge(p)) {
1488 - struct hstate *hstate = page_hstate(compound_head(p));
1489 1495
1490 - if (hstate_is_gigantic(hstate))
1491 - return alloc_huge_page_node(hstate, NUMA_NO_NODE);
1492 -
1493 - return alloc_huge_page_node(hstate, nid);
1494 - } else {
1495 - return __alloc_pages_node(nid, GFP_HIGHUSER_MOVABLE, 0);
1496 - }
1496 + return new_page_nodemask(p, nid, &node_states[N_MEMORY]);
1497 1497 }
1498 1498
1499 1499 /*
···
1592 1608 if (ret > 0)
1593 1609 ret = -EIO;
1594 1610 } else {
1595 - /* overcommit hugetlb page will be freed to buddy */
1596 - if (PageHuge(page)) {
1597 - set_page_hwpoison_huge_page(hpage);
1598 - dequeue_hwpoisoned_huge_page(hpage);
1599 - num_poisoned_pages_add(1 << compound_order(hpage));
1600 - } else {
1601 - SetPageHWPoison(page);
1602 - num_poisoned_pages_inc();
1603 - }
1611 + if (PageHuge(page))
1612 + dissolve_free_huge_page(page);
1604 1613 }
1605 1614 return ret;
1606 1615 }
···
1709 1732
1710 1733 static void soft_offline_free_page(struct page *page)
1711 1734 {
1712 - if (PageHuge(page)) {
1713 - struct page *hpage = compound_head(page);
1735 + struct page *head = compound_head(page);
1714 1736
1715 - set_page_hwpoison_huge_page(hpage);
1716 - if (!dequeue_hwpoisoned_huge_page(hpage))
1717 - num_poisoned_pages_add(1 << compound_order(hpage));
1718 - } else {
1719 - if (!TestSetPageHWPoison(page))
1720 - num_poisoned_pages_inc();
1737 + if (!TestSetPageHWPoison(head)) {
1738 + num_poisoned_pages_inc();
1739 + if (PageHuge(head))
1740 + dissolve_free_huge_page(page);
1721 1741 }
1722 1742 }
1723 1743
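
The two-pass lookup that this diff factors out into identify_page_state() can be illustrated outside the kernel. The following is a sketch only: the flag bits, the three-entry table and the `sk_` names are invented for illustration and do not correspond to the real `error_states[]` contents or `PG_*` values. It assumes, as the kernel loop does, that the table ends with a zero-mask entry that matches any flags.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits; the kernel's PG_* bit numbers differ. */
#define SK_DIRTY (1UL << 0)
#define SK_LRU   (1UL << 1)

enum sk_kind { SK_KIND_DIRTY_LRU, SK_KIND_CLEAN_LRU, SK_KIND_UNKNOWN };

struct sk_state {
	unsigned long mask;
	unsigned long res;
	enum sk_kind kind;
};

/* Most specific entries first; the zero-mask entry catches everything. */
static const struct sk_state sk_states[] = {
	{ SK_LRU | SK_DIRTY, SK_LRU | SK_DIRTY, SK_KIND_DIRTY_LRU },
	{ SK_LRU | SK_DIRTY, SK_LRU,            SK_KIND_CLEAN_LRU },
	{ 0,                 0,                 SK_KIND_UNKNOWN   },
};

#define SK_NSTATES (sizeof(sk_states) / sizeof(sk_states[0]))

static const struct sk_state *sk_match(unsigned long flags)
{
	const struct sk_state *ps;

	/* The unbounded loop is only safe because of the catch-all entry. */
	assert(sk_states[SK_NSTATES - 1].mask == 0);

	for (ps = sk_states;; ps++)
		if ((flags & ps->mask) == ps->res)
			return ps;
}

/*
 * First try the current flags; only if that lands on the catch-all entry,
 * retry with the flags saved earlier, merged with the current dirty bit.
 */
static enum sk_kind sk_identify(unsigned long current_flags,
				unsigned long saved_flags)
{
	const struct sk_state *ps = sk_match(current_flags);

	saved_flags |= current_flags & SK_DIRTY;
	if (!ps->mask)
		ps = sk_match(saved_flags);
	return ps->kind;
}
```

For instance, a page whose current flags carry no LRU bit but whose saved flags do is classified from the saved copy, and a dirty bit set in the meantime is still honoured.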

+2 -2

mm/memory.c

···
3262 3262 fault_around_bytes = PAGE_SIZE; /* rounddown_pow_of_two(0) is undefined */
3263 3263 return 0;
3264 3264 }
3265 - DEFINE_SIMPLE_ATTRIBUTE(fault_around_bytes_fops,
3265 + DEFINE_DEBUGFS_ATTRIBUTE(fault_around_bytes_fops,
3266 3266 fault_around_bytes_get, fault_around_bytes_set, "%llu\n");
3267 3267
3268 3268 static int __init fault_around_debugfs(void)
3269 3269 {
3270 3270 void *ret;
3271 3271
3272 - ret = debugfs_create_file("fault_around_bytes", 0644, NULL, NULL,
3272 + ret = debugfs_create_file_unsafe("fault_around_bytes", 0644, NULL, NULL,
3273 3273 &fault_around_bytes_fops);
3274 3274 if (!ret)
3275 3275 pr_warn("Failed to create fault_around_bytes in debugfs");

+38 -100

mm/memory_hotplug.c

···
52 52 static online_page_callback_t online_page_callback = generic_online_page;
53 53 static DEFINE_MUTEX(online_page_callback_lock);
54 54
55 - /* The same as the cpu_hotplug lock, but for memory hotplug. */
56 - static struct {
57 - struct task_struct *active_writer;
58 - struct mutex lock; /* Synchronizes accesses to refcount, */
59 - /*
60 - * Also blocks the new readers during
61 - * an ongoing mem hotplug operation.
62 - */
63 - int refcount;
55 + DEFINE_STATIC_PERCPU_RWSEM(mem_hotplug_lock);
64 56
65 - #ifdef CONFIG_DEBUG_LOCK_ALLOC
66 - struct lockdep_map dep_map;
67 - #endif
68 - } mem_hotplug = {
69 - .active_writer = NULL,
70 - .lock = __MUTEX_INITIALIZER(mem_hotplug.lock),
71 - .refcount = 0,
72 - #ifdef CONFIG_DEBUG_LOCK_ALLOC
73 - .dep_map = {.name = "mem_hotplug.lock" },
74 - #endif
75 - };
57 + void get_online_mems(void)
58 + {
59 + percpu_down_read(&mem_hotplug_lock);
60 + }
76 61
77 - /* Lockdep annotations for get/put_online_mems() and mem_hotplug_begin/end() */
78 - #define memhp_lock_acquire_read() lock_map_acquire_read(&mem_hotplug.dep_map)
79 - #define memhp_lock_acquire() lock_map_acquire(&mem_hotplug.dep_map)
80 - #define memhp_lock_release() lock_map_release(&mem_hotplug.dep_map)
62 + void put_online_mems(void)
63 + {
64 + percpu_up_read(&mem_hotplug_lock);
65 + }
81 66
82 67 bool movable_node_enabled = false;
83 68
···
84 99 }
85 100 __setup("memhp_default_state=", setup_memhp_default_state);
86 101
87 - void get_online_mems(void)
88 - {
89 - might_sleep();
90 - if (mem_hotplug.active_writer == current)
91 - return;
92 - memhp_lock_acquire_read();
93 - mutex_lock(&mem_hotplug.lock);
94 - mem_hotplug.refcount++;
95 - mutex_unlock(&mem_hotplug.lock);
96 -
97 - }
98 -
99 - void put_online_mems(void)
100 - {
101 - if (mem_hotplug.active_writer == current)
102 - return;
103 - mutex_lock(&mem_hotplug.lock);
104 -
105 - if (WARN_ON(!mem_hotplug.refcount))
106 - mem_hotplug.refcount++; /* try to fix things up */
107 -
108 - if (!--mem_hotplug.refcount && unlikely(mem_hotplug.active_writer))
109 - wake_up_process(mem_hotplug.active_writer);
110 - mutex_unlock(&mem_hotplug.lock);
111 - memhp_lock_release();
112 -
113 - }
114 -
115 - /* Serializes write accesses to mem_hotplug.active_writer. */
116 - static DEFINE_MUTEX(memory_add_remove_lock);
117 -
118 102 void mem_hotplug_begin(void)
119 103 {
120 - mutex_lock(&memory_add_remove_lock);
121 -
122 - mem_hotplug.active_writer = current;
123 -
124 - memhp_lock_acquire();
125 - for (;;) {
126 - mutex_lock(&mem_hotplug.lock);
127 - if (likely(!mem_hotplug.refcount))
128 - break;
129 - __set_current_state(TASK_UNINTERRUPTIBLE);
130 - mutex_unlock(&mem_hotplug.lock);
131 - schedule();
132 - }
104 + cpus_read_lock();
105 + percpu_down_write(&mem_hotplug_lock);
133 106 }
134 107
135 108 void mem_hotplug_done(void)
136 109 {
137 - mem_hotplug.active_writer = NULL;
138 - mutex_unlock(&mem_hotplug.lock);
139 - memhp_lock_release();
140 - mutex_unlock(&memory_add_remove_lock);
110 + percpu_up_write(&mem_hotplug_lock);
111 + cpus_read_unlock();
141 112 }
142 113
143 114 /* add this memory to iomem resource */
···
521 580 {
522 581 struct pglist_data *pgdat = zone->zone_pgdat;
523 582 int nr_pages = PAGES_PER_SECTION;
524 - int zone_type;
525 583 unsigned long flags;
526 -
527 - zone_type = zone - pgdat->node_zones;
528 584
529 585 pgdat_resize_lock(zone->zone_pgdat, &flags);
530 586 shrink_zone_span(zone, start_pfn, start_pfn + nr_pages);
···
872 934 return &pgdat->node_zones[ZONE_NORMAL];
873 935 }
874 936
937 + static inline bool movable_pfn_range(int nid, struct zone *default_zone,
938 + unsigned long start_pfn, unsigned long nr_pages)
939 + {
940 + if (!allow_online_pfn_range(nid, start_pfn, nr_pages,
941 + MMOP_ONLINE_KERNEL))
942 + return true;
943 +
944 + if (!movable_node_is_enabled())
945 + return false;
946 +
947 + return !zone_intersects(default_zone, start_pfn, nr_pages);
948 + }
949 +
875 950 /*
876 951 * Associates the given pfn range with the given node and the zone appropriate
877 952 * for the given online type.
···
900 949 /*
901 950 * MMOP_ONLINE_KEEP defaults to MMOP_ONLINE_KERNEL but use
902 951 * movable zone if that is not possible (e.g. we are within
903 - * or past the existing movable zone)
952 + * or past the existing movable zone). movable_node overrides
953 + * this default and defaults to movable zone
904 954 */
905 - if (!allow_online_pfn_range(nid, start_pfn, nr_pages,
906 - MMOP_ONLINE_KERNEL))
955 + if (movable_pfn_range(nid, zone, start_pfn, nr_pages))
907 956 zone = movable_zone;
908 957 } else if (online_type == MMOP_ONLINE_MOVABLE) {
909 958 zone = &pgdat->node_zones[ZONE_MOVABLE];
···
1219 1268
1220 1269 error:
1221 1270 /* rollback pgdat allocation and others */
1222 - if (new_pgdat)
1271 + if (new_pgdat && pgdat)
1223 1272 rollback_node_hotadd(nid, pgdat);
1224 1273 memblock_remove(start, size);
1225 1274
···
1371 1420 static struct page *new_node_page(struct page *page, unsigned long private,
1372 1421 int **result)
1373 1422 {
1374 - gfp_t gfp_mask = GFP_USER | __GFP_MOVABLE;
1375 1423 int nid = page_to_nid(page);
1376 1424 nodemask_t nmask = node_states[N_MEMORY];
1377 - struct page *new_page = NULL;
1378 1425
1379 1426 /*
1380 - * TODO: allocate a destination hugepage from a nearest neighbor node,
1381 - * accordance with memory policy of the user process if possible. For
1382 - * now as a simple work-around, we use the next node for destination.
1427 + * try to allocate from a different node but reuse this node if there
1428 + * are no other online nodes to be used (e.g. we are offlining a part
1429 + * of the only existing node)
1383 1430 */
1384 - if (PageHuge(page))
1385 - return alloc_huge_page_node(page_hstate(compound_head(page)),
1386 - next_node_in(nid, nmask));
1387 -
1388 1431 node_clear(nid, nmask);
1432 + if (nodes_empty(nmask))
1433 + node_set(nid, nmask);
1389 1434
1390 - if (PageHighMem(page)
1391 - || (zone_idx(page_zone(page)) == ZONE_MOVABLE))
1392 - gfp_mask |= __GFP_HIGHMEM;
1393 -
1394 - if (!nodes_empty(nmask))
1395 - new_page = __alloc_pages_nodemask(gfp_mask, 0, nid, &nmask);
1396 - if (!new_page)
1397 - new_page = __alloc_pages(gfp_mask, 0, nid);
1398 -
1399 - return new_page;
1435 + return new_page_nodemask(page, nid, &nmask);
1400 1436 }
1401 1437
1402 1438 #define NR_OFFLINE_AT_ONCE_PAGES (256)
···
1666 1728 goto failed_removal;
1667 1729 ret = 0;
1668 1730 if (drain) {
1669 - lru_add_drain_all();
1731 + lru_add_drain_all_cpuslocked();
1670 1732 cond_resched();
1671 1733 drain_all_pages(zone);
1672 1734 }
···
1687 1749 }
1688 1750 }
1689 1751 /* drain all zone's lru pagevec, this is asynchronous... */
1690 - lru_add_drain_all();
1752 + lru_add_drain_all_cpuslocked();
1691 1753 yield();
1692 1754 /* drain pcp pages, this is synchronous. */
1693 1755 drain_all_pages(zone);
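
The node-selection rule in the reworked new_node_page() ("try a different node, but reuse this one if no other node is available") reduces to a small mask manipulation. A minimal sketch, assuming a plain `unsigned long` bitmask in place of `nodemask_t`; `sk_migration_targets()` is a hypothetical name, and the comments map each step to the kernel helper it imitates.

```c
#include <assert.h>
#include <limits.h>

/* Number of node bits the plain-integer mask in this sketch can hold. */
#define SK_MAX_NODES ((int)(sizeof(unsigned long) * CHAR_BIT))

/*
 * Returns the set of nodes a page leaving @nid may be migrated to:
 * every node with memory except @nid, or just @nid if it is the only one.
 */
static unsigned long sk_migration_targets(unsigned long nodes_with_memory,
					  int nid)
{
	unsigned long nmask = nodes_with_memory;

	assert(nid >= 0 && nid < SK_MAX_NODES);

	nmask &= ~(1UL << nid);      /* node_clear(nid, nmask) */
	if (nmask == 0)              /* nodes_empty(nmask) */
		nmask |= 1UL << nid; /* node_set(nid, nmask) */
	return nmask;
}
```

With nodes 0 to 2 populated and a page on node 1, the result is nodes 0 and 2; on a single-node machine the source node is handed back so the allocation can still succeed.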

+4 -13

mm/migrate.c

···
1252 1252 out:
1253 1253 if (rc != -EAGAIN)
1254 1254 putback_active_hugepage(hpage);
1255 + if (reason == MR_MEMORY_FAILURE && !test_set_page_hwpoison(hpage))
1256 + num_poisoned_pages_inc();
1255 1257
1256 1258 /*
1257 1259 * If migration was not successful and there's a freeing callback, use
···
1916 1914 int page_lru = page_is_file_cache(page);
1917 1915 unsigned long mmun_start = address & HPAGE_PMD_MASK;
1918 1916 unsigned long mmun_end = mmun_start + HPAGE_PMD_SIZE;
1919 - pmd_t orig_entry;
1920 1917
1921 1918 /*
1922 1919 * Rate-limit the amount of data that is being migrated to a node.
···
1958 1957 /* Recheck the target PMD */
1959 1958 mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
1960 1959 ptl = pmd_lock(mm, pmd);
1961 - if (unlikely(!pmd_same(*pmd, entry) || page_count(page) != 2)) {
1962 - fail_putback:
1960 + if (unlikely(!pmd_same(*pmd, entry) || !page_ref_freeze(page, 2))) {
1963 1961 spin_unlock(ptl);
1964 1962 mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
1965 1963
···
1980 1980 goto out_unlock;
1981 1981 }
1982 1982
1983 - orig_entry = *pmd;
1984 1983 entry = mk_huge_pmd(new_page, vma->vm_page_prot);
1985 1984 entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
1986 1985
···
1996 1997 set_pmd_at(mm, mmun_start, pmd, entry);
1997 1998 update_mmu_cache_pmd(vma, address, &entry);
1998 1999
1999 - if (page_count(page) != 2) {
2000 - set_pmd_at(mm, mmun_start, pmd, orig_entry);
2001 - flush_pmd_tlb_range(vma, mmun_start, mmun_end);
2002 - mmu_notifier_invalidate_range(mm, mmun_start, mmun_end);
2003 - update_mmu_cache_pmd(vma, address, &entry);
2004 - page_remove_rmap(new_page, true);
2005 - goto fail_putback;
2006 - }
2007 -
2000 + page_ref_unfreeze(page, 2);
2008 2001 mlock_migrate_page(new_page, page);
2009 2002 page_remove_rmap(page, true);
2010 2003 set_page_owner_migrate_reason(new_page, MR_NUMA_MISPLACED);
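
The second hunk group above replaces two plain `page_count(page) != 2` checks (the latter with a rollback path) by a freeze/unfreeze pair. The idea can be sketched with C11 atomics: freezing is a compare-and-swap from the expected count to zero, so the check and the exclusion of new references happen in one step. This is an illustration under assumptions, not the kernel helpers themselves: the `sk_` functions are hypothetical, they operate on a bare `atomic_int` rather than a `struct page`, and they do not model the kernel's memory-ordering details.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Succeeds only if the count is exactly @expected, and in that case drops
 * it to zero so that speculative "get unless zero" style lookups fail
 * until the count is restored.
 */
static bool sk_ref_freeze(atomic_int *refcount, int expected)
{
	assert(expected > 0);
	return atomic_compare_exchange_strong(refcount, &expected, 0);
}

/* Restores a previously frozen count. */
static void sk_ref_unfreeze(atomic_int *refcount, int count)
{
	assert(atomic_load(refcount) == 0);
	atomic_store(refcount, count);
}
```

Because a successful freeze leaves no window in which another reference can appear, the caller no longer needs a second count check after installing the new mapping, which is why the rollback block and the `fail_putback` label disappear in the diff.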

+8 -11

mm/mmap.c

···
2177 2177 unsigned long size, unsigned long grow)
2178 2178 {
2179 2179 struct mm_struct *mm = vma->vm_mm;
2180 - struct rlimit *rlim = current->signal->rlim;
2181 2180 unsigned long new_start;
2182 2181
2183 2182 /* address space limit tests */
···
2184 2185 return -ENOMEM;
2185 2186
2186 2187 /* Stack limit test */
2187 - if (size > READ_ONCE(rlim[RLIMIT_STACK].rlim_cur))
2188 + if (size > rlimit(RLIMIT_STACK))
2188 2189 return -ENOMEM;
2189 2190
2190 2191 /* mlock limit tests */
···
2192 2193 unsigned long locked;
2193 2194 unsigned long limit;
2194 2195 locked = mm->locked_vm + grow;
2195 - limit = READ_ONCE(rlim[RLIMIT_MEMLOCK].rlim_cur);
2196 + limit = rlimit(RLIMIT_MEMLOCK);
2196 2197 limit >>= PAGE_SHIFT;
2197 2198 if (locked > limit && !capable(CAP_IPC_LOCK))
2198 2199 return -ENOMEM;
···
2243 2244 gap_addr = TASK_SIZE;
2244 2245
2245 2246 next = vma->vm_next;
2246 - if (next && next->vm_start < gap_addr) {
2247 + if (next && next->vm_start < gap_addr &&
2248 + (next->vm_flags & (VM_WRITE|VM_READ|VM_EXEC))) {
2247 2249 if (!(next->vm_flags & VM_GROWSUP))
2248 2250 return -ENOMEM;
2249 2251 /* Check that both stack segments have the same anon_vma? */
···
2315 2315 {
2316 2316 struct mm_struct *mm = vma->vm_mm;
2317 2317 struct vm_area_struct *prev;
2318 - unsigned long gap_addr;
2319 2318 int error;
2320 2319
2321 2320 address &= PAGE_MASK;
···
2323 2324 return error;
2324 2325
2325 2326 /* Enforce stack_guard_gap */
2326 - gap_addr = address - stack_guard_gap;
2327 - if (gap_addr > address)
2328 - return -ENOMEM;
2329 2327 prev = vma->vm_prev;
2330 - if (prev && prev->vm_end > gap_addr) {
2331 - if (!(prev->vm_flags & VM_GROWSDOWN))
2328 + /* Check that both stack segments have the same anon_vma? */
2329 + if (prev && !(prev->vm_flags & VM_GROWSDOWN) &&
2330 + (prev->vm_flags & (VM_WRITE|VM_READ|VM_EXEC))) {
2331 + if (address - prev->vm_end < stack_guard_gap)
2332 2332 return -ENOMEM;
2333 - /* Check that both stack segments have the same anon_vma? */
2334 2333 }
2335 2334
2336 2335 /* We must make sure the anon_vma is allocated. */
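
The reworked guard-gap test for a downward-growing stack can be read as a small predicate over the previous mapping. A minimal sketch, assuming a cut-down `struct sk_region` in place of `struct vm_area_struct` and made-up flag values (the real `VM_*` constants differ); `sk_check_guard_gap()` is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical flag values for the sketch only. */
#define SK_READ      0x1UL
#define SK_WRITE     0x2UL
#define SK_EXEC      0x4UL
#define SK_GROWSDOWN 0x100UL

/* Stand-in for the mapping immediately below the stack. */
struct sk_region {
	unsigned long end;   /* first address past the mapping */
	unsigned long flags;
};

/*
 * Returns -ENOMEM if growing the stack down to @address would come within
 * @guard_gap of an accessible, non-stack mapping; 0 otherwise.
 */
static int sk_check_guard_gap(const struct sk_region *prev,
			      unsigned long address, unsigned long guard_gap)
{
	if (prev == NULL)
		return 0;

	assert(prev->end <= address);

	/* Another downward-growing stack segment is not treated as a hazard. */
	if (prev->flags & SK_GROWSDOWN)
		return 0;

	/* A mapping with no access rights is ignored by the new check. */
	if (!(prev->flags & (SK_READ | SK_WRITE | SK_EXEC)))
		return 0;

	if (address - prev->end < guard_gap)
		return -ENOMEM;

	return 0;
}
```

Computing the distance as `address - prev->end` avoids the separate underflow test that the removed `gap_addr = address - stack_guard_gap` form needed, since mappings are ordered and the subtraction here cannot wrap.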

+7

mm/oom_kill.c

···
490 490
491 491 if (!down_read_trylock(&mm->mmap_sem)) {
492 492 ret = false;
493 + trace_skip_task_reaping(tsk->pid);
493 494 goto unlock_oom;
494 495 }
495 496
···
501 500 */
502 501 if (!mmget_not_zero(mm)) {
503 502 up_read(&mm->mmap_sem);
503 + trace_skip_task_reaping(tsk->pid);
504 504 goto unlock_oom;
505 505 }
506 +
507 + trace_start_task_reaping(tsk->pid);
506 508
507 509 /*
508 510 * Tell all users of get_user/copy_from_user etc... that the content
···
548 544 * put the oom_reaper out of the way.
549 545 */
550 546 mmput_async(mm);
547 + trace_finish_task_reaping(tsk->pid);
551 548 unlock_oom:
552 549 mutex_unlock(&oom_lock);
553 550 return ret;
···
620 615 tsk->oom_reaper_list = oom_reaper_list;
621 616 oom_reaper_list = tsk;
622 617 spin_unlock(&oom_reaper_lock);
618 + trace_wake_reaper(tsk->pid);
623 619 wake_up(&oom_reaper_wait);
624 620 }
625 621
···
672 666 */
673 667 __thaw_task(tsk);
674 668 atomic_inc(&oom_victims);
669 + trace_mark_victim(tsk->pid);
675 670 }
676 671
677 672 /**

+53 -15

mm/page_alloc.c

··· 2206 2206 * list of requested migratetype, possibly along with other pages from the same 2207 2207 * block, depending on fragmentation avoidance heuristics. Returns true if 2208 2208 * fallback was found so that __rmqueue_smallest() can grab it. 2209 + * 2210 + * The use of signed ints for order and current_order is a deliberate 2211 + * deviation from the rest of this file, to make the for loop 2212 + * condition simpler. 2209 2213 */ 2210 2214 static inline bool 2211 - __rmqueue_fallback(struct zone *zone, unsigned int order, int start_migratetype) 2215 + __rmqueue_fallback(struct zone *zone, int order, int start_migratetype) 2212 2216 { 2213 2217 struct free_area *area; 2214 - unsigned int current_order; 2218 + int current_order; 2215 2219 struct page *page; 2216 2220 int fallback_mt; 2217 2221 bool can_steal; 2218 2222 2219 - /* Find the largest possible block of pages in the other list */ 2220 - for (current_order = MAX_ORDER-1; 2221 - current_order >= order && current_order <= MAX_ORDER-1; 2223 + /* 2224 + * Find the largest available free page in the other list. This roughly 2225 + * approximates finding the pageblock with the most free pages, which 2226 + * would be too costly to do exactly. 2227 + */ 2228 + for (current_order = MAX_ORDER - 1; current_order >= order; 2222 2229 --current_order) { 2223 2230 area = &(zone->free_area[current_order]); 2224 2231 fallback_mt = find_suitable_fallback(area, current_order, ··· 2233 2226 if (fallback_mt == -1) 2234 2227 continue; 2235 2228 2236 - page = list_first_entry(&area->free_list[fallback_mt], 2237 - struct page, lru); 2229 + /* 2230 + * We cannot steal all free pages from the pageblock and the 2231 + * requested migratetype is movable. 
In that case it's better to 2232 + * steal and split the smallest available page instead of the 2233 + * largest available page, because even if the next movable 2234 + * allocation falls back into a different pageblock than this 2235 + * one, it won't cause permanent fragmentation. 2236 + */ 2237 + if (!can_steal && start_migratetype == MIGRATE_MOVABLE 2238 + && current_order > order) 2239 + goto find_smallest; 2238 2240 2239 - steal_suitable_fallback(zone, page, start_migratetype, 2240 - can_steal); 2241 - 2242 - trace_mm_page_alloc_extfrag(page, order, current_order, 2243 - start_migratetype, fallback_mt); 2244 - 2245 - return true; 2241 + goto do_steal; 2246 2242 } 2247 2243 2248 2244 return false; 2245 + 2246 + find_smallest: 2247 + for (current_order = order; current_order < MAX_ORDER; 2248 + current_order++) { 2249 + area = &(zone->free_area[current_order]); 2250 + fallback_mt = find_suitable_fallback(area, current_order, 2251 + start_migratetype, false, &can_steal); 2252 + if (fallback_mt != -1) 2253 + break; 2254 + } 2255 + 2256 + /* 2257 + * This should not happen - we already found a suitable fallback 2258 + * when looking for the largest page. 2259 + */ 2260 + VM_BUG_ON(current_order == MAX_ORDER); 2261 + 2262 + do_steal: 2263 + page = list_first_entry(&area->free_list[fallback_mt], 2264 + struct page, lru); 2265 + 2266 + steal_suitable_fallback(zone, page, start_migratetype, can_steal); 2267 + 2268 + trace_mm_page_alloc_extfrag(page, order, current_order, 2269 + start_migratetype, fallback_mt); 2270 + 2271 + return true; 2272 + 2249 2273 } 2250 2274 2251 2275 /* ··· 5278 5240 #endif 5279 5241 /* we have to stop all cpus to guarantee there is no user 5280 5242 of zonelist */ 5281 - stop_machine(__build_all_zonelists, pgdat, NULL); 5243 + stop_machine_cpuslocked(__build_all_zonelists, pgdat, NULL); 5282 5244 /* cpuset refresh routine should be here */ 5283 5245 } 5284 5246 vm_total_pages = nr_free_pagecache_pages();
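
The comment added in this hunk says `order` and `current_order` became signed to simplify the loop condition. The sketch below, in plain userspace C, illustrates why. `DEMO_MAX_ORDER` and both function names are hypothetical stand-ins, not kernel symbols, and the value 11 is an assumption.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel's MAX_ORDER; the value is assumed. */
#define DEMO_MAX_ORDER 11

/*
 * Signed counter, as in the patched loop: when order is 0 the counter
 * reaches -1, the test "current_order >= order" fails, and the loop ends.
 */
int count_orders_signed(int order)
{
	int current_order;
	int visited = 0;

	assert(order >= 0 && order < DEMO_MAX_ORDER);
	for (current_order = DEMO_MAX_ORDER - 1; current_order >= order;
	     --current_order)
		visited++;
	return visited;
}

/*
 * Unsigned counter, as in the pre-patch loop: decrementing 0 wraps to
 * UINT_MAX, so "current_order >= order" alone would never fail for
 * order == 0. The extra upper-bound test is what terminates the loop.
 */
int count_orders_unsigned(unsigned int order)
{
	const unsigned int top = DEMO_MAX_ORDER - 1;
	unsigned int current_order;
	int visited = 0;

	assert(order <= top);
	for (current_order = top;
	     current_order >= order && current_order <= top;
	     --current_order)
		visited++;
	return visited;
}
```

Both functions visit the same orders; the signed form simply needs one comparison instead of two.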

+21 -2

mm/page_io.c

··· 117 117 static void end_swap_bio_read(struct bio *bio) 118 118 { 119 119 struct page *page = bio->bi_io_vec[0].bv_page; 120 + struct task_struct *waiter = bio->bi_private; 120 121 121 122 if (bio->bi_status) { 122 123 SetPageError(page); ··· 133 132 swap_slot_free_notify(page); 134 133 out: 135 134 unlock_page(page); 135 + WRITE_ONCE(bio->bi_private, NULL); 136 136 bio_put(bio); 137 + wake_up_process(waiter); 137 138 } 138 139 139 140 int generic_swapfile_activate(struct swap_info_struct *sis, ··· 332 329 return ret; 333 330 } 334 331 335 - int swap_readpage(struct page *page) 332 + int swap_readpage(struct page *page, bool do_poll) 336 333 { 337 334 struct bio *bio; 338 335 int ret = 0; 339 336 struct swap_info_struct *sis = page_swap_info(page); 337 + blk_qc_t qc; 338 + struct block_device *bdev; 340 339 341 340 VM_BUG_ON_PAGE(!PageSwapCache(page), page); 342 341 VM_BUG_ON_PAGE(!PageLocked(page), page); ··· 377 372 ret = -ENOMEM; 378 373 goto out; 379 374 } 375 + bdev = bio->bi_bdev; 376 + bio->bi_private = current; 380 377 bio_set_op_attrs(bio, REQ_OP_READ, 0); 381 378 count_vm_event(PSWPIN); 382 - submit_bio(bio); 379 + bio_get(bio); 380 + qc = submit_bio(bio); 381 + while (do_poll) { 382 + set_current_state(TASK_UNINTERRUPTIBLE); 383 + if (!READ_ONCE(bio->bi_private)) 384 + break; 385 + 386 + if (!blk_mq_poll(bdev_get_queue(bdev), qc)) 387 + break; 388 + } 389 + __set_current_state(TASK_RUNNING); 390 + bio_put(bio); 391 + 383 392 out: 384 393 return ret; 385 394 }

+2 -16

mm/page_isolation.c

··· 8 8 #include <linux/memory.h> 9 9 #include <linux/hugetlb.h> 10 10 #include <linux/page_owner.h> 11 + #include <linux/migrate.h> 11 12 #include "internal.h" 12 13 13 14 #define CREATE_TRACE_POINTS ··· 295 294 struct page *alloc_migrate_target(struct page *page, unsigned long private, 296 295 int **resultp) 297 296 { 298 - gfp_t gfp_mask = GFP_USER | __GFP_MOVABLE; 299 - 300 - /* 301 - * TODO: allocate a destination hugepage from a nearest neighbor node, 302 - * accordance with memory policy of the user process if possible. For 303 - * now as a simple work-around, we use the next node for destination. 304 - */ 305 - if (PageHuge(page)) 306 - return alloc_huge_page_node(page_hstate(compound_head(page)), 307 - next_node_in(page_to_nid(page), 308 - node_online_map)); 309 - 310 - if (PageHighMem(page)) 311 - gfp_mask |= __GFP_HIGHMEM; 312 - 313 - return alloc_page(gfp_mask); 297 + return new_page_nodemask(page, numa_node_id(), &node_states[N_MEMORY]); 314 298 }

+5 -1

mm/page_owner.c

··· 281 281 continue; 282 282 283 283 if (PageBuddy(page)) { 284 - pfn += (1UL << page_order(page)) - 1; 284 + unsigned long freepage_order; 285 + 286 + freepage_order = page_order_unsafe(page); 287 + if (freepage_order < MAX_ORDER) 288 + pfn += (1UL << freepage_order) - 1; 285 289 continue; 286 290 } 287 291

+5 -3

mm/shmem.c

··· 1977 1977 } 1978 1978 1979 1979 sgp = SGP_CACHE; 1980 - if (vma->vm_flags & VM_HUGEPAGE) 1981 - sgp = SGP_HUGE; 1982 - else if (vma->vm_flags & VM_NOHUGEPAGE) 1980 + 1981 + if ((vma->vm_flags & VM_NOHUGEPAGE) || 1982 + test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags)) 1983 1983 sgp = SGP_NOHUGE; 1984 + else if (vma->vm_flags & VM_HUGEPAGE) 1985 + sgp = SGP_HUGE; 1984 1986 1985 1987 error = shmem_getpage_gfp(inode, vmf->pgoff, &vmf->page, sgp, 1986 1988 gfp, vma, vmf, &ret);

+8 -3

mm/swap.c

··· 688 688 689 689 static DEFINE_PER_CPU(struct work_struct, lru_add_drain_work); 690 690 691 - void lru_add_drain_all(void) 691 + void lru_add_drain_all_cpuslocked(void) 692 692 { 693 693 static DEFINE_MUTEX(lock); 694 694 static struct cpumask has_work; ··· 702 702 return; 703 703 704 704 mutex_lock(&lock); 705 - get_online_cpus(); 706 705 cpumask_clear(&has_work); 707 706 708 707 for_each_online_cpu(cpu) { ··· 721 722 for_each_cpu(cpu, &has_work) 722 723 flush_work(&per_cpu(lru_add_drain_work, cpu)); 723 724 724 - put_online_cpus(); 725 725 mutex_unlock(&lock); 726 + } 727 + 728 + void lru_add_drain_all(void) 729 + { 730 + get_online_cpus(); 731 + lru_add_drain_all_cpuslocked(); 732 + put_online_cpus(); 726 733 } 727 734 728 735 /**
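
The hunk above splits the function into a `_cpuslocked` variant, which expects the caller to hold the CPU hotplug lock, and a thin wrapper that takes and releases it. The following is a rough userspace sketch of that shape only. Every `demo_` name is hypothetical, and a plain hold counter stands in for `get_online_cpus()`/`put_online_cpus()`; it does not model the real locking.

```c
#include <assert.h>

/* Hypothetical stand-in for the hotplug lock: a simple hold count. */
static int demo_hotplug_holds;
/* Counts how many drains ran, so callers can observe the effect. */
static int demo_drain_count;

void demo_get_online_cpus(void)
{
	demo_hotplug_holds++;
}

void demo_put_online_cpus(void)
{
	assert(demo_hotplug_holds > 0);
	demo_hotplug_holds--;
}

int demo_hotplug_hold_count(void)
{
	return demo_hotplug_holds;
}

/* Variant for callers that already hold the (stand-in) hotplug lock. */
int demo_drain_all_cpuslocked(void)
{
	assert(demo_hotplug_holds > 0);
	return ++demo_drain_count;
}

/* Wrapper for everyone else: take the lock, do the work, drop the lock. */
int demo_drain_all(void)
{
	int ret;

	demo_get_online_cpus();
	ret = demo_drain_all_cpuslocked();
	demo_put_online_cpus();
	return ret;
}
```

Existing callers keep using the wrapper unchanged, while a caller that already holds the lock can use the `_cpuslocked` variant directly instead of acquiring it a second time.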

+2 -3

mm/swap_slots.c

··· 273 273 { 274 274 struct swap_slots_cache *cache; 275 275 276 - cache = &get_cpu_var(swp_slots); 276 + cache = raw_cpu_ptr(&swp_slots); 277 277 if (use_swap_slot_cache && cache->slots_ret) { 278 278 spin_lock_irq(&cache->free_lock); 279 279 /* Swap slots cache may be deactivated before acquiring lock */ 280 - if (!use_swap_slot_cache) { 280 + if (!use_swap_slot_cache || !cache->slots_ret) { 281 281 spin_unlock_irq(&cache->free_lock); 282 282 goto direct_free; 283 283 } ··· 297 297 direct_free: 298 298 swapcache_free_entries(&entry, 1); 299 299 } 300 - put_cpu_var(swp_slots); 301 300 302 301 return 0; 303 302 }

+6 -4

mm/swap_state.c

··· 412 412 * the swap entry is no longer in use. 413 413 */ 414 414 struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, 415 - struct vm_area_struct *vma, unsigned long addr) 415 + struct vm_area_struct *vma, unsigned long addr, bool do_poll) 416 416 { 417 417 bool page_was_allocated; 418 418 struct page *retpage = __read_swap_cache_async(entry, gfp_mask, 419 419 vma, addr, &page_was_allocated); 420 420 421 421 if (page_was_allocated) 422 - swap_readpage(retpage); 422 + swap_readpage(retpage, do_poll); 423 423 424 424 return retpage; 425 425 } ··· 496 496 unsigned long start_offset, end_offset; 497 497 unsigned long mask; 498 498 struct blk_plug plug; 499 + bool do_poll = true; 499 500 500 501 mask = swapin_nr_pages(offset) - 1; 501 502 if (!mask) 502 503 goto skip; 503 504 505 + do_poll = false; 504 506 /* Read a page_cluster sized and aligned cluster around offset. */ 505 507 start_offset = offset & ~mask; 506 508 end_offset = offset | mask; ··· 513 511 for (offset = start_offset; offset <= end_offset ; offset++) { 514 512 /* Ok, do the async read-ahead now */ 515 513 page = read_swap_cache_async(swp_entry(swp_type(entry), offset), 516 - gfp_mask, vma, addr); 514 + gfp_mask, vma, addr, false); 517 515 if (!page) 518 516 continue; 519 517 if (offset != entry_offset && likely(!PageTransCompound(page))) ··· 524 522 525 523 lru_add_drain(); /* Push any new pages onto the LRU now */ 526 524 skip: 527 - return read_swap_cache_async(entry, gfp_mask, vma, addr); 525 + return read_swap_cache_async(entry, gfp_mask, vma, addr, do_poll); 528 526 } 529 527 530 528 int init_swap_address_space(unsigned int type, unsigned long nr_pages)

+1 -1

mm/swapfile.c

··· 1868 1868 swap_map = &si->swap_map[i]; 1869 1869 entry = swp_entry(type, i); 1870 1870 page = read_swap_cache_async(entry, 1871 - GFP_HIGHUSER_MOVABLE, NULL, 0); 1871 + GFP_HIGHUSER_MOVABLE, NULL, 0, false); 1872 1872 if (!page) { 1873 1873 /* 1874 1874 * Either swap_duplicate() failed because entry

+8 -2

mm/truncate.c

··· 530 530 } else if (PageTransHuge(page)) { 531 531 index += HPAGE_PMD_NR - 1; 532 532 i += HPAGE_PMD_NR - 1; 533 - /* 'end' is in the middle of THP */ 534 - if (index == round_down(end, HPAGE_PMD_NR)) 533 + /* 534 + * 'end' is in the middle of THP. Don't 535 + * invalidate the page as the part outside of 536 + * 'end' could be still useful. 537 + */ 538 + if (index > end) { 539 + unlock_page(page); 535 540 continue; 541 + } 536 542 } 537 543 538 544 ret = invalidate_inode_page(page);
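
The new check in this hunk runs after `index` has been advanced to the last subpage of the huge page, so `index > end` means the huge page extends past the range being invalidated. A small sketch of that arithmetic follows. `DEMO_HPAGE_PMD_NR` (512) and the function name are assumptions for illustration, and `head_index` is assumed to be the index of the head page.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed for illustration: 512 base pages per PMD-sized huge page. */
#define DEMO_HPAGE_PMD_NR 512UL

/*
 * Returns true when the huge page starting at head_index reaches beyond
 * 'end' (the last index to invalidate, inclusive). In that case the
 * patched code unlocks the page and skips it instead of invalidating it.
 */
bool demo_thp_crosses_end(unsigned long head_index, unsigned long end)
{
	unsigned long last_index = head_index + DEMO_HPAGE_PMD_NR - 1;

	assert(head_index % DEMO_HPAGE_PMD_NR == 0);
	return last_index > end;
}
```

With `end = 700`, the huge page at index 0 (covering 0..511) lies fully inside the range, while the one at index 512 (covering 512..1023) crosses `end` and would be skipped.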

+9 -1

mm/vmalloc.c

··· 325 325 326 326 /*** Global kva allocator ***/ 327 327 328 + #define VM_LAZY_FREE 0x02 328 329 #define VM_VM_AREA 0x04 329 330 330 331 static DEFINE_SPINLOCK(vmap_area_lock); ··· 1498 1497 spin_lock(&vmap_area_lock); 1499 1498 va->vm = NULL; 1500 1499 va->flags &= ~VM_VM_AREA; 1500 + va->flags |= VM_LAZY_FREE; 1501 1501 spin_unlock(&vmap_area_lock); 1502 1502 1503 1503 vmap_debug_free_range(va->va_start, va->va_end); ··· 2706 2704 * s_show can encounter race with remove_vm_area, !VM_VM_AREA on 2707 2705 * behalf of vmap area is being tear down or vm_map_ram allocation. 2708 2706 */ 2709 - if (!(va->flags & VM_VM_AREA)) 2707 + if (!(va->flags & VM_VM_AREA)) { 2708 + seq_printf(m, "0x%pK-0x%pK %7ld %s\n", 2709 + (void *)va->va_start, (void *)va->va_end, 2710 + va->va_end - va->va_start, 2711 + va->flags & VM_LAZY_FREE ? "unpurged vm_area" : "vm_map_ram"); 2712 + 2710 2713 return 0; 2714 + } 2711 2715 2712 2716 v = va->vm; 2713 2717

+93 -29

mm/vmpressure.c

··· 93 93 VMPRESSURE_NUM_LEVELS, 94 94 }; 95 95 96 + enum vmpressure_modes { 97 + VMPRESSURE_NO_PASSTHROUGH = 0, 98 + VMPRESSURE_HIERARCHY, 99 + VMPRESSURE_LOCAL, 100 + VMPRESSURE_NUM_MODES, 101 + }; 102 + 96 103 static const char * const vmpressure_str_levels[] = { 97 104 [VMPRESSURE_LOW] = "low", 98 105 [VMPRESSURE_MEDIUM] = "medium", 99 106 [VMPRESSURE_CRITICAL] = "critical", 107 + }; 108 + 109 + static const char * const vmpressure_str_modes[] = { 110 + [VMPRESSURE_NO_PASSTHROUGH] = "default", 111 + [VMPRESSURE_HIERARCHY] = "hierarchy", 112 + [VMPRESSURE_LOCAL] = "local", 100 113 }; 101 114 102 115 static enum vmpressure_levels vmpressure_level(unsigned long pressure) ··· 154 141 struct vmpressure_event { 155 142 struct eventfd_ctx *efd; 156 143 enum vmpressure_levels level; 144 + enum vmpressure_modes mode; 157 145 struct list_head node; 158 146 }; 159 147 160 148 static bool vmpressure_event(struct vmpressure *vmpr, 161 - enum vmpressure_levels level) 149 + const enum vmpressure_levels level, 150 + bool ancestor, bool signalled) 162 151 { 163 152 struct vmpressure_event *ev; 164 - bool signalled = false; 153 + bool ret = false; 165 154 166 155 mutex_lock(&vmpr->events_lock); 167 - 168 156 list_for_each_entry(ev, &vmpr->events, node) { 169 - if (level >= ev->level) { 170 - eventfd_signal(ev->efd, 1); 171 - signalled = true; 172 - } 157 + if (ancestor && ev->mode == VMPRESSURE_LOCAL) 158 + continue; 159 + if (signalled && ev->mode == VMPRESSURE_NO_PASSTHROUGH) 160 + continue; 161 + if (level < ev->level) 162 + continue; 163 + eventfd_signal(ev->efd, 1); 164 + ret = true; 173 165 } 174 - 175 166 mutex_unlock(&vmpr->events_lock); 176 167 177 - return signalled; 168 + return ret; 178 169 } 179 170 180 171 static void vmpressure_work_fn(struct work_struct *work) ··· 187 170 unsigned long scanned; 188 171 unsigned long reclaimed; 189 172 enum vmpressure_levels level; 173 + bool ancestor = false; 174 + bool signalled = false; 190 175 191 176 
spin_lock(&vmpr->sr_lock); 192 177 /* ··· 213 194 level = vmpressure_calc_level(scanned, reclaimed); 214 195 215 196 do { 216 - if (vmpressure_event(vmpr, level)) 217 - break; 218 - /* 219 - * If not handled, propagate the event upward into the 220 - * hierarchy. 221 - */ 197 + if (vmpressure_event(vmpr, level, ancestor, signalled)) 198 + signalled = true; 199 + ancestor = true; 222 200 } while ((vmpr = vmpressure_parent(vmpr))); 223 201 } 224 202 ··· 342 326 vmpressure(gfp, memcg, true, vmpressure_win, 0); 343 327 } 344 328 329 + static enum vmpressure_levels str_to_level(const char *arg) 330 + { 331 + enum vmpressure_levels level; 332 + 333 + for (level = 0; level < VMPRESSURE_NUM_LEVELS; level++) 334 + if (!strcmp(vmpressure_str_levels[level], arg)) 335 + return level; 336 + return -1; 337 + } 338 + 339 + static enum vmpressure_modes str_to_mode(const char *arg) 340 + { 341 + enum vmpressure_modes mode; 342 + 343 + for (mode = 0; mode < VMPRESSURE_NUM_MODES; mode++) 344 + if (!strcmp(vmpressure_str_modes[mode], arg)) 345 + return mode; 346 + return -1; 347 + } 348 + 349 + #define MAX_VMPRESSURE_ARGS_LEN (strlen("critical") + strlen("hierarchy") + 2) 350 + 345 351 /** 346 352 * vmpressure_register_event() - Bind vmpressure notifications to an eventfd 347 353 * @memcg: memcg that is interested in vmpressure notifications 348 354 * @eventfd: eventfd context to link notifications with 349 - * @args: event arguments (used to set up a pressure level threshold) 355 + * @args: event arguments (pressure level threshold, optional mode) 350 356 * 351 357 * This function associates eventfd context with the vmpressure 352 358 * infrastructure, so that the notifications will be delivered to the 353 - * @eventfd. The @args parameter is a string that denotes pressure level 354 - * threshold (one of vmpressure_str_levels, i.e. "low", "medium", or 355 - * "critical"). 359 + * @eventfd. 
The @args parameter is a comma-delimited string that denotes a 360 + * pressure level threshold (one of vmpressure_str_levels, i.e. "low", "medium", 361 + * or "critical") and an optional mode (one of vmpressure_str_modes, i.e. 362 + * "hierarchy" or "local"). 356 363 * 357 364 * To be used as memcg event method. 358 365 */ ··· 384 345 { 385 346 struct vmpressure *vmpr = memcg_to_vmpressure(memcg); 386 347 struct vmpressure_event *ev; 387 - int level; 348 + enum vmpressure_modes mode = VMPRESSURE_NO_PASSTHROUGH; 349 + enum vmpressure_levels level = -1; 350 + char *spec, *spec_orig; 351 + char *token; 352 + int ret = 0; 388 353 389 - for (level = 0; level < VMPRESSURE_NUM_LEVELS; level++) { 390 - if (!strcmp(vmpressure_str_levels[level], args)) 391 - break; 354 + spec_orig = spec = kzalloc(MAX_VMPRESSURE_ARGS_LEN + 1, GFP_KERNEL); 355 + if (!spec) { 356 + ret = -ENOMEM; 357 + goto out; 358 + } 359 + strncpy(spec, args, MAX_VMPRESSURE_ARGS_LEN); 360 + 361 + /* Find required level */ 362 + token = strsep(&spec, ","); 363 + level = str_to_level(token); 364 + if (level == -1) { 365 + ret = -EINVAL; 366 + goto out; 392 367 } 393 368 394 - if (level >= VMPRESSURE_NUM_LEVELS) 395 - return -EINVAL; 369 + /* Find optional mode */ 370 + token = strsep(&spec, ","); 371 + if (token) { 372 + mode = str_to_mode(token); 373 + if (mode == -1) { 374 + ret = -EINVAL; 375 + goto out; 376 + } 377 + } 396 378 397 379 ev = kzalloc(sizeof(*ev), GFP_KERNEL); 398 - if (!ev) 399 - return -ENOMEM; 380 + if (!ev) { 381 + ret = -ENOMEM; 382 + goto out; 383 + } 400 384 401 385 ev->efd = eventfd; 402 386 ev->level = level; 387 + ev->mode = mode; 403 388 404 389 mutex_lock(&vmpr->events_lock); 405 390 list_add(&ev->node, &vmpr->events); 406 391 mutex_unlock(&vmpr->events_lock); 407 - 408 - return 0; 392 + out: 393 + kfree(spec_orig); 394 + return ret; 409 395 } 410 396 411 397 /**
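
The interaction of the `ancestor` and `signalled` flags with the three modes can be hard to follow from the diff alone. The sketch below replays the same filtering in userspace for the A->B->C example from the documentation change. All `demo_` names are hypothetical, and each group has a single listener for brevity, whereas the kernel keeps a list of events per memcg.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum demo_level { DEMO_LOW, DEMO_MEDIUM, DEMO_CRITICAL };
enum demo_mode { DEMO_NO_PASSTHROUGH, DEMO_HIERARCHY, DEMO_LOCAL };

/* One listener per group keeps the sketch short. */
struct demo_group {
	struct demo_group *parent;
	bool has_listener;
	enum demo_level level;	/* threshold the listener registered for */
	enum demo_mode mode;
	int notified;		/* how many times the listener fired */
};

/* Same three filters, in the same order, as the patched vmpressure_event(). */
static bool demo_event(struct demo_group *g, enum demo_level level,
		       bool ancestor, bool signalled)
{
	if (!g->has_listener)
		return false;
	if (ancestor && g->mode == DEMO_LOCAL)
		return false;	/* "local" ignores pressure from descendants */
	if (signalled && g->mode == DEMO_NO_PASSTHROUGH)
		return false;	/* default mode: someone below already handled it */
	if (level < g->level)
		return false;
	g->notified++;
	return true;
}

/* Walk from the group under pressure up to the root, as the work fn does. */
void demo_pressure(struct demo_group *origin, enum demo_level level)
{
	bool ancestor = false;
	bool signalled = false;
	struct demo_group *g;

	for (g = origin; g != NULL; g = g->parent) {
		if (demo_event(g, level, ancestor, signalled))
			signalled = true;
		ancestor = true;
	}
}

/* Builds A->B->C, applies medium pressure to C; bit 0 = A, 1 = B, 2 = C. */
unsigned int demo_notify_mask(enum demo_mode mode_a, enum demo_mode mode_b,
			      enum demo_mode mode_c)
{
	struct demo_group a = { NULL, true, DEMO_LOW, mode_a, 0 };
	struct demo_group b = { &a, true, DEMO_LOW, mode_b, 0 };
	struct demo_group c = { &b, true, DEMO_LOW, mode_c, 0 };

	demo_pressure(&c, DEMO_MEDIUM);
	return (a.notified ? 1u : 0u) | (b.notified ? 2u : 0u) |
	       (c.notified ? 4u : 0u);
}
```

With default listeners everywhere only C fires; switching a listener on A or B to hierarchy mode lets it fire as well, and a local listener on B never fires for pressure that originates in C.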

+11 -2

mm/vmscan.c

··· 2228 2228 } 2229 2229 2230 2230 if (unlikely(pgdatfile + pgdatfree <= total_high_wmark)) { 2231 - scan_balance = SCAN_ANON; 2232 - goto out; 2231 + /* 2232 + * Force SCAN_ANON if there are enough inactive 2233 + * anonymous pages on the LRU in eligible zones. 2234 + * Otherwise, the small LRU gets thrashed. 2235 + */ 2236 + if (!inactive_list_is_low(lruvec, false, memcg, sc, false) && 2237 + lruvec_lru_size(lruvec, LRU_INACTIVE_ANON, sc->reclaim_idx) 2238 + >> sc->priority) { 2239 + scan_balance = SCAN_ANON; 2240 + goto out; 2241 + } 2233 2242 } 2234 2243 } 2235 2244

+14 -10

mm/vmstat.c

··· 1130 1130 * If @assert_populated is true, only use callback for zones that are populated. 1131 1131 */ 1132 1132 static void walk_zones_in_node(struct seq_file *m, pg_data_t *pgdat, 1133 - bool assert_populated, 1133 + bool assert_populated, bool nolock, 1134 1134 void (*print)(struct seq_file *m, pg_data_t *, struct zone *)) 1135 1135 { 1136 1136 struct zone *zone; ··· 1141 1141 if (assert_populated && !populated_zone(zone)) 1142 1142 continue; 1143 1143 1144 - spin_lock_irqsave(&zone->lock, flags); 1144 + if (!nolock) 1145 + spin_lock_irqsave(&zone->lock, flags); 1145 1146 print(m, pgdat, zone); 1146 - spin_unlock_irqrestore(&zone->lock, flags); 1147 + if (!nolock) 1148 + spin_unlock_irqrestore(&zone->lock, flags); 1147 1149 } 1148 1150 } 1149 1151 #endif ··· 1168 1166 static int frag_show(struct seq_file *m, void *arg) 1169 1167 { 1170 1168 pg_data_t *pgdat = (pg_data_t *)arg; 1171 - walk_zones_in_node(m, pgdat, true, frag_show_print); 1169 + walk_zones_in_node(m, pgdat, true, false, frag_show_print); 1172 1170 return 0; 1173 1171 } 1174 1172 ··· 1209 1207 seq_printf(m, "%6d ", order); 1210 1208 seq_putc(m, '\n'); 1211 1209 1212 - walk_zones_in_node(m, pgdat, true, pagetypeinfo_showfree_print); 1210 + walk_zones_in_node(m, pgdat, true, false, pagetypeinfo_showfree_print); 1213 1211 1214 1212 return 0; 1215 1213 } ··· 1260 1258 for (mtype = 0; mtype < MIGRATE_TYPES; mtype++) 1261 1259 seq_printf(m, "%12s ", migratetype_names[mtype]); 1262 1260 seq_putc(m, '\n'); 1263 - walk_zones_in_node(m, pgdat, true, pagetypeinfo_showblockcount_print); 1261 + walk_zones_in_node(m, pgdat, true, false, 1262 + pagetypeinfo_showblockcount_print); 1264 1263 1265 1264 return 0; 1266 1265 } ··· 1287 1284 seq_printf(m, "%12s ", migratetype_names[mtype]); 1288 1285 seq_putc(m, '\n'); 1289 1286 1290 - walk_zones_in_node(m, pgdat, true, pagetypeinfo_showmixedcount_print); 1287 + walk_zones_in_node(m, pgdat, true, true, 1288 + pagetypeinfo_showmixedcount_print); 1291 1289 #endif /* 
CONFIG_PAGE_OWNER */ 1292 1290 } 1293 1291 ··· 1450 1446 static int zoneinfo_show(struct seq_file *m, void *arg) 1451 1447 { 1452 1448 pg_data_t *pgdat = (pg_data_t *)arg; 1453 - walk_zones_in_node(m, pgdat, false, zoneinfo_show_print); 1449 + walk_zones_in_node(m, pgdat, false, false, zoneinfo_show_print); 1454 1450 return 0; 1455 1451 } 1456 1452 ··· 1856 1852 if (!node_state(pgdat->node_id, N_MEMORY)) 1857 1853 return 0; 1858 1854 1859 - walk_zones_in_node(m, pgdat, true, unusable_show_print); 1855 + walk_zones_in_node(m, pgdat, true, false, unusable_show_print); 1860 1856 1861 1857 return 0; 1862 1858 } ··· 1908 1904 { 1909 1905 pg_data_t *pgdat = (pg_data_t *)arg; 1910 1906 1911 - walk_zones_in_node(m, pgdat, true, extfrag_show_print); 1907 + walk_zones_in_node(m, pgdat, true, false, extfrag_show_print); 1912 1908 1913 1909 return 0; 1914 1910 }

+16 -38

mm/zsmalloc.c

··· 116 116 #define OBJ_INDEX_BITS (BITS_PER_LONG - _PFN_BITS - OBJ_TAG_BITS) 117 117 #define OBJ_INDEX_MASK ((_AC(1, UL) << OBJ_INDEX_BITS) - 1) 118 118 119 + #define FULLNESS_BITS 2 120 + #define CLASS_BITS 8 121 + #define ISOLATED_BITS 3 122 + #define MAGIC_VAL_BITS 8 123 + 119 124 #define MAX(a, b) ((a) >= (b) ? (a) : (b)) 120 125 /* ZS_MIN_ALLOC_SIZE must be multiple of ZS_ALIGN */ 121 126 #define ZS_MIN_ALLOC_SIZE \ ··· 142 137 * (reason above) 143 138 */ 144 139 #define ZS_SIZE_CLASS_DELTA (PAGE_SIZE >> CLASS_BITS) 140 + #define ZS_SIZE_CLASSES (DIV_ROUND_UP(ZS_MAX_ALLOC_SIZE - ZS_MIN_ALLOC_SIZE, \ 141 + ZS_SIZE_CLASS_DELTA) + 1) 145 142 146 143 enum fullness_group { 147 144 ZS_EMPTY, ··· 174 167 #ifdef CONFIG_COMPACTION 175 168 static struct vfsmount *zsmalloc_mnt; 176 169 #endif 177 - 178 - /* 179 - * number of size_classes 180 - */ 181 - static int zs_size_classes; 182 170 183 171 /* 184 172 * We assign a page to ZS_ALMOST_EMPTY fullness group when: ··· 246 244 struct zs_pool { 247 245 const char *name; 248 246 249 - struct size_class **size_class; 247 + struct size_class *size_class[ZS_SIZE_CLASSES]; 250 248 struct kmem_cache *handle_cachep; 251 249 struct kmem_cache *zspage_cachep; 252 250 ··· 269 267 struct work_struct free_work; 270 268 #endif 271 269 }; 272 - 273 - #define FULLNESS_BITS 2 274 - #define CLASS_BITS 8 275 - #define ISOLATED_BITS 3 276 - #define MAGIC_VAL_BITS 8 277 270 278 271 struct zspage { 279 272 struct { ··· 466 469 return zspage->isolated; 467 470 } 468 471 469 - static int is_first_page(struct page *page) 472 + static __maybe_unused int is_first_page(struct page *page) 470 473 { 471 474 return PagePrivate(page); 472 475 } ··· 548 551 idx = DIV_ROUND_UP(size - ZS_MIN_ALLOC_SIZE, 549 552 ZS_SIZE_CLASS_DELTA); 550 553 551 - return min(zs_size_classes - 1, idx); 554 + return min_t(int, ZS_SIZE_CLASSES - 1, idx); 552 555 } 553 556 554 557 static inline void zs_stat_inc(struct size_class *class, ··· 607 610 "obj_allocated", "obj_used", 
"pages_used", 608 611 "pages_per_zspage", "freeable"); 609 612 610 - for (i = 0; i < zs_size_classes; i++) { 613 + for (i = 0; i < ZS_SIZE_CLASSES; i++) { 611 614 class = pool->size_class[i]; 612 615 613 616 if (class->index != i) ··· 1289 1292 area = &per_cpu(zs_map_area, cpu); 1290 1293 __zs_cpu_down(area); 1291 1294 return 0; 1292 - } 1293 - 1294 - static void __init init_zs_size_classes(void) 1295 - { 1296 - int nr; 1297 - 1298 - nr = (ZS_MAX_ALLOC_SIZE - ZS_MIN_ALLOC_SIZE) / ZS_SIZE_CLASS_DELTA + 1; 1299 - if ((ZS_MAX_ALLOC_SIZE - ZS_MIN_ALLOC_SIZE) % ZS_SIZE_CLASS_DELTA) 1300 - nr += 1; 1301 - 1302 - zs_size_classes = nr; 1303 1295 } 1304 1296 1305 1297 static bool can_merge(struct size_class *prev, int pages_per_zspage, ··· 2131 2145 struct zs_pool *pool = container_of(work, struct zs_pool, 2132 2146 free_work); 2133 2147 2134 - for (i = 0; i < zs_size_classes; i++) { 2148 + for (i = 0; i < ZS_SIZE_CLASSES; i++) { 2135 2149 class = pool->size_class[i]; 2136 2150 if (class->index != i) 2137 2151 continue; ··· 2249 2263 int i; 2250 2264 struct size_class *class; 2251 2265 2252 - for (i = zs_size_classes - 1; i >= 0; i--) { 2266 + for (i = ZS_SIZE_CLASSES - 1; i >= 0; i--) { 2253 2267 class = pool->size_class[i]; 2254 2268 if (!class) 2255 2269 continue; ··· 2295 2309 struct zs_pool *pool = container_of(shrinker, struct zs_pool, 2296 2310 shrinker); 2297 2311 2298 - for (i = zs_size_classes - 1; i >= 0; i--) { 2312 + for (i = ZS_SIZE_CLASSES - 1; i >= 0; i--) { 2299 2313 class = pool->size_class[i]; 2300 2314 if (!class) 2301 2315 continue; ··· 2347 2361 return NULL; 2348 2362 2349 2363 init_deferred_free(pool); 2350 - pool->size_class = kcalloc(zs_size_classes, sizeof(struct size_class *), 2351 - GFP_KERNEL); 2352 - if (!pool->size_class) { 2353 - kfree(pool); 2354 - return NULL; 2355 - } 2356 2364 2357 2365 pool->name = kstrdup(name, GFP_KERNEL); 2358 2366 if (!pool->name) ··· 2359 2379 * Iterate reversely, because, size of size_class that we want to use 2360 
2380 * for merging should be larger or equal to current size. 2361 2381 */ 2362 - for (i = zs_size_classes - 1; i >= 0; i--) { 2382 + for (i = ZS_SIZE_CLASSES - 1; i >= 0; i--) { 2363 2383 int size; 2364 2384 int pages_per_zspage; 2365 2385 int objs_per_zspage; ··· 2433 2453 zs_unregister_migration(pool); 2434 2454 zs_pool_stat_destroy(pool); 2435 2455 2436 - for (i = 0; i < zs_size_classes; i++) { 2456 + for (i = 0; i < ZS_SIZE_CLASSES; i++) { 2437 2457 int fg; 2438 2458 struct size_class *class = pool->size_class[i]; 2439 2459 ··· 2471 2491 zs_cpu_prepare, zs_cpu_dead); 2472 2492 if (ret) 2473 2493 goto hp_setup_fail; 2474 - 2475 - init_zs_size_classes(); 2476 2494 2477 2495 #ifdef CONFIG_ZPOOL 2478 2496 zpool_register_driver(&zs_zpool_driver);
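
This diff replaces the runtime `init_zs_size_classes()` computation with the `ZS_SIZE_CLASSES` constant expression, which is what allows `size_class` to become a fixed-size array in `struct zs_pool`. The sketch below checks that the two formulas agree. The concrete sizes (4096, 32, 16) and every `DEMO_`/`demo_` name are assumptions chosen for illustration, not values taken from the patch.

```c
#include <assert.h>

#define DEMO_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Old form: the arithmetic from the removed init_zs_size_classes(). */
int demo_nr_classes_runtime(int max_size, int min_size, int delta)
{
	int nr;

	assert(delta > 0 && max_size >= min_size);
	nr = (max_size - min_size) / delta + 1;
	if ((max_size - min_size) % delta)
		nr += 1;
	return nr;
}

/* New form: the same count written as a single rounding division. */
int demo_nr_classes_constexpr(int max_size, int min_size, int delta)
{
	assert(delta > 0 && max_size >= min_size);
	return DEMO_DIV_ROUND_UP(max_size - min_size, delta) + 1;
}

/* Assumed sizes: 4096-byte maximum, 32-byte minimum, 16-byte class delta. */
#define DEMO_SIZE_CLASSES (DEMO_DIV_ROUND_UP(4096 - 32, 16) + 1)

/* Because the count is a constant expression, it can size an array. */
struct demo_pool {
	void *size_class[DEMO_SIZE_CLASSES];
};

int demo_pool_slots(void)
{
	return (int)(sizeof(((struct demo_pool *)0)->size_class) /
		     sizeof(void *));
}
```

A side effect visible in the diff is that the `kcalloc()` of the class array and its error path disappear from pool creation.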

+81 -21

scripts/checkpatch.pl

··· 57 57 my $codespellfile = "/usr/share/codespell/dictionary.txt"; 58 58 my $conststructsfile = "$D/const_structs.checkpatch"; 59 59 my $typedefsfile = ""; 60 - my $color = 1; 60 + my $color = "auto"; 61 61 my $allow_c99_comments = 1; 62 62 63 63 sub help { ··· 116 116 (default:/usr/share/codespell/dictionary.txt) 117 117 --codespellfile Use this codespell dictionary 118 118 --typedefsfile Read additional types from this file 119 - --color Use colors when output is STDOUT (default: on) 119 + --color[=WHEN] Use colors 'always', 'never', or only when output 120 + is a terminal ('auto'). Default is 'auto'. 120 121 -h, --help, --version display this help and exit 121 122 122 123 When FILE is - read standard input. ··· 183 182 unshift(@ARGV, @conf_args) if @conf_args; 184 183 } 185 184 185 + # Perl's Getopt::Long allows options to take optional arguments after a space. 186 + # Prevent --color by itself from consuming other arguments 187 + foreach (@ARGV) { 188 + if ($_ eq "--color" || $_ eq "-color") { 189 + $_ = "--color=$color"; 190 + } 191 + } 192 + 186 193 GetOptions( 187 194 'q|quiet+' => \$quiet, 188 195 'tree!' => \$tree, ··· 221 212 'codespell!' => \$codespell, 222 213 'codespellfile=s' => \$codespellfile, 223 214 'typedefsfile=s' => \$typedefsfile, 224 - 'color!' 
=> \$color, 215 + 'color=s' => \$color, 216 + 'no-color' => \$color, #keep old behaviors of -nocolor 217 + 'nocolor' => \$color, #keep old behaviors of -nocolor 225 218 'h|help' => \$help, 226 219 'version' => \$help 227 220 ) or help(1); ··· 247 236 #if no filenames are given, push '-' to read patch from stdin 248 237 if ($#ARGV < 0) { 249 238 push(@ARGV, '-'); 239 + } 240 + 241 + if ($color =~ /^[01]$/) { 242 + $color = !$color; 243 + } elsif ($color =~ /^always$/i) { 244 + $color = 1; 245 + } elsif ($color =~ /^never$/i) { 246 + $color = 0; 247 + } elsif ($color =~ /^auto$/i) { 248 + $color = (-t STDOUT); 249 + } else { 250 + die "Invalid color mode: $color\n"; 250 251 } 251 252 252 253 sub hash_save_array_words { ··· 756 733 757 734 our $declaration_macros = qr{(?x: 758 735 (?:$Storage\s+)?(?:[A-Z_][A-Z0-9]*_){0,2}(?:DEFINE|DECLARE)(?:_[A-Z0-9]+){1,6}\s*\(| 759 - (?:$Storage\s+)?LIST_HEAD\s*\(| 736 + (?:$Storage\s+)?[HLP]?LIST_HEAD\s*\(| 760 737 (?:$Storage\s+)?${Type}\s+uninitialized_var\s*\( 761 738 )}; 762 739 ··· 890 867 # echo "commit $(cut -c 1-12,41-)" 891 868 # done 892 869 } elsif ($lines[0] =~ /^fatal: ambiguous argument '$commit': unknown revision or path not in the working tree\./) { 870 + $id = undef; 893 871 } else { 894 872 $id = substr($lines[0], 0, 12); 895 873 $desc = substr($lines[0], 41); ··· 1906 1882 return 0; 1907 1883 } 1908 1884 my $output = ''; 1909 - if (-t STDOUT && $color) { 1885 + if ($color) { 1910 1886 if ($level eq 'ERROR') { 1911 1887 $output .= RED; 1912 1888 } elsif ($level eq 'WARNING') { ··· 1917 1893 } 1918 1894 $output .= $prefix . $level . ':'; 1919 1895 if ($show_types) { 1920 - $output .= BLUE if (-t STDOUT && $color); 1896 + $output .= BLUE if ($color); 1921 1897 $output .= "$type:"; 1922 1898 } 1923 - $output .= RESET if (-t STDOUT && $color); 1899 + $output .= RESET if ($color); 1924 1900 $output .= ' ' . $msg . 
"\n"; 1925 1901 1926 1902 if ($showfile) { ··· 2630 2606 ($id, $description) = git_commit_info($orig_commit, 2631 2607 $id, $orig_desc); 2632 2608 2633 - if ($short || $long || $space || $case || ($orig_desc ne $description) || !$hasparens) { 2609 + if (defined($id) && 2610 + ($short || $long || $space || $case || ($orig_desc ne $description) || !$hasparens)) { 2634 2611 ERROR("GIT_COMMIT_ID", 2635 2612 "Please use git commit description style 'commit <12+ chars of sha1> (\"<title line>\")' - ie: '${init_char}ommit $id (\"$description\")'\n" . $herecurr); 2636 2613 } ··· 2799 2774 "please write a paragraph that describes the config symbol fully\n" . $herecurr); 2800 2775 } 2801 2776 #print "is_start<$is_start> is_end<$is_end> length<$length>\n"; 2777 + } 2778 + 2779 + # check for MAINTAINERS entries that don't have the right form 2780 + if ($realfile =~ /^MAINTAINERS$/ && 2781 + $rawline =~ /^\+[A-Z]:/ && 2782 + $rawline !~ /^\+[A-Z]:\t\S/) { 2783 + if (WARN("MAINTAINERS_STYLE", 2784 + "MAINTAINERS entries use one tab after TYPE:\n" . 
$herecurr) && 2785 + $fix) { 2786 + $fixed[$fixlinenr] =~ s/^(\+[A-Z]):\s*/$1:\t/; 2787 + } 2802 2788 } 2803 2789 2804 2790 # discourage the use of boolean for type definition attributes of Kconfig options ··· 2993 2957 2994 2958 # check multi-line statement indentation matches previous line 2995 2959 if ($^V && $^V ge 5.10.0 && 2996 - $prevline =~ /^\+([ \t]*)((?:$c90_Keywords(?:\s+if)\s*)|(?:$Declare\s*)?(?:$Ident|$\s*\*\s*$Ident\s*$)\s*|$Ident\s*=\s*$Ident\s*)$.*(\&\&|\|\||,)\s*$/) { 2960 + $prevline =~ /^\+([ \t]*)((?:$c90_Keywords(?:\s+if)\s*)|(?:$Declare\s*)?(?:$Ident|\(\s*\*\s*$Ident\s*$)\s*|(?:\*\s*)*$Lval\s*=\s*$Ident\s*)\(.*(\&\&|\|\||,)\s*$/) { 2997 2961 $prevline =~ /^\+(\t*)(.*)$/; 2998 2962 my $oldindent = $1; 2999 2963 my $rest = $2; ··· 3244 3208 my ($stat, $cond, $line_nr_next, $remain_next, $off_next, 3245 3209 $realline_next); 3246 3210 #print "LINE<$line>\n"; 3247 - if ($linenr >= $suppress_statement && 3211 + if ($linenr > $suppress_statement && 3248 3212 $realcnt && $sline =~ /.\s*\S/) { 3249 3213 ($stat, $cond, $line_nr_next, $remain_next, $off_next) = 3250 3214 ctx_statement_block($linenr, $realcnt, 0); ··· 3578 3542 $fixedline =~ s/\s*=\s*$/ = {/; 3579 3543 fix_insert_line($fixlinenr, $fixedline); 3580 3544 $fixedline = $line; 3581 - $fixedline =~ s/^(.\s*){\s*/$1/; 3545 + $fixedline =~ s/^(.\s*)\{\s*/$1/; 3582 3546 fix_insert_line($fixlinenr, $fixedline); 3583 3547 } 3584 3548 } ··· 3919 3883 my $fixedline = rtrim($prevrawline) . " {"; 3920 3884 fix_insert_line($fixlinenr, $fixedline); 3921 3885 $fixedline = $rawline; 3922 - $fixedline =~ s/^(.\s*){\s*/$1\t/; 3886 + $fixedline =~ s/^(.\s*)\{\s*/$1\t/; 3923 3887 if ($fixedline !~ /^\+\s*$/) { 3924 3888 fix_insert_line($fixlinenr, $fixedline); 3925 3889 } ··· 4408 4372 if (ERROR("SPACING", 4409 4373 "space required before the open brace '{'\n" . 
$herecurr) && 4410 4374 $fix) { 4411 - $fixed[$fixlinenr] =~ s/^(\+.*(?:do|\))){/$1 {/; 4375 + $fixed[$fixlinenr] =~ s/^(\+.*(?:do|\)))\{/$1 {/; 4412 4376 } 4413 4377 } 4414 4378 ··· 4940 4904 foreach my $arg (@def_args) { 4941 4905 next if ($arg =~ /\.\.\./); 4942 4906 next if ($arg =~ /^type$/i); 4943 - my $tmp = $define_stmt; 4944 - $tmp =~ s/\b(typeof|__typeof__|__builtin\w+|typecheck\s*$\s*$Type\s*,|\#+)\s*\(*\s*$arg\s*$*\b//g; 4945 - $tmp =~ s/\#+\s*$arg\b//g; 4946 - $tmp =~ s/\b$arg\s*\#\#//g; 4947 - my $use_cnt = $tmp =~ s/\b$arg\b//g; 4907 + my $tmp_stmt = $define_stmt; 4908 + $tmp_stmt =~ s/\b(typeof|__typeof__|__builtin\w+|typecheck\s*$\s*$Type\s*,|\#+)\s*\(*\s*$arg\s*$*\b//g; 4909 + $tmp_stmt =~ s/\#+\s*$arg\b//g; 4910 + $tmp_stmt =~ s/\b$arg\s*\#\#//g; 4911 + my $use_cnt = $tmp_stmt =~ s/\b$arg\b//g; 4948 4912 if ($use_cnt > 1) { 4949 4913 CHK("MACRO_ARG_REUSE", 4950 4914 "Macro argument reuse '$arg' - possible side-effects?\n" . "$herectx"); 4951 4915 } 4952 4916 # check if any macro arguments may have other precedence issues 4953 - if ($define_stmt =~ m/($Operators)?\s*\b$arg\b\s*($Operators)?/m && 4917 + if ($tmp_stmt =~ m/($Operators)?\s*\b$arg\b\s*($Operators)?/m && 4954 4918 ((defined($1) && $1 ne ',') || 4955 4919 (defined($2) && $2 ne ','))) { 4956 4920 CHK("MACRO_ARG_PRECEDENCE", ··· 5347 5311 my ($s, $c) = ctx_statement_block($linenr - 3, $realcnt, 0); 5348 5312 # print("line: <$line>\nprevline: <$prevline>\ns: <$s>\nc: <$c>\n\n\n"); 5349 5313 5350 - if ($c =~ /(?:^|\n)[ \+]\s*(?:$Type\s*)?\Q$testval\E\s*=\s*(?:$[^$]*\)\s*)?\s*(?:devm_)?(?:[kv][czm]alloc(?:_node|_array)?\b|kstrdup|(?:dev_)?alloc_skb)/) { 5314 + if ($s =~ /(?:^|\n)[ \+]\s*(?:$Type\s*)?\Q$testval\E\s*=\s*(?:$[^$]*\)\s*)?\s*(?:devm_)?(?:[kv][czm]alloc(?:_node|_array)?\b|kstrdup|kmemdup|(?:dev_)?alloc_skb)/) { 5351 5315 WARN("OOM_MESSAGE", 5352 5316 "Possible unnecessary 'out of memory' message\n" . $hereprev); 5353 5317 } ··· 5922 5886 "externs should be avoided in .c files\n" . 
$herecurr); 5923 5887 } 5924 5888 5925 - if ($realfile =~ /\.[ch]$/ && defined $stat && 5889 + # check for function declarations that have arguments without identifier names 5890 + if (defined $stat && 5926 5891 $stat =~ /^.\s*(?:extern\s+)?$Type\s*$Ident\s*$\s*([^{]+)\s*$\s*;/s && 5927 5892 $1 ne "void") { 5928 5893 my $args = trim($1); ··· 5933 5896 WARN("FUNCTION_ARGUMENTS", 5934 5897 "function definition argument '$arg' should also have an identifier name\n" . $herecurr); 5935 5898 } 5899 + } 5900 + } 5901 + 5902 + # check for function definitions 5903 + if ($^V && $^V ge 5.10.0 && 5904 + defined $stat && 5905 + $stat =~ /^.\s*(?:$Storage\s+)?$Type\s*($Ident)\s*$balanced_parens\s*{/s) { 5906 + $context_function = $1; 5907 + 5908 + # check for multiline function definition with misplaced open brace 5909 + my $ok = 0; 5910 + my $cnt = statement_rawlines($stat); 5911 + my $herectx = $here . "\n"; 5912 + for (my $n = 0; $n < $cnt; $n++) { 5913 + my $rl = raw_line($linenr, $n); 5914 + $herectx .= $rl . "\n"; 5915 + $ok = 1 if ($rl =~ /^[ \+]\{/); 5916 + $ok = 1 if ($rl =~ /\{/ && $n == 0); 5917 + last if $rl =~ /^[ \+].*\{/; 5918 + } 5919 + if (!$ok) { 5920 + ERROR("OPEN_BRACE", 5921 + "open brace '{' following function definitions go on the next line\n" . $herectx); 5936 5922 } 5937 5923 } 5938 5924
