Merge tag 'mm-nonmm-stable-2025-08-03-12-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

+1

.mailmap

···
 Ross Zwisler <zwisler@kernel.org> <ross.zwisler@linux.intel.com>
 Rudolf Marek <R.Marek@sh.cvut.cz>
 Rui Saraiva <rmps@joel.ist.utl.pt>
+Sachin Mokashi <sachin.mokashi@intel.com> <sachinx.mokashi@intel.com>
 Sachin P Sant <ssant@in.ibm.com>
 Sai Prakash Ranjan <quic_saipraka@quicinc.com> <saiprakash.ranjan@codeaurora.org>
 Sakari Ailus <sakari.ailus@linux.intel.com> <sakari.ailus@iki.fi>

+56

Documentation/accounting/delay-accounting.rst

···
 	linuxrc: read=65536, write=0, cancelled_write=0
 
 The above command can be used with -v to get more debug information.
+
+After the system starts, use `delaytop` to get the system-wide delay information,
+which includes system-wide PSI information and Top-N high-latency tasks.
+
+`delaytop` supports sorting by CPU latency in descending order by default,
+displays the top 20 high-latency tasks by default, and refreshes the latency
+data every 2 seconds by default.
+
+Get PSI information and Top-N tasks delay, since system boot::
+
+	bash# ./delaytop
+	System Pressure Information: (avg10/avg60/avg300/total)
+	CPU some:    0.0%/ 0.0%/ 0.0%/ 345(ms)
+	CPU full:    0.0%/ 0.0%/ 0.0%/ 0(ms)
+	Memory full: 0.0%/ 0.0%/ 0.0%/ 0(ms)
+	Memory some: 0.0%/ 0.0%/ 0.0%/ 0(ms)
+	IO full:     0.0%/ 0.0%/ 0.0%/ 65(ms)
+	IO some:     0.0%/ 0.0%/ 0.0%/ 79(ms)
+	IRQ full:    0.0%/ 0.0%/ 0.0%/ 0(ms)
+	Top 20 processes (sorted by CPU delay):
+	   PID   TGID COMMAND           CPU(ms)   IO(ms) SWAP(ms)  RCL(ms)  THR(ms)  CMP(ms)   WP(ms)  IRQ(ms)
+	----------------------------------------------------------------------------------------------
+	   161    161 zombie_memcg_re      1.40     0.00     0.00     0.00     0.00     0.00     0.00     0.00
+	   130    130 blkcg_punt_bio       1.37     0.00     0.00     0.00     0.00     0.00     0.00     0.00
+	   444    444 scsi_tmf_0           0.73     0.00     0.00     0.00     0.00     0.00     0.00     0.00
+	  1280   1280 rsyslogd             0.53     0.04     0.00     0.00     0.00     0.00     0.00     0.00
+	    12     12 ksoftirqd/0          0.47     0.00     0.00     0.00     0.00     0.00     0.00     0.00
+	  1277   1277 nbd-server           0.44     0.00     0.00     0.00     0.00     0.00     0.00     0.00
+	   308    308 kworker/2:2-sys      0.41     0.00     0.00     0.00     0.00     0.00     0.00     0.00
+	    55     55 netns                0.36     0.00     0.00     0.00     0.00     0.00     0.00     0.00
+	  1187   1187 acpid                0.31     0.03     0.00     0.00     0.00     0.00     0.00     0.00
+	  6184   6184 kworker/1:2-sys      0.24     0.00     0.00     0.00     0.00     0.00     0.00     0.00
+	   186    186 kaluad               0.24     0.00     0.00     0.00     0.00     0.00     0.00     0.00
+	    18     18 ksoftirqd/1          0.24     0.00     0.00     0.00     0.00     0.00     0.00     0.00
+	   185    185 kmpath_rdacd         0.23     0.00     0.00     0.00     0.00     0.00     0.00     0.00
+	   190    190 kstrp                0.23     0.00     0.00     0.00     0.00     0.00     0.00     0.00
+	  2759   2759 agetty               0.20     0.03     0.00     0.00     0.00     0.00     0.00     0.00
+	  1190   1190 kworker/0:3-sys      0.19     0.00     0.00     0.00     0.00     0.00     0.00     0.00
+	  1272   1272 sshd                 0.15     0.04     0.00     0.00     0.00     0.00     0.00     0.00
+	  1156   1156 license              0.15     0.11     0.00     0.00     0.00     0.00     0.00     0.00
+	   134    134 md                   0.13     0.00     0.00     0.00     0.00     0.00     0.00     0.00
+	  6142   6142 kworker/3:2-xfs      0.13     0.00     0.00     0.00     0.00     0.00     0.00     0.00
+
+Dynamic interactive interface of delaytop::
+
+	# ./delaytop -p pid
+	Print delayacct stats
+
+	# ./delaytop -P num
+	Display the top N tasks
+
+	# ./delaytop -n num
+	Set delaytop refresh frequency (num times)
+
+	# ./delaytop -d secs
+	Specify refresh interval as secs
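The "System Pressure Information" block above reports the kernel's PSI (pressure stall information) figures. As a rough sketch of how a tool could turn one PSI line into the avg10/avg60/avg300/total columns, the code below parses a single line. It assumes the PSI line format `some avg10=... avg60=... avg300=... total=...`, with `total` in microseconds; `struct psi_line` and `parse_psi_line()` are hypothetical names, not delaytop's actual code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One parsed line of a /proc/pressure/<resource> file (hypothetical type). */
struct psi_line {
	char kind[8];                  /* "some" or "full" */
	double avg10, avg60, avg300;   /* stall percentages over 10s/60s/300s */
	unsigned long long total_us;   /* cumulative stall time in microseconds */
};

/*
 * Parse e.g. "some avg10=0.00 avg60=0.00 avg300=0.00 total=345123".
 * Returns 0 on success, -1 if the line does not contain all five fields.
 */
static int parse_psi_line(const char *line, struct psi_line *out)
{
	int n = sscanf(line, "%7s avg10=%lf avg60=%lf avg300=%lf total=%llu",
		       out->kind, &out->avg10, &out->avg60, &out->avg300,
		       &out->total_us);

	return n == 5 ? 0 : -1;
}
```

A caller would read each line of, for example, `/proc/pressure/cpu` with `fgets()` and divide `total_us` by 1000 to obtain a value in the "(ms)" unit shown in the sample output.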

+21

Documentation/admin-guide/kdump/kdump.rst

···
 
        crashkernel=0,low
 
+4) crashkernel=size,cma
+
+   Reserve additional crash kernel memory from CMA. This reservation is
+   usable by the first system's userspace memory and kernel movable
+   allocations (memory balloon, zswap). Pages allocated from this memory
+   range will not be included in the vmcore so this should not be used if
+   dumping of userspace memory is intended and it has to be expected that
+   some movable kernel pages may be missing from the dump.
+
+   A standard crashkernel reservation, as described above, is still needed
+   to hold the crash kernel and initrd.
+
+   This option increases the risk of a kdump failure: DMA transfers
+   configured by the first kernel may end up corrupting the second
+   kernel's memory.
+
+   This reservation method is intended for systems that can't afford to
+   sacrifice enough memory for standard crashkernel reservation and where
+   less reliable and possibly incomplete kdump is preferable to no kdump at
+   all.
+
 Boot into System Kernel
 -----------------------
 1) Update the boot loader (such as grub, yaboot, or lilo) configuration
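The `size,cma` form is an ordinary crashkernel size followed by a `,cma` suffix. The sketch below shows one way to recognise that form in plain C. `parse_cma_size()` is a hypothetical helper written for illustration, not the kernel's `parse_crashkernel()`, and it only accepts the K, M and G unit suffixes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical parser for the value of "crashkernel=<size>[KMG],cma".
 * Stores the size in bytes and returns 0, or returns -1 if the string is
 * not a plain size followed by ",cma".
 */
static int parse_cma_size(const char *arg, unsigned long long *bytes)
{
	char *end;
	unsigned long long v = strtoull(arg, &end, 0);

	if (end == arg)
		return -1;              /* no digits at all */

	switch (*end) {                 /* optional binary-unit suffix */
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		end++;
		break;
	default:
		break;
	}

	if (strcmp(end, ",cma") != 0)
		return -1;              /* e.g. plain "256M" or "256M,high" */

	*bytes = v;
	return 0;
}
```

Per the text above, a value such as `crashkernel=256M,cma` supplements a standard crashkernel reservation rather than replacing it.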

+47 -1

Documentation/admin-guide/kernel-parameters.txt

···
 			0: to disable low allocation.
 			It will be ignored when crashkernel=X,high is not used
 			or memory reserved is below 4G.
+	crashkernel=size[KMG],cma
+			[KNL, X86] Reserve additional crash kernel memory from
+			CMA. This reservation is usable by the first system's
+			userspace memory and kernel movable allocations (memory
+			balloon, zswap). Pages allocated from this memory range
+			will not be included in the vmcore so this should not
+			be used if dumping of userspace memory is intended and
+			it has to be expected that some movable kernel pages
+			may be missing from the dump.
+
+			A standard crashkernel reservation, as described above,
+			is still needed to hold the crash kernel and initrd.
+
+			This option increases the risk of a kdump failure: DMA
+			transfers configured by the first kernel may end up
+			corrupting the second kernel's memory.
+
+			This reservation method is intended for systems that
+			can't afford to sacrifice enough memory for standard
+			crashkernel reservation and where less reliable and
+			possibly incomplete kdump is preferable to no kdump at
+			all.
 
 	cryptomgr.notests
 			[KNL] Disable crypto self-tests
···
 			bit 2: print timer info
 			bit 3: print locks info if CONFIG_LOCKDEP is on
 			bit 4: print ftrace buffer
-			bit 5: print all printk messages in buffer
+			bit 5: replay all messages on consoles at the end of panic
 			bit 6: print all CPUs backtrace (if available in the arch)
 			bit 7: print only tasks in uninterruptible (blocked) state
 			*Be aware* that this option may print a _lot_ of lines,
 			so there are risks of losing older messages in the log.
 			Use this option carefully, maybe worth to setup a
 			bigger log buffer with "log_buf_len" along with this.
+
+	panic_sys_info= A comma separated list of extra information to be dumped
+			on panic.
+			Format: val[,val...]
+			Where @val can be any of the following:
+
+			tasks:		print all tasks info
+			mem:		print system memory info
+			timers:		print timers info
+			locks:		print locks info if CONFIG_LOCKDEP is on
+			ftrace:		print ftrace buffer
+			all_bt:		print all CPUs backtrace (if available in the arch)
+			blocked_tasks:	print only tasks in uninterruptible (blocked) state
+
+			This is a human readable alternative to the 'panic_print' option.
+
+	panic_console_replay
+			When panic happens, replay all kernel messages on
+			consoles at the end of panic.
 
 	parkbd.port=	[HW] Parallel port number the keyboard adapter is
 			connected to, default is 0.
···
 			disable the stack depot thereby saving the static memory
 			consumed by the stack hash table. By default this is set
 			to false.
+
+	stack_depot_max_pools= [KNL,EARLY]
+			Specify the maximum number of pools to use for storing
+			stack traces. Pools are allocated on-demand up to this
+			limit. Default value is 8191 pools.
 
 	stacktrace	[FTRACE]
 			Enabled the stack tracer on boot up.

+19 -1

Documentation/admin-guide/sysctl/kernel.rst

···
 bit 2 print timer info
 bit 3 print locks info if ``CONFIG_LOCKDEP`` is on
 bit 4 print ftrace buffer
-bit 5 print all printk messages in buffer
+bit 5 replay all messages on consoles at the end of panic
 bit 6 print all CPUs backtrace (if available in the arch)
 bit 7 print only tasks in uninterruptible (blocked) state
 ===== ============================================
···
 So for example to print tasks and memory info on panic, user can::
 
   echo 3 > /proc/sys/kernel/panic_print
+
+
+panic_sys_info
+==============
+
+A comma separated list of extra information to be dumped on panic,
+for example, "tasks,mem,timers,...". It is a human readable alternative
+to 'panic_print'. Possible values are:
+
+============= ===================================================
+tasks         print all tasks info
+mem           print system memory info
+timer         print timers info
+lock          print locks info if CONFIG_LOCKDEP is on
+ftrace        print ftrace buffer
+all_bt        print all CPUs backtrace (if available in the arch)
+blocked_tasks print only tasks in uninterruptible (blocked) state
+============= ===================================================
 
 
 panic_on_rcu_stall
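Since `panic_sys_info` is described as a human-readable alternative to `panic_print`, each name can be read as one of the `panic_print` bits; the `echo 3` example implies bit 0 is tasks and bit 1 is memory. The sketch below converts a name list into such a mask under that assumption. The table and `sys_info_to_mask()` are hypothetical; unknown names are silently skipped here (the kernel may handle them differently); and it uses the `timers`/`locks` spellings from the kernel-parameters.txt hunk, whereas the sysctl table in this diff spells them `timer`/`lock`.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical name table; each entry maps to the panic_print bit with
 * the same meaning (bit 5, console replay, has no name in this list).
 */
static const struct {
	const char *name;
	unsigned long bit;
} sys_info_names[] = {
	{ "tasks",         1UL << 0 },
	{ "mem",           1UL << 1 },
	{ "timers",        1UL << 2 },
	{ "locks",         1UL << 3 },
	{ "ftrace",        1UL << 4 },
	{ "all_bt",        1UL << 6 },
	{ "blocked_tasks", 1UL << 7 },
};

/* Convert "tasks,mem,..." into a panic_print-style mask; unknown names are skipped. */
static unsigned long sys_info_to_mask(const char *list)
{
	unsigned long mask = 0;
	const char *p = list;

	while (*p) {
		size_t len = strcspn(p, ",");   /* length of the current name */

		for (size_t i = 0; i < sizeof(sys_info_names) / sizeof(sys_info_names[0]); i++) {
			if (strlen(sys_info_names[i].name) == len &&
			    strncmp(p, sys_info_names[i].name, len) == 0)
				mask |= sys_info_names[i].bit;
		}
		p += len;
		if (*p == ',')
			p++;                    /* skip the separator */
	}
	return mask;
}
```

Under this mapping, `sys_info_to_mask("tasks,mem")` yields 3, the same value the `echo 3 > /proc/sys/kernel/panic_print` example writes.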

+12

MAINTAINERS

···
 F:	Documentation/core-api/kho/*
 F:	include/linux/kexec_handover.h
 F:	kernel/kexec_handover.c
+F:	tools/testing/selftests/kho/
 
 KEYS-ENCRYPTED
 M:	Mimi Zohar <zohar@linux.ibm.com>
···
 F:	include/linux/delayacct.h
 F:	kernel/delayacct.c
 
+TASK DELAY MONITORING TOOLS
+M:	Andrew Morton <akpm@linux-foundation.org>
+M:	Wang Yaxin <wang.yaxin@zte.com.cn>
+M:	Fan Yu <fan.yu9@zte.com.cn>
+L:	linux-kernel@vger.kernel.org
+S:	Maintained
+F:	Documentation/accounting/delay-accounting.rst
+F:	tools/accounting/delaytop.c
+F:	tools/accounting/getdelays.c
+
 PERFORMANCE EVENTS SUBSYSTEM
 M:	Peter Zijlstra <peterz@infradead.org>
 M:	Ingo Molnar <mingo@redhat.com>
···
 F:	drivers/md/raid*
 F:	include/linux/raid/
 F:	include/uapi/linux/raid/
+F:	lib/raid6/
 
 SOLIDRUN CLEARFOG SUPPORT
 M:	Russell King <linux@armlinux.org.uk>

+7 -4

arch/alpha/kernel/core_marvel.c

···
 #include <linux/vmalloc.h>
 #include <linux/mc146818rtc.h>
 #include <linux/rtc.h>
+#include <linux/string.h>
 #include <linux/module.h>
 #include <linux/memblock.h>
 
···
 {
 	char tmp[80];
 	char *name;
-
-	sprintf(tmp, "PCI %s PE %d PORT %d", str, pe, port);
-	name = memblock_alloc_or_panic(strlen(tmp) + 1, SMP_CACHE_BYTES);
-	strcpy(name, tmp);
+	size_t sz;
+
+	sz = scnprintf(tmp, sizeof(tmp), "PCI %s PE %d PORT %d", str, pe, port);
+	sz += 1; /* NUL terminator */
+	name = memblock_alloc_or_panic(sz, SMP_CACHE_BYTES);
+	strscpy(name, tmp, sz);
 
 	return name;
 }
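The old `sprintf()` call could overrun the 80-byte `tmp` if the formatted name grew too long. The new code formats with `scnprintf()`, which returns the number of characters actually stored rather than the length the output would have needed, so `sz` never exceeds the buffer and the allocation matches what was written. A userspace approximation of the same pattern follows; `my_scnprintf()` and `make_resource_name()` are hypothetical stand-ins, built on `vsnprintf()` and `malloc()`, for the kernel's `scnprintf()` and the function in the hunk.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Userspace stand-in for the kernel's scnprintf(): returns the number of
 * characters actually written to buf (excluding the NUL), never the
 * untruncated length that vsnprintf() reports.
 */
static size_t my_scnprintf(char *buf, size_t size, const char *fmt, ...)
{
	va_list ap;
	int n;

	if (size == 0)
		return 0;
	va_start(ap, fmt);
	n = vsnprintf(buf, size, fmt, ap);
	va_end(ap);
	if (n < 0) {
		buf[0] = '\0';
		return 0;
	}
	return (size_t)n < size ? (size_t)n : size - 1;
}

/* Hypothetical equivalent of the hunk: format into tmp, then allocate exactly. */
static char *make_resource_name(const char *str, int pe, int port)
{
	char tmp[80];
	size_t sz;
	char *name;

	sz = my_scnprintf(tmp, sizeof(tmp), "PCI %s PE %d PORT %d", str, pe, port);
	sz += 1;                        /* NUL terminator */
	name = malloc(sz);
	if (name)
		memcpy(name, tmp, sz);  /* tmp[sz - 1] is the terminating NUL */
	return name;
}
```

With a very long `str`, the result is simply truncated to 79 characters instead of overflowing the stack buffer.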

+1 -1

arch/arm/kernel/setup.c

···
 	total_mem = get_total_mem();
 	ret = parse_crashkernel(boot_command_line, total_mem,
 				&crash_size, &crash_base,
-				NULL, NULL);
+				NULL, NULL, NULL);
 	/* invalid value specified or crashkernel=0 */
 	if (ret || !crash_size)
 		return;

+1 -1

arch/arm64/mm/init.c

···
 
 	ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
 				&crash_size, &crash_base,
-				&low_size, &high);
+				&low_size, NULL, &high);
 	if (ret)
 		return;
 

+1 -1

arch/loongarch/kernel/setup.c

···
 		return;
 
 	ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
-			&crash_size, &crash_base, &low_size, &high);
+			&crash_size, &crash_base, &low_size, NULL, &high);
 	if (ret)
 		return;
 

+1 -1

arch/mips/kernel/setup.c

···
 	total_mem = memblock_phys_mem_size();
 	ret = parse_crashkernel(boot_command_line, total_mem,
 				&crash_size, &crash_base,
-				NULL, NULL);
+				NULL, NULL, NULL);
 	if (ret != 0 || crash_size <= 0)
 		return;
 

+1 -1

arch/powerpc/kernel/fadump.c

···
 	 * memory at a predefined offset.
 	 */
 	ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
-				&size, &base, NULL, NULL);
+				&size, &base, NULL, NULL, NULL);
 	if (ret == 0 && size > 0) {
 		unsigned long max_size;
 

+1 -1

arch/powerpc/kexec/core.c

···
 
 	/* use common parsing */
 	ret = parse_crashkernel(boot_command_line, total_mem_sz, &crash_size,
-			&crash_base, NULL, NULL);
+			&crash_base, NULL, NULL, NULL);
 
 	if (ret)
 		return;

+1 -1

arch/powerpc/mm/nohash/kaslr_booke.c

···
 	int ret;
 
 	ret = parse_crashkernel(boot_command_line, size, &crash_size,
-				&crash_base, NULL, NULL);
+				&crash_base, NULL, NULL, NULL);
 	if (ret != 0 || crash_size == 0)
 		return;
 	if (crash_base == 0)

+1

arch/riscv/Kconfig

···
 	select CLINT_TIMER if RISCV_M_MODE
 	select CLONE_BACKWARDS
 	select COMMON_CLK
+	select CPU_NO_EFFICIENT_FFS if !RISCV_ISA_ZBB
 	select CPU_PM if CPU_IDLE || HIBERNATION || SUSPEND
 	select DYNAMIC_FTRACE if FUNCTION_TRACER
 	select EDAC_SUPPORT

+1

arch/riscv/kernel/kexec_elf.c

···
 	kbuf.buf_align = PMD_SIZE;
 	kbuf.mem = KEXEC_BUF_MEM_UNKNOWN;
 	kbuf.memsz = ALIGN(kernel_len, PAGE_SIZE);
+	kbuf.cma = NULL;
 	kbuf.top_down = false;
 	ret = arch_kexec_locate_mem_hole(&kbuf);
 	if (!ret) {

+5

arch/riscv/kernel/setup.c

···
 #include <linux/efi.h>
 #include <linux/crash_dump.h>
 #include <linux/panic_notifier.h>
+#include <linux/jump_label.h>
+#include <linux/gcd.h>
 
 #include <asm/acpi.h>
 #include <asm/alternative.h>
···
 
 	riscv_user_isa_enable();
 	riscv_spinlock_init();
+
+	if (!IS_ENABLED(CONFIG_RISCV_ISA_ZBB) || !riscv_isa_extension_available(NULL, ZBB))
+		static_branch_disable(&efficient_ffs_key);
 }
 
 bool arch_cpu_is_hotpluggable(int cpu)
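The new Kconfig select and the `static_branch_disable(&efficient_ffs_key)` call indicate that, on RISC-V cores without Zbb, code relying on a fast find-first-set instruction should be avoided; the `<linux/gcd.h>` include hints that `gcd()` is one such user, though that link is an inference from the hunks. To illustrate why the choice matters, here are two binary-GCD variants: one strips all trailing zero bits at once with the GCC/Clang builtin `__builtin_ctzl()`, the other shifts one bit at a time. `fast_ctz_available` is a hypothetical stand-in for the static key, and neither function is the kernel's implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for efficient_ffs_key (true = fast ctz/ffs). */
static bool fast_ctz_available = true;

/* Binary GCD that removes all trailing zeros in one step per iteration. */
static unsigned long gcd_ctz(unsigned long a, unsigned long b)
{
	int shift;

	if (a == 0)
		return b;
	if (b == 0)
		return a;
	shift = __builtin_ctzl(a | b);  /* common power of two */
	a >>= __builtin_ctzl(a);        /* a is now odd */
	do {
		b >>= __builtin_ctzl(b);    /* b is nonzero here; make it odd */
		if (a > b) {
			unsigned long t = a;
			a = b;
			b = t;
		}
		b -= a;                     /* odd - odd: even or zero */
	} while (b != 0);
	return a << shift;
}

/* Same algorithm, but strips trailing zeros one bit at a time. */
static unsigned long gcd_shift(unsigned long a, unsigned long b)
{
	int shift = 0;

	if (a == 0)
		return b;
	if (b == 0)
		return a;
	while (((a | b) & 1) == 0) {    /* both even */
		a >>= 1;
		b >>= 1;
		shift++;
	}
	while ((a & 1) == 0)
		a >>= 1;
	do {
		while ((b & 1) == 0)
			b >>= 1;
		if (a > b) {
			unsigned long t = a;
			a = b;
			b = t;
		}
		b -= a;
	} while (b != 0);
	return a << shift;
}

/* Pick a variant at run time, as a static key would. */
static unsigned long gcd_select(unsigned long a, unsigned long b)
{
	return fast_ctz_available ? gcd_ctz(a, b) : gcd_shift(a, b);
}
```

Without a single-instruction ctz, `__builtin_ctzl()` may compile to a loop or library call, which is the kind of cost the static key lets the kernel avoid at run time.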

+1 -1

arch/riscv/mm/init.c

···
 
 	ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
 				&crash_size, &crash_base,
-				&low_size, &high);
+				&low_size, NULL, &high);
 	if (ret)
 		return;
 

+1 -1

arch/s390/kernel/setup.c

···
 	int rc;
 
 	rc = parse_crashkernel(boot_command_line, ident_map_size,
-			       &crash_size, &crash_base, NULL, NULL);
+			       &crash_size, &crash_base, NULL, NULL, NULL);
 
 	crash_base = ALIGN(crash_base, KEXEC_CRASH_MEM_ALIGN);
 	crash_size = ALIGN(crash_size, KEXEC_CRASH_MEM_ALIGN);

+1 -1

arch/sh/kernel/machine_kexec.c

···
 		return;
 
 	ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
-			&crash_size, &crash_base, NULL, NULL);
+			&crash_size, &crash_base, NULL, NULL, NULL);
 	if (ret == 0 && crash_size > 0) {
 		crashk_res.start = crash_base;
 		crashk_res.end = crash_base + crash_size - 1;

+22 -4

arch/x86/kernel/crash.c

···
 		return NULL;
 
 	/*
-	 * Exclusion of crash region and/or crashk_low_res may cause
-	 * another range split. So add extra two slots here.
+	 * Exclusion of crash region, crashk_low_res and/or crashk_cma_ranges
+	 * may cause range splits. So add extra slots here.
 	 */
-	nr_ranges += 2;
+	nr_ranges += 2 + crashk_cma_cnt;
 	cmem = vzalloc(struct_size(cmem, ranges, nr_ranges));
 	if (!cmem)
 		return NULL;
···
 static int elf_header_exclude_ranges(struct crash_mem *cmem)
 {
 	int ret = 0;
+	int i;
 
 	/* Exclude the low 1M because it is always reserved */
 	ret = crash_exclude_mem_range(cmem, 0, SZ_1M - 1);
···
 	if (crashk_low_res.end)
 		ret = crash_exclude_mem_range(cmem, crashk_low_res.start,
 					      crashk_low_res.end);
+	if (ret)
+		return ret;
 
-	return ret;
+	for (i = 0; i < crashk_cma_cnt; ++i) {
+		ret = crash_exclude_mem_range(cmem, crashk_cma_ranges[i].start,
+					      crashk_cma_ranges[i].end);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
 }
 
 static int prepare_elf64_ram_headers_callback(struct resource *res, void *arg)
···
 		if (ei.size < PAGE_SIZE)
 			continue;
 		ei.addr = cmem->ranges[i].start;
+		ei.type = E820_TYPE_RAM;
+		add_e820_entry(params, &ei);
+	}
+
+	for (i = 0; i < crashk_cma_cnt; ++i) {
+		ei.addr = crashk_cma_ranges[i].start;
+		ei.size = crashk_cma_ranges[i].end -
+			  crashk_cma_ranges[i].start + 1;
 		ei.type = E820_TYPE_RAM;
 		add_e820_entry(params, &ei);
 	}
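The first hunk reserves one extra `cmem` slot per CMA range because cutting a hole out of the middle of a memory range leaves two pieces where there was one. The simplified model below shows that effect. `struct mem`, `struct range` and `exclude_range()` are hypothetical and only approximate what `crash_exclude_mem_range()` does: the model uses inclusive bounds and a fixed capacity, and returns -1 when a split would not fit.

```c
#include <assert.h>
#include <stddef.h>

struct range {
	unsigned long long start, end;  /* inclusive bounds */
};

struct mem {
	size_t nr, max;                 /* used slots, usable slots (max <= 16) */
	struct range r[16];
};

/*
 * Remove [s, e] from every range in m. A hole strictly inside a range
 * splits it in two and consumes one extra slot; returns -1 if none is left.
 */
static int exclude_range(struct mem *m, unsigned long long s, unsigned long long e)
{
	size_t i = 0;

	while (i < m->nr) {
		struct range *r = &m->r[i];

		if (e < r->start || s > r->end) {          /* no overlap */
			i++;
		} else if (s <= r->start && e >= r->end) { /* fully covered: drop it */
			m->r[i] = m->r[--m->nr];           /* re-check the moved entry */
		} else if (s > r->start && e < r->end) {   /* hole in the middle: split */
			if (m->nr == m->max)
				return -1;
			m->r[m->nr].start = e + 1;
			m->r[m->nr].end = r->end;
			m->nr++;
			r->end = s - 1;
			i++;
		} else if (s <= r->start) {                /* trims the head */
			r->start = e + 1;
			i++;
		} else {                                   /* trims the tail */
			r->end = s - 1;
			i++;
		}
	}
	return 0;
}
```

Each exclusion adds at most one range, so sizing the array as "original ranges + number of exclusions" is always enough, which mirrors the `2 + crashk_cma_cnt` headroom in the hunk.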

+3 -2

arch/x86/kernel/setup.c

···
 
 static void __init arch_reserve_crashkernel(void)
 {
-	unsigned long long crash_base, crash_size, low_size = 0;
+	unsigned long long crash_base, crash_size, low_size = 0, cma_size = 0;
 	bool high = false;
 	int ret;
 
···
 	ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
 				&crash_size, &crash_base,
-				&low_size, &high);
+				&low_size, &cma_size, &high);
 	if (ret)
 		return;
 
···
 	}
 
 	reserve_crashkernel_generic(crash_size, crash_base, low_size, high);
+	reserve_crashkernel_cma(cma_size);
 }
 
 static struct resource standard_io_resources[] = {

+2 -2

arch/x86/kvm/i8254.c

···
 	kvm_pit_reset_reinject(pit);
 }
 
-static void pit_mask_notifer(struct kvm_irq_mask_notifier *kimn, bool mask)
+static void pit_mask_notifier(struct kvm_irq_mask_notifier *kimn, bool mask)
 {
 	struct kvm_pit *pit = container_of(kimn, struct kvm_pit, mask_notifier);
 
···
 
 	pit_state->irq_ack_notifier.gsi = 0;
 	pit_state->irq_ack_notifier.irq_acked = kvm_pit_ack_irq;
-	pit->mask_notifier.func = pit_mask_notifer;
+	pit->mask_notifier.func = pit_mask_notifier;
 
 	kvm_pit_reset(pit);
 

+1 -1

crypto/async_tx/async_pq.c

···
 	for (i = 0; i < disks; i++) {
 		if (blocks[i] == NULL) {
 			BUG_ON(i > disks - 3); /* P or Q can't be zero */
-			srcs[i] = (void*)raid6_empty_zero_page;
+			srcs[i] = raid6_get_zero_page();
 		} else {
 			srcs[i] = page_address(blocks[i]) + offsets[i];
 

+2 -2

crypto/async_tx/async_raid6_recov.c

···
 		async_tx_quiesce(&submit->depend_tx);
 		for (i = 0; i < disks; i++)
 			if (blocks[i] == NULL)
-				ptrs[i] = (void *) raid6_empty_zero_page;
+				ptrs[i] = raid6_get_zero_page();
 			else
 				ptrs[i] = page_address(blocks[i]) + offs[i];
 
···
 		async_tx_quiesce(&submit->depend_tx);
 		for (i = 0; i < disks; i++)
 			if (blocks[i] == NULL)
-				ptrs[i] = (void*)raid6_empty_zero_page;
+				ptrs[i] = raid6_get_zero_page();
 			else
 				ptrs[i] = page_address(blocks[i]) + offs[i];
 

+1 -1

drivers/cxl/core/mce.h

···
 
 #ifdef CONFIG_CXL_MCE
 int devm_cxl_register_mce_notifier(struct device *dev,
-				   struct notifier_block *mce_notifer);
+				   struct notifier_block *mce_notifier);
 #else
 static inline int
 devm_cxl_register_mce_notifier(struct device *dev,

+1 -2

drivers/gpu/drm/i915/gt/uc/intel_guc_log.c

···
  */
 static int subbuf_start_callback(struct rchan_buf *buf,
 				 void *subbuf,
-				 void *prev_subbuf,
-				 size_t prev_padding)
+				 void *prev_subbuf)
 {
 	/*
 	 * Use no-overwrite mode by default, where relay will stop accepting

+1 -1

drivers/gpu/drm/xe/xe_vm_types.h

···
 		 * up for revalidation. Protected from access with the
 		 * @invalidated_lock. Removing items from the list
 		 * additionally requires @lock in write mode, and adding
-		 * items to the list requires either the @userptr.notifer_lock in
+		 * items to the list requires either the @userptr.notifier_lock in
 		 * write mode, OR @lock in write mode.
 		 */
 		struct list_head invalidated;

+1 -1

drivers/net/ethernet/marvell/mvneta.c

···
 	/* Inform that we are stopping so we don't want to setup the
 	 * driver for new CPUs in the notifiers. The code of the
 	 * notifier for CPU online is protected by the same spinlock,
-	 * so when we get the lock, the notifer work is done.
+	 * so when we get the lock, the notifier work is done.
 	 */
 	spin_lock(&pp->lock);
 	pp->is_stopped = true;

+1 -2

drivers/net/wwan/iosm/iosm_ipc_trace.c

···
 }
 
 static int ipc_trace_subbuf_start_handler(struct rchan_buf *buf, void *subbuf,
-					  void *prev_subbuf,
-					  size_t prev_padding)
+					  void *prev_subbuf)
 {
 	if (relay_buf_full(buf)) {
 		pr_err_ratelimited("Relay_buf full dropping traces");

+1 -1

drivers/net/wwan/t7xx/t7xx_port_trace.c

···
 }
 
 static int t7xx_trace_subbuf_start_handler(struct rchan_buf *buf, void *subbuf,
-					   void *prev_subbuf, size_t prev_padding)
+					   void *prev_subbuf)
 {
 	if (relay_buf_full(buf)) {
 		pr_err_ratelimited("Relay_buf full dropping traces");

+1 -1

fs/fat/fatent.c

···
 
 	if (!fat_valid_entry(sbi, entry)) {
 		fatent_brelse(fatent);
-		fat_fs_error(sb, "invalid access to FAT (entry 0x%08x)", entry);
+		fat_fs_error_ratelimit(sb, "invalid access to FAT (entry 0x%08x)", entry);
 		return -EIO;
 	}
 

+3 -3

fs/fat/misc.c

···
 		mark_inode_dirty(inode);
 	}
 	if (new_fclus != (inode->i_blocks >> (sbi->cluster_bits - 9))) {
-		fat_fs_error(sb, "clusters badly computed (%d != %llu)",
-			     new_fclus,
-			     (llu)(inode->i_blocks >> (sbi->cluster_bits - 9)));
+		fat_fs_error_ratelimit(
+			sb, "clusters badly computed (%d != %llu)", new_fclus,
+			(llu)(inode->i_blocks >> (sbi->cluster_bits - 9)));
 		fat_cache_inval_inode(inode);
 	}
 	inode->i_blocks += nr_cluster << (sbi->cluster_bits - 9);
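Both FAT hunks switch to `fat_fs_error_ratelimit()`, which is meant to keep a corrupted image that trips these checks repeatedly from flooding the kernel log. Kernel rate limiting is built around an interval and a burst count; the model below illustrates that idea with an explicit clock so it can be tested. `struct rl_state` and `rl_allow()` are hypothetical names, and the windowing is simplified compared with the kernel's `__ratelimit()`.

```c
#include <assert.h>
#include <stdbool.h>

/* Allow at most `burst` events per `interval` ticks (hypothetical model). */
struct rl_state {
	unsigned long interval;
	unsigned long burst;
	unsigned long begin;    /* start of the current window */
	unsigned long printed;  /* events allowed in the current window */
	unsigned long missed;   /* events suppressed so far */
	bool started;
};

static bool rl_allow(struct rl_state *rs, unsigned long now)
{
	if (!rs->started || now - rs->begin >= rs->interval) {
		rs->started = true;     /* open a new window */
		rs->begin = now;
		rs->printed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	rs->missed++;
	return false;
}
```

With an interval of 5 and a burst of 2, the first two events in a window are reported, later ones in the same window are only counted, and the next window starts fresh.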

+1

fs/ocfs2/aops.c

···
 		if (IS_ERR(wc->w_folios[i])) {
 			ret = PTR_ERR(wc->w_folios[i]);
 			mlog_errno(ret);
+			wc->w_folios[i] = NULL;
 			goto out;
 		}
 	}

+8

fs/ocfs2/dir.c

···
 		}
 	}
 
+	if (le16_to_cpu(el->l_next_free_rec) == 0) {
+		ret = ocfs2_error(inode->i_sb,
+				  "Inode %lu has empty extent list at depth %u\n",
+				  inode->i_ino,
+				  le16_to_cpu(el->l_tree_depth));
+		goto out;
+	}
+
 	found = 0;
 	for (i = le16_to_cpu(el->l_next_free_rec) - 1; i >= 0; i--) {
 		rec = &el->l_recs[i];

+1 -1

fs/ocfs2/dlm/dlmrecovery.c

···
 				dlm_reco_master_ready(dlm),
 				msecs_to_jiffies(1000));
 	if (!dlm_reco_master_ready(dlm)) {
-		mlog(0, "%s: reco master taking awhile\n",
+		mlog(0, "%s: reco master taking a while\n",
 		     dlm->name);
 		goto again;
 	}

+66 -4

fs/ocfs2/inode.c

···
 	unsigned int fi_sysfile_type;
 };
 
-static struct lock_class_key ocfs2_sysfile_lock_key[NUM_SYSTEM_INODES];
-
 static int ocfs2_read_locked_inode(struct inode *inode,
 				   struct ocfs2_find_inode_args *args);
 static int ocfs2_init_locked_inode(struct inode *inode, void *opaque);
···
 static int ocfs2_init_locked_inode(struct inode *inode, void *opaque)
 {
 	struct ocfs2_find_inode_args *args = opaque;
+#ifdef CONFIG_LOCKDEP
+	static struct lock_class_key ocfs2_sysfile_lock_key[NUM_SYSTEM_INODES];
 	static struct lock_class_key ocfs2_quota_ip_alloc_sem_key,
 				     ocfs2_file_ip_alloc_sem_key;
+#endif
 
 	inode->i_ino = args->fi_ino;
 	OCFS2_I(inode)->ip_blkno = args->fi_blkno;
-	if (args->fi_sysfile_type != 0)
+#ifdef CONFIG_LOCKDEP
+	switch (args->fi_sysfile_type) {
+	case BAD_BLOCK_SYSTEM_INODE:
+		break;
+	case GLOBAL_INODE_ALLOC_SYSTEM_INODE:
 		lockdep_set_class(&inode->i_rwsem,
-				  &ocfs2_sysfile_lock_key[args->fi_sysfile_type]);
+			&ocfs2_sysfile_lock_key[GLOBAL_INODE_ALLOC_SYSTEM_INODE]);
+		break;
+	case SLOT_MAP_SYSTEM_INODE:
+		lockdep_set_class(&inode->i_rwsem,
+			&ocfs2_sysfile_lock_key[SLOT_MAP_SYSTEM_INODE]);
+		break;
+	case HEARTBEAT_SYSTEM_INODE:
+		lockdep_set_class(&inode->i_rwsem,
+			&ocfs2_sysfile_lock_key[HEARTBEAT_SYSTEM_INODE]);
+		break;
+	case GLOBAL_BITMAP_SYSTEM_INODE:
+		lockdep_set_class(&inode->i_rwsem,
+			&ocfs2_sysfile_lock_key[GLOBAL_BITMAP_SYSTEM_INODE]);
+		break;
+	case USER_QUOTA_SYSTEM_INODE:
+		lockdep_set_class(&inode->i_rwsem,
+			&ocfs2_sysfile_lock_key[USER_QUOTA_SYSTEM_INODE]);
+		break;
+	case GROUP_QUOTA_SYSTEM_INODE:
+		lockdep_set_class(&inode->i_rwsem,
+			&ocfs2_sysfile_lock_key[GROUP_QUOTA_SYSTEM_INODE]);
+		break;
+	case ORPHAN_DIR_SYSTEM_INODE:
+		lockdep_set_class(&inode->i_rwsem,
+			&ocfs2_sysfile_lock_key[ORPHAN_DIR_SYSTEM_INODE]);
+		break;
+	case EXTENT_ALLOC_SYSTEM_INODE:
+		lockdep_set_class(&inode->i_rwsem,
+			&ocfs2_sysfile_lock_key[EXTENT_ALLOC_SYSTEM_INODE]);
+		break;
+	case INODE_ALLOC_SYSTEM_INODE:
+		lockdep_set_class(&inode->i_rwsem,
+			&ocfs2_sysfile_lock_key[INODE_ALLOC_SYSTEM_INODE]);
+		break;
+	case JOURNAL_SYSTEM_INODE:
+		lockdep_set_class(&inode->i_rwsem,
+			&ocfs2_sysfile_lock_key[JOURNAL_SYSTEM_INODE]);
+		break;
+	case LOCAL_ALLOC_SYSTEM_INODE:
+		lockdep_set_class(&inode->i_rwsem,
+			&ocfs2_sysfile_lock_key[LOCAL_ALLOC_SYSTEM_INODE]);
+		break;
+	case TRUNCATE_LOG_SYSTEM_INODE:
+		lockdep_set_class(&inode->i_rwsem,
+			&ocfs2_sysfile_lock_key[TRUNCATE_LOG_SYSTEM_INODE]);
+		break;
+	case LOCAL_USER_QUOTA_SYSTEM_INODE:
+		lockdep_set_class(&inode->i_rwsem,
+			&ocfs2_sysfile_lock_key[LOCAL_USER_QUOTA_SYSTEM_INODE]);
+		break;
+	case LOCAL_GROUP_QUOTA_SYSTEM_INODE:
+		lockdep_set_class(&inode->i_rwsem,
+			&ocfs2_sysfile_lock_key[LOCAL_GROUP_QUOTA_SYSTEM_INODE]);
+		break;
+	default:
+		WARN_ONCE(1, "Unknown sysfile type %d\n", args->fi_sysfile_type);
+	}
 	if (args->fi_sysfile_type == USER_QUOTA_SYSTEM_INODE ||
 	    args->fi_sysfile_type == GROUP_QUOTA_SYSTEM_INODE ||
 	    args->fi_sysfile_type == LOCAL_USER_QUOTA_SYSTEM_INODE ||
···
 	else
 		lockdep_set_class(&OCFS2_I(inode)->ip_alloc_sem,
 				  &ocfs2_file_ip_alloc_sem_key);
+#endif
 
 	return 0;
 }

+9 -10

fs/ocfs2/move_extents.c

···
 	 */
 	credits += OCFS2_INODE_UPDATE_CREDITS + 1;
 
+	inode_lock(tl_inode);
+
 	/*
 	 * ocfs2_move_extent() didn't reserve any clusters in lock_allocators()
 	 * logic, while we still need to lock the global_bitmap.
···
 	if (!gb_inode) {
 		mlog(ML_ERROR, "unable to get global_bitmap inode\n");
 		ret = -EIO;
-		goto out;
+		goto out_unlock_tl_inode;
 	}
 
 	inode_lock(gb_inode);
···
 	ret = ocfs2_inode_lock(gb_inode, &gb_bh, 1);
 	if (ret) {
 		mlog_errno(ret);
-		goto out_unlock_gb_mutex;
+		goto out_unlock_gb_inode;
 	}
-
-	inode_lock(tl_inode);
 
 	handle = ocfs2_start_trans(osb, credits);
 	if (IS_ERR(handle)) {
 		ret = PTR_ERR(handle);
 		mlog_errno(ret);
-		goto out_unlock_tl_inode;
+		goto out_unlock;
 	}
 
 	new_phys_blkno = ocfs2_clusters_to_blocks(inode->i_sb, *new_phys_cpos);
···
 out_commit:
 	ocfs2_commit_trans(osb, handle);
 	brelse(gd_bh);
-
-out_unlock_tl_inode:
-	inode_unlock(tl_inode);
-
+out_unlock:
 	ocfs2_inode_unlock(gb_inode, 1);
-out_unlock_gb_mutex:
+out_unlock_gb_inode:
 	inode_unlock(gb_inode);
 	brelse(gb_bh);
 	iput(gb_inode);
+out_unlock_tl_inode:
+	inode_unlock(tl_inode);
 
 out:
 	if (context->meta_ac) {
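After this change the function takes `tl_inode`'s lock before the global-bitmap locks and releases everything in reverse order, with each error label undoing only what has been acquired by that point. Keeping one fixed acquisition order is the usual defence against ABBA deadlocks. The sketch models the unwind pattern with two POSIX mutexes; `tl_lock`, `gb_lock` and `move_extent_locked()` are hypothetical, and the `fail_at` parameter exists only so the error paths can be exercised.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-ins for the two inode locks in the hunk. */
static pthread_mutex_t tl_lock = PTHREAD_MUTEX_INITIALIZER;  /* truncate-log inode */
static pthread_mutex_t gb_lock = PTHREAD_MUTEX_INITIALIZER;  /* global-bitmap inode */

/*
 * Take the locks in one fixed order (tl, then gb) and unwind in reverse.
 * fail_at simulates an error after step 1 or 2 (0 = no error).
 */
static int move_extent_locked(int fail_at)
{
	int ret = 0;

	pthread_mutex_lock(&tl_lock);
	if (fail_at == 1) {
		ret = -EIO;
		goto out_unlock_tl;     /* only tl_lock is held here */
	}

	pthread_mutex_lock(&gb_lock);
	if (fail_at == 2) {
		ret = -EIO;
		goto out_unlock_gb;     /* both locks are held here */
	}

	/* ... work done with both locks held ... */

out_unlock_gb:
	pthread_mutex_unlock(&gb_lock);
out_unlock_tl:
	pthread_mutex_unlock(&tl_lock);
	return ret;
}
```

Because the labels are ordered as the mirror image of the acquisitions, adding a new lock only requires one new label at the matching depth.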

+7 -4

fs/ocfs2/namei.c

···
 
 bail_add:
 	ret = d_splice_alias(inode, dentry);
+	if (IS_ERR(ret))
+		goto bail_unlock;
 
 	if (inode) {
 		/*
···
 		 * NOTE: This dentry already has ->d_op set from
 		 * ocfs2_get_parent() and ocfs2_get_dentry()
 		 */
-		if (!IS_ERR_OR_NULL(ret))
+		if (ret)
 			dentry = ret;
 
 		status = ocfs2_dentry_attach_lock(dentry, inode,
 						  OCFS2_I(dir)->ip_blkno);
 		if (status) {
 			mlog_errno(status);
+			if (ret)
+				dput(ret);
 			ret = ERR_PTR(status);
-			goto bail_unlock;
 		}
 	} else
 		ocfs2_dentry_attach_gen(dentry);
···
 	newfe = (struct ocfs2_dinode *) newfe_bh->b_data;
 
 	trace_ocfs2_rename_over_existing(
-		(unsigned long long)newfe_blkno, newfe_bh, newfe_bh ?
-		(unsigned long long)newfe_bh->b_blocknr : 0ULL);
+		(unsigned long long)newfe_blkno, newfe_bh,
+		(unsigned long long)newfe_bh->b_blocknr);
 
 	if (S_ISDIR(new_inode->i_mode) || (new_inode->i_nlink == 1)) {
 		status = ocfs2_prepare_orphan_dir(osb, &orphan_dir,

+4 -11

fs/ocfs2/stack_user.c

···
 				  struct ocfs2_control_message_setn *msg)
 {
 	long nodenum;
-	char *ptr = NULL;
 	struct ocfs2_control_private *p = file->private_data;
 
 	if (ocfs2_control_get_handshake_state(file) !=
···
 		return -EINVAL;
 	msg->space = msg->newline = '\0';
 
-	nodenum = simple_strtol(msg->nodestr, &ptr, 16);
-	if (!ptr || *ptr)
+	if (kstrtol(msg->nodestr, 16, &nodenum))
 		return -EINVAL;
 
 	if ((nodenum == LONG_MIN) || (nodenum == LONG_MAX) ||
···
 				  struct ocfs2_control_message_setv *msg)
 {
 	long major, minor;
-	char *ptr = NULL;
 	struct ocfs2_control_private *p = file->private_data;
 	struct ocfs2_protocol_version *max =
 		&ocfs2_user_plugin.sp_max_proto;
···
 		return -EINVAL;
 	msg->space1 = msg->space2 = msg->newline = '\0';
 
-	major = simple_strtol(msg->major, &ptr, 16);
-	if (!ptr || *ptr)
+	if (kstrtol(msg->major, 16, &major))
 		return -EINVAL;
-	minor = simple_strtol(msg->minor, &ptr, 16);
-	if (!ptr || *ptr)
+	if (kstrtol(msg->minor, 16, &minor))
 		return -EINVAL;
 
 	/*
···
 				  struct ocfs2_control_message_down *msg)
 {
 	long nodenum;
-	char *p = NULL;
 
 	if (ocfs2_control_get_handshake_state(file) !=
 	    OCFS2_CONTROL_HANDSHAKE_VALID)
···
 		return -EINVAL;
 	msg->space1 = msg->space2 = msg->newline = '\0';
 
-	nodenum = simple_strtol(msg->nodestr, &p, 16);
-	if (!p || *p)
+	if (kstrtol(msg->nodestr, 16, &nodenum))
 		return -EINVAL;
 
 	if ((nodenum == LONG_MIN) || (nodenum == LONG_MAX) ||
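`kstrtol()` parses and validates in one call: it returns -EINVAL for malformed input and -ERANGE for values that do not fit in a `long`, and the callers above map any nonzero result to -EINVAL. The userspace approximation below, built on `strtol()`, mimics those rules. `strict_strtol()` is a hypothetical name, and it only approximates `kstrtol()`: it rejects leading whitespace and allows a single trailing newline.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Approximation of kstrtol(): the whole string (optionally followed by a
 * single '\n') must be a number in the given base that fits in a long.
 * Returns 0 and stores the value, or -EINVAL / -ERANGE.
 */
static int strict_strtol(const char *s, int base, long *res)
{
	char *end;
	long v;

	if (isspace((unsigned char)*s))
		return -EINVAL;         /* strtol() would skip this; kstrtol() does not */
	errno = 0;
	v = strtol(s, &end, base);
	if (end == s)
		return -EINVAL;         /* no digits */
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;         /* trailing garbage */
	if (errno == ERANGE)
		return -ERANGE;         /* out of range for long */
	*res = v;
	return 0;
}
```

Note that an empty string is rejected here, whereas a bare end-pointer check such as `*ptr` would accept it.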

+12 -17

fs/proc/vmcore.c

···
 		return -EINVAL;
 
 	dump = vzalloc(sizeof(*dump));
-	if (!dump) {
-		ret = -ENOMEM;
-		goto out_err;
-	}
+	if (!dump)
+		return -ENOMEM;
 
 	/* Keep size of the buffer page aligned so that it can be mmaped */
 	data_size = roundup(sizeof(struct vmcoredd_header) + data->size,
···
 	dump->size = data_size;
 
 	/* Add the dump to driver sysfs list and update the elfcore hdr */
-	mutex_lock(&vmcore_mutex);
-	if (vmcore_opened)
-		pr_warn_once("Unexpected adding of device dump\n");
-	if (vmcore_open) {
-		ret = -EBUSY;
-		goto unlock;
+	scoped_guard(mutex, &vmcore_mutex) {
+		if (vmcore_opened)
+			pr_warn_once("Unexpected adding of device dump\n");
+		if (vmcore_open) {
+			ret = -EBUSY;
+			goto out_err;
+		}
+
+		list_add_tail(&dump->list, &vmcoredd_list);
+		vmcoredd_update_size(data_size);
 	}
-
-	list_add_tail(&dump->list, &vmcoredd_list);
-	vmcoredd_update_size(data_size);
-	mutex_unlock(&vmcore_mutex);
 	return 0;
-
-unlock:
-	mutex_unlock(&vmcore_mutex);
 
 out_err:
 	vfree(buf);
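`scoped_guard(mutex, &vmcore_mutex)` releases the mutex automatically whenever control leaves the guarded block, including through `goto out_err`, which lets the diff drop the separate `unlock:` label. The kernel builds its guards on the compiler's cleanup attribute. The following is a minimal userspace imitation for GCC or Clang using a pthread mutex; `SCOPED_MUTEX`, `add_dump()` and the globals are hypothetical and much simpler than `<linux/cleanup.h>`.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Called automatically when the guard variable goes out of scope. */
static void mutex_guard_release(pthread_mutex_t **m)
{
	pthread_mutex_unlock(*m);
}

/*
 * Run the following block once with *m held. The for loop gives the guard
 * variable a scope that ends with the block, so any exit (normal end,
 * break, return, or goto out of it) runs mutex_guard_release().
 */
#define SCOPED_MUTEX(m)							\
	for (pthread_mutex_t *guard_					\
		__attribute__((cleanup(mutex_guard_release))) =	\
		(pthread_mutex_lock(m), (m)), *once_ = (m);		\
	     once_; once_ = NULL)

static pthread_mutex_t dump_lock = PTHREAD_MUTEX_INITIALIZER;
static int dump_open;           /* stand-in for vmcore_open */
static int nr_dumps;

static int add_dump(void)
{
	int ret = 0;

	SCOPED_MUTEX(&dump_lock) {
		if (dump_open) {
			ret = -EBUSY;
			goto out_err;   /* leaving the block unlocks dump_lock */
		}
		nr_dumps++;
	}
	return 0;

out_err:
	return ret;
}
```

The design choice mirrors the hunk: the error path no longer needs to know which locks are held, because each guard releases its own lock when its scope ends.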

+23 -24

fs/squashfs/block.c

··· 80 80 struct address_space *cache_mapping, u64 index, int length, 81 81 u64 read_start, u64 read_end, int page_count) 82 82 { 83 - struct page *head_to_cache = NULL, *tail_to_cache = NULL; 83 + struct folio *head_to_cache = NULL, *tail_to_cache = NULL; 84 84 struct block_device *bdev = fullbio->bi_bdev; 85 85 int start_idx = 0, end_idx = 0; 86 - struct bvec_iter_all iter_all; 86 + struct folio_iter fi; 87 87 struct bio *bio = NULL; 88 - struct bio_vec *bv; 89 88 int idx = 0; 90 89 int err = 0; 91 90 #ifdef CONFIG_SQUASHFS_COMP_CACHE_FULL 92 - struct page **cache_pages = kmalloc_array(page_count, 93 - sizeof(void *), GFP_KERNEL | __GFP_ZERO); 91 + struct folio **cache_folios = kmalloc_array(page_count, 92 + sizeof(*cache_folios), GFP_KERNEL | __GFP_ZERO); 94 93 #endif 95 94 96 - bio_for_each_segment_all(bv, fullbio, iter_all) { 97 - struct page *page = bv->bv_page; 95 + bio_for_each_folio_all(fi, fullbio) { 96 + struct folio *folio = fi.folio; 98 97 99 - if (page->mapping == cache_mapping) { 98 + if (folio->mapping == cache_mapping) { 100 99 idx++; 101 100 continue; 102 101 } ··· 110 111 * adjacent blocks. 
111 112 */ 112 113 if (idx == 0 && index != read_start) 113 - head_to_cache = page; 114 + head_to_cache = folio; 114 115 else if (idx == page_count - 1 && index + length != read_end) 115 - tail_to_cache = page; 116 + tail_to_cache = folio; 116 117 #ifdef CONFIG_SQUASHFS_COMP_CACHE_FULL 117 118 /* Cache all pages in the BIO for repeated reads */ 118 - else if (cache_pages) 119 - cache_pages[idx] = page; 119 + else if (cache_folios) 120 + cache_folios[idx] = folio; 120 121 #endif 121 122 122 123 if (!bio || idx != end_idx) { ··· 149 150 return err; 150 151 151 152 if (head_to_cache) { 152 - int ret = add_to_page_cache_lru(head_to_cache, cache_mapping, 153 + int ret = filemap_add_folio(cache_mapping, head_to_cache, 153 154 read_start >> PAGE_SHIFT, 154 155 GFP_NOIO); 155 156 156 157 if (!ret) { 157 - SetPageUptodate(head_to_cache); 158 - unlock_page(head_to_cache); 158 + folio_mark_uptodate(head_to_cache); 159 + folio_unlock(head_to_cache); 159 160 } 160 161 161 162 } 162 163 163 164 if (tail_to_cache) { 164 - int ret = add_to_page_cache_lru(tail_to_cache, cache_mapping, 165 + int ret = filemap_add_folio(cache_mapping, tail_to_cache, 165 166 (read_end >> PAGE_SHIFT) - 1, 166 167 GFP_NOIO); 167 168 168 169 if (!ret) { 169 - SetPageUptodate(tail_to_cache); 170 - unlock_page(tail_to_cache); 170 + folio_mark_uptodate(tail_to_cache); 171 + folio_unlock(tail_to_cache); 171 172 } 172 173 } 173 174 174 175 #ifdef CONFIG_SQUASHFS_COMP_CACHE_FULL 175 - if (!cache_pages) 176 + if (!cache_folios) 176 177 goto out; 177 178 178 179 for (idx = 0; idx < page_count; idx++) { 179 - if (!cache_pages[idx]) 180 + if (!cache_folios[idx]) 180 181 continue; 181 - int ret = add_to_page_cache_lru(cache_pages[idx], cache_mapping, 182 + int ret = filemap_add_folio(cache_mapping, cache_folios[idx], 182 183 (read_start >> PAGE_SHIFT) + idx, 183 184 GFP_NOIO); 184 185 185 186 if (!ret) { 186 - SetPageUptodate(cache_pages[idx]); 187 - unlock_page(cache_pages[idx]); 187 + 
folio_mark_uptodate(cache_folios[idx]); 188 + folio_unlock(cache_folios[idx]); 188 189 } 189 190 } 190 - kfree(cache_pages); 191 + kfree(cache_folios); 191 192 out: 192 193 #endif 193 194 return 0;

+3 -4

fs/squashfs/file.c

··· 493 493 return res; 494 494 } 495 495 496 - static int squashfs_readahead_fragment(struct page **page, 496 + static int squashfs_readahead_fragment(struct inode *inode, struct page **page, 497 497 unsigned int pages, unsigned int expected, loff_t start) 498 498 { 499 - struct inode *inode = page[0]->mapping->host; 500 499 struct squashfs_cache_entry *buffer = squashfs_get_fragment(inode->i_sb, 501 500 squashfs_i(inode)->fragment_block, 502 501 squashfs_i(inode)->fragment_size); ··· 604 605 605 606 if (start >> msblk->block_log == file_end && 606 607 squashfs_i(inode)->fragment_block != SQUASHFS_INVALID_BLK) { 607 - res = squashfs_readahead_fragment(pages, nr_pages, 608 - expected, start); 608 + res = squashfs_readahead_fragment(inode, pages, 609 + nr_pages, expected, start); 609 610 if (res) 610 611 goto skip_pages; 611 612 continue;

+14 -1

include/linux/crash_reserve.h

··· 13 13 */ 14 14 extern struct resource crashk_res; 15 15 extern struct resource crashk_low_res; 16 + extern struct range crashk_cma_ranges[]; 17 + #if defined(CONFIG_CMA) && defined(CONFIG_ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION) 18 + #define CRASHKERNEL_CMA 19 + #define CRASHKERNEL_CMA_RANGES_MAX 4 20 + extern int crashk_cma_cnt; 21 + #else 22 + #define crashk_cma_cnt 0 23 + #define CRASHKERNEL_CMA_RANGES_MAX 0 24 + #endif 25 + 16 26 17 27 int __init parse_crashkernel(char *cmdline, unsigned long long system_ram, 18 28 unsigned long long *crash_size, unsigned long long *crash_base, 19 - unsigned long long *low_size, bool *high); 29 + unsigned long long *low_size, unsigned long long *cma_size, 30 + bool *high); 31 + 32 + void __init reserve_crashkernel_cma(unsigned long long cma_size); 20 33 21 34 #ifdef CONFIG_ARCH_HAS_GENERIC_CRASHKERNEL_RESERVATION 22 35 #ifndef DEFAULT_CRASH_KERNEL_LOW_SIZE

+3

include/linux/gcd.h

··· 3 3 #define _GCD_H 4 4 5 5 #include <linux/compiler.h> 6 + #include <linux/jump_label.h> 7 + 8 + DECLARE_STATIC_KEY_TRUE(efficient_ffs_key); 6 9 7 10 unsigned long gcd(unsigned long a, unsigned long b) __attribute_const__; 8 11
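gcd.h now declares the `efficient_ffs_key` static key. A plausible reading is that the key selects a GCD variant that relies on a fast find-first-set instruction, but that selection logic is not part of this hunk. For reference, here is a binary (Stein's) GCD driven by count-trailing-zeros. It assumes GCC/Clang's `__builtin_ctzl()` and illustrates the technique only; it is not the kernel's `gcd()`.

```c
/*
 * Binary GCD: strip common factors of two once, then repeatedly remove
 * factors of two and subtract the smaller odd value from the larger one.
 * Each shift is a single count-trailing-zeros operation.
 */
static unsigned long binary_gcd(unsigned long a, unsigned long b)
{
	int shift;

	if (a == 0)
		return b;
	if (b == 0)
		return a;

	shift = __builtin_ctzl(a | b);	/* power of two common to a and b */
	a >>= __builtin_ctzl(a);	/* a is now odd */
	do {
		b >>= __builtin_ctzl(b);	/* b is now odd (b != 0 here) */
		if (a > b) {
			unsigned long t = a;
			a = b;
			b = t;
		}
		b -= a;			/* odd - odd = even, possibly zero */
	} while (b != 0);

	return a << shift;
}
```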

+9 -9

include/linux/hung_task.h

··· 21 21 * type. 22 22 * 23 23 * Type encoding: 24 - * 00 - Blocked on mutex (BLOCKER_TYPE_MUTEX) 25 - * 01 - Blocked on semaphore (BLOCKER_TYPE_SEM) 26 - * 10 - Blocked on rt-mutex (BLOCKER_TYPE_RTMUTEX) 27 - * 11 - Blocked on rw-semaphore (BLOCKER_TYPE_RWSEM) 24 + * 00 - Blocked on mutex (BLOCKER_TYPE_MUTEX) 25 + * 01 - Blocked on semaphore (BLOCKER_TYPE_SEM) 26 + * 10 - Blocked on rw-semaphore as READER (BLOCKER_TYPE_RWSEM_READER) 27 + * 11 - Blocked on rw-semaphore as WRITER (BLOCKER_TYPE_RWSEM_WRITER) 28 28 */ 29 - #define BLOCKER_TYPE_MUTEX 0x00UL 30 - #define BLOCKER_TYPE_SEM 0x01UL 31 - #define BLOCKER_TYPE_RTMUTEX 0x02UL 32 - #define BLOCKER_TYPE_RWSEM 0x03UL 29 + #define BLOCKER_TYPE_MUTEX 0x00UL 30 + #define BLOCKER_TYPE_SEM 0x01UL 31 + #define BLOCKER_TYPE_RWSEM_READER 0x02UL 32 + #define BLOCKER_TYPE_RWSEM_WRITER 0x03UL 33 33 34 - #define BLOCKER_TYPE_MASK 0x03UL 34 + #define BLOCKER_TYPE_MASK 0x03UL 35 35 36 36 #ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER 37 37 static inline void hung_task_set_blocker(void *lock, unsigned long type)
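The blocker encoding stores the lock type in the two low bits of the lock pointer, which presumes that lock objects are at least 4-byte aligned. The following minimal user-space sketch of that pointer-tagging scheme reuses the constant values from the hunk. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCKER_TYPE_MUTEX		0x00UL
#define BLOCKER_TYPE_SEM		0x01UL
#define BLOCKER_TYPE_RWSEM_READER	0x02UL
#define BLOCKER_TYPE_RWSEM_WRITER	0x03UL
#define BLOCKER_TYPE_MASK		0x03UL

/* Pack a lock pointer and a 2-bit type into one word. */
static uintptr_t encode_blocker(void *lock, unsigned long type)
{
	/* Low two bits must be free, i.e. lock is at least 4-byte aligned. */
	assert(((uintptr_t)lock & BLOCKER_TYPE_MASK) == 0);
	return (uintptr_t)lock | type;
}

/* Recover the original pointer by clearing the tag bits. */
static void *blocker_to_lock(uintptr_t blocker)
{
	return (void *)(blocker & ~(uintptr_t)BLOCKER_TYPE_MASK);
}

/* Recover the type from the tag bits. */
static unsigned long blocker_type(uintptr_t blocker)
{
	return blocker & BLOCKER_TYPE_MASK;
}
```

With four codes in two bits, the change trades the unused rt-mutex slot for separate reader and writer rw-semaphore codes.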

+4 -4

include/linux/jhash.h

··· 24 24 * Jozsef 25 25 */ 26 26 #include <linux/bitops.h> 27 - #include <linux/unaligned/packed_struct.h> 27 + #include <linux/unaligned.h> 28 28 29 29 /* Best hash sizes are of power of two */ 30 30 #define jhash_size(n) ((u32)1<<(n)) ··· 77 77 78 78 /* All but the last block: affect some 32 bits of (a,b,c) */ 79 79 while (length > 12) { 80 - a += __get_unaligned_cpu32(k); 81 - b += __get_unaligned_cpu32(k + 4); 82 - c += __get_unaligned_cpu32(k + 8); 80 + a += get_unaligned((u32 *)k); 81 + b += get_unaligned((u32 *)(k + 4)); 82 + c += get_unaligned((u32 *)(k + 8)); 83 83 __jhash_mix(a, b, c); 84 84 length -= 12; 85 85 k += 12;
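jhash now reads its 32-bit words through `get_unaligned()` from `<linux/unaligned.h>`. In portable user-space C, the usual equivalent is a `memcpy()` into a local variable, which compilers typically turn into a single load where the architecture allows it. The helper name below is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/*
 * Load a CPU-endian 32-bit value from any address. Dereferencing a
 * misaligned uint32_t pointer is undefined behaviour in C; memcpy is not.
 */
static inline uint32_t load_u32_unaligned(const void *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}
```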

+10

include/linux/kexec.h

··· 79 79 80 80 typedef unsigned long kimage_entry_t; 81 81 82 + /* 83 + * This is a copy of the UAPI struct kexec_segment and must be identical 84 + * to it because it gets copied straight from user space into kernel 85 + * memory. Do not modify this structure unless you change the way segments 86 + * get ingested from user space. 87 + */ 82 88 struct kexec_segment { 83 89 /* 84 90 * This pointer can point to user memory if kexec_load() system ··· 178 172 * @buf_align: Minimum alignment needed. 179 173 * @buf_min: The buffer can't be placed below this address. 180 174 * @buf_max: The buffer can't be placed above this address. 175 + * @cma: CMA page if the buffer is backed by CMA. 181 176 * @top_down: Allocate from top of memory. 182 177 * @random: Place the buffer at a random position. 183 178 */ ··· 191 184 unsigned long buf_align; 192 185 unsigned long buf_min; 193 186 unsigned long buf_max; 187 + struct page *cma; 194 188 bool top_down; 195 189 #ifdef CONFIG_CRASH_DUMP 196 190 bool random; ··· 348 340 349 341 unsigned long nr_segments; 350 342 struct kexec_segment segment[KEXEC_SEGMENT_MAX]; 343 + struct page *segment_cma[KEXEC_SEGMENT_MAX]; 351 344 352 345 struct list_head control_pages; 353 346 struct list_head dest_pages; ··· 370 361 */ 371 362 unsigned int hotplug_support:1; 372 363 #endif 364 + unsigned int no_cma:1; 373 365 374 366 #ifdef ARCH_HAS_KIMAGE_ARCH 375 367 struct kimage_arch arch;
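The new comment says the in-kernel `struct kexec_segment` must stay identical to the UAPI copy because segments are copied straight from user space. A comment cannot enforce that, but a compile-time check can. The sketch below applies C11 `static_assert` to two hypothetical structs modelled loosely on the kexec segment layout. It assumes, as on Linux, that a pointer and `unsigned long` have the same size.

```c
#include <assert.h>	/* static_assert (C11) */
#include <stddef.h>

/* Hypothetical user-facing layout. */
struct uapi_segment {
	const void *buf;
	size_t bufsz;
	const void *mem;
	size_t memsz;
};

/* Hypothetical internal copy that must match it byte for byte. */
struct kern_segment {
	union {
		void *buf;	/* user pointer */
		void *kbuf;	/* kernel pointer, same storage */
	};
	size_t bufsz;
	unsigned long mem;
	size_t memsz;
};

/* Fail the build, not the copy, if the layouts ever diverge. */
static_assert(sizeof(struct uapi_segment) == sizeof(struct kern_segment),
	      "segment size mismatch");
static_assert(offsetof(struct uapi_segment, bufsz) ==
	      offsetof(struct kern_segment, bufsz), "bufsz moved");
static_assert(offsetof(struct uapi_segment, mem) ==
	      offsetof(struct kern_segment, mem), "mem moved");
static_assert(offsetof(struct uapi_segment, memsz) ==
	      offsetof(struct kern_segment, memsz), "memsz moved");
```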

+11 -1

include/linux/raid/pq.h

··· 11 11 #ifdef __KERNEL__ 12 12 13 13 #include <linux/blkdev.h> 14 + #include <linux/mm.h> 14 15 15 - extern const char raid6_empty_zero_page[PAGE_SIZE]; 16 + /* This should be const but the raid6 code is too convoluted for that. */ 17 + static inline void *raid6_get_zero_page(void) 18 + { 19 + return page_address(ZERO_PAGE(0)); 20 + } 16 21 17 22 #else /* ! __KERNEL__ */ 18 23 /* Used for testing in user space */ ··· 194 189 struct timeval tv; 195 190 gettimeofday(&tv, NULL); 196 191 return tv.tv_sec*1000 + tv.tv_usec/1000; 192 + } 193 + 194 + static inline void *raid6_get_zero_page(void) 195 + { 196 + return raid6_empty_zero_page; 197 197 } 198 198 199 199 #endif /* ! __KERNEL__ */

+19 -5

include/linux/relay.h

··· 29 29 #define RELAYFS_CHANNEL_VERSION 7 30 30 31 31 /* 32 + * Relay buffer statistics 33 + */ 34 + enum { 35 + RELAY_STATS_BUF_FULL = (1 << 0), 36 + RELAY_STATS_WRT_BIG = (1 << 1), 37 + 38 + RELAY_STATS_LAST = RELAY_STATS_WRT_BIG, 39 + }; 40 + 41 + struct rchan_buf_stats 42 + { 43 + unsigned int full_count; /* counter for buffer full */ 44 + unsigned int big_count; /* counter for too big to write */ 45 + }; 46 + 47 + /* 32 48 * Per-cpu relay channel buffer 33 49 */ 34 50 struct rchan_buf ··· 59 43 struct irq_work wakeup_work; /* reader wakeup */ 60 44 struct dentry *dentry; /* channel file dentry */ 61 45 struct kref kref; /* channel buffer refcount */ 46 + struct rchan_buf_stats stats; /* buffer stats */ 62 47 struct page **page_array; /* array of current buffer pages */ 63 48 unsigned int page_count; /* number of current buffer pages */ 64 49 unsigned int finalized; /* buffer has been finalized */ 65 50 size_t *padding; /* padding counts per sub-buffer */ 66 - size_t prev_padding; /* temporary variable */ 67 51 size_t bytes_consumed; /* bytes consumed in cur read subbuf */ 68 52 size_t early_bytes; /* bytes consumed before VFS inited */ 69 53 unsigned int cpu; /* this buf's cpu */ ··· 81 65 const struct rchan_callbacks *cb; /* client callbacks */ 82 66 struct kref kref; /* channel refcount */ 83 67 void *private_data; /* for user-defined data */ 84 - size_t last_toobig; /* tried to log event > subbuf size */ 85 68 struct rchan_buf * __percpu *buf; /* per-cpu channel buffers */ 86 69 int is_global; /* One global buffer ? */ 87 70 struct list_head list; /* for channel list */ ··· 99 84 * @buf: the channel buffer containing the new sub-buffer 100 85 * @subbuf: the start of the new sub-buffer 101 86 * @prev_subbuf: the start of the previous sub-buffer 102 - * @prev_padding: unused space at the end of previous sub-buffer 103 87 * 104 88 * The client should return 1 to continue logging, 0 to stop 105 89 * logging. 
··· 114 100 */ 115 101 int (*subbuf_start) (struct rchan_buf *buf, 116 102 void *subbuf, 117 - void *prev_subbuf, 118 - size_t prev_padding); 103 + void *prev_subbuf); 119 104 120 105 /* 121 106 * create_buf_file - create file to represent a relay channel buffer ··· 174 161 void *private_data); 175 162 extern void relay_close(struct rchan *chan); 176 163 extern void relay_flush(struct rchan *chan); 164 + size_t relay_stats(struct rchan *chan, int flags); 177 165 extern void relay_subbufs_consumed(struct rchan *chan, 178 166 unsigned int cpu, 179 167 size_t consumed);
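The relay change replaces `last_toobig` and `prev_padding` with per-buffer counters and a `relay_stats(chan, flags)` query. The body of `relay_stats()` is not part of this diff. The following sketch shows one plausible way a flag-selected query could sum counters across per-CPU buffers. `struct buf_stats` and `sum_relay_stats()` are hypothetical, and whether the kernel combines multiple flags this way is an assumption.

```c
#include <stddef.h>

/* Flag values as declared in the relay.h hunk. */
enum {
	RELAY_STATS_BUF_FULL = (1 << 0),
	RELAY_STATS_WRT_BIG  = (1 << 1),
};

/* Hypothetical mirror of struct rchan_buf_stats. */
struct buf_stats {
	unsigned int full_count;	/* writes that found the buffer full */
	unsigned int big_count;		/* writes larger than a sub-buffer */
};

/* Sum the counters selected by flags over nr per-CPU buffers. */
static size_t sum_relay_stats(const struct buf_stats *bufs, size_t nr,
			      int flags)
{
	size_t count = 0;

	for (size_t i = 0; i < nr; i++) {
		if (flags & RELAY_STATS_BUF_FULL)
			count += bufs[i].full_count;
		if (flags & RELAY_STATS_WRT_BIG)
			count += bufs[i].big_count;
	}
	return count;
}
```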

+12

include/linux/rwsem.h

··· 132 132 return !list_empty(&sem->wait_list); 133 133 } 134 134 135 + #if defined(CONFIG_DEBUG_RWSEMS) || defined(CONFIG_DETECT_HUNG_TASK_BLOCKER) 136 + /* 137 + * Return just the real task structure pointer of the owner 138 + */ 139 + extern struct task_struct *rwsem_owner(struct rw_semaphore *sem); 140 + 141 + /* 142 + * Return true if the rwsem is owned by a reader. 143 + */ 144 + extern bool is_rwsem_reader_owned(struct rw_semaphore *sem); 145 + #endif 146 + 135 147 #else /* !CONFIG_PREEMPT_RT */ 136 148 137 149 #include <linux/rwbase_rt.h>

+28

include/linux/sys_info.h

··· 1 + /* SPDX-License-Identifier: GPL-2.0 */ 2 + #ifndef _LINUX_SYS_INFO_H 3 + #define _LINUX_SYS_INFO_H 4 + 5 + #include <linux/sysctl.h> 6 + 7 + /* 8 + * SYS_INFO_PANIC_CONSOLE_REPLAY is for panic case only, as it needs special 9 + * handling which only fits panic case. 10 + */ 11 + #define SYS_INFO_TASKS 0x00000001 12 + #define SYS_INFO_MEM 0x00000002 13 + #define SYS_INFO_TIMERS 0x00000004 14 + #define SYS_INFO_LOCKS 0x00000008 15 + #define SYS_INFO_FTRACE 0x00000010 16 + #define SYS_INFO_PANIC_CONSOLE_REPLAY 0x00000020 17 + #define SYS_INFO_ALL_CPU_BT 0x00000040 18 + #define SYS_INFO_BLOCKED_TASKS 0x00000080 19 + 20 + void sys_info(unsigned long si_mask); 21 + unsigned long sys_info_parse_param(char *str); 22 + 23 + #ifdef CONFIG_SYSCTL 24 + int sysctl_sys_info_handler(const struct ctl_table *ro_table, int write, 25 + void *buffer, size_t *lenp, 26 + loff_t *ppos); 27 + #endif 28 + #endif /* _LINUX_SYS_INFO_H */
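`sys_info_parse_param()` turns a string into a mask of the `SYS_INFO_*` bits declared above. This header does not show the accepted keywords, so the names in the table below are assumptions. The sketch only illustrates how a comma-separated list can be parsed into a bit mask.

```c
#include <string.h>

#define SYS_INFO_TASKS		0x00000001
#define SYS_INFO_MEM		0x00000002
#define SYS_INFO_TIMERS		0x00000004
#define SYS_INFO_LOCKS		0x00000008
#define SYS_INFO_FTRACE		0x00000010

struct si_name {
	const char *name;
	unsigned long bit;
};

/* Hypothetical keyword table; the real names are not in this diff. */
static const struct si_name si_names[] = {
	{ "tasks",  SYS_INFO_TASKS },
	{ "mem",    SYS_INFO_MEM },
	{ "timers", SYS_INFO_TIMERS },
	{ "locks",  SYS_INFO_LOCKS },
	{ "ftrace", SYS_INFO_FTRACE },
};

/* Parse "a,b,c" into a mask; unknown names are ignored. Modifies str. */
static unsigned long parse_sys_info(char *str)
{
	unsigned long mask = 0;

	for (char *tok = strtok(str, ","); tok; tok = strtok(NULL, ",")) {
		for (size_t i = 0; i < sizeof(si_names) / sizeof(si_names[0]); i++) {
			if (strcmp(tok, si_names[i].name) == 0) {
				mask |= si_names[i].bit;
				break;
			}
		}
	}
	return mask;
}
```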

-26

include/linux/xxhash.h

··· 178 178 void xxh32_reset(struct xxh32_state *state, uint32_t seed); 179 179 180 180 /** 181 - * xxh32_update() - hash the data given and update the xxh32 state 182 - * 183 - * @state: The xxh32 state to update. 184 - * @input: The data to hash. 185 - * @length: The length of the data to hash. 186 - * 187 - * After calling xxh32_reset() call xxh32_update() as many times as necessary. 188 - * 189 - * Return: Zero on success, otherwise an error code. 190 - */ 191 - int xxh32_update(struct xxh32_state *state, const void *input, size_t length); 192 - 193 - /** 194 - * xxh32_digest() - produce the current xxh32 hash 195 - * 196 - * @state: Produce the current xxh32 hash of this state. 197 - * 198 - * A hash value can be produced at any time. It is still possible to continue 199 - * inserting input into the hash state after a call to xxh32_digest(), and 200 - * generate new hashes later on, by calling xxh32_digest() again. 201 - * 202 - * Return: The xxh32 hash stored in the state. 203 - */ 204 - uint32_t xxh32_digest(const struct xxh32_state *state); 205 - 206 - /** 207 181 * xxh64_reset() - reset the xxh64 state to start a new hashing operation 208 182 * 209 183 * @state: The xxh64 state to reset.

+1

include/uapi/linux/kexec.h

··· 27 27 #define KEXEC_FILE_ON_CRASH 0x00000002 28 28 #define KEXEC_FILE_NO_INITRAMFS 0x00000004 29 29 #define KEXEC_FILE_DEBUG 0x00000008 30 + #define KEXEC_FILE_NO_CMA 0x00000010 30 31 31 32 /* These values match the ELF architecture values. 32 33 * Unless there is a good reason that should continue to be the case.

+1 -1

include/xen/xenbus.h

··· 178 178 * sprintf-style type string, and pointer. Returns 0 or errno.*/ 179 179 int xenbus_gather(struct xenbus_transaction t, const char *dir, ...); 180 180 181 - /* notifer routines for when the xenstore comes up */ 181 + /* notifier routines for when the xenstore comes up */ 182 182 extern int xenstored_ready; 183 183 int register_xenstore_notifier(struct notifier_block *nb); 184 184 void unregister_xenstore_notifier(struct notifier_block *nb);

+4

init/Kconfig

··· 172 172 173 173 config BROKEN 174 174 bool 175 + help 176 + This option allows you to choose whether you want to try to 177 + compile (and fix) old drivers that haven't been updated to 178 + new infrastructure. 175 179 176 180 config BROKEN_ON_SMP 177 181 bool

+5 -1

init/main.c

··· 1587 1587 * check if there is an early userspace init. If yes, let it do all 1588 1588 * the work 1589 1589 */ 1590 - if (init_eaccess(ramdisk_execute_command) != 0) { 1590 + int ramdisk_command_access; 1591 + ramdisk_command_access = init_eaccess(ramdisk_execute_command); 1592 + if (ramdisk_command_access != 0) { 1593 + pr_warn("check access for rdinit=%s failed: %i, ignoring\n", 1594 + ramdisk_execute_command, ramdisk_command_access); 1591 1595 ramdisk_execute_command = NULL; 1592 1596 prepare_namespace(); 1593 1597 }

+15

kernel/crash_core.c

··· 21 21 #include <linux/reboot.h> 22 22 #include <linux/btf.h> 23 23 #include <linux/objtool.h> 24 + #include <linux/delay.h> 24 25 25 26 #include <asm/page.h> 26 27 #include <asm/sections.h> ··· 33 32 34 33 /* Per cpu memory for storing cpu states in case of system crash. */ 35 34 note_buf_t __percpu *crash_notes; 35 + 36 + /* time to wait for possible DMA to finish before starting the kdump kernel 37 + * when a CMA reservation is used 38 + */ 39 + #define CMA_DMA_TIMEOUT_SEC 10 36 40 37 41 #ifdef CONFIG_CRASH_DUMP 38 42 ··· 103 97 } 104 98 EXPORT_SYMBOL_GPL(kexec_crash_loaded); 105 99 100 + static void crash_cma_clear_pending_dma(void) 101 + { 102 + if (!crashk_cma_cnt) 103 + return; 104 + 105 + mdelay(CMA_DMA_TIMEOUT_SEC * 1000); 106 + } 107 + 106 108 /* 107 109 * No panic_cpu check version of crash_kexec(). This function is called 108 110 * only when panic_cpu holds the current CPU number; this is the only CPU ··· 133 119 crash_setup_regs(&fixed_regs, regs); 134 120 crash_save_vmcoreinfo(); 135 121 machine_crash_shutdown(&fixed_regs); 122 + crash_cma_clear_pending_dma(); 136 123 machine_kexec(kexec_crash_image); 137 124 } 138 125 kexec_unlock();

+66 -2

kernel/crash_reserve.c

··· 14 14 #include <linux/cpuhotplug.h> 15 15 #include <linux/memblock.h> 16 16 #include <linux/kmemleak.h> 17 + #include <linux/cma.h> 18 + #include <linux/crash_reserve.h> 17 19 18 20 #include <asm/page.h> 19 21 #include <asm/sections.h> ··· 174 172 175 173 #define SUFFIX_HIGH 0 176 174 #define SUFFIX_LOW 1 177 - #define SUFFIX_NULL 2 175 + #define SUFFIX_CMA 2 176 + #define SUFFIX_NULL 3 178 177 static __initdata char *suffix_tbl[] = { 179 178 [SUFFIX_HIGH] = ",high", 180 179 [SUFFIX_LOW] = ",low", 180 + [SUFFIX_CMA] = ",cma", 181 181 [SUFFIX_NULL] = NULL, 182 182 }; 183 183 184 184 /* 185 185 * That function parses "suffix" crashkernel command lines like 186 186 * 187 - * crashkernel=size,[high|low] 187 + * crashkernel=size,[high|low|cma] 188 188 * 189 189 * It returns 0 on success and -EINVAL on failure. 190 190 */ ··· 302 298 unsigned long long *crash_size, 303 299 unsigned long long *crash_base, 304 300 unsigned long long *low_size, 301 + unsigned long long *cma_size, 305 302 bool *high) 306 303 { 307 304 int ret; 305 + unsigned long long __always_unused cma_base; 308 306 309 307 /* crashkernel=X[@offset] */ 310 308 ret = __parse_crashkernel(cmdline, system_ram, crash_size, ··· 337 331 338 332 *high = true; 339 333 } 334 + 335 + /* 336 + * optional CMA reservation 337 + * cma_base is ignored 338 + */ 339 + if (cma_size) 340 + __parse_crashkernel(cmdline, 0, cma_size, 341 + &cma_base, suffix_tbl[SUFFIX_CMA]); 340 342 #endif 341 343 if (!*crash_size) 342 344 ret = -EINVAL; ··· 470 456 insert_resource(&iomem_resource, &crashk_res); 471 457 #endif 472 458 } 459 + 460 + struct range crashk_cma_ranges[CRASHKERNEL_CMA_RANGES_MAX]; 461 + #ifdef CRASHKERNEL_CMA 462 + int crashk_cma_cnt; 463 + void __init reserve_crashkernel_cma(unsigned long long cma_size) 464 + { 465 + unsigned long long request_size = roundup(cma_size, PAGE_SIZE); 466 + unsigned long long reserved_size = 0; 467 + 468 + if (!cma_size) 469 + return; 470 + 471 + while (cma_size > reserved_size && 472 
+ crashk_cma_cnt < CRASHKERNEL_CMA_RANGES_MAX) { 473 + 474 + struct cma *res; 475 + 476 + if (cma_declare_contiguous(0, request_size, 0, 0, 0, false, 477 + "crashkernel", &res)) { 478 + /* reservation failed, try half-sized blocks */ 479 + if (request_size <= PAGE_SIZE) 480 + break; 481 + 482 + request_size = roundup(request_size / 2, PAGE_SIZE); 483 + continue; 484 + } 485 + 486 + crashk_cma_ranges[crashk_cma_cnt].start = cma_get_base(res); 487 + crashk_cma_ranges[crashk_cma_cnt].end = 488 + crashk_cma_ranges[crashk_cma_cnt].start + 489 + cma_get_size(res) - 1; 490 + ++crashk_cma_cnt; 491 + reserved_size += request_size; 492 + } 493 + 494 + if (cma_size > reserved_size) 495 + pr_warn("crashkernel CMA reservation failed: %lld MB requested, %lld MB reserved in %d ranges\n", 496 + cma_size >> 20, reserved_size >> 20, crashk_cma_cnt); 497 + else 498 + pr_info("crashkernel CMA reserved: %lld MB in %d ranges\n", 499 + reserved_size >> 20, crashk_cma_cnt); 500 + } 501 + 502 + #else /* CRASHKERNEL_CMA */ 503 + void __init reserve_crashkernel_cma(unsigned long long cma_size) 504 + { 505 + if (cma_size) 506 + pr_warn("crashkernel CMA reservation not supported\n"); 507 + } 508 + #endif 473 509 474 510 #ifndef HAVE_ARCH_ADD_CRASH_RES_TO_IOMEM_EARLY 475 511 static __init int insert_crashkernel_resources(void)
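`reserve_crashkernel_cma()` first asks for the whole page-rounded size. Each time `cma_declare_contiguous()` fails, it halves the request, and it gives up once a single page fails or `CRASHKERNEL_CMA_RANGES_MAX` ranges are in use. The user-space simulation below replays that loop against a hypothetical first-fit allocator, `mock_reserve()`, in place of the real CMA call. Like the kernel loop, it adds each successful request to the total, so the reserved amount can exceed what was asked for.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE	4096ULL
#define SKETCH_RANGES_MAX	4
#define ROUNDUP(x, y)		((((x) + (y) - 1) / (y)) * (y))

/* Hypothetical free contiguous regions standing in for CMA memory. */
static unsigned long long free_blocks[8];
static size_t nr_free_blocks;

/* First fit: carve size out of the first block large enough, else fail. */
static int mock_reserve(unsigned long long size)
{
	for (size_t i = 0; i < nr_free_blocks; i++) {
		if (free_blocks[i] >= size) {
			free_blocks[i] -= size;
			return 0;
		}
	}
	return -1;
}

/* Returns the number of ranges; *reserved receives the total size. */
static int reserve_cma_ranges(unsigned long long cma_size,
			      unsigned long long *reserved)
{
	unsigned long long request = ROUNDUP(cma_size, SKETCH_PAGE_SIZE);
	int cnt = 0;

	*reserved = 0;
	if (!cma_size)
		return 0;

	while (cma_size > *reserved && cnt < SKETCH_RANGES_MAX) {
		if (mock_reserve(request)) {
			/* failed: retry with half-sized, page-rounded blocks */
			if (request <= SKETCH_PAGE_SIZE)
				break;
			request = ROUNDUP(request / 2, SKETCH_PAGE_SIZE);
			continue;
		}
		cnt++;
		*reserved += request;	/* keep the same size for the next try */
	}
	return cnt;
}
```

For example, asking for 512 MiB when only one 100 MiB region is free yields three ranges (64, 32 and 4 MiB), and the kernel would then print the "reservation failed" warning.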

+2 -2

kernel/events/uprobes.c

··· 580 580 581 581 out: 582 582 /* Revert back reference counter if instruction update failed. */ 583 - if (ret < 0 && is_register && ref_ctr_updated) 584 - update_ref_ctr(uprobe, mm, -1); 583 + if (ret < 0 && ref_ctr_updated) 584 + update_ref_ctr(uprobe, mm, is_register ? -1 : 1); 585 585 586 586 /* try collapse pmd for compound page */ 587 587 if (ret > 0)

+1 -6

kernel/exit.c

··· 693 693 } 694 694 695 695 /* 696 - * This does two things: 697 - * 698 - * A. Make init inherit all the child processes 699 - * B. Check to see if any process groups have become orphaned 700 - * as a result of our exiting, and if they have any stopped 701 - * jobs, send them a SIGHUP and then a SIGCONT. (POSIX 3.2.2.2) 696 + * Make init inherit all the child processes 702 697 */ 703 698 static void forget_original_parent(struct task_struct *father, 704 699 struct list_head *dead)

+47 -48

kernel/fork.c

··· 189 189 kmem_cache_free(task_struct_cachep, tsk); 190 190 } 191 191 192 - /* 193 - * Allocate pages if THREAD_SIZE is >= PAGE_SIZE, otherwise use a 194 - * kmemcache based allocator. 195 - */ 196 - # if THREAD_SIZE >= PAGE_SIZE || defined(CONFIG_VMAP_STACK) 197 - 198 - # ifdef CONFIG_VMAP_STACK 192 + #ifdef CONFIG_VMAP_STACK 199 193 /* 200 194 * vmalloc() is a bit slow, and calling vfree() enough times will force a TLB 201 195 * flush. Try to minimize the number of calls by caching stacks. 202 196 */ 203 197 #define NR_CACHED_STACKS 2 204 198 static DEFINE_PER_CPU(struct vm_struct *, cached_stacks[NR_CACHED_STACKS]); 199 + /* 200 + * Allocated stacks are cached and later reused by new threads, so memcg 201 + * accounting is performed by the code assigning/releasing stacks to tasks. 202 + * We need a zeroed memory without __GFP_ACCOUNT. 203 + */ 204 + #define GFP_VMAP_STACK (GFP_KERNEL | __GFP_ZERO) 205 205 206 206 struct vm_stack { 207 207 struct rcu_head rcu; 208 208 struct vm_struct *stack_vm_area; 209 209 }; 210 210 211 - static bool try_release_thread_stack_to_cache(struct vm_struct *vm) 211 + static bool try_release_thread_stack_to_cache(struct vm_struct *vm_area) 212 212 { 213 213 unsigned int i; 214 214 215 215 for (i = 0; i < NR_CACHED_STACKS; i++) { 216 216 struct vm_struct *tmp = NULL; 217 217 218 - if (this_cpu_try_cmpxchg(cached_stacks[i], &tmp, vm)) 218 + if (this_cpu_try_cmpxchg(cached_stacks[i], &tmp, vm_area)) 219 219 return true; 220 220 } 221 221 return false; ··· 224 224 static void thread_stack_free_rcu(struct rcu_head *rh) 225 225 { 226 226 struct vm_stack *vm_stack = container_of(rh, struct vm_stack, rcu); 227 + struct vm_struct *vm_area = vm_stack->stack_vm_area; 227 228 228 229 if (try_release_thread_stack_to_cache(vm_stack->stack_vm_area)) 229 230 return; 230 231 231 - vfree(vm_stack); 232 + vfree(vm_area->addr); 232 233 } 233 234 234 235 static void thread_stack_delayed_free(struct task_struct *tsk) ··· 242 241 243 242 static int 
free_vm_stack_cache(unsigned int cpu) 244 243 { 245 - struct vm_struct **cached_vm_stacks = per_cpu_ptr(cached_stacks, cpu); 244 + struct vm_struct **cached_vm_stack_areas = per_cpu_ptr(cached_stacks, cpu); 246 245 int i; 247 246 248 247 for (i = 0; i < NR_CACHED_STACKS; i++) { 249 - struct vm_struct *vm_stack = cached_vm_stacks[i]; 248 + struct vm_struct *vm_area = cached_vm_stack_areas[i]; 250 249 251 - if (!vm_stack) 250 + if (!vm_area) 252 251 continue; 253 252 254 - vfree(vm_stack->addr); 255 - cached_vm_stacks[i] = NULL; 253 + vfree(vm_area->addr); 254 + cached_vm_stack_areas[i] = NULL; 256 255 } 257 256 258 257 return 0; 259 258 } 260 259 261 - static int memcg_charge_kernel_stack(struct vm_struct *vm) 260 + static int memcg_charge_kernel_stack(struct vm_struct *vm_area) 262 261 { 263 262 int i; 264 263 int ret; 265 264 int nr_charged = 0; 266 265 267 - BUG_ON(vm->nr_pages != THREAD_SIZE / PAGE_SIZE); 266 + BUG_ON(vm_area->nr_pages != THREAD_SIZE / PAGE_SIZE); 268 267 269 268 for (i = 0; i < THREAD_SIZE / PAGE_SIZE; i++) { 270 - ret = memcg_kmem_charge_page(vm->pages[i], GFP_KERNEL, 0); 269 + ret = memcg_kmem_charge_page(vm_area->pages[i], GFP_KERNEL, 0); 271 270 if (ret) 272 271 goto err; 273 272 nr_charged++; ··· 275 274 return 0; 276 275 err: 277 276 for (i = 0; i < nr_charged; i++) 278 - memcg_kmem_uncharge_page(vm->pages[i], 0); 277 + memcg_kmem_uncharge_page(vm_area->pages[i], 0); 279 278 return ret; 280 279 } 281 280 282 281 static int alloc_thread_stack_node(struct task_struct *tsk, int node) 283 282 { 284 - struct vm_struct *vm; 283 + struct vm_struct *vm_area; 285 284 void *stack; 286 285 int i; 287 286 288 287 for (i = 0; i < NR_CACHED_STACKS; i++) { 289 - struct vm_struct *s; 290 - 291 - s = this_cpu_xchg(cached_stacks[i], NULL); 292 - 293 - if (!s) 288 + vm_area = this_cpu_xchg(cached_stacks[i], NULL); 289 + if (!vm_area) 294 290 continue; 295 291 296 292 /* Reset stack metadata. 
*/ 297 - kasan_unpoison_range(s->addr, THREAD_SIZE); 293 + kasan_unpoison_range(vm_area->addr, THREAD_SIZE); 298 294 299 - stack = kasan_reset_tag(s->addr); 295 + stack = kasan_reset_tag(vm_area->addr); 300 296 301 297 /* Clear stale pointers from reused stack. */ 302 298 memset(stack, 0, THREAD_SIZE); 303 299 304 - if (memcg_charge_kernel_stack(s)) { 305 - vfree(s->addr); 300 + if (memcg_charge_kernel_stack(vm_area)) { 301 + vfree(vm_area->addr); 306 302 return -ENOMEM; 307 303 } 308 304 309 - tsk->stack_vm_area = s; 305 + tsk->stack_vm_area = vm_area; 310 306 tsk->stack = stack; 311 307 return 0; 312 308 } 313 309 314 - /* 315 - * Allocated stacks are cached and later reused by new threads, 316 - * so memcg accounting is performed manually on assigning/releasing 317 - * stacks to tasks. Drop __GFP_ACCOUNT. 318 - */ 319 310 stack = __vmalloc_node(THREAD_SIZE, THREAD_ALIGN, 320 - THREADINFO_GFP & ~__GFP_ACCOUNT, 311 + GFP_VMAP_STACK, 321 312 node, __builtin_return_address(0)); 322 313 if (!stack) 323 314 return -ENOMEM; 324 315 325 - vm = find_vm_area(stack); 326 - if (memcg_charge_kernel_stack(vm)) { 316 + vm_area = find_vm_area(stack); 317 + if (memcg_charge_kernel_stack(vm_area)) { 327 318 vfree(stack); 328 319 return -ENOMEM; 329 320 } ··· 324 331 * free_thread_stack() can be called in interrupt context, 325 332 * so cache the vm_struct. 326 333 */ 327 - tsk->stack_vm_area = vm; 334 + tsk->stack_vm_area = vm_area; 328 335 stack = kasan_reset_tag(stack); 329 336 tsk->stack = stack; 330 337 return 0; ··· 339 346 tsk->stack_vm_area = NULL; 340 347 } 341 348 342 - # else /* !CONFIG_VMAP_STACK */ 349 + #else /* !CONFIG_VMAP_STACK */ 350 + 351 + /* 352 + * Allocate pages if THREAD_SIZE is >= PAGE_SIZE, otherwise use a 353 + * kmemcache based allocator. 
354 + */ 355 + #if THREAD_SIZE >= PAGE_SIZE 343 356 344 357 static void thread_stack_free_rcu(struct rcu_head *rh) 345 358 { ··· 377 378 tsk->stack = NULL; 378 379 } 379 380 380 - # endif /* CONFIG_VMAP_STACK */ 381 - # else /* !(THREAD_SIZE >= PAGE_SIZE || defined(CONFIG_VMAP_STACK)) */ 381 + #else /* !(THREAD_SIZE >= PAGE_SIZE) */ 382 382 383 383 static struct kmem_cache *thread_stack_cache; 384 384 ··· 416 418 BUG_ON(thread_stack_cache == NULL); 417 419 } 418 420 419 - # endif /* THREAD_SIZE >= PAGE_SIZE || defined(CONFIG_VMAP_STACK) */ 421 + #endif /* THREAD_SIZE >= PAGE_SIZE */ 422 + #endif /* CONFIG_VMAP_STACK */ 420 423 421 424 /* SLAB cache for signal_struct structures (tsk->signal) */ 422 425 static struct kmem_cache *signal_cachep; ··· 437 438 static void account_kernel_stack(struct task_struct *tsk, int account) 438 439 { 439 440 if (IS_ENABLED(CONFIG_VMAP_STACK)) { 440 - struct vm_struct *vm = task_stack_vm_area(tsk); 441 + struct vm_struct *vm_area = task_stack_vm_area(tsk); 441 442 int i; 442 443 443 444 for (i = 0; i < THREAD_SIZE / PAGE_SIZE; i++) 444 - mod_lruvec_page_state(vm->pages[i], NR_KERNEL_STACK_KB, 445 + mod_lruvec_page_state(vm_area->pages[i], NR_KERNEL_STACK_KB, 445 446 account * (PAGE_SIZE / 1024)); 446 447 } else { 447 448 void *stack = task_stack_page(tsk); ··· 457 458 account_kernel_stack(tsk, -1); 458 459 459 460 if (IS_ENABLED(CONFIG_VMAP_STACK)) { 460 - struct vm_struct *vm; 461 + struct vm_struct *vm_area; 461 462 int i; 462 463 463 - vm = task_stack_vm_area(tsk); 464 + vm_area = task_stack_vm_area(tsk); 464 465 for (i = 0; i < THREAD_SIZE / PAGE_SIZE; i++) 465 - memcg_kmem_uncharge_page(vm->pages[i], 0); 466 + memcg_kmem_uncharge_page(vm_area->pages[i], 0); 466 467 } 467 468 } 468 469
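fork.c keeps up to `NR_CACHED_STACKS` freed vmap stacks per CPU. `try_release_thread_stack_to_cache()` claims an empty slot with `this_cpu_try_cmpxchg()`, and allocation takes a stack back out with `this_cpu_xchg()`. The sketch below models a single CPU's cache, using C11 atomics as a stand-in for the per-CPU operations. The slot array and function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CACHED 2

/* Hypothetical cache for one CPU; static storage starts out all NULL. */
static _Atomic(void *) cache_slots[NR_CACHED];

/* Park area in the first empty slot; false means the caller must free it. */
static bool try_release_to_cache(void *area)
{
	for (int i = 0; i < NR_CACHED; i++) {
		void *expected = NULL;

		/* Succeeds only if the slot still holds NULL. */
		if (atomic_compare_exchange_strong(&cache_slots[i], &expected, area))
			return true;
	}
	return false;
}

/* Take any cached area, leaving its slot empty; NULL means allocate anew. */
static void *take_from_cache(void)
{
	for (int i = 0; i < NR_CACHED; i++) {
		void *area = atomic_exchange(&cache_slots[i], NULL);

		if (area)
			return area;
	}
	return NULL;
}
```

Caching a couple of stacks avoids a `vmalloc()`/`vfree()` pair per fork and exit, which is the cost the comment in the hunk points to.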

+25 -4

kernel/hung_task.c

··· 23 23 #include <linux/sched/debug.h> 24 24 #include <linux/sched/sysctl.h> 25 25 #include <linux/hung_task.h> 26 + #include <linux/rwsem.h> 26 27 27 28 #include <trace/events/sched.h> 28 29 ··· 101 100 { 102 101 struct task_struct *g, *t; 103 102 unsigned long owner, blocker, blocker_type; 103 + const char *rwsem_blocked_by, *rwsem_blocked_as; 104 104 105 105 RCU_LOCKDEP_WARN(!rcu_read_lock_held(), "No rcu lock held"); 106 106 ··· 113 111 114 112 switch (blocker_type) { 115 113 case BLOCKER_TYPE_MUTEX: 116 - owner = mutex_get_owner( 117 - (struct mutex *)hung_task_blocker_to_lock(blocker)); 114 + owner = mutex_get_owner(hung_task_blocker_to_lock(blocker)); 118 115 break; 119 116 case BLOCKER_TYPE_SEM: 120 - owner = sem_last_holder( 121 - (struct semaphore *)hung_task_blocker_to_lock(blocker)); 117 + owner = sem_last_holder(hung_task_blocker_to_lock(blocker)); 118 + break; 119 + case BLOCKER_TYPE_RWSEM_READER: 120 + case BLOCKER_TYPE_RWSEM_WRITER: 121 + owner = (unsigned long)rwsem_owner( 122 + hung_task_blocker_to_lock(blocker)); 123 + rwsem_blocked_as = (blocker_type == BLOCKER_TYPE_RWSEM_READER) ? 124 + "reader" : "writer"; 125 + rwsem_blocked_by = is_rwsem_reader_owned( 126 + hung_task_blocker_to_lock(blocker)) ? 
127 + "reader" : "writer"; 122 128 break; 123 129 default: 124 130 WARN_ON_ONCE(1); ··· 142 132 break; 143 133 case BLOCKER_TYPE_SEM: 144 134 pr_err("INFO: task %s:%d is blocked on a semaphore, but the last holder is not found.\n", 135 + task->comm, task->pid); 136 + break; 137 + case BLOCKER_TYPE_RWSEM_READER: 138 + case BLOCKER_TYPE_RWSEM_WRITER: 139 + pr_err("INFO: task %s:%d is blocked on an rw-semaphore, but the owner is not found.\n", 145 140 task->comm, task->pid); 146 141 break; 147 142 } ··· 166 151 case BLOCKER_TYPE_SEM: 167 152 pr_err("INFO: task %s:%d blocked on a semaphore likely last held by task %s:%d\n", 168 153 task->comm, task->pid, t->comm, t->pid); 154 + break; 155 + case BLOCKER_TYPE_RWSEM_READER: 156 + case BLOCKER_TYPE_RWSEM_WRITER: 157 + pr_err("INFO: task %s:%d <%s> blocked on an rw-semaphore likely owned by task %s:%d <%s>\n", 158 + task->comm, task->pid, rwsem_blocked_as, t->comm, 159 + t->pid, rwsem_blocked_by); 169 160 break; 170 161 } 171 162 sched_show_task(t);

+1 -1

kernel/kcov.c

··· 552 552 553 553 /* 554 554 * Fault in a lazily-faulted vmalloc area before it can be used by 555 - * __santizer_cov_trace_pc(), to avoid recursion issues if any code on the 555 + * __sanitizer_cov_trace_pc(), to avoid recursion issues if any code on the 556 556 * vmalloc fault handling path is instrumented. 557 557 */ 558 558 static void kcov_fault_in_area(struct kcov *kcov)

+1 -1

kernel/kexec.c

@@ -152,7 +152,7 @@
 		goto out;
 
 	for (i = 0; i < nr_segments; i++) {
-		ret = kimage_load_segment(image, &image->segment[i]);
+		ret = kimage_load_segment(image, i);
 		if (ret)
 			goto out;
 	}

+92 -8

kernel/kexec_core.c

··· 40 40 #include <linux/hugetlb.h> 41 41 #include <linux/objtool.h> 42 42 #include <linux/kmsg_dump.h> 43 + #include <linux/dma-map-ops.h> 43 44 44 45 #include <asm/page.h> 45 46 #include <asm/sections.h> ··· 554 553 kimage_free_pages(page); 555 554 } 556 555 556 + static void kimage_free_cma(struct kimage *image) 557 + { 558 + unsigned long i; 559 + 560 + for (i = 0; i < image->nr_segments; i++) { 561 + struct page *cma = image->segment_cma[i]; 562 + u32 nr_pages = image->segment[i].memsz >> PAGE_SHIFT; 563 + 564 + if (!cma) 565 + continue; 566 + 567 + arch_kexec_pre_free_pages(page_address(cma), nr_pages); 568 + dma_release_from_contiguous(NULL, cma, nr_pages); 569 + image->segment_cma[i] = NULL; 570 + } 571 + 572 + } 573 + 557 574 void kimage_free(struct kimage *image) 558 575 { 559 576 kimage_entry_t *ptr, entry; ··· 609 590 610 591 /* Free the kexec control pages... */ 611 592 kimage_free_page_list(&image->control_pages); 593 + 594 + /* Free CMA allocations */ 595 + kimage_free_cma(image); 612 596 613 597 /* 614 598 * Free up any temporary buffers allocated. 
This might hit if ··· 738 716 return page; 739 717 } 740 718 741 - static int kimage_load_normal_segment(struct kimage *image, 742 - struct kexec_segment *segment) 719 + static int kimage_load_cma_segment(struct kimage *image, int idx) 743 720 { 721 + struct kexec_segment *segment = &image->segment[idx]; 722 + struct page *cma = image->segment_cma[idx]; 723 + char *ptr = page_address(cma); 724 + unsigned long maddr; 725 + size_t ubytes, mbytes; 726 + int result = 0; 727 + unsigned char __user *buf = NULL; 728 + unsigned char *kbuf = NULL; 729 + 730 + if (image->file_mode) 731 + kbuf = segment->kbuf; 732 + else 733 + buf = segment->buf; 734 + ubytes = segment->bufsz; 735 + mbytes = segment->memsz; 736 + maddr = segment->mem; 737 + 738 + /* Then copy from source buffer to the CMA one */ 739 + while (mbytes) { 740 + size_t uchunk, mchunk; 741 + 742 + ptr += maddr & ~PAGE_MASK; 743 + mchunk = min_t(size_t, mbytes, 744 + PAGE_SIZE - (maddr & ~PAGE_MASK)); 745 + uchunk = min(ubytes, mchunk); 746 + 747 + if (uchunk) { 748 + /* For file based kexec, source pages are in kernel memory */ 749 + if (image->file_mode) 750 + memcpy(ptr, kbuf, uchunk); 751 + else 752 + result = copy_from_user(ptr, buf, uchunk); 753 + ubytes -= uchunk; 754 + if (image->file_mode) 755 + kbuf += uchunk; 756 + else 757 + buf += uchunk; 758 + } 759 + 760 + if (result) { 761 + result = -EFAULT; 762 + goto out; 763 + } 764 + 765 + ptr += mchunk; 766 + maddr += mchunk; 767 + mbytes -= mchunk; 768 + 769 + cond_resched(); 770 + } 771 + 772 + /* Clear any remainder */ 773 + memset(ptr, 0, mbytes); 774 + 775 + out: 776 + return result; 777 + } 778 + 779 + static int kimage_load_normal_segment(struct kimage *image, int idx) 780 + { 781 + struct kexec_segment *segment = &image->segment[idx]; 744 782 unsigned long maddr; 745 783 size_t ubytes, mbytes; 746 784 int result; ··· 814 732 ubytes = segment->bufsz; 815 733 mbytes = segment->memsz; 816 734 maddr = segment->mem; 735 + 736 + if (image->segment_cma[idx]) 
737 + return kimage_load_cma_segment(image, idx); 817 738 818 739 result = kimage_set_destination(image, maddr); 819 740 if (result < 0) ··· 872 787 } 873 788 874 789 #ifdef CONFIG_CRASH_DUMP 875 - static int kimage_load_crash_segment(struct kimage *image, 876 - struct kexec_segment *segment) 790 + static int kimage_load_crash_segment(struct kimage *image, int idx) 877 791 { 878 792 /* For crash dumps kernels we simply copy the data from 879 793 * user space to it's destination. 880 794 * We do things a page at a time for the sake of kmap. 881 795 */ 796 + struct kexec_segment *segment = &image->segment[idx]; 882 797 unsigned long maddr; 883 798 size_t ubytes, mbytes; 884 799 int result; ··· 943 858 } 944 859 #endif 945 860 946 - int kimage_load_segment(struct kimage *image, 947 - struct kexec_segment *segment) 861 + int kimage_load_segment(struct kimage *image, int idx) 948 862 { 949 863 int result = -ENOMEM; 950 864 951 865 switch (image->type) { 952 866 case KEXEC_TYPE_DEFAULT: 953 - result = kimage_load_normal_segment(image, segment); 867 + result = kimage_load_normal_segment(image, idx); 954 868 break; 955 869 #ifdef CONFIG_CRASH_DUMP 956 870 case KEXEC_TYPE_CRASH: 957 - result = kimage_load_crash_segment(image, segment); 871 + result = kimage_load_crash_segment(image, idx); 958 872 break; 959 873 #endif 960 874 }
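`kimage_load_cma_segment()` fills a CMA-backed segment directly. It copies the `bufsz` bytes of source data one chunk at a time, so that no chunk crosses a page boundary of the destination address. The rest of `memsz` is meant to end up zeroed.

The sketch below shows that chunking pattern in userspace. It makes these assumptions:

- a 4 KiB page (`SKETCH_PAGE_SIZE`);
- `memcpy()` in place of `copy_from_user()`;
- a hypothetical `load_segment_chunked()` that zero-fills each chunk's tail itself.

It is an illustration, not the kernel function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

/*
 * Copy @ubytes of @src into a destination region of @mbytes bytes whose
 * load address is @maddr, in chunks that never cross a page boundary of
 * that address. Bytes beyond @ubytes are zero-filled.
 */
static void load_segment_chunked(unsigned char *dst, const unsigned char *src,
				 size_t ubytes, size_t mbytes, unsigned long maddr)
{
	assert(ubytes <= mbytes);

	while (mbytes) {
		/* Bytes left in the current page of the destination address. */
		size_t mchunk = SKETCH_PAGE_SIZE - (maddr & (SKETCH_PAGE_SIZE - 1));
		size_t uchunk;

		if (mchunk > mbytes)
			mchunk = mbytes;
		uchunk = ubytes < mchunk ? ubytes : mchunk;

		memcpy(dst, src, uchunk);			/* source data */
		memset(dst + uchunk, 0, mchunk - uchunk);	/* zero the tail */

		src += uchunk;
		ubytes -= uchunk;
		dst += mchunk;
		maddr += mchunk;
		mbytes -= mchunk;
	}
}
```

Starting at a non-page-aligned `maddr` makes the first chunk shorter than a page. Every later chunk then starts on a page boundary.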

+50 -1

kernel/kexec_file.c

··· 26 26 #include <linux/kernel_read_file.h> 27 27 #include <linux/syscalls.h> 28 28 #include <linux/vmalloc.h> 29 + #include <linux/dma-map-ops.h> 29 30 #include "kexec_internal.h" 30 31 31 32 #ifdef CONFIG_KEXEC_SIG ··· 254 253 ret = 0; 255 254 } 256 255 256 + image->no_cma = !!(flags & KEXEC_FILE_NO_CMA); 257 + 257 258 if (cmdline_len) { 258 259 image->cmdline_buf = memdup_user(cmdline_ptr, cmdline_len); 259 260 if (IS_ERR(image->cmdline_buf)) { ··· 437 434 i, ksegment->buf, ksegment->bufsz, ksegment->mem, 438 435 ksegment->memsz); 439 436 440 - ret = kimage_load_segment(image, &image->segment[i]); 437 + ret = kimage_load_segment(image, i); 441 438 if (ret) 442 439 goto out; 443 440 } ··· 666 663 return walk_system_ram_res(0, ULONG_MAX, kbuf, func); 667 664 } 668 665 666 + static int kexec_alloc_contig(struct kexec_buf *kbuf) 667 + { 668 + size_t nr_pages = kbuf->memsz >> PAGE_SHIFT; 669 + unsigned long mem; 670 + struct page *p; 671 + 672 + /* User space disabled CMA allocations, bail out. */ 673 + if (kbuf->image->no_cma) 674 + return -EPERM; 675 + 676 + /* Skip CMA logic for crash kernel */ 677 + if (kbuf->image->type == KEXEC_TYPE_CRASH) 678 + return -EPERM; 679 + 680 + p = dma_alloc_from_contiguous(NULL, nr_pages, get_order(kbuf->buf_align), true); 681 + if (!p) 682 + return -ENOMEM; 683 + 684 + pr_debug("allocated %zu DMA pages at 0x%lx", nr_pages, page_to_boot_pfn(p)); 685 + 686 + mem = page_to_boot_pfn(p) << PAGE_SHIFT; 687 + 688 + if (kimage_is_destination_range(kbuf->image, mem, mem + kbuf->memsz)) { 689 + /* Our region is already in use by a statically defined one. Bail out. 
*/ 690 + pr_debug("CMA overlaps existing mem: 0x%lx+0x%lx\n", mem, kbuf->memsz); 691 + dma_release_from_contiguous(NULL, p, nr_pages); 692 + return -EBUSY; 693 + } 694 + 695 + kbuf->mem = page_to_boot_pfn(p) << PAGE_SHIFT; 696 + kbuf->cma = p; 697 + 698 + arch_kexec_post_alloc_pages(page_address(p), (int)nr_pages, 0); 699 + 700 + return 0; 701 + } 702 + 669 703 /** 670 704 * kexec_locate_mem_hole - find free memory for the purgatory or the next kernel 671 705 * @kbuf: Parameters for the memory search. ··· 726 686 ret = kho_locate_mem_hole(kbuf, locate_mem_hole_callback); 727 687 if (ret <= 0) 728 688 return ret; 689 + 690 + /* 691 + * Try to find a free physically contiguous block of memory first. With that, we 692 + * can avoid any copying at kexec time. 693 + */ 694 + if (!kexec_alloc_contig(kbuf)) 695 + return 0; 729 696 730 697 if (!IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK)) 731 698 ret = kexec_walk_resources(kbuf, locate_mem_hole_callback); ··· 779 732 /* Ensure minimum alignment needed for segments. */ 780 733 kbuf->memsz = ALIGN(kbuf->memsz, PAGE_SIZE); 781 734 kbuf->buf_align = max(kbuf->buf_align, PAGE_SIZE); 735 + kbuf->cma = NULL; 782 736 783 737 /* Walk the RAM ranges and allocate a suitable range for the buffer */ 784 738 ret = arch_kexec_locate_mem_hole(kbuf); ··· 792 744 ksegment->bufsz = kbuf->bufsz; 793 745 ksegment->mem = kbuf->mem; 794 746 ksegment->memsz = kbuf->memsz; 747 + kbuf->image->segment_cma[kbuf->image->nr_segments] = kbuf->cma; 795 748 kbuf->image->nr_segments++; 796 749 return 0; 797 750 }
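Before accepting a CMA block, `kexec_alloc_contig()` asks `kimage_is_destination_range()` whether the block collides with a segment that is already placed. If it does, the block is released and `-EBUSY` is returned.

The overlap test behind such a check can be sketched as follows. The `sketch_segment` struct and `range_overlaps_segments()` are hypothetical. The half-open `[start, end)` convention is a choice made for the sketch, not necessarily the kernel's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the load address and size of a kexec segment. */
struct sketch_segment {
	unsigned long mem;
	unsigned long memsz;
};

/* True if [start, end) intersects any segment's [mem, mem + memsz). */
static bool range_overlaps_segments(const struct sketch_segment *segs, size_t nr,
				    unsigned long start, unsigned long end)
{
	for (size_t i = 0; i < nr; i++) {
		unsigned long mstart = segs[i].mem;
		unsigned long mend = mstart + segs[i].memsz;

		/* Two half-open ranges overlap iff each starts before the other ends. */
		if (start < mend && mstart < end)
			return true;
	}
	return false;
}
```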

+1 -1

kernel/kexec_internal.h

@@ -10,7 +10,7 @@
 int sanity_check_segment_list(struct kimage *image);
 void kimage_free_page_list(struct list_head *list);
 void kimage_free(struct kimage *image);
-int kimage_load_segment(struct kimage *image, struct kexec_segment *segment);
+int kimage_load_segment(struct kimage *image, int idx);
 void kimage_terminate(struct kimage *image);
 int kimage_is_destination_range(struct kimage *image,
 				unsigned long start, unsigned long end);

+5 -6

kernel/kthread.c

@@ -88,13 +88,12 @@
 /*
  * Variant of to_kthread() that doesn't assume @p is a kthread.
  *
- * Per construction; when:
+ * When "(p->flags & PF_KTHREAD)" is set the task is a kthread and will
+ * always remain a kthread. For kthreads p->worker_private always
+ * points to a struct kthread. For tasks that are not kthreads
+ * p->worker_private is used to point to other things.
  *
- * (p->flags & PF_KTHREAD) && p->worker_private
- *
- * the task is both a kthread and struct kthread is persistent. However
- * PF_KTHREAD on it's own is not, kernel_thread() can exec() (See umh.c and
- * begin_new_exec()).
+ * Return NULL for any task that is not a kthread.
  */
 static inline struct kthread *__to_kthread(struct task_struct *p)
 {

+23 -8

kernel/locking/rwsem.c

··· 27 27 #include <linux/export.h> 28 28 #include <linux/rwsem.h> 29 29 #include <linux/atomic.h> 30 + #include <linux/hung_task.h> 30 31 #include <trace/events/lock.h> 31 32 32 33 #ifndef CONFIG_PREEMPT_RT ··· 182 181 __rwsem_set_reader_owned(sem, current); 183 182 } 184 183 185 - #ifdef CONFIG_DEBUG_RWSEMS 184 + #if defined(CONFIG_DEBUG_RWSEMS) || defined(CONFIG_DETECT_HUNG_TASK_BLOCKER) 186 185 /* 187 186 * Return just the real task structure pointer of the owner 188 187 */ 189 - static inline struct task_struct *rwsem_owner(struct rw_semaphore *sem) 188 + struct task_struct *rwsem_owner(struct rw_semaphore *sem) 190 189 { 191 190 return (struct task_struct *) 192 191 (atomic_long_read(&sem->owner) & ~RWSEM_OWNER_FLAGS_MASK); ··· 195 194 /* 196 195 * Return true if the rwsem is owned by a reader. 197 196 */ 198 - static inline bool is_rwsem_reader_owned(struct rw_semaphore *sem) 197 + bool is_rwsem_reader_owned(struct rw_semaphore *sem) 199 198 { 200 199 /* 201 200 * Check the count to see if it is write-locked. ··· 208 207 } 209 208 210 209 /* 211 - * With CONFIG_DEBUG_RWSEMS configured, it will make sure that if there 212 - * is a task pointer in owner of a reader-owned rwsem, it will be the 213 - * real owner or one of the real owners. The only exception is when the 214 - * unlock is done by up_read_non_owner(). 210 + * With CONFIG_DEBUG_RWSEMS or CONFIG_DETECT_HUNG_TASK_BLOCKER configured, 211 + * it will make sure that the owner field of a reader-owned rwsem either 212 + * points to a real reader-owner(s) or gets cleared. The only exception is 213 + * when the unlock is done by up_read_non_owner(). 
215 214 */ 216 215 static inline void rwsem_clear_reader_owned(struct rw_semaphore *sem) 217 216 { ··· 1064 1063 wake_up_q(&wake_q); 1065 1064 1066 1065 trace_contention_begin(sem, LCB_F_READ); 1066 + set_current_state(state); 1067 + 1068 + if (state == TASK_UNINTERRUPTIBLE) 1069 + hung_task_set_blocker(sem, BLOCKER_TYPE_RWSEM_READER); 1067 1070 1068 1071 /* wait to be given the lock */ 1069 1072 for (;;) { 1070 - set_current_state(state); 1071 1073 if (!smp_load_acquire(&waiter.task)) { 1072 1074 /* Matches rwsem_mark_wake()'s smp_store_release(). */ 1073 1075 break; ··· 1085 1081 } 1086 1082 schedule_preempt_disabled(); 1087 1083 lockevent_inc(rwsem_sleep_reader); 1084 + set_current_state(state); 1088 1085 } 1086 + 1087 + if (state == TASK_UNINTERRUPTIBLE) 1088 + hung_task_clear_blocker(); 1089 1089 1090 1090 __set_current_state(TASK_RUNNING); 1091 1091 lockevent_inc(rwsem_rlock); ··· 1152 1144 set_current_state(state); 1153 1145 trace_contention_begin(sem, LCB_F_WRITE); 1154 1146 1147 + if (state == TASK_UNINTERRUPTIBLE) 1148 + hung_task_set_blocker(sem, BLOCKER_TYPE_RWSEM_WRITER); 1149 + 1155 1150 for (;;) { 1156 1151 if (rwsem_try_write_lock(sem, &waiter)) { 1157 1152 /* rwsem_try_write_lock() implies ACQUIRE on success */ ··· 1188 1177 trylock_again: 1189 1178 raw_spin_lock_irq(&sem->wait_lock); 1190 1179 } 1180 + 1181 + if (state == TASK_UNINTERRUPTIBLE) 1182 + hung_task_clear_blocker(); 1183 + 1191 1184 __set_current_state(TASK_RUNNING); 1192 1185 raw_spin_unlock_irq(&sem->wait_lock); 1193 1186 lockevent_inc(rwsem_wlock);

+32 -39

kernel/panic.c

··· 36 36 #include <linux/sysfs.h> 37 37 #include <linux/context_tracking.h> 38 38 #include <linux/seq_buf.h> 39 + #include <linux/sys_info.h> 39 40 #include <trace/events/error_report.h> 40 41 #include <asm/sections.h> 41 42 ··· 64 63 unsigned long panic_on_taint; 65 64 bool panic_on_taint_nousertaint = false; 66 65 static unsigned int warn_limit __read_mostly; 66 + static bool panic_console_replay; 67 67 68 68 bool panic_triggering_all_cpu_backtrace; 69 69 70 70 int panic_timeout = CONFIG_PANIC_TIMEOUT; 71 71 EXPORT_SYMBOL_GPL(panic_timeout); 72 72 73 - #define PANIC_PRINT_TASK_INFO 0x00000001 74 - #define PANIC_PRINT_MEM_INFO 0x00000002 75 - #define PANIC_PRINT_TIMER_INFO 0x00000004 76 - #define PANIC_PRINT_LOCK_INFO 0x00000008 77 - #define PANIC_PRINT_FTRACE_INFO 0x00000010 78 - #define PANIC_PRINT_ALL_PRINTK_MSG 0x00000020 79 - #define PANIC_PRINT_ALL_CPU_BT 0x00000040 80 - #define PANIC_PRINT_BLOCKED_TASKS 0x00000080 81 73 unsigned long panic_print; 82 74 83 75 ATOMIC_NOTIFIER_HEAD(panic_notifier_list); ··· 122 128 return err; 123 129 } 124 130 131 + static int sysctl_panic_print_handler(const struct ctl_table *table, int write, 132 + void *buffer, size_t *lenp, loff_t *ppos) 133 + { 134 + pr_info_once("Kernel: 'panic_print' sysctl interface will be obsoleted by both 'panic_sys_info' and 'panic_console_replay'\n"); 135 + return proc_doulongvec_minmax(table, write, buffer, lenp, ppos); 136 + } 137 + 125 138 static const struct ctl_table kern_panic_table[] = { 126 139 #ifdef CONFIG_SMP 127 140 { ··· 166 165 .data = &panic_print, 167 166 .maxlen = sizeof(unsigned long), 168 167 .mode = 0644, 169 - .proc_handler = proc_doulongvec_minmax, 168 + .proc_handler = sysctl_panic_print_handler, 170 169 }, 171 170 { 172 171 .procname = "panic_on_warn", ··· 194 193 .proc_handler = proc_dointvec, 195 194 }, 196 195 #endif 196 + { 197 + .procname = "panic_sys_info", 198 + .data = &panic_print, 199 + .maxlen = sizeof(panic_print), 200 + .mode = 0644, 201 + .proc_handler = 
sysctl_sys_info_handler, 202 + }, 197 203 }; 198 204 199 205 static __init int kernel_panic_sysctls_init(void) ··· 210 202 } 211 203 late_initcall(kernel_panic_sysctls_init); 212 204 #endif 205 + 206 + /* The format is "panic_sys_info=tasks,mem,locks,ftrace,..." */ 207 + static int __init setup_panic_sys_info(char *buf) 208 + { 209 + /* There is no risk of race in kernel boot phase */ 210 + panic_print = sys_info_parse_param(buf); 211 + return 1; 212 + } 213 + __setup("panic_sys_info=", setup_panic_sys_info); 213 214 214 215 static atomic_t warn_count = ATOMIC_INIT(0); 215 216 ··· 315 298 } 316 299 EXPORT_SYMBOL(nmi_panic); 317 300 318 - static void panic_print_sys_info(bool console_flush) 319 - { 320 - if (console_flush) { 321 - if (panic_print & PANIC_PRINT_ALL_PRINTK_MSG) 322 - console_flush_on_panic(CONSOLE_REPLAY_ALL); 323 - return; 324 - } 325 - 326 - if (panic_print & PANIC_PRINT_TASK_INFO) 327 - show_state(); 328 - 329 - if (panic_print & PANIC_PRINT_MEM_INFO) 330 - show_mem(); 331 - 332 - if (panic_print & PANIC_PRINT_TIMER_INFO) 333 - sysrq_timer_list_show(); 334 - 335 - if (panic_print & PANIC_PRINT_LOCK_INFO) 336 - debug_show_all_locks(); 337 - 338 - if (panic_print & PANIC_PRINT_FTRACE_INFO) 339 - ftrace_dump(DUMP_ALL); 340 - 341 - if (panic_print & PANIC_PRINT_BLOCKED_TASKS) 342 - show_state_filter(TASK_UNINTERRUPTIBLE); 343 - } 344 - 345 301 void check_panic_on_warn(const char *origin) 346 302 { 347 303 unsigned int limit; ··· 335 345 */ 336 346 static void panic_other_cpus_shutdown(bool crash_kexec) 337 347 { 338 - if (panic_print & PANIC_PRINT_ALL_CPU_BT) { 348 + if (panic_print & SYS_INFO_ALL_CPU_BT) { 339 349 /* Temporary allow non-panic CPUs to write their backtraces. 
*/ 340 350 panic_triggering_all_cpu_backtrace = true; 341 351 trigger_all_cpu_backtrace(); ··· 458 468 */ 459 469 atomic_notifier_call_chain(&panic_notifier_list, 0, buf); 460 470 461 - panic_print_sys_info(false); 471 + sys_info(panic_print); 462 472 463 473 kmsg_dump_desc(KMSG_DUMP_PANIC, buf); 464 474 ··· 487 497 debug_locks_off(); 488 498 console_flush_on_panic(CONSOLE_FLUSH_PENDING); 489 499 490 - panic_print_sys_info(true); 500 + if ((panic_print & SYS_INFO_PANIC_CONSOLE_REPLAY) || 501 + panic_console_replay) 502 + console_flush_on_panic(CONSOLE_REPLAY_ALL); 491 503 492 504 if (!panic_blink) 493 505 panic_blink = no_blink; ··· 941 949 core_param(pause_on_oops, pause_on_oops, int, 0644); 942 950 core_param(panic_on_warn, panic_on_warn, int, 0644); 943 951 core_param(crash_kexec_post_notifiers, crash_kexec_post_notifiers, bool, 0644); 952 + core_param(panic_console_replay, panic_console_replay, bool, 0644); 944 953 945 954 static int __init oops_setup(char *s) 946 955 {
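With this change, `panic_print` is filled from `panic_sys_info=`, a comma-separated list of names, by `sys_info_parse_param()`. A rough sketch of that kind of name-to-bitmask parsing follows.

The table is hypothetical. It uses the four names given in the format comment. As an assumption, it reuses the bit values of the removed `PANIC_PRINT_*` macros, since the old `panic_print` sysctl and the new `panic_sys_info` sysctl both point at the same variable. Unknown names are skipped here; the kernel helper's behavior is not shown in this diff.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical name-to-bit table. Bit values reuse the PANIC_PRINT_*
 * constants removed from kernel/panic.c (assumption: SYS_INFO_* matches).
 */
static const struct {
	const char *name;
	unsigned long bit;
} sketch_sys_info_names[] = {
	{ "tasks",  0x00000001UL },	/* was PANIC_PRINT_TASK_INFO */
	{ "mem",    0x00000002UL },	/* was PANIC_PRINT_MEM_INFO */
	{ "locks",  0x00000008UL },	/* was PANIC_PRINT_LOCK_INFO */
	{ "ftrace", 0x00000010UL },	/* was PANIC_PRINT_FTRACE_INFO */
};

/* Turn "tasks,mem,..." into a bitmask; unknown or empty items are ignored. */
static unsigned long sketch_parse_sys_info(const char *s)
{
	unsigned long mask = 0;
	size_t nr = sizeof(sketch_sys_info_names) / sizeof(sketch_sys_info_names[0]);

	while (*s) {
		const char *comma = strchr(s, ',');
		size_t len = comma ? (size_t)(comma - s) : strlen(s);

		/* Exact-length match so that "mem" does not match "memory". */
		for (size_t i = 0; i < nr; i++) {
			const char *name = sketch_sys_info_names[i].name;

			if (strlen(name) == len && strncmp(s, name, len) == 0)
				mask |= sketch_sys_info_names[i].bit;
		}
		s += len;
		if (*s == ',')
			s++;
	}
	return mask;
}
```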

+54 -15

kernel/relay.c

··· 118 118 return NULL; 119 119 120 120 for (i = 0; i < n_pages; i++) { 121 - buf->page_array[i] = alloc_page(GFP_KERNEL); 121 + buf->page_array[i] = alloc_page(GFP_KERNEL | __GFP_ZERO); 122 122 if (unlikely(!buf->page_array[i])) 123 123 goto depopulate; 124 124 set_page_private(buf->page_array[i], (unsigned long)buf); ··· 127 127 if (!mem) 128 128 goto depopulate; 129 129 130 - memset(mem, 0, *size); 131 130 buf->page_count = n_pages; 132 131 return mem; 133 132 ··· 249 250 */ 250 251 251 252 static int relay_subbuf_start(struct rchan_buf *buf, void *subbuf, 252 - void *prev_subbuf, size_t prev_padding) 253 + void *prev_subbuf) 253 254 { 255 + int full = relay_buf_full(buf); 256 + 257 + if (full) 258 + buf->stats.full_count++; 259 + 254 260 if (!buf->chan->cb->subbuf_start) 255 - return !relay_buf_full(buf); 261 + return !full; 256 262 257 263 return buf->chan->cb->subbuf_start(buf, subbuf, 258 - prev_subbuf, prev_padding); 264 + prev_subbuf); 259 265 } 260 266 261 267 /** ··· 302 298 buf->finalized = 0; 303 299 buf->data = buf->start; 304 300 buf->offset = 0; 301 + buf->stats.full_count = 0; 302 + buf->stats.big_count = 0; 305 303 306 304 for (i = 0; i < buf->chan->n_subbufs; i++) 307 305 buf->padding[i] = 0; 308 306 309 - relay_subbuf_start(buf, buf->data, NULL, 0); 307 + relay_subbuf_start(buf, buf->data, NULL); 310 308 } 311 309 312 310 /** ··· 561 555 goto toobig; 562 556 563 557 if (buf->offset != buf->chan->subbuf_size + 1) { 564 - buf->prev_padding = buf->chan->subbuf_size - buf->offset; 558 + size_t prev_padding; 559 + 560 + prev_padding = buf->chan->subbuf_size - buf->offset; 565 561 old_subbuf = buf->subbufs_produced % buf->chan->n_subbufs; 566 - buf->padding[old_subbuf] = buf->prev_padding; 562 + buf->padding[old_subbuf] = prev_padding; 567 563 buf->subbufs_produced++; 568 564 if (buf->dentry) 569 565 d_inode(buf->dentry)->i_size += ··· 590 582 new_subbuf = buf->subbufs_produced % buf->chan->n_subbufs; 591 583 new = buf->start + new_subbuf * 
buf->chan->subbuf_size; 592 584 buf->offset = 0; 593 - if (!relay_subbuf_start(buf, new, old, buf->prev_padding)) { 585 + if (!relay_subbuf_start(buf, new, old)) { 594 586 buf->offset = buf->chan->subbuf_size + 1; 595 587 return 0; 596 588 } ··· 603 595 return length; 604 596 605 597 toobig: 606 - buf->chan->last_toobig = length; 598 + buf->stats.big_count++; 607 599 return 0; 608 600 } 609 601 EXPORT_SYMBOL_GPL(relay_switch_subbuf); ··· 663 655 if ((buf = *per_cpu_ptr(chan->buf, i))) 664 656 relay_close_buf(buf); 665 657 666 - if (chan->last_toobig) 667 - printk(KERN_WARNING "relay: one or more items not logged " 668 - "[item size (%zd) > sub-buffer size (%zd)]\n", 669 - chan->last_toobig, chan->subbuf_size); 670 - 671 658 list_del(&chan->list); 672 659 kref_put(&chan->kref, relay_destroy_channel); 673 660 mutex_unlock(&relay_channels_mutex); ··· 695 692 mutex_unlock(&relay_channels_mutex); 696 693 } 697 694 EXPORT_SYMBOL_GPL(relay_flush); 695 + 696 + /** 697 + * relay_stats - get channel buffer statistics 698 + * @chan: the channel 699 + * @flags: select particular information to get 700 + * 701 + * Returns the count of certain field that caller specifies. 
702 + */ 703 + size_t relay_stats(struct rchan *chan, int flags) 704 + { 705 + unsigned int i, count = 0; 706 + struct rchan_buf *rbuf; 707 + 708 + if (!chan || flags > RELAY_STATS_LAST) 709 + return 0; 710 + 711 + if (chan->is_global) { 712 + rbuf = *per_cpu_ptr(chan->buf, 0); 713 + if (flags & RELAY_STATS_BUF_FULL) 714 + count = rbuf->stats.full_count; 715 + else if (flags & RELAY_STATS_WRT_BIG) 716 + count = rbuf->stats.big_count; 717 + } else { 718 + for_each_online_cpu(i) { 719 + rbuf = *per_cpu_ptr(chan->buf, i); 720 + if (rbuf) { 721 + if (flags & RELAY_STATS_BUF_FULL) 722 + count += rbuf->stats.full_count; 723 + else if (flags & RELAY_STATS_WRT_BIG) 724 + count += rbuf->stats.big_count; 725 + } 726 + } 727 + } 728 + 729 + return count; 730 + } 698 731 699 732 /** 700 733 * relay_file_open - open file op for relay files
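`relay_stats()` replaces per-client bookkeeping such as blktrace's old `blk_subbuf_start_callback()`. After this change:

- `relay_subbuf_start()` increments `stats.full_count` itself.
- `relay_switch_subbuf()` increments `stats.big_count` for oversized writes.
- blktrace's `dropped` file just asks for the `RELAY_STATS_BUF_FULL` total.

The aggregation step is sketched below with hypothetical types and flag values (`sketch_buf_stats`, `SKETCH_STATS_*`). A `NULL` entry stands in for a CPU without a buffer. As in the diff, a global channel reads only the first buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values; the real RELAY_STATS_* constants are not shown here. */
#define SKETCH_STATS_BUF_FULL	(1 << 0)
#define SKETCH_STATS_WRT_BIG	(1 << 1)

struct sketch_buf_stats {
	unsigned int full_count;	/* sub-buffer switches refused: buffer full */
	unsigned int big_count;		/* writes larger than a sub-buffer */
};

/* Sum the selected counter over per-CPU buffers (only bufs[0] if global). */
static size_t sketch_relay_stats(struct sketch_buf_stats *const *bufs,
				 size_t nr_cpus, int is_global, int flags)
{
	size_t count = 0;
	size_t n = is_global ? 1 : nr_cpus;

	for (size_t i = 0; i < n; i++) {
		if (!bufs[i])
			continue;
		if (flags & SKETCH_STATS_BUF_FULL)
			count += bufs[i]->full_count;
		else if (flags & SKETCH_STATS_WRT_BIG)
			count += bufs[i]->big_count;
	}
	return count;
}
```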

+2 -20

kernel/trace/blktrace.c

··· 415 415 size_t count, loff_t *ppos) 416 416 { 417 417 struct blk_trace *bt = filp->private_data; 418 + size_t dropped = relay_stats(bt->rchan, RELAY_STATS_BUF_FULL); 418 419 char buf[16]; 419 420 420 - snprintf(buf, sizeof(buf), "%u\n", atomic_read(&bt->dropped)); 421 + snprintf(buf, sizeof(buf), "%zu\n", dropped); 421 422 422 423 return simple_read_from_buffer(buffer, count, ppos, buf, strlen(buf)); 423 424 } ··· 457 456 .llseek = noop_llseek, 458 457 }; 459 458 460 - /* 461 - * Keep track of how many times we encountered a full subbuffer, to aid 462 - * the user space app in telling how many lost events there were. 463 - */ 464 - static int blk_subbuf_start_callback(struct rchan_buf *buf, void *subbuf, 465 - void *prev_subbuf, size_t prev_padding) 466 - { 467 - struct blk_trace *bt; 468 - 469 - if (!relay_buf_full(buf)) 470 - return 1; 471 - 472 - bt = buf->chan->private_data; 473 - atomic_inc(&bt->dropped); 474 - return 0; 475 - } 476 - 477 459 static int blk_remove_buf_file_callback(struct dentry *dentry) 478 460 { 479 461 debugfs_remove(dentry); ··· 475 491 } 476 492 477 493 static const struct rchan_callbacks blk_relay_callbacks = { 478 - .subbuf_start = blk_subbuf_start_callback, 479 494 .create_buf_file = blk_create_buf_file_callback, 480 495 .remove_buf_file = blk_remove_buf_file_callback, 481 496 }; ··· 563 580 } 564 581 565 582 bt->dev = dev; 566 - atomic_set(&bt->dropped, 0); 567 583 INIT_LIST_HEAD(&bt->running_list); 568 584 569 585 ret = -EIO;

+7 -9

kernel/ucount.c

@@ -199,18 +199,16 @@
 	}
 }
 
-static inline bool atomic_long_inc_below(atomic_long_t *v, int u)
+static inline bool atomic_long_inc_below(atomic_long_t *v, long u)
 {
-	long c, old;
-	c = atomic_long_read(v);
-	for (;;) {
+	long c = atomic_long_read(v);
+
+	do {
 		if (unlikely(c >= u))
 			return false;
-		old = atomic_long_cmpxchg(v, c, c+1);
-		if (likely(old == c))
-			return true;
-		c = old;
-	}
+	} while (!atomic_long_try_cmpxchg(v, &c, c+1));
+
+	return true;
 }
 
 struct ucounts *inc_ucount(struct user_namespace *ns, kuid_t uid,
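The rewritten `atomic_long_inc_below()` relies on `atomic_long_try_cmpxchg()` writing the current value back into `c` when the exchange fails. That is why the separate `old` variable is no longer needed. C11 `atomic_compare_exchange_weak()` has the same update-on-failure contract, so the loop can be sketched in portable userspace C as follows. `inc_below()` is a hypothetical name, and the weak variant may fail spuriously, which the loop tolerates.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Increment *v only while it is below u. On failure,
 * atomic_compare_exchange_weak() stores the current value into c,
 * so the loop re-checks the limit without a separate reload.
 */
static bool inc_below(atomic_long *v, long u)
{
	long c = atomic_load(v);

	do {
		if (c >= u)
			return false;
	} while (!atomic_compare_exchange_weak(v, &c, c + 1));

	return true;
}
```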

+20

lib/Kconfig.debug

@@ -3214,6 +3214,26 @@
 
 	  If unsure, say N.
 
+config TEST_KEXEC_HANDOVER
+	bool "Test for Kexec HandOver"
+	default n
+	depends on KEXEC_HANDOVER
+	help
+	  This option enables test for Kexec HandOver (KHO).
+	  The test consists of two parts: saving kernel data before kexec and
+	  restoring the data after kexec and verifying that it was properly
+	  handed over. This test module creates and saves data on the boot of
+	  the first kernel and restores and verifies the data on the boot of
+	  kexec'ed kernel.
+
+	  For detailed documentation about KHO, see Documentation/core-api/kho.
+
+	  To run the test run:
+
+	  tools/testing/selftests/kho/vmtest.sh -h
+
+	  If unsure, say N.
+
 config RATELIMIT_KUNIT_TEST
 	tristate "KUnit Test for correctness and stress of ratelimit" if !KUNIT_ALL_TESTS
 	depends on KUNIT

+2 -1

lib/Makefile

@@ -40,7 +40,7 @@
 	 is_single_threaded.o plist.o decompress.o kobject_uevent.o \
 	 earlycpio.o seq_buf.o siphash.o dec_and_lock.o \
 	 nmi_backtrace.o win_minmax.o memcat_p.o \
-	 buildid.o objpool.o iomem_copy.o
+	 buildid.o objpool.o iomem_copy.o sys_info.o
 
 lib-$(CONFIG_UNION_FIND) += union_find.o
 lib-$(CONFIG_PRINTK) += dump_stack.o
@@ -102,6 +102,7 @@
 obj-$(CONFIG_TEST_FREE_PAGES) += test_free_pages.o
 obj-$(CONFIG_TEST_REF_TRACKER) += test_ref_tracker.o
 obj-$(CONFIG_TEST_OBJPOOL) += test_objpool.o
+obj-$(CONFIG_TEST_KEXEC_HANDOVER) += test_kho.o
 
 obj-$(CONFIG_TEST_FPU) += test_fpu.o
 test_fpu-y := test_fpu_glue.o test_fpu_impl.o

+7 -6

lib/math/div64.c

@@ -212,12 +212,13 @@
 
 #endif
 
-	/* make sure c is not zero, trigger exception otherwise */
-#pragma GCC diagnostic push
-#pragma GCC diagnostic ignored "-Wdiv-by-zero"
-	if (unlikely(c == 0))
-		return 1/0;
-#pragma GCC diagnostic pop
+	/* make sure c is not zero, trigger runtime exception otherwise */
+	if (unlikely(c == 0)) {
+		unsigned long zero = 0;
+
+		OPTIMIZER_HIDE_VAR(zero);
+		return ~0UL/zero;
+	}
 
 	int shift = __builtin_ctzll(c);
 

+15 -12

lib/math/gcd.c

@@ -11,21 +11,15 @@
  * has decent hardware division.
  */
 
+DEFINE_STATIC_KEY_TRUE(efficient_ffs_key);
+
 #if !defined(CONFIG_CPU_NO_EFFICIENT_FFS)
 
 /* If __ffs is available, the even/odd algorithm benchmarks slower. */
 
-/**
- * gcd - calculate and return the greatest common divisor of 2 unsigned longs
- * @a: first value
- * @b: second value
- */
-unsigned long gcd(unsigned long a, unsigned long b)
+static unsigned long binary_gcd(unsigned long a, unsigned long b)
 {
 	unsigned long r = a | b;
-
-	if (!a || !b)
-		return r;
 
 	b >>= __ffs(b);
 	if (b == 1)
@@ -38,15 +44,26 @@
 	}
 }
 
-#else
+#endif
 
 /* If normalization is done by loops, the even/odd algorithm is a win. */
+
+/**
+ * gcd - calculate and return the greatest common divisor of 2 unsigned longs
+ * @a: first value
+ * @b: second value
+ */
 unsigned long gcd(unsigned long a, unsigned long b)
 {
 	unsigned long r = a | b;
 
 	if (!a || !b)
 		return r;
+
+#if !defined(CONFIG_CPU_NO_EFFICIENT_FFS)
+	if (static_branch_likely(&efficient_ffs_key))
+		return binary_gcd(a, b);
+#endif
 
 	/* Isolate lsbit of r */
 	r &= -r;
@@ -84,7 +79,5 @@
 		a >>= 1;
 	}
 }
-
-#endif
 
 EXPORT_SYMBOL_GPL(gcd);
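After this change there is a single `gcd()` that first handles the zero cases. It then dispatches to the `__ffs()`-based `binary_gcd()` when `CONFIG_CPU_NO_EFFICIENT_FFS` is not set and the `efficient_ffs_key` static key is enabled. Otherwise it falls through to the even/odd loop.

The sketch below mirrors that dispatch in userspace, with these substitutions:

- a plain `bool` stands in for the static key;
- `__builtin_ctzl()` (GCC/Clang) stands in for `__ffs()`;
- the fallback is ordinary Euclid rather than the kernel's even/odd loop.

All `sketch_*` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the efficient_ffs_key static key: a plain runtime flag. */
static bool sketch_efficient_ffs = true;

/* Binary GCD for nonzero a, b, stripping trailing zeros with ctz. */
static unsigned long sketch_binary_gcd(unsigned long a, unsigned long b)
{
	int shift = __builtin_ctzl(a | b);	/* common power of two */

	a >>= __builtin_ctzl(a);		/* a is now odd */
	do {
		b >>= __builtin_ctzl(b);	/* b is now odd */
		if (a > b) {
			unsigned long t = a;
			a = b;
			b = t;
		}
		b -= a;				/* even (or zero) */
	} while (b);

	return a << shift;
}

/* Simple Euclidean fallback (not the kernel's even/odd loop). */
static unsigned long sketch_euclid_gcd(unsigned long a, unsigned long b)
{
	while (b) {
		unsigned long t = a % b;
		a = b;
		b = t;
	}
	return a;
}

static unsigned long sketch_gcd(unsigned long a, unsigned long b)
{
	if (!a || !b)
		return a | b;	/* gcd(x, 0) == x */
	return sketch_efficient_ffs ? sketch_binary_gcd(a, b)
				    : sketch_euclid_gcd(a, b);
}
```

Keeping the zero check in the common entry point, as the patch does, lets the binary path assume both inputs are nonzero. `__builtin_ctzl(0)` is undefined.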

-3

lib/raid6/algos.c

@@ -18,9 +18,6 @@
 #else
 #include <linux/module.h>
 #include <linux/gfp.h>
-/* In .bss so it's zeroed */
-const char raid6_empty_zero_page[PAGE_SIZE] __attribute__((aligned(256)));
-EXPORT_SYMBOL(raid6_empty_zero_page);
 #endif
 
 struct raid6_calls raid6_call;

+3 -3

lib/raid6/recov.c

@@ -31,10 +31,10 @@
 	   Use the dead data pages as temporary storage for
 	   delta p and delta q */
 	dp = (u8 *)ptrs[faila];
-	ptrs[faila] = (void *)raid6_empty_zero_page;
+	ptrs[faila] = raid6_get_zero_page();
 	ptrs[disks-2] = dp;
 	dq = (u8 *)ptrs[failb];
-	ptrs[failb] = (void *)raid6_empty_zero_page;
+	ptrs[failb] = raid6_get_zero_page();
 	ptrs[disks-1] = dq;
 
 	raid6_call.gen_syndrome(disks, bytes, ptrs);
@@ -72,7 +72,7 @@
 	/* Compute syndrome with zero for the missing data page
 	   Use the dead data page as temporary storage for delta q */
 	dq = (u8 *)ptrs[faila];
-	ptrs[faila] = (void *)raid6_empty_zero_page;
+	ptrs[faila] = raid6_get_zero_page();
 	ptrs[disks-1] = dq;
 
 	raid6_call.gen_syndrome(disks, bytes, ptrs);
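Each recovery routine temporarily points the failed data slots at a shared zero page, now obtained from `raid6_get_zero_page()` instead of the removed `raid6_empty_zero_page` array. It then regenerates the syndrome, using the dead pages to hold the resulting delta P and delta Q.

For the P (plain XOR) half, the sketch below shows why the substitution works. Parity recomputed with the failed block zeroed, XORed with the stored P, gives back the missing block. The Q half, which needs GF(2^8) arithmetic, is omitted. The `sketch_*` names and the 8-byte block size are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_BYTES 8	/* tiny "page" for illustration */

/* P parity: byte-wise XOR of all data blocks. */
static void sketch_gen_p(size_t ndata, const unsigned char *const *data,
			 unsigned char *p)
{
	memset(p, 0, SKETCH_BYTES);
	for (size_t d = 0; d < ndata; d++)
		for (size_t i = 0; i < SKETCH_BYTES; i++)
			p[i] ^= data[d][i];
}

/*
 * Recover data block @failed from the stored parity @p: substitute a
 * zero block, regenerate parity into @out, then XOR with the stored P.
 */
static void sketch_recover_from_p(size_t ndata, const unsigned char **data,
				  size_t failed, const unsigned char *p,
				  unsigned char *out)
{
	static const unsigned char zero[SKETCH_BYTES];
	const unsigned char *saved = data[failed];

	data[failed] = zero;		/* like ptrs[faila] = zero page */
	sketch_gen_p(ndata, data, out);	/* out = P with the block zeroed */
	data[failed] = saved;		/* restore the caller's pointer */

	for (size_t i = 0; i < SKETCH_BYTES; i++)
		out[i] ^= p[i];		/* delta P == the missing block */
}
```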

+3 -3

lib/raid6/recov_avx2.c

··· 28 28 Use the dead data pages as temporary storage for 29 29 delta p and delta q */ 30 30 dp = (u8 *)ptrs[faila]; 31 - ptrs[faila] = (void *)raid6_empty_zero_page; 31 + ptrs[faila] = raid6_get_zero_page(); 32 32 ptrs[disks-2] = dp; 33 33 dq = (u8 *)ptrs[failb]; 34 - ptrs[failb] = (void *)raid6_empty_zero_page; 34 + ptrs[failb] = raid6_get_zero_page(); 35 35 ptrs[disks-1] = dq; 36 36 37 37 raid6_call.gen_syndrome(disks, bytes, ptrs); ··· 196 196 /* Compute syndrome with zero for the missing data page 197 197 Use the dead data page as temporary storage for delta q */ 198 198 dq = (u8 *)ptrs[faila]; 199 - ptrs[faila] = (void *)raid6_empty_zero_page; 199 + ptrs[faila] = raid6_get_zero_page(); 200 200 ptrs[disks-1] = dq; 201 201 202 202 raid6_call.gen_syndrome(disks, bytes, ptrs);

+3 -3

lib/raid6/recov_avx512.c

··· 37 37 */ 38 38 39 39 dp = (u8 *)ptrs[faila]; 40 - ptrs[faila] = (void *)raid6_empty_zero_page; 40 + ptrs[faila] = raid6_get_zero_page(); 41 41 ptrs[disks-2] = dp; 42 42 dq = (u8 *)ptrs[failb]; 43 - ptrs[failb] = (void *)raid6_empty_zero_page; 43 + ptrs[failb] = raid6_get_zero_page(); 44 44 ptrs[disks-1] = dq; 45 45 46 46 raid6_call.gen_syndrome(disks, bytes, ptrs); ··· 238 238 */ 239 239 240 240 dq = (u8 *)ptrs[faila]; 241 - ptrs[faila] = (void *)raid6_empty_zero_page; 241 + ptrs[faila] = raid6_get_zero_page(); 242 242 ptrs[disks-1] = dq; 243 243 244 244 raid6_call.gen_syndrome(disks, bytes, ptrs);

+6 -6

lib/raid6/recov_loongarch_simd.c

··· 42 42 * delta p and delta q 43 43 */ 44 44 dp = (u8 *)ptrs[faila]; 45 - ptrs[faila] = (void *)raid6_empty_zero_page; 45 + ptrs[faila] = raid6_get_zero_page(); 46 46 ptrs[disks - 2] = dp; 47 47 dq = (u8 *)ptrs[failb]; 48 - ptrs[failb] = (void *)raid6_empty_zero_page; 48 + ptrs[failb] = raid6_get_zero_page(); 49 49 ptrs[disks - 1] = dq; 50 50 51 51 raid6_call.gen_syndrome(disks, bytes, ptrs); ··· 197 197 * Use the dead data page as temporary storage for delta q 198 198 */ 199 199 dq = (u8 *)ptrs[faila]; 200 - ptrs[faila] = (void *)raid6_empty_zero_page; 200 + ptrs[faila] = raid6_get_zero_page(); 201 201 ptrs[disks - 1] = dq; 202 202 203 203 raid6_call.gen_syndrome(disks, bytes, ptrs); ··· 316 316 * delta p and delta q 317 317 */ 318 318 dp = (u8 *)ptrs[faila]; 319 - ptrs[faila] = (void *)raid6_empty_zero_page; 319 + ptrs[faila] = raid6_get_zero_page(); 320 320 ptrs[disks - 2] = dp; 321 321 dq = (u8 *)ptrs[failb]; 322 - ptrs[failb] = (void *)raid6_empty_zero_page; 322 + ptrs[failb] = raid6_get_zero_page(); 323 323 ptrs[disks - 1] = dq; 324 324 325 325 raid6_call.gen_syndrome(disks, bytes, ptrs); ··· 436 436 * Use the dead data page as temporary storage for delta q 437 437 */ 438 438 dq = (u8 *)ptrs[faila]; 439 - ptrs[faila] = (void *)raid6_empty_zero_page; 439 + ptrs[faila] = raid6_get_zero_page(); 440 440 ptrs[disks - 1] = dq; 441 441 442 442 raid6_call.gen_syndrome(disks, bytes, ptrs);

+3 -3

lib/raid6/recov_neon.c

··· 36 36 * delta p and delta q 37 37 */ 38 38 dp = (u8 *)ptrs[faila]; 39 - ptrs[faila] = (void *)raid6_empty_zero_page; 39 + ptrs[faila] = raid6_get_zero_page(); 40 40 ptrs[disks - 2] = dp; 41 41 dq = (u8 *)ptrs[failb]; 42 - ptrs[failb] = (void *)raid6_empty_zero_page; 42 + ptrs[failb] = raid6_get_zero_page(); 43 43 ptrs[disks - 1] = dq; 44 44 45 45 raid6_call.gen_syndrome(disks, bytes, ptrs); ··· 74 74 * Use the dead data page as temporary storage for delta q 75 75 */ 76 76 dq = (u8 *)ptrs[faila]; 77 - ptrs[faila] = (void *)raid6_empty_zero_page; 77 + ptrs[faila] = raid6_get_zero_page(); 78 78 ptrs[disks - 1] = dq; 79 79 80 80 raid6_call.gen_syndrome(disks, bytes, ptrs);

+3 -3

lib/raid6/recov_rvv.c

··· 165 165 * delta p and delta q 166 166 */ 167 167 dp = (u8 *)ptrs[faila]; 168 - ptrs[faila] = (void *)raid6_empty_zero_page; 168 + ptrs[faila] = raid6_get_zero_page(); 169 169 ptrs[disks - 2] = dp; 170 170 dq = (u8 *)ptrs[failb]; 171 - ptrs[failb] = (void *)raid6_empty_zero_page; 171 + ptrs[failb] = raid6_get_zero_page(); 172 172 ptrs[disks - 1] = dq; 173 173 174 174 raid6_call.gen_syndrome(disks, bytes, ptrs); ··· 203 203 * Use the dead data page as temporary storage for delta q 204 204 */ 205 205 dq = (u8 *)ptrs[faila]; 206 - ptrs[faila] = (void *)raid6_empty_zero_page; 206 + ptrs[faila] = raid6_get_zero_page(); 207 207 ptrs[disks - 1] = dq; 208 208 209 209 raid6_call.gen_syndrome(disks, bytes, ptrs);

+3 -3

lib/raid6/recov_s390xc.c

··· 34 34 Use the dead data pages as temporary storage for 35 35 delta p and delta q */ 36 36 dp = (u8 *)ptrs[faila]; 37 - ptrs[faila] = (void *)raid6_empty_zero_page; 37 + ptrs[faila] = raid6_get_zero_page(); 38 38 ptrs[disks-2] = dp; 39 39 dq = (u8 *)ptrs[failb]; 40 - ptrs[failb] = (void *)raid6_empty_zero_page; 40 + ptrs[failb] = raid6_get_zero_page(); 41 41 ptrs[disks-1] = dq; 42 42 43 43 raid6_call.gen_syndrome(disks, bytes, ptrs); ··· 81 81 /* Compute syndrome with zero for the missing data page 82 82 Use the dead data page as temporary storage for delta q */ 83 83 dq = (u8 *)ptrs[faila]; 84 - ptrs[faila] = (void *)raid6_empty_zero_page; 84 + ptrs[faila] = raid6_get_zero_page(); 85 85 ptrs[disks-1] = dq; 86 86 87 87 raid6_call.gen_syndrome(disks, bytes, ptrs);

+3 -3

lib/raid6/recov_ssse3.c

··· 30 30 Use the dead data pages as temporary storage for 31 31 delta p and delta q */ 32 32 dp = (u8 *)ptrs[faila]; 33 - ptrs[faila] = (void *)raid6_empty_zero_page; 33 + ptrs[faila] = raid6_get_zero_page(); 34 34 ptrs[disks-2] = dp; 35 35 dq = (u8 *)ptrs[failb]; 36 - ptrs[failb] = (void *)raid6_empty_zero_page; 36 + ptrs[failb] = raid6_get_zero_page(); 37 37 ptrs[disks-1] = dq; 38 38 39 39 raid6_call.gen_syndrome(disks, bytes, ptrs); ··· 203 203 /* Compute syndrome with zero for the missing data page 204 204 Use the dead data page as temporary storage for delta q */ 205 205 dq = (u8 *)ptrs[faila]; 206 - ptrs[faila] = (void *)raid6_empty_zero_page; 206 + ptrs[faila] = raid6_get_zero_page(); 207 207 ptrs[disks-1] = dq; 208 208 209 209 raid6_call.gen_syndrome(disks, bytes, ptrs);

+58 -9

lib/stackdepot.c

··· 36 36 #include <linux/memblock.h> 37 37 #include <linux/kasan-enabled.h> 38 38 39 - #define DEPOT_POOLS_CAP 8192 40 - /* The pool_index is offset by 1 so the first record does not have a 0 handle. */ 41 - #define DEPOT_MAX_POOLS \ 42 - (((1LL << (DEPOT_POOL_INDEX_BITS)) - 1 < DEPOT_POOLS_CAP) ? \ 43 - (1LL << (DEPOT_POOL_INDEX_BITS)) - 1 : DEPOT_POOLS_CAP) 39 + /* 40 + * The pool_index is offset by 1 so the first record does not have a 0 handle. 41 + */ 42 + static unsigned int stack_max_pools __read_mostly = 43 + MIN((1LL << DEPOT_POOL_INDEX_BITS) - 1, 8192); 44 44 45 45 static bool stack_depot_disabled; 46 46 static bool __stack_depot_early_init_requested __initdata = IS_ENABLED(CONFIG_STACKDEPOT_ALWAYS_INIT); ··· 62 62 static unsigned int stack_hash_mask; 63 63 64 64 /* Array of memory regions that store stack records. */ 65 - static void *stack_pools[DEPOT_MAX_POOLS]; 65 + static void **stack_pools; 66 66 /* Newly allocated pool that is not yet added to stack_pools. */ 67 67 static void *new_pool; 68 68 /* Number of pools in stack_pools. 
*/ ··· 100 100 return kstrtobool(str, &stack_depot_disabled); 101 101 } 102 102 early_param("stack_depot_disable", disable_stack_depot); 103 + 104 + static int __init parse_max_pools(char *str) 105 + { 106 + const long long limit = (1LL << (DEPOT_POOL_INDEX_BITS)) - 1; 107 + unsigned int max_pools; 108 + int rv; 109 + 110 + rv = kstrtouint(str, 0, &max_pools); 111 + if (rv) 112 + return rv; 113 + 114 + if (max_pools < 1024) { 115 + pr_err("stack_depot_max_pools below 1024, using default of %u\n", 116 + stack_max_pools); 117 + goto out; 118 + } 119 + 120 + if (max_pools > limit) { 121 + pr_err("stack_depot_max_pools exceeds %lld, using default of %u\n", 122 + limit, stack_max_pools); 123 + goto out; 124 + } 125 + 126 + stack_max_pools = max_pools; 127 + out: 128 + return 0; 129 + } 130 + early_param("stack_depot_max_pools", parse_max_pools); 103 131 104 132 void __init stack_depot_request_early_init(void) 105 133 { ··· 210 182 } 211 183 init_stack_table(entries); 212 184 185 + pr_info("allocating space for %u stack pools via memblock\n", 186 + stack_max_pools); 187 + stack_pools = 188 + memblock_alloc(stack_max_pools * sizeof(void *), PAGE_SIZE); 189 + if (!stack_pools) { 190 + pr_err("stack pools allocation failed, disabling\n"); 191 + memblock_free(stack_table, entries * sizeof(struct list_head)); 192 + stack_depot_disabled = true; 193 + return -ENOMEM; 194 + } 195 + 213 196 return 0; 214 197 } 215 198 ··· 270 231 stack_hash_mask = entries - 1; 271 232 init_stack_table(entries); 272 233 234 + pr_info("allocating space for %u stack pools via kvcalloc\n", 235 + stack_max_pools); 236 + stack_pools = kvcalloc(stack_max_pools, sizeof(void *), GFP_KERNEL); 237 + if (!stack_pools) { 238 + pr_err("stack pools allocation failed, disabling\n"); 239 + kvfree(stack_table); 240 + stack_depot_disabled = true; 241 + ret = -ENOMEM; 242 + } 243 + 273 244 out_unlock: 274 245 mutex_unlock(&stack_depot_init_mutex); 275 246 ··· 294 245 { 295 246 lockdep_assert_held(&pool_lock); 296 
247 297 - if (unlikely(pools_num >= DEPOT_MAX_POOLS)) { 248 + if (unlikely(pools_num >= stack_max_pools)) { 298 249 /* Bail out if we reached the pool limit. */ 299 - WARN_ON_ONCE(pools_num > DEPOT_MAX_POOLS); /* should never happen */ 250 + WARN_ON_ONCE(pools_num > stack_max_pools); /* should never happen */ 300 251 WARN_ON_ONCE(!new_pool); /* to avoid unnecessary pre-allocation */ 301 252 WARN_ONCE(1, "Stack depot reached limit capacity"); 302 253 return false; ··· 322 273 * NULL; do not reset to NULL if we have reached the maximum number of 323 274 * pools. 324 275 */ 325 - if (pools_num < DEPOT_MAX_POOLS) 276 + if (pools_num < stack_max_pools) 326 277 WRITE_ONCE(new_pool, NULL); 327 278 else 328 279 WRITE_ONCE(new_pool, STACK_DEPOT_POISON);

+122

lib/sys_info.c

··· 1 + // SPDX-License-Identifier: GPL-2.0-only 2 + #include <linux/sched/debug.h> 3 + #include <linux/console.h> 4 + #include <linux/kernel.h> 5 + #include <linux/ftrace.h> 6 + #include <linux/sysctl.h> 7 + #include <linux/nmi.h> 8 + 9 + #include <linux/sys_info.h> 10 + 11 + struct sys_info_name { 12 + unsigned long bit; 13 + const char *name; 14 + }; 15 + 16 + /* 17 + * When 'si_names' gets updated, please make sure the 'sys_info_avail' 18 + * below is updated accordingly. 19 + */ 20 + static const struct sys_info_name si_names[] = { 21 + { SYS_INFO_TASKS, "tasks" }, 22 + { SYS_INFO_MEM, "mem" }, 23 + { SYS_INFO_TIMERS, "timers" }, 24 + { SYS_INFO_LOCKS, "locks" }, 25 + { SYS_INFO_FTRACE, "ftrace" }, 26 + { SYS_INFO_ALL_CPU_BT, "all_bt" }, 27 + { SYS_INFO_BLOCKED_TASKS, "blocked_tasks" }, 28 + }; 29 + 30 + /* Expecting string like "xxx_sys_info=tasks,mem,timers,locks,ftrace,..." */ 31 + unsigned long sys_info_parse_param(char *str) 32 + { 33 + unsigned long si_bits = 0; 34 + char *s, *name; 35 + int i; 36 + 37 + s = str; 38 + while ((name = strsep(&s, ",")) && *name) { 39 + for (i = 0; i < ARRAY_SIZE(si_names); i++) { 40 + if (!strcmp(name, si_names[i].name)) { 41 + si_bits |= si_names[i].bit; 42 + break; 43 + } 44 + } 45 + } 46 + 47 + return si_bits; 48 + } 49 + 50 + #ifdef CONFIG_SYSCTL 51 + 52 + static const char sys_info_avail[] __maybe_unused = "tasks,mem,timers,locks,ftrace,all_bt,blocked_tasks"; 53 + 54 + int sysctl_sys_info_handler(const struct ctl_table *ro_table, int write, 55 + void *buffer, size_t *lenp, 56 + loff_t *ppos) 57 + { 58 + char names[sizeof(sys_info_avail) + 1]; 59 + struct ctl_table table; 60 + unsigned long *si_bits_global; 61 + 62 + si_bits_global = ro_table->data; 63 + 64 + if (write) { 65 + unsigned long si_bits; 66 + int ret; 67 + 68 + table = *ro_table; 69 + table.data = names; 70 + table.maxlen = sizeof(names); 71 + ret = proc_dostring(&table, write, buffer, lenp, ppos); 72 + if (ret) 73 + return ret; 74 + 75 + si_bits = 
sys_info_parse_param(names); 76 + /* The access to the global value is not synchronized. */ 77 + WRITE_ONCE(*si_bits_global, si_bits); 78 + return 0; 79 + } else { 80 + /* for 'read' operation */ 81 + char *delim = ""; 82 + int i, len = 0; 83 + 84 + for (i = 0; i < ARRAY_SIZE(si_names); i++) { 85 + if (*si_bits_global & si_names[i].bit) { 86 + len += scnprintf(names + len, sizeof(names) - len, 87 + "%s%s", delim, si_names[i].name); 88 + delim = ","; 89 + } 90 + } 91 + 92 + table = *ro_table; 93 + table.data = names; 94 + table.maxlen = sizeof(names); 95 + return proc_dostring(&table, write, buffer, lenp, ppos); 96 + } 97 + } 98 + #endif 99 + 100 + void sys_info(unsigned long si_mask) 101 + { 102 + if (si_mask & SYS_INFO_TASKS) 103 + show_state(); 104 + 105 + if (si_mask & SYS_INFO_MEM) 106 + show_mem(); 107 + 108 + if (si_mask & SYS_INFO_TIMERS) 109 + sysrq_timer_list_show(); 110 + 111 + if (si_mask & SYS_INFO_LOCKS) 112 + debug_show_all_locks(); 113 + 114 + if (si_mask & SYS_INFO_FTRACE) 115 + ftrace_dump(DUMP_ALL); 116 + 117 + if (si_mask & SYS_INFO_ALL_CPU_BT) 118 + trigger_all_cpu_backtrace(); 119 + 120 + if (si_mask & SYS_INFO_BLOCKED_TASKS) 121 + show_state_filter(TASK_UNINTERRUPTIBLE); 122 + }
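`sys_info_parse_param()` maps a comma-separated list of names onto a bitmask, and the sysctl read path turns the mask back into names. The following is a userspace sketch of both directions with a reduced name table. The `EX_SI_*` bit values are hypothetical because `<linux/sys_info.h>` is not part of this diff. As the kernel loop is written, parsing also stops at the first empty token, so `"tasks,,mem"` yields only the tasks bit.

```c
#define _DEFAULT_SOURCE		/* for strsep() in glibc */
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical bit values; the real SYS_INFO_* flags are not shown here. */
#define EX_SI_TASKS   (1UL << 0)
#define EX_SI_MEM     (1UL << 1)
#define EX_SI_TIMERS  (1UL << 2)

struct ex_si_name { unsigned long bit; const char *name; };

static const struct ex_si_name ex_si_names[] = {
	{ EX_SI_TASKS,  "tasks"  },
	{ EX_SI_MEM,    "mem"    },
	{ EX_SI_TIMERS, "timers" },
};

#define EX_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Same loop shape as sys_info_parse_param(): split on ',', OR in the bit of
 * each known name, ignore unknown names, stop at an empty token. */
static unsigned long ex_parse(char *str)
{
	unsigned long bits = 0;
	char *s = str, *name;

	while ((name = strsep(&s, ",")) && *name) {
		for (size_t i = 0; i < EX_ARRAY_SIZE(ex_si_names); i++) {
			if (!strcmp(name, ex_si_names[i].name)) {
				bits |= ex_si_names[i].bit;
				break;
			}
		}
	}
	return bits;
}

/* Read direction: join the names of all set bits with ','. */
static void ex_format(unsigned long bits, char *buf, size_t size)
{
	const char *delim = "";
	size_t len = 0;

	assert(size > 0);
	buf[0] = '\0';
	for (size_t i = 0; i < EX_ARRAY_SIZE(ex_si_names); i++) {
		if (bits & ex_si_names[i].bit) {
			int n = snprintf(buf + len, size - len, "%s%s",
					 delim, ex_si_names[i].name);
			if (n < 0 || (size_t)n >= size - len)
				break;	/* output would be truncated */
			len += (size_t)n;
			delim = ",";
		}
	}
}
```

For example, parsing `"tasks,bogus,timers"` sets the tasks and timers bits and ignores `bogus`. Formatting those two bits gives `"tasks,timers"`.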

+305

lib/test_kho.c

··· 1 + // SPDX-License-Identifier: GPL-2.0-or-later 2 + /* 3 + * Test module for KHO 4 + * Copyright (c) 2025 Microsoft Corporation. 5 + * 6 + * Authors: 7 + * Saurabh Sengar <ssengar@microsoft.com> 8 + * Mike Rapoport <rppt@kernel.org> 9 + */ 10 + 11 + #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt 12 + 13 + #include <linux/mm.h> 14 + #include <linux/gfp.h> 15 + #include <linux/slab.h> 16 + #include <linux/kexec.h> 17 + #include <linux/libfdt.h> 18 + #include <linux/module.h> 19 + #include <linux/printk.h> 20 + #include <linux/vmalloc.h> 21 + #include <linux/kexec_handover.h> 22 + 23 + #include <net/checksum.h> 24 + 25 + #define KHO_TEST_MAGIC 0x4b484f21 /* KHO! */ 26 + #define KHO_TEST_FDT "kho_test" 27 + #define KHO_TEST_COMPAT "kho-test-v1" 28 + 29 + static long max_mem = (PAGE_SIZE << MAX_PAGE_ORDER) * 2; 30 + module_param(max_mem, long, 0644); 31 + 32 + struct kho_test_state { 33 + unsigned int nr_folios; 34 + struct folio **folios; 35 + struct folio *fdt; 36 + __wsum csum; 37 + }; 38 + 39 + static struct kho_test_state kho_test_state; 40 + 41 + static int kho_test_notifier(struct notifier_block *self, unsigned long cmd, 42 + void *v) 43 + { 44 + struct kho_test_state *state = &kho_test_state; 45 + struct kho_serialization *ser = v; 46 + int err = 0; 47 + 48 + switch (cmd) { 49 + case KEXEC_KHO_ABORT: 50 + return NOTIFY_DONE; 51 + case KEXEC_KHO_FINALIZE: 52 + /* Handled below */ 53 + break; 54 + default: 55 + return NOTIFY_BAD; 56 + } 57 + 58 + err |= kho_preserve_folio(state->fdt); 59 + err |= kho_add_subtree(ser, KHO_TEST_FDT, folio_address(state->fdt)); 60 + 61 + return err ? 
NOTIFY_BAD : NOTIFY_DONE; 62 + } 63 + 64 + static struct notifier_block kho_test_nb = { 65 + .notifier_call = kho_test_notifier, 66 + }; 67 + 68 + static int kho_test_save_data(struct kho_test_state *state, void *fdt) 69 + { 70 + phys_addr_t *folios_info __free(kvfree) = NULL; 71 + int err = 0; 72 + 73 + folios_info = kvmalloc_array(state->nr_folios, sizeof(*folios_info), 74 + GFP_KERNEL); 75 + if (!folios_info) 76 + return -ENOMEM; 77 + 78 + for (int i = 0; i < state->nr_folios; i++) { 79 + struct folio *folio = state->folios[i]; 80 + unsigned int order = folio_order(folio); 81 + 82 + folios_info[i] = virt_to_phys(folio_address(folio)) | order; 83 + 84 + err = kho_preserve_folio(folio); 85 + if (err) 86 + return err; 87 + } 88 + 89 + err |= fdt_begin_node(fdt, "data"); 90 + err |= fdt_property(fdt, "nr_folios", &state->nr_folios, 91 + sizeof(state->nr_folios)); 92 + err |= fdt_property(fdt, "folios_info", folios_info, 93 + state->nr_folios * sizeof(*folios_info)); 94 + err |= fdt_property(fdt, "csum", &state->csum, sizeof(state->csum)); 95 + err |= fdt_end_node(fdt); 96 + 97 + return err; 98 + } 99 + 100 + static int kho_test_prepare_fdt(struct kho_test_state *state) 101 + { 102 + const char compatible[] = KHO_TEST_COMPAT; 103 + unsigned int magic = KHO_TEST_MAGIC; 104 + ssize_t fdt_size; 105 + int err = 0; 106 + void *fdt; 107 + 108 + fdt_size = state->nr_folios * sizeof(phys_addr_t) + PAGE_SIZE; 109 + state->fdt = folio_alloc(GFP_KERNEL, get_order(fdt_size)); 110 + if (!state->fdt) 111 + return -ENOMEM; 112 + 113 + fdt = folio_address(state->fdt); 114 + 115 + err |= fdt_create(fdt, fdt_size); 116 + err |= fdt_finish_reservemap(fdt); 117 + 118 + err |= fdt_begin_node(fdt, ""); 119 + err |= fdt_property(fdt, "compatible", compatible, sizeof(compatible)); 120 + err |= fdt_property(fdt, "magic", &magic, sizeof(magic)); 121 + err |= kho_test_save_data(state, fdt); 122 + err |= fdt_end_node(fdt); 123 + 124 + err |= fdt_finish(fdt); 125 + 126 + if (err) 127 + 
folio_put(state->fdt); 128 + 129 + return err; 130 + } 131 + 132 + static int kho_test_generate_data(struct kho_test_state *state) 133 + { 134 + size_t alloc_size = 0; 135 + __wsum csum = 0; 136 + 137 + while (alloc_size < max_mem) { 138 + int order = get_random_u32() % NR_PAGE_ORDERS; 139 + struct folio *folio; 140 + unsigned int size; 141 + void *addr; 142 + 143 + /* cap allocation so that we won't exceed max_mem */ 144 + if (alloc_size + (PAGE_SIZE << order) > max_mem) { 145 + order = get_order(max_mem - alloc_size); 146 + if (order) 147 + order--; 148 + } 149 + size = PAGE_SIZE << order; 150 + 151 + folio = folio_alloc(GFP_KERNEL | __GFP_NORETRY, order); 152 + if (!folio) 153 + goto err_free_folios; 154 + 155 + state->folios[state->nr_folios++] = folio; 156 + addr = folio_address(folio); 157 + get_random_bytes(addr, size); 158 + csum = csum_partial(addr, size, csum); 159 + alloc_size += size; 160 + } 161 + 162 + state->csum = csum; 163 + return 0; 164 + 165 + err_free_folios: 166 + for (int i = 0; i < state->nr_folios; i++) 167 + folio_put(state->folios[i]); 168 + return -ENOMEM; 169 + } 170 + 171 + static int kho_test_save(void) 172 + { 173 + struct kho_test_state *state = &kho_test_state; 174 + struct folio **folios __free(kvfree) = NULL; 175 + unsigned long max_nr; 176 + int err; 177 + 178 + max_mem = PAGE_ALIGN(max_mem); 179 + max_nr = max_mem >> PAGE_SHIFT; 180 + 181 + folios = kvmalloc_array(max_nr, sizeof(*state->folios), GFP_KERNEL); 182 + if (!folios) 183 + return -ENOMEM; 184 + state->folios = folios; 185 + 186 + err = kho_test_generate_data(state); 187 + if (err) 188 + return err; 189 + 190 + err = kho_test_prepare_fdt(state); 191 + if (err) 192 + return err; 193 + 194 + return register_kho_notifier(&kho_test_nb); 195 + } 196 + 197 + static int kho_test_restore_data(const void *fdt, int node) 198 + { 199 + const unsigned int *nr_folios; 200 + const phys_addr_t *folios_info; 201 + const __wsum *old_csum; 202 + __wsum csum = 0; 203 + int len; 204 + 205 
+ node = fdt_path_offset(fdt, "/data"); 206 + 207 + nr_folios = fdt_getprop(fdt, node, "nr_folios", &len); 208 + if (!nr_folios || len != sizeof(*nr_folios)) 209 + return -EINVAL; 210 + 211 + old_csum = fdt_getprop(fdt, node, "csum", &len); 212 + if (!old_csum || len != sizeof(*old_csum)) 213 + return -EINVAL; 214 + 215 + folios_info = fdt_getprop(fdt, node, "folios_info", &len); 216 + if (!folios_info || len != sizeof(*folios_info) * *nr_folios) 217 + return -EINVAL; 218 + 219 + for (int i = 0; i < *nr_folios; i++) { 220 + unsigned int order = folios_info[i] & ~PAGE_MASK; 221 + phys_addr_t phys = folios_info[i] & PAGE_MASK; 222 + unsigned int size = PAGE_SIZE << order; 223 + struct folio *folio; 224 + 225 + folio = kho_restore_folio(phys); 226 + if (!folio) 227 + break; 228 + 229 + if (folio_order(folio) != order) 230 + break; 231 + 232 + csum = csum_partial(folio_address(folio), size, csum); 233 + folio_put(folio); 234 + } 235 + 236 + if (csum != *old_csum) 237 + return -EINVAL; 238 + 239 + return 0; 240 + } 241 + 242 + static int kho_test_restore(phys_addr_t fdt_phys) 243 + { 244 + void *fdt = phys_to_virt(fdt_phys); 245 + const unsigned int *magic; 246 + int node, len, err; 247 + 248 + node = fdt_path_offset(fdt, "/"); 249 + if (node < 0) 250 + return -EINVAL; 251 + 252 + if (fdt_node_check_compatible(fdt, node, KHO_TEST_COMPAT)) 253 + return -EINVAL; 254 + 255 + magic = fdt_getprop(fdt, node, "magic", &len); 256 + if (!magic || len != sizeof(*magic)) 257 + return -EINVAL; 258 + 259 + if (*magic != KHO_TEST_MAGIC) 260 + return -EINVAL; 261 + 262 + err = kho_test_restore_data(fdt, node); 263 + if (err) 264 + return err; 265 + 266 + pr_info("KHO restore succeeded\n"); 267 + return 0; 268 + } 269 + 270 + static int __init kho_test_init(void) 271 + { 272 + phys_addr_t fdt_phys; 273 + int err; 274 + 275 + err = kho_retrieve_subtree(KHO_TEST_FDT, &fdt_phys); 276 + if (!err) 277 + return kho_test_restore(fdt_phys); 278 + 279 + if (err != -ENOENT) { 280 + 
pr_warn("failed to retrieve %s FDT: %d\n", KHO_TEST_FDT, err); 281 + return err; 282 + } 283 + 284 + return kho_test_save(); 285 + } 286 + module_init(kho_test_init); 287 + 288 + static void kho_test_cleanup(void) 289 + { 290 + for (int i = 0; i < kho_test_state.nr_folios; i++) 291 + folio_put(kho_test_state.folios[i]); 292 + 293 + kvfree(kho_test_state.folios); 294 + } 295 + 296 + static void __exit kho_test_exit(void) 297 + { 298 + unregister_kho_notifier(&kho_test_nb); 299 + kho_test_cleanup(); 300 + } 301 + module_exit(kho_test_exit); 302 + 303 + MODULE_AUTHOR("Mike Rapoport <rppt@kernel.org>"); 304 + MODULE_DESCRIPTION("KHO test module"); 305 + MODULE_LICENSE("GPL");
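The test module records each preserved folio as one `phys_addr_t` word, `virt_to_phys(...) | order`, and splits it again with `& ~PAGE_MASK` and `& PAGE_MASK` on restore. This works because a folio's physical address is page aligned, which leaves the low bits free for the order. The userspace sketch below assumes 4 KiB pages. The `EX_PAGE_*` constants are illustrative, since the real page size is architecture-dependent.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 4 KiB pages for illustration only. */
#define EX_PAGE_SHIFT 12
#define EX_PAGE_SIZE  (1ULL << EX_PAGE_SHIFT)
#define EX_PAGE_MASK  (~(EX_PAGE_SIZE - 1))

/* Pack an aligned physical address and a folio order into one word. */
static uint64_t ex_pack(uint64_t phys, unsigned int order)
{
	assert((phys & ~EX_PAGE_MASK) == 0);	/* low bits must be free */
	assert(order < EX_PAGE_SIZE);		/* order must fit in them */
	return phys | order;
}

/* Reverse of ex_pack(), mirroring kho_test_restore_data(). */
static void ex_unpack(uint64_t info, uint64_t *phys, unsigned int *order)
{
	*order = (unsigned int)(info & ~EX_PAGE_MASK);
	*phys = info & EX_PAGE_MASK;
}
```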

-107

lib/xxhash.c

··· 267 267 } 268 268 EXPORT_SYMBOL(xxh64_reset); 269 269 270 - int xxh32_update(struct xxh32_state *state, const void *input, const size_t len) 271 - { 272 - const uint8_t *p = (const uint8_t *)input; 273 - const uint8_t *const b_end = p + len; 274 - 275 - if (input == NULL) 276 - return -EINVAL; 277 - 278 - state->total_len_32 += (uint32_t)len; 279 - state->large_len |= (len >= 16) | (state->total_len_32 >= 16); 280 - 281 - if (state->memsize + len < 16) { /* fill in tmp buffer */ 282 - memcpy((uint8_t *)(state->mem32) + state->memsize, input, len); 283 - state->memsize += (uint32_t)len; 284 - return 0; 285 - } 286 - 287 - if (state->memsize) { /* some data left from previous update */ 288 - const uint32_t *p32 = state->mem32; 289 - 290 - memcpy((uint8_t *)(state->mem32) + state->memsize, input, 291 - 16 - state->memsize); 292 - 293 - state->v1 = xxh32_round(state->v1, get_unaligned_le32(p32)); 294 - p32++; 295 - state->v2 = xxh32_round(state->v2, get_unaligned_le32(p32)); 296 - p32++; 297 - state->v3 = xxh32_round(state->v3, get_unaligned_le32(p32)); 298 - p32++; 299 - state->v4 = xxh32_round(state->v4, get_unaligned_le32(p32)); 300 - p32++; 301 - 302 - p += 16-state->memsize; 303 - state->memsize = 0; 304 - } 305 - 306 - if (p <= b_end - 16) { 307 - const uint8_t *const limit = b_end - 16; 308 - uint32_t v1 = state->v1; 309 - uint32_t v2 = state->v2; 310 - uint32_t v3 = state->v3; 311 - uint32_t v4 = state->v4; 312 - 313 - do { 314 - v1 = xxh32_round(v1, get_unaligned_le32(p)); 315 - p += 4; 316 - v2 = xxh32_round(v2, get_unaligned_le32(p)); 317 - p += 4; 318 - v3 = xxh32_round(v3, get_unaligned_le32(p)); 319 - p += 4; 320 - v4 = xxh32_round(v4, get_unaligned_le32(p)); 321 - p += 4; 322 - } while (p <= limit); 323 - 324 - state->v1 = v1; 325 - state->v2 = v2; 326 - state->v3 = v3; 327 - state->v4 = v4; 328 - } 329 - 330 - if (p < b_end) { 331 - memcpy(state->mem32, p, (size_t)(b_end-p)); 332 - state->memsize = (uint32_t)(b_end-p); 333 - } 334 - 335 - return 0; 
336 - } 337 - EXPORT_SYMBOL(xxh32_update); 338 - 339 - uint32_t xxh32_digest(const struct xxh32_state *state) 340 - { 341 - const uint8_t *p = (const uint8_t *)state->mem32; 342 - const uint8_t *const b_end = (const uint8_t *)(state->mem32) + 343 - state->memsize; 344 - uint32_t h32; 345 - 346 - if (state->large_len) { 347 - h32 = xxh_rotl32(state->v1, 1) + xxh_rotl32(state->v2, 7) + 348 - xxh_rotl32(state->v3, 12) + xxh_rotl32(state->v4, 18); 349 - } else { 350 - h32 = state->v3 /* == seed */ + PRIME32_5; 351 - } 352 - 353 - h32 += state->total_len_32; 354 - 355 - while (p + 4 <= b_end) { 356 - h32 += get_unaligned_le32(p) * PRIME32_3; 357 - h32 = xxh_rotl32(h32, 17) * PRIME32_4; 358 - p += 4; 359 - } 360 - 361 - while (p < b_end) { 362 - h32 += (*p) * PRIME32_5; 363 - h32 = xxh_rotl32(h32, 11) * PRIME32_1; 364 - p++; 365 - } 366 - 367 - h32 ^= h32 >> 15; 368 - h32 *= PRIME32_2; 369 - h32 ^= h32 >> 13; 370 - h32 *= PRIME32_3; 371 - h32 ^= h32 >> 16; 372 - 373 - return h32; 374 - } 375 - EXPORT_SYMBOL(xxh32_digest); 376 - 377 270 int xxh64_update(struct xxh64_state *state, const void *input, const size_t len) 378 271 { 379 272 const uint8_t *p = (const uint8_t *)input;

+4 -5

samples/Kconfig

··· 54 54 measures the time taken to invoke one function a number of times. 55 55 56 56 config SAMPLE_TRACE_ARRAY 57 - tristate "Build sample module for kernel access to Ftrace instancess" 57 + tristate "Build sample module for kernel access to Ftrace instances" 58 58 depends on EVENT_TRACING && m 59 59 help 60 60 This builds a module that demonstrates the use of various APIs to ··· 316 316 depends on DETECT_HUNG_TASK && DEBUG_FS 317 317 help 318 318 Build a module that provides debugfs files (e.g., mutex, semaphore, 319 - etc.) under <debugfs>/hung_task. If user reads one of these files, 320 - it will sleep long time (256 seconds) with holding a lock. Thus, 321 - if 2 or more processes read the same file concurrently, it will 322 - be detected by the hung_task watchdog. 319 + rw_semaphore_read, rw_semaphore_write) under <debugfs>/hung_task. 320 + Reading these files with multiple processes triggers hung task 321 + detection by holding locks for a long time (256 seconds). 323 322 324 323 source "samples/rust/Kconfig" 325 324

+74 -7

samples/hung_task/hung_task_tests.c

··· 4 4 * semaphore, etc. 5 5 * 6 6 * Usage: Load this module and read `<debugfs>/hung_task/mutex`, 7 - * `<debugfs>/hung_task/semaphore`, etc., with 2 or more processes. 7 + * `<debugfs>/hung_task/semaphore`, `<debugfs>/hung_task/rw_semaphore_read`, 8 + * `<debugfs>/hung_task/rw_semaphore_write`, etc., with 2 or more processes. 8 9 * 9 10 * This is for testing kernel hung_task error messages with various locking 10 - * mechanisms (e.g., mutex, semaphore, etc.). Note that this may freeze 11 - * your system or cause a panic. Use only for testing purposes. 11 + * mechanisms (e.g., mutex, semaphore, rw_semaphore_read, rw_semaphore_write, etc.). 12 + * Note that this may freeze your system or cause a panic. Use only for testing purposes. 12 13 */ 13 14 14 15 #include <linux/debugfs.h> ··· 18 17 #include <linux/module.h> 19 18 #include <linux/mutex.h> 20 19 #include <linux/semaphore.h> 20 + #include <linux/rwsem.h> 21 21 22 - #define HUNG_TASK_DIR "hung_task" 23 - #define HUNG_TASK_MUTEX_FILE "mutex" 24 - #define HUNG_TASK_SEM_FILE "semaphore" 25 - #define SLEEP_SECOND 256 22 + #define HUNG_TASK_DIR "hung_task" 23 + #define HUNG_TASK_MUTEX_FILE "mutex" 24 + #define HUNG_TASK_SEM_FILE "semaphore" 25 + #define HUNG_TASK_RWSEM_READ_FILE "rw_semaphore_read" 26 + #define HUNG_TASK_RWSEM_WRITE_FILE "rw_semaphore_write" 27 + #define SLEEP_SECOND 256 26 28 27 29 static const char dummy_string[] = "This is a dummy string."; 28 30 static DEFINE_MUTEX(dummy_mutex); 29 31 static DEFINE_SEMAPHORE(dummy_sem, 1); 32 + static DECLARE_RWSEM(dummy_rwsem); 30 33 static struct dentry *hung_task_dir; 31 34 32 35 /* Mutex-based read function */ 33 36 static ssize_t read_dummy_mutex(struct file *file, char __user *user_buf, 34 37 size_t count, loff_t *ppos) 35 38 { 39 + /* Check if data is already read */ 40 + if (*ppos >= sizeof(dummy_string)) 41 + return 0; 42 + 36 43 /* Second task waits on mutex, entering uninterruptible sleep */ 37 44 guard(mutex)(&dummy_mutex); 38 45 ··· 55 46 static 
ssize_t read_dummy_semaphore(struct file *file, char __user *user_buf, 56 47 size_t count, loff_t *ppos) 57 48 { 49 + /* Check if data is already read */ 50 + if (*ppos >= sizeof(dummy_string)) 51 + return 0; 52 + 58 53 /* Second task waits on semaphore, entering uninterruptible sleep */ 59 54 down(&dummy_sem); 60 55 ··· 66 53 msleep_interruptible(SLEEP_SECOND * 1000); 67 54 68 55 up(&dummy_sem); 56 + 57 + return simple_read_from_buffer(user_buf, count, ppos, dummy_string, 58 + sizeof(dummy_string)); 59 + } 60 + 61 + /* Read-write semaphore read function */ 62 + static ssize_t read_dummy_rwsem_read(struct file *file, char __user *user_buf, 63 + size_t count, loff_t *ppos) 64 + { 65 + /* Check if data is already read */ 66 + if (*ppos >= sizeof(dummy_string)) 67 + return 0; 68 + 69 + /* Acquires read lock, allowing concurrent readers but blocks if write lock is held */ 70 + down_read(&dummy_rwsem); 71 + 72 + /* Sleeps here, potentially triggering hung task detection if lock is held too long */ 73 + msleep_interruptible(SLEEP_SECOND * 1000); 74 + 75 + up_read(&dummy_rwsem); 76 + 77 + return simple_read_from_buffer(user_buf, count, ppos, dummy_string, 78 + sizeof(dummy_string)); 79 + } 80 + 81 + /* Read-write semaphore write function */ 82 + static ssize_t read_dummy_rwsem_write(struct file *file, char __user *user_buf, 83 + size_t count, loff_t *ppos) 84 + { 85 + /* Check if data is already read */ 86 + if (*ppos >= sizeof(dummy_string)) 87 + return 0; 88 + 89 + /* Acquires exclusive write lock, blocking all other readers and writers */ 90 + down_write(&dummy_rwsem); 91 + 92 + /* Sleeps here, potentially triggering hung task detection if lock is held too long */ 93 + msleep_interruptible(SLEEP_SECOND * 1000); 94 + 95 + up_write(&dummy_rwsem); 69 96 70 97 return simple_read_from_buffer(user_buf, count, ppos, dummy_string, 71 98 sizeof(dummy_string)); ··· 121 68 .read = read_dummy_semaphore, 122 69 }; 123 70 71 + /* File operations for rw_semaphore read */ 72 + static 
const struct file_operations hung_task_rwsem_read_fops = { 73 + .read = read_dummy_rwsem_read, 74 + }; 75 + 76 + /* File operations for rw_semaphore write */ 77 + static const struct file_operations hung_task_rwsem_write_fops = { 78 + .read = read_dummy_rwsem_write, 79 + }; 80 + 124 81 static int __init hung_task_tests_init(void) 125 82 { 126 83 hung_task_dir = debugfs_create_dir(HUNG_TASK_DIR, NULL); ··· 142 79 &hung_task_mutex_fops); 143 80 debugfs_create_file(HUNG_TASK_SEM_FILE, 0400, hung_task_dir, NULL, 144 81 &hung_task_sem_fops); 82 + debugfs_create_file(HUNG_TASK_RWSEM_READ_FILE, 0400, hung_task_dir, NULL, 83 + &hung_task_rwsem_read_fops); 84 + debugfs_create_file(HUNG_TASK_RWSEM_WRITE_FILE, 0400, hung_task_dir, NULL, 85 + &hung_task_rwsem_write_fops); 145 86 146 87 return 0; 147 88 }
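The new `*ppos >= sizeof(dummy_string)` checks make a read() at end of file return 0 before any lock is taken. Without them, the follow-up read() that a tool such as `cat` issues to detect EOF would take the lock and sleep for another 256 seconds. The userspace model below illustrates this. `ex_read_from_buffer()` imitates the semantics of the kernel helper `simple_read_from_buffer()` with memcpy() in place of the user copy, and `ex_lock_sections` is a hypothetical counter standing in for the lock-and-sleep section.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>

/* Userspace model of simple_read_from_buffer(). */
static ssize_t ex_read_from_buffer(char *to, size_t count, off_t *ppos,
				   const void *from, size_t available)
{
	off_t pos = *ppos;

	if (pos < 0)
		return -1;		/* the kernel helper returns -EINVAL */
	if ((size_t)pos >= available || !count)
		return 0;
	if (count > available - (size_t)pos)
		count = available - (size_t)pos;
	memcpy(to, (const char *)from + pos, count);
	*ppos = pos + (off_t)count;
	return (ssize_t)count;
}

static const char ex_dummy_string[] = "This is a dummy string.";
static int ex_lock_sections;	/* times the "lock and sleep" path ran */

static ssize_t ex_read_dummy(char *buf, size_t count, off_t *ppos)
{
	/* Early EOF check, as added by the patch. */
	if (*ppos >= (off_t)sizeof(ex_dummy_string))
		return 0;

	ex_lock_sections++;	/* stands in for lock, msleep and unlock */
	return ex_read_from_buffer(buf, count, ppos, ex_dummy_string,
				   sizeof(ex_dummy_string));
}
```

A first read returns the 24-byte string (including its NUL), and the second read returns 0 without entering the locked section again.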

+31 -2

scripts/checkpatch.pl

··· 685 685 [\.\!:\s]* 686 686 )}; 687 687 688 + # Device ID types like found in include/linux/mod_devicetable.h. 689 + our $dev_id_types = qr{\b[a-z]\w*_device_id\b}; 690 + 688 691 sub edit_distance_min { 689 692 my (@arr) = @_; 690 693 my $len = scalar @arr; ··· 3503 3500 # Check for various typo / spelling mistakes 3504 3501 if (defined($misspellings) && 3505 3502 ($in_commit_log || $line =~ /^(?:\+|Subject:)/i)) { 3506 - while ($rawline =~ /(?:^|[^\w\-'`])($misspellings)(?:[^\w\-'`]|$)/gi) { 3503 + my $rawline_utf8 = decode("utf8", $rawline); 3504 + while ($rawline_utf8 =~ /(?:^|[^\w\-'`])($misspellings)(?:[^\w\-'`]|$)/gi) { 3507 3505 my $typo = $1; 3508 - my $blank = copy_spacing($rawline); 3506 + my $blank = copy_spacing($rawline_utf8); 3509 3507 my $ptr = substr($blank, 0, $-[1]) . "^" x length($typo); 3510 3508 my $hereptr = "$hereline$ptr\n"; 3511 3509 my $typo_fix = $spelling_fix{lc($typo)}; ··· 7691 7687 if ($line =~ /\.extra[12]\s*=\s*&(zero|one|int_max)\b/) { 7692 7688 WARN("DUPLICATED_SYSCTL_CONST", 7693 7689 "duplicated sysctl range checking value '$1', consider using the shared one in include/linux/sysctl.h\n" . $herecurr); 7690 + } 7691 + 7692 + # Check that *_device_id tables have sentinel entries. 7693 + if (defined $stat && $line =~ /struct\s+$dev_id_types\s+\w+\s*\[\s*\]\s*=\s*\{/) { 7694 + my $stripped = $stat; 7695 + 7696 + # Strip diff line prefixes. 7697 + $stripped =~ s/(^|\n)./$1/g; 7698 + # Line continuations. 7699 + $stripped =~ s/\\\n/\n/g; 7700 + # Strip whitespace, empty strings, zeroes, and commas. 7701 + $stripped =~ s/""//g; 7702 + $stripped =~ s/0x0//g; 7703 + $stripped =~ s/[\s$;,0]//g; 7704 + # Strip field assignments. 
7705 + $stripped =~ s/\.$Ident=//g; 7706 + 7707 + if (!(substr($stripped, -4) eq "{}};" || 7708 + substr($stripped, -6) eq "{{}}};" || 7709 + $stripped =~ /ISAPNP_DEVICE_SINGLE_END}};$/ || 7710 + $stripped =~ /ISAPNP_CARD_END}};$/ || 7711 + $stripped =~ /NULL};$/ || 7712 + $stripped =~ /PCMCIA_DEVICE_NULL};$/)) { 7713 + ERROR("MISSING_SENTINEL", "missing sentinel in ID array\n" . "$here\n$stat\n"); 7714 + } 7694 7715 } 7695 7716 } 7696 7717
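The new `MISSING_SENTINEL` check normalizes the whole `*_device_id` array statement and then looks for a sentinel-shaped tail. Its normalization strips empty strings, `0x0`, whitespace, commas, zeros and `.field=` prefixes. The following simplified C model shows that idea. It is not checkpatch's code: it skips the diff-prefix and `$;` handling, uses a plain identifier pattern instead of checkpatch's `$Ident`, and checks only the `{}};`, `{{}}};` and `NULL};` endings (not the ISAPNP/PCMCIA macros). It also ignores the fact that checkpatch sanitizes comments and string contents first. All `ex_` names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Pass 1: drop "" and 0x0, then whitespace, commas and every '0'. */
static void ex_strip_noise(const char *in, char *out, size_t size)
{
	size_t o = 0;

	assert(size > 0);
	for (const char *p = in; *p && o + 1 < size; p++) {
		if (p[0] == '"' && p[1] == '"') {
			p++;			/* skip "" */
			continue;
		}
		if (p[0] == '0' && p[1] == 'x' && p[2] == '0') {
			p += 2;			/* skip 0x0 */
			continue;
		}
		if (isspace((unsigned char)*p) || *p == ',' || *p == '0')
			continue;
		out[o++] = *p;
	}
	out[o] = '\0';
}

/* Pass 2: drop designated-initializer prefixes such as ".compatible=". */
static void ex_strip_field_assignments(char *s)
{
	char *out = s, *p = s;

	while (*p) {
		if (p[0] == '.' && (isalpha((unsigned char)p[1]) || p[1] == '_')) {
			char *q = p + 1;

			while (isalnum((unsigned char)*q) || *q == '_')
				q++;
			if (*q == '=') {
				p = q + 1;	/* skip ".ident=" */
				continue;
			}
		}
		*out++ = *p++;
	}
	*out = '\0';
}

static bool ex_ends_with(const char *s, const char *suffix)
{
	size_t n = strlen(s), m = strlen(suffix);

	return n >= m && strcmp(s + n - m, suffix) == 0;
}

/* True if the normalized table ends with an empty-entry sentinel shape. */
static bool ex_has_sentinel(const char *table)
{
	char buf[1024];

	ex_strip_noise(table, buf, sizeof(buf));
	ex_strip_field_assignments(buf);
	return ex_ends_with(buf, "{}};") || ex_ends_with(buf, "{{}}};") ||
	       ex_ends_with(buf, "NULL};");
}
```

An `of_device_id` table that ends with `{ }` (or a PCI table ending with `{ 0, }`) normalizes to a `...{}};` tail and passes. A table whose last entry is `{ .compatible = "vendor,chip" }` ends in `"}};` and is flagged.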

+44 -5

scripts/coccinelle/misc/secs_to_jiffies.cocci

··· 7 7 // Confidence: High 8 8 // Copyright: (C) 2024 Easwar Hariharan, Microsoft 9 9 // Keywords: secs, seconds, jiffies 10 - // 10 + // Options: --include-headers 11 11 12 12 virtual patch 13 + virtual report 14 + virtual context 13 15 14 - @depends on patch@ constant C; @@ 16 + @pconst depends on patch@ constant C; @@ 15 17 16 18 - msecs_to_jiffies(C * 1000) 17 19 + secs_to_jiffies(C) 18 20 19 - @depends on patch@ constant C; @@ 21 + @pconstms depends on patch@ constant C; @@ 20 22 21 23 - msecs_to_jiffies(C * MSEC_PER_SEC) 22 24 + secs_to_jiffies(C) 23 25 24 - @depends on patch@ expression E; @@ 26 + @pexpr depends on patch@ expression E; @@ 25 27 26 28 - msecs_to_jiffies(E * 1000) 27 29 + secs_to_jiffies(E) 28 30 29 - @depends on patch@ expression E; @@ 31 + @pexprms depends on patch@ expression E; @@ 30 32 31 33 - msecs_to_jiffies(E * MSEC_PER_SEC) 32 34 + secs_to_jiffies(E) 35 + 36 + @r depends on report && !patch@ 37 + constant C; 38 + expression E; 39 + position p; 40 + @@ 41 + 42 + ( 43 + msecs_to_jiffies(C@p * 1000) 44 + | 45 + msecs_to_jiffies(C@p * MSEC_PER_SEC) 46 + | 47 + msecs_to_jiffies(E@p * 1000) 48 + | 49 + msecs_to_jiffies(E@p * MSEC_PER_SEC) 50 + ) 51 + 52 + @c depends on context && !patch@ 53 + constant C; 54 + expression E; 55 + @@ 56 + 57 + ( 58 + * msecs_to_jiffies(C * 1000) 59 + | 60 + * msecs_to_jiffies(C * MSEC_PER_SEC) 61 + | 62 + * msecs_to_jiffies(E * 1000) 63 + | 64 + * msecs_to_jiffies(E * MSEC_PER_SEC) 65 + ) 66 + 67 + @script:python depends on report@ 68 + p << r.p; 69 + @@ 70 + 71 + coccilib.report.print_report(p[0], "WARNING opportunity for secs_to_jiffies()")
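The semantic patch rewrites `msecs_to_jiffies(X * 1000)` (or `X * MSEC_PER_SEC`) as `secs_to_jiffies(X)`. It now also has `report` and `context` modes that only flag such calls. The rewrite preserves the result because a whole number of seconds converts to an exact tick count, so the round-up in `msecs_to_jiffies()` has nothing to round. The sketch below checks that arithmetic in userspace. `EX_HZ` is a hypothetical tick rate. The two helpers model the conversions as "ceiling of ms * HZ / 1000" and "s * HZ"; they are not the kernel's actual implementations, which use precomputed constants.

```c
#include <assert.h>

/* Hypothetical tick rate; the kernel's HZ is a build-time choice. */
#define EX_HZ           250UL
#define EX_MSEC_PER_SEC 1000UL

/* Model of msecs_to_jiffies(): milliseconds to ticks, rounding up. */
static unsigned long ex_msecs_to_jiffies(unsigned long m)
{
	return (m * EX_HZ + EX_MSEC_PER_SEC - 1) / EX_MSEC_PER_SEC;
}

/* Model of secs_to_jiffies(): whole seconds need no rounding. */
static unsigned long ex_secs_to_jiffies(unsigned long s)
{
	return s * EX_HZ;
}
```

With `EX_HZ` at 250, both `ex_msecs_to_jiffies(5 * 1000)` and `ex_secs_to_jiffies(5)` give 1250, while a single millisecond still rounds up to one tick.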

+6 -6

scripts/gdb/linux/constants.py.in

··· 74 74 LX_GDBPARSED(MOD_RO_AFTER_INIT) 75 75 76 76 /* linux/mount.h */ 77 - LX_VALUE(MNT_NOSUID) 78 - LX_VALUE(MNT_NODEV) 79 - LX_VALUE(MNT_NOEXEC) 80 - LX_VALUE(MNT_NOATIME) 81 - LX_VALUE(MNT_NODIRATIME) 82 - LX_VALUE(MNT_RELATIME) 77 + LX_GDBPARSED(MNT_NOSUID) 78 + LX_GDBPARSED(MNT_NODEV) 79 + LX_GDBPARSED(MNT_NOEXEC) 80 + LX_GDBPARSED(MNT_NOATIME) 81 + LX_GDBPARSED(MNT_NODIRATIME) 82 + LX_GDBPARSED(MNT_RELATIME) 83 83 84 84 /* linux/threads.h */ 85 85 LX_VALUE(NR_CPUS)

+1

scripts/spelling.txt

··· 1099 1099 notications||notifications 1100 1100 notifcations||notifications 1101 1101 notifed||notified 1102 + notifer||notifier 1102 1103 notity||notify 1103 1104 notfify||notify 1104 1105 nubmer||number

+1 -1

tools/accounting/Makefile

··· 2 2 CC := $(CROSS_COMPILE)gcc 3 3 CFLAGS := -I../../usr/include 4 4 5 - PROGS := getdelays procacct 5 + PROGS := getdelays procacct delaytop 6 6 7 7 all: $(PROGS) 8 8

+862

tools/accounting/delaytop.c

··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* 3 + * delaytop.c - system-wide delay monitoring tool. 4 + * 5 + * This tool provides real-time monitoring and statistics of 6 + * system, container, and task-level delays, including CPU, 7 + * memory, IO, and IRQ. It supports both interactive (top-like), 8 + * and can output delay information for the whole system, specific 9 + * containers (cgroups), or individual tasks (PIDs). 10 + * 11 + * Key features: 12 + * - Collects per-task delay accounting statistics via taskstats. 13 + * - Collects system-wide PSI information. 14 + * - Supports sorting, filtering. 15 + * - Supports both interactive (screen refresh). 16 + * 17 + * Copyright (C) Fan Yu, ZTE Corp. 2025 18 + * Copyright (C) Wang Yaxin, ZTE Corp. 2025 19 + * 20 + * Compile with 21 + * gcc -I/usr/src/linux/include delaytop.c -o delaytop 22 + */ 23 + 24 + #include <stdio.h> 25 + #include <stdlib.h> 26 + #include <string.h> 27 + #include <errno.h> 28 + #include <unistd.h> 29 + #include <fcntl.h> 30 + #include <getopt.h> 31 + #include <signal.h> 32 + #include <time.h> 33 + #include <dirent.h> 34 + #include <ctype.h> 35 + #include <stdbool.h> 36 + #include <sys/types.h> 37 + #include <sys/stat.h> 38 + #include <sys/socket.h> 39 + #include <sys/select.h> 40 + #include <termios.h> 41 + #include <limits.h> 42 + #include <linux/genetlink.h> 43 + #include <linux/taskstats.h> 44 + #include <linux/cgroupstats.h> 45 + 46 + #define PSI_CPU_SOME "/proc/pressure/cpu" 47 + #define PSI_CPU_FULL "/proc/pressure/cpu" 48 + #define PSI_MEMORY_SOME "/proc/pressure/memory" 49 + #define PSI_MEMORY_FULL "/proc/pressure/memory" 50 + #define PSI_IO_SOME "/proc/pressure/io" 51 + #define PSI_IO_FULL "/proc/pressure/io" 52 + #define PSI_IRQ_FULL "/proc/pressure/irq" 53 + 54 + #define NLA_NEXT(na) ((struct nlattr *)((char *)(na) + NLA_ALIGN((na)->nla_len))) 55 + #define NLA_DATA(na) ((void *)((char *)(na) + NLA_HDRLEN)) 56 + #define NLA_PAYLOAD(len) (len - NLA_HDRLEN) 57 + 58 + #define 
GENLMSG_DATA(glh) ((void *)(NLMSG_DATA(glh) + GENL_HDRLEN)) 59 + #define GENLMSG_PAYLOAD(glh) (NLMSG_PAYLOAD(glh, 0) - GENL_HDRLEN) 60 + 61 + #define TASK_COMM_LEN 16 62 + #define MAX_MSG_SIZE 1024 63 + #define MAX_TASKS 1000 64 + #define SET_TASK_STAT(task_count, field) tasks[task_count].field = stats.field 65 + #define BOOL_FPRINT(stream, fmt, ...) \ 66 + ({ \ 67 + int ret = fprintf(stream, fmt, ##__VA_ARGS__); \ 68 + ret >= 0; \ 69 + }) 70 + #define PSI_LINE_FORMAT "%-12s %6.1f%%/%6.1f%%/%6.1f%%/%8llu(ms)\n" 71 + 72 + /* Program settings structure */ 73 + struct config { 74 + int delay; /* Update interval in seconds */ 75 + int iterations; /* Number of iterations, 0 == infinite */ 76 + int max_processes; /* Maximum number of processes to show */ 77 + char sort_field; /* Field to sort by */ 78 + int output_one_time; /* Output once and exit */ 79 + int monitor_pid; /* Monitor specific PID */ 80 + char *container_path; /* Path to container cgroup */ 81 + }; 82 + 83 + /* PSI statistics structure */ 84 + struct psi_stats { 85 + double cpu_some_avg10, cpu_some_avg60, cpu_some_avg300; 86 + unsigned long long cpu_some_total; 87 + double cpu_full_avg10, cpu_full_avg60, cpu_full_avg300; 88 + unsigned long long cpu_full_total; 89 + double memory_some_avg10, memory_some_avg60, memory_some_avg300; 90 + unsigned long long memory_some_total; 91 + double memory_full_avg10, memory_full_avg60, memory_full_avg300; 92 + unsigned long long memory_full_total; 93 + double io_some_avg10, io_some_avg60, io_some_avg300; 94 + unsigned long long io_some_total; 95 + double io_full_avg10, io_full_avg60, io_full_avg300; 96 + unsigned long long io_full_total; 97 + double irq_full_avg10, irq_full_avg60, irq_full_avg300; 98 + unsigned long long irq_full_total; 99 + }; 100 + 101 + /* Task delay information structure */ 102 + struct task_info { 103 + int pid; 104 + int tgid; 105 + char command[TASK_COMM_LEN]; 106 + unsigned long long cpu_count; 107 + unsigned long long cpu_delay_total; 108 + 
unsigned long long blkio_count; 109 + unsigned long long blkio_delay_total; 110 + unsigned long long swapin_count; 111 + unsigned long long swapin_delay_total; 112 + unsigned long long freepages_count; 113 + unsigned long long freepages_delay_total; 114 + unsigned long long thrashing_count; 115 + unsigned long long thrashing_delay_total; 116 + unsigned long long compact_count; 117 + unsigned long long compact_delay_total; 118 + unsigned long long wpcopy_count; 119 + unsigned long long wpcopy_delay_total; 120 + unsigned long long irq_count; 121 + unsigned long long irq_delay_total; 122 + }; 123 + 124 + /* Container statistics structure */ 125 + struct container_stats { 126 + int nr_sleeping; /* Number of sleeping processes */ 127 + int nr_running; /* Number of running processes */ 128 + int nr_stopped; /* Number of stopped processes */ 129 + int nr_uninterruptible; /* Number of uninterruptible processes */ 130 + int nr_io_wait; /* Number of processes in IO wait */ 131 + }; 132 + 133 + /* Global variables */ 134 + static struct config cfg; 135 + static struct psi_stats psi; 136 + static struct task_info tasks[MAX_TASKS]; 137 + static int task_count; 138 + static int running = 1; 139 + static struct container_stats container_stats; 140 + 141 + /* Netlink socket variables */ 142 + static int nl_sd = -1; 143 + static int family_id; 144 + 145 + /* Set terminal to non-canonical mode for q-to-quit */ 146 + static struct termios orig_termios; 147 + static void enable_raw_mode(void) 148 + { 149 + struct termios raw; 150 + 151 + tcgetattr(STDIN_FILENO, &orig_termios); 152 + raw = orig_termios; 153 + raw.c_lflag &= ~(ICANON | ECHO); 154 + tcsetattr(STDIN_FILENO, TCSAFLUSH, &raw); 155 + } 156 + static void disable_raw_mode(void) 157 + { 158 + tcsetattr(STDIN_FILENO, TCSAFLUSH, &orig_termios); 159 + } 160 + 161 + /* Display usage information and command line options */ 162 + static void usage(void) 163 + { 164 + printf("Usage: delaytop [Options]\n" 165 + "Options:\n" 166 + " -h, 
--help Show this help message and exit\n" 167 + " -d, --delay=SECONDS Set refresh interval (default: 2 seconds, min: 1)\n" 168 + " -n, --iterations=COUNT Set number of updates (default: 0 = infinite)\n" 169 + " -P, --processes=NUMBER Set maximum number of processes to show (default: 20, max: 1000)\n" 170 + " -o, --once Display once and exit\n" 171 + " -p, --pid=PID Monitor only the specified PID\n" 172 + " -C, --container=PATH Monitor the container at specified cgroup path\n"); 173 + exit(0); 174 + } 175 + 176 + /* Parse command line arguments and set configuration */ 177 + static void parse_args(int argc, char **argv) 178 + { 179 + int c; 180 + struct option long_options[] = { 181 + {"help", no_argument, 0, 'h'}, 182 + {"delay", required_argument, 0, 'd'}, 183 + {"iterations", required_argument, 0, 'n'}, 184 + {"pid", required_argument, 0, 'p'}, 185 + {"once", no_argument, 0, 'o'}, 186 + {"processes", required_argument, 0, 'P'}, 187 + {"container", required_argument, 0, 'C'}, 188 + {0, 0, 0, 0} 189 + }; 190 + 191 + /* Set defaults */ 192 + cfg.delay = 2; 193 + cfg.iterations = 0; 194 + cfg.max_processes = 20; 195 + cfg.sort_field = 'c'; /* Default sort by CPU delay */ 196 + cfg.output_one_time = 0; 197 + cfg.monitor_pid = 0; /* 0 means monitor all PIDs */ 198 + cfg.container_path = NULL; 199 + 200 + while (1) { 201 + int option_index = 0; 202 + 203 + c = getopt_long(argc, argv, "hd:n:p:oP:C:", long_options, &option_index); 204 + if (c == -1) 205 + break; 206 + 207 + switch (c) { 208 + case 'h': 209 + usage(); 210 + break; 211 + case 'd': 212 + cfg.delay = atoi(optarg); 213 + if (cfg.delay < 1) { 214 + fprintf(stderr, "Error: delay must be >= 1.\n"); 215 + exit(1); 216 + } 217 + break; 218 + case 'n': 219 + cfg.iterations = atoi(optarg); 220 + if (cfg.iterations < 0) { 221 + fprintf(stderr, "Error: iterations must be >= 0.\n"); 222 + exit(1); 223 + } 224 + break; 225 + case 'p': 226 + cfg.monitor_pid = atoi(optarg); 227 + if (cfg.monitor_pid < 1) { 228 + 
fprintf(stderr, "Error: pid must be >= 1.\n"); 229 + exit(1); 230 + } 231 + break; 232 + case 'o': 233 + cfg.output_one_time = 1; 234 + break; 235 + case 'P': 236 + cfg.max_processes = atoi(optarg); 237 + if (cfg.max_processes < 1) { 238 + fprintf(stderr, "Error: processes must be >= 1.\n"); 239 + exit(1); 240 + } 241 + if (cfg.max_processes > MAX_TASKS) { 242 + fprintf(stderr, "Warning: processes capped to %d.\n", 243 + MAX_TASKS); 244 + cfg.max_processes = MAX_TASKS; 245 + } 246 + break; 247 + case 'C': 248 + cfg.container_path = strdup(optarg); 249 + break; 250 + default: 251 + fprintf(stderr, "Try 'delaytop --help' for more information.\n"); 252 + exit(1); 253 + } 254 + } 255 + } 256 + 257 + /* Create a raw netlink socket and bind */ 258 + static int create_nl_socket(void) 259 + { 260 + int fd; 261 + struct sockaddr_nl local; 262 + 263 + fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC); 264 + if (fd < 0) 265 + return -1; 266 + 267 + memset(&local, 0, sizeof(local)); 268 + local.nl_family = AF_NETLINK; 269 + 270 + if (bind(fd, (struct sockaddr *) &local, sizeof(local)) < 0) { 271 + fprintf(stderr, "Failed to bind socket when create nl_socket\n"); 272 + close(fd); 273 + return -1; 274 + } 275 + 276 + return fd; 277 + } 278 + 279 + /* Send a command via netlink */ 280 + static int send_cmd(int sd, __u16 nlmsg_type, __u32 nlmsg_pid, 281 + __u8 genl_cmd, __u16 nla_type, 282 + void *nla_data, int nla_len) 283 + { 284 + struct sockaddr_nl nladdr; 285 + struct nlattr *na; 286 + int r, buflen; 287 + char *buf; 288 + 289 + struct { 290 + struct nlmsghdr n; 291 + struct genlmsghdr g; 292 + char buf[MAX_MSG_SIZE]; 293 + } msg; 294 + 295 + msg.n.nlmsg_len = NLMSG_LENGTH(GENL_HDRLEN); 296 + msg.n.nlmsg_type = nlmsg_type; 297 + msg.n.nlmsg_flags = NLM_F_REQUEST; 298 + msg.n.nlmsg_seq = 0; 299 + msg.n.nlmsg_pid = nlmsg_pid; 300 + msg.g.cmd = genl_cmd; 301 + msg.g.version = 0x1; 302 + na = (struct nlattr *) GENLMSG_DATA(&msg); 303 + na->nla_type = nla_type; 304 + na->nla_len 
= nla_len + NLA_HDRLEN; 305 + memcpy(NLA_DATA(na), nla_data, nla_len); 306 + msg.n.nlmsg_len += NLMSG_ALIGN(na->nla_len); 307 + 308 + buf = (char *) &msg; 309 + buflen = msg.n.nlmsg_len; 310 + memset(&nladdr, 0, sizeof(nladdr)); 311 + nladdr.nl_family = AF_NETLINK; 312 + while ((r = sendto(sd, buf, buflen, 0, (struct sockaddr *) &nladdr, 313 + sizeof(nladdr))) < buflen) { 314 + if (r > 0) { 315 + buf += r; 316 + buflen -= r; 317 + } else if (errno != EAGAIN) 318 + return -1; 319 + } 320 + return 0; 321 + } 322 + 323 + /* Get family ID for taskstats via netlink */ 324 + static int get_family_id(int sd) 325 + { 326 + struct { 327 + struct nlmsghdr n; 328 + struct genlmsghdr g; 329 + char buf[256]; 330 + } ans; 331 + 332 + int id = 0, rc; 333 + struct nlattr *na; 334 + int rep_len; 335 + char name[100]; 336 + 337 + strncpy(name, TASKSTATS_GENL_NAME, sizeof(name) - 1); 338 + name[sizeof(name) - 1] = '\0'; 339 + rc = send_cmd(sd, GENL_ID_CTRL, getpid(), CTRL_CMD_GETFAMILY, 340 + CTRL_ATTR_FAMILY_NAME, (void *)name, 341 + strlen(TASKSTATS_GENL_NAME)+1); 342 + if (rc < 0) { 343 + fprintf(stderr, "Failed to send cmd for family id\n"); 344 + return 0; 345 + } 346 + 347 + rep_len = recv(sd, &ans, sizeof(ans), 0); 348 + if (ans.n.nlmsg_type == NLMSG_ERROR || 349 + (rep_len < 0) || !NLMSG_OK((&ans.n), rep_len)) { 350 + fprintf(stderr, "Failed to receive response for family id\n"); 351 + return 0; 352 + } 353 + 354 + na = (struct nlattr *) GENLMSG_DATA(&ans); 355 + na = (struct nlattr *) ((char *) na + NLA_ALIGN(na->nla_len)); 356 + if (na->nla_type == CTRL_ATTR_FAMILY_ID) 357 + id = *(__u16 *) NLA_DATA(na); 358 + return id; 359 + } 360 + 361 + static void read_psi_stats(void) 362 + { 363 + FILE *fp; 364 + char line[256]; 365 + int ret = 0; 366 + /* Zero all fields */ 367 + memset(&psi, 0, sizeof(psi)); 368 + /* CPU pressure */ 369 + fp = fopen(PSI_CPU_SOME, "r"); 370 + if (fp) { 371 + while (fgets(line, sizeof(line), fp)) { 372 + if (strncmp(line, "some", 4) == 0) { 373 + ret 
= sscanf(line, "some avg10=%lf avg60=%lf avg300=%lf total=%llu", 374 + &psi.cpu_some_avg10, &psi.cpu_some_avg60, 375 + &psi.cpu_some_avg300, &psi.cpu_some_total); 376 + if (ret != 4) 377 + fprintf(stderr, "Failed to parse CPU some PSI data\n"); 378 + } else if (strncmp(line, "full", 4) == 0) { 379 + ret = sscanf(line, "full avg10=%lf avg60=%lf avg300=%lf total=%llu", 380 + &psi.cpu_full_avg10, &psi.cpu_full_avg60, 381 + &psi.cpu_full_avg300, &psi.cpu_full_total); 382 + if (ret != 4) 383 + fprintf(stderr, "Failed to parse CPU full PSI data\n"); 384 + } 385 + } 386 + fclose(fp); 387 + } 388 + /* Memory pressure */ 389 + fp = fopen(PSI_MEMORY_SOME, "r"); 390 + if (fp) { 391 + while (fgets(line, sizeof(line), fp)) { 392 + if (strncmp(line, "some", 4) == 0) { 393 + ret = sscanf(line, "some avg10=%lf avg60=%lf avg300=%lf total=%llu", 394 + &psi.memory_some_avg10, &psi.memory_some_avg60, 395 + &psi.memory_some_avg300, &psi.memory_some_total); 396 + if (ret != 4) 397 + fprintf(stderr, "Failed to parse Memory some PSI data\n"); 398 + } else if (strncmp(line, "full", 4) == 0) { 399 + ret = sscanf(line, "full avg10=%lf avg60=%lf avg300=%lf total=%llu", 400 + &psi.memory_full_avg10, &psi.memory_full_avg60, 401 + &psi.memory_full_avg300, &psi.memory_full_total); 402 + if (ret != 4) 403 + fprintf(stderr, "Failed to parse Memory full PSI data\n"); 404 + } 405 + } 406 + fclose(fp); 407 + } 408 + /* IO pressure */ 409 + fp = fopen(PSI_IO_SOME, "r"); 410 + if (fp) { 411 + while (fgets(line, sizeof(line), fp)) { 412 + if (strncmp(line, "some", 4) == 0) { 413 + ret = sscanf(line, "some avg10=%lf avg60=%lf avg300=%lf total=%llu", 414 + &psi.io_some_avg10, &psi.io_some_avg60, 415 + &psi.io_some_avg300, &psi.io_some_total); 416 + if (ret != 4) 417 + fprintf(stderr, "Failed to parse IO some PSI data\n"); 418 + } else if (strncmp(line, "full", 4) == 0) { 419 + ret = sscanf(line, "full avg10=%lf avg60=%lf avg300=%lf total=%llu", 420 + &psi.io_full_avg10, &psi.io_full_avg60, 421 +
&psi.io_full_avg300, &psi.io_full_total); 422 + if (ret != 4) 423 + fprintf(stderr, "Failed to parse IO full PSI data\n"); 424 + } 425 + } 426 + fclose(fp); 427 + } 428 + /* IRQ pressure (only full) */ 429 + fp = fopen(PSI_IRQ_FULL, "r"); 430 + if (fp) { 431 + while (fgets(line, sizeof(line), fp)) { 432 + if (strncmp(line, "full", 4) == 0) { 433 + ret = sscanf(line, "full avg10=%lf avg60=%lf avg300=%lf total=%llu", 434 + &psi.irq_full_avg10, &psi.irq_full_avg60, 435 + &psi.irq_full_avg300, &psi.irq_full_total); 436 + if (ret != 4) 437 + fprintf(stderr, "Failed to parse IRQ full PSI data\n"); 438 + } 439 + } 440 + fclose(fp); 441 + } 442 + } 443 + 444 + static int read_comm(int pid, char *comm_buf, size_t buf_size) 445 + { 446 + char path[64]; 447 + int ret = -1; 448 + size_t len; 449 + FILE *fp; 450 + 451 + snprintf(path, sizeof(path), "/proc/%d/comm", pid); 452 + fp = fopen(path, "r"); 453 + if (!fp) { 454 + fprintf(stderr, "Failed to open comm file /proc/%d/comm\n", pid); 455 + return ret; 456 + } 457 + 458 + if (fgets(comm_buf, buf_size, fp)) { 459 + len = strlen(comm_buf); 460 + if (len > 0 && comm_buf[len - 1] == '\n') 461 + comm_buf[len - 1] = '\0'; 462 + ret = 0; 463 + } 464 + 465 + fclose(fp); 466 + 467 + return ret; 468 + } 469 + 470 + static void fetch_and_fill_task_info(int pid, const char *comm) 471 + { 472 + struct { 473 + struct nlmsghdr n; 474 + struct genlmsghdr g; 475 + char buf[MAX_MSG_SIZE]; 476 + } resp; 477 + struct taskstats stats; 478 + struct nlattr *nested; 479 + struct nlattr *na; 480 + int nested_len; 481 + int nl_len; 482 + int rc; 483 + 484 + /* Send request for task stats */ 485 + if (send_cmd(nl_sd, family_id, getpid(), TASKSTATS_CMD_GET, 486 + TASKSTATS_CMD_ATTR_PID, &pid, sizeof(pid)) < 0) { 487 + fprintf(stderr, "Failed to send request for task stats\n"); 488 + return; 489 + } 490 + 491 + /* Receive response */ 492 + rc = recv(nl_sd, &resp, sizeof(resp), 0); 493 + if (rc < 0 || resp.n.nlmsg_type == NLMSG_ERROR) { 494 + 
fprintf(stderr, "Failed to receive response for task stats\n"); 495 + return; 496 + } 497 + 498 + /* Parse response */ 499 + nl_len = GENLMSG_PAYLOAD(&resp.n); 500 + na = (struct nlattr *) GENLMSG_DATA(&resp); 501 + while (nl_len > 0) { 502 + if (na->nla_type == TASKSTATS_TYPE_AGGR_PID) { 503 + nested = (struct nlattr *) NLA_DATA(na); 504 + nested_len = NLA_PAYLOAD(na->nla_len); 505 + while (nested_len > 0) { 506 + if (nested->nla_type == TASKSTATS_TYPE_STATS) { 507 + memcpy(&stats, NLA_DATA(nested), sizeof(stats)); 508 + if (task_count < MAX_TASKS) { 509 + tasks[task_count].pid = pid; 510 + tasks[task_count].tgid = pid; 511 + strncpy(tasks[task_count].command, comm, 512 + TASK_COMM_LEN - 1); 513 + tasks[task_count].command[TASK_COMM_LEN - 1] = '\0'; 514 + SET_TASK_STAT(task_count, cpu_count); 515 + SET_TASK_STAT(task_count, cpu_delay_total); 516 + SET_TASK_STAT(task_count, blkio_count); 517 + SET_TASK_STAT(task_count, blkio_delay_total); 518 + SET_TASK_STAT(task_count, swapin_count); 519 + SET_TASK_STAT(task_count, swapin_delay_total); 520 + SET_TASK_STAT(task_count, freepages_count); 521 + SET_TASK_STAT(task_count, freepages_delay_total); 522 + SET_TASK_STAT(task_count, thrashing_count); 523 + SET_TASK_STAT(task_count, thrashing_delay_total); 524 + SET_TASK_STAT(task_count, compact_count); 525 + SET_TASK_STAT(task_count, compact_delay_total); 526 + SET_TASK_STAT(task_count, wpcopy_count); 527 + SET_TASK_STAT(task_count, wpcopy_delay_total); 528 + SET_TASK_STAT(task_count, irq_count); 529 + SET_TASK_STAT(task_count, irq_delay_total); 530 + task_count++; 531 + } 532 + break; 533 + } 534 + nested_len -= NLA_ALIGN(nested->nla_len); 535 + nested = NLA_NEXT(nested); 536 + } 537 + } 538 + nl_len -= NLA_ALIGN(na->nla_len); 539 + na = NLA_NEXT(na); 540 + } 541 + return; 542 + } 543 + 544 + static void get_task_delays(void) 545 + { 546 + char comm[TASK_COMM_LEN]; 547 + struct dirent *entry; 548 + DIR *dir; 549 + int pid; 550 + 551 + task_count = 0; 552 + if 
(cfg.monitor_pid > 0) { 553 + if (read_comm(cfg.monitor_pid, comm, sizeof(comm)) == 0) 554 + fetch_and_fill_task_info(cfg.monitor_pid, comm); 555 + return; 556 + } 557 + 558 + dir = opendir("/proc"); 559 + if (!dir) { 560 + fprintf(stderr, "Error opening /proc directory\n"); 561 + return; 562 + } 563 + 564 + while ((entry = readdir(dir)) != NULL && task_count < MAX_TASKS) { 565 + if (!isdigit(entry->d_name[0])) 566 + continue; 567 + pid = atoi(entry->d_name); 568 + if (pid == 0) 569 + continue; 570 + if (read_comm(pid, comm, sizeof(comm)) != 0) 571 + continue; 572 + fetch_and_fill_task_info(pid, comm); 573 + } 574 + closedir(dir); 575 + } 576 + 577 + /* Calculate average delay in milliseconds */ 578 + static double average_ms(unsigned long long total, unsigned long long count) 579 + { 580 + if (count == 0) 581 + return 0; 582 + return (double)total / 1000000.0 / count; 583 + } 584 + 585 + /* Comparison function for sorting tasks */ 586 + static int compare_tasks(const void *a, const void *b) 587 + { 588 + const struct task_info *t1 = (const struct task_info *)a; 589 + const struct task_info *t2 = (const struct task_info *)b; 590 + double avg1, avg2; 591 + 592 + switch (cfg.sort_field) { 593 + case 'c': /* CPU */ 594 + avg1 = average_ms(t1->cpu_delay_total, t1->cpu_count); 595 + avg2 = average_ms(t2->cpu_delay_total, t2->cpu_count); 596 + if (avg1 != avg2) 597 + return avg2 > avg1 ? 1 : -1; 598 + return t2->cpu_delay_total > t1->cpu_delay_total ? 1 : -1; 599 + 600 + default: 601 + return t2->cpu_delay_total > t1->cpu_delay_total ? 
1 : -1; 602 + } 603 + } 604 + 605 + /* Sort tasks by selected field */ 606 + static void sort_tasks(void) 607 + { 608 + if (task_count > 0) 609 + qsort(tasks, task_count, sizeof(struct task_info), compare_tasks); 610 + } 611 + 612 + /* Get container statistics via cgroupstats */ 613 + static void get_container_stats(void) 614 + { 615 + int rc, cfd; 616 + struct { 617 + struct nlmsghdr n; 618 + struct genlmsghdr g; 619 + char buf[MAX_MSG_SIZE]; 620 + } req, resp; 621 + struct nlattr *na; 622 + int nl_len; 623 + struct cgroupstats stats; 624 + 625 + /* Check if container path is set */ 626 + if (!cfg.container_path) 627 + return; 628 + 629 + /* Open container cgroup */ 630 + cfd = open(cfg.container_path, O_RDONLY); 631 + if (cfd < 0) { 632 + fprintf(stderr, "Error opening container path: %s\n", cfg.container_path); 633 + return; 634 + } 635 + 636 + /* Send request for container stats */ 637 + if (send_cmd(nl_sd, family_id, getpid(), CGROUPSTATS_CMD_GET, 638 + CGROUPSTATS_CMD_ATTR_FD, &cfd, sizeof(__u32)) < 0) { 639 + fprintf(stderr, "Failed to send request for container stats\n"); 640 + close(cfd); 641 + return; 642 + } 643 + 644 + /* Receive response */ 645 + rc = recv(nl_sd, &resp, sizeof(resp), 0); 646 + if (rc < 0 || resp.n.nlmsg_type == NLMSG_ERROR) { 647 + fprintf(stderr, "Failed to receive response for container stats\n"); 648 + close(cfd); 649 + return; 650 + } 651 + 652 + /* Parse response */ 653 + nl_len = GENLMSG_PAYLOAD(&resp.n); 654 + na = (struct nlattr *) GENLMSG_DATA(&resp); 655 + while (nl_len > 0) { 656 + if (na->nla_type == CGROUPSTATS_TYPE_CGROUP_STATS) { 657 + /* Get the cgroupstats structure */ 658 + memcpy(&stats, NLA_DATA(na), sizeof(stats)); 659 + 660 + /* Fill container stats */ 661 + container_stats.nr_sleeping = stats.nr_sleeping; 662 + container_stats.nr_running = stats.nr_running; 663 + container_stats.nr_stopped = stats.nr_stopped; 664 + container_stats.nr_uninterruptible = stats.nr_uninterruptible; 665 + container_stats.nr_io_wait = 
stats.nr_io_wait; 666 + break; 667 + } 668 + nl_len -= NLA_ALIGN(na->nla_len); 669 + na = (struct nlattr *) ((char *) na + NLA_ALIGN(na->nla_len)); 670 + } 671 + 672 + close(cfd); 673 + } 674 + 675 + /* Display results to stdout or log file */ 676 + static void display_results(void) 677 + { 678 + time_t now = time(NULL); 679 + struct tm *tm_now = localtime(&now); 680 + FILE *out = stdout; 681 + char timestamp[32]; 682 + bool suc = true; 683 + int i, count; 684 + 685 + /* Clear terminal screen */ 686 + suc &= BOOL_FPRINT(out, "\033[H\033[J"); 687 + 688 + /* PSI output (one-line, no cat style) */ 689 + suc &= BOOL_FPRINT(out, "System Pressure Information: (avg10/avg60/avg300/total)\n"); 690 + suc &= BOOL_FPRINT(out, PSI_LINE_FORMAT, 691 + "CPU some:", 692 + psi.cpu_some_avg10, 693 + psi.cpu_some_avg60, 694 + psi.cpu_some_avg300, 695 + psi.cpu_some_total / 1000); 696 + suc &= BOOL_FPRINT(out, PSI_LINE_FORMAT, 697 + "CPU full:", 698 + psi.cpu_full_avg10, 699 + psi.cpu_full_avg60, 700 + psi.cpu_full_avg300, 701 + psi.cpu_full_total / 1000); 702 + suc &= BOOL_FPRINT(out, PSI_LINE_FORMAT, 703 + "Memory full:", 704 + psi.memory_full_avg10, 705 + psi.memory_full_avg60, 706 + psi.memory_full_avg300, 707 + psi.memory_full_total / 1000); 708 + suc &= BOOL_FPRINT(out, PSI_LINE_FORMAT, 709 + "Memory some:", 710 + psi.memory_some_avg10, 711 + psi.memory_some_avg60, 712 + psi.memory_some_avg300, 713 + psi.memory_some_total / 1000); 714 + suc &= BOOL_FPRINT(out, PSI_LINE_FORMAT, 715 + "IO full:", 716 + psi.io_full_avg10, 717 + psi.io_full_avg60, 718 + psi.io_full_avg300, 719 + psi.io_full_total / 1000); 720 + suc &= BOOL_FPRINT(out, PSI_LINE_FORMAT, 721 + "IO some:", 722 + psi.io_some_avg10, 723 + psi.io_some_avg60, 724 + psi.io_some_avg300, 725 + psi.io_some_total / 1000); 726 + suc &= BOOL_FPRINT(out, PSI_LINE_FORMAT, 727 + "IRQ full:", 728 + psi.irq_full_avg10, 729 + psi.irq_full_avg60, 730 + psi.irq_full_avg300, 731 + psi.irq_full_total / 1000); 732 + 733 + if 
(cfg.container_path) { 734 + suc &= BOOL_FPRINT(out, "Container Information (%s):\n", cfg.container_path); 735 + suc &= BOOL_FPRINT(out, "Processes: running=%d, sleeping=%d, ", 736 + container_stats.nr_running, container_stats.nr_sleeping); 737 + suc &= BOOL_FPRINT(out, "stopped=%d, uninterruptible=%d, io_wait=%d\n\n", 738 + container_stats.nr_stopped, container_stats.nr_uninterruptible, 739 + container_stats.nr_io_wait); 740 + } 741 + suc &= BOOL_FPRINT(out, "Top %d processes (sorted by CPU delay):\n", 742 + cfg.max_processes); 743 + suc &= BOOL_FPRINT(out, "%5s %5s %-17s", "PID", "TGID", "COMMAND"); 744 + suc &= BOOL_FPRINT(out, "%7s %7s %7s %7s %7s %7s %7s %7s\n", 745 + "CPU(ms)", "IO(ms)", "SWAP(ms)", "RCL(ms)", 746 + "THR(ms)", "CMP(ms)", "WP(ms)", "IRQ(ms)"); 747 + 748 + suc &= BOOL_FPRINT(out, "-----------------------------------------------"); 749 + suc &= BOOL_FPRINT(out, "----------------------------------------------\n"); 750 + count = task_count < cfg.max_processes ? task_count : cfg.max_processes; 751 + 752 + for (i = 0; i < count; i++) { 753 + suc &= BOOL_FPRINT(out, "%5d %5d %-15s", 754 + tasks[i].pid, tasks[i].tgid, tasks[i].command); 755 + suc &= BOOL_FPRINT(out, "%7.2f %7.2f %7.2f %7.2f %7.2f %7.2f %7.2f %7.2f\n", 756 + average_ms(tasks[i].cpu_delay_total, tasks[i].cpu_count), 757 + average_ms(tasks[i].blkio_delay_total, tasks[i].blkio_count), 758 + average_ms(tasks[i].swapin_delay_total, tasks[i].swapin_count), 759 + average_ms(tasks[i].freepages_delay_total, tasks[i].freepages_count), 760 + average_ms(tasks[i].thrashing_delay_total, tasks[i].thrashing_count), 761 + average_ms(tasks[i].compact_delay_total, tasks[i].compact_count), 762 + average_ms(tasks[i].wpcopy_delay_total, tasks[i].wpcopy_count), 763 + average_ms(tasks[i].irq_delay_total, tasks[i].irq_count)); 764 + } 765 + 766 + suc &= BOOL_FPRINT(out, "\n"); 767 + 768 + if (!suc) 769 + perror("Error writing to output"); 770 + } 771 + 772 + /* Main function */ 773 + int main(int argc, char 
**argv) 774 + { 775 + int iterations = 0; 776 + int use_q_quit = 0; 777 + 778 + /* Parse command line arguments */ 779 + parse_args(argc, argv); 780 + 781 + /* Setup netlink socket */ 782 + nl_sd = create_nl_socket(); 783 + if (nl_sd < 0) { 784 + fprintf(stderr, "Error creating netlink socket\n"); 785 + exit(1); 786 + } 787 + 788 + /* Get family ID for taskstats via netlink */ 789 + family_id = get_family_id(nl_sd); 790 + if (!family_id) { 791 + fprintf(stderr, "Error getting taskstats family ID\n"); 792 + close(nl_sd); 793 + exit(1); 794 + } 795 + 796 + if (!cfg.output_one_time) { 797 + use_q_quit = 1; 798 + enable_raw_mode(); 799 + printf("Press 'q' to quit.\n"); 800 + fflush(stdout); 801 + } 802 + 803 + /* Main loop */ 804 + while (running) { 805 + /* Read PSI statistics */ 806 + read_psi_stats(); 807 + 808 + /* Get container stats if container path provided */ 809 + if (cfg.container_path) 810 + get_container_stats(); 811 + 812 + /* Get task delays */ 813 + get_task_delays(); 814 + 815 + /* Sort tasks */ 816 + sort_tasks(); 817 + 818 + /* Display results to stdout or log file */ 819 + display_results(); 820 + 821 + /* Check for iterations */ 822 + if (cfg.iterations > 0 && ++iterations >= cfg.iterations) 823 + break; 824 + 825 + /* Exit if output_one_time is set */ 826 + if (cfg.output_one_time) 827 + break; 828 + 829 + /* Check for 'q' key to quit */ 830 + if (use_q_quit) { 831 + struct timeval tv = {cfg.delay, 0}; 832 + fd_set readfds; 833 + 834 + FD_ZERO(&readfds); 835 + FD_SET(STDIN_FILENO, &readfds); 836 + int r = select(STDIN_FILENO+1, &readfds, NULL, NULL, &tv); 837 + 838 + if (r > 0 && FD_ISSET(STDIN_FILENO, &readfds)) { 839 + char ch = 0; 840 + 841 + read(STDIN_FILENO, &ch, 1); 842 + if (ch == 'q' || ch == 'Q') { 843 + running = 0; 844 + break; 845 + } 846 + } 847 + } else { 848 + sleep(cfg.delay); 849 + } 850 + } 851 + 852 + /* Restore terminal mode */ 853 + if (use_q_quit) 854 + disable_raw_mode(); 855 + 856 + /* Cleanup */ 857 + close(nl_sd); 858 + 
if (cfg.container_path) 859 + free(cfg.container_path); 860 + 861 + return 0; 862 + }

+100 -67

tools/accounting/getdelays.c

··· 194 194 #define average_ms(t, c) (t / 1000000ULL / (c ? c : 1)) 195 195 #define delay_ms(t) (t / 1000000ULL) 196 196 197 + /* 198 + * Version compatibility note: 199 + * Field availability depends on taskstats version (t->version), 200 + * which corresponds to TASKSTATS_VERSION in the kernel headers 201 + * (see include/uapi/linux/taskstats.h). 202 + * 203 + * Version feature mapping: 204 + * version >= 11 - supports COMPACT statistics 205 + * version >= 13 - supports WPCOPY statistics 206 + * version >= 14 - supports IRQ statistics 207 + * version >= 16 - supports *_max and *_min delay statistics 208 + * 209 + * Always verify version before accessing version-dependent fields 210 + * to maintain backward compatibility. 211 + */ 212 + #define PRINT_CPU_DELAY(version, t) \ 213 + do { \ 214 + if (version >= 16) { \ 215 + printf("%-10s%15s%15s%15s%15s%15s%15s%15s\n", \ 216 + "CPU", "count", "real total", "virtual total", \ 217 + "delay total", "delay average", "delay max", "delay min"); \ 218 + printf(" %15llu%15llu%15llu%15llu%15.3fms%13.6fms%13.6fms\n", \ 219 + (unsigned long long)(t)->cpu_count, \ 220 + (unsigned long long)(t)->cpu_run_real_total, \ 221 + (unsigned long long)(t)->cpu_run_virtual_total, \ 222 + (unsigned long long)(t)->cpu_delay_total, \ 223 + average_ms((double)(t)->cpu_delay_total, (t)->cpu_count), \ 224 + delay_ms((double)(t)->cpu_delay_max), \ 225 + delay_ms((double)(t)->cpu_delay_min)); \ 226 + } else { \ 227 + printf("%-10s%15s%15s%15s%15s%15s\n", \ 228 + "CPU", "count", "real total", "virtual total", \ 229 + "delay total", "delay average"); \ 230 + printf(" %15llu%15llu%15llu%15llu%15.3fms\n", \ 231 + (unsigned long long)(t)->cpu_count, \ 232 + (unsigned long long)(t)->cpu_run_real_total, \ 233 + (unsigned long long)(t)->cpu_run_virtual_total, \ 234 + (unsigned long long)(t)->cpu_delay_total, \ 235 + average_ms((double)(t)->cpu_delay_total, (t)->cpu_count)); \ 236 + } \ 237 + } while (0) 238 + #define PRINT_FILED_DELAY(name, version, t, count, total,
max, min) \ 239 + do { \ 240 + if (version >= 16) { \ 241 + printf("%-10s%15s%15s%15s%15s%15s\n", \ 242 + name, "count", "delay total", "delay average", \ 243 + "delay max", "delay min"); \ 244 + printf(" %15llu%15llu%15.3fms%13.6fms%13.6fms\n", \ 245 + (unsigned long long)(t)->count, \ 246 + (unsigned long long)(t)->total, \ 247 + average_ms((double)(t)->total, (t)->count), \ 248 + delay_ms((double)(t)->max), \ 249 + delay_ms((double)(t)->min)); \ 250 + } else { \ 251 + printf("%-10s%15s%15s%15s\n", \ 252 + name, "count", "delay total", "delay average"); \ 253 + printf(" %15llu%15llu%15.3fms\n", \ 254 + (unsigned long long)(t)->count, \ 255 + (unsigned long long)(t)->total, \ 256 + average_ms((double)(t)->total, (t)->count)); \ 257 + } \ 258 + } while (0) 259 + 197 260 static void print_delayacct(struct taskstats *t) 198 261 { 199 - printf("\n\nCPU %15s%15s%15s%15s%15s%15s%15s\n" 200 - " %15llu%15llu%15llu%15llu%15.3fms%13.6fms%13.6fms\n" 201 - "IO %15s%15s%15s%15s%15s\n" 202 - " %15llu%15llu%15.3fms%13.6fms%13.6fms\n" 203 - "SWAP %15s%15s%15s%15s%15s\n" 204 - " %15llu%15llu%15.3fms%13.6fms%13.6fms\n" 205 - "RECLAIM %12s%15s%15s%15s%15s\n" 206 - " %15llu%15llu%15.3fms%13.6fms%13.6fms\n" 207 - "THRASHING%12s%15s%15s%15s%15s\n" 208 - " %15llu%15llu%15.3fms%13.6fms%13.6fms\n" 209 - "COMPACT %12s%15s%15s%15s%15s\n" 210 - " %15llu%15llu%15.3fms%13.6fms%13.6fms\n" 211 - "WPCOPY %12s%15s%15s%15s%15s\n" 212 - " %15llu%15llu%15.3fms%13.6fms%13.6fms\n" 213 - "IRQ %15s%15s%15s%15s%15s\n" 214 - " %15llu%15llu%15.3fms%13.6fms%13.6fms\n", 215 - "count", "real total", "virtual total", 216 - "delay total", "delay average", "delay max", "delay min", 217 - (unsigned long long)t->cpu_count, 218 - (unsigned long long)t->cpu_run_real_total, 219 - (unsigned long long)t->cpu_run_virtual_total, 220 - (unsigned long long)t->cpu_delay_total, 221 - average_ms((double)t->cpu_delay_total, t->cpu_count), 222 - delay_ms((double)t->cpu_delay_max), 223 - delay_ms((double)t->cpu_delay_min), 224 - 
"count", "delay total", "delay average", "delay max", "delay min", 225 - (unsigned long long)t->blkio_count, 226 - (unsigned long long)t->blkio_delay_total, 227 - average_ms((double)t->blkio_delay_total, t->blkio_count), 228 - delay_ms((double)t->blkio_delay_max), 229 - delay_ms((double)t->blkio_delay_min), 230 - "count", "delay total", "delay average", "delay max", "delay min", 231 - (unsigned long long)t->swapin_count, 232 - (unsigned long long)t->swapin_delay_total, 233 - average_ms((double)t->swapin_delay_total, t->swapin_count), 234 - delay_ms((double)t->swapin_delay_max), 235 - delay_ms((double)t->swapin_delay_min), 236 - "count", "delay total", "delay average", "delay max", "delay min", 237 - (unsigned long long)t->freepages_count, 238 - (unsigned long long)t->freepages_delay_total, 239 - average_ms((double)t->freepages_delay_total, t->freepages_count), 240 - delay_ms((double)t->freepages_delay_max), 241 - delay_ms((double)t->freepages_delay_min), 242 - "count", "delay total", "delay average", "delay max", "delay min", 243 - (unsigned long long)t->thrashing_count, 244 - (unsigned long long)t->thrashing_delay_total, 245 - average_ms((double)t->thrashing_delay_total, t->thrashing_count), 246 - delay_ms((double)t->thrashing_delay_max), 247 - delay_ms((double)t->thrashing_delay_min), 248 - "count", "delay total", "delay average", "delay max", "delay min", 249 - (unsigned long long)t->compact_count, 250 - (unsigned long long)t->compact_delay_total, 251 - average_ms((double)t->compact_delay_total, t->compact_count), 252 - delay_ms((double)t->compact_delay_max), 253 - delay_ms((double)t->compact_delay_min), 254 - "count", "delay total", "delay average", "delay max", "delay min", 255 - (unsigned long long)t->wpcopy_count, 256 - (unsigned long long)t->wpcopy_delay_total, 257 - average_ms((double)t->wpcopy_delay_total, t->wpcopy_count), 258 - delay_ms((double)t->wpcopy_delay_max), 259 - delay_ms((double)t->wpcopy_delay_min), 260 - "count", "delay total", "delay 
average", "delay max", "delay min", 261 - (unsigned long long)t->irq_count, 262 - (unsigned long long)t->irq_delay_total, 263 - average_ms((double)t->irq_delay_total, t->irq_count), 264 - delay_ms((double)t->irq_delay_max), 265 - delay_ms((double)t->irq_delay_min)); 262 + printf("\n\n"); 263 + 264 + PRINT_CPU_DELAY(t->version, t); 265 + 266 + PRINT_FILED_DELAY("IO", t->version, t, 267 + blkio_count, blkio_delay_total, 268 + blkio_delay_max, blkio_delay_min); 269 + 270 + PRINT_FILED_DELAY("SWAP", t->version, t, 271 + swapin_count, swapin_delay_total, 272 + swapin_delay_max, swapin_delay_min); 273 + 274 + PRINT_FILED_DELAY("RECLAIM", t->version, t, 275 + freepages_count, freepages_delay_total, 276 + freepages_delay_max, freepages_delay_min); 277 + 278 + PRINT_FILED_DELAY("THRASHING", t->version, t, 279 + thrashing_count, thrashing_delay_total, 280 + thrashing_delay_max, thrashing_delay_min); 281 + 282 + if (t->version >= 11) { 283 + PRINT_FILED_DELAY("COMPACT", t->version, t, 284 + compact_count, compact_delay_total, 285 + compact_delay_max, compact_delay_min); 286 + } 287 + 288 + if (t->version >= 13) { 289 + PRINT_FILED_DELAY("WPCOPY", t->version, t, 290 + wpcopy_count, wpcopy_delay_total, 291 + wpcopy_delay_max, wpcopy_delay_min); 292 + } 293 + 294 + if (t->version >= 14) { 295 + PRINT_FILED_DELAY("IRQ", t->version, t, 296 + irq_count, irq_delay_total, 297 + irq_delay_max, irq_delay_min); 298 + } 266 299 } 267 300 268 301 static void task_context_switch_counts(struct taskstats *t)

+9

tools/testing/selftests/kho/arm64.conf

··· 1 + QEMU_CMD="qemu-system-aarch64 -M virt -cpu max" 2 + QEMU_KCONFIG=" 3 + CONFIG_SERIAL_AMBA_PL010=y 4 + CONFIG_SERIAL_AMBA_PL010_CONSOLE=y 5 + CONFIG_SERIAL_AMBA_PL011=y 6 + CONFIG_SERIAL_AMBA_PL011_CONSOLE=y 7 + " 8 + KERNEL_IMAGE="Image" 9 + KERNEL_CMDLINE="console=ttyAMA0"

+100

tools/testing/selftests/kho/init.c

··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + 3 + #ifndef NOLIBC 4 + #include <errno.h> 5 + #include <stdio.h> 6 + #include <unistd.h> 7 + #include <fcntl.h> 8 + #include <syscall.h> 9 + #include <sys/mount.h> 10 + #include <sys/reboot.h> 11 + #endif 12 + 13 + /* from arch/x86/include/asm/setup.h */ 14 + #define COMMAND_LINE_SIZE 2048 15 + 16 + /* from include/linux/kexex.h */ 17 + #define KEXEC_FILE_NO_INITRAMFS 0x00000004 18 + 19 + #define KHO_FINILIZE "/debugfs/kho/out/finalize" 20 + #define KERNEL_IMAGE "/kernel" 21 + 22 + static int mount_filesystems(void) 23 + { 24 + if (mount("debugfs", "/debugfs", "debugfs", 0, NULL) < 0) 25 + return -1; 26 + 27 + return mount("proc", "/proc", "proc", 0, NULL); 28 + } 29 + 30 + static int kho_enable(void) 31 + { 32 + const char enable[] = "1"; 33 + int fd; 34 + 35 + fd = open(KHO_FINILIZE, O_RDWR); 36 + if (fd < 0) 37 + return -1; 38 + 39 + if (write(fd, enable, sizeof(enable)) != sizeof(enable)) 40 + return 1; 41 + 42 + close(fd); 43 + return 0; 44 + } 45 + 46 + static long kexec_file_load(int kernel_fd, int initrd_fd, 47 + unsigned long cmdline_len, const char *cmdline, 48 + unsigned long flags) 49 + { 50 + return syscall(__NR_kexec_file_load, kernel_fd, initrd_fd, cmdline_len, 51 + cmdline, flags); 52 + } 53 + 54 + static int kexec_load(void) 55 + { 56 + char cmdline[COMMAND_LINE_SIZE]; 57 + ssize_t len; 58 + int fd, err; 59 + 60 + fd = open("/proc/cmdline", O_RDONLY); 61 + if (fd < 0) 62 + return -1; 63 + 64 + len = read(fd, cmdline, sizeof(cmdline)); 65 + close(fd); 66 + if (len < 0) 67 + return -1; 68 + 69 + /* replace \n with \0 */ 70 + cmdline[len - 1] = 0; 71 + fd = open(KERNEL_IMAGE, O_RDONLY); 72 + if (fd < 0) 73 + return -1; 74 + 75 + err = kexec_file_load(fd, -1, len, cmdline, KEXEC_FILE_NO_INITRAMFS); 76 + close(fd); 77 + 78 + return err ? : 0; 79 + } 80 + 81 + int main(int argc, char *argv[]) 82 + { 83 + if (mount_filesystems()) 84 + goto err_reboot; 85 + 86 + if (kho_enable()) 87 + goto err_reboot; 88 + 89 + if (kexec_load()) 90 + goto err_reboot; 91 + 92 + if (reboot(RB_KEXEC)) 93 + goto err_reboot; 94 + 95 + return 0; 96 + 97 + err_reboot: 98 + reboot(RB_AUTOBOOT); 99 + return -1; 100 + }
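`kexec_load()` in init.c overwrites the last byte read from /proc/cmdline with a NUL and passes the read length as `cmdline_len`. That satisfies kexec_file_load(2), which requires the last byte of the command-line buffer to be a null byte, as long as the read ended in a newline. The following is a hedged sketch of a more defensive variant that also rejects an empty read and terminates a line with no trailing newline. `terminate_cmdline()` is a hypothetical helper, not part of the selftest.

```c
#include <stddef.h>
#include <sys/types.h>

/*
 * Hypothetical helper: make the bytes read from /proc/cmdline end in '\0'
 * and return the value to pass as cmdline_len (terminator included),
 * or -1 when the buffer cannot be terminated.
 */
static ssize_t terminate_cmdline(char *buf, size_t cap, ssize_t len)
{
	if (len <= 0 || (size_t)len > cap)
		return -1;		/* empty or invalid read */

	if (buf[len - 1] == '\n') {
		buf[len - 1] = '\0';	/* same in-place swap as init.c */
		return len;
	}

	if ((size_t)len == cap)
		return -1;		/* no room left for the terminator */

	buf[len] = '\0';		/* no newline: append the terminator */
	return len + 1;
}
```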

+183

tools/testing/selftests/kho/vmtest.sh

··· 1 + #!/bin/bash 2 + # SPDX-License-Identifier: GPL-2.0 3 + 4 + set -ue 5 + 6 + CROSS_COMPILE="${CROSS_COMPILE:-""}" 7 + 8 + test_dir=$(realpath "$(dirname "$0")") 9 + kernel_dir=$(realpath "$test_dir/../../../..") 10 + 11 + tmp_dir=$(mktemp -d /tmp/kho-test.XXXXXXXX) 12 + headers_dir="$tmp_dir/usr" 13 + initrd_dir="$tmp_dir/initrd" 14 + initrd="$tmp_dir/initrd.cpio" 15 + 16 + source "$test_dir/../kselftest/ktap_helpers.sh" 17 + 18 + function usage() { 19 + cat <<EOF 20 + $0 [-d build_dir] [-j jobs] [-t target_arch] [-h] 21 + Options: 22 + -d) path to the kernel build directory 23 + -j) number of jobs for compilation, similar to -j in make 24 + -t) run test for target_arch, requires CROSS_COMPILE set 25 + supported targets: aarch64, x86_64 26 + -h) display this help 27 + EOF 28 + } 29 + 30 + function cleanup() { 31 + rm -fr "$tmp_dir" 32 + ktap_finished 33 + } 34 + trap cleanup EXIT 35 + 36 + function skip() { 37 + local msg=${1:-""} 38 + 39 + ktap_test_skip "$msg" 40 + exit "$KSFT_SKIP" 41 + } 42 + 43 + function fail() { 44 + local msg=${1:-""} 45 + 46 + ktap_test_fail "$msg" 47 + exit "$KSFT_FAIL" 48 + } 49 + 50 + function build_kernel() { 51 + local build_dir=$1 52 + local make_cmd=$2 53 + local arch_kconfig=$3 54 + local kimage=$4 55 + 56 + local kho_config="$tmp_dir/kho.config" 57 + local kconfig="$build_dir/.config" 58 + 59 + # enable initrd, KHO and KHO test in kernel configuration 60 + tee "$kconfig" > "$kho_config" <<EOF 61 + CONFIG_BLK_DEV_INITRD=y 62 + CONFIG_KEXEC_HANDOVER=y 63 + CONFIG_TEST_KEXEC_HANDOVER=y 64 + CONFIG_DEBUG_KERNEL=y 65 + CONFIG_DEBUG_VM=y 66 + $arch_kconfig 67 + EOF 68 + 69 + make_cmd="$make_cmd -C $kernel_dir O=$build_dir" 70 + $make_cmd olddefconfig 71 + 72 + # verify that kernel confiration has all necessary options 73 + while read -r opt ; do 74 + grep "$opt" "$kconfig" &>/dev/null || skip "$opt is missing" 75 + done < "$kho_config" 76 + 77 + $make_cmd "$kimage" 78 + $make_cmd headers_install INSTALL_HDR_PATH="$headers_dir" 79 
+ } 80 + 81 + function mkinitrd() { 82 + local kernel=$1 83 + 84 + mkdir -p "$initrd_dir"/{dev,debugfs,proc} 85 + sudo mknod "$initrd_dir/dev/console" c 5 1 86 + 87 + "$CROSS_COMPILE"gcc -s -static -Os -nostdinc -I"$headers_dir/include" \ 88 + -fno-asynchronous-unwind-tables -fno-ident -nostdlib \ 89 + -include "$test_dir/../../../include/nolibc/nolibc.h" \ 90 + -o "$initrd_dir/init" "$test_dir/init.c" \ 91 + 92 + cp "$kernel" "$initrd_dir/kernel" 93 + 94 + pushd "$initrd_dir" &>/dev/null 95 + find . | cpio -H newc --create > "$initrd" 2>/dev/null 96 + popd &>/dev/null 97 + } 98 + 99 + function run_qemu() { 100 + local qemu_cmd=$1 101 + local cmdline=$2 102 + local kernel=$3 103 + local serial="$tmp_dir/qemu.serial" 104 + 105 + cmdline="$cmdline kho=on panic=-1" 106 + 107 + $qemu_cmd -m 1G -smp 2 -no-reboot -nographic -nodefaults \ 108 + -accel kvm -accel hvf -accel tcg \ 109 + -serial file:"$serial" \ 110 + -append "$cmdline" \ 111 + -kernel "$kernel" \ 112 + -initrd "$initrd" 113 + 114 + grep "KHO restore succeeded" "$serial" &> /dev/null || fail "KHO failed" 115 + } 116 + 117 + function target_to_arch() { 118 + local target=$1 119 + 120 + case $target in 121 + aarch64) echo "arm64" ;; 122 + x86_64) echo "x86" ;; 123 + *) skip "architecture $target is not supported" 124 + esac 125 + } 126 + 127 + function main() { 128 + local build_dir="$kernel_dir/.kho" 129 + local jobs=$(($(nproc) * 2)) 130 + local target="$(uname -m)" 131 + 132 + # skip the test if any of the preparation steps fails 133 + set -o errtrace 134 + trap skip ERR 135 + 136 + while getopts 'hd:j:t:' opt; do 137 + case $opt in 138 + d) 139 + build_dir="$OPTARG" 140 + ;; 141 + j) 142 + jobs="$OPTARG" 143 + ;; 144 + t) 145 + target="$OPTARG" 146 + ;; 147 + h) 148 + usage 149 + exit 0 150 + ;; 151 + *) 152 + echo Unknown argument "$opt" 153 + usage 154 + exit 1 155 + ;; 156 + esac 157 + done 158 + 159 + ktap_print_header 160 + ktap_set_plan 1 161 + 162 + if [[ "$target" != "$(uname -m)" ]] && [[ -z "$CROSS_COMPILE" ]]; then 163 + skip "Cross-platform testing needs to specify CROSS_COMPILE" 164 + fi 165 + 166 + mkdir -p "$build_dir" 167 + local arch=$(target_to_arch "$target") 168 + source "$test_dir/$arch.conf" 169 + 170 + # build the kernel and create initrd 171 + # initrd includes the kernel image that will be kexec'ed 172 + local make_cmd="make ARCH=$arch CROSS_COMPILE=$CROSS_COMPILE -j$jobs" 173 + build_kernel "$build_dir" "$make_cmd" "$QEMU_KCONFIG" "$KERNEL_IMAGE" 174 + 175 + local kernel="$build_dir/arch/$arch/boot/$KERNEL_IMAGE" 176 + mkinitrd "$kernel" 177 + 178 + run_qemu "$QEMU_CMD" "$KERNEL_CMDLINE" "$kernel" 179 + 180 + ktap_test_pass "KHO succeeded" 181 + } 182 + 183 + main "$@"
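The target handling in vmtest.sh comes down to two decisions. First, it maps the target (by default the output of `uname -m`) to a kernel `ARCH`, which also selects `arm64.conf` or `x86.conf`. Second, it skips the test when a foreign target is requested without `CROSS_COMPILE`. The following restates that logic in C for illustration only. `kho_target_to_arch()` and `kho_must_skip()` are hypothetical names that do not exist in the selftest.

```c
#include <stddef.h>
#include <string.h>

/* Like target_to_arch() in vmtest.sh: target name -> ARCH value, or NULL. */
static const char *kho_target_to_arch(const char *target)
{
	if (strcmp(target, "aarch64") == 0)
		return "arm64";
	if (strcmp(target, "x86_64") == 0)
		return "x86";
	return NULL;	/* the script skips unsupported targets */
}

/*
 * Returns 1 when the script would skip: unsupported target, or a target
 * different from the host with an empty or unset CROSS_COMPILE.
 */
static int kho_must_skip(const char *target, const char *host,
			 const char *cross_compile)
{
	if (kho_target_to_arch(target) == NULL)
		return 1;
	return strcmp(target, host) != 0 &&
	       (cross_compile == NULL || cross_compile[0] == '\0');
}
```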

+7

tools/testing/selftests/kho/x86.conf

··· 1 + QEMU_CMD=qemu-system-x86_64 2 + QEMU_KCONFIG=" 3 + CONFIG_SERIAL_8250=y 4 + CONFIG_SERIAL_8250_CONSOLE=y 5 + " 6 + KERNEL_IMAGE="bzImage" 7 + KERNEL_CMDLINE="console=ttyS0"

+1

tools/testing/selftests/ptrace/.gitignore

··· 3 3 get_set_sud 4 4 peeksiginfo 5 5 vmaccess 6 + set_syscall_info

+7 -9

tools/testing/selftests/thermal/intel/workload_hint/workload_hint_test.c

··· 32 32 33 33 fd = open(WORKLOAD_ENABLE_ATTRIBUTE, O_RDWR); 34 34 if (fd < 0) { 35 - perror("Unable to open workload type feature enable file\n"); 35 + perror("Unable to open workload type feature enable file"); 36 36 exit(1); 37 37 } 38 38 39 39 if (write(fd, "0\n", 2) < 0) { 40 - perror("Can't disable workload hints\n"); 40 + perror("Can't disable workload hints"); 41 41 exit(1); 42 42 } 43 43 ··· 68 68 exit(1); 69 69 70 70 sprintf(delay_str, "%s\n", argv[1]); 71 - 72 - sprintf(delay_str, "%s\n", argv[1]); 73 71 fd = open(WORKLOAD_NOTIFICATION_DELAY_ATTRIBUTE, O_RDWR); 74 72 if (fd < 0) { 75 - perror("Unable to open workload notification delay\n"); 73 + perror("Unable to open workload notification delay"); 76 74 exit(1); 77 75 } 78 76 79 77 if (write(fd, delay_str, strlen(delay_str)) < 0) { 80 - perror("Can't set delay\n"); 78 + perror("Can't set delay"); 81 79 exit(1); 82 80 } 83 81 ··· 92 94 /* Enable feature via sysfs knob */ 93 95 fd = open(WORKLOAD_ENABLE_ATTRIBUTE, O_RDWR); 94 96 if (fd < 0) { 95 - perror("Unable to open workload type feature enable file\n"); 97 + perror("Unable to open workload type feature enable file"); 96 98 exit(1); 97 99 } 98 100 99 101 if (write(fd, "1\n", 2) < 0) { 100 - perror("Can't enable workload hints\n"); 102 + perror("Can't enable workload hints"); 101 103 exit(1); 102 104 } 103 105 ··· 108 110 while (1) { 109 111 fd = open(WORKLOAD_TYPE_INDEX_ATTRIBUTE, O_RDONLY); 110 112 if (fd < 0) { 111 - perror("Unable to open workload type file\n"); 113 + perror("Unable to open workload type file"); 112 114 exit(1); 113 115 } 114 116
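perror(3) already appends `": "`, the `strerror(errno)` text and a newline to its argument. The old messages ending in `\n` therefore printed the colon and the error text on a separate line. The following sketch reproduces perror's output format with `snprintf()` to make the difference visible. `format_like_perror()` is a hypothetical helper.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build the same text perror(3) writes to stderr,
 * "<msg>: <strerror(errnum)>\n", dropping the "<msg>: " prefix when msg
 * is NULL or empty.
 */
static int format_like_perror(char *buf, size_t size, const char *msg, int errnum)
{
	if (msg == NULL || msg[0] == '\0')
		return snprintf(buf, size, "%s\n", strerror(errnum));
	return snprintf(buf, size, "%s: %s\n", msg, strerror(errnum));
}
```

With `"Can't set delay\n"` the output contains `"delay\n: "`, so the error text starts on a new line. Without the newline, the message and error stay on one line.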
