Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'mm-nonmm-stable-2026-04-15-04-20' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:

- "pid: make sub-init creation retryable" (Oleg Nesterov)

Make creation of init in a new namespace more robust by clearing away
some historical cruft which is no longer needed. Also some
documentation fixups

- "selftests/fchmodat2: Error handling and general" (Mark Brown)

A fix and a cleanup for the fchmodat2() syscall selftest

- "lib: polynomial: Move to math/ and clean up" (Andy Shevchenko)

- "hung_task: Provide runtime reset interface for hung task detector"
(Aaron Tomlin)

Give administrators the ability to zero out
/proc/sys/kernel/hung_task_detect_count
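
As an illustrative sketch (assuming ordinary procfs write semantics and
the documented behavior, per the sysctl documentation change below, that
writing 0 zeroes the counter), a reset from userspace could look like:

    /* Illustrative only: reset the hung-task counter by writing "0".
     * Requires root and a kernel built with CONFIG_DETECT_HUNG_TASK.
     */
    #include <stdio.h>

    int main(void)
    {
            FILE *f = fopen("/proc/sys/kernel/hung_task_detect_count", "w");

            if (!f)
                    return 1;
            fputs("0", f);  /* per the docs, writing 0 zeroes the counter */
            return fclose(f) ? 1 : 0;
    }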

- "tools/getdelays: use the static UAPI headers from
tools/include/uapi" (Thomas Weißschuh)

Teach getdelays to use the in-kernel UAPI headers rather than the
system-provided ones

- "watchdog/hardlockup: Improvements to hardlockup" (Mayank Rungta)

Several cleanups and fixups to the hardlockup detector code and its
documentation

- "lib/bch: fix undefined behavior from signed left-shifts" (Josh Law)

A couple of small/theoretical fixes in the bch code
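
The general class of bug (a sketch, not the actual lib/bch patch):
left-shifting a signed int into the sign bit is undefined behavior in C,
and the usual fix is to shift an unsigned value instead:

    /* Sketch of the undefined-behavior class being fixed; this is not
     * the lib/bch code itself.
     */
    #include <stdint.h>

    uint32_t set_bit_signed(int bit)
    {
            return 1 << bit;        /* undefined when bit == 31: the shift
                                       overflows a signed int */
    }

    uint32_t set_bit_unsigned(int bit)
    {
            return 1u << bit;       /* well-defined for 0 <= bit <= 31 */
    }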

- "ocfs2/dlm: fix two bugs in dlm_match_regions()" (Junrui Luo)

- "cleanup the RAID5 XOR library" (Christoph Hellwig)

A quite far-reaching cleanup of this code. I can't do better than to
quote Christoph:

"The XOR library used for the RAID5 parity is a bit of a mess right
now. The main file sits in crypto/ despite not being cryptography
and not using the crypto API, with the generic implementations
sitting in include/asm-generic and the arch implementations
sitting in an asm/ header in theory. The latter doesn't work for
many cases, so architectures often build the code directly into
the core kernel, or create another module for the architecture
code.

Change this to a single module in lib/ that also contains the
architecture optimizations, similar to the library work Eric
Biggers has done for the CRC and crypto libraries lately. After
that it changes to better calling conventions that allow for
smarter architecture implementations (although none is contained
here yet), and uses static_call to avoid indirection function call
overhead"

- "lib/list_sort: Clean up list_sort() scheduling workarounds"
(Kuan-Wei Chiu)

Clean up this library code by removing a hack that was added for
UBIFS but which UBIFS doesn't actually need

- "Fix bugs in extract_iter_to_sg()" (Christian Ehrhardt)

Fix a few bugs in the scatterlist code, add in-kernel tests for the
now-fixed bugs and fix a leak in the test itself

- "kdump: Enable LUKS-encrypted dump target support in ARM64 and
PowerPC" (Coiby Xu)

Enable support of the LUKS-encrypted device dump target on arm64 and
powerpc

- "ocfs2: consolidate extent list validation into block read callbacks"
(Joseph Qi)

Clean up, simplify, and harden ocfs2's validation of extent list
fields (the kernel test robot loves mounting corrupted fs images!)

* tag 'mm-nonmm-stable-2026-04-15-04-20' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (127 commits)
ocfs2: validate group add input before caching
ocfs2: validate bg_bits during freefrag scan
ocfs2: fix listxattr handling when the buffer is full
doc: watchdog: fix typos etc
update Sean's email address
ocfs2: use get_random_u32() where appropriate
ocfs2: split transactions in dio completion to avoid credit exhaustion
ocfs2: remove redundant l_next_free_rec check in __ocfs2_find_path()
ocfs2: validate extent block list fields during block read
ocfs2: remove empty extent list check in ocfs2_dx_dir_lookup_rec()
ocfs2: validate dx_root extent list fields during block read
ocfs2: fix use-after-free in ocfs2_fault() when VM_FAULT_RETRY
ocfs2: handle invalid dinode in ocfs2_group_extend
.get_maintainer.ignore: add Askar
ocfs2: validate bg_list extent bounds in discontig groups
checkpatch: exclude forward declarations of const structs
tools/accounting: handle truncated taskstats netlink messages
taskstats: set version in TGID exit notifications
ocfs2/heartbeat: fix slot mapping rollback leaks on error paths
arm64,ppc64le/kdump: pass dm-crypt keys to kdump kernel
...

+7201 -6381
+5 -1
.mailmap
···
 309  309   Govindaraj Saminathan <quic_gsamin@quicinc.com> <gsamin@codeaurora.org>
 310  310   Guo Ren <guoren@kernel.org> <guoren@linux.alibaba.com>
 311  311   Guo Ren <guoren@kernel.org> <ren_guo@c-sky.com>
 312      - Guru Das Srinagesh <quic_gurus@quicinc.com> <gurus@codeaurora.org>
      312 + Guru Das Srinagesh <linux@gurudas.dev>
      313 + Guru Das Srinagesh <linux@gurudas.dev> <quic_gurus@quicinc.com>
      314 + Guru Das Srinagesh <linux@gurudas.dev> <gurus@codeaurora.org>
      315 + Guru Das Srinagesh <linux@gurudas.dev> <gurooodas@gmail.com>
 313  316   Gustavo Padovan <gustavo@las.ic.unicamp.br>
 314  317   Gustavo Padovan <padovan@profusion.mobi>
 315  318   Hamza Mahfooz <hamzamahfooz@linux.microsoft.com> <hamza.mahfooz@amd.com>
···
 746  743   Satya Priya <quic_skakitap@quicinc.com> <quic_c_skakit@quicinc.com> <skakit@codeaurora.org>
 747  744   S.Çağlar Onur <caglar@pardus.org.tr>
 748  745   Sayali Lokhande <quic_sayalil@quicinc.com> <sayalil@codeaurora.org>
      746 + Sean Anderson <sean.anderson@linux.dev> <sean.anderson@seco.com>
 749  747   Sean Christopherson <seanjc@google.com> <sean.j.christopherson@intel.com>
 750  748   Sean Nyekjaer <sean@geanix.com> <sean.nyekjaer@prevas.dk>
 751  749   Sean Tranchetti <quic_stranche@quicinc.com> <stranche@codeaurora.org>
+2 -5
CREDITS
···
 4570  4570   D: EISA/sysfs subsystem
 4571  4571   S: France
 4572  4572
 4573       - # Don't add your name here, unless you really _are_ after Marc
 4574       - # alphabetically. Leonard used to be very proud of being the
 4575       - # last entry, and he'll get positively pissed if he can't even
 4576       - # be second-to-last. (and this file really _is_ supposed to be
 4577       - # in alphabetic order)
       4573 + # Don't add your name here unless you really are last alphabetically.
       4574 + # (This file is supposed to be kept in alphabetical order by last name.)
+119 -23
Documentation/admin-guide/lockup-watchdogs.rst
···
  16   16   provided for this.
  17   17
  18   18   A 'hardlockup' is defined as a bug that causes the CPU to loop in
  19      - kernel mode for more than 10 seconds (see "Implementation" below for
       19 + kernel mode for several seconds (see "Implementation" below for
  20   20   details), without letting other interrupts have a chance to run.
  21   21   Similarly to the softlockup case, the current stack trace is displayed
  22   22   upon detection and the system will stay locked up unless the default
···
  30   30   to cause the system to reboot automatically after a specified amount
  31   31   of time.
  32   32
       33 + Configuration
       34 + =============
       35 +
       36 + A kernel knob is provided that allows administrators to configure
       37 + this period. The "watchdog_thresh" parameter (default 10 seconds)
       38 + controls the threshold. The right value for a particular environment
       39 + is a trade-off between fast response to lockups and detection overhead.
       40 +
  33   41   Implementation
  34   42   ==============
  35   43
  36      - The soft and hard lockup detectors are built on top of the hrtimer and
  37      - perf subsystems, respectively. A direct consequence of this is that,
  38      - in principle, they should work in any architecture where these
  39      - subsystems are present.
       44 + The soft and hard lockup detectors are built around an hrtimer.
       45 + In addition, the softlockup detector regularly schedules a job, and
       46 + the hard lockup detector might use Perf/NMI events on architectures
       47 + that support it.
  40   48
  41      - A periodic hrtimer runs to generate interrupts and kick the watchdog
  42      - job. An NMI perf event is generated every "watchdog_thresh"
  43      - (compile-time initialized to 10 and configurable through sysctl of the
  44      - same name) seconds to check for hardlockups. If any CPU in the system
  45      - does not receive any hrtimer interrupt during that time the
  46      - 'hardlockup detector' (the handler for the NMI perf event) will
  47      - generate a kernel warning or call panic, depending on the
  48      - configuration.
       49 + Frequency and Heartbeats
       50 + ------------------------
  49   51
  50      - The watchdog job runs in a stop scheduling thread that updates a
  51      - timestamp every time it is scheduled. If that timestamp is not updated
  52      - for 2*watchdog_thresh seconds (the softlockup threshold) the
       52 + The core of the detectors is an hrtimer. It serves multiple purposes:
       53 +
       54 + - schedules watchdog job for the softlockup detector
       55 + - bumps the interrupt counter for hardlockup detectors (heartbeat)
       56 + - detects softlockups
       57 + - detects hardlockups in Buddy mode
       58 +
       59 + The period of this hrtimer is 2*watchdog_thresh/5, which is 4 seconds
       60 + by default. The hrtimer has two or three chances to generate an interrupt
       61 + (heartbeat) before the hardlockup detector kicks in.
       62 +
       63 + Softlockup Detector
       64 + -------------------
       65 +
       66 + The watchdog job is scheduled by the hrtimer and runs in a stop scheduling
       67 + thread. It updates a timestamp every time it is scheduled. If that timestamp
       68 + is not updated for 2*watchdog_thresh seconds (the softlockup threshold) the
  53   69   'softlockup detector' (coded inside the hrtimer callback function)
  54   70   will dump useful debug information to the system log, after which it
  55   71   will call panic if it was instructed to do so or resume execution of
  56   72   other kernel code.
  57   73
  58      - The period of the hrtimer is 2*watchdog_thresh/5, which means it has
  59      - two or three chances to generate an interrupt before the hardlockup
  60      - detector kicks in.
       74 + Hardlockup Detector (NMI/Perf)
       75 + ------------------------------
  61   76
  62      - As explained above, a kernel knob is provided that allows
  63      - administrators to configure the period of the hrtimer and the perf
  64      - event. The right value for a particular environment is a trade-off
  65      - between fast response to lockups and detection overhead.
       77 + On architectures that support NMI (Non-Maskable Interrupt) perf events,
       78 + a periodic NMI is generated every "watchdog_thresh" seconds.
       79 +
       80 + If any CPU in the system does not receive any hrtimer interrupt
       81 + (heartbeat) during the "watchdog_thresh" window, the 'hardlockup
       82 + detector' (the handler for the NMI perf event) will generate a kernel
       83 + warning or call panic.
       84 +
       85 + **Detection Overhead (NMI):**
       86 +
       87 + The time to detect a lockup can vary depending on when the lockup
       88 + occurs relative to the NMI check window. Examples below assume a watchdog_thresh of 10.
       89 +
       90 + * **Best Case:** The lockup occurs just before the first heartbeat is
       91 +   due. The detector will notice the missing hrtimer interrupt almost
       92 +   immediately during the next check.
       93 +
       94 +   ::
       95 +
       96 +     Time 100.0: cpu 1 heartbeat
       97 +     Time 100.1: hardlockup_check, cpu1 stores its state
       98 +     Time 103.9: Hard Lockup on cpu1
       99 +     Time 104.0: cpu 1 heartbeat never comes
      100 +     Time 110.1: hardlockup_check, cpu1 checks the state again, should be the same, declares lockup
      101 +
      102 +   Time to detection: ~6 seconds
      103 +
      104 + * **Worst Case:** The lockup occurs shortly after a valid interrupt
      105 +   (heartbeat) which itself happened just after the NMI check. The next
      106 +   NMI check sees that the interrupt count has changed (due to that one
      107 +   heartbeat), assumes the CPU is healthy, and resets the baseline. The
      108 +   lockup is only detected at the subsequent check.
      109 +
      110 +   ::
      111 +
      112 +     Time 100.0: hardlockup_check, cpu1 stores its state
      113 +     Time 100.1: cpu 1 heartbeat
      114 +     Time 100.2: Hard Lockup on cpu1
      115 +     Time 110.0: hardlockup_check, cpu1 stores its state (misses lockup as state changed)
      116 +     Time 120.0: hardlockup_check, cpu1 checks the state again, should be the same, declares lockup
      117 +
      118 +   Time to detection: ~20 seconds
      119 +
      120 + Hardlockup Detector (Buddy)
      121 + ---------------------------
      122 +
      123 + On architectures or configurations where NMI perf events are not
      124 + available (or disabled), the kernel may use the "buddy" hardlockup
      125 + detector. This mechanism requires SMP (Symmetric Multi-Processing).
      126 +
      127 + In this mode, each CPU is assigned a "buddy" CPU to monitor. The
      128 + monitoring CPU runs its own hrtimer (the same one used for softlockup
      129 + detection) and checks if the buddy CPU's hrtimer interrupt count has
      130 + increased.
      131 +
      132 + To ensure timeliness and avoid false positives, the buddy system performs
      133 + checks at every hrtimer interval (2*watchdog_thresh/5, which is 4 seconds
      134 + by default). It uses a missed-interrupt threshold of 3. If the buddy's
      135 + interrupt count has not changed for 3 consecutive checks, it is assumed
      136 + that the buddy CPU is hardlocked (interrupts disabled). The monitoring
      137 + CPU will then trigger the hardlockup response (warning or panic).
      138 +
      139 + **Detection Overhead (Buddy):**
      140 +
      141 + With a default check interval of 4 seconds (watchdog_thresh = 10):
      142 +
      143 + * **Best case:** Lockup occurs just before a check.
      144 +   Detected in ~8s (0s till 1st check + 4s till 2nd + 4s till 3rd).
      145 + * **Worst case:** Lockup occurs just after a check.
      146 +   Detected in ~12s (4s till 1st check + 4s till 2nd + 4s till 3rd).
      147 +
      148 + **Limitations of the Buddy Detector:**
      149 +
      150 + 1. **All-CPU Lockup:** If all CPUs lock up simultaneously, the buddy
      151 +    detector cannot detect the condition because the monitoring CPUs
      152 +    are also frozen.
      153 + 2. **Stack Traces:** Unlike the NMI detector, the buddy detector
      154 +    cannot directly interrupt the locked CPU to grab a stack trace.
      155 +    It relies on architecture-specific mechanisms (like NMI backtrace
      156 +    support) to try and retrieve the status of the locked CPU. If
      157 +    such support is missing, the log may only show that a lockup
      158 +    occurred without providing the locked CPU's stack.
      159 +
      160 + Watchdog Core Exclusion
      161 + =======================
  66  162
  67  163   By default, the watchdog runs on all online cores. However, on a
  68  164   kernel configured with NO_HZ_FULL, by default the watchdog runs only
+2 -1
Documentation/admin-guide/sysctl/kernel.rst
···
 418  418   ======================
 419  419
 420  420   Indicates the total number of tasks that have been detected as hung since
 421      - the system boot.
      421 + the system boot or since the counter was reset. The counter is zeroed when
      422 + a value of 0 is written.
 422  423
 423  424   This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.
 424  425
+1 -1
Documentation/devicetree/bindings/timer/xlnx,xps-timer.yaml
···
  7   7   title: Xilinx LogiCORE IP AXI Timer
  8   8
  9   9   maintainers:
 10     -   - Sean Anderson <sean.anderson@seco.com>
     10 +   - Sean Anderson <sean.anderson@linux.dev>
 11  11
 12  12   properties:
 13  13     compatible:
+3 -2
MAINTAINERS
···
    35     35   F: drivers/net/    all files in and below drivers/net
    36     36   F: drivers/net/*   all files in drivers/net, but not below
    37     37   F: */net/*         all files in "any top level directory"/net
           38 + F: fs/**/*foo*.c   all *foo*.c files in any subdirectory of fs
    38     39   One pattern per line. Multiple F: lines acceptable.
    39     40   X: *Excluded* files and directories that are NOT maintained, same
    40     41      rules as F:. Files exclusions are tested before file matches.
···
 10309  10308
 10310  10309   FREESCALE QORIQ DPAA FMAN DRIVER
 10311  10310   M: Madalin Bucur <madalin.bucur@nxp.com>
 10312        - R: Sean Anderson <sean.anderson@seco.com>
        10311 + R: Sean Anderson <sean.anderson@linux.dev>
 10313  10312   L: netdev@vger.kernel.org
 10314  10313   S: Maintained
 10315  10314   F: Documentation/devicetree/bindings/net/fsl,fman*.yaml
···
 29110  29109   F: drivers/net/ethernet/xilinx/ll_temac*
 29111  29110
 29112  29111   XILINX PWM DRIVER
 29113        - M: Sean Anderson <sean.anderson@seco.com>
        29112 + M: Sean Anderson <sean.anderson@linux.dev>
 29114  29113   S: Maintained
 29115  29114   F: drivers/pwm/pwm-xilinx.c
 29116  29115   F: include/clocksource/timer-xilinx.h
-866
arch/alpha/include/asm/xor.h
··· 1 - /* SPDX-License-Identifier: GPL-2.0-or-later */ 2 - /* 3 - * include/asm-alpha/xor.h 4 - * 5 - * Optimized RAID-5 checksumming functions for alpha EV5 and EV6 6 - */ 7 - 8 - extern void 9 - xor_alpha_2(unsigned long bytes, unsigned long * __restrict p1, 10 - const unsigned long * __restrict p2); 11 - extern void 12 - xor_alpha_3(unsigned long bytes, unsigned long * __restrict p1, 13 - const unsigned long * __restrict p2, 14 - const unsigned long * __restrict p3); 15 - extern void 16 - xor_alpha_4(unsigned long bytes, unsigned long * __restrict p1, 17 - const unsigned long * __restrict p2, 18 - const unsigned long * __restrict p3, 19 - const unsigned long * __restrict p4); 20 - extern void 21 - xor_alpha_5(unsigned long bytes, unsigned long * __restrict p1, 22 - const unsigned long * __restrict p2, 23 - const unsigned long * __restrict p3, 24 - const unsigned long * __restrict p4, 25 - const unsigned long * __restrict p5); 26 - 27 - extern void 28 - xor_alpha_prefetch_2(unsigned long bytes, unsigned long * __restrict p1, 29 - const unsigned long * __restrict p2); 30 - extern void 31 - xor_alpha_prefetch_3(unsigned long bytes, unsigned long * __restrict p1, 32 - const unsigned long * __restrict p2, 33 - const unsigned long * __restrict p3); 34 - extern void 35 - xor_alpha_prefetch_4(unsigned long bytes, unsigned long * __restrict p1, 36 - const unsigned long * __restrict p2, 37 - const unsigned long * __restrict p3, 38 - const unsigned long * __restrict p4); 39 - extern void 40 - xor_alpha_prefetch_5(unsigned long bytes, unsigned long * __restrict p1, 41 - const unsigned long * __restrict p2, 42 - const unsigned long * __restrict p3, 43 - const unsigned long * __restrict p4, 44 - const unsigned long * __restrict p5); 45 - 46 - asm(" \n\ 47 - .text \n\ 48 - .align 3 \n\ 49 - .ent xor_alpha_2 \n\ 50 - xor_alpha_2: \n\ 51 - .prologue 0 \n\ 52 - srl $16, 6, $16 \n\ 53 - .align 4 \n\ 54 - 2: \n\ 55 - ldq $0,0($17) \n\ 56 - ldq $1,0($18) \n\ 57 - ldq $2,8($17) \n\ 58 - ldq $3,8($18) \n\ 59 - \n\ 60 - ldq $4,16($17) \n\ 61 - ldq $5,16($18) \n\ 62 - ldq $6,24($17) \n\ 63 - ldq $7,24($18) \n\ 64 - \n\ 65 - ldq $19,32($17) \n\ 66 - ldq $20,32($18) \n\ 67 - ldq $21,40($17) \n\ 68 - ldq $22,40($18) \n\ 69 - \n\ 70 - ldq $23,48($17) \n\ 71 - ldq $24,48($18) \n\ 72 - ldq $25,56($17) \n\ 73 - xor $0,$1,$0 # 7 cycles from $1 load \n\ 74 - \n\ 75 - ldq $27,56($18) \n\ 76 - xor $2,$3,$2 \n\ 77 - stq $0,0($17) \n\ 78 - xor $4,$5,$4 \n\ 79 - \n\ 80 - stq $2,8($17) \n\ 81 - xor $6,$7,$6 \n\ 82 - stq $4,16($17) \n\ 83 - xor $19,$20,$19 \n\ 84 - \n\ 85 - stq $6,24($17) \n\ 86 - xor $21,$22,$21 \n\ 87 - stq $19,32($17) \n\ 88 - xor $23,$24,$23 \n\ 89 - \n\ 90 - stq $21,40($17) \n\ 91 - xor $25,$27,$25 \n\ 92 - stq $23,48($17) \n\ 93 - subq $16,1,$16 \n\ 94 - \n\ 95 - stq $25,56($17) \n\ 96 - addq $17,64,$17 \n\ 97 - addq $18,64,$18 \n\ 98 - bgt $16,2b \n\ 99 - \n\ 100 - ret \n\ 101 - .end xor_alpha_2 \n\ 102 - \n\ 103 - .align 3 \n\ 104 - .ent xor_alpha_3 \n\ 105 - xor_alpha_3: \n\ 106 - .prologue 0 \n\ 107 - srl $16, 6, $16 \n\ 108 - .align 4 \n\ 109 - 3: \n\ 110 - ldq $0,0($17) \n\ 111 - ldq $1,0($18) \n\ 112 - ldq $2,0($19) \n\ 113 - ldq $3,8($17) \n\ 114 - \n\ 115 - ldq $4,8($18) \n\ 116 - ldq $6,16($17) \n\ 117 - ldq $7,16($18) \n\ 118 - ldq $21,24($17) \n\ 119 - \n\ 120 - ldq $22,24($18) \n\ 121 - ldq $24,32($17) \n\ 122 - ldq $25,32($18) \n\ 123 - ldq $5,8($19) \n\ 124 - \n\ 125 - ldq $20,16($19) \n\ 126 - ldq $23,24($19) \n\ 127 - ldq $27,32($19) \n\ 128 - nop \n\ 129 - \n\ 130 - xor $0,$1,$1 # 8 
cycles from $0 load \n\ 131 - xor $3,$4,$4 # 6 cycles from $4 load \n\ 132 - xor $6,$7,$7 # 6 cycles from $7 load \n\ 133 - xor $21,$22,$22 # 5 cycles from $22 load \n\ 134 - \n\ 135 - xor $1,$2,$2 # 9 cycles from $2 load \n\ 136 - xor $24,$25,$25 # 5 cycles from $25 load \n\ 137 - stq $2,0($17) \n\ 138 - xor $4,$5,$5 # 6 cycles from $5 load \n\ 139 - \n\ 140 - stq $5,8($17) \n\ 141 - xor $7,$20,$20 # 7 cycles from $20 load \n\ 142 - stq $20,16($17) \n\ 143 - xor $22,$23,$23 # 7 cycles from $23 load \n\ 144 - \n\ 145 - stq $23,24($17) \n\ 146 - xor $25,$27,$27 # 7 cycles from $27 load \n\ 147 - stq $27,32($17) \n\ 148 - nop \n\ 149 - \n\ 150 - ldq $0,40($17) \n\ 151 - ldq $1,40($18) \n\ 152 - ldq $3,48($17) \n\ 153 - ldq $4,48($18) \n\ 154 - \n\ 155 - ldq $6,56($17) \n\ 156 - ldq $7,56($18) \n\ 157 - ldq $2,40($19) \n\ 158 - ldq $5,48($19) \n\ 159 - \n\ 160 - ldq $20,56($19) \n\ 161 - xor $0,$1,$1 # 4 cycles from $1 load \n\ 162 - xor $3,$4,$4 # 5 cycles from $4 load \n\ 163 - xor $6,$7,$7 # 5 cycles from $7 load \n\ 164 - \n\ 165 - xor $1,$2,$2 # 4 cycles from $2 load \n\ 166 - xor $4,$5,$5 # 5 cycles from $5 load \n\ 167 - stq $2,40($17) \n\ 168 - xor $7,$20,$20 # 4 cycles from $20 load \n\ 169 - \n\ 170 - stq $5,48($17) \n\ 171 - subq $16,1,$16 \n\ 172 - stq $20,56($17) \n\ 173 - addq $19,64,$19 \n\ 174 - \n\ 175 - addq $18,64,$18 \n\ 176 - addq $17,64,$17 \n\ 177 - bgt $16,3b \n\ 178 - ret \n\ 179 - .end xor_alpha_3 \n\ 180 - \n\ 181 - .align 3 \n\ 182 - .ent xor_alpha_4 \n\ 183 - xor_alpha_4: \n\ 184 - .prologue 0 \n\ 185 - srl $16, 6, $16 \n\ 186 - .align 4 \n\ 187 - 4: \n\ 188 - ldq $0,0($17) \n\ 189 - ldq $1,0($18) \n\ 190 - ldq $2,0($19) \n\ 191 - ldq $3,0($20) \n\ 192 - \n\ 193 - ldq $4,8($17) \n\ 194 - ldq $5,8($18) \n\ 195 - ldq $6,8($19) \n\ 196 - ldq $7,8($20) \n\ 197 - \n\ 198 - ldq $21,16($17) \n\ 199 - ldq $22,16($18) \n\ 200 - ldq $23,16($19) \n\ 201 - ldq $24,16($20) \n\ 202 - \n\ 203 - ldq $25,24($17) \n\ 204 - xor $0,$1,$1 # 6 cycles from $1 load \n\ 205 - ldq $27,24($18) \n\ 206 - xor $2,$3,$3 # 6 cycles from $3 load \n\ 207 - \n\ 208 - ldq $0,24($19) \n\ 209 - xor $1,$3,$3 \n\ 210 - ldq $1,24($20) \n\ 211 - xor $4,$5,$5 # 7 cycles from $5 load \n\ 212 - \n\ 213 - stq $3,0($17) \n\ 214 - xor $6,$7,$7 \n\ 215 - xor $21,$22,$22 # 7 cycles from $22 load \n\ 216 - xor $5,$7,$7 \n\ 217 - \n\ 218 - stq $7,8($17) \n\ 219 - xor $23,$24,$24 # 7 cycles from $24 load \n\ 220 - ldq $2,32($17) \n\ 221 - xor $22,$24,$24 \n\ 222 - \n\ 223 - ldq $3,32($18) \n\ 224 - ldq $4,32($19) \n\ 225 - ldq $5,32($20) \n\ 226 - xor $25,$27,$27 # 8 cycles from $27 load \n\ 227 - \n\ 228 - ldq $6,40($17) \n\ 229 - ldq $7,40($18) \n\ 230 - ldq $21,40($19) \n\ 231 - ldq $22,40($20) \n\ 232 - \n\ 233 - stq $24,16($17) \n\ 234 - xor $0,$1,$1 # 9 cycles from $1 load \n\ 235 - xor $2,$3,$3 # 5 cycles from $3 load \n\ 236 - xor $27,$1,$1 \n\ 237 - \n\ 238 - stq $1,24($17) \n\ 239 - xor $4,$5,$5 # 5 cycles from $5 load \n\ 240 - ldq $23,48($17) \n\ 241 - ldq $24,48($18) \n\ 242 - \n\ 243 - ldq $25,48($19) \n\ 244 - xor $3,$5,$5 \n\ 245 - ldq $27,48($20) \n\ 246 - ldq $0,56($17) \n\ 247 - \n\ 248 - ldq $1,56($18) \n\ 249 - ldq $2,56($19) \n\ 250 - xor $6,$7,$7 # 8 cycles from $6 load \n\ 251 - ldq $3,56($20) \n\ 252 - \n\ 253 - stq $5,32($17) \n\ 254 - xor $21,$22,$22 # 8 cycles from $22 load \n\ 255 - xor $7,$22,$22 \n\ 256 - xor $23,$24,$24 # 5 cycles from $24 load \n\ 257 - \n\ 258 - stq $22,40($17) \n\ 259 - xor $25,$27,$27 # 5 cycles from $27 load \n\ 260 - xor $24,$27,$27 \n\ 261 - xor $0,$1,$1 # 5 
cycles from $1 load \n\ 262 - \n\ 263 - stq $27,48($17) \n\ 264 - xor $2,$3,$3 # 4 cycles from $3 load \n\ 265 - xor $1,$3,$3 \n\ 266 - subq $16,1,$16 \n\ 267 - \n\ 268 - stq $3,56($17) \n\ 269 - addq $20,64,$20 \n\ 270 - addq $19,64,$19 \n\ 271 - addq $18,64,$18 \n\ 272 - \n\ 273 - addq $17,64,$17 \n\ 274 - bgt $16,4b \n\ 275 - ret \n\ 276 - .end xor_alpha_4 \n\ 277 - \n\ 278 - .align 3 \n\ 279 - .ent xor_alpha_5 \n\ 280 - xor_alpha_5: \n\ 281 - .prologue 0 \n\ 282 - srl $16, 6, $16 \n\ 283 - .align 4 \n\ 284 - 5: \n\ 285 - ldq $0,0($17) \n\ 286 - ldq $1,0($18) \n\ 287 - ldq $2,0($19) \n\ 288 - ldq $3,0($20) \n\ 289 - \n\ 290 - ldq $4,0($21) \n\ 291 - ldq $5,8($17) \n\ 292 - ldq $6,8($18) \n\ 293 - ldq $7,8($19) \n\ 294 - \n\ 295 - ldq $22,8($20) \n\ 296 - ldq $23,8($21) \n\ 297 - ldq $24,16($17) \n\ 298 - ldq $25,16($18) \n\ 299 - \n\ 300 - ldq $27,16($19) \n\ 301 - xor $0,$1,$1 # 6 cycles from $1 load \n\ 302 - ldq $28,16($20) \n\ 303 - xor $2,$3,$3 # 6 cycles from $3 load \n\ 304 - \n\ 305 - ldq $0,16($21) \n\ 306 - xor $1,$3,$3 \n\ 307 - ldq $1,24($17) \n\ 308 - xor $3,$4,$4 # 7 cycles from $4 load \n\ 309 - \n\ 310 - stq $4,0($17) \n\ 311 - xor $5,$6,$6 # 7 cycles from $6 load \n\ 312 - xor $7,$22,$22 # 7 cycles from $22 load \n\ 313 - xor $6,$23,$23 # 7 cycles from $23 load \n\ 314 - \n\ 315 - ldq $2,24($18) \n\ 316 - xor $22,$23,$23 \n\ 317 - ldq $3,24($19) \n\ 318 - xor $24,$25,$25 # 8 cycles from $25 load \n\ 319 - \n\ 320 - stq $23,8($17) \n\ 321 - xor $25,$27,$27 # 8 cycles from $27 load \n\ 322 - ldq $4,24($20) \n\ 323 - xor $28,$0,$0 # 7 cycles from $0 load \n\ 324 - \n\ 325 - ldq $5,24($21) \n\ 326 - xor $27,$0,$0 \n\ 327 - ldq $6,32($17) \n\ 328 - ldq $7,32($18) \n\ 329 - \n\ 330 - stq $0,16($17) \n\ 331 - xor $1,$2,$2 # 6 cycles from $2 load \n\ 332 - ldq $22,32($19) \n\ 333 - xor $3,$4,$4 # 4 cycles from $4 load \n\ 334 - \n\ 335 - ldq $23,32($20) \n\ 336 - xor $2,$4,$4 \n\ 337 - ldq $24,32($21) \n\ 338 - ldq $25,40($17) \n\ 339 - \n\ 340 - ldq $27,40($18) \n\ 341 - ldq $28,40($19) \n\ 342 - ldq $0,40($20) \n\ 343 - xor $4,$5,$5 # 7 cycles from $5 load \n\ 344 - \n\ 345 - stq $5,24($17) \n\ 346 - xor $6,$7,$7 # 7 cycles from $7 load \n\ 347 - ldq $1,40($21) \n\ 348 - ldq $2,48($17) \n\ 349 - \n\ 350 - ldq $3,48($18) \n\ 351 - xor $7,$22,$22 # 7 cycles from $22 load \n\ 352 - ldq $4,48($19) \n\ 353 - xor $23,$24,$24 # 6 cycles from $24 load \n\ 354 - \n\ 355 - ldq $5,48($20) \n\ 356 - xor $22,$24,$24 \n\ 357 - ldq $6,48($21) \n\ 358 - xor $25,$27,$27 # 7 cycles from $27 load \n\ 359 - \n\ 360 - stq $24,32($17) \n\ 361 - xor $27,$28,$28 # 8 cycles from $28 load \n\ 362 - ldq $7,56($17) \n\ 363 - xor $0,$1,$1 # 6 cycles from $1 load \n\ 364 - \n\ 365 - ldq $22,56($18) \n\ 366 - ldq $23,56($19) \n\ 367 - ldq $24,56($20) \n\ 368 - ldq $25,56($21) \n\ 369 - \n\ 370 - xor $28,$1,$1 \n\ 371 - xor $2,$3,$3 # 9 cycles from $3 load \n\ 372 - xor $3,$4,$4 # 9 cycles from $4 load \n\ 373 - xor $5,$6,$6 # 8 cycles from $6 load \n\ 374 - \n\ 375 - stq $1,40($17) \n\ 376 - xor $4,$6,$6 \n\ 377 - xor $7,$22,$22 # 7 cycles from $22 load \n\ 378 - xor $23,$24,$24 # 6 cycles from $24 load \n\ 379 - \n\ 380 - stq $6,48($17) \n\ 381 - xor $22,$24,$24 \n\ 382 - subq $16,1,$16 \n\ 383 - xor $24,$25,$25 # 8 cycles from $25 load \n\ 384 - \n\ 385 - stq $25,56($17) \n\ 386 - addq $21,64,$21 \n\ 387 - addq $20,64,$20 \n\ 388 - addq $19,64,$19 \n\ 389 - \n\ 390 - addq $18,64,$18 \n\ 391 - addq $17,64,$17 \n\ 392 - bgt $16,5b \n\ 393 - ret \n\ 394 - .end xor_alpha_5 \n\ 395 - \n\ 396 - .align 3 \n\ 
397 - .ent xor_alpha_prefetch_2 \n\ 398 - xor_alpha_prefetch_2: \n\ 399 - .prologue 0 \n\ 400 - srl $16, 6, $16 \n\ 401 - \n\ 402 - ldq $31, 0($17) \n\ 403 - ldq $31, 0($18) \n\ 404 - \n\ 405 - ldq $31, 64($17) \n\ 406 - ldq $31, 64($18) \n\ 407 - \n\ 408 - ldq $31, 128($17) \n\ 409 - ldq $31, 128($18) \n\ 410 - \n\ 411 - ldq $31, 192($17) \n\ 412 - ldq $31, 192($18) \n\ 413 - .align 4 \n\ 414 - 2: \n\ 415 - ldq $0,0($17) \n\ 416 - ldq $1,0($18) \n\ 417 - ldq $2,8($17) \n\ 418 - ldq $3,8($18) \n\ 419 - \n\ 420 - ldq $4,16($17) \n\ 421 - ldq $5,16($18) \n\ 422 - ldq $6,24($17) \n\ 423 - ldq $7,24($18) \n\ 424 - \n\ 425 - ldq $19,32($17) \n\ 426 - ldq $20,32($18) \n\ 427 - ldq $21,40($17) \n\ 428 - ldq $22,40($18) \n\ 429 - \n\ 430 - ldq $23,48($17) \n\ 431 - ldq $24,48($18) \n\ 432 - ldq $25,56($17) \n\ 433 - ldq $27,56($18) \n\ 434 - \n\ 435 - ldq $31,256($17) \n\ 436 - xor $0,$1,$0 # 8 cycles from $1 load \n\ 437 - ldq $31,256($18) \n\ 438 - xor $2,$3,$2 \n\ 439 - \n\ 440 - stq $0,0($17) \n\ 441 - xor $4,$5,$4 \n\ 442 - stq $2,8($17) \n\ 443 - xor $6,$7,$6 \n\ 444 - \n\ 445 - stq $4,16($17) \n\ 446 - xor $19,$20,$19 \n\ 447 - stq $6,24($17) \n\ 448 - xor $21,$22,$21 \n\ 449 - \n\ 450 - stq $19,32($17) \n\ 451 - xor $23,$24,$23 \n\ 452 - stq $21,40($17) \n\ 453 - xor $25,$27,$25 \n\ 454 - \n\ 455 - stq $23,48($17) \n\ 456 - subq $16,1,$16 \n\ 457 - stq $25,56($17) \n\ 458 - addq $17,64,$17 \n\ 459 - \n\ 460 - addq $18,64,$18 \n\ 461 - bgt $16,2b \n\ 462 - ret \n\ 463 - .end xor_alpha_prefetch_2 \n\ 464 - \n\ 465 - .align 3 \n\ 466 - .ent xor_alpha_prefetch_3 \n\ 467 - xor_alpha_prefetch_3: \n\ 468 - .prologue 0 \n\ 469 - srl $16, 6, $16 \n\ 470 - \n\ 471 - ldq $31, 0($17) \n\ 472 - ldq $31, 0($18) \n\ 473 - ldq $31, 0($19) \n\ 474 - \n\ 475 - ldq $31, 64($17) \n\ 476 - ldq $31, 64($18) \n\ 477 - ldq $31, 64($19) \n\ 478 - \n\ 479 - ldq $31, 128($17) \n\ 480 - ldq $31, 128($18) \n\ 481 - ldq $31, 128($19) \n\ 482 - \n\ 483 - ldq $31, 192($17) \n\ 484 - ldq $31, 192($18) \n\ 485 - ldq $31, 192($19) \n\ 486 - .align 4 \n\ 487 - 3: \n\ 488 - ldq $0,0($17) \n\ 489 - ldq $1,0($18) \n\ 490 - ldq $2,0($19) \n\ 491 - ldq $3,8($17) \n\ 492 - \n\ 493 - ldq $4,8($18) \n\ 494 - ldq $6,16($17) \n\ 495 - ldq $7,16($18) \n\ 496 - ldq $21,24($17) \n\ 497 - \n\ 498 - ldq $22,24($18) \n\ 499 - ldq $24,32($17) \n\ 500 - ldq $25,32($18) \n\ 501 - ldq $5,8($19) \n\ 502 - \n\ 503 - ldq $20,16($19) \n\ 504 - ldq $23,24($19) \n\ 505 - ldq $27,32($19) \n\ 506 - nop \n\ 507 - \n\ 508 - xor $0,$1,$1 # 8 cycles from $0 load \n\ 509 - xor $3,$4,$4 # 7 cycles from $4 load \n\ 510 - xor $6,$7,$7 # 6 cycles from $7 load \n\ 511 - xor $21,$22,$22 # 5 cycles from $22 load \n\ 512 - \n\ 513 - xor $1,$2,$2 # 9 cycles from $2 load \n\ 514 - xor $24,$25,$25 # 5 cycles from $25 load \n\ 515 - stq $2,0($17) \n\ 516 - xor $4,$5,$5 # 6 cycles from $5 load \n\ 517 - \n\ 518 - stq $5,8($17) \n\ 519 - xor $7,$20,$20 # 7 cycles from $20 load \n\ 520 - stq $20,16($17) \n\ 521 - xor $22,$23,$23 # 7 cycles from $23 load \n\ 522 - \n\ 523 - stq $23,24($17) \n\ 524 - xor $25,$27,$27 # 7 cycles from $27 load \n\ 525 - stq $27,32($17) \n\ 526 - nop \n\ 527 - \n\ 528 - ldq $0,40($17) \n\ 529 - ldq $1,40($18) \n\ 530 - ldq $3,48($17) \n\ 531 - ldq $4,48($18) \n\ 532 - \n\ 533 - ldq $6,56($17) \n\ 534 - ldq $7,56($18) \n\ 535 - ldq $2,40($19) \n\ 536 - ldq $5,48($19) \n\ 537 - \n\ 538 - ldq $20,56($19) \n\ 539 - ldq $31,256($17) \n\ 540 - ldq $31,256($18) \n\ 541 - ldq $31,256($19) \n\ 542 - \n\ 543 - xor $0,$1,$1 # 6 cycles from $1 load \n\ 544 
- xor $3,$4,$4 # 5 cycles from $4 load \n\ 545 - xor $6,$7,$7 # 5 cycles from $7 load \n\ 546 - xor $1,$2,$2 # 4 cycles from $2 load \n\ 547 - \n\ 548 - xor $4,$5,$5 # 5 cycles from $5 load \n\ 549 - xor $7,$20,$20 # 4 cycles from $20 load \n\ 550 - stq $2,40($17) \n\ 551 - subq $16,1,$16 \n\ 552 - \n\ 553 - stq $5,48($17) \n\ 554 - addq $19,64,$19 \n\ 555 - stq $20,56($17) \n\ 556 - addq $18,64,$18 \n\ 557 - \n\ 558 - addq $17,64,$17 \n\ 559 - bgt $16,3b \n\ 560 - ret \n\ 561 - .end xor_alpha_prefetch_3 \n\ 562 - \n\ 563 - .align 3 \n\ 564 - .ent xor_alpha_prefetch_4 \n\ 565 - xor_alpha_prefetch_4: \n\ 566 - .prologue 0 \n\ 567 - srl $16, 6, $16 \n\ 568 - \n\ 569 - ldq $31, 0($17) \n\ 570 - ldq $31, 0($18) \n\ 571 - ldq $31, 0($19) \n\ 572 - ldq $31, 0($20) \n\ 573 - \n\ 574 - ldq $31, 64($17) \n\ 575 - ldq $31, 64($18) \n\ 576 - ldq $31, 64($19) \n\ 577 - ldq $31, 64($20) \n\ 578 - \n\ 579 - ldq $31, 128($17) \n\ 580 - ldq $31, 128($18) \n\ 581 - ldq $31, 128($19) \n\ 582 - ldq $31, 128($20) \n\ 583 - \n\ 584 - ldq $31, 192($17) \n\ 585 - ldq $31, 192($18) \n\ 586 - ldq $31, 192($19) \n\ 587 - ldq $31, 192($20) \n\ 588 - .align 4 \n\ 589 - 4: \n\ 590 - ldq $0,0($17) \n\ 591 - ldq $1,0($18) \n\ 592 - ldq $2,0($19) \n\ 593 - ldq $3,0($20) \n\ 594 - \n\ 595 - ldq $4,8($17) \n\ 596 - ldq $5,8($18) \n\ 597 - ldq $6,8($19) \n\ 598 - ldq $7,8($20) \n\ 599 - \n\ 600 - ldq $21,16($17) \n\ 601 - ldq $22,16($18) \n\ 602 - ldq $23,16($19) \n\ 603 - ldq $24,16($20) \n\ 604 - \n\ 605 - ldq $25,24($17) \n\ 606 - xor $0,$1,$1 # 6 cycles from $1 load \n\ 607 - ldq $27,24($18) \n\ 608 - xor $2,$3,$3 # 6 cycles from $3 load \n\ 609 - \n\ 610 - ldq $0,24($19) \n\ 611 - xor $1,$3,$3 \n\ 612 - ldq $1,24($20) \n\ 613 - xor $4,$5,$5 # 7 cycles from $5 load \n\ 614 - \n\ 615 - stq $3,0($17) \n\ 616 - xor $6,$7,$7 \n\ 617 - xor $21,$22,$22 # 7 cycles from $22 load \n\ 618 - xor $5,$7,$7 \n\ 619 - \n\ 620 - stq $7,8($17) \n\ 621 - xor $23,$24,$24 # 7 cycles from $24 load \n\ 622 - ldq $2,32($17) \n\ 623 - xor $22,$24,$24 \n\ 624 - \n\ 625 - ldq $3,32($18) \n\ 626 - ldq $4,32($19) \n\ 627 - ldq $5,32($20) \n\ 628 - xor $25,$27,$27 # 8 cycles from $27 load \n\ 629 - \n\ 630 - ldq $6,40($17) \n\ 631 - ldq $7,40($18) \n\ 632 - ldq $21,40($19) \n\ 633 - ldq $22,40($20) \n\ 634 - \n\ 635 - stq $24,16($17) \n\ 636 - xor $0,$1,$1 # 9 cycles from $1 load \n\ 637 - xor $2,$3,$3 # 5 cycles from $3 load \n\ 638 - xor $27,$1,$1 \n\ 639 - \n\ 640 - stq $1,24($17) \n\ 641 - xor $4,$5,$5 # 5 cycles from $5 load \n\ 642 - ldq $23,48($17) \n\ 643 - xor $3,$5,$5 \n\ 644 - \n\ 645 - ldq $24,48($18) \n\ 646 - ldq $25,48($19) \n\ 647 - ldq $27,48($20) \n\ 648 - ldq $0,56($17) \n\ 649 - \n\ 650 - ldq $1,56($18) \n\ 651 - ldq $2,56($19) \n\ 652 - ldq $3,56($20) \n\ 653 - xor $6,$7,$7 # 8 cycles from $6 load \n\ 654 - \n\ 655 - ldq $31,256($17) \n\ 656 - xor $21,$22,$22 # 8 cycles from $22 load \n\ 657 - ldq $31,256($18) \n\ 658 - xor $7,$22,$22 \n\ 659 - \n\ 660 - ldq $31,256($19) \n\ 661 - xor $23,$24,$24 # 6 cycles from $24 load \n\ 662 - ldq $31,256($20) \n\ 663 - xor $25,$27,$27 # 6 cycles from $27 load \n\ 664 - \n\ 665 - stq $5,32($17) \n\ 666 - xor $24,$27,$27 \n\ 667 - xor $0,$1,$1 # 7 cycles from $1 load \n\ 668 - xor $2,$3,$3 # 6 cycles from $3 load \n\ 669 - \n\ 670 - stq $22,40($17) \n\ 671 - xor $1,$3,$3 \n\ 672 - stq $27,48($17) \n\ 673 - subq $16,1,$16 \n\ 674 - \n\ 675 - stq $3,56($17) \n\ 676 - addq $20,64,$20 \n\ 677 - addq $19,64,$19 \n\ 678 - addq $18,64,$18 \n\ 679 - \n\ 680 - addq $17,64,$17 \n\ 681 - bgt $16,4b \n\ 
682 - ret \n\ 683 - .end xor_alpha_prefetch_4 \n\ 684 - \n\ 685 - .align 3 \n\ 686 - .ent xor_alpha_prefetch_5 \n\ 687 - xor_alpha_prefetch_5: \n\ 688 - .prologue 0 \n\ 689 - srl $16, 6, $16 \n\ 690 - \n\ 691 - ldq $31, 0($17) \n\ 692 - ldq $31, 0($18) \n\ 693 - ldq $31, 0($19) \n\ 694 - ldq $31, 0($20) \n\ 695 - ldq $31, 0($21) \n\ 696 - \n\ 697 - ldq $31, 64($17) \n\ 698 - ldq $31, 64($18) \n\ 699 - ldq $31, 64($19) \n\ 700 - ldq $31, 64($20) \n\ 701 - ldq $31, 64($21) \n\ 702 - \n\ 703 - ldq $31, 128($17) \n\ 704 - ldq $31, 128($18) \n\ 705 - ldq $31, 128($19) \n\ 706 - ldq $31, 128($20) \n\ 707 - ldq $31, 128($21) \n\ 708 - \n\ 709 - ldq $31, 192($17) \n\ 710 - ldq $31, 192($18) \n\ 711 - ldq $31, 192($19) \n\ 712 - ldq $31, 192($20) \n\ 713 - ldq $31, 192($21) \n\ 714 - .align 4 \n\ 715 - 5: \n\ 716 - ldq $0,0($17) \n\ 717 - ldq $1,0($18) \n\ 718 - ldq $2,0($19) \n\ 719 - ldq $3,0($20) \n\ 720 - \n\ 721 - ldq $4,0($21) \n\ 722 - ldq $5,8($17) \n\ 723 - ldq $6,8($18) \n\ 724 - ldq $7,8($19) \n\ 725 - \n\ 726 - ldq $22,8($20) \n\ 727 - ldq $23,8($21) \n\ 728 - ldq $24,16($17) \n\ 729 - ldq $25,16($18) \n\ 730 - \n\ 731 - ldq $27,16($19) \n\ 732 - xor $0,$1,$1 # 6 cycles from $1 load \n\ 733 - ldq $28,16($20) \n\ 734 - xor $2,$3,$3 # 6 cycles from $3 load \n\ 735 - \n\ 736 - ldq $0,16($21) \n\ 737 - xor $1,$3,$3 \n\ 738 - ldq $1,24($17) \n\ 739 - xor $3,$4,$4 # 7 cycles from $4 load \n\ 740 - \n\ 741 - stq $4,0($17) \n\ 742 - xor $5,$6,$6 # 7 cycles from $6 load \n\ 743 - xor $7,$22,$22 # 7 cycles from $22 load \n\ 744 - xor $6,$23,$23 # 7 cycles from $23 load \n\ 745 - \n\ 746 - ldq $2,24($18) \n\ 747 - xor $22,$23,$23 \n\ 748 - ldq $3,24($19) \n\ 749 - xor $24,$25,$25 # 8 cycles from $25 load \n\ 750 - \n\ 751 - stq $23,8($17) \n\ 752 - xor $25,$27,$27 # 8 cycles from $27 load \n\ 753 - ldq $4,24($20) \n\ 754 - xor $28,$0,$0 # 7 cycles from $0 load \n\ 755 - \n\ 756 - ldq $5,24($21) \n\ 757 - xor $27,$0,$0 \n\ 758 - ldq $6,32($17) \n\ 759 - ldq $7,32($18) \n\ 760 - \n\ 761 - stq $0,16($17) \n\ 762 - xor $1,$2,$2 # 6 cycles from $2 load \n\ 763 - ldq $22,32($19) \n\ 764 - xor $3,$4,$4 # 4 cycles from $4 load \n\ 765 - \n\ 766 - ldq $23,32($20) \n\ 767 - xor $2,$4,$4 \n\ 768 - ldq $24,32($21) \n\ 769 - ldq $25,40($17) \n\ 770 - \n\ 771 - ldq $27,40($18) \n\ 772 - ldq $28,40($19) \n\ 773 - ldq $0,40($20) \n\ 774 - xor $4,$5,$5 # 7 cycles from $5 load \n\ 775 - \n\ 776 - stq $5,24($17) \n\ 777 - xor $6,$7,$7 # 7 cycles from $7 load \n\ 778 - ldq $1,40($21) \n\ 779 - ldq $2,48($17) \n\ 780 - \n\ 781 - ldq $3,48($18) \n\ 782 - xor $7,$22,$22 # 7 cycles from $22 load \n\ 783 - ldq $4,48($19) \n\ 784 - xor $23,$24,$24 # 6 cycles from $24 load \n\ 785 - \n\ 786 - ldq $5,48($20) \n\ 787 - xor $22,$24,$24 \n\ 788 - ldq $6,48($21) \n\ 789 - xor $25,$27,$27 # 7 cycles from $27 load \n\ 790 - \n\ 791 - stq $24,32($17) \n\ 792 - xor $27,$28,$28 # 8 cycles from $28 load \n\ 793 - ldq $7,56($17) \n\ 794 - xor $0,$1,$1 # 6 cycles from $1 load \n\ 795 - \n\ 796 - ldq $22,56($18) \n\ 797 - ldq $23,56($19) \n\ 798 - ldq $24,56($20) \n\ 799 - ldq $25,56($21) \n\ 800 - \n\ 801 - ldq $31,256($17) \n\ 802 - xor $28,$1,$1 \n\ 803 - ldq $31,256($18) \n\ 804 - xor $2,$3,$3 # 9 cycles from $3 load \n\ 805 - \n\ 806 - ldq $31,256($19) \n\ 807 - xor $3,$4,$4 # 9 cycles from $4 load \n\ 808 - ldq $31,256($20) \n\ 809 - xor $5,$6,$6 # 8 cycles from $6 load \n\ 810 - \n\ 811 - stq $1,40($17) \n\ 812 - xor $4,$6,$6 \n\ 813 - xor $7,$22,$22 # 7 cycles from $22 load \n\ 814 - xor $23,$24,$24 # 6 cycles from $24 load \n\ 
815 - \n\ 816 - stq $6,48($17) \n\ 817 - xor $22,$24,$24 \n\ 818 - ldq $31,256($21) \n\ 819 - xor $24,$25,$25 # 8 cycles from $25 load \n\ 820 - \n\ 821 - stq $25,56($17) \n\ 822 - subq $16,1,$16 \n\ 823 - addq $21,64,$21 \n\ 824 - addq $20,64,$20 \n\ 825 - \n\ 826 - addq $19,64,$19 \n\ 827 - addq $18,64,$18 \n\ 828 - addq $17,64,$17 \n\ 829 - bgt $16,5b \n\ 830 - \n\ 831 - ret \n\ 832 - .end xor_alpha_prefetch_5 \n\ 833 - "); 834 - 835 - static struct xor_block_template xor_block_alpha = { 836 - .name = "alpha", 837 - .do_2 = xor_alpha_2, 838 - .do_3 = xor_alpha_3, 839 - .do_4 = xor_alpha_4, 840 - .do_5 = xor_alpha_5, 841 - }; 842 - 843 - static struct xor_block_template xor_block_alpha_prefetch = { 844 - .name = "alpha prefetch", 845 - .do_2 = xor_alpha_prefetch_2, 846 - .do_3 = xor_alpha_prefetch_3, 847 - .do_4 = xor_alpha_prefetch_4, 848 - .do_5 = xor_alpha_prefetch_5, 849 - }; 850 - 851 - /* For grins, also test the generic routines. */ 852 - #include <asm-generic/xor.h> 853 - 854 - #undef XOR_TRY_TEMPLATES 855 - #define XOR_TRY_TEMPLATES \ 856 - do { \ 857 - xor_speed(&xor_block_8regs); \ 858 - xor_speed(&xor_block_32regs); \ 859 - xor_speed(&xor_block_alpha); \ 860 - xor_speed(&xor_block_alpha_prefetch); \ 861 - } while (0) 862 - 863 - /* Force the use of alpha_prefetch if EV6, as it is significantly 864 - faster in the cold cache case. */ 865 - #define XOR_SELECT_TEMPLATE(FASTEST) \ 866 - (implver() == IMPLVER_EV6 ? &xor_block_alpha_prefetch : FASTEST)
-225
arch/arm/include/asm/xor.h
··· 1 - /* SPDX-License-Identifier: GPL-2.0-only */ 2 - /* 3 - * arch/arm/include/asm/xor.h 4 - * 5 - * Copyright (C) 2001 Russell King 6 - */ 7 - #include <linux/hardirq.h> 8 - #include <asm-generic/xor.h> 9 - #include <asm/hwcap.h> 10 - #include <asm/neon.h> 11 - 12 - #define __XOR(a1, a2) a1 ^= a2 13 - 14 - #define GET_BLOCK_2(dst) \ 15 - __asm__("ldmia %0, {%1, %2}" \ 16 - : "=r" (dst), "=r" (a1), "=r" (a2) \ 17 - : "0" (dst)) 18 - 19 - #define GET_BLOCK_4(dst) \ 20 - __asm__("ldmia %0, {%1, %2, %3, %4}" \ 21 - : "=r" (dst), "=r" (a1), "=r" (a2), "=r" (a3), "=r" (a4) \ 22 - : "0" (dst)) 23 - 24 - #define XOR_BLOCK_2(src) \ 25 - __asm__("ldmia %0!, {%1, %2}" \ 26 - : "=r" (src), "=r" (b1), "=r" (b2) \ 27 - : "0" (src)); \ 28 - __XOR(a1, b1); __XOR(a2, b2); 29 - 30 - #define XOR_BLOCK_4(src) \ 31 - __asm__("ldmia %0!, {%1, %2, %3, %4}" \ 32 - : "=r" (src), "=r" (b1), "=r" (b2), "=r" (b3), "=r" (b4) \ 33 - : "0" (src)); \ 34 - __XOR(a1, b1); __XOR(a2, b2); __XOR(a3, b3); __XOR(a4, b4) 35 - 36 - #define PUT_BLOCK_2(dst) \ 37 - __asm__ __volatile__("stmia %0!, {%2, %3}" \ 38 - : "=r" (dst) \ 39 - : "0" (dst), "r" (a1), "r" (a2)) 40 - 41 - #define PUT_BLOCK_4(dst) \ 42 - __asm__ __volatile__("stmia %0!, {%2, %3, %4, %5}" \ 43 - : "=r" (dst) \ 44 - : "0" (dst), "r" (a1), "r" (a2), "r" (a3), "r" (a4)) 45 - 46 - static void 47 - xor_arm4regs_2(unsigned long bytes, unsigned long * __restrict p1, 48 - const unsigned long * __restrict p2) 49 - { 50 - unsigned int lines = bytes / sizeof(unsigned long) / 4; 51 - register unsigned int a1 __asm__("r4"); 52 - register unsigned int a2 __asm__("r5"); 53 - register unsigned int a3 __asm__("r6"); 54 - register unsigned int a4 __asm__("r10"); 55 - register unsigned int b1 __asm__("r8"); 56 - register unsigned int b2 __asm__("r9"); 57 - register unsigned int b3 __asm__("ip"); 58 - register unsigned int b4 __asm__("lr"); 59 - 60 - do { 61 - GET_BLOCK_4(p1); 62 - XOR_BLOCK_4(p2); 63 - PUT_BLOCK_4(p1); 64 - } while (--lines); 65 - } 66 - 67 - static void 68 - xor_arm4regs_3(unsigned long bytes, unsigned long * __restrict p1, 69 - const unsigned long * __restrict p2, 70 - const unsigned long * __restrict p3) 71 - { 72 - unsigned int lines = bytes / sizeof(unsigned long) / 4; 73 - register unsigned int a1 __asm__("r4"); 74 - register unsigned int a2 __asm__("r5"); 75 - register unsigned int a3 __asm__("r6"); 76 - register unsigned int a4 __asm__("r10"); 77 - register unsigned int b1 __asm__("r8"); 78 - register unsigned int b2 __asm__("r9"); 79 - register unsigned int b3 __asm__("ip"); 80 - register unsigned int b4 __asm__("lr"); 81 - 82 - do { 83 - GET_BLOCK_4(p1); 84 - XOR_BLOCK_4(p2); 85 - XOR_BLOCK_4(p3); 86 - PUT_BLOCK_4(p1); 87 - } while (--lines); 88 - } 89 - 90 - static void 91 - xor_arm4regs_4(unsigned long bytes, unsigned long * __restrict p1, 92 - const unsigned long * __restrict p2, 93 - const unsigned long * __restrict p3, 94 - const unsigned long * __restrict p4) 95 - { 96 - unsigned int lines = bytes / sizeof(unsigned long) / 2; 97 - register unsigned int a1 __asm__("r8"); 98 - register unsigned int a2 __asm__("r9"); 99 - register unsigned int b1 __asm__("ip"); 100 - register unsigned int b2 __asm__("lr"); 101 - 102 - do { 103 - GET_BLOCK_2(p1); 104 - XOR_BLOCK_2(p2); 105 - XOR_BLOCK_2(p3); 106 - XOR_BLOCK_2(p4); 107 - PUT_BLOCK_2(p1); 108 - } while (--lines); 109 - } 110 - 111 - static void 112 - xor_arm4regs_5(unsigned long bytes, unsigned long * __restrict p1, 113 - const unsigned long * __restrict p2, 114 - const unsigned long * __restrict p3, 
115 - const unsigned long * __restrict p4, 116 - const unsigned long * __restrict p5) 117 - { 118 - unsigned int lines = bytes / sizeof(unsigned long) / 2; 119 - register unsigned int a1 __asm__("r8"); 120 - register unsigned int a2 __asm__("r9"); 121 - register unsigned int b1 __asm__("ip"); 122 - register unsigned int b2 __asm__("lr"); 123 - 124 - do { 125 - GET_BLOCK_2(p1); 126 - XOR_BLOCK_2(p2); 127 - XOR_BLOCK_2(p3); 128 - XOR_BLOCK_2(p4); 129 - XOR_BLOCK_2(p5); 130 - PUT_BLOCK_2(p1); 131 - } while (--lines); 132 - } 133 - 134 - static struct xor_block_template xor_block_arm4regs = { 135 - .name = "arm4regs", 136 - .do_2 = xor_arm4regs_2, 137 - .do_3 = xor_arm4regs_3, 138 - .do_4 = xor_arm4regs_4, 139 - .do_5 = xor_arm4regs_5, 140 - }; 141 - 142 - #undef XOR_TRY_TEMPLATES 143 - #define XOR_TRY_TEMPLATES \ 144 - do { \ 145 - xor_speed(&xor_block_arm4regs); \ 146 - xor_speed(&xor_block_8regs); \ 147 - xor_speed(&xor_block_32regs); \ 148 - NEON_TEMPLATES; \ 149 - } while (0) 150 - 151 - #ifdef CONFIG_KERNEL_MODE_NEON 152 - 153 - extern struct xor_block_template const xor_block_neon_inner; 154 - 155 - static void 156 - xor_neon_2(unsigned long bytes, unsigned long * __restrict p1, 157 - const unsigned long * __restrict p2) 158 - { 159 - if (in_interrupt()) { 160 - xor_arm4regs_2(bytes, p1, p2); 161 - } else { 162 - kernel_neon_begin(); 163 - xor_block_neon_inner.do_2(bytes, p1, p2); 164 - kernel_neon_end(); 165 - } 166 - } 167 - 168 - static void 169 - xor_neon_3(unsigned long bytes, unsigned long * __restrict p1, 170 - const unsigned long * __restrict p2, 171 - const unsigned long * __restrict p3) 172 - { 173 - if (in_interrupt()) { 174 - xor_arm4regs_3(bytes, p1, p2, p3); 175 - } else { 176 - kernel_neon_begin(); 177 - xor_block_neon_inner.do_3(bytes, p1, p2, p3); 178 - kernel_neon_end(); 179 - } 180 - } 181 - 182 - static void 183 - xor_neon_4(unsigned long bytes, unsigned long * __restrict p1, 184 - const unsigned long * __restrict p2, 185 - const unsigned long * __restrict p3, 186 - const unsigned long * __restrict p4) 187 - { 188 - if (in_interrupt()) { 189 - xor_arm4regs_4(bytes, p1, p2, p3, p4); 190 - } else { 191 - kernel_neon_begin(); 192 - xor_block_neon_inner.do_4(bytes, p1, p2, p3, p4); 193 - kernel_neon_end(); 194 - } 195 - } 196 - 197 - static void 198 - xor_neon_5(unsigned long bytes, unsigned long * __restrict p1, 199 - const unsigned long * __restrict p2, 200 - const unsigned long * __restrict p3, 201 - const unsigned long * __restrict p4, 202 - const unsigned long * __restrict p5) 203 - { 204 - if (in_interrupt()) { 205 - xor_arm4regs_5(bytes, p1, p2, p3, p4, p5); 206 - } else { 207 - kernel_neon_begin(); 208 - xor_block_neon_inner.do_5(bytes, p1, p2, p3, p4, p5); 209 - kernel_neon_end(); 210 - } 211 - } 212 - 213 - static struct xor_block_template xor_block_neon = { 214 - .name = "neon", 215 - .do_2 = xor_neon_2, 216 - .do_3 = xor_neon_3, 217 - .do_4 = xor_neon_4, 218 - .do_5 = xor_neon_5 219 - }; 220 - 221 - #define NEON_TEMPLATES \ 222 - do { if (cpu_has_neon()) xor_speed(&xor_block_neon); } while (0) 223 - #else 224 - #define NEON_TEMPLATES 225 - #endif
-5
arch/arm/lib/Makefile
···
 39  39   $(obj)/csumpartialcopy.o: $(obj)/csumpartialcopygeneric.S
 40  40   $(obj)/csumpartialcopyuser.o: $(obj)/csumpartialcopygeneric.S
 41  41
 42     - ifeq ($(CONFIG_KERNEL_MODE_NEON),y)
 43     -   CFLAGS_xor-neon.o += $(CC_FLAGS_FPU)
 44     -   obj-$(CONFIG_XOR_BLOCKS) += xor-neon.o
 45     - endif
 46     -
 47  42   obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
-38
arch/arm/lib/xor-neon.c
··· 1 - // SPDX-License-Identifier: GPL-2.0-only 2 - /* 3 - * linux/arch/arm/lib/xor-neon.c 4 - * 5 - * Copyright (C) 2013 Linaro Ltd <ard.biesheuvel@linaro.org> 6 - */ 7 - 8 - #include <linux/raid/xor.h> 9 - #include <linux/module.h> 10 - 11 - MODULE_DESCRIPTION("NEON accelerated XOR implementation"); 12 - MODULE_LICENSE("GPL"); 13 - 14 - #ifndef __ARM_NEON__ 15 - #error You should compile this file with '-march=armv7-a -mfloat-abi=softfp -mfpu=neon' 16 - #endif 17 - 18 - /* 19 - * Pull in the reference implementations while instructing GCC (through 20 - * -ftree-vectorize) to attempt to exploit implicit parallelism and emit 21 - * NEON instructions. Clang does this by default at O2 so no pragma is 22 - * needed. 23 - */ 24 - #ifdef CONFIG_CC_IS_GCC 25 - #pragma GCC optimize "tree-vectorize" 26 - #endif 27 - 28 - #pragma GCC diagnostic ignored "-Wunused-variable" 29 - #include <asm-generic/xor.h> 30 - 31 - struct xor_block_template const xor_block_neon_inner = { 32 - .name = "__inner_neon__", 33 - .do_2 = xor_8regs_2, 34 - .do_3 = xor_8regs_3, 35 - .do_4 = xor_8regs_4, 36 - .do_5 = xor_8regs_5, 37 - }; 38 - EXPORT_SYMBOL(xor_block_neon_inner);
-73
arch/arm64/include/asm/xor.h
··· 1 - /* SPDX-License-Identifier: GPL-2.0-only */ 2 - /* 3 - * arch/arm64/include/asm/xor.h 4 - * 5 - * Authors: Jackie Liu <liuyun01@kylinos.cn> 6 - * Copyright (C) 2018,Tianjin KYLIN Information Technology Co., Ltd. 7 - */ 8 - 9 - #include <linux/hardirq.h> 10 - #include <asm-generic/xor.h> 11 - #include <asm/hwcap.h> 12 - #include <asm/simd.h> 13 - 14 - #ifdef CONFIG_KERNEL_MODE_NEON 15 - 16 - extern struct xor_block_template const xor_block_inner_neon; 17 - 18 - static void 19 - xor_neon_2(unsigned long bytes, unsigned long * __restrict p1, 20 - const unsigned long * __restrict p2) 21 - { 22 - scoped_ksimd() 23 - xor_block_inner_neon.do_2(bytes, p1, p2); 24 - } 25 - 26 - static void 27 - xor_neon_3(unsigned long bytes, unsigned long * __restrict p1, 28 - const unsigned long * __restrict p2, 29 - const unsigned long * __restrict p3) 30 - { 31 - scoped_ksimd() 32 - xor_block_inner_neon.do_3(bytes, p1, p2, p3); 33 - } 34 - 35 - static void 36 - xor_neon_4(unsigned long bytes, unsigned long * __restrict p1, 37 - const unsigned long * __restrict p2, 38 - const unsigned long * __restrict p3, 39 - const unsigned long * __restrict p4) 40 - { 41 - scoped_ksimd() 42 - xor_block_inner_neon.do_4(bytes, p1, p2, p3, p4); 43 - } 44 - 45 - static void 46 - xor_neon_5(unsigned long bytes, unsigned long * __restrict p1, 47 - const unsigned long * __restrict p2, 48 - const unsigned long * __restrict p3, 49 - const unsigned long * __restrict p4, 50 - const unsigned long * __restrict p5) 51 - { 52 - scoped_ksimd() 53 - xor_block_inner_neon.do_5(bytes, p1, p2, p3, p4, p5); 54 - } 55 - 56 - static struct xor_block_template xor_block_arm64 = { 57 - .name = "arm64_neon", 58 - .do_2 = xor_neon_2, 59 - .do_3 = xor_neon_3, 60 - .do_4 = xor_neon_4, 61 - .do_5 = xor_neon_5 62 - }; 63 - #undef XOR_TRY_TEMPLATES 64 - #define XOR_TRY_TEMPLATES \ 65 - do { \ 66 - xor_speed(&xor_block_8regs); \ 67 - xor_speed(&xor_block_32regs); \ 68 - if (cpu_has_neon()) { \ 69 - xor_speed(&xor_block_arm64);\ 70 - } \ 71 - } while (0) 72 - 73 - #endif /* ! CONFIG_KERNEL_MODE_NEON */
+4
arch/arm64/kernel/machine_kexec_file.c
···
 134  134
 135  135   	kexec_dprintk("Loaded elf core header at 0x%lx bufsz=0x%lx memsz=0x%lx\n",
 136  136   		      image->elf_load_addr, kbuf.bufsz, kbuf.memsz);
      137 +
      138 + 	ret = crash_load_dm_crypt_keys(image);
      139 + 	if (ret)
      140 + 		goto out_err;
 137  141   }
 138  142   #endif
 139  143
-6
arch/arm64/lib/Makefile
···
  5   5   	memset.o memcmp.o strcmp.o strncmp.o strlen.o \
  6   6   	strnlen.o strchr.o strrchr.o tishift.o
  7   7
  8     - ifeq ($(CONFIG_KERNEL_MODE_NEON), y)
  9     - obj-$(CONFIG_XOR_BLOCKS) += xor-neon.o
 10     - CFLAGS_xor-neon.o += $(CC_FLAGS_FPU)
 11     - CFLAGS_REMOVE_xor-neon.o += $(CC_FLAGS_NO_FPU)
 12     - endif
 13     -
 14   8   lib-$(CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE) += uaccess_flushcache.o
 15   9
 16  10   obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
-338
arch/arm64/lib/xor-neon.c
// SPDX-License-Identifier: GPL-2.0-only
/*
 * arch/arm64/lib/xor-neon.c
 *
 * Authors: Jackie Liu <liuyun01@kylinos.cn>
 * Copyright (C) 2018,Tianjin KYLIN Information Technology Co., Ltd.
 */

#include <linux/raid/xor.h>
#include <linux/module.h>
#include <asm/neon-intrinsics.h>

static void xor_arm64_neon_2(unsigned long bytes, unsigned long * __restrict p1,
			     const unsigned long * __restrict p2)
{
	uint64_t *dp1 = (uint64_t *)p1;
	uint64_t *dp2 = (uint64_t *)p2;

	register uint64x2_t v0, v1, v2, v3;
	long lines = bytes / (sizeof(uint64x2_t) * 4);

	do {
		/* p1 ^= p2 */
		v0 = veorq_u64(vld1q_u64(dp1 + 0), vld1q_u64(dp2 + 0));
		v1 = veorq_u64(vld1q_u64(dp1 + 2), vld1q_u64(dp2 + 2));
		v2 = veorq_u64(vld1q_u64(dp1 + 4), vld1q_u64(dp2 + 4));
		v3 = veorq_u64(vld1q_u64(dp1 + 6), vld1q_u64(dp2 + 6));

		/* store */
		vst1q_u64(dp1 + 0, v0);
		vst1q_u64(dp1 + 2, v1);
		vst1q_u64(dp1 + 4, v2);
		vst1q_u64(dp1 + 6, v3);

		dp1 += 8;
		dp2 += 8;
	} while (--lines > 0);
}

static void xor_arm64_neon_3(unsigned long bytes, unsigned long * __restrict p1,
			     const unsigned long * __restrict p2,
			     const unsigned long * __restrict p3)
{
	uint64_t *dp1 = (uint64_t *)p1;
	uint64_t *dp2 = (uint64_t *)p2;
	uint64_t *dp3 = (uint64_t *)p3;

	register uint64x2_t v0, v1, v2, v3;
	long lines = bytes / (sizeof(uint64x2_t) * 4);

	do {
		/* p1 ^= p2 */
		v0 = veorq_u64(vld1q_u64(dp1 + 0), vld1q_u64(dp2 + 0));
		v1 = veorq_u64(vld1q_u64(dp1 + 2), vld1q_u64(dp2 + 2));
		v2 = veorq_u64(vld1q_u64(dp1 + 4), vld1q_u64(dp2 + 4));
		v3 = veorq_u64(vld1q_u64(dp1 + 6), vld1q_u64(dp2 + 6));

		/* p1 ^= p3 */
		v0 = veorq_u64(v0, vld1q_u64(dp3 + 0));
		v1 = veorq_u64(v1, vld1q_u64(dp3 + 2));
		v2 = veorq_u64(v2, vld1q_u64(dp3 + 4));
		v3 = veorq_u64(v3, vld1q_u64(dp3 + 6));

		/* store */
		vst1q_u64(dp1 + 0, v0);
		vst1q_u64(dp1 + 2, v1);
		vst1q_u64(dp1 + 4, v2);
		vst1q_u64(dp1 + 6, v3);

		dp1 += 8;
		dp2 += 8;
		dp3 += 8;
	} while (--lines > 0);
}

static void xor_arm64_neon_4(unsigned long bytes, unsigned long * __restrict p1,
			     const unsigned long * __restrict p2,
			     const unsigned long * __restrict p3,
			     const unsigned long * __restrict p4)
{
	uint64_t *dp1 = (uint64_t *)p1;
	uint64_t *dp2 = (uint64_t *)p2;
	uint64_t *dp3 = (uint64_t *)p3;
	uint64_t *dp4 = (uint64_t *)p4;

	register uint64x2_t v0, v1, v2, v3;
	long lines = bytes / (sizeof(uint64x2_t) * 4);

	do {
		/* p1 ^= p2 */
		v0 = veorq_u64(vld1q_u64(dp1 + 0), vld1q_u64(dp2 + 0));
		v1 = veorq_u64(vld1q_u64(dp1 + 2), vld1q_u64(dp2 + 2));
		v2 = veorq_u64(vld1q_u64(dp1 + 4), vld1q_u64(dp2 + 4));
		v3 = veorq_u64(vld1q_u64(dp1 + 6), vld1q_u64(dp2 + 6));

		/* p1 ^= p3 */
		v0 = veorq_u64(v0, vld1q_u64(dp3 + 0));
		v1 = veorq_u64(v1, vld1q_u64(dp3 + 2));
		v2 = veorq_u64(v2, vld1q_u64(dp3 + 4));
		v3 = veorq_u64(v3, vld1q_u64(dp3 + 6));

		/* p1 ^= p4 */
		v0 = veorq_u64(v0, vld1q_u64(dp4 + 0));
		v1 = veorq_u64(v1, vld1q_u64(dp4 + 2));
		v2 = veorq_u64(v2, vld1q_u64(dp4 + 4));
		v3 = veorq_u64(v3, vld1q_u64(dp4 + 6));

		/* store */
		vst1q_u64(dp1 + 0, v0);
		vst1q_u64(dp1 + 2, v1);
		vst1q_u64(dp1 + 4, v2);
		vst1q_u64(dp1 + 6, v3);

		dp1 += 8;
		dp2 += 8;
		dp3 += 8;
		dp4 += 8;
	} while (--lines > 0);
}

static void xor_arm64_neon_5(unsigned long bytes, unsigned long * __restrict p1,
			     const unsigned long * __restrict p2,
			     const unsigned long * __restrict p3,
			     const unsigned long * __restrict p4,
			     const unsigned long * __restrict p5)
{
	uint64_t *dp1 = (uint64_t *)p1;
	uint64_t *dp2 = (uint64_t *)p2;
	uint64_t *dp3 = (uint64_t *)p3;
	uint64_t *dp4 = (uint64_t *)p4;
	uint64_t *dp5 = (uint64_t *)p5;

	register uint64x2_t v0, v1, v2, v3;
	long lines = bytes / (sizeof(uint64x2_t) * 4);

	do {
		/* p1 ^= p2 */
		v0 = veorq_u64(vld1q_u64(dp1 + 0), vld1q_u64(dp2 + 0));
		v1 = veorq_u64(vld1q_u64(dp1 + 2), vld1q_u64(dp2 + 2));
		v2 = veorq_u64(vld1q_u64(dp1 + 4), vld1q_u64(dp2 + 4));
		v3 = veorq_u64(vld1q_u64(dp1 + 6), vld1q_u64(dp2 + 6));

		/* p1 ^= p3 */
		v0 = veorq_u64(v0, vld1q_u64(dp3 + 0));
		v1 = veorq_u64(v1, vld1q_u64(dp3 + 2));
		v2 = veorq_u64(v2, vld1q_u64(dp3 + 4));
		v3 = veorq_u64(v3, vld1q_u64(dp3 + 6));

		/* p1 ^= p4 */
		v0 = veorq_u64(v0, vld1q_u64(dp4 + 0));
		v1 = veorq_u64(v1, vld1q_u64(dp4 + 2));
		v2 = veorq_u64(v2, vld1q_u64(dp4 + 4));
		v3 = veorq_u64(v3, vld1q_u64(dp4 + 6));

		/* p1 ^= p5 */
		v0 = veorq_u64(v0, vld1q_u64(dp5 + 0));
		v1 = veorq_u64(v1, vld1q_u64(dp5 + 2));
		v2 = veorq_u64(v2, vld1q_u64(dp5 + 4));
		v3 = veorq_u64(v3, vld1q_u64(dp5 + 6));

		/* store */
		vst1q_u64(dp1 + 0, v0);
		vst1q_u64(dp1 + 2, v1);
		vst1q_u64(dp1 + 4, v2);
		vst1q_u64(dp1 + 6, v3);

		dp1 += 8;
		dp2 += 8;
		dp3 += 8;
		dp4 += 8;
		dp5 += 8;
	} while (--lines > 0);
}

struct xor_block_template xor_block_inner_neon __ro_after_init = {
	.name	= "__inner_neon__",
	.do_2	= xor_arm64_neon_2,
	.do_3	= xor_arm64_neon_3,
	.do_4	= xor_arm64_neon_4,
	.do_5	= xor_arm64_neon_5,
};
EXPORT_SYMBOL(xor_block_inner_neon);

static inline uint64x2_t eor3(uint64x2_t p, uint64x2_t q, uint64x2_t r)
{
	uint64x2_t res;

	asm(ARM64_ASM_PREAMBLE ".arch_extension sha3\n"
	    "eor3 %0.16b, %1.16b, %2.16b, %3.16b"
	    : "=w"(res) : "w"(p), "w"(q), "w"(r));
	return res;
}

static void xor_arm64_eor3_3(unsigned long bytes,
			     unsigned long * __restrict p1,
			     const unsigned long * __restrict p2,
			     const unsigned long * __restrict p3)
{
	uint64_t *dp1 = (uint64_t *)p1;
	uint64_t *dp2 = (uint64_t *)p2;
	uint64_t *dp3 = (uint64_t *)p3;

	register uint64x2_t v0, v1, v2, v3;
	long lines = bytes / (sizeof(uint64x2_t) * 4);

	do {
		/* p1 ^= p2 ^ p3 */
		v0 = eor3(vld1q_u64(dp1 + 0), vld1q_u64(dp2 + 0),
			  vld1q_u64(dp3 + 0));
		v1 = eor3(vld1q_u64(dp1 + 2), vld1q_u64(dp2 + 2),
			  vld1q_u64(dp3 + 2));
		v2 = eor3(vld1q_u64(dp1 + 4), vld1q_u64(dp2 + 4),
			  vld1q_u64(dp3 + 4));
		v3 = eor3(vld1q_u64(dp1 + 6), vld1q_u64(dp2 + 6),
			  vld1q_u64(dp3 + 6));

		/* store */
		vst1q_u64(dp1 + 0, v0);
		vst1q_u64(dp1 + 2, v1);
		vst1q_u64(dp1 + 4, v2);
		vst1q_u64(dp1 + 6, v3);

		dp1 += 8;
		dp2 += 8;
		dp3 += 8;
	} while (--lines > 0);
}

static void xor_arm64_eor3_4(unsigned long bytes,
			     unsigned long * __restrict p1,
			     const unsigned long * __restrict p2,
			     const unsigned long * __restrict p3,
			     const unsigned long * __restrict p4)
{
	uint64_t *dp1 = (uint64_t *)p1;
	uint64_t *dp2 = (uint64_t *)p2;
	uint64_t *dp3 = (uint64_t *)p3;
	uint64_t *dp4 = (uint64_t *)p4;

	register uint64x2_t v0, v1, v2, v3;
	long lines = bytes / (sizeof(uint64x2_t) * 4);

	do {
		/* p1 ^= p2 ^ p3 */
		v0 = eor3(vld1q_u64(dp1 + 0), vld1q_u64(dp2 + 0),
			  vld1q_u64(dp3 + 0));
		v1 = eor3(vld1q_u64(dp1 + 2), vld1q_u64(dp2 + 2),
			  vld1q_u64(dp3 + 2));
		v2 = eor3(vld1q_u64(dp1 + 4), vld1q_u64(dp2 + 4),
			  vld1q_u64(dp3 + 4));
		v3 = eor3(vld1q_u64(dp1 + 6), vld1q_u64(dp2 + 6),
			  vld1q_u64(dp3 + 6));

		/* p1 ^= p4 */
		v0 = veorq_u64(v0, vld1q_u64(dp4 + 0));
		v1 = veorq_u64(v1, vld1q_u64(dp4 + 2));
		v2 = veorq_u64(v2, vld1q_u64(dp4 + 4));
		v3 = veorq_u64(v3, vld1q_u64(dp4 + 6));

		/* store */
		vst1q_u64(dp1 + 0, v0);
		vst1q_u64(dp1 + 2, v1);
		vst1q_u64(dp1 + 4, v2);
		vst1q_u64(dp1 + 6, v3);

		dp1 += 8;
		dp2 += 8;
		dp3 += 8;
		dp4 += 8;
	} while (--lines > 0);
}

static void xor_arm64_eor3_5(unsigned long bytes,
			     unsigned long * __restrict p1,
			     const unsigned long * __restrict p2,
			     const unsigned long * __restrict p3,
			     const unsigned long * __restrict p4,
			     const unsigned long * __restrict p5)
{
	uint64_t *dp1 = (uint64_t *)p1;
	uint64_t *dp2 = (uint64_t *)p2;
	uint64_t *dp3 = (uint64_t *)p3;
	uint64_t *dp4 = (uint64_t *)p4;
	uint64_t *dp5 = (uint64_t *)p5;

	register uint64x2_t v0, v1, v2, v3;
	long lines = bytes / (sizeof(uint64x2_t) * 4);

	do {
		/* p1 ^= p2 ^ p3 */
		v0 = eor3(vld1q_u64(dp1 + 0), vld1q_u64(dp2 + 0),
			  vld1q_u64(dp3 + 0));
		v1 = eor3(vld1q_u64(dp1 + 2), vld1q_u64(dp2 + 2),
			  vld1q_u64(dp3 + 2));
		v2 = eor3(vld1q_u64(dp1 + 4), vld1q_u64(dp2 + 4),
			  vld1q_u64(dp3 + 4));
		v3 = eor3(vld1q_u64(dp1 + 6), vld1q_u64(dp2 + 6),
			  vld1q_u64(dp3 + 6));

		/* p1 ^= p4 ^ p5 */
		v0 = eor3(v0, vld1q_u64(dp4 + 0), vld1q_u64(dp5 + 0));
		v1 = eor3(v1, vld1q_u64(dp4 + 2), vld1q_u64(dp5 + 2));
		v2 = eor3(v2, vld1q_u64(dp4 + 4), vld1q_u64(dp5 + 4));
		v3 = eor3(v3, vld1q_u64(dp4 + 6), vld1q_u64(dp5 + 6));

		/* store */
		vst1q_u64(dp1 + 0, v0);
		vst1q_u64(dp1 + 2, v1);
		vst1q_u64(dp1 + 4, v2);
		vst1q_u64(dp1 + 6, v3);

		dp1 += 8;
		dp2 += 8;
		dp3 += 8;
		dp4 += 8;
		dp5 += 8;
	} while (--lines > 0);
}

static int __init xor_neon_init(void)
{
	if (cpu_have_named_feature(SHA3)) {
		xor_block_inner_neon.do_3 = xor_arm64_eor3_3;
		xor_block_inner_neon.do_4 = xor_arm64_eor3_4;
		xor_block_inner_neon.do_5 = xor_arm64_eor3_5;
	}
	return 0;
}
module_init(xor_neon_init);

static void __exit xor_neon_exit(void)
{
}
module_exit(xor_neon_exit);

MODULE_AUTHOR("Jackie Liu <liuyun01@kylinos.cn>");
MODULE_DESCRIPTION("ARMv8 XOR Extensions");
MODULE_LICENSE("GPL");
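The eor3() helper above is the interesting part of this file: the ARMv8.2 SHA3 extension provides a single EOR3 instruction that XORs three vector operands, halving the dependency chain of the plain NEON path. For orientation, this is the scalar operation being accelerated; a minimal user-space sketch, with the illustrative name xor3_scalar (not a kernel function):

	/* Hedged sketch: the scalar equivalent of the EOR3 do_3 path. */
	#include <stdint.h>
	#include <stddef.h>

	static void xor3_scalar(size_t bytes, uint64_t *restrict p1,
				const uint64_t *restrict p2,
				const uint64_t *restrict p3)
	{
		for (size_t i = 0; i < bytes / sizeof(uint64_t); i++)
			p1[i] ^= p2[i] ^ p3[i];	/* one EOR3 replaces two EORs */
	}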
-68
arch/loongarch/include/asm/xor.h
/* SPDX-License-Identifier: GPL-2.0-or-later */
/*
 * Copyright (C) 2023 WANG Xuerui <git@xen0n.name>
 */
#ifndef _ASM_LOONGARCH_XOR_H
#define _ASM_LOONGARCH_XOR_H

#include <asm/cpu-features.h>
#include <asm/xor_simd.h>

#ifdef CONFIG_CPU_HAS_LSX
static struct xor_block_template xor_block_lsx = {
	.name = "lsx",
	.do_2 = xor_lsx_2,
	.do_3 = xor_lsx_3,
	.do_4 = xor_lsx_4,
	.do_5 = xor_lsx_5,
};

#define XOR_SPEED_LSX()				\
	do {					\
		if (cpu_has_lsx)		\
			xor_speed(&xor_block_lsx); \
	} while (0)
#else /* CONFIG_CPU_HAS_LSX */
#define XOR_SPEED_LSX()
#endif /* CONFIG_CPU_HAS_LSX */

#ifdef CONFIG_CPU_HAS_LASX
static struct xor_block_template xor_block_lasx = {
	.name = "lasx",
	.do_2 = xor_lasx_2,
	.do_3 = xor_lasx_3,
	.do_4 = xor_lasx_4,
	.do_5 = xor_lasx_5,
};

#define XOR_SPEED_LASX()			\
	do {					\
		if (cpu_has_lasx)		\
			xor_speed(&xor_block_lasx); \
	} while (0)
#else /* CONFIG_CPU_HAS_LASX */
#define XOR_SPEED_LASX()
#endif /* CONFIG_CPU_HAS_LASX */

/*
 * For grins, also test the generic routines.
 *
 * More importantly: it cannot be ruled out at this point of time, that some
 * future (maybe reduced) models could run the vector algorithms slower than
 * the scalar ones, maybe for errata or micro-op reasons. It may be
 * appropriate to revisit this after one or two more uarch generations.
 */
#include <asm-generic/xor.h>

#undef XOR_TRY_TEMPLATES
#define XOR_TRY_TEMPLATES			\
	do {					\
		xor_speed(&xor_block_8regs);	\
		xor_speed(&xor_block_8regs_p);	\
		xor_speed(&xor_block_32regs);	\
		xor_speed(&xor_block_32regs_p);	\
		XOR_SPEED_LSX();		\
		XOR_SPEED_LASX();		\
	} while (0)

#endif /* _ASM_LOONGARCH_XOR_H */
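XOR_TRY_TEMPLATES is the per-arch hook that the boot-time calibration code expands: each xor_speed() call benchmarks one candidate template and the fastest one wins. A minimal sketch of that selection idea, assuming a hypothetical measure() stub returning throughput (the real logic lives in the kernel's calibrate_xor_blocks()):

	#include <stddef.h>

	struct xor_block_template;			/* as in <linux/raid/xor.h> */
	static unsigned long measure(struct xor_block_template *t); /* hypothetical stub */

	/* Hedged sketch: keep whichever candidate benchmarks fastest. */
	static struct xor_block_template *
	pick_fastest(struct xor_block_template *cand[], size_t n)
	{
		struct xor_block_template *best = NULL;
		unsigned long best_speed = 0;

		for (size_t i = 0; i < n; i++) {
			unsigned long s = measure(cand[i]);	/* higher is faster */
			if (s > best_speed) {
				best_speed = s;
				best = cand[i];
			}
		}
		return best;
	}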
-34
arch/loongarch/include/asm/xor_simd.h
/* SPDX-License-Identifier: GPL-2.0-or-later */
/*
 * Copyright (C) 2023 WANG Xuerui <git@xen0n.name>
 */
#ifndef _ASM_LOONGARCH_XOR_SIMD_H
#define _ASM_LOONGARCH_XOR_SIMD_H

#ifdef CONFIG_CPU_HAS_LSX
void xor_lsx_2(unsigned long bytes, unsigned long * __restrict p1,
	       const unsigned long * __restrict p2);
void xor_lsx_3(unsigned long bytes, unsigned long * __restrict p1,
	       const unsigned long * __restrict p2, const unsigned long * __restrict p3);
void xor_lsx_4(unsigned long bytes, unsigned long * __restrict p1,
	       const unsigned long * __restrict p2, const unsigned long * __restrict p3,
	       const unsigned long * __restrict p4);
void xor_lsx_5(unsigned long bytes, unsigned long * __restrict p1,
	       const unsigned long * __restrict p2, const unsigned long * __restrict p3,
	       const unsigned long * __restrict p4, const unsigned long * __restrict p5);
#endif /* CONFIG_CPU_HAS_LSX */

#ifdef CONFIG_CPU_HAS_LASX
void xor_lasx_2(unsigned long bytes, unsigned long * __restrict p1,
		const unsigned long * __restrict p2);
void xor_lasx_3(unsigned long bytes, unsigned long * __restrict p1,
		const unsigned long * __restrict p2, const unsigned long * __restrict p3);
void xor_lasx_4(unsigned long bytes, unsigned long * __restrict p1,
		const unsigned long * __restrict p2, const unsigned long * __restrict p3,
		const unsigned long * __restrict p4);
void xor_lasx_5(unsigned long bytes, unsigned long * __restrict p1,
		const unsigned long * __restrict p2, const unsigned long * __restrict p3,
		const unsigned long * __restrict p4, const unsigned long * __restrict p5);
#endif /* CONFIG_CPU_HAS_LASX */

#endif /* _ASM_LOONGARCH_XOR_SIMD_H */
-2
arch/loongarch/lib/Makefile
 
 obj-$(CONFIG_ARCH_SUPPORTS_INT128) += tishift.o
 
-obj-$(CONFIG_CPU_HAS_LSX) += xor_simd.o xor_simd_glue.o
-
 obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
arch/loongarch/lib/xor_simd.c → lib/raid/xor/loongarch/xor_simd.c
arch/loongarch/lib/xor_simd.h → lib/raid/xor/loongarch/xor_simd.h
-72
arch/loongarch/lib/xor_simd_glue.c
// SPDX-License-Identifier: GPL-2.0-or-later
/*
 * LoongArch SIMD XOR operations
 *
 * Copyright (C) 2023 WANG Xuerui <git@xen0n.name>
 */

#include <linux/export.h>
#include <linux/sched.h>
#include <asm/fpu.h>
#include <asm/xor_simd.h>
#include "xor_simd.h"

#define MAKE_XOR_GLUE_2(flavor)							\
void xor_##flavor##_2(unsigned long bytes, unsigned long * __restrict p1,	\
		      const unsigned long * __restrict p2)			\
{										\
	kernel_fpu_begin();							\
	__xor_##flavor##_2(bytes, p1, p2);					\
	kernel_fpu_end();							\
}										\
EXPORT_SYMBOL_GPL(xor_##flavor##_2)

#define MAKE_XOR_GLUE_3(flavor)							\
void xor_##flavor##_3(unsigned long bytes, unsigned long * __restrict p1,	\
		      const unsigned long * __restrict p2,			\
		      const unsigned long * __restrict p3)			\
{										\
	kernel_fpu_begin();							\
	__xor_##flavor##_3(bytes, p1, p2, p3);					\
	kernel_fpu_end();							\
}										\
EXPORT_SYMBOL_GPL(xor_##flavor##_3)

#define MAKE_XOR_GLUE_4(flavor)							\
void xor_##flavor##_4(unsigned long bytes, unsigned long * __restrict p1,	\
		      const unsigned long * __restrict p2,			\
		      const unsigned long * __restrict p3,			\
		      const unsigned long * __restrict p4)			\
{										\
	kernel_fpu_begin();							\
	__xor_##flavor##_4(bytes, p1, p2, p3, p4);				\
	kernel_fpu_end();							\
}										\
EXPORT_SYMBOL_GPL(xor_##flavor##_4)

#define MAKE_XOR_GLUE_5(flavor)							\
void xor_##flavor##_5(unsigned long bytes, unsigned long * __restrict p1,	\
		      const unsigned long * __restrict p2,			\
		      const unsigned long * __restrict p3,			\
		      const unsigned long * __restrict p4,			\
		      const unsigned long * __restrict p5)			\
{										\
	kernel_fpu_begin();							\
	__xor_##flavor##_5(bytes, p1, p2, p3, p4, p5);				\
	kernel_fpu_end();							\
}										\
EXPORT_SYMBOL_GPL(xor_##flavor##_5)

#define MAKE_XOR_GLUES(flavor)		\
	MAKE_XOR_GLUE_2(flavor);	\
	MAKE_XOR_GLUE_3(flavor);	\
	MAKE_XOR_GLUE_4(flavor);	\
	MAKE_XOR_GLUE_5(flavor)

#ifdef CONFIG_CPU_HAS_LSX
MAKE_XOR_GLUES(lsx);
#endif

#ifdef CONFIG_CPU_HAS_LASX
MAKE_XOR_GLUES(lasx);
#endif
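The glue exists because kernel code may only touch the FP/vector register file inside a kernel_fpu_begin()/kernel_fpu_end() section. For reference, this is what MAKE_XOR_GLUE_2(lsx) expands to (an illustrative expansion of the macro above, not separate source):

	void xor_lsx_2(unsigned long bytes, unsigned long * __restrict p1,
		       const unsigned long * __restrict p2)
	{
		kernel_fpu_begin();		/* claim the FP/SIMD registers */
		__xor_lsx_2(bytes, p1, p2);	/* vectorized body using LSX */
		kernel_fpu_end();		/* restore task FP state */
	}
	EXPORT_SYMBOL_GPL(xor_lsx_2);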
arch/loongarch/lib/xor_template.c → lib/raid/xor/loongarch/xor_template.c
-47
arch/powerpc/include/asm/xor.h
/* SPDX-License-Identifier: GPL-2.0-or-later */
/*
 *
 * Copyright (C) IBM Corporation, 2012
 *
 * Author: Anton Blanchard <anton@au.ibm.com>
 */
#ifndef _ASM_POWERPC_XOR_H
#define _ASM_POWERPC_XOR_H

#ifdef CONFIG_ALTIVEC

#include <asm/cputable.h>
#include <asm/cpu_has_feature.h>
#include <asm/xor_altivec.h>

static struct xor_block_template xor_block_altivec = {
	.name = "altivec",
	.do_2 = xor_altivec_2,
	.do_3 = xor_altivec_3,
	.do_4 = xor_altivec_4,
	.do_5 = xor_altivec_5,
};

#define XOR_SPEED_ALTIVEC()				\
	do {						\
		if (cpu_has_feature(CPU_FTR_ALTIVEC))	\
			xor_speed(&xor_block_altivec);	\
	} while (0)
#else
#define XOR_SPEED_ALTIVEC()
#endif

/* Also try the generic routines. */
#include <asm-generic/xor.h>

#undef XOR_TRY_TEMPLATES
#define XOR_TRY_TEMPLATES			\
	do {					\
		xor_speed(&xor_block_8regs);	\
		xor_speed(&xor_block_8regs_p);	\
		xor_speed(&xor_block_32regs);	\
		xor_speed(&xor_block_32regs_p);	\
		XOR_SPEED_ALTIVEC();		\
	} while (0)

#endif /* _ASM_POWERPC_XOR_H */
-22
arch/powerpc/include/asm/xor_altivec.h
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _ASM_POWERPC_XOR_ALTIVEC_H
#define _ASM_POWERPC_XOR_ALTIVEC_H

#ifdef CONFIG_ALTIVEC
void xor_altivec_2(unsigned long bytes, unsigned long * __restrict p1,
		   const unsigned long * __restrict p2);
void xor_altivec_3(unsigned long bytes, unsigned long * __restrict p1,
		   const unsigned long * __restrict p2,
		   const unsigned long * __restrict p3);
void xor_altivec_4(unsigned long bytes, unsigned long * __restrict p1,
		   const unsigned long * __restrict p2,
		   const unsigned long * __restrict p3,
		   const unsigned long * __restrict p4);
void xor_altivec_5(unsigned long bytes, unsigned long * __restrict p1,
		   const unsigned long * __restrict p2,
		   const unsigned long * __restrict p3,
		   const unsigned long * __restrict p4,
		   const unsigned long * __restrict p5);

#endif
#endif /* _ASM_POWERPC_XOR_ALTIVEC_H */
+4
arch/powerpc/kexec/elf_64.c
 		goto out;
 	}
 
+	ret = crash_load_dm_crypt_keys(image);
+	if (ret)
+		goto out;
+
 	/* Setup cmdline for kdump kernel case */
 	modified_cmdline = setup_kdump_cmdline(image, cmdline,
 					       cmdline_len);
-5
arch/powerpc/lib/Makefile
 
 obj-$(CONFIG_FTR_FIXUP_SELFTEST) += feature-fixups-test.o
 
-obj-$(CONFIG_ALTIVEC) += xor_vmx.o xor_vmx_glue.o
-CFLAGS_xor_vmx.o += -mhard-float -maltivec $(call cc-option,-mabi=altivec)
-# Enable <altivec.h>
-CFLAGS_xor_vmx.o += -isystem $(shell $(CC) -print-file-name=include)
-
 obj-$(CONFIG_PPC64) += $(obj64-y)
-156
arch/powerpc/lib/xor_vmx.c
// SPDX-License-Identifier: GPL-2.0-or-later
/*
 *
 * Copyright (C) IBM Corporation, 2012
 *
 * Author: Anton Blanchard <anton@au.ibm.com>
 */

/*
 * Sparse (as at v0.5.0) gets very, very confused by this file.
 * Make it a bit simpler for it.
 */
#if !defined(__CHECKER__)
#include <altivec.h>
#else
#define vec_xor(a, b) a ^ b
#define vector __attribute__((vector_size(16)))
#endif

#include "xor_vmx.h"

typedef vector signed char unative_t;

#define DEFINE(V)				\
	unative_t *V = (unative_t *)V##_in;	\
	unative_t V##_0, V##_1, V##_2, V##_3

#define LOAD(V)			\
	do {			\
		V##_0 = V[0];	\
		V##_1 = V[1];	\
		V##_2 = V[2];	\
		V##_3 = V[3];	\
	} while (0)

#define STORE(V)		\
	do {			\
		V[0] = V##_0;	\
		V[1] = V##_1;	\
		V[2] = V##_2;	\
		V[3] = V##_3;	\
	} while (0)

#define XOR(V1, V2)					\
	do {						\
		V1##_0 = vec_xor(V1##_0, V2##_0);	\
		V1##_1 = vec_xor(V1##_1, V2##_1);	\
		V1##_2 = vec_xor(V1##_2, V2##_2);	\
		V1##_3 = vec_xor(V1##_3, V2##_3);	\
	} while (0)

void __xor_altivec_2(unsigned long bytes,
		     unsigned long * __restrict v1_in,
		     const unsigned long * __restrict v2_in)
{
	DEFINE(v1);
	DEFINE(v2);
	unsigned long lines = bytes / (sizeof(unative_t)) / 4;

	do {
		LOAD(v1);
		LOAD(v2);
		XOR(v1, v2);
		STORE(v1);

		v1 += 4;
		v2 += 4;
	} while (--lines > 0);
}

void __xor_altivec_3(unsigned long bytes,
		     unsigned long * __restrict v1_in,
		     const unsigned long * __restrict v2_in,
		     const unsigned long * __restrict v3_in)
{
	DEFINE(v1);
	DEFINE(v2);
	DEFINE(v3);
	unsigned long lines = bytes / (sizeof(unative_t)) / 4;

	do {
		LOAD(v1);
		LOAD(v2);
		LOAD(v3);
		XOR(v1, v2);
		XOR(v1, v3);
		STORE(v1);

		v1 += 4;
		v2 += 4;
		v3 += 4;
	} while (--lines > 0);
}

void __xor_altivec_4(unsigned long bytes,
		     unsigned long * __restrict v1_in,
		     const unsigned long * __restrict v2_in,
		     const unsigned long * __restrict v3_in,
		     const unsigned long * __restrict v4_in)
{
	DEFINE(v1);
	DEFINE(v2);
	DEFINE(v3);
	DEFINE(v4);
	unsigned long lines = bytes / (sizeof(unative_t)) / 4;

	do {
		LOAD(v1);
		LOAD(v2);
		LOAD(v3);
		LOAD(v4);
		XOR(v1, v2);
		XOR(v3, v4);
		XOR(v1, v3);
		STORE(v1);

		v1 += 4;
		v2 += 4;
		v3 += 4;
		v4 += 4;
	} while (--lines > 0);
}

void __xor_altivec_5(unsigned long bytes,
		     unsigned long * __restrict v1_in,
		     const unsigned long * __restrict v2_in,
		     const unsigned long * __restrict v3_in,
		     const unsigned long * __restrict v4_in,
		     const unsigned long * __restrict v5_in)
{
	DEFINE(v1);
	DEFINE(v2);
	DEFINE(v3);
	DEFINE(v4);
	DEFINE(v5);
	unsigned long lines = bytes / (sizeof(unative_t)) / 4;

	do {
		LOAD(v1);
		LOAD(v2);
		LOAD(v3);
		LOAD(v4);
		LOAD(v5);
		XOR(v1, v2);
		XOR(v3, v4);
		XOR(v1, v5);
		XOR(v1, v3);
		STORE(v1);

		v1 += 4;
		v2 += 4;
		v3 += 4;
		v4 += 4;
		v5 += 4;
	} while (--lines > 0);
}
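The DEFINE/LOAD/XOR/STORE macros unroll four Altivec vectors per loop iteration. As a worked example, this is approximately what __xor_altivec_2 looks like after macro expansion (an illustrative expansion, not generated output):

	void __xor_altivec_2(unsigned long bytes,
			     unsigned long * __restrict v1_in,
			     const unsigned long * __restrict v2_in)
	{
		unative_t *v1 = (unative_t *)v1_in;
		unative_t v1_0, v1_1, v1_2, v1_3;
		unative_t *v2 = (unative_t *)v2_in;
		unative_t v2_0, v2_1, v2_2, v2_3;
		unsigned long lines = bytes / (sizeof(unative_t)) / 4;

		do {
			/* LOAD both operands, four vectors each */
			v1_0 = v1[0]; v1_1 = v1[1]; v1_2 = v1[2]; v1_3 = v1[3];
			v2_0 = v2[0]; v2_1 = v2[1]; v2_2 = v2[2]; v2_3 = v2[3];
			/* XOR(v1, v2) */
			v1_0 = vec_xor(v1_0, v2_0);
			v1_1 = vec_xor(v1_1, v2_1);
			v1_2 = vec_xor(v1_2, v2_2);
			v1_3 = vec_xor(v1_3, v2_3);
			/* STORE(v1) */
			v1[0] = v1_0; v1[1] = v1_1; v1[2] = v1_2; v1[3] = v1_3;
			v1 += 4;
			v2 += 4;
		} while (--lines > 0);
	}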
-22
arch/powerpc/lib/xor_vmx.h
/* SPDX-License-Identifier: GPL-2.0 */
/*
 * Simple interface to link xor_vmx.c and xor_vmx_glue.c
 *
 * Separating these file ensures that no altivec instructions are run
 * outside of the enable/disable altivec block.
 */

void __xor_altivec_2(unsigned long bytes, unsigned long * __restrict p1,
		     const unsigned long * __restrict p2);
void __xor_altivec_3(unsigned long bytes, unsigned long * __restrict p1,
		     const unsigned long * __restrict p2,
		     const unsigned long * __restrict p3);
void __xor_altivec_4(unsigned long bytes, unsigned long * __restrict p1,
		     const unsigned long * __restrict p2,
		     const unsigned long * __restrict p3,
		     const unsigned long * __restrict p4);
void __xor_altivec_5(unsigned long bytes, unsigned long * __restrict p1,
		     const unsigned long * __restrict p2,
		     const unsigned long * __restrict p3,
		     const unsigned long * __restrict p4,
		     const unsigned long * __restrict p5);
-63
arch/powerpc/lib/xor_vmx_glue.c
// SPDX-License-Identifier: GPL-2.0-or-later
/*
 * Altivec XOR operations
 *
 * Copyright 2017 IBM Corp.
 */

#include <linux/preempt.h>
#include <linux/export.h>
#include <linux/sched.h>
#include <asm/switch_to.h>
#include <asm/xor_altivec.h>
#include "xor_vmx.h"

void xor_altivec_2(unsigned long bytes, unsigned long * __restrict p1,
		   const unsigned long * __restrict p2)
{
	preempt_disable();
	enable_kernel_altivec();
	__xor_altivec_2(bytes, p1, p2);
	disable_kernel_altivec();
	preempt_enable();
}
EXPORT_SYMBOL(xor_altivec_2);

void xor_altivec_3(unsigned long bytes, unsigned long * __restrict p1,
		   const unsigned long * __restrict p2,
		   const unsigned long * __restrict p3)
{
	preempt_disable();
	enable_kernel_altivec();
	__xor_altivec_3(bytes, p1, p2, p3);
	disable_kernel_altivec();
	preempt_enable();
}
EXPORT_SYMBOL(xor_altivec_3);

void xor_altivec_4(unsigned long bytes, unsigned long * __restrict p1,
		   const unsigned long * __restrict p2,
		   const unsigned long * __restrict p3,
		   const unsigned long * __restrict p4)
{
	preempt_disable();
	enable_kernel_altivec();
	__xor_altivec_4(bytes, p1, p2, p3, p4);
	disable_kernel_altivec();
	preempt_enable();
}
EXPORT_SYMBOL(xor_altivec_4);

void xor_altivec_5(unsigned long bytes, unsigned long * __restrict p1,
		   const unsigned long * __restrict p2,
		   const unsigned long * __restrict p3,
		   const unsigned long * __restrict p4,
		   const unsigned long * __restrict p5)
{
	preempt_disable();
	enable_kernel_altivec();
	__xor_altivec_5(bytes, p1, p2, p3, p4, p5);
	disable_kernel_altivec();
	preempt_enable();
}
EXPORT_SYMBOL(xor_altivec_5);
-68
arch/riscv/include/asm/xor.h
/* SPDX-License-Identifier: GPL-2.0-or-later */
/*
 * Copyright (C) 2021 SiFive
 */

#include <linux/hardirq.h>
#include <asm-generic/xor.h>
#ifdef CONFIG_RISCV_ISA_V
#include <asm/vector.h>
#include <asm/switch_to.h>
#include <asm/asm-prototypes.h>

static void xor_vector_2(unsigned long bytes, unsigned long *__restrict p1,
			 const unsigned long *__restrict p2)
{
	kernel_vector_begin();
	xor_regs_2_(bytes, p1, p2);
	kernel_vector_end();
}

static void xor_vector_3(unsigned long bytes, unsigned long *__restrict p1,
			 const unsigned long *__restrict p2,
			 const unsigned long *__restrict p3)
{
	kernel_vector_begin();
	xor_regs_3_(bytes, p1, p2, p3);
	kernel_vector_end();
}

static void xor_vector_4(unsigned long bytes, unsigned long *__restrict p1,
			 const unsigned long *__restrict p2,
			 const unsigned long *__restrict p3,
			 const unsigned long *__restrict p4)
{
	kernel_vector_begin();
	xor_regs_4_(bytes, p1, p2, p3, p4);
	kernel_vector_end();
}

static void xor_vector_5(unsigned long bytes, unsigned long *__restrict p1,
			 const unsigned long *__restrict p2,
			 const unsigned long *__restrict p3,
			 const unsigned long *__restrict p4,
			 const unsigned long *__restrict p5)
{
	kernel_vector_begin();
	xor_regs_5_(bytes, p1, p2, p3, p4, p5);
	kernel_vector_end();
}

static struct xor_block_template xor_block_rvv = {
	.name = "rvv",
	.do_2 = xor_vector_2,
	.do_3 = xor_vector_3,
	.do_4 = xor_vector_4,
	.do_5 = xor_vector_5
};

#undef XOR_TRY_TEMPLATES
#define XOR_TRY_TEMPLATES			\
	do {					\
		xor_speed(&xor_block_8regs);	\
		xor_speed(&xor_block_32regs);	\
		if (has_vector()) {		\
			xor_speed(&xor_block_rvv); \
		}				\
	} while (0)
#endif
-1
arch/riscv/lib/Makefile
 lib-$(CONFIG_64BIT)		+= tishift.o
 lib-$(CONFIG_RISCV_ISA_ZICBOZ)	+= clear_page.o
 obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
-lib-$(CONFIG_RISCV_ISA_V)	+= xor.o
 lib-$(CONFIG_RISCV_ISA_V)	+= riscv_v_helpers.o
-81
arch/riscv/lib/xor.S
/* SPDX-License-Identifier: GPL-2.0-or-later */
/*
 * Copyright (C) 2021 SiFive
 */
#include <linux/linkage.h>
#include <linux/export.h>
#include <asm/asm.h>

SYM_FUNC_START(xor_regs_2_)
	vsetvli	a3, a0, e8, m8, ta, ma
	vle8.v	v0, (a1)
	vle8.v	v8, (a2)
	sub	a0, a0, a3
	vxor.vv	v16, v0, v8
	add	a2, a2, a3
	vse8.v	v16, (a1)
	add	a1, a1, a3
	bnez	a0, xor_regs_2_
	ret
SYM_FUNC_END(xor_regs_2_)
EXPORT_SYMBOL(xor_regs_2_)

SYM_FUNC_START(xor_regs_3_)
	vsetvli	a4, a0, e8, m8, ta, ma
	vle8.v	v0, (a1)
	vle8.v	v8, (a2)
	sub	a0, a0, a4
	vxor.vv	v0, v0, v8
	vle8.v	v16, (a3)
	add	a2, a2, a4
	vxor.vv	v16, v0, v16
	add	a3, a3, a4
	vse8.v	v16, (a1)
	add	a1, a1, a4
	bnez	a0, xor_regs_3_
	ret
SYM_FUNC_END(xor_regs_3_)
EXPORT_SYMBOL(xor_regs_3_)

SYM_FUNC_START(xor_regs_4_)
	vsetvli	a5, a0, e8, m8, ta, ma
	vle8.v	v0, (a1)
	vle8.v	v8, (a2)
	sub	a0, a0, a5
	vxor.vv	v0, v0, v8
	vle8.v	v16, (a3)
	add	a2, a2, a5
	vxor.vv	v0, v0, v16
	vle8.v	v24, (a4)
	add	a3, a3, a5
	vxor.vv	v16, v0, v24
	add	a4, a4, a5
	vse8.v	v16, (a1)
	add	a1, a1, a5
	bnez	a0, xor_regs_4_
	ret
SYM_FUNC_END(xor_regs_4_)
EXPORT_SYMBOL(xor_regs_4_)

SYM_FUNC_START(xor_regs_5_)
	vsetvli	a6, a0, e8, m8, ta, ma
	vle8.v	v0, (a1)
	vle8.v	v8, (a2)
	sub	a0, a0, a6
	vxor.vv	v0, v0, v8
	vle8.v	v16, (a3)
	add	a2, a2, a6
	vxor.vv	v0, v0, v16
	vle8.v	v24, (a4)
	add	a3, a3, a6
	vxor.vv	v0, v0, v24
	vle8.v	v8, (a5)
	add	a4, a4, a6
	vxor.vv	v16, v0, v8
	add	a5, a5, a6
	vse8.v	v16, (a1)
	add	a1, a1, a6
	bnez	a0, xor_regs_5_
	ret
SYM_FUNC_END(xor_regs_5_)
EXPORT_SYMBOL(xor_regs_5_)
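These RVV routines are strip-mined: each pass, vsetvli asks the hardware how many elements it will process (returned in a3/a4/..., here up to 8 grouped registers of bytes), the vector ops run on that many lanes, and all pointers plus the remaining byte count advance by that amount. A hedged C rendering of the same control flow, with chunk() standing in for vsetvli's returned vector length (illustrative only):

	/* Hedged model of the strip-mining loop in xor_regs_2_. */
	static unsigned long chunk(unsigned long remaining);	/* hypothetical stub */

	static void xor2_stripmine(unsigned long bytes, unsigned char *p1,
				   const unsigned char *p2)
	{
		while (bytes) {
			unsigned long vl = chunk(bytes); /* hw-chosen lanes this pass */
			for (unsigned long i = 0; i < vl; i++)
				p1[i] ^= p2[i];		 /* vxor.vv over vl lanes */
			p1 += vl;
			p2 += vl;
			bytes -= vl;
		}
	}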
-21
arch/s390/include/asm/xor.h
/* SPDX-License-Identifier: GPL-2.0 */
/*
 * Optimized xor routines
 *
 * Copyright IBM Corp. 2016
 * Author(s): Martin Schwidefsky <schwidefsky@de.ibm.com>
 */
#ifndef _ASM_S390_XOR_H
#define _ASM_S390_XOR_H

extern struct xor_block_template xor_block_xc;

#undef XOR_TRY_TEMPLATES
#define XOR_TRY_TEMPLATES			\
	do {					\
		xor_speed(&xor_block_xc);	\
	} while (0)

#define XOR_SELECT_TEMPLATE(FASTEST)	(&xor_block_xc)

#endif /* _ASM_S390_XOR_H */
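Note that s390 defines XOR_SELECT_TEMPLATE to override whatever the boot-time benchmark measured: only one template exists and it is always chosen. A hedged restatement of that selection semantic as a function (s390_select is an illustrative name, not kernel API):

	/* Hedged sketch: the measured winner is discarded on s390. */
	static struct xor_block_template *
	s390_select(struct xor_block_template *fastest)
	{
		(void)fastest;		/* XOR_SELECT_TEMPLATE ignores it */
		return &xor_block_xc;	/* the XC template is used unconditionally */
	}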
+1 -1
arch/s390/lib/Makefile
 
 lib-y += delay.o string.o uaccess.o find.o spinlock.o tishift.o
 lib-y += csum-partial.o
-obj-y += mem.o xor.o
+obj-y += mem.o
 lib-$(CONFIG_KPROBES) += probes.o
 lib-$(CONFIG_UPROBES) += probes.o
 obj-$(CONFIG_S390_KPROBES_SANITY_TEST) += test_kprobes_s390.o
-136
arch/s390/lib/xor.c
// SPDX-License-Identifier: GPL-2.0
/*
 * Optimized xor_block operation for RAID4/5
 *
 * Copyright IBM Corp. 2016
 * Author(s): Martin Schwidefsky <schwidefsky@de.ibm.com>
 */

#include <linux/types.h>
#include <linux/export.h>
#include <linux/raid/xor.h>
#include <asm/xor.h>

static void xor_xc_2(unsigned long bytes, unsigned long * __restrict p1,
		     const unsigned long * __restrict p2)
{
	asm volatile(
		"	aghi	%0,-1\n"
		"	jm	3f\n"
		"	srlg	0,%0,8\n"
		"	ltgr	0,0\n"
		"	jz	1f\n"
		"0:	xc	0(256,%1),0(%2)\n"
		"	la	%1,256(%1)\n"
		"	la	%2,256(%2)\n"
		"	brctg	0,0b\n"
		"1:	exrl	%0,2f\n"
		"	j	3f\n"
		"2:	xc	0(1,%1),0(%2)\n"
		"3:"
		: "+a" (bytes), "+a" (p1), "+a" (p2)
		: : "0", "cc", "memory");
}

static void xor_xc_3(unsigned long bytes, unsigned long * __restrict p1,
		     const unsigned long * __restrict p2,
		     const unsigned long * __restrict p3)
{
	asm volatile(
		"	aghi	%0,-1\n"
		"	jm	4f\n"
		"	srlg	0,%0,8\n"
		"	ltgr	0,0\n"
		"	jz	1f\n"
		"0:	xc	0(256,%1),0(%2)\n"
		"	xc	0(256,%1),0(%3)\n"
		"	la	%1,256(%1)\n"
		"	la	%2,256(%2)\n"
		"	la	%3,256(%3)\n"
		"	brctg	0,0b\n"
		"1:	exrl	%0,2f\n"
		"	exrl	%0,3f\n"
		"	j	4f\n"
		"2:	xc	0(1,%1),0(%2)\n"
		"3:	xc	0(1,%1),0(%3)\n"
		"4:"
		: "+a" (bytes), "+a" (p1), "+a" (p2), "+a" (p3)
		: : "0", "cc", "memory");
}

static void xor_xc_4(unsigned long bytes, unsigned long * __restrict p1,
		     const unsigned long * __restrict p2,
		     const unsigned long * __restrict p3,
		     const unsigned long * __restrict p4)
{
	asm volatile(
		"	aghi	%0,-1\n"
		"	jm	5f\n"
		"	srlg	0,%0,8\n"
		"	ltgr	0,0\n"
		"	jz	1f\n"
		"0:	xc	0(256,%1),0(%2)\n"
		"	xc	0(256,%1),0(%3)\n"
		"	xc	0(256,%1),0(%4)\n"
		"	la	%1,256(%1)\n"
		"	la	%2,256(%2)\n"
		"	la	%3,256(%3)\n"
		"	la	%4,256(%4)\n"
		"	brctg	0,0b\n"
		"1:	exrl	%0,2f\n"
		"	exrl	%0,3f\n"
		"	exrl	%0,4f\n"
		"	j	5f\n"
		"2:	xc	0(1,%1),0(%2)\n"
		"3:	xc	0(1,%1),0(%3)\n"
		"4:	xc	0(1,%1),0(%4)\n"
		"5:"
		: "+a" (bytes), "+a" (p1), "+a" (p2), "+a" (p3), "+a" (p4)
		: : "0", "cc", "memory");
}

static void xor_xc_5(unsigned long bytes, unsigned long * __restrict p1,
		     const unsigned long * __restrict p2,
		     const unsigned long * __restrict p3,
		     const unsigned long * __restrict p4,
		     const unsigned long * __restrict p5)
{
	asm volatile(
		"	aghi	%0,-1\n"
		"	jm	6f\n"
		"	srlg	0,%0,8\n"
		"	ltgr	0,0\n"
		"	jz	1f\n"
		"0:	xc	0(256,%1),0(%2)\n"
		"	xc	0(256,%1),0(%3)\n"
		"	xc	0(256,%1),0(%4)\n"
		"	xc	0(256,%1),0(%5)\n"
		"	la	%1,256(%1)\n"
		"	la	%2,256(%2)\n"
		"	la	%3,256(%3)\n"
		"	la	%4,256(%4)\n"
		"	la	%5,256(%5)\n"
		"	brctg	0,0b\n"
		"1:	exrl	%0,2f\n"
		"	exrl	%0,3f\n"
		"	exrl	%0,4f\n"
		"	exrl	%0,5f\n"
		"	j	6f\n"
		"2:	xc	0(1,%1),0(%2)\n"
		"3:	xc	0(1,%1),0(%3)\n"
		"4:	xc	0(1,%1),0(%4)\n"
		"5:	xc	0(1,%1),0(%5)\n"
		"6:"
		: "+a" (bytes), "+a" (p1), "+a" (p2), "+a" (p3), "+a" (p4),
		  "+a" (p5)
		: : "0", "cc", "memory");
}

struct xor_block_template xor_block_xc = {
	.name = "xc",
	.do_2 = xor_xc_2,
	.do_3 = xor_xc_3,
	.do_4 = xor_xc_4,
	.do_5 = xor_xc_5,
};
EXPORT_SYMBOL(xor_block_xc);
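The asm here leans on two s390 quirks: the XC instruction encodes its length as (len - 1), handling up to 256 bytes per instruction, and EXRL executes a target instruction with the length field patched in at run time, which covers the non-multiple-of-256 tail without a byte loop. A hedged C model of the same chunking logic (a stand-in for reading the asm, not equivalent machine code):

	/* Hedged model of xor_xc_2's control flow. */
	static void xor_xc_model(unsigned long bytes, unsigned char *p1,
				 const unsigned char *p2)
	{
		if (!bytes)
			return;			/* "jm" taken for bytes == 0 */
		bytes--;			/* aghi %0,-1: lengths are biased */
		for (unsigned long i = bytes >> 8; i; i--) {	/* srlg 0,%0,8 */
			for (int j = 0; j < 256; j++)
				p1[j] ^= p2[j];	/* xc 0(256,%1),0(%2) */
			p1 += 256;
			p2 += 256;
		}
		/* exrl patches the tail length (bytes & 0xff) into the xc */
		for (unsigned long j = 0; j <= (bytes & 0xff); j++)
			p1[j] ^= p2[j];
	}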
-1
arch/sparc/include/asm/asm-prototypes.h
 #include <asm/oplib.h>
 #include <asm/pgtable.h>
 #include <asm/trap_block.h>
-#include <asm/xor.h>
 
 void *__memscan_zero(void *, size_t);
 void *__memscan_generic(void *, int, size_t);
-9
arch/sparc/include/asm/xor.h
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef ___ASM_SPARC_XOR_H
#define ___ASM_SPARC_XOR_H
#if defined(__sparc__) && defined(__arch64__)
#include <asm/xor_64.h>
#else
#include <asm/xor_32.h>
#endif
#endif
-268
arch/sparc/include/asm/xor_32.h
/* SPDX-License-Identifier: GPL-2.0-or-later */
/*
 * include/asm/xor.h
 *
 * Optimized RAID-5 checksumming functions for 32-bit Sparc.
 */

/*
 * High speed xor_block operation for RAID4/5 utilizing the
 * ldd/std SPARC instructions.
 *
 * Copyright (C) 1999 Jakub Jelinek (jj@ultra.linux.cz)
 */

static void
sparc_2(unsigned long bytes, unsigned long * __restrict p1,
	const unsigned long * __restrict p2)
{
	int lines = bytes / (sizeof (long)) / 8;

	do {
		__asm__ __volatile__(
		"ldd	[%0 + 0x00], %%g2\n\t"
		"ldd	[%0 + 0x08], %%g4\n\t"
		"ldd	[%0 + 0x10], %%o0\n\t"
		"ldd	[%0 + 0x18], %%o2\n\t"
		"ldd	[%1 + 0x00], %%o4\n\t"
		"ldd	[%1 + 0x08], %%l0\n\t"
		"ldd	[%1 + 0x10], %%l2\n\t"
		"ldd	[%1 + 0x18], %%l4\n\t"
		"xor	%%g2, %%o4, %%g2\n\t"
		"xor	%%g3, %%o5, %%g3\n\t"
		"xor	%%g4, %%l0, %%g4\n\t"
		"xor	%%g5, %%l1, %%g5\n\t"
		"xor	%%o0, %%l2, %%o0\n\t"
		"xor	%%o1, %%l3, %%o1\n\t"
		"xor	%%o2, %%l4, %%o2\n\t"
		"xor	%%o3, %%l5, %%o3\n\t"
		"std	%%g2, [%0 + 0x00]\n\t"
		"std	%%g4, [%0 + 0x08]\n\t"
		"std	%%o0, [%0 + 0x10]\n\t"
		"std	%%o2, [%0 + 0x18]\n"
		:
		: "r" (p1), "r" (p2)
		: "g2", "g3", "g4", "g5",
		  "o0", "o1", "o2", "o3", "o4", "o5",
		  "l0", "l1", "l2", "l3", "l4", "l5");
		p1 += 8;
		p2 += 8;
	} while (--lines > 0);
}

static void
sparc_3(unsigned long bytes, unsigned long * __restrict p1,
	const unsigned long * __restrict p2,
	const unsigned long * __restrict p3)
{
	int lines = bytes / (sizeof (long)) / 8;

	do {
		__asm__ __volatile__(
		"ldd	[%0 + 0x00], %%g2\n\t"
		"ldd	[%0 + 0x08], %%g4\n\t"
		"ldd	[%0 + 0x10], %%o0\n\t"
		"ldd	[%0 + 0x18], %%o2\n\t"
		"ldd	[%1 + 0x00], %%o4\n\t"
		"ldd	[%1 + 0x08], %%l0\n\t"
		"ldd	[%1 + 0x10], %%l2\n\t"
		"ldd	[%1 + 0x18], %%l4\n\t"
		"xor	%%g2, %%o4, %%g2\n\t"
		"xor	%%g3, %%o5, %%g3\n\t"
		"ldd	[%2 + 0x00], %%o4\n\t"
		"xor	%%g4, %%l0, %%g4\n\t"
		"xor	%%g5, %%l1, %%g5\n\t"
		"ldd	[%2 + 0x08], %%l0\n\t"
		"xor	%%o0, %%l2, %%o0\n\t"
		"xor	%%o1, %%l3, %%o1\n\t"
		"ldd	[%2 + 0x10], %%l2\n\t"
		"xor	%%o2, %%l4, %%o2\n\t"
		"xor	%%o3, %%l5, %%o3\n\t"
		"ldd	[%2 + 0x18], %%l4\n\t"
		"xor	%%g2, %%o4, %%g2\n\t"
		"xor	%%g3, %%o5, %%g3\n\t"
		"xor	%%g4, %%l0, %%g4\n\t"
		"xor	%%g5, %%l1, %%g5\n\t"
		"xor	%%o0, %%l2, %%o0\n\t"
		"xor	%%o1, %%l3, %%o1\n\t"
		"xor	%%o2, %%l4, %%o2\n\t"
		"xor	%%o3, %%l5, %%o3\n\t"
		"std	%%g2, [%0 + 0x00]\n\t"
		"std	%%g4, [%0 + 0x08]\n\t"
		"std	%%o0, [%0 + 0x10]\n\t"
		"std	%%o2, [%0 + 0x18]\n"
		:
		: "r" (p1), "r" (p2), "r" (p3)
		: "g2", "g3", "g4", "g5",
		  "o0", "o1", "o2", "o3", "o4", "o5",
		  "l0", "l1", "l2", "l3", "l4", "l5");
		p1 += 8;
		p2 += 8;
		p3 += 8;
	} while (--lines > 0);
}

static void
sparc_4(unsigned long bytes, unsigned long * __restrict p1,
	const unsigned long * __restrict p2,
	const unsigned long * __restrict p3,
	const unsigned long * __restrict p4)
{
	int lines = bytes / (sizeof (long)) / 8;

	do {
		__asm__ __volatile__(
		"ldd	[%0 + 0x00], %%g2\n\t"
		"ldd	[%0 + 0x08], %%g4\n\t"
		"ldd	[%0 + 0x10], %%o0\n\t"
		"ldd	[%0 + 0x18], %%o2\n\t"
		"ldd	[%1 + 0x00], %%o4\n\t"
		"ldd	[%1 + 0x08], %%l0\n\t"
		"ldd	[%1 + 0x10], %%l2\n\t"
		"ldd	[%1 + 0x18], %%l4\n\t"
		"xor	%%g2, %%o4, %%g2\n\t"
		"xor	%%g3, %%o5, %%g3\n\t"
		"ldd	[%2 + 0x00], %%o4\n\t"
		"xor	%%g4, %%l0, %%g4\n\t"
		"xor	%%g5, %%l1, %%g5\n\t"
		"ldd	[%2 + 0x08], %%l0\n\t"
		"xor	%%o0, %%l2, %%o0\n\t"
		"xor	%%o1, %%l3, %%o1\n\t"
		"ldd	[%2 + 0x10], %%l2\n\t"
		"xor	%%o2, %%l4, %%o2\n\t"
		"xor	%%o3, %%l5, %%o3\n\t"
		"ldd	[%2 + 0x18], %%l4\n\t"
		"xor	%%g2, %%o4, %%g2\n\t"
		"xor	%%g3, %%o5, %%g3\n\t"
		"ldd	[%3 + 0x00], %%o4\n\t"
		"xor	%%g4, %%l0, %%g4\n\t"
		"xor	%%g5, %%l1, %%g5\n\t"
		"ldd	[%3 + 0x08], %%l0\n\t"
		"xor	%%o0, %%l2, %%o0\n\t"
		"xor	%%o1, %%l3, %%o1\n\t"
		"ldd	[%3 + 0x10], %%l2\n\t"
		"xor	%%o2, %%l4, %%o2\n\t"
		"xor	%%o3, %%l5, %%o3\n\t"
		"ldd	[%3 + 0x18], %%l4\n\t"
		"xor	%%g2, %%o4, %%g2\n\t"
		"xor	%%g3, %%o5, %%g3\n\t"
		"xor	%%g4, %%l0, %%g4\n\t"
		"xor	%%g5, %%l1, %%g5\n\t"
		"xor	%%o0, %%l2, %%o0\n\t"
		"xor	%%o1, %%l3, %%o1\n\t"
		"xor	%%o2, %%l4, %%o2\n\t"
		"xor	%%o3, %%l5, %%o3\n\t"
		"std	%%g2, [%0 + 0x00]\n\t"
		"std	%%g4, [%0 + 0x08]\n\t"
		"std	%%o0, [%0 + 0x10]\n\t"
		"std	%%o2, [%0 + 0x18]\n"
		:
		: "r" (p1), "r" (p2), "r" (p3), "r" (p4)
		: "g2", "g3", "g4", "g5",
		  "o0", "o1", "o2", "o3", "o4", "o5",
		  "l0", "l1", "l2", "l3", "l4", "l5");
		p1 += 8;
		p2 += 8;
		p3 += 8;
		p4 += 8;
	} while (--lines > 0);
}

static void
sparc_5(unsigned long bytes, unsigned long * __restrict p1,
	const unsigned long * __restrict p2,
	const unsigned long * __restrict p3,
	const unsigned long * __restrict p4,
	const unsigned long * __restrict p5)
{
	int lines = bytes / (sizeof (long)) / 8;

	do {
		__asm__ __volatile__(
		"ldd	[%0 + 0x00], %%g2\n\t"
		"ldd	[%0 + 0x08], %%g4\n\t"
		"ldd	[%0 + 0x10], %%o0\n\t"
		"ldd	[%0 + 0x18], %%o2\n\t"
		"ldd	[%1 + 0x00], %%o4\n\t"
		"ldd	[%1 + 0x08], %%l0\n\t"
		"ldd	[%1 + 0x10], %%l2\n\t"
		"ldd	[%1 + 0x18], %%l4\n\t"
		"xor	%%g2, %%o4, %%g2\n\t"
		"xor	%%g3, %%o5, %%g3\n\t"
		"ldd	[%2 + 0x00], %%o4\n\t"
		"xor	%%g4, %%l0, %%g4\n\t"
		"xor	%%g5, %%l1, %%g5\n\t"
		"ldd	[%2 + 0x08], %%l0\n\t"
		"xor	%%o0, %%l2, %%o0\n\t"
		"xor	%%o1, %%l3, %%o1\n\t"
		"ldd	[%2 + 0x10], %%l2\n\t"
		"xor	%%o2, %%l4, %%o2\n\t"
		"xor	%%o3, %%l5, %%o3\n\t"
		"ldd	[%2 + 0x18], %%l4\n\t"
		"xor	%%g2, %%o4, %%g2\n\t"
		"xor	%%g3, %%o5, %%g3\n\t"
		"ldd	[%3 + 0x00], %%o4\n\t"
		"xor	%%g4, %%l0, %%g4\n\t"
		"xor	%%g5, %%l1, %%g5\n\t"
		"ldd	[%3 + 0x08], %%l0\n\t"
		"xor	%%o0, %%l2, %%o0\n\t"
		"xor	%%o1, %%l3, %%o1\n\t"
		"ldd	[%3 + 0x10], %%l2\n\t"
		"xor	%%o2, %%l4, %%o2\n\t"
		"xor	%%o3, %%l5, %%o3\n\t"
		"ldd	[%3 + 0x18], %%l4\n\t"
		"xor	%%g2, %%o4, %%g2\n\t"
		"xor	%%g3, %%o5, %%g3\n\t"
		"ldd	[%4 + 0x00], %%o4\n\t"
		"xor	%%g4, %%l0, %%g4\n\t"
		"xor	%%g5, %%l1, %%g5\n\t"
		"ldd	[%4 + 0x08], %%l0\n\t"
		"xor	%%o0, %%l2, %%o0\n\t"
		"xor	%%o1, %%l3, %%o1\n\t"
		"ldd	[%4 + 0x10], %%l2\n\t"
		"xor	%%o2, %%l4, %%o2\n\t"
		"xor	%%o3, %%l5, %%o3\n\t"
		"ldd	[%4 + 0x18], %%l4\n\t"
		"xor	%%g2, %%o4, %%g2\n\t"
		"xor	%%g3, %%o5, %%g3\n\t"
		"xor	%%g4, %%l0, %%g4\n\t"
		"xor	%%g5, %%l1, %%g5\n\t"
		"xor	%%o0, %%l2, %%o0\n\t"
		"xor	%%o1, %%l3, %%o1\n\t"
		"xor	%%o2, %%l4, %%o2\n\t"
		"xor	%%o3, %%l5, %%o3\n\t"
		"std	%%g2, [%0 + 0x00]\n\t"
		"std	%%g4, [%0 + 0x08]\n\t"
		"std	%%o0, [%0 + 0x10]\n\t"
		"std	%%o2, [%0 + 0x18]\n"
		:
		: "r" (p1), "r" (p2), "r" (p3), "r" (p4), "r" (p5)
		: "g2", "g3", "g4", "g5",
		  "o0", "o1", "o2", "o3", "o4", "o5",
		  "l0", "l1", "l2", "l3", "l4", "l5");
		p1 += 8;
		p2 += 8;
		p3 += 8;
		p4 += 8;
		p5 += 8;
	} while (--lines > 0);
}

static struct xor_block_template xor_block_SPARC = {
	.name = "SPARC",
	.do_2 = sparc_2,
	.do_3 = sparc_3,
	.do_4 = sparc_4,
	.do_5 = sparc_5,
};

/* For grins, also test the generic routines. */
#include <asm-generic/xor.h>

#undef XOR_TRY_TEMPLATES
#define XOR_TRY_TEMPLATES			\
	do {					\
		xor_speed(&xor_block_8regs);	\
		xor_speed(&xor_block_32regs);	\
		xor_speed(&xor_block_SPARC);	\
	} while (0)
-79
arch/sparc/include/asm/xor_64.h
/* SPDX-License-Identifier: GPL-2.0-or-later */
/*
 * include/asm/xor.h
 *
 * High speed xor_block operation for RAID4/5 utilizing the
 * UltraSparc Visual Instruction Set and Niagara block-init
 * twin-load instructions.
 *
 * Copyright (C) 1997, 1999 Jakub Jelinek (jj@ultra.linux.cz)
 * Copyright (C) 2006 David S. Miller <davem@davemloft.net>
 */

#include <asm/spitfire.h>

void xor_vis_2(unsigned long bytes, unsigned long * __restrict p1,
	       const unsigned long * __restrict p2);
void xor_vis_3(unsigned long bytes, unsigned long * __restrict p1,
	       const unsigned long * __restrict p2,
	       const unsigned long * __restrict p3);
void xor_vis_4(unsigned long bytes, unsigned long * __restrict p1,
	       const unsigned long * __restrict p2,
	       const unsigned long * __restrict p3,
	       const unsigned long * __restrict p4);
void xor_vis_5(unsigned long bytes, unsigned long * __restrict p1,
	       const unsigned long * __restrict p2,
	       const unsigned long * __restrict p3,
	       const unsigned long * __restrict p4,
	       const unsigned long * __restrict p5);

/* XXX Ugh, write cheetah versions... -DaveM */

static struct xor_block_template xor_block_VIS = {
	.name = "VIS",
	.do_2 = xor_vis_2,
	.do_3 = xor_vis_3,
	.do_4 = xor_vis_4,
	.do_5 = xor_vis_5,
};

void xor_niagara_2(unsigned long bytes, unsigned long * __restrict p1,
		   const unsigned long * __restrict p2);
void xor_niagara_3(unsigned long bytes, unsigned long * __restrict p1,
		   const unsigned long * __restrict p2,
		   const unsigned long * __restrict p3);
void xor_niagara_4(unsigned long bytes, unsigned long * __restrict p1,
		   const unsigned long * __restrict p2,
		   const unsigned long * __restrict p3,
		   const unsigned long * __restrict p4);
void xor_niagara_5(unsigned long bytes, unsigned long * __restrict p1,
		   const unsigned long * __restrict p2,
		   const unsigned long * __restrict p3,
		   const unsigned long * __restrict p4,
		   const unsigned long * __restrict p5);

static struct xor_block_template xor_block_niagara = {
	.name = "Niagara",
	.do_2 = xor_niagara_2,
	.do_3 = xor_niagara_3,
	.do_4 = xor_niagara_4,
	.do_5 = xor_niagara_5,
};

#undef XOR_TRY_TEMPLATES
#define XOR_TRY_TEMPLATES			\
	do {					\
		xor_speed(&xor_block_VIS);	\
		xor_speed(&xor_block_niagara);	\
	} while (0)

/* For VIS for everything except Niagara. */
#define XOR_SELECT_TEMPLATE(FASTEST)				\
	((tlb_type == hypervisor &&				\
	  (sun4v_chip_type == SUN4V_CHIP_NIAGARA1 ||		\
	   sun4v_chip_type == SUN4V_CHIP_NIAGARA2 ||		\
	   sun4v_chip_type == SUN4V_CHIP_NIAGARA3 ||		\
	   sun4v_chip_type == SUN4V_CHIP_NIAGARA4 ||		\
	   sun4v_chip_type == SUN4V_CHIP_NIAGARA5)) ?		\
	 &xor_block_niagara :					\
	 &xor_block_VIS)
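Unlike most arches, sparc64 overrides the benchmark with a chip-type check: Niagara-class sun4v CPUs always get the twin-load template, everything else gets VIS. A hedged restatement of that decision as a function (sparc64_select and is_niagara are illustrative names, not kernel API):

	/* Hedged sketch of XOR_SELECT_TEMPLATE's effect on sparc64. */
	static int is_niagara(void);	/* hypothetical stand-in for the chip check */

	static struct xor_block_template *
	sparc64_select(struct xor_block_template *fastest)
	{
		(void)fastest;		/* benchmark result is overridden */
		return is_niagara() ? &xor_block_niagara : &xor_block_VIS;
	}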
+1 -1
arch/sparc/lib/Makefile
 lib-$(CONFIG_SPARC64) += GENpatch.o GENpage.o GENbzero.o
 
 lib-$(CONFIG_SPARC64) += copy_in_user.o memmove.o
-lib-$(CONFIG_SPARC64) += mcount.o ipcsum.o xor.o hweight.o ffs.o
+lib-$(CONFIG_SPARC64) += mcount.o ipcsum.o hweight.o ffs.o
 
 obj-$(CONFIG_SPARC64) += iomap.o
 obj-$(CONFIG_SPARC32) += atomic32.o
-646
arch/sparc/lib/xor.S
/* SPDX-License-Identifier: GPL-2.0 */
/*
 * arch/sparc64/lib/xor.S
 *
 * High speed xor_block operation for RAID4/5 utilizing the
 * UltraSparc Visual Instruction Set and Niagara store-init/twin-load.
 *
 * Copyright (C) 1997, 1999 Jakub Jelinek (jj@ultra.linux.cz)
 * Copyright (C) 2006 David S. Miller <davem@davemloft.net>
 */

#include <linux/export.h>
#include <linux/linkage.h>
#include <asm/visasm.h>
#include <asm/asi.h>
#include <asm/dcu.h>
#include <asm/spitfire.h>

/*
 * Requirements:
 * !(((long)dest | (long)sourceN) & (64 - 1)) &&
 * !(len & 127) && len >= 256
 */
	.text

	/* VIS versions. */
ENTRY(xor_vis_2)
	rd	%fprs, %o5
	andcc	%o5, FPRS_FEF|FPRS_DU, %g0
	be,pt	%icc, 0f
	 sethi	%hi(VISenter), %g1
	jmpl	%g1 + %lo(VISenter), %g7
	 add	%g7, 8, %g7
0:	wr	%g0, FPRS_FEF, %fprs
	rd	%asi, %g1
	wr	%g0, ASI_BLK_P, %asi
	membar	#LoadStore|#StoreLoad|#StoreStore
	sub	%o0, 128, %o0
	ldda	[%o1] %asi, %f0
	ldda	[%o2] %asi, %f16

2:	ldda	[%o1 + 64] %asi, %f32
	fxor	%f0, %f16, %f16
	fxor	%f2, %f18, %f18
	fxor	%f4, %f20, %f20
	fxor	%f6, %f22, %f22
	fxor	%f8, %f24, %f24
	fxor	%f10, %f26, %f26
	fxor	%f12, %f28, %f28
	fxor	%f14, %f30, %f30
	stda	%f16, [%o1] %asi
	ldda	[%o2 + 64] %asi, %f48
	ldda	[%o1 + 128] %asi, %f0
	fxor	%f32, %f48, %f48
	fxor	%f34, %f50, %f50
	add	%o1, 128, %o1
	fxor	%f36, %f52, %f52
	add	%o2, 128, %o2
	fxor	%f38, %f54, %f54
	subcc	%o0, 128, %o0
	fxor	%f40, %f56, %f56
	fxor	%f42, %f58, %f58
	fxor	%f44, %f60, %f60
	fxor	%f46, %f62, %f62
	stda	%f48, [%o1 - 64] %asi
	bne,pt	%xcc, 2b
	 ldda	[%o2] %asi, %f16

	ldda	[%o1 + 64] %asi, %f32
	fxor	%f0, %f16, %f16
	fxor	%f2, %f18, %f18
	fxor	%f4, %f20, %f20
	fxor	%f6, %f22, %f22
	fxor	%f8, %f24, %f24
	fxor	%f10, %f26, %f26
	fxor	%f12, %f28, %f28
	fxor	%f14, %f30, %f30
	stda	%f16, [%o1] %asi
	ldda	[%o2 + 64] %asi, %f48
	membar	#Sync
	fxor	%f32, %f48, %f48
	fxor	%f34, %f50, %f50
	fxor	%f36, %f52, %f52
	fxor	%f38, %f54, %f54
	fxor	%f40, %f56, %f56
	fxor	%f42, %f58, %f58
	fxor	%f44, %f60, %f60
	fxor	%f46, %f62, %f62
	stda	%f48, [%o1 + 64] %asi
	membar	#Sync|#StoreStore|#StoreLoad
	wr	%g1, %g0, %asi
	retl
	 wr	%g0, 0, %fprs
ENDPROC(xor_vis_2)
EXPORT_SYMBOL(xor_vis_2)

ENTRY(xor_vis_3)
	rd	%fprs, %o5
	andcc	%o5, FPRS_FEF|FPRS_DU, %g0
	be,pt	%icc, 0f
	 sethi	%hi(VISenter), %g1
	jmpl	%g1 + %lo(VISenter), %g7
	 add	%g7, 8, %g7
0:	wr	%g0, FPRS_FEF, %fprs
	rd	%asi, %g1
	wr	%g0, ASI_BLK_P, %asi
	membar	#LoadStore|#StoreLoad|#StoreStore
	sub	%o0, 64, %o0
	ldda	[%o1] %asi, %f0
	ldda	[%o2] %asi, %f16

3:	ldda	[%o3] %asi, %f32
	fxor	%f0, %f16, %f48
	fxor	%f2, %f18, %f50
	add	%o1, 64, %o1
	fxor	%f4, %f20, %f52
	fxor	%f6, %f22, %f54
	add	%o2, 64, %o2
	fxor	%f8, %f24, %f56
	fxor	%f10, %f26, %f58
	fxor	%f12, %f28, %f60
	fxor	%f14, %f30, %f62
	ldda	[%o1] %asi, %f0
	fxor	%f48, %f32, %f48
	fxor	%f50, %f34, %f50
	fxor	%f52, %f36, %f52
	fxor	%f54, %f38, %f54
	add	%o3, 64, %o3
	fxor	%f56, %f40, %f56
	fxor	%f58, %f42, %f58
	subcc	%o0, 64, %o0
	fxor	%f60, %f44, %f60
	fxor	%f62, %f46, %f62
	stda	%f48, [%o1 - 64] %asi
	bne,pt	%xcc, 3b
	 ldda	[%o2] %asi, %f16

	ldda	[%o3] %asi, %f32
	fxor	%f0, %f16, %f48
	fxor	%f2, %f18, %f50
	fxor	%f4, %f20, %f52
	fxor	%f6, %f22, %f54
	fxor	%f8, %f24, %f56
	fxor	%f10, %f26, %f58
	fxor	%f12, %f28, %f60
	fxor	%f14, %f30, %f62
	membar	#Sync
	fxor	%f48, %f32, %f48
	fxor	%f50, %f34, %f50
	fxor	%f52, %f36, %f52
	fxor	%f54, %f38, %f54
	fxor	%f56, %f40, %f56
	fxor	%f58, %f42, %f58
	fxor	%f60, %f44, %f60
	fxor	%f62, %f46, %f62
	stda	%f48, [%o1] %asi
	membar	#Sync|#StoreStore|#StoreLoad
	wr	%g1, %g0, %asi
	retl
	 wr	%g0, 0, %fprs
ENDPROC(xor_vis_3)
EXPORT_SYMBOL(xor_vis_3)

ENTRY(xor_vis_4)
	rd	%fprs, %o5
	andcc	%o5, FPRS_FEF|FPRS_DU, %g0
	be,pt	%icc, 0f
	 sethi	%hi(VISenter), %g1
	jmpl	%g1 + %lo(VISenter), %g7
	 add	%g7, 8, %g7
0:	wr	%g0, FPRS_FEF, %fprs
	rd	%asi, %g1
	wr	%g0, ASI_BLK_P, %asi
	membar	#LoadStore|#StoreLoad|#StoreStore
	sub	%o0, 64, %o0
	ldda	[%o1] %asi, %f0
	ldda	[%o2] %asi, %f16

4:	ldda	[%o3] %asi, %f32
	fxor	%f0, %f16, %f16
	fxor	%f2, %f18, %f18
	add	%o1, 64, %o1
	fxor	%f4, %f20, %f20
	fxor	%f6, %f22, %f22
	add	%o2, 64, %o2
	fxor	%f8, %f24, %f24
	fxor	%f10, %f26, %f26
	fxor	%f12, %f28, %f28
	fxor	%f14, %f30, %f30
	ldda	[%o4] %asi, %f48
	fxor	%f16, %f32, %f32
	fxor	%f18, %f34, %f34
	fxor	%f20, %f36, %f36
	fxor	%f22, %f38, %f38
	add	%o3, 64, %o3
	fxor	%f24, %f40, %f40
	fxor	%f26, %f42, %f42
	fxor	%f28, %f44, %f44
	fxor	%f30, %f46, %f46
	ldda	[%o1] %asi, %f0
	fxor	%f32, %f48, %f48
	fxor	%f34, %f50, %f50
	fxor	%f36, %f52, %f52
	add	%o4, 64, %o4
	fxor	%f38, %f54, %f54
	fxor	%f40, %f56, %f56
	fxor	%f42, %f58, %f58
	subcc	%o0, 64, %o0
	fxor	%f44, %f60, %f60
	fxor	%f46, %f62, %f62
	stda	%f48, [%o1 - 64] %asi
	bne,pt	%xcc, 4b
	 ldda	[%o2] %asi, %f16

	ldda	[%o3] %asi, %f32
	fxor	%f0, %f16, %f16
	fxor	%f2, %f18, %f18
	fxor	%f4, %f20, %f20
	fxor	%f6, %f22, %f22
	fxor	%f8, %f24, %f24
	fxor	%f10, %f26, %f26
	fxor	%f12, %f28, %f28
	fxor	%f14, %f30, %f30
	ldda	[%o4] %asi, %f48
	fxor	%f16, %f32, %f32
	fxor	%f18, %f34, %f34
	fxor	%f20, %f36, %f36
	fxor	%f22, %f38, %f38
	fxor	%f24, %f40, %f40
	fxor	%f26, %f42, %f42
	fxor	%f28, %f44, %f44
	fxor	%f30, %f46, %f46
	membar	#Sync
	fxor	%f32, %f48, %f48
	fxor	%f34, %f50, %f50
	fxor	%f36, %f52, %f52
	fxor	%f38, %f54, %f54
	fxor	%f40, %f56, %f56
	fxor	%f42, %f58, %f58
	fxor	%f44, %f60, %f60
	fxor	%f46, %f62, %f62
	stda	%f48, [%o1] %asi
	membar	#Sync|#StoreStore|#StoreLoad
	wr	%g1, %g0, %asi
	retl
	 wr	%g0, 0, %fprs
ENDPROC(xor_vis_4)
EXPORT_SYMBOL(xor_vis_4)

ENTRY(xor_vis_5)
	save	%sp, -192, %sp
	rd	%fprs, %o5
	andcc	%o5, FPRS_FEF|FPRS_DU, %g0
	be,pt	%icc, 0f
	 sethi	%hi(VISenter), %g1
	jmpl	%g1 + %lo(VISenter), %g7
	 add	%g7, 8, %g7
0:	wr	%g0, FPRS_FEF, %fprs
	rd	%asi, %g1
	wr	%g0, ASI_BLK_P, %asi
	membar	#LoadStore|#StoreLoad|#StoreStore
	sub	%i0, 64, %i0
	ldda	[%i1] %asi, %f0
	ldda	[%i2] %asi, %f16

5:	ldda	[%i3] %asi, %f32
	fxor	%f0, %f16, %f48
	fxor	%f2, %f18, %f50
	add	%i1, 64, %i1
	fxor	%f4, %f20, %f52
	fxor	%f6, %f22, %f54
	add	%i2, 64, %i2
	fxor	%f8, %f24, %f56
	fxor	%f10, %f26, %f58
	fxor	%f12, %f28, %f60
	fxor	%f14, %f30, %f62
	ldda	[%i4] %asi, %f16
	fxor	%f48, %f32, %f48
	fxor	%f50, %f34, %f50
	fxor	%f52, %f36, %f52
	fxor	%f54, %f38, %f54
	add	%i3, 64, %i3
	fxor	%f56, %f40, %f56
	fxor	%f58, %f42, %f58
	fxor	%f60, %f44, %f60
	fxor	%f62, %f46, %f62
	ldda	[%i5] %asi, %f32
	fxor	%f48, %f16, %f48
	fxor	%f50, %f18, %f50
	add	%i4, 64, %i4
	fxor	%f52, %f20, %f52
	fxor	%f54, %f22, %f54
	add	%i5, 64, %i5
	fxor	%f56, %f24, %f56
	fxor	%f58, %f26, %f58
	fxor	%f60, %f28, %f60
	fxor	%f62, %f30, %f62
	ldda	[%i1] %asi, %f0
	fxor	%f48, %f32, %f48
	fxor	%f50, %f34, %f50
	fxor	%f52, %f36, %f52
	fxor	%f54, %f38, %f54
	fxor	%f56, %f40, %f56
	fxor	%f58, %f42, %f58
	subcc	%i0, 64, %i0
	fxor	%f60, %f44, %f60
	fxor	%f62, %f46, %f62
	stda	%f48, [%i1 - 64] %asi
	bne,pt	%xcc, 5b
	 ldda	[%i2] %asi, %f16

	ldda	[%i3] %asi, %f32
	fxor	%f0, %f16, %f48
	fxor	%f2, %f18, %f50
	fxor	%f4, %f20, %f52
	fxor	%f6, %f22, %f54
	fxor	%f8, %f24, %f56
	fxor	%f10, %f26, %f58
	fxor	%f12, %f28, %f60
	fxor	%f14, %f30, %f62
	ldda	[%i4] %asi, %f16
	fxor	%f48, %f32, %f48
	fxor	%f50, %f34, %f50
	fxor	%f52, %f36, %f52
	fxor	%f54, %f38, %f54
	fxor	%f56, %f40, %f56
	fxor	%f58, %f42, %f58
	fxor	%f60, %f44, %f60
	fxor	%f62, %f46, %f62
	ldda	[%i5] %asi, %f32
	fxor	%f48, %f16, %f48
	fxor	%f50, %f18, %f50
	fxor	%f52, %f20, %f52
	fxor	%f54, %f22, %f54
	fxor	%f56, %f24, %f56
	fxor	%f58, %f26, %f58
	fxor	%f60, %f28, %f60
	fxor	%f62, %f30, %f62
	membar	#Sync
	fxor	%f48, %f32, %f48
	fxor	%f50, %f34, %f50
	fxor	%f52, %f36, %f52
	fxor	%f54, %f38, %f54
	fxor	%f56, %f40, %f56
	fxor	%f58, %f42, %f58
	fxor	%f60, %f44, %f60
	fxor	%f62, %f46, %f62
	stda	%f48, [%i1] %asi
	membar	#Sync|#StoreStore|#StoreLoad
	wr	%g1, %g0, %asi
	wr	%g0, 0, %fprs
	ret
	 restore
ENDPROC(xor_vis_5)
EXPORT_SYMBOL(xor_vis_5)

	/* Niagara versions. */
ENTRY(xor_niagara_2) /* %o0=bytes, %o1=dest, %o2=src */
	save	%sp, -192, %sp
	prefetch	[%i1], #n_writes
	prefetch	[%i2], #one_read
	rd	%asi, %g7
	wr	%g0, ASI_BLK_INIT_QUAD_LDD_P, %asi
	srlx	%i0, 6, %g1
	mov	%i1, %i0
	mov	%i2, %i1
1:	ldda	[%i1 + 0x00] %asi, %i2	/* %i2/%i3 = src  + 0x00 */
	ldda	[%i1 + 0x10] %asi, %i4	/* %i4/%i5 = src  + 0x10 */
	ldda	[%i1 + 0x20] %asi, %g2	/* %g2/%g3 = src  + 0x20 */
	ldda	[%i1 + 0x30] %asi, %l0	/* %l0/%l1 = src  + 0x30 */
	prefetch	[%i1 + 0x40], #one_read
	ldda	[%i0 + 0x00] %asi, %o0	/* %o0/%o1 = dest + 0x00 */
	ldda	[%i0 + 0x10] %asi, %o2	/* %o2/%o3 = dest + 0x10 */
	ldda	[%i0 + 0x20] %asi, %o4	/* %o4/%o5 = dest + 0x20 */
	ldda	[%i0 + 0x30] %asi, %l2	/* %l2/%l3 = dest + 0x30 */
	prefetch	[%i0 + 0x40], #n_writes
	xor	%o0, %i2, %o0
	xor	%o1, %i3, %o1
	stxa	%o0, [%i0 + 0x00] %asi
	stxa	%o1, [%i0 + 0x08] %asi
	xor	%o2, %i4, %o2
	xor	%o3, %i5, %o3
	stxa	%o2, [%i0 + 0x10] %asi
	stxa	%o3, [%i0 + 0x18] %asi
	xor	%o4, %g2, %o4
	xor	%o5, %g3, %o5
	stxa	%o4, [%i0 + 0x20] %asi
	stxa	%o5, [%i0 + 0x28] %asi
	xor	%l2, %l0, %l2
	xor	%l3, %l1, %l3
	stxa	%l2, [%i0 + 0x30] %asi
	stxa	%l3, [%i0 + 0x38] %asi
	add	%i0, 0x40, %i0
	subcc	%g1, 1, %g1
	bne,pt	%xcc, 1b
	 add	%i1, 0x40, %i1
	membar	#Sync
	wr	%g7, 0x0, %asi
	ret
	 restore
ENDPROC(xor_niagara_2)
EXPORT_SYMBOL(xor_niagara_2)

ENTRY(xor_niagara_3) /* %o0=bytes, %o1=dest, %o2=src1, %o3=src2 */
	save	%sp, -192, %sp
	prefetch	[%i1], #n_writes
	prefetch	[%i2], #one_read
	prefetch	[%i3], #one_read
	rd	%asi, %g7
	wr	%g0, ASI_BLK_INIT_QUAD_LDD_P, %asi
	srlx	%i0, 6, %g1
	mov	%i1, %i0
	mov	%i2, %i1
	mov	%i3, %l7
1:	ldda	[%i1 + 0x00] %asi, %i2	/* %i2/%i3 = src1 + 0x00 */
	ldda	[%i1 + 0x10] %asi, %i4	/* %i4/%i5 = src1 + 0x10 */
	ldda	[%l7 + 0x00] %asi, %g2	/* %g2/%g3 = src2 + 0x00 */
	ldda	[%l7 + 0x10] %asi, %l0	/* %l0/%l1 = src2 + 0x10 */
	ldda	[%i0 + 0x00] %asi, %o0	/* %o0/%o1 = dest + 0x00 */
	ldda	[%i0 + 0x10] %asi, %o2	/* %o2/%o3 = dest + 0x10 */
	xor	%g2, %i2, %g2
	xor	%g3, %i3, %g3
	xor	%o0, %g2, %o0
	xor	%o1, %g3, %o1
	stxa	%o0, [%i0 + 0x00] %asi
	stxa	%o1, [%i0 + 0x08] %asi
	ldda	[%i1 + 0x20] %asi, %i2	/* %i2/%i3 = src1 + 0x20 */
	ldda	[%l7 + 0x20] %asi, %g2	/* %g2/%g3 = src2 + 0x20 */
	ldda	[%i0 + 0x20] %asi, %o0	/* %o0/%o1 = dest + 0x20 */
	xor	%l0, %i4, %l0
	xor	%l1, %i5, %l1
	xor	%o2, %l0, %o2
	xor	%o3, %l1, %o3
	stxa	%o2, [%i0 + 0x10] %asi
	stxa	%o3, [%i0 + 0x18] %asi
	ldda	[%i1 + 0x30] %asi, %i4	/* %i4/%i5 = src1 + 0x30 */
	ldda	[%l7 + 0x30] %asi, %l0	/* %l0/%l1 = src2 + 0x30 */
	ldda	[%i0 + 0x30] %asi, %o2	/* %o2/%o3 = dest + 0x30 */
	prefetch	[%i1 + 0x40], #one_read
	prefetch	[%l7 + 0x40], #one_read
	prefetch	[%i0 + 0x40], #n_writes
	xor	%g2, %i2, %g2
	xor	%g3, %i3, %g3
	xor	%o0, %g2, %o0
	xor	%o1, %g3, %o1
	stxa	%o0, [%i0 + 0x20] %asi
	stxa	%o1, [%i0 + 0x28] %asi
	xor	%l0, %i4, %l0
	xor	%l1, %i5, %l1
	xor	%o2, %l0, %o2
	xor	%o3, %l1, %o3
	stxa	%o2, [%i0 + 0x30] %asi
	stxa	%o3, [%i0 + 0x38] %asi
	add	%i0, 0x40, %i0
	add	%i1, 0x40, %i1
	subcc	%g1, 1, %g1
	bne,pt	%xcc, 1b
	 add	%l7, 0x40, %l7
	membar	#Sync
	wr	%g7, 0x0, %asi
	ret
	 restore
ENDPROC(xor_niagara_3)
EXPORT_SYMBOL(xor_niagara_3)

ENTRY(xor_niagara_4) /* %o0=bytes, %o1=dest, %o2=src1, %o3=src2, %o4=src3 */
	save	%sp, -192, %sp
	prefetch	[%i1], #n_writes
	prefetch	[%i2], #one_read
	prefetch	[%i3], #one_read
	prefetch	[%i4], #one_read
	rd	%asi, %g7
	wr	%g0, ASI_BLK_INIT_QUAD_LDD_P, %asi
	srlx	%i0, 6, %g1
	mov	%i1, %i0
	mov	%i2, %i1
	mov	%i3, %l7
	mov	%i4, %l6
1:	ldda	[%i1 + 0x00] %asi, %i2	/* %i2/%i3 = src1 + 0x00 */
	ldda	[%l7 + 0x00] %asi, %i4	/* %i4/%i5 = src2 + 0x00 */
	ldda	[%l6 + 0x00] %asi, %g2	/* %g2/%g3 = src3 + 0x00 */
	ldda	[%i0 + 0x00] %asi, %l0	/* %l0/%l1 = dest + 0x00 */
	xor	%i4, %i2, %i4
	xor	%i5, %i3, %i5
	ldda	[%i1 + 0x10] %asi, %i2	/* %i2/%i3 = src1 + 0x10 */
	xor	%g2, %i4, %g2
	xor	%g3, %i5, %g3
	ldda	[%l7 + 0x10] %asi, %i4	/* %i4/%i5 = src2 + 0x10 */
	xor	%l0, %g2, %l0
	xor	%l1, %g3, %l1
	stxa	%l0, [%i0 + 0x00] %asi
	stxa	%l1, [%i0 + 0x08] %asi
	ldda	[%l6 + 0x10] %asi, %g2	/* %g2/%g3 = src3 + 0x10 */
	ldda	[%i0 + 0x10] %asi, %l0	/* %l0/%l1 = dest + 0x10 */

	xor	%i4, %i2, %i4
	xor	%i5, %i3, %i5
	ldda	[%i1 + 0x20] %asi, %i2	/* %i2/%i3 = src1 + 0x20 */
	xor	%g2, %i4, %g2
	xor	%g3, %i5, %g3
	ldda	[%l7 + 0x20] %asi, %i4	/* %i4/%i5 = src2 + 0x20 */
	xor	%l0, %g2, %l0
	xor	%l1, %g3, %l1
	stxa	%l0, [%i0 + 0x10] %asi
	stxa	%l1, [%i0 + 0x18] %asi
	ldda	[%l6 + 0x20] %asi, %g2	/* %g2/%g3 = src3 + 0x20 */
	ldda	[%i0 + 0x20] %asi, %l0	/* %l0/%l1 = dest + 0x20 */

	xor	%i4, %i2, %i4
	xor	%i5, %i3, %i5
	ldda	[%i1 + 0x30] %asi, %i2	/* %i2/%i3 = src1 + 0x30 */
	xor	%g2, %i4, %g2
	xor	%g3, %i5, %g3
	ldda	[%l7 + 0x30] %asi, %i4	/* %i4/%i5 = src2 + 0x30 */
	xor	%l0, %g2, %l0
	xor	%l1, %g3, %l1
	stxa	%l0, [%i0 + 0x20] %asi
	stxa	%l1, [%i0 + 0x28] %asi
	ldda	[%l6 + 0x30] %asi, %g2	/* %g2/%g3 = src3 + 0x30 */
	ldda	[%i0 + 0x30] %asi, %l0	/* %l0/%l1 = dest + 0x30 */

	prefetch	[%i1 + 0x40], #one_read
	prefetch	[%l7 + 0x40], #one_read
	prefetch	[%l6 + 0x40], #one_read
	prefetch	[%i0 + 0x40], #n_writes

	xor	%i4, %i2, %i4
	xor	%i5, %i3, %i5
	xor	%g2, %i4, %g2
	xor	%g3, %i5, %g3
	xor	%l0, %g2, %l0
	xor	%l1, %g3, %l1
	stxa	%l0, [%i0 + 0x30] %asi
	stxa	%l1, [%i0 + 0x38] %asi

	add	%i0, 0x40, %i0
	add	%i1, 0x40, %i1
	add	%l7, 0x40, %l7
	subcc	%g1, 1, %g1
	bne,pt	%xcc, 1b
	 add	%l6, 0x40, %l6
	membar	#Sync
	wr	%g7, 0x0, %asi
	ret
	 restore
ENDPROC(xor_niagara_4)
EXPORT_SYMBOL(xor_niagara_4)

ENTRY(xor_niagara_5) /* %o0=bytes, %o1=dest, %o2=src1, %o3=src2, %o4=src3, %o5=src4 */
	save	%sp, -192, %sp
	prefetch	[%i1], #n_writes
	prefetch	[%i2], #one_read
	prefetch	[%i3], #one_read
	prefetch	[%i4], #one_read
	prefetch	[%i5], #one_read
	rd	%asi, %g7
	wr	%g0, ASI_BLK_INIT_QUAD_LDD_P, %asi
	srlx	%i0, 6, %g1
	mov	%i1, %i0
	mov	%i2, %i1
	mov	%i3, %l7
	mov	%i4, %l6
	mov	%i5, %l5
1:	ldda	[%i1 + 0x00] %asi, %i2	/* %i2/%i3 = src1 + 0x00 */
	ldda	[%l7 + 0x00] %asi, %i4	/* %i4/%i5 = src2 + 0x00 */
	ldda	[%l6 + 0x00] %asi, %g2	/* %g2/%g3 = src3 + 0x00 */
	ldda	[%l5 + 0x00] %asi, %l0	/* %l0/%l1 = src4 + 0x00 */
	ldda	[%i0 + 0x00] %asi, %l2	/* %l2/%l3 = dest + 0x00 */
	xor	%i4, %i2, %i4
	xor	%i5, %i3, %i5
	ldda	[%i1 + 0x10] %asi, %i2	/* %i2/%i3 = src1 + 0x10 */
	xor	%g2, %i4, %g2
	xor	%g3, %i5, %g3
	ldda	[%l7 + 0x10] %asi, %i4	/* %i4/%i5 = src2 + 0x10 */
	xor	%l0, %g2, %l0
	xor	%l1, %g3, %l1
	ldda	[%l6 + 0x10] %asi, %g2	/* %g2/%g3 = src3 + 0x10 */
	xor	%l2, %l0, %l2
	xor	%l3, %l1, %l3
	stxa	%l2, [%i0 + 0x00] %asi
	stxa	%l3, [%i0 + 0x08] %asi
	ldda	[%l5 + 0x10] %asi, %l0	/* %l0/%l1 = src4 + 0x10 */
	ldda	[%i0 + 0x10] %asi, %l2	/* %l2/%l3 = dest + 0x10 */

	xor	%i4, %i2, %i4
	xor	%i5, %i3, %i5
	ldda	[%i1 + 0x20] %asi, %i2	/* %i2/%i3 = src1 + 0x20 */
	xor	%g2, %i4, %g2
	xor	%g3, %i5, %g3
	ldda	[%l7 + 0x20] %asi, %i4	/* %i4/%i5 = src2 + 0x20 */
	xor	%l0, %g2, %l0
	xor	%l1, %g3, %l1
	ldda	[%l6 + 0x20] %asi, %g2	/* %g2/%g3 = src3 + 0x20 */
	xor	%l2, %l0, %l2
	xor	%l3, %l1, %l3
	stxa	%l2, [%i0 + 0x10] %asi
	stxa	%l3, [%i0 + 0x18] %asi
	ldda	[%l5 + 0x20] %asi, %l0	/* %l0/%l1 = src4 + 0x20 */
	ldda	[%i0 + 0x20] %asi, %l2	/* %l2/%l3 = dest + 0x20 */

	xor	%i4, %i2, %i4
	xor	%i5, %i3, %i5
	ldda	[%i1 + 0x30] %asi, %i2	/* %i2/%i3 = src1 + 0x30 */
	xor	%g2, %i4, %g2
	xor	%g3, %i5, %g3
	ldda	[%l7 + 0x30] %asi, %i4	/* %i4/%i5 = src2 + 0x30 */
	xor	%l0, %g2, %l0
	xor	%l1, %g3, %l1
	ldda	[%l6 + 0x30] %asi, %g2	/* %g2/%g3 = src3 + 0x30 */
	xor	%l2, %l0, %l2
	xor	%l3, %l1, %l3
	stxa	%l2, [%i0 + 0x20] %asi
	stxa	%l3, [%i0 + 0x28] %asi
	ldda	[%l5 + 0x30] %asi, %l0	/* %l0/%l1 = src4 + 0x30 */
	ldda	[%i0 + 0x30] %asi, %l2	/* %l2/%l3 = dest + 0x30 */

	prefetch	[%i1 + 0x40], #one_read
	prefetch	[%l7 + 0x40], #one_read
	prefetch	[%l6 + 0x40], #one_read
	prefetch	[%l5 + 0x40], #one_read
	prefetch	[%i0 + 0x40], #n_writes

	xor	%i4, %i2, %i4
	xor	%i5, %i3, %i5
	xor	%g2, %i4, %g2
	xor	%g3, %i5, %g3
	xor	%l0, %g2, %l0
	xor	%l1, %g3, %l1
	xor	%l2, %l0, %l2
	xor	%l3, %l1, %l3
	stxa	%l2, [%i0 + 0x30] %asi
	stxa	%l3, [%i0 + 0x38] %asi

	add	%i0, 0x40, %i0
	add	%i1, 0x40, %i1
	add	%l7, 0x40, %l7
	add	%l6, 0x40, %l6
	subcc	%g1, 1, %g1
	bne,pt	%xcc, 1b
	 add	%l5, 0x40, %l5
	membar	#Sync
	wr	%g7, 0x0, %asi
	ret
	 restore
ENDPROC(xor_niagara_5)
EXPORT_SYMBOL(xor_niagara_5)
-24
arch/um/include/asm/xor.h
··· 1 - /* SPDX-License-Identifier: GPL-2.0 */ 2 - #ifndef _ASM_UM_XOR_H 3 - #define _ASM_UM_XOR_H 4 - 5 - #ifdef CONFIG_64BIT 6 - #undef CONFIG_X86_32 7 - #define TT_CPU_INF_XOR_DEFAULT (AVX_SELECT(&xor_block_sse_pf64)) 8 - #else 9 - #define CONFIG_X86_32 1 10 - #define TT_CPU_INF_XOR_DEFAULT (AVX_SELECT(&xor_block_8regs)) 11 - #endif 12 - 13 - #include <asm/cpufeature.h> 14 - #include <../../x86/include/asm/xor.h> 15 - #include <linux/time-internal.h> 16 - 17 - #ifdef CONFIG_UML_TIME_TRAVEL_SUPPORT 18 - #undef XOR_SELECT_TEMPLATE 19 - /* pick an arbitrary one - measuring isn't possible with inf-cpu */ 20 - #define XOR_SELECT_TEMPLATE(x) \ 21 - (time_travel_mode == TT_MODE_INFCPU ? TT_CPU_INF_XOR_DEFAULT : x) 22 - #endif 23 - 24 - #endif
-502
arch/x86/include/asm/xor.h
··· 1 - /* SPDX-License-Identifier: GPL-2.0-or-later */ 2 - #ifndef _ASM_X86_XOR_H 3 - #define _ASM_X86_XOR_H 4 - 5 - /* 6 - * Optimized RAID-5 checksumming functions for SSE. 7 - */ 8 - 9 - /* 10 - * Cache avoiding checksumming functions utilizing KNI instructions 11 - * Copyright (C) 1999 Zach Brown (with obvious credit due Ingo) 12 - */ 13 - 14 - /* 15 - * Based on 16 - * High-speed RAID5 checksumming functions utilizing SSE instructions. 17 - * Copyright (C) 1998 Ingo Molnar. 18 - */ 19 - 20 - /* 21 - * x86-64 changes / gcc fixes from Andi Kleen. 22 - * Copyright 2002 Andi Kleen, SuSE Labs. 23 - * 24 - * This hasn't been optimized for the hammer yet, but there are likely 25 - * no advantages to be gotten from x86-64 here anyways. 26 - */ 27 - 28 - #include <asm/fpu/api.h> 29 - 30 - #ifdef CONFIG_X86_32 31 - /* reduce register pressure */ 32 - # define XOR_CONSTANT_CONSTRAINT "i" 33 - #else 34 - # define XOR_CONSTANT_CONSTRAINT "re" 35 - #endif 36 - 37 - #define OFFS(x) "16*("#x")" 38 - #define PF_OFFS(x) "256+16*("#x")" 39 - #define PF0(x) " prefetchnta "PF_OFFS(x)"(%[p1]) ;\n" 40 - #define LD(x, y) " movaps "OFFS(x)"(%[p1]), %%xmm"#y" ;\n" 41 - #define ST(x, y) " movaps %%xmm"#y", "OFFS(x)"(%[p1]) ;\n" 42 - #define PF1(x) " prefetchnta "PF_OFFS(x)"(%[p2]) ;\n" 43 - #define PF2(x) " prefetchnta "PF_OFFS(x)"(%[p3]) ;\n" 44 - #define PF3(x) " prefetchnta "PF_OFFS(x)"(%[p4]) ;\n" 45 - #define PF4(x) " prefetchnta "PF_OFFS(x)"(%[p5]) ;\n" 46 - #define XO1(x, y) " xorps "OFFS(x)"(%[p2]), %%xmm"#y" ;\n" 47 - #define XO2(x, y) " xorps "OFFS(x)"(%[p3]), %%xmm"#y" ;\n" 48 - #define XO3(x, y) " xorps "OFFS(x)"(%[p4]), %%xmm"#y" ;\n" 49 - #define XO4(x, y) " xorps "OFFS(x)"(%[p5]), %%xmm"#y" ;\n" 50 - #define NOP(x) 51 - 52 - #define BLK64(pf, op, i) \ 53 - pf(i) \ 54 - op(i, 0) \ 55 - op(i + 1, 1) \ 56 - op(i + 2, 2) \ 57 - op(i + 3, 3) 58 - 59 - static void 60 - xor_sse_2(unsigned long bytes, unsigned long * __restrict p1, 61 - const unsigned long * __restrict p2) 62 - { 63 - unsigned long lines = bytes >> 8; 64 - 65 - kernel_fpu_begin(); 66 - 67 - asm volatile( 68 - #undef BLOCK 69 - #define BLOCK(i) \ 70 - LD(i, 0) \ 71 - LD(i + 1, 1) \ 72 - PF1(i) \ 73 - PF1(i + 2) \ 74 - LD(i + 2, 2) \ 75 - LD(i + 3, 3) \ 76 - PF0(i + 4) \ 77 - PF0(i + 6) \ 78 - XO1(i, 0) \ 79 - XO1(i + 1, 1) \ 80 - XO1(i + 2, 2) \ 81 - XO1(i + 3, 3) \ 82 - ST(i, 0) \ 83 - ST(i + 1, 1) \ 84 - ST(i + 2, 2) \ 85 - ST(i + 3, 3) \ 86 - 87 - 88 - PF0(0) 89 - PF0(2) 90 - 91 - " .align 32 ;\n" 92 - " 1: ;\n" 93 - 94 - BLOCK(0) 95 - BLOCK(4) 96 - BLOCK(8) 97 - BLOCK(12) 98 - 99 - " add %[inc], %[p1] ;\n" 100 - " add %[inc], %[p2] ;\n" 101 - " dec %[cnt] ;\n" 102 - " jnz 1b ;\n" 103 - : [cnt] "+r" (lines), 104 - [p1] "+r" (p1), [p2] "+r" (p2) 105 - : [inc] XOR_CONSTANT_CONSTRAINT (256UL) 106 - : "memory"); 107 - 108 - kernel_fpu_end(); 109 - } 110 - 111 - static void 112 - xor_sse_2_pf64(unsigned long bytes, unsigned long * __restrict p1, 113 - const unsigned long * __restrict p2) 114 - { 115 - unsigned long lines = bytes >> 8; 116 - 117 - kernel_fpu_begin(); 118 - 119 - asm volatile( 120 - #undef BLOCK 121 - #define BLOCK(i) \ 122 - BLK64(PF0, LD, i) \ 123 - BLK64(PF1, XO1, i) \ 124 - BLK64(NOP, ST, i) \ 125 - 126 - " .align 32 ;\n" 127 - " 1: ;\n" 128 - 129 - BLOCK(0) 130 - BLOCK(4) 131 - BLOCK(8) 132 - BLOCK(12) 133 - 134 - " add %[inc], %[p1] ;\n" 135 - " add %[inc], %[p2] ;\n" 136 - " dec %[cnt] ;\n" 137 - " jnz 1b ;\n" 138 - : [cnt] "+r" (lines), 139 - [p1] "+r" (p1), [p2] "+r" (p2) 140 - : [inc] XOR_CONSTANT_CONSTRAINT 
(256UL) 141 - : "memory"); 142 - 143 - kernel_fpu_end(); 144 - } 145 - 146 - static void 147 - xor_sse_3(unsigned long bytes, unsigned long * __restrict p1, 148 - const unsigned long * __restrict p2, 149 - const unsigned long * __restrict p3) 150 - { 151 - unsigned long lines = bytes >> 8; 152 - 153 - kernel_fpu_begin(); 154 - 155 - asm volatile( 156 - #undef BLOCK 157 - #define BLOCK(i) \ 158 - PF1(i) \ 159 - PF1(i + 2) \ 160 - LD(i, 0) \ 161 - LD(i + 1, 1) \ 162 - LD(i + 2, 2) \ 163 - LD(i + 3, 3) \ 164 - PF2(i) \ 165 - PF2(i + 2) \ 166 - PF0(i + 4) \ 167 - PF0(i + 6) \ 168 - XO1(i, 0) \ 169 - XO1(i + 1, 1) \ 170 - XO1(i + 2, 2) \ 171 - XO1(i + 3, 3) \ 172 - XO2(i, 0) \ 173 - XO2(i + 1, 1) \ 174 - XO2(i + 2, 2) \ 175 - XO2(i + 3, 3) \ 176 - ST(i, 0) \ 177 - ST(i + 1, 1) \ 178 - ST(i + 2, 2) \ 179 - ST(i + 3, 3) \ 180 - 181 - 182 - PF0(0) 183 - PF0(2) 184 - 185 - " .align 32 ;\n" 186 - " 1: ;\n" 187 - 188 - BLOCK(0) 189 - BLOCK(4) 190 - BLOCK(8) 191 - BLOCK(12) 192 - 193 - " add %[inc], %[p1] ;\n" 194 - " add %[inc], %[p2] ;\n" 195 - " add %[inc], %[p3] ;\n" 196 - " dec %[cnt] ;\n" 197 - " jnz 1b ;\n" 198 - : [cnt] "+r" (lines), 199 - [p1] "+r" (p1), [p2] "+r" (p2), [p3] "+r" (p3) 200 - : [inc] XOR_CONSTANT_CONSTRAINT (256UL) 201 - : "memory"); 202 - 203 - kernel_fpu_end(); 204 - } 205 - 206 - static void 207 - xor_sse_3_pf64(unsigned long bytes, unsigned long * __restrict p1, 208 - const unsigned long * __restrict p2, 209 - const unsigned long * __restrict p3) 210 - { 211 - unsigned long lines = bytes >> 8; 212 - 213 - kernel_fpu_begin(); 214 - 215 - asm volatile( 216 - #undef BLOCK 217 - #define BLOCK(i) \ 218 - BLK64(PF0, LD, i) \ 219 - BLK64(PF1, XO1, i) \ 220 - BLK64(PF2, XO2, i) \ 221 - BLK64(NOP, ST, i) \ 222 - 223 - " .align 32 ;\n" 224 - " 1: ;\n" 225 - 226 - BLOCK(0) 227 - BLOCK(4) 228 - BLOCK(8) 229 - BLOCK(12) 230 - 231 - " add %[inc], %[p1] ;\n" 232 - " add %[inc], %[p2] ;\n" 233 - " add %[inc], %[p3] ;\n" 234 - " dec %[cnt] ;\n" 235 - " jnz 1b ;\n" 236 - : [cnt] "+r" (lines), 237 - [p1] "+r" (p1), [p2] "+r" (p2), [p3] "+r" (p3) 238 - : [inc] XOR_CONSTANT_CONSTRAINT (256UL) 239 - : "memory"); 240 - 241 - kernel_fpu_end(); 242 - } 243 - 244 - static void 245 - xor_sse_4(unsigned long bytes, unsigned long * __restrict p1, 246 - const unsigned long * __restrict p2, 247 - const unsigned long * __restrict p3, 248 - const unsigned long * __restrict p4) 249 - { 250 - unsigned long lines = bytes >> 8; 251 - 252 - kernel_fpu_begin(); 253 - 254 - asm volatile( 255 - #undef BLOCK 256 - #define BLOCK(i) \ 257 - PF1(i) \ 258 - PF1(i + 2) \ 259 - LD(i, 0) \ 260 - LD(i + 1, 1) \ 261 - LD(i + 2, 2) \ 262 - LD(i + 3, 3) \ 263 - PF2(i) \ 264 - PF2(i + 2) \ 265 - XO1(i, 0) \ 266 - XO1(i + 1, 1) \ 267 - XO1(i + 2, 2) \ 268 - XO1(i + 3, 3) \ 269 - PF3(i) \ 270 - PF3(i + 2) \ 271 - PF0(i + 4) \ 272 - PF0(i + 6) \ 273 - XO2(i, 0) \ 274 - XO2(i + 1, 1) \ 275 - XO2(i + 2, 2) \ 276 - XO2(i + 3, 3) \ 277 - XO3(i, 0) \ 278 - XO3(i + 1, 1) \ 279 - XO3(i + 2, 2) \ 280 - XO3(i + 3, 3) \ 281 - ST(i, 0) \ 282 - ST(i + 1, 1) \ 283 - ST(i + 2, 2) \ 284 - ST(i + 3, 3) \ 285 - 286 - 287 - PF0(0) 288 - PF0(2) 289 - 290 - " .align 32 ;\n" 291 - " 1: ;\n" 292 - 293 - BLOCK(0) 294 - BLOCK(4) 295 - BLOCK(8) 296 - BLOCK(12) 297 - 298 - " add %[inc], %[p1] ;\n" 299 - " add %[inc], %[p2] ;\n" 300 - " add %[inc], %[p3] ;\n" 301 - " add %[inc], %[p4] ;\n" 302 - " dec %[cnt] ;\n" 303 - " jnz 1b ;\n" 304 - : [cnt] "+r" (lines), [p1] "+r" (p1), 305 - [p2] "+r" (p2), [p3] "+r" (p3), [p4] "+r" (p4) 306 - : [inc] 
XOR_CONSTANT_CONSTRAINT (256UL) 307 - : "memory"); 308 - 309 - kernel_fpu_end(); 310 - } 311 - 312 - static void 313 - xor_sse_4_pf64(unsigned long bytes, unsigned long * __restrict p1, 314 - const unsigned long * __restrict p2, 315 - const unsigned long * __restrict p3, 316 - const unsigned long * __restrict p4) 317 - { 318 - unsigned long lines = bytes >> 8; 319 - 320 - kernel_fpu_begin(); 321 - 322 - asm volatile( 323 - #undef BLOCK 324 - #define BLOCK(i) \ 325 - BLK64(PF0, LD, i) \ 326 - BLK64(PF1, XO1, i) \ 327 - BLK64(PF2, XO2, i) \ 328 - BLK64(PF3, XO3, i) \ 329 - BLK64(NOP, ST, i) \ 330 - 331 - " .align 32 ;\n" 332 - " 1: ;\n" 333 - 334 - BLOCK(0) 335 - BLOCK(4) 336 - BLOCK(8) 337 - BLOCK(12) 338 - 339 - " add %[inc], %[p1] ;\n" 340 - " add %[inc], %[p2] ;\n" 341 - " add %[inc], %[p3] ;\n" 342 - " add %[inc], %[p4] ;\n" 343 - " dec %[cnt] ;\n" 344 - " jnz 1b ;\n" 345 - : [cnt] "+r" (lines), [p1] "+r" (p1), 346 - [p2] "+r" (p2), [p3] "+r" (p3), [p4] "+r" (p4) 347 - : [inc] XOR_CONSTANT_CONSTRAINT (256UL) 348 - : "memory"); 349 - 350 - kernel_fpu_end(); 351 - } 352 - 353 - static void 354 - xor_sse_5(unsigned long bytes, unsigned long * __restrict p1, 355 - const unsigned long * __restrict p2, 356 - const unsigned long * __restrict p3, 357 - const unsigned long * __restrict p4, 358 - const unsigned long * __restrict p5) 359 - { 360 - unsigned long lines = bytes >> 8; 361 - 362 - kernel_fpu_begin(); 363 - 364 - asm volatile( 365 - #undef BLOCK 366 - #define BLOCK(i) \ 367 - PF1(i) \ 368 - PF1(i + 2) \ 369 - LD(i, 0) \ 370 - LD(i + 1, 1) \ 371 - LD(i + 2, 2) \ 372 - LD(i + 3, 3) \ 373 - PF2(i) \ 374 - PF2(i + 2) \ 375 - XO1(i, 0) \ 376 - XO1(i + 1, 1) \ 377 - XO1(i + 2, 2) \ 378 - XO1(i + 3, 3) \ 379 - PF3(i) \ 380 - PF3(i + 2) \ 381 - XO2(i, 0) \ 382 - XO2(i + 1, 1) \ 383 - XO2(i + 2, 2) \ 384 - XO2(i + 3, 3) \ 385 - PF4(i) \ 386 - PF4(i + 2) \ 387 - PF0(i + 4) \ 388 - PF0(i + 6) \ 389 - XO3(i, 0) \ 390 - XO3(i + 1, 1) \ 391 - XO3(i + 2, 2) \ 392 - XO3(i + 3, 3) \ 393 - XO4(i, 0) \ 394 - XO4(i + 1, 1) \ 395 - XO4(i + 2, 2) \ 396 - XO4(i + 3, 3) \ 397 - ST(i, 0) \ 398 - ST(i + 1, 1) \ 399 - ST(i + 2, 2) \ 400 - ST(i + 3, 3) \ 401 - 402 - 403 - PF0(0) 404 - PF0(2) 405 - 406 - " .align 32 ;\n" 407 - " 1: ;\n" 408 - 409 - BLOCK(0) 410 - BLOCK(4) 411 - BLOCK(8) 412 - BLOCK(12) 413 - 414 - " add %[inc], %[p1] ;\n" 415 - " add %[inc], %[p2] ;\n" 416 - " add %[inc], %[p3] ;\n" 417 - " add %[inc], %[p4] ;\n" 418 - " add %[inc], %[p5] ;\n" 419 - " dec %[cnt] ;\n" 420 - " jnz 1b ;\n" 421 - : [cnt] "+r" (lines), [p1] "+r" (p1), [p2] "+r" (p2), 422 - [p3] "+r" (p3), [p4] "+r" (p4), [p5] "+r" (p5) 423 - : [inc] XOR_CONSTANT_CONSTRAINT (256UL) 424 - : "memory"); 425 - 426 - kernel_fpu_end(); 427 - } 428 - 429 - static void 430 - xor_sse_5_pf64(unsigned long bytes, unsigned long * __restrict p1, 431 - const unsigned long * __restrict p2, 432 - const unsigned long * __restrict p3, 433 - const unsigned long * __restrict p4, 434 - const unsigned long * __restrict p5) 435 - { 436 - unsigned long lines = bytes >> 8; 437 - 438 - kernel_fpu_begin(); 439 - 440 - asm volatile( 441 - #undef BLOCK 442 - #define BLOCK(i) \ 443 - BLK64(PF0, LD, i) \ 444 - BLK64(PF1, XO1, i) \ 445 - BLK64(PF2, XO2, i) \ 446 - BLK64(PF3, XO3, i) \ 447 - BLK64(PF4, XO4, i) \ 448 - BLK64(NOP, ST, i) \ 449 - 450 - " .align 32 ;\n" 451 - " 1: ;\n" 452 - 453 - BLOCK(0) 454 - BLOCK(4) 455 - BLOCK(8) 456 - BLOCK(12) 457 - 458 - " add %[inc], %[p1] ;\n" 459 - " add %[inc], %[p2] ;\n" 460 - " add %[inc], %[p3] ;\n" 461 - " add %[inc], %[p4] 
;\n" 462 - " add %[inc], %[p5] ;\n" 463 - " dec %[cnt] ;\n" 464 - " jnz 1b ;\n" 465 - : [cnt] "+r" (lines), [p1] "+r" (p1), [p2] "+r" (p2), 466 - [p3] "+r" (p3), [p4] "+r" (p4), [p5] "+r" (p5) 467 - : [inc] XOR_CONSTANT_CONSTRAINT (256UL) 468 - : "memory"); 469 - 470 - kernel_fpu_end(); 471 - } 472 - 473 - static struct xor_block_template xor_block_sse_pf64 = { 474 - .name = "prefetch64-sse", 475 - .do_2 = xor_sse_2_pf64, 476 - .do_3 = xor_sse_3_pf64, 477 - .do_4 = xor_sse_4_pf64, 478 - .do_5 = xor_sse_5_pf64, 479 - }; 480 - 481 - #undef LD 482 - #undef XO1 483 - #undef XO2 484 - #undef XO3 485 - #undef XO4 486 - #undef ST 487 - #undef NOP 488 - #undef BLK64 489 - #undef BLOCK 490 - 491 - #undef XOR_CONSTANT_CONSTRAINT 492 - 493 - #ifdef CONFIG_X86_32 494 - # include <asm/xor_32.h> 495 - #else 496 - # include <asm/xor_64.h> 497 - #endif 498 - 499 - #define XOR_SELECT_TEMPLATE(FASTEST) \ 500 - AVX_SELECT(FASTEST) 501 - 502 - #endif /* _ASM_X86_XOR_H */
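All of the per-arch headers deleted in this series plug into the same registration scheme: each implementation is wrapped in a struct xor_block_template (from include/linux/raid/xor.h) and benchmarked at boot through the xor_speed() hook that XOR_TRY_TEMPLATES expands to, as the old crypto/xor.c further down shows. Roughly:

    struct xor_block_template {
            struct xor_block_template *next;
            const char *name;
            int speed;                      /* filled in by the boot benchmark */
            void (*do_2)(unsigned long, unsigned long *__restrict,
                         const unsigned long *__restrict);
            /* do_3/do_4/do_5 take one to three additional source pointers */
    };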
-573
arch/x86/include/asm/xor_32.h
··· 1 - /* SPDX-License-Identifier: GPL-2.0-or-later */ 2 - #ifndef _ASM_X86_XOR_32_H 3 - #define _ASM_X86_XOR_32_H 4 - 5 - /* 6 - * Optimized RAID-5 checksumming functions for MMX. 7 - */ 8 - 9 - /* 10 - * High-speed RAID5 checksumming functions utilizing MMX instructions. 11 - * Copyright (C) 1998 Ingo Molnar. 12 - */ 13 - 14 - #define LD(x, y) " movq 8*("#x")(%1), %%mm"#y" ;\n" 15 - #define ST(x, y) " movq %%mm"#y", 8*("#x")(%1) ;\n" 16 - #define XO1(x, y) " pxor 8*("#x")(%2), %%mm"#y" ;\n" 17 - #define XO2(x, y) " pxor 8*("#x")(%3), %%mm"#y" ;\n" 18 - #define XO3(x, y) " pxor 8*("#x")(%4), %%mm"#y" ;\n" 19 - #define XO4(x, y) " pxor 8*("#x")(%5), %%mm"#y" ;\n" 20 - 21 - #include <asm/fpu/api.h> 22 - 23 - static void 24 - xor_pII_mmx_2(unsigned long bytes, unsigned long * __restrict p1, 25 - const unsigned long * __restrict p2) 26 - { 27 - unsigned long lines = bytes >> 7; 28 - 29 - kernel_fpu_begin(); 30 - 31 - asm volatile( 32 - #undef BLOCK 33 - #define BLOCK(i) \ 34 - LD(i, 0) \ 35 - LD(i + 1, 1) \ 36 - LD(i + 2, 2) \ 37 - LD(i + 3, 3) \ 38 - XO1(i, 0) \ 39 - ST(i, 0) \ 40 - XO1(i+1, 1) \ 41 - ST(i+1, 1) \ 42 - XO1(i + 2, 2) \ 43 - ST(i + 2, 2) \ 44 - XO1(i + 3, 3) \ 45 - ST(i + 3, 3) 46 - 47 - " .align 32 ;\n" 48 - " 1: ;\n" 49 - 50 - BLOCK(0) 51 - BLOCK(4) 52 - BLOCK(8) 53 - BLOCK(12) 54 - 55 - " addl $128, %1 ;\n" 56 - " addl $128, %2 ;\n" 57 - " decl %0 ;\n" 58 - " jnz 1b ;\n" 59 - : "+r" (lines), 60 - "+r" (p1), "+r" (p2) 61 - : 62 - : "memory"); 63 - 64 - kernel_fpu_end(); 65 - } 66 - 67 - static void 68 - xor_pII_mmx_3(unsigned long bytes, unsigned long * __restrict p1, 69 - const unsigned long * __restrict p2, 70 - const unsigned long * __restrict p3) 71 - { 72 - unsigned long lines = bytes >> 7; 73 - 74 - kernel_fpu_begin(); 75 - 76 - asm volatile( 77 - #undef BLOCK 78 - #define BLOCK(i) \ 79 - LD(i, 0) \ 80 - LD(i + 1, 1) \ 81 - LD(i + 2, 2) \ 82 - LD(i + 3, 3) \ 83 - XO1(i, 0) \ 84 - XO1(i + 1, 1) \ 85 - XO1(i + 2, 2) \ 86 - XO1(i + 3, 3) \ 87 - XO2(i, 0) \ 88 - ST(i, 0) \ 89 - XO2(i + 1, 1) \ 90 - ST(i + 1, 1) \ 91 - XO2(i + 2, 2) \ 92 - ST(i + 2, 2) \ 93 - XO2(i + 3, 3) \ 94 - ST(i + 3, 3) 95 - 96 - " .align 32 ;\n" 97 - " 1: ;\n" 98 - 99 - BLOCK(0) 100 - BLOCK(4) 101 - BLOCK(8) 102 - BLOCK(12) 103 - 104 - " addl $128, %1 ;\n" 105 - " addl $128, %2 ;\n" 106 - " addl $128, %3 ;\n" 107 - " decl %0 ;\n" 108 - " jnz 1b ;\n" 109 - : "+r" (lines), 110 - "+r" (p1), "+r" (p2), "+r" (p3) 111 - : 112 - : "memory"); 113 - 114 - kernel_fpu_end(); 115 - } 116 - 117 - static void 118 - xor_pII_mmx_4(unsigned long bytes, unsigned long * __restrict p1, 119 - const unsigned long * __restrict p2, 120 - const unsigned long * __restrict p3, 121 - const unsigned long * __restrict p4) 122 - { 123 - unsigned long lines = bytes >> 7; 124 - 125 - kernel_fpu_begin(); 126 - 127 - asm volatile( 128 - #undef BLOCK 129 - #define BLOCK(i) \ 130 - LD(i, 0) \ 131 - LD(i + 1, 1) \ 132 - LD(i + 2, 2) \ 133 - LD(i + 3, 3) \ 134 - XO1(i, 0) \ 135 - XO1(i + 1, 1) \ 136 - XO1(i + 2, 2) \ 137 - XO1(i + 3, 3) \ 138 - XO2(i, 0) \ 139 - XO2(i + 1, 1) \ 140 - XO2(i + 2, 2) \ 141 - XO2(i + 3, 3) \ 142 - XO3(i, 0) \ 143 - ST(i, 0) \ 144 - XO3(i + 1, 1) \ 145 - ST(i + 1, 1) \ 146 - XO3(i + 2, 2) \ 147 - ST(i + 2, 2) \ 148 - XO3(i + 3, 3) \ 149 - ST(i + 3, 3) 150 - 151 - " .align 32 ;\n" 152 - " 1: ;\n" 153 - 154 - BLOCK(0) 155 - BLOCK(4) 156 - BLOCK(8) 157 - BLOCK(12) 158 - 159 - " addl $128, %1 ;\n" 160 - " addl $128, %2 ;\n" 161 - " addl $128, %3 ;\n" 162 - " addl $128, %4 ;\n" 163 - " decl %0 ;\n" 164 - " jnz 1b 
;\n" 165 - : "+r" (lines), 166 - "+r" (p1), "+r" (p2), "+r" (p3), "+r" (p4) 167 - : 168 - : "memory"); 169 - 170 - kernel_fpu_end(); 171 - } 172 - 173 - 174 - static void 175 - xor_pII_mmx_5(unsigned long bytes, unsigned long * __restrict p1, 176 - const unsigned long * __restrict p2, 177 - const unsigned long * __restrict p3, 178 - const unsigned long * __restrict p4, 179 - const unsigned long * __restrict p5) 180 - { 181 - unsigned long lines = bytes >> 7; 182 - 183 - kernel_fpu_begin(); 184 - 185 - /* Make sure GCC forgets anything it knows about p4 or p5, 186 - such that it won't pass to the asm volatile below a 187 - register that is shared with any other variable. That's 188 - because we modify p4 and p5 there, but we can't mark them 189 - as read/write, otherwise we'd overflow the 10-asm-operands 190 - limit of GCC < 3.1. */ 191 - asm("" : "+r" (p4), "+r" (p5)); 192 - 193 - asm volatile( 194 - #undef BLOCK 195 - #define BLOCK(i) \ 196 - LD(i, 0) \ 197 - LD(i + 1, 1) \ 198 - LD(i + 2, 2) \ 199 - LD(i + 3, 3) \ 200 - XO1(i, 0) \ 201 - XO1(i + 1, 1) \ 202 - XO1(i + 2, 2) \ 203 - XO1(i + 3, 3) \ 204 - XO2(i, 0) \ 205 - XO2(i + 1, 1) \ 206 - XO2(i + 2, 2) \ 207 - XO2(i + 3, 3) \ 208 - XO3(i, 0) \ 209 - XO3(i + 1, 1) \ 210 - XO3(i + 2, 2) \ 211 - XO3(i + 3, 3) \ 212 - XO4(i, 0) \ 213 - ST(i, 0) \ 214 - XO4(i + 1, 1) \ 215 - ST(i + 1, 1) \ 216 - XO4(i + 2, 2) \ 217 - ST(i + 2, 2) \ 218 - XO4(i + 3, 3) \ 219 - ST(i + 3, 3) 220 - 221 - " .align 32 ;\n" 222 - " 1: ;\n" 223 - 224 - BLOCK(0) 225 - BLOCK(4) 226 - BLOCK(8) 227 - BLOCK(12) 228 - 229 - " addl $128, %1 ;\n" 230 - " addl $128, %2 ;\n" 231 - " addl $128, %3 ;\n" 232 - " addl $128, %4 ;\n" 233 - " addl $128, %5 ;\n" 234 - " decl %0 ;\n" 235 - " jnz 1b ;\n" 236 - : "+r" (lines), 237 - "+r" (p1), "+r" (p2), "+r" (p3) 238 - : "r" (p4), "r" (p5) 239 - : "memory"); 240 - 241 - /* p4 and p5 were modified, and now the variables are dead. 242 - Clobber them just to be sure nobody does something stupid 243 - like assuming they have some legal value. 
*/ 244 - asm("" : "=r" (p4), "=r" (p5)); 245 - 246 - kernel_fpu_end(); 247 - } 248 - 249 - #undef LD 250 - #undef XO1 251 - #undef XO2 252 - #undef XO3 253 - #undef XO4 254 - #undef ST 255 - #undef BLOCK 256 - 257 - static void 258 - xor_p5_mmx_2(unsigned long bytes, unsigned long * __restrict p1, 259 - const unsigned long * __restrict p2) 260 - { 261 - unsigned long lines = bytes >> 6; 262 - 263 - kernel_fpu_begin(); 264 - 265 - asm volatile( 266 - " .align 32 ;\n" 267 - " 1: ;\n" 268 - " movq (%1), %%mm0 ;\n" 269 - " movq 8(%1), %%mm1 ;\n" 270 - " pxor (%2), %%mm0 ;\n" 271 - " movq 16(%1), %%mm2 ;\n" 272 - " movq %%mm0, (%1) ;\n" 273 - " pxor 8(%2), %%mm1 ;\n" 274 - " movq 24(%1), %%mm3 ;\n" 275 - " movq %%mm1, 8(%1) ;\n" 276 - " pxor 16(%2), %%mm2 ;\n" 277 - " movq 32(%1), %%mm4 ;\n" 278 - " movq %%mm2, 16(%1) ;\n" 279 - " pxor 24(%2), %%mm3 ;\n" 280 - " movq 40(%1), %%mm5 ;\n" 281 - " movq %%mm3, 24(%1) ;\n" 282 - " pxor 32(%2), %%mm4 ;\n" 283 - " movq 48(%1), %%mm6 ;\n" 284 - " movq %%mm4, 32(%1) ;\n" 285 - " pxor 40(%2), %%mm5 ;\n" 286 - " movq 56(%1), %%mm7 ;\n" 287 - " movq %%mm5, 40(%1) ;\n" 288 - " pxor 48(%2), %%mm6 ;\n" 289 - " pxor 56(%2), %%mm7 ;\n" 290 - " movq %%mm6, 48(%1) ;\n" 291 - " movq %%mm7, 56(%1) ;\n" 292 - 293 - " addl $64, %1 ;\n" 294 - " addl $64, %2 ;\n" 295 - " decl %0 ;\n" 296 - " jnz 1b ;\n" 297 - : "+r" (lines), 298 - "+r" (p1), "+r" (p2) 299 - : 300 - : "memory"); 301 - 302 - kernel_fpu_end(); 303 - } 304 - 305 - static void 306 - xor_p5_mmx_3(unsigned long bytes, unsigned long * __restrict p1, 307 - const unsigned long * __restrict p2, 308 - const unsigned long * __restrict p3) 309 - { 310 - unsigned long lines = bytes >> 6; 311 - 312 - kernel_fpu_begin(); 313 - 314 - asm volatile( 315 - " .align 32,0x90 ;\n" 316 - " 1: ;\n" 317 - " movq (%1), %%mm0 ;\n" 318 - " movq 8(%1), %%mm1 ;\n" 319 - " pxor (%2), %%mm0 ;\n" 320 - " movq 16(%1), %%mm2 ;\n" 321 - " pxor 8(%2), %%mm1 ;\n" 322 - " pxor (%3), %%mm0 ;\n" 323 - " pxor 16(%2), %%mm2 ;\n" 324 - " movq %%mm0, (%1) ;\n" 325 - " pxor 8(%3), %%mm1 ;\n" 326 - " pxor 16(%3), %%mm2 ;\n" 327 - " movq 24(%1), %%mm3 ;\n" 328 - " movq %%mm1, 8(%1) ;\n" 329 - " movq 32(%1), %%mm4 ;\n" 330 - " movq 40(%1), %%mm5 ;\n" 331 - " pxor 24(%2), %%mm3 ;\n" 332 - " movq %%mm2, 16(%1) ;\n" 333 - " pxor 32(%2), %%mm4 ;\n" 334 - " pxor 24(%3), %%mm3 ;\n" 335 - " pxor 40(%2), %%mm5 ;\n" 336 - " movq %%mm3, 24(%1) ;\n" 337 - " pxor 32(%3), %%mm4 ;\n" 338 - " pxor 40(%3), %%mm5 ;\n" 339 - " movq 48(%1), %%mm6 ;\n" 340 - " movq %%mm4, 32(%1) ;\n" 341 - " movq 56(%1), %%mm7 ;\n" 342 - " pxor 48(%2), %%mm6 ;\n" 343 - " movq %%mm5, 40(%1) ;\n" 344 - " pxor 56(%2), %%mm7 ;\n" 345 - " pxor 48(%3), %%mm6 ;\n" 346 - " pxor 56(%3), %%mm7 ;\n" 347 - " movq %%mm6, 48(%1) ;\n" 348 - " movq %%mm7, 56(%1) ;\n" 349 - 350 - " addl $64, %1 ;\n" 351 - " addl $64, %2 ;\n" 352 - " addl $64, %3 ;\n" 353 - " decl %0 ;\n" 354 - " jnz 1b ;\n" 355 - : "+r" (lines), 356 - "+r" (p1), "+r" (p2), "+r" (p3) 357 - : 358 - : "memory" ); 359 - 360 - kernel_fpu_end(); 361 - } 362 - 363 - static void 364 - xor_p5_mmx_4(unsigned long bytes, unsigned long * __restrict p1, 365 - const unsigned long * __restrict p2, 366 - const unsigned long * __restrict p3, 367 - const unsigned long * __restrict p4) 368 - { 369 - unsigned long lines = bytes >> 6; 370 - 371 - kernel_fpu_begin(); 372 - 373 - asm volatile( 374 - " .align 32,0x90 ;\n" 375 - " 1: ;\n" 376 - " movq (%1), %%mm0 ;\n" 377 - " movq 8(%1), %%mm1 ;\n" 378 - " pxor (%2), %%mm0 ;\n" 379 - " movq 16(%1), %%mm2 ;\n" 380 
- " pxor 8(%2), %%mm1 ;\n" 381 - " pxor (%3), %%mm0 ;\n" 382 - " pxor 16(%2), %%mm2 ;\n" 383 - " pxor 8(%3), %%mm1 ;\n" 384 - " pxor (%4), %%mm0 ;\n" 385 - " movq 24(%1), %%mm3 ;\n" 386 - " pxor 16(%3), %%mm2 ;\n" 387 - " pxor 8(%4), %%mm1 ;\n" 388 - " movq %%mm0, (%1) ;\n" 389 - " movq 32(%1), %%mm4 ;\n" 390 - " pxor 24(%2), %%mm3 ;\n" 391 - " pxor 16(%4), %%mm2 ;\n" 392 - " movq %%mm1, 8(%1) ;\n" 393 - " movq 40(%1), %%mm5 ;\n" 394 - " pxor 32(%2), %%mm4 ;\n" 395 - " pxor 24(%3), %%mm3 ;\n" 396 - " movq %%mm2, 16(%1) ;\n" 397 - " pxor 40(%2), %%mm5 ;\n" 398 - " pxor 32(%3), %%mm4 ;\n" 399 - " pxor 24(%4), %%mm3 ;\n" 400 - " movq %%mm3, 24(%1) ;\n" 401 - " movq 56(%1), %%mm7 ;\n" 402 - " movq 48(%1), %%mm6 ;\n" 403 - " pxor 40(%3), %%mm5 ;\n" 404 - " pxor 32(%4), %%mm4 ;\n" 405 - " pxor 48(%2), %%mm6 ;\n" 406 - " movq %%mm4, 32(%1) ;\n" 407 - " pxor 56(%2), %%mm7 ;\n" 408 - " pxor 40(%4), %%mm5 ;\n" 409 - " pxor 48(%3), %%mm6 ;\n" 410 - " pxor 56(%3), %%mm7 ;\n" 411 - " movq %%mm5, 40(%1) ;\n" 412 - " pxor 48(%4), %%mm6 ;\n" 413 - " pxor 56(%4), %%mm7 ;\n" 414 - " movq %%mm6, 48(%1) ;\n" 415 - " movq %%mm7, 56(%1) ;\n" 416 - 417 - " addl $64, %1 ;\n" 418 - " addl $64, %2 ;\n" 419 - " addl $64, %3 ;\n" 420 - " addl $64, %4 ;\n" 421 - " decl %0 ;\n" 422 - " jnz 1b ;\n" 423 - : "+r" (lines), 424 - "+r" (p1), "+r" (p2), "+r" (p3), "+r" (p4) 425 - : 426 - : "memory"); 427 - 428 - kernel_fpu_end(); 429 - } 430 - 431 - static void 432 - xor_p5_mmx_5(unsigned long bytes, unsigned long * __restrict p1, 433 - const unsigned long * __restrict p2, 434 - const unsigned long * __restrict p3, 435 - const unsigned long * __restrict p4, 436 - const unsigned long * __restrict p5) 437 - { 438 - unsigned long lines = bytes >> 6; 439 - 440 - kernel_fpu_begin(); 441 - 442 - /* Make sure GCC forgets anything it knows about p4 or p5, 443 - such that it won't pass to the asm volatile below a 444 - register that is shared with any other variable. That's 445 - because we modify p4 and p5 there, but we can't mark them 446 - as read/write, otherwise we'd overflow the 10-asm-operands 447 - limit of GCC < 3.1. 
*/ 448 - asm("" : "+r" (p4), "+r" (p5)); 449 - 450 - asm volatile( 451 - " .align 32,0x90 ;\n" 452 - " 1: ;\n" 453 - " movq (%1), %%mm0 ;\n" 454 - " movq 8(%1), %%mm1 ;\n" 455 - " pxor (%2), %%mm0 ;\n" 456 - " pxor 8(%2), %%mm1 ;\n" 457 - " movq 16(%1), %%mm2 ;\n" 458 - " pxor (%3), %%mm0 ;\n" 459 - " pxor 8(%3), %%mm1 ;\n" 460 - " pxor 16(%2), %%mm2 ;\n" 461 - " pxor (%4), %%mm0 ;\n" 462 - " pxor 8(%4), %%mm1 ;\n" 463 - " pxor 16(%3), %%mm2 ;\n" 464 - " movq 24(%1), %%mm3 ;\n" 465 - " pxor (%5), %%mm0 ;\n" 466 - " pxor 8(%5), %%mm1 ;\n" 467 - " movq %%mm0, (%1) ;\n" 468 - " pxor 16(%4), %%mm2 ;\n" 469 - " pxor 24(%2), %%mm3 ;\n" 470 - " movq %%mm1, 8(%1) ;\n" 471 - " pxor 16(%5), %%mm2 ;\n" 472 - " pxor 24(%3), %%mm3 ;\n" 473 - " movq 32(%1), %%mm4 ;\n" 474 - " movq %%mm2, 16(%1) ;\n" 475 - " pxor 24(%4), %%mm3 ;\n" 476 - " pxor 32(%2), %%mm4 ;\n" 477 - " movq 40(%1), %%mm5 ;\n" 478 - " pxor 24(%5), %%mm3 ;\n" 479 - " pxor 32(%3), %%mm4 ;\n" 480 - " pxor 40(%2), %%mm5 ;\n" 481 - " movq %%mm3, 24(%1) ;\n" 482 - " pxor 32(%4), %%mm4 ;\n" 483 - " pxor 40(%3), %%mm5 ;\n" 484 - " movq 48(%1), %%mm6 ;\n" 485 - " movq 56(%1), %%mm7 ;\n" 486 - " pxor 32(%5), %%mm4 ;\n" 487 - " pxor 40(%4), %%mm5 ;\n" 488 - " pxor 48(%2), %%mm6 ;\n" 489 - " pxor 56(%2), %%mm7 ;\n" 490 - " movq %%mm4, 32(%1) ;\n" 491 - " pxor 48(%3), %%mm6 ;\n" 492 - " pxor 56(%3), %%mm7 ;\n" 493 - " pxor 40(%5), %%mm5 ;\n" 494 - " pxor 48(%4), %%mm6 ;\n" 495 - " pxor 56(%4), %%mm7 ;\n" 496 - " movq %%mm5, 40(%1) ;\n" 497 - " pxor 48(%5), %%mm6 ;\n" 498 - " pxor 56(%5), %%mm7 ;\n" 499 - " movq %%mm6, 48(%1) ;\n" 500 - " movq %%mm7, 56(%1) ;\n" 501 - 502 - " addl $64, %1 ;\n" 503 - " addl $64, %2 ;\n" 504 - " addl $64, %3 ;\n" 505 - " addl $64, %4 ;\n" 506 - " addl $64, %5 ;\n" 507 - " decl %0 ;\n" 508 - " jnz 1b ;\n" 509 - : "+r" (lines), 510 - "+r" (p1), "+r" (p2), "+r" (p3) 511 - : "r" (p4), "r" (p5) 512 - : "memory"); 513 - 514 - /* p4 and p5 were modified, and now the variables are dead. 515 - Clobber them just to be sure nobody does something stupid 516 - like assuming they have some legal value. */ 517 - asm("" : "=r" (p4), "=r" (p5)); 518 - 519 - kernel_fpu_end(); 520 - } 521 - 522 - static struct xor_block_template xor_block_pII_mmx = { 523 - .name = "pII_mmx", 524 - .do_2 = xor_pII_mmx_2, 525 - .do_3 = xor_pII_mmx_3, 526 - .do_4 = xor_pII_mmx_4, 527 - .do_5 = xor_pII_mmx_5, 528 - }; 529 - 530 - static struct xor_block_template xor_block_p5_mmx = { 531 - .name = "p5_mmx", 532 - .do_2 = xor_p5_mmx_2, 533 - .do_3 = xor_p5_mmx_3, 534 - .do_4 = xor_p5_mmx_4, 535 - .do_5 = xor_p5_mmx_5, 536 - }; 537 - 538 - static struct xor_block_template xor_block_pIII_sse = { 539 - .name = "pIII_sse", 540 - .do_2 = xor_sse_2, 541 - .do_3 = xor_sse_3, 542 - .do_4 = xor_sse_4, 543 - .do_5 = xor_sse_5, 544 - }; 545 - 546 - /* Also try the AVX routines */ 547 - #include <asm/xor_avx.h> 548 - 549 - /* Also try the generic routines. */ 550 - #include <asm-generic/xor.h> 551 - 552 - /* We force the use of the SSE xor block because it can write around L2. 553 - We may also be able to load into the L1 only depending on how the cpu 554 - deals with a load to a line that is being prefetched. 
*/ 555 - #undef XOR_TRY_TEMPLATES 556 - #define XOR_TRY_TEMPLATES \ 557 - do { \ 558 - AVX_XOR_SPEED; \ 559 - if (boot_cpu_has(X86_FEATURE_XMM)) { \ 560 - xor_speed(&xor_block_pIII_sse); \ 561 - xor_speed(&xor_block_sse_pf64); \ 562 - } else if (boot_cpu_has(X86_FEATURE_MMX)) { \ 563 - xor_speed(&xor_block_pII_mmx); \ 564 - xor_speed(&xor_block_p5_mmx); \ 565 - } else { \ 566 - xor_speed(&xor_block_8regs); \ 567 - xor_speed(&xor_block_8regs_p); \ 568 - xor_speed(&xor_block_32regs); \ 569 - xor_speed(&xor_block_32regs_p); \ 570 - } \ 571 - } while (0) 572 - 573 - #endif /* _ASM_X86_XOR_32_H */
-28
arch/x86/include/asm/xor_64.h
··· 1 - /* SPDX-License-Identifier: GPL-2.0 */ 2 - #ifndef _ASM_X86_XOR_64_H 3 - #define _ASM_X86_XOR_64_H 4 - 5 - static struct xor_block_template xor_block_sse = { 6 - .name = "generic_sse", 7 - .do_2 = xor_sse_2, 8 - .do_3 = xor_sse_3, 9 - .do_4 = xor_sse_4, 10 - .do_5 = xor_sse_5, 11 - }; 12 - 13 - 14 - /* Also try the AVX routines */ 15 - #include <asm/xor_avx.h> 16 - 17 - /* We force the use of the SSE xor block because it can write around L2. 18 - We may also be able to load into the L1 only depending on how the cpu 19 - deals with a load to a line that is being prefetched. */ 20 - #undef XOR_TRY_TEMPLATES 21 - #define XOR_TRY_TEMPLATES \ 22 - do { \ 23 - AVX_XOR_SPEED; \ 24 - xor_speed(&xor_block_sse_pf64); \ 25 - xor_speed(&xor_block_sse); \ 26 - } while (0) 27 - 28 - #endif /* _ASM_X86_XOR_64_H */
-178
arch/x86/include/asm/xor_avx.h
··· 1 - /* SPDX-License-Identifier: GPL-2.0-only */ 2 - #ifndef _ASM_X86_XOR_AVX_H 3 - #define _ASM_X86_XOR_AVX_H 4 - 5 - /* 6 - * Optimized RAID-5 checksumming functions for AVX 7 - * 8 - * Copyright (C) 2012 Intel Corporation 9 - * Author: Jim Kukunas <james.t.kukunas@linux.intel.com> 10 - * 11 - * Based on Ingo Molnar and Zach Brown's respective MMX and SSE routines 12 - */ 13 - 14 - #include <linux/compiler.h> 15 - #include <asm/fpu/api.h> 16 - 17 - #define BLOCK4(i) \ 18 - BLOCK(32 * i, 0) \ 19 - BLOCK(32 * (i + 1), 1) \ 20 - BLOCK(32 * (i + 2), 2) \ 21 - BLOCK(32 * (i + 3), 3) 22 - 23 - #define BLOCK16() \ 24 - BLOCK4(0) \ 25 - BLOCK4(4) \ 26 - BLOCK4(8) \ 27 - BLOCK4(12) 28 - 29 - static void xor_avx_2(unsigned long bytes, unsigned long * __restrict p0, 30 - const unsigned long * __restrict p1) 31 - { 32 - unsigned long lines = bytes >> 9; 33 - 34 - kernel_fpu_begin(); 35 - 36 - while (lines--) { 37 - #undef BLOCK 38 - #define BLOCK(i, reg) \ 39 - do { \ 40 - asm volatile("vmovdqa %0, %%ymm" #reg : : "m" (p1[i / sizeof(*p1)])); \ 41 - asm volatile("vxorps %0, %%ymm" #reg ", %%ymm" #reg : : \ 42 - "m" (p0[i / sizeof(*p0)])); \ 43 - asm volatile("vmovdqa %%ymm" #reg ", %0" : \ 44 - "=m" (p0[i / sizeof(*p0)])); \ 45 - } while (0); 46 - 47 - BLOCK16() 48 - 49 - p0 = (unsigned long *)((uintptr_t)p0 + 512); 50 - p1 = (unsigned long *)((uintptr_t)p1 + 512); 51 - } 52 - 53 - kernel_fpu_end(); 54 - } 55 - 56 - static void xor_avx_3(unsigned long bytes, unsigned long * __restrict p0, 57 - const unsigned long * __restrict p1, 58 - const unsigned long * __restrict p2) 59 - { 60 - unsigned long lines = bytes >> 9; 61 - 62 - kernel_fpu_begin(); 63 - 64 - while (lines--) { 65 - #undef BLOCK 66 - #define BLOCK(i, reg) \ 67 - do { \ 68 - asm volatile("vmovdqa %0, %%ymm" #reg : : "m" (p2[i / sizeof(*p2)])); \ 69 - asm volatile("vxorps %0, %%ymm" #reg ", %%ymm" #reg : : \ 70 - "m" (p1[i / sizeof(*p1)])); \ 71 - asm volatile("vxorps %0, %%ymm" #reg ", %%ymm" #reg : : \ 72 - "m" (p0[i / sizeof(*p0)])); \ 73 - asm volatile("vmovdqa %%ymm" #reg ", %0" : \ 74 - "=m" (p0[i / sizeof(*p0)])); \ 75 - } while (0); 76 - 77 - BLOCK16() 78 - 79 - p0 = (unsigned long *)((uintptr_t)p0 + 512); 80 - p1 = (unsigned long *)((uintptr_t)p1 + 512); 81 - p2 = (unsigned long *)((uintptr_t)p2 + 512); 82 - } 83 - 84 - kernel_fpu_end(); 85 - } 86 - 87 - static void xor_avx_4(unsigned long bytes, unsigned long * __restrict p0, 88 - const unsigned long * __restrict p1, 89 - const unsigned long * __restrict p2, 90 - const unsigned long * __restrict p3) 91 - { 92 - unsigned long lines = bytes >> 9; 93 - 94 - kernel_fpu_begin(); 95 - 96 - while (lines--) { 97 - #undef BLOCK 98 - #define BLOCK(i, reg) \ 99 - do { \ 100 - asm volatile("vmovdqa %0, %%ymm" #reg : : "m" (p3[i / sizeof(*p3)])); \ 101 - asm volatile("vxorps %0, %%ymm" #reg ", %%ymm" #reg : : \ 102 - "m" (p2[i / sizeof(*p2)])); \ 103 - asm volatile("vxorps %0, %%ymm" #reg ", %%ymm" #reg : : \ 104 - "m" (p1[i / sizeof(*p1)])); \ 105 - asm volatile("vxorps %0, %%ymm" #reg ", %%ymm" #reg : : \ 106 - "m" (p0[i / sizeof(*p0)])); \ 107 - asm volatile("vmovdqa %%ymm" #reg ", %0" : \ 108 - "=m" (p0[i / sizeof(*p0)])); \ 109 - } while (0); 110 - 111 - BLOCK16(); 112 - 113 - p0 = (unsigned long *)((uintptr_t)p0 + 512); 114 - p1 = (unsigned long *)((uintptr_t)p1 + 512); 115 - p2 = (unsigned long *)((uintptr_t)p2 + 512); 116 - p3 = (unsigned long *)((uintptr_t)p3 + 512); 117 - } 118 - 119 - kernel_fpu_end(); 120 - } 121 - 122 - static void xor_avx_5(unsigned long bytes, unsigned long * 
__restrict p0, 123 - const unsigned long * __restrict p1, 124 - const unsigned long * __restrict p2, 125 - const unsigned long * __restrict p3, 126 - const unsigned long * __restrict p4) 127 - { 128 - unsigned long lines = bytes >> 9; 129 - 130 - kernel_fpu_begin(); 131 - 132 - while (lines--) { 133 - #undef BLOCK 134 - #define BLOCK(i, reg) \ 135 - do { \ 136 - asm volatile("vmovdqa %0, %%ymm" #reg : : "m" (p4[i / sizeof(*p4)])); \ 137 - asm volatile("vxorps %0, %%ymm" #reg ", %%ymm" #reg : : \ 138 - "m" (p3[i / sizeof(*p3)])); \ 139 - asm volatile("vxorps %0, %%ymm" #reg ", %%ymm" #reg : : \ 140 - "m" (p2[i / sizeof(*p2)])); \ 141 - asm volatile("vxorps %0, %%ymm" #reg ", %%ymm" #reg : : \ 142 - "m" (p1[i / sizeof(*p1)])); \ 143 - asm volatile("vxorps %0, %%ymm" #reg ", %%ymm" #reg : : \ 144 - "m" (p0[i / sizeof(*p0)])); \ 145 - asm volatile("vmovdqa %%ymm" #reg ", %0" : \ 146 - "=m" (p0[i / sizeof(*p0)])); \ 147 - } while (0); 148 - 149 - BLOCK16() 150 - 151 - p0 = (unsigned long *)((uintptr_t)p0 + 512); 152 - p1 = (unsigned long *)((uintptr_t)p1 + 512); 153 - p2 = (unsigned long *)((uintptr_t)p2 + 512); 154 - p3 = (unsigned long *)((uintptr_t)p3 + 512); 155 - p4 = (unsigned long *)((uintptr_t)p4 + 512); 156 - } 157 - 158 - kernel_fpu_end(); 159 - } 160 - 161 - static struct xor_block_template xor_block_avx = { 162 - .name = "avx", 163 - .do_2 = xor_avx_2, 164 - .do_3 = xor_avx_3, 165 - .do_4 = xor_avx_4, 166 - .do_5 = xor_avx_5, 167 - }; 168 - 169 - #define AVX_XOR_SPEED \ 170 - do { \ 171 - if (boot_cpu_has(X86_FEATURE_AVX) && boot_cpu_has(X86_FEATURE_OSXSAVE)) \ 172 - xor_speed(&xor_block_avx); \ 173 - } while (0) 174 - 175 - #define AVX_SELECT(FASTEST) \ 176 - (boot_cpu_has(X86_FEATURE_AVX) && boot_cpu_has(X86_FEATURE_OSXSAVE) ? &xor_block_avx : FASTEST) 177 - 178 - #endif
+1 -5
arch/x86/kernel/kexec-bzimage64.c
···
 	if (ret)
 		return ERR_PTR(ret);
 	ret = crash_load_dm_crypt_keys(image);
-	if (ret == -ENOENT) {
-		kexec_dprintk("No dm crypt key to load\n");
-	} else if (ret) {
-		pr_err("Failed to load dm crypt keys\n");
+	if (ret)
 		return ERR_PTR(ret);
-	}
 	if (image->dm_crypt_keys_addr &&
 	    cmdline_len + MAX_ELFCOREHDR_STR_LEN + MAX_DMCRYPTKEYS_STR_LEN >
 	    header->cmdline_size) {
-2
crypto/Kconfig
···
 #
 # Generic algorithms support
 #
-config XOR_BLOCKS
-	tristate

 #
 # async_tx api: hardware offloaded memory transfer/transform support
-1
crypto/Makefile
···
 #
 # generic algorithms and the async_tx api
 #
-obj-$(CONFIG_XOR_BLOCKS) += xor.o
 obj-$(CONFIG_ASYNC_CORE) += async_tx/
 obj-$(CONFIG_ASYMMETRIC_KEY_TYPE) += asymmetric_keys/
 crypto_simd-y := simd.o
+10 -24
crypto/async_tx/async_xor.c
···
 {
 	int i;
 	int xor_src_cnt = 0;
-	int src_off = 0;
 	void *dest_buf;
 	void **srcs;
···
 		if (src_list[i])
 			srcs[xor_src_cnt++] = page_address(src_list[i]) +
 				(src_offs ? src_offs[i] : offset);
-	src_cnt = xor_src_cnt;
+
 	/* set destination address */
 	dest_buf = page_address(dest) + offset;
-
 	if (submit->flags & ASYNC_TX_XOR_ZERO_DST)
 		memset(dest_buf, 0, len);
-
-	while (src_cnt > 0) {
-		/* process up to 'MAX_XOR_BLOCKS' sources */
-		xor_src_cnt = min(src_cnt, MAX_XOR_BLOCKS);
-		xor_blocks(xor_src_cnt, len, dest_buf, &srcs[src_off]);
-
-		/* drop completed sources */
-		src_cnt -= xor_src_cnt;
-		src_off += xor_src_cnt;
-	}
-
+	xor_gen(dest_buf, srcs, xor_src_cnt, len);
 	async_tx_sync_epilog(submit);
 }
···
 *
 * honored flags: ASYNC_TX_ACK, ASYNC_TX_XOR_ZERO_DST, ASYNC_TX_XOR_DROP_DST
 *
-* xor_blocks always uses the dest as a source so the
-* ASYNC_TX_XOR_ZERO_DST flag must be set to not include dest data in
-* the calculation. The assumption with dma engines is that they only
-* use the destination buffer as a source when it is explicitly specified
-* in the source list.
+* xor_gen always uses the dest as a source so the ASYNC_TX_XOR_ZERO_DST flag
+* must be set to not include dest data in the calculation. The assumption with
+* dma engines is that they only use the destination buffer as a source when it
+* is explicitly specified in the source list.
 *
 * src_list note: if the dest is also a source it must be at index zero.
 * The contents of this array will be overwritten if a scribble region
···
 *
 * honored flags: ASYNC_TX_ACK, ASYNC_TX_XOR_ZERO_DST, ASYNC_TX_XOR_DROP_DST
 *
-* xor_blocks always uses the dest as a source so the
-* ASYNC_TX_XOR_ZERO_DST flag must be set to not include dest data in
-* the calculation. The assumption with dma engines is that they only
-* use the destination buffer as a source when it is explicitly specified
-* in the source list.
+* xor_gen always uses the dest as a source so the ASYNC_TX_XOR_ZERO_DST flag
+* must be set to not include dest data in the calculation. The assumption with
+* dma engines is that they only use the destination buffer as a source when it
+* is explicitly specified in the source list.
 *
 * src_list note: if the dest is also a source it must be at index zero.
 * The contents of this array will be overwritten if a scribble region
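The chunking loop existed only because xor_blocks() capped the number of sources per call at MAX_XOR_BLOCKS. The prototype of the replacement is inferred here from its call sites in this diff (async_xor.c above and fs/btrfs/raid56.c below); treat the exact signature as an assumption:

    /* dest ^= srcs[0] ^ srcs[1] ^ ... ^ srcs[src_cnt - 1], in one call */
    void xor_gen(void *dest, void **srcs, int src_cnt, size_t len);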
-174
crypto/xor.c
··· 1 - // SPDX-License-Identifier: GPL-2.0-or-later 2 - /* 3 - * xor.c : Multiple Devices driver for Linux 4 - * 5 - * Copyright (C) 1996, 1997, 1998, 1999, 2000, 6 - * Ingo Molnar, Matti Aarnio, Jakub Jelinek, Richard Henderson. 7 - * 8 - * Dispatch optimized RAID-5 checksumming functions. 9 - */ 10 - 11 - #define BH_TRACE 0 12 - #include <linux/module.h> 13 - #include <linux/gfp.h> 14 - #include <linux/raid/xor.h> 15 - #include <linux/jiffies.h> 16 - #include <linux/preempt.h> 17 - #include <asm/xor.h> 18 - 19 - #ifndef XOR_SELECT_TEMPLATE 20 - #define XOR_SELECT_TEMPLATE(x) (x) 21 - #endif 22 - 23 - /* The xor routines to use. */ 24 - static struct xor_block_template *active_template; 25 - 26 - void 27 - xor_blocks(unsigned int src_count, unsigned int bytes, void *dest, void **srcs) 28 - { 29 - unsigned long *p1, *p2, *p3, *p4; 30 - 31 - p1 = (unsigned long *) srcs[0]; 32 - if (src_count == 1) { 33 - active_template->do_2(bytes, dest, p1); 34 - return; 35 - } 36 - 37 - p2 = (unsigned long *) srcs[1]; 38 - if (src_count == 2) { 39 - active_template->do_3(bytes, dest, p1, p2); 40 - return; 41 - } 42 - 43 - p3 = (unsigned long *) srcs[2]; 44 - if (src_count == 3) { 45 - active_template->do_4(bytes, dest, p1, p2, p3); 46 - return; 47 - } 48 - 49 - p4 = (unsigned long *) srcs[3]; 50 - active_template->do_5(bytes, dest, p1, p2, p3, p4); 51 - } 52 - EXPORT_SYMBOL(xor_blocks); 53 - 54 - /* Set of all registered templates. */ 55 - static struct xor_block_template *__initdata template_list; 56 - 57 - #ifndef MODULE 58 - static void __init do_xor_register(struct xor_block_template *tmpl) 59 - { 60 - tmpl->next = template_list; 61 - template_list = tmpl; 62 - } 63 - 64 - static int __init register_xor_blocks(void) 65 - { 66 - active_template = XOR_SELECT_TEMPLATE(NULL); 67 - 68 - if (!active_template) { 69 - #define xor_speed do_xor_register 70 - // register all the templates and pick the first as the default 71 - XOR_TRY_TEMPLATES; 72 - #undef xor_speed 73 - active_template = template_list; 74 - } 75 - return 0; 76 - } 77 - #endif 78 - 79 - #define BENCH_SIZE 4096 80 - #define REPS 800U 81 - 82 - static void __init 83 - do_xor_speed(struct xor_block_template *tmpl, void *b1, void *b2) 84 - { 85 - int speed; 86 - unsigned long reps; 87 - ktime_t min, start, t0; 88 - 89 - tmpl->next = template_list; 90 - template_list = tmpl; 91 - 92 - preempt_disable(); 93 - 94 - reps = 0; 95 - t0 = ktime_get(); 96 - /* delay start until time has advanced */ 97 - while ((start = ktime_get()) == t0) 98 - cpu_relax(); 99 - do { 100 - mb(); /* prevent loop optimization */ 101 - tmpl->do_2(BENCH_SIZE, b1, b2); 102 - mb(); 103 - } while (reps++ < REPS || (t0 = ktime_get()) == start); 104 - min = ktime_sub(t0, start); 105 - 106 - preempt_enable(); 107 - 108 - // bytes/ns == GB/s, multiply by 1000 to get MB/s [not MiB/s] 109 - speed = (1000 * reps * BENCH_SIZE) / (unsigned int)ktime_to_ns(min); 110 - tmpl->speed = speed; 111 - 112 - pr_info(" %-16s: %5d MB/sec\n", tmpl->name, speed); 113 - } 114 - 115 - static int __init 116 - calibrate_xor_blocks(void) 117 - { 118 - void *b1, *b2; 119 - struct xor_block_template *f, *fastest; 120 - 121 - fastest = XOR_SELECT_TEMPLATE(NULL); 122 - 123 - if (fastest) { 124 - printk(KERN_INFO "xor: automatically using best " 125 - "checksumming function %-10s\n", 126 - fastest->name); 127 - goto out; 128 - } 129 - 130 - b1 = (void *) __get_free_pages(GFP_KERNEL, 2); 131 - if (!b1) { 132 - printk(KERN_WARNING "xor: Yikes! 
No memory available.\n"); 133 - return -ENOMEM; 134 - } 135 - b2 = b1 + 2*PAGE_SIZE + BENCH_SIZE; 136 - 137 - /* 138 - * If this arch/cpu has a short-circuited selection, don't loop through 139 - * all the possible functions, just test the best one 140 - */ 141 - 142 - #define xor_speed(templ) do_xor_speed((templ), b1, b2) 143 - 144 - printk(KERN_INFO "xor: measuring software checksum speed\n"); 145 - template_list = NULL; 146 - XOR_TRY_TEMPLATES; 147 - fastest = template_list; 148 - for (f = fastest; f; f = f->next) 149 - if (f->speed > fastest->speed) 150 - fastest = f; 151 - 152 - pr_info("xor: using function: %s (%d MB/sec)\n", 153 - fastest->name, fastest->speed); 154 - 155 - #undef xor_speed 156 - 157 - free_pages((unsigned long)b1, 2); 158 - out: 159 - active_template = fastest; 160 - return 0; 161 - } 162 - 163 - static __exit void xor_exit(void) { } 164 - 165 - MODULE_DESCRIPTION("RAID-5 checksumming functions"); 166 - MODULE_LICENSE("GPL"); 167 - 168 - #ifndef MODULE 169 - /* when built-in xor.o must initialize before drivers/md/md.o */ 170 - core_initcall(register_xor_blocks); 171 - #endif 172 - 173 - module_init(calibrate_xor_blocks); 174 - module_exit(xor_exit);
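The file above is the boot-time calibration and dispatch that this series moves into lib/. Every active_template->do_N() call was an indirect function call on each XOR; the replacement library avoids that indirection with static_call. A minimal sketch of the pattern — the names below are illustrative, not the new lib/ API:

    #include <linux/static_call.h>

    static void xor_gen_generic(void *dest, void **srcs, int src_cnt,
                                size_t len)
    {
            /* portable C fallback, as in the sketch near the sparc code above */
    }

    DEFINE_STATIC_CALL(xor_gen_impl, xor_gen_generic);

    void xor_gen(void *dest, void **srcs, int src_cnt, size_t len)
    {
            /* patched into a direct call once an implementation is chosen */
            static_call(xor_gen_impl)(dest, srcs, src_cnt, len);
    }

    static int __init xor_init(void)
    {
            if (cpu_has_fast_xor())     /* hypothetical capability check */
                    static_call_update(xor_gen_impl, xor_gen_arch);
            return 0;
    }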
+21
drivers/of/fdt.c
··· 866 866 elfcorehdr_addr, elfcorehdr_size); 867 867 } 868 868 869 + static void __init early_init_dt_check_for_dmcryptkeys(unsigned long node) 870 + { 871 + const char *prop_name = "linux,dmcryptkeys"; 872 + const __be32 *prop; 873 + 874 + if (!IS_ENABLED(CONFIG_CRASH_DM_CRYPT)) 875 + return; 876 + 877 + pr_debug("Looking for dmcryptkeys property... "); 878 + 879 + prop = of_get_flat_dt_prop(node, prop_name, NULL); 880 + if (!prop) 881 + return; 882 + 883 + dm_crypt_keys_addr = dt_mem_next_cell(dt_root_addr_cells, &prop); 884 + 885 + /* Property only accessible to crash dump kernel */ 886 + fdt_delprop(initial_boot_params, node, prop_name); 887 + } 888 + 869 889 static unsigned long chosen_node_offset = -FDT_ERR_NOTFOUND; 870 890 871 891 /* ··· 1117 1097 1118 1098 early_init_dt_check_for_initrd(node); 1119 1099 early_init_dt_check_for_elfcorehdr(node); 1100 + early_init_dt_check_for_dmcryptkeys(node); 1120 1101 1121 1102 rng_seed = of_get_flat_dt_prop(node, "rng-seed", &l); 1122 1103 if (rng_seed && l > 0) {
+19
drivers/of/kexec.c
··· 423 423 if (ret) 424 424 goto out; 425 425 426 + if (image->dm_crypt_keys_addr != 0) { 427 + ret = fdt_appendprop_addrrange(fdt, 0, chosen_node, 428 + "linux,dmcryptkeys", 429 + image->dm_crypt_keys_addr, 430 + image->dm_crypt_keys_sz); 431 + 432 + if (ret) 433 + goto out; 434 + 435 + /* 436 + * Avoid dmcryptkeys from being stomped on in kdump kernel by 437 + * setting up memory reserve map. 438 + */ 439 + ret = fdt_add_mem_rsv(fdt, image->dm_crypt_keys_addr, 440 + image->dm_crypt_keys_sz); 441 + if (ret) 442 + goto out; 443 + } 444 + 426 445 #ifdef CONFIG_CRASH_DUMP 427 446 /* add linux,usable-memory-range */ 428 447 ret = fdt_appendprop_addrrange(fdt, 0, chosen_node,
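Taken together, the two devicetree hunks form the producer/consumer handshake for the LUKS key material: the first kernel publishes the region in /chosen and reserves it, and the crash kernel reads it during the early FDT scan, then immediately deletes the property so the key location is never visible past early boot. Condensed from the code above:

    /* producer, first kernel (drivers/of/kexec.c) */
    fdt_appendprop_addrrange(fdt, 0, chosen_node, "linux,dmcryptkeys",
                             image->dm_crypt_keys_addr,
                             image->dm_crypt_keys_sz);
    fdt_add_mem_rsv(fdt, image->dm_crypt_keys_addr,
                    image->dm_crypt_keys_sz);    /* keep kdump from reusing it */

    /* consumer, crash kernel (drivers/of/fdt.c) */
    prop = of_get_flat_dt_prop(node, "linux,dmcryptkeys", NULL);
    dm_crypt_keys_addr = dt_mem_next_cell(dt_root_addr_cells, &prop);
    fdt_delprop(initial_boot_params, node, "linux,dmcryptkeys");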
+4 -23
fs/btrfs/raid56.c
···
 }

 /*
- * helper function to run the xor_blocks api. It is only
- * able to do MAX_XOR_BLOCKS at a time, so we need to
- * loop through.
- */
-static void run_xor(void **pages, int src_cnt, ssize_t len)
-{
-	int src_off = 0;
-	int xor_src_cnt = 0;
-	void *dest = pages[src_cnt];
-
-	while(src_cnt > 0) {
-		xor_src_cnt = min(src_cnt, MAX_XOR_BLOCKS);
-		xor_blocks(xor_src_cnt, len, dest, pages + src_off);
-
-		src_cnt -= xor_src_cnt;
-		src_off += xor_src_cnt;
-	}
-}
-
-/*
  * Returns true if the bio list inside this rbio covers an entire stripe (no
  * rmw required).
  */
···
 	} else {
 		/* raid5 */
 		memcpy(pointers[rbio->nr_data], pointers[0], step);
-		run_xor(pointers + 1, rbio->nr_data - 1, step);
+		xor_gen(pointers[rbio->nr_data], pointers + 1, rbio->nr_data - 1,
+			step);
 	}
 	for (stripe = stripe - 1; stripe >= 0; stripe--)
 		kunmap_local(pointers[stripe]);
···
 		pointers[rbio->nr_data - 1] = p;

 		/* Xor in the rest */
-		run_xor(pointers, rbio->nr_data - 1, step);
+		xor_gen(p, pointers, rbio->nr_data - 1, step);
 	}

 cleanup:
···
 	} else {
 		/* RAID5. */
 		memcpy(pointers[nr_data], pointers[0], step);
-		run_xor(pointers + 1, nr_data - 1, step);
+		xor_gen(pointers[nr_data], pointers + 1, nr_data - 1, step);
 	}

 	/* Check scrubbing parity and repair it. */
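With run_xor() gone, RAID5 parity generation is the textbook P = D0 ^ D1 ^ ... ^ Dn-1 in two steps. A sketch using the inferred xor_gen() prototype noted earlier (the helper name is hypothetical):

    static void raid5_gen_parity(void *parity, void **data, int nr_data,
                                 size_t len)
    {
            memcpy(parity, data[0], len);           /* seed with first stripe */
            xor_gen(parity, data + 1, nr_data - 1, len);  /* fold in the rest */
    }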
+22 -13
fs/ocfs2/alloc.c
··· 917 917 goto bail; 918 918 } 919 919 920 - if (le32_to_cpu(eb->h_fs_generation) != OCFS2_SB(sb)->fs_generation) 920 + if (le32_to_cpu(eb->h_fs_generation) != OCFS2_SB(sb)->fs_generation) { 921 921 rc = ocfs2_error(sb, 922 922 "Extent block #%llu has an invalid h_fs_generation of #%u\n", 923 923 (unsigned long long)bh->b_blocknr, 924 924 le32_to_cpu(eb->h_fs_generation)); 925 + goto bail; 926 + } 927 + 928 + if (le16_to_cpu(eb->h_list.l_count) != ocfs2_extent_recs_per_eb(sb)) { 929 + rc = ocfs2_error(sb, 930 + "Extent block #%llu has invalid l_count %u (expected %u)\n", 931 + (unsigned long long)bh->b_blocknr, 932 + le16_to_cpu(eb->h_list.l_count), 933 + ocfs2_extent_recs_per_eb(sb)); 934 + goto bail; 935 + } 936 + 937 + if (le16_to_cpu(eb->h_list.l_next_free_rec) > le16_to_cpu(eb->h_list.l_count)) { 938 + rc = ocfs2_error(sb, 939 + "Extent block #%llu has invalid l_next_free_rec %u (l_count %u)\n", 940 + (unsigned long long)bh->b_blocknr, 941 + le16_to_cpu(eb->h_list.l_next_free_rec), 942 + le16_to_cpu(eb->h_list.l_count)); 943 + goto bail; 944 + } 945 + 925 946 bail: 926 947 return rc; 927 948 } ··· 1877 1856 1878 1857 eb = (struct ocfs2_extent_block *) bh->b_data; 1879 1858 el = &eb->h_list; 1880 - 1881 - if (le16_to_cpu(el->l_next_free_rec) > 1882 - le16_to_cpu(el->l_count)) { 1883 - ocfs2_error(ocfs2_metadata_cache_get_super(ci), 1884 - "Owner %llu has bad count in extent list at block %llu (next free=%u, count=%u)\n", 1885 - (unsigned long long)ocfs2_metadata_cache_owner(ci), 1886 - (unsigned long long)bh->b_blocknr, 1887 - le16_to_cpu(el->l_next_free_rec), 1888 - le16_to_cpu(el->l_count)); 1889 - ret = -EROFS; 1890 - goto out; 1891 - } 1892 1859 1893 1860 if (func) 1894 1861 func(data, bh);
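The point of this move: the l_count/l_next_free_rec sanity checks now run in ocfs2_validate_extent_block(), i.e. once at block-read time, instead of being scattered across (and missing from some of) the extent-list users; fs/ocfs2/dir.c below applies the same pattern to the dx_root. The invariants in isolation, as a sketch with a hypothetical helper name:

    static bool ocfs2_el_sane(const struct ocfs2_extent_list *el, u16 capacity)
    {
            return le16_to_cpu(el->l_count) == capacity &&
                   le16_to_cpu(el->l_next_free_rec) <= le16_to_cpu(el->l_count);
    }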
+45 -30
fs/ocfs2/aops.c
··· 37 37 #include "namei.h" 38 38 #include "sysfile.h" 39 39 40 + #define OCFS2_DIO_MARK_EXTENT_BATCH 200 41 + 40 42 static int ocfs2_symlink_get_block(struct inode *inode, sector_t iblock, 41 43 struct buffer_head *bh_result, int create) 42 44 { ··· 2279 2277 struct ocfs2_alloc_context *meta_ac = NULL; 2280 2278 handle_t *handle = NULL; 2281 2279 loff_t end = offset + bytes; 2282 - int ret = 0, credits = 0; 2280 + int ret = 0, credits = 0, batch = 0; 2283 2281 2284 2282 ocfs2_init_dealloc_ctxt(&dealloc); 2285 2283 ··· 2297 2295 } 2298 2296 2299 2297 down_write(&oi->ip_alloc_sem); 2300 - 2301 - /* Delete orphan before acquire i_rwsem. */ 2302 - if (dwc->dw_orphaned) { 2303 - BUG_ON(dwc->dw_writer_pid != task_pid_nr(current)); 2304 - 2305 - end = end > i_size_read(inode) ? end : 0; 2306 - 2307 - ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, 2308 - !!end, end); 2309 - if (ret < 0) 2310 - mlog_errno(ret); 2311 - } 2312 - 2313 2298 di = (struct ocfs2_dinode *)di_bh->b_data; 2314 2299 2315 2300 ocfs2_init_dinode_extent_tree(&et, INODE_CACHE(inode), di_bh); ··· 2316 2327 2317 2328 credits = ocfs2_calc_extend_credits(inode->i_sb, &di->id2.i_list); 2318 2329 2319 - handle = ocfs2_start_trans(osb, credits); 2320 - if (IS_ERR(handle)) { 2321 - ret = PTR_ERR(handle); 2322 - mlog_errno(ret); 2323 - goto unlock; 2324 - } 2325 - ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh, 2326 - OCFS2_JOURNAL_ACCESS_WRITE); 2327 - if (ret) { 2328 - mlog_errno(ret); 2329 - goto commit; 2330 - } 2331 - 2332 2330 list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) { 2331 + if (!handle) { 2332 + handle = ocfs2_start_trans(osb, credits); 2333 + if (IS_ERR(handle)) { 2334 + ret = PTR_ERR(handle); 2335 + mlog_errno(ret); 2336 + goto unlock; 2337 + } 2338 + ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh, 2339 + OCFS2_JOURNAL_ACCESS_WRITE); 2340 + if (ret) { 2341 + mlog_errno(ret); 2342 + goto commit; 2343 + } 2344 + } 2333 2345 ret = ocfs2_assure_trans_credits(handle, credits); 2334 2346 if (ret < 0) { 2335 2347 mlog_errno(ret); 2336 - break; 2348 + goto commit; 2337 2349 } 2338 2350 ret = ocfs2_mark_extent_written(inode, &et, handle, 2339 2351 ue->ue_cpos, 1, ··· 2342 2352 meta_ac, &dealloc); 2343 2353 if (ret < 0) { 2344 2354 mlog_errno(ret); 2345 - break; 2355 + goto commit; 2356 + } 2357 + 2358 + if (++batch == OCFS2_DIO_MARK_EXTENT_BATCH) { 2359 + ocfs2_commit_trans(osb, handle); 2360 + handle = NULL; 2361 + batch = 0; 2346 2362 } 2347 2363 } 2348 2364 2349 2365 if (end > i_size_read(inode)) { 2366 + if (!handle) { 2367 + handle = ocfs2_start_trans(osb, credits); 2368 + if (IS_ERR(handle)) { 2369 + ret = PTR_ERR(handle); 2370 + mlog_errno(ret); 2371 + goto unlock; 2372 + } 2373 + } 2350 2374 ret = ocfs2_set_inode_size(handle, inode, di_bh, end); 2351 2375 if (ret < 0) 2352 2376 mlog_errno(ret); 2353 2377 } 2378 + 2354 2379 commit: 2355 - ocfs2_commit_trans(osb, handle); 2380 + if (handle) 2381 + ocfs2_commit_trans(osb, handle); 2356 2382 unlock: 2357 2383 up_write(&oi->ip_alloc_sem); 2384 + 2385 + /* everything looks good, let's start the cleanup */ 2386 + if (!ret && dwc->dw_orphaned) { 2387 + BUG_ON(dwc->dw_writer_pid != task_pid_nr(current)); 2388 + 2389 + ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, 0, 0); 2390 + if (ret < 0) 2391 + mlog_errno(ret); 2392 + } 2358 2393 ocfs2_inode_unlock(inode, 1); 2359 2394 brelse(di_bh); 2360 2395 out:
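The shape of this fix is worth calling out: rather than opening one journal handle up front for the entire dw_zero_list (which can exhaust its credits on a large direct I/O), a handle is started lazily and committed and reopened every OCFS2_DIO_MARK_EXTENT_BATCH (200) extents; orphan removal also moves to the end, so it only runs once everything else has succeeded. The batching skeleton, with error handling elided:

    handle = NULL;
    batch = 0;
    list_for_each_entry(ue, &dwc->dw_zero_list, ue_node) {
            if (!handle)
                    handle = ocfs2_start_trans(osb, credits);
            /* ... mark one unwritten extent as written under 'handle' ... */
            if (++batch == OCFS2_DIO_MARK_EXTENT_BATCH) {
                    ocfs2_commit_trans(osb, handle);  /* release the credits */
                    handle = NULL;                    /* reopen on next pass */
                    batch = 0;
            }
    }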
+56 -27
fs/ocfs2/cluster/heartbeat.c
··· 1488 1488 return item ? container_of(item, struct o2hb_region, hr_item) : NULL; 1489 1489 } 1490 1490 1491 - /* drop_item only drops its ref after killing the thread, nothing should 1492 - * be using the region anymore. this has to clean up any state that 1493 - * attributes might have built up. */ 1494 - static void o2hb_region_release(struct config_item *item) 1491 + static void o2hb_unmap_slot_data(struct o2hb_region *reg) 1495 1492 { 1496 1493 int i; 1497 1494 struct page *page; 1498 - struct o2hb_region *reg = to_o2hb_region(item); 1499 - 1500 - mlog(ML_HEARTBEAT, "hb region release (%pg)\n", reg_bdev(reg)); 1501 - 1502 - kfree(reg->hr_tmp_block); 1503 1495 1504 1496 if (reg->hr_slot_data) { 1505 1497 for (i = 0; i < reg->hr_num_pages; i++) { 1506 1498 page = reg->hr_slot_data[i]; 1507 - if (page) 1499 + if (page) { 1508 1500 __free_page(page); 1501 + reg->hr_slot_data[i] = NULL; 1502 + } 1509 1503 } 1510 1504 kfree(reg->hr_slot_data); 1505 + reg->hr_slot_data = NULL; 1511 1506 } 1507 + 1508 + kfree(reg->hr_slots); 1509 + reg->hr_slots = NULL; 1510 + 1511 + kfree(reg->hr_tmp_block); 1512 + reg->hr_tmp_block = NULL; 1513 + } 1514 + 1515 + /* drop_item only drops its ref after killing the thread, nothing should 1516 + * be using the region anymore. this has to clean up any state that 1517 + * attributes might have built up. 1518 + */ 1519 + static void o2hb_region_release(struct config_item *item) 1520 + { 1521 + struct o2hb_region *reg = to_o2hb_region(item); 1522 + 1523 + mlog(ML_HEARTBEAT, "hb region release (%pg)\n", reg_bdev(reg)); 1524 + 1525 + o2hb_unmap_slot_data(reg); 1512 1526 1513 1527 if (reg->hr_bdev_file) 1514 1528 fput(reg->hr_bdev_file); 1515 - 1516 - kfree(reg->hr_slots); 1517 1529 1518 1530 debugfs_remove_recursive(reg->hr_debug_dir); 1519 1531 kfree(reg->hr_db_livenodes); ··· 1679 1667 static int o2hb_map_slot_data(struct o2hb_region *reg) 1680 1668 { 1681 1669 int i, j; 1670 + int ret = -ENOMEM; 1682 1671 unsigned int last_slot; 1683 1672 unsigned int spp = reg->hr_slots_per_page; 1684 1673 struct page *page; ··· 1687 1674 struct o2hb_disk_slot *slot; 1688 1675 1689 1676 reg->hr_tmp_block = kmalloc(reg->hr_block_bytes, GFP_KERNEL); 1690 - if (reg->hr_tmp_block == NULL) 1691 - return -ENOMEM; 1677 + if (!reg->hr_tmp_block) 1678 + goto out; 1692 1679 1693 1680 reg->hr_slots = kzalloc_objs(struct o2hb_disk_slot, reg->hr_blocks); 1694 - if (reg->hr_slots == NULL) 1695 - return -ENOMEM; 1681 + if (!reg->hr_slots) 1682 + goto out; 1696 1683 1697 - for(i = 0; i < reg->hr_blocks; i++) { 1684 + for (i = 0; i < reg->hr_blocks; i++) { 1698 1685 slot = &reg->hr_slots[i]; 1699 1686 slot->ds_node_num = i; 1700 1687 INIT_LIST_HEAD(&slot->ds_live_item); ··· 1708 1695 1709 1696 reg->hr_slot_data = kzalloc_objs(struct page *, reg->hr_num_pages); 1710 1697 if (!reg->hr_slot_data) 1711 - return -ENOMEM; 1698 + goto out; 1712 1699 1713 - for(i = 0; i < reg->hr_num_pages; i++) { 1700 + for (i = 0; i < reg->hr_num_pages; i++) { 1714 1701 page = alloc_page(GFP_KERNEL); 1715 1702 if (!page) 1716 - return -ENOMEM; 1703 + goto out; 1717 1704 1718 1705 reg->hr_slot_data[i] = page; 1719 1706 ··· 1733 1720 } 1734 1721 1735 1722 return 0; 1723 + 1724 + out: 1725 + o2hb_unmap_slot_data(reg); 1726 + return ret; 1736 1727 } 1737 1728 1738 1729 /* Read in all the slots available and populate the tracking ··· 1826 1809 "blocksize %u incorrect for device, expected %d", 1827 1810 reg->hr_block_bytes, sectsize); 1828 1811 ret = -EINVAL; 1829 - goto out3; 1812 + goto out; 1830 1813 } 1831 1814 1815 + 
reg->hr_aborted_start = 0; 1816 + reg->hr_node_deleted = 0; 1832 1817 o2hb_init_region_params(reg); 1833 1818 1834 1819 /* Generation of zero is invalid */ ··· 1842 1823 ret = o2hb_map_slot_data(reg); 1843 1824 if (ret) { 1844 1825 mlog_errno(ret); 1845 - goto out3; 1826 + goto out; 1846 1827 } 1847 1828 1848 1829 ret = o2hb_populate_slot_data(reg); 1849 1830 if (ret) { 1850 1831 mlog_errno(ret); 1851 - goto out3; 1832 + goto out; 1852 1833 } 1853 1834 1854 1835 INIT_DELAYED_WORK(&reg->hr_write_timeout_work, o2hb_write_timeout); ··· 1879 1860 if (IS_ERR(hb_task)) { 1880 1861 ret = PTR_ERR(hb_task); 1881 1862 mlog_errno(ret); 1882 - goto out3; 1863 + goto out; 1883 1864 } 1884 1865 1885 1866 spin_lock(&o2hb_live_lock); ··· 1896 1877 1897 1878 if (reg->hr_aborted_start) { 1898 1879 ret = -EIO; 1899 - goto out3; 1880 + goto out; 1900 1881 } 1901 1882 1902 1883 if (reg->hr_node_deleted) { 1903 1884 ret = -EINVAL; 1904 - goto out3; 1885 + goto out; 1905 1886 } 1906 1887 1907 1888 /* Ok, we were woken. Make sure it wasn't by drop_item() */ ··· 1920 1901 printk(KERN_NOTICE "o2hb: Heartbeat started on region %s (%pg)\n", 1921 1902 config_item_name(&reg->hr_item), reg_bdev(reg)); 1922 1903 1923 - out3: 1904 + out: 1924 1905 if (ret < 0) { 1906 + spin_lock(&o2hb_live_lock); 1907 + hb_task = reg->hr_task; 1908 + reg->hr_task = NULL; 1909 + spin_unlock(&o2hb_live_lock); 1910 + 1911 + if (hb_task) 1912 + kthread_stop(hb_task); 1913 + 1914 + o2hb_unmap_slot_data(reg); 1915 + 1925 1916 fput(reg->hr_bdev_file); 1926 1917 reg->hr_bdev_file = NULL; 1927 1918 }
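The factored-out o2hb_unmap_slot_data() is deliberately idempotent — each kfree()/__free_page() is paired with NULLing the pointer — so the same function can serve the o2hb_map_slot_data() error path, the failed-start path, and the final region release without double-freeing:

    kfree(reg->hr_tmp_block);
    reg->hr_tmp_block = NULL;   /* a second call is now a harmless no-op */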
+28 -21
fs/ocfs2/dir.c
··· 593 593 mlog(ML_ERROR, 594 594 "Checksum failed for dir index root block %llu\n", 595 595 (unsigned long long)bh->b_blocknr); 596 - return ret; 596 + goto bail; 597 597 } 598 598 599 599 if (!OCFS2_IS_VALID_DX_ROOT(dx_root)) { ··· 601 601 "Dir Index Root # %llu has bad signature %.*s\n", 602 602 (unsigned long long)le64_to_cpu(dx_root->dr_blkno), 603 603 7, dx_root->dr_signature); 604 + goto bail; 604 605 } 605 606 607 + if (!(dx_root->dr_flags & OCFS2_DX_FLAG_INLINE)) { 608 + struct ocfs2_extent_list *el = &dx_root->dr_list; 609 + 610 + if (le16_to_cpu(el->l_count) != ocfs2_extent_recs_per_dx_root(sb)) { 611 + ret = ocfs2_error(sb, 612 + "Dir Index Root # %llu has invalid l_count %u (expected %u)\n", 613 + (unsigned long long)le64_to_cpu(dx_root->dr_blkno), 614 + le16_to_cpu(el->l_count), 615 + ocfs2_extent_recs_per_dx_root(sb)); 616 + goto bail; 617 + } 618 + 619 + if (le16_to_cpu(el->l_next_free_rec) > le16_to_cpu(el->l_count)) { 620 + ret = ocfs2_error(sb, 621 + "Dir Index Root # %llu has invalid l_next_free_rec %u (l_count %u)\n", 622 + (unsigned long long)le64_to_cpu(dx_root->dr_blkno), 623 + le16_to_cpu(el->l_next_free_rec), 624 + le16_to_cpu(el->l_count)); 625 + goto bail; 626 + } 627 + } 628 + 629 + bail: 606 630 return ret; 607 631 } 608 632 ··· 815 791 struct ocfs2_extent_block *eb; 816 792 struct ocfs2_extent_rec *rec = NULL; 817 793 818 - if (le16_to_cpu(el->l_count) != 819 - ocfs2_extent_recs_per_dx_root(inode->i_sb)) { 820 - ret = ocfs2_error(inode->i_sb, 821 - "Inode %llu has invalid extent list length %u\n", 822 - inode->i_ino, le16_to_cpu(el->l_count)); 823 - goto out; 824 - } 825 - 826 794 if (el->l_tree_depth) { 827 795 ret = ocfs2_find_leaf(INODE_CACHE(inode), el, major_hash, 828 796 &eb_bh); ··· 835 819 } 836 820 } 837 821 838 - if (le16_to_cpu(el->l_next_free_rec) == 0) { 839 - ret = ocfs2_error(inode->i_sb, 840 - "Inode %llu has empty extent list at depth %u\n", 841 - inode->i_ino, 842 - le16_to_cpu(el->l_tree_depth)); 843 - goto out; 844 - } 845 - 846 822 found = 0; 847 823 for (i = le16_to_cpu(el->l_next_free_rec) - 1; i >= 0; i--) { 848 824 rec = &el->l_recs[i]; ··· 847 839 848 840 if (!found) { 849 841 ret = ocfs2_error(inode->i_sb, 850 - "Inode %llu has bad extent record (%u, %u, 0) in btree\n", 851 - inode->i_ino, 852 - le32_to_cpu(rec->e_cpos), 853 - ocfs2_rec_clusters(el, rec)); 842 + "Inode %llu has no extent record for hash %u in btree (next_free_rec %u)\n", 843 + inode->i_ino, major_hash, 844 + le16_to_cpu(el->l_next_free_rec)); 854 845 goto out; 855 846 } 856 847
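As in alloc.c above, the dx_root's l_count/l_next_free_rec checks move into the read-time validator, so every later user of a non-inline index root sees an already-vetted extent list rather than re-checking (or forgetting to check) the fields itself.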
+9 -1
fs/ocfs2/dlm/dlmdomain.c
···
                 goto bail;
         }

+        if (qr->qr_numregions > O2NM_MAX_REGIONS) {
+                mlog(ML_ERROR, "Domain %s: Joining node %d has invalid "
+                     "number of heartbeat regions %u\n",
+                     qr->qr_domain, qr->qr_node, qr->qr_numregions);
+                status = -EINVAL;
+                goto bail;
+        }
+
         r = remote;
         for (i = 0; i < qr->qr_numregions; ++i) {
                 mlog(0, "Region %.*s\n", O2HB_MAX_REGION_NAME_LEN, r);
···
         for (i = 0; i < localnr; ++i) {
                 foundit = 0;
                 r = remote;
-                for (j = 0; j <= qr->qr_numregions; ++j) {
+                for (j = 0; j < qr->qr_numregions; ++j) {
                         if (!memcmp(l, r, O2HB_MAX_REGION_NAME_LEN)) {
                                 foundit = 1;
                                 break;
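The dlmdomain.c change hardens a value that arrives off the wire: qr_numregions is now bounds-checked before it is used to walk the remote region list, and the scan loop's off-by-one (`<=` instead of `<`) is fixed. A minimal userspace sketch of the same pattern; the struct layout, constants, and names below are illustrative stand-ins, not the o2net wire format:

#include <stdio.h>
#include <string.h>

#define MAX_REGIONS 32          /* stand-in for O2NM_MAX_REGIONS */
#define REGION_NAME_LEN 32      /* stand-in for O2HB_MAX_REGION_NAME_LEN */

struct query {
        unsigned int numregions;        /* untrusted, from the wire */
        char regions[MAX_REGIONS][REGION_NAME_LEN];
};

static int region_known(const struct query *q, const char *name)
{
        unsigned int j;

        if (q->numregions > MAX_REGIONS)        /* reject before indexing */
                return -1;

        for (j = 0; j < q->numregions; j++)     /* strict "<", never "<=" */
                if (!memcmp(q->regions[j], name, REGION_NAME_LEN))
                        return 1;
        return 0;
}

int main(void)
{
        struct query q = { .numregions = 1 };
        char name[REGION_NAME_LEN] = "region0";

        memcpy(q.regions[0], name, REGION_NAME_LEN);
        printf("%d\n", region_known(&q, name));  /* prints 1 */
        return 0;
}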
-1
fs/ocfs2/dlm/dlmmaster.c
···
         if (blocked)
                 goto wait;

-        ret = -EINVAL;
         dlm_node_iter_init(mle->vote_map, &iter);
         while ((nodenum = dlm_node_iter_next(&iter)) >= 0) {
                 ret = dlm_do_master_request(res, mle, nodenum);
+17 -1
fs/ocfs2/ioctl.c
···
         struct buffer_head *bh = NULL;
         struct ocfs2_group_desc *bg = NULL;

-        unsigned int max_bits, num_clusters;
+        unsigned int max_bits, max_bitmap_bits, num_clusters;
         unsigned int offset = 0, cluster, chunk;
         unsigned int chunk_free, last_chunksize = 0;

         if (!le32_to_cpu(rec->c_free))
                 goto bail;
+
+        max_bitmap_bits = 8 * ocfs2_group_bitmap_size(osb->sb, 0,
+                                                      osb->s_feature_incompat);

         do {
                 if (!bg)
···
                         continue;

                 max_bits = le16_to_cpu(bg->bg_bits);
+
+                /*
+                 * Non-coherent scans read raw blocks and do not get the
+                 * bg_bits validation from
+                 * ocfs2_read_group_descriptor().
+                 */
+                if (max_bits > max_bitmap_bits) {
+                        mlog(ML_ERROR,
+                             "Group desc #%llu has %u bits, max bitmap bits %u\n",
+                             (unsigned long long)blkno, max_bits, max_bitmap_bits);
+                        max_bits = max_bitmap_bits;
+                }
+
                 offset = 0;

                 for (chunk = 0; chunk < chunks_in_group; chunk++) {
+3 -4
fs/ocfs2/mmap.c
···

 static vm_fault_t ocfs2_fault(struct vm_fault *vmf)
 {
-        struct vm_area_struct *vma = vmf->vma;
+        unsigned long long ip_blkno =
+                OCFS2_I(file_inode(vmf->vma->vm_file))->ip_blkno;
         sigset_t oldset;
         vm_fault_t ret;
···
         ret = filemap_fault(vmf);
         ocfs2_unblock_signals(&oldset);

-        trace_ocfs2_fault(OCFS2_I(vma->vm_file->f_mapping->host)->ip_blkno,
-                          vma, vmf->page, vmf->pgoff);
+        trace_ocfs2_fault(ip_blkno, vmf->page, vmf->pgoff);
         return ret;
 }
-
 static vm_fault_t __ocfs2_page_mkwrite(struct file *file,
                                        struct buffer_head *di_bh, struct folio *folio)
 {
+4 -6
fs/ocfs2/ocfs2_trace.h
···

 TRACE_EVENT(ocfs2_fault,
         TP_PROTO(unsigned long long ino,
-                 void *area, void *page, unsigned long pgoff),
-        TP_ARGS(ino, area, page, pgoff),
+                 void *page, unsigned long pgoff),
+        TP_ARGS(ino, page, pgoff),
         TP_STRUCT__entry(
                 __field(unsigned long long, ino)
-                __field(void *, area)
                 __field(void *, page)
                 __field(unsigned long, pgoff)
         ),
         TP_fast_assign(
                 __entry->ino = ino;
-                __entry->area = area;
                 __entry->page = page;
                 __entry->pgoff = pgoff;
         ),
-        TP_printk("%llu %p %p %lu",
-                  __entry->ino, __entry->area, __entry->page, __entry->pgoff)
+        TP_printk("%llu %p %lu",
+                  __entry->ino, __entry->page, __entry->pgoff)
 );

 /* End of trace events for fs/ocfs2/mmap.c. */
+15 -1
fs/ocfs2/quota_global.c
···
         spin_unlock(&dq_data_lock);
         if (ex) {
                 inode_lock(oinfo->dqi_gqinode);
-                down_write(&OCFS2_I(oinfo->dqi_gqinode)->ip_alloc_sem);
+                if (!down_write_trylock(&OCFS2_I(oinfo->dqi_gqinode)->ip_alloc_sem)) {
+                        inode_unlock(oinfo->dqi_gqinode);
+                        status = -EBUSY;
+                        goto bail;
+                }
         } else {
                 down_read(&OCFS2_I(oinfo->dqi_gqinode)->ip_alloc_sem);
         }
         return 0;
+
+bail:
+        /* does a similar job as ocfs2_unlock_global_qf */
+        ocfs2_inode_unlock(oinfo->dqi_gqinode, ex);
+        brelse(oinfo->dqi_gqi_bh);
+        spin_lock(&dq_data_lock);
+        if (!--oinfo->dqi_gqi_count)
+                oinfo->dqi_gqi_bh = NULL;
+        spin_unlock(&dq_data_lock);
+        return status;
 }

 void ocfs2_unlock_global_qf(struct ocfs2_mem_dqinfo *oinfo, int ex)
+3 -1
fs/ocfs2/quota_local.c
···
         int status;
         u64 pcount;

-        down_write(&OCFS2_I(lqinode)->ip_alloc_sem);
+        if (!down_write_trylock(&OCFS2_I(lqinode)->ip_alloc_sem))
+                return -EBUSY;
+
         chunk = ocfs2_find_free_entry(sb, type, &offset);
         if (!chunk) {
                 chunk = ocfs2_extend_local_quota_file(sb, type, &offset);
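Both quota hunks replace an unconditional down_write() with down_write_trylock() plus a backout path that returns -EBUSY. A rough userspace analogue of that pattern, using POSIX locks as stand-ins for the kernel's inode lock and ip_alloc_sem (the names and lock types here are illustrative only):

#include <errno.h>
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t outer = PTHREAD_MUTEX_INITIALIZER;   /* ~inode lock */
static pthread_rwlock_t inner = PTHREAD_RWLOCK_INITIALIZER; /* ~ip_alloc_sem */

static int lock_both(void)
{
        pthread_mutex_lock(&outer);
        if (pthread_rwlock_trywrlock(&inner)) {
                /* Back out instead of blocking with "outer" held. */
                pthread_mutex_unlock(&outer);
                return -EBUSY;
        }
        return 0;
}

static void unlock_both(void)
{
        pthread_rwlock_unlock(&inner);
        pthread_mutex_unlock(&outer);
}

int main(void)
{
        if (lock_both() == 0) {
                puts("acquired both locks");
                unlock_both();
        }
        return 0;
}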
+14 -8
fs/ocfs2/resize.c
···

         fe = (struct ocfs2_dinode *)main_bm_bh->b_data;

-        /* main_bm_bh is validated by inode read inside ocfs2_inode_lock(),
-         * so any corruption is a code bug. */
-        BUG_ON(!OCFS2_IS_VALID_DINODE(fe));
+        /* JBD-managed buffers can bypass validation, so treat this as corruption. */
+        if (!OCFS2_IS_VALID_DINODE(fe)) {
+                ret = ocfs2_error(main_bm_inode->i_sb,
+                                  "Invalid dinode #%llu\n",
+                                  (unsigned long long)OCFS2_I(main_bm_inode)->ip_blkno);
+                goto out_unlock;
+        }

         if (le16_to_cpu(fe->id2.i_chain.cl_cpg) !=
             ocfs2_group_bitmap_size(osb->sb, 0,
···
                 goto out_unlock;
         }

-        ocfs2_set_new_buffer_uptodate(INODE_CACHE(inode), group_bh);
-
         ret = ocfs2_verify_group_and_input(main_bm_inode, fe, input, group_bh);
         if (ret) {
                 mlog_errno(ret);
                 goto out_free_group_bh;
         }
+
+        ocfs2_set_new_buffer_uptodate(INODE_CACHE(main_bm_inode), group_bh);

         trace_ocfs2_group_add((unsigned long long)input->group,
                               input->chain, input->clusters, input->frees);
···
         if (IS_ERR(handle)) {
                 mlog_errno(PTR_ERR(handle));
                 ret = -EINVAL;
-                goto out_free_group_bh;
+                goto out_remove_cache;
         }

         cl_bpc = le16_to_cpu(fe->id2.i_chain.cl_bpc);
···
 out_commit:
         ocfs2_commit_trans(osb, handle);

-out_free_group_bh:
+out_remove_cache:
         if (ret < 0)
-                ocfs2_remove_from_cache(INODE_CACHE(inode), group_bh);
+                ocfs2_remove_from_cache(INODE_CACHE(main_bm_inode), group_bh);
+
+out_free_group_bh:
         brelse(group_bh);

 out_unlock:
+25
fs/ocfs2/suballoc.c
···
                          8 * le16_to_cpu(gd->bg_size));
         }

+        /*
+         * For discontiguous block groups, validate the on-disk extent list
+         * against the maximum number of extent records that can physically
+         * fit in a single block.
+         */
+        if (ocfs2_gd_is_discontig(gd)) {
+                u16 max_recs = ocfs2_extent_recs_per_gd(sb);
+                u16 l_count = le16_to_cpu(gd->bg_list.l_count);
+                u16 l_next_free_rec = le16_to_cpu(gd->bg_list.l_next_free_rec);
+
+                if (l_count != max_recs) {
+                        do_error("Group descriptor #%llu bad discontig l_count %u expected %u\n",
+                                 (unsigned long long)bh->b_blocknr,
+                                 l_count,
+                                 max_recs);
+                }
+
+                if (l_next_free_rec > l_count) {
+                        do_error("Group descriptor #%llu bad discontig l_next_free_rec %u max %u\n",
+                                 (unsigned long long)bh->b_blocknr,
+                                 l_next_free_rec,
+                                 l_count);
+                }
+        }
+
         return 0;
 }
+1 -1
fs/ocfs2/super.c
···
                 osb->osb_cluster_stack[0] = '\0';
         }

-        get_random_bytes(&osb->s_next_generation, sizeof(u32));
+        osb->s_next_generation = get_random_u32();

         /*
          * FIXME
+2 -2
fs/ocfs2/xattr.c
···
         total_len = prefix_len + name_len + 1;
         *result += total_len;

-        /* we are just looking for how big our buffer needs to be */
-        if (!size)
+        /* No buffer means we are only looking for the required size. */
+        if (!buffer)
                 return 0;

         if (*result > size)
+1 -1
fs/proc/array.c
···
         blocked = p->blocked;
         collect_sigign_sigcatch(p, &ignored, &caught);
         num_threads = get_nr_threads(p);
-        rcu_read_lock();  /* FIXME: is this correct? */
+        rcu_read_lock();
         qsize = get_rlimit_value(task_ucounts(p), UCOUNT_RLIMIT_SIGPENDING);
         rcu_read_unlock();
         qlim = task_rlimit(p, RLIMIT_SIGPENDING);
-2
fs/ubifs/gc.c
···
         struct ubifs_info *c = priv;
         struct ubifs_scan_node *sa, *sb;

-        cond_resched();
         if (a == b)
                 return 0;
···
         struct ubifs_info *c = priv;
         struct ubifs_scan_node *sa, *sb;

-        cond_resched();
         if (a == b)
                 return 0;
-1
fs/ubifs/replay.c
···
         struct ubifs_info *c = priv;
         struct replay_entry *ra, *rb;

-        cond_resched();
         if (a == b)
                 return 0;
-1
include/asm-generic/Kbuild
···
 mandatory-y += vga.h
 mandatory-y += video.h
 mandatory-y += word-at-a-time.h
-mandatory-y += xor.h
-738
include/asm-generic/xor.h
(entire 738-line file deleted; it contained the generic optimized RAID-5 checksumming functions: xor_8regs_{2,3,4,5}, xor_32regs_{2,3,4,5}, the prefetching variants xor_8regs_p_* and xor_32regs_p_*, the xor_block_template definitions for "8regs", "32regs", "8regs_prefetch" and "32regs_prefetch", and the XOR_TRY_TEMPLATES macro)
+7 -7
include/linux/crash_core.h
···
 static inline void arch_kexec_unprotect_crashkres(void) { }
 #endif

-#ifdef CONFIG_CRASH_DM_CRYPT
-int crash_load_dm_crypt_keys(struct kimage *image);
-ssize_t dm_crypt_keys_read(char *buf, size_t count, u64 *ppos);
-#else
-static inline int crash_load_dm_crypt_keys(struct kimage *image) {return 0; }
-#endif
-
 #ifndef arch_crash_handle_hotplug_event
 static inline void arch_crash_handle_hotplug_event(struct kimage *image, void *arg) { }
 #endif
···
 static inline void crash_save_cpu(struct pt_regs *regs, int cpu) {};
 static inline int kimage_crash_copy_vmcoreinfo(struct kimage *image) { return 0; };
 #endif /* CONFIG_CRASH_DUMP*/
+
+#ifdef CONFIG_CRASH_DM_CRYPT
+int crash_load_dm_crypt_keys(struct kimage *image);
+ssize_t dm_crypt_keys_read(char *buf, size_t count, u64 *ppos);
+#else
+static inline int crash_load_dm_crypt_keys(struct kimage *image) {return 0; }
+#endif

 #endif /* LINUX_CRASH_CORE_H */
+1
include/linux/nmi.h
···
 extern int watchdog_user_enabled;
 extern int watchdog_thresh;
 extern unsigned long watchdog_enabled;
+extern int watchdog_hardlockup_miss_thresh;

 extern struct cpumask watchdog_cpumask;
 extern unsigned long *watchdog_cpumask_bits;
+2 -25
include/linux/raid/xor.h
···
 #ifndef _XOR_H
 #define _XOR_H

-#define MAX_XOR_BLOCKS 4
+void xor_gen(void *dest, void **srcs, unsigned int src_cnt, unsigned int bytes);

-extern void xor_blocks(unsigned int count, unsigned int bytes,
-                       void *dest, void **srcs);
-
-struct xor_block_template {
-        struct xor_block_template *next;
-        const char *name;
-        int speed;
-        void (*do_2)(unsigned long, unsigned long * __restrict,
-                     const unsigned long * __restrict);
-        void (*do_3)(unsigned long, unsigned long * __restrict,
-                     const unsigned long * __restrict,
-                     const unsigned long * __restrict);
-        void (*do_4)(unsigned long, unsigned long * __restrict,
-                     const unsigned long * __restrict,
-                     const unsigned long * __restrict,
-                     const unsigned long * __restrict);
-        void (*do_5)(unsigned long, unsigned long * __restrict,
-                     const unsigned long * __restrict,
-                     const unsigned long * __restrict,
-                     const unsigned long * __restrict,
-                     const unsigned long * __restrict);
-};
-
-#endif
+#endif /* _XOR_H */
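The header now exposes a single entry point. Assuming the new xor_gen() XOR-accumulates each source into dest the way the old xor_blocks() folded buffers into its first argument (an assumption; the in-tree lib/raid code is the authority and additionally dispatches to arch-optimized bodies via static_call), a portable reference loop for the calling convention would look like this:

#include <stdio.h>

/* Reference semantics only; not the kernel's optimized implementation. */
static void xor_gen_ref(void *dest, void **srcs, unsigned int src_cnt,
                        unsigned int bytes)
{
        unsigned char *d = dest;
        unsigned int i, j;

        for (i = 0; i < src_cnt; i++) {
                const unsigned char *s = srcs[i];

                for (j = 0; j < bytes; j++)
                        d[j] ^= s[j];
        }
}

int main(void)
{
        unsigned char d[4] = { 0 }, a[4] = { 1, 2, 3, 4 }, b[4] = { 1, 2, 3, 4 };
        void *srcs[] = { a, b };

        xor_gen_ref(d, srcs, 2, sizeof(d));
        printf("%u %u %u %u\n", d[0], d[1], d[2], d[3]); /* all zero: a ^ a */
        return 0;
}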
-2
kernel/crash_core.c
···
 #include <asm/page.h>
 #include <asm/sections.h>

-#include <crypto/sha1.h>
-
 #include "kallsyms_internal.h"
 #include "kexec_internal.h"
+13 -9
kernel/crash_dump_dm_crypt.c
···
 #include <linux/cc_platform.h>
 #include <linux/configfs.h>
 #include <linux/module.h>
+#include <linux/sysfs.h>

 #define KEY_NUM_MAX 128        /* maximum dm crypt keys */
 #define KEY_SIZE_MAX 256        /* maximum dm crypt key size */
···
         addr = dm_crypt_keys_addr;
         dm_crypt_keys_read((char *)&key_count, sizeof(key_count), &addr);
-        if (key_count < 0 || key_count > KEY_NUM_MAX) {
+        if (key_count > KEY_NUM_MAX) {
                 kexec_dprintk("Failed to read the number of dm-crypt keys\n");
                 return -1;
         }
···
         return 0;
 }

-static int read_key_from_user_keying(struct dm_crypt_key *dm_key)
+static int read_key_from_user_keyring(struct dm_crypt_key *dm_key)
 {
         const struct user_key_payload *ukp;
         struct key *key;
···

 static ssize_t config_key_description_show(struct config_item *item, char *page)
 {
-        return sprintf(page, "%s\n", to_config_key(item)->description);
+        return sysfs_emit(page, "%s\n", to_config_key(item)->description);
 }

 static ssize_t config_key_description_store(struct config_item *item,
···

 static ssize_t config_keys_count_show(struct config_item *item, char *page)
 {
-        return sprintf(page, "%d\n", key_count);
+        return sysfs_emit(page, "%d\n", key_count);
 }

 CONFIGFS_ATTR_RO(config_keys_, count);
···

 static ssize_t config_keys_reuse_show(struct config_item *item, char *page)
 {
-        return sprintf(page, "%d\n", is_dm_key_reused);
+        return sysfs_emit(page, "%d\n", is_dm_key_reused);
 }

 static ssize_t config_keys_reuse_store(struct config_item *item,
···

 static ssize_t config_keys_restore_show(struct config_item *item, char *page)
 {
-        return sprintf(page, "%d\n", restore);
+        return sysfs_emit(page, "%d\n", restore);
 }

 static ssize_t config_keys_restore_store(struct config_item *item,
···

                 strscpy(keys_header->keys[i].key_desc, key->description,
                         KEY_DESC_MAX_LEN);
-                r = read_key_from_user_keying(&keys_header->keys[i]);
+                r = read_key_from_user_keyring(&keys_header->keys[i]);
                 if (r != 0) {
                         kexec_dprintk("Failed to read key %s\n",
                                       keys_header->keys[i].key_desc);
···

         if (key_count <= 0) {
                 kexec_dprintk("No dm-crypt keys\n");
-                return -ENOENT;
+                return 0;
         }

         if (!is_dm_key_reused) {
                 image->dm_crypt_keys_addr = 0;
                 r = build_keys_header();
-                if (r)
+                if (r) {
+                        pr_err("Failed to build dm-crypt keys header, ret=%d\n", r);
                         return r;
+                }
         }

         kbuf.buffer = keys_header;
···
         kbuf.mem = KEXEC_BUF_MEM_UNKNOWN;
         r = kexec_add_buffer(&kbuf);
         if (r) {
+                pr_err("Failed to call kexec_add_buffer, ret=%d\n", r);
                 kvfree((void *)kbuf.buffer);
                 return r;
         }
-2
kernel/crash_reserve.c
···
 #include <asm/page.h>
 #include <asm/sections.h>

-#include <crypto/sha1.h>
-
 #include "kallsyms_internal.h"
 #include "kexec_internal.h"
+3 -5
kernel/exit.c
···
         tsk->exit_state = EXIT_ZOMBIE;

         if (unlikely(tsk->ptrace)) {
-                int sig = thread_group_leader(tsk) &&
-                        thread_group_empty(tsk) &&
-                        !ptrace_reparented(tsk) ?
-                        tsk->exit_signal : SIGCHLD;
+                int sig = thread_group_empty(tsk) && !ptrace_reparented(tsk)
+                        ? tsk->exit_signal : SIGCHLD;
                 autoreap = do_notify_parent(tsk, sig);
         } else if (thread_group_leader(tsk)) {
                 autoreap = thread_group_empty(tsk) &&
-                        do_notify_parent(tsk, tsk->exit_signal);
+                           do_notify_parent(tsk, tsk->exit_signal);
         } else {
                 autoreap = true;
                 /* untraced sub-thread */
+19 -11
kernel/fork.c
···
         stack = kasan_reset_tag(vm_area->addr);

         /* Clear stale pointers from reused stack. */
-        memset(stack, 0, THREAD_SIZE);
+        clear_pages(vm_area->addr, vm_area->nr_pages);

         tsk->stack_vm_area = vm_area;
         tsk->stack = stack;
···

 __cacheline_aligned_in_smp DEFINE_SPINLOCK(mmlist_lock);

-static unsigned long default_dump_filter = MMF_DUMP_FILTER_DEFAULT;
+static unsigned long coredump_filter = MMF_DUMP_FILTER_DEFAULT;

 static int __init coredump_filter_setup(char *s)
 {
-        default_dump_filter =
-                (simple_strtoul(s, NULL, 0) << MMF_DUMP_FILTER_SHIFT) &
-                MMF_DUMP_FILTER_MASK;
+        if (kstrtoul(s, 0, &coredump_filter))
+                return 0;
+        coredump_filter <<= MMF_DUMP_FILTER_SHIFT;
+        coredump_filter &= MMF_DUMP_FILTER_MASK;
         return 1;
 }
···
                 __mm_flags_overwrite_word(mm, mmf_init_legacy_flags(flags));
                 mm->def_flags = current->mm->def_flags & VM_INIT_DEF_MASK;
         } else {
-                __mm_flags_overwrite_word(mm, default_dump_filter);
+                __mm_flags_overwrite_word(mm, coredump_filter);
                 mm->def_flags = 0;
         }
···

         rseq_fork(p, clone_flags);

-        /* Don't start children in a dying pid namespace */
+        /*
+         * If zap_pid_ns_processes() was called after alloc_pid(), the new
+         * child missed SIGKILL. If current is not in the same namespace,
+         * we can't rely on fatal_signal_pending() below.
+         */
         if (unlikely(!(ns_of_pid(pid)->pid_allocated & PIDNS_ADDING))) {
                 retval = -ENOMEM;
                 goto bad_fork_core_free;
···
                          new_cred, new_fs);
         if (err)
                 goto bad_unshare_cleanup_cred;
-
         if (new_cred) {
                 err = set_cred_ucounts(new_cred);
                 if (err)
-                        goto bad_unshare_cleanup_cred;
+                        goto bad_unshare_cleanup_nsproxy;
         }

         if (new_fs || new_fd || do_sysvsem || new_cred || new_nsproxy) {
···
                 shm_init_task(current);
         }

-        if (new_nsproxy)
+        if (new_nsproxy) {
                 switch_task_namespaces(current, new_nsproxy);
+                new_nsproxy = NULL;
+        }

         task_lock(current);
···

         perf_event_namespaces(current);

+bad_unshare_cleanup_nsproxy:
+        if (new_nsproxy)
+                put_nsproxy(new_nsproxy);
 bad_unshare_cleanup_cred:
         if (new_cred)
                 put_cred(new_cred);
 bad_unshare_cleanup_fd:
         if (new_fd)
                 put_files_struct(new_fd);
-
 bad_unshare_cleanup_fs:
         if (new_fs)
                 free_fs_struct(new_fs);
+76 -30
kernel/hung_task.c
···
 /*
  * Total number of tasks detected as hung since boot:
  */
-static unsigned long __read_mostly sysctl_hung_task_detect_count;
+static atomic_long_t sysctl_hung_task_detect_count = ATOMIC_LONG_INIT(0);

 /*
  * Limit number of tasks checked in a batch.
···
 }
 #endif

-static void check_hung_task(struct task_struct *t, unsigned long timeout,
-                            unsigned long prev_detect_count)
+/**
+ * hung_task_info - Print diagnostic details for a hung task
+ * @t: Pointer to the detected hung task.
+ * @timeout: Timeout threshold for detecting hung tasks
+ * @this_round_count: Count of hung tasks detected in the current iteration
+ *
+ * Print structured information about the specified hung task, if warnings
+ * are enabled or if the panic batch threshold is exceeded.
+ */
+static void hung_task_info(struct task_struct *t, unsigned long timeout,
+                           unsigned long this_round_count)
 {
-        unsigned long total_hung_task;
-
-        if (!task_is_hung(t, timeout))
-                return;
-
-        /*
-         * This counter tracks the total number of tasks detected as hung
-         * since boot.
-         */
-        sysctl_hung_task_detect_count++;
-
-        total_hung_task = sysctl_hung_task_detect_count - prev_detect_count;
         trace_sched_process_hang(t);

-        if (sysctl_hung_task_panic && total_hung_task >= sysctl_hung_task_panic) {
+        if (sysctl_hung_task_panic && this_round_count >= sysctl_hung_task_panic) {
                 console_verbose();
                 hung_task_call_panic = true;
         }

         /*
-         * Ok, the task did not get scheduled for more than 2 minutes,
-         * complain:
+         * The given task did not get scheduled for more than
+         * CONFIG_DEFAULT_HUNG_TASK_TIMEOUT. Therefore, complain
+         * accordingly
          */
         if (sysctl_hung_task_warnings || hung_task_call_panic) {
                 if (sysctl_hung_task_warnings > 0)
                         sysctl_hung_task_warnings--;
-                pr_err("INFO: task %s:%d blocked for more than %ld seconds.\n",
-                       t->comm, t->pid, (jiffies - t->last_switch_time) / HZ);
+                pr_err("INFO: task %s:%d blocked%s for more than %ld seconds.\n",
+                       t->comm, t->pid, t->in_iowait ? " in I/O wait" : "",
+                       (jiffies - t->last_switch_time) / HZ);
                 pr_err("      %s %s %.*s\n",
                         print_tainted(), init_utsname()->release,
                         (int)strcspn(init_utsname()->version, " "),
···

 /*
  * Check whether a TASK_UNINTERRUPTIBLE does not get woken up for
- * a really long time (120 seconds). If that happens, print out
- * a warning.
+ * a really long time. If that happens, print out a warning.
  */
 static void check_hung_uninterruptible_tasks(unsigned long timeout)
 {
         int max_count = sysctl_hung_task_check_count;
         unsigned long last_break = jiffies;
         struct task_struct *g, *t;
-        unsigned long prev_detect_count = sysctl_hung_task_detect_count;
+        unsigned long this_round_count;
         int need_warning = sysctl_hung_task_warnings;
         unsigned long si_mask = hung_task_si_mask;
···
         if (test_taint(TAINT_DIE) || did_panic)
                 return;

-
+        this_round_count = 0;
         rcu_read_lock();
         for_each_process_thread(g, t) {
-
                 if (!max_count--)
                         goto unlock;
                 if (time_after(jiffies, last_break + HUNG_TASK_LOCK_BREAK)) {
···
                         last_break = jiffies;
                 }

-                check_hung_task(t, timeout, prev_detect_count);
+                if (task_is_hung(t, timeout)) {
+                        /*
+                         * Increment the global counter so that userspace could
+                         * start migrating tasks ASAP. But count the current
+                         * round separately because userspace could reset
+                         * the global counter at any time.
+                         */
+                        atomic_long_inc(&sysctl_hung_task_detect_count);
+                        this_round_count++;
+                        hung_task_info(t, timeout, this_round_count);
+                }
         }
  unlock:
         rcu_read_unlock();

-        if (!(sysctl_hung_task_detect_count - prev_detect_count))
+        if (!this_round_count)
                 return;

         if (need_warning || hung_task_call_panic) {
···
 }

 #ifdef CONFIG_SYSCTL
+
+/**
+ * proc_dohung_task_detect_count - proc handler for hung_task_detect_count
+ * @table: Pointer to the struct ctl_table definition for this proc entry
+ * @dir: Flag indicating the operation
+ * @buffer: User space buffer for data transfer
+ * @lenp: Pointer to the length of the data being transferred
+ * @ppos: Pointer to the current file offset
+ *
+ * This handler is used for reading the current hung task detection count
+ * and for resetting it to zero when a write operation is performed using a
+ * zero value only.
+ * Return: 0 on success, or a negative error code on failure.
+ */
+static int proc_dohung_task_detect_count(const struct ctl_table *table, int dir,
+                                         void *buffer, size_t *lenp, loff_t *ppos)
+{
+        unsigned long detect_count;
+        struct ctl_table proxy_table;
+        int err;
+
+        proxy_table = *table;
+        proxy_table.data = &detect_count;
+
+        if (SYSCTL_KERN_TO_USER(dir))
+                detect_count = atomic_long_read(&sysctl_hung_task_detect_count);
+
+        err = proc_doulongvec_minmax(&proxy_table, dir, buffer, lenp, ppos);
+        if (err < 0)
+                return err;
+
+        if (SYSCTL_USER_TO_KERN(dir)) {
+                if (detect_count)
+                        return -EINVAL;
+                atomic_long_set(&sysctl_hung_task_detect_count, 0);
+        }
+
+        return 0;
+}
+
 /*
  * Process updating of timeout sysctl
  */
···
         },
         {
                 .procname        = "hung_task_detect_count",
-                .data                = &sysctl_hung_task_detect_count,
                 .maxlen                = sizeof(unsigned long),
-                .mode                = 0444,
-                .proc_handler        = proc_doulongvec_minmax,
+                .mode                = 0644,
+                .proc_handler        = proc_dohung_task_detect_count,
         },
         {
                 .procname        = "hung_task_sys_info",
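The restructuring above splits detection from reporting: the global detect count becomes an atomic that userspace may reset at any time, while panic decisions use a private per-round tally that a concurrent reset cannot disturb. A compact C11 sketch of that reset-safe counting scheme (plain userspace stand-ins, not the kernel's sysctl plumbing):

#include <stdatomic.h>
#include <stdio.h>

static atomic_long detect_count;        /* resettable from "userspace" */

static void scan_round(int hung_seen, long panic_thresh)
{
        long this_round = 0;

        for (int i = 0; i < hung_seen; i++) {
                atomic_fetch_add(&detect_count, 1); /* global, resettable */
                this_round++;                       /* private, reset-proof */
                if (panic_thresh && this_round >= panic_thresh)
                        printf("would panic at task %d this round\n", i + 1);
        }
}

int main(void)
{
        scan_round(2, 0);
        atomic_store(&detect_count, 0);  /* userspace reset between rounds */
        scan_round(3, 3);
        printf("total since last reset: %ld\n", atomic_load(&detect_count));
        return 0;
}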
-1
kernel/kexec_core.c
···
 #include <asm/page.h>
 #include <asm/sections.h>

-#include <crypto/hash.h>
 #include "kexec_internal.h"

 atomic_t __kexec_lock = ATOMIC_INIT(0);
+44 -3
kernel/panic.c
···
  * Documentation/admin-guide/tainted-kernels.rst, including its
  * small shell script that prints the TAINT_FLAGS_COUNT bits of
  * /proc/sys/kernel/tainted.
+ *
+ * Also, update INIT_TAINT_BUF_MAX below.
  */
 const struct taint_flag taint_flags[TAINT_FLAGS_COUNT] = {
         TAINT_FLAG(PROPRIETARY_MODULE, 'P', 'G'),
···
         }
 }

+/* The initial buffer can accommodate all taint flags in verbose
+ * mode, with some headroom. Once the allocator is available, the
+ * exact size is allocated dynamically; the initial buffer remains
+ * as a fallback if allocation fails.
+ *
+ * The verbose taint string currently requires up to 327 characters.
+ */
+#define INIT_TAINT_BUF_MAX 350
+
+static char init_taint_buf[INIT_TAINT_BUF_MAX] __initdata;
+static char *taint_buf __refdata = init_taint_buf;
+static size_t taint_buf_size = INIT_TAINT_BUF_MAX;
+
+static __init int alloc_taint_buf(void)
+{
+        int i;
+        char *buf;
+        size_t size = 0;
+
+        size += sizeof("Tainted: ") - 1;
+        for (i = 0; i < TAINT_FLAGS_COUNT; i++) {
+                size += 2;        /* For ", " */
+                size += 4;        /* For "[%c]=" */
+                size += strlen(taint_flags[i].desc);
+        }
+
+        size += 1;        /* For NULL terminator */
+
+        buf = kmalloc(size, GFP_KERNEL);
+
+        if (!buf) {
+                panic("Failed to allocate taint string buffer");
+        }
+
+        taint_buf = buf;
+        taint_buf_size = size;
+
+        return 0;
+}
+postcore_initcall(alloc_taint_buf);
+
 static const char *_print_tainted(bool verbose)
 {
-        /* FIXME: what should the size be? */
-        static char buf[sizeof(taint_flags)];
         struct seq_buf s;

         BUILD_BUG_ON(ARRAY_SIZE(taint_flags) != TAINT_FLAGS_COUNT);

-        seq_buf_init(&s, buf, sizeof(buf));
+        seq_buf_init(&s, taint_buf, taint_buf_size);

         print_tainted_seq(&s, verbose);
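The new alloc_taint_buf() sizes the buffer from the flag table instead of guessing. The same worst-case arithmetic ("Tainted: " plus ", [c]=DESC" per flag, plus the terminator) in a self-contained userspace sketch, with a made-up three-entry table standing in for taint_flags[]:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static const struct { char c; const char *desc; } flags[] = {
        { 'P', "PROPRIETARY_MODULE" },  /* illustrative entries only */
        { 'D', "DIE" },
        { 'W', "WARN" },
};

int main(void)
{
        size_t size = sizeof("Tainted: ") - 1;

        for (size_t i = 0; i < sizeof(flags) / sizeof(flags[0]); i++)
                size += 2 /* ", " */ + 4 /* "[c]=" */ + strlen(flags[i].desc);
        size += 1; /* NUL terminator */

        char *buf = malloc(size);
        if (!buf)
                return 1;
        printf("worst-case verbose taint string fits in %zu bytes\n", size);
        free(buf);
        return 0;
}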
+11 -7
kernel/pid.c
···
                 wake_up_process(READ_ONCE(ns->child_reaper));
                 break;
         case PIDNS_ADDING:
-                /* Handle a fork failure of the first process */
-                WARN_ON(ns->child_reaper);
-                ns->pid_allocated = 0;
+                /* Only possible if the 1st fork fails */
+                WARN_ON(READ_ONCE(ns->child_reaper));
                 break;
         }
···
         retried_preload = false;
         idr_preload(GFP_KERNEL);
         spin_lock(&pidmap_lock);
+        /* For the case when the previous attempt to create init failed */
+        if (ns->pid_allocated == PIDNS_ADDING)
+                idr_set_cursor(&ns->idr, 0);
+
         for (tmp = ns, i = ns->level; i >= 0;) {
                 int tid = set_tid[ns->level - i];
···
          *
          * This can't be done earlier because we need to preserve other
          * error conditions.
+         *
+         * We need this even if copy_process() does the same check. If two
+         * or more tasks from parent namespace try to inject a child into a
+         * dead namespace, one of free_pid() calls from the copy_process()
+         * error path may try to wakeup the possibly freed ns->child_reaper.
          */
         retval = -ENOMEM;
         if (unlikely(!(ns->pid_allocated & PIDNS_ADDING)))
···
                 upid = pid->numbers + i;
                 idr_remove(&upid->ns->idr, upid->nr);
         }
-
-        /* On failure to allocate the first pid, reset the state */
-        if (ns->pid_allocated == PIDNS_ADDING)
-                idr_set_cursor(&ns->idr, 0);

         spin_unlock(&pidmap_lock);
         idr_preload_end();
+5 -7
kernel/signal.c
···
          * Found a killable thread. If the signal will be fatal,
          * then start taking the whole group down immediately.
          */
-        if (sig_fatal(p, sig) &&
-            (signal->core_state || !(signal->flags & SIGNAL_GROUP_EXIT)) &&
-            !sigismember(&t->real_blocked, sig) &&
+        if (sig_fatal(p, sig) && !sigismember(&t->real_blocked, sig) &&
             (sig == SIGKILL || !p->ptrace)) {
                 /*
                  * This signal will be fatal to the whole group.
···
         bool autoreap = false;
         u64 utime, stime;

-        WARN_ON_ONCE(sig == -1);
+        if (WARN_ON_ONCE(!valid_signal(sig)))
+                return false;

         /* do_notify_parent_cldstop should have been called instead. */
         WARN_ON_ONCE(task_is_stopped_or_traced(tsk));

-        WARN_ON_ONCE(!tsk->ptrace &&
-                     (tsk->group_leader != tsk || !thread_group_empty(tsk)));
+        WARN_ON_ONCE(!tsk->ptrace && !thread_group_empty(tsk));

         /* ptraced, or group-leader without sub-threads */
         do_notify_pidfd(tsk);
···
          * Send with __send_signal as si_pid and si_uid are in the
          * parent's namespaces.
          */
-        if (valid_signal(sig) && sig)
+        if (sig)
                 __send_signal_locked(sig, &info, tsk->parent, PIDTYPE_TGID, false);
         __wake_up_parent(tsk, tsk->parent);
         spin_unlock_irqrestore(&psig->siglock, flags);
+1
kernel/taskstats.c
···
                 goto err;

         memcpy(stats, tsk->signal->stats, sizeof(*stats));
+        stats->version = TASKSTATS_VERSION;

 send:
         send_cpu_listeners(rep_skb, listeners);
-2
kernel/vmcore_info.c
···
 #include <asm/page.h>
 #include <asm/sections.h>

-#include <crypto/sha1.h>
-
 #include "kallsyms_internal.h"
 #include "kexec_internal.h"
+90 -68
kernel/watchdog.c
···
 # endif /* CONFIG_SMP */

 /*
+ * Number of consecutive missed interrupts before declaring a lockup.
+ * Default to 1 (immediate) for NMI/Perf. Buddy will overwrite this to 3.
+ */
+int __read_mostly watchdog_hardlockup_miss_thresh = 1;
+EXPORT_SYMBOL_GPL(watchdog_hardlockup_miss_thresh);
+
+/*
  * Should we panic when a soft-lockup or hard-lockup occurs:
  */
 unsigned int __read_mostly hardlockup_panic =
···
 static DEFINE_PER_CPU(atomic_t, hrtimer_interrupts);
 static DEFINE_PER_CPU(int, hrtimer_interrupts_saved);
+static DEFINE_PER_CPU(int, hrtimer_interrupts_missed);
 static DEFINE_PER_CPU(bool, watchdog_hardlockup_warned);
 static DEFINE_PER_CPU(bool, watchdog_hardlockup_touched);
 static unsigned long hard_lockup_nmi_warn;
···
         per_cpu(watchdog_hardlockup_touched, cpu) = true;
 }

-static bool is_hardlockup(unsigned int cpu)
+static void watchdog_hardlockup_update_reset(unsigned int cpu)
 {
         int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
-
-        if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint)
-                return true;

         /*
          * NOTE: we don't need any fancy atomic_t or READ_ONCE/WRITE_ONCE
···
          * written/read by a single CPU.
          */
         per_cpu(hrtimer_interrupts_saved, cpu) = hrint;
+        per_cpu(hrtimer_interrupts_missed, cpu) = 0;
+}

-        return false;
+static bool is_hardlockup(unsigned int cpu)
+{
+        int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
+
+        if (per_cpu(hrtimer_interrupts_saved, cpu) != hrint) {
+                watchdog_hardlockup_update_reset(cpu);
+                return false;
+        }
+
+        per_cpu(hrtimer_interrupts_missed, cpu)++;
+        if (per_cpu(hrtimer_interrupts_missed, cpu) % watchdog_hardlockup_miss_thresh)
+                return false;
+
+        return true;
 }

 static void watchdog_hardlockup_kick(void)
···
 void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
 {
         int hardlockup_all_cpu_backtrace;
+        unsigned int this_cpu;
+        unsigned long flags;

         if (per_cpu(watchdog_hardlockup_touched, cpu)) {
+                watchdog_hardlockup_update_reset(cpu);
                 per_cpu(watchdog_hardlockup_touched, cpu) = false;
                 return;
         }
···
          * fired multiple times before we overflow'd. If it hasn't
          * then this is a good indication the cpu is stuck
          */
-        if (is_hardlockup(cpu)) {
-                unsigned int this_cpu = smp_processor_id();
-                unsigned long flags;
+        if (!is_hardlockup(cpu)) {
+                per_cpu(watchdog_hardlockup_warned, cpu) = false;
+                return;
+        }

 #ifdef CONFIG_SYSFS
-                ++hardlockup_count;
+        ++hardlockup_count;
 #endif
-                /*
-                 * A poorly behaving BPF scheduler can trigger hard lockup by
-                 * e.g. putting numerous affinitized tasks in a single queue and
-                 * directing all CPUs at it. The following call can return true
-                 * only once when sched_ext is enabled and will immediately
-                 * abort the BPF scheduler and print out a warning message.
-                 */
-                if (scx_hardlockup(cpu))
+        /*
+         * A poorly behaving BPF scheduler can trigger hard lockup by
+         * e.g. putting numerous affinitized tasks in a single queue and
+         * directing all CPUs at it. The following call can return true
+         * only once when sched_ext is enabled and will immediately
+         * abort the BPF scheduler and print out a warning message.
+         */
+        if (scx_hardlockup(cpu))
+                return;
+
+        /* Only print hardlockups once. */
+        if (per_cpu(watchdog_hardlockup_warned, cpu))
+                return;
+
+        /*
+         * Prevent multiple hard-lockup reports if one cpu is already
+         * engaged in dumping all cpu back traces.
+         */
+        if (hardlockup_all_cpu_backtrace) {
+                if (test_and_set_bit_lock(0, &hard_lockup_nmi_warn))
                         return;
-
-                /* Only print hardlockups once. */
-                if (per_cpu(watchdog_hardlockup_warned, cpu))
-                        return;
-
-                /*
-                 * Prevent multiple hard-lockup reports if one cpu is already
-                 * engaged in dumping all cpu back traces.
-                 */
-                if (hardlockup_all_cpu_backtrace) {
-                        if (test_and_set_bit_lock(0, &hard_lockup_nmi_warn))
-                                return;
-                }
-
-                /*
-                 * NOTE: we call printk_cpu_sync_get_irqsave() after printing
-                 * the lockup message. While it would be nice to serialize
-                 * that printout, we really want to make sure that if some
-                 * other CPU somehow locked up while holding the lock associated
-                 * with printk_cpu_sync_get_irqsave() that we can still at least
-                 * get the message about the lockup out.
-                 */
-                pr_emerg("CPU%u: Watchdog detected hard LOCKUP on cpu %u\n", this_cpu, cpu);
-                printk_cpu_sync_get_irqsave(flags);
-
-                print_modules();
-                print_irqtrace_events(current);
-                if (cpu == this_cpu) {
-                        if (regs)
-                                show_regs(regs);
-                        else
-                                dump_stack();
-                        printk_cpu_sync_put_irqrestore(flags);
-                } else {
-                        printk_cpu_sync_put_irqrestore(flags);
-                        trigger_single_cpu_backtrace(cpu);
-                }
-
-                if (hardlockup_all_cpu_backtrace) {
-                        trigger_allbutcpu_cpu_backtrace(cpu);
-                        if (!hardlockup_panic)
-                                clear_bit_unlock(0, &hard_lockup_nmi_warn);
-                }
-
-                sys_info(hardlockup_si_mask & ~SYS_INFO_ALL_BT);
-                if (hardlockup_panic)
-                        nmi_panic(regs, "Hard LOCKUP");
-
-                per_cpu(watchdog_hardlockup_warned, cpu) = true;
-        } else {
-                per_cpu(watchdog_hardlockup_warned, cpu) = false;
         }
+
+        /*
+         * NOTE: we call printk_cpu_sync_get_irqsave() after printing
+         * the lockup message. While it would be nice to serialize
+         * that printout, we really want to make sure that if some
+         * other CPU somehow locked up while holding the lock associated
+         * with printk_cpu_sync_get_irqsave() that we can still at least
+         * get the message about the lockup out.
+         */
+        this_cpu = smp_processor_id();
+        pr_emerg("CPU%u: Watchdog detected hard LOCKUP on cpu %u\n", this_cpu, cpu);
+        printk_cpu_sync_get_irqsave(flags);
+
+        print_modules();
+        print_irqtrace_events(current);
+        if (cpu == this_cpu) {
+                if (regs)
+                        show_regs(regs);
+                else
+                        dump_stack();
+                printk_cpu_sync_put_irqrestore(flags);
+        } else {
+                printk_cpu_sync_put_irqrestore(flags);
+                trigger_single_cpu_backtrace(cpu);
+        }
+
+        if (hardlockup_all_cpu_backtrace) {
+                trigger_allbutcpu_cpu_backtrace(cpu);
+                if (!hardlockup_panic)
+                        clear_bit_unlock(0, &hard_lockup_nmi_warn);
+        }
+
+        sys_info(hardlockup_si_mask & ~SYS_INFO_ALL_BT);
+        if (hardlockup_panic)
+                nmi_panic(regs, "Hard LOCKUP");
+
+        per_cpu(watchdog_hardlockup_warned, cpu) = true;
 }

 #else /* CONFIG_HARDLOCKUP_DETECTOR_COUNTS_HRTIMER */
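The net effect of the is_hardlockup() rework is that a lockup is declared only after watchdog_hardlockup_miss_thresh consecutive checks without hrtimer progress, with any progress or watchdog touch resetting the miss counter. A small standalone model of that logic (plain ints stand in for the per-CPU variables):

#include <stdbool.h>
#include <stdio.h>

static int saved, missed;

static bool is_hardlockup(int hrint, int miss_thresh)
{
        if (saved != hrint) {        /* progress: reset tracking state */
                saved = hrint;
                missed = 0;
                return false;
        }
        if (++missed % miss_thresh)  /* not enough consecutive misses yet */
                return false;
        return true;
}

int main(void)
{
        int samples[] = { 1, 2, 2, 2, 2 }; /* counter stalls after 2 ticks */

        for (unsigned i = 0; i < sizeof(samples) / sizeof(*samples); i++)
                printf("check %u: %s\n", i,
                       is_hardlockup(samples[i], 3) ? "LOCKUP" : "ok");
        return 0; /* reports LOCKUP only on the third consecutive miss */
}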
+1 -8
kernel/watchdog_buddy.c
···

 int __init watchdog_hardlockup_probe(void)
 {
+        watchdog_hardlockup_miss_thresh = 3;
         return 0;
 }
···
 void watchdog_buddy_check_hardlockup(int hrtimer_interrupts)
 {
         unsigned int next_cpu;
-
-        /*
-         * Test for hardlockups every 3 samples. The sample period is
-         * watchdog_thresh * 2 / 5, so 3 samples gets us back to slightly over
-         * watchdog_thresh (over by 20%).
-         */
-        if (hrtimer_interrupts % 3 != 0)
-                return;

         /* check for a hardlockup on the next CPU */
         next_cpu = watchdog_next_cpu(smp_processor_id());
+1 -3
lib/Kconfig
···

 source "lib/crc/Kconfig"
 source "lib/crypto/Kconfig"
+source "lib/raid/Kconfig"

 config XXHASH
         tristate
···
         default n

 config ASN1_ENCODER
-        tristate
-
-config POLYNOMIAL
         tristate

 config FIRMWARE_TABLE
+1 -3
lib/Makefile
···
 obj-$(CONFIG_DEBUG_INFO_REDUCED) += debug_info.o
 CFLAGS_debug_info.o += $(call cc-option, -femit-struct-debug-detailed=any)

-obj-y += math/ crc/ crypto/ tests/ vdso/
+obj-y += math/ crc/ crypto/ tests/ vdso/ raid/

 obj-$(CONFIG_GENERIC_IOMAP) += iomap.o
 obj-$(CONFIG_HAS_IOMEM) += iomap_copy.o devres.o
···
 obj-$(CONFIG_MEMREGION) += memregion.o
 obj-$(CONFIG_STMP_DEVICE) += stmp_device.o
 obj-$(CONFIG_IRQ_POLL) += irq_poll.o
-
-obj-$(CONFIG_POLYNOMIAL) += polynomial.o

 # stackdepot.c should not be instrumented or call instrumented functions.
 # Prevent the compiler from calling builtins like memcmp() or bcmp() from this
+3 -3
lib/bch.c
··· 392 392 for (j = 0; j < 2*t; j += 2) 393 393 syn[j] ^= a_pow(bch, (j+1)*(i+s)); 394 394 395 - poly ^= (1 << i); 395 + poly ^= (1u << i); 396 396 } 397 397 } while (s > 0); 398 398 ··· 612 612 while (v) { 613 613 i = deg(v); 614 614 r ^= bch->xi_tab[i]; 615 - v ^= (1 << i); 615 + v ^= (1u << i); 616 616 } 617 617 /* verify root */ 618 618 if ((gf_sqr(bch, r)^r) == u) { ··· 1116 1116 for (b = 0; b < 4; b++) { 1117 1117 /* we want to compute (p(X).X^(8*b+deg(g))) mod g(X) */ 1118 1118 tab = bch->mod8_tab + (b*256+i)*l; 1119 - data = i << (8*b); 1119 + data = (unsigned int)i << (8*b); 1120 1120 while (data) { 1121 1121 d = deg(data); 1122 1122 /* subtract X^d.g(X) from p(X).X^(8*b+deg(g)) */
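All three hunks above remove the same class of undefined behavior: a bare 1 or a plain int promotes to signed int, so shifting a set bit into the sign position overflows the signed range. A minimal userspace sketch of the hazard (illustration only, not kernel code):

	#include <stdio.h>

	int main(void)
	{
		int i = 31;

		/* int ub = 1 << i;  -- undefined: overflows signed int */
		unsigned int ok = 1u << i;	/* well defined: 0x80000000 */

		printf("%#x\n", ok);
		return 0;
	}

The signed form is what UBSAN trips over even when the generated code happens to behave.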
+3 -3
lib/bug.c
··· 251 251 if (file) 252 252 pr_crit("kernel BUG at %s:%u!\n", file, line); 253 253 else 254 - pr_crit("Kernel BUG at %pB [verbose debug info unavailable]\n", 254 + pr_crit("kernel BUG at %pB [verbose debug info unavailable]\n", 255 255 (void *)bugaddr); 256 256 257 257 return BUG_TRAP_TYPE_BUG; ··· 260 260 enum bug_trap_type report_bug_entry(struct bug_entry *bug, struct pt_regs *regs) 261 261 { 262 262 enum bug_trap_type ret; 263 - bool rcu = false; 263 + bool rcu; 264 264 265 265 rcu = warn_rcu_enter(); 266 266 ret = __report_bug(bug, bug_addr(bug), regs); ··· 272 272 enum bug_trap_type report_bug(unsigned long bugaddr, struct pt_regs *regs) 273 273 { 274 274 enum bug_trap_type ret; 275 - bool rcu = false; 275 + bool rcu; 276 276 277 277 rcu = warn_rcu_enter(); 278 278 ret = __report_bug(NULL, bugaddr, regs);
+2 -2
lib/decompress_bunzip2.c
··· 135 135 } 136 136 /* Avoid 32-bit overflow (dump bit buffer to top of output) */ 137 137 if (bd->inbufBitCount >= 24) { 138 - bits = bd->inbufBits&((1 << bd->inbufBitCount)-1); 138 + bits = bd->inbufBits & ((1ULL << bd->inbufBitCount) - 1); 139 139 bits_wanted -= bd->inbufBitCount; 140 140 bits <<= bits_wanted; 141 141 bd->inbufBitCount = 0; ··· 146 146 } 147 147 /* Calculate result */ 148 148 bd->inbufBitCount -= bits_wanted; 149 - bits |= (bd->inbufBits >> bd->inbufBitCount)&((1 << bits_wanted)-1); 149 + bits |= (bd->inbufBits >> bd->inbufBitCount) & ((1ULL << bits_wanted) - 1); 150 150 151 151 return bits; 152 152 }
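This hunk fixes a related but distinct shift rule: the count itself (inbufBitCount or bits_wanted) may reach 32, and C leaves a shift by the full width of the promoted operand undefined, so a 32-bit constant cannot build the mask. A sketch of the pattern, with a hypothetical name:

	/* the count may legitimately be 32; a 32-bit "1" cannot build that mask */
	static unsigned long long low_bits_mask(unsigned int count)
	{
		return (1ULL << count) - 1;	/* defined for count 0..63 */
	}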
+7 -4
lib/glob.c
··· 1 + // SPDX-License-Identifier: (GPL-2.0 OR MIT) 1 2 #include <linux/module.h> 2 3 #include <linux/glob.h> 4 + #include <linux/export.h> 3 5 4 6 /* 5 7 * The only reason this code can be compiled as a module is because the ··· 22 20 * Pattern metacharacters are ?, *, [ and \. 23 21 * (And, inside character classes, !, - and ].) 24 22 * 25 - * This is small and simple implementation intended for device blacklists 23 + * This is a small and simple implementation intended for device denylists 26 24 * where a string is matched against a number of patterns. Thus, it 27 25 * does not preprocess the patterns. It is non-recursive, and run-time 28 26 * is at most quadratic: strlen(@str)*strlen(@pat). ··· 47 45 * (no exception for /), it can be easily proved that there's 48 46 * never a need to backtrack multiple levels. 49 47 */ 50 - char const *back_pat = NULL, *back_str; 48 + char const *back_pat = NULL, *back_str = NULL; 51 49 52 50 /* 53 51 * Loop over each token (character or class) in pat, matching ··· 73 71 if (c == '\0') /* No possible match */ 74 72 return false; 75 73 bool match = false, inverted = (*pat == '!'); 76 - char const *class = pat + inverted; 74 + char const *class = inverted ? pat + 1 : pat; 77 75 unsigned char a = *class++; 78 76 79 77 /* ··· 96 94 class += 2; 97 95 /* Any special action if a > b? */ 98 96 } 99 - match |= (a <= c && c <= b); 97 + if (a <= c && c <= b) 98 + match = true; 100 99 } while ((a = *class++) != ']'); 101 100 102 101 if (match == inverted)
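For context, the exported entry point here is glob_match(pat, str) from <linux/glob.h>. A hypothetical denylist check using the metacharacters documented above:

	#include <linux/glob.h>

	/* hypothetical: deny "usb-storage.0", "usb-audio.1", and friends */
	static bool dev_is_denied(const char *name)
	{
		return glob_match("usb-*.[0-9]", name);
	}

The pattern is matched against the whole string, non-recursively, so the quadratic worst case noted in the comment is the only cost concern.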
+7 -6
lib/inflate.c
··· 9 9 * based on gzip-1.0.3 10 10 * 11 11 * Nicolas Pitre <nico@fluxnic.net>, 1999/04/14 : 12 - * Little mods for all variable to reside either into rodata or bss segments 12 + * Little mods for all variables to reside either into rodata or bss segments 13 13 * by marking constant variables with 'const' and initializing all the others 14 14 * at run-time only. This allows for the kernel uncompressor to run 15 15 * directly from Flash or ROM memory on embedded systems. ··· 286 286 the longer codes. The time it costs to decode the longer codes is 287 287 then traded against the time it takes to make longer tables. 288 288 289 - This results of this trade are in the variables lbits and dbits 289 + The results of this trade are in the variables lbits and dbits 290 290 below. lbits is the number of bits the first level table for literal/ 291 291 length codes can decode in one step, and dbits is the same thing for 292 292 the distance codes. Subsequent tables are also less than or equal to ··· 811 811 812 812 /* decompress until an end-of-block code */ 813 813 if (inflate_codes(tl, td, bl, bd)) { 814 + huft_free(tl); 815 + huft_free(td); 814 816 free(l); 815 817 return 1; 816 818 } ··· 1009 1007 DEBG("dyn6 "); 1010 1008 1011 1009 /* decompress until an end-of-block code */ 1012 - if (inflate_codes(tl, td, bl, bd)) { 1010 + if (inflate_codes(tl, td, bl, bd)) 1013 1011 ret = 1; 1014 - goto out; 1015 - } 1012 + else 1013 + ret = 0; 1016 1014 1017 1015 DEBG("dyn7 "); 1018 1016 ··· 1021 1019 huft_free(td); 1022 1020 1023 1021 DEBG(">"); 1024 - ret = 0; 1025 1022 out: 1026 1023 free(ll); 1027 1024 return ret;
-10
lib/list_sort.c
··· 50 50 struct list_head *a, struct list_head *b) 51 51 { 52 52 struct list_head *tail = head; 53 - u8 count = 0; 54 53 55 54 for (;;) { 56 55 /* if equal, take 'a' -- important for sort stability */ ··· 75 76 /* Finish linking remainder of list b on to tail */ 76 77 tail->next = b; 77 78 do { 78 - /* 79 - * If the merge is highly unbalanced (e.g. the input is 80 - * already sorted), this loop may run many iterations. 81 - * Continue callbacks to the client even though no 82 - * element comparison is needed, so the client's cmp() 83 - * routine can invoke cond_resched() periodically. 84 - */ 85 - if (unlikely(!++count)) 86 - cmp(priv, b, b); 87 79 b->prev = tail; 88 80 tail = b; 89 81 b = b->next;
+3
lib/math/Kconfig
··· 5 5 This option provides an implementation of the CORDIC algorithm; 6 6 calculations are in fixed point. Module will be called cordic. 7 7 8 + config POLYNOMIAL 9 + tristate 10 + 8 11 config PRIME_NUMBERS 9 12 tristate "Simple prime number generator for testing" 10 13 help
+1
lib/math/Makefile
··· 2 2 obj-y += div64.o gcd.o lcm.o int_log.o int_pow.o int_sqrt.o reciprocal_div.o 3 3 4 4 obj-$(CONFIG_CORDIC) += cordic.o 5 + obj-$(CONFIG_POLYNOMIAL) += polynomial.o 5 6 obj-$(CONFIG_PRIME_NUMBERS) += prime_numbers.o 6 7 obj-$(CONFIG_RATIONAL) += rational.o 7 8
+105
lib/math/polynomial.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-only 2 + /* 3 + * Generic polynomial calculation using integer coefficients. 4 + * 5 + * Copyright (C) 2020 BAIKAL ELECTRONICS, JSC 6 + * 7 + * Authors: 8 + * Maxim Kaurkin <maxim.kaurkin@baikalelectronics.ru> 9 + * Serge Semin <Sergey.Semin@baikalelectronics.ru> 10 + * 11 + */ 12 + 13 + #include <linux/export.h> 14 + #include <linux/math.h> 15 + #include <linux/module.h> 16 + #include <linux/polynomial.h> 17 + 18 + /* 19 + * The following conversion is an example: 20 + * 21 + * The original translation formulae of the temperature (in degrees of Celsius) 22 + * to PVT data and vice-versa are following: 23 + * 24 + * N = 1.8322e-8*(T^4) + 2.343e-5*(T^3) + 8.7018e-3*(T^2) + 3.9269*(T^1) + 1.7204e2 25 + * T = -1.6743e-11*(N^4) + 8.1542e-8*(N^3) + -1.8201e-4*(N^2) + 3.1020e-1*(N^1) - 4.838e1 26 + * 27 + * where T = [-48.380, 147.438]C and N = [0, 1023]. 28 + * 29 + * They must be accordingly altered to be suitable for the integer arithmetics. 30 + * The technique is called 'factor redistribution', which just makes sure the 31 + * multiplications and divisions are made so to have a result of the operations 32 + * within the integer numbers limit. In addition we need to translate the 33 + * formulae to accept millidegrees of Celsius. Here what they look like after 34 + * the alterations: 35 + * 36 + * N = (18322e-20*(T^4) + 2343e-13*(T^3) + 87018e-9*(T^2) + 39269e-3*T + 17204e2) / 1e4 37 + * T = -16743e-12*(D^4) + 81542e-9*(D^3) - 182010e-6*(D^2) + 310200e-3*D - 48380 38 + * 39 + * where T = [-48380, 147438] mC and N = [0, 1023]. 40 + * 41 + * static const struct polynomial poly_temp_to_N = { 42 + * .total_divider = 10000, 43 + * .terms = { 44 + * {4, 18322, 10000, 10000}, 45 + * {3, 2343, 10000, 10}, 46 + * {2, 87018, 10000, 10}, 47 + * {1, 39269, 1000, 1}, 48 + * {0, 1720400, 1, 1} 49 + * } 50 + * }; 51 + * 52 + * static const struct polynomial poly_N_to_temp = { 53 + * .total_divider = 1, 54 + * .terms = { 55 + * {4, -16743, 1000, 1}, 56 + * {3, 81542, 1000, 1}, 57 + * {2, -182010, 1000, 1}, 58 + * {1, 310200, 1000, 1}, 59 + * {0, -48380, 1, 1} 60 + * } 61 + * }; 62 + */ 63 + 64 + /** 65 + * polynomial_calc - calculate a polynomial using integer arithmetic 66 + * 67 + * @poly: pointer to the descriptor of the polynomial 68 + * @data: input value of the polynomial 69 + * 70 + * Calculate the result of a polynomial using only integer arithmetic. For 71 + * this to work without too much loss of precision the coefficients has to 72 + * be altered. This is called factor redistribution. 73 + * 74 + * Return: the result of the polynomial calculation. 75 + */ 76 + long polynomial_calc(const struct polynomial *poly, long data) 77 + { 78 + const struct polynomial_term *term = poly->terms; 79 + long total_divider = poly->total_divider ?: 1; 80 + long tmp, ret = 0; 81 + int deg; 82 + 83 + /* 84 + * Here is the polynomial calculation function, which performs the 85 + * redistributed terms calculations. It's pretty straightforward. 86 + * We walk over each degree term up to the free one, and perform 87 + * the redistributed multiplication of the term coefficient, its 88 + * divider (as for the rationale fraction representation), data 89 + * power and the rational fraction divider leftover. Then all of 90 + * this is collected in a total sum variable, which value is 91 + * normalized by the total divider before being returned. 
92 + */ 93 + do { 94 + tmp = term->coef; 95 + for (deg = 0; deg < term->deg; ++deg) 96 + tmp = mult_frac(tmp, data, term->divider); 97 + ret += tmp / term->divider_leftover; 98 + } while ((term++)->deg); 99 + 100 + return ret / total_divider; 101 + } 102 + EXPORT_SYMBOL_GPL(polynomial_calc); 103 + 104 + MODULE_DESCRIPTION("Generic polynomial calculations"); 105 + MODULE_LICENSE("GPL");
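A short caller sketch for the relocated helper, reusing the poly_N_to_temp descriptor from the comment above; the wrapper name is hypothetical:

	#include <linux/polynomial.h>

	static const struct polynomial poly_N_to_temp = {
		.total_divider = 1,
		.terms = {
			{4, -16743, 1000, 1},
			{3, 81542, 1000, 1},
			{2, -182010, 1000, 1},
			{1, 310200, 1000, 1},
			{0, -48380, 1, 1}
		}
	};

	/* hypothetical wrapper: raw PVT reading N -> millidegrees Celsius */
	static long pvt_data_to_mcelsius(long n)
	{
		/* e.g. n == 0: every deg > 0 term vanishes, leaving -48380 mC */
		return polynomial_calc(&poly_N_to_temp, n);
	}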
+1 -1
lib/parser.c
··· 315 315 } 316 316 } 317 317 318 - if (*p == '*') 318 + while (*p == '*') 319 319 ++p; 320 320 return !*p; 321 321 }
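This is the tail of match_wildcard(): once the subject string is exhausted, whatever pattern remains must consist of stars, each of which matches the empty string. The old code skipped at most one, so a doubled trailing wildcard wrongly failed:

	match_wildcard("ab*", "ab");	/* true before and after this fix */
	match_wildcard("ab**", "ab");	/* false before, true after */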
-108
lib/polynomial.c
··· 1 - // SPDX-License-Identifier: GPL-2.0-only 2 - /* 3 - * Generic polynomial calculation using integer coefficients. 4 - * 5 - * Copyright (C) 2020 BAIKAL ELECTRONICS, JSC 6 - * 7 - * Authors: 8 - * Maxim Kaurkin <maxim.kaurkin@baikalelectronics.ru> 9 - * Serge Semin <Sergey.Semin@baikalelectronics.ru> 10 - * 11 - */ 12 - 13 - #include <linux/kernel.h> 14 - #include <linux/module.h> 15 - #include <linux/polynomial.h> 16 - 17 - /* 18 - * Originally this was part of drivers/hwmon/bt1-pvt.c. 19 - * There the following conversion is used and should serve as an example here: 20 - * 21 - * The original translation formulae of the temperature (in degrees of Celsius) 22 - * to PVT data and vice-versa are following: 23 - * 24 - * N = 1.8322e-8*(T^4) + 2.343e-5*(T^3) + 8.7018e-3*(T^2) + 3.9269*(T^1) + 25 - * 1.7204e2 26 - * T = -1.6743e-11*(N^4) + 8.1542e-8*(N^3) + -1.8201e-4*(N^2) + 27 - * 3.1020e-1*(N^1) - 4.838e1 28 - * 29 - * where T = [-48.380, 147.438]C and N = [0, 1023]. 30 - * 31 - * They must be accordingly altered to be suitable for the integer arithmetics. 32 - * The technique is called 'factor redistribution', which just makes sure the 33 - * multiplications and divisions are made so to have a result of the operations 34 - * within the integer numbers limit. In addition we need to translate the 35 - * formulae to accept millidegrees of Celsius. Here what they look like after 36 - * the alterations: 37 - * 38 - * N = (18322e-20*(T^4) + 2343e-13*(T^3) + 87018e-9*(T^2) + 39269e-3*T + 39 - * 17204e2) / 1e4 40 - * T = -16743e-12*(D^4) + 81542e-9*(D^3) - 182010e-6*(D^2) + 310200e-3*D - 41 - * 48380 42 - * where T = [-48380, 147438] mC and N = [0, 1023]. 43 - * 44 - * static const struct polynomial poly_temp_to_N = { 45 - * .total_divider = 10000, 46 - * .terms = { 47 - * {4, 18322, 10000, 10000}, 48 - * {3, 2343, 10000, 10}, 49 - * {2, 87018, 10000, 10}, 50 - * {1, 39269, 1000, 1}, 51 - * {0, 1720400, 1, 1} 52 - * } 53 - * }; 54 - * 55 - * static const struct polynomial poly_N_to_temp = { 56 - * .total_divider = 1, 57 - * .terms = { 58 - * {4, -16743, 1000, 1}, 59 - * {3, 81542, 1000, 1}, 60 - * {2, -182010, 1000, 1}, 61 - * {1, 310200, 1000, 1}, 62 - * {0, -48380, 1, 1} 63 - * } 64 - * }; 65 - */ 66 - 67 - /** 68 - * polynomial_calc - calculate a polynomial using integer arithmetic 69 - * 70 - * @poly: pointer to the descriptor of the polynomial 71 - * @data: input value of the polynimal 72 - * 73 - * Calculate the result of a polynomial using only integer arithmetic. For 74 - * this to work without too much loss of precision the coefficients has to 75 - * be altered. This is called factor redistribution. 76 - * 77 - * Returns the result of the polynomial calculation. 78 - */ 79 - long polynomial_calc(const struct polynomial *poly, long data) 80 - { 81 - const struct polynomial_term *term = poly->terms; 82 - long total_divider = poly->total_divider ?: 1; 83 - long tmp, ret = 0; 84 - int deg; 85 - 86 - /* 87 - * Here is the polynomial calculation function, which performs the 88 - * redistributed terms calculations. It's pretty straightforward. 89 - * We walk over each degree term up to the free one, and perform 90 - * the redistributed multiplication of the term coefficient, its 91 - * divider (as for the rationale fraction representation), data 92 - * power and the rational fraction divider leftover. Then all of 93 - * this is collected in a total sum variable, which value is 94 - * normalized by the total divider before being returned. 
95 - */ 96 - do { 97 - tmp = term->coef; 98 - for (deg = 0; deg < term->deg; ++deg) 99 - tmp = mult_frac(tmp, data, term->divider); 100 - ret += tmp / term->divider_leftover; 101 - } while ((term++)->deg); 102 - 103 - return ret / total_divider; 104 - } 105 - EXPORT_SYMBOL_GPL(polynomial_calc); 106 - 107 - MODULE_DESCRIPTION("Generic polynomial calculations"); 108 - MODULE_LICENSE("GPL");
+3
lib/raid/.kunitconfig
··· 1 + CONFIG_KUNIT=y 2 + CONFIG_BTRFS_FS=y 3 + CONFIG_XOR_KUNIT_TEST=y
+30
lib/raid/Kconfig
··· 1 + # SPDX-License-Identifier: GPL-2.0 2 + 3 + config XOR_BLOCKS 4 + tristate 5 + 6 + # selected by architectures that provide an optimized XOR implementation 7 + config XOR_BLOCKS_ARCH 8 + depends on XOR_BLOCKS 9 + default y if ALPHA 10 + default y if ARM 11 + default y if ARM64 12 + default y if CPU_HAS_LSX # loongarch 13 + default y if ALTIVEC # powerpc 14 + default y if RISCV_ISA_V 15 + default y if SPARC 16 + default y if S390 17 + default y if X86_32 18 + default y if X86_64 19 + bool 20 + 21 + config XOR_KUNIT_TEST 22 + tristate "KUnit tests for xor_gen" if !KUNIT_ALL_TESTS 23 + depends on KUNIT 24 + depends on XOR_BLOCKS 25 + default KUNIT_ALL_TESTS 26 + help 27 + Unit tests for the XOR library functions. 28 + 29 + This is intended to help people writing architecture-specific 30 + optimized versions. If unsure, say N.
+3
lib/raid/Makefile
··· 1 + # SPDX-License-Identifier: GPL-2.0 2 + 3 + obj-y += xor/
+42
lib/raid/xor/Makefile
··· 1 + # SPDX-License-Identifier: GPL-2.0 2 + 3 + ccflags-y += -I $(src) 4 + 5 + obj-$(CONFIG_XOR_BLOCKS) += xor.o 6 + 7 + xor-y += xor-core.o 8 + xor-y += xor-8regs.o 9 + xor-y += xor-32regs.o 10 + xor-y += xor-8regs-prefetch.o 11 + xor-y += xor-32regs-prefetch.o 12 + 13 + ifeq ($(CONFIG_XOR_BLOCKS_ARCH),y) 14 + CFLAGS_xor-core.o += -I$(src)/$(SRCARCH) 15 + endif 16 + 17 + xor-$(CONFIG_ALPHA) += alpha/xor.o 18 + xor-$(CONFIG_ARM) += arm/xor.o 19 + ifeq ($(CONFIG_ARM),y) 20 + xor-$(CONFIG_KERNEL_MODE_NEON) += arm/xor-neon.o arm/xor-neon-glue.o 21 + endif 22 + xor-$(CONFIG_ARM64) += arm64/xor-neon.o arm64/xor-neon-glue.o 23 + xor-$(CONFIG_CPU_HAS_LSX) += loongarch/xor_simd.o 24 + xor-$(CONFIG_CPU_HAS_LSX) += loongarch/xor_simd_glue.o 25 + xor-$(CONFIG_ALTIVEC) += powerpc/xor_vmx.o powerpc/xor_vmx_glue.o 26 + xor-$(CONFIG_RISCV_ISA_V) += riscv/xor.o riscv/xor-glue.o 27 + xor-$(CONFIG_SPARC32) += sparc/xor-sparc32.o 28 + xor-$(CONFIG_SPARC64) += sparc/xor-sparc64.o sparc/xor-sparc64-glue.o 29 + xor-$(CONFIG_S390) += s390/xor.o 30 + xor-$(CONFIG_X86_32) += x86/xor-avx.o x86/xor-sse.o x86/xor-mmx.o 31 + xor-$(CONFIG_X86_64) += x86/xor-avx.o x86/xor-sse.o 32 + obj-y += tests/ 33 + 34 + CFLAGS_arm/xor-neon.o += $(CC_FLAGS_FPU) 35 + CFLAGS_REMOVE_arm/xor-neon.o += $(CC_FLAGS_NO_FPU) 36 + 37 + CFLAGS_arm64/xor-neon.o += $(CC_FLAGS_FPU) 38 + CFLAGS_REMOVE_arm64/xor-neon.o += $(CC_FLAGS_NO_FPU) 39 + 40 + CFLAGS_powerpc/xor_vmx.o += -mhard-float -maltivec \ 41 + $(call cc-option,-mabi=altivec) \ 42 + -isystem $(shell $(CC) -print-file-name=include)
+848
lib/raid/xor/alpha/xor.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-or-later 2 + /* 3 + * Optimized XOR parity functions for alpha EV5 and EV6 4 + */ 5 + #include "xor_impl.h" 6 + #include "xor_arch.h" 7 + 8 + extern void 9 + xor_alpha_2(unsigned long bytes, unsigned long * __restrict p1, 10 + const unsigned long * __restrict p2); 11 + extern void 12 + xor_alpha_3(unsigned long bytes, unsigned long * __restrict p1, 13 + const unsigned long * __restrict p2, 14 + const unsigned long * __restrict p3); 15 + extern void 16 + xor_alpha_4(unsigned long bytes, unsigned long * __restrict p1, 17 + const unsigned long * __restrict p2, 18 + const unsigned long * __restrict p3, 19 + const unsigned long * __restrict p4); 20 + extern void 21 + xor_alpha_5(unsigned long bytes, unsigned long * __restrict p1, 22 + const unsigned long * __restrict p2, 23 + const unsigned long * __restrict p3, 24 + const unsigned long * __restrict p4, 25 + const unsigned long * __restrict p5); 26 + 27 + extern void 28 + xor_alpha_prefetch_2(unsigned long bytes, unsigned long * __restrict p1, 29 + const unsigned long * __restrict p2); 30 + extern void 31 + xor_alpha_prefetch_3(unsigned long bytes, unsigned long * __restrict p1, 32 + const unsigned long * __restrict p2, 33 + const unsigned long * __restrict p3); 34 + extern void 35 + xor_alpha_prefetch_4(unsigned long bytes, unsigned long * __restrict p1, 36 + const unsigned long * __restrict p2, 37 + const unsigned long * __restrict p3, 38 + const unsigned long * __restrict p4); 39 + extern void 40 + xor_alpha_prefetch_5(unsigned long bytes, unsigned long * __restrict p1, 41 + const unsigned long * __restrict p2, 42 + const unsigned long * __restrict p3, 43 + const unsigned long * __restrict p4, 44 + const unsigned long * __restrict p5); 45 + 46 + asm(" \n\ 47 + .text \n\ 48 + .align 3 \n\ 49 + .ent xor_alpha_2 \n\ 50 + xor_alpha_2: \n\ 51 + .prologue 0 \n\ 52 + srl $16, 6, $16 \n\ 53 + .align 4 \n\ 54 + 2: \n\ 55 + ldq $0,0($17) \n\ 56 + ldq $1,0($18) \n\ 57 + ldq $2,8($17) \n\ 58 + ldq $3,8($18) \n\ 59 + \n\ 60 + ldq $4,16($17) \n\ 61 + ldq $5,16($18) \n\ 62 + ldq $6,24($17) \n\ 63 + ldq $7,24($18) \n\ 64 + \n\ 65 + ldq $19,32($17) \n\ 66 + ldq $20,32($18) \n\ 67 + ldq $21,40($17) \n\ 68 + ldq $22,40($18) \n\ 69 + \n\ 70 + ldq $23,48($17) \n\ 71 + ldq $24,48($18) \n\ 72 + ldq $25,56($17) \n\ 73 + xor $0,$1,$0 # 7 cycles from $1 load \n\ 74 + \n\ 75 + ldq $27,56($18) \n\ 76 + xor $2,$3,$2 \n\ 77 + stq $0,0($17) \n\ 78 + xor $4,$5,$4 \n\ 79 + \n\ 80 + stq $2,8($17) \n\ 81 + xor $6,$7,$6 \n\ 82 + stq $4,16($17) \n\ 83 + xor $19,$20,$19 \n\ 84 + \n\ 85 + stq $6,24($17) \n\ 86 + xor $21,$22,$21 \n\ 87 + stq $19,32($17) \n\ 88 + xor $23,$24,$23 \n\ 89 + \n\ 90 + stq $21,40($17) \n\ 91 + xor $25,$27,$25 \n\ 92 + stq $23,48($17) \n\ 93 + subq $16,1,$16 \n\ 94 + \n\ 95 + stq $25,56($17) \n\ 96 + addq $17,64,$17 \n\ 97 + addq $18,64,$18 \n\ 98 + bgt $16,2b \n\ 99 + \n\ 100 + ret \n\ 101 + .end xor_alpha_2 \n\ 102 + \n\ 103 + .align 3 \n\ 104 + .ent xor_alpha_3 \n\ 105 + xor_alpha_3: \n\ 106 + .prologue 0 \n\ 107 + srl $16, 6, $16 \n\ 108 + .align 4 \n\ 109 + 3: \n\ 110 + ldq $0,0($17) \n\ 111 + ldq $1,0($18) \n\ 112 + ldq $2,0($19) \n\ 113 + ldq $3,8($17) \n\ 114 + \n\ 115 + ldq $4,8($18) \n\ 116 + ldq $6,16($17) \n\ 117 + ldq $7,16($18) \n\ 118 + ldq $21,24($17) \n\ 119 + \n\ 120 + ldq $22,24($18) \n\ 121 + ldq $24,32($17) \n\ 122 + ldq $25,32($18) \n\ 123 + ldq $5,8($19) \n\ 124 + \n\ 125 + ldq $20,16($19) \n\ 126 + ldq $23,24($19) \n\ 127 + ldq $27,32($19) \n\ 128 + nop \n\ 129 + \n\ 130 + xor $0,$1,$1 # 
8 cycles from $0 load \n\ 131 + xor $3,$4,$4 # 6 cycles from $4 load \n\ 132 + xor $6,$7,$7 # 6 cycles from $7 load \n\ 133 + xor $21,$22,$22 # 5 cycles from $22 load \n\ 134 + \n\ 135 + xor $1,$2,$2 # 9 cycles from $2 load \n\ 136 + xor $24,$25,$25 # 5 cycles from $25 load \n\ 137 + stq $2,0($17) \n\ 138 + xor $4,$5,$5 # 6 cycles from $5 load \n\ 139 + \n\ 140 + stq $5,8($17) \n\ 141 + xor $7,$20,$20 # 7 cycles from $20 load \n\ 142 + stq $20,16($17) \n\ 143 + xor $22,$23,$23 # 7 cycles from $23 load \n\ 144 + \n\ 145 + stq $23,24($17) \n\ 146 + xor $25,$27,$27 # 7 cycles from $27 load \n\ 147 + stq $27,32($17) \n\ 148 + nop \n\ 149 + \n\ 150 + ldq $0,40($17) \n\ 151 + ldq $1,40($18) \n\ 152 + ldq $3,48($17) \n\ 153 + ldq $4,48($18) \n\ 154 + \n\ 155 + ldq $6,56($17) \n\ 156 + ldq $7,56($18) \n\ 157 + ldq $2,40($19) \n\ 158 + ldq $5,48($19) \n\ 159 + \n\ 160 + ldq $20,56($19) \n\ 161 + xor $0,$1,$1 # 4 cycles from $1 load \n\ 162 + xor $3,$4,$4 # 5 cycles from $4 load \n\ 163 + xor $6,$7,$7 # 5 cycles from $7 load \n\ 164 + \n\ 165 + xor $1,$2,$2 # 4 cycles from $2 load \n\ 166 + xor $4,$5,$5 # 5 cycles from $5 load \n\ 167 + stq $2,40($17) \n\ 168 + xor $7,$20,$20 # 4 cycles from $20 load \n\ 169 + \n\ 170 + stq $5,48($17) \n\ 171 + subq $16,1,$16 \n\ 172 + stq $20,56($17) \n\ 173 + addq $19,64,$19 \n\ 174 + \n\ 175 + addq $18,64,$18 \n\ 176 + addq $17,64,$17 \n\ 177 + bgt $16,3b \n\ 178 + ret \n\ 179 + .end xor_alpha_3 \n\ 180 + \n\ 181 + .align 3 \n\ 182 + .ent xor_alpha_4 \n\ 183 + xor_alpha_4: \n\ 184 + .prologue 0 \n\ 185 + srl $16, 6, $16 \n\ 186 + .align 4 \n\ 187 + 4: \n\ 188 + ldq $0,0($17) \n\ 189 + ldq $1,0($18) \n\ 190 + ldq $2,0($19) \n\ 191 + ldq $3,0($20) \n\ 192 + \n\ 193 + ldq $4,8($17) \n\ 194 + ldq $5,8($18) \n\ 195 + ldq $6,8($19) \n\ 196 + ldq $7,8($20) \n\ 197 + \n\ 198 + ldq $21,16($17) \n\ 199 + ldq $22,16($18) \n\ 200 + ldq $23,16($19) \n\ 201 + ldq $24,16($20) \n\ 202 + \n\ 203 + ldq $25,24($17) \n\ 204 + xor $0,$1,$1 # 6 cycles from $1 load \n\ 205 + ldq $27,24($18) \n\ 206 + xor $2,$3,$3 # 6 cycles from $3 load \n\ 207 + \n\ 208 + ldq $0,24($19) \n\ 209 + xor $1,$3,$3 \n\ 210 + ldq $1,24($20) \n\ 211 + xor $4,$5,$5 # 7 cycles from $5 load \n\ 212 + \n\ 213 + stq $3,0($17) \n\ 214 + xor $6,$7,$7 \n\ 215 + xor $21,$22,$22 # 7 cycles from $22 load \n\ 216 + xor $5,$7,$7 \n\ 217 + \n\ 218 + stq $7,8($17) \n\ 219 + xor $23,$24,$24 # 7 cycles from $24 load \n\ 220 + ldq $2,32($17) \n\ 221 + xor $22,$24,$24 \n\ 222 + \n\ 223 + ldq $3,32($18) \n\ 224 + ldq $4,32($19) \n\ 225 + ldq $5,32($20) \n\ 226 + xor $25,$27,$27 # 8 cycles from $27 load \n\ 227 + \n\ 228 + ldq $6,40($17) \n\ 229 + ldq $7,40($18) \n\ 230 + ldq $21,40($19) \n\ 231 + ldq $22,40($20) \n\ 232 + \n\ 233 + stq $24,16($17) \n\ 234 + xor $0,$1,$1 # 9 cycles from $1 load \n\ 235 + xor $2,$3,$3 # 5 cycles from $3 load \n\ 236 + xor $27,$1,$1 \n\ 237 + \n\ 238 + stq $1,24($17) \n\ 239 + xor $4,$5,$5 # 5 cycles from $5 load \n\ 240 + ldq $23,48($17) \n\ 241 + ldq $24,48($18) \n\ 242 + \n\ 243 + ldq $25,48($19) \n\ 244 + xor $3,$5,$5 \n\ 245 + ldq $27,48($20) \n\ 246 + ldq $0,56($17) \n\ 247 + \n\ 248 + ldq $1,56($18) \n\ 249 + ldq $2,56($19) \n\ 250 + xor $6,$7,$7 # 8 cycles from $6 load \n\ 251 + ldq $3,56($20) \n\ 252 + \n\ 253 + stq $5,32($17) \n\ 254 + xor $21,$22,$22 # 8 cycles from $22 load \n\ 255 + xor $7,$22,$22 \n\ 256 + xor $23,$24,$24 # 5 cycles from $24 load \n\ 257 + \n\ 258 + stq $22,40($17) \n\ 259 + xor $25,$27,$27 # 5 cycles from $27 load \n\ 260 + xor $24,$27,$27 \n\ 261 + xor $0,$1,$1 # 5 
cycles from $1 load \n\ 262 + \n\ 263 + stq $27,48($17) \n\ 264 + xor $2,$3,$3 # 4 cycles from $3 load \n\ 265 + xor $1,$3,$3 \n\ 266 + subq $16,1,$16 \n\ 267 + \n\ 268 + stq $3,56($17) \n\ 269 + addq $20,64,$20 \n\ 270 + addq $19,64,$19 \n\ 271 + addq $18,64,$18 \n\ 272 + \n\ 273 + addq $17,64,$17 \n\ 274 + bgt $16,4b \n\ 275 + ret \n\ 276 + .end xor_alpha_4 \n\ 277 + \n\ 278 + .align 3 \n\ 279 + .ent xor_alpha_5 \n\ 280 + xor_alpha_5: \n\ 281 + .prologue 0 \n\ 282 + srl $16, 6, $16 \n\ 283 + .align 4 \n\ 284 + 5: \n\ 285 + ldq $0,0($17) \n\ 286 + ldq $1,0($18) \n\ 287 + ldq $2,0($19) \n\ 288 + ldq $3,0($20) \n\ 289 + \n\ 290 + ldq $4,0($21) \n\ 291 + ldq $5,8($17) \n\ 292 + ldq $6,8($18) \n\ 293 + ldq $7,8($19) \n\ 294 + \n\ 295 + ldq $22,8($20) \n\ 296 + ldq $23,8($21) \n\ 297 + ldq $24,16($17) \n\ 298 + ldq $25,16($18) \n\ 299 + \n\ 300 + ldq $27,16($19) \n\ 301 + xor $0,$1,$1 # 6 cycles from $1 load \n\ 302 + ldq $28,16($20) \n\ 303 + xor $2,$3,$3 # 6 cycles from $3 load \n\ 304 + \n\ 305 + ldq $0,16($21) \n\ 306 + xor $1,$3,$3 \n\ 307 + ldq $1,24($17) \n\ 308 + xor $3,$4,$4 # 7 cycles from $4 load \n\ 309 + \n\ 310 + stq $4,0($17) \n\ 311 + xor $5,$6,$6 # 7 cycles from $6 load \n\ 312 + xor $7,$22,$22 # 7 cycles from $22 load \n\ 313 + xor $6,$23,$23 # 7 cycles from $23 load \n\ 314 + \n\ 315 + ldq $2,24($18) \n\ 316 + xor $22,$23,$23 \n\ 317 + ldq $3,24($19) \n\ 318 + xor $24,$25,$25 # 8 cycles from $25 load \n\ 319 + \n\ 320 + stq $23,8($17) \n\ 321 + xor $25,$27,$27 # 8 cycles from $27 load \n\ 322 + ldq $4,24($20) \n\ 323 + xor $28,$0,$0 # 7 cycles from $0 load \n\ 324 + \n\ 325 + ldq $5,24($21) \n\ 326 + xor $27,$0,$0 \n\ 327 + ldq $6,32($17) \n\ 328 + ldq $7,32($18) \n\ 329 + \n\ 330 + stq $0,16($17) \n\ 331 + xor $1,$2,$2 # 6 cycles from $2 load \n\ 332 + ldq $22,32($19) \n\ 333 + xor $3,$4,$4 # 4 cycles from $4 load \n\ 334 + \n\ 335 + ldq $23,32($20) \n\ 336 + xor $2,$4,$4 \n\ 337 + ldq $24,32($21) \n\ 338 + ldq $25,40($17) \n\ 339 + \n\ 340 + ldq $27,40($18) \n\ 341 + ldq $28,40($19) \n\ 342 + ldq $0,40($20) \n\ 343 + xor $4,$5,$5 # 7 cycles from $5 load \n\ 344 + \n\ 345 + stq $5,24($17) \n\ 346 + xor $6,$7,$7 # 7 cycles from $7 load \n\ 347 + ldq $1,40($21) \n\ 348 + ldq $2,48($17) \n\ 349 + \n\ 350 + ldq $3,48($18) \n\ 351 + xor $7,$22,$22 # 7 cycles from $22 load \n\ 352 + ldq $4,48($19) \n\ 353 + xor $23,$24,$24 # 6 cycles from $24 load \n\ 354 + \n\ 355 + ldq $5,48($20) \n\ 356 + xor $22,$24,$24 \n\ 357 + ldq $6,48($21) \n\ 358 + xor $25,$27,$27 # 7 cycles from $27 load \n\ 359 + \n\ 360 + stq $24,32($17) \n\ 361 + xor $27,$28,$28 # 8 cycles from $28 load \n\ 362 + ldq $7,56($17) \n\ 363 + xor $0,$1,$1 # 6 cycles from $1 load \n\ 364 + \n\ 365 + ldq $22,56($18) \n\ 366 + ldq $23,56($19) \n\ 367 + ldq $24,56($20) \n\ 368 + ldq $25,56($21) \n\ 369 + \n\ 370 + xor $28,$1,$1 \n\ 371 + xor $2,$3,$3 # 9 cycles from $3 load \n\ 372 + xor $3,$4,$4 # 9 cycles from $4 load \n\ 373 + xor $5,$6,$6 # 8 cycles from $6 load \n\ 374 + \n\ 375 + stq $1,40($17) \n\ 376 + xor $4,$6,$6 \n\ 377 + xor $7,$22,$22 # 7 cycles from $22 load \n\ 378 + xor $23,$24,$24 # 6 cycles from $24 load \n\ 379 + \n\ 380 + stq $6,48($17) \n\ 381 + xor $22,$24,$24 \n\ 382 + subq $16,1,$16 \n\ 383 + xor $24,$25,$25 # 8 cycles from $25 load \n\ 384 + \n\ 385 + stq $25,56($17) \n\ 386 + addq $21,64,$21 \n\ 387 + addq $20,64,$20 \n\ 388 + addq $19,64,$19 \n\ 389 + \n\ 390 + addq $18,64,$18 \n\ 391 + addq $17,64,$17 \n\ 392 + bgt $16,5b \n\ 393 + ret \n\ 394 + .end xor_alpha_5 \n\ 395 + \n\ 396 + .align 3 \n\ 
397 + .ent xor_alpha_prefetch_2 \n\ 398 + xor_alpha_prefetch_2: \n\ 399 + .prologue 0 \n\ 400 + srl $16, 6, $16 \n\ 401 + \n\ 402 + ldq $31, 0($17) \n\ 403 + ldq $31, 0($18) \n\ 404 + \n\ 405 + ldq $31, 64($17) \n\ 406 + ldq $31, 64($18) \n\ 407 + \n\ 408 + ldq $31, 128($17) \n\ 409 + ldq $31, 128($18) \n\ 410 + \n\ 411 + ldq $31, 192($17) \n\ 412 + ldq $31, 192($18) \n\ 413 + .align 4 \n\ 414 + 2: \n\ 415 + ldq $0,0($17) \n\ 416 + ldq $1,0($18) \n\ 417 + ldq $2,8($17) \n\ 418 + ldq $3,8($18) \n\ 419 + \n\ 420 + ldq $4,16($17) \n\ 421 + ldq $5,16($18) \n\ 422 + ldq $6,24($17) \n\ 423 + ldq $7,24($18) \n\ 424 + \n\ 425 + ldq $19,32($17) \n\ 426 + ldq $20,32($18) \n\ 427 + ldq $21,40($17) \n\ 428 + ldq $22,40($18) \n\ 429 + \n\ 430 + ldq $23,48($17) \n\ 431 + ldq $24,48($18) \n\ 432 + ldq $25,56($17) \n\ 433 + ldq $27,56($18) \n\ 434 + \n\ 435 + ldq $31,256($17) \n\ 436 + xor $0,$1,$0 # 8 cycles from $1 load \n\ 437 + ldq $31,256($18) \n\ 438 + xor $2,$3,$2 \n\ 439 + \n\ 440 + stq $0,0($17) \n\ 441 + xor $4,$5,$4 \n\ 442 + stq $2,8($17) \n\ 443 + xor $6,$7,$6 \n\ 444 + \n\ 445 + stq $4,16($17) \n\ 446 + xor $19,$20,$19 \n\ 447 + stq $6,24($17) \n\ 448 + xor $21,$22,$21 \n\ 449 + \n\ 450 + stq $19,32($17) \n\ 451 + xor $23,$24,$23 \n\ 452 + stq $21,40($17) \n\ 453 + xor $25,$27,$25 \n\ 454 + \n\ 455 + stq $23,48($17) \n\ 456 + subq $16,1,$16 \n\ 457 + stq $25,56($17) \n\ 458 + addq $17,64,$17 \n\ 459 + \n\ 460 + addq $18,64,$18 \n\ 461 + bgt $16,2b \n\ 462 + ret \n\ 463 + .end xor_alpha_prefetch_2 \n\ 464 + \n\ 465 + .align 3 \n\ 466 + .ent xor_alpha_prefetch_3 \n\ 467 + xor_alpha_prefetch_3: \n\ 468 + .prologue 0 \n\ 469 + srl $16, 6, $16 \n\ 470 + \n\ 471 + ldq $31, 0($17) \n\ 472 + ldq $31, 0($18) \n\ 473 + ldq $31, 0($19) \n\ 474 + \n\ 475 + ldq $31, 64($17) \n\ 476 + ldq $31, 64($18) \n\ 477 + ldq $31, 64($19) \n\ 478 + \n\ 479 + ldq $31, 128($17) \n\ 480 + ldq $31, 128($18) \n\ 481 + ldq $31, 128($19) \n\ 482 + \n\ 483 + ldq $31, 192($17) \n\ 484 + ldq $31, 192($18) \n\ 485 + ldq $31, 192($19) \n\ 486 + .align 4 \n\ 487 + 3: \n\ 488 + ldq $0,0($17) \n\ 489 + ldq $1,0($18) \n\ 490 + ldq $2,0($19) \n\ 491 + ldq $3,8($17) \n\ 492 + \n\ 493 + ldq $4,8($18) \n\ 494 + ldq $6,16($17) \n\ 495 + ldq $7,16($18) \n\ 496 + ldq $21,24($17) \n\ 497 + \n\ 498 + ldq $22,24($18) \n\ 499 + ldq $24,32($17) \n\ 500 + ldq $25,32($18) \n\ 501 + ldq $5,8($19) \n\ 502 + \n\ 503 + ldq $20,16($19) \n\ 504 + ldq $23,24($19) \n\ 505 + ldq $27,32($19) \n\ 506 + nop \n\ 507 + \n\ 508 + xor $0,$1,$1 # 8 cycles from $0 load \n\ 509 + xor $3,$4,$4 # 7 cycles from $4 load \n\ 510 + xor $6,$7,$7 # 6 cycles from $7 load \n\ 511 + xor $21,$22,$22 # 5 cycles from $22 load \n\ 512 + \n\ 513 + xor $1,$2,$2 # 9 cycles from $2 load \n\ 514 + xor $24,$25,$25 # 5 cycles from $25 load \n\ 515 + stq $2,0($17) \n\ 516 + xor $4,$5,$5 # 6 cycles from $5 load \n\ 517 + \n\ 518 + stq $5,8($17) \n\ 519 + xor $7,$20,$20 # 7 cycles from $20 load \n\ 520 + stq $20,16($17) \n\ 521 + xor $22,$23,$23 # 7 cycles from $23 load \n\ 522 + \n\ 523 + stq $23,24($17) \n\ 524 + xor $25,$27,$27 # 7 cycles from $27 load \n\ 525 + stq $27,32($17) \n\ 526 + nop \n\ 527 + \n\ 528 + ldq $0,40($17) \n\ 529 + ldq $1,40($18) \n\ 530 + ldq $3,48($17) \n\ 531 + ldq $4,48($18) \n\ 532 + \n\ 533 + ldq $6,56($17) \n\ 534 + ldq $7,56($18) \n\ 535 + ldq $2,40($19) \n\ 536 + ldq $5,48($19) \n\ 537 + \n\ 538 + ldq $20,56($19) \n\ 539 + ldq $31,256($17) \n\ 540 + ldq $31,256($18) \n\ 541 + ldq $31,256($19) \n\ 542 + \n\ 543 + xor $0,$1,$1 # 6 cycles from $1 load \n\ 544 
+ xor $3,$4,$4 # 5 cycles from $4 load \n\ 545 + xor $6,$7,$7 # 5 cycles from $7 load \n\ 546 + xor $1,$2,$2 # 4 cycles from $2 load \n\ 547 + \n\ 548 + xor $4,$5,$5 # 5 cycles from $5 load \n\ 549 + xor $7,$20,$20 # 4 cycles from $20 load \n\ 550 + stq $2,40($17) \n\ 551 + subq $16,1,$16 \n\ 552 + \n\ 553 + stq $5,48($17) \n\ 554 + addq $19,64,$19 \n\ 555 + stq $20,56($17) \n\ 556 + addq $18,64,$18 \n\ 557 + \n\ 558 + addq $17,64,$17 \n\ 559 + bgt $16,3b \n\ 560 + ret \n\ 561 + .end xor_alpha_prefetch_3 \n\ 562 + \n\ 563 + .align 3 \n\ 564 + .ent xor_alpha_prefetch_4 \n\ 565 + xor_alpha_prefetch_4: \n\ 566 + .prologue 0 \n\ 567 + srl $16, 6, $16 \n\ 568 + \n\ 569 + ldq $31, 0($17) \n\ 570 + ldq $31, 0($18) \n\ 571 + ldq $31, 0($19) \n\ 572 + ldq $31, 0($20) \n\ 573 + \n\ 574 + ldq $31, 64($17) \n\ 575 + ldq $31, 64($18) \n\ 576 + ldq $31, 64($19) \n\ 577 + ldq $31, 64($20) \n\ 578 + \n\ 579 + ldq $31, 128($17) \n\ 580 + ldq $31, 128($18) \n\ 581 + ldq $31, 128($19) \n\ 582 + ldq $31, 128($20) \n\ 583 + \n\ 584 + ldq $31, 192($17) \n\ 585 + ldq $31, 192($18) \n\ 586 + ldq $31, 192($19) \n\ 587 + ldq $31, 192($20) \n\ 588 + .align 4 \n\ 589 + 4: \n\ 590 + ldq $0,0($17) \n\ 591 + ldq $1,0($18) \n\ 592 + ldq $2,0($19) \n\ 593 + ldq $3,0($20) \n\ 594 + \n\ 595 + ldq $4,8($17) \n\ 596 + ldq $5,8($18) \n\ 597 + ldq $6,8($19) \n\ 598 + ldq $7,8($20) \n\ 599 + \n\ 600 + ldq $21,16($17) \n\ 601 + ldq $22,16($18) \n\ 602 + ldq $23,16($19) \n\ 603 + ldq $24,16($20) \n\ 604 + \n\ 605 + ldq $25,24($17) \n\ 606 + xor $0,$1,$1 # 6 cycles from $1 load \n\ 607 + ldq $27,24($18) \n\ 608 + xor $2,$3,$3 # 6 cycles from $3 load \n\ 609 + \n\ 610 + ldq $0,24($19) \n\ 611 + xor $1,$3,$3 \n\ 612 + ldq $1,24($20) \n\ 613 + xor $4,$5,$5 # 7 cycles from $5 load \n\ 614 + \n\ 615 + stq $3,0($17) \n\ 616 + xor $6,$7,$7 \n\ 617 + xor $21,$22,$22 # 7 cycles from $22 load \n\ 618 + xor $5,$7,$7 \n\ 619 + \n\ 620 + stq $7,8($17) \n\ 621 + xor $23,$24,$24 # 7 cycles from $24 load \n\ 622 + ldq $2,32($17) \n\ 623 + xor $22,$24,$24 \n\ 624 + \n\ 625 + ldq $3,32($18) \n\ 626 + ldq $4,32($19) \n\ 627 + ldq $5,32($20) \n\ 628 + xor $25,$27,$27 # 8 cycles from $27 load \n\ 629 + \n\ 630 + ldq $6,40($17) \n\ 631 + ldq $7,40($18) \n\ 632 + ldq $21,40($19) \n\ 633 + ldq $22,40($20) \n\ 634 + \n\ 635 + stq $24,16($17) \n\ 636 + xor $0,$1,$1 # 9 cycles from $1 load \n\ 637 + xor $2,$3,$3 # 5 cycles from $3 load \n\ 638 + xor $27,$1,$1 \n\ 639 + \n\ 640 + stq $1,24($17) \n\ 641 + xor $4,$5,$5 # 5 cycles from $5 load \n\ 642 + ldq $23,48($17) \n\ 643 + xor $3,$5,$5 \n\ 644 + \n\ 645 + ldq $24,48($18) \n\ 646 + ldq $25,48($19) \n\ 647 + ldq $27,48($20) \n\ 648 + ldq $0,56($17) \n\ 649 + \n\ 650 + ldq $1,56($18) \n\ 651 + ldq $2,56($19) \n\ 652 + ldq $3,56($20) \n\ 653 + xor $6,$7,$7 # 8 cycles from $6 load \n\ 654 + \n\ 655 + ldq $31,256($17) \n\ 656 + xor $21,$22,$22 # 8 cycles from $22 load \n\ 657 + ldq $31,256($18) \n\ 658 + xor $7,$22,$22 \n\ 659 + \n\ 660 + ldq $31,256($19) \n\ 661 + xor $23,$24,$24 # 6 cycles from $24 load \n\ 662 + ldq $31,256($20) \n\ 663 + xor $25,$27,$27 # 6 cycles from $27 load \n\ 664 + \n\ 665 + stq $5,32($17) \n\ 666 + xor $24,$27,$27 \n\ 667 + xor $0,$1,$1 # 7 cycles from $1 load \n\ 668 + xor $2,$3,$3 # 6 cycles from $3 load \n\ 669 + \n\ 670 + stq $22,40($17) \n\ 671 + xor $1,$3,$3 \n\ 672 + stq $27,48($17) \n\ 673 + subq $16,1,$16 \n\ 674 + \n\ 675 + stq $3,56($17) \n\ 676 + addq $20,64,$20 \n\ 677 + addq $19,64,$19 \n\ 678 + addq $18,64,$18 \n\ 679 + \n\ 680 + addq $17,64,$17 \n\ 681 + bgt $16,4b \n\ 
682 + ret \n\ 683 + .end xor_alpha_prefetch_4 \n\ 684 + \n\ 685 + .align 3 \n\ 686 + .ent xor_alpha_prefetch_5 \n\ 687 + xor_alpha_prefetch_5: \n\ 688 + .prologue 0 \n\ 689 + srl $16, 6, $16 \n\ 690 + \n\ 691 + ldq $31, 0($17) \n\ 692 + ldq $31, 0($18) \n\ 693 + ldq $31, 0($19) \n\ 694 + ldq $31, 0($20) \n\ 695 + ldq $31, 0($21) \n\ 696 + \n\ 697 + ldq $31, 64($17) \n\ 698 + ldq $31, 64($18) \n\ 699 + ldq $31, 64($19) \n\ 700 + ldq $31, 64($20) \n\ 701 + ldq $31, 64($21) \n\ 702 + \n\ 703 + ldq $31, 128($17) \n\ 704 + ldq $31, 128($18) \n\ 705 + ldq $31, 128($19) \n\ 706 + ldq $31, 128($20) \n\ 707 + ldq $31, 128($21) \n\ 708 + \n\ 709 + ldq $31, 192($17) \n\ 710 + ldq $31, 192($18) \n\ 711 + ldq $31, 192($19) \n\ 712 + ldq $31, 192($20) \n\ 713 + ldq $31, 192($21) \n\ 714 + .align 4 \n\ 715 + 5: \n\ 716 + ldq $0,0($17) \n\ 717 + ldq $1,0($18) \n\ 718 + ldq $2,0($19) \n\ 719 + ldq $3,0($20) \n\ 720 + \n\ 721 + ldq $4,0($21) \n\ 722 + ldq $5,8($17) \n\ 723 + ldq $6,8($18) \n\ 724 + ldq $7,8($19) \n\ 725 + \n\ 726 + ldq $22,8($20) \n\ 727 + ldq $23,8($21) \n\ 728 + ldq $24,16($17) \n\ 729 + ldq $25,16($18) \n\ 730 + \n\ 731 + ldq $27,16($19) \n\ 732 + xor $0,$1,$1 # 6 cycles from $1 load \n\ 733 + ldq $28,16($20) \n\ 734 + xor $2,$3,$3 # 6 cycles from $3 load \n\ 735 + \n\ 736 + ldq $0,16($21) \n\ 737 + xor $1,$3,$3 \n\ 738 + ldq $1,24($17) \n\ 739 + xor $3,$4,$4 # 7 cycles from $4 load \n\ 740 + \n\ 741 + stq $4,0($17) \n\ 742 + xor $5,$6,$6 # 7 cycles from $6 load \n\ 743 + xor $7,$22,$22 # 7 cycles from $22 load \n\ 744 + xor $6,$23,$23 # 7 cycles from $23 load \n\ 745 + \n\ 746 + ldq $2,24($18) \n\ 747 + xor $22,$23,$23 \n\ 748 + ldq $3,24($19) \n\ 749 + xor $24,$25,$25 # 8 cycles from $25 load \n\ 750 + \n\ 751 + stq $23,8($17) \n\ 752 + xor $25,$27,$27 # 8 cycles from $27 load \n\ 753 + ldq $4,24($20) \n\ 754 + xor $28,$0,$0 # 7 cycles from $0 load \n\ 755 + \n\ 756 + ldq $5,24($21) \n\ 757 + xor $27,$0,$0 \n\ 758 + ldq $6,32($17) \n\ 759 + ldq $7,32($18) \n\ 760 + \n\ 761 + stq $0,16($17) \n\ 762 + xor $1,$2,$2 # 6 cycles from $2 load \n\ 763 + ldq $22,32($19) \n\ 764 + xor $3,$4,$4 # 4 cycles from $4 load \n\ 765 + \n\ 766 + ldq $23,32($20) \n\ 767 + xor $2,$4,$4 \n\ 768 + ldq $24,32($21) \n\ 769 + ldq $25,40($17) \n\ 770 + \n\ 771 + ldq $27,40($18) \n\ 772 + ldq $28,40($19) \n\ 773 + ldq $0,40($20) \n\ 774 + xor $4,$5,$5 # 7 cycles from $5 load \n\ 775 + \n\ 776 + stq $5,24($17) \n\ 777 + xor $6,$7,$7 # 7 cycles from $7 load \n\ 778 + ldq $1,40($21) \n\ 779 + ldq $2,48($17) \n\ 780 + \n\ 781 + ldq $3,48($18) \n\ 782 + xor $7,$22,$22 # 7 cycles from $22 load \n\ 783 + ldq $4,48($19) \n\ 784 + xor $23,$24,$24 # 6 cycles from $24 load \n\ 785 + \n\ 786 + ldq $5,48($20) \n\ 787 + xor $22,$24,$24 \n\ 788 + ldq $6,48($21) \n\ 789 + xor $25,$27,$27 # 7 cycles from $27 load \n\ 790 + \n\ 791 + stq $24,32($17) \n\ 792 + xor $27,$28,$28 # 8 cycles from $28 load \n\ 793 + ldq $7,56($17) \n\ 794 + xor $0,$1,$1 # 6 cycles from $1 load \n\ 795 + \n\ 796 + ldq $22,56($18) \n\ 797 + ldq $23,56($19) \n\ 798 + ldq $24,56($20) \n\ 799 + ldq $25,56($21) \n\ 800 + \n\ 801 + ldq $31,256($17) \n\ 802 + xor $28,$1,$1 \n\ 803 + ldq $31,256($18) \n\ 804 + xor $2,$3,$3 # 9 cycles from $3 load \n\ 805 + \n\ 806 + ldq $31,256($19) \n\ 807 + xor $3,$4,$4 # 9 cycles from $4 load \n\ 808 + ldq $31,256($20) \n\ 809 + xor $5,$6,$6 # 8 cycles from $6 load \n\ 810 + \n\ 811 + stq $1,40($17) \n\ 812 + xor $4,$6,$6 \n\ 813 + xor $7,$22,$22 # 7 cycles from $22 load \n\ 814 + xor $23,$24,$24 # 6 cycles from $24 load \n\ 
815 + \n\ 816 + stq $6,48($17) \n\ 817 + xor $22,$24,$24 \n\ 818 + ldq $31,256($21) \n\ 819 + xor $24,$25,$25 # 8 cycles from $25 load \n\ 820 + \n\ 821 + stq $25,56($17) \n\ 822 + subq $16,1,$16 \n\ 823 + addq $21,64,$21 \n\ 824 + addq $20,64,$20 \n\ 825 + \n\ 826 + addq $19,64,$19 \n\ 827 + addq $18,64,$18 \n\ 828 + addq $17,64,$17 \n\ 829 + bgt $16,5b \n\ 830 + \n\ 831 + ret \n\ 832 + .end xor_alpha_prefetch_5 \n\ 833 + "); 834 + 835 + DO_XOR_BLOCKS(alpha, xor_alpha_2, xor_alpha_3, xor_alpha_4, xor_alpha_5); 836 + 837 + struct xor_block_template xor_block_alpha = { 838 + .name = "alpha", 839 + .xor_gen = xor_gen_alpha, 840 + }; 841 + 842 + DO_XOR_BLOCKS(alpha_prefetch, xor_alpha_prefetch_2, xor_alpha_prefetch_3, 843 + xor_alpha_prefetch_4, xor_alpha_prefetch_5); 844 + 845 + struct xor_block_template xor_block_alpha_prefetch = { 846 + .name = "alpha prefetch", 847 + .xor_gen = xor_gen_alpha_prefetch, 848 + };
+22
lib/raid/xor/alpha/xor_arch.h
··· 1 + /* SPDX-License-Identifier: GPL-2.0-or-later */ 2 + 3 + #include <asm/special_insns.h> 4 + 5 + extern struct xor_block_template xor_block_alpha; 6 + extern struct xor_block_template xor_block_alpha_prefetch; 7 + 8 + /* 9 + * Force the use of alpha_prefetch if EV6, as it is significantly faster in the 10 + * cold cache case. 11 + */ 12 + static __always_inline void __init arch_xor_init(void) 13 + { 14 + if (implver() == IMPLVER_EV6) { 15 + xor_force(&xor_block_alpha_prefetch); 16 + } else { 17 + xor_register(&xor_block_8regs); 18 + xor_register(&xor_block_32regs); 19 + xor_register(&xor_block_alpha); 20 + xor_register(&xor_block_alpha_prefetch); 21 + } 22 + }
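The two entry points used here recur in every xor_arch.h in the series: xor_register() adds a candidate implementation for the core module to choose among, while xor_force() short-circuits that selection for a known-best routine, as the EV6 comment above notes. A sketch for a hypothetical architecture (the feature check and template name are made up):

	static __always_inline void __init arch_xor_init(void)
	{
		if (cpu_has_fast_simd())	/* hypothetical feature check */
			xor_force(&xor_block_myarch_simd);
		else
			xor_register(&xor_block_8regs);	/* generic fallback */
	}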
+19
lib/raid/xor/arm/xor-neon-glue.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-only 2 + /* 3 + * Copyright (C) 2001 Russell King 4 + */ 5 + #include "xor_impl.h" 6 + #include "xor_arch.h" 7 + 8 + static void xor_gen_neon(void *dest, void **srcs, unsigned int src_cnt, 9 + unsigned int bytes) 10 + { 11 + kernel_neon_begin(); 12 + xor_gen_neon_inner(dest, srcs, src_cnt, bytes); 13 + kernel_neon_end(); 14 + } 15 + 16 + struct xor_block_template xor_block_neon = { 17 + .name = "neon", 18 + .xor_gen = xor_gen_neon, 19 + };
+26
lib/raid/xor/arm/xor-neon.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-only 2 + /* 3 + * Copyright (C) 2013 Linaro Ltd <ard.biesheuvel@linaro.org> 4 + */ 5 + 6 + #include "xor_impl.h" 7 + #include "xor_arch.h" 8 + 9 + #ifndef __ARM_NEON__ 10 + #error You should compile this file with '-march=armv7-a -mfloat-abi=softfp -mfpu=neon' 11 + #endif 12 + 13 + /* 14 + * Pull in the reference implementations while instructing GCC (through 15 + * -ftree-vectorize) to attempt to exploit implicit parallelism and emit 16 + * NEON instructions. Clang does this by default at O2 so no pragma is 17 + * needed. 18 + */ 19 + #ifdef CONFIG_CC_IS_GCC 20 + #pragma GCC optimize "tree-vectorize" 21 + #endif 22 + 23 + #define NO_TEMPLATE 24 + #include "../xor-8regs.c" 25 + 26 + __DO_XOR_BLOCKS(neon_inner, xor_8regs_2, xor_8regs_3, xor_8regs_4, xor_8regs_5);
+136
lib/raid/xor/arm/xor.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-only 2 + /* 3 + * Copyright (C) 2001 Russell King 4 + */ 5 + #include "xor_impl.h" 6 + #include "xor_arch.h" 7 + 8 + #define __XOR(a1, a2) a1 ^= a2 9 + 10 + #define GET_BLOCK_2(dst) \ 11 + __asm__("ldmia %0, {%1, %2}" \ 12 + : "=r" (dst), "=r" (a1), "=r" (a2) \ 13 + : "0" (dst)) 14 + 15 + #define GET_BLOCK_4(dst) \ 16 + __asm__("ldmia %0, {%1, %2, %3, %4}" \ 17 + : "=r" (dst), "=r" (a1), "=r" (a2), "=r" (a3), "=r" (a4) \ 18 + : "0" (dst)) 19 + 20 + #define XOR_BLOCK_2(src) \ 21 + __asm__("ldmia %0!, {%1, %2}" \ 22 + : "=r" (src), "=r" (b1), "=r" (b2) \ 23 + : "0" (src)); \ 24 + __XOR(a1, b1); __XOR(a2, b2); 25 + 26 + #define XOR_BLOCK_4(src) \ 27 + __asm__("ldmia %0!, {%1, %2, %3, %4}" \ 28 + : "=r" (src), "=r" (b1), "=r" (b2), "=r" (b3), "=r" (b4) \ 29 + : "0" (src)); \ 30 + __XOR(a1, b1); __XOR(a2, b2); __XOR(a3, b3); __XOR(a4, b4) 31 + 32 + #define PUT_BLOCK_2(dst) \ 33 + __asm__ __volatile__("stmia %0!, {%2, %3}" \ 34 + : "=r" (dst) \ 35 + : "0" (dst), "r" (a1), "r" (a2)) 36 + 37 + #define PUT_BLOCK_4(dst) \ 38 + __asm__ __volatile__("stmia %0!, {%2, %3, %4, %5}" \ 39 + : "=r" (dst) \ 40 + : "0" (dst), "r" (a1), "r" (a2), "r" (a3), "r" (a4)) 41 + 42 + static void 43 + xor_arm4regs_2(unsigned long bytes, unsigned long * __restrict p1, 44 + const unsigned long * __restrict p2) 45 + { 46 + unsigned int lines = bytes / sizeof(unsigned long) / 4; 47 + register unsigned int a1 __asm__("r4"); 48 + register unsigned int a2 __asm__("r5"); 49 + register unsigned int a3 __asm__("r6"); 50 + register unsigned int a4 __asm__("r10"); 51 + register unsigned int b1 __asm__("r8"); 52 + register unsigned int b2 __asm__("r9"); 53 + register unsigned int b3 __asm__("ip"); 54 + register unsigned int b4 __asm__("lr"); 55 + 56 + do { 57 + GET_BLOCK_4(p1); 58 + XOR_BLOCK_4(p2); 59 + PUT_BLOCK_4(p1); 60 + } while (--lines); 61 + } 62 + 63 + static void 64 + xor_arm4regs_3(unsigned long bytes, unsigned long * __restrict p1, 65 + const unsigned long * __restrict p2, 66 + const unsigned long * __restrict p3) 67 + { 68 + unsigned int lines = bytes / sizeof(unsigned long) / 4; 69 + register unsigned int a1 __asm__("r4"); 70 + register unsigned int a2 __asm__("r5"); 71 + register unsigned int a3 __asm__("r6"); 72 + register unsigned int a4 __asm__("r10"); 73 + register unsigned int b1 __asm__("r8"); 74 + register unsigned int b2 __asm__("r9"); 75 + register unsigned int b3 __asm__("ip"); 76 + register unsigned int b4 __asm__("lr"); 77 + 78 + do { 79 + GET_BLOCK_4(p1); 80 + XOR_BLOCK_4(p2); 81 + XOR_BLOCK_4(p3); 82 + PUT_BLOCK_4(p1); 83 + } while (--lines); 84 + } 85 + 86 + static void 87 + xor_arm4regs_4(unsigned long bytes, unsigned long * __restrict p1, 88 + const unsigned long * __restrict p2, 89 + const unsigned long * __restrict p3, 90 + const unsigned long * __restrict p4) 91 + { 92 + unsigned int lines = bytes / sizeof(unsigned long) / 2; 93 + register unsigned int a1 __asm__("r8"); 94 + register unsigned int a2 __asm__("r9"); 95 + register unsigned int b1 __asm__("ip"); 96 + register unsigned int b2 __asm__("lr"); 97 + 98 + do { 99 + GET_BLOCK_2(p1); 100 + XOR_BLOCK_2(p2); 101 + XOR_BLOCK_2(p3); 102 + XOR_BLOCK_2(p4); 103 + PUT_BLOCK_2(p1); 104 + } while (--lines); 105 + } 106 + 107 + static void 108 + xor_arm4regs_5(unsigned long bytes, unsigned long * __restrict p1, 109 + const unsigned long * __restrict p2, 110 + const unsigned long * __restrict p3, 111 + const unsigned long * __restrict p4, 112 + const unsigned long * __restrict p5) 113 + { 114 + unsigned int 
lines = bytes / sizeof(unsigned long) / 2; 115 + register unsigned int a1 __asm__("r8"); 116 + register unsigned int a2 __asm__("r9"); 117 + register unsigned int b1 __asm__("ip"); 118 + register unsigned int b2 __asm__("lr"); 119 + 120 + do { 121 + GET_BLOCK_2(p1); 122 + XOR_BLOCK_2(p2); 123 + XOR_BLOCK_2(p3); 124 + XOR_BLOCK_2(p4); 125 + XOR_BLOCK_2(p5); 126 + PUT_BLOCK_2(p1); 127 + } while (--lines); 128 + } 129 + 130 + DO_XOR_BLOCKS(arm4regs, xor_arm4regs_2, xor_arm4regs_3, xor_arm4regs_4, 131 + xor_arm4regs_5); 132 + 133 + struct xor_block_template xor_block_arm4regs = { 134 + .name = "arm4regs", 135 + .xor_gen = xor_gen_arm4regs, 136 + };
+22
lib/raid/xor/arm/xor_arch.h
··· 1 + /* SPDX-License-Identifier: GPL-2.0-only */ 2 + /* 3 + * Copyright (C) 2001 Russell King 4 + */ 5 + #include <asm/neon.h> 6 + 7 + extern struct xor_block_template xor_block_arm4regs; 8 + extern struct xor_block_template xor_block_neon; 9 + 10 + void xor_gen_neon_inner(void *dest, void **srcs, unsigned int src_cnt, 11 + unsigned int bytes); 12 + 13 + static __always_inline void __init arch_xor_init(void) 14 + { 15 + xor_register(&xor_block_arm4regs); 16 + xor_register(&xor_block_8regs); 17 + xor_register(&xor_block_32regs); 18 + #ifdef CONFIG_KERNEL_MODE_NEON 19 + if (cpu_has_neon()) 20 + xor_register(&xor_block_neon); 21 + #endif 22 + }
+26
lib/raid/xor/arm64/xor-neon-glue.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-only 2 + /* 3 + * Authors: Jackie Liu <liuyun01@kylinos.cn> 4 + * Copyright (C) 2018,Tianjin KYLIN Information Technology Co., Ltd. 5 + */ 6 + 7 + #include <asm/simd.h> 8 + #include "xor_impl.h" 9 + #include "xor_arch.h" 10 + #include "xor-neon.h" 11 + 12 + #define XOR_TEMPLATE(_name) \ 13 + static void xor_gen_##_name(void *dest, void **srcs, unsigned int src_cnt, \ 14 + unsigned int bytes) \ 15 + { \ 16 + scoped_ksimd() \ 17 + xor_gen_##_name##_inner(dest, srcs, src_cnt, bytes); \ 18 + } \ 19 + \ 20 + struct xor_block_template xor_block_##_name = { \ 21 + .name = __stringify(_name), \ 22 + .xor_gen = xor_gen_##_name, \ 23 + }; 24 + 25 + XOR_TEMPLATE(neon); 26 + XOR_TEMPLATE(eor3);
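Written out by hand for clarity, XOR_TEMPLATE(neon) above expands to roughly the following; the scoped_ksimd() block bounds the kernel-mode SIMD context around the inner routine:

	static void xor_gen_neon(void *dest, void **srcs, unsigned int src_cnt,
				 unsigned int bytes)
	{
		scoped_ksimd()
			xor_gen_neon_inner(dest, srcs, src_cnt, bytes);
	}

	struct xor_block_template xor_block_neon = {
		.name = "neon",
		.xor_gen = xor_gen_neon,
	};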
+312
lib/raid/xor/arm64/xor-neon.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-only 2 + /* 3 + * Authors: Jackie Liu <liuyun01@kylinos.cn> 4 + * Copyright (C) 2018,Tianjin KYLIN Information Technology Co., Ltd. 5 + */ 6 + 7 + #include <linux/cache.h> 8 + #include <asm/neon-intrinsics.h> 9 + #include "xor_impl.h" 10 + #include "xor_arch.h" 11 + #include "xor-neon.h" 12 + 13 + static void __xor_neon_2(unsigned long bytes, unsigned long * __restrict p1, 14 + const unsigned long * __restrict p2) 15 + { 16 + uint64_t *dp1 = (uint64_t *)p1; 17 + uint64_t *dp2 = (uint64_t *)p2; 18 + 19 + register uint64x2_t v0, v1, v2, v3; 20 + long lines = bytes / (sizeof(uint64x2_t) * 4); 21 + 22 + do { 23 + /* p1 ^= p2 */ 24 + v0 = veorq_u64(vld1q_u64(dp1 + 0), vld1q_u64(dp2 + 0)); 25 + v1 = veorq_u64(vld1q_u64(dp1 + 2), vld1q_u64(dp2 + 2)); 26 + v2 = veorq_u64(vld1q_u64(dp1 + 4), vld1q_u64(dp2 + 4)); 27 + v3 = veorq_u64(vld1q_u64(dp1 + 6), vld1q_u64(dp2 + 6)); 28 + 29 + /* store */ 30 + vst1q_u64(dp1 + 0, v0); 31 + vst1q_u64(dp1 + 2, v1); 32 + vst1q_u64(dp1 + 4, v2); 33 + vst1q_u64(dp1 + 6, v3); 34 + 35 + dp1 += 8; 36 + dp2 += 8; 37 + } while (--lines > 0); 38 + } 39 + 40 + static void __xor_neon_3(unsigned long bytes, unsigned long * __restrict p1, 41 + const unsigned long * __restrict p2, 42 + const unsigned long * __restrict p3) 43 + { 44 + uint64_t *dp1 = (uint64_t *)p1; 45 + uint64_t *dp2 = (uint64_t *)p2; 46 + uint64_t *dp3 = (uint64_t *)p3; 47 + 48 + register uint64x2_t v0, v1, v2, v3; 49 + long lines = bytes / (sizeof(uint64x2_t) * 4); 50 + 51 + do { 52 + /* p1 ^= p2 */ 53 + v0 = veorq_u64(vld1q_u64(dp1 + 0), vld1q_u64(dp2 + 0)); 54 + v1 = veorq_u64(vld1q_u64(dp1 + 2), vld1q_u64(dp2 + 2)); 55 + v2 = veorq_u64(vld1q_u64(dp1 + 4), vld1q_u64(dp2 + 4)); 56 + v3 = veorq_u64(vld1q_u64(dp1 + 6), vld1q_u64(dp2 + 6)); 57 + 58 + /* p1 ^= p3 */ 59 + v0 = veorq_u64(v0, vld1q_u64(dp3 + 0)); 60 + v1 = veorq_u64(v1, vld1q_u64(dp3 + 2)); 61 + v2 = veorq_u64(v2, vld1q_u64(dp3 + 4)); 62 + v3 = veorq_u64(v3, vld1q_u64(dp3 + 6)); 63 + 64 + /* store */ 65 + vst1q_u64(dp1 + 0, v0); 66 + vst1q_u64(dp1 + 2, v1); 67 + vst1q_u64(dp1 + 4, v2); 68 + vst1q_u64(dp1 + 6, v3); 69 + 70 + dp1 += 8; 71 + dp2 += 8; 72 + dp3 += 8; 73 + } while (--lines > 0); 74 + } 75 + 76 + static void __xor_neon_4(unsigned long bytes, unsigned long * __restrict p1, 77 + const unsigned long * __restrict p2, 78 + const unsigned long * __restrict p3, 79 + const unsigned long * __restrict p4) 80 + { 81 + uint64_t *dp1 = (uint64_t *)p1; 82 + uint64_t *dp2 = (uint64_t *)p2; 83 + uint64_t *dp3 = (uint64_t *)p3; 84 + uint64_t *dp4 = (uint64_t *)p4; 85 + 86 + register uint64x2_t v0, v1, v2, v3; 87 + long lines = bytes / (sizeof(uint64x2_t) * 4); 88 + 89 + do { 90 + /* p1 ^= p2 */ 91 + v0 = veorq_u64(vld1q_u64(dp1 + 0), vld1q_u64(dp2 + 0)); 92 + v1 = veorq_u64(vld1q_u64(dp1 + 2), vld1q_u64(dp2 + 2)); 93 + v2 = veorq_u64(vld1q_u64(dp1 + 4), vld1q_u64(dp2 + 4)); 94 + v3 = veorq_u64(vld1q_u64(dp1 + 6), vld1q_u64(dp2 + 6)); 95 + 96 + /* p1 ^= p3 */ 97 + v0 = veorq_u64(v0, vld1q_u64(dp3 + 0)); 98 + v1 = veorq_u64(v1, vld1q_u64(dp3 + 2)); 99 + v2 = veorq_u64(v2, vld1q_u64(dp3 + 4)); 100 + v3 = veorq_u64(v3, vld1q_u64(dp3 + 6)); 101 + 102 + /* p1 ^= p4 */ 103 + v0 = veorq_u64(v0, vld1q_u64(dp4 + 0)); 104 + v1 = veorq_u64(v1, vld1q_u64(dp4 + 2)); 105 + v2 = veorq_u64(v2, vld1q_u64(dp4 + 4)); 106 + v3 = veorq_u64(v3, vld1q_u64(dp4 + 6)); 107 + 108 + /* store */ 109 + vst1q_u64(dp1 + 0, v0); 110 + vst1q_u64(dp1 + 2, v1); 111 + vst1q_u64(dp1 + 4, v2); 112 + vst1q_u64(dp1 + 6, v3); 113 + 114 + dp1 += 8; 115 
+ dp2 += 8; 116 + dp3 += 8; 117 + dp4 += 8; 118 + } while (--lines > 0); 119 + } 120 + 121 + static void __xor_neon_5(unsigned long bytes, unsigned long * __restrict p1, 122 + const unsigned long * __restrict p2, 123 + const unsigned long * __restrict p3, 124 + const unsigned long * __restrict p4, 125 + const unsigned long * __restrict p5) 126 + { 127 + uint64_t *dp1 = (uint64_t *)p1; 128 + uint64_t *dp2 = (uint64_t *)p2; 129 + uint64_t *dp3 = (uint64_t *)p3; 130 + uint64_t *dp4 = (uint64_t *)p4; 131 + uint64_t *dp5 = (uint64_t *)p5; 132 + 133 + register uint64x2_t v0, v1, v2, v3; 134 + long lines = bytes / (sizeof(uint64x2_t) * 4); 135 + 136 + do { 137 + /* p1 ^= p2 */ 138 + v0 = veorq_u64(vld1q_u64(dp1 + 0), vld1q_u64(dp2 + 0)); 139 + v1 = veorq_u64(vld1q_u64(dp1 + 2), vld1q_u64(dp2 + 2)); 140 + v2 = veorq_u64(vld1q_u64(dp1 + 4), vld1q_u64(dp2 + 4)); 141 + v3 = veorq_u64(vld1q_u64(dp1 + 6), vld1q_u64(dp2 + 6)); 142 + 143 + /* p1 ^= p3 */ 144 + v0 = veorq_u64(v0, vld1q_u64(dp3 + 0)); 145 + v1 = veorq_u64(v1, vld1q_u64(dp3 + 2)); 146 + v2 = veorq_u64(v2, vld1q_u64(dp3 + 4)); 147 + v3 = veorq_u64(v3, vld1q_u64(dp3 + 6)); 148 + 149 + /* p1 ^= p4 */ 150 + v0 = veorq_u64(v0, vld1q_u64(dp4 + 0)); 151 + v1 = veorq_u64(v1, vld1q_u64(dp4 + 2)); 152 + v2 = veorq_u64(v2, vld1q_u64(dp4 + 4)); 153 + v3 = veorq_u64(v3, vld1q_u64(dp4 + 6)); 154 + 155 + /* p1 ^= p5 */ 156 + v0 = veorq_u64(v0, vld1q_u64(dp5 + 0)); 157 + v1 = veorq_u64(v1, vld1q_u64(dp5 + 2)); 158 + v2 = veorq_u64(v2, vld1q_u64(dp5 + 4)); 159 + v3 = veorq_u64(v3, vld1q_u64(dp5 + 6)); 160 + 161 + /* store */ 162 + vst1q_u64(dp1 + 0, v0); 163 + vst1q_u64(dp1 + 2, v1); 164 + vst1q_u64(dp1 + 4, v2); 165 + vst1q_u64(dp1 + 6, v3); 166 + 167 + dp1 += 8; 168 + dp2 += 8; 169 + dp3 += 8; 170 + dp4 += 8; 171 + dp5 += 8; 172 + } while (--lines > 0); 173 + } 174 + 175 + __DO_XOR_BLOCKS(neon_inner, __xor_neon_2, __xor_neon_3, __xor_neon_4, 176 + __xor_neon_5); 177 + 178 + static inline uint64x2_t eor3(uint64x2_t p, uint64x2_t q, uint64x2_t r) 179 + { 180 + uint64x2_t res; 181 + 182 + asm(ARM64_ASM_PREAMBLE ".arch_extension sha3\n" 183 + "eor3 %0.16b, %1.16b, %2.16b, %3.16b" 184 + : "=w"(res) : "w"(p), "w"(q), "w"(r)); 185 + return res; 186 + } 187 + 188 + static void __xor_eor3_3(unsigned long bytes, unsigned long * __restrict p1, 189 + const unsigned long * __restrict p2, 190 + const unsigned long * __restrict p3) 191 + { 192 + uint64_t *dp1 = (uint64_t *)p1; 193 + uint64_t *dp2 = (uint64_t *)p2; 194 + uint64_t *dp3 = (uint64_t *)p3; 195 + 196 + register uint64x2_t v0, v1, v2, v3; 197 + long lines = bytes / (sizeof(uint64x2_t) * 4); 198 + 199 + do { 200 + /* p1 ^= p2 ^ p3 */ 201 + v0 = eor3(vld1q_u64(dp1 + 0), vld1q_u64(dp2 + 0), 202 + vld1q_u64(dp3 + 0)); 203 + v1 = eor3(vld1q_u64(dp1 + 2), vld1q_u64(dp2 + 2), 204 + vld1q_u64(dp3 + 2)); 205 + v2 = eor3(vld1q_u64(dp1 + 4), vld1q_u64(dp2 + 4), 206 + vld1q_u64(dp3 + 4)); 207 + v3 = eor3(vld1q_u64(dp1 + 6), vld1q_u64(dp2 + 6), 208 + vld1q_u64(dp3 + 6)); 209 + 210 + /* store */ 211 + vst1q_u64(dp1 + 0, v0); 212 + vst1q_u64(dp1 + 2, v1); 213 + vst1q_u64(dp1 + 4, v2); 214 + vst1q_u64(dp1 + 6, v3); 215 + 216 + dp1 += 8; 217 + dp2 += 8; 218 + dp3 += 8; 219 + } while (--lines > 0); 220 + } 221 + 222 + static void __xor_eor3_4(unsigned long bytes, unsigned long * __restrict p1, 223 + const unsigned long * __restrict p2, 224 + const unsigned long * __restrict p3, 225 + const unsigned long * __restrict p4) 226 + { 227 + uint64_t *dp1 = (uint64_t *)p1; 228 + uint64_t *dp2 = (uint64_t *)p2; 229 + uint64_t *dp3 = 
(uint64_t *)p3; 230 + uint64_t *dp4 = (uint64_t *)p4; 231 + 232 + register uint64x2_t v0, v1, v2, v3; 233 + long lines = bytes / (sizeof(uint64x2_t) * 4); 234 + 235 + do { 236 + /* p1 ^= p2 ^ p3 */ 237 + v0 = eor3(vld1q_u64(dp1 + 0), vld1q_u64(dp2 + 0), 238 + vld1q_u64(dp3 + 0)); 239 + v1 = eor3(vld1q_u64(dp1 + 2), vld1q_u64(dp2 + 2), 240 + vld1q_u64(dp3 + 2)); 241 + v2 = eor3(vld1q_u64(dp1 + 4), vld1q_u64(dp2 + 4), 242 + vld1q_u64(dp3 + 4)); 243 + v3 = eor3(vld1q_u64(dp1 + 6), vld1q_u64(dp2 + 6), 244 + vld1q_u64(dp3 + 6)); 245 + 246 + /* p1 ^= p4 */ 247 + v0 = veorq_u64(v0, vld1q_u64(dp4 + 0)); 248 + v1 = veorq_u64(v1, vld1q_u64(dp4 + 2)); 249 + v2 = veorq_u64(v2, vld1q_u64(dp4 + 4)); 250 + v3 = veorq_u64(v3, vld1q_u64(dp4 + 6)); 251 + 252 + /* store */ 253 + vst1q_u64(dp1 + 0, v0); 254 + vst1q_u64(dp1 + 2, v1); 255 + vst1q_u64(dp1 + 4, v2); 256 + vst1q_u64(dp1 + 6, v3); 257 + 258 + dp1 += 8; 259 + dp2 += 8; 260 + dp3 += 8; 261 + dp4 += 8; 262 + } while (--lines > 0); 263 + } 264 + 265 + static void __xor_eor3_5(unsigned long bytes, unsigned long * __restrict p1, 266 + const unsigned long * __restrict p2, 267 + const unsigned long * __restrict p3, 268 + const unsigned long * __restrict p4, 269 + const unsigned long * __restrict p5) 270 + { 271 + uint64_t *dp1 = (uint64_t *)p1; 272 + uint64_t *dp2 = (uint64_t *)p2; 273 + uint64_t *dp3 = (uint64_t *)p3; 274 + uint64_t *dp4 = (uint64_t *)p4; 275 + uint64_t *dp5 = (uint64_t *)p5; 276 + 277 + register uint64x2_t v0, v1, v2, v3; 278 + long lines = bytes / (sizeof(uint64x2_t) * 4); 279 + 280 + do { 281 + /* p1 ^= p2 ^ p3 */ 282 + v0 = eor3(vld1q_u64(dp1 + 0), vld1q_u64(dp2 + 0), 283 + vld1q_u64(dp3 + 0)); 284 + v1 = eor3(vld1q_u64(dp1 + 2), vld1q_u64(dp2 + 2), 285 + vld1q_u64(dp3 + 2)); 286 + v2 = eor3(vld1q_u64(dp1 + 4), vld1q_u64(dp2 + 4), 287 + vld1q_u64(dp3 + 4)); 288 + v3 = eor3(vld1q_u64(dp1 + 6), vld1q_u64(dp2 + 6), 289 + vld1q_u64(dp3 + 6)); 290 + 291 + /* p1 ^= p4 ^ p5 */ 292 + v0 = eor3(v0, vld1q_u64(dp4 + 0), vld1q_u64(dp5 + 0)); 293 + v1 = eor3(v1, vld1q_u64(dp4 + 2), vld1q_u64(dp5 + 2)); 294 + v2 = eor3(v2, vld1q_u64(dp4 + 4), vld1q_u64(dp5 + 4)); 295 + v3 = eor3(v3, vld1q_u64(dp4 + 6), vld1q_u64(dp5 + 6)); 296 + 297 + /* store */ 298 + vst1q_u64(dp1 + 0, v0); 299 + vst1q_u64(dp1 + 2, v1); 300 + vst1q_u64(dp1 + 4, v2); 301 + vst1q_u64(dp1 + 6, v3); 302 + 303 + dp1 += 8; 304 + dp2 += 8; 305 + dp3 += 8; 306 + dp4 += 8; 307 + dp5 += 8; 308 + } while (--lines > 0); 309 + } 310 + 311 + __DO_XOR_BLOCKS(eor3_inner, __xor_neon_2, __xor_eor3_3, __xor_eor3_4, 312 + __xor_eor3_5);
+6
lib/raid/xor/arm64/xor-neon.h
··· 1 + /* SPDX-License-Identifier: GPL-2.0-only */ 2 + 3 + void xor_gen_neon_inner(void *dest, void **srcs, unsigned int src_cnt, 4 + unsigned int bytes); 5 + void xor_gen_eor3_inner(void *dest, void **srcs, unsigned int src_cnt, 6 + unsigned int bytes);
+21
lib/raid/xor/arm64/xor_arch.h
··· 1 + /* SPDX-License-Identifier: GPL-2.0-only */ 2 + /* 3 + * Authors: Jackie Liu <liuyun01@kylinos.cn> 4 + * Copyright (C) 2018,Tianjin KYLIN Information Technology Co., Ltd. 5 + */ 6 + #include <asm/simd.h> 7 + 8 + extern struct xor_block_template xor_block_neon; 9 + extern struct xor_block_template xor_block_eor3; 10 + 11 + static __always_inline void __init arch_xor_init(void) 12 + { 13 + xor_register(&xor_block_8regs); 14 + xor_register(&xor_block_32regs); 15 + if (cpu_has_neon()) { 16 + if (cpu_have_named_feature(SHA3)) 17 + xor_register(&xor_block_eor3); 18 + else 19 + xor_register(&xor_block_neon); 20 + } 21 + }
+33
lib/raid/xor/loongarch/xor_arch.h
··· 1 + /* SPDX-License-Identifier: GPL-2.0-or-later */ 2 + /* 3 + * Copyright (C) 2023 WANG Xuerui <git@xen0n.name> 4 + */ 5 + #include <asm/cpu-features.h> 6 + 7 + /* 8 + * For grins, also test the generic routines. 9 + * 10 + * More importantly: it cannot be ruled out at this point in time that some 11 + * future (maybe reduced) models could run the vector algorithms slower than 12 + * the scalar ones, maybe for errata or micro-op reasons. It may be 13 + * appropriate to revisit this after one or two more uarch generations. 14 + */ 15 + 16 + extern struct xor_block_template xor_block_lsx; 17 + extern struct xor_block_template xor_block_lasx; 18 + 19 + static __always_inline void __init arch_xor_init(void) 20 + { 21 + xor_register(&xor_block_8regs); 22 + xor_register(&xor_block_8regs_p); 23 + xor_register(&xor_block_32regs); 24 + xor_register(&xor_block_32regs_p); 25 + #ifdef CONFIG_CPU_HAS_LSX 26 + if (cpu_has_lsx) 27 + xor_register(&xor_block_lsx); 28 + #endif 29 + #ifdef CONFIG_CPU_HAS_LASX 30 + if (cpu_has_lasx) 31 + xor_register(&xor_block_lasx); 32 + #endif 33 + }
+37
lib/raid/xor/loongarch/xor_simd_glue.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-or-later 2 + /* 3 + * LoongArch SIMD XOR operations 4 + * 5 + * Copyright (C) 2023 WANG Xuerui <git@xen0n.name> 6 + */ 7 + 8 + #include <linux/sched.h> 9 + #include <asm/fpu.h> 10 + #include "xor_impl.h" 11 + #include "xor_arch.h" 12 + #include "xor_simd.h" 13 + 14 + #define MAKE_XOR_GLUES(flavor) \ 15 + DO_XOR_BLOCKS(flavor##_inner, __xor_##flavor##_2, __xor_##flavor##_3, \ 16 + __xor_##flavor##_4, __xor_##flavor##_5); \ 17 + \ 18 + static void xor_gen_##flavor(void *dest, void **srcs, unsigned int src_cnt, \ 19 + unsigned int bytes) \ 20 + { \ 21 + kernel_fpu_begin(); \ 22 + xor_gen_##flavor##_inner(dest, srcs, src_cnt, bytes); \ 23 + kernel_fpu_end(); \ 24 + } \ 25 + \ 26 + struct xor_block_template xor_block_##flavor = { \ 27 + .name = __stringify(flavor), \ 28 + .xor_gen = xor_gen_##flavor \ 29 + } 30 + 31 + #ifdef CONFIG_CPU_HAS_LSX 32 + MAKE_XOR_GLUES(lsx); 33 + #endif /* CONFIG_CPU_HAS_LSX */ 34 + 35 + #ifdef CONFIG_CPU_HAS_LASX 36 + MAKE_XOR_GLUES(lasx); 37 + #endif /* CONFIG_CPU_HAS_LASX */
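For readers untangling the macro above: MAKE_XOR_GLUES(lsx) pastes tokens to produce roughly the following (the DO_XOR_BLOCKS() dispatch body lives in xor_impl.h and is elided; this is a sketch of the expansion, not extra code in the patch):

	static void xor_gen_lsx(void *dest, void **srcs, unsigned int src_cnt,
				unsigned int bytes)
	{
		kernel_fpu_begin();	/* make the LSX vector registers usable */
		xor_gen_lsx_inner(dest, srcs, src_cnt, bytes);
		kernel_fpu_end();
	}

	struct xor_block_template xor_block_lsx = {
		.name = "lsx",
		.xor_gen = xor_gen_lsx
	};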
+22
lib/raid/xor/powerpc/xor_arch.h
··· 1 + /* SPDX-License-Identifier: GPL-2.0-or-later */ 2 + /* 3 + * 4 + * Copyright (C) IBM Corporation, 2012 5 + * 6 + * Author: Anton Blanchard <anton@au.ibm.com> 7 + */ 8 + #include <asm/cpu_has_feature.h> 9 + 10 + extern struct xor_block_template xor_block_altivec; 11 + 12 + static __always_inline void __init arch_xor_init(void) 13 + { 14 + xor_register(&xor_block_8regs); 15 + xor_register(&xor_block_8regs_p); 16 + xor_register(&xor_block_32regs); 17 + xor_register(&xor_block_32regs_p); 18 + #ifdef CONFIG_ALTIVEC 19 + if (cpu_has_feature(CPU_FTR_ALTIVEC)) 20 + xor_register(&xor_block_altivec); 21 + #endif 22 + }
+160
lib/raid/xor/powerpc/xor_vmx.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-or-later 2 + /* 3 + * 4 + * Copyright (C) IBM Corporation, 2012 5 + * 6 + * Author: Anton Blanchard <anton@au.ibm.com> 7 + */ 8 + 9 + /* 10 + * Sparse (as at v0.5.0) gets very, very confused by this file. 11 + * Make it a bit simpler for it. 12 + */ 13 + #include "xor_impl.h" 14 + #if !defined(__CHECKER__) 15 + #include <altivec.h> 16 + #else 17 + #define vec_xor(a, b) a ^ b 18 + #define vector __attribute__((vector_size(16))) 19 + #endif 20 + 21 + #include "xor_vmx.h" 22 + 23 + typedef vector signed char unative_t; 24 + 25 + #define DEFINE(V) \ 26 + unative_t *V = (unative_t *)V##_in; \ 27 + unative_t V##_0, V##_1, V##_2, V##_3 28 + 29 + #define LOAD(V) \ 30 + do { \ 31 + V##_0 = V[0]; \ 32 + V##_1 = V[1]; \ 33 + V##_2 = V[2]; \ 34 + V##_3 = V[3]; \ 35 + } while (0) 36 + 37 + #define STORE(V) \ 38 + do { \ 39 + V[0] = V##_0; \ 40 + V[1] = V##_1; \ 41 + V[2] = V##_2; \ 42 + V[3] = V##_3; \ 43 + } while (0) 44 + 45 + #define XOR(V1, V2) \ 46 + do { \ 47 + V1##_0 = vec_xor(V1##_0, V2##_0); \ 48 + V1##_1 = vec_xor(V1##_1, V2##_1); \ 49 + V1##_2 = vec_xor(V1##_2, V2##_2); \ 50 + V1##_3 = vec_xor(V1##_3, V2##_3); \ 51 + } while (0) 52 + 53 + static void __xor_altivec_2(unsigned long bytes, 54 + unsigned long * __restrict v1_in, 55 + const unsigned long * __restrict v2_in) 56 + { 57 + DEFINE(v1); 58 + DEFINE(v2); 59 + unsigned long lines = bytes / (sizeof(unative_t)) / 4; 60 + 61 + do { 62 + LOAD(v1); 63 + LOAD(v2); 64 + XOR(v1, v2); 65 + STORE(v1); 66 + 67 + v1 += 4; 68 + v2 += 4; 69 + } while (--lines > 0); 70 + } 71 + 72 + static void __xor_altivec_3(unsigned long bytes, 73 + unsigned long * __restrict v1_in, 74 + const unsigned long * __restrict v2_in, 75 + const unsigned long * __restrict v3_in) 76 + { 77 + DEFINE(v1); 78 + DEFINE(v2); 79 + DEFINE(v3); 80 + unsigned long lines = bytes / (sizeof(unative_t)) / 4; 81 + 82 + do { 83 + LOAD(v1); 84 + LOAD(v2); 85 + LOAD(v3); 86 + XOR(v1, v2); 87 + XOR(v1, v3); 88 + STORE(v1); 89 + 90 + v1 += 4; 91 + v2 += 4; 92 + v3 += 4; 93 + } while (--lines > 0); 94 + } 95 + 96 + static void __xor_altivec_4(unsigned long bytes, 97 + unsigned long * __restrict v1_in, 98 + const unsigned long * __restrict v2_in, 99 + const unsigned long * __restrict v3_in, 100 + const unsigned long * __restrict v4_in) 101 + { 102 + DEFINE(v1); 103 + DEFINE(v2); 104 + DEFINE(v3); 105 + DEFINE(v4); 106 + unsigned long lines = bytes / (sizeof(unative_t)) / 4; 107 + 108 + do { 109 + LOAD(v1); 110 + LOAD(v2); 111 + LOAD(v3); 112 + LOAD(v4); 113 + XOR(v1, v2); 114 + XOR(v3, v4); 115 + XOR(v1, v3); 116 + STORE(v1); 117 + 118 + v1 += 4; 119 + v2 += 4; 120 + v3 += 4; 121 + v4 += 4; 122 + } while (--lines > 0); 123 + } 124 + 125 + static void __xor_altivec_5(unsigned long bytes, 126 + unsigned long * __restrict v1_in, 127 + const unsigned long * __restrict v2_in, 128 + const unsigned long * __restrict v3_in, 129 + const unsigned long * __restrict v4_in, 130 + const unsigned long * __restrict v5_in) 131 + { 132 + DEFINE(v1); 133 + DEFINE(v2); 134 + DEFINE(v3); 135 + DEFINE(v4); 136 + DEFINE(v5); 137 + unsigned long lines = bytes / (sizeof(unative_t)) / 4; 138 + 139 + do { 140 + LOAD(v1); 141 + LOAD(v2); 142 + LOAD(v3); 143 + LOAD(v4); 144 + LOAD(v5); 145 + XOR(v1, v2); 146 + XOR(v3, v4); 147 + XOR(v1, v5); 148 + XOR(v1, v3); 149 + STORE(v1); 150 + 151 + v1 += 4; 152 + v2 += 4; 153 + v3 += 4; 154 + v4 += 4; 155 + v5 += 4; 156 + } while (--lines > 0); 157 + } 158 + 159 + __DO_XOR_BLOCKS(altivec_inner, __xor_altivec_2, __xor_altivec_3, 160 + 
__xor_altivec_4, __xor_altivec_5);
+10
lib/raid/xor/powerpc/xor_vmx.h
··· 1 + /* SPDX-License-Identifier: GPL-2.0 */ 2 + /* 3 + * Simple interface to link xor_vmx.c and xor_vmx_glue.c 4 + * 5 + * Separating these files ensures that no altivec instructions are run 6 + * outside of the enable/disable altivec block. 7 + */ 8 + 9 + void xor_gen_altivec_inner(void *dest, void **srcs, unsigned int src_cnt, 10 + unsigned int bytes);
+28
lib/raid/xor/powerpc/xor_vmx_glue.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-or-later 2 + /* 3 + * Altivec XOR operations 4 + * 5 + * Copyright 2017 IBM Corp. 6 + */ 7 + 8 + #include <linux/preempt.h> 9 + #include <linux/sched.h> 10 + #include <asm/switch_to.h> 11 + #include "xor_impl.h" 12 + #include "xor_arch.h" 13 + #include "xor_vmx.h" 14 + 15 + static void xor_gen_altivec(void *dest, void **srcs, unsigned int src_cnt, 16 + unsigned int bytes) 17 + { 18 + preempt_disable(); 19 + enable_kernel_altivec(); 20 + xor_gen_altivec_inner(dest, srcs, src_cnt, bytes); 21 + disable_kernel_altivec(); 22 + preempt_enable(); 23 + } 24 + 25 + struct xor_block_template xor_block_altivec = { 26 + .name = "altivec", 27 + .xor_gen = xor_gen_altivec, 28 + };
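Note the difference from the LoongArch glue above: enable_kernel_altivec() does not disable preemption itself, so the powerpc wrapper brackets it with preempt_disable()/preempt_enable() to keep the Altivec register state bound to the current CPU, whereas kernel_fpu_begin() (and kernel_vector_begin() in the RISC-V glue below) handle preemption internally.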
+25
lib/raid/xor/riscv/xor-glue.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-or-later 2 + /* 3 + * Copyright (C) 2021 SiFive 4 + */ 5 + 6 + #include <asm/vector.h> 7 + #include <asm/switch_to.h> 8 + #include <asm/asm-prototypes.h> 9 + #include "xor_impl.h" 10 + #include "xor_arch.h" 11 + 12 + DO_XOR_BLOCKS(vector_inner, xor_regs_2_, xor_regs_3_, xor_regs_4_, xor_regs_5_); 13 + 14 + static void xor_gen_vector(void *dest, void **srcs, unsigned int src_cnt, 15 + unsigned int bytes) 16 + { 17 + kernel_vector_begin(); 18 + xor_gen_vector_inner(dest, srcs, src_cnt, bytes); 19 + kernel_vector_end(); 20 + } 21 + 22 + struct xor_block_template xor_block_rvv = { 23 + .name = "rvv", 24 + .xor_gen = xor_gen_vector, 25 + };
+77
lib/raid/xor/riscv/xor.S
··· 1 + /* SPDX-License-Identifier: GPL-2.0-or-later */ 2 + /* 3 + * Copyright (C) 2021 SiFive 4 + */ 5 + #include <linux/linkage.h> 6 + #include <linux/export.h> 7 + #include <asm/asm.h> 8 + 9 + SYM_FUNC_START(xor_regs_2_) 10 + vsetvli a3, a0, e8, m8, ta, ma 11 + vle8.v v0, (a1) 12 + vle8.v v8, (a2) 13 + sub a0, a0, a3 14 + vxor.vv v16, v0, v8 15 + add a2, a2, a3 16 + vse8.v v16, (a1) 17 + add a1, a1, a3 18 + bnez a0, xor_regs_2_ 19 + ret 20 + SYM_FUNC_END(xor_regs_2_) 21 + 22 + SYM_FUNC_START(xor_regs_3_) 23 + vsetvli a4, a0, e8, m8, ta, ma 24 + vle8.v v0, (a1) 25 + vle8.v v8, (a2) 26 + sub a0, a0, a4 27 + vxor.vv v0, v0, v8 28 + vle8.v v16, (a3) 29 + add a2, a2, a4 30 + vxor.vv v16, v0, v16 31 + add a3, a3, a4 32 + vse8.v v16, (a1) 33 + add a1, a1, a4 34 + bnez a0, xor_regs_3_ 35 + ret 36 + SYM_FUNC_END(xor_regs_3_) 37 + 38 + SYM_FUNC_START(xor_regs_4_) 39 + vsetvli a5, a0, e8, m8, ta, ma 40 + vle8.v v0, (a1) 41 + vle8.v v8, (a2) 42 + sub a0, a0, a5 43 + vxor.vv v0, v0, v8 44 + vle8.v v16, (a3) 45 + add a2, a2, a5 46 + vxor.vv v0, v0, v16 47 + vle8.v v24, (a4) 48 + add a3, a3, a5 49 + vxor.vv v16, v0, v24 50 + add a4, a4, a5 51 + vse8.v v16, (a1) 52 + add a1, a1, a5 53 + bnez a0, xor_regs_4_ 54 + ret 55 + SYM_FUNC_END(xor_regs_4_) 56 + 57 + SYM_FUNC_START(xor_regs_5_) 58 + vsetvli a6, a0, e8, m8, ta, ma 59 + vle8.v v0, (a1) 60 + vle8.v v8, (a2) 61 + sub a0, a0, a6 62 + vxor.vv v0, v0, v8 63 + vle8.v v16, (a3) 64 + add a2, a2, a6 65 + vxor.vv v0, v0, v16 66 + vle8.v v24, (a4) 67 + add a3, a3, a6 68 + vxor.vv v0, v0, v24 69 + vle8.v v8, (a5) 70 + add a4, a4, a6 71 + vxor.vv v16, v0, v8 72 + add a5, a5, a6 73 + vse8.v v16, (a1) 74 + add a1, a1, a6 75 + bnez a0, xor_regs_5_ 76 + ret 77 + SYM_FUNC_END(xor_regs_5_)
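The assembly above strip-mines the buffers with vsetvli: each pass asks the hardware how many elements it may process (the granted vector length lands in a3/a4/...), XORs that many bytes, and loops until the byte count in a0 reaches zero. A hedged C rendering of xor_regs_2_ using the v1.0 RVV intrinsics (assuming a toolchain that ships <riscv_vector.h>; illustrative only, not part of the patch):

	#include <riscv_vector.h>

	void xor_regs_2_sketch(unsigned long bytes, unsigned char *p1,
			       const unsigned char *p2)
	{
		while (bytes) {
			/* vsetvli a3, a0, e8, m8, ta, ma */
			size_t vl = __riscv_vsetvl_e8m8(bytes);
			vuint8m8_t a = __riscv_vle8_v_u8m8(p1, vl);
			vuint8m8_t b = __riscv_vle8_v_u8m8(p2, vl);

			__riscv_vse8_v_u8m8(p1, __riscv_vxor_vv_u8m8(a, b, vl), vl);
			p1 += vl;
			p2 += vl;
			bytes -= vl;
		}
	}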
+17
lib/raid/xor/riscv/xor_arch.h
··· 1 + /* SPDX-License-Identifier: GPL-2.0-or-later */ 2 + /* 3 + * Copyright (C) 2021 SiFive 4 + */ 5 + #include <asm/vector.h> 6 + 7 + extern struct xor_block_template xor_block_rvv; 8 + 9 + static __always_inline void __init arch_xor_init(void) 10 + { 11 + xor_register(&xor_block_8regs); 12 + xor_register(&xor_block_32regs); 13 + #ifdef CONFIG_RISCV_ISA_V 14 + if (has_vector()) 15 + xor_register(&xor_block_rvv); 16 + #endif 17 + }
+133
lib/raid/xor/s390/xor.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* 3 + * Optimized xor_block operation for RAID4/5 4 + * 5 + * Copyright IBM Corp. 2016 6 + * Author(s): Martin Schwidefsky <schwidefsky@de.ibm.com> 7 + */ 8 + 9 + #include <linux/types.h> 10 + #include "xor_impl.h" 11 + #include "xor_arch.h" 12 + 13 + static void xor_xc_2(unsigned long bytes, unsigned long * __restrict p1, 14 + const unsigned long * __restrict p2) 15 + { 16 + asm volatile( 17 + " aghi %0,-1\n" 18 + " jm 3f\n" 19 + " srlg 0,%0,8\n" 20 + " ltgr 0,0\n" 21 + " jz 1f\n" 22 + "0: xc 0(256,%1),0(%2)\n" 23 + " la %1,256(%1)\n" 24 + " la %2,256(%2)\n" 25 + " brctg 0,0b\n" 26 + "1: exrl %0,2f\n" 27 + " j 3f\n" 28 + "2: xc 0(1,%1),0(%2)\n" 29 + "3:" 30 + : "+a" (bytes), "+a" (p1), "+a" (p2) 31 + : : "0", "cc", "memory"); 32 + } 33 + 34 + static void xor_xc_3(unsigned long bytes, unsigned long * __restrict p1, 35 + const unsigned long * __restrict p2, 36 + const unsigned long * __restrict p3) 37 + { 38 + asm volatile( 39 + " aghi %0,-1\n" 40 + " jm 4f\n" 41 + " srlg 0,%0,8\n" 42 + " ltgr 0,0\n" 43 + " jz 1f\n" 44 + "0: xc 0(256,%1),0(%2)\n" 45 + " xc 0(256,%1),0(%3)\n" 46 + " la %1,256(%1)\n" 47 + " la %2,256(%2)\n" 48 + " la %3,256(%3)\n" 49 + " brctg 0,0b\n" 50 + "1: exrl %0,2f\n" 51 + " exrl %0,3f\n" 52 + " j 4f\n" 53 + "2: xc 0(1,%1),0(%2)\n" 54 + "3: xc 0(1,%1),0(%3)\n" 55 + "4:" 56 + : "+a" (bytes), "+a" (p1), "+a" (p2), "+a" (p3) 57 + : : "0", "cc", "memory"); 58 + } 59 + 60 + static void xor_xc_4(unsigned long bytes, unsigned long * __restrict p1, 61 + const unsigned long * __restrict p2, 62 + const unsigned long * __restrict p3, 63 + const unsigned long * __restrict p4) 64 + { 65 + asm volatile( 66 + " aghi %0,-1\n" 67 + " jm 5f\n" 68 + " srlg 0,%0,8\n" 69 + " ltgr 0,0\n" 70 + " jz 1f\n" 71 + "0: xc 0(256,%1),0(%2)\n" 72 + " xc 0(256,%1),0(%3)\n" 73 + " xc 0(256,%1),0(%4)\n" 74 + " la %1,256(%1)\n" 75 + " la %2,256(%2)\n" 76 + " la %3,256(%3)\n" 77 + " la %4,256(%4)\n" 78 + " brctg 0,0b\n" 79 + "1: exrl %0,2f\n" 80 + " exrl %0,3f\n" 81 + " exrl %0,4f\n" 82 + " j 5f\n" 83 + "2: xc 0(1,%1),0(%2)\n" 84 + "3: xc 0(1,%1),0(%3)\n" 85 + "4: xc 0(1,%1),0(%4)\n" 86 + "5:" 87 + : "+a" (bytes), "+a" (p1), "+a" (p2), "+a" (p3), "+a" (p4) 88 + : : "0", "cc", "memory"); 89 + } 90 + 91 + static void xor_xc_5(unsigned long bytes, unsigned long * __restrict p1, 92 + const unsigned long * __restrict p2, 93 + const unsigned long * __restrict p3, 94 + const unsigned long * __restrict p4, 95 + const unsigned long * __restrict p5) 96 + { 97 + asm volatile( 98 + " aghi %0,-1\n" 99 + " jm 6f\n" 100 + " srlg 0,%0,8\n" 101 + " ltgr 0,0\n" 102 + " jz 1f\n" 103 + "0: xc 0(256,%1),0(%2)\n" 104 + " xc 0(256,%1),0(%3)\n" 105 + " xc 0(256,%1),0(%4)\n" 106 + " xc 0(256,%1),0(%5)\n" 107 + " la %1,256(%1)\n" 108 + " la %2,256(%2)\n" 109 + " la %3,256(%3)\n" 110 + " la %4,256(%4)\n" 111 + " la %5,256(%5)\n" 112 + " brctg 0,0b\n" 113 + "1: exrl %0,2f\n" 114 + " exrl %0,3f\n" 115 + " exrl %0,4f\n" 116 + " exrl %0,5f\n" 117 + " j 6f\n" 118 + "2: xc 0(1,%1),0(%2)\n" 119 + "3: xc 0(1,%1),0(%3)\n" 120 + "4: xc 0(1,%1),0(%4)\n" 121 + "5: xc 0(1,%1),0(%5)\n" 122 + "6:" 123 + : "+a" (bytes), "+a" (p1), "+a" (p2), "+a" (p3), "+a" (p4), 124 + "+a" (p5) 125 + : : "0", "cc", "memory"); 126 + } 127 + 128 + DO_XOR_BLOCKS(xc, xor_xc_2, xor_xc_3, xor_xc_4, xor_xc_5); 129 + 130 + struct xor_block_template xor_block_xc = { 131 + .name = "xc", 132 + .xor_gen = xor_gen_xc, 133 + };
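The s390 XC instruction XORs up to 256 bytes of storage in place, with the length encoded as length minus one; the routines above therefore loop over full 256-byte XCs (srlg/brctg) and then EXRL-execute one trailing XC whose length field is patched from the remainder at run time. A hedged C rendering of xor_xc_2's control flow (illustrative only, not part of the patch):

	static void xor_xc_2_sketch(unsigned long bytes, unsigned char *p1,
				    const unsigned char *p2)
	{
		unsigned long n, i;

		if (!bytes)			/* aghi %0,-1 + jm: nothing to do */
			return;
		bytes--;			/* XC encodes length - 1 */
		for (n = bytes >> 8; n; n--) {	/* full 256-byte chunks */
			for (i = 0; i < 256; i++)
				p1[i] ^= p2[i];
			p1 += 256;
			p2 += 256;
		}
		for (i = 0; i <= (bytes & 0xff); i++)	/* exrl: 1..256 byte tail */
			p1[i] ^= p2[i];
	}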
+13
lib/raid/xor/s390/xor_arch.h
··· 1 + /* SPDX-License-Identifier: GPL-2.0 */ 2 + /* 3 + * Optimized xor routines 4 + * 5 + * Copyright IBM Corp. 2016 6 + * Author(s): Martin Schwidefsky <schwidefsky@de.ibm.com> 7 + */ 8 + extern struct xor_block_template xor_block_xc; 9 + 10 + static __always_inline void __init arch_xor_init(void) 11 + { 12 + xor_force(&xor_block_xc); 13 + }
+252
lib/raid/xor/sparc/xor-sparc32.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-or-later 2 + /* 3 + * High speed xor_block operation for RAID4/5 utilizing the 4 + * ldd/std SPARC instructions. 5 + * 6 + * Copyright (C) 1999 Jakub Jelinek (jj@ultra.linux.cz) 7 + */ 8 + #include "xor_impl.h" 9 + #include "xor_arch.h" 10 + 11 + static void 12 + sparc_2(unsigned long bytes, unsigned long * __restrict p1, 13 + const unsigned long * __restrict p2) 14 + { 15 + int lines = bytes / (sizeof (long)) / 8; 16 + 17 + do { 18 + __asm__ __volatile__( 19 + "ldd [%0 + 0x00], %%g2\n\t" 20 + "ldd [%0 + 0x08], %%g4\n\t" 21 + "ldd [%0 + 0x10], %%o0\n\t" 22 + "ldd [%0 + 0x18], %%o2\n\t" 23 + "ldd [%1 + 0x00], %%o4\n\t" 24 + "ldd [%1 + 0x08], %%l0\n\t" 25 + "ldd [%1 + 0x10], %%l2\n\t" 26 + "ldd [%1 + 0x18], %%l4\n\t" 27 + "xor %%g2, %%o4, %%g2\n\t" 28 + "xor %%g3, %%o5, %%g3\n\t" 29 + "xor %%g4, %%l0, %%g4\n\t" 30 + "xor %%g5, %%l1, %%g5\n\t" 31 + "xor %%o0, %%l2, %%o0\n\t" 32 + "xor %%o1, %%l3, %%o1\n\t" 33 + "xor %%o2, %%l4, %%o2\n\t" 34 + "xor %%o3, %%l5, %%o3\n\t" 35 + "std %%g2, [%0 + 0x00]\n\t" 36 + "std %%g4, [%0 + 0x08]\n\t" 37 + "std %%o0, [%0 + 0x10]\n\t" 38 + "std %%o2, [%0 + 0x18]\n" 39 + : 40 + : "r" (p1), "r" (p2) 41 + : "g2", "g3", "g4", "g5", 42 + "o0", "o1", "o2", "o3", "o4", "o5", 43 + "l0", "l1", "l2", "l3", "l4", "l5"); 44 + p1 += 8; 45 + p2 += 8; 46 + } while (--lines > 0); 47 + } 48 + 49 + static void 50 + sparc_3(unsigned long bytes, unsigned long * __restrict p1, 51 + const unsigned long * __restrict p2, 52 + const unsigned long * __restrict p3) 53 + { 54 + int lines = bytes / (sizeof (long)) / 8; 55 + 56 + do { 57 + __asm__ __volatile__( 58 + "ldd [%0 + 0x00], %%g2\n\t" 59 + "ldd [%0 + 0x08], %%g4\n\t" 60 + "ldd [%0 + 0x10], %%o0\n\t" 61 + "ldd [%0 + 0x18], %%o2\n\t" 62 + "ldd [%1 + 0x00], %%o4\n\t" 63 + "ldd [%1 + 0x08], %%l0\n\t" 64 + "ldd [%1 + 0x10], %%l2\n\t" 65 + "ldd [%1 + 0x18], %%l4\n\t" 66 + "xor %%g2, %%o4, %%g2\n\t" 67 + "xor %%g3, %%o5, %%g3\n\t" 68 + "ldd [%2 + 0x00], %%o4\n\t" 69 + "xor %%g4, %%l0, %%g4\n\t" 70 + "xor %%g5, %%l1, %%g5\n\t" 71 + "ldd [%2 + 0x08], %%l0\n\t" 72 + "xor %%o0, %%l2, %%o0\n\t" 73 + "xor %%o1, %%l3, %%o1\n\t" 74 + "ldd [%2 + 0x10], %%l2\n\t" 75 + "xor %%o2, %%l4, %%o2\n\t" 76 + "xor %%o3, %%l5, %%o3\n\t" 77 + "ldd [%2 + 0x18], %%l4\n\t" 78 + "xor %%g2, %%o4, %%g2\n\t" 79 + "xor %%g3, %%o5, %%g3\n\t" 80 + "xor %%g4, %%l0, %%g4\n\t" 81 + "xor %%g5, %%l1, %%g5\n\t" 82 + "xor %%o0, %%l2, %%o0\n\t" 83 + "xor %%o1, %%l3, %%o1\n\t" 84 + "xor %%o2, %%l4, %%o2\n\t" 85 + "xor %%o3, %%l5, %%o3\n\t" 86 + "std %%g2, [%0 + 0x00]\n\t" 87 + "std %%g4, [%0 + 0x08]\n\t" 88 + "std %%o0, [%0 + 0x10]\n\t" 89 + "std %%o2, [%0 + 0x18]\n" 90 + : 91 + : "r" (p1), "r" (p2), "r" (p3) 92 + : "g2", "g3", "g4", "g5", 93 + "o0", "o1", "o2", "o3", "o4", "o5", 94 + "l0", "l1", "l2", "l3", "l4", "l5"); 95 + p1 += 8; 96 + p2 += 8; 97 + p3 += 8; 98 + } while (--lines > 0); 99 + } 100 + 101 + static void 102 + sparc_4(unsigned long bytes, unsigned long * __restrict p1, 103 + const unsigned long * __restrict p2, 104 + const unsigned long * __restrict p3, 105 + const unsigned long * __restrict p4) 106 + { 107 + int lines = bytes / (sizeof (long)) / 8; 108 + 109 + do { 110 + __asm__ __volatile__( 111 + "ldd [%0 + 0x00], %%g2\n\t" 112 + "ldd [%0 + 0x08], %%g4\n\t" 113 + "ldd [%0 + 0x10], %%o0\n\t" 114 + "ldd [%0 + 0x18], %%o2\n\t" 115 + "ldd [%1 + 0x00], %%o4\n\t" 116 + "ldd [%1 + 0x08], %%l0\n\t" 117 + "ldd [%1 + 0x10], %%l2\n\t" 118 + "ldd [%1 + 0x18], %%l4\n\t" 119 + "xor %%g2, %%o4, %%g2\n\t" 120 + "xor %%g3, %%o5, 
%%g3\n\t" 121 + "ldd [%2 + 0x00], %%o4\n\t" 122 + "xor %%g4, %%l0, %%g4\n\t" 123 + "xor %%g5, %%l1, %%g5\n\t" 124 + "ldd [%2 + 0x08], %%l0\n\t" 125 + "xor %%o0, %%l2, %%o0\n\t" 126 + "xor %%o1, %%l3, %%o1\n\t" 127 + "ldd [%2 + 0x10], %%l2\n\t" 128 + "xor %%o2, %%l4, %%o2\n\t" 129 + "xor %%o3, %%l5, %%o3\n\t" 130 + "ldd [%2 + 0x18], %%l4\n\t" 131 + "xor %%g2, %%o4, %%g2\n\t" 132 + "xor %%g3, %%o5, %%g3\n\t" 133 + "ldd [%3 + 0x00], %%o4\n\t" 134 + "xor %%g4, %%l0, %%g4\n\t" 135 + "xor %%g5, %%l1, %%g5\n\t" 136 + "ldd [%3 + 0x08], %%l0\n\t" 137 + "xor %%o0, %%l2, %%o0\n\t" 138 + "xor %%o1, %%l3, %%o1\n\t" 139 + "ldd [%3 + 0x10], %%l2\n\t" 140 + "xor %%o2, %%l4, %%o2\n\t" 141 + "xor %%o3, %%l5, %%o3\n\t" 142 + "ldd [%3 + 0x18], %%l4\n\t" 143 + "xor %%g2, %%o4, %%g2\n\t" 144 + "xor %%g3, %%o5, %%g3\n\t" 145 + "xor %%g4, %%l0, %%g4\n\t" 146 + "xor %%g5, %%l1, %%g5\n\t" 147 + "xor %%o0, %%l2, %%o0\n\t" 148 + "xor %%o1, %%l3, %%o1\n\t" 149 + "xor %%o2, %%l4, %%o2\n\t" 150 + "xor %%o3, %%l5, %%o3\n\t" 151 + "std %%g2, [%0 + 0x00]\n\t" 152 + "std %%g4, [%0 + 0x08]\n\t" 153 + "std %%o0, [%0 + 0x10]\n\t" 154 + "std %%o2, [%0 + 0x18]\n" 155 + : 156 + : "r" (p1), "r" (p2), "r" (p3), "r" (p4) 157 + : "g2", "g3", "g4", "g5", 158 + "o0", "o1", "o2", "o3", "o4", "o5", 159 + "l0", "l1", "l2", "l3", "l4", "l5"); 160 + p1 += 8; 161 + p2 += 8; 162 + p3 += 8; 163 + p4 += 8; 164 + } while (--lines > 0); 165 + } 166 + 167 + static void 168 + sparc_5(unsigned long bytes, unsigned long * __restrict p1, 169 + const unsigned long * __restrict p2, 170 + const unsigned long * __restrict p3, 171 + const unsigned long * __restrict p4, 172 + const unsigned long * __restrict p5) 173 + { 174 + int lines = bytes / (sizeof (long)) / 8; 175 + 176 + do { 177 + __asm__ __volatile__( 178 + "ldd [%0 + 0x00], %%g2\n\t" 179 + "ldd [%0 + 0x08], %%g4\n\t" 180 + "ldd [%0 + 0x10], %%o0\n\t" 181 + "ldd [%0 + 0x18], %%o2\n\t" 182 + "ldd [%1 + 0x00], %%o4\n\t" 183 + "ldd [%1 + 0x08], %%l0\n\t" 184 + "ldd [%1 + 0x10], %%l2\n\t" 185 + "ldd [%1 + 0x18], %%l4\n\t" 186 + "xor %%g2, %%o4, %%g2\n\t" 187 + "xor %%g3, %%o5, %%g3\n\t" 188 + "ldd [%2 + 0x00], %%o4\n\t" 189 + "xor %%g4, %%l0, %%g4\n\t" 190 + "xor %%g5, %%l1, %%g5\n\t" 191 + "ldd [%2 + 0x08], %%l0\n\t" 192 + "xor %%o0, %%l2, %%o0\n\t" 193 + "xor %%o1, %%l3, %%o1\n\t" 194 + "ldd [%2 + 0x10], %%l2\n\t" 195 + "xor %%o2, %%l4, %%o2\n\t" 196 + "xor %%o3, %%l5, %%o3\n\t" 197 + "ldd [%2 + 0x18], %%l4\n\t" 198 + "xor %%g2, %%o4, %%g2\n\t" 199 + "xor %%g3, %%o5, %%g3\n\t" 200 + "ldd [%3 + 0x00], %%o4\n\t" 201 + "xor %%g4, %%l0, %%g4\n\t" 202 + "xor %%g5, %%l1, %%g5\n\t" 203 + "ldd [%3 + 0x08], %%l0\n\t" 204 + "xor %%o0, %%l2, %%o0\n\t" 205 + "xor %%o1, %%l3, %%o1\n\t" 206 + "ldd [%3 + 0x10], %%l2\n\t" 207 + "xor %%o2, %%l4, %%o2\n\t" 208 + "xor %%o3, %%l5, %%o3\n\t" 209 + "ldd [%3 + 0x18], %%l4\n\t" 210 + "xor %%g2, %%o4, %%g2\n\t" 211 + "xor %%g3, %%o5, %%g3\n\t" 212 + "ldd [%4 + 0x00], %%o4\n\t" 213 + "xor %%g4, %%l0, %%g4\n\t" 214 + "xor %%g5, %%l1, %%g5\n\t" 215 + "ldd [%4 + 0x08], %%l0\n\t" 216 + "xor %%o0, %%l2, %%o0\n\t" 217 + "xor %%o1, %%l3, %%o1\n\t" 218 + "ldd [%4 + 0x10], %%l2\n\t" 219 + "xor %%o2, %%l4, %%o2\n\t" 220 + "xor %%o3, %%l5, %%o3\n\t" 221 + "ldd [%4 + 0x18], %%l4\n\t" 222 + "xor %%g2, %%o4, %%g2\n\t" 223 + "xor %%g3, %%o5, %%g3\n\t" 224 + "xor %%g4, %%l0, %%g4\n\t" 225 + "xor %%g5, %%l1, %%g5\n\t" 226 + "xor %%o0, %%l2, %%o0\n\t" 227 + "xor %%o1, %%l3, %%o1\n\t" 228 + "xor %%o2, %%l4, %%o2\n\t" 229 + "xor %%o3, %%l5, %%o3\n\t" 230 + "std %%g2, [%0 + 0x00]\n\t" 231 + "std 
%%g4, [%0 + 0x08]\n\t" 232 + "std %%o0, [%0 + 0x10]\n\t" 233 + "std %%o2, [%0 + 0x18]\n" 234 + : 235 + : "r" (p1), "r" (p2), "r" (p3), "r" (p4), "r" (p5) 236 + : "g2", "g3", "g4", "g5", 237 + "o0", "o1", "o2", "o3", "o4", "o5", 238 + "l0", "l1", "l2", "l3", "l4", "l5"); 239 + p1 += 8; 240 + p2 += 8; 241 + p3 += 8; 242 + p4 += 8; 243 + p5 += 8; 244 + } while (--lines > 0); 245 + } 246 + 247 + DO_XOR_BLOCKS(sparc32, sparc_2, sparc_3, sparc_4, sparc_5); 248 + 249 + struct xor_block_template xor_block_SPARC = { 250 + .name = "SPARC", 251 + .xor_gen = xor_gen_sparc32, 252 + };
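On the iteration count in the sparc32 routines: sizeof(long) is 4 here and each inline-asm pass moves four 8-byte ldd/std pairs, i.e. 0x20 bytes (the pointers advance by eight longs), so lines = bytes / sizeof(long) / 8 works out to one iteration per 32 bytes.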
+59
lib/raid/xor/sparc/xor-sparc64-glue.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-or-later 2 + /* 3 + * High speed xor_block operation for RAID4/5 utilizing the 4 + * UltraSparc Visual Instruction Set and Niagara block-init 5 + * twin-load instructions. 6 + * 7 + * Copyright (C) 1997, 1999 Jakub Jelinek (jj@ultra.linux.cz) 8 + * Copyright (C) 2006 David S. Miller <davem@davemloft.net> 9 + */ 10 + 11 + #include "xor_impl.h" 12 + #include "xor_arch.h" 13 + 14 + void xor_vis_2(unsigned long bytes, unsigned long * __restrict p1, 15 + const unsigned long * __restrict p2); 16 + void xor_vis_3(unsigned long bytes, unsigned long * __restrict p1, 17 + const unsigned long * __restrict p2, 18 + const unsigned long * __restrict p3); 19 + void xor_vis_4(unsigned long bytes, unsigned long * __restrict p1, 20 + const unsigned long * __restrict p2, 21 + const unsigned long * __restrict p3, 22 + const unsigned long * __restrict p4); 23 + void xor_vis_5(unsigned long bytes, unsigned long * __restrict p1, 24 + const unsigned long * __restrict p2, 25 + const unsigned long * __restrict p3, 26 + const unsigned long * __restrict p4, 27 + const unsigned long * __restrict p5); 28 + 29 + /* XXX Ugh, write cheetah versions... -DaveM */ 30 + 31 + DO_XOR_BLOCKS(vis, xor_vis_2, xor_vis_3, xor_vis_4, xor_vis_5); 32 + 33 + struct xor_block_template xor_block_VIS = { 34 + .name = "VIS", 35 + .xor_gen = xor_gen_vis, 36 + }; 37 + 38 + void xor_niagara_2(unsigned long bytes, unsigned long * __restrict p1, 39 + const unsigned long * __restrict p2); 40 + void xor_niagara_3(unsigned long bytes, unsigned long * __restrict p1, 41 + const unsigned long * __restrict p2, 42 + const unsigned long * __restrict p3); 43 + void xor_niagara_4(unsigned long bytes, unsigned long * __restrict p1, 44 + const unsigned long * __restrict p2, 45 + const unsigned long * __restrict p3, 46 + const unsigned long * __restrict p4); 47 + void xor_niagara_5(unsigned long bytes, unsigned long * __restrict p1, 48 + const unsigned long * __restrict p2, 49 + const unsigned long * __restrict p3, 50 + const unsigned long * __restrict p4, 51 + const unsigned long * __restrict p5); 52 + 53 + DO_XOR_BLOCKS(niagara, xor_niagara_2, xor_niagara_3, xor_niagara_4, 54 + xor_niagara_5); 55 + 56 + struct xor_block_template xor_block_niagara = { 57 + .name = "Niagara", 58 + .xor_gen = xor_gen_niagara, 59 + };
+636
lib/raid/xor/sparc/xor-sparc64.S
··· 1 + /* SPDX-License-Identifier: GPL-2.0 */ 2 + /* 3 + * High speed xor_block operation for RAID4/5 utilizing the 4 + * UltraSparc Visual Instruction Set and Niagara store-init/twin-load. 5 + * 6 + * Copyright (C) 1997, 1999 Jakub Jelinek (jj@ultra.linux.cz) 7 + * Copyright (C) 2006 David S. Miller <davem@davemloft.net> 8 + */ 9 + 10 + #include <linux/export.h> 11 + #include <linux/linkage.h> 12 + #include <asm/visasm.h> 13 + #include <asm/asi.h> 14 + #include <asm/dcu.h> 15 + #include <asm/spitfire.h> 16 + 17 + /* 18 + * Requirements: 19 + * !(((long)dest | (long)sourceN) & (64 - 1)) && 20 + * !(len & 127) && len >= 256 21 + */ 22 + .text 23 + 24 + /* VIS versions. */ 25 + ENTRY(xor_vis_2) 26 + rd %fprs, %o5 27 + andcc %o5, FPRS_FEF|FPRS_DU, %g0 28 + be,pt %icc, 0f 29 + sethi %hi(VISenter), %g1 30 + jmpl %g1 + %lo(VISenter), %g7 31 + add %g7, 8, %g7 32 + 0: wr %g0, FPRS_FEF, %fprs 33 + rd %asi, %g1 34 + wr %g0, ASI_BLK_P, %asi 35 + membar #LoadStore|#StoreLoad|#StoreStore 36 + sub %o0, 128, %o0 37 + ldda [%o1] %asi, %f0 38 + ldda [%o2] %asi, %f16 39 + 40 + 2: ldda [%o1 + 64] %asi, %f32 41 + fxor %f0, %f16, %f16 42 + fxor %f2, %f18, %f18 43 + fxor %f4, %f20, %f20 44 + fxor %f6, %f22, %f22 45 + fxor %f8, %f24, %f24 46 + fxor %f10, %f26, %f26 47 + fxor %f12, %f28, %f28 48 + fxor %f14, %f30, %f30 49 + stda %f16, [%o1] %asi 50 + ldda [%o2 + 64] %asi, %f48 51 + ldda [%o1 + 128] %asi, %f0 52 + fxor %f32, %f48, %f48 53 + fxor %f34, %f50, %f50 54 + add %o1, 128, %o1 55 + fxor %f36, %f52, %f52 56 + add %o2, 128, %o2 57 + fxor %f38, %f54, %f54 58 + subcc %o0, 128, %o0 59 + fxor %f40, %f56, %f56 60 + fxor %f42, %f58, %f58 61 + fxor %f44, %f60, %f60 62 + fxor %f46, %f62, %f62 63 + stda %f48, [%o1 - 64] %asi 64 + bne,pt %xcc, 2b 65 + ldda [%o2] %asi, %f16 66 + 67 + ldda [%o1 + 64] %asi, %f32 68 + fxor %f0, %f16, %f16 69 + fxor %f2, %f18, %f18 70 + fxor %f4, %f20, %f20 71 + fxor %f6, %f22, %f22 72 + fxor %f8, %f24, %f24 73 + fxor %f10, %f26, %f26 74 + fxor %f12, %f28, %f28 75 + fxor %f14, %f30, %f30 76 + stda %f16, [%o1] %asi 77 + ldda [%o2 + 64] %asi, %f48 78 + membar #Sync 79 + fxor %f32, %f48, %f48 80 + fxor %f34, %f50, %f50 81 + fxor %f36, %f52, %f52 82 + fxor %f38, %f54, %f54 83 + fxor %f40, %f56, %f56 84 + fxor %f42, %f58, %f58 85 + fxor %f44, %f60, %f60 86 + fxor %f46, %f62, %f62 87 + stda %f48, [%o1 + 64] %asi 88 + membar #Sync|#StoreStore|#StoreLoad 89 + wr %g1, %g0, %asi 90 + retl 91 + wr %g0, 0, %fprs 92 + ENDPROC(xor_vis_2) 93 + 94 + ENTRY(xor_vis_3) 95 + rd %fprs, %o5 96 + andcc %o5, FPRS_FEF|FPRS_DU, %g0 97 + be,pt %icc, 0f 98 + sethi %hi(VISenter), %g1 99 + jmpl %g1 + %lo(VISenter), %g7 100 + add %g7, 8, %g7 101 + 0: wr %g0, FPRS_FEF, %fprs 102 + rd %asi, %g1 103 + wr %g0, ASI_BLK_P, %asi 104 + membar #LoadStore|#StoreLoad|#StoreStore 105 + sub %o0, 64, %o0 106 + ldda [%o1] %asi, %f0 107 + ldda [%o2] %asi, %f16 108 + 109 + 3: ldda [%o3] %asi, %f32 110 + fxor %f0, %f16, %f48 111 + fxor %f2, %f18, %f50 112 + add %o1, 64, %o1 113 + fxor %f4, %f20, %f52 114 + fxor %f6, %f22, %f54 115 + add %o2, 64, %o2 116 + fxor %f8, %f24, %f56 117 + fxor %f10, %f26, %f58 118 + fxor %f12, %f28, %f60 119 + fxor %f14, %f30, %f62 120 + ldda [%o1] %asi, %f0 121 + fxor %f48, %f32, %f48 122 + fxor %f50, %f34, %f50 123 + fxor %f52, %f36, %f52 124 + fxor %f54, %f38, %f54 125 + add %o3, 64, %o3 126 + fxor %f56, %f40, %f56 127 + fxor %f58, %f42, %f58 128 + subcc %o0, 64, %o0 129 + fxor %f60, %f44, %f60 130 + fxor %f62, %f46, %f62 131 + stda %f48, [%o1 - 64] %asi 132 + bne,pt %xcc, 3b 133 + ldda [%o2] %asi, %f16 134 
+ 135 + ldda [%o3] %asi, %f32 136 + fxor %f0, %f16, %f48 137 + fxor %f2, %f18, %f50 138 + fxor %f4, %f20, %f52 139 + fxor %f6, %f22, %f54 140 + fxor %f8, %f24, %f56 141 + fxor %f10, %f26, %f58 142 + fxor %f12, %f28, %f60 143 + fxor %f14, %f30, %f62 144 + membar #Sync 145 + fxor %f48, %f32, %f48 146 + fxor %f50, %f34, %f50 147 + fxor %f52, %f36, %f52 148 + fxor %f54, %f38, %f54 149 + fxor %f56, %f40, %f56 150 + fxor %f58, %f42, %f58 151 + fxor %f60, %f44, %f60 152 + fxor %f62, %f46, %f62 153 + stda %f48, [%o1] %asi 154 + membar #Sync|#StoreStore|#StoreLoad 155 + wr %g1, %g0, %asi 156 + retl 157 + wr %g0, 0, %fprs 158 + ENDPROC(xor_vis_3) 159 + 160 + ENTRY(xor_vis_4) 161 + rd %fprs, %o5 162 + andcc %o5, FPRS_FEF|FPRS_DU, %g0 163 + be,pt %icc, 0f 164 + sethi %hi(VISenter), %g1 165 + jmpl %g1 + %lo(VISenter), %g7 166 + add %g7, 8, %g7 167 + 0: wr %g0, FPRS_FEF, %fprs 168 + rd %asi, %g1 169 + wr %g0, ASI_BLK_P, %asi 170 + membar #LoadStore|#StoreLoad|#StoreStore 171 + sub %o0, 64, %o0 172 + ldda [%o1] %asi, %f0 173 + ldda [%o2] %asi, %f16 174 + 175 + 4: ldda [%o3] %asi, %f32 176 + fxor %f0, %f16, %f16 177 + fxor %f2, %f18, %f18 178 + add %o1, 64, %o1 179 + fxor %f4, %f20, %f20 180 + fxor %f6, %f22, %f22 181 + add %o2, 64, %o2 182 + fxor %f8, %f24, %f24 183 + fxor %f10, %f26, %f26 184 + fxor %f12, %f28, %f28 185 + fxor %f14, %f30, %f30 186 + ldda [%o4] %asi, %f48 187 + fxor %f16, %f32, %f32 188 + fxor %f18, %f34, %f34 189 + fxor %f20, %f36, %f36 190 + fxor %f22, %f38, %f38 191 + add %o3, 64, %o3 192 + fxor %f24, %f40, %f40 193 + fxor %f26, %f42, %f42 194 + fxor %f28, %f44, %f44 195 + fxor %f30, %f46, %f46 196 + ldda [%o1] %asi, %f0 197 + fxor %f32, %f48, %f48 198 + fxor %f34, %f50, %f50 199 + fxor %f36, %f52, %f52 200 + add %o4, 64, %o4 201 + fxor %f38, %f54, %f54 202 + fxor %f40, %f56, %f56 203 + fxor %f42, %f58, %f58 204 + subcc %o0, 64, %o0 205 + fxor %f44, %f60, %f60 206 + fxor %f46, %f62, %f62 207 + stda %f48, [%o1 - 64] %asi 208 + bne,pt %xcc, 4b 209 + ldda [%o2] %asi, %f16 210 + 211 + ldda [%o3] %asi, %f32 212 + fxor %f0, %f16, %f16 213 + fxor %f2, %f18, %f18 214 + fxor %f4, %f20, %f20 215 + fxor %f6, %f22, %f22 216 + fxor %f8, %f24, %f24 217 + fxor %f10, %f26, %f26 218 + fxor %f12, %f28, %f28 219 + fxor %f14, %f30, %f30 220 + ldda [%o4] %asi, %f48 221 + fxor %f16, %f32, %f32 222 + fxor %f18, %f34, %f34 223 + fxor %f20, %f36, %f36 224 + fxor %f22, %f38, %f38 225 + fxor %f24, %f40, %f40 226 + fxor %f26, %f42, %f42 227 + fxor %f28, %f44, %f44 228 + fxor %f30, %f46, %f46 229 + membar #Sync 230 + fxor %f32, %f48, %f48 231 + fxor %f34, %f50, %f50 232 + fxor %f36, %f52, %f52 233 + fxor %f38, %f54, %f54 234 + fxor %f40, %f56, %f56 235 + fxor %f42, %f58, %f58 236 + fxor %f44, %f60, %f60 237 + fxor %f46, %f62, %f62 238 + stda %f48, [%o1] %asi 239 + membar #Sync|#StoreStore|#StoreLoad 240 + wr %g1, %g0, %asi 241 + retl 242 + wr %g0, 0, %fprs 243 + ENDPROC(xor_vis_4) 244 + 245 + ENTRY(xor_vis_5) 246 + save %sp, -192, %sp 247 + rd %fprs, %o5 248 + andcc %o5, FPRS_FEF|FPRS_DU, %g0 249 + be,pt %icc, 0f 250 + sethi %hi(VISenter), %g1 251 + jmpl %g1 + %lo(VISenter), %g7 252 + add %g7, 8, %g7 253 + 0: wr %g0, FPRS_FEF, %fprs 254 + rd %asi, %g1 255 + wr %g0, ASI_BLK_P, %asi 256 + membar #LoadStore|#StoreLoad|#StoreStore 257 + sub %i0, 64, %i0 258 + ldda [%i1] %asi, %f0 259 + ldda [%i2] %asi, %f16 260 + 261 + 5: ldda [%i3] %asi, %f32 262 + fxor %f0, %f16, %f48 263 + fxor %f2, %f18, %f50 264 + add %i1, 64, %i1 265 + fxor %f4, %f20, %f52 266 + fxor %f6, %f22, %f54 267 + add %i2, 64, %i2 268 + fxor %f8, %f24, 
%f56 269 + fxor %f10, %f26, %f58 270 + fxor %f12, %f28, %f60 271 + fxor %f14, %f30, %f62 272 + ldda [%i4] %asi, %f16 273 + fxor %f48, %f32, %f48 274 + fxor %f50, %f34, %f50 275 + fxor %f52, %f36, %f52 276 + fxor %f54, %f38, %f54 277 + add %i3, 64, %i3 278 + fxor %f56, %f40, %f56 279 + fxor %f58, %f42, %f58 280 + fxor %f60, %f44, %f60 281 + fxor %f62, %f46, %f62 282 + ldda [%i5] %asi, %f32 283 + fxor %f48, %f16, %f48 284 + fxor %f50, %f18, %f50 285 + add %i4, 64, %i4 286 + fxor %f52, %f20, %f52 287 + fxor %f54, %f22, %f54 288 + add %i5, 64, %i5 289 + fxor %f56, %f24, %f56 290 + fxor %f58, %f26, %f58 291 + fxor %f60, %f28, %f60 292 + fxor %f62, %f30, %f62 293 + ldda [%i1] %asi, %f0 294 + fxor %f48, %f32, %f48 295 + fxor %f50, %f34, %f50 296 + fxor %f52, %f36, %f52 297 + fxor %f54, %f38, %f54 298 + fxor %f56, %f40, %f56 299 + fxor %f58, %f42, %f58 300 + subcc %i0, 64, %i0 301 + fxor %f60, %f44, %f60 302 + fxor %f62, %f46, %f62 303 + stda %f48, [%i1 - 64] %asi 304 + bne,pt %xcc, 5b 305 + ldda [%i2] %asi, %f16 306 + 307 + ldda [%i3] %asi, %f32 308 + fxor %f0, %f16, %f48 309 + fxor %f2, %f18, %f50 310 + fxor %f4, %f20, %f52 311 + fxor %f6, %f22, %f54 312 + fxor %f8, %f24, %f56 313 + fxor %f10, %f26, %f58 314 + fxor %f12, %f28, %f60 315 + fxor %f14, %f30, %f62 316 + ldda [%i4] %asi, %f16 317 + fxor %f48, %f32, %f48 318 + fxor %f50, %f34, %f50 319 + fxor %f52, %f36, %f52 320 + fxor %f54, %f38, %f54 321 + fxor %f56, %f40, %f56 322 + fxor %f58, %f42, %f58 323 + fxor %f60, %f44, %f60 324 + fxor %f62, %f46, %f62 325 + ldda [%i5] %asi, %f32 326 + fxor %f48, %f16, %f48 327 + fxor %f50, %f18, %f50 328 + fxor %f52, %f20, %f52 329 + fxor %f54, %f22, %f54 330 + fxor %f56, %f24, %f56 331 + fxor %f58, %f26, %f58 332 + fxor %f60, %f28, %f60 333 + fxor %f62, %f30, %f62 334 + membar #Sync 335 + fxor %f48, %f32, %f48 336 + fxor %f50, %f34, %f50 337 + fxor %f52, %f36, %f52 338 + fxor %f54, %f38, %f54 339 + fxor %f56, %f40, %f56 340 + fxor %f58, %f42, %f58 341 + fxor %f60, %f44, %f60 342 + fxor %f62, %f46, %f62 343 + stda %f48, [%i1] %asi 344 + membar #Sync|#StoreStore|#StoreLoad 345 + wr %g1, %g0, %asi 346 + wr %g0, 0, %fprs 347 + ret 348 + restore 349 + ENDPROC(xor_vis_5) 350 + 351 + /* Niagara versions. 
*/ 352 + ENTRY(xor_niagara_2) /* %o0=bytes, %o1=dest, %o2=src */ 353 + save %sp, -192, %sp 354 + prefetch [%i1], #n_writes 355 + prefetch [%i2], #one_read 356 + rd %asi, %g7 357 + wr %g0, ASI_BLK_INIT_QUAD_LDD_P, %asi 358 + srlx %i0, 6, %g1 359 + mov %i1, %i0 360 + mov %i2, %i1 361 + 1: ldda [%i1 + 0x00] %asi, %i2 /* %i2/%i3 = src + 0x00 */ 362 + ldda [%i1 + 0x10] %asi, %i4 /* %i4/%i5 = src + 0x10 */ 363 + ldda [%i1 + 0x20] %asi, %g2 /* %g2/%g3 = src + 0x20 */ 364 + ldda [%i1 + 0x30] %asi, %l0 /* %l0/%l1 = src + 0x30 */ 365 + prefetch [%i1 + 0x40], #one_read 366 + ldda [%i0 + 0x00] %asi, %o0 /* %o0/%o1 = dest + 0x00 */ 367 + ldda [%i0 + 0x10] %asi, %o2 /* %o2/%o3 = dest + 0x10 */ 368 + ldda [%i0 + 0x20] %asi, %o4 /* %o4/%o5 = dest + 0x20 */ 369 + ldda [%i0 + 0x30] %asi, %l2 /* %l2/%l3 = dest + 0x30 */ 370 + prefetch [%i0 + 0x40], #n_writes 371 + xor %o0, %i2, %o0 372 + xor %o1, %i3, %o1 373 + stxa %o0, [%i0 + 0x00] %asi 374 + stxa %o1, [%i0 + 0x08] %asi 375 + xor %o2, %i4, %o2 376 + xor %o3, %i5, %o3 377 + stxa %o2, [%i0 + 0x10] %asi 378 + stxa %o3, [%i0 + 0x18] %asi 379 + xor %o4, %g2, %o4 380 + xor %o5, %g3, %o5 381 + stxa %o4, [%i0 + 0x20] %asi 382 + stxa %o5, [%i0 + 0x28] %asi 383 + xor %l2, %l0, %l2 384 + xor %l3, %l1, %l3 385 + stxa %l2, [%i0 + 0x30] %asi 386 + stxa %l3, [%i0 + 0x38] %asi 387 + add %i0, 0x40, %i0 388 + subcc %g1, 1, %g1 389 + bne,pt %xcc, 1b 390 + add %i1, 0x40, %i1 391 + membar #Sync 392 + wr %g7, 0x0, %asi 393 + ret 394 + restore 395 + ENDPROC(xor_niagara_2) 396 + 397 + ENTRY(xor_niagara_3) /* %o0=bytes, %o1=dest, %o2=src1, %o3=src2 */ 398 + save %sp, -192, %sp 399 + prefetch [%i1], #n_writes 400 + prefetch [%i2], #one_read 401 + prefetch [%i3], #one_read 402 + rd %asi, %g7 403 + wr %g0, ASI_BLK_INIT_QUAD_LDD_P, %asi 404 + srlx %i0, 6, %g1 405 + mov %i1, %i0 406 + mov %i2, %i1 407 + mov %i3, %l7 408 + 1: ldda [%i1 + 0x00] %asi, %i2 /* %i2/%i3 = src1 + 0x00 */ 409 + ldda [%i1 + 0x10] %asi, %i4 /* %i4/%i5 = src1 + 0x10 */ 410 + ldda [%l7 + 0x00] %asi, %g2 /* %g2/%g3 = src2 + 0x00 */ 411 + ldda [%l7 + 0x10] %asi, %l0 /* %l0/%l1 = src2 + 0x10 */ 412 + ldda [%i0 + 0x00] %asi, %o0 /* %o0/%o1 = dest + 0x00 */ 413 + ldda [%i0 + 0x10] %asi, %o2 /* %o2/%o3 = dest + 0x10 */ 414 + xor %g2, %i2, %g2 415 + xor %g3, %i3, %g3 416 + xor %o0, %g2, %o0 417 + xor %o1, %g3, %o1 418 + stxa %o0, [%i0 + 0x00] %asi 419 + stxa %o1, [%i0 + 0x08] %asi 420 + ldda [%i1 + 0x20] %asi, %i2 /* %i2/%i3 = src1 + 0x20 */ 421 + ldda [%l7 + 0x20] %asi, %g2 /* %g2/%g3 = src2 + 0x20 */ 422 + ldda [%i0 + 0x20] %asi, %o0 /* %o0/%o1 = dest + 0x20 */ 423 + xor %l0, %i4, %l0 424 + xor %l1, %i5, %l1 425 + xor %o2, %l0, %o2 426 + xor %o3, %l1, %o3 427 + stxa %o2, [%i0 + 0x10] %asi 428 + stxa %o3, [%i0 + 0x18] %asi 429 + ldda [%i1 + 0x30] %asi, %i4 /* %i4/%i5 = src1 + 0x30 */ 430 + ldda [%l7 + 0x30] %asi, %l0 /* %l0/%l1 = src2 + 0x30 */ 431 + ldda [%i0 + 0x30] %asi, %o2 /* %o2/%o3 = dest + 0x30 */ 432 + prefetch [%i1 + 0x40], #one_read 433 + prefetch [%l7 + 0x40], #one_read 434 + prefetch [%i0 + 0x40], #n_writes 435 + xor %g2, %i2, %g2 436 + xor %g3, %i3, %g3 437 + xor %o0, %g2, %o0 438 + xor %o1, %g3, %o1 439 + stxa %o0, [%i0 + 0x20] %asi 440 + stxa %o1, [%i0 + 0x28] %asi 441 + xor %l0, %i4, %l0 442 + xor %l1, %i5, %l1 443 + xor %o2, %l0, %o2 444 + xor %o3, %l1, %o3 445 + stxa %o2, [%i0 + 0x30] %asi 446 + stxa %o3, [%i0 + 0x38] %asi 447 + add %i0, 0x40, %i0 448 + add %i1, 0x40, %i1 449 + subcc %g1, 1, %g1 450 + bne,pt %xcc, 1b 451 + add %l7, 0x40, %l7 452 + membar #Sync 453 + wr %g7, 0x0, %asi 454 + ret 455 + 
restore 456 + ENDPROC(xor_niagara_3) 457 + 458 + ENTRY(xor_niagara_4) /* %o0=bytes, %o1=dest, %o2=src1, %o3=src2, %o4=src3 */ 459 + save %sp, -192, %sp 460 + prefetch [%i1], #n_writes 461 + prefetch [%i2], #one_read 462 + prefetch [%i3], #one_read 463 + prefetch [%i4], #one_read 464 + rd %asi, %g7 465 + wr %g0, ASI_BLK_INIT_QUAD_LDD_P, %asi 466 + srlx %i0, 6, %g1 467 + mov %i1, %i0 468 + mov %i2, %i1 469 + mov %i3, %l7 470 + mov %i4, %l6 471 + 1: ldda [%i1 + 0x00] %asi, %i2 /* %i2/%i3 = src1 + 0x00 */ 472 + ldda [%l7 + 0x00] %asi, %i4 /* %i4/%i5 = src2 + 0x00 */ 473 + ldda [%l6 + 0x00] %asi, %g2 /* %g2/%g3 = src3 + 0x00 */ 474 + ldda [%i0 + 0x00] %asi, %l0 /* %l0/%l1 = dest + 0x00 */ 475 + xor %i4, %i2, %i4 476 + xor %i5, %i3, %i5 477 + ldda [%i1 + 0x10] %asi, %i2 /* %i2/%i3 = src1 + 0x10 */ 478 + xor %g2, %i4, %g2 479 + xor %g3, %i5, %g3 480 + ldda [%l7 + 0x10] %asi, %i4 /* %i4/%i5 = src2 + 0x10 */ 481 + xor %l0, %g2, %l0 482 + xor %l1, %g3, %l1 483 + stxa %l0, [%i0 + 0x00] %asi 484 + stxa %l1, [%i0 + 0x08] %asi 485 + ldda [%l6 + 0x10] %asi, %g2 /* %g2/%g3 = src3 + 0x10 */ 486 + ldda [%i0 + 0x10] %asi, %l0 /* %l0/%l1 = dest + 0x10 */ 487 + 488 + xor %i4, %i2, %i4 489 + xor %i5, %i3, %i5 490 + ldda [%i1 + 0x20] %asi, %i2 /* %i2/%i3 = src1 + 0x20 */ 491 + xor %g2, %i4, %g2 492 + xor %g3, %i5, %g3 493 + ldda [%l7 + 0x20] %asi, %i4 /* %i4/%i5 = src2 + 0x20 */ 494 + xor %l0, %g2, %l0 495 + xor %l1, %g3, %l1 496 + stxa %l0, [%i0 + 0x10] %asi 497 + stxa %l1, [%i0 + 0x18] %asi 498 + ldda [%l6 + 0x20] %asi, %g2 /* %g2/%g3 = src3 + 0x20 */ 499 + ldda [%i0 + 0x20] %asi, %l0 /* %l0/%l1 = dest + 0x20 */ 500 + 501 + xor %i4, %i2, %i4 502 + xor %i5, %i3, %i5 503 + ldda [%i1 + 0x30] %asi, %i2 /* %i2/%i3 = src1 + 0x30 */ 504 + xor %g2, %i4, %g2 505 + xor %g3, %i5, %g3 506 + ldda [%l7 + 0x30] %asi, %i4 /* %i4/%i5 = src2 + 0x30 */ 507 + xor %l0, %g2, %l0 508 + xor %l1, %g3, %l1 509 + stxa %l0, [%i0 + 0x20] %asi 510 + stxa %l1, [%i0 + 0x28] %asi 511 + ldda [%l6 + 0x30] %asi, %g2 /* %g2/%g3 = src3 + 0x30 */ 512 + ldda [%i0 + 0x30] %asi, %l0 /* %l0/%l1 = dest + 0x30 */ 513 + 514 + prefetch [%i1 + 0x40], #one_read 515 + prefetch [%l7 + 0x40], #one_read 516 + prefetch [%l6 + 0x40], #one_read 517 + prefetch [%i0 + 0x40], #n_writes 518 + 519 + xor %i4, %i2, %i4 520 + xor %i5, %i3, %i5 521 + xor %g2, %i4, %g2 522 + xor %g3, %i5, %g3 523 + xor %l0, %g2, %l0 524 + xor %l1, %g3, %l1 525 + stxa %l0, [%i0 + 0x30] %asi 526 + stxa %l1, [%i0 + 0x38] %asi 527 + 528 + add %i0, 0x40, %i0 529 + add %i1, 0x40, %i1 530 + add %l7, 0x40, %l7 531 + subcc %g1, 1, %g1 532 + bne,pt %xcc, 1b 533 + add %l6, 0x40, %l6 534 + membar #Sync 535 + wr %g7, 0x0, %asi 536 + ret 537 + restore 538 + ENDPROC(xor_niagara_4) 539 + 540 + ENTRY(xor_niagara_5) /* %o0=bytes, %o1=dest, %o2=src1, %o3=src2, %o4=src3, %o5=src4 */ 541 + save %sp, -192, %sp 542 + prefetch [%i1], #n_writes 543 + prefetch [%i2], #one_read 544 + prefetch [%i3], #one_read 545 + prefetch [%i4], #one_read 546 + prefetch [%i5], #one_read 547 + rd %asi, %g7 548 + wr %g0, ASI_BLK_INIT_QUAD_LDD_P, %asi 549 + srlx %i0, 6, %g1 550 + mov %i1, %i0 551 + mov %i2, %i1 552 + mov %i3, %l7 553 + mov %i4, %l6 554 + mov %i5, %l5 555 + 1: ldda [%i1 + 0x00] %asi, %i2 /* %i2/%i3 = src1 + 0x00 */ 556 + ldda [%l7 + 0x00] %asi, %i4 /* %i4/%i5 = src2 + 0x00 */ 557 + ldda [%l6 + 0x00] %asi, %g2 /* %g2/%g3 = src3 + 0x00 */ 558 + ldda [%l5 + 0x00] %asi, %l0 /* %l0/%l1 = src4 + 0x00 */ 559 + ldda [%i0 + 0x00] %asi, %l2 /* %l2/%l3 = dest + 0x00 */ 560 + xor %i4, %i2, %i4 561 + xor %i5, %i3, %i5 562 + ldda 
[%i1 + 0x10] %asi, %i2 /* %i2/%i3 = src1 + 0x10 */ 563 + xor %g2, %i4, %g2 564 + xor %g3, %i5, %g3 565 + ldda [%l7 + 0x10] %asi, %i4 /* %i4/%i5 = src2 + 0x10 */ 566 + xor %l0, %g2, %l0 567 + xor %l1, %g3, %l1 568 + ldda [%l6 + 0x10] %asi, %g2 /* %g2/%g3 = src3 + 0x10 */ 569 + xor %l2, %l0, %l2 570 + xor %l3, %l1, %l3 571 + stxa %l2, [%i0 + 0x00] %asi 572 + stxa %l3, [%i0 + 0x08] %asi 573 + ldda [%l5 + 0x10] %asi, %l0 /* %l0/%l1 = src4 + 0x10 */ 574 + ldda [%i0 + 0x10] %asi, %l2 /* %l2/%l3 = dest + 0x10 */ 575 + 576 + xor %i4, %i2, %i4 577 + xor %i5, %i3, %i5 578 + ldda [%i1 + 0x20] %asi, %i2 /* %i2/%i3 = src1 + 0x20 */ 579 + xor %g2, %i4, %g2 580 + xor %g3, %i5, %g3 581 + ldda [%l7 + 0x20] %asi, %i4 /* %i4/%i5 = src2 + 0x20 */ 582 + xor %l0, %g2, %l0 583 + xor %l1, %g3, %l1 584 + ldda [%l6 + 0x20] %asi, %g2 /* %g2/%g3 = src3 + 0x20 */ 585 + xor %l2, %l0, %l2 586 + xor %l3, %l1, %l3 587 + stxa %l2, [%i0 + 0x10] %asi 588 + stxa %l3, [%i0 + 0x18] %asi 589 + ldda [%l5 + 0x20] %asi, %l0 /* %l0/%l1 = src4 + 0x20 */ 590 + ldda [%i0 + 0x20] %asi, %l2 /* %l2/%l3 = dest + 0x20 */ 591 + 592 + xor %i4, %i2, %i4 593 + xor %i5, %i3, %i5 594 + ldda [%i1 + 0x30] %asi, %i2 /* %i2/%i3 = src1 + 0x30 */ 595 + xor %g2, %i4, %g2 596 + xor %g3, %i5, %g3 597 + ldda [%l7 + 0x30] %asi, %i4 /* %i4/%i5 = src2 + 0x30 */ 598 + xor %l0, %g2, %l0 599 + xor %l1, %g3, %l1 600 + ldda [%l6 + 0x30] %asi, %g2 /* %g2/%g3 = src3 + 0x30 */ 601 + xor %l2, %l0, %l2 602 + xor %l3, %l1, %l3 603 + stxa %l2, [%i0 + 0x20] %asi 604 + stxa %l3, [%i0 + 0x28] %asi 605 + ldda [%l5 + 0x30] %asi, %l0 /* %l0/%l1 = src4 + 0x30 */ 606 + ldda [%i0 + 0x30] %asi, %l2 /* %l2/%l3 = dest + 0x30 */ 607 + 608 + prefetch [%i1 + 0x40], #one_read 609 + prefetch [%l7 + 0x40], #one_read 610 + prefetch [%l6 + 0x40], #one_read 611 + prefetch [%l5 + 0x40], #one_read 612 + prefetch [%i0 + 0x40], #n_writes 613 + 614 + xor %i4, %i2, %i4 615 + xor %i5, %i3, %i5 616 + xor %g2, %i4, %g2 617 + xor %g3, %i5, %g3 618 + xor %l0, %g2, %l0 619 + xor %l1, %g3, %l1 620 + xor %l2, %l0, %l2 621 + xor %l3, %l1, %l3 622 + stxa %l2, [%i0 + 0x30] %asi 623 + stxa %l3, [%i0 + 0x38] %asi 624 + 625 + add %i0, 0x40, %i0 626 + add %i1, 0x40, %i1 627 + add %l7, 0x40, %l7 628 + add %l6, 0x40, %l6 629 + subcc %g1, 1, %g1 630 + bne,pt %xcc, 1b 631 + add %l5, 0x40, %l5 632 + membar #Sync 633 + wr %g7, 0x0, %asi 634 + ret 635 + restore 636 + ENDPROC(xor_niagara_5)
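The Niagara variants above consume one 64-byte line per loop: srlx %i0, 6 turns the byte count into an iteration count, each ldda twin-load fetches a 16-byte register pair, and the stxa stores go through ASI_BLK_INIT_QUAD_LDD_P so destination lines can be initialized without first being read in, which is the block-init behaviour the glue file's comment refers to.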
+35
lib/raid/xor/sparc/xor_arch.h
··· 1 + /* SPDX-License-Identifier: GPL-2.0 */ 2 + /* 3 + * Copyright (C) 1997, 1999 Jakub Jelinek (jj@ultra.linux.cz) 4 + * Copyright (C) 2006 David S. Miller <davem@davemloft.net> 5 + */ 6 + #if defined(__sparc__) && defined(__arch64__) 7 + #include <asm/spitfire.h> 8 + 9 + extern struct xor_block_template xor_block_VIS; 10 + extern struct xor_block_template xor_block_niagara; 11 + 12 + static __always_inline void __init arch_xor_init(void) 13 + { 14 + /* Force VIS for everything except Niagara. */ 15 + if (tlb_type == hypervisor && 16 + (sun4v_chip_type == SUN4V_CHIP_NIAGARA1 || 17 + sun4v_chip_type == SUN4V_CHIP_NIAGARA2 || 18 + sun4v_chip_type == SUN4V_CHIP_NIAGARA3 || 19 + sun4v_chip_type == SUN4V_CHIP_NIAGARA4 || 20 + sun4v_chip_type == SUN4V_CHIP_NIAGARA5)) 21 + xor_force(&xor_block_niagara); 22 + else 23 + xor_force(&xor_block_VIS); 24 + } 25 + #else /* sparc64 */ 26 + 27 + extern struct xor_block_template xor_block_SPARC; 28 + 29 + static __always_inline void __init arch_xor_init(void) 30 + { 31 + xor_register(&xor_block_8regs); 32 + xor_register(&xor_block_32regs); 33 + xor_register(&xor_block_SPARC); 34 + } 35 + #endif /* !sparc64 */
+3
lib/raid/xor/tests/Makefile
··· 1 + # SPDX-License-Identifier: GPL-2.0-only 2 + 3 + obj-$(CONFIG_XOR_KUNIT_TEST) += xor_kunit.o
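A quick way to exercise the new suite (a hedged example: CONFIG_XOR_KUNIT_TEST comes from the Makefile above, the suite name "xor" from xor_kunit.c below, and kunit.py is the standard in-tree runner):

	# .kunitconfig fragment
	CONFIG_KUNIT=y
	CONFIG_XOR_KUNIT_TEST=y

	# build and run just the "xor" suite under UML
	./tools/testing/kunit/kunit.py run 'xor'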
+187
lib/raid/xor/tests/xor_kunit.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-or-later 2 + /* 3 + * Unit test the XOR library functions. 4 + * 5 + * Copyright 2024 Google LLC 6 + * Copyright 2026 Christoph Hellwig 7 + * 8 + * Based on the CRC tests by Eric Biggers <ebiggers@google.com>. 9 + */ 10 + #include <kunit/test.h> 11 + #include <linux/prandom.h> 12 + #include <linux/string_choices.h> 13 + #include <linux/vmalloc.h> 14 + #include <linux/raid/xor.h> 15 + 16 + #define XOR_KUNIT_SEED 42 17 + #define XOR_KUNIT_MAX_BYTES 16384 18 + #define XOR_KUNIT_MAX_BUFFERS 64 19 + #define XOR_KUNIT_NUM_TEST_ITERS 1000 20 + 21 + static struct rnd_state rng; 22 + static void *test_buffers[XOR_KUNIT_MAX_BUFFERS]; 23 + static void *test_dest; 24 + static void *test_ref; 25 + static size_t test_buflen; 26 + 27 + static u32 rand32(void) 28 + { 29 + return prandom_u32_state(&rng); 30 + } 31 + 32 + /* Reference implementation using dumb byte-wise XOR */ 33 + static void xor_ref(void *dest, void **srcs, unsigned int src_cnt, 34 + unsigned int bytes) 35 + { 36 + unsigned int off, idx; 37 + u8 *d = dest; 38 + 39 + for (off = 0; off < bytes; off++) { 40 + for (idx = 0; idx < src_cnt; idx++) { 41 + u8 *src = srcs[idx]; 42 + 43 + d[off] ^= src[off]; 44 + } 45 + } 46 + } 47 + 48 + /* Generate a random length that is a multiple of 512. */ 49 + static unsigned int random_length(unsigned int max_length) 50 + { 51 + return round_up((rand32() % max_length) + 1, 512); 52 + } 53 + 54 + /* Generate a random alignment that is a multiple of 64. */ 55 + static unsigned int random_alignment(unsigned int max_alignment) 56 + { 57 + return ((rand32() % max_alignment) + 1) & ~63; 58 + } 59 + 60 + static void xor_generate_random_data(void) 61 + { 62 + int i; 63 + 64 + prandom_bytes_state(&rng, test_dest, test_buflen); 65 + memcpy(test_ref, test_dest, test_buflen); 66 + for (i = 0; i < XOR_KUNIT_MAX_BUFFERS; i++) 67 + prandom_bytes_state(&rng, test_buffers[i], test_buflen); 68 + } 69 + 70 + /* Test that xor_gen gives the same result as a reference implementation. */ 71 + static void xor_test(struct kunit *test) 72 + { 73 + void *aligned_buffers[XOR_KUNIT_MAX_BUFFERS]; 74 + size_t i; 75 + 76 + for (i = 0; i < XOR_KUNIT_NUM_TEST_ITERS; i++) { 77 + unsigned int nr_buffers = 78 + (rand32() % XOR_KUNIT_MAX_BUFFERS) + 1; 79 + unsigned int len = random_length(XOR_KUNIT_MAX_BYTES); 80 + unsigned int max_alignment, align = 0; 81 + void *buffers; 82 + 83 + if (rand32() % 8 == 0) 84 + /* Refresh the data occasionally. */ 85 + xor_generate_random_data(); 86 + 87 + /* 88 + * If we're not using the entire buffer size, inject a randomized 89 + * alignment into the buffers. 90 + */ 91 + max_alignment = XOR_KUNIT_MAX_BYTES - len; 92 + if (max_alignment == 0) { 93 + buffers = test_buffers; 94 + } else if (rand32() % 2 == 0) { 95 + /* Use random offsets that are multiples of 64 */ 96 + int j; 97 + 98 + for (j = 0; j < nr_buffers; j++) 99 + aligned_buffers[j] = test_buffers[j] + 100 + random_alignment(max_alignment); 101 + buffers = aligned_buffers; 102 + align = random_alignment(max_alignment); 103 + } else { 104 + /* Go up to the guard page, to catch buffer overreads */ 105 + int j; 106 + 107 + align = test_buflen - len; 108 + for (j = 0; j < nr_buffers; j++) 109 + aligned_buffers[j] = test_buffers[j] + align; 110 + buffers = aligned_buffers; 111 + } 112 + 113 + /* 114 + * Compute the XOR, and verify that it equals the XOR computed 115 + * by a simple byte-at-a-time reference implementation.
116 + */ 117 + xor_ref(test_ref + align, buffers, nr_buffers, len); 118 + xor_gen(test_dest + align, buffers, nr_buffers, len); 119 + KUNIT_EXPECT_MEMEQ_MSG(test, test_ref + align, 120 + test_dest + align, len, 121 + "Wrong result with buffers=%u, len=%u, unaligned=%s, at_end=%s", 122 + nr_buffers, len, 123 + str_yes_no(max_alignment), 124 + str_yes_no(align + len == test_buflen)); 125 + } 126 + } 127 + 128 + static struct kunit_case xor_test_cases[] = { 129 + KUNIT_CASE(xor_test), 130 + {}, 131 + }; 132 + 133 + static int xor_suite_init(struct kunit_suite *suite) 134 + { 135 + int i; 136 + 137 + /* 138 + * Allocate the test buffer using vmalloc() with a page-aligned length 139 + * so that it is immediately followed by a guard page. This allows 140 + * buffer overreads to be detected, even in assembly code. 141 + */ 142 + test_buflen = round_up(XOR_KUNIT_MAX_BYTES, PAGE_SIZE); 143 + test_ref = vmalloc(test_buflen); 144 + if (!test_ref) 145 + return -ENOMEM; 146 + test_dest = vmalloc(test_buflen); 147 + if (!test_dest) 148 + goto out_free_ref; 149 + for (i = 0; i < XOR_KUNIT_MAX_BUFFERS; i++) { 150 + test_buffers[i] = vmalloc(test_buflen); 151 + if (!test_buffers[i]) 152 + goto out_free_buffers; 153 + } 154 + 155 + prandom_seed_state(&rng, XOR_KUNIT_SEED); 156 + xor_generate_random_data(); 157 + return 0; 158 + 159 + out_free_buffers: 160 + while (--i >= 0) 161 + vfree(test_buffers[i]); 162 + vfree(test_dest); 163 + out_free_ref: 164 + vfree(test_ref); 165 + return -ENOMEM; 166 + } 167 + 168 + static void xor_suite_exit(struct kunit_suite *suite) 169 + { 170 + int i; 171 + 172 + vfree(test_ref); 173 + vfree(test_dest); 174 + for (i = 0; i < XOR_KUNIT_MAX_BUFFERS; i++) 175 + vfree(test_buffers[i]); 176 + } 177 + 178 + static struct kunit_suite xor_test_suite = { 179 + .name = "xor", 180 + .test_cases = xor_test_cases, 181 + .suite_init = xor_suite_init, 182 + .suite_exit = xor_suite_exit, 183 + }; 184 + kunit_test_suite(xor_test_suite); 185 + 186 + MODULE_DESCRIPTION("Unit test for the XOR library functions"); 187 + MODULE_LICENSE("GPL");
+2
lib/raid/xor/um/xor_arch.h
··· 1 + /* SPDX-License-Identifier: GPL-2.0 */ 2 + #include <../x86/xor_arch.h>
+156
lib/raid/xor/x86/xor-avx.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-only 2 + /* 3 + * Optimized XOR parity functions for AVX 4 + * 5 + * Copyright (C) 2012 Intel Corporation 6 + * Author: Jim Kukunas <james.t.kukunas@linux.intel.com> 7 + * 8 + * Based on Ingo Molnar and Zach Brown's respective MMX and SSE routines 9 + */ 10 + #include <linux/compiler.h> 11 + #include <asm/fpu/api.h> 12 + #include "xor_impl.h" 13 + #include "xor_arch.h" 14 + 15 + #define BLOCK4(i) \ 16 + BLOCK(32 * i, 0) \ 17 + BLOCK(32 * (i + 1), 1) \ 18 + BLOCK(32 * (i + 2), 2) \ 19 + BLOCK(32 * (i + 3), 3) 20 + 21 + #define BLOCK16() \ 22 + BLOCK4(0) \ 23 + BLOCK4(4) \ 24 + BLOCK4(8) \ 25 + BLOCK4(12) 26 + 27 + static void xor_avx_2(unsigned long bytes, unsigned long * __restrict p0, 28 + const unsigned long * __restrict p1) 29 + { 30 + unsigned long lines = bytes >> 9; 31 + 32 + while (lines--) { 33 + #undef BLOCK 34 + #define BLOCK(i, reg) \ 35 + do { \ 36 + asm volatile("vmovdqa %0, %%ymm" #reg : : "m" (p1[i / sizeof(*p1)])); \ 37 + asm volatile("vxorps %0, %%ymm" #reg ", %%ymm" #reg : : \ 38 + "m" (p0[i / sizeof(*p0)])); \ 39 + asm volatile("vmovdqa %%ymm" #reg ", %0" : \ 40 + "=m" (p0[i / sizeof(*p0)])); \ 41 + } while (0); 42 + 43 + BLOCK16() 44 + 45 + p0 = (unsigned long *)((uintptr_t)p0 + 512); 46 + p1 = (unsigned long *)((uintptr_t)p1 + 512); 47 + } 48 + } 49 + 50 + static void xor_avx_3(unsigned long bytes, unsigned long * __restrict p0, 51 + const unsigned long * __restrict p1, 52 + const unsigned long * __restrict p2) 53 + { 54 + unsigned long lines = bytes >> 9; 55 + 56 + while (lines--) { 57 + #undef BLOCK 58 + #define BLOCK(i, reg) \ 59 + do { \ 60 + asm volatile("vmovdqa %0, %%ymm" #reg : : "m" (p2[i / sizeof(*p2)])); \ 61 + asm volatile("vxorps %0, %%ymm" #reg ", %%ymm" #reg : : \ 62 + "m" (p1[i / sizeof(*p1)])); \ 63 + asm volatile("vxorps %0, %%ymm" #reg ", %%ymm" #reg : : \ 64 + "m" (p0[i / sizeof(*p0)])); \ 65 + asm volatile("vmovdqa %%ymm" #reg ", %0" : \ 66 + "=m" (p0[i / sizeof(*p0)])); \ 67 + } while (0); 68 + 69 + BLOCK16() 70 + 71 + p0 = (unsigned long *)((uintptr_t)p0 + 512); 72 + p1 = (unsigned long *)((uintptr_t)p1 + 512); 73 + p2 = (unsigned long *)((uintptr_t)p2 + 512); 74 + } 75 + } 76 + 77 + static void xor_avx_4(unsigned long bytes, unsigned long * __restrict p0, 78 + const unsigned long * __restrict p1, 79 + const unsigned long * __restrict p2, 80 + const unsigned long * __restrict p3) 81 + { 82 + unsigned long lines = bytes >> 9; 83 + 84 + while (lines--) { 85 + #undef BLOCK 86 + #define BLOCK(i, reg) \ 87 + do { \ 88 + asm volatile("vmovdqa %0, %%ymm" #reg : : "m" (p3[i / sizeof(*p3)])); \ 89 + asm volatile("vxorps %0, %%ymm" #reg ", %%ymm" #reg : : \ 90 + "m" (p2[i / sizeof(*p2)])); \ 91 + asm volatile("vxorps %0, %%ymm" #reg ", %%ymm" #reg : : \ 92 + "m" (p1[i / sizeof(*p1)])); \ 93 + asm volatile("vxorps %0, %%ymm" #reg ", %%ymm" #reg : : \ 94 + "m" (p0[i / sizeof(*p0)])); \ 95 + asm volatile("vmovdqa %%ymm" #reg ", %0" : \ 96 + "=m" (p0[i / sizeof(*p0)])); \ 97 + } while (0); 98 + 99 + BLOCK16(); 100 + 101 + p0 = (unsigned long *)((uintptr_t)p0 + 512); 102 + p1 = (unsigned long *)((uintptr_t)p1 + 512); 103 + p2 = (unsigned long *)((uintptr_t)p2 + 512); 104 + p3 = (unsigned long *)((uintptr_t)p3 + 512); 105 + } 106 + } 107 + 108 + static void xor_avx_5(unsigned long bytes, unsigned long * __restrict p0, 109 + const unsigned long * __restrict p1, 110 + const unsigned long * __restrict p2, 111 + const unsigned long * __restrict p3, 112 + const unsigned long * __restrict p4) 113 + { 114 + unsigned long 
lines = bytes >> 9; 115 + 116 + while (lines--) { 117 + #undef BLOCK 118 + #define BLOCK(i, reg) \ 119 + do { \ 120 + asm volatile("vmovdqa %0, %%ymm" #reg : : "m" (p4[i / sizeof(*p4)])); \ 121 + asm volatile("vxorps %0, %%ymm" #reg ", %%ymm" #reg : : \ 122 + "m" (p3[i / sizeof(*p3)])); \ 123 + asm volatile("vxorps %0, %%ymm" #reg ", %%ymm" #reg : : \ 124 + "m" (p2[i / sizeof(*p2)])); \ 125 + asm volatile("vxorps %0, %%ymm" #reg ", %%ymm" #reg : : \ 126 + "m" (p1[i / sizeof(*p1)])); \ 127 + asm volatile("vxorps %0, %%ymm" #reg ", %%ymm" #reg : : \ 128 + "m" (p0[i / sizeof(*p0)])); \ 129 + asm volatile("vmovdqa %%ymm" #reg ", %0" : \ 130 + "=m" (p0[i / sizeof(*p0)])); \ 131 + } while (0); 132 + 133 + BLOCK16() 134 + 135 + p0 = (unsigned long *)((uintptr_t)p0 + 512); 136 + p1 = (unsigned long *)((uintptr_t)p1 + 512); 137 + p2 = (unsigned long *)((uintptr_t)p2 + 512); 138 + p3 = (unsigned long *)((uintptr_t)p3 + 512); 139 + p4 = (unsigned long *)((uintptr_t)p4 + 512); 140 + } 141 + } 142 + 143 + DO_XOR_BLOCKS(avx_inner, xor_avx_2, xor_avx_3, xor_avx_4, xor_avx_5); 144 + 145 + static void xor_gen_avx(void *dest, void **srcs, unsigned int src_cnt, 146 + unsigned int bytes) 147 + { 148 + kernel_fpu_begin(); 149 + xor_gen_avx_inner(dest, srcs, src_cnt, bytes); 150 + kernel_fpu_end(); 151 + } 152 + 153 + struct xor_block_template xor_block_avx = { 154 + .name = "avx", 155 + .xor_gen = xor_gen_avx, 156 + };
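On the shift above: BLOCK16() expands to sixteen BLOCK() groups and each BLOCK() moves one 32-byte ymm register, so a single loop pass XORs 16 * 32 = 512 bytes, which is why every xor_avx_* routine computes lines = bytes >> 9.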
+515
lib/raid/xor/x86/xor-mmx.c
// SPDX-License-Identifier: GPL-2.0-or-later
/*
 * Optimized XOR parity functions for MMX.
 *
 * Copyright (C) 1998 Ingo Molnar.
 */
#include <asm/fpu/api.h>
#include "xor_impl.h"
#include "xor_arch.h"

#define LD(x, y)	" movq 8*("#x")(%1), %%mm"#y" ;\n"
#define ST(x, y)	" movq %%mm"#y", 8*("#x")(%1) ;\n"
#define XO1(x, y)	" pxor 8*("#x")(%2), %%mm"#y" ;\n"
#define XO2(x, y)	" pxor 8*("#x")(%3), %%mm"#y" ;\n"
#define XO3(x, y)	" pxor 8*("#x")(%4), %%mm"#y" ;\n"
#define XO4(x, y)	" pxor 8*("#x")(%5), %%mm"#y" ;\n"

static void
xor_pII_mmx_2(unsigned long bytes, unsigned long * __restrict p1,
	      const unsigned long * __restrict p2)
{
	unsigned long lines = bytes >> 7;

	asm volatile(
#undef BLOCK
#define BLOCK(i)	\
	LD(i, 0)	\
	LD(i + 1, 1)	\
	LD(i + 2, 2)	\
	LD(i + 3, 3)	\
	XO1(i, 0)	\
	ST(i, 0)	\
	XO1(i + 1, 1)	\
	ST(i + 1, 1)	\
	XO1(i + 2, 2)	\
	ST(i + 2, 2)	\
	XO1(i + 3, 3)	\
	ST(i + 3, 3)

	" .align 32 ;\n"
	" 1: ;\n"

	BLOCK(0)
	BLOCK(4)
	BLOCK(8)
	BLOCK(12)

	" addl $128, %1 ;\n"
	" addl $128, %2 ;\n"
	" decl %0 ;\n"
	" jnz 1b ;\n"
	: "+r" (lines),
	  "+r" (p1), "+r" (p2)
	:
	: "memory");
}

static void
xor_pII_mmx_3(unsigned long bytes, unsigned long * __restrict p1,
	      const unsigned long * __restrict p2,
	      const unsigned long * __restrict p3)
{
	unsigned long lines = bytes >> 7;

	asm volatile(
#undef BLOCK
#define BLOCK(i)	\
	LD(i, 0)	\
	LD(i + 1, 1)	\
	LD(i + 2, 2)	\
	LD(i + 3, 3)	\
	XO1(i, 0)	\
	XO1(i + 1, 1)	\
	XO1(i + 2, 2)	\
	XO1(i + 3, 3)	\
	XO2(i, 0)	\
	ST(i, 0)	\
	XO2(i + 1, 1)	\
	ST(i + 1, 1)	\
	XO2(i + 2, 2)	\
	ST(i + 2, 2)	\
	XO2(i + 3, 3)	\
	ST(i + 3, 3)

	" .align 32 ;\n"
	" 1: ;\n"

	BLOCK(0)
	BLOCK(4)
	BLOCK(8)
	BLOCK(12)

	" addl $128, %1 ;\n"
	" addl $128, %2 ;\n"
	" addl $128, %3 ;\n"
	" decl %0 ;\n"
	" jnz 1b ;\n"
	: "+r" (lines),
	  "+r" (p1), "+r" (p2), "+r" (p3)
	:
	: "memory");
}

static void
xor_pII_mmx_4(unsigned long bytes, unsigned long * __restrict p1,
	      const unsigned long * __restrict p2,
	      const unsigned long * __restrict p3,
	      const unsigned long * __restrict p4)
{
	unsigned long lines = bytes >> 7;

	asm volatile(
#undef BLOCK
#define BLOCK(i)	\
	LD(i, 0)	\
	LD(i + 1, 1)	\
	LD(i + 2, 2)	\
	LD(i + 3, 3)	\
	XO1(i, 0)	\
	XO1(i + 1, 1)	\
	XO1(i + 2, 2)	\
	XO1(i + 3, 3)	\
	XO2(i, 0)	\
	XO2(i + 1, 1)	\
	XO2(i + 2, 2)	\
	XO2(i + 3, 3)	\
	XO3(i, 0)	\
	ST(i, 0)	\
	XO3(i + 1, 1)	\
	ST(i + 1, 1)	\
	XO3(i + 2, 2)	\
	ST(i + 2, 2)	\
	XO3(i + 3, 3)	\
	ST(i + 3, 3)

	" .align 32 ;\n"
	" 1: ;\n"

	BLOCK(0)
	BLOCK(4)
	BLOCK(8)
	BLOCK(12)

	" addl $128, %1 ;\n"
	" addl $128, %2 ;\n"
	" addl $128, %3 ;\n"
	" addl $128, %4 ;\n"
	" decl %0 ;\n"
	" jnz 1b ;\n"
	: "+r" (lines),
	  "+r" (p1), "+r" (p2), "+r" (p3), "+r" (p4)
	:
	: "memory");
}

static void
xor_pII_mmx_5(unsigned long bytes, unsigned long * __restrict p1,
	      const unsigned long * __restrict p2,
	      const unsigned long * __restrict p3,
	      const unsigned long * __restrict p4,
	      const unsigned long * __restrict p5)
{
	unsigned long lines = bytes >> 7;

	/* Make sure GCC forgets anything it knows about p4 or p5,
	   such that it won't pass to the asm volatile below a
	   register that is shared with any other variable.  That's
	   because we modify p4 and p5 there, but we can't mark them
	   as read/write, otherwise we'd overflow the 10-asm-operands
	   limit of GCC < 3.1.  */
	asm("" : "+r" (p4), "+r" (p5));

	asm volatile(
#undef BLOCK
#define BLOCK(i)	\
	LD(i, 0)	\
	LD(i + 1, 1)	\
	LD(i + 2, 2)	\
	LD(i + 3, 3)	\
	XO1(i, 0)	\
	XO1(i + 1, 1)	\
	XO1(i + 2, 2)	\
	XO1(i + 3, 3)	\
	XO2(i, 0)	\
	XO2(i + 1, 1)	\
	XO2(i + 2, 2)	\
	XO2(i + 3, 3)	\
	XO3(i, 0)	\
	XO3(i + 1, 1)	\
	XO3(i + 2, 2)	\
	XO3(i + 3, 3)	\
	XO4(i, 0)	\
	ST(i, 0)	\
	XO4(i + 1, 1)	\
	ST(i + 1, 1)	\
	XO4(i + 2, 2)	\
	ST(i + 2, 2)	\
	XO4(i + 3, 3)	\
	ST(i + 3, 3)

	" .align 32 ;\n"
	" 1: ;\n"

	BLOCK(0)
	BLOCK(4)
	BLOCK(8)
	BLOCK(12)

	" addl $128, %1 ;\n"
	" addl $128, %2 ;\n"
	" addl $128, %3 ;\n"
	" addl $128, %4 ;\n"
	" addl $128, %5 ;\n"
	" decl %0 ;\n"
	" jnz 1b ;\n"
	: "+r" (lines),
	  "+r" (p1), "+r" (p2), "+r" (p3)
	: "r" (p4), "r" (p5)
	: "memory");

	/* p4 and p5 were modified, and now the variables are dead.
	   Clobber them just to be sure nobody does something stupid
	   like assuming they have some legal value.  */
	asm("" : "=r" (p4), "=r" (p5));
}

#undef LD
#undef XO1
#undef XO2
#undef XO3
#undef XO4
#undef ST
#undef BLOCK

static void
xor_p5_mmx_2(unsigned long bytes, unsigned long * __restrict p1,
	     const unsigned long * __restrict p2)
{
	unsigned long lines = bytes >> 6;

	asm volatile(
	" .align 32 ;\n"
	" 1: ;\n"
	" movq (%1), %%mm0 ;\n"
	" movq 8(%1), %%mm1 ;\n"
	" pxor (%2), %%mm0 ;\n"
	" movq 16(%1), %%mm2 ;\n"
	" movq %%mm0, (%1) ;\n"
	" pxor 8(%2), %%mm1 ;\n"
	" movq 24(%1), %%mm3 ;\n"
	" movq %%mm1, 8(%1) ;\n"
	" pxor 16(%2), %%mm2 ;\n"
	" movq 32(%1), %%mm4 ;\n"
	" movq %%mm2, 16(%1) ;\n"
	" pxor 24(%2), %%mm3 ;\n"
	" movq 40(%1), %%mm5 ;\n"
	" movq %%mm3, 24(%1) ;\n"
	" pxor 32(%2), %%mm4 ;\n"
	" movq 48(%1), %%mm6 ;\n"
	" movq %%mm4, 32(%1) ;\n"
	" pxor 40(%2), %%mm5 ;\n"
	" movq 56(%1), %%mm7 ;\n"
	" movq %%mm5, 40(%1) ;\n"
	" pxor 48(%2), %%mm6 ;\n"
	" pxor 56(%2), %%mm7 ;\n"
	" movq %%mm6, 48(%1) ;\n"
	" movq %%mm7, 56(%1) ;\n"

	" addl $64, %1 ;\n"
	" addl $64, %2 ;\n"
	" decl %0 ;\n"
	" jnz 1b ;\n"
	: "+r" (lines),
	  "+r" (p1), "+r" (p2)
	:
	: "memory");
}

static void
xor_p5_mmx_3(unsigned long bytes, unsigned long * __restrict p1,
	     const unsigned long * __restrict p2,
	     const unsigned long * __restrict p3)
{
	unsigned long lines = bytes >> 6;

	asm volatile(
	" .align 32,0x90 ;\n"
	" 1: ;\n"
	" movq (%1), %%mm0 ;\n"
	" movq 8(%1), %%mm1 ;\n"
	" pxor (%2), %%mm0 ;\n"
	" movq 16(%1), %%mm2 ;\n"
	" pxor 8(%2), %%mm1 ;\n"
	" pxor (%3), %%mm0 ;\n"
	" pxor 16(%2), %%mm2 ;\n"
	" movq %%mm0, (%1) ;\n"
	" pxor 8(%3), %%mm1 ;\n"
	" pxor 16(%3), %%mm2 ;\n"
	" movq 24(%1), %%mm3 ;\n"
	" movq %%mm1, 8(%1) ;\n"
	" movq 32(%1), %%mm4 ;\n"
	" movq 40(%1), %%mm5 ;\n"
	" pxor 24(%2), %%mm3 ;\n"
	" movq %%mm2, 16(%1) ;\n"
	" pxor 32(%2), %%mm4 ;\n"
	" pxor 24(%3), %%mm3 ;\n"
	" pxor 40(%2), %%mm5 ;\n"
	" movq %%mm3, 24(%1) ;\n"
	" pxor 32(%3), %%mm4 ;\n"
	" pxor 40(%3), %%mm5 ;\n"
	" movq 48(%1), %%mm6 ;\n"
	" movq %%mm4, 32(%1) ;\n"
	" movq 56(%1), %%mm7 ;\n"
	" pxor 48(%2), %%mm6 ;\n"
	" movq %%mm5, 40(%1) ;\n"
	" pxor 56(%2), %%mm7 ;\n"
	" pxor 48(%3), %%mm6 ;\n"
	" pxor 56(%3), %%mm7 ;\n"
	" movq %%mm6, 48(%1) ;\n"
	" movq %%mm7, 56(%1) ;\n"

	" addl $64, %1 ;\n"
	" addl $64, %2 ;\n"
	" addl $64, %3 ;\n"
	" decl %0 ;\n"
	" jnz 1b ;\n"
	: "+r" (lines),
	  "+r" (p1), "+r" (p2), "+r" (p3)
	:
	: "memory");
}

static void
xor_p5_mmx_4(unsigned long bytes, unsigned long * __restrict p1,
	     const unsigned long * __restrict p2,
	     const unsigned long * __restrict p3,
	     const unsigned long * __restrict p4)
{
	unsigned long lines = bytes >> 6;

	asm volatile(
	" .align 32,0x90 ;\n"
	" 1: ;\n"
	" movq (%1), %%mm0 ;\n"
	" movq 8(%1), %%mm1 ;\n"
	" pxor (%2), %%mm0 ;\n"
	" movq 16(%1), %%mm2 ;\n"
	" pxor 8(%2), %%mm1 ;\n"
	" pxor (%3), %%mm0 ;\n"
	" pxor 16(%2), %%mm2 ;\n"
	" pxor 8(%3), %%mm1 ;\n"
	" pxor (%4), %%mm0 ;\n"
	" movq 24(%1), %%mm3 ;\n"
	" pxor 16(%3), %%mm2 ;\n"
	" pxor 8(%4), %%mm1 ;\n"
	" movq %%mm0, (%1) ;\n"
	" movq 32(%1), %%mm4 ;\n"
	" pxor 24(%2), %%mm3 ;\n"
	" pxor 16(%4), %%mm2 ;\n"
	" movq %%mm1, 8(%1) ;\n"
	" movq 40(%1), %%mm5 ;\n"
	" pxor 32(%2), %%mm4 ;\n"
	" pxor 24(%3), %%mm3 ;\n"
	" movq %%mm2, 16(%1) ;\n"
	" pxor 40(%2), %%mm5 ;\n"
	" pxor 32(%3), %%mm4 ;\n"
	" pxor 24(%4), %%mm3 ;\n"
	" movq %%mm3, 24(%1) ;\n"
	" movq 56(%1), %%mm7 ;\n"
	" movq 48(%1), %%mm6 ;\n"
	" pxor 40(%3), %%mm5 ;\n"
	" pxor 32(%4), %%mm4 ;\n"
	" pxor 48(%2), %%mm6 ;\n"
	" movq %%mm4, 32(%1) ;\n"
	" pxor 56(%2), %%mm7 ;\n"
	" pxor 40(%4), %%mm5 ;\n"
	" pxor 48(%3), %%mm6 ;\n"
	" pxor 56(%3), %%mm7 ;\n"
	" movq %%mm5, 40(%1) ;\n"
	" pxor 48(%4), %%mm6 ;\n"
	" pxor 56(%4), %%mm7 ;\n"
	" movq %%mm6, 48(%1) ;\n"
	" movq %%mm7, 56(%1) ;\n"

	" addl $64, %1 ;\n"
	" addl $64, %2 ;\n"
	" addl $64, %3 ;\n"
	" addl $64, %4 ;\n"
	" decl %0 ;\n"
	" jnz 1b ;\n"
	: "+r" (lines),
	  "+r" (p1), "+r" (p2), "+r" (p3), "+r" (p4)
	:
	: "memory");
}

static void
xor_p5_mmx_5(unsigned long bytes, unsigned long * __restrict p1,
	     const unsigned long * __restrict p2,
	     const unsigned long * __restrict p3,
	     const unsigned long * __restrict p4,
	     const unsigned long * __restrict p5)
{
	unsigned long lines = bytes >> 6;

	/* Make sure GCC forgets anything it knows about p4 or p5,
	   such that it won't pass to the asm volatile below a
	   register that is shared with any other variable.  That's
	   because we modify p4 and p5 there, but we can't mark them
	   as read/write, otherwise we'd overflow the 10-asm-operands
	   limit of GCC < 3.1.  */
	asm("" : "+r" (p4), "+r" (p5));

	asm volatile(
	" .align 32,0x90 ;\n"
	" 1: ;\n"
	" movq (%1), %%mm0 ;\n"
	" movq 8(%1), %%mm1 ;\n"
	" pxor (%2), %%mm0 ;\n"
	" pxor 8(%2), %%mm1 ;\n"
	" movq 16(%1), %%mm2 ;\n"
	" pxor (%3), %%mm0 ;\n"
	" pxor 8(%3), %%mm1 ;\n"
	" pxor 16(%2), %%mm2 ;\n"
	" pxor (%4), %%mm0 ;\n"
	" pxor 8(%4), %%mm1 ;\n"
	" pxor 16(%3), %%mm2 ;\n"
	" movq 24(%1), %%mm3 ;\n"
	" pxor (%5), %%mm0 ;\n"
	" pxor 8(%5), %%mm1 ;\n"
	" movq %%mm0, (%1) ;\n"
	" pxor 16(%4), %%mm2 ;\n"
	" pxor 24(%2), %%mm3 ;\n"
	" movq %%mm1, 8(%1) ;\n"
	" pxor 16(%5), %%mm2 ;\n"
	" pxor 24(%3), %%mm3 ;\n"
	" movq 32(%1), %%mm4 ;\n"
	" movq %%mm2, 16(%1) ;\n"
	" pxor 24(%4), %%mm3 ;\n"
	" pxor 32(%2), %%mm4 ;\n"
	" movq 40(%1), %%mm5 ;\n"
	" pxor 24(%5), %%mm3 ;\n"
	" pxor 32(%3), %%mm4 ;\n"
	" pxor 40(%2), %%mm5 ;\n"
	" movq %%mm3, 24(%1) ;\n"
	" pxor 32(%4), %%mm4 ;\n"
	" pxor 40(%3), %%mm5 ;\n"
	" movq 48(%1), %%mm6 ;\n"
	" movq 56(%1), %%mm7 ;\n"
	" pxor 32(%5), %%mm4 ;\n"
	" pxor 40(%4), %%mm5 ;\n"
	" pxor 48(%2), %%mm6 ;\n"
	" pxor 56(%2), %%mm7 ;\n"
	" movq %%mm4, 32(%1) ;\n"
	" pxor 48(%3), %%mm6 ;\n"
	" pxor 56(%3), %%mm7 ;\n"
	" pxor 40(%5), %%mm5 ;\n"
	" pxor 48(%4), %%mm6 ;\n"
	" pxor 56(%4), %%mm7 ;\n"
	" movq %%mm5, 40(%1) ;\n"
	" pxor 48(%5), %%mm6 ;\n"
	" pxor 56(%5), %%mm7 ;\n"
	" movq %%mm6, 48(%1) ;\n"
	" movq %%mm7, 56(%1) ;\n"

	" addl $64, %1 ;\n"
	" addl $64, %2 ;\n"
	" addl $64, %3 ;\n"
	" addl $64, %4 ;\n"
	" addl $64, %5 ;\n"
	" decl %0 ;\n"
	" jnz 1b ;\n"
	: "+r" (lines),
	  "+r" (p1), "+r" (p2), "+r" (p3)
	: "r" (p4), "r" (p5)
	: "memory");

	/* p4 and p5 were modified, and now the variables are dead.
	   Clobber them just to be sure nobody does something stupid
	   like assuming they have some legal value.  */
	asm("" : "=r" (p4), "=r" (p5));
}

DO_XOR_BLOCKS(pII_mmx_inner, xor_pII_mmx_2, xor_pII_mmx_3, xor_pII_mmx_4,
	      xor_pII_mmx_5);

static void xor_gen_pII_mmx(void *dest, void **srcs, unsigned int src_cnt,
			    unsigned int bytes)
{
	kernel_fpu_begin();
	xor_gen_pII_mmx_inner(dest, srcs, src_cnt, bytes);
	kernel_fpu_end();
}

struct xor_block_template xor_block_pII_mmx = {
	.name = "pII_mmx",
	.xor_gen = xor_gen_pII_mmx,
};

DO_XOR_BLOCKS(p5_mmx_inner, xor_p5_mmx_2, xor_p5_mmx_3, xor_p5_mmx_4,
	      xor_p5_mmx_5);

static void xor_gen_p5_mmx(void *dest, void **srcs, unsigned int src_cnt,
			   unsigned int bytes)
{
	kernel_fpu_begin();
	xor_gen_p5_mmx_inner(dest, srcs, src_cnt, bytes);
	kernel_fpu_end();
}

struct xor_block_template xor_block_p5_mmx = {
	.name = "p5_mmx",
	.xor_gen = xor_gen_p5_mmx,
};
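For readers not fluent in the inline assembly above, the following scalar sketch (not part of the patch; the function name is made up) models what each pII variant computes: XOR-accumulating one source into the destination, one 128-byte "line" per loop iteration, which the asm does with unrolled movq/pxor pairs.

/*
 * Hypothetical scalar reference model of xor_pII_mmx_2(): same contract,
 * 128 bytes per iteration, no MMX.
 */
static void xor_scalar_2(unsigned long bytes, unsigned long *p1,
			 const unsigned long *p2)
{
	unsigned long lines = bytes >> 7;	/* 128 bytes per line */

	do {
		unsigned int i;

		for (i = 0; i < 128 / sizeof(unsigned long); i++)
			p1[i] ^= p2[i];
		p1 += 128 / sizeof(unsigned long);
		p2 += 128 / sizeof(unsigned long);
	} while (--lines);
}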
+459
lib/raid/xor/x86/xor-sse.c
// SPDX-License-Identifier: GPL-2.0-or-later
/*
 * Optimized XOR parity functions for SSE.
 *
 * Cache avoiding checksumming functions utilizing KNI instructions
 * Copyright (C) 1999 Zach Brown (with obvious credit due Ingo)
 *
 * Based on
 * High-speed RAID5 checksumming functions utilizing SSE instructions.
 * Copyright (C) 1998 Ingo Molnar.
 *
 * x86-64 changes / gcc fixes from Andi Kleen.
 * Copyright 2002 Andi Kleen, SuSE Labs.
 */
#include <asm/fpu/api.h>
#include "xor_impl.h"
#include "xor_arch.h"

#ifdef CONFIG_X86_32
/* reduce register pressure */
# define XOR_CONSTANT_CONSTRAINT "i"
#else
# define XOR_CONSTANT_CONSTRAINT "re"
#endif

#define OFFS(x)		"16*("#x")"
#define PF_OFFS(x)	"256+16*("#x")"
#define PF0(x)		" prefetchnta "PF_OFFS(x)"(%[p1]) ;\n"
#define LD(x, y)	" movaps "OFFS(x)"(%[p1]), %%xmm"#y" ;\n"
#define ST(x, y)	" movaps %%xmm"#y", "OFFS(x)"(%[p1]) ;\n"
#define PF1(x)		" prefetchnta "PF_OFFS(x)"(%[p2]) ;\n"
#define PF2(x)		" prefetchnta "PF_OFFS(x)"(%[p3]) ;\n"
#define PF3(x)		" prefetchnta "PF_OFFS(x)"(%[p4]) ;\n"
#define PF4(x)		" prefetchnta "PF_OFFS(x)"(%[p5]) ;\n"
#define XO1(x, y)	" xorps "OFFS(x)"(%[p2]), %%xmm"#y" ;\n"
#define XO2(x, y)	" xorps "OFFS(x)"(%[p3]), %%xmm"#y" ;\n"
#define XO3(x, y)	" xorps "OFFS(x)"(%[p4]), %%xmm"#y" ;\n"
#define XO4(x, y)	" xorps "OFFS(x)"(%[p5]), %%xmm"#y" ;\n"
#define NOP(x)

#define BLK64(pf, op, i)	\
	pf(i)			\
	op(i, 0)		\
	op(i + 1, 1)		\
	op(i + 2, 2)		\
	op(i + 3, 3)

static void
xor_sse_2(unsigned long bytes, unsigned long * __restrict p1,
	  const unsigned long * __restrict p2)
{
	unsigned long lines = bytes >> 8;

	asm volatile(
#undef BLOCK
#define BLOCK(i)	\
	LD(i, 0)	\
	LD(i + 1, 1)	\
	PF1(i)		\
	PF1(i + 2)	\
	LD(i + 2, 2)	\
	LD(i + 3, 3)	\
	PF0(i + 4)	\
	PF0(i + 6)	\
	XO1(i, 0)	\
	XO1(i + 1, 1)	\
	XO1(i + 2, 2)	\
	XO1(i + 3, 3)	\
	ST(i, 0)	\
	ST(i + 1, 1)	\
	ST(i + 2, 2)	\
	ST(i + 3, 3)	\


	PF0(0)
	PF0(2)

	" .align 32 ;\n"
	" 1: ;\n"

	BLOCK(0)
	BLOCK(4)
	BLOCK(8)
	BLOCK(12)

	" add %[inc], %[p1] ;\n"
	" add %[inc], %[p2] ;\n"
	" dec %[cnt] ;\n"
	" jnz 1b ;\n"
	: [cnt] "+r" (lines),
	  [p1] "+r" (p1), [p2] "+r" (p2)
	: [inc] XOR_CONSTANT_CONSTRAINT (256UL)
	: "memory");
}

static void
xor_sse_2_pf64(unsigned long bytes, unsigned long * __restrict p1,
	       const unsigned long * __restrict p2)
{
	unsigned long lines = bytes >> 8;

	asm volatile(
#undef BLOCK
#define BLOCK(i)		\
	BLK64(PF0, LD, i)	\
	BLK64(PF1, XO1, i)	\
	BLK64(NOP, ST, i)	\

	" .align 32 ;\n"
	" 1: ;\n"

	BLOCK(0)
	BLOCK(4)
	BLOCK(8)
	BLOCK(12)

	" add %[inc], %[p1] ;\n"
	" add %[inc], %[p2] ;\n"
	" dec %[cnt] ;\n"
	" jnz 1b ;\n"
	: [cnt] "+r" (lines),
	  [p1] "+r" (p1), [p2] "+r" (p2)
	: [inc] XOR_CONSTANT_CONSTRAINT (256UL)
	: "memory");
}

static void
xor_sse_3(unsigned long bytes, unsigned long * __restrict p1,
	  const unsigned long * __restrict p2,
	  const unsigned long * __restrict p3)
{
	unsigned long lines = bytes >> 8;

	asm volatile(
#undef BLOCK
#define BLOCK(i)	\
	PF1(i)		\
	PF1(i + 2)	\
	LD(i, 0)	\
	LD(i + 1, 1)	\
	LD(i + 2, 2)	\
	LD(i + 3, 3)	\
	PF2(i)		\
	PF2(i + 2)	\
	PF0(i + 4)	\
	PF0(i + 6)	\
	XO1(i, 0)	\
	XO1(i + 1, 1)	\
	XO1(i + 2, 2)	\
	XO1(i + 3, 3)	\
	XO2(i, 0)	\
	XO2(i + 1, 1)	\
	XO2(i + 2, 2)	\
	XO2(i + 3, 3)	\
	ST(i, 0)	\
	ST(i + 1, 1)	\
	ST(i + 2, 2)	\
	ST(i + 3, 3)	\


	PF0(0)
	PF0(2)

	" .align 32 ;\n"
	" 1: ;\n"

	BLOCK(0)
	BLOCK(4)
	BLOCK(8)
	BLOCK(12)

	" add %[inc], %[p1] ;\n"
	" add %[inc], %[p2] ;\n"
	" add %[inc], %[p3] ;\n"
	" dec %[cnt] ;\n"
	" jnz 1b ;\n"
	: [cnt] "+r" (lines),
	  [p1] "+r" (p1), [p2] "+r" (p2), [p3] "+r" (p3)
	: [inc] XOR_CONSTANT_CONSTRAINT (256UL)
	: "memory");
}

static void
xor_sse_3_pf64(unsigned long bytes, unsigned long * __restrict p1,
	       const unsigned long * __restrict p2,
	       const unsigned long * __restrict p3)
{
	unsigned long lines = bytes >> 8;

	asm volatile(
#undef BLOCK
#define BLOCK(i)		\
	BLK64(PF0, LD, i)	\
	BLK64(PF1, XO1, i)	\
	BLK64(PF2, XO2, i)	\
	BLK64(NOP, ST, i)	\

	" .align 32 ;\n"
	" 1: ;\n"

	BLOCK(0)
	BLOCK(4)
	BLOCK(8)
	BLOCK(12)

	" add %[inc], %[p1] ;\n"
	" add %[inc], %[p2] ;\n"
	" add %[inc], %[p3] ;\n"
	" dec %[cnt] ;\n"
	" jnz 1b ;\n"
	: [cnt] "+r" (lines),
	  [p1] "+r" (p1), [p2] "+r" (p2), [p3] "+r" (p3)
	: [inc] XOR_CONSTANT_CONSTRAINT (256UL)
	: "memory");
}

static void
xor_sse_4(unsigned long bytes, unsigned long * __restrict p1,
	  const unsigned long * __restrict p2,
	  const unsigned long * __restrict p3,
	  const unsigned long * __restrict p4)
{
	unsigned long lines = bytes >> 8;

	asm volatile(
#undef BLOCK
#define BLOCK(i)	\
	PF1(i)		\
	PF1(i + 2)	\
	LD(i, 0)	\
	LD(i + 1, 1)	\
	LD(i + 2, 2)	\
	LD(i + 3, 3)	\
	PF2(i)		\
	PF2(i + 2)	\
	XO1(i, 0)	\
	XO1(i + 1, 1)	\
	XO1(i + 2, 2)	\
	XO1(i + 3, 3)	\
	PF3(i)		\
	PF3(i + 2)	\
	PF0(i + 4)	\
	PF0(i + 6)	\
	XO2(i, 0)	\
	XO2(i + 1, 1)	\
	XO2(i + 2, 2)	\
	XO2(i + 3, 3)	\
	XO3(i, 0)	\
	XO3(i + 1, 1)	\
	XO3(i + 2, 2)	\
	XO3(i + 3, 3)	\
	ST(i, 0)	\
	ST(i + 1, 1)	\
	ST(i + 2, 2)	\
	ST(i + 3, 3)	\


	PF0(0)
	PF0(2)

	" .align 32 ;\n"
	" 1: ;\n"

	BLOCK(0)
	BLOCK(4)
	BLOCK(8)
	BLOCK(12)

	" add %[inc], %[p1] ;\n"
	" add %[inc], %[p2] ;\n"
	" add %[inc], %[p3] ;\n"
	" add %[inc], %[p4] ;\n"
	" dec %[cnt] ;\n"
	" jnz 1b ;\n"
	: [cnt] "+r" (lines), [p1] "+r" (p1),
	  [p2] "+r" (p2), [p3] "+r" (p3), [p4] "+r" (p4)
	: [inc] XOR_CONSTANT_CONSTRAINT (256UL)
	: "memory");
}

static void
xor_sse_4_pf64(unsigned long bytes, unsigned long * __restrict p1,
	       const unsigned long * __restrict p2,
	       const unsigned long * __restrict p3,
	       const unsigned long * __restrict p4)
{
	unsigned long lines = bytes >> 8;

	asm volatile(
#undef BLOCK
#define BLOCK(i)		\
	BLK64(PF0, LD, i)	\
	BLK64(PF1, XO1, i)	\
	BLK64(PF2, XO2, i)	\
	BLK64(PF3, XO3, i)	\
	BLK64(NOP, ST, i)	\

	" .align 32 ;\n"
	" 1: ;\n"

	BLOCK(0)
	BLOCK(4)
	BLOCK(8)
	BLOCK(12)

	" add %[inc], %[p1] ;\n"
	" add %[inc], %[p2] ;\n"
	" add %[inc], %[p3] ;\n"
	" add %[inc], %[p4] ;\n"
	" dec %[cnt] ;\n"
	" jnz 1b ;\n"
	: [cnt] "+r" (lines), [p1] "+r" (p1),
	  [p2] "+r" (p2), [p3] "+r" (p3), [p4] "+r" (p4)
	: [inc] XOR_CONSTANT_CONSTRAINT (256UL)
	: "memory");
}

static void
xor_sse_5(unsigned long bytes, unsigned long * __restrict p1,
	  const unsigned long * __restrict p2,
	  const unsigned long * __restrict p3,
	  const unsigned long * __restrict p4,
	  const unsigned long * __restrict p5)
{
	unsigned long lines = bytes >> 8;

	asm volatile(
#undef BLOCK
#define BLOCK(i)	\
	PF1(i)		\
	PF1(i + 2)	\
	LD(i, 0)	\
	LD(i + 1, 1)	\
	LD(i + 2, 2)	\
	LD(i + 3, 3)	\
	PF2(i)		\
	PF2(i + 2)	\
	XO1(i, 0)	\
	XO1(i + 1, 1)	\
	XO1(i + 2, 2)	\
	XO1(i + 3, 3)	\
	PF3(i)		\
	PF3(i + 2)	\
	XO2(i, 0)	\
	XO2(i + 1, 1)	\
	XO2(i + 2, 2)	\
	XO2(i + 3, 3)	\
	PF4(i)		\
	PF4(i + 2)	\
	PF0(i + 4)	\
	PF0(i + 6)	\
	XO3(i, 0)	\
	XO3(i + 1, 1)	\
	XO3(i + 2, 2)	\
	XO3(i + 3, 3)	\
	XO4(i, 0)	\
	XO4(i + 1, 1)	\
	XO4(i + 2, 2)	\
	XO4(i + 3, 3)	\
	ST(i, 0)	\
	ST(i + 1, 1)	\
	ST(i + 2, 2)	\
	ST(i + 3, 3)	\


	PF0(0)
	PF0(2)

	" .align 32 ;\n"
	" 1: ;\n"

	BLOCK(0)
	BLOCK(4)
	BLOCK(8)
	BLOCK(12)

	" add %[inc], %[p1] ;\n"
	" add %[inc], %[p2] ;\n"
	" add %[inc], %[p3] ;\n"
	" add %[inc], %[p4] ;\n"
	" add %[inc], %[p5] ;\n"
	" dec %[cnt] ;\n"
	" jnz 1b ;\n"
	: [cnt] "+r" (lines), [p1] "+r" (p1), [p2] "+r" (p2),
	  [p3] "+r" (p3), [p4] "+r" (p4), [p5] "+r" (p5)
	: [inc] XOR_CONSTANT_CONSTRAINT (256UL)
	: "memory");
}

static void
xor_sse_5_pf64(unsigned long bytes, unsigned long * __restrict p1,
	       const unsigned long * __restrict p2,
	       const unsigned long * __restrict p3,
	       const unsigned long * __restrict p4,
	       const unsigned long * __restrict p5)
{
	unsigned long lines = bytes >> 8;

	asm volatile(
#undef BLOCK
#define BLOCK(i)		\
	BLK64(PF0, LD, i)	\
	BLK64(PF1, XO1, i)	\
	BLK64(PF2, XO2, i)	\
	BLK64(PF3, XO3, i)	\
	BLK64(PF4, XO4, i)	\
	BLK64(NOP, ST, i)	\

	" .align 32 ;\n"
	" 1: ;\n"

	BLOCK(0)
	BLOCK(4)
	BLOCK(8)
	BLOCK(12)

	" add %[inc], %[p1] ;\n"
	" add %[inc], %[p2] ;\n"
	" add %[inc], %[p3] ;\n"
	" add %[inc], %[p4] ;\n"
	" add %[inc], %[p5] ;\n"
	" dec %[cnt] ;\n"
	" jnz 1b ;\n"
	: [cnt] "+r" (lines), [p1] "+r" (p1), [p2] "+r" (p2),
	  [p3] "+r" (p3), [p4] "+r" (p4), [p5] "+r" (p5)
	: [inc] XOR_CONSTANT_CONSTRAINT (256UL)
	: "memory");
}

DO_XOR_BLOCKS(sse_inner, xor_sse_2, xor_sse_3, xor_sse_4, xor_sse_5);

static void xor_gen_sse(void *dest, void **srcs, unsigned int src_cnt,
			unsigned int bytes)
{
	kernel_fpu_begin();
	xor_gen_sse_inner(dest, srcs, src_cnt, bytes);
	kernel_fpu_end();
}

struct xor_block_template xor_block_sse = {
	.name = "sse",
	.xor_gen = xor_gen_sse,
};

DO_XOR_BLOCKS(sse_pf64_inner, xor_sse_2_pf64, xor_sse_3_pf64, xor_sse_4_pf64,
	      xor_sse_5_pf64);

static void xor_gen_sse_pf64(void *dest, void **srcs, unsigned int src_cnt,
			     unsigned int bytes)
{
	kernel_fpu_begin();
	xor_gen_sse_pf64_inner(dest, srcs, src_cnt, bytes);
	kernel_fpu_end();
}

struct xor_block_template xor_block_sse_pf64 = {
	.name = "prefetch64-sse",
	.xor_gen = xor_gen_sse_pf64,
};
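The PF_OFFS macro above prefetches 256 bytes ahead, i.e. exactly one loop iteration, using prefetchnta so the streamed data bypasses most of the cache hierarchy. As a rough C model of that strategy (illustrative only; __builtin_prefetch with locality hint 0 approximates prefetchnta, and the compiler is free to schedule it differently than the hand-written asm):

/* Sketch: issue non-temporal prefetches one 256-byte iteration ahead. */
static void xor_prefetch_model(unsigned long bytes, unsigned long *p1,
			       const unsigned long *p2)
{
	unsigned long lines = bytes >> 8;	/* 256 bytes per iteration */

	do {
		unsigned int i;

		__builtin_prefetch((char *)p1 + 256, 1, 0);	/* ~PF0 */
		__builtin_prefetch((char *)p2 + 256, 0, 0);	/* ~PF1 */
		for (i = 0; i < 256 / sizeof(unsigned long); i++)
			p1[i] ^= p2[i];
		p1 += 256 / sizeof(unsigned long);
		p2 += 256 / sizeof(unsigned long);
	} while (--lines);
}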
+36
lib/raid/xor/x86/xor_arch.h
/* SPDX-License-Identifier: GPL-2.0-or-later */
#include <asm/cpufeature.h>

extern struct xor_block_template xor_block_pII_mmx;
extern struct xor_block_template xor_block_p5_mmx;
extern struct xor_block_template xor_block_sse;
extern struct xor_block_template xor_block_sse_pf64;
extern struct xor_block_template xor_block_avx;

/*
 * When SSE is available, use it as it can write around L2.  We may also be
 * able to load into the L1 only depending on how the cpu deals with a load
 * to a line that is being prefetched.
 *
 * When AVX2 is available, force using it as it is better by all measures.
 *
 * 32-bit without MMX can fall back to the generic routines.
 */
static __always_inline void __init arch_xor_init(void)
{
	if (boot_cpu_has(X86_FEATURE_AVX) &&
	    boot_cpu_has(X86_FEATURE_OSXSAVE)) {
		xor_force(&xor_block_avx);
	} else if (IS_ENABLED(CONFIG_X86_64) || boot_cpu_has(X86_FEATURE_XMM)) {
		xor_register(&xor_block_sse);
		xor_register(&xor_block_sse_pf64);
	} else if (boot_cpu_has(X86_FEATURE_MMX)) {
		xor_register(&xor_block_pII_mmx);
		xor_register(&xor_block_p5_mmx);
	} else {
		xor_register(&xor_block_8regs);
		xor_register(&xor_block_8regs_p);
		xor_register(&xor_block_32regs);
		xor_register(&xor_block_32regs_p);
	}
}
+267
lib/raid/xor/xor-32regs-prefetch.c
// SPDX-License-Identifier: GPL-2.0-or-later
#include <linux/prefetch.h>
#include "xor_impl.h"

static void
xor_32regs_p_2(unsigned long bytes, unsigned long * __restrict p1,
	       const unsigned long * __restrict p2)
{
	long lines = bytes / (sizeof (long)) / 8 - 1;

	prefetchw(p1);
	prefetch(p2);

	do {
		register long d0, d1, d2, d3, d4, d5, d6, d7;

		prefetchw(p1 + 8);
		prefetch(p2 + 8);
	once_more:
		d0 = p1[0];	/* Pull the stuff into registers	*/
		d1 = p1[1];	/*  ... in bursts, if possible.		*/
		d2 = p1[2];
		d3 = p1[3];
		d4 = p1[4];
		d5 = p1[5];
		d6 = p1[6];
		d7 = p1[7];
		d0 ^= p2[0];
		d1 ^= p2[1];
		d2 ^= p2[2];
		d3 ^= p2[3];
		d4 ^= p2[4];
		d5 ^= p2[5];
		d6 ^= p2[6];
		d7 ^= p2[7];
		p1[0] = d0;	/* Store the result (in bursts)		*/
		p1[1] = d1;
		p1[2] = d2;
		p1[3] = d3;
		p1[4] = d4;
		p1[5] = d5;
		p1[6] = d6;
		p1[7] = d7;
		p1 += 8;
		p2 += 8;
	} while (--lines > 0);
	if (lines == 0)
		goto once_more;
}

static void
xor_32regs_p_3(unsigned long bytes, unsigned long * __restrict p1,
	       const unsigned long * __restrict p2,
	       const unsigned long * __restrict p3)
{
	long lines = bytes / (sizeof (long)) / 8 - 1;

	prefetchw(p1);
	prefetch(p2);
	prefetch(p3);

	do {
		register long d0, d1, d2, d3, d4, d5, d6, d7;

		prefetchw(p1 + 8);
		prefetch(p2 + 8);
		prefetch(p3 + 8);
	once_more:
		d0 = p1[0];	/* Pull the stuff into registers	*/
		d1 = p1[1];	/*  ... in bursts, if possible.		*/
		d2 = p1[2];
		d3 = p1[3];
		d4 = p1[4];
		d5 = p1[5];
		d6 = p1[6];
		d7 = p1[7];
		d0 ^= p2[0];
		d1 ^= p2[1];
		d2 ^= p2[2];
		d3 ^= p2[3];
		d4 ^= p2[4];
		d5 ^= p2[5];
		d6 ^= p2[6];
		d7 ^= p2[7];
		d0 ^= p3[0];
		d1 ^= p3[1];
		d2 ^= p3[2];
		d3 ^= p3[3];
		d4 ^= p3[4];
		d5 ^= p3[5];
		d6 ^= p3[6];
		d7 ^= p3[7];
		p1[0] = d0;	/* Store the result (in bursts)		*/
		p1[1] = d1;
		p1[2] = d2;
		p1[3] = d3;
		p1[4] = d4;
		p1[5] = d5;
		p1[6] = d6;
		p1[7] = d7;
		p1 += 8;
		p2 += 8;
		p3 += 8;
	} while (--lines > 0);
	if (lines == 0)
		goto once_more;
}

static void
xor_32regs_p_4(unsigned long bytes, unsigned long * __restrict p1,
	       const unsigned long * __restrict p2,
	       const unsigned long * __restrict p3,
	       const unsigned long * __restrict p4)
{
	long lines = bytes / (sizeof (long)) / 8 - 1;

	prefetchw(p1);
	prefetch(p2);
	prefetch(p3);
	prefetch(p4);

	do {
		register long d0, d1, d2, d3, d4, d5, d6, d7;

		prefetchw(p1 + 8);
		prefetch(p2 + 8);
		prefetch(p3 + 8);
		prefetch(p4 + 8);
	once_more:
		d0 = p1[0];	/* Pull the stuff into registers	*/
		d1 = p1[1];	/*  ... in bursts, if possible.		*/
		d2 = p1[2];
		d3 = p1[3];
		d4 = p1[4];
		d5 = p1[5];
		d6 = p1[6];
		d7 = p1[7];
		d0 ^= p2[0];
		d1 ^= p2[1];
		d2 ^= p2[2];
		d3 ^= p2[3];
		d4 ^= p2[4];
		d5 ^= p2[5];
		d6 ^= p2[6];
		d7 ^= p2[7];
		d0 ^= p3[0];
		d1 ^= p3[1];
		d2 ^= p3[2];
		d3 ^= p3[3];
		d4 ^= p3[4];
		d5 ^= p3[5];
		d6 ^= p3[6];
		d7 ^= p3[7];
		d0 ^= p4[0];
		d1 ^= p4[1];
		d2 ^= p4[2];
		d3 ^= p4[3];
		d4 ^= p4[4];
		d5 ^= p4[5];
		d6 ^= p4[6];
		d7 ^= p4[7];
		p1[0] = d0;	/* Store the result (in bursts)		*/
		p1[1] = d1;
		p1[2] = d2;
		p1[3] = d3;
		p1[4] = d4;
		p1[5] = d5;
		p1[6] = d6;
		p1[7] = d7;
		p1 += 8;
		p2 += 8;
		p3 += 8;
		p4 += 8;
	} while (--lines > 0);
	if (lines == 0)
		goto once_more;
}

static void
xor_32regs_p_5(unsigned long bytes, unsigned long * __restrict p1,
	       const unsigned long * __restrict p2,
	       const unsigned long * __restrict p3,
	       const unsigned long * __restrict p4,
	       const unsigned long * __restrict p5)
{
	long lines = bytes / (sizeof (long)) / 8 - 1;

	prefetchw(p1);
	prefetch(p2);
	prefetch(p3);
	prefetch(p4);
	prefetch(p5);

	do {
		register long d0, d1, d2, d3, d4, d5, d6, d7;

		prefetchw(p1 + 8);
		prefetch(p2 + 8);
		prefetch(p3 + 8);
		prefetch(p4 + 8);
		prefetch(p5 + 8);
	once_more:
		d0 = p1[0];	/* Pull the stuff into registers	*/
		d1 = p1[1];	/*  ... in bursts, if possible.		*/
		d2 = p1[2];
		d3 = p1[3];
		d4 = p1[4];
		d5 = p1[5];
		d6 = p1[6];
		d7 = p1[7];
		d0 ^= p2[0];
		d1 ^= p2[1];
		d2 ^= p2[2];
		d3 ^= p2[3];
		d4 ^= p2[4];
		d5 ^= p2[5];
		d6 ^= p2[6];
		d7 ^= p2[7];
		d0 ^= p3[0];
		d1 ^= p3[1];
		d2 ^= p3[2];
		d3 ^= p3[3];
		d4 ^= p3[4];
		d5 ^= p3[5];
		d6 ^= p3[6];
		d7 ^= p3[7];
		d0 ^= p4[0];
		d1 ^= p4[1];
		d2 ^= p4[2];
		d3 ^= p4[3];
		d4 ^= p4[4];
		d5 ^= p4[5];
		d6 ^= p4[6];
		d7 ^= p4[7];
		d0 ^= p5[0];
		d1 ^= p5[1];
		d2 ^= p5[2];
		d3 ^= p5[3];
		d4 ^= p5[4];
		d5 ^= p5[5];
		d6 ^= p5[6];
		d7 ^= p5[7];
		p1[0] = d0;	/* Store the result (in bursts)		*/
		p1[1] = d1;
		p1[2] = d2;
		p1[3] = d3;
		p1[4] = d4;
		p1[5] = d5;
		p1[6] = d6;
		p1[7] = d7;
		p1 += 8;
		p2 += 8;
		p3 += 8;
		p4 += 8;
		p5 += 8;
	} while (--lines > 0);
	if (lines == 0)
		goto once_more;
}

DO_XOR_BLOCKS(32regs_p, xor_32regs_p_2, xor_32regs_p_3, xor_32regs_p_4,
	      xor_32regs_p_5);

struct xor_block_template xor_block_32regs_p = {
	.name = "32regs_prefetch",
	.xor_gen = xor_gen_32regs_p,
};
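The "once_more" control flow in these prefetching variants is easy to misread: the loop count is deliberately lines - 1, so the prefetches at the top of the loop run once per block except for the last one, and the final block re-enters past the prefetches via the goto. That way the routine never prefetches beyond the end of the buffers. A minimal standalone illustration of the idiom (hypothetical function, 8-long blocks, prefetch()/prefetchw() being the <linux/prefetch.h> helpers):

static void xor_8longs_prefetched(unsigned long bytes, unsigned long *p1,
				  const unsigned long *p2)
{
	long lines = bytes / sizeof(long) / 8 - 1;
	int i;

	do {
		prefetchw(p1 + 8);	/* runs lines - 1 times only */
		prefetch(p2 + 8);
	once_more:
		for (i = 0; i < 8; i++)	/* runs lines times in total */
			p1[i] ^= p2[i];
		p1 += 8;
		p2 += 8;
	} while (--lines > 0);
	if (lines == 0)			/* last block: skip the prefetches */
		goto once_more;
}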
+217
lib/raid/xor/xor-32regs.c
// SPDX-License-Identifier: GPL-2.0-or-later
#include "xor_impl.h"

static void
xor_32regs_2(unsigned long bytes, unsigned long * __restrict p1,
	     const unsigned long * __restrict p2)
{
	long lines = bytes / (sizeof (long)) / 8;

	do {
		register long d0, d1, d2, d3, d4, d5, d6, d7;
		d0 = p1[0];	/* Pull the stuff into registers	*/
		d1 = p1[1];	/*  ... in bursts, if possible.		*/
		d2 = p1[2];
		d3 = p1[3];
		d4 = p1[4];
		d5 = p1[5];
		d6 = p1[6];
		d7 = p1[7];
		d0 ^= p2[0];
		d1 ^= p2[1];
		d2 ^= p2[2];
		d3 ^= p2[3];
		d4 ^= p2[4];
		d5 ^= p2[5];
		d6 ^= p2[6];
		d7 ^= p2[7];
		p1[0] = d0;	/* Store the result (in bursts)		*/
		p1[1] = d1;
		p1[2] = d2;
		p1[3] = d3;
		p1[4] = d4;
		p1[5] = d5;
		p1[6] = d6;
		p1[7] = d7;
		p1 += 8;
		p2 += 8;
	} while (--lines > 0);
}

static void
xor_32regs_3(unsigned long bytes, unsigned long * __restrict p1,
	     const unsigned long * __restrict p2,
	     const unsigned long * __restrict p3)
{
	long lines = bytes / (sizeof (long)) / 8;

	do {
		register long d0, d1, d2, d3, d4, d5, d6, d7;
		d0 = p1[0];	/* Pull the stuff into registers	*/
		d1 = p1[1];	/*  ... in bursts, if possible.		*/
		d2 = p1[2];
		d3 = p1[3];
		d4 = p1[4];
		d5 = p1[5];
		d6 = p1[6];
		d7 = p1[7];
		d0 ^= p2[0];
		d1 ^= p2[1];
		d2 ^= p2[2];
		d3 ^= p2[3];
		d4 ^= p2[4];
		d5 ^= p2[5];
		d6 ^= p2[6];
		d7 ^= p2[7];
		d0 ^= p3[0];
		d1 ^= p3[1];
		d2 ^= p3[2];
		d3 ^= p3[3];
		d4 ^= p3[4];
		d5 ^= p3[5];
		d6 ^= p3[6];
		d7 ^= p3[7];
		p1[0] = d0;	/* Store the result (in bursts)		*/
		p1[1] = d1;
		p1[2] = d2;
		p1[3] = d3;
		p1[4] = d4;
		p1[5] = d5;
		p1[6] = d6;
		p1[7] = d7;
		p1 += 8;
		p2 += 8;
		p3 += 8;
	} while (--lines > 0);
}

static void
xor_32regs_4(unsigned long bytes, unsigned long * __restrict p1,
	     const unsigned long * __restrict p2,
	     const unsigned long * __restrict p3,
	     const unsigned long * __restrict p4)
{
	long lines = bytes / (sizeof (long)) / 8;

	do {
		register long d0, d1, d2, d3, d4, d5, d6, d7;
		d0 = p1[0];	/* Pull the stuff into registers	*/
		d1 = p1[1];	/*  ... in bursts, if possible.		*/
		d2 = p1[2];
		d3 = p1[3];
		d4 = p1[4];
		d5 = p1[5];
		d6 = p1[6];
		d7 = p1[7];
		d0 ^= p2[0];
		d1 ^= p2[1];
		d2 ^= p2[2];
		d3 ^= p2[3];
		d4 ^= p2[4];
		d5 ^= p2[5];
		d6 ^= p2[6];
		d7 ^= p2[7];
		d0 ^= p3[0];
		d1 ^= p3[1];
		d2 ^= p3[2];
		d3 ^= p3[3];
		d4 ^= p3[4];
		d5 ^= p3[5];
		d6 ^= p3[6];
		d7 ^= p3[7];
		d0 ^= p4[0];
		d1 ^= p4[1];
		d2 ^= p4[2];
		d3 ^= p4[3];
		d4 ^= p4[4];
		d5 ^= p4[5];
		d6 ^= p4[6];
		d7 ^= p4[7];
		p1[0] = d0;	/* Store the result (in bursts)		*/
		p1[1] = d1;
		p1[2] = d2;
		p1[3] = d3;
		p1[4] = d4;
		p1[5] = d5;
		p1[6] = d6;
		p1[7] = d7;
		p1 += 8;
		p2 += 8;
		p3 += 8;
		p4 += 8;
	} while (--lines > 0);
}

static void
xor_32regs_5(unsigned long bytes, unsigned long * __restrict p1,
	     const unsigned long * __restrict p2,
	     const unsigned long * __restrict p3,
	     const unsigned long * __restrict p4,
	     const unsigned long * __restrict p5)
{
	long lines = bytes / (sizeof (long)) / 8;

	do {
		register long d0, d1, d2, d3, d4, d5, d6, d7;
		d0 = p1[0];	/* Pull the stuff into registers	*/
		d1 = p1[1];	/*  ... in bursts, if possible.		*/
		d2 = p1[2];
		d3 = p1[3];
		d4 = p1[4];
		d5 = p1[5];
		d6 = p1[6];
		d7 = p1[7];
		d0 ^= p2[0];
		d1 ^= p2[1];
		d2 ^= p2[2];
		d3 ^= p2[3];
		d4 ^= p2[4];
		d5 ^= p2[5];
		d6 ^= p2[6];
		d7 ^= p2[7];
		d0 ^= p3[0];
		d1 ^= p3[1];
		d2 ^= p3[2];
		d3 ^= p3[3];
		d4 ^= p3[4];
		d5 ^= p3[5];
		d6 ^= p3[6];
		d7 ^= p3[7];
		d0 ^= p4[0];
		d1 ^= p4[1];
		d2 ^= p4[2];
		d3 ^= p4[3];
		d4 ^= p4[4];
		d5 ^= p4[5];
		d6 ^= p4[6];
		d7 ^= p4[7];
		d0 ^= p5[0];
		d1 ^= p5[1];
		d2 ^= p5[2];
		d3 ^= p5[3];
		d4 ^= p5[4];
		d5 ^= p5[5];
		d6 ^= p5[6];
		d7 ^= p5[7];
		p1[0] = d0;	/* Store the result (in bursts)		*/
		p1[1] = d1;
		p1[2] = d2;
		p1[3] = d3;
		p1[4] = d4;
		p1[5] = d5;
		p1[6] = d6;
		p1[7] = d7;
		p1 += 8;
		p2 += 8;
		p3 += 8;
		p4 += 8;
		p5 += 8;
	} while (--lines > 0);
}

DO_XOR_BLOCKS(32regs, xor_32regs_2, xor_32regs_3, xor_32regs_4, xor_32regs_5);

struct xor_block_template xor_block_32regs = {
	.name = "32regs",
	.xor_gen = xor_gen_32regs,
};
+146
lib/raid/xor/xor-8regs-prefetch.c
// SPDX-License-Identifier: GPL-2.0-or-later
#include <linux/prefetch.h>
#include "xor_impl.h"

static void
xor_8regs_p_2(unsigned long bytes, unsigned long * __restrict p1,
	      const unsigned long * __restrict p2)
{
	long lines = bytes / (sizeof (long)) / 8 - 1;

	prefetchw(p1);
	prefetch(p2);

	do {
		prefetchw(p1 + 8);
		prefetch(p2 + 8);
	once_more:
		p1[0] ^= p2[0];
		p1[1] ^= p2[1];
		p1[2] ^= p2[2];
		p1[3] ^= p2[3];
		p1[4] ^= p2[4];
		p1[5] ^= p2[5];
		p1[6] ^= p2[6];
		p1[7] ^= p2[7];
		p1 += 8;
		p2 += 8;
	} while (--lines > 0);
	if (lines == 0)
		goto once_more;
}

static void
xor_8regs_p_3(unsigned long bytes, unsigned long * __restrict p1,
	      const unsigned long * __restrict p2,
	      const unsigned long * __restrict p3)
{
	long lines = bytes / (sizeof (long)) / 8 - 1;

	prefetchw(p1);
	prefetch(p2);
	prefetch(p3);

	do {
		prefetchw(p1 + 8);
		prefetch(p2 + 8);
		prefetch(p3 + 8);
	once_more:
		p1[0] ^= p2[0] ^ p3[0];
		p1[1] ^= p2[1] ^ p3[1];
		p1[2] ^= p2[2] ^ p3[2];
		p1[3] ^= p2[3] ^ p3[3];
		p1[4] ^= p2[4] ^ p3[4];
		p1[5] ^= p2[5] ^ p3[5];
		p1[6] ^= p2[6] ^ p3[6];
		p1[7] ^= p2[7] ^ p3[7];
		p1 += 8;
		p2 += 8;
		p3 += 8;
	} while (--lines > 0);
	if (lines == 0)
		goto once_more;
}

static void
xor_8regs_p_4(unsigned long bytes, unsigned long * __restrict p1,
	      const unsigned long * __restrict p2,
	      const unsigned long * __restrict p3,
	      const unsigned long * __restrict p4)
{
	long lines = bytes / (sizeof (long)) / 8 - 1;

	prefetchw(p1);
	prefetch(p2);
	prefetch(p3);
	prefetch(p4);

	do {
		prefetchw(p1 + 8);
		prefetch(p2 + 8);
		prefetch(p3 + 8);
		prefetch(p4 + 8);
	once_more:
		p1[0] ^= p2[0] ^ p3[0] ^ p4[0];
		p1[1] ^= p2[1] ^ p3[1] ^ p4[1];
		p1[2] ^= p2[2] ^ p3[2] ^ p4[2];
		p1[3] ^= p2[3] ^ p3[3] ^ p4[3];
		p1[4] ^= p2[4] ^ p3[4] ^ p4[4];
		p1[5] ^= p2[5] ^ p3[5] ^ p4[5];
		p1[6] ^= p2[6] ^ p3[6] ^ p4[6];
		p1[7] ^= p2[7] ^ p3[7] ^ p4[7];
		p1 += 8;
		p2 += 8;
		p3 += 8;
		p4 += 8;
	} while (--lines > 0);
	if (lines == 0)
		goto once_more;
}

static void
xor_8regs_p_5(unsigned long bytes, unsigned long * __restrict p1,
	      const unsigned long * __restrict p2,
	      const unsigned long * __restrict p3,
	      const unsigned long * __restrict p4,
	      const unsigned long * __restrict p5)
{
	long lines = bytes / (sizeof (long)) / 8 - 1;

	prefetchw(p1);
	prefetch(p2);
	prefetch(p3);
	prefetch(p4);
	prefetch(p5);

	do {
		prefetchw(p1 + 8);
		prefetch(p2 + 8);
		prefetch(p3 + 8);
		prefetch(p4 + 8);
		prefetch(p5 + 8);
	once_more:
		p1[0] ^= p2[0] ^ p3[0] ^ p4[0] ^ p5[0];
		p1[1] ^= p2[1] ^ p3[1] ^ p4[1] ^ p5[1];
		p1[2] ^= p2[2] ^ p3[2] ^ p4[2] ^ p5[2];
		p1[3] ^= p2[3] ^ p3[3] ^ p4[3] ^ p5[3];
		p1[4] ^= p2[4] ^ p3[4] ^ p4[4] ^ p5[4];
		p1[5] ^= p2[5] ^ p3[5] ^ p4[5] ^ p5[5];
		p1[6] ^= p2[6] ^ p3[6] ^ p4[6] ^ p5[6];
		p1[7] ^= p2[7] ^ p3[7] ^ p4[7] ^ p5[7];
		p1 += 8;
		p2 += 8;
		p3 += 8;
		p4 += 8;
		p5 += 8;
	} while (--lines > 0);
	if (lines == 0)
		goto once_more;
}

DO_XOR_BLOCKS(8regs_p, xor_8regs_p_2, xor_8regs_p_3, xor_8regs_p_4,
	      xor_8regs_p_5);

struct xor_block_template xor_block_8regs_p = {
	.name = "8regs_prefetch",
	.xor_gen = xor_gen_8regs_p,
};
+103
lib/raid/xor/xor-8regs.c
// SPDX-License-Identifier: GPL-2.0-or-later
#include "xor_impl.h"

static void
xor_8regs_2(unsigned long bytes, unsigned long * __restrict p1,
	    const unsigned long * __restrict p2)
{
	long lines = bytes / (sizeof (long)) / 8;

	do {
		p1[0] ^= p2[0];
		p1[1] ^= p2[1];
		p1[2] ^= p2[2];
		p1[3] ^= p2[3];
		p1[4] ^= p2[4];
		p1[5] ^= p2[5];
		p1[6] ^= p2[6];
		p1[7] ^= p2[7];
		p1 += 8;
		p2 += 8;
	} while (--lines > 0);
}

static void
xor_8regs_3(unsigned long bytes, unsigned long * __restrict p1,
	    const unsigned long * __restrict p2,
	    const unsigned long * __restrict p3)
{
	long lines = bytes / (sizeof (long)) / 8;

	do {
		p1[0] ^= p2[0] ^ p3[0];
		p1[1] ^= p2[1] ^ p3[1];
		p1[2] ^= p2[2] ^ p3[2];
		p1[3] ^= p2[3] ^ p3[3];
		p1[4] ^= p2[4] ^ p3[4];
		p1[5] ^= p2[5] ^ p3[5];
		p1[6] ^= p2[6] ^ p3[6];
		p1[7] ^= p2[7] ^ p3[7];
		p1 += 8;
		p2 += 8;
		p3 += 8;
	} while (--lines > 0);
}

static void
xor_8regs_4(unsigned long bytes, unsigned long * __restrict p1,
	    const unsigned long * __restrict p2,
	    const unsigned long * __restrict p3,
	    const unsigned long * __restrict p4)
{
	long lines = bytes / (sizeof (long)) / 8;

	do {
		p1[0] ^= p2[0] ^ p3[0] ^ p4[0];
		p1[1] ^= p2[1] ^ p3[1] ^ p4[1];
		p1[2] ^= p2[2] ^ p3[2] ^ p4[2];
		p1[3] ^= p2[3] ^ p3[3] ^ p4[3];
		p1[4] ^= p2[4] ^ p3[4] ^ p4[4];
		p1[5] ^= p2[5] ^ p3[5] ^ p4[5];
		p1[6] ^= p2[6] ^ p3[6] ^ p4[6];
		p1[7] ^= p2[7] ^ p3[7] ^ p4[7];
		p1 += 8;
		p2 += 8;
		p3 += 8;
		p4 += 8;
	} while (--lines > 0);
}

static void
xor_8regs_5(unsigned long bytes, unsigned long * __restrict p1,
	    const unsigned long * __restrict p2,
	    const unsigned long * __restrict p3,
	    const unsigned long * __restrict p4,
	    const unsigned long * __restrict p5)
{
	long lines = bytes / (sizeof (long)) / 8;

	do {
		p1[0] ^= p2[0] ^ p3[0] ^ p4[0] ^ p5[0];
		p1[1] ^= p2[1] ^ p3[1] ^ p4[1] ^ p5[1];
		p1[2] ^= p2[2] ^ p3[2] ^ p4[2] ^ p5[2];
		p1[3] ^= p2[3] ^ p3[3] ^ p4[3] ^ p5[3];
		p1[4] ^= p2[4] ^ p3[4] ^ p4[4] ^ p5[4];
		p1[5] ^= p2[5] ^ p3[5] ^ p4[5] ^ p5[5];
		p1[6] ^= p2[6] ^ p3[6] ^ p4[6] ^ p5[6];
		p1[7] ^= p2[7] ^ p3[7] ^ p4[7] ^ p5[7];
		p1 += 8;
		p2 += 8;
		p3 += 8;
		p4 += 8;
		p5 += 8;
	} while (--lines > 0);
}

#ifndef NO_TEMPLATE
DO_XOR_BLOCKS(8regs, xor_8regs_2, xor_8regs_3, xor_8regs_4, xor_8regs_5);

struct xor_block_template xor_block_8regs = {
	.name = "8regs",
	.xor_gen = xor_gen_8regs,
};
#endif /* NO_TEMPLATE */
+193
lib/raid/xor/xor-core.c
// SPDX-License-Identifier: GPL-2.0-or-later
/*
 * Copyright (C) 1996, 1997, 1998, 1999, 2000,
 * Ingo Molnar, Matti Aarnio, Jakub Jelinek, Richard Henderson.
 *
 * Dispatch optimized XOR parity functions.
 */

#include <linux/module.h>
#include <linux/gfp.h>
#include <linux/raid/xor.h>
#include <linux/jiffies.h>
#include <linux/preempt.h>
#include <linux/static_call.h>
#include "xor_impl.h"

DEFINE_STATIC_CALL_NULL(xor_gen_impl, *xor_block_8regs.xor_gen);

/**
 * xor_gen - generate RAID-style XOR information
 * @dest: destination vector
 * @srcs: source vectors
 * @src_cnt: number of source vectors
 * @bytes: length in bytes of each vector
 *
 * Performs bit-wise XOR operation into @dest for each of the @src_cnt vectors
 * in @srcs for a length of @bytes bytes.  @src_cnt must be non-zero, and the
 * memory pointed to by @dest and each member of @srcs must be at least 64-byte
 * aligned.  @bytes must be non-zero and a multiple of 512.
 *
 * Note: for typical RAID uses, @dest either needs to be zeroed, or filled with
 * the first disk, which then needs to be removed from @srcs.
 */
void xor_gen(void *dest, void **srcs, unsigned int src_cnt, unsigned int bytes)
{
	WARN_ON_ONCE(!in_task() || irqs_disabled() || softirq_count());
	WARN_ON_ONCE(bytes == 0);
	WARN_ON_ONCE(bytes & 511);

	static_call(xor_gen_impl)(dest, srcs, src_cnt, bytes);
}
EXPORT_SYMBOL(xor_gen);

/* Set of all registered templates. */
static struct xor_block_template *__initdata template_list;
static struct xor_block_template *forced_template;

/**
 * xor_register - register a XOR template
 * @tmpl: template to register
 *
 * Register a XOR implementation with the core.  Registered implementations
 * will be measured by a trivial benchmark, and the fastest one is chosen
 * unless an implementation is forced using xor_force().
 */
void __init xor_register(struct xor_block_template *tmpl)
{
	tmpl->next = template_list;
	template_list = tmpl;
}

/**
 * xor_force - force use of a XOR template
 * @tmpl: template to register
 *
 * Register a XOR implementation with the core and force using it.  Forcing
 * an implementation will make the core ignore any template registered using
 * xor_register(), or any previous implementation forced using xor_force().
 */
void __init xor_force(struct xor_block_template *tmpl)
{
	forced_template = tmpl;
}

#define BENCH_SIZE	4096
#define REPS		800U

static void __init
do_xor_speed(struct xor_block_template *tmpl, void *b1, void *b2)
{
	int speed;
	unsigned long reps;
	ktime_t min, start, t0;
	void *srcs[1] = { b2 };

	preempt_disable();

	reps = 0;
	t0 = ktime_get();
	/* delay start until time has advanced */
	while ((start = ktime_get()) == t0)
		cpu_relax();
	do {
		mb(); /* prevent loop optimization */
		tmpl->xor_gen(b1, srcs, 1, BENCH_SIZE);
		mb();
	} while (reps++ < REPS || (t0 = ktime_get()) == start);
	min = ktime_sub(t0, start);

	preempt_enable();

	// bytes/ns == GB/s, multiply by 1000 to get MB/s [not MiB/s]
	speed = (1000 * reps * BENCH_SIZE) / (unsigned int)ktime_to_ns(min);
	tmpl->speed = speed;

	pr_info("   %-16s: %5d MB/sec\n", tmpl->name, speed);
}

static int __init calibrate_xor_blocks(void)
{
	void *b1, *b2;
	struct xor_block_template *f, *fastest;

	if (forced_template)
		return 0;

	b1 = (void *) __get_free_pages(GFP_KERNEL, 2);
	if (!b1) {
		pr_warn("xor: Yikes!  No memory available.\n");
		return -ENOMEM;
	}
	b2 = b1 + 2*PAGE_SIZE + BENCH_SIZE;

	pr_info("xor: measuring software checksum speed\n");
	fastest = template_list;
	for (f = template_list; f; f = f->next) {
		do_xor_speed(f, b1, b2);
		if (f->speed > fastest->speed)
			fastest = f;
	}
	static_call_update(xor_gen_impl, fastest->xor_gen);
	pr_info("xor: using function: %s (%d MB/sec)\n",
		fastest->name, fastest->speed);

	free_pages((unsigned long)b1, 2);
	return 0;
}

#ifdef CONFIG_XOR_BLOCKS_ARCH
#include "xor_arch.h"	/* $SRCARCH/xor_arch.h */
#else
static void __init arch_xor_init(void)
{
	xor_register(&xor_block_8regs);
	xor_register(&xor_block_8regs_p);
	xor_register(&xor_block_32regs);
	xor_register(&xor_block_32regs_p);
}
#endif /* CONFIG_XOR_BLOCKS_ARCH */

static int __init xor_init(void)
{
	arch_xor_init();

	/*
	 * If this arch/cpu has a short-circuited selection, don't loop through
	 * all the possible functions, just use the best one.
	 */
	if (forced_template) {
		pr_info("xor: automatically using best checksumming function %-10s\n",
			forced_template->name);
		static_call_update(xor_gen_impl, forced_template->xor_gen);
		return 0;
	}

#ifdef MODULE
	return calibrate_xor_blocks();
#else
	/*
	 * Pick the first template as the temporary default until calibration
	 * happens.
	 */
	static_call_update(xor_gen_impl, template_list->xor_gen);
	return 0;
#endif
}

static __exit void xor_exit(void)
{
}

MODULE_DESCRIPTION("RAID-5 checksumming functions");
MODULE_LICENSE("GPL");

/*
 * When built-in we must register the default template before md, but we don't
 * want calibration to run that early as that would delay the boot process.
 */
#ifndef MODULE
__initcall(calibrate_xor_blocks);
#endif
core_initcall(xor_init);
module_exit(xor_exit);
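As a usage sketch for the new entry point (hypothetical caller, not part of the patch), computing RAID5-style parity over n data buffers follows directly from the xor_gen() kernel-doc above: seed the destination with the first source, then XOR in the rest. The buffers must satisfy xor_gen()'s documented contract (64-byte alignment, length a non-zero multiple of 512):

#include <linux/raid/xor.h>
#include <linux/string.h>

/* parity = bufs[0] ^ bufs[1] ^ ... ^ bufs[n - 1] */
static void compute_parity(void *parity, void **bufs, unsigned int n,
			   unsigned int bytes)
{
	memcpy(parity, bufs[0], bytes);		/* seed with first disk */
	if (n > 1)
		xor_gen(parity, bufs + 1, n - 1, bytes);
}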
+56
lib/raid/xor/xor_impl.h
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _XOR_IMPL_H
#define _XOR_IMPL_H

#include <linux/init.h>
#include <linux/minmax.h>

struct xor_block_template {
	struct xor_block_template *next;
	const char *name;
	int speed;
	void (*xor_gen)(void *dest, void **srcs, unsigned int src_cnt,
			unsigned int bytes);
};

#define __DO_XOR_BLOCKS(_name, _handle1, _handle2, _handle3, _handle4)	\
void									\
xor_gen_##_name(void *dest, void **srcs, unsigned int src_cnt,		\
		unsigned int bytes)					\
{									\
	unsigned int src_off = 0;					\
									\
	while (src_cnt > 0) {						\
		unsigned int this_cnt = min(src_cnt, 4);		\
									\
		if (this_cnt == 1)					\
			_handle1(bytes, dest, srcs[src_off]);		\
		else if (this_cnt == 2)					\
			_handle2(bytes, dest, srcs[src_off],		\
				 srcs[src_off + 1]);			\
		else if (this_cnt == 3)					\
			_handle3(bytes, dest, srcs[src_off],		\
				 srcs[src_off + 1], srcs[src_off + 2]);	\
		else							\
			_handle4(bytes, dest, srcs[src_off],		\
				 srcs[src_off + 1], srcs[src_off + 2],	\
				 srcs[src_off + 3]);			\
									\
		src_cnt -= this_cnt;					\
		src_off += this_cnt;					\
	}								\
}

#define DO_XOR_BLOCKS(_name, _handle1, _handle2, _handle3, _handle4)	\
	static __DO_XOR_BLOCKS(_name, _handle1, _handle2, _handle3, _handle4)

/* generic implementations */
extern struct xor_block_template xor_block_8regs;
extern struct xor_block_template xor_block_32regs;
extern struct xor_block_template xor_block_8regs_p;
extern struct xor_block_template xor_block_32regs_p;

void __init xor_register(struct xor_block_template *tmpl);
void __init xor_force(struct xor_block_template *tmpl);

#endif /* _XOR_IMPL_H */
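The chunking in DO_XOR_BLOCKS is correct for arbitrarily many sources because dest doubles as the accumulator between passes. A worked expansion of what the generated dispatcher does (illustrative walk-through, using the 8regs instantiation as an example):

/*
 * Example: xor_gen_8regs(d, s, 7, n) with seven sources decomposes into
 * two calls, since min(src_cnt, 4) caps each pass at four sources:
 *
 *	xor_8regs_5(n, d, s[0], s[1], s[2], s[3]);	// d ^= s0..s3
 *	xor_8regs_4(n, d, s[4], s[5], s[6]);		// d ^= s4..s6
 *
 * (d ^ s0 ^ s1 ^ s2 ^ s3) ^ s4 ^ s5 ^ s6 equals the full seven-way XOR,
 * so the result is independent of the chunking.
 */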
+4 -4
lib/scatterlist.c
 	size_t len, off;

 	/* We decant the page list into the tail of the scatterlist */
-	pages = (void *)sgtable->sgl +
-		array_size(sg_max, sizeof(struct scatterlist));
+	pages = (void *)sg + array_size(sg_max, sizeof(struct scatterlist));
 	pages -= sg_max;

 	do {
···
 		else
 			page = virt_to_page((void *)kaddr);

-		sg_set_page(sg, page, len, off);
+		sg_set_page(sg, page, seg, off);
 		sgtable->nents++;
 		sg++;
 		sg_max--;
···
 			kaddr += PAGE_SIZE;
 			off = 0;
 		} while (len > 0 && sg_max > 0);
+		ret -= len;

 		if (maxsize <= 0 || sg_max == 0)
 			break;
···
 			  struct sg_table *sgtable, unsigned int sg_max,
 			  iov_iter_extraction_t extraction_flags)
 {
-	if (maxsize == 0)
+	if (maxsize == 0 || sg_max == 0)
 		return 0;

 	switch (iov_iter_type(iter)) {
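The first hunk matters because extract_iter_to_sg() borrows the unused tail of the scatterlist array as temporary page-pointer storage, and the cursor sg may already have advanced past sgtable->sgl on entry. The scratch area therefore has to be computed from sg, not from the start of the table. A sketch of the intended layout, assuming the invariant that sg_max counts the entries still free starting at sg:

/*
 * Illustrative layout (not code from the patch):
 *
 *   sgtable->sgl              sg                  sg + sg_max
 *        |--- filled entries ---|---- free entries ----|
 *                                   pages[] scratch overlays this tail
 */
struct page **pages = (void *)sg +
		      array_size(sg_max, sizeof(struct scatterlist));
pages -= sg_max;	/* pages[0 .. sg_max - 1] now fits in the tail */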
+217 -7
lib/tests/kunit_iov_iter.c
 #include <linux/uio.h>
 #include <linux/bvec.h>
 #include <linux/folio_queue.h>
+#include <linux/scatterlist.h>
+#include <linux/minmax.h>
+#include <linux/mman.h>
 #include <kunit/test.h>

 MODULE_DESCRIPTION("iov_iter testing");
···
 static inline u8 pattern(unsigned long x)
 {
-	return x & 0xff;
+	return (u8)x + (u8)(x >> 8) + (u8)(x >> 16);
 }

 static void iov_kunit_unmap(void *data)
 {
-	vunmap(data);
+	vfree(data);
 }

 static void *__init iov_kunit_create_buffer(struct kunit *test,
···
 	struct page **pages;
 	unsigned long got;
 	void *buffer;
+	unsigned int i;

-	pages = kunit_kcalloc(test, npages, sizeof(struct page *), GFP_KERNEL);
-	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, pages);
+	pages = kzalloc_objs(struct page *, npages, GFP_KERNEL);
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, pages);
 	*ppages = pages;

 	got = alloc_pages_bulk(GFP_KERNEL, npages, pages);
 	if (got != npages) {
 		release_pages(pages, got);
+		kvfree(pages);
 		KUNIT_ASSERT_EQ(test, got, npages);
 	}
+	/* Make sure that we don't get a physically contiguous buffer. */
+	for (i = 0; i < npages / 4; ++i)
+		swap(pages[i], pages[i + npages / 2]);

 	buffer = vmap(pages, npages, VM_MAP | VM_MAP_PUT_PAGES, PAGE_KERNEL);
+	if (buffer == NULL) {
+		release_pages(pages, got);
+		kvfree(pages);
+	}
 	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, buffer);

 	kunit_add_action_or_reset(test, iov_kunit_unmap, buffer);
···
 	for (folioq = data; folioq; folioq = next) {
 		next = folioq->next;
-		for (int i = 0; i < folioq_nr_slots(folioq); i++)
-			if (folioq_folio(folioq, i))
-				folio_put(folioq_folio(folioq, i));
 		kfree(folioq);
 	}
 }
···
 	KUNIT_SUCCEED(test);
 }

+struct iov_kunit_iter_to_sg_data {
+	struct sg_table *sgt;
+	u8 *buffer, *scratch;
+	u8 __user *ubuf;
+	struct page **pages;
+	size_t npages;
+};
+
+static void __init
+iov_kunit_iter_unpin_sgt(void *data)
+{
+	struct sg_table *sgt = data;
+
+	for (unsigned int i = 0; i < sgt->nents; ++i)
+		unpin_user_page(sg_page(&sgt->sgl[i]));
+}
+
+static void __init
+iov_kunit_iter_to_sg_init(struct kunit *test, size_t bufsize, bool user,
+			  struct iov_kunit_iter_to_sg_data *data)
+{
+	struct page **spages;
+	struct scatterlist *sg;
+	unsigned long uaddr;
+	size_t i;
+
+	data->npages = bufsize / PAGE_SIZE;
+	sg = kunit_kmalloc_array(test, data->npages, sizeof(*sg), GFP_KERNEL);
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, sg);
+	sg_init_table(sg, data->npages);
+	data->sgt = kunit_kzalloc(test, sizeof(*data->sgt), GFP_KERNEL);
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, data->sgt);
+	data->sgt->orig_nents = 0;
+	data->sgt->sgl = sg;
+
+	data->buffer = NULL;
+	data->ubuf = NULL;
+	if (user) {
+		uaddr = kunit_vm_mmap(test, NULL, 0, bufsize,
+				      PROT_READ | PROT_WRITE,
+				      MAP_ANONYMOUS | MAP_PRIVATE, 0);
+		KUNIT_ASSERT_NE(test, uaddr, 0);
+		data->ubuf = (u8 __user *)uaddr;
+		for (i = 0; i < bufsize; ++i)
+			put_user(pattern(i), data->ubuf + i);
+	} else {
+		data->buffer = iov_kunit_create_buffer(test, &data->pages,
+						       data->npages);
+		for (i = 0; i < bufsize; ++i)
+			data->buffer[i] = pattern(i);
+	}
+	data->scratch = iov_kunit_create_buffer(test, &spages, data->npages);
+	memset(data->scratch, 0, bufsize);
+}
+
+static void __init
+iov_kunit_iter_to_sg_check(struct kunit *test, struct iov_iter *iter,
+			   size_t bufsize,
+			   struct iov_kunit_iter_to_sg_data *data)
+{
+	static const size_t tail = 16 * PAGE_SIZE;
+	size_t i;
+
+	KUNIT_ASSERT_LT(test, tail, bufsize);
+
+	if (iov_iter_extract_will_pin(iter))
+		kunit_add_action_or_reset(test, iov_kunit_iter_unpin_sgt,
+					  data->sgt);
+
+	i = extract_iter_to_sg(iter, bufsize, data->sgt, 0, 0);
+	KUNIT_ASSERT_EQ(test, i, 0);
+	KUNIT_ASSERT_EQ(test, data->sgt->nents, 0);
+
+	i = extract_iter_to_sg(iter, bufsize - tail, data->sgt, 1, 0);
+	KUNIT_ASSERT_LE(test, i, bufsize - tail);
+	KUNIT_ASSERT_EQ(test, data->sgt->nents, 1);
+
+	i += extract_iter_to_sg(iter, bufsize - tail - i, data->sgt,
+				data->npages - data->sgt->nents, 0);
+	KUNIT_ASSERT_EQ(test, i, bufsize - tail);
+	KUNIT_ASSERT_LE(test, data->sgt->nents, data->npages);
+
+	i += extract_iter_to_sg(iter, tail, data->sgt,
+				data->npages - data->sgt->nents, 0);
+	KUNIT_ASSERT_EQ(test, i, bufsize);
+	KUNIT_ASSERT_LE(test, data->sgt->nents, data->npages);
+
+	sg_mark_end(&data->sgt->sgl[data->sgt->nents - 1]);
+
+	i = sg_copy_to_buffer(data->sgt->sgl, data->sgt->nents,
+			      data->scratch, bufsize);
+	KUNIT_ASSERT_EQ(test, i, bufsize);
+
+	for (i = 0; i < bufsize; ++i) {
+		KUNIT_EXPECT_EQ_MSG(test, data->scratch[i], pattern(i),
+				    "at i=%zx", i);
+		if (data->scratch[i] != pattern(i))
+			break;
+	}
+
+	KUNIT_EXPECT_EQ(test, i, bufsize);
+}
+
+static void __init iov_kunit_iter_to_sg_kvec(struct kunit *test)
+{
+	struct iov_kunit_iter_to_sg_data data;
+	struct iov_iter iter;
+	struct kvec kvec;
+	size_t bufsize;
+
+	bufsize = 0x100000;
+	iov_kunit_iter_to_sg_init(test, bufsize, false, &data);
+
+	kvec.iov_base = data.buffer;
+	kvec.iov_len = bufsize;
+	iov_iter_kvec(&iter, READ, &kvec, 1, bufsize);
+
+	iov_kunit_iter_to_sg_check(test, &iter, bufsize, &data);
+}
+
+static void __init iov_kunit_iter_to_sg_bvec(struct kunit *test)
+{
+	struct iov_kunit_iter_to_sg_data data;
+	struct page *p, *can_merge = NULL;
+	size_t i, k, bufsize;
+	struct bio_vec *bvec;
+	struct iov_iter iter;
+
+	bufsize = 0x100000;
+	iov_kunit_iter_to_sg_init(test, bufsize, false, &data);
+
+	bvec = kunit_kmalloc_array(test, data.npages, sizeof(*bvec),
+				   GFP_KERNEL);
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, bvec);
+	k = 0;
+	for (i = 0; i < data.npages; ++i) {
+		p = data.pages[i];
+		if (p == can_merge)
+			bvec[k - 1].bv_len += PAGE_SIZE;
+		else
+			bvec_set_page(&bvec[k++], p, PAGE_SIZE, 0);
+		can_merge = p + 1;
+	}
+	iov_iter_bvec(&iter, READ, bvec, k, bufsize);
+
+	iov_kunit_iter_to_sg_check(test, &iter, bufsize, &data);
+}
+
+static void __init iov_kunit_iter_to_sg_folioq(struct kunit *test)
+{
+	struct iov_kunit_iter_to_sg_data data;
+	struct folio_queue *folioq;
+	struct iov_iter iter;
+	size_t bufsize;
+
+	bufsize = 0x100000;
+	iov_kunit_iter_to_sg_init(test, bufsize, false, &data);
+
+	folioq = iov_kunit_create_folioq(test);
+	iov_kunit_load_folioq(test, &iter, READ, folioq, data.pages,
+			      data.npages);
+
+	iov_kunit_iter_to_sg_check(test, &iter, bufsize, &data);
+}
+
+static void __init iov_kunit_iter_to_sg_xarray(struct kunit *test)
+{
+	struct iov_kunit_iter_to_sg_data data;
+	struct xarray *xarray;
+	struct iov_iter iter;
+	size_t bufsize;
+
+	bufsize = 0x100000;
+	iov_kunit_iter_to_sg_init(test, bufsize, false, &data);
+
+	xarray = iov_kunit_create_xarray(test);
+	iov_kunit_load_xarray(test, &iter, READ, xarray, data.pages,
+			      data.npages);
+
+	iov_kunit_iter_to_sg_check(test, &iter, bufsize, &data);
+}
+
+static void __init iov_kunit_iter_to_sg_ubuf(struct kunit *test)
+{
+	struct iov_kunit_iter_to_sg_data data;
+	struct iov_iter iter;
+	size_t bufsize;
+
+	bufsize = 0x100000;
+	iov_kunit_iter_to_sg_init(test, bufsize, true, &data);
+
+	iov_iter_ubuf(&iter, READ, data.ubuf, bufsize);
+
+	iov_kunit_iter_to_sg_check(test, &iter, bufsize, &data);
+}
+
 static struct kunit_case __refdata iov_kunit_cases[] = {
 	KUNIT_CASE(iov_kunit_copy_to_kvec),
 	KUNIT_CASE(iov_kunit_copy_from_kvec),
···
 	KUNIT_CASE(iov_kunit_extract_pages_bvec),
 	KUNIT_CASE(iov_kunit_extract_pages_folioq),
 	KUNIT_CASE(iov_kunit_extract_pages_xarray),
+	KUNIT_CASE(iov_kunit_iter_to_sg_kvec),
+	KUNIT_CASE(iov_kunit_iter_to_sg_bvec),
+	KUNIT_CASE(iov_kunit_iter_to_sg_folioq),
+	KUNIT_CASE(iov_kunit_iter_to_sg_xarray),
+	KUNIT_CASE(iov_kunit_iter_to_sg_ubuf),
 	{}
 };
+16 -2
lib/ts_bm.c
··· 163 163 struct ts_config *conf; 164 164 struct ts_bm *bm; 165 165 int i; 166 - unsigned int prefix_tbl_len = len * sizeof(unsigned int); 167 - size_t priv_size = sizeof(*bm) + len + prefix_tbl_len; 166 + unsigned int prefix_tbl_len; 167 + size_t priv_size; 168 + 169 + /* Zero-length patterns would underflow bm_find()'s initial shift. */ 170 + if (unlikely(!len)) 171 + return ERR_PTR(-EINVAL); 172 + 173 + /* 174 + * bm->pattern is stored immediately after the good_shift[] table. 175 + * Reject lengths that would wrap while sizing either region. 176 + */ 177 + if (unlikely(check_mul_overflow(len, sizeof(*bm->good_shift), 178 + &prefix_tbl_len) || 179 + check_add_overflow(sizeof(*bm), (size_t)len, &priv_size) || 180 + check_add_overflow(priv_size, prefix_tbl_len, &priv_size))) 181 + return ERR_PTR(-EINVAL); 168 182 169 183 conf = alloc_ts_config(priv_size, gfp_mask); 170 184 if (IS_ERR(conf))
+16 -2
lib/ts_kmp.c
··· 94 94 struct ts_config *conf; 95 95 struct ts_kmp *kmp; 96 96 int i; 97 - unsigned int prefix_tbl_len = len * sizeof(unsigned int); 98 - size_t priv_size = sizeof(*kmp) + len + prefix_tbl_len; 97 + unsigned int prefix_tbl_len; 98 + size_t priv_size; 99 + 100 + /* Zero-length patterns would make kmp_find() read beyond kmp->pattern. */ 101 + if (unlikely(!len)) 102 + return ERR_PTR(-EINVAL); 103 + 104 + /* 105 + * kmp->pattern is stored immediately after the prefix_tbl[] table. 106 + * Reject lengths that would wrap while sizing either region. 107 + */ 108 + if (unlikely(check_mul_overflow(len, sizeof(*kmp->prefix_tbl), 109 + &prefix_tbl_len) || 110 + check_add_overflow(sizeof(*kmp), (size_t)len, &priv_size) || 111 + check_add_overflow(priv_size, prefix_tbl_len, &priv_size))) 112 + return ERR_PTR(-EINVAL); 99 113 100 114 conf = alloc_ts_config(priv_size, gfp_mask); 101 115 if (IS_ERR(conf))
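Both string-search initialisers now size their private area with the overflow-checked helpers instead of bare arithmetic. check_mul_overflow()/check_add_overflow() are thin wrappers around the compiler's overflow builtins, so the semantics can be demonstrated in plain userspace C. The sketch below mirrors the ts_bm/ts_kmp calculation; the helper name and values are made up for illustration:

    #include <stdio.h>

    /* Size header + pattern + shift table, refusing any wrap-around. */
    static int size_priv(size_t hdr, unsigned int len, size_t elem,
                         size_t *out)
    {
            unsigned int tbl;
            size_t total;

            if (__builtin_mul_overflow(len, elem, &tbl) ||
                __builtin_add_overflow(hdr, (size_t)len, &total) ||
                __builtin_add_overflow(total, tbl, &total))
                    return -1;              /* would overflow: reject */
            *out = total;
            return 0;
    }

    int main(void)
    {
            size_t sz;

            /* A pattern length near UINT_MAX must be rejected ... */
            if (size_priv(64, 0xfffffff0u, sizeof(unsigned int), &sz))
                    puts("rejected: size calculation overflows");
            /* ... while a sane one yields a usable allocation size. */
            if (!size_priv(64, 32, sizeof(unsigned int), &sz))
                    printf("priv_size = %zu\n", sz);
            return 0;
    }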
+1 -1
lib/uuid.c
··· 54 54 static void __uuid_gen_common(__u8 b[16]) 55 55 { 56 56 get_random_bytes(b, 16); 57 - /* reversion 0b10 */ 57 + /* revision 0b10 */ 58 58 b[8] = (b[8] & 0x3F) | 0x80; 59 59 } 60 60
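The corrected comment labels what RFC 4122 calls the variant field: forcing the top two bits of byte 8 to 0b10. A small userspace sketch of the same bit-twiddling for a version-4 UUID, with rand() standing in for get_random_bytes() (illustrative only, not cryptographically random):

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    int main(void)
    {
            unsigned char b[16];
            int i;

            srand((unsigned int)time(NULL)); /* stand-in for get_random_bytes() */
            for (i = 0; i < 16; i++)
                    b[i] = (unsigned char)rand();

            b[8] = (b[8] & 0x3F) | 0x80;    /* variant 0b10, as in the hunk above */
            b[6] = (b[6] & 0x0F) | 0x40;    /* version 4 (random-based) */

            printf("%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-%02x%02x%02x%02x%02x%02x\n",
                   b[0], b[1], b[2], b[3], b[4], b[5], b[6], b[7], b[8],
                   b[9], b[10], b[11], b[12], b[13], b[14], b[15]);
            return 0;
    }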
+3 -3
scripts/bloat-o-meter
··· 18 18 group.add_argument('-d', help='Show delta of Data Section', action='store_true') 19 19 group.add_argument('-t', help='Show delta of text Section', action='store_true') 20 20 parser.add_argument('-p', dest='prefix', help='Arch prefix for the tool being used. Useful in cross build scenarios') 21 - parser.add_argument('file1', help='First file to compare') 22 - parser.add_argument('file2', help='Second file to compare') 21 + parser.add_argument('file_old', help='First file to compare') 22 + parser.add_argument('file_new', help='Second file to compare') 23 23 24 24 args = parser.parse_args() 25 25 ··· 86 86 87 87 def print_result(symboltype, symbolformat): 88 88 grow, shrink, add, remove, up, down, delta, old, new, otot, ntot = \ 89 - calc(args.file1, args.file2, symbolformat) 89 + calc(args.file_old, args.file_new, symbolformat) 90 90 91 91 print("add/remove: %s/%s grow/shrink: %s/%s up/down: %s/%s (%s)" % \ 92 92 (add, remove, grow, shrink, up, -down, up-down))
+12 -2
scripts/checkpatch.pl
··· 641 641 Reviewed-by:| 642 642 Reported-by:| 643 643 Suggested-by:| 644 + Assisted-by:| 644 645 To:| 645 646 Cc: 646 647 )}; ··· 3104 3103 $fixed[$fixlinenr] = 3105 3104 "$ucfirst_sign_off $email"; 3106 3105 } 3106 + } 3107 + 3108 + # Assisted-by uses AGENT_NAME:MODEL_VERSION format, not email 3109 + if ($sign_off =~ /^Assisted-by:/i) { 3110 + if ($email !~ /^\S+:\S+/) { 3111 + WARN("BAD_SIGN_OFF", 3112 + "Assisted-by expects 'AGENT_NAME:MODEL_VERSION [TOOL1] [TOOL2]' format\n" . $herecurr); 3113 + } 3114 + next; 3107 3115 } 3108 3116 3109 3117 my ($email_name, $name_comment, $email_address, $comment) = parse_email($email); ··· 7512 7502 } 7513 7503 7514 7504 # check for various structs that are normally const (ops, kgdb, device_tree) 7515 - # and avoid what seem like struct definitions 'struct foo {' 7505 + # and avoid what seem like struct definitions 'struct foo {' or forward declarations 'struct foo;' 7516 7506 if (defined($const_structs) && 7517 7507 $line !~ /\bconst\b/ && 7518 - $line =~ /\bstruct\s+($const_structs)\b(?!\s*\{)/) { 7508 + $line =~ /\bstruct\s+($const_structs)\b(?!\s*[\{;])/) { 7519 7509 WARN("CONST_STRUCT", 7520 7510 "struct $1 should normally be const\n" . $herecurr); 7521 7511 }
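A trailer that satisfies the new check pairs an agent name and model version with a colon, optionally followed by tool names. For illustration only (the agent, model, and tool names here are invented):

    Assisted-by: example-agent:model-v2 lint-tool test-runner

Anything after "Assisted-by:" that lacks the colon-separated pair, such as a bare agent name or an ordinary email address, now triggers the BAD_SIGN_OFF warning instead of being parsed as a signer identity.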
+22 -4
scripts/decode_stacktrace.sh
··· 5 5 6 6 usage() { 7 7 echo "Usage:" 8 - echo " $0 -r <release>" 9 - echo " $0 [<vmlinux> [<base_path>|auto [<modules_path>]]]" 8 + echo " $0 [-R] -r <release>" 9 + echo " $0 [-R] [<vmlinux> [<base_path>|auto [<modules_path>]]]" 10 10 echo " $0 -h" 11 + echo "Options:" 12 + echo " -R: decode return address instead of caller address." 11 13 } 12 14 13 15 # Try to find a Rust demangler ··· 35 33 READELF=${UTIL_PREFIX}readelf${UTIL_SUFFIX} 36 34 ADDR2LINE=${UTIL_PREFIX}addr2line${UTIL_SUFFIX} 37 35 NM=${UTIL_PREFIX}nm${UTIL_SUFFIX} 36 + decode_retaddr=false 38 37 39 38 if [[ $1 == "-h" ]] ; then 40 39 usage 41 40 exit 0 42 - elif [[ $1 == "-r" ]] ; then 41 + elif [[ $1 == "-R" ]] ; then 42 + decode_retaddr=true 43 + shift 1 44 + fi 45 + 46 + if [[ $1 == "-r" ]] ; then 43 47 vmlinux="" 44 48 basepath="auto" 45 49 modpath="" ··· 184 176 # Let's start doing the math to get the exact address into the 185 177 # symbol. First, strip out the symbol total length. 186 178 local expr=${symbol%/*} 179 + # Also parse the offset from symbol. 180 + local offset=${expr#*+} 181 + offset=$((offset)) 187 182 188 183 # Now, replace the symbol name with the base address we found 189 184 # before. 190 185 expr=${expr/$name/0x$base_addr} 191 186 192 187 # Evaluate it to find the actual address 193 - expr=$((expr)) 188 + # The stack trace shows the return address, which is the next 189 + # instruction after the actual call, so as long as it's in the same 190 + # symbol, subtract one from that to point the call instruction. 191 + if [[ $decode_retaddr == false && $offset != 0 ]]; then 192 + expr=$((expr-1)) 193 + else 194 + expr=$((expr)) 195 + fi 194 196 local address=$(printf "%x\n" "$expr") 195 197 196 198 # Pass it to addr2line to get filename and line number
+1 -2
scripts/decodecode
··· 12 12 13 13 cleanup() { 14 14 rm -f $T $T.s $T.o $T.oo $T.aa $T.dis 15 - exit 1 16 15 } 17 16 18 17 die() { ··· 48 49 49 50 if [ -z "$code" ]; then 50 51 rm $T 51 - exit 52 + die "Code line not found" 52 53 fi 53 54 54 55 echo $code
+1 -1
scripts/gdb/linux/symbols.py
··· 298 298 if p == "-bpf": 299 299 monitor_bpf = True 300 300 else: 301 - p.append(os.path.abspath(os.path.expanduser(p))) 301 + self.module_paths.append(os.path.abspath(os.path.expanduser(p))) 302 302 self.module_paths.append(os.getcwd()) 303 303 304 304 if self.breakpoint is not None:
+7 -2
scripts/get_maintainer.pl
··· 375 375 ##Filename pattern matching 376 376 if ($type eq "F" || $type eq "X") { 377 377 $value =~ s@\.@\\\.@g; ##Convert . to \. 378 + $value =~ s/\*\*/\x00/g; ##Convert ** to placeholder 378 379 $value =~ s/\*/\.\*/g; ##Convert * to .* 379 380 $value =~ s/\?/\./g; ##Convert ? to . 381 + $value =~ s/\x00/(?:.*)/g; ##Convert placeholder to (?:.*) 380 382 ##if pattern is a directory and it lacks a trailing slash, add one 381 383 if ((-d $value)) { 382 384 $value =~ s@([^/])$@$1/@; ··· 748 746 if (($type eq "F" || $type eq "X") && 749 747 ($self_test eq "" || $self_test =~ /\bpatterns\b/)) { 750 748 $value =~ s@\.@\\\.@g; ##Convert . to \. 749 + $value =~ s/\*\*/\x00/g; ##Convert ** to placeholder 751 750 $value =~ s/\*/\.\*/g; ##Convert * to .* 752 751 $value =~ s/\?/\./g; ##Convert ? to . 752 + $value =~ s/\x00/(?:.*)/g; ##Convert placeholder to (?:.*) 753 753 ##if pattern is a directory and it lacks a trailing slash, add one 754 754 if ((-d $value)) { 755 755 $value =~ s@([^/])$@$1/@; ··· 925 921 my $value_pd = ($value =~ tr@/@@); 926 922 my $file_pd = ($file =~ tr@/@@); 927 923 $value_pd++ if (substr($value,-1,1) ne "/"); 928 - $value_pd = -1 if ($value =~ /^\.\*/); 924 + $value_pd = -1 if ($value =~ /^(\.\*|\(\?:\.\*\))/); 929 925 if ($value_pd >= $file_pd && 930 926 range_is_maintained($start, $end) && 931 927 range_has_maintainer($start, $end)) { ··· 959 955 $line =~ s/([^\\])\.([^\*])/$1\?$2/g; 960 956 $line =~ s/([^\\])\.$/$1\?/g; ##Convert . back to ? 961 957 $line =~ s/\\\./\./g; ##Convert \. to . 958 + $line =~ s/\(\?:\.\*\)/\*\*/g; ##Convert (?:.*) to ** 962 959 $line =~ s/\.\*/\*/g; ##Convert .* to * 963 960 } 964 961 my $count = $line =~ s/^([A-Z]):/$1:\t/g; ··· 1053 1048 if ($file =~ m@^$pattern@) { 1054 1049 my $s1 = ($file =~ tr@/@@); 1055 1050 my $s2 = ($pattern =~ tr@/@@); 1056 - if ($s1 == $s2) { 1051 + if ($s1 == $s2 || $pattern =~ /\(\?:/) { 1057 1052 return 1; 1058 1053 } 1059 1054 }
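These hunks teach the F:/X: glob-to-regex conversion about `**`: it is swapped for a placeholder before single `*` becomes `.*`, then restored as `(?:.*)`, and the path-depth comparisons treat any pattern containing `(?:` as matching at arbitrary directory depth. A hypothetical MAINTAINERS entry using it (the path is made up for illustration):

    F:	drivers/example/**/helpers/

With a plain `*` the pattern and the matched file must contain the same number of slashes; a `**` pattern is exempted from that check, so the entry above would match helpers/ directories however deeply they are nested under drivers/example/.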
+169 -167
scripts/spelling.txt
··· 57 57 ackowledge||acknowledge 58 58 ackowledged||acknowledged 59 59 acording||according 60 - activete||activate 61 60 actived||activated 61 + activete||activate 62 62 actualy||actually 63 63 actvie||active 64 64 acumulating||accumulating ··· 66 66 acumulator||accumulator 67 67 acutally||actually 68 68 adapater||adapter 69 + adddress||address 69 70 adderted||asserted 70 71 addional||additional 71 72 additionaly||additionally 72 73 additonal||additional 73 74 addres||address 74 - adddress||address 75 75 addreses||addresses 76 76 addresss||address 77 77 addrress||address ··· 95 95 algined||aligned 96 96 algorith||algorithm 97 97 algorithmical||algorithmically 98 + algorithmn||algorithm 98 99 algoritm||algorithm 99 100 algoritms||algorithms 100 - algorithmn||algorithm 101 101 algorrithm||algorithm 102 102 algorritm||algorithm 103 103 aligment||alignment ··· 128 128 amout||amount 129 129 amplifer||amplifier 130 130 amplifyer||amplifier 131 - an union||a union 132 - an user||a user 133 - an userspace||a userspace 134 - an one||a one 135 131 analysator||analyzer 136 132 ang||and 137 133 anniversery||anniversary 138 134 annoucement||announcement 139 135 anomolies||anomalies 140 136 anomoly||anomaly 137 + an one||a one 141 138 anonynous||anonymous 139 + an union||a union 140 + an user||a user 141 + an userspace||a userspace 142 142 anway||anyway 143 - aplication||application 144 143 apeared||appeared 144 + aplication||application 145 145 appearence||appearance 146 146 applicaion||application 147 147 appliction||application ··· 155 155 apropriate||appropriate 156 156 aquainted||acquainted 157 157 aquired||acquired 158 - aquisition||acquisition 159 158 aquires||acquires 159 + aquisition||acquisition 160 160 arbitary||arbitrary 161 161 architechture||architecture 162 162 archtecture||architecture ··· 189 189 assumtpion||assumption 190 190 asume||assume 191 191 asuming||assuming 192 - asycronous||asynchronous 193 192 asychronous||asynchronous 194 - asynchnous||asynchronous 195 - asynchrnous||asynchronous 196 - asynchronus||asynchronous 197 - asynchromous||asynchronous 193 + asycronous||asynchronous 198 194 asymetric||asymmetric 199 195 asymmeric||asymmetric 196 + asynchnous||asynchronous 197 + asynchrnous||asynchronous 198 + asynchromous||asynchronous 199 + asynchronus||asynchronous 200 + atempt||attempt 200 201 atleast||at least 201 202 atomatically||automatically 202 203 atomicly||atomically 203 - atempt||attempt 204 204 atrributes||attributes 205 205 attachement||attachment 206 206 attatch||attach 207 207 attched||attached 208 208 attemp||attempt 209 - attemps||attempts 210 209 attemping||attempting 210 + attemps||attempts 211 211 attepmpt||attempt 212 212 attnetion||attention 213 213 attruibutes||attributes 214 - authentification||authentication 215 214 authenicated||authenticated 215 + authentification||authentication 216 216 automaticaly||automatically 217 217 automaticly||automatically 218 218 automatize||automate ··· 257 257 beter||better 258 258 betweeen||between 259 259 bianries||binaries 260 + binded||bound 260 261 bitmast||bitmask 261 262 bitwiedh||bitwidth 262 263 boardcast||broadcast ··· 288 287 calulate||calculate 289 288 cancelation||cancellation 290 289 cancle||cancel 291 - cant||can't 292 - cant'||can't 293 - canot||cannot 294 - cann't||can't 295 290 cannnot||cannot 291 + cann't||can't 292 + canot||cannot 293 + cant'||can't 294 + cant||can't 296 295 capabiity||capability 297 296 capabilites||capabilities 298 297 capabilties||capabilities 299 298 capabilty||capability 300 299 
capabitilies||capabilities 301 300 capablity||capability 302 - capatibilities||capabilities 303 301 capapbilities||capabilities 302 + capatibilities||capabilities 304 303 captuer||capture 305 304 caputure||capture 306 305 carefuly||carefully ··· 308 307 casued||caused 309 308 catagory||category 310 309 cehck||check 310 + chache||cache 311 311 challange||challenge 312 312 challanges||challenges 313 - chache||cache 314 313 chanell||channel 315 314 changable||changeable 316 315 chanined||chained ··· 348 347 collapsable||collapsible 349 348 colorfull||colorful 350 349 comand||command 350 + comaptible||compatible 351 351 comit||commit 352 352 commerical||commercial 353 353 comming||coming ··· 359 357 commmand||command 360 358 commnunication||communication 361 359 commoditiy||commodity 362 - comsume||consume 363 - comsumer||consumer 364 - comsuming||consuming 365 - comaptible||compatible 366 360 compability||compatibility 367 361 compaibility||compatibility 368 362 comparsion||comparison ··· 374 376 completition||completion 375 377 completly||completely 376 378 complient||compliant 377 - componnents||components 378 379 compoment||component 380 + componnents||components 379 381 comppatible||compatible 380 382 compres||compress 381 383 compresion||compression 382 384 compresser||compressor 383 385 comression||compression 386 + comsume||consume 384 387 comsumed||consumed 388 + comsumer||consumer 389 + comsuming||consuming 385 390 comunicate||communicate 386 391 comunication||communication 387 392 conbination||combination 388 393 concurent||concurrent 389 394 conditionaly||conditionally 390 395 conditon||condition 391 - condtion||condition 392 396 condtional||conditional 397 + condtion||condition 393 398 conected||connected 394 399 conector||connector 395 400 configed||configured ··· 429 428 continously||continuously 430 429 continueing||continuing 431 430 contiuous||continuous 432 - contraints||constraints 433 - contruct||construct 434 431 contol||control 435 432 contoller||controller 433 + contraints||constraints 436 434 controled||controlled 437 435 controler||controller 438 436 controll||control 437 + contruct||construct 439 438 contruction||construction 440 439 contry||country 441 440 conuntry||country ··· 466 465 decendant||descendant 467 466 decendants||descendants 468 467 decompres||decompress 469 - decsribed||described 470 468 decrese||decrease 471 469 decription||description 472 - detault||default 470 + decsribed||described 473 471 dectected||detected 474 472 defailt||default 475 473 deferal||deferral ··· 482 482 defintions||definitions 483 483 defualt||default 484 484 defult||default 485 - deintializing||deinitializing 486 - deintialize||deinitialize 487 485 deintialized||deinitialized 486 + deintialize||deinitialize 487 + deintializing||deinitializing 488 488 deivce||device 489 489 delared||declared 490 490 delare||declare ··· 494 494 deley||delay 495 495 delibrately||deliberately 496 496 delievered||delivered 497 - demodualtor||demodulator 498 497 demension||dimension 498 + demodualtor||demodulator 499 499 dependancies||dependencies 500 500 dependancy||dependency 501 501 dependant||dependent ··· 505 505 desactivate||deactivate 506 506 desciptor||descriptor 507 507 desciptors||descriptors 508 - descritpor||descriptor 509 508 descripto||descriptor 510 509 descripton||description 511 510 descrition||description 511 + descritpor||descriptor 512 512 descritptor||descriptor 513 513 desctiptor||descriptor 514 + desination||destination 514 515 desriptor||descriptor 515 516 
desriptors||descriptors 516 - desination||destination 517 517 destionation||destination 518 518 destoried||destroyed 519 519 destory||destroy ··· 521 521 destorys||destroys 522 522 destroied||destroyed 523 523 detabase||database 524 + detault||default 524 525 deteced||detected 525 526 detecion||detection 526 527 detectt||detect ··· 536 535 devided||divided 537 536 deviece||device 538 537 devision||division 539 - diable||disable 540 538 diabled||disabled 539 + diable||disable 541 540 dicline||decline 541 + diconnected||disconnected 542 542 dictionnary||dictionary 543 543 didnt||didn't 544 544 diferent||different 545 - differrence||difference 546 - diffrent||different 547 545 differenciate||differentiate 546 + differrence||difference 548 547 diffreential||differential 548 + diffrent||different 549 549 diffrentiate||differentiate 550 550 difinition||definition 551 551 digial||digital 552 552 dimention||dimension 553 553 dimesions||dimensions 554 - diconnected||disconnected 555 - disabed||disabled 556 - disasembler||disassembler 557 - disble||disable 558 - disgest||digest 559 - disired||desired 560 - dispalying||displaying 561 - dissable||disable 562 - dissapeared||disappeared 563 554 diplay||display 564 - directon||direction 565 555 direcly||directly 556 + directon||direction 566 557 direectly||directly 567 558 diregard||disregard 568 - disassocation||disassociation 569 - disassocative||disassociative 559 + disabed||disabled 570 560 disapear||disappear 571 561 disapeared||disappeared 572 562 disappared||disappeared 573 - disbale||disable 563 + disasembler||disassembler 564 + disassocation||disassociation 565 + disassocative||disassociative 574 566 disbaled||disabled 575 - disble||disable 567 + disbale||disable 576 568 disbled||disabled 569 + disble||disable 570 + disble||disable 577 571 disconnet||disconnect 578 572 discontinous||discontinuous 573 + disgest||digest 579 574 disharge||discharge 575 + disired||desired 580 576 disnabled||disabled 577 + dispalying||displaying 581 578 dispertion||dispersion 579 + dissable||disable 580 + dissapeared||disappeared 582 581 dissapears||disappears 583 582 dissconect||disconnect 584 583 distiction||distinction 585 584 divisable||divisible 586 585 divsiors||divisors 587 - dsiabled||disabled 588 586 docuentation||documentation 589 587 documantation||documentation 590 588 documentaion||documentation ··· 598 598 droped||dropped 599 599 droput||dropout 600 600 druing||during 601 + dsiabled||disabled 601 602 dyanmic||dynamic 602 603 dynmaic||dynamic 603 604 eanable||enable ··· 622 621 enchanced||enhanced 623 622 encorporating||incorporating 624 623 encrupted||encrypted 625 - encrypiton||encryption 626 624 encryped||encrypted 625 + encrypiton||encryption 627 626 encryptio||encryption 628 627 endianess||endianness 629 - enpoint||endpoint 630 628 enhaced||enhanced 631 629 enlightnment||enlightenment 630 + enocded||encoded 631 + enought||enough 632 + enpoint||endpoint 632 633 enqueing||enqueuing 634 + enterily||entirely 633 635 entires||entries 634 636 entites||entities 635 637 entrys||entries 636 - enocded||encoded 637 - enought||enough 638 - enterily||entirely 639 638 enviroiment||environment 640 639 enviroment||environment 641 640 environement||environment ··· 654 653 evalutes||evaluates 655 654 evalution||evaluation 656 655 evaulated||evaluated 657 - excecutable||executable 656 + exaclty||exactly 658 657 excceed||exceed 658 + excecutable||executable 659 659 exceded||exceeded 660 660 exceds||exceeds 661 661 exceeed||exceed ··· 670 668 existance||existence 671 
669 existant||existent 672 670 exixt||exist 673 - exsits||exists 674 671 exlcude||exclude 675 672 exlcuding||excluding 676 673 exlcusive||exclusive 677 - exlusive||exclusive 678 674 exlicitly||explicitly 675 + exlusive||exclusive 679 676 exmaple||example 680 677 expecially||especially 681 678 experies||expires 682 679 explicite||explicit 683 - explicity||explicitly 684 680 explicitely||explicitly 685 - explict||explicit 681 + explicity||explicitly 686 682 explictely||explicitly 683 + explict||explicit 687 684 explictly||explicitly 688 685 expresion||expression 689 686 exprienced||experienced 690 687 exprimental||experimental 691 - extened||extended 688 + exsits||exists 692 689 exteneded||extended 690 + extened||extended 693 691 extensability||extensibility 694 - extention||extension 695 692 extenstion||extension 693 + extention||extension 696 694 extracter||extractor 697 695 faied||failed 698 696 faield||failed 699 - faild||failed 700 697 failded||failed 698 + faild||failed 701 699 failer||failure 702 - faill||fail 703 700 failied||failed 701 + faill||fail 704 702 faillure||failure 703 + failng||failing 705 704 failue||failure 706 705 failuer||failure 707 - failng||failing 708 706 faireness||fairness 709 707 falied||failed 710 708 faliure||failure ··· 719 717 fileystem||filesystem 720 718 fimrware||firmware 721 719 fimware||firmware 720 + finanize||finalize 721 + findn||find 722 + finilizes||finalizes 723 + finsih||finish 722 724 firmare||firmware 723 725 firmaware||firmware 724 726 firtly||firstly 725 727 firware||firmware 726 728 firwmare||firmware 727 - finanize||finalize 728 - findn||find 729 - finilizes||finalizes 730 - finsih||finish 731 729 fliter||filter 732 730 flusing||flushing 733 731 folloing||following ··· 744 742 frambuffer||framebuffer 745 743 framming||framing 746 744 framwork||framework 745 + frequancy||frequency 747 746 frequence||frequency 748 747 frequncy||frequency 749 - frequancy||frequency 750 748 frome||from 751 749 fronend||frontend 752 750 fucntion||function ··· 768 766 gateing||gating 769 767 gauage||gauge 770 768 gaurenteed||guaranteed 771 - generiously||generously 772 769 genereate||generate 773 770 genereted||generated 771 + generiously||generously 774 772 genric||generic 775 773 gerenal||general 776 774 geting||getting ··· 792 790 hanled||handled 793 791 happend||happened 794 792 hardare||hardware 795 - harware||hardware 796 793 hardward||hardware 794 + harware||hardware 797 795 havind||having 796 + hearbeat||heartbeat 798 797 heigth||height 798 + heirachy||hierarchy 799 799 heirarchically||hierarchically 800 800 heirarchy||hierarchy 801 - heirachy||hierarchy 802 801 helpfull||helpful 803 - hearbeat||heartbeat 804 802 heterogenous||heterogeneous 805 803 hexdecimal||hexadecimal 806 - hybernate||hibernate 807 804 hiearchy||hierarchy 808 805 hierachy||hierarchy 809 806 hierarchie||hierarchy ··· 810 809 horizental||horizontal 811 810 howver||however 812 811 hsould||should 812 + hybernate||hibernate 813 813 hypervior||hypervisor 814 814 hypter||hyper 815 815 idel||idle 816 816 identidier||identifier 817 817 iligal||illegal 818 - illigal||illegal 819 818 illgal||illegal 820 - iomaped||iomapped 819 + illigal||illegal 821 820 imblance||imbalance 822 821 immeadiately||immediately 823 822 immedaite||immediate ··· 832 831 implemenation||implementation 833 832 implementaiton||implementation 834 833 implementated||implemented 835 - implemention||implementation 836 834 implementd||implemented 835 + implemention||implementation 837 836 implemetation||implementation 838 837 
implemntation||implementation 839 838 implentation||implementation 840 839 implmentation||implementation 841 840 implmenting||implementing 841 + inavlid||invalid 842 842 incative||inactive 843 843 incomming||incoming 844 844 incompaitiblity||incompatibility ··· 871 869 ingore||ignore 872 870 inheritence||inheritance 873 871 inital||initial 874 - initalized||initialized 875 872 initalised||initialized 876 873 initalise||initialize 874 + initalized||initialized 877 875 initalize||initialize 878 876 initation||initiation 879 877 initators||initiators ··· 881 879 initializationg||initialization 882 880 initializiation||initialization 883 881 initializtion||initialization 884 - initialze||initialize 885 882 initialzed||initialized 883 + initialze||initialize 886 884 initialzing||initializing 887 885 initilization||initialization 886 + initilized||initialized 888 887 initilize||initialize 889 888 initliaze||initialize 890 - initilized||initialized 891 889 inofficial||unofficial 892 890 inrerface||interface 893 891 insititute||institute 894 892 instace||instance 895 893 instal||install 896 - instanciate||instantiate 897 894 instanciated||instantiated 895 + instanciate||instantiate 898 896 instuments||instruments 899 897 insufficent||insufficient 900 898 intead||instead ··· 913 911 intermittant||intermittent 914 912 internel||internal 915 913 interoprability||interoperability 916 - interuupt||interrupt 917 - interupt||interrupt 918 - interupts||interrupts 919 - interurpt||interrupt 920 914 interrface||interface 921 915 interrrupt||interrupt 922 916 interrup||interrupt 923 917 interrups||interrupts 924 918 interruptted||interrupted 925 919 interupted||interrupted 920 + interupt||interrupt 921 + interupts||interrupts 922 + interurpt||interrupt 923 + interuupt||interrupt 926 924 intiailized||initialized 927 925 intial||initial 928 926 intialisation||initialisation ··· 936 934 intrrupt||interrupt 937 935 intterrupt||interrupt 938 936 intuative||intuitive 939 - inavlid||invalid 940 937 invaid||invalid 941 938 invaild||invalid 942 939 invailid||invalid 943 - invald||invalid 944 940 invalde||invalid 941 + invald||invalid 945 942 invalide||invalid 946 943 invalidiate||invalidate 947 944 invalud||invalid 948 945 invididual||individual 949 946 invokation||invocation 950 947 invokations||invocations 948 + iomaped||iomapped 951 949 ireelevant||irrelevant 952 950 irrelevent||irrelevant 953 951 isnt||isn't ··· 993 991 maangement||management 994 992 machinary||machinery 995 993 maibox||mailbox 994 + mailformed||malformed 996 995 maintainance||maintenance 997 996 maintainence||maintenance 998 997 maintan||maintain 999 998 makeing||making 1000 - mailformed||malformed 1001 999 malplaced||misplaced 1002 1000 malplace||misplace 1003 1001 managable||manageable ··· 1007 1005 manger||manager 1008 1006 manoeuvering||maneuvering 1009 1007 manufaucturing||manufacturing 1010 - mappping||mapping 1011 1008 maping||mapping 1009 + mappping||mapping 1012 1010 matchs||matches 1013 1011 mathimatical||mathematical 1014 1012 mathimatic||mathematic 1015 1013 mathimatics||mathematics 1016 - maxmium||maximum 1017 1014 maximium||maximum 1018 1015 maxium||maximum 1016 + maxmium||maximum 1019 1017 mechamism||mechanism 1020 1018 mechanim||mechanism 1021 1019 meetign||meeting 1022 1020 memeory||memory 1023 1021 memmber||member 1024 1022 memoery||memory 1023 + memomry||memory 1025 1024 memroy||memory 1026 1025 ment||meant 1027 1026 mergable||mergeable ··· 1039 1036 miliseconds||milliseconds 1040 1037 millenium||millennium 1041 1038 
milliseonds||milliseconds 1042 - minimim||minimum 1043 - minium||minimum 1044 1039 minimam||minimum 1040 + minimim||minimum 1045 1041 minimun||minimum 1042 + minium||minimum 1046 1043 miniumum||minimum 1047 1044 minumum||minimum 1048 1045 misalinged||misaligned 1049 1046 miscelleneous||miscellaneous 1050 1047 misformed||malformed 1051 - mispelled||misspelled 1052 - mispelt||misspelt 1053 1048 mising||missing 1054 1049 mismactch||mismatch 1050 + mispelled||misspelled 1051 + mispelt||misspelt 1055 1052 missign||missing 1056 1053 missmanaged||mismanaged 1057 1054 missmatch||mismatch ··· 1064 1061 modul||module 1065 1062 modulues||modules 1066 1063 momery||memory 1067 - memomry||memory 1068 1064 monitring||monitoring 1069 1065 monochorome||monochrome 1070 1066 monochromo||monochrome 1071 1067 monocrome||monochrome 1072 1068 mopdule||module 1073 1069 mroe||more 1074 - mulitplied||multiplied 1075 1070 muliple||multiple 1076 - multipler||multiplier 1071 + mulitplied||multiplied 1077 1072 multidimensionnal||multidimensional 1078 1073 multipe||multiple 1074 + multipler||multiplier 1079 1075 multple||multiple 1080 1076 mumber||number 1081 1077 muticast||multicast ··· 1096 1094 nescessary||necessary 1097 1095 nessessary||necessary 1098 1096 none existent||non-existent 1097 + notfify||notify 1099 1098 noticable||noticeable 1100 1099 notication||notification 1101 1100 notications||notifications ··· 1104 1101 notifed||notified 1105 1102 notifer||notifier 1106 1103 notity||notify 1107 - notfify||notify 1108 1104 nubmer||number 1109 1105 numebr||number 1110 1106 numer||number ··· 1121 1119 occure||occurred 1122 1120 occuring||occurring 1123 1121 ocurrence||occurrence 1124 - offser||offset 1125 1122 offet||offset 1126 1123 offlaod||offload 1127 1124 offloded||offloaded 1125 + offser||offset 1128 1126 offseting||offsetting 1129 1127 oflload||offload 1130 1128 omited||omitted ··· 1143 1141 optmizations||optimizations 1144 1142 orientatied||orientated 1145 1143 orientied||oriented 1146 - orignal||original 1147 1144 originial||original 1145 + orignal||original 1148 1146 orphanded||orphaned 1149 1147 otherise||otherwise 1150 1148 ouput||output 1151 1149 oustanding||outstanding 1152 - overaall||overall 1153 - overhread||overhead 1154 - overlaping||overlapping 1155 1150 oveflow||overflow 1151 + overaall||overall 1156 1152 overflw||overflow 1157 - overlfow||overflow 1153 + overhread||overhead 1158 1154 overide||override 1155 + overlaping||overlapping 1156 + overlfow||overflow 1159 1157 overrided||overridden 1160 1158 overriden||overridden 1161 1159 overrrun||overrun 1162 1160 overun||overrun 1163 - overwritting||overwriting 1164 1161 overwriten||overwritten 1162 + overwritting||overwriting 1165 1163 pacakge||package 1166 1164 pachage||package 1167 1165 packacge||package ··· 1171 1169 pakage||package 1172 1170 paket||packet 1173 1171 pallette||palette 1174 - paln||plan 1175 1172 palne||plane 1173 + paln||plan 1176 1174 paramameters||parameters 1177 - paramaters||parameters 1178 1175 paramater||parameter 1176 + paramaters||parameters 1179 1177 paramenters||parameters 1180 1178 parametes||parameters 1181 1179 parametised||parametrised ··· 1243 1241 prefferably||preferably 1244 1242 prefitler||prefilter 1245 1243 preform||perform 1246 - previleged||privileged 1247 - previlege||privilege 1248 1244 premption||preemption 1249 1245 prepaired||prepared 1250 1246 prepate||prepare ··· 1250 1250 preprare||prepare 1251 1251 pressre||pressure 1252 1252 presuambly||presumably 1253 + previleged||privileged 1254 + 
previlege||privilege 1253 1255 previosuly||previously 1254 1256 previsously||previously 1255 1257 primative||primitive ··· 1260 1258 priting||printing 1261 1259 privilaged||privileged 1262 1260 privilage||privilege 1263 - priviledge||privilege 1264 1261 priviledged||privileged 1262 + priviledge||privilege 1265 1263 priviledges||privileges 1266 1264 privleges||privileges 1267 - probaly||probably 1268 1265 probabalistic||probabilistic 1266 + probaly||probably 1269 1267 procceed||proceed 1270 1268 proccesors||processors 1271 1269 procesed||processed 1272 - proces||process 1273 1270 procesing||processing 1271 + proces||process 1274 1272 processessing||processing 1275 1273 processess||processes 1276 1274 processpr||processor ··· 1290 1288 prohibitted||prohibited 1291 1289 prohibitting||prohibiting 1292 1290 promiscous||promiscuous 1291 + promixity||proximity 1293 1292 promps||prompts 1294 1293 pronnounced||pronounced 1295 1294 prononciation||pronunciation ··· 1299 1296 propery||property 1300 1297 propigate||propagate 1301 1298 propigation||propagation 1302 - propogation||propagation 1303 1299 propogate||propagate 1300 + propogation||propagation 1304 1301 prosess||process 1305 1302 protable||portable 1306 1303 protcol||protocol 1307 1304 protecion||protection 1308 1305 protedcted||protected 1309 1306 protocoll||protocol 1310 - promixity||proximity 1311 1307 psudo||pseudo 1312 1308 psuedo||pseudo 1313 1309 psychadelic||psychedelic ··· 1335 1333 recieving||receiving 1336 1334 recogniced||recognised 1337 1335 recognizeable||recognizable 1338 - recompte||recompute 1339 1336 recommanded||recommended 1337 + recompte||recompute 1340 1338 recyle||recycle 1341 1339 redect||reject 1342 1340 redircet||redirect ··· 1346 1344 refcounf||refcount 1347 1345 refence||reference 1348 1346 refered||referred 1349 - referencce||reference 1350 1347 referenace||reference 1348 + referencce||reference 1351 1349 refererence||reference 1352 1350 refering||referring 1353 1351 refernces||references ··· 1355 1353 refrence||reference 1356 1354 regiser||register 1357 1355 registed||registered 1358 - registerd||registered 1359 1356 registeration||registration 1357 + registerd||registered 1360 1358 registeresd||registered 1361 1359 registerred||registered 1362 1360 registes||registers ··· 1373 1371 remoote||remote 1374 1372 remore||remote 1375 1373 removeable||removable 1376 - repective||respective 1377 1374 repectively||respectively 1375 + repective||respective 1378 1376 replacable||replaceable 1379 1377 replacments||replacements 1380 1378 replys||replies ··· 1391 1389 requirment||requirement 1392 1390 requred||required 1393 1391 requried||required 1394 - requst||request 1395 1392 requsted||requested 1393 + requst||request 1396 1394 reregisteration||reregistration 1397 1395 reseting||resetting 1398 1396 reseved||reserved ··· 1414 1412 retreived||retrieved 1415 1413 retreive||retrieve 1416 1414 retreiving||retrieving 1417 - retrive||retrieve 1418 1415 retrived||retrieved 1416 + retrive||retrieve 1419 1417 retrun||return 1420 - retun||return 1421 1418 retuned||returned 1419 + retun||return 1422 1420 reudce||reduce 1423 1421 reuest||request 1424 1422 reuqest||request ··· 1466 1464 seperatly||separately 1467 1465 seperator||separator 1468 1466 sepperate||separate 1469 - seqeunce||sequence 1470 - seqeuncer||sequencer 1471 1467 seqeuencer||sequencer 1468 + seqeuncer||sequencer 1469 + seqeunce||sequence 1472 1470 sequece||sequence 1473 1471 sequemce||sequence 1474 1472 sequencial||sequential ··· 1507 1505 soluation||solution 1508 1506 
souce||source 1509 1507 speach||speech 1510 - specfic||specific 1511 1508 specfication||specification 1509 + specfic||specific 1512 1510 specfield||specified 1513 1511 speciefied||specified 1514 1512 specifc||specific ··· 1517 1515 specificaton||specification 1518 1516 specificed||specified 1519 1517 specifing||specifying 1520 - specifiy||specify 1521 1518 specifiying||specifying 1519 + specifiy||specify 1522 1520 speficied||specified 1523 1521 speicify||specify 1524 1522 speling||spelling ··· 1545 1543 straming||streaming 1546 1544 struc||struct 1547 1545 structres||structures 1548 - stuct||struct 1549 1546 strucuture||structure 1547 + stuct||struct 1550 1548 stucture||structure 1551 1549 sturcture||structure 1552 1550 subdirectoires||subdirectories 1553 1551 suble||subtle 1554 - substract||subtract 1555 1552 submited||submitted 1556 1553 submition||submission 1554 + substract||subtract 1557 1555 succeded||succeeded 1558 - suceed||succeed 1559 - succesfuly||successfully 1560 1556 succesfully||successfully 1561 1557 succesful||successful 1558 + succesfuly||successfully 1562 1559 successed||succeeded 1563 1560 successfull||successful 1564 1561 successfuly||successfully 1562 + suceed||succeed 1565 1563 sucessfully||successfully 1566 1564 sucessful||successful 1567 1565 sucess||success ··· 1570 1568 suplied||supplied 1571 1569 suported||supported 1572 1570 suport||support 1573 - supportet||supported 1574 1571 suppored||supported 1575 1572 supporing||supporting 1573 + supportet||supported 1576 1574 supportin||supporting 1577 1575 suppoted||supported 1578 1576 suppported||supported ··· 1583 1581 surpresses||suppresses 1584 1582 susbsystem||subsystem 1585 1583 suspeneded||suspended 1586 - suspsend||suspend 1587 1584 suspicously||suspiciously 1585 + suspsend||suspend 1588 1586 swaping||swapping 1589 1587 switchs||switches 1590 - swith||switch 1591 1588 swithable||switchable 1592 - swithc||switch 1593 1589 swithced||switched 1594 1590 swithcing||switching 1591 + swithc||switch 1595 1592 swithed||switched 1596 1593 swithing||switching 1594 + swith||switch 1597 1595 swtich||switch 1596 + sychronization||synchronization 1597 + sychronously||synchronously 1598 1598 syfs||sysfs 1599 1599 symetric||symmetric 1600 1600 synax||syntax 1601 1601 synchonized||synchronized 1602 - sychronization||synchronization 1603 - sychronously||synchronously 1604 1602 synchronuously||synchronously 1605 - syncronize||synchronize 1606 1603 syncronized||synchronized 1604 + syncronize||synchronize 1607 1605 syncronizing||synchronizing 1608 1606 syncronus||synchronous 1609 1607 syste||system ··· 1612 1610 tagert||target 1613 1611 taht||that 1614 1612 tained||tainted 1615 - tarffic||traffic 1613 + tansition||transition 1616 1614 tansmit||transmit 1615 + tarffic||traffic 1617 1616 targetted||targeted 1618 1617 targetting||targeting 1619 1618 taskelt||tasklet 1620 1619 teh||the 1621 1620 temeprature||temperature 1622 1621 temorary||temporary 1623 - temproarily||temporarily 1624 1622 temperture||temperature 1623 + temproarily||temporarily 1625 1624 theads||threads 1626 1625 therfore||therefore 1627 1626 thier||their ··· 1632 1629 thresold||threshold 1633 1630 throtting||throttling 1634 1631 throught||through 1635 - tansition||transition 1636 - trackling||tracking 1637 - troughput||throughput 1638 - trys||tries 1639 1632 thses||these 1640 - tiggers||triggers 1641 1633 tiggered||triggered 1642 1634 tiggerring||triggering 1643 - tipically||typically 1635 + tiggers||triggers 1644 1636 timeing||timing 1645 1637 timming||timing 1646 
1638 timout||timeout 1639 + tipically||typically 1647 1640 tmis||this 1648 1641 tolarance||tolerance 1649 1642 toogle||toggle 1650 1643 torerable||tolerable 1651 1644 torlence||tolerance 1645 + trackling||tracking 1652 1646 traget||target 1653 1647 traking||tracking 1654 1648 tramsmitted||transmitted ··· 1669 1669 trasmission||transmission 1670 1670 trasmitter||transmitter 1671 1671 treshold||threshold 1672 - trigged||triggered 1673 - triggerd||triggered 1674 1672 trigerred||triggered 1675 1673 trigerring||triggering 1674 + trigged||triggered 1675 + triggerd||triggered 1676 + troughput||throughput 1676 1677 trun||turn 1678 + trys||tries 1677 1679 tunning||tuning 1678 1680 ture||true 1679 1681 tyep||type 1680 1682 udpate||update 1681 - updtes||updates 1682 1683 uesd||used 1683 - unknwon||unknown 1684 1684 uknown||unknown 1685 - usccess||success 1685 + unamed||unnamed 1686 1686 uncommited||uncommitted 1687 1687 uncompatible||incompatible 1688 1688 uncomressed||uncompressed ··· 1691 1691 undelying||underlying 1692 1692 underun||underrun 1693 1693 unecessary||unnecessary 1694 + uneeded||unneeded 1694 1695 unexecpted||unexpected 1695 1696 unexepected||unexpected 1696 1697 unexpcted||unexpected ··· 1700 1699 unexpexted||unexpected 1701 1700 unfortunatelly||unfortunately 1702 1701 unifiy||unify 1703 - uniterrupted||uninterrupted 1704 1702 uninterruptable||uninterruptible 1705 1703 unintialized||uninitialized 1704 + uniterrupted||uninterrupted 1706 1705 unitialized||uninitialized 1707 1706 unkmown||unknown 1708 1707 unknonw||unknown 1709 1708 unknouwn||unknown 1710 1709 unknow||unknown 1710 + unknwon||unknown 1711 1711 unkown||unknown 1712 - unamed||unnamed 1713 - uneeded||unneeded 1714 - unneded||unneeded 1712 + unmached||unmatched 1715 1713 unneccecary||unnecessary 1716 1714 unneccesary||unnecessary 1717 1715 unneccessary||unnecessary 1718 1716 unnecesary||unnecessary 1717 + unneded||unneeded 1719 1718 unneedingly||unnecessarily 1720 1719 unnsupported||unsupported 1721 - unuspported||unsupported 1722 - unmached||unmatched 1723 1720 unprecise||imprecise 1724 1721 unpriviledged||unprivileged 1725 1722 unpriviliged||unprivileged ··· 1725 1726 unresgister||unregister 1726 1727 unrgesiter||unregister 1727 1728 unsinged||unsigned 1728 - unstabel||unstable 1729 - unsolicted||unsolicited 1730 1729 unsolicitied||unsolicited 1730 + unsolicted||unsolicited 1731 + unstabel||unstable 1731 1732 unsuccessfull||unsuccessful 1732 1733 unsuported||unsupported 1733 1734 untill||until 1734 1735 ununsed||unused 1735 1736 unuseful||useless 1737 + unuspported||unsupported 1736 1738 unvalid||invalid 1737 1739 upate||update 1740 + updtes||updates 1738 1741 upsupported||unsupported 1739 1742 upto||up to 1743 + usccess||success 1740 1744 useable||usable 1741 1745 usefule||useful 1742 1746 usefull||useful ··· 1761 1759 varible||variable 1762 1760 varient||variant 1763 1761 vaule||value 1764 - verbse||verbose 1765 1762 veify||verify 1763 + verbse||verbose 1766 1764 verfication||verification 1767 1765 veriosn||version 1768 - versoin||version 1769 1766 verisons||versions 1770 1767 verison||version 1771 1768 veritical||vertical 1769 + versoin||version 1772 1770 verson||version 1773 1771 vicefersa||vice-versa 1774 1772 virtal||virtual ··· 1781 1779 was't||wasn't 1782 1780 wathdog||watchdog 1783 1781 wating||waiting 1784 - wiat||wait 1785 1782 wether||whether 1786 1783 whataver||whatever 1787 1784 whcih||which 1788 1785 whenver||whenever 1789 1786 wheter||whether 1790 1787 whe||when 1788 + wiat||wait 1791 1789 wierd||weird 1792 
1790 wihout||without 1793 1791 wiil||will
+1 -1
tools/accounting/Makefile
··· 1 1 # SPDX-License-Identifier: GPL-2.0 2 2 CC := $(CROSS_COMPILE)gcc 3 - CFLAGS := -I../../usr/include 3 + CFLAGS := -I../include/uapi/ 4 4 5 5 PROGS := getdelays procacct delaytop 6 6
+37 -4
tools/accounting/getdelays.c
··· 60 60 } 61 61 62 62 /* Maximum size of response requested or message sent */ 63 - #define MAX_MSG_SIZE 1024 63 + #define MAX_MSG_SIZE 2048 64 64 /* Maximum number of cpus expected to be specified in a cpumask */ 65 65 #define MAX_CPUS 32 66 66 ··· 113 113 error: 114 114 close(fd); 115 115 return -1; 116 + } 117 + 118 + static int recv_taskstats_msg(int sd, struct msgtemplate *msg) 119 + { 120 + struct sockaddr_nl nladdr; 121 + struct iovec iov = { 122 + .iov_base = msg, 123 + .iov_len = sizeof(*msg), 124 + }; 125 + struct msghdr hdr = { 126 + .msg_name = &nladdr, 127 + .msg_namelen = sizeof(nladdr), 128 + .msg_iov = &iov, 129 + .msg_iovlen = 1, 130 + }; 131 + int ret; 132 + 133 + ret = recvmsg(sd, &hdr, 0); 134 + if (ret < 0) 135 + return -1; 136 + if (hdr.msg_flags & MSG_TRUNC) { 137 + errno = EMSGSIZE; 138 + return -1; 139 + } 140 + 141 + return ret; 116 142 } 117 143 118 144 ··· 659 633 } 660 634 661 635 do { 662 - rep_len = recv(nl_sd, &msg, sizeof(msg), 0); 636 + rep_len = recv_taskstats_msg(nl_sd, &msg); 663 637 PRINTF("received %d bytes\n", rep_len); 664 638 665 639 if (rep_len < 0) { 666 - fprintf(stderr, "nonfatal reply error: errno %d\n", 667 - errno); 640 + if (errno == EMSGSIZE) 641 + fprintf(stderr, 642 + "dropped truncated taskstats netlink message, please increase MAX_MSG_SIZE\n"); 643 + else 644 + fprintf(stderr, "nonfatal reply error: errno %d\n", 645 + errno); 668 646 continue; 669 647 } 670 648 if (msg.n.nlmsg_type == NLMSG_ERROR || ··· 710 680 printf("TGID\t%d\n", rtid); 711 681 break; 712 682 case TASKSTATS_TYPE_STATS: 683 + PRINTF("version %u\n", 684 + ((struct taskstats *) 685 + NLA_DATA(na))->version); 713 686 if (print_delays) 714 687 print_delayacct((struct taskstats *) NLA_DATA(na)); 715 688 if (print_io_accounting)
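Plain recv() silently truncates a netlink reply that exceeds the buffer, which is why the tool switches to a recvmsg()-based helper: the kernel sets MSG_TRUNC in msg_flags whenever data was dropped. The behaviour is easy to see with any datagram socket; a minimal standalone demo (assuming an AF_UNIX datagram pair reports truncation the same way):

    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    int main(void)
    {
            int sv[2];
            char big[128], small[16];
            struct iovec iov = { .iov_base = small, .iov_len = sizeof(small) };
            struct msghdr hdr = { .msg_iov = &iov, .msg_iovlen = 1 };
            ssize_t n;

            if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv))
                    return 1;
            memset(big, 'x', sizeof(big));
            send(sv[0], big, sizeof(big), 0);       /* one 128-byte datagram */

            /* The 16-byte buffer is too small; msg_flags reports the loss. */
            n = recvmsg(sv[1], &hdr, 0);
            printf("got %zd bytes, truncated: %s\n", n,
                   (hdr.msg_flags & MSG_TRUNC) ? "yes" : "no");
            return 0;
    }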
+36 -4
tools/accounting/procacct.c
··· 71 71 } 72 72 73 73 /* Maximum size of response requested or message sent */ 74 - #define MAX_MSG_SIZE 1024 74 + #define MAX_MSG_SIZE 2048 75 75 /* Maximum number of cpus expected to be specified in a cpumask */ 76 76 #define MAX_CPUS 32 77 77 ··· 119 119 error: 120 120 close(fd); 121 121 return -1; 122 + } 123 + 124 + static int recv_taskstats_msg(int sd, struct msgtemplate *msg) 125 + { 126 + struct sockaddr_nl nladdr; 127 + struct iovec iov = { 128 + .iov_base = msg, 129 + .iov_len = sizeof(*msg), 130 + }; 131 + struct msghdr hdr = { 132 + .msg_name = &nladdr, 133 + .msg_namelen = sizeof(nladdr), 134 + .msg_iov = &iov, 135 + .msg_iovlen = 1, 136 + }; 137 + int ret; 138 + 139 + ret = recvmsg(sd, &hdr, 0); 140 + if (ret < 0) 141 + return -1; 142 + if (hdr.msg_flags & MSG_TRUNC) { 143 + errno = EMSGSIZE; 144 + return -1; 145 + } 146 + 147 + return ret; 122 148 } 123 149 124 150 ··· 265 239 PRINTF("TGID\t%d\n", rtid); 266 240 break; 267 241 case TASKSTATS_TYPE_STATS: 242 + PRINTF("version %u\n", 243 + ((struct taskstats *)NLA_DATA(na))->version); 268 244 if (mother == TASKSTATS_TYPE_AGGR_PID) 269 245 print_procacct((struct taskstats *) NLA_DATA(na)); 270 246 if (fd) { ··· 375 347 } 376 348 377 349 do { 378 - rep_len = recv(nl_sd, &msg, sizeof(msg), 0); 350 + rep_len = recv_taskstats_msg(nl_sd, &msg); 379 351 PRINTF("received %d bytes\n", rep_len); 380 352 381 353 if (rep_len < 0) { 382 - fprintf(stderr, "nonfatal reply error: errno %d\n", 383 - errno); 354 + if (errno == EMSGSIZE) 355 + fprintf(stderr, 356 + "dropped truncated taskstats netlink message, please increase MAX_MSG_SIZE\n"); 357 + else 358 + fprintf(stderr, "nonfatal reply error: errno %d\n", 359 + errno); 384 360 continue; 385 361 } 386 362 if (msg.n.nlmsg_type == NLMSG_ERROR ||
+291
tools/include/uapi/linux/taskstats.h
··· 1 + /* SPDX-License-Identifier: LGPL-2.1 WITH Linux-syscall-note */ 2 + /* taskstats.h - exporting per-task statistics 3 + * 4 + * Copyright (C) Shailabh Nagar, IBM Corp. 2006 5 + * (C) Balbir Singh, IBM Corp. 2006 6 + * (C) Jay Lan, SGI, 2006 7 + * 8 + * This program is free software; you can redistribute it and/or modify it 9 + * under the terms of version 2.1 of the GNU Lesser General Public License 10 + * as published by the Free Software Foundation. 11 + * 12 + * This program is distributed in the hope that it would be useful, but 13 + * WITHOUT ANY WARRANTY; without even the implied warranty of 14 + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 15 + */ 16 + 17 + #ifndef _LINUX_TASKSTATS_H 18 + #define _LINUX_TASKSTATS_H 19 + 20 + #include <linux/types.h> 21 + #include <linux/time_types.h> 22 + 23 + /* Format for per-task data returned to userland when 24 + * - a task exits 25 + * - listener requests stats for a task 26 + * 27 + * The struct is versioned. Newer versions should only add fields to 28 + * the bottom of the struct to maintain backward compatibility. 29 + * 30 + * 31 + * To add new fields 32 + * a) bump up TASKSTATS_VERSION 33 + * b) add comment indicating new version number at end of struct 34 + * c) add new fields after version comment; maintain 64-bit alignment 35 + */ 36 + 37 + 38 + #define TASKSTATS_VERSION 17 39 + #define TS_COMM_LEN 32 /* should be >= TASK_COMM_LEN 40 + * in linux/sched.h */ 41 + 42 + struct taskstats { 43 + 44 + /* The version number of this struct. This field is always set to 45 + * TAKSTATS_VERSION, which is defined in <linux/taskstats.h>. 46 + * Each time the struct is changed, the value should be incremented. 47 + */ 48 + __u16 version; 49 + __u32 ac_exitcode; /* Exit status */ 50 + 51 + /* The accounting flags of a task as defined in <linux/acct.h> 52 + * Defined values are AFORK, ASU, ACOMPAT, ACORE, AXSIG, and AGROUP. 53 + * (AGROUP since version 12). 54 + */ 55 + __u8 ac_flag; /* Record flags */ 56 + __u8 ac_nice; /* task_nice */ 57 + 58 + /* Delay accounting fields start 59 + * 60 + * All values, until comment "Delay accounting fields end" are 61 + * available only if delay accounting is enabled, even though the last 62 + * few fields are not delays 63 + * 64 + * xxx_count is the number of delay values recorded 65 + * xxx_delay_total is the corresponding cumulative delay in nanoseconds 66 + * 67 + * xxx_delay_total wraps around to zero on overflow 68 + * xxx_count incremented regardless of overflow 69 + */ 70 + 71 + /* Delay waiting for cpu, while runnable 72 + * count, delay_total NOT updated atomically 73 + */ 74 + __u64 cpu_count __attribute__((aligned(8))); 75 + __u64 cpu_delay_total; 76 + 77 + /* Following four fields atomically updated using task->delays->lock */ 78 + 79 + /* Delay waiting for synchronous block I/O to complete 80 + * does not account for delays in I/O submission 81 + */ 82 + __u64 blkio_count; 83 + __u64 blkio_delay_total; 84 + 85 + /* Delay waiting for page fault I/O (swap in only) */ 86 + __u64 swapin_count; 87 + __u64 swapin_delay_total; 88 + 89 + /* cpu "wall-clock" running time 90 + * On some architectures, value will adjust for cpu time stolen 91 + * from the kernel in involuntary waits due to virtualization. 92 + * Value is cumulative, in nanoseconds, without a corresponding count 93 + * and wraps around to zero silently on overflow 94 + */ 95 + __u64 cpu_run_real_total; 96 + 97 + /* cpu "virtual" running time 98 + * Uses time intervals seen by the kernel i.e. 
no adjustment 99 + * for kernel's involuntary waits due to virtualization. 100 + * Value is cumulative, in nanoseconds, without a corresponding count 101 + * and wraps around to zero silently on overflow 102 + */ 103 + __u64 cpu_run_virtual_total; 104 + /* Delay accounting fields end */ 105 + /* version 1 ends here */ 106 + 107 + /* Basic Accounting Fields start */ 108 + char ac_comm[TS_COMM_LEN]; /* Command name */ 109 + __u8 ac_sched __attribute__((aligned(8))); 110 + /* Scheduling discipline */ 111 + __u8 ac_pad[3]; 112 + __u32 ac_uid __attribute__((aligned(8))); 113 + /* User ID */ 114 + __u32 ac_gid; /* Group ID */ 115 + __u32 ac_pid; /* Process ID */ 116 + __u32 ac_ppid; /* Parent process ID */ 117 + /* __u32 range means times from 1970 to 2106 */ 118 + __u32 ac_btime; /* Begin time [sec since 1970] */ 119 + __u64 ac_etime __attribute__((aligned(8))); 120 + /* Elapsed time [usec] */ 121 + __u64 ac_utime; /* User CPU time [usec] */ 122 + __u64 ac_stime; /* SYstem CPU time [usec] */ 123 + __u64 ac_minflt; /* Minor Page Fault Count */ 124 + __u64 ac_majflt; /* Major Page Fault Count */ 125 + /* Basic Accounting Fields end */ 126 + 127 + /* Extended accounting fields start */ 128 + /* Accumulated RSS usage in duration of a task, in MBytes-usecs. 129 + * The current rss usage is added to this counter every time 130 + * a tick is charged to a task's system time. So, at the end we 131 + * will have memory usage multiplied by system time. Thus an 132 + * average usage per system time unit can be calculated. 133 + */ 134 + __u64 coremem; /* accumulated RSS usage in MB-usec */ 135 + /* Accumulated virtual memory usage in duration of a task. 136 + * Same as acct_rss_mem1 above except that we keep track of VM usage. 137 + */ 138 + __u64 virtmem; /* accumulated VM usage in MB-usec */ 139 + 140 + /* High watermark of RSS and virtual memory usage in duration of 141 + * a task, in KBytes. 142 + */ 143 + __u64 hiwater_rss; /* High-watermark of RSS usage, in KB */ 144 + __u64 hiwater_vm; /* High-water VM usage, in KB */ 145 + 146 + /* The following four fields are I/O statistics of a task. */ 147 + __u64 read_char; /* bytes read */ 148 + __u64 write_char; /* bytes written */ 149 + __u64 read_syscalls; /* read syscalls */ 150 + __u64 write_syscalls; /* write syscalls */ 151 + /* Extended accounting fields end */ 152 + 153 + #define TASKSTATS_HAS_IO_ACCOUNTING 154 + /* Per-task storage I/O accounting starts */ 155 + __u64 read_bytes; /* bytes of read I/O */ 156 + __u64 write_bytes; /* bytes of write I/O */ 157 + __u64 cancelled_write_bytes; /* bytes of cancelled write I/O */ 158 + 159 + __u64 nvcsw; /* voluntary_ctxt_switches */ 160 + __u64 nivcsw; /* nonvoluntary_ctxt_switches */ 161 + 162 + /* time accounting for SMT machines */ 163 + __u64 ac_utimescaled; /* utime scaled on frequency etc */ 164 + __u64 ac_stimescaled; /* stime scaled on frequency etc */ 165 + __u64 cpu_scaled_run_real_total; /* scaled cpu_run_real_total */ 166 + 167 + /* Delay waiting for memory reclaim */ 168 + __u64 freepages_count; 169 + __u64 freepages_delay_total; 170 + 171 + 172 + /* Delay waiting for thrashing page */ 173 + __u64 thrashing_count; 174 + __u64 thrashing_delay_total; 175 + 176 + /* v10: 64-bit btime to avoid overflow */ 177 + __u64 ac_btime64; /* 64-bit begin time */ 178 + 179 + /* v11: Delay waiting for memory compact */ 180 + __u64 compact_count; 181 + __u64 compact_delay_total; 182 + 183 + /* v12 begin */ 184 + __u32 ac_tgid; /* thread group ID */ 185 + /* Thread group walltime up to now. 
This is total process walltime if 186 + * AGROUP flag is set. 187 + */ 188 + __u64 ac_tgetime __attribute__((aligned(8))); 189 + /* Lightweight information to identify process binary files. 190 + * This leaves userspace to match this to a file system path, using 191 + * MAJOR() and MINOR() macros to identify a device and mount point, 192 + * the inode to identify the executable file. This is /proc/self/exe 193 + * at the end, so matching the most recent exec(). Values are zero 194 + * for kernel threads. 195 + */ 196 + __u64 ac_exe_dev; /* program binary device ID */ 197 + __u64 ac_exe_inode; /* program binary inode number */ 198 + /* v12 end */ 199 + 200 + /* v13: Delay waiting for write-protect copy */ 201 + __u64 wpcopy_count; 202 + __u64 wpcopy_delay_total; 203 + 204 + /* v14: Delay waiting for IRQ/SOFTIRQ */ 205 + __u64 irq_count; 206 + __u64 irq_delay_total; 207 + 208 + /* v15: add Delay max and Delay min */ 209 + 210 + /* v16: move Delay max and Delay min to the end of taskstat */ 211 + __u64 cpu_delay_max; 212 + __u64 cpu_delay_min; 213 + 214 + __u64 blkio_delay_max; 215 + __u64 blkio_delay_min; 216 + 217 + __u64 swapin_delay_max; 218 + __u64 swapin_delay_min; 219 + 220 + __u64 freepages_delay_max; 221 + __u64 freepages_delay_min; 222 + 223 + __u64 thrashing_delay_max; 224 + __u64 thrashing_delay_min; 225 + 226 + __u64 compact_delay_max; 227 + __u64 compact_delay_min; 228 + 229 + __u64 wpcopy_delay_max; 230 + __u64 wpcopy_delay_min; 231 + 232 + __u64 irq_delay_max; 233 + __u64 irq_delay_min; 234 + 235 + /*v17: delay max timestamp record*/ 236 + struct __kernel_timespec cpu_delay_max_ts; 237 + struct __kernel_timespec blkio_delay_max_ts; 238 + struct __kernel_timespec swapin_delay_max_ts; 239 + struct __kernel_timespec freepages_delay_max_ts; 240 + struct __kernel_timespec thrashing_delay_max_ts; 241 + struct __kernel_timespec compact_delay_max_ts; 242 + struct __kernel_timespec wpcopy_delay_max_ts; 243 + struct __kernel_timespec irq_delay_max_ts; 244 + }; 245 + 246 + 247 + /* 248 + * Commands sent from userspace 249 + * Not versioned. New commands should only be inserted at the enum's end 250 + * prior to __TASKSTATS_CMD_MAX 251 + */ 252 + 253 + enum { 254 + TASKSTATS_CMD_UNSPEC = 0, /* Reserved */ 255 + TASKSTATS_CMD_GET, /* user->kernel request/get-response */ 256 + TASKSTATS_CMD_NEW, /* kernel->user event */ 257 + __TASKSTATS_CMD_MAX, 258 + }; 259 + 260 + #define TASKSTATS_CMD_MAX (__TASKSTATS_CMD_MAX - 1) 261 + 262 + enum { 263 + TASKSTATS_TYPE_UNSPEC = 0, /* Reserved */ 264 + TASKSTATS_TYPE_PID, /* Process id */ 265 + TASKSTATS_TYPE_TGID, /* Thread group id */ 266 + TASKSTATS_TYPE_STATS, /* taskstats structure */ 267 + TASKSTATS_TYPE_AGGR_PID, /* contains pid + stats */ 268 + TASKSTATS_TYPE_AGGR_TGID, /* contains tgid + stats */ 269 + TASKSTATS_TYPE_NULL, /* contains nothing */ 270 + __TASKSTATS_TYPE_MAX, 271 + }; 272 + 273 + #define TASKSTATS_TYPE_MAX (__TASKSTATS_TYPE_MAX - 1) 274 + 275 + enum { 276 + TASKSTATS_CMD_ATTR_UNSPEC = 0, 277 + TASKSTATS_CMD_ATTR_PID, 278 + TASKSTATS_CMD_ATTR_TGID, 279 + TASKSTATS_CMD_ATTR_REGISTER_CPUMASK, 280 + TASKSTATS_CMD_ATTR_DEREGISTER_CPUMASK, 281 + __TASKSTATS_CMD_ATTR_MAX, 282 + }; 283 + 284 + #define TASKSTATS_CMD_ATTR_MAX (__TASKSTATS_CMD_ATTR_MAX - 1) 285 + 286 + /* NETLINK_GENERIC related info */ 287 + 288 + #define TASKSTATS_GENL_NAME "TASKSTATS" 289 + #define TASKSTATS_GENL_VERSION 0x1 290 + 291 + #endif /* _LINUX_TASKSTATS_H */
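The struct only ever grows at the bottom and carries its version up front, so consumers must gate access to newer fields on ts->version, which is why getdelays and procacct above now print it. A hedged sketch (hypothetical helper name) using only fields visible in this header:

    #include <stdio.h>
    #include <linux/taskstats.h>

    /* Print CPU delay stats, touching v16/v17 fields only when present. */
    static void print_cpu_delays(const struct taskstats *ts)
    {
            printf("cpu delay total: %llu ns\n",
                   (unsigned long long)ts->cpu_delay_total);
            if (ts->version >= 16)  /* delay max/min live at this offset since v16 */
                    printf("cpu delay max:   %llu ns\n",
                           (unsigned long long)ts->cpu_delay_max);
            if (ts->version >= 17)  /* v17 added the max timestamps */
                    printf("max observed at: %lld s\n",
                           (long long)ts->cpu_delay_max_ts.tv_sec);
    }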
+2 -2
tools/testing/selftests/breakpoints/Makefile
··· 1 1 # SPDX-License-Identifier: GPL-2.0 2 2 # Taken from perf makefile 3 - uname_M := $(shell uname -m 2>/dev/null || echo not) 4 - ARCH ?= $(shell echo $(uname_M) | sed -e s/i.86/x86/ -e s/x86_64/x86/) 3 + ARCH ?= $(shell uname -m 2>/dev/null || echo not) 4 + override ARCH := $(shell echo $(ARCH) | sed -e s/i.86/x86/ -e s/x86_64/x86/) 5 5 6 6 TEST_GEN_PROGS := step_after_suspend_test 7 7
+108 -50
tools/testing/selftests/fchmodat2/fchmodat2_test.c
···
 #include "kselftest.h"
 
+struct testdir {
+	char *dirname;
+	int dfd;
+};
+
 int sys_fchmodat2(int dfd, const char *filename, mode_t mode, int flags)
 {
 	int ret = syscall(__NR_fchmodat2, dfd, filename, mode, flags);
···
 	return ret >= 0 ? ret : -errno;
 }
 
-int setup_testdir(void)
+static void setup_testdir(struct testdir *testdir)
 {
-	int dfd, ret;
+	int ret, dfd;
 	char dirname[] = "/tmp/ksft-fchmodat2.XXXXXX";
 
 	/* Make the top-level directory. */
···
 		ksft_exit_fail_msg("%s: failed to create tmpdir\n", __func__);
 
 	dfd = open(dirname, O_PATH | O_DIRECTORY);
-	if (dfd < 0)
-		ksft_exit_fail_msg("%s: failed to open tmpdir\n", __func__);
+	if (dfd < 0) {
+		ksft_perror("failed to open tmpdir");
+		goto err;
+	}
 
 	ret = openat(dfd, "regfile", O_CREAT | O_WRONLY | O_TRUNC, 0644);
-	if (ret < 0)
-		ksft_exit_fail_msg("%s: failed to create file in tmpdir\n",
-				   __func__);
+	if (ret < 0) {
+		ksft_perror("failed to create file in tmpdir");
+		goto err;
+	}
 	close(ret);
 
 	ret = symlinkat("regfile", dfd, "symlink");
-	if (ret < 0)
-		ksft_exit_fail_msg("%s: failed to create symlink in tmpdir\n",
-				   __func__);
+	if (ret < 0) {
+		ksft_perror("symlinkat() failed");
+		goto err_regfile;
+	}
 
-	return dfd;
+	testdir->dirname = strdup(dirname);
+	if (!testdir->dirname) {
+		ksft_perror("Out of memory");
+		goto err_symlink;
+	}
+	testdir->dfd = dfd;
+
+	return;
+
+err_symlink:
+	unlinkat(testdir->dfd, "symlink", 0);
+err_regfile:
+	unlinkat(testdir->dfd, "regfile", 0);
+err:
+	unlink(dirname);
+	ksft_exit_fail();
+}
+
+static void cleanup_testdir(struct testdir *testdir)
+{
+	unlinkat(testdir->dfd, "regfile", 0);
+	unlinkat(testdir->dfd, "symlink", 0);
+	rmdir(testdir->dirname);
+	free(testdir->dirname);
 }
 
 int expect_mode(int dfd, const char *filename, mode_t expect_mode)
···
 	struct stat st;
 	int ret = fstatat(dfd, filename, &st, AT_SYMLINK_NOFOLLOW);
 
-	if (ret)
-		ksft_exit_fail_msg("%s: %s: fstatat failed\n",
-				   __func__, filename);
+	if (ret) {
+		ksft_perror("fstatat() failed\n");
+		return 0;
+	}
 
 	return (st.st_mode == expect_mode);
 }
 
 void test_regfile(void)
 {
-	int dfd, ret;
+	struct testdir testdir;
+	int ret;
 
-	dfd = setup_testdir();
+	setup_testdir(&testdir);
 
-	ret = sys_fchmodat2(dfd, "regfile", 0640, 0);
+	ret = sys_fchmodat2(testdir.dfd, "regfile", 0640, 0);
 
-	if (ret < 0)
-		ksft_exit_fail_msg("%s: fchmodat2(noflag) failed\n", __func__);
+	if (ret < 0) {
+		ksft_perror("fchmodat2(noflag) failed");
+		goto out;
+	}
 
-	if (!expect_mode(dfd, "regfile", 0100640))
-		ksft_exit_fail_msg("%s: wrong file mode bits after fchmodat2\n",
+	if (!expect_mode(testdir.dfd, "regfile", 0100640)) {
+		ksft_print_msg("%s: wrong file mode bits after fchmodat2\n",
 			       __func__);
+		ret = 1;
+		goto out;
+	}
 
-	ret = sys_fchmodat2(dfd, "regfile", 0600, AT_SYMLINK_NOFOLLOW);
+	ret = sys_fchmodat2(testdir.dfd, "regfile", 0600, AT_SYMLINK_NOFOLLOW);
 
-	if (ret < 0)
-		ksft_exit_fail_msg("%s: fchmodat2(AT_SYMLINK_NOFOLLOW) failed\n",
-				   __func__);
+	if (ret < 0) {
+		ksft_perror("fchmodat2(AT_SYMLINK_NOFOLLOW) failed");
+		goto out;
+	}
 
-	if (!expect_mode(dfd, "regfile", 0100600))
-		ksft_exit_fail_msg("%s: wrong file mode bits after fchmodat2 with nofollow\n",
-				   __func__);
+	if (!expect_mode(testdir.dfd, "regfile", 0100600)) {
+		ksft_print_msg("%s: wrong file mode bits after fchmodat2 with nofollow\n",
+			       __func__);
+		ret = 1;
+	}
 
-	ksft_test_result_pass("fchmodat2(regfile)\n");
+out:
+	ksft_test_result(ret == 0, "fchmodat2(regfile)\n");
+	cleanup_testdir(&testdir);
 }
 
 void test_symlink(void)
 {
-	int dfd, ret;
+	struct testdir testdir;
+	int ret;
 
-	dfd = setup_testdir();
+	setup_testdir(&testdir);
 
-	ret = sys_fchmodat2(dfd, "symlink", 0640, 0);
+	ret = sys_fchmodat2(testdir.dfd, "symlink", 0640, 0);
 
-	if (ret < 0)
-		ksft_exit_fail_msg("%s: fchmodat2(noflag) failed\n", __func__);
+	if (ret < 0) {
+		ksft_perror("fchmodat2(noflag) failed");
+		goto err;
+	}
 
-	if (!expect_mode(dfd, "regfile", 0100640))
-		ksft_exit_fail_msg("%s: wrong file mode bits after fchmodat2\n",
-				   __func__);
+	if (!expect_mode(testdir.dfd, "regfile", 0100640)) {
+		ksft_print_msg("%s: wrong file mode bits after fchmodat2\n",
+			       __func__);
+		goto err;
+	}
 
-	if (!expect_mode(dfd, "symlink", 0120777))
-		ksft_exit_fail_msg("%s: wrong symlink mode bits after fchmodat2\n",
-				   __func__);
+	if (!expect_mode(testdir.dfd, "symlink", 0120777)) {
+		ksft_print_msg("%s: wrong symlink mode bits after fchmodat2\n",
+			       __func__);
+		goto err;
+	}
 
-	ret = sys_fchmodat2(dfd, "symlink", 0600, AT_SYMLINK_NOFOLLOW);
+	ret = sys_fchmodat2(testdir.dfd, "symlink", 0600, AT_SYMLINK_NOFOLLOW);
 
 	/*
 	 * On certain filesystems (xfs or btrfs), chmod operation fails. So we
···
 	 *
 	 * https://sourceware.org/legacy-ml/libc-alpha/2020-02/msg00467.html
 	 */
-	if (ret == 0 && !expect_mode(dfd, "symlink", 0120600))
-		ksft_exit_fail_msg("%s: wrong symlink mode bits after fchmodat2 with nofollow\n",
+	if (ret == 0 && !expect_mode(testdir.dfd, "symlink", 0120600)) {
+		ksft_print_msg("%s: wrong symlink mode bits after fchmodat2 with nofollow\n",
 			       __func__);
+		ret = 1;
+		goto err;
+	}
 
-	if (!expect_mode(dfd, "regfile", 0100640))
-		ksft_exit_fail_msg("%s: wrong file mode bits after fchmodat2 with nofollow\n",
-				   __func__);
+	if (!expect_mode(testdir.dfd, "regfile", 0100640)) {
+		ksft_print_msg("%s: wrong file mode bits after fchmodat2 with nofollow\n",
+			       __func__);
+	}
 
 	if (ret != 0)
 		ksft_test_result_skip("fchmodat2(symlink)\n");
 	else
 		ksft_test_result_pass("fchmodat2(symlink)\n");
+	cleanup_testdir(&testdir);
+	return;
+
+err:
+	ksft_test_result_fail("fchmodat2(symlink)\n");
+	cleanup_testdir(&testdir);
 }
 
 #define NUM_TESTS 2
···
 	test_regfile();
 	test_symlink();
 
-	if (ksft_get_fail_cnt() + ksft_get_error_cnt() > 0)
-		ksft_exit_fail();
-	else
-		ksft_exit_pass();
+	ksft_finished();
 }
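The expected-mode constants in this test (0100640, 0120777, and so on) are full st_mode values: the file-type bits or'ed with the permission bits. As a minimal standalone check of that decomposition, using only the standard <sys/stat.h> macros (an illustrative sketch, not part of the patch):

/*
 * Sketch: how the octal st_mode constants checked by expect_mode() decompose.
 * On Linux, S_IFREG is 0100000 and S_IFLNK is 0120000, so a regular file
 * with mode 0640 stats as 0100640 and a symlink (always 0777) as 0120777.
 */
#include <assert.h>
#include <sys/stat.h>

int main(void)
{
	assert((S_IFREG | 0640) == 0100640);	/* rw-r----- regular file */
	assert((S_IFLNK | 0777) == 0120777);	/* rwxrwxrwx symlink */
	return 0;
}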
+4 -4
tools/testing/selftests/ipc/Makefile
···
 # SPDX-License-Identifier: GPL-2.0
-uname_M := $(shell uname -m 2>/dev/null || echo not)
-ARCH ?= $(shell echo $(uname_M) | sed -e s/i.86/i386/)
+ARCH ?= $(shell uname -m 2>/dev/null || echo not)
+override ARCH := $(shell echo $(ARCH) | sed -e s/i.86/i386/)
 ifeq ($(ARCH),i386)
-	ARCH := x86
+	override ARCH := x86
 	CFLAGS := -DCONFIG_X86_32 -D__i386__
 endif
 ifeq ($(ARCH),x86_64)
-	ARCH := x86
+	override ARCH := x86
 	CFLAGS := -DCONFIG_X86_64 -D__x86_64__
 endif
+3
tools/testing/selftests/ipc/msgque.c
···
 		ret = msgrcv(msgque->msq_id, &msgque->messages[i].mtype,
 			     MAX_MSG_SIZE, i, IPC_NOWAIT | MSG_COPY);
 		if (ret < 0) {
+			if (errno == ENOSYS)
+				ksft_exit_skip("MSG_COPY not supported\n");
+
 			ksft_test_result_fail("Failed to copy IPC message: %m (%d)\n", errno);
 			return -errno;
 		}
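MSG_COPY is only implemented when the kernel is built with CONFIG_CHECKPOINT_RESTORE; on kernels without it, msgrcv() fails with ENOSYS, which is what the hunk above turns into a skip instead of a failure. A standalone probe for the same condition could look like the following sketch (illustrative, not part of the patch; the throwaway queue and buffer are invented for the example):

/*
 * Sketch: probe whether msgrcv(..., MSG_COPY) is supported, i.e. whether
 * the kernel was built with CONFIG_CHECKPOINT_RESTORE. On an empty queue
 * a supported MSG_COPY fails with ENOMSG; an unsupported one with ENOSYS.
 */
#include <errno.h>
#include <stdio.h>
#include <sys/ipc.h>
#include <sys/msg.h>

#ifndef MSG_COPY
#define MSG_COPY 040000	/* copy (do not remove) the message at the index */
#endif

int main(void)
{
	struct { long mtype; char mtext[64]; } msg;
	int id = msgget(IPC_PRIVATE, IPC_CREAT | 0600);

	if (id < 0)
		return 1;
	if (msgrcv(id, &msg, sizeof(msg.mtext), 0, IPC_NOWAIT | MSG_COPY) < 0 &&
	    errno == ENOSYS)
		printf("MSG_COPY not supported by this kernel\n");
	msgctl(id, IPC_RMID, NULL);	/* clean up the throwaway queue */
	return 0;
}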
+2 -2
tools/testing/selftests/prctl/Makefile
···
 # SPDX-License-Identifier: GPL-2.0
 ifndef CROSS_COMPILE
-uname_M := $(shell uname -m 2>/dev/null || echo not)
-ARCH ?= $(shell echo $(uname_M) | sed -e s/i.86/x86/ -e s/x86_64/x86/)
+ARCH ?= $(shell uname -m 2>/dev/null || echo not)
+override ARCH := $(shell echo $(ARCH) | sed -e s/i.86/x86/ -e s/x86_64/x86/)
 
 ifeq ($(ARCH),x86)
 TEST_PROGS := disable-tsc-ctxt-sw-stress-test disable-tsc-on-off-stress-test \
+2 -2
tools/testing/selftests/sparc64/Makefile
···
 # SPDX-License-Identifier: GPL-2.0
-uname_M := $(shell uname -m 2>/dev/null || echo not)
-ARCH ?= $(shell echo $(uname_M) | sed -e s/x86_64/x86/)
+ARCH ?= $(shell uname -m 2>/dev/null || echo not)
+override ARCH := $(shell echo $(ARCH) | sed -e s/x86_64/x86/)
 
 ifneq ($(ARCH),sparc64)
 nothing:
+2 -2
tools/testing/selftests/thermal/intel/power_floor/Makefile
···
 # SPDX-License-Identifier: GPL-2.0
 ifndef CROSS_COMPILE
-uname_M := $(shell uname -m 2>/dev/null || echo not)
-ARCH ?= $(shell echo $(uname_M) | sed -e s/i.86/x86/ -e s/x86_64/x86/)
+ARCH ?= $(shell uname -m 2>/dev/null || echo not)
+override ARCH := $(shell echo $(ARCH) | sed -e s/i.86/x86/ -e s/x86_64/x86/)
 
 ifeq ($(ARCH),x86)
 TEST_GEN_PROGS := power_floor_test
+2 -2
tools/testing/selftests/thermal/intel/workload_hint/Makefile
···
 # SPDX-License-Identifier: GPL-2.0
 ifndef CROSS_COMPILE
-uname_M := $(shell uname -m 2>/dev/null || echo not)
-ARCH ?= $(shell echo $(uname_M) | sed -e s/i.86/x86/ -e s/x86_64/x86/)
+ARCH ?= $(shell uname -m 2>/dev/null || echo not)
+override ARCH := $(shell echo $(ARCH) | sed -e s/i.86/x86/ -e s/x86_64/x86/)
 
 ifeq ($(ARCH),x86)
 TEST_GEN_PROGS := workload_hint_test
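All of the Makefile hunks above share one pattern: "ARCH ?=" only assigns when ARCH is unset, and a plain "ARCH := x86" inside the ifeq block is silently ignored whenever ARCH was passed on the make command line, because command-line definitions take precedence over ordinary file assignments. The "override" directive is what lets the normalization take effect in that case. A toy Makefile demonstrating the difference (a hypothetical sketch, not one of the files in this series):

# Run as "make ARCH=i686" to see the difference: without "override",
# the sed normalization below is silently ignored (the command-line
# value wins) and ARCH stays "i686"; with it, ARCH becomes "x86".
ARCH ?= $(shell uname -m 2>/dev/null || echo not)
override ARCH := $(shell echo $(ARCH) | sed -e s/i.86/x86/ -e s/x86_64/x86/)

default:
	@echo ARCH=$(ARCH)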