bpf, arm64: Use ORR-based MOV for general-purpose registers

The A64_MOV macro unconditionally uses ADD Rd, Rn, #0 to implement
register moves. While functionally correct, this is not the canonical
encoding when both operands are general-purpose registers.

On AArch64, a register-to-register MOV is an alias of one of two
instructions, depending on the operand registers:
- MOV <Xd|SP>, <Xn|SP> → ADD <Xd|SP>, <Xn|SP>, #0
- MOV <Xd>, <Xn> → ORR <Xd>, XZR, <Xn>

The ADD form is required when the stack pointer is involved (as ORR
does not accept SP), while the ORR form is the preferred encoding for
general-purpose registers.

The ORR encoding is also measurably faster on modern microarchitectures.
A microbenchmark [1] comparing dependent chains of MOV (ORR) vs ADD #0
on an ARM Neoverse-V2 (72-core, 3.4 GHz) shows:

=== mov (ORR Xd, XZR, Xn) ===
run1 cycles/op=0.749859456
run2 cycles/op=0.749991250
run3 cycles/op=0.749601847
avg cycles/op=0.749817518

=== add0 (ADD Xd, Xn, #0) ===
run1 cycles/op=1.004777689
run2 cycles/op=1.004558266
run3 cycles/op=1.004806559
avg cycles/op=1.004714171

The ORR form completes in ~0.75 cycles/op vs ~1.00 cycles/op for ADD #0,
i.e. ~25% fewer cycles per operation. This is likely because the CPU's
register renaming hardware can eliminate ORR-based moves, while ADD #0
must go through the ALU pipeline.

Update A64_MOV to select the appropriate encoding at JIT time:
use ADD when either register is A64_SP, and ORR (via
aarch64_insn_gen_move_reg()) otherwise.

Update verifier_private_stack selftests to expect "mov x7, x0" instead
of "add x7, x0, #0x0" in the JITed instruction checks, matching the
new ORR-based encoding.

[1] https://github.com/puranjaymohan/scripts/blob/main/arm64/bench/run_mov_vs_add0.sh

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Xu Kuohai <xukuohai@huawei.com>
Link: https://lore.kernel.org/r/20260225134339.2723288-1-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Authored by Puranjay Mohan and committed by Alexei Starovoitov 3 months ago
b1d6bd54 b0cc2e06

2 changed files, +7 -5

arch/arm64/net/bpf_jit.h (+3 -1)

@@ -187,7 +187,9 @@
 /* Rn - imm12; set condition flags */
 #define A64_CMP_I(sf, Rn, imm12) A64_SUBS_I(sf, A64_ZR, Rn, imm12)
 /* Rd = Rn */
-#define A64_MOV(sf, Rd, Rn) A64_ADD_I(sf, Rd, Rn, 0)
+#define A64_MOV(sf, Rd, Rn) \
+	(((Rd) == A64_SP || (Rn) == A64_SP) ? A64_ADD_I(sf, Rd, Rn, 0) : \
+	aarch64_insn_gen_move_reg(Rd, Rn, A64_VARIANT(sf)))
 
 /* Bitfield move */
 #define A64_BITFIELD(sf, Rd, Rn, immr, imms, type) \

tools/testing/selftests/bpf/progs/verifier_private_stack.c (+4 -4)

@@ -170,11 +170,11 @@
 __jited(" add x27, x27, x10")
 __jited(" add x25, x27, {{.*}}")
 __jited(" bl 0x{{.*}}")
-__jited(" add x7, x0, #0x0")
+__jited(" mov x7, x0")
 __jited(" mov x0, #0x2a")
 __jited(" str x0, [x27]")
 __jited(" bl 0x{{.*}}")
-__jited(" add x7, x0, #0x0")
+__jited(" mov x7, x0")
 __jited(" mov x7, #0x0")
 __jited(" ldp x25, x27, [sp], {{.*}}")
 __naked void private_stack_callback(void)
@@ -220,7 +220,7 @@
 __jited(" str x0, [x27]")
 __jited(" mov x0, #0x0")
 __jited(" bl 0x{{.*}}")
-__jited(" add x7, x0, #0x0")
+__jited(" mov x7, x0")
 __jited(" ldp x27, x28, [sp], #0x10")
 int private_stack_exception_main_prog(void)
 {
@@ -258,7 +258,7 @@
 __jited(" mov x0, #0x2a")
 __jited(" str x0, [x27]")
 __jited(" bl 0x{{.*}}")
-__jited(" add x7, x0, #0x0")
+__jited(" mov x7, x0")
 __jited(" ldp x27, x28, [sp], #0x10")
 int private_stack_exception_sub_prog(void)
 {
