Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Merge branch 'bpf-support-private-stack-for-bpf-progs'

Yonghong Song says:

====================
bpf: Support private stack for bpf progs

The main motivation for private stack comes from nested scheduler in
sched-ext from Tejun. The basic idea is that
- each cgroup will have its own associated bpf program,
- the bpf program of a parent cgroup will call the bpf programs
  in its immediate child cgroups.

Let us say we have the following cgroup hierarchy:
  root_cg (prog0):
                    cg1 (prog1):
                                   cg11 (prog11):
                                                    cg111 (prog111)
                                                    cg112 (prog112)
                                   cg12 (prog12):
                                                    cg121 (prog121)
                                                    cg122 (prog122)
                    cg2 (prog2):
                                   cg21 (prog21)
                                   cg22 (prog22)
                                   cg23 (prog23)

In the above example, prog0 will call a kfunc which will call prog1 and
prog2 to get sched info for cg1 and cg2 and then the information is
summarized and sent back to prog0. Similarly, prog11 and prog12 will be
invoked in the kfunc and the result will be summarized and sent back to
prog1, etc. The following illustrates a possible call sequence:
... -> bpf prog A -> kfunc -> ops.<callback_fn> (bpf prog B) ...

Currently, for each thread, the x86 kernel allocates a 16KB stack. Each
bpf program (including its subprograms) has a maximum stack size of 512B
to avoid potential stack overflow. Nested bpf programs further increase
the risk of stack overflow. To avoid potential stack overflow caused by
bpf programs, this patch set adds support for private stack, with bpf
program stack space allocated at jit time. Using private stack for bpf
progs can reduce or avoid potential kernel stack overflow.

Currently private stack is applied to tracing programs like kprobe/uprobe,
perf_event, tracepoint, raw tracepoint and struct_ops progs.
Tracing progs enable private stack if any subprog stack size is no less
than a threshold (currently 64B). Struct-ops progs enable private stack
based on particular struct op implementation which can enable private
stack before verification at per-insn level. Struct-ops progs have
the same treatment as tracing progs w.r.t when to enable private stack.

For all these progs, the kernel will do a recursion check (no nesting
per prog per cpu) to ensure that the private stack won't be overwritten.
The bpf_prog_aux struct has a callback func recursion_detected() which
can be implemented by kernel subsystem to synchronously detect recursion,
report error, etc.

Only x86_64 arch supports private stack now. It can be extended to other
archs later. Please see each individual patch for details.

Change logs:
v11 -> v12:
- v11 link: https://lore.kernel.org/bpf/20241109025312.148539-1-yonghong.song@linux.dev/
- Fix a bug where allocated percpu space is less than actual private stack.
- Add guard memory (before and after actual prog stack) to detect potential
underflow/overflow.
v10 -> v11:
- v10 link: https://lore.kernel.org/bpf/20241107024138.3355687-1-yonghong.song@linux.dev/
- Use two bool variables, priv_stack_requested (used by struct-ops only) and
jits_use_priv_stack, in order to make code cleaner.
- Set env->prog->aux->jits_use_priv_stack to true if any subprog uses private stack.
This is for struct-ops use case to kick in recursion protection.
v9 -> v10:
- v9 link: https://lore.kernel.org/bpf/20241104193455.3241859-1-yonghong.song@linux.dev/
- Simplify handling of async cbs by making async cb related progs use the normal
kernel stack.
- Do percpu allocation in jit instead of verifier.
v8 -> v9:
- v8 link: https://lore.kernel.org/bpf/20241101030950.2677215-1-yonghong.song@linux.dev/
- Use enum to express priv stack mode.
- Use bits in bpf_subprog_info struct to do subprog recursion check between
main/async and async subprogs.
- Fix potential memory leak.
- Rename recursion detection func from recursion_skipped() to recursion_detected().
v7 -> v8:
- v7 link: https://lore.kernel.org/bpf/20241029221637.264348-1-yonghong.song@linux.dev/
- Add recursion_skipped() callback func to bpf_prog->aux structure such that if
a recursion miss happened and bpf_prog->aux->recursion_skipped is not NULL, the
callback fn will be called so the subsystem can do proper action based on their
respective design.
v6 -> v7:
- v6 link: https://lore.kernel.org/bpf/20241020191341.2104841-1-yonghong.song@linux.dev/
- Go back to doing private stack allocation per prog instead of per subtree. This can
simplify implementation and avoid verifier complexity.
- Handle potential nested subprog run if async callback exists.
- Use struct_ops->check_member() callback to set whether a particular struct-ops
prog wants private stack or not.
v5 -> v6:
- v5 link: https://lore.kernel.org/bpf/20241017223138.3175885-1-yonghong.song@linux.dev/
- Instead of using (or not using) private stack at struct_ops level,
each prog in struct_ops can decide whether to use private stack or not.
v4 -> v5:
- v4 link: https://lore.kernel.org/bpf/20241010175552.1895980-1-yonghong.song@linux.dev/
- Remove bpf_prog_call() related implementation.
- Allow (opt-in) private stack for sched-ext progs.
v3 -> v4:
- v3 link: https://lore.kernel.org/bpf/20240926234506.1769256-1-yonghong.song@linux.dev/
There is a long discussion in the above v3 link trying to allow private
stack to be used by kernel functions in order to simplify implementation.
But unfortunately we didn't find a workable solution yet, so we return
to the approach where private stack is only used by bpf programs.
- Add bpf_prog_call() kfunc.
v2 -> v3:
- Instead of per-subprog private stack allocation, allocate private
stacks at main prog or callback entry prog. Subprogs not main or callback
progs will increment the inherited stack pointer to be their
frame pointer.
- Private stack allows each prog max stack size to be 512 bytes, instead
of the whole prog hierarchy to be 512 bytes.
- Add some tests.
====================

Link: https://lore.kernel.org/r/20241112163902.2223011-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

+930 -14
+143 -4
arch/x86/net/bpf_jit_comp.c
···
 /* Number of bytes that will be skipped on tailcall */
 #define X86_TAIL_CALL_OFFSET	(12 + ENDBR_INSN_SIZE)
 
+static void push_r9(u8 **pprog)
+{
+	u8 *prog = *pprog;
+
+	EMIT2(0x41, 0x51);   /* push r9 */
+	*pprog = prog;
+}
+
+static void pop_r9(u8 **pprog)
+{
+	u8 *prog = *pprog;
+
+	EMIT2(0x41, 0x59);   /* pop r9 */
+	*pprog = prog;
+}
+
 static void push_r12(u8 **pprog)
 {
 	u8 *prog = *pprog;
···
 	*pprog = prog;
 }
 
+static void emit_priv_frame_ptr(u8 **pprog, void __percpu *priv_frame_ptr)
+{
+	u8 *prog = *pprog;
+
+	/* movabs r9, priv_frame_ptr */
+	emit_mov_imm64(&prog, X86_REG_R9, (__force long) priv_frame_ptr >> 32,
+		       (u32) (__force long) priv_frame_ptr);
+
+#ifdef CONFIG_SMP
+	/* add <r9>, gs:[<off>] */
+	EMIT2(0x65, 0x4c);
+	EMIT3(0x03, 0x0c, 0x25);
+	EMIT((u32)(unsigned long)&this_cpu_off, 4);
+#endif
+
+	*pprog = prog;
+}
+
 #define INSN_SZ_DIFF (((addrs[i] - addrs[i - 1]) - (prog - temp)))
 
 #define __LOAD_TCC_PTR(off)			\
···
 /* mov rax, qword ptr [rbp - rounded_stack_depth - 16] */
 #define LOAD_TAIL_CALL_CNT_PTR(stack)				\
 	__LOAD_TCC_PTR(BPF_TAIL_CALL_CNT_PTR_STACK_OFF(stack))
+
+/* Memory size/value to protect private stack overflow/underflow */
+#define PRIV_STACK_GUARD_SZ    8
+#define PRIV_STACK_GUARD_VAL   0xEB9F12345678eb9fULL
 
 static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image,
 		  int oldproglen, struct jit_context *ctx, bool jmp_padding)
···
 	int insn_cnt = bpf_prog->len;
 	bool seen_exit = false;
 	u8 temp[BPF_MAX_INSN_SIZE + BPF_INSN_SAFETY];
+	void __percpu *priv_frame_ptr = NULL;
 	u64 arena_vm_start, user_vm_start;
+	void __percpu *priv_stack_ptr;
 	int i, excnt = 0;
 	int ilen, proglen = 0;
 	u8 *prog = temp;
+	u32 stack_depth;
 	int err;
+
+	stack_depth = bpf_prog->aux->stack_depth;
+	priv_stack_ptr = bpf_prog->aux->priv_stack_ptr;
+	if (priv_stack_ptr) {
+		priv_frame_ptr = priv_stack_ptr + PRIV_STACK_GUARD_SZ + round_up(stack_depth, 8);
+		stack_depth = 0;
+	}
 
 	arena_vm_start = bpf_arena_get_kern_vm_start(bpf_prog->aux->arena);
 	user_vm_start = bpf_arena_get_user_vm_start(bpf_prog->aux->arena);
 
 	detect_reg_usage(insn, insn_cnt, callee_regs_used);
 
-	emit_prologue(&prog, bpf_prog->aux->stack_depth,
+	emit_prologue(&prog, stack_depth,
 		      bpf_prog_was_classic(bpf_prog), tail_call_reachable,
 		      bpf_is_subprog(bpf_prog), bpf_prog->aux->exception_cb);
 	/* Exception callback will clobber callee regs for its own use, and
···
 		emit_mov_imm64(&prog, X86_REG_R12,
 			       arena_vm_start >> 32, (u32) arena_vm_start);
 
+	if (priv_frame_ptr)
+		emit_priv_frame_ptr(&prog, priv_frame_ptr);
+
 	ilen = prog - temp;
 	if (rw_image)
 		memcpy(rw_image + proglen, temp, ilen);
···
 		u8 jmp_cond;
 		u8 *func;
 		int nops;
+
+		if (priv_frame_ptr) {
+			if (src_reg == BPF_REG_FP)
+				src_reg = X86_REG_R9;
+
+			if (dst_reg == BPF_REG_FP)
+				dst_reg = X86_REG_R9;
+		}
 
 		switch (insn->code) {
 			/* ALU */
···
 
 			func = (u8 *) __bpf_call_base + imm32;
 			if (tail_call_reachable) {
-				LOAD_TAIL_CALL_CNT_PTR(bpf_prog->aux->stack_depth);
+				LOAD_TAIL_CALL_CNT_PTR(stack_depth);
 				ip += 7;
 			}
 			if (!imm32)
 				return -EINVAL;
+			if (priv_frame_ptr) {
+				push_r9(&prog);
+				ip += 2;
+			}
 			ip += x86_call_depth_emit_accounting(&prog, func, ip);
 			if (emit_call(&prog, func, ip))
 				return -EINVAL;
+			if (priv_frame_ptr)
+				pop_r9(&prog);
 			break;
 		}
 
···
 							  &bpf_prog->aux->poke_tab[imm32 - 1],
 							  &prog, image + addrs[i - 1],
 							  callee_regs_used,
-							  bpf_prog->aux->stack_depth,
+							  stack_depth,
 							  ctx);
 			else
 				emit_bpf_tail_call_indirect(bpf_prog,
 							    &prog,
 							    callee_regs_used,
-							    bpf_prog->aux->stack_depth,
+							    stack_depth,
 							    image + addrs[i - 1],
 							    ctx);
 			break;
···
 	return emit_bpf_dispatcher(&prog, 0, num_funcs - 1, funcs, image, buf);
 }
 
+static const char *bpf_get_prog_name(struct bpf_prog *prog)
+{
+	if (prog->aux->ksym.prog)
+		return prog->aux->ksym.name;
+	return prog->aux->name;
+}
+
+static void priv_stack_init_guard(void __percpu *priv_stack_ptr, int alloc_size)
+{
+	int cpu, underflow_idx = (alloc_size - PRIV_STACK_GUARD_SZ) >> 3;
+	u64 *stack_ptr;
+
+	for_each_possible_cpu(cpu) {
+		stack_ptr = per_cpu_ptr(priv_stack_ptr, cpu);
+		stack_ptr[0] = PRIV_STACK_GUARD_VAL;
+		stack_ptr[underflow_idx] = PRIV_STACK_GUARD_VAL;
+	}
+}
+
+static void priv_stack_check_guard(void __percpu *priv_stack_ptr, int alloc_size,
+				   struct bpf_prog *prog)
+{
+	int cpu, underflow_idx = (alloc_size - PRIV_STACK_GUARD_SZ) >> 3;
+	u64 *stack_ptr;
+
+	for_each_possible_cpu(cpu) {
+		stack_ptr = per_cpu_ptr(priv_stack_ptr, cpu);
+		if (stack_ptr[0] != PRIV_STACK_GUARD_VAL ||
+		    stack_ptr[underflow_idx] != PRIV_STACK_GUARD_VAL) {
+			pr_err("BPF private stack overflow/underflow detected for prog %sx\n",
+			       bpf_get_prog_name(prog));
+			break;
+		}
+	}
+}
+
 struct x64_jit_data {
 	struct bpf_binary_header *rw_header;
 	struct bpf_binary_header *header;
···
 	struct bpf_binary_header *rw_header = NULL;
 	struct bpf_binary_header *header = NULL;
 	struct bpf_prog *tmp, *orig_prog = prog;
+	void __percpu *priv_stack_ptr = NULL;
 	struct x64_jit_data *jit_data;
+	int priv_stack_alloc_sz;
 	int proglen, oldproglen = 0;
 	struct jit_context ctx = {};
 	bool tmp_blinded = false;
···
 			goto out;
 		}
 		prog->aux->jit_data = jit_data;
+	}
+	priv_stack_ptr = prog->aux->priv_stack_ptr;
+	if (!priv_stack_ptr && prog->aux->jits_use_priv_stack) {
+		/* Allocate actual private stack size with verifier-calculated
+		 * stack size plus two memory guards to protect overflow and
+		 * underflow.
+		 */
+		priv_stack_alloc_sz = round_up(prog->aux->stack_depth, 8) +
+				      2 * PRIV_STACK_GUARD_SZ;
+		priv_stack_ptr = __alloc_percpu_gfp(priv_stack_alloc_sz, 8, GFP_KERNEL);
+		if (!priv_stack_ptr) {
+			prog = orig_prog;
+			goto out_priv_stack;
+		}
+
+		priv_stack_init_guard(priv_stack_ptr, priv_stack_alloc_sz);
+		prog->aux->priv_stack_ptr = priv_stack_ptr;
 	}
 	addrs = jit_data->addrs;
 	if (addrs) {
···
 			bpf_prog_fill_jited_linfo(prog, addrs + 1);
 out_addrs:
 		kvfree(addrs);
+		if (!image && priv_stack_ptr) {
+			free_percpu(priv_stack_ptr);
+			prog->aux->priv_stack_ptr = NULL;
+		}
+out_priv_stack:
 		kfree(jit_data);
 		prog->aux->jit_data = NULL;
 	}
···
 	if (prog->jited) {
 		struct x64_jit_data *jit_data = prog->aux->jit_data;
 		struct bpf_binary_header *hdr;
+		void __percpu *priv_stack_ptr;
+		int priv_stack_alloc_sz;
 
 		/*
 		 * If we fail the final pass of JIT (from jit_subprogs),
···
 		prog->bpf_func = (void *)prog->bpf_func - cfi_get_offset();
 		hdr = bpf_jit_binary_pack_hdr(prog);
 		bpf_jit_binary_pack_free(hdr, NULL);
+		priv_stack_ptr = prog->aux->priv_stack_ptr;
+		if (priv_stack_ptr) {
+			priv_stack_alloc_sz = round_up(prog->aux->stack_depth, 8) +
+					      2 * PRIV_STACK_GUARD_SZ;
+			priv_stack_check_guard(priv_stack_ptr, priv_stack_alloc_sz, prog);
+			free_percpu(prog->aux->priv_stack_ptr);
+		}
 		WARN_ON_ONCE(!bpf_prog_kallsyms_verify_off(prog));
 	}
 
···
 	 * to walk kernel frames and reach BPF frames in the stack trace.
 	 */
 	return IS_ENABLED(CONFIG_UNWINDER_ORC);
+}
+
+bool bpf_jit_supports_private_stack(void)
+{
+	return true;
 }
 
 void arch_bpf_stack_walk(bool (*consume_fn)(void *cookie, u64 ip, u64 sp, u64 bp), void *cookie)
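The guard layout used by the JIT change above can be sketched in user space. This is a minimal sketch, not the kernel code: a single heap buffer stands in for the per-CPU allocation, and the helper names (round_up8, priv_stack_alloc, priv_frame_offset, priv_stack_guards_intact) are hypothetical. Only the 8-byte guard size, the guard value, and the size and offset formulas are taken from the diff.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_SZ  8                       /* mirrors PRIV_STACK_GUARD_SZ */
#define GUARD_VAL 0xEB9F12345678eb9fULL   /* mirrors PRIV_STACK_GUARD_VAL */

/* Hypothetical stand-in for the kernel's round_up(n, 8). */
static size_t round_up8(size_t n)
{
	return (n + 7) & ~(size_t)7;
}

/* Total size: low guard + rounded stack + high guard. */
static size_t priv_stack_alloc_size(size_t stack_depth)
{
	return round_up8(stack_depth) + 2 * GUARD_SZ;
}

/* Allocate one stack (the kernel allocates one per CPU) and plant both guards. */
static uint64_t *priv_stack_alloc(size_t stack_depth)
{
	size_t sz = priv_stack_alloc_size(stack_depth);
	uint64_t *stack = calloc(1, sz);

	if (!stack)
		return NULL;
	stack[0] = GUARD_VAL;                     /* below the lowest stack slot */
	stack[(sz - GUARD_SZ) >> 3] = GUARD_VAL;  /* above the frame pointer */
	return stack;
}

/* Byte offset of the frame pointer from the start of the allocation. */
static size_t priv_frame_offset(size_t stack_depth)
{
	return GUARD_SZ + round_up8(stack_depth);
}

/* Returns 1 if both guard words still hold GUARD_VAL, 0 otherwise. */
static int priv_stack_guards_intact(const uint64_t *stack, size_t stack_depth)
{
	size_t sz = priv_stack_alloc_size(stack_depth);

	return stack[0] == GUARD_VAL &&
	       stack[(sz - GUARD_SZ) >> 3] == GUARD_VAL;
}
```

For a 200-byte stack this gives a 216-byte allocation with the frame pointer at offset 208. Because the stack grows down from the frame pointer, a write below offset 8 clobbers the first guard word, which is the condition the check at free time is looking for.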
+4
include/linux/bpf.h
···
 	u32 max_rdwr_access;
 	struct btf *attach_btf;
 	const struct bpf_ctx_arg_aux *ctx_arg_info;
+	void __percpu *priv_stack_ptr;
 	struct mutex dst_mutex; /* protects dst_* pointers below, *after* prog becomes visible */
 	struct bpf_prog *dst_prog;
 	struct bpf_trampoline *dst_trampoline;
···
 	bool exception_cb;
 	bool exception_boundary;
 	bool is_extended; /* true if extended by freplace program */
+	bool jits_use_priv_stack;
+	bool priv_stack_requested;
 	u64 prog_array_member_cnt; /* counts how many times as member of prog_array */
 	struct mutex ext_mutex; /* mutex for is_extended and prog_array_member_cnt */
 	struct bpf_arena *arena;
+	void (*recursion_detected)(struct bpf_prog *prog); /* callback if recursion is detected */
 	/* BTF_KIND_FUNC_PROTO for valid attach_btf_id */
 	const struct btf_type *attach_func_proto;
 	/* function name for valid attach_btf_id */
+8
include/linux/bpf_verifier.h
···
 	};
 };
 
+enum priv_stack_mode {
+	PRIV_STACK_UNKNOWN,
+	NO_PRIV_STACK,
+	PRIV_STACK_ADAPTIVE,
+};
+
 struct bpf_subprog_info {
 	/* 'start' has to be the first field otherwise find_subprog() won't work */
 	u32 start; /* insn idx of function entry point */
···
 	/* true if bpf_fastcall stack region is used by functions that can't be inlined */
 	bool keep_fastcall_stack: 1;
 
+	enum priv_stack_mode priv_stack_mode;
 	u8 arg_cnt;
 	struct bpf_subprog_arg_info args[MAX_BPF_FUNC_REG_ARGS];
 };
···
 	case BPF_PROG_TYPE_TRACING:
 		return prog->expected_attach_type != BPF_TRACE_ITER;
 	case BPF_PROG_TYPE_STRUCT_OPS:
+		return prog->aux->jits_use_priv_stack;
 	case BPF_PROG_TYPE_LSM:
 		return false;
 	default:
+1
include/linux/filter.h
···
 bool bpf_jit_supports_ptr_xchg(void);
 bool bpf_jit_supports_arena(void);
 bool bpf_jit_supports_insn(struct bpf_insn *insn, bool in_arena);
+bool bpf_jit_supports_private_stack(void);
 u64 bpf_arch_uaddress_limit(void);
 void arch_bpf_stack_walk(bool (*consume_fn)(void *cookie, u64 ip, u64 sp, u64 bp), void *cookie);
 bool bpf_helper_changes_pkt_data(void *func);
+5
kernel/bpf/core.c
···
 	return false;
 }
 
+bool __weak bpf_jit_supports_private_stack(void)
+{
+	return false;
+}
+
 void __weak arch_bpf_stack_walk(bool (*consume_fn)(void *cookie, u64 ip, u64 sp, u64 bp), void *cookie)
 {
 }
+4
kernel/bpf/trampoline.c
···
 
 	if (unlikely(this_cpu_inc_return(*(prog->active)) != 1)) {
 		bpf_prog_inc_misses_counter(prog);
+		if (prog->aux->recursion_detected)
+			prog->aux->recursion_detected(prog);
 		return 0;
 	}
 	return bpf_prog_start_time();
···
 
 	if (unlikely(this_cpu_inc_return(*(prog->active)) != 1)) {
 		bpf_prog_inc_misses_counter(prog);
+		if (prog->aux->recursion_detected)
+			prog->aux->recursion_detected(prog);
 		return 0;
 	}
 	return bpf_prog_start_time();
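The recursion check in the trampoline hunks above can be modeled with a plain counter. This is a simplified sketch under stated assumptions: a single int replaces the per-CPU `prog->active` counter, and `struct prog`, `prog_enter`, `prog_exit`, `invoke` and `count_recursion` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of a bpf prog with a recursion guard. */
struct prog {
	int active;            /* stand-in for the per-CPU active counter */
	unsigned long misses;  /* stand-in for the misses statistic */
	void (*recursion_detected)(struct prog *p);
};

static int callbacks_seen;

/* Example callback: a subsystem could report an error here instead. */
static void count_recursion(struct prog *p)
{
	(void)p;
	callbacks_seen++;
}

/* Returns 1 if the prog may run, 0 if it is already running (recursion). */
static int prog_enter(struct prog *p)
{
	if (++p->active != 1) {
		p->misses++;
		if (p->recursion_detected)
			p->recursion_detected(p);
		return 0;
	}
	return 1;
}

static void prog_exit(struct prog *p)
{
	p->active--;
}

/* Run the prog; with depth > 0 the prog tries to re-enter itself. */
static int invoke(struct prog *p, int depth)
{
	int ran = 0;

	if (prog_enter(p)) {
		ran = 1;
		if (depth > 0)
			ran += invoke(p, depth - 1);  /* nested attempt is refused */
	}
	prog_exit(p);
	return ran;
}
```

In this model a nested attempt never executes the body, so a private stack that belongs to the outer invocation cannot be overwritten by the inner one; the callback only gives the owning subsystem a chance to react.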
+102 -10
kernel/bpf/verifier.c
···
 
 #define BPF_GLOBAL_PERCPU_MA_MAX_SIZE  512
 
+#define BPF_PRIV_STACK_MIN_SIZE		64
+
 static int acquire_reference_state(struct bpf_verifier_env *env, int insn_idx);
 static int release_reference(struct bpf_verifier_env *env, int ref_obj_id);
 static void invalidate_non_owning_refs(struct bpf_verifier_env *env);
···
 					   strict);
 }
 
+static enum priv_stack_mode bpf_enable_priv_stack(struct bpf_prog *prog)
+{
+	if (!bpf_jit_supports_private_stack())
+		return NO_PRIV_STACK;
+
+	/* bpf_prog_check_recur() checks all prog types that use bpf trampoline
+	 * while kprobe/tp/perf_event/raw_tp don't use trampoline hence checked
+	 * explicitly.
+	 */
+	switch (prog->type) {
+	case BPF_PROG_TYPE_KPROBE:
+	case BPF_PROG_TYPE_TRACEPOINT:
+	case BPF_PROG_TYPE_PERF_EVENT:
+	case BPF_PROG_TYPE_RAW_TRACEPOINT:
+		return PRIV_STACK_ADAPTIVE;
+	case BPF_PROG_TYPE_TRACING:
+	case BPF_PROG_TYPE_LSM:
+	case BPF_PROG_TYPE_STRUCT_OPS:
+		if (prog->aux->priv_stack_requested || bpf_prog_check_recur(prog))
+			return PRIV_STACK_ADAPTIVE;
+		fallthrough;
+	default:
+		break;
+	}
+
+	return NO_PRIV_STACK;
+}
+
 static int round_up_stack_depth(struct bpf_verifier_env *env, int stack_depth)
 {
 	if (env->prog->jit_requested)
···
  * Since recursion is prevented by check_cfg() this algorithm
  * only needs a local stack of MAX_CALL_FRAMES to remember callsites
  */
-static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx)
+static int check_max_stack_depth_subprog(struct bpf_verifier_env *env, int idx,
+					 bool priv_stack_supported)
 {
 	struct bpf_subprog_info *subprog = env->subprog_info;
 	struct bpf_insn *insn = env->prog->insnsi;
-	int depth = 0, frame = 0, i, subprog_end;
+	int depth = 0, frame = 0, i, subprog_end, subprog_depth;
 	bool tail_call_reachable = false;
 	int ret_insn[MAX_CALL_FRAMES];
 	int ret_prog[MAX_CALL_FRAMES];
 	int j;
 
 	i = subprog[idx].start;
+	if (!priv_stack_supported)
+		subprog[idx].priv_stack_mode = NO_PRIV_STACK;
 process_func:
 	/* protect against potential stack overflow that might happen when
 	 * bpf2bpf calls get combined with tailcalls. Limit the caller's stack
···
 			depth);
 		return -EACCES;
 	}
-	depth += round_up_stack_depth(env, subprog[idx].stack_depth);
-	if (depth > MAX_BPF_STACK) {
-		verbose(env, "combined stack size of %d calls is %d. Too large\n",
-			frame + 1, depth);
-		return -EACCES;
+
+	subprog_depth = round_up_stack_depth(env, subprog[idx].stack_depth);
+	if (priv_stack_supported) {
+		/* Request private stack support only if the subprog stack
+		 * depth is no less than BPF_PRIV_STACK_MIN_SIZE. This is to
+		 * avoid jit penalty if the stack usage is small.
+		 */
+		if (subprog[idx].priv_stack_mode == PRIV_STACK_UNKNOWN &&
+		    subprog_depth >= BPF_PRIV_STACK_MIN_SIZE)
+			subprog[idx].priv_stack_mode = PRIV_STACK_ADAPTIVE;
+	}
+
+	if (subprog[idx].priv_stack_mode == PRIV_STACK_ADAPTIVE) {
+		if (subprog_depth > MAX_BPF_STACK) {
+			verbose(env, "stack size of subprog %d is %d. Too large\n",
+				idx, subprog_depth);
+			return -EACCES;
+		}
+	} else {
+		depth += subprog_depth;
+		if (depth > MAX_BPF_STACK) {
+			verbose(env, "combined stack size of %d calls is %d. Too large\n",
+				frame + 1, depth);
+			return -EACCES;
+		}
 	}
 continue_func:
 	subprog_end = subprog[idx + 1].start;
···
 		}
 		i = next_insn;
 		idx = sidx;
+		if (!priv_stack_supported)
+			subprog[idx].priv_stack_mode = NO_PRIV_STACK;
 
 		if (subprog[idx].has_tail_call)
 			tail_call_reachable = true;
···
 	 */
 	if (frame == 0)
 		return 0;
-	depth -= round_up_stack_depth(env, subprog[idx].stack_depth);
+	if (subprog[idx].priv_stack_mode != PRIV_STACK_ADAPTIVE)
+		depth -= round_up_stack_depth(env, subprog[idx].stack_depth);
 	frame--;
 	i = ret_insn[frame];
 	idx = ret_prog[frame];
···
 
 static int check_max_stack_depth(struct bpf_verifier_env *env)
 {
+	enum priv_stack_mode priv_stack_mode = PRIV_STACK_UNKNOWN;
 	struct bpf_subprog_info *si = env->subprog_info;
+	bool priv_stack_supported;
 	int ret;
 
 	for (int i = 0; i < env->subprog_cnt; i++) {
+		if (si[i].has_tail_call) {
+			priv_stack_mode = NO_PRIV_STACK;
+			break;
+		}
+	}
+
+	if (priv_stack_mode == PRIV_STACK_UNKNOWN)
+		priv_stack_mode = bpf_enable_priv_stack(env->prog);
+
+	/* All async_cb subprogs use normal kernel stack. If a particular
+	 * subprog appears in both main prog and async_cb subtree, that
+	 * subprog will use normal kernel stack to avoid potential nesting.
+	 * The reverse subprog traversal ensures when main prog subtree is
+	 * checked, the subprogs appearing in async_cb subtrees are already
+	 * marked as using normal kernel stack, so stack size checking can
+	 * be done properly.
+	 */
+	for (int i = env->subprog_cnt - 1; i >= 0; i--) {
 		if (!i || si[i].is_async_cb) {
-			ret = check_max_stack_depth_subprog(env, i);
+			priv_stack_supported = !i && priv_stack_mode == PRIV_STACK_ADAPTIVE;
+			ret = check_max_stack_depth_subprog(env, i, priv_stack_supported);
 			if (ret < 0)
 				return ret;
 		}
-		continue;
 	}
+
+	for (int i = 0; i < env->subprog_cnt; i++) {
+		if (si[i].priv_stack_mode == PRIV_STACK_ADAPTIVE) {
+			env->prog->aux->jits_use_priv_stack = true;
+			break;
+		}
+	}
+
 	return 0;
 }
 
···
 
 		func[i]->aux->name[0] = 'F';
 		func[i]->aux->stack_depth = env->subprog_info[i].stack_depth;
+		if (env->subprog_info[i].priv_stack_mode == PRIV_STACK_ADAPTIVE)
+			func[i]->aux->jits_use_priv_stack = true;
+
 		func[i]->jit_requested = 1;
 		func[i]->blinding_requested = prog->blinding_requested;
 		func[i]->aux->kfunc_tab = prog->aux->kfunc_tab;
···
 				mname, st_ops->name);
 			return err;
 		}
+	}
+
+	if (prog->aux->priv_stack_requested && !bpf_jit_supports_private_stack()) {
+		verbose(env, "Private stack not supported by jit\n");
+		return -EACCES;
 	}
 
 	/* btf_ctx_access() used this to provide argument type info */
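The effect of the new stack accounting can be illustrated on a single call chain. The sketch below is a simplification, not the verifier algorithm: it walks one linear chain of already-rounded stack depths instead of the full call graph, and `check_call_chain` is a hypothetical helper. The two constants mirror MAX_BPF_STACK and BPF_PRIV_STACK_MIN_SIZE from the diff.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_STACK      512  /* mirrors MAX_BPF_STACK */
#define PRIV_STACK_MIN 64   /* mirrors BPF_PRIV_STACK_MIN_SIZE */

/*
 * depths[i] is the (already rounded) stack depth of the i-th subprog on one
 * call chain. Returns 0 if the chain is accepted, -1 if it is rejected.
 */
static int check_call_chain(const int *depths, size_t n, int priv_stack_supported)
{
	int combined = 0;  /* bytes charged to the normal kernel stack */

	for (size_t i = 0; i < n; i++) {
		int d = depths[i];
		int use_priv = priv_stack_supported && d >= PRIV_STACK_MIN;

		if (use_priv) {
			/* Private stack: each subprog is limited on its own. */
			if (d > MAX_STACK)
				return -1;
		} else {
			/* Normal stack: the whole chain shares one budget. */
			combined += d;
			if (combined > MAX_STACK)
				return -1;
		}
	}
	return 0;
}
```

With depths of 400, 200 and 16 bytes, the chain is rejected without private stack (616 bytes combined) and accepted with it, since only the 16-byte frame is charged to the shared budget. A single 520-byte frame is rejected either way.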
+104
tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c
···
 	call_rcu(&ctx->rcu, testmod_free_cb);
 }
 
+static struct bpf_testmod_ops3 *st_ops3;
+
+static int bpf_testmod_test_3(void)
+{
+	return 0;
+}
+
+static int bpf_testmod_test_4(void)
+{
+	return 0;
+}
+
+static struct bpf_testmod_ops3 __bpf_testmod_ops3 = {
+	.test_1 = bpf_testmod_test_3,
+	.test_2 = bpf_testmod_test_4,
+};
+
+static void bpf_testmod_test_struct_ops3(void)
+{
+	if (st_ops3)
+		st_ops3->test_1();
+}
+
+__bpf_kfunc void bpf_testmod_ops3_call_test_1(void)
+{
+	st_ops3->test_1();
+}
+
+__bpf_kfunc void bpf_testmod_ops3_call_test_2(void)
+{
+	st_ops3->test_2();
+}
+
 struct bpf_testmod_btf_type_tag_1 {
 	int a;
 };
···
 	(void)bpf_testmod_test_arg_ptr_to_struct(&struct_arg1_2);
 
 	(void)trace_bpf_testmod_test_raw_tp_null(NULL);
+
+	bpf_testmod_test_struct_ops3();
 
 	struct_arg3 = kmalloc((sizeof(struct bpf_testmod_struct_arg_3) +
 				sizeof(int)), GFP_KERNEL);
···
 BTF_ID_FLAGS(func, bpf_kfunc_rcu_task_test, KF_RCU)
 BTF_ID_FLAGS(func, bpf_testmod_ctx_create, KF_ACQUIRE | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_testmod_ctx_release, KF_RELEASE)
+BTF_ID_FLAGS(func, bpf_testmod_ops3_call_test_1)
+BTF_ID_FLAGS(func, bpf_testmod_ops3_call_test_2)
 BTF_KFUNCS_END(bpf_testmod_common_kfunc_ids)
 
 BTF_ID_LIST(bpf_testmod_dtor_ids)
···
 	.is_valid_access = bpf_testmod_ops_is_valid_access,
 };
 
+static const struct bpf_verifier_ops bpf_testmod_verifier_ops3 = {
+	.is_valid_access = bpf_testmod_ops_is_valid_access,
+};
+
 static int bpf_dummy_reg(void *kdata, struct bpf_link *link)
 {
 	struct bpf_testmod_ops *ops = kdata;
···
 	.unreg = bpf_dummy_unreg,
 	.cfi_stubs = &__bpf_testmod_ops2,
 	.name = "bpf_testmod_ops2",
+	.owner = THIS_MODULE,
+};
+
+static int st_ops3_reg(void *kdata, struct bpf_link *link)
+{
+	int err = 0;
+
+	mutex_lock(&st_ops_mutex);
+	if (st_ops3) {
+		pr_err("st_ops has already been registered\n");
+		err = -EEXIST;
+		goto unlock;
+	}
+	st_ops3 = kdata;
+
+unlock:
+	mutex_unlock(&st_ops_mutex);
+	return err;
+}
+
+static void st_ops3_unreg(void *kdata, struct bpf_link *link)
+{
+	mutex_lock(&st_ops_mutex);
+	st_ops3 = NULL;
+	mutex_unlock(&st_ops_mutex);
+}
+
+static void test_1_recursion_detected(struct bpf_prog *prog)
+{
+	struct bpf_prog_stats *stats;
+
+	stats = this_cpu_ptr(prog->stats);
+	printk("bpf_testmod: oh no, recursing into test_1, recursion_misses %llu",
+	       u64_stats_read(&stats->misses));
+}
+
+static int st_ops3_check_member(const struct btf_type *t,
+				const struct btf_member *member,
+				const struct bpf_prog *prog)
+{
+	u32 moff = __btf_member_bit_offset(t, member) / 8;
+
+	switch (moff) {
+	case offsetof(struct bpf_testmod_ops3, test_1):
+		prog->aux->priv_stack_requested = true;
+		prog->aux->recursion_detected = test_1_recursion_detected;
+		fallthrough;
+	default:
+		break;
+	}
+	return 0;
+}
+
+struct bpf_struct_ops bpf_testmod_ops3 = {
+	.verifier_ops = &bpf_testmod_verifier_ops3,
+	.init = bpf_testmod_ops_init,
+	.init_member = bpf_testmod_ops_init_member,
+	.reg = st_ops3_reg,
+	.unreg = st_ops3_unreg,
+	.check_member = st_ops3_check_member,
+	.cfi_stubs = &__bpf_testmod_ops3,
+	.name = "bpf_testmod_ops3",
 	.owner = THIS_MODULE,
 };
 
···
 	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS, &bpf_testmod_kfunc_set);
 	ret = ret ?: register_bpf_struct_ops(&bpf_bpf_testmod_ops, bpf_testmod_ops);
 	ret = ret ?: register_bpf_struct_ops(&bpf_testmod_ops2, bpf_testmod_ops2);
+	ret = ret ?: register_bpf_struct_ops(&bpf_testmod_ops3, bpf_testmod_ops3);
 	ret = ret ?: register_bpf_struct_ops(&testmod_st_ops, bpf_testmod_st_ops);
 	ret = ret ?: register_btf_id_dtor_kfuncs(bpf_testmod_dtors,
 						 ARRAY_SIZE(bpf_testmod_dtors),
+5
tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.h
···
 	int (*test_1)(void);
 };
 
+struct bpf_testmod_ops3 {
+	int (*test_1)(void);
+	int (*test_2)(void);
+};
+
 struct st_ops_args {
 	u64 a;
 };
+106
tools/testing/selftests/bpf/prog_tests/struct_ops_private_stack.c
···
+// SPDX-License-Identifier: GPL-2.0
+
+#include <test_progs.h>
+#include "struct_ops_private_stack.skel.h"
+#include "struct_ops_private_stack_fail.skel.h"
+#include "struct_ops_private_stack_recur.skel.h"
+
+static void test_private_stack(void)
+{
+	struct struct_ops_private_stack *skel;
+	struct bpf_link *link;
+	int err;
+
+	skel = struct_ops_private_stack__open();
+	if (!ASSERT_OK_PTR(skel, "struct_ops_private_stack__open"))
+		return;
+
+	if (skel->data->skip) {
+		test__skip();
+		goto cleanup;
+	}
+
+	err = struct_ops_private_stack__load(skel);
+	if (!ASSERT_OK(err, "struct_ops_private_stack__load"))
+		goto cleanup;
+
+	link = bpf_map__attach_struct_ops(skel->maps.testmod_1);
+	if (!ASSERT_OK_PTR(link, "attach_struct_ops"))
+		goto cleanup;
+
+	ASSERT_OK(trigger_module_test_read(256), "trigger_read");
+
+	ASSERT_EQ(skel->bss->val_i, 3, "val_i");
+	ASSERT_EQ(skel->bss->val_j, 8, "val_j");
+
+	bpf_link__destroy(link);
+
+cleanup:
+	struct_ops_private_stack__destroy(skel);
+}
+
+static void test_private_stack_fail(void)
+{
+	struct struct_ops_private_stack_fail *skel;
+	int err;
+
+	skel = struct_ops_private_stack_fail__open();
+	if (!ASSERT_OK_PTR(skel, "struct_ops_private_stack_fail__open"))
+		return;
+
+	if (skel->data->skip) {
+		test__skip();
+		goto cleanup;
+	}
+
+	err = struct_ops_private_stack_fail__load(skel);
+	if (!ASSERT_ERR(err, "struct_ops_private_stack_fail__load"))
+		goto cleanup;
+	return;
+
+cleanup:
+	struct_ops_private_stack_fail__destroy(skel);
+}
+
+static void test_private_stack_recur(void)
+{
+	struct struct_ops_private_stack_recur *skel;
+	struct bpf_link *link;
+	int err;
+
+	skel = struct_ops_private_stack_recur__open();
+	if (!ASSERT_OK_PTR(skel, "struct_ops_private_stack_recur__open"))
+		return;
+
+	if (skel->data->skip) {
+		test__skip();
+		goto cleanup;
+	}
+
+	err = struct_ops_private_stack_recur__load(skel);
+	if (!ASSERT_OK(err, "struct_ops_private_stack_recur__load"))
+		goto cleanup;
+
+	link = bpf_map__attach_struct_ops(skel->maps.testmod_1);
+	if (!ASSERT_OK_PTR(link, "attach_struct_ops"))
+		goto cleanup;
+
+	ASSERT_OK(trigger_module_test_read(256), "trigger_read");
+
+	ASSERT_EQ(skel->bss->val_j, 3, "val_j");
+
+	bpf_link__destroy(link);
+
+cleanup:
+	struct_ops_private_stack_recur__destroy(skel);
+}
+
+void test_struct_ops_private_stack(void)
+{
+	if (test__start_subtest("private_stack"))
+		test_private_stack();
+	if (test__start_subtest("private_stack_fail"))
+		test_private_stack_fail();
+	if (test__start_subtest("private_stack_recur"))
+		test_private_stack_recur();
+}
+2
tools/testing/selftests/bpf/prog_tests/verifier.c
···
 #include "verifier_or_jmp32_k.skel.h"
 #include "verifier_precision.skel.h"
 #include "verifier_prevent_map_lookup.skel.h"
+#include "verifier_private_stack.skel.h"
 #include "verifier_raw_stack.skel.h"
 #include "verifier_raw_tp_writable.skel.h"
 #include "verifier_reg_equal.skel.h"
···
 void test_verifier_or_jmp32_k(void) { RUN(verifier_or_jmp32_k); }
 void test_verifier_precision(void) { RUN(verifier_precision); }
 void test_verifier_prevent_map_lookup(void) { RUN(verifier_prevent_map_lookup); }
+void test_verifier_private_stack(void) { RUN(verifier_private_stack); }
 void test_verifier_raw_stack(void) { RUN(verifier_raw_stack); }
 void test_verifier_raw_tp_writable(void) { RUN(verifier_raw_tp_writable); }
 void test_verifier_reg_equal(void) { RUN(verifier_reg_equal); }
+62
tools/testing/selftests/bpf/progs/struct_ops_private_stack.c
···
+// SPDX-License-Identifier: GPL-2.0
+
+#include <vmlinux.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include "../bpf_testmod/bpf_testmod.h"
+
+char _license[] SEC("license") = "GPL";
+
+#if defined(__TARGET_ARCH_x86)
+bool skip __attribute((__section__(".data"))) = false;
+#else
+bool skip = true;
+#endif
+
+void bpf_testmod_ops3_call_test_2(void) __ksym;
+
+int val_i, val_j;
+
+__noinline static int subprog2(int *a, int *b)
+{
+	return val_i + a[10] + b[20];
+}
+
+__noinline static int subprog1(int *a)
+{
+	/* stack size 200 bytes */
+	int b[50] = {};
+
+	b[20] = 2;
+	return subprog2(a, b);
+}
+
+
+SEC("struct_ops")
+int BPF_PROG(test_1)
+{
+	/* stack size 400 bytes */
+	int a[100] = {};
+
+	a[10] = 1;
+	val_i = subprog1(a);
+	bpf_testmod_ops3_call_test_2();
+	return 0;
+}
+
+SEC("struct_ops")
+int BPF_PROG(test_2)
+{
+	/* stack size 200 bytes */
+	int a[50] = {};
+
+	a[10] = 3;
+	val_j = subprog1(a);
+	return 0;
+}
+
+SEC(".struct_ops")
+struct bpf_testmod_ops3 testmod_1 = {
+	.test_1 = (void *)test_1,
+	.test_2 = (void *)test_2,
+};
+62
tools/testing/selftests/bpf/progs/struct_ops_private_stack_fail.c
···
+// SPDX-License-Identifier: GPL-2.0
+
+#include <vmlinux.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include "../bpf_testmod/bpf_testmod.h"
+
+char _license[] SEC("license") = "GPL";
+
+#if defined(__TARGET_ARCH_x86)
+bool skip __attribute((__section__(".data"))) = false;
+#else
+bool skip = true;
+#endif
+
+void bpf_testmod_ops3_call_test_2(void) __ksym;
+
+int val_i, val_j;
+
+__noinline static int subprog2(int *a, int *b)
+{
+	return val_i + a[10] + b[20];
+}
+
+__noinline static int subprog1(int *a)
+{
+	/* stack size 200 bytes */
+	int b[50] = {};
+
+	b[20] = 2;
+	return subprog2(a, b);
+}
+
+
+SEC("struct_ops")
+int BPF_PROG(test_1)
+{
+	/* stack size 100 bytes */
+	int a[25] = {};
+
+	a[10] = 1;
+	val_i = subprog1(a);
+	bpf_testmod_ops3_call_test_2();
+	return 0;
+}
+
+SEC("struct_ops")
+int BPF_PROG(test_2)
+{
+	/* stack size 400 bytes */
+	int a[100] = {};
+
+	a[10] = 3;
+	val_j = subprog1(a);
+	return 0;
+}
+
+SEC(".struct_ops")
+struct bpf_testmod_ops3 testmod_1 = {
+	.test_1 = (void *)test_1,
+	.test_2 = (void *)test_2,
+};
+50
tools/testing/selftests/bpf/progs/struct_ops_private_stack_recur.c
···
+// SPDX-License-Identifier: GPL-2.0
+
+#include <vmlinux.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include "../bpf_testmod/bpf_testmod.h"
+
+char _license[] SEC("license") = "GPL";
+
+#if defined(__TARGET_ARCH_x86)
+bool skip __attribute((__section__(".data"))) = false;
+#else
+bool skip = true;
+#endif
+
+void bpf_testmod_ops3_call_test_1(void) __ksym;
+
+int val_i, val_j;
+
+__noinline static int subprog2(int *a, int *b)
+{
+	return val_i + a[1] + b[20];
+}
+
+__noinline static int subprog1(int *a)
+{
+	/* stack size 400 bytes */
+	int b[100] = {};
+
+	b[20] = 2;
+	return subprog2(a, b);
+}
+
+
+SEC("struct_ops")
+int BPF_PROG(test_1)
+{
+	/* stack size 20 bytes */
+	int a[5] = {};
+
+	a[1] = 1;
+	val_j += subprog1(a);
+	bpf_testmod_ops3_call_test_1();
+	return 0;
+}
+
+SEC(".struct_ops")
+struct bpf_testmod_ops3 testmod_1 = {
+	.test_1 = (void *)test_1,
+};
+272
tools/testing/selftests/bpf/progs/verifier_private_stack.c
···
+// SPDX-License-Identifier: GPL-2.0
+
+#include <vmlinux.h>
+#include <bpf/bpf_helpers.h>
+#include "bpf_misc.h"
+#include "bpf_experimental.h"
+
+/* From include/linux/filter.h */
+#define MAX_BPF_STACK 512
+
+#if defined(__TARGET_ARCH_x86)
+
+struct elem {
+	struct bpf_timer t;
+	char pad[256];
+};
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__uint(max_entries, 1);
+	__type(key, int);
+	__type(value, struct elem);
+} array SEC(".maps");
+
+SEC("kprobe")
+__description("Private stack, single prog")
+__success
+__arch_x86_64
+__jited(" movabsq $0x{{.*}}, %r9")
+__jited(" addq %gs:0x{{.*}}, %r9")
+__jited(" movl $0x2a, %edi")
+__jited(" movq %rdi, -0x100(%r9)")
+__naked void private_stack_single_prog(void)
+{
+	asm volatile ("					\
+	r1 = 42;					\
+	*(u64 *)(r10 - 256) = r1;			\
+	r0 = 0;						\
+	exit;						\
+"	::: __clobber_all);
+}
+
+SEC("raw_tp")
+__description("No private stack")
+__success
+__arch_x86_64
+__jited(" subq $0x8, %rsp")
+__naked void no_private_stack_nested(void)
+{
+	asm volatile ("					\
+	r1 = 42;					\
+	*(u64 *)(r10 - 8) = r1;				\
+	r0 = 0;						\
+	exit;						\
+"	::: __clobber_all);
+}
+
+__used
+__naked static void cumulative_stack_depth_subprog(void)
+{
+	asm volatile ("					\
+	r1 = 41;					\
+	*(u64 *)(r10 - 32) = r1;			\
+	call %[bpf_get_smp_processor_id];		\
+	exit;						\
+"	:
+	: __imm(bpf_get_smp_processor_id)
+	: __clobber_all);
+}
+
+SEC("kprobe")
+__description("Private stack, subtree > MAX_BPF_STACK")
+__success
+__arch_x86_64
+/* private stack fp for the main prog */
+__jited(" movabsq $0x{{.*}}, %r9")
+__jited(" addq %gs:0x{{.*}}, %r9")
+__jited(" movl $0x2a, %edi")
+__jited(" movq %rdi, -0x200(%r9)")
+__jited(" pushq %r9")
+__jited(" callq 0x{{.*}}")
+__jited(" popq %r9")
+__jited(" xorl %eax, %eax")
+__naked void private_stack_nested_1(void)
+{
+	asm volatile ("					\
+	r1 = 42;					\
+	*(u64 *)(r10 - %[max_bpf_stack]) = r1;		\
+	call cumulative_stack_depth_subprog;		\
+	r0 = 0;						\
+	exit;						\
+"	:
+	: __imm_const(max_bpf_stack, MAX_BPF_STACK)
+	: __clobber_all);
+}
+
+__naked __noinline __used
+static unsigned long loop_callback(void)
+{
+	asm volatile ("					\
+	call %[bpf_get_prandom_u32];			\
+	r1 = 42;					\
+	*(u64 *)(r10 - 512) = r1;			\
+	call cumulative_stack_depth_subprog;		\
+	r0 = 0;						\
+	exit;						\
+"	:
+	: __imm(bpf_get_prandom_u32)
+	: __clobber_common);
+}
+
+SEC("raw_tp")
+__description("Private stack, callback")
+__success
+__arch_x86_64
+/* for func loop_callback */
+__jited("func #1")
+__jited(" endbr64")
+__jited(" nopl (%rax,%rax)")
+__jited(" nopl (%rax)")
+__jited(" pushq %rbp")
+__jited(" movq %rsp, %rbp")
+__jited(" endbr64")
+__jited(" movabsq $0x{{.*}}, %r9")
+__jited(" addq %gs:0x{{.*}}, %r9")
+__jited(" pushq %r9")
+__jited(" callq")
+__jited(" popq %r9")
+__jited(" movl $0x2a, %edi")
+__jited(" movq %rdi, -0x200(%r9)")
+__jited(" pushq %r9")
+__jited(" callq")
+__jited(" popq %r9")
+__naked void private_stack_callback(void)
+{
+	asm volatile ("					\
+	r1 = 1;						\
+	r2 = %[loop_callback];				\
+	r3 = 0;						\
+	r4 = 0;						\
+	call %[bpf_loop];				\
+	r0 = 0;						\
+	exit;						\
+"	:
+	: __imm_ptr(loop_callback),
+	  __imm(bpf_loop)
+	: __clobber_common);
+}
+
+SEC("fentry/bpf_fentry_test9")
+__description("Private stack, exception in main prog")
+__success __retval(0)
+__arch_x86_64
+__jited(" pushq %r9")
+__jited(" callq")
+__jited(" popq %r9")
+int private_stack_exception_main_prog(void)
+{
+	asm volatile ("					\
+	r1 = 42;					\
+	*(u64 *)(r10 - 512) = r1;			\
+"	::: __clobber_common);
+
+	bpf_throw(0);
+	return 0;
+}
+
+__used static int subprog_exception(void)
+{
+	bpf_throw(0);
+	return 0;
+}
+
+SEC("fentry/bpf_fentry_test9")
+__description("Private stack, exception in subprog")
+__success __retval(0)
+__arch_x86_64
+__jited(" movq %rdi, -0x200(%r9)")
+__jited(" pushq %r9")
+__jited(" callq")
+__jited(" popq %r9")
+int private_stack_exception_sub_prog(void)
+{
+	asm volatile ("					\
+	r1 = 42;					\
+	*(u64 *)(r10 - 512) = r1;			\
+	call subprog_exception;				\
+"	::: __clobber_common);
+
+	return 0;
+}
+
+int glob;
+__noinline static void subprog2(int *val)
+{
+	glob += val[0] * 2;
+}
+
+__noinline static void subprog1(int *val)
+{
+	int tmp[64] = {};
+
+	tmp[0] = *val;
+	subprog2(tmp);
+}
+
+__noinline static int timer_cb1(void *map, int *key, struct bpf_timer *timer)
+{
+	subprog1(key);
+	return 0;
+}
+
+__noinline static int timer_cb2(void *map, int *key, struct bpf_timer *timer)
+{
+	return 0;
+}
+
+SEC("fentry/bpf_fentry_test9")
+__description("Private stack, async callback, not nested")
+__success __retval(0)
+__arch_x86_64
+__jited(" movabsq $0x{{.*}}, %r9")
+int private_stack_async_callback_1(void)
+{
+	struct bpf_timer *arr_timer;
+	int array_key = 0;
+
+	arr_timer = bpf_map_lookup_elem(&array, &array_key);
+	if (!arr_timer)
+		return 0;
+
+	bpf_timer_init(arr_timer, &array, 1);
+	bpf_timer_set_callback(arr_timer, timer_cb2);
+	bpf_timer_start(arr_timer, 0, 0);
+	subprog1(&array_key);
+	return 0;
+}
+
+SEC("fentry/bpf_fentry_test9")
+__description("Private stack, async callback, potential nesting")
+__success __retval(0)
+__arch_x86_64
+__jited(" subq $0x100, %rsp")
+int private_stack_async_callback_2(void)
+{
+	struct bpf_timer *arr_timer;
+	int array_key = 0;
+
+	arr_timer = bpf_map_lookup_elem(&array, &array_key);
+	if (!arr_timer)
+		return 0;
+
+	bpf_timer_init(arr_timer, &array, 1);
+	bpf_timer_set_callback(arr_timer, timer_cb1);
+	bpf_timer_start(arr_timer, 0, 0);
+	subprog1(&array_key);
+	return 0;
+}
+
+#else
+
+SEC("kprobe")
+__description("private stack is not supported, use a dummy test")
+__success
+int dummy_test(void)
+{
+	return 0;
+}
+
+#endif
+
+char _license[] SEC("license") = "GPL";