Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

+54 -26

Documentation/bpf/standardization/instruction-set.rst

··· 97 97 A: 10000110 98 98 B: 11111111 10000110 99 99 100 + Conformance groups 101 + ------------------ 102 + 103 + An implementation does not need to support all instructions specified in this 104 + document (e.g., deprecated instructions). Instead, a number of conformance 105 + groups are specified. An implementation must support the "basic" conformance 106 + group and may support additional conformance groups, where supporting a 107 + conformance group means it must support all instructions in that conformance 108 + group. 109 + 110 + The use of named conformance groups enables interoperability between a runtime 111 + that executes instructions, and tools as such compilers that generate 112 + instructions for the runtime. Thus, capability discovery in terms of 113 + conformance groups might be done manually by users or automatically by tools. 114 + 115 + Each conformance group has a short ASCII label (e.g., "basic") that 116 + corresponds to a set of instructions that are mandatory. That is, each 117 + instruction has one or more conformance groups of which it is a member. 118 + 119 + The "basic" conformance group includes all instructions defined in this 120 + specification unless otherwise noted. 121 + 100 122 Instruction encoding 101 123 ==================== 102 124 ··· 174 152 This is depicted in the following figure:: 175 153 176 154 basic_instruction 177 - .-----------------------------. 178 - | | 179 - code:8 regs:8 offset:16 imm:32 unused:32 imm:32 180 - | | 181 - '--------------' 182 - pseudo instruction 155 + .------------------------------. 156 + | | 157 + opcode:8 regs:8 offset:16 imm:32 unused:32 imm:32 158 + | | 159 + '--------------' 160 + pseudo instruction 183 161 184 162 Thus the 64-bit immediate value is constructed as follows: 185 163 ··· 317 295 ``BPF_ALU | BPF_MOVSX`` :term:`sign extends<Sign Extend>` 8-bit and 16-bit operands into 32 318 296 bit operands, and zeroes the remaining upper 32 bits. 
319 297 ``BPF_ALU64 | BPF_MOVSX`` :term:`sign extends<Sign Extend>` 8-bit, 16-bit, and 32-bit 320 - operands into 64 bit operands. 298 + operands into 64 bit operands. Unlike other arithmetic instructions, 299 + ``BPF_MOVSX`` is only defined for register source operands (``BPF_X``). 300 + 301 + The ``BPF_NEG`` instruction is only defined when the source bit is clear 302 + (``BPF_K``). 321 303 322 304 Shift operations use a mask of 0x3F (63) for 64-bit operations and 0x1F (31) 323 305 for 32-bit operations. ··· 378 352 otherwise identical operations. 379 353 The 'code' field encodes the operation as below: 380 354 381 - ======== ===== === =========================================== ========================================= 382 - code value src description notes 383 - ======== ===== === =========================================== ========================================= 384 - BPF_JA 0x0 0x0 PC += offset BPF_JMP class 385 - BPF_JA 0x0 0x0 PC += imm BPF_JMP32 class 355 + ======== ===== === =============================== ============================================= 356 + code value src description notes 357 + ======== ===== === =============================== ============================================= 358 + BPF_JA 0x0 0x0 PC += offset BPF_JMP | BPF_K only 359 + BPF_JA 0x0 0x0 PC += imm BPF_JMP32 | BPF_K only 386 360 BPF_JEQ 0x1 any PC += offset if dst == src 387 - BPF_JGT 0x2 any PC += offset if dst > src unsigned 388 - BPF_JGE 0x3 any PC += offset if dst >= src unsigned 361 + BPF_JGT 0x2 any PC += offset if dst > src unsigned 362 + BPF_JGE 0x3 any PC += offset if dst >= src unsigned 389 363 BPF_JSET 0x4 any PC += offset if dst & src 390 364 BPF_JNE 0x5 any PC += offset if dst != src 391 - BPF_JSGT 0x6 any PC += offset if dst > src signed 392 - BPF_JSGE 0x7 any PC += offset if dst >= src signed 393 - BPF_CALL 0x8 0x0 call helper function by address see `Helper functions`_ 394 - BPF_CALL 0x8 0x1 call PC += imm see `Program-local functions`_ 395 - BPF_CALL 0x8 0x2 
call helper function by BTF ID see `Helper functions`_ 396 - BPF_EXIT 0x9 0x0 return BPF_JMP only 397 - BPF_JLT 0xa any PC += offset if dst < src unsigned 398 - BPF_JLE 0xb any PC += offset if dst <= src unsigned 399 - BPF_JSLT 0xc any PC += offset if dst < src signed 400 - BPF_JSLE 0xd any PC += offset if dst <= src signed 401 - ======== ===== === =========================================== ========================================= 365 + BPF_JSGT 0x6 any PC += offset if dst > src signed 366 + BPF_JSGE 0x7 any PC += offset if dst >= src signed 367 + BPF_CALL 0x8 0x0 call helper function by address BPF_JMP | BPF_K only, see `Helper functions`_ 368 + BPF_CALL 0x8 0x1 call PC += imm BPF_JMP | BPF_K only, see `Program-local functions`_ 369 + BPF_CALL 0x8 0x2 call helper function by BTF ID BPF_JMP | BPF_K only, see `Helper functions`_ 370 + BPF_EXIT 0x9 0x0 return BPF_JMP | BPF_K only 371 + BPF_JLT 0xa any PC += offset if dst < src unsigned 372 + BPF_JLE 0xb any PC += offset if dst <= src unsigned 373 + BPF_JSLT 0xc any PC += offset if dst < src signed 374 + BPF_JSLE 0xd any PC += offset if dst <= src signed 375 + ======== ===== === =============================== ============================================= 402 376 403 377 The BPF program needs to store the return value into register R0 before doing a 404 378 ``BPF_EXIT``. ··· 636 610 637 611 BPF previously introduced special instructions for access to packet data that were 638 612 carried over from classic BPF. However, these instructions are 639 - deprecated and should no longer be used. 613 + deprecated and should no longer be used. All legacy packet access 614 + instructions belong to the "legacy" conformance group instead of the "basic" 615 + conformance group.
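The source-bit restrictions this hunk adds to the instruction-set document (``BPF_MOVSX`` only with ``BPF_X``; ``BPF_NEG``, ``BPF_JA``, ``BPF_CALL`` and ``BPF_EXIT`` only with ``BPF_K``) can be read as a small validity check. The following is a minimal sketch, not code from the kernel: the function name `src_bit_is_valid` is hypothetical, the opcode constants are redefined locally with the values used by the specification, and treating a nonzero offset on `BPF_MOV` as the marker for `BPF_MOVSX` comes from the full specification rather than from the lines shown here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Opcode layout per the instruction-set document:
 * bits 0-2 class, bit 3 source, bits 4-7 operation code.
 */
#define INSN_CLASS(op) ((op) & 0x07)
#define INSN_SRC(op)   ((op) & 0x08)
#define INSN_CODE(op)  ((op) & 0xf0)

/* Local copies of the specification's constants (not from a kernel header). */
enum {
	BPF_ALU = 0x04, BPF_JMP = 0x05, BPF_JMP32 = 0x06, BPF_ALU64 = 0x07,
	BPF_K = 0x00, BPF_X = 0x08,
	BPF_NEG = 0x80, BPF_MOV = 0xb0,
	BPF_JA = 0x00, BPF_CALL = 0x80, BPF_EXIT = 0x90,
};

/* Hypothetical helper: is the source bit acceptable for this opcode? */
static bool src_bit_is_valid(uint8_t opcode, int16_t offset)
{
	uint8_t cls = INSN_CLASS(opcode);
	uint8_t code = INSN_CODE(opcode);
	bool is_k = INSN_SRC(opcode) == BPF_K;

	switch (cls) {
	case BPF_ALU:
	case BPF_ALU64:
		if (code == BPF_NEG)
			return is_k;	/* BPF_NEG: source bit must be clear */
		if (code == BPF_MOV && offset != 0)
			return !is_k;	/* BPF_MOVSX: register source only */
		return true;
	case BPF_JMP:
		if (code == BPF_JA || code == BPF_CALL || code == BPF_EXIT)
			return is_k;	/* "BPF_JMP | BPF_K only" rows */
		return true;
	case BPF_JMP32:
		if (code == BPF_CALL || code == BPF_EXIT)
			return false;	/* the table lists these for BPF_JMP only */
		if (code == BPF_JA)
			return is_k;	/* "BPF_JMP32 | BPF_K only" row */
		return true;
	default:
		return true;		/* load/store classes are out of scope here */
	}
}
```

A verifier or disassembler would apply a check of this kind before interpreting the `src` register field, since the "notes" column now states which class and source combinations are defined.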

+1 -1

Documentation/bpf/verifier.rst

@@ -562,7 +562,7 @@
   * ``checkpoint[0].r1`` is marked as read;
 
 * At instruction #5 exit is reached and ``checkpoint[0]`` can now be processed
-  by ``clean_live_states()``. After this processing ``checkpoint[0].r0`` has a
+  by ``clean_live_states()``. After this processing ``checkpoint[0].r1`` has a
   read mark and all other registers and stack slots are marked as ``NOT_INIT``
   or ``STACK_INVALID``
 

+3

MAINTAINERS

@@ -3799,6 +3799,7 @@
 M:	Daniel Borkmann <daniel@iogearbox.net>
 M:	Andrii Nakryiko <andrii@kernel.org>
 R:	Martin KaFai Lau <martin.lau@linux.dev>
+R:	Eduard Zingerman <eddyz87@gmail.com>
 R:	Song Liu <song@kernel.org>
 R:	Yonghong Song <yonghong.song@linux.dev>
 R:	John Fastabend <john.fastabend@gmail.com>
@@ -3859,6 +3860,7 @@
 
 BPF [LIBRARY] (libbpf)
 M:	Andrii Nakryiko <andrii@kernel.org>
+M:	Eduard Zingerman <eddyz87@gmail.com>
 L:	bpf@vger.kernel.org
 S:	Maintained
 F:	tools/lib/bpf/
@@ -3916,6 +3918,7 @@
 
 BPF [SELFTESTS] (Test Runners & Infrastructure)
 M:	Andrii Nakryiko <andrii@kernel.org>
+M:	Eduard Zingerman <eddyz87@gmail.com>
 R:	Mykola Lysenko <mykolal@fb.com>
 L:	bpf@vger.kernel.org
 S:	Maintained

+5

arch/arm64/net/bpf_jit_comp.c

@@ -2305,3 +2305,8 @@
 
 	return ret;
 }
+
+bool bpf_jit_supports_ptr_xchg(void)
+{
+	return true;
+}

+5

arch/x86/net/bpf_jit_comp.c

@@ -3242,3 +3242,8 @@
 		BUG_ON(ret < 0);
 	}
 }
+
+bool bpf_jit_supports_ptr_xchg(void)
+{
+	return true;
+}

+1 -1

drivers/media/rc/bpf-lirc.c

@@ -110,7 +110,7 @@
 	case BPF_FUNC_get_prandom_u32:
 		return &bpf_get_prandom_u32_proto;
 	case BPF_FUNC_trace_printk:
-		if (perfmon_capable())
+		if (bpf_token_capable(prog->aux->token, CAP_PERFMON))
 			return bpf_get_trace_printk_proto();
 		fallthrough;
 	default:

+119 -23

include/linux/bpf.h

··· 52 52 struct bpf_func_state; 53 53 struct ftrace_ops; 54 54 struct cgroup; 55 + struct bpf_token; 56 + struct user_namespace; 57 + struct super_block; 58 + struct inode; 55 59 56 60 extern struct idr btf_idr; 57 61 extern spinlock_t btf_idr_lock; ··· 1489 1485 #ifdef CONFIG_SECURITY 1490 1486 void *security; 1491 1487 #endif 1488 + struct bpf_token *token; 1492 1489 struct bpf_prog_offload *offload; 1493 1490 struct btf *btf; 1494 1491 struct bpf_func_info *func_info; ··· 1614 1609 u32 id; 1615 1610 }; 1616 1611 1612 + struct bpf_mount_opts { 1613 + kuid_t uid; 1614 + kgid_t gid; 1615 + umode_t mode; 1616 + 1617 + /* BPF token-related delegation options */ 1618 + u64 delegate_cmds; 1619 + u64 delegate_maps; 1620 + u64 delegate_progs; 1621 + u64 delegate_attachs; 1622 + }; 1623 + 1624 + struct bpf_token { 1625 + struct work_struct work; 1626 + atomic64_t refcnt; 1627 + struct user_namespace *userns; 1628 + u64 allowed_cmds; 1629 + u64 allowed_maps; 1630 + u64 allowed_progs; 1631 + u64 allowed_attachs; 1632 + #ifdef CONFIG_SECURITY 1633 + void *security; 1634 + #endif 1635 + }; 1636 + 1617 1637 struct bpf_struct_ops_value; 1618 1638 struct btf_member; 1619 1639 ··· 1703 1673 void (*unreg)(void *kdata); 1704 1674 int (*update)(void *kdata, void *old_kdata); 1705 1675 int (*validate)(void *kdata); 1706 - const struct btf_type *type; 1707 - const struct btf_type *value_type; 1676 + void *cfi_stubs; 1677 + struct module *owner; 1708 1678 const char *name; 1709 1679 struct btf_func_model func_models[BPF_STRUCT_OPS_MAX_NR_MEMBERS]; 1680 + }; 1681 + 1682 + struct bpf_struct_ops_desc { 1683 + struct bpf_struct_ops *st_ops; 1684 + 1685 + const struct btf_type *type; 1686 + const struct btf_type *value_type; 1710 1687 u32 type_id; 1711 1688 u32 value_id; 1712 - void *cfi_stubs; 1689 + }; 1690 + 1691 + enum bpf_struct_ops_state { 1692 + BPF_STRUCT_OPS_STATE_INIT, 1693 + BPF_STRUCT_OPS_STATE_INUSE, 1694 + BPF_STRUCT_OPS_STATE_TOBEFREE, 1695 + BPF_STRUCT_OPS_STATE_READY, 1696 
+ }; 1697 + 1698 + struct bpf_struct_ops_common_value { 1699 + refcount_t refcnt; 1700 + enum bpf_struct_ops_state state; 1713 1701 }; 1714 1702 1715 1703 #if defined(CONFIG_BPF_JIT) && defined(CONFIG_BPF_SYSCALL) 1704 + /* This macro helps developer to register a struct_ops type and generate 1705 + * type information correctly. Developers should use this macro to register 1706 + * a struct_ops type instead of calling __register_bpf_struct_ops() directly. 1707 + */ 1708 + #define register_bpf_struct_ops(st_ops, type) \ 1709 + ({ \ 1710 + struct bpf_struct_ops_##type { \ 1711 + struct bpf_struct_ops_common_value common; \ 1712 + struct type data ____cacheline_aligned_in_smp; \ 1713 + }; \ 1714 + BTF_TYPE_EMIT(struct bpf_struct_ops_##type); \ 1715 + __register_bpf_struct_ops(st_ops); \ 1716 + }) 1716 1717 #define BPF_MODULE_OWNER ((void *)((0xeB9FUL << 2) + POISON_POINTER_DELTA)) 1717 - const struct bpf_struct_ops *bpf_struct_ops_find(u32 type_id); 1718 - void bpf_struct_ops_init(struct btf *btf, struct bpf_verifier_log *log); 1719 1718 bool bpf_struct_ops_get(const void *kdata); 1720 1719 void bpf_struct_ops_put(const void *kdata); 1721 1720 int bpf_struct_ops_map_sys_lookup_elem(struct bpf_map *map, void *key, ··· 1786 1727 int bpf_struct_ops_test_run(struct bpf_prog *prog, const union bpf_attr *kattr, 1787 1728 union bpf_attr __user *uattr); 1788 1729 #endif 1730 + int bpf_struct_ops_desc_init(struct bpf_struct_ops_desc *st_ops_desc, 1731 + struct btf *btf, 1732 + struct bpf_verifier_log *log); 1733 + void bpf_map_struct_ops_info_fill(struct bpf_map_info *info, struct bpf_map *map); 1789 1734 #else 1790 - static inline const struct bpf_struct_ops *bpf_struct_ops_find(u32 type_id) 1791 - { 1792 - return NULL; 1793 - } 1794 - static inline void bpf_struct_ops_init(struct btf *btf, 1795 - struct bpf_verifier_log *log) 1796 - { 1797 - } 1735 + #define register_bpf_struct_ops(st_ops, type) ({ (void *)(st_ops); 0; }) 1798 1736 static inline bool bpf_try_module_get(const 
void *data, struct module *owner) 1799 1737 { 1800 1738 return try_module_get(owner); ··· 1809 1753 static inline int bpf_struct_ops_link_create(union bpf_attr *attr) 1810 1754 { 1811 1755 return -EOPNOTSUPP; 1756 + } 1757 + static inline void bpf_map_struct_ops_info_fill(struct bpf_map_info *info, struct bpf_map *map) 1758 + { 1812 1759 } 1813 1760 1814 1761 #endif ··· 2127 2068 migrate_enable(); 2128 2069 } 2129 2070 2071 + extern const struct super_operations bpf_super_ops; 2130 2072 extern const struct file_operations bpf_map_fops; 2131 2073 extern const struct file_operations bpf_prog_fops; 2132 2074 extern const struct file_operations bpf_iter_fops; ··· 2262 2202 2263 2203 extern int sysctl_unprivileged_bpf_disabled; 2264 2204 2265 - static inline bool bpf_allow_ptr_leaks(void) 2205 + bool bpf_token_capable(const struct bpf_token *token, int cap); 2206 + 2207 + static inline bool bpf_allow_ptr_leaks(const struct bpf_token *token) 2266 2208 { 2267 - return perfmon_capable(); 2209 + return bpf_token_capable(token, CAP_PERFMON); 2268 2210 } 2269 2211 2270 - static inline bool bpf_allow_uninit_stack(void) 2212 + static inline bool bpf_allow_uninit_stack(const struct bpf_token *token) 2271 2213 { 2272 - return perfmon_capable(); 2214 + return bpf_token_capable(token, CAP_PERFMON); 2273 2215 } 2274 2216 2275 - static inline bool bpf_bypass_spec_v1(void) 2217 + static inline bool bpf_bypass_spec_v1(const struct bpf_token *token) 2276 2218 { 2277 - return cpu_mitigations_off() || perfmon_capable(); 2219 + return cpu_mitigations_off() || bpf_token_capable(token, CAP_PERFMON); 2278 2220 } 2279 2221 2280 - static inline bool bpf_bypass_spec_v4(void) 2222 + static inline bool bpf_bypass_spec_v4(const struct bpf_token *token) 2281 2223 { 2282 - return cpu_mitigations_off() || perfmon_capable(); 2224 + return cpu_mitigations_off() || bpf_token_capable(token, CAP_PERFMON); 2283 2225 } 2284 2226 2285 2227 int bpf_map_new_fd(struct bpf_map *map, int flags); ··· 2298 2236 
struct bpf_link *bpf_link_get_from_fd(u32 ufd); 2299 2237 struct bpf_link *bpf_link_get_curr_or_next(u32 *id); 2300 2238 2239 + void bpf_token_inc(struct bpf_token *token); 2240 + void bpf_token_put(struct bpf_token *token); 2241 + int bpf_token_create(union bpf_attr *attr); 2242 + struct bpf_token *bpf_token_get_from_fd(u32 ufd); 2243 + 2244 + bool bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd); 2245 + bool bpf_token_allow_map_type(const struct bpf_token *token, enum bpf_map_type type); 2246 + bool bpf_token_allow_prog_type(const struct bpf_token *token, 2247 + enum bpf_prog_type prog_type, 2248 + enum bpf_attach_type attach_type); 2249 + 2301 2250 int bpf_obj_pin_user(u32 ufd, int path_fd, const char __user *pathname); 2302 2251 int bpf_obj_get_user(int path_fd, const char __user *pathname, int flags); 2252 + struct inode *bpf_get_inode(struct super_block *sb, const struct inode *dir, 2253 + umode_t mode); 2303 2254 2304 2255 #define BPF_ITER_FUNC_PREFIX "bpf_iter_" 2305 2256 #define DEFINE_BPF_ITER_FUNC(target, args...) 
\ ··· 2547 2472 struct btf *btf, const struct btf_type *t); 2548 2473 const char *btf_find_decl_tag_value(const struct btf *btf, const struct btf_type *pt, 2549 2474 int comp_idx, const char *tag_key); 2475 + int btf_find_next_decl_tag(const struct btf *btf, const struct btf_type *pt, 2476 + int comp_idx, const char *tag_key, int last_id); 2550 2477 2551 2478 struct bpf_prog *bpf_prog_by_id(u32 id); 2552 2479 struct bpf_link *bpf_link_by_id(u32 id); 2553 2480 2554 - const struct bpf_func_proto *bpf_base_func_proto(enum bpf_func_id func_id); 2481 + const struct bpf_func_proto *bpf_base_func_proto(enum bpf_func_id func_id, 2482 + const struct bpf_prog *prog); 2555 2483 void bpf_task_storage_free(struct task_struct *task); 2556 2484 void bpf_cgrp_storage_free(struct cgroup *cgroup); 2557 2485 bool bpf_prog_has_kfunc_call(const struct bpf_prog *prog); ··· 2671 2593 static inline int bpf_obj_get_user(const char __user *pathname, int flags) 2672 2594 { 2673 2595 return -EOPNOTSUPP; 2596 + } 2597 + 2598 + static inline bool bpf_token_capable(const struct bpf_token *token, int cap) 2599 + { 2600 + return capable(cap) || (cap != CAP_SYS_ADMIN && capable(CAP_SYS_ADMIN)); 2601 + } 2602 + 2603 + static inline void bpf_token_inc(struct bpf_token *token) 2604 + { 2605 + } 2606 + 2607 + static inline void bpf_token_put(struct bpf_token *token) 2608 + { 2609 + } 2610 + 2611 + static inline struct bpf_token *bpf_token_get_from_fd(u32 ufd) 2612 + { 2613 + return ERR_PTR(-EOPNOTSUPP); 2674 2614 } 2675 2615 2676 2616 static inline void __dev_flush(void) ··· 2814 2718 } 2815 2719 2816 2720 static inline const struct bpf_func_proto * 2817 - bpf_base_func_proto(enum bpf_func_id func_id) 2721 + bpf_base_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) 2818 2722 { 2819 2723 return NULL; 2820 2724 }
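The stub variant of `bpf_token_capable()` added in this header (the one that sits next to the other `-EOPNOTSUPP` stubs) ignores the token and falls back to a plain capability check in which `CAP_SYS_ADMIN` stands in for any other capability. A minimal userspace sketch of that rule follows. `model_capable` and `model_token_capable` are hypothetical names, and the caller's capabilities are modelled as a bitmask purely for illustration; the token-aware implementation lives outside the lines shown here and is not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Capability numbers as defined in include/uapi/linux/capability.h. */
#define CAP_NET_ADMIN 12
#define CAP_SYS_ADMIN 21
#define CAP_PERFMON   38
#define CAP_BPF       39

/* Hypothetical stand-in for capable(): capabilities held, one bit each. */
static bool model_capable(uint64_t held, int cap)
{
	return (held >> cap) & 1;
}

/* Mirrors the expression in the stub: the requested capability itself,
 * or CAP_SYS_ADMIN as a superset of every capability other than itself.
 */
static bool model_token_capable(uint64_t held, int cap)
{
	return model_capable(held, cap) ||
	       (cap != CAP_SYS_ADMIN && model_capable(held, CAP_SYS_ADMIN));
}
```

The `cap != CAP_SYS_ADMIN` guard only avoids evaluating the same check twice; it does not change the result.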

+2 -1

include/linux/bpf_verifier.h

@@ -453,7 +453,7 @@
 
 #define bpf_get_spilled_reg(slot, frame, mask)			\
 	(((slot < frame->allocated_stack / BPF_REG_SIZE) &&	\
-	  ((1 << frame->stack[slot].slot_type[0]) & (mask))) \
+	  ((1 << frame->stack[slot].slot_type[BPF_REG_SIZE - 1]) & (mask))) \
 	 ? &frame->stack[slot].spilled_ptr : NULL)
 
 /* Iterate over 'frame', setting 'reg' to either NULL or a spilled register. */
@@ -662,6 +662,7 @@
 	u32 prev_insn_idx;
 	struct bpf_prog *prog;		/* eBPF program being verified */
 	const struct bpf_verifier_ops *ops;
+	struct module *attach_btf_mod;	/* The owner module of prog->aux->attach_btf */
 	struct bpf_verifier_stack_elem *head; /* stack of verifier states to be processed */
 	int stack_size;			/* number of states to be processed */
 	bool strict_alignment;		/* perform strict pointer alignment checks */
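The `bpf_get_spilled_reg()` change switches the slot-type test from byte 0 to byte `BPF_REG_SIZE - 1` of the slot. The sketch below models the new test in userspace. The struct layouts, the enum values and the function `get_spilled_reg` are hypothetical simplifications, and the motivation given in the comments (that a spill narrower than 8 bytes marks only the highest-indexed bytes as spilled) is an assumption about verifier convention that the lines shown here do not confirm.

```c
#include <stddef.h>

#define BPF_REG_SIZE 8

/* Simplified slot kinds; names follow the verifier, values are local. */
enum slot_kind { STACK_INVALID, STACK_SPILL, STACK_MISC, STACK_ZERO };

struct model_slot {
	unsigned char slot_type[BPF_REG_SIZE];
	int spilled_ptr;		/* stands in for struct bpf_reg_state */
};

struct model_frame {
	int allocated_stack;		/* in bytes */
	struct model_slot *stack;
};

/* Same shape as the macro after the change: bounds check, then test the
 * kind recorded in the LAST byte of the slot against a bitmask of kinds.
 */
static int *get_spilled_reg(struct model_frame *frame, int slot,
			    unsigned int mask)
{
	if (slot >= frame->allocated_stack / BPF_REG_SIZE)
		return NULL;
	if (!((1u << frame->stack[slot].slot_type[BPF_REG_SIZE - 1]) & mask))
		return NULL;
	return &frame->stack[slot].spilled_ptr;
}
```

With a slot whose first four bytes are `STACK_MISC` and last four are `STACK_SPILL`, this version returns the spilled register, whereas a test on byte 0 would return `NULL`.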

+13

include/linux/btf.h

@@ -137,6 +137,7 @@
 
 extern const struct file_operations btf_fops;
 
+const char *btf_get_name(const struct btf *btf);
 void btf_get(struct btf *btf);
 void btf_put(struct btf *btf);
 int btf_new_fd(const union bpf_attr *attr, bpfptr_t uattr, u32 uattr_sz);
@@ -495,6 +496,18 @@
 }
 
 struct bpf_verifier_log;
+
+#if defined(CONFIG_BPF_JIT) && defined(CONFIG_BPF_SYSCALL)
+struct bpf_struct_ops;
+int __register_bpf_struct_ops(struct bpf_struct_ops *st_ops);
+const struct bpf_struct_ops_desc *bpf_struct_ops_find_value(struct btf *btf, u32 value_id);
+const struct bpf_struct_ops_desc *bpf_struct_ops_find(struct btf *btf, u32 type_id);
+#else
+static inline const struct bpf_struct_ops_desc *bpf_struct_ops_find(struct btf *btf, u32 type_id)
+{
+	return NULL;
+}
+#endif
 
 #ifdef CONFIG_BPF_SYSCALL
 const struct btf_type *btf_type_by_id(const struct btf *btf, u32 type_id);

+2 -1

include/linux/filter.h

@@ -955,6 +955,7 @@
 bool bpf_jit_supports_kfunc_call(void);
 bool bpf_jit_supports_far_kfunc_call(void);
 bool bpf_jit_supports_exceptions(void);
+bool bpf_jit_supports_ptr_xchg(void);
 void arch_bpf_stack_walk(bool (*consume_fn)(void *cookie, u64 ip, u64 sp, u64 bp), void *cookie);
 bool bpf_helper_changes_pkt_data(void *func);
 
@@ -1139,7 +1140,7 @@
 		return false;
 	if (!bpf_jit_harden)
 		return false;
-	if (bpf_jit_harden == 1 && bpf_capable())
+	if (bpf_jit_harden == 1 && bpf_token_capable(prog->aux->token, CAP_BPF))
 		return false;
 
 	return true;
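The second hunk changes who is exempt from constant blinding when `bpf_jit_harden` is 1: the exemption now depends on `bpf_token_capable(prog->aux->token, CAP_BPF)` rather than on `bpf_capable()`. A minimal sketch of the visible decision logic follows. `model_blinding_enabled` is a hypothetical name, the capability check is reduced to a boolean parameter, and the condition that precedes the first `return false;` in the real function lies outside the hunk and is omitted.

```c
#include <stdbool.h>

/* jit_harden: 0 = off, 1 = harden for callers lacking the capability,
 * 2 = harden for everyone (values as used by the code in the hunk).
 * has_cap_bpf: result of the CAP_BPF check, token-aware after this change.
 */
static bool model_blinding_enabled(int jit_harden, bool has_cap_bpf)
{
	if (!jit_harden)
		return false;
	if (jit_harden == 1 && has_cap_bpf)
		return false;

	return true;
}
```

Only the `jit_harden == 1` row is affected by the switch to a token-aware check; with the value 2 the capability result is never consulted.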

+11 -4

include/linux/lsm_hook_defs.h

@@ -404,10 +404,17 @@
 LSM_HOOK(int, 0, bpf, int cmd, union bpf_attr *attr, unsigned int size)
 LSM_HOOK(int, 0, bpf_map, struct bpf_map *map, fmode_t fmode)
 LSM_HOOK(int, 0, bpf_prog, struct bpf_prog *prog)
-LSM_HOOK(int, 0, bpf_map_alloc_security, struct bpf_map *map)
-LSM_HOOK(void, LSM_RET_VOID, bpf_map_free_security, struct bpf_map *map)
-LSM_HOOK(int, 0, bpf_prog_alloc_security, struct bpf_prog_aux *aux)
-LSM_HOOK(void, LSM_RET_VOID, bpf_prog_free_security, struct bpf_prog_aux *aux)
+LSM_HOOK(int, 0, bpf_map_create, struct bpf_map *map, union bpf_attr *attr,
+	 struct bpf_token *token)
+LSM_HOOK(void, LSM_RET_VOID, bpf_map_free, struct bpf_map *map)
+LSM_HOOK(int, 0, bpf_prog_load, struct bpf_prog *prog, union bpf_attr *attr,
+	 struct bpf_token *token)
+LSM_HOOK(void, LSM_RET_VOID, bpf_prog_free, struct bpf_prog *prog)
+LSM_HOOK(int, 0, bpf_token_create, struct bpf_token *token, union bpf_attr *attr,
+	 struct path *path)
+LSM_HOOK(void, LSM_RET_VOID, bpf_token_free, struct bpf_token *token)
+LSM_HOOK(int, 0, bpf_token_cmd, const struct bpf_token *token, enum bpf_cmd cmd)
+LSM_HOOK(int, 0, bpf_token_capable, const struct bpf_token *token, int cap)
 #endif /* CONFIG_BPF_SYSCALL */
 
 LSM_HOOK(int, 0, locked_down, enum lockdown_reason what)

+36 -7

include/linux/security.h

··· 32 32 #include <linux/string.h> 33 33 #include <linux/mm.h> 34 34 #include <linux/sockptr.h> 35 + #include <linux/bpf.h> 35 36 #include <uapi/linux/lsm.h> 36 37 37 38 struct linux_binprm; ··· 2065 2064 union bpf_attr; 2066 2065 struct bpf_map; 2067 2066 struct bpf_prog; 2068 - struct bpf_prog_aux; 2067 + struct bpf_token; 2069 2068 #ifdef CONFIG_SECURITY 2070 2069 extern int security_bpf(int cmd, union bpf_attr *attr, unsigned int size); 2071 2070 extern int security_bpf_map(struct bpf_map *map, fmode_t fmode); 2072 2071 extern int security_bpf_prog(struct bpf_prog *prog); 2073 - extern int security_bpf_map_alloc(struct bpf_map *map); 2072 + extern int security_bpf_map_create(struct bpf_map *map, union bpf_attr *attr, 2073 + struct bpf_token *token); 2074 2074 extern void security_bpf_map_free(struct bpf_map *map); 2075 - extern int security_bpf_prog_alloc(struct bpf_prog_aux *aux); 2076 - extern void security_bpf_prog_free(struct bpf_prog_aux *aux); 2075 + extern int security_bpf_prog_load(struct bpf_prog *prog, union bpf_attr *attr, 2076 + struct bpf_token *token); 2077 + extern void security_bpf_prog_free(struct bpf_prog *prog); 2078 + extern int security_bpf_token_create(struct bpf_token *token, union bpf_attr *attr, 2079 + struct path *path); 2080 + extern void security_bpf_token_free(struct bpf_token *token); 2081 + extern int security_bpf_token_cmd(const struct bpf_token *token, enum bpf_cmd cmd); 2082 + extern int security_bpf_token_capable(const struct bpf_token *token, int cap); 2077 2083 #else 2078 2084 static inline int security_bpf(int cmd, union bpf_attr *attr, 2079 2085 unsigned int size) ··· 2098 2090 return 0; 2099 2091 } 2100 2092 2101 - static inline int security_bpf_map_alloc(struct bpf_map *map) 2093 + static inline int security_bpf_map_create(struct bpf_map *map, union bpf_attr *attr, 2094 + struct bpf_token *token) 2102 2095 { 2103 2096 return 0; 2104 2097 } ··· 2107 2098 static inline void security_bpf_map_free(struct bpf_map *map) 2108 
2099 { } 2109 2100 2110 - static inline int security_bpf_prog_alloc(struct bpf_prog_aux *aux) 2101 + static inline int security_bpf_prog_load(struct bpf_prog *prog, union bpf_attr *attr, 2102 + struct bpf_token *token) 2111 2103 { 2112 2104 return 0; 2113 2105 } 2114 2106 2115 - static inline void security_bpf_prog_free(struct bpf_prog_aux *aux) 2107 + static inline void security_bpf_prog_free(struct bpf_prog *prog) 2116 2108 { } 2109 + 2110 + static inline int security_bpf_token_create(struct bpf_token *token, union bpf_attr *attr, 2111 + struct path *path) 2112 + { 2113 + return 0; 2114 + } 2115 + 2116 + static inline void security_bpf_token_free(struct bpf_token *token) 2117 + { } 2118 + 2119 + static inline int security_bpf_token_cmd(const struct bpf_token *token, enum bpf_cmd cmd) 2120 + { 2121 + return 0; 2122 + } 2123 + 2124 + static inline int security_bpf_token_capable(const struct bpf_token *token, int cap) 2125 + { 2126 + return 0; 2127 + } 2117 2128 #endif /* CONFIG_SECURITY */ 2118 2129 #endif /* CONFIG_BPF_SYSCALL */ 2119 2130

+39

include/net/request_sock.h

@@ -83,6 +83,45 @@
 	return (struct sock *)req;
 }
 
+/**
+ * skb_steal_sock - steal a socket from an sk_buff
+ * @skb: sk_buff to steal the socket from
+ * @refcounted: is set to true if the socket is reference-counted
+ * @prefetched: is set to true if the socket was assigned from bpf
+ */
+static inline struct sock *skb_steal_sock(struct sk_buff *skb,
+					  bool *refcounted, bool *prefetched)
+{
+	struct sock *sk = skb->sk;
+
+	if (!sk) {
+		*prefetched = false;
+		*refcounted = false;
+		return NULL;
+	}
+
+	*prefetched = skb_sk_is_prefetched(skb);
+	if (*prefetched) {
+#if IS_ENABLED(CONFIG_SYN_COOKIES)
+		if (sk->sk_state == TCP_NEW_SYN_RECV && inet_reqsk(sk)->syncookie) {
+			struct request_sock *req = inet_reqsk(sk);
+
+			*refcounted = false;
+			sk = req->rsk_listener;
+			req->rsk_listener = NULL;
+			return sk;
+		}
+#endif
+		*refcounted = sk_is_refcounted(sk);
+	} else {
+		*refcounted = true;
+	}
+
+	skb->destructor = NULL;
+	skb->sk = NULL;
+	return sk;
+}
+
 static inline struct request_sock *
 reqsk_alloc(const struct request_sock_ops *ops, struct sock *sk_listener,
 	    bool attach_listener)
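The relocated `skb_steal_sock()` gains one new path: when the prefetched socket is a syncookie request socket, the function hands back the listener stored in the request and returns early, leaving `skb->sk` in place. The userspace model below traces the three outcomes. All `model_*` types and names are hypothetical, and the kernel predicates (`skb_sk_is_prefetched()`, `sk_is_refcounted()`, the `TCP_NEW_SYN_RECV` and `syncookie` tests) are collapsed into plain boolean fields.

```c
#include <stdbool.h>
#include <stddef.h>

struct model_sock {
	bool refcounted_kind;		/* what sk_is_refcounted() would report */
	bool is_syncookie_req;		/* TCP_NEW_SYN_RECV && ->syncookie */
	struct model_sock *listener;	/* rsk_listener of a request socket */
};

struct model_skb {
	struct model_sock *sk;
	bool sk_prefetched;		/* what skb_sk_is_prefetched() would report */
};

static struct model_sock *model_steal_sock(struct model_skb *skb,
					   bool *refcounted, bool *prefetched)
{
	struct model_sock *sk = skb->sk;

	if (!sk) {
		*prefetched = false;
		*refcounted = false;
		return NULL;
	}

	*prefetched = skb->sk_prefetched;
	if (*prefetched) {
		if (sk->is_syncookie_req) {
			struct model_sock *listener = sk->listener;

			/* Early return: the request socket stays attached to
			 * the skb, only the listener is handed to the caller.
			 */
			*refcounted = false;
			sk->listener = NULL;
			return listener;
		}
		*refcounted = sk->refcounted_kind;
	} else {
		*refcounted = true;
	}

	skb->sk = NULL;
	return sk;
}
```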

-25

include/net/sock.h

@@ -2830,31 +2830,6 @@
 	return !sk_fullsock(sk) || !sock_flag(sk, SOCK_RCU_FREE);
 }
 
-/**
- * skb_steal_sock - steal a socket from an sk_buff
- * @skb: sk_buff to steal the socket from
- * @refcounted: is set to true if the socket is reference-counted
- * @prefetched: is set to true if the socket was assigned from bpf
- */
-static inline struct sock *
-skb_steal_sock(struct sk_buff *skb, bool *refcounted, bool *prefetched)
-{
-	if (skb->sk) {
-		struct sock *sk = skb->sk;
-
-		*refcounted = true;
-		*prefetched = skb_sk_is_prefetched(skb);
-		if (*prefetched)
-			*refcounted = sk_is_refcounted(sk);
-		skb->destructor = NULL;
-		skb->sk = NULL;
-		return sk;
-	}
-	*prefetched = false;
-	*refcounted = false;
-	return NULL;
-}
-
 /* Checks if this SKB belongs to an HW offloaded socket
  * and whether any SW fallbacks are required based on dev.
  * Check decrypted mark in case skb_orphan() cleared socket.

+45

include/net/tcp.h

@@ -498,6 +498,22 @@
 					    struct tcp_options_received *tcp_opt,
 					    int mss, u32 tsoff);
 
+#if IS_ENABLED(CONFIG_BPF)
+struct bpf_tcp_req_attrs {
+	u32 rcv_tsval;
+	u32 rcv_tsecr;
+	u16 mss;
+	u8 rcv_wscale;
+	u8 snd_wscale;
+	u8 ecn_ok;
+	u8 wscale_ok;
+	u8 sack_ok;
+	u8 tstamp_ok;
+	u8 usec_ts_ok;
+	u8 reserved[3];
+};
+#endif
+
 #ifdef CONFIG_SYN_COOKIES
 
 /* Syncookies use a monotonic timer which increments every 60 seconds.
@@ -577,6 +593,15 @@
 	return val;
 }
 
+/* Convert one nsec 64bit timestamp to ts (ms or usec resolution) */
+static inline u64 tcp_ns_to_ts(bool usec_ts, u64 val)
+{
+	if (usec_ts)
+		return div_u64(val, NSEC_PER_USEC);
+
+	return div_u64(val, NSEC_PER_MSEC);
+}
+
 u32 __cookie_v4_init_sequence(const struct iphdr *iph, const struct tcphdr *th,
 			      u16 *mssp);
 __u32 cookie_v4_init_sequence(const struct sk_buff *skb, __u16 *mss);
@@ -589,6 +614,26 @@
 	return READ_ONCE(net->ipv4.sysctl_tcp_ecn) ||
 		dst_feature(dst, RTAX_FEATURE_ECN);
 }
+
+#if IS_ENABLED(CONFIG_BPF)
+static inline bool cookie_bpf_ok(struct sk_buff *skb)
+{
+	return skb->sk;
+}
+
+struct request_sock *cookie_bpf_check(struct sock *sk, struct sk_buff *skb);
+#else
+static inline bool cookie_bpf_ok(struct sk_buff *skb)
+{
+	return false;
+}
+
+static inline struct request_sock *cookie_bpf_check(struct net *net, struct sock *sk,
+						    struct sk_buff *skb)
+{
+	return NULL;
+}
+#endif
 
 /* From net/ipv6/syncookies.c */
 int __cookie_v6_check(const struct ipv6hdr *iph, const struct tcphdr *th);
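`tcp_ns_to_ts()` converts a nanosecond timestamp into TCP timestamp units, using microsecond resolution when `usec_ts` is set and millisecond resolution otherwise. A minimal userspace equivalent is sketched below. The name `ns_to_ts` is hypothetical, `div_u64()` is replaced by ordinary 64-bit division, and the two constants are redefined locally with their kernel values.

```c
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_USEC 1000ULL
#define NSEC_PER_MSEC 1000000ULL

/* Truncating conversion from nanoseconds to usec or msec ticks. */
static inline uint64_t ns_to_ts(bool usec_ts, uint64_t val)
{
	if (usec_ts)
		return val / NSEC_PER_USEC;

	return val / NSEC_PER_MSEC;
}
```

For example, 1,500,000 ns becomes 1500 ticks at microsecond resolution and 1 tick at millisecond resolution; the remainder is discarded in both cases.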

+74 -4

include/uapi/linux/bpf.h

··· 847 847 * Returns zero on success. On error, -1 is returned and *errno* 848 848 * is set appropriately. 849 849 * 850 + * BPF_TOKEN_CREATE 851 + * Description 852 + * Create BPF token with embedded information about what 853 + * BPF-related functionality it allows: 854 + * - a set of allowed bpf() syscall commands; 855 + * - a set of allowed BPF map types to be created with 856 + * BPF_MAP_CREATE command, if BPF_MAP_CREATE itself is allowed; 857 + * - a set of allowed BPF program types and BPF program attach 858 + * types to be loaded with BPF_PROG_LOAD command, if 859 + * BPF_PROG_LOAD itself is allowed. 860 + * 861 + * BPF token is created (derived) from an instance of BPF FS, 862 + * assuming it has necessary delegation mount options specified. 863 + * This BPF token can be passed as an extra parameter to various 864 + * bpf() syscall commands to grant BPF subsystem functionality to 865 + * unprivileged processes. 866 + * 867 + * When created, BPF token is "associated" with the owning 868 + * user namespace of BPF FS instance (super block) that it was 869 + * derived from, and subsequent BPF operations performed with 870 + * BPF token would be performing capabilities checks (i.e., 871 + * CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, CAP_SYS_ADMIN) within 872 + * that user namespace. Without BPF token, such capabilities 873 + * have to be granted in init user namespace, making bpf() 874 + * syscall incompatible with user namespace, for the most part. 875 + * 876 + * Return 877 + * A new file descriptor (a nonnegative integer), or -1 if an 878 + * error occurred (in which case, *errno* is set appropriately). 879 + * 850 880 * NOTES 851 881 * eBPF objects (maps and programs) can be shared between processes. 
852 882 * ··· 931 901 BPF_ITER_CREATE, 932 902 BPF_LINK_DETACH, 933 903 BPF_PROG_BIND_MAP, 904 + BPF_TOKEN_CREATE, 905 + __MAX_BPF_CMD, 934 906 }; 935 907 936 908 enum bpf_map_type { ··· 983 951 BPF_MAP_TYPE_BLOOM_FILTER, 984 952 BPF_MAP_TYPE_USER_RINGBUF, 985 953 BPF_MAP_TYPE_CGRP_STORAGE, 954 + __MAX_BPF_MAP_TYPE 986 955 }; 987 956 988 957 /* Note that tracing related programs such as ··· 1028 995 BPF_PROG_TYPE_SK_LOOKUP, 1029 996 BPF_PROG_TYPE_SYSCALL, /* a program that can execute syscalls */ 1030 997 BPF_PROG_TYPE_NETFILTER, 998 + __MAX_BPF_PROG_TYPE 1031 999 }; 1032 1000 1033 1001 enum bpf_attach_type { ··· 1364 1330 1365 1331 /* Get path from provided FD in BPF_OBJ_PIN/BPF_OBJ_GET commands */ 1366 1332 BPF_F_PATH_FD = (1U << 14), 1333 + 1334 + /* Flag for value_type_btf_obj_fd, the fd is available */ 1335 + BPF_F_VTYPE_BTF_OBJ_FD = (1U << 15), 1336 + 1337 + /* BPF token FD is passed in a corresponding command's token_fd field */ 1338 + BPF_F_TOKEN_FD = (1U << 16), 1367 1339 }; 1368 1340 1369 1341 /* Flags for BPF_PROG_QUERY. */ ··· 1443 1403 * to using 5 hash functions). 1444 1404 */ 1445 1405 __u64 map_extra; 1406 + 1407 + __s32 value_type_btf_obj_fd; /* fd pointing to a BTF 1408 + * type data for 1409 + * btf_vmlinux_value_type_id. 1410 + */ 1411 + /* BPF token FD to use with BPF_MAP_CREATE operation. 1412 + * If provided, map_flags should have BPF_F_TOKEN_FD flag set. 1413 + */ 1414 + __s32 map_token_fd; 1446 1415 }; 1447 1416 1448 1417 struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */ ··· 1521 1472 * truncated), or smaller (if log buffer wasn't filled completely). 1522 1473 */ 1523 1474 __u32 log_true_size; 1475 + /* BPF token FD to use with BPF_PROG_LOAD operation. 1476 + * If provided, prog_flags should have BPF_F_TOKEN_FD flag set. 
1477 + */ 1478 + __s32 prog_token_fd; 1524 1479 }; 1525 1480 1526 1481 struct { /* anonymous struct used by BPF_OBJ_* commands */ ··· 1637 1584 * truncated), or smaller (if log buffer wasn't filled completely). 1638 1585 */ 1639 1586 __u32 btf_log_true_size; 1587 + __u32 btf_flags; 1588 + /* BPF token FD to use with BPF_BTF_LOAD operation. 1589 + * If provided, btf_flags should have BPF_F_TOKEN_FD flag set. 1590 + */ 1591 + __s32 btf_token_fd; 1640 1592 }; 1641 1593 1642 1594 struct { ··· 1771 1713 __u32 map_fd; 1772 1714 __u32 flags; /* extra flags */ 1773 1715 } prog_bind_map; 1716 + 1717 + struct { /* struct used by BPF_TOKEN_CREATE command */ 1718 + __u32 flags; 1719 + __u32 bpffs_fd; 1720 + } token_create; 1774 1721 1775 1722 } __attribute__((aligned(8))); 1776 1723 ··· 4902 4839 * going through the CPU's backlog queue. 4903 4840 * 4904 4841 * The *flags* argument is reserved and must be 0. The helper is 4905 - * currently only supported for tc BPF program types at the ingress 4906 - * hook and for veth device types. The peer device must reside in a 4907 - * different network namespace. 4842 + * currently only supported for tc BPF program types at the 4843 + * ingress hook and for veth and netkit target device types. The 4844 + * peer device must reside in a different network namespace. 4908 4845 * Return 4909 4846 * The helper returns **TC_ACT_REDIRECT** on success or 4910 4847 * **TC_ACT_SHOT** on error. 
··· 6550 6487 __u32 btf_id; 6551 6488 __u32 btf_key_type_id; 6552 6489 __u32 btf_value_type_id; 6553 - __u32 :32; /* alignment pad */ 6490 + __u32 btf_vmlinux_id; 6554 6491 __u64 map_extra; 6555 6492 } __attribute__((aligned(8))); 6556 6493 ··· 6626 6563 __u32 count; /* in/out: kprobe_multi function count */ 6627 6564 __u32 flags; 6628 6565 __u64 missed; 6566 + __aligned_u64 cookies; 6629 6567 } kprobe_multi; 6630 6568 struct { 6631 6569 __aligned_u64 path; ··· 6646 6582 __aligned_u64 file_name; /* in/out */ 6647 6583 __u32 name_len; 6648 6584 __u32 offset; /* offset from file_name */ 6585 + __u64 cookie; 6649 6586 } uprobe; /* BPF_PERF_EVENT_UPROBE, BPF_PERF_EVENT_URETPROBE */ 6650 6587 struct { 6651 6588 __aligned_u64 func_name; /* in/out */ ··· 6654 6589 __u32 offset; /* offset from func_name */ 6655 6590 __u64 addr; 6656 6591 __u64 missed; 6592 + __u64 cookie; 6657 6593 } kprobe; /* BPF_PERF_EVENT_KPROBE, BPF_PERF_EVENT_KRETPROBE */ 6658 6594 struct { 6659 6595 __aligned_u64 tp_name; /* in/out */ 6660 6596 __u32 name_len; 6597 + __u32 :32; 6598 + __u64 cookie; 6661 6599 } tracepoint; /* BPF_PERF_EVENT_TRACEPOINT */ 6662 6600 struct { 6663 6601 __u64 config; 6664 6602 __u32 type; 6603 + __u32 :32; 6604 + __u64 cookie; 6665 6605 } event; /* BPF_PERF_EVENT_EVENT */ 6666 6606 }; 6667 6607 } perf_event;
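Reviewer note on the UAPI hunk above: the new `*_token_fd` fields are paired with `BPF_F_TOKEN_FD`, and the header comments say the flags field "should have BPF_F_TOKEN_FD flag set" when a token FD is provided. The sketch below shows one way userspace might keep the two in step. It is only an illustration under stated assumptions: `map_create_attr_sketch` is a stand-in, not the real `union bpf_attr` layout, the `sketch_*` helpers are hypothetical names, and how the kernel validates these fields is not visible in this part of the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag values copied from the UAPI hunk above. */
#define SKETCH_BPF_F_VTYPE_BTF_OBJ_FD (1U << 15)
#define SKETCH_BPF_F_TOKEN_FD         (1U << 16)

/* Stand-in for the BPF_MAP_CREATE fields touched by this diff.
 * Assumption: only the relevant fields, NOT the real union bpf_attr layout.
 */
struct map_create_attr_sketch {
	uint32_t map_flags;
	int32_t  value_type_btf_obj_fd;
	int32_t  map_token_fd;
};

/* Hypothetical helper: store the token FD and set the matching flag together,
 * so a zero-initialized attr never looks like "token FD 0 was passed".
 */
static void sketch_attr_set_token(struct map_create_attr_sketch *attr, int token_fd)
{
	assert(attr != NULL);
	attr->map_token_fd = token_fd;
	attr->map_flags |= SKETCH_BPF_F_TOKEN_FD;
}

/* Hypothetical helper: return the token FD only if the flag says one is
 * present, otherwise -1.
 */
static int sketch_attr_get_token(const struct map_create_attr_sketch *attr)
{
	assert(attr != NULL);
	if (attr->map_flags & SKETCH_BPF_F_TOKEN_FD)
		return attr->map_token_fd;
	return -1;
}
```

The reason for gating on a flag rather than on the field value is that 0 is a valid file descriptor, so a zeroed attribute struct could otherwise be misread as carrying a token. The same pairing would apply to `prog_token_fd` with `prog_flags` and `btf_token_fd` with `btf_flags`, as the comments in the hunk state.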

+1 -1

kernel/bpf/Makefile

··· 6 6 endif 7 7 CFLAGS_core.o += $(call cc-disable-warning, override-init) $(cflags-nogcse-yy) 8 8 9 - obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o log.o 9 + obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o log.o token.o 10 10 obj-$(CONFIG_BPF_SYSCALL) += bpf_iter.o map_iter.o task_iter.o prog_iter.o link_iter.o 11 11 obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o bloom_filter.o 12 12 obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o

+1 -1

kernel/bpf/arraymap.c

··· 82 82 bool percpu = attr->map_type == BPF_MAP_TYPE_PERCPU_ARRAY; 83 83 int numa_node = bpf_map_attr_numa_node(attr); 84 84 u32 elem_size, index_mask, max_entries; 85 - bool bypass_spec_v1 = bpf_bypass_spec_v1(); 85 + bool bypass_spec_v1 = bpf_bypass_spec_v1(NULL); 86 86 u64 array_size, mask64; 87 87 struct bpf_array *array; 88 88

+10 -5

kernel/bpf/bpf_lsm.c

··· 260 260 BTF_SET_START(sleepable_lsm_hooks) 261 261 BTF_ID(func, bpf_lsm_bpf) 262 262 BTF_ID(func, bpf_lsm_bpf_map) 263 - BTF_ID(func, bpf_lsm_bpf_map_alloc_security) 264 - BTF_ID(func, bpf_lsm_bpf_map_free_security) 263 + BTF_ID(func, bpf_lsm_bpf_map_create) 264 + BTF_ID(func, bpf_lsm_bpf_map_free) 265 265 BTF_ID(func, bpf_lsm_bpf_prog) 266 + BTF_ID(func, bpf_lsm_bpf_prog_load) 267 + BTF_ID(func, bpf_lsm_bpf_prog_free) 268 + BTF_ID(func, bpf_lsm_bpf_token_create) 269 + BTF_ID(func, bpf_lsm_bpf_token_free) 270 + BTF_ID(func, bpf_lsm_bpf_token_cmd) 271 + BTF_ID(func, bpf_lsm_bpf_token_capable) 266 272 BTF_ID(func, bpf_lsm_bprm_check_security) 267 273 BTF_ID(func, bpf_lsm_bprm_committed_creds) 268 274 BTF_ID(func, bpf_lsm_bprm_committing_creds) ··· 363 357 BTF_SET_END(sleepable_lsm_hooks) 364 358 365 359 BTF_SET_START(untrusted_lsm_hooks) 366 - BTF_ID(func, bpf_lsm_bpf_map_free_security) 367 - BTF_ID(func, bpf_lsm_bpf_prog_alloc_security) 368 - BTF_ID(func, bpf_lsm_bpf_prog_free_security) 360 + BTF_ID(func, bpf_lsm_bpf_map_free) 361 + BTF_ID(func, bpf_lsm_bpf_prog_free) 369 362 BTF_ID(func, bpf_lsm_file_alloc_security) 370 363 BTF_ID(func, bpf_lsm_file_free_security) 371 364 #ifdef CONFIG_SECURITY_NETWORK

+236 -211

kernel/bpf/bpf_struct_ops.c

··· 13 13 #include <linux/btf_ids.h> 14 14 #include <linux/rcupdate_wait.h> 15 15 16 - enum bpf_struct_ops_state { 17 - BPF_STRUCT_OPS_STATE_INIT, 18 - BPF_STRUCT_OPS_STATE_INUSE, 19 - BPF_STRUCT_OPS_STATE_TOBEFREE, 20 - BPF_STRUCT_OPS_STATE_READY, 21 - }; 22 - 23 - #define BPF_STRUCT_OPS_COMMON_VALUE \ 24 - refcount_t refcnt; \ 25 - enum bpf_struct_ops_state state 26 - 27 16 struct bpf_struct_ops_value { 28 - BPF_STRUCT_OPS_COMMON_VALUE; 17 + struct bpf_struct_ops_common_value common; 29 18 char data[] ____cacheline_aligned_in_smp; 30 19 }; 31 20 32 21 struct bpf_struct_ops_map { 33 22 struct bpf_map map; 34 23 struct rcu_head rcu; 35 - const struct bpf_struct_ops *st_ops; 24 + const struct bpf_struct_ops_desc *st_ops_desc; 36 25 /* protect map_update */ 37 26 struct mutex lock; 38 27 /* link has all the bpf_links that is populated ··· 29 40 * (in kvalue.data). 30 41 */ 31 42 struct bpf_link **links; 43 + u32 links_cnt; 32 44 /* image is a page that has all the trampolines 33 45 * that stores the func args before calling the bpf_prog. 34 46 * A PAGE_SIZE "image" is enough to store all trampoline for 35 47 * "links[]". 36 48 */ 37 49 void *image; 50 + /* The owner moduler's btf. */ 51 + struct btf *btf; 38 52 /* uvalue->data stores the kernel struct 39 53 * (e.g. tcp_congestion_ops) that is more useful 40 54 * to userspace than the kvalue. For example, ··· 62 70 #define VALUE_PREFIX "bpf_struct_ops_" 63 71 #define VALUE_PREFIX_LEN (sizeof(VALUE_PREFIX) - 1) 64 72 65 - /* bpf_struct_ops_##_name (e.g. bpf_struct_ops_tcp_congestion_ops) is 66 - * the map's value exposed to the userspace and its btf-type-id is 67 - * stored at the map->btf_vmlinux_value_type_id. 
68 - * 69 - */ 70 - #define BPF_STRUCT_OPS_TYPE(_name) \ 71 - extern struct bpf_struct_ops bpf_##_name; \ 72 - \ 73 - struct bpf_struct_ops_##_name { \ 74 - BPF_STRUCT_OPS_COMMON_VALUE; \ 75 - struct _name data ____cacheline_aligned_in_smp; \ 76 - }; 77 - #include "bpf_struct_ops_types.h" 78 - #undef BPF_STRUCT_OPS_TYPE 79 - 80 - enum { 81 - #define BPF_STRUCT_OPS_TYPE(_name) BPF_STRUCT_OPS_TYPE_##_name, 82 - #include "bpf_struct_ops_types.h" 83 - #undef BPF_STRUCT_OPS_TYPE 84 - __NR_BPF_STRUCT_OPS_TYPE, 85 - }; 86 - 87 - static struct bpf_struct_ops * const bpf_struct_ops[] = { 88 - #define BPF_STRUCT_OPS_TYPE(_name) \ 89 - [BPF_STRUCT_OPS_TYPE_##_name] = &bpf_##_name, 90 - #include "bpf_struct_ops_types.h" 91 - #undef BPF_STRUCT_OPS_TYPE 92 - }; 93 - 94 73 const struct bpf_verifier_ops bpf_struct_ops_verifier_ops = { 95 74 }; 96 75 ··· 71 108 #endif 72 109 }; 73 110 74 - static const struct btf_type *module_type; 111 + BTF_ID_LIST(st_ops_ids) 112 + BTF_ID(struct, module) 113 + BTF_ID(struct, bpf_struct_ops_common_value) 75 114 76 - void bpf_struct_ops_init(struct btf *btf, struct bpf_verifier_log *log) 77 - { 78 - s32 type_id, value_id, module_id; 79 - const struct btf_member *member; 80 - struct bpf_struct_ops *st_ops; 81 - const struct btf_type *t; 82 - char value_name[128]; 83 - const char *mname; 84 - u32 i, j; 85 - 86 - /* Ensure BTF type is emitted for "struct bpf_struct_ops_##_name" */ 87 - #define BPF_STRUCT_OPS_TYPE(_name) BTF_TYPE_EMIT(struct bpf_struct_ops_##_name); 88 - #include "bpf_struct_ops_types.h" 89 - #undef BPF_STRUCT_OPS_TYPE 90 - 91 - module_id = btf_find_by_name_kind(btf, "module", BTF_KIND_STRUCT); 92 - if (module_id < 0) { 93 - pr_warn("Cannot find struct module in btf_vmlinux\n"); 94 - return; 95 - } 96 - module_type = btf_type_by_id(btf, module_id); 97 - 98 - for (i = 0; i < ARRAY_SIZE(bpf_struct_ops); i++) { 99 - st_ops = bpf_struct_ops[i]; 100 - 101 - if (strlen(st_ops->name) + VALUE_PREFIX_LEN >= 102 - sizeof(value_name)) { 103 - 
pr_warn("struct_ops name %s is too long\n", 104 - st_ops->name); 105 - continue; 106 - } 107 - sprintf(value_name, "%s%s", VALUE_PREFIX, st_ops->name); 108 - 109 - value_id = btf_find_by_name_kind(btf, value_name, 110 - BTF_KIND_STRUCT); 111 - if (value_id < 0) { 112 - pr_warn("Cannot find struct %s in btf_vmlinux\n", 113 - value_name); 114 - continue; 115 - } 116 - 117 - type_id = btf_find_by_name_kind(btf, st_ops->name, 118 - BTF_KIND_STRUCT); 119 - if (type_id < 0) { 120 - pr_warn("Cannot find struct %s in btf_vmlinux\n", 121 - st_ops->name); 122 - continue; 123 - } 124 - t = btf_type_by_id(btf, type_id); 125 - if (btf_type_vlen(t) > BPF_STRUCT_OPS_MAX_NR_MEMBERS) { 126 - pr_warn("Cannot support #%u members in struct %s\n", 127 - btf_type_vlen(t), st_ops->name); 128 - continue; 129 - } 130 - 131 - for_each_member(j, t, member) { 132 - const struct btf_type *func_proto; 133 - 134 - mname = btf_name_by_offset(btf, member->name_off); 135 - if (!*mname) { 136 - pr_warn("anon member in struct %s is not supported\n", 137 - st_ops->name); 138 - break; 139 - } 140 - 141 - if (__btf_member_bitfield_size(t, member)) { 142 - pr_warn("bit field member %s in struct %s is not supported\n", 143 - mname, st_ops->name); 144 - break; 145 - } 146 - 147 - func_proto = btf_type_resolve_func_ptr(btf, 148 - member->type, 149 - NULL); 150 - if (func_proto && 151 - btf_distill_func_proto(log, btf, 152 - func_proto, mname, 153 - &st_ops->func_models[j])) { 154 - pr_warn("Error in parsing func ptr %s in struct %s\n", 155 - mname, st_ops->name); 156 - break; 157 - } 158 - } 159 - 160 - if (j == btf_type_vlen(t)) { 161 - if (st_ops->init(btf)) { 162 - pr_warn("Error in init bpf_struct_ops %s\n", 163 - st_ops->name); 164 - } else { 165 - st_ops->type_id = type_id; 166 - st_ops->type = t; 167 - st_ops->value_id = value_id; 168 - st_ops->value_type = btf_type_by_id(btf, 169 - value_id); 170 - } 171 - } 172 - } 173 - } 115 + enum { 116 + IDX_MODULE_ID, 117 + IDX_ST_OPS_COMMON_VALUE_ID, 118 + }; 
174 119 175 120 extern struct btf *btf_vmlinux; 176 121 177 - static const struct bpf_struct_ops * 178 - bpf_struct_ops_find_value(u32 value_id) 122 + static bool is_valid_value_type(struct btf *btf, s32 value_id, 123 + const struct btf_type *type, 124 + const char *value_name) 179 125 { 180 - unsigned int i; 126 + const struct btf_type *common_value_type; 127 + const struct btf_member *member; 128 + const struct btf_type *vt, *mt; 181 129 182 - if (!value_id || !btf_vmlinux) 183 - return NULL; 184 - 185 - for (i = 0; i < ARRAY_SIZE(bpf_struct_ops); i++) { 186 - if (bpf_struct_ops[i]->value_id == value_id) 187 - return bpf_struct_ops[i]; 130 + vt = btf_type_by_id(btf, value_id); 131 + if (btf_vlen(vt) != 2) { 132 + pr_warn("The number of %s's members should be 2, but we get %d\n", 133 + value_name, btf_vlen(vt)); 134 + return false; 135 + } 136 + member = btf_type_member(vt); 137 + mt = btf_type_by_id(btf, member->type); 138 + common_value_type = btf_type_by_id(btf_vmlinux, 139 + st_ops_ids[IDX_ST_OPS_COMMON_VALUE_ID]); 140 + if (mt != common_value_type) { 141 + pr_warn("The first member of %s should be bpf_struct_ops_common_value\n", 142 + value_name); 143 + return false; 144 + } 145 + member++; 146 + mt = btf_type_by_id(btf, member->type); 147 + if (mt != type) { 148 + pr_warn("The second member of %s should be %s\n", 149 + value_name, btf_name_by_offset(btf, type->name_off)); 150 + return false; 188 151 } 189 152 190 - return NULL; 153 + return true; 191 154 } 192 155 193 - const struct bpf_struct_ops *bpf_struct_ops_find(u32 type_id) 156 + int bpf_struct_ops_desc_init(struct bpf_struct_ops_desc *st_ops_desc, 157 + struct btf *btf, 158 + struct bpf_verifier_log *log) 194 159 { 195 - unsigned int i; 160 + struct bpf_struct_ops *st_ops = st_ops_desc->st_ops; 161 + const struct btf_member *member; 162 + const struct btf_type *t; 163 + s32 type_id, value_id; 164 + char value_name[128]; 165 + const char *mname; 166 + int i; 196 167 197 - if (!type_id || !btf_vmlinux) 
198 - return NULL; 168 + if (strlen(st_ops->name) + VALUE_PREFIX_LEN >= 169 + sizeof(value_name)) { 170 + pr_warn("struct_ops name %s is too long\n", 171 + st_ops->name); 172 + return -EINVAL; 173 + } 174 + sprintf(value_name, "%s%s", VALUE_PREFIX, st_ops->name); 199 175 200 - for (i = 0; i < ARRAY_SIZE(bpf_struct_ops); i++) { 201 - if (bpf_struct_ops[i]->type_id == type_id) 202 - return bpf_struct_ops[i]; 176 + type_id = btf_find_by_name_kind(btf, st_ops->name, 177 + BTF_KIND_STRUCT); 178 + if (type_id < 0) { 179 + pr_warn("Cannot find struct %s in %s\n", 180 + st_ops->name, btf_get_name(btf)); 181 + return -EINVAL; 182 + } 183 + t = btf_type_by_id(btf, type_id); 184 + if (btf_type_vlen(t) > BPF_STRUCT_OPS_MAX_NR_MEMBERS) { 185 + pr_warn("Cannot support #%u members in struct %s\n", 186 + btf_type_vlen(t), st_ops->name); 187 + return -EINVAL; 203 188 } 204 189 205 - return NULL; 190 + value_id = btf_find_by_name_kind(btf, value_name, 191 + BTF_KIND_STRUCT); 192 + if (value_id < 0) { 193 + pr_warn("Cannot find struct %s in %s\n", 194 + value_name, btf_get_name(btf)); 195 + return -EINVAL; 196 + } 197 + if (!is_valid_value_type(btf, value_id, t, value_name)) 198 + return -EINVAL; 199 + 200 + for_each_member(i, t, member) { 201 + const struct btf_type *func_proto; 202 + 203 + mname = btf_name_by_offset(btf, member->name_off); 204 + if (!*mname) { 205 + pr_warn("anon member in struct %s is not supported\n", 206 + st_ops->name); 207 + return -EOPNOTSUPP; 208 + } 209 + 210 + if (__btf_member_bitfield_size(t, member)) { 211 + pr_warn("bit field member %s in struct %s is not supported\n", 212 + mname, st_ops->name); 213 + return -EOPNOTSUPP; 214 + } 215 + 216 + func_proto = btf_type_resolve_func_ptr(btf, 217 + member->type, 218 + NULL); 219 + if (func_proto && 220 + btf_distill_func_proto(log, btf, 221 + func_proto, mname, 222 + &st_ops->func_models[i])) { 223 + pr_warn("Error in parsing func ptr %s in struct %s\n", 224 + mname, st_ops->name); 225 + return -EINVAL; 226 + } 
227 + } 228 + 229 + if (i == btf_type_vlen(t)) { 230 + if (st_ops->init(btf)) { 231 + pr_warn("Error in init bpf_struct_ops %s\n", 232 + st_ops->name); 233 + return -EINVAL; 234 + } else { 235 + st_ops_desc->type_id = type_id; 236 + st_ops_desc->type = t; 237 + st_ops_desc->value_id = value_id; 238 + st_ops_desc->value_type = btf_type_by_id(btf, 239 + value_id); 240 + } 241 + } 242 + 243 + return 0; 206 244 } 207 245 208 246 static int bpf_struct_ops_map_get_next_key(struct bpf_map *map, void *key, ··· 229 265 230 266 kvalue = &st_map->kvalue; 231 267 /* Pair with smp_store_release() during map_update */ 232 - state = smp_load_acquire(&kvalue->state); 268 + state = smp_load_acquire(&kvalue->common.state); 233 269 if (state == BPF_STRUCT_OPS_STATE_INIT) { 234 270 memset(value, 0, map->value_size); 235 271 return 0; ··· 240 276 */ 241 277 uvalue = value; 242 278 memcpy(uvalue, st_map->uvalue, map->value_size); 243 - uvalue->state = state; 279 + uvalue->common.state = state; 244 280 245 281 /* This value offers the user space a general estimate of how 246 282 * many sockets are still utilizing this struct_ops for TCP ··· 248 284 * should sufficiently meet our present goals. 
249 285 */ 250 286 refcnt = atomic64_read(&map->refcnt) - atomic64_read(&map->usercnt); 251 - refcount_set(&uvalue->refcnt, max_t(s64, refcnt, 0)); 287 + refcount_set(&uvalue->common.refcnt, max_t(s64, refcnt, 0)); 252 288 253 289 return 0; 254 290 } ··· 260 296 261 297 static void bpf_struct_ops_map_put_progs(struct bpf_struct_ops_map *st_map) 262 298 { 263 - const struct btf_type *t = st_map->st_ops->type; 264 299 u32 i; 265 300 266 - for (i = 0; i < btf_type_vlen(t); i++) { 301 + for (i = 0; i < st_map->links_cnt; i++) { 267 302 if (st_map->links[i]) { 268 303 bpf_link_put(st_map->links[i]); 269 304 st_map->links[i] = NULL; ··· 270 307 } 271 308 } 272 309 273 - static int check_zero_holes(const struct btf_type *t, void *data) 310 + static int check_zero_holes(const struct btf *btf, const struct btf_type *t, void *data) 274 311 { 275 312 const struct btf_member *member; 276 313 u32 i, moff, msize, prev_mend = 0; ··· 282 319 memchr_inv(data + prev_mend, 0, moff - prev_mend)) 283 320 return -EINVAL; 284 321 285 - mtype = btf_type_by_id(btf_vmlinux, member->type); 286 - mtype = btf_resolve_size(btf_vmlinux, mtype, &msize); 322 + mtype = btf_type_by_id(btf, member->type); 323 + mtype = btf_resolve_size(btf, mtype, &msize); 287 324 if (IS_ERR(mtype)) 288 325 return PTR_ERR(mtype); 289 326 prev_mend = moff + msize; ··· 339 376 void *value, u64 flags) 340 377 { 341 378 struct bpf_struct_ops_map *st_map = (struct bpf_struct_ops_map *)map; 342 - const struct bpf_struct_ops *st_ops = st_map->st_ops; 379 + const struct bpf_struct_ops_desc *st_ops_desc = st_map->st_ops_desc; 380 + const struct bpf_struct_ops *st_ops = st_ops_desc->st_ops; 343 381 struct bpf_struct_ops_value *uvalue, *kvalue; 382 + const struct btf_type *module_type; 344 383 const struct btf_member *member; 345 - const struct btf_type *t = st_ops->type; 384 + const struct btf_type *t = st_ops_desc->type; 346 385 struct bpf_tramp_links *tlinks; 347 386 void *udata, *kdata; 348 387 int prog_fd, err; ··· 357 392 
if (*(u32 *)key != 0) 358 393 return -E2BIG; 359 394 360 - err = check_zero_holes(st_ops->value_type, value); 395 + err = check_zero_holes(st_map->btf, st_ops_desc->value_type, value); 361 396 if (err) 362 397 return err; 363 398 364 399 uvalue = value; 365 - err = check_zero_holes(t, uvalue->data); 400 + err = check_zero_holes(st_map->btf, t, uvalue->data); 366 401 if (err) 367 402 return err; 368 403 369 - if (uvalue->state || refcount_read(&uvalue->refcnt)) 404 + if (uvalue->common.state || refcount_read(&uvalue->common.refcnt)) 370 405 return -EINVAL; 371 406 372 407 tlinks = kcalloc(BPF_TRAMP_MAX, sizeof(*tlinks), GFP_KERNEL); ··· 378 413 379 414 mutex_lock(&st_map->lock); 380 415 381 - if (kvalue->state != BPF_STRUCT_OPS_STATE_INIT) { 416 + if (kvalue->common.state != BPF_STRUCT_OPS_STATE_INIT) { 382 417 err = -EBUSY; 383 418 goto unlock; 384 419 } ··· 390 425 image = st_map->image; 391 426 image_end = st_map->image + PAGE_SIZE; 392 427 428 + module_type = btf_type_by_id(btf_vmlinux, st_ops_ids[IDX_MODULE_ID]); 393 429 for_each_member(i, t, member) { 394 430 const struct btf_type *mtype, *ptype; 395 431 struct bpf_prog *prog; ··· 398 432 u32 moff; 399 433 400 434 moff = __btf_member_bit_offset(t, member) / 8; 401 - ptype = btf_type_resolve_ptr(btf_vmlinux, member->type, NULL); 435 + ptype = btf_type_resolve_ptr(st_map->btf, member->type, NULL); 402 436 if (ptype == module_type) { 403 437 if (*(void **)(udata + moff)) 404 438 goto reset_unlock; ··· 423 457 if (!ptype || !btf_type_is_func_proto(ptype)) { 424 458 u32 msize; 425 459 426 - mtype = btf_type_by_id(btf_vmlinux, member->type); 427 - mtype = btf_resolve_size(btf_vmlinux, mtype, &msize); 460 + mtype = btf_type_by_id(st_map->btf, member->type); 461 + mtype = btf_resolve_size(st_map->btf, mtype, &msize); 428 462 if (IS_ERR(mtype)) { 429 463 err = PTR_ERR(mtype); 430 464 goto reset_unlock; ··· 450 484 } 451 485 452 486 if (prog->type != BPF_PROG_TYPE_STRUCT_OPS || 453 - prog->aux->attach_btf_id != 
st_ops->type_id || 487 + prog->aux->attach_btf_id != st_ops_desc->type_id || 454 488 prog->expected_attach_type != i) { 455 489 bpf_prog_put(prog); 456 490 err = -EINVAL; ··· 493 527 * 494 528 * Pair with smp_load_acquire() during lookup_elem(). 495 529 */ 496 - smp_store_release(&kvalue->state, BPF_STRUCT_OPS_STATE_READY); 530 + smp_store_release(&kvalue->common.state, BPF_STRUCT_OPS_STATE_READY); 497 531 goto unlock; 498 532 } 499 533 ··· 511 545 * It ensures the above udata updates (e.g. prog->aux->id) 512 546 * can be seen once BPF_STRUCT_OPS_STATE_INUSE is set. 513 547 */ 514 - smp_store_release(&kvalue->state, BPF_STRUCT_OPS_STATE_INUSE); 548 + smp_store_release(&kvalue->common.state, BPF_STRUCT_OPS_STATE_INUSE); 515 549 goto unlock; 516 550 } 517 551 ··· 541 575 if (st_map->map.map_flags & BPF_F_LINK) 542 576 return -EOPNOTSUPP; 543 577 544 - prev_state = cmpxchg(&st_map->kvalue.state, 578 + prev_state = cmpxchg(&st_map->kvalue.common.state, 545 579 BPF_STRUCT_OPS_STATE_INUSE, 546 580 BPF_STRUCT_OPS_STATE_TOBEFREE); 547 581 switch (prev_state) { 548 582 case BPF_STRUCT_OPS_STATE_INUSE: 549 - st_map->st_ops->unreg(&st_map->kvalue.data); 583 + st_map->st_ops_desc->st_ops->unreg(&st_map->kvalue.data); 550 584 bpf_map_put(map); 551 585 return 0; 552 586 case BPF_STRUCT_OPS_STATE_TOBEFREE: ··· 563 597 static void bpf_struct_ops_map_seq_show_elem(struct bpf_map *map, void *key, 564 598 struct seq_file *m) 565 599 { 600 + struct bpf_struct_ops_map *st_map = (struct bpf_struct_ops_map *)map; 566 601 void *value; 567 602 int err; 568 603 ··· 573 606 574 607 err = bpf_struct_ops_map_sys_lookup_elem(map, key, value); 575 608 if (!err) { 576 - btf_type_seq_show(btf_vmlinux, map->btf_vmlinux_value_type_id, 609 + btf_type_seq_show(st_map->btf, 610 + map->btf_vmlinux_value_type_id, 577 611 value, m); 578 612 seq_puts(m, "\n"); 579 613 } ··· 599 631 600 632 static void bpf_struct_ops_map_free(struct bpf_map *map) 601 633 { 634 + struct bpf_struct_ops_map *st_map = (struct 
bpf_struct_ops_map *)map; 635 + 636 + /* st_ops->owner was acquired during map_alloc to implicitly holds 637 + * the btf's refcnt. The acquire was only done when btf_is_module() 638 + * st_map->btf cannot be NULL here. 639 + */ 640 + if (btf_is_module(st_map->btf)) 641 + module_put(st_map->st_ops_desc->st_ops->owner); 642 + 602 643 /* The struct_ops's function may switch to another struct_ops. 603 644 * 604 645 * For example, bpf_tcp_cc_x->init() may switch to ··· 631 654 static int bpf_struct_ops_map_alloc_check(union bpf_attr *attr) 632 655 { 633 656 if (attr->key_size != sizeof(unsigned int) || attr->max_entries != 1 || 634 - (attr->map_flags & ~BPF_F_LINK) || !attr->btf_vmlinux_value_type_id) 657 + (attr->map_flags & ~(BPF_F_LINK | BPF_F_VTYPE_BTF_OBJ_FD)) || 658 + !attr->btf_vmlinux_value_type_id) 635 659 return -EINVAL; 636 660 return 0; 637 661 } 638 662 639 663 static struct bpf_map *bpf_struct_ops_map_alloc(union bpf_attr *attr) 640 664 { 641 - const struct bpf_struct_ops *st_ops; 665 + const struct bpf_struct_ops_desc *st_ops_desc; 642 666 size_t st_map_size; 643 667 struct bpf_struct_ops_map *st_map; 644 668 const struct btf_type *t, *vt; 669 + struct module *mod = NULL; 645 670 struct bpf_map *map; 671 + struct btf *btf; 646 672 int ret; 647 673 648 - st_ops = bpf_struct_ops_find_value(attr->btf_vmlinux_value_type_id); 649 - if (!st_ops) 650 - return ERR_PTR(-ENOTSUPP); 674 + if (attr->map_flags & BPF_F_VTYPE_BTF_OBJ_FD) { 675 + /* The map holds btf for its whole life time. */ 676 + btf = btf_get_by_fd(attr->value_type_btf_obj_fd); 677 + if (IS_ERR(btf)) 678 + return ERR_CAST(btf); 679 + if (!btf_is_module(btf)) { 680 + btf_put(btf); 681 + return ERR_PTR(-EINVAL); 682 + } 651 683 652 - vt = st_ops->value_type; 653 - if (attr->value_size != vt->size) 654 - return ERR_PTR(-EINVAL); 684 + mod = btf_try_get_module(btf); 685 + /* mod holds a refcnt to btf. We don't need an extra refcnt 686 + * here. 
687 + */ 688 + btf_put(btf); 689 + if (!mod) 690 + return ERR_PTR(-EINVAL); 691 + } else { 692 + btf = bpf_get_btf_vmlinux(); 693 + if (IS_ERR(btf)) 694 + return ERR_CAST(btf); 695 + if (!btf) 696 + return ERR_PTR(-ENOTSUPP); 697 + } 655 698 656 - t = st_ops->type; 699 + st_ops_desc = bpf_struct_ops_find_value(btf, attr->btf_vmlinux_value_type_id); 700 + if (!st_ops_desc) { 701 + ret = -ENOTSUPP; 702 + goto errout; 703 + } 704 + 705 + vt = st_ops_desc->value_type; 706 + if (attr->value_size != vt->size) { 707 + ret = -EINVAL; 708 + goto errout; 709 + } 710 + 711 + t = st_ops_desc->type; 657 712 658 713 st_map_size = sizeof(*st_map) + 659 714 /* kvalue stores the ··· 694 685 (vt->size - sizeof(struct bpf_struct_ops_value)); 695 686 696 687 st_map = bpf_map_area_alloc(st_map_size, NUMA_NO_NODE); 697 - if (!st_map) 698 - return ERR_PTR(-ENOMEM); 688 + if (!st_map) { 689 + ret = -ENOMEM; 690 + goto errout; 691 + } 699 692 700 - st_map->st_ops = st_ops; 693 + st_map->st_ops_desc = st_ops_desc; 701 694 map = &st_map->map; 702 695 703 696 ret = bpf_jit_charge_modmem(PAGE_SIZE); 704 - if (ret) { 705 - __bpf_struct_ops_map_free(map); 706 - return ERR_PTR(ret); 707 - } 697 + if (ret) 698 + goto errout_free; 708 699 709 700 st_map->image = arch_alloc_bpf_trampoline(PAGE_SIZE); 710 701 if (!st_map->image) { ··· 713 704 * here. 
714 705 */ 715 706 bpf_jit_uncharge_modmem(PAGE_SIZE); 716 - __bpf_struct_ops_map_free(map); 717 - return ERR_PTR(-ENOMEM); 707 + ret = -ENOMEM; 708 + goto errout_free; 718 709 } 719 710 st_map->uvalue = bpf_map_area_alloc(vt->size, NUMA_NO_NODE); 711 + st_map->links_cnt = btf_type_vlen(t); 720 712 st_map->links = 721 - bpf_map_area_alloc(btf_type_vlen(t) * sizeof(struct bpf_links *), 713 + bpf_map_area_alloc(st_map->links_cnt * sizeof(struct bpf_links *), 722 714 NUMA_NO_NODE); 723 715 if (!st_map->uvalue || !st_map->links) { 724 - __bpf_struct_ops_map_free(map); 725 - return ERR_PTR(-ENOMEM); 716 + ret = -ENOMEM; 717 + goto errout_free; 726 718 } 719 + st_map->btf = btf; 727 720 728 721 mutex_init(&st_map->lock); 729 722 bpf_map_init_from_attr(map, attr); 730 723 731 724 return map; 725 + 726 + errout_free: 727 + __bpf_struct_ops_map_free(map); 728 + errout: 729 + module_put(mod); 730 + 731 + return ERR_PTR(ret); 732 732 } 733 733 734 734 static u64 bpf_struct_ops_map_mem_usage(const struct bpf_map *map) 735 735 { 736 736 struct bpf_struct_ops_map *st_map = (struct bpf_struct_ops_map *)map; 737 - const struct bpf_struct_ops *st_ops = st_map->st_ops; 738 - const struct btf_type *vt = st_ops->value_type; 737 + const struct bpf_struct_ops_desc *st_ops_desc = st_map->st_ops_desc; 738 + const struct btf_type *vt = st_ops_desc->value_type; 739 739 u64 usage; 740 740 741 741 usage = sizeof(*st_map) + ··· 803 785 return map->map_type == BPF_MAP_TYPE_STRUCT_OPS && 804 786 map->map_flags & BPF_F_LINK && 805 787 /* Pair with smp_store_release() during map_update */ 806 - smp_load_acquire(&st_map->kvalue.state) == BPF_STRUCT_OPS_STATE_READY; 788 + smp_load_acquire(&st_map->kvalue.common.state) == BPF_STRUCT_OPS_STATE_READY; 807 789 } 808 790 809 791 static void bpf_struct_ops_map_link_dealloc(struct bpf_link *link) ··· 818 800 /* st_link->map can be NULL if 819 801 * bpf_struct_ops_link_create() fails to register. 
820 802 */ 821 - st_map->st_ops->unreg(&st_map->kvalue.data); 803 + st_map->st_ops_desc->st_ops->unreg(&st_map->kvalue.data); 822 804 bpf_map_put(&st_map->map); 823 805 } 824 806 kfree(st_link); ··· 865 847 if (!bpf_struct_ops_valid_to_reg(new_map)) 866 848 return -EINVAL; 867 849 868 - if (!st_map->st_ops->update) 850 + if (!st_map->st_ops_desc->st_ops->update) 869 851 return -EOPNOTSUPP; 870 852 871 853 mutex_lock(&update_mutex); ··· 878 860 879 861 old_st_map = container_of(old_map, struct bpf_struct_ops_map, map); 880 862 /* The new and old struct_ops must be the same type. */ 881 - if (st_map->st_ops != old_st_map->st_ops) { 863 + if (st_map->st_ops_desc != old_st_map->st_ops_desc) { 882 864 err = -EINVAL; 883 865 goto err_out; 884 866 } 885 867 886 - err = st_map->st_ops->update(st_map->kvalue.data, old_st_map->kvalue.data); 868 + err = st_map->st_ops_desc->st_ops->update(st_map->kvalue.data, old_st_map->kvalue.data); 887 869 if (err) 888 870 goto err_out; 889 871 ··· 934 916 if (err) 935 917 goto err_out; 936 918 937 - err = st_map->st_ops->reg(st_map->kvalue.data); 919 + err = st_map->st_ops_desc->st_ops->reg(st_map->kvalue.data); 938 920 if (err) { 939 921 bpf_link_cleanup(&link_primer); 940 922 link = NULL; ··· 948 930 bpf_map_put(map); 949 931 kfree(link); 950 932 return err; 933 + } 934 + 935 + void bpf_map_struct_ops_info_fill(struct bpf_map_info *info, struct bpf_map *map) 936 + { 937 + struct bpf_struct_ops_map *st_map = (struct bpf_struct_ops_map *)map; 938 + 939 + info->btf_vmlinux_id = btf_obj_id(st_map->btf); 951 940 }
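Reviewer note on the `bpf_struct_ops.c` changes above: the state and refcount now live in an embedded `common` member, and `is_valid_value_type()` requires each value struct to have exactly two members (the common header first, the ops struct second). The state transitions themselves are unchanged: a release store publishes `INUSE` or `READY`, and the delete path uses `cmpxchg` so that only `INUSE` can move to `TOBEFREE`. The following userspace sketch models that with C11 atomics. It is an approximation under stated assumptions: the kernel uses `refcount_t`, `smp_store_release()` and `cmpxchg()` rather than `<stdatomic.h>`, and every `sketch_*` name is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Same order as the enum this diff removes from bpf_struct_ops.c. */
enum sketch_state {
	SKETCH_STATE_INIT,
	SKETCH_STATE_INUSE,
	SKETCH_STATE_TOBEFREE,
	SKETCH_STATE_READY,
};

/* Stand-in for struct bpf_struct_ops_common_value. */
struct sketch_common_value {
	atomic_int refcnt;
	atomic_int state;
};

/* Placeholder ops struct; a real one would be e.g. tcp_congestion_ops. */
struct sketch_ops {
	int (*init)(void);
};

/* Two members only: common header first, ops struct second. */
struct sketch_value {
	struct sketch_common_value common;
	struct sketch_ops data;
};

static void sketch_init(struct sketch_value *v)
{
	assert(v != NULL);
	atomic_init(&v->common.refcnt, 0);
	atomic_init(&v->common.state, SKETCH_STATE_INIT);
	v->data.init = NULL;
}

/* Analogous to smp_store_release(&kvalue->common.state, ..._INUSE). */
static void sketch_publish_inuse(struct sketch_value *v)
{
	atomic_store_explicit(&v->common.state, SKETCH_STATE_INUSE,
			      memory_order_release);
}

/* Analogous to cmpxchg(&state, INUSE, TOBEFREE): returns the state seen
 * before the attempt; the swap happens only if that state was INUSE.
 */
static int sketch_mark_tobefree(struct sketch_value *v)
{
	int expected = SKETCH_STATE_INUSE;

	atomic_compare_exchange_strong(&v->common.state, &expected,
				       SKETCH_STATE_TOBEFREE);
	return expected;
}
```

As in the kernel's delete path, the caller would switch on the returned previous state: `INUSE` means this caller won the transition and should unregister, while any other value means there is nothing for it to tear down.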

-12

kernel/bpf/bpf_struct_ops_types.h

··· 1 - /* SPDX-License-Identifier: GPL-2.0 */ 2 - /* internal file - do not include directly */ 3 - 4 - #ifdef CONFIG_BPF_JIT 5 - #ifdef CONFIG_NET 6 - BPF_STRUCT_OPS_TYPE(bpf_dummy_ops) 7 - #endif 8 - #ifdef CONFIG_INET 9 - #include <net/tcp.h> 10 - BPF_STRUCT_OPS_TYPE(tcp_congestion_ops) 11 - #endif 12 - #endif

+227 -49

kernel/bpf/btf.c

··· 19 19 #include <linux/bpf_verifier.h> 20 20 #include <linux/btf.h> 21 21 #include <linux/btf_ids.h> 22 + #include <linux/bpf.h> 22 23 #include <linux/bpf_lsm.h> 23 24 #include <linux/skmsg.h> 24 25 #include <linux/perf_event.h> ··· 242 241 struct btf_id_dtor_kfunc dtors[]; 243 242 }; 244 243 244 + struct btf_struct_ops_tab { 245 + u32 cnt; 246 + u32 capacity; 247 + struct bpf_struct_ops_desc ops[]; 248 + }; 249 + 245 250 struct btf { 246 251 void *data; 247 252 struct btf_type **types; ··· 265 258 struct btf_kfunc_set_tab *kfunc_set_tab; 266 259 struct btf_id_dtor_kfunc_tab *dtor_kfunc_tab; 267 260 struct btf_struct_metas *struct_meta_tab; 261 + struct btf_struct_ops_tab *struct_ops_tab; 268 262 269 263 /* split BTF support */ 270 264 struct btf *base_btf; ··· 1696 1688 btf->struct_meta_tab = NULL; 1697 1689 } 1698 1690 1691 + static void btf_free_struct_ops_tab(struct btf *btf) 1692 + { 1693 + struct btf_struct_ops_tab *tab = btf->struct_ops_tab; 1694 + 1695 + kfree(tab); 1696 + btf->struct_ops_tab = NULL; 1697 + } 1698 + 1699 1699 static void btf_free(struct btf *btf) 1700 1700 { 1701 1701 btf_free_struct_meta_tab(btf); 1702 1702 btf_free_dtor_kfunc_tab(btf); 1703 1703 btf_free_kfunc_set_tab(btf); 1704 + btf_free_struct_ops_tab(btf); 1704 1705 kvfree(btf->types); 1705 1706 kvfree(btf->resolved_sizes); 1706 1707 kvfree(btf->resolved_ids); ··· 1722 1705 struct btf *btf = container_of(rcu, struct btf, rcu); 1723 1706 1724 1707 btf_free(btf); 1708 + } 1709 + 1710 + const char *btf_get_name(const struct btf *btf) 1711 + { 1712 + return btf->name; 1725 1713 } 1726 1714 1727 1715 void btf_get(struct btf *btf) ··· 3332 3310 return BTF_FIELD_FOUND; 3333 3311 } 3334 3312 3313 + int btf_find_next_decl_tag(const struct btf *btf, const struct btf_type *pt, 3314 + int comp_idx, const char *tag_key, int last_id) 3315 + { 3316 + int len = strlen(tag_key); 3317 + int i, n; 3318 + 3319 + for (i = last_id + 1, n = btf_nr_types(btf); i < n; i++) { 3320 + const struct btf_type *t 
= btf_type_by_id(btf, i); 3321 + 3322 + if (!btf_type_is_decl_tag(t)) 3323 + continue; 3324 + if (pt != btf_type_by_id(btf, t->type)) 3325 + continue; 3326 + if (btf_type_decl_tag(t)->component_idx != comp_idx) 3327 + continue; 3328 + if (strncmp(__btf_name_by_offset(btf, t->name_off), tag_key, len)) 3329 + continue; 3330 + return i; 3331 + } 3332 + return -ENOENT; 3333 + } 3334 + 3335 3335 const char *btf_find_decl_tag_value(const struct btf *btf, const struct btf_type *pt, 3336 3336 int comp_idx, const char *tag_key) 3337 3337 { 3338 3338 const char *value = NULL; 3339 - int i; 3339 + const struct btf_type *t; 3340 + int len, id; 3340 3341 3341 - for (i = 1; i < btf_nr_types(btf); i++) { 3342 - const struct btf_type *t = btf_type_by_id(btf, i); 3343 - int len = strlen(tag_key); 3342 + id = btf_find_next_decl_tag(btf, pt, comp_idx, tag_key, 0); 3343 + if (id < 0) 3344 + return ERR_PTR(id); 3344 3345 3345 - if (!btf_type_is_decl_tag(t)) 3346 - continue; 3347 - if (pt != btf_type_by_id(btf, t->type) || 3348 - btf_type_decl_tag(t)->component_idx != comp_idx) 3349 - continue; 3350 - if (strncmp(__btf_name_by_offset(btf, t->name_off), tag_key, len)) 3351 - continue; 3352 - /* Prevent duplicate entries for same type */ 3353 - if (value) 3354 - return ERR_PTR(-EEXIST); 3355 - value = __btf_name_by_offset(btf, t->name_off) + len; 3356 - } 3357 - if (!value) 3358 - return ERR_PTR(-ENOENT); 3346 + t = btf_type_by_id(btf, id); 3347 + len = strlen(tag_key); 3348 + value = __btf_name_by_offset(btf, t->name_off) + len; 3349 + 3350 + /* Prevent duplicate entries for same type */ 3351 + id = btf_find_next_decl_tag(btf, pt, comp_idx, tag_key, id); 3352 + if (id >= 0) 3353 + return ERR_PTR(-EEXIST); 3354 + 3359 3355 return value; 3360 3356 } 3361 3357 ··· 5973 5933 /* btf_parse_vmlinux() runs under bpf_verifier_lock */ 5974 5934 bpf_ctx_convert.t = btf_type_by_id(btf, bpf_ctx_convert_btf_id[0]); 5975 5935 5976 - bpf_struct_ops_init(btf, log); 5977 - 5978 5936 
refcount_set(&btf->refcnt, 1); 5979 5937 5980 5938 err = btf_alloc_id(btf); ··· 6322 6284 __btf_name_by_offset(btf, t->name_off)); 6323 6285 return true; 6324 6286 } 6287 + EXPORT_SYMBOL_GPL(btf_ctx_access); 6325 6288 6326 6289 enum bpf_struct_walk_result { 6327 6290 /* < 0 error */ ··· 6985 6946 return false; 6986 6947 } 6987 6948 6949 + enum btf_arg_tag { 6950 + ARG_TAG_CTX = 0x1, 6951 + ARG_TAG_NONNULL = 0x2, 6952 + }; 6953 + 6988 6954 /* Process BTF of a function to produce high-level expectation of function 6989 6955 * arguments (like ARG_PTR_TO_CTX, or ARG_PTR_TO_MEM, etc). This information 6990 6956 * is cached in subprog info for reuse. ··· 7071 7027 * Only PTR_TO_CTX and SCALAR are supported atm. 7072 7028 */ 7073 7029 for (i = 0; i < nargs; i++) { 7074 - bool is_nonnull = false; 7075 - const char *tag; 7030 + u32 tags = 0; 7031 + int id = 0; 7076 7032 7077 - t = btf_type_by_id(btf, args[i].type); 7078 - 7079 - tag = btf_find_decl_tag_value(btf, fn_t, i, "arg:"); 7080 - if (IS_ERR(tag) && PTR_ERR(tag) == -ENOENT) { 7081 - tag = NULL; 7082 - } else if (IS_ERR(tag)) { 7083 - bpf_log(log, "arg#%d type's tag fetching failure: %ld\n", i, PTR_ERR(tag)); 7084 - return PTR_ERR(tag); 7085 - } 7086 7033 /* 'arg:<tag>' decl_tag takes precedence over derivation of 7087 7034 * register type from BTF type itself 7088 7035 */ 7089 - if (tag) { 7036 + while ((id = btf_find_next_decl_tag(btf, fn_t, i, "arg:", id)) > 0) { 7037 + const struct btf_type *tag_t = btf_type_by_id(btf, id); 7038 + const char *tag = __btf_name_by_offset(btf, tag_t->name_off) + 4; 7039 + 7090 7040 /* disallow arg tags in static subprogs */ 7091 7041 if (!is_global) { 7092 7042 bpf_log(log, "arg#%d type tag is not supported in static functions\n", i); 7093 7043 return -EOPNOTSUPP; 7094 7044 } 7045 + 7095 7046 if (strcmp(tag, "ctx") == 0) { 7096 - sub->args[i].arg_type = ARG_PTR_TO_CTX; 7097 - continue; 7047 + tags |= ARG_TAG_CTX; 7048 + } else if (strcmp(tag, "nonnull") == 0) { 7049 + tags |= 
ARG_TAG_NONNULL; 7050 + } else { 7051 + bpf_log(log, "arg#%d has unsupported set of tags\n", i); 7052 + return -EOPNOTSUPP; 7098 7053 } 7099 - if (strcmp(tag, "nonnull") == 0) 7100 - is_nonnull = true; 7054 + } 7055 + if (id != -ENOENT) { 7056 + bpf_log(log, "arg#%d type tag fetching failure: %d\n", i, id); 7057 + return id; 7101 7058 } 7102 7059 7060 + t = btf_type_by_id(btf, args[i].type); 7103 7061 while (btf_type_is_modifier(t)) 7104 7062 t = btf_type_by_id(btf, t->type); 7105 - if (btf_type_is_int(t) || btf_is_any_enum(t)) { 7106 - sub->args[i].arg_type = ARG_ANYTHING; 7107 - continue; 7108 - } 7109 - if (btf_type_is_ptr(t) && btf_get_prog_ctx_type(log, btf, t, prog_type, i)) { 7063 + if (!btf_type_is_ptr(t)) 7064 + goto skip_pointer; 7065 + 7066 + if ((tags & ARG_TAG_CTX) || btf_get_prog_ctx_type(log, btf, t, prog_type, i)) { 7067 + if (tags & ~ARG_TAG_CTX) { 7068 + bpf_log(log, "arg#%d has invalid combination of tags\n", i); 7069 + return -EINVAL; 7070 + } 7110 7071 sub->args[i].arg_type = ARG_PTR_TO_CTX; 7111 7072 continue; 7112 7073 } 7113 - if (btf_type_is_ptr(t) && btf_is_dynptr_ptr(btf, t)) { 7074 + if (btf_is_dynptr_ptr(btf, t)) { 7075 + if (tags) { 7076 + bpf_log(log, "arg#%d has invalid combination of tags\n", i); 7077 + return -EINVAL; 7078 + } 7114 7079 sub->args[i].arg_type = ARG_PTR_TO_DYNPTR | MEM_RDONLY; 7115 7080 continue; 7116 7081 } 7117 - if (is_global && btf_type_is_ptr(t)) { 7082 + if (is_global) { /* generic user data pointer */ 7118 7083 u32 mem_size; 7119 7084 7120 7085 t = btf_type_skip_modifiers(btf, t->type, NULL); 7121 7086 ref_t = btf_resolve_size(btf, t, &mem_size); 7122 7087 if (IS_ERR(ref_t)) { 7123 - bpf_log(log, 7124 - "arg#%d reference type('%s %s') size cannot be determined: %ld\n", 7125 - i, btf_type_str(t), btf_name_by_offset(btf, t->name_off), 7088 + bpf_log(log, "arg#%d reference type('%s %s') size cannot be determined: %ld\n", 7089 + i, btf_type_str(t), btf_name_by_offset(btf, t->name_off), 7126 7090 PTR_ERR(ref_t)); 
7127 7091 return -EINVAL; 7128 7092 } 7129 7093 7130 - sub->args[i].arg_type = is_nonnull ? ARG_PTR_TO_MEM : ARG_PTR_TO_MEM_OR_NULL; 7094 + sub->args[i].arg_type = ARG_PTR_TO_MEM | PTR_MAYBE_NULL; 7095 + if (tags & ARG_TAG_NONNULL) 7096 + sub->args[i].arg_type &= ~PTR_MAYBE_NULL; 7131 7097 sub->args[i].mem_size = mem_size; 7132 7098 continue; 7133 7099 } 7134 - if (is_nonnull) { 7135 - bpf_log(log, "arg#%d marked as non-null, but is not a pointer type\n", i); 7100 + 7101 + skip_pointer: 7102 + if (tags) { 7103 + bpf_log(log, "arg#%d has pointer tag, but is not a pointer type\n", i); 7136 7104 return -EINVAL; 7105 + } 7106 + if (btf_type_is_int(t) || btf_is_any_enum(t)) { 7107 + sub->args[i].arg_type = ARG_ANYTHING; 7108 + continue; 7137 7109 } 7138 7110 bpf_log(log, "Arg#%d type %s in %s() is not supported yet.\n", 7139 7111 i, btf_type_str(t), tname); ··· 8705 8645 8706 8646 return !strncmp(reg_name, arg_name, cmp_len); 8707 8647 } 8648 + 8649 + #ifdef CONFIG_BPF_JIT 8650 + static int 8651 + btf_add_struct_ops(struct btf *btf, struct bpf_struct_ops *st_ops, 8652 + struct bpf_verifier_log *log) 8653 + { 8654 + struct btf_struct_ops_tab *tab, *new_tab; 8655 + int i, err; 8656 + 8657 + tab = btf->struct_ops_tab; 8658 + if (!tab) { 8659 + tab = kzalloc(offsetof(struct btf_struct_ops_tab, ops[4]), 8660 + GFP_KERNEL); 8661 + if (!tab) 8662 + return -ENOMEM; 8663 + tab->capacity = 4; 8664 + btf->struct_ops_tab = tab; 8665 + } 8666 + 8667 + for (i = 0; i < tab->cnt; i++) 8668 + if (tab->ops[i].st_ops == st_ops) 8669 + return -EEXIST; 8670 + 8671 + if (tab->cnt == tab->capacity) { 8672 + new_tab = krealloc(tab, 8673 + offsetof(struct btf_struct_ops_tab, 8674 + ops[tab->capacity * 2]), 8675 + GFP_KERNEL); 8676 + if (!new_tab) 8677 + return -ENOMEM; 8678 + tab = new_tab; 8679 + tab->capacity *= 2; 8680 + btf->struct_ops_tab = tab; 8681 + } 8682 + 8683 + tab->ops[btf->struct_ops_tab->cnt].st_ops = st_ops; 8684 + 8685 + err = 
bpf_struct_ops_desc_init(&tab->ops[btf->struct_ops_tab->cnt], btf, log); 8686 + if (err) 8687 + return err; 8688 + 8689 + btf->struct_ops_tab->cnt++; 8690 + 8691 + return 0; 8692 + } 8693 + 8694 + const struct bpf_struct_ops_desc * 8695 + bpf_struct_ops_find_value(struct btf *btf, u32 value_id) 8696 + { 8697 + const struct bpf_struct_ops_desc *st_ops_list; 8698 + unsigned int i; 8699 + u32 cnt; 8700 + 8701 + if (!value_id) 8702 + return NULL; 8703 + if (!btf->struct_ops_tab) 8704 + return NULL; 8705 + 8706 + cnt = btf->struct_ops_tab->cnt; 8707 + st_ops_list = btf->struct_ops_tab->ops; 8708 + for (i = 0; i < cnt; i++) { 8709 + if (st_ops_list[i].value_id == value_id) 8710 + return &st_ops_list[i]; 8711 + } 8712 + 8713 + return NULL; 8714 + } 8715 + 8716 + const struct bpf_struct_ops_desc * 8717 + bpf_struct_ops_find(struct btf *btf, u32 type_id) 8718 + { 8719 + const struct bpf_struct_ops_desc *st_ops_list; 8720 + unsigned int i; 8721 + u32 cnt; 8722 + 8723 + if (!type_id) 8724 + return NULL; 8725 + if (!btf->struct_ops_tab) 8726 + return NULL; 8727 + 8728 + cnt = btf->struct_ops_tab->cnt; 8729 + st_ops_list = btf->struct_ops_tab->ops; 8730 + for (i = 0; i < cnt; i++) { 8731 + if (st_ops_list[i].type_id == type_id) 8732 + return &st_ops_list[i]; 8733 + } 8734 + 8735 + return NULL; 8736 + } 8737 + 8738 + int __register_bpf_struct_ops(struct bpf_struct_ops *st_ops) 8739 + { 8740 + struct bpf_verifier_log *log; 8741 + struct btf *btf; 8742 + int err = 0; 8743 + 8744 + btf = btf_get_module_btf(st_ops->owner); 8745 + if (!btf) 8746 + return -EINVAL; 8747 + 8748 + log = kzalloc(sizeof(*log), GFP_KERNEL | __GFP_NOWARN); 8749 + if (!log) { 8750 + err = -ENOMEM; 8751 + goto errout; 8752 + } 8753 + 8754 + log->level = BPF_LOG_KERNEL; 8755 + 8756 + err = btf_add_struct_ops(btf, st_ops, log); 8757 + 8758 + errout: 8759 + kfree(log); 8760 + btf_put(btf); 8761 + 8762 + return err; 8763 + } 8764 + EXPORT_SYMBOL_GPL(__register_bpf_struct_ops); 8765 + #endif
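
Note on the decl-tag lookup in the btf.c hunks above: the change turns a single scan into a resumable "find next" helper, so that one caller can detect duplicates (look once, then look again from the hit) and another can collect every "arg:" tag into a bitmask. The following is a minimal userspace sketch of that pattern, not the kernel code. The string array, the function names and the TAG_* constants are hypothetical stand-ins for BTF types. It also assumes the search resumes at last_idx + 1, since the start of the kernel loop lies outside the lines shown here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tag bits, modelled on ARG_TAG_CTX / ARG_TAG_NONNULL. */
enum { TAG_CTX = 0x1, TAG_NONNULL = 0x2 };

/* Return the index of the first name after last_idx that starts with key,
 * or -ENOENT. Index 0 is reserved (as BTF type ID 0 is), so passing
 * last_idx = 0 starts a fresh search.
 */
static int find_next_tag(const char *const *names, int nr,
			 const char *key, int last_idx)
{
	size_t len = strlen(key);
	int i;

	assert(last_idx >= 0);
	for (i = last_idx + 1; i < nr; i++) {
		if (strncmp(names[i], key, len) == 0)
			return i;
	}
	return -ENOENT;
}

/* Fetch the single value stored under key; a second match is -EEXIST. */
static int find_tag_value(const char *const *names, int nr,
			  const char *key, const char **value)
{
	int idx = find_next_tag(names, nr, key, 0);

	if (idx < 0)
		return idx;
	*value = names[idx] + strlen(key);

	/* Resume after the first hit to detect duplicates. */
	if (find_next_tag(names, nr, key, idx) >= 0)
		return -EEXIST;
	return 0;
}

/* Walk every "arg:" tag and fold the known ones into a bitmask. */
static int collect_arg_tags(const char *const *names, int nr,
			    unsigned int *mask)
{
	int idx = 0;

	*mask = 0;
	while ((idx = find_next_tag(names, nr, "arg:", idx)) > 0) {
		const char *tag = names[idx] + 4;	/* skip "arg:" */

		if (strcmp(tag, "ctx") == 0)
			*mask |= TAG_CTX;
		else if (strcmp(tag, "nonnull") == 0)
			*mask |= TAG_NONNULL;
		else
			return -EOPNOTSUPP;
	}
	/* -ENOENT only means the walk is finished. */
	return idx == -ENOENT ? 0 : idx;
}
```

With names = { "", "arg:ctx", "other", "arg:nonnull" }, the single-value lookup for "arg:" reports -EEXIST, while the collecting walk accepts both tags. That asymmetry is the reason the diff splits the helper out.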

+3 -3

kernel/bpf/cgroup.c

···
1630 1630           case BPF_FUNC_perf_event_output:
1631 1631                   return &bpf_event_output_data_proto;
1632 1632           default:
1633 -                      return bpf_base_func_proto(func_id);
1633 +                      return bpf_base_func_proto(func_id, prog);
1634 1634           }
1635 1635   }
1636 1636
···
2191 2191           case BPF_FUNC_perf_event_output:
2192 2192                   return &bpf_event_output_data_proto;
2193 2193           default:
2194 -                      return bpf_base_func_proto(func_id);
2194 +                      return bpf_base_func_proto(func_id, prog);
2195 2195           }
2196 2196   }
2197 2197
···
2348 2348           case BPF_FUNC_perf_event_output:
2349 2349                   return &bpf_event_output_data_proto;
2350 2350           default:
2351 -                      return bpf_base_func_proto(func_id);
2351 +                      return bpf_base_func_proto(func_id, prog);
2352 2352           }
2353 2353   }
2354 2354

+12 -1

kernel/bpf/core.c

···
682 682     void bpf_prog_kallsyms_add(struct bpf_prog *fp)
683 683     {
684 684             if (!bpf_prog_kallsyms_candidate(fp) ||
685 -                   !bpf_capable())
685 +                   !bpf_token_capable(fp->aux->token, CAP_BPF))
686 686                     return;
687 687
688 688             bpf_prog_ksym_set_addr(fp);
···
2779 2779
2780 2780           if (aux->dst_prog)
2781 2781                   bpf_prog_put(aux->dst_prog);
2782 +              bpf_token_put(aux->token);
2782 2783           INIT_WORK(&aux->work, bpf_prog_free_deferred);
2783 2784           schedule_work(&aux->work);
2784 2785   }
···
2922 2921   }
2923 2922
2924 2923   bool __weak bpf_jit_supports_far_kfunc_call(void)
2924 +      {
2925 +              return false;
2926 +      }
2927 +
2928 +      /* Return TRUE if the JIT backend satisfies the following two conditions:
2929 +       * 1) JIT backend supports atomic_xchg() on pointer-sized words.
2930 +       * 2) Under the specific arch, the implementation of xchg() is the same
2931 +       *    as atomic_xchg() on pointer-sized words.
2932 +       */
2933 +      bool __weak bpf_jit_supports_ptr_xchg(void)
2925 2934   {
2926 2935           return false;
2927 2936   }
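
Note on bpf_jit_supports_ptr_xchg() above: the __weak definition is a generic fallback that answers "no" until an architecture's JIT provides its own strong definition. The sketch below shows the same link-time override pattern in plain C, using the GCC/Clang weak attribute directly. The names jit_supports_ptr_xchg, ptr_xchg_strategy and the XCHG_* constants are hypothetical. How the verifier actually consumes the real predicate is not shown in this diff, so the caller here is only an assumption about the intended use.

```c
#include <stdbool.h>

/* Hypothetical outcome of the capability query. */
enum xchg_strategy {
	XCHG_CALL_HELPER,	/* emit a regular helper call */
	XCHG_INLINE,		/* emit an inline atomic exchange */
};

/* Generic fallback. The weak attribute (a GCC/Clang extension) lets any
 * other object file supply a strong definition with the same name, which
 * then replaces this one at link time.
 */
__attribute__((weak)) bool jit_supports_ptr_xchg(void)
{
	return false;
}

/* Hypothetical caller: pick a code generation strategy from the answer. */
static enum xchg_strategy ptr_xchg_strategy(void)
{
	return jit_supports_ptr_xchg() ? XCHG_INLINE : XCHG_CALL_HELPER;
}
```

Built on its own, this unit selects XCHG_CALL_HELPER. Linking in another object that defines a non-weak jit_supports_ptr_xchg() returning true would flip the result without touching the caller.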

+4 -3

kernel/bpf/helpers.c

···
1414 1414   {
1415 1415           unsigned long *kptr = map_value;
1416 1416
1417 +              /* This helper may be inlined by verifier. */
1417 1418           return xchg(kptr, (unsigned long)ptr);
1418 1419   }
1419 1420
···
1680 1679   const struct bpf_func_proto bpf_task_pt_regs_proto __weak;
1681 1680
1682 1681   const struct bpf_func_proto *
1683 -      bpf_base_func_proto(enum bpf_func_id func_id)
1682 +      bpf_base_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
1684 1683   {
1685 1684           switch (func_id) {
1686 1685           case BPF_FUNC_map_lookup_elem:
···
1731 1730                   break;
1732 1731           }
1733 1732
1734 -              if (!bpf_capable())
1734 +              if (!bpf_token_capable(prog->aux->token, CAP_BPF))
1735 1734                   return NULL;
1736 1735
1737 1736           switch (func_id) {
···
1789 1788                   break;
1790 1789           }
1791 1790
1792 -              if (!perfmon_capable())
1792 +              if (!bpf_token_capable(prog->aux->token, CAP_PERFMON))
1793 1792                   return NULL;
1794 1793
1795 1794           switch (func_id) {
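
Note on bpf_base_func_proto() above: the function resolves helpers in tiers. Each tier is a switch, and a capability check sits in front of every tier after the first. The change makes those checks depend on the program (through its token) instead of on the calling task alone. The sketch below models only that control flow. The struct prog, its caps bitmask, the helper IDs and the assignment of helpers to tiers are all hypothetical and are not taken from the diff.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capability bits a program may hold via its token. */
enum cap_like { CAP_LIKE_BPF, CAP_LIKE_PERFMON };

/* Hypothetical helper IDs, one per tier. */
enum helper_id { HELPER_MAP_LOOKUP, HELPER_SPIN_LOCK, HELPER_PROBE_READ };

struct helper_proto { const char *name; };
struct prog { unsigned int caps; };	/* bitmask of enum cap_like */

static const struct helper_proto map_lookup_proto = { "map_lookup_elem" };
static const struct helper_proto spin_lock_proto = { "spin_lock" };
static const struct helper_proto probe_read_proto = { "probe_read_kernel" };

static bool prog_capable(const struct prog *prog, enum cap_like cap)
{
	return prog->caps & (1u << cap);
}

static const struct helper_proto *
base_helper(enum helper_id id, const struct prog *prog)
{
	/* Tier 1: available to every program. */
	switch (id) {
	case HELPER_MAP_LOOKUP:
		return &map_lookup_proto;
	default:
		break;
	}

	/* Tier 2: gated on the first capability. */
	if (!prog_capable(prog, CAP_LIKE_BPF))
		return NULL;
	switch (id) {
	case HELPER_SPIN_LOCK:
		return &spin_lock_proto;
	default:
		break;
	}

	/* Tier 3: gated on the second capability as well. */
	if (!prog_capable(prog, CAP_LIKE_PERFMON))
		return NULL;
	switch (id) {
	case HELPER_PROBE_READ:
		return &probe_read_proto;
	default:
		return NULL;
	}
}
```

Because each gate returns NULL before the next switch is reached, a program without the first capability can never reach a tier 3 helper, whatever else it holds.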

+260 -16

kernel/bpf/inode.c

··· 20 20 #include <linux/filter.h> 21 21 #include <linux/bpf.h> 22 22 #include <linux/bpf_trace.h> 23 + #include <linux/kstrtox.h> 23 24 #include "preload/bpf_preload.h" 24 25 25 26 enum bpf_type { ··· 99 98 static const struct inode_operations bpf_map_iops = { }; 100 99 static const struct inode_operations bpf_link_iops = { }; 101 100 102 - static struct inode *bpf_get_inode(struct super_block *sb, 103 - const struct inode *dir, 104 - umode_t mode) 101 + struct inode *bpf_get_inode(struct super_block *sb, 102 + const struct inode *dir, 103 + umode_t mode) 105 104 { 106 105 struct inode *inode; 107 106 ··· 595 594 } 596 595 EXPORT_SYMBOL(bpf_prog_get_type_path); 597 596 597 + struct bpffs_btf_enums { 598 + const struct btf *btf; 599 + const struct btf_type *cmd_t; 600 + const struct btf_type *map_t; 601 + const struct btf_type *prog_t; 602 + const struct btf_type *attach_t; 603 + }; 604 + 605 + static int find_bpffs_btf_enums(struct bpffs_btf_enums *info) 606 + { 607 + const struct btf *btf; 608 + const struct btf_type *t; 609 + const char *name; 610 + int i, n; 611 + 612 + memset(info, 0, sizeof(*info)); 613 + 614 + btf = bpf_get_btf_vmlinux(); 615 + if (IS_ERR(btf)) 616 + return PTR_ERR(btf); 617 + if (!btf) 618 + return -ENOENT; 619 + 620 + info->btf = btf; 621 + 622 + for (i = 1, n = btf_nr_types(btf); i < n; i++) { 623 + t = btf_type_by_id(btf, i); 624 + if (!btf_type_is_enum(t)) 625 + continue; 626 + 627 + name = btf_name_by_offset(btf, t->name_off); 628 + if (!name) 629 + continue; 630 + 631 + if (strcmp(name, "bpf_cmd") == 0) 632 + info->cmd_t = t; 633 + else if (strcmp(name, "bpf_map_type") == 0) 634 + info->map_t = t; 635 + else if (strcmp(name, "bpf_prog_type") == 0) 636 + info->prog_t = t; 637 + else if (strcmp(name, "bpf_attach_type") == 0) 638 + info->attach_t = t; 639 + else 640 + continue; 641 + 642 + if (info->cmd_t && info->map_t && info->prog_t && info->attach_t) 643 + return 0; 644 + } 645 + 646 + return -ESRCH; 647 + } 648 + 649 + static bool 
find_btf_enum_const(const struct btf *btf, const struct btf_type *enum_t, 650 + const char *prefix, const char *str, int *value) 651 + { 652 + const struct btf_enum *e; 653 + const char *name; 654 + int i, n, pfx_len = strlen(prefix); 655 + 656 + *value = 0; 657 + 658 + if (!btf || !enum_t) 659 + return false; 660 + 661 + for (i = 0, n = btf_vlen(enum_t); i < n; i++) { 662 + e = &btf_enum(enum_t)[i]; 663 + 664 + name = btf_name_by_offset(btf, e->name_off); 665 + if (!name || strncasecmp(name, prefix, pfx_len) != 0) 666 + continue; 667 + 668 + /* match symbolic name case insensitive and ignoring prefix */ 669 + if (strcasecmp(name + pfx_len, str) == 0) { 670 + *value = e->val; 671 + return true; 672 + } 673 + } 674 + 675 + return false; 676 + } 677 + 678 + static void seq_print_delegate_opts(struct seq_file *m, 679 + const char *opt_name, 680 + const struct btf *btf, 681 + const struct btf_type *enum_t, 682 + const char *prefix, 683 + u64 delegate_msk, u64 any_msk) 684 + { 685 + const struct btf_enum *e; 686 + bool first = true; 687 + const char *name; 688 + u64 msk; 689 + int i, n, pfx_len = strlen(prefix); 690 + 691 + delegate_msk &= any_msk; /* clear unknown bits */ 692 + 693 + if (delegate_msk == 0) 694 + return; 695 + 696 + seq_printf(m, ",%s", opt_name); 697 + if (delegate_msk == any_msk) { 698 + seq_printf(m, "=any"); 699 + return; 700 + } 701 + 702 + if (btf && enum_t) { 703 + for (i = 0, n = btf_vlen(enum_t); i < n; i++) { 704 + e = &btf_enum(enum_t)[i]; 705 + name = btf_name_by_offset(btf, e->name_off); 706 + if (!name || strncasecmp(name, prefix, pfx_len) != 0) 707 + continue; 708 + msk = 1ULL << e->val; 709 + if (delegate_msk & msk) { 710 + /* emit lower-case name without prefix */ 711 + seq_printf(m, "%c", first ? 
'=' : ':'); 712 + name += pfx_len; 713 + while (*name) { 714 + seq_printf(m, "%c", tolower(*name)); 715 + name++; 716 + } 717 + 718 + delegate_msk &= ~msk; 719 + first = false; 720 + } 721 + } 722 + } 723 + if (delegate_msk) 724 + seq_printf(m, "%c0x%llx", first ? '=' : ':', delegate_msk); 725 + } 726 + 598 727 /* 599 728 * Display the mount options in /proc/mounts. 600 729 */ ··· 732 601 { 733 602 struct inode *inode = d_inode(root); 734 603 umode_t mode = inode->i_mode & S_IALLUGO & ~S_ISVTX; 604 + struct bpf_mount_opts *opts = root->d_sb->s_fs_info; 605 + u64 mask; 735 606 736 607 if (!uid_eq(inode->i_uid, GLOBAL_ROOT_UID)) 737 608 seq_printf(m, ",uid=%u", ··· 743 610 from_kgid_munged(&init_user_ns, inode->i_gid)); 744 611 if (mode != S_IRWXUGO) 745 612 seq_printf(m, ",mode=%o", mode); 613 + 614 + if (opts->delegate_cmds || opts->delegate_maps || 615 + opts->delegate_progs || opts->delegate_attachs) { 616 + struct bpffs_btf_enums info; 617 + 618 + /* ignore errors, fallback to hex */ 619 + (void)find_bpffs_btf_enums(&info); 620 + 621 + mask = (1ULL << __MAX_BPF_CMD) - 1; 622 + seq_print_delegate_opts(m, "delegate_cmds", 623 + info.btf, info.cmd_t, "BPF_", 624 + opts->delegate_cmds, mask); 625 + 626 + mask = (1ULL << __MAX_BPF_MAP_TYPE) - 1; 627 + seq_print_delegate_opts(m, "delegate_maps", 628 + info.btf, info.map_t, "BPF_MAP_TYPE_", 629 + opts->delegate_maps, mask); 630 + 631 + mask = (1ULL << __MAX_BPF_PROG_TYPE) - 1; 632 + seq_print_delegate_opts(m, "delegate_progs", 633 + info.btf, info.prog_t, "BPF_PROG_TYPE_", 634 + opts->delegate_progs, mask); 635 + 636 + mask = (1ULL << __MAX_BPF_ATTACH_TYPE) - 1; 637 + seq_print_delegate_opts(m, "delegate_attachs", 638 + info.btf, info.attach_t, "BPF_", 639 + opts->delegate_attachs, mask); 640 + } 641 + 746 642 return 0; 747 643 } 748 644 ··· 786 624 free_inode_nonrcu(inode); 787 625 } 788 626 789 - static const struct super_operations bpf_super_ops = { 627 + const struct super_operations bpf_super_ops = { 790 628 
.statfs = simple_statfs, 791 629 .drop_inode = generic_delete_inode, 792 630 .show_options = bpf_show_options, ··· 797 635 OPT_UID, 798 636 OPT_GID, 799 637 OPT_MODE, 638 + OPT_DELEGATE_CMDS, 639 + OPT_DELEGATE_MAPS, 640 + OPT_DELEGATE_PROGS, 641 + OPT_DELEGATE_ATTACHS, 800 642 }; 801 643 802 644 static const struct fs_parameter_spec bpf_fs_parameters[] = { 803 645 fsparam_u32 ("uid", OPT_UID), 804 646 fsparam_u32 ("gid", OPT_GID), 805 647 fsparam_u32oct ("mode", OPT_MODE), 648 + fsparam_string ("delegate_cmds", OPT_DELEGATE_CMDS), 649 + fsparam_string ("delegate_maps", OPT_DELEGATE_MAPS), 650 + fsparam_string ("delegate_progs", OPT_DELEGATE_PROGS), 651 + fsparam_string ("delegate_attachs", OPT_DELEGATE_ATTACHS), 806 652 {} 807 - }; 808 - 809 - struct bpf_mount_opts { 810 - kuid_t uid; 811 - kgid_t gid; 812 - umode_t mode; 813 653 }; 814 654 815 655 static int bpf_parse_param(struct fs_context *fc, struct fs_parameter *param) 816 656 { 817 - struct bpf_mount_opts *opts = fc->fs_private; 657 + struct bpf_mount_opts *opts = fc->s_fs_info; 818 658 struct fs_parse_result result; 819 659 kuid_t uid; 820 660 kgid_t gid; 821 - int opt; 661 + int opt, err; 822 662 823 663 opt = fs_parse(fc, bpf_fs_parameters, param, &result); 824 664 if (opt < 0) { ··· 871 707 break; 872 708 case OPT_MODE: 873 709 opts->mode = result.uint_32 & S_IALLUGO; 710 + break; 711 + case OPT_DELEGATE_CMDS: 712 + case OPT_DELEGATE_MAPS: 713 + case OPT_DELEGATE_PROGS: 714 + case OPT_DELEGATE_ATTACHS: { 715 + struct bpffs_btf_enums info; 716 + const struct btf_type *enum_t; 717 + const char *enum_pfx; 718 + u64 *delegate_msk, msk = 0; 719 + char *p; 720 + int val; 721 + 722 + /* ignore errors, fallback to hex */ 723 + (void)find_bpffs_btf_enums(&info); 724 + 725 + switch (opt) { 726 + case OPT_DELEGATE_CMDS: 727 + delegate_msk = &opts->delegate_cmds; 728 + enum_t = info.cmd_t; 729 + enum_pfx = "BPF_"; 730 + break; 731 + case OPT_DELEGATE_MAPS: 732 + delegate_msk = &opts->delegate_maps; 733 + enum_t = 
info.map_t; 734 + enum_pfx = "BPF_MAP_TYPE_"; 735 + break; 736 + case OPT_DELEGATE_PROGS: 737 + delegate_msk = &opts->delegate_progs; 738 + enum_t = info.prog_t; 739 + enum_pfx = "BPF_PROG_TYPE_"; 740 + break; 741 + case OPT_DELEGATE_ATTACHS: 742 + delegate_msk = &opts->delegate_attachs; 743 + enum_t = info.attach_t; 744 + enum_pfx = "BPF_"; 745 + break; 746 + default: 747 + return -EINVAL; 748 + } 749 + 750 + while ((p = strsep(&param->string, ":"))) { 751 + if (strcmp(p, "any") == 0) { 752 + msk |= ~0ULL; 753 + } else if (find_btf_enum_const(info.btf, enum_t, enum_pfx, p, &val)) { 754 + msk |= 1ULL << val; 755 + } else { 756 + err = kstrtou64(p, 0, &msk); 757 + if (err) 758 + return err; 759 + } 760 + } 761 + 762 + /* Setting delegation mount options requires privileges */ 763 + if (msk && !capable(CAP_SYS_ADMIN)) 764 + return -EPERM; 765 + 766 + *delegate_msk |= msk; 767 + break; 768 + } 769 + default: 770 + /* ignore unknown mount options */ 874 771 break; 875 772 } 876 773 ··· 1009 784 static int bpf_fill_super(struct super_block *sb, struct fs_context *fc) 1010 785 { 1011 786 static const struct tree_descr bpf_rfiles[] = { { "" } }; 1012 - struct bpf_mount_opts *opts = fc->fs_private; 787 + struct bpf_mount_opts *opts = sb->s_fs_info; 1013 788 struct inode *inode; 1014 789 int ret; 790 + 791 + /* Mounting an instance of BPF FS requires privileges */ 792 + if (fc->user_ns != &init_user_ns && !capable(CAP_SYS_ADMIN)) 793 + return -EPERM; 1015 794 1016 795 ret = simple_fill_super(sb, BPF_FS_MAGIC, bpf_rfiles); 1017 796 if (ret) ··· 1040 811 1041 812 static void bpf_free_fc(struct fs_context *fc) 1042 813 { 1043 - kfree(fc->fs_private); 814 + kfree(fc->s_fs_info); 1044 815 } 1045 816 1046 817 static const struct fs_context_operations bpf_context_ops = { ··· 1064 835 opts->uid = current_fsuid(); 1065 836 opts->gid = current_fsgid(); 1066 837 1067 - fc->fs_private = opts; 838 + /* start out with no BPF token delegation enabled */ 839 + opts->delegate_cmds = 0; 840 
+ opts->delegate_maps = 0; 841 + opts->delegate_progs = 0; 842 + opts->delegate_attachs = 0; 843 + 844 + fc->s_fs_info = opts; 1068 845 fc->ops = &bpf_context_ops; 1069 846 return 0; 847 + } 848 + 849 + static void bpf_kill_super(struct super_block *sb) 850 + { 851 + struct bpf_mount_opts *opts = sb->s_fs_info; 852 + 853 + kill_litter_super(sb); 854 + kfree(opts); 1070 855 } 1071 856 1072 857 static struct file_system_type bpf_fs_type = { ··· 1088 845 .name = "bpf", 1089 846 .init_fs_context = bpf_init_fs_context, 1090 847 .parameters = bpf_fs_parameters, 1091 - .kill_sb = kill_litter_super, 848 + .kill_sb = bpf_kill_super, 849 + .fs_flags = FS_USERNS_MOUNT, 1092 850 }; 1093 851 1094 852 static int __init bpf_init(void)

+177 -57

kernel/bpf/syscall.c

··· 1011 1011 return -ENOTSUPP; 1012 1012 } 1013 1013 1014 - static int map_check_btf(struct bpf_map *map, const struct btf *btf, 1015 - u32 btf_key_id, u32 btf_value_id) 1014 + static int map_check_btf(struct bpf_map *map, struct bpf_token *token, 1015 + const struct btf *btf, u32 btf_key_id, u32 btf_value_id) 1016 1016 { 1017 1017 const struct btf_type *key_type, *value_type; 1018 1018 u32 key_size, value_size; ··· 1040 1040 if (!IS_ERR_OR_NULL(map->record)) { 1041 1041 int i; 1042 1042 1043 - if (!bpf_capable()) { 1043 + if (!bpf_token_capable(token, CAP_BPF)) { 1044 1044 ret = -EPERM; 1045 1045 goto free_map_tab; 1046 1046 } ··· 1123 1123 return ret; 1124 1124 } 1125 1125 1126 - #define BPF_MAP_CREATE_LAST_FIELD map_extra 1126 + static bool bpf_net_capable(void) 1127 + { 1128 + return capable(CAP_NET_ADMIN) || capable(CAP_SYS_ADMIN); 1129 + } 1130 + 1131 + #define BPF_MAP_CREATE_LAST_FIELD map_token_fd 1127 1132 /* called via syscall */ 1128 1133 static int map_create(union bpf_attr *attr) 1129 1134 { 1130 1135 const struct bpf_map_ops *ops; 1136 + struct bpf_token *token = NULL; 1131 1137 int numa_node = bpf_map_attr_numa_node(attr); 1132 1138 u32 map_type = attr->map_type; 1133 1139 struct bpf_map *map; 1140 + bool token_flag; 1134 1141 int f_flags; 1135 1142 int err; 1136 1143 1137 1144 err = CHECK_ATTR(BPF_MAP_CREATE); 1138 1145 if (err) 1139 1146 return -EINVAL; 1147 + 1148 + /* check BPF_F_TOKEN_FD flag, remember if it's set, and then clear it 1149 + * to avoid per-map type checks tripping on unknown flag 1150 + */ 1151 + token_flag = attr->map_flags & BPF_F_TOKEN_FD; 1152 + attr->map_flags &= ~BPF_F_TOKEN_FD; 1140 1153 1141 1154 if (attr->btf_vmlinux_value_type_id) { 1142 1155 if (attr->map_type != BPF_MAP_TYPE_STRUCT_OPS || ··· 1191 1178 if (!ops->map_mem_usage) 1192 1179 return -EINVAL; 1193 1180 1181 + if (token_flag) { 1182 + token = bpf_token_get_from_fd(attr->map_token_fd); 1183 + if (IS_ERR(token)) 1184 + return PTR_ERR(token); 1185 + 1186 + /* if 
current token doesn't grant map creation permissions, 1187 + * then we can't use this token, so ignore it and rely on 1188 + * system-wide capabilities checks 1189 + */ 1190 + if (!bpf_token_allow_cmd(token, BPF_MAP_CREATE) || 1191 + !bpf_token_allow_map_type(token, attr->map_type)) { 1192 + bpf_token_put(token); 1193 + token = NULL; 1194 + } 1195 + } 1196 + 1197 + err = -EPERM; 1198 + 1194 1199 /* Intent here is for unprivileged_bpf_disabled to block BPF map 1195 1200 * creation for unprivileged users; other actions depend 1196 1201 * on fd availability and access to bpffs, so are dependent on 1197 1202 * object creation success. Even with unprivileged BPF disabled, 1198 1203 * capability checks are still carried out. 1199 1204 */ 1200 - if (sysctl_unprivileged_bpf_disabled && !bpf_capable()) 1201 - return -EPERM; 1205 + if (sysctl_unprivileged_bpf_disabled && !bpf_token_capable(token, CAP_BPF)) 1206 + goto put_token; 1202 1207 1203 1208 /* check privileged map type permissions */ 1204 1209 switch (map_type) { ··· 1249 1218 case BPF_MAP_TYPE_LRU_PERCPU_HASH: 1250 1219 case BPF_MAP_TYPE_STRUCT_OPS: 1251 1220 case BPF_MAP_TYPE_CPUMAP: 1252 - if (!bpf_capable()) 1253 - return -EPERM; 1221 + if (!bpf_token_capable(token, CAP_BPF)) 1222 + goto put_token; 1254 1223 break; 1255 1224 case BPF_MAP_TYPE_SOCKMAP: 1256 1225 case BPF_MAP_TYPE_SOCKHASH: 1257 1226 case BPF_MAP_TYPE_DEVMAP: 1258 1227 case BPF_MAP_TYPE_DEVMAP_HASH: 1259 1228 case BPF_MAP_TYPE_XSKMAP: 1260 - if (!capable(CAP_NET_ADMIN)) 1261 - return -EPERM; 1229 + if (!bpf_token_capable(token, CAP_NET_ADMIN)) 1230 + goto put_token; 1262 1231 break; 1263 1232 default: 1264 1233 WARN(1, "unsupported map type %d", map_type); 1265 - return -EPERM; 1234 + goto put_token; 1266 1235 } 1267 1236 1268 1237 map = ops->map_alloc(attr); 1269 - if (IS_ERR(map)) 1270 - return PTR_ERR(map); 1238 + if (IS_ERR(map)) { 1239 + err = PTR_ERR(map); 1240 + goto put_token; 1241 + } 1271 1242 map->ops = ops; 1272 1243 map->map_type = 
map_type; 1273 1244 ··· 1306 1273 map->btf = btf; 1307 1274 1308 1275 if (attr->btf_value_type_id) { 1309 - err = map_check_btf(map, btf, attr->btf_key_type_id, 1276 + err = map_check_btf(map, token, btf, attr->btf_key_type_id, 1310 1277 attr->btf_value_type_id); 1311 1278 if (err) 1312 1279 goto free_map; ··· 1318 1285 attr->btf_vmlinux_value_type_id; 1319 1286 } 1320 1287 1321 - err = security_bpf_map_alloc(map); 1288 + err = security_bpf_map_create(map, attr, token); 1322 1289 if (err) 1323 - goto free_map; 1290 + goto free_map_sec; 1324 1291 1325 1292 err = bpf_map_alloc_id(map); 1326 1293 if (err) 1327 1294 goto free_map_sec; 1328 1295 1329 1296 bpf_map_save_memcg(map); 1297 + bpf_token_put(token); 1330 1298 1331 1299 err = bpf_map_new_fd(map, f_flags); 1332 1300 if (err < 0) { ··· 1348 1314 free_map: 1349 1315 btf_put(map->btf); 1350 1316 map->ops->map_free(map); 1317 + put_token: 1318 + bpf_token_put(token); 1351 1319 return err; 1352 1320 } 1353 1321 ··· 2180 2144 kvfree(aux->func_info); 2181 2145 kfree(aux->func_info_aux); 2182 2146 free_uid(aux->user); 2183 - security_bpf_prog_free(aux); 2147 + security_bpf_prog_free(aux->prog); 2184 2148 bpf_prog_free(aux->prog); 2185 2149 } 2186 2150 ··· 2626 2590 } 2627 2591 2628 2592 /* last field in 'union bpf_attr' used by this command */ 2629 - #define BPF_PROG_LOAD_LAST_FIELD log_true_size 2593 + #define BPF_PROG_LOAD_LAST_FIELD prog_token_fd 2630 2594 2631 2595 static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size) 2632 2596 { 2633 2597 enum bpf_prog_type type = attr->prog_type; 2634 2598 struct bpf_prog *prog, *dst_prog = NULL; 2635 2599 struct btf *attach_btf = NULL; 2600 + struct bpf_token *token = NULL; 2601 + bool bpf_cap; 2636 2602 int err; 2637 2603 char license[128]; 2638 2604 ··· 2648 2610 BPF_F_TEST_RND_HI32 | 2649 2611 BPF_F_XDP_HAS_FRAGS | 2650 2612 BPF_F_XDP_DEV_BOUND_ONLY | 2651 - BPF_F_TEST_REG_INVARIANTS)) 2613 + BPF_F_TEST_REG_INVARIANTS | 2614 + BPF_F_TOKEN_FD)) 2652 2615 
return -EINVAL; 2616 + 2617 + bpf_prog_load_fixup_attach_type(attr); 2618 + 2619 + if (attr->prog_flags & BPF_F_TOKEN_FD) { 2620 + token = bpf_token_get_from_fd(attr->prog_token_fd); 2621 + if (IS_ERR(token)) 2622 + return PTR_ERR(token); 2623 + /* if current token doesn't grant prog loading permissions, 2624 + * then we can't use this token, so ignore it and rely on 2625 + * system-wide capabilities checks 2626 + */ 2627 + if (!bpf_token_allow_cmd(token, BPF_PROG_LOAD) || 2628 + !bpf_token_allow_prog_type(token, attr->prog_type, 2629 + attr->expected_attach_type)) { 2630 + bpf_token_put(token); 2631 + token = NULL; 2632 + } 2633 + } 2634 + 2635 + bpf_cap = bpf_token_capable(token, CAP_BPF); 2636 + err = -EPERM; 2653 2637 2654 2638 if (!IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) && 2655 2639 (attr->prog_flags & BPF_F_ANY_ALIGNMENT) && 2656 - !bpf_capable()) 2657 - return -EPERM; 2640 + !bpf_cap) 2641 + goto put_token; 2658 2642 2659 2643 /* Intent here is for unprivileged_bpf_disabled to block BPF program 2660 2644 * creation for unprivileged users; other actions depend ··· 2685 2625 * capability checks are still carried out for these 2686 2626 * and other operations. 2687 2627 */ 2688 - if (sysctl_unprivileged_bpf_disabled && !bpf_capable()) 2689 - return -EPERM; 2628 + if (sysctl_unprivileged_bpf_disabled && !bpf_cap) 2629 + goto put_token; 2690 2630 2691 2631 if (attr->insn_cnt == 0 || 2692 - attr->insn_cnt > (bpf_capable() ? BPF_COMPLEXITY_LIMIT_INSNS : BPF_MAXINSNS)) 2693 - return -E2BIG; 2632 + attr->insn_cnt > (bpf_cap ? 
BPF_COMPLEXITY_LIMIT_INSNS : BPF_MAXINSNS)) { 2633 + err = -E2BIG; 2634 + goto put_token; 2635 + } 2694 2636 if (type != BPF_PROG_TYPE_SOCKET_FILTER && 2695 2637 type != BPF_PROG_TYPE_CGROUP_SKB && 2696 - !bpf_capable()) 2697 - return -EPERM; 2638 + !bpf_cap) 2639 + goto put_token; 2698 2640 2699 - if (is_net_admin_prog_type(type) && !capable(CAP_NET_ADMIN) && !capable(CAP_SYS_ADMIN)) 2700 - return -EPERM; 2701 - if (is_perfmon_prog_type(type) && !perfmon_capable()) 2702 - return -EPERM; 2641 + if (is_net_admin_prog_type(type) && !bpf_token_capable(token, CAP_NET_ADMIN)) 2642 + goto put_token; 2643 + if (is_perfmon_prog_type(type) && !bpf_token_capable(token, CAP_PERFMON)) 2644 + goto put_token; 2703 2645 2704 2646 /* attach_prog_fd/attach_btf_obj_fd can specify fd of either bpf_prog 2705 2647 * or btf, we need to check which one it is ··· 2711 2649 if (IS_ERR(dst_prog)) { 2712 2650 dst_prog = NULL; 2713 2651 attach_btf = btf_get_by_fd(attr->attach_btf_obj_fd); 2714 - if (IS_ERR(attach_btf)) 2715 - return -EINVAL; 2652 + if (IS_ERR(attach_btf)) { 2653 + err = -EINVAL; 2654 + goto put_token; 2655 + } 2716 2656 if (!btf_is_kernel(attach_btf)) { 2717 2657 /* attaching through specifying bpf_prog's BTF 2718 2658 * objects directly might be supported eventually 2719 2659 */ 2720 2660 btf_put(attach_btf); 2721 - return -ENOTSUPP; 2661 + err = -ENOTSUPP; 2662 + goto put_token; 2722 2663 } 2723 2664 } 2724 2665 } else if (attr->attach_btf_id) { 2725 2666 /* fall back to vmlinux BTF, if BTF type ID is specified */ 2726 2667 attach_btf = bpf_get_btf_vmlinux(); 2727 - if (IS_ERR(attach_btf)) 2728 - return PTR_ERR(attach_btf); 2729 - if (!attach_btf) 2730 - return -EINVAL; 2668 + if (IS_ERR(attach_btf)) { 2669 + err = PTR_ERR(attach_btf); 2670 + goto put_token; 2671 + } 2672 + if (!attach_btf) { 2673 + err = -EINVAL; 2674 + goto put_token; 2675 + } 2731 2676 btf_get(attach_btf); 2732 2677 } 2733 2678 2734 - bpf_prog_load_fixup_attach_type(attr); 2735 2679 if 
(bpf_prog_load_check_attach(type, attr->expected_attach_type, 2736 2680 attach_btf, attr->attach_btf_id, 2737 2681 dst_prog)) { ··· 2745 2677 bpf_prog_put(dst_prog); 2746 2678 if (attach_btf) 2747 2679 btf_put(attach_btf); 2748 - return -EINVAL; 2680 + err = -EINVAL; 2681 + goto put_token; 2749 2682 } 2750 2683 2751 2684 /* plain bpf_prog allocation */ ··· 2756 2687 bpf_prog_put(dst_prog); 2757 2688 if (attach_btf) 2758 2689 btf_put(attach_btf); 2759 - return -ENOMEM; 2690 + err = -EINVAL; 2691 + goto put_token; 2760 2692 } 2761 2693 2762 2694 prog->expected_attach_type = attr->expected_attach_type; ··· 2768 2698 prog->aux->sleepable = attr->prog_flags & BPF_F_SLEEPABLE; 2769 2699 prog->aux->xdp_has_frags = attr->prog_flags & BPF_F_XDP_HAS_FRAGS; 2770 2700 2771 - err = security_bpf_prog_alloc(prog->aux); 2772 - if (err) 2773 - goto free_prog; 2701 + /* move token into prog->aux, reuse taken refcnt */ 2702 + prog->aux->token = token; 2703 + token = NULL; 2774 2704 2775 2705 prog->aux->user = get_current_user(); 2776 2706 prog->len = attr->insn_cnt; ··· 2779 2709 if (copy_from_bpfptr(prog->insns, 2780 2710 make_bpfptr(attr->insns, uattr.is_kernel), 2781 2711 bpf_prog_insn_size(prog)) != 0) 2782 - goto free_prog_sec; 2712 + goto free_prog; 2783 2713 /* copy eBPF program license from user space */ 2784 2714 if (strncpy_from_bpfptr(license, 2785 2715 make_bpfptr(attr->license, uattr.is_kernel), 2786 2716 sizeof(license) - 1) < 0) 2787 - goto free_prog_sec; 2717 + goto free_prog; 2788 2718 license[sizeof(license) - 1] = 0; 2789 2719 2790 2720 /* eBPF programs must be GPL compatible to use GPL-ed functions */ ··· 2798 2728 if (bpf_prog_is_dev_bound(prog->aux)) { 2799 2729 err = bpf_prog_dev_bound_init(prog, attr); 2800 2730 if (err) 2801 - goto free_prog_sec; 2731 + goto free_prog; 2802 2732 } 2803 2733 2804 2734 if (type == BPF_PROG_TYPE_EXT && dst_prog && 2805 2735 bpf_prog_is_dev_bound(dst_prog->aux)) { 2806 2736 err = bpf_prog_dev_bound_inherit(prog, dst_prog); 2807 
2737 if (err) 2808 - goto free_prog_sec; 2738 + goto free_prog; 2809 2739 } 2810 2740 2811 2741 /* ··· 2827 2757 /* find program type: socket_filter vs tracing_filter */ 2828 2758 err = find_prog_type(type, prog); 2829 2759 if (err < 0) 2830 - goto free_prog_sec; 2760 + goto free_prog; 2831 2761 2832 2762 prog->aux->load_time = ktime_get_boottime_ns(); 2833 2763 err = bpf_obj_name_cpy(prog->aux->name, attr->prog_name, 2834 2764 sizeof(attr->prog_name)); 2835 2765 if (err < 0) 2766 + goto free_prog; 2767 + 2768 + err = security_bpf_prog_load(prog, attr, token); 2769 + if (err) 2836 2770 goto free_prog_sec; 2837 2771 2838 2772 /* run eBPF verifier */ ··· 2882 2808 */ 2883 2809 __bpf_prog_put_noref(prog, prog->aux->real_func_cnt); 2884 2810 return err; 2811 + 2885 2812 free_prog_sec: 2886 - free_uid(prog->aux->user); 2887 - security_bpf_prog_free(prog->aux); 2813 + security_bpf_prog_free(prog); 2888 2814 free_prog: 2815 + free_uid(prog->aux->user); 2889 2816 if (prog->aux->attach_btf) 2890 2817 btf_put(prog->aux->attach_btf); 2891 2818 bpf_prog_free(prog); 2819 + put_token: 2820 + bpf_token_put(token); 2892 2821 return err; 2893 2822 } 2894 2823 ··· 3578 3501 if (!kallsyms_show_value(current_cred())) 3579 3502 addr = 0; 3580 3503 info->perf_event.kprobe.addr = addr; 3504 + info->perf_event.kprobe.cookie = event->bpf_cookie; 3581 3505 return 0; 3582 3506 } 3583 3507 #endif ··· 3604 3526 else 3605 3527 info->perf_event.type = BPF_PERF_EVENT_UPROBE; 3606 3528 info->perf_event.uprobe.offset = offset; 3529 + info->perf_event.uprobe.cookie = event->bpf_cookie; 3607 3530 return 0; 3608 3531 } 3609 3532 #endif ··· 3632 3553 uname = u64_to_user_ptr(info->perf_event.tracepoint.tp_name); 3633 3554 ulen = info->perf_event.tracepoint.name_len; 3634 3555 info->perf_event.type = BPF_PERF_EVENT_TRACEPOINT; 3556 + info->perf_event.tracepoint.cookie = event->bpf_cookie; 3635 3557 return bpf_perf_link_fill_common(event, uname, ulen, NULL, NULL, NULL, NULL); 3636 3558 } 3637 3559 ··· 
3641 3561 { 3642 3562 info->perf_event.event.type = event->attr.type; 3643 3563 info->perf_event.event.config = event->attr.config; 3564 + info->perf_event.event.cookie = event->bpf_cookie; 3644 3565 info->perf_event.type = BPF_PERF_EVENT_EVENT; 3645 3566 return 0; 3646 3567 } ··· 3899 3818 case BPF_PROG_TYPE_SK_LOOKUP: 3900 3819 return attach_type == prog->expected_attach_type ? 0 : -EINVAL; 3901 3820 case BPF_PROG_TYPE_CGROUP_SKB: 3902 - if (!capable(CAP_NET_ADMIN)) 3821 + if (!bpf_token_capable(prog->aux->token, CAP_NET_ADMIN)) 3903 3822 /* cg-skb progs can be loaded by unpriv user. 3904 3823 * check permissions at attach time. 3905 3824 */ ··· 4102 4021 static int bpf_prog_query(const union bpf_attr *attr, 4103 4022 union bpf_attr __user *uattr) 4104 4023 { 4105 - if (!capable(CAP_NET_ADMIN)) 4024 + if (!bpf_net_capable()) 4106 4025 return -EPERM; 4107 4026 if (CHECK_ATTR(BPF_PROG_QUERY)) 4108 4027 return -EINVAL; ··· 4768 4687 info.btf_value_type_id = map->btf_value_type_id; 4769 4688 } 4770 4689 info.btf_vmlinux_value_type_id = map->btf_vmlinux_value_type_id; 4690 + if (map->map_type == BPF_MAP_TYPE_STRUCT_OPS) 4691 + bpf_map_struct_ops_info_fill(&info, map); 4771 4692 4772 4693 if (bpf_map_is_offloaded(map)) { 4773 4694 err = bpf_map_offload_info_fill(&info, map); ··· 4872 4789 return err; 4873 4790 } 4874 4791 4875 - #define BPF_BTF_LOAD_LAST_FIELD btf_log_true_size 4792 + #define BPF_BTF_LOAD_LAST_FIELD btf_token_fd 4876 4793 4877 4794 static int bpf_btf_load(const union bpf_attr *attr, bpfptr_t uattr, __u32 uattr_size) 4878 4795 { 4796 + struct bpf_token *token = NULL; 4797 + 4879 4798 if (CHECK_ATTR(BPF_BTF_LOAD)) 4880 4799 return -EINVAL; 4881 4800 4882 - if (!bpf_capable()) 4801 + if (attr->btf_flags & ~BPF_F_TOKEN_FD) 4802 + return -EINVAL; 4803 + 4804 + if (attr->btf_flags & BPF_F_TOKEN_FD) { 4805 + token = bpf_token_get_from_fd(attr->btf_token_fd); 4806 + if (IS_ERR(token)) 4807 + return PTR_ERR(token); 4808 + if (!bpf_token_allow_cmd(token, 
BPF_BTF_LOAD)) { 4809 + bpf_token_put(token); 4810 + token = NULL; 4811 + } 4812 + } 4813 + 4814 + if (!bpf_token_capable(token, CAP_BPF)) { 4815 + bpf_token_put(token); 4883 4816 return -EPERM; 4817 + } 4818 + 4819 + bpf_token_put(token); 4884 4820 4885 4821 return btf_new_fd(attr, uattr, uattr_size); 4886 4822 } ··· 5517 5415 return ret; 5518 5416 } 5519 5417 5418 + #define BPF_TOKEN_CREATE_LAST_FIELD token_create.bpffs_fd 5419 + 5420 + static int token_create(union bpf_attr *attr) 5421 + { 5422 + if (CHECK_ATTR(BPF_TOKEN_CREATE)) 5423 + return -EINVAL; 5424 + 5425 + /* no flags are supported yet */ 5426 + if (attr->token_create.flags) 5427 + return -EINVAL; 5428 + 5429 + return bpf_token_create(attr); 5430 + } 5431 + 5520 5432 static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size) 5521 5433 { 5522 5434 union bpf_attr attr; ··· 5664 5548 case BPF_PROG_BIND_MAP: 5665 5549 err = bpf_prog_bind_map(&attr); 5666 5550 break; 5551 + case BPF_TOKEN_CREATE: 5552 + err = token_create(&attr); 5553 + break; 5667 5554 default: 5668 5555 err = -EINVAL; 5669 5556 break; ··· 5773 5654 const struct bpf_func_proto * __weak 5774 5655 tracing_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) 5775 5656 { 5776 - return bpf_base_func_proto(func_id); 5657 + return bpf_base_func_proto(func_id, prog); 5777 5658 } 5778 5659 5779 5660 BPF_CALL_1(bpf_sys_close, u32, fd) ··· 5823 5704 { 5824 5705 switch (func_id) { 5825 5706 case BPF_FUNC_sys_bpf: 5826 - return !perfmon_capable() ? NULL : &bpf_sys_bpf_proto; 5707 + return !bpf_token_capable(prog->aux->token, CAP_PERFMON) 5708 + ? NULL : &bpf_sys_bpf_proto; 5827 5709 case BPF_FUNC_btf_find_by_name_kind: 5828 5710 return &bpf_btf_find_by_name_kind_proto; 5829 5711 case BPF_FUNC_sys_close:
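Review note: the `put_token` label added in the program-load path above relies on `bpf_token_put()` accepting NULL, so that once the token has been moved into `prog->aux` the common exit does not drop the reference a second time. The following is a minimal userspace sketch of that ownership hand-off, not the kernel code: `tok_model`, `aux_model`, `tok_model_put()` and `load_model()` are hypothetical names, and the plain counter stands in for the kernel's atomic refcount and deferred free.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct bpf_token: a counter plus a "freed" flag. */
struct tok_model {
	long refcnt;
	int freed;
};

/* Hypothetical stand-in for prog->aux, which ends up owning the token. */
struct aux_model {
	struct tok_model *token;
};

/* Like bpf_token_put() in the diff: NULL is a no-op, last put frees. */
static void tok_model_put(struct tok_model *t)
{
	if (!t)
		return;
	if (--t->refcnt == 0)
		t->freed = 1;
}

/* The caller already holds one reference on @token (assumption of this sketch). */
static int load_model(struct aux_model *aux, struct tok_model *token, int fail_early)
{
	int err = 0;

	if (fail_early) {
		err = -1;	/* any error before the hand-off */
		goto put_token;
	}

	/* move token into aux, reuse the reference that was already taken */
	aux->token = token;
	token = NULL;

put_token:
	tok_model_put(token);	/* no-op once ownership has moved */
	return err;
}
```

In this model an early failure drops the caller's reference, while a successful hand-off leaves the count untouched because the local pointer is NULL by the time the label is reached.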

+278

kernel/bpf/token.c

···
+#include <linux/bpf.h>
+#include <linux/vmalloc.h>
+#include <linux/fdtable.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/kernel.h>
+#include <linux/idr.h>
+#include <linux/namei.h>
+#include <linux/user_namespace.h>
+#include <linux/security.h>
+
+static bool bpf_ns_capable(struct user_namespace *ns, int cap)
+{
+	return ns_capable(ns, cap) || (cap != CAP_SYS_ADMIN && ns_capable(ns, CAP_SYS_ADMIN));
+}
+
+bool bpf_token_capable(const struct bpf_token *token, int cap)
+{
+	struct user_namespace *userns;
+
+	/* BPF token allows ns_capable() level of capabilities */
+	userns = token ? token->userns : &init_user_ns;
+	if (!bpf_ns_capable(userns, cap))
+		return false;
+	if (token && security_bpf_token_capable(token, cap) < 0)
+		return false;
+	return true;
+}
+
+void bpf_token_inc(struct bpf_token *token)
+{
+	atomic64_inc(&token->refcnt);
+}
+
+static void bpf_token_free(struct bpf_token *token)
+{
+	security_bpf_token_free(token);
+	put_user_ns(token->userns);
+	kfree(token);
+}
+
+static void bpf_token_put_deferred(struct work_struct *work)
+{
+	struct bpf_token *token = container_of(work, struct bpf_token, work);
+
+	bpf_token_free(token);
+}
+
+void bpf_token_put(struct bpf_token *token)
+{
+	if (!token)
+		return;
+
+	if (!atomic64_dec_and_test(&token->refcnt))
+		return;
+
+	INIT_WORK(&token->work, bpf_token_put_deferred);
+	schedule_work(&token->work);
+}
+
+static int bpf_token_release(struct inode *inode, struct file *filp)
+{
+	struct bpf_token *token = filp->private_data;
+
+	bpf_token_put(token);
+	return 0;
+}
+
+static void bpf_token_show_fdinfo(struct seq_file *m, struct file *filp)
+{
+	struct bpf_token *token = filp->private_data;
+	u64 mask;
+
+	BUILD_BUG_ON(__MAX_BPF_CMD >= 64);
+	mask = (1ULL << __MAX_BPF_CMD) - 1;
+	if ((token->allowed_cmds & mask) == mask)
+		seq_printf(m, "allowed_cmds:\tany\n");
+	else
+		seq_printf(m, "allowed_cmds:\t0x%llx\n", token->allowed_cmds);
+
+	BUILD_BUG_ON(__MAX_BPF_MAP_TYPE >= 64);
+	mask = (1ULL << __MAX_BPF_MAP_TYPE) - 1;
+	if ((token->allowed_maps & mask) == mask)
+		seq_printf(m, "allowed_maps:\tany\n");
+	else
+		seq_printf(m, "allowed_maps:\t0x%llx\n", token->allowed_maps);
+
+	BUILD_BUG_ON(__MAX_BPF_PROG_TYPE >= 64);
+	mask = (1ULL << __MAX_BPF_PROG_TYPE) - 1;
+	if ((token->allowed_progs & mask) == mask)
+		seq_printf(m, "allowed_progs:\tany\n");
+	else
+		seq_printf(m, "allowed_progs:\t0x%llx\n", token->allowed_progs);
+
+	BUILD_BUG_ON(__MAX_BPF_ATTACH_TYPE >= 64);
+	mask = (1ULL << __MAX_BPF_ATTACH_TYPE) - 1;
+	if ((token->allowed_attachs & mask) == mask)
+		seq_printf(m, "allowed_attachs:\tany\n");
+	else
+		seq_printf(m, "allowed_attachs:\t0x%llx\n", token->allowed_attachs);
+}
+
+#define BPF_TOKEN_INODE_NAME "bpf-token"
+
+static const struct inode_operations bpf_token_iops = { };
+
+static const struct file_operations bpf_token_fops = {
+	.release	= bpf_token_release,
+	.show_fdinfo	= bpf_token_show_fdinfo,
+};
+
+int bpf_token_create(union bpf_attr *attr)
+{
+	struct bpf_mount_opts *mnt_opts;
+	struct bpf_token *token = NULL;
+	struct user_namespace *userns;
+	struct inode *inode;
+	struct file *file;
+	struct path path;
+	struct fd f;
+	umode_t mode;
+	int err, fd;
+
+	f = fdget(attr->token_create.bpffs_fd);
+	if (!f.file)
+		return -EBADF;
+
+	path = f.file->f_path;
+	path_get(&path);
+	fdput(f);
+
+	if (path.dentry != path.mnt->mnt_sb->s_root) {
+		err = -EINVAL;
+		goto out_path;
+	}
+	if (path.mnt->mnt_sb->s_op != &bpf_super_ops) {
+		err = -EINVAL;
+		goto out_path;
+	}
+	err = path_permission(&path, MAY_ACCESS);
+	if (err)
+		goto out_path;
+
+	userns = path.dentry->d_sb->s_user_ns;
+	/*
+	 * Enforce that creators of BPF tokens are in the same user
+	 * namespace as the BPF FS instance. This makes reasoning about
+	 * permissions a lot easier and we can always relax this later.
+	 */
+	if (current_user_ns() != userns) {
+		err = -EPERM;
+		goto out_path;
+	}
+	if (!ns_capable(userns, CAP_BPF)) {
+		err = -EPERM;
+		goto out_path;
+	}
+
+	/* Creating BPF token in init_user_ns doesn't make much sense. */
+	if (current_user_ns() == &init_user_ns) {
+		err = -EOPNOTSUPP;
+		goto out_path;
+	}
+
+	mnt_opts = path.dentry->d_sb->s_fs_info;
+	if (mnt_opts->delegate_cmds == 0 &&
+	    mnt_opts->delegate_maps == 0 &&
+	    mnt_opts->delegate_progs == 0 &&
+	    mnt_opts->delegate_attachs == 0) {
+		err = -ENOENT; /* no BPF token delegation is set up */
+		goto out_path;
+	}
+
+	mode = S_IFREG | ((S_IRUSR | S_IWUSR) & ~current_umask());
+	inode = bpf_get_inode(path.mnt->mnt_sb, NULL, mode);
+	if (IS_ERR(inode)) {
+		err = PTR_ERR(inode);
+		goto out_path;
+	}
+
+	inode->i_op = &bpf_token_iops;
+	inode->i_fop = &bpf_token_fops;
+	clear_nlink(inode); /* make sure it is unlinked */
+
+	file = alloc_file_pseudo(inode, path.mnt, BPF_TOKEN_INODE_NAME, O_RDWR, &bpf_token_fops);
+	if (IS_ERR(file)) {
+		iput(inode);
+		err = PTR_ERR(file);
+		goto out_path;
+	}
+
+	token = kzalloc(sizeof(*token), GFP_USER);
+	if (!token) {
+		err = -ENOMEM;
+		goto out_file;
+	}
+
+	atomic64_set(&token->refcnt, 1);
+
+	/* remember bpffs owning userns for future ns_capable() checks */
+	token->userns = get_user_ns(userns);
+
+	token->allowed_cmds = mnt_opts->delegate_cmds;
+	token->allowed_maps = mnt_opts->delegate_maps;
+	token->allowed_progs = mnt_opts->delegate_progs;
+	token->allowed_attachs = mnt_opts->delegate_attachs;
+
+	err = security_bpf_token_create(token, attr, &path);
+	if (err)
+		goto out_token;
+
+	fd = get_unused_fd_flags(O_CLOEXEC);
+	if (fd < 0) {
+		err = fd;
+		goto out_token;
+	}
+
+	file->private_data = token;
+	fd_install(fd, file);
+
+	path_put(&path);
+	return fd;
+
+out_token:
+	bpf_token_free(token);
+out_file:
+	fput(file);
+out_path:
+	path_put(&path);
+	return err;
+}
+
+struct bpf_token *bpf_token_get_from_fd(u32 ufd)
+{
+	struct fd f = fdget(ufd);
+	struct bpf_token *token;
+
+	if (!f.file)
+		return ERR_PTR(-EBADF);
+	if (f.file->f_op != &bpf_token_fops) {
+		fdput(f);
+		return ERR_PTR(-EINVAL);
+	}
+
+	token = f.file->private_data;
+	bpf_token_inc(token);
+	fdput(f);
+
+	return token;
+}
+
+bool bpf_token_allow_cmd(const struct bpf_token *token, enum bpf_cmd cmd)
+{
+	if (!token)
+		return false;
+	if (!(token->allowed_cmds & (1ULL << cmd)))
+		return false;
+	return security_bpf_token_cmd(token, cmd) == 0;
+}
+
+bool bpf_token_allow_map_type(const struct bpf_token *token, enum bpf_map_type type)
+{
+	if (!token || type >= __MAX_BPF_MAP_TYPE)
+		return false;
+
+	return token->allowed_maps & (1ULL << type);
+}
+
+bool bpf_token_allow_prog_type(const struct bpf_token *token,
+			       enum bpf_prog_type prog_type,
+			       enum bpf_attach_type attach_type)
+{
+	if (!token || prog_type >= __MAX_BPF_PROG_TYPE || attach_type >= __MAX_BPF_ATTACH_TYPE)
+		return false;
+
+	return (token->allowed_progs & (1ULL << prog_type)) &&
+	       (token->allowed_attachs & (1ULL << attach_type));
+}
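Review note: the `bpf_token_allow_*()` helpers and the fdinfo printer in token.c above all treat each `allowed_*` field as a 64-bit set indexed by enum value. The sketch below models that in plain userspace C so the bit arithmetic can be checked in isolation. It is an illustration under stated assumptions: `token_model` and the `token_model_*` helpers are hypothetical, the LSM hooks are left out, and the enum limits are passed in as parameters instead of using the kernel's `__MAX_BPF_*` constants.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the delegation masks in struct bpf_token. */
struct token_model {
	uint64_t allowed_cmds;
	uint64_t allowed_progs;
	uint64_t allowed_attachs;
};

/* A command is allowed only with a token and with its bit set. */
static bool token_model_allow_cmd(const struct token_model *t, unsigned int cmd)
{
	if (!t || cmd >= 64)
		return false;
	return t->allowed_cmds & (1ULL << cmd);
}

/* Both the program type bit and the attach type bit must be set. */
static bool token_model_allow_prog(const struct token_model *t,
				   unsigned int prog_type, unsigned int max_prog,
				   unsigned int attach_type, unsigned int max_attach)
{
	if (!t || prog_type >= max_prog || attach_type >= max_attach)
		return false;
	if (max_prog > 64 || max_attach > 64)
		return false;
	return (t->allowed_progs & (1ULL << prog_type)) &&
	       (t->allowed_attachs & (1ULL << attach_type));
}

/* fdinfo prints "any" when every bit below the enum limit is set. */
static bool token_model_is_any(uint64_t allowed, unsigned int max)
{
	uint64_t mask;

	if (max >= 64)	/* the diff guards this case with BUILD_BUG_ON */
		return false;
	mask = (1ULL << max) - 1;
	return (allowed & mask) == mask;
}
```

Note that a NULL token never allows anything in this model, which matches the helpers above: callers fall back to ordinary capability checks instead.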

+109 -39

kernel/bpf/verifier.c

··· 4403 4403 return reg->type != SCALAR_VALUE; 4404 4404 } 4405 4405 4406 + static void assign_scalar_id_before_mov(struct bpf_verifier_env *env, 4407 + struct bpf_reg_state *src_reg) 4408 + { 4409 + if (src_reg->type == SCALAR_VALUE && !src_reg->id && 4410 + !tnum_is_const(src_reg->var_off)) 4411 + /* Ensure that src_reg has a valid ID that will be copied to 4412 + * dst_reg and then will be used by find_equal_scalars() to 4413 + * propagate min/max range. 4414 + */ 4415 + src_reg->id = ++env->id_gen; 4416 + } 4417 + 4406 4418 /* Copy src state preserving dst->parent and dst->live fields */ 4407 4419 static void copy_register_state(struct bpf_reg_state *dst, const struct bpf_reg_state *src) 4408 4420 { ··· 4448 4436 static bool is_bpf_st_mem(struct bpf_insn *insn) 4449 4437 { 4450 4438 return BPF_CLASS(insn->code) == BPF_ST && BPF_MODE(insn->code) == BPF_MEM; 4439 + } 4440 + 4441 + static int get_reg_width(struct bpf_reg_state *reg) 4442 + { 4443 + return fls64(reg->umax_value); 4451 4444 } 4452 4445 4453 4446 /* check_stack_{read,write}_fixed_off functions track spill/fill of registers, ··· 4505 4488 4506 4489 mark_stack_slot_scratched(env, spi); 4507 4490 if (reg && !(off % BPF_REG_SIZE) && register_is_bounded(reg) && env->bpf_capable) { 4491 + bool reg_value_fits; 4492 + 4493 + reg_value_fits = get_reg_width(reg) <= BITS_PER_BYTE * size; 4494 + /* Make sure that reg had an ID to build a relation on spill. */ 4495 + if (reg_value_fits) 4496 + assign_scalar_id_before_mov(env, reg); 4508 4497 save_register_state(env, state, spi, reg, size); 4509 4498 /* Break the relation on a narrowing spill. 
*/ 4510 - if (fls64(reg->umax_value) > BITS_PER_BYTE * size) 4499 + if (!reg_value_fits) 4511 4500 state->stack[spi].spilled_ptr.id = 0; 4512 4501 } else if (!reg && !(off % BPF_REG_SIZE) && is_bpf_st_mem(insn) && 4513 - insn->imm != 0 && env->bpf_capable) { 4502 + env->bpf_capable) { 4514 4503 struct bpf_reg_state fake_reg = {}; 4515 4504 4516 4505 __mark_reg_known(&fake_reg, insn->imm); ··· 4663 4640 return -EINVAL; 4664 4641 } 4665 4642 4666 - /* Erase all spilled pointers. */ 4643 + /* If writing_zero and the spi slot contains a spill of value 0, 4644 + * maintain the spill type. 4645 + */ 4646 + if (writing_zero && *stype == STACK_SPILL && 4647 + is_spilled_scalar_reg(&state->stack[spi])) { 4648 + struct bpf_reg_state *spill_reg = &state->stack[spi].spilled_ptr; 4649 + 4650 + if (tnum_is_const(spill_reg->var_off) && spill_reg->var_off.value == 0) { 4651 + zero_used = true; 4652 + continue; 4653 + } 4654 + } 4655 + 4656 + /* Erase all other spilled pointers. */ 4667 4657 state->stack[spi].spilled_ptr.type = NOT_INIT; 4668 4658 4669 4659 /* Update the slot type. 
*/ ··· 12862 12826 } 12863 12827 12864 12828 switch (base_type(ptr_reg->type)) { 12829 + case PTR_TO_CTX: 12830 + case PTR_TO_MAP_VALUE: 12831 + case PTR_TO_MAP_KEY: 12832 + case PTR_TO_STACK: 12833 + case PTR_TO_PACKET_META: 12834 + case PTR_TO_PACKET: 12835 + case PTR_TO_TP_BUFFER: 12836 + case PTR_TO_BTF_ID: 12837 + case PTR_TO_MEM: 12838 + case PTR_TO_BUF: 12839 + case PTR_TO_FUNC: 12840 + case CONST_PTR_TO_DYNPTR: 12841 + break; 12865 12842 case PTR_TO_FLOW_KEYS: 12866 12843 if (known) 12867 12844 break; ··· 12884 12835 if (known && smin_val == 0 && opcode == BPF_ADD) 12885 12836 break; 12886 12837 fallthrough; 12887 - case PTR_TO_PACKET_END: 12888 - case PTR_TO_SOCKET: 12889 - case PTR_TO_SOCK_COMMON: 12890 - case PTR_TO_TCP_SOCK: 12891 - case PTR_TO_XDP_SOCK: 12838 + default: 12892 12839 verbose(env, "R%d pointer arithmetic on %s prohibited\n", 12893 12840 dst, reg_type_str(env, ptr_reg->type)); 12894 12841 return -EACCES; 12895 - default: 12896 - break; 12897 12842 } 12898 12843 12899 12844 /* In case of 'scalar += pointer', dst_reg inherits pointer type and id. ··· 13948 13905 if (BPF_SRC(insn->code) == BPF_X) { 13949 13906 struct bpf_reg_state *src_reg = regs + insn->src_reg; 13950 13907 struct bpf_reg_state *dst_reg = regs + insn->dst_reg; 13951 - bool need_id = src_reg->type == SCALAR_VALUE && !src_reg->id && 13952 - !tnum_is_const(src_reg->var_off); 13953 13908 13954 13909 if (BPF_CLASS(insn->code) == BPF_ALU64) { 13955 13910 if (insn->off == 0) { 13956 13911 /* case: R1 = R2 13957 13912 * copy register state to dest reg 13958 13913 */ 13959 - if (need_id) 13960 - /* Assign src and dst registers the same ID 13961 - * that will be used by find_equal_scalars() 13962 - * to propagate min/max range. 
13963 - */ 13964 - src_reg->id = ++env->id_gen; 13914 + assign_scalar_id_before_mov(env, src_reg); 13965 13915 copy_register_state(dst_reg, src_reg); 13966 13916 dst_reg->live |= REG_LIVE_WRITTEN; 13967 13917 dst_reg->subreg_def = DEF_NOT_SUBREG; ··· 13969 13933 bool no_sext; 13970 13934 13971 13935 no_sext = src_reg->umax_value < (1ULL << (insn->off - 1)); 13972 - if (no_sext && need_id) 13973 - src_reg->id = ++env->id_gen; 13936 + if (no_sext) 13937 + assign_scalar_id_before_mov(env, src_reg); 13974 13938 copy_register_state(dst_reg, src_reg); 13975 13939 if (!no_sext) 13976 13940 dst_reg->id = 0; ··· 13990 13954 return -EACCES; 13991 13955 } else if (src_reg->type == SCALAR_VALUE) { 13992 13956 if (insn->off == 0) { 13993 - bool is_src_reg_u32 = src_reg->umax_value <= U32_MAX; 13957 + bool is_src_reg_u32 = get_reg_width(src_reg) <= 32; 13994 13958 13995 - if (is_src_reg_u32 && need_id) 13996 - src_reg->id = ++env->id_gen; 13959 + if (is_src_reg_u32) 13960 + assign_scalar_id_before_mov(env, src_reg); 13997 13961 copy_register_state(dst_reg, src_reg); 13998 13962 /* Make sure ID is cleared if src_reg is not in u32 13999 13963 * range otherwise dst_reg min/max could be incorrectly ··· 14007 13971 /* case: W1 = (s8, s16)W2 */ 14008 13972 bool no_sext = src_reg->umax_value < (1ULL << (insn->off - 1)); 14009 13973 14010 - if (no_sext && need_id) 14011 - src_reg->id = ++env->id_gen; 13974 + if (no_sext) 13975 + assign_scalar_id_before_mov(env, src_reg); 14012 13976 copy_register_state(dst_reg, src_reg); 14013 13977 if (!no_sext) 14014 13978 dst_reg->id = 0; ··· 17063 17027 } 17064 17028 /* attempt to detect infinite loop to avoid unnecessary doomed work */ 17065 17029 if (states_maybe_looping(&sl->state, cur) && 17066 - states_equal(env, &sl->state, cur, false) && 17030 + states_equal(env, &sl->state, cur, true) && 17067 17031 !iter_active_depths_differ(&sl->state, cur) && 17068 17032 sl->state.callback_unroll_depth == cur->callback_unroll_depth) { 17069 17033 
verbose_linfo(env, insn_idx, "; "); ··· 19845 19809 continue; 19846 19810 } 19847 19811 19812 + /* Implement bpf_kptr_xchg inline */ 19813 + if (prog->jit_requested && BITS_PER_LONG == 64 && 19814 + insn->imm == BPF_FUNC_kptr_xchg && 19815 + bpf_jit_supports_ptr_xchg()) { 19816 + insn_buf[0] = BPF_MOV64_REG(BPF_REG_0, BPF_REG_2); 19817 + insn_buf[1] = BPF_ATOMIC_OP(BPF_DW, BPF_XCHG, BPF_REG_1, BPF_REG_0, 0); 19818 + cnt = 2; 19819 + 19820 + new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt); 19821 + if (!new_prog) 19822 + return -ENOMEM; 19823 + 19824 + delta += cnt - 1; 19825 + env->prog = prog = new_prog; 19826 + insn = new_prog->insnsi + i + delta; 19827 + continue; 19828 + } 19848 19829 patch_call_imm: 19849 19830 fn = env->ops->get_func_proto(insn->imm, env->prog); 19850 19831 /* all functions that have prototype and verifier allowed ··· 20094 20041 state->first_insn_idx = env->subprog_info[subprog].start; 20095 20042 state->last_insn_idx = -1; 20096 20043 20097 - 20098 20044 regs = state->frame[state->curframe]->regs; 20099 20045 if (subprog || env->prog->type == BPF_PROG_TYPE_EXT) { 20100 20046 const char *sub_name = subprog_name(env, subprog); ··· 20285 20233 static int check_struct_ops_btf_id(struct bpf_verifier_env *env) 20286 20234 { 20287 20235 const struct btf_type *t, *func_proto; 20236 + const struct bpf_struct_ops_desc *st_ops_desc; 20288 20237 const struct bpf_struct_ops *st_ops; 20289 20238 const struct btf_member *member; 20290 20239 struct bpf_prog *prog = env->prog; 20291 20240 u32 btf_id, member_idx; 20241 + struct btf *btf; 20292 20242 const char *mname; 20293 20243 20294 20244 if (!prog->gpl_compatible) { ··· 20298 20244 return -EINVAL; 20299 20245 } 20300 20246 20247 + if (!prog->aux->attach_btf_id) 20248 + return -ENOTSUPP; 20249 + 20250 + btf = prog->aux->attach_btf; 20251 + if (btf_is_module(btf)) { 20252 + /* Make sure st_ops is valid through the lifetime of env */ 20253 + env->attach_btf_mod = btf_try_get_module(btf); 20254 
+ if (!env->attach_btf_mod) { 20255 + verbose(env, "struct_ops module %s is not found\n", 20256 + btf_get_name(btf)); 20257 + return -ENOTSUPP; 20258 + } 20259 + } 20260 + 20301 20261 btf_id = prog->aux->attach_btf_id; 20302 - st_ops = bpf_struct_ops_find(btf_id); 20303 - if (!st_ops) { 20262 + st_ops_desc = bpf_struct_ops_find(btf, btf_id); 20263 + if (!st_ops_desc) { 20304 20264 verbose(env, "attach_btf_id %u is not a supported struct\n", 20305 20265 btf_id); 20306 20266 return -ENOTSUPP; 20307 20267 } 20268 + st_ops = st_ops_desc->st_ops; 20308 20269 20309 - t = st_ops->type; 20270 + t = st_ops_desc->type; 20310 20271 member_idx = prog->expected_attach_type; 20311 20272 if (member_idx >= btf_type_vlen(t)) { 20312 20273 verbose(env, "attach to invalid member idx %u of struct %s\n", ··· 20330 20261 } 20331 20262 20332 20263 member = &btf_type_member(t)[member_idx]; 20333 - mname = btf_name_by_offset(btf_vmlinux, member->name_off); 20334 - func_proto = btf_type_resolve_func_ptr(btf_vmlinux, member->type, 20264 + mname = btf_name_by_offset(btf, member->name_off); 20265 + func_proto = btf_type_resolve_func_ptr(btf, member->type, 20335 20266 NULL); 20336 20267 if (!func_proto) { 20337 20268 verbose(env, "attach to invalid member %s(@idx %u) of struct %s\n", ··· 20833 20764 env->prog = *prog; 20834 20765 env->ops = bpf_verifier_ops[env->prog->type]; 20835 20766 env->fd_array = make_bpfptr(attr->fd_array, uattr.is_kernel); 20836 - is_priv = bpf_capable(); 20767 + 20768 + env->allow_ptr_leaks = bpf_allow_ptr_leaks(env->prog->aux->token); 20769 + env->allow_uninit_stack = bpf_allow_uninit_stack(env->prog->aux->token); 20770 + env->bypass_spec_v1 = bpf_bypass_spec_v1(env->prog->aux->token); 20771 + env->bypass_spec_v4 = bpf_bypass_spec_v4(env->prog->aux->token); 20772 + env->bpf_capable = is_priv = bpf_token_capable(env->prog->aux->token, CAP_BPF); 20837 20773 20838 20774 bpf_get_btf_vmlinux(); 20839 20775 ··· 20869 20795 env->strict_alignment = true; 20870 20796 if 
(attr->prog_flags & BPF_F_ANY_ALIGNMENT) 20871 20797 env->strict_alignment = false; 20872 - 20873 - env->allow_ptr_leaks = bpf_allow_ptr_leaks(); 20874 - env->allow_uninit_stack = bpf_allow_uninit_stack(); 20875 - env->bypass_spec_v1 = bpf_bypass_spec_v1(); 20876 - env->bypass_spec_v4 = bpf_bypass_spec_v4(); 20877 - env->bpf_capable = bpf_capable(); 20878 20798 20879 20799 if (is_priv) 20880 20800 env->test_state_freq = attr->prog_flags & BPF_F_TEST_STATE_FREQ; ··· 21035 20967 env->prog->expected_attach_type = 0; 21036 20968 21037 20969 *prog = env->prog; 20970 + 20971 + module_put(env->attach_btf_mod); 21038 20972 err_unlock: 21039 20973 if (!is_priv) 21040 20974 mutex_unlock(&bpf_verifier_lock);
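Review note: the verifier hunks above replace the open-coded `fls64(reg->umax_value)` and `umax_value <= U32_MAX` tests with `get_reg_width()`. The userspace sketch below shows why the two forms agree and how the narrowing-spill test behaves at the boundaries. `fls64_model()` and `spill_keeps_value()` are hypothetical names; the loop is a portable stand-in for the kernel's `fls64()`, assumed here to return the 1-based index of the highest set bit and 0 for an input of 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Portable stand-in for fls64(): position of the highest set bit, 0 if none. */
static int fls64_model(uint64_t x)
{
	int bits = 0;

	while (x) {
		bits++;
		x >>= 1;
	}
	return bits;
}

/*
 * Mirrors the "reg_value_fits" test in the spill path: a spill of
 * size_bytes keeps the full value only if the register's unsigned
 * maximum needs no more bits than the slot provides.
 */
static bool spill_keeps_value(uint64_t umax_value, int size_bytes)
{
	return fls64_model(umax_value) <= 8 * size_bytes;
}
```

With this definition, `fls64_model(umax) <= 32` holds exactly when `umax <= UINT32_MAX`, which is the equivalence the 32-bit move case depends on.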

+16 -1

kernel/trace/bpf_trace.c

···
 	case BPF_FUNC_trace_vprintk:
 		return bpf_get_trace_vprintk_proto();
 	default:
-		return bpf_base_func_proto(func_id);
+		return bpf_base_func_proto(func_id, prog);
 	}
 }

···
 static int bpf_kprobe_multi_link_fill_link_info(const struct bpf_link *link,
 						struct bpf_link_info *info)
 {
+	u64 __user *ucookies = u64_to_user_ptr(info->kprobe_multi.cookies);
 	u64 __user *uaddrs = u64_to_user_ptr(info->kprobe_multi.addrs);
 	struct bpf_kprobe_multi_link *kmulti_link;
 	u32 ucount = info->kprobe_multi.count;
 	int err = 0, i;

 	if (!uaddrs ^ !ucount)
+		return -EINVAL;
+	if (ucookies && !ucount)
 		return -EINVAL;

 	kmulti_link = container_of(link, struct bpf_kprobe_multi_link, link);
···
 		err = -ENOSPC;
 	else
 		ucount = kmulti_link->cnt;
+
+	if (ucookies) {
+		if (kmulti_link->cookies) {
+			if (copy_to_user(ucookies, kmulti_link->cookies, ucount * sizeof(u64)))
+				return -EFAULT;
+		} else {
+			for (i = 0; i < ucount; i++) {
+				if (put_user(0, ucookies + i))
+					return -EFAULT;
+			}
+		}
+	}

 	if (kallsyms_show_value(current_cred())) {
 		if (copy_to_user(uaddrs, kmulti_link->addrs, ucount * sizeof(u64)))
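Review note: the cookie reporting added to `bpf_kprobe_multi_link_fill_link_info()` above has two branches: copy the link's cookie array when one exists, otherwise report a zero cookie for every probe. A minimal userspace sketch of that branch follows. `fill_cookies_model()` is a hypothetical name, and `memcpy`/`memset` stand in for `copy_to_user()`/`put_user()`, so the `-EFAULT` paths are not modelled.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Fill @out with @count cookies. A NULL @cookies means the link was
 * created without cookies, so every slot is reported as 0.
 */
static void fill_cookies_model(uint64_t *out, const uint64_t *cookies, size_t count)
{
	if (cookies)
		memcpy(out, cookies, count * sizeof(*out));
	else
		memset(out, 0, count * sizeof(*out));
}
```

Zero-filling keeps the output buffer fully defined for callers that always pass a cookies pointer, regardless of how the link was created.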

+19 -3

net/bpf/bpf_dummy_struct_ops.c

···
 #include <linux/bpf.h>
 #include <linux/btf.h>

-extern struct bpf_struct_ops bpf_bpf_dummy_ops;
+static struct bpf_struct_ops bpf_bpf_dummy_ops;

 /* A common type for test_N with return value in bpf_dummy_ops */
 typedef int (*dummy_ops_test_ret_fn)(struct bpf_dummy_ops_state *state, ...);
···
 	u64 args[MAX_BPF_FUNC_ARGS];
 	struct bpf_dummy_ops_state state;
 };
+
+static struct btf *bpf_dummy_ops_btf;

 static struct bpf_dummy_ops_test_args *
 dummy_ops_init_args(const union bpf_attr *kattr, unsigned int nr)
···
 	void *image = NULL;
 	unsigned int op_idx;
 	int prog_ret;
+	s32 type_id;
 	int err;

-	if (prog->aux->attach_btf_id != st_ops->type_id)
+	type_id = btf_find_by_name_kind(bpf_dummy_ops_btf,
+					bpf_bpf_dummy_ops.name,
+					BTF_KIND_STRUCT);
+	if (type_id < 0)
+		return -EINVAL;
+	if (prog->aux->attach_btf_id != type_id)
 		return -EOPNOTSUPP;

 	func_proto = prog->aux->attach_func_proto;
···

 static int bpf_dummy_init(struct btf *btf)
 {
+	bpf_dummy_ops_btf = btf;
 	return 0;
 }

···
 	.test_sleepable = bpf_dummy_test_sleepable,
 };

-struct bpf_struct_ops bpf_bpf_dummy_ops = {
+static struct bpf_struct_ops bpf_bpf_dummy_ops = {
 	.verifier_ops = &bpf_dummy_verifier_ops,
 	.init = bpf_dummy_init,
 	.check_member = bpf_dummy_ops_check_member,
···
 	.unreg = bpf_dummy_unreg,
 	.name = "bpf_dummy_ops",
 	.cfi_stubs = &__bpf_bpf_dummy_ops,
+	.owner = THIS_MODULE,
 };
+
+static int __init bpf_dummy_struct_ops_init(void)
+{
+	return register_bpf_struct_ops(&bpf_bpf_dummy_ops, bpf_dummy_ops);
+}
+late_initcall(bpf_dummy_struct_ops_init);

+131 -24

net/core/filter.c

··· 88 88 #include "dev.h" 89 89 90 90 static const struct bpf_func_proto * 91 - bpf_sk_base_func_proto(enum bpf_func_id func_id); 91 + bpf_sk_base_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog); 92 92 93 93 int copy_bpf_fprog_from_user(struct sock_fprog *dst, sockptr_t src, int len) 94 94 { ··· 778 778 BPF_EMIT_JMP; 779 779 break; 780 780 781 - /* ldxb 4 * ([14] & 0xf) is remaped into 6 insns. */ 781 + /* ldxb 4 * ([14] & 0xf) is remapped into 6 insns. */ 782 782 case BPF_LDX | BPF_MSH | BPF_B: { 783 783 struct sock_filter tmp = { 784 784 .code = BPF_LD | BPF_ABS | BPF_B, ··· 804 804 *insn = BPF_MOV64_REG(BPF_REG_A, BPF_REG_TMP); 805 805 break; 806 806 } 807 - /* RET_K is remaped into 2 insns. RET_A case doesn't need an 807 + /* RET_K is remapped into 2 insns. RET_A case doesn't need an 808 808 * extra mov as BPF_REG_0 is already mapped into BPF_REG_A. 809 809 */ 810 810 case BPF_RET | BPF_A: ··· 2968 2968 * 2969 2969 * Then if B is non-zero AND there is no space allocate space and 2970 2970 * compact A, B regions into page. If there is space shift ring to 2971 - * the rigth free'ing the next element in ring to place B, leaving 2971 + * the right free'ing the next element in ring to place B, leaving 2972 2972 * A untouched except to reduce length. 
2973 2973 */ 2974 2974 if (start != offset) { ··· 7894 7894 case BPF_FUNC_ktime_get_coarse_ns: 7895 7895 return &bpf_ktime_get_coarse_ns_proto; 7896 7896 default: 7897 - return bpf_base_func_proto(func_id); 7897 + return bpf_base_func_proto(func_id, prog); 7898 7898 } 7899 7899 } 7900 7900 ··· 7987 7987 return NULL; 7988 7988 } 7989 7989 default: 7990 - return bpf_sk_base_func_proto(func_id); 7990 + return bpf_sk_base_func_proto(func_id, prog); 7991 7991 } 7992 7992 } 7993 7993 ··· 8006 8006 case BPF_FUNC_perf_event_output: 8007 8007 return &bpf_skb_event_output_proto; 8008 8008 default: 8009 - return bpf_sk_base_func_proto(func_id); 8009 + return bpf_sk_base_func_proto(func_id, prog); 8010 8010 } 8011 8011 } 8012 8012 ··· 8193 8193 #endif 8194 8194 #endif 8195 8195 default: 8196 - return bpf_sk_base_func_proto(func_id); 8196 + return bpf_sk_base_func_proto(func_id, prog); 8197 8197 } 8198 8198 } 8199 8199 ··· 8252 8252 #endif 8253 8253 #endif 8254 8254 default: 8255 - return bpf_sk_base_func_proto(func_id); 8255 + return bpf_sk_base_func_proto(func_id, prog); 8256 8256 } 8257 8257 8258 8258 #if IS_MODULE(CONFIG_NF_CONNTRACK) && IS_ENABLED(CONFIG_DEBUG_INFO_BTF_MODULES) 8259 8259 /* The nf_conn___init type is used in the NF_CONNTRACK kfuncs. The 8260 8260 * kfuncs are defined in two different modules, and we want to be able 8261 - * to use them interchangably with the same BTF type ID. Because modules 8261 + * to use them interchangeably with the same BTF type ID. Because modules 8262 8262 * can't de-duplicate BTF IDs between each other, we need the type to be 8263 8263 * referenced in the vmlinux BTF or the verifier will get confused about 8264 8264 * the different types. 
So we add this dummy type reference which will ··· 8313 8313 return &bpf_tcp_sock_proto; 8314 8314 #endif /* CONFIG_INET */ 8315 8315 default: 8316 - return bpf_sk_base_func_proto(func_id); 8316 + return bpf_sk_base_func_proto(func_id, prog); 8317 8317 } 8318 8318 } 8319 8319 ··· 8355 8355 return &bpf_get_cgroup_classid_curr_proto; 8356 8356 #endif 8357 8357 default: 8358 - return bpf_sk_base_func_proto(func_id); 8358 + return bpf_sk_base_func_proto(func_id, prog); 8359 8359 } 8360 8360 } 8361 8361 ··· 8399 8399 return &bpf_skc_lookup_tcp_proto; 8400 8400 #endif 8401 8401 default: 8402 - return bpf_sk_base_func_proto(func_id); 8402 + return bpf_sk_base_func_proto(func_id, prog); 8403 8403 } 8404 8404 } 8405 8405 ··· 8410 8410 case BPF_FUNC_skb_load_bytes: 8411 8411 return &bpf_flow_dissector_load_bytes_proto; 8412 8412 default: 8413 - return bpf_sk_base_func_proto(func_id); 8413 + return bpf_sk_base_func_proto(func_id, prog); 8414 8414 } 8415 8415 } 8416 8416 ··· 8437 8437 case BPF_FUNC_skb_under_cgroup: 8438 8438 return &bpf_skb_under_cgroup_proto; 8439 8439 default: 8440 - return bpf_sk_base_func_proto(func_id); 8440 + return bpf_sk_base_func_proto(func_id, prog); 8441 8441 } 8442 8442 } 8443 8443 ··· 8612 8612 return false; 8613 8613 case bpf_ctx_range(struct __sk_buff, data): 8614 8614 case bpf_ctx_range(struct __sk_buff, data_end): 8615 - if (!bpf_capable()) 8615 + if (!bpf_token_capable(prog->aux->token, CAP_BPF)) 8616 8616 return false; 8617 8617 break; 8618 8618 } ··· 8624 8624 case bpf_ctx_range_till(struct __sk_buff, cb[0], cb[4]): 8625 8625 break; 8626 8626 case bpf_ctx_range(struct __sk_buff, tstamp): 8627 - if (!bpf_capable()) 8627 + if (!bpf_token_capable(prog->aux->token, CAP_BPF)) 8628 8628 return false; 8629 8629 break; 8630 8630 default: ··· 11268 11268 case BPF_FUNC_ktime_get_coarse_ns: 11269 11269 return &bpf_ktime_get_coarse_ns_proto; 11270 11270 default: 11271 - return bpf_base_func_proto(func_id); 11271 + return bpf_base_func_proto(func_id, 
prog); 11272 11272 } 11273 11273 } 11274 11274 ··· 11450 11450 case BPF_FUNC_sk_release: 11451 11451 return &bpf_sk_release_proto; 11452 11452 default: 11453 - return bpf_sk_base_func_proto(func_id); 11453 + return bpf_sk_base_func_proto(func_id, prog); 11454 11454 } 11455 11455 } 11456 11456 ··· 11784 11784 }; 11785 11785 11786 11786 static const struct bpf_func_proto * 11787 - bpf_sk_base_func_proto(enum bpf_func_id func_id) 11787 + bpf_sk_base_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) 11788 11788 { 11789 11789 const struct bpf_func_proto *func; 11790 11790 ··· 11813 11813 case BPF_FUNC_ktime_get_coarse_ns: 11814 11814 return &bpf_ktime_get_coarse_ns_proto; 11815 11815 default: 11816 - return bpf_base_func_proto(func_id); 11816 + return bpf_base_func_proto(func_id, prog); 11817 11817 } 11818 11818 11819 - if (!perfmon_capable()) 11819 + if (!bpf_token_capable(prog->aux->token, CAP_PERFMON)) 11820 11820 return NULL; 11821 11821 11822 11822 return func; ··· 11869 11869 11870 11870 return 0; 11871 11871 } 11872 + 11873 + __bpf_kfunc int bpf_sk_assign_tcp_reqsk(struct sk_buff *skb, struct sock *sk, 11874 + struct bpf_tcp_req_attrs *attrs, int attrs__sz) 11875 + { 11876 + #if IS_ENABLED(CONFIG_SYN_COOKIES) 11877 + const struct request_sock_ops *ops; 11878 + struct inet_request_sock *ireq; 11879 + struct tcp_request_sock *treq; 11880 + struct request_sock *req; 11881 + struct net *net; 11882 + __u16 min_mss; 11883 + u32 tsoff = 0; 11884 + 11885 + if (attrs__sz != sizeof(*attrs) || 11886 + attrs->reserved[0] || attrs->reserved[1] || attrs->reserved[2]) 11887 + return -EINVAL; 11888 + 11889 + if (!skb_at_tc_ingress(skb)) 11890 + return -EINVAL; 11891 + 11892 + net = dev_net(skb->dev); 11893 + if (net != sock_net(sk)) 11894 + return -ENETUNREACH; 11895 + 11896 + switch (skb->protocol) { 11897 + case htons(ETH_P_IP): 11898 + ops = &tcp_request_sock_ops; 11899 + min_mss = 536; 11900 + break; 11901 + #if IS_BUILTIN(CONFIG_IPV6) 11902 + case 
htons(ETH_P_IPV6): 11903 + ops = &tcp6_request_sock_ops; 11904 + min_mss = IPV6_MIN_MTU - 60; 11905 + break; 11906 + #endif 11907 + default: 11908 + return -EINVAL; 11909 + } 11910 + 11911 + if (sk->sk_type != SOCK_STREAM || sk->sk_state != TCP_LISTEN || 11912 + sk_is_mptcp(sk)) 11913 + return -EINVAL; 11914 + 11915 + if (attrs->mss < min_mss) 11916 + return -EINVAL; 11917 + 11918 + if (attrs->wscale_ok) { 11919 + if (!READ_ONCE(net->ipv4.sysctl_tcp_window_scaling)) 11920 + return -EINVAL; 11921 + 11922 + if (attrs->snd_wscale > TCP_MAX_WSCALE || 11923 + attrs->rcv_wscale > TCP_MAX_WSCALE) 11924 + return -EINVAL; 11925 + } 11926 + 11927 + if (attrs->sack_ok && !READ_ONCE(net->ipv4.sysctl_tcp_sack)) 11928 + return -EINVAL; 11929 + 11930 + if (attrs->tstamp_ok) { 11931 + if (!READ_ONCE(net->ipv4.sysctl_tcp_timestamps)) 11932 + return -EINVAL; 11933 + 11934 + tsoff = attrs->rcv_tsecr - tcp_ns_to_ts(attrs->usec_ts_ok, tcp_clock_ns()); 11935 + } 11936 + 11937 + req = inet_reqsk_alloc(ops, sk, false); 11938 + if (!req) 11939 + return -ENOMEM; 11940 + 11941 + ireq = inet_rsk(req); 11942 + treq = tcp_rsk(req); 11943 + 11944 + req->rsk_listener = sk; 11945 + req->syncookie = 1; 11946 + req->mss = attrs->mss; 11947 + req->ts_recent = attrs->rcv_tsval; 11948 + 11949 + ireq->snd_wscale = attrs->snd_wscale; 11950 + ireq->rcv_wscale = attrs->rcv_wscale; 11951 + ireq->tstamp_ok = !!attrs->tstamp_ok; 11952 + ireq->sack_ok = !!attrs->sack_ok; 11953 + ireq->wscale_ok = !!attrs->wscale_ok; 11954 + ireq->ecn_ok = !!attrs->ecn_ok; 11955 + 11956 + treq->req_usec_ts = !!attrs->usec_ts_ok; 11957 + treq->ts_off = tsoff; 11958 + 11959 + skb_orphan(skb); 11960 + skb->sk = req_to_sk(req); 11961 + skb->destructor = sock_pfree; 11962 + 11963 + return 0; 11964 + #else 11965 + return -EOPNOTSUPP; 11966 + #endif 11967 + } 11968 + 11872 11969 __bpf_kfunc_end_defs(); 11873 11970 11874 11971 int bpf_dynptr_from_skb_rdonly(struct sk_buff *skb, u64 flags, ··· 11994 11897 BTF_ID_FLAGS(func, 
bpf_sock_addr_set_sun_path) 11995 11898 BTF_SET8_END(bpf_kfunc_check_set_sock_addr) 11996 11899 11900 + BTF_SET8_START(bpf_kfunc_check_set_tcp_reqsk) 11901 + BTF_ID_FLAGS(func, bpf_sk_assign_tcp_reqsk, KF_TRUSTED_ARGS) 11902 + BTF_SET8_END(bpf_kfunc_check_set_tcp_reqsk) 11903 + 11997 11904 static const struct btf_kfunc_id_set bpf_kfunc_set_skb = { 11998 11905 .owner = THIS_MODULE, 11999 11906 .set = &bpf_kfunc_check_set_skb, ··· 12011 11910 static const struct btf_kfunc_id_set bpf_kfunc_set_sock_addr = { 12012 11911 .owner = THIS_MODULE, 12013 11912 .set = &bpf_kfunc_check_set_sock_addr, 11913 + }; 11914 + 11915 + static const struct btf_kfunc_id_set bpf_kfunc_set_tcp_reqsk = { 11916 + .owner = THIS_MODULE, 11917 + .set = &bpf_kfunc_check_set_tcp_reqsk, 12014 11918 }; 12015 11919 12016 11920 static int __init bpf_kfunc_init(void) ··· 12033 11927 ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_LWT_SEG6LOCAL, &bpf_kfunc_set_skb); 12034 11928 ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_NETFILTER, &bpf_kfunc_set_skb); 12035 11929 ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &bpf_kfunc_set_xdp); 12036 - return ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_CGROUP_SOCK_ADDR, 12037 - &bpf_kfunc_set_sock_addr); 11930 + ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_CGROUP_SOCK_ADDR, 11931 + &bpf_kfunc_set_sock_addr); 11932 + return ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &bpf_kfunc_set_tcp_reqsk); 12038 11933 } 12039 11934 late_initcall(bpf_kfunc_init); 12040 11935

+12 -2

net/core/sock.c

···
 #ifdef CONFIG_INET
 void sock_pfree(struct sk_buff *skb)
 {
-	if (sk_is_refcounted(skb->sk))
-		sock_gen_put(skb->sk);
+	struct sock *sk = skb->sk;
+
+	if (!sk_is_refcounted(sk))
+		return;
+
+	if (sk->sk_state == TCP_NEW_SYN_RECV && inet_reqsk(sk)->syncookie) {
+		inet_reqsk(sk)->rsk_listener = NULL;
+		reqsk_free(inet_reqsk(sk));
+		return;
+	}
+
+	sock_gen_put(sk);
 }
 EXPORT_SYMBOL(sock_pfree);
 #endif /* CONFIG_INET */

+17 -5

net/ipv4/bpf_tcp_ca.c

··· 12 12 #include <net/bpf_sk_storage.h> 13 13 14 14 /* "extern" is to avoid sparse warning. It is only used in bpf_struct_ops.c. */ 15 - extern struct bpf_struct_ops bpf_tcp_congestion_ops; 15 + static struct bpf_struct_ops bpf_tcp_congestion_ops; 16 16 17 17 static u32 unsupported_ops[] = { 18 18 offsetof(struct tcp_congestion_ops, get_info), ··· 20 20 21 21 static const struct btf_type *tcp_sock_type; 22 22 static u32 tcp_sock_id, sock_id; 23 + static const struct btf_type *tcp_congestion_ops_type; 23 24 24 25 static int bpf_tcp_ca_init(struct btf *btf) 25 26 { ··· 36 35 return -EINVAL; 37 36 tcp_sock_id = type_id; 38 37 tcp_sock_type = btf_type_by_id(btf, tcp_sock_id); 38 + 39 + type_id = btf_find_by_name_kind(btf, "tcp_congestion_ops", BTF_KIND_STRUCT); 40 + if (type_id < 0) 41 + return -EINVAL; 42 + tcp_congestion_ops_type = btf_type_by_id(btf, type_id); 39 43 40 44 return 0; 41 45 } ··· 155 149 u32 midx; 156 150 157 151 midx = prog->expected_attach_type; 158 - t = bpf_tcp_congestion_ops.type; 152 + t = tcp_congestion_ops_type; 159 153 m = &btf_type_member(t)[midx]; 160 154 161 155 return __btf_member_bit_offset(t, m) / 8; ··· 197 191 case BPF_FUNC_ktime_get_coarse_ns: 198 192 return &bpf_ktime_get_coarse_ns_proto; 199 193 default: 200 - return bpf_base_func_proto(func_id); 194 + return bpf_base_func_proto(func_id, prog); 201 195 } 202 196 } 203 197 ··· 345 339 .release = __bpf_tcp_ca_release, 346 340 }; 347 341 348 - struct bpf_struct_ops bpf_tcp_congestion_ops = { 342 + static struct bpf_struct_ops bpf_tcp_congestion_ops = { 349 343 .verifier_ops = &bpf_tcp_ca_verifier_ops, 350 344 .reg = bpf_tcp_ca_reg, 351 345 .unreg = bpf_tcp_ca_unreg, ··· 356 350 .validate = bpf_tcp_ca_validate, 357 351 .name = "tcp_congestion_ops", 358 352 .cfi_stubs = &__bpf_ops_tcp_congestion_ops, 353 + .owner = THIS_MODULE, 359 354 }; 360 355 361 356 static int __init bpf_tcp_ca_kfunc_init(void) 362 357 { 363 - return register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS, 
&bpf_tcp_ca_kfunc_set); 358 + int ret; 359 + 360 + ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS, &bpf_tcp_ca_kfunc_set); 361 + ret = ret ?: register_bpf_struct_ops(&bpf_tcp_congestion_ops, tcp_congestion_ops); 362 + 363 + return ret; 364 364 } 365 365 late_initcall(bpf_tcp_ca_kfunc_init);

+27 -13

net/ipv4/syncookies.c

···
 			    count, &syncookie_secret[c]);
 }
 
-/* Convert one nsec 64bit timestamp to ts (ms or usec resolution) */
-static u64 tcp_ns_to_ts(bool usec_ts, u64 val)
-{
-	if (usec_ts)
-		return div_u64(val, NSEC_PER_USEC);
-
-	return div_u64(val, NSEC_PER_MSEC);
-}
-
 /*
  * when syncookies are in effect and tcp timestamps are enabled we encode
  * tcp options in the lower bits of the timestamp value that will be
···
 	return 0;
 }
 
+#if IS_ENABLED(CONFIG_BPF)
+struct request_sock *cookie_bpf_check(struct sock *sk, struct sk_buff *skb)
+{
+	struct request_sock *req = inet_reqsk(skb->sk);
+
+	skb->sk = NULL;
+	skb->destructor = NULL;
+
+	if (cookie_tcp_reqsk_init(sk, skb, req)) {
+		reqsk_free(req);
+		req = NULL;
+	}
+
+	return req;
+}
+EXPORT_SYMBOL_GPL(cookie_bpf_check);
+#endif
+
 struct request_sock *cookie_tcp_reqsk_alloc(const struct request_sock_ops *ops,
 					    struct sock *sk, struct sk_buff *skb,
 					    struct tcp_options_received *tcp_opt,
···
 	    !th->ack || th->rst)
 		goto out;
 
-	req = cookie_tcp_check(net, sk, skb);
-	if (IS_ERR(req))
-		goto out;
+	if (cookie_bpf_ok(skb)) {
+		req = cookie_bpf_check(sk, skb);
+	} else {
+		req = cookie_tcp_check(net, sk, skb);
+		if (IS_ERR(req))
+			goto out;
+	}
 	if (!req)
 		goto out_drop;
 
···
 				  ireq->wscale_ok, &rcv_wscale,
 				  dst_metric(&rt->dst, RTAX_INITRWND));
 
-	ireq->rcv_wscale = rcv_wscale;
+	if (!req->syncookie)
+		ireq->rcv_wscale = rcv_wscale;
 	ireq->ecn_ok &= cookie_ecn_ok(net, &rt->dst);
 
 	ret = tcp_get_cookie_sock(sk, skb, req, &rt->dst);
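
Note on the tcp_ns_to_ts() removal: bpf_sk_assign_tcp_reqsk() in net/core/filter.c still calls this helper to compute tsoff, so it is presumably relocated to a shared header rather than dropped (the destination is not visible in this file's hunks). A minimal user-space sketch of the conversion and of the tsoff arithmetic, assuming plain division in place of div_u64() and locally defined NSEC_PER_* constants; the names ns_to_ts and ts_offset are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the kernel constants (assumption: same values). */
#define NSEC_PER_USEC 1000ULL
#define NSEC_PER_MSEC 1000000ULL

/* User-space stand-in for tcp_ns_to_ts(): ns -> usec or msec ticks. */
static uint64_t ns_to_ts(bool usec_ts, uint64_t ns)
{
	return usec_ts ? ns / NSEC_PER_USEC : ns / NSEC_PER_MSEC;
}

/*
 * Mirrors "tsoff = attrs->rcv_tsecr - tcp_ns_to_ts(...)": the result is
 * kept in 32 bits, so the subtraction wraps modulo 2^32.
 */
static uint32_t ts_offset(uint32_t rcv_tsecr, bool usec_ts, uint64_t now_ns)
{
	return rcv_tsecr - (uint32_t)ns_to_ts(usec_ts, now_ns);
}
```

The wrap-around is intentional in this sketch: TCP timestamp values are 32-bit counters, so an offset that looks negative is simply a large unsigned value.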

+9 -4

net/ipv6/syncookies.c

···
 	    !th->ack || th->rst)
 		goto out;
 
-	req = cookie_tcp_check(net, sk, skb);
-	if (IS_ERR(req))
-		goto out;
+	if (cookie_bpf_ok(skb)) {
+		req = cookie_bpf_check(sk, skb);
+	} else {
+		req = cookie_tcp_check(net, sk, skb);
+		if (IS_ERR(req))
+			goto out;
+	}
 	if (!req)
 		goto out_drop;
 
···
 				  ireq->wscale_ok, &rcv_wscale,
 				  dst_metric(dst, RTAX_INITRWND));
 
-	ireq->rcv_wscale = rcv_wscale;
+	if (!req->syncookie)
+		ireq->rcv_wscale = rcv_wscale;
 	ireq->ecn_ok &= cookie_ecn_ok(net, dst);
 
 	ret = tcp_get_cookie_sock(sk, skb, req, dst);

+1 -1

net/netfilter/nf_bpf_link.c

···
 static const struct bpf_func_proto *
 bpf_nf_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 {
-	return bpf_base_func_proto(func_id);
+	return bpf_base_func_proto(func_id, prog);
 }
 
 const struct bpf_verifier_ops netfilter_verifier_ops = {

+85 -16

security/security.c

··· 5410 5410 } 5411 5411 5412 5412 /** 5413 - * security_bpf_map_alloc() - Allocate a bpf map LSM blob 5414 - * @map: bpf map 5413 + * security_bpf_map_create() - Check if BPF map creation is allowed 5414 + * @map: BPF map object 5415 + * @attr: BPF syscall attributes used to create BPF map 5416 + * @token: BPF token used to grant user access 5415 5417 * 5416 - * Initialize the security field inside bpf map. 5418 + * Do a check when the kernel creates a new BPF map. This is also the 5419 + * point where LSM blob is allocated for LSMs that need them. 5417 5420 * 5418 5421 * Return: Returns 0 on success, error on failure. 5419 5422 */ 5420 - int security_bpf_map_alloc(struct bpf_map *map) 5423 + int security_bpf_map_create(struct bpf_map *map, union bpf_attr *attr, 5424 + struct bpf_token *token) 5421 5425 { 5422 - return call_int_hook(bpf_map_alloc_security, 0, map); 5426 + return call_int_hook(bpf_map_create, 0, map, attr, token); 5423 5427 } 5424 5428 5425 5429 /** 5426 - * security_bpf_prog_alloc() - Allocate a bpf program LSM blob 5427 - * @aux: bpf program aux info struct 5430 + * security_bpf_prog_load() - Check if loading of BPF program is allowed 5431 + * @prog: BPF program object 5432 + * @attr: BPF syscall attributes used to create BPF program 5433 + * @token: BPF token used to grant user access to BPF subsystem 5428 5434 * 5429 - * Initialize the security field inside bpf program. 5435 + * Perform an access control check when the kernel loads a BPF program and 5436 + * allocates associated BPF program object. This hook is also responsible for 5437 + * allocating any required LSM state for the BPF program. 5430 5438 * 5431 5439 * Return: Returns 0 on success, error on failure. 
5432 5440 */ 5433 - int security_bpf_prog_alloc(struct bpf_prog_aux *aux) 5441 + int security_bpf_prog_load(struct bpf_prog *prog, union bpf_attr *attr, 5442 + struct bpf_token *token) 5434 5443 { 5435 - return call_int_hook(bpf_prog_alloc_security, 0, aux); 5444 + return call_int_hook(bpf_prog_load, 0, prog, attr, token); 5445 + } 5446 + 5447 + /** 5448 + * security_bpf_token_create() - Check if creating of BPF token is allowed 5449 + * @token: BPF token object 5450 + * @attr: BPF syscall attributes used to create BPF token 5451 + * @path: path pointing to BPF FS mount point from which BPF token is created 5452 + * 5453 + * Do a check when the kernel instantiates a new BPF token object from BPF FS 5454 + * instance. This is also the point where LSM blob can be allocated for LSMs. 5455 + * 5456 + * Return: Returns 0 on success, error on failure. 5457 + */ 5458 + int security_bpf_token_create(struct bpf_token *token, union bpf_attr *attr, 5459 + struct path *path) 5460 + { 5461 + return call_int_hook(bpf_token_create, 0, token, attr, path); 5462 + } 5463 + 5464 + /** 5465 + * security_bpf_token_cmd() - Check if BPF token is allowed to delegate 5466 + * requested BPF syscall command 5467 + * @token: BPF token object 5468 + * @cmd: BPF syscall command requested to be delegated by BPF token 5469 + * 5470 + * Do a check when the kernel decides whether provided BPF token should allow 5471 + * delegation of requested BPF syscall command. 5472 + * 5473 + * Return: Returns 0 on success, error on failure. 
5474 + */ 5475 + int security_bpf_token_cmd(const struct bpf_token *token, enum bpf_cmd cmd) 5476 + { 5477 + return call_int_hook(bpf_token_cmd, 0, token, cmd); 5478 + } 5479 + 5480 + /** 5481 + * security_bpf_token_capable() - Check if BPF token is allowed to delegate 5482 + * requested BPF-related capability 5483 + * @token: BPF token object 5484 + * @cap: capabilities requested to be delegated by BPF token 5485 + * 5486 + * Do a check when the kernel decides whether provided BPF token should allow 5487 + * delegation of requested BPF-related capabilities. 5488 + * 5489 + * Return: Returns 0 on success, error on failure. 5490 + */ 5491 + int security_bpf_token_capable(const struct bpf_token *token, int cap) 5492 + { 5493 + return call_int_hook(bpf_token_capable, 0, token, cap); 5436 5494 } 5437 5495 5438 5496 /** ··· 5501 5443 */ 5502 5444 void security_bpf_map_free(struct bpf_map *map) 5503 5445 { 5504 - call_void_hook(bpf_map_free_security, map); 5446 + call_void_hook(bpf_map_free, map); 5505 5447 } 5506 5448 5507 5449 /** 5508 - * security_bpf_prog_free() - Free a bpf program's LSM blob 5509 - * @aux: bpf program aux info struct 5450 + * security_bpf_prog_free() - Free a BPF program's LSM blob 5451 + * @prog: BPF program struct 5510 5452 * 5511 - * Clean up the security information stored inside bpf prog. 5453 + * Clean up the security information stored inside BPF program. 5512 5454 */ 5513 - void security_bpf_prog_free(struct bpf_prog_aux *aux) 5455 + void security_bpf_prog_free(struct bpf_prog *prog) 5514 5456 { 5515 - call_void_hook(bpf_prog_free_security, aux); 5457 + call_void_hook(bpf_prog_free, prog); 5458 + } 5459 + 5460 + /** 5461 + * security_bpf_token_free() - Free a BPF token's LSM blob 5462 + * @token: BPF token struct 5463 + * 5464 + * Clean up the security information stored inside BPF token. 
5465 + */ 5466 + void security_bpf_token_free(struct bpf_token *token) 5467 + { 5468 + call_void_hook(bpf_token_free, token); 5516 5469 } 5517 5470 #endif /* CONFIG_BPF_SYSCALL */ 5518 5471

+37 -10

security/selinux/hooks.c

··· 6920 6920 BPF__PROG_RUN, NULL); 6921 6921 } 6922 6922 6923 - static int selinux_bpf_map_alloc(struct bpf_map *map) 6923 + static int selinux_bpf_map_create(struct bpf_map *map, union bpf_attr *attr, 6924 + struct bpf_token *token) 6924 6925 { 6925 6926 struct bpf_security_struct *bpfsec; 6926 6927 ··· 6943 6942 kfree(bpfsec); 6944 6943 } 6945 6944 6946 - static int selinux_bpf_prog_alloc(struct bpf_prog_aux *aux) 6945 + static int selinux_bpf_prog_load(struct bpf_prog *prog, union bpf_attr *attr, 6946 + struct bpf_token *token) 6947 6947 { 6948 6948 struct bpf_security_struct *bpfsec; 6949 6949 ··· 6953 6951 return -ENOMEM; 6954 6952 6955 6953 bpfsec->sid = current_sid(); 6956 - aux->security = bpfsec; 6954 + prog->aux->security = bpfsec; 6957 6955 6958 6956 return 0; 6959 6957 } 6960 6958 6961 - static void selinux_bpf_prog_free(struct bpf_prog_aux *aux) 6959 + static void selinux_bpf_prog_free(struct bpf_prog *prog) 6962 6960 { 6963 - struct bpf_security_struct *bpfsec = aux->security; 6961 + struct bpf_security_struct *bpfsec = prog->aux->security; 6964 6962 6965 - aux->security = NULL; 6963 + prog->aux->security = NULL; 6964 + kfree(bpfsec); 6965 + } 6966 + 6967 + static int selinux_bpf_token_create(struct bpf_token *token, union bpf_attr *attr, 6968 + struct path *path) 6969 + { 6970 + struct bpf_security_struct *bpfsec; 6971 + 6972 + bpfsec = kzalloc(sizeof(*bpfsec), GFP_KERNEL); 6973 + if (!bpfsec) 6974 + return -ENOMEM; 6975 + 6976 + bpfsec->sid = current_sid(); 6977 + token->security = bpfsec; 6978 + 6979 + return 0; 6980 + } 6981 + 6982 + static void selinux_bpf_token_free(struct bpf_token *token) 6983 + { 6984 + struct bpf_security_struct *bpfsec = token->security; 6985 + 6986 + token->security = NULL; 6966 6987 kfree(bpfsec); 6967 6988 } 6968 6989 #endif ··· 7349 7324 LSM_HOOK_INIT(bpf, selinux_bpf), 7350 7325 LSM_HOOK_INIT(bpf_map, selinux_bpf_map), 7351 7326 LSM_HOOK_INIT(bpf_prog, selinux_bpf_prog), 7352 - LSM_HOOK_INIT(bpf_map_free_security, 
selinux_bpf_map_free), 7353 - LSM_HOOK_INIT(bpf_prog_free_security, selinux_bpf_prog_free), 7327 + LSM_HOOK_INIT(bpf_map_free, selinux_bpf_map_free), 7328 + LSM_HOOK_INIT(bpf_prog_free, selinux_bpf_prog_free), 7329 + LSM_HOOK_INIT(bpf_token_free, selinux_bpf_token_free), 7354 7330 #endif 7355 7331 7356 7332 #ifdef CONFIG_PERF_EVENTS ··· 7408 7382 LSM_HOOK_INIT(audit_rule_init, selinux_audit_rule_init), 7409 7383 #endif 7410 7384 #ifdef CONFIG_BPF_SYSCALL 7411 - LSM_HOOK_INIT(bpf_map_alloc_security, selinux_bpf_map_alloc), 7412 - LSM_HOOK_INIT(bpf_prog_alloc_security, selinux_bpf_prog_alloc), 7385 + LSM_HOOK_INIT(bpf_map_create, selinux_bpf_map_create), 7386 + LSM_HOOK_INIT(bpf_prog_load, selinux_bpf_prog_load), 7387 + LSM_HOOK_INIT(bpf_token_create, selinux_bpf_token_create), 7413 7388 #endif 7414 7389 #ifdef CONFIG_PERF_EVENTS 7415 7390 LSM_HOOK_INIT(perf_event_alloc, selinux_perf_event_alloc),

+79 -17

tools/bpf/bpftool/link.c

··· 249 249 return err; 250 250 } 251 251 252 - static int cmp_u64(const void *A, const void *B) 253 - { 254 - const __u64 *a = A, *b = B; 252 + struct addr_cookie { 253 + __u64 addr; 254 + __u64 cookie; 255 + }; 255 256 256 - return *a - *b; 257 + static int cmp_addr_cookie(const void *A, const void *B) 258 + { 259 + const struct addr_cookie *a = A, *b = B; 260 + 261 + if (a->addr == b->addr) 262 + return 0; 263 + return a->addr < b->addr ? -1 : 1; 264 + } 265 + 266 + static struct addr_cookie * 267 + get_addr_cookie_array(__u64 *addrs, __u64 *cookies, __u32 count) 268 + { 269 + struct addr_cookie *data; 270 + __u32 i; 271 + 272 + data = calloc(count, sizeof(data[0])); 273 + if (!data) { 274 + p_err("mem alloc failed"); 275 + return NULL; 276 + } 277 + for (i = 0; i < count; i++) { 278 + data[i].addr = addrs[i]; 279 + data[i].cookie = cookies[i]; 280 + } 281 + qsort(data, count, sizeof(data[0]), cmp_addr_cookie); 282 + return data; 257 283 } 258 284 259 285 static void 260 286 show_kprobe_multi_json(struct bpf_link_info *info, json_writer_t *wtr) 261 287 { 288 + struct addr_cookie *data; 262 289 __u32 i, j = 0; 263 - __u64 *addrs; 264 290 265 291 jsonw_bool_field(json_wtr, "retprobe", 266 292 info->kprobe_multi.flags & BPF_F_KPROBE_MULTI_RETURN); ··· 294 268 jsonw_uint_field(json_wtr, "missed", info->kprobe_multi.missed); 295 269 jsonw_name(json_wtr, "funcs"); 296 270 jsonw_start_array(json_wtr); 297 - addrs = u64_to_ptr(info->kprobe_multi.addrs); 298 - qsort(addrs, info->kprobe_multi.count, sizeof(addrs[0]), cmp_u64); 271 + data = get_addr_cookie_array(u64_to_ptr(info->kprobe_multi.addrs), 272 + u64_to_ptr(info->kprobe_multi.cookies), 273 + info->kprobe_multi.count); 274 + if (!data) 275 + return; 299 276 300 277 /* Load it once for all. 
*/ 301 278 if (!dd.sym_count) 302 279 kernel_syms_load(&dd); 280 + if (!dd.sym_count) 281 + goto error; 282 + 303 283 for (i = 0; i < dd.sym_count; i++) { 304 - if (dd.sym_mapping[i].address != addrs[j]) 284 + if (dd.sym_mapping[i].address != data[j].addr) 305 285 continue; 306 286 jsonw_start_object(json_wtr); 307 287 jsonw_uint_field(json_wtr, "addr", dd.sym_mapping[i].address); ··· 319 287 } else { 320 288 jsonw_string_field(json_wtr, "module", dd.sym_mapping[i].module); 321 289 } 290 + jsonw_uint_field(json_wtr, "cookie", data[j].cookie); 322 291 jsonw_end_object(json_wtr); 323 292 if (j++ == info->kprobe_multi.count) 324 293 break; 325 294 } 326 295 jsonw_end_array(json_wtr); 296 + error: 297 + free(data); 327 298 } 328 299 329 300 static __u64 *u64_to_arr(__u64 val) ··· 369 334 u64_to_ptr(info->perf_event.kprobe.func_name)); 370 335 jsonw_uint_field(wtr, "offset", info->perf_event.kprobe.offset); 371 336 jsonw_uint_field(wtr, "missed", info->perf_event.kprobe.missed); 337 + jsonw_uint_field(wtr, "cookie", info->perf_event.kprobe.cookie); 372 338 } 373 339 374 340 static void ··· 379 343 jsonw_string_field(wtr, "file", 380 344 u64_to_ptr(info->perf_event.uprobe.file_name)); 381 345 jsonw_uint_field(wtr, "offset", info->perf_event.uprobe.offset); 346 + jsonw_uint_field(wtr, "cookie", info->perf_event.uprobe.cookie); 382 347 } 383 348 384 349 static void ··· 387 350 { 388 351 jsonw_string_field(wtr, "tracepoint", 389 352 u64_to_ptr(info->perf_event.tracepoint.tp_name)); 353 + jsonw_uint_field(wtr, "cookie", info->perf_event.tracepoint.cookie); 390 354 } 391 355 392 356 static char *perf_config_hw_cache_str(__u64 config) ··· 463 425 jsonw_string_field(wtr, "event_config", perf_config); 464 426 else 465 427 jsonw_uint_field(wtr, "event_config", config); 428 + 429 + jsonw_uint_field(wtr, "cookie", info->perf_event.event.cookie); 466 430 467 431 if (type == PERF_TYPE_HW_CACHE && perf_config) 468 432 free((void *)perf_config); ··· 710 670 711 671 static void 
show_kprobe_multi_plain(struct bpf_link_info *info) 712 672 { 673 + struct addr_cookie *data; 713 674 __u32 i, j = 0; 714 - __u64 *addrs; 715 675 716 676 if (!info->kprobe_multi.count) 717 677 return; ··· 723 683 printf("func_cnt %u ", info->kprobe_multi.count); 724 684 if (info->kprobe_multi.missed) 725 685 printf("missed %llu ", info->kprobe_multi.missed); 726 - addrs = (__u64 *)u64_to_ptr(info->kprobe_multi.addrs); 727 - qsort(addrs, info->kprobe_multi.count, sizeof(__u64), cmp_u64); 686 + data = get_addr_cookie_array(u64_to_ptr(info->kprobe_multi.addrs), 687 + u64_to_ptr(info->kprobe_multi.cookies), 688 + info->kprobe_multi.count); 689 + if (!data) 690 + return; 728 691 729 692 /* Load it once for all. */ 730 693 if (!dd.sym_count) 731 694 kernel_syms_load(&dd); 732 695 if (!dd.sym_count) 733 - return; 696 + goto error; 734 697 735 - printf("\n\t%-16s %s", "addr", "func [module]"); 698 + printf("\n\t%-16s %-16s %s", "addr", "cookie", "func [module]"); 736 699 for (i = 0; i < dd.sym_count; i++) { 737 - if (dd.sym_mapping[i].address != addrs[j]) 700 + if (dd.sym_mapping[i].address != data[j].addr) 738 701 continue; 739 - printf("\n\t%016lx %s", 740 - dd.sym_mapping[i].address, dd.sym_mapping[i].name); 702 + printf("\n\t%016lx %-16llx %s", 703 + dd.sym_mapping[i].address, data[j].cookie, dd.sym_mapping[i].name); 741 704 if (dd.sym_mapping[i].module[0] != '\0') 742 705 printf(" [%s] ", dd.sym_mapping[i].module); 743 706 else ··· 749 706 if (j++ == info->kprobe_multi.count) 750 707 break; 751 708 } 709 + error: 710 + free(data); 752 711 } 753 712 754 713 static void show_uprobe_multi_plain(struct bpf_link_info *info) ··· 799 754 printf("+%#x", info->perf_event.kprobe.offset); 800 755 if (info->perf_event.kprobe.missed) 801 756 printf(" missed %llu", info->perf_event.kprobe.missed); 757 + if (info->perf_event.kprobe.cookie) 758 + printf(" cookie %llu", info->perf_event.kprobe.cookie); 802 759 printf(" "); 803 760 } 804 761 ··· 817 770 else 818 771 printf("\n\tuprobe 
"); 819 772 printf("%s+%#x ", buf, info->perf_event.uprobe.offset); 773 + if (info->perf_event.uprobe.cookie) 774 + printf("cookie %llu ", info->perf_event.uprobe.cookie); 820 775 } 821 776 822 777 static void show_perf_event_tracepoint_plain(struct bpf_link_info *info) ··· 830 781 return; 831 782 832 783 printf("\n\ttracepoint %s ", buf); 784 + if (info->perf_event.tracepoint.cookie) 785 + printf("cookie %llu ", info->perf_event.tracepoint.cookie); 833 786 } 834 787 835 788 static void show_perf_event_event_plain(struct bpf_link_info *info) ··· 852 801 printf("%s ", perf_config); 853 802 else 854 803 printf("%llu ", config); 804 + 805 + if (info->perf_event.event.cookie) 806 + printf("cookie %llu ", info->perf_event.event.cookie); 855 807 856 808 if (type == PERF_TYPE_HW_CACHE && perf_config) 857 809 free((void *)perf_config); ··· 1006 952 return -ENOMEM; 1007 953 } 1008 954 info.kprobe_multi.addrs = ptr_to_u64(addrs); 955 + cookies = calloc(count, sizeof(__u64)); 956 + if (!cookies) { 957 + p_err("mem alloc failed"); 958 + free(addrs); 959 + close(fd); 960 + return -ENOMEM; 961 + } 962 + info.kprobe_multi.cookies = ptr_to_u64(cookies); 1009 963 goto again; 1010 964 } 1011 965 } ··· 1039 977 cookies = calloc(count, sizeof(__u64)); 1040 978 if (!cookies) { 1041 979 p_err("mem alloc failed"); 1042 - free(cookies); 980 + free(ref_ctr_offsets); 1043 981 free(offsets); 1044 982 close(fd); 1045 983 return -ENOMEM;

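
Note on the comparator change in tools/bpf/bpftool/link.c: the removed cmp_u64() returned `*a - *b`, a 64-bit difference implicitly narrowed to int, which can report the wrong order (or equality) for addresses that differ only in the upper bits. The replacement compares explicitly and sorts address/cookie pairs together so each cookie stays with its address. A self-contained sketch of that pattern follows; pair_and_sort is a hypothetical name standing in for get_addr_cookie_array(), and error reporting via p_err() is omitted:

```c
#include <stdint.h>
#include <stdlib.h>

struct addr_cookie {
	uint64_t addr;
	uint64_t cookie;
};

/* Three-way compare on the address; never narrows a 64-bit difference. */
static int cmp_addr_cookie(const void *A, const void *B)
{
	const struct addr_cookie *a = A, *b = B;

	if (a->addr == b->addr)
		return 0;
	return a->addr < b->addr ? -1 : 1;
}

/*
 * Zip two parallel arrays into pairs and sort by address.
 * Returns a heap array the caller must free(), or NULL on failure.
 */
static struct addr_cookie *pair_and_sort(const uint64_t *addrs,
					 const uint64_t *cookies,
					 size_t count)
{
	struct addr_cookie *data;
	size_t i;

	data = calloc(count, sizeof(data[0]));
	if (!data)
		return NULL;
	for (i = 0; i < count; i++) {
		data[i].addr = addrs[i];
		data[i].cookie = cookies[i];
	}
	qsort(data, count, sizeof(data[0]), cmp_addr_cookie);
	return data;
}
```

Sorting the addrs array alone, as the old code did, would have left the separate cookies array in its original order, so pairing them first is what makes the new "cookie" column line up.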
+1 -1

tools/bpf/bpftool/prog.c

···
 	int map_fd;
 
 	profile_perf_events = calloc(
-		sizeof(int), obj->rodata->num_cpu * obj->rodata->num_metric);
+		obj->rodata->num_cpu * obj->rodata->num_metric, sizeof(int));
 	if (!profile_perf_events) {
 		p_err("failed to allocate memory for perf_event array: %s",
 		      strerror(errno));
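
Note on the calloc() change: the standard signature is calloc(nmemb, size), element count first and element size second. Both orders request the same number of bytes, so this reads as a cleanup rather than a behavior fix; recent GCC releases can warn about the transposed form. A small sketch of the corrected call shape, with an overflow guard on the count that is an addition of this sketch and not part of the patch (alloc_perf_event_fds is a hypothetical name):

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Allocate one zeroed int slot per (cpu, metric) pair.
 * Returns NULL if the element count would overflow or allocation fails.
 */
static int *alloc_perf_event_fds(size_t num_cpu, size_t num_metric)
{
	/* Guard the multiplication; calloc only checks nmemb * size. */
	if (num_metric != 0 && num_cpu > SIZE_MAX / num_metric)
		return NULL;

	/* Element count first, element size second. */
	return calloc(num_cpu * num_metric, sizeof(int));
}
```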

+75 -4

tools/include/uapi/linux/bpf.h

··· 847 847 * Returns zero on success. On error, -1 is returned and *errno* 848 848 * is set appropriately. 849 849 * 850 + * BPF_TOKEN_CREATE 851 + * Description 852 + * Create BPF token with embedded information about what 853 + * BPF-related functionality it allows: 854 + * - a set of allowed bpf() syscall commands; 855 + * - a set of allowed BPF map types to be created with 856 + * BPF_MAP_CREATE command, if BPF_MAP_CREATE itself is allowed; 857 + * - a set of allowed BPF program types and BPF program attach 858 + * types to be loaded with BPF_PROG_LOAD command, if 859 + * BPF_PROG_LOAD itself is allowed. 860 + * 861 + * BPF token is created (derived) from an instance of BPF FS, 862 + * assuming it has necessary delegation mount options specified. 863 + * This BPF token can be passed as an extra parameter to various 864 + * bpf() syscall commands to grant BPF subsystem functionality to 865 + * unprivileged processes. 866 + * 867 + * When created, BPF token is "associated" with the owning 868 + * user namespace of BPF FS instance (super block) that it was 869 + * derived from, and subsequent BPF operations performed with 870 + * BPF token would be performing capabilities checks (i.e., 871 + * CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, CAP_SYS_ADMIN) within 872 + * that user namespace. Without BPF token, such capabilities 873 + * have to be granted in init user namespace, making bpf() 874 + * syscall incompatible with user namespace, for the most part. 875 + * 876 + * Return 877 + * A new file descriptor (a nonnegative integer), or -1 if an 878 + * error occurred (in which case, *errno* is set appropriately). 879 + * 850 880 * NOTES 851 881 * eBPF objects (maps and programs) can be shared between processes. 
852 882 * ··· 931 901 BPF_ITER_CREATE, 932 902 BPF_LINK_DETACH, 933 903 BPF_PROG_BIND_MAP, 904 + BPF_TOKEN_CREATE, 905 + __MAX_BPF_CMD, 934 906 }; 935 907 936 908 enum bpf_map_type { ··· 983 951 BPF_MAP_TYPE_BLOOM_FILTER, 984 952 BPF_MAP_TYPE_USER_RINGBUF, 985 953 BPF_MAP_TYPE_CGRP_STORAGE, 954 + __MAX_BPF_MAP_TYPE 986 955 }; 987 956 988 957 /* Note that tracing related programs such as ··· 1028 995 BPF_PROG_TYPE_SK_LOOKUP, 1029 996 BPF_PROG_TYPE_SYSCALL, /* a program that can execute syscalls */ 1030 997 BPF_PROG_TYPE_NETFILTER, 998 + __MAX_BPF_PROG_TYPE 1031 999 }; 1032 1000 1033 1001 enum bpf_attach_type { ··· 1364 1330 1365 1331 /* Get path from provided FD in BPF_OBJ_PIN/BPF_OBJ_GET commands */ 1366 1332 BPF_F_PATH_FD = (1U << 14), 1333 + 1334 + /* Flag for value_type_btf_obj_fd, the fd is available */ 1335 + BPF_F_VTYPE_BTF_OBJ_FD = (1U << 15), 1336 + 1337 + /* BPF token FD is passed in a corresponding command's token_fd field */ 1338 + BPF_F_TOKEN_FD = (1U << 16), 1367 1339 }; 1368 1340 1369 1341 /* Flags for BPF_PROG_QUERY. */ ··· 1443 1403 * to using 5 hash functions). 1444 1404 */ 1445 1405 __u64 map_extra; 1406 + 1407 + __s32 value_type_btf_obj_fd; /* fd pointing to a BTF 1408 + * type data for 1409 + * btf_vmlinux_value_type_id. 1410 + */ 1411 + /* BPF token FD to use with BPF_MAP_CREATE operation. 1412 + * If provided, map_flags should have BPF_F_TOKEN_FD flag set. 1413 + */ 1414 + __s32 map_token_fd; 1446 1415 }; 1447 1416 1448 1417 struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */ ··· 1521 1472 * truncated), or smaller (if log buffer wasn't filled completely). 1522 1473 */ 1523 1474 __u32 log_true_size; 1475 + /* BPF token FD to use with BPF_PROG_LOAD operation. 1476 + * If provided, prog_flags should have BPF_F_TOKEN_FD flag set. 
1477 + */ 1478 + __s32 prog_token_fd; 1524 1479 }; 1525 1480 1526 1481 struct { /* anonymous struct used by BPF_OBJ_* commands */ ··· 1637 1584 * truncated), or smaller (if log buffer wasn't filled completely). 1638 1585 */ 1639 1586 __u32 btf_log_true_size; 1587 + __u32 btf_flags; 1588 + /* BPF token FD to use with BPF_BTF_LOAD operation. 1589 + * If provided, btf_flags should have BPF_F_TOKEN_FD flag set. 1590 + */ 1591 + __s32 btf_token_fd; 1640 1592 }; 1641 1593 1642 1594 struct { ··· 1771 1713 __u32 map_fd; 1772 1714 __u32 flags; /* extra flags */ 1773 1715 } prog_bind_map; 1716 + 1717 + struct { /* struct used by BPF_TOKEN_CREATE command */ 1718 + __u32 flags; 1719 + __u32 bpffs_fd; 1720 + } token_create; 1774 1721 1775 1722 } __attribute__((aligned(8))); 1776 1723 ··· 4902 4839 * going through the CPU's backlog queue. 4903 4840 * 4904 4841 * The *flags* argument is reserved and must be 0. The helper is 4905 - * currently only supported for tc BPF program types at the ingress 4906 - * hook and for veth device types. The peer device must reside in a 4907 - * different network namespace. 4842 + * currently only supported for tc BPF program types at the 4843 + * ingress hook and for veth and netkit target device types. The 4844 + * peer device must reside in a different network namespace. 4908 4845 * Return 4909 4846 * The helper returns **TC_ACT_REDIRECT** on success or 4910 4847 * **TC_ACT_SHOT** on error. 
··· 6550 6487 __u32 btf_id; 6551 6488 __u32 btf_key_type_id; 6552 6489 __u32 btf_value_type_id; 6553 - __u32 :32; /* alignment pad */ 6490 + __u32 btf_vmlinux_id; 6554 6491 __u64 map_extra; 6555 6492 } __attribute__((aligned(8))); 6556 6493 ··· 6626 6563 __u32 count; /* in/out: kprobe_multi function count */ 6627 6564 __u32 flags; 6628 6565 __u64 missed; 6566 + __aligned_u64 cookies; 6629 6567 } kprobe_multi; 6630 6568 struct { 6631 6569 __aligned_u64 path; ··· 6646 6582 __aligned_u64 file_name; /* in/out */ 6647 6583 __u32 name_len; 6648 6584 __u32 offset; /* offset from file_name */ 6585 + __u64 cookie; 6649 6586 } uprobe; /* BPF_PERF_EVENT_UPROBE, BPF_PERF_EVENT_URETPROBE */ 6650 6587 struct { 6651 6588 __aligned_u64 func_name; /* in/out */ ··· 6654 6589 __u32 offset; /* offset from func_name */ 6655 6590 __u64 addr; 6656 6591 __u64 missed; 6592 + __u64 cookie; 6657 6593 } kprobe; /* BPF_PERF_EVENT_KPROBE, BPF_PERF_EVENT_KRETPROBE */ 6658 6594 struct { 6659 6595 __aligned_u64 tp_name; /* in/out */ 6660 6596 __u32 name_len; 6597 + __u32 :32; 6598 + __u64 cookie; 6661 6599 } tracepoint; /* BPF_PERF_EVENT_TRACEPOINT */ 6662 6600 struct { 6663 6601 __u64 config; 6664 6602 __u32 type; 6603 + __u32 :32; 6604 + __u64 cookie; 6665 6605 } event; /* BPF_PERF_EVENT_EVENT */ 6666 6606 }; 6667 6607 } perf_event; ··· 6974 6904 BPF_TCP_LISTEN, 6975 6905 BPF_TCP_CLOSING, /* Now a valid state */ 6976 6906 BPF_TCP_NEW_SYN_RECV, 6907 + BPF_TCP_BOUND_INACTIVE, 6977 6908 6978 6909 BPF_TCP_MAX_STATES /* Leave at the end! */ 6979 6910 };
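
Note on the token fields added to union bpf_attr: each *_token_fd field is only honoured when BPF_F_TOKEN_FD is set in the matching flags field, as the new UAPI comments state. The libbpf change to probe_memcg_account() later in this diff follows that rule by setting the flag only for a non-zero token FD. A self-contained sketch of the same pairing, using a local mirror struct instead of the real union bpf_attr so it builds against older headers; every sketch_* name is hypothetical, and only the flag value (1U << 16) is taken from the hunk above:

```c
#include <stdint.h>

/* Same value as BPF_F_TOKEN_FD in the UAPI hunk; local copy for the sketch. */
#define SKETCH_BPF_F_TOKEN_FD (1U << 16)

/* Minimal mirror of the two BPF_PROG_LOAD fields involved. */
struct sketch_prog_load_attr {
	uint32_t prog_flags;
	int32_t prog_token_fd;
};

/*
 * Record the token FD and raise the flag that tells the kernel to look
 * at it. A token_fd of 0 is treated as "no token", following the libbpf
 * probe code in this series.
 */
static void sketch_set_token(struct sketch_prog_load_attr *attr, int token_fd)
{
	attr->prog_token_fd = token_fd;
	if (token_fd)
		attr->prog_flags |= SKETCH_BPF_F_TOKEN_FD;
}
```

Keeping the flag separate from the FD lets FD 0 remain a legal descriptor value in the ABI while old callers that zero the whole attr keep working unchanged.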

+1 -1

tools/lib/bpf/Build

···
 libbpf-y := libbpf.o bpf.o nlattr.o btf.o libbpf_errno.o str_error.o \
 	    netlink.o bpf_prog_linfo.o libbpf_probes.o hashmap.o \
 	    btf_dump.o ringbuf.o strset.o linker.o gen_loader.o relo_core.o \
-	    usdt.o zip.o elf.o
+	    usdt.o zip.o elf.o features.o

+35 -7

tools/lib/bpf/bpf.c

··· 103 103 * [0] https://lore.kernel.org/bpf/20201201215900.3569844-1-guro@fb.com/ 104 104 * [1] d05512618056 ("bpf: Add bpf_ktime_get_coarse_ns helper") 105 105 */ 106 - int probe_memcg_account(void) 106 + int probe_memcg_account(int token_fd) 107 107 { 108 108 const size_t attr_sz = offsetofend(union bpf_attr, attach_btf_obj_fd); 109 109 struct bpf_insn insns[] = { ··· 120 120 attr.insns = ptr_to_u64(insns); 121 121 attr.insn_cnt = insn_cnt; 122 122 attr.license = ptr_to_u64("GPL"); 123 + attr.prog_token_fd = token_fd; 124 + if (token_fd) 125 + attr.prog_flags |= BPF_F_TOKEN_FD; 123 126 124 127 prog_fd = sys_bpf_fd(BPF_PROG_LOAD, &attr, attr_sz); 125 128 if (prog_fd >= 0) { ··· 149 146 struct rlimit rlim; 150 147 151 148 /* if kernel supports memcg-based accounting, skip bumping RLIMIT_MEMLOCK */ 152 - if (memlock_bumped || kernel_supports(NULL, FEAT_MEMCG_ACCOUNT)) 149 + if (memlock_bumped || feat_supported(NULL, FEAT_MEMCG_ACCOUNT)) 153 150 return 0; 154 151 155 152 memlock_bumped = true; ··· 172 169 __u32 max_entries, 173 170 const struct bpf_map_create_opts *opts) 174 171 { 175 - const size_t attr_sz = offsetofend(union bpf_attr, map_extra); 172 + const size_t attr_sz = offsetofend(union bpf_attr, map_token_fd); 176 173 union bpf_attr attr; 177 174 int fd; 178 175 ··· 184 181 return libbpf_err(-EINVAL); 185 182 186 183 attr.map_type = map_type; 187 - if (map_name && kernel_supports(NULL, FEAT_PROG_NAME)) 184 + if (map_name && feat_supported(NULL, FEAT_PROG_NAME)) 188 185 libbpf_strlcpy(attr.map_name, map_name, sizeof(attr.map_name)); 189 186 attr.key_size = key_size; 190 187 attr.value_size = value_size; ··· 194 191 attr.btf_key_type_id = OPTS_GET(opts, btf_key_type_id, 0); 195 192 attr.btf_value_type_id = OPTS_GET(opts, btf_value_type_id, 0); 196 193 attr.btf_vmlinux_value_type_id = OPTS_GET(opts, btf_vmlinux_value_type_id, 0); 194 + attr.value_type_btf_obj_fd = OPTS_GET(opts, value_type_btf_obj_fd, 0); 197 195 198 196 attr.inner_map_fd = OPTS_GET(opts, 
inner_map_fd, 0); 199 197 attr.map_flags = OPTS_GET(opts, map_flags, 0); 200 198 attr.map_extra = OPTS_GET(opts, map_extra, 0); 201 199 attr.numa_node = OPTS_GET(opts, numa_node, 0); 202 200 attr.map_ifindex = OPTS_GET(opts, map_ifindex, 0); 201 + 202 + attr.map_token_fd = OPTS_GET(opts, token_fd, 0); 203 203 204 204 fd = sys_bpf_fd(BPF_MAP_CREATE, &attr, attr_sz); 205 205 return libbpf_err_errno(fd); ··· 238 232 const struct bpf_insn *insns, size_t insn_cnt, 239 233 struct bpf_prog_load_opts *opts) 240 234 { 241 - const size_t attr_sz = offsetofend(union bpf_attr, log_true_size); 235 + const size_t attr_sz = offsetofend(union bpf_attr, prog_token_fd); 242 236 void *finfo = NULL, *linfo = NULL; 243 237 const char *func_info, *line_info; 244 238 __u32 log_size, log_level, attach_prog_fd, attach_btf_obj_fd; ··· 267 261 attr.prog_flags = OPTS_GET(opts, prog_flags, 0); 268 262 attr.prog_ifindex = OPTS_GET(opts, prog_ifindex, 0); 269 263 attr.kern_version = OPTS_GET(opts, kern_version, 0); 264 + attr.prog_token_fd = OPTS_GET(opts, token_fd, 0); 270 265 271 - if (prog_name && kernel_supports(NULL, FEAT_PROG_NAME)) 266 + if (prog_name && feat_supported(NULL, FEAT_PROG_NAME)) 272 267 libbpf_strlcpy(attr.prog_name, prog_name, sizeof(attr.prog_name)); 273 268 attr.license = ptr_to_u64(license); 274 269 ··· 1189 1182 1190 1183 int bpf_btf_load(const void *btf_data, size_t btf_size, struct bpf_btf_load_opts *opts) 1191 1184 { 1192 - const size_t attr_sz = offsetofend(union bpf_attr, btf_log_true_size); 1185 + const size_t attr_sz = offsetofend(union bpf_attr, btf_token_fd); 1193 1186 union bpf_attr attr; 1194 1187 char *log_buf; 1195 1188 size_t log_size; ··· 1214 1207 1215 1208 attr.btf = ptr_to_u64(btf_data); 1216 1209 attr.btf_size = btf_size; 1210 + 1211 + attr.btf_flags = OPTS_GET(opts, btf_flags, 0); 1212 + attr.btf_token_fd = OPTS_GET(opts, token_fd, 0); 1213 + 1217 1214 /* log_level == 0 and log_buf != NULL means "try loading without 1218 1215 * log_buf, but retry with 
log_buf and log_level=1 on error", which is 1219 1216 * consistent across low-level and high-level BTF and program loading ··· 1297 1286 1298 1287 ret = sys_bpf(BPF_PROG_BIND_MAP, &attr, attr_sz); 1299 1288 return libbpf_err_errno(ret); 1289 + } 1290 + 1291 + int bpf_token_create(int bpffs_fd, struct bpf_token_create_opts *opts) 1292 + { 1293 + const size_t attr_sz = offsetofend(union bpf_attr, token_create); 1294 + union bpf_attr attr; 1295 + int fd; 1296 + 1297 + if (!OPTS_VALID(opts, bpf_token_create_opts)) 1298 + return libbpf_err(-EINVAL); 1299 + 1300 + memset(&attr, 0, attr_sz); 1301 + attr.token_create.bpffs_fd = bpffs_fd; 1302 + attr.token_create.flags = OPTS_GET(opts, flags, 0); 1303 + 1304 + fd = sys_bpf_fd(BPF_TOKEN_CREATE, &attr, attr_sz); 1305 + return libbpf_err_errno(fd); 1300 1306 }
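Several hunks in this file only change which field is passed to `offsetofend()` when computing `attr_sz` (for example `btf_log_true_size` becomes `btf_token_fd`). The sketch below illustrates why that matters: the size handed to the syscall has to reach the end of the last field the caller fills in. It assumes a simplified macro (`OFFSETOFEND_SKETCH`) and a trimmed, hypothetical stand-in struct; libbpf uses its own `offsetofend()` and the real `union bpf_attr`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for libbpf's offsetofend(): offset of the first
 * byte past `member`.
 */
#define OFFSETOFEND_SKETCH(type, member) \
	(offsetof(type, member) + sizeof(((type *)0)->member))

/* Hypothetical, trimmed stand-in for the BPF_BTF_LOAD part of
 * union bpf_attr; field order follows the diff.
 */
struct btf_load_attr_sketch {
	uint64_t btf;
	uint64_t btf_log_buf;
	uint32_t btf_size;
	uint32_t btf_log_size;
	uint32_t btf_log_level;
	uint32_t btf_log_true_size;
	uint32_t btf_flags;     /* new in this merge */
	int32_t  btf_token_fd;  /* new in this merge */
};

/* Size that covers fields up to btf_log_true_size (before the change). */
size_t attr_size_before(void)
{
	return OFFSETOFEND_SKETCH(struct btf_load_attr_sketch, btf_log_true_size);
}

/* Size that also covers btf_flags and btf_token_fd (after the change). */
size_t attr_size_after(void)
{
	return OFFSETOFEND_SKETCH(struct btf_load_attr_sketch, btf_token_fd);
}

/* The new size must be strictly larger, or the new fields are cut off. */
void check_attr_sizes(void)
{
	assert(attr_size_after() > attr_size_before());
}
```

If the size stayed at the old value, the two new fields would fall outside the region described to the kernel, so a token FD set by the caller would not be covered.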

+35 -3

tools/lib/bpf/bpf.h

··· 51 51 52 52 __u32 numa_node; 53 53 __u32 map_ifindex; 54 + __s32 value_type_btf_obj_fd; 55 + 56 + __u32 token_fd; 57 + size_t :0; 54 58 }; 55 - #define bpf_map_create_opts__last_field map_ifindex 59 + #define bpf_map_create_opts__last_field token_fd 56 60 57 61 LIBBPF_API int bpf_map_create(enum bpf_map_type map_type, 58 62 const char *map_name, ··· 106 102 * If kernel doesn't support this feature, log_size is left unchanged. 107 103 */ 108 104 __u32 log_true_size; 105 + __u32 token_fd; 109 106 size_t :0; 110 107 }; 111 - #define bpf_prog_load_opts__last_field log_true_size 108 + #define bpf_prog_load_opts__last_field token_fd 112 109 113 110 LIBBPF_API int bpf_prog_load(enum bpf_prog_type prog_type, 114 111 const char *prog_name, const char *license, ··· 135 130 * If kernel doesn't support this feature, log_size is left unchanged. 136 131 */ 137 132 __u32 log_true_size; 133 + 134 + __u32 btf_flags; 135 + __u32 token_fd; 138 136 size_t :0; 139 137 }; 140 - #define bpf_btf_load_opts__last_field log_true_size 138 + #define bpf_btf_load_opts__last_field token_fd 141 139 142 140 LIBBPF_API int bpf_btf_load(const void *btf_data, size_t btf_size, 143 141 struct bpf_btf_load_opts *opts); ··· 647 639 648 640 LIBBPF_API int bpf_prog_test_run_opts(int prog_fd, 649 641 struct bpf_test_run_opts *opts); 642 + 643 + struct bpf_token_create_opts { 644 + size_t sz; /* size of this struct for forward/backward compatibility */ 645 + __u32 flags; 646 + size_t :0; 647 + }; 648 + #define bpf_token_create_opts__last_field flags 649 + 650 + /** 651 + * @brief **bpf_token_create()** creates a new instance of BPF token derived 652 + * from specified BPF FS mount point. 653 + * 654 + * BPF token created with this API can be passed to bpf() syscall for 655 + * commands like BPF_PROG_LOAD, BPF_MAP_CREATE, etc. 656 + * 657 + * @param bpffs_fd FD for BPF FS instance from which to derive a BPF token 658 + * instance. 
659 + * @param opts optional BPF token creation options, can be NULL 660 + * 661 + * @return BPF token FD > 0, on success; negative error code, otherwise (errno 662 + * is also set to the error code) 663 + */ 664 + LIBBPF_API int bpf_token_create(int bpffs_fd, 665 + struct bpf_token_create_opts *opts); 650 666 651 667 #ifdef __cplusplus 652 668 } /* extern "C" */
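The option structs in this header start with a `sz` field and end with `size_t :0;`, and each hunk moves the matching `__last_field` macro to the newly appended member. The following is a rough sketch of the size-gated field access this enables. The struct and function names are hypothetical, and libbpf's actual `OPTS_VALID()`/`OPTS_GET()` macros do more than this (they are not reproduced here).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical options struct in the style of the diff: `sz` first,
 * newer fields appended at the end.
 */
struct token_opts_sketch {
	size_t sz;          /* size of the struct as the caller knows it */
	uint32_t flags;
	uint32_t token_fd;  /* appended later */
};

/* Return token_fd only if the caller's struct is large enough to
 * contain it; otherwise fall back to 0 ("no token").
 */
uint32_t opts_get_token_fd(const struct token_opts_sketch *opts)
{
	const size_t field_end =
		offsetof(struct token_opts_sketch, token_fd) + sizeof(uint32_t);

	/* sanity check: the field lies inside the current struct */
	assert(field_end <= sizeof(struct token_opts_sketch));

	if (!opts || opts->sz < field_end)
		return 0;
	return opts->token_fd;
}
```

A caller built against an older header reports a smaller `sz`, so the library never reads bytes that caller did not know about. A NULL `opts` behaves like "all defaults".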

+1 -1

tools/lib/bpf/bpf_core_read.h

···
268 268 * a relocation, which records BTF type ID describing root struct/union and an
269 269 * accessor string which describes exact embedded field that was used to take
270 270 * an address. See detailed description of this relocation format and
271 - * semantics in comments to struct bpf_field_reloc in libbpf_internal.h.
271 + * semantics in comments to struct bpf_core_relo in include/uapi/linux/bpf.h.
272 272 *
273 273 * This relocation allows libbpf to adjust BPF instruction to use correct
274 274 * actual field offset, based on target kernel BTF type that matches original

+8 -2

tools/lib/bpf/btf.c

···
1317 1317
1318 1318 static void *btf_get_raw_data(const struct btf *btf, __u32 *size, bool swap_endian);
1319 1319
1320 - int btf_load_into_kernel(struct btf *btf, char *log_buf, size_t log_sz, __u32 log_level)
1320 + int btf_load_into_kernel(struct btf *btf,
1321 +                          char *log_buf, size_t log_sz, __u32 log_level,
1322 +                          int token_fd)
1321 1323 {
1322 1324     LIBBPF_OPTS(bpf_btf_load_opts, opts);
1323 1325     __u32 buf_sz = 0, raw_size;
···
1369 1367         opts.log_level = log_level;
1370 1368     }
1371 1369
1370 +     opts.token_fd = token_fd;
1371 +     if (token_fd)
1372 +         opts.btf_flags |= BPF_F_TOKEN_FD;
1373 +
1372 1374     btf->fd = bpf_btf_load(raw_data, raw_size, &opts);
1373 1375     if (btf->fd < 0) {
1374 1376         /* time to turn on verbose mode and try again */
···
1400 1394
1401 1395 int btf__load_into_kernel(struct btf *btf)
1402 1396 {
1403 -     return btf_load_into_kernel(btf, NULL, 0, 0);
1397 +     return btf_load_into_kernel(btf, NULL, 0, 0, 0);
1404 1398 }
1405 1399
1406 1400 int btf__fd(const struct btf *btf)

-2

tools/lib/bpf/elf.c

···
11 11 #include "libbpf_internal.h"
12 12 #include "str_error.h"
13 13
14 - #define STRERR_BUFSIZE 128
15 -
16 14 /* A SHT_GNU_versym section holds 16-bit words. This bit is set if
17 15 * the symbol is hidden and can only be seen when referenced using an
18 16 * explicit version number. This is a GNU extension.

+503

tools/lib/bpf/features.c

··· 1 + // SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) 2 + /* Copyright (c) 2023 Meta Platforms, Inc. and affiliates. */ 3 + #include <linux/kernel.h> 4 + #include <linux/filter.h> 5 + #include "bpf.h" 6 + #include "libbpf.h" 7 + #include "libbpf_common.h" 8 + #include "libbpf_internal.h" 9 + #include "str_error.h" 10 + 11 + static inline __u64 ptr_to_u64(const void *ptr) 12 + { 13 + return (__u64)(unsigned long)ptr; 14 + } 15 + 16 + int probe_fd(int fd) 17 + { 18 + if (fd >= 0) 19 + close(fd); 20 + return fd >= 0; 21 + } 22 + 23 + static int probe_kern_prog_name(int token_fd) 24 + { 25 + const size_t attr_sz = offsetofend(union bpf_attr, prog_name); 26 + struct bpf_insn insns[] = { 27 + BPF_MOV64_IMM(BPF_REG_0, 0), 28 + BPF_EXIT_INSN(), 29 + }; 30 + union bpf_attr attr; 31 + int ret; 32 + 33 + memset(&attr, 0, attr_sz); 34 + attr.prog_type = BPF_PROG_TYPE_SOCKET_FILTER; 35 + attr.license = ptr_to_u64("GPL"); 36 + attr.insns = ptr_to_u64(insns); 37 + attr.insn_cnt = (__u32)ARRAY_SIZE(insns); 38 + attr.prog_token_fd = token_fd; 39 + if (token_fd) 40 + attr.prog_flags |= BPF_F_TOKEN_FD; 41 + libbpf_strlcpy(attr.prog_name, "libbpf_nametest", sizeof(attr.prog_name)); 42 + 43 + /* make sure loading with name works */ 44 + ret = sys_bpf_prog_load(&attr, attr_sz, PROG_LOAD_ATTEMPTS); 45 + return probe_fd(ret); 46 + } 47 + 48 + static int probe_kern_global_data(int token_fd) 49 + { 50 + char *cp, errmsg[STRERR_BUFSIZE]; 51 + struct bpf_insn insns[] = { 52 + BPF_LD_MAP_VALUE(BPF_REG_1, 0, 16), 53 + BPF_ST_MEM(BPF_DW, BPF_REG_1, 0, 42), 54 + BPF_MOV64_IMM(BPF_REG_0, 0), 55 + BPF_EXIT_INSN(), 56 + }; 57 + LIBBPF_OPTS(bpf_map_create_opts, map_opts, 58 + .token_fd = token_fd, 59 + .map_flags = token_fd ? BPF_F_TOKEN_FD : 0, 60 + ); 61 + LIBBPF_OPTS(bpf_prog_load_opts, prog_opts, 62 + .token_fd = token_fd, 63 + .prog_flags = token_fd ? 
BPF_F_TOKEN_FD : 0, 64 + ); 65 + int ret, map, insn_cnt = ARRAY_SIZE(insns); 66 + 67 + map = bpf_map_create(BPF_MAP_TYPE_ARRAY, "libbpf_global", sizeof(int), 32, 1, &map_opts); 68 + if (map < 0) { 69 + ret = -errno; 70 + cp = libbpf_strerror_r(ret, errmsg, sizeof(errmsg)); 71 + pr_warn("Error in %s():%s(%d). Couldn't create simple array map.\n", 72 + __func__, cp, -ret); 73 + return ret; 74 + } 75 + 76 + insns[0].imm = map; 77 + 78 + ret = bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER, NULL, "GPL", insns, insn_cnt, &prog_opts); 79 + close(map); 80 + return probe_fd(ret); 81 + } 82 + 83 + static int probe_kern_btf(int token_fd) 84 + { 85 + static const char strs[] = "\0int"; 86 + __u32 types[] = { 87 + /* int */ 88 + BTF_TYPE_INT_ENC(1, BTF_INT_SIGNED, 0, 32, 4), 89 + }; 90 + 91 + return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types), 92 + strs, sizeof(strs), token_fd)); 93 + } 94 + 95 + static int probe_kern_btf_func(int token_fd) 96 + { 97 + static const char strs[] = "\0int\0x\0a"; 98 + /* void x(int a) {} */ 99 + __u32 types[] = { 100 + /* int */ 101 + BTF_TYPE_INT_ENC(1, BTF_INT_SIGNED, 0, 32, 4), /* [1] */ 102 + /* FUNC_PROTO */ /* [2] */ 103 + BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_FUNC_PROTO, 0, 1), 0), 104 + BTF_PARAM_ENC(7, 1), 105 + /* FUNC x */ /* [3] */ 106 + BTF_TYPE_ENC(5, BTF_INFO_ENC(BTF_KIND_FUNC, 0, 0), 2), 107 + }; 108 + 109 + return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types), 110 + strs, sizeof(strs), token_fd)); 111 + } 112 + 113 + static int probe_kern_btf_func_global(int token_fd) 114 + { 115 + static const char strs[] = "\0int\0x\0a"; 116 + /* static void x(int a) {} */ 117 + __u32 types[] = { 118 + /* int */ 119 + BTF_TYPE_INT_ENC(1, BTF_INT_SIGNED, 0, 32, 4), /* [1] */ 120 + /* FUNC_PROTO */ /* [2] */ 121 + BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_FUNC_PROTO, 0, 1), 0), 122 + BTF_PARAM_ENC(7, 1), 123 + /* FUNC x BTF_FUNC_GLOBAL */ /* [3] */ 124 + BTF_TYPE_ENC(5, BTF_INFO_ENC(BTF_KIND_FUNC, 0, BTF_FUNC_GLOBAL), 2), 125 + 
}; 126 + 127 + return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types), 128 + strs, sizeof(strs), token_fd)); 129 + } 130 + 131 + static int probe_kern_btf_datasec(int token_fd) 132 + { 133 + static const char strs[] = "\0x\0.data"; 134 + /* static int a; */ 135 + __u32 types[] = { 136 + /* int */ 137 + BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4), /* [1] */ 138 + /* VAR x */ /* [2] */ 139 + BTF_TYPE_ENC(1, BTF_INFO_ENC(BTF_KIND_VAR, 0, 0), 1), 140 + BTF_VAR_STATIC, 141 + /* DATASEC val */ /* [3] */ 142 + BTF_TYPE_ENC(3, BTF_INFO_ENC(BTF_KIND_DATASEC, 0, 1), 4), 143 + BTF_VAR_SECINFO_ENC(2, 0, 4), 144 + }; 145 + 146 + return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types), 147 + strs, sizeof(strs), token_fd)); 148 + } 149 + 150 + static int probe_kern_btf_float(int token_fd) 151 + { 152 + static const char strs[] = "\0float"; 153 + __u32 types[] = { 154 + /* float */ 155 + BTF_TYPE_FLOAT_ENC(1, 4), 156 + }; 157 + 158 + return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types), 159 + strs, sizeof(strs), token_fd)); 160 + } 161 + 162 + static int probe_kern_btf_decl_tag(int token_fd) 163 + { 164 + static const char strs[] = "\0tag"; 165 + __u32 types[] = { 166 + /* int */ 167 + BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4), /* [1] */ 168 + /* VAR x */ /* [2] */ 169 + BTF_TYPE_ENC(1, BTF_INFO_ENC(BTF_KIND_VAR, 0, 0), 1), 170 + BTF_VAR_STATIC, 171 + /* attr */ 172 + BTF_TYPE_DECL_TAG_ENC(1, 2, -1), 173 + }; 174 + 175 + return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types), 176 + strs, sizeof(strs), token_fd)); 177 + } 178 + 179 + static int probe_kern_btf_type_tag(int token_fd) 180 + { 181 + static const char strs[] = "\0tag"; 182 + __u32 types[] = { 183 + /* int */ 184 + BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4), /* [1] */ 185 + /* attr */ 186 + BTF_TYPE_TYPE_TAG_ENC(1, 1), /* [2] */ 187 + /* ptr */ 188 + BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_PTR, 0, 0), 2), /* [3] */ 189 + }; 190 + 191 + return 
probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types), 192 + strs, sizeof(strs), token_fd)); 193 + } 194 + 195 + static int probe_kern_array_mmap(int token_fd) 196 + { 197 + LIBBPF_OPTS(bpf_map_create_opts, opts, 198 + .map_flags = BPF_F_MMAPABLE | (token_fd ? BPF_F_TOKEN_FD : 0), 199 + .token_fd = token_fd, 200 + ); 201 + int fd; 202 + 203 + fd = bpf_map_create(BPF_MAP_TYPE_ARRAY, "libbpf_mmap", sizeof(int), sizeof(int), 1, &opts); 204 + return probe_fd(fd); 205 + } 206 + 207 + static int probe_kern_exp_attach_type(int token_fd) 208 + { 209 + LIBBPF_OPTS(bpf_prog_load_opts, opts, 210 + .expected_attach_type = BPF_CGROUP_INET_SOCK_CREATE, 211 + .token_fd = token_fd, 212 + .prog_flags = token_fd ? BPF_F_TOKEN_FD : 0, 213 + ); 214 + struct bpf_insn insns[] = { 215 + BPF_MOV64_IMM(BPF_REG_0, 0), 216 + BPF_EXIT_INSN(), 217 + }; 218 + int fd, insn_cnt = ARRAY_SIZE(insns); 219 + 220 + /* use any valid combination of program type and (optional) 221 + * non-zero expected attach type (i.e., not a BPF_CGROUP_INET_INGRESS) 222 + * to see if kernel supports expected_attach_type field for 223 + * BPF_PROG_LOAD command 224 + */ 225 + fd = bpf_prog_load(BPF_PROG_TYPE_CGROUP_SOCK, NULL, "GPL", insns, insn_cnt, &opts); 226 + return probe_fd(fd); 227 + } 228 + 229 + static int probe_kern_probe_read_kernel(int token_fd) 230 + { 231 + LIBBPF_OPTS(bpf_prog_load_opts, opts, 232 + .token_fd = token_fd, 233 + .prog_flags = token_fd ? 
BPF_F_TOKEN_FD : 0, 234 + ); 235 + struct bpf_insn insns[] = { 236 + BPF_MOV64_REG(BPF_REG_1, BPF_REG_10), /* r1 = r10 (fp) */ 237 + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -8), /* r1 += -8 */ 238 + BPF_MOV64_IMM(BPF_REG_2, 8), /* r2 = 8 */ 239 + BPF_MOV64_IMM(BPF_REG_3, 0), /* r3 = 0 */ 240 + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_probe_read_kernel), 241 + BPF_EXIT_INSN(), 242 + }; 243 + int fd, insn_cnt = ARRAY_SIZE(insns); 244 + 245 + fd = bpf_prog_load(BPF_PROG_TYPE_TRACEPOINT, NULL, "GPL", insns, insn_cnt, &opts); 246 + return probe_fd(fd); 247 + } 248 + 249 + static int probe_prog_bind_map(int token_fd) 250 + { 251 + char *cp, errmsg[STRERR_BUFSIZE]; 252 + struct bpf_insn insns[] = { 253 + BPF_MOV64_IMM(BPF_REG_0, 0), 254 + BPF_EXIT_INSN(), 255 + }; 256 + LIBBPF_OPTS(bpf_map_create_opts, map_opts, 257 + .token_fd = token_fd, 258 + .map_flags = token_fd ? BPF_F_TOKEN_FD : 0, 259 + ); 260 + LIBBPF_OPTS(bpf_prog_load_opts, prog_opts, 261 + .token_fd = token_fd, 262 + .prog_flags = token_fd ? BPF_F_TOKEN_FD : 0, 263 + ); 264 + int ret, map, prog, insn_cnt = ARRAY_SIZE(insns); 265 + 266 + map = bpf_map_create(BPF_MAP_TYPE_ARRAY, "libbpf_det_bind", sizeof(int), 32, 1, &map_opts); 267 + if (map < 0) { 268 + ret = -errno; 269 + cp = libbpf_strerror_r(ret, errmsg, sizeof(errmsg)); 270 + pr_warn("Error in %s():%s(%d). 
Couldn't create simple array map.\n", 271 + __func__, cp, -ret); 272 + return ret; 273 + } 274 + 275 + prog = bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER, NULL, "GPL", insns, insn_cnt, &prog_opts); 276 + if (prog < 0) { 277 + close(map); 278 + return 0; 279 + } 280 + 281 + ret = bpf_prog_bind_map(prog, map, NULL); 282 + 283 + close(map); 284 + close(prog); 285 + 286 + return ret >= 0; 287 + } 288 + 289 + static int probe_module_btf(int token_fd) 290 + { 291 + static const char strs[] = "\0int"; 292 + __u32 types[] = { 293 + /* int */ 294 + BTF_TYPE_INT_ENC(1, BTF_INT_SIGNED, 0, 32, 4), 295 + }; 296 + struct bpf_btf_info info; 297 + __u32 len = sizeof(info); 298 + char name[16]; 299 + int fd, err; 300 + 301 + fd = libbpf__load_raw_btf((char *)types, sizeof(types), strs, sizeof(strs), token_fd); 302 + if (fd < 0) 303 + return 0; /* BTF not supported at all */ 304 + 305 + memset(&info, 0, sizeof(info)); 306 + info.name = ptr_to_u64(name); 307 + info.name_len = sizeof(name); 308 + 309 + /* check that BPF_OBJ_GET_INFO_BY_FD supports specifying name pointer; 310 + * kernel's module BTF support coincides with support for 311 + * name/name_len fields in struct bpf_btf_info. 312 + */ 313 + err = bpf_btf_get_info_by_fd(fd, &info, &len); 314 + close(fd); 315 + return !err; 316 + } 317 + 318 + static int probe_perf_link(int token_fd) 319 + { 320 + struct bpf_insn insns[] = { 321 + BPF_MOV64_IMM(BPF_REG_0, 0), 322 + BPF_EXIT_INSN(), 323 + }; 324 + LIBBPF_OPTS(bpf_prog_load_opts, opts, 325 + .token_fd = token_fd, 326 + .prog_flags = token_fd ? 
BPF_F_TOKEN_FD : 0, 327 + ); 328 + int prog_fd, link_fd, err; 329 + 330 + prog_fd = bpf_prog_load(BPF_PROG_TYPE_TRACEPOINT, NULL, "GPL", 331 + insns, ARRAY_SIZE(insns), &opts); 332 + if (prog_fd < 0) 333 + return -errno; 334 + 335 + /* use invalid perf_event FD to get EBADF, if link is supported; 336 + * otherwise EINVAL should be returned 337 + */ 338 + link_fd = bpf_link_create(prog_fd, -1, BPF_PERF_EVENT, NULL); 339 + err = -errno; /* close() can clobber errno */ 340 + 341 + if (link_fd >= 0) 342 + close(link_fd); 343 + close(prog_fd); 344 + 345 + return link_fd < 0 && err == -EBADF; 346 + } 347 + 348 + static int probe_uprobe_multi_link(int token_fd) 349 + { 350 + LIBBPF_OPTS(bpf_prog_load_opts, load_opts, 351 + .expected_attach_type = BPF_TRACE_UPROBE_MULTI, 352 + .token_fd = token_fd, 353 + .prog_flags = token_fd ? BPF_F_TOKEN_FD : 0, 354 + ); 355 + LIBBPF_OPTS(bpf_link_create_opts, link_opts); 356 + struct bpf_insn insns[] = { 357 + BPF_MOV64_IMM(BPF_REG_0, 0), 358 + BPF_EXIT_INSN(), 359 + }; 360 + int prog_fd, link_fd, err; 361 + unsigned long offset = 0; 362 + 363 + prog_fd = bpf_prog_load(BPF_PROG_TYPE_KPROBE, NULL, "GPL", 364 + insns, ARRAY_SIZE(insns), &load_opts); 365 + if (prog_fd < 0) 366 + return -errno; 367 + 368 + /* Creating uprobe in '/' binary should fail with -EBADF. 
*/ 369 + link_opts.uprobe_multi.path = "/"; 370 + link_opts.uprobe_multi.offsets = &offset; 371 + link_opts.uprobe_multi.cnt = 1; 372 + 373 + link_fd = bpf_link_create(prog_fd, -1, BPF_TRACE_UPROBE_MULTI, &link_opts); 374 + err = -errno; /* close() can clobber errno */ 375 + 376 + if (link_fd >= 0) 377 + close(link_fd); 378 + close(prog_fd); 379 + 380 + return link_fd < 0 && err == -EBADF; 381 + } 382 + 383 + static int probe_kern_bpf_cookie(int token_fd) 384 + { 385 + struct bpf_insn insns[] = { 386 + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_attach_cookie), 387 + BPF_EXIT_INSN(), 388 + }; 389 + LIBBPF_OPTS(bpf_prog_load_opts, opts, 390 + .token_fd = token_fd, 391 + .prog_flags = token_fd ? BPF_F_TOKEN_FD : 0, 392 + ); 393 + int ret, insn_cnt = ARRAY_SIZE(insns); 394 + 395 + ret = bpf_prog_load(BPF_PROG_TYPE_TRACEPOINT, NULL, "GPL", insns, insn_cnt, &opts); 396 + return probe_fd(ret); 397 + } 398 + 399 + static int probe_kern_btf_enum64(int token_fd) 400 + { 401 + static const char strs[] = "\0enum64"; 402 + __u32 types[] = { 403 + BTF_TYPE_ENC(1, BTF_INFO_ENC(BTF_KIND_ENUM64, 0, 0), 8), 404 + }; 405 + 406 + return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types), 407 + strs, sizeof(strs), token_fd)); 408 + } 409 + 410 + typedef int (*feature_probe_fn)(int /* token_fd */); 411 + 412 + static struct kern_feature_cache feature_cache; 413 + 414 + static struct kern_feature_desc { 415 + const char *desc; 416 + feature_probe_fn probe; 417 + } feature_probes[__FEAT_CNT] = { 418 + [FEAT_PROG_NAME] = { 419 + "BPF program name", probe_kern_prog_name, 420 + }, 421 + [FEAT_GLOBAL_DATA] = { 422 + "global variables", probe_kern_global_data, 423 + }, 424 + [FEAT_BTF] = { 425 + "minimal BTF", probe_kern_btf, 426 + }, 427 + [FEAT_BTF_FUNC] = { 428 + "BTF functions", probe_kern_btf_func, 429 + }, 430 + [FEAT_BTF_GLOBAL_FUNC] = { 431 + "BTF global function", probe_kern_btf_func_global, 432 + }, 433 + [FEAT_BTF_DATASEC] = { 434 + "BTF data section and 
variable", probe_kern_btf_datasec, 435 + }, 436 + [FEAT_ARRAY_MMAP] = { 437 + "ARRAY map mmap()", probe_kern_array_mmap, 438 + }, 439 + [FEAT_EXP_ATTACH_TYPE] = { 440 + "BPF_PROG_LOAD expected_attach_type attribute", 441 + probe_kern_exp_attach_type, 442 + }, 443 + [FEAT_PROBE_READ_KERN] = { 444 + "bpf_probe_read_kernel() helper", probe_kern_probe_read_kernel, 445 + }, 446 + [FEAT_PROG_BIND_MAP] = { 447 + "BPF_PROG_BIND_MAP support", probe_prog_bind_map, 448 + }, 449 + [FEAT_MODULE_BTF] = { 450 + "module BTF support", probe_module_btf, 451 + }, 452 + [FEAT_BTF_FLOAT] = { 453 + "BTF_KIND_FLOAT support", probe_kern_btf_float, 454 + }, 455 + [FEAT_PERF_LINK] = { 456 + "BPF perf link support", probe_perf_link, 457 + }, 458 + [FEAT_BTF_DECL_TAG] = { 459 + "BTF_KIND_DECL_TAG support", probe_kern_btf_decl_tag, 460 + }, 461 + [FEAT_BTF_TYPE_TAG] = { 462 + "BTF_KIND_TYPE_TAG support", probe_kern_btf_type_tag, 463 + }, 464 + [FEAT_MEMCG_ACCOUNT] = { 465 + "memcg-based memory accounting", probe_memcg_account, 466 + }, 467 + [FEAT_BPF_COOKIE] = { 468 + "BPF cookie support", probe_kern_bpf_cookie, 469 + }, 470 + [FEAT_BTF_ENUM64] = { 471 + "BTF_KIND_ENUM64 support", probe_kern_btf_enum64, 472 + }, 473 + [FEAT_SYSCALL_WRAPPER] = { 474 + "Kernel using syscall wrapper", probe_kern_syscall_wrapper, 475 + }, 476 + [FEAT_UPROBE_MULTI_LINK] = { 477 + "BPF multi-uprobe link support", probe_uprobe_multi_link, 478 + }, 479 + }; 480 + 481 + bool feat_supported(struct kern_feature_cache *cache, enum kern_feature_id feat_id) 482 + { 483 + struct kern_feature_desc *feat = &feature_probes[feat_id]; 484 + int ret; 485 + 486 + /* assume global feature cache, unless custom one is provided */ 487 + if (!cache) 488 + cache = &feature_cache; 489 + 490 + if (READ_ONCE(cache->res[feat_id]) == FEAT_UNKNOWN) { 491 + ret = feat->probe(cache->token_fd); 492 + if (ret > 0) { 493 + WRITE_ONCE(cache->res[feat_id], FEAT_SUPPORTED); 494 + } else if (ret == 0) { 495 + WRITE_ONCE(cache->res[feat_id], 
FEAT_MISSING); 496 + } else { 497 + pr_warn("Detection of kernel %s support failed: %d\n", feat->desc, ret); 498 + WRITE_ONCE(cache->res[feat_id], FEAT_MISSING); 499 + } 500 + } 501 + 502 + return READ_ONCE(cache->res[feat_id]) == FEAT_SUPPORTED; 503 + }
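The `feat_supported()` function that closes this file probes a feature lazily, records the result in a per-cache array, and falls back to a global cache when none is supplied. A minimal single-threaded sketch of that pattern follows. All names ending in `_sketch`, the two dummy probes, and the call counter are hypothetical, and the `READ_ONCE()`/`WRITE_ONCE()` accessors and warning output of the real code are left out.

```c
#include <assert.h>
#include <stddef.h>

enum feat_res_sketch { RES_UNKNOWN = 0, RES_SUPPORTED, RES_MISSING };
enum feat_id_sketch { FEAT_A_SKETCH, FEAT_B_SKETCH, FEAT_CNT_SKETCH };

/* Per-object cache: one result slot per feature plus the token FD
 * that probes should use.
 */
struct feat_cache_sketch {
	enum feat_res_sketch res[FEAT_CNT_SKETCH];
	int token_fd;
};

typedef int (*probe_fn_sketch)(int token_fd);

int probe_calls; /* counts probe invocations, for illustration only */

/* Dummy probe: pretends the feature is present. */
int probe_a_sketch(int token_fd) { (void)token_fd; probe_calls++; return 1; }
/* Dummy probe: pretends detection itself failed. */
int probe_b_sketch(int token_fd) { (void)token_fd; probe_calls++; return -1; }

probe_fn_sketch probes_sketch[FEAT_CNT_SKETCH] = {
	[FEAT_A_SKETCH] = probe_a_sketch,
	[FEAT_B_SKETCH] = probe_b_sketch,
};

struct feat_cache_sketch global_cache_sketch;

/* Probe on first use, then answer from the cache. A probe error is
 * recorded as "missing", mirroring the else-branch in the diff.
 */
int feat_supported_sketch(struct feat_cache_sketch *cache, enum feat_id_sketch id)
{
	assert(id < FEAT_CNT_SKETCH);

	/* assume the global cache unless a custom one is provided */
	if (!cache)
		cache = &global_cache_sketch;

	if (cache->res[id] == RES_UNKNOWN) {
		int ret = probes_sketch[id](cache->token_fd);

		cache->res[id] = ret > 0 ? RES_SUPPORTED : RES_MISSING;
	}
	return cache->res[id] == RES_SUPPORTED;
}
```

Keeping a separate cache per object lets probes run with that object's token FD, so results obtained with one token do not leak into objects loaded with a different token or with none.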

+139 -465

tools/lib/bpf/libbpf.c

··· 59 59 #define BPF_FS_MAGIC 0xcafe4a11 60 60 #endif 61 61 62 + #define BPF_FS_DEFAULT_PATH "/sys/fs/bpf" 63 + 62 64 #define BPF_INSN_SZ (sizeof(struct bpf_insn)) 63 65 64 66 /* vsprintf() in __base_pr() uses nonliteral format string. It may break ··· 72 70 73 71 static struct bpf_map *bpf_object__add_map(struct bpf_object *obj); 74 72 static bool prog_is_subprog(const struct bpf_object *obj, const struct bpf_program *prog); 73 + static int map_set_def_max_entries(struct bpf_map *map); 75 74 76 75 static const char * const attach_type_name[] = { 77 76 [BPF_CGROUP_INET_INGRESS] = "cgroup_inet_ingress", ··· 530 527 struct bpf_map_def def; 531 528 __u32 numa_node; 532 529 __u32 btf_var_idx; 530 + int mod_btf_fd; 533 531 __u32 btf_key_type_id; 534 532 __u32 btf_value_type_id; 535 533 __u32 btf_vmlinux_value_type_id; ··· 696 692 size_t fd_array_cnt; 697 693 698 694 struct usdt_manager *usdt_man; 695 + 696 + struct kern_feature_cache *feat_cache; 697 + char *token_path; 698 + int token_fd; 699 699 700 700 char path[]; 701 701 }; ··· 938 930 return NULL; 939 931 } 940 932 933 + static int find_ksym_btf_id(struct bpf_object *obj, const char *ksym_name, 934 + __u16 kind, struct btf **res_btf, 935 + struct module_btf **res_mod_btf); 936 + 941 937 #define STRUCT_OPS_VALUE_PREFIX "bpf_struct_ops_" 942 938 static int find_btf_by_prefix_kind(const struct btf *btf, const char *prefix, 943 939 const char *name, __u32 kind); 944 940 945 941 static int 946 - find_struct_ops_kern_types(const struct btf *btf, const char *tname, 942 + find_struct_ops_kern_types(struct bpf_object *obj, const char *tname, 943 + struct module_btf **mod_btf, 947 944 const struct btf_type **type, __u32 *type_id, 948 945 const struct btf_type **vtype, __u32 *vtype_id, 949 946 const struct btf_member **data_member) 950 947 { 951 948 const struct btf_type *kern_type, *kern_vtype; 952 949 const struct btf_member *kern_data_member; 950 + struct btf *btf; 953 951 __s32 kern_vtype_id, kern_type_id; 954 952 __u32 
i; 955 953 956 - kern_type_id = btf__find_by_name_kind(btf, tname, BTF_KIND_STRUCT); 954 + kern_type_id = find_ksym_btf_id(obj, tname, BTF_KIND_STRUCT, 955 + &btf, mod_btf); 957 956 if (kern_type_id < 0) { 958 957 pr_warn("struct_ops init_kern: struct %s is not found in kernel BTF\n", 959 958 tname); ··· 1014 999 } 1015 1000 1016 1001 /* Init the map's fields that depend on kern_btf */ 1017 - static int bpf_map__init_kern_struct_ops(struct bpf_map *map, 1018 - const struct btf *btf, 1019 - const struct btf *kern_btf) 1002 + static int bpf_map__init_kern_struct_ops(struct bpf_map *map) 1020 1003 { 1021 1004 const struct btf_member *member, *kern_member, *kern_data_member; 1022 1005 const struct btf_type *type, *kern_type, *kern_vtype; 1023 1006 __u32 i, kern_type_id, kern_vtype_id, kern_data_off; 1007 + struct bpf_object *obj = map->obj; 1008 + const struct btf *btf = obj->btf; 1024 1009 struct bpf_struct_ops *st_ops; 1010 + const struct btf *kern_btf; 1011 + struct module_btf *mod_btf; 1025 1012 void *data, *kern_data; 1026 1013 const char *tname; 1027 1014 int err; ··· 1031 1014 st_ops = map->st_ops; 1032 1015 type = st_ops->type; 1033 1016 tname = st_ops->tname; 1034 - err = find_struct_ops_kern_types(kern_btf, tname, 1017 + err = find_struct_ops_kern_types(obj, tname, &mod_btf, 1035 1018 &kern_type, &kern_type_id, 1036 1019 &kern_vtype, &kern_vtype_id, 1037 1020 &kern_data_member); 1038 1021 if (err) 1039 1022 return err; 1040 1023 1024 + kern_btf = mod_btf ? mod_btf->btf : obj->btf_vmlinux; 1025 + 1041 1026 pr_debug("struct_ops init_kern %s: type_id:%u kern_type_id:%u kern_vtype_id:%u\n", 1042 1027 map->name, st_ops->type_id, kern_type_id, kern_vtype_id); 1043 1028 1029 + map->mod_btf_fd = mod_btf ? 
mod_btf->fd : -1; 1044 1030 map->def.value_size = kern_vtype->size; 1045 1031 map->btf_vmlinux_value_type_id = kern_vtype_id; 1046 1032 ··· 1119 1099 return -ENOTSUP; 1120 1100 } 1121 1101 1102 + if (mod_btf) 1103 + prog->attach_btf_obj_fd = mod_btf->fd; 1122 1104 prog->attach_btf_id = kern_type_id; 1123 1105 prog->expected_attach_type = kern_member_idx; 1124 1106 ··· 1163 1141 if (!bpf_map__is_struct_ops(map)) 1164 1142 continue; 1165 1143 1166 - err = bpf_map__init_kern_struct_ops(map, obj->btf, 1167 - obj->btf_vmlinux); 1144 + err = bpf_map__init_kern_struct_ops(map); 1168 1145 if (err) 1169 1146 return err; 1170 1147 } ··· 2237 2216 int err; 2238 2217 2239 2218 if (!path) 2240 - path = "/sys/fs/bpf"; 2219 + path = BPF_FS_DEFAULT_PATH; 2241 2220 2242 2221 err = pathname_concat(buf, sizeof(buf), path, bpf_map__name(map)); 2243 2222 if (err) ··· 3246 3225 } else { 3247 3226 /* currently BPF_BTF_LOAD only supports log_level 1 */ 3248 3227 err = btf_load_into_kernel(kern_btf, obj->log_buf, obj->log_size, 3249 - obj->log_level ? 1 : 0); 3228 + obj->log_level ? 1 : 0, obj->token_fd); 3250 3229 } 3251 3230 if (sanitize) { 3252 3231 if (!err) { ··· 4567 4546 return 0; 4568 4547 } 4569 4548 4549 + static int bpf_object_prepare_token(struct bpf_object *obj) 4550 + { 4551 + const char *bpffs_path; 4552 + int bpffs_fd = -1, token_fd, err; 4553 + bool mandatory; 4554 + enum libbpf_print_level level; 4555 + 4556 + /* token is explicitly prevented */ 4557 + if (obj->token_path && obj->token_path[0] == '\0') { 4558 + pr_debug("object '%s': token is prevented, skipping...\n", obj->name); 4559 + return 0; 4560 + } 4561 + 4562 + mandatory = obj->token_path != NULL; 4563 + level = mandatory ? 
LIBBPF_WARN : LIBBPF_DEBUG; 4564 + 4565 + bpffs_path = obj->token_path ?: BPF_FS_DEFAULT_PATH; 4566 + bpffs_fd = open(bpffs_path, O_DIRECTORY, O_RDWR); 4567 + if (bpffs_fd < 0) { 4568 + err = -errno; 4569 + __pr(level, "object '%s': failed (%d) to open BPF FS mount at '%s'%s\n", 4570 + obj->name, err, bpffs_path, 4571 + mandatory ? "" : ", skipping optional step..."); 4572 + return mandatory ? err : 0; 4573 + } 4574 + 4575 + token_fd = bpf_token_create(bpffs_fd, 0); 4576 + close(bpffs_fd); 4577 + if (token_fd < 0) { 4578 + if (!mandatory && token_fd == -ENOENT) { 4579 + pr_debug("object '%s': BPF FS at '%s' doesn't have BPF token delegation set up, skipping...\n", 4580 + obj->name, bpffs_path); 4581 + return 0; 4582 + } 4583 + __pr(level, "object '%s': failed (%d) to create BPF token from '%s'%s\n", 4584 + obj->name, token_fd, bpffs_path, 4585 + mandatory ? "" : ", skipping optional step..."); 4586 + return mandatory ? token_fd : 0; 4587 + } 4588 + 4589 + obj->feat_cache = calloc(1, sizeof(*obj->feat_cache)); 4590 + if (!obj->feat_cache) { 4591 + close(token_fd); 4592 + return -ENOMEM; 4593 + } 4594 + 4595 + obj->token_fd = token_fd; 4596 + obj->feat_cache->token_fd = token_fd; 4597 + 4598 + return 0; 4599 + } 4600 + 4570 4601 static int 4571 4602 bpf_object__probe_loading(struct bpf_object *obj) 4572 4603 { ··· 4628 4555 BPF_EXIT_INSN(), 4629 4556 }; 4630 4557 int ret, insn_cnt = ARRAY_SIZE(insns); 4558 + LIBBPF_OPTS(bpf_prog_load_opts, opts, 4559 + .token_fd = obj->token_fd, 4560 + .prog_flags = obj->token_fd ? 
BPF_F_TOKEN_FD : 0, 4561 + ); 4631 4562 4632 4563 if (obj->gen_loader) 4633 4564 return 0; ··· 4641 4564 pr_warn("Failed to bump RLIMIT_MEMLOCK (err = %d), you might need to do it explicitly!\n", ret); 4642 4565 4643 4566 /* make sure basic loading works */ 4644 - ret = bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER, NULL, "GPL", insns, insn_cnt, NULL); 4567 + ret = bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER, NULL, "GPL", insns, insn_cnt, &opts); 4645 4568 if (ret < 0) 4646 - ret = bpf_prog_load(BPF_PROG_TYPE_TRACEPOINT, NULL, "GPL", insns, insn_cnt, NULL); 4569 + ret = bpf_prog_load(BPF_PROG_TYPE_TRACEPOINT, NULL, "GPL", insns, insn_cnt, &opts); 4647 4570 if (ret < 0) { 4648 4571 ret = errno; 4649 4572 cp = libbpf_strerror_r(ret, errmsg, sizeof(errmsg)); ··· 4658 4581 return 0; 4659 4582 } 4660 4583 4661 - static int probe_fd(int fd) 4662 - { 4663 - if (fd >= 0) 4664 - close(fd); 4665 - return fd >= 0; 4666 - } 4667 - 4668 - static int probe_kern_prog_name(void) 4669 - { 4670 - const size_t attr_sz = offsetofend(union bpf_attr, prog_name); 4671 - struct bpf_insn insns[] = { 4672 - BPF_MOV64_IMM(BPF_REG_0, 0), 4673 - BPF_EXIT_INSN(), 4674 - }; 4675 - union bpf_attr attr; 4676 - int ret; 4677 - 4678 - memset(&attr, 0, attr_sz); 4679 - attr.prog_type = BPF_PROG_TYPE_SOCKET_FILTER; 4680 - attr.license = ptr_to_u64("GPL"); 4681 - attr.insns = ptr_to_u64(insns); 4682 - attr.insn_cnt = (__u32)ARRAY_SIZE(insns); 4683 - libbpf_strlcpy(attr.prog_name, "libbpf_nametest", sizeof(attr.prog_name)); 4684 - 4685 - /* make sure loading with name works */ 4686 - ret = sys_bpf_prog_load(&attr, attr_sz, PROG_LOAD_ATTEMPTS); 4687 - return probe_fd(ret); 4688 - } 4689 - 4690 - static int probe_kern_global_data(void) 4691 - { 4692 - char *cp, errmsg[STRERR_BUFSIZE]; 4693 - struct bpf_insn insns[] = { 4694 - BPF_LD_MAP_VALUE(BPF_REG_1, 0, 16), 4695 - BPF_ST_MEM(BPF_DW, BPF_REG_1, 0, 42), 4696 - BPF_MOV64_IMM(BPF_REG_0, 0), 4697 - BPF_EXIT_INSN(), 4698 - }; 4699 - int ret, map, insn_cnt = 
ARRAY_SIZE(insns); 4700 - 4701 - map = bpf_map_create(BPF_MAP_TYPE_ARRAY, "libbpf_global", sizeof(int), 32, 1, NULL); 4702 - if (map < 0) { 4703 - ret = -errno; 4704 - cp = libbpf_strerror_r(ret, errmsg, sizeof(errmsg)); 4705 - pr_warn("Error in %s():%s(%d). Couldn't create simple array map.\n", 4706 - __func__, cp, -ret); 4707 - return ret; 4708 - } 4709 - 4710 - insns[0].imm = map; 4711 - 4712 - ret = bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER, NULL, "GPL", insns, insn_cnt, NULL); 4713 - close(map); 4714 - return probe_fd(ret); 4715 - } 4716 - 4717 - static int probe_kern_btf(void) 4718 - { 4719 - static const char strs[] = "\0int"; 4720 - __u32 types[] = { 4721 - /* int */ 4722 - BTF_TYPE_INT_ENC(1, BTF_INT_SIGNED, 0, 32, 4), 4723 - }; 4724 - 4725 - return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types), 4726 - strs, sizeof(strs))); 4727 - } 4728 - 4729 - static int probe_kern_btf_func(void) 4730 - { 4731 - static const char strs[] = "\0int\0x\0a"; 4732 - /* void x(int a) {} */ 4733 - __u32 types[] = { 4734 - /* int */ 4735 - BTF_TYPE_INT_ENC(1, BTF_INT_SIGNED, 0, 32, 4), /* [1] */ 4736 - /* FUNC_PROTO */ /* [2] */ 4737 - BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_FUNC_PROTO, 0, 1), 0), 4738 - BTF_PARAM_ENC(7, 1), 4739 - /* FUNC x */ /* [3] */ 4740 - BTF_TYPE_ENC(5, BTF_INFO_ENC(BTF_KIND_FUNC, 0, 0), 2), 4741 - }; 4742 - 4743 - return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types), 4744 - strs, sizeof(strs))); 4745 - } 4746 - 4747 - static int probe_kern_btf_func_global(void) 4748 - { 4749 - static const char strs[] = "\0int\0x\0a"; 4750 - /* static void x(int a) {} */ 4751 - __u32 types[] = { 4752 - /* int */ 4753 - BTF_TYPE_INT_ENC(1, BTF_INT_SIGNED, 0, 32, 4), /* [1] */ 4754 - /* FUNC_PROTO */ /* [2] */ 4755 - BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_FUNC_PROTO, 0, 1), 0), 4756 - BTF_PARAM_ENC(7, 1), 4757 - /* FUNC x BTF_FUNC_GLOBAL */ /* [3] */ 4758 - BTF_TYPE_ENC(5, BTF_INFO_ENC(BTF_KIND_FUNC, 0, BTF_FUNC_GLOBAL), 2), 4759 - }; 4760 - 4761 - 
return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types), 4762 - strs, sizeof(strs))); 4763 - } 4764 - 4765 - static int probe_kern_btf_datasec(void) 4766 - { 4767 - static const char strs[] = "\0x\0.data"; 4768 - /* static int a; */ 4769 - __u32 types[] = { 4770 - /* int */ 4771 - BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4), /* [1] */ 4772 - /* VAR x */ /* [2] */ 4773 - BTF_TYPE_ENC(1, BTF_INFO_ENC(BTF_KIND_VAR, 0, 0), 1), 4774 - BTF_VAR_STATIC, 4775 - /* DATASEC val */ /* [3] */ 4776 - BTF_TYPE_ENC(3, BTF_INFO_ENC(BTF_KIND_DATASEC, 0, 1), 4), 4777 - BTF_VAR_SECINFO_ENC(2, 0, 4), 4778 - }; 4779 - 4780 - return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types), 4781 - strs, sizeof(strs))); 4782 - } 4783 - 4784 - static int probe_kern_btf_float(void) 4785 - { 4786 - static const char strs[] = "\0float"; 4787 - __u32 types[] = { 4788 - /* float */ 4789 - BTF_TYPE_FLOAT_ENC(1, 4), 4790 - }; 4791 - 4792 - return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types), 4793 - strs, sizeof(strs))); 4794 - } 4795 - 4796 - static int probe_kern_btf_decl_tag(void) 4797 - { 4798 - static const char strs[] = "\0tag"; 4799 - __u32 types[] = { 4800 - /* int */ 4801 - BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4), /* [1] */ 4802 - /* VAR x */ /* [2] */ 4803 - BTF_TYPE_ENC(1, BTF_INFO_ENC(BTF_KIND_VAR, 0, 0), 1), 4804 - BTF_VAR_STATIC, 4805 - /* attr */ 4806 - BTF_TYPE_DECL_TAG_ENC(1, 2, -1), 4807 - }; 4808 - 4809 - return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types), 4810 - strs, sizeof(strs))); 4811 - } 4812 - 4813 - static int probe_kern_btf_type_tag(void) 4814 - { 4815 - static const char strs[] = "\0tag"; 4816 - __u32 types[] = { 4817 - /* int */ 4818 - BTF_TYPE_INT_ENC(0, BTF_INT_SIGNED, 0, 32, 4), /* [1] */ 4819 - /* attr */ 4820 - BTF_TYPE_TYPE_TAG_ENC(1, 1), /* [2] */ 4821 - /* ptr */ 4822 - BTF_TYPE_ENC(0, BTF_INFO_ENC(BTF_KIND_PTR, 0, 0), 2), /* [3] */ 4823 - }; 4824 - 4825 - return probe_fd(libbpf__load_raw_btf((char *)types, 
sizeof(types), 4826 - strs, sizeof(strs))); 4827 - } 4828 - 4829 - static int probe_kern_array_mmap(void) 4830 - { 4831 - LIBBPF_OPTS(bpf_map_create_opts, opts, .map_flags = BPF_F_MMAPABLE); 4832 - int fd; 4833 - 4834 - fd = bpf_map_create(BPF_MAP_TYPE_ARRAY, "libbpf_mmap", sizeof(int), sizeof(int), 1, &opts); 4835 - return probe_fd(fd); 4836 - } 4837 - 4838 - static int probe_kern_exp_attach_type(void) 4839 - { 4840 - LIBBPF_OPTS(bpf_prog_load_opts, opts, .expected_attach_type = BPF_CGROUP_INET_SOCK_CREATE); 4841 - struct bpf_insn insns[] = { 4842 - BPF_MOV64_IMM(BPF_REG_0, 0), 4843 - BPF_EXIT_INSN(), 4844 - }; 4845 - int fd, insn_cnt = ARRAY_SIZE(insns); 4846 - 4847 - /* use any valid combination of program type and (optional) 4848 - * non-zero expected attach type (i.e., not a BPF_CGROUP_INET_INGRESS) 4849 - * to see if kernel supports expected_attach_type field for 4850 - * BPF_PROG_LOAD command 4851 - */ 4852 - fd = bpf_prog_load(BPF_PROG_TYPE_CGROUP_SOCK, NULL, "GPL", insns, insn_cnt, &opts); 4853 - return probe_fd(fd); 4854 - } 4855 - 4856 - static int probe_kern_probe_read_kernel(void) 4857 - { 4858 - struct bpf_insn insns[] = { 4859 - BPF_MOV64_REG(BPF_REG_1, BPF_REG_10), /* r1 = r10 (fp) */ 4860 - BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -8), /* r1 += -8 */ 4861 - BPF_MOV64_IMM(BPF_REG_2, 8), /* r2 = 8 */ 4862 - BPF_MOV64_IMM(BPF_REG_3, 0), /* r3 = 0 */ 4863 - BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_probe_read_kernel), 4864 - BPF_EXIT_INSN(), 4865 - }; 4866 - int fd, insn_cnt = ARRAY_SIZE(insns); 4867 - 4868 - fd = bpf_prog_load(BPF_PROG_TYPE_TRACEPOINT, NULL, "GPL", insns, insn_cnt, NULL); 4869 - return probe_fd(fd); 4870 - } 4871 - 4872 - static int probe_prog_bind_map(void) 4873 - { 4874 - char *cp, errmsg[STRERR_BUFSIZE]; 4875 - struct bpf_insn insns[] = { 4876 - BPF_MOV64_IMM(BPF_REG_0, 0), 4877 - BPF_EXIT_INSN(), 4878 - }; 4879 - int ret, map, prog, insn_cnt = ARRAY_SIZE(insns); 4880 - 4881 - map = bpf_map_create(BPF_MAP_TYPE_ARRAY, 
"libbpf_det_bind", sizeof(int), 32, 1, NULL); 4882 - if (map < 0) { 4883 - ret = -errno; 4884 - cp = libbpf_strerror_r(ret, errmsg, sizeof(errmsg)); 4885 - pr_warn("Error in %s():%s(%d). Couldn't create simple array map.\n", 4886 - __func__, cp, -ret); 4887 - return ret; 4888 - } 4889 - 4890 - prog = bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER, NULL, "GPL", insns, insn_cnt, NULL); 4891 - if (prog < 0) { 4892 - close(map); 4893 - return 0; 4894 - } 4895 - 4896 - ret = bpf_prog_bind_map(prog, map, NULL); 4897 - 4898 - close(map); 4899 - close(prog); 4900 - 4901 - return ret >= 0; 4902 - } 4903 - 4904 - static int probe_module_btf(void) 4905 - { 4906 - static const char strs[] = "\0int"; 4907 - __u32 types[] = { 4908 - /* int */ 4909 - BTF_TYPE_INT_ENC(1, BTF_INT_SIGNED, 0, 32, 4), 4910 - }; 4911 - struct bpf_btf_info info; 4912 - __u32 len = sizeof(info); 4913 - char name[16]; 4914 - int fd, err; 4915 - 4916 - fd = libbpf__load_raw_btf((char *)types, sizeof(types), strs, sizeof(strs)); 4917 - if (fd < 0) 4918 - return 0; /* BTF not supported at all */ 4919 - 4920 - memset(&info, 0, sizeof(info)); 4921 - info.name = ptr_to_u64(name); 4922 - info.name_len = sizeof(name); 4923 - 4924 - /* check that BPF_OBJ_GET_INFO_BY_FD supports specifying name pointer; 4925 - * kernel's module BTF support coincides with support for 4926 - * name/name_len fields in struct bpf_btf_info. 
4927 - */ 4928 - err = bpf_btf_get_info_by_fd(fd, &info, &len); 4929 - close(fd); 4930 - return !err; 4931 - } 4932 - 4933 - static int probe_perf_link(void) 4934 - { 4935 - struct bpf_insn insns[] = { 4936 - BPF_MOV64_IMM(BPF_REG_0, 0), 4937 - BPF_EXIT_INSN(), 4938 - }; 4939 - int prog_fd, link_fd, err; 4940 - 4941 - prog_fd = bpf_prog_load(BPF_PROG_TYPE_TRACEPOINT, NULL, "GPL", 4942 - insns, ARRAY_SIZE(insns), NULL); 4943 - if (prog_fd < 0) 4944 - return -errno; 4945 - 4946 - /* use invalid perf_event FD to get EBADF, if link is supported; 4947 - * otherwise EINVAL should be returned 4948 - */ 4949 - link_fd = bpf_link_create(prog_fd, -1, BPF_PERF_EVENT, NULL); 4950 - err = -errno; /* close() can clobber errno */ 4951 - 4952 - if (link_fd >= 0) 4953 - close(link_fd); 4954 - close(prog_fd); 4955 - 4956 - return link_fd < 0 && err == -EBADF; 4957 - } 4958 - 4959 - static int probe_uprobe_multi_link(void) 4960 - { 4961 - LIBBPF_OPTS(bpf_prog_load_opts, load_opts, 4962 - .expected_attach_type = BPF_TRACE_UPROBE_MULTI, 4963 - ); 4964 - LIBBPF_OPTS(bpf_link_create_opts, link_opts); 4965 - struct bpf_insn insns[] = { 4966 - BPF_MOV64_IMM(BPF_REG_0, 0), 4967 - BPF_EXIT_INSN(), 4968 - }; 4969 - int prog_fd, link_fd, err; 4970 - unsigned long offset = 0; 4971 - 4972 - prog_fd = bpf_prog_load(BPF_PROG_TYPE_KPROBE, NULL, "GPL", 4973 - insns, ARRAY_SIZE(insns), &load_opts); 4974 - if (prog_fd < 0) 4975 - return -errno; 4976 - 4977 - /* Creating uprobe in '/' binary should fail with -EBADF. 
*/ 4978 - link_opts.uprobe_multi.path = "/"; 4979 - link_opts.uprobe_multi.offsets = &offset; 4980 - link_opts.uprobe_multi.cnt = 1; 4981 - 4982 - link_fd = bpf_link_create(prog_fd, -1, BPF_TRACE_UPROBE_MULTI, &link_opts); 4983 - err = -errno; /* close() can clobber errno */ 4984 - 4985 - if (link_fd >= 0) 4986 - close(link_fd); 4987 - close(prog_fd); 4988 - 4989 - return link_fd < 0 && err == -EBADF; 4990 - } 4991 - 4992 - static int probe_kern_bpf_cookie(void) 4993 - { 4994 - struct bpf_insn insns[] = { 4995 - BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_attach_cookie), 4996 - BPF_EXIT_INSN(), 4997 - }; 4998 - int ret, insn_cnt = ARRAY_SIZE(insns); 4999 - 5000 - ret = bpf_prog_load(BPF_PROG_TYPE_KPROBE, NULL, "GPL", insns, insn_cnt, NULL); 5001 - return probe_fd(ret); 5002 - } 5003 - 5004 - static int probe_kern_btf_enum64(void) 5005 - { 5006 - static const char strs[] = "\0enum64"; 5007 - __u32 types[] = { 5008 - BTF_TYPE_ENC(1, BTF_INFO_ENC(BTF_KIND_ENUM64, 0, 0), 8), 5009 - }; 5010 - 5011 - return probe_fd(libbpf__load_raw_btf((char *)types, sizeof(types), 5012 - strs, sizeof(strs))); 5013 - } 5014 - 5015 - static int probe_kern_syscall_wrapper(void); 5016 - 5017 - enum kern_feature_result { 5018 - FEAT_UNKNOWN = 0, 5019 - FEAT_SUPPORTED = 1, 5020 - FEAT_MISSING = 2, 5021 - }; 5022 - 5023 - typedef int (*feature_probe_fn)(void); 5024 - 5025 - static struct kern_feature_desc { 5026 - const char *desc; 5027 - feature_probe_fn probe; 5028 - enum kern_feature_result res; 5029 - } feature_probes[__FEAT_CNT] = { 5030 - [FEAT_PROG_NAME] = { 5031 - "BPF program name", probe_kern_prog_name, 5032 - }, 5033 - [FEAT_GLOBAL_DATA] = { 5034 - "global variables", probe_kern_global_data, 5035 - }, 5036 - [FEAT_BTF] = { 5037 - "minimal BTF", probe_kern_btf, 5038 - }, 5039 - [FEAT_BTF_FUNC] = { 5040 - "BTF functions", probe_kern_btf_func, 5041 - }, 5042 - [FEAT_BTF_GLOBAL_FUNC] = { 5043 - "BTF global function", probe_kern_btf_func_global, 5044 - }, 5045 - 
[FEAT_BTF_DATASEC] = { 5046 - "BTF data section and variable", probe_kern_btf_datasec, 5047 - }, 5048 - [FEAT_ARRAY_MMAP] = { 5049 - "ARRAY map mmap()", probe_kern_array_mmap, 5050 - }, 5051 - [FEAT_EXP_ATTACH_TYPE] = { 5052 - "BPF_PROG_LOAD expected_attach_type attribute", 5053 - probe_kern_exp_attach_type, 5054 - }, 5055 - [FEAT_PROBE_READ_KERN] = { 5056 - "bpf_probe_read_kernel() helper", probe_kern_probe_read_kernel, 5057 - }, 5058 - [FEAT_PROG_BIND_MAP] = { 5059 - "BPF_PROG_BIND_MAP support", probe_prog_bind_map, 5060 - }, 5061 - [FEAT_MODULE_BTF] = { 5062 - "module BTF support", probe_module_btf, 5063 - }, 5064 - [FEAT_BTF_FLOAT] = { 5065 - "BTF_KIND_FLOAT support", probe_kern_btf_float, 5066 - }, 5067 - [FEAT_PERF_LINK] = { 5068 - "BPF perf link support", probe_perf_link, 5069 - }, 5070 - [FEAT_BTF_DECL_TAG] = { 5071 - "BTF_KIND_DECL_TAG support", probe_kern_btf_decl_tag, 5072 - }, 5073 - [FEAT_BTF_TYPE_TAG] = { 5074 - "BTF_KIND_TYPE_TAG support", probe_kern_btf_type_tag, 5075 - }, 5076 - [FEAT_MEMCG_ACCOUNT] = { 5077 - "memcg-based memory accounting", probe_memcg_account, 5078 - }, 5079 - [FEAT_BPF_COOKIE] = { 5080 - "BPF cookie support", probe_kern_bpf_cookie, 5081 - }, 5082 - [FEAT_BTF_ENUM64] = { 5083 - "BTF_KIND_ENUM64 support", probe_kern_btf_enum64, 5084 - }, 5085 - [FEAT_SYSCALL_WRAPPER] = { 5086 - "Kernel using syscall wrapper", probe_kern_syscall_wrapper, 5087 - }, 5088 - [FEAT_UPROBE_MULTI_LINK] = { 5089 - "BPF multi-uprobe link support", probe_uprobe_multi_link, 5090 - }, 5091 - }; 5092 - 5093 4584 bool kernel_supports(const struct bpf_object *obj, enum kern_feature_id feat_id) 5094 4585 { 5095 - struct kern_feature_desc *feat = &feature_probes[feat_id]; 5096 - int ret; 5097 - 5098 4586 if (obj && obj->gen_loader) 5099 4587 /* To generate loader program assume the latest kernel 5100 4588 * to avoid doing extra prog_load, map_create syscalls. 
5101 4589 */ 5102 4590 return true; 5103 4591 5104 - if (READ_ONCE(feat->res) == FEAT_UNKNOWN) { 5105 - ret = feat->probe(); 5106 - if (ret > 0) { 5107 - WRITE_ONCE(feat->res, FEAT_SUPPORTED); 5108 - } else if (ret == 0) { 5109 - WRITE_ONCE(feat->res, FEAT_MISSING); 5110 - } else { 5111 - pr_warn("Detection of kernel %s support failed: %d\n", feat->desc, ret); 5112 - WRITE_ONCE(feat->res, FEAT_MISSING); 5113 - } 5114 - } 4592 + if (obj->token_fd) 4593 + return feat_supported(obj->feat_cache, feat_id); 5115 4594 5116 - return READ_ONCE(feat->res) == FEAT_SUPPORTED; 4595 + return feat_supported(NULL, feat_id); 5117 4596 } 5118 4597 5119 4598 static bool map_is_reuse_compat(const struct bpf_map *map, int map_fd) ··· 4793 5160 create_attr.map_flags = def->map_flags; 4794 5161 create_attr.numa_node = map->numa_node; 4795 5162 create_attr.map_extra = map->map_extra; 5163 + create_attr.token_fd = obj->token_fd; 5164 + if (obj->token_fd) 5165 + create_attr.map_flags |= BPF_F_TOKEN_FD; 4796 5166 4797 - if (bpf_map__is_struct_ops(map)) 5167 + if (bpf_map__is_struct_ops(map)) { 4798 5168 create_attr.btf_vmlinux_value_type_id = map->btf_vmlinux_value_type_id; 5169 + if (map->mod_btf_fd >= 0) { 5170 + create_attr.value_type_btf_obj_fd = map->mod_btf_fd; 5171 + create_attr.map_flags |= BPF_F_VTYPE_BTF_OBJ_FD; 5172 + } 5173 + } 4799 5174 4800 5175 if (obj->btf && btf__fd(obj->btf) >= 0) { 4801 5176 create_attr.btf_fd = btf__fd(obj->btf); ··· 4813 5172 4814 5173 if (bpf_map_type__is_map_in_map(def->type)) { 4815 5174 if (map->inner_map) { 5175 + err = map_set_def_max_entries(map->inner_map); 5176 + if (err) 5177 + return err; 4816 5178 err = bpf_object__create_map(obj, map->inner_map, true); 4817 5179 if (err) { 4818 5180 pr_warn("map '%s': failed to create inner map: %d\n", ··· 6508 6864 if (cached_result >= 0) 6509 6865 return cached_result; 6510 6866 6511 - btf_fd = libbpf__load_raw_btf((char *)types, sizeof(types), strs, sizeof(strs)); 6867 + btf_fd = 
libbpf__load_raw_btf((char *)types, sizeof(types), strs, sizeof(strs), 0); 6512 6868 if (btf_fd < 0) 6513 6869 return 0; 6514 6870 ··· 7117 7473 load_attr.prog_flags = prog->prog_flags; 7118 7474 load_attr.fd_array = obj->fd_array; 7119 7475 7476 + load_attr.token_fd = obj->token_fd; 7477 + if (obj->token_fd) 7478 + load_attr.prog_flags |= BPF_F_TOKEN_FD; 7479 + 7120 7480 /* adjust load_attr if sec_def provides custom preload callback */ 7121 7481 if (prog->sec_def && prog->sec_def->prog_prepare_load_fn) { 7122 7482 err = prog->sec_def->prog_prepare_load_fn(prog, &load_attr, prog->sec_def->cookie); ··· 7566 7918 static struct bpf_object *bpf_object_open(const char *path, const void *obj_buf, size_t obj_buf_sz, 7567 7919 const struct bpf_object_open_opts *opts) 7568 7920 { 7569 - const char *obj_name, *kconfig, *btf_tmp_path; 7921 + const char *obj_name, *kconfig, *btf_tmp_path, *token_path; 7570 7922 struct bpf_object *obj; 7571 7923 char tmp_name[64]; 7572 7924 int err; ··· 7603 7955 if (log_size && !log_buf) 7604 7956 return ERR_PTR(-EINVAL); 7605 7957 7958 + token_path = OPTS_GET(opts, bpf_token_path, NULL); 7959 + /* if user didn't specify bpf_token_path explicitly, check if 7960 + * LIBBPF_BPF_TOKEN_PATH envvar was set and treat it as bpf_token_path 7961 + * option 7962 + */ 7963 + if (!token_path) 7964 + token_path = getenv("LIBBPF_BPF_TOKEN_PATH"); 7965 + if (token_path && strlen(token_path) >= PATH_MAX) 7966 + return ERR_PTR(-ENAMETOOLONG); 7967 + 7606 7968 obj = bpf_object__new(path, obj_buf, obj_buf_sz, obj_name); 7607 7969 if (IS_ERR(obj)) 7608 7970 return obj; ··· 7620 7962 obj->log_buf = log_buf; 7621 7963 obj->log_size = log_size; 7622 7964 obj->log_level = log_level; 7965 + 7966 + if (token_path) { 7967 + obj->token_path = strdup(token_path); 7968 + if (!obj->token_path) { 7969 + err = -ENOMEM; 7970 + goto out; 7971 + } 7972 + } 7623 7973 7624 7974 btf_tmp_path = OPTS_GET(opts, btf_custom_path, NULL); 7625 7975 if (btf_tmp_path) { ··· 8139 8473 if 
(obj->gen_loader) 8140 8474 bpf_gen__init(obj->gen_loader, extra_log_level, obj->nr_programs, obj->nr_maps); 8141 8475 8142 - err = bpf_object__probe_loading(obj); 8476 + err = bpf_object_prepare_token(obj); 8477 + err = err ? : bpf_object__probe_loading(obj); 8143 8478 err = err ? : bpf_object__load_vmlinux_btf(obj, false); 8144 8479 err = err ? : bpf_object__resolve_externs(obj, obj->kconfig); 8145 8480 err = err ? : bpf_object__sanitize_maps(obj); ··· 8674 9007 bpf_program__exit(&obj->programs[i]); 8675 9008 } 8676 9009 zfree(&obj->programs); 9010 + 9011 + zfree(&obj->feat_cache); 9012 + zfree(&obj->token_path); 9013 + if (obj->token_fd > 0) 9014 + close(obj->token_fd); 8677 9015 8678 9016 free(obj); 8679 9017 } ··· 9638 9966 *btf_obj_fd = 0; 9639 9967 *btf_type_id = 1; 9640 9968 } else { 9641 - err = find_kernel_btf_id(prog->obj, attach_name, attach_type, btf_obj_fd, btf_type_id); 9969 + err = find_kernel_btf_id(prog->obj, attach_name, 9970 + attach_type, btf_obj_fd, 9971 + btf_type_id); 9642 9972 } 9643 9973 if (err) { 9644 9974 pr_warn("prog '%s': failed to find kernel BTF type ID of '%s': %d\n", ··· 10702 11028 #endif 10703 11029 } 10704 11030 10705 - static int probe_kern_syscall_wrapper(void) 11031 + int probe_kern_syscall_wrapper(int token_fd) 10706 11032 { 10707 11033 char syscall_name[64]; 10708 11034 const char *ksys_pfx;
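Note on the feature-probing change in this file: the old process-wide `feature_probes[]` table is removed, and `kernel_supports()` now passes a per-object cache to `feat_supported()` when a token is in use. The lazy-probe pattern can be sketched as below. This is a simplified illustration, not libbpf's implementation: `feat_cache`, `fake_probe()`, `feat_supported_sketch()` and the two feature IDs are hypothetical names, the fallback to a shared cache when `NULL` is passed is an assumption (the body of `feat_supported()` is not part of this hunk), and the `READ_ONCE`/`WRITE_ONCE` accesses of the removed code are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical feature IDs, standing in for enum kern_feature_id. */
enum feat_id { FEAT_A, FEAT_B, FEAT_CNT };

enum feat_result { FEAT_UNKNOWN = 0, FEAT_SUPPORTED = 1, FEAT_MISSING = 2 };

/* Mirrors the shape of struct kern_feature_cache from libbpf_internal.h. */
struct feat_cache {
	enum feat_result res[FEAT_CNT];
	int token_fd;
};

/* Counts probe invocations, for demonstration only. */
int probe_calls;

/* Hypothetical probe: > 0 means supported, 0 missing, < 0 probing error. */
int fake_probe(enum feat_id id, int token_fd)
{
	(void)token_fd; /* a real probe would pass this to bpf() calls */
	probe_calls++;
	return id == FEAT_A ? 1 : 0;
}

/* Assumed fallback cache for callers that have no token. */
struct feat_cache shared_cache;

bool feat_supported_sketch(struct feat_cache *cache, enum feat_id id)
{
	int ret;

	if (!cache)
		cache = &shared_cache;

	/* Probe once per cache; later calls reuse the stored result. */
	if (cache->res[id] == FEAT_UNKNOWN) {
		ret = fake_probe(id, cache->token_fd);
		/* As in the removed code, a probing error counts as missing. */
		cache->res[id] = ret > 0 ? FEAT_SUPPORTED : FEAT_MISSING;
	}
	return cache->res[id] == FEAT_SUPPORTED;
}
```

Keeping one cache per token means a result probed with delegated permissions is never reused by an object that runs without a token, and the other way around.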

+20 -1

tools/lib/bpf/libbpf.h

···
 	 * logs through its print callback.
 	 */
 	__u32 kernel_log_level;
+	/* Path to BPF FS mount point to derive BPF token from.
+	 *
+	 * Created BPF token will be used for all bpf() syscall operations
+	 * that accept BPF token (e.g., map creation, BTF and program loads,
+	 * etc) automatically within instantiated BPF object.
+	 *
+	 * If bpf_token_path is not specified, libbpf will consult
+	 * LIBBPF_BPF_TOKEN_PATH environment variable. If set, it will be
+	 * taken as a value of bpf_token_path option and will force libbpf to
+	 * either create BPF token from provided custom BPF FS path, or will
+	 * disable implicit BPF token creation, if envvar value is an empty
+	 * string. bpf_token_path overrides LIBBPF_BPF_TOKEN_PATH, if both are
+	 * set at the same time.
+	 *
+	 * Setting bpf_token_path option to empty string disables libbpf's
+	 * automatic attempt to create BPF token from default BPF FS mount
+	 * point (/sys/fs/bpf), in case this default behavior is undesirable.
+	 */
+	const char *bpf_token_path;

 	size_t :0;
 };
-#define bpf_object_open_opts__last_field kernel_log_level
+#define bpf_object_open_opts__last_field bpf_token_path

 /**
  * @brief **bpf_object__open()** creates a bpf_object by opening
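Note on the `bpf_token_path` comment: the precedence it describes (option first, then `LIBBPF_BPF_TOKEN_PATH`, then the default mount point, with an empty string disabling token creation) can be sketched as a small function. `resolve_token_path()` is a hypothetical helper written for illustration, not a libbpf API; only the environment variable name and the `/sys/fs/bpf` default come from the hunk.

```c
#define _POSIX_C_SOURCE 200809L /* for setenv()/unsetenv() when trying this out */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define DEFAULT_BPFFS_PATH "/sys/fs/bpf" /* default named in the comment */

/* Hypothetical helper, not part of libbpf.
 * Returns the BPF FS path a token would be derived from, or NULL when an
 * empty string disabled token creation. *explicit_path is set to 1 when
 * the path came from the option or the environment, 0 for the default.
 */
const char *resolve_token_path(const char *opt_path, int *explicit_path)
{
	const char *path = opt_path;

	/* The option wins; the environment is consulted only without it. */
	if (!path)
		path = getenv("LIBBPF_BPF_TOKEN_PATH");

	*explicit_path = path != NULL;

	if (!path)
		return DEFAULT_BPFFS_PATH; /* implicit, best-effort attempt */
	if (path[0] == '\0')
		return NULL; /* empty string turns tokens off */
	return path;
}
```

The `explicit_path` flag matters because, judging by `bpf_object_prepare_token()` in libbpf.c, failures are only fatal when the path was requested explicitly; the implicit attempt against the default mount point is skipped quietly.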

+1

tools/lib/bpf/libbpf.map

···
 } LIBBPF_1.2.0;

 LIBBPF_1.4.0 {
+		bpf_token_create;
 } LIBBPF_1.3.0;

+45 -5

tools/lib/bpf/libbpf_internal.h

···
 #include <linux/err.h>
 #include <fcntl.h>
 #include <unistd.h>
+#include <sys/syscall.h>
 #include <libelf.h>
 #include "relo_core.h"

···
 	__FEAT_CNT,
 };

-int probe_memcg_account(void);
+enum kern_feature_result {
+	FEAT_UNKNOWN = 0,
+	FEAT_SUPPORTED = 1,
+	FEAT_MISSING = 2,
+};
+
+struct kern_feature_cache {
+	enum kern_feature_result res[__FEAT_CNT];
+	int token_fd;
+};
+
+bool feat_supported(struct kern_feature_cache *cache, enum kern_feature_id feat_id);
 bool kernel_supports(const struct bpf_object *obj, enum kern_feature_id feat_id);
+
+int probe_kern_syscall_wrapper(int token_fd);
+int probe_memcg_account(int token_fd);
 int bump_rlimit_memlock(void);

 int parse_cpu_mask_str(const char *s, bool **mask, int *mask_sz);
 int parse_cpu_mask_file(const char *fcpu, bool **mask, int *mask_sz);
 int libbpf__load_raw_btf(const char *raw_types, size_t types_len,
-			 const char *str_sec, size_t str_len);
-int btf_load_into_kernel(struct btf *btf, char *log_buf, size_t log_sz, __u32 log_level);
+			 const char *str_sec, size_t str_len,
+			 int token_fd);
+int btf_load_into_kernel(struct btf *btf,
+			 char *log_buf, size_t log_sz, __u32 log_level,
+			 int token_fd);

 struct btf *btf_get_from_fd(int btf_fd, struct btf *base_btf);
 void btf_get_kernel_prefix_kind(enum bpf_attach_type attach_type,
···
 	return insn->code == (BPF_LD | BPF_IMM | BPF_DW);
 }

+/* Unconditionally dup FD, ensuring it doesn't use [0, 2] range.
+ * Original FD is not closed or altered in any other way.
+ * Preserves original FD value, if it's invalid (negative).
+ */
+static inline int dup_good_fd(int fd)
+{
+	if (fd < 0)
+		return fd;
+	return fcntl(fd, F_DUPFD_CLOEXEC, 3);
+}
+
 /* if fd is stdin, stdout, or stderr, dup to a fd greater than 2
  * Takes ownership of the fd passed in, and closes it if calling
  * fcntl(fd, F_DUPFD_CLOEXEC, 3).
···
 	if (fd < 0)
 		return fd;
 	if (fd < 3) {
-		fd = fcntl(fd, F_DUPFD_CLOEXEC, 3);
+		fd = dup_good_fd(fd);
 		saved_errno = errno;
 		close(old_fd);
 		errno = saved_errno;
···
 	return fd;
 }

+static inline int sys_dup2(int oldfd, int newfd)
+{
+#ifdef __NR_dup2
+	return syscall(__NR_dup2, oldfd, newfd);
+#else
+	return syscall(__NR_dup3, oldfd, newfd, 0);
+#endif
+}
+
 /* Point *fixed_fd* to the same file that *tmp_fd* points to.
  * Regardless of success, *tmp_fd* is closed.
  * Whatever *fixed_fd* pointed to is closed silently.
···
 {
 	int err;

-	err = dup2(tmp_fd, fixed_fd);
+	err = sys_dup2(tmp_fd, fixed_fd);
 	err = err < 0 ? -errno : 0;
 	close(tmp_fd); /* clean up temporary FD */
 	return err;
···
 			 int st_type);
 int elf_resolve_pattern_offsets(const char *binary_path, const char *pattern,
 				unsigned long **poffsets, size_t *pcnt);
+
+int probe_fd(int fd);

 #endif /* __LIBBPF_LIBBPF_INTERNAL_H */
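Note on `dup_good_fd()`: the three guarantees in its comment (the copy lands at 3 or above, the original FD is left alone, and a negative FD is returned unchanged) can be checked in user space with a standalone copy of the helper. `fd_is_cloexec()` is a hypothetical helper added here only to show that `F_DUPFD_CLOEXEC` also sets close-on-exec on the copy.

```c
#define _POSIX_C_SOURCE 200809L /* for F_DUPFD_CLOEXEC */
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Same logic as the dup_good_fd() helper added to libbpf_internal.h. */
int dup_good_fd(int fd)
{
	if (fd < 0)
		return fd; /* an invalid FD is passed through untouched */
	return fcntl(fd, F_DUPFD_CLOEXEC, 3); /* lowest free FD that is >= 3 */
}

/* Hypothetical helper: 1 if fd has close-on-exec set, 0 if not, -1 on error. */
int fd_is_cloexec(int fd)
{
	int flags = fcntl(fd, F_GETFD);

	if (flags < 0)
		return -1;
	return (flags & FD_CLOEXEC) != 0;
}
```

Calling `dup_good_fd()` on one end of a fresh `pipe()` returns a new descriptor of at least 3 while the pipe end itself stays open, which is the difference from the older helper in the same header that takes ownership and closes its argument.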

+9 -3

tools/lib/bpf/libbpf_probes.c

···
 }

 int libbpf__load_raw_btf(const char *raw_types, size_t types_len,
-			 const char *str_sec, size_t str_len)
+			 const char *str_sec, size_t str_len,
+			 int token_fd)
 {
 	struct btf_header hdr = {
 		.magic = BTF_MAGIC,
···
 		.str_off = types_len,
 		.str_len = str_len,
 	};
+	LIBBPF_OPTS(bpf_btf_load_opts, opts,
+		.token_fd = token_fd,
+		.btf_flags = token_fd ? BPF_F_TOKEN_FD : 0,
+	);
 	int btf_fd, btf_len;
 	__u8 *raw_btf;

···
 	memcpy(raw_btf + hdr.hdr_len, raw_types, hdr.type_len);
 	memcpy(raw_btf + hdr.hdr_len + hdr.type_len, str_sec, hdr.str_len);

-	btf_fd = bpf_btf_load(raw_btf, btf_len, NULL);
+	btf_fd = bpf_btf_load(raw_btf, btf_len, &opts);

 	free(raw_btf);
 	return btf_fd;
···
 	};

 	return libbpf__load_raw_btf((char *)types, sizeof(types),
-				     strs, sizeof(strs));
+				     strs, sizeof(strs), 0);
 }

 static int probe_map_create(enum bpf_map_type map_type)
···
 	case BPF_MAP_TYPE_STRUCT_OPS:
 		/* we'll get -ENOTSUPP for invalid BTF type ID for struct_ops */
 		opts.btf_vmlinux_value_type_id = 1;
+		opts.value_type_btf_obj_fd = -1;
 		exp_err = -524; /* -ENOTSUPP */
 		break;
 	case BPF_MAP_TYPE_BLOOM_FILTER:

+3

tools/lib/bpf/str_error.h

···
 #ifndef __LIBBPF_STR_ERROR_H
 #define __LIBBPF_STR_ERROR_H

+#define STRERR_BUFSIZE 128
+
 char *libbpf_strerror_r(int err, char *dst, int len);
+
 #endif /* __LIBBPF_STR_ERROR_H */

+16 -16

tools/testing/selftests/bpf/README.rst

···
 verifier to understand such speculative pointer arithmetic.
 Hence `this patch`__ addresses it on the compiler side. It was committed on llvm 12.

-__ https://reviews.llvm.org/D85570
+__ https://github.com/llvm/llvm-project/commit/ddf1864ace484035e3cde5e83b3a31ac81e059c6

 The corresponding C code

···
 has been pushed to llvm 10.x release branch and will be
 available in 10.0.1. The patch is available in llvm 11.0.0 trunk.

-__ https://reviews.llvm.org/D78466
+__ https://github.com/llvm/llvm-project/commit/3cb7e7bf959dcd3b8080986c62e10a75c7af43f0

 bpf_verif_scale/loop6.bpf.o test failure with Clang 12
 ======================================================
···
 This cause later verifier failure. The bug has been `fixed`__ in
 Clang 13.

-__ https://reviews.llvm.org/D97479
+__ https://github.com/llvm/llvm-project/commit/1959ead525b8830cc8a345f45e1c3ef9902d3229

 BPF CO-RE-based tests and Clang version
 =======================================
···
   - __builtin_btf_type_id() [0_, 1_, 2_];
   - __builtin_preserve_type_info(), __builtin_preserve_enum_value() [3_, 4_].

-.. _0: https://reviews.llvm.org/D74572
-.. _1: https://reviews.llvm.org/D74668
-.. _2: https://reviews.llvm.org/D85174
-.. _3: https://reviews.llvm.org/D83878
-.. _4: https://reviews.llvm.org/D83242
+.. _0: https://github.com/llvm/llvm-project/commit/6b01b465388b204d543da3cf49efd6080db094a9
+.. _1: https://github.com/llvm/llvm-project/commit/072cde03aaa13a2c57acf62d79876bf79aa1919f
+.. _2: https://github.com/llvm/llvm-project/commit/00602ee7ef0bf6c68d690a2bd729c12b95c95c99
+.. _3: https://github.com/llvm/llvm-project/commit/6d218b4adb093ff2e9764febbbc89f429412006c
+.. _4: https://github.com/llvm/llvm-project/commit/6d6750696400e7ce988d66a1a00e1d0cb32815f8

 Floating-point tests and Clang version
 ======================================
···
 types, which was introduced in `Clang 13`__. The older Clang versions will
 either crash when compiling these tests, or generate an incorrect BTF.

-__ https://reviews.llvm.org/D83289
+__ https://github.com/llvm/llvm-project/commit/a7137b238a07d9399d3ae96c0b461571bd5aa8b2

 Kernel function call test and Clang version
 ===========================================
···

   libbpf: failed to find BTF for extern 'tcp_slow_start' [25] section: -2

-__ https://reviews.llvm.org/D93563
+__ https://github.com/llvm/llvm-project/commit/886f9ff53155075bd5f1e994f17b85d1e1b7470c

 btf_tag test and Clang version
 ==============================
···

   #<test_num> btf_tag:SKIP

-.. _0: https://reviews.llvm.org/D111588
-.. _1: https://reviews.llvm.org/D111199
+.. _0: https://github.com/llvm/llvm-project/commit/a162b67c98066218d0d00aa13b99afb95d9bb5e6
+.. _1: https://github.com/llvm/llvm-project/commit/3466e00716e12e32fdb100e3fcfca5c2b3e8d784

 Clang dependencies for static linking tests
 ===========================================
···
 generate valid BTF information for weak variables. Please make sure you use
 Clang that contains the fix.

-__ https://reviews.llvm.org/D100362
+__ https://github.com/llvm/llvm-project/commit/968292cb93198442138128d850fd54dc7edc0035

 Clang relocation changes
 ========================
···
 To fix this issue, user newer libbpf.

 .. Links
-.. _clang reloc patch: https://reviews.llvm.org/D102712
+.. _clang reloc patch: https://github.com/llvm/llvm-project/commit/6a2ea84600ba4bd3b2733bd8f08f5115eb32164b
 .. _kernel llvm reloc: /Documentation/bpf/llvm_reloc.rst

 Clang dependencies for the u32 spill test (xdpwall)
···

 .. code-block:: console

-  test_xdpwall:FAIL:Does LLVM have https://reviews.llvm.org/D109073? unexpected error: -4007
+  test_xdpwall:FAIL:Does LLVM have https://github.com/llvm/llvm-project/commit/ea72b0319d7b0f0c2fcf41d121afa5d031b319d5? unexpected error: -4007

-__ https://reviews.llvm.org/D109073
+__ https://github.com/llvm/llvm-project/commit/ea72b0319d7b0f0c2fcf41d121afa5d031b319d5

+12 -9

tools/testing/selftests/bpf/bpf_experimental.h

···

 #define __is_signed_type(type) (((type)(-1)) < (type)1)

-#define __bpf_cmp(LHS, OP, SIGN, PRED, RHS, DEFAULT) \
+#define __bpf_cmp(LHS, OP, PRED, RHS, DEFAULT) \
 	({ \
 		__label__ l_true; \
 		bool ret = DEFAULT; \
-		asm volatile goto("if %[lhs] " SIGN #OP " %[rhs] goto %l[l_true]" \
+		asm volatile goto("if %[lhs] " OP " %[rhs] goto %l[l_true]" \
 				  :: [lhs] "r"((short)LHS), [rhs] PRED (RHS) :: l_true); \
 		ret = !DEFAULT; \
 l_true: \
···
  * __lhs OP __rhs below will catch the mistake.
  * Be aware that we check only __lhs to figure out the sign of compare.
  */
-#define _bpf_cmp(LHS, OP, RHS, NOFLIP) \
+#define _bpf_cmp(LHS, OP, RHS, UNLIKELY) \
 	({ \
 		typeof(LHS) __lhs = (LHS); \
 		typeof(RHS) __rhs = (RHS); \
···
 		(void)(__lhs OP __rhs); \
 		if (__cmp_cannot_be_signed(OP) || !__is_signed_type(typeof(__lhs))) { \
 			if (sizeof(__rhs) == 8) \
-				ret = __bpf_cmp(__lhs, OP, "", "r", __rhs, NOFLIP); \
+				/* "i" will truncate 64-bit constant into s32, \
+				 * so we have to use extra register via "r". \
+				 */ \
+				ret = __bpf_cmp(__lhs, #OP, "r", __rhs, UNLIKELY); \
 			else \
-				ret = __bpf_cmp(__lhs, OP, "", "i", __rhs, NOFLIP); \
+				ret = __bpf_cmp(__lhs, #OP, "ri", __rhs, UNLIKELY); \
 		} else { \
 			if (sizeof(__rhs) == 8) \
-				ret = __bpf_cmp(__lhs, OP, "s", "r", __rhs, NOFLIP); \
+				ret = __bpf_cmp(__lhs, "s"#OP, "r", __rhs, UNLIKELY); \
 			else \
-				ret = __bpf_cmp(__lhs, OP, "s", "i", __rhs, NOFLIP); \
+				ret = __bpf_cmp(__lhs, "s"#OP, "ri", __rhs, UNLIKELY); \
 		} \
 		ret; \
 	})
···
 #ifndef bpf_cmp_likely
 #define bpf_cmp_likely(LHS, OP, RHS) \
 	({ \
-		bool ret; \
+		bool ret = 0; \
 		if (__builtin_strcmp(#OP, "==") == 0) \
 			ret = _bpf_cmp(LHS, !=, RHS, false); \
 		else if (__builtin_strcmp(#OP, "!=") == 0) \
···
 		else if (__builtin_strcmp(#OP, ">=") == 0) \
 			ret = _bpf_cmp(LHS, <, RHS, false); \
 		else \
-			asm volatile("r0 " #OP " invalid compare"); \
+			(void) "bug"; \
 		ret; \
 	})
 #endif
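Note on `__is_signed_type()`: `_bpf_cmp()` relies on this macro to decide whether the generated jump needs the signed `s` prefix. Its behavior can be checked with ordinary C outside of BPF. In the sketch below the macro is copied from the hunk, while `cmp_prefix()` is a hypothetical, simplified stand-in for that decision: it ignores `__cmp_cannot_be_signed(OP)` and takes a type name rather than an expression.

```c
#include <assert.h>
#include <string.h>

/* Copied from bpf_experimental.h: casting -1 to the type gives a value
 * below 1 only when the type is signed; for unsigned types it wraps to
 * the maximum value.
 */
#define __is_signed_type(type) (((type)(-1)) < (type)1)

/* Hypothetical, simplified stand-in for the prefix choice in _bpf_cmp():
 * "s" for a signed left-hand side type, "" otherwise.
 */
#define cmp_prefix(type) (__is_signed_type(type) ? "s" : "")

/* Builds the operator text for a "<" compare on `long`, the way the
 * "s"#OP argument does in the macro; writes at most dst_sz bytes.
 */
void signed_lt_op(char *dst, size_t dst_sz)
{
	dst[0] = '\0';
	strncat(dst, cmp_prefix(long), dst_sz - 1);
	strncat(dst, "<", dst_sz - strlen(dst) - 1);
}
```

With this, `__is_signed_type(int)` is true and `__is_signed_type(unsigned long)` is false, so a `long` compare yields the operator string `s<` while an unsigned one stays `<`.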

+10

tools/testing/selftests/bpf/bpf_kfuncs.h

···
 extern int bpf_sock_addr_set_sun_path(struct bpf_sock_addr_kern *sa_kern,
 				      const __u8 *sun_path, __u32 sun_path__sz) __ksym;

+/* Description
+ *	Allocate and configure a reqsk and link it with a listener and skb.
+ * Returns
+ *	Error code
+ */
+struct sock;
+struct bpf_tcp_req_attrs;
+extern int bpf_sk_assign_tcp_reqsk(struct __sk_buff *skb, struct sock *sk,
+				   struct bpf_tcp_req_attrs *attrs, int attrs__sz) __ksym;
+
 void *bpf_cast_to_kern_ctx(void *) __ksym;

 void *bpf_rdonly_cast(void *obj, __u32 btf_id) __ksym;

+75

tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c

··· 1 1 // SPDX-License-Identifier: GPL-2.0 2 2 /* Copyright (c) 2020 Facebook */ 3 + #include <linux/bpf.h> 3 4 #include <linux/btf.h> 4 5 #include <linux/btf_ids.h> 6 + #include <linux/delay.h> 5 7 #include <linux/error-injection.h> 6 8 #include <linux/init.h> 7 9 #include <linux/module.h> ··· 522 520 BTF_ID_FLAGS(func, bpf_kfunc_call_test_offset) 523 521 BTF_SET8_END(bpf_testmod_check_kfunc_ids) 524 522 523 + static int bpf_testmod_ops_init(struct btf *btf) 524 + { 525 + return 0; 526 + } 527 + 528 + static bool bpf_testmod_ops_is_valid_access(int off, int size, 529 + enum bpf_access_type type, 530 + const struct bpf_prog *prog, 531 + struct bpf_insn_access_aux *info) 532 + { 533 + return bpf_tracing_btf_ctx_access(off, size, type, prog, info); 534 + } 535 + 536 + static int bpf_testmod_ops_init_member(const struct btf_type *t, 537 + const struct btf_member *member, 538 + void *kdata, const void *udata) 539 + { 540 + return 0; 541 + } 542 + 525 543 static const struct btf_kfunc_id_set bpf_testmod_kfunc_set = { 526 544 .owner = THIS_MODULE, 527 545 .set = &bpf_testmod_check_kfunc_ids, 546 + }; 547 + 548 + static const struct bpf_verifier_ops bpf_testmod_verifier_ops = { 549 + .is_valid_access = bpf_testmod_ops_is_valid_access, 550 + }; 551 + 552 + static int bpf_dummy_reg(void *kdata) 553 + { 554 + struct bpf_testmod_ops *ops = kdata; 555 + int r; 556 + 557 + r = ops->test_2(4, 3); 558 + 559 + return 0; 560 + } 561 + 562 + static void bpf_dummy_unreg(void *kdata) 563 + { 564 + } 565 + 566 + static int bpf_testmod_test_1(void) 567 + { 568 + return 0; 569 + } 570 + 571 + static int bpf_testmod_test_2(int a, int b) 572 + { 573 + return 0; 574 + } 575 + 576 + static struct bpf_testmod_ops __bpf_testmod_ops = { 577 + .test_1 = bpf_testmod_test_1, 578 + .test_2 = bpf_testmod_test_2, 579 + }; 580 + 581 + struct bpf_struct_ops bpf_bpf_testmod_ops = { 582 + .verifier_ops = &bpf_testmod_verifier_ops, 583 + .init = bpf_testmod_ops_init, 584 + .init_member = 
bpf_testmod_ops_init_member, 585 + .reg = bpf_dummy_reg, 586 + .unreg = bpf_dummy_unreg, 587 + .cfi_stubs = &__bpf_testmod_ops, 588 + .name = "bpf_testmod_ops", 589 + .owner = THIS_MODULE, 528 590 }; 529 591 530 592 extern int bpf_fentry_test1(int a); ··· 601 535 ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &bpf_testmod_kfunc_set); 602 536 ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &bpf_testmod_kfunc_set); 603 537 ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SYSCALL, &bpf_testmod_kfunc_set); 538 + ret = ret ?: register_bpf_struct_ops(&bpf_bpf_testmod_ops, bpf_testmod_ops); 604 539 if (ret < 0) 605 540 return ret; 606 541 if (bpf_fentry_test1(0) < 0) ··· 611 544 612 545 static void bpf_testmod_exit(void) 613 546 { 547 + /* Need to wait for all references to be dropped because 548 + * bpf_kfunc_call_test_release() which currently resides in kernel can 549 + * be called after bpf_testmod is unloaded. Once release function is 550 + * moved into the module this wait can be removed. 551 + */ 552 + while (refcount_read(&prog_test_struct.cnt) > 1) 553 + msleep(20); 554 + 614 555 return sysfs_remove_bin_file(kernel_kobj, &bin_attr_bpf_testmod_file); 615 556 } 616 557

+5

tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.h

···
28 28 	int cnt;
29 29 };
30 30
31 + struct bpf_testmod_ops {
32 + 	int (*test_1)(void);
33 + 	int (*test_2)(int a, int b);
34 + };
35 +
31 36 #endif /* _BPF_TESTMOD_H */
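`struct bpf_testmod_ops` is a plain table of function pointers, and the module's `reg` callback calls `ops->test_2(4, 3)` through it. The sketch below imitates that call path in ordinary userspace C. All names ending in `_sketch` are hypothetical, and the handler body (storing `a + b`) is an assumption chosen to be consistent with the selftest expecting `test_2_result == 7`; the real handler is a BPF program that is not part of this header.

```c
#include <assert.h>

/* Stand-in for struct bpf_testmod_ops: a table of callbacks. */
struct testmod_ops_sketch {
	int (*test_1)(void);
	int (*test_2)(int a, int b);
};

/* Plays the role of the skeleton's bss variable read by the test. */
static int test_2_result_sketch;

static int test_1_sketch(void)
{
	return 0;
}

/* Assumed handler behaviour: record the sum of both arguments. */
static int test_2_sketch(int a, int b)
{
	test_2_result_sketch = a + b;
	return a + b;
}

/* Mirrors bpf_dummy_reg(): receive the ops table as opaque data and
 * invoke one callback with fixed arguments.
 */
static int reg_sketch(void *kdata)
{
	struct testmod_ops_sketch *ops = kdata;

	assert(ops && ops->test_2);
	ops->test_2(4, 3);
	return 0;
}
```

A caller would fill the table with `test_1_sketch` and `test_2_sketch`, pass its address to `reg_sketch()`, and then read `test_2_result_sketch`.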

+1

tools/testing/selftests/bpf/config

···
81 81 CONFIG_RC_CORE=y
82 82 CONFIG_SECURITY=y
83 83 CONFIG_SECURITYFS=y
84 + CONFIG_SYN_COOKIES=y
84 85 CONFIG_TEST_BPF=m
85 86 CONFIG_USERFAULTFD=y
86 87 CONFIG_VSOCKETS=y

+1 -1

tools/testing/selftests/bpf/prog_tests/bpf_verif_scale.c

···
35 35 	}
36 36
37 37 	bpf_program__set_type(prog, type);
38 - 	bpf_program__set_flags(prog, BPF_F_TEST_RND_HI32 | BPF_F_TEST_REG_INVARIANTS);
38 + 	bpf_program__set_flags(prog, testing_prog_flags());
39 39 	bpf_program__set_log_level(prog, 4 | extra_prog_load_log_flags);
40 40
41 41 	err = bpf_object__load(obj);

-44

tools/testing/selftests/bpf/prog_tests/ctx_rewrite.c

··· 626 626 return false; 627 627 } 628 628 629 - /* Request BPF program instructions after all rewrites are applied, 630 - * e.g. verifier.c:convert_ctx_access() is done. 631 - */ 632 - static int get_xlated_program(int fd_prog, struct bpf_insn **buf, __u32 *cnt) 633 - { 634 - struct bpf_prog_info info = {}; 635 - __u32 info_len = sizeof(info); 636 - __u32 xlated_prog_len; 637 - __u32 buf_element_size = sizeof(struct bpf_insn); 638 - 639 - if (bpf_prog_get_info_by_fd(fd_prog, &info, &info_len)) { 640 - perror("bpf_prog_get_info_by_fd failed"); 641 - return -1; 642 - } 643 - 644 - xlated_prog_len = info.xlated_prog_len; 645 - if (xlated_prog_len % buf_element_size) { 646 - printf("Program length %d is not multiple of %d\n", 647 - xlated_prog_len, buf_element_size); 648 - return -1; 649 - } 650 - 651 - *cnt = xlated_prog_len / buf_element_size; 652 - *buf = calloc(*cnt, buf_element_size); 653 - if (!buf) { 654 - perror("can't allocate xlated program buffer"); 655 - return -ENOMEM; 656 - } 657 - 658 - bzero(&info, sizeof(info)); 659 - info.xlated_prog_len = xlated_prog_len; 660 - info.xlated_prog_insns = (__u64)(unsigned long)*buf; 661 - if (bpf_prog_get_info_by_fd(fd_prog, &info, &info_len)) { 662 - perror("second bpf_prog_get_info_by_fd failed"); 663 - goto out_free_buf; 664 - } 665 - 666 - return 0; 667 - 668 - out_free_buf: 669 - free(*buf); 670 - return -1; 671 - } 672 - 673 629 static void print_insn(void *private_data, const char *fmt, ...) 674 630 { 675 631 va_list args;

+97 -17

tools/testing/selftests/bpf/prog_tests/fill_link_info.c

··· 19 19 }; 20 20 #define KMULTI_CNT ARRAY_SIZE(kmulti_syms) 21 21 static __u64 kmulti_addrs[KMULTI_CNT]; 22 + static __u64 kmulti_cookies[] = { 3, 1, 2 }; 22 23 23 24 #define KPROBE_FUNC "bpf_fentry_test1" 24 25 static __u64 kprobe_addr; ··· 31 30 { 32 31 asm volatile (""); 33 32 } 33 + 34 + #define PERF_EVENT_COOKIE 0xdeadbeef 34 35 35 36 static int verify_perf_link_info(int fd, enum bpf_perf_event_type type, long addr, 36 37 ssize_t offset, ssize_t entry_offset) ··· 65 62 ASSERT_EQ(info.perf_event.kprobe.addr, addr + entry_offset, 66 63 "kprobe_addr"); 67 64 65 + ASSERT_EQ(info.perf_event.kprobe.cookie, PERF_EVENT_COOKIE, "kprobe_cookie"); 66 + 68 67 if (!info.perf_event.kprobe.func_name) { 69 68 ASSERT_EQ(info.perf_event.kprobe.name_len, 0, "name_len"); 70 69 info.perf_event.kprobe.func_name = ptr_to_u64(&buf); ··· 86 81 goto again; 87 82 } 88 83 84 + ASSERT_EQ(info.perf_event.tracepoint.cookie, PERF_EVENT_COOKIE, "tracepoint_cookie"); 85 + 89 86 err = strncmp(u64_to_ptr(info.perf_event.tracepoint.tp_name), TP_NAME, 90 87 strlen(TP_NAME)); 91 88 ASSERT_EQ(err, 0, "cmp_tp_name"); ··· 103 96 goto again; 104 97 } 105 98 99 + ASSERT_EQ(info.perf_event.uprobe.cookie, PERF_EVENT_COOKIE, "uprobe_cookie"); 100 + 106 101 err = strncmp(u64_to_ptr(info.perf_event.uprobe.file_name), UPROBE_FILE, 107 102 strlen(UPROBE_FILE)); 108 103 ASSERT_EQ(err, 0, "cmp_file_name"); 104 + break; 105 + case BPF_PERF_EVENT_EVENT: 106 + ASSERT_EQ(info.perf_event.event.type, PERF_TYPE_SOFTWARE, "event_type"); 107 + ASSERT_EQ(info.perf_event.event.config, PERF_COUNT_SW_PAGE_FAULTS, "event_config"); 108 + ASSERT_EQ(info.perf_event.event.cookie, PERF_EVENT_COOKIE, "event_cookie"); 109 109 break; 110 110 default: 111 111 err = -1; ··· 153 139 DECLARE_LIBBPF_OPTS(bpf_kprobe_opts, opts, 154 140 .attach_mode = PROBE_ATTACH_MODE_LINK, 155 141 .retprobe = type == BPF_PERF_EVENT_KRETPROBE, 142 + .bpf_cookie = PERF_EVENT_COOKIE, 156 143 ); 157 144 ssize_t entry_offset = 0; 158 145 struct bpf_link 
*link; ··· 178 163 179 164 static void test_tp_fill_link_info(struct test_fill_link_info *skel) 180 165 { 166 + DECLARE_LIBBPF_OPTS(bpf_tracepoint_opts, opts, 167 + .bpf_cookie = PERF_EVENT_COOKIE, 168 + ); 181 169 struct bpf_link *link; 182 170 int link_fd, err; 183 171 184 - link = bpf_program__attach_tracepoint(skel->progs.tp_run, TP_CAT, TP_NAME); 172 + link = bpf_program__attach_tracepoint_opts(skel->progs.tp_run, TP_CAT, TP_NAME, &opts); 185 173 if (!ASSERT_OK_PTR(link, "attach_tp")) 186 174 return; 187 175 ··· 194 176 bpf_link__destroy(link); 195 177 } 196 178 179 + static void test_event_fill_link_info(struct test_fill_link_info *skel) 180 + { 181 + DECLARE_LIBBPF_OPTS(bpf_perf_event_opts, opts, 182 + .bpf_cookie = PERF_EVENT_COOKIE, 183 + ); 184 + struct bpf_link *link; 185 + int link_fd, err, pfd; 186 + struct perf_event_attr attr = { 187 + .type = PERF_TYPE_SOFTWARE, 188 + .config = PERF_COUNT_SW_PAGE_FAULTS, 189 + .freq = 1, 190 + .sample_freq = 1, 191 + .size = sizeof(struct perf_event_attr), 192 + }; 193 + 194 + pfd = syscall(__NR_perf_event_open, &attr, -1 /* pid */, 0 /* cpu 0 */, 195 + -1 /* group id */, 0 /* flags */); 196 + if (!ASSERT_GE(pfd, 0, "perf_event_open")) 197 + return; 198 + 199 + link = bpf_program__attach_perf_event_opts(skel->progs.event_run, pfd, &opts); 200 + if (!ASSERT_OK_PTR(link, "attach_event")) 201 + goto error; 202 + 203 + link_fd = bpf_link__fd(link); 204 + err = verify_perf_link_info(link_fd, BPF_PERF_EVENT_EVENT, 0, 0, 0); 205 + ASSERT_OK(err, "verify_perf_link_info"); 206 + bpf_link__destroy(link); 207 + 208 + error: 209 + close(pfd); 210 + } 211 + 197 212 static void test_uprobe_fill_link_info(struct test_fill_link_info *skel, 198 213 enum bpf_perf_event_type type) 199 214 { 215 + DECLARE_LIBBPF_OPTS(bpf_uprobe_opts, opts, 216 + .retprobe = type == BPF_PERF_EVENT_URETPROBE, 217 + .bpf_cookie = PERF_EVENT_COOKIE, 218 + ); 200 219 struct bpf_link *link; 201 220 int link_fd, err; 202 221 203 - link = 
bpf_program__attach_uprobe(skel->progs.uprobe_run, 204 - type == BPF_PERF_EVENT_URETPROBE, 205 - 0, /* self pid */ 206 - UPROBE_FILE, uprobe_offset); 222 + link = bpf_program__attach_uprobe_opts(skel->progs.uprobe_run, 223 + 0, /* self pid */ 224 + UPROBE_FILE, uprobe_offset, 225 + &opts); 207 226 if (!ASSERT_OK_PTR(link, "attach_uprobe")) 208 227 return; 209 228 ··· 250 195 bpf_link__destroy(link); 251 196 } 252 197 253 - static int verify_kmulti_link_info(int fd, bool retprobe) 198 + static int verify_kmulti_link_info(int fd, bool retprobe, bool has_cookies) 254 199 { 200 + __u64 addrs[KMULTI_CNT], cookies[KMULTI_CNT]; 255 201 struct bpf_link_info info; 256 202 __u32 len = sizeof(info); 257 - __u64 addrs[KMULTI_CNT]; 258 203 int flags, i, err; 259 204 260 205 memset(&info, 0, sizeof(info)); ··· 276 221 277 222 if (!info.kprobe_multi.addrs) { 278 223 info.kprobe_multi.addrs = ptr_to_u64(addrs); 224 + info.kprobe_multi.cookies = ptr_to_u64(cookies); 279 225 goto again; 280 226 } 281 - for (i = 0; i < KMULTI_CNT; i++) 227 + for (i = 0; i < KMULTI_CNT; i++) { 282 228 ASSERT_EQ(addrs[i], kmulti_addrs[i], "kmulti_addrs"); 229 + ASSERT_EQ(cookies[i], has_cookies ? 
kmulti_cookies[i] : 0, 230 + "kmulti_cookies_value"); 231 + } 283 232 return 0; 284 233 } 285 234 286 235 static void verify_kmulti_invalid_user_buffer(int fd) 287 236 { 237 + __u64 addrs[KMULTI_CNT], cookies[KMULTI_CNT]; 288 238 struct bpf_link_info info; 289 239 __u32 len = sizeof(info); 290 - __u64 addrs[KMULTI_CNT]; 291 240 int err, i; 292 241 293 242 memset(&info, 0, sizeof(info)); ··· 325 266 info.kprobe_multi.count = KMULTI_CNT; 326 267 info.kprobe_multi.addrs = 0x1; /* invalid addr */ 327 268 err = bpf_link_get_info_by_fd(fd, &info, &len); 328 - ASSERT_EQ(err, -EFAULT, "invalid_buff"); 269 + ASSERT_EQ(err, -EFAULT, "invalid_buff_addrs"); 270 + 271 + info.kprobe_multi.count = KMULTI_CNT; 272 + info.kprobe_multi.addrs = ptr_to_u64(addrs); 273 + info.kprobe_multi.cookies = 0x1; /* invalid addr */ 274 + err = bpf_link_get_info_by_fd(fd, &info, &len); 275 + ASSERT_EQ(err, -EFAULT, "invalid_buff_cookies"); 276 + 277 + /* cookies && !count */ 278 + info.kprobe_multi.count = 0; 279 + info.kprobe_multi.addrs = ptr_to_u64(NULL); 280 + info.kprobe_multi.cookies = ptr_to_u64(cookies); 281 + err = bpf_link_get_info_by_fd(fd, &info, &len); 282 + ASSERT_EQ(err, -EINVAL, "invalid_cookies_count"); 329 283 } 330 284 331 285 static int symbols_cmp_r(const void *a, const void *b) ··· 350 278 } 351 279 352 280 static void test_kprobe_multi_fill_link_info(struct test_fill_link_info *skel, 353 - bool retprobe, bool invalid) 281 + bool retprobe, bool cookies, 282 + bool invalid) 354 283 { 355 284 LIBBPF_OPTS(bpf_kprobe_multi_opts, opts); 356 285 struct bpf_link *link; 357 286 int link_fd, err; 358 287 359 288 opts.syms = kmulti_syms; 289 + opts.cookies = cookies ? 
kmulti_cookies : NULL; 360 290 opts.cnt = KMULTI_CNT; 361 291 opts.retprobe = retprobe; 362 292 link = bpf_program__attach_kprobe_multi_opts(skel->progs.kmulti_run, NULL, &opts); ··· 367 293 368 294 link_fd = bpf_link__fd(link); 369 295 if (!invalid) { 370 - err = verify_kmulti_link_info(link_fd, retprobe); 296 + err = verify_kmulti_link_info(link_fd, retprobe, cookies); 371 297 ASSERT_OK(err, "verify_kmulti_link_info"); 372 298 } else { 373 299 verify_kmulti_invalid_user_buffer(link_fd); ··· 587 513 test_kprobe_fill_link_info(skel, BPF_PERF_EVENT_KPROBE, true); 588 514 if (test__start_subtest("tracepoint_link_info")) 589 515 test_tp_fill_link_info(skel); 516 + if (test__start_subtest("event_link_info")) 517 + test_event_fill_link_info(skel); 590 518 591 519 uprobe_offset = get_uprobe_offset(&uprobe_func); 592 520 if (test__start_subtest("uprobe_link_info")) ··· 599 523 qsort(kmulti_syms, KMULTI_CNT, sizeof(kmulti_syms[0]), symbols_cmp_r); 600 524 for (i = 0; i < KMULTI_CNT; i++) 601 525 kmulti_addrs[i] = ksym_get_addr(kmulti_syms[i]); 602 - if (test__start_subtest("kprobe_multi_link_info")) 603 - test_kprobe_multi_fill_link_info(skel, false, false); 604 - if (test__start_subtest("kretprobe_multi_link_info")) 605 - test_kprobe_multi_fill_link_info(skel, true, false); 526 + if (test__start_subtest("kprobe_multi_link_info")) { 527 + test_kprobe_multi_fill_link_info(skel, false, false, false); 528 + test_kprobe_multi_fill_link_info(skel, false, true, false); 529 + } 530 + if (test__start_subtest("kretprobe_multi_link_info")) { 531 + test_kprobe_multi_fill_link_info(skel, true, false, false); 532 + test_kprobe_multi_fill_link_info(skel, true, true, false); 533 + } 606 534 if (test__start_subtest("kprobe_multi_invalid_ubuff")) 607 - test_kprobe_multi_fill_link_info(skel, true, true); 535 + test_kprobe_multi_fill_link_info(skel, true, true, true); 608 536 609 537 if (test__start_subtest("uprobe_multi_link_info")) 610 538 test_uprobe_multi_fill_link_info(skel, false, 
false);

+51

tools/testing/selftests/bpf/prog_tests/kptr_xchg_inline.c

··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* Copyright (C) 2023. Huawei Technologies Co., Ltd */ 3 + #include <test_progs.h> 4 + 5 + #include "linux/filter.h" 6 + #include "kptr_xchg_inline.skel.h" 7 + 8 + void test_kptr_xchg_inline(void) 9 + { 10 + struct kptr_xchg_inline *skel; 11 + struct bpf_insn *insn = NULL; 12 + struct bpf_insn exp; 13 + unsigned int cnt; 14 + int err; 15 + 16 + #if !(defined(__x86_64__) || defined(__aarch64__)) 17 + test__skip(); 18 + return; 19 + #endif 20 + 21 + skel = kptr_xchg_inline__open_and_load(); 22 + if (!ASSERT_OK_PTR(skel, "open_load")) 23 + return; 24 + 25 + err = get_xlated_program(bpf_program__fd(skel->progs.kptr_xchg_inline), &insn, &cnt); 26 + if (!ASSERT_OK(err, "prog insn")) 27 + goto out; 28 + 29 + /* The original instructions are: 30 + * r1 = map[id:xxx][0]+0 31 + * r2 = 0 32 + * call bpf_kptr_xchg#yyy 33 + * 34 + * call bpf_kptr_xchg#yyy will be inlined as: 35 + * r0 = r2 36 + * r0 = atomic64_xchg((u64 *)(r1 +0), r0) 37 + */ 38 + if (!ASSERT_GT(cnt, 5, "insn cnt")) 39 + goto out; 40 + 41 + exp = BPF_MOV64_REG(BPF_REG_0, BPF_REG_2); 42 + if (!ASSERT_OK(memcmp(&insn[3], &exp, sizeof(exp)), "mov")) 43 + goto out; 44 + 45 + exp = BPF_ATOMIC_OP(BPF_DW, BPF_XCHG, BPF_REG_1, BPF_REG_0, 0); 46 + if (!ASSERT_OK(memcmp(&insn[4], &exp, sizeof(exp)), "xchg")) 47 + goto out; 48 + out: 49 + free(insn); 50 + kptr_xchg_inline__destroy(skel); 51 + }
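The new test checks inlining by building the expected instruction and comparing it byte for byte against the translated program at a fixed index, after first checking the instruction count. The sketch below reproduces that pattern without kernel headers. `struct insn_sketch` and the `SK_*` constants are local stand-ins; the numeric values are copied by hand and assumed to match the BPF UAPI opcode encoding, so treat them as illustrative rather than authoritative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-in for struct bpf_insn (8 bytes); not the kernel header. */
struct insn_sketch {
	uint8_t code;        /* opcode */
	uint8_t dst_reg : 4; /* destination register */
	uint8_t src_reg : 4; /* source register */
	int16_t off;         /* signed offset */
	int32_t imm;         /* signed immediate */
};

/* Opcode bits, assumed to match the UAPI values. */
enum {
	SK_STX = 0x03, SK_ALU64 = 0x07, SK_X = 0x08, SK_DW = 0x18,
	SK_MOV = 0xb0, SK_ATOMIC = 0xc0, SK_XCHG = 0xe0 | 0x01,
};

/* dst = src, 64-bit register move. */
static struct insn_sketch sk_mov64_reg(uint8_t dst, uint8_t src)
{
	struct insn_sketch insn;

	memset(&insn, 0, sizeof(insn)); /* zero all bytes so memcmp is exact */
	insn.code = SK_ALU64 | SK_MOV | SK_X;
	insn.dst_reg = dst;
	insn.src_reg = src;
	return insn;
}

/* src = atomic 64-bit exchange of *(dst + off) with src. */
static struct insn_sketch sk_atomic_xchg64(uint8_t dst, uint8_t src, int16_t off)
{
	struct insn_sketch insn;

	memset(&insn, 0, sizeof(insn));
	insn.code = SK_STX | SK_DW | SK_ATOMIC;
	insn.dst_reg = dst;
	insn.src_reg = src;
	insn.off = off;
	insn.imm = SK_XCHG;
	return insn;
}

/* Returns 1 only if idx is in range and the instruction matches exactly. */
static int sk_insn_is(const struct insn_sketch *prog, size_t cnt, size_t idx,
		      struct insn_sketch exp)
{
	assert(sizeof(struct insn_sketch) == 8); /* layout sanity check */
	if (idx >= cnt)
		return 0;
	return memcmp(&prog[idx], &exp, sizeof(exp)) == 0;
}
```

The bounds check inside `sk_insn_is()` plays the same role as the `ASSERT_GT(cnt, 5, ...)` guard in the test: the comparison never reads past the end of the translated program.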

+4

tools/testing/selftests/bpf/prog_tests/libbpf_probes.c

···
30 30
31 31 		if (prog_type == BPF_PROG_TYPE_UNSPEC)
32 32 			continue;
33 + 		if (strcmp(prog_type_name, "__MAX_BPF_PROG_TYPE") == 0)
34 + 			continue;
33 35
34 36 		if (!test__start_subtest(prog_type_name))
35 37 			continue;
···
69 67 		int res;
70 68
71 69 		if (map_type == BPF_MAP_TYPE_UNSPEC)
70 + 			continue;
71 + 		if (strcmp(map_type_name, "__MAX_BPF_MAP_TYPE") == 0)
72 72 			continue;
73 73
74 74 		if (!test__start_subtest(map_type_name))

+6

tools/testing/selftests/bpf/prog_tests/libbpf_str.c

···
132 132 		const char *map_type_str;
133 133 		char buf[256];
134 134
135 + 		if (map_type == __MAX_BPF_MAP_TYPE)
136 + 			continue;
137 +
135 138 		map_type_name = btf__str_by_offset(btf, e->name_off);
136 139 		map_type_str = libbpf_bpf_map_type_str(map_type);
137 140 		ASSERT_OK_PTR(map_type_str, map_type_name);
···
188 185 		const char *prog_type_name;
189 186 		const char *prog_type_str;
190 187 		char buf[256];
188 +
189 + 		if (prog_type == __MAX_BPF_PROG_TYPE)
190 + 			continue;
191 191
192 192 		prog_type_name = btf__str_by_offset(btf, e->name_off);
193 193 		prog_type_str = libbpf_bpf_prog_type_str(prog_type);
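Both libbpf test hunks add the same guard: when a loop walks every enumerator of a type enum, the trailing `__MAX_*` sentinel is not a real type and has to be skipped along with `*_UNSPEC`. A small self-contained sketch of that loop shape follows. The enum, the table and every `sk_`/`SK_` name are hypothetical; the real tests read the enumerators from BTF rather than from a static table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical enum with an UNSPEC entry and a trailing sentinel. */
enum sk_map_type {
	SK_MAP_UNSPEC,
	SK_MAP_HASH,
	SK_MAP_ARRAY,
	SK_MAP_TYPE_MAX, /* sentinel, not a real type */
};

/* Name/value pair, similar to what an enum walk over BTF yields. */
struct sk_enumerator {
	const char *name;
	int val;
};

static const struct sk_enumerator sk_enumerators[] = {
	{ "SK_MAP_UNSPEC", SK_MAP_UNSPEC },
	{ "SK_MAP_HASH", SK_MAP_HASH },
	{ "SK_MAP_ARRAY", SK_MAP_ARRAY },
	{ "SK_MAP_TYPE_MAX", SK_MAP_TYPE_MAX },
};

/* Count the enumerators a per-type test loop would actually visit. */
static size_t sk_count_testable(const struct sk_enumerator *e, size_t n)
{
	size_t i, visited = 0;

	assert(e != NULL);
	for (i = 0; i < n; i++) {
		if (e[i].val == SK_MAP_UNSPEC)
			continue; /* placeholder value, nothing to test */
		if (strcmp(e[i].name, "SK_MAP_TYPE_MAX") == 0)
			continue; /* sentinel, skipped by name */
		visited++;
	}
	return visited;
}
```

Skipping by name, as in `libbpf_probes.c`, and skipping by value, as in `libbpf_str.c`, are equivalent here; the sketch uses the name comparison for the sentinel.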

+1 -1

tools/testing/selftests/bpf/prog_tests/reg_bounds.c

···
840 840 		.log_level = 2,
841 841 		.log_buf = log_buf,
842 842 		.log_size = log_sz,
843 - 		.prog_flags = BPF_F_TEST_REG_INVARIANTS,
843 + 		.prog_flags = testing_prog_flags(),
844 844 	);
845 845
846 846 	/* ; skip exit block below

+86 -4

tools/testing/selftests/bpf/prog_tests/tc_redirect.c

··· 188 188 { 189 189 struct nstoken *nstoken = NULL; 190 190 char src_fwd_addr[IFADDR_STR_LEN+1] = {}; 191 + char src_addr[IFADDR_STR_LEN + 1] = {}; 191 192 int err; 192 193 193 194 if (result->dev_mode == MODE_VETH) { ··· 207 206 } 208 207 209 208 if (get_ifaddr("src_fwd", src_fwd_addr)) 209 + goto fail; 210 + 211 + if (get_ifaddr("src", src_addr)) 210 212 goto fail; 211 213 212 214 result->ifindex_src = if_nametoindex("src"); ··· 274 270 SYS(fail, "ip route add " IP4_DST "/32 dev dst_fwd scope global"); 275 271 SYS(fail, "ip route add " IP6_DST "/128 dev dst_fwd scope global"); 276 272 273 + if (result->dev_mode == MODE_VETH) { 274 + SYS(fail, "ip neigh add " IP4_SRC " dev src_fwd lladdr %s", src_addr); 275 + SYS(fail, "ip neigh add " IP6_SRC " dev src_fwd lladdr %s", src_addr); 276 + SYS(fail, "ip neigh add " IP4_DST " dev dst_fwd lladdr %s", MAC_DST); 277 + SYS(fail, "ip neigh add " IP6_DST " dev dst_fwd lladdr %s", MAC_DST); 278 + } 279 + 277 280 close_netns(nstoken); 278 281 279 282 /** setup in 'dst' namespace */ ··· 291 280 SYS(fail, "ip addr add " IP4_DST "/32 dev dst"); 292 281 SYS(fail, "ip addr add " IP6_DST "/128 dev dst nodad"); 293 282 SYS(fail, "ip link set dev dst up"); 283 + SYS(fail, "ip link set dev lo up"); 294 284 295 285 SYS(fail, "ip route add " IP4_SRC "/32 dev dst scope global"); 296 286 SYS(fail, "ip route add " IP4_NET "/16 dev dst scope global"); ··· 469 457 return 0; 470 458 } 471 459 472 - static void rcv_tstamp(int fd, const char *expected, size_t s) 460 + static int __rcv_tstamp(int fd, const char *expected, size_t s, __u64 *tstamp) 473 461 { 474 462 struct __kernel_timespec pkt_ts = {}; 475 463 char ctl[CMSG_SPACE(sizeof(pkt_ts))]; ··· 490 478 491 479 ret = recvmsg(fd, &msg, 0); 492 480 if (!ASSERT_EQ(ret, s, "recvmsg")) 493 - return; 481 + return -1; 494 482 ASSERT_STRNEQ(data, expected, s, "expected rcv data"); 495 483 496 484 cmsg = CMSG_FIRSTHDR(&msg); ··· 499 487 memcpy(&pkt_ts, CMSG_DATA(cmsg), sizeof(pkt_ts)); 500 488 501 
489 pkt_ns = pkt_ts.tv_sec * NSEC_PER_SEC + pkt_ts.tv_nsec; 490 + if (tstamp) { 491 + /* caller will check the tstamp itself */ 492 + *tstamp = pkt_ns; 493 + return 0; 494 + } 495 + 502 496 ASSERT_NEQ(pkt_ns, 0, "pkt rcv tstamp"); 503 497 504 498 ret = clock_gettime(CLOCK_REALTIME, &now_ts); ··· 514 496 if (ASSERT_GE(now_ns, pkt_ns, "check rcv tstamp")) 515 497 ASSERT_LT(now_ns - pkt_ns, 5 * NSEC_PER_SEC, 516 498 "check rcv tstamp"); 499 + return 0; 500 + } 501 + 502 + static void rcv_tstamp(int fd, const char *expected, size_t s) 503 + { 504 + __rcv_tstamp(fd, expected, s, NULL); 505 + } 506 + 507 + static int wait_netstamp_needed_key(void) 508 + { 509 + int opt = 1, srv_fd = -1, cli_fd = -1, nretries = 0, err, n; 510 + char buf[] = "testing testing"; 511 + struct nstoken *nstoken; 512 + __u64 tstamp = 0; 513 + 514 + nstoken = open_netns(NS_DST); 515 + if (!nstoken) 516 + return -1; 517 + 518 + srv_fd = start_server(AF_INET6, SOCK_DGRAM, "::1", 0, 0); 519 + if (!ASSERT_GE(srv_fd, 0, "start_server")) 520 + goto done; 521 + 522 + err = setsockopt(srv_fd, SOL_SOCKET, SO_TIMESTAMPNS_NEW, 523 + &opt, sizeof(opt)); 524 + if (!ASSERT_OK(err, "setsockopt(SO_TIMESTAMPNS_NEW)")) 525 + goto done; 526 + 527 + cli_fd = connect_to_fd(srv_fd, TIMEOUT_MILLIS); 528 + if (!ASSERT_GE(cli_fd, 0, "connect_to_fd")) 529 + goto done; 530 + 531 + again: 532 + n = write(cli_fd, buf, sizeof(buf)); 533 + if (!ASSERT_EQ(n, sizeof(buf), "send to server")) 534 + goto done; 535 + err = __rcv_tstamp(srv_fd, buf, sizeof(buf), &tstamp); 536 + if (!ASSERT_OK(err, "__rcv_tstamp")) 537 + goto done; 538 + if (!tstamp && nretries++ < 5) { 539 + sleep(1); 540 + printf("netstamp_needed_key retry#%d\n", nretries); 541 + goto again; 542 + } 543 + 544 + done: 545 + if (!tstamp && srv_fd != -1) { 546 + close(srv_fd); 547 + srv_fd = -1; 548 + } 549 + if (cli_fd != -1) 550 + close(cli_fd); 551 + close_netns(nstoken); 552 + return srv_fd; 517 553 } 518 554 519 555 static void snd_tstamp(int fd, char *b, size_t 
s) ··· 904 832 { 905 833 struct test_tc_dtime *skel; 906 834 struct nstoken *nstoken; 907 - int err; 835 + int hold_tstamp_fd, err; 836 + 837 + /* Hold a sk with the SOCK_TIMESTAMP set to ensure there 838 + * is no delay in the kernel net_enable_timestamp(). 839 + * This ensures the following tests must have 840 + * non zero rcv tstamp in the recvmsg(). 841 + */ 842 + hold_tstamp_fd = wait_netstamp_needed_key(); 843 + if (!ASSERT_GE(hold_tstamp_fd, 0, "wait_netstamp_needed_key")) 844 + return; 908 845 909 846 skel = test_tc_dtime__open(); 910 847 if (!ASSERT_OK_PTR(skel, "test_tc_dtime__open")) 911 - return; 848 + goto done; 912 849 913 850 skel->rodata->IFINDEX_SRC = setup_result->ifindex_src_fwd; 914 851 skel->rodata->IFINDEX_DST = setup_result->ifindex_dst_fwd; ··· 962 881 963 882 done: 964 883 test_tc_dtime__destroy(skel); 884 + close(hold_tstamp_fd); 965 885 } 966 886 967 887 static void test_tc_redirect_neigh_fib(struct netns_setup_result *setup_result)

+150

tools/testing/selftests/bpf/prog_tests/tcp_custom_syncookie.c

··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* Copyright Amazon.com Inc. or its affiliates. */ 3 + 4 + #define _GNU_SOURCE 5 + #include <sched.h> 6 + #include <stdlib.h> 7 + #include <net/if.h> 8 + 9 + #include "test_progs.h" 10 + #include "cgroup_helpers.h" 11 + #include "network_helpers.h" 12 + #include "test_tcp_custom_syncookie.skel.h" 13 + 14 + static struct test_tcp_custom_syncookie_case { 15 + int family, type; 16 + char addr[16]; 17 + char name[10]; 18 + } test_cases[] = { 19 + { 20 + .name = "IPv4 TCP", 21 + .family = AF_INET, 22 + .type = SOCK_STREAM, 23 + .addr = "127.0.0.1", 24 + }, 25 + { 26 + .name = "IPv6 TCP", 27 + .family = AF_INET6, 28 + .type = SOCK_STREAM, 29 + .addr = "::1", 30 + }, 31 + }; 32 + 33 + static int setup_netns(void) 34 + { 35 + if (!ASSERT_OK(unshare(CLONE_NEWNET), "create netns")) 36 + return -1; 37 + 38 + if (!ASSERT_OK(system("ip link set dev lo up"), "ip")) 39 + goto err; 40 + 41 + if (!ASSERT_OK(write_sysctl("/proc/sys/net/ipv4/tcp_ecn", "1"), 42 + "write_sysctl")) 43 + goto err; 44 + 45 + return 0; 46 + err: 47 + return -1; 48 + } 49 + 50 + static int setup_tc(struct test_tcp_custom_syncookie *skel) 51 + { 52 + LIBBPF_OPTS(bpf_tc_hook, qdisc_lo, .attach_point = BPF_TC_INGRESS); 53 + LIBBPF_OPTS(bpf_tc_opts, tc_attach, 54 + .prog_fd = bpf_program__fd(skel->progs.tcp_custom_syncookie)); 55 + 56 + qdisc_lo.ifindex = if_nametoindex("lo"); 57 + if (!ASSERT_OK(bpf_tc_hook_create(&qdisc_lo), "qdisc add dev lo clsact")) 58 + goto err; 59 + 60 + if (!ASSERT_OK(bpf_tc_attach(&qdisc_lo, &tc_attach), 61 + "filter add dev lo ingress")) 62 + goto err; 63 + 64 + return 0; 65 + err: 66 + return -1; 67 + } 68 + 69 + #define msg "Hello World" 70 + #define msglen 11 71 + 72 + static void transfer_message(int sender, int receiver) 73 + { 74 + char buf[msglen]; 75 + int ret; 76 + 77 + ret = send(sender, msg, msglen, 0); 78 + if (!ASSERT_EQ(ret, msglen, "send")) 79 + return; 80 + 81 + memset(buf, 0, sizeof(buf)); 82 + 83 + ret = recv(receiver, 
buf, msglen, 0); 84 + if (!ASSERT_EQ(ret, msglen, "recv")) 85 + return; 86 + 87 + ret = strncmp(buf, msg, msglen); 88 + if (!ASSERT_EQ(ret, 0, "strncmp")) 89 + return; 90 + } 91 + 92 + static void create_connection(struct test_tcp_custom_syncookie_case *test_case) 93 + { 94 + int server, client, child; 95 + 96 + server = start_server(test_case->family, test_case->type, test_case->addr, 0, 0); 97 + if (!ASSERT_NEQ(server, -1, "start_server")) 98 + return; 99 + 100 + client = connect_to_fd(server, 0); 101 + if (!ASSERT_NEQ(client, -1, "connect_to_fd")) 102 + goto close_server; 103 + 104 + child = accept(server, NULL, 0); 105 + if (!ASSERT_NEQ(child, -1, "accept")) 106 + goto close_client; 107 + 108 + transfer_message(client, child); 109 + transfer_message(child, client); 110 + 111 + close(child); 112 + close_client: 113 + close(client); 114 + close_server: 115 + close(server); 116 + } 117 + 118 + void test_tcp_custom_syncookie(void) 119 + { 120 + struct test_tcp_custom_syncookie *skel; 121 + int i; 122 + 123 + if (setup_netns()) 124 + return; 125 + 126 + skel = test_tcp_custom_syncookie__open_and_load(); 127 + if (!ASSERT_OK_PTR(skel, "open_and_load")) 128 + return; 129 + 130 + if (setup_tc(skel)) 131 + goto destroy_skel; 132 + 133 + for (i = 0; i < ARRAY_SIZE(test_cases); i++) { 134 + if (!test__start_subtest(test_cases[i].name)) 135 + continue; 136 + 137 + skel->bss->handled_syn = false; 138 + skel->bss->handled_ack = false; 139 + 140 + create_connection(&test_cases[i]); 141 + 142 + ASSERT_EQ(skel->bss->handled_syn, true, "SYN is not handled at tc."); 143 + ASSERT_EQ(skel->bss->handled_ack, true, "ACK is not handled at tc"); 144 + } 145 + 146 + destroy_skel: 147 + system("tc qdisc del dev lo clsact"); 148 + 149 + test_tcp_custom_syncookie__destroy(skel); 150 + }

+75

tools/testing/selftests/bpf/prog_tests/test_struct_ops_module.c

··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */ 3 + #include <test_progs.h> 4 + #include <time.h> 5 + 6 + #include "struct_ops_module.skel.h" 7 + 8 + static void check_map_info(struct bpf_map_info *info) 9 + { 10 + struct bpf_btf_info btf_info; 11 + char btf_name[256]; 12 + u32 btf_info_len = sizeof(btf_info); 13 + int err, fd; 14 + 15 + fd = bpf_btf_get_fd_by_id(info->btf_vmlinux_id); 16 + if (!ASSERT_GE(fd, 0, "get_value_type_btf_obj_fd")) 17 + return; 18 + 19 + memset(&btf_info, 0, sizeof(btf_info)); 20 + btf_info.name = ptr_to_u64(btf_name); 21 + btf_info.name_len = sizeof(btf_name); 22 + err = bpf_btf_get_info_by_fd(fd, &btf_info, &btf_info_len); 23 + if (!ASSERT_OK(err, "get_value_type_btf_obj_info")) 24 + goto cleanup; 25 + 26 + if (!ASSERT_EQ(strcmp(btf_name, "bpf_testmod"), 0, "get_value_type_btf_obj_name")) 27 + goto cleanup; 28 + 29 + cleanup: 30 + close(fd); 31 + } 32 + 33 + static void test_struct_ops_load(void) 34 + { 35 + DECLARE_LIBBPF_OPTS(bpf_object_open_opts, opts); 36 + struct struct_ops_module *skel; 37 + struct bpf_map_info info = {}; 38 + struct bpf_link *link; 39 + int err; 40 + u32 len; 41 + 42 + skel = struct_ops_module__open_opts(&opts); 43 + if (!ASSERT_OK_PTR(skel, "struct_ops_module_open")) 44 + return; 45 + 46 + err = struct_ops_module__load(skel); 47 + if (!ASSERT_OK(err, "struct_ops_module_load")) 48 + goto cleanup; 49 + 50 + len = sizeof(info); 51 + err = bpf_map_get_info_by_fd(bpf_map__fd(skel->maps.testmod_1), &info, 52 + &len); 53 + if (!ASSERT_OK(err, "bpf_map_get_info_by_fd")) 54 + goto cleanup; 55 + 56 + link = bpf_map__attach_struct_ops(skel->maps.testmod_1); 57 + ASSERT_OK_PTR(link, "attach_test_mod_1"); 58 + 59 + /* test_2() will be called from bpf_dummy_reg() in bpf_testmod.c */ 60 + ASSERT_EQ(skel->bss->test_2_result, 7, "test_2_result"); 61 + 62 + bpf_link__destroy(link); 63 + 64 + check_map_info(&info); 65 + 66 + cleanup: 67 + 
struct_ops_module__destroy(skel); 68 + } 69 + 70 + void serial_test_struct_ops_module(void) 71 + { 72 + if (test__start_subtest("test_struct_ops_load")) 73 + test_struct_ops_load(); 74 + } 75 +

+1052

tools/testing/selftests/bpf/prog_tests/token.c

··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* Copyright (c) 2023 Meta Platforms, Inc. and affiliates. */ 3 + #define _GNU_SOURCE 4 + #include <test_progs.h> 5 + #include <bpf/btf.h> 6 + #include "cap_helpers.h" 7 + #include <fcntl.h> 8 + #include <sched.h> 9 + #include <signal.h> 10 + #include <unistd.h> 11 + #include <linux/filter.h> 12 + #include <linux/unistd.h> 13 + #include <linux/mount.h> 14 + #include <sys/socket.h> 15 + #include <sys/stat.h> 16 + #include <sys/syscall.h> 17 + #include <sys/un.h> 18 + #include "priv_map.skel.h" 19 + #include "priv_prog.skel.h" 20 + #include "dummy_st_ops_success.skel.h" 21 + #include "token_lsm.skel.h" 22 + 23 + static inline int sys_mount(const char *dev_name, const char *dir_name, 24 + const char *type, unsigned long flags, 25 + const void *data) 26 + { 27 + return syscall(__NR_mount, dev_name, dir_name, type, flags, data); 28 + } 29 + 30 + static inline int sys_fsopen(const char *fsname, unsigned flags) 31 + { 32 + return syscall(__NR_fsopen, fsname, flags); 33 + } 34 + 35 + static inline int sys_fspick(int dfd, const char *path, unsigned flags) 36 + { 37 + return syscall(__NR_fspick, dfd, path, flags); 38 + } 39 + 40 + static inline int sys_fsconfig(int fs_fd, unsigned cmd, const char *key, const void *val, int aux) 41 + { 42 + return syscall(__NR_fsconfig, fs_fd, cmd, key, val, aux); 43 + } 44 + 45 + static inline int sys_fsmount(int fs_fd, unsigned flags, unsigned ms_flags) 46 + { 47 + return syscall(__NR_fsmount, fs_fd, flags, ms_flags); 48 + } 49 + 50 + static inline int sys_move_mount(int from_dfd, const char *from_path, 51 + int to_dfd, const char *to_path, 52 + unsigned flags) 53 + { 54 + return syscall(__NR_move_mount, from_dfd, from_path, to_dfd, to_path, flags); 55 + } 56 + 57 + static int drop_priv_caps(__u64 *old_caps) 58 + { 59 + return cap_disable_effective((1ULL << CAP_BPF) | 60 + (1ULL << CAP_PERFMON) | 61 + (1ULL << CAP_NET_ADMIN) | 62 + (1ULL << CAP_SYS_ADMIN), old_caps); 63 + } 64 + 65 + static 
int restore_priv_caps(__u64 old_caps)
66 + {
67 + 	return cap_enable_effective(old_caps, NULL);
68 + }
69 +
70 + static int set_delegate_mask(int fs_fd, const char *key, __u64 mask, const char *mask_str)
71 + {
72 + 	char buf[32];
73 + 	int err;
74 +
75 + 	if (!mask_str) {
76 + 		if (mask == ~0ULL) {
77 + 			mask_str = "any";
78 + 		} else {
79 + 			snprintf(buf, sizeof(buf), "0x%llx", (unsigned long long)mask);
80 + 			mask_str = buf;
81 + 		}
82 + 	}
83 +
84 + 	err = sys_fsconfig(fs_fd, FSCONFIG_SET_STRING, key,
85 + 			   mask_str, 0);
86 + 	if (err < 0)
87 + 		err = -errno;
88 + 	return err;
89 + }
90 +
91 + #define zclose(fd) do { if (fd >= 0) close(fd); fd = -1; } while (0)
92 +
93 + struct bpffs_opts {
94 + 	__u64 cmds;
95 + 	__u64 maps;
96 + 	__u64 progs;
97 + 	__u64 attachs;
98 + 	const char *cmds_str;
99 + 	const char *maps_str;
100 + 	const char *progs_str;
101 + 	const char *attachs_str;
102 + };
103 +
104 + static int create_bpffs_fd(void)
105 + {
106 + 	int fs_fd;
107 +
108 + 	/* create VFS context */
109 + 	fs_fd = sys_fsopen("bpf", 0);
110 + 	ASSERT_GE(fs_fd, 0, "fs_fd");
111 +
112 + 	return fs_fd;
113 + }
114 +
115 + static int materialize_bpffs_fd(int fs_fd, struct bpffs_opts *opts)
116 + {
117 + 	int mnt_fd, err;
118 +
119 + 	/* set up token delegation mount options */
120 + 	err = set_delegate_mask(fs_fd, "delegate_cmds", opts->cmds, opts->cmds_str);
121 + 	if (!ASSERT_OK(err, "fs_cfg_cmds"))
122 + 		return err;
123 + 	err = set_delegate_mask(fs_fd, "delegate_maps", opts->maps, opts->maps_str);
124 + 	if (!ASSERT_OK(err, "fs_cfg_maps"))
125 + 		return err;
126 + 	err = set_delegate_mask(fs_fd, "delegate_progs", opts->progs, opts->progs_str);
127 + 	if (!ASSERT_OK(err, "fs_cfg_progs"))
128 + 		return err;
129 + 	err = set_delegate_mask(fs_fd, "delegate_attachs", opts->attachs, opts->attachs_str);
130 + 	if (!ASSERT_OK(err, "fs_cfg_attachs"))
131 + 		return err;
132 +
133 + 	/* instantiate FS object */
134 + 	err = sys_fsconfig(fs_fd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
135 + 	if (err < 0)
136 + 		return -errno;
137 +
138 + 	/* create O_PATH fd for detached mount */
139 + 	mnt_fd = sys_fsmount(fs_fd, 0, 0);
140 + 	if (err < 0)
141 + 		return -errno;
142 +
143 + 	return mnt_fd;
144 + }
145 +
146 + /* send FD over Unix domain (AF_UNIX) socket */
147 + static int sendfd(int sockfd, int fd)
148 + {
149 + 	struct msghdr msg = {};
150 + 	struct cmsghdr *cmsg;
151 + 	int fds[1] = { fd }, err;
152 + 	char iobuf[1];
153 + 	struct iovec io = {
154 + 		.iov_base = iobuf,
155 + 		.iov_len = sizeof(iobuf),
156 + 	};
157 + 	union {
158 + 		char buf[CMSG_SPACE(sizeof(fds))];
159 + 		struct cmsghdr align;
160 + 	} u;
161 +
162 + 	msg.msg_iov = &io;
163 + 	msg.msg_iovlen = 1;
164 + 	msg.msg_control = u.buf;
165 + 	msg.msg_controllen = sizeof(u.buf);
166 + 	cmsg = CMSG_FIRSTHDR(&msg);
167 + 	cmsg->cmsg_level = SOL_SOCKET;
168 + 	cmsg->cmsg_type = SCM_RIGHTS;
169 + 	cmsg->cmsg_len = CMSG_LEN(sizeof(fds));
170 + 	memcpy(CMSG_DATA(cmsg), fds, sizeof(fds));
171 +
172 + 	err = sendmsg(sockfd, &msg, 0);
173 + 	if (err < 0)
174 + 		err = -errno;
175 + 	if (!ASSERT_EQ(err, 1, "sendmsg"))
176 + 		return -EINVAL;
177 +
178 + 	return 0;
179 + }
180 +
181 + /* receive FD over Unix domain (AF_UNIX) socket */
182 + static int recvfd(int sockfd, int *fd)
183 + {
184 + 	struct msghdr msg = {};
185 + 	struct cmsghdr *cmsg;
186 + 	int fds[1], err;
187 + 	char iobuf[1];
188 + 	struct iovec io = {
189 + 		.iov_base = iobuf,
190 + 		.iov_len = sizeof(iobuf),
191 + 	};
192 + 	union {
193 + 		char buf[CMSG_SPACE(sizeof(fds))];
194 + 		struct cmsghdr align;
195 + 	} u;
196 +
197 + 	msg.msg_iov = &io;
198 + 	msg.msg_iovlen = 1;
199 + 	msg.msg_control = u.buf;
200 + 	msg.msg_controllen = sizeof(u.buf);
201 +
202 + 	err = recvmsg(sockfd, &msg, 0);
203 + 	if (err < 0)
204 + 		err = -errno;
205 + 	if (!ASSERT_EQ(err, 1, "recvmsg"))
206 + 		return -EINVAL;
207 +
208 + 	cmsg = CMSG_FIRSTHDR(&msg);
209 + 	if (!ASSERT_OK_PTR(cmsg, "cmsg_null") ||
210 + 	    !ASSERT_EQ(cmsg->cmsg_len, CMSG_LEN(sizeof(fds)), "cmsg_len") ||
211 + 	    !ASSERT_EQ(cmsg->cmsg_level, SOL_SOCKET, "cmsg_level") ||
212 + 	    !ASSERT_EQ(cmsg->cmsg_type, SCM_RIGHTS, "cmsg_type"))
213 + 		return -EINVAL;
214 +
215 + 	memcpy(fds, CMSG_DATA(cmsg), sizeof(fds));
216 + 	*fd = fds[0];
217 +
218 + 	return 0;
219 + }
220 +
221 + static ssize_t write_nointr(int fd, const void *buf, size_t count)
222 + {
223 + 	ssize_t ret;
224 +
225 + 	do {
226 + 		ret = write(fd, buf, count);
227 + 	} while (ret < 0 && errno == EINTR);
228 +
229 + 	return ret;
230 + }
231 +
232 + static int write_file(const char *path, const void *buf, size_t count)
233 + {
234 + 	int fd;
235 + 	ssize_t ret;
236 +
237 + 	fd = open(path, O_WRONLY | O_CLOEXEC | O_NOCTTY | O_NOFOLLOW);
238 + 	if (fd < 0)
239 + 		return -1;
240 +
241 + 	ret = write_nointr(fd, buf, count);
242 + 	close(fd);
243 + 	if (ret < 0 || (size_t)ret != count)
244 + 		return -1;
245 +
246 + 	return 0;
247 + }
248 +
249 + static int create_and_enter_userns(void)
250 + {
251 + 	uid_t uid;
252 + 	gid_t gid;
253 + 	char map[100];
254 +
255 + 	uid = getuid();
256 + 	gid = getgid();
257 +
258 + 	if (unshare(CLONE_NEWUSER))
259 + 		return -1;
260 +
261 + 	if (write_file("/proc/self/setgroups", "deny", sizeof("deny") - 1) &&
262 + 	    errno != ENOENT)
263 + 		return -1;
264 +
265 + 	snprintf(map, sizeof(map), "0 %d 1", uid);
266 + 	if (write_file("/proc/self/uid_map", map, strlen(map)))
267 + 		return -1;
268 +
269 +
270 + 	snprintf(map, sizeof(map), "0 %d 1", gid);
271 + 	if (write_file("/proc/self/gid_map", map, strlen(map)))
272 + 		return -1;
273 +
274 + 	if (setgid(0))
275 + 		return -1;
276 +
277 + 	if (setuid(0))
278 + 		return -1;
279 +
280 + 	return 0;
281 + }
282 +
283 + typedef int (*child_callback_fn)(int bpffs_fd, struct token_lsm *lsm_skel);
284 +
285 + static void child(int sock_fd, struct bpffs_opts *opts, child_callback_fn callback)
286 + {
287 + 	int mnt_fd = -1, fs_fd = -1, err = 0, bpffs_fd = -1, token_fd = -1;
288 + 	struct token_lsm *lsm_skel = NULL;
289 +
290 + 	/* load and attach LSM "policy" before we go into unpriv userns */
291 + 	lsm_skel = token_lsm__open_and_load();
292 + 	if (!ASSERT_OK_PTR(lsm_skel, "lsm_skel_load")) {
293 + 		err = -EINVAL;
294 + 		goto cleanup;
295 + 	}
296 + 	lsm_skel->bss->my_pid = getpid();
297 + 	err = token_lsm__attach(lsm_skel);
298 + 	if (!ASSERT_OK(err, "lsm_skel_attach"))
299 + 		goto cleanup;
300 +
301 + 	/* setup userns with root mappings */
302 + 	err = create_and_enter_userns();
303 + 	if (!ASSERT_OK(err, "create_and_enter_userns"))
304 + 		goto cleanup;
305 +
306 + 	/* setup mountns to allow creating BPF FS (fsopen("bpf")) from unpriv process */
307 + 	err = unshare(CLONE_NEWNS);
308 + 	if (!ASSERT_OK(err, "create_mountns"))
309 + 		goto cleanup;
310 +
311 + 	err = sys_mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, 0);
312 + 	if (!ASSERT_OK(err, "remount_root"))
313 + 		goto cleanup;
314 +
315 + 	fs_fd = create_bpffs_fd();
316 + 	if (!ASSERT_GE(fs_fd, 0, "create_bpffs_fd")) {
317 + 		err = -EINVAL;
318 + 		goto cleanup;
319 + 	}
320 +
321 + 	/* ensure unprivileged child cannot set delegation options */
322 + 	err = set_delegate_mask(fs_fd, "delegate_cmds", 0x1, NULL);
323 + 	ASSERT_EQ(err, -EPERM, "delegate_cmd_eperm");
324 + 	err = set_delegate_mask(fs_fd, "delegate_maps", 0x1, NULL);
325 + 	ASSERT_EQ(err, -EPERM, "delegate_maps_eperm");
326 + 	err = set_delegate_mask(fs_fd, "delegate_progs", 0x1, NULL);
327 + 	ASSERT_EQ(err, -EPERM, "delegate_progs_eperm");
328 + 	err = set_delegate_mask(fs_fd, "delegate_attachs", 0x1, NULL);
329 + 	ASSERT_EQ(err, -EPERM, "delegate_attachs_eperm");
330 +
331 + 	/* pass BPF FS context object to parent */
332 + 	err = sendfd(sock_fd, fs_fd);
333 + 	if (!ASSERT_OK(err, "send_fs_fd"))
334 + 		goto cleanup;
335 + 	zclose(fs_fd);
336 +
337 + 	/* avoid mucking around with mount namespaces and mounting at
338 + 	 * well-known path, just get detach-mounted BPF FS fd back from parent
339 + 	 */
340 + 	err = recvfd(sock_fd, &mnt_fd);
341 + 	if (!ASSERT_OK(err, "recv_mnt_fd"))
342 + 		goto cleanup;
343 +
344 + 	/* try to fspick() BPF FS and try to add some delegation options */
345 + 	fs_fd = sys_fspick(mnt_fd, "", FSPICK_EMPTY_PATH);
346 + 	if (!ASSERT_GE(fs_fd, 0, "bpffs_fspick")) {
347 + 		err = -EINVAL;
348 + 		goto cleanup;
349 + 	}
350 +
351 + 	/* ensure unprivileged child cannot reconfigure to set delegation options */
352 + 	err = set_delegate_mask(fs_fd, "delegate_cmds", 0, "any");
353 + 	if (!ASSERT_EQ(err, -EPERM, "delegate_cmd_eperm_reconfig")) {
354 + 		err = -EINVAL;
355 + 		goto cleanup;
356 + 	}
357 + 	err = set_delegate_mask(fs_fd, "delegate_maps", 0, "any");
358 + 	if (!ASSERT_EQ(err, -EPERM, "delegate_maps_eperm_reconfig")) {
359 + 		err = -EINVAL;
360 + 		goto cleanup;
361 + 	}
362 + 	err = set_delegate_mask(fs_fd, "delegate_progs", 0, "any");
363 + 	if (!ASSERT_EQ(err, -EPERM, "delegate_progs_eperm_reconfig")) {
364 + 		err = -EINVAL;
365 + 		goto cleanup;
366 + 	}
367 + 	err = set_delegate_mask(fs_fd, "delegate_attachs", 0, "any");
368 + 	if (!ASSERT_EQ(err, -EPERM, "delegate_attachs_eperm_reconfig")) {
369 + 		err = -EINVAL;
370 + 		goto cleanup;
371 + 	}
372 + 	zclose(fs_fd);
373 +
374 + 	bpffs_fd = openat(mnt_fd, ".", 0, O_RDWR);
375 + 	if (!ASSERT_GE(bpffs_fd, 0, "bpffs_open")) {
376 + 		err = -EINVAL;
377 + 		goto cleanup;
378 + 	}
379 +
380 + 	/* create BPF token FD and pass it to parent for some extra checks */
381 + 	token_fd = bpf_token_create(bpffs_fd, NULL);
382 + 	if (!ASSERT_GT(token_fd, 0, "child_token_create")) {
383 + 		err = -EINVAL;
384 + 		goto cleanup;
385 + 	}
386 + 	err = sendfd(sock_fd, token_fd);
387 + 	if (!ASSERT_OK(err, "send_token_fd"))
388 + 		goto cleanup;
389 + 	zclose(token_fd);
390 +
391 + 	/* do custom test logic with customly set up BPF FS instance */
392 + 	err = callback(bpffs_fd, lsm_skel);
393 + 	if (!ASSERT_OK(err, "test_callback"))
394 + 		goto cleanup;
395 +
396 + 	err = 0;
397 + cleanup:
398 + 	zclose(sock_fd);
399 + 	zclose(mnt_fd);
400 + 	zclose(fs_fd);
401 + 	zclose(bpffs_fd);
402 + 	zclose(token_fd);
403 +
404 + 	lsm_skel->bss->my_pid = 0;
405 + 	token_lsm__destroy(lsm_skel);
406 +
407 + 	exit(-err);
408 + }
409 +
410 + static int wait_for_pid(pid_t pid)
411 + {
412 + 	int status, ret;
413 +
414 + again:
415 + 	ret = waitpid(pid, &status, 0);
416 + 	if (ret == -1) {
417 + 		if (errno == EINTR)
418 + 			goto again;
419 +
420 + 		return -1;
421 + 	}
422 +
423 + 	if (!WIFEXITED(status))
424 + 		return -1;
425 +
426 + 	return WEXITSTATUS(status);
427 + }
428 +
429 + static void parent(int child_pid, struct bpffs_opts *bpffs_opts, int sock_fd)
430 + {
431 + 	int fs_fd = -1, mnt_fd = -1, token_fd = -1, err;
432 +
433 + 	err = recvfd(sock_fd, &fs_fd);
434 + 	if (!ASSERT_OK(err, "recv_bpffs_fd"))
435 + 		goto cleanup;
436 +
437 + 	mnt_fd = materialize_bpffs_fd(fs_fd, bpffs_opts);
438 + 	if (!ASSERT_GE(mnt_fd, 0, "materialize_bpffs_fd")) {
439 + 		err = -EINVAL;
440 + 		goto cleanup;
441 + 	}
442 + 	zclose(fs_fd);
443 +
444 + 	/* pass BPF FS context object to parent */
445 + 	err = sendfd(sock_fd, mnt_fd);
446 + 	if (!ASSERT_OK(err, "send_mnt_fd"))
447 + 		goto cleanup;
448 + 	zclose(mnt_fd);
449 +
450 + 	/* receive BPF token FD back from child for some extra tests */
451 + 	err = recvfd(sock_fd, &token_fd);
452 + 	if (!ASSERT_OK(err, "recv_token_fd"))
453 + 		goto cleanup;
454 +
455 + 	err = wait_for_pid(child_pid);
456 + 	ASSERT_OK(err, "waitpid_child");
457 +
458 + cleanup:
459 + 	zclose(sock_fd);
460 + 	zclose(fs_fd);
461 + 	zclose(mnt_fd);
462 + 	zclose(token_fd);
463 +
464 + 	if (child_pid > 0)
465 + 		(void)kill(child_pid, SIGKILL);
466 + }
467 +
468 + static void subtest_userns(struct bpffs_opts *bpffs_opts,
469 + 			   child_callback_fn child_cb)
470 + {
471 + 	int sock_fds[2] = { -1, -1 };
472 + 	int child_pid = 0, err;
473 +
474 + 	err = socketpair(AF_UNIX, SOCK_STREAM, 0, sock_fds);
475 + 	if (!ASSERT_OK(err, "socketpair"))
476 + 		goto cleanup;
477 +
478 + 	child_pid = fork();
479 + 	if (!ASSERT_GE(child_pid, 0, "fork"))
480 + 		goto cleanup;
481 +
482 + 	if (child_pid == 0) {
483 + 		zclose(sock_fds[0]);
484 + 		return child(sock_fds[1], bpffs_opts, child_cb);
485 +
486 + 	} else {
487 + 		zclose(sock_fds[1]);
488 + 		return parent(child_pid, bpffs_opts, sock_fds[0]);
489 + 	}
490 +
491 + cleanup:
492 + 	zclose(sock_fds[0]);
493 + 	zclose(sock_fds[1]);
494 + 	if (child_pid > 0)
495 + 		(void)kill(child_pid, SIGKILL);
496 + }
497 +
498 + static int userns_map_create(int mnt_fd, struct token_lsm *lsm_skel)
499 + {
500 + 	LIBBPF_OPTS(bpf_map_create_opts, map_opts);
501 + 	int err, token_fd = -1, map_fd = -1;
502 + 	__u64 old_caps = 0;
503 +
504 + 	/* create BPF token from BPF FS mount */
505 + 	token_fd = bpf_token_create(mnt_fd, NULL);
506 + 	if (!ASSERT_GT(token_fd, 0, "token_create")) {
507 + 		err = -EINVAL;
508 + 		goto cleanup;
509 + 	}
510 +
511 + 	/* while inside non-init userns, we need both a BPF token *and*
512 + 	 * CAP_BPF inside current userns to create privileged map; let's test
513 + 	 * that neither BPF token alone nor namespaced CAP_BPF is sufficient
514 + 	 */
515 + 	err = drop_priv_caps(&old_caps);
516 + 	if (!ASSERT_OK(err, "drop_caps"))
517 + 		goto cleanup;
518 +
519 + 	/* no token, no CAP_BPF -> fail */
520 + 	map_opts.map_flags = 0;
521 + 	map_opts.token_fd = 0;
522 + 	map_fd = bpf_map_create(BPF_MAP_TYPE_STACK, "wo_token_wo_bpf", 0, 8, 1, &map_opts);
523 + 	if (!ASSERT_LT(map_fd, 0, "stack_map_wo_token_wo_cap_bpf_should_fail")) {
524 + 		err = -EINVAL;
525 + 		goto cleanup;
526 + 	}
527 +
528 + 	/* token without CAP_BPF -> fail */
529 + 	map_opts.map_flags = BPF_F_TOKEN_FD;
530 + 	map_opts.token_fd = token_fd;
531 + 	map_fd = bpf_map_create(BPF_MAP_TYPE_STACK, "w_token_wo_bpf", 0, 8, 1, &map_opts);
532 + 	if (!ASSERT_LT(map_fd, 0, "stack_map_w_token_wo_cap_bpf_should_fail")) {
533 + 		err = -EINVAL;
534 + 		goto cleanup;
535 + 	}
536 +
537 + 	/* get back effective local CAP_BPF (and CAP_SYS_ADMIN) */
538 + 	err = restore_priv_caps(old_caps);
539 + 	if (!ASSERT_OK(err, "restore_caps"))
540 + 		goto cleanup;
541 +
542 + 	/* CAP_BPF without token -> fail */
543 + 	map_opts.map_flags = 0;
544 + 	map_opts.token_fd = 0;
545 + 	map_fd = bpf_map_create(BPF_MAP_TYPE_STACK, "wo_token_w_bpf", 0, 8, 1, &map_opts);
546 + 	if (!ASSERT_LT(map_fd, 0, "stack_map_wo_token_w_cap_bpf_should_fail")) {
547 + 		err = -EINVAL;
548 + 		goto cleanup;
549 + 	}
550 +
551 + 	/* finally, namespaced CAP_BPF + token -> success */
552 + 	map_opts.map_flags = BPF_F_TOKEN_FD;
553 + 	map_opts.token_fd = token_fd;
554 + 	map_fd = bpf_map_create(BPF_MAP_TYPE_STACK, "w_token_w_bpf", 0, 8, 1, &map_opts);
555 + 	if (!ASSERT_GT(map_fd, 0, "stack_map_w_token_w_cap_bpf")) {
556 + 		err = -EINVAL;
557 + 		goto cleanup;
558 + 	}
559 +
560 + cleanup:
561 + 	zclose(token_fd);
562 + 	zclose(map_fd);
563 + 	return err;
564 + }
565 +
566 + static int userns_btf_load(int mnt_fd, struct token_lsm *lsm_skel)
567 + {
568 + 	LIBBPF_OPTS(bpf_btf_load_opts, btf_opts);
569 + 	int err, token_fd = -1, btf_fd = -1;
570 + 	const void *raw_btf_data;
571 + 	struct btf *btf = NULL;
572 + 	__u32 raw_btf_size;
573 + 	__u64 old_caps = 0;
574 +
575 + 	/* create BPF token from BPF FS mount */
576 + 	token_fd = bpf_token_create(mnt_fd, NULL);
577 + 	if (!ASSERT_GT(token_fd, 0, "token_create")) {
578 + 		err = -EINVAL;
579 + 		goto cleanup;
580 + 	}
581 +
582 + 	/* while inside non-init userns, we need both a BPF token *and*
583 + 	 * CAP_BPF inside current userns to create privileged map; let's test
584 + 	 * that neither BPF token alone nor namespaced CAP_BPF is sufficient
585 + 	 */
586 + 	err = drop_priv_caps(&old_caps);
587 + 	if (!ASSERT_OK(err, "drop_caps"))
588 + 		goto cleanup;
589 +
590 + 	/* setup a trivial BTF data to load to the kernel */
591 + 	btf = btf__new_empty();
592 + 	if (!ASSERT_OK_PTR(btf, "empty_btf"))
593 + 		goto cleanup;
594 +
595 + 	ASSERT_GT(btf__add_int(btf, "int", 4, 0), 0, "int_type");
596 +
597 + 	raw_btf_data = btf__raw_data(btf, &raw_btf_size);
598 + 	if (!ASSERT_OK_PTR(raw_btf_data, "raw_btf_data"))
599 + 		goto cleanup;
600 +
601 + 	/* no token + no CAP_BPF -> failure */
602 + 	btf_opts.btf_flags = 0;
603 + 	btf_opts.token_fd = 0;
604 + 	btf_fd = bpf_btf_load(raw_btf_data, raw_btf_size, &btf_opts);
605 + 	if (!ASSERT_LT(btf_fd, 0, "no_token_no_cap_should_fail"))
606 + 		goto cleanup;
607 +
608 + 	/* token + no CAP_BPF -> failure */
609 + 	btf_opts.btf_flags = BPF_F_TOKEN_FD;
610 + 	btf_opts.token_fd = token_fd;
611 + 	btf_fd = bpf_btf_load(raw_btf_data, raw_btf_size, &btf_opts);
612 + 	if (!ASSERT_LT(btf_fd, 0, "token_no_cap_should_fail"))
613 + 		goto cleanup;
614 +
615 + 	/* get back effective local CAP_BPF (and CAP_SYS_ADMIN) */
616 + 	err = restore_priv_caps(old_caps);
617 + 	if (!ASSERT_OK(err, "restore_caps"))
618 + 		goto cleanup;
619 +
620 + 	/* token + CAP_BPF -> success */
621 + 	btf_opts.btf_flags = BPF_F_TOKEN_FD;
622 + 	btf_opts.token_fd = token_fd;
623 + 	btf_fd = bpf_btf_load(raw_btf_data, raw_btf_size, &btf_opts);
624 + 	if (!ASSERT_GT(btf_fd, 0, "token_and_cap_success"))
625 + 		goto cleanup;
626 +
627 + 	err = 0;
628 + cleanup:
629 + 	btf__free(btf);
630 + 	zclose(btf_fd);
631 + 	zclose(token_fd);
632 + 	return err;
633 + }
634 +
635 + static int userns_prog_load(int mnt_fd, struct token_lsm *lsm_skel)
636 + {
637 + 	LIBBPF_OPTS(bpf_prog_load_opts, prog_opts);
638 + 	int err, token_fd = -1, prog_fd = -1;
639 + 	struct bpf_insn insns[] = {
640 + 		/* bpf_jiffies64() requires CAP_BPF */
641 + 		BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_jiffies64),
642 + 		/* bpf_get_current_task() requires CAP_PERFMON */
643 + 		BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_current_task),
644 + 		/* r0 = 0; exit; */
645 + 		BPF_MOV64_IMM(BPF_REG_0, 0),
646 + 		BPF_EXIT_INSN(),
647 + 	};
648 + 	size_t insn_cnt = ARRAY_SIZE(insns);
649 + 	__u64 old_caps = 0;
650 +
651 + 	/* create BPF token from BPF FS mount */
652 + 	token_fd = bpf_token_create(mnt_fd, NULL);
653 + 	if (!ASSERT_GT(token_fd, 0, "token_create")) {
654 + 		err = -EINVAL;
655 + 		goto cleanup;
656 + 	}
657 +
658 + 	/* validate we can successfully load BPF program with token; this
659 + 	 * being XDP program (CAP_NET_ADMIN) using bpf_jiffies64() (CAP_BPF)
660 + 	 * and bpf_get_current_task() (CAP_PERFMON) helpers validates we have
661 + 	 * BPF token wired properly in a bunch of places in the kernel
662 + 	 */
663 + 	prog_opts.prog_flags = BPF_F_TOKEN_FD;
664 + 	prog_opts.token_fd = token_fd;
665 + 	prog_opts.expected_attach_type = BPF_XDP;
666 + 	prog_fd = bpf_prog_load(BPF_PROG_TYPE_XDP, "token_prog", "GPL",
667 + 				insns, insn_cnt, &prog_opts);
668 + 	if (!ASSERT_GT(prog_fd, 0, "prog_fd")) {
669 + 		err = -EPERM;
670 + 		goto cleanup;
671 + 	}
672 +
673 + 	/* no token + caps -> failure */
674 + 	prog_opts.prog_flags = 0;
675 + 	prog_opts.token_fd = 0;
676 + 	prog_fd = bpf_prog_load(BPF_PROG_TYPE_XDP, "token_prog", "GPL",
677 + 				insns, insn_cnt, &prog_opts);
678 + 	if (!ASSERT_EQ(prog_fd, -EPERM, "prog_fd_eperm")) {
679 + 		err = -EPERM;
680 + 		goto cleanup;
681 + 	}
682 +
683 + 	err = drop_priv_caps(&old_caps);
684 + 	if (!ASSERT_OK(err, "drop_caps"))
685 + 		goto cleanup;
686 +
687 + 	/* no caps + token -> failure */
688 + 	prog_opts.prog_flags = BPF_F_TOKEN_FD;
689 + 	prog_opts.token_fd = token_fd;
690 + 	prog_fd = bpf_prog_load(BPF_PROG_TYPE_XDP, "token_prog", "GPL",
691 + 				insns, insn_cnt, &prog_opts);
692 + 	if (!ASSERT_EQ(prog_fd, -EPERM, "prog_fd_eperm")) {
693 + 		err = -EPERM;
694 + 		goto cleanup;
695 + 	}
696 +
697 + 	/* no caps + no token -> definitely a failure */
698 + 	prog_opts.prog_flags = 0;
699 + 	prog_opts.token_fd = 0;
700 + 	prog_fd = bpf_prog_load(BPF_PROG_TYPE_XDP, "token_prog", "GPL",
701 + 				insns, insn_cnt, &prog_opts);
702 + 	if (!ASSERT_EQ(prog_fd, -EPERM, "prog_fd_eperm")) {
703 + 		err = -EPERM;
704 + 		goto cleanup;
705 + 	}
706 +
707 + 	err = 0;
708 + cleanup:
709 + 	zclose(prog_fd);
710 + 	zclose(token_fd);
711 + 	return err;
712 + }
713 +
714 + static int userns_obj_priv_map(int mnt_fd, struct token_lsm *lsm_skel)
715 + {
716 + 	LIBBPF_OPTS(bpf_object_open_opts, opts);
717 + 	char buf[256];
718 + 	struct priv_map *skel;
719 + 	int err;
720 +
721 + 	skel = priv_map__open_and_load();
722 + 	if (!ASSERT_ERR_PTR(skel, "obj_tokenless_load")) {
723 + 		priv_map__destroy(skel);
724 + 		return -EINVAL;
725 + 	}
726 +
727 + 	/* use bpf_token_path to provide BPF FS path */
728 + 	snprintf(buf, sizeof(buf), "/proc/self/fd/%d", mnt_fd);
729 + 	opts.bpf_token_path = buf;
730 + 	skel = priv_map__open_opts(&opts);
731 + 	if (!ASSERT_OK_PTR(skel, "obj_token_path_open"))
732 + 		return -EINVAL;
733 +
734 + 	err = priv_map__load(skel);
735 + 	priv_map__destroy(skel);
736 + 	if (!ASSERT_OK(err, "obj_token_path_load"))
737 + 		return -EINVAL;
738 +
739 + 	return 0;
740 + }
741 +
742 + static int userns_obj_priv_prog(int mnt_fd, struct token_lsm *lsm_skel)
743 + {
744 + 	LIBBPF_OPTS(bpf_object_open_opts, opts);
745 + 	char buf[256];
746 + 	struct priv_prog *skel;
747 + 	int err;
748 +
749 + 	skel = priv_prog__open_and_load();
750 + 	if (!ASSERT_ERR_PTR(skel, "obj_tokenless_load")) {
751 + 		priv_prog__destroy(skel);
752 + 		return -EINVAL;
753 + 	}
754 +
755 + 	/* use bpf_token_path to provide BPF FS path */
756 + 	snprintf(buf, sizeof(buf), "/proc/self/fd/%d", mnt_fd);
757 + 	opts.bpf_token_path = buf;
758 + 	skel = priv_prog__open_opts(&opts);
759 + 	if (!ASSERT_OK_PTR(skel, "obj_token_path_open"))
760 + 		return -EINVAL;
761 + 	err = priv_prog__load(skel);
762 + 	priv_prog__destroy(skel);
763 + 	if (!ASSERT_OK(err, "obj_token_path_load"))
764 + 		return -EINVAL;
765 +
766 + 	/* provide BPF token, but reject bpf_token_capable() with LSM */
767 + 	lsm_skel->bss->reject_capable = true;
768 + 	lsm_skel->bss->reject_cmd = false;
769 + 	skel = priv_prog__open_opts(&opts);
770 + 	if (!ASSERT_OK_PTR(skel, "obj_token_lsm_reject_cap_open"))
771 + 		return -EINVAL;
772 + 	err = priv_prog__load(skel);
773 + 	priv_prog__destroy(skel);
774 + 	if (!ASSERT_ERR(err, "obj_token_lsm_reject_cap_load"))
775 + 		return -EINVAL;
776 +
777 + 	/* provide BPF token, but reject bpf_token_cmd() with LSM */
778 + 	lsm_skel->bss->reject_capable = false;
779 + 	lsm_skel->bss->reject_cmd = true;
780 + 	skel = priv_prog__open_opts(&opts);
781 + 	if (!ASSERT_OK_PTR(skel, "obj_token_lsm_reject_cmd_open"))
782 + 		return -EINVAL;
783 + 	err = priv_prog__load(skel);
784 + 	priv_prog__destroy(skel);
785 + 	if (!ASSERT_ERR(err, "obj_token_lsm_reject_cmd_load"))
786 + 		return -EINVAL;
787 +
788 + 	return 0;
789 + }
790 +
791 + /* this test is called with BPF FS that doesn't delegate BPF_BTF_LOAD command,
792 +  * which should cause struct_ops application to fail, as BTF won't be uploaded
793 +  * into the kernel, even if STRUCT_OPS programs themselves are allowed
794 +  */
795 + static int validate_struct_ops_load(int mnt_fd, bool expect_success)
796 + {
797 + 	LIBBPF_OPTS(bpf_object_open_opts, opts);
798 + 	char buf[256];
799 + 	struct dummy_st_ops_success *skel;
800 + 	int err;
801 +
802 + 	snprintf(buf, sizeof(buf), "/proc/self/fd/%d", mnt_fd);
803 + 	opts.bpf_token_path = buf;
804 + 	skel = dummy_st_ops_success__open_opts(&opts);
805 + 	if (!ASSERT_OK_PTR(skel, "obj_token_path_open"))
806 + 		return -EINVAL;
807 +
808 + 	err = dummy_st_ops_success__load(skel);
809 + 	dummy_st_ops_success__destroy(skel);
810 + 	if (expect_success) {
811 + 		if (!ASSERT_OK(err, "obj_token_path_load"))
812 + 			return -EINVAL;
813 + 	} else /* expect failure */ {
814 + 		if (!ASSERT_ERR(err, "obj_token_path_load"))
815 + 			return -EINVAL;
816 + 	}
817 +
818 + 	return 0;
819 + }
820 +
821 + static int userns_obj_priv_btf_fail(int mnt_fd, struct token_lsm *lsm_skel)
822 + {
823 + 	return validate_struct_ops_load(mnt_fd, false /* should fail */);
824 + }
825 +
826 + static int userns_obj_priv_btf_success(int mnt_fd, struct token_lsm *lsm_skel)
827 + {
828 + 	return validate_struct_ops_load(mnt_fd, true /* should succeed */);
829 + }
830 +
831 + #define TOKEN_ENVVAR "LIBBPF_BPF_TOKEN_PATH"
832 + #define TOKEN_BPFFS_CUSTOM "/bpf-token-fs"
833 +
834 + static int userns_obj_priv_implicit_token(int mnt_fd, struct token_lsm *lsm_skel)
835 + {
836 + 	LIBBPF_OPTS(bpf_object_open_opts, opts);
837 + 	struct dummy_st_ops_success *skel;
838 + 	int err;
839 +
840 + 	/* before we mount BPF FS with token delegation, struct_ops skeleton
841 + 	 * should fail to load
842 + 	 */
843 + 	skel = dummy_st_ops_success__open_and_load();
844 + 	if (!ASSERT_ERR_PTR(skel, "obj_tokenless_load")) {
845 + 		dummy_st_ops_success__destroy(skel);
846 + 		return -EINVAL;
847 + 	}
848 +
849 + 	/* mount custom BPF FS over /sys/fs/bpf so that libbpf can create BPF
850 + 	 * token automatically and implicitly
851 + 	 */
852 + 	err = sys_move_mount(mnt_fd, "", AT_FDCWD, "/sys/fs/bpf", MOVE_MOUNT_F_EMPTY_PATH);
853 + 	if (!ASSERT_OK(err, "move_mount_bpffs"))
854 + 		return -EINVAL;
855 +
856 + 	/* disable implicit BPF token creation by setting
857 + 	 * LIBBPF_BPF_TOKEN_PATH envvar to empty value, load should fail
858 + 	 */
859 + 	err = setenv(TOKEN_ENVVAR, "", 1 /*overwrite*/);
860 + 	if (!ASSERT_OK(err, "setenv_token_path"))
861 + 		return -EINVAL;
862 + 	skel = dummy_st_ops_success__open_and_load();
863 + 	if (!ASSERT_ERR_PTR(skel, "obj_token_envvar_disabled_load")) {
864 + 		unsetenv(TOKEN_ENVVAR);
865 + 		dummy_st_ops_success__destroy(skel);
866 + 		return -EINVAL;
867 + 	}
868 + 	unsetenv(TOKEN_ENVVAR);
869 +
870 + 	/* now the same struct_ops skeleton should succeed thanks to libppf
871 + 	 * creating BPF token from /sys/fs/bpf mount point
872 + 	 */
873 + 	skel = dummy_st_ops_success__open_and_load();
874 + 	if (!ASSERT_OK_PTR(skel, "obj_implicit_token_load"))
875 + 		return -EINVAL;
876 +
877 + 	dummy_st_ops_success__destroy(skel);
878 +
879 + 	/* now disable implicit token through empty bpf_token_path, should fail */
880 + 	opts.bpf_token_path = "";
881 + 	skel = dummy_st_ops_success__open_opts(&opts);
882 + 	if (!ASSERT_OK_PTR(skel, "obj_empty_token_path_open"))
883 + 		return -EINVAL;
884 +
885 + 	err = dummy_st_ops_success__load(skel);
886 + 	dummy_st_ops_success__destroy(skel);
887 + 	if (!ASSERT_ERR(err, "obj_empty_token_path_load"))
888 + 		return -EINVAL;
889 +
890 + 	return 0;
891 + }
892 +
893 + static int userns_obj_priv_implicit_token_envvar(int mnt_fd, struct token_lsm *lsm_skel)
894 + {
895 + 	LIBBPF_OPTS(bpf_object_open_opts, opts);
896 + 	struct dummy_st_ops_success *skel;
897 + 	int err;
898 +
899 + 	/* before we mount BPF FS with token delegation, struct_ops skeleton
900 + 	 * should fail to load
901 + 	 */
902 + 	skel = dummy_st_ops_success__open_and_load();
903 + 	if (!ASSERT_ERR_PTR(skel, "obj_tokenless_load")) {
904 + 		dummy_st_ops_success__destroy(skel);
905 + 		return -EINVAL;
906 + 	}
907 +
908 + 	/* mount custom BPF FS over custom location, so libbpf can't create
909 + 	 * BPF token implicitly, unless pointed to it through
910 + 	 * LIBBPF_BPF_TOKEN_PATH envvar
911 + 	 */
912 + 	rmdir(TOKEN_BPFFS_CUSTOM);
913 + 	if (!ASSERT_OK(mkdir(TOKEN_BPFFS_CUSTOM, 0777), "mkdir_bpffs_custom"))
914 + 		goto err_out;
915 + 	err = sys_move_mount(mnt_fd, "", AT_FDCWD, TOKEN_BPFFS_CUSTOM, MOVE_MOUNT_F_EMPTY_PATH);
916 + 	if (!ASSERT_OK(err, "move_mount_bpffs"))
917 + 		goto err_out;
918 +
919 + 	/* even though we have BPF FS with delegation, it's not at default
920 + 	 * /sys/fs/bpf location, so we still fail to load until envvar is set up
921 + 	 */
922 + 	skel = dummy_st_ops_success__open_and_load();
923 + 	if (!ASSERT_ERR_PTR(skel, "obj_tokenless_load2")) {
924 + 		dummy_st_ops_success__destroy(skel);
925 + 		goto err_out;
926 + 	}
927 +
928 + 	err = setenv(TOKEN_ENVVAR, TOKEN_BPFFS_CUSTOM, 1 /*overwrite*/);
929 + 	if (!ASSERT_OK(err, "setenv_token_path"))
930 + 		goto err_out;
931 +
932 + 	/* now the same struct_ops skeleton should succeed thanks to libppf
933 + 	 * creating BPF token from custom mount point
934 + 	 */
935 + 	skel = dummy_st_ops_success__open_and_load();
936 + 	if (!ASSERT_OK_PTR(skel, "obj_implicit_token_load"))
937 + 		goto err_out;
938 +
939 + 	dummy_st_ops_success__destroy(skel);
940 +
941 + 	/* now disable implicit token through empty bpf_token_path, envvar
942 + 	 * will be ignored, should fail
943 + 	 */
944 + 	opts.bpf_token_path = "";
945 + 	skel = dummy_st_ops_success__open_opts(&opts);
946 + 	if (!ASSERT_OK_PTR(skel, "obj_empty_token_path_open"))
947 + 		goto err_out;
948 +
949 + 	err = dummy_st_ops_success__load(skel);
950 + 	dummy_st_ops_success__destroy(skel);
951 + 	if (!ASSERT_ERR(err, "obj_empty_token_path_load"))
952 + 		goto err_out;
953 +
954 + 	rmdir(TOKEN_BPFFS_CUSTOM);
955 + 	unsetenv(TOKEN_ENVVAR);
956 + 	return 0;
957 + err_out:
958 + 	rmdir(TOKEN_BPFFS_CUSTOM);
959 + 	unsetenv(TOKEN_ENVVAR);
960 + 	return -EINVAL;
961 + }
962 +
963 + #define bit(n) (1ULL << (n))
964 +
965 + void test_token(void)
966 + {
967 + 	if (test__start_subtest("map_token")) {
968 + 		struct bpffs_opts opts = {
969 + 			.cmds_str = "map_create",
970 + 			.maps_str = "stack",
971 + 		};
972 +
973 + 		subtest_userns(&opts, userns_map_create);
974 + 	}
975 + 	if (test__start_subtest("btf_token")) {
976 + 		struct bpffs_opts opts = {
977 + 			.cmds = 1ULL << BPF_BTF_LOAD,
978 + 		};
979 +
980 + 		subtest_userns(&opts, userns_btf_load);
981 + 	}
982 + 	if (test__start_subtest("prog_token")) {
983 + 		struct bpffs_opts opts = {
984 + 			.cmds_str = "PROG_LOAD",
985 + 			.progs_str = "XDP",
986 + 			.attachs_str = "xdp",
987 + 		};
988 +
989 + 		subtest_userns(&opts, userns_prog_load);
990 + 	}
991 + 	if (test__start_subtest("obj_priv_map")) {
992 + 		struct bpffs_opts opts = {
993 + 			.cmds = bit(BPF_MAP_CREATE),
994 + 			.maps = bit(BPF_MAP_TYPE_QUEUE),
995 + 		};
996 +
997 + 		subtest_userns(&opts, userns_obj_priv_map);
998 + 	}
999 + 	if (test__start_subtest("obj_priv_prog")) {
1000 + 		struct bpffs_opts opts = {
1001 + 			.cmds = bit(BPF_PROG_LOAD),
1002 + 			.progs = bit(BPF_PROG_TYPE_KPROBE),
1003 + 			.attachs = ~0ULL,
1004 + 		};
1005 +
1006 + 		subtest_userns(&opts, userns_obj_priv_prog);
1007 + 	}
1008 + 	if (test__start_subtest("obj_priv_btf_fail")) {
1009 + 		struct bpffs_opts opts = {
1010 + 			/* disallow BTF loading */
1011 + 			.cmds = bit(BPF_MAP_CREATE) | bit(BPF_PROG_LOAD),
1012 + 			.maps = bit(BPF_MAP_TYPE_STRUCT_OPS),
1013 + 			.progs = bit(BPF_PROG_TYPE_STRUCT_OPS),
1014 + 			.attachs = ~0ULL,
1015 + 		};
1016 +
1017 + 		subtest_userns(&opts, userns_obj_priv_btf_fail);
1018 + 	}
1019 + 	if (test__start_subtest("obj_priv_btf_success")) {
1020 + 		struct bpffs_opts opts = {
1021 + 			/* allow BTF loading */
1022 + 			.cmds = bit(BPF_BTF_LOAD) | bit(BPF_MAP_CREATE) | bit(BPF_PROG_LOAD),
1023 + 			.maps = bit(BPF_MAP_TYPE_STRUCT_OPS),
1024 + 			.progs = bit(BPF_PROG_TYPE_STRUCT_OPS),
1025 + 			.attachs = ~0ULL,
1026 + 		};
1027 +
1028 + 		subtest_userns(&opts, userns_obj_priv_btf_success);
1029 + 	}
1030 + 	if (test__start_subtest("obj_priv_implicit_token")) {
1031 + 		struct bpffs_opts opts = {
1032 + 			/* allow BTF loading */
1033 + 			.cmds = bit(BPF_BTF_LOAD) | bit(BPF_MAP_CREATE) | bit(BPF_PROG_LOAD),
1034 + 			.maps = bit(BPF_MAP_TYPE_STRUCT_OPS),
1035 + 			.progs = bit(BPF_PROG_TYPE_STRUCT_OPS),
1036 + 			.attachs = ~0ULL,
1037 + 		};
1038 +
1039 + 		subtest_userns(&opts, userns_obj_priv_implicit_token);
1040 + 	}
1041 + 	if (test__start_subtest("obj_priv_implicit_token_envvar")) {
1042 + 		struct bpffs_opts opts = {
1043 + 			/* allow BTF loading */
1044 + 			.cmds = bit(BPF_BTF_LOAD) | bit(BPF_MAP_CREATE) | bit(BPF_PROG_LOAD),
1045 + 			.maps = bit(BPF_MAP_TYPE_STRUCT_OPS),
1046 + 			.progs = bit(BPF_PROG_TYPE_STRUCT_OPS),
1047 + 			.attachs = ~0ULL,
1048 + 		};
1049 +
1050 + 		subtest_userns(&opts, userns_obj_priv_implicit_token_envvar);
1051 + 	}
1052 + }
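
The `sendfd()`/`recvfd()` helpers in the test above hand a file descriptor from one process to another as `SCM_RIGHTS` ancillary data on an `AF_UNIX` socket. The following is a minimal standalone sketch of the same mechanism, without the selftest `ASSERT_*` macros. The names `send_one_fd`, `recv_one_fd` and `fd_roundtrip_demo` are hypothetical and not part of the commit, and the demo passes a pipe end within a single process rather than across `fork()`.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send one fd as SCM_RIGHTS data; 0 or negative errno. */
static int send_one_fd(int sockfd, int fd)
{
	char byte = 0;	/* at least one byte of real payload must accompany the fd */
	struct iovec io = { .iov_base = &byte, .iov_len = 1 };
	union {		/* union forces cmsghdr alignment on the control buffer */
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} u;
	struct msghdr msg = { 0 };
	struct cmsghdr *cmsg;
	ssize_t n;

	memset(&u, 0, sizeof(u));
	msg.msg_iov = &io;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	n = sendmsg(sockfd, &msg, 0);
	if (n < 0)
		return -errno;
	return n == 1 ? 0 : -EIO;
}

/* Hypothetical helper: receive one fd sent by send_one_fd(). */
static int recv_one_fd(int sockfd, int *fd)
{
	char byte;
	struct iovec io = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} u;
	struct msghdr msg = { 0 };
	struct cmsghdr *cmsg;
	ssize_t n;

	msg.msg_iov = &io;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	n = recvmsg(sockfd, &msg, 0);
	if (n < 0)
		return -errno;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (n != 1 || !cmsg ||
	    cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -EINVAL;

	memcpy(fd, CMSG_DATA(cmsg), sizeof(int));
	return 0;
}

/* Pass a pipe's write end over a socketpair, then write through the copy. */
static int fd_roundtrip_demo(void)
{
	int socks[2], pipefd[2], dup_wr = -1, err;
	char c = 0;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, socks))
		return -errno;
	if (pipe(pipefd)) {
		err = -errno;
		close(socks[0]);
		close(socks[1]);
		return err;
	}

	err = send_one_fd(socks[0], pipefd[1]);
	if (!err)
		err = recv_one_fd(socks[1], &dup_wr);
	if (!err && write(dup_wr, "x", 1) != 1)
		err = -EIO;
	if (!err && (read(pipefd[0], &c, 1) != 1 || c != 'x'))
		err = -EIO;

	if (dup_wr >= 0)
		close(dup_wr);
	close(pipefd[0]);
	close(pipefd[1]);
	close(socks[0]);
	close(socks[1]);
	return err;
}
```

The received descriptor is a new fd number referring to the same open file description, which is why the selftest can create the BPF FS context in the unprivileged child and let the privileged parent configure and mount it.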

+1 -1

tools/testing/selftests/bpf/prog_tests/xdpwall.c

···
9 9 	struct xdpwall *skel;
10 10
11 11 	skel = xdpwall__open_and_load();
12 - 	ASSERT_OK_PTR(skel, "Does LLMV have https://reviews.llvm.org/D109073?");
12 + 	ASSERT_OK_PTR(skel, "Does LLVM have https://github.com/llvm/llvm-project/commit/ea72b0319d7b0f0c2fcf41d121afa5d031b319d5?");
13 13
14 14 	xdpwall__destroy(skel);
15 15 }

+1 -1

tools/testing/selftests/bpf/progs/bpf_misc.h

···
80 80 #define __imm(name) [name]"i"(name)
81 81 #define __imm_const(name, expr) [name]"i"(expr)
82 82 #define __imm_addr(name) [name]"i"(&name)
83 - #define __imm_ptr(name) [name]"p"(&name)
83 + #define __imm_ptr(name) [name]"r"(&name)
84 84 #define __imm_insn(name, expr) [name]"i"(*(long *)&(expr))
85 85
86 86 /* Magic constants used with __retval() */

+16

tools/testing/selftests/bpf/progs/bpf_tracing_net.h

···
51 51 #define ICSK_TIME_LOSS_PROBE 5
52 52 #define ICSK_TIME_REO_TIMEOUT 6
53 53
54 + #define ETH_ALEN 6
54 55 #define ETH_HLEN 14
56 + #define ETH_P_IP 0x0800
55 57 #define ETH_P_IPV6 0x86DD
58 +
59 + #define NEXTHDR_TCP 6
60 +
61 + #define TCPOPT_NOP 1
62 + #define TCPOPT_EOL 0
63 + #define TCPOPT_MSS 2
64 + #define TCPOPT_WINDOW 3
65 + #define TCPOPT_TIMESTAMP 8
66 + #define TCPOPT_SACK_PERM 4
67 +
68 + #define TCPOLEN_MSS 4
69 + #define TCPOLEN_WINDOW 3
70 + #define TCPOLEN_TIMESTAMP 10
71 + #define TCPOLEN_SACK_PERM 2
56 72
57 73 #define CHECKSUM_NONE 0
58 74 #define CHECKSUM_PARTIAL 3
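
The `TCPOPT_*` values added above are TCP option kind bytes and the `TCPOLEN_*` values are the total encoded length of each option (kind byte, length byte, and payload). As a rough illustration of how such constants are typically combined, the sketch below lays out one possible SYN option block. The function `build_syn_options`, its parameters, and the option ordering are assumptions made for this example; they are not taken from the commit or from any program that includes this header.

```c
#include <stddef.h>
#include <stdint.h>

/* Same values as the definitions added to bpf_tracing_net.h. */
#define TCPOPT_NOP		1
#define TCPOPT_MSS		2
#define TCPOPT_WINDOW		3
#define TCPOPT_SACK_PERM	4
#define TCPOPT_TIMESTAMP	8

#define TCPOLEN_MSS		4
#define TCPOLEN_WINDOW		3
#define TCPOLEN_TIMESTAMP	10
#define TCPOLEN_SACK_PERM	2

/* Store a 32-bit value in network (big-endian) byte order. */
static void put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/*
 * Hypothetical helper: write MSS, SACK-permitted, timestamp and window
 * scale options into buf. One NOP pads the block to 4 + 2 + 10 + 1 + 3 = 20
 * bytes, a multiple of four as the TCP data offset requires.
 * Returns the number of bytes written, or 0 if buf is too small.
 */
static size_t build_syn_options(uint8_t *buf, size_t cap, uint16_t mss,
				uint8_t wscale, uint32_t tsval, uint32_t tsecr)
{
	const size_t need = TCPOLEN_MSS + TCPOLEN_SACK_PERM +
			    TCPOLEN_TIMESTAMP + 1 + TCPOLEN_WINDOW;
	uint8_t *p = buf;

	if (cap < need)
		return 0;

	*p++ = TCPOPT_MSS;
	*p++ = TCPOLEN_MSS;
	*p++ = (uint8_t)(mss >> 8);
	*p++ = (uint8_t)(mss & 0xff);

	*p++ = TCPOPT_SACK_PERM;
	*p++ = TCPOLEN_SACK_PERM;

	*p++ = TCPOPT_TIMESTAMP;
	*p++ = TCPOLEN_TIMESTAMP;
	put_be32(p, tsval);
	p += 4;
	put_be32(p, tsecr);
	p += 4;

	*p++ = TCPOPT_NOP;	/* padding so the block ends on a 4-byte boundary */

	*p++ = TCPOPT_WINDOW;
	*p++ = TCPOLEN_WINDOW;
	*p++ = wscale;

	return (size_t)(p - buf);
}
```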

+2 -2

tools/testing/selftests/bpf/progs/iters.c

···
78 78 		"*(u32 *)(r1 + 0) = r6;" /* invalid */
79 79 		:
80 80 		: [it]"r"(&it),
81 - 		  [small_arr]"p"(small_arr),
82 - 		  [zero]"p"(zero),
81 + 		  [small_arr]"r"(small_arr),
82 + 		  [zero]"r"(zero),
83 83 		  __imm(bpf_iter_num_new),
84 84 		  __imm(bpf_iter_num_next),
85 85 		  __imm(bpf_iter_num_destroy)

+48

tools/testing/selftests/bpf/progs/kptr_xchg_inline.c

···
1 + // SPDX-License-Identifier: GPL-2.0
2 + /* Copyright (C) 2023. Huawei Technologies Co., Ltd */
3 + #include <linux/types.h>
4 + #include <bpf/bpf_helpers.h>
5 +
6 + #include "bpf_experimental.h"
7 + #include "bpf_misc.h"
8 +
9 + char _license[] SEC("license") = "GPL";
10 +
11 + struct bin_data {
12 + 	char blob[32];
13 + };
14 +
15 + #define private(name) SEC(".bss." #name) __hidden __attribute__((aligned(8)))
16 + private(kptr) struct bin_data __kptr * ptr;
17 +
18 + SEC("tc")
19 + __naked int kptr_xchg_inline(void)
20 + {
21 + 	asm volatile (
22 + 		"r1 = %[ptr] ll;"
23 + 		"r2 = 0;"
24 + 		"call %[bpf_kptr_xchg];"
25 + 		"if r0 == 0 goto 1f;"
26 + 		"r1 = r0;"
27 + 		"r2 = 0;"
28 + 		"call %[bpf_obj_drop_impl];"
29 + 	"1:"
30 + 		"r0 = 0;"
31 + 		"exit;"
32 + 		:
33 + 		: __imm_addr(ptr),
34 + 		  __imm(bpf_kptr_xchg),
35 + 		  __imm(bpf_obj_drop_impl)
36 + 		: __clobber_all
37 + 	);
38 + }
39 +
40 + /* BTF FUNC records are not generated for kfuncs referenced
41 +  * from inline assembly. These records are necessary for
42 +  * libbpf to link the program. The function below is a hack
43 +  * to ensure that BTF FUNC records are generated.
44 +  */
45 + void __btf_root(void)
46 + {
47 + 	bpf_obj_drop(NULL);
48 + }

+13

tools/testing/selftests/bpf/progs/priv_map.c

···
1 + // SPDX-License-Identifier: GPL-2.0
2 + /* Copyright (c) 2023 Meta Platforms, Inc. and affiliates. */
3 +
4 + #include "vmlinux.h"
5 + #include <bpf/bpf_helpers.h>
6 +
7 + char _license[] SEC("license") = "GPL";
8 +
9 + struct {
10 + 	__uint(type, BPF_MAP_TYPE_QUEUE);
11 + 	__uint(max_entries, 1);
12 + 	__type(value, __u32);
13 + } priv_map SEC(".maps");

+13

tools/testing/selftests/bpf/progs/priv_prog.c

··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* Copyright (c) 2023 Meta Platforms, Inc. and affiliates. */ 3 + 4 + #include "vmlinux.h" 5 + #include <bpf/bpf_helpers.h> 6 + 7 + char _license[] SEC("license") = "GPL"; 8 + 9 + SEC("kprobe") 10 + int kprobe_prog(void *ctx) 11 + { 12 + return 1; 13 + }

+30

tools/testing/selftests/bpf/progs/struct_ops_module.c

··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */ 3 + #include <vmlinux.h> 4 + #include <bpf/bpf_helpers.h> 5 + #include <bpf/bpf_tracing.h> 6 + #include "../bpf_testmod/bpf_testmod.h" 7 + 8 + char _license[] SEC("license") = "GPL"; 9 + 10 + int test_2_result = 0; 11 + 12 + SEC("struct_ops/test_1") 13 + int BPF_PROG(test_1) 14 + { 15 + return 0xdeadbeef; 16 + } 17 + 18 + SEC("struct_ops/test_2") 19 + int BPF_PROG(test_2, int a, int b) 20 + { 21 + test_2_result = a + b; 22 + return a + b; 23 + } 24 + 25 + SEC(".struct_ops.link") 26 + struct bpf_testmod_ops testmod_1 = { 27 + .test_1 = (void *)test_1, 28 + .test_2 = (void *)test_2, 29 + }; 30 +

+1 -1

tools/testing/selftests/bpf/progs/test_core_reloc_type_id.c

··· 80 80 * to detect whether this test has to be executed, however strange 81 81 * that might look like. 82 82 * 83 - * [0] https://reviews.llvm.org/D85174 83 + * [0] https://github.com/llvm/llvm-project/commit/00602ee7ef0bf6c68d690a2bd729c12b95c95c99 84 84 */ 85 85 #if __has_builtin(__builtin_preserve_type_info) 86 86 struct core_reloc_type_id_output *out = (void *)&data.out;

+6

tools/testing/selftests/bpf/progs/test_fill_link_info.c

··· 33 33 return 0; 34 34 } 35 35 36 + SEC("perf_event") 37 + int event_run(void *ctx) 38 + { 39 + return 0; 40 + } 41 + 36 42 SEC("kprobe.multi") 37 43 int BPF_PROG(kmulti_run) 38 44 {

+26

tools/testing/selftests/bpf/progs/test_map_in_map.c

··· 21 21 __type(value, __u32); 22 22 } mim_hash SEC(".maps"); 23 23 24 + /* The following three maps are used to test 25 + * perf_event_array map can be an inner 26 + * map of hash/array_of_maps. 27 + */ 28 + struct perf_event_array { 29 + __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY); 30 + __type(key, __u32); 31 + __type(value, __u32); 32 + } inner_map0 SEC(".maps"); 33 + 34 + struct { 35 + __uint(type, BPF_MAP_TYPE_ARRAY_OF_MAPS); 36 + __uint(max_entries, 1); 37 + __type(key, __u32); 38 + __array(values, struct perf_event_array); 39 + } mim_array_pe SEC(".maps") = { 40 + .values = {&inner_map0}}; 41 + 42 + struct { 43 + __uint(type, BPF_MAP_TYPE_HASH_OF_MAPS); 44 + __uint(max_entries, 1); 45 + __type(key, __u32); 46 + __array(values, struct perf_event_array); 47 + } mim_hash_pe SEC(".maps") = { 48 + .values = {&inner_map0}}; 49 + 24 50 SEC("xdp") 25 51 int xdp_mimtest0(struct xdp_md *ctx) 26 52 {

+64

tools/testing/selftests/bpf/progs/test_siphash.h

··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* Copyright Amazon.com Inc. or its affiliates. */ 3 + 4 + #ifndef _TEST_SIPHASH_H 5 + #define _TEST_SIPHASH_H 6 + 7 + /* include/linux/bitops.h */ 8 + static inline u64 rol64(u64 word, unsigned int shift) 9 + { 10 + return (word << (shift & 63)) | (word >> ((-shift) & 63)); 11 + } 12 + 13 + /* include/linux/siphash.h */ 14 + #define SIPHASH_PERMUTATION(a, b, c, d) ( \ 15 + (a) += (b), (b) = rol64((b), 13), (b) ^= (a), (a) = rol64((a), 32), \ 16 + (c) += (d), (d) = rol64((d), 16), (d) ^= (c), \ 17 + (a) += (d), (d) = rol64((d), 21), (d) ^= (a), \ 18 + (c) += (b), (b) = rol64((b), 17), (b) ^= (c), (c) = rol64((c), 32)) 19 + 20 + #define SIPHASH_CONST_0 0x736f6d6570736575ULL 21 + #define SIPHASH_CONST_1 0x646f72616e646f6dULL 22 + #define SIPHASH_CONST_2 0x6c7967656e657261ULL 23 + #define SIPHASH_CONST_3 0x7465646279746573ULL 24 + 25 + /* lib/siphash.c */ 26 + #define SIPROUND SIPHASH_PERMUTATION(v0, v1, v2, v3) 27 + 28 + #define PREAMBLE(len) \ 29 + u64 v0 = SIPHASH_CONST_0; \ 30 + u64 v1 = SIPHASH_CONST_1; \ 31 + u64 v2 = SIPHASH_CONST_2; \ 32 + u64 v3 = SIPHASH_CONST_3; \ 33 + u64 b = ((u64)(len)) << 56; \ 34 + v3 ^= key->key[1]; \ 35 + v2 ^= key->key[0]; \ 36 + v1 ^= key->key[1]; \ 37 + v0 ^= key->key[0]; 38 + 39 + #define POSTAMBLE \ 40 + v3 ^= b; \ 41 + SIPROUND; \ 42 + SIPROUND; \ 43 + v0 ^= b; \ 44 + v2 ^= 0xff; \ 45 + SIPROUND; \ 46 + SIPROUND; \ 47 + SIPROUND; \ 48 + SIPROUND; \ 49 + return (v0 ^ v1) ^ (v2 ^ v3); 50 + 51 + static inline u64 siphash_2u64(const u64 first, const u64 second, const siphash_key_t *key) 52 + { 53 + PREAMBLE(16) 54 + v3 ^= first; 55 + SIPROUND; 56 + SIPROUND; 57 + v0 ^= first; 58 + v3 ^= second; 59 + SIPROUND; 60 + SIPROUND; 61 + v0 ^= second; 62 + POSTAMBLE 63 + } 64 + #endif
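The rotate helper and the SIPHASH_PERMUTATION macro in test_siphash.h can be tried outside of BPF with a small standalone sketch. The names rol64_sketch and sipround_sketch are hypothetical and exist only for this illustration; the arithmetic is copied from the header above, with the four state words held in an array instead of the locals v0..v3.

```c
#include <stdint.h>

/* Rotate a 64-bit word left by `shift` bits.
 * Both shift counts are masked with 63, so shift == 0 (or 64) never
 * turns into an out-of-range 64-bit shift.
 */
static inline uint64_t rol64_sketch(uint64_t word, unsigned int shift)
{
	return (word << (shift & 63)) | (word >> ((-shift) & 63));
}

/* One SipHash round, mirroring SIPHASH_PERMUTATION(v0, v1, v2, v3).
 * v[0]..v[3] stand in for the a, b, c, d macro arguments.
 */
static inline void sipround_sketch(uint64_t v[4])
{
	v[0] += v[1]; v[1] = rol64_sketch(v[1], 13); v[1] ^= v[0];
	v[0] = rol64_sketch(v[0], 32);
	v[2] += v[3]; v[3] = rol64_sketch(v[3], 16); v[3] ^= v[2];
	v[0] += v[3]; v[3] = rol64_sketch(v[3], 21); v[3] ^= v[0];
	v[2] += v[1]; v[1] = rol64_sketch(v[1], 17); v[1] ^= v[2];
	v[2] = rol64_sketch(v[2], 32);
}
```

In the header, PREAMBLE seeds the four words from the constants and the key, and siphash_2u64 runs two such rounds per 64-bit input word and four more in POSTAMBLE (after two final rounds for the length word).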

+572

tools/testing/selftests/bpf/progs/test_tcp_custom_syncookie.c

··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* Copyright Amazon.com Inc. or its affiliates. */ 3 + 4 + #include "vmlinux.h" 5 + 6 + #include <bpf/bpf_helpers.h> 7 + #include <bpf/bpf_endian.h> 8 + #include "bpf_tracing_net.h" 9 + #include "bpf_kfuncs.h" 10 + #include "test_siphash.h" 11 + #include "test_tcp_custom_syncookie.h" 12 + 13 + /* Hash is calculated for each client and split into ISN and TS. 14 + * 15 + * MSB LSB 16 + * ISN: | 31 ... 8 | 7 6 | 5 | 4 | 3 2 1 0 | 17 + * | Hash_1 | MSS | ECN | SACK | WScale | 18 + * 19 + * TS: | 31 ... 8 | 7 ... 0 | 20 + * | Random | Hash_2 | 21 + */ 22 + #define COOKIE_BITS 8 23 + #define COOKIE_MASK (((__u32)1 << COOKIE_BITS) - 1) 24 + 25 + enum { 26 + /* 0xf is invalid thus means that SYN did not have WScale. */ 27 + BPF_SYNCOOKIE_WSCALE_MASK = (1 << 4) - 1, 28 + BPF_SYNCOOKIE_SACK = (1 << 4), 29 + BPF_SYNCOOKIE_ECN = (1 << 5), 30 + }; 31 + 32 + #define MSS_LOCAL_IPV4 65495 33 + #define MSS_LOCAL_IPV6 65476 34 + 35 + const __u16 msstab4[] = { 36 + 536, 37 + 1300, 38 + 1460, 39 + MSS_LOCAL_IPV4, 40 + }; 41 + 42 + const __u16 msstab6[] = { 43 + 1280 - 60, /* IPV6_MIN_MTU - 60 */ 44 + 1480 - 60, 45 + 9000 - 60, 46 + MSS_LOCAL_IPV6, 47 + }; 48 + 49 + static siphash_key_t test_key_siphash = { 50 + { 0x0706050403020100ULL, 0x0f0e0d0c0b0a0908ULL } 51 + }; 52 + 53 + struct tcp_syncookie { 54 + struct __sk_buff *skb; 55 + void *data_end; 56 + struct ethhdr *eth; 57 + struct iphdr *ipv4; 58 + struct ipv6hdr *ipv6; 59 + struct tcphdr *tcp; 60 + union { 61 + char *ptr; 62 + __be32 *ptr32; 63 + }; 64 + struct bpf_tcp_req_attrs attrs; 65 + u32 cookie; 66 + u64 first; 67 + }; 68 + 69 + bool handled_syn, handled_ack; 70 + 71 + static int tcp_load_headers(struct tcp_syncookie *ctx) 72 + { 73 + ctx->data_end = (void *)(long)ctx->skb->data_end; 74 + ctx->eth = (struct ethhdr *)(long)ctx->skb->data; 75 + 76 + if (ctx->eth + 1 > ctx->data_end) 77 + goto err; 78 + 79 + switch (bpf_ntohs(ctx->eth->h_proto)) { 80 + case ETH_P_IP: 81 + 
ctx->ipv4 = (struct iphdr *)(ctx->eth + 1); 82 + 83 + if (ctx->ipv4 + 1 > ctx->data_end) 84 + goto err; 85 + 86 + if (ctx->ipv4->ihl != sizeof(*ctx->ipv4) / 4) 87 + goto err; 88 + 89 + if (ctx->ipv4->version != 4) 90 + goto err; 91 + 92 + if (ctx->ipv4->protocol != IPPROTO_TCP) 93 + goto err; 94 + 95 + ctx->tcp = (struct tcphdr *)(ctx->ipv4 + 1); 96 + break; 97 + case ETH_P_IPV6: 98 + ctx->ipv6 = (struct ipv6hdr *)(ctx->eth + 1); 99 + 100 + if (ctx->ipv6 + 1 > ctx->data_end) 101 + goto err; 102 + 103 + if (ctx->ipv6->version != 6) 104 + goto err; 105 + 106 + if (ctx->ipv6->nexthdr != NEXTHDR_TCP) 107 + goto err; 108 + 109 + ctx->tcp = (struct tcphdr *)(ctx->ipv6 + 1); 110 + break; 111 + default: 112 + goto err; 113 + } 114 + 115 + if (ctx->tcp + 1 > ctx->data_end) 116 + goto err; 117 + 118 + return 0; 119 + err: 120 + return -1; 121 + } 122 + 123 + static int tcp_reload_headers(struct tcp_syncookie *ctx) 124 + { 125 + /* Without volatile, 126 + * R3 32-bit pointer arithmetic prohibited 127 + */ 128 + volatile u64 data_len = ctx->skb->data_end - ctx->skb->data; 129 + 130 + if (ctx->tcp->doff < sizeof(*ctx->tcp) / 4) 131 + goto err; 132 + 133 + /* Needed to calculate csum and parse TCP options. 
*/ 134 + if (bpf_skb_change_tail(ctx->skb, data_len + 60 - ctx->tcp->doff * 4, 0)) 135 + goto err; 136 + 137 + ctx->data_end = (void *)(long)ctx->skb->data_end; 138 + ctx->eth = (struct ethhdr *)(long)ctx->skb->data; 139 + if (ctx->ipv4) { 140 + ctx->ipv4 = (struct iphdr *)(ctx->eth + 1); 141 + ctx->ipv6 = NULL; 142 + ctx->tcp = (struct tcphdr *)(ctx->ipv4 + 1); 143 + } else { 144 + ctx->ipv4 = NULL; 145 + ctx->ipv6 = (struct ipv6hdr *)(ctx->eth + 1); 146 + ctx->tcp = (struct tcphdr *)(ctx->ipv6 + 1); 147 + } 148 + 149 + if ((void *)ctx->tcp + 60 > ctx->data_end) 150 + goto err; 151 + 152 + return 0; 153 + err: 154 + return -1; 155 + } 156 + 157 + static __sum16 tcp_v4_csum(struct tcp_syncookie *ctx, __wsum csum) 158 + { 159 + return csum_tcpudp_magic(ctx->ipv4->saddr, ctx->ipv4->daddr, 160 + ctx->tcp->doff * 4, IPPROTO_TCP, csum); 161 + } 162 + 163 + static __sum16 tcp_v6_csum(struct tcp_syncookie *ctx, __wsum csum) 164 + { 165 + return csum_ipv6_magic(&ctx->ipv6->saddr, &ctx->ipv6->daddr, 166 + ctx->tcp->doff * 4, IPPROTO_TCP, csum); 167 + } 168 + 169 + static int tcp_validate_header(struct tcp_syncookie *ctx) 170 + { 171 + s64 csum; 172 + 173 + if (tcp_reload_headers(ctx)) 174 + goto err; 175 + 176 + csum = bpf_csum_diff(0, 0, (void *)ctx->tcp, ctx->tcp->doff * 4, 0); 177 + if (csum < 0) 178 + goto err; 179 + 180 + if (ctx->ipv4) { 181 + /* check tcp_v4_csum(csum) is 0 if not on lo. */ 182 + 183 + csum = bpf_csum_diff(0, 0, (void *)ctx->ipv4, ctx->ipv4->ihl * 4, 0); 184 + if (csum < 0) 185 + goto err; 186 + 187 + if (csum_fold(csum) != 0) 188 + goto err; 189 + } else if (ctx->ipv6) { 190 + /* check tcp_v6_csum(csum) is 0 if not on lo. 
*/ 191 + } 192 + 193 + return 0; 194 + err: 195 + return -1; 196 + } 197 + 198 + static int tcp_parse_option(__u32 index, struct tcp_syncookie *ctx) 199 + { 200 + char opcode, opsize; 201 + 202 + if (ctx->ptr + 1 > ctx->data_end) 203 + goto stop; 204 + 205 + opcode = *ctx->ptr++; 206 + 207 + if (opcode == TCPOPT_EOL) 208 + goto stop; 209 + 210 + if (opcode == TCPOPT_NOP) 211 + goto next; 212 + 213 + if (ctx->ptr + 1 > ctx->data_end) 214 + goto stop; 215 + 216 + opsize = *ctx->ptr++; 217 + 218 + if (opsize < 2) 219 + goto stop; 220 + 221 + switch (opcode) { 222 + case TCPOPT_MSS: 223 + if (opsize == TCPOLEN_MSS && ctx->tcp->syn && 224 + ctx->ptr + (TCPOLEN_MSS - 2) < ctx->data_end) 225 + ctx->attrs.mss = get_unaligned_be16(ctx->ptr); 226 + break; 227 + case TCPOPT_WINDOW: 228 + if (opsize == TCPOLEN_WINDOW && ctx->tcp->syn && 229 + ctx->ptr + (TCPOLEN_WINDOW - 2) < ctx->data_end) { 230 + ctx->attrs.wscale_ok = 1; 231 + ctx->attrs.snd_wscale = *ctx->ptr; 232 + } 233 + break; 234 + case TCPOPT_TIMESTAMP: 235 + if (opsize == TCPOLEN_TIMESTAMP && 236 + ctx->ptr + (TCPOLEN_TIMESTAMP - 2) < ctx->data_end) { 237 + ctx->attrs.rcv_tsval = get_unaligned_be32(ctx->ptr); 238 + ctx->attrs.rcv_tsecr = get_unaligned_be32(ctx->ptr + 4); 239 + 240 + if (ctx->tcp->syn && ctx->attrs.rcv_tsecr) 241 + ctx->attrs.tstamp_ok = 0; 242 + else 243 + ctx->attrs.tstamp_ok = 1; 244 + } 245 + break; 246 + case TCPOPT_SACK_PERM: 247 + if (opsize == TCPOLEN_SACK_PERM && ctx->tcp->syn && 248 + ctx->ptr + (TCPOLEN_SACK_PERM - 2) < ctx->data_end) 249 + ctx->attrs.sack_ok = 1; 250 + break; 251 + } 252 + 253 + ctx->ptr += opsize - 2; 254 + next: 255 + return 0; 256 + stop: 257 + return 1; 258 + } 259 + 260 + static void tcp_parse_options(struct tcp_syncookie *ctx) 261 + { 262 + ctx->ptr = (char *)(ctx->tcp + 1); 263 + 264 + bpf_loop(40, tcp_parse_option, ctx, 0); 265 + } 266 + 267 + static int tcp_validate_sysctl(struct tcp_syncookie *ctx) 268 + { 269 + if ((ctx->ipv4 && ctx->attrs.mss != 
MSS_LOCAL_IPV4) || 270 + (ctx->ipv6 && ctx->attrs.mss != MSS_LOCAL_IPV6)) 271 + goto err; 272 + 273 + if (!ctx->attrs.wscale_ok || ctx->attrs.snd_wscale != 7) 274 + goto err; 275 + 276 + if (!ctx->attrs.tstamp_ok) 277 + goto err; 278 + 279 + if (!ctx->attrs.sack_ok) 280 + goto err; 281 + 282 + if (!ctx->tcp->ece || !ctx->tcp->cwr) 283 + goto err; 284 + 285 + return 0; 286 + err: 287 + return -1; 288 + } 289 + 290 + static void tcp_prepare_cookie(struct tcp_syncookie *ctx) 291 + { 292 + u32 seq = bpf_ntohl(ctx->tcp->seq); 293 + u64 first = 0, second; 294 + int mssind = 0; 295 + u32 hash; 296 + 297 + if (ctx->ipv4) { 298 + for (mssind = ARRAY_SIZE(msstab4) - 1; mssind; mssind--) 299 + if (ctx->attrs.mss >= msstab4[mssind]) 300 + break; 301 + 302 + ctx->attrs.mss = msstab4[mssind]; 303 + 304 + first = (u64)ctx->ipv4->saddr << 32 | ctx->ipv4->daddr; 305 + } else if (ctx->ipv6) { 306 + for (mssind = ARRAY_SIZE(msstab6) - 1; mssind; mssind--) 307 + if (ctx->attrs.mss >= msstab6[mssind]) 308 + break; 309 + 310 + ctx->attrs.mss = msstab6[mssind]; 311 + 312 + first = (u64)ctx->ipv6->saddr.in6_u.u6_addr8[0] << 32 | 313 + ctx->ipv6->daddr.in6_u.u6_addr32[0]; 314 + } 315 + 316 + second = (u64)seq << 32 | ctx->tcp->source << 16 | ctx->tcp->dest; 317 + hash = siphash_2u64(first, second, &test_key_siphash); 318 + 319 + if (ctx->attrs.tstamp_ok) { 320 + ctx->attrs.rcv_tsecr = bpf_get_prandom_u32(); 321 + ctx->attrs.rcv_tsecr &= ~COOKIE_MASK; 322 + ctx->attrs.rcv_tsecr |= hash & COOKIE_MASK; 323 + } 324 + 325 + hash &= ~COOKIE_MASK; 326 + hash |= mssind << 6; 327 + 328 + if (ctx->attrs.wscale_ok) 329 + hash |= ctx->attrs.snd_wscale & BPF_SYNCOOKIE_WSCALE_MASK; 330 + 331 + if (ctx->attrs.sack_ok) 332 + hash |= BPF_SYNCOOKIE_SACK; 333 + 334 + if (ctx->attrs.tstamp_ok && ctx->tcp->ece && ctx->tcp->cwr) 335 + hash |= BPF_SYNCOOKIE_ECN; 336 + 337 + ctx->cookie = hash; 338 + } 339 + 340 + static void tcp_write_options(struct tcp_syncookie *ctx) 341 + { 342 + ctx->ptr32 = (__be32 
*)(ctx->tcp + 1); 343 + 344 + *ctx->ptr32++ = bpf_htonl(TCPOPT_MSS << 24 | TCPOLEN_MSS << 16 | 345 + ctx->attrs.mss); 346 + 347 + if (ctx->attrs.wscale_ok) 348 + *ctx->ptr32++ = bpf_htonl(TCPOPT_NOP << 24 | 349 + TCPOPT_WINDOW << 16 | 350 + TCPOLEN_WINDOW << 8 | 351 + ctx->attrs.snd_wscale); 352 + 353 + if (ctx->attrs.tstamp_ok) { 354 + if (ctx->attrs.sack_ok) 355 + *ctx->ptr32++ = bpf_htonl(TCPOPT_SACK_PERM << 24 | 356 + TCPOLEN_SACK_PERM << 16 | 357 + TCPOPT_TIMESTAMP << 8 | 358 + TCPOLEN_TIMESTAMP); 359 + else 360 + *ctx->ptr32++ = bpf_htonl(TCPOPT_NOP << 24 | 361 + TCPOPT_NOP << 16 | 362 + TCPOPT_TIMESTAMP << 8 | 363 + TCPOLEN_TIMESTAMP); 364 + 365 + *ctx->ptr32++ = bpf_htonl(ctx->attrs.rcv_tsecr); 366 + *ctx->ptr32++ = bpf_htonl(ctx->attrs.rcv_tsval); 367 + } else if (ctx->attrs.sack_ok) { 368 + *ctx->ptr32++ = bpf_htonl(TCPOPT_NOP << 24 | 369 + TCPOPT_NOP << 16 | 370 + TCPOPT_SACK_PERM << 8 | 371 + TCPOLEN_SACK_PERM); 372 + } 373 + } 374 + 375 + static int tcp_handle_syn(struct tcp_syncookie *ctx) 376 + { 377 + s64 csum; 378 + 379 + if (tcp_validate_header(ctx)) 380 + goto err; 381 + 382 + tcp_parse_options(ctx); 383 + 384 + if (tcp_validate_sysctl(ctx)) 385 + goto err; 386 + 387 + tcp_prepare_cookie(ctx); 388 + tcp_write_options(ctx); 389 + 390 + swap(ctx->tcp->source, ctx->tcp->dest); 391 + ctx->tcp->check = 0; 392 + ctx->tcp->ack_seq = bpf_htonl(bpf_ntohl(ctx->tcp->seq) + 1); 393 + ctx->tcp->seq = bpf_htonl(ctx->cookie); 394 + ctx->tcp->doff = ((long)ctx->ptr32 - (long)ctx->tcp) >> 2; 395 + ctx->tcp->ack = 1; 396 + if (!ctx->attrs.tstamp_ok || !ctx->tcp->ece || !ctx->tcp->cwr) 397 + ctx->tcp->ece = 0; 398 + ctx->tcp->cwr = 0; 399 + 400 + csum = bpf_csum_diff(0, 0, (void *)ctx->tcp, ctx->tcp->doff * 4, 0); 401 + if (csum < 0) 402 + goto err; 403 + 404 + if (ctx->ipv4) { 405 + swap(ctx->ipv4->saddr, ctx->ipv4->daddr); 406 + ctx->tcp->check = tcp_v4_csum(ctx, csum); 407 + 408 + ctx->ipv4->check = 0; 409 + ctx->ipv4->tos = 0; 410 + ctx->ipv4->tot_len = 
bpf_htons((long)ctx->ptr32 - (long)ctx->ipv4); 411 + ctx->ipv4->id = 0; 412 + ctx->ipv4->ttl = 64; 413 + 414 + csum = bpf_csum_diff(0, 0, (void *)ctx->ipv4, sizeof(*ctx->ipv4), 0); 415 + if (csum < 0) 416 + goto err; 417 + 418 + ctx->ipv4->check = csum_fold(csum); 419 + } else if (ctx->ipv6) { 420 + swap(ctx->ipv6->saddr, ctx->ipv6->daddr); 421 + ctx->tcp->check = tcp_v6_csum(ctx, csum); 422 + 423 + *(__be32 *)ctx->ipv6 = bpf_htonl(0x60000000); 424 + ctx->ipv6->payload_len = bpf_htons((long)ctx->ptr32 - (long)ctx->tcp); 425 + ctx->ipv6->hop_limit = 64; 426 + } 427 + 428 + swap_array(ctx->eth->h_source, ctx->eth->h_dest); 429 + 430 + if (bpf_skb_change_tail(ctx->skb, (long)ctx->ptr32 - (long)ctx->eth, 0)) 431 + goto err; 432 + 433 + return bpf_redirect(ctx->skb->ifindex, 0); 434 + err: 435 + return TC_ACT_SHOT; 436 + } 437 + 438 + static int tcp_validate_cookie(struct tcp_syncookie *ctx) 439 + { 440 + u32 cookie = bpf_ntohl(ctx->tcp->ack_seq) - 1; 441 + u32 seq = bpf_ntohl(ctx->tcp->seq) - 1; 442 + u64 first = 0, second; 443 + int mssind; 444 + u32 hash; 445 + 446 + if (ctx->ipv4) 447 + first = (u64)ctx->ipv4->saddr << 32 | ctx->ipv4->daddr; 448 + else if (ctx->ipv6) 449 + first = (u64)ctx->ipv6->saddr.in6_u.u6_addr8[0] << 32 | 450 + ctx->ipv6->daddr.in6_u.u6_addr32[0]; 451 + 452 + second = (u64)seq << 32 | ctx->tcp->source << 16 | ctx->tcp->dest; 453 + hash = siphash_2u64(first, second, &test_key_siphash); 454 + 455 + if (ctx->attrs.tstamp_ok) 456 + hash -= ctx->attrs.rcv_tsecr & COOKIE_MASK; 457 + else 458 + hash &= ~COOKIE_MASK; 459 + 460 + hash -= cookie & ~COOKIE_MASK; 461 + if (hash) 462 + goto err; 463 + 464 + mssind = (cookie & (3 << 6)) >> 6; 465 + if (ctx->ipv4) { 466 + if (mssind > ARRAY_SIZE(msstab4)) 467 + goto err; 468 + 469 + ctx->attrs.mss = msstab4[mssind]; 470 + } else { 471 + if (mssind > ARRAY_SIZE(msstab6)) 472 + goto err; 473 + 474 + ctx->attrs.mss = msstab6[mssind]; 475 + } 476 + 477 + ctx->attrs.snd_wscale = cookie & 
BPF_SYNCOOKIE_WSCALE_MASK; 478 + ctx->attrs.rcv_wscale = ctx->attrs.snd_wscale; 479 + ctx->attrs.wscale_ok = ctx->attrs.snd_wscale == BPF_SYNCOOKIE_WSCALE_MASK; 480 + ctx->attrs.sack_ok = cookie & BPF_SYNCOOKIE_SACK; 481 + ctx->attrs.ecn_ok = cookie & BPF_SYNCOOKIE_ECN; 482 + 483 + return 0; 484 + err: 485 + return -1; 486 + } 487 + 488 + static int tcp_handle_ack(struct tcp_syncookie *ctx) 489 + { 490 + struct bpf_sock_tuple tuple; 491 + struct bpf_sock *skc; 492 + int ret = TC_ACT_OK; 493 + struct sock *sk; 494 + u32 tuple_size; 495 + 496 + if (ctx->ipv4) { 497 + tuple.ipv4.saddr = ctx->ipv4->saddr; 498 + tuple.ipv4.daddr = ctx->ipv4->daddr; 499 + tuple.ipv4.sport = ctx->tcp->source; 500 + tuple.ipv4.dport = ctx->tcp->dest; 501 + tuple_size = sizeof(tuple.ipv4); 502 + } else if (ctx->ipv6) { 503 + __builtin_memcpy(tuple.ipv6.saddr, &ctx->ipv6->saddr, sizeof(tuple.ipv6.saddr)); 504 + __builtin_memcpy(tuple.ipv6.daddr, &ctx->ipv6->daddr, sizeof(tuple.ipv6.daddr)); 505 + tuple.ipv6.sport = ctx->tcp->source; 506 + tuple.ipv6.dport = ctx->tcp->dest; 507 + tuple_size = sizeof(tuple.ipv6); 508 + } else { 509 + goto out; 510 + } 511 + 512 + skc = bpf_skc_lookup_tcp(ctx->skb, &tuple, tuple_size, -1, 0); 513 + if (!skc) 514 + goto out; 515 + 516 + if (skc->state != TCP_LISTEN) 517 + goto release; 518 + 519 + sk = (struct sock *)bpf_skc_to_tcp_sock(skc); 520 + if (!sk) 521 + goto err; 522 + 523 + if (tcp_validate_header(ctx)) 524 + goto err; 525 + 526 + tcp_parse_options(ctx); 527 + 528 + if (tcp_validate_cookie(ctx)) 529 + goto err; 530 + 531 + ret = bpf_sk_assign_tcp_reqsk(ctx->skb, sk, &ctx->attrs, sizeof(ctx->attrs)); 532 + if (ret < 0) 533 + goto err; 534 + 535 + release: 536 + bpf_sk_release(skc); 537 + out: 538 + return ret; 539 + 540 + err: 541 + ret = TC_ACT_SHOT; 542 + goto release; 543 + } 544 + 545 + SEC("tc") 546 + int tcp_custom_syncookie(struct __sk_buff *skb) 547 + { 548 + struct tcp_syncookie ctx = { 549 + .skb = skb, 550 + }; 551 + 552 + if 
(tcp_load_headers(&ctx)) 553 + return TC_ACT_OK; 554 + 555 + if (ctx.tcp->rst) 556 + return TC_ACT_OK; 557 + 558 + if (ctx.tcp->syn) { 559 + if (ctx.tcp->ack) 560 + return TC_ACT_OK; 561 + 562 + handled_syn = true; 563 + 564 + return tcp_handle_syn(&ctx); 565 + } 566 + 567 + handled_ack = true; 568 + 569 + return tcp_handle_ack(&ctx); 570 + } 571 + 572 + char _license[] SEC("license") = "GPL";
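The ISN layout described in the comment at the top of test_tcp_custom_syncookie.c (hash in bits 31..8, MSS index in bits 7..6, ECN in bit 5, SACK in bit 4, WScale in bits 3..0) can be illustrated with a minimal user-space sketch. The struct cookie_fields and the functions cookie_pack and cookie_unpack are hypothetical names introduced here; the test program itself performs these steps inline in tcp_prepare_cookie() and tcp_validate_cookie(), and this sketch leaves out the hash comparison and the wscale_ok handling.

```c
#include <stdbool.h>
#include <stdint.h>

#define COOKIE_BITS		8
#define COOKIE_MASK		((((uint32_t)1) << COOKIE_BITS) - 1)

#define SYNCOOKIE_WSCALE_MASK	((1u << 4) - 1)	/* bits 3..0 */
#define SYNCOOKIE_SACK		(1u << 4)	/* bit 4 */
#define SYNCOOKIE_ECN		(1u << 5)	/* bit 5 */
#define SYNCOOKIE_MSS_SHIFT	6		/* bits 7..6 */

/* Hypothetical container for the options carried in the low 8 bits. */
struct cookie_fields {
	unsigned int mssind;	/* index into the MSS table, 0..3 */
	bool ecn;
	bool sack;
	unsigned int wscale;	/* 0..15 */
};

/* Keep the upper 24 hash bits and place the option fields below them. */
static inline uint32_t cookie_pack(uint32_t hash, const struct cookie_fields *f)
{
	uint32_t cookie = hash & ~COOKIE_MASK;

	cookie |= (f->mssind & 3u) << SYNCOOKIE_MSS_SHIFT;
	cookie |= f->wscale & SYNCOOKIE_WSCALE_MASK;
	if (f->sack)
		cookie |= SYNCOOKIE_SACK;
	if (f->ecn)
		cookie |= SYNCOOKIE_ECN;
	return cookie;
}

/* Recover the option fields from the low 8 bits of a cookie. */
static inline struct cookie_fields cookie_unpack(uint32_t cookie)
{
	struct cookie_fields f;

	f.mssind = (cookie >> SYNCOOKIE_MSS_SHIFT) & 3u;
	f.ecn = (cookie & SYNCOOKIE_ECN) != 0;
	f.sack = (cookie & SYNCOOKIE_SACK) != 0;
	f.wscale = cookie & SYNCOOKIE_WSCALE_MASK;
	return f;
}
```

Because the MSS index occupies exactly two bits, an unpacked index is always in the range 0..3, which matches the four-entry msstab4 and msstab6 tables.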

+140

tools/testing/selftests/bpf/progs/test_tcp_custom_syncookie.h

··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* Copyright Amazon.com Inc. or its affiliates. */ 3 + 4 + #ifndef _TEST_TCP_SYNCOOKIE_H 5 + #define _TEST_TCP_SYNCOOKIE_H 6 + 7 + #define __packed __attribute__((__packed__)) 8 + #define __force 9 + 10 + #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0])) 11 + 12 + #define swap(a, b) \ 13 + do { \ 14 + typeof(a) __tmp = (a); \ 15 + (a) = (b); \ 16 + (b) = __tmp; \ 17 + } while (0) 18 + 19 + #define swap_array(a, b) \ 20 + do { \ 21 + typeof(a) __tmp[sizeof(a)]; \ 22 + __builtin_memcpy(__tmp, a, sizeof(a)); \ 23 + __builtin_memcpy(a, b, sizeof(a)); \ 24 + __builtin_memcpy(b, __tmp, sizeof(a)); \ 25 + } while (0) 26 + 27 + /* asm-generic/unaligned.h */ 28 + #define __get_unaligned_t(type, ptr) ({ \ 29 + const struct { type x; } __packed * __pptr = (typeof(__pptr))(ptr); \ 30 + __pptr->x; \ 31 + }) 32 + 33 + #define get_unaligned(ptr) __get_unaligned_t(typeof(*(ptr)), (ptr)) 34 + 35 + static inline u16 get_unaligned_be16(const void *p) 36 + { 37 + return bpf_ntohs(__get_unaligned_t(__be16, p)); 38 + } 39 + 40 + static inline u32 get_unaligned_be32(const void *p) 41 + { 42 + return bpf_ntohl(__get_unaligned_t(__be32, p)); 43 + } 44 + 45 + /* lib/checksum.c */ 46 + static inline u32 from64to32(u64 x) 47 + { 48 + /* add up 32-bit and 32-bit for 32+c bit */ 49 + x = (x & 0xffffffff) + (x >> 32); 50 + /* add up carry.. 
*/ 51 + x = (x & 0xffffffff) + (x >> 32); 52 + return (u32)x; 53 + } 54 + 55 + static inline __wsum csum_tcpudp_nofold(__be32 saddr, __be32 daddr, 56 + __u32 len, __u8 proto, __wsum sum) 57 + { 58 + unsigned long long s = (__force u32)sum; 59 + 60 + s += (__force u32)saddr; 61 + s += (__force u32)daddr; 62 + #ifdef __BIG_ENDIAN 63 + s += proto + len; 64 + #else 65 + s += (proto + len) << 8; 66 + #endif 67 + return (__force __wsum)from64to32(s); 68 + } 69 + 70 + /* asm-generic/checksum.h */ 71 + static inline __sum16 csum_fold(__wsum csum) 72 + { 73 + u32 sum = (__force u32)csum; 74 + 75 + sum = (sum & 0xffff) + (sum >> 16); 76 + sum = (sum & 0xffff) + (sum >> 16); 77 + return (__force __sum16)~sum; 78 + } 79 + 80 + static inline __sum16 csum_tcpudp_magic(__be32 saddr, __be32 daddr, __u32 len, 81 + __u8 proto, __wsum sum) 82 + { 83 + return csum_fold(csum_tcpudp_nofold(saddr, daddr, len, proto, sum)); 84 + } 85 + 86 + /* net/ipv6/ip6_checksum.c */ 87 + static inline __sum16 csum_ipv6_magic(const struct in6_addr *saddr, 88 + const struct in6_addr *daddr, 89 + __u32 len, __u8 proto, __wsum csum) 90 + { 91 + int carry; 92 + __u32 ulen; 93 + __u32 uproto; 94 + __u32 sum = (__force u32)csum; 95 + 96 + sum += (__force u32)saddr->in6_u.u6_addr32[0]; 97 + carry = (sum < (__force u32)saddr->in6_u.u6_addr32[0]); 98 + sum += carry; 99 + 100 + sum += (__force u32)saddr->in6_u.u6_addr32[1]; 101 + carry = (sum < (__force u32)saddr->in6_u.u6_addr32[1]); 102 + sum += carry; 103 + 104 + sum += (__force u32)saddr->in6_u.u6_addr32[2]; 105 + carry = (sum < (__force u32)saddr->in6_u.u6_addr32[2]); 106 + sum += carry; 107 + 108 + sum += (__force u32)saddr->in6_u.u6_addr32[3]; 109 + carry = (sum < (__force u32)saddr->in6_u.u6_addr32[3]); 110 + sum += carry; 111 + 112 + sum += (__force u32)daddr->in6_u.u6_addr32[0]; 113 + carry = (sum < (__force u32)daddr->in6_u.u6_addr32[0]); 114 + sum += carry; 115 + 116 + sum += (__force u32)daddr->in6_u.u6_addr32[1]; 117 + carry = (sum < (__force 
u32)daddr->in6_u.u6_addr32[1]); 118 + sum += carry; 119 + 120 + sum += (__force u32)daddr->in6_u.u6_addr32[2]; 121 + carry = (sum < (__force u32)daddr->in6_u.u6_addr32[2]); 122 + sum += carry; 123 + 124 + sum += (__force u32)daddr->in6_u.u6_addr32[3]; 125 + carry = (sum < (__force u32)daddr->in6_u.u6_addr32[3]); 126 + sum += carry; 127 + 128 + ulen = (__force u32)bpf_htonl((__u32)len); 129 + sum += ulen; 130 + carry = (sum < ulen); 131 + sum += carry; 132 + 133 + uproto = (__force u32)bpf_htonl(proto); 134 + sum += uproto; 135 + carry = (sum < uproto); 136 + sum += carry; 137 + 138 + return csum_fold((__force __wsum)sum); 139 + } 140 + #endif
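The folding helpers copied into test_tcp_custom_syncookie.h can be exercised on their own with plain fixed-width types. This is a minimal sketch, assuming host-order 16-bit words that have already been summed into a wider accumulator; the _sketch suffix marks the names as illustrative, and the kernel's __wsum/__sum16 annotations are dropped.

```c
#include <stdint.h>

/* Fold a 64-bit accumulator into 32 bits, adding the carry back in. */
static inline uint32_t from64to32_sketch(uint64_t x)
{
	/* add the high and low 32-bit halves (result may need 33 bits) */
	x = (x & 0xffffffff) + (x >> 32);
	/* add the remaining carry */
	x = (x & 0xffffffff) + (x >> 32);
	return (uint32_t)x;
}

/* Fold a 32-bit partial sum into 16 bits and return its complement. */
static inline uint16_t csum_fold_sketch(uint32_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}
```

Folding a sum taken over a header that already contains its correct checksum yields 0, which is the property tcp_validate_header() relies on when it rejects packets whose csum_fold(csum) is non-zero.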

+1 -1

tools/testing/selftests/bpf/progs/test_tcpbpf_kern.c

··· 59 59 60 60 asm volatile ( 61 61 "%[op] = *(u32 *)(%[skops] +96)" 62 - : [op] "+r"(op) 62 + : [op] "=r"(op) 63 63 : [skops] "r"(skops) 64 64 :); 65 65

+5 -5

tools/testing/selftests/bpf/progs/test_xdp_dynptr.c

··· 18 18 #include "test_iptunnel_common.h" 19 19 #include "bpf_kfuncs.h" 20 20 21 - const size_t tcphdr_sz = sizeof(struct tcphdr); 22 - const size_t udphdr_sz = sizeof(struct udphdr); 23 - const size_t ethhdr_sz = sizeof(struct ethhdr); 24 - const size_t iphdr_sz = sizeof(struct iphdr); 25 - const size_t ipv6hdr_sz = sizeof(struct ipv6hdr); 21 + #define tcphdr_sz sizeof(struct tcphdr) 22 + #define udphdr_sz sizeof(struct udphdr) 23 + #define ethhdr_sz sizeof(struct ethhdr) 24 + #define iphdr_sz sizeof(struct iphdr) 25 + #define ipv6hdr_sz sizeof(struct ipv6hdr) 26 26 27 27 struct { 28 28 __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);

+32

tools/testing/selftests/bpf/progs/token_lsm.c

··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */ 3 + 4 + #include "vmlinux.h" 5 + #include <bpf/bpf_helpers.h> 6 + #include <bpf/bpf_tracing.h> 7 + 8 + char _license[] SEC("license") = "GPL"; 9 + 10 + int my_pid; 11 + bool reject_capable; 12 + bool reject_cmd; 13 + 14 + SEC("lsm/bpf_token_capable") 15 + int BPF_PROG(token_capable, struct bpf_token *token, int cap) 16 + { 17 + if (my_pid == 0 || my_pid != (bpf_get_current_pid_tgid() >> 32)) 18 + return 0; 19 + if (reject_capable) 20 + return -1; 21 + return 0; 22 + } 23 + 24 + SEC("lsm/bpf_token_cmd") 25 + int BPF_PROG(token_cmd, struct bpf_token *token, enum bpf_cmd cmd) 26 + { 27 + if (my_pid == 0 || my_pid != (bpf_get_current_pid_tgid() >> 32)) 28 + return 0; 29 + if (reject_cmd) 30 + return -1; 31 + return 0; 32 + }

+1 -1

tools/testing/selftests/bpf/progs/verifier_direct_packet_access.c

··· 568 568 569 569 SEC("tc") 570 570 __description("direct packet access: test23 (x += pkt_ptr, 4)") 571 - __failure __msg("invalid access to packet, off=0 size=8, R5(id=2,off=0,r=0)") 571 + __failure __msg("invalid access to packet, off=0 size=8, R5(id=3,off=0,r=0)") 572 572 __flag(BPF_F_ANY_ALIGNMENT) 573 573 __naked void test23_x_pkt_ptr_4(void) 574 574 {

+24

tools/testing/selftests/bpf/progs/verifier_loops1.c

··· 259 259 " ::: __clobber_all); 260 260 } 261 261 262 + SEC("xdp") 263 + __success 264 + __naked void not_an_inifinite_loop(void) 265 + { 266 + asm volatile (" \ 267 + call %[bpf_get_prandom_u32]; \ 268 + r0 &= 0xff; \ 269 + *(u64 *)(r10 - 8) = r0; \ 270 + r0 = 0; \ 271 + loop_%=: \ 272 + r0 = *(u64 *)(r10 - 8); \ 273 + if r0 > 10 goto exit_%=; \ 274 + r0 += 1; \ 275 + *(u64 *)(r10 - 8) = r0; \ 276 + r0 = 0; \ 277 + goto loop_%=; \ 278 + exit_%=: \ 279 + r0 = 0; \ 280 + exit; \ 281 + " : 282 + : __imm(bpf_get_prandom_u32) 283 + : __clobber_all); 284 + } 285 + 262 286 char _license[] SEC("license") = "GPL";

+216 -13

tools/testing/selftests/bpf/progs/verifier_spill_fill.c

··· 243 243 244 244 SEC("tc") 245 245 __description("Spill u32 const scalars. Refill as u64. Offset to skb->data") 246 - __failure __msg("invalid access to packet") 246 + __failure __msg("math between pkt pointer and register with unbounded min value is not allowed") 247 247 __naked void u64_offset_to_skb_data(void) 248 248 { 249 249 asm volatile (" \ ··· 253 253 w7 = 20; \ 254 254 *(u32*)(r10 - 4) = r6; \ 255 255 *(u32*)(r10 - 8) = r7; \ 256 - r4 = *(u16*)(r10 - 8); \ 256 + r4 = *(u64*)(r10 - 8); \ 257 257 r0 = r2; \ 258 - /* r0 += r4 R0=pkt R2=pkt R3=pkt_end R4=umax=65535 */\ 258 + /* r0 += r4 R0=pkt R2=pkt R3=pkt_end R4= */ \ 259 259 r0 += r4; \ 260 - /* if (r0 > r3) R0=pkt,umax=65535 R2=pkt R3=pkt_end R4=umax=65535 */\ 261 260 if r0 > r3 goto l0_%=; \ 262 - /* r0 = *(u32 *)r2 R0=pkt,umax=65535 R2=pkt R3=pkt_end R4=20 */\ 263 261 r0 = *(u32*)(r2 + 0); \ 264 262 l0_%=: r0 = 0; \ 265 263 exit; \ ··· 493 495 SEC("raw_tp") 494 496 __log_level(2) 495 497 __success 496 - /* make sure fp-8 is all STACK_ZERO */ 497 - __msg("2: (7a) *(u64 *)(r10 -8) = 0 ; R10=fp0 fp-8_w=00000000") 498 + /* fp-8 is spilled IMPRECISE value zero (represented by a zero value fake reg) */ 499 + __msg("2: (7a) *(u64 *)(r10 -8) = 0 ; R10=fp0 fp-8_w=0") 498 500 /* but fp-16 is spilled IMPRECISE zero const reg */ 499 501 __msg("4: (7b) *(u64 *)(r10 -16) = r0 ; R0_w=0 R10=fp0 fp-16_w=0") 500 - /* validate that assigning R2 from STACK_ZERO doesn't mark register 502 + /* validate that assigning R2 from STACK_SPILL with zero value doesn't mark register 501 503 * precise immediately; if necessary, it will be marked precise later 502 504 */ 503 - __msg("6: (71) r2 = *(u8 *)(r10 -1) ; R2_w=0 R10=fp0 fp-8_w=00000000") 505 + __msg("6: (71) r2 = *(u8 *)(r10 -1) ; R2_w=0 R10=fp0 fp-8_w=0") 504 506 /* similarly, when R2 is assigned from spilled register, it is initially 505 507 * imprecise, but will be marked precise later once it is used in precise context 506 508 */ ··· 518 520 __naked void 
partial_stack_load_preserves_zeros(void) 519 521 { 520 522 asm volatile ( 521 - /* fp-8 is all STACK_ZERO */ 523 + /* fp-8 is value zero (represented by a zero value fake reg) */ 522 524 ".8byte %[fp8_st_zero];" /* LLVM-18+: *(u64 *)(r10 -8) = 0; */ 523 525 524 526 /* fp-16 is const zero register */ 525 527 "r0 = 0;" 526 528 "*(u64 *)(r10 -16) = r0;" 527 529 528 - /* load single U8 from non-aligned STACK_ZERO slot */ 530 + /* load single U8 from non-aligned spilled value zero slot */ 529 531 "r1 = %[single_byte_buf];" 530 532 "r2 = *(u8 *)(r10 -1);" 531 533 "r1 += r2;" ··· 537 539 "r1 += r2;" 538 540 "*(u8 *)(r1 + 0) = r2;" /* this should be fine */ 539 541 540 - /* load single U16 from non-aligned STACK_ZERO slot */ 542 + /* load single U16 from non-aligned spilled value zero slot */ 541 543 "r1 = %[single_byte_buf];" 542 544 "r2 = *(u16 *)(r10 -2);" 543 545 "r1 += r2;" ··· 549 551 "r1 += r2;" 550 552 "*(u8 *)(r1 + 0) = r2;" /* this should be fine */ 551 553 552 - /* load single U32 from non-aligned STACK_ZERO slot */ 554 + /* load single U32 from non-aligned spilled value zero slot */ 553 555 "r1 = %[single_byte_buf];" 554 556 "r2 = *(u32 *)(r10 -4);" 555 557 "r1 += r2;" ··· 578 580 : 579 581 : __imm_ptr(single_byte_buf), 580 582 __imm_insn(fp8_st_zero, BPF_ST_MEM(BPF_DW, BPF_REG_FP, -8, 0)) 583 + : __clobber_common); 584 + } 585 + 586 + SEC("raw_tp") 587 + __log_level(2) 588 + __success 589 + /* fp-4 is STACK_ZERO */ 590 + __msg("2: (62) *(u32 *)(r10 -4) = 0 ; R10=fp0 fp-8=0000????") 591 + __msg("4: (71) r2 = *(u8 *)(r10 -1) ; R2_w=0 R10=fp0 fp-8=0000????") 592 + __msg("5: (0f) r1 += r2") 593 + __msg("mark_precise: frame0: last_idx 5 first_idx 0 subseq_idx -1") 594 + __msg("mark_precise: frame0: regs=r2 stack= before 4: (71) r2 = *(u8 *)(r10 -1)") 595 + __naked void partial_stack_load_preserves_partial_zeros(void) 596 + { 597 + asm volatile ( 598 + /* fp-4 is value zero */ 599 + ".8byte %[fp4_st_zero];" /* LLVM-18+: *(u32 *)(r10 -4) = 0; */ 600 + 601 + /* load 
single U8 from non-aligned stack zero slot */ 602 + "r1 = %[single_byte_buf];" 603 + "r2 = *(u8 *)(r10 -1);" 604 + "r1 += r2;" 605 + "*(u8 *)(r1 + 0) = r2;" /* this should be fine */ 606 + 607 + /* load single U16 from non-aligned stack zero slot */ 608 + "r1 = %[single_byte_buf];" 609 + "r2 = *(u16 *)(r10 -2);" 610 + "r1 += r2;" 611 + "*(u8 *)(r1 + 0) = r2;" /* this should be fine */ 612 + 613 + /* load single U32 from non-aligned stack zero slot */ 614 + "r1 = %[single_byte_buf];" 615 + "r2 = *(u32 *)(r10 -4);" 616 + "r1 += r2;" 617 + "*(u8 *)(r1 + 0) = r2;" /* this should be fine */ 618 + 619 + "r0 = 0;" 620 + "exit;" 621 + : 622 + : __imm_ptr(single_byte_buf), 623 + __imm_insn(fp4_st_zero, BPF_ST_MEM(BPF_W, BPF_REG_FP, -4, 0)) 581 624 : __clobber_common); 582 625 } 583 626 ··· 774 735 : __imm_ptr(two_byte_buf), 775 736 __imm_insn(fp8_st_one, BPF_ST_MEM(BPF_W, BPF_REG_FP, -8, 1)) /* 32-bit spill */ 776 737 : __clobber_common); 738 + } 739 + 740 + SEC("xdp") 741 + __description("32-bit spilled reg range should be tracked") 742 + __success __retval(0) 743 + __naked void spill_32bit_range_track(void) 744 + { 745 + asm volatile(" \ 746 + call %[bpf_ktime_get_ns]; \ 747 + /* Make r0 bounded. */ \ 748 + r0 &= 65535; \ 749 + /* Assign an ID to r0. */ \ 750 + r1 = r0; \ 751 + /* 32-bit spill r0 to stack. */ \ 752 + *(u32*)(r10 - 8) = r0; \ 753 + /* Boundary check on r0. */ \ 754 + if r0 < 1 goto l0_%=; \ 755 + /* 32-bit fill r1 from stack. */ \ 756 + r1 = *(u32*)(r10 - 8); \ 757 + /* r1 == r0 => r1 >= 1 always. */ \ 758 + if r1 >= 1 goto l0_%=; \ 759 + /* Dead branch: the verifier should prune it. \ 760 + * Do an invalid memory access if the verifier \ 761 + * follows it. 
\ 762 + */ \ 763 + r0 = *(u64*)(r9 + 0); \ 764 + l0_%=: r0 = 0; \ 765 + exit; \ 766 + " : 767 + : __imm(bpf_ktime_get_ns) 768 + : __clobber_all); 769 + } 770 + 771 + SEC("xdp") 772 + __description("64-bit spill of 64-bit reg should assign ID") 773 + __success __retval(0) 774 + __naked void spill_64bit_of_64bit_ok(void) 775 + { 776 + asm volatile (" \ 777 + /* Roll one bit to make the register inexact. */\ 778 + call %[bpf_get_prandom_u32]; \ 779 + r0 &= 0x80000000; \ 780 + r0 <<= 32; \ 781 + /* 64-bit spill r0 to stack - should assign an ID. */\ 782 + *(u64*)(r10 - 8) = r0; \ 783 + /* 64-bit fill r1 from stack - should preserve the ID. */\ 784 + r1 = *(u64*)(r10 - 8); \ 785 + /* Compare r1 with another register to trigger find_equal_scalars.\ 786 + * Having one random bit is important here, otherwise the verifier cuts\ 787 + * the corners. \ 788 + */ \ 789 + r2 = 0; \ 790 + if r1 != r2 goto l0_%=; \ 791 + /* The result of this comparison is predefined. */\ 792 + if r0 == r2 goto l0_%=; \ 793 + /* Dead branch: the verifier should prune it. Do an invalid memory\ 794 + * access if the verifier follows it. \ 795 + */ \ 796 + r0 = *(u64*)(r9 + 0); \ 797 + exit; \ 798 + l0_%=: r0 = 0; \ 799 + exit; \ 800 + " : 801 + : __imm(bpf_get_prandom_u32) 802 + : __clobber_all); 803 + } 804 + 805 + SEC("xdp") 806 + __description("32-bit spill of 32-bit reg should assign ID") 807 + __success __retval(0) 808 + __naked void spill_32bit_of_32bit_ok(void) 809 + { 810 + asm volatile (" \ 811 + /* Roll one bit to make the register inexact. */\ 812 + call %[bpf_get_prandom_u32]; \ 813 + w0 &= 0x80000000; \ 814 + /* 32-bit spill r0 to stack - should assign an ID. */\ 815 + *(u32*)(r10 - 8) = r0; \ 816 + /* 32-bit fill r1 from stack - should preserve the ID. */\ 817 + r1 = *(u32*)(r10 - 8); \ 818 + /* Compare r1 with another register to trigger find_equal_scalars.\ 819 + * Having one random bit is important here, otherwise the verifier cuts\ 820 + * the corners. 
\ 821 + */ \ 822 + r2 = 0; \ 823 + if r1 != r2 goto l0_%=; \ 824 + /* The result of this comparison is predefined. */\ 825 + if r0 == r2 goto l0_%=; \ 826 + /* Dead branch: the verifier should prune it. Do an invalid memory\ 827 + * access if the verifier follows it. \ 828 + */ \ 829 + r0 = *(u64*)(r9 + 0); \ 830 + exit; \ 831 + l0_%=: r0 = 0; \ 832 + exit; \ 833 + " : 834 + : __imm(bpf_get_prandom_u32) 835 + : __clobber_all); 836 + } 837 + 838 + SEC("xdp") 839 + __description("16-bit spill of 16-bit reg should assign ID") 840 + __success __retval(0) 841 + __naked void spill_16bit_of_16bit_ok(void) 842 + { 843 + asm volatile (" \ 844 + /* Roll one bit to make the register inexact. */\ 845 + call %[bpf_get_prandom_u32]; \ 846 + r0 &= 0x8000; \ 847 + /* 16-bit spill r0 to stack - should assign an ID. */\ 848 + *(u16*)(r10 - 8) = r0; \ 849 + /* 16-bit fill r1 from stack - should preserve the ID. */\ 850 + r1 = *(u16*)(r10 - 8); \ 851 + /* Compare r1 with another register to trigger find_equal_scalars.\ 852 + * Having one random bit is important here, otherwise the verifier cuts\ 853 + * the corners. \ 854 + */ \ 855 + r2 = 0; \ 856 + if r1 != r2 goto l0_%=; \ 857 + /* The result of this comparison is predefined. */\ 858 + if r0 == r2 goto l0_%=; \ 859 + /* Dead branch: the verifier should prune it. Do an invalid memory\ 860 + * access if the verifier follows it. \ 861 + */ \ 862 + r0 = *(u64*)(r9 + 0); \ 863 + exit; \ 864 + l0_%=: r0 = 0; \ 865 + exit; \ 866 + " : 867 + : __imm(bpf_get_prandom_u32) 868 + : __clobber_all); 869 + } 870 + 871 + SEC("xdp") 872 + __description("8-bit spill of 8-bit reg should assign ID") 873 + __success __retval(0) 874 + __naked void spill_8bit_of_8bit_ok(void) 875 + { 876 + asm volatile (" \ 877 + /* Roll one bit to make the register inexact. */\ 878 + call %[bpf_get_prandom_u32]; \ 879 + r0 &= 0x80; \ 880 + /* 8-bit spill r0 to stack - should assign an ID. 
*/\ 881 + *(u8*)(r10 - 8) = r0; \ 882 + /* 8-bit fill r1 from stack - should preserve the ID. */\ 883 + r1 = *(u8*)(r10 - 8); \ 884 + /* Compare r1 with another register to trigger find_equal_scalars.\ 885 + * Having one random bit is important here, otherwise the verifier cuts\ 886 + * the corners. \ 887 + */ \ 888 + r2 = 0; \ 889 + if r1 != r2 goto l0_%=; \ 890 + /* The result of this comparison is predefined. */\ 891 + if r0 == r2 goto l0_%=; \ 892 + /* Dead branch: the verifier should prune it. Do an invalid memory\ 893 + * access if the verifier follows it. \ 894 + */ \ 895 + r0 = *(u64*)(r9 + 0); \ 896 + exit; \ 897 + l0_%=: r0 = 0; \ 898 + exit; \ 899 + " : 900 + : __imm(bpf_get_prandom_u32) 901 + : __clobber_all); 777 902 } 778 903 779 904 char _license[] SEC("license") = "GPL";

+2 -2

tools/testing/selftests/bpf/test_loader.c

@@ -181,7 +181,7 @@
     memset(spec, 0, sizeof(*spec));
 
     spec->prog_name = bpf_program__name(prog);
-    spec->prog_flags = BPF_F_TEST_REG_INVARIANTS; /* by default be strict */
+    spec->prog_flags = testing_prog_flags();
 
     btf = bpf_object__btf(obj);
     if (!btf) {
@@ -688,7 +688,7 @@
         ++nr_progs;
 
     specs = calloc(nr_progs, sizeof(struct test_spec));
-    if (!ASSERT_OK_PTR(specs, "Can't alloc specs array"))
+    if (!ASSERT_OK_PTR(specs, "specs_alloc"))
         return;
 
     i = 0;

+5 -1

tools/testing/selftests/bpf/test_maps.c

@@ -1190,7 +1190,11 @@
         goto out_map_in_map;
     }
 
-    bpf_object__load(obj);
+    err = bpf_object__load(obj);
+    if (err) {
+        printf("Failed to load test prog\n");
+        goto out_map_in_map;
+    }
 
     map = bpf_object__find_map_by_name(obj, "mim_array");
     if (!map) {

-18

tools/testing/selftests/bpf/test_progs.c

@@ -547,24 +547,6 @@
     return bpf_map__fd(map);
 }
 
-static bool is_jit_enabled(void)
-{
-    const char *jit_sysctl = "/proc/sys/net/core/bpf_jit_enable";
-    bool enabled = false;
-    int sysctl_fd;
-
-    sysctl_fd = open(jit_sysctl, 0, O_RDONLY);
-    if (sysctl_fd != -1) {
-        char tmpc;
-
-        if (read(sysctl_fd, &tmpc, sizeof(tmpc)) == 1)
-            enabled = (tmpc != '0');
-        close(sysctl_fd);
-    }
-
-    return enabled;
-}
-
 int compare_map_keys(int map1_fd, int map2_fd)
 {
     __u32 key, next_key;

+2 -1

tools/testing/selftests/bpf/test_sock_addr.c

@@ -19,6 +19,7 @@
 #include <bpf/libbpf.h>
 
 #include "cgroup_helpers.h"
+#include "testing_helpers.h"
 #include "bpf_util.h"
 
 #ifndef ENOTSUPP
@@ -679,7 +680,7 @@
 
     bpf_program__set_type(prog, BPF_PROG_TYPE_CGROUP_SOCK_ADDR);
     bpf_program__set_expected_attach_type(prog, test->expected_attach_type);
-    bpf_program__set_flags(prog, BPF_F_TEST_RND_HI32 | BPF_F_TEST_REG_INVARIANTS);
+    bpf_program__set_flags(prog, testing_prog_flags());
 
     err = bpf_object__load(obj);
     if (err) {

+14 -46

tools/testing/selftests/bpf/test_verifier.c

@@ -67,6 +67,7 @@
 
 #define F_NEEDS_EFFICIENT_UNALIGNED_ACCESS (1 << 0)
 #define F_LOAD_WITH_STRICT_ALIGNMENT (1 << 1)
+#define F_NEEDS_JIT_ENABLED (1 << 2)
 
 /* need CAP_BPF, CAP_NET_ADMIN, CAP_PERFMON to load progs */
 #define ADMIN_CAPS (1ULL << CAP_NET_ADMIN | \
@@ -74,6 +75,7 @@
             1ULL << CAP_BPF)
 #define UNPRIV_SYSCTL "kernel/unprivileged_bpf_disabled"
 static bool unpriv_disabled = false;
+static bool jit_disabled;
 static int skips;
 static bool verbose = false;
 static int verif_log_level = 0;
@@ -1341,48 +1343,6 @@
     return true;
 }
 
-static struct bpf_insn *get_xlated_program(int fd_prog, int *cnt)
-{
-    __u32 buf_element_size = sizeof(struct bpf_insn);
-    struct bpf_prog_info info = {};
-    __u32 info_len = sizeof(info);
-    __u32 xlated_prog_len;
-    struct bpf_insn *buf;
-
-    if (bpf_prog_get_info_by_fd(fd_prog, &info, &info_len)) {
-        perror("bpf_prog_get_info_by_fd failed");
-        return NULL;
-    }
-
-    xlated_prog_len = info.xlated_prog_len;
-    if (xlated_prog_len % buf_element_size) {
-        printf("Program length %d is not multiple of %d\n",
-               xlated_prog_len, buf_element_size);
-        return NULL;
-    }
-
-    *cnt = xlated_prog_len / buf_element_size;
-    buf = calloc(*cnt, buf_element_size);
-    if (!buf) {
-        perror("can't allocate xlated program buffer");
-        return NULL;
-    }
-
-    bzero(&info, sizeof(info));
-    info.xlated_prog_len = xlated_prog_len;
-    info.xlated_prog_insns = (__u64)(unsigned long)buf;
-    if (bpf_prog_get_info_by_fd(fd_prog, &info, &info_len)) {
-        perror("second bpf_prog_get_info_by_fd failed");
-        goto out_free_buf;
-    }
-
-    return buf;
-
-out_free_buf:
-    free(buf);
-    return NULL;
-}
-
 static bool is_null_insn(struct bpf_insn *insn)
 {
     struct bpf_insn null_insn = {};
@@ -1505,7 +1465,7 @@
 static bool check_xlated_program(struct bpf_test *test, int fd_prog)
 {
     struct bpf_insn *buf;
-    int cnt;
+    unsigned int cnt;
     bool result = true;
     bool check_expected = !is_null_insn(test->expected_insns);
     bool check_unexpected = !is_null_insn(test->unexpected_insns);
@@ -1513,8 +1473,7 @@
     if (!check_expected && !check_unexpected)
         goto out;
 
-    buf = get_xlated_program(fd_prog, &cnt);
-    if (!buf) {
+    if (get_xlated_program(fd_prog, &buf, &cnt)) {
         printf("FAIL: can't get xlated program\n");
         result = false;
         goto out;
@@ -1567,6 +1526,13 @@
     __u32 pflags;
     int i, err;
 
+    if ((test->flags & F_NEEDS_JIT_ENABLED) && jit_disabled) {
+        printf("SKIP (requires BPF JIT)\n");
+        skips++;
+        sched_yield();
+        return;
+    }
+
     fd_prog = -1;
     for (i = 0; i < MAX_NR_MAPS; i++)
         map_fds[i] = -1;
@@ -1588,7 +1554,7 @@
     if (fixup_skips != skips)
         return;
 
-    pflags = BPF_F_TEST_RND_HI32 | BPF_F_TEST_REG_INVARIANTS;
+    pflags = testing_prog_flags();
     if (test->flags & F_LOAD_WITH_STRICT_ALIGNMENT)
         pflags |= BPF_F_STRICT_ALIGNMENT;
     if (test->flags & F_NEEDS_EFFICIENT_UNALIGNED_ACCESS)
@@ -1886,6 +1852,8 @@
                UNPRIV_SYSCTL);
         return EXIT_FAILURE;
     }
+
+    jit_disabled = !is_jit_enabled();
 
     /* Use libbpf 1.0 API mode */
     libbpf_set_strict_mode(LIBBPF_STRICT_ALL);

+90 -2

tools/testing/selftests/bpf/testing_helpers.c

@@ -252,6 +252,34 @@
 
 int extra_prog_load_log_flags = 0;
 
+int testing_prog_flags(void)
+{
+    static int cached_flags = -1;
+    static int prog_flags[] = { BPF_F_TEST_RND_HI32, BPF_F_TEST_REG_INVARIANTS };
+    static struct bpf_insn insns[] = {
+        BPF_MOV64_IMM(BPF_REG_0, 0),
+        BPF_EXIT_INSN(),
+    };
+    int insn_cnt = ARRAY_SIZE(insns), i, fd, flags = 0;
+    LIBBPF_OPTS(bpf_prog_load_opts, opts);
+
+    if (cached_flags >= 0)
+        return cached_flags;
+
+    for (i = 0; i < ARRAY_SIZE(prog_flags); i++) {
+        opts.prog_flags = prog_flags[i];
+        fd = bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER, "flag-test", "GPL",
+                           insns, insn_cnt, &opts);
+        if (fd >= 0) {
+            flags |= prog_flags[i];
+            close(fd);
+        }
+    }
+
+    cached_flags = flags;
+    return cached_flags;
+}
+
 int bpf_prog_test_load(const char *file, enum bpf_prog_type type,
                struct bpf_object **pobj, int *prog_fd)
 {
@@ -276,7 +304,7 @@
     if (type != BPF_PROG_TYPE_UNSPEC && bpf_program__type(prog) != type)
         bpf_program__set_type(prog, type);
 
-    flags = bpf_program__flags(prog) | BPF_F_TEST_RND_HI32 | BPF_F_TEST_REG_INVARIANTS;
+    flags = bpf_program__flags(prog) | testing_prog_flags();
     bpf_program__set_flags(prog, flags);
 
     err = bpf_object__load(obj);
@@ -299,7 +327,7 @@
 {
     LIBBPF_OPTS(bpf_prog_load_opts, opts,
         .kern_version = kern_version,
-        .prog_flags = BPF_F_TEST_RND_HI32 | BPF_F_TEST_REG_INVARIANTS,
+        .prog_flags = testing_prog_flags(),
         .log_level = extra_prog_load_log_flags,
         .log_buf = log_buf,
         .log_size = log_buf_sz,
@@ -386,4 +414,64 @@
 int kern_sync_rcu(void)
 {
     return syscall(__NR_membarrier, MEMBARRIER_CMD_SHARED, 0, 0);
+}
+
+int get_xlated_program(int fd_prog, struct bpf_insn **buf, __u32 *cnt)
+{
+    __u32 buf_element_size = sizeof(struct bpf_insn);
+    struct bpf_prog_info info = {};
+    __u32 info_len = sizeof(info);
+    __u32 xlated_prog_len;
+
+    if (bpf_prog_get_info_by_fd(fd_prog, &info, &info_len)) {
+        perror("bpf_prog_get_info_by_fd failed");
+        return -1;
+    }
+
+    xlated_prog_len = info.xlated_prog_len;
+    if (xlated_prog_len % buf_element_size) {
+        printf("Program length %u is not multiple of %u\n",
+               xlated_prog_len, buf_element_size);
+        return -1;
+    }
+
+    *cnt = xlated_prog_len / buf_element_size;
+    *buf = calloc(*cnt, buf_element_size);
+    if (!buf) {
+        perror("can't allocate xlated program buffer");
+        return -ENOMEM;
+    }
+
+    bzero(&info, sizeof(info));
+    info.xlated_prog_len = xlated_prog_len;
+    info.xlated_prog_insns = (__u64)(unsigned long)*buf;
+    if (bpf_prog_get_info_by_fd(fd_prog, &info, &info_len)) {
+        perror("second bpf_prog_get_info_by_fd failed");
+        goto out_free_buf;
+    }
+
+    return 0;
+
+out_free_buf:
+    free(*buf);
+    *buf = NULL;
+    return -1;
+}
+
+bool is_jit_enabled(void)
+{
+    const char *jit_sysctl = "/proc/sys/net/core/bpf_jit_enable";
+    bool enabled = false;
+    int sysctl_fd;
+
+    sysctl_fd = open(jit_sysctl, O_RDONLY);
+    if (sysctl_fd != -1) {
+        char tmpc;
+
+        if (read(sysctl_fd, &tmpc, sizeof(tmpc)) == 1)
+            enabled = (tmpc != '0');
+        close(sysctl_fd);
+    }
+
+    return enabled;
 }
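
The `testing_prog_flags()` helper added in this file probes each test flag once and caches the accepted set. The following is a minimal standalone sketch of that probe-and-cache pattern, not the selftest code itself: `fake_kernel_accepts()` is a hypothetical stand-in for the `bpf_prog_load()` attempt, and `FLAG_A`/`FLAG_B` are made-up values chosen only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values, for illustration only. */
#define FLAG_A (1 << 0)
#define FLAG_B (1 << 1)

/* Counts how often the pretend kernel is probed. */
static int probe_calls;

/* Hypothetical stand-in for a bpf_prog_load() attempt with one flag set.
 * This sketch pretends the running kernel only knows FLAG_A.
 */
static bool fake_kernel_accepts(int flag)
{
	probe_calls++;
	return flag == FLAG_A;
}

/* Probe every candidate flag once, keep the accepted ones, cache the result. */
static int supported_flags(void)
{
	static int cached = -1;
	static const int candidates[] = { FLAG_A, FLAG_B };
	int flags = 0;
	size_t i;

	if (cached >= 0)
		return cached;

	for (i = 0; i < sizeof(candidates) / sizeof(candidates[0]); i++) {
		if (fake_kernel_accepts(candidates[i]))
			flags |= candidates[i];
	}

	cached = flags;
	return cached;
}
```

Because the result is cached in a function-local static, repeated callers pay for the probe only on the first call; in this sketch a second call to `supported_flags()` does not touch `fake_kernel_accepts()` again.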

+8

tools/testing/selftests/bpf/testing_helpers.h

@@ -46,4 +46,12 @@
     return (u64)t.tv_sec * 1000000000 + t.tv_nsec;
 }
 
+struct bpf_insn;
+/* Request BPF program instructions after all rewrites are applied,
+ * e.g. verifier.c:convert_ctx_access() is done.
+ */
+int get_xlated_program(int fd_prog, struct bpf_insn **buf, __u32 *cnt);
+int testing_prog_flags(void);
+bool is_jit_enabled(void);
+
 #endif /* __TESTING_HELPERS_H */
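
The `is_jit_enabled()` helper declared here decides from the first character of a sysctl file. The sketch below illustrates that check under one assumption: it takes the path as a parameter (the hypothetical `first_char_nonzero()`), so it can be tried against any file rather than only `/proc/sys/net/core/bpf_jit_enable`.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical helper: returns true if the first byte of the file at `path`
 * is anything other than '0'. A missing or unreadable file counts as
 * "disabled", mirroring the default of the helper in the diff.
 */
static bool first_char_nonzero(const char *path)
{
	bool enabled = false;
	char c;
	int fd;

	fd = open(path, O_RDONLY);
	if (fd != -1) {
		if (read(fd, &c, sizeof(c)) == 1)
			enabled = (c != '0');
		close(fd);
	}

	return enabled;
}
```

Treating an unreadable file as "disabled" means a JIT-dependent test is skipped rather than failed when the sysctl cannot be read.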

+6

tools/testing/selftests/bpf/verifier/bpf_loop_inline.c

@@ -57,6 +57,7 @@
     .expected_insns = { PSEUDO_CALL_INSN() },
     .unexpected_insns = { HELPER_CALL_INSN() },
     .prog_type = BPF_PROG_TYPE_TRACEPOINT,
+    .flags = F_NEEDS_JIT_ENABLED,
     .result = ACCEPT,
     .runs = 0,
     .func_info = { { 0, MAIN_TYPE }, { 12, CALLBACK_TYPE } },
@@ -90,6 +91,7 @@
     .expected_insns = { HELPER_CALL_INSN() },
     .unexpected_insns = { PSEUDO_CALL_INSN() },
     .prog_type = BPF_PROG_TYPE_TRACEPOINT,
+    .flags = F_NEEDS_JIT_ENABLED,
     .result = ACCEPT,
     .runs = 0,
     .func_info = { { 0, MAIN_TYPE }, { 16, CALLBACK_TYPE } },
@@ -127,6 +129,7 @@
     .expected_insns = { HELPER_CALL_INSN() },
     .unexpected_insns = { PSEUDO_CALL_INSN() },
     .prog_type = BPF_PROG_TYPE_TRACEPOINT,
+    .flags = F_NEEDS_JIT_ENABLED,
     .result = ACCEPT,
     .runs = 0,
     .func_info = {
@@ -165,6 +168,7 @@
     .expected_insns = { PSEUDO_CALL_INSN() },
     .unexpected_insns = { HELPER_CALL_INSN() },
     .prog_type = BPF_PROG_TYPE_TRACEPOINT,
+    .flags = F_NEEDS_JIT_ENABLED,
     .result = ACCEPT,
     .runs = 0,
     .func_info = {
@@ -235,6 +239,7 @@
     },
     .unexpected_insns = { HELPER_CALL_INSN() },
     .prog_type = BPF_PROG_TYPE_TRACEPOINT,
+    .flags = F_NEEDS_JIT_ENABLED,
     .result = ACCEPT,
     .func_info = {
         { 0, MAIN_TYPE },
@@ -252,6 +257,7 @@
     .unexpected_insns = { HELPER_CALL_INSN() },
     .result = ACCEPT,
     .prog_type = BPF_PROG_TYPE_TRACEPOINT,
+    .flags = F_NEEDS_JIT_ENABLED,
     .func_info = { { 0, MAIN_TYPE }, { 16, CALLBACK_TYPE } },
     .func_info_cnt = 2,
     BTF_TYPES

+3 -3

tools/testing/selftests/bpf/verifier/precise.c

@@ -183,10 +183,10 @@
     .prog_type = BPF_PROG_TYPE_XDP,
     .flags = BPF_F_TEST_STATE_FREQ,
     .errstr = "mark_precise: frame0: last_idx 7 first_idx 7\
-    mark_precise: frame0: parent state regs=r4 stack=:\
+    mark_precise: frame0: parent state regs=r4 stack=-8:\
     mark_precise: frame0: last_idx 6 first_idx 4\
-    mark_precise: frame0: regs=r4 stack= before 6: (b7) r0 = -1\
-    mark_precise: frame0: regs=r4 stack= before 5: (79) r4 = *(u64 *)(r10 -8)\
+    mark_precise: frame0: regs=r4 stack=-8 before 6: (b7) r0 = -1\
+    mark_precise: frame0: regs=r4 stack=-8 before 5: (79) r4 = *(u64 *)(r10 -8)\
     mark_precise: frame0: regs= stack=-8 before 4: (7b) *(u64 *)(r3 -8) = r0\
     mark_precise: frame0: parent state regs=r0 stack=:\
     mark_precise: frame0: last_idx 3 first_idx 3\
