Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

+109

Documentation/bpf/map_cgrp_storage.rst

···
+.. SPDX-License-Identifier: GPL-2.0-only
+.. Copyright (C) 2022 Meta Platforms, Inc. and affiliates.
+
+=========================
+BPF_MAP_TYPE_CGRP_STORAGE
+=========================
+
+The ``BPF_MAP_TYPE_CGRP_STORAGE`` map type represents a local fixed-size
+storage for cgroups. It is only available with ``CONFIG_CGROUPS``.
+The programs are made available by the same Kconfig. The
+data for a particular cgroup can be retrieved by looking up the map
+with that cgroup.
+
+This document describes the usage and semantics of the
+``BPF_MAP_TYPE_CGRP_STORAGE`` map type.
+
+Usage
+=====
+
+The map key must be ``sizeof(int)`` representing a cgroup fd.
+To access the storage in a program, use ``bpf_cgrp_storage_get``::
+
+    void *bpf_cgrp_storage_get(struct bpf_map *map, struct cgroup *cgroup, void *value, u64 flags)
+
+``flags`` could be 0 or ``BPF_LOCAL_STORAGE_GET_F_CREATE`` which indicates that
+a new local storage will be created if one does not exist.
+
+The local storage can be removed with ``bpf_cgrp_storage_delete``::
+
+    long bpf_cgrp_storage_delete(struct bpf_map *map, struct cgroup *cgroup)
+
+The map is available to all program types.
+
+Examples
+========
+
+A BPF program example with BPF_MAP_TYPE_CGRP_STORAGE::
+
+    #include <vmlinux.h>
+    #include <bpf/bpf_helpers.h>
+    #include <bpf/bpf_tracing.h>
+
+    struct {
+        __uint(type, BPF_MAP_TYPE_CGRP_STORAGE);
+        __uint(map_flags, BPF_F_NO_PREALLOC);
+        __type(key, int);
+        __type(value, long);
+    } cgrp_storage SEC(".maps");
+
+    SEC("tp_btf/sys_enter")
+    int BPF_PROG(on_enter, struct pt_regs *regs, long id)
+    {
+        struct task_struct *task = bpf_get_current_task_btf();
+        long *ptr;
+
+        ptr = bpf_cgrp_storage_get(&cgrp_storage, task->cgroups->dfl_cgrp, 0,
+                                   BPF_LOCAL_STORAGE_GET_F_CREATE);
+        if (ptr)
+            __sync_fetch_and_add(ptr, 1);
+
+        return 0;
+    }
+
+Userspace accessing map declared above::
+
+    #include <linux/bpf.h>
+    #include <linux/libbpf.h>
+
+    __u32 map_lookup(struct bpf_map *map, int cgrp_fd)
+    {
+        __u32 *value;
+        value = bpf_map_lookup_elem(bpf_map__fd(map), &cgrp_fd);
+        if (value)
+            return *value;
+        return 0;
+    }
+
+Difference Between BPF_MAP_TYPE_CGRP_STORAGE and BPF_MAP_TYPE_CGROUP_STORAGE
+============================================================================
+
+The old cgroup storage map ``BPF_MAP_TYPE_CGROUP_STORAGE`` has been marked as
+deprecated (renamed to ``BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED``). The new
+``BPF_MAP_TYPE_CGRP_STORAGE`` map should be used instead. The following
+illustrates the main difference between ``BPF_MAP_TYPE_CGRP_STORAGE`` and
+``BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED``.
+
+(1). ``BPF_MAP_TYPE_CGRP_STORAGE`` can be used by all program types while
+     ``BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED`` is available only to cgroup program types
+     like BPF_CGROUP_INET_INGRESS or BPF_CGROUP_SOCK_OPS, etc.
+
+(2). ``BPF_MAP_TYPE_CGRP_STORAGE`` supports local storage for more than one
+     cgroup while ``BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED`` only supports one cgroup
+     which is attached by a BPF program.
+
+(3). ``BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED`` allocates local storage at attach time so
+     ``bpf_get_local_storage()`` always returns non-NULL local storage.
+     ``BPF_MAP_TYPE_CGRP_STORAGE`` allocates local storage at runtime so
+     it is possible that ``bpf_cgrp_storage_get()`` may return null local storage.
+     To avoid such null local storage issue, user space can do
+     ``bpf_map_update_elem()`` to pre-allocate local storage before a BPF program
+     is attached.
+
+(4). ``BPF_MAP_TYPE_CGRP_STORAGE`` supports deleting local storage by a BPF program
+     while ``BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED`` only deletes storage during
+     prog detach time.
+
+So overall, ``BPF_MAP_TYPE_CGRP_STORAGE`` supports all ``BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED``
+functionality and beyond. It is recommended to use ``BPF_MAP_TYPE_CGRP_STORAGE``
+instead of ``BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED``.
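Point (3) can be illustrated with a small user-space model. This is not kernel code: `struct owner`, `storage_get()`, `storage_update()` and `GET_F_CREATE` are hypothetical stand-ins for a cgroup, `bpf_cgrp_storage_get()`, a user-space `bpf_map_update_elem()` and `BPF_LOCAL_STORAGE_GET_F_CREATE`. Locking, RCU and multiple maps are left out. The model shows why a lookup without the create flag can return NULL, and how pre-allocating from user space makes later lookups succeed.

```c
#include <stdlib.h>

/* Hypothetical stand-in for BPF_LOCAL_STORAGE_GET_F_CREATE. */
#define GET_F_CREATE 1u

/* Hypothetical owner object (plays the role of a cgroup). */
struct owner {
    long *storage;          /* NULL until something allocates it */
};

/* Models the helper: look up, and optionally create, zero-filled or from init. */
static long *storage_get(struct owner *o, const long *init, unsigned int flags)
{
    if (o->storage)
        return o->storage;
    if (!(flags & GET_F_CREATE))
        return NULL;        /* runtime allocation was not requested */
    o->storage = calloc(1, sizeof(*o->storage));
    if (o->storage && init)
        *o->storage = *init;
    return o->storage;      /* may still be NULL if allocation failed */
}

/* Models a user-space update that pre-allocates before any program runs. */
static int storage_update(struct owner *o, long value)
{
    if (!o->storage) {
        o->storage = malloc(sizeof(*o->storage));
        if (!o->storage)
            return -1;
    }
    *o->storage = value;
    return 0;
}
```

Even with the create flag the model can return NULL when allocation fails, which is why the BPF example checks `ptr` before using it.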

+69 -40

Documentation/bpf/maps.rst

···

-=========
-eBPF maps
-=========
+========
+BPF maps
+========

-'maps' is a generic storage of different types for sharing data between kernel
-and userspace.
+BPF 'maps' provide generic storage of different types for sharing data between
+kernel and user space. There are several storage types available, including
+hash, array, bloom filter and radix-tree. Several of the map types exist to
+support specific BPF helpers that perform actions based on the map contents. The
+maps are accessed from BPF programs via BPF helpers which are documented in the
+`man-pages`_ for `bpf-helpers(7)`_.

-The maps are accessed from user space via BPF syscall, which has commands:
-
-- create a map with given type and attributes
-  ``map_fd = bpf(BPF_MAP_CREATE, union bpf_attr *attr, u32 size)``
-  using attr->map_type, attr->key_size, attr->value_size, attr->max_entries
-  returns process-local file descriptor or negative error
-
-- lookup key in a given map
-  ``err = bpf(BPF_MAP_LOOKUP_ELEM, union bpf_attr *attr, u32 size)``
-  using attr->map_fd, attr->key, attr->value
-  returns zero and stores found elem into value or negative error
-
-- create or update key/value pair in a given map
-  ``err = bpf(BPF_MAP_UPDATE_ELEM, union bpf_attr *attr, u32 size)``
-  using attr->map_fd, attr->key, attr->value
-  returns zero or negative error
-
-- find and delete element by key in a given map
-  ``err = bpf(BPF_MAP_DELETE_ELEM, union bpf_attr *attr, u32 size)``
-  using attr->map_fd, attr->key
-
-- to delete map: close(fd)
-  Exiting process will delete maps automatically
-
-userspace programs use this syscall to create/access maps that eBPF programs
-are concurrently updating.
-
-maps can have different types: hash, array, bloom filter, radix-tree, etc.
-
-The map is defined by:
-
-  - type
-  - max number of elements
-  - key size in bytes
-  - value size in bytes
+BPF maps are accessed from user space via the ``bpf`` syscall, which provides
+commands to create maps, lookup elements, update elements and delete
+elements. More details of the BPF syscall are available in
+:doc:`/userspace-api/ebpf/syscall` and in the `man-pages`_ for `bpf(2)`_.

 Map Types
 =========
···
    :glob:

    map_*
+
+Usage Notes
+===========
+
+.. c:function::
+   int bpf(int command, union bpf_attr *attr, u32 size)
+
+Use the ``bpf()`` system call to perform the operation specified by
+``command``. The operation takes parameters provided in ``attr``. The ``size``
+argument is the size of the ``union bpf_attr`` in ``attr``.
+
+**BPF_MAP_CREATE**
+
+Create a map with the desired type and attributes in ``attr``:
+
+.. code-block:: c
+
+    int fd;
+    union bpf_attr attr = {
+            .map_type = BPF_MAP_TYPE_ARRAY;  /* mandatory */
+            .key_size = sizeof(__u32);       /* mandatory */
+            .value_size = sizeof(__u32);     /* mandatory */
+            .max_entries = 256;              /* mandatory */
+            .map_flags = BPF_F_MMAPABLE;
+            .map_name = "example_array";
+    };
+
+    fd = bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
+
+Returns a process-local file descriptor on success, or negative error in case of
+failure. The map can be deleted by calling ``close(fd)``. Maps held by open
+file descriptors will be deleted automatically when a process exits.
+
+.. note:: Valid characters for ``map_name`` are ``A-Z``, ``a-z``, ``0-9``,
+   ``'_'`` and ``'.'``.
+
+**BPF_MAP_LOOKUP_ELEM**
+
+Lookup key in a given map using ``attr->map_fd``, ``attr->key``,
+``attr->value``. Returns zero and stores found elem into ``attr->value`` on
+success, or negative error on failure.
+
+**BPF_MAP_UPDATE_ELEM**
+
+Create or update key/value pair in a given map using ``attr->map_fd``, ``attr->key``,
+``attr->value``. Returns zero on success or negative error on failure.
+
+**BPF_MAP_DELETE_ELEM**
+
+Find and delete element by key in a given map using ``attr->map_fd``,
+``attr->key``. Returns zero on success or negative error on failure.
+
+.. Links:
+.. _man-pages: https://www.kernel.org/doc/man-pages/
+.. _bpf(2): https://man7.org/linux/man-pages/man2/bpf.2.html
+.. _bpf-helpers(7): https://man7.org/linux/man-pages/man7/bpf-helpers.7.html
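Two details of the `BPF_MAP_CREATE` example can be checked in isolation. First, C designated initializers are separated by commas, not by the semicolons used in the hunk. Second, `map_name` is restricted to the characters listed in the note. The sketch below uses a hypothetical `struct map_attr` that mirrors a few `union bpf_attr` fields so it compiles without kernel headers. It is not the uapi type, and `map_name_valid()` is a hypothetical helper, not a kernel or libbpf function. The 16-byte name buffer is an assumption mirroring `BPF_OBJ_NAME_LEN`.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

#define NAME_LEN 16     /* assumption: mirrors BPF_OBJ_NAME_LEN */

/* Hypothetical subset of the BPF_MAP_CREATE fields of union bpf_attr. */
struct map_attr {
    unsigned int map_type;
    unsigned int key_size;
    unsigned int value_size;
    unsigned int max_entries;
    unsigned int map_flags;
    char map_name[NAME_LEN];
};

/* Accept only A-Z, a-z, 0-9, '_' and '.', NUL-terminated within NAME_LEN. */
static bool map_name_valid(const char *name)
{
    for (size_t i = 0; i < NAME_LEN; i++) {
        unsigned char c = (unsigned char)name[i];

        if (c == '\0')
            return true;            /* terminated inside the buffer */
        if (!isalnum(c) && c != '_' && c != '.')
            return false;
    }
    return false;                   /* no NUL within NAME_LEN bytes */
}

/* Designated initializers: comma-separated; unlisted fields are zeroed. */
static const struct map_attr example_attr = {
    .map_type    = 2,               /* placeholder for BPF_MAP_TYPE_ARRAY */
    .key_size    = sizeof(unsigned int),
    .value_size  = sizeof(unsigned int),
    .max_entries = 256,
    .map_name    = "example_array",
};
```

Omitting `.map_flags` here leaves it zero, which is how the example would behave without `BPF_F_MMAPABLE`.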

+2 -7

arch/arm64/net/bpf_jit_comp.c

···
     struct bpf_prog *p = l->link.prog;
     int cookie_off = offsetof(struct bpf_tramp_run_ctx, bpf_cookie);

-    if (p->aux->sleepable) {
-        enter_prog = (u64)__bpf_prog_enter_sleepable;
-        exit_prog = (u64)__bpf_prog_exit_sleepable;
-    } else {
-        enter_prog = (u64)__bpf_prog_enter;
-        exit_prog = (u64)__bpf_prog_exit;
-    }
+    enter_prog = (u64)bpf_trampoline_enter(p);
+    exit_prog = (u64)bpf_trampoline_exit(p);

     if (l->cookie == 0) {
         /* if cookie is zero, one instruction is enough to store it */

+96 -29

arch/x86/net/bpf_jit_comp.c

···
     *pprog = prog;
 }

+/* emit the 3-byte VEX prefix
+ *
+ * r: same as rex.r, extra bit for ModRM reg field
+ * x: same as rex.x, extra bit for SIB index field
+ * b: same as rex.b, extra bit for ModRM r/m, or SIB base
+ * m: opcode map select, encoding escape bytes e.g. 0x0f38
+ * w: same as rex.w (32 bit or 64 bit) or opcode specific
+ * src_reg2: additional source reg (encoded as BPF reg)
+ * l: vector length (128 bit or 256 bit) or reserved
+ * pp: opcode prefix (none, 0x66, 0xf2 or 0xf3)
+ */
+static void emit_3vex(u8 **pprog, bool r, bool x, bool b, u8 m,
+                      bool w, u8 src_reg2, bool l, u8 pp)
+{
+    u8 *prog = *pprog;
+    const u8 b0 = 0xc4; /* first byte of 3-byte VEX prefix */
+    u8 b1, b2;
+    u8 vvvv = reg2hex[src_reg2];
+
+    /* reg2hex gives only the lower 3 bit of vvvv */
+    if (is_ereg(src_reg2))
+        vvvv |= 1 << 3;
+
+    /*
+     * 2nd byte of 3-byte VEX prefix
+     * ~ means bit inverted encoding
+     *
+     *    7                           0
+     *  +---+---+---+---+---+---+---+---+
+     *  |~R |~X |~B |         m         |
+     *  +---+---+---+---+---+---+---+---+
+     */
+    b1 = (!r << 7) | (!x << 6) | (!b << 5) | (m & 0x1f);
+    /*
+     * 3rd byte of 3-byte VEX prefix
+     *
+     *    7                           0
+     *  +---+---+---+---+---+---+---+---+
+     *  | W |     ~vvvv     | L |   pp  |
+     *  +---+---+---+---+---+---+---+---+
+     */
+    b2 = (w << 7) | ((~vvvv & 0xf) << 3) | (l << 2) | (pp & 3);
+
+    EMIT3(b0, b1, b2);
+    *pprog = prog;
+}
+
+/* emit BMI2 shift instruction */
+static void emit_shiftx(u8 **pprog, u32 dst_reg, u8 src_reg, bool is64, u8 op)
+{
+    u8 *prog = *pprog;
+    bool r = is_ereg(dst_reg);
+    u8 m = 2; /* escape code 0f38 */
+
+    emit_3vex(&prog, r, false, r, m, is64, src_reg, false, op);
+    EMIT2(0xf7, add_2reg(0xC0, dst_reg, dst_reg));
+    *pprog = prog;
+}
+
 #define INSN_SZ_DIFF (((addrs[i] - addrs[i - 1]) - (prog - temp)))

 static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image,
···
         case BPF_ALU64 | BPF_LSH | BPF_X:
         case BPF_ALU64 | BPF_RSH | BPF_X:
         case BPF_ALU64 | BPF_ARSH | BPF_X:
+            /* BMI2 shifts aren't better when shift count is already in rcx */
+            if (boot_cpu_has(X86_FEATURE_BMI2) && src_reg != BPF_REG_4) {
+                /* shrx/sarx/shlx dst_reg, dst_reg, src_reg */
+                bool w = (BPF_CLASS(insn->code) == BPF_ALU64);
+                u8 op;

-            /* Check for bad case when dst_reg == rcx */
-            if (dst_reg == BPF_REG_4) {
-                /* mov r11, dst_reg */
-                EMIT_mov(AUX_REG, dst_reg);
-                dst_reg = AUX_REG;
+                switch (BPF_OP(insn->code)) {
+                case BPF_LSH:
+                    op = 1; /* prefix 0x66 */
+                    break;
+                case BPF_RSH:
+                    op = 3; /* prefix 0xf2 */
+                    break;
+                case BPF_ARSH:
+                    op = 2; /* prefix 0xf3 */
+                    break;
+                }
+
+                emit_shiftx(&prog, dst_reg, src_reg, w, op);
+
+                break;
             }

             if (src_reg != BPF_REG_4) { /* common case */
-                EMIT1(0x51); /* push rcx */
-
+                /* Check for bad case when dst_reg == rcx */
+                if (dst_reg == BPF_REG_4) {
+                    /* mov r11, dst_reg */
+                    EMIT_mov(AUX_REG, dst_reg);
+                    dst_reg = AUX_REG;
+                } else {
+                    EMIT1(0x51); /* push rcx */
+                }
                 /* mov rcx, src_reg */
                 EMIT_mov(BPF_REG_4, src_reg);
             }
···
             b3 = simple_alu_opcodes[BPF_OP(insn->code)];
             EMIT2(0xD3, add_1reg(b3, dst_reg));

-            if (src_reg != BPF_REG_4)
-                EMIT1(0x59); /* pop rcx */
+            if (src_reg != BPF_REG_4) {
+                if (insn->dst_reg == BPF_REG_4)
+                    /* mov dst_reg, r11 */
+                    EMIT_mov(insn->dst_reg, AUX_REG);
+                else
+                    EMIT1(0x59); /* pop rcx */
+            }

-            if (insn->dst_reg == BPF_REG_4)
-                /* mov dst_reg, r11 */
-                EMIT_mov(insn->dst_reg, AUX_REG);
             break;

         case BPF_ALU | BPF_END | BPF_FROM_BE:
···
                            struct bpf_tramp_link *l, int stack_size,
                            int run_ctx_off, bool save_ret)
 {
-    void (*exit)(struct bpf_prog *prog, u64 start,
-                 struct bpf_tramp_run_ctx *run_ctx) = __bpf_prog_exit;
-    u64 (*enter)(struct bpf_prog *prog,
-                 struct bpf_tramp_run_ctx *run_ctx) = __bpf_prog_enter;
     u8 *prog = *pprog;
     u8 *jmp_insn;
     int ctx_cookie_off = offsetof(struct bpf_tramp_run_ctx, bpf_cookie);
···
      */
     emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_1, -run_ctx_off + ctx_cookie_off);

-    if (p->aux->sleepable) {
-        enter = __bpf_prog_enter_sleepable;
-        exit = __bpf_prog_exit_sleepable;
-    } else if (p->type == BPF_PROG_TYPE_STRUCT_OPS) {
-        enter = __bpf_prog_enter_struct_ops;
-        exit = __bpf_prog_exit_struct_ops;
-    } else if (p->expected_attach_type == BPF_LSM_CGROUP) {
-        enter = __bpf_prog_enter_lsm_cgroup;
-        exit = __bpf_prog_exit_lsm_cgroup;
-    }
-
     /* arg1: mov rdi, progs[i] */
     emit_mov_imm64(&prog, BPF_REG_1, (long) p >> 32, (u32) (long) p);
     /* arg2: lea rsi, [rbp - ctx_cookie_off] */
     EMIT4(0x48, 0x8D, 0x75, -run_ctx_off);

-    if (emit_call(&prog, enter, prog))
+    if (emit_call(&prog, bpf_trampoline_enter(p), prog))
         return -EINVAL;
     /* remember prog start time returned by __bpf_prog_enter */
     emit_mov_reg(&prog, true, BPF_REG_6, BPF_REG_0);
···
     emit_mov_reg(&prog, true, BPF_REG_2, BPF_REG_6);
     /* arg3: lea rdx, [rbp - run_ctx_off] */
     EMIT4(0x48, 0x8D, 0x55, -run_ctx_off);
-    if (emit_call(&prog, exit, prog))
+    if (emit_call(&prog, bpf_trampoline_exit(p), prog))
         return -EINVAL;

     *pprog = prog;

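The byte layout drawn in the `emit_3vex()` comments can be checked with a stand-alone encoder. This is a sketch rather than the JIT code. `vex3_shift()` and `shift_pp()` are hypothetical helpers that take x86-64 hardware register numbers directly (rax=0, rcx=1, rdx=2, rbx=3, r8 to r15 = 8 to 15). The JIT, by contrast, maps BPF registers through `reg2hex[]` and `is_ereg()` and always uses `dst_reg` as both the ModRM reg and r/m operand. The pp values follow the JIT's switch: 1 (0x66) for shlx, 3 (0xf2) for shrx and 2 (0xf3) for sarx.

```c
#include <stdbool.h>
#include <stdint.h>

enum shift_op { OP_LSH, OP_RSH, OP_ARSH };

/* pp field as chosen in the JIT: 1 = 0x66, 3 = 0xf2, 2 = 0xf3. */
static uint8_t shift_pp(enum shift_op op)
{
    switch (op) {
    case OP_LSH:  return 1;     /* shlx */
    case OP_RSH:  return 3;     /* shrx */
    case OP_ARSH: return 2;     /* sarx */
    }
    return 0;
}

/* Hypothetical helper: VEX3 prefix + opcode + ModRM for
 * "shift dst, src, count" with all three operands in registers. */
static void vex3_shift(uint8_t out[5], unsigned dst, unsigned src,
                       unsigned count, bool w, uint8_t pp)
{
    bool r = dst & 8;           /* high bit of ModRM.reg */
    bool b = src & 8;           /* high bit of ModRM.r/m */
    uint8_t m = 2;              /* opcode map 0F38 */
    uint8_t vvvv = count & 0xf;

    out[0] = 0xc4;              /* 3-byte VEX escape */
    /* ~X is always 1: register operands use no SIB index */
    out[1] = (uint8_t)((!r << 7) | (1 << 6) | (!b << 5) | (m & 0x1f));
    /* L = 0; count register is stored inverted */
    out[2] = (uint8_t)((w << 7) | ((~vvvv & 0xf) << 3) | (0 << 2) | (pp & 3));
    out[3] = 0xf7;              /* SHLX/SHRX/SARX opcode */
    out[4] = (uint8_t)(0xc0 | ((dst & 7) << 3) | (src & 7));  /* mod = 11 */
}
```

For `shlx rax, rbx, rcx` the sketch produces `c4 e2 f1 f7 c3`. ~R, ~X and ~B are all set because no extended register is used, map 0F38 gives `m = 2`, and the count register rcx (0001) appears inverted as 1110 in bits 6..3 of the third byte.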
+19 -14

include/linux/bpf.h

···
                                 const struct btf_func_model *m, u32 flags,
                                 struct bpf_tramp_links *tlinks,
                                 void *orig_call);
-/* these two functions are called from generated trampoline */
-u64 notrace __bpf_prog_enter(struct bpf_prog *prog, struct bpf_tramp_run_ctx *run_ctx);
-void notrace __bpf_prog_exit(struct bpf_prog *prog, u64 start, struct bpf_tramp_run_ctx *run_ctx);
-u64 notrace __bpf_prog_enter_sleepable(struct bpf_prog *prog, struct bpf_tramp_run_ctx *run_ctx);
-void notrace __bpf_prog_exit_sleepable(struct bpf_prog *prog, u64 start,
-                                       struct bpf_tramp_run_ctx *run_ctx);
-u64 notrace __bpf_prog_enter_lsm_cgroup(struct bpf_prog *prog,
-                                        struct bpf_tramp_run_ctx *run_ctx);
-void notrace __bpf_prog_exit_lsm_cgroup(struct bpf_prog *prog, u64 start,
-                                        struct bpf_tramp_run_ctx *run_ctx);
-u64 notrace __bpf_prog_enter_struct_ops(struct bpf_prog *prog,
-                                        struct bpf_tramp_run_ctx *run_ctx);
-void notrace __bpf_prog_exit_struct_ops(struct bpf_prog *prog, u64 start,
-                                        struct bpf_tramp_run_ctx *run_ctx);
+u64 notrace __bpf_prog_enter_sleepable_recur(struct bpf_prog *prog,
+                                             struct bpf_tramp_run_ctx *run_ctx);
+void notrace __bpf_prog_exit_sleepable_recur(struct bpf_prog *prog, u64 start,
+                                             struct bpf_tramp_run_ctx *run_ctx);
 void notrace __bpf_tramp_enter(struct bpf_tramp_image *tr);
 void notrace __bpf_tramp_exit(struct bpf_tramp_image *tr);
+typedef u64 (*bpf_trampoline_enter_t)(struct bpf_prog *prog,
+                                      struct bpf_tramp_run_ctx *run_ctx);
+typedef void (*bpf_trampoline_exit_t)(struct bpf_prog *prog, u64 start,
+                                      struct bpf_tramp_run_ctx *run_ctx);
+bpf_trampoline_enter_t bpf_trampoline_enter(const struct bpf_prog *prog);
+bpf_trampoline_exit_t bpf_trampoline_exit(const struct bpf_prog *prog);

 struct bpf_ksym {
     unsigned long start;
···

 const struct bpf_func_proto *bpf_base_func_proto(enum bpf_func_id func_id);
 void bpf_task_storage_free(struct task_struct *task);
+void bpf_cgrp_storage_free(struct cgroup *cgroup);
 bool bpf_prog_has_kfunc_call(const struct bpf_prog *prog);
 const struct btf_func_model *
 bpf_jit_find_kfunc_model(const struct bpf_prog *prog,
···
 static inline void bpf_prog_inc_misses_counter(struct bpf_prog *prog)
 {
 }
+
+static inline void bpf_cgrp_storage_free(struct cgroup *cgroup)
+{
+}
 #endif /* CONFIG_BPF_SYSCALL */

 void __bpf_free_used_btfs(struct bpf_prog_aux *aux,
···
 extern const struct bpf_func_proto bpf_ktime_get_coarse_ns_proto;
 extern const struct bpf_func_proto bpf_sock_from_file_proto;
 extern const struct bpf_func_proto bpf_get_socket_ptr_cookie_proto;
+extern const struct bpf_func_proto bpf_task_storage_get_recur_proto;
 extern const struct bpf_func_proto bpf_task_storage_get_proto;
+extern const struct bpf_func_proto bpf_task_storage_delete_recur_proto;
 extern const struct bpf_func_proto bpf_task_storage_delete_proto;
 extern const struct bpf_func_proto bpf_for_each_map_elem_proto;
 extern const struct bpf_func_proto bpf_btf_find_by_name_kind_proto;
···
 extern const struct bpf_func_proto bpf_set_retval_proto;
 extern const struct bpf_func_proto bpf_get_retval_proto;
 extern const struct bpf_func_proto bpf_user_ringbuf_drain_proto;
+extern const struct bpf_func_proto bpf_cgrp_storage_get_proto;
+extern const struct bpf_func_proto bpf_cgrp_storage_delete_proto;

 const struct bpf_func_proto *tracing_prog_func_proto(
   enum bpf_func_id func_id, const struct bpf_prog *prog);
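The header change replaces the per-architecture `if`/`else` chains (visible in the arm64 and x86 hunks) with two getters that return typedef'd function pointers. The user-space sketch below shows that pattern. Everything in it is hypothetical. `struct prog`, the `enter_*` functions and `select_enter()` are simplified stand-ins. The selection rules are the branches removed from the x86 JIT, checked in that order. The real `bpf_trampoline_enter()` is defined outside this excerpt and may apply different rules, for example choosing the `_recur` variants declared above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified program descriptor. */
struct prog {
    bool sleepable;
    bool struct_ops;
    bool lsm_cgroup;
};

typedef uint64_t (*enter_fn)(struct prog *p);

/* Distinct return values only so the test can tell them apart. */
static uint64_t enter_default(struct prog *p)    { (void)p; return 1; }
static uint64_t enter_sleepable(struct prog *p)  { (void)p; return 2; }
static uint64_t enter_struct_ops(struct prog *p) { (void)p; return 3; }
static uint64_t enter_lsm_cgroup(struct prog *p) { (void)p; return 4; }

/* One place decides; every JIT just emits a call to the returned pointer. */
static enter_fn select_enter(const struct prog *p)
{
    if (p->sleepable)
        return enter_sleepable;
    if (p->struct_ops)
        return enter_struct_ops;
    if (p->lsm_cgroup)
        return enter_lsm_cgroup;
    return enter_default;
}
```

An exit-side getter would mirror this one. Keeping both in common code means a new entry variant no longer needs changes in every architecture's JIT.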

+7 -10

include/linux/bpf_local_storage.h

···
     .idx_lock = __SPIN_LOCK_UNLOCKED(name.idx_lock),    \
 }

-u16 bpf_local_storage_cache_idx_get(struct bpf_local_storage_cache *cache);
-void bpf_local_storage_cache_idx_free(struct bpf_local_storage_cache *cache,
-                                      u16 idx);
-
 /* Helper functions for bpf_local_storage */
 int bpf_local_storage_map_alloc_check(union bpf_attr *attr);

-struct bpf_local_storage_map *bpf_local_storage_map_alloc(union bpf_attr *attr);
+struct bpf_map *
+bpf_local_storage_map_alloc(union bpf_attr *attr,
+                            struct bpf_local_storage_cache *cache);

 struct bpf_local_storage_data *
 bpf_local_storage_lookup(struct bpf_local_storage *local_storage,
                          struct bpf_local_storage_map *smap,
                          bool cacheit_lockit);

-void bpf_local_storage_map_free(struct bpf_local_storage_map *smap,
+bool bpf_local_storage_unlink_nolock(struct bpf_local_storage *local_storage);
+
+void bpf_local_storage_map_free(struct bpf_map *map,
+                                struct bpf_local_storage_cache *cache,
                                 int __percpu *busy_counter);

 int bpf_local_storage_map_check_btf(const struct bpf_map *map,
···

 void bpf_selem_link_storage_nolock(struct bpf_local_storage *local_storage,
                                    struct bpf_local_storage_elem *selem);
-
-bool bpf_selem_unlink_storage_nolock(struct bpf_local_storage *local_storage,
-                                     struct bpf_local_storage_elem *selem,
-                                     bool uncharge_omem, bool use_trace_rcu);

 void bpf_selem_unlink(struct bpf_local_storage_elem *selem, bool use_trace_rcu);

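The new `bpf_local_storage_unlink_nolock()` folds the per-element loop that `bpf_inode_storage_free()` used to open-code into one call. Its `bool` result tells the caller whether the storage itself can now be freed. The sketch below models that contract over a plain singly linked list. `struct elem`, `struct storage` and `unlink_all()` are hypothetical. There is no locking, RCU or map-side unlink, all of which the kernel version has to handle.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical element and owner-side storage. */
struct elem {
    struct elem *next;
    int map_id;
};

struct storage {
    struct elem *head;
};

/* Unlink and free every element. Returns true when the list was
 * non-empty, i.e. the caller is left holding an empty storage to free. */
static bool unlink_all(struct storage *s)
{
    bool free_storage = false;

    while (s->head) {
        struct elem *e = s->head;

        s->head = e->next;
        free(e);
        free_storage = true;    /* the last unlink emptied the list */
    }
    return free_storage;
}
```

Callers then free the storage only when the function returns true. `bpf_cgrp_storage_free()` does this with `kfree_rcu()`.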

+1

include/linux/bpf_types.h

···
 BPF_MAP_TYPE(BPF_MAP_TYPE_PERF_EVENT_ARRAY, perf_event_array_map_ops)
 #ifdef CONFIG_CGROUPS
 BPF_MAP_TYPE(BPF_MAP_TYPE_CGROUP_ARRAY, cgroup_array_map_ops)
+BPF_MAP_TYPE(BPF_MAP_TYPE_CGRP_STORAGE, cgrp_storage_map_ops)
 #endif
 #ifdef CONFIG_CGROUP_BPF
 BPF_MAP_TYPE(BPF_MAP_TYPE_CGROUP_STORAGE, cgroup_storage_map_ops)
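`bpf_types.h` is an X-macro table. Each `BPF_MAP_TYPE(type, ops)` line expands differently depending on how the including file defines `BPF_MAP_TYPE`, and the `#ifdef CONFIG_CGROUPS` guard drops the cgroup entries when that option is off. The kernel header uses the include-file form. This sketch uses a list macro instead so it fits in one file. `MAP_TYPES`, `CGROUP_MAP_TYPES`, `HAVE_CGROUPS` and `map_type_index()` are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

#define HAVE_CGROUPS 1  /* hypothetical stand-in for CONFIG_CGROUPS */

#if HAVE_CGROUPS
#define CGROUP_MAP_TYPES(X)                 \
    X(CGROUP_ARRAY, cgroup_array_map_ops)   \
    X(CGRP_STORAGE, cgrp_storage_map_ops)
#else
#define CGROUP_MAP_TYPES(X)
#endif

#define MAP_TYPES(X)            \
    X(HASH, htab_map_ops)       \
    X(ARRAY, array_map_ops)     \
    CGROUP_MAP_TYPES(X)

/* Expansion 1: type names, in list order. */
#define AS_NAME(type, ops) #type,
static const char *const map_type_names[] = { MAP_TYPES(AS_NAME) };

/* Expansion 2: the matching ops symbol names. */
#define AS_OPS(type, ops) #ops,
static const char *const map_ops_names[] = { MAP_TYPES(AS_OPS) };

#define NUM_MAP_TYPES (sizeof(map_type_names) / sizeof(map_type_names[0]))

/* Position of a type in the table, or -1 if it was compiled out. */
static int map_type_index(const char *name)
{
    for (size_t i = 0; i < NUM_MAP_TYPES; i++)
        if (strcmp(map_type_names[i], name) == 0)
            return (int)i;
    return -1;
}
```

Setting `HAVE_CGROUPS` to 0 removes both cgroup entries from every expansion at once. That is why the hunk only needs to add one line inside the existing guard.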

+14 -1

include/linux/bpf_verifier.h

···
 }

 /* only use after check_attach_btf_id() */
-static inline enum bpf_prog_type resolve_prog_type(struct bpf_prog *prog)
+static inline enum bpf_prog_type resolve_prog_type(const struct bpf_prog *prog)
 {
     return prog->type == BPF_PROG_TYPE_EXT ?
         prog->aux->dst_prog->type : prog->type;
+}
+
+static inline bool bpf_prog_check_recur(const struct bpf_prog *prog)
+{
+    switch (resolve_prog_type(prog)) {
+    case BPF_PROG_TYPE_TRACING:
+        return prog->expected_attach_type != BPF_TRACE_ITER;
+    case BPF_PROG_TYPE_STRUCT_OPS:
+    case BPF_PROG_TYPE_LSM:
+        return false;
+    default:
+        return true;
+    }
 }

 #endif /* _LINUX_BPF_VERIFIER_H */
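`bpf_prog_check_recur()` decides which programs need recursion protection on entry. One common way to implement such a guard is a per-program "active" counter: a nested entry bails out and records a miss. The sketch below assumes that approach. `struct prog`, `prog_enter_recur()` and `prog_exit_recur()` are hypothetical, single-threaded simplifications. A real implementation would need per-CPU state.

```c
#include <stdbool.h>

/* Hypothetical program state for the sketch. */
struct prog {
    int active;             /* current nesting depth of this program */
    unsigned long misses;   /* entries skipped because of recursion */
    bool needs_recur_check; /* what bpf_prog_check_recur() would return */
};

/* Returns true if the program body may run. */
static bool prog_enter_recur(struct prog *p)
{
    if (!p->needs_recur_check)
        return true;
    if (++p->active != 1) {
        p->misses++;        /* nested entry: skip the body */
        return false;
    }
    return true;
}

/* Called on every exit path, including entries that were skipped. */
static void prog_exit_recur(struct prog *p)
{
    if (p->needs_recur_check)
        p->active--;
}
```

Pairing every enter with an exit, even for skipped runs, keeps the counter balanced without the caller tracking which path was taken.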

+1

include/linux/btf_ids.h

···
 };

 extern u32 btf_tracing_ids[];
+extern u32 bpf_cgroup_btf_id[];

 #endif

+4

include/linux/cgroup-defs.h

···
     /* Used to store internal freezer state */
     struct cgroup_freezer_state freezer;

+#ifdef CONFIG_BPF_SYSCALL
+    struct bpf_local_storage __rcu *bpf_cgrp_storage;
+#endif
+
     /* All ancestors including self */
     struct cgroup *ancestors[];
 };
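The new `bpf_cgrp_storage` field pairs with the `map_owner_storage_ptr` callback (`cgroup_storage_ptr()` in `bpf_cgrp_storage.c`). Generic local-storage code never names `struct cgroup`. Instead, it asks the map type for the address of the owner's storage pointer. The sketch below models that indirection with hypothetical `struct cgroup_like`, `cgroup_like_storage_ptr()` and `attach_storage()`. RCU annotations are left out, and a concurrent implementation would need an atomic compare-and-exchange where the sketch does a plain check-then-store.

```c
#include <stddef.h>

struct local_storage {
    int nr_elems;
};

/* Hypothetical owner carrying a storage pointer, like struct cgroup. */
struct cgroup_like {
    int id;
    struct local_storage *bpf_storage;
};

/* Per-map-type callback: map an opaque owner to its storage slot. */
typedef struct local_storage **(*owner_storage_ptr_fn)(void *owner);

static struct local_storage **cgroup_like_storage_ptr(void *owner)
{
    struct cgroup_like *cg = owner;

    return &cg->bpf_storage;
}

/* Generic code: install new storage only if the owner has none yet. */
static struct local_storage *attach_storage(void *owner,
                                            owner_storage_ptr_fn get_ptr,
                                            struct local_storage *new_storage)
{
    struct local_storage **slot = get_ptr(owner);

    if (!*slot)
        *slot = new_storage;
    return *slot;
}
```

Supporting a new owner type then only needs a storage pointer in the owner struct and one small callback.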

+9

include/linux/module.h

···
 }
 #endif /* CONFIG_MODULE_SIG */

+#if defined(CONFIG_MODULES) && defined(CONFIG_KALLSYMS)
 int module_kallsyms_on_each_symbol(int (*fn)(void *, const char *,
                                              struct module *, unsigned long),
                                    void *data);
+#else
+static inline int module_kallsyms_on_each_symbol(int (*fn)(void *, const char *,
+                                                 struct module *, unsigned long),
+                                                 void *data)
+{
+    return -EOPNOTSUPP;
+}
+#endif /* CONFIG_MODULES && CONFIG_KALLSYMS */

 #endif /* _LINUX_MODULE_H */
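The `module.h` hunk uses the usual kernel idiom for optional features. When the option is off, a `static inline` stub with the same signature returns `-EOPNOTSUPP`, so callers compile unchanged and handle the error at run time. The sketch below shows the idiom in user space with a hypothetical `HAVE_SYMBOL_TABLE` switch and hypothetical `for_each_symbol()` and `try_walk()` functions.

```c
#include <errno.h>

/* Hypothetical switch standing in for CONFIG_MODULES && CONFIG_KALLSYMS. */
#define HAVE_SYMBOL_TABLE 0

typedef int (*symbol_cb)(void *data, const char *name, unsigned long addr);

#if HAVE_SYMBOL_TABLE
int for_each_symbol(symbol_cb fn, void *data);  /* real version elsewhere */
#else
static inline int for_each_symbol(symbol_cb fn, void *data)
{
    (void)fn;
    (void)data;
    return -EOPNOTSUPP;     /* feature compiled out */
}
#endif

/* Caller written once; valid in both configurations. */
static int try_walk(symbol_cb fn, void *data)
{
    int err = for_each_symbol(fn, data);

    if (err == -EOPNOTSUPP)
        return 0;           /* treat "not available" as "nothing found" */
    return err;
}
```

Without the stub, a configuration lacking `CONFIG_MODULES` or `CONFIG_KALLSYMS` would hit an undefined reference at link time. With it, the missing feature turns into an ordinary error code.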

+49 -1

include/uapi/linux/bpf.h

···
     BPF_MAP_TYPE_CPUMAP,
     BPF_MAP_TYPE_XSKMAP,
     BPF_MAP_TYPE_SOCKHASH,
-    BPF_MAP_TYPE_CGROUP_STORAGE,
+    BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED,
+    /* BPF_MAP_TYPE_CGROUP_STORAGE is available to bpf programs attaching
+     * to a cgroup. The newer BPF_MAP_TYPE_CGRP_STORAGE is available to
+     * both cgroup-attached and other progs and supports all functionality
+     * provided by BPF_MAP_TYPE_CGROUP_STORAGE. So mark
+     * BPF_MAP_TYPE_CGROUP_STORAGE deprecated.
+     */
+    BPF_MAP_TYPE_CGROUP_STORAGE = BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED,
     BPF_MAP_TYPE_REUSEPORT_SOCKARRAY,
     BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE,
     BPF_MAP_TYPE_QUEUE,
···
     BPF_MAP_TYPE_TASK_STORAGE,
     BPF_MAP_TYPE_BLOOM_FILTER,
     BPF_MAP_TYPE_USER_RINGBUF,
+    BPF_MAP_TYPE_CGRP_STORAGE,
 };

 /* Note that tracing related programs such as
···
  *         **-E2BIG** if user-space has tried to publish a sample which is
  *         larger than the size of the ring buffer, or which cannot fit
  *         within a struct bpf_dynptr.
+ *
+ * void *bpf_cgrp_storage_get(struct bpf_map *map, struct cgroup *cgroup, void *value, u64 flags)
+ *     Description
+ *         Get a bpf_local_storage from the *cgroup*.
+ *
+ *         Logically, it could be thought of as getting the value from
+ *         a *map* with *cgroup* as the **key**. From this
+ *         perspective, the usage is not much different from
+ *         **bpf_map_lookup_elem**\ (*map*, **&**\ *cgroup*) except this
+ *         helper enforces the key must be a cgroup struct and the map must also
+ *         be a **BPF_MAP_TYPE_CGRP_STORAGE**.
+ *
+ *         In reality, the local-storage value is embedded directly inside of the
+ *         *cgroup* object itself, rather than being located in the
+ *         **BPF_MAP_TYPE_CGRP_STORAGE** map. When the local-storage value is
+ *         queried for some *map* on a *cgroup* object, the kernel will perform an
+ *         O(n) iteration over all of the live local-storage values for that
+ *         *cgroup* object until the local-storage value for the *map* is found.
+ *
+ *         An optional *flags* (**BPF_LOCAL_STORAGE_GET_F_CREATE**) can be
+ *         used such that a new bpf_local_storage will be
+ *         created if one does not exist. *value* can be used
+ *         together with **BPF_LOCAL_STORAGE_GET_F_CREATE** to specify
+ *         the initial value of a bpf_local_storage. If *value* is
+ *         **NULL**, the new bpf_local_storage will be zero initialized.
+ *     Return
+ *         A bpf_local_storage pointer is returned on success.
+ *
+ *         **NULL** if not found or there was an error in adding
+ *         a new bpf_local_storage.
+ *
+ * long bpf_cgrp_storage_delete(struct bpf_map *map, struct cgroup *cgroup)
+ *     Description
+ *         Delete a bpf_local_storage from a *cgroup*.
+ *     Return
+ *         0 on success.
+ *
+ *         **-ENOENT** if the bpf_local_storage cannot be found.
  */
 #define ___BPF_FUNC_MAPPER(FN, ctx...)          \
     FN(unspec, 0, ##ctx)                        \
···
     FN(tcp_raw_check_syncookie_ipv6, 207, ##ctx)    \
     FN(ktime_get_tai_ns, 208, ##ctx)            \
     FN(user_ringbuf_drain, 209, ##ctx)          \
+    FN(cgrp_storage_get, 210, ##ctx)            \
+    FN(cgrp_storage_delete, 211, ##ctx)         \
     /* */

 /* backwards-compatibility macros for users of __BPF_FUNC_MAPPER that don't
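The helper description says the value is embedded in the cgroup and located by an O(n) walk over that cgroup's live local-storage values, one per map. The sketch below models that lookup with get-or-create semantics in user space. `struct owner`, `struct elem`, `owner_storage_get()` and `F_CREATE` are hypothetical stand-ins. It omits RCU, locking and the per-map lookup cache the kernel adds, and `value_size` plays the role of the map's value size.

```c
#include <stdlib.h>
#include <string.h>

#define F_CREATE 1u     /* stand-in for BPF_LOCAL_STORAGE_GET_F_CREATE */

struct elem {
    struct elem *next;
    const void *map;        /* which map this value belongs to */
    unsigned char data[];   /* the value bytes */
};

struct owner {
    struct elem *head;      /* all live values attached to this owner */
};

/* O(n) in the number of maps holding a value for this owner. With
 * F_CREATE, a missing value is created from init, or zero-filled. */
static void *owner_storage_get(struct owner *o, const void *map,
                               size_t value_size, const void *init,
                               unsigned int flags)
{
    struct elem *e;

    for (e = o->head; e; e = e->next)
        if (e->map == map)
            return e->data;

    if (!(flags & F_CREATE))
        return NULL;

    e = calloc(1, sizeof(*e) + value_size);
    if (!e)
        return NULL;
    if (init)
        memcpy(e->data, init, value_size);
    e->map = map;
    e->next = o->head;
    o->head = e;
    return e->data;
}
```

The cost grows with the number of maps that have stored something on the same cgroup, not with the number of cgroups. That is the trade-off the helper text points out.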

+1 -1

kernel/bpf/Makefile

···
 obj-$(CONFIG_BPF_SYSCALL) += stackmap.o
 endif
 ifeq ($(CONFIG_CGROUPS),y)
-obj-$(CONFIG_BPF_SYSCALL) += cgroup_iter.o
+obj-$(CONFIG_BPF_SYSCALL) += cgroup_iter.o bpf_cgrp_storage.o
 endif
 obj-$(CONFIG_CGROUP_BPF) += cgroup.o
 ifeq ($(CONFIG_INET),y)

+247

kernel/bpf/bpf_cgrp_storage.c

···
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2022 Meta Platforms, Inc. and affiliates.
+ */
+
+#include <linux/types.h>
+#include <linux/bpf.h>
+#include <linux/bpf_local_storage.h>
+#include <uapi/linux/btf.h>
+#include <linux/btf_ids.h>
+
+DEFINE_BPF_STORAGE_CACHE(cgroup_cache);
+
+static DEFINE_PER_CPU(int, bpf_cgrp_storage_busy);
+
+static void bpf_cgrp_storage_lock(void)
+{
+    migrate_disable();
+    this_cpu_inc(bpf_cgrp_storage_busy);
+}
+
+static void bpf_cgrp_storage_unlock(void)
+{
+    this_cpu_dec(bpf_cgrp_storage_busy);
+    migrate_enable();
+}
+
+static bool bpf_cgrp_storage_trylock(void)
+{
+    migrate_disable();
+    if (unlikely(this_cpu_inc_return(bpf_cgrp_storage_busy) != 1)) {
+        this_cpu_dec(bpf_cgrp_storage_busy);
+        migrate_enable();
+        return false;
+    }
+    return true;
+}
+
+static struct bpf_local_storage __rcu **cgroup_storage_ptr(void *owner)
+{
+    struct cgroup *cg = owner;
+
+    return &cg->bpf_cgrp_storage;
+}
+
+void bpf_cgrp_storage_free(struct cgroup *cgroup)
+{
+    struct bpf_local_storage *local_storage;
+    bool free_cgroup_storage = false;
+    unsigned long flags;
+
+    rcu_read_lock();
+    local_storage = rcu_dereference(cgroup->bpf_cgrp_storage);
+    if (!local_storage) {
+        rcu_read_unlock();
+        return;
+    }
+
+    bpf_cgrp_storage_lock();
+    raw_spin_lock_irqsave(&local_storage->lock, flags);
+    free_cgroup_storage = bpf_local_storage_unlink_nolock(local_storage);
+    raw_spin_unlock_irqrestore(&local_storage->lock, flags);
+    bpf_cgrp_storage_unlock();
+    rcu_read_unlock();
+
+    if (free_cgroup_storage)
+        kfree_rcu(local_storage, rcu);
+}
+
+static struct bpf_local_storage_data *
+cgroup_storage_lookup(struct cgroup *cgroup, struct bpf_map *map, bool cacheit_lockit)
+{
+    struct bpf_local_storage *cgroup_storage;
+    struct bpf_local_storage_map *smap;
+
+    cgroup_storage = rcu_dereference_check(cgroup->bpf_cgrp_storage,
+                                           bpf_rcu_lock_held());
+    if (!cgroup_storage)
+        return NULL;
+
+    smap = (struct bpf_local_storage_map *)map;
+    return bpf_local_storage_lookup(cgroup_storage, smap, cacheit_lockit);
+}
+
+static void *bpf_cgrp_storage_lookup_elem(struct bpf_map *map, void *key)
+{
+    struct bpf_local_storage_data *sdata;
+    struct cgroup *cgroup;
+    int fd;
+
+    fd = *(int *)key;
+    cgroup = cgroup_get_from_fd(fd);
+    if (IS_ERR(cgroup))
+        return ERR_CAST(cgroup);
+
+    bpf_cgrp_storage_lock();
+    sdata = cgroup_storage_lookup(cgroup, map, true);
+    bpf_cgrp_storage_unlock();
+    cgroup_put(cgroup);
+    return sdata ? sdata->data : NULL;
+}
+
+static int bpf_cgrp_storage_update_elem(struct bpf_map *map, void *key,
+                                        void *value, u64 map_flags)
+{
+    struct bpf_local_storage_data *sdata;
+    struct cgroup *cgroup;
+    int fd;
+
+    fd = *(int *)key;
+    cgroup = cgroup_get_from_fd(fd);
+    if (IS_ERR(cgroup))
+        return PTR_ERR(cgroup);
+
+    bpf_cgrp_storage_lock();
+    sdata = bpf_local_storage_update(cgroup, (struct bpf_local_storage_map *)map,
+                                     value, map_flags, GFP_ATOMIC);
+    bpf_cgrp_storage_unlock();
+    cgroup_put(cgroup);
+    return PTR_ERR_OR_ZERO(sdata);
+}
+
+static int cgroup_storage_delete(struct cgroup *cgroup, struct bpf_map *map)
+{
+    struct bpf_local_storage_data *sdata;
+
+    sdata = cgroup_storage_lookup(cgroup, map, false);
+    if (!sdata)
+        return -ENOENT;
+
+    bpf_selem_unlink(SELEM(sdata), true);
+    return 0;
+}
+
+static int bpf_cgrp_storage_delete_elem(struct bpf_map *map, void *key)
+{
+    struct cgroup *cgroup;
+    int err, fd;
+
+    fd = *(int *)key;
+    cgroup = cgroup_get_from_fd(fd);
+    if (IS_ERR(cgroup))
+        return PTR_ERR(cgroup);
+
+    bpf_cgrp_storage_lock();
+    err = cgroup_storage_delete(cgroup, map);
+    bpf_cgrp_storage_unlock();
+    cgroup_put(cgroup);
+    return err;
+}
+
+static int notsupp_get_next_key(struct bpf_map *map, void *key, void *next_key)
+{
+    return -ENOTSUPP;
+}
+
+static struct bpf_map *cgroup_storage_map_alloc(union bpf_attr *attr)
+{
+    return bpf_local_storage_map_alloc(attr, &cgroup_cache);
+}
+
+static void cgroup_storage_map_free(struct bpf_map *map)
+{
+    bpf_local_storage_map_free(map, &cgroup_cache, NULL);
+}
+
+/* *gfp_flags* is a hidden argument provided by the verifier */
+BPF_CALL_5(bpf_cgrp_storage_get, struct bpf_map *, map, struct cgroup *, cgroup,
+           void *, value, u64, flags, gfp_t, gfp_flags)
+{
+    struct bpf_local_storage_data *sdata;
+
+    WARN_ON_ONCE(!bpf_rcu_lock_held());
+    if (flags & ~(BPF_LOCAL_STORAGE_GET_F_CREATE))
+        return (unsigned long)NULL;
+
+    if (!cgroup)
+        return (unsigned long)NULL;
+
+    if (!bpf_cgrp_storage_trylock())
+        return (unsigned long)NULL;
+
+    sdata = cgroup_storage_lookup(cgroup, map, true);
+    if (sdata)
+        goto unlock;
+
+    /* only allocate new storage, when the cgroup is refcounted */
+    if (!percpu_ref_is_dying(&cgroup->self.refcnt) &&
+        (flags & BPF_LOCAL_STORAGE_GET_F_CREATE))
+        sdata = bpf_local_storage_update(cgroup, (struct bpf_local_storage_map *)map,
+                                         value, BPF_NOEXIST, gfp_flags);
+
+unlock:
+    bpf_cgrp_storage_unlock();
+    return IS_ERR_OR_NULL(sdata) ? (unsigned long)NULL : (unsigned long)sdata->data;
+}
+
+BPF_CALL_2(bpf_cgrp_storage_delete, struct bpf_map *, map, struct cgroup *, cgroup)
+{
+    int ret;
+
+    WARN_ON_ONCE(!bpf_rcu_lock_held());
+    if (!cgroup)
+        return -EINVAL;
+
+    if (!bpf_cgrp_storage_trylock())
+        return -EBUSY;
+
+    ret = cgroup_storage_delete(cgroup, map);
+    bpf_cgrp_storage_unlock();
+    return ret;
+}
+
+BTF_ID_LIST_SINGLE(cgroup_storage_map_btf_ids, struct, bpf_local_storage_map)
+const struct bpf_map_ops cgrp_storage_map_ops = {
+    .map_meta_equal = bpf_map_meta_equal,
+    .map_alloc_check = bpf_local_storage_map_alloc_check,
+    .map_alloc = cgroup_storage_map_alloc,
+    .map_free = cgroup_storage_map_free,
+    .map_get_next_key = notsupp_get_next_key,
+    .map_lookup_elem = bpf_cgrp_storage_lookup_elem,
+    .map_update_elem = bpf_cgrp_storage_update_elem,
+    .map_delete_elem = bpf_cgrp_storage_delete_elem,
+    .map_check_btf = bpf_local_storage_map_check_btf,
+    .map_btf_id = &cgroup_storage_map_btf_ids[0],
+    .map_owner_storage_ptr = cgroup_storage_ptr,
+};
+
+const struct bpf_func_proto bpf_cgrp_storage_get_proto = {
+    .func        = bpf_cgrp_storage_get,
+    .gpl_only    = false,
+    .ret_type    = RET_PTR_TO_MAP_VALUE_OR_NULL,
+    .arg1_type   = ARG_CONST_MAP_PTR,
+    .arg2_type   = ARG_PTR_TO_BTF_ID,
+    .arg2_btf_id = &bpf_cgroup_btf_id[0],
+    .arg3_type   = ARG_PTR_TO_MAP_VALUE_OR_NULL,
+    .arg4_type   = ARG_ANYTHING,
+};
+
+const struct bpf_func_proto bpf_cgrp_storage_delete_proto = {
+    .func        = bpf_cgrp_storage_delete,
+    .gpl_only    = false,
+    .ret_type    = RET_INTEGER,
+    .arg1_type   = ARG_CONST_MAP_PTR,
+    .arg2_type   = ARG_PTR_TO_BTF_ID,
+    .arg2_btf_id = &bpf_cgroup_btf_id[0],
+};

+3 -35

kernel/bpf/bpf_inode_storage.c

···

 void bpf_inode_storage_free(struct inode *inode)
 {
-	struct bpf_local_storage_elem *selem;
 	struct bpf_local_storage *local_storage;
 	bool free_inode_storage = false;
 	struct bpf_storage_blob *bsb;
-	struct hlist_node *n;

 	bsb = bpf_inode(inode);
 	if (!bsb)
···
 		return;
 	}

-	/* Neither the bpf_prog nor the bpf-map's syscall
-	 * could be modifying the local_storage->list now.
-	 * Thus, no elem can be added-to or deleted-from the
-	 * local_storage->list by the bpf_prog or by the bpf-map's syscall.
-	 *
-	 * It is racing with bpf_local_storage_map_free() alone
-	 * when unlinking elem from the local_storage->list and
-	 * the map's bucket->list.
-	 */
 	raw_spin_lock_bh(&local_storage->lock);
-	hlist_for_each_entry_safe(selem, n, &local_storage->list, snode) {
-		/* Always unlink from map before unlinking from
-		 * local_storage.
-		 */
-		bpf_selem_unlink_map(selem);
-		free_inode_storage = bpf_selem_unlink_storage_nolock(
-			local_storage, selem, false, false);
-	}
+	free_inode_storage = bpf_local_storage_unlink_nolock(local_storage);
 	raw_spin_unlock_bh(&local_storage->lock);
 	rcu_read_unlock();

-	/* free_inoode_storage should always be true as long as
-	 * local_storage->list was non-empty.
-	 */
 	if (free_inode_storage)
 		kfree_rcu(local_storage, rcu);
 }
···

 static struct bpf_map *inode_storage_map_alloc(union bpf_attr *attr)
 {
-	struct bpf_local_storage_map *smap;
-
-	smap = bpf_local_storage_map_alloc(attr);
-	if (IS_ERR(smap))
-		return ERR_CAST(smap);
-
-	smap->cache_idx = bpf_local_storage_cache_idx_get(&inode_cache);
-	return &smap->map;
+	return bpf_local_storage_map_alloc(attr, &inode_cache);
 }

 static void inode_storage_map_free(struct bpf_map *map)
 {
-	struct bpf_local_storage_map *smap;
-
-	smap = (struct bpf_local_storage_map *)map;
-	bpf_local_storage_cache_idx_free(&inode_cache, smap->cache_idx);
-	bpf_local_storage_map_free(smap, NULL);
+	bpf_local_storage_map_free(map, &inode_cache, NULL);
 }

 BTF_ID_LIST_SINGLE(inode_storage_map_btf_ids, struct,

+131 -78

kernel/bpf/bpf_local_storage.c

···
  * The caller must ensure selem->smap is still valid to be
  * dereferenced for its smap->elem_size and smap->cache_idx.
  */
-bool bpf_selem_unlink_storage_nolock(struct bpf_local_storage *local_storage,
-				     struct bpf_local_storage_elem *selem,
-				     bool uncharge_mem, bool use_trace_rcu)
+static bool bpf_selem_unlink_storage_nolock(struct bpf_local_storage *local_storage,
+					    struct bpf_local_storage_elem *selem,
+					    bool uncharge_mem, bool use_trace_rcu)
 {
 	struct bpf_local_storage_map *smap;
 	bool free_local_storage;
···
 	__bpf_selem_unlink_storage(selem, use_trace_rcu);
 }

+/* If cacheit_lockit is false, this lookup function is lockless */
 struct bpf_local_storage_data *
 bpf_local_storage_lookup(struct bpf_local_storage *local_storage,
 			 struct bpf_local_storage_map *smap,
···
 	return ERR_PTR(err);
 }

-u16 bpf_local_storage_cache_idx_get(struct bpf_local_storage_cache *cache)
+static u16 bpf_local_storage_cache_idx_get(struct bpf_local_storage_cache *cache)
 {
 	u64 min_usage = U64_MAX;
 	u16 i, res = 0;
···
 	return res;
 }

-void bpf_local_storage_cache_idx_free(struct bpf_local_storage_cache *cache,
-				      u16 idx)
+static void bpf_local_storage_cache_idx_free(struct bpf_local_storage_cache *cache,
+					     u16 idx)
 {
 	spin_lock(&cache->idx_lock);
 	cache->idx_usage_counts[idx]--;
 	spin_unlock(&cache->idx_lock);
 }

-void bpf_local_storage_map_free(struct bpf_local_storage_map *smap,
-				int __percpu *busy_counter)
+int bpf_local_storage_map_alloc_check(union bpf_attr *attr)
+{
+	if (attr->map_flags & ~BPF_LOCAL_STORAGE_CREATE_FLAG_MASK ||
+	    !(attr->map_flags & BPF_F_NO_PREALLOC) ||
+	    attr->max_entries ||
+	    attr->key_size != sizeof(int) || !attr->value_size ||
+	    /* Enforce BTF for userspace sk dumping */
+	    !attr->btf_key_type_id || !attr->btf_value_type_id)
+		return -EINVAL;
+
+	if (!bpf_capable())
+		return -EPERM;
+
+	if (attr->value_size > BPF_LOCAL_STORAGE_MAX_VALUE_SIZE)
+		return -E2BIG;
+
+	return 0;
+}
+
+static struct bpf_local_storage_map *__bpf_local_storage_map_alloc(union bpf_attr *attr)
+{
+	struct bpf_local_storage_map *smap;
+	unsigned int i;
+	u32 nbuckets;
+
+	smap = bpf_map_area_alloc(sizeof(*smap), NUMA_NO_NODE);
+	if (!smap)
+		return ERR_PTR(-ENOMEM);
+	bpf_map_init_from_attr(&smap->map, attr);
+
+	nbuckets = roundup_pow_of_two(num_possible_cpus());
+	/* Use at least 2 buckets, select_bucket() is undefined behavior with 1 bucket */
+	nbuckets = max_t(u32, 2, nbuckets);
+	smap->bucket_log = ilog2(nbuckets);
+
+	smap->buckets = kvcalloc(sizeof(*smap->buckets), nbuckets,
+				 GFP_USER | __GFP_NOWARN | __GFP_ACCOUNT);
+	if (!smap->buckets) {
+		bpf_map_area_free(smap);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	for (i = 0; i < nbuckets; i++) {
+		INIT_HLIST_HEAD(&smap->buckets[i].list);
+		raw_spin_lock_init(&smap->buckets[i].lock);
+	}
+
+	smap->elem_size =
+		sizeof(struct bpf_local_storage_elem) + attr->value_size;
+
+	return smap;
+}
+
+int bpf_local_storage_map_check_btf(const struct bpf_map *map,
+				    const struct btf *btf,
+				    const struct btf_type *key_type,
+				    const struct btf_type *value_type)
+{
+	u32 int_data;
+
+	if (BTF_INFO_KIND(key_type->info) != BTF_KIND_INT)
+		return -EINVAL;
+
+	int_data = *(u32 *)(key_type + 1);
+	if (BTF_INT_BITS(int_data) != 32 || BTF_INT_OFFSET(int_data))
+		return -EINVAL;
+
+	return 0;
+}
+
+bool bpf_local_storage_unlink_nolock(struct bpf_local_storage *local_storage)
 {
 	struct bpf_local_storage_elem *selem;
+	bool free_storage = false;
+	struct hlist_node *n;
+
+	/* Neither the bpf_prog nor the bpf_map's syscall
+	 * could be modifying the local_storage->list now.
+	 * Thus, no elem can be added to or deleted from the
+	 * local_storage->list by the bpf_prog or by the bpf_map's syscall.
+	 *
+	 * It is racing with bpf_local_storage_map_free() alone
+	 * when unlinking elem from the local_storage->list and
+	 * the map's bucket->list.
+	 */
+	hlist_for_each_entry_safe(selem, n, &local_storage->list, snode) {
+		/* Always unlink from map before unlinking from
+		 * local_storage.
+		 */
+		bpf_selem_unlink_map(selem);
+		/* If local_storage list has only one element, the
+		 * bpf_selem_unlink_storage_nolock() will return true.
+		 * Otherwise, it will return false. The current loop iteration
+		 * intends to remove all local storage. So the last iteration
+		 * of the loop will set the free_cgroup_storage to true.
+		 */
+		free_storage = bpf_selem_unlink_storage_nolock(
+			local_storage, selem, false, false);
+	}
+
+	return free_storage;
+}
+
+struct bpf_map *
+bpf_local_storage_map_alloc(union bpf_attr *attr,
+			    struct bpf_local_storage_cache *cache)
+{
+	struct bpf_local_storage_map *smap;
+
+	smap = __bpf_local_storage_map_alloc(attr);
+	if (IS_ERR(smap))
+		return ERR_CAST(smap);
+
+	smap->cache_idx = bpf_local_storage_cache_idx_get(cache);
+	return &smap->map;
+}
+
+void bpf_local_storage_map_free(struct bpf_map *map,
+				struct bpf_local_storage_cache *cache,
+				int __percpu *busy_counter)
+{
 	struct bpf_local_storage_map_bucket *b;
+	struct bpf_local_storage_elem *selem;
+	struct bpf_local_storage_map *smap;
 	unsigned int i;
+
+	smap = (struct bpf_local_storage_map *)map;
+	bpf_local_storage_cache_idx_free(cache, smap->cache_idx);

 	/* Note that this map might be concurrently cloned from
 	 * bpf_sk_storage_clone. Wait for any existing bpf_sk_storage_clone
···

 	kvfree(smap->buckets);
 	bpf_map_area_free(smap);
-}
-
-int bpf_local_storage_map_alloc_check(union bpf_attr *attr)
-{
-	if (attr->map_flags & ~BPF_LOCAL_STORAGE_CREATE_FLAG_MASK ||
-	    !(attr->map_flags & BPF_F_NO_PREALLOC) ||
-	    attr->max_entries ||
-	    attr->key_size != sizeof(int) || !attr->value_size ||
-	    /* Enforce BTF for userspace sk dumping */
-	    !attr->btf_key_type_id || !attr->btf_value_type_id)
-		return -EINVAL;
-
-	if (!bpf_capable())
-		return -EPERM;
-
-	if (attr->value_size > BPF_LOCAL_STORAGE_MAX_VALUE_SIZE)
-		return -E2BIG;
-
-	return 0;
-}
-
-struct bpf_local_storage_map *bpf_local_storage_map_alloc(union bpf_attr *attr)
-{
-	struct bpf_local_storage_map *smap;
-	unsigned int i;
-	u32 nbuckets;
-
-	smap = bpf_map_area_alloc(sizeof(*smap), NUMA_NO_NODE);
-	if (!smap)
-		return ERR_PTR(-ENOMEM);
-	bpf_map_init_from_attr(&smap->map, attr);
-
-	nbuckets = roundup_pow_of_two(num_possible_cpus());
-	/* Use at least 2 buckets, select_bucket() is undefined behavior with 1 bucket */
-	nbuckets = max_t(u32, 2, nbuckets);
-	smap->bucket_log = ilog2(nbuckets);
-
-	smap->buckets = kvcalloc(sizeof(*smap->buckets), nbuckets,
-				 GFP_USER | __GFP_NOWARN | __GFP_ACCOUNT);
-	if (!smap->buckets) {
-		bpf_map_area_free(smap);
-		return ERR_PTR(-ENOMEM);
-	}
-
-	for (i = 0; i < nbuckets; i++) {
-		INIT_HLIST_HEAD(&smap->buckets[i].list);
-		raw_spin_lock_init(&smap->buckets[i].lock);
-	}
-
-	smap->elem_size =
-		sizeof(struct bpf_local_storage_elem) + attr->value_size;
-
-	return smap;
-}
-
-int bpf_local_storage_map_check_btf(const struct bpf_map *map,
-				    const struct btf *btf,
-				    const struct btf_type *key_type,
-				    const struct btf_type *value_type)
-{
-	u32 int_data;
-
-	if (BTF_INFO_KIND(key_type->info) != BTF_KIND_INT)
-		return -EINVAL;
-
-	int_data = *(u32 *)(key_type + 1);
-	if (BTF_INT_BITS(int_data) != 32 || BTF_INT_OFFSET(int_data))
-		return -EINVAL;
-
-	return 0;
 }
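The bucket sizing that moved into __bpf_local_storage_map_alloc() can be checked by hand. The userspace C sketch below mirrors roundup_pow_of_two(), the max_t(u32, 2, nbuckets) clamp and ilog2() with hypothetical helpers (roundup_pow2, ilog2_u32, ls_nbuckets, ls_bucket_log); it is an illustration under those assumptions, not kernel code. The clamp appears to matter because select_bucket() indexes the bucket array with hash_ptr(selem, smap->bucket_log): with a single bucket, bucket_log would be 0 and the underlying 64-bit hash would shift by its full width, which C leaves undefined.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the kernel's roundup_pow_of_two(). */
static uint32_t roundup_pow2(uint32_t n)
{
	uint32_t p = 1;

	/* Keep p representable: the loop would overflow past 2^31. */
	assert(n >= 1 && n <= (1u << 31));
	while (p < n)
		p <<= 1;
	return p;
}

/* Hypothetical stand-in for ilog2(): index of the highest set bit. */
static uint32_t ilog2_u32(uint32_t n)
{
	uint32_t log = 0;

	while (n >>= 1)
		log++;
	return log;
}

/* Bucket count for a CPU count: next power of two, but never below 2. */
static uint32_t ls_nbuckets(uint32_t ncpus)
{
	uint32_t nbuckets = roundup_pow2(ncpus);

	return nbuckets < 2 ? 2 : nbuckets;
}

/* Value that would land in smap->bucket_log; the clamp keeps it >= 1. */
static uint32_t ls_bucket_log(uint32_t ncpus)
{
	return ilog2_u32(ls_nbuckets(ncpus));
}
```

With these helpers, a single-CPU system still gets 2 buckets (bucket_log 1), 3 CPUs round up to 4 buckets, and 9 CPUs round up to 16.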

+100 -63

kernel/bpf/bpf_task_storage.c

···

 void bpf_task_storage_free(struct task_struct *task)
 {
-	struct bpf_local_storage_elem *selem;
 	struct bpf_local_storage *local_storage;
 	bool free_task_storage = false;
-	struct hlist_node *n;
 	unsigned long flags;

 	rcu_read_lock();
···
 		return;
 	}

-	/* Neither the bpf_prog nor the bpf-map's syscall
-	 * could be modifying the local_storage->list now.
-	 * Thus, no elem can be added-to or deleted-from the
-	 * local_storage->list by the bpf_prog or by the bpf-map's syscall.
-	 *
-	 * It is racing with bpf_local_storage_map_free() alone
-	 * when unlinking elem from the local_storage->list and
-	 * the map's bucket->list.
-	 */
 	bpf_task_storage_lock();
 	raw_spin_lock_irqsave(&local_storage->lock, flags);
-	hlist_for_each_entry_safe(selem, n, &local_storage->list, snode) {
-		/* Always unlink from map before unlinking from
-		 * local_storage.
-		 */
-		bpf_selem_unlink_map(selem);
-		free_task_storage = bpf_selem_unlink_storage_nolock(
-			local_storage, selem, false, false);
-	}
+	free_task_storage = bpf_local_storage_unlink_nolock(local_storage);
 	raw_spin_unlock_irqrestore(&local_storage->lock, flags);
 	bpf_task_storage_unlock();
 	rcu_read_unlock();

-	/* free_task_storage should always be true as long as
-	 * local_storage->list was non-empty.
-	 */
 	if (free_task_storage)
 		kfree_rcu(local_storage, rcu);
 }
···
 	return err;
 }

-static int task_storage_delete(struct task_struct *task, struct bpf_map *map)
+static int task_storage_delete(struct task_struct *task, struct bpf_map *map,
+			       bool nobusy)
 {
 	struct bpf_local_storage_data *sdata;

 	sdata = task_storage_lookup(task, map, false);
 	if (!sdata)
 		return -ENOENT;
+
+	if (!nobusy)
+		return -EBUSY;

 	bpf_selem_unlink(SELEM(sdata), true);

···
 	}

 	bpf_task_storage_lock();
-	err = task_storage_delete(task, map);
+	err = task_storage_delete(task, map, true);
 	bpf_task_storage_unlock();
 out:
 	put_pid(pid);
 	return err;
 }

+/* Called by bpf_task_storage_get*() helpers */
+static void *__bpf_task_storage_get(struct bpf_map *map,
+				    struct task_struct *task, void *value,
+				    u64 flags, gfp_t gfp_flags, bool nobusy)
+{
+	struct bpf_local_storage_data *sdata;
+
+	sdata = task_storage_lookup(task, map, nobusy);
+	if (sdata)
+		return sdata->data;
+
+	/* only allocate new storage, when the task is refcounted */
+	if (refcount_read(&task->usage) &&
+	    (flags & BPF_LOCAL_STORAGE_GET_F_CREATE) && nobusy) {
+		sdata = bpf_local_storage_update(
+			task, (struct bpf_local_storage_map *)map, value,
+			BPF_NOEXIST, gfp_flags);
+		return IS_ERR(sdata) ? NULL : sdata->data;
+	}
+
+	return NULL;
+}
+
+/* *gfp_flags* is a hidden argument provided by the verifier */
+BPF_CALL_5(bpf_task_storage_get_recur, struct bpf_map *, map, struct task_struct *,
+	   task, void *, value, u64, flags, gfp_t, gfp_flags)
+{
+	bool nobusy;
+	void *data;
+
+	WARN_ON_ONCE(!bpf_rcu_lock_held());
+	if (flags & ~BPF_LOCAL_STORAGE_GET_F_CREATE || !task)
+		return (unsigned long)NULL;
+
+	nobusy = bpf_task_storage_trylock();
+	data = __bpf_task_storage_get(map, task, value, flags,
+				      gfp_flags, nobusy);
+	if (nobusy)
+		bpf_task_storage_unlock();
+	return (unsigned long)data;
+}
+
 /* *gfp_flags* is a hidden argument provided by the verifier */
 BPF_CALL_5(bpf_task_storage_get, struct bpf_map *, map, struct task_struct *,
 	   task, void *, value, u64, flags, gfp_t, gfp_flags)
 {
-	struct bpf_local_storage_data *sdata;
+	void *data;

 	WARN_ON_ONCE(!bpf_rcu_lock_held());
-	if (flags & ~(BPF_LOCAL_STORAGE_GET_F_CREATE))
+	if (flags & ~BPF_LOCAL_STORAGE_GET_F_CREATE || !task)
 		return (unsigned long)NULL;

-	if (!task)
-		return (unsigned long)NULL;
-
-	if (!bpf_task_storage_trylock())
-		return (unsigned long)NULL;
-
-	sdata = task_storage_lookup(task, map, true);
-	if (sdata)
-		goto unlock;
-
-	/* only allocate new storage, when the task is refcounted */
-	if (refcount_read(&task->usage) &&
-	    (flags & BPF_LOCAL_STORAGE_GET_F_CREATE))
-		sdata = bpf_local_storage_update(
-			task, (struct bpf_local_storage_map *)map, value,
-			BPF_NOEXIST, gfp_flags);
-
-unlock:
+	bpf_task_storage_lock();
+	data = __bpf_task_storage_get(map, task, value, flags,
+				      gfp_flags, true);
 	bpf_task_storage_unlock();
-	return IS_ERR_OR_NULL(sdata) ? (unsigned long)NULL :
-		(unsigned long)sdata->data;
+	return (unsigned long)data;
+}
+
+BPF_CALL_2(bpf_task_storage_delete_recur, struct bpf_map *, map, struct task_struct *,
+	   task)
+{
+	bool nobusy;
+	int ret;
+
+	WARN_ON_ONCE(!bpf_rcu_lock_held());
+	if (!task)
+		return -EINVAL;
+
+	nobusy = bpf_task_storage_trylock();
+	/* This helper must only be called from places where the lifetime of the task
+	 * is guaranteed. Either by being refcounted or by being protected
+	 * by an RCU read-side critical section.
+	 */
+	ret = task_storage_delete(task, map, nobusy);
+	if (nobusy)
+		bpf_task_storage_unlock();
+	return ret;
 }

 BPF_CALL_2(bpf_task_storage_delete, struct bpf_map *, map, struct task_struct *,
···
 	if (!task)
 		return -EINVAL;

-	if (!bpf_task_storage_trylock())
-		return -EBUSY;
-
+	bpf_task_storage_lock();
 	/* This helper must only be called from places where the lifetime of the task
 	 * is guaranteed. Either by being refcounted or by being protected
 	 * by an RCU read-side critical section.
 	 */
-	ret = task_storage_delete(task, map);
+	ret = task_storage_delete(task, map, true);
 	bpf_task_storage_unlock();
 	return ret;
 }
···

 static struct bpf_map *task_storage_map_alloc(union bpf_attr *attr)
 {
-	struct bpf_local_storage_map *smap;
-
-	smap = bpf_local_storage_map_alloc(attr);
-	if (IS_ERR(smap))
-		return ERR_CAST(smap);
-
-	smap->cache_idx = bpf_local_storage_cache_idx_get(&task_cache);
-	return &smap->map;
+	return bpf_local_storage_map_alloc(attr, &task_cache);
 }

 static void task_storage_map_free(struct bpf_map *map)
 {
-	struct bpf_local_storage_map *smap;
-
-	smap = (struct bpf_local_storage_map *)map;
-	bpf_local_storage_cache_idx_free(&task_cache, smap->cache_idx);
-	bpf_local_storage_map_free(smap, &bpf_task_storage_busy);
+	bpf_local_storage_map_free(map, &task_cache, &bpf_task_storage_busy);
 }

 BTF_ID_LIST_SINGLE(task_storage_map_btf_ids, struct, bpf_local_storage_map)
···
 	.map_owner_storage_ptr = task_storage_ptr,
 };

+const struct bpf_func_proto bpf_task_storage_get_recur_proto = {
+	.func = bpf_task_storage_get_recur,
+	.gpl_only = false,
+	.ret_type = RET_PTR_TO_MAP_VALUE_OR_NULL,
+	.arg1_type = ARG_CONST_MAP_PTR,
+	.arg2_type = ARG_PTR_TO_BTF_ID,
+	.arg2_btf_id = &btf_tracing_ids[BTF_TRACING_TYPE_TASK],
+	.arg3_type = ARG_PTR_TO_MAP_VALUE_OR_NULL,
+	.arg4_type = ARG_ANYTHING,
+};
+
 const struct bpf_func_proto bpf_task_storage_get_proto = {
 	.func = bpf_task_storage_get,
 	.gpl_only = false,
···
 	.arg2_btf_id = &btf_tracing_ids[BTF_TRACING_TYPE_TASK],
 	.arg3_type = ARG_PTR_TO_MAP_VALUE_OR_NULL,
 	.arg4_type = ARG_ANYTHING,
+};
+
+const struct bpf_func_proto bpf_task_storage_delete_recur_proto = {
+	.func = bpf_task_storage_delete_recur,
+	.gpl_only = false,
+	.ret_type = RET_INTEGER,
+	.arg1_type = ARG_CONST_MAP_PTR,
+	.arg2_type = ARG_PTR_TO_BTF_ID,
+	.arg2_btf_id = &btf_tracing_ids[BTF_TRACING_TYPE_TASK],
 };

 const struct bpf_func_proto bpf_task_storage_delete_proto = {
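The new _recur helpers change what happens when bpf_task_storage_trylock() fails: instead of returning NULL immediately, the lookup still runs, while creating or deleting storage is refused (delete still reports -ENOENT first and only then -EBUSY). The single-threaded C sketch below models only that decision logic. The one-slot store, the plain busy counter standing in for the per-CPU bpf_task_storage_busy, and all function names are hypothetical; RCU, per-CPU state and the lookup cache are not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins: one storage slot and a single busy counter. */
static int busy;        /* models the per-CPU bpf_task_storage_busy */
static bool present;    /* does the slot currently hold a value? */
static long slot;

/* Same shape as bpf_task_storage_trylock(): fails if already held. */
static bool storage_trylock(void)
{
	if (++busy != 1) {
		--busy;
		return false;
	}
	return true;
}

static void storage_unlock(void)
{
	assert(busy > 0);
	--busy;
}

/* Like __bpf_task_storage_get(): lookup always, create only if nobusy. */
static long *storage_get(bool create, bool nobusy)
{
	if (present)
		return &slot;
	if (create && nobusy) {
		present = true;
		slot = 0;
		return &slot;
	}
	return NULL;
}

/* Like task_storage_delete(): -ENOENT first, then -EBUSY if contended. */
static int storage_delete(bool nobusy)
{
	if (!present)
		return -ENOENT;
	if (!nobusy)
		return -EBUSY;
	present = false;
	return 0;
}

/* Like bpf_task_storage_get_recur(): unlock only if the trylock won. */
static long *get_recur(bool create)
{
	bool nobusy = storage_trylock();
	long *p = storage_get(create, nobusy);

	if (nobusy)
		storage_unlock();
	return p;
}

/* Like bpf_task_storage_delete_recur(). */
static int delete_recur(void)
{
	bool nobusy = storage_trylock();
	int ret = storage_delete(nobusy);

	if (nobusy)
		storage_unlock();
	return ret;
}
```

Calling storage_trylock() first and then get_recur() or delete_recur() simulates a program that re-enters the helper while the lock is already held: an existing value is still found, but nothing is created or deleted.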

+1 -1

kernel/bpf/cgroup_iter.c

···
 	.show = cgroup_iter_seq_show,
 };

-BTF_ID_LIST_SINGLE(bpf_cgroup_btf_id, struct, cgroup)
+BTF_ID_LIST_GLOBAL_SINGLE(bpf_cgroup_btf_id, struct, cgroup)

 static int cgroup_iter_seq_init(void *priv, struct bpf_iter_aux_info *aux)
 {

+8 -12

kernel/bpf/cpumap.c

···
 {
 	u32 value_size = attr->value_size;
 	struct bpf_cpu_map *cmap;
-	int err = -ENOMEM;

 	if (!bpf_capable())
 		return ERR_PTR(-EPERM);
···
 	    attr->map_flags & ~BPF_F_NUMA_NODE)
 		return ERR_PTR(-EINVAL);

+	/* Pre-limit array size based on NR_CPUS, not final CPU check */
+	if (attr->max_entries > NR_CPUS)
+		return ERR_PTR(-E2BIG);
+
 	cmap = bpf_map_area_alloc(sizeof(*cmap), NUMA_NO_NODE);
 	if (!cmap)
 		return ERR_PTR(-ENOMEM);

 	bpf_map_init_from_attr(&cmap->map, attr);

-	/* Pre-limit array size based on NR_CPUS, not final CPU check */
-	if (cmap->map.max_entries > NR_CPUS) {
-		err = -E2BIG;
-		goto free_cmap;
-	}
-
 	/* Alloc array for possible remote "destination" CPUs */
 	cmap->cpu_map = bpf_map_area_alloc(cmap->map.max_entries *
 					   sizeof(struct bpf_cpu_map_entry *),
 					   cmap->map.numa_node);
-	if (!cmap->cpu_map)
-		goto free_cmap;
+	if (!cmap->cpu_map) {
+		bpf_map_area_free(cmap);
+		return ERR_PTR(-ENOMEM);
+	}

 	return &cmap->map;
-free_cmap:
-	bpf_map_area_free(cmap);
-	return ERR_PTR(err);
 }

 static void get_cpu_map_entry(struct bpf_cpu_map_entry *rcpu)

+6

kernel/bpf/helpers.c

···
 		return &bpf_dynptr_write_proto;
 	case BPF_FUNC_dynptr_data:
 		return &bpf_dynptr_data_proto;
+#ifdef CONFIG_CGROUPS
+	case BPF_FUNC_cgrp_storage_get:
+		return &bpf_cgrp_storage_get_proto;
+	case BPF_FUNC_cgrp_storage_delete:
+		return &bpf_cgrp_storage_delete_proto;
+#endif
 	default:
 		break;
 	}

+7 -5

kernel/bpf/syscall.c

···
 		    map->map_type != BPF_MAP_TYPE_CGROUP_STORAGE &&
 		    map->map_type != BPF_MAP_TYPE_SK_STORAGE &&
 		    map->map_type != BPF_MAP_TYPE_INODE_STORAGE &&
-		    map->map_type != BPF_MAP_TYPE_TASK_STORAGE)
+		    map->map_type != BPF_MAP_TYPE_TASK_STORAGE &&
+		    map->map_type != BPF_MAP_TYPE_CGRP_STORAGE)
 			return -ENOTSUPP;
 		if (map->spin_lock_off + sizeof(struct bpf_spin_lock) >
 		    map->value_size) {
···

 		st = per_cpu_ptr(prog->stats, cpu);
 		do {
-			start = u64_stats_fetch_begin_irq(&st->syncp);
+			start = u64_stats_fetch_begin(&st->syncp);
 			tnsecs = u64_stats_read(&st->nsecs);
 			tcnt = u64_stats_read(&st->cnt);
 			tmisses = u64_stats_read(&st->misses);
-		} while (u64_stats_fetch_retry_irq(&st->syncp, start));
+		} while (u64_stats_fetch_retry(&st->syncp, start));
 		nsecs += tnsecs;
 		cnt += tcnt;
 		misses += tmisses;
···

 		run_ctx.bpf_cookie = 0;
 		run_ctx.saved_run_ctx = NULL;
-		if (!__bpf_prog_enter_sleepable(prog, &run_ctx)) {
+		if (!__bpf_prog_enter_sleepable_recur(prog, &run_ctx)) {
 			/* recursion detected */
 			bpf_prog_put(prog);
 			return -EBUSY;
 		}
 		attr->test.retval = bpf_prog_run(prog, (void *) (long) attr->test.ctx_in);
-		__bpf_prog_exit_sleepable(prog, 0 /* bpf_prog_run does runtime stats */, &run_ctx);
+		__bpf_prog_exit_sleepable_recur(prog, 0 /* bpf_prog_run does runtime stats */,
+						&run_ctx);
 		bpf_prog_put(prog);
 		return 0;
 #endif

+67 -13

kernel/bpf/trampoline.c

···
  * [2..MAX_U64] - execute bpf prog and record execution time.
  * This is start time.
  */
-u64 notrace __bpf_prog_enter(struct bpf_prog *prog, struct bpf_tramp_run_ctx *run_ctx)
+static u64 notrace __bpf_prog_enter_recur(struct bpf_prog *prog, struct bpf_tramp_run_ctx *run_ctx)
 	__acquires(RCU)
 {
 	rcu_read_lock();
···
 	}
 }

-void notrace __bpf_prog_exit(struct bpf_prog *prog, u64 start, struct bpf_tramp_run_ctx *run_ctx)
+static void notrace __bpf_prog_exit_recur(struct bpf_prog *prog, u64 start,
+					  struct bpf_tramp_run_ctx *run_ctx)
 	__releases(RCU)
 {
 	bpf_reset_run_ctx(run_ctx->saved_run_ctx);
···
 	rcu_read_unlock();
 }

-u64 notrace __bpf_prog_enter_lsm_cgroup(struct bpf_prog *prog,
-					struct bpf_tramp_run_ctx *run_ctx)
+static u64 notrace __bpf_prog_enter_lsm_cgroup(struct bpf_prog *prog,
+					       struct bpf_tramp_run_ctx *run_ctx)
 	__acquires(RCU)
 {
 	/* Runtime stats are exported via actual BPF_LSM_CGROUP
···
 	return NO_START_TIME;
 }

-void notrace __bpf_prog_exit_lsm_cgroup(struct bpf_prog *prog, u64 start,
-					struct bpf_tramp_run_ctx *run_ctx)
+static void notrace __bpf_prog_exit_lsm_cgroup(struct bpf_prog *prog, u64 start,
+					       struct bpf_tramp_run_ctx *run_ctx)
 	__releases(RCU)
 {
 	bpf_reset_run_ctx(run_ctx->saved_run_ctx);
···
 	rcu_read_unlock();
 }

-u64 notrace __bpf_prog_enter_sleepable(struct bpf_prog *prog, struct bpf_tramp_run_ctx *run_ctx)
+u64 notrace __bpf_prog_enter_sleepable_recur(struct bpf_prog *prog,
+					     struct bpf_tramp_run_ctx *run_ctx)
 {
 	rcu_read_lock_trace();
 	migrate_disable();
···
 	return bpf_prog_start_time();
 }

-void notrace __bpf_prog_exit_sleepable(struct bpf_prog *prog, u64 start,
-				       struct bpf_tramp_run_ctx *run_ctx)
+void notrace __bpf_prog_exit_sleepable_recur(struct bpf_prog *prog, u64 start,
+					     struct bpf_tramp_run_ctx *run_ctx)
 {
 	bpf_reset_run_ctx(run_ctx->saved_run_ctx);

···
 	rcu_read_unlock_trace();
 }

-u64 notrace __bpf_prog_enter_struct_ops(struct bpf_prog *prog,
-					struct bpf_tramp_run_ctx *run_ctx)
+static u64 notrace __bpf_prog_enter_sleepable(struct bpf_prog *prog,
+					      struct bpf_tramp_run_ctx *run_ctx)
+{
+	rcu_read_lock_trace();
+	migrate_disable();
+	might_fault();
+
+	run_ctx->saved_run_ctx = bpf_set_run_ctx(&run_ctx->run_ctx);
+
+	return bpf_prog_start_time();
+}
+
+static void notrace __bpf_prog_exit_sleepable(struct bpf_prog *prog, u64 start,
+					      struct bpf_tramp_run_ctx *run_ctx)
+{
+	bpf_reset_run_ctx(run_ctx->saved_run_ctx);
+
+	update_prog_stats(prog, start);
+	migrate_enable();
+	rcu_read_unlock_trace();
+}
+
+static u64 notrace __bpf_prog_enter(struct bpf_prog *prog,
+				    struct bpf_tramp_run_ctx *run_ctx)
 	__acquires(RCU)
 {
 	rcu_read_lock();
···
 	return bpf_prog_start_time();
 }

-void notrace __bpf_prog_exit_struct_ops(struct bpf_prog *prog, u64 start,
-					struct bpf_tramp_run_ctx *run_ctx)
+static void notrace __bpf_prog_exit(struct bpf_prog *prog, u64 start,
+				    struct bpf_tramp_run_ctx *run_ctx)
 	__releases(RCU)
 {
 	bpf_reset_run_ctx(run_ctx->saved_run_ctx);
···
 void notrace __bpf_tramp_exit(struct bpf_tramp_image *tr)
 {
 	percpu_ref_put(&tr->pcref);
+}
+
+bpf_trampoline_enter_t bpf_trampoline_enter(const struct bpf_prog *prog)
+{
+	bool sleepable = prog->aux->sleepable;
+
+	if (bpf_prog_check_recur(prog))
+		return sleepable ? __bpf_prog_enter_sleepable_recur :
+			__bpf_prog_enter_recur;
+
+	if (resolve_prog_type(prog) == BPF_PROG_TYPE_LSM &&
+	    prog->expected_attach_type == BPF_LSM_CGROUP)
+		return __bpf_prog_enter_lsm_cgroup;
+
+	return sleepable ? __bpf_prog_enter_sleepable : __bpf_prog_enter;
+}
+
+bpf_trampoline_exit_t bpf_trampoline_exit(const struct bpf_prog *prog)
+{
+	bool sleepable = prog->aux->sleepable;
+
+	if (bpf_prog_check_recur(prog))
+		return sleepable ? __bpf_prog_exit_sleepable_recur :
+			__bpf_prog_exit_recur;
+
+	if (resolve_prog_type(prog) == BPF_PROG_TYPE_LSM &&
+	    prog->expected_attach_type == BPF_LSM_CGROUP)
+		return __bpf_prog_exit_lsm_cgroup;
+
+	return sleepable ? __bpf_prog_exit_sleepable : __bpf_prog_exit;
 }

 int __weak
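bpf_trampoline_enter() and bpf_trampoline_exit() now make the enter/exit choice in one place: programs for which bpf_prog_check_recur() is true get a _recur pair, BPF_LSM_CGROUP programs get the LSM-cgroup pair, and everything else gets the plain or sleepable pair. The sketch below restates only that precedence as a pure C function; the enum and the three boolean inputs (recur for bpf_prog_check_recur(), lsm_cgroup for the BPF_LSM_CGROUP test, sleepable for prog->aux->sleepable) are hypothetical stand-ins for the real function pointers and prog checks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the enter/exit pairs the trampoline can use. */
enum tramp_pair {
	PAIR_SLEEPABLE_RECUR,  /* __bpf_prog_{enter,exit}_sleepable_recur */
	PAIR_RECUR,            /* __bpf_prog_{enter,exit}_recur */
	PAIR_LSM_CGROUP,       /* __bpf_prog_{enter,exit}_lsm_cgroup */
	PAIR_SLEEPABLE,        /* __bpf_prog_{enter,exit}_sleepable */
	PAIR_PLAIN,            /* __bpf_prog_{enter,exit} */
};

/* Same order of checks as bpf_trampoline_enter()/bpf_trampoline_exit(). */
static enum tramp_pair select_pair(bool recur, bool lsm_cgroup, bool sleepable)
{
	if (recur)
		return sleepable ? PAIR_SLEEPABLE_RECUR : PAIR_RECUR;
	if (lsm_cgroup)
		return PAIR_LSM_CGROUP;
	return sleepable ? PAIR_SLEEPABLE : PAIR_PLAIN;
}
```

Returning a label instead of a function pointer keeps the sketch testable on its own; the kernel versions return bpf_trampoline_enter_t / bpf_trampoline_exit_t pointers so that the caller can invoke whichever pair was selected.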

+15 -14

kernel/bpf/verifier.c

···
 	u32 *btf_id;
 };

-static const struct bpf_reg_types map_key_value_types = {
-	.types = {
-		PTR_TO_STACK,
-		PTR_TO_PACKET,
-		PTR_TO_PACKET_META,
-		PTR_TO_MAP_KEY,
-		PTR_TO_MAP_VALUE,
-	},
-};
-
 static const struct bpf_reg_types sock_types = {
 	.types = {
 		PTR_TO_SOCK_COMMON,
···
 };

 static const struct bpf_reg_types *compatible_reg_types[__BPF_ARG_TYPE_MAX] = {
-	[ARG_PTR_TO_MAP_KEY] = &map_key_value_types,
-	[ARG_PTR_TO_MAP_VALUE] = &map_key_value_types,
+	[ARG_PTR_TO_MAP_KEY] = &mem_types,
+	[ARG_PTR_TO_MAP_VALUE] = &mem_types,
 	[ARG_CONST_SIZE] = &scalar_types,
 	[ARG_CONST_SIZE_OR_ZERO] = &scalar_types,
 	[ARG_CONST_ALLOC_SIZE_OR_ZERO] = &scalar_types,
···
 		    func_id != BPF_FUNC_task_storage_delete)
 			goto error;
 		break;
+	case BPF_MAP_TYPE_CGRP_STORAGE:
+		if (func_id != BPF_FUNC_cgrp_storage_get &&
+		    func_id != BPF_FUNC_cgrp_storage_delete)
+			goto error;
+		break;
 	case BPF_MAP_TYPE_BLOOM_FILTER:
 		if (func_id != BPF_FUNC_map_peek_elem &&
 		    func_id != BPF_FUNC_map_push_elem)
···
 	case BPF_FUNC_task_storage_get:
 	case BPF_FUNC_task_storage_delete:
 		if (map->map_type != BPF_MAP_TYPE_TASK_STORAGE)
+			goto error;
+		break;
+	case BPF_FUNC_cgrp_storage_get:
+	case BPF_FUNC_cgrp_storage_delete:
+		if (map->map_type != BPF_MAP_TYPE_CGRP_STORAGE)
 			goto error;
 		break;
 	default:
···
  * 3 let S be a stack
  * 4 S.push(v)
  * 5 while S is not empty
- * 6 t <- S.pop()
+ * 6 t <- S.peek()
  * 7 if t is what we're looking for:
  * 8 return t
  * 9 for all edges e in G.adjacentEdges(t) do
···

 		if (insn->imm == BPF_FUNC_task_storage_get ||
 		    insn->imm == BPF_FUNC_sk_storage_get ||
-		    insn->imm == BPF_FUNC_inode_storage_get) {
+		    insn->imm == BPF_FUNC_inode_storage_get ||
+		    insn->imm == BPF_FUNC_cgrp_storage_get) {
 			if (env->prog->aux->sleepable)
 				insn_buf[0] = BPF_MOV64_IMM(BPF_REG_5, (__force __s32)GFP_KERNEL);
 			else
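The comment fix from S.pop() to S.peek() reflects how a DFS that looks for back edges has to work: the node on top of the stack stays there while its out-edges are explored one at a time, and it is popped only once all of them are done. The C sketch below shows that peek-then-pop pattern on a small adjacency matrix, using a per-node edge cursor and DISCOVERED/EXPLORED states to report a back edge (a cycle reachable from node 0). The graph representation and all names are hypothetical; this is not the verifier's check_cfg() implementation.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 8

enum { UNSEEN, DISCOVERED, EXPLORED };

/* Iterative DFS from node 0. Returns true if an edge leads to a node that
 * is still on the stack (DISCOVERED), i.e. a back edge / cycle. */
static bool dfs_has_back_edge(int n, bool adj[][MAX_NODES])
{
	int stack[MAX_NODES], top = 0;
	int state[MAX_NODES] = { 0 };      /* all UNSEEN */
	int next_edge[MAX_NODES] = { 0 };  /* per-node edge cursor */

	assert(n > 0 && n <= MAX_NODES);
	stack[top++] = 0;
	state[0] = DISCOVERED;

	while (top > 0) {
		int t = stack[top - 1];    /* peek: t stays on the stack */
		bool pushed = false;

		while (next_edge[t] < n) {
			int w = next_edge[t]++;

			if (!adj[t][w])
				continue;
			if (state[w] == DISCOVERED)
				return true;        /* back edge */
			if (state[w] == UNSEEN) {
				state[w] = DISCOVERED;
				stack[top++] = w;   /* descend; resume t later */
				pushed = true;
				break;
			}
			/* EXPLORED: forward or cross edge, nothing to do */
		}
		if (!pushed) {
			state[t] = EXPLORED;        /* all edges of t done */
			top--;                      /* only now pop t */
		}
	}
	return false;
}
```

Popping t as soon as it is visited would lose track of which nodes are still on the current path, which is exactly what the DISCOVERED state is used to detect.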

+1

kernel/cgroup/cgroup.c

···
 		atomic_dec(&cgrp->root->nr_cgrps);
 		cgroup1_pidlist_destroy_all(cgrp);
 		cancel_work_sync(&cgrp->release_agent_work);
+		bpf_cgrp_storage_free(cgrp);

 		if (cgroup_parent(cgrp)) {
 			/*

-2

kernel/module/kallsyms.c

··· 494 494 return ret; 495 495 } 496 496 497 - #ifdef CONFIG_LIVEPATCH 498 497 int module_kallsyms_on_each_symbol(int (*fn)(void *, const char *, 499 498 struct module *, unsigned long), 500 499 void *data) ··· 530 531 mutex_unlock(&module_mutex); 531 532 return ret; 532 533 } 533 - #endif /* CONFIG_LIVEPATCH */

+104 -3

kernel/trace/bpf_trace.c

··· 6 6 #include <linux/types.h> 7 7 #include <linux/slab.h> 8 8 #include <linux/bpf.h> 9 + #include <linux/bpf_verifier.h> 9 10 #include <linux/bpf_perf_event.h> 10 11 #include <linux/btf.h> 11 12 #include <linux/filter.h> ··· 1457 1456 return &bpf_get_current_cgroup_id_proto; 1458 1457 case BPF_FUNC_get_current_ancestor_cgroup_id: 1459 1458 return &bpf_get_current_ancestor_cgroup_id_proto; 1459 + case BPF_FUNC_cgrp_storage_get: 1460 + return &bpf_cgrp_storage_get_proto; 1461 + case BPF_FUNC_cgrp_storage_delete: 1462 + return &bpf_cgrp_storage_delete_proto; 1460 1463 #endif 1461 1464 case BPF_FUNC_send_signal: 1462 1465 return &bpf_send_signal_proto; ··· 1495 1490 case BPF_FUNC_this_cpu_ptr: 1496 1491 return &bpf_this_cpu_ptr_proto; 1497 1492 case BPF_FUNC_task_storage_get: 1493 + if (bpf_prog_check_recur(prog)) 1494 + return &bpf_task_storage_get_recur_proto; 1498 1495 return &bpf_task_storage_get_proto; 1499 1496 case BPF_FUNC_task_storage_delete: 1497 + if (bpf_prog_check_recur(prog)) 1498 + return &bpf_task_storage_delete_recur_proto; 1500 1499 return &bpf_task_storage_delete_proto; 1501 1500 case BPF_FUNC_for_each_map_elem: 1502 1501 return &bpf_for_each_map_elem_proto; ··· 2461 2452 unsigned long *addrs; 2462 2453 u64 *cookies; 2463 2454 u32 cnt; 2455 + u32 mods_cnt; 2456 + struct module **mods; 2464 2457 }; 2465 2458 2466 2459 struct bpf_kprobe_multi_run_ctx { ··· 2518 2507 return err; 2519 2508 } 2520 2509 2510 + static void kprobe_multi_put_modules(struct module **mods, u32 cnt) 2511 + { 2512 + u32 i; 2513 + 2514 + for (i = 0; i < cnt; i++) 2515 + module_put(mods[i]); 2516 + } 2517 + 2521 2518 static void free_user_syms(struct user_syms *us) 2522 2519 { 2523 2520 kvfree(us->syms); ··· 2538 2519 2539 2520 kmulti_link = container_of(link, struct bpf_kprobe_multi_link, link); 2540 2521 unregister_fprobe(&kmulti_link->fp); 2522 + kprobe_multi_put_modules(kmulti_link->mods, kmulti_link->mods_cnt); 2541 2523 } 2542 2524 2543 2525 static void 
bpf_kprobe_multi_link_dealloc(struct bpf_link *link) ··· 2548 2528 kmulti_link = container_of(link, struct bpf_kprobe_multi_link, link); 2549 2529 kvfree(kmulti_link->addrs); 2550 2530 kvfree(kmulti_link->cookies); 2531 + kfree(kmulti_link->mods); 2551 2532 kfree(kmulti_link); 2552 2533 } 2553 2534 ··· 2571 2550 swap(*cookie_a, *cookie_b); 2572 2551 } 2573 2552 2574 - static int __bpf_kprobe_multi_cookie_cmp(const void *a, const void *b) 2553 + static int bpf_kprobe_multi_addrs_cmp(const void *a, const void *b) 2575 2554 { 2576 2555 const unsigned long *addr_a = a, *addr_b = b; 2577 2556 ··· 2582 2561 2583 2562 static int bpf_kprobe_multi_cookie_cmp(const void *a, const void *b, const void *priv) 2584 2563 { 2585 - return __bpf_kprobe_multi_cookie_cmp(a, b); 2564 + return bpf_kprobe_multi_addrs_cmp(a, b); 2586 2565 } 2587 2566 2588 2567 static u64 bpf_kprobe_multi_cookie(struct bpf_run_ctx *ctx) ··· 2600 2579 return 0; 2601 2580 entry_ip = run_ctx->entry_ip; 2602 2581 addr = bsearch(&entry_ip, link->addrs, link->cnt, sizeof(entry_ip), 2603 - __bpf_kprobe_multi_cookie_cmp); 2582 + bpf_kprobe_multi_addrs_cmp); 2604 2583 if (!addr) 2605 2584 return 0; 2606 2585 cookie = link->cookies + (addr - link->addrs); ··· 2682 2661 cookie_b = data->cookies + (name_b - data->funcs); 2683 2662 swap(*cookie_a, *cookie_b); 2684 2663 } 2664 + } 2665 + 2666 + struct module_addr_args { 2667 + unsigned long *addrs; 2668 + u32 addrs_cnt; 2669 + struct module **mods; 2670 + int mods_cnt; 2671 + int mods_cap; 2672 + }; 2673 + 2674 + static int module_callback(void *data, const char *name, 2675 + struct module *mod, unsigned long addr) 2676 + { 2677 + struct module_addr_args *args = data; 2678 + struct module **mods; 2679 + 2680 + /* We iterate all modules symbols and for each we: 2681 + * - search for it in provided addresses array 2682 + * - if found we check if we already have the module pointer stored 2683 + * (we iterate modules sequentially, so we can check just the last 2684 + * 
module pointer) 2685 + * - take module reference and store it 2686 + */ 2687 + if (!bsearch(&addr, args->addrs, args->addrs_cnt, sizeof(addr), 2688 + bpf_kprobe_multi_addrs_cmp)) 2689 + return 0; 2690 + 2691 + if (args->mods && args->mods[args->mods_cnt - 1] == mod) 2692 + return 0; 2693 + 2694 + if (args->mods_cnt == args->mods_cap) { 2695 + args->mods_cap = max(16, args->mods_cap * 3 / 2); 2696 + mods = krealloc_array(args->mods, args->mods_cap, sizeof(*mods), GFP_KERNEL); 2697 + if (!mods) 2698 + return -ENOMEM; 2699 + args->mods = mods; 2700 + } 2701 + 2702 + if (!try_module_get(mod)) 2703 + return -EINVAL; 2704 + 2705 + args->mods[args->mods_cnt] = mod; 2706 + args->mods_cnt++; 2707 + return 0; 2708 + } 2709 + 2710 + static int get_modules_for_addrs(struct module ***mods, unsigned long *addrs, u32 addrs_cnt) 2711 + { 2712 + struct module_addr_args args = { 2713 + .addrs = addrs, 2714 + .addrs_cnt = addrs_cnt, 2715 + }; 2716 + int err; 2717 + 2718 + /* We return either err < 0 in case of error, ... */ 2719 + err = module_kallsyms_on_each_symbol(module_callback, &args); 2720 + if (err) { 2721 + kprobe_multi_put_modules(args.mods, args.mods_cnt); 2722 + kfree(args.mods); 2723 + return err; 2724 + } 2725 + 2726 + /* or number of modules found if everything is ok. */ 2727 + *mods = args.mods; 2728 + return args.mods_cnt; 2685 2729 } 2686 2730 2687 2731 int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog) ··· 2859 2773 bpf_kprobe_multi_cookie_cmp, 2860 2774 bpf_kprobe_multi_cookie_swap, 2861 2775 link); 2776 + } else { 2777 + /* 2778 + * We need to sort addrs array even if there are no cookies 2779 + * provided, to allow bsearch in get_modules_for_addrs. 
2780 + */ 2781 + sort(addrs, cnt, sizeof(*addrs), 2782 + bpf_kprobe_multi_addrs_cmp, NULL); 2862 2783 } 2784 + 2785 + err = get_modules_for_addrs(&link->mods, addrs, cnt); 2786 + if (err < 0) { 2787 + bpf_link_cleanup(&link_primer); 2788 + return err; 2789 + } 2790 + link->mods_cnt = err; 2863 2791 2864 2792 err = register_fprobe_ips(&link->fp, addrs, cnt); 2865 2793 if (err) { 2794 + kprobe_multi_put_modules(link->mods, link->mods_cnt); 2866 2795 bpf_link_cleanup(&link_primer); 2867 2796 return err; 2868 2797 }
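The new `module_callback()` walks every module symbol, looks each address up in the link's address array with `bsearch()`, and records the owning module once. This explains the new `else` branch that sorts `addrs` even when no cookies are given. The module references are dropped in `bpf_kprobe_multi_link_release()` and on the `register_fprobe_ips()` error path, so a module cannot go away while probes on its functions are attached. The following userspace sketch reproduces only the lookup, deduplication and growth logic. It makes several assumptions: modules are plain `int` ids, the `struct sym` entries are hypothetical stand-ins for kallsyms data, `realloc()` replaces `krealloc_array()`, and there is no reference counting (`try_module_get()` is omitted).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for one kallsyms entry: owning module id + address. */
struct sym {
	int mod;
	unsigned long addr;
};

struct mod_args {
	const unsigned long *addrs;	/* must be sorted for bsearch() */
	size_t addrs_cnt;
	int *mods;			/* module ids collected so far */
	int mods_cnt;
	int mods_cap;
};

static int addrs_cmp(const void *a, const void *b)
{
	const unsigned long *x = a, *y = b;

	if (*x == *y)
		return 0;
	return *x < *y ? -1 : 1;
}

/* Same shape as module_callback(): remember each module that owns at
 * least one requested address, storing every module only once. */
static int record_module(struct mod_args *args, const struct sym *s)
{
	int *mods;

	if (!bsearch(&s->addr, args->addrs, args->addrs_cnt,
		     sizeof(s->addr), addrs_cmp))
		return 0;

	/* Symbols arrive grouped by module, so comparing against the
	 * last stored entry is enough to skip duplicates. */
	if (args->mods_cnt && args->mods[args->mods_cnt - 1] == s->mod)
		return 0;

	if (args->mods_cnt == args->mods_cap) {
		/* Growth policy from the diff: at least 16, then 1.5x. */
		int cap = args->mods_cap * 3 / 2;

		if (cap < 16)
			cap = 16;
		mods = realloc(args->mods, cap * sizeof(*mods));
		if (!mods)
			return -1;
		args->mods = mods;
		args->mods_cap = cap;
	}
	assert(args->mods_cnt < args->mods_cap);
	args->mods[args->mods_cnt++] = s->mod;
	return 0;
}
```

Checking only the last stored module relies on the iterator visiting one module's symbols contiguously. If that ordering did not hold, duplicates could be recorded.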

+11 -5

kernel/trace/ftrace.c

··· 8267 8267 size_t found; 8268 8268 }; 8269 8269 8270 + /* This function gets called for all kernel and module symbols 8271 + * and returns 1 in case we resolved all the requested symbols, 8272 + * 0 otherwise. 8273 + */ 8270 8274 static int kallsyms_callback(void *data, const char *name, 8271 8275 struct module *mod, unsigned long addr) 8272 8276 { ··· 8313 8309 int ftrace_lookup_symbols(const char **sorted_syms, size_t cnt, unsigned long *addrs) 8314 8310 { 8315 8311 struct kallsyms_data args; 8316 - int err; 8312 + int found_all; 8317 8313 8318 8314 memset(addrs, 0, sizeof(*addrs) * cnt); 8319 8315 args.addrs = addrs; 8320 8316 args.syms = sorted_syms; 8321 8317 args.cnt = cnt; 8322 8318 args.found = 0; 8323 - err = kallsyms_on_each_symbol(kallsyms_callback, &args); 8324 - if (err < 0) 8325 - return err; 8326 - return args.found == args.cnt ? 0 : -ESRCH; 8319 + 8320 + found_all = kallsyms_on_each_symbol(kallsyms_callback, &args); 8321 + if (found_all) 8322 + return 0; 8323 + found_all = module_kallsyms_on_each_symbol(kallsyms_callback, &args); 8324 + return found_all ? 0 : -ESRCH; 8327 8325 } 8328 8326 8329 8327 #ifdef CONFIG_SYSCTL
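After this change, `ftrace_lookup_symbols()` runs two passes over the same `kallsyms_data`. The first pass covers the core kernel symbols. The second covers the module symbols, and it runs only if something is still missing. The counter `found` carries over between passes, and the callback returns 1 to stop the iteration once every requested name is resolved. Below is a minimal sketch of that two-pass structure over two hypothetical in-memory symbol tables. The `ftrace_location()` check from the real callback is omitted, and every name in the sketch is illustrative.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical symbol table entry, standing in for kallsyms data. */
struct sym {
	const char *name;
	unsigned long addr;
};

struct lookup_data {
	unsigned long *addrs;	/* output, parallel to syms */
	const char **syms;	/* sorted names to resolve */
	size_t cnt;
	size_t found;
};

static int name_cmp(const void *a, const void *b)
{
	return strcmp(*(const char * const *)a, *(const char * const *)b);
}

/* Returns 1 once every requested symbol is resolved, 0 otherwise. */
static int lookup_cb(struct lookup_data *d, const struct sym *s)
{
	const char **hit;
	size_t idx;

	hit = bsearch(&s->name, d->syms, d->cnt, sizeof(*d->syms), name_cmp);
	if (!hit)
		return 0;
	idx = hit - d->syms;
	assert(idx < d->cnt);
	if (d->addrs[idx])	/* keep only the first match */
		return 0;
	d->addrs[idx] = s->addr;
	d->found++;
	return d->found == d->cnt;
}

/* Like the kallsyms iterators: stop as soon as the callback returns non-zero. */
static int for_each_sym(const struct sym *tab, size_t n, struct lookup_data *d)
{
	for (size_t i = 0; i < n; i++) {
		int ret = lookup_cb(d, &tab[i]);

		if (ret)
			return ret;
	}
	return 0;
}

static int lookup_symbols(const struct sym *core, size_t ncore,
			  const struct sym *mods, size_t nmods,
			  const char **sorted_syms, size_t cnt,
			  unsigned long *addrs)
{
	struct lookup_data d = { addrs, sorted_syms, cnt, 0 };

	memset(addrs, 0, sizeof(*addrs) * cnt);
	if (for_each_sym(core, ncore, &d))
		return 0;	/* all found among core kernel symbols */
	/* Second pass over module symbols; d.found carries over. */
	return for_each_sym(mods, nmods, &d) ? 0 : -ESRCH;
}
```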

+3 -32

net/core/bpf_sk_storage.c

··· 48 48 /* Called by __sk_destruct() & bpf_sk_storage_clone() */ 49 49 void bpf_sk_storage_free(struct sock *sk) 50 50 { 51 - struct bpf_local_storage_elem *selem; 52 51 struct bpf_local_storage *sk_storage; 53 52 bool free_sk_storage = false; 54 - struct hlist_node *n; 55 53 56 54 rcu_read_lock(); 57 55 sk_storage = rcu_dereference(sk->sk_bpf_storage); ··· 58 60 return; 59 61 } 60 62 61 - /* Netiher the bpf_prog nor the bpf-map's syscall 62 - * could be modifying the sk_storage->list now. 63 - * Thus, no elem can be added-to or deleted-from the 64 - * sk_storage->list by the bpf_prog or by the bpf-map's syscall. 65 - * 66 - * It is racing with bpf_local_storage_map_free() alone 67 - * when unlinking elem from the sk_storage->list and 68 - * the map's bucket->list. 69 - */ 70 63 raw_spin_lock_bh(&sk_storage->lock); 71 - hlist_for_each_entry_safe(selem, n, &sk_storage->list, snode) { 72 - /* Always unlink from map before unlinking from 73 - * sk_storage. 74 - */ 75 - bpf_selem_unlink_map(selem); 76 - free_sk_storage = bpf_selem_unlink_storage_nolock( 77 - sk_storage, selem, true, false); 78 - } 64 + free_sk_storage = bpf_local_storage_unlink_nolock(sk_storage); 79 65 raw_spin_unlock_bh(&sk_storage->lock); 80 66 rcu_read_unlock(); 81 67 ··· 69 87 70 88 static void bpf_sk_storage_map_free(struct bpf_map *map) 71 89 { 72 - struct bpf_local_storage_map *smap; 73 - 74 - smap = (struct bpf_local_storage_map *)map; 75 - bpf_local_storage_cache_idx_free(&sk_cache, smap->cache_idx); 76 - bpf_local_storage_map_free(smap, NULL); 90 + bpf_local_storage_map_free(map, &sk_cache, NULL); 77 91 } 78 92 79 93 static struct bpf_map *bpf_sk_storage_map_alloc(union bpf_attr *attr) 80 94 { 81 - struct bpf_local_storage_map *smap; 82 - 83 - smap = bpf_local_storage_map_alloc(attr); 84 - if (IS_ERR(smap)) 85 - return ERR_CAST(smap); 86 - 87 - smap->cache_idx = bpf_local_storage_cache_idx_get(&sk_cache); 88 - return &smap->map; 95 + return bpf_local_storage_map_alloc(attr, &sk_cache); 89 
96 } 90 97 91 98 static int notsupp_get_next_key(struct bpf_map *map, void *key,

+3 -3

samples/bpf/README.rst

··· 37 37 38 38 make headers_install 39 39 40 - This will creates a local "usr/include" directory in the git/build top 41 - level directory, that the make system automatically pickup first. 40 + This will create a local "usr/include" directory in the git/build top 41 + level directory, that the make system will automatically pick up first. 42 42 43 43 Compiling 44 44 ========= ··· 87 87 ----------------------- 88 88 In order to cross-compile, say for arm64 targets, export CROSS_COMPILE and ARCH 89 89 environment variables before calling make. But do this before clean, 90 - cofiguration and header install steps described above. This will direct make to 90 + configuration and header install steps described above. This will direct make to 91 91 build samples for the cross target:: 92 92 93 93 export ARCH=arm64

+1 -1

samples/bpf/hbm_edt_kern.c

··· 35 35 * 36 36 * If the credit is below the drop threshold, the packet is dropped. If it 37 37 * is a TCP packet, then it also calls tcp_cwr since packets dropped by 38 - * by a cgroup skb BPF program do not automatically trigger a call to 38 + * a cgroup skb BPF program do not automatically trigger a call to 39 39 * tcp_cwr in the current kernel code. 40 40 * 41 41 * This BPF program actually uses 2 drop thresholds, one threshold

+1 -1

samples/bpf/xdp1_user.c

··· 51 51 52 52 sleep(interval); 53 53 54 - while (bpf_map_get_next_key(map_fd, &key, &key) != -1) { 54 + while (bpf_map_get_next_key(map_fd, &key, &key) == 0) { 55 55 __u64 sum = 0; 56 56 57 57 assert(bpf_map_lookup_elem(map_fd, &key, values) == 0);
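The loop condition in `xdp1_user.c` now tests for success instead of a specific failure value. This assumes libbpf 1.0 error conventions: the low-level wrappers such as `bpf_map_get_next_key()` return a negative errno (for example `-ENOENT` after the last key) rather than -1. Under that convention, `!= -1` would never see the end of the map. The sketch below shows the termination logic with a hypothetical `mock_get_next_key()` over a fixed key array. It only imitates the return-value contract and is not the libbpf implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical map contents; stands in for a real BPF map. */
static const unsigned int keys[] = { 6, 17, 1 };
#define NKEYS (sizeof(keys) / sizeof(keys[0]))

/* Mimics the assumed contract: 0 on success, -ENOENT after the last key.
 * A key that is not in the map yields the first key. */
static int mock_get_next_key(const unsigned int *key, unsigned int *next)
{
	size_t i;

	for (i = 0; i < NKEYS; i++)
		if (keys[i] == *key)
			break;
	if (i == NKEYS) {		/* unknown key: start from the first */
		*next = keys[0];
		return 0;
	}
	if (i + 1 == NKEYS)
		return -ENOENT;		/* no next key */
	*next = keys[i + 1];
	return 0;
}

/* Visits every key once by testing for success, as the fixed loop does. */
static int count_keys(void)
{
	unsigned int key = (unsigned int)-1;	/* not a valid key */
	int n = 0;

	while (mock_get_next_key(&key, &key) == 0)
		n++;
	return n;
}
```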

+4

samples/bpf/xdp2_kern.c

··· 112 112 113 113 if (ipproto == IPPROTO_UDP) { 114 114 swap_src_dst_mac(data); 115 + 116 + if (bpf_xdp_store_bytes(ctx, 0, pkt, sizeof(pkt))) 117 + return rc; 118 + 115 119 rc = XDP_TX; 116 120 } 117 121

+2

scripts/bpf_doc.py

··· 685 685 'struct udp6_sock', 686 686 'struct unix_sock', 687 687 'struct task_struct', 688 + 'struct cgroup', 688 689 689 690 'struct __sk_buff', 690 691 'struct sk_msg_md', ··· 743 742 'struct udp6_sock', 744 743 'struct unix_sock', 745 744 'struct task_struct', 745 + 'struct cgroup', 746 746 'struct path', 747 747 'struct btf_ptr', 748 748 'struct inode',

+1 -1

tools/bpf/bpftool/Documentation/bpftool-map.rst

··· 55 55 | | **devmap** | **devmap_hash** | **sockmap** | **cpumap** | **xskmap** | **sockhash** 56 56 | | **cgroup_storage** | **reuseport_sockarray** | **percpu_cgroup_storage** 57 57 | | **queue** | **stack** | **sk_storage** | **struct_ops** | **ringbuf** | **inode_storage** 58 - | | **task_storage** | **bloom_filter** | **user_ringbuf** } 58 + | | **task_storage** | **bloom_filter** | **user_ringbuf** | **cgrp_storage** } 59 59 60 60 DESCRIPTION 61 61 ===========

+13 -2

tools/bpf/bpftool/Documentation/bpftool-prog.rst

··· 31 31 | **bpftool** **prog dump xlated** *PROG* [{**file** *FILE* | **opcodes** | **visual** | **linum**}] 32 32 | **bpftool** **prog dump jited** *PROG* [{**file** *FILE* | **opcodes** | **linum**}] 33 33 | **bpftool** **prog pin** *PROG* *FILE* 34 - | **bpftool** **prog** { **load** | **loadall** } *OBJ* *PATH* [**type** *TYPE*] [**map** {**idx** *IDX* | **name** *NAME*} *MAP*] [**dev** *NAME*] [**pinmaps** *MAP_DIR*] 34 + | **bpftool** **prog** { **load** | **loadall** } *OBJ* *PATH* [**type** *TYPE*] [**map** {**idx** *IDX* | **name** *NAME*} *MAP*] [**dev** *NAME*] [**pinmaps** *MAP_DIR*] [**autoattach**] 35 35 | **bpftool** **prog attach** *PROG* *ATTACH_TYPE* [*MAP*] 36 36 | **bpftool** **prog detach** *PROG* *ATTACH_TYPE* [*MAP*] 37 37 | **bpftool** **prog tracelog** ··· 131 131 contain a dot character ('.'), which is reserved for future 132 132 extensions of *bpffs*. 133 133 134 - **bpftool prog { load | loadall }** *OBJ* *PATH* [**type** *TYPE*] [**map** {**idx** *IDX* | **name** *NAME*} *MAP*] [**dev** *NAME*] [**pinmaps** *MAP_DIR*] 134 + **bpftool prog { load | loadall }** *OBJ* *PATH* [**type** *TYPE*] [**map** {**idx** *IDX* | **name** *NAME*} *MAP*] [**dev** *NAME*] [**pinmaps** *MAP_DIR*] [**autoattach**] 135 135 Load bpf program(s) from binary *OBJ* and pin as *PATH*. 136 136 **bpftool prog load** pins only the first program from the 137 137 *OBJ* as *PATH*. **bpftool prog loadall** pins all programs ··· 149 149 given networking device (offload). 150 150 Optional **pinmaps** argument can be provided to pin all 151 151 maps under *MAP_DIR* directory. 152 + 153 + If **autoattach** is specified program will be attached 154 + before pin. In that case, only the link (representing the 155 + program attached to its hook) is pinned, not the program as 156 + such, so the path won't show in **bpftool prog show -f**, 157 + only show in **bpftool link show -f**. 
Also, this only works 158 + when bpftool (libbpf) is able to infer all necessary 159 + information from the object file, in particular, it's not 160 + supported for all program types. If a program does not 161 + support autoattach, bpftool falls back to regular pinning 162 + for that program instead. 152 163 153 164 Note: *PATH* must be located in *bpffs* mount. It must not 154 165 contain a dot character ('.'), which is reserved for future

+4 -4

tools/bpf/bpftool/Documentation/common_options.rst

··· 7 7 Print bpftool's version number (similar to **bpftool version**), the 8 8 number of the libbpf version in use, and optional features that were 9 9 included when bpftool was compiled. Optional features include linking 10 - against libbfd to provide the disassembler for JIT-ted programs 11 - (**bpftool prog dump jited**) and usage of BPF skeletons (some 12 - features like **bpftool prog profile** or showing pids associated to 13 - BPF objects may rely on it). 10 + against LLVM or libbfd to provide the disassembler for JIT-ted 11 + programs (**bpftool prog dump jited**) and usage of BPF skeletons 12 + (some features like **bpftool prog profile** or showing pids 13 + associated to BPF objects may rely on it). 14 14 15 15 -j, --json 16 16 Generate JSON output. For commands that cannot produce JSON, this

+48 -24

tools/bpf/bpftool/Makefile

··· 93 93 RM ?= rm -f 94 94 95 95 FEATURE_USER = .bpftool 96 - FEATURE_TESTS = libbfd libbfd-liberty libbfd-liberty-z \ 97 - disassembler-four-args disassembler-init-styled libcap \ 98 - clang-bpf-co-re 99 - FEATURE_DISPLAY = libbfd libbfd-liberty libbfd-liberty-z \ 100 - libcap clang-bpf-co-re 96 + 97 + FEATURE_TESTS := clang-bpf-co-re 98 + FEATURE_TESTS += llvm 99 + FEATURE_TESTS += libcap 100 + FEATURE_TESTS += libbfd 101 + FEATURE_TESTS += libbfd-liberty 102 + FEATURE_TESTS += libbfd-liberty-z 103 + FEATURE_TESTS += disassembler-four-args 104 + FEATURE_TESTS += disassembler-init-styled 105 + 106 + FEATURE_DISPLAY := clang-bpf-co-re 107 + FEATURE_DISPLAY += llvm 108 + FEATURE_DISPLAY += libcap 109 + FEATURE_DISPLAY += libbfd 110 + FEATURE_DISPLAY += libbfd-liberty 111 + FEATURE_DISPLAY += libbfd-liberty-z 101 112 102 113 check_feat := 1 103 114 NON_CHECK_FEAT_TARGETS := clean uninstall doc doc-clean doc-install doc-uninstall ··· 126 115 endif 127 116 endif 128 117 129 - ifeq ($(feature-disassembler-four-args), 1) 130 - CFLAGS += -DDISASM_FOUR_ARGS_SIGNATURE 131 - endif 132 - ifeq ($(feature-disassembler-init-styled), 1) 133 - CFLAGS += -DDISASM_INIT_STYLED 134 - endif 135 - 136 118 LIBS = $(LIBBPF) -lelf -lz 137 119 LIBS_BOOTSTRAP = $(LIBBPF_BOOTSTRAP) -lelf -lz 138 120 ifeq ($(feature-libcap), 1) ··· 137 133 138 134 all: $(OUTPUT)bpftool 139 135 140 - BFD_SRCS = jit_disasm.c 136 + SRCS := $(wildcard *.c) 141 137 142 - SRCS = $(filter-out $(BFD_SRCS),$(wildcard *.c)) 138 + ifeq ($(feature-llvm),1) 139 + # If LLVM is available, use it for JIT disassembly 140 + CFLAGS += -DHAVE_LLVM_SUPPORT 141 + LLVM_CONFIG_LIB_COMPONENTS := mcdisassembler all-targets 142 + CFLAGS += $(shell $(LLVM_CONFIG) --cflags --libs $(LLVM_CONFIG_LIB_COMPONENTS)) 143 + LIBS += $(shell $(LLVM_CONFIG) --libs $(LLVM_CONFIG_LIB_COMPONENTS)) 144 + LDFLAGS += $(shell $(LLVM_CONFIG) --ldflags) 145 + else 146 + # Fall back on libbfd 147 + ifeq ($(feature-libbfd),1) 148 + LIBS += -lbfd -ldl 
-lopcodes 149 + else ifeq ($(feature-libbfd-liberty),1) 150 + LIBS += -lbfd -ldl -lopcodes -liberty 151 + else ifeq ($(feature-libbfd-liberty-z),1) 152 + LIBS += -lbfd -ldl -lopcodes -liberty -lz 153 + endif 143 154 144 - ifeq ($(feature-libbfd),1) 145 - LIBS += -lbfd -ldl -lopcodes 146 - else ifeq ($(feature-libbfd-liberty),1) 147 - LIBS += -lbfd -ldl -lopcodes -liberty 148 - else ifeq ($(feature-libbfd-liberty-z),1) 149 - LIBS += -lbfd -ldl -lopcodes -liberty -lz 155 + # If one of the above feature combinations is set, we support libbfd 156 + ifneq ($(filter -lbfd,$(LIBS)),) 157 + CFLAGS += -DHAVE_LIBBFD_SUPPORT 158 + 159 + # Libbfd interface changed over time, figure out what we need 160 + ifeq ($(feature-disassembler-four-args), 1) 161 + CFLAGS += -DDISASM_FOUR_ARGS_SIGNATURE 162 + endif 163 + ifeq ($(feature-disassembler-init-styled), 1) 164 + CFLAGS += -DDISASM_INIT_STYLED 165 + endif 166 + endif 150 167 endif 151 - 152 - ifneq ($(filter -lbfd,$(LIBS)),) 153 - CFLAGS += -DHAVE_LIBBFD_SUPPORT 154 - SRCS += $(BFD_SRCS) 168 + ifeq ($(filter -DHAVE_LLVM_SUPPORT -DHAVE_LIBBFD_SUPPORT,$(CFLAGS)),) 169 + # No support for JIT disassembly 170 + SRCS := $(filter-out jit_disasm.c,$(SRCS)) 155 171 endif 156 172 157 173 HOST_CFLAGS = $(subst -I$(LIBBPF_INCLUDE),-I$(LIBBPF_BOOTSTRAP_INCLUDE),\

+1

tools/bpf/bpftool/bash-completion/bpftool

··· 505 505 _bpftool_once_attr 'type' 506 506 _bpftool_once_attr 'dev' 507 507 _bpftool_once_attr 'pinmaps' 508 + _bpftool_once_attr 'autoattach' 508 509 return 0 509 510 ;; 510 511 esac

+8 -4

tools/bpf/bpftool/common.c

··· 1 1 // SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) 2 2 /* Copyright (C) 2017-2018 Netronome Systems, Inc. */ 3 3 4 + #ifndef _GNU_SOURCE 4 5 #define _GNU_SOURCE 6 + #endif 5 7 #include <ctype.h> 6 8 #include <errno.h> 7 9 #include <fcntl.h> ··· 627 625 } 628 626 629 627 const char * 630 - ifindex_to_bfd_params(__u32 ifindex, __u64 ns_dev, __u64 ns_ino, 631 - const char **opt) 628 + ifindex_to_arch(__u32 ifindex, __u64 ns_dev, __u64 ns_ino, const char **opt) 632 629 { 630 + __maybe_unused int device_id; 633 631 char devname[IF_NAMESIZE]; 634 632 int vendor_id; 635 - int device_id; 636 633 637 634 if (!ifindex_to_name_ns(ifindex, ns_dev, ns_ino, devname)) { 638 635 p_err("Can't get net device name for ifindex %d: %s", ifindex, ··· 646 645 } 647 646 648 647 switch (vendor_id) { 648 + #ifdef HAVE_LIBBFD_SUPPORT 649 649 case 0x19ee: 650 650 device_id = read_sysfs_netdev_hex_int(devname, "device"); 651 651 if (device_id != 0x4000 && ··· 655 653 p_info("Unknown NFP device ID, assuming it is NFP-6xxx arch"); 656 654 *opt = "ctx4"; 657 655 return "NFP-6xxx"; 656 + #endif /* HAVE_LIBBFD_SUPPORT */ 657 + /* No NFP support in LLVM, we have no valid triple to return. */ 658 658 default: 659 - p_err("Can't get bfd arch name for device vendor id 0x%04x", 659 + p_err("Can't get arch name for device vendor id 0x%04x", 660 660 vendor_id); 661 661 return NULL; 662 662 }

+2

tools/bpf/bpftool/iter.c

··· 1 1 // SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) 2 2 // Copyright (C) 2020 Facebook 3 3 4 + #ifndef _GNU_SOURCE 4 5 #define _GNU_SOURCE 6 + #endif 5 7 #include <unistd.h> 6 8 #include <linux/err.h> 7 9 #include <bpf/libbpf.h>

+215 -46

tools/bpf/bpftool/jit_disasm.c

··· 11 11 * Licensed under the GNU General Public License, version 2.0 (GPLv2) 12 12 */ 13 13 14 + #ifndef _GNU_SOURCE 14 15 #define _GNU_SOURCE 16 + #endif 15 17 #include <stdio.h> 16 18 #include <stdarg.h> 17 19 #include <stdint.h> 18 20 #include <stdlib.h> 19 - #include <assert.h> 20 21 #include <unistd.h> 21 22 #include <string.h> 22 - #include <bfd.h> 23 - #include <dis-asm.h> 24 23 #include <sys/stat.h> 25 24 #include <limits.h> 26 25 #include <bpf/libbpf.h> 26 + 27 + #ifdef HAVE_LLVM_SUPPORT 28 + #include <llvm-c/Core.h> 29 + #include <llvm-c/Disassembler.h> 30 + #include <llvm-c/Target.h> 31 + #include <llvm-c/TargetMachine.h> 32 + #endif 33 + 34 + #ifdef HAVE_LIBBFD_SUPPORT 35 + #include <bfd.h> 36 + #include <dis-asm.h> 27 37 #include <tools/dis-asm-compat.h> 38 + #endif 28 39 29 40 #include "json_writer.h" 30 41 #include "main.h" 31 42 32 - static void get_exec_path(char *tpath, size_t size) 43 + static int oper_count; 44 + 45 + #ifdef HAVE_LLVM_SUPPORT 46 + #define DISASM_SPACER 47 + 48 + typedef LLVMDisasmContextRef disasm_ctx_t; 49 + 50 + static int printf_json(char *s) 51 + { 52 + s = strtok(s, " \t"); 53 + jsonw_string_field(json_wtr, "operation", s); 54 + 55 + jsonw_name(json_wtr, "operands"); 56 + jsonw_start_array(json_wtr); 57 + oper_count = 1; 58 + 59 + while ((s = strtok(NULL, " \t,()")) != 0) { 60 + jsonw_string(json_wtr, s); 61 + oper_count++; 62 + } 63 + return 0; 64 + } 65 + 66 + /* This callback to set the ref_type is necessary to have the LLVM disassembler 67 + * print PC-relative addresses instead of byte offsets for branch instruction 68 + * targets. 
69 + */ 70 + static const char * 71 + symbol_lookup_callback(__maybe_unused void *disasm_info, 72 + __maybe_unused uint64_t ref_value, 73 + uint64_t *ref_type, __maybe_unused uint64_t ref_PC, 74 + __maybe_unused const char **ref_name) 75 + { 76 + *ref_type = LLVMDisassembler_ReferenceType_InOut_None; 77 + return NULL; 78 + } 79 + 80 + static int 81 + init_context(disasm_ctx_t *ctx, const char *arch, 82 + __maybe_unused const char *disassembler_options, 83 + __maybe_unused unsigned char *image, __maybe_unused ssize_t len) 84 + { 85 + char *triple; 86 + 87 + if (arch) 88 + triple = LLVMNormalizeTargetTriple(arch); 89 + else 90 + triple = LLVMGetDefaultTargetTriple(); 91 + if (!triple) { 92 + p_err("Failed to retrieve triple"); 93 + return -1; 94 + } 95 + *ctx = LLVMCreateDisasm(triple, NULL, 0, NULL, symbol_lookup_callback); 96 + LLVMDisposeMessage(triple); 97 + 98 + if (!*ctx) { 99 + p_err("Failed to create disassembler"); 100 + return -1; 101 + } 102 + 103 + return 0; 104 + } 105 + 106 + static void destroy_context(disasm_ctx_t *ctx) 107 + { 108 + LLVMDisposeMessage(*ctx); 109 + } 110 + 111 + static int 112 + disassemble_insn(disasm_ctx_t *ctx, unsigned char *image, ssize_t len, int pc) 113 + { 114 + char buf[256]; 115 + int count; 116 + 117 + count = LLVMDisasmInstruction(*ctx, image + pc, len - pc, pc, 118 + buf, sizeof(buf)); 119 + if (json_output) 120 + printf_json(buf); 121 + else 122 + printf("%s", buf); 123 + 124 + return count; 125 + } 126 + 127 + int disasm_init(void) 128 + { 129 + LLVMInitializeAllTargetInfos(); 130 + LLVMInitializeAllTargetMCs(); 131 + LLVMInitializeAllDisassemblers(); 132 + return 0; 133 + } 134 + #endif /* HAVE_LLVM_SUPPORT */ 135 + 136 + #ifdef HAVE_LIBBFD_SUPPORT 137 + #define DISASM_SPACER "\t" 138 + 139 + typedef struct { 140 + struct disassemble_info *info; 141 + disassembler_ftype disassemble; 142 + bfd *bfdf; 143 + } disasm_ctx_t; 144 + 145 + static int get_exec_path(char *tpath, size_t size) 33 146 { 34 147 const char *path = 
"/proc/self/exe"; 35 148 ssize_t len; 36 149 37 150 len = readlink(path, tpath, size - 1); 38 - assert(len > 0); 151 + if (len <= 0) 152 + return -1; 153 + 39 154 tpath[len] = 0; 155 + 156 + return 0; 40 157 } 41 158 42 - static int oper_count; 43 159 static int printf_json(void *out, const char *fmt, va_list ap) 44 160 { 45 161 char *s; ··· 213 97 return r; 214 98 } 215 99 216 - void disasm_print_insn(unsigned char *image, ssize_t len, int opcodes, 217 - const char *arch, const char *disassembler_options, 218 - const struct btf *btf, 219 - const struct bpf_prog_linfo *prog_linfo, 220 - __u64 func_ksym, unsigned int func_idx, 221 - bool linum) 100 + static int init_context(disasm_ctx_t *ctx, const char *arch, 101 + const char *disassembler_options, 102 + unsigned char *image, ssize_t len) 222 103 { 223 - const struct bpf_line_info *linfo = NULL; 224 - disassembler_ftype disassemble; 225 - struct disassemble_info info; 226 - unsigned int nr_skip = 0; 227 - int count, i, pc = 0; 104 + struct disassemble_info *info; 228 105 char tpath[PATH_MAX]; 229 106 bfd *bfdf; 230 107 231 - if (!len) 232 - return; 233 - 234 108 memset(tpath, 0, sizeof(tpath)); 235 - get_exec_path(tpath, sizeof(tpath)); 109 + if (get_exec_path(tpath, sizeof(tpath))) { 110 + p_err("failed to create disassembler (get_exec_path)"); 111 + return -1; 112 + } 236 113 237 - bfdf = bfd_openr(tpath, NULL); 238 - assert(bfdf); 239 - assert(bfd_check_format(bfdf, bfd_object)); 114 + ctx->bfdf = bfd_openr(tpath, NULL); 115 + if (!ctx->bfdf) { 116 + p_err("failed to create disassembler (bfd_openr)"); 117 + return -1; 118 + } 119 + if (!bfd_check_format(ctx->bfdf, bfd_object)) { 120 + p_err("failed to create disassembler (bfd_check_format)"); 121 + goto err_close; 122 + } 123 + bfdf = ctx->bfdf; 124 + 125 + ctx->info = malloc(sizeof(struct disassemble_info)); 126 + if (!ctx->info) { 127 + p_err("mem alloc failed"); 128 + goto err_close; 129 + } 130 + info = ctx->info; 240 131 241 132 if (json_output) 242 - 
init_disassemble_info_compat(&info, stdout, 133 + init_disassemble_info_compat(info, stdout, 243 134 (fprintf_ftype) fprintf_json, 244 135 fprintf_json_styled); 245 136 else 246 - init_disassemble_info_compat(&info, stdout, 137 + init_disassemble_info_compat(info, stdout, 247 138 (fprintf_ftype) fprintf, 248 139 fprintf_styled); 249 140 ··· 262 139 bfdf->arch_info = inf; 263 140 } else { 264 141 p_err("No libbfd support for %s", arch); 265 - return; 142 + goto err_free; 266 143 } 267 144 } 268 145 269 - info.arch = bfd_get_arch(bfdf); 270 - info.mach = bfd_get_mach(bfdf); 146 + info->arch = bfd_get_arch(bfdf); 147 + info->mach = bfd_get_mach(bfdf); 271 148 if (disassembler_options) 272 - info.disassembler_options = disassembler_options; 273 - info.buffer = image; 274 - info.buffer_length = len; 149 + info->disassembler_options = disassembler_options; 150 + info->buffer = image; 151 + info->buffer_length = len; 275 152 276 - disassemble_init_for_target(&info); 153 + disassemble_init_for_target(info); 277 154 278 155 #ifdef DISASM_FOUR_ARGS_SIGNATURE 279 - disassemble = disassembler(info.arch, 280 - bfd_big_endian(bfdf), 281 - info.mach, 282 - bfdf); 156 + ctx->disassemble = disassembler(info->arch, 157 + bfd_big_endian(bfdf), 158 + info->mach, 159 + bfdf); 283 160 #else 284 - disassemble = disassembler(bfdf); 161 + ctx->disassemble = disassembler(bfdf); 285 162 #endif 286 - assert(disassemble); 163 + if (!ctx->disassemble) { 164 + p_err("failed to create disassembler"); 165 + goto err_free; 166 + } 167 + return 0; 168 + 169 + err_free: 170 + free(info); 171 + err_close: 172 + bfd_close(ctx->bfdf); 173 + return -1; 174 + } 175 + 176 + static void destroy_context(disasm_ctx_t *ctx) 177 + { 178 + free(ctx->info); 179 + bfd_close(ctx->bfdf); 180 + } 181 + 182 + static int 183 + disassemble_insn(disasm_ctx_t *ctx, __maybe_unused unsigned char *image, 184 + __maybe_unused ssize_t len, int pc) 185 + { 186 + return ctx->disassemble(pc, ctx->info); 187 + } 188 + 189 + int 
disasm_init(void) 190 + { 191 + bfd_init(); 192 + return 0; 193 + } 194 + #endif /* HAVE_LIBBPFD_SUPPORT */ 195 + 196 + int disasm_print_insn(unsigned char *image, ssize_t len, int opcodes, 197 + const char *arch, const char *disassembler_options, 198 + const struct btf *btf, 199 + const struct bpf_prog_linfo *prog_linfo, 200 + __u64 func_ksym, unsigned int func_idx, 201 + bool linum) 202 + { 203 + const struct bpf_line_info *linfo = NULL; 204 + unsigned int nr_skip = 0; 205 + int count, i, pc = 0; 206 + disasm_ctx_t ctx; 207 + 208 + if (!len) 209 + return -1; 210 + 211 + if (init_context(&ctx, arch, disassembler_options, image, len)) 212 + return -1; 287 213 288 214 if (json_output) 289 215 jsonw_start_array(json_wtr); ··· 357 185 if (linfo) 358 186 btf_dump_linfo_plain(btf, linfo, "; ", 359 187 linum); 360 - printf("%4x:\t", pc); 188 + printf("%4x:" DISASM_SPACER, pc); 361 189 } 362 190 363 - count = disassemble(pc, &info); 191 + count = disassemble_insn(&ctx, image, len, pc); 192 + 364 193 if (json_output) { 365 194 /* Operand array, was started in fprintf_json. Before 366 195 * that, make sure we have a _null_ value if no operand ··· 397 224 if (json_output) 398 225 jsonw_end_array(json_wtr); 399 226 400 - bfd_close(bfdf); 401 - } 227 + destroy_context(&ctx); 402 228 403 - int disasm_init(void) 404 - { 405 - bfd_init(); 406 229 return 0; 407 230 }
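In the LLVM backend, JSON output is built by splitting the disassembled text. `printf_json()` takes the first token, split on spaces and tabs, as the `"operation"`. It then takes every remaining token, split on spaces, tabs, `,`, `(` and `)`, as an entry of `"operands"`. A side effect is that a memory operand such as `8(%rsp)` becomes two operands. The sketch below reproduces that tokenization into a plain struct instead of the JSON writer. The struct, the helper name, and the sample instruction string are hypothetical; the exact text LLVM emits may differ.

```c
#include <assert.h>
#include <string.h>

#define MAX_OPERANDS 8

struct insn_parts {
	const char *operation;
	const char *operands[MAX_OPERANDS];
	int nr_operands;
};

/* Splits a disassembled instruction in place with the same delimiters as
 * the LLVM printf_json(). Like the original, it relies on strtok(), so it
 * modifies `buf` and is not reentrant. */
static void split_insn(char *buf, struct insn_parts *p)
{
	char *s;

	p->nr_operands = 0;
	p->operation = strtok(buf, " \t");
	while ((s = strtok(NULL, " \t,()")) != NULL &&
	       p->nr_operands < MAX_OPERANDS)
		p->operands[p->nr_operands++] = s;
}
```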

+57 -33

tools/bpf/bpftool/main.c

··· 71 71 return 0; 72 72 } 73 73 74 + static int do_batch(int argc, char **argv); 75 + static int do_version(int argc, char **argv); 76 + 77 + static const struct cmd commands[] = { 78 + { "help", do_help }, 79 + { "batch", do_batch }, 80 + { "prog", do_prog }, 81 + { "map", do_map }, 82 + { "link", do_link }, 83 + { "cgroup", do_cgroup }, 84 + { "perf", do_perf }, 85 + { "net", do_net }, 86 + { "feature", do_feature }, 87 + { "btf", do_btf }, 88 + { "gen", do_gen }, 89 + { "struct_ops", do_struct_ops }, 90 + { "iter", do_iter }, 91 + { "version", do_version }, 92 + { 0 } 93 + }; 94 + 74 95 #ifndef BPFTOOL_VERSION 75 96 /* bpftool's major and minor version numbers are aligned on libbpf's. There is 76 97 * an offset of 6 for the version number, because bpftool's version was higher ··· 103 82 #define BPFTOOL_PATCH_VERSION 0 104 83 #endif 105 84 85 + static void 86 + print_feature(const char *feature, bool state, unsigned int *nb_features) 87 + { 88 + if (state) { 89 + printf("%s %s", *nb_features ? "," : "", feature); 90 + *nb_features = *nb_features + 1; 91 + } 92 + } 93 + 106 94 static int do_version(int argc, char **argv) 107 95 { 108 96 #ifdef HAVE_LIBBFD_SUPPORT ··· 119 89 #else 120 90 const bool has_libbfd = false; 121 91 #endif 92 + #ifdef HAVE_LLVM_SUPPORT 93 + const bool has_llvm = true; 94 + #else 95 + const bool has_llvm = false; 96 + #endif 122 97 #ifdef BPFTOOL_WITHOUT_SKELETONS 123 98 const bool has_skeletons = false; 124 99 #else 125 100 const bool has_skeletons = true; 126 101 #endif 102 + bool bootstrap = false; 103 + int i; 104 + 105 + for (i = 0; commands[i].cmd; i++) { 106 + if (!strcmp(commands[i].cmd, "prog")) { 107 + /* Assume we run a bootstrap version if "bpftool prog" 108 + * is not available. 
109 + */ 110 + bootstrap = !commands[i].func; 111 + break; 112 + } 113 + } 127 114 128 115 if (json_output) { 129 116 jsonw_start_object(json_wtr); /* root object */ ··· 159 112 jsonw_name(json_wtr, "features"); 160 113 jsonw_start_object(json_wtr); /* features */ 161 114 jsonw_bool_field(json_wtr, "libbfd", has_libbfd); 115 + jsonw_bool_field(json_wtr, "llvm", has_llvm); 162 116 jsonw_bool_field(json_wtr, "libbpf_strict", !legacy_libbpf); 163 117 jsonw_bool_field(json_wtr, "skeletons", has_skeletons); 118 + jsonw_bool_field(json_wtr, "bootstrap", bootstrap); 164 119 jsonw_end_object(json_wtr); /* features */ 165 120 166 121 jsonw_end_object(json_wtr); /* root object */ ··· 177 128 #endif 178 129 printf("using libbpf %s\n", libbpf_version_string()); 179 130 printf("features:"); 180 - if (has_libbfd) { 181 - printf(" libbfd"); 182 - nb_features++; 183 - } 184 - if (!legacy_libbpf) { 185 - printf("%s libbpf_strict", nb_features++ ? "," : ""); 186 - nb_features++; 187 - } 188 - if (has_skeletons) 189 - printf("%s skeletons", nb_features++ ? 
"," : ""); 131 + print_feature("libbfd", has_libbfd, &nb_features); 132 + print_feature("llvm", has_llvm, &nb_features); 133 + print_feature("libbpf_strict", !legacy_libbpf, &nb_features); 134 + print_feature("skeletons", has_skeletons, &nb_features); 135 + print_feature("bootstrap", bootstrap, &nb_features); 190 136 printf("\n"); 191 137 } 192 138 return 0; ··· 323 279 return n_argc; 324 280 } 325 281 326 - static int do_batch(int argc, char **argv); 327 - 328 - static const struct cmd cmds[] = { 329 - { "help", do_help }, 330 - { "batch", do_batch }, 331 - { "prog", do_prog }, 332 - { "map", do_map }, 333 - { "link", do_link }, 334 - { "cgroup", do_cgroup }, 335 - { "perf", do_perf }, 336 - { "net", do_net }, 337 - { "feature", do_feature }, 338 - { "btf", do_btf }, 339 - { "gen", do_gen }, 340 - { "struct_ops", do_struct_ops }, 341 - { "iter", do_iter }, 342 - { "version", do_version }, 343 - { 0 } 344 - }; 345 - 346 282 static int do_batch(int argc, char **argv) 347 283 { 348 284 char buf[BATCH_LINE_LEN_MAX], contline[BATCH_LINE_LEN_MAX]; ··· 410 386 jsonw_name(json_wtr, "output"); 411 387 } 412 388 413 - err = cmd_select(cmds, n_argc, n_argv, do_help); 389 + err = cmd_select(commands, n_argc, n_argv, do_help); 414 390 415 391 if (json_output) 416 392 jsonw_end_object(json_wtr); ··· 474 450 json_output = false; 475 451 show_pinned = false; 476 452 block_mount = false; 477 - bin_name = argv[0]; 453 + bin_name = "bpftool"; 478 454 479 455 opterr = 0; 480 456 while ((opt = getopt_long(argc, argv, "VhpjfLmndB:l", ··· 552 528 if (version_requested) 553 529 return do_version(argc, argv); 554 530 555 - ret = cmd_select(cmds, argc, argv, do_help); 531 + ret = cmd_select(commands, argc, argv, do_help); 556 532 557 533 if (json_output) 558 534 jsonw_destroy(&json_wtr);

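The new `print_feature()` helper puts the separator logic in one place. The removed inline code handled it separately for each feature, and in the `libbpf_strict` branch it incremented `nb_features` twice. The sketch below is a hypothetical buffer-based variant so the formatting can be checked without capturing stdout. `append_feature()` is an illustrative name, not a bpftool function.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical buffer-based variant of bpftool's print_feature():
 * appends " name" for the first enabled feature and ", name" for each
 * later one, counting enabled features in *nb_features. */
static void append_feature(char *buf, size_t buf_sz, const char *feature,
			   bool state, unsigned int *nb_features)
{
	size_t used = strlen(buf);

	if (!state || used >= buf_sz)
		return;
	snprintf(buf + used, buf_sz - used, "%s %s",
		 *nb_features ? "," : "", feature);
	*nb_features = *nb_features + 1;
}
```

For example, suppose only `llvm`, `libbpf_strict` and `skeletons` are enabled. Starting from `"features:"`, the buffer then ends up holding `features: llvm, libbpf_strict, skeletons`.
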
+16 -16

tools/bpf/bpftool/main.h

··· 172 172 int map_parse_fd_and_info(int *argc, char ***argv, void *info, __u32 *info_len); 173 173 174 174 struct bpf_prog_linfo; 175 - #ifdef HAVE_LIBBFD_SUPPORT 176 - void disasm_print_insn(unsigned char *image, ssize_t len, int opcodes, 177 - const char *arch, const char *disassembler_options, 178 - const struct btf *btf, 179 - const struct bpf_prog_linfo *prog_linfo, 180 - __u64 func_ksym, unsigned int func_idx, 181 - bool linum); 175 + #if defined(HAVE_LLVM_SUPPORT) || defined(HAVE_LIBBFD_SUPPORT) 176 + int disasm_print_insn(unsigned char *image, ssize_t len, int opcodes, 177 + const char *arch, const char *disassembler_options, 178 + const struct btf *btf, 179 + const struct bpf_prog_linfo *prog_linfo, 180 + __u64 func_ksym, unsigned int func_idx, 181 + bool linum); 182 182 int disasm_init(void); 183 183 #else 184 184 static inline 185 - void disasm_print_insn(unsigned char *image, ssize_t len, int opcodes, 186 - const char *arch, const char *disassembler_options, 187 - const struct btf *btf, 188 - const struct bpf_prog_linfo *prog_linfo, 189 - __u64 func_ksym, unsigned int func_idx, 190 - bool linum) 185 + int disasm_print_insn(unsigned char *image, ssize_t len, int opcodes, 186 + const char *arch, const char *disassembler_options, 187 + const struct btf *btf, 188 + const struct bpf_prog_linfo *prog_linfo, 189 + __u64 func_ksym, unsigned int func_idx, 190 + bool linum) 191 191 { 192 + return 0; 192 193 } 193 194 static inline int disasm_init(void) 194 195 { 195 - p_err("No libbfd support"); 196 + p_err("No JIT disassembly support"); 196 197 return -1; 197 198 } 198 199 #endif ··· 203 202 unsigned int get_page_size(void); 204 203 unsigned int get_possible_cpus(void); 205 204 const char * 206 - ifindex_to_bfd_params(__u32 ifindex, __u64 ns_dev, __u64 ns_ino, 207 - const char **opt); 205 + ifindex_to_arch(__u32 ifindex, __u64 ns_dev, __u64 ns_ino, const char **opt); 208 206 209 207 struct btf_dumper { 210 208 const struct btf *btf;

+1 -2

tools/bpf/bpftool/map.c

··· 1 1 // SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) 2 2 /* Copyright (C) 2017-2018 Netronome Systems, Inc. */ 3 3 4 - #include <assert.h> 5 4 #include <errno.h> 6 5 #include <fcntl.h> 7 6 #include <linux/err.h> ··· 1458 1459 " devmap | devmap_hash | sockmap | cpumap | xskmap | sockhash |\n" 1459 1460 " cgroup_storage | reuseport_sockarray | percpu_cgroup_storage |\n" 1460 1461 " queue | stack | sk_storage | struct_ops | ringbuf | inode_storage |\n" 1461 - " task_storage | bloom_filter | user_ringbuf }\n" 1462 + " task_storage | bloom_filter | user_ringbuf | cgrp_storage }\n" 1462 1463 " " HELP_SPEC_OPTIONS " |\n" 1463 1464 " {-f|--bpffs} | {-n|--nomount} }\n" 1464 1465 "",

+2

tools/bpf/bpftool/net.c

··· 1 1 // SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) 2 2 // Copyright (C) 2018 Facebook 3 3 4 + #ifndef _GNU_SOURCE 4 5 #define _GNU_SOURCE 6 + #endif 5 7 #include <errno.h> 6 8 #include <fcntl.h> 7 9 #include <stdlib.h>

+2

tools/bpf/bpftool/perf.c

··· 2 2 // Copyright (C) 2018 Facebook 3 3 // Author: Yonghong Song <yhs@fb.com> 4 4 5 + #ifndef _GNU_SOURCE 5 6 #define _GNU_SOURCE 7 + #endif 6 8 #include <ctype.h> 7 9 #include <errno.h> 8 10 #include <fcntl.h>

+87 -12

tools/bpf/bpftool/prog.c

··· 1 1 // SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) 2 2 /* Copyright (C) 2017-2018 Netronome Systems, Inc. */ 3 3 4 + #ifndef _GNU_SOURCE 4 5 #define _GNU_SOURCE 6 + #endif 5 7 #include <errno.h> 6 8 #include <fcntl.h> 7 9 #include <signal.h> ··· 764 762 const char *name = NULL; 765 763 766 764 if (info->ifindex) { 767 - name = ifindex_to_bfd_params(info->ifindex, 768 - info->netns_dev, 769 - info->netns_ino, 770 - &disasm_opt); 765 + name = ifindex_to_arch(info->ifindex, info->netns_dev, 766 + info->netns_ino, &disasm_opt); 771 767 if (!name) 772 768 goto exit_free; 773 769 } ··· 820 820 printf("%s:\n", sym_name); 821 821 } 822 822 823 - disasm_print_insn(img, lens[i], opcodes, 824 - name, disasm_opt, btf, 825 - prog_linfo, ksyms[i], i, 826 - linum); 823 + if (disasm_print_insn(img, lens[i], opcodes, 824 + name, disasm_opt, btf, 825 + prog_linfo, ksyms[i], i, 826 + linum)) 827 + goto exit_free; 827 828 828 829 img += lens[i]; 829 830 ··· 837 836 if (json_output) 838 837 jsonw_end_array(json_wtr); 839 838 } else { 840 - disasm_print_insn(buf, member_len, opcodes, name, 841 - disasm_opt, btf, NULL, 0, 0, false); 839 + if (disasm_print_insn(buf, member_len, opcodes, name, 840 + disasm_opt, btf, NULL, 0, 0, 841 + false)) 842 + goto exit_free; 842 843 } 843 844 } else if (visual) { 844 845 if (json_output) ··· 1456 1453 return ret; 1457 1454 } 1458 1455 1456 + static int 1457 + auto_attach_program(struct bpf_program *prog, const char *path) 1458 + { 1459 + struct bpf_link *link; 1460 + int err; 1461 + 1462 + link = bpf_program__attach(prog); 1463 + if (!link) { 1464 + p_info("Program %s does not support autoattach, falling back to pinning", 1465 + bpf_program__name(prog)); 1466 + return bpf_obj_pin(bpf_program__fd(prog), path); 1467 + } 1468 + 1469 + err = bpf_link__pin(link, path); 1470 + bpf_link__destroy(link); 1471 + return err; 1472 + } 1473 + 1474 + static int pathname_concat(char *buf, size_t buf_sz, const char *path, const char *name) 1475 + { 
1476 + int len; 1477 + 1478 + len = snprintf(buf, buf_sz, "%s/%s", path, name); 1479 + if (len < 0) 1480 + return -EINVAL; 1481 + if ((size_t)len >= buf_sz) 1482 + return -ENAMETOOLONG; 1483 + 1484 + return 0; 1485 + } 1486 + 1487 + static int 1488 + auto_attach_programs(struct bpf_object *obj, const char *path) 1489 + { 1490 + struct bpf_program *prog; 1491 + char buf[PATH_MAX]; 1492 + int err; 1493 + 1494 + bpf_object__for_each_program(prog, obj) { 1495 + err = pathname_concat(buf, sizeof(buf), path, bpf_program__name(prog)); 1496 + if (err) 1497 + goto err_unpin_programs; 1498 + 1499 + err = auto_attach_program(prog, buf); 1500 + if (err) 1501 + goto err_unpin_programs; 1502 + } 1503 + 1504 + return 0; 1505 + 1506 + err_unpin_programs: 1507 + while ((prog = bpf_object__prev_program(obj, prog))) { 1508 + if (pathname_concat(buf, sizeof(buf), path, bpf_program__name(prog))) 1509 + continue; 1510 + 1511 + bpf_program__unpin(prog, buf); 1512 + } 1513 + 1514 + return err; 1515 + } 1516 + 1459 1517 static int load_with_options(int argc, char **argv, bool first_prog_only) 1460 1518 { 1461 1519 enum bpf_prog_type common_prog_type = BPF_PROG_TYPE_UNSPEC; ··· 1528 1464 struct bpf_program *prog = NULL, *pos; 1529 1465 unsigned int old_map_fds = 0; 1530 1466 const char *pinmaps = NULL; 1467 + bool auto_attach = false; 1531 1468 struct bpf_object *obj; 1532 1469 struct bpf_map *map; 1533 1470 const char *pinfile; ··· 1648 1583 goto err_free_reuse_maps; 1649 1584 1650 1585 pinmaps = GET_ARG(); 1586 + } else if (is_prefix(*argv, "autoattach")) { 1587 + auto_attach = true; 1588 + NEXT_ARG(); 1651 1589 } else { 1652 1590 p_err("expected no more arguments, 'type', 'map' or 'dev', got: '%s'?", 1653 1591 *argv); ··· 1760 1692 goto err_close_obj; 1761 1693 } 1762 1694 1763 - err = bpf_obj_pin(bpf_program__fd(prog), pinfile); 1695 + if (auto_attach) 1696 + err = auto_attach_program(prog, pinfile); 1697 + else 1698 + err = bpf_obj_pin(bpf_program__fd(prog), pinfile); 1764 1699 if 
(err) { 1765 1700 p_err("failed to pin program %s", 1766 1701 bpf_program__section_name(prog)); 1767 1702 goto err_close_obj; 1768 1703 } 1769 1704 } else { 1770 - err = bpf_object__pin_programs(obj, pinfile); 1705 + if (auto_attach) 1706 + err = auto_attach_programs(obj, pinfile); 1707 + else 1708 + err = bpf_object__pin_programs(obj, pinfile); 1771 1709 if (err) { 1772 1710 p_err("failed to pin all programs"); 1773 1711 goto err_close_obj; ··· 2412 2338 " [type TYPE] [dev NAME] \\\n" 2413 2339 " [map { idx IDX | name NAME } MAP]\\\n" 2414 2340 " [pinmaps MAP_DIR]\n" 2341 + " [autoattach]\n" 2415 2342 " %1$s %2$s attach PROG ATTACH_TYPE [MAP]\n" 2416 2343 " %1$s %2$s detach PROG ATTACH_TYPE [MAP]\n" 2417 2344 " %1$s %2$s run PROG \\\n"
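
With `autoattach`, `auto_attach_programs()` pins each program's link at `PATH/<program name>`. If a program cannot be auto-attached, it pins the program itself at that path instead. `pathname_concat()` rejects any path that `snprintf()` would have truncated. The standalone copy below reproduces that helper so the truncation check can be exercised on its own. The `/sys/fs/bpf` paths used to try it are illustrative only.

```c
#include <errno.h>
#include <stdio.h>

/* Same logic as bpftool's pathname_concat(): build "path/name" into buf
 * and report truncation instead of returning a silently shortened path. */
static int pathname_concat(char *buf, size_t buf_sz, const char *path, const char *name)
{
	int len;

	len = snprintf(buf, buf_sz, "%s/%s", path, name);
	if (len < 0)
		return -EINVAL;
	/* snprintf() returns the length it wanted to write, excluding the
	 * NUL; a value >= buf_sz means the output was cut off. */
	if ((size_t)len >= buf_sz)
		return -ENAMETOOLONG;

	return 0;
}
```

Consider a 16-byte buffer. `"/sys/fs/bpf"` plus `"/prog"` needs 16 characters plus the terminating NUL, so the helper returns `-ENAMETOOLONG`.
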

+2

tools/bpf/bpftool/xlated_dumper.c

··· 1 1 // SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) 2 2 /* Copyright (C) 2018 Netronome Systems, Inc. */ 3 3 4 + #ifndef _GNU_SOURCE 4 5 #define _GNU_SOURCE 6 + #endif 5 7 #include <stdarg.h> 6 8 #include <stdio.h> 7 9 #include <stdlib.h>

+49 -1

tools/include/uapi/linux/bpf.h

··· 922 922 BPF_MAP_TYPE_CPUMAP, 923 923 BPF_MAP_TYPE_XSKMAP, 924 924 BPF_MAP_TYPE_SOCKHASH, 925 - BPF_MAP_TYPE_CGROUP_STORAGE, 925 + BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED, 926 + /* BPF_MAP_TYPE_CGROUP_STORAGE is available to bpf programs attaching 927 + * to a cgroup. The newer BPF_MAP_TYPE_CGRP_STORAGE is available to 928 + * both cgroup-attached and other progs and supports all functionality 929 + * provided by BPF_MAP_TYPE_CGROUP_STORAGE. So mark 930 + * BPF_MAP_TYPE_CGROUP_STORAGE deprecated. 931 + */ 932 + BPF_MAP_TYPE_CGROUP_STORAGE = BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED, 926 933 BPF_MAP_TYPE_REUSEPORT_SOCKARRAY, 927 934 BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE, 928 935 BPF_MAP_TYPE_QUEUE, ··· 942 935 BPF_MAP_TYPE_TASK_STORAGE, 943 936 BPF_MAP_TYPE_BLOOM_FILTER, 944 937 BPF_MAP_TYPE_USER_RINGBUF, 938 + BPF_MAP_TYPE_CGRP_STORAGE, 945 939 }; 946 940 947 941 /* Note that tracing related programs such as ··· 5443 5435 * **-E2BIG** if user-space has tried to publish a sample which is 5444 5436 * larger than the size of the ring buffer, or which cannot fit 5445 5437 * within a struct bpf_dynptr. 5438 + * 5439 + * void *bpf_cgrp_storage_get(struct bpf_map *map, struct cgroup *cgroup, void *value, u64 flags) 5440 + * Description 5441 + * Get a bpf_local_storage from the *cgroup*. 5442 + * 5443 + * Logically, it could be thought of as getting the value from 5444 + * a *map* with *cgroup* as the **key**. From this 5445 + * perspective, the usage is not much different from 5446 + * **bpf_map_lookup_elem**\ (*map*, **&**\ *cgroup*) except this 5447 + * helper enforces the key must be a cgroup struct and the map must also 5448 + * be a **BPF_MAP_TYPE_CGRP_STORAGE**. 5449 + * 5450 + * In reality, the local-storage value is embedded directly inside of the 5451 + * *cgroup* object itself, rather than being located in the 5452 + * **BPF_MAP_TYPE_CGRP_STORAGE** map. 
When the local-storage value is 5453 + * queried for some *map* on a *cgroup* object, the kernel will perform an 5454 + * O(n) iteration over all of the live local-storage values for that 5455 + * *cgroup* object until the local-storage value for the *map* is found. 5456 + * 5457 + * An optional *flags* (**BPF_LOCAL_STORAGE_GET_F_CREATE**) can be 5458 + * used such that a new bpf_local_storage will be 5459 + * created if one does not exist. *value* can be used 5460 + * together with **BPF_LOCAL_STORAGE_GET_F_CREATE** to specify 5461 + * the initial value of a bpf_local_storage. If *value* is 5462 + * **NULL**, the new bpf_local_storage will be zero initialized. 5463 + * Return 5464 + * A bpf_local_storage pointer is returned on success. 5465 + * 5466 + * **NULL** if not found or there was an error in adding 5467 + * a new bpf_local_storage. 5468 + * 5469 + * long bpf_cgrp_storage_delete(struct bpf_map *map, struct cgroup *cgroup) 5470 + * Description 5471 + * Delete a bpf_local_storage from a *cgroup*. 5472 + * Return 5473 + * 0 on success. 5474 + * 5475 + * **-ENOENT** if the bpf_local_storage cannot be found. 5446 5476 */ 5447 5477 #define ___BPF_FUNC_MAPPER(FN, ctx...) \ 5448 5478 FN(unspec, 0, ##ctx) \ ··· 5693 5647 FN(tcp_raw_check_syncookie_ipv6, 207, ##ctx) \ 5694 5648 FN(ktime_get_tai_ns, 208, ##ctx) \ 5695 5649 FN(user_ringbuf_drain, 209, ##ctx) \ 5650 + FN(cgrp_storage_get, 210, ##ctx) \ 5651 + FN(cgrp_storage_delete, 211, ##ctx) \ 5696 5652 /* */ 5697 5653 5698 5654 /* backwards-compatibility macros for users of __BPF_FUNC_MAPPER that don't

+5 -3

tools/lib/bpf/btf.c

··· 3887 3887 } 3888 3888 3889 3889 /* Check if given two types are identical ARRAY definitions */ 3890 - static int btf_dedup_identical_arrays(struct btf_dedup *d, __u32 id1, __u32 id2) 3890 + static bool btf_dedup_identical_arrays(struct btf_dedup *d, __u32 id1, __u32 id2) 3891 3891 { 3892 3892 struct btf_type *t1, *t2; 3893 3893 3894 3894 t1 = btf_type_by_id(d->btf, id1); 3895 3895 t2 = btf_type_by_id(d->btf, id2); 3896 3896 if (!btf_is_array(t1) || !btf_is_array(t2)) 3897 - return 0; 3897 + return false; 3898 3898 3899 3899 return btf_equal_array(t1, t2); 3900 3900 } ··· 3918 3918 m1 = btf_members(t1); 3919 3919 m2 = btf_members(t2); 3920 3920 for (i = 0, n = btf_vlen(t1); i < n; i++, m1++, m2++) { 3921 - if (m1->type != m2->type) 3921 + if (m1->type != m2->type && 3922 + !btf_dedup_identical_arrays(d, m1->type, m2->type) && 3923 + !btf_dedup_identical_structs(d, m1->type, m2->type)) 3922 3924 return false; 3923 3925 } 3924 3926 return true;

+113 -65

tools/lib/bpf/libbpf.c

··· 164 164 [BPF_MAP_TYPE_TASK_STORAGE] = "task_storage", 165 165 [BPF_MAP_TYPE_BLOOM_FILTER] = "bloom_filter", 166 166 [BPF_MAP_TYPE_USER_RINGBUF] = "user_ringbuf", 167 + [BPF_MAP_TYPE_CGRP_STORAGE] = "cgrp_storage", 167 168 }; 168 169 169 170 static const char * const prog_type_name[] = { ··· 1462 1461 return -ENOENT; 1463 1462 } 1464 1463 1465 - static int find_elf_var_offset(const struct bpf_object *obj, const char *name, __u32 *off) 1464 + static Elf64_Sym *find_elf_var_sym(const struct bpf_object *obj, const char *name) 1466 1465 { 1467 1466 Elf_Data *symbols = obj->efile.symbols; 1468 1467 const char *sname; 1469 1468 size_t si; 1470 - 1471 - if (!name || !off) 1472 - return -EINVAL; 1473 1469 1474 1470 for (si = 0; si < symbols->d_size / sizeof(Elf64_Sym); si++) { 1475 1471 Elf64_Sym *sym = elf_sym_by_idx(obj, si); ··· 1481 1483 sname = elf_sym_str(obj, sym->st_name); 1482 1484 if (!sname) { 1483 1485 pr_warn("failed to get sym name string for var %s\n", name); 1484 - return -EIO; 1486 + return ERR_PTR(-EIO); 1485 1487 } 1486 - if (strcmp(name, sname) == 0) { 1487 - *off = sym->st_value; 1488 - return 0; 1489 - } 1488 + if (strcmp(name, sname) == 0) 1489 + return sym; 1490 1490 } 1491 1491 1492 - return -ENOENT; 1492 + return ERR_PTR(-ENOENT); 1493 1493 } 1494 1494 1495 1495 static struct bpf_map *bpf_object__add_map(struct bpf_object *obj) ··· 1578 1582 } 1579 1583 1580 1584 static int 1581 - bpf_map_find_btf_info(struct bpf_object *obj, struct bpf_map *map); 1585 + map_fill_btf_type_info(struct bpf_object *obj, struct bpf_map *map); 1586 + 1587 + /* Internal BPF map is mmap()'able only if at least one of corresponding 1588 + * DATASEC's VARs are to be exposed through BPF skeleton. I.e., it's a GLOBAL 1589 + * variable and it's not marked as __hidden (which turns it into, effectively, 1590 + * a STATIC variable). 
1591 + */ 1592 + static bool map_is_mmapable(struct bpf_object *obj, struct bpf_map *map) 1593 + { 1594 + const struct btf_type *t, *vt; 1595 + struct btf_var_secinfo *vsi; 1596 + int i, n; 1597 + 1598 + if (!map->btf_value_type_id) 1599 + return false; 1600 + 1601 + t = btf__type_by_id(obj->btf, map->btf_value_type_id); 1602 + if (!btf_is_datasec(t)) 1603 + return false; 1604 + 1605 + vsi = btf_var_secinfos(t); 1606 + for (i = 0, n = btf_vlen(t); i < n; i++, vsi++) { 1607 + vt = btf__type_by_id(obj->btf, vsi->type); 1608 + if (!btf_is_var(vt)) 1609 + continue; 1610 + 1611 + if (btf_var(vt)->linkage != BTF_VAR_STATIC) 1612 + return true; 1613 + } 1614 + 1615 + return false; 1616 + } 1582 1617 1583 1618 static int 1584 1619 bpf_object__init_internal_map(struct bpf_object *obj, enum libbpf_map_type type, ··· 1641 1614 def->max_entries = 1; 1642 1615 def->map_flags = type == LIBBPF_MAP_RODATA || type == LIBBPF_MAP_KCONFIG 1643 1616 ? BPF_F_RDONLY_PROG : 0; 1644 - def->map_flags |= BPF_F_MMAPABLE; 1617 + 1618 + /* failures are fine because of maps like .rodata.str1.1 */ 1619 + (void) map_fill_btf_type_info(obj, map); 1620 + 1621 + if (map_is_mmapable(obj, map)) 1622 + def->map_flags |= BPF_F_MMAPABLE; 1645 1623 1646 1624 pr_debug("map '%s' (global data): at sec_idx %d, offset %zu, flags %x.\n", 1647 1625 map->name, map->sec_idx, map->sec_offset, def->map_flags); ··· 1662 1630 zfree(&map->name); 1663 1631 return err; 1664 1632 } 1665 - 1666 - /* failures are fine because of maps like .rodata.str1.1 */ 1667 - (void) bpf_map_find_btf_info(obj, map); 1668 1633 1669 1634 if (data) 1670 1635 memcpy(map->mmaped, data, data_sz); ··· 2574 2545 fill_map_from_def(map->inner_map, &inner_def); 2575 2546 } 2576 2547 2577 - err = bpf_map_find_btf_info(obj, map); 2548 + err = map_fill_btf_type_info(obj, map); 2578 2549 if (err) 2579 2550 return err; 2580 2551 ··· 2879 2850 static int btf_fixup_datasec(struct bpf_object *obj, struct btf *btf, 2880 2851 struct btf_type *t) 2881 2852 { 
2882 - __u32 size = 0, off = 0, i, vars = btf_vlen(t); 2883 - const char *name = btf__name_by_offset(btf, t->name_off); 2884 - const struct btf_type *t_var; 2853 + __u32 size = 0, i, vars = btf_vlen(t); 2854 + const char *sec_name = btf__name_by_offset(btf, t->name_off); 2885 2855 struct btf_var_secinfo *vsi; 2886 - const struct btf_var *var; 2887 - int ret; 2856 + bool fixup_offsets = false; 2857 + int err; 2888 2858 2889 - if (!name) { 2859 + if (!sec_name) { 2890 2860 pr_debug("No name found in string section for DATASEC kind.\n"); 2891 2861 return -ENOENT; 2892 2862 } 2893 2863 2894 - /* .extern datasec size and var offsets were set correctly during 2895 - * extern collection step, so just skip straight to sorting variables 2864 + /* Extern-backing datasecs (.ksyms, .kconfig) have their size and 2865 + * variable offsets set at the previous step. Further, not every 2866 + * extern BTF VAR has corresponding ELF symbol preserved, so we skip 2867 + * all fixups altogether for such sections and go straight to sorting 2868 + * VARs within their DATASEC. 2896 2869 */ 2897 - if (t->size) 2870 + if (strcmp(sec_name, KCONFIG_SEC) == 0 || strcmp(sec_name, KSYMS_SEC) == 0) 2898 2871 goto sort_vars; 2899 2872 2900 - ret = find_elf_sec_sz(obj, name, &size); 2901 - if (ret || !size) { 2902 - pr_debug("Invalid size for section %s: %u bytes\n", name, size); 2903 - return -ENOENT; 2873 + /* Clang leaves DATASEC size and VAR offsets as zeroes, so we need to 2874 + * fix this up. But BPF static linker already fixes this up and fills 2875 + * all the sizes and offsets during static linking. So this step has 2876 + * to be optional. But the STV_HIDDEN handling is non-optional for any 2877 + * non-extern DATASEC, so the variable fixup loop below handles both 2878 + * functions at the same time, paying the cost of BTF VAR <-> ELF 2879 + * symbol matching just once. 
2880 + */ 2881 + if (t->size == 0) { 2882 + err = find_elf_sec_sz(obj, sec_name, &size); 2883 + if (err || !size) { 2884 + pr_debug("sec '%s': failed to determine size from ELF: size %u, err %d\n", 2885 + sec_name, size, err); 2886 + return -ENOENT; 2887 + } 2888 + 2889 + t->size = size; 2890 + fixup_offsets = true; 2904 2891 } 2905 2892 2906 - t->size = size; 2907 - 2908 2893 for (i = 0, vsi = btf_var_secinfos(t); i < vars; i++, vsi++) { 2894 + const struct btf_type *t_var; 2895 + struct btf_var *var; 2896 + const char *var_name; 2897 + Elf64_Sym *sym; 2898 + 2909 2899 t_var = btf__type_by_id(btf, vsi->type); 2910 2900 if (!t_var || !btf_is_var(t_var)) { 2911 - pr_debug("Non-VAR type seen in section %s\n", name); 2901 + pr_debug("sec '%s': unexpected non-VAR type found\n", sec_name); 2912 2902 return -EINVAL; 2913 2903 } 2914 2904 2915 2905 var = btf_var(t_var); 2916 - if (var->linkage == BTF_VAR_STATIC) 2906 + if (var->linkage == BTF_VAR_STATIC || var->linkage == BTF_VAR_GLOBAL_EXTERN) 2917 2907 continue; 2918 2908 2919 - name = btf__name_by_offset(btf, t_var->name_off); 2920 - if (!name) { 2921 - pr_debug("No name found in string section for VAR kind\n"); 2909 + var_name = btf__name_by_offset(btf, t_var->name_off); 2910 + if (!var_name) { 2911 + pr_debug("sec '%s': failed to find name of DATASEC's member #%d\n", 2912 + sec_name, i); 2922 2913 return -ENOENT; 2923 2914 } 2924 2915 2925 - ret = find_elf_var_offset(obj, name, &off); 2926 - if (ret) { 2927 - pr_debug("No offset found in symbol table for VAR %s\n", 2928 - name); 2916 + sym = find_elf_var_sym(obj, var_name); 2917 + if (IS_ERR(sym)) { 2918 + pr_debug("sec '%s': failed to find ELF symbol for VAR '%s'\n", 2919 + sec_name, var_name); 2929 2920 return -ENOENT; 2930 2921 } 2931 2922 2932 - vsi->offset = off; 2923 + if (fixup_offsets) 2924 + vsi->offset = sym->st_value; 2925 + 2926 + /* if variable is a global/weak symbol, but has restricted 2927 + * (STV_HIDDEN or STV_INTERNAL) visibility, mark its BTF VAR 
2928 + * as static. This follows similar logic for functions (BPF 2929 + * subprogs) and influences libbpf's further decisions about 2930 + * whether to make global data BPF array maps as 2931 + * BPF_F_MMAPABLE. 2932 + */ 2933 + if (ELF64_ST_VISIBILITY(sym->st_other) == STV_HIDDEN 2934 + || ELF64_ST_VISIBILITY(sym->st_other) == STV_INTERNAL) 2935 + var->linkage = BTF_VAR_STATIC; 2933 2936 } 2934 2937 2935 2938 sort_vars: ··· 2969 2908 return 0; 2970 2909 } 2971 2910 2972 - static int btf_finalize_data(struct bpf_object *obj, struct btf *btf) 2911 + static int bpf_object_fixup_btf(struct bpf_object *obj) 2973 2912 { 2974 - int err = 0; 2975 - __u32 i, n = btf__type_cnt(btf); 2913 + int i, n, err = 0; 2976 2914 2915 + if (!obj->btf) 2916 + return 0; 2917 + 2918 + n = btf__type_cnt(obj->btf); 2977 2919 for (i = 1; i < n; i++) { 2978 - struct btf_type *t = btf_type_by_id(btf, i); 2920 + struct btf_type *t = btf_type_by_id(obj->btf, i); 2979 2921 2980 2922 /* Loader needs to fix up some of the things compiler 2981 2923 * couldn't get its hands on while emitting BTF. This ··· 2986 2922 * the info from the ELF itself for this purpose. 
2987 2923 */ 2988 2924 if (btf_is_datasec(t)) { 2989 - err = btf_fixup_datasec(obj, btf, t); 2925 + err = btf_fixup_datasec(obj, obj->btf, t); 2990 2926 if (err) 2991 - break; 2927 + return err; 2992 2928 } 2993 - } 2994 - 2995 - return libbpf_err(err); 2996 - } 2997 - 2998 - static int bpf_object__finalize_btf(struct bpf_object *obj) 2999 - { 3000 - int err; 3001 - 3002 - if (!obj->btf) 3003 - return 0; 3004 - 3005 - err = btf_finalize_data(obj, obj->btf); 3006 - if (err) { 3007 - pr_warn("Error finalizing %s: %d.\n", BTF_ELF_SEC, err); 3008 - return err; 3009 2929 } 3010 2930 3011 2931 return 0; ··· 4283 4235 return 0; 4284 4236 } 4285 4237 4286 - static int bpf_map_find_btf_info(struct bpf_object *obj, struct bpf_map *map) 4238 + static int map_fill_btf_type_info(struct bpf_object *obj, struct bpf_map *map) 4287 4239 { 4288 4240 int id; 4289 4241 ··· 7281 7233 err = err ? : bpf_object__check_endianness(obj); 7282 7234 err = err ? : bpf_object__elf_collect(obj); 7283 7235 err = err ? : bpf_object__collect_externs(obj); 7284 - err = err ? : bpf_object__finalize_btf(obj); 7236 + err = err ? : bpf_object_fixup_btf(obj); 7285 7237 err = err ? : bpf_object__init_maps(obj, opts); 7286 7238 err = err ? : bpf_object_init_progs(obj, opts); 7287 7239 err = err ? : bpf_object__collect_relos(obj);
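
Taken together, the `btf_fixup_datasec()` and `map_is_mmapable()` changes mean a global-data map gets `BPF_F_MMAPABLE` only under one condition. After hidden or internal ELF visibility has been folded into static linkage, its DATASEC must still contain at least one non-static VAR. The model below is a simplified sketch. `enum var_linkage`, `effective_linkage()` and `datasec_is_mmapable()` are hypothetical stand-ins, not libbpf code. Only the `ELF64_ST_VISIBILITY` and `STV_*` macros come from the system `<elf.h>`.

```c
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for BTF VAR linkage (libbpf uses BTF_VAR_STATIC etc.). */
enum var_linkage { VAR_STATIC, VAR_GLOBAL };

/* Models the fixup step: a global symbol with STV_HIDDEN or STV_INTERNAL
 * visibility is treated as static. */
static enum var_linkage effective_linkage(enum var_linkage linkage,
					  unsigned char st_other)
{
	unsigned char vis = ELF64_ST_VISIBILITY(st_other);

	if (vis == STV_HIDDEN || vis == STV_INTERNAL)
		return VAR_STATIC;
	return linkage;
}

/* Models map_is_mmapable(): mmap()'able if any VAR is not static. */
static bool datasec_is_mmapable(const enum var_linkage *vars, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (vars[i] != VAR_STATIC)
			return true;
	}
	return false;
}
```

Under this model, a section whose only global is marked `__hidden` is no longer mmap()'able. Adding one ordinary global variable makes it mmap()'able again.
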

+1

tools/lib/bpf/libbpf_probes.c

··· 221 221 case BPF_MAP_TYPE_SK_STORAGE: 222 222 case BPF_MAP_TYPE_INODE_STORAGE: 223 223 case BPF_MAP_TYPE_TASK_STORAGE: 224 + case BPF_MAP_TYPE_CGRP_STORAGE: 224 225 btf_key_type_id = 1; 225 226 btf_value_type_id = 3; 226 227 value_size = 8;

+6 -10

tools/lib/bpf/usdt.c

··· 1225 1225 1226 1226 static int parse_usdt_arg(const char *arg_str, int arg_num, struct usdt_arg_spec *arg) 1227 1227 { 1228 - char *reg_name = NULL; 1228 + char reg_name[16]; 1229 1229 int arg_sz, len, reg_off; 1230 1230 long off; 1231 1231 1232 - if (sscanf(arg_str, " %d @ %ld ( %%%m[^)] ) %n", &arg_sz, &off, &reg_name, &len) == 3) { 1232 + if (sscanf(arg_str, " %d @ %ld ( %%%15[^)] ) %n", &arg_sz, &off, reg_name, &len) == 3) { 1233 1233 /* Memory dereference case, e.g., -4@-20(%rbp) */ 1234 1234 arg->arg_type = USDT_ARG_REG_DEREF; 1235 1235 arg->val_off = off; 1236 1236 reg_off = calc_pt_regs_off(reg_name); 1237 - free(reg_name); 1238 1237 if (reg_off < 0) 1239 1238 return reg_off; 1240 1239 arg->reg_off = reg_off; 1241 - } else if (sscanf(arg_str, " %d @ %%%ms %n", &arg_sz, &reg_name, &len) == 2) { 1240 + } else if (sscanf(arg_str, " %d @ %%%15s %n", &arg_sz, reg_name, &len) == 2) { 1242 1241 /* Register read case, e.g., -4@%eax */ 1243 1242 arg->arg_type = USDT_ARG_REG; 1244 1243 arg->val_off = 0; 1245 1244 1246 1245 reg_off = calc_pt_regs_off(reg_name); 1247 - free(reg_name); 1248 1246 if (reg_off < 0) 1249 1247 return reg_off; 1250 1248 arg->reg_off = reg_off; ··· 1454 1456 1455 1457 static int parse_usdt_arg(const char *arg_str, int arg_num, struct usdt_arg_spec *arg) 1456 1458 { 1457 - char *reg_name = NULL; 1459 + char reg_name[16]; 1458 1460 int arg_sz, len, reg_off; 1459 1461 long off; 1460 1462 1461 - if (sscanf(arg_str, " %d @ %ld ( %m[a-z0-9] ) %n", &arg_sz, &off, &reg_name, &len) == 3) { 1463 + if (sscanf(arg_str, " %d @ %ld ( %15[a-z0-9] ) %n", &arg_sz, &off, reg_name, &len) == 3) { 1462 1464 /* Memory dereference case, e.g., -8@-88(s0) */ 1463 1465 arg->arg_type = USDT_ARG_REG_DEREF; 1464 1466 arg->val_off = off; 1465 1467 reg_off = calc_pt_regs_off(reg_name); 1466 - free(reg_name); 1467 1468 if (reg_off < 0) 1468 1469 return reg_off; 1469 1470 arg->reg_off = reg_off; ··· 1471 1474 arg->arg_type = USDT_ARG_CONST; 1472 1475 arg->val_off = off; 
1473 1476 arg->reg_off = 0; 1474 - } else if (sscanf(arg_str, " %d @ %m[a-z0-9] %n", &arg_sz, &reg_name, &len) == 2) { 1477 + } else if (sscanf(arg_str, " %d @ %15[a-z0-9] %n", &arg_sz, reg_name, &len) == 2) { 1475 1478 /* Register read case, e.g., -8@a1 */ 1476 1479 arg->arg_type = USDT_ARG_REG; 1477 1480 arg->val_off = 0; 1478 1481 reg_off = calc_pt_regs_off(reg_name); 1479 - free(reg_name); 1480 1482 if (reg_off < 0) 1481 1483 return reg_off; 1482 1484 arg->reg_off = reg_off;
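
The `usdt.c` change replaces the allocating `%m` conversion with a fixed 16-byte buffer and a `%15[...]` field width. The width leaves room for the terminating NUL, and the `free()` calls are no longer needed. The sketch below applies the x86 memory-dereference format string from the diff to a sample argument. `parse_reg_deref()` is a hypothetical wrapper, not a libbpf function.

```c
#include <stdio.h>

/* Hypothetical wrapper around the x86 "memory dereference" pattern used by
 * parse_usdt_arg(), e.g. "-4@-20(%rbp)". reg_name must hold 16 bytes:
 * %15[^)] stores at most 15 characters plus the terminating NUL.
 * Returns 1 if size, offset and register name were all parsed. */
static int parse_reg_deref(const char *arg_str, int *arg_sz, long *off,
			   char reg_name[16], int *len)
{
	return sscanf(arg_str, " %d @ %ld ( %%%15[^)] ) %n",
		      arg_sz, off, reg_name, len) == 3;
}
```

For `-4@-20(%rbp)`, this yields size `-4`, offset `-20` and register `rbp`. A plain register read such as `-4@%eax` does not match this pattern. It falls through to the next `sscanf()` branch in the real parser.
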

+81

tools/testing/selftests/bpf/DENYLIST.aarch64

··· 1 + bloom_filter_map # libbpf: prog 'check_bloom': failed to attach: ERROR: strerror_r(-524)=22 2 + bpf_cookie/lsm 3 + bpf_cookie/multi_kprobe_attach_api 4 + bpf_cookie/multi_kprobe_link_api 5 + bpf_cookie/trampoline 6 + bpf_loop/check_callback_fn_stop # link unexpected error: -524 7 + bpf_loop/check_invalid_flags 8 + bpf_loop/check_nested_calls 9 + bpf_loop/check_non_constant_callback 10 + bpf_loop/check_nr_loops 11 + bpf_loop/check_null_callback_ctx 12 + bpf_loop/check_stack 13 + bpf_mod_race # bpf_mod_kfunc_race__attach unexpected error: -524 (errno 524) 14 + bpf_tcp_ca/dctcp_fallback 15 + btf_dump/btf_dump: var_data # find type id unexpected find type id: actual -2 < expected 0 16 + cgroup_hierarchical_stats # attach unexpected error: -524 (errno 524) 17 + d_path/basic # setup attach failed: -524 18 + deny_namespace # attach unexpected error: -524 (errno 524) 19 + fentry_fexit # fentry_attach unexpected error: -1 (errno 524) 20 + fentry_test # fentry_attach unexpected error: -1 (errno 524) 21 + fexit_sleep # fexit_attach fexit attach failed: -1 22 + fexit_stress # fexit attach unexpected fexit attach: actual -524 < expected 0 23 + fexit_test # fexit_attach unexpected error: -1 (errno 524) 24 + get_func_args_test # get_func_args_test__attach unexpected error: -524 (errno 524) (trampoline) 25 + get_func_ip_test # get_func_ip_test__attach unexpected error: -524 (errno 524) (trampoline) 26 + htab_update/reenter_update 27 + kfree_skb # attach fentry unexpected error: -524 (trampoline) 28 + kfunc_call/subprog # extern (var ksym) 'bpf_prog_active': not found in kernel BTF 29 + kfunc_call/subprog_lskel # skel unexpected error: -2 30 + kfunc_dynptr_param/dynptr_data_null # libbpf: prog 'dynptr_data_null': failed to attach: ERROR: strerror_r(-524)=22 31 + kprobe_multi_test/attach_api_addrs # bpf_program__attach_kprobe_multi_opts unexpected error: -95 32 + kprobe_multi_test/attach_api_pattern # bpf_program__attach_kprobe_multi_opts unexpected error: -95 33 + 
kprobe_multi_test/attach_api_syms # bpf_program__attach_kprobe_multi_opts unexpected error: -95 34 + kprobe_multi_test/bench_attach # bpf_program__attach_kprobe_multi_opts unexpected error: -95 35 + kprobe_multi_test/link_api_addrs # link_fd unexpected link_fd: actual -95 < expected 0 36 + kprobe_multi_test/link_api_syms # link_fd unexpected link_fd: actual -95 < expected 0 37 + kprobe_multi_test/skel_api # kprobe_multi__attach unexpected error: -524 (errno 524) 38 + ksyms_module/libbpf # 'bpf_testmod_ksym_percpu': not found in kernel BTF 39 + ksyms_module/lskel # test_ksyms_module_lskel__open_and_load unexpected error: -2 40 + libbpf_get_fd_by_id_opts # test_libbpf_get_fd_by_id_opts__attach unexpected error: -524 (errno 524) 41 + lookup_key # test_lookup_key__attach unexpected error: -524 (errno 524) 42 + lru_bug # lru_bug__attach unexpected error: -524 (errno 524) 43 + modify_return # modify_return__attach failed unexpected error: -524 (errno 524) 44 + module_attach # skel_attach skeleton attach failed: -524 45 + mptcp/base # run_test mptcp unexpected error: -524 (errno 524) 46 + netcnt # packets unexpected packets: actual 10001 != expected 10000 47 + recursion # skel_attach unexpected error: -524 (errno 524) 48 + ringbuf # skel_attach skeleton attachment failed: -1 49 + setget_sockopt # attach_cgroup unexpected error: -524 50 + sk_storage_tracing # test_sk_storage_tracing__attach unexpected error: -524 (errno 524) 51 + skc_to_unix_sock # could not attach BPF object unexpected error: -524 (errno 524) 52 + socket_cookie # prog_attach unexpected error: -524 53 + stacktrace_build_id # compare_stack_ips stackmap vs. 
stack_amap err -1 errno 2 54 + task_local_storage/exit_creds # skel_attach unexpected error: -524 (errno 524) 55 + task_local_storage/recursion # skel_attach unexpected error: -524 (errno 524) 56 + test_bprm_opts # attach attach failed: -524 57 + test_ima # attach attach failed: -524 58 + test_local_storage # attach lsm attach failed: -524 59 + test_lsm # test_lsm_first_attach unexpected error: -524 (errno 524) 60 + test_overhead # attach_fentry unexpected error: -524 61 + timer # timer unexpected error: -524 (errno 524) 62 + timer_crash # timer_crash__attach unexpected error: -524 (errno 524) 63 + timer_mim # timer_mim unexpected error: -524 (errno 524) 64 + trace_printk # trace_printk__attach unexpected error: -1 (errno 524) 65 + trace_vprintk # trace_vprintk__attach unexpected error: -1 (errno 524) 66 + tracing_struct # tracing_struct__attach unexpected error: -524 (errno 524) 67 + trampoline_count # attach_prog unexpected error: -524 68 + unpriv_bpf_disabled # skel_attach unexpected error: -524 (errno 524) 69 + user_ringbuf/test_user_ringbuf_post_misaligned # misaligned_skel unexpected error: -524 (errno 524) 70 + user_ringbuf/test_user_ringbuf_post_producer_wrong_offset 71 + user_ringbuf/test_user_ringbuf_post_larger_than_ringbuf_sz 72 + user_ringbuf/test_user_ringbuf_basic # ringbuf_basic_skel unexpected error: -524 (errno 524) 73 + user_ringbuf/test_user_ringbuf_sample_full_ring_buffer 74 + user_ringbuf/test_user_ringbuf_post_alignment_autoadjust 75 + user_ringbuf/test_user_ringbuf_overfill 76 + user_ringbuf/test_user_ringbuf_discards_properly_ignored 77 + user_ringbuf/test_user_ringbuf_loop 78 + user_ringbuf/test_user_ringbuf_msg_protocol 79 + user_ringbuf/test_user_ringbuf_blocking_reserve 80 + verify_pkcs7_sig # test_verify_pkcs7_sig__attach unexpected error: -524 (errno 524) 81 + vmlinux # skel_attach skeleton attach failed: -524

+1

tools/testing/selftests/bpf/DENYLIST.s390x

··· 10 10 bpf_tcp_ca # JIT does not support calling kernel function (kfunc) 11 11 cb_refs # expected error message unexpected error: -524 (trampoline) 12 12 cgroup_hierarchical_stats # JIT does not support calling kernel function (kfunc) 13 + cgrp_local_storage # prog_attach unexpected error: -524 (trampoline) 13 14 core_read_macros # unknown func bpf_probe_read#4 (overlapping) 14 15 d_path # failed to auto-attach program 'prog_stat': -524 (trampoline) 15 16 deny_namespace # failed to attach: ERROR: strerror_r(-524)=22 (trampoline)

+5 -3

tools/testing/selftests/bpf/Makefile

··· 359 359 test_subskeleton.skel.h test_subskeleton_lib.skel.h \ 360 360 test_usdt.skel.h 361 361 362 - LSKELS := fentry_test.c fexit_test.c fexit_sleep.c \ 363 - test_ringbuf.c atomics.c trace_printk.c trace_vprintk.c \ 364 - map_ptr_kern.c core_kern.c core_kern_overflow.c 362 + LSKELS := fentry_test.c fexit_test.c fexit_sleep.c atomics.c \ 363 + trace_printk.c trace_vprintk.c map_ptr_kern.c \ 364 + core_kern.c core_kern_overflow.c test_ringbuf.c \ 365 + test_ringbuf_map_key.c 366 + 365 367 # Generate both light skeleton and libbpf skeleton for these 366 368 LSKELS_EXTRA := test_ksyms_module.c test_ksyms_weak.c kfunc_call_test.c \ 367 369 kfunc_call_test_subprog.c

+41 -1

tools/testing/selftests/bpf/README.rst

··· 6 6 7 7 __ /Documentation/bpf/bpf_devel_QA.rst#q-how-to-run-bpf-selftests 8 8 9 + ============= 10 + BPF CI System 11 + ============= 12 + 13 + BPF employs a continuous integration (CI) system to check patch submissions in an 14 + automated fashion. The system runs selftests for each patch in a series. Results 15 + are propagated to patchwork, where failures are highlighted similarly to 16 + violations of other checks (such as additional warnings being emitted or a 17 + deficiency reported by ``scripts/checkpatch.pl``): 18 + 19 + https://patchwork.kernel.org/project/netdevbpf/list/?delegate=121173 20 + 21 + The CI system executes tests on multiple architectures. It uses a kernel 22 + configuration derived from both the generic and architecture-specific config 23 + file fragments below ``tools/testing/selftests/bpf/`` (e.g., ``config`` and 24 + ``config.x86_64``). 25 + 26 + Denylisting Tests 27 + ================= 28 + 29 + Some architectures may not support all BPF features, in which case tests 30 + in CI may fail. An example of such a shortcoming is BPF 31 + trampoline support on IBM's s390x architecture. For cases like this, an in-tree 32 + deny list file, located at ``tools/testing/selftests/bpf/DENYLIST.<arch>``, can 33 + be used to prevent the test from running on such an architecture. 34 + 35 + In addition to that, the generic ``tools/testing/selftests/bpf/DENYLIST`` is 36 + honored on every architecture running tests. 37 + 38 + These files are organized in three columns. The first column lists the test in 39 + question. This can be the name of a test suite or of an individual test. The 40 + remaining two columns provide additional metadata that helps identify and 41 + classify the entry: column two is a copy and paste of the error being reported 42 + when running the test in the setting in question. The third column, if 43 + available, summarizes the underlying problem.
A value of ``trampoline``, for 44 + example, indicates that lack of trampoline support is causing the test to fail. 45 + This last entry helps identify tests that can be re-enabled once such support is 46 + added. 47 + 9 48 ========================= 10 49 Running Selftests in a VM 11 50 ========================= 12 51 13 52 It's now possible to run the selftests using ``tools/testing/selftests/bpf/vmtest.sh``. 14 53 The script tries to ensure that the tests are run with the same environment as they 15 - would be run post-submit in the CI used by the Maintainers. 54 + would be run post-submit in the CI used by the Maintainers, with the exception 55 + that deny lists are not automatically honored. 16 56 17 57 This script uses the in-tree kernel configuration and downloads a VM userspace 18 58 image from the system used by the CI. It builds the kernel (without overwriting
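The three-column DENYLIST layout described in the README hunk can be split mechanically. The following is a minimal C sketch, not part of the selftests; `struct deny_entry`, `copy_span()` and `parse_deny_line()` are hypothetical names. It assumes that column two starts at the first `#` after the test name, and it treats a trailing single-word parenthesized token such as `(trampoline)` as column three. Because of that heuristic, `(errno 524)` (which contains a space) stays part of the error text.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical representation of one DENYLIST entry. */
struct deny_entry {
	char test[128];    /* column 1: suite or suite/subtest name */
	char error[256];   /* column 2: copied error text (may be empty) */
	char category[64]; /* column 3: e.g. "trampoline" (may be empty) */
};

/* Copy len bytes of src into dst, truncating and NUL-terminating. */
static void copy_span(char *dst, size_t dstsz, const char *src, size_t len)
{
	if (len >= dstsz)
		len = dstsz - 1;
	memcpy(dst, src, len);
	dst[len] = '\0';
}

/* Returns 0 on success, -1 for blank or comment-only lines. */
static int parse_deny_line(const char *line, struct deny_entry *e)
{
	const char *p, *end, *open;
	size_t n;

	memset(e, 0, sizeof(*e));
	while (isspace((unsigned char)*line))
		line++;
	if (*line == '\0' || *line == '#')
		return -1;

	/* Column 1 ends at the first whitespace or '#'. */
	n = strcspn(line, " \t#\n");
	copy_span(e->test, sizeof(e->test), line, n);

	/* Everything after the first '#' is the error text. */
	p = strchr(line + n, '#');
	if (!p)
		return 0;
	p++;
	while (isspace((unsigned char)*p))
		p++;
	end = p + strcspn(p, "\n");
	while (end > p && isspace((unsigned char)end[-1]))
		end--;

	/* Heuristic: a trailing " (word)" without inner spaces is column 3. */
	if (end > p && end[-1] == ')') {
		open = end - 1;
		while (open > p && *open != '(')
			open--;
		if (*open == '(' && open > p && isspace((unsigned char)open[-1]) &&
		    !memchr(open, ' ', (size_t)(end - open))) {
			copy_span(e->category, sizeof(e->category), open + 1,
				  (size_t)(end - open) - 2);
			end = open;
			while (end > p && isspace((unsigned char)end[-1]))
				end--;
		}
	}
	copy_span(e->error, sizeof(e->error), p, (size_t)(end - p));
	return 0;
}
```

Only the first `#` is treated as the column separator, so an error text such as `unknown func bpf_probe_read#4` survives intact.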

+24

tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c

··· 128 128 } 129 129 } 130 130 131 + noinline int bpf_testmod_fentry_test1(int a) 132 + { 133 + return a + 1; 134 + } 135 + 136 + noinline int bpf_testmod_fentry_test2(int a, u64 b) 137 + { 138 + return a + b; 139 + } 140 + 141 + noinline int bpf_testmod_fentry_test3(char a, int b, u64 c) 142 + { 143 + return a + b + c; 144 + } 145 + 146 + int bpf_testmod_fentry_ok; 147 + 131 148 noinline ssize_t 132 149 bpf_testmod_test_read(struct file *file, struct kobject *kobj, 133 150 struct bin_attribute *bin_attr, ··· 184 167 return snprintf(buf, len, "%d\n", writable.val); 185 168 } 186 169 170 + if (bpf_testmod_fentry_test1(1) != 2 || 171 + bpf_testmod_fentry_test2(2, 3) != 5 || 172 + bpf_testmod_fentry_test3(4, 5, 6) != 15) 173 + goto out; 174 + 175 + bpf_testmod_fentry_ok = 1; 176 + out: 187 177 return -EIO; /* always fail */ 188 178 } 189 179 EXPORT_SYMBOL(bpf_testmod_test_read);

+2

tools/testing/selftests/bpf/config

··· 1 1 CONFIG_BLK_DEV_LOOP=y 2 + CONFIG_BOOTPARAM_HARDLOCKUP_PANIC=y 3 + CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=y 2 4 CONFIG_BPF=y 3 5 CONFIG_BPF_EVENTS=y 4 6 CONFIG_BPF_JIT=y

+181

tools/testing/selftests/bpf/config.aarch64

··· 1 + CONFIG_9P_FS=y 2 + CONFIG_ARCH_VEXPRESS=y 3 + CONFIG_ARCH_WANT_DEFAULT_BPF_JIT=y 4 + CONFIG_ARM_SMMU_V3=y 5 + CONFIG_ATA=y 6 + CONFIG_AUDIT=y 7 + CONFIG_BINFMT_MISC=y 8 + CONFIG_BLK_CGROUP=y 9 + CONFIG_BLK_DEV_BSGLIB=y 10 + CONFIG_BLK_DEV_INITRD=y 11 + CONFIG_BLK_DEV_IO_TRACE=y 12 + CONFIG_BLK_DEV_RAM=y 13 + CONFIG_BLK_DEV_SD=y 14 + CONFIG_BONDING=y 15 + CONFIG_BPFILTER=y 16 + CONFIG_BPF_JIT_ALWAYS_ON=y 17 + CONFIG_BPF_JIT_DEFAULT_ON=y 18 + CONFIG_BPF_PRELOAD_UMD=y 19 + CONFIG_BPF_PRELOAD=y 20 + CONFIG_BRIDGE=m 21 + CONFIG_CGROUP_CPUACCT=y 22 + CONFIG_CGROUP_DEVICE=y 23 + CONFIG_CGROUP_FREEZER=y 24 + CONFIG_CGROUP_HUGETLB=y 25 + CONFIG_CGROUP_NET_CLASSID=y 26 + CONFIG_CGROUP_PERF=y 27 + CONFIG_CGROUP_PIDS=y 28 + CONFIG_CGROUP_SCHED=y 29 + CONFIG_CGROUPS=y 30 + CONFIG_CHECKPOINT_RESTORE=y 31 + CONFIG_CHR_DEV_SG=y 32 + CONFIG_COMPAT=y 33 + CONFIG_CPUSETS=y 34 + CONFIG_CRASH_DUMP=y 35 + CONFIG_CRYPTO_USER_API_RNG=y 36 + CONFIG_CRYPTO_USER_API_SKCIPHER=y 37 + CONFIG_DEBUG_ATOMIC_SLEEP=y 38 + CONFIG_DEBUG_INFO_BTF=y 39 + CONFIG_DEBUG_INFO_DWARF4=y 40 + CONFIG_DEBUG_LIST=y 41 + CONFIG_DEBUG_LOCKDEP=y 42 + CONFIG_DEBUG_NOTIFIERS=y 43 + CONFIG_DEBUG_PAGEALLOC=y 44 + CONFIG_DEBUG_SECTION_MISMATCH=y 45 + CONFIG_DEBUG_SG=y 46 + CONFIG_DETECT_HUNG_TASK=y 47 + CONFIG_DEVTMPFS_MOUNT=y 48 + CONFIG_DEVTMPFS=y 49 + CONFIG_DRM_VIRTIO_GPU=y 50 + CONFIG_DRM=y 51 + CONFIG_DUMMY=y 52 + CONFIG_EXPERT=y 53 + CONFIG_EXT4_FS_POSIX_ACL=y 54 + CONFIG_EXT4_FS_SECURITY=y 55 + CONFIG_EXT4_FS=y 56 + CONFIG_FANOTIFY=y 57 + CONFIG_FB=y 58 + CONFIG_FUNCTION_PROFILER=y 59 + CONFIG_FUSE_FS=y 60 + CONFIG_FW_CFG_SYSFS_CMDLINE=y 61 + CONFIG_FW_CFG_SYSFS=y 62 + CONFIG_GDB_SCRIPTS=y 63 + CONFIG_HAVE_EBPF_JIT=y 64 + CONFIG_HAVE_KPROBES_ON_FTRACE=y 65 + CONFIG_HAVE_KPROBES=y 66 + CONFIG_HAVE_KRETPROBES=y 67 + CONFIG_HEADERS_INSTALL=y 68 + CONFIG_HIGH_RES_TIMERS=y 69 + CONFIG_HUGETLBFS=y 70 + CONFIG_HW_RANDOM_VIRTIO=y 71 + CONFIG_HW_RANDOM=y 72 + CONFIG_HZ_100=y 73 + CONFIG_IDLE_PAGE_TRACKING=y 74 + 
CONFIG_IKHEADERS=y 75 + CONFIG_INET6_ESP=y 76 + CONFIG_INET_ESP=y 77 + CONFIG_INET=y 78 + CONFIG_INPUT_EVDEV=y 79 + CONFIG_IP_ADVANCED_ROUTER=y 80 + CONFIG_IP_MULTICAST=y 81 + CONFIG_IP_MULTIPLE_TABLES=y 82 + CONFIG_IP_NF_IPTABLES=y 83 + CONFIG_IPV6_SEG6_LWTUNNEL=y 84 + CONFIG_IPVLAN=y 85 + CONFIG_JUMP_LABEL=y 86 + CONFIG_KERNEL_UNCOMPRESSED=y 87 + CONFIG_KPROBES_ON_FTRACE=y 88 + CONFIG_KPROBES=y 89 + CONFIG_KRETPROBES=y 90 + CONFIG_KSM=y 91 + CONFIG_LATENCYTOP=y 92 + CONFIG_LIVEPATCH=y 93 + CONFIG_LOCK_STAT=y 94 + CONFIG_MACVLAN=y 95 + CONFIG_MACVTAP=y 96 + CONFIG_MAGIC_SYSRQ=y 97 + CONFIG_MAILBOX=y 98 + CONFIG_MEMCG=y 99 + CONFIG_MEMORY_HOTPLUG=y 100 + CONFIG_MEMORY_HOTREMOVE=y 101 + CONFIG_NAMESPACES=y 102 + CONFIG_NET_9P_VIRTIO=y 103 + CONFIG_NET_9P=y 104 + CONFIG_NET_ACT_BPF=y 105 + CONFIG_NET_ACT_GACT=y 106 + CONFIG_NETDEVICES=y 107 + CONFIG_NETFILTER_XT_MATCH_BPF=y 108 + CONFIG_NETFILTER_XT_TARGET_MARK=y 109 + CONFIG_NET_KEY=y 110 + CONFIG_NET_SCH_FQ=y 111 + CONFIG_NET_VRF=y 112 + CONFIG_NET=y 113 + CONFIG_NF_TABLES=y 114 + CONFIG_NLMON=y 115 + CONFIG_NO_HZ_IDLE=y 116 + CONFIG_NR_CPUS=256 117 + CONFIG_NUMA=y 118 + CONFIG_OVERLAY_FS=y 119 + CONFIG_PACKET_DIAG=y 120 + CONFIG_PACKET=y 121 + CONFIG_PANIC_ON_OOPS=y 122 + CONFIG_PARTITION_ADVANCED=y 123 + CONFIG_PCI_HOST_GENERIC=y 124 + CONFIG_PCI=y 125 + CONFIG_PL320_MBOX=y 126 + CONFIG_POSIX_MQUEUE=y 127 + CONFIG_PROC_KCORE=y 128 + CONFIG_PROFILING=y 129 + CONFIG_PROVE_LOCKING=y 130 + CONFIG_PTDUMP_DEBUGFS=y 131 + CONFIG_RC_DEVICES=y 132 + CONFIG_RC_LOOPBACK=y 133 + CONFIG_RTC_CLASS=y 134 + CONFIG_RTC_DRV_PL031=y 135 + CONFIG_RT_GROUP_SCHED=y 136 + CONFIG_SAMPLE_SECCOMP=y 137 + CONFIG_SAMPLES=y 138 + CONFIG_SCHED_AUTOGROUP=y 139 + CONFIG_SCHED_TRACER=y 140 + CONFIG_SCSI_CONSTANTS=y 141 + CONFIG_SCSI_LOGGING=y 142 + CONFIG_SCSI_SCAN_ASYNC=y 143 + CONFIG_SCSI_VIRTIO=y 144 + CONFIG_SCSI=y 145 + CONFIG_SECURITY_NETWORK=y 146 + CONFIG_SERIAL_AMBA_PL011_CONSOLE=y 147 + CONFIG_SERIAL_AMBA_PL011=y 148 + 
CONFIG_STACK_TRACER=y 149 + CONFIG_STATIC_KEYS_SELFTEST=y 150 + CONFIG_SYSVIPC=y 151 + CONFIG_TASK_DELAY_ACCT=y 152 + CONFIG_TASK_IO_ACCOUNTING=y 153 + CONFIG_TASKSTATS=y 154 + CONFIG_TASK_XACCT=y 155 + CONFIG_TCG_TIS=y 156 + CONFIG_TCG_TPM=y 157 + CONFIG_TCP_CONG_ADVANCED=y 158 + CONFIG_TCP_CONG_DCTCP=y 159 + CONFIG_TLS=y 160 + CONFIG_TMPFS_POSIX_ACL=y 161 + CONFIG_TMPFS=y 162 + CONFIG_TRACER_SNAPSHOT_PER_CPU_SWAP=y 163 + CONFIG_TRANSPARENT_HUGEPAGE=y 164 + CONFIG_TUN=y 165 + CONFIG_UNIX=y 166 + CONFIG_UPROBES=y 167 + CONFIG_USELIB=y 168 + CONFIG_USER_NS=y 169 + CONFIG_VETH=y 170 + CONFIG_VIRTIO_BALLOON=y 171 + CONFIG_VIRTIO_BLK=y 172 + CONFIG_VIRTIO_CONSOLE=y 173 + CONFIG_VIRTIO_FS=y 174 + CONFIG_VIRTIO_INPUT=y 175 + CONFIG_VIRTIO_MMIO_CMDLINE_DEVICES=y 176 + CONFIG_VIRTIO_MMIO=y 177 + CONFIG_VIRTIO_NET=y 178 + CONFIG_VIRTIO_PCI=y 179 + CONFIG_VLAN_8021Q=y 180 + CONFIG_VSOCKETS=y 181 + CONFIG_XFRM_USER=y

-3

tools/testing/selftests/bpf/config.s390x

··· 82 82 CONFIG_MEMCG=y 83 83 CONFIG_MEMORY_HOTPLUG=y 84 84 CONFIG_MEMORY_HOTREMOVE=y 85 - CONFIG_MODULE_SIG=y 86 - CONFIG_MODULE_UNLOAD=y 87 - CONFIG_MODULES=y 88 85 CONFIG_NAMESPACES=y 89 86 CONFIG_NET=y 90 87 CONFIG_NET_9P=y

-1

tools/testing/selftests/bpf/config.x86_64

··· 18 18 CONFIG_BLK_DEV_RAM_SIZE=16384 19 19 CONFIG_BLK_DEV_THROTTLING=y 20 20 CONFIG_BONDING=y 21 - CONFIG_BOOTPARAM_HARDLOCKUP_PANIC=y 22 21 CONFIG_BOOTTIME_TRACING=y 23 22 CONFIG_BPF_JIT_ALWAYS_ON=y 24 23 CONFIG_BPF_KPROBE_OVERRIDE=y

+14 -6

tools/testing/selftests/bpf/prog_tests/bpf_iter.c

··· 941 941 { 942 942 __u64 val, expected_val = 0, res_first_val, first_val = 0; 943 943 DECLARE_LIBBPF_OPTS(bpf_iter_attach_opts, opts); 944 - __u32 expected_key = 0, res_first_key; 944 + __u32 key, expected_key = 0, res_first_key; 945 + int err, i, map_fd, hash_fd, iter_fd; 945 946 struct bpf_iter_bpf_array_map *skel; 946 947 union bpf_iter_link_info linfo; 947 - int err, i, map_fd, iter_fd; 948 948 struct bpf_link *link; 949 949 char buf[64] = {}; 950 950 int len, start; ··· 1001 1001 if (!ASSERT_EQ(skel->bss->val_sum, expected_val, "val_sum")) 1002 1002 goto close_iter; 1003 1003 1004 + hash_fd = bpf_map__fd(skel->maps.hashmap1); 1004 1005 for (i = 0; i < bpf_map__max_entries(skel->maps.arraymap1); i++) { 1005 1006 err = bpf_map_lookup_elem(map_fd, &i, &val); 1006 - if (!ASSERT_OK(err, "map_lookup")) 1007 - goto out; 1008 - if (!ASSERT_EQ(i, val, "invalid_val")) 1009 - goto out; 1007 + if (!ASSERT_OK(err, "map_lookup arraymap1")) 1008 + goto close_iter; 1009 + if (!ASSERT_EQ(i, val, "invalid_val arraymap1")) 1010 + goto close_iter; 1011 + 1012 + val = i + 4; 1013 + err = bpf_map_lookup_elem(hash_fd, &val, &key); 1014 + if (!ASSERT_OK(err, "map_lookup hashmap1")) 1015 + goto close_iter; 1016 + if (!ASSERT_EQ(key, val - 4, "invalid_val hashmap1")) 1017 + goto close_iter; 1010 1018 } 1011 1019 1012 1020 close_iter:

+171

tools/testing/selftests/bpf/prog_tests/cgrp_local_storage.c

··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* Copyright (c) 2022 Meta Platforms, Inc. and affiliates.*/ 3 + 4 + #define _GNU_SOURCE 5 + #include <unistd.h> 6 + #include <sys/syscall.h> 7 + #include <sys/types.h> 8 + #include <test_progs.h> 9 + #include "cgrp_ls_tp_btf.skel.h" 10 + #include "cgrp_ls_recursion.skel.h" 11 + #include "cgrp_ls_attach_cgroup.skel.h" 12 + #include "cgrp_ls_negative.skel.h" 13 + #include "network_helpers.h" 14 + 15 + struct socket_cookie { 16 + __u64 cookie_key; 17 + __u32 cookie_value; 18 + }; 19 + 20 + static void test_tp_btf(int cgroup_fd) 21 + { 22 + struct cgrp_ls_tp_btf *skel; 23 + long val1 = 1, val2 = 0; 24 + int err; 25 + 26 + skel = cgrp_ls_tp_btf__open_and_load(); 27 + if (!ASSERT_OK_PTR(skel, "skel_open_and_load")) 28 + return; 29 + 30 + /* populate a value in map_b */ 31 + err = bpf_map_update_elem(bpf_map__fd(skel->maps.map_b), &cgroup_fd, &val1, BPF_ANY); 32 + if (!ASSERT_OK(err, "map_update_elem")) 33 + goto out; 34 + 35 + /* check value */ 36 + err = bpf_map_lookup_elem(bpf_map__fd(skel->maps.map_b), &cgroup_fd, &val2); 37 + if (!ASSERT_OK(err, "map_lookup_elem")) 38 + goto out; 39 + if (!ASSERT_EQ(val2, 1, "map_lookup_elem, invalid val")) 40 + goto out; 41 + 42 + /* delete value */ 43 + err = bpf_map_delete_elem(bpf_map__fd(skel->maps.map_b), &cgroup_fd); 44 + if (!ASSERT_OK(err, "map_delete_elem")) 45 + goto out; 46 + 47 + skel->bss->target_pid = syscall(SYS_gettid); 48 + 49 + err = cgrp_ls_tp_btf__attach(skel); 50 + if (!ASSERT_OK(err, "skel_attach")) 51 + goto out; 52 + 53 + syscall(SYS_gettid); 54 + syscall(SYS_gettid); 55 + 56 + skel->bss->target_pid = 0; 57 + 58 + /* 3x syscalls: 1x attach and 2x gettid */ 59 + ASSERT_EQ(skel->bss->enter_cnt, 3, "enter_cnt"); 60 + ASSERT_EQ(skel->bss->exit_cnt, 3, "exit_cnt"); 61 + ASSERT_EQ(skel->bss->mismatch_cnt, 0, "mismatch_cnt"); 62 + out: 63 + cgrp_ls_tp_btf__destroy(skel); 64 + } 65 + 66 + static void test_attach_cgroup(int cgroup_fd) 67 + { 68 + int server_fd = 0, 
client_fd = 0, err = 0; 69 + socklen_t addr_len = sizeof(struct sockaddr_in6); 70 + struct cgrp_ls_attach_cgroup *skel; 71 + __u32 cookie_expected_value; 72 + struct sockaddr_in6 addr; 73 + struct socket_cookie val; 74 + 75 + skel = cgrp_ls_attach_cgroup__open_and_load(); 76 + if (!ASSERT_OK_PTR(skel, "skel_open")) 77 + return; 78 + 79 + skel->links.set_cookie = bpf_program__attach_cgroup( 80 + skel->progs.set_cookie, cgroup_fd); 81 + if (!ASSERT_OK_PTR(skel->links.set_cookie, "prog_attach")) 82 + goto out; 83 + 84 + skel->links.update_cookie_sockops = bpf_program__attach_cgroup( 85 + skel->progs.update_cookie_sockops, cgroup_fd); 86 + if (!ASSERT_OK_PTR(skel->links.update_cookie_sockops, "prog_attach")) 87 + goto out; 88 + 89 + skel->links.update_cookie_tracing = bpf_program__attach( 90 + skel->progs.update_cookie_tracing); 91 + if (!ASSERT_OK_PTR(skel->links.update_cookie_tracing, "prog_attach")) 92 + goto out; 93 + 94 + server_fd = start_server(AF_INET6, SOCK_STREAM, "::1", 0, 0); 95 + if (!ASSERT_GE(server_fd, 0, "start_server")) 96 + goto out; 97 + 98 + client_fd = connect_to_fd(server_fd, 0); 99 + if (!ASSERT_GE(client_fd, 0, "connect_to_fd")) 100 + goto close_server_fd; 101 + 102 + err = bpf_map_lookup_elem(bpf_map__fd(skel->maps.socket_cookies), 103 + &cgroup_fd, &val); 104 + if (!ASSERT_OK(err, "map_lookup(socket_cookies)")) 105 + goto close_client_fd; 106 + 107 + err = getsockname(client_fd, (struct sockaddr *)&addr, &addr_len); 108 + if (!ASSERT_OK(err, "getsockname")) 109 + goto close_client_fd; 110 + 111 + cookie_expected_value = (ntohs(addr.sin6_port) << 8) | 0xFF; 112 + ASSERT_EQ(val.cookie_value, cookie_expected_value, "cookie_value"); 113 + 114 + close_client_fd: 115 + close(client_fd); 116 + close_server_fd: 117 + close(server_fd); 118 + out: 119 + cgrp_ls_attach_cgroup__destroy(skel); 120 + } 121 + 122 + static void test_recursion(int cgroup_fd) 123 + { 124 + struct cgrp_ls_recursion *skel; 125 + int err; 126 + 127 + skel = 
cgrp_ls_recursion__open_and_load(); 128 + if (!ASSERT_OK_PTR(skel, "skel_open_and_load")) 129 + return; 130 + 131 + err = cgrp_ls_recursion__attach(skel); 132 + if (!ASSERT_OK(err, "skel_attach")) 133 + goto out; 134 + 135 + /* trigger sys_enter, make sure it does not cause deadlock */ 136 + syscall(SYS_gettid); 137 + 138 + out: 139 + cgrp_ls_recursion__destroy(skel); 140 + } 141 + 142 + static void test_negative(void) 143 + { 144 + struct cgrp_ls_negative *skel; 145 + 146 + skel = cgrp_ls_negative__open_and_load(); 147 + if (!ASSERT_ERR_PTR(skel, "skel_open_and_load")) { 148 + cgrp_ls_negative__destroy(skel); 149 + return; 150 + } 151 + } 152 + 153 + void test_cgrp_local_storage(void) 154 + { 155 + int cgroup_fd; 156 + 157 + cgroup_fd = test__join_cgroup("/cgrp_local_storage"); 158 + if (!ASSERT_GE(cgroup_fd, 0, "join_cgroup /cgrp_local_storage")) 159 + return; 160 + 161 + if (test__start_subtest("tp_btf")) 162 + test_tp_btf(cgroup_fd); 163 + if (test__start_subtest("attach_cgroup")) 164 + test_attach_cgroup(cgroup_fd); 165 + if (test__start_subtest("recursion")) 166 + test_recursion(cgroup_fd); 167 + if (test__start_subtest("negative")) 168 + test_negative(); 169 + 170 + close(cgroup_fd); 171 + }
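The `attach_cgroup` subtest compares the stored cookie against `(ntohs(addr.sin6_port) << 8) | 0xFF`. The BPF side (`cgrp_ls_attach_cgroup`) is not part of this hunk. The formula suggests that one program stores `0xFF` and another ORs in the client's local port shifted left by 8, but that is an inference. The helper below, `expected_cookie()`, is a hypothetical userspace sketch of only the expected-value computation.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>

/* Expected cookie: local port (host byte order) in bits 8 and up,
 * with 0xFF in the low byte. */
static uint32_t expected_cookie(const struct sockaddr_in6 *addr)
{
	return ((uint32_t)ntohs(addr->sin6_port) << 8) | 0xFF;
}
```

`sin6_port` is in network byte order, so it must go through `ntohs()` before shifting, exactly as the selftest does.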

+89

tools/testing/selftests/bpf/prog_tests/kprobe_multi_testmod_test.c

··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + #include <test_progs.h> 3 + #include "kprobe_multi.skel.h" 4 + #include "trace_helpers.h" 5 + #include "bpf/libbpf_internal.h" 6 + 7 + static void kprobe_multi_testmod_check(struct kprobe_multi *skel) 8 + { 9 + ASSERT_EQ(skel->bss->kprobe_testmod_test1_result, 1, "kprobe_test1_result"); 10 + ASSERT_EQ(skel->bss->kprobe_testmod_test2_result, 1, "kprobe_test2_result"); 11 + ASSERT_EQ(skel->bss->kprobe_testmod_test3_result, 1, "kprobe_test3_result"); 12 + 13 + ASSERT_EQ(skel->bss->kretprobe_testmod_test1_result, 1, "kretprobe_test1_result"); 14 + ASSERT_EQ(skel->bss->kretprobe_testmod_test2_result, 1, "kretprobe_test2_result"); 15 + ASSERT_EQ(skel->bss->kretprobe_testmod_test3_result, 1, "kretprobe_test3_result"); 16 + } 17 + 18 + static void test_testmod_attach_api(struct bpf_kprobe_multi_opts *opts) 19 + { 20 + struct kprobe_multi *skel = NULL; 21 + 22 + skel = kprobe_multi__open_and_load(); 23 + if (!ASSERT_OK_PTR(skel, "fentry_raw_skel_load")) 24 + return; 25 + 26 + skel->bss->pid = getpid(); 27 + 28 + skel->links.test_kprobe_testmod = bpf_program__attach_kprobe_multi_opts( 29 + skel->progs.test_kprobe_testmod, 30 + NULL, opts); 31 + if (!skel->links.test_kprobe_testmod) 32 + goto cleanup; 33 + 34 + opts->retprobe = true; 35 + skel->links.test_kretprobe_testmod = bpf_program__attach_kprobe_multi_opts( 36 + skel->progs.test_kretprobe_testmod, 37 + NULL, opts); 38 + if (!skel->links.test_kretprobe_testmod) 39 + goto cleanup; 40 + 41 + ASSERT_OK(trigger_module_test_read(1), "trigger_read"); 42 + kprobe_multi_testmod_check(skel); 43 + 44 + cleanup: 45 + kprobe_multi__destroy(skel); 46 + } 47 + 48 + static void test_testmod_attach_api_addrs(void) 49 + { 50 + LIBBPF_OPTS(bpf_kprobe_multi_opts, opts); 51 + unsigned long long addrs[3]; 52 + 53 + addrs[0] = ksym_get_addr("bpf_testmod_fentry_test1"); 54 + ASSERT_NEQ(addrs[0], 0, "ksym_get_addr"); 55 + addrs[1] = ksym_get_addr("bpf_testmod_fentry_test2"); 56 + 
ASSERT_NEQ(addrs[1], 0, "ksym_get_addr"); 57 + addrs[2] = ksym_get_addr("bpf_testmod_fentry_test3"); 58 + ASSERT_NEQ(addrs[2], 0, "ksym_get_addr"); 59 + 60 + opts.addrs = (const unsigned long *) addrs; 61 + opts.cnt = ARRAY_SIZE(addrs); 62 + 63 + test_testmod_attach_api(&opts); 64 + } 65 + 66 + static void test_testmod_attach_api_syms(void) 67 + { 68 + LIBBPF_OPTS(bpf_kprobe_multi_opts, opts); 69 + const char *syms[3] = { 70 + "bpf_testmod_fentry_test1", 71 + "bpf_testmod_fentry_test2", 72 + "bpf_testmod_fentry_test3", 73 + }; 74 + 75 + opts.syms = syms; 76 + opts.cnt = ARRAY_SIZE(syms); 77 + test_testmod_attach_api(&opts); 78 + } 79 + 80 + void serial_test_kprobe_multi_testmod_test(void) 81 + { 82 + if (!ASSERT_OK(load_kallsyms_refresh(), "load_kallsyms_refresh")) 83 + return; 84 + 85 + if (test__start_subtest("testmod_attach_api_syms")) 86 + test_testmod_attach_api_syms(); 87 + if (test__start_subtest("testmod_attach_api_addrs")) 88 + test_testmod_attach_api_addrs(); 89 + }

+8

tools/testing/selftests/bpf/prog_tests/libbpf_str.c

··· 139 139 snprintf(buf, sizeof(buf), "BPF_MAP_TYPE_%s", map_type_str); 140 140 uppercase(buf); 141 141 142 + /* Special case for map_type_name BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED 143 + * where it and BPF_MAP_TYPE_CGROUP_STORAGE have the same enum value 144 + * (map_type). For this enum value, libbpf_bpf_map_type_str() picks 145 + * BPF_MAP_TYPE_CGROUP_STORAGE. 146 + */ 147 + if (strcmp(map_type_name, "BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED") == 0) 148 + continue; 149 + 142 150 ASSERT_STREQ(buf, map_type_name, "exp_str_value"); 143 151 } 144 152
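The special case is needed because two enumerators share one value, and a value-indexed name table can hold only one string per value. The minimal C sketch below shows that effect. It uses hypothetical `DEMO_*` names rather than the real uapi enum, and the numeric values are illustrative only.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

enum demo_map_type {
	DEMO_MAP_TYPE_HASH = 1,
	DEMO_MAP_TYPE_CGROUP_STORAGE = 2,
	/* Alias: a second name for the same value. */
	DEMO_MAP_TYPE_CGROUP_STORAGE_DEPRECATED = DEMO_MAP_TYPE_CGROUP_STORAGE,
};

/* Value-indexed table: each value can map to only one string. */
static const char *const demo_map_type_name[] = {
	[DEMO_MAP_TYPE_HASH] = "hash",
	[DEMO_MAP_TYPE_CGROUP_STORAGE] = "cgroup_storage",
};

static const char *demo_map_type_str(enum demo_map_type t)
{
	if ((size_t)t >= sizeof(demo_map_type_name) / sizeof(demo_map_type_name[0]))
		return NULL;
	return demo_map_type_name[t];
}

/* Rebuild the enumerator name the way the selftest does:
 * "DEMO_MAP_TYPE_" followed by the uppercased table string. */
static void demo_enum_name(enum demo_map_type t, char *buf, size_t len)
{
	const char *s = demo_map_type_str(t);

	snprintf(buf, len, "DEMO_MAP_TYPE_%s", s ? s : "UNKNOWN");
	for (; *buf; buf++)
		*buf = (char)toupper((unsigned char)*buf);
}
```

For the alias, the rebuilt name is always the non-deprecated one, so a round-trip comparison cannot succeed and the entry has to be skipped.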

+7

tools/testing/selftests/bpf/prog_tests/module_attach.c

··· 103 103 ASSERT_ERR(delete_module("bpf_testmod", 0), "delete_module"); 104 104 bpf_link__destroy(link); 105 105 106 + link = bpf_program__attach(skel->progs.kprobe_multi); 107 + if (!ASSERT_OK_PTR(link, "attach_kprobe_multi")) 108 + goto cleanup; 109 + 110 + ASSERT_ERR(delete_module("bpf_testmod", 0), "delete_module"); 111 + bpf_link__destroy(link); 112 + 106 113 cleanup: 107 114 test_module_attach__destroy(skel); 108 115 }

+65 -1

tools/testing/selftests/bpf/prog_tests/ringbuf.c

··· 13 13 #include <linux/perf_event.h> 14 14 #include <linux/ring_buffer.h> 15 15 #include "test_ringbuf.lskel.h" 16 + #include "test_ringbuf_map_key.lskel.h" 16 17 17 18 #define EDONE 7777 18 19 ··· 59 58 } 60 59 } 61 60 61 + static struct test_ringbuf_map_key_lskel *skel_map_key; 62 62 static struct test_ringbuf_lskel *skel; 63 63 static struct ring_buffer *ringbuf; 64 64 ··· 83 81 return (void *)(long)ring_buffer__poll(ringbuf, timeout); 84 82 } 85 83 86 - void test_ringbuf(void) 84 + static void ringbuf_subtest(void) 87 85 { 88 86 const size_t rec_sz = BPF_RINGBUF_HDR_SZ + sizeof(struct sample); 89 87 pthread_t thread; ··· 298 296 cleanup: 299 297 ring_buffer__free(ringbuf); 300 298 test_ringbuf_lskel__destroy(skel); 299 + } 300 + 301 + static int process_map_key_sample(void *ctx, void *data, size_t len) 302 + { 303 + struct sample *s; 304 + int err, val; 305 + 306 + s = data; 307 + switch (s->seq) { 308 + case 1: 309 + ASSERT_EQ(s->value, 42, "sample_value"); 310 + err = bpf_map_lookup_elem(skel_map_key->maps.hash_map.map_fd, 311 + s, &val); 312 + ASSERT_OK(err, "hash_map bpf_map_lookup_elem"); 313 + ASSERT_EQ(val, 1, "hash_map val"); 314 + return -EDONE; 315 + default: 316 + return 0; 317 + } 318 + } 319 + 320 + static void ringbuf_map_key_subtest(void) 321 + { 322 + int err; 323 + 324 + skel_map_key = test_ringbuf_map_key_lskel__open(); 325 + if (!ASSERT_OK_PTR(skel_map_key, "test_ringbuf_map_key_lskel__open")) 326 + return; 327 + 328 + skel_map_key->maps.ringbuf.max_entries = getpagesize(); 329 + skel_map_key->bss->pid = getpid(); 330 + 331 + err = test_ringbuf_map_key_lskel__load(skel_map_key); 332 + if (!ASSERT_OK(err, "test_ringbuf_map_key_lskel__load")) 333 + goto cleanup; 334 + 335 + ringbuf = ring_buffer__new(skel_map_key->maps.ringbuf.map_fd, 336 + process_map_key_sample, NULL, NULL); 337 + if (!ASSERT_OK_PTR(ringbuf, "ring_buffer__new")) 338 + goto cleanup; 339 + 340 + err = test_ringbuf_map_key_lskel__attach(skel_map_key); 341 + if 
(!ASSERT_OK(err, "test_ringbuf_map_key_lskel__attach")) 342 + goto cleanup_ringbuf; 343 + 344 + syscall(__NR_getpgid); 345 + ASSERT_EQ(skel_map_key->bss->seq, 1, "skel_map_key->bss->seq"); 346 + err = ring_buffer__poll(ringbuf, -1); 347 + ASSERT_EQ(err, -EDONE, "ring_buffer__poll"); 348 + 349 + cleanup_ringbuf: 350 + ring_buffer__free(ringbuf); 351 + cleanup: 352 + test_ringbuf_map_key_lskel__destroy(skel_map_key); 353 + } 354 + 355 + void test_ringbuf(void) 356 + { 357 + if (test__start_subtest("ringbuf")) 358 + ringbuf_subtest(); 359 + if (test__start_subtest("ringbuf_map_key")) 360 + ringbuf_map_key_subtest(); 301 361 }
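`process_map_key_sample()` returns `-EDONE` after checking the first sample, and the subtest then expects `ring_buffer__poll()` to return `-EDONE`. This relies on libbpf stopping consumption and propagating a negative callback return value to the caller. The C sketch below models only that control flow over an in-memory array. `demo_consume()`, `demo_stop_on_seq1()` and `struct demo_sample` are hypothetical and do not touch a real ring buffer.

```c
#include <assert.h>
#include <stddef.h>

#define EDONE 7777

struct demo_sample {
	int seq;
	long value;
};

typedef int (*demo_sample_fn)(void *ctx, void *data, size_t len);

/* Feed samples to cb in order. A negative return stops consumption and is
 * propagated; otherwise the number of samples consumed is returned. */
static int demo_consume(struct demo_sample *samples, int n,
			demo_sample_fn cb, void *ctx)
{
	int i, err;

	for (i = 0; i < n; i++) {
		err = cb(ctx, &samples[i], sizeof(samples[i]));
		if (err < 0)
			return err;
	}
	return n;
}

/* Like process_map_key_sample(): stop as soon as seq 1 has been seen.
 * ctx counts how many samples the callback was invoked for. */
static int demo_stop_on_seq1(void *ctx, void *data, size_t len)
{
	struct demo_sample *s = data;
	int *seen = ctx;

	(void)len;
	(*seen)++;
	return s->seq == 1 ? -EDONE : 0;
}
```

Using a distinctive sentinel such as `EDONE` lets the caller tell "callback asked to stop" apart from a genuine error code.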

+10 -1

tools/testing/selftests/bpf/prog_tests/skeleton.c

··· 2 2 /* Copyright (c) 2019 Facebook */ 3 3 4 4 #include <test_progs.h> 5 + #include <sys/mman.h> 5 6 6 7 struct s { 7 8 int a; ··· 23 22 struct test_skeleton__kconfig *kcfg; 24 23 const void *elf_bytes; 25 24 size_t elf_bytes_sz = 0; 26 - int i; 25 + void *m; 26 + int i, fd; 27 27 28 28 skel = test_skeleton__open(); 29 29 if (CHECK(!skel, "skel_open", "failed to open skeleton\n")) ··· 125 123 ASSERT_EQ(skel->bss->out_mostly_var, 123, "out_mostly_var"); 126 124 127 125 ASSERT_EQ(bss->huge_arr[ARRAY_SIZE(bss->huge_arr) - 1], 123, "huge_arr"); 126 + 127 + fd = bpf_map__fd(skel->maps.data_non_mmapable); 128 + m = mmap(NULL, getpagesize(), PROT_READ, MAP_SHARED, fd, 0); 129 + if (!ASSERT_EQ(m, MAP_FAILED, "unexpected_mmap_success")) 130 + munmap(m, getpagesize()); 131 + 132 + ASSERT_EQ(bpf_map__map_flags(skel->maps.data_non_mmapable), 0, "non_mmap_flags"); 128 133 129 134 elf_bytes = test_skeleton__elf_bytes(&elf_bytes_sz); 130 135 ASSERT_OK_PTR(elf_bytes, "elf_bytes");
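The new skeleton check treats `MAP_FAILED` as the expected outcome and calls `munmap()` only if the mapping unexpectedly succeeds. The same pattern can be exercised in userspace against any file descriptor whose file does not support `mmap()`. The sketch below uses a pipe as a stand-in for the non-mmapable map fd. This is an assumption made only so the example runs without BPF; on Linux, mapping a pipe fails. `expect_mmap_failure()` is a hypothetical name.

```c
#include <assert.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if mapping fd failed as expected, 0 if it unexpectedly
 * succeeded (the stray mapping is released first). */
static int expect_mmap_failure(int fd)
{
	long pgsz = sysconf(_SC_PAGESIZE);
	void *m = mmap(NULL, (size_t)pgsz, PROT_READ, MAP_SHARED, fd, 0);

	if (m != MAP_FAILED) {
		munmap(m, (size_t)pgsz);
		return 0;
	}
	return 1;
}
```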

+159 -5

tools/testing/selftests/bpf/prog_tests/task_local_storage.c

··· 3 3 4 4 #define _GNU_SOURCE /* See feature_test_macros(7) */ 5 5 #include <unistd.h> 6 + #include <sched.h> 7 + #include <pthread.h> 6 8 #include <sys/syscall.h> /* For SYS_xxx definitions */ 7 9 #include <sys/types.h> 8 10 #include <test_progs.h> 11 + #include "task_local_storage_helpers.h" 9 12 #include "task_local_storage.skel.h" 10 13 #include "task_local_storage_exit_creds.skel.h" 11 14 #include "task_ls_recursion.skel.h" 15 + #include "task_storage_nodeadlock.skel.h" 12 16 13 17 static void test_sys_enter_exit(void) 14 18 { ··· 43 39 static void test_exit_creds(void) 44 40 { 45 41 struct task_local_storage_exit_creds *skel; 46 - int err; 42 + int err, run_count, sync_rcu_calls = 0; 43 + const int MAX_SYNC_RCU_CALLS = 1000; 47 44 48 45 skel = task_local_storage_exit_creds__open_and_load(); 49 46 if (!ASSERT_OK_PTR(skel, "skel_open_and_load")) ··· 58 53 if (CHECK_FAIL(system("ls > /dev/null"))) 59 54 goto out; 60 55 61 - /* sync rcu to make sure exit_creds() is called for "ls" */ 62 - kern_sync_rcu(); 56 + /* kern_sync_rcu is not enough on its own as the read section we want 57 + * to wait for may start after we enter synchronize_rcu, so our call 58 + * won't wait for the section to finish. Loop on the run counter 59 + * as well to ensure the program has run. 
60 + */ 61 + do { 62 + kern_sync_rcu(); 63 + run_count = __atomic_load_n(&skel->bss->run_count, __ATOMIC_SEQ_CST); 64 + } while (run_count == 0 && ++sync_rcu_calls < MAX_SYNC_RCU_CALLS); 65 + 66 + ASSERT_NEQ(sync_rcu_calls, MAX_SYNC_RCU_CALLS, 67 + "sync_rcu count too high"); 68 + ASSERT_NEQ(run_count, 0, "run_count"); 63 69 ASSERT_EQ(skel->bss->valid_ptr_count, 0, "valid_ptr_count"); 64 70 ASSERT_NEQ(skel->bss->null_ptr_count, 0, "null_ptr_count"); 65 71 out: ··· 79 63 80 64 static void test_recursion(void) 81 65 { 66 + int err, map_fd, prog_fd, task_fd; 82 67 struct task_ls_recursion *skel; 83 - int err; 68 + struct bpf_prog_info info; 69 + __u32 info_len = sizeof(info); 70 + long value; 71 + 72 + task_fd = sys_pidfd_open(getpid(), 0); 73 + if (!ASSERT_NEQ(task_fd, -1, "sys_pidfd_open")) 74 + return; 84 75 85 76 skel = task_ls_recursion__open_and_load(); 86 77 if (!ASSERT_OK_PTR(skel, "skel_open_and_load")) 87 - return; 78 + goto out; 88 79 89 80 err = task_ls_recursion__attach(skel); 90 81 if (!ASSERT_OK(err, "skel_attach")) 91 82 goto out; 92 83 93 84 /* trigger sys_enter, make sure it does not cause deadlock */ 85 + skel->bss->test_pid = getpid(); 94 86 syscall(SYS_gettid); 87 + skel->bss->test_pid = 0; 88 + task_ls_recursion__detach(skel); 89 + 90 + /* Refer to the comment in BPF_PROG(on_update) for 91 + * the explanation on the value 201 and 100. 
92 + */ 93 + map_fd = bpf_map__fd(skel->maps.map_a); 94 + err = bpf_map_lookup_elem(map_fd, &task_fd, &value); 95 + ASSERT_OK(err, "lookup map_a"); 96 + ASSERT_EQ(value, 201, "map_a value"); 97 + ASSERT_EQ(skel->bss->nr_del_errs, 1, "bpf_task_storage_delete busy"); 98 + 99 + map_fd = bpf_map__fd(skel->maps.map_b); 100 + err = bpf_map_lookup_elem(map_fd, &task_fd, &value); 101 + ASSERT_OK(err, "lookup map_b"); 102 + ASSERT_EQ(value, 100, "map_b value"); 103 + 104 + prog_fd = bpf_program__fd(skel->progs.on_lookup); 105 + memset(&info, 0, sizeof(info)); 106 + err = bpf_obj_get_info_by_fd(prog_fd, &info, &info_len); 107 + ASSERT_OK(err, "get prog info"); 108 + ASSERT_GT(info.recursion_misses, 0, "on_lookup prog recursion"); 109 + 110 + prog_fd = bpf_program__fd(skel->progs.on_update); 111 + memset(&info, 0, sizeof(info)); 112 + err = bpf_obj_get_info_by_fd(prog_fd, &info, &info_len); 113 + ASSERT_OK(err, "get prog info"); 114 + ASSERT_EQ(info.recursion_misses, 0, "on_update prog recursion"); 115 + 116 + prog_fd = bpf_program__fd(skel->progs.on_enter); 117 + memset(&info, 0, sizeof(info)); 118 + err = bpf_obj_get_info_by_fd(prog_fd, &info, &info_len); 119 + ASSERT_OK(err, "get prog info"); 120 + ASSERT_EQ(info.recursion_misses, 0, "on_enter prog recursion"); 95 121 96 122 out: 123 + close(task_fd); 97 124 task_ls_recursion__destroy(skel); 125 + } 126 + 127 + static bool stop; 128 + 129 + static void waitall(const pthread_t *tids, int nr) 130 + { 131 + int i; 132 + 133 + stop = true; 134 + for (i = 0; i < nr; i++) 135 + pthread_join(tids[i], NULL); 136 + } 137 + 138 + static void *sock_create_loop(void *arg) 139 + { 140 + struct task_storage_nodeadlock *skel = arg; 141 + int fd; 142 + 143 + while (!stop) { 144 + fd = socket(AF_INET, SOCK_STREAM, 0); 145 + close(fd); 146 + if (skel->bss->nr_get_errs || skel->bss->nr_del_errs) 147 + stop = true; 148 + } 149 + 150 + return NULL; 151 + } 152 + 153 + static void test_nodeadlock(void) 154 + { 155 + struct 
task_storage_nodeadlock *skel; 156 + struct bpf_prog_info info = {}; 157 + __u32 info_len = sizeof(info); 158 + const int nr_threads = 32; 159 + pthread_t tids[nr_threads]; 160 + int i, prog_fd, err; 161 + cpu_set_t old, new; 162 + 163 + /* Pin all threads to one cpu to increase the chance of preemption 164 + * in a sleepable bpf prog. 165 + */ 166 + CPU_ZERO(&new); 167 + CPU_SET(0, &new); 168 + err = sched_getaffinity(getpid(), sizeof(old), &old); 169 + if (!ASSERT_OK(err, "getaffinity")) 170 + return; 171 + err = sched_setaffinity(getpid(), sizeof(new), &new); 172 + if (!ASSERT_OK(err, "setaffinity")) 173 + return; 174 + 175 + skel = task_storage_nodeadlock__open_and_load(); 176 + if (!ASSERT_OK_PTR(skel, "open_and_load")) 177 + goto done; 178 + 179 + /* Unnecessary recursion and deadlock detection are reproducible 180 + * in the preemptible kernel. 181 + */ 182 + if (!skel->kconfig->CONFIG_PREEMPT) { 183 + test__skip(); 184 + goto done; 185 + } 186 + 187 + err = task_storage_nodeadlock__attach(skel); 188 + ASSERT_OK(err, "attach prog"); 189 + 190 + for (i = 0; i < nr_threads; i++) { 191 + err = pthread_create(&tids[i], NULL, sock_create_loop, skel); 192 + if (err) { 193 + /* Only assert once here to avoid excessive 194 + * PASS printing during test failure. 
195 + */ 196 + ASSERT_OK(err, "pthread_create"); 197 + waitall(tids, i); 198 + goto done; 199 + } 200 + } 201 + 202 + /* With 32 threads, 1s is enough to reproduce the issue */ 203 + sleep(1); 204 + waitall(tids, nr_threads); 205 + 206 + info_len = sizeof(info); 207 + prog_fd = bpf_program__fd(skel->progs.socket_post_create); 208 + err = bpf_obj_get_info_by_fd(prog_fd, &info, &info_len); 209 + ASSERT_OK(err, "get prog info"); 210 + ASSERT_EQ(info.recursion_misses, 0, "prog recursion"); 211 + 212 + ASSERT_EQ(skel->bss->nr_get_errs, 0, "bpf_task_storage_get busy"); 213 + ASSERT_EQ(skel->bss->nr_del_errs, 0, "bpf_task_storage_delete busy"); 214 + 215 + done: 216 + task_storage_nodeadlock__destroy(skel); 217 + sched_setaffinity(getpid(), sizeof(old), &old); 98 218 } 99 219 100 220 void test_task_local_storage(void) ··· 241 89 test_exit_creds(); 242 90 if (test__start_subtest("recursion")) 243 91 test_recursion(); 92 + if (test__start_subtest("nodeadlock")) 93 + test_nodeadlock(); 244 94 }
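
`test_nodeadlock()` above pins the whole process to one CPU, so its 32 socket-creating threads preempt each other inside the sleepable LSM program. It then restores the saved affinity mask at the end. Below is a minimal userspace sketch of that pin/spawn/stop/restore pattern, assuming Linux with glibc. It differs from the selftest in three deliberate ways:

- It pins to the first CPU in the current mask instead of hard-coding CPU 0, so it still works in a restricted cpuset.
- It uses a C11 `atomic_bool` for the stop flag, where the selftest uses a plain `static bool`.
- Its workers only count loop iterations instead of calling `socket()`.

`run_pinned_workers`, `worker` and `iterations` are hypothetical names.

```c
#define _GNU_SOURCE /* cpu_set_t, CPU_* macros, sched_{get,set}affinity */
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <unistd.h>

#define NR_THREADS 4

/* Written by the joining thread, polled by the workers. */
static atomic_bool stop;
static atomic_long iterations;

static void *worker(void *arg)
{
	(void)arg;
	/* Stand-in for the socket()/close() loop in sock_create_loop(). */
	while (!atomic_load(&stop))
		atomic_fetch_add(&iterations, 1);
	return NULL;
}

static void waitall(const pthread_t *tids, int nr)
{
	atomic_store(&stop, true);
	for (int i = 0; i < nr; i++)
		pthread_join(tids[i], NULL);
}

/* Returns 0 on success, -1 on error; restores the saved mask once changed. */
static int run_pinned_workers(void)
{
	pthread_t tids[NR_THREADS];
	cpu_set_t old, new;
	int i, cpu, ret = -1;

	if (sched_getaffinity(0, sizeof(old), &old))
		return -1;

	/* Pick the first CPU we may run on (the selftest hard-codes CPU 0). */
	for (cpu = 0; cpu < CPU_SETSIZE && !CPU_ISSET(cpu, &old); cpu++)
		;
	if (cpu == CPU_SETSIZE)
		return -1;
	CPU_ZERO(&new);
	CPU_SET(cpu, &new);
	if (sched_setaffinity(0, sizeof(new), &new))
		return -1;

	atomic_store(&stop, false);
	/* New threads inherit the single-CPU mask of the creating thread. */
	for (i = 0; i < NR_THREADS; i++) {
		if (pthread_create(&tids[i], NULL, worker, NULL)) {
			waitall(tids, i); /* join only the threads that started */
			goto restore;
		}
	}

	usleep(100 * 1000); /* let the workers contend for the one CPU */
	waitall(tids, NR_THREADS);
	ret = 0;

restore:
	sched_setaffinity(0, sizeof(old), &old);
	return ret;
}
```

Restoring the saved mask on every exit path after `sched_setaffinity()` succeeds mirrors the `done:` label in `test_nodeadlock()`. A caller would check that `run_pinned_workers()` returns 0 and then read `iterations`.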

+20 -1

tools/testing/selftests/bpf/progs/bpf_iter_bpf_array_map.c

··· 19 19 __type(value, __u64); 20 20 } arraymap1 SEC(".maps"); 21 21 22 + struct { 23 + __uint(type, BPF_MAP_TYPE_HASH); 24 + __uint(max_entries, 10); 25 + __type(key, __u64); 26 + __type(value, __u32); 27 + } hashmap1 SEC(".maps"); 28 + 22 29 __u32 key_sum = 0; 23 30 __u64 val_sum = 0; 24 31 25 32 SEC("iter/bpf_map_elem") 26 33 int dump_bpf_array_map(struct bpf_iter__bpf_map_elem *ctx) 27 34 { 28 - __u32 *key = ctx->key; 35 + __u32 *hmap_val, *key = ctx->key; 29 36 __u64 *val = ctx->value; 30 37 31 38 if (key == (void *)0 || val == (void *)0) ··· 42 35 bpf_seq_write(ctx->meta->seq, val, sizeof(__u64)); 43 36 key_sum += *key; 44 37 val_sum += *val; 38 + 39 + /* workaround - It's necessary to do this convoluted (val, key) 40 + * write into hashmap1, instead of simply doing 41 + * bpf_map_update_elem(&hashmap1, val, key, BPF_ANY); 42 + * because key has MEM_RDONLY flag and bpf_map_update elem expects 43 + * types without this flag 44 + */ 45 + bpf_map_update_elem(&hashmap1, val, val, BPF_ANY); 46 + hmap_val = bpf_map_lookup_elem(&hashmap1, val); 47 + if (hmap_val) 48 + *hmap_val = *key; 49 + 45 50 *val = *key; 46 51 return 0; 47 52 }

+101

tools/testing/selftests/bpf/progs/cgrp_ls_attach_cgroup.c

··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */ 3 + 4 + #include "vmlinux.h" 5 + #include <bpf/bpf_helpers.h> 6 + #include <bpf/bpf_tracing.h> 7 + #include "bpf_tracing_net.h" 8 + 9 + char _license[] SEC("license") = "GPL"; 10 + 11 + struct socket_cookie { 12 + __u64 cookie_key; 13 + __u64 cookie_value; 14 + }; 15 + 16 + struct { 17 + __uint(type, BPF_MAP_TYPE_CGRP_STORAGE); 18 + __uint(map_flags, BPF_F_NO_PREALLOC); 19 + __type(key, int); 20 + __type(value, struct socket_cookie); 21 + } socket_cookies SEC(".maps"); 22 + 23 + SEC("cgroup/connect6") 24 + int set_cookie(struct bpf_sock_addr *ctx) 25 + { 26 + struct socket_cookie *p; 27 + struct tcp_sock *tcp_sk; 28 + struct bpf_sock *sk; 29 + 30 + if (ctx->family != AF_INET6 || ctx->user_family != AF_INET6) 31 + return 1; 32 + 33 + sk = ctx->sk; 34 + if (!sk) 35 + return 1; 36 + 37 + tcp_sk = bpf_skc_to_tcp_sock(sk); 38 + if (!tcp_sk) 39 + return 1; 40 + 41 + p = bpf_cgrp_storage_get(&socket_cookies, 42 + tcp_sk->inet_conn.icsk_inet.sk.sk_cgrp_data.cgroup, 0, 43 + BPF_LOCAL_STORAGE_GET_F_CREATE); 44 + if (!p) 45 + return 1; 46 + 47 + p->cookie_value = 0xF; 48 + p->cookie_key = bpf_get_socket_cookie(ctx); 49 + return 1; 50 + } 51 + 52 + SEC("sockops") 53 + int update_cookie_sockops(struct bpf_sock_ops *ctx) 54 + { 55 + struct socket_cookie *p; 56 + struct tcp_sock *tcp_sk; 57 + struct bpf_sock *sk; 58 + 59 + if (ctx->family != AF_INET6 || ctx->op != BPF_SOCK_OPS_TCP_CONNECT_CB) 60 + return 1; 61 + 62 + sk = ctx->sk; 63 + if (!sk) 64 + return 1; 65 + 66 + tcp_sk = bpf_skc_to_tcp_sock(sk); 67 + if (!tcp_sk) 68 + return 1; 69 + 70 + p = bpf_cgrp_storage_get(&socket_cookies, 71 + tcp_sk->inet_conn.icsk_inet.sk.sk_cgrp_data.cgroup, 0, 0); 72 + if (!p) 73 + return 1; 74 + 75 + if (p->cookie_key != bpf_get_socket_cookie(ctx)) 76 + return 1; 77 + 78 + p->cookie_value |= (ctx->local_port << 8); 79 + return 1; 80 + } 81 + 82 + SEC("fexit/inet_stream_connect") 83 + 
int BPF_PROG(update_cookie_tracing, struct socket *sock, 84 + struct sockaddr *uaddr, int addr_len, int flags) 85 + { 86 + struct socket_cookie *p; 87 + struct tcp_sock *tcp_sk; 88 + 89 + if (uaddr->sa_family != AF_INET6) 90 + return 0; 91 + 92 + p = bpf_cgrp_storage_get(&socket_cookies, sock->sk->sk_cgrp_data.cgroup, 0, 0); 93 + if (!p) 94 + return 0; 95 + 96 + if (p->cookie_key != bpf_get_socket_cookie(sock->sk)) 97 + return 0; 98 + 99 + p->cookie_value |= 0xF0; 100 + return 0; 101 + }

+26

tools/testing/selftests/bpf/progs/cgrp_ls_negative.c

··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */ 3 + 4 + #include "vmlinux.h" 5 + #include <bpf/bpf_helpers.h> 6 + #include <bpf/bpf_tracing.h> 7 + 8 + char _license[] SEC("license") = "GPL"; 9 + 10 + struct { 11 + __uint(type, BPF_MAP_TYPE_CGRP_STORAGE); 12 + __uint(map_flags, BPF_F_NO_PREALLOC); 13 + __type(key, int); 14 + __type(value, long); 15 + } map_a SEC(".maps"); 16 + 17 + SEC("tp_btf/sys_enter") 18 + int BPF_PROG(on_enter, struct pt_regs *regs, long id) 19 + { 20 + struct task_struct *task; 21 + 22 + task = bpf_get_current_task_btf(); 23 + (void)bpf_cgrp_storage_get(&map_a, (struct cgroup *)task, 0, 24 + BPF_LOCAL_STORAGE_GET_F_CREATE); 25 + return 0; 26 + }

+70

tools/testing/selftests/bpf/progs/cgrp_ls_recursion.c

··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */ 3 + 4 + #include "vmlinux.h" 5 + #include <bpf/bpf_helpers.h> 6 + #include <bpf/bpf_tracing.h> 7 + 8 + char _license[] SEC("license") = "GPL"; 9 + 10 + struct { 11 + __uint(type, BPF_MAP_TYPE_CGRP_STORAGE); 12 + __uint(map_flags, BPF_F_NO_PREALLOC); 13 + __type(key, int); 14 + __type(value, long); 15 + } map_a SEC(".maps"); 16 + 17 + struct { 18 + __uint(type, BPF_MAP_TYPE_CGRP_STORAGE); 19 + __uint(map_flags, BPF_F_NO_PREALLOC); 20 + __type(key, int); 21 + __type(value, long); 22 + } map_b SEC(".maps"); 23 + 24 + SEC("fentry/bpf_local_storage_lookup") 25 + int BPF_PROG(on_lookup) 26 + { 27 + struct task_struct *task = bpf_get_current_task_btf(); 28 + 29 + bpf_cgrp_storage_delete(&map_a, task->cgroups->dfl_cgrp); 30 + bpf_cgrp_storage_delete(&map_b, task->cgroups->dfl_cgrp); 31 + return 0; 32 + } 33 + 34 + SEC("fentry/bpf_local_storage_update") 35 + int BPF_PROG(on_update) 36 + { 37 + struct task_struct *task = bpf_get_current_task_btf(); 38 + long *ptr; 39 + 40 + ptr = bpf_cgrp_storage_get(&map_a, task->cgroups->dfl_cgrp, 0, 41 + BPF_LOCAL_STORAGE_GET_F_CREATE); 42 + if (ptr) 43 + *ptr += 1; 44 + 45 + ptr = bpf_cgrp_storage_get(&map_b, task->cgroups->dfl_cgrp, 0, 46 + BPF_LOCAL_STORAGE_GET_F_CREATE); 47 + if (ptr) 48 + *ptr += 1; 49 + 50 + return 0; 51 + } 52 + 53 + SEC("tp_btf/sys_enter") 54 + int BPF_PROG(on_enter, struct pt_regs *regs, long id) 55 + { 56 + struct task_struct *task; 57 + long *ptr; 58 + 59 + task = bpf_get_current_task_btf(); 60 + ptr = bpf_cgrp_storage_get(&map_a, task->cgroups->dfl_cgrp, 0, 61 + BPF_LOCAL_STORAGE_GET_F_CREATE); 62 + if (ptr) 63 + *ptr = 200; 64 + 65 + ptr = bpf_cgrp_storage_get(&map_b, task->cgroups->dfl_cgrp, 0, 66 + BPF_LOCAL_STORAGE_GET_F_CREATE); 67 + if (ptr) 68 + *ptr = 100; 69 + return 0; 70 + }

+88

tools/testing/selftests/bpf/progs/cgrp_ls_tp_btf.c

··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */ 3 + 4 + #include "vmlinux.h" 5 + #include <bpf/bpf_helpers.h> 6 + #include <bpf/bpf_tracing.h> 7 + 8 + char _license[] SEC("license") = "GPL"; 9 + 10 + struct { 11 + __uint(type, BPF_MAP_TYPE_CGRP_STORAGE); 12 + __uint(map_flags, BPF_F_NO_PREALLOC); 13 + __type(key, int); 14 + __type(value, long); 15 + } map_a SEC(".maps"); 16 + 17 + struct { 18 + __uint(type, BPF_MAP_TYPE_CGRP_STORAGE); 19 + __uint(map_flags, BPF_F_NO_PREALLOC); 20 + __type(key, int); 21 + __type(value, long); 22 + } map_b SEC(".maps"); 23 + 24 + #define MAGIC_VALUE 0xabcd1234 25 + 26 + pid_t target_pid = 0; 27 + int mismatch_cnt = 0; 28 + int enter_cnt = 0; 29 + int exit_cnt = 0; 30 + 31 + SEC("tp_btf/sys_enter") 32 + int BPF_PROG(on_enter, struct pt_regs *regs, long id) 33 + { 34 + struct task_struct *task; 35 + long *ptr; 36 + int err; 37 + 38 + task = bpf_get_current_task_btf(); 39 + if (task->pid != target_pid) 40 + return 0; 41 + 42 + /* populate value 0 */ 43 + ptr = bpf_cgrp_storage_get(&map_a, task->cgroups->dfl_cgrp, 0, 44 + BPF_LOCAL_STORAGE_GET_F_CREATE); 45 + if (!ptr) 46 + return 0; 47 + 48 + /* delete value 0 */ 49 + err = bpf_cgrp_storage_delete(&map_a, task->cgroups->dfl_cgrp); 50 + if (err) 51 + return 0; 52 + 53 + /* value is not available */ 54 + ptr = bpf_cgrp_storage_get(&map_a, task->cgroups->dfl_cgrp, 0, 0); 55 + if (ptr) 56 + return 0; 57 + 58 + /* re-populate the value */ 59 + ptr = bpf_cgrp_storage_get(&map_a, task->cgroups->dfl_cgrp, 0, 60 + BPF_LOCAL_STORAGE_GET_F_CREATE); 61 + if (!ptr) 62 + return 0; 63 + __sync_fetch_and_add(&enter_cnt, 1); 64 + *ptr = MAGIC_VALUE + enter_cnt; 65 + 66 + return 0; 67 + } 68 + 69 + SEC("tp_btf/sys_exit") 70 + int BPF_PROG(on_exit, struct pt_regs *regs, long id) 71 + { 72 + struct task_struct *task; 73 + long *ptr; 74 + 75 + task = bpf_get_current_task_btf(); 76 + if (task->pid != target_pid) 77 + return 0; 78 + 79 + ptr = 
bpf_cgrp_storage_get(&map_a, task->cgroups->dfl_cgrp, 0, 80 + BPF_LOCAL_STORAGE_GET_F_CREATE); 81 + if (!ptr) 82 + return 0; 83 + 84 + __sync_fetch_and_add(&exit_cnt, 1); 85 + if (*ptr != MAGIC_VALUE + exit_cnt) 86 + __sync_fetch_and_add(&mismatch_cnt, 1); 87 + return 0; 88 + }

+50

tools/testing/selftests/bpf/progs/kprobe_multi.c

··· 110 110 kprobe_multi_check(ctx, true); 111 111 return 0; 112 112 } 113 + 114 + extern const void bpf_testmod_fentry_test1 __ksym; 115 + extern const void bpf_testmod_fentry_test2 __ksym; 116 + extern const void bpf_testmod_fentry_test3 __ksym; 117 + 118 + __u64 kprobe_testmod_test1_result = 0; 119 + __u64 kprobe_testmod_test2_result = 0; 120 + __u64 kprobe_testmod_test3_result = 0; 121 + 122 + __u64 kretprobe_testmod_test1_result = 0; 123 + __u64 kretprobe_testmod_test2_result = 0; 124 + __u64 kretprobe_testmod_test3_result = 0; 125 + 126 + static void kprobe_multi_testmod_check(void *ctx, bool is_return) 127 + { 128 + if (bpf_get_current_pid_tgid() >> 32 != pid) 129 + return; 130 + 131 + __u64 addr = bpf_get_func_ip(ctx); 132 + 133 + if (is_return) { 134 + if ((const void *) addr == &bpf_testmod_fentry_test1) 135 + kretprobe_testmod_test1_result = 1; 136 + if ((const void *) addr == &bpf_testmod_fentry_test2) 137 + kretprobe_testmod_test2_result = 1; 138 + if ((const void *) addr == &bpf_testmod_fentry_test3) 139 + kretprobe_testmod_test3_result = 1; 140 + } else { 141 + if ((const void *) addr == &bpf_testmod_fentry_test1) 142 + kprobe_testmod_test1_result = 1; 143 + if ((const void *) addr == &bpf_testmod_fentry_test2) 144 + kprobe_testmod_test2_result = 1; 145 + if ((const void *) addr == &bpf_testmod_fentry_test3) 146 + kprobe_testmod_test3_result = 1; 147 + } 148 + } 149 + 150 + SEC("kprobe.multi") 151 + int test_kprobe_testmod(struct pt_regs *ctx) 152 + { 153 + kprobe_multi_testmod_check(ctx, false); 154 + return 0; 155 + } 156 + 157 + SEC("kretprobe.multi") 158 + int test_kretprobe_testmod(struct pt_regs *ctx) 159 + { 160 + kprobe_multi_testmod_check(ctx, true); 161 + return 0; 162 + }

+3

tools/testing/selftests/bpf/progs/task_local_storage_exit_creds.c

··· 14 14 __type(value, __u64); 15 15 } task_storage SEC(".maps"); 16 16 17 + int run_count = 0; 17 18 int valid_ptr_count = 0; 18 19 int null_ptr_count = 0; 19 20 ··· 29 28 __sync_fetch_and_add(&valid_ptr_count, 1); 30 29 else 31 30 __sync_fetch_and_add(&null_ptr_count, 1); 31 + 32 + __sync_fetch_and_add(&run_count, 1); 32 33 return 0; 33 34 }

+41 -4

tools/testing/selftests/bpf/progs/task_ls_recursion.c

··· 5 5 #include <bpf/bpf_helpers.h> 6 6 #include <bpf/bpf_tracing.h> 7 7 8 + #ifndef EBUSY 9 + #define EBUSY 16 10 + #endif 11 + 8 12 char _license[] SEC("license") = "GPL"; 13 + int nr_del_errs = 0; 14 + int test_pid = 0; 9 15 10 16 struct { 11 17 __uint(type, BPF_MAP_TYPE_TASK_STORAGE); ··· 32 26 { 33 27 struct task_struct *task = bpf_get_current_task_btf(); 34 28 29 + if (!test_pid || task->pid != test_pid) 30 + return 0; 31 + 32 + /* The bpf_task_storage_delete will call 33 + * bpf_local_storage_lookup. The prog->active will 34 + * stop the recursion. 35 + */ 35 36 bpf_task_storage_delete(&map_a, task); 36 37 bpf_task_storage_delete(&map_b, task); 37 38 return 0; ··· 50 37 struct task_struct *task = bpf_get_current_task_btf(); 51 38 long *ptr; 52 39 40 + if (!test_pid || task->pid != test_pid) 41 + return 0; 42 + 53 43 ptr = bpf_task_storage_get(&map_a, task, 0, 54 44 BPF_LOCAL_STORAGE_GET_F_CREATE); 55 - if (ptr) 56 - *ptr += 1; 45 + /* ptr will not be NULL when it is called from 46 + * the bpf_task_storage_get(&map_b,...F_CREATE) in 47 + * the BPF_PROG(on_enter) below. It is because 48 + * the value can be found in map_a and the kernel 49 + * does not need to acquire any spin_lock. 50 + */ 51 + if (ptr) { 52 + int err; 57 53 54 + *ptr += 1; 55 + err = bpf_task_storage_delete(&map_a, task); 56 + if (err == -EBUSY) 57 + nr_del_errs++; 58 + } 59 + 60 + /* This will still fail because map_b is empty and 61 + * this BPF_PROG(on_update) has failed to acquire 62 + * the percpu busy lock => meaning potential 63 + * deadlock is detected and it will fail to create 64 + * new storage. 
65 + */ 58 66 ptr = bpf_task_storage_get(&map_b, task, 0, 59 67 BPF_LOCAL_STORAGE_GET_F_CREATE); 60 68 if (ptr) ··· 91 57 long *ptr; 92 58 93 59 task = bpf_get_current_task_btf(); 60 + if (!test_pid || task->pid != test_pid) 61 + return 0; 62 + 94 63 ptr = bpf_task_storage_get(&map_a, task, 0, 95 64 BPF_LOCAL_STORAGE_GET_F_CREATE); 96 - if (ptr) 65 + if (ptr && !*ptr) 97 66 *ptr = 200; 98 67 99 68 ptr = bpf_task_storage_get(&map_b, task, 0, 100 69 BPF_LOCAL_STORAGE_GET_F_CREATE); 101 - if (ptr) 70 + if (ptr && !*ptr) 102 71 *ptr = 100; 103 72 return 0; 104 73 }
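
The comments added to task_ls_recursion.c describe what happens when a program re-enters `bpf_task_storage_get()` or `bpf_task_storage_delete()` while the per-CPU busy lock is already held:

- An existing value can still be found.
- Creating a new value fails, and the call returns NULL.
- Delete fails with `-EBUSY`.

The single-threaded model below is a sketch under that reading, not kernel code:

- The per-CPU counter becomes a plain `int busy`.
- The fentry program on `bpf_local_storage_update` becomes a hypothetical `update_hook`, called just before a new value is created.
- Every get uses `BPF_LOCAL_STORAGE_GET_F_CREATE`, as the selftest does.
- The `on_lookup` program is not modeled.

All names (`model_map`, `storage_get`, `storage_delete`, `on_enter_model`, `on_update_model`) are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical one-slot "map": at most one value for the current task. */
struct model_map {
	bool present;
	long value;
};

static struct model_map map_a, map_b;
static int nr_del_errs;

/* Stands in for the kernel's per-CPU busy counter (single-threaded here). */
static int busy;

/* Called just before a new value is created, like an fentry prog on
 * bpf_local_storage_update. */
static void (*update_hook)(void);

static bool trylock(void)
{
	if (++busy != 1) {
		busy--;
		return false;
	}
	return true;
}

static void unlock(void)
{
	busy--;
}

/* get(..., F_CREATE): an existing value is returned even while busy,
 * but a new value is only created when the lock was acquired. */
static long *storage_get(struct model_map *m)
{
	bool nobusy = trylock();
	long *ret = NULL;

	if (m->present) {
		ret = &m->value;
	} else if (nobusy) {
		if (update_hook)
			update_hook(); /* a nested program may run here */
		m->present = true;
		m->value = 0;
		ret = &m->value;
	}
	if (nobusy)
		unlock();
	return ret;
}

/* delete(): refuses with -EBUSY instead of spinning on a held lock. */
static int storage_delete(struct model_map *m)
{
	int err = 0;

	if (!trylock())
		return -EBUSY;
	if (m->present)
		m->present = false;
	else
		err = -ENOENT;
	unlock();
	return err;
}

/* Mirrors BPF_PROG(on_update) in task_ls_recursion.c. */
static void on_update_model(void)
{
	long *ptr = storage_get(&map_a);

	if (ptr) {
		*ptr += 1;
		if (storage_delete(&map_a) == -EBUSY)
			nr_del_errs++;
	}
	ptr = storage_get(&map_b);
	if (ptr)
		*ptr += 1;
}

/* Mirrors BPF_PROG(on_enter) in task_ls_recursion.c. */
static void on_enter_model(void)
{
	long *ptr;

	update_hook = on_update_model;
	ptr = storage_get(&map_a);
	if (ptr && !*ptr)
		*ptr = 200;
	ptr = storage_get(&map_b);
	if (ptr && !*ptr)
		*ptr = 100;
}
```

Running `on_enter_model()` once leaves `map_a` at 201, `map_b` at 100 and `nr_del_errs` at 1. These are the same values the recursion subtest asserts for the real maps.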

+47

tools/testing/selftests/bpf/progs/task_storage_nodeadlock.c

··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + 3 + #include "vmlinux.h" 4 + #include <bpf/bpf_helpers.h> 5 + #include <bpf/bpf_tracing.h> 6 + 7 + char _license[] SEC("license") = "GPL"; 8 + 9 + #ifndef EBUSY 10 + #define EBUSY 16 11 + #endif 12 + 13 + extern bool CONFIG_PREEMPT __kconfig __weak; 14 + int nr_get_errs = 0; 15 + int nr_del_errs = 0; 16 + 17 + struct { 18 + __uint(type, BPF_MAP_TYPE_TASK_STORAGE); 19 + __uint(map_flags, BPF_F_NO_PREALLOC); 20 + __type(key, int); 21 + __type(value, int); 22 + } task_storage SEC(".maps"); 23 + 24 + SEC("lsm.s/socket_post_create") 25 + int BPF_PROG(socket_post_create, struct socket *sock, int family, int type, 26 + int protocol, int kern) 27 + { 28 + struct task_struct *task; 29 + int ret, zero = 0; 30 + int *value; 31 + 32 + if (!CONFIG_PREEMPT) 33 + return 0; 34 + 35 + task = bpf_get_current_task_btf(); 36 + value = bpf_task_storage_get(&task_storage, task, &zero, 37 + BPF_LOCAL_STORAGE_GET_F_CREATE); 38 + if (!value) 39 + __sync_fetch_and_add(&nr_get_errs, 1); 40 + 41 + ret = bpf_task_storage_delete(&task_storage, 42 + bpf_get_current_task_btf()); 43 + if (ret == -EBUSY) 44 + __sync_fetch_and_add(&nr_del_errs, 1); 45 + 46 + return 0; 47 + }

+6

tools/testing/selftests/bpf/progs/test_module_attach.c

··· 110 110 return 0; /* don't override the exit code */ 111 111 } 112 112 113 + SEC("kprobe.multi/bpf_testmod_test_read") 114 + int BPF_PROG(kprobe_multi) 115 + { 116 + return 0; 117 + } 118 + 113 119 char _license[] SEC("license") = "GPL";

+70

tools/testing/selftests/bpf/progs/test_ringbuf_map_key.c

··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */ 3 + 4 + #include <linux/bpf.h> 5 + #include <bpf/bpf_helpers.h> 6 + #include "bpf_misc.h" 7 + 8 + char _license[] SEC("license") = "GPL"; 9 + 10 + struct sample { 11 + int pid; 12 + int seq; 13 + long value; 14 + char comm[16]; 15 + }; 16 + 17 + struct { 18 + __uint(type, BPF_MAP_TYPE_RINGBUF); 19 + } ringbuf SEC(".maps"); 20 + 21 + struct { 22 + __uint(type, BPF_MAP_TYPE_HASH); 23 + __uint(max_entries, 1000); 24 + __type(key, struct sample); 25 + __type(value, int); 26 + } hash_map SEC(".maps"); 27 + 28 + /* inputs */ 29 + int pid = 0; 30 + 31 + /* inner state */ 32 + long seq = 0; 33 + 34 + SEC("fentry/" SYS_PREFIX "sys_getpgid") 35 + int test_ringbuf_mem_map_key(void *ctx) 36 + { 37 + int cur_pid = bpf_get_current_pid_tgid() >> 32; 38 + struct sample *sample, sample_copy; 39 + int *lookup_val; 40 + 41 + if (cur_pid != pid) 42 + return 0; 43 + 44 + sample = bpf_ringbuf_reserve(&ringbuf, sizeof(*sample), 0); 45 + if (!sample) 46 + return 0; 47 + 48 + sample->pid = pid; 49 + bpf_get_current_comm(sample->comm, sizeof(sample->comm)); 50 + sample->seq = ++seq; 51 + sample->value = 42; 52 + 53 + /* test using 'sample' (PTR_TO_MEM | MEM_ALLOC) as map key arg 54 + */ 55 + lookup_val = (int *)bpf_map_lookup_elem(&hash_map, sample); 56 + 57 + /* workaround - memcpy is necessary so that verifier doesn't 58 + * complain with: 59 + * verifier internal error: more than one arg with ref_obj_id R3 60 + * when trying to do bpf_map_update_elem(&hash_map, sample, &sample->seq, BPF_ANY); 61 + * 62 + * Since bpf_map_lookup_elem above uses 'sample' as key, test using 63 + * sample field as value below 64 + */ 65 + __builtin_memcpy(&sample_copy, sample, sizeof(struct sample)); 66 + bpf_map_update_elem(&hash_map, &sample_copy, &sample->seq, BPF_ANY); 67 + 68 + bpf_ringbuf_submit(sample, 0); 69 + return 0; 70 + }

+17

tools/testing/selftests/bpf/progs/test_skeleton.c

··· 53 53 54 54 char huge_arr[16 * 1024 * 1024]; 55 55 56 + /* non-mmapable custom .data section */ 57 + 58 + struct my_value { int x, y, z; }; 59 + 60 + __hidden int zero_key SEC(".data.non_mmapable"); 61 + static struct my_value zero_value SEC(".data.non_mmapable"); 62 + 63 + struct { 64 + __uint(type, BPF_MAP_TYPE_ARRAY); 65 + __type(key, int); 66 + __type(value, struct my_value); 67 + __uint(max_entries, 1); 68 + } my_map SEC(".maps"); 69 + 56 70 SEC("raw_tp/sys_enter") 57 71 int handler(const void *ctx) 58 72 { ··· 88 74 out_mostly_var = read_mostly_var; 89 75 90 76 huge_arr[sizeof(huge_arr) - 1] = 123; 77 + 78 + /* make sure zero_key and zero_value are not optimized out */ 79 + bpf_map_update_elem(&my_map, &zero_key, &zero_value, BPF_ANY); 91 80 92 81 return 0; 93 82 }

+5 -2

tools/testing/selftests/bpf/test_bpftool_metadata.sh

··· 4 4 # Kselftest framework requirement - SKIP code is 4. 5 5 ksft_skip=4 6 6 7 + BPF_FILE_USED="metadata_used.bpf.o" 8 + BPF_FILE_UNUSED="metadata_unused.bpf.o" 9 + 7 10 TESTNAME=bpftool_metadata 8 11 BPF_FS=$(awk '$3 == "bpf" {print $2; exit}' /proc/mounts) 9 12 BPF_DIR=$BPF_FS/test_$TESTNAME ··· 58 55 59 56 trap cleanup EXIT 60 57 61 - bpftool prog load metadata_unused.o $BPF_DIR/unused 58 + bpftool prog load $BPF_FILE_UNUSED $BPF_DIR/unused 62 59 63 60 METADATA_PLAIN="$(bpftool prog)" 64 61 echo "$METADATA_PLAIN" | grep 'a = "foo"' > /dev/null ··· 70 67 71 68 rm $BPF_DIR/unused 72 69 73 - bpftool prog load metadata_used.o $BPF_DIR/used 70 + bpftool prog load $BPF_FILE_USED $BPF_DIR/used 74 71 75 72 METADATA_PLAIN="$(bpftool prog)" 76 73 echo "$METADATA_PLAIN" | grep 'a = "bar"' > /dev/null

+8

tools/testing/selftests/bpf/test_bpftool_synctypes.py

··· 501 501 source_map_types = set(bpf_info.get_map_type_map().values()) 502 502 source_map_types.discard('unspec') 503 503 504 + # BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED and BPF_MAP_TYPE_CGROUP_STORAGE 505 + # share the same enum value and source_map_types picks 506 + # BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED/cgroup_storage_deprecated. 507 + # Replace 'cgroup_storage_deprecated' with 'cgroup_storage' 508 + # so it aligns with what `bpftool map help` shows. 509 + source_map_types.remove('cgroup_storage_deprecated') 510 + source_map_types.add('cgroup_storage') 511 + 504 512 help_map_types = map_info.get_map_help() 505 513 help_map_options = map_info.get_options() 506 514 map_info.close()

+4 -2

tools/testing/selftests/bpf/test_flow_dissector.sh

··· 2 2 # SPDX-License-Identifier: GPL-2.0 3 3 # 4 4 # Load BPF flow dissector and verify it correctly dissects traffic 5 + 6 + BPF_FILE="bpf_flow.bpf.o" 5 7 export TESTNAME=test_flow_dissector 6 8 unmount=0 7 9 ··· 24 22 if bpftool="$(which bpftool)"; then 25 23 echo "Testing global flow dissector..." 26 24 27 - $bpftool prog loadall ./bpf_flow.o /sys/fs/bpf/flow \ 25 + $bpftool prog loadall $BPF_FILE /sys/fs/bpf/flow \ 28 26 type flow_dissector 29 27 30 28 if ! unshare --net $bpftool prog attach pinned \ ··· 97 95 fi 98 96 99 97 # Attach BPF program 100 - ./flow_dissector_load -p bpf_flow.o -s _dissect 98 + ./flow_dissector_load -p $BPF_FILE -s _dissect 101 99 102 100 # Setup 103 101 tc qdisc add dev lo ingress

+9 -8

tools/testing/selftests/bpf/test_lwt_ip_encap.sh

··· 38 38 # ping: SRC->[encap at veth2:ingress]->GRE:decap->DST 39 39 # ping replies go DST->SRC directly 40 40 41 + BPF_FILE="test_lwt_ip_encap.bpf.o" 41 42 if [[ $EUID -ne 0 ]]; then 42 43 echo "This script must be run as root" 43 44 echo "FAIL" ··· 374 373 # install replacement routes (LWT/eBPF), pings succeed 375 374 if [ "${ENCAP}" == "IPv4" ] ; then 376 375 ip -netns ${NS1} route add ${IPv4_DST} encap bpf xmit obj \ 377 - test_lwt_ip_encap.o sec encap_gre dev veth1 ${VRF} 376 + ${BPF_FILE} sec encap_gre dev veth1 ${VRF} 378 377 ip -netns ${NS1} -6 route add ${IPv6_DST} encap bpf xmit obj \ 379 - test_lwt_ip_encap.o sec encap_gre dev veth1 ${VRF} 378 + ${BPF_FILE} sec encap_gre dev veth1 ${VRF} 380 379 elif [ "${ENCAP}" == "IPv6" ] ; then 381 380 ip -netns ${NS1} route add ${IPv4_DST} encap bpf xmit obj \ 382 - test_lwt_ip_encap.o sec encap_gre6 dev veth1 ${VRF} 381 + ${BPF_FILE} sec encap_gre6 dev veth1 ${VRF} 383 382 ip -netns ${NS1} -6 route add ${IPv6_DST} encap bpf xmit obj \ 384 - test_lwt_ip_encap.o sec encap_gre6 dev veth1 ${VRF} 383 + ${BPF_FILE} sec encap_gre6 dev veth1 ${VRF} 385 384 else 386 385 echo " unknown encap ${ENCAP}" 387 386 TEST_STATUS=1 ··· 432 431 # install replacement routes (LWT/eBPF), pings succeed 433 432 if [ "${ENCAP}" == "IPv4" ] ; then 434 433 ip -netns ${NS2} route add ${IPv4_DST} encap bpf in obj \ 435 - test_lwt_ip_encap.o sec encap_gre dev veth2 ${VRF} 434 + ${BPF_FILE} sec encap_gre dev veth2 ${VRF} 436 435 ip -netns ${NS2} -6 route add ${IPv6_DST} encap bpf in obj \ 437 - test_lwt_ip_encap.o sec encap_gre dev veth2 ${VRF} 436 + ${BPF_FILE} sec encap_gre dev veth2 ${VRF} 438 437 elif [ "${ENCAP}" == "IPv6" ] ; then 439 438 ip -netns ${NS2} route add ${IPv4_DST} encap bpf in obj \ 440 - test_lwt_ip_encap.o sec encap_gre6 dev veth2 ${VRF} 439 + ${BPF_FILE} sec encap_gre6 dev veth2 ${VRF} 441 440 ip -netns ${NS2} -6 route add ${IPv6_DST} encap bpf in obj \ 442 - test_lwt_ip_encap.o sec encap_gre6 dev veth2 ${VRF} 441 + 
${BPF_FILE} sec encap_gre6 dev veth2 ${VRF} 443 442 else 444 443 echo "FAIL: unknown encap ${ENCAP}" 445 444 TEST_STATUS=1

+5 -4

tools/testing/selftests/bpf/test_lwt_seg6local.sh

··· 23 23 24 24 # Kselftest framework requirement - SKIP code is 4. 25 25 ksft_skip=4 26 + BPF_FILE="test_lwt_seg6local.bpf.o" 26 27 readonly NS1="ns1-$(mktemp -u XXXXXX)" 27 28 readonly NS2="ns2-$(mktemp -u XXXXXX)" 28 29 readonly NS3="ns3-$(mktemp -u XXXXXX)" ··· 118 117 ip netns exec ${NS1} ip -6 addr add fb00::1/16 dev lo 119 118 ip netns exec ${NS1} ip -6 route add fb00::6 dev veth1 via fb00::21 120 119 121 - ip netns exec ${NS2} ip -6 route add fb00::6 encap bpf in obj test_lwt_seg6local.o sec encap_srh dev veth2 120 + ip netns exec ${NS2} ip -6 route add fb00::6 encap bpf in obj ${BPF_FILE} sec encap_srh dev veth2 122 121 ip netns exec ${NS2} ip -6 route add fd00::1 dev veth3 via fb00::43 scope link 123 122 124 123 ip netns exec ${NS3} ip -6 route add fc42::1 dev veth5 via fb00::65 125 - ip netns exec ${NS3} ip -6 route add fd00::1 encap seg6local action End.BPF endpoint obj test_lwt_seg6local.o sec add_egr_x dev veth4 124 + ip netns exec ${NS3} ip -6 route add fd00::1 encap seg6local action End.BPF endpoint obj ${BPF_FILE} sec add_egr_x dev veth4 126 125 127 - ip netns exec ${NS4} ip -6 route add fd00::2 encap seg6local action End.BPF endpoint obj test_lwt_seg6local.o sec pop_egr dev veth6 126 + ip netns exec ${NS4} ip -6 route add fd00::2 encap seg6local action End.BPF endpoint obj ${BPF_FILE} sec pop_egr dev veth6 128 127 ip netns exec ${NS4} ip -6 addr add fc42::1 dev lo 129 128 ip netns exec ${NS4} ip -6 route add fd00::3 dev veth7 via fb00::87 130 129 131 130 ip netns exec ${NS5} ip -6 route add fd00::4 table 117 dev veth9 via fb00::109 132 - ip netns exec ${NS5} ip -6 route add fd00::3 encap seg6local action End.BPF endpoint obj test_lwt_seg6local.o sec inspect_t dev veth8 131 + ip netns exec ${NS5} ip -6 route add fd00::3 encap seg6local action End.BPF endpoint obj ${BPF_FILE} sec inspect_t dev veth8 133 132 134 133 ip netns exec ${NS6} ip -6 addr add fb00::6/16 dev lo 135 134 ip netns exec ${NS6} ip -6 addr add fd00::4/16 dev lo

+2 -1

tools/testing/selftests/bpf/test_tc_edt.sh

··· 5 5 # with dst port = 9000 down to 5MBps. Then it measures actual 6 6 # throughput of the flow. 7 7 8 + BPF_FILE="test_tc_edt.bpf.o" 8 9 if [[ $EUID -ne 0 ]]; then 9 10 echo "This script must be run as root" 10 11 echo "FAIL" ··· 55 54 ip netns exec ${NS_SRC} tc qdisc add dev veth_src root fq 56 55 ip netns exec ${NS_SRC} tc qdisc add dev veth_src clsact 57 56 ip netns exec ${NS_SRC} tc filter add dev veth_src egress \ 58 - bpf da obj test_tc_edt.o sec cls_test 57 + bpf da obj ${BPF_FILE} sec cls_test 59 58 60 59 61 60 # start the listener

+3 -2

tools/testing/selftests/bpf/test_tc_tunnel.sh

··· 3 3 # 4 4 # In-place tunneling 5 5 6 + BPF_FILE="test_tc_tunnel.bpf.o" 6 7 # must match the port that the bpf program filters on 7 8 readonly port=8000 8 9 ··· 197 196 # client can no longer connect 198 197 ip netns exec "${ns1}" tc qdisc add dev veth1 clsact 199 198 ip netns exec "${ns1}" tc filter add dev veth1 egress \ 200 - bpf direct-action object-file ./test_tc_tunnel.o \ 199 + bpf direct-action object-file ${BPF_FILE} \ 201 200 section "encap_${tuntype}_${mac}" 202 201 echo "test bpf encap without decap (expect failure)" 203 202 server_listen ··· 297 296 ip netns exec "${ns2}" ip link del dev testtun0 298 297 ip netns exec "${ns2}" tc qdisc add dev veth2 clsact 299 298 ip netns exec "${ns2}" tc filter add dev veth2 ingress \ 300 - bpf direct-action object-file ./test_tc_tunnel.o section decap 299 + bpf direct-action object-file ${BPF_FILE} section decap 301 300 echo "test bpf encap with bpf decap" 302 301 client_connect 303 302 verify_data

+3 -2

tools/testing/selftests/bpf/test_tunnel.sh

··· 45 45 # 5) Tunnel protocol handler, ex: vxlan_rcv, decap the packet 46 46 # 6) Forward the packet to the overlay tnl dev 47 47 48 + BPF_FILE="test_tunnel_kern.bpf.o" 48 49 BPF_PIN_TUNNEL_DIR="/sys/fs/bpf/tc/tunnel" 49 50 PING_ARG="-c 3 -w 10 -q" 50 51 ret=0 ··· 546 545 > /sys/kernel/debug/tracing/trace 547 546 setup_xfrm_tunnel 548 547 mkdir -p ${BPF_PIN_TUNNEL_DIR} 549 - bpftool prog loadall ./test_tunnel_kern.o ${BPF_PIN_TUNNEL_DIR} 548 + bpftool prog loadall ${BPF_FILE} ${BPF_PIN_TUNNEL_DIR} 550 549 tc qdisc add dev veth1 clsact 551 550 tc filter add dev veth1 proto ip ingress bpf da object-pinned \ 552 551 ${BPF_PIN_TUNNEL_DIR}/xfrm_get_state ··· 573 572 SET=$2 574 573 GET=$3 575 574 mkdir -p ${BPF_PIN_TUNNEL_DIR} 576 - bpftool prog loadall ./test_tunnel_kern.o ${BPF_PIN_TUNNEL_DIR}/ 575 + bpftool prog loadall ${BPF_FILE} ${BPF_PIN_TUNNEL_DIR}/ 577 576 tc qdisc add dev $DEV clsact 578 577 tc filter add dev $DEV egress bpf da object-pinned ${BPF_PIN_TUNNEL_DIR}/$SET 579 578 tc filter add dev $DEV ingress bpf da object-pinned ${BPF_PIN_TUNNEL_DIR}/$GET

+5 -4

tools/testing/selftests/bpf/test_xdp_meta.sh

··· 1 1 #!/bin/sh 2 2 3 + BPF_FILE="test_xdp_meta.bpf.o" 3 4 # Kselftest framework requirement - SKIP code is 4. 4 5 readonly KSFT_SKIP=4 5 6 readonly NS1="ns1-$(mktemp -u XXXXXX)" ··· 43 42 ip netns exec ${NS1} tc qdisc add dev veth1 clsact 44 43 ip netns exec ${NS2} tc qdisc add dev veth2 clsact 45 44 46 - ip netns exec ${NS1} tc filter add dev veth1 ingress bpf da obj test_xdp_meta.o sec t 47 - ip netns exec ${NS2} tc filter add dev veth2 ingress bpf da obj test_xdp_meta.o sec t 45 + ip netns exec ${NS1} tc filter add dev veth1 ingress bpf da obj ${BPF_FILE} sec t 46 + ip netns exec ${NS2} tc filter add dev veth2 ingress bpf da obj ${BPF_FILE} sec t 48 47 49 - ip netns exec ${NS1} ip link set dev veth1 xdp obj test_xdp_meta.o sec x 50 - ip netns exec ${NS2} ip link set dev veth2 xdp obj test_xdp_meta.o sec x 48 + ip netns exec ${NS1} ip link set dev veth1 xdp obj ${BPF_FILE} sec x 49 + ip netns exec ${NS2} ip link set dev veth2 xdp obj ${BPF_FILE} sec x 51 50 52 51 ip netns exec ${NS1} ip link set dev veth1 up 53 52 ip netns exec ${NS2} ip link set dev veth2 up

+4 -4

tools/testing/selftests/bpf/test_xdp_vlan.sh

··· 200 200 # ---------------------------------------------------------------------- 201 201 # In ns1: ingress use XDP to remove VLAN tags 202 202 export DEVNS1=veth1 203 - export FILE=test_xdp_vlan.o 203 + export BPF_FILE=test_xdp_vlan.bpf.o 204 204 205 205 # First test: Remove VLAN by setting VLAN ID 0, using "xdp_vlan_change" 206 206 export XDP_PROG=xdp_vlan_change 207 - ip netns exec ${NS1} ip link set $DEVNS1 $XDP_MODE object $FILE section $XDP_PROG 207 + ip netns exec ${NS1} ip link set $DEVNS1 $XDP_MODE object $BPF_FILE section $XDP_PROG 208 208 209 209 # In ns1: egress use TC to add back VLAN tag 4011 210 210 # (del cmd) ··· 212 212 # 213 213 ip netns exec ${NS1} tc qdisc add dev $DEVNS1 clsact 214 214 ip netns exec ${NS1} tc filter add dev $DEVNS1 egress \ 215 - prio 1 handle 1 bpf da obj $FILE sec tc_vlan_push 215 + prio 1 handle 1 bpf da obj $BPF_FILE sec tc_vlan_push 216 216 217 217 # Now the namespaces can reach each-other, test with ping: 218 218 ip netns exec ${NS2} ping -i 0.2 -W 2 -c 2 $IPADDR1 ··· 226 226 # 227 227 export XDP_PROG=xdp_vlan_remove_outer2 228 228 ip netns exec ${NS1} ip link set $DEVNS1 $XDP_MODE off 229 - ip netns exec ${NS1} ip link set $DEVNS1 $XDP_MODE object $FILE section $XDP_PROG 229 + ip netns exec ${NS1} ip link set $DEVNS1 $XDP_MODE object $BPF_FILE section $XDP_PROG 230 230 231 231 # Now the namespaces should still be able reach each-other, test with ping: 232 232 ip netns exec ${NS2} ping -i 0.2 -W 2 -c 2 $IPADDR1

+13 -7

tools/testing/selftests/bpf/trace_helpers.c

··· 23 23 return ((struct ksym *)p1)->addr - ((struct ksym *)p2)->addr; 24 24 } 25 25 26 - int load_kallsyms(void) 26 + int load_kallsyms_refresh(void) 27 27 { 28 28 FILE *f; 29 29 char func[256], buf[256]; ··· 31 31 void *addr; 32 32 int i = 0; 33 33 34 - /* 35 - * This is called/used from multiplace places, 36 - * load symbols just once. 37 - */ 38 - if (sym_cnt) 39 - return 0; 34 + sym_cnt = 0; 40 35 41 36 f = fopen("/proc/kallsyms", "r"); 42 37 if (!f) ··· 50 55 sym_cnt = i; 51 56 qsort(syms, sym_cnt, sizeof(struct ksym), ksym_cmp); 52 57 return 0; 58 + } 59 + 60 + int load_kallsyms(void) 61 + { 62 + /* 63 + * This is called/used from multiplace places, 64 + * load symbols just once. 65 + */ 66 + if (sym_cnt) 67 + return 0; 68 + return load_kallsyms_refresh(); 53 69 } 54 70 55 71 struct ksym *ksym_search(long key)
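
The trace_helpers.c change splits symbol loading in two. `load_kallsyms()` keeps the parse-once cache, while `load_kallsyms_refresh()` always re-reads `/proc/kallsyms`. Presumably this lets callers pick up symbols that appeared after the first load, for example from a module such as bpf_testmod that is loaded later. Below is a simplified sketch of the same split, plus a binary-search lookup. It reads from any `FILE *` instead of `/proc/kallsyms` and uses a fixed-size table. Its names (`struct sym`, `load_syms`, `load_syms_refresh`, `sym_search`, `MAX_SYMS`) are hypothetical; it is not the selftest helper itself.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified symbol table (not the selftest's struct ksym). */
struct sym {
	long addr;
	char name[64];
};

#define MAX_SYMS 1024
static struct sym syms[MAX_SYMS];
static int sym_cnt;

static int sym_cmp(const void *p1, const void *p2)
{
	long a = ((const struct sym *)p1)->addr;
	long b = ((const struct sym *)p2)->addr;

	return (a > b) - (a < b); /* no long-to-int truncation */
}

/* Always re-parse f, discarding whatever was loaded before. */
static int load_syms_refresh(FILE *f)
{
	char buf[256], name[64], type;
	unsigned long addr;

	sym_cnt = 0;
	while (sym_cnt < MAX_SYMS && fgets(buf, sizeof(buf), f)) {
		/* kallsyms lines look like "<hex addr> <type> <name> [module]" */
		if (sscanf(buf, "%lx %c %63s", &addr, &type, name) != 3)
			break;
		if (!addr)
			continue; /* skip zero addresses (shown when hidden) */
		syms[sym_cnt].addr = (long)addr;
		strcpy(syms[sym_cnt].name, name);
		sym_cnt++;
	}
	qsort(syms, sym_cnt, sizeof(syms[0]), sym_cmp);
	return 0;
}

/* Cached variant: parse only if nothing has been loaded yet. */
static int load_syms(FILE *f)
{
	if (sym_cnt)
		return 0;
	return load_syms_refresh(f);
}

/* Last symbol whose start address is <= addr, or NULL if none. */
static struct sym *sym_search(long addr)
{
	int lo = 0, hi = sym_cnt;

	if (sym_cnt == 0 || addr < syms[0].addr)
		return NULL;
	while (hi - lo > 1) {
		int mid = lo + (hi - lo) / 2;

		if (syms[mid].addr <= addr)
			lo = mid;
		else
			hi = mid;
	}
	return &syms[lo];
}
```

In this sketch, calling `load_syms()` a second time is a no-op even if the input has changed. Only `load_syms_refresh()` sees newly added entries, which is the distinction the diff introduces.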

+2

tools/testing/selftests/bpf/trace_helpers.h

··· 10 10 }; 11 11 12 12 int load_kallsyms(void); 13 + int load_kallsyms_refresh(void); 14 + 13 15 struct ksym *ksym_search(long key); 14 16 long ksym_get_addr(const char *name); 15 17

+24

tools/testing/selftests/bpf/verifier/jit.c

··· 21 21 .retval = 2, 22 22 }, 23 23 { 24 + "jit: lsh, rsh, arsh by reg", 25 + .insns = { 26 + BPF_MOV64_IMM(BPF_REG_0, 1), 27 + BPF_MOV64_IMM(BPF_REG_4, 1), 28 + BPF_MOV64_IMM(BPF_REG_1, 0xff), 29 + BPF_ALU64_REG(BPF_LSH, BPF_REG_1, BPF_REG_0), 30 + BPF_ALU32_REG(BPF_LSH, BPF_REG_1, BPF_REG_4), 31 + BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0x3fc, 1), 32 + BPF_EXIT_INSN(), 33 + BPF_ALU64_REG(BPF_RSH, BPF_REG_1, BPF_REG_4), 34 + BPF_MOV64_REG(BPF_REG_4, BPF_REG_1), 35 + BPF_ALU32_REG(BPF_RSH, BPF_REG_4, BPF_REG_0), 36 + BPF_JMP_IMM(BPF_JEQ, BPF_REG_4, 0xff, 1), 37 + BPF_EXIT_INSN(), 38 + BPF_ALU64_REG(BPF_ARSH, BPF_REG_4, BPF_REG_4), 39 + BPF_JMP_IMM(BPF_JEQ, BPF_REG_4, 0, 1), 40 + BPF_EXIT_INSN(), 41 + BPF_MOV64_IMM(BPF_REG_0, 2), 42 + BPF_EXIT_INSN(), 43 + }, 44 + .result = ACCEPT, 45 + .retval = 2, 46 + }, 47 + { 24 48 "jit: mov32 for ldimm64, 1", 25 49 .insns = { 26 50 BPF_MOV64_IMM(BPF_REG_0, 2),
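
The new "jit: lsh, rsh, arsh by reg" case expects R0 == 2. Reaching that value depends on how register shift counts are handled. The BPF instruction set masks register shift counts to 63 for 64-bit operations and to 31 for 32-bit operations, and 32-bit ALU results are zero-extended into the 64-bit register. So the final `BPF_ARSH` of R4 by itself (R4 == 0xff) shifts by 63 and yields 0. The plain-C walk-through below replays the instruction sequence under those rules. The helper names are hypothetical, and this is a model of the semantics, not of any JIT.

```c
#include <assert.h>
#include <stdint.h>

/* Register-operand shifts: count masked to 63 (ALU64) or 31 (ALU32);
 * 32-bit results are zero-extended into the 64-bit register. */
static uint64_t alu64_lsh(uint64_t dst, uint64_t src) { return dst << (src & 63); }
static uint64_t alu64_rsh(uint64_t dst, uint64_t src) { return dst >> (src & 63); }
static uint64_t alu64_arsh(uint64_t dst, uint64_t src)
{
	/* Signed right shift; arithmetic on GCC/Clang (implementation-defined in ISO C). */
	return (uint64_t)((int64_t)dst >> (src & 63));
}
static uint64_t alu32_lsh(uint64_t dst, uint64_t src)
{
	return (uint32_t)((uint32_t)dst << (src & 31));
}
static uint64_t alu32_rsh(uint64_t dst, uint64_t src)
{
	return (uint32_t)dst >> (src & 31);
}

/* Replays the test's instructions; returns what the program leaves in R0. */
static uint64_t run_shift_test(void)
{
	uint64_t r0 = 1, r1 = 0xff, r4 = 1;

	r1 = alu64_lsh(r1, r0); /* 0xff << 1 = 0x1fe */
	r1 = alu32_lsh(r1, r4); /* 0x1fe << 1 = 0x3fc */
	if (r1 != 0x3fc)
		return r0;      /* BPF_EXIT with R0 == 1 */
	r1 = alu64_rsh(r1, r4); /* 0x3fc >> 1 = 0x1fe */
	r4 = r1;
	r4 = alu32_rsh(r4, r0); /* 0x1fe >> 1 = 0xff */
	if (r4 != 0xff)
		return r0;
	r4 = alu64_arsh(r4, r4); /* count 0xff & 63 = 63, so 0xff >> 63 = 0 */
	if (r4 != 0)
		return r0;
	r0 = 2;
	return r0;
}
```

Without the masking, shifting a 64-bit value by 0xff would be undefined behavior in C. The interpreter defines these operations by masking the count, and the test expects the JIT to produce the same result.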

+6

tools/testing/selftests/bpf/vmtest.sh

··· 21 21 QEMU_FLAGS=(-cpu host -smp 8) 22 22 BZIMAGE="arch/x86/boot/bzImage" 23 23 ;; 24 + aarch64) 25 + QEMU_BINARY=qemu-system-aarch64 26 + QEMU_CONSOLE="ttyAMA0,115200" 27 + QEMU_FLAGS=(-M virt,gic-version=3 -cpu host -smp 8) 28 + BZIMAGE="arch/arm64/boot/Image" 29 + ;; 24 30 *) 25 31 echo "Unsupported architecture" 26 32 exit 1
