Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge branch 'add-a-dynptr-type-for-skb-metadata-for-tc-bpf'

Jakub Sitnicki says:

====================
Add a dynptr type for skb metadata for TC BPF

TL;DR
-----

This is the first step in an effort which aims to enable skb metadata
access for all BPF programs which operate on an skb context.

By skb metadata we mean the custom metadata area which can be allocated
from an XDP program with the bpf_xdp_adjust_meta helper [1]. Network stack
code accesses it using the skb_metadata_* helpers.
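To make the layout concrete, here is a toy model (plain C, not kernel code; all names are illustrative) of how the metadata area grows downward into the XDP headroom, mimicking what bpf_xdp_adjust_meta() does with a negative delta:

```c
#include <assert.h>
#include <stddef.h>

/* Schematic model of an XDP buffer: the metadata area occupies the
 * headroom directly in front of the packet, i.e. [data_meta, data). */
struct toy_xdp_buff {
	unsigned char *data_hard_start;	/* start of headroom */
	unsigned char *data_meta;	/* start of metadata area */
	unsigned char *data;		/* start of packet payload */
};

/* Mimics bpf_xdp_adjust_meta(): a negative delta grows the metadata
 * area into the headroom; fails if the headroom would be exceeded. */
static int toy_xdp_adjust_meta(struct toy_xdp_buff *xdp, int delta)
{
	unsigned char *meta = xdp->data_meta + delta;

	if (meta < xdp->data_hard_start || meta > xdp->data)
		return -1;
	xdp->data_meta = meta;
	return 0;
}

/* Mimics skb_metadata_len(): the gap between data_meta and data */
static long toy_metadata_len(const struct toy_xdp_buff *xdp)
{
	return xdp->data - xdp->data_meta;
}
```

A program would reserve, say, 8 bytes with `toy_xdp_adjust_meta(&xdp, -8)` and then write its custom data into `[data_meta, data)`.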

Changelog
---------
Changes in v7:
- Make dynptr read-only for cloned skbs for now. (Martin)
- Extend tests for skb clones to cover writes to metadata.
- Drop Jesse's review stamp for patch 2 due to an update.
- Link to v6: https://lore.kernel.org/r/20250804-skb-metadata-thru-dynptr-v6-0-05da400bfa4b@cloudflare.com

Changes in v6:
- Enable CONFIG_NET_ACT_MIRRED for bpf selftests to fix CI failure
- Switch from u32 to matchall classifier, which bpf selftests already use
- Link to v5: https://lore.kernel.org/r/20250731-skb-metadata-thru-dynptr-v5-0-f02f6b5688dc@cloudflare.com

Changes in v5:
- Invalidate skb payload and metadata slices on write to metadata. (Martin)
- Drop redundant bounds check in bpf_skb_meta_*(). (Martin)
- Check for unexpected flags in __bpf_dynptr_write(). (Martin)
- Fold bpf_skb_meta_{load,store}_bytes() into callers.
- Add a test for metadata access when an skb clone has been modified.
- Drop Eduard's Ack for patch 3. Patch updated.
- Keep Eduard's Ack for patches 4-8.
- Add Jesse's stamp from an internal review.
- Link to v4: https://lore.kernel.org/r/20250723-skb-metadata-thru-dynptr-v4-0-a0fed48bcd37@cloudflare.com

Changes in v4:
- Kill bpf_dynptr_from_skb_meta_rdonly. Not needed for now. (Martin)
- Add a test to cover passing OOB offsets to dynptr ops. (Eduard)
- Factor out bounds checks from bpf_dynptr_{read,write,slice}. (Eduard)
- Squash patches:
bpf: Enable read access to skb metadata with bpf_dynptr_read
bpf: Enable write access to skb metadata with bpf_dynptr_write
bpf: Enable read-write access to skb metadata with dynptr slice
- Kept Eduard's Acks for v3 on unchanged patches.
- Link to v3: https://lore.kernel.org/r/20250721-skb-metadata-thru-dynptr-v3-0-e92be5534174@cloudflare.com

Changes in v3:
- Add a kfunc set for skb metadata access. Limited to TC BPF. (Martin)
- Drop patches related to skb metadata access outside of TC BPF:
net: Clear skb metadata on handover from device to protocol
selftests/bpf: Cover lack of access to skb metadata at ip layer
selftests/bpf: Count successful bpf program runs
- Link to v2: https://lore.kernel.org/r/20250716-skb-metadata-thru-dynptr-v2-0-5f580447e1df@cloudflare.com

Changes in v2:
- Switch to a dedicated dynptr type for skb metadata (Andrii)
- Add verifier test coverage since we now touch its code
- Add missing test coverage for bpf_dynptr_adjust and access at an offset
- Link to v1: https://lore.kernel.org/r/20250630-skb-metadata-thru-dynptr-v1-0-f17da13625d8@cloudflare.com

Overview
--------

Today, skb metadata is accessible only to BPF TC ingress programs
through the __sk_buff->data_meta pointer. We propose a three-step plan to
make skb metadata available to all other BPF programs which operate on skb
objects:

1) Add a dynptr type for skb metadata (this patch set)

This is a preparatory step, but it also stands on its own. Here we
enable access to the skb metadata through a bpf_dynptr, the same way we
can already access the skb payload today.

As the next step (2), we want to relocate the metadata as skb travels
through the network stack in order to persist it. That will require a
safe way to access the metadata area irrespective of its location.

This is where the dynptr [2] comes into play. It solves exactly that
problem. A dynptr to skb metadata can be backed by a memory area that
resides in a different location depending on the code path.
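The point about location independence can be sketched in plain C (a minimal model of the dynptr idea, with illustrative names; the kernel's bpf_dynptr_kern is more involved): a (base, size) handle with bounds-checked accessors works the same whether the backing memory is in the skb headroom or a dedicated allocation.

```c
#include <assert.h>
#include <string.h>

/* Toy dynptr: callers go through bounds-checked reads and never learn
 * where the backing memory actually lives. */
struct toy_dynptr {
	unsigned char *base;	/* could be headroom or dedicated memory */
	unsigned int size;
};

static int toy_dynptr_read(void *dst, unsigned int len,
			   const struct toy_dynptr *src, unsigned int offset)
{
	/* reject out-of-bounds access, including overflow of offset+len */
	if (offset + len > src->size || offset + len < offset)
		return -1;
	memcpy(dst, src->base + offset, len);
	return 0;
}
```

Relocating the metadata then only requires updating `base`; every consumer of the handle keeps working unchanged.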

2) Persist skb metadata past the TC hook (future)

Having the metadata in front of the packet headers as the skb travels
through the network stack is problematic - see the discussion of
alternative approaches below. Hence, we plan to relocate it as
necessary past the TC hook.

Where to relocate it? We don't know yet. There are a couple of
options: (i) move it to the top of skb headroom, or (ii) allocate
dedicated memory for it. They are not mutually exclusive. The right
solution might be a mix.

When to relocate it? That is also an open question. It could be done
during device to protocol handover or lazily when headers get pushed or
headroom gets resized.
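For option (i), the relocation itself is essentially a memmove within the skb's own buffer. A rough sketch (toy structures, illustrative names only; this is not a proposed implementation):

```c
#include <assert.h>
#include <string.h>

/* Toy skb: metadata initially sits directly in front of the packet
 * data, i.e. at data_off - meta_len. */
struct toy_skb {
	unsigned char buf[256];
	unsigned int data_off;	/* packet data starts here */
	unsigned int meta_len;	/* metadata length */
	unsigned int meta_off;	/* metadata location after relocation */
};

/* Option (i): move the metadata to the top of the skb headroom so that
 * later header pushes/pulls no longer have to preserve it. */
static void toy_relocate_meta_to_headroom_top(struct toy_skb *skb)
{
	unsigned int old = skb->data_off - skb->meta_len;

	memmove(skb->buf, skb->buf + old, skb->meta_len);
	skb->meta_off = 0;	/* now at the very top of the headroom */
}
```

After such a move, only the dynptr's backing pointer needs updating; programs accessing the metadata through the dynptr are unaffected.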

3) skb dynptr for sockops, sk_lookup, etc. (future)

There are BPF program types which don't operate on the __sk_buff context, but
either have, or could have, access to the skb itself. As a final touch,
we want to provide a way to create an skb metadata dynptr for these
program types.

TIMTOWDI
--------

Alternative approaches which we considered:

* Keep the metadata always in front of skb->data

We think it is a bad idea for two reasons, outlined below. Nevertheless, we
are open to it if necessary.

1) Performance concerns

It would require the network stack to move the metadata on each header
pull/push - see skb_reorder_vlan_header() [3] for an example. While
doable, there is an expected performance overhead.
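The overhead is easy to see in a toy model (illustrative names, not kernel code): if the metadata must stay directly in front of skb->data, every header push pays an extra memmove of meta_len bytes.

```c
#include <assert.h>
#include <string.h>

/* Toy skb where metadata lives at data_off - meta_len. */
struct toy_skb {
	unsigned char buf[256];
	unsigned int data_off;		/* packet data starts here */
	unsigned int meta_len;		/* metadata length */
	unsigned long bytes_moved;	/* accumulated relocation cost */
};

/* Pushing a header must first slide the metadata down so it remains
 * directly in front of the new data (cf. skb_reorder_vlan_header()). */
static void toy_skb_push(struct toy_skb *skb, unsigned int hdr_len)
{
	unsigned int meta_old = skb->data_off - skb->meta_len;

	memmove(skb->buf + meta_old - hdr_len, skb->buf + meta_old,
		skb->meta_len);
	skb->data_off -= hdr_len;
	skb->bytes_moved += skb->meta_len;	/* cost of this push */
}
```

Two pushes (e.g. an Ethernet header and a VLAN tag) already copy the metadata twice; a deep encapsulation path multiplies the cost accordingly.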

2) Potential for bugs

In addition to updating skb_push/pull and pskb_expand_head, we would
need to audit any code paths which operate on the skb->data pointer
directly without going through the helpers. This creates a "known
unknown" risk.

* Design a new custom metadata area from scratch

We have tried that in Arthur's patch set [4]. One of the outcomes of the
discussion there was that we don't want to have two places to store custom
metadata. Hence the change of approach to make the existing custom metadata
area work.

-jkbs

[1] https://docs.ebpf.io/linux/helper-function/bpf_xdp_adjust_meta/
[2] https://docs.ebpf.io/linux/concepts/dynptrs/
[3] https://elixir.bootlin.com/linux/v6.16-rc6/source/net/core/skbuff.c#L6211
[4] https://lore.kernel.org/all/20250422-afabre-traits-010-rfc2-v2-0-92bcc6b146c9@arthurfabre.com/
====================

Link: https://patch.msgid.link/20250814-skb-metadata-thru-dynptr-v7-0-8a39e636e0fb@cloudflare.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

+1029 -29
+6 -1
include/linux/bpf.h
···
 	 */
 	MEM_WRITE		= BIT(18 + BPF_BASE_TYPE_BITS),
 
+	/* DYNPTR points to skb_metadata_end()-skb_metadata_len() */
+	DYNPTR_TYPE_SKB_META	= BIT(19 + BPF_BASE_TYPE_BITS),
+
 	__BPF_TYPE_FLAG_MAX,
 	__BPF_TYPE_LAST_FLAG = __BPF_TYPE_FLAG_MAX - 1,
 };
 
 #define DYNPTR_TYPE_FLAG_MASK	(DYNPTR_TYPE_LOCAL | DYNPTR_TYPE_RINGBUF | DYNPTR_TYPE_SKB \
-				 | DYNPTR_TYPE_XDP)
+				 | DYNPTR_TYPE_XDP | DYNPTR_TYPE_SKB_META)
 
 /* Max number of base types. */
 #define BPF_BASE_TYPE_LIMIT (1UL << BPF_BASE_TYPE_BITS)
···
 	BPF_DYNPTR_TYPE_SKB,
 	/* Underlying data is a xdp_buff */
 	BPF_DYNPTR_TYPE_XDP,
+	/* Points to skb_metadata_end()-skb_metadata_len() */
+	BPF_DYNPTR_TYPE_SKB_META,
 };
 
 int bpf_dynptr_check_size(u32 size);
+6
include/linux/filter.h
···
 void *bpf_xdp_pointer(struct xdp_buff *xdp, u32 offset, u32 len);
 void bpf_xdp_copy_buf(struct xdp_buff *xdp, unsigned long off,
 		      void *buf, unsigned long len, bool flush);
+void *bpf_skb_meta_pointer(struct sk_buff *skb, u32 offset);
 #else /* CONFIG_NET */
 static inline int __bpf_skb_load_bytes(const struct sk_buff *skb, u32 offset,
 				       void *to, u32 len)
···
 static inline void bpf_xdp_copy_buf(struct xdp_buff *xdp, unsigned long off, void *buf,
 				    unsigned long len, bool flush)
 {
+}
+
+static inline void *bpf_skb_meta_pointer(struct sk_buff *skb, u32 offset)
+{
+	return NULL;
 }
 #endif /* CONFIG_NET */
+11
kernel/bpf/helpers.c
···
 		return __bpf_skb_load_bytes(src->data, src->offset + offset, dst, len);
 	case BPF_DYNPTR_TYPE_XDP:
 		return __bpf_xdp_load_bytes(src->data, src->offset + offset, dst, len);
+	case BPF_DYNPTR_TYPE_SKB_META:
+		memmove(dst, bpf_skb_meta_pointer(src->data, src->offset + offset), len);
+		return 0;
 	default:
 		WARN_ONCE(true, "bpf_dynptr_read: unknown dynptr type %d\n", type);
 		return -EFAULT;
···
 		if (flags)
 			return -EINVAL;
 		return __bpf_xdp_store_bytes(dst->data, dst->offset + offset, src, len);
+	case BPF_DYNPTR_TYPE_SKB_META:
+		if (flags)
+			return -EINVAL;
+		memmove(bpf_skb_meta_pointer(dst->data, dst->offset + offset), src, len);
+		return 0;
 	default:
 		WARN_ONCE(true, "bpf_dynptr_write: unknown dynptr type %d\n", type);
 		return -EFAULT;
···
 		return (unsigned long)(ptr->data + ptr->offset + offset);
 	case BPF_DYNPTR_TYPE_SKB:
 	case BPF_DYNPTR_TYPE_XDP:
+	case BPF_DYNPTR_TYPE_SKB_META:
 		/* skb and xdp dynptrs should use bpf_dynptr_slice / bpf_dynptr_slice_rdwr */
 		return 0;
 	default:
···
 		bpf_xdp_copy_buf(ptr->data, ptr->offset + offset, buffer__opt, len, false);
 		return buffer__opt;
 	}
+	case BPF_DYNPTR_TYPE_SKB_META:
+		return bpf_skb_meta_pointer(ptr->data, ptr->offset + offset);
 	default:
 		WARN_ONCE(true, "unknown dynptr type %d\n", type);
 		return NULL;
+2
kernel/bpf/log.c
···
 		return "skb";
 	case BPF_DYNPTR_TYPE_XDP:
 		return "xdp";
+	case BPF_DYNPTR_TYPE_SKB_META:
+		return "skb_meta";
 	case BPF_DYNPTR_TYPE_INVALID:
 		return "<invalid>";
 	default:
+13 -2
kernel/bpf/verifier.c
···
 		return BPF_DYNPTR_TYPE_SKB;
 	case DYNPTR_TYPE_XDP:
 		return BPF_DYNPTR_TYPE_XDP;
+	case DYNPTR_TYPE_SKB_META:
+		return BPF_DYNPTR_TYPE_SKB_META;
 	default:
 		return BPF_DYNPTR_TYPE_INVALID;
 	}
···
 		return DYNPTR_TYPE_SKB;
 	case BPF_DYNPTR_TYPE_XDP:
 		return DYNPTR_TYPE_XDP;
+	case BPF_DYNPTR_TYPE_SKB_META:
+		return DYNPTR_TYPE_SKB_META;
 	default:
 		return 0;
 	}
···
 static bool reg_is_dynptr_slice_pkt(const struct bpf_reg_state *reg)
 {
 	return base_type(reg->type) == PTR_TO_MEM &&
-	       (reg->type & DYNPTR_TYPE_SKB || reg->type & DYNPTR_TYPE_XDP);
+	       (reg->type &
+		(DYNPTR_TYPE_SKB | DYNPTR_TYPE_XDP | DYNPTR_TYPE_SKB_META));
 }
 
 /* Unmodified PTR_TO_PACKET[_META,_END] register from ctx access. */
···
 	if (dynptr_type == BPF_DYNPTR_TYPE_INVALID)
 		return -EFAULT;
 
-	if (dynptr_type == BPF_DYNPTR_TYPE_SKB)
+	if (dynptr_type == BPF_DYNPTR_TYPE_SKB ||
+	    dynptr_type == BPF_DYNPTR_TYPE_SKB_META)
 		/* this will trigger clear_all_pkt_pointers(), which will
 		 * invalidate all dynptr slices associated with the skb
 		 */
···
 	KF_bpf_rbtree_right,
 	KF_bpf_dynptr_from_skb,
 	KF_bpf_dynptr_from_xdp,
+	KF_bpf_dynptr_from_skb_meta,
 	KF_bpf_dynptr_slice,
 	KF_bpf_dynptr_slice_rdwr,
 	KF_bpf_dynptr_clone,
···
 #ifdef CONFIG_NET
 BTF_ID(func, bpf_dynptr_from_skb)
 BTF_ID(func, bpf_dynptr_from_xdp)
+BTF_ID(func, bpf_dynptr_from_skb_meta)
 #else
+BTF_ID_UNUSED
 BTF_ID_UNUSED
 BTF_ID_UNUSED
 #endif
···
 		dynptr_arg_type |= DYNPTR_TYPE_SKB;
 	} else if (meta->func_id == special_kfunc_list[KF_bpf_dynptr_from_xdp]) {
 		dynptr_arg_type |= DYNPTR_TYPE_XDP;
+	} else if (meta->func_id == special_kfunc_list[KF_bpf_dynptr_from_skb_meta]) {
+		dynptr_arg_type |= DYNPTR_TYPE_SKB_META;
 	} else if (meta->func_id == special_kfunc_list[KF_bpf_dynptr_clone] &&
 		   (dynptr_arg_type & MEM_UNINIT)) {
 		enum bpf_dynptr_type parent_type = meta->initialized_dynptr.type;
+57
net/core/filter.c
···
 	return func;
 }
 
+/**
+ * bpf_skb_meta_pointer() - Gets a mutable pointer within the skb metadata area.
+ * @skb: socket buffer carrying the metadata
+ * @offset: offset into the metadata area, must be <= skb_metadata_len()
+ */
+void *bpf_skb_meta_pointer(struct sk_buff *skb, u32 offset)
+{
+	return skb_metadata_end(skb) - skb_metadata_len(skb) + offset;
+}
+
 __bpf_kfunc_start_defs();
 __bpf_kfunc int bpf_dynptr_from_skb(struct __sk_buff *s, u64 flags,
 				    struct bpf_dynptr *ptr__uninit)
···
 	}
 
 	bpf_dynptr_init(ptr, skb, BPF_DYNPTR_TYPE_SKB, 0, skb->len);
+
+	return 0;
+}
+
+/**
+ * bpf_dynptr_from_skb_meta() - Initialize a dynptr to the skb metadata area.
+ * @skb_: socket buffer carrying the metadata
+ * @flags: future use, must be zero
+ * @ptr__uninit: dynptr to initialize
+ *
+ * Set up a dynptr for access to the metadata area earlier allocated from the
+ * XDP context with bpf_xdp_adjust_meta(). Serves as an alternative to
+ * &__sk_buff->data_meta.
+ *
+ * If passed @skb_ is a clone which shares the data with the original, the
+ * dynptr will be read-only. This limitation may be lifted in the future.
+ *
+ * Return:
+ * * %0       - dynptr ready to use
+ * * %-EINVAL - invalid flags, dynptr set to null
+ */
+__bpf_kfunc int bpf_dynptr_from_skb_meta(struct __sk_buff *skb_, u64 flags,
+					 struct bpf_dynptr *ptr__uninit)
+{
+	struct bpf_dynptr_kern *ptr = (struct bpf_dynptr_kern *)ptr__uninit;
+	struct sk_buff *skb = (struct sk_buff *)skb_;
+
+	if (flags) {
+		bpf_dynptr_set_null(ptr);
+		return -EINVAL;
+	}
+
+	bpf_dynptr_init(ptr, skb, BPF_DYNPTR_TYPE_SKB_META, 0, skb_metadata_len(skb));
+
+	if (skb_cloned(skb))
+		bpf_dynptr_set_rdonly(ptr);
 
 	return 0;
 }
···
 BTF_ID_FLAGS(func, bpf_dynptr_from_skb, KF_TRUSTED_ARGS)
 BTF_KFUNCS_END(bpf_kfunc_check_set_skb)
 
+BTF_KFUNCS_START(bpf_kfunc_check_set_skb_meta)
+BTF_ID_FLAGS(func, bpf_dynptr_from_skb_meta, KF_TRUSTED_ARGS)
+BTF_KFUNCS_END(bpf_kfunc_check_set_skb_meta)
+
 BTF_KFUNCS_START(bpf_kfunc_check_set_xdp)
 BTF_ID_FLAGS(func, bpf_dynptr_from_xdp)
 BTF_KFUNCS_END(bpf_kfunc_check_set_xdp)
···
 static const struct btf_kfunc_id_set bpf_kfunc_set_skb = {
 	.owner = THIS_MODULE,
 	.set = &bpf_kfunc_check_set_skb,
+};
+
+static const struct btf_kfunc_id_set bpf_kfunc_set_skb_meta = {
+	.owner = THIS_MODULE,
+	.set = &bpf_kfunc_check_set_skb_meta,
 };
 
 static const struct btf_kfunc_id_set bpf_kfunc_set_xdp = {
···
 	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_LWT_SEG6LOCAL, &bpf_kfunc_set_skb);
 	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_NETFILTER, &bpf_kfunc_set_skb);
 	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &bpf_kfunc_set_skb);
+	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &bpf_kfunc_set_skb_meta);
+	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_ACT, &bpf_kfunc_set_skb_meta);
 	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &bpf_kfunc_set_xdp);
 	ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
 					       &bpf_kfunc_set_sock_addr);
+3
tools/testing/selftests/bpf/bpf_kfuncs.h
···
 extern int bpf_dynptr_from_xdp(struct xdp_md *xdp, __u64 flags,
 			       struct bpf_dynptr *ptr__uninit) __ksym __weak;
 
+extern int bpf_dynptr_from_skb_meta(struct __sk_buff *skb, __u64 flags,
+				    struct bpf_dynptr *ptr__uninit) __ksym __weak;
+
 /* Description
  *	Obtain a read-only pointer to the dynptr's data
  * Returns
+1
tools/testing/selftests/bpf/config
···
 CONFIG_MPLS_ROUTING=y
 CONFIG_MPTCP=y
 CONFIG_NET_ACT_GACT=y
+CONFIG_NET_ACT_MIRRED=y
 CONFIG_NET_ACT_SKBMOD=y
 CONFIG_NET_CLS=y
 CONFIG_NET_CLS_ACT=y
+2
tools/testing/selftests/bpf/prog_tests/dynptr.c
···
 	{"test_ringbuf", SETUP_SYSCALL_SLEEP},
 	{"test_skb_readonly", SETUP_SKB_PROG},
 	{"test_dynptr_skb_data", SETUP_SKB_PROG},
+	{"test_dynptr_skb_meta_data", SETUP_SKB_PROG},
+	{"test_dynptr_skb_meta_flags", SETUP_SKB_PROG},
 	{"test_adjust", SETUP_SYSCALL_SLEEP},
 	{"test_adjust_err", SETUP_SYSCALL_SLEEP},
 	{"test_zero_size_dynptr", SETUP_SYSCALL_SLEEP},
+196 -26
tools/testing/selftests/bpf/prog_tests/xdp_context_test_run.c
···
 #define TX_NETNS "xdp_context_tx"
 #define RX_NETNS "xdp_context_rx"
 #define TAP_NAME "tap0"
+#define DUMMY_NAME "dum0"
 #define TAP_NETNS "xdp_context_tuntap"
 
 #define TEST_PAYLOAD_LEN 32
···
 	return -1;
 }
 
-static void assert_test_result(struct test_xdp_meta *skel)
+static int write_test_packet(int tap_fd)
+{
+	__u8 packet[sizeof(struct ethhdr) + TEST_PAYLOAD_LEN];
+	int n;
+
+	/* The ethernet header doesn't need to be valid for this test */
+	memset(packet, 0, sizeof(struct ethhdr));
+	memcpy(packet + sizeof(struct ethhdr), test_payload, TEST_PAYLOAD_LEN);
+
+	n = write(tap_fd, packet, sizeof(packet));
+	if (!ASSERT_EQ(n, sizeof(packet), "write packet"))
+		return -1;
+
+	return 0;
+}
+
+static void assert_test_result(const struct bpf_map *result_map)
 {
 	int err;
 	__u32 map_key = 0;
 	__u8 map_value[TEST_PAYLOAD_LEN];
 
-	err = bpf_map__lookup_elem(skel->maps.test_result, &map_key,
-				   sizeof(map_key), &map_value,
-				   TEST_PAYLOAD_LEN, BPF_ANY);
+	err = bpf_map__lookup_elem(result_map, &map_key, sizeof(map_key),
+				   &map_value, TEST_PAYLOAD_LEN, BPF_ANY);
 	if (!ASSERT_OK(err, "lookup test_result"))
 		return;
 
 	ASSERT_MEMEQ(&map_value, &test_payload, TEST_PAYLOAD_LEN,
 		     "test_result map contains test payload");
+}
+
+static bool clear_test_result(struct bpf_map *result_map)
+{
+	const __u8 v[sizeof(test_payload)] = {};
+	const __u32 k = 0;
+	int err;
+
+	err = bpf_map__update_elem(result_map, &k, sizeof(k), v, sizeof(v), BPF_ANY);
+	ASSERT_OK(err, "update test_result");
+
+	return err == 0;
 }
 
 void test_xdp_context_veth(void)
···
 	if (!ASSERT_OK(ret, "send_test_packet"))
 		goto close;
 
-	assert_test_result(skel);
+	assert_test_result(skel->maps.test_result);
 
 close:
 	close_netns(nstoken);
···
 	netns_free(tx_ns);
 }
 
-void test_xdp_context_tuntap(void)
+static void test_tuntap(struct bpf_program *xdp_prog,
+			struct bpf_program *tc_prio_1_prog,
+			struct bpf_program *tc_prio_2_prog,
+			struct bpf_map *result_map)
 {
 	LIBBPF_OPTS(bpf_tc_hook, tc_hook, .attach_point = BPF_TC_INGRESS);
 	LIBBPF_OPTS(bpf_tc_opts, tc_opts, .handle = 1, .priority = 1);
 	struct netns_obj *ns = NULL;
-	struct test_xdp_meta *skel = NULL;
-	__u8 packet[sizeof(struct ethhdr) + TEST_PAYLOAD_LEN];
 	int tap_fd = -1;
 	int tap_ifindex;
 	int ret;
+
+	if (!clear_test_result(result_map))
+		return;
 
 	ns = netns_new(TAP_NETNS, true);
 	if (!ASSERT_OK_PTR(ns, "create and open ns"))
···
 
 	SYS(close, "ip link set dev " TAP_NAME " up");
 
-	skel = test_xdp_meta__open_and_load();
-	if (!ASSERT_OK_PTR(skel, "open and load skeleton"))
-		goto close;
-
 	tap_ifindex = if_nametoindex(TAP_NAME);
 	if (!ASSERT_GE(tap_ifindex, 0, "if_nametoindex"))
 		goto close;
···
 	if (!ASSERT_OK(ret, "bpf_tc_hook_create"))
 		goto close;
 
-	tc_opts.prog_fd = bpf_program__fd(skel->progs.ing_cls);
+	tc_opts.prog_fd = bpf_program__fd(tc_prio_1_prog);
 	ret = bpf_tc_attach(&tc_hook, &tc_opts);
 	if (!ASSERT_OK(ret, "bpf_tc_attach"))
 		goto close;
 
-	ret = bpf_xdp_attach(tap_ifindex, bpf_program__fd(skel->progs.ing_xdp),
+	if (tc_prio_2_prog) {
+		LIBBPF_OPTS(bpf_tc_opts, tc_opts, .handle = 1, .priority = 2,
+			    .prog_fd = bpf_program__fd(tc_prio_2_prog));
+
+		ret = bpf_tc_attach(&tc_hook, &tc_opts);
+		if (!ASSERT_OK(ret, "bpf_tc_attach"))
+			goto close;
+	}
+
+	ret = bpf_xdp_attach(tap_ifindex, bpf_program__fd(xdp_prog),
 			     0, NULL);
 	if (!ASSERT_GE(ret, 0, "bpf_xdp_attach"))
 		goto close;
 
-	/* The ethernet header is not relevant for this test and doesn't need to
-	 * be meaningful.
-	 */
-	struct ethhdr eth = { 0 };
-
-	memcpy(packet, &eth, sizeof(eth));
-	memcpy(packet + sizeof(eth), test_payload, TEST_PAYLOAD_LEN);
-
-	ret = write(tap_fd, packet, sizeof(packet));
-	if (!ASSERT_EQ(ret, sizeof(packet), "write packet"))
+	ret = write_test_packet(tap_fd);
+	if (!ASSERT_OK(ret, "write_test_packet"))
 		goto close;
 
-	assert_test_result(skel);
+	assert_test_result(result_map);
 
 close:
 	if (tap_fd >= 0)
 		close(tap_fd);
-	test_xdp_meta__destroy(skel);
 	netns_free(ns);
+}
+
+/* Write a packet to a tap dev and copy it to ingress of a dummy dev */
+static void test_tuntap_mirred(struct bpf_program *xdp_prog,
+			       struct bpf_program *tc_prog,
+			       bool *test_pass)
+{
+	LIBBPF_OPTS(bpf_tc_hook, tc_hook, .attach_point = BPF_TC_INGRESS);
+	LIBBPF_OPTS(bpf_tc_opts, tc_opts, .handle = 1, .priority = 1);
+	struct netns_obj *ns = NULL;
+	int dummy_ifindex;
+	int tap_fd = -1;
+	int tap_ifindex;
+	int ret;
+
+	*test_pass = false;
+
+	ns = netns_new(TAP_NETNS, true);
+	if (!ASSERT_OK_PTR(ns, "netns_new"))
+		return;
+
+	/* Setup dummy interface */
+	SYS(close, "ip link add name " DUMMY_NAME " type dummy");
+	SYS(close, "ip link set dev " DUMMY_NAME " up");
+
+	dummy_ifindex = if_nametoindex(DUMMY_NAME);
+	if (!ASSERT_GE(dummy_ifindex, 0, "if_nametoindex"))
+		goto close;
+
+	tc_hook.ifindex = dummy_ifindex;
+	ret = bpf_tc_hook_create(&tc_hook);
+	if (!ASSERT_OK(ret, "bpf_tc_hook_create"))
+		goto close;
+
+	tc_opts.prog_fd = bpf_program__fd(tc_prog);
+	ret = bpf_tc_attach(&tc_hook, &tc_opts);
+	if (!ASSERT_OK(ret, "bpf_tc_attach"))
+		goto close;
+
+	/* Setup TAP interface */
+	tap_fd = open_tuntap(TAP_NAME, true);
+	if (!ASSERT_GE(tap_fd, 0, "open_tuntap"))
+		goto close;
+
+	SYS(close, "ip link set dev " TAP_NAME " up");
+
+	tap_ifindex = if_nametoindex(TAP_NAME);
+	if (!ASSERT_GE(tap_ifindex, 0, "if_nametoindex"))
+		goto close;
+
+	ret = bpf_xdp_attach(tap_ifindex, bpf_program__fd(xdp_prog), 0, NULL);
+	if (!ASSERT_GE(ret, 0, "bpf_xdp_attach"))
+		goto close;
+
+	/* Copy all packets received from TAP to dummy ingress */
+	SYS(close, "tc qdisc add dev " TAP_NAME " clsact");
+	SYS(close, "tc filter add dev " TAP_NAME " ingress "
+		   "protocol all matchall "
+		   "action mirred ingress mirror dev " DUMMY_NAME);
+
+	/* Receive a packet on TAP */
+	ret = write_test_packet(tap_fd);
+	if (!ASSERT_OK(ret, "write_test_packet"))
+		goto close;
+
+	ASSERT_TRUE(*test_pass, "test_pass");
+
+close:
+	if (tap_fd >= 0)
+		close(tap_fd);
+	netns_free(ns);
+}
+
+void test_xdp_context_tuntap(void)
+{
+	struct test_xdp_meta *skel = NULL;
+
+	skel = test_xdp_meta__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "open and load skeleton"))
+		return;
+
+	if (test__start_subtest("data_meta"))
+		test_tuntap(skel->progs.ing_xdp,
+			    skel->progs.ing_cls,
+			    NULL, /* tc prio 2 */
+			    skel->maps.test_result);
+	if (test__start_subtest("dynptr_read"))
+		test_tuntap(skel->progs.ing_xdp,
+			    skel->progs.ing_cls_dynptr_read,
+			    NULL, /* tc prio 2 */
+			    skel->maps.test_result);
+	if (test__start_subtest("dynptr_slice"))
+		test_tuntap(skel->progs.ing_xdp,
+			    skel->progs.ing_cls_dynptr_slice,
+			    NULL, /* tc prio 2 */
+			    skel->maps.test_result);
+	if (test__start_subtest("dynptr_write"))
+		test_tuntap(skel->progs.ing_xdp_zalloc_meta,
+			    skel->progs.ing_cls_dynptr_write,
+			    skel->progs.ing_cls_dynptr_read,
+			    skel->maps.test_result);
+	if (test__start_subtest("dynptr_slice_rdwr"))
+		test_tuntap(skel->progs.ing_xdp_zalloc_meta,
+			    skel->progs.ing_cls_dynptr_slice_rdwr,
+			    skel->progs.ing_cls_dynptr_slice,
+			    skel->maps.test_result);
+	if (test__start_subtest("dynptr_offset"))
+		test_tuntap(skel->progs.ing_xdp_zalloc_meta,
+			    skel->progs.ing_cls_dynptr_offset_wr,
+			    skel->progs.ing_cls_dynptr_offset_rd,
+			    skel->maps.test_result);
+	if (test__start_subtest("dynptr_offset_oob"))
+		test_tuntap(skel->progs.ing_xdp,
+			    skel->progs.ing_cls_dynptr_offset_oob,
+			    skel->progs.ing_cls,
+			    skel->maps.test_result);
+	if (test__start_subtest("clone_data_meta_empty_on_data_write"))
+		test_tuntap_mirred(skel->progs.ing_xdp,
+				   skel->progs.clone_data_meta_empty_on_data_write,
+				   &skel->bss->test_pass);
+	if (test__start_subtest("clone_data_meta_empty_on_meta_write"))
+		test_tuntap_mirred(skel->progs.ing_xdp,
+				   skel->progs.clone_data_meta_empty_on_meta_write,
+				   &skel->bss->test_pass);
+	if (test__start_subtest("clone_dynptr_empty_on_data_slice_write"))
+		test_tuntap_mirred(skel->progs.ing_xdp,
+				   skel->progs.clone_dynptr_empty_on_data_slice_write,
+				   &skel->bss->test_pass);
+	if (test__start_subtest("clone_dynptr_empty_on_meta_slice_write"))
+		test_tuntap_mirred(skel->progs.ing_xdp,
+				   skel->progs.clone_dynptr_empty_on_meta_slice_write,
+				   &skel->bss->test_pass);
+	if (test__start_subtest("clone_dynptr_rdonly_before_data_dynptr_write"))
+		test_tuntap_mirred(skel->progs.ing_xdp,
+				   skel->progs.clone_dynptr_rdonly_before_data_dynptr_write,
+				   &skel->bss->test_pass);
+	if (test__start_subtest("clone_dynptr_rdonly_before_meta_dynptr_write"))
+		test_tuntap_mirred(skel->progs.ing_xdp,
+				   skel->progs.clone_dynptr_rdonly_before_meta_dynptr_write,
+				   &skel->bss->test_pass);
+
+	test_xdp_meta__destroy(skel);
 }
+258
tools/testing/selftests/bpf/progs/dynptr_fail.c
···
 	return SK_PASS;
 }
 
+/* A metadata slice can't be accessed out of bounds */
+SEC("?tc")
+__failure __msg("value is outside of the allowed memory range")
+int data_slice_out_of_bounds_skb_meta(struct __sk_buff *skb)
+{
+	struct bpf_dynptr meta;
+	__u8 *md;
+
+	bpf_dynptr_from_skb_meta(skb, 0, &meta);
+
+	md = bpf_dynptr_slice_rdwr(&meta, 0, NULL, sizeof(*md));
+	if (!md)
+		return SK_DROP;
+
+	/* this should fail */
+	*(md + 1) = 42;
+
+	return SK_PASS;
+}
+
 SEC("?raw_tp")
 __failure __msg("value is outside of the allowed memory range")
 int data_slice_out_of_bounds_map_value(void *ctx)
···
 	return SK_PASS;
 }
 
+/* bpf_dynptr_slice()s are read-only and cannot be written to */
+SEC("?tc")
+__failure __msg("R{{[0-9]+}} cannot write into rdonly_mem")
+int skb_meta_invalid_slice_write(struct __sk_buff *skb)
+{
+	struct bpf_dynptr meta;
+	__u8 *md;
+
+	bpf_dynptr_from_skb_meta(skb, 0, &meta);
+
+	md = bpf_dynptr_slice(&meta, 0, NULL, sizeof(*md));
+	if (!md)
+		return SK_DROP;
+
+	/* this should fail */
+	*md = 42;
+
+	return SK_PASS;
+}
+
 /* The read-only data slice is invalidated whenever a helper changes packet data */
 SEC("?tc")
 __failure __msg("invalid mem access 'scalar'")
···
 	return SK_PASS;
 }
 
+/* Read-only skb data slice is invalidated on write to skb metadata */
+SEC("?tc")
+__failure __msg("invalid mem access 'scalar'")
+int ro_skb_slice_invalid_after_metadata_write(struct __sk_buff *skb)
+{
+	struct bpf_dynptr data, meta;
+	__u8 *d;
+
+	bpf_dynptr_from_skb(skb, 0, &data);
+	bpf_dynptr_from_skb_meta(skb, 0, &meta);
+
+	d = bpf_dynptr_slice(&data, 0, NULL, sizeof(*d));
+	if (!d)
+		return SK_DROP;
+
+	bpf_dynptr_write(&meta, 0, "x", 1, 0);
+
+	/* this should fail */
+	val = *d;
+
+	return SK_PASS;
+}
+
+/* Read-write skb data slice is invalidated on write to skb metadata */
+SEC("?tc")
+__failure __msg("invalid mem access 'scalar'")
+int rw_skb_slice_invalid_after_metadata_write(struct __sk_buff *skb)
+{
+	struct bpf_dynptr data, meta;
+	__u8 *d;
+
+	bpf_dynptr_from_skb(skb, 0, &data);
+	bpf_dynptr_from_skb_meta(skb, 0, &meta);
+
+	d = bpf_dynptr_slice_rdwr(&data, 0, NULL, sizeof(*d));
+	if (!d)
+		return SK_DROP;
+
+	bpf_dynptr_write(&meta, 0, "x", 1, 0);
+
+	/* this should fail */
+	*d = 42;
+
+	return SK_PASS;
+}
+
+/* Read-only skb metadata slice is invalidated on write to skb data */
+SEC("?tc")
+__failure __msg("invalid mem access 'scalar'")
+int ro_skb_meta_slice_invalid_after_payload_write(struct __sk_buff *skb)
+{
+	struct bpf_dynptr data, meta;
+	__u8 *md;
+
+	bpf_dynptr_from_skb(skb, 0, &data);
+	bpf_dynptr_from_skb_meta(skb, 0, &meta);
+
+	md = bpf_dynptr_slice(&meta, 0, NULL, sizeof(*md));
+	if (!md)
+		return SK_DROP;
+
+	bpf_dynptr_write(&data, 0, "x", 1, 0);
+
+	/* this should fail */
+	val = *md;
+
+	return SK_PASS;
+}
+
+/* Read-write skb metadata slice is invalidated on write to skb data slice */
+SEC("?tc")
+__failure __msg("invalid mem access 'scalar'")
+int rw_skb_meta_slice_invalid_after_payload_write(struct __sk_buff *skb)
+{
+	struct bpf_dynptr data, meta;
+	__u8 *md;
+
+	bpf_dynptr_from_skb(skb, 0, &data);
+	bpf_dynptr_from_skb_meta(skb, 0, &meta);
+
+	md = bpf_dynptr_slice_rdwr(&meta, 0, NULL, sizeof(*md));
+	if (!md)
+		return SK_DROP;
+
+	bpf_dynptr_write(&data, 0, "x", 1, 0);
+
+	/* this should fail */
+	*md = 42;
+
+	return SK_PASS;
+}
+
+/* Read-only skb metadata slice is invalidated whenever a helper changes packet data */
+SEC("?tc")
+__failure __msg("invalid mem access 'scalar'")
+int ro_skb_meta_slice_invalid_after_payload_helper(struct __sk_buff *skb)
+{
+	struct bpf_dynptr meta;
+	__u8 *md;
+
+	bpf_dynptr_from_skb_meta(skb, 0, &meta);
+
+	md = bpf_dynptr_slice(&meta, 0, NULL, sizeof(*md));
+	if (!md)
+		return SK_DROP;
+
+	if (bpf_skb_pull_data(skb, skb->len))
+		return SK_DROP;
+
+	/* this should fail */
+	val = *md;
+
+	return SK_PASS;
+}
+
+/* Read-write skb metadata slice is invalidated whenever a helper changes packet data */
+SEC("?tc")
+__failure __msg("invalid mem access 'scalar'")
+int rw_skb_meta_slice_invalid_after_payload_helper(struct __sk_buff *skb)
+{
+	struct bpf_dynptr meta;
+	__u8 *md;
+
+	bpf_dynptr_from_skb_meta(skb, 0, &meta);
+
+	md = bpf_dynptr_slice_rdwr(&meta, 0, NULL, sizeof(*md));
+	if (!md)
+		return SK_DROP;
+
+	if (bpf_skb_pull_data(skb, skb->len))
+		return SK_DROP;
+
+	/* this should fail */
+	*md = 42;
+
+	return SK_PASS;
+}
+
+/* Read-only skb metadata slice is invalidated on write to skb metadata */
+SEC("?tc")
+__failure __msg("invalid mem access 'scalar'")
+int ro_skb_meta_slice_invalid_after_metadata_write(struct __sk_buff *skb)
+{
+	struct bpf_dynptr meta;
+	__u8 *md;
+
+	bpf_dynptr_from_skb_meta(skb, 0, &meta);
+
+	md = bpf_dynptr_slice(&meta, 0, NULL, sizeof(*md));
+	if (!md)
+		return SK_DROP;
+
+	bpf_dynptr_write(&meta, 0, "x", 1, 0);
+
+	/* this should fail */
+	val = *md;
+
+	return SK_PASS;
+}
+
+/* Read-write skb metadata slice is invalidated on write to skb metadata */
+SEC("?tc")
+__failure __msg("invalid mem access 'scalar'")
+int rw_skb_meta_slice_invalid_after_metadata_write(struct __sk_buff *skb)
+{
+	struct bpf_dynptr meta;
+	__u8 *md;
+
+	bpf_dynptr_from_skb_meta(skb, 0, &meta);
+
+	md = bpf_dynptr_slice_rdwr(&meta, 0, NULL, sizeof(*md));
+	if (!md)
+		return SK_DROP;
+
+	bpf_dynptr_write(&meta, 0, "x", 1, 0);
+
+	/* this should fail */
+	*md = 42;
+
+	return SK_PASS;
+}
+
 /* The read-only data slice is invalidated whenever a helper changes packet data */
 SEC("?xdp")
 __failure __msg("invalid mem access 'scalar'")
···
 
 	/* this should fail */
 	bpf_dynptr_from_skb(ctx, 0, &ptr);
+
+	return 0;
+}
+
+/* Only supported prog type can create skb_meta-type dynptrs */
+SEC("?raw_tp")
+__failure __msg("calling kernel function bpf_dynptr_from_skb_meta is not allowed")
+int skb_meta_invalid_ctx(void *ctx)
+{
+	struct bpf_dynptr meta;
+
+	/* this should fail */
+	bpf_dynptr_from_skb_meta(ctx, 0, &meta);
 
 	return 0;
 }
···
 
 	/* this should fail */
 	*data = 123;
+
+	return 0;
+}
+
+/* A skb clone's metadata slice becomes invalid anytime packet data changes */
+SEC("?tc")
+__failure __msg("invalid mem access 'scalar'")
+int clone_skb_packet_meta(struct __sk_buff *skb)
+{
+	struct bpf_dynptr clone, meta;
+	__u8 *md;
+
+	bpf_dynptr_from_skb_meta(skb, 0, &meta);
+	bpf_dynptr_clone(&meta, &clone);
+	md = bpf_dynptr_slice_rdwr(&clone, 0, NULL, sizeof(*md));
+	if (!md)
+		return SK_DROP;
+
+	if (bpf_skb_pull_data(skb, skb->len))
+		return SK_DROP;
+
+	/* this should fail */
+	*md = 42;
 
 	return 0;
 }
tools/testing/selftests/bpf/progs/dynptr_success.c (+55)
···
 	return 1;
 }
 
+SEC("?tc")
+int test_dynptr_skb_meta_data(struct __sk_buff *skb)
+{
+	struct bpf_dynptr meta;
+	__u8 *md;
+	int ret;
+
+	err = 1;
+	ret = bpf_dynptr_from_skb_meta(skb, 0, &meta);
+	if (ret)
+		return 1;
+
+	/* This should return NULL. Must use bpf_dynptr_slice API */
+	err = 2;
+	md = bpf_dynptr_data(&meta, 0, sizeof(*md));
+	if (md)
+		return 1;
+
+	err = 0;
+	return 1;
+}
+
+/* Check that skb metadata dynptr ops don't accept any flags. */
+SEC("?tc")
+int test_dynptr_skb_meta_flags(struct __sk_buff *skb)
+{
+	const __u64 INVALID_FLAGS = ~0ULL;
+	struct bpf_dynptr meta;
+	__u8 buf;
+	int ret;
+
+	err = 1;
+	ret = bpf_dynptr_from_skb_meta(skb, INVALID_FLAGS, &meta);
+	if (ret != -EINVAL)
+		return 1;
+
+	err = 2;
+	ret = bpf_dynptr_from_skb_meta(skb, 0, &meta);
+	if (ret)
+		return 1;
+
+	err = 3;
+	ret = bpf_dynptr_read(&buf, 0, &meta, 0, INVALID_FLAGS);
+	if (ret != -EINVAL)
+		return 1;
+
+	err = 4;
+	ret = bpf_dynptr_write(&meta, 0, &buf, 0, INVALID_FLAGS);
+	if (ret != -EINVAL)
+		return 1;
+
+	err = 0;
+	return 1;
+}
+
 SEC("tp/syscalls/sys_enter_nanosleep")
 int test_adjust(void *ctx)
 {
tools/testing/selftests/bpf/progs/test_xdp_meta.c (+419)
···
+#include <stdbool.h>
 #include <linux/bpf.h>
+#include <linux/errno.h>
 #include <linux/if_ether.h>
 #include <linux/pkt_cls.h>
 
 #include <bpf/bpf_helpers.h>
+#include "bpf_kfuncs.h"
 
 #define META_SIZE 32
 
···
 	__uint(value_size, META_SIZE);
 } test_result SEC(".maps");
 
+bool test_pass;
+
 SEC("tc")
 int ing_cls(struct __sk_buff *ctx)
 {
···
 	bpf_map_update_elem(&test_result, &key, data_meta, BPF_ANY);
 
 	return TC_ACT_SHOT;
+}
+
+/* Read from metadata using bpf_dynptr_read helper */
+SEC("tc")
+int ing_cls_dynptr_read(struct __sk_buff *ctx)
+{
+	struct bpf_dynptr meta;
+	const __u32 zero = 0;
+	__u8 *dst;
+
+	dst = bpf_map_lookup_elem(&test_result, &zero);
+	if (!dst)
+		return TC_ACT_SHOT;
+
+	bpf_dynptr_from_skb_meta(ctx, 0, &meta);
+	bpf_dynptr_read(dst, META_SIZE, &meta, 0, 0);
+
+	return TC_ACT_SHOT;
+}
+
+/* Write to metadata using bpf_dynptr_write helper */
+SEC("tc")
+int ing_cls_dynptr_write(struct __sk_buff *ctx)
+{
+	struct bpf_dynptr data, meta;
+	__u8 *src;
+
+	bpf_dynptr_from_skb(ctx, 0, &data);
+	src = bpf_dynptr_slice(&data, sizeof(struct ethhdr), NULL, META_SIZE);
+	if (!src)
+		return TC_ACT_SHOT;
+
+	bpf_dynptr_from_skb_meta(ctx, 0, &meta);
+	bpf_dynptr_write(&meta, 0, src, META_SIZE, 0);
+
+	return TC_ACT_UNSPEC; /* pass */
+}
+
+/* Read from metadata using read-only dynptr slice */
+SEC("tc")
+int ing_cls_dynptr_slice(struct __sk_buff *ctx)
+{
+	struct bpf_dynptr meta;
+	const __u32 zero = 0;
+	__u8 *dst, *src;
+
+	dst = bpf_map_lookup_elem(&test_result, &zero);
+	if (!dst)
+		return TC_ACT_SHOT;
+
+	bpf_dynptr_from_skb_meta(ctx, 0, &meta);
+	src = bpf_dynptr_slice(&meta, 0, NULL, META_SIZE);
+	if (!src)
+		return TC_ACT_SHOT;
+
+	__builtin_memcpy(dst, src, META_SIZE);
+
+	return TC_ACT_SHOT;
+}
+
+/* Write to metadata using writeable dynptr slice */
+SEC("tc")
+int ing_cls_dynptr_slice_rdwr(struct __sk_buff *ctx)
+{
+	struct bpf_dynptr data, meta;
+	__u8 *src, *dst;
+
+	bpf_dynptr_from_skb(ctx, 0, &data);
+	src = bpf_dynptr_slice(&data, sizeof(struct ethhdr), NULL, META_SIZE);
+	if (!src)
+		return TC_ACT_SHOT;
+
+	bpf_dynptr_from_skb_meta(ctx, 0, &meta);
+	dst = bpf_dynptr_slice_rdwr(&meta, 0, NULL, META_SIZE);
+	if (!dst)
+		return TC_ACT_SHOT;
+
+	__builtin_memcpy(dst, src, META_SIZE);
+
+	return TC_ACT_UNSPEC; /* pass */
+}
+
+/* Read skb metadata in chunks from various offsets in different ways. */
+SEC("tc")
+int ing_cls_dynptr_offset_rd(struct __sk_buff *ctx)
+{
+	struct bpf_dynptr meta;
+	const __u32 chunk_len = META_SIZE / 4;
+	const __u32 zero = 0;
+	__u8 *dst, *src;
+
+	dst = bpf_map_lookup_elem(&test_result, &zero);
+	if (!dst)
+		return TC_ACT_SHOT;
+
+	/* 1. Regular read */
+	bpf_dynptr_from_skb_meta(ctx, 0, &meta);
+	bpf_dynptr_read(dst, chunk_len, &meta, 0, 0);
+	dst += chunk_len;
+
+	/* 2. Read from an offset-adjusted dynptr */
+	bpf_dynptr_adjust(&meta, chunk_len, bpf_dynptr_size(&meta));
+	bpf_dynptr_read(dst, chunk_len, &meta, 0, 0);
+	dst += chunk_len;
+
+	/* 3. Read at an offset */
+	bpf_dynptr_read(dst, chunk_len, &meta, chunk_len, 0);
+	dst += chunk_len;
+
+	/* 4. Read from a slice starting at an offset */
+	src = bpf_dynptr_slice(&meta, 2 * chunk_len, NULL, chunk_len);
+	if (!src)
+		return TC_ACT_SHOT;
+	__builtin_memcpy(dst, src, chunk_len);
+
+	return TC_ACT_SHOT;
+}
+
+/* Write skb metadata in chunks at various offsets in different ways. */
+SEC("tc")
+int ing_cls_dynptr_offset_wr(struct __sk_buff *ctx)
+{
+	const __u32 chunk_len = META_SIZE / 4;
+	__u8 payload[META_SIZE];
+	struct bpf_dynptr meta;
+	__u8 *dst, *src;
+
+	bpf_skb_load_bytes(ctx, sizeof(struct ethhdr), payload, sizeof(payload));
+	src = payload;
+
+	/* 1. Regular write */
+	bpf_dynptr_from_skb_meta(ctx, 0, &meta);
+	bpf_dynptr_write(&meta, 0, src, chunk_len, 0);
+	src += chunk_len;
+
+	/* 2. Write to an offset-adjusted dynptr */
+	bpf_dynptr_adjust(&meta, chunk_len, bpf_dynptr_size(&meta));
+	bpf_dynptr_write(&meta, 0, src, chunk_len, 0);
+	src += chunk_len;
+
+	/* 3. Write at an offset */
+	bpf_dynptr_write(&meta, chunk_len, src, chunk_len, 0);
+	src += chunk_len;
+
+	/* 4. Write to a slice starting at an offset */
+	dst = bpf_dynptr_slice_rdwr(&meta, 2 * chunk_len, NULL, chunk_len);
+	if (!dst)
+		return TC_ACT_SHOT;
+	__builtin_memcpy(dst, src, chunk_len);
+
+	return TC_ACT_UNSPEC; /* pass */
+}
+
+/* Pass an OOB offset to dynptr read, write, adjust, slice. */
+SEC("tc")
+int ing_cls_dynptr_offset_oob(struct __sk_buff *ctx)
+{
+	struct bpf_dynptr meta;
+	__u8 md, *p;
+	int err;
+
+	err = bpf_dynptr_from_skb_meta(ctx, 0, &meta);
+	if (err)
+		goto fail;
+
+	/* read offset OOB */
+	err = bpf_dynptr_read(&md, sizeof(md), &meta, META_SIZE, 0);
+	if (err != -E2BIG)
+		goto fail;
+
+	/* write offset OOB */
+	err = bpf_dynptr_write(&meta, META_SIZE, &md, sizeof(md), 0);
+	if (err != -E2BIG)
+		goto fail;
+
+	/* adjust end offset OOB */
+	err = bpf_dynptr_adjust(&meta, 0, META_SIZE + 1);
+	if (err != -ERANGE)
+		goto fail;
+
+	/* adjust start offset OOB */
+	err = bpf_dynptr_adjust(&meta, META_SIZE + 1, META_SIZE + 1);
+	if (err != -ERANGE)
+		goto fail;
+
+	/* slice offset OOB */
+	p = bpf_dynptr_slice(&meta, META_SIZE, NULL, sizeof(*p));
+	if (p)
+		goto fail;
+
+	/* slice rdwr offset OOB */
+	p = bpf_dynptr_slice_rdwr(&meta, META_SIZE, NULL, sizeof(*p));
+	if (p)
+		goto fail;
+
+	return TC_ACT_UNSPEC;
+fail:
+	return TC_ACT_SHOT;
+}
+
+/* Reserve and clear space for metadata but don't populate it */
+SEC("xdp")
+int ing_xdp_zalloc_meta(struct xdp_md *ctx)
+{
+	struct ethhdr *eth = ctx_ptr(ctx, data);
+	__u8 *meta;
+	int ret;
+
+	/* Drop any non-test packets */
+	if (eth + 1 > ctx_ptr(ctx, data_end))
+		return XDP_DROP;
+	if (eth->h_proto != 0)
+		return XDP_DROP;
+
+	ret = bpf_xdp_adjust_meta(ctx, -META_SIZE);
+	if (ret < 0)
+		return XDP_DROP;
+
+	meta = ctx_ptr(ctx, data_meta);
+	if (meta + META_SIZE > ctx_ptr(ctx, data))
+		return XDP_DROP;
+
+	__builtin_memset(meta, 0, META_SIZE);
+
+	return XDP_PASS;
 }
 
 SEC("xdp")
···
 
 	__builtin_memcpy(data_meta, payload, META_SIZE);
 	return XDP_PASS;
+}
+
+/*
+ * Check that skb->data_meta..skb->data is empty if prog writes to packet
+ * _payload_ using packet pointers. Applies only to cloned skbs.
+ */
+SEC("tc")
+int clone_data_meta_empty_on_data_write(struct __sk_buff *ctx)
+{
+	struct ethhdr *eth = ctx_ptr(ctx, data);
+
+	if (eth + 1 > ctx_ptr(ctx, data_end))
+		goto out;
+	/* Ignore non-test packets */
+	if (eth->h_proto != 0)
+		goto out;
+
+	/* Expect no metadata */
+	if (ctx->data_meta != ctx->data)
+		goto out;
+
+	/* Packet write to trigger unclone in prologue */
+	eth->h_proto = 42;
+
+	test_pass = true;
+out:
+	return TC_ACT_SHOT;
+}
+
+/*
+ * Check that skb->data_meta..skb->data is empty if prog writes to packet
+ * _metadata_ using packet pointers. Applies only to cloned skbs.
+ */
+SEC("tc")
+int clone_data_meta_empty_on_meta_write(struct __sk_buff *ctx)
+{
+	struct ethhdr *eth = ctx_ptr(ctx, data);
+	__u8 *md = ctx_ptr(ctx, data_meta);
+
+	if (eth + 1 > ctx_ptr(ctx, data_end))
+		goto out;
+	/* Ignore non-test packets */
+	if (eth->h_proto != 0)
+		goto out;
+
+	if (md + 1 > ctx_ptr(ctx, data)) {
+		/* Expect no metadata */
+		test_pass = true;
+	} else {
+		/* Metadata write to trigger unclone in prologue */
+		*md = 42;
+	}
+out:
+	return TC_ACT_SHOT;
+}
+
+/*
+ * Check that skb_meta dynptr is writable but empty if prog writes to packet
+ * _payload_ using a dynptr slice. Applies only to cloned skbs.
+ */
+SEC("tc")
+int clone_dynptr_empty_on_data_slice_write(struct __sk_buff *ctx)
+{
+	struct bpf_dynptr data, meta;
+	struct ethhdr *eth;
+
+	bpf_dynptr_from_skb(ctx, 0, &data);
+	eth = bpf_dynptr_slice_rdwr(&data, 0, NULL, sizeof(*eth));
+	if (!eth)
+		goto out;
+	/* Ignore non-test packets */
+	if (eth->h_proto != 0)
+		goto out;
+
+	/* Expect no metadata */
+	bpf_dynptr_from_skb_meta(ctx, 0, &meta);
+	if (bpf_dynptr_is_rdonly(&meta) || bpf_dynptr_size(&meta) > 0)
+		goto out;
+
+	/* Packet write to trigger unclone in prologue */
+	eth->h_proto = 42;
+
+	test_pass = true;
+out:
+	return TC_ACT_SHOT;
+}
+
+/*
+ * Check that skb_meta dynptr is writable but empty if prog writes to packet
+ * _metadata_ using a dynptr slice. Applies only to cloned skbs.
+ */
+SEC("tc")
+int clone_dynptr_empty_on_meta_slice_write(struct __sk_buff *ctx)
+{
+	struct bpf_dynptr data, meta;
+	const struct ethhdr *eth;
+	__u8 *md;
+
+	bpf_dynptr_from_skb(ctx, 0, &data);
+	eth = bpf_dynptr_slice(&data, 0, NULL, sizeof(*eth));
+	if (!eth)
+		goto out;
+	/* Ignore non-test packets */
+	if (eth->h_proto != 0)
+		goto out;
+
+	/* Expect no metadata */
+	bpf_dynptr_from_skb_meta(ctx, 0, &meta);
+	if (bpf_dynptr_is_rdonly(&meta) || bpf_dynptr_size(&meta) > 0)
+		goto out;
+
+	/* Metadata write to trigger unclone in prologue */
+	bpf_dynptr_from_skb_meta(ctx, 0, &meta);
+	md = bpf_dynptr_slice_rdwr(&meta, 0, NULL, sizeof(*md));
+	if (md)
+		*md = 42;
+
+	test_pass = true;
+out:
+	return TC_ACT_SHOT;
+}
+
+/*
+ * Check that skb_meta dynptr is read-only before prog writes to packet payload
+ * using dynptr_write helper. Applies only to cloned skbs.
+ */
+SEC("tc")
+int clone_dynptr_rdonly_before_data_dynptr_write(struct __sk_buff *ctx)
+{
+	struct bpf_dynptr data, meta;
+	const struct ethhdr *eth;
+
+	bpf_dynptr_from_skb(ctx, 0, &data);
+	eth = bpf_dynptr_slice(&data, 0, NULL, sizeof(*eth));
+	if (!eth)
+		goto out;
+	/* Ignore non-test packets */
+	if (eth->h_proto != 0)
+		goto out;
+
+	/* Expect read-only metadata before unclone */
+	bpf_dynptr_from_skb_meta(ctx, 0, &meta);
+	if (!bpf_dynptr_is_rdonly(&meta) || bpf_dynptr_size(&meta) != META_SIZE)
+		goto out;
+
+	/* Helper write to payload will unclone the packet */
+	bpf_dynptr_write(&data, offsetof(struct ethhdr, h_proto), "x", 1, 0);
+
+	/* Expect no metadata after unclone */
+	bpf_dynptr_from_skb_meta(ctx, 0, &meta);
+	if (bpf_dynptr_is_rdonly(&meta) || bpf_dynptr_size(&meta) != 0)
+		goto out;
+
+	test_pass = true;
+out:
+	return TC_ACT_SHOT;
+}
+
+/*
+ * Check that skb_meta dynptr is read-only if prog writes to packet
+ * metadata using dynptr_write helper. Applies only to cloned skbs.
+ */
+SEC("tc")
+int clone_dynptr_rdonly_before_meta_dynptr_write(struct __sk_buff *ctx)
+{
+	struct bpf_dynptr data, meta;
+	const struct ethhdr *eth;
+
+	bpf_dynptr_from_skb(ctx, 0, &data);
+	eth = bpf_dynptr_slice(&data, 0, NULL, sizeof(*eth));
+	if (!eth)
+		goto out;
+	/* Ignore non-test packets */
+	if (eth->h_proto != 0)
+		goto out;
+
+	/* Expect read-only metadata */
+	bpf_dynptr_from_skb_meta(ctx, 0, &meta);
+	if (!bpf_dynptr_is_rdonly(&meta) || bpf_dynptr_size(&meta) != META_SIZE)
+		goto out;
+
+	/* Metadata write. Expect failure. */
+	bpf_dynptr_from_skb_meta(ctx, 0, &meta);
+	if (bpf_dynptr_write(&meta, 0, "x", 1, 0) != -EINVAL)
+		goto out;
+
+	test_pass = true;
+out:
+	return TC_ACT_SHOT;
 }
 
 char _license[] SEC("license") = "GPL";