Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge branch 'close-race-in-freeing-special-fields-and-map-value'

Kumar Kartikeya Dwivedi says:

====================
Close race in freeing special fields and map value

There is a race across various map types: the special fields of a map
value (task_work, timer, wq, kptr, etc.) can be freed eagerly as soon as
the value is logically deleted, while a program that still has access to
that map value can recreate the fields. Nothing frees the recreated
fields again, so they leak.
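The interleaving can be illustrated with a single-threaded userspace model in C. This is a sketch under stated assumptions, not kernel code. `struct map_value`, `store_ref()`, `free_fields()` and the `outstanding` counter are hypothetical stand-ins for a map value, for `bpf_kptr_xchg()` with an acquire/release pair, for `bpf_obj_free_fields()`, and for the kptr refcount. The model replays the selftest's sequence (populate, delete, repopulate, reclaim) and reports how many references remain after the memory is reclaimed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: references acquired but not yet released. */
static int outstanding;

/* Hypothetical stand-in for a map value with a single kptr field. */
struct map_value {
	bool has_ref;
};

static void acquire(void)
{
	outstanding++;
}

static void release(void)
{
	assert(outstanding > 0);
	outstanding--;
}

/* Acquire a new reference and exchange it into the field, releasing the
 * old one if present (the pattern the selftest uses with bpf_kptr_xchg()). */
static void store_ref(struct map_value *v)
{
	acquire();
	if (v->has_ref)
		release();
	v->has_ref = true;
}

/* Stand-in for bpf_obj_free_fields(): drop whatever the field holds. */
static void free_fields(struct map_value *v)
{
	if (v->has_ref)
		release();
	v->has_ref = false;
}

/*
 * eager == true:  fields are freed at logical delete (behaviour before the fix).
 * eager == false: fields are freed only when the memory is reclaimed, after
 *                 every program that could still see the value is done.
 * Returns the number of references still outstanding (0 means no leak).
 */
static int replay(bool eager)
{
	struct map_value v = { .has_ref = false };

	outstanding = 0;
	store_ref(&v);		/* program populates the kptr */
	if (eager)
		free_fields(&v);	/* delete frees the fields immediately */
	store_ref(&v);		/* same program still holds v and refills it */
	if (!eager)
		free_fields(&v);	/* reclaim frees whatever is there now */
	return outstanding;
}
```

`replay(true)` returns 1, meaning one reference is leaked. `replay(false)` returns 0. In this model, the leak closes once field freeing moves from delete time to reclaim time. The diffs below take a comparable approach for the real code: local storage frees fields from its RCU callbacks, and non-preallocated hash maps free them from a `bpf_mem_alloc` destructor.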

This series fixes the race. It continues Mykyta's previous attempt in [0]
but covers all special fields. A selftest is included that reliably
reproduces the bug when the fixes are absent.

Local Storage Benchmarks
------------------------
Evaluation setup: benchmarked on a dual-socket Intel Xeon Gold 6348 (Ice
Lake) @ 2.60GHz (56 cores / 112 threads), with the CPU governor set to
performance. The bench process was pinned to a single NUMA node
throughout the test.

The benchmark comes from [1] and was run with the following command:
./bench -p 1 local-storage-create --storage-type <socket,task> --batch-size <16,32,64>

Before the test, 10 runs of all cases ([socket|task] x 3 batch sizes x 7
iterations per batch size) are done to warm up and prime the machine.

Then, 3 measured runs of all cases are done, with and without the patch,
across reboots.

Each comparison therefore has 21 samples per scenario (e.g. socket
storage with batch size 16): 3 runs x 7 iterations.

The statistics (mean, median, stddev) and a t-test are computed for each
scenario (local storage type and batch size pair) individually, using the
21 samples on either side. All values are local storage creations in
thousands per second (k/s).

           Baseline (without patch)     With patch                   Delta
Case          Median     Mean  Std. Dev.   Median     Mean  Std. Dev.   Median        %
---------------------------------------------------------------------------------------
socket 16    432.026  431.941      1.047  431.347  431.953      1.635   -0.679   -0.16%
socket 32    432.641  432.818      1.535  432.488  432.302      1.508   -0.153   -0.04%
socket 64    431.504  431.996      1.337  429.145  430.326      2.469   -2.359   -0.55%
task 16       38.816   39.382      1.456   39.657   39.337      1.831   +0.841   +2.17%
task 32       38.815   39.644      2.690   38.721   39.122      1.636   -0.094   -0.24%
task 64       37.562   38.080      1.701   39.554   38.563      1.689   +1.992   +5.30%
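The Delta columns appear to be derived as follows: the median with the patch minus the baseline median, and that difference expressed as a percentage of the baseline median. The short C check below recomputes both values from the table's medians. The `struct row` layout and the helper names are illustrative only. The recomputed values agree with the printed ones to the precision shown.

```c
#include <assert.h>

/* One table row: medians in thousand creations/sec (k/s). */
struct row {
	const char *name;
	double base_median;
	double patch_median;
};

static const struct row rows[] = {
	{ "socket 16", 432.026, 431.347 },
	{ "socket 32", 432.641, 432.488 },
	{ "socket 64", 431.504, 429.145 },
	{ "task 16",    38.816,  39.657 },
	{ "task 32",    38.815,  38.721 },
	{ "task 64",    37.562,  39.554 },
};

/* Absolute change of the median, in k/s. */
static double delta_abs(const struct row *r)
{
	return r->patch_median - r->base_median;
}

/* Change of the median relative to the baseline, in percent. */
static double delta_pct(const struct row *r)
{
	assert(r->base_median > 0.0);
	return 100.0 * delta_abs(r) / r->base_median;
}

/* True if x rounds to `printed`, given half of the printed step. */
static int matches(double x, double printed, double half_step)
{
	double d = x - printed;

	return d > -half_step && d <= half_step;
}
```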

The socket local storage results are within noise. The only
statistically significant case is socket with batch size 64 (t-test
p-value < 0.05), and even there the absolute difference is small
(~2k/s). The apparent improvements for task local storage are explained
by its high variance (CV ~4%-6% across batch sizes).
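The cover letter does not say which t-test variant was used. Assuming Welch's unequal-variance test on the means (this assumption, and every name in the sketch, is illustrative), the table's summary statistics with n = 21 per side are enough to reproduce the conclusion. The sketch compares t² against the squared critical value, so it does not need `sqrt()`. Welch's degrees of freedom are at least min(n1, n2) - 1 = 20 here, so t_crit(df = 20) ≈ 2.086 is a conservative two-sided threshold at α = 0.05.

```c
#include <assert.h>
#include <stdbool.h>

/* Summary statistics of one side of a comparison. */
struct stats {
	double mean;
	double sd;	/* sample standard deviation */
	int n;		/* number of samples */
};

/* Welch's t statistic, squared:
 *   t^2 = (m1 - m2)^2 / (s1^2 / n1 + s2^2 / n2)
 */
static double welch_t2(struct stats a, struct stats b)
{
	double diff = a.mean - b.mean;
	double se2 = a.sd * a.sd / a.n + b.sd * b.sd / b.n;

	assert(se2 > 0.0);
	return diff * diff / se2;
}

/* Two-sided test at alpha = 0.05 using the conservative df = 20 critical value. */
static bool significant(struct stats a, struct stats b)
{
	const double t_crit = 2.086;

	return welch_t2(a, b) > t_crit * t_crit;
}
```

With the table's means and standard deviations, socket 64 gives t ≈ 2.7. Every other row stays at or below about 1.1. This matches the statement that socket 64 is the only significant case.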

TL;DR: there does not appear to be any significant regression or improvement.

[0]: https://lore.kernel.org/bpf/20260216131341.1285427-1-mykyta.yatsenko5@gmail.com
[1]: https://lore.kernel.org/bpf/20260205222916.1788211-1-ameryhung@gmail.com

Changelog:
----------
v2 -> v3
v2: https://lore.kernel.org/bpf/20260227052031.3988575-1-memxor@gmail.com

* Add syzbot Tested-by.
* Add Amery's Reviewed-by.
* Fix missing rcu_dereference_check() in __bpf_selem_free_rcu. (BPF CI Bot)
* Remove migrate_disable() in bpf_selem_free_rcu. (Alexei)

v1 -> v2
v1: https://lore.kernel.org/bpf/20260225185121.2057388-1-memxor@gmail.com

* Add Paul's Reviewed-by.
* Fix use-after-free in accessing bpf_mem_alloc embedded in map. (syzbot CI)
* Add benchmark numbers for local storage.
* Add extra test case for per-cpu hashmap coverage with up to 16 refcount leaks.
* Target bpf tree.
====================

Link: https://patch.msgid.link/20260227224806.646888-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

+604 -58
+2 -2
include/linux/bpf.h
···
 	u32 (*map_fd_sys_lookup_elem)(void *ptr);
 	void (*map_seq_show_elem)(struct bpf_map *map, void *key,
 				  struct seq_file *m);
-	int (*map_check_btf)(const struct bpf_map *map,
+	int (*map_check_btf)(struct bpf_map *map,
 			     const struct btf *btf,
 			     const struct btf_type *key_type,
 			     const struct btf_type *value_type);
···
 		map->ops->map_seq_show_elem;
 }
 
-int map_check_no_btf(const struct bpf_map *map,
+int map_check_no_btf(struct bpf_map *map,
 		     const struct btf *btf,
 		     const struct btf_type *key_type,
 		     const struct btf_type *value_type);
+1 -1
include/linux/bpf_local_storage.h
···
 void bpf_local_storage_map_free(struct bpf_map *map,
 				struct bpf_local_storage_cache *cache);
 
-int bpf_local_storage_map_check_btf(const struct bpf_map *map,
+int bpf_local_storage_map_check_btf(struct bpf_map *map,
 				    const struct btf *btf,
 				    const struct btf_type *key_type,
 				    const struct btf_type *value_type);
+6
include/linux/bpf_mem_alloc.h
···
 	struct obj_cgroup *objcg;
 	bool percpu;
 	struct work_struct work;
+	void (*dtor_ctx_free)(void *ctx);
+	void *dtor_ctx;
 };
 
 /* 'size != 0' is for bpf_mem_alloc which manages fixed-size objects.
···
 /* The percpu allocation with a specific unit size. */
 int bpf_mem_alloc_percpu_unit_init(struct bpf_mem_alloc *ma, int size);
 void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma);
+void bpf_mem_alloc_set_dtor(struct bpf_mem_alloc *ma,
+			    void (*dtor)(void *obj, void *ctx),
+			    void (*dtor_ctx_free)(void *ctx),
+			    void *ctx);
 
 /* Check the allocation size for kmalloc equivalent allocator */
 int bpf_mem_alloc_check_size(bool percpu, size_t size);
+1 -1
kernel/bpf/arena.c
···
 	return -EOPNOTSUPP;
 }
 
-static int arena_map_check_btf(const struct bpf_map *map, const struct btf *btf,
+static int arena_map_check_btf(struct bpf_map *map, const struct btf *btf,
 			       const struct btf_type *key_type, const struct btf_type *value_type)
 {
 	return 0;
+1 -1
kernel/bpf/arraymap.c
···
 	rcu_read_unlock();
 }
 
-static int array_map_check_btf(const struct bpf_map *map,
+static int array_map_check_btf(struct bpf_map *map,
 			       const struct btf *btf,
 			       const struct btf_type *key_type,
 			       const struct btf_type *value_type)
+1 -1
kernel/bpf/bloom_filter.c
···
 	return -EINVAL;
 }
 
-static int bloom_map_check_btf(const struct bpf_map *map,
+static int bloom_map_check_btf(struct bpf_map *map,
 			       const struct btf *btf,
 			       const struct btf_type *key_type,
 			       const struct btf_type *value_type)
+1 -1
kernel/bpf/bpf_insn_array.c
···
 	return -EINVAL;
 }
 
-static int insn_array_check_btf(const struct bpf_map *map,
+static int insn_array_check_btf(struct bpf_map *map,
 				const struct btf *btf,
 				const struct btf_type *key_type,
 				const struct btf_type *value_type)
+40 -37
kernel/bpf/bpf_local_storage.c
···
 {
 	struct bpf_local_storage *local_storage;
 
-	/* If RCU Tasks Trace grace period implies RCU grace period, do
-	 * kfree(), else do kfree_rcu().
+	/*
+	 * RCU Tasks Trace grace period implies RCU grace period, do
+	 * kfree() directly.
 	 */
 	local_storage = container_of(rcu, struct bpf_local_storage, rcu);
-	if (rcu_trace_implies_rcu_gp())
-		kfree(local_storage);
-	else
-		kfree_rcu(local_storage, rcu);
+	kfree(local_storage);
 }
 
 /* Handle use_kmalloc_nolock == false */
···
 
 static void bpf_local_storage_free_trace_rcu(struct rcu_head *rcu)
 {
-	if (rcu_trace_implies_rcu_gp())
-		bpf_local_storage_free_rcu(rcu);
-	else
-		call_rcu(rcu, bpf_local_storage_free_rcu);
+	/*
+	 * RCU Tasks Trace grace period implies RCU grace period, do
+	 * kfree() directly.
+	 */
+	bpf_local_storage_free_rcu(rcu);
 }
 
 static void bpf_local_storage_free(struct bpf_local_storage *local_storage,
···
 				     bpf_local_storage_free_trace_rcu);
 }
 
+/* rcu callback for use_kmalloc_nolock == false */
+static void __bpf_selem_free_rcu(struct rcu_head *rcu)
+{
+	struct bpf_local_storage_elem *selem;
+	struct bpf_local_storage_map *smap;
+
+	selem = container_of(rcu, struct bpf_local_storage_elem, rcu);
+	/* bpf_selem_unlink_nofail may have already cleared smap and freed fields. */
+	smap = rcu_dereference_check(SDATA(selem)->smap, 1);
+
+	if (smap)
+		bpf_obj_free_fields(smap->map.record, SDATA(selem)->data);
+	kfree(selem);
+}
+
 /* rcu tasks trace callback for use_kmalloc_nolock == false */
 static void __bpf_selem_free_trace_rcu(struct rcu_head *rcu)
 {
-	struct bpf_local_storage_elem *selem;
-
-	selem = container_of(rcu, struct bpf_local_storage_elem, rcu);
-	if (rcu_trace_implies_rcu_gp())
-		kfree(selem);
-	else
-		kfree_rcu(selem, rcu);
+	/*
+	 * RCU Tasks Trace grace period implies RCU grace period, do
+	 * kfree() directly.
+	 */
+	__bpf_selem_free_rcu(rcu);
 }
 
 /* Handle use_kmalloc_nolock == false */
···
 			     bool vanilla_rcu)
 {
 	if (vanilla_rcu)
-		kfree_rcu(selem, rcu);
+		call_rcu(&selem->rcu, __bpf_selem_free_rcu);
 	else
 		call_rcu_tasks_trace(&selem->rcu, __bpf_selem_free_trace_rcu);
 }
···
 	/* The bpf_local_storage_map_free will wait for rcu_barrier */
 	smap = rcu_dereference_check(SDATA(selem)->smap, 1);
 
-	if (smap) {
-		migrate_disable();
+	if (smap)
 		bpf_obj_free_fields(smap->map.record, SDATA(selem)->data);
-		migrate_enable();
-	}
 	kfree_nolock(selem);
 }
 
 static void bpf_selem_free_trace_rcu(struct rcu_head *rcu)
 {
-	if (rcu_trace_implies_rcu_gp())
-		bpf_selem_free_rcu(rcu);
-	else
-		call_rcu(rcu, bpf_selem_free_rcu);
+	/*
+	 * RCU Tasks Trace grace period implies RCU grace period, do
+	 * kfree() directly.
+	 */
+	bpf_selem_free_rcu(rcu);
 }
 
 void bpf_selem_free(struct bpf_local_storage_elem *selem,
 		    bool reuse_now)
 {
-	struct bpf_local_storage_map *smap;
-
-	smap = rcu_dereference_check(SDATA(selem)->smap, bpf_rcu_lock_held());
-
 	if (!selem->use_kmalloc_nolock) {
 		/*
 		 * No uptr will be unpin even when reuse_now == false since uptr
 		 * is only supported in task local storage, where
 		 * smap->use_kmalloc_nolock == true.
 		 */
-		if (smap)
-			bpf_obj_free_fields(smap->map.record, SDATA(selem)->data);
 		__bpf_selem_free(selem, reuse_now);
 		return;
 	}
···
 	return 0;
 }
 
-int bpf_local_storage_map_check_btf(const struct bpf_map *map,
+int bpf_local_storage_map_check_btf(struct bpf_map *map,
 				    const struct btf *btf,
 				    const struct btf_type *key_type,
 				    const struct btf_type *value_type)
···
 	 */
 	synchronize_rcu();
 
-	if (smap->use_kmalloc_nolock) {
-		rcu_barrier_tasks_trace();
-		rcu_barrier();
-	}
+	/* smap remains in use regardless of kmalloc_nolock, so wait unconditionally. */
+	rcu_barrier_tasks_trace();
+	rcu_barrier();
 	kvfree(smap->buckets);
 	bpf_map_area_free(smap);
 }
+86
kernel/bpf/hashtab.c
···
 	char key[] __aligned(8);
 };
 
+struct htab_btf_record {
+	struct btf_record *record;
+	u32 key_size;
+};
+
 static inline bool htab_is_prealloc(const struct bpf_htab *htab)
 {
 	return !(htab->map.map_flags & BPF_F_NO_PREALLOC);
···
 		return -E2BIG;
 
 	return 0;
+}
+
+static void htab_mem_dtor(void *obj, void *ctx)
+{
+	struct htab_btf_record *hrec = ctx;
+	struct htab_elem *elem = obj;
+	void *map_value;
+
+	if (IS_ERR_OR_NULL(hrec->record))
+		return;
+
+	map_value = htab_elem_value(elem, hrec->key_size);
+	bpf_obj_free_fields(hrec->record, map_value);
+}
+
+static void htab_pcpu_mem_dtor(void *obj, void *ctx)
+{
+	void __percpu *pptr = *(void __percpu **)obj;
+	struct htab_btf_record *hrec = ctx;
+	int cpu;
+
+	if (IS_ERR_OR_NULL(hrec->record))
+		return;
+
+	for_each_possible_cpu(cpu)
+		bpf_obj_free_fields(hrec->record, per_cpu_ptr(pptr, cpu));
+}
+
+static void htab_dtor_ctx_free(void *ctx)
+{
+	struct htab_btf_record *hrec = ctx;
+
+	btf_record_free(hrec->record);
+	kfree(ctx);
+}
+
+static int htab_set_dtor(struct bpf_htab *htab, void (*dtor)(void *, void *))
+{
+	u32 key_size = htab->map.key_size;
+	struct bpf_mem_alloc *ma;
+	struct htab_btf_record *hrec;
+	int err;
+
+	/* No need for dtors. */
+	if (IS_ERR_OR_NULL(htab->map.record))
+		return 0;
+
+	hrec = kzalloc(sizeof(*hrec), GFP_KERNEL);
+	if (!hrec)
+		return -ENOMEM;
+	hrec->key_size = key_size;
+	hrec->record = btf_record_dup(htab->map.record);
+	if (IS_ERR(hrec->record)) {
+		err = PTR_ERR(hrec->record);
+		kfree(hrec);
+		return err;
+	}
+	ma = htab_is_percpu(htab) ? &htab->pcpu_ma : &htab->ma;
+	bpf_mem_alloc_set_dtor(ma, dtor, htab_dtor_ctx_free, hrec);
+	return 0;
+}
+
+static int htab_map_check_btf(struct bpf_map *map, const struct btf *btf,
+			      const struct btf_type *key_type, const struct btf_type *value_type)
+{
+	struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
+
+	if (htab_is_prealloc(htab))
+		return 0;
+	/*
+	 * We must set the dtor using this callback, as map's BTF record is not
+	 * populated in htab_map_alloc(), so it will always appear as NULL.
+	 */
+	if (htab_is_percpu(htab))
+		return htab_set_dtor(htab, htab_pcpu_mem_dtor);
+	else
+		return htab_set_dtor(htab, htab_mem_dtor);
 }
 
 static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
···
 	.map_seq_show_elem = htab_map_seq_show_elem,
 	.map_set_for_each_callback_args = map_set_for_each_callback_args,
 	.map_for_each_callback = bpf_for_each_hash_elem,
+	.map_check_btf = htab_map_check_btf,
 	.map_mem_usage = htab_map_mem_usage,
 	BATCH_OPS(htab),
 	.map_btf_id = &htab_map_btf_ids[0],
···
 	.map_seq_show_elem = htab_map_seq_show_elem,
 	.map_set_for_each_callback_args = map_set_for_each_callback_args,
 	.map_for_each_callback = bpf_for_each_hash_elem,
+	.map_check_btf = htab_map_check_btf,
 	.map_mem_usage = htab_map_mem_usage,
 	BATCH_OPS(htab_lru),
 	.map_btf_id = &htab_map_btf_ids[0],
···
 	.map_seq_show_elem = htab_percpu_map_seq_show_elem,
 	.map_set_for_each_callback_args = map_set_for_each_callback_args,
 	.map_for_each_callback = bpf_for_each_hash_elem,
+	.map_check_btf = htab_map_check_btf,
 	.map_mem_usage = htab_map_mem_usage,
 	BATCH_OPS(htab_percpu),
 	.map_btf_id = &htab_map_btf_ids[0],
···
 	.map_seq_show_elem = htab_percpu_map_seq_show_elem,
 	.map_set_for_each_callback_args = map_set_for_each_callback_args,
 	.map_for_each_callback = bpf_for_each_hash_elem,
+	.map_check_btf = htab_map_check_btf,
 	.map_mem_usage = htab_map_mem_usage,
 	BATCH_OPS(htab_lru_percpu),
 	.map_btf_id = &htab_map_btf_ids[0],
+1 -1
kernel/bpf/local_storage.c
···
 	return -EINVAL;
 }
 
-static int cgroup_storage_check_btf(const struct bpf_map *map,
+static int cgroup_storage_check_btf(struct bpf_map *map,
 				    const struct btf *btf,
 				    const struct btf_type *key_type,
 				    const struct btf_type *value_type)
+1 -1
kernel/bpf/lpm_trie.c
···
 	return err;
 }
 
-static int trie_check_btf(const struct bpf_map *map,
+static int trie_check_btf(struct bpf_map *map,
 			  const struct btf *btf,
 			  const struct btf_type *key_type,
 			  const struct btf_type *value_type)
+47 -11
kernel/bpf/memalloc.c
···
 	int percpu_size;
 	bool draining;
 	struct bpf_mem_cache *tgt;
+	void (*dtor)(void *obj, void *ctx);
+	void *dtor_ctx;
 
 	/* list of objects to be freed after RCU GP */
 	struct llist_head free_by_rcu;
···
 	kfree(obj);
 }
 
-static int free_all(struct llist_node *llnode, bool percpu)
+static int free_all(struct bpf_mem_cache *c, struct llist_node *llnode, bool percpu)
 {
 	struct llist_node *pos, *t;
 	int cnt = 0;
 
 	llist_for_each_safe(pos, t, llnode) {
+		if (c->dtor)
+			c->dtor((void *)pos + LLIST_NODE_SZ, c->dtor_ctx);
 		free_one(pos, percpu);
 		cnt++;
 	}
···
 {
 	struct bpf_mem_cache *c = container_of(head, struct bpf_mem_cache, rcu_ttrace);
 
-	free_all(llist_del_all(&c->waiting_for_gp_ttrace), !!c->percpu_size);
+	free_all(c, llist_del_all(&c->waiting_for_gp_ttrace), !!c->percpu_size);
 	atomic_set(&c->call_rcu_ttrace_in_progress, 0);
 }
 
···
 	if (atomic_xchg(&c->call_rcu_ttrace_in_progress, 1)) {
 		if (unlikely(READ_ONCE(c->draining))) {
 			llnode = llist_del_all(&c->free_by_rcu_ttrace);
-			free_all(llnode, !!c->percpu_size);
+			free_all(c, llnode, !!c->percpu_size);
 		}
 		return;
 	}
···
 	dec_active(c, &flags);
 
 	if (unlikely(READ_ONCE(c->draining))) {
-		free_all(llist_del_all(&c->waiting_for_gp), !!c->percpu_size);
+		free_all(c, llist_del_all(&c->waiting_for_gp), !!c->percpu_size);
 		atomic_set(&c->call_rcu_in_progress, 0);
 	} else {
 		call_rcu_hurry(&c->rcu, __free_by_rcu);
···
 	 * Except for waiting_for_gp_ttrace list, there are no concurrent operations
 	 * on these lists, so it is safe to use __llist_del_all().
 	 */
-	free_all(llist_del_all(&c->free_by_rcu_ttrace), percpu);
-	free_all(llist_del_all(&c->waiting_for_gp_ttrace), percpu);
-	free_all(__llist_del_all(&c->free_llist), percpu);
-	free_all(__llist_del_all(&c->free_llist_extra), percpu);
-	free_all(__llist_del_all(&c->free_by_rcu), percpu);
-	free_all(__llist_del_all(&c->free_llist_extra_rcu), percpu);
-	free_all(llist_del_all(&c->waiting_for_gp), percpu);
+	free_all(c, llist_del_all(&c->free_by_rcu_ttrace), percpu);
+	free_all(c, llist_del_all(&c->waiting_for_gp_ttrace), percpu);
+	free_all(c, __llist_del_all(&c->free_llist), percpu);
+	free_all(c, __llist_del_all(&c->free_llist_extra), percpu);
+	free_all(c, __llist_del_all(&c->free_by_rcu), percpu);
+	free_all(c, __llist_del_all(&c->free_llist_extra_rcu), percpu);
+	free_all(c, llist_del_all(&c->waiting_for_gp), percpu);
 }
 
 static void check_mem_cache(struct bpf_mem_cache *c)
···
 
 static void free_mem_alloc_no_barrier(struct bpf_mem_alloc *ma)
 {
+	/* We can free dtor ctx only once all callbacks are done using it. */
+	if (ma->dtor_ctx_free)
+		ma->dtor_ctx_free(ma->dtor_ctx);
 	check_leaked_objs(ma);
 	free_percpu(ma->cache);
 	free_percpu(ma->caches);
···
 		return -E2BIG;
 
 	return 0;
+}
+
+void bpf_mem_alloc_set_dtor(struct bpf_mem_alloc *ma, void (*dtor)(void *obj, void *ctx),
+			    void (*dtor_ctx_free)(void *ctx), void *ctx)
+{
+	struct bpf_mem_caches *cc;
+	struct bpf_mem_cache *c;
+	int cpu, i;
+
+	ma->dtor_ctx_free = dtor_ctx_free;
+	ma->dtor_ctx = ctx;
+
+	if (ma->cache) {
+		for_each_possible_cpu(cpu) {
+			c = per_cpu_ptr(ma->cache, cpu);
+			c->dtor = dtor;
+			c->dtor_ctx = ctx;
+		}
+	}
+	if (ma->caches) {
+		for_each_possible_cpu(cpu) {
+			cc = per_cpu_ptr(ma->caches, cpu);
+			for (i = 0; i < NUM_CACHES; i++) {
+				c = &cc->cache[i];
+				c->dtor = dtor;
+				c->dtor_ctx = ctx;
+			}
+		}
+	}
 }
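With the memalloc change, an owner can attach a destructor that `free_all()` runs on each object's payload just before freeing it. The destructor's context is released only in `free_mem_alloc_no_barrier()`, once no more callbacks can use it. The C code below is a simplified single-threaded userspace model of that lifecycle, not the kernel implementation. `struct cache`, `cache_enqueue()`, `cache_free_all()`, `cache_destroy()` and the counting destructor are hypothetical. Per-CPU caches, RCU and `LLIST_NODE_SZ` are reduced to a plain singly linked list, with each payload stored directly after its node.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* List header placed in front of every object, like an llist_node. */
struct node {
	struct node *next;
};

struct cache {
	struct node *free_list;			/* objects waiting to be freed */
	void (*dtor)(void *obj, void *ctx);	/* optional per-object destructor */
	void *dtor_ctx;
	void (*dtor_ctx_free)(void *ctx);	/* releases dtor_ctx at teardown */
};

static void cache_set_dtor(struct cache *c, void (*dtor)(void *, void *),
			   void (*dtor_ctx_free)(void *), void *ctx)
{
	c->dtor = dtor;
	c->dtor_ctx = ctx;
	c->dtor_ctx_free = dtor_ctx_free;
}

/* Queue a new object for freeing; returns its payload (after the node). */
static void *cache_enqueue(struct cache *c, size_t payload)
{
	struct node *n = malloc(sizeof(*n) + payload);

	if (!n)
		return NULL;
	n->next = c->free_list;
	c->free_list = n;
	return n + 1;
}

/* Like free_all(): run the dtor on each payload, then free the object. */
static int cache_free_all(struct cache *c)
{
	struct node *pos = c->free_list, *next;
	int cnt = 0;

	c->free_list = NULL;
	for (; pos; pos = next) {
		next = pos->next;
		if (c->dtor)
			c->dtor(pos + 1, c->dtor_ctx);
		free(pos);
		cnt++;
	}
	return cnt;
}

/* Teardown: the ctx is released only after the last dtor call. */
static void cache_destroy(struct cache *c)
{
	cache_free_all(c);
	if (c->dtor_ctx_free)
		c->dtor_ctx_free(c->dtor_ctx);
}

/* Example destructor and ctx release that only record what happened. */
struct counter {
	int dtor_calls;
	int ctx_freed;
};

static void count_dtor(void *obj, void *ctx)
{
	assert(obj != NULL);
	((struct counter *)ctx)->dtor_calls++;
}

static void mark_ctx_freed(void *ctx)
{
	((struct counter *)ctx)->ctx_freed = 1;
}
```

The ordering in `cache_destroy()` follows the comment in `free_mem_alloc_no_barrier()`: the context must outlive every destructor call that could still use it.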
+1 -1
kernel/bpf/syscall.c
···
 }
 EXPORT_SYMBOL_GPL(bpf_obj_name_cpy);
 
-int map_check_no_btf(const struct bpf_map *map,
+int map_check_no_btf(struct bpf_map *map,
 		     const struct btf *btf,
 		     const struct btf_type *key_type,
 		     const struct btf_type *value_type)
+218
tools/testing/selftests/bpf/prog_tests/map_kptr_race.c
···
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2026 Meta Platforms, Inc. and affiliates. */
+#include <test_progs.h>
+#include <network_helpers.h>
+
+#include "map_kptr_race.skel.h"
+
+static int get_map_id(int map_fd)
+{
+	struct bpf_map_info info = {};
+	__u32 len = sizeof(info);
+
+	if (!ASSERT_OK(bpf_map_get_info_by_fd(map_fd, &info, &len), "get_map_info"))
+		return -1;
+	return info.id;
+}
+
+static int read_refs(struct map_kptr_race *skel)
+{
+	LIBBPF_OPTS(bpf_test_run_opts, opts);
+	int ret;
+
+	ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.count_ref), &opts);
+	if (!ASSERT_OK(ret, "count_ref run"))
+		return -1;
+	if (!ASSERT_OK(opts.retval, "count_ref retval"))
+		return -1;
+	return skel->bss->num_of_refs;
+}
+
+static void test_htab_leak(void)
+{
+	LIBBPF_OPTS(bpf_test_run_opts, opts,
+		.data_in = &pkt_v4,
+		.data_size_in = sizeof(pkt_v4),
+		.repeat = 1,
+	);
+	struct map_kptr_race *skel, *watcher;
+	int ret, map_id;
+
+	skel = map_kptr_race__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "open_and_load"))
+		return;
+
+	ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.test_htab_leak), &opts);
+	if (!ASSERT_OK(ret, "test_htab_leak run"))
+		goto out_skel;
+	if (!ASSERT_OK(opts.retval, "test_htab_leak retval"))
+		goto out_skel;
+
+	map_id = get_map_id(bpf_map__fd(skel->maps.race_hash_map));
+	if (!ASSERT_GE(map_id, 0, "map_id"))
+		goto out_skel;
+
+	watcher = map_kptr_race__open_and_load();
+	if (!ASSERT_OK_PTR(watcher, "watcher open_and_load"))
+		goto out_skel;
+
+	watcher->bss->target_map_id = map_id;
+	watcher->links.map_put = bpf_program__attach(watcher->progs.map_put);
+	if (!ASSERT_OK_PTR(watcher->links.map_put, "attach fentry"))
+		goto out_watcher;
+	watcher->links.htab_map_free = bpf_program__attach(watcher->progs.htab_map_free);
+	if (!ASSERT_OK_PTR(watcher->links.htab_map_free, "attach fexit"))
+		goto out_watcher;
+
+	map_kptr_race__destroy(skel);
+	skel = NULL;
+
+	kern_sync_rcu();
+
+	while (!READ_ONCE(watcher->bss->map_freed))
+		sched_yield();
+
+	ASSERT_EQ(watcher->bss->map_freed, 1, "map_freed");
+	ASSERT_EQ(read_refs(watcher), 2, "htab refcount");
+
+out_watcher:
+	map_kptr_race__destroy(watcher);
+out_skel:
+	map_kptr_race__destroy(skel);
+}
+
+static void test_percpu_htab_leak(void)
+{
+	LIBBPF_OPTS(bpf_test_run_opts, opts,
+		.data_in = &pkt_v4,
+		.data_size_in = sizeof(pkt_v4),
+		.repeat = 1,
+	);
+	struct map_kptr_race *skel, *watcher;
+	int ret, map_id;
+
+	skel = map_kptr_race__open();
+	if (!ASSERT_OK_PTR(skel, "open"))
+		return;
+
+	skel->rodata->nr_cpus = libbpf_num_possible_cpus();
+	if (skel->rodata->nr_cpus > 16)
+		skel->rodata->nr_cpus = 16;
+
+	ret = map_kptr_race__load(skel);
+	if (!ASSERT_OK(ret, "load"))
+		goto out_skel;
+
+	ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.test_percpu_htab_leak), &opts);
+	if (!ASSERT_OK(ret, "test_percpu_htab_leak run"))
+		goto out_skel;
+	if (!ASSERT_OK(opts.retval, "test_percpu_htab_leak retval"))
+		goto out_skel;
+
+	map_id = get_map_id(bpf_map__fd(skel->maps.race_percpu_hash_map));
+	if (!ASSERT_GE(map_id, 0, "map_id"))
+		goto out_skel;
+
+	watcher = map_kptr_race__open_and_load();
+	if (!ASSERT_OK_PTR(watcher, "watcher open_and_load"))
+		goto out_skel;
+
+	watcher->bss->target_map_id = map_id;
+	watcher->links.map_put = bpf_program__attach(watcher->progs.map_put);
+	if (!ASSERT_OK_PTR(watcher->links.map_put, "attach fentry"))
+		goto out_watcher;
+	watcher->links.htab_map_free = bpf_program__attach(watcher->progs.htab_map_free);
+	if (!ASSERT_OK_PTR(watcher->links.htab_map_free, "attach fexit"))
+		goto out_watcher;
+
+	map_kptr_race__destroy(skel);
+	skel = NULL;
+
+	kern_sync_rcu();
+
+	while (!READ_ONCE(watcher->bss->map_freed))
+		sched_yield();
+
+	ASSERT_EQ(watcher->bss->map_freed, 1, "map_freed");
+	ASSERT_EQ(read_refs(watcher), 2, "percpu_htab refcount");
+
+out_watcher:
+	map_kptr_race__destroy(watcher);
+out_skel:
+	map_kptr_race__destroy(skel);
+}
+
+static void test_sk_ls_leak(void)
+{
+	struct map_kptr_race *skel, *watcher;
+	int listen_fd = -1, client_fd = -1, map_id;
+
+	skel = map_kptr_race__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "open_and_load"))
+		return;
+
+	if (!ASSERT_OK(map_kptr_race__attach(skel), "attach"))
+		goto out_skel;
+
+	listen_fd = start_server(AF_INET6, SOCK_STREAM, "::1", 0, 0);
+	if (!ASSERT_GE(listen_fd, 0, "start_server"))
+		goto out_skel;
+
+	client_fd = connect_to_fd(listen_fd, 0);
+	if (!ASSERT_GE(client_fd, 0, "connect_to_fd"))
+		goto out_skel;
+
+	if (!ASSERT_EQ(skel->bss->sk_ls_leak_done, 1, "sk_ls_leak_done"))
+		goto out_skel;
+
+	close(client_fd);
+	client_fd = -1;
+	close(listen_fd);
+	listen_fd = -1;
+
+	map_id = get_map_id(bpf_map__fd(skel->maps.race_sk_ls_map));
+	if (!ASSERT_GE(map_id, 0, "map_id"))
+		goto out_skel;
+
+	watcher = map_kptr_race__open_and_load();
+	if (!ASSERT_OK_PTR(watcher, "watcher open_and_load"))
+		goto out_skel;
+
+	watcher->bss->target_map_id = map_id;
+	watcher->links.map_put = bpf_program__attach(watcher->progs.map_put);
+	if (!ASSERT_OK_PTR(watcher->links.map_put, "attach fentry"))
+		goto out_watcher;
+	watcher->links.sk_map_free = bpf_program__attach(watcher->progs.sk_map_free);
+	if (!ASSERT_OK_PTR(watcher->links.sk_map_free, "attach fexit"))
+		goto out_watcher;
+
+	map_kptr_race__destroy(skel);
+	skel = NULL;
+
+	kern_sync_rcu();
+
+	while (!READ_ONCE(watcher->bss->map_freed))
+		sched_yield();
+
+	ASSERT_EQ(watcher->bss->map_freed, 1, "map_freed");
+	ASSERT_EQ(read_refs(watcher), 2, "sk_ls refcount");
+
+out_watcher:
+	map_kptr_race__destroy(watcher);
+out_skel:
+	if (client_fd >= 0)
+		close(client_fd);
+	if (listen_fd >= 0)
+		close(listen_fd);
+	map_kptr_race__destroy(skel);
+}
+
+void serial_test_map_kptr_race(void)
+{
+	if (test__start_subtest("htab_leak"))
+		test_htab_leak();
+	if (test__start_subtest("percpu_htab_leak"))
+		test_percpu_htab_leak();
+	if (test__start_subtest("sk_ls_leak"))
+		test_sk_ls_leak();
+}
+197
tools/testing/selftests/bpf/progs/map_kptr_race.c
···
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2026 Meta Platforms, Inc. and affiliates. */
+#include <vmlinux.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include "../test_kmods/bpf_testmod_kfunc.h"
+
+struct map_value {
+	struct prog_test_ref_kfunc __kptr *ref_ptr;
+};
+
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__uint(map_flags, BPF_F_NO_PREALLOC);
+	__type(key, int);
+	__type(value, struct map_value);
+	__uint(max_entries, 1);
+} race_hash_map SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_PERCPU_HASH);
+	__uint(map_flags, BPF_F_NO_PREALLOC);
+	__type(key, int);
+	__type(value, struct map_value);
+	__uint(max_entries, 1);
+} race_percpu_hash_map SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_SK_STORAGE);
+	__uint(map_flags, BPF_F_NO_PREALLOC);
+	__type(key, int);
+	__type(value, struct map_value);
+} race_sk_ls_map SEC(".maps");
+
+int num_of_refs;
+int sk_ls_leak_done;
+int target_map_id;
+int map_freed;
+const volatile int nr_cpus;
+
+SEC("tc")
+int test_htab_leak(struct __sk_buff *skb)
+{
+	struct prog_test_ref_kfunc *p, *old;
+	struct map_value val = {};
+	struct map_value *v;
+	int key = 0;
+
+	if (bpf_map_update_elem(&race_hash_map, &key, &val, BPF_ANY))
+		return 1;
+
+	v = bpf_map_lookup_elem(&race_hash_map, &key);
+	if (!v)
+		return 2;
+
+	p = bpf_kfunc_call_test_acquire(&(unsigned long){0});
+	if (!p)
+		return 3;
+	old = bpf_kptr_xchg(&v->ref_ptr, p);
+	if (old)
+		bpf_kfunc_call_test_release(old);
+
+	bpf_map_delete_elem(&race_hash_map, &key);
+
+	p = bpf_kfunc_call_test_acquire(&(unsigned long){0});
+	if (!p)
+		return 4;
+	old = bpf_kptr_xchg(&v->ref_ptr, p);
+	if (old)
+		bpf_kfunc_call_test_release(old);
+
+	return 0;
+}
+
+static int fill_percpu_kptr(struct map_value *v)
+{
+	struct prog_test_ref_kfunc *p, *old;
+
+	p = bpf_kfunc_call_test_acquire(&(unsigned long){0});
+	if (!p)
+		return 1;
+	old = bpf_kptr_xchg(&v->ref_ptr, p);
+	if (old)
+		bpf_kfunc_call_test_release(old);
+	return 0;
+}
+
+SEC("tc")
+int test_percpu_htab_leak(struct __sk_buff *skb)
+{
+	struct map_value *v, *arr[16] = {};
+	struct map_value val = {};
+	int key = 0;
+	int err = 0;
+
+	if (bpf_map_update_elem(&race_percpu_hash_map, &key, &val, BPF_ANY))
+		return 1;
+
+	for (int i = 0; i < nr_cpus; i++) {
+		v = bpf_map_lookup_percpu_elem(&race_percpu_hash_map, &key, i);
+		if (!v)
+			return 2;
+		arr[i] = v;
+	}
+
+	bpf_map_delete_elem(&race_percpu_hash_map, &key);
+
+	for (int i = 0; i < nr_cpus; i++) {
+		v = arr[i];
+		err = fill_percpu_kptr(v);
+		if (err)
+			return 3;
+	}
+
+	return 0;
+}
+
+SEC("tp_btf/inet_sock_set_state")
+int BPF_PROG(test_sk_ls_leak, struct sock *sk, int oldstate, int newstate)
+{
+	struct prog_test_ref_kfunc *p, *old;
+	struct map_value *v;
+
+	if (newstate != BPF_TCP_SYN_SENT)
+		return 0;
+
+	if (sk_ls_leak_done)
+		return 0;
+
+	v = bpf_sk_storage_get(&race_sk_ls_map, sk, NULL,
+			       BPF_SK_STORAGE_GET_F_CREATE);
+	if (!v)
+		return 0;
+
+	p = bpf_kfunc_call_test_acquire(&(unsigned long){0});
+	if (!p)
+		return 0;
+	old = bpf_kptr_xchg(&v->ref_ptr, p);
+	if (old)
+		bpf_kfunc_call_test_release(old);
+
+	bpf_sk_storage_delete(&race_sk_ls_map, sk);
+
+	p = bpf_kfunc_call_test_acquire(&(unsigned long){0});
+	if (!p)
+		return 0;
+	old = bpf_kptr_xchg(&v->ref_ptr, p);
+	if (old)
+		bpf_kfunc_call_test_release(old);
+
+	sk_ls_leak_done = 1;
+	return 0;
+}
+
+long target_map_ptr;
+
+SEC("fentry/bpf_map_put")
+int BPF_PROG(map_put, struct bpf_map *map)
+{
+	if (target_map_id && map->id == (u32)target_map_id)
+		target_map_ptr = (long)map;
+	return 0;
+}
+
+SEC("fexit/htab_map_free")
+int BPF_PROG(htab_map_free, struct bpf_map *map)
+{
+	if (target_map_ptr && (long)map == target_map_ptr)
+		map_freed = 1;
+	return 0;
+}
+
+SEC("fexit/bpf_sk_storage_map_free")
+int BPF_PROG(sk_map_free, struct bpf_map *map)
+{
+	if (target_map_ptr && (long)map == target_map_ptr)
+		map_freed = 1;
+	return 0;
+}
+
+SEC("syscall")
+int count_ref(void *ctx)
+{
+	struct prog_test_ref_kfunc *p;
+	unsigned long arg = 0;
+
+	p = bpf_kfunc_call_test_acquire(&arg);
+	if (!p)
+		return 1;
+
+	num_of_refs = p->cnt.refs.counter;
+
+	bpf_kfunc_call_test_release(p);
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";