Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

bpf: Fix grace period wait for tracepoint bpf_link

Recently, non-faultable tracepoints were switched from running with
preemption disabled (which acts as an RCU read-side critical section) to
SRCU-fast. This means that a proper grace period wait for programs
running in such tracepoints must use SRCU's grace period wait. This only
applies to non-faultable tracepoints; faultable ones continue using RCU
Tasks Trace.
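To make the flavor split concrete, here is a small userspace model. It is a sketch only: the `gp_flavor` enum and both functions are hypothetical names invented for illustration, not kernel APIs. It encodes which grace period an updater has to wait for before freeing data a tracepoint probe may still use, before and after the SRCU-fast switch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not kernel code: the grace-period flavor an updater
 * must wait for before freeing data a tracepoint probe may still use. */
enum gp_flavor {
	GP_RCU,             /* classic RCU: covers preempt-disabled readers */
	GP_SRCU,            /* SRCU: covers SRCU-fast readers */
	GP_RCU_TASKS_TRACE, /* RCU Tasks Trace: covers readers that may fault */
};

/* Before the switch: non-faultable tracepoints ran probes with preemption
 * disabled, which a classic RCU grace period already waits for. */
static enum gp_flavor tracepoint_gp_flavor_old(bool faultable)
{
	return faultable ? GP_RCU_TASKS_TRACE : GP_RCU;
}

/* After the switch: non-faultable tracepoints run probes inside an
 * SRCU-fast read section, so a classic RCU grace period no longer
 * suffices and an SRCU grace period is needed instead. */
static enum gp_flavor tracepoint_gp_flavor(bool faultable)
{
	return faultable ? GP_RCU_TASKS_TRACE : GP_SRCU;
}
```

Only the non-faultable row changes; the faultable row stays on RCU Tasks Trace in both versions.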

However, bpf_link_free() currently uses call_rcu() whenever the link is
non-sleepable (which, for tracepoints, means non-faultable). Fix this by
doing a call_srcu() grace period wait for such links instead.
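The dispatch change in bpf_link_free() can be modeled in userspace as below. This is a sketch under stated assumptions: `struct dispatch_link`, `enum dealloc_path` and both functions are hypothetical stand-ins, `is_tracepoint` stands for what bpf_link_is_tracepoint() would report, and the kernel's `link->prog && link->prog->sleepable` test is collapsed into a single `prog_sleepable` flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the deferred-dealloc branch in bpf_link_free(). */
enum dealloc_path {
	PATH_TASKS_TRACE_THEN_RCU, /* call_rcu_tasks_trace(), maybe chained RCU gp */
	PATH_SRCU,                 /* SRCU gp via the tracepoint helper */
	PATH_RCU,                  /* plain call_rcu() */
};

struct dispatch_link {
	bool link_sleepable; /* stand-in for link->sleepable */
	bool prog_sleepable; /* stand-in for link->prog && link->prog->sleepable */
	bool is_tracepoint;  /* stand-in for bpf_link_is_tracepoint(link) */
};

/* Before the fix: every non-sleepable link went through call_rcu(). */
static enum dealloc_path dealloc_path_before(const struct dispatch_link *l)
{
	if (l->link_sleepable || l->prog_sleepable)
		return PATH_TASKS_TRACE_THEN_RCU;
	return PATH_RCU;
}

/* After the fix: non-sleepable tracepoint links wait for an SRCU gp;
 * everything else keeps its previous path. */
static enum dealloc_path dealloc_path_after(const struct dispatch_link *l)
{
	if (l->link_sleepable || l->prog_sleepable)
		return PATH_TASKS_TRACE_THEN_RCU;
	if (l->is_tracepoint)
		return PATH_SRCU;
	return PATH_RCU;
}
```

The sleepable check stays first, so a sleepable tracepoint link still takes the RCU Tasks Trace path.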

As far as RCU Tasks Trace gp -> RCU gp chaining is concerned, it is
deemed unnecessary for tracepoint programs: the link and program are now
accessed either under RCU Tasks Trace protection or under SRCU-fast
protection.

The earlier logic chained the RCU Tasks Trace and RCU gp waits to keep
the scheme general, at the cost of a potentially extra RCU gp wait; for
tracepoints, that extra wait was unnecessary even before this change.
In practice no cost was paid, since rcu_trace_implies_rcu_gp() was always
true. Hence we need not chain any RCU gp wait after the SRCU gp.
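The cost argument can be illustrated by counting grace periods. The sketch below is hypothetical bookkeeping, not kernel code: `implies_rcu` plays the role of rcu_trace_implies_rcu_gp(), and `model_tasks_trace_path()` only mirrors the branch shape of bpf_link_defer_dealloc_mult_rcu_gp(), which either deallocates directly or queues one more call_rcu().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counters for how many grace periods a dealloc path waits. */
struct gp_log {
	int tasks_trace_gps; /* RCU Tasks Trace grace periods waited */
	int srcu_gps;        /* SRCU grace periods waited */
	int extra_rcu_gps;   /* classic RCU grace periods chained afterwards */
	bool freed;
};

/* Final step, standing in for the plain dealloc callback. */
static void model_dealloc(struct gp_log *log)
{
	log->freed = true;
}

/* Sleepable path: one Tasks Trace gp, then a classic RCU gp only if the
 * former does not already imply the latter. */
static void model_tasks_trace_path(struct gp_log *log, bool implies_rcu)
{
	log->tasks_trace_gps++;
	if (!implies_rcu)
		log->extra_rcu_gps++; /* stands in for call_rcu(...) */
	model_dealloc(log);
}

/* Non-faultable tracepoint path after the fix: one SRCU gp and nothing
 * chained after it. */
static void model_srcu_path(struct gp_log *log)
{
	log->srcu_gps++;
	model_dealloc(log);
}
```

With `implies_rcu` true, the sleepable path never adds a classic RCU gp, which is why the old chaining cost nothing in practice.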

For instance, for a non-faultable raw tracepoint, the RCU read section
of the program in __bpf_trace_run() is nested inside the SRCU-fast read
section, so an SRCU gp also covers it. Likewise, for a faultable raw
tracepoint, the program runs under RCU Tasks Trace protection. Hence,
waiting on the outermost scope is sufficient for correctness.
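The "outermost scope" argument is a nesting property, shown here with a toy model. The counters are an assumption made purely for illustration; real SRCU and RCU track readers very differently. A reader opens an outer SRCU-fast-like section and, inside it, an inner RCU-like section, so once no outer section is active, no inner one can be either.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: counters stand in for active read-side sections. */
struct toy_readers {
	int outer; /* active outer (SRCU-fast-like) sections */
	int inner; /* active inner (RCU-like) sections */
};

static void toy_reader_enter(struct toy_readers *r)
{
	r->outer++; /* outer section opens first ... */
	r->inner++; /* ... then the nested one */
}

static void toy_reader_exit(struct toy_readers *r)
{
	r->inner--; /* nested section closes first ... */
	r->outer--; /* ... then the outer one */
}

/* An outer grace period may end only when no outer section is active.
 * Inner sections never outlive their outer section, so the inner count
 * is then zero as well. */
static bool toy_outer_gp_may_end(const struct toy_readers *r)
{
	return r->outer == 0;
}

static bool toy_inner_quiescent(const struct toy_readers *r)
{
	return r->inner == 0;
}
```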

Also, sleepable programs cannot be attached to non-faultable
tracepoints, so whenever the program or link is sleepable, only RCU
Tasks Trace protection is used for the link and prog.
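The reasoning can be checked exhaustively over the four (sleepable, faultable) combinations. This is a hypothetical sketch: `model_can_attach()` encodes the attach rule as stated in the text, not the verifier's actual code, and `runs_under_tasks_trace()` encodes the protection split described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical attach rule from the text: a sleepable program may only be
 * attached to a faultable tracepoint. */
static bool model_can_attach(bool prog_sleepable, bool tp_faultable)
{
	return !prog_sleepable || tp_faultable;
}

/* Faultable tracepoints run probes under RCU Tasks Trace;
 * non-faultable ones run them under SRCU-fast. */
static bool runs_under_tasks_trace(bool tp_faultable)
{
	return tp_faultable;
}

/* Over every allowed combination, a sleepable program only ever runs
 * under RCU Tasks Trace protection. */
static bool sleepable_implies_tasks_trace(void)
{
	for (int s = 0; s <= 1; s++) {
		for (int f = 0; f <= 1; f++) {
			if (!model_can_attach(s, f))
				continue;
			if (s && !runs_under_tasks_trace(f))
				return false;
		}
	}
	return true;
}
```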

Fixes: a46023d5616e ("tracing: Guard __DECLARE_TRACE() use of __DO_TRACE_CALL() with SRCU-fast")
Reviewed-by: Sun Jian <sun.jian.kdev@gmail.com>
Reviewed-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20260331211021.1632902-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Authored by Kumar Kartikeya Dwivedi and committed by Alexei Starovoitov
c76fef7d a8502a79

3 files changed, 47 insertions(+), 2 deletions(-)
include/linux/bpf.h (+4)
@@ -1854,6 +1854,10 @@
 	 * target hook is sleepable, we'll go through tasks trace RCU GP and
 	 * then "classic" RCU GP; this need for chaining tasks trace and
 	 * classic RCU GPs is designated by setting bpf_link->sleepable flag
+	 *
+	 * For non-sleepable tracepoint links we go through SRCU gp instead,
+	 * since RCU is not used in that case. Sleepable tracepoints still
+	 * follow the scheme above.
 	 */
 	void (*dealloc_deferred)(struct bpf_link *link);
 	int (*detach)(struct bpf_link *link);
include/linux/tracepoint.h (+20)
@@ -122,6 +122,22 @@
 {
 	return tp->ext && tp->ext->faultable;
 }
+/*
+ * Run RCU callback with the appropriate grace period wait for non-faultable
+ * tracepoints, e.g., those used in atomic context.
+ */
+static inline void call_tracepoint_unregister_atomic(struct rcu_head *rcu, rcu_callback_t func)
+{
+	call_srcu(&tracepoint_srcu, rcu, func);
+}
+/*
+ * Run RCU callback with the appropriate grace period wait for faultable
+ * tracepoints, e.g., those used in syscall context.
+ */
+static inline void call_tracepoint_unregister_syscall(struct rcu_head *rcu, rcu_callback_t func)
+{
+	call_rcu_tasks_trace(rcu, func);
+}
 #else
 static inline void tracepoint_synchronize_unregister(void)
 { }
@@ -129,6 +145,10 @@
 {
 	return false;
 }
+static inline void call_tracepoint_unregister_atomic(struct rcu_head *rcu, rcu_callback_t func)
+{ }
+static inline void call_tracepoint_unregister_syscall(struct rcu_head *rcu, rcu_callback_t func)
+{ }
 #endif
 
 #ifdef CONFIG_HAVE_SYSCALL_TRACEPOINTS
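Both helpers take an rcu_head and a callback, the same shape as call_rcu(); the callback typically recovers the enclosing object with container_of() and frees it. The userspace sketch below models that pattern under loud assumptions: `struct rcu_head` here is a simplified stand-in, `fake_call_srcu()` drops the srcu_struct argument of the real call_srcu(), and `fake_gp_end()` simply runs the queued callbacks instead of waiting for readers. `struct demo_obj` is a hypothetical object playing the role of struct bpf_link.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified userspace stand-ins for the kernel's callback types. */
struct rcu_head {
	struct rcu_head *next;
	void (*func)(struct rcu_head *head);
};
typedef void (*rcu_callback_t)(struct rcu_head *head);

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static struct rcu_head *pending; /* callbacks waiting for the fake gp */

/* Fake call_srcu(): only queues the callback (no srcu_struct modeled). */
static void fake_call_srcu(struct rcu_head *head, rcu_callback_t func)
{
	head->func = func;
	head->next = pending;
	pending = head;
}

/* Fake "grace period end": run and drain every queued callback. */
static void fake_gp_end(void)
{
	while (pending) {
		struct rcu_head *head = pending;
		pending = head->next;
		head->func(head);
	}
}

/* Hypothetical object with an embedded rcu_head. */
struct demo_obj {
	int id;
	struct rcu_head rcu;
};

static int freed_count;

/* Callback: recover the enclosing object from its rcu_head and free it. */
static void demo_obj_free_cb(struct rcu_head *head)
{
	struct demo_obj *obj = container_of(head, struct demo_obj, rcu);

	free(obj);
	freed_count++;
}
```

Usage: queue `&obj->rcu` with `demo_obj_free_cb`, and nothing is freed until `fake_gp_end()` runs.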
kernel/bpf/syscall.c (+23 -2)
@@ -3261,6 +3261,18 @@
 	bpf_link_dealloc(link);
 }
 
+static bool bpf_link_is_tracepoint(struct bpf_link *link)
+{
+	/*
+	 * Only these combinations support a tracepoint bpf_link.
+	 * BPF_LINK_TYPE_TRACING raw_tp progs are hardcoded to use
+	 * bpf_raw_tp_link_lops and thus dealloc_deferred(), see
+	 * bpf_raw_tp_link_attach().
+	 */
+	return link->type == BPF_LINK_TYPE_RAW_TRACEPOINT ||
+	       (link->type == BPF_LINK_TYPE_TRACING && link->attach_type == BPF_TRACE_RAW_TP);
+}
+
 static void bpf_link_defer_dealloc_mult_rcu_gp(struct rcu_head *rcu)
 {
 	if (rcu_trace_implies_rcu_gp())
@@ -3279,16 +3291,25 @@
 	if (link->prog)
 		ops->release(link);
 	if (ops->dealloc_deferred) {
-		/* Schedule BPF link deallocation, which will only then
+		/*
+		 * Schedule BPF link deallocation, which will only then
 		 * trigger putting BPF program refcount.
 		 * If underlying BPF program is sleepable or BPF link's target
 		 * attach hookpoint is sleepable or otherwise requires RCU GPs
 		 * to ensure link and its underlying BPF program is not
 		 * reachable anymore, we need to first wait for RCU tasks
-		 * trace sync, and then go through "classic" RCU grace period
+		 * trace sync, and then go through "classic" RCU grace period.
+		 *
+		 * For tracepoint BPF links, we need to go through SRCU grace
+		 * period wait instead when non-faultable tracepoint is used. We
+		 * don't need to chain SRCU grace period waits, however, for the
+		 * faultable case, since it exclusively uses RCU Tasks Trace.
 		 */
 		if (link->sleepable || (link->prog && link->prog->sleepable))
 			call_rcu_tasks_trace(&link->rcu, bpf_link_defer_dealloc_mult_rcu_gp);
+		/* We need to do a SRCU grace period wait for non-faultable tracepoint BPF links. */
+		else if (bpf_link_is_tracepoint(link))
+			call_tracepoint_unregister_atomic(&link->rcu, bpf_link_defer_dealloc_rcu_gp);
 		else
 			call_rcu(&link->rcu, bpf_link_defer_dealloc_rcu_gp);
 	} else if (ops->dealloc) {
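The predicate in bpf_link_is_tracepoint() accepts exactly two (link type, attach type) shapes. The sketch below mirrors it in userspace; the `MODEL_*` enums and `struct tp_link_model` are hypothetical stand-ins (the real constants live in the UAPI header <linux/bpf.h>), and only the names needed for the check are modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the relevant UAPI link and attach types. */
enum model_link_type {
	MODEL_LINK_TYPE_RAW_TRACEPOINT,
	MODEL_LINK_TYPE_TRACING,
	MODEL_LINK_TYPE_OTHER,
};

enum model_attach_type {
	MODEL_TRACE_RAW_TP,
	MODEL_TRACE_FENTRY,
	MODEL_ATTACH_OTHER,
};

struct tp_link_model {
	enum model_link_type type;
	enum model_attach_type attach_type;
};

/* Same shape as bpf_link_is_tracepoint(): any raw tracepoint link, or a
 * tracing link whose attach type is raw_tp. */
static bool model_link_is_tracepoint(const struct tp_link_model *link)
{
	return link->type == MODEL_LINK_TYPE_RAW_TRACEPOINT ||
	       (link->type == MODEL_LINK_TYPE_TRACING &&
		link->attach_type == MODEL_TRACE_RAW_TP);
}
```

A tracing link with any other attach type (fentry in the model) falls through to the plain call_rcu() path.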