Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'trace-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing updates from Steven Rostedt:

- Addition of faultable tracepoints

There's a tracepoint attached to both a system call entry and exit.
This location is known to allow page faults. The tracepoints are
called under an rcu_read_lock() which does not allow faults that can
sleep. This limits the ability of tracepoint handlers to page fault
in user space system call parameters. Now these tracepoints have been
made "faultable", allowing the callbacks to fault in user space
parameters and record them.

Note, only the infrastructure has been implemented. The consumers
(perf, ftrace, BPF) now need to have their code modified to allow
faults.

- Fix up of BPF code for the tracepoint faultable logic

- Update tracepoints to use the new static branch API

- Remove trace_*_rcuidle() variants and the SRCU protection they used

- Remove unused TRACE_EVENT_FL_FILTERED logic

- Replace strncpy() with strscpy() and memcpy()

- Replace per_cpu_ptr(smp_processor_id()) with this_cpu_ptr()

- Fix perf events to not duplicate samples when tracing is enabled

- Replace atomic64_add_return(1, counter) with
atomic64_inc_return(counter)

- Make stack trace buffer 4K instead of PAGE_SIZE

- Remove TRACE_FLAG_IRQS_NOSUPPORT flag as it was never used

- Get the true return address for function tracer when function graph
tracer is also running.

When function_graph trace is running along with function tracer, the
parent function of the function tracer sometimes is
"return_to_handler", which is the function graph trampoline to record
the exit of the function. Use existing logic that calls into the
fgraph infrastructure to find the real return address.

- Remove (un)regfunc pointers out of tracepoint structure

- Added last minute bug fix for setting pending modules in stack
function filter.

echo "write*:mod:ext3" > /sys/kernel/tracing/stack_trace_filter

Would cause a kernel NULL dereference.

- Minor clean ups
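
The strncpy() to strscpy() item in the list above is easiest to see with a small model. What follows is a userspace sketch, not the kernel implementation: `sketch_strscpy` and `SKETCH_E2BIG` are hypothetical names, and the only behavior assumed is what the kernel's string documentation describes, namely that strscpy() always NUL-terminates the destination and reports truncation with a negative error, whereas strncpy() can leave the destination unterminated.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_E2BIG 7	/* hypothetical stand-in for the kernel's E2BIG */

/*
 * Userspace approximation of strscpy(): copy at most size - 1 bytes,
 * always NUL-terminate dst, and return the number of bytes copied or
 * -SKETCH_E2BIG if src did not fit (or size was zero).
 */
static long sketch_strscpy(char *dst, const char *src, size_t size)
{
	size_t i;

	assert(dst != NULL && src != NULL);

	if (size == 0)
		return -SKETCH_E2BIG;

	for (i = 0; i < size - 1 && src[i] != '\0'; i++)
		dst[i] = src[i];
	dst[i] = '\0';

	/* If we stopped before reaching the end of src, it was truncated. */
	return src[i] == '\0' ? (long)i : -SKETCH_E2BIG;
}
```

A caller copying a task name into a fixed-size buffer therefore gets a terminated string in every case and can check the return value if truncation matters.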

* tag 'trace-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (31 commits)
ftrace: Fix regression with module command in stack_trace_filter
tracing: Fix function name for trampoline
ftrace: Get the true parent ip for function tracer
tracing: Remove redundant check on field->field in histograms
bpf: ensure RCU Tasks Trace GP for sleepable raw tracepoint BPF links
bpf: decouple BPF link/attach hook and BPF program sleepable semantics
bpf: put bpf_link's program when link is safe to be deallocated
tracing: Replace strncpy() with strscpy() when copying comm
tracing: Add might_fault() check in __DECLARE_TRACE_SYSCALL
tracing: Fix syscall tracepoint use-after-free
tracing: Introduce tracepoint_is_faultable()
tracing: Introduce tracepoint extended structure
tracing: Remove TRACE_FLAG_IRQS_NOSUPPORT
tracing: Replace multiple deprecated strncpy with memcpy
tracing: Make percpu stack trace buffer invariant to PAGE_SIZE
tracing: Use atomic64_inc_return() in trace_clock_counter()
trace/trace_event_perf: remove duplicate samples on the first tracepoint event
tracing/bpf: Add might_fault check to syscall probes
tracing/perf: Add might_fault check to syscall probes
tracing/ftrace: Add might_fault check to syscall probes
...

+452 -328
-3
Documentation/trace/ftrace.rst
@@ -1031,9 +1031,6 @@
   CPU#: The CPU which the process was running on.
 
   irqs-off: 'd' interrupts are disabled. '.' otherwise.
-	.. caution:: If the architecture does not support a way to
-		read the irq flags variable, an 'X' will always
-		be printed here.
 
   need-resched:
 	- 'N' both TIF_NEED_RESCHED and PREEMPT_NEED_RESCHED is set,
+18 -2
include/linux/bpf.h
@@ -1645,6 +1645,11 @@
 	enum bpf_link_type type;
 	const struct bpf_link_ops *ops;
 	struct bpf_prog *prog;
+	/* whether BPF link itself has "sleepable" semantics, which can differ
+	 * from underlying BPF program having a "sleepable" semantics, as BPF
+	 * link's semantics is determined by target attach hook
+	 */
+	bool sleepable;
 	/* rcu is used before freeing, work can be used to schedule that
 	 * RCU-based freeing before that, so they never overlap
 	 */
@@ -1661,8 +1666,10 @@
 	 */
 	void (*dealloc)(struct bpf_link *link);
 	/* deallocate link resources callback, called after RCU grace period;
-	 * if underlying BPF program is sleepable we go through tasks trace
-	 * RCU GP and then "classic" RCU GP
+	 * if either the underlying BPF program is sleepable or BPF link's
+	 * target hook is sleepable, we'll go through tasks trace RCU GP and
+	 * then "classic" RCU GP; this need for chaining tasks trace and
+	 * classic RCU GPs is designated by setting bpf_link->sleepable flag
 	 */
 	void (*dealloc_deferred)(struct bpf_link *link);
 	int (*detach)(struct bpf_link *link);
@@ -2409,6 +2416,9 @@
 
 void bpf_link_init(struct bpf_link *link, enum bpf_link_type type,
 		   const struct bpf_link_ops *ops, struct bpf_prog *prog);
+void bpf_link_init_sleepable(struct bpf_link *link, enum bpf_link_type type,
+			     const struct bpf_link_ops *ops, struct bpf_prog *prog,
+			     bool sleepable);
 int bpf_link_prime(struct bpf_link *link, struct bpf_link_primer *primer);
 int bpf_link_settle(struct bpf_link_primer *primer);
 void bpf_link_cleanup(struct bpf_link_primer *primer);
@@ -2761,6 +2771,12 @@
 static inline void bpf_link_init(struct bpf_link *link, enum bpf_link_type type,
 				 const struct bpf_link_ops *ops,
 				 struct bpf_prog *prog)
+{
+}
+
+static inline void bpf_link_init_sleepable(struct bpf_link *link, enum bpf_link_type type,
+					   const struct bpf_link_ops *ops, struct bpf_prog *prog,
+					   bool sleepable)
 {
 }
 
-17
include/linux/trace_events.h
@@ -184,7 +184,6 @@
 
 enum trace_flag_type {
 	TRACE_FLAG_IRQS_OFF = 0x01,
-	TRACE_FLAG_IRQS_NOSUPPORT = 0x02,
 	TRACE_FLAG_NEED_RESCHED = 0x04,
 	TRACE_FLAG_HARDIRQ = 0x08,
 	TRACE_FLAG_SOFTIRQ = 0x10,
@@ -193,7 +192,6 @@
 	TRACE_FLAG_BH_OFF = 0x80,
 };
 
-#ifdef CONFIG_TRACE_IRQFLAGS_SUPPORT
 static inline unsigned int tracing_gen_ctx_flags(unsigned long irqflags)
 {
 	unsigned int irq_status = irqs_disabled_flags(irqflags) ?
@@ -207,17 +205,6 @@
 	local_save_flags(irqflags);
 	return tracing_gen_ctx_flags(irqflags);
 }
-#else
-
-static inline unsigned int tracing_gen_ctx_flags(unsigned long irqflags)
-{
-	return tracing_gen_ctx_irq_test(TRACE_FLAG_IRQS_NOSUPPORT);
-}
-static inline unsigned int tracing_gen_ctx(void)
-{
-	return tracing_gen_ctx_irq_test(TRACE_FLAG_IRQS_NOSUPPORT);
-}
-#endif
 
 static inline unsigned int tracing_gen_ctx_dec(void)
 {
@@ -326,7 +313,6 @@
 void trace_event_buffer_commit(struct trace_event_buffer *fbuffer);
 
 enum {
-	TRACE_EVENT_FL_FILTERED_BIT,
 	TRACE_EVENT_FL_CAP_ANY_BIT,
 	TRACE_EVENT_FL_NO_SET_FILTER_BIT,
 	TRACE_EVENT_FL_IGNORE_ENABLE_BIT,
@@ -341,7 +327,6 @@
 
 /*
  * Event flags:
- *  FILTERED - The event has a filter attached
  *  CAP_ANY - Any user can enable for perf
  *  NO_SET_FILTER - Set when filter has error and is to be ignored
  *  IGNORE_ENABLE - For trace internal events, do not enable with debugfs file
@@ -356,7 +341,6 @@
  *    to a tracepoint yet, then it is cleared when it is.
  */
 enum {
-	TRACE_EVENT_FL_FILTERED = (1 << TRACE_EVENT_FL_FILTERED_BIT),
 	TRACE_EVENT_FL_CAP_ANY = (1 << TRACE_EVENT_FL_CAP_ANY_BIT),
 	TRACE_EVENT_FL_NO_SET_FILTER = (1 << TRACE_EVENT_FL_NO_SET_FILTER_BIT),
 	TRACE_EVENT_FL_IGNORE_ENABLE = (1 << TRACE_EVENT_FL_IGNORE_ENABLE_BIT),
@@ -381,7 +365,6 @@
 	};
 	struct trace_event event;
 	char *print_fmt;
-	struct event_filter *filter;
 	/*
 	 * Static events can disappear with modules,
 	 * where as dynamic ones need their own ref count.
+10 -4
include/linux/tracepoint-defs.h
@@ -29,16 +29,22 @@
 	int prio;
 };
 
+struct tracepoint_ext {
+	int (*regfunc)(void);
+	void (*unregfunc)(void);
+	/* Flags. */
+	unsigned int faultable:1;
+};
+
 struct tracepoint {
 	const char *name;		/* Tracepoint name */
-	struct static_key key;
+	struct static_key_false key;
 	struct static_call_key *static_call_key;
 	void *static_call_tramp;
 	void *iterator;
 	void *probestub;
-	int (*regfunc)(void);
-	void (*unregfunc)(void);
 	struct tracepoint_func __rcu *funcs;
+	struct tracepoint_ext *ext;
 };
 
 #ifdef CONFIG_HAVE_ARCH_PREL32_RELOCATIONS
@@ -83,7 +89,7 @@
 
 #ifdef CONFIG_TRACEPOINTS
 # define tracepoint_enabled(tp) \
-	static_key_false(&(__tracepoint_##tp).key)
+	static_branch_unlikely(&(__tracepoint_##tp).key)
 #else
 # define tracepoint_enabled(tracepoint) false
 #endif
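
To make the role of the new `ext` pointer concrete, here is a minimal userspace model of the structures introduced by this file's diff, together with the `tracepoint_is_faultable()` test that the merge adds to include/linux/tracepoint.h. The `sketch_` names, the reduced field set, and the `sketch_reclaim_for()` helper are assumptions made for illustration; the helper only restates the batch-reclaim rule from the header comment in that file.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Reduced copy of struct tracepoint_ext from the diff. */
struct sketch_tracepoint_ext {
	int (*regfunc)(void);
	void (*unregfunc)(void);
	unsigned int faultable:1;
};

/* Reduced copy of struct tracepoint: only the fields used here. */
struct sketch_tracepoint {
	const char *name;
	struct sketch_tracepoint_ext *ext;	/* NULL when defined via DEFINE_TRACE() */
};

/* Same test as tracepoint_is_faultable(): a NULL ext means "not faultable". */
static bool sketch_tracepoint_is_faultable(const struct sketch_tracepoint *tp)
{
	assert(tp != NULL);
	return tp->ext && tp->ext->faultable;
}

enum sketch_reclaim {
	SKETCH_CALL_RCU,		/* call_rcu() */
	SKETCH_CALL_RCU_TASKS_TRACE,	/* call_rcu_tasks_trace() */
};

/* Hypothetical helper: which primitive the header comment recommends. */
static enum sketch_reclaim sketch_reclaim_for(const struct sketch_tracepoint *tp)
{
	return sketch_tracepoint_is_faultable(tp) ?
		SKETCH_CALL_RCU_TASKS_TRACE : SKETCH_CALL_RCU;
}
```

Moving `regfunc` and `unregfunc` behind a pointer means a tracepoint defined with plain DEFINE_TRACE() carries one NULL pointer instead of two unused function pointers.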
+96 -73
include/linux/tracepoint.h
@@ -17,6 +17,7 @@
 #include <linux/errno.h>
 #include <linux/types.h>
 #include <linux/rcupdate.h>
+#include <linux/rcupdate_trace.h>
 #include <linux/tracepoint-defs.h>
 #include <linux/static_call.h>
 
@@ -31,8 +32,6 @@
 };
 
 #define TRACEPOINT_DEFAULT_PRIO	10
-
-extern struct srcu_struct tracepoint_srcu;
 
 extern int
 tracepoint_probe_register(struct tracepoint *tp, void *probe, void *data);
@@ -105,16 +104,30 @@
  * tracepoint_synchronize_unregister must be called between the last tracepoint
  * probe unregistration and the end of module exit to make sure there is no
  * caller executing a probe when it is freed.
+ *
+ * An alternative is to use the following for batch reclaim associated
+ * with a given tracepoint:
+ *
+ * - tracepoint_is_faultable() == false: call_rcu()
+ * - tracepoint_is_faultable() == true:  call_rcu_tasks_trace()
  */
 #ifdef CONFIG_TRACEPOINTS
 static inline void tracepoint_synchronize_unregister(void)
 {
-	synchronize_srcu(&tracepoint_srcu);
+	synchronize_rcu_tasks_trace();
 	synchronize_rcu();
+}
+static inline bool tracepoint_is_faultable(struct tracepoint *tp)
+{
+	return tp->ext && tp->ext->faultable;
 }
 #else
 static inline void tracepoint_synchronize_unregister(void)
 { }
+static inline bool tracepoint_is_faultable(struct tracepoint *tp)
+{
+	return false;
+}
 #endif
 
 #ifdef CONFIG_HAVE_SYSCALL_TRACEPOINTS
@@ -197,65 +210,34 @@
 #endif /* CONFIG_HAVE_STATIC_CALL */
 
 /*
- * ARCH_WANTS_NO_INSTR archs are expected to have sanitized entry and idle
- * code that disallow any/all tracing/instrumentation when RCU isn't watching.
- */
-#ifdef CONFIG_ARCH_WANTS_NO_INSTR
-#define RCUIDLE_COND(rcuidle)	(rcuidle)
-#else
-/* srcu can't be used from NMI */
-#define RCUIDLE_COND(rcuidle)	(rcuidle && in_nmi())
-#endif
-
-/*
  * it_func[0] is never NULL because there is at least one element in the array
  * when the array itself is non NULL.
+ *
+ * With @syscall=0, the tracepoint callback array dereference is
+ * protected by disabling preemption.
+ * With @syscall=1, the tracepoint callback array dereference is
+ * protected by Tasks Trace RCU, which allows probes to handle page
+ * faults.
  */
-#define __DO_TRACE(name, args, cond, rcuidle) \
+#define __DO_TRACE(name, args, cond, syscall) \
 	do { \
 		int __maybe_unused __idx = 0; \
 		\
 		if (!(cond)) \
 			return; \
 		\
-		if (WARN_ONCE(RCUIDLE_COND(rcuidle), \
-			      "Bad RCU usage for tracepoint")) \
-			return; \
-		\
-		/* keep srcu and sched-rcu usage consistent */ \
-		preempt_disable_notrace(); \
-		\
-		/* \
-		 * For rcuidle callers, use srcu since sched-rcu \
-		 * doesn't work from the idle path. \
-		 */ \
-		if (rcuidle) { \
-			__idx = srcu_read_lock_notrace(&tracepoint_srcu);\
-			ct_irq_enter_irqson(); \
-		} \
+		if (syscall) \
+			rcu_read_lock_trace(); \
+		else \
+			preempt_disable_notrace(); \
 		\
 		__DO_TRACE_CALL(name, TP_ARGS(args)); \
 		\
-		if (rcuidle) { \
-			ct_irq_exit_irqson(); \
-			srcu_read_unlock_notrace(&tracepoint_srcu, __idx);\
-		} \
-		\
-		preempt_enable_notrace(); \
+		if (syscall) \
+			rcu_read_unlock_trace(); \
+		else \
+			preempt_enable_notrace(); \
 	} while (0)
-
-#ifndef MODULE
-#define __DECLARE_TRACE_RCU(name, proto, args, cond) \
-	static inline void trace_##name##_rcuidle(proto) \
-	{ \
-		if (static_key_false(&__tracepoint_##name.key)) \
-			__DO_TRACE(name, \
-				TP_ARGS(args), \
-				TP_CONDITION(cond), 1); \
-	}
-#else
-#define __DECLARE_TRACE_RCU(name, proto, args, cond)
-#endif
 
 /*
  * Make sure the alignment of the structure in the __tracepoints section will
@@ -268,23 +250,10 @@
  * site if it is not watching, as it will need to be active when the
  * tracepoint is enabled.
  */
-#define __DECLARE_TRACE(name, proto, args, cond, data_proto) \
+#define __DECLARE_TRACE_COMMON(name, proto, args, cond, data_proto) \
 	extern int __traceiter_##name(data_proto); \
 	DECLARE_STATIC_CALL(tp_func_##name, __traceiter_##name); \
 	extern struct tracepoint __tracepoint_##name; \
-	static inline void trace_##name(proto) \
-	{ \
-		if (static_key_false(&__tracepoint_##name.key)) \
-			__DO_TRACE(name, \
-				TP_ARGS(args), \
-				TP_CONDITION(cond), 0); \
-		if (IS_ENABLED(CONFIG_LOCKDEP) && (cond)) { \
-			WARN_ONCE(!rcu_is_watching(), \
-				  "RCU not watching for tracepoint"); \
-		} \
-	} \
-	__DECLARE_TRACE_RCU(name, PARAMS(proto), PARAMS(args), \
-			    PARAMS(cond)) \
 	static inline int \
 	register_trace_##name(void (*probe)(data_proto), void *data) \
 	{ \
@@ -311,7 +280,36 @@
 	static inline bool \
 	trace_##name##_enabled(void) \
 	{ \
-		return static_key_false(&__tracepoint_##name.key); \
+		return static_branch_unlikely(&__tracepoint_##name.key);\
+	}
+
+#define __DECLARE_TRACE(name, proto, args, cond, data_proto) \
+	__DECLARE_TRACE_COMMON(name, PARAMS(proto), PARAMS(args), cond, PARAMS(data_proto)) \
+	static inline void trace_##name(proto) \
+	{ \
+		if (static_branch_unlikely(&__tracepoint_##name.key)) \
+			__DO_TRACE(name, \
+				TP_ARGS(args), \
+				TP_CONDITION(cond), 0); \
+		if (IS_ENABLED(CONFIG_LOCKDEP) && (cond)) { \
+			WARN_ONCE(!rcu_is_watching(), \
+				  "RCU not watching for tracepoint"); \
+		} \
+	}
+
+#define __DECLARE_TRACE_SYSCALL(name, proto, args, cond, data_proto) \
+	__DECLARE_TRACE_COMMON(name, PARAMS(proto), PARAMS(args), cond, PARAMS(data_proto)) \
+	static inline void trace_##name(proto) \
+	{ \
+		might_fault(); \
+		if (static_branch_unlikely(&__tracepoint_##name.key)) \
+			__DO_TRACE(name, \
+				TP_ARGS(args), \
+				TP_CONDITION(cond), 1); \
+		if (IS_ENABLED(CONFIG_LOCKDEP) && (cond)) { \
+			WARN_ONCE(!rcu_is_watching(), \
+				  "RCU not watching for tracepoint"); \
+		} \
 	}
 
 /*
@@ -319,7 +317,7 @@
  * structures, so we create an array of pointers that will be used for iteration
  * on the tracepoints.
  */
-#define DEFINE_TRACE_FN(_name, _reg, _unreg, proto, args) \
+#define __DEFINE_TRACE_EXT(_name, _ext, proto, args) \
 	static const char __tpstrtab_##_name[] \
 	__section("__tracepoints_strings") = #_name; \
 	extern struct static_call_key STATIC_CALL_KEY(tp_func_##_name); \
@@ -328,14 +326,14 @@
 	struct tracepoint __tracepoint_##_name __used \
 	__section("__tracepoints") = { \
 		.name = __tpstrtab_##_name, \
-		.key = STATIC_KEY_INIT_FALSE, \
+		.key = STATIC_KEY_FALSE_INIT, \
 		.static_call_key = &STATIC_CALL_KEY(tp_func_##_name), \
 		.static_call_tramp = STATIC_CALL_TRAMP_ADDR(tp_func_##_name), \
 		.iterator = &__traceiter_##_name, \
 		.probestub = &__probestub_##_name, \
-		.regfunc = _reg, \
-		.unregfunc = _unreg, \
-		.funcs = NULL }; \
+		.funcs = NULL, \
+		.ext = _ext, \
+	}; \
 	__TRACEPOINT_ENTRY(_name); \
 	int __traceiter_##_name(void *__data, proto) \
 	{ \
@@ -358,8 +356,24 @@
 	} \
 	DEFINE_STATIC_CALL(tp_func_##_name, __traceiter_##_name);
 
-#define DEFINE_TRACE(name, proto, args) \
-	DEFINE_TRACE_FN(name, NULL, NULL, PARAMS(proto), PARAMS(args));
+#define DEFINE_TRACE_FN(_name, _reg, _unreg, _proto, _args) \
+	static struct tracepoint_ext __tracepoint_ext_##_name = { \
+		.regfunc = _reg, \
+		.unregfunc = _unreg, \
+		.faultable = false, \
+	}; \
+	__DEFINE_TRACE_EXT(_name, &__tracepoint_ext_##_name, PARAMS(_proto), PARAMS(_args));
+
+#define DEFINE_TRACE_SYSCALL(_name, _reg, _unreg, _proto, _args) \
+	static struct tracepoint_ext __tracepoint_ext_##_name = { \
+		.regfunc = _reg, \
+		.unregfunc = _unreg, \
+		.faultable = true, \
+	}; \
+	__DEFINE_TRACE_EXT(_name, &__tracepoint_ext_##_name, PARAMS(_proto), PARAMS(_args));
+
+#define DEFINE_TRACE(_name, _proto, _args) \
+	__DEFINE_TRACE_EXT(_name, NULL, PARAMS(_proto), PARAMS(_args));
 
 #define EXPORT_TRACEPOINT_SYMBOL_GPL(name) \
 	EXPORT_SYMBOL_GPL(__tracepoint_##name); \
@@ -374,8 +388,6 @@
 #else /* !TRACEPOINTS_ENABLED */
 #define __DECLARE_TRACE(name, proto, args, cond, data_proto) \
 	static inline void trace_##name(proto) \
-	{ } \
-	static inline void trace_##name##_rcuidle(proto) \
 	{ } \
 	static inline int \
 	register_trace_##name(void (*probe)(data_proto), \
@@ -398,7 +410,10 @@
 		return false; \
 	}
 
+#define __DECLARE_TRACE_SYSCALL	__DECLARE_TRACE
+
 #define DEFINE_TRACE_FN(name, reg, unreg, proto, args)
+#define DEFINE_TRACE_SYSCALL(name, reg, unreg, proto, args)
 #define DEFINE_TRACE(name, proto, args)
 #define EXPORT_TRACEPOINT_SYMBOL_GPL(name)
 #define EXPORT_TRACEPOINT_SYMBOL(name)
@@ -458,6 +473,11 @@
 	__DECLARE_TRACE(name, PARAMS(proto), PARAMS(args), \
 			cpu_online(raw_smp_processor_id()) && (PARAMS(cond)), \
 			PARAMS(void *__data, proto))
+
+#define DECLARE_TRACE_SYSCALL(name, proto, args) \
+	__DECLARE_TRACE_SYSCALL(name, PARAMS(proto), PARAMS(args), \
+				cpu_online(raw_smp_processor_id()), \
+				PARAMS(void *__data, proto))
 
 #define TRACE_EVENT_FLAGS(event, flag)
 
@@ -596,6 +616,9 @@
 			      struct, assign, print) \
 	DECLARE_TRACE_CONDITION(name, PARAMS(proto), \
 				PARAMS(args), PARAMS(cond))
+#define TRACE_EVENT_SYSCALL(name, proto, args, struct, assign, \
+			    print, reg, unreg) \
+	DECLARE_TRACE_SYSCALL(name, PARAMS(proto), PARAMS(args))
 
 #define TRACE_EVENT_FLAGS(event, flag)
 
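
The control flow of the reworked `__DO_TRACE()` macro can be followed with a userspace mock. This is only a sketch of the branching shown in the diff: the two counters stand in for `rcu_read_lock_trace()` and `preempt_disable_notrace()`, and every `sketch_` name is hypothetical. It shows that a probe called with `syscall` set runs under the Tasks Trace RCU read lock with preemption left enabled, which is what lets it take page faults.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock state standing in for the real kernel primitives. */
static int sketch_rcu_trace_depth;	/* models rcu_read_lock_trace() nesting */
static int sketch_preempt_count;	/* models preempt_disable_notrace() nesting */

/* What a probe observed about its calling context. */
struct sketch_observed {
	bool ran;
	bool rcu_trace_held;
	bool preempt_disabled;
};

/* Stand-in for __DO_TRACE_CALL(): record the context the probe runs in. */
static void sketch_probe(struct sketch_observed *seen)
{
	seen->ran = true;
	seen->rcu_trace_held = sketch_rcu_trace_depth > 0;
	seen->preempt_disabled = sketch_preempt_count > 0;
}

/* Mirrors the branch structure of __DO_TRACE(name, args, cond, syscall). */
static void sketch_do_trace(bool cond, bool syscall, struct sketch_observed *seen)
{
	assert(seen != NULL);

	if (!cond)
		return;

	if (syscall)
		sketch_rcu_trace_depth++;	/* rcu_read_lock_trace() */
	else
		sketch_preempt_count++;		/* preempt_disable_notrace() */

	sketch_probe(seen);

	if (syscall)
		sketch_rcu_trace_depth--;	/* rcu_read_unlock_trace() */
	else
		sketch_preempt_count--;		/* preempt_enable_notrace() */
}
```

Note that the perf, ftrace and BPF wrappers in this same merge still disable preemption around their probe bodies, so in practice the relaxed context only helps once those consumers are converted, as the pull message says.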
+14
include/trace/bpf_probe.h
@@ -53,6 +53,20 @@
 #define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print) \
 	__BPF_DECLARE_TRACE(call, PARAMS(proto), PARAMS(args))
 
+#define __BPF_DECLARE_TRACE_SYSCALL(call, proto, args) \
+static notrace void \
+__bpf_trace_##call(void *__data, proto) \
+{ \
+	might_fault(); \
+	preempt_disable_notrace(); \
+	CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(__data, CAST_TO_U64(args)); \
+	preempt_enable_notrace(); \
+}
+
+#undef DECLARE_EVENT_SYSCALL_CLASS
+#define DECLARE_EVENT_SYSCALL_CLASS(call, proto, args, tstruct, assign, print) \
+	__BPF_DECLARE_TRACE_SYSCALL(call, PARAMS(proto), PARAMS(args))
+
 /*
  * This part is compiled out, it is only here as a build time check
  * to make sure that if the tracepoint handling changes, the
+5
include/trace/define_trace.h
@@ -46,6 +46,10 @@
 		assign, print, reg, unreg) \
 	DEFINE_TRACE_FN(name, reg, unreg, PARAMS(proto), PARAMS(args))
 
+#undef TRACE_EVENT_SYSCALL
+#define TRACE_EVENT_SYSCALL(name, proto, args, struct, assign, print, reg, unreg) \
+	DEFINE_TRACE_SYSCALL(name, reg, unreg, PARAMS(proto), PARAMS(args))
+
 #undef TRACE_EVENT_NOP
 #define TRACE_EVENT_NOP(name, proto, args, struct, assign, print)
 
@@ -107,6 +111,7 @@
 #undef TRACE_EVENT
 #undef TRACE_EVENT_FN
 #undef TRACE_EVENT_FN_COND
+#undef TRACE_EVENT_SYSCALL
 #undef TRACE_EVENT_CONDITION
 #undef TRACE_EVENT_NOP
 #undef DEFINE_EVENT_NOP
-8
include/trace/events/preemptirq.h
@@ -43,8 +43,6 @@
 #else
 #define trace_irq_enable(...)
 #define trace_irq_disable(...)
-#define trace_irq_enable_rcuidle(...)
-#define trace_irq_disable_rcuidle(...)
 #endif
 
 #ifdef CONFIG_TRACE_PREEMPT_TOGGLE
@@ -58,8 +56,6 @@
 #else
 #define trace_preempt_enable(...)
 #define trace_preempt_disable(...)
-#define trace_preempt_enable_rcuidle(...)
-#define trace_preempt_disable_rcuidle(...)
 #endif
 
 #endif /* _TRACE_PREEMPTIRQ_H */
@@ -69,10 +65,6 @@
 #else /* !CONFIG_PREEMPTIRQ_TRACEPOINTS */
 #define trace_irq_enable(...)
 #define trace_irq_disable(...)
-#define trace_irq_enable_rcuidle(...)
-#define trace_irq_disable_rcuidle(...)
 #define trace_preempt_enable(...)
 #define trace_preempt_disable(...)
-#define trace_preempt_enable_rcuidle(...)
-#define trace_preempt_disable_rcuidle(...)
 #endif
+2 -2
include/trace/events/syscalls.h
@@ -15,7 +15,7 @@
 
 #ifdef CONFIG_HAVE_SYSCALL_TRACEPOINTS
 
-TRACE_EVENT_FN(sys_enter,
+TRACE_EVENT_SYSCALL(sys_enter,
 
 	TP_PROTO(struct pt_regs *regs, long id),
 
@@ -41,7 +41,7 @@
 
 TRACE_EVENT_FLAGS(sys_enter, TRACE_EVENT_FL_CAP_ANY)
 
-TRACE_EVENT_FN(sys_exit,
+TRACE_EVENT_SYSCALL(sys_exit,
 
 	TP_PROTO(struct pt_regs *regs, long ret),
 
+41 -3
include/trace/perf.h
@@ -12,10 +12,10 @@
 #undef __perf_task
 #define __perf_task(t)	(__task = (t))
 
-#undef DECLARE_EVENT_CLASS
-#define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print) \
+#undef __DECLARE_EVENT_CLASS
+#define __DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print) \
 static notrace void \
-perf_trace_##call(void *__data, proto) \
+do_perf_trace_##call(void *__data, proto) \
 { \
 	struct trace_event_call *event_call = __data; \
 	struct trace_event_data_offsets_##call __maybe_unused __data_offsets;\
@@ -56,6 +56,41 @@
 }
 
 /*
+ * Define unused __count and __task variables to use @args to pass
+ * arguments to do_perf_trace_##call. This is needed because the
+ * macros __perf_count and __perf_task introduce the side-effect to
+ * store copies into those local variables.
+ */
+#undef DECLARE_EVENT_CLASS
+#define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print) \
+__DECLARE_EVENT_CLASS(call, PARAMS(proto), PARAMS(args), PARAMS(tstruct), \
+		      PARAMS(assign), PARAMS(print)) \
+static notrace void \
+perf_trace_##call(void *__data, proto) \
+{ \
+	u64 __count __attribute__((unused)); \
+	struct task_struct *__task __attribute__((unused)); \
+	\
+	do_perf_trace_##call(__data, args); \
+}
+
+#undef DECLARE_EVENT_SYSCALL_CLASS
+#define DECLARE_EVENT_SYSCALL_CLASS(call, proto, args, tstruct, assign, print) \
+__DECLARE_EVENT_CLASS(call, PARAMS(proto), PARAMS(args), PARAMS(tstruct), \
+		      PARAMS(assign), PARAMS(print)) \
+static notrace void \
+perf_trace_##call(void *__data, proto) \
+{ \
+	u64 __count __attribute__((unused)); \
+	struct task_struct *__task __attribute__((unused)); \
+	\
+	might_fault(); \
+	preempt_disable_notrace(); \
+	do_perf_trace_##call(__data, args); \
+	preempt_enable_notrace(); \
+}
+
+/*
  * This part is compiled out, it is only here as a build time check
  * to make sure that if the tracepoint handling changes, the
  * perf probe will fail to compile unless it too is updated.
@@ -73,4 +108,7 @@
 	DEFINE_EVENT(template, name, PARAMS(proto), PARAMS(args))
 
 #include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
+
+#undef __DECLARE_EVENT_CLASS
+
 #endif /* CONFIG_PERF_EVENTS */
+58 -4
include/trace/trace_events.h
@@ -45,6 +45,16 @@
 		PARAMS(print)); \
 	DEFINE_EVENT(name, name, PARAMS(proto), PARAMS(args));
 
+#undef TRACE_EVENT_SYSCALL
+#define TRACE_EVENT_SYSCALL(name, proto, args, tstruct, assign, print, reg, unreg) \
+	DECLARE_EVENT_SYSCALL_CLASS(name, \
+		PARAMS(proto), \
+		PARAMS(args), \
+		PARAMS(tstruct), \
+		PARAMS(assign), \
+		PARAMS(print)); \
+	DEFINE_EVENT(name, name, PARAMS(proto), PARAMS(args));
+
 #include "stages/stage1_struct_define.h"
 
 #undef DECLARE_EVENT_CLASS
@@ -56,6 +66,9 @@
 	}; \
 	\
 	static struct trace_event_class event_class_##name;
+
+#undef DECLARE_EVENT_SYSCALL_CLASS
+#define DECLARE_EVENT_SYSCALL_CLASS DECLARE_EVENT_CLASS
 
 #undef DEFINE_EVENT
 #define DEFINE_EVENT(template, name, proto, args) \
@@ -116,6 +129,9 @@
 	struct trace_event_data_offsets_##call { \
 		tstruct; \
 	};
+
+#undef DECLARE_EVENT_SYSCALL_CLASS
+#define DECLARE_EVENT_SYSCALL_CLASS DECLARE_EVENT_CLASS
 
 #undef DEFINE_EVENT
 #define DEFINE_EVENT(template, name, proto, args)
@@ -208,6 +224,9 @@
 	.trace = trace_raw_output_##call, \
 };
 
+#undef DECLARE_EVENT_SYSCALL_CLASS
+#define DECLARE_EVENT_SYSCALL_CLASS DECLARE_EVENT_CLASS
+
 #undef DEFINE_EVENT_PRINT
 #define DEFINE_EVENT_PRINT(template, call, proto, args, print) \
 static notrace enum print_line_t \
@@ -244,6 +263,9 @@
 	tstruct \
 	{} };
 
+#undef DECLARE_EVENT_SYSCALL_CLASS
+#define DECLARE_EVENT_SYSCALL_CLASS DECLARE_EVENT_CLASS
+
 #undef DEFINE_EVENT_PRINT
 #define DEFINE_EVENT_PRINT(template, name, proto, args, print)
 
@@ -264,6 +286,9 @@
 	\
 	return __data_size; \
 }
+
+#undef DECLARE_EVENT_SYSCALL_CLASS
+#define DECLARE_EVENT_SYSCALL_CLASS DECLARE_EVENT_CLASS
 
 #include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
 
@@ -374,11 +399,11 @@
 
 #include "stages/stage6_event_callback.h"
 
-#undef DECLARE_EVENT_CLASS
-#define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print) \
-	\
+
+#undef __DECLARE_EVENT_CLASS
+#define __DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print) \
 static notrace void \
-trace_event_raw_event_##call(void *__data, proto) \
+do_trace_event_raw_event_##call(void *__data, proto) \
 { \
 	struct trace_event_file *trace_file = __data; \
 	struct trace_event_data_offsets_##call __maybe_unused __data_offsets;\
@@ -403,6 +428,30 @@
 	\
 	trace_event_buffer_commit(&fbuffer); \
 }
+
+#undef DECLARE_EVENT_CLASS
+#define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print) \
+__DECLARE_EVENT_CLASS(call, PARAMS(proto), PARAMS(args), PARAMS(tstruct), \
+		      PARAMS(assign), PARAMS(print)) \
+static notrace void \
+trace_event_raw_event_##call(void *__data, proto) \
+{ \
+	do_trace_event_raw_event_##call(__data, args); \
+}
+
+#undef DECLARE_EVENT_SYSCALL_CLASS
+#define DECLARE_EVENT_SYSCALL_CLASS(call, proto, args, tstruct, assign, print) \
+__DECLARE_EVENT_CLASS(call, PARAMS(proto), PARAMS(args), PARAMS(tstruct), \
+		      PARAMS(assign), PARAMS(print)) \
+static notrace void \
+trace_event_raw_event_##call(void *__data, proto) \
+{ \
+	might_fault(); \
+	preempt_disable_notrace(); \
+	do_trace_event_raw_event_##call(__data, args); \
+	preempt_enable_notrace(); \
+}
+
 /*
  * The ftrace_test_probe is compiled out, it is only here as a build time check
  * to make sure that if the tracepoint handling changes, the ftrace probe will
@@ -417,6 +466,8 @@
 }
 
 #include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
+
+#undef __DECLARE_EVENT_CLASS
 
 #include "stages/stage7_class_define.h"
 
@@ -433,6 +484,9 @@
 	.reg = trace_event_reg, \
 	_TRACE_PERF_INIT(call) \
 };
+
+#undef DECLARE_EVENT_SYSCALL_CLASS
+#define DECLARE_EVENT_SYSCALL_CLASS DECLARE_EVENT_CLASS
 
 #undef DEFINE_EVENT
 #define DEFINE_EVENT(template, call, proto, args) \
+1
init/Kconfig
@@ -1989,6 +1989,7 @@
 #
 config TRACEPOINTS
 	bool
+	select TASKS_TRACE_RCU
 
 source "kernel/Kconfig.kexec"
 
+49 -18
kernel/bpf/syscall.c
··· 35 35 #include <linux/rcupdate_trace.h> 36 36 #include <linux/memcontrol.h> 37 37 #include <linux/trace_events.h> 38 + #include <linux/tracepoint.h> 38 39 39 40 #include <net/netfilter/nf_bpf_link.h> 40 41 #include <net/netkit.h> ··· 3034 3033 attr->file_flags); 3035 3034 } 3036 3035 3037 - void bpf_link_init(struct bpf_link *link, enum bpf_link_type type, 3038 - const struct bpf_link_ops *ops, struct bpf_prog *prog) 3036 + /* bpf_link_init_sleepable() allows to specify whether BPF link itself has 3037 + * "sleepable" semantics, which normally would mean that BPF link's attach 3038 + * hook can dereference link or link's underlying program for some time after 3039 + * detachment due to RCU Tasks Trace-based lifetime protection scheme. 3040 + * BPF program itself can be non-sleepable, yet, because it's transitively 3041 + * reachable through BPF link, its freeing has to be delayed until after RCU 3042 + * Tasks Trace GP. 3043 + */ 3044 + void bpf_link_init_sleepable(struct bpf_link *link, enum bpf_link_type type, 3045 + const struct bpf_link_ops *ops, struct bpf_prog *prog, 3046 + bool sleepable) 3039 3047 { 3040 3048 WARN_ON(ops->dealloc && ops->dealloc_deferred); 3041 3049 atomic64_set(&link->refcnt, 1); 3042 3050 link->type = type; 3051 + link->sleepable = sleepable; 3043 3052 link->id = 0; 3044 3053 link->ops = ops; 3045 3054 link->prog = prog; 3055 + } 3056 + 3057 + void bpf_link_init(struct bpf_link *link, enum bpf_link_type type, 3058 + const struct bpf_link_ops *ops, struct bpf_prog *prog) 3059 + { 3060 + bpf_link_init_sleepable(link, type, ops, prog, false); 3046 3061 } 3047 3062 3048 3063 static void bpf_link_free_id(int id) ··· 3093 3076 atomic64_inc(&link->refcnt); 3094 3077 } 3095 3078 3079 + static void bpf_link_dealloc(struct bpf_link *link) 3080 + { 3081 + /* now that we know that bpf_link itself can't be reached, put underlying BPF program */ 3082 + if (link->prog) 3083 + bpf_prog_put(link->prog); 3084 + 3085 + /* free bpf_link and its 
containing memory */ 3086 + if (link->ops->dealloc_deferred) 3087 + link->ops->dealloc_deferred(link); 3088 + else 3089 + link->ops->dealloc(link); 3090 + } 3091 + 3096 3092 static void bpf_link_defer_dealloc_rcu_gp(struct rcu_head *rcu) 3097 3093 { 3098 3094 struct bpf_link *link = container_of(rcu, struct bpf_link, rcu); 3099 3095 3100 - /* free bpf_link and its containing memory */ 3101 - link->ops->dealloc_deferred(link); 3096 + bpf_link_dealloc(link); 3102 3097 } 3103 3098 3104 3099 static void bpf_link_defer_dealloc_mult_rcu_gp(struct rcu_head *rcu) ··· 3125 3096 static void bpf_link_free(struct bpf_link *link) 3126 3097 { 3127 3098 const struct bpf_link_ops *ops = link->ops; 3128 - bool sleepable = false; 3129 3099 3130 3100 bpf_link_free_id(link->id); 3131 - if (link->prog) { 3132 - sleepable = link->prog->sleepable; 3133 - /* detach BPF program, clean up used resources */ 3101 + /* detach BPF program, clean up used resources */ 3102 + if (link->prog) 3134 3103 ops->release(link); 3135 - bpf_prog_put(link->prog); 3136 - } 3137 3104 if (ops->dealloc_deferred) { 3138 - /* schedule BPF link deallocation; if underlying BPF program 3139 - * is sleepable, we need to first wait for RCU tasks trace 3140 - * sync, then go through "classic" RCU grace period 3105 + /* Schedule BPF link deallocation, which will only then 3106 + * trigger putting BPF program refcount. 
3107 + * If underlying BPF program is sleepable or BPF link's target 3108 + * attach hookpoint is sleepable or otherwise requires RCU GPs 3109 + * to ensure link and its underlying BPF program is not 3110 + * reachable anymore, we need to first wait for RCU tasks 3111 + * trace sync, and then go through "classic" RCU grace period 3141 3112 */ 3142 - if (sleepable) 3113 + if (link->sleepable || (link->prog && link->prog->sleepable)) 3143 3114 call_rcu_tasks_trace(&link->rcu, bpf_link_defer_dealloc_mult_rcu_gp); 3144 3115 else 3145 3116 call_rcu(&link->rcu, bpf_link_defer_dealloc_rcu_gp); 3146 - } else if (ops->dealloc) 3147 - ops->dealloc(link); 3117 + } else if (ops->dealloc) { 3118 + bpf_link_dealloc(link); 3119 + } 3148 3120 } 3149 3121 3150 3122 static void bpf_link_put_deferred(struct work_struct *work) ··· 3966 3936 err = -ENOMEM; 3967 3937 goto out_put_btp; 3968 3938 } 3969 - bpf_link_init(&link->link, BPF_LINK_TYPE_RAW_TRACEPOINT, 3970 - &bpf_raw_tp_link_lops, prog); 3939 + bpf_link_init_sleepable(&link->link, BPF_LINK_TYPE_RAW_TRACEPOINT, 3940 + &bpf_raw_tp_link_lops, prog, 3941 + tracepoint_is_faultable(btp->tp)); 3971 3942 link->btp = btp; 3972 3943 link->cookie = cookie; 3973 3944
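The grace-period choice made in bpf_link_free() above can be modelled in user space. This is a minimal sketch, not kernel code: `model_prog`, `model_link`, `gp_kind` and `model_link_free_gp()` are hypothetical names introduced for illustration, and the function only reports which path the diff would take rather than calling any RCU API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct bpf_prog and struct bpf_link. */
struct model_prog {
	bool sleepable;
};

struct model_link {
	bool sleepable;            /* set from tracepoint_is_faultable() in the diff */
	struct model_prog *prog;   /* may be NULL */
	bool has_dealloc_deferred; /* models ops->dealloc_deferred != NULL */
};

enum gp_kind {
	GP_NONE,                 /* immediate dealloc, no grace period */
	GP_RCU,                  /* call_rcu() path */
	GP_TASKS_TRACE_THEN_RCU, /* call_rcu_tasks_trace() path */
};

/* Mirror the condition used in bpf_link_free() after this change. */
static enum gp_kind model_link_free_gp(const struct model_link *link)
{
	assert(link != NULL);

	if (!link->has_dealloc_deferred)
		return GP_NONE;
	if (link->sleepable || (link->prog && link->prog->sleepable))
		return GP_TASKS_TRACE_THEN_RCU;
	return GP_RCU;
}
```

The point of the change is visible in the second branch: a non-sleepable program attached through a sleepable (faultable) link still takes the RCU Tasks Trace path, because the program is reachable through the link.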
+3
kernel/trace/ftrace.c
@@ -5093,6 +5093,9 @@
 	char *func;
 	int ret;
 
+	if (!tr)
+		return -ENODEV;
+
 	/* match_records() modifies func, and we need the original */
 	func = kstrdup(func_orig, GFP_KERNEL);
 	if (!func)
+36 -45
kernel/trace/trace.c
··· 593 593 return 0; 594 594 } 595 595 596 - int call_filter_check_discard(struct trace_event_call *call, void *rec, 597 - struct trace_buffer *buffer, 598 - struct ring_buffer_event *event) 599 - { 600 - if (unlikely(call->flags & TRACE_EVENT_FL_FILTERED) && 601 - !filter_match_preds(call->filter, rec)) { 602 - __trace_event_discard_commit(buffer, event); 603 - return 1; 604 - } 605 - 606 - return 0; 607 - } 608 - 609 596 /** 610 597 * trace_find_filtered_pid - check if a pid exists in a filtered_pid list 611 598 * @filtered_pids: The list of pids to check ··· 975 988 #endif 976 989 977 990 #ifdef CONFIG_STACKTRACE 978 - static void __ftrace_trace_stack(struct trace_buffer *buffer, 991 + static void __ftrace_trace_stack(struct trace_array *tr, 992 + struct trace_buffer *buffer, 979 993 unsigned int trace_ctx, 980 994 int skip, struct pt_regs *regs); 981 995 static inline void ftrace_trace_stack(struct trace_array *tr, ··· 985 997 int skip, struct pt_regs *regs); 986 998 987 999 #else 988 - static inline void __ftrace_trace_stack(struct trace_buffer *buffer, 1000 + static inline void __ftrace_trace_stack(struct trace_array *tr, 1001 + struct trace_buffer *buffer, 989 1002 unsigned int trace_ctx, 990 1003 int skip, struct pt_regs *regs) 991 1004 { ··· 1923 1934 max_data->critical_start = data->critical_start; 1924 1935 max_data->critical_end = data->critical_end; 1925 1936 1926 - strncpy(max_data->comm, tsk->comm, TASK_COMM_LEN); 1937 + strscpy(max_data->comm, tsk->comm); 1927 1938 max_data->pid = tsk->pid; 1928 1939 /* 1929 1940 * If tsk == current, then use current_uid(), as that does not use ··· 2897 2908 trace_function(struct trace_array *tr, unsigned long ip, unsigned long 2898 2909 parent_ip, unsigned int trace_ctx) 2899 2910 { 2900 - struct trace_event_call *call = &event_function; 2901 2911 struct trace_buffer *buffer = tr->array_buffer.buffer; 2902 2912 struct ring_buffer_event *event; 2903 2913 struct ftrace_entry *entry; ··· 2909 2921 entry->ip = ip; 
2910 2922 entry->parent_ip = parent_ip; 2911 2923 2912 - if (!call_filter_check_discard(call, entry, buffer, event)) { 2913 - if (static_branch_unlikely(&trace_function_exports_enabled)) 2914 - ftrace_exports(event, TRACE_EXPORT_FUNCTION); 2915 - __buffer_unlock_commit(buffer, event); 2916 - } 2924 + if (static_branch_unlikely(&trace_function_exports_enabled)) 2925 + ftrace_exports(event, TRACE_EXPORT_FUNCTION); 2926 + __buffer_unlock_commit(buffer, event); 2917 2927 } 2918 2928 2919 2929 #ifdef CONFIG_STACKTRACE ··· 2919 2933 /* Allow 4 levels of nesting: normal, softirq, irq, NMI */ 2920 2934 #define FTRACE_KSTACK_NESTING 4 2921 2935 2922 - #define FTRACE_KSTACK_ENTRIES (PAGE_SIZE / FTRACE_KSTACK_NESTING) 2936 + #define FTRACE_KSTACK_ENTRIES (SZ_4K / FTRACE_KSTACK_NESTING) 2923 2937 2924 2938 struct ftrace_stack { 2925 2939 unsigned long calls[FTRACE_KSTACK_ENTRIES]; ··· 2933 2947 static DEFINE_PER_CPU(struct ftrace_stacks, ftrace_stacks); 2934 2948 static DEFINE_PER_CPU(int, ftrace_stack_reserve); 2935 2949 2936 - static void __ftrace_trace_stack(struct trace_buffer *buffer, 2950 + static void __ftrace_trace_stack(struct trace_array *tr, 2951 + struct trace_buffer *buffer, 2937 2952 unsigned int trace_ctx, 2938 2953 int skip, struct pt_regs *regs) 2939 2954 { 2940 - struct trace_event_call *call = &event_kernel_stack; 2941 2955 struct ring_buffer_event *event; 2942 2956 unsigned int size, nr_entries; 2943 2957 struct ftrace_stack *fstack; ··· 2980 2994 nr_entries = stack_trace_save(fstack->calls, size, skip); 2981 2995 } 2982 2996 2997 + #ifdef CONFIG_DYNAMIC_FTRACE 2998 + /* Mark entry of stack trace as trampoline code */ 2999 + if (tr->ops && tr->ops->trampoline) { 3000 + unsigned long tramp_start = tr->ops->trampoline; 3001 + unsigned long tramp_end = tramp_start + tr->ops->trampoline_size; 3002 + unsigned long *calls = fstack->calls; 3003 + 3004 + for (int i = 0; i < nr_entries; i++) { 3005 + if (calls[i] >= tramp_start && calls[i] < tramp_end) 3006 + 
calls[i] = FTRACE_TRAMPOLINE_MARKER; 3007 + } 3008 + } 3009 + #endif 3010 + 2983 3011 event = __trace_buffer_lock_reserve(buffer, TRACE_STACK, 2984 3012 struct_size(entry, caller, nr_entries), 2985 3013 trace_ctx); ··· 3005 3005 memcpy(&entry->caller, fstack->calls, 3006 3006 flex_array_size(entry, caller, nr_entries)); 3007 3007 3008 - if (!call_filter_check_discard(call, entry, buffer, event)) 3009 - __buffer_unlock_commit(buffer, event); 3008 + __buffer_unlock_commit(buffer, event); 3010 3009 3011 3010 out: 3012 3011 /* Again, don't let gcc optimize things here */ ··· 3023 3024 if (!(tr->trace_flags & TRACE_ITER_STACKTRACE)) 3024 3025 return; 3025 3026 3026 - __ftrace_trace_stack(buffer, trace_ctx, skip, regs); 3027 + __ftrace_trace_stack(tr, buffer, trace_ctx, skip, regs); 3027 3028 } 3028 3029 3029 3030 void __trace_stack(struct trace_array *tr, unsigned int trace_ctx, ··· 3032 3033 struct trace_buffer *buffer = tr->array_buffer.buffer; 3033 3034 3034 3035 if (rcu_is_watching()) { 3035 - __ftrace_trace_stack(buffer, trace_ctx, skip, NULL); 3036 + __ftrace_trace_stack(tr, buffer, trace_ctx, skip, NULL); 3036 3037 return; 3037 3038 } 3038 3039 ··· 3049 3050 return; 3050 3051 3051 3052 ct_irq_enter_irqson(); 3052 - __ftrace_trace_stack(buffer, trace_ctx, skip, NULL); 3053 + __ftrace_trace_stack(tr, buffer, trace_ctx, skip, NULL); 3053 3054 ct_irq_exit_irqson(); 3054 3055 } 3055 3056 ··· 3066 3067 /* Skip 1 to skip this function. 
*/ 3067 3068 skip++; 3068 3069 #endif 3069 - __ftrace_trace_stack(printk_trace->array_buffer.buffer, 3070 - tracing_gen_ctx(), skip, NULL); 3070 + __ftrace_trace_stack(printk_trace, printk_trace->array_buffer.buffer, 3071 + tracing_gen_ctx(), skip, NULL); 3071 3072 } 3072 3073 EXPORT_SYMBOL_GPL(trace_dump_stack); 3073 3074 ··· 3078 3079 ftrace_trace_userstack(struct trace_array *tr, 3079 3080 struct trace_buffer *buffer, unsigned int trace_ctx) 3080 3081 { 3081 - struct trace_event_call *call = &event_user_stack; 3082 3082 struct ring_buffer_event *event; 3083 3083 struct userstack_entry *entry; 3084 3084 ··· 3111 3113 memset(&entry->caller, 0, sizeof(entry->caller)); 3112 3114 3113 3115 stack_trace_save_user(entry->caller, FTRACE_STACK_ENTRIES); 3114 - if (!call_filter_check_discard(call, entry, buffer, event)) 3115 - __buffer_unlock_commit(buffer, event); 3116 + __buffer_unlock_commit(buffer, event); 3116 3117 3117 3118 out_drop_count: 3118 3119 __this_cpu_dec(user_stack_count); ··· 3280 3283 */ 3281 3284 int trace_vbprintk(unsigned long ip, const char *fmt, va_list args) 3282 3285 { 3283 - struct trace_event_call *call = &event_bprint; 3284 3286 struct ring_buffer_event *event; 3285 3287 struct trace_buffer *buffer; 3286 3288 struct trace_array *tr = READ_ONCE(printk_trace); ··· 3323 3327 entry->fmt = fmt; 3324 3328 3325 3329 memcpy(entry->buf, tbuffer, sizeof(u32) * len); 3326 - if (!call_filter_check_discard(call, entry, buffer, event)) { 3327 - __buffer_unlock_commit(buffer, event); 3328 - ftrace_trace_stack(tr, buffer, trace_ctx, 6, NULL); 3329 - } 3330 + __buffer_unlock_commit(buffer, event); 3331 + ftrace_trace_stack(tr, buffer, trace_ctx, 6, NULL); 3330 3332 3331 3333 out: 3332 3334 ring_buffer_nest_end(buffer); ··· 3344 3350 __trace_array_vprintk(struct trace_buffer *buffer, 3345 3351 unsigned long ip, const char *fmt, va_list args) 3346 3352 { 3347 - struct trace_event_call *call = &event_print; 3348 3353 struct ring_buffer_event *event; 3349 3354 int 
len = 0, size; 3350 3355 struct print_entry *entry; ··· 3378 3385 entry->ip = ip; 3379 3386 3380 3387 memcpy(&entry->buf, tbuffer, len + 1); 3381 - if (!call_filter_check_discard(call, entry, buffer, event)) { 3382 - __buffer_unlock_commit(buffer, event); 3383 - ftrace_trace_stack(printk_trace, buffer, trace_ctx, 6, NULL); 3384 - } 3388 + __buffer_unlock_commit(buffer, event); 3389 + ftrace_trace_stack(printk_trace, buffer, trace_ctx, 6, NULL); 3385 3390 3386 3391 out: 3387 3392 ring_buffer_nest_end(buffer);
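The trampoline-marking loop added to __ftrace_trace_stack() above can be tried outside the kernel. The following is a sketch under stated assumptions: `MODEL_TRAMPOLINE_MARKER` and `mark_trampoline_entries()` are hypothetical names, the marker value is copied from the FTRACE_TRAMPOLINE_MARKER definition in this diff, and the return count is an addition made only so the behaviour can be checked.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Same value as FTRACE_TRAMPOLINE_MARKER in the diff; the name is local to this sketch. */
#define MODEL_TRAMPOLINE_MARKER ((unsigned long)INT_MAX)

/*
 * Replace every saved address that falls inside the half-open range
 * [tramp_start, tramp_start + tramp_size) with the marker value.
 * Returns how many entries were replaced.
 */
static size_t mark_trampoline_entries(unsigned long *calls, size_t nr_entries,
				      unsigned long tramp_start,
				      unsigned long tramp_size)
{
	unsigned long tramp_end = tramp_start + tramp_size;
	size_t marked = 0;

	assert(calls != NULL || nr_entries == 0);

	for (size_t i = 0; i < nr_entries; i++) {
		if (calls[i] >= tramp_start && calls[i] < tramp_end) {
			calls[i] = MODEL_TRAMPOLINE_MARKER;
			marked++;
		}
	}
	return marked;
}
```

Note that the upper bound is exclusive, as in the diff: an address equal to `tramp_start + tramp_size` is left untouched.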
+7 -4
kernel/trace/trace.h
@@ -1440,10 +1440,6 @@
 	int nr_events;
 };
 
-extern int call_filter_check_discard(struct trace_event_call *call, void *rec,
-				     struct trace_buffer *buffer,
-				     struct ring_buffer_event *event);
-
 void trace_buffer_unlock_commit_regs(struct trace_array *tr,
 				     struct trace_buffer *buffer,
 				     struct ring_buffer_event *event,
@@ -2186,5 +2182,12 @@
 	return 0;
 }
 #endif
+
+/*
+ * This is used only to distinguish
+ * function address from trampoline code.
+ * So this value has no meaning.
+ */
+#define FTRACE_TRAMPOLINE_MARKER  ((unsigned long) INT_MAX)
 
 #endif /* _LINUX_KERNEL_TRACE_H */
+3 -7
kernel/trace/trace_branch.c
@@ -30,7 +30,6 @@
 static void
 probe_likely_condition(struct ftrace_likely_data *f, int val, int expect)
 {
-	struct trace_event_call *call = &event_branch;
 	struct trace_array *tr = branch_tracer;
 	struct trace_buffer *buffer;
 	struct trace_array_cpu *data;
@@ -74,16 +73,13 @@
 		p--;
 	p++;
 
-	strncpy(entry->func, f->data.func, TRACE_FUNC_SIZE);
-	strncpy(entry->file, p, TRACE_FILE_SIZE);
-	entry->func[TRACE_FUNC_SIZE] = 0;
-	entry->file[TRACE_FILE_SIZE] = 0;
+	strscpy(entry->func, f->data.func);
+	strscpy(entry->file, p);
 	entry->constant = f->constant;
 	entry->line = f->data.line;
 	entry->correct = val == expect;
 
-	if (!call_filter_check_discard(call, entry, buffer, event))
-		trace_buffer_unlock_commit_nostack(buffer, event);
+	trace_buffer_unlock_commit_nostack(buffer, event);
 
  out:
 	current->trace_recursion &= ~TRACE_BRANCH_BIT;
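The strncpy() to strscpy() conversion above removes the need for the manual NUL terminators. A rough user-space model of the behaviour being relied on is sketched below; `model_strscpy()` is a hypothetical helper written for this note, not the kernel implementation, and it returns -1 on truncation where the kernel function returns a negative errno.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Copy src into dst, writing at most size bytes including the NUL.
 * Unlike strncpy(), dst is always NUL-terminated when size > 0 and the
 * remainder of dst is not zero-padded.
 * Returns the number of characters copied, or -1 if src was truncated.
 */
static long model_strscpy(char *dst, const char *src, size_t size)
{
	size_t len;

	assert(dst != NULL && src != NULL);

	if (size == 0)
		return -1;

	for (len = 0; len < size - 1 && src[len] != '\0'; len++)
		dst[len] = src[len];
	dst[len] = '\0';

	return src[len] == '\0' ? (long)len : -1;
}
```

With this contract a caller can both rely on termination and detect truncation from the return value, which the strncpy() plus explicit terminator pattern could not report.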
+1 -1
kernel/trace/trace_clock.c
@@ -154,5 +154,5 @@
  */
 u64 notrace trace_clock_counter(void)
 {
-	return atomic64_add_return(1, &trace_counter);
+	return atomic64_inc_return(&trace_counter);
 }
+6
kernel/trace/trace_event_perf.c
@@ -352,9 +352,15 @@
 int perf_trace_add(struct perf_event *p_event, int flags)
 {
 	struct trace_event_call *tp_event = p_event->tp_event;
+	struct hw_perf_event *hwc = &p_event->hw;
 
 	if (!(flags & PERF_EF_START))
 		p_event->hw.state = PERF_HES_STOPPED;
+
+	if (is_sampling_event(p_event)) {
+		hwc->last_period = hwc->sample_period;
+		perf_swevent_set_period(p_event);
+	}
 
 	/*
 	 * If TRACE_REG_PERF_ADD returns false; no custom action was performed
-2
kernel/trace/trace_events.c
@@ -3149,8 +3149,6 @@
 {
 	event_remove(call);
 	trace_destroy_fields(call);
-	free_event_filter(call->filter);
-	call->filter = NULL;
 }
 
 static int probe_remove_event_call(struct trace_event_call *call)
+4 -4
kernel/trace/trace_events_filter.c
··· 1616 1616 goto err_free; 1617 1617 } 1618 1618 1619 - strncpy(num_buf, str + s, len); 1619 + memcpy(num_buf, str + s, len); 1620 1620 num_buf[len] = 0; 1621 1621 1622 1622 ret = kstrtoul(num_buf, 0, &ip); ··· 1694 1694 if (!pred->regex) 1695 1695 goto err_mem; 1696 1696 pred->regex->len = len; 1697 - strncpy(pred->regex->pattern, str + s, len); 1697 + memcpy(pred->regex->pattern, str + s, len); 1698 1698 pred->regex->pattern[len] = 0; 1699 1699 1700 1700 } else if (!strncmp(str + i, "CPUS", 4)) { ··· 1859 1859 if (!pred->regex) 1860 1860 goto err_mem; 1861 1861 pred->regex->len = len; 1862 - strncpy(pred->regex->pattern, str + s, len); 1862 + memcpy(pred->regex->pattern, str + s, len); 1863 1863 pred->regex->pattern[len] = 0; 1864 1864 1865 1865 filter_build_regex(pred); ··· 1919 1919 goto err_free; 1920 1920 } 1921 1921 1922 - strncpy(num_buf, str + s, len); 1922 + memcpy(num_buf, str + s, len); 1923 1923 num_buf[len] = 0; 1924 1924 1925 1925 /* Make sure it is a value */
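In the filter parser above the length of the token is already known and the terminator is written explicitly on the next line, so a plain memcpy() expresses the intent directly. A minimal sketch of that pattern follows; `model_copy_token()` is a hypothetical helper, and the explicit bounds check is an assumption added for safety in the example rather than something shown in these hunks.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the len-byte token starting at str + s into dst and terminate it.
 * Returns 0 on success, -1 if the token plus NUL does not fit in dst.
 */
static int model_copy_token(char *dst, size_t dst_size,
			    const char *str, size_t s, size_t len)
{
	assert(dst != NULL && str != NULL);

	if (len >= dst_size)
		return -1;

	memcpy(dst, str + s, len);
	dst[len] = '\0';
	return 0;
}
```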
+4 -7
kernel/trace/trace_events_hist.c
··· 822 822 { 823 823 struct tracepoint *tp = event->tp; 824 824 825 - if (unlikely(atomic_read(&tp->key.enabled) > 0)) { 825 + if (unlikely(static_key_enabled(&tp->key))) { 826 826 struct tracepoint_func *probe_func_ptr; 827 827 synth_probe_func_t probe_func; 828 828 void *__data; ··· 1354 1354 } else if (field->flags & HIST_FIELD_FL_TIMESTAMP) 1355 1355 field_name = "common_timestamp"; 1356 1356 else if (field->flags & HIST_FIELD_FL_STACKTRACE) { 1357 - if (field->field) 1358 - field_name = field->field->name; 1359 - else 1360 - field_name = "common_stacktrace"; 1357 + field_name = "common_stacktrace"; 1361 1358 } else if (field->flags & HIST_FIELD_FL_HITCOUNT) 1362 1359 field_name = "hitcount"; 1363 1360 ··· 1596 1599 return; 1597 1600 } 1598 1601 1599 - strncpy(comm, task->comm, TASK_COMM_LEN); 1602 + strscpy(comm, task->comm, TASK_COMM_LEN); 1600 1603 } 1601 1604 1602 1605 static void hist_elt_data_free(struct hist_elt_data *elt_data) ··· 3402 3405 elt_data = context->elt->private_data; 3403 3406 track_elt_data = track_data->elt.private_data; 3404 3407 if (elt_data->comm) 3405 - strncpy(track_elt_data->comm, elt_data->comm, TASK_COMM_LEN); 3408 + strscpy(track_elt_data->comm, elt_data->comm, TASK_COMM_LEN); 3406 3409 3407 3410 track_data->updated = true; 3408 3411
+2 -2
kernel/trace/trace_events_user.c
@@ -1676,7 +1676,7 @@
 	struct tracepoint *tp = &user->tracepoint;
 	char status = 0;
 
-	if (atomic_read(&tp->key.enabled) > 0) {
+	if (static_key_enabled(&tp->key)) {
 		struct tracepoint_func *probe_func_ptr;
 		user_event_func_t probe_func;
 
@@ -2280,7 +2280,7 @@
 	 * It's possible key.enabled disables after this check, however
 	 * we don't mind if a few events are included in this condition.
 	 */
-	if (likely(atomic_read(&tp->key.enabled) > 0)) {
+	if (likely(static_key_enabled(&tp->key))) {
 		struct tracepoint_func *probe_func_ptr;
 		user_event_func_t probe_func;
 		struct iov_iter copy;
+29 -7
kernel/trace/trace_functions.c
··· 176 176 tracing_reset_online_cpus(&tr->array_buffer); 177 177 } 178 178 179 + #ifdef CONFIG_FUNCTION_GRAPH_TRACER 180 + static __always_inline unsigned long 181 + function_get_true_parent_ip(unsigned long parent_ip, struct ftrace_regs *fregs) 182 + { 183 + unsigned long true_parent_ip; 184 + int idx = 0; 185 + 186 + true_parent_ip = parent_ip; 187 + if (unlikely(parent_ip == (unsigned long)&return_to_handler) && fregs) 188 + true_parent_ip = ftrace_graph_ret_addr(current, &idx, parent_ip, 189 + (unsigned long *)ftrace_regs_get_stack_pointer(fregs)); 190 + return true_parent_ip; 191 + } 192 + #else 193 + static __always_inline unsigned long 194 + function_get_true_parent_ip(unsigned long parent_ip, struct ftrace_regs *fregs) 195 + { 196 + return parent_ip; 197 + } 198 + #endif 199 + 179 200 static void 180 201 function_trace_call(unsigned long ip, unsigned long parent_ip, 181 202 struct ftrace_ops *op, struct ftrace_regs *fregs) ··· 205 184 struct trace_array_cpu *data; 206 185 unsigned int trace_ctx; 207 186 int bit; 208 - int cpu; 209 187 210 188 if (unlikely(!tr->function_enabled)) 211 189 return; ··· 213 193 if (bit < 0) 214 194 return; 215 195 196 + parent_ip = function_get_true_parent_ip(parent_ip, fregs); 197 + 216 198 trace_ctx = tracing_gen_ctx(); 217 199 218 - cpu = smp_processor_id(); 219 - data = per_cpu_ptr(tr->array_buffer.data, cpu); 200 + data = this_cpu_ptr(tr->array_buffer.data); 220 201 if (!atomic_read(&data->disabled)) 221 202 trace_function(tr, ip, parent_ip, trace_ctx); 222 203 ··· 262 241 * recursive protection is performed. 
263 242 */ 264 243 local_irq_save(flags); 244 + parent_ip = function_get_true_parent_ip(parent_ip, fregs); 265 245 cpu = raw_smp_processor_id(); 266 246 data = per_cpu_ptr(tr->array_buffer.data, cpu); 267 247 disabled = atomic_inc_return(&data->disabled); ··· 322 300 unsigned int trace_ctx; 323 301 unsigned long flags; 324 302 int bit; 325 - int cpu; 326 303 327 304 if (unlikely(!tr->function_enabled)) 328 305 return; ··· 330 309 if (bit < 0) 331 310 return; 332 311 333 - cpu = smp_processor_id(); 334 - data = per_cpu_ptr(tr->array_buffer.data, cpu); 312 + parent_ip = function_get_true_parent_ip(parent_ip, fregs); 313 + data = this_cpu_ptr(tr->array_buffer.data); 335 314 if (atomic_read(&data->disabled)) 336 315 goto out; 337 316 ··· 342 321 * TODO: think about a solution that is better than just hoping to be 343 322 * lucky. 344 323 */ 345 - last_info = per_cpu_ptr(tr->last_func_repeats, cpu); 324 + last_info = this_cpu_ptr(tr->last_func_repeats); 346 325 if (is_repeat_check(tr, last_info, ip, parent_ip)) 347 326 goto out; 348 327 ··· 377 356 * recursive protection is performed. 378 357 */ 379 358 local_irq_save(flags); 359 + parent_ip = function_get_true_parent_ip(parent_ip, fregs); 380 360 cpu = raw_smp_processor_id(); 381 361 data = per_cpu_ptr(tr->array_buffer.data, cpu); 382 362 disabled = atomic_inc_return(&data->disabled);
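The idea behind function_get_true_parent_ip() above is: if the recorded parent is the function graph trampoline, look up the real return address that the graph tracer saved. The following is a loose user-space model only. `model_shadow_entry` and `model_true_parent_ip()` are hypothetical, and matching entries by the address of the return slot is an assumption made to keep the sketch short; the real ftrace_graph_ret_addr() is more involved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of a return address that was replaced by the trampoline. */
struct model_shadow_entry {
	const unsigned long *retp; /* where the return address lived on the stack */
	unsigned long ret;         /* the original (true) return address */
};

/*
 * Return the true parent address. If parent_ip is not the trampoline, or no
 * stack pointer is available (retp == NULL, like a NULL fregs in the diff),
 * parent_ip is returned unchanged.
 */
static unsigned long
model_true_parent_ip(unsigned long parent_ip, unsigned long trampoline,
		     const struct model_shadow_entry *stack, size_t depth,
		     const unsigned long *retp)
{
	assert(stack != NULL || depth == 0);

	if (parent_ip != trampoline || retp == NULL)
		return parent_ip;

	/* Search from the most recent entry backwards. */
	for (size_t i = depth; i-- > 0; ) {
		if (stack[i].retp == retp)
			return stack[i].ret;
	}
	return parent_ip;
}
```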
+2 -6
kernel/trace/trace_functions_graph.c
··· 114 114 struct ftrace_graph_ent *trace, 115 115 unsigned int trace_ctx) 116 116 { 117 - struct trace_event_call *call = &event_funcgraph_entry; 118 117 struct ring_buffer_event *event; 119 118 struct trace_buffer *buffer = tr->array_buffer.buffer; 120 119 struct ftrace_graph_ent_entry *entry; ··· 124 125 return 0; 125 126 entry = ring_buffer_event_data(event); 126 127 entry->graph_ent = *trace; 127 - if (!call_filter_check_discard(call, entry, buffer, event)) 128 - trace_buffer_unlock_commit_nostack(buffer, event); 128 + trace_buffer_unlock_commit_nostack(buffer, event); 129 129 130 130 return 1; 131 131 } ··· 290 292 struct ftrace_graph_ret *trace, 291 293 unsigned int trace_ctx) 292 294 { 293 - struct trace_event_call *call = &event_funcgraph_exit; 294 295 struct ring_buffer_event *event; 295 296 struct trace_buffer *buffer = tr->array_buffer.buffer; 296 297 struct ftrace_graph_ret_entry *entry; ··· 300 303 return; 301 304 entry = ring_buffer_event_data(event); 302 305 entry->ret = *trace; 303 - if (!call_filter_check_discard(call, entry, buffer, event)) 304 - trace_buffer_unlock_commit_nostack(buffer, event); 306 + trace_buffer_unlock_commit_nostack(buffer, event); 305 307 } 306 308 307 309 static void handle_nosleeptime(struct ftrace_graph_ret *trace,
+1 -3
kernel/trace/trace_hwlat.c
@@ -130,7 +130,6 @@
 static void trace_hwlat_sample(struct hwlat_sample *sample)
 {
 	struct trace_array *tr = hwlat_trace;
-	struct trace_event_call *call = &event_hwlat;
 	struct trace_buffer *buffer = tr->array_buffer.buffer;
 	struct ring_buffer_event *event;
 	struct hwlat_entry *entry;
@@ -148,8 +147,7 @@
 	entry->nmi_count		= sample->nmi_count;
 	entry->count			= sample->count;
 
-	if (!call_filter_check_discard(call, entry, buffer, event))
-		trace_buffer_unlock_commit_nostack(buffer, event);
+	trace_buffer_unlock_commit_nostack(buffer, event);
 }
 
 /* Macros to encapsulate the time capturing infrastructure */
+2 -6
kernel/trace/trace_mmiotrace.c
··· 294 294 struct trace_array_cpu *data, 295 295 struct mmiotrace_rw *rw) 296 296 { 297 - struct trace_event_call *call = &event_mmiotrace_rw; 298 297 struct trace_buffer *buffer = tr->array_buffer.buffer; 299 298 struct ring_buffer_event *event; 300 299 struct trace_mmiotrace_rw *entry; ··· 309 310 entry = ring_buffer_event_data(event); 310 311 entry->rw = *rw; 311 312 312 - if (!call_filter_check_discard(call, entry, buffer, event)) 313 - trace_buffer_unlock_commit(tr, buffer, event, trace_ctx); 313 + trace_buffer_unlock_commit(tr, buffer, event, trace_ctx); 314 314 } 315 315 316 316 void mmio_trace_rw(struct mmiotrace_rw *rw) ··· 323 325 struct trace_array_cpu *data, 324 326 struct mmiotrace_map *map) 325 327 { 326 - struct trace_event_call *call = &event_mmiotrace_map; 327 328 struct trace_buffer *buffer = tr->array_buffer.buffer; 328 329 struct ring_buffer_event *event; 329 330 struct trace_mmiotrace_map *entry; ··· 338 341 entry = ring_buffer_event_data(event); 339 342 entry->map = *map; 340 343 341 - if (!call_filter_check_discard(call, entry, buffer, event)) 342 - trace_buffer_unlock_commit(tr, buffer, event, trace_ctx); 344 + trace_buffer_unlock_commit(tr, buffer, event, trace_ctx); 343 345 } 344 346 345 347 void mmio_trace_mapping(struct mmiotrace_map *map)
+3 -9
kernel/trace/trace_osnoise.c
··· 499 499 static void 500 500 __trace_osnoise_sample(struct osnoise_sample *sample, struct trace_buffer *buffer) 501 501 { 502 - struct trace_event_call *call = &event_osnoise; 503 502 struct ring_buffer_event *event; 504 503 struct osnoise_entry *entry; 505 504 ··· 516 517 entry->softirq_count = sample->softirq_count; 517 518 entry->thread_count = sample->thread_count; 518 519 519 - if (!call_filter_check_discard(call, entry, buffer, event)) 520 - trace_buffer_unlock_commit_nostack(buffer, event); 520 + trace_buffer_unlock_commit_nostack(buffer, event); 521 521 } 522 522 523 523 /* ··· 576 578 static void 577 579 __trace_timerlat_sample(struct timerlat_sample *sample, struct trace_buffer *buffer) 578 580 { 579 - struct trace_event_call *call = &event_osnoise; 580 581 struct ring_buffer_event *event; 581 582 struct timerlat_entry *entry; 582 583 ··· 588 591 entry->context = sample->context; 589 592 entry->timer_latency = sample->timer_latency; 590 593 591 - if (!call_filter_check_discard(call, entry, buffer, event)) 592 - trace_buffer_unlock_commit_nostack(buffer, event); 594 + trace_buffer_unlock_commit_nostack(buffer, event); 593 595 } 594 596 595 597 /* ··· 650 654 static void 651 655 __timerlat_dump_stack(struct trace_buffer *buffer, struct trace_stack *fstack, unsigned int size) 652 656 { 653 - struct trace_event_call *call = &event_osnoise; 654 657 struct ring_buffer_event *event; 655 658 struct stack_entry *entry; 656 659 ··· 663 668 memcpy(&entry->caller, fstack->calls, size); 664 669 entry->size = fstack->nr_entries; 665 670 666 - if (!call_filter_check_discard(call, entry, buffer, event)) 667 - trace_buffer_unlock_commit_nostack(buffer, event); 671 + trace_buffer_unlock_commit_nostack(buffer, event); 668 672 } 669 673 670 674 /*
+4 -1
kernel/trace/trace_output.c
@@ -460,7 +460,6 @@
 		(entry->flags & TRACE_FLAG_IRQS_OFF && bh_off) ? 'D' :
 		(entry->flags & TRACE_FLAG_IRQS_OFF) ? 'd' :
 		bh_off ? 'b' :
-		(entry->flags & TRACE_FLAG_IRQS_NOSUPPORT) ? 'X' :
 		'.';
 
 	switch (entry->flags & (TRACE_FLAG_NEED_RESCHED |
@@ -1246,6 +1245,10 @@
 			break;
 
 		trace_seq_puts(s, " => ");
+		if ((*p) == FTRACE_TRAMPOLINE_MARKER) {
+			trace_seq_puts(s, "[FTRACE TRAMPOLINE]\n");
+			continue;
+		}
 		seq_print_ip_sym(s, (*p) + delta, flags);
 		trace_seq_putc(s, '\n');
 	}
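On the output side, the marker has to be tested before the relocation delta is applied, otherwise the sentinel would be printed as a bogus address. A sketch of that ordering, with snprintf() standing in for the trace_seq helpers; `MODEL_TRAMPOLINE_MARKER` and `model_format_entry()` are hypothetical names and the hex format replaces the symbol lookup done by seq_print_ip_sym().

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Same value as FTRACE_TRAMPOLINE_MARKER in the diff; the name is local to this sketch. */
#define MODEL_TRAMPOLINE_MARKER ((unsigned long)INT_MAX)

/*
 * Format one stack entry into buf. The marker is checked first, so the
 * delta is only ever added to real addresses.
 * Returns the snprintf() result (length of the formatted line).
 */
static int model_format_entry(char *buf, size_t size,
			      unsigned long addr, long delta)
{
	assert(buf != NULL && size > 0);

	if (addr == MODEL_TRAMPOLINE_MARKER)
		return snprintf(buf, size, " => [FTRACE TRAMPOLINE]\n");

	return snprintf(buf, size, " => 0x%lx\n", addr + (unsigned long)delta);
}
```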
+6 -20
kernel/trace/trace_preemptirq.c
··· 15 15 #define CREATE_TRACE_POINTS 16 16 #include <trace/events/preemptirq.h> 17 17 18 - /* 19 - * Use regular trace points on architectures that implement noinstr 20 - * tooling: these calls will only happen with RCU enabled, which can 21 - * use a regular tracepoint. 22 - * 23 - * On older architectures, use the rcuidle tracing methods (which 24 - * aren't NMI-safe - so exclude NMI contexts): 25 - */ 26 - #ifdef CONFIG_ARCH_WANTS_NO_INSTR 27 - #define trace(point) trace_##point 28 - #else 29 - #define trace(point) if (!in_nmi()) trace_##point##_rcuidle 30 - #endif 31 - 32 18 #ifdef CONFIG_TRACE_IRQFLAGS 33 19 /* Per-cpu variable to prevent redundant calls when IRQs already off */ 34 20 static DEFINE_PER_CPU(int, tracing_irq_cpu); ··· 28 42 void trace_hardirqs_on_prepare(void) 29 43 { 30 44 if (this_cpu_read(tracing_irq_cpu)) { 31 - trace(irq_enable)(CALLER_ADDR0, CALLER_ADDR1); 45 + trace_irq_enable(CALLER_ADDR0, CALLER_ADDR1); 32 46 tracer_hardirqs_on(CALLER_ADDR0, CALLER_ADDR1); 33 47 this_cpu_write(tracing_irq_cpu, 0); 34 48 } ··· 39 53 void trace_hardirqs_on(void) 40 54 { 41 55 if (this_cpu_read(tracing_irq_cpu)) { 42 - trace(irq_enable)(CALLER_ADDR0, CALLER_ADDR1); 56 + trace_irq_enable(CALLER_ADDR0, CALLER_ADDR1); 43 57 tracer_hardirqs_on(CALLER_ADDR0, CALLER_ADDR1); 44 58 this_cpu_write(tracing_irq_cpu, 0); 45 59 } ··· 61 75 if (!this_cpu_read(tracing_irq_cpu)) { 62 76 this_cpu_write(tracing_irq_cpu, 1); 63 77 tracer_hardirqs_off(CALLER_ADDR0, CALLER_ADDR1); 64 - trace(irq_disable)(CALLER_ADDR0, CALLER_ADDR1); 78 + trace_irq_disable(CALLER_ADDR0, CALLER_ADDR1); 65 79 } 66 80 67 81 } ··· 75 89 if (!this_cpu_read(tracing_irq_cpu)) { 76 90 this_cpu_write(tracing_irq_cpu, 1); 77 91 tracer_hardirqs_off(CALLER_ADDR0, CALLER_ADDR1); 78 - trace(irq_disable)(CALLER_ADDR0, CALLER_ADDR1); 92 + trace_irq_disable(CALLER_ADDR0, CALLER_ADDR1); 79 93 } 80 94 } 81 95 EXPORT_SYMBOL(trace_hardirqs_off); ··· 86 100 87 101 void trace_preempt_on(unsigned long a0, unsigned 
long a1) 88 102 { 89 - trace(preempt_enable)(a0, a1); 103 + trace_preempt_enable(a0, a1); 90 104 tracer_preempt_on(a0, a1); 91 105 } 92 106 93 107 void trace_preempt_off(unsigned long a0, unsigned long a1) 94 108 { 95 - trace(preempt_disable)(a0, a1); 109 + trace_preempt_disable(a0, a1); 96 110 tracer_preempt_off(a0, a1); 97 111 } 98 112 #endif
+1 -1
kernel/trace/trace_sched_switch.c
@@ -187,7 +187,7 @@
 
 static inline void set_cmdline(int idx, const char *cmdline)
 {
-	strncpy(get_saved_cmdlines(idx), cmdline, TASK_COMM_LEN);
+	strscpy(get_saved_cmdlines(idx), cmdline, TASK_COMM_LEN);
 }
 
 static void free_saved_cmdlines_buffer(struct saved_cmdlines_buffer *s)
+2 -6
kernel/trace/trace_sched_wakeup.c
··· 378 378 struct task_struct *next, 379 379 unsigned int trace_ctx) 380 380 { 381 - struct trace_event_call *call = &event_context_switch; 382 381 struct trace_buffer *buffer = tr->array_buffer.buffer; 383 382 struct ring_buffer_event *event; 384 383 struct ctx_switch_entry *entry; ··· 395 396 entry->next_state = task_state_index(next); 396 397 entry->next_cpu = task_cpu(next); 397 398 398 - if (!call_filter_check_discard(call, entry, buffer, event)) 399 - trace_buffer_unlock_commit(tr, buffer, event, trace_ctx); 399 + trace_buffer_unlock_commit(tr, buffer, event, trace_ctx); 400 400 } 401 401 402 402 static void ··· 404 406 struct task_struct *curr, 405 407 unsigned int trace_ctx) 406 408 { 407 - struct trace_event_call *call = &event_wakeup; 408 409 struct ring_buffer_event *event; 409 410 struct ctx_switch_entry *entry; 410 411 struct trace_buffer *buffer = tr->array_buffer.buffer; ··· 421 424 entry->next_state = task_state_index(wakee); 422 425 entry->next_cpu = task_cpu(wakee); 423 426 424 - if (!call_filter_check_discard(call, entry, buffer, event)) 425 - trace_buffer_unlock_commit(tr, buffer, event, trace_ctx); 427 + trace_buffer_unlock_commit(tr, buffer, event, trace_ctx); 426 428 } 427 429 428 430 static void notrace
+28
kernel/trace/trace_syscalls.c
··· 299 299 int syscall_nr; 300 300 int size; 301 301 302 + /* 303 + * Syscall probe called with preemption enabled, but the ring 304 + * buffer and per-cpu data require preemption to be disabled. 305 + */ 306 + might_fault(); 307 + guard(preempt_notrace)(); 308 + 302 309 syscall_nr = trace_get_syscall_nr(current, regs); 303 310 if (syscall_nr < 0 || syscall_nr >= NR_syscalls) 304 311 return; ··· 344 337 struct syscall_metadata *sys_data; 345 338 struct trace_event_buffer fbuffer; 346 339 int syscall_nr; 340 + 341 + /* 342 + * Syscall probe called with preemption enabled, but the ring 343 + * buffer and per-cpu data require preemption to be disabled. 344 + */ 345 + might_fault(); 346 + guard(preempt_notrace)(); 347 347 348 348 syscall_nr = trace_get_syscall_nr(current, regs); 349 349 if (syscall_nr < 0 || syscall_nr >= NR_syscalls) ··· 598 584 int rctx; 599 585 int size; 600 586 587 + /* 588 + * Syscall probe called with preemption enabled, but the ring 589 + * buffer and per-cpu data require preemption to be disabled. 590 + */ 591 + might_fault(); 592 + guard(preempt_notrace)(); 593 + 601 594 syscall_nr = trace_get_syscall_nr(current, regs); 602 595 if (syscall_nr < 0 || syscall_nr >= NR_syscalls) 603 596 return; ··· 706 685 int syscall_nr; 707 686 int rctx; 708 687 int size; 688 + 689 + /* 690 + * Syscall probe called with preemption enabled, but the ring 691 + * buffer and per-cpu data require preemption to be disabled. 692 + */ 693 + might_fault(); 694 + guard(preempt_notrace)(); 709 695 710 696 syscall_nr = trace_get_syscall_nr(current, regs); 711 697 if (syscall_nr < 0 || syscall_nr >= NR_syscalls)
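The guard(preempt_notrace)() lines above use scope-based cleanup: preemption is disabled at the declaration and re-enabled automatically on every exit from the function, including the early returns that follow. The sketch below imitates that with the GCC/Clang `cleanup` variable attribute. `model_preempt_count`, `MODEL_GUARD_PREEMPT()` and `model_syscall_probe()` are hypothetical, and a plain counter stands in for the real preemption state.

```c
#include <assert.h>

/* Hypothetical stand-in for the per-task preemption counter. */
static int model_preempt_count;

/* Runs automatically when the guarded variable goes out of scope. */
static inline void model_guard_exit(int *token)
{
	(void)token;
	model_preempt_count--;
}

/* "Disable preemption" now, "enable" it again at scope exit. */
#define MODEL_GUARD_PREEMPT() \
	int model_guard_token \
	__attribute__((cleanup(model_guard_exit), unused)) = \
		(model_preempt_count++, 0)

/* Shaped like the syscall probes in the diff: guard first, then early returns. */
static int model_syscall_probe(int syscall_nr, int nr_syscalls)
{
	MODEL_GUARD_PREEMPT();

	assert(model_preempt_count > 0);

	if (syscall_nr < 0 || syscall_nr >= nr_syscalls)
		return -1; /* early return: the cleanup still runs */

	return syscall_nr;
}
```

Because the release is tied to scope, none of the existing `return` statements in the probes had to be touched, which keeps the patch to one added block per function.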
+14 -61
kernel/tracepoint.c
···
 extern tracepoint_ptr_t __start___tracepoints_ptrs[];
 extern tracepoint_ptr_t __stop___tracepoints_ptrs[];

-DEFINE_SRCU(tracepoint_srcu);
-EXPORT_SYMBOL_GPL(tracepoint_srcu);
-
 enum tp_transition_sync {
 	TP_TRANSITION_SYNC_1_0_1,
 	TP_TRANSITION_SYNC_N_2_1,
···

 struct tp_transition_snapshot {
 	unsigned long rcu;
-	unsigned long srcu;
 	bool ongoing;
 };

···

 	/* Keep the latest get_state snapshot. */
 	snapshot->rcu = get_state_synchronize_rcu();
-	snapshot->srcu = start_poll_synchronize_srcu(&tracepoint_srcu);
 	snapshot->ongoing = true;
 }

···
 	if (!snapshot->ongoing)
 		return;
 	cond_synchronize_rcu(snapshot->rcu);
-	if (!poll_state_synchronize_srcu(&tracepoint_srcu, snapshot->srcu))
-		synchronize_srcu(&tracepoint_srcu);
 	snapshot->ongoing = false;
 }

···
  * tracepoints_mutex nests inside tracepoint_module_list_mutex.
  */
 static DEFINE_MUTEX(tracepoints_mutex);
-
-static struct rcu_head *early_probes;
-static bool ok_to_free_tracepoints;

 /*
  * Note about RCU :
···
 	return p == NULL ? NULL : p->probes;
 }

-static void srcu_free_old_probes(struct rcu_head *head)
+static void rcu_free_old_probes(struct rcu_head *head)
 {
 	kfree(container_of(head, struct tp_probes, rcu));
 }

-static void rcu_free_old_probes(struct rcu_head *head)
-{
-	call_srcu(&tracepoint_srcu, head, srcu_free_old_probes);
-}
-
-static __init int release_early_probes(void)
-{
-	struct rcu_head *tmp;
-
-	ok_to_free_tracepoints = true;
-
-	while (early_probes) {
-		tmp = early_probes;
-		early_probes = tmp->next;
-		call_rcu(tmp, rcu_free_old_probes);
-	}
-
-	return 0;
-}
-
-/* SRCU is initialized at core_initcall */
-postcore_initcall(release_early_probes);
-
-static inline void release_probes(struct tracepoint_func *old)
+static inline void release_probes(struct tracepoint *tp, struct tracepoint_func *old)
 {
 	if (old) {
 		struct tp_probes *tp_probes = container_of(old,
 			struct tp_probes, probes[0]);

-		/*
-		 * We can't free probes if SRCU is not initialized yet.
-		 * Postpone the freeing till after SRCU is initialized.
-		 */
-		if (unlikely(!ok_to_free_tracepoints)) {
-			tp_probes->rcu.next = early_probes;
-			early_probes = &tp_probes->rcu;
-			return;
-		}
-
-		/*
-		 * Tracepoint probes are protected by both sched RCU and SRCU,
-		 * by calling the SRCU callback in the sched RCU callback we
-		 * cover both cases. So let us chain the SRCU and sched RCU
-		 * callbacks to wait for both grace periods.
-		 */
-		call_rcu(&tp_probes->rcu, rcu_free_old_probes);
+		if (tracepoint_is_faultable(tp))
+			call_rcu_tasks_trace(&tp_probes->rcu, rcu_free_old_probes);
+		else
+			call_rcu(&tp_probes->rcu, rcu_free_old_probes);
 	}
 }

···
 	struct tracepoint_func *old, *tp_funcs;
 	int ret;

-	if (tp->regfunc && !static_key_enabled(&tp->key)) {
-		ret = tp->regfunc();
+	if (tp->ext && tp->ext->regfunc && !static_key_enabled(&tp->key)) {
+		ret = tp->ext->regfunc();
 		if (ret < 0)
 			return ret;
 	}
···
 		tracepoint_update_call(tp, tp_funcs);
 		/* Both iterator and static call handle NULL tp->funcs */
 		rcu_assign_pointer(tp->funcs, tp_funcs);
-		static_key_enable(&tp->key);
+		static_branch_enable(&tp->key);
 		break;
 	case TP_FUNC_2:		/* 1->2 */
 		/* Set iterator static call */
···
 		break;
 	}

-	release_probes(old);
+	release_probes(tp, old);
 	return 0;
 }

···
 	switch (nr_func_state(tp_funcs)) {
 	case TP_FUNC_0:		/* 1->0 */
 		/* Removed last function */
-		if (tp->unregfunc && static_key_enabled(&tp->key))
-			tp->unregfunc();
-
-		static_key_disable(&tp->key);
+		if (tp->ext && tp->ext->unregfunc && static_key_enabled(&tp->key))
+			tp->ext->unregfunc();
+		static_branch_disable(&tp->key);
 		/* Set iterator static call */
 		tracepoint_update_call(tp, tp_funcs);
 		/* Both iterator and static call handle NULL tp->funcs */
···
 		WARN_ON_ONCE(1);
 		break;
 	}
-	release_probes(old);
+	release_probes(tp, old);
 	return 0;
 }

-2
scripts/tags.sh
···
 	'/^BPF_CALL_[0-9]([[:space:]]*\([[:alnum:]_]*\).*/\1/'
 	'/^COMPAT_SYSCALL_DEFINE[0-9]([[:space:]]*\([[:alnum:]_]*\).*/compat_sys_\1/'
 	'/^TRACE_EVENT([[:space:]]*\([[:alnum:]_]*\).*/trace_\1/'
-	'/^TRACE_EVENT([[:space:]]*\([[:alnum:]_]*\).*/trace_\1_rcuidle/'
 	'/^DEFINE_EVENT([^,)]*,[[:space:]]*\([[:alnum:]_]*\).*/trace_\1/'
-	'/^DEFINE_EVENT([^,)]*,[[:space:]]*\([[:alnum:]_]*\).*/trace_\1_rcuidle/'
 	'/^DEFINE_INSN_CACHE_OPS([[:space:]]*\([[:alnum:]_]*\).*/get_\1_slot/'
 	'/^DEFINE_INSN_CACHE_OPS([[:space:]]*\([[:alnum:]_]*\).*/free_\1_slot/'
 	'/^PAGEFLAG([[:space:]]*\([[:alnum:]_]*\).*/Page\1/'