Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

tracing: Add better comments for the filtering temp buffer use case

When filtering is enabled, the event is copied into a temp buffer instead
of being written directly into the ring buffer. Discarding events from the
ring buffer is very expensive, and when most events fail the filter, doing
the extra copy is much faster than having to discard them.

As that logic is subtle, add comments that explain in more detail what is
going on and how it works.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

+35 -1
kernel/trace/trace.c
@@ -2734,10 +2734,44 @@
 	if (!tr->no_filter_buffering_ref &&
 	    (trace_file->flags & (EVENT_FILE_FL_SOFT_DISABLED | EVENT_FILE_FL_FILTERED)) &&
 	    (entry = this_cpu_read(trace_buffered_event))) {
-		/* Try to use the per cpu buffer first */
+		/*
+		 * Filtering is on, so try to use the per cpu buffer first.
+		 * This buffer will simulate a ring_buffer_event,
+		 * where the type_len is zero and the array[0] will
+		 * hold the full length.
+		 * (see include/linux/ring-buffer.h for details on
+		 * how the ring_buffer_event is structured).
+		 *
+		 * Using a temp buffer during filtering and copying it
+		 * on a matched filter is quicker than writing directly
+		 * into the ring buffer and then discarding it when
+		 * it doesn't match. That is because the discard
+		 * requires several atomic operations to get right.
+		 * Copying on match and doing nothing on a failed match
+		 * is still quicker than no copy on match, but having
+		 * to discard out of the ring buffer on a failed match.
+		 */
 		int max_len = PAGE_SIZE - struct_size(entry, array, 1);
 
 		val = this_cpu_inc_return(trace_buffered_event_cnt);
+
+		/*
+		 * Preemption is disabled, but interrupts and NMIs
+		 * can still come in now. If that happens after
+		 * the above increment, then it will have to go
+		 * back to the old method of allocating the event
+		 * on the ring buffer, and if the filter fails, it
+		 * will have to call ring_buffer_discard_commit()
+		 * to remove it.
+		 *
+		 * Need to also check the unlikely case that the
+		 * length is bigger than the temp buffer size.
+		 * If that happens, then the reserve is pretty much
+		 * guaranteed to fail, as the ring buffer currently
+		 * only allows events less than a page. But that may
+		 * change in the future, so let the ring buffer reserve
+		 * handle the failure in that case.
+		 */
 		if (val == 1 && likely(len <= max_len)) {
 			trace_event_setup(entry, type, trace_ctx);
 			entry->array[0] = len;