Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'trace-ring-buffer-v6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull trace ring buffer fixes from Steven Rostedt:

- Enable resize on mmap() error

When a process mmaps a ring buffer, its size is locked and resizing
is disabled. But if the user passes in a wrong parameter, the mmap()
can fail after the resize was disabled and the mmap() exits with
error without reenabling the ring buffer resize. This prevents the
ring buffer from ever being resized after that. Reenable resizing of
the ring buffer on mmap() error.
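The unwinding described above can be sketched in user-space C. This is a minimal illustration only: `demo_buffer`, `demo_map()` and `demo_resize()` are hypothetical names, not the kernel's, and the `bad_arg` flag merely simulates a wrong user parameter.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-CPU buffer state; not the kernel struct. */
struct demo_buffer {
	int resize_disabled;	/* > 0 means resizing is blocked */
	bool mapped;
};

/* Hypothetical map routine: bad_arg simulates a wrong user parameter. */
int demo_map(struct demo_buffer *b, bool bad_arg)
{
	assert(b);
	b->resize_disabled++;	/* lock the size before mapping */

	if (bad_arg) {
		/* Error path: undo the lock, or resizing stays blocked forever. */
		b->resize_disabled--;
		return -EINVAL;
	}

	b->mapped = true;
	return 0;
}

/* Resizing is refused while any mapping holds the size lock. */
int demo_resize(const struct demo_buffer *b)
{
	return b->resize_disabled ? -EBUSY : 0;
}
```

After a failed `demo_map()` the counter is back at zero, so a later `demo_resize()` succeeds; without the decrement in the error path it would fail from then on.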

- Have resizing return proper error and not always -ENOMEM

If the ring buffer is mmapped by one task and another task tries to
resize the buffer it will error with -ENOMEM. This is confusing to
the user as there may be plenty of memory available. Have it return
the error that actually happens (in this case -EBUSY) so that the user
can understand why the resize failed.
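A small user-space sketch of the difference, assuming a hypothetical inner helper that fails with -EBUSY while the buffer is mapped (all `demo_*` names are illustrative, not kernel functions):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical inner resize: fails with -EBUSY while the buffer is mapped. */
int demo_inner_resize(bool mapped)
{
	return mapped ? -EBUSY : 0;
}

/* Old behavior: every failure is reported as -ENOMEM. */
int demo_resize_old(bool mapped)
{
	int ret = demo_inner_resize(mapped);

	if (ret < 0)
		ret = -ENOMEM;
	return ret;
}

/* New behavior: hand the real error back to the caller. */
int demo_resize_new(bool mapped)
{
	return demo_inner_resize(mapped);
}

/* Quick self-check of the difference between the two wrappers. */
void demo_check(void)
{
	assert(demo_resize_old(true) == -ENOMEM);	/* misleading */
	assert(demo_resize_new(true) == -EBUSY);	/* real cause */
}
```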

- Test the sub-buffer array to validate persistent memory buffer

On boot up, the initialization of the persistent memory buffer will
do a validation check to see if the content of the data is valid, and
if so, it will use the memory as is, otherwise it re-initializes it.
There's meta data in this persistent memory that keeps track of which
sub-buffer is the reader page and an array that states the order of
the sub-buffers. The values in this array are indexes into the
sub-buffers. The validator checks to make sure that all the entries
in the array are within the range of valid sub-buffer indexes, but it
does not check for duplicates.

While working on this code, the array got corrupted and had
duplicates, where not all the sub-buffers were accounted for. This
passed the validator as all entries were valid, but the linked list
was incorrect and could have caused a crash. The corruption only
produced incorrect data, but it could have been more severe. To fix
this, create a bitmask that covers all the sub-buffer indexes and set
it to all zeros. While iterating the array and checking its values,
set the bit corresponding to the sub-buffer index stored in each
entry. If the bit was already set, then the entry is a duplicate:
mark the buffer as invalid and reset it.
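The duplicate check can be sketched in portable C as follows. This is an illustration under assumptions, not the kernel code: it uses a fixed-size local mask and plain bit operations instead of the kernel bitmap helpers, and `demo_order_valid()` and `DEMO_MAX_SUBBUFS` are hypothetical names.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_SUBBUFS	256	/* hypothetical upper bound for this sketch */
#define DEMO_BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)
#define DEMO_MASK_LONGS \
	((DEMO_MAX_SUBBUFS + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG)

/*
 * Returns true only if every entry of order[] is in [0, nr) and no
 * index appears twice.
 */
bool demo_order_valid(const int *order, int nr)
{
	unsigned long mask[DEMO_MASK_LONGS];
	int i;

	assert(order && nr >= 0 && nr <= DEMO_MAX_SUBBUFS);
	memset(mask, 0, sizeof(mask));	/* start with all bits clear */

	for (i = 0; i < nr; i++) {
		unsigned long bit;
		size_t word;

		if (order[i] < 0 || order[i] >= nr)
			return false;	/* out of range */

		word = (size_t)order[i] / DEMO_BITS_PER_LONG;
		bit = 1UL << ((size_t)order[i] % DEMO_BITS_PER_LONG);

		if (mask[word] & bit)
			return false;	/* index already seen: duplicate */
		mask[word] |= bit;
	}
	return true;
}
```

With four sub-buffers, the order `{2, 0, 1, 3}` is accepted, while `{2, 0, 2, 3}` passes a pure range check but is rejected here because index 2 appears twice.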

- Prevent mmap()ing persistent ring buffer

The persistent ring buffer uses vmap() to map the persistent memory.
Currently, the mmap() logic only uses virt_to_page() to get the page
from the ring buffer memory and uses that to map to user space. This
works because a normal ring buffer uses alloc_page() to allocate its
memory. But because the persistent ring buffer uses vmap(), it causes
a kernel crash.

Fixing this to work with vmap() is not hard, but since mmap() on
persistent memory buffers never worked, just have the mmap() return
-ENODEV (what was returned before mmap() was supported) for
persistent memory ring buffers, as they never supported mmap().
Normal buffers will still allow mmap(). Implementing mmap() for
persistent memory ring buffers can wait till the next merge window.
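The guard amounts to an early flag test before any mapping work starts. A minimal user-space sketch, where `DEMO_FL_BOOT`, `demo_trace_array` and `demo_mmap()` are hypothetical stand-ins rather than the kernel's definitions:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical flag: the buffer lives in persistent (boot mapped) memory. */
#define DEMO_FL_BOOT	(1u << 0)

/* Hypothetical stand-in for the trace instance; not the kernel struct. */
struct demo_trace_array {
	unsigned int flags;
};

/* Reject persistent buffers up front; normal buffers proceed to map. */
int demo_mmap(const struct demo_trace_array *tr)
{
	assert(tr);

	if (tr->flags & DEMO_FL_BOOT)
		return -ENODEV;	/* not supported for this buffer type */

	/* ... the actual mapping of a normal buffer would happen here ... */
	return 0;
}
```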

- Fix polling on persistent ring buffers

There's a "buffer_percent" option (default set to 50), that is used
to have reads of the ring buffer binary data block until the buffer
fills to that percentage. The field "pages_touched" is incremented
every time a new sub-buffer has content added to it. This field is
used in the calculations to determine the amount of content in the
buffer, and if that amount exceeds the "buffer_percent" then the task
polling on the buffer is woken.

When a persistent ring buffer is created from the content of a
previous boot, the "pages_touched" field was not updated. This means
that if a task were to poll on the persistent buffer, it would block
even if the buffer was completely full. It would block even if the
"buffer_percent" was zero, because with "pages_touched" as zero, it
would be calculated as the buffer having no content. Update
pages_touched when initializing the persistent ring buffer from a
previous boot.
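The effect can be illustrated with a simplified user-space model. The wake test below is an assumption made for illustration and is not the kernel's exact calculation; `demo_cpu_buffer`, `demo_should_wake()` and `demo_init_from_boot()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified per-CPU buffer state; not the kernel struct. */
struct demo_cpu_buffer {
	unsigned long nr_pages;		/* total sub-buffers */
	unsigned long pages_touched;	/* sub-buffers that received content */
	unsigned long pages_read;	/* sub-buffers already consumed */
};

/* Simplified wake test: is at least `percent` of the buffer filled? */
bool demo_should_wake(const struct demo_cpu_buffer *b, unsigned int percent)
{
	unsigned long dirty;

	assert(b && b->nr_pages > 0 && percent <= 100);
	assert(b->pages_touched >= b->pages_read);

	dirty = b->pages_touched - b->pages_read;
	if (dirty == 0)
		return false;	/* counted as empty, even when percent is 0 */

	return dirty * 100 >= (unsigned long)percent * b->nr_pages;
}

/* Count every recovered sub-buffer that holds at least one entry. */
void demo_init_from_boot(struct demo_cpu_buffer *b,
			 const unsigned long *entries, size_t nr)
{
	size_t i;

	assert(b && entries);
	for (i = 0; i < nr; i++) {
		if (entries[i])
			b->pages_touched++;
	}
}
```

In this model a buffer restored from a previous boot with `pages_touched` left at zero never wakes a poller, whatever the percentage. Once the restore step counts the non-empty sub-buffers, the same test behaves as it does for a normal buffer.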

* tag 'trace-ring-buffer-v6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
ring-buffer: Update pages_touched to reflect persistent buffer content
tracing: Do not allow mmap() of persistent ring buffer
ring-buffer: Validate the persistent meta data subbuf array
tracing: Have the error of __tracing_resize_ring_buffer() passed to user
ring-buffer: Unlock resize on mmap error

+31 -9
+26 -2
kernel/trace/ring_buffer.c
···
  * must be the same.
  */
 static bool rb_meta_valid(struct ring_buffer_meta *meta, int cpu,
-			  struct trace_buffer *buffer, int nr_pages)
+			  struct trace_buffer *buffer, int nr_pages,
+			  unsigned long *subbuf_mask)
 {
 	int subbuf_size = PAGE_SIZE;
 	struct buffer_data_page *subbuf;
 	unsigned long buffers_start;
 	unsigned long buffers_end;
 	int i;
+
+	if (!subbuf_mask)
+		return false;

 	/* Check the meta magic and meta struct size */
 	if (meta->magic != RING_BUFFER_META_MAGIC ||
···

 	subbuf = rb_subbufs_from_meta(meta);

+	bitmap_clear(subbuf_mask, 0, meta->nr_subbufs);
+
 	/* Is the meta buffers and the subbufs themselves have correct data? */
 	for (i = 0; i < meta->nr_subbufs; i++) {
 		if (meta->buffers[i] < 0 ||
···
 			return false;
 		}

+		if (test_bit(meta->buffers[i], subbuf_mask)) {
+			pr_info("Ring buffer boot meta [%d] array has duplicates\n", cpu);
+			return false;
+		}
+
+		set_bit(meta->buffers[i], subbuf_mask);
 		subbuf = (void *)subbuf + subbuf_size;
 	}

···
 				cpu_buffer->cpu);
 			goto invalid;
 		}
+
+		/* If the buffer has content, update pages_touched */
+		if (ret)
+			local_inc(&cpu_buffer->pages_touched);
+
 		entries += ret;
 		entry_bytes += local_read(&head_page->page->commit);
 		local_set(&cpu_buffer->head_page->entries, ret);
···
 static void rb_range_meta_init(struct trace_buffer *buffer, int nr_pages)
 {
 	struct ring_buffer_meta *meta;
+	unsigned long *subbuf_mask;
 	unsigned long delta;
 	void *subbuf;
 	int cpu;
 	int i;
+
+	/* Create a mask to test the subbuf array */
+	subbuf_mask = bitmap_alloc(nr_pages + 1, GFP_KERNEL);
+	/* If subbuf_mask fails to allocate, then rb_meta_valid() will return false */

 	for (cpu = 0; cpu < nr_cpu_ids; cpu++) {
 		void *next_meta;

 		meta = rb_range_meta(buffer, nr_pages, cpu);

-		if (rb_meta_valid(meta, cpu, buffer, nr_pages)) {
+		if (rb_meta_valid(meta, cpu, buffer, nr_pages, subbuf_mask)) {
 			/* Make the mappings match the current address */
 			subbuf = rb_subbufs_from_meta(meta);
 			delta = (unsigned long)subbuf - meta->first_buffer;
···
 			subbuf += meta->subbuf_size;
 		}
 	}
+	bitmap_free(subbuf_mask);
 }

 static void *rbm_start(struct seq_file *m, loff_t *pos)
···
 		kfree(cpu_buffer->subbuf_ids);
 		cpu_buffer->subbuf_ids = NULL;
 		rb_free_meta_page(cpu_buffer);
+		atomic_dec(&cpu_buffer->resize_disabled);
 	}

 unlock:
+5 -7
kernel/trace/trace.c
···
 ssize_t tracing_resize_ring_buffer(struct trace_array *tr,
 				  unsigned long size, int cpu_id)
 {
-	int ret;
-
 	guard(mutex)(&trace_types_lock);

 	if (cpu_id != RING_BUFFER_ALL_CPUS) {
···
 			return -EINVAL;
 	}

-	ret = __tracing_resize_ring_buffer(tr, size, cpu_id);
-	if (ret < 0)
-		ret = -ENOMEM;
-
-	return ret;
+	return __tracing_resize_ring_buffer(tr, size, cpu_id);
 }

 static void update_last_data(struct trace_array *tr)
···
 	struct ftrace_buffer_info *info = filp->private_data;
 	struct trace_iterator *iter = &info->iter;
 	int ret = 0;
+
+	/* Currently the boot mapped buffer is not supported for mmap */
+	if (iter->tr->flags & TRACE_ARRAY_FL_BOOT)
+		return -ENODEV;

 	ret = get_snapshot_map(iter->tr);
 	if (ret)