Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Merge branch kvm-arm64/hyp-tracing into kvmarm-master/next

* kvm-arm64/hyp-tracing: (40 commits)
: .
: EL2 tracing support, adding both 'remote' ring-buffer
: infrastructure and the tracing itself, courtesy of
: Vincent Donnefort. From the cover letter:
:
: "The growing set of features supported by the hypervisor in protected
: mode necessitates debugging and profiling tools. Tracefs is the
: ideal candidate for this task:
:
: * It is simple to use and to script.
:
: * It is supported by various tools, from the trace-cmd CLI to the
: Android web-based perfetto.
:
: * The ring-buffer, where are stored trace events consists of linked
: pages, making it an ideal structure for sharing between kernel and
: hypervisor.
:
: This series first introduces a new generic way of creating remote events and
: remote buffers. Then it adds support to the pKVM hypervisor."
: .
tracing: selftests: Extend hotplug testing for trace remotes
tracing: Non-consuming read for trace remotes with an offline CPU
tracing: Adjust cmd_check_undefined to show unexpected undefined symbols
tracing: Restore accidentally removed SPDX tag
KVM: arm64: avoid unused-variable warning
tracing: Generate undef symbols allowlist for simple_ring_buffer
KVM: arm64: tracing: add ftrace dependency
tracing: add more symbols to whitelist
tracing: Update undefined symbols allow list for simple_ring_buffer
KVM: arm64: Fix out-of-tree build for nVHE/pKVM tracing
tracing: selftests: Add hypervisor trace remote tests
KVM: arm64: Add selftest event support to nVHE/pKVM hyp
KVM: arm64: Add hyp_enter/hyp_exit events to nVHE/pKVM hyp
KVM: arm64: Add event support to the nVHE/pKVM hyp and trace remote
KVM: arm64: Add trace reset to the nVHE/pKVM hyp
KVM: arm64: Sync boot clock with the nVHE/pKVM hyp
KVM: arm64: Add trace remote for the nVHE/pKVM hyp
KVM: arm64: Add tracing capability for the nVHE/pKVM hyp
KVM: arm64: Support unaligned fixmap in the pKVM hyp
KVM: arm64: Initialise hyp_nr_cpus for nVHE hyp
...

Signed-off-by: Marc Zyngier <maz@kernel.org>

+4865 -122
+11
Documentation/trace/index.rst
@@ -91,6 +91,17 @@
    user_events
    uprobetracer
 
+Remote Tracing
+--------------
+
+This section covers the framework to read compatible ring-buffers, written by
+entities outside of the kernel (most likely firmware or hypervisor)
+
+.. toctree::
+   :maxdepth: 1
+
+   remotes
+
 Additional Resources
 --------------------
 
+66
Documentation/trace/remotes.rst
@@ -0,0 +1,66 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============
+Tracing Remotes
+===============
+
+:Author: Vincent Donnefort <vdonnefort@google.com>
+
+Overview
+========
+Firmware and hypervisors are black boxes to the kernel. Having a way to see what
+they are doing can be useful to debug both. This is where remote tracing buffers
+come in. A remote tracing buffer is a ring buffer executed by the firmware or
+hypervisor into memory that is memory mapped to the host kernel. This is similar
+to how user space memory maps the kernel ring buffer but in this case the kernel
+is acting like user space and the firmware or hypervisor is the "kernel" side.
+With a trace remote ring buffer, the firmware and hypervisor can record events
+for which the host kernel can see and expose to user space.
+
+Register a remote
+=================
+A remote must provide a set of callbacks `struct trace_remote_callbacks` whom
+description can be found below. Those callbacks allows Tracefs to enable and
+disable tracing and events, to load and unload a tracing buffer (a set of
+ring-buffers) and to swap a reader page with the head page, which enables
+consuming reading.
+
+.. kernel-doc:: include/linux/trace_remote.h
+
+Once registered, an instance will appear for this remote in the Tracefs
+directory **remotes/**. Buffers can then be read using the usual Tracefs files
+**trace_pipe** and **trace**.
+
+Declare a remote event
+======================
+Macros are provided to ease the declaration of remote events, in a similar
+fashion to in-kernel events. A declaration must provide an ID, a description of
+the event arguments and how to print the event:
+
+.. code-block:: c
+
+	REMOTE_EVENT(foo, EVENT_FOO_ID,
+		RE_STRUCT(
+			re_field(u64, bar)
+		),
+		RE_PRINTK("bar=%lld", __entry->bar)
+	);
+
+Then those events must be declared in a C file with the following:
+
+.. code-block:: c
+
+	#define REMOTE_EVENT_INCLUDE_FILE foo_events.h
+	#include <trace/define_remote_events.h>
+
+This will provide a `struct remote_event remote_event_foo` that can be given to
+`trace_remote_register`.
+
+Registered events appear in the remote directory under **events/**.
+
+Simple ring-buffer
+==================
+A simple implementation for a ring-buffer writer can be found in
+kernel/trace/simple_ring_buffer.c.
+
+.. kernel-doc:: include/linux/simple_ring_buffer.h
+8
arch/arm64/include/asm/kvm_asm.h
··· 89 89 __KVM_HOST_SMCCC_FUNC___pkvm_vcpu_load, 90 90 __KVM_HOST_SMCCC_FUNC___pkvm_vcpu_put, 91 91 __KVM_HOST_SMCCC_FUNC___pkvm_tlb_flush_vmid, 92 + __KVM_HOST_SMCCC_FUNC___tracing_load, 93 + __KVM_HOST_SMCCC_FUNC___tracing_unload, 94 + __KVM_HOST_SMCCC_FUNC___tracing_enable, 95 + __KVM_HOST_SMCCC_FUNC___tracing_swap_reader, 96 + __KVM_HOST_SMCCC_FUNC___tracing_update_clock, 97 + __KVM_HOST_SMCCC_FUNC___tracing_reset, 98 + __KVM_HOST_SMCCC_FUNC___tracing_enable_event, 99 + __KVM_HOST_SMCCC_FUNC___tracing_write_event, 92 100 }; 93 101 94 102 #define DECLARE_KVM_VHE_SYM(sym) extern char sym[]
+16
arch/arm64/include/asm/kvm_define_hypevents.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#define REMOTE_EVENT_INCLUDE_FILE arch/arm64/include/asm/kvm_hypevents.h
+
+#define REMOTE_EVENT_SECTION "_hyp_events"
+
+#define HE_STRUCT(__args) __args
+#define HE_PRINTK(__args...) __args
+#define he_field re_field
+
+#define HYP_EVENT(__name, __proto, __struct, __assign, __printk) \
+	REMOTE_EVENT(__name, 0, RE_STRUCT(__struct), RE_PRINTK(__printk))
+
+#define HYP_EVENT_MULTI_READ
+#include <trace/define_remote_events.h>
+#undef HYP_EVENT_MULTI_READ
+3
arch/arm64/include/asm/kvm_host.h
··· 923 923 924 924 /* Per-vcpu TLB for VNCR_EL2 -- NULL when !NV */ 925 925 struct vncr_tlb *vncr_tlb; 926 + 927 + /* Hyp-readable copy of kvm_vcpu::pid */ 928 + pid_t pid; 926 929 }; 927 930 928 931 /*
+2 -2
arch/arm64/include/asm/kvm_hyp.h
··· 129 129 #ifdef __KVM_NVHE_HYPERVISOR__ 130 130 void __pkvm_init_switch_pgd(phys_addr_t pgd, unsigned long sp, 131 131 void (*fn)(void)); 132 - int __pkvm_init(phys_addr_t phys, unsigned long size, unsigned long nr_cpus, 133 - unsigned long *per_cpu_base, u32 hyp_va_bits); 132 + int __pkvm_init(phys_addr_t phys, unsigned long size, unsigned long *per_cpu_base, u32 hyp_va_bits); 134 133 void __noreturn __host_enter(struct kvm_cpu_context *host_ctxt); 135 134 #endif 136 135 ··· 146 147 extern unsigned long kvm_nvhe_sym(__icache_flags); 147 148 extern unsigned int kvm_nvhe_sym(kvm_arm_vmid_bits); 148 149 extern unsigned int kvm_nvhe_sym(kvm_host_sve_max_vl); 150 + extern unsigned long kvm_nvhe_sym(hyp_nr_cpus); 149 151 150 152 #endif /* __ARM64_KVM_HYP_H__ */
+60
arch/arm64/include/asm/kvm_hypevents.h
··· 1 + /* SPDX-License-Identifier: GPL-2.0 */ 2 + 3 + #if !defined(__ARM64_KVM_HYPEVENTS_H_) || defined(HYP_EVENT_MULTI_READ) 4 + #define __ARM64_KVM_HYPEVENTS_H_ 5 + 6 + #ifdef __KVM_NVHE_HYPERVISOR__ 7 + #include <nvhe/trace.h> 8 + #endif 9 + 10 + #ifndef __HYP_ENTER_EXIT_REASON 11 + #define __HYP_ENTER_EXIT_REASON 12 + enum hyp_enter_exit_reason { 13 + HYP_REASON_SMC, 14 + HYP_REASON_HVC, 15 + HYP_REASON_PSCI, 16 + HYP_REASON_HOST_ABORT, 17 + HYP_REASON_GUEST_EXIT, 18 + HYP_REASON_ERET_HOST, 19 + HYP_REASON_ERET_GUEST, 20 + HYP_REASON_UNKNOWN /* Must be last */ 21 + }; 22 + #endif 23 + 24 + HYP_EVENT(hyp_enter, 25 + HE_PROTO(struct kvm_cpu_context *host_ctxt, u8 reason), 26 + HE_STRUCT( 27 + he_field(u8, reason) 28 + he_field(pid_t, vcpu) 29 + ), 30 + HE_ASSIGN( 31 + __entry->reason = reason; 32 + __entry->vcpu = __tracing_get_vcpu_pid(host_ctxt); 33 + ), 34 + HE_PRINTK("reason=%s vcpu=%d", __hyp_enter_exit_reason_str(__entry->reason), __entry->vcpu) 35 + ); 36 + 37 + HYP_EVENT(hyp_exit, 38 + HE_PROTO(struct kvm_cpu_context *host_ctxt, u8 reason), 39 + HE_STRUCT( 40 + he_field(u8, reason) 41 + he_field(pid_t, vcpu) 42 + ), 43 + HE_ASSIGN( 44 + __entry->reason = reason; 45 + __entry->vcpu = __tracing_get_vcpu_pid(host_ctxt); 46 + ), 47 + HE_PRINTK("reason=%s vcpu=%d", __hyp_enter_exit_reason_str(__entry->reason), __entry->vcpu) 48 + ); 49 + 50 + HYP_EVENT(selftest, 51 + HE_PROTO(u64 id), 52 + HE_STRUCT( 53 + he_field(u64, id) 54 + ), 55 + HE_ASSIGN( 56 + __entry->id = id; 57 + ), 58 + RE_PRINTK("id=%llu", __entry->id) 59 + ); 60 + #endif
+26
arch/arm64/include/asm/kvm_hyptrace.h
··· 1 + /* SPDX-License-Identifier: GPL-2.0-only */ 2 + #ifndef __ARM64_KVM_HYPTRACE_H_ 3 + #define __ARM64_KVM_HYPTRACE_H_ 4 + 5 + #include <linux/ring_buffer.h> 6 + 7 + struct hyp_trace_desc { 8 + unsigned long bpages_backing_start; 9 + size_t bpages_backing_size; 10 + struct trace_buffer_desc trace_buffer_desc; 11 + 12 + }; 13 + 14 + struct hyp_event_id { 15 + unsigned short id; 16 + atomic_t enabled; 17 + }; 18 + 19 + extern struct remote_event __hyp_events_start[]; 20 + extern struct remote_event __hyp_events_end[]; 21 + 22 + /* hyp_event section used by the hypervisor */ 23 + extern struct hyp_event_id __hyp_event_ids_start[]; 24 + extern struct hyp_event_id __hyp_event_ids_end[]; 25 + 26 + #endif
+4
arch/arm64/kernel/image-vars.h
··· 138 138 KVM_NVHE_ALIAS(__hyp_data_end); 139 139 KVM_NVHE_ALIAS(__hyp_rodata_start); 140 140 KVM_NVHE_ALIAS(__hyp_rodata_end); 141 + #ifdef CONFIG_NVHE_EL2_TRACING 142 + KVM_NVHE_ALIAS(__hyp_event_ids_start); 143 + KVM_NVHE_ALIAS(__hyp_event_ids_end); 144 + #endif 141 145 142 146 /* pKVM static key */ 143 147 KVM_NVHE_ALIAS(kvm_protected_mode_initialized);
+18
arch/arm64/kernel/vmlinux.lds.S
··· 13 13 *(__kvm_ex_table) \ 14 14 __stop___kvm_ex_table = .; 15 15 16 + #ifdef CONFIG_NVHE_EL2_TRACING 17 + #define HYPERVISOR_EVENT_IDS \ 18 + . = ALIGN(PAGE_SIZE); \ 19 + __hyp_event_ids_start = .; \ 20 + *(HYP_SECTION_NAME(.event_ids)) \ 21 + __hyp_event_ids_end = .; 22 + #else 23 + #define HYPERVISOR_EVENT_IDS 24 + #endif 25 + 16 26 #define HYPERVISOR_RODATA_SECTIONS \ 17 27 HYP_SECTION_NAME(.rodata) : { \ 18 28 . = ALIGN(PAGE_SIZE); \ 19 29 __hyp_rodata_start = .; \ 20 30 *(HYP_SECTION_NAME(.data..ro_after_init)) \ 21 31 *(HYP_SECTION_NAME(.rodata)) \ 32 + HYPERVISOR_EVENT_IDS \ 22 33 . = ALIGN(PAGE_SIZE); \ 23 34 __hyp_rodata_end = .; \ 24 35 } ··· 318 307 319 308 HYPERVISOR_DATA_SECTION 320 309 310 + #ifdef CONFIG_NVHE_EL2_TRACING 311 + .data.hyp_events : { 312 + __hyp_events_start = .; 313 + *(SORT(_hyp_events.*)) 314 + __hyp_events_end = .; 315 + } 316 + #endif 321 317 /* 322 318 * Data written with the MMU off but read with the MMU on requires 323 319 * cache lines to be invalidated, discarding up to a Cache Writeback
+45 -23
arch/arm64/kvm/Kconfig
··· 42 42 43 43 If unsure, say N. 44 44 45 - config NVHE_EL2_DEBUG 46 - bool "Debug mode for non-VHE EL2 object" 47 - depends on KVM 48 - help 49 - Say Y here to enable the debug mode for the non-VHE KVM EL2 object. 50 - Failure reports will BUG() in the hypervisor. This is intended for 51 - local EL2 hypervisor development. 52 - 53 - If unsure, say N. 54 - 55 - config PROTECTED_NVHE_STACKTRACE 56 - bool "Protected KVM hypervisor stacktraces" 57 - depends on NVHE_EL2_DEBUG 58 - default n 59 - help 60 - Say Y here to enable pKVM hypervisor stacktraces on hyp_panic() 61 - 62 - If using protected nVHE mode, but cannot afford the associated 63 - memory cost (less than 0.75 page per CPU) of pKVM stacktraces, 64 - say N. 65 - 66 - If unsure, or not using protected nVHE (pKVM), say N. 45 + if KVM 67 46 68 47 config PTDUMP_STAGE2_DEBUGFS 69 48 bool "Present the stage-2 pagetables to debugfs" 70 - depends on KVM 71 49 depends on DEBUG_KERNEL 72 50 depends on DEBUG_FS 73 51 depends on ARCH_HAS_PTDUMP ··· 60 82 61 83 If in doubt, say N. 62 84 85 + config NVHE_EL2_DEBUG 86 + bool "Debug mode for non-VHE EL2 object" 87 + default n 88 + help 89 + Say Y here to enable the debug mode for the non-VHE KVM EL2 object. 90 + Failure reports will BUG() in the hypervisor. This is intended for 91 + local EL2 hypervisor development. 92 + 93 + If unsure, say N. 94 + 95 + if NVHE_EL2_DEBUG 96 + 97 + config NVHE_EL2_TRACING 98 + bool 99 + depends on TRACING && FTRACE 100 + select TRACE_REMOTE 101 + default y 102 + 103 + config PKVM_DISABLE_STAGE2_ON_PANIC 104 + bool "Disable the host stage-2 on panic" 105 + default n 106 + help 107 + Relax the host stage-2 on hypervisor panic to allow the kernel to 108 + unwind and symbolize the hypervisor stacktrace. This however tampers 109 + the system security. This is intended for local EL2 hypervisor 110 + development. 111 + 112 + If unsure, say N. 
113 + 114 + config PKVM_STACKTRACE 115 + bool "Protected KVM hypervisor stacktraces" 116 + depends on PKVM_DISABLE_STAGE2_ON_PANIC 117 + default y 118 + help 119 + Say Y here to enable pKVM hypervisor stacktraces on hyp_panic() 120 + 121 + If using protected nVHE mode, but cannot afford the associated 122 + memory cost (less than 0.75 page per CPU) of pKVM stacktraces, 123 + say N. 124 + 125 + If unsure, or not using protected nVHE (pKVM), say N. 126 + 127 + endif # NVHE_EL2_DEBUG 128 + endif # KVM 63 129 endif # VIRTUALIZATION
+2
arch/arm64/kvm/Makefile
··· 30 30 kvm-$(CONFIG_ARM64_PTR_AUTH) += pauth.o 31 31 kvm-$(CONFIG_PTDUMP_STAGE2_DEBUGFS) += ptdump.o 32 32 33 + kvm-$(CONFIG_NVHE_EL2_TRACING) += hyp_trace.o 34 + 33 35 always-y := hyp_constants.h hyp-constants.s 34 36 35 37 define rule_gen_hyp_constants
+11 -1
arch/arm64/kvm/arm.c
··· 24 24 25 25 #define CREATE_TRACE_POINTS 26 26 #include "trace_arm.h" 27 + #include "hyp_trace.h" 27 28 28 29 #include <linux/uaccess.h> 29 30 #include <asm/ptrace.h> ··· 36 35 #include <asm/kvm_arm.h> 37 36 #include <asm/kvm_asm.h> 38 37 #include <asm/kvm_emulate.h> 38 + #include <asm/kvm_hyp.h> 39 39 #include <asm/kvm_mmu.h> 40 40 #include <asm/kvm_nested.h> 41 41 #include <asm/kvm_pkvm.h> ··· 707 705 708 706 if (!cpumask_test_cpu(cpu, vcpu->kvm->arch.supported_cpus)) 709 707 vcpu_set_on_unsupported_cpu(vcpu); 708 + 709 + vcpu->arch.pid = pid_nr(vcpu->pid); 710 710 } 711 711 712 712 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu) ··· 2418 2414 2419 2415 kvm_register_perf_callbacks(); 2420 2416 2417 + err = kvm_hyp_trace_init(); 2418 + if (err) 2419 + kvm_err("Failed to initialize Hyp tracing\n"); 2420 + 2421 2421 out: 2422 2422 if (err) 2423 2423 hyp_cpu_pm_exit(); ··· 2473 2465 preempt_disable(); 2474 2466 cpu_hyp_init_context(); 2475 2467 ret = kvm_call_hyp_nvhe(__pkvm_init, hyp_mem_base, hyp_mem_size, 2476 - num_possible_cpus(), kern_hyp_va(per_cpu_base), 2468 + kern_hyp_va(per_cpu_base), 2477 2469 hyp_va_bits); 2478 2470 cpu_hyp_init_features(); 2479 2471 ··· 2681 2673 memcpy(page_addr, CHOOSE_NVHE_SYM(__per_cpu_start), nvhe_percpu_size()); 2682 2674 kvm_nvhe_sym(kvm_arm_hyp_percpu_base)[cpu] = (unsigned long)page_addr; 2683 2675 } 2676 + 2677 + kvm_nvhe_sym(hyp_nr_cpus) = num_possible_cpus(); 2684 2678 2685 2679 /* 2686 2680 * Map the Hyp-code called directly from the host
+1 -1
arch/arm64/kvm/handle_exit.c
··· 539 539 540 540 /* All hyp bugs, including warnings, are treated as fatal. */ 541 541 if (!is_protected_kvm_enabled() || 542 - IS_ENABLED(CONFIG_NVHE_EL2_DEBUG)) { 542 + IS_ENABLED(CONFIG_PKVM_DISABLE_STAGE2_ON_PANIC)) { 543 543 struct bug_entry *bug = find_bug(elr_in_kimg); 544 544 545 545 if (bug)
+23
arch/arm64/kvm/hyp/include/nvhe/arm-smccc.h
@@ -0,0 +1,23 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __ARM64_KVM_HYP_NVHE_ARM_SMCCC_H__
+#define __ARM64_KVM_HYP_NVHE_ARM_SMCCC_H__
+
+#include <asm/kvm_hypevents.h>
+
+#include <linux/arm-smccc.h>
+
+#define hyp_smccc_1_1_smc(...) \
+	do { \
+		trace_hyp_exit(NULL, HYP_REASON_SMC); \
+		arm_smccc_1_1_smc(__VA_ARGS__); \
+		trace_hyp_enter(NULL, HYP_REASON_SMC); \
+	} while (0)
+
+#define hyp_smccc_1_2_smc(...) \
+	do { \
+		trace_hyp_exit(NULL, HYP_REASON_SMC); \
+		arm_smccc_1_2_smc(__VA_ARGS__); \
+		trace_hyp_enter(NULL, HYP_REASON_SMC); \
+	} while (0)
+
+#endif /* __ARM64_KVM_HYP_NVHE_ARM_SMCCC_H__ */
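The wrappers in nvhe/arm-smccc.h bracket each SMC with a trace_hyp_exit/trace_hyp_enter pair inside a `do { ... } while (0)` body, so the macro behaves as a single statement. A minimal user-space sketch of that shape follows; every `demo_*` name is hypothetical, and `demo_smc` merely doubles its argument in place of a real SMC:

```c
/* Counters standing in for the hyp_exit/hyp_enter trace events. */
static int demo_exits, demo_enters;

static void demo_trace_exit(void)  { demo_exits++; }
static void demo_trace_enter(void) { demo_enters++; }

/* Hypothetical callee standing in for the SMC: doubles its argument. */
static void demo_smc(long fn, long *res) { *res = fn * 2; }

/* Same shape as hyp_smccc_1_1_smc(): exit event, call, enter event. */
#define demo_traced_smc(...)			\
	do {					\
		demo_trace_exit();		\
		demo_smc(__VA_ARGS__);		\
		demo_trace_enter();		\
	} while (0)

static long demo_call(long fn, int traced)
{
	long res = 0;

	if (traced)
		demo_traced_smc(fn, &res);	/* expands to one statement */
	else
		demo_smc(fn, &res);

	return res;
}
```

Without the `do { ... } while (0)` wrapper, the unbraced `if`/`else` in `demo_call` would not compile, which is the usual reason for writing multi-statement macros this way.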
+16
arch/arm64/kvm/hyp/include/nvhe/clock.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ARM64_KVM_HYP_NVHE_CLOCK_H
+#define __ARM64_KVM_HYP_NVHE_CLOCK_H
+#include <linux/types.h>
+
+#include <asm/kvm_hyp.h>
+
+#ifdef CONFIG_NVHE_EL2_TRACING
+void trace_clock_update(u32 mult, u32 shift, u64 epoch_ns, u64 epoch_cyc);
+u64 trace_clock(void);
+#else
+static inline void
+trace_clock_update(u32 mult, u32 shift, u64 epoch_ns, u64 epoch_cyc) { }
+static inline u64 trace_clock(void) { return 0; }
+#endif
+#endif
+14
arch/arm64/kvm/hyp/include/nvhe/define_events.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#undef HYP_EVENT
+#define HYP_EVENT(__name, __proto, __struct, __assign, __printk) \
+	struct hyp_event_id hyp_event_id_##__name \
+	__section(".hyp.event_ids."#__name) = { \
+		.enabled = ATOMIC_INIT(0), \
+	}
+
+#define HYP_EVENT_MULTI_READ
+#include <asm/kvm_hypevents.h>
+#undef HYP_EVENT_MULTI_READ
+
+#undef HYP_EVENT
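Redefining HYP_EVENT and then re-including kvm_hypevents.h under HYP_EVENT_MULTI_READ is a form of the X-macro pattern: one event list, expanded several times with different macro bodies. A single-file sketch of the idea, assuming a hypothetical `EVENT_LIST` macro in place of the re-included header (none of the `demo_*`/`DEMO_*` names come from the patch):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical event list; stands in for the re-included header. */
#define EVENT_LIST(X)	\
	X(hyp_enter)	\
	X(hyp_exit)	\
	X(selftest)

/* First expansion: one enumerator per event. */
#define AS_ENUM(name) DEMO_EVENT_##name,
enum demo_event { EVENT_LIST(AS_ENUM) DEMO_EVENT_COUNT };
#undef AS_ENUM

/* Second expansion: one state object per event. */
struct demo_event_id {
	unsigned short id;
	int enabled;
};

#define AS_ID(name) [DEMO_EVENT_##name] = { .id = DEMO_EVENT_##name, .enabled = 0 },
static struct demo_event_id demo_ids[] = { EVENT_LIST(AS_ID) };
#undef AS_ID

/* Third expansion: a name table that stays in sync with the list. */
#define AS_NAME(name) [DEMO_EVENT_##name] = #name,
static const char *const demo_names[] = { EVENT_LIST(AS_NAME) };
#undef AS_NAME

/* Return the id of the named event, or -1 if the name is unknown. */
static int demo_lookup(const char *name)
{
	for (size_t i = 0; i < (size_t)DEMO_EVENT_COUNT; i++)
		if (!strcmp(demo_names[i], name))
			return (int)demo_ids[i].id;
	return -1;
}
```

Adding an event to `EVENT_LIST` updates the enum, the state array and the name table together, which is the property the multi-read include guard is there to provide.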
-2
arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
··· 30 30 PKVM_ID_FFA, 31 31 }; 32 32 33 - extern unsigned long hyp_nr_cpus; 34 - 35 33 int __pkvm_prot_finalize(void); 36 34 int __pkvm_host_share_hyp(u64 pfn); 37 35 int __pkvm_host_unshare_hyp(u64 pfn);
+70
arch/arm64/kvm/hyp/include/nvhe/trace.h
@@ -0,0 +1,70 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef __ARM64_KVM_HYP_NVHE_TRACE_H
+#define __ARM64_KVM_HYP_NVHE_TRACE_H
+
+#include <linux/trace_remote_event.h>
+
+#include <asm/kvm_hyptrace.h>
+
+static inline pid_t __tracing_get_vcpu_pid(struct kvm_cpu_context *host_ctxt)
+{
+	struct kvm_vcpu *vcpu;
+
+	if (!host_ctxt)
+		host_ctxt = host_data_ptr(host_ctxt);
+
+	vcpu = host_ctxt->__hyp_running_vcpu;
+
+	return vcpu ? vcpu->arch.pid : 0;
+}
+
+#define HE_PROTO(__args...) __args
+#define HE_ASSIGN(__args...) __args
+#define HE_STRUCT RE_STRUCT
+#define he_field re_field
+
+#ifdef CONFIG_NVHE_EL2_TRACING
+
+#define HYP_EVENT(__name, __proto, __struct, __assign, __printk) \
+	REMOTE_EVENT_FORMAT(__name, __struct); \
+	extern struct hyp_event_id hyp_event_id_##__name; \
+	static __always_inline void trace_##__name(__proto) \
+	{ \
+		struct remote_event_format_##__name *__entry; \
+		size_t length = sizeof(*__entry); \
+		\
+		if (!atomic_read(&hyp_event_id_##__name.enabled)) \
+			return; \
+		__entry = tracing_reserve_entry(length); \
+		if (!__entry) \
+			return; \
+		__entry->hdr.id = hyp_event_id_##__name.id; \
+		__assign \
+		tracing_commit_entry(); \
+	}
+
+void *tracing_reserve_entry(unsigned long length);
+void tracing_commit_entry(void);
+
+int __tracing_load(unsigned long desc_va, size_t desc_size);
+void __tracing_unload(void);
+int __tracing_enable(bool enable);
+int __tracing_swap_reader(unsigned int cpu);
+void __tracing_update_clock(u32 mult, u32 shift, u64 epoch_ns, u64 epoch_cyc);
+int __tracing_reset(unsigned int cpu);
+int __tracing_enable_event(unsigned short id, bool enable);
+#else
+static inline void *tracing_reserve_entry(unsigned long length) { return NULL; }
+static inline void tracing_commit_entry(void) { }
+#define HYP_EVENT(__name, __proto, __struct, __assign, __printk) \
+	static inline void trace_##__name(__proto) {}
+
+static inline int __tracing_load(unsigned long desc_va, size_t desc_size) { return -ENODEV; }
+static inline void __tracing_unload(void) { }
+static inline int __tracing_enable(bool enable) { return -ENODEV; }
+static inline int __tracing_swap_reader(unsigned int cpu) { return -ENODEV; }
+static inline void __tracing_update_clock(u32 mult, u32 shift, u64 epoch_ns, u64 epoch_cyc) { }
+static inline int __tracing_reset(unsigned int cpu) { return -ENODEV; }
+static inline int __tracing_enable_event(unsigned short id, bool enable) { return -ENODEV; }
+#endif
+#endif
+5 -1
arch/arm64/kvm/hyp/nvhe/Makefile
··· 17 17 hostprogs := gen-hyprel 18 18 HOST_EXTRACFLAGS += -I$(objtree)/include 19 19 20 - lib-objs := clear_page.o copy_page.o memcpy.o memset.o 20 + lib-objs := clear_page.o copy_page.o memcpy.o memset.o tishift.o 21 21 lib-objs := $(addprefix ../../../lib/, $(lib-objs)) 22 22 23 23 CFLAGS_switch.nvhe.o += -Wno-override-init ··· 29 29 ../fpsimd.o ../hyp-entry.o ../exception.o ../pgtable.o 30 30 hyp-obj-y += ../../../kernel/smccc-call.o 31 31 hyp-obj-$(CONFIG_LIST_HARDENED) += list_debug.o 32 + hyp-obj-$(CONFIG_NVHE_EL2_TRACING) += clock.o trace.o events.o 32 33 hyp-obj-y += $(lib-objs) 34 + 35 + # Path to simple_ring_buffer.c 36 + CFLAGS_trace.nvhe.o += -I$(srctree)/kernel/trace/ 33 37 34 38 ## 35 39 ## Build rules for compiling nVHE hyp code
+65
arch/arm64/kvm/hyp/nvhe/clock.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* 3 + * Copyright (C) 2025 Google LLC 4 + * Author: Vincent Donnefort <vdonnefort@google.com> 5 + */ 6 + 7 + #include <nvhe/clock.h> 8 + 9 + #include <asm/arch_timer.h> 10 + #include <asm/div64.h> 11 + 12 + static struct clock_data { 13 + struct { 14 + u32 mult; 15 + u32 shift; 16 + u64 epoch_ns; 17 + u64 epoch_cyc; 18 + u64 cyc_overflow64; 19 + } data[2]; 20 + u64 cur; 21 + } trace_clock_data; 22 + 23 + static u64 __clock_mult_uint128(u64 cyc, u32 mult, u32 shift) 24 + { 25 + __uint128_t ns = (__uint128_t)cyc * mult; 26 + 27 + ns >>= shift; 28 + 29 + return (u64)ns; 30 + } 31 + 32 + /* Does not guarantee no reader on the modified bank. */ 33 + void trace_clock_update(u32 mult, u32 shift, u64 epoch_ns, u64 epoch_cyc) 34 + { 35 + struct clock_data *clock = &trace_clock_data; 36 + u64 bank = clock->cur ^ 1; 37 + 38 + clock->data[bank].mult = mult; 39 + clock->data[bank].shift = shift; 40 + clock->data[bank].epoch_ns = epoch_ns; 41 + clock->data[bank].epoch_cyc = epoch_cyc; 42 + clock->data[bank].cyc_overflow64 = ULONG_MAX / mult; 43 + 44 + smp_store_release(&clock->cur, bank); 45 + } 46 + 47 + /* Use untrusted host data */ 48 + u64 trace_clock(void) 49 + { 50 + struct clock_data *clock = &trace_clock_data; 51 + u64 bank = smp_load_acquire(&clock->cur); 52 + u64 cyc, ns; 53 + 54 + cyc = __arch_counter_get_cntvct() - clock->data[bank].epoch_cyc; 55 + 56 + if (likely(cyc < clock->data[bank].cyc_overflow64)) { 57 + ns = cyc * clock->data[bank].mult; 58 + ns >>= clock->data[bank].shift; 59 + } else { 60 + ns = __clock_mult_uint128(cyc, clock->data[bank].mult, 61 + clock->data[bank].shift); 62 + } 63 + 64 + return (u64)ns + clock->data[bank].epoch_ns; 65 + }
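trace_clock() in clock.c converts counter cycles to nanoseconds as `(cyc * mult) >> shift`, taking a plain 64-bit multiply when the product cannot overflow and widening to 128 bits otherwise. A stand-alone sketch of that two-path conversion, assuming a non-zero `mult` and a compiler that provides `unsigned __int128` (GCC and Clang on 64-bit targets); `demo_cyc_to_ns` is a hypothetical name, and unlike the patch it recomputes the overflow threshold on every call instead of caching it:

```c
#include <stdint.h>

/* Hypothetical helper, not the kernel's: ns = (cyc * mult) >> shift. */
static uint64_t demo_cyc_to_ns(uint64_t cyc, uint32_t mult, uint32_t shift)
{
	/* Below this cycle count, cyc * mult still fits in 64 bits. */
	uint64_t overflow64 = UINT64_MAX / mult;

	if (cyc < overflow64)
		return (cyc * mult) >> shift;	/* fast 64-bit path */

	/* Slow path: widen to 128 bits before multiplying and shifting. */
	return (uint64_t)(((unsigned __int128)cyc * mult) >> shift);
}
```

The epoch handling (`epoch_cyc` subtracted before, `epoch_ns` added after) and the two-bank publication with release/acquire ordering are left out of this sketch.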
+25
arch/arm64/kvm/hyp/nvhe/events.c
@@ -0,0 +1,25 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2025 Google LLC
+ * Author: Vincent Donnefort <vdonnefort@google.com>
+ */
+
+#include <nvhe/mm.h>
+#include <nvhe/trace.h>
+
+#include <nvhe/define_events.h>
+
+int __tracing_enable_event(unsigned short id, bool enable)
+{
+	struct hyp_event_id *event_id = &__hyp_event_ids_start[id];
+	atomic_t *enabled;
+
+	if (event_id >= __hyp_event_ids_end)
+		return -EINVAL;
+
+	enabled = hyp_fixmap_map(__hyp_pa(&event_id->enabled));
+	atomic_set(enabled, enable);
+	hyp_fixmap_unmap();
+
+	return 0;
+}
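__tracing_enable_event() treats the region between __hyp_event_ids_start and __hyp_event_ids_end as an array and rejects IDs that fall outside it. The sketch below shows the same bounded lookup in user space, with a static array standing in for the linker section. It is a variation rather than a copy: it compares the index against the table length before forming a pointer, and it writes the flag directly instead of going through a fixmap mapping. All `demo_*` names are hypothetical.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_event_id {
	unsigned short id;
	atomic_int enabled;
};

/* Stand-ins for the linker-provided start/end symbols. */
static struct demo_event_id demo_ids[3];
static struct demo_event_id *const demo_ids_start = demo_ids;
static struct demo_event_id *const demo_ids_end = demo_ids + 3;

static int demo_enable_event(unsigned short id, bool enable)
{
	/* Reject any index past the end of the table. */
	if (id >= (size_t)(demo_ids_end - demo_ids_start))
		return -EINVAL;

	atomic_store(&demo_ids_start[id].enabled, enable);
	return 0;
}

/* Caller must pass an id that demo_enable_event() would accept. */
static bool demo_event_enabled(unsigned short id)
{
	return atomic_load(&demo_ids_start[id].enabled) != 0;
}
```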
+14 -14
arch/arm64/kvm/hyp/nvhe/ffa.c
··· 26 26 * the duration and are therefore serialised. 27 27 */ 28 28 29 - #include <linux/arm-smccc.h> 30 29 #include <linux/arm_ffa.h> 31 30 #include <asm/kvm_pkvm.h> 32 31 32 + #include <nvhe/arm-smccc.h> 33 33 #include <nvhe/ffa.h> 34 34 #include <nvhe/mem_protect.h> 35 35 #include <nvhe/memory.h> ··· 147 147 { 148 148 struct arm_smccc_1_2_regs res; 149 149 150 - arm_smccc_1_2_smc(&(struct arm_smccc_1_2_regs) { 150 + hyp_smccc_1_2_smc(&(struct arm_smccc_1_2_regs) { 151 151 .a0 = FFA_FN64_RXTX_MAP, 152 152 .a1 = hyp_virt_to_phys(hyp_buffers.tx), 153 153 .a2 = hyp_virt_to_phys(hyp_buffers.rx), ··· 161 161 { 162 162 struct arm_smccc_1_2_regs res; 163 163 164 - arm_smccc_1_2_smc(&(struct arm_smccc_1_2_regs) { 164 + hyp_smccc_1_2_smc(&(struct arm_smccc_1_2_regs) { 165 165 .a0 = FFA_RXTX_UNMAP, 166 166 .a1 = HOST_FFA_ID, 167 167 }, &res); ··· 172 172 static void ffa_mem_frag_tx(struct arm_smccc_1_2_regs *res, u32 handle_lo, 173 173 u32 handle_hi, u32 fraglen, u32 endpoint_id) 174 174 { 175 - arm_smccc_1_2_smc(&(struct arm_smccc_1_2_regs) { 175 + hyp_smccc_1_2_smc(&(struct arm_smccc_1_2_regs) { 176 176 .a0 = FFA_MEM_FRAG_TX, 177 177 .a1 = handle_lo, 178 178 .a2 = handle_hi, ··· 184 184 static void ffa_mem_frag_rx(struct arm_smccc_1_2_regs *res, u32 handle_lo, 185 185 u32 handle_hi, u32 fragoff) 186 186 { 187 - arm_smccc_1_2_smc(&(struct arm_smccc_1_2_regs) { 187 + hyp_smccc_1_2_smc(&(struct arm_smccc_1_2_regs) { 188 188 .a0 = FFA_MEM_FRAG_RX, 189 189 .a1 = handle_lo, 190 190 .a2 = handle_hi, ··· 196 196 static void ffa_mem_xfer(struct arm_smccc_1_2_regs *res, u64 func_id, u32 len, 197 197 u32 fraglen) 198 198 { 199 - arm_smccc_1_2_smc(&(struct arm_smccc_1_2_regs) { 199 + hyp_smccc_1_2_smc(&(struct arm_smccc_1_2_regs) { 200 200 .a0 = func_id, 201 201 .a1 = len, 202 202 .a2 = fraglen, ··· 206 206 static void ffa_mem_reclaim(struct arm_smccc_1_2_regs *res, u32 handle_lo, 207 207 u32 handle_hi, u32 flags) 208 208 { 209 - arm_smccc_1_2_smc(&(struct arm_smccc_1_2_regs) { 
209 + hyp_smccc_1_2_smc(&(struct arm_smccc_1_2_regs) { 210 210 .a0 = FFA_MEM_RECLAIM, 211 211 .a1 = handle_lo, 212 212 .a2 = handle_hi, ··· 216 216 217 217 static void ffa_retrieve_req(struct arm_smccc_1_2_regs *res, u32 len) 218 218 { 219 - arm_smccc_1_2_smc(&(struct arm_smccc_1_2_regs) { 219 + hyp_smccc_1_2_smc(&(struct arm_smccc_1_2_regs) { 220 220 .a0 = FFA_FN64_MEM_RETRIEVE_REQ, 221 221 .a1 = len, 222 222 .a2 = len, ··· 225 225 226 226 static void ffa_rx_release(struct arm_smccc_1_2_regs *res) 227 227 { 228 - arm_smccc_1_2_smc(&(struct arm_smccc_1_2_regs) { 228 + hyp_smccc_1_2_smc(&(struct arm_smccc_1_2_regs) { 229 229 .a0 = FFA_RX_RELEASE, 230 230 }, res); 231 231 } ··· 728 728 size_t min_rxtx_sz; 729 729 struct arm_smccc_1_2_regs res; 730 730 731 - arm_smccc_1_2_smc(&(struct arm_smccc_1_2_regs){ 731 + hyp_smccc_1_2_smc(&(struct arm_smccc_1_2_regs){ 732 732 .a0 = FFA_ID_GET, 733 733 }, &res); 734 734 if (res.a0 != FFA_SUCCESS) ··· 737 737 if (res.a2 != HOST_FFA_ID) 738 738 return -EINVAL; 739 739 740 - arm_smccc_1_2_smc(&(struct arm_smccc_1_2_regs){ 740 + hyp_smccc_1_2_smc(&(struct arm_smccc_1_2_regs){ 741 741 .a0 = FFA_FEATURES, 742 742 .a1 = FFA_FN64_RXTX_MAP, 743 743 }, &res); ··· 788 788 * first if TEE supports it. 
789 789 */ 790 790 if (FFA_MINOR_VERSION(ffa_req_version) < FFA_MINOR_VERSION(hyp_ffa_version)) { 791 - arm_smccc_1_2_smc(&(struct arm_smccc_1_2_regs) { 791 + hyp_smccc_1_2_smc(&(struct arm_smccc_1_2_regs) { 792 792 .a0 = FFA_VERSION, 793 793 .a1 = ffa_req_version, 794 794 }, res); ··· 824 824 goto out_unlock; 825 825 } 826 826 827 - arm_smccc_1_2_smc(&(struct arm_smccc_1_2_regs) { 827 + hyp_smccc_1_2_smc(&(struct arm_smccc_1_2_regs) { 828 828 .a0 = FFA_PARTITION_INFO_GET, 829 829 .a1 = uuid0, 830 830 .a2 = uuid1, ··· 939 939 if (kvm_host_psci_config.smccc_version < ARM_SMCCC_VERSION_1_2) 940 940 return 0; 941 941 942 - arm_smccc_1_2_smc(&(struct arm_smccc_1_2_regs) { 942 + hyp_smccc_1_2_smc(&(struct arm_smccc_1_2_regs) { 943 943 .a0 = FFA_VERSION, 944 944 .a1 = FFA_VERSION_1_2, 945 945 }, &res);
+1 -1
arch/arm64/kvm/hyp/nvhe/host.S
··· 120 120 121 121 mov x29, x0 122 122 123 - #ifdef CONFIG_NVHE_EL2_DEBUG 123 + #ifdef PKVM_DISABLE_STAGE2_ON_PANIC 124 124 /* Ensure host stage-2 is disabled */ 125 125 mrs x0, hcr_el2 126 126 bic x0, x0, #HCR_VM
+82 -5
arch/arm64/kvm/hyp/nvhe/hyp-main.c
··· 12 12 #include <asm/kvm_emulate.h> 13 13 #include <asm/kvm_host.h> 14 14 #include <asm/kvm_hyp.h> 15 + #include <asm/kvm_hypevents.h> 15 16 #include <asm/kvm_mmu.h> 16 17 17 18 #include <nvhe/ffa.h> 18 19 #include <nvhe/mem_protect.h> 19 20 #include <nvhe/mm.h> 20 21 #include <nvhe/pkvm.h> 22 + #include <nvhe/trace.h> 21 23 #include <nvhe/trap_handler.h> 22 24 23 25 DEFINE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params); ··· 138 136 hyp_vcpu->vcpu.arch.vsesr_el2 = host_vcpu->arch.vsesr_el2; 139 137 140 138 hyp_vcpu->vcpu.arch.vgic_cpu.vgic_v3 = host_vcpu->arch.vgic_cpu.vgic_v3; 139 + 140 + hyp_vcpu->vcpu.arch.pid = host_vcpu->arch.pid; 141 141 } 142 142 143 143 static void sync_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu) ··· 490 486 { 491 487 DECLARE_REG(phys_addr_t, phys, host_ctxt, 1); 492 488 DECLARE_REG(unsigned long, size, host_ctxt, 2); 493 - DECLARE_REG(unsigned long, nr_cpus, host_ctxt, 3); 494 - DECLARE_REG(unsigned long *, per_cpu_base, host_ctxt, 4); 495 - DECLARE_REG(u32, hyp_va_bits, host_ctxt, 5); 489 + DECLARE_REG(unsigned long *, per_cpu_base, host_ctxt, 3); 490 + DECLARE_REG(u32, hyp_va_bits, host_ctxt, 4); 496 491 497 492 /* 498 493 * __pkvm_init() will return only if an error occurred, otherwise it 499 494 * will tail-call in __pkvm_init_finalise() which will have to deal 500 495 * with the host context directly. 
501 496 */ 502 - cpu_reg(host_ctxt, 1) = __pkvm_init(phys, size, nr_cpus, per_cpu_base, 503 - hyp_va_bits); 497 + cpu_reg(host_ctxt, 1) = __pkvm_init(phys, size, per_cpu_base, hyp_va_bits); 504 498 } 505 499 506 500 static void handle___pkvm_cpu_set_vector(struct kvm_cpu_context *host_ctxt) ··· 591 589 cpu_reg(host_ctxt, 1) = __pkvm_teardown_vm(handle); 592 590 } 593 591 592 + static void handle___tracing_load(struct kvm_cpu_context *host_ctxt) 593 + { 594 + DECLARE_REG(unsigned long, desc_hva, host_ctxt, 1); 595 + DECLARE_REG(size_t, desc_size, host_ctxt, 2); 596 + 597 + cpu_reg(host_ctxt, 1) = __tracing_load(desc_hva, desc_size); 598 + } 599 + 600 + static void handle___tracing_unload(struct kvm_cpu_context *host_ctxt) 601 + { 602 + __tracing_unload(); 603 + } 604 + 605 + static void handle___tracing_enable(struct kvm_cpu_context *host_ctxt) 606 + { 607 + DECLARE_REG(bool, enable, host_ctxt, 1); 608 + 609 + cpu_reg(host_ctxt, 1) = __tracing_enable(enable); 610 + } 611 + 612 + static void handle___tracing_swap_reader(struct kvm_cpu_context *host_ctxt) 613 + { 614 + DECLARE_REG(unsigned int, cpu, host_ctxt, 1); 615 + 616 + cpu_reg(host_ctxt, 1) = __tracing_swap_reader(cpu); 617 + } 618 + 619 + static void handle___tracing_update_clock(struct kvm_cpu_context *host_ctxt) 620 + { 621 + DECLARE_REG(u32, mult, host_ctxt, 1); 622 + DECLARE_REG(u32, shift, host_ctxt, 2); 623 + DECLARE_REG(u64, epoch_ns, host_ctxt, 3); 624 + DECLARE_REG(u64, epoch_cyc, host_ctxt, 4); 625 + 626 + __tracing_update_clock(mult, shift, epoch_ns, epoch_cyc); 627 + } 628 + 629 + static void handle___tracing_reset(struct kvm_cpu_context *host_ctxt) 630 + { 631 + DECLARE_REG(unsigned int, cpu, host_ctxt, 1); 632 + 633 + cpu_reg(host_ctxt, 1) = __tracing_reset(cpu); 634 + } 635 + 636 + static void handle___tracing_enable_event(struct kvm_cpu_context *host_ctxt) 637 + { 638 + DECLARE_REG(unsigned short, id, host_ctxt, 1); 639 + DECLARE_REG(bool, enable, host_ctxt, 2); 640 + 641 + cpu_reg(host_ctxt, 
1) = __tracing_enable_event(id, enable); 642 + } 643 + 644 + static void handle___tracing_write_event(struct kvm_cpu_context *host_ctxt) 645 + { 646 + DECLARE_REG(u64, id, host_ctxt, 1); 647 + 648 + trace_selftest(id); 649 + } 650 + 594 651 typedef void (*hcall_t)(struct kvm_cpu_context *); 595 652 596 653 #define HANDLE_FUNC(x) [__KVM_HOST_SMCCC_FUNC_##x] = (hcall_t)handle_##x ··· 691 630 HANDLE_FUNC(__pkvm_vcpu_load), 692 631 HANDLE_FUNC(__pkvm_vcpu_put), 693 632 HANDLE_FUNC(__pkvm_tlb_flush_vmid), 633 + HANDLE_FUNC(__tracing_load), 634 + HANDLE_FUNC(__tracing_unload), 635 + HANDLE_FUNC(__tracing_enable), 636 + HANDLE_FUNC(__tracing_swap_reader), 637 + HANDLE_FUNC(__tracing_update_clock), 638 + HANDLE_FUNC(__tracing_reset), 639 + HANDLE_FUNC(__tracing_enable_event), 640 + HANDLE_FUNC(__tracing_write_event), 694 641 }; 695 642 696 643 static void handle_host_hcall(struct kvm_cpu_context *host_ctxt) ··· 739 670 740 671 static void default_host_smc_handler(struct kvm_cpu_context *host_ctxt) 741 672 { 673 + trace_hyp_exit(host_ctxt, HYP_REASON_SMC); 742 674 __kvm_hyp_host_forward_smc(host_ctxt); 675 + trace_hyp_enter(host_ctxt, HYP_REASON_SMC); 743 676 } 744 677 745 678 static void handle_host_smc(struct kvm_cpu_context *host_ctxt) ··· 828 757 { 829 758 u64 esr = read_sysreg_el2(SYS_ESR); 830 759 760 + 831 761 switch (ESR_ELx_EC(esr)) { 832 762 case ESR_ELx_EC_HVC64: 763 + trace_hyp_enter(host_ctxt, HYP_REASON_HVC); 833 764 handle_host_hcall(host_ctxt); 834 765 break; 835 766 case ESR_ELx_EC_SMC64: 767 + trace_hyp_enter(host_ctxt, HYP_REASON_SMC); 836 768 handle_host_smc(host_ctxt); 837 769 break; 838 770 case ESR_ELx_EC_IABT_LOW: 839 771 case ESR_ELx_EC_DABT_LOW: 772 + trace_hyp_enter(host_ctxt, HYP_REASON_HOST_ABORT); 840 773 handle_host_mem_abort(host_ctxt); 841 774 break; 842 775 case ESR_ELx_EC_SYS64: ··· 850 775 default: 851 776 BUG(); 852 777 } 778 + 779 + trace_hyp_exit(host_ctxt, HYP_REASON_ERET_HOST); 853 780 }
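The new handle___tracing_*() functions are wired into the host hypercall table through HANDLE_FUNC(), which uses a designated initializer indexed by the function ID. A reduced sketch of that dispatch scheme follows; the `demo_*` and `DEMO_*` names, the three handlers and the register layout are assumptions made for illustration only:

```c
#include <stddef.h>

enum demo_func {
	DEMO_FUNC_load,
	DEMO_FUNC_unload,
	DEMO_FUNC_enable,
};

/* regs[0] carries the function ID, regs[1] the first argument/result. */
struct demo_ctxt {
	unsigned long regs[4];
};

typedef void (*demo_hcall_t)(struct demo_ctxt *);

static void handle_load(struct demo_ctxt *c)   { c->regs[1] += 100; }
static void handle_unload(struct demo_ctxt *c) { c->regs[1] = 0; }
static void handle_enable(struct demo_ctxt *c) { c->regs[1] = c->regs[1] ? 1 : 0; }

/* Same idea as HANDLE_FUNC(): the ID selects the slot, not the list order. */
#define DEMO_HANDLE_FUNC(x) [DEMO_FUNC_##x] = handle_##x

static const demo_hcall_t demo_hcalls[] = {
	DEMO_HANDLE_FUNC(load),
	DEMO_HANDLE_FUNC(unload),
	DEMO_HANDLE_FUNC(enable),
};

/* Returns 0 after running a handler, -1 for an unknown function ID. */
static int demo_dispatch(struct demo_ctxt *c)
{
	unsigned long id = c->regs[0];

	if (id >= sizeof(demo_hcalls) / sizeof(demo_hcalls[0]) || !demo_hcalls[id])
		return -1;

	demo_hcalls[id](c);
	return 0;
}
```

Because each slot is selected by its enumerator, the enum and the table can only drift apart by leaving a NULL slot, which the dispatcher rejects.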
+6
arch/arm64/kvm/hyp/nvhe/hyp.lds.S
···
 	HYP_SECTION(.text)
 	HYP_SECTION(.data..ro_after_init)
 	HYP_SECTION(.rodata)
+#ifdef CONFIG_NVHE_EL2_TRACING
+	. = ALIGN(PAGE_SIZE);
+	BEGIN_HYP_SECTION(.event_ids)
+		*(SORT(.hyp.event_ids.*))
+	END_HYP_SECTION
+#endif

 	/*
 	 * .hyp..data..percpu needs to be page aligned to maintain the same
+2 -2
arch/arm64/kvm/hyp/nvhe/mm.c
···

 void *hyp_fixmap_map(phys_addr_t phys)
 {
-	return fixmap_map_slot(this_cpu_ptr(&fixmap_slots), phys);
+	return fixmap_map_slot(this_cpu_ptr(&fixmap_slots), phys) + offset_in_page(phys);
 }

 static void fixmap_clear_slot(struct hyp_fixmap_slot *slot)
···
 #ifdef HAS_FIXBLOCK
 	*size = PMD_SIZE;
 	hyp_spin_lock(&hyp_fixblock_lock);
-	return fixmap_map_slot(&hyp_fixblock_slot, phys);
+	return fixmap_map_slot(&hyp_fixblock_slot, phys) + offset_in_page(phys);
 #else
 	*size = PAGE_SIZE;
 	return hyp_fixmap_map(phys);
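The mm.c change adds offset_in_page(phys) to the address returned by the fixmap helpers. Reading the diff, this suggests fixmap_map_slot() hands back the page-aligned slot address, so a caller passing an unaligned physical address previously got a pointer to the start of the page. That is an inference, not something the diff states. The sketch below illustrates the arithmetic in user space; demo_map_slot() and the DEMO_* constants are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096UL
#define DEMO_PAGE_MASK (~(DEMO_PAGE_SIZE - 1))

/* Byte offset of an address within its page. */
uintptr_t demo_offset_in_page(uint64_t phys)
{
	return (uintptr_t)(phys & ~DEMO_PAGE_MASK);
}

/* Hypothetical slot mapper: pretends to map the page containing @phys at
 * @slot_va and returns the page-aligned slot address. */
uintptr_t demo_map_slot(uintptr_t slot_va, uint64_t phys)
{
	(void)phys;
	return slot_va;
}

/* With the fix: the in-page offset is carried over to the virtual address. */
uintptr_t demo_fixmap_map(uintptr_t slot_va, uint64_t phys)
{
	return demo_map_slot(slot_va, phys) + demo_offset_in_page(phys);
}
```

For a slot at 0x10000 and a physical address 0x80001234, the caller now receives 0x10234 rather than 0x10000.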
+5 -2
arch/arm64/kvm/hyp/nvhe/psci-relay.c
··· 6 6 7 7 #include <asm/kvm_asm.h> 8 8 #include <asm/kvm_hyp.h> 9 + #include <asm/kvm_hypevents.h> 9 10 #include <asm/kvm_mmu.h> 10 - #include <linux/arm-smccc.h> 11 11 #include <linux/kvm_host.h> 12 12 #include <uapi/linux/psci.h> 13 13 14 + #include <nvhe/arm-smccc.h> 14 15 #include <nvhe/memory.h> 15 16 #include <nvhe/trap_handler.h> 16 17 ··· 66 65 { 67 66 struct arm_smccc_res res; 68 67 69 - arm_smccc_1_1_smc(fn, arg0, arg1, arg2, &res); 68 + hyp_smccc_1_1_smc(fn, arg0, arg1, arg2, &res); 70 69 return res.a0; 71 70 } 72 71 ··· 207 206 struct kvm_cpu_context *host_ctxt; 208 207 209 208 host_ctxt = host_data_ptr(host_ctxt); 209 + trace_hyp_enter(host_ctxt, HYP_REASON_PSCI); 210 210 211 211 if (is_cpu_on) 212 212 boot_args = this_cpu_ptr(&cpu_on_args); ··· 223 221 write_sysreg_el1(INIT_SCTLR_EL1_MMU_OFF, SYS_SCTLR); 224 222 write_sysreg(INIT_PSTATE_EL1, SPSR_EL2); 225 223 224 + trace_hyp_exit(host_ctxt, HYP_REASON_PSCI); 226 225 __host_enter(host_ctxt); 227 226 } 228 227
+1 -3
arch/arm64/kvm/hyp/nvhe/setup.c
···
 	__host_enter(host_ctxt);
 }

-int __pkvm_init(phys_addr_t phys, unsigned long size, unsigned long nr_cpus,
-		unsigned long *per_cpu_base, u32 hyp_va_bits)
+int __pkvm_init(phys_addr_t phys, unsigned long size, unsigned long *per_cpu_base, u32 hyp_va_bits)
 {
 	struct kvm_nvhe_init_params *params;
 	void *virt = hyp_phys_to_virt(phys);
···
 		return -EINVAL;

 	hyp_spin_lock_init(&pkvm_pgd_lock);
-	hyp_nr_cpus = nr_cpus;

 	ret = divide_memory_pool(virt, size);
 	if (ret)
+3 -3
arch/arm64/kvm/hyp/nvhe/stacktrace.c
···
 	stacktrace_info->pc = pc;
 }

-#ifdef CONFIG_PROTECTED_NVHE_STACKTRACE
+#ifdef CONFIG_PKVM_STACKTRACE
 #include <asm/stacktrace/nvhe.h>

 DEFINE_PER_CPU(unsigned long [NVHE_STACKTRACE_SIZE/sizeof(long)], pkvm_stacktrace);
···

 	unwind(&state, pkvm_save_backtrace_entry, &idx);
 }
-#else /* !CONFIG_PROTECTED_NVHE_STACKTRACE */
+#else /* !CONFIG_PKVM_STACKTRACE */
 static void pkvm_save_backtrace(unsigned long fp, unsigned long pc)
 {
 }
-#endif /* CONFIG_PROTECTED_NVHE_STACKTRACE */
+#endif /* CONFIG_PKVM_STACKTRACE */

 /*
  * kvm_nvhe_prepare_backtrace - prepare to dump the nVHE backtrace
+4 -1
arch/arm64/kvm/hyp/nvhe/switch.c
··· 7 7 #include <hyp/switch.h> 8 8 #include <hyp/sysreg-sr.h> 9 9 10 - #include <linux/arm-smccc.h> 11 10 #include <linux/kvm_host.h> 12 11 #include <linux/types.h> 13 12 #include <linux/jump_label.h> ··· 20 21 #include <asm/kvm_asm.h> 21 22 #include <asm/kvm_emulate.h> 22 23 #include <asm/kvm_hyp.h> 24 + #include <asm/kvm_hypevents.h> 23 25 #include <asm/kvm_mmu.h> 24 26 #include <asm/fpsimd.h> 25 27 #include <asm/debug-monitors.h> ··· 308 308 __debug_switch_to_guest(vcpu); 309 309 310 310 do { 311 + trace_hyp_exit(host_ctxt, HYP_REASON_ERET_GUEST); 312 + 311 313 /* Jump in the fire! */ 312 314 exit_code = __guest_enter(vcpu); 313 315 314 316 /* And we're baaack! */ 317 + trace_hyp_enter(host_ctxt, HYP_REASON_GUEST_EXIT); 315 318 } while (fixup_guest_exit(vcpu, &exit_code)); 316 319 317 320 __sysreg_save_state_nvhe(guest_ctxt);
+306
arch/arm64/kvm/hyp/nvhe/trace.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-only 2 + /* 3 + * Copyright (C) 2025 Google LLC 4 + * Author: Vincent Donnefort <vdonnefort@google.com> 5 + */ 6 + 7 + #include <nvhe/clock.h> 8 + #include <nvhe/mem_protect.h> 9 + #include <nvhe/mm.h> 10 + #include <nvhe/trace.h> 11 + 12 + #include <asm/percpu.h> 13 + #include <asm/kvm_mmu.h> 14 + #include <asm/local.h> 15 + 16 + #include "simple_ring_buffer.c" 17 + 18 + static DEFINE_PER_CPU(struct simple_rb_per_cpu, __simple_rbs); 19 + 20 + static struct hyp_trace_buffer { 21 + struct simple_rb_per_cpu __percpu *simple_rbs; 22 + void *bpages_backing_start; 23 + size_t bpages_backing_size; 24 + hyp_spinlock_t lock; 25 + } trace_buffer = { 26 + .simple_rbs = &__simple_rbs, 27 + .lock = __HYP_SPIN_LOCK_UNLOCKED, 28 + }; 29 + 30 + static bool hyp_trace_buffer_loaded(struct hyp_trace_buffer *trace_buffer) 31 + { 32 + return trace_buffer->bpages_backing_size > 0; 33 + } 34 + 35 + void *tracing_reserve_entry(unsigned long length) 36 + { 37 + return simple_ring_buffer_reserve(this_cpu_ptr(trace_buffer.simple_rbs), length, 38 + trace_clock()); 39 + } 40 + 41 + void tracing_commit_entry(void) 42 + { 43 + simple_ring_buffer_commit(this_cpu_ptr(trace_buffer.simple_rbs)); 44 + } 45 + 46 + static int __admit_host_mem(void *start, u64 size) 47 + { 48 + if (!PAGE_ALIGNED(start) || !PAGE_ALIGNED(size) || !size) 49 + return -EINVAL; 50 + 51 + if (!is_protected_kvm_enabled()) 52 + return 0; 53 + 54 + return __pkvm_host_donate_hyp(hyp_virt_to_pfn(start), size >> PAGE_SHIFT); 55 + } 56 + 57 + static void __release_host_mem(void *start, u64 size) 58 + { 59 + if (!is_protected_kvm_enabled()) 60 + return; 61 + 62 + WARN_ON(__pkvm_hyp_donate_host(hyp_virt_to_pfn(start), size >> PAGE_SHIFT)); 63 + } 64 + 65 + static int hyp_trace_buffer_load_bpage_backing(struct hyp_trace_buffer *trace_buffer, 66 + struct hyp_trace_desc *desc) 67 + { 68 + void *start = (void *)kern_hyp_va(desc->bpages_backing_start); 69 + size_t size = 
desc->bpages_backing_size; 70 + int ret; 71 + 72 + ret = __admit_host_mem(start, size); 73 + if (ret) 74 + return ret; 75 + 76 + memset(start, 0, size); 77 + 78 + trace_buffer->bpages_backing_start = start; 79 + trace_buffer->bpages_backing_size = size; 80 + 81 + return 0; 82 + } 83 + 84 + static void hyp_trace_buffer_unload_bpage_backing(struct hyp_trace_buffer *trace_buffer) 85 + { 86 + void *start = trace_buffer->bpages_backing_start; 87 + size_t size = trace_buffer->bpages_backing_size; 88 + 89 + if (!size) 90 + return; 91 + 92 + memset(start, 0, size); 93 + 94 + __release_host_mem(start, size); 95 + 96 + trace_buffer->bpages_backing_start = 0; 97 + trace_buffer->bpages_backing_size = 0; 98 + } 99 + 100 + static void *__pin_shared_page(unsigned long kern_va) 101 + { 102 + void *va = kern_hyp_va((void *)kern_va); 103 + 104 + if (!is_protected_kvm_enabled()) 105 + return va; 106 + 107 + return hyp_pin_shared_mem(va, va + PAGE_SIZE) ? NULL : va; 108 + } 109 + 110 + static void __unpin_shared_page(void *va) 111 + { 112 + if (!is_protected_kvm_enabled()) 113 + return; 114 + 115 + hyp_unpin_shared_mem(va, va + PAGE_SIZE); 116 + } 117 + 118 + static void hyp_trace_buffer_unload(struct hyp_trace_buffer *trace_buffer) 119 + { 120 + int cpu; 121 + 122 + hyp_assert_lock_held(&trace_buffer->lock); 123 + 124 + if (!hyp_trace_buffer_loaded(trace_buffer)) 125 + return; 126 + 127 + for (cpu = 0; cpu < hyp_nr_cpus; cpu++) 128 + simple_ring_buffer_unload_mm(per_cpu_ptr(trace_buffer->simple_rbs, cpu), 129 + __unpin_shared_page); 130 + 131 + hyp_trace_buffer_unload_bpage_backing(trace_buffer); 132 + } 133 + 134 + static int hyp_trace_buffer_load(struct hyp_trace_buffer *trace_buffer, 135 + struct hyp_trace_desc *desc) 136 + { 137 + struct simple_buffer_page *bpages; 138 + struct ring_buffer_desc *rb_desc; 139 + int ret, cpu; 140 + 141 + hyp_assert_lock_held(&trace_buffer->lock); 142 + 143 + if (hyp_trace_buffer_loaded(trace_buffer)) 144 + return -EINVAL; 145 + 146 + ret = 
hyp_trace_buffer_load_bpage_backing(trace_buffer, desc); 147 + if (ret) 148 + return ret; 149 + 150 + bpages = trace_buffer->bpages_backing_start; 151 + for_each_ring_buffer_desc(rb_desc, cpu, &desc->trace_buffer_desc) { 152 + ret = simple_ring_buffer_init_mm(per_cpu_ptr(trace_buffer->simple_rbs, cpu), 153 + bpages, rb_desc, __pin_shared_page, 154 + __unpin_shared_page); 155 + if (ret) 156 + break; 157 + 158 + bpages += rb_desc->nr_page_va; 159 + } 160 + 161 + if (ret) 162 + hyp_trace_buffer_unload(trace_buffer); 163 + 164 + return ret; 165 + } 166 + 167 + static bool hyp_trace_desc_validate(struct hyp_trace_desc *desc, size_t desc_size) 168 + { 169 + struct ring_buffer_desc *rb_desc; 170 + unsigned int cpu; 171 + size_t nr_bpages; 172 + void *desc_end; 173 + 174 + /* 175 + * Both desc_size and bpages_backing_size are untrusted host-provided 176 + * values. We rely on __pkvm_host_donate_hyp() to enforce their validity. 177 + */ 178 + desc_end = (void *)desc + desc_size; 179 + nr_bpages = desc->bpages_backing_size / sizeof(struct simple_buffer_page); 180 + 181 + for_each_ring_buffer_desc(rb_desc, cpu, &desc->trace_buffer_desc) { 182 + /* Can we read nr_page_va? */ 183 + if ((void *)rb_desc + struct_size(rb_desc, page_va, 0) > desc_end) 184 + return false; 185 + 186 + /* Overflow desc? */ 187 + if ((void *)rb_desc + struct_size(rb_desc, page_va, rb_desc->nr_page_va) > desc_end) 188 + return false; 189 + 190 + /* Overflow bpages backing memory? 
*/ 191 + if (nr_bpages < rb_desc->nr_page_va) 192 + return false; 193 + 194 + if (cpu >= hyp_nr_cpus) 195 + return false; 196 + 197 + if (cpu != rb_desc->cpu) 198 + return false; 199 + 200 + nr_bpages -= rb_desc->nr_page_va; 201 + } 202 + 203 + return true; 204 + } 205 + 206 + int __tracing_load(unsigned long desc_hva, size_t desc_size) 207 + { 208 + struct hyp_trace_desc *desc = (struct hyp_trace_desc *)kern_hyp_va(desc_hva); 209 + int ret; 210 + 211 + ret = __admit_host_mem(desc, desc_size); 212 + if (ret) 213 + return ret; 214 + 215 + if (!hyp_trace_desc_validate(desc, desc_size)) 216 + goto err_release_desc; 217 + 218 + hyp_spin_lock(&trace_buffer.lock); 219 + 220 + ret = hyp_trace_buffer_load(&trace_buffer, desc); 221 + 222 + hyp_spin_unlock(&trace_buffer.lock); 223 + 224 + err_release_desc: 225 + __release_host_mem(desc, desc_size); 226 + return ret; 227 + } 228 + 229 + void __tracing_unload(void) 230 + { 231 + hyp_spin_lock(&trace_buffer.lock); 232 + hyp_trace_buffer_unload(&trace_buffer); 233 + hyp_spin_unlock(&trace_buffer.lock); 234 + } 235 + 236 + int __tracing_enable(bool enable) 237 + { 238 + int cpu, ret = enable ? 
-EINVAL : 0; 239 + 240 + hyp_spin_lock(&trace_buffer.lock); 241 + 242 + if (!hyp_trace_buffer_loaded(&trace_buffer)) 243 + goto unlock; 244 + 245 + for (cpu = 0; cpu < hyp_nr_cpus; cpu++) 246 + simple_ring_buffer_enable_tracing(per_cpu_ptr(trace_buffer.simple_rbs, cpu), 247 + enable); 248 + 249 + ret = 0; 250 + 251 + unlock: 252 + hyp_spin_unlock(&trace_buffer.lock); 253 + 254 + return ret; 255 + } 256 + 257 + int __tracing_swap_reader(unsigned int cpu) 258 + { 259 + int ret = -ENODEV; 260 + 261 + if (cpu >= hyp_nr_cpus) 262 + return -EINVAL; 263 + 264 + hyp_spin_lock(&trace_buffer.lock); 265 + 266 + if (hyp_trace_buffer_loaded(&trace_buffer)) 267 + ret = simple_ring_buffer_swap_reader_page( 268 + per_cpu_ptr(trace_buffer.simple_rbs, cpu)); 269 + 270 + hyp_spin_unlock(&trace_buffer.lock); 271 + 272 + return ret; 273 + } 274 + 275 + void __tracing_update_clock(u32 mult, u32 shift, u64 epoch_ns, u64 epoch_cyc) 276 + { 277 + int cpu; 278 + 279 + /* After this loop, all CPUs are observing the new bank... */ 280 + for (cpu = 0; cpu < hyp_nr_cpus; cpu++) { 281 + struct simple_rb_per_cpu *simple_rb = per_cpu_ptr(trace_buffer.simple_rbs, cpu); 282 + 283 + while (READ_ONCE(simple_rb->status) == SIMPLE_RB_WRITING) 284 + ; 285 + } 286 + 287 + /* ...we can now override the old one and swap. */ 288 + trace_clock_update(mult, shift, epoch_ns, epoch_cyc); 289 + } 290 + 291 + int __tracing_reset(unsigned int cpu) 292 + { 293 + int ret = -ENODEV; 294 + 295 + if (cpu >= hyp_nr_cpus) 296 + return -EINVAL; 297 + 298 + hyp_spin_lock(&trace_buffer.lock); 299 + 300 + if (hyp_trace_buffer_loaded(&trace_buffer)) 301 + ret = simple_ring_buffer_reset(per_cpu_ptr(trace_buffer.simple_rbs, cpu)); 302 + 303 + hyp_spin_unlock(&trace_buffer.lock); 304 + 305 + return ret; 306 + }
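hyp_trace_desc_validate() above walks a packed list of variable-length ring_buffer_desc entries supplied by the untrusted host and rejects any entry that would run past the descriptor, exhaust the backing array, or name an unexpected CPU. The following user-space sketch shows the same style of bounds-checked walk. The demo_* structs and functions are hypothetical mirrors, and unlike the kernel code this variant also checks the outer header size itself rather than relying on the page donation step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-space mirrors of ring_buffer_desc / trace_buffer_desc. */
struct demo_rb_desc {
	int cpu;
	unsigned int nr_page_va;
	unsigned long meta_va;
	unsigned long page_va[];
};

struct demo_trace_desc {
	int nr_cpus;
	size_t struct_len;
	char data[];		/* packed list of demo_rb_desc */
};

/* Validate an untrusted descriptor of @desc_size bytes against a backing
 * array of @nr_bpages entries and a CPU limit of @max_cpus. */
bool demo_desc_validate(const struct demo_trace_desc *desc, size_t desc_size,
			size_t nr_bpages, int max_cpus)
{
	const char *p, *end;
	int cpu;

	if (desc_size < offsetof(struct demo_trace_desc, data))
		return false;

	p = desc->data;
	end = (const char *)desc + desc_size;

	for (cpu = 0; cpu < desc->nr_cpus; cpu++) {
		const struct demo_rb_desc *rb = (const struct demo_rb_desc *)p;
		size_t avail = (size_t)(end - p);

		/* Can we read the fixed header, including nr_page_va? */
		if (avail < sizeof(*rb))
			return false;
		/* Does page_va[] fit? Dividing avoids a multiplication overflow. */
		if (rb->nr_page_va > (avail - sizeof(*rb)) / sizeof(unsigned long))
			return false;
		/* Enough backing entries left, CPU in range and in order? */
		if (nr_bpages < rb->nr_page_va || cpu >= max_cpus || rb->cpu != cpu)
			return false;

		nr_bpages -= rb->nr_page_va;
		p += sizeof(*rb) + (size_t)rb->nr_page_va * sizeof(unsigned long);
	}

	return true;
}

/* Build a well-formed descriptor with @nr_pages pages per CPU; the caller
 * frees it with free(). */
struct demo_trace_desc *demo_desc_build(int nr_cpus, unsigned int nr_pages, size_t *len)
{
	size_t rb_len = sizeof(struct demo_rb_desc) + nr_pages * sizeof(unsigned long);
	struct demo_trace_desc *desc;
	char *p;
	int cpu;

	assert(len != NULL);
	*len = offsetof(struct demo_trace_desc, data) + (size_t)nr_cpus * rb_len;
	desc = calloc(1, *len);
	if (!desc)
		return NULL;

	desc->nr_cpus = nr_cpus;
	desc->struct_len = *len;
	for (cpu = 0, p = desc->data; cpu < nr_cpus; cpu++, p += rb_len) {
		struct demo_rb_desc *rb = (struct demo_rb_desc *)p;

		rb->cpu = cpu;
		rb->nr_page_va = nr_pages;
	}

	return desc;
}
```

The header check comes before the page_va[] check so that nr_page_va is never read from memory outside the descriptor, matching the "Can we read nr_page_va?" step in the diff.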
+442
arch/arm64/kvm/hyp_trace.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-only 2 + /* 3 + * Copyright (C) 2025 Google LLC 4 + * Author: Vincent Donnefort <vdonnefort@google.com> 5 + */ 6 + 7 + #include <linux/cpumask.h> 8 + #include <linux/trace_remote.h> 9 + #include <linux/tracefs.h> 10 + #include <linux/simple_ring_buffer.h> 11 + 12 + #include <asm/arch_timer.h> 13 + #include <asm/kvm_host.h> 14 + #include <asm/kvm_hyptrace.h> 15 + #include <asm/kvm_mmu.h> 16 + 17 + #include "hyp_trace.h" 18 + 19 + /* Same 10min used by clocksource when width is more than 32-bits */ 20 + #define CLOCK_MAX_CONVERSION_S 600 21 + /* 22 + * Time to give for the clock init. Long enough to get a good mult/shift 23 + * estimation. Short enough to not delay the tracing start too much. 24 + */ 25 + #define CLOCK_INIT_MS 100 26 + /* 27 + * Time between clock checks. Must be small enough to catch clock deviation when 28 + * it is still tiny. 29 + */ 30 + #define CLOCK_UPDATE_MS 500 31 + 32 + static struct hyp_trace_clock { 33 + u64 cycles; 34 + u64 cyc_overflow64; 35 + u64 boot; 36 + u32 mult; 37 + u32 shift; 38 + struct delayed_work work; 39 + struct completion ready; 40 + struct mutex lock; 41 + bool running; 42 + } hyp_clock; 43 + 44 + static void __hyp_clock_work(struct work_struct *work) 45 + { 46 + struct delayed_work *dwork = to_delayed_work(work); 47 + struct hyp_trace_clock *hyp_clock; 48 + struct system_time_snapshot snap; 49 + u64 rate, delta_cycles; 50 + u64 boot, delta_boot; 51 + 52 + hyp_clock = container_of(dwork, struct hyp_trace_clock, work); 53 + 54 + ktime_get_snapshot(&snap); 55 + boot = ktime_to_ns(snap.boot); 56 + 57 + delta_boot = boot - hyp_clock->boot; 58 + delta_cycles = snap.cycles - hyp_clock->cycles; 59 + 60 + /* Compare hyp clock with the kernel boot clock */ 61 + if (hyp_clock->mult) { 62 + u64 err, cur = delta_cycles; 63 + 64 + if (WARN_ON_ONCE(cur >= hyp_clock->cyc_overflow64)) { 65 + __uint128_t tmp = (__uint128_t)cur * hyp_clock->mult; 66 + 67 + cur = tmp >> hyp_clock->shift; 68 + } 
else { 69 + cur *= hyp_clock->mult; 70 + cur >>= hyp_clock->shift; 71 + } 72 + cur += hyp_clock->boot; 73 + 74 + err = abs_diff(cur, boot); 75 + /* No deviation, only update epoch if necessary */ 76 + if (!err) { 77 + if (delta_cycles >= (hyp_clock->cyc_overflow64 >> 1)) 78 + goto fast_forward; 79 + 80 + goto resched; 81 + } 82 + 83 + /* Warn if the error is above tracing precision (1us) */ 84 + if (err > NSEC_PER_USEC) 85 + pr_warn_ratelimited("hyp trace clock off by %lluus\n", 86 + err / NSEC_PER_USEC); 87 + } 88 + 89 + rate = div64_u64(delta_cycles * NSEC_PER_SEC, delta_boot); 90 + 91 + clocks_calc_mult_shift(&hyp_clock->mult, &hyp_clock->shift, 92 + rate, NSEC_PER_SEC, CLOCK_MAX_CONVERSION_S); 93 + 94 + /* Add a comfortable 50% margin */ 95 + hyp_clock->cyc_overflow64 = (U64_MAX / hyp_clock->mult) >> 1; 96 + 97 + fast_forward: 98 + hyp_clock->cycles = snap.cycles; 99 + hyp_clock->boot = boot; 100 + kvm_call_hyp_nvhe(__tracing_update_clock, hyp_clock->mult, 101 + hyp_clock->shift, hyp_clock->boot, hyp_clock->cycles); 102 + complete(&hyp_clock->ready); 103 + 104 + resched: 105 + schedule_delayed_work(&hyp_clock->work, 106 + msecs_to_jiffies(CLOCK_UPDATE_MS)); 107 + } 108 + 109 + static void hyp_trace_clock_enable(struct hyp_trace_clock *hyp_clock, bool enable) 110 + { 111 + struct system_time_snapshot snap; 112 + 113 + if (hyp_clock->running == enable) 114 + return; 115 + 116 + if (!enable) { 117 + cancel_delayed_work_sync(&hyp_clock->work); 118 + hyp_clock->running = false; 119 + } 120 + 121 + ktime_get_snapshot(&snap); 122 + 123 + hyp_clock->boot = ktime_to_ns(snap.boot); 124 + hyp_clock->cycles = snap.cycles; 125 + hyp_clock->mult = 0; 126 + 127 + init_completion(&hyp_clock->ready); 128 + INIT_DELAYED_WORK(&hyp_clock->work, __hyp_clock_work); 129 + schedule_delayed_work(&hyp_clock->work, msecs_to_jiffies(CLOCK_INIT_MS)); 130 + wait_for_completion(&hyp_clock->ready); 131 + hyp_clock->running = true; 132 + } 133 + 134 + /* Access to this struct within the 
trace_remote_callbacks are protected by the trace_remote lock */ 135 + static struct hyp_trace_buffer { 136 + struct hyp_trace_desc *desc; 137 + size_t desc_size; 138 + } trace_buffer; 139 + 140 + static int __map_hyp(void *start, size_t size) 141 + { 142 + if (is_protected_kvm_enabled()) 143 + return 0; 144 + 145 + return create_hyp_mappings(start, start + size, PAGE_HYP); 146 + } 147 + 148 + static int __share_page(unsigned long va) 149 + { 150 + return kvm_share_hyp((void *)va, (void *)va + 1); 151 + } 152 + 153 + static void __unshare_page(unsigned long va) 154 + { 155 + kvm_unshare_hyp((void *)va, (void *)va + 1); 156 + } 157 + 158 + static int hyp_trace_buffer_alloc_bpages_backing(struct hyp_trace_buffer *trace_buffer, size_t size) 159 + { 160 + int nr_bpages = (PAGE_ALIGN(size) / PAGE_SIZE) + 1; 161 + size_t backing_size; 162 + void *start; 163 + 164 + backing_size = PAGE_ALIGN(sizeof(struct simple_buffer_page) * nr_bpages * 165 + num_possible_cpus()); 166 + 167 + start = alloc_pages_exact(backing_size, GFP_KERNEL_ACCOUNT); 168 + if (!start) 169 + return -ENOMEM; 170 + 171 + trace_buffer->desc->bpages_backing_start = (unsigned long)start; 172 + trace_buffer->desc->bpages_backing_size = backing_size; 173 + 174 + return __map_hyp(start, backing_size); 175 + } 176 + 177 + static void hyp_trace_buffer_free_bpages_backing(struct hyp_trace_buffer *trace_buffer) 178 + { 179 + free_pages_exact((void *)trace_buffer->desc->bpages_backing_start, 180 + trace_buffer->desc->bpages_backing_size); 181 + } 182 + 183 + static void hyp_trace_buffer_unshare_hyp(struct hyp_trace_buffer *trace_buffer, int last_cpu) 184 + { 185 + struct ring_buffer_desc *rb_desc; 186 + int cpu, p; 187 + 188 + for_each_ring_buffer_desc(rb_desc, cpu, &trace_buffer->desc->trace_buffer_desc) { 189 + if (cpu > last_cpu) 190 + break; 191 + 192 + __unshare_page(rb_desc->meta_va); 193 + for (p = 0; p < rb_desc->nr_page_va; p++) 194 + __unshare_page(rb_desc->page_va[p]); 195 + } 196 + } 197 + 198 + static
int hyp_trace_buffer_share_hyp(struct hyp_trace_buffer *trace_buffer) 199 + { 200 + struct ring_buffer_desc *rb_desc; 201 + int cpu, p, ret = 0; 202 + 203 + for_each_ring_buffer_desc(rb_desc, cpu, &trace_buffer->desc->trace_buffer_desc) { 204 + ret = __share_page(rb_desc->meta_va); 205 + if (ret) 206 + break; 207 + 208 + for (p = 0; p < rb_desc->nr_page_va; p++) { 209 + ret = __share_page(rb_desc->page_va[p]); 210 + if (ret) 211 + break; 212 + } 213 + 214 + if (ret) { 215 + for (p--; p >= 0; p--) 216 + __unshare_page(rb_desc->page_va[p]); 217 + break; 218 + } 219 + } 220 + 221 + if (ret) 222 + hyp_trace_buffer_unshare_hyp(trace_buffer, cpu--); 223 + 224 + return ret; 225 + } 226 + 227 + static struct trace_buffer_desc *hyp_trace_load(unsigned long size, void *priv) 228 + { 229 + struct hyp_trace_buffer *trace_buffer = priv; 230 + struct hyp_trace_desc *desc; 231 + size_t desc_size; 232 + int ret; 233 + 234 + if (WARN_ON(trace_buffer->desc)) 235 + return ERR_PTR(-EINVAL); 236 + 237 + desc_size = trace_buffer_desc_size(size, num_possible_cpus()); 238 + if (desc_size == SIZE_MAX) 239 + return ERR_PTR(-E2BIG); 240 + 241 + desc_size = PAGE_ALIGN(desc_size); 242 + desc = (struct hyp_trace_desc *)alloc_pages_exact(desc_size, GFP_KERNEL); 243 + if (!desc) 244 + return ERR_PTR(-ENOMEM); 245 + 246 + ret = __map_hyp(desc, desc_size); 247 + if (ret) 248 + goto err_free_desc; 249 + 250 + trace_buffer->desc = desc; 251 + 252 + ret = hyp_trace_buffer_alloc_bpages_backing(trace_buffer, size); 253 + if (ret) 254 + goto err_free_desc; 255 + 256 + ret = trace_remote_alloc_buffer(&desc->trace_buffer_desc, desc_size, size, 257 + cpu_possible_mask); 258 + if (ret) 259 + goto err_free_backing; 260 + 261 + ret = hyp_trace_buffer_share_hyp(trace_buffer); 262 + if (ret) 263 + goto err_free_buffer; 264 + 265 + ret = kvm_call_hyp_nvhe(__tracing_load, (unsigned long)desc, desc_size); 266 + if (ret) 267 + goto err_unload_pages; 268 + 269 + return &desc->trace_buffer_desc; 270 + 271 + 
err_unload_pages: 272 + hyp_trace_buffer_unshare_hyp(trace_buffer, INT_MAX); 273 + 274 + err_free_buffer: 275 + trace_remote_free_buffer(&desc->trace_buffer_desc); 276 + 277 + err_free_backing: 278 + hyp_trace_buffer_free_bpages_backing(trace_buffer); 279 + 280 + err_free_desc: 281 + free_pages_exact(desc, desc_size); 282 + trace_buffer->desc = NULL; 283 + 284 + return ERR_PTR(ret); 285 + } 286 + 287 + static void hyp_trace_unload(struct trace_buffer_desc *desc, void *priv) 288 + { 289 + struct hyp_trace_buffer *trace_buffer = priv; 290 + 291 + if (WARN_ON(desc != &trace_buffer->desc->trace_buffer_desc)) 292 + return; 293 + 294 + kvm_call_hyp_nvhe(__tracing_unload); 295 + hyp_trace_buffer_unshare_hyp(trace_buffer, INT_MAX); 296 + trace_remote_free_buffer(desc); 297 + hyp_trace_buffer_free_bpages_backing(trace_buffer); 298 + free_pages_exact(trace_buffer->desc, trace_buffer->desc_size); 299 + trace_buffer->desc = NULL; 300 + } 301 + 302 + static int hyp_trace_enable_tracing(bool enable, void *priv) 303 + { 304 + hyp_trace_clock_enable(&hyp_clock, enable); 305 + 306 + return kvm_call_hyp_nvhe(__tracing_enable, enable); 307 + } 308 + 309 + static int hyp_trace_swap_reader_page(unsigned int cpu, void *priv) 310 + { 311 + return kvm_call_hyp_nvhe(__tracing_swap_reader, cpu); 312 + } 313 + 314 + static int hyp_trace_reset(unsigned int cpu, void *priv) 315 + { 316 + return kvm_call_hyp_nvhe(__tracing_reset, cpu); 317 + } 318 + 319 + static int hyp_trace_enable_event(unsigned short id, bool enable, void *priv) 320 + { 321 + struct hyp_event_id *event_id = lm_alias(&__hyp_event_ids_start[id]); 322 + struct page *page; 323 + atomic_t *enabled; 324 + void *map; 325 + 326 + if (is_protected_kvm_enabled()) 327 + return kvm_call_hyp_nvhe(__tracing_enable_event, id, enable); 328 + 329 + enabled = &event_id->enabled; 330 + page = virt_to_page(enabled); 331 + map = vmap(&page, 1, VM_MAP, PAGE_KERNEL); 332 + if (!map) 333 + return -ENOMEM; 334 + 335 + enabled = map + 
offset_in_page(enabled); 336 + atomic_set(enabled, enable); 337 + 338 + vunmap(map); 339 + 340 + return 0; 341 + } 342 + 343 + static int hyp_trace_clock_show(struct seq_file *m, void *v) 344 + { 345 + seq_puts(m, "[boot]\n"); 346 + 347 + return 0; 348 + } 349 + DEFINE_SHOW_ATTRIBUTE(hyp_trace_clock); 350 + 351 + static ssize_t hyp_trace_write_event_write(struct file *f, const char __user *ubuf, 352 + size_t cnt, loff_t *pos) 353 + { 354 + unsigned long val; 355 + int ret; 356 + 357 + ret = kstrtoul_from_user(ubuf, cnt, 10, &val); 358 + if (ret) 359 + return ret; 360 + 361 + kvm_call_hyp_nvhe(__tracing_write_event, val); 362 + 363 + return cnt; 364 + } 365 + 366 + static const struct file_operations hyp_trace_write_event_fops = { 367 + .write = hyp_trace_write_event_write, 368 + }; 369 + 370 + static int hyp_trace_init_tracefs(struct dentry *d, void *priv) 371 + { 372 + if (!tracefs_create_file("write_event", 0200, d, NULL, &hyp_trace_write_event_fops)) 373 + return -ENOMEM; 374 + 375 + return tracefs_create_file("trace_clock", 0440, d, NULL, &hyp_trace_clock_fops) ? 
376 + 0 : -ENOMEM; 377 + } 378 + 379 + static struct trace_remote_callbacks trace_remote_callbacks = { 380 + .init = hyp_trace_init_tracefs, 381 + .load_trace_buffer = hyp_trace_load, 382 + .unload_trace_buffer = hyp_trace_unload, 383 + .enable_tracing = hyp_trace_enable_tracing, 384 + .swap_reader_page = hyp_trace_swap_reader_page, 385 + .reset = hyp_trace_reset, 386 + .enable_event = hyp_trace_enable_event, 387 + }; 388 + 389 + static const char *__hyp_enter_exit_reason_str(u8 reason); 390 + 391 + #include <asm/kvm_define_hypevents.h> 392 + 393 + static const char *__hyp_enter_exit_reason_str(u8 reason) 394 + { 395 + static const char strs[][12] = { 396 + "smc", 397 + "hvc", 398 + "psci", 399 + "host_abort", 400 + "guest_exit", 401 + "eret_host", 402 + "eret_guest", 403 + "unknown", 404 + }; 405 + 406 + return strs[min(reason, HYP_REASON_UNKNOWN)]; 407 + } 408 + 409 + static void __init hyp_trace_init_events(void) 410 + { 411 + struct hyp_event_id *hyp_event_id = __hyp_event_ids_start; 412 + struct remote_event *event = __hyp_events_start; 413 + int id = 0; 414 + 415 + /* Events on both sides hypervisor are sorted */ 416 + for (; event < __hyp_events_end; event++, hyp_event_id++, id++) 417 + event->id = hyp_event_id->id = id; 418 + } 419 + 420 + int __init kvm_hyp_trace_init(void) 421 + { 422 + int cpu; 423 + 424 + if (is_kernel_in_hyp_mode()) 425 + return 0; 426 + 427 + for_each_possible_cpu(cpu) { 428 + const struct arch_timer_erratum_workaround *wa = 429 + per_cpu(timer_unstable_counter_workaround, cpu); 430 + 431 + if (IS_ENABLED(CONFIG_ARM_ARCH_TIMER_OOL_WORKAROUND) && 432 + wa && wa->read_cntvct_el0) { 433 + pr_warn("hyp trace can't handle CNTVCT workaround '%s'\n", wa->desc); 434 + return -EOPNOTSUPP; 435 + } 436 + } 437 + 438 + hyp_trace_init_events(); 439 + 440 + return trace_remote_register("hypervisor", &trace_remote_callbacks, &trace_buffer, 441 + __hyp_events_start, __hyp_events_end - __hyp_events_start); 442 + }
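__hyp_clock_work() above converts a cycle delta to nanoseconds with a mult/shift pair, adds the epoch, and derives cyc_overflow64 as the largest delta that can be multiplied in 64 bits with a 50% margin. A small user-space sketch of that arithmetic follows. The struct and function names are hypothetical, and the hypervisor-side trace_clock() is not shown in this diff, so treat this as an illustration of the host-side comparison only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the values passed to __tracing_update_clock(). */
struct demo_clock {
	uint32_t mult;
	uint32_t shift;
	uint64_t epoch_ns;
	uint64_t epoch_cyc;
};

/* ns = epoch_ns + ((cyc - epoch_cyc) * mult) >> shift.
 * Valid only while the delta stays below demo_cyc_overflow64(mult). */
uint64_t demo_cyc_to_ns(const struct demo_clock *clk, uint64_t cyc)
{
	uint64_t delta = cyc - clk->epoch_cyc;

	return clk->epoch_ns + ((delta * clk->mult) >> clk->shift);
}

/* Largest cycle delta whose product with @mult still fits in 64 bits,
 * halved for the same 50% margin as in the diff. */
uint64_t demo_cyc_overflow64(uint32_t mult)
{
	assert(mult != 0);
	return (UINT64_MAX / mult) >> 1;
}
```

As a worked case, a 100 MHz counter advances 10 ns per cycle, which can be encoded as mult = 2560 and shift = 8. A delta of 1000 cycles then yields (1000 * 2560) >> 8 = 10000 ns past the epoch.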
+11
arch/arm64/kvm/hyp_trace.h
···
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __ARM64_KVM_HYP_TRACE_H__
+#define __ARM64_KVM_HYP_TRACE_H__
+
+#ifdef CONFIG_NVHE_EL2_TRACING
+int kvm_hyp_trace_init(void);
+#else
+static inline int kvm_hyp_trace_init(void) { return 0; }
+#endif
+#endif
+4 -4
arch/arm64/kvm/stacktrace.c
···
 	kvm_nvhe_dump_backtrace_end();
 }

-#ifdef CONFIG_PROTECTED_NVHE_STACKTRACE
+#ifdef CONFIG_PKVM_STACKTRACE
 DECLARE_KVM_NVHE_PER_CPU(unsigned long [NVHE_STACKTRACE_SIZE/sizeof(long)],
 			 pkvm_stacktrace);

···
 		kvm_nvhe_dump_backtrace_entry((void *)hyp_offset, stacktrace[i]);
 	kvm_nvhe_dump_backtrace_end();
 }
-#else /* !CONFIG_PROTECTED_NVHE_STACKTRACE */
+#else /* !CONFIG_PKVM_STACKTRACE */
 static void pkvm_dump_backtrace(unsigned long hyp_offset)
 {
-	kvm_err("Cannot dump pKVM nVHE stacktrace: !CONFIG_PROTECTED_NVHE_STACKTRACE\n");
+	kvm_err("Cannot dump pKVM nVHE stacktrace: !CONFIG_PKVM_STACKTRACE\n");
 }
-#endif /* CONFIG_PROTECTED_NVHE_STACKTRACE */
+#endif /* CONFIG_PKVM_STACKTRACE */

 /*
  * kvm_nvhe_dump_backtrace - Dump KVM nVHE hypervisor backtrace.
+1
fs/tracefs/inode.c
···
 	fsnotify_create(d_inode(dentry->d_parent), dentry);
 	return tracefs_end_creating(dentry);
 }
+EXPORT_SYMBOL_GPL(tracefs_create_file);

 static struct dentry *__create_dir(const char *name, struct dentry *parent,
 				   const struct inode_operations *ops)
+58
include/linux/ring_buffer.h
··· 251 251 void ring_buffer_map_dup(struct trace_buffer *buffer, int cpu); 252 252 int ring_buffer_unmap(struct trace_buffer *buffer, int cpu); 253 253 int ring_buffer_map_get_reader(struct trace_buffer *buffer, int cpu); 254 + 255 + struct ring_buffer_desc { 256 + int cpu; 257 + unsigned int nr_page_va; /* excludes the meta page */ 258 + unsigned long meta_va; 259 + unsigned long page_va[] __counted_by(nr_page_va); 260 + }; 261 + 262 + struct trace_buffer_desc { 263 + int nr_cpus; 264 + size_t struct_len; 265 + char __data[]; /* list of ring_buffer_desc */ 266 + }; 267 + 268 + static inline struct ring_buffer_desc *__next_ring_buffer_desc(struct ring_buffer_desc *desc) 269 + { 270 + size_t len = struct_size(desc, page_va, desc->nr_page_va); 271 + 272 + return (struct ring_buffer_desc *)((void *)desc + len); 273 + } 274 + 275 + static inline struct ring_buffer_desc *__first_ring_buffer_desc(struct trace_buffer_desc *desc) 276 + { 277 + return (struct ring_buffer_desc *)(&desc->__data[0]); 278 + } 279 + 280 + static inline size_t trace_buffer_desc_size(size_t buffer_size, unsigned int nr_cpus) 281 + { 282 + unsigned int nr_pages = max(DIV_ROUND_UP(buffer_size, PAGE_SIZE), 2UL) + 1; 283 + struct ring_buffer_desc *rbdesc; 284 + 285 + return size_add(offsetof(struct trace_buffer_desc, __data), 286 + size_mul(nr_cpus, struct_size(rbdesc, page_va, nr_pages))); 287 + } 288 + 289 + #define for_each_ring_buffer_desc(__pdesc, __cpu, __trace_pdesc) \ 290 + for (__pdesc = __first_ring_buffer_desc(__trace_pdesc), __cpu = 0; \ 291 + (__cpu) < (__trace_pdesc)->nr_cpus; \ 292 + (__cpu)++, __pdesc = __next_ring_buffer_desc(__pdesc)) 293 + 294 + struct ring_buffer_remote { 295 + struct trace_buffer_desc *desc; 296 + int (*swap_reader_page)(unsigned int cpu, void *priv); 297 + int (*reset)(unsigned int cpu, void *priv); 298 + void *priv; 299 + }; 300 + 301 + int ring_buffer_poll_remote(struct trace_buffer *buffer, int cpu); 302 + 303 + struct trace_buffer * 304 + 
__ring_buffer_alloc_remote(struct ring_buffer_remote *remote, 305 + struct lock_class_key *key); 306 + 307 + #define ring_buffer_alloc_remote(remote) \ 308 + ({ \ 309 + static struct lock_class_key __key; \ 310 + __ring_buffer_alloc_remote(remote, &__key); \ 311 + }) 254 312 #endif /* _LINUX_RING_BUFFER_H */
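trace_buffer_desc_size() above sizes the shared descriptor as a fixed header plus one ring_buffer_desc per CPU, each carrying a page_va[] entry per data page, and saturates to SIZE_MAX on overflow (hyp_trace_load() checks for exactly that value). The sketch below reproduces the calculation in user space. The demo_* names are hypothetical, the saturating helpers only stand in for the kernel's size_add()/size_mul(), and a 4 KiB page size is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

/* Hypothetical user-space mirrors of the two descriptor structs. */
struct demo_rb_desc {
	int cpu;
	unsigned int nr_page_va;
	unsigned long meta_va;
	unsigned long page_va[];
};

struct demo_trace_desc {
	int nr_cpus;
	size_t struct_len;
	char data[];
};

/* Saturating arithmetic: the result sticks at SIZE_MAX on overflow. */
static size_t demo_size_add(size_t a, size_t b)
{
	return a > SIZE_MAX - b ? SIZE_MAX : a + b;
}

static size_t demo_size_mul(size_t a, size_t b)
{
	return (b && a > SIZE_MAX / b) ? SIZE_MAX : a * b;
}

size_t demo_trace_buffer_desc_size(size_t buffer_size, unsigned int nr_cpus)
{
	/* Round up to whole pages without overflowing buffer_size. */
	size_t nr_pages = buffer_size / DEMO_PAGE_SIZE +
			  (buffer_size % DEMO_PAGE_SIZE != 0);
	size_t per_cpu;

	/* At least two pages, plus one extra as in the diff's "+ 1". */
	if (nr_pages < 2)
		nr_pages = 2;
	nr_pages += 1;

	per_cpu = demo_size_add(sizeof(struct demo_rb_desc),
				demo_size_mul(nr_pages, sizeof(unsigned long)));

	return demo_size_add(offsetof(struct demo_trace_desc, data),
			     demo_size_mul(nr_cpus, per_cpu));
}
```

Saturating instead of wrapping means a caller can detect an impossible request with a single comparison against SIZE_MAX, rather than allocating a buffer that is silently too small.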
+41
include/linux/ring_buffer_types.h
··· 1 + /* SPDX-License-Identifier: GPL-2.0 */ 2 + #ifndef _LINUX_RING_BUFFER_TYPES_H 3 + #define _LINUX_RING_BUFFER_TYPES_H 4 + 5 + #include <asm/local.h> 6 + 7 + #define TS_SHIFT 27 8 + #define TS_MASK ((1ULL << TS_SHIFT) - 1) 9 + #define TS_DELTA_TEST (~TS_MASK) 10 + 11 + /* 12 + * We need to fit the time_stamp delta into 27 bits. 13 + */ 14 + static inline bool test_time_stamp(u64 delta) 15 + { 16 + return !!(delta & TS_DELTA_TEST); 17 + } 18 + 19 + #define BUF_PAGE_HDR_SIZE offsetof(struct buffer_data_page, data) 20 + 21 + #define RB_EVNT_HDR_SIZE (offsetof(struct ring_buffer_event, array)) 22 + #define RB_ALIGNMENT 4U 23 + #define RB_MAX_SMALL_DATA (RB_ALIGNMENT * RINGBUF_TYPE_DATA_TYPE_LEN_MAX) 24 + #define RB_EVNT_MIN_SIZE 8U /* two 32bit words */ 25 + 26 + #ifndef CONFIG_HAVE_64BIT_ALIGNED_ACCESS 27 + # define RB_FORCE_8BYTE_ALIGNMENT 0 28 + # define RB_ARCH_ALIGNMENT RB_ALIGNMENT 29 + #else 30 + # define RB_FORCE_8BYTE_ALIGNMENT 1 31 + # define RB_ARCH_ALIGNMENT 8U 32 + #endif 33 + 34 + #define RB_ALIGN_DATA __aligned(RB_ARCH_ALIGNMENT) 35 + 36 + struct buffer_data_page { 37 + u64 time_stamp; /* page time stamp */ 38 + local_t commit; /* write committed index */ 39 + unsigned char data[] RB_ALIGN_DATA; /* data of buffer page */ 40 + }; 41 + #endif
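The header above moves the 27-bit timestamp-delta constants into a shared file so that both the kernel ring buffer and the hypervisor-side simple ring buffer can use them. A brief sketch of the check, with DEMO_-prefixed copies of the macros so it compiles outside the kernel:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_TS_SHIFT		27
#define DEMO_TS_MASK		((1ULL << DEMO_TS_SHIFT) - 1)
#define DEMO_TS_DELTA_TEST	(~DEMO_TS_MASK)

/* True when @delta does not fit in the 27-bit field of an event header. */
bool demo_test_time_stamp(uint64_t delta)
{
	return !!(delta & DEMO_TS_DELTA_TEST);
}
```

With nanosecond timestamps, 2^27 ns is roughly 134 ms, so any gap between two events longer than that cannot be encoded in the event header alone.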
+65
include/linux/simple_ring_buffer.h
··· 1 + /* SPDX-License-Identifier: GPL-2.0 */ 2 + #ifndef _LINUX_SIMPLE_RING_BUFFER_H 3 + #define _LINUX_SIMPLE_RING_BUFFER_H 4 + 5 + #include <linux/list.h> 6 + #include <linux/ring_buffer.h> 7 + #include <linux/ring_buffer_types.h> 8 + #include <linux/types.h> 9 + 10 + /* 11 + * Ideally those struct would stay private but the caller needs to know 12 + * the allocation size for simple_ring_buffer_init(). 13 + */ 14 + struct simple_buffer_page { 15 + struct list_head link; 16 + struct buffer_data_page *page; 17 + u64 entries; 18 + u32 write; 19 + u32 id; 20 + }; 21 + 22 + struct simple_rb_per_cpu { 23 + struct simple_buffer_page *tail_page; 24 + struct simple_buffer_page *reader_page; 25 + struct simple_buffer_page *head_page; 26 + struct simple_buffer_page *bpages; 27 + struct trace_buffer_meta *meta; 28 + u32 nr_pages; 29 + 30 + #define SIMPLE_RB_UNAVAILABLE 0 31 + #define SIMPLE_RB_READY 1 32 + #define SIMPLE_RB_WRITING 2 33 + u32 status; 34 + 35 + u64 last_overrun; 36 + u64 write_stamp; 37 + 38 + struct simple_rb_cbs *cbs; 39 + }; 40 + 41 + int simple_ring_buffer_init(struct simple_rb_per_cpu *cpu_buffer, struct simple_buffer_page *bpages, 42 + const struct ring_buffer_desc *desc); 43 + 44 + void simple_ring_buffer_unload(struct simple_rb_per_cpu *cpu_buffer); 45 + 46 + void *simple_ring_buffer_reserve(struct simple_rb_per_cpu *cpu_buffer, unsigned long length, 47 + u64 timestamp); 48 + 49 + void simple_ring_buffer_commit(struct simple_rb_per_cpu *cpu_buffer); 50 + 51 + int simple_ring_buffer_enable_tracing(struct simple_rb_per_cpu *cpu_buffer, bool enable); 52 + 53 + int simple_ring_buffer_reset(struct simple_rb_per_cpu *cpu_buffer); 54 + 55 + int simple_ring_buffer_swap_reader_page(struct simple_rb_per_cpu *cpu_buffer); 56 + 57 + int simple_ring_buffer_init_mm(struct simple_rb_per_cpu *cpu_buffer, 58 + struct simple_buffer_page *bpages, 59 + const struct ring_buffer_desc *desc, 60 + void *(*load_page)(unsigned long va), 61 + void (*unload_page)(void *va)); 
62 + 63 + void simple_ring_buffer_unload_mm(struct simple_rb_per_cpu *cpu_buffer, 64 + void (*unload_page)(void *)); 65 + #endif
+48
include/linux/trace_remote.h
··· 1 + /* SPDX-License-Identifier: GPL-2.0 */ 2 + 3 + #ifndef _LINUX_TRACE_REMOTE_H 4 + #define _LINUX_TRACE_REMOTE_H 5 + 6 + #include <linux/dcache.h> 7 + #include <linux/ring_buffer.h> 8 + #include <linux/trace_remote_event.h> 9 + 10 + /** 11 + * struct trace_remote_callbacks - Callbacks used by Tracefs to control the remote 12 + * @init: Called once the remote has been registered. Allows the 13 + * caller to extend the Tracefs remote directory 14 + * @load_trace_buffer: Called before Tracefs accesses the trace buffer for the first 15 + * time. Must return a &trace_buffer_desc 16 + * (most likely filled with trace_remote_alloc_buffer()) 17 + * @unload_trace_buffer: 18 + * Called once Tracefs has no use for the trace buffer 19 + * (most likely call trace_remote_free_buffer()) 20 + * @enable_tracing: Called on Tracefs tracing_on. It is expected from the 21 + * remote to allow writing. 22 + * @swap_reader_page: Called when Tracefs consumes a new page from a 23 + * ring-buffer. It is expected from the remote to isolate a 24 + * new reader-page from the @cpu ring-buffer. 25 + * @reset: Called on `echo 0 > trace`. It is expected from the 26 + * remote to reset all ring-buffer pages. 27 + * @enable_event: Called on events/event_name/enable. It is expected from 28 + * the remote to allow writing of event @id. 
29 + */ 30 + struct trace_remote_callbacks { 31 + int (*init)(struct dentry *d, void *priv); 32 + struct trace_buffer_desc *(*load_trace_buffer)(unsigned long size, void *priv); 33 + void (*unload_trace_buffer)(struct trace_buffer_desc *desc, void *priv); 34 + int (*enable_tracing)(bool enable, void *priv); 35 + int (*swap_reader_page)(unsigned int cpu, void *priv); 36 + int (*reset)(unsigned int cpu, void *priv); 37 + int (*enable_event)(unsigned short id, bool enable, void *priv); 38 + }; 39 + 40 + int trace_remote_register(const char *name, struct trace_remote_callbacks *cbs, void *priv, 41 + struct remote_event *events, size_t nr_events); 42 + 43 + int trace_remote_alloc_buffer(struct trace_buffer_desc *desc, size_t desc_size, size_t buffer_size, 44 + const struct cpumask *cpumask); 45 + 46 + void trace_remote_free_buffer(struct trace_buffer_desc *desc); 47 + 48 + #endif
+33
include/linux/trace_remote_event.h
··· 1 + /* SPDX-License-Identifier: GPL-2.0 */ 2 + 3 + #ifndef _LINUX_TRACE_REMOTE_EVENTS_H 4 + #define _LINUX_TRACE_REMOTE_EVENTS_H 5 + 6 + struct trace_remote; 7 + struct trace_event_fields; 8 + struct trace_seq; 9 + 10 + struct remote_event_hdr { 11 + unsigned short id; 12 + }; 13 + 14 + #define REMOTE_EVENT_NAME_MAX 30 15 + struct remote_event { 16 + char name[REMOTE_EVENT_NAME_MAX]; 17 + unsigned short id; 18 + bool enabled; 19 + struct trace_remote *remote; 20 + struct trace_event_fields *fields; 21 + char *print_fmt; 22 + void (*print)(void *evt, struct trace_seq *seq); 23 + }; 24 + 25 + #define RE_STRUCT(__args...) __args 26 + #define re_field(__type, __field) __type __field; 27 + 28 + #define REMOTE_EVENT_FORMAT(__name, __struct) \ 29 + struct remote_event_format_##__name { \ 30 + struct remote_event_hdr hdr; \ 31 + __struct \ 32 + } 33 + #endif
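As an illustration of how REMOTE_EVENT_FORMAT(), RE_STRUCT() and re_field() compose, the following userspace sketch reproduces the macros from this header and expands them for the selftest event. The u64 typedef, the standard __VA_ARGS__ spelling of RE_STRUCT() and the selftest_* helpers are assumptions made so that the snippet builds outside the kernel; they are not part of the patch.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t u64;	/* assumption: userspace stand-in for the kernel type */

struct remote_event_hdr {
	unsigned short id;
};

/* Same shape as the header, spelled with standard variadic macros */
#define RE_STRUCT(...)			__VA_ARGS__
#define re_field(__type, __field)	__type __field;

#define REMOTE_EVENT_FORMAT(__name, __struct)	\
	struct remote_event_format_##__name {	\
		struct remote_event_hdr hdr;	\
		__struct			\
	}

/*
 * Expands to:
 *   struct remote_event_format_selftest {
 *           struct remote_event_hdr hdr;
 *           u64 id;
 *   };
 */
REMOTE_EVENT_FORMAT(selftest, RE_STRUCT(re_field(u64, id)));

/* Hypothetical helpers, only here to inspect the generated layout */
size_t selftest_id_offset(void)
{
	return offsetof(struct remote_event_format_selftest, id);
}

size_t selftest_size(void)
{
	return sizeof(struct remote_event_format_selftest);
}

/* Hypothetical helper: fill an event the way a writer would after reserving space */
void selftest_fill(struct remote_event_format_selftest *evt,
		   unsigned short event_id, u64 val)
{
	evt->hdr.id = event_id;	/* common header, used to look up the event */
	evt->id = val;		/* payload declared through re_field() */
}
```

Every generated record starts with the common remote_event_hdr, followed by the fields declared through re_field(). The payload offset depends on the ABI's alignment of u64, which is why the sketch queries it with offsetof() rather than assuming a fixed value.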
+73
include/trace/define_remote_events.h
··· 1 + /* SPDX-License-Identifier: GPL-2.0 */ 2 + 3 + #include <linux/trace_events.h> 4 + #include <linux/trace_remote_event.h> 5 + #include <linux/trace_seq.h> 6 + #include <linux/stringify.h> 7 + 8 + #define REMOTE_EVENT_INCLUDE(__file) __stringify(../../__file) 9 + 10 + #ifdef REMOTE_EVENT_SECTION 11 + # define __REMOTE_EVENT_SECTION(__name) __used __section(REMOTE_EVENT_SECTION"."#__name) 12 + #else 13 + # define __REMOTE_EVENT_SECTION(__name) 14 + #endif 15 + 16 + #define REMOTE_PRINTK_COUNT_ARGS(__args...) \ 17 + __COUNT_ARGS(, ##__args, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 0) 18 + 19 + #define __remote_printk0() \ 20 + trace_seq_putc(seq, '\n') 21 + 22 + #define __remote_printk1(__fmt) \ 23 + trace_seq_puts(seq, " " __fmt "\n") \ 24 + 25 + #define __remote_printk2(__fmt, __args...) \ 26 + do { \ 27 + trace_seq_putc(seq, ' '); \ 28 + trace_seq_printf(seq, __fmt, __args); \ 29 + trace_seq_putc(seq, '\n'); \ 30 + } while (0) 31 + 32 + /* Apply the appropriate trace_seq sequence according to the number of arguments */ 33 + #define remote_printk(__args...) \ 34 + CONCATENATE(__remote_printk, REMOTE_PRINTK_COUNT_ARGS(__args))(__args) 35 + 36 + #define RE_PRINTK(__args...) __args 37 + 38 + #define REMOTE_EVENT(__name, __id, __struct, __printk) \ 39 + REMOTE_EVENT_FORMAT(__name, __struct); \ 40 + static void remote_event_print_##__name(void *evt, struct trace_seq *seq) \ 41 + { \ 42 + struct remote_event_format_##__name __maybe_unused *__entry = evt; \ 43 + trace_seq_puts(seq, #__name); \ 44 + remote_printk(__printk); \ 45 + } 46 + #include REMOTE_EVENT_INCLUDE(REMOTE_EVENT_INCLUDE_FILE) 47 + 48 + #undef REMOTE_EVENT 49 + #undef RE_PRINTK 50 + #undef re_field 51 + #define re_field(__type, __field) \ 52 + { \ 53 + .type = #__type, .name = #__field, \ 54 + .size = sizeof(__type), .align = __alignof__(__type), \ 55 + .is_signed = is_signed_type(__type), \ 56 + }, 57 + #define __entry REC 58 + #define RE_PRINTK(__fmt, __args...) 
"\"" __fmt "\", " __stringify(__args) 59 + #define REMOTE_EVENT(__name, __id, __struct, __printk) \ 60 + static struct trace_event_fields remote_event_fields_##__name[] = { \ 61 + __struct \ 62 + {} \ 63 + }; \ 64 + static char remote_event_print_fmt_##__name[] = __printk; \ 65 + static struct remote_event __REMOTE_EVENT_SECTION(__name) \ 66 + remote_event_##__name = { \ 67 + .name = #__name, \ 68 + .id = __id, \ 69 + .fields = remote_event_fields_##__name, \ 70 + .print_fmt = remote_event_print_fmt_##__name, \ 71 + .print = remote_event_print_##__name, \ 72 + } 73 + #include REMOTE_EVENT_INCLUDE(REMOTE_EVENT_INCLUDE_FILE)
+4 -4
include/uapi/linux/trace_mmap.h
··· 17 17 * @entries: Number of entries in the ring-buffer. 18 18 * @overrun: Number of entries lost in the ring-buffer. 19 19 * @read: Number of entries that have been read. 20 - * @Reserved1: Internal use only. 21 - * @Reserved2: Internal use only. 20 + * @pages_lost: Number of pages overwritten by the writer. 21 + * @pages_touched: Number of pages written by the writer. 22 22 */ 23 23 struct trace_buffer_meta { 24 24 __u32 meta_page_size; ··· 39 39 __u64 overrun; 40 40 __u64 read; 41 41 42 - __u64 Reserved1; 43 - __u64 Reserved2; 42 + __u64 pages_lost; 43 + __u64 pages_touched; 44 44 }; 45 45 46 46 #define TRACE_MMAP_IOCTL_GET_READER _IO('R', 0x20)
+14
kernel/trace/Kconfig
··· 1281 1281 1282 1282 source "kernel/trace/rv/Kconfig" 1283 1283 1284 + config TRACE_REMOTE 1285 + bool 1286 + 1287 + config SIMPLE_RING_BUFFER 1288 + bool 1289 + 1290 + config TRACE_REMOTE_TEST 1291 + tristate "Test module for remote tracing" 1292 + select TRACE_REMOTE 1293 + select SIMPLE_RING_BUFFER 1294 + help 1295 + This trace remote includes a ring-buffer writer implementation using 1296 + "simple_ring_buffer". This is intended solely for testing. 1297 + 1284 1298 endif # FTRACE
+59
kernel/trace/Makefile
··· 128 128 obj-$(CONFIG_TRACEPOINT_BENCHMARK) += trace_benchmark.o 129 129 obj-$(CONFIG_RV) += rv/ 130 130 131 + obj-$(CONFIG_TRACE_REMOTE) += trace_remote.o 132 + obj-$(CONFIG_SIMPLE_RING_BUFFER) += simple_ring_buffer.o 133 + obj-$(CONFIG_TRACE_REMOTE_TEST) += remote_test.o 134 + 135 + # 136 + # simple_ring_buffer is used by the pKVM hypervisor which does not have access 137 + # to all kernel symbols. Fail the build if forbidden symbols are found. 138 + # 139 + # undefsyms_base generates a set of compiler and tooling-generated symbols that can 140 + # safely be ignored for simple_ring_buffer. 141 + # 142 + filechk_undefsyms_base = \ 143 + echo '$(pound)include <linux/atomic.h>'; \ 144 + echo '$(pound)include <linux/string.h>'; \ 145 + echo '$(pound)include <asm/page.h>'; \ 146 + echo 'static char page[PAGE_SIZE] __aligned(PAGE_SIZE);'; \ 147 + echo 'void undefsyms_base(void *p, int n);'; \ 148 + echo 'void undefsyms_base(void *p, int n) {'; \ 149 + echo ' char buffer[256] = { 0 };'; \ 150 + echo ' u32 u = 0;'; \ 151 + echo ' memset((char * volatile)page, 8, PAGE_SIZE);'; \ 152 + echo ' memset((char * volatile)buffer, 8, sizeof(buffer));'; \ 153 + echo ' memcpy((void * volatile)p, buffer, sizeof(buffer));'; \ 154 + echo ' cmpxchg((u32 * volatile)&u, 0, 8);'; \ 155 + echo ' WARN_ON(n == 0xdeadbeef);'; \ 156 + echo '}' 157 + 158 + $(obj)/undefsyms_base.c: FORCE 159 + $(call filechk,undefsyms_base) 160 + 161 + clean-files += undefsyms_base.c 162 + 163 + $(obj)/undefsyms_base.o: $(obj)/undefsyms_base.c 164 + 165 + targets += undefsyms_base.o 166 + 167 + # Ensure KASAN is enabled to avoid logic that may disable FORTIFY_SOURCE when 168 + # KASAN is not enabled. undefsyms_base.o does not automatically get KASAN flags 169 + # because it is not linked into vmlinux. 
170 + KASAN_SANITIZE_undefsyms_base.o := y 171 + 172 + UNDEFINED_ALLOWLIST = __asan __gcov __kasan __kcsan __hwasan __sancov __sanitizer __tsan __ubsan __x86_indirect_thunk \ 173 + __msan simple_ring_buffer \ 174 + $(shell $(NM) -u $(obj)/undefsyms_base.o 2>/dev/null | awk '{print $$2}') 175 + 176 + quiet_cmd_check_undefined = NM $< 177 + cmd_check_undefined = \ 178 + undefsyms=$$($(NM) -u $< | grep -v $(addprefix -e , $(UNDEFINED_ALLOWLIST)) || true); \ 179 + if [ -n "$$undefsyms" ]; then \ 180 + echo "Unexpected symbols in $<:" >&2; \ 181 + echo "$$undefsyms" >&2; \ 182 + false; \ 183 + fi 184 + 185 + $(obj)/%.o.checked: $(obj)/%.o $(obj)/undefsyms_base.o FORCE 186 + $(call if_changed,check_undefined) 187 + 188 + always-$(CONFIG_SIMPLE_RING_BUFFER) += simple_ring_buffer.o.checked 189 + 131 190 libftrace-y := ftrace.o
+261
kernel/trace/remote_test.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* 3 + * Copyright (C) 2025 - Google LLC 4 + * Author: Vincent Donnefort <vdonnefort@google.com> 5 + */ 6 + 7 + #include <linux/module.h> 8 + #include <linux/simple_ring_buffer.h> 9 + #include <linux/trace_remote.h> 10 + #include <linux/tracefs.h> 11 + #include <linux/types.h> 12 + 13 + #define REMOTE_EVENT_INCLUDE_FILE kernel/trace/remote_test_events.h 14 + #include <trace/define_remote_events.h> 15 + 16 + static DEFINE_PER_CPU(struct simple_rb_per_cpu *, simple_rbs); 17 + static struct trace_buffer_desc *remote_test_buffer_desc; 18 + 19 + /* 20 + * The trace_remote lock already serializes accesses from the trace_remote_callbacks. 21 + * However write_event can still race with load/unload. 22 + */ 23 + static DEFINE_MUTEX(simple_rbs_lock); 24 + 25 + static int remote_test_load_simple_rb(int cpu, struct ring_buffer_desc *rb_desc) 26 + { 27 + struct simple_rb_per_cpu *cpu_buffer; 28 + struct simple_buffer_page *bpages; 29 + int ret = -ENOMEM; 30 + 31 + cpu_buffer = kmalloc_obj(*cpu_buffer); 32 + if (!cpu_buffer) 33 + return ret; 34 + 35 + bpages = kmalloc_objs(*bpages, rb_desc->nr_page_va); 36 + if (!bpages) 37 + goto err_free_cpu_buffer; 38 + 39 + ret = simple_ring_buffer_init(cpu_buffer, bpages, rb_desc); 40 + if (ret) 41 + goto err_free_bpages; 42 + 43 + scoped_guard(mutex, &simple_rbs_lock) { 44 + WARN_ON(*per_cpu_ptr(&simple_rbs, cpu)); 45 + *per_cpu_ptr(&simple_rbs, cpu) = cpu_buffer; 46 + } 47 + 48 + return 0; 49 + 50 + err_free_bpages: 51 + kfree(bpages); 52 + 53 + err_free_cpu_buffer: 54 + kfree(cpu_buffer); 55 + 56 + return ret; 57 + } 58 + 59 + static void remote_test_unload_simple_rb(int cpu) 60 + { 61 + struct simple_rb_per_cpu *cpu_buffer = *per_cpu_ptr(&simple_rbs, cpu); 62 + struct simple_buffer_page *bpages; 63 + 64 + if (!cpu_buffer) 65 + return; 66 + 67 + guard(mutex)(&simple_rbs_lock); 68 + 69 + bpages = cpu_buffer->bpages; 70 + simple_ring_buffer_unload(cpu_buffer); 71 + kfree(bpages); 72 + 
kfree(cpu_buffer); 73 + *per_cpu_ptr(&simple_rbs, cpu) = NULL; 74 + } 75 + 76 + static struct trace_buffer_desc *remote_test_load(unsigned long size, void *unused) 77 + { 78 + struct ring_buffer_desc *rb_desc; 79 + struct trace_buffer_desc *desc; 80 + size_t desc_size; 81 + int cpu, ret; 82 + 83 + if (WARN_ON(remote_test_buffer_desc)) 84 + return ERR_PTR(-EINVAL); 85 + 86 + desc_size = trace_buffer_desc_size(size, num_possible_cpus()); 87 + if (desc_size == SIZE_MAX) { 88 + ret = -E2BIG; 89 + goto err; 90 + } 91 + 92 + desc = kmalloc(desc_size, GFP_KERNEL); 93 + if (!desc) { 94 + ret = -ENOMEM; 95 + goto err; 96 + } 97 + 98 + ret = trace_remote_alloc_buffer(desc, desc_size, size, cpu_possible_mask); 99 + if (ret) 100 + goto err_free_desc; 101 + 102 + for_each_ring_buffer_desc(rb_desc, cpu, desc) { 103 + ret = remote_test_load_simple_rb(rb_desc->cpu, rb_desc); 104 + if (ret) 105 + goto err_unload; 106 + } 107 + 108 + remote_test_buffer_desc = desc; 109 + 110 + return remote_test_buffer_desc; 111 + 112 + err_unload: 113 + for_each_ring_buffer_desc(rb_desc, cpu, desc) 114 + remote_test_unload_simple_rb(rb_desc->cpu); 115 + trace_remote_free_buffer(desc); 116 + 117 + err_free_desc: 118 + kfree(desc); 119 + 120 + err: 121 + return ERR_PTR(ret); 122 + } 123 + 124 + static void remote_test_unload(struct trace_buffer_desc *desc, void *unused) 125 + { 126 + struct ring_buffer_desc *rb_desc; 127 + int cpu; 128 + 129 + if (WARN_ON(desc != remote_test_buffer_desc)) 130 + return; 131 + 132 + for_each_ring_buffer_desc(rb_desc, cpu, desc) 133 + remote_test_unload_simple_rb(rb_desc->cpu); 134 + 135 + remote_test_buffer_desc = NULL; 136 + trace_remote_free_buffer(desc); 137 + kfree(desc); 138 + } 139 + 140 + static int remote_test_enable_tracing(bool enable, void *unused) 141 + { 142 + struct ring_buffer_desc *rb_desc; 143 + int cpu; 144 + 145 + if (!remote_test_buffer_desc) 146 + return -ENODEV; 147 + 148 + for_each_ring_buffer_desc(rb_desc, 
cpu, remote_test_buffer_desc) 149 + WARN_ON(simple_ring_buffer_enable_tracing(*per_cpu_ptr(&simple_rbs, rb_desc->cpu), 150 + enable)); 151 + return 0; 152 + } 153 + 154 + static int remote_test_swap_reader_page(unsigned int cpu, void *unused) 155 + { 156 + struct simple_rb_per_cpu *cpu_buffer; 157 + 158 + if (cpu >= NR_CPUS) 159 + return -EINVAL; 160 + 161 + cpu_buffer = *per_cpu_ptr(&simple_rbs, cpu); 162 + if (!cpu_buffer) 163 + return -EINVAL; 164 + 165 + return simple_ring_buffer_swap_reader_page(cpu_buffer); 166 + } 167 + 168 + static int remote_test_reset(unsigned int cpu, void *unused) 169 + { 170 + struct simple_rb_per_cpu *cpu_buffer; 171 + 172 + if (cpu >= NR_CPUS) 173 + return -EINVAL; 174 + 175 + cpu_buffer = *per_cpu_ptr(&simple_rbs, cpu); 176 + if (!cpu_buffer) 177 + return -EINVAL; 178 + 179 + return simple_ring_buffer_reset(cpu_buffer); 180 + } 181 + 182 + static int remote_test_enable_event(unsigned short id, bool enable, void *unused) 183 + { 184 + if (id != REMOTE_TEST_EVENT_ID) 185 + return -EINVAL; 186 + 187 + /* 188 + * Let's just use the struct remote_event enabled field that is turned on and off by 189 + * trace_remote. This is a bit racy but good enough for a simple test module. 
190 + */ 191 + return 0; 192 + } 193 + 194 + static ssize_t 195 + write_event_write(struct file *filp, const char __user *ubuf, size_t cnt, loff_t *pos) 196 + { 197 + struct remote_event_format_selftest *evt_test; 198 + struct simple_rb_per_cpu *cpu_buffer; 199 + unsigned long val; 200 + int ret; 201 + 202 + ret = kstrtoul_from_user(ubuf, cnt, 10, &val); 203 + if (ret) 204 + return ret; 205 + 206 + guard(mutex)(&simple_rbs_lock); 207 + 208 + if (!remote_event_selftest.enabled) 209 + return -ENODEV; 210 + 211 + guard(preempt)(); 212 + 213 + cpu_buffer = *this_cpu_ptr(&simple_rbs); 214 + if (!cpu_buffer) 215 + return -ENODEV; 216 + 217 + evt_test = simple_ring_buffer_reserve(cpu_buffer, 218 + sizeof(struct remote_event_format_selftest), 219 + trace_clock_global()); 220 + if (!evt_test) 221 + return -ENODEV; 222 + 223 + evt_test->hdr.id = REMOTE_TEST_EVENT_ID; 224 + evt_test->id = val; 225 + 226 + simple_ring_buffer_commit(cpu_buffer); 227 + 228 + return cnt; 229 + } 230 + 231 + static const struct file_operations write_event_fops = { 232 + .write = write_event_write, 233 + }; 234 + 235 + static int remote_test_init_tracefs(struct dentry *d, void *unused) 236 + { 237 + return tracefs_create_file("write_event", 0200, d, NULL, &write_event_fops) ? 
238 + 0 : -ENOMEM; 239 + } 240 + 241 + static struct trace_remote_callbacks trace_remote_callbacks = { 242 + .init = remote_test_init_tracefs, 243 + .load_trace_buffer = remote_test_load, 244 + .unload_trace_buffer = remote_test_unload, 245 + .enable_tracing = remote_test_enable_tracing, 246 + .swap_reader_page = remote_test_swap_reader_page, 247 + .reset = remote_test_reset, 248 + .enable_event = remote_test_enable_event, 249 + }; 250 + 251 + static int __init remote_test_init(void) 252 + { 253 + return trace_remote_register("test", &trace_remote_callbacks, NULL, 254 + &remote_event_selftest, 1); 255 + } 256 + 257 + module_init(remote_test_init); 258 + 259 + MODULE_DESCRIPTION("Test module for the trace remote interface"); 260 + MODULE_AUTHOR("Vincent Donnefort"); 261 + MODULE_LICENSE("GPL");
+10
kernel/trace/remote_test_events.h
··· 1 + /* SPDX-License-Identifier: GPL-2.0 */ 2 + 3 + #define REMOTE_TEST_EVENT_ID 1 4 + 5 + REMOTE_EVENT(selftest, REMOTE_TEST_EVENT_ID, 6 + RE_STRUCT( 7 + re_field(u64, id) 8 + ), 9 + RE_PRINTK("id=%llu", __entry->id) 10 + );
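The RE_PRINTK() argument of the selftest event is consumed twice by include/trace/define_remote_events.h: once to build the print function, and once to build the print_fmt string. The userspace sketch below models both passes under stated assumptions: snprintf() stands in for the trace_seq helpers, every SKETCH_*/sketch_* name and the sketch_selftest_event struct are hypothetical, and the argument counter is a shortened version of REMOTE_PRINTK_COUNT_ARGS(). It relies on the GNU ", ##__VA_ARGS__" extension, so it expects the default GNU C dialect of gcc or clang.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the record generated by REMOTE_EVENT_FORMAT() */
struct sketch_selftest_event {
	unsigned long long id;
};

/* Pick the fifth argument; the empty leading slot keeps the zero-argument case valid */
#define SKETCH_PICK(_0, _1, _2, _3, _n, ...) _n
/* 0 arguments -> 0, 1 argument -> 1, 2 or 3 arguments -> 2 */
#define SKETCH_COUNT(...) SKETCH_PICK(, ##__VA_ARGS__, 2, 2, 1, 0)

/* Two-level paste so that SKETCH_COUNT() is expanded before ## is applied */
#define SKETCH_CONCAT_(a, b) a##b
#define SKETCH_CONCAT(a, b) SKETCH_CONCAT_(a, b)

/* Like the kernel helpers use `seq`, these rely on `buf` and `len` being in scope */
#define sketch_printk0()		snprintf(buf, len, "\n")
#define sketch_printk1(fmt)		snprintf(buf, len, " " fmt "\n")
#define sketch_printk2(fmt, ...)	snprintf(buf, len, " " fmt "\n", __VA_ARGS__)
#define sketch_printk(...) \
	SKETCH_CONCAT(sketch_printk, SKETCH_COUNT(__VA_ARGS__))(__VA_ARGS__)

/* Pass 1: RE_PRINTK() just forwards its arguments to the print helper */
#define RE_PRINTK(...) __VA_ARGS__

int selftest_print(const struct sketch_selftest_event *__entry, char *buf, size_t len)
{
	/* Two arguments, so this dispatches to sketch_printk2() */
	return sketch_printk(RE_PRINTK("id=%llu", __entry->id));
}

/* Pass 2: the same arguments are turned into the print_fmt string */
#undef RE_PRINTK
#define SKETCH_STRINGIFY_1(...) #__VA_ARGS__
#define SKETCH_STRINGIFY(...) SKETCH_STRINGIFY_1(__VA_ARGS__)
#define __entry REC
#define RE_PRINTK(fmt, ...) "\"" fmt "\", " SKETCH_STRINGIFY(__VA_ARGS__)

const char *selftest_print_fmt(void)
{
	/* __entry is replaced by REC before stringification */
	return RE_PRINTK("id=%llu", __entry->id);
}
```

In the kernel version, remote_event_print_selftest() additionally emits the event name with trace_seq_puts() before the formatted arguments. The second pass yields the usual tracefs format convention, where fields are referenced through REC.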
+303 -51
kernel/trace/ring_buffer.c
··· 4 4 * 5 5 * Copyright (C) 2008 Steven Rostedt <srostedt@redhat.com> 6 6 */ 7 + #include <linux/ring_buffer_types.h> 7 8 #include <linux/sched/isolation.h> 8 9 #include <linux/trace_recursion.h> 9 10 #include <linux/trace_events.h> ··· 158 157 /* Used for individual buffers (after the counter) */ 159 158 #define RB_BUFFER_OFF (1 << 20) 160 159 161 - #define BUF_PAGE_HDR_SIZE offsetof(struct buffer_data_page, data) 162 - 163 - #define RB_EVNT_HDR_SIZE (offsetof(struct ring_buffer_event, array)) 164 - #define RB_ALIGNMENT 4U 165 - #define RB_MAX_SMALL_DATA (RB_ALIGNMENT * RINGBUF_TYPE_DATA_TYPE_LEN_MAX) 166 - #define RB_EVNT_MIN_SIZE 8U /* two 32bit words */ 167 - 168 - #ifndef CONFIG_HAVE_64BIT_ALIGNED_ACCESS 169 - # define RB_FORCE_8BYTE_ALIGNMENT 0 170 - # define RB_ARCH_ALIGNMENT RB_ALIGNMENT 171 - #else 172 - # define RB_FORCE_8BYTE_ALIGNMENT 1 173 - # define RB_ARCH_ALIGNMENT 8U 174 - #endif 175 - 176 - #define RB_ALIGN_DATA __aligned(RB_ARCH_ALIGNMENT) 177 - 178 160 /* define RINGBUF_TYPE_DATA for 'case RINGBUF_TYPE_DATA:' */ 179 161 #define RINGBUF_TYPE_DATA 0 ... 
RINGBUF_TYPE_DATA_TYPE_LEN_MAX 180 162 ··· 300 316 #define for_each_online_buffer_cpu(buffer, cpu) \ 301 317 for_each_cpu_and(cpu, buffer->cpumask, cpu_online_mask) 302 318 303 - #define TS_SHIFT 27 304 - #define TS_MASK ((1ULL << TS_SHIFT) - 1) 305 - #define TS_DELTA_TEST (~TS_MASK) 306 - 307 319 static u64 rb_event_time_stamp(struct ring_buffer_event *event) 308 320 { 309 321 u64 ts; ··· 317 337 #define RB_MISSED_STORED (1 << 30) 318 338 319 339 #define RB_MISSED_MASK (3 << 30) 320 - 321 - struct buffer_data_page { 322 - u64 time_stamp; /* page time stamp */ 323 - local_t commit; /* write committed index */ 324 - unsigned char data[] RB_ALIGN_DATA; /* data of buffer page */ 325 - }; 326 340 327 341 struct buffer_data_read_page { 328 342 unsigned order; /* order of the page */ ··· 409 435 rb_init_page(dpage); 410 436 411 437 return dpage; 412 - } 413 - 414 - /* 415 - * We need to fit the time_stamp delta into 27 bits. 416 - */ 417 - static inline bool test_time_stamp(u64 delta) 418 - { 419 - return !!(delta & TS_DELTA_TEST); 420 438 } 421 439 422 440 struct rb_irq_work { ··· 521 555 unsigned int mapped; 522 556 unsigned int user_mapped; /* user space mapping */ 523 557 struct mutex mapping_lock; 524 - unsigned long *subbuf_ids; /* ID to subbuf VA */ 558 + struct buffer_page **subbuf_ids; /* ID to subbuf VA */ 525 559 struct trace_buffer_meta *meta_page; 526 560 struct ring_buffer_cpu_meta *ring_meta; 561 + 562 + struct ring_buffer_remote *remote; 527 563 528 564 /* ring buffer pages to update, > 0 to add, < 0 to remove */ 529 565 long nr_pages_to_update; ··· 548 580 struct mutex mutex; 549 581 550 582 struct ring_buffer_per_cpu **buffers; 583 + 584 + struct ring_buffer_remote *remote; 551 585 552 586 struct hlist_node node; 553 587 u64 (*clock)(void); ··· 606 636 trace_seq_printf(s, "\tfield: char data;\t" 607 637 "offset:%u;\tsize:%u;\tsigned:%u;\n", 608 638 (unsigned int)offsetof(typeof(field), data), 609 - (unsigned int)buffer->subbuf_size, 639 + (unsigned 
int)(buffer ? buffer->subbuf_size : 640 + PAGE_SIZE - BUF_PAGE_HDR_SIZE), 610 641 (unsigned int)is_signed_type(char)); 611 642 612 643 return !trace_seq_has_overflowed(s); ··· 2209 2238 } 2210 2239 } 2211 2240 2241 + static struct ring_buffer_desc *ring_buffer_desc(struct trace_buffer_desc *trace_desc, int cpu) 2242 + { 2243 + struct ring_buffer_desc *desc, *end; 2244 + size_t len; 2245 + int i; 2246 + 2247 + if (!trace_desc) 2248 + return NULL; 2249 + 2250 + if (cpu >= trace_desc->nr_cpus) 2251 + return NULL; 2252 + 2253 + end = (struct ring_buffer_desc *)((void *)trace_desc + trace_desc->struct_len); 2254 + desc = __first_ring_buffer_desc(trace_desc); 2255 + len = struct_size(desc, page_va, desc->nr_page_va); 2256 + desc = (struct ring_buffer_desc *)((void *)desc + (len * cpu)); 2257 + 2258 + if (desc < end && desc->cpu == cpu) 2259 + return desc; 2260 + 2261 + /* Missing CPUs, need to linear search */ 2262 + for_each_ring_buffer_desc(desc, i, trace_desc) { 2263 + if (desc->cpu == cpu) 2264 + return desc; 2265 + } 2266 + 2267 + return NULL; 2268 + } 2269 + 2270 + static void *ring_buffer_desc_page(struct ring_buffer_desc *desc, int page_id) 2271 + { 2272 + return page_id > desc->nr_page_va ? 
NULL : (void *)desc->page_va[page_id]; 2273 + } 2274 + 2212 2275 static int __rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer, 2213 2276 long nr_pages, struct list_head *pages) 2214 2277 { ··· 2250 2245 struct ring_buffer_cpu_meta *meta = NULL; 2251 2246 struct buffer_page *bpage, *tmp; 2252 2247 bool user_thread = current->mm != NULL; 2248 + struct ring_buffer_desc *desc = NULL; 2253 2249 long i; 2254 2250 2255 2251 /* ··· 2279 2273 if (buffer->range_addr_start) 2280 2274 meta = rb_range_meta(buffer, nr_pages, cpu_buffer->cpu); 2281 2275 2276 + if (buffer->remote) { 2277 + desc = ring_buffer_desc(buffer->remote->desc, cpu_buffer->cpu); 2278 + if (!desc || WARN_ON(desc->nr_page_va != (nr_pages + 1))) 2279 + return -EINVAL; 2280 + } 2281 + 2282 2282 for (i = 0; i < nr_pages; i++) { 2283 2283 2284 2284 bpage = alloc_cpu_page(cpu_buffer->cpu); ··· 2309 2297 rb_meta_buffer_update(cpu_buffer, bpage); 2310 2298 bpage->range = 1; 2311 2299 bpage->id = i + 1; 2300 + } else if (desc) { 2301 + void *p = ring_buffer_desc_page(desc, i + 1); 2302 + 2303 + if (WARN_ON(!p)) 2304 + goto free_pages; 2305 + 2306 + bpage->page = p; 2307 + bpage->range = 1; /* bpage->page can't be freed */ 2308 + bpage->id = i + 1; 2309 + cpu_buffer->subbuf_ids[i + 1] = bpage; 2312 2310 } else { 2313 2311 int order = cpu_buffer->buffer->subbuf_order; 2314 2312 bpage->page = alloc_cpu_data(cpu_buffer->cpu, order); ··· 2416 2394 if (cpu_buffer->ring_meta->head_buffer) 2417 2395 rb_meta_buffer_update(cpu_buffer, bpage); 2418 2396 bpage->range = 1; 2397 + } else if (buffer->remote) { 2398 + struct ring_buffer_desc *desc = ring_buffer_desc(buffer->remote->desc, cpu); 2399 + 2400 + if (!desc) 2401 + goto fail_free_reader; 2402 + 2403 + cpu_buffer->remote = buffer->remote; 2404 + cpu_buffer->meta_page = (struct trace_buffer_meta *)(void *)desc->meta_va; 2405 + cpu_buffer->nr_pages = nr_pages; 2406 + cpu_buffer->subbuf_ids = kcalloc(cpu_buffer->nr_pages + 1, 2407 + sizeof(*cpu_buffer->subbuf_ids), 
GFP_KERNEL); 2408 + if (!cpu_buffer->subbuf_ids) 2409 + goto fail_free_reader; 2410 + 2411 + /* Remote buffers are read-only and immutable */ 2412 + atomic_inc(&cpu_buffer->record_disabled); 2413 + atomic_inc(&cpu_buffer->resize_disabled); 2414 + 2415 + bpage->page = ring_buffer_desc_page(desc, cpu_buffer->meta_page->reader.id); 2416 + if (!bpage->page) 2417 + goto fail_free_reader; 2418 + 2419 + bpage->range = 1; 2420 + cpu_buffer->subbuf_ids[0] = bpage; 2419 2421 } else { 2420 2422 int order = cpu_buffer->buffer->subbuf_order; 2421 2423 bpage->page = alloc_cpu_data(cpu, order); ··· 2499 2453 2500 2454 irq_work_sync(&cpu_buffer->irq_work.work); 2501 2455 2456 + if (cpu_buffer->remote) 2457 + kfree(cpu_buffer->subbuf_ids); 2458 + 2502 2459 free_buffer_page(cpu_buffer->reader_page); 2503 2460 2504 2461 if (head) { ··· 2524 2475 int order, unsigned long start, 2525 2476 unsigned long end, 2526 2477 unsigned long scratch_size, 2527 - struct lock_class_key *key) 2478 + struct lock_class_key *key, 2479 + struct ring_buffer_remote *remote) 2528 2480 { 2529 2481 struct trace_buffer *buffer __free(kfree) = NULL; 2530 2482 long nr_pages; ··· 2564 2514 GFP_KERNEL); 2565 2515 if (!buffer->buffers) 2566 2516 goto fail_free_cpumask; 2517 + 2518 + cpu = raw_smp_processor_id(); 2567 2519 2568 2520 /* If start/end are specified, then that overrides size */ 2569 2521 if (start && end) { ··· 2622 2570 buffer->range_addr_end = end; 2623 2571 2624 2572 rb_range_meta_init(buffer, nr_pages, scratch_size); 2573 + } else if (remote) { 2574 + struct ring_buffer_desc *desc = ring_buffer_desc(remote->desc, cpu); 2575 + 2576 + buffer->remote = remote; 2577 + /* The writer is remote. 
This ring-buffer is read-only */ 2578 + atomic_inc(&buffer->record_disabled); 2579 + nr_pages = desc->nr_page_va - 1; 2580 + if (nr_pages < 2) 2581 + goto fail_free_buffers; 2625 2582 } else { 2626 2583 2627 2584 /* need at least two pages */ ··· 2639 2578 nr_pages = 2; 2640 2579 } 2641 2580 2642 - cpu = raw_smp_processor_id(); 2643 2581 cpumask_set_cpu(cpu, buffer->cpumask); 2644 2582 buffer->buffers[cpu] = rb_allocate_cpu_buffer(buffer, nr_pages, cpu); 2645 2583 if (!buffer->buffers[cpu]) ··· 2680 2620 struct lock_class_key *key) 2681 2621 { 2682 2622 /* Default buffer page size - one system page */ 2683 - return alloc_buffer(size, flags, 0, 0, 0, 0, key); 2623 + return alloc_buffer(size, flags, 0, 0, 0, 0, key, NULL); 2684 2624 2685 2625 } 2686 2626 EXPORT_SYMBOL_GPL(__ring_buffer_alloc); ··· 2707 2647 struct lock_class_key *key) 2708 2648 { 2709 2649 return alloc_buffer(size, flags, order, start, start + range_size, 2710 - scratch_size, key); 2650 + scratch_size, key, NULL); 2651 + } 2652 + 2653 + /** 2654 + * __ring_buffer_alloc_remote - allocate a new ring_buffer from a remote 2655 + * @remote: Contains a description of the ring-buffer pages and remote callbacks. 2656 + * @key: ring buffer reader_lock_key. 
2657 + */ 2658 + struct trace_buffer *__ring_buffer_alloc_remote(struct ring_buffer_remote *remote, 2659 + struct lock_class_key *key) 2660 + { 2661 + return alloc_buffer(0, 0, 0, 0, 0, 0, key, remote); 2711 2662 } 2712 2663 2713 2664 void *ring_buffer_meta_scratch(struct trace_buffer *buffer, unsigned int *size) ··· 5345 5274 } 5346 5275 EXPORT_SYMBOL_GPL(ring_buffer_overruns); 5347 5276 5277 + static bool rb_read_remote_meta_page(struct ring_buffer_per_cpu *cpu_buffer) 5278 + { 5279 + local_set(&cpu_buffer->entries, READ_ONCE(cpu_buffer->meta_page->entries)); 5280 + local_set(&cpu_buffer->overrun, READ_ONCE(cpu_buffer->meta_page->overrun)); 5281 + local_set(&cpu_buffer->pages_touched, READ_ONCE(cpu_buffer->meta_page->pages_touched)); 5282 + local_set(&cpu_buffer->pages_lost, READ_ONCE(cpu_buffer->meta_page->pages_lost)); 5283 + 5284 + return rb_num_of_entries(cpu_buffer); 5285 + } 5286 + 5287 + static void rb_update_remote_head(struct ring_buffer_per_cpu *cpu_buffer) 5288 + { 5289 + struct buffer_page *next, *orig; 5290 + int retry = 3; 5291 + 5292 + orig = next = cpu_buffer->head_page; 5293 + rb_inc_page(&next); 5294 + 5295 + /* Run after the writer */ 5296 + while (cpu_buffer->head_page->page->time_stamp > next->page->time_stamp) { 5297 + rb_inc_page(&next); 5298 + 5299 + rb_list_head_clear(cpu_buffer->head_page->list.prev); 5300 + rb_inc_page(&cpu_buffer->head_page); 5301 + rb_set_list_to_head(cpu_buffer->head_page->list.prev); 5302 + 5303 + if (cpu_buffer->head_page == orig) { 5304 + if (WARN_ON_ONCE(!(--retry))) 5305 + return; 5306 + } 5307 + } 5308 + 5309 + orig = cpu_buffer->commit_page = cpu_buffer->head_page; 5310 + retry = 3; 5311 + 5312 + while (cpu_buffer->commit_page->page->time_stamp < next->page->time_stamp) { 5313 + rb_inc_page(&next); 5314 + rb_inc_page(&cpu_buffer->commit_page); 5315 + 5316 + if (cpu_buffer->commit_page == orig) { 5317 + if (WARN_ON_ONCE(!(--retry))) 5318 + return; 5319 + } 5320 + } 5321 + } 5322 + 5348 5323 static void 
rb_iter_reset(struct ring_buffer_iter *iter) 5349 5324 { 5350 5325 struct ring_buffer_per_cpu *cpu_buffer = iter->cpu_buffer; 5326 + 5327 + if (cpu_buffer->remote) { 5328 + rb_read_remote_meta_page(cpu_buffer); 5329 + rb_update_remote_head(cpu_buffer); 5330 + } 5351 5331 5352 5332 /* Iterator usage is expected to have record disabled */ 5353 5333 iter->head_page = cpu_buffer->reader_page; ··· 5550 5428 } 5551 5429 5552 5430 static struct buffer_page * 5553 - rb_get_reader_page(struct ring_buffer_per_cpu *cpu_buffer) 5431 + __rb_get_reader_page_from_remote(struct ring_buffer_per_cpu *cpu_buffer) 5432 + { 5433 + struct buffer_page *new_reader, *prev_reader, *prev_head, *new_head, *last; 5434 + 5435 + if (!rb_read_remote_meta_page(cpu_buffer)) 5436 + return NULL; 5437 + 5438 + /* More to read on the reader page */ 5439 + if (cpu_buffer->reader_page->read < rb_page_size(cpu_buffer->reader_page)) { 5440 + if (!cpu_buffer->reader_page->read) 5441 + cpu_buffer->read_stamp = cpu_buffer->reader_page->page->time_stamp; 5442 + return cpu_buffer->reader_page; 5443 + } 5444 + 5445 + prev_reader = cpu_buffer->subbuf_ids[cpu_buffer->meta_page->reader.id]; 5446 + 5447 + WARN_ON_ONCE(cpu_buffer->remote->swap_reader_page(cpu_buffer->cpu, 5448 + cpu_buffer->remote->priv)); 5449 + /* nr_pages doesn't include the reader page */ 5450 + if (WARN_ON_ONCE(cpu_buffer->meta_page->reader.id > cpu_buffer->nr_pages)) 5451 + return NULL; 5452 + 5453 + new_reader = cpu_buffer->subbuf_ids[cpu_buffer->meta_page->reader.id]; 5454 + 5455 + WARN_ON_ONCE(prev_reader == new_reader); 5456 + 5457 + prev_head = new_reader; /* New reader was also the previous head */ 5458 + new_head = prev_head; 5459 + rb_inc_page(&new_head); 5460 + last = prev_head; 5461 + rb_dec_page(&last); 5462 + 5463 + /* Clear the old HEAD flag */ 5464 + rb_list_head_clear(cpu_buffer->head_page->list.prev); 5465 + 5466 + prev_reader->list.next = prev_head->list.next; 5467 + prev_reader->list.prev = prev_head->list.prev; 5468 + 5469 + 
	/* Swap prev_reader with new_reader */
+	last->list.next = &prev_reader->list;
+	new_head->list.prev = &prev_reader->list;
+
+	new_reader->list.prev = &new_reader->list;
+	new_reader->list.next = &new_head->list;
+
+	/* Reactivate the HEAD flag */
+	rb_set_list_to_head(&last->list);
+
+	cpu_buffer->head_page = new_head;
+	cpu_buffer->reader_page = new_reader;
+	cpu_buffer->pages = &new_head->list;
+	cpu_buffer->read_stamp = new_reader->page->time_stamp;
+	cpu_buffer->lost_events = cpu_buffer->meta_page->reader.lost_events;
+
+	return rb_page_size(cpu_buffer->reader_page) ? cpu_buffer->reader_page : NULL;
+}
+
+static struct buffer_page *
+__rb_get_reader_page(struct ring_buffer_per_cpu *cpu_buffer)
 {
 	struct buffer_page *reader = NULL;
 	unsigned long bsize = READ_ONCE(cpu_buffer->buffer->subbuf_size);
···
 
 
 	return reader;
+}
+
+static struct buffer_page *
+rb_get_reader_page(struct ring_buffer_per_cpu *cpu_buffer)
+{
+	return cpu_buffer->remote ? __rb_get_reader_page_from_remote(cpu_buffer) :
+				    __rb_get_reader_page(cpu_buffer);
 }
 
 static void rb_advance_reader(struct ring_buffer_per_cpu *cpu_buffer)
···
 	meta->entries = local_read(&cpu_buffer->entries);
 	meta->overrun = local_read(&cpu_buffer->overrun);
 	meta->read = cpu_buffer->read;
+	meta->pages_lost = local_read(&cpu_buffer->pages_lost);
+	meta->pages_touched = local_read(&cpu_buffer->pages_touched);
 
 	/* Some archs do not have data cache coherency between kernel and user-space */
 	flush_kernel_vmap_range(cpu_buffer->meta_page, PAGE_SIZE);
···
 rb_reset_cpu(struct ring_buffer_per_cpu *cpu_buffer)
 {
 	struct buffer_page *page;
+
+	if (cpu_buffer->remote) {
+		if (!cpu_buffer->remote->reset)
+			return;
+
+		cpu_buffer->remote->reset(cpu_buffer->cpu, cpu_buffer->remote->priv);
+		rb_read_remote_meta_page(cpu_buffer);
+
+		/* Read related values, not covered by the meta-page */
+		local_set(&cpu_buffer->pages_read, 0);
+		cpu_buffer->read = 0;
+		cpu_buffer->read_bytes = 0;
+		cpu_buffer->last_overrun = 0;
+		cpu_buffer->reader_page->read = 0;
+
+		return;
+	}
 
 	rb_head_page_deactivate(cpu_buffer);
 
···
 	return ret;
 }
 EXPORT_SYMBOL_GPL(ring_buffer_empty_cpu);
+
+int ring_buffer_poll_remote(struct trace_buffer *buffer, int cpu)
+{
+	struct ring_buffer_per_cpu *cpu_buffer;
+
+	if (cpu != RING_BUFFER_ALL_CPUS) {
+		if (!cpumask_test_cpu(cpu, buffer->cpumask))
+			return -EINVAL;
+
+		cpu_buffer = buffer->buffers[cpu];
+
+		guard(raw_spinlock)(&cpu_buffer->reader_lock);
+		if (rb_read_remote_meta_page(cpu_buffer))
+			rb_wakeups(buffer, cpu_buffer);
+
+		return 0;
+	}
+
+	guard(cpus_read_lock)();
+
+	/*
+	 * Make sure all the ring buffers are up to date before we start reading
+	 * them.
+	 */
+	for_each_buffer_cpu(buffer, cpu) {
+		cpu_buffer = buffer->buffers[cpu];
+
+		guard(raw_spinlock)(&cpu_buffer->reader_lock);
+		rb_read_remote_meta_page(cpu_buffer);
+	}
+
+	for_each_buffer_cpu(buffer, cpu) {
+		cpu_buffer = buffer->buffers[cpu];
+
+		if (rb_num_of_entries(cpu_buffer))
+			rb_wakeups(buffer, cpu_buffer);
+	}
+
+	return 0;
+}
 
 #ifdef CONFIG_RING_BUFFER_ALLOW_SWAP
 /**
···
 	unsigned int commit;
 	unsigned int read;
 	u64 save_timestamp;
+	bool force_memcpy;
 
 	if (!cpumask_test_cpu(cpu, buffer->cpumask))
 		return -1;
···
 	/* Check if any events were dropped */
 	missed_events = cpu_buffer->lost_events;
 
+	force_memcpy = cpu_buffer->mapped || cpu_buffer->remote;
+
 	/*
 	 * If this page has been partially read or
 	 * if len is not big enough to read the rest of the page or
···
 	 */
 	if (read || (len < (commit - read)) ||
 	    cpu_buffer->reader_page == cpu_buffer->commit_page ||
-	    cpu_buffer->mapped) {
+	    force_memcpy) {
 		struct buffer_data_page *rpage = cpu_buffer->reader_page->page;
 		unsigned int rpos = read;
 		unsigned int pos = 0;
···
 }
 
 static void rb_setup_ids_meta_page(struct ring_buffer_per_cpu *cpu_buffer,
-				   unsigned long *subbuf_ids)
+				   struct buffer_page **subbuf_ids)
 {
 	struct trace_buffer_meta *meta = cpu_buffer->meta_page;
 	unsigned int nr_subbufs = cpu_buffer->nr_pages + 1;
···
 	int id = 0;
 
 	id = rb_page_id(cpu_buffer, cpu_buffer->reader_page, id);
-	subbuf_ids[id++] = (unsigned long)cpu_buffer->reader_page->page;
+	subbuf_ids[id++] = cpu_buffer->reader_page;
 	cnt++;
 
 	first_subbuf = subbuf = rb_set_head_page(cpu_buffer);
···
 		if (WARN_ON(id >= nr_subbufs))
 			break;
 
-		subbuf_ids[id] = (unsigned long)subbuf->page;
+		subbuf_ids[id] = subbuf;
 
 		rb_inc_page(&subbuf);
 		id++;
···
 
 	WARN_ON(cnt != nr_subbufs);
 
-	/* install subbuf ID to kern VA translation */
+	/* install subbuf ID to bpage translation */
 	cpu_buffer->subbuf_ids = subbuf_ids;
 
 	meta->meta_struct_len = sizeof(*meta);
···
 	}
 
 	while (p < nr_pages) {
+		struct buffer_page *subbuf;
 		struct page *page;
 		int off = 0;
 
 		if (WARN_ON_ONCE(s >= nr_subbufs))
 			return -EINVAL;
 
-		page = virt_to_page((void *)cpu_buffer->subbuf_ids[s]);
+		subbuf = cpu_buffer->subbuf_ids[s];
+		page = virt_to_page((void *)subbuf->page);
 
 		for (; off < (1 << (subbuf_order)); off++, page++) {
 			if (p >= nr_pages)
···
 		    struct vm_area_struct *vma)
 {
 	struct ring_buffer_per_cpu *cpu_buffer;
-	unsigned long flags, *subbuf_ids;
+	struct buffer_page **subbuf_ids;
+	unsigned long flags;
 	int err;
 
-	if (!cpumask_test_cpu(cpu, buffer->cpumask))
+	if (!cpumask_test_cpu(cpu, buffer->cpumask) || buffer->remote)
 		return -EINVAL;
 
 	cpu_buffer = buffer->buffers[cpu];
···
 	if (err)
 		return err;
 
-	/* subbuf_ids include the reader while nr_pages does not */
+	/* subbuf_ids includes the reader while nr_pages does not */
 	subbuf_ids = kcalloc(cpu_buffer->nr_pages + 1, sizeof(*subbuf_ids), GFP_KERNEL);
 	if (!subbuf_ids) {
 		rb_free_meta_page(cpu_buffer);
+517
kernel/trace/simple_ring_buffer.c
···
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2025 - Google LLC
+ * Author: Vincent Donnefort <vdonnefort@google.com>
+ */
+
+#include <linux/atomic.h>
+#include <linux/simple_ring_buffer.h>
+
+#include <asm/barrier.h>
+#include <asm/local.h>
+
+enum simple_rb_link_type {
+	SIMPLE_RB_LINK_NORMAL = 0,
+	SIMPLE_RB_LINK_HEAD = 1,
+	SIMPLE_RB_LINK_HEAD_MOVING
+};
+
+#define SIMPLE_RB_LINK_MASK ~(SIMPLE_RB_LINK_HEAD | SIMPLE_RB_LINK_HEAD_MOVING)
+
+static void simple_bpage_set_head_link(struct simple_buffer_page *bpage)
+{
+	unsigned long link = (unsigned long)bpage->link.next;
+
+	link &= SIMPLE_RB_LINK_MASK;
+	link |= SIMPLE_RB_LINK_HEAD;
+
+	/*
+	 * Paired with simple_rb_find_head() to order access between the head
+	 * link and overrun. It ensures we always report an up-to-date value
+	 * after swapping the reader page.
+	 */
+	smp_store_release(&bpage->link.next, (struct list_head *)link);
+}
+
+static bool simple_bpage_unset_head_link(struct simple_buffer_page *bpage,
+					 struct simple_buffer_page *dst,
+					 enum simple_rb_link_type new_type)
+{
+	unsigned long *link = (unsigned long *)(&bpage->link.next);
+	unsigned long old = (*link & SIMPLE_RB_LINK_MASK) | SIMPLE_RB_LINK_HEAD;
+	unsigned long new = (unsigned long)(&dst->link) | new_type;
+
+	return try_cmpxchg(link, &old, new);
+}
+
+static void simple_bpage_set_normal_link(struct simple_buffer_page *bpage)
+{
+	unsigned long link = (unsigned long)bpage->link.next;
+
+	WRITE_ONCE(bpage->link.next, (struct list_head *)(link & SIMPLE_RB_LINK_MASK));
+}
+
+static struct simple_buffer_page *simple_bpage_from_link(struct list_head *link)
+{
+	unsigned long ptr = (unsigned long)link & SIMPLE_RB_LINK_MASK;
+
+	return container_of((struct list_head *)ptr, struct simple_buffer_page, link);
+}
+
+static struct simple_buffer_page *simple_bpage_next_page(struct simple_buffer_page *bpage)
+{
+	return simple_bpage_from_link(bpage->link.next);
+}
+
+static void simple_bpage_reset(struct simple_buffer_page *bpage)
+{
+	bpage->write = 0;
+	bpage->entries = 0;
+
+	local_set(&bpage->page->commit, 0);
+}
+
+static void simple_bpage_init(struct simple_buffer_page *bpage, void *page)
+{
+	INIT_LIST_HEAD(&bpage->link);
+	bpage->page = (struct buffer_data_page *)page;
+
+	simple_bpage_reset(bpage);
+}
+
+#define simple_rb_meta_inc(__meta, __inc)		\
+	WRITE_ONCE((__meta), (__meta + __inc))
+
+static bool simple_rb_loaded(struct simple_rb_per_cpu *cpu_buffer)
+{
+	return !!cpu_buffer->bpages;
+}
+
+static int simple_rb_find_head(struct simple_rb_per_cpu *cpu_buffer)
+{
+	int retry = cpu_buffer->nr_pages * 2;
+	struct simple_buffer_page *head;
+
+	head = cpu_buffer->head_page;
+
+	while (retry--) {
+		unsigned long link;
+
+spin:
+		/* See smp_store_release in simple_bpage_set_head_link() */
+		link = (unsigned long)smp_load_acquire(&head->link.prev->next);
+
+		switch (link & ~SIMPLE_RB_LINK_MASK) {
+		/* Found the head */
+		case SIMPLE_RB_LINK_HEAD:
+			cpu_buffer->head_page = head;
+			return 0;
+		/* The writer caught the head, we can spin, that won't be long */
+		case SIMPLE_RB_LINK_HEAD_MOVING:
+			goto spin;
+		}
+
+		head = simple_bpage_next_page(head);
+	}
+
+	return -EBUSY;
+}
+
+/**
+ * simple_ring_buffer_swap_reader_page - Swap ring-buffer head with the reader
+ * @cpu_buffer: A simple_rb_per_cpu
+ *
+ * This function enables consuming reading. It ensures the current head page will not be overwritten
+ * and can be safely read.
+ *
+ * Returns 0 on success, -ENODEV if @cpu_buffer was unloaded or -EBUSY if we failed to catch the
+ * head page.
+ */
+int simple_ring_buffer_swap_reader_page(struct simple_rb_per_cpu *cpu_buffer)
+{
+	struct simple_buffer_page *last, *head, *reader;
+	unsigned long overrun;
+	int retry = 8;
+	int ret;
+
+	if (!simple_rb_loaded(cpu_buffer))
+		return -ENODEV;
+
+	reader = cpu_buffer->reader_page;
+
+	do {
+		/* Run after the writer to find the head */
+		ret = simple_rb_find_head(cpu_buffer);
+		if (ret)
+			return ret;
+
+		head = cpu_buffer->head_page;
+
+		/* Connect the reader page around the header page */
+		reader->link.next = head->link.next;
+		reader->link.prev = head->link.prev;
+
+		/* The last page before the head */
+		last = simple_bpage_from_link(head->link.prev);
+
+		/* The reader page points to the new header page */
+		simple_bpage_set_head_link(reader);
+
+		overrun = cpu_buffer->meta->overrun;
+	} while (!simple_bpage_unset_head_link(last, reader, SIMPLE_RB_LINK_NORMAL) && retry--);
+
+	if (!retry)
+		return -EINVAL;
+
+	cpu_buffer->head_page = simple_bpage_from_link(reader->link.next);
+	cpu_buffer->head_page->link.prev = &reader->link;
+	cpu_buffer->reader_page = head;
+	cpu_buffer->meta->reader.lost_events = overrun - cpu_buffer->last_overrun;
+	cpu_buffer->meta->reader.id = cpu_buffer->reader_page->id;
+	cpu_buffer->last_overrun = overrun;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(simple_ring_buffer_swap_reader_page);
+
+static struct simple_buffer_page *simple_rb_move_tail(struct simple_rb_per_cpu *cpu_buffer)
+{
+	struct simple_buffer_page *tail, *new_tail;
+
+	tail = cpu_buffer->tail_page;
+	new_tail = simple_bpage_next_page(tail);
+
+	if (simple_bpage_unset_head_link(tail, new_tail, SIMPLE_RB_LINK_HEAD_MOVING)) {
+		/*
+		 * Oh no! we've caught the head. There is none anymore and
+		 * swap_reader will spin until we set the new one. Overrun must
+		 * be written first, to make sure we report the correct number
+		 * of lost events.
+		 */
+		simple_rb_meta_inc(cpu_buffer->meta->overrun, new_tail->entries);
+		simple_rb_meta_inc(cpu_buffer->meta->pages_lost, 1);
+
+		simple_bpage_set_head_link(new_tail);
+		simple_bpage_set_normal_link(tail);
+	}
+
+	simple_bpage_reset(new_tail);
+	cpu_buffer->tail_page = new_tail;
+
+	simple_rb_meta_inc(cpu_buffer->meta->pages_touched, 1);
+
+	return new_tail;
+}
+
+static unsigned long rb_event_size(unsigned long length)
+{
+	struct ring_buffer_event *event;
+
+	return length + RB_EVNT_HDR_SIZE + sizeof(event->array[0]);
+}
+
+static struct ring_buffer_event *
+rb_event_add_ts_extend(struct ring_buffer_event *event, u64 delta)
+{
+	event->type_len = RINGBUF_TYPE_TIME_EXTEND;
+	event->time_delta = delta & TS_MASK;
+	event->array[0] = delta >> TS_SHIFT;
+
+	return (struct ring_buffer_event *)((unsigned long)event + 8);
+}
+
+static struct ring_buffer_event *
+simple_rb_reserve_next(struct simple_rb_per_cpu *cpu_buffer, unsigned long length, u64 timestamp)
+{
+	unsigned long ts_ext_size = 0, event_size = rb_event_size(length);
+	struct simple_buffer_page *tail = cpu_buffer->tail_page;
+	struct ring_buffer_event *event;
+	u32 write, prev_write;
+	u64 time_delta;
+
+	time_delta = timestamp - cpu_buffer->write_stamp;
+
+	if (test_time_stamp(time_delta))
+		ts_ext_size = 8;
+
+	prev_write = tail->write;
+	write = prev_write + event_size + ts_ext_size;
+
+	if (unlikely(write > (PAGE_SIZE - BUF_PAGE_HDR_SIZE)))
+		tail = simple_rb_move_tail(cpu_buffer);
+
+	if (!tail->entries) {
+		tail->page->time_stamp = timestamp;
+		time_delta = 0;
+		ts_ext_size = 0;
+		write = event_size;
+		prev_write = 0;
+	}
+
+	tail->write = write;
+	tail->entries++;
+
+	cpu_buffer->write_stamp = timestamp;
+
+	event = (struct ring_buffer_event *)(tail->page->data + prev_write);
+	if (ts_ext_size) {
+		event = rb_event_add_ts_extend(event, time_delta);
+		time_delta = 0;
+	}
+
+	event->type_len = 0;
+	event->time_delta = time_delta;
+	event->array[0] = event_size - RB_EVNT_HDR_SIZE;
+
+	return event;
+}
+
+/**
+ * simple_ring_buffer_reserve - Reserve an entry in @cpu_buffer
+ * @cpu_buffer: A simple_rb_per_cpu
+ * @length: Size of the entry in bytes
+ * @timestamp: Timestamp of the entry
+ *
+ * Returns the address of the entry where to write data or NULL
+ */
+void *simple_ring_buffer_reserve(struct simple_rb_per_cpu *cpu_buffer, unsigned long length,
+				 u64 timestamp)
+{
+	struct ring_buffer_event *rb_event;
+
+	if (cmpxchg(&cpu_buffer->status, SIMPLE_RB_READY, SIMPLE_RB_WRITING) != SIMPLE_RB_READY)
+		return NULL;
+
+	rb_event = simple_rb_reserve_next(cpu_buffer, length, timestamp);
+
+	return &rb_event->array[1];
+}
+EXPORT_SYMBOL_GPL(simple_ring_buffer_reserve);
+
+/**
+ * simple_ring_buffer_commit - Commit the entry reserved with simple_ring_buffer_reserve()
+ * @cpu_buffer: The simple_rb_per_cpu where the entry has been reserved
+ */
+void simple_ring_buffer_commit(struct simple_rb_per_cpu *cpu_buffer)
+{
+	local_set(&cpu_buffer->tail_page->page->commit,
+		  cpu_buffer->tail_page->write);
+	simple_rb_meta_inc(cpu_buffer->meta->entries, 1);
+
+	/*
+	 * Paired with simple_rb_enable_tracing() to ensure data is
+	 * written to the ring-buffer before teardown.
+	 */
+	smp_store_release(&cpu_buffer->status, SIMPLE_RB_READY);
+}
+EXPORT_SYMBOL_GPL(simple_ring_buffer_commit);
+
+static u32 simple_rb_enable_tracing(struct simple_rb_per_cpu *cpu_buffer, bool enable)
+{
+	u32 prev_status;
+
+	if (enable)
+		return cmpxchg(&cpu_buffer->status, SIMPLE_RB_UNAVAILABLE, SIMPLE_RB_READY);
+
+	/* Wait for the buffer to be released */
+	do {
+		prev_status = cmpxchg_acquire(&cpu_buffer->status,
+					      SIMPLE_RB_READY,
+					      SIMPLE_RB_UNAVAILABLE);
+	} while (prev_status == SIMPLE_RB_WRITING);
+
+	return prev_status;
+}
+
+/**
+ * simple_ring_buffer_reset - Reset @cpu_buffer
+ * @cpu_buffer: A simple_rb_per_cpu
+ *
+ * This will not clear the content of the data, only reset counters and pointers
+ *
+ * Returns 0 on success or -ENODEV if @cpu_buffer was unloaded.
+ */
+int simple_ring_buffer_reset(struct simple_rb_per_cpu *cpu_buffer)
+{
+	struct simple_buffer_page *bpage;
+	u32 prev_status;
+	int ret;
+
+	if (!simple_rb_loaded(cpu_buffer))
+		return -ENODEV;
+
+	prev_status = simple_rb_enable_tracing(cpu_buffer, false);
+
+	ret = simple_rb_find_head(cpu_buffer);
+	if (ret)
+		return ret;
+
+	bpage = cpu_buffer->tail_page = cpu_buffer->head_page;
+	do {
+		simple_bpage_reset(bpage);
+		bpage = simple_bpage_next_page(bpage);
+	} while (bpage != cpu_buffer->head_page);
+
+	simple_bpage_reset(cpu_buffer->reader_page);
+
+	cpu_buffer->last_overrun = 0;
+	cpu_buffer->write_stamp = 0;
+
+	cpu_buffer->meta->reader.read = 0;
+	cpu_buffer->meta->reader.lost_events = 0;
+	cpu_buffer->meta->entries = 0;
+	cpu_buffer->meta->overrun = 0;
+	cpu_buffer->meta->read = 0;
+	cpu_buffer->meta->pages_lost = 0;
+	cpu_buffer->meta->pages_touched = 0;
+
+	if (prev_status == SIMPLE_RB_READY)
+		simple_rb_enable_tracing(cpu_buffer, true);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(simple_ring_buffer_reset);
+
+int simple_ring_buffer_init_mm(struct simple_rb_per_cpu *cpu_buffer,
+			       struct simple_buffer_page *bpages,
+			       const struct ring_buffer_desc *desc,
+			       void *(*load_page)(unsigned long va),
+			       void (*unload_page)(void *va))
+{
+	struct simple_buffer_page *bpage = bpages;
+	int ret = 0;
+	void *page;
+	int i;
+
+	/* At least 1 reader page and two pages in the ring-buffer */
+	if (desc->nr_page_va < 3)
+		return -EINVAL;
+
+	memset(cpu_buffer, 0, sizeof(*cpu_buffer));
+
+	cpu_buffer->meta = load_page(desc->meta_va);
+	if (!cpu_buffer->meta)
+		return -EINVAL;
+
+	memset(cpu_buffer->meta, 0, sizeof(*cpu_buffer->meta));
+	cpu_buffer->meta->meta_page_size = PAGE_SIZE;
+	cpu_buffer->meta->nr_subbufs = cpu_buffer->nr_pages;
+
+	/* The reader page is not part of the ring initially */
+	page = load_page(desc->page_va[0]);
+	if (!page) {
+		unload_page(cpu_buffer->meta);
+		return -EINVAL;
+	}
+
+	simple_bpage_init(bpage, page);
+	bpage->id = 0;
+
+	cpu_buffer->nr_pages = 1;
+
+	cpu_buffer->reader_page = bpage;
+	cpu_buffer->tail_page = bpage + 1;
+	cpu_buffer->head_page = bpage + 1;
+
+	for (i = 1; i < desc->nr_page_va; i++) {
+		page = load_page(desc->page_va[i]);
+		if (!page) {
+			ret = -EINVAL;
+			break;
+		}
+
+		simple_bpage_init(++bpage, page);
+
+		bpage->link.next = &(bpage + 1)->link;
+		bpage->link.prev = &(bpage - 1)->link;
+		bpage->id = i;
+
+		cpu_buffer->nr_pages = i + 1;
+	}
+
+	if (ret) {
+		for (i--; i >= 0; i--)
+			unload_page((void *)desc->page_va[i]);
+		unload_page(cpu_buffer->meta);
+
+		return ret;
+	}
+
+	/* Close the ring */
+	bpage->link.next = &cpu_buffer->tail_page->link;
+	cpu_buffer->tail_page->link.prev = &bpage->link;
+
+	/* The last init'ed page points to the head page */
+	simple_bpage_set_head_link(bpage);
+
+	cpu_buffer->bpages = bpages;
+
+	return 0;
+}
+
+static void *__load_page(unsigned long page)
+{
+	return (void *)page;
+}
+
+static void __unload_page(void *page) { }
+
+/**
+ * simple_ring_buffer_init - Init @cpu_buffer based on @desc
+ * @cpu_buffer: A simple_rb_per_cpu buffer to init, allocated by the caller.
+ * @bpages: Array of simple_buffer_pages, with as many elements as @desc->nr_page_va
+ * @desc: A ring_buffer_desc
+ *
+ * Returns 0 on success or -EINVAL if the content of @desc is invalid
+ */
+int simple_ring_buffer_init(struct simple_rb_per_cpu *cpu_buffer, struct simple_buffer_page *bpages,
+			    const struct ring_buffer_desc *desc)
+{
+	return simple_ring_buffer_init_mm(cpu_buffer, bpages, desc, __load_page, __unload_page);
+}
+EXPORT_SYMBOL_GPL(simple_ring_buffer_init);
+
+void simple_ring_buffer_unload_mm(struct simple_rb_per_cpu *cpu_buffer,
+				  void (*unload_page)(void *))
+{
+	int p;
+
+	if (!simple_rb_loaded(cpu_buffer))
+		return;
+
+	simple_rb_enable_tracing(cpu_buffer, false);
+
+	unload_page(cpu_buffer->meta);
+	for (p = 0; p < cpu_buffer->nr_pages; p++)
+		unload_page(cpu_buffer->bpages[p].page);
+
+	cpu_buffer->bpages = NULL;
+}
+
+/**
+ * simple_ring_buffer_unload - Prepare @cpu_buffer for deletion
+ * @cpu_buffer: A simple_rb_per_cpu that will be deleted.
+ */
+void simple_ring_buffer_unload(struct simple_rb_per_cpu *cpu_buffer)
+{
+	return simple_ring_buffer_unload_mm(cpu_buffer, __unload_page);
+}
+EXPORT_SYMBOL_GPL(simple_ring_buffer_unload);
+
+/**
+ * simple_ring_buffer_enable_tracing - Enable or disable writing to @cpu_buffer
+ * @cpu_buffer: A simple_rb_per_cpu
+ * @enable: True to enable tracing, False to disable it
+ *
+ * Returns 0 on success or -ENODEV if @cpu_buffer was unloaded
+ */
+int simple_ring_buffer_enable_tracing(struct simple_rb_per_cpu *cpu_buffer, bool enable)
+{
+	if (!simple_rb_loaded(cpu_buffer))
+		return -ENODEV;
+
+	simple_rb_enable_tracing(cpu_buffer, enable);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(simple_ring_buffer_enable_tracing);
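Note on the head link in simple_ring_buffer.c: the writer and the reader agree on where the head page is by storing a small flag in the low bits of the previous page's `link.next` pointer, and the flag is only ever cleared through a compare-and-exchange. The following is a minimal userspace sketch of that idea, not the kernel code itself. It assumes C11 atomics as stand-ins for `smp_store_release()` and `try_cmpxchg()`, and the names `page_node`, `set_head_link`, `unset_head_link` and `link_demo` are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins for the link types used in the patch. */
enum link_type {
	LINK_NORMAL = 0,
	LINK_HEAD = 1,
	LINK_HEAD_MOVING = 2,
};

#define LINK_FLAGS ((uintptr_t)(LINK_HEAD | LINK_HEAD_MOVING))

struct page_node {
	_Atomic uintptr_t next;	/* pointer to the next node, flags in the low 2 bits */
	int id;
};

/* Strip the flag bits to recover the node pointer. */
static struct page_node *link_to_node(uintptr_t link)
{
	return (struct page_node *)(link & ~LINK_FLAGS);
}

/* Extract only the flag bits. */
static enum link_type link_flags(uintptr_t link)
{
	return (enum link_type)(link & LINK_FLAGS);
}

/* Flag the page after @node as the head: keep the pointer, set HEAD. */
static void set_head_link(struct page_node *node)
{
	uintptr_t link = atomic_load_explicit(&node->next, memory_order_relaxed);

	link = (link & ~LINK_FLAGS) | LINK_HEAD;
	atomic_store_explicit(&node->next, link, memory_order_release);
}

/*
 * Redirect @node->next to @dst with @new_type, but only if the link still
 * carries the HEAD flag. Returns false if the head was moved by someone else.
 */
static bool unset_head_link(struct page_node *node, struct page_node *dst,
			    enum link_type new_type)
{
	uintptr_t expected = atomic_load_explicit(&node->next, memory_order_relaxed);
	uintptr_t desired = (uintptr_t)dst | (uintptr_t)new_type;

	expected = (expected & ~LINK_FLAGS) | LINK_HEAD;

	return atomic_compare_exchange_strong(&node->next, &expected, desired);
}

/* Small self-check on a three-node ring a -> b -> c -> a. Returns 0 on success. */
static int link_demo(void)
{
	static struct page_node a, b, c;

	atomic_store(&a.next, (uintptr_t)&b);
	atomic_store(&b.next, (uintptr_t)&c);
	atomic_store(&c.next, (uintptr_t)&a);

	/* b becomes the head: the flag lives on the link that points to it */
	set_head_link(&a);
	if (link_flags(atomic_load(&a.next)) != LINK_HEAD ||
	    link_to_node(atomic_load(&a.next)) != &b)
		return -1;

	/* First taker wins and redirects the link to c */
	if (!unset_head_link(&a, &c, LINK_NORMAL))
		return -2;
	if (link_flags(atomic_load(&a.next)) != LINK_NORMAL ||
	    link_to_node(atomic_load(&a.next)) != &c)
		return -3;

	/* The HEAD flag is gone, so a second attempt must fail */
	if (unset_head_link(&a, &b, LINK_NORMAL))
		return -4;

	return 0;
}
```

Calling `link_demo()` exercises the three cases the patch relies on: setting the flag, a successful exchange, and a failed exchange once the flag has been cleared. The tagging only works because the nodes are aligned to at least four bytes, which leaves the two low pointer bits unused.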
+2 -2
kernel/trace/trace.c
···
  * Should be used after trace_array_get(), trace_types_lock
  * ensures that i_cdev was already initialized.
  */
-static inline int tracing_get_cpu(struct inode *inode)
+int tracing_get_cpu(struct inode *inode)
 {
 	if (inode->i_cdev) /* See trace_create_cpu_file() */
 		return (long)inode->i_cdev - 1;
···
 	return tr->percpu_dir;
 }
 
-static struct dentry *
+struct dentry *
 trace_create_cpu_file(const char *name, umode_t mode, struct dentry *parent,
 		      void *data, long cpu, const struct file_operations *fops)
 {
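Note on `tracing_get_cpu()`: the hunk above shows the CPU number being decoded from `inode->i_cdev` with an offset of one. The sketch below illustrates that encoding in userspace. It assumes, beyond what this hunk shows, that the creation side stores `cpu + 1` and that an unset `i_cdev` stands for "all CPUs". `fake_inode`, `ALL_CPUS`, `store_cpu`, `get_cpu` and `cpu_demo` are hypothetical names, not kernel API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for RING_BUFFER_ALL_CPUS. */
#define ALL_CPUS (-1)

/* Hypothetical stand-in for struct inode, reduced to the one field used here. */
struct fake_inode {
	void *i_cdev;
};

/* Store cpu + 1 so that CPU 0 can be told apart from "nothing stored". */
static void store_cpu(struct fake_inode *inode, long cpu)
{
	inode->i_cdev = (void *)(intptr_t)(cpu + 1);
}

/* Decode the CPU, or report ALL_CPUS when nothing was stored (assumption). */
static int get_cpu(const struct fake_inode *inode)
{
	if (inode->i_cdev)
		return (int)((intptr_t)inode->i_cdev - 1);

	return ALL_CPUS;
}

/* Small self-check. Returns 0 on success. */
static int cpu_demo(void)
{
	struct fake_inode ino = { NULL };

	if (get_cpu(&ino) != ALL_CPUS)
		return -1;

	store_cpu(&ino, 0);
	if (get_cpu(&ino) != 0)
		return -2;

	store_cpu(&ino, 5);
	if (get_cpu(&ino) != 5)
		return -3;

	return 0;
}
```

The offset lets a single pointer-sized field carry both a valid CPU 0 and the "no specific CPU" case without an extra flag.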
+7
kernel/trace/trace.h
···
 				 struct dentry *parent,
 				 void *data,
 				 const struct file_operations *fops);
+struct dentry *trace_create_cpu_file(const char *name,
+				     umode_t mode,
+				     struct dentry *parent,
+				     void *data,
+				     long cpu,
+				     const struct file_operations *fops);
+int tracing_get_cpu(struct inode *inode);
 
 
 /**
+1384
kernel/trace/trace_remote.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* 3 + * Copyright (C) 2025 - Google LLC 4 + * Author: Vincent Donnefort <vdonnefort@google.com> 5 + */ 6 + 7 + #include <linux/kstrtox.h> 8 + #include <linux/lockdep.h> 9 + #include <linux/mutex.h> 10 + #include <linux/tracefs.h> 11 + #include <linux/trace_remote.h> 12 + #include <linux/trace_seq.h> 13 + #include <linux/types.h> 14 + 15 + #include "trace.h" 16 + 17 + #define TRACEFS_DIR "remotes" 18 + #define TRACEFS_MODE_WRITE 0640 19 + #define TRACEFS_MODE_READ 0440 20 + 21 + enum tri_type { 22 + TRI_CONSUMING, 23 + TRI_NONCONSUMING, 24 + }; 25 + 26 + struct trace_remote_iterator { 27 + struct trace_remote *remote; 28 + struct trace_seq seq; 29 + struct delayed_work poll_work; 30 + unsigned long lost_events; 31 + u64 ts; 32 + struct ring_buffer_iter *rb_iter; 33 + struct ring_buffer_iter **rb_iters; 34 + struct remote_event_hdr *evt; 35 + int cpu; 36 + int evt_cpu; 37 + loff_t pos; 38 + enum tri_type type; 39 + }; 40 + 41 + struct trace_remote { 42 + struct trace_remote_callbacks *cbs; 43 + void *priv; 44 + struct trace_buffer *trace_buffer; 45 + struct trace_buffer_desc *trace_buffer_desc; 46 + struct dentry *dentry; 47 + struct eventfs_inode *eventfs; 48 + struct remote_event *events; 49 + unsigned long nr_events; 50 + unsigned long trace_buffer_size; 51 + struct ring_buffer_remote rb_remote; 52 + struct mutex lock; 53 + struct rw_semaphore reader_lock; 54 + struct rw_semaphore *pcpu_reader_locks; 55 + unsigned int nr_readers; 56 + unsigned int poll_ms; 57 + bool tracing_on; 58 + }; 59 + 60 + static bool trace_remote_loaded(struct trace_remote *remote) 61 + { 62 + return !!remote->trace_buffer; 63 + } 64 + 65 + static int trace_remote_load(struct trace_remote *remote) 66 + { 67 + struct ring_buffer_remote *rb_remote = &remote->rb_remote; 68 + struct trace_buffer_desc *desc; 69 + 70 + lockdep_assert_held(&remote->lock); 71 + 72 + if (trace_remote_loaded(remote)) 73 + return 0; 74 + 75 + desc = 
remote->cbs->load_trace_buffer(remote->trace_buffer_size, remote->priv); 76 + if (IS_ERR(desc)) 77 + return PTR_ERR(desc); 78 + 79 + rb_remote->desc = desc; 80 + rb_remote->swap_reader_page = remote->cbs->swap_reader_page; 81 + rb_remote->priv = remote->priv; 82 + rb_remote->reset = remote->cbs->reset; 83 + remote->trace_buffer = ring_buffer_alloc_remote(rb_remote); 84 + if (!remote->trace_buffer) { 85 + remote->cbs->unload_trace_buffer(desc, remote->priv); 86 + return -ENOMEM; 87 + } 88 + 89 + remote->trace_buffer_desc = desc; 90 + 91 + return 0; 92 + } 93 + 94 + static void trace_remote_try_unload(struct trace_remote *remote) 95 + { 96 + lockdep_assert_held(&remote->lock); 97 + 98 + if (!trace_remote_loaded(remote)) 99 + return; 100 + 101 + /* The buffer is being read or writable */ 102 + if (remote->nr_readers || remote->tracing_on) 103 + return; 104 + 105 + /* The buffer has readable data */ 106 + if (!ring_buffer_empty(remote->trace_buffer)) 107 + return; 108 + 109 + ring_buffer_free(remote->trace_buffer); 110 + remote->trace_buffer = NULL; 111 + remote->cbs->unload_trace_buffer(remote->trace_buffer_desc, remote->priv); 112 + } 113 + 114 + static int trace_remote_enable_tracing(struct trace_remote *remote) 115 + { 116 + int ret; 117 + 118 + lockdep_assert_held(&remote->lock); 119 + 120 + if (remote->tracing_on) 121 + return 0; 122 + 123 + ret = trace_remote_load(remote); 124 + if (ret) 125 + return ret; 126 + 127 + ret = remote->cbs->enable_tracing(true, remote->priv); 128 + if (ret) { 129 + trace_remote_try_unload(remote); 130 + return ret; 131 + } 132 + 133 + remote->tracing_on = true; 134 + 135 + return 0; 136 + } 137 + 138 + static int trace_remote_disable_tracing(struct trace_remote *remote) 139 + { 140 + int ret; 141 + 142 + lockdep_assert_held(&remote->lock); 143 + 144 + if (!remote->tracing_on) 145 + return 0; 146 + 147 + ret = remote->cbs->enable_tracing(false, remote->priv); 148 + if (ret) 149 + return ret; 150 + 151 + 
ring_buffer_poll_remote(remote->trace_buffer, RING_BUFFER_ALL_CPUS); 152 + remote->tracing_on = false; 153 + trace_remote_try_unload(remote); 154 + 155 + return 0; 156 + } 157 + 158 + static void trace_remote_reset(struct trace_remote *remote, int cpu) 159 + { 160 + lockdep_assert_held(&remote->lock); 161 + 162 + if (!trace_remote_loaded(remote)) 163 + return; 164 + 165 + if (cpu == RING_BUFFER_ALL_CPUS) 166 + ring_buffer_reset(remote->trace_buffer); 167 + else 168 + ring_buffer_reset_cpu(remote->trace_buffer, cpu); 169 + 170 + trace_remote_try_unload(remote); 171 + } 172 + 173 + static ssize_t 174 + tracing_on_write(struct file *filp, const char __user *ubuf, size_t cnt, loff_t *ppos) 175 + { 176 + struct seq_file *seq = filp->private_data; 177 + struct trace_remote *remote = seq->private; 178 + unsigned long val; 179 + int ret; 180 + 181 + ret = kstrtoul_from_user(ubuf, cnt, 10, &val); 182 + if (ret) 183 + return ret; 184 + 185 + guard(mutex)(&remote->lock); 186 + 187 + ret = val ? trace_remote_enable_tracing(remote) : trace_remote_disable_tracing(remote); 188 + if (ret) 189 + return ret; 190 + 191 + return cnt; 192 + } 193 + static int tracing_on_show(struct seq_file *s, void *unused) 194 + { 195 + struct trace_remote *remote = s->private; 196 + 197 + seq_printf(s, "%d\n", remote->tracing_on); 198 + 199 + return 0; 200 + } 201 + DEFINE_SHOW_STORE_ATTRIBUTE(tracing_on); 202 + 203 + static ssize_t buffer_size_kb_write(struct file *filp, const char __user *ubuf, size_t cnt, 204 + loff_t *ppos) 205 + { 206 + struct seq_file *seq = filp->private_data; 207 + struct trace_remote *remote = seq->private; 208 + unsigned long val; 209 + int ret; 210 + 211 + ret = kstrtoul_from_user(ubuf, cnt, 10, &val); 212 + if (ret) 213 + return ret; 214 + 215 + /* KiB to Bytes */ 216 + if (!val || check_shl_overflow(val, 10, &val)) 217 + return -EINVAL; 218 + 219 + guard(mutex)(&remote->lock); 220 + 221 + if (trace_remote_loaded(remote)) 222 + return -EBUSY; 223 + 224 + 
remote->trace_buffer_size = val; 225 + 226 + return cnt; 227 + } 228 + 229 + static int buffer_size_kb_show(struct seq_file *s, void *unused) 230 + { 231 + struct trace_remote *remote = s->private; 232 + 233 + seq_printf(s, "%lu (%s)\n", remote->trace_buffer_size >> 10, 234 + trace_remote_loaded(remote) ? "loaded" : "unloaded"); 235 + 236 + return 0; 237 + } 238 + DEFINE_SHOW_STORE_ATTRIBUTE(buffer_size_kb); 239 + 240 + static int trace_remote_get(struct trace_remote *remote, int cpu) 241 + { 242 + int ret; 243 + 244 + if (remote->nr_readers == UINT_MAX) 245 + return -EBUSY; 246 + 247 + ret = trace_remote_load(remote); 248 + if (ret) 249 + return ret; 250 + 251 + if (cpu != RING_BUFFER_ALL_CPUS && !remote->pcpu_reader_locks) { 252 + int lock_cpu; 253 + 254 + remote->pcpu_reader_locks = kcalloc(nr_cpu_ids, sizeof(*remote->pcpu_reader_locks), 255 + GFP_KERNEL); 256 + if (!remote->pcpu_reader_locks) { 257 + trace_remote_try_unload(remote); 258 + return -ENOMEM; 259 + } 260 + 261 + for_each_possible_cpu(lock_cpu) 262 + init_rwsem(&remote->pcpu_reader_locks[lock_cpu]); 263 + } 264 + 265 + remote->nr_readers++; 266 + 267 + return 0; 268 + } 269 + 270 + static void trace_remote_put(struct trace_remote *remote) 271 + { 272 + if (WARN_ON(!remote->nr_readers)) 273 + return; 274 + 275 + remote->nr_readers--; 276 + if (remote->nr_readers) 277 + return; 278 + 279 + kfree(remote->pcpu_reader_locks); 280 + remote->pcpu_reader_locks = NULL; 281 + 282 + trace_remote_try_unload(remote); 283 + } 284 + 285 + static bool trace_remote_has_cpu(struct trace_remote *remote, int cpu) 286 + { 287 + if (cpu == RING_BUFFER_ALL_CPUS) 288 + return true; 289 + 290 + return ring_buffer_poll_remote(remote->trace_buffer, cpu) == 0; 291 + } 292 + 293 + static void __poll_remote(struct work_struct *work) 294 + { 295 + struct delayed_work *dwork = to_delayed_work(work); 296 + struct trace_remote_iterator *iter; 297 + 298 + iter = container_of(dwork, struct trace_remote_iterator, poll_work); 299 + 
ring_buffer_poll_remote(iter->remote->trace_buffer, iter->cpu); 300 + schedule_delayed_work((struct delayed_work *)work, 301 + msecs_to_jiffies(iter->remote->poll_ms)); 302 + } 303 + 304 + static void __free_ring_buffer_iter(struct trace_remote_iterator *iter, int cpu) 305 + { 306 + if (cpu != RING_BUFFER_ALL_CPUS) { 307 + ring_buffer_read_finish(iter->rb_iter); 308 + return; 309 + } 310 + 311 + for_each_possible_cpu(cpu) { 312 + if (iter->rb_iters[cpu]) 313 + ring_buffer_read_finish(iter->rb_iters[cpu]); 314 + } 315 + 316 + kfree(iter->rb_iters); 317 + } 318 + 319 + static int __alloc_ring_buffer_iter(struct trace_remote_iterator *iter, int cpu) 320 + { 321 + if (cpu != RING_BUFFER_ALL_CPUS) { 322 + iter->rb_iter = ring_buffer_read_start(iter->remote->trace_buffer, cpu, GFP_KERNEL); 323 + 324 + return iter->rb_iter ? 0 : -ENOMEM; 325 + } 326 + 327 + iter->rb_iters = kcalloc(nr_cpu_ids, sizeof(*iter->rb_iters), GFP_KERNEL); 328 + if (!iter->rb_iters) 329 + return -ENOMEM; 330 + 331 + for_each_possible_cpu(cpu) { 332 + iter->rb_iters[cpu] = ring_buffer_read_start(iter->remote->trace_buffer, cpu, 333 + GFP_KERNEL); 334 + if (!iter->rb_iters[cpu]) { 335 + /* This CPU isn't part of trace_buffer. 
Skip it */ 336 + if (!trace_remote_has_cpu(iter->remote, cpu)) 337 + continue; 338 + 339 + __free_ring_buffer_iter(iter, RING_BUFFER_ALL_CPUS); 340 + return -ENOMEM; 341 + } 342 + } 343 + 344 + return 0; 345 + } 346 + 347 + static struct trace_remote_iterator 348 + *trace_remote_iter(struct trace_remote *remote, int cpu, enum tri_type type) 349 + { 350 + struct trace_remote_iterator *iter = NULL; 351 + int ret; 352 + 353 + lockdep_assert_held(&remote->lock); 354 + 355 + if (type == TRI_NONCONSUMING && !trace_remote_loaded(remote)) 356 + return NULL; 357 + 358 + ret = trace_remote_get(remote, cpu); 359 + if (ret) 360 + return ERR_PTR(ret); 361 + 362 + if (!trace_remote_has_cpu(remote, cpu)) { 363 + ret = -ENODEV; 364 + goto err; 365 + } 366 + 367 + iter = kzalloc_obj(*iter); 368 + if (iter) { 369 + iter->remote = remote; 370 + iter->cpu = cpu; 371 + iter->type = type; 372 + trace_seq_init(&iter->seq); 373 + 374 + switch (type) { 375 + case TRI_CONSUMING: 376 + ring_buffer_poll_remote(remote->trace_buffer, cpu); 377 + INIT_DELAYED_WORK(&iter->poll_work, __poll_remote); 378 + schedule_delayed_work(&iter->poll_work, msecs_to_jiffies(remote->poll_ms)); 379 + break; 380 + case TRI_NONCONSUMING: 381 + ret = __alloc_ring_buffer_iter(iter, cpu); 382 + break; 383 + } 384 + 385 + if (ret) 386 + goto err; 387 + 388 + return iter; 389 + } 390 + ret = -ENOMEM; 391 + 392 + err: 393 + kfree(iter); 394 + trace_remote_put(remote); 395 + 396 + return ERR_PTR(ret); 397 + } 398 + 399 + static void trace_remote_iter_free(struct trace_remote_iterator *iter) 400 + { 401 + struct trace_remote *remote; 402 + 403 + if (!iter) 404 + return; 405 + 406 + remote = iter->remote; 407 + 408 + lockdep_assert_held(&remote->lock); 409 + 410 + switch (iter->type) { 411 + case TRI_CONSUMING: 412 + cancel_delayed_work_sync(&iter->poll_work); 413 + break; 414 + case TRI_NONCONSUMING: 415 + __free_ring_buffer_iter(iter, iter->cpu); 416 + break; 417 + } 418 + 419 + kfree(iter); 420 + 
+	trace_remote_put(remote);
+}
+
+static void trace_remote_iter_read_start(struct trace_remote_iterator *iter)
+{
+	struct trace_remote *remote = iter->remote;
+	int cpu = iter->cpu;
+
+	/* Acquire global reader lock */
+	if (cpu == RING_BUFFER_ALL_CPUS && iter->type == TRI_CONSUMING)
+		down_write(&remote->reader_lock);
+	else
+		down_read(&remote->reader_lock);
+
+	if (cpu == RING_BUFFER_ALL_CPUS)
+		return;
+
+	/*
+	 * No need for the remote lock here, iter holds a reference on
+	 * remote->nr_readers
+	 */
+
+	/* Get the per-CPU one */
+	if (WARN_ON_ONCE(!remote->pcpu_reader_locks))
+		return;
+
+	if (iter->type == TRI_CONSUMING)
+		down_write(&remote->pcpu_reader_locks[cpu]);
+	else
+		down_read(&remote->pcpu_reader_locks[cpu]);
+}
+
+static void trace_remote_iter_read_finished(struct trace_remote_iterator *iter)
+{
+	struct trace_remote *remote = iter->remote;
+	int cpu = iter->cpu;
+
+	/* Release per-CPU reader lock */
+	if (cpu != RING_BUFFER_ALL_CPUS) {
+		/*
+		 * No need for the remote lock here, iter holds a reference on
+		 * remote->nr_readers
+		 */
+		if (iter->type == TRI_CONSUMING)
+			up_write(&remote->pcpu_reader_locks[cpu]);
+		else
+			up_read(&remote->pcpu_reader_locks[cpu]);
+	}
+
+	/* Release global reader lock */
+	if (cpu == RING_BUFFER_ALL_CPUS && iter->type == TRI_CONSUMING)
+		up_write(&remote->reader_lock);
+	else
+		up_read(&remote->reader_lock);
+}
+
+static struct ring_buffer_iter *__get_rb_iter(struct trace_remote_iterator *iter, int cpu)
+{
+	return iter->cpu != RING_BUFFER_ALL_CPUS ? iter->rb_iter : iter->rb_iters[cpu];
+}
+
+static struct ring_buffer_event *
+__peek_event(struct trace_remote_iterator *iter, int cpu, u64 *ts, unsigned long *lost_events)
+{
+	struct ring_buffer_event *rb_evt;
+	struct ring_buffer_iter *rb_iter;
+
+	switch (iter->type) {
+	case TRI_CONSUMING:
+		return ring_buffer_peek(iter->remote->trace_buffer, cpu, ts, lost_events);
+	case TRI_NONCONSUMING:
+		rb_iter = __get_rb_iter(iter, cpu);
+		if (!rb_iter)
+			return NULL;
+
+		rb_evt = ring_buffer_iter_peek(rb_iter, ts);
+		if (!rb_evt)
+			return NULL;
+
+		*lost_events = ring_buffer_iter_dropped(rb_iter);
+
+		return rb_evt;
+	}
+
+	return NULL;
+}
+
+static bool trace_remote_iter_read_event(struct trace_remote_iterator *iter)
+{
+	struct trace_buffer *trace_buffer = iter->remote->trace_buffer;
+	struct ring_buffer_event *rb_evt;
+	int cpu = iter->cpu;
+
+	if (cpu != RING_BUFFER_ALL_CPUS) {
+		if (ring_buffer_empty_cpu(trace_buffer, cpu))
+			return false;
+
+		rb_evt = __peek_event(iter, cpu, &iter->ts, &iter->lost_events);
+		if (!rb_evt)
+			return false;
+
+		iter->evt_cpu = cpu;
+		iter->evt = ring_buffer_event_data(rb_evt);
+		return true;
+	}
+
+	iter->ts = U64_MAX;
+	for_each_possible_cpu(cpu) {
+		unsigned long lost_events;
+		u64 ts;
+
+		if (ring_buffer_empty_cpu(trace_buffer, cpu))
+			continue;
+
+		rb_evt = __peek_event(iter, cpu, &ts, &lost_events);
+		if (!rb_evt)
+			continue;
+
+		if (ts >= iter->ts)
+			continue;
+
+		iter->ts = ts;
+		iter->evt_cpu = cpu;
+		iter->evt = ring_buffer_event_data(rb_evt);
+		iter->lost_events = lost_events;
+	}
+
+	return iter->ts != U64_MAX;
+}
+
+static void trace_remote_iter_move(struct trace_remote_iterator *iter)
+{
+	struct trace_buffer *trace_buffer = iter->remote->trace_buffer;
+
+	switch (iter->type) {
+	case TRI_CONSUMING:
+		ring_buffer_consume(trace_buffer, iter->evt_cpu, NULL, NULL);
+		break;
+	case TRI_NONCONSUMING:
+		ring_buffer_iter_advance(__get_rb_iter(iter, iter->evt_cpu));
+		break;
+	}
+}
+
+static struct remote_event *trace_remote_find_event(struct trace_remote *remote, unsigned short id);
+
+static int trace_remote_iter_print_event(struct trace_remote_iterator *iter)
+{
+	struct remote_event *evt;
+	unsigned long usecs_rem;
+	u64 ts = iter->ts;
+
+	if (iter->lost_events)
+		trace_seq_printf(&iter->seq, "CPU:%d [LOST %lu EVENTS]\n",
+				 iter->evt_cpu, iter->lost_events);
+
+	do_div(ts, 1000);
+	usecs_rem = do_div(ts, USEC_PER_SEC);
+
+	trace_seq_printf(&iter->seq, "[%03d]\t%5llu.%06lu: ", iter->evt_cpu,
+			 ts, usecs_rem);
+
+	evt = trace_remote_find_event(iter->remote, iter->evt->id);
+	if (!evt)
+		trace_seq_printf(&iter->seq, "UNKNOWN id=%d\n", iter->evt->id);
+	else
+		evt->print(iter->evt, &iter->seq);
+
+	return trace_seq_has_overflowed(&iter->seq) ? -EOVERFLOW : 0;
+}
+
+static int trace_pipe_open(struct inode *inode, struct file *filp)
+{
+	struct trace_remote *remote = inode->i_private;
+	struct trace_remote_iterator *iter;
+	int cpu = tracing_get_cpu(inode);
+
+	guard(mutex)(&remote->lock);
+
+	iter = trace_remote_iter(remote, cpu, TRI_CONSUMING);
+	if (IS_ERR(iter))
+		return PTR_ERR(iter);
+
+	filp->private_data = iter;
+
+	return IS_ERR(iter) ? PTR_ERR(iter) : 0;
+}
+
+static int trace_pipe_release(struct inode *inode, struct file *filp)
+{
+	struct trace_remote_iterator *iter = filp->private_data;
+	struct trace_remote *remote = iter->remote;
+
+	guard(mutex)(&remote->lock);
+
+	trace_remote_iter_free(iter);
+
+	return 0;
+}
+
+static ssize_t trace_pipe_read(struct file *filp, char __user *ubuf, size_t cnt, loff_t *ppos)
+{
+	struct trace_remote_iterator *iter = filp->private_data;
+	struct trace_buffer *trace_buffer = iter->remote->trace_buffer;
+	int ret;
+
+copy_to_user:
+	ret = trace_seq_to_user(&iter->seq, ubuf, cnt);
+	if (ret != -EBUSY)
+		return ret;
+
+	trace_seq_init(&iter->seq);
+
+	ret = ring_buffer_wait(trace_buffer, iter->cpu, 0, NULL, NULL);
+	if (ret < 0)
+		return ret;
+
+	trace_remote_iter_read_start(iter);
+
+	while (trace_remote_iter_read_event(iter)) {
+		int prev_len = iter->seq.seq.len;
+
+		if (trace_remote_iter_print_event(iter)) {
+			iter->seq.seq.len = prev_len;
+			break;
+		}
+
+		trace_remote_iter_move(iter);
+	}
+
+	trace_remote_iter_read_finished(iter);
+
+	goto copy_to_user;
+}
+
+static const struct file_operations trace_pipe_fops = {
+	.open = trace_pipe_open,
+	.read = trace_pipe_read,
+	.release = trace_pipe_release,
+};
+
+static void *trace_next(struct seq_file *m, void *v, loff_t *pos)
+{
+	struct trace_remote_iterator *iter = m->private;
+
+	++*pos;
+
+	if (!iter || !trace_remote_iter_read_event(iter))
+		return NULL;
+
+	trace_remote_iter_move(iter);
+	iter->pos++;
+
+	return iter;
+}
+
+static void *trace_start(struct seq_file *m, loff_t *pos)
+{
+	struct trace_remote_iterator *iter = m->private;
+	loff_t i;
+
+	if (!iter)
+		return NULL;
+
+	trace_remote_iter_read_start(iter);
+
+	if (!*pos) {
+		iter->pos = -1;
+		return trace_next(m, NULL, &i);
+	}
+
+	i = iter->pos;
+	while (i < *pos) {
+		iter = trace_next(m, NULL, &i);
+		if (!iter)
+			return NULL;
+	}
+
+	return iter;
+}
+
+static int trace_show(struct seq_file *m, void *v)
+{
+	struct trace_remote_iterator *iter = v;
+
+	trace_seq_init(&iter->seq);
+
+	if (trace_remote_iter_print_event(iter)) {
+		seq_printf(m, "[EVENT %d PRINT TOO BIG]\n", iter->evt->id);
+		return 0;
+	}
+
+	return trace_print_seq(m, &iter->seq);
+}
+
+static void trace_stop(struct seq_file *m, void *v)
+{
+	struct trace_remote_iterator *iter = m->private;
+
+	if (iter)
+		trace_remote_iter_read_finished(iter);
+}
+
+static const struct seq_operations trace_sops = {
+	.start = trace_start,
+	.next = trace_next,
+	.show = trace_show,
+	.stop = trace_stop,
+};
+
+static int trace_open(struct inode *inode, struct file *filp)
+{
+	struct trace_remote *remote = inode->i_private;
+	struct trace_remote_iterator *iter = NULL;
+	int cpu = tracing_get_cpu(inode);
+	int ret;
+
+	if (!(filp->f_mode & FMODE_READ))
+		return 0;
+
+	guard(mutex)(&remote->lock);
+
+	iter = trace_remote_iter(remote, cpu, TRI_NONCONSUMING);
+	if (IS_ERR(iter))
+		return PTR_ERR(iter);
+
+	ret = seq_open(filp, &trace_sops);
+	if (ret) {
+		trace_remote_iter_free(iter);
+		return ret;
+	}
+
+	((struct seq_file *)filp->private_data)->private = (void *)iter;
+
+	return 0;
+}
+
+static int trace_release(struct inode *inode, struct file *filp)
+{
+	struct trace_remote_iterator *iter;
+
+	if (!(filp->f_mode & FMODE_READ))
+		return 0;
+
+	iter = ((struct seq_file *)filp->private_data)->private;
+	seq_release(inode, filp);
+
+	if (!iter)
+		return 0;
+
+	guard(mutex)(&iter->remote->lock);
+
+	trace_remote_iter_free(iter);
+
+	return 0;
+}
+
+static ssize_t trace_write(struct file *filp, const char __user *ubuf, size_t cnt, loff_t *ppos)
+{
+	struct inode *inode = file_inode(filp);
+	struct trace_remote *remote = inode->i_private;
+	int cpu = tracing_get_cpu(inode);
+
+	guard(mutex)(&remote->lock);
+
+	trace_remote_reset(remote, cpu);
+
+	return cnt;
+}
+
+static const struct file_operations trace_fops = {
+	.open = trace_open,
+	.write = trace_write,
+	.read = seq_read,
+	.read_iter = seq_read_iter,
+	.release = trace_release,
+};
+
+static int trace_remote_init_tracefs(const char *name, struct trace_remote *remote)
+{
+	struct dentry *remote_d, *percpu_d, *d;
+	static struct dentry *root;
+	static DEFINE_MUTEX(lock);
+	bool root_inited = false;
+	int cpu;
+
+	guard(mutex)(&lock);
+
+	if (!root) {
+		root = tracefs_create_dir(TRACEFS_DIR, NULL);
+		if (!root) {
+			pr_err("Failed to create tracefs dir "TRACEFS_DIR"\n");
+			return -ENOMEM;
+		}
+		root_inited = true;
+	}
+
+	remote_d = tracefs_create_dir(name, root);
+	if (!remote_d) {
+		pr_err("Failed to create tracefs dir "TRACEFS_DIR"%s/\n", name);
+		goto err;
+	}
+
+	d = trace_create_file("tracing_on", TRACEFS_MODE_WRITE, remote_d, remote, &tracing_on_fops);
+	if (!d)
+		goto err;
+
+	d = trace_create_file("buffer_size_kb", TRACEFS_MODE_WRITE, remote_d, remote,
+			      &buffer_size_kb_fops);
+	if (!d)
+		goto err;
+
+	d = trace_create_file("trace_pipe", TRACEFS_MODE_READ, remote_d, remote, &trace_pipe_fops);
+	if (!d)
+		goto err;
+
+	d = trace_create_file("trace", TRACEFS_MODE_WRITE, remote_d, remote, &trace_fops);
+	if (!d)
+		goto err;
+
+	percpu_d = tracefs_create_dir("per_cpu", remote_d);
+	if (!percpu_d) {
+		pr_err("Failed to create tracefs dir "TRACEFS_DIR"%s/per_cpu/\n", name);
+		goto err;
+	}
+
+	for_each_possible_cpu(cpu) {
+		struct dentry *cpu_d;
+		char cpu_name[16];
+
+		snprintf(cpu_name, sizeof(cpu_name), "cpu%d", cpu);
+		cpu_d = tracefs_create_dir(cpu_name, percpu_d);
+		if (!cpu_d) {
+			pr_err("Failed to create tracefs dir "TRACEFS_DIR"%s/percpu/cpu%d\n",
+			       name, cpu);
+			goto err;
+		}
+
+		d = trace_create_cpu_file("trace_pipe", TRACEFS_MODE_READ, cpu_d, remote, cpu,
+					  &trace_pipe_fops);
+		if (!d)
+			goto err;
+
+		d = trace_create_cpu_file("trace", TRACEFS_MODE_WRITE, cpu_d, remote, cpu,
+					  &trace_fops);
+		if (!d)
+			goto err;
+	}
+
+	remote->dentry = remote_d;
+
+	return 0;
+
+err:
+	if (root_inited) {
+		tracefs_remove(root);
+		root = NULL;
+	} else {
+		tracefs_remove(remote_d);
+	}
+
+	return -ENOMEM;
+}
+
+static int trace_remote_register_events(const char *remote_name, struct trace_remote *remote,
+					struct remote_event *events, size_t nr_events);
+
+/**
+ * trace_remote_register() - Register a Tracefs remote
+ * @name: Name of the remote, used for the Tracefs remotes/ directory.
+ * @cbs: Set of callbacks used to control the remote.
+ * @priv: Private data, passed to each callback from @cbs.
+ * @events: Array of events. &remote_event.name and &remote_event.id must be
+ *          filled by the caller.
+ * @nr_events: Number of events in the @events array.
+ *
+ * A trace remote is an entity, outside of the kernel (most likely firmware or
+ * hypervisor) capable of writing events into a Tracefs compatible ring-buffer.
+ * The kernel would then act as a reader.
+ *
+ * The registered remote will be found under the Tracefs directory
+ * remotes/<name>.
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+int trace_remote_register(const char *name, struct trace_remote_callbacks *cbs, void *priv,
+			  struct remote_event *events, size_t nr_events)
+{
+	struct trace_remote *remote;
+	int ret;
+
+	remote = kzalloc_obj(*remote);
+	if (!remote)
+		return -ENOMEM;
+
+	remote->cbs = cbs;
+	remote->priv = priv;
+	remote->trace_buffer_size = 7 << 10;
+	remote->poll_ms = 100;
+	mutex_init(&remote->lock);
+	init_rwsem(&remote->reader_lock);
+
+	if (trace_remote_init_tracefs(name, remote)) {
+		kfree(remote);
+		return -ENOMEM;
+	}
+
+	ret = trace_remote_register_events(name, remote, events, nr_events);
+	if (ret) {
+		pr_err("Failed to register events for trace remote '%s' (%d)\n",
+		       name, ret);
+		return ret;
+	}
+
+	ret = cbs->init ? cbs->init(remote->dentry, priv) : 0;
+	if (ret)
+		pr_err("Init failed for trace remote '%s' (%d)\n", name, ret);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(trace_remote_register);
+
+/**
+ * trace_remote_free_buffer() - Free trace buffer allocated with trace_remote_alloc_buffer()
+ * @desc: Descriptor of the per-CPU ring-buffers, originally filled by
+ *        trace_remote_alloc_buffer()
+ *
+ * Most likely called from &trace_remote_callbacks.unload_trace_buffer.
+ */
+void trace_remote_free_buffer(struct trace_buffer_desc *desc)
+{
+	struct ring_buffer_desc *rb_desc;
+	int cpu;
+
+	for_each_ring_buffer_desc(rb_desc, cpu, desc) {
+		unsigned int id;
+
+		free_page(rb_desc->meta_va);
+
+		for (id = 0; id < rb_desc->nr_page_va; id++)
+			free_page(rb_desc->page_va[id]);
+	}
+}
+EXPORT_SYMBOL_GPL(trace_remote_free_buffer);
+
+/**
+ * trace_remote_alloc_buffer() - Dynamically allocate a trace buffer
+ * @desc: Uninitialized trace_buffer_desc
+ * @desc_size: Size of the trace_buffer_desc. Must be at least equal to
+ *             trace_buffer_desc_size()
+ * @buffer_size: Size in bytes of each per-CPU ring-buffer
+ * @cpumask: CPUs to allocate a ring-buffer for
+ *
+ * Helper to dynamically allocate a set of pages (enough to cover @buffer_size)
+ * for each CPU from @cpumask and fill @desc. Most likely called from
+ * &trace_remote_callbacks.load_trace_buffer.
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+int trace_remote_alloc_buffer(struct trace_buffer_desc *desc, size_t desc_size, size_t buffer_size,
+			      const struct cpumask *cpumask)
+{
+	unsigned int nr_pages = max(DIV_ROUND_UP(buffer_size, PAGE_SIZE), 2UL) + 1;
+	void *desc_end = desc + desc_size;
+	struct ring_buffer_desc *rb_desc;
+	int cpu, ret = -ENOMEM;
+
+	if (desc_size < struct_size(desc, __data, 0))
+		return -EINVAL;
+
+	desc->nr_cpus = 0;
+	desc->struct_len = struct_size(desc, __data, 0);
+
+	rb_desc = (struct ring_buffer_desc *)&desc->__data[0];
+
+	for_each_cpu(cpu, cpumask) {
+		unsigned int id;
+
+		if ((void *)rb_desc + struct_size(rb_desc, page_va, nr_pages) > desc_end) {
+			ret = -EINVAL;
+			goto err;
+		}
+
+		rb_desc->cpu = cpu;
+		rb_desc->nr_page_va = 0;
+		rb_desc->meta_va = (unsigned long)__get_free_page(GFP_KERNEL);
+		if (!rb_desc->meta_va)
+			goto err;
+
+		for (id = 0; id < nr_pages; id++) {
+			rb_desc->page_va[id] = (unsigned long)__get_free_page(GFP_KERNEL);
+			if (!rb_desc->page_va[id])
+				goto err;
+
+			rb_desc->nr_page_va++;
+		}
+		desc->nr_cpus++;
+		desc->struct_len += offsetof(struct ring_buffer_desc, page_va);
+		desc->struct_len += struct_size(rb_desc, page_va, rb_desc->nr_page_va);
+		rb_desc = __next_ring_buffer_desc(rb_desc);
+	}
+
+	return 0;
+
+err:
+	trace_remote_free_buffer(desc);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(trace_remote_alloc_buffer);
+
+static int
+trace_remote_enable_event(struct trace_remote *remote, struct remote_event *evt, bool enable)
+{
+	int ret;
+
+	lockdep_assert_held(&remote->lock);
+
+	if (evt->enabled == enable)
+		return 0;
+
+	ret = remote->cbs->enable_event(evt->id, enable, remote->priv);
+	if (ret)
+		return ret;
+
+	evt->enabled = enable;
+
+	return 0;
+}
+
+static int remote_event_enable_show(struct seq_file *s, void *unused)
+{
+	struct remote_event *evt = s->private;
+
+	seq_printf(s, "%d\n", evt->enabled);
+
+	return 0;
+}
+
+static ssize_t remote_event_enable_write(struct file *filp, const char __user *ubuf,
+					 size_t count, loff_t *ppos)
+{
+	struct seq_file *seq = filp->private_data;
+	struct remote_event *evt = seq->private;
+	struct trace_remote *remote = evt->remote;
+	u8 enable;
+	int ret;
+
+	ret = kstrtou8_from_user(ubuf, count, 10, &enable);
+	if (ret)
+		return ret;
+
+	guard(mutex)(&remote->lock);
+
+	ret = trace_remote_enable_event(remote, evt, enable);
+	if (ret)
+		return ret;
+
+	return count;
+}
+DEFINE_SHOW_STORE_ATTRIBUTE(remote_event_enable);
+
+static int remote_event_id_show(struct seq_file *s, void *unused)
+{
+	struct remote_event *evt = s->private;
+
+	seq_printf(s, "%d\n", evt->id);
+
+	return 0;
+}
+DEFINE_SHOW_ATTRIBUTE(remote_event_id);
+
+static int remote_event_format_show(struct seq_file *s, void *unused)
+{
+	size_t offset = sizeof(struct remote_event_hdr);
+	struct remote_event *evt = s->private;
+	struct trace_event_fields *field;
+
+	seq_printf(s, "name: %s\n", evt->name);
+	seq_printf(s, "ID: %d\n", evt->id);
+	seq_puts(s,
+		 "format:\n\tfield:unsigned short common_type;\toffset:0;\tsize:2;\tsigned:0;\n\n");
+
+	field = &evt->fields[0];
+	while (field->name) {
+		seq_printf(s, "\tfield:%s %s;\toffset:%zu;\tsize:%u;\tsigned:%d;\n",
+			   field->type, field->name, offset, field->size,
+			   field->is_signed);
+		offset += field->size;
+		field++;
+	}
+
+	if (field != &evt->fields[0])
+		seq_puts(s, "\n");
+
+	seq_printf(s, "print fmt: %s\n", evt->print_fmt);
+
+	return 0;
+}
+DEFINE_SHOW_ATTRIBUTE(remote_event_format);
+
+static int remote_event_callback(const char *name, umode_t *mode, void **data,
+				 const struct file_operations **fops)
+{
+	if (!strcmp(name, "enable")) {
+		*mode = TRACEFS_MODE_WRITE;
+		*fops = &remote_event_enable_fops;
+		return 1;
+	}
+
+	if (!strcmp(name, "id")) {
+		*mode = TRACEFS_MODE_READ;
+		*fops = &remote_event_id_fops;
+		return 1;
+	}
+
+	if (!strcmp(name, "format")) {
+		*mode = TRACEFS_MODE_READ;
+		*fops = &remote_event_format_fops;
+		return 1;
+	}
+
+	return 0;
+}
+
+static ssize_t remote_events_dir_enable_write(struct file *filp, const char __user *ubuf,
+					      size_t count, loff_t *ppos)
+{
+	struct trace_remote *remote = file_inode(filp)->i_private;
+	int i, ret;
+	u8 enable;
+
+	ret = kstrtou8_from_user(ubuf, count, 10, &enable);
+	if (ret)
+		return ret;
+
+	guard(mutex)(&remote->lock);
+
+	for (i = 0; i < remote->nr_events; i++) {
+		struct remote_event *evt = &remote->events[i];
+
+		trace_remote_enable_event(remote, evt, enable);
+	}
+
+	return count;
+}
+
+static ssize_t remote_events_dir_enable_read(struct file *filp, char __user *ubuf, size_t cnt,
+					     loff_t *ppos)
+{
+	struct trace_remote *remote = file_inode(filp)->i_private;
+	const char enabled_char[] = {'0', '1', 'X'};
+	char enabled_str[] = " \n";
+	int i, enabled = -1;
+
+	guard(mutex)(&remote->lock);
+
+	for (i = 0; i < remote->nr_events; i++) {
+		struct remote_event *evt = &remote->events[i];
+
+		if (enabled == -1) {
+			enabled = evt->enabled;
+		} else if (enabled != evt->enabled) {
+			enabled = 2;
+			break;
+		}
+	}
+
+	enabled_str[0] = enabled_char[enabled == -1 ? 0 : enabled];
+
+	return simple_read_from_buffer(ubuf, cnt, ppos, enabled_str, 2);
+}
+
+static const struct file_operations remote_events_dir_enable_fops = {
+	.write = remote_events_dir_enable_write,
+	.read = remote_events_dir_enable_read,
+};
+
+static ssize_t
+remote_events_dir_header_page_read(struct file *filp, char __user *ubuf, size_t cnt, loff_t *ppos)
+{
+	struct trace_seq *s;
+	int ret;
+
+	s = kmalloc(sizeof(*s), GFP_KERNEL);
+	if (!s)
+		return -ENOMEM;
+
+	trace_seq_init(s);
+
+	ring_buffer_print_page_header(NULL, s);
+	ret = simple_read_from_buffer(ubuf, cnt, ppos, s->buffer, trace_seq_used(s));
+	kfree(s);
+
+	return ret;
+}
+
+static const struct file_operations remote_events_dir_header_page_fops = {
+	.read = remote_events_dir_header_page_read,
+};
+
+static ssize_t
+remote_events_dir_header_event_read(struct file *filp, char __user *ubuf, size_t cnt, loff_t *ppos)
+{
+	struct trace_seq *s;
+	int ret;
+
+	s = kmalloc(sizeof(*s), GFP_KERNEL);
+	if (!s)
+		return -ENOMEM;
+
+	trace_seq_init(s);
+
+	ring_buffer_print_entry_header(s);
+	ret = simple_read_from_buffer(ubuf, cnt, ppos, s->buffer, trace_seq_used(s));
+	kfree(s);
+
+	return ret;
+}
+
+static const struct file_operations remote_events_dir_header_event_fops = {
+	.read = remote_events_dir_header_event_read,
+};
+
+static int remote_events_dir_callback(const char *name, umode_t *mode, void **data,
+				      const struct file_operations **fops)
+{
+	if (!strcmp(name, "enable")) {
+		*mode = TRACEFS_MODE_WRITE;
+		*fops = &remote_events_dir_enable_fops;
+		return 1;
+	}
+
+	if (!strcmp(name, "header_page")) {
+		*mode = TRACEFS_MODE_READ;
+		*fops = &remote_events_dir_header_page_fops;
+		return 1;
+	}
+
+	if (!strcmp(name, "header_event")) {
+		*mode = TRACEFS_MODE_READ;
+		*fops = &remote_events_dir_header_event_fops;
+		return 1;
+	}
+
+	return 0;
+}
+
+static int trace_remote_init_eventfs(const char *remote_name, struct trace_remote *remote,
+				     struct remote_event *evt)
+{
+	struct eventfs_inode *eventfs = remote->eventfs;
+	static struct eventfs_entry dir_entries[] = {
+		{
+			.name = "enable",
+			.callback = remote_events_dir_callback,
+		}, {
+			.name = "header_page",
+			.callback = remote_events_dir_callback,
+		}, {
+			.name = "header_event",
+			.callback = remote_events_dir_callback,
+		}
+	};
+	static struct eventfs_entry entries[] = {
+		{
+			.name = "enable",
+			.callback = remote_event_callback,
+		}, {
+			.name = "id",
+			.callback = remote_event_callback,
+		}, {
+			.name = "format",
+			.callback = remote_event_callback,
+		}
+	};
+	bool eventfs_create = false;
+
+	if (!eventfs) {
+		eventfs = eventfs_create_events_dir("events", remote->dentry, dir_entries,
+						    ARRAY_SIZE(dir_entries), remote);
+		if (IS_ERR(eventfs))
+			return PTR_ERR(eventfs);
+
+		/*
+		 * Create similar hierarchy as local events even if a single system is supported at
+		 * the moment
+		 */
+		eventfs = eventfs_create_dir(remote_name, eventfs, NULL, 0, NULL);
+		if (IS_ERR(eventfs))
+			return PTR_ERR(eventfs);
+
+		remote->eventfs = eventfs;
+		eventfs_create = true;
+	}
+
+	eventfs = eventfs_create_dir(evt->name, eventfs, entries, ARRAY_SIZE(entries), evt);
+	if (IS_ERR(eventfs)) {
+		if (eventfs_create) {
+			eventfs_remove_events_dir(remote->eventfs);
+			remote->eventfs = NULL;
+		}
+		return PTR_ERR(eventfs);
+	}
+
+	return 0;
+}
+
+static int trace_remote_attach_events(struct trace_remote *remote, struct remote_event *events,
+				      size_t nr_events)
+{
+	int i;
+
+	for (i = 0; i < nr_events; i++) {
+		struct remote_event *evt = &events[i];
+
+		if (evt->remote)
+			return -EEXIST;
+
+		evt->remote = remote;
+
+		/* We need events to be sorted for efficient lookup */
+		if (i && evt->id <= events[i - 1].id)
+			return -EINVAL;
+	}
+
+	remote->events = events;
+	remote->nr_events = nr_events;
+
+	return 0;
+}
+
+static int trace_remote_register_events(const char *remote_name, struct trace_remote *remote,
+					struct remote_event *events, size_t nr_events)
+{
+	int i, ret;
+
+	ret = trace_remote_attach_events(remote, events, nr_events);
+	if (ret)
+		return ret;
+
+	for (i = 0; i < nr_events; i++) {
+		struct remote_event *evt = &events[i];
+
+		ret = trace_remote_init_eventfs(remote_name, remote, evt);
+		if (ret)
+			pr_warn("Failed to init eventfs for event '%s' (%d)",
+				evt->name, ret);
+	}
+
+	return 0;
+}
+
+static int __cmp_events(const void *key, const void *data)
+{
+	const struct remote_event *evt = data;
+	int id = (int)((long)key);
+
+	return id - (int)evt->id;
+}
+
+static struct remote_event *trace_remote_find_event(struct trace_remote *remote, unsigned short id)
+{
+	return bsearch((const void *)(unsigned long)id, remote->events, remote->nr_events,
+		       sizeof(*remote->events), __cmp_events);
+}
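Note on the event lookup above: trace_remote_attach_events() rejects an events array whose ids are not strictly increasing, and trace_remote_find_event() relies on that ordering to use bsearch(). The following is a minimal user-space sketch of the same idea, not the kernel code. The `demo_*` names, the reduced struct and the id values are hypothetical and exist only for illustration. Unlike the kernel version, which packs the id into the key pointer itself, this sketch passes a pointer to the id, which is the more conventional form for the libc bsearch().

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct remote_event: only the fields used here. */
struct demo_event {
	unsigned short id;
	const char *name;
};

/* bsearch() comparator: key points to the id being searched for. */
static int demo_cmp_events(const void *key, const void *data)
{
	const unsigned short *id = key;
	const struct demo_event *evt = data;

	return (int)*id - (int)evt->id;
}

/*
 * Same rule as the attach step: ids must be strictly increasing.
 * Returns 0 when the array is usable for binary search, -1 otherwise.
 */
static int demo_check_sorted(const struct demo_event *events, size_t nr_events)
{
	for (size_t i = 1; i < nr_events; i++) {
		if (events[i].id <= events[i - 1].id)
			return -1;
	}

	return 0;
}

/* Returns the matching event, or NULL when the id is unknown. */
static const struct demo_event *
demo_find_event(const struct demo_event *events, size_t nr_events, unsigned short id)
{
	return bsearch(&id, events, nr_events, sizeof(*events), demo_cmp_events);
}

/* Illustrative table; the ids are made up, only the ordering matters. */
static const struct demo_event demo_events[] = {
	{ .id = 1, .name = "hyp_enter" },
	{ .id = 2, .name = "hyp_exit" },
	{ .id = 7, .name = "selftest" },
};
```

A NULL result from the lookup corresponds to the "UNKNOWN id=%d" branch in trace_remote_iter_print_event(). Checking the ordering once at registration keeps each per-event lookup at O(log n) without sorting a caller-owned array.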
+25
tools/testing/selftests/ftrace/test.d/remotes/buffer_size.tc
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: Test trace remote buffer size
+# requires: remotes/test
+
+. $TEST_DIR/remotes/functions
+
+test_buffer_size()
+{
+	echo 0 > tracing_on
+	assert_unloaded
+
+	echo 4096 > buffer_size_kb
+	echo 1 > tracing_on
+	assert_loaded
+
+	echo 0 > tracing_on
+	echo 7 > buffer_size_kb
+}
+
+if [ -z "$SOURCE_REMOTE_TEST" ]; then
+	set -e
+	setup_remote_test
+	test_buffer_size
+fi
+99
tools/testing/selftests/ftrace/test.d/remotes/functions
+# SPDX-License-Identifier: GPL-2.0
+
+setup_remote()
+{
+	local name=$1
+
+	[ -e $TRACING_DIR/remotes/$name/write_event ] || exit_unresolved
+
+	cd remotes/$name/
+	echo 0 > tracing_on
+	clear_trace
+	echo 7 > buffer_size_kb
+	echo 0 > events/enable
+	echo 1 > events/$name/selftest/enable
+	echo 1 > tracing_on
+}
+
+setup_remote_test()
+{
+	[ -d $TRACING_DIR/remotes/test/ ] || modprobe remote_test || exit_unresolved
+
+	setup_remote "test"
+}
+
+assert_loaded()
+{
+	grep -q "(loaded)" buffer_size_kb || return 1
+}
+
+assert_unloaded()
+{
+	grep -q "(unloaded)" buffer_size_kb || return 1
+}
+
+reload_remote()
+{
+	echo 0 > tracing_on
+	clear_trace
+	assert_unloaded
+	echo 1 > tracing_on
+	assert_loaded
+}
+
+dump_trace_pipe()
+{
+	output=$(mktemp $TMPDIR/remote_test.XXXXXX)
+	cat trace_pipe > $output &
+	pid=$!
+	sleep 1
+	kill -1 $pid
+
+	echo $output
+}
+
+check_trace()
+{
+	start_id="$1"
+	end_id="$2"
+	file="$3"
+
+	# Ensure the file is not empty
+	test -n "$(head $file)"
+
+	prev_ts=0
+	id=0
+
+	# Only keep <timestamp> <id>
+	tmp=$(mktemp $TMPDIR/remote_test.XXXXXX)
+	sed -e 's/\[[0-9]*\]\s*\([0-9]*.[0-9]*\): [a-z]* id=\([0-9]*\)/\1 \2/' $file > $tmp
+
+	while IFS= read -r line; do
+		ts=$(echo $line | cut -d ' ' -f 1)
+		id=$(echo $line | cut -d ' ' -f 2)
+
+		test $(echo "$ts>$prev_ts" | bc) -eq 1
+		test $id -eq $start_id
+
+		prev_ts=$ts
+		start_id=$((start_id + 1))
+	done < $tmp
+
+	test $id -eq $end_id
+	rm $tmp
+}
+
+get_cpu_ids()
+{
+	sed -n 's/^processor\s*:\s*\([0-9]\+\).*/\1/p' /proc/cpuinfo
+}
+
+get_page_size()
+{
+	sed -ne 's/^.*data.*size:\([0-9][0-9]*\).*/\1/p' events/header_page
+}
+
+get_selftest_event_size()
+{
+	sed -ne 's/^.*field:.*;.*size:\([0-9][0-9]*\);.*/\1/p' events/*/selftest/format | awk '{s+=$1} END {print s}'
+}
+88
tools/testing/selftests/ftrace/test.d/remotes/hotplug.tc
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: Test trace remote read with an offline CPU
+# requires: remotes/test
+
+. $TEST_DIR/remotes/functions
+
+hotunplug_one_cpu()
+{
+	[ "$(get_cpu_ids | wc -l)" -ge 2 ] || return 1
+
+	for cpu in $(get_cpu_ids); do
+		echo 0 > /sys/devices/system/cpu/cpu$cpu/online || return 1
+		break
+	done
+
+	echo $cpu
+}
+
+# Check non-consuming and consuming read
+check_read()
+{
+	for i in $(seq 1 8); do
+		echo $i > write_event
+	done
+
+	check_trace 1 8 trace
+
+	output=$(dump_trace_pipe)
+	check_trace 1 8 $output
+	rm $output
+}
+
+test_hotplug()
+{
+	echo 0 > trace
+	assert_loaded
+
+	#
+	# Test a trace buffer containing an offline CPU
+	#
+
+	cpu=$(hotunplug_one_cpu) || exit_unsupported
+	trap "echo 1 > /sys/devices/system/cpu/cpu$cpu/online" EXIT
+
+	check_read
+
+	#
+	# Test a trace buffer with a missing CPU
+	#
+
+	reload_remote
+
+	check_read
+
+	#
+	# Test a trace buffer with a CPU added later
+	#
+
+	echo 1 > /sys/devices/system/cpu/cpu$cpu/online
+	trap "" EXIT
+	assert_loaded
+
+	check_read
+
+	# Test if the ring-buffer for the newly added CPU is both writable and
+	# readable
+	for i in $(seq 1 8); do
+		taskset -c $cpu echo $i > write_event
+	done
+
+	cd per_cpu/cpu$cpu/
+
+	check_trace 1 8 trace
+
+	output=$(dump_trace_pipe)
+	check_trace 1 8 $output
+	rm $output
+
+	cd -
+}
+
+if [ -z "$SOURCE_REMOTE_TEST" ]; then
+	set -e
+
+	setup_remote_test
+	test_hotplug
+fi
+11
tools/testing/selftests/ftrace/test.d/remotes/hypervisor/buffer_size.tc
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: Test hypervisor trace buffer size
+# requires: remotes/hypervisor/write_event
+
+SOURCE_REMOTE_TEST=1
+. $TEST_DIR/remotes/buffer_size.tc
+
+set -e
+setup_remote "hypervisor"
+test_buffer_size
+11
tools/testing/selftests/ftrace/test.d/remotes/hypervisor/hotplug.tc
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: Test hypervisor trace read with an offline CPU
+# requires: remotes/hypervisor/write_event
+
+SOURCE_REMOTE_TEST=1
+. $TEST_DIR/remotes/hotplug.tc
+
+set -e
+setup_remote "hypervisor"
+test_hotplug
+11
tools/testing/selftests/ftrace/test.d/remotes/hypervisor/reset.tc
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: Test hypervisor trace buffer reset
+# requires: remotes/hypervisor/write_event
+
+SOURCE_REMOTE_TEST=1
+. $TEST_DIR/remotes/reset.tc
+
+set -e
+setup_remote "hypervisor"
+test_reset
+11
tools/testing/selftests/ftrace/test.d/remotes/hypervisor/trace.tc
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: Test hypervisor non-consuming trace read
+# requires: remotes/hypervisor/write_event
+
+SOURCE_REMOTE_TEST=1
+. $TEST_DIR/remotes/trace.tc
+
+set -e
+setup_remote "hypervisor"
+test_trace
+11
tools/testing/selftests/ftrace/test.d/remotes/hypervisor/trace_pipe.tc
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: Test hypervisor consuming trace read
+# requires: remotes/hypervisor/write_event
+
+SOURCE_REMOTE_TEST=1
+. $TEST_DIR/remotes/trace_pipe.tc
+
+set -e
+setup_remote "hypervisor"
+test_trace_pipe
+11
tools/testing/selftests/ftrace/test.d/remotes/hypervisor/unloading.tc
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: Test hypervisor trace buffer unloading
+# requires: remotes/hypervisor/write_event
+
+SOURCE_REMOTE_TEST=1
+. $TEST_DIR/remotes/unloading.tc
+
+set -e
+setup_remote "hypervisor"
+test_unloading
+90
tools/testing/selftests/ftrace/test.d/remotes/reset.tc
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: Test trace remote reset
+# requires: remotes/test
+
+. $TEST_DIR/remotes/functions
+
+check_reset()
+{
+	write_event_path="write_event"
+	taskset=""
+
+	clear_trace
+
+	# Is the buffer empty?
+	output=$(dump_trace_pipe)
+	test $(wc -l $output | cut -d ' ' -f1) -eq 0
+
+	if $(echo $(pwd) | grep -q "per_cpu/cpu"); then
+		write_event_path="../../write_event"
+		cpu_id=$(echo $(pwd) | sed -e 's/.*per_cpu\/cpu//')
+		taskset="taskset -c $cpu_id"
+	fi
+	rm $output
+
+	# Can we properly write a new event?
+	$taskset echo 7890 > $write_event_path
+	output=$(dump_trace_pipe)
+	test $(wc -l $output | cut -d ' ' -f1) -eq 1
+	grep -q "id=7890" $output
+	rm $output
+}
+
+test_global_interface()
+{
+	output=$(mktemp $TMPDIR/remote_test.XXXXXX)
+
+	# Confidence check
+	echo 123456 > write_event
+	output=$(dump_trace_pipe)
+	grep -q "id=123456" $output
+	rm $output
+
+	# Reset single event
+	echo 1 > write_event
+	check_reset
+
+	# Reset lost events
+	for i in $(seq 1 10000); do
+		echo 1 > write_event
+	done
+	check_reset
+}
+
+test_percpu_interface()
+{
+	[ "$(get_cpu_ids | wc -l)" -ge 2 ] || return 0
+
+	for cpu in $(get_cpu_ids); do
+		taskset -c $cpu echo 1 > write_event
+	done
+
+	check_non_empty=0
+	for cpu in $(get_cpu_ids); do
+		cd per_cpu/cpu$cpu/
+
+		if [ $check_non_empty -eq 0 ]; then
+			check_reset
+			check_non_empty=1
+		else
+			# Check we have only reset 1 CPU
+			output=$(dump_trace_pipe)
+			test $(wc -l $output | cut -d ' ' -f1) -eq 1
+			rm $output
+		fi
+		cd -
+	done
+}
+
+test_reset()
+{
+	test_global_interface
+	test_percpu_interface
+}
+
+if [ -z "$SOURCE_REMOTE_TEST" ]; then
+	set -e
+	setup_remote_test
+	test_reset
+fi
+102
tools/testing/selftests/ftrace/test.d/remotes/trace.tc
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: Test trace remote non-consuming read
+# requires: remotes/test
+
+. $TEST_DIR/remotes/functions
+
+test_trace()
+{
+	echo 0 > tracing_on
+	assert_unloaded
+
+	echo 7 > buffer_size_kb
+	echo 1 > tracing_on
+	assert_loaded
+
+	# Simple test: Emit few events and try to read them
+	for i in $(seq 1 8); do
+		echo $i > write_event
+	done
+
+	check_trace 1 8 trace
+
+	#
+	# Test interaction with consuming read
+	#
+
+	cat trace_pipe > /dev/null &
+	pid=$!
+
+	sleep 1
+	kill $pid
+
+	test $(wc -l < trace) -eq 0
+
+	for i in $(seq 16 32); do
+		echo $i > write_event
+	done
+
+	check_trace 16 32 trace
+
+	#
+	# Test interaction with reset
+	#
+
+	echo 0 > trace
+
+	test $(wc -l < trace) -eq 0
+
+	for i in $(seq 1 8); do
+		echo $i > write_event
+	done
+
+	check_trace 1 8 trace
+
+	#
+	# Test interaction with lost events
+	#
+
+	# Ensure the writer is not on the reader page by reloading the buffer
+	reload_remote
+
+	# Ensure ring-buffer overflow by emitting events from the same CPU
+	for cpu in $(get_cpu_ids); do
+		break
+	done
+
+	events_per_page=$(($(get_page_size) / $(get_selftest_event_size))) # Approx: does not take TS into account
+	nr_events=$(($events_per_page * 2))
+	for i in $(seq 1 $nr_events); do
+		taskset -c $cpu echo $i > write_event
+	done
+
+	id=$(sed -n -e '1s/\[[0-9]*\]\s*[0-9]*.[0-9]*: [a-z]* id=\([0-9]*\)/\1/p' trace)
+	test $id -ne 1
+
+	check_trace $id $nr_events trace
+
+	#
+	# Test per-CPU interface
+	#
+	echo 0 > trace
+
+	for cpu in $(get_cpu_ids) ; do
+		taskset -c $cpu echo $cpu > write_event
+	done
+
+	for cpu in $(get_cpu_ids); do
+		cd per_cpu/cpu$cpu/
+
+		check_trace $cpu $cpu trace
+
+		cd - > /dev/null
+	done
+}
+
+if [ -z "$SOURCE_REMOTE_TEST" ]; then
+	set -e
+
+	setup_remote_test
+	test_trace
+fi
+102
tools/testing/selftests/ftrace/test.d/remotes/trace_pipe.tc
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: Test trace remote consuming read
+# requires: remotes/test
+
+. $TEST_DIR/remotes/functions
+
+test_trace_pipe()
+{
+	echo 0 > tracing_on
+	assert_unloaded
+
+	# Emit events from the same CPU
+	for cpu in $(get_cpu_ids); do
+		break
+	done
+
+	#
+	# Simple test: Emit enough events to fill few pages
+	#
+
+	echo 1024 > buffer_size_kb
+	echo 1 > tracing_on
+	assert_loaded
+
+	events_per_page=$(($(get_page_size) / $(get_selftest_event_size)))
+	nr_events=$(($events_per_page * 4))
+
+	output=$(mktemp $TMPDIR/remote_test.XXXXXX)
+
+	cat trace_pipe > $output &
+	pid=$!
+
+	for i in $(seq 1 $nr_events); do
+		taskset -c $cpu echo $i > write_event
+	done
+
+	echo 0 > tracing_on
+	sleep 1
+	kill $pid
+
+	check_trace 1 $nr_events $output
+
+	rm $output
+
+	#
+	# Test interaction with lost events
+	#
+
+	assert_unloaded
+	echo 7 > buffer_size_kb
+	echo 1 > tracing_on
+	assert_loaded
+
+	nr_events=$((events_per_page * 2))
+	for i in $(seq 1 $nr_events); do
+		taskset -c $cpu echo $i > write_event
+	done
+
+	output=$(dump_trace_pipe)
+
+	lost_events=$(sed -n -e '1s/CPU:.*\[LOST \([0-9]*\) EVENTS\]/\1/p' $output)
+	test -n "$lost_events"
+
+	id=$(sed -n -e '2s/\[[0-9]*\]\s*[0-9]*.[0-9]*: [a-z]* id=\([0-9]*\)/\1/p' $output)
+	test "$id" -eq $(($lost_events + 1))
+
+	# Drop [LOST EVENTS] line
+	sed -i '1d' $output
+
+	check_trace $id $nr_events $output
+
+	rm $output
+
+	#
+	# Test per-CPU interface
+	#
+
+	echo 0 > trace
+	echo 1 > tracing_on
+
+	for cpu in $(get_cpu_ids); do
+		taskset -c $cpu echo $cpu > write_event
+	done
+
+	for cpu in $(get_cpu_ids); do
+		cd per_cpu/cpu$cpu/
+		output=$(dump_trace_pipe)
+
+		check_trace $cpu $cpu $output
+
+		rm $output
+		cd - > /dev/null
+	done
+}
+
+if [ -z "$SOURCE_REMOTE_TEST" ]; then
+	set -e
+
+	setup_remote_test
+	test_trace_pipe
+fi
+41
tools/testing/selftests/ftrace/test.d/remotes/unloading.tc
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+# description: Test trace remote unloading
+# requires: remotes/test
+
+. $TEST_DIR/remotes/functions
+
+test_unloading()
+{
+	# No reader, writing
+	assert_loaded
+
+	# No reader, no writing
+	echo 0 > tracing_on
+	assert_unloaded
+
+	# 1 reader, no writing
+	cat trace_pipe &
+	pid=$!
+	sleep 1
+	assert_loaded
+	kill $pid
+	assert_unloaded
+
+	# No reader, no writing, events
+	echo 1 > tracing_on
+	echo 1 > write_event
+	echo 0 > tracing_on
+	assert_loaded
+
+	# Test reset
+	clear_trace
+	assert_unloaded
+}
+
+if [ -z "$SOURCE_REMOTE_TEST" ]; then
+	set -e
+
+	setup_remote_test
+	test_unloading
+fi