Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

tracing: Fix false sharing in hwlat get_sample()

The get_sample() function in the hwlat tracer assumes the caller holds
hwlat_data.lock, but the caller does not actually take it. The result is
unprotected access to hwlat_data, which in per-cpu mode can cause false
sharing that shows up as false-positive latency events.

The specific case of false sharing observed was primarily between
hwlat_data.sample_width and hwlat_data.count. These are separated by
just 8 bytes and are therefore likely to share a cache line. When one
thread modifies count, the cache line is left in a modified state, so
other threads reading sample_width in the main latency detection loop
must fetch the modified line. On some systems the fetch itself may be
slow enough to count as a latency event. This can set up a
self-reinforcing cycle: each event increments count, which in turn
causes more latency events.
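
The layout argument above can be checked with a small userspace sketch.
The struct hwlat_fields below is a hypothetical stand-in that keeps only
the three adjacent u64 members; the real hwlat_data has other members
before them, and the 64-byte line size and line-aligned base address are
assumptions, so this shows why sharing is likely rather than proving it
for any given build.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the three adjacent u64 fields of hwlat_data.
 * The real struct has other members (such as the mutex) before these.
 */
struct hwlat_fields {
	uint64_t count;         /* written on every latency event */
	uint64_t sample_window;
	uint64_t sample_width;  /* read on every sampling loop iteration */
};

/*
 * True if two byte offsets fall in the same cache line, assuming the
 * enclosing object starts on a cache-line boundary.
 */
static bool same_cache_line(size_t off_a, size_t off_b, size_t line_size)
{
	assert(line_size > 0);
	return off_a / line_size == off_b / line_size;
}

/* Bytes between the end of count and the start of sample_width. */
static size_t gap_between_count_and_width(void)
{
	return offsetof(struct hwlat_fields, sample_width) -
	       (offsetof(struct hwlat_fields, count) + sizeof(uint64_t));
}
```

With this layout the gap is the 8 bytes taken by sample_window, and both
fields land in the same 64-byte line, so every write to count invalidates
the line that readers of sample_width need.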

The other result of the unprotected data access is that hwlat_data.count
can end up with duplicate or missed values, which was observed on some
systems in testing.
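
One way duplicate and missed values can arise is sketched below. The
function racy_interleaving() is a hypothetical helper that replays, in a
single thread, one possible ordering of the old two-step update on two
CPUs; inc_return() is a userspace analogue of atomic64_inc_return()
built on C11 atomics, not the kernel implementation. A plain count++ can
also lose updates outright, which this sketch does not model.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Replays one possible interleaving of the old two-step update
 * (count++; seqnum = count;) on two CPUs. Done in a single thread so
 * the outcome is deterministic.
 */
static void racy_interleaving(uint64_t *count, uint64_t seq[2])
{
	assert(count && seq);
	(*count)++;       /* CPU 0: hwlat_data.count++ */
	(*count)++;       /* CPU 1: hwlat_data.count++ */
	seq[0] = *count;  /* CPU 0: s.seqnum = hwlat_data.count */
	seq[1] = *count;  /* CPU 1: s.seqnum = hwlat_data.count */
}

/*
 * Userspace analogue of atomic64_inc_return(): increment and return the
 * new value as one indivisible step.
 */
static uint64_t inc_return(_Atomic uint64_t *count)
{
	return atomic_fetch_add(count, 1) + 1;
}
```

In the replayed ordering both CPUs report sequence number 2 and the
value 1 is never reported. With the atomic helper each caller gets its
own value because the increment and the read cannot be separated.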

Convert hwlat_data.count to atomic64_t so it can be safely modified
without locking, and prevent false sharing by pulling sample_width into
a local variable.
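
The local-variable part of the fix can be illustrated in userspace as
follows. READ_ONCE_U64 is a simplified stand-in for the kernel's
READ_ONCE(), and struct shared_cfg and run_window() are hypothetical
names used only for this sketch; the loop body is reduced to a fixed
time step instead of real timestamp reads.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified userspace stand-in for the kernel's READ_ONCE(): a single
 * volatile load that the compiler may not repeat or elide.
 */
#define READ_ONCE_U64(x) (*(const volatile uint64_t *)&(x))

/* Hypothetical shared state; only sample_width matters here. */
struct shared_cfg {
	uint64_t sample_width;
};

/*
 * Advances `step` per iteration until the total exceeds the width,
 * mirroring `do { ... } while (total <= sample_width)`. The shared
 * field is read once, before the loop, so the loop itself touches only
 * local variables. Returns the number of iterations.
 */
static uint64_t run_window(struct shared_cfg *cfg, uint64_t step)
{
	uint64_t sample_width = READ_ONCE_U64(cfg->sample_width);
	uint64_t total = 0, iters = 0;

	assert(step > 0);
	do {
		total += step;
		iters++;
	} while (total <= sample_width);

	return iters;
}
```

Because the loop condition compares against the local copy, a concurrent
write to a neighbouring field no longer forces the sampling loop to
refetch a shared cache line on every iteration.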

One system this was tested on was a dual-socket server with 32 CPUs on
each NUMA node. With settings of 1us threshold, 1000us width, and
2000us window, this change reduced the number of latency events from
500 per second down to approximately 1 event per minute. Some machines
tested did not exhibit measurable latency from the false sharing.
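
To put the reported figures in proportion, the arithmetic can be written
out as below. The helper names are hypothetical, and since the "after"
rate is only approximate, the resulting factor is a rough magnitude
rather than a measurement.

```c
#include <assert.h>
#include <stdint.h>

/* Percentage of each sampling window spent actively sampling. */
static uint64_t duty_cycle_percent(uint64_t width_us, uint64_t window_us)
{
	assert(window_us > 0);
	return width_us * 100 / window_us;
}

/* Ratio of an events-per-second rate to an events-per-minute rate. */
static uint64_t reduction_factor(uint64_t before_per_sec,
				 uint64_t after_per_min)
{
	assert(after_per_min > 0);
	return before_per_sec * 60 / after_per_min;
}
```

For the settings quoted above, duty_cycle_percent(1000, 2000) is 50, so
each CPU samples for half of every window, and reduction_factor(500, 1)
is 30000, a drop of roughly four orders of magnitude.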

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20260210074810.6328-1-clord@mykolab.com
Signed-off-by: Colin Lord <clord@mykolab.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

Authored by Colin Lord and committed by Steven Rostedt (Google)
f743435f b4bade50

+7 -8
kernel/trace/trace_hwlat.c
···
 /* keep the global state somewhere. */
 static struct hwlat_data {

-	struct mutex lock;		/* protect changes */
+	struct mutex	lock;		/* protect changes */

-	u64	count;			/* total since reset */
+	atomic64_t	count;		/* total since reset */

 	u64	sample_window;		/* total sampling window (on+off) */
 	u64	sample_width;		/* active sampling portion of window */
···
  * get_sample - sample the CPU TSC and look for likely hardware latencies
  *
  * Used to repeatedly capture the CPU TSC (or similar), looking for potential
- * hardware-induced latency. Called with interrupts disabled and with
- * hwlat_data.lock held.
+ * hardware-induced latency. Called with interrupts disabled.
  */
 static int get_sample(void)
 {
···
 	time_type start, t1, t2, last_t2;
 	s64 diff, outer_diff, total, last_total = 0;
 	u64 sample = 0;
+	u64 sample_width = READ_ONCE(hwlat_data.sample_width);
 	u64 thresh = tracing_thresh;
 	u64 outer_sample = 0;
 	int ret = -1;
···
 		if (diff > sample)
 			sample = diff; /* only want highest value */

-	} while (total <= hwlat_data.sample_width);
+	} while (total <= sample_width);

 	barrier(); /* finish the above in the view for NMIs */
 	trace_hwlat_callback_enabled = false;
···
 		if (kdata->nmi_total_ts)
 			do_div(kdata->nmi_total_ts, NSEC_PER_USEC);

-		hwlat_data.count++;
-		s.seqnum = hwlat_data.count;
+		s.seqnum = atomic64_inc_return(&hwlat_data.count);
 		s.duration = sample;
 		s.outer_duration = outer_sample;
 		s.nmi_total_ts = kdata->nmi_total_ts;
···

 	hwlat_trace = tr;

-	hwlat_data.count = 0;
+	atomic64_set(&hwlat_data.count, 0);
 	tr->max_latency = 0;
 	save_tracing_thresh = tracing_thresh;