Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

blktrace: for ftrace use correct trace format ver

The ftrace blktrace path allocates buffers and writes trace events but
was using the wrong recording function. After
commit 4d8bc7bd4f73 ("blktrace: move ftrace blk_io_tracer to blk_io_trace2"),
the ftrace interface was moved to use blk_io_trace2 format, but
__blk_add_trace() still called record_blktrace_event() which writes in
blk_io_trace (v1) format.

This causes critical data corruption:

- blk_io_trace (v1) has 32-bit 'action' field at offset 28
- blk_io_trace2 (v2) has 32-bit 'pid' at offset 28 and 64-bit 'action'
  at offset 32
- When record_blktrace_event() writes to a v2 buffer:
  * Writing pid (offset 32 in v1) corrupts the v2 action field
  * Writing action (offset 28 in v1) corrupts the v2 pid field
  * The 64-bit action is truncated to 32-bit via lower_32_bits()
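
The offset mismatch described above can be reproduced in user space. The sketch below is illustrative only: the struct names and misrecord() are hypothetical, and the field order is an assumption chosen to match the offsets and sizes quoted in this message, not a copy of the kernel headers. It writes a v1-shaped record into a v2-sized buffer and reads it back as v2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed v1 layout: 32-bit action at offset 28, 32-bit pid at offset 32. */
struct sketch_trace_v1 {
	uint32_t magic;
	uint32_t sequence;
	uint64_t time;
	uint64_t sector;
	uint32_t bytes;
	uint32_t action;
	uint32_t pid;
	uint32_t device;
	uint32_t cpu;
	uint16_t error;
	uint16_t pdu_len;
};

/* Assumed v2 layout: 32-bit pid at offset 28, 64-bit action at offset 32. */
struct sketch_trace_v2 {
	uint32_t magic;
	uint32_t sequence;
	uint64_t time;
	uint64_t sector;
	uint32_t bytes;
	uint32_t pid;
	uint64_t action;
	uint32_t device;
	uint32_t cpu;
	uint16_t error;
	uint16_t pdu_len;
	uint8_t  pad[12];
};

/* Compile-time checks of the offsets and sizes the message relies on. */
_Static_assert(offsetof(struct sketch_trace_v1, action) == 28, "v1 action");
_Static_assert(offsetof(struct sketch_trace_v1, pid) == 32, "v1 pid");
_Static_assert(sizeof(struct sketch_trace_v1) == 48, "v1 size");
_Static_assert(offsetof(struct sketch_trace_v2, pid) == 28, "v2 pid");
_Static_assert(offsetof(struct sketch_trace_v2, action) == 32, "v2 action");
_Static_assert(sizeof(struct sketch_trace_v2) == 64, "v2 size");

/*
 * Hypothetical helper: fill a v1 record, copy it into a zeroed v2-sized
 * buffer, and return what a v2 reader would see in that buffer.
 */
static struct sketch_trace_v2 misrecord(uint32_t pid, uint64_t action)
{
	struct sketch_trace_v1 v1 = {0};
	struct sketch_trace_v2 v2;
	unsigned char buf[sizeof(struct sketch_trace_v2)] = {0};

	v1.pid = pid;
	v1.action = (uint32_t)action;	/* upper 32 bits are lost here */

	memcpy(buf, &v1, sizeof(v1));	/* v1 writer, v2 buffer */
	memcpy(&v2, buf, sizeof(v2));	/* v2 reader */
	return v2;
}
```

With this sketch, misrecord(14242, action) returns a record whose pid field holds the low 32 bits of the action, while the PID lands inside the 64-bit action field (in its low half on a little-endian machine).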

Fix by:
1. Adding version switch to select correct format (v1 vs v2)
2. Calling appropriate recording function based on version
3. Defaulting to v2 for ftrace (as intended by commit 4d8bc7bd4f73)
4. Adding WARN_ONCE for unexpected version values
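
Steps 1 to 3 amount to a small version dispatch. A minimal user-space sketch follows, assuming the record sizes quoted in this message (48 bytes for v1, 64 bytes for v2); struct sketch_tracer and sketch_trace_len() are hypothetical names used only for illustration, not kernel symbols.

```c
#include <assert.h>
#include <stddef.h>

/* Fixed record sizes as quoted in the commit message. */
enum { SKETCH_V1_LEN = 48, SKETCH_V2_LEN = 64 };

/* Hypothetical stand-in for the tracer state; only the version matters. */
struct sketch_tracer {
	int version;	/* 0 when never initialized (zeroed allocation) */
};

/*
 * Pick the fixed record length for the tracer's format version, then add
 * the variable-length tail. Any version other than 1 is treated as v2,
 * and an uninitialized version (0) is normalized to 2.
 */
static size_t sketch_trace_len(struct sketch_tracer *bt,
			       size_t pdu_len, size_t cgid_len)
{
	size_t len;

	switch (bt->version) {
	case 1:
		len = SKETCH_V1_LEN;
		break;
	case 2:
	default:
		len = SKETCH_V2_LEN;
		if (bt->version == 0)
			bt->version = 2;
		break;
	}
	return len + pdu_len + cgid_len;
}
```

The point of routing the default case to v2 is that the reserved length and the recording function chosen later always agree, even when the version was never set.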

Without this patch :-
linux-block (for-next) # sh reproduce_blktrace_bug.sh
dd-14242 [033] d..1. 3903.022308: Unknown action 36a2
dd-14242 [033] d..1. 3903.022333: Unknown action 36a2
dd-14242 [033] d..1. 3903.022365: Unknown action 36a2
dd-14242 [033] d..1. 3903.022366: Unknown action 36a2
dd-14242 [033] d..1. 3903.022369: Unknown action 36a2

The action field is corrupted because:
- ftrace allocated blk_io_trace2 buffer (64 bytes)
- But called record_blktrace_event() (writes v1, 48 bytes)
- Field offsets don't match, causing corruption

The hex value shown (0x36a2 in the output above) is actually a PID, not an action code!

With this patch :-
linux-block (for-next) # sh reproduce_blktrace_bug.sh
Trace output looks correct:

dd-2420 [019] d..1. 59.641742: 251,0 Q RS 0 + 8 [dd]
dd-2420 [019] d..1. 59.641775: 251,0 G RS 0 + 8 [dd]
dd-2420 [019] d..1. 59.641784: 251,0 P N [dd]
dd-2420 [019] d..1. 59.641785: 251,0 U N [dd] 1
dd-2420 [019] d..1. 59.641788: 251,0 D RS 0 + 8 [dd]

Fixes: 4d8bc7bd4f73 ("blktrace: move ftrace blk_io_tracer to blk_io_trace2")
Signed-off-by: Chaitanya Kulkarni <ckulkarnilinux@gmail.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Authored by Chaitanya Kulkarni and committed by Jens Axboe
e48886b9 4a0940bd

+54 -5
kernel/trace/blktrace.c
 
 		buffer = blk_tr->array_buffer.buffer;
 		trace_ctx = tracing_gen_ctx_flags(0);
-		trace_len = sizeof(struct blk_io_trace2) + pdu_len + cgid_len;
+		switch (bt->version) {
+		case 1:
+			trace_len = sizeof(struct blk_io_trace);
+			break;
+		case 2:
+		default:
+			/*
+			 * ftrace always uses v2 (blk_io_trace2) format.
+			 *
+			 * For sysfs-enabled tracing path (enabled via
+			 * /sys/block/DEV/trace/enable), blk_trace_setup_queue()
+			 * never initializes bt->version, leaving it 0 from
+			 * kzalloc(). We must handle version==0 safely here.
+			 *
+			 * Fall through to default to ensure we never hit the
+			 * old bug where default set trace_len=0, causing
+			 * buffer underflow and memory corruption.
+			 *
+			 * Always use v2 format for ftrace and normalize
+			 * bt->version to 2 when uninitialized.
+			 */
+			trace_len = sizeof(struct blk_io_trace2);
+			if (bt->version == 0)
+				bt->version = 2;
+			break;
+		}
+		trace_len += pdu_len + cgid_len;
 		event = trace_buffer_lock_reserve(buffer, TRACE_BLK,
 						  trace_len, trace_ctx);
 		if (!event)
 			return;
 
-		record_blktrace_event(ring_buffer_event_data(event),
-				      pid, cpu, sector, bytes,
-				      what, bt->dev, error, cgid, cgid_len,
-				      pdu_data, pdu_len);
+		switch (bt->version) {
+		case 1:
+			record_blktrace_event(ring_buffer_event_data(event),
+					      pid, cpu, sector, bytes,
+					      what, bt->dev, error, cgid, cgid_len,
+					      pdu_data, pdu_len);
+			break;
+		case 2:
+		default:
+			/*
+			 * Use v2 recording function (record_blktrace_event2)
+			 * which writes blk_io_trace2 structure with correct
+			 * field layout:
+			 * - 32-bit pid at offset 28
+			 * - 64-bit action at offset 32
+			 *
+			 * Fall through to default handles version==0 case
+			 * (from sysfs path), ensuring we always use correct
+			 * v2 recording function to match the v2 buffer
+			 * allocated above.
+			 */
+			record_blktrace_event2(ring_buffer_event_data(event),
+					       pid, cpu, sector, bytes,
+					       what, bt->dev, error, cgid, cgid_len,
+					       pdu_data, pdu_len);
+			break;
+		}
 
 		trace_buffer_unlock_commit(blk_tr, buffer, event, trace_ctx);
 		return;