Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

drm/amdkfd: Fix queue preemption/eviction failures by aligning control stack size to GPU page size

The control stack size is calculated based on the number of CUs and
waves, and is then aligned to PAGE_SIZE. When the resulting control
stack size is aligned to 64 KB, GPU hangs and queue preemption
failures are observed while running RCCL unit tests on systems with
more than two GPUs.

amdgpu 0048:0f:00.0: amdgpu: Queue preemption failed for queue with doorbell_id: 80030008
amdgpu 0048:0f:00.0: amdgpu: Failed to evict process queues
amdgpu 0048:0f:00.0: amdgpu: GPU reset begin!. Source: 4
amdgpu 0048:0f:00.0: amdgpu: Queue preemption failed for queue with doorbell_id: 80030008
amdgpu 0048:0f:00.0: amdgpu: Failed to evict process queues
amdgpu 0048:0f:00.0: amdgpu: Failed to restore process queues

This issue is observed on both 4 KB and 64 KB system page-size
configurations.

This patch fixes the issue by aligning the control stack size to
AMDGPU_GPU_PAGE_SIZE instead of PAGE_SIZE, so the control stack size
will not be 64 KB on systems with a 64 KB page size and queue
preemption works correctly.
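
As a rough illustration of the alignment change described above, the
following standalone sketch mimics the kernel's ALIGN() rounding in user
space. The names align_up(), ctl_stack_size_old(), and ctl_stack_size_new()
are hypothetical helpers, not kernel functions. GPU_PAGE_SIZE is assumed to
be 4096, matching AMDGPU_GPU_PAGE_SIZE, and the host page size is passed in
as a parameter rather than taken from PAGE_SIZE.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption for this sketch: the GPU page size is 4 KiB. */
#define GPU_PAGE_SIZE ((size_t)4096)

/* Round size up to a power-of-two alignment, in the spirit of ALIGN(). */
static size_t align_up(size_t size, size_t alignment)
{
	/* The bit trick below is only valid for power-of-two alignments. */
	assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
	return (size + alignment - 1) & ~(alignment - 1);
}

/* Old behavior: the control stack size follows the host page size. */
static size_t ctl_stack_size_old(size_t raw_bytes, size_t host_page_size)
{
	return align_up(raw_bytes, host_page_size);
}

/* New behavior: the control stack size follows the GPU page size only. */
static size_t ctl_stack_size_new(size_t raw_bytes)
{
	return align_up(raw_bytes, GPU_PAGE_SIZE);
}
```

With a made-up raw size of 20000 bytes and a 64 KB host page size,
ctl_stack_size_old() returns 65536 while ctl_stack_size_new() returns 20480.
With a 4 KB host page size both return 20480.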

Additionally, the current code aligns wg_data_size to PAGE_SIZE, which
can waste memory when the system page size is large. This patch aligns
wg_data_size to AMDGPU_GPU_PAGE_SIZE instead. The cwsr_size, calculated
from wg_data_size and the control stack size, is then aligned to PAGE_SIZE.
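
The effect on the total save-area size can be sketched as follows. This is
a simplified user-space model, not the driver code: struct cwsr_sizes,
cwsr_layout_old(), and cwsr_layout_new() are hypothetical names, the raw
byte counts are supplied by the caller, and the save-area header and the
per-generation control stack limit handled in kfd_queue.c are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption for this sketch: the GPU page size is 4 KiB. */
#define GPU_PAGE_SIZE ((size_t)4096)

struct cwsr_sizes {
	size_t ctl_stack_size;
	size_t wg_data_size;
	size_t cwsr_size;
};

/* Round size up to a power-of-two alignment. */
static size_t align_up(size_t size, size_t alignment)
{
	assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
	return (size + alignment - 1) & ~(alignment - 1);
}

/* Before the patch: each part is aligned to the host page size. */
static struct cwsr_sizes cwsr_layout_old(size_t raw_ctl, size_t raw_wg,
					 size_t host_page_size)
{
	struct cwsr_sizes s;

	s.ctl_stack_size = align_up(raw_ctl, host_page_size);
	s.wg_data_size = align_up(raw_wg, host_page_size);
	s.cwsr_size = s.ctl_stack_size + s.wg_data_size;
	return s;
}

/* After the patch: parts use the GPU page size, only the sum is
 * rounded up to the host page size. */
static struct cwsr_sizes cwsr_layout_new(size_t raw_ctl, size_t raw_wg,
					 size_t host_page_size)
{
	struct cwsr_sizes s;

	s.ctl_stack_size = align_up(raw_ctl, GPU_PAGE_SIZE);
	s.wg_data_size = align_up(raw_wg, GPU_PAGE_SIZE);
	s.cwsr_size = align_up(s.ctl_stack_size + s.wg_data_size,
			       host_page_size);
	return s;
}
```

For made-up inputs of 20000 and 100000 bytes on a 64 KB page-size host, the
old layout totals 196608 bytes and the new one 131072 bytes. On a 4 KB
page-size host both layouts total 122880 bytes.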

Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Authored by Donet Tom and committed by Alex Deucher
a3e14436 6ab4054f

+4 -3
drivers/gpu/drm/amd/amdkfd/kfd_queue.c
···
 	cu_num = props->simd_count / props->simd_per_cu / NUM_XCC(dev->gpu->xcc_mask);
 	wave_num = get_num_waves(props, gfxv, cu_num);
 
-	wg_data_size = ALIGN(cu_num * WG_CONTEXT_DATA_SIZE_PER_CU(gfxv, props), PAGE_SIZE);
+	wg_data_size = ALIGN(cu_num * WG_CONTEXT_DATA_SIZE_PER_CU(gfxv, props),
+			     AMDGPU_GPU_PAGE_SIZE);
 	ctl_stack_size = wave_num * CNTL_STACK_BYTES_PER_WAVE(gfxv) + 8;
 	ctl_stack_size = ALIGN(SIZEOF_HSA_USER_CONTEXT_SAVE_AREA_HEADER + ctl_stack_size,
-			       PAGE_SIZE);
+			       AMDGPU_GPU_PAGE_SIZE);
 
 	if ((gfxv / 10000 * 10000) == 100000) {
 		/* HW design limits control stack size to 0x7000.
···
 
 	props->ctl_stack_size = ctl_stack_size;
 	props->debug_memory_size = ALIGN(wave_num * DEBUGGER_BYTES_PER_WAVE, DEBUGGER_BYTES_ALIGN);
-	props->cwsr_size = ctl_stack_size + wg_data_size;
+	props->cwsr_size = ALIGN(ctl_stack_size + wg_data_size, PAGE_SIZE);
 
 	if (gfxv == 80002)	/* GFX_VERSION_TONGA */
 		props->eop_buffer_size = 0x8000;