Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

drm/amdkfd: Handle GPU reset and drain retry fault race

Only check and drain IH1 ring if CAM is not enabled.

If GPU is under reset, don't access IH to drain retry fault.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Authored by Philip Yang and committed by Alex Deucher
5b57c3c3 e1c94109

+6 -1
drivers/gpu/drm/amd/amdkfd/kfd_svm.c
···
 #include "amdgpu_hmm.h"
 #include "amdgpu.h"
 #include "amdgpu_xgmi.h"
+#include "amdgpu_reset.h"
 #include "kfd_priv.h"
 #include "kfd_svm.h"
 #include "kfd_migrate.h"
···
 
 		pr_debug("drain retry fault gpu %d svms %p\n", i, svms);
 
+		if (!down_read_trylock(&pdd->dev->adev->reset_domain->sem))
+			continue;
+
 		amdgpu_ih_wait_on_checkpoint_process_ts(pdd->dev->adev,
 				pdd->dev->adev->irq.retry_cam_enabled ?
 				&pdd->dev->adev->irq.ih :
···
 			amdgpu_ih_wait_on_checkpoint_process_ts(pdd->dev->adev,
 				&pdd->dev->adev->irq.ih_soft);
 
+		up_read(&pdd->dev->adev->reset_domain->sem);
 
 		pr_debug("drain retry fault gpu %d svms 0x%p done\n", i, svms);
 	}
···
 		adev = pdd->dev->adev;
 
 		/* Check and drain ih1 ring if cam not available */
-		if (adev->irq.ih1.ring_size) {
+		if (!adev->irq.retry_cam_enabled && adev->irq.ih1.ring_size) {
 			ih = &adev->irq.ih1;
 			checkpoint_wptr = amdgpu_ih_get_wptr(adev, ih);
 			if (ih->rptr != checkpoint_wptr) {