Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

drm/amdgpu: Check fence emitted count to identify bad jobs

In SRIOV, when the host driver performs a MODE 1 reset and notifies the
guest driver of an FLR, there is a small chance that no job is running on
the hardware but the driver has not yet updated the pending list, causing
the driver not to respond to the FLR request. Modify the has_job_running
function to check whether any job is actually still running.

v2: Use amdgpu_fence_count_emitted to determine job running status.
v3: Remove the timeout wait in has_job_running

Signed-off-by: Emily Deng <Emily.Deng@amd.com>
Signed-off-by: Shikang Fan <shikang.fan@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Authored by Shikang Fan and committed by Alex Deucher
0859eb54 9aa879da

+6 -8
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
···
 }
 
 /**
- * amdgpu_device_has_job_running - check if there is any job in mirror list
+ * amdgpu_device_has_job_running - check if there is any unfinished job
  *
  * @adev: amdgpu_device pointer
  *
- * check if there is any job in mirror list
+ * check if there is any job running on the device when guest driver receives
+ * FLR notification from host driver. If there are still jobs running, then
+ * the guest driver will not respond the FLR reset. Instead, let the job hit
+ * the timeout and guest driver then issue the reset request.
  */
 bool amdgpu_device_has_job_running(struct amdgpu_device *adev)
 {
 	int i;
-	struct drm_sched_job *job;
 
 	for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
 		struct amdgpu_ring *ring = adev->rings[i];
···
 		if (!amdgpu_ring_sched_ready(ring))
 			continue;
 
-		spin_lock(&ring->sched.job_list_lock);
-		job = list_first_entry_or_null(&ring->sched.pending_list,
-					       struct drm_sched_job, list);
-		spin_unlock(&ring->sched.job_list_lock);
-		if (job)
+		if (amdgpu_fence_count_emitted(ring))
 			return true;
 	}
 	return false;