Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

drm/amdgpu: Fix eviction fence worker race during fd close

The current cleanup order during file descriptor close can lead to
a race condition where the eviction fence worker attempts to access
a destroyed mutex from the user queue manager:

[ 517.294055] DEBUG_LOCKS_WARN_ON(lock->magic != lock)
[ 517.294060] WARNING: CPU: 8 PID: 2030 at kernel/locking/mutex.c:564
[ 517.294094] Workqueue: events amdgpu_eviction_fence_suspend_worker [amdgpu]
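The warning above comes from the kernel's mutex debugging checks. As we understand it, with CONFIG_DEBUG_MUTEXES enabled, mutex_init() stores a pointer to the mutex in its magic field and mutex_destroy() clears it, so a later lock attempt sees lock->magic != lock. The sketch below reproduces that self-pointer check in plain userspace C. It is an illustration only: struct dbg_lock and its helpers are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical debug lock: 'magic' points at the lock itself while valid. */
struct dbg_lock {
	void *magic;
	bool held;
};

static void dbg_lock_init(struct dbg_lock *l)
{
	l->magic = l;	/* mark as initialized */
	l->held = false;
}

static void dbg_lock_destroy(struct dbg_lock *l)
{
	l->magic = NULL;	/* any later use is now detectable */
}

/* Returns false and warns if the lock was destroyed or never initialized. */
static bool dbg_lock_acquire(struct dbg_lock *l)
{
	if (l->magic != l) {
		fprintf(stderr, "WARN: lock->magic != lock\n");
		return false;
	}
	l->held = true;
	return true;
}

static void dbg_lock_release(struct dbg_lock *l)
{
	l->held = false;
}
```

Acquiring the lock after dbg_lock_destroy() is reported instead of silently operating on stale state. Like the kernel check, this is only a debugging aid: it detects the use-after-destroy but does not prevent the race that causes it.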

The issue occurs because:
1. We destroy the user queue manager (including its mutex) first
2. Then try to destroy eviction fences which may have pending work
3. The eviction fence worker may try to access the already-destroyed mutex

Fix this by reordering the cleanup to:
1. First mark the fd as closing and destroy eviction fences,
which flushes any pending work
2. Then safely destroy the user queue manager after we're certain
no more fence work will be executed
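The ordering constraint can be illustrated with a userspace analogue. This sketch is not amdgpu code: it uses POSIX threads, pthread_join() stands in for flushing the pending fence work, and every name (uq_mgr, fence_mgr, on_fd_close and the helpers) is hypothetical. For the sketch it is also assumed that setting the closing flag stops new work from being queued; the diff does not show how amdgpu consumes fd_closing internally.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the user queue manager: it owns the mutex. */
struct uq_mgr {
	pthread_mutex_t lock;
	int suspend_count;	/* updated by the worker under 'lock' */
};

/* Hypothetical stand-in for the eviction fence manager. */
struct fence_mgr {
	bool fd_closing;
	bool work_pending;
	pthread_t worker;
	struct uq_mgr *uq;
};

/* Analogue of the suspend worker: it needs the user queue mutex. */
static void *suspend_worker(void *arg)
{
	struct fence_mgr *fm = arg;

	pthread_mutex_lock(&fm->uq->lock);
	fm->uq->suspend_count++;
	pthread_mutex_unlock(&fm->uq->lock);
	return NULL;
}

static void uq_mgr_init(struct uq_mgr *uq)
{
	pthread_mutex_init(&uq->lock, NULL);
	uq->suspend_count = 0;
}

static void fence_mgr_init(struct fence_mgr *fm, struct uq_mgr *uq)
{
	fm->fd_closing = false;
	fm->work_pending = false;
	fm->uq = uq;
}

/* Queue the worker; sketch assumption: refuse new work once closing. */
static int fence_mgr_schedule(struct fence_mgr *fm)
{
	if (fm->fd_closing || fm->work_pending)
		return -1;
	if (pthread_create(&fm->worker, NULL, suspend_worker, fm) != 0)
		return -1;
	fm->work_pending = true;
	return 0;
}

/* Wait for pending work, analogous to flushing it before teardown. */
static void fence_mgr_destroy(struct fence_mgr *fm)
{
	if (fm->work_pending) {
		pthread_join(fm->worker, NULL);
		fm->work_pending = false;
	}
}

static void uq_mgr_fini(struct uq_mgr *uq)
{
	pthread_mutex_destroy(&uq->lock);
}

/* Fixed order: mark closing, flush fence work, then destroy the mutex. */
static void on_fd_close(struct fence_mgr *fm, struct uq_mgr *uq)
{
	fm->fd_closing = true;
	fence_mgr_destroy(fm);	/* no worker can run after this returns */
	uq_mgr_fini(uq);	/* safe: nothing else can touch uq->lock */
}
```

Swapping the last two calls in on_fd_close() would let the worker lock a mutex that has already been destroyed, which POSIX leaves undefined; that is the userspace counterpart of the DEBUG_LOCKS_WARN_ON splat in the log.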

The now-redundant copy of this cleanup in amdgpu_driver_postclose_kms() is also removed (Christian).

Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Arvind Yadav <Arvind.Yadav@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Authored by Jesse.Zhang and committed by Alex Deucher
0132ba7f b2c11e27

2 files changed, +1 -6

drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c (+1 -1)
···
 
 	if (fpriv) {
 		fpriv->evf_mgr.fd_closing = true;
-		amdgpu_userq_mgr_fini(&fpriv->userq_mgr);
 		amdgpu_eviction_fence_destroy(&fpriv->evf_mgr);
+		amdgpu_userq_mgr_fini(&fpriv->userq_mgr);
 	}
 
 	return drm_release(inode, filp);

drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c (-5)
···
 		amdgpu_bo_unreserve(pd);
 	}
 
-	if (!fpriv->evf_mgr.fd_closing) {
-		fpriv->evf_mgr.fd_closing = true;
-		amdgpu_userq_mgr_fini(&fpriv->userq_mgr);
-		amdgpu_eviction_fence_destroy(&fpriv->evf_mgr);
-	}
 	amdgpu_ctx_mgr_fini(&fpriv->ctx_mgr);
 	amdgpu_vm_fini(adev, &fpriv->vm);
 