Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux
1
fork

Configure Feed

Select the types of activity you want to include in your feed.

drm/amd/display: Fix ISM teardown crash from NULL dc dereference

The Idle State Manager (ISM) uses delayed work to apply display idle
optimizations later, instead of immediately. This helps avoid rapid idle
transitions that can hurt power or performance.

A crash was seen during driver teardown. The system boots normally and
the driver loads successfully. Later, when the GPU is being stopped, the
log shows:

amdgpu 0000:0e:00.0: finishing device.
Workqueue: events_unbound dm_ism_sso_delayed_work_func [amdgpu]

After this, delayed ISM work still runs and reaches:

dm_ism_sso_delayed_work_func()
-> amdgpu_dm_ism_commit_event()
-> dm_ism_commit_idle_optimization_state()
-> dc_allow_idle_optimizations_internal()

The crash report showed:
KASAN: null-ptr-deref in range [0x690-0x697]

Signature:
[22601.113316] KASAN: null-ptr-deref in range [0x0000000000000690-0x0000000000000697]
...
[22601.113368] Workqueue: events_unbound dm_ism_sso_delayed_work_func [amdgpu]
[22601.113930] RIP: 0010:dc_allow_idle_optimizations_internal+0xa6/0xc40 [amdgpu]
...
[22601.114491] RDX: dffffc0000000000 RSI: 0000000000000000 RDI: 0000000000000690
...
[22601.114561] Call Trace:
[22601.114566] <TASK>
[22601.114572] ? srso_alias_return_thunk+0x5/0xfbef5
[22601.114582] ? update_load_avg+0x1b6/0x20b0
[22601.114593] ? __pfx_dc_allow_idle_optimizations_internal+0x10/0x10 [amdgpu]
[22601.114932] ? psi_group_change+0x4ed/0x8d0
[22601.114942] dm_ism_commit_idle_optimization_state+0x214/0x570 [amdgpu]
[22601.115268] amdgpu_dm_ism_commit_event+0xe1d/0x15a0 [amdgpu]
[22601.115588] ? srso_alias_return_thunk+0x5/0xfbef5
[22601.115595] ? __kasan_check_write+0x18/0x20
[22601.115603] ? srso_alias_return_thunk+0x5/0xfbef5
[22601.115610] ? mutex_lock+0x83/0xc0
[22601.115620] dm_ism_sso_delayed_work_func+0x64/0x90 [amdgpu]

GDB resolved dc_allow_idle_optimizations_internal+0xa6 to:

struct dc_state *context = dc->current_state;

The matching disassembly showed:

mov %rdi, %r12
mov 0x690(%r12), %r13

where r12 holds the dc pointer. A GDB layout dump of struct dc showed:

/* 1680 | 8 */ struct dc_state *current_state;

Since 1680 decimal is 0x690, this confirms that current_state is at
offset 0x690. The faulting access was effectively:

dc + 0x690

which indicates that dc was NULL at the time of dereference.

This shows that ISM work can still run during teardown after dc has
been cleared.

ISM is not expected to run after dc is destroyed. Fix this by disabling
ISM under dc_lock in amdgpu_dm_fini() before dc_destroy(), ensuring no
further ISM work runs after dc teardown.

Also add ASSERT(dm->dc) in amdgpu_dm_ism_commit_event() to enforce this
invariant, and ASSERT(mutex_is_locked(&dm->dc_lock)) in
amdgpu_dm_ism_disable() to clarify the locking requirement.

Fixes: 754003486c3c ("drm/amd/display: Add Idle state manager(ISM)")
Suggested-by: Leo Li <sunpeng.li@amd.com>
Cc: Ray Wu <ray.wu@amd.com>
Cc: Roman Li <roman.li@amd.com>
Cc: Alex Hung <alex.hung@amd.com>
Cc: Tom Chung <chiahsuan.chung@amd.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Cc: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

authored by

Srinivasan Shanmugam and committed by
Alex Deucher
732a8add 8bf0cb97

+8 -8
+3 -8
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
··· 2240 2240 static void amdgpu_dm_fini(struct amdgpu_device *adev) 2241 2241 { 2242 2242 int i; 2243 - struct drm_crtc *crtc; 2244 - struct amdgpu_crtc *acrtc; 2245 2243 2246 2244 if (adev->dm.vblank_control_workqueue) { 2247 2245 destroy_workqueue(adev->dm.vblank_control_workqueue); ··· 2256 2258 adev->dm.idle_workqueue = NULL; 2257 2259 } 2258 2260 2259 - /* Finalize ISM for each CRTC before dc_destroy() sets dm->dc to NULL */ 2260 - drm_for_each_crtc(crtc, adev_to_drm(adev)) { 2261 - acrtc = to_amdgpu_crtc(crtc); 2262 - amdgpu_dm_ism_fini(&acrtc->ism); 2263 - 2264 - } 2261 + /* Disable ISM before dc_destroy() invalidates dm->dc */ 2262 + scoped_guard(mutex, &adev->dm.dc_lock) 2263 + amdgpu_dm_ism_disable(&adev->dm); 2265 2264 2266 2265 amdgpu_dm_destroy_drm_device(&adev->dm); 2267 2266
+5
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_ism.c
··· 472 472 /* ISM transitions must be called with mutex acquired */ 473 473 ASSERT(mutex_is_locked(&dm->dc_lock)); 474 474 475 + /* ISM should not run after dc is destroyed */ 476 + ASSERT(dm->dc); 477 + 475 478 if (!acrtc_state) { 476 479 trace_amdgpu_dm_ism_event(acrtc->crtc_id, "NO_STATE", 477 480 "NO_STATE", "N/A"); ··· 547 544 struct drm_crtc *crtc; 548 545 struct amdgpu_crtc *acrtc; 549 546 struct amdgpu_dm_ism *ism; 547 + 548 + ASSERT(mutex_is_locked(&dm->dc_lock)); 550 549 551 550 drm_for_each_crtc(crtc, dm->ddev) { 552 551 acrtc = to_amdgpu_crtc(crtc);