Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

drm/i915/guc: Handle race condition where wakeref count drops below 0

There is a rare race condition when preparing for a reset, where
guc_lrc_desc_unpin() could be in the process of deregistering a context
while a different thread, scrubbing outstanding contexts, alters the
context state and does a wakeref put. Then, if deregister_context()
fails, a second wakeref put could occur. As a result, the wakeref count
could drop below 0 and fail an INTEL_WAKEREF_BUG_ON() check.
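To make the double put concrete, here is a minimal single-threaded C sketch that replays the interleaving described above in one fixed order. It is a model, not i915 code: `struct model_ctx`, `model_get()`, `model_put()`, `reset_scrub()` and the other helpers are hypothetical names, and the sketch assumes the unpin path takes one wakeref reference when it marks the context destroyed, which both the reset scrub and the deregister failure path then try to drop.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: these names are illustrative, not i915 APIs. */
struct model_ctx {
	bool registered;
	bool destroyed;
};

/* Counts only the reference taken for this one context. */
static int wakeref_count;

static void model_get(void) { wakeref_count++; }
static void model_put(void) { wakeref_count--; }

/* Unpin path, step 1: take a reference and mark the context destroyed. */
static void unpin_mark_destroyed(struct model_ctx *ce)
{
	assert(!ce->destroyed); /* precondition: not already being destroyed */
	model_get();
	ce->destroyed = true;
	ce->registered = false;
}

/* Reset path: scrub a context that is still marked destroyed. */
static void reset_scrub(struct model_ctx *ce)
{
	if (ce->destroyed) {
		ce->destroyed = false;
		model_put();
	}
}

/* Unpin path, step 2 (pre-fix): deregistration failed, undo unconditionally. */
static void unpin_undo_unconditional(struct model_ctx *ce)
{
	ce->registered = true;
	ce->destroyed = false;
	model_put();
}

/* Replays the racy ordering and returns the final reference count. */
int run_racy_sequence(void)
{
	struct model_ctx ce = { .registered = true, .destroyed = false };

	wakeref_count = 0;
	unpin_mark_destroyed(&ce);     /* count: 1 */
	reset_scrub(&ce);              /* count: 0, reset already dropped it */
	unpin_undo_unconditional(&ce); /* count: -1, the second put */
	return wakeref_count;
}
```

With this ordering `run_racy_sequence()` returns -1: the count for this context's reference has gone below 0, which is the situation the commit message says trips `INTEL_WAKEREF_BUG_ON()` in the real driver.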

Therefore, if deregister_context() fails, undo the context state
changes and do a wakeref put only if the context is still marked as
destroyed.
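The fix can be modeled as a short C sketch. In this hypothetical model a pthread mutex stands in for `ce->guc_state.lock`, and `struct model_ctx`, `reset_scrub()`, `unpin_undo_if_pending()` and `run_fixed_sequence()` are illustrative names rather than driver APIs. The failure path re-reads the destroyed flag under the lock and drops the reference only if the reset path has not already done so, mirroring the `pending_destroyed` check in the patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of the fixed failure path; names are illustrative only. */
struct model_ctx {
	pthread_mutex_t lock; /* stands in for ce->guc_state.lock */
	bool registered;
	bool destroyed;
};

/* Counts only the reference taken for this one context. */
static int wakeref_count;

static void model_put(void) { wakeref_count--; }

static void model_init(struct model_ctx *ce)
{
	pthread_mutex_init(&ce->lock, NULL);
	ce->registered = true;
	ce->destroyed = false;
}

/* Unpin path, step 1: take a reference and mark the context destroyed. */
static void unpin_mark_destroyed(struct model_ctx *ce)
{
	pthread_mutex_lock(&ce->lock);
	wakeref_count++;
	ce->destroyed = true;
	ce->registered = false;
	pthread_mutex_unlock(&ce->lock);
}

/* Reset path: if still destroyed, claim the reference under the lock and drop it. */
static void reset_scrub(struct model_ctx *ce)
{
	bool put;

	pthread_mutex_lock(&ce->lock);
	put = ce->destroyed;
	ce->destroyed = false;
	pthread_mutex_unlock(&ce->lock);
	if (put)
		model_put();
}

/* Unpin path, step 2 (fixed): deregistration failed, undo only if still pending. */
static void unpin_undo_if_pending(struct model_ctx *ce)
{
	bool pending_destroyed;

	pthread_mutex_lock(&ce->lock);
	pending_destroyed = ce->destroyed;
	if (pending_destroyed) {
		ce->registered = true;
		ce->destroyed = false;
	}
	pthread_mutex_unlock(&ce->lock);
	/* Put after unlock, mirroring the ordering in the patch. */
	if (pending_destroyed)
		model_put();
}

/* Runs unpin with a failed deregister; optionally lets reset scrub in between. */
int run_fixed_sequence(bool reset_races)
{
	struct model_ctx ce;

	model_init(&ce);
	wakeref_count = 0;
	unpin_mark_destroyed(&ce);
	assert(wakeref_count == 1);
	if (reset_races)
		reset_scrub(&ce);
	unpin_undo_if_pending(&ce);
	pthread_mutex_destroy(&ce.lock);
	return wakeref_count;
}
```

`run_fixed_sequence(false)` and `run_fixed_sequence(true)` both return 0: whichever path observes the destroyed flag first under the lock owns the single put. Reading and clearing the flag inside the same critical section is what makes that ownership exclusive.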

v2: Expand comment to better explain change. (Daniele)
v3: Removed addition to the original comment. (Daniele)

Fixes: 2f2cc53b5fe7 ("drm/i915/guc: Close deregister-context race against CT-loss")
Signed-off-by: Jesus Narvaez <jesus.narvaez@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Cc: Anshuman Gupta <anshuman.gupta@intel.com>
Cc: Mousumi Jana <mousumi.jana@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://lore.kernel.org/r/20250528230551.1855177-1-jesus.narvaez@intel.com

Authored by Jesus Narvaez and committed by John Harrison
f36a75ab a6a26786

+14 -3
drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -3443,18 +3443,29 @@
 	 * GuC is active, lets destroy this context, but at this point we can still be racing
 	 * with suspend, so we undo everything if the H2G fails in deregister_context so
 	 * that GuC reset will find this context during clean up.
+	 *
+	 * There is a race condition where the reset code could have altered
+	 * this context's state and done a wakeref put before we try to
+	 * deregister it here. So check if the context is still set to be
+	 * destroyed before undoing earlier changes, to avoid two wakeref puts
+	 * on the same context.
 	 */
 	ret = deregister_context(ce, ce->guc_id.id);
 	if (ret) {
+		bool pending_destroyed;
 		spin_lock_irqsave(&ce->guc_state.lock, flags);
-		set_context_registered(ce);
-		clr_context_destroyed(ce);
+		pending_destroyed = context_destroyed(ce);
+		if (pending_destroyed) {
+			set_context_registered(ce);
+			clr_context_destroyed(ce);
+		}
 		spin_unlock_irqrestore(&ce->guc_state.lock, flags);
 		/*
 		 * As gt-pm is awake at function entry, intel_wakeref_put_async merely decrements
 		 * the wakeref immediately but per function spec usage call this after unlock.
 		 */
-		intel_wakeref_put_async(&gt->wakeref);
+		if (pending_destroyed)
+			intel_wakeref_put_async(&gt->wakeref);
 	}
 
 	return ret;