Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

accel/amdxdna: Fix command hang on suspended hardware context

When a hardware context is suspended, the job scheduler is stopped. If a
command is submitted while the context is suspended, the job is queued in
the scheduler, but aie2_sched_job_run() is never invoked to restart the
hardware context. As a result, the command hangs.

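Fix this by restarting the job scheduler right after the hardware context
is destroyed in the suspend path, instead of in aie2_hwctx_restart(), so
that queued jobs still reach aie2_sched_job_run() and can trigger the
context restart. aie2_sched_job_run() now takes the PM reference with
amdxdna_pm_resume_get() before checking the mailbox channel, and drops it
with amdxdna_pm_suspend_put() on every early-return path.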

Fixes: aac243092b70 ("accel/amdxdna: Add command execution")
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20260211205341.722982-1-lizhi.hou@amd.com

Lizhi Hou 07efce5a fdb65acf

+11 -7
drivers/accel/amdxdna/aie2_ctx.c
···
 {
 	drm_sched_stop(&hwctx->priv->sched, bad_job);
 	aie2_destroy_context(xdna->dev_handle, hwctx);
+	drm_sched_start(&hwctx->priv->sched, 0);
 }
 
 static int aie2_hwctx_restart(struct amdxdna_dev *xdna, struct amdxdna_hwctx *hwctx)
···
 }
 
 out:
-	drm_sched_start(&hwctx->priv->sched, 0);
 	XDNA_DBG(xdna, "%s restarted, ret %d", hwctx->name, ret);
 	return ret;
 }
···
 	struct dma_fence *fence;
 	int ret;
 
-	if (!hwctx->priv->mbox_chann)
+	ret = amdxdna_pm_resume_get(hwctx->client->xdna);
+	if (ret)
 		return NULL;
 
-	if (!mmget_not_zero(job->mm))
+	if (!hwctx->priv->mbox_chann) {
+		amdxdna_pm_suspend_put(hwctx->client->xdna);
+		return NULL;
+	}
+
+	if (!mmget_not_zero(job->mm)) {
+		amdxdna_pm_suspend_put(hwctx->client->xdna);
 		return ERR_PTR(-ESRCH);
+	}
 
 	kref_get(&job->refcnt);
 	fence = dma_fence_get(job->fence);
-
-	ret = amdxdna_pm_resume_get(hwctx->client->xdna);
-	if (ret)
-		goto out;
 
 	if (job->drv_cmd) {
 		switch (job->drv_cmd->opcode) {