Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

drm/amdgpu: fix strsep() corrupting lockup_timeout on multi-GPU (v3)

amdgpu_device_get_job_timeout_settings() passes a pointer directly
to the global amdgpu_lockup_timeout[] buffer into strsep().
strsep() destructively replaces delimiter characters with '\0'
in-place.

On multi-GPU systems, this function is called once per device.
When a multi-value setting like "0,0,0,-1" is used, the first
GPU's call transforms the global buffer into "0\00\00\0-1". The
second GPU then sees only "0" (terminated at the first '\0'),
parses a single value, hits the single-value fallthrough
(index == 1), and applies timeout=0 to all rings — causing
immediate false job timeouts.
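
The failure mode described above can be reproduced in userspace. The following is a minimal sketch, not the driver code: lockup_timeout_param and count_timeouts_in_place() are hypothetical names standing in for the global amdgpu_lockup_timeout[] buffer and the per-device parsing loop, and strtol() is used here only for illustration.

```c
#define _DEFAULT_SOURCE /* exposes strsep() on glibc */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the global module parameter buffer. */
static char lockup_timeout_param[256] = "0,0,0,-1";

/*
 * Buggy pattern: tokenizes the shared buffer in place.
 * Returns the number of values parsed and stores the last one in *last.
 */
static int count_timeouts_in_place(long *last)
{
	char *input = lockup_timeout_param;
	char *tok;
	int count = 0;

	assert(last != NULL);

	/* strsep() overwrites each ',' in the shared buffer with '\0'. */
	while ((tok = strsep(&input, ",")) != NULL && *tok != '\0') {
		*last = strtol(tok, NULL, 0);
		count++;
	}
	return count;
}
```

Calling count_timeouts_in_place() twice models two GPUs probing in sequence: the first call parses all four values, while the second call sees only the leading "0" because the commas it needs have already been replaced.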

Fix this by copying into a stack-local array before calling
strsep(), so the global module parameter buffer remains intact
across calls. The buffer is AMDGPU_MAX_TIMEOUT_PARAM_LENGTH
(256) bytes, which is safe for the stack.
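
The copy-before-tokenize approach can be sketched in userspace as follows. This is an illustration under stated assumptions, not the patch itself: shared_param, PARAM_LEN and count_timeouts_from_copy() are hypothetical names, and snprintf() stands in for the kernel-only strscpy(), since both always NUL-terminate the destination.

```c
#define _DEFAULT_SOURCE /* exposes strsep() on glibc */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PARAM_LEN 256 /* mirrors AMDGPU_MAX_TIMEOUT_PARAM_LENGTH */

/* Hypothetical stand-in for the global module parameter buffer. */
static char shared_param[PARAM_LEN] = "0,0,0,-1";

/*
 * Fixed pattern: tokenizes a stack-local copy, so the shared buffer
 * is never modified. Returns the number of values parsed and stores
 * the last one in *last.
 */
static int count_timeouts_from_copy(long *last)
{
	char buf[PARAM_LEN];
	char *input = buf;
	char *tok;
	int count = 0;

	assert(last != NULL);

	/* Userspace substitute for strscpy(buf, shared_param, sizeof(buf)). */
	snprintf(buf, sizeof(buf), "%s", shared_param);

	/* strsep() now only modifies the local copy. */
	while ((tok = strsep(&input, ",")) != NULL && *tok != '\0') {
		*last = strtol(tok, NULL, 0);
		count++;
	}
	return count;
}
```

Every call to count_timeouts_from_copy() starts from the unmodified string, so repeated calls parse the same four values and shared_param still reads "0,0,0,-1" afterwards.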

v2: wrap commit message to 72 columns, add Assisted-by tag.
v3: use stack array with strscpy() instead of kstrdup()/kfree()
to avoid unnecessary heap allocation (Christian).

This patch was developed with assistance from Claude (claude-opus-4-6).

Assisted-by: Claude:claude-opus-4-6
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Ruijing Dong <ruijing.dong@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 94d79f51efecb74be1d88dde66bdc8bfcca17935)
Cc: stable@vger.kernel.org

Authored by Ruijing Dong and committed by Alex Deucher
2d300ebf aed3d041

+11 -2
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
···
 
 static int amdgpu_device_get_job_timeout_settings(struct amdgpu_device *adev)
 {
-	char *input = amdgpu_lockup_timeout;
+	char buf[AMDGPU_MAX_TIMEOUT_PARAM_LENGTH];
+	char *input = buf;
 	char *timeout_setting = NULL;
 	int index = 0;
 	long timeout;
···
 	adev->gfx_timeout = adev->compute_timeout = adev->sdma_timeout =
 		adev->video_timeout = msecs_to_jiffies(2000);
 
-	if (!strnlen(input, AMDGPU_MAX_TIMEOUT_PARAM_LENGTH))
+	if (!strnlen(amdgpu_lockup_timeout, AMDGPU_MAX_TIMEOUT_PARAM_LENGTH))
 		return 0;
+
+	/*
+	 * strsep() destructively modifies its input by replacing delimiters
+	 * with '\0'. Use a stack copy so the global module parameter buffer
+	 * remains intact for multi-GPU systems where this function is called
+	 * once per device.
+	 */
+	strscpy(buf, amdgpu_lockup_timeout, sizeof(buf));
 
 	while ((timeout_setting = strsep(&input, ",")) &&
 	       strnlen(timeout_setting, AMDGPU_MAX_TIMEOUT_PARAM_LENGTH)) {