Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

drm/amdgpu: Fix error handling in slot reset

If the device has not recovered after slot reset is called, it goes to
the out label for error handling. There it could make a decision based
on an uninitialized hive pointer, which could result in accessing an
uninitialized list.

Initialize the list and hive up front so that the error path is handled
correctly and the reset domain lock, which is acquired during the
error_detected callback, is released.

Fixes: 732c6cefc1ec ("drm/amdgpu: Replace tmp_adev with hive in amdgpu_pci_slot_reset")
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Ce Sun <cesun102@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

authored by Lijo Lazar and committed by Alex Deucher
bb713621 9eaaae4c

+10 -7
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -7043,6 +7043,15 @@
 	dev_info(adev->dev, "PCI error: slot reset callback!!\n");
 
 	memset(&reset_context, 0, sizeof(reset_context));
+	INIT_LIST_HEAD(&device_list);
+	hive = amdgpu_get_xgmi_hive(adev);
+	if (hive) {
+		mutex_lock(&hive->hive_lock);
+		list_for_each_entry(tmp_adev, &hive->device_list, gmc.xgmi.head)
+			list_add_tail(&tmp_adev->reset_list, &device_list);
+	} else {
+		list_add_tail(&adev->reset_list, &device_list);
+	}
 
 	if (adev->pcie_reset_ctx.swus)
 		link_dev = adev->pcie_reset_ctx.swus;
@@ -7092,19 +7083,13 @@
 	reset_context.reset_req_dev = adev;
 	set_bit(AMDGPU_NEED_FULL_RESET, &reset_context.flags);
 	set_bit(AMDGPU_SKIP_COREDUMP, &reset_context.flags);
-	INIT_LIST_HEAD(&device_list);
 
-	hive = amdgpu_get_xgmi_hive(adev);
 	if (hive) {
-		mutex_lock(&hive->hive_lock);
 		reset_context.hive = hive;
-		list_for_each_entry(tmp_adev, &hive->device_list, gmc.xgmi.head) {
+		list_for_each_entry(tmp_adev, &hive->device_list, gmc.xgmi.head)
 			tmp_adev->pcie_reset_ctx.in_link_reset = true;
-			list_add_tail(&tmp_adev->reset_list, &device_list);
-		}
 	} else {
 		set_bit(AMDGPU_SKIP_HW_RESET, &reset_context.flags);
-		list_add_tail(&adev->reset_list, &device_list);
 	}
 
 	r = amdgpu_device_asic_reset(adev, &device_list, &reset_context);
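
The pattern behind this fix can be illustrated outside the kernel: any state that a shared cleanup label reads has to be set up before the first goto that can reach it. The following is a minimal userspace sketch under stated assumptions, not the driver code. The names list_node, fake_dev, lock_held and slot_reset_sketch are hypothetical stand-ins, and the shape of the cleanup path (release the lock for every listed device on error only) is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for a circular doubly linked list head (hypothetical). */
struct list_node {
	struct list_node *next, *prev;
};

static void list_init(struct list_node *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail(struct list_node *node, struct list_node *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/* Hypothetical device: only the fields this sketch needs. */
struct fake_dev {
	struct list_node reset_list;
	bool lock_held;	/* stands in for a lock taken by an earlier callback */
};

/* Returns 0 on success, -1 if the device did not recover. */
int slot_reset_sketch(struct fake_dev *dev, bool recovered)
{
	struct list_node device_list;
	struct list_node *pos;
	int r = 0;

	assert(dev != NULL);

	/* Set up everything the cleanup label reads BEFORE the first goto. */
	list_init(&device_list);
	list_add_tail(&dev->reset_list, &device_list);

	if (!recovered) {
		r = -1;
		goto out;	/* early exit: device_list is already valid */
	}

	/* ... the actual reset work would go here ... */

out:
	if (r) {
		/* Error path (assumed shape): release the lock for every listed device. */
		for (pos = device_list.next; pos != &device_list; pos = pos->next) {
			struct fake_dev *d = (struct fake_dev *)
				((char *)pos - offsetof(struct fake_dev, reset_list));
			d->lock_held = false;
		}
	}
	return r;
}
```

If list_init() and list_add_tail() were placed after the early goto, the loop at the out label would walk an uninitialized list on the failure path, which is the class of bug the commit message describes.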