Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

drm/amdgpu: Fix error handling in slot reset

If the device has not recovered after slot reset is called, the code jumps to
the out label for error handling. There it could make a decision based on an
uninitialized hive pointer, which could result in accessing an uninitialized
list.

Initialize the list and hive up front so that the error path is handled
correctly and the reset domain lock, which is acquired during the
error_detected callback, is released.

Fixes: 732c6cefc1ec ("drm/amdgpu: Replace tmp_adev with hive in amdgpu_pci_slot_reset")
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Ce Sun <cesun102@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit bb71362182e59caa227e4192da5a612b09349696)

authored by Lijo Lazar and committed by Alex Deucher
b57c4ec9 a5fe1a54

+10 -7
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
···
 	dev_info(adev->dev, "PCI error: slot reset callback!!\n");
 
 	memset(&reset_context, 0, sizeof(reset_context));
+	INIT_LIST_HEAD(&device_list);
+	hive = amdgpu_get_xgmi_hive(adev);
+	if (hive) {
+		mutex_lock(&hive->hive_lock);
+		list_for_each_entry(tmp_adev, &hive->device_list, gmc.xgmi.head)
+			list_add_tail(&tmp_adev->reset_list, &device_list);
+	} else {
+		list_add_tail(&adev->reset_list, &device_list);
+	}
 
 	if (adev->pcie_reset_ctx.swus)
 		link_dev = adev->pcie_reset_ctx.swus;
···
 	reset_context.reset_req_dev = adev;
 	set_bit(AMDGPU_NEED_FULL_RESET, &reset_context.flags);
 	set_bit(AMDGPU_SKIP_COREDUMP, &reset_context.flags);
-	INIT_LIST_HEAD(&device_list);
 
-	hive = amdgpu_get_xgmi_hive(adev);
 	if (hive) {
-		mutex_lock(&hive->hive_lock);
 		reset_context.hive = hive;
-		list_for_each_entry(tmp_adev, &hive->device_list, gmc.xgmi.head) {
+		list_for_each_entry(tmp_adev, &hive->device_list, gmc.xgmi.head)
 			tmp_adev->pcie_reset_ctx.in_link_reset = true;
-			list_add_tail(&tmp_adev->reset_list, &device_list);
-		}
 	} else {
 		set_bit(AMDGPU_SKIP_HW_RESET, &reset_context.flags);
-		list_add_tail(&adev->reset_list, &device_list);
 	}
 
 	r = amdgpu_device_asic_reset(adev, &device_list, &reset_context);
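
The pattern described in the commit message (set up everything the error label reads before the first early exit) can be illustrated with a small userspace sketch. This is not the driver code: struct list_node, struct fake_dev, struct fake_hive and slot_reset_sketch() are hypothetical names invented for this illustration, and the boolean "locked" flag merely stands in for a real lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list, loosely modeled on a kernel-style list head. */
struct list_node {
	struct list_node *next, *prev;
};

static void list_init(struct list_node *head)
{
	head->next = head;
	head->prev = head;
}

static void list_append(struct list_node *node, struct list_node *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/* Hypothetical stand-ins for a device and for a hive guarded by a lock. */
struct fake_dev {
	struct list_node reset_node;
	bool recovered;
	bool cleaned_up;
};

struct fake_hive {
	bool locked;
	struct fake_dev *members[2];
	size_t n_members;
};

/*
 * Returns 0 on success and -1 if the device did not recover.
 * The list and the hive state are set up before the first goto,
 * so the cleanup at "out" never reads uninitialized data.
 */
static int slot_reset_sketch(struct fake_dev *dev, struct fake_hive *hive)
{
	struct list_node device_list;
	struct list_node *pos;
	size_t i;
	int ret = 0;

	list_init(&device_list);
	if (hive) {
		hive->locked = true;
		for (i = 0; i < hive->n_members; i++)
			list_append(&hive->members[i]->reset_node, &device_list);
	} else {
		list_append(&dev->reset_node, &device_list);
	}

	if (!dev->recovered) {
		ret = -1;
		goto out;	/* early exit: the list is already valid */
	}

	/* The actual reset work would go here. */

out:
	/* Walk the list and recover each containing fake_dev from its node. */
	for (pos = device_list.next; pos != &device_list; pos = pos->next) {
		struct fake_dev *d = (struct fake_dev *)
			((char *)pos - offsetof(struct fake_dev, reset_node));
		d->cleaned_up = true;
	}
	if (hive)
		hive->locked = false;
	return ret;
}
```

Calling slot_reset_sketch() with a device whose recovered flag is false takes the early exit, yet the cleanup loop still visits every queued device and the hive flag is cleared. If list_init() and the list population were placed after the early exit, the loop at "out" would walk an uninitialized list, which is the class of bug the commit message describes.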