Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

drm/amdgpu: introduce a kind of halt state for amdgpu device

It is useful to preserve the error context when debugging
SW/FW issues. Introduce amdgpu_device_halt() for this
purpose. It brings the hardware into a kind of halt state
so that nothing can touch it anymore.

Compared to a simple hang, the system stays stable, at
least enough for SSH access. It should then be trivial to
inspect the hardware state and see what's going on.
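
From user space, "nothing can touch it" can be pictured as requests failing fast instead of hanging, which leaves the rest of the machine usable. The sketch below is a userspace model of that idea only. struct mock_drm_dev and mock_ioctl() are hypothetical names; they imitate, in spirit, the DRM core refusing IOCTLs with -ENODEV once drm_dev_unplug() has marked the device unplugged, and are not the kernel's code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a DRM device: only the "unplugged" bit that
 * drm_dev_unplug() sets in the real kernel, plus a counter for testing. */
struct mock_drm_dev {
	bool unplugged;
	int hw_calls; /* requests that actually reached the "hardware" */
};

/* Hypothetical IOCTL entry point (not a real DRM function): once the
 * device is unplugged, return -ENODEV immediately instead of touching
 * the hardware or blocking the caller. */
static int mock_ioctl(struct mock_drm_dev *dev)
{
	if (dev->unplugged)
		return -ENODEV;
	dev->hw_calls++;
	return 0;
}
```

Before the unplug a call succeeds and reaches the hardware; afterwards every call returns -ENODEV and the counter stops moving, so callers get an error rather than a stuck process.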

v2:
- Set adev->no_hw_access earlier (before unmapping MMIO) to avoid potential crashes. (Christian)
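
The v2 change is about ordering: presumably code that checks adev->no_hw_access should skip hardware access before the MMIO mapping goes away, rather than dereference a stale mapping. The sketch below is a single-threaded userspace model under that assumption. struct mock_adev, mock_rreg(), mock_wreg() and mock_halt_mmio() are hypothetical names, not the amdgpu register accessors, and the model deliberately ignores the locking and memory-ordering questions a concurrent driver would face.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in holding only the two fields that matter here. */
struct mock_adev {
	bool no_hw_access;
	volatile uint32_t *mmio; /* NULL once the MMIO region is unmapped */
};

/* Hypothetical register read: bail out before touching the mapping
 * when hardware access has been switched off. */
static uint32_t mock_rreg(struct mock_adev *adev, size_t reg)
{
	if (adev->no_hw_access)
		return 0;
	assert(adev->mmio != NULL); /* a real driver would crash here */
	return adev->mmio[reg];
}

/* Hypothetical register write with the same guard. */
static void mock_wreg(struct mock_adev *adev, size_t reg, uint32_t val)
{
	if (adev->no_hw_access)
		return;
	assert(adev->mmio != NULL);
	adev->mmio[reg] = val;
}

/* v2 ordering: set the flag first, then drop the mapping, so an
 * accessor that checks the flag after this point never dereferences
 * the stale pointer. */
static void mock_halt_mmio(struct mock_adev *adev)
{
	adev->no_hw_access = true;
	adev->mmio = NULL;
}
```

With the opposite order, an access that ran between clearing the mapping and setting the flag would pass the guard and dereference NULL, which is the kind of crash the v2 note avoids.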

Suggested-by: Christian Koenig <christian.koenig@amd.com>
Suggested-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Lang Yu <lang.yu@amd.com>
Reviewed-by: Christian Koenig <christian.koenig@amd.co>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Authored by Lang Yu and committed by Alex Deucher
34f3a4a9 cace4bff

2 files changed, 41 insertions(+)

drivers/gpu/drm/amd/amdgpu/amdgpu.h (+2)
```diff
@@ -1317,6 +1317,8 @@
 void amdgpu_device_invalidate_hdp(struct amdgpu_device *adev,
 		struct amdgpu_ring *ring);
 
+void amdgpu_device_halt(struct amdgpu_device *adev);
+
 /* atpx handler */
 #if defined(CONFIG_VGA_SWITCHEROO)
 void amdgpu_register_atpx_handler(void);
```
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c (+39)
```diff
@@ -5663,3 +5663,42 @@
 
 	amdgpu_asic_invalidate_hdp(adev, ring);
 }
+
+/**
+ * amdgpu_device_halt() - bring hardware to some kind of halt state
+ *
+ * @adev: amdgpu_device pointer
+ *
+ * Bring hardware to some kind of halt state so that no one can touch it
+ * any more. It will help to maintain error context when error occurred.
+ * Compare to a simple hang, the system will keep stable at least for SSH
+ * access. Then it should be trivial to inspect the hardware state and
+ * see what's going on. Implemented as following:
+ *
+ * 1. drm_dev_unplug() makes device inaccessible to user space(IOCTLs, etc),
+ * clears all CPU mappings to device, disallows remappings through page faults
+ * 2. amdgpu_irq_disable_all() disables all interrupts
+ * 3. amdgpu_fence_driver_hw_fini() signals all HW fences
+ * 4. set adev->no_hw_access to avoid potential crashes after setp 5
+ * 5. amdgpu_device_unmap_mmio() clears all MMIO mappings
+ * 6. pci_disable_device() and pci_wait_for_pending_transaction()
+ * flush any in flight DMA operations
+ */
+void amdgpu_device_halt(struct amdgpu_device *adev)
+{
+	struct pci_dev *pdev = adev->pdev;
+	struct drm_device *ddev = &adev->ddev;
+
+	drm_dev_unplug(ddev);
+
+	amdgpu_irq_disable_all(adev);
+
+	amdgpu_fence_driver_hw_fini(adev);
+
+	adev->no_hw_access = true;
+
+	amdgpu_device_unmap_mmio(adev);
+
+	pci_disable_device(pdev);
+	pci_wait_for_pending_transaction(pdev);
+}
```
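
To make the order of the six documented steps explicit, the sequence can be modeled in userspace. This is not kernel code: struct mock_gpu, record() and mock_halt() are hypothetical stand-ins, and each field only mirrors the visible effect of the real call named in its comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct amdgpu_device; each field mirrors
 * the effect of one step of amdgpu_device_halt(). */
struct mock_gpu {
	bool unplugged;              /* step 1: drm_dev_unplug() */
	bool irqs_enabled;           /* step 2: amdgpu_irq_disable_all() */
	int pending_fences;          /* step 3: amdgpu_fence_driver_hw_fini() */
	bool no_hw_access;           /* step 4 */
	volatile unsigned int *mmio; /* step 5: amdgpu_device_unmap_mmio() */
	bool pci_enabled;            /* step 6: pci_disable_device() */
	int inflight_dma;            /* step 6: pci_wait_for_pending_transaction() */
	char log[8];                 /* order in which the steps ran */
	size_t nlog;
};

/* Append one step number to the log, keeping it NUL-terminated. */
static void record(struct mock_gpu *g, char step)
{
	assert(g->nlog < sizeof(g->log) - 1);
	g->log[g->nlog++] = step;
	g->log[g->nlog] = '\0';
}

/* Mirrors the order of calls in amdgpu_device_halt(). */
static void mock_halt(struct mock_gpu *g)
{
	g->unplugged = true;     /* user space can no longer reach the device */
	record(g, '1');
	g->irqs_enabled = false; /* no more interrupt handlers run */
	record(g, '2');
	g->pending_fences = 0;   /* outstanding fences are signaled, not waited on */
	record(g, '3');
	g->no_hw_access = true;  /* set before the mapping disappears */
	record(g, '4');
	g->mmio = NULL;          /* MMIO mapping torn down */
	record(g, '5');
	g->pci_enabled = false;  /* stop the device issuing new DMA */
	g->inflight_dma = 0;     /* wait until in-flight transactions drain */
	record(g, '6');
}
```

Starting from a mock_gpu with interrupts enabled, MMIO mapped, and some outstanding fences and DMA, a call to mock_halt() leaves the log reading "123456" with every field in its halted state. Step 4 preceding step 5 is the ordering the v2 note in the commit message refers to.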