Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git


drm/amdgpu: Optimize gfx v9 GPU page fault handling

After a GPU page fault, many page fault interrupts are generated within
a short period, even with the CAM filter enabled, because each fault
address is different. Every page fault is copied to the KFD ih fifo so
that the KFD interrupt worker can send an event to user space. This can
overflow the KFD ih fifo while other processes are generating events at
the same time.

The KFD process is aborted after a GPU page fault, so only one GPU page
fault interrupt needs to be sent to the KFD ih fifo to deliver the
memory exception event to user space.

Increase the KFD ih fifo size to twice the IH primary ring size to
handle bursts of events.

This patch handles the gfx v9 path and covers the retry on/off and CAM
filter on/off cases.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
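
The overflow scenario described in the message can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not driver code: `burst_fifo`, `burst_process`, `burst_fifo_push` and `burst_lost_entries` are hypothetical names, the fifo tracks occupancy only, and nothing drains it during the burst.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the KFD ih fifo: tracks occupancy only. */
struct burst_fifo {
	size_t capacity;
	size_t used;
	size_t dropped;	/* entries lost because the fifo was full */
};

/* Hypothetical per-process state: was a fault already forwarded? */
struct burst_process {
	bool fault_forwarded;
};

static void burst_fifo_push(struct burst_fifo *f)
{
	if (f->used < f->capacity)
		f->used++;
	else
		f->dropped++;
}

/*
 * Push `faults` page faults from one process, then `other_events` from
 * other processes, into a fifo that is not drained in the meantime.
 * With `filter` set, only the first fault of the process is forwarded.
 * Returns the number of entries lost to overflow.
 */
static size_t burst_lost_entries(size_t capacity, size_t faults,
				 size_t other_events, bool filter)
{
	struct burst_fifo f = { .capacity = capacity };
	struct burst_process p = { .fault_forwarded = false };
	size_t i;

	for (i = 0; i < faults; i++) {
		if (filter && p.fault_forwarded)
			continue;	/* repeat fault: not copied to the fifo */
		p.fault_forwarded = true;
		burst_fifo_push(&f);
	}
	for (i = 0; i < other_events; i++)
		burst_fifo_push(&f);

	return f.dropped;
}
```

In this model, a burst of 20000 faults plus 100 unrelated events against a 16384-entry fifo loses entries when every fault is forwarded, and loses none when only the first fault is forwarded.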

Authored by Philip Yang and committed by Alex Deucher
1b001432 f607b2b8

+84 -1
+10
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
···
 int kgd2kfd_start_sched(struct kfd_dev *kfd, uint32_t node_id);
 int kgd2kfd_stop_sched(struct kfd_dev *kfd, uint32_t node_id);
 bool kgd2kfd_compute_active(struct kfd_dev *kfd, uint32_t node_id);
+bool kgd2kfd_vmfault_fast_path(struct amdgpu_device *adev, struct amdgpu_iv_entry *entry,
+			       bool retry_fault);
+
 #else
 static inline int kgd2kfd_init(void)
 {
···
 {
 	return false;
 }
+
+static inline bool kgd2kfd_vmfault_fast_path(struct amdgpu_device *adev, struct amdgpu_iv_entry *entry,
+					      bool retry_fault)
+{
+	return false;
+}
+
 #endif
 #endif /* AMDGPU_AMDKFD_H_INCLUDED */
+3
drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
···
 		}
 	}

+	if (kgd2kfd_vmfault_fast_path(adev, entry, retry_fault))
+		return 1;
+
 	if (!printk_ratelimit())
 		return 0;
+67
drivers/gpu/drm/amd/amdkfd/kfd_device.c
···
 	return kfd_compute_active(node);
 }

+/**
+ * kgd2kfd_vmfault_fast_path() - KFD vm page fault interrupt handling fast path for gmc v9
+ * @adev: amdgpu device
+ * @entry: vm fault interrupt vector
+ * @retry_fault: if this is retry fault
+ *
+ * retry fault -
+ *    with CAM enabled, adev primary ring
+ *                           |  gmc_v9_0_process_interrupt()
+ *                      adev soft_ring
+ *                           |  gmc_v9_0_process_interrupt() worker failed to recover page fault
+ *                      KFD node ih_fifo
+ *                           |  KFD interrupt_wq worker
+ *                      kfd_signal_vm_fault_event
+ *
+ *    without CAM,      adev primary ring1
+ *                           |  gmc_v9_0_process_interrupt worker failed to recvoer page fault
+ *                      KFD node ih_fifo
+ *                           |  KFD interrupt_wq worker
+ *                      kfd_signal_vm_fault_event
+ *
+ * no-retry fault -
+ *                      adev primary ring
+ *                           |  gmc_v9_0_process_interrupt()
+ *                      KFD node ih_fifo
+ *                           |  KFD interrupt_wq worker
+ *                      kfd_signal_vm_fault_event
+ *
+ * fast path - After kfd_signal_vm_fault_event, gmc_v9_0_process_interrupt drop the page fault
+ *            of same process, don't copy interrupt to KFD node ih_fifo.
+ *            With gdb debugger enabled, need convert the retry fault to no-retry fault for
+ *            debugger, cannot use the fast path.
+ *
+ * Return:
+ *   true - use the fast path to handle this fault
+ *   false - use normal path to handle it
+ */
+bool kgd2kfd_vmfault_fast_path(struct amdgpu_device *adev, struct amdgpu_iv_entry *entry,
+			       bool retry_fault)
+{
+	struct kfd_process *p;
+	u32 cam_index;
+
+	if (entry->ih == &adev->irq.ih_soft || entry->ih == &adev->irq.ih1) {
+		p = kfd_lookup_process_by_pasid(entry->pasid);
+		if (!p)
+			return true;
+
+		if (p->gpu_page_fault && !p->debug_trap_enabled) {
+			if (retry_fault && adev->irq.retry_cam_enabled) {
+				cam_index = entry->src_data[2] & 0x3ff;
+				WDOORBELL32(adev->irq.retry_cam_doorbell_index, cam_index);
+			}
+
+			kfd_unref_process(p);
+			return true;
+		}
+
+		/*
+		 * This is the first page fault, set flag and then signal user space
+		 */
+		p->gpu_page_fault = true;
+		kfd_unref_process(p);
+	}
+	return false;
+}
+
 #if defined(CONFIG_DEBUG_FS)

 /* This function will send a package to HIQ to hang the HWS
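
The decision logic of kgd2kfd_vmfault_fast_path() can be exercised outside the kernel with a simplified model. The sketch below is an illustration only: `fp_ring`, `fp_process`, `fp_fast_path` and `fp_cam_index` are hypothetical names, the ring is identified by an enum rather than by comparing `entry->ih` pointers, and process refcounting and the retry CAM doorbell write are left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ring identifiers standing in for the entry->ih comparison. */
enum fp_ring { FP_RING_PRIMARY, FP_RING_SOFT, FP_RING_IH1 };

/* Hypothetical subset of the per-process state used by the patch. */
struct fp_process {
	bool gpu_page_fault;		/* a fault was already sent to KFD */
	bool debug_trap_enabled;	/* debugger attached: no fast path */
};

/*
 * Follows the control flow of the patch for a single fault.
 * `p` may be NULL to model a PASID with no matching KFD process.
 * Returns true when the fault takes the fast path (is dropped).
 */
static bool fp_fast_path(enum fp_ring ring, struct fp_process *p)
{
	if (ring != FP_RING_SOFT && ring != FP_RING_IH1)
		return false;	/* other rings: normal path */
	if (!p)
		return true;	/* no process to notify */
	if (p->gpu_page_fault && !p->debug_trap_enabled)
		return true;	/* repeat fault of the same process */

	p->gpu_page_fault = true;	/* first fault: set flag, then signal */
	return false;
}

/* The CAM index is taken from the low 10 bits of src_data[2]. */
static uint32_t fp_cam_index(uint32_t src_data2)
{
	return src_data2 & 0x3ff;
}
```

In this model the first fault of a process on the soft ring or ring1 returns false and sets the flag, later faults return true, and a process with the debug trap enabled always returns false.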
+1 -1
drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c
···
 #include <linux/kfifo.h>
 #include "kfd_priv.h"

-#define KFD_IH_NUM_ENTRIES 8192
+#define KFD_IH_NUM_ENTRIES 16384

 static void interrupt_wq(struct work_struct *);
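
The sizing change can be checked with a few lines of arithmetic. This is a sketch with hypothetical names (`OLD_KFD_IH_NUM_ENTRIES`, `NEW_KFD_IH_NUM_ENTRIES`, `is_power_of_two`, `fifo_bytes`). The 32-byte entry size mentioned afterwards is an assumption made for illustration, since the diff does not show the entry size.

```c
#include <stdbool.h>
#include <stddef.h>

#define OLD_KFD_IH_NUM_ENTRIES 8192	/* value removed by the patch */
#define NEW_KFD_IH_NUM_ENTRIES 16384	/* value added by the patch */

/* True when n is a non-zero power of two. */
static bool is_power_of_two(size_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Bytes needed for a fifo of `entries` slots of `entry_size` bytes each. */
static size_t fifo_bytes(size_t entries, size_t entry_size)
{
	return entries * entry_size;
}
```

The new value is exactly twice the old one and is still a power of two. With an assumed 32-byte entry, `fifo_bytes(NEW_KFD_IH_NUM_ENTRIES, 32)` comes to 524288 bytes (512 KiB) per fifo.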
+3
drivers/gpu/drm/amd/amdkfd/kfd_priv.h
···
 	struct semaphore runtime_enable_sema;
 	bool is_runtime_retry;
 	struct kfd_runtime_info runtime_info;
+
+	/* if gpu page fault sent to KFD */
+	bool gpu_page_fault;
 };

 #define KFD_PROCESS_TABLE_SIZE 5 /* bits: 32 entries */