Merge tag 'drm-xe-next-fixes-2025-03-12' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-next

Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Core Changes:
- Fix kernel-doc for gpusvm (Lucas)

Driver Changes:
- Drop duplicated pc_start call (Rodrigo)
- Drop sentinels from rtp (Lucas)
- Fix MOCS debugfs missing forcewake (Tvrtko)
- Ring flush invalidation (Tvrtko)
- Fix type for width alignment (Tvrtko)

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/fsztfqcddrarwjlxjwm2k4wvc6u5vntceh6b7nsnxjmwzgtunj@sbkshjow65rf

Dave Airlie 1 year ago 5da39dce 64fc5dc8

+122 -121

13 changed files

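Documentation/gpu/rfc/gpusvm.rst
drivers/gpu/drm/drm_gpusvm.c
drivers/gpu/drm/xe/display/xe_fb_pin.c
drivers/gpu/drm/xe/tests/xe_rtp_test.c
drivers/gpu/drm/xe/xe_guc.c
drivers/gpu/drm/xe/xe_hw_engine.c
drivers/gpu/drm/xe/xe_mocs.c
drivers/gpu/drm/xe/xe_reg_whitelist.c
drivers/gpu/drm/xe/xe_ring_ops.c
drivers/gpu/drm/xe/xe_rtp.c
drivers/gpu/drm/xe/xe_rtp.h
drivers/gpu/drm/xe/xe_tuning.c
drivers/gpu/drm/xe/xe_wa.c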

+10 -5

Documentation/gpu/rfc/gpusvm.rst

@@ -67,14 +67,19 @@
 Overview of baseline design
 ===========================
 
-Baseline design is simple as possible to get a working basline in which can be
-built upon.
-
-.. kernel-doc:: drivers/gpu/drm/xe/drm_gpusvm.c
+.. kernel-doc:: drivers/gpu/drm/drm_gpusvm.c
    :doc: Overview
+
+.. kernel-doc:: drivers/gpu/drm/drm_gpusvm.c
    :doc: Locking
-   :doc: Migrataion
+
+.. kernel-doc:: drivers/gpu/drm/drm_gpusvm.c
+   :doc: Migration
+
+.. kernel-doc:: drivers/gpu/drm/drm_gpusvm.c
    :doc: Partial Unmapping of Ranges
+
+.. kernel-doc:: drivers/gpu/drm/drm_gpusvm.c
    :doc: Examples
 
 Possible future design features

+69 -55

drivers/gpu/drm/drm_gpusvm.c

@@ -23,37 +23,42 @@
  * DOC: Overview
  *
  * GPU Shared Virtual Memory (GPU SVM) layer for the Direct Rendering Manager (DRM)
- *
- * The GPU SVM layer is a component of the DRM framework designed to manage shared
- * virtual memory between the CPU and GPU. It enables efficient data exchange and
- * processing for GPU-accelerated applications by allowing memory sharing and
+ * is a component of the DRM framework designed to manage shared virtual memory
+ * between the CPU and GPU. It enables efficient data exchange and processing
+ * for GPU-accelerated applications by allowing memory sharing and
  * synchronization between the CPU's and GPU's virtual address spaces.
  *
  * Key GPU SVM Components:
- * - Notifiers: Notifiers: Used for tracking memory intervals and notifying the
- *		GPU of changes, notifiers are sized based on a GPU SVM
- *		initialization parameter, with a recommendation of 512M or
- *		larger. They maintain a Red-BlacK tree and a list of ranges that
- *		fall within the notifier interval. Notifiers are tracked within
- *		a GPU SVM Red-BlacK tree and list and are dynamically inserted
- *		or removed as ranges within the interval are created or
- *		destroyed.
- * - Ranges: Represent memory ranges mapped in a DRM device and managed
- *	     by GPU SVM. They are sized based on an array of chunk sizes, which
- *	     is a GPU SVM initialization parameter, and the CPU address space.
- *	     Upon GPU fault, the largest aligned chunk that fits within the
- *	     faulting CPU address space is chosen for the range size. Ranges are
- *	     expected to be dynamically allocated on GPU fault and removed on an
- *	     MMU notifier UNMAP event. As mentioned above, ranges are tracked in
- *	     a notifier's Red-Black tree.
- * - Operations: Define the interface for driver-specific GPU SVM operations
- *		 such as range allocation, notifier allocation, and
- *		 invalidations.
- * - Device Memory Allocations: Embedded structure containing enough information
- *				for GPU SVM to migrate to / from device memory.
- * - Device Memory Operations: Define the interface for driver-specific device
- *			       memory operations release memory, populate pfns,
- *			       and copy to / from device memory.
+ *
+ * - Notifiers:
+ *	Used for tracking memory intervals and notifying the GPU of changes,
+ *	notifiers are sized based on a GPU SVM initialization parameter, with a
+ *	recommendation of 512M or larger. They maintain a Red-BlacK tree and a
+ *	list of ranges that fall within the notifier interval. Notifiers are
+ *	tracked within a GPU SVM Red-BlacK tree and list and are dynamically
+ *	inserted or removed as ranges within the interval are created or
+ *	destroyed.
+ * - Ranges:
+ *	Represent memory ranges mapped in a DRM device and managed by GPU SVM.
+ *	They are sized based on an array of chunk sizes, which is a GPU SVM
+ *	initialization parameter, and the CPU address space. Upon GPU fault,
+ *	the largest aligned chunk that fits within the faulting CPU address
+ *	space is chosen for the range size. Ranges are expected to be
+ *	dynamically allocated on GPU fault and removed on an MMU notifier UNMAP
+ *	event. As mentioned above, ranges are tracked in a notifier's Red-Black
+ *	tree.
+ *
+ * - Operations:
+ *	Define the interface for driver-specific GPU SVM operations such as
+ *	range allocation, notifier allocation, and invalidations.
+ *
+ * - Device Memory Allocations:
+ *	Embedded structure containing enough information for GPU SVM to migrate
+ *	to / from device memory.
+ *
+ * - Device Memory Operations:
+ *	Define the interface for driver-specific device memory operations
+ *	release memory, populate pfns, and copy to / from device memory.
  *
  * This layer provides interfaces for allocating, mapping, migrating, and
  * releasing memory ranges between the CPU and GPU. It handles all core memory
@@ -68,14 +73,18 @@
  * below.
  *
  * Expected Driver Components:
- * - GPU page fault handler: Used to create ranges and notifiers based on the
- *			     fault address, optionally migrate the range to
- *			     device memory, and create GPU bindings.
- * - Garbage collector: Used to unmap and destroy GPU bindings for ranges.
- *			Ranges are expected to be added to the garbage collector
- *			upon a MMU_NOTIFY_UNMAP event in notifier callback.
- * - Notifier callback: Used to invalidate and DMA unmap GPU bindings for
- *			ranges.
+ *
+ * - GPU page fault handler:
+ *	Used to create ranges and notifiers based on the fault address,
+ *	optionally migrate the range to device memory, and create GPU bindings.
+ *
+ * - Garbage collector:
+ *	Used to unmap and destroy GPU bindings for ranges. Ranges are expected
+ *	to be added to the garbage collector upon a MMU_NOTIFY_UNMAP event in
+ *	notifier callback.
+ *
+ * - Notifier callback:
+ *	Used to invalidate and DMA unmap GPU bindings for ranges.
  */
 
 /**
@@ -92,9 +101,9 @@
  * range RB tree and list, as well as the range's DMA mappings and sequence
  * number. GPU SVM manages all necessary locking and unlocking operations,
  * except for the recheck range's pages being valid
- * (drm_gpusvm_range_pages_valid) when the driver is committing GPU bindings. This
- * lock corresponds to the 'driver->update' lock mentioned in the HMM
- * documentation (TODO: Link). Future revisions may transition from a GPU SVM
+ * (drm_gpusvm_range_pages_valid) when the driver is committing GPU bindings.
+ * This lock corresponds to the ``driver->update`` lock mentioned in
+ * Documentation/mm/hmm.rst. Future revisions may transition from a GPU SVM
  * global lock to a per-notifier lock if finer-grained locking is deemed
  * necessary.
  *
@@ -111,11 +120,11 @@
  * DOC: Migration
  *
  * The migration support is quite simple, allowing migration between RAM and
- * device memory at the range granularity. For example, GPU SVM currently does not
- * support mixing RAM and device memory pages within a range. This means that upon GPU
- * fault, the entire range can be migrated to device memory, and upon CPU fault, the
- * entire range is migrated to RAM. Mixed RAM and device memory storage within a range
- * could be added in the future if required.
+ * device memory at the range granularity. For example, GPU SVM currently does
+ * not support mixing RAM and device memory pages within a range. This means
+ * that upon GPU fault, the entire range can be migrated to device memory, and
+ * upon CPU fault, the entire range is migrated to RAM. Mixed RAM and device
+ * memory storage within a range could be added in the future if required.
  *
  * The reasoning for only supporting range granularity is as follows: it
  * simplifies the implementation, and range sizes are driver-defined and should
@@ -128,11 +137,11 @@
  * Partial unmapping of ranges (e.g., 1M out of 2M is unmapped by CPU resulting
  * in MMU_NOTIFY_UNMAP event) presents several challenges, with the main one
  * being that a subset of the range still has CPU and GPU mappings. If the
- * backing store for the range is in device memory, a subset of the backing store has
- * references. One option would be to split the range and device memory backing store,
- * but the implementation for this would be quite complicated. Given that
- * partial unmappings are rare and driver-defined range sizes are relatively
- * small, GPU SVM does not support splitting of ranges.
+ * backing store for the range is in device memory, a subset of the backing
+ * store has references. One option would be to split the range and device
+ * memory backing store, but the implementation for this would be quite
+ * complicated. Given that partial unmappings are rare and driver-defined range
+ * sizes are relatively small, GPU SVM does not support splitting of ranges.
  *
  * With no support for range splitting, upon partial unmapping of a range, the
  * driver is expected to invalidate and destroy the entire range. If the range
@@ -152,6 +161,8 @@
  * potentially required driver locking (e.g., DMA-resv locks).
  *
  * 1) GPU page fault handler
+ *
+ * .. code-block:: c
  *
  *	int driver_bind_range(struct drm_gpusvm *gpusvm, struct drm_gpusvm_range *range)
  *	{
@@ -219,7 +230,9 @@
  *		return err;
  *	}
  *
- * 2) Garbage Collector.
+ * 2) Garbage Collector
+ *
+ * .. code-block:: c
  *
  *	void __driver_garbage_collector(struct drm_gpusvm *gpusvm,
  *					struct drm_gpusvm_range *range)
@@ -244,7 +257,9 @@
  *			__driver_garbage_collector(gpusvm, range);
  *	}
  *
- * 3) Notifier callback.
+ * 3) Notifier callback
+ *
+ * .. code-block:: c
  *
  *	void driver_invalidation(struct drm_gpusvm *gpusvm,
  *				 struct drm_gpusvm_notifier *notifier,
@@ -514,7 +529,7 @@
 	return true;
 }
 
-/**
+/*
  * drm_gpusvm_notifier_ops - MMU interval notifier operations for GPU SVM
  */
 static const struct mmu_interval_notifier_ops drm_gpusvm_notifier_ops = {
@@ -2070,7 +2085,6 @@
 
 /**
  * drm_gpusvm_range_evict - Evict GPU SVM range
- * @pagemap: Pointer to the GPU SVM structure
  * @range: Pointer to the GPU SVM range to be removed
  *
  * This function evicts the specified GPU SVM range. This function will not
@@ -2160,8 +2174,8 @@
 	return err ? VM_FAULT_SIGBUS : 0;
 }
 
-/**
- * drm_gpusvm_pagemap_ops() - Device page map operations for GPU SVM
+/*
+ * drm_gpusvm_pagemap_ops - Device page map operations for GPU SVM
  */
 static const struct dev_pagemap_ops drm_gpusvm_pagemap_ops = {
 	.page_free = drm_gpusvm_page_free,
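The "Ranges" paragraph in the overview comment says that, upon GPU fault, the largest aligned chunk that fits within the faulting CPU address space is chosen. A minimal user-space sketch of that selection rule follows. It is not the kernel's implementation: `pick_chunk_size`, `align_down`, and the parameter names are hypothetical, and it assumes the chunk sizes are powers of two sorted in descending order. The in-kernel logic may apply further constraints (for example notifier bounds) that are not modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round addr down to a multiple of size; size must be a power of two. */
static uint64_t align_down(uint64_t addr, uint64_t size)
{
	return addr & ~(size - 1);
}

/*
 * Hypothetical helper (assumption, not a drm_gpusvm function): return the
 * largest chunk size whose naturally aligned window around fault_addr lies
 * entirely inside [vma_start, vma_end). chunk_sizes must be sorted in
 * descending order and contain only powers of two. Returns 0 if no chunk fits.
 */
static uint64_t pick_chunk_size(uint64_t fault_addr,
				uint64_t vma_start, uint64_t vma_end,
				const uint64_t *chunk_sizes, size_t n_chunks)
{
	size_t i;

	assert(chunk_sizes != NULL);

	for (i = 0; i < n_chunks; i++) {
		uint64_t start = align_down(fault_addr, chunk_sizes[i]);
		uint64_t end = start + chunk_sizes[i];

		/* First hit is the largest fit because sizes are descending. */
		if (start >= vma_start && end <= vma_end)
			return chunk_sizes[i];
	}

	return 0;
}
```

With chunk sizes of 2M, 64K, and 4K, a fault inside a 2M-aligned mapping selects 2M, while a fault in a mapping that starts one page past a 2M boundary falls back to a smaller chunk.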

+10 -10

drivers/gpu/drm/xe/display/xe_fb_pin.c

@@ -82,7 +82,7 @@
 static int __xe_pin_fb_vma_dpt(const struct intel_framebuffer *fb,
 			       const struct i915_gtt_view *view,
 			       struct i915_vma *vma,
-			       u64 physical_alignment)
+			       unsigned int alignment)
 {
 	struct xe_device *xe = to_xe_device(fb->base.dev);
 	struct xe_tile *tile0 = xe_device_get_root_tile(xe);
@@ -108,7 +108,7 @@
 						      XE_BO_FLAG_VRAM0 |
 						      XE_BO_FLAG_GGTT |
 						      XE_BO_FLAG_PAGETABLE,
-						      physical_alignment);
+						      alignment);
 	else
 		dpt = xe_bo_create_pin_map_at_aligned(xe, tile0, NULL,
 						      dpt_size, ~0ull,
@@ -116,7 +116,7 @@
 						      XE_BO_FLAG_STOLEN |
 						      XE_BO_FLAG_GGTT |
 						      XE_BO_FLAG_PAGETABLE,
-						      physical_alignment);
+						      alignment);
 	if (IS_ERR(dpt))
 		dpt = xe_bo_create_pin_map_at_aligned(xe, tile0, NULL,
 						      dpt_size, ~0ull,
@@ -124,7 +124,7 @@
 						      XE_BO_FLAG_SYSTEM |
 						      XE_BO_FLAG_GGTT |
 						      XE_BO_FLAG_PAGETABLE,
-						      physical_alignment);
+						      alignment);
 	if (IS_ERR(dpt))
 		return PTR_ERR(dpt);
 
@@ -194,7 +194,7 @@
 static int __xe_pin_fb_vma_ggtt(const struct intel_framebuffer *fb,
 				const struct i915_gtt_view *view,
 				struct i915_vma *vma,
-				u64 physical_alignment)
+				unsigned int alignment)
 {
 	struct drm_gem_object *obj = intel_fb_bo(&fb->base);
 	struct xe_bo *bo = gem_to_xe_bo(obj);
@@ -277,7 +277,7 @@
 
 static struct i915_vma *__xe_pin_fb_vma(const struct intel_framebuffer *fb,
 					const struct i915_gtt_view *view,
-					u64 physical_alignment)
+					unsigned int alignment)
 {
 	struct drm_device *dev = fb->base.dev;
 	struct xe_device *xe = to_xe_device(dev);
@@ -327,9 +327,9 @@
 
 	vma->bo = bo;
 	if (intel_fb_uses_dpt(&fb->base))
-		ret = __xe_pin_fb_vma_dpt(fb, view, vma, physical_alignment);
+		ret = __xe_pin_fb_vma_dpt(fb, view, vma, alignment);
 	else
-		ret = __xe_pin_fb_vma_ggtt(fb, view, vma, physical_alignment);
+		ret = __xe_pin_fb_vma_ggtt(fb, view, vma, alignment);
 	if (ret)
 		goto err_unpin;
 
@@ -422,7 +422,7 @@
 	struct i915_vma *vma;
 	struct intel_framebuffer *intel_fb = to_intel_framebuffer(fb);
 	struct intel_plane *plane = to_intel_plane(new_plane_state->uapi.plane);
-	u64 phys_alignment = plane->min_alignment(plane, fb, 0);
+	unsigned int alignment = plane->min_alignment(plane, fb, 0);
 
 	if (reuse_vma(new_plane_state, old_plane_state))
 		return 0;
@@ -430,7 +430,7 @@
 	/* We reject creating !SCANOUT fb's, so this is weird.. */
 	drm_WARN_ON(bo->ttm.base.dev, !(bo->flags & XE_BO_FLAG_SCANOUT));
 
-	vma = __xe_pin_fb_vma(intel_fb, &new_plane_state->view.gtt, phys_alignment);
+	vma = __xe_pin_fb_vma(intel_fb, &new_plane_state->view.gtt, alignment);
 
 	if (IS_ERR(vma))
 		return PTR_ERR(vma);

+1 -1

drivers/gpu/drm/xe/tests/xe_rtp_test.c

@@ -320,7 +320,7 @@
 		count_rtp_entries++;
 
 	xe_rtp_process_ctx_enable_active_tracking(&ctx, &active, count_rtp_entries);
-	xe_rtp_process_to_sr(&ctx, param->entries, reg_sr);
+	xe_rtp_process_to_sr(&ctx, param->entries, count_rtp_entries, reg_sr);
 
 	xa_for_each(&reg_sr->xa, idx, sre) {
 		if (idx == param->expected_reg.addr)

-8

drivers/gpu/drm/xe/xe_guc.c

@@ -1496,14 +1496,6 @@
 
 int xe_guc_start(struct xe_guc *guc)
 {
-	if (!IS_SRIOV_VF(guc_to_xe(guc))) {
-		int err;
-
-		err = xe_guc_pc_start(&guc->pc);
-		xe_gt_WARN(guc_to_gt(guc), err, "Failed to start GuC PC: %pe\n",
-			   ERR_PTR(err));
-	}
-
 	return xe_guc_submit_start(guc);
 }
 

+2 -4

drivers/gpu/drm/xe/xe_hw_engine.c

@@ -400,10 +400,9 @@
 				   PREEMPT_GPGPU_THREAD_GROUP_LEVEL)),
 		  XE_RTP_ENTRY_FLAG(FOREACH_ENGINE)
 		},
-		{}
 	};
 
-	xe_rtp_process_to_sr(&ctx, lrc_setup, &hwe->reg_lrc);
+	xe_rtp_process_to_sr(&ctx, lrc_setup, ARRAY_SIZE(lrc_setup), &hwe->reg_lrc);
 }
 
 static void
@@ -458,10 +457,9 @@
 		  XE_RTP_ACTIONS(SET(CSFE_CHICKEN1(0), CS_PRIORITY_MEM_READ,
 				     XE_RTP_ACTION_FLAG(ENGINE_BASE)))
 		},
-		{}
 	};
 
-	xe_rtp_process_to_sr(&ctx, engine_entries, &hwe->reg_sr);
+	xe_rtp_process_to_sr(&ctx, engine_entries, ARRAY_SIZE(engine_entries), &hwe->reg_sr);
 }
 
 static const struct engine_info *find_engine_info(enum xe_engine_class class, int instance)

+3 -1

drivers/gpu/drm/xe/xe_mocs.c

@@ -781,7 +781,9 @@
 	flags = get_mocs_settings(xe, &table);
 
 	xe_pm_runtime_get_noresume(xe);
-	fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FW_GT);
+	fw_ref = xe_force_wake_get(gt_to_fw(gt),
+				   flags & HAS_LNCF_MOCS ?
+				   XE_FORCEWAKE_ALL : XE_FW_GT);
 	if (!fw_ref)
 		goto err_fw;
 
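The new forcewake argument relies on C operator precedence: `flags & HAS_LNCF_MOCS ? A : B` parses as `(flags & HAS_LNCF_MOCS) ? A : B`, because bitwise AND binds tighter than the conditional operator. The sketch below illustrates only that selection. All `SKETCH_` names and bit values are placeholders invented for this example and are not the xe driver's definitions.

```c
#include <assert.h>

/* Placeholder values for illustration only (assumptions, not xe's). */
#define SKETCH_HAS_LNCF_MOCS	(1u << 1)
#define SKETCH_FW_GT		(1u << 0)
#define SKETCH_FORCEWAKE_ALL	0xffu

/*
 * Pick which forcewake domains to request before dumping MOCS state:
 * every domain when the platform has LNCF MOCS registers, otherwise GT only.
 */
static unsigned int sketch_mocs_fw_domains(unsigned int flags)
{
	/* The "all domains" mask is expected to include the GT domain. */
	assert((SKETCH_FORCEWAKE_ALL & SKETCH_FW_GT) == SKETCH_FW_GT);

	/* Written without parentheses on purpose, as in the patch. */
	return flags & SKETCH_HAS_LNCF_MOCS ?
	       SKETCH_FORCEWAKE_ALL : SKETCH_FW_GT;
}
```

Unrelated bits in `flags` do not affect the result; only the LNCF bit decides between the two domain masks.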

+2 -2

drivers/gpu/drm/xe/xe_reg_whitelist.c

@@ -88,7 +88,6 @@
 				   RING_FORCE_TO_NONPRIV_ACCESS_RD |
 				   RING_FORCE_TO_NONPRIV_RANGE_4))
 	},
-	{}
 };
 
 static void whitelist_apply_to_hwe(struct xe_hw_engine *hwe)
@@ -136,7 +135,8 @@
 {
 	struct xe_rtp_process_ctx ctx = XE_RTP_PROCESS_CTX_INITIALIZER(hwe);
 
-	xe_rtp_process_to_sr(&ctx, register_whitelist, &hwe->reg_whitelist);
+	xe_rtp_process_to_sr(&ctx, register_whitelist, ARRAY_SIZE(register_whitelist),
+			     &hwe->reg_whitelist);
 	whitelist_apply_to_hwe(hwe);
 }
 

+12 -16

drivers/gpu/drm/xe/xe_ring_ops.c

@@ -90,11 +90,10 @@
 	return i;
 }
 
-static int emit_flush_imm_ggtt(u32 addr, u32 value, bool invalidate_tlb,
-			       u32 *dw, int i)
+static int emit_flush_imm_ggtt(u32 addr, u32 value, u32 flags, u32 *dw, int i)
 {
 	dw[i++] = MI_FLUSH_DW | MI_FLUSH_DW_OP_STOREDW | MI_FLUSH_IMM_DW |
-		  (invalidate_tlb ? MI_INVALIDATE_TLB : 0);
+		  flags;
 	dw[i++] = addr | MI_FLUSH_DW_USE_GTT;
 	dw[i++] = 0;
 	dw[i++] = value;
@@ -110,16 +109,13 @@
 	return i;
 }
 
-static int emit_flush_invalidate(u32 flag, u32 *dw, int i)
+static int emit_flush_invalidate(u32 *dw, int i)
 {
-	dw[i] = MI_FLUSH_DW;
-	dw[i] |= flag;
-	dw[i++] |= MI_INVALIDATE_TLB | MI_FLUSH_DW_OP_STOREDW | MI_FLUSH_IMM_DW |
-		   MI_FLUSH_DW_STORE_INDEX;
-
-	dw[i++] = LRC_PPHWSP_FLUSH_INVAL_SCRATCH_ADDR | MI_FLUSH_DW_USE_GTT;
+	dw[i++] = MI_FLUSH_DW | MI_INVALIDATE_TLB | MI_FLUSH_DW_OP_STOREDW |
+		  MI_FLUSH_IMM_DW | MI_FLUSH_DW_STORE_INDEX;
+	dw[i++] = LRC_PPHWSP_FLUSH_INVAL_SCRATCH_ADDR;
 	dw[i++] = 0;
-	dw[i++] = ~0U;
+	dw[i++] = 0;
 
 	return i;
 }
@@ -253,7 +249,7 @@
 	if (job->ring_ops_flush_tlb) {
 		dw[i++] = preparser_disable(true);
 		i = emit_flush_imm_ggtt(xe_lrc_start_seqno_ggtt_addr(lrc),
-					seqno, true, dw, i);
+					seqno, MI_INVALIDATE_TLB, dw, i);
 		dw[i++] = preparser_disable(false);
 	} else {
 		i = emit_store_imm_ggtt(xe_lrc_start_seqno_ggtt_addr(lrc),
@@ -269,7 +265,7 @@
 					 dw, i);
 	}
 
-	i = emit_flush_imm_ggtt(xe_lrc_seqno_ggtt_addr(lrc), seqno, false, dw, i);
+	i = emit_flush_imm_ggtt(xe_lrc_seqno_ggtt_addr(lrc), seqno, 0, dw, i);
 
 	i = emit_user_interrupt(dw, i);
 
@@ -315,7 +311,7 @@
 
 	if (job->ring_ops_flush_tlb)
 		i = emit_flush_imm_ggtt(xe_lrc_start_seqno_ggtt_addr(lrc),
-					seqno, true, dw, i);
+					seqno, MI_INVALIDATE_TLB, dw, i);
 
 	dw[i++] = preparser_disable(false);
 
@@ -332,7 +328,7 @@
 					 dw, i);
 	}
 
-	i = emit_flush_imm_ggtt(xe_lrc_seqno_ggtt_addr(lrc), seqno, false, dw, i);
+	i = emit_flush_imm_ggtt(xe_lrc_seqno_ggtt_addr(lrc), seqno, 0, dw, i);
 
 	i = emit_user_interrupt(dw, i);
 
@@ -409,7 +405,7 @@
 	if (!IS_SRIOV_VF(gt_to_xe(job->q->gt))) {
 		/* XXX: Do we need this? Leaving for now. */
 		dw[i++] = preparser_disable(true);
-		i = emit_flush_invalidate(0, dw, i);
+		i = emit_flush_invalidate(dw, i);
 		dw[i++] = preparser_disable(false);
 	}
 
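The `emit_flush_imm_ggtt()` change replaces a `bool invalidate_tlb` parameter with a `u32 flags` word that is ORed straight into the first command dword, so callers pass `MI_INVALIDATE_TLB` or `0`. A self-contained sketch of that pattern is shown below. The `SK_` constants are placeholder encodings chosen for this example and are not the real MI_FLUSH_DW bit layout; `sk_emit_flush_imm` is likewise a hypothetical stand-in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder encodings for illustration; NOT the hardware bit layout. */
#define SK_FLUSH_DW		(1u << 31)
#define SK_OP_STOREDW		(1u << 14)
#define SK_IMM_DW		(1u << 0)
#define SK_INVALIDATE_TLB	(1u << 18)
#define SK_USE_GTT		(1u << 2)

/*
 * Emit a four-dword "flush and store immediate" packet into dw[] starting
 * at index i and return the next free index. Extra command bits are passed
 * in flags, so a caller requests TLB invalidation with SK_INVALIDATE_TLB
 * and a plain flush with 0.
 */
static int sk_emit_flush_imm(uint32_t addr, uint32_t value, uint32_t flags,
			     uint32_t *dw, int i)
{
	assert(dw != NULL);

	dw[i++] = SK_FLUSH_DW | SK_OP_STOREDW | SK_IMM_DW | flags;
	dw[i++] = addr | SK_USE_GTT;	/* target address, GGTT space */
	dw[i++] = 0;			/* upper address bits */
	dw[i++] = value;		/* immediate to store */

	return i;
}
```

Compared with a boolean parameter, the flags form keeps call sites self-describing and lets further command bits be added without changing the signature.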

+5 -1

drivers/gpu/drm/xe/xe_rtp.c

@@ -237,6 +237,7 @@
  * the save-restore argument.
  * @ctx: The context for processing the table, with one of device, gt or hwe
  * @entries: Table with RTP definitions
+ * @n_entries: Number of entries to process, usually ARRAY_SIZE(entries)
  * @sr: Save-restore struct where matching rules execute the action. This can be
  *      viewed as the "coalesced view" of multiple the tables. The bits for each
  *      register set are expected not to collide with previously added entries
@@ -248,6 +249,7 @@
  */
 void xe_rtp_process_to_sr(struct xe_rtp_process_ctx *ctx,
 			  const struct xe_rtp_entry_sr *entries,
+			  size_t n_entries,
 			  struct xe_reg_sr *sr)
 {
 	const struct xe_rtp_entry_sr *entry;
@@ -261,7 +263,9 @@
 	if (IS_SRIOV_VF(xe))
 		return;
 
-	for (entry = entries; entry && entry->name; entry++) {
+	xe_assert(xe, entries);
+
+	for (entry = entries; entry - entries < n_entries; entry++) {
 		bool match = false;
 
 		if (entry->flags & XE_RTP_ENTRY_FLAG_FOREACH_ENGINE) {
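The loop change above drops the `{}` sentinel convention: instead of walking the table until an entry with a NULL `name`, the caller passes an explicit count, typically `ARRAY_SIZE(table)`. A minimal user-space sketch of the same idea follows. `struct sk_entry`, `sk_table`, and `sk_process` are hypothetical names for this example, and `ARRAY_SIZE` is redefined locally because the kernel macro is not available outside the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for the kernel's ARRAY_SIZE() macro. */
#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Hypothetical table entry, loosely modelled on a named rule. */
struct sk_entry {
	const char *name;
	unsigned int value;
};

static const struct sk_entry sk_table[] = {
	{ .name = "first",  .value = 1 },
	{ .name = "second", .value = 2 },
	{ .name = "third",  .value = 4 },
	/* No trailing {} sentinel: the length comes from ARRAY_SIZE(). */
};

/* OR together the values of the first n_entries entries. */
static unsigned int sk_process(const struct sk_entry *entries, size_t n_entries)
{
	const struct sk_entry *entry;
	unsigned int acc = 0;

	assert(entries != NULL);

	/* Count-based bound replaces the "entry->name != NULL" check. */
	for (entry = entries; (size_t)(entry - entries) < n_entries; entry++)
		acc |= entry->value;

	return acc;
}
```

One practical consequence is that the count passed for processing and the count used for active tracking (both `ARRAY_SIZE(...)` in the callers changed by this merge) now refer to the same number of entries, with no sentinel slot to account for.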

+1 -1

drivers/gpu/drm/xe/xe_rtp.h

@@ -430,7 +430,7 @@
 
 void xe_rtp_process_to_sr(struct xe_rtp_process_ctx *ctx,
 			  const struct xe_rtp_entry_sr *entries,
-			  struct xe_reg_sr *sr);
+			  size_t n_entries, struct xe_reg_sr *sr);
 
 void xe_rtp_process(struct xe_rtp_process_ctx *ctx,
 		    const struct xe_rtp_entry *entries);

+4 -8

drivers/gpu/drm/xe/xe_tuning.c

@@ -85,8 +85,6 @@
 	  XE_RTP_RULES(MEDIA_VERSION(2000)),
 	  XE_RTP_ACTIONS(SET(XE2LPM_SCRATCH3_LBCF, RWFLUSHALLEN))
 	},
-
-	{}
 };
 
 static const struct xe_rtp_entry_sr engine_tunings[] = {
@@ -98,7 +96,6 @@
 		       ENGINE_CLASS(RENDER)),
 	  XE_RTP_ACTIONS(SET(SAMPLER_MODE, INDIRECT_STATE_BASE_ADDR_OVERRIDE))
 	},
-	{}
 };
 
 static const struct xe_rtp_entry_sr lrc_tunings[] = {
@@ -135,8 +132,6 @@
 	  XE_RTP_ACTIONS(FIELD_SET(FF_MODE, VS_HIT_MAX_VALUE_MASK,
 				   REG_FIELD_PREP(VS_HIT_MAX_VALUE_MASK, 0x3f)))
 	},
-
-	{}
 };
 
 /**
@@ -175,7 +170,7 @@
 	xe_rtp_process_ctx_enable_active_tracking(&ctx,
 						  gt->tuning_active.gt,
 						  ARRAY_SIZE(gt_tunings));
-	xe_rtp_process_to_sr(&ctx, gt_tunings, &gt->reg_sr);
+	xe_rtp_process_to_sr(&ctx, gt_tunings, ARRAY_SIZE(gt_tunings), &gt->reg_sr);
 }
 EXPORT_SYMBOL_IF_KUNIT(xe_tuning_process_gt);
 
@@ -186,7 +181,8 @@
 	xe_rtp_process_ctx_enable_active_tracking(&ctx,
 						  hwe->gt->tuning_active.engine,
 						  ARRAY_SIZE(engine_tunings));
-	xe_rtp_process_to_sr(&ctx, engine_tunings, &hwe->reg_sr);
+	xe_rtp_process_to_sr(&ctx, engine_tunings, ARRAY_SIZE(engine_tunings),
+			     &hwe->reg_sr);
 }
 EXPORT_SYMBOL_IF_KUNIT(xe_tuning_process_engine);
 
@@ -206,7 +202,7 @@
 	xe_rtp_process_ctx_enable_active_tracking(&ctx,
 						  hwe->gt->tuning_active.lrc,
 						  ARRAY_SIZE(lrc_tunings));
-	xe_rtp_process_to_sr(&ctx, lrc_tunings, &hwe->reg_lrc);
+	xe_rtp_process_to_sr(&ctx, lrc_tunings, ARRAY_SIZE(lrc_tunings), &hwe->reg_lrc);
 }
 
 void xe_tuning_dump(struct xe_gt *gt, struct drm_printer *p)

+3 -9

drivers/gpu/drm/xe/xe_wa.c

@@ -279,8 +279,6 @@
 	  XE_RTP_ACTIONS(SET(VDBOX_CGCTL3F10(0), RAMDFTUNIT_CLKGATE_DIS)),
 	  XE_RTP_ENTRY_FLAG(FOREACH_ENGINE),
 	},
-
-	{}
 };
 
 static const struct xe_rtp_entry_sr engine_was[] = {
@@ -622,8 +620,6 @@
 		       FUNC(xe_rtp_match_first_render_or_compute)),
 	  XE_RTP_ACTIONS(SET(TDL_TSL_CHICKEN, RES_CHK_SPR_DIS))
 	},
-
-	{}
 };
 
 static const struct xe_rtp_entry_sr lrc_was[] = {
@@ -821,8 +817,6 @@
 			     DIS_PARTIAL_AUTOSTRIP |
 			     DIS_AUTOSTRIP))
 	},
-
-	{}
 };
 
 static __maybe_unused const struct xe_rtp_entry oob_was[] = {
@@ -862,7 +856,7 @@
 
 	xe_rtp_process_ctx_enable_active_tracking(&ctx, gt->wa_active.gt,
 						  ARRAY_SIZE(gt_was));
-	xe_rtp_process_to_sr(&ctx, gt_was, &gt->reg_sr);
+	xe_rtp_process_to_sr(&ctx, gt_was, ARRAY_SIZE(gt_was), &gt->reg_sr);
 }
 EXPORT_SYMBOL_IF_KUNIT(xe_wa_process_gt);
 
@@ -880,7 +874,7 @@
 
 	xe_rtp_process_ctx_enable_active_tracking(&ctx, hwe->gt->wa_active.engine,
 						  ARRAY_SIZE(engine_was));
-	xe_rtp_process_to_sr(&ctx, engine_was, &hwe->reg_sr);
+	xe_rtp_process_to_sr(&ctx, engine_was, ARRAY_SIZE(engine_was), &hwe->reg_sr);
 }
 
 /**
@@ -897,7 +891,7 @@
 
 	xe_rtp_process_ctx_enable_active_tracking(&ctx, hwe->gt->wa_active.lrc,
 						  ARRAY_SIZE(lrc_was));
-	xe_rtp_process_to_sr(&ctx, lrc_was, &hwe->reg_lrc);
+	xe_rtp_process_to_sr(&ctx, lrc_was, ARRAY_SIZE(lrc_was), &hwe->reg_lrc);
 }
 
 /**